
Functions of Several Real Variables


Matías Raja

Course of the degree in Mathematics (Grado en Matemáticas) at the University of Murcia, 2021.


I practiced every night,
now I’m ready.
L. Cohen
Contents

Preface

1 Metric Spaces (for Analysis)
1.1 Generalities
1.2 Separability
1.3 Completeness
1.4 Compactness
1.5 Space of continuous functions
1.6 Fractals
1.7 Rationale and remarks
1.8 Exercises

2 Normed Spaces
2.1 Norms
2.2 Finite-dimensional normed spaces
2.3 Linear operators
2.4 Spaces of functions
2.5 Complements
2.6 Rationale and remarks
2.7 Exercises

3 Functions of several real variables: a starter
3.1 Graphical representation
3.2 Topology
3.3 Genuine functions on $\mathbb{R}^n$?
3.4 Rationale and remarks
3.5 Exercises

4 Differentiable mappings
4.1 The basics
4.2 Partial derivatives
4.3 Second order differentiability and more
4.4 Applications to extrema
4.5 Two applications to Algebra
4.5.1 The Fundamental Theorem of Algebra
4.5.2 Diagonalization of symmetric matrices
4.6 Rationale and remarks
4.7 Exercises

5 Theorems of the inverse mapping and implicit functions
5.1 Theorem of the inverse mapping
5.2 The implicit function theorem and smooth manifolds
5.3 Some applications
5.3.1 Lagrange multipliers
5.3.2 Functional dependence
5.3.3 Envelope of a family of curves
5.4 Rationale and remarks
5.5 Exercises

6 Riemann Integral
6.1 Rectangles and partitions
6.2 Integrals on compact rectangles
6.3 Integrability and continuity points
6.4 Integration on general domains
6.5 Iterated integrals
6.6 Improper integrals
6.7 Rationale and remarks
6.8 Exercises

7 Change of Variables in Integration
7.1 Linear volume transformations
7.2 The change of variables theorem
7.3 The Morse-Sard theorem
7.4 Brouwer fixed point theorem
7.5 Assorted changes of variables
7.5.1 Sum of the inverse of the squared integers
7.5.2 Integrals of Euler
7.5.3 Integrals of Dirichlet
7.6 Rationale and remarks
7.7 Exercises

8 Measure Theory and Lebesgue Integral
8.1 Motivation
8.2 Measures
8.3 Construction of measures
8.4 Measurable functions
8.5 Integration
8.6 Approximation and topology
8.7 Product measures
8.8 Signed measures
8.9 Differentiation
8.10 Rationale and remarks
8.11 Exercises

9 Integration on curves and surfaces
9.1 Functions of bounded variation
9.2 Curves in normed spaces
9.3 Some formulas
9.4 Integration with respect to the arc length
9.5 Alternative parameterizations
9.6 Another way to compute the length
9.7 Area of a $C^1$ surface with boundary
9.8 Alternative expressions for the area
9.9 Area measure and integration on surfaces
9.10 Rationale and remarks
9.11 Exercises

10 Differential forms of low degree
10.1 Forms of degree 1
10.2 Integration of 1-forms on paths
10.3 The Green-Riemann formula
10.4 Forms of degree 2
10.5 Integration of 2-forms on surfaces
10.6 Gauss and Stokes
10.7 Rationale and remarks
10.8 Exercises

11 Classic Vector Analysis
11.1 Operations with vectors in $\mathbb{R}^3$
11.2 Differential forms on $\mathbb{R}^3$
11.3 Vector operators
11.4 Newtonian potential
11.5 Harmonic functions
11.6 Vector Analysis in $\mathbb{R}^2$
11.7 Assorted applications
11.7.1 Mechanics
11.7.2 Hydrostatics
11.7.3 Hydrodynamics
11.7.4 Electromagnetic fields
11.8 Rationale and remarks
11.9 Exercises

12 Appendix A: The Stone-Weierstrass theorem
12.1 General Topology
12.2 Approximation by continuous functions

13 Appendix B: Some properties of $L^p$ spaces
13.1 Basic properties
13.2 Convergence
13.3 Classification of $L^p$ spaces and examples
13.4 Duality
13.5 Uniform convexity of $L^p(\mu)$ for $1 < p < \infty$

14 Appendix C: Introduction to Lagrangian and Hamiltonian mechanics
14.1 Coordinates and speeds
14.2 Forces, work and energy
14.3 Equations of movement
14.4 The Lagrangian
14.5 The Hamiltonian

Bibliography

Preface

These lectures are based on quite a few years of teaching Functions of Several Real Variables for the degree in Mathematics at the University of Murcia. The purpose is to cover not only the usual topics but also a “little more” that sets this course apart from others, notably the inclusion of many related topics and nontrivial applications.

The corpus of Functions of Several Real Variables is often divided into two or three subjects when it comes to teaching it. However, I have decided to keep the material together, with the exception of the chapter on Measure Theory and Lebesgue Integral, which is somewhat independent. The contents can be grouped as follows:

1. Topological and linear background: chapters 1 and 2.

2. Differential Calculus: chapters 3, 4 and 5.

3. Integral Calculus: chapters 6, 7 and 8.

4. Integration on manifolds and Vector Analysis: chapters 9, 10 and 11.

Some comments are necessary. If the Riemann integral is omitted, it is possible to adapt the change of variables theorem (chapter 7) to the Lebesgue integral (chapter 8) without much trouble. As for integration on manifolds, the techniques are aimed at objects of dimension less than 3 (curves and surfaces). Therefore, the theory of differential forms is not fully developed, but all the classic applications are covered. Every chapter ends with a section called Rationale and remarks, where the point of view adopted in the chapter is discussed and some further developments are proposed.

I have added three appendices at the end. Two of them complement the information on the most important spaces of functions that appear in the course: the space $C(K)$ of continuous functions over a compact space $K$ and the spaces of integrable functions $L^p(\mu)$. The results we have included (the Stone-Weierstrass theorem, relations among the types of convergence for integrable functions, ...) lie in a limbo between Real Analysis and Functional Analysis. The third appendix, on Mechanics, is an “experiment” based on the fact that it is possible to obtain Lagrange’s equations of motion from Newton’s laws and the chain rule of Calculus.

The bibliography consists of the books I read as a student to learn most of the results here, and those I have consulted during the preparation of the manuscript. I apologize for not giving precise references at all times.

An early version of this manual was presented as a Proyecto Docente for the promotion of the author to Catedrático de Universidad on July 22, 2021. The current version is updated every now and then with new examples and, happily, fewer mistakes. However, to achieve a fully satisfactory version, as Edward E. Gibbon said, would require many years of health, of leisure and of perseverance.

Murcia, Spring 2022.

Chapter 1

Metric Spaces (for Analysis)

1.1 Generalities
The basics of metric spaces are presumably known to the students, so we will get through this first section rather quickly.

A metric on a set $M$ is a function $d : M \times M \to [0, +\infty)$ with these three properties:

a) $d(x, y) = 0$ if and only if $x = y$;

b) $d(x, y) = d(y, x)$ for any $x, y \in M$;

c) $d(x, y) \le d(x, z) + d(z, y)$ for any $x, y, z \in M$.

A pair $(M, d)$ consisting of a set and a metric on it is called a metric space. The open and closed balls with center $x \in M$ and radius $r > 0$ are defined as

$$B(x, r) = \{y \in M : d(x, y) < r\}, \qquad B[x, r] = \{y \in M : d(x, y) \le r\}.$$

A set $A \subset M$ is said to be open if for every $x \in A$ there is some $r > 0$ such that $B(x, r) \subset A$. The family of all the open sets of $M$ is called its topology and has the following properties:

a) $\emptyset$ and $M$ are open;

b) arbitrary unions of open sets are open;

c) finite intersections of open sets are open.

Statements on metric spaces that can be formulated in terms of open sets (the topology) are called topological. For instance, we may define convergence of sequences in a metric space as follows: $(x_n) \subset M$ converges to $x \in M$ if $\lim_n d(x_n, x) = 0$. The definition appears to depend strongly on the metric; however, convergence of sequences is actually a topological notion, because it can be equivalently formulated as: for every open set $U \ni x$ there is $n_U$ such that $x_n \in U$ whenever $n \ge n_U$. We say that $x$ is a cluster point of a sequence $(x_n)$ if there is a subsequence $(x_{n_k})$ with limit $x$. A cluster point of an infinite set is the limit of a sequence of distinct points from the set.

A set $A \subset M$ is said to be closed if its complement is an open set. It follows that $\emptyset$ and $M$ are closed as well, and that closed sets are stable under finite unions and arbitrary intersections. The interior of a set $A$, denoted $A^\circ$, is the largest open set contained in $A$, and the closure of $A$, denoted $\overline{A}$, is the smallest closed set containing $A$. Of course, we have the duality formula $M \setminus A^\circ = \overline{M \setminus A}$. Moreover, we have this useful characterization in terms of convergent sequences.

Proposition 1.1.1. The closure $\overline{A}$ of a set $A \subset M$ is the set of limits of all the convergent sequences contained in $A$. In particular, a set $A$ is closed if and only if the limits of convergent sequences of points from $A$ remain in $A$.

Given two metric spaces $(M_1, d_1)$ and $(M_2, d_2)$ and a map $f : M_1 \to M_2$, we say that $f$ is continuous at $x \in M_1$ if for every $\varepsilon > 0$ there is $\delta > 0$ such that $d_2(f(y), f(x)) < \varepsilon$ whenever $y \in M_1$ and $d_1(x, y) < \delta$. It is not difficult to check that continuity is a topological property. Note that continuity at one point is characterized by means of sequences: $f$ is continuous at $x$ if and only if $\lim_n d_2(f(x_n), f(x)) = 0$ for every sequence $(x_n) \subset M_1$ converging to $x$. A map $f : M_1 \to M_2$ is said to be continuous if it is continuous at every point of $M_1$. Note that continuity of $f$ is equivalent to saying that $f^{-1}(U)$ is open whenever $U \subset M_2$ is open. With the same notation, the map $f$ is said to be uniformly continuous if for every $\varepsilon > 0$ there is $\delta > 0$ such that $d_2(f(y), f(x)) < \varepsilon$ whenever $x, y \in M_1$ and $d_1(x, y) < \delta$ (note the change of position of a quantifier: $\delta$ no longer depends on $x$). A particularly important class of uniformly continuous maps consists of those satisfying the Lipschitz condition. We say that $f$ is Lipschitz if there is some $\lambda > 0$ such that $d_2(f(y), f(x)) \le \lambda\, d_1(x, y)$ for all $x, y \in M_1$.

Examples of continuous real functions are provided by the distance functions to sets: for a nonempty $A \subset M$ set $d(x, A) = \inf\{d(x, y) : y \in A\}$. Indeed, continuity follows easily from $|d(x, A) - d(y, A)| \le d(x, y)$. Given two disjoint nonempty closed sets $A, B \subset M$, the continuous function

$$f(x) = \frac{d(x, A)}{d(x, A) + d(x, B)}$$

satisfies $f(x) \in [0, 1]$, $A = f^{-1}(0)$ and $B = f^{-1}(1)$.
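The separating function can be tried out numerically. The following is only a sketch under simplifying assumptions: $M = \mathbb{R}$ with the usual metric, $A$ and $B$ finite (hence closed) sets, and the helper names `dist_to_set` and `separating_function` are ours, not part of the text.

```python
def dist_to_set(x, A):
    """d(x, A) = inf{|x - y| : y in A} for a finite nonempty set A of reals."""
    return min(abs(x - y) for y in A)


def separating_function(x, A, B):
    """f(x) = d(x, A) / (d(x, A) + d(x, B)); A and B must be disjoint."""
    da, db = dist_to_set(x, A), dist_to_set(x, B)
    return da / (da + db)


# Two disjoint finite (hence closed) subsets of the real line.
A = [0.0, 1.0]
B = [3.0, 4.0]

# f vanishes exactly on A, equals 1 exactly on B, and lies in [0, 1] elsewhere.
values = {x: separating_function(x, A, B) for x in (0.0, 1.0, 2.0, 3.0, 4.0)}
print(values)
```

The denominator never vanishes because a point cannot be at distance zero from two disjoint closed sets at once.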

Given two metrics $d_1$ and $d_2$ on the same set $M$, we say that $d_1$ is finer than $d_2$ (equivalently, $d_2$ is coarser than $d_1$) if any open set with respect to $d_2$ is also open with respect to $d_1$. Note that this is equivalent to the continuity of the identity map $\mathrm{Id} : (M, d_1) \to (M, d_2)$. The two metrics on $M$ are said to be equivalent if they produce the same topology, that is, if the identity map is continuous in both directions (a homeomorphism). Given a metric space $(M, d)$, we may always suppose that the metric is bounded, just by taking the equivalent metric $d'(x, y) = \min\{1, d(x, y)\}$.

A very useful operation with metric spaces (and more general topological spaces) is the product. Let $(M_1, d_1)$ and $(M_2, d_2)$ be metric spaces. We can endow $M_1 \times M_2$ with the metric defined by

$$d((x_1, x_2), (y_1, y_2)) = d_1(x_1, y_1) + d_2(x_2, y_2).$$

That operation can be extended to any finite number of factors. In the particular case of the product of “copies” of $\mathbb{R}$, it is not difficult to check that the product metric is equivalent to the Euclidean distance. We may even consider countably many factors $\{(M_n, d_n)\}_{n \in \mathbb{N}}$. In that case, define the metric by a series

$$d((x_n), (y_n)) = \sum_{n=1}^{\infty} 2^{-n} d_n(x_n, y_n)$$

where each $d_n$ is an equivalent metric on $M_n$ bounded by 1.
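As an illustration of the series metric, here is a minimal sketch that evaluates a partial sum of the series. It assumes every factor is $\mathbb{R}$ with the bounded metric $\min\{1, |s - t|\}$; the names `bounded` and `product_distance` and the truncation parameter `terms` are hypothetical choices made for this example.

```python
import math


def bounded(d):
    """Return the equivalent metric min{1, d}, which is bounded by 1."""
    return lambda x, y: min(1.0, d(x, y))


def product_distance(x, y, metric, terms=60):
    """Partial sum of sum_{n>=1} 2^{-n} d_n(x_n, y_n).

    x and y map an index n >= 1 to the n-th coordinate, and metric(n)
    returns d_n, assumed bounded by 1, so the neglected tail of the
    series is at most 2^{-terms}.
    """
    return sum(2.0 ** (-n) * metric(n)(x(n), y(n)) for n in range(1, terms + 1))


# Every factor is the real line with the bounded usual metric.
usual = bounded(lambda s, t: abs(s - t))

# Distance between the sequences (1/n) and (0, 0, 0, ...).
dist = product_distance(lambda n: 1.0 / n, lambda n: 0.0, lambda n: usual)
print(dist, math.log(2))
```

For this particular pair of sequences the series is $\sum_{n \ge 1} 2^{-n}/n = \log 2$, which gives a convenient check of the partial sum.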

1.2 Separability
A subset $A \subset M$ is said to be dense if $\overline{A} = M$. A metric space is said to be separable if it contains a countable dense set $\{x_n : n \in \mathbb{N}\}$. Note that in such a case, the collection of balls $\{B(x_n, 1/m) : n, m \in \mathbb{N}\}$ is a countable base of the topology, that is, every open set can be expressed as a union of balls from that collection. A metric (or more generally, topological) space is said to be Lindelöf if every cover of the space by open sets has a countable subcover. With all these definitions we have the following.

Proposition 1.2.1. A metric space is separable if and only if it is Lindelöf.

Proof. If $M$ is separable, there is a countable base $(B_n)$. Let an open cover of $M$ be given. For every $x \in M$ there is an open set $U$ from the cover such that $x \in U$ and, by the definition of base, there is $n \in \mathbb{N}$ such that $x \in B_n \subset U$. Doing this for all the points of $M$ involves only countably many sets from $(B_n)$. Choosing, for each of those $B_n$, one set of the cover that contains it, we obtain a countable subcover of $M$. On the other hand, assume $M$ is Lindelöf and, for each $m \in \mathbb{N}$, take the cover $\{B(x, 1/m) : x \in M\}$, which has a countable subfamily covering $M$. Let $(x_{n,m})_n$ be the centres of the balls of that countable subcover. By construction, $\{x_{n,m} : n, m \in \mathbb{N}\}$ is a dense set.

Let $\varepsilon > 0$. A set $A \subset M$ is said to be $\varepsilon$-discrete if $d(x, y) \ge \varepsilon$ for any $x, y \in A$ with $x \ne y$. The set is said to be (metrically) discrete if it is $\varepsilon$-discrete for some $\varepsilon > 0$. We have the following.

Proposition 1.2.2. A metric space is separable if and only if it does not contain an uncountable discrete set.

Proof. If $M$ contains an uncountable $\varepsilon$-discrete set $A$, then $\{B(x, \varepsilon/3) : x \in A\}$ is an uncountable collection of pairwise disjoint balls. A countable subset of $M$ cannot meet all those balls, for cardinality reasons, so it cannot be dense. On the other hand, assume that discrete sets are countable. Given $\varepsilon > 0$ there is a maximal $\varepsilon$-discrete set $A_\varepsilon$. Such a set is countable and has the property that for any $x \in M$ there is $y \in A_\varepsilon$ such that $d(x, y) < \varepsilon$. Taking $\varepsilon = 1/n$ we get that $\bigcup_{n=1}^{\infty} A_{1/n}$ is dense.

Separability implies that $\varepsilon$-discrete sets are countable. We say that the metric space $M$ is totally bounded if all its discrete sets are finite. A maximal $\varepsilon$-discrete set will be called an $\varepsilon$-net.
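For a finite metric space a maximal $\varepsilon$-discrete set can be built greedily, which makes the two defining properties easy to observe. This is a sketch only: the point set (a grid in the unit square with the Euclidean metric), the value of `eps` and the function name `greedy_epsilon_net` are assumptions made for the example.

```python
import math


def greedy_epsilon_net(points, eps):
    """Build a maximal eps-discrete subset of a finite list of points.

    A point is kept only if it is at distance >= eps from every point kept
    so far, so the result is eps-discrete. It is maximal because each
    rejected point lies at distance < eps from some kept point.
    """
    net = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in net):
            net.append(p)
    return net


# A finite sample of the unit square, with the Euclidean metric.
points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.35
net = greedy_epsilon_net(points, eps)
print(len(net), "net points out of", len(points))
```

Maximality is exactly what yields the covering property used in the proof of Proposition 1.2.2: every point of the space lies within distance less than $\varepsilon$ of the net.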

1.3 Completeness
A sequence $(x_n)$ is said to be Cauchy if for every $\varepsilon > 0$ there is $N \in \mathbb{N}$ such that $d(x_n, x_m) < \varepsilon$ whenever $n, m \ge N$ (equivalently, $\lim_{n,m} d(x_n, x_m) = 0$). The reader can establish these easy facts: every convergent sequence is Cauchy; a Cauchy sequence with a cluster point must be convergent. A metric space is said to be complete if every Cauchy sequence is convergent. The notion of completeness is not topological. Observe that $(-\pi/2, \pi/2)$ is not complete with the usual metric on $\mathbb{R}$, but the equivalent metric $d(x, y) = |\tan x - \tan y|$ makes it complete.
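The difference between the two metrics on $(-\pi/2, \pi/2)$ shows up numerically on the sequence $x_n = \pi/2 - 1/n$, which has no limit inside the interval. The sketch below (function names `d_usual` and `d_tan` are ours) compares the distance between two far-out terms in both metrics.

```python
import math


def d_usual(x, y):
    """Usual metric on the real line."""
    return abs(x - y)


def d_tan(x, y):
    """The metric |tan x - tan y| on the open interval (-pi/2, pi/2)."""
    return abs(math.tan(x) - math.tan(y))


# x_n = pi/2 - 1/n for n = 1, ..., 2000; xs[n - 1] holds x_n.
xs = [math.pi / 2 - 1 / n for n in range(1, 2001)]

# Distance between x_1000 and x_2000 in each metric.
usual_gap = d_usual(xs[999], xs[1999])
tan_gap = d_tan(xs[999], xs[1999])
print(usual_gap, tan_gap)
```

The terms get arbitrarily close in the usual metric, so the sequence is Cauchy there, while in the tangent metric the gaps grow without bound, so the sequence is not Cauchy and does not obstruct completeness.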

A useful observation is that completeness is inherited by closed subsets and products (with the standard product metric). On the other hand, a subset of a metric space which is complete when endowed with the restricted metric must be closed in the ambient space.

Proposition 1.3.1 (Cantor). A metric space $M$ is complete if and only if $\bigcap_{n=1}^{\infty} F_n \ne \emptyset$ whenever $(F_n)$ is a decreasing sequence of nonempty closed sets of $M$ such that $\lim_n \operatorname{diam}(F_n) = 0$. In such a case, we have $\bigcap_{n=1}^{\infty} F_n = \{x\}$ for some $x \in M$.

Proof. Observe that the hypothesis on $(F_n)$ implies that $(x_n)$ is a Cauchy sequence for any choice of $x_n \in F_n$. If the space is complete then $\lim_n x_n = x$ exists and clearly $\{x\} = \bigcap_{n=1}^{\infty} F_n$. On the other hand, if $M$ is not complete then there is a Cauchy sequence $(x_n)$ with no limit. Since $(x_n)$ cannot have cluster points, the sets $F_n = \{x_k : k \ge n\}$ are closed. The Cauchy property implies that $\lim_n \operatorname{diam}(F_n) = 0$, but we have $\bigcap_{n=1}^{\infty} F_n = \emptyset$.

The following is the celebrated Baire theorem.

Theorem 1.3.2 (Baire). Let $M$ be a complete metric space. If $(U_n)$ is a sequence of dense open sets of $M$, then $\bigcap_{n=1}^{\infty} U_n$ is dense.

Proof. Denseness of $\bigcap_{n=1}^{\infty} U_n$ is equivalent to $U \cap \bigcap_{n=1}^{\infty} U_n \ne \emptyset$ for every nonempty open set $U$. Fix a nonempty open set $U \subset M$. Since $U_1$ is dense, $U \cap U_1 \ne \emptyset$ and there are $x_1 \in M$ and $0 < r_1 \le 1$ such that $B[x_1, r_1] \subset U \cap U_1$. Again $B(x_1, r_1) \cap U_2 \ne \emptyset$, so there are $x_2 \in M$ and $0 < r_2 \le 1/2$ such that $B[x_2, r_2] \subset B(x_1, r_1) \cap U_2$. Proceeding in this way we obtain sequences $(x_n) \subset M$ and $(r_n) \subset \mathbb{R}^+$ with $r_n \le 1/n$ such that $B[x_n, r_n] \subset U \cap U_n$ and

$$B[x_1, r_1] \supset B[x_2, r_2] \supset \cdots \supset B[x_n, r_n] \supset \cdots$$

Thus, by the previous proposition, $\bigcap_{n=1}^{\infty} B[x_n, r_n] = \{x\}$. By construction $x \in U \cap \bigcap_{n=1}^{\infty} U_n$, and so $U \cap \bigcap_{n=1}^{\infty} U_n \ne \emptyset$.

Baire’s theorem is sometimes preferred in this equivalent form.

Corollary 1.3.3. Let $M$ be a nonempty complete metric space and $(F_n)$ a sequence of closed subsets of $M$ such that $M = \bigcup_{n=1}^{\infty} F_n$. Then there is $n \in \mathbb{N}$ such that $F_n$ has nonempty interior.

The following is Banach’s fixed point theorem for contractive mappings.

Theorem 1.3.4 (Banach). Let $M$ be a nonempty complete metric space and $f : M \to M$ a map for which there is $\lambda < 1$ such that

$$d(f(x), f(y)) \le \lambda\, d(x, y) \quad \text{for all } x, y \in M.$$

Then there is a unique point $x \in M$ such that $f(x) = x$ (a fixed point for $f$). Moreover, whenever $x_1 \in M$ is chosen, the sequence defined inductively by $x_n = f(x_{n-1})$ for $n \ge 2$ converges to $x$.

Proof. If $x$ is a fixed point and $y \in M$ is another fixed point for $f$ with $y \ne x$, then we have

$$d(x, y) = d(f(x), f(y)) \le \lambda\, d(x, y) < d(x, y),$$

which is a contradiction proving the uniqueness.

Now, for an arbitrarily chosen $x_1 \in M$, consider the sequence $(x_n)$ recursively generated as in the statement. Note that if $(x_n)$ converges to some point $x \in M$, then $x$ is a fixed point. Indeed, by the continuity of $f$,

$$f(x) = \lim_n f(x_n) = \lim_n x_{n+1} = x.$$

It just remains to prove that $(x_n)$ converges, and this will be done by checking that $(x_n)$ is Cauchy. To that end, first observe that

$$d(x_n, x_{n-1}) = d(f(x_{n-1}), f(x_{n-2})) \le \lambda\, d(x_{n-1}, x_{n-2}).$$

Recursively we have

$$d(x_n, x_{n-1}) \le \lambda^{n-2} d(x_2, x_1).$$

The triangle inequality gives us, for $n > m \ge 1$, that

$$d(x_n, x_m) \le d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m) \le (\lambda^{n-2} + \cdots + \lambda^{m-1})\, d(x_2, x_1) \le \frac{\lambda^{m-1}}{1 - \lambda}\, d(x_2, x_1).$$

The inequality clearly implies that $(x_n)$ is Cauchy.
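The iteration in the proof is easy to run. The sketch below is an illustration under assumptions that are not in the text: it takes $M = [0, 1]$ and $f = \cos$, which maps $[0, 1]$ into itself and, by the mean value theorem, is a contraction there with constant $\lambda = \sin 1 < 1$. The helper names `iterate` and `a_priori_bound` are ours.

```python
import math


def iterate(f, x1, steps):
    """Return [x_1, ..., x_steps] with x_n = f(x_{n-1}) for n >= 2."""
    xs = [x1]
    for _ in range(steps - 1):
        xs.append(f(xs[-1]))
    return xs


# cos maps [0, 1] into itself and |cos'| <= sin(1) < 1 on that interval.
lam = math.sin(1.0)
xs = iterate(math.cos, 1.0, 60)  # xs[m - 1] holds x_m
fixed = xs[-1]                   # x_60, used as an approximation of the fixed point


def a_priori_bound(m):
    """The bound lambda^(m-1) / (1 - lambda) * d(x_2, x_1) from the proof."""
    return lam ** (m - 1) / (1 - lam) * abs(xs[1] - xs[0])


print(fixed, abs(fixed - math.cos(fixed)))
print(abs(fixed - xs[9]), a_priori_bound(10))
```

The bound from the proof is computable before running the iteration, which is why it is often used to decide in advance how many steps are needed for a given accuracy; in practice it tends to be pessimistic.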

1.4 Compactness
Compactness is one of the most important properties for Analysis.

Definition 1.4.1. A topological space is said to be compact if any open cover has a finite subcover.

Passing to complements, compactness is equivalent to the following property: a family of closed sets has nonempty intersection whenever it has the finite intersection property, that is, whenever each of its finite subfamilies has nonempty intersection. Note as well that compactness is preserved by continuous maps.

Proposition 1.4.2. “Compactness versus countable compactness”.

1. In a compact topological space any infinite subset has a cluster point.

2. If a topological space satisfies that any infinite subset has a cluster point, then any countable open cover has a finite subcover.

Proof. Suppose that the infinite set $A \subset X$ has no cluster points. Then, for any subset $B \subset A$ the set $A \setminus B$ is closed. Now note that

$$\{A \setminus B : B \subset A \text{ is finite}\}$$

is a family of closed subsets with the finite intersection property and empty intersection, so $X$ is not compact.

For the second statement it is enough to show that a decreasing sequence $(F_n)$ of nonempty closed subsets of $X$ has nonempty intersection. Indeed, take a point $x_n \in F_n$ for each $n$. If the set $\{x_n\}$ is finite, then there is $x$ with $x = x_n$ for infinitely many $n \in \mathbb{N}$, and so $x \in \bigcap_{n=1}^{\infty} F_n$. Otherwise, $\{x_n\}$ is infinite and thus it has a cluster point $x$, which is a cluster point of every set $\{x_k : k \ge n\} \subset F_n$. Therefore $x \in \bigcap_{n=1}^{\infty} F_n$. In any case, the intersection $\bigcap_{n=1}^{\infty} F_n$ is nonempty.

Proposition 1.4.3. For a metric space $M$, the following are equivalent:

(i) $M$ is compact;
(ii) any sequence in $M$ has a convergent subsequence;
(iii) $M$ is complete and totally bounded.

Proof. (i)⇒(ii) Clearly we may assume that the sequence has infinitely many distinct terms, and so it has a cluster point as an infinite set. In a metric space, a cluster point of a sequence is the limit of one of its subsequences.

(ii)⇒(i) Note that any infinite subset of $M$ has a cluster point. Note as well that the metric space $M$ must be separable, since otherwise it would contain an uncountable metrically discrete subset, and any sequence of distinct points taken from that set has no convergent subsequence. Hence $M$ is Lindelöf, so any open cover has a countable subcover. By the previous result, this countable cover has in turn a finite subcover.

(i)+(ii)⇒(iii) Given $\varepsilon > 0$, the family $\{B(x, \varepsilon) : x \in M\}$ is an open cover of $M$, which has a finite subcover of the form $\{B(x_k, \varepsilon) : 1 \le k \le n\}$. Clearly, $\{x_k : 1 \le k \le n\}$ is a finite $\varepsilon$-net of $M$. Given a Cauchy sequence $(x_n)$, it has a convergent subsequence with limit $x \in M$. That implies that the whole sequence $(x_n)$ converges to $x$.

(iii)⇒(ii) Suppose we are given a sequence $(x_n) \subset M$. Since $M$ is covered by finitely many balls of radius $1/2$, at least one of them contains infinitely many terms of the sequence $(x_n)$, that is, there is an infinite set $A_1 \subset \mathbb{N}$ such that $d(x_n, x_m) \le 1$ for $n, m \in A_1$. With the same argument, we can find an infinite set $A_2 \subset A_1$ such that $d(x_n, x_m) \le 1/2$ for $n, m \in A_2$. Proceeding in this way, we obtain infinite sets $A_1 \supset A_2 \supset \cdots \supset A_k \supset \cdots$ such that $d(x_n, x_m) \le 2^{1-k}$ whenever $n, m \in A_k$. Now we may take inductively $n_1 < n_2 < \cdots < n_k < \cdots$ such that $n_k \in A_k$. The construction shows that $(x_{n_k})$ is a Cauchy sequence, which converges by the completeness of $M$.

The characterization of compactness in $\mathbb{R}^n$ follows directly. For the time being, $\mathbb{R}^n$ is endowed with the Euclidean metric.

Corollary 1.4.4 (Heine-Borel). A subset of $\mathbb{R}^n$ is compact if and only if it is bounded and closed.

Proof. Since $\mathbb{R}^n$ is complete, the crux of the proof is to show that in $\mathbb{R}^n$ boundedness and total boundedness are the same, which essentially reduces to the Archimedean property of $\mathbb{R}$.

Let us put together two classic results.

Theorem 1.4.5 (Heine-Weierstrass). A continuous real function defined on a compact metric space is bounded, uniformly continuous and attains its maximum and its minimum.

Proof. Readers should be able to prove this by themselves.

The next result provides the so-called Lebesgue number of a cover.


Proposition 1.4.6 (Lebesgue). Let $(M, d)$ be a compact metric space and let $\{U_i\}_{i \in I}$ be an open cover of $M$. Then there is $\xi > 0$ such that for any $x \in M$ there is some $i \in I$ such that $B(x, \xi) \subset U_i$.

Proof. By compactness we may assume without loss of generality that the cover is finite. The functions $f_i(x) = d(x, M \setminus U_i)$ are continuous and $f_i(x) > 0$ if and only if $x \in U_i$. Therefore, the function $f(x) = \max\{f_i(x) : i \in I\}$ is continuous and strictly positive on $M$, because every $x$ belongs to some $U_i$. Let $\xi > 0$ be the infimum of $f$ on $M$, which is attained by compactness. A direct computation shows that $\xi$ has the desired property: given $x \in M$, take $i$ with $f_i(x) = f(x) \ge \xi$; then $B(x, \xi) \subset U_i$.
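A Lebesgue number can be estimated numerically from the function $f(x) = \max_i d(x, M \setminus U_i)$. The sketch below assumes $M = [0, 1]$ covered by two relatively open intervals, and it approximates the infimum of $f$ by a minimum over a grid, so the value it prints is an estimate rather than a proof. The cover and the helper name `dist_to_complement` are hypothetical.

```python
import math


def dist_to_complement(x, interval):
    """d(x, M \\ U) for M = [0, 1] and U = (a, b) intersected with M, x in M."""
    a, b = interval
    gaps = []
    if a >= 0.0:      # the piece [0, a] of the complement is nonempty
        gaps.append(x - a)
    if b <= 1.0:      # the piece [b, 1] of the complement is nonempty
        gaps.append(b - x)
    if not gaps:      # U is all of M, so the complement is empty
        return math.inf
    return max(0.0, min(gaps))


# Open cover of [0, 1] by U_1 = [0, 0.6) and U_2 = (0.4, 1].
cover = [(-0.1, 0.6), (0.4, 1.1)]
grid = [k / 1000 for k in range(1001)]

# f(x) = max_i d(x, M \ U_i), sampled on the grid.
f = [max(dist_to_complement(x, U) for U in cover) for x in grid]
xi = min(f)
print(xi)
```

For this cover the worst point is the middle of the overlap, $x = 0.5$, where both distances equal $0.1$, so every ball of radius $0.1$ fits inside one of the two sets.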

The product of compact spaces is compact in a wide topological context. Here we have the following result, which will be enough for some applications. Recall that the product topology is the coarsest topology for which the projections are continuous.

Proposition 1.4.7. A finite or countable product of compact metric spaces is metrizable and compact with the product topology.

Hint of proof. For the finite case it is enough to consider the product of two spaces $(M_1, d_1)$ and $(M_2, d_2)$. Endow $M_1 \times M_2$ with the metric

$$d((x_1, x_2), (y_1, y_2)) = d_1(x_1, y_1) + d_2(x_2, y_2).$$

With the topology associated to that metric the projections are continuous, and a diagonal argument shows that any sequence has a convergent subsequence. Since a compact topology cannot be strictly finer than a Hausdorff one, the coarser product topology coincides with the metric one. The infinite product case is done in a similar way, but with a metric defined by a series

$$d((x_n), (y_n)) = \sum_{n=1}^{\infty} 2^{-n} d_n(x_n, y_n)$$

where each $d_n$ is an equivalent metric on $M_n$ bounded by 1. Again, compactness of the metric topology can be shown by a diagonal argument.

1.5 Space of continuous functions


Let $K$ be a compact space. Take $C(K)$, the set of continuous real functions defined on $K$, and define $\|f\|_\infty = \sup\{|f(x)| : x \in K\}$, which is finite as the supremum is attained. Endow $C(K)$ with the metric $d(f, g) = \|f - g\|_\infty$, called the uniform metric.

Proposition 1.5.1. The space $(C(K), \|\cdot\|_\infty)$ is complete.

Proof. If $(f_n)$ is a Cauchy sequence in $C(K)$, then $(f_n(x))$ is a Cauchy, hence convergent, sequence in $\mathbb{R}$ for every $x \in K$, thus defining a real function by the formula $f(x) = \lim_n f_n(x)$. We claim that $f$ is continuous. Indeed, given $\varepsilon > 0$ take $N \in \mathbb{N}$ such that if $n, m \ge N$ then $\|f_n - f_m\|_\infty < \varepsilon/3$. Then

$$|f_n(x) - f(x)| = \lim_m |f_n(x) - f_m(x)| \le \varepsilon/3$$

for every $x \in K$ and every $n \ge N$. Fix $x \in K$ and a neighborhood $U \ni x$ such that $|f_N(y) - f_N(x)| < \varepsilon/3$ if $y \in U$. The triangle inequality gives that $|f(y) - f(x)| < \varepsilon$ for $y \in U$. Finally, the above inequality also gives that $\|f_n - f\|_\infty \le \varepsilon/3$ for any $n \ge N$, which implies the convergence of $(f_n)$ to $f$ in the uniform distance.

The relation between pointwise and uniform convergence is delicate. The following is a quite useful result.

Theorem 1.5.2 (Dini). Let $(f_n) \subset C(K)$ be a sequence of functions that converges pointwise to some $f \in C(K)$. If the sequence $(f_n)$ is monotone (increasing or decreasing), then $(f_n)$ converges uniformly to $f$.

Hint of proof. Assume that $(f_n)$ is decreasing, for instance. Then fix $\varepsilon > 0$ and prove that the sequence of sets

$$U_n = \{x \in K : f_n(x) - f(x) < \varepsilon\}$$

is an increasing open cover of $K$.
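Dini's theorem can be observed on a concrete monotone sequence. The sketch below assumes $K = [0, 1]$ and $f_n(x) = (1 + x/n)^n$, which increases pointwise to the continuous limit $e^x$; the uniform distance is approximated by a maximum over a grid, so the printed values are estimates of $\|f_n - f\|_\infty$. The names `exp_approx` and `sup_gap` are ours.

```python
import math

# Grid on K = [0, 1] used to approximate the supremum norm.
grid = [k / 1000 for k in range(1001)]


def exp_approx(x, n):
    """f_n(x) = (1 + x/n)^n, increasing in n for x >= 0, with limit exp(x)."""
    return (1 + x / n) ** n


def sup_gap(n):
    """Grid approximation of the uniform distance between f_n and exp."""
    return max(abs(exp_approx(x, n) - math.exp(x)) for x in grid)


gaps = [sup_gap(n) for n in (1, 10, 100, 1000)]
print(gaps)
```

The approximate uniform distances decrease towards zero, as the theorem predicts. Continuity of the limit is essential: for $f_n(x) = x^n$ on $[0, 1]$ the convergence is monotone but the limit is discontinuous, and the convergence is not uniform.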

Finally, we will prove this useful characterization of compactness for sets of continuous functions.

Theorem 1.5.3 (Arzelà-Ascoli). A subset $A \subset C(K)$ is compact if and only if it is closed, bounded and equicontinuous.

Proof. Obviously, compactness of $A$ implies that it is closed and bounded. To see that $A$ is equicontinuous we will use its total boundedness. Given $\varepsilon > 0$ take an $\varepsilon/3$-net $\{f_k\}_{k=1}^{n} \subset A$ of $A$. Fix $x \in K$; we may find an open set $U \ni x$ such that $\max_k |f_k(y) - f_k(x)| < \varepsilon/3$ whenever $y \in U$. The triangle inequality implies that $|f(y) - f(x)| < \varepsilon$ for any $f \in A$ and $y \in U$.

Conversely, if $A$ is equicontinuous, given $\varepsilon > 0$, for every $x \in K$ there is an open set $U_x \ni x$ such that $|f(y) - f(x)| < \varepsilon/3$ whenever $f \in A$ and $y \in U_x$. Note that $\{U_x\}_{x \in K}$ is an open cover of $K$, so there are points $\{x_k\}_{k=1}^{n}$ such that $K = \bigcup_{k=1}^{n} U_{x_k}$. Since $A$ is bounded, the set

$$\{(f(x_1), f(x_2), \dots, f(x_n)) : f \in A\}$$

is bounded in $\mathbb{R}^n$ and therefore it has a finite $\varepsilon/3$-net (with the maximum distance) that we will denote $(\lambda_i)_{i=1}^{N}$, where $\lambda_i = (\lambda_i(1), \dots, \lambda_i(n))$. Take $f_i \in A$ such that $f_i(x_k) = \lambda_i(k)$ for every $i \in \{1, \dots, N\}$ and $k \in \{1, \dots, n\}$. We claim that $\{f_i\}_{i=1}^{N}$ is an $\varepsilon$-net for $A$. Indeed, given $f \in A$, there is $i$ such that $|f(x_k) - f_i(x_k)| = |f(x_k) - \lambda_i(k)| < \varepsilon/3$ for every $k \in \{1, \dots, n\}$. For any $x \in K$, there is some $k$ such that $x \in U_{x_k}$. Therefore $|f(x) - f(x_k)| < \varepsilon/3$ and $|f_i(x) - f_i(x_k)| < \varepsilon/3$. The triangle inequality gives that $|f(x) - f_i(x)| < \varepsilon$ for arbitrary $x \in K$, that is, the $\varepsilon$-net property. Hence $A$ is totally bounded and, being closed in the complete space $C(K)$, it is compact by Proposition 1.4.3.

1.6 Fractals
Let (M, d) be a complete metric space and denote by K(M ) the set of nonempty
compact subsets of M . For A ∈ K(M ) and r > 0 define a closed “neighbour-
hood” as
D[A, r] = {x ∈ M : d(A, x) ≤ r}.
And now a distance between elements from K(M ) by

d(A, B) = inf{r > 0 : A ⊂ D[B, r], B ⊂ D[A, r]}

where A, B ∈ K(M ). There is no problem in using d for the distance on
K(M ) since it extends the metric of M considering the points as (compact)
singletons. It is easy to verify that d is a metric on K(M ). Indeed, the
least evident fact is the triangle inequality. If A, B, C ∈ K(M ) and ε > 0 we
may find r < d(A, B) + ε and s < d(B, C) + ε such that

A ⊂ D[B, r] and B ⊂ D[C, s].

That implies A ⊂ D[C, r + s]. The reverse containment is obtained likewise,
thus
d(A, C) ≤ r + s ≤ d(A, B) + d(B, C) + 2ε
which proves the claim as ε was arbitrary. The metric d is known as the
Hausdorff metric.
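
The definition of d(A, B) can be tried out on finite subsets of the plane, which are compact. The following is a minimal sketch, assuming the Euclidean distance on R2; the function names and the sample sets are hypothetical and chosen only for illustration. For finite sets the infimum in the definition is attained and equals the larger of the two one-sided quantities computed below.

```python
import math

def dist_to_set(A, x):
    """d(A, x): smallest distance from the point x to a point of the finite set A."""
    return min(math.dist(a, x) for a in A)

def hausdorff(A, B):
    """Hausdorff distance between two finite nonempty point sets.

    The smallest r with A inside D[B, r] is the maximum over a in A of d(B, a);
    symmetrically for B inside D[A, r]. The distance is the larger of both.
    """
    r_A_in_B = max(dist_to_set(B, a) for a in A)
    r_B_in_A = max(dist_to_set(A, b) for b in B)
    return max(r_A_in_B, r_B_in_A)

# Hypothetical sample sets, used only as an illustration.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
C = [(5.0, 0.0)]

print(hausdorff(A, B))                         # A is contained in B, yet d(A, B) > 0
print(hausdorff([(0.0, 0.0)], [(3.0, 4.0)]))   # singletons: the metric of the plane
```

Note that both inclusions in the definition are needed: in the example A ⊂ B, so one of the two one-sided quantities vanishes while the other does not.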

Our objective is the following,

Theorem 1.6.1. If M is a complete metric space, then (K(M ), d) is complete.
Proof. Consider a Cauchy sequence (An ) ⊂ K(M ). We claim that for any
choice xn ∈ An , the sequence (xn ) has a cluster point. Indeed, fix ε > 0 and let
nε be such that if n ≥ nε then An ⊂ D[Anε , ε], and thus (xn ) ⊂ D[Anε , ε] except
for finitely many terms. It is clear that D[Anε , ε] can be covered by finitely many
balls of radius (3/2)ε, and infinitely many xn ’s are inside one of those balls.
Therefore d(xnk , xnj ) ≤ 3ε for some subsequence (xnk ). This selection process
applied for ε = 1/m and a further diagonal argument will produce a Cauchy
subsequence of (xn ), which converges because M is complete.
Let A ⊂ M be the set of all the cluster points of sequences obtained as before.
Note that if x ∈ A, we can take xn ∈ An such that (xn ) converges to x. Note
as well that A has to be closed since any cluster point of A can be reached by
a suitable diagonal choice. Now, for ε > 0 note that A ⊂ D[An , ε] for n large
enough. That implies that A is totally bounded, and therefore A ∈ K(M ).
It also implies that A is a “half limit” of (An ). In order to complete the proof, we
have to prove that An ⊂ D[A, ε] for n large. If that is not the case, we can take
xn ∈ An such that d(A, xn ) ≥ ε for infinitely many n’s. That would produce a
cluster point x such that d(A, x) ≥ ε. On the other hand, by definition of A
we have x ∈ A. The contradiction proves the theorem.

Observe that if f : M → M is contractive, then the induced map f :
K(M ) → K(M ) given by A ↦ f (A) is also contractive. Indeed, if λ ∈ (0, 1)
is the contraction ratio, then
f (D[A, r]) ⊂ D[f (A), λr]
which implies d(f (A), f (B)) ≤ λd(A, B), that is, f is contractive in K(M )
too. Since a contractive map can have only one fixed point, that one has to be
the singleton fixed by f in M . Nevertheless, on K(M ) we can perform other
operations with maps.
Proposition 1.6.2 (Iterated function system). Let f1 , . . . , fn be contractive
maps on M , and define f : K(M ) → K(M ) by
f (A) = f1 (A) ∪ · · · ∪ fn (A).
Then f is contractive on K(M ), and therefore f has a fixed point.
Proof. Indeed, if λ is the maximum of the contraction ratios we still have the
containment
\[ f(D[A, r]) = \bigcup_{k=1}^{n} f_k(D[A, r]) \subset \bigcup_{k=1}^{n} D[f_k(A), \lambda r] = D[f(A), \lambda r] \]
which implies the contractivity of f .

Taking f1 , . . . , fn affine and contractive on R2 we can obtain several typical
self-similar fractals. Look up the Barnsley fern algorithm, for instance.
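
The iteration A, f (A), f (f (A)), . . . converging to the fixed point can be sketched on finite sets. The example below is only an illustration under stated assumptions: instead of the Barnsley fern it uses three hypothetical contractions of ratio 1/2 on R2, whose fixed point in K(R2) is the Sierpinski triangle with vertices (0, 0), (1, 0) and (0, 1). The names `make_affine`, `MAPS`, `hutchinson` and `iterate` are made up for this sketch.

```python
def make_affine(ratio, shift):
    """Return the contraction p -> ratio * p + shift on R^2."""
    def f(p):
        return (ratio * p[0] + shift[0], ratio * p[1] + shift[1])
    return f

# Three contractions of ratio 1/2 (an assumed example, not the Barnsley fern).
MAPS = [
    make_affine(0.5, (0.0, 0.0)),
    make_affine(0.5, (0.5, 0.0)),
    make_affine(0.5, (0.0, 0.5)),
]

def hutchinson(A):
    """One application of f(A) = f1(A) ∪ f2(A) ∪ f3(A) to a finite set A."""
    return {f(p) for p in A for f in MAPS}

def iterate(A, steps):
    """Apply the set map `steps` times, starting from the finite set A."""
    for _ in range(steps):
        A = hutchinson(A)
    return A

# Start from the singleton {(0, 0)}; each step triples the number of points.
approx = iterate({(0.0, 0.0)}, 6)
print(len(approx))
```

Since f is contractive on K(M ), any nonempty compact starting set leads to the same limit; starting from a fixed point of one of the maps, as here, has the extra feature that the successive sets increase.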

1.7 Rationale and remarks


Most likely the students already know some metric topology, so it is not
necessary to dwell on bizarre examples. The idea is to recall the role and importance
of completeness and compactness. A curious application of Baire’s theorem,
the construction of fractals or the characterization of compactness on C[a, b]
can help the students to take topology more seriously.

1.8 Exercises
1. Prove that closed balls are closed sets, and open balls are open sets.
2. Prove that uniformly continuous maps between metric spaces preserve
Cauchy sequences.
3. The distance d(A, x) from a point x to a set A ⊂ M in a metric space M
is defined by d(A, x) = inf{d(y, x) : y ∈ A}. Prove the following
statements, where Ā denotes the closure of A:
(a) |d(A, x) − d(A, y)| ≤ d(x, y),
(b) Ā = {x ∈ M : d(A, x) = 0},
(c) d(A, x) ≤ d(B, x) for every x ∈ M if and only if B ⊂ Ā.
4. Prove with the help of Dini’s theorem the uniform convergence on
bounded subsets of the sequence
\[ f_n(x) = \left(1 + \frac{x}{n}\right)^{n}. \]

5. Define inductively a sequence of functions on [0, 1] by f_1(x) = 0 and
\[ f_{n+1}(x) = f_n(x) + \tfrac{1}{2}\left(x - f_n(x)^2\right). \]
Prove that (f_n(x)) converges uniformly to √x. Deduce as a consequence that the
function |x| can be uniformly approximated by polynomials on bounded intervals of R.

6. Prove that separability of a metric space is hereditary to subsets.

7. A set in a metric space is said to be perfect if it has no isolated point.
Given a set A ⊂ M , a point x ∈ A is said to be a condensation point if each of its
neighbourhoods meets A in an uncountable set. Assume that the metric
space M is separable. Show that for every A ⊂ M , the subset of the
non-condensation points of A is countable, and the subset of condensation
points is perfect. Deduce that M can be expressed as the union of a
perfect set and a countable set.
8. Let M, N be complete metric spaces, D ⊂ M a dense subset and f :
D → N a uniformly continuous function. Prove that f can be extended
to a unique uniformly continuous function f̃ : M → N . Show that if f
is only continuous, then the extension can be done to a set B such
that D ⊂ B ⊂ M and $B = \bigcap_{n=1}^{\infty} U_n$, where (Un ) is a sequence of open subsets of
M.
9. Let f ∈ C∞(R) be a function such that for every x ∈ R there is n ∈ N
such that f^{(n)}(x) = 0. Prove that f is a polynomial.
10. Prove the following abstract version of the Cantor diagonal method.
Let (An ) be a decreasing sequence of infinite subsets of N. Then there is
an infinite subset A ⊂ N such that A \ An is finite for all n ∈ N.
11. Prove that if M is a compact metric space, then K(M ) is compact too.

Chapter 2

Normed Spaces

2.1 Norms
A basic notion in Analysis is the notion of normed space, which is just a vector
space together with a norm. Let X be a vector space (over either R or C, say K). A
function ∥ · ∥ : X → [0, +∞) is called a norm if:
1. ∥x + y∥ ≤ ∥x∥ + ∥y∥ for all x, y ∈ X;
2. ∥λx∥ = |λ|∥x∥ for all x ∈ X, λ ∈ K;
3. ∥x∥ = 0 if and only if x = 0.
A norm induces a distance on X by means of d(x, y) = ∥x − y∥, and
that provides a topological structure, as a metric space. There are several
weakenings of the notion of norm which are also interesting, see the section
“Complements”. Sometimes we use (X, ∥ · ∥) to denote a normed space; however,
that is not necessary when the norm we are dealing with is understood.

From now on we will focus on real normed spaces. The notation for open
and closed balls will be the same as in metric spaces; however, we will
distinguish the unit ball

BX := B[0, 1] = {x ∈ X : ∥x∥ ≤ 1}.

All the closed balls in X can be obtained by translation and scaling of BX .


Note that the unit sphere

SX := {x ∈ X : ∥x∥ = 1}

is the topological boundary of BX and so the interior of BX is exactly B(0, 1).
That is not generally true in metric spaces.

Two norms ∥ · ∥1 and ∥ · ∥2 on the same vector space are said to be equivalent if
they generate the same topology. A consequence of the similarity of balls
is the following characterization of the equivalence of norms.
Proposition 2.1.1. Let X be a vector space and let ∥ · ∥1 and ∥ · ∥2 be two
norms on X. Then the norms are equivalent if and only if there are constants
α, β > 0 such that
α∥x∥1 ≤ ∥x∥2 ≤ β∥x∥1
for all x ∈ X.
The notion of completeness has several nuances. Firstly, a consequence of
the former Proposition.
Corollary 2.1.2. The completeness (or its absence) of a normed space is
invariant among the equivalent norms.

Now we will consider series in normed spaces. As in the real case, a series
$\sum_{n=1}^{\infty} x_n$ is just a symbolic expression. The series is said to be convergent
if the partial sums $s_n = \sum_{k=1}^{n} x_k$ converge to some element in X called the
sum of the series. We say that a series is unconditionally convergent if any
rearrangement of its terms is convergent with the same sum. Finally, a series
$\sum_{n=1}^{\infty} x_n$ is said to be absolutely convergent if $\sum_{n=1}^{\infty} \|x_n\| < +\infty$. Despite the
name, absolute convergence does not always imply convergence.

Proposition 2.1.3. A normed space (X, ∥ · ∥) is complete if and only if every
absolutely convergent series is convergent. In such a case, the series will also
be unconditionally convergent.

2.2 Finite-dimensional normed spaces


A vector space X of finite dimension n is algebraically isomorphic to Rn (or
Cn in the complex case, that we will not consider here). The isomorphism is
determined by fixing a basis {e1 , . . . , en } and after that we may consider as
defined on X any of the functions on Rn , in particular any of the standard
norms. A key fact that we will use is that the subsets of Rn which are closed
and bounded with respect to the Euclidean norm are compact.

Theorem 2.2.1. All the norms on a finite dimensional space X are equivalent.
Proof. We will denote by ∥ · ∥2 the Euclidean norm on X given by the
isomorphism associated to a basis {e1 , . . . , en }. Let ∥ · ∥ be an arbitrary norm
on X. We will show that ∥ · ∥ is continuous as a function on (X, ∥ · ∥2 ). Indeed,
for x = λ1 e1 + · · · + λn en note that
\[ \|x\| = \|\lambda_1 e_1 + \cdots + \lambda_n e_n\| \le |\lambda_1| \|e_1\| + \cdots + |\lambda_n| \|e_n\| \]
\[ \le \left(|\lambda_1|^2 + \cdots + |\lambda_n|^2\right)^{1/2} \left(\|e_1\|^2 + \cdots + \|e_n\|^2\right)^{1/2} = c \|x\|_2 \]
by the Cauchy-Schwarz inequality and taking $c = (\|e_1\|^2 + \cdots + \|e_n\|^2)^{1/2}$.
Now, we have
|∥x∥ − ∥y∥| ≤ ∥x − y∥ ≤ c∥x − y∥2
which means that ∥ · ∥ is Lipschitz (with constant c) with respect to ∥ · ∥2 ,
and thus continuous as wanted. Let α and β be the minimum and maximum
respectively of ∥ · ∥ on the set S = {x ∈ X : ∥x∥2 = 1}, which exist because S is
compact. We have α > 0 since ∥ · ∥ is a norm and S does not contain 0.
If x ∈ X \ {0} then x/∥x∥2 ∈ S and thus
\[ \alpha \le \left\| \frac{x}{\|x\|_2} \right\| \le \beta \]
and so
α∥x∥2 ≤ ∥x∥ ≤ β∥x∥2
which is the desired equivalence.

Once we know that any norm on Rn is equivalent to the Euclidean norm
∥ · ∥2 , we have freedom to work with alternative norms that do not involve
square roots, such as
∥(x1 , . . . , xn )∥1 := |x1 | + · · · + |xn |,
∥(x1 , . . . , xn )∥∞ := max{|x1 |, . . . , |xn |}.
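
The equivalence of these norms can be checked numerically. The sketch below is an illustration only: it relies on the standard chain ∥x∥∞ ≤ ∥x∥2 ≤ ∥x∥1 ≤ n∥x∥∞, which is a well-known fact not stated in the text, and the function names and sample vectors are hypothetical.

```python
import math

def norm_1(x):
    """Sum of the absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    """Maximum of the absolute values of the coordinates."""
    return max(abs(t) for t in x)

def chain_holds(x, tol=1e-12):
    """Check norm_inf <= norm_2 <= norm_1 <= n * norm_inf up to rounding."""
    n = len(x)
    return (norm_inf(x) <= norm_2(x) + tol
            and norm_2(x) <= norm_1(x) + tol
            and norm_1(x) <= n * norm_inf(x) + tol)

# Hypothetical sample vectors, chosen only for illustration.
samples = [(3.0, -4.0), (1.0, 1.0, 1.0), (0.0, -2.5, 0.5, 7.0)]
for x in samples:
    print(x, norm_1(x), norm_2(x), norm_inf(x), chain_holds(x))
```

The constants in the chain play the role of α and β in the proof of Theorem 2.2.1 for these particular pairs of norms.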
Corollary 2.2.2. The finite dimensional subspaces of a normed space are closed.
Proof. The restriction of the norm to a finite dimensional subspace is
equivalent to the Euclidean norm and so the subspace is complete. As a complete
subset it is closed in the ambient space.
Corollary 2.2.3. Let X be a normed space and let Y ⊂ X be a finite dimen-
sional subspace. Then, for every x ∈ X there is y ∈ Y such that
∥x − y∥ = d(x, Y ) := inf{∥x − z∥ : z ∈ Y }.

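Proof. Note that f (z) = ∥z − x∥ is a continuous function on Y whose infimum
can be computed on a closed bounded subset of Y , for instance {z ∈ Y : ∥z∥ ≤ 2∥x∥},
which is compact because Y is finite dimensional. Hence the infimum is attained.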

Now we will prove the existence of “almost orthogonal” elements in normed
spaces.
Proposition 2.2.4. Let X be a normed space, Y ⊂ X a proper closed subspace
and ε ∈ (0, 1). Then, there exists x ∈ X \ Y with ∥x∥ = 1 and d(x, Y ) > 1 − ε.
In case that Y is of finite dimension, then x can be taken such that d(x, Y ) = 1.
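Proof. Take x0 ∈ X \ Y . Then d(Y, x0 ) = d > 0 because Y is closed, and we may
take y0 ∈ Y such that
\[ d \le \|x_0 - y_0\| < \frac{d}{1 - \varepsilon}, \]
and put
\[ x := \frac{x_0 - y_0}{\|x_0 - y_0\|}. \]
Note that x ∈ SX . Now we will estimate d(Y, x). If y ∈ Y , note that
\[ \|x - y\| = \left\| \frac{x_0 - y_0}{\|x_0 - y_0\|} - y \right\| = \frac{1}{\|x_0 - y_0\|} \bigl\| x_0 - y_0 - \|x_0 - y_0\|\, y \bigr\| \ge \frac{d}{\|x_0 - y_0\|} > 1 - \varepsilon, \]
because y0 + ∥x0 − y0 ∥y ∈ Y . Since the lower bound d/∥x0 − y0 ∥ does not depend on y,
we get d(Y, x) > 1 − ε. In case Y is finite-dimensional we could have
taken y0 such that ∥x0 − y0 ∥ = d (Corollary 2.2.3), which would have led to
d(Y, x) ≥ 1, and hence to an equality because d(Y, x) ≤ ∥x∥ = 1.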

Corollary 2.2.5. If X is an infinite-dimensional normed space, then there
exists an infinite sequence (xn ) ⊂ BX such that ∥xn − xm ∥ ≥ 1 for n, m ∈ N
with n ≠ m.
Proof. The construction is inductive: take any x1 ∈ SX . Assume x1 , . . . , xn ∈
SX already chosen. Let Y be the finite dimensional subspace spanned by those
elements. Clearly, Y ≠ X, thus we can find xn+1 ∈ SX such that d(Y, xn+1 ) = 1
by the proposition. In particular ∥xk − xn+1 ∥ ≥ 1 for 1 ≤ k ≤ n.

This result is the culmination of the section.


Theorem 2.2.6. A normed space X has finite dimension if and only if its
unit ball BX is compact.

Proof. If X has finite dimension n, then it is isomorphic to Rn , and therefore
the unit ball, as a closed bounded set, is compact. On the other hand, if X
has infinite dimension, then by Corollary 2.2.5 BX contains a sequence with no
convergent subsequence. In such a case, BX cannot be compact.

Finite dimensional spaces are also characterized by the fact that
unconditionally convergent series are absolutely convergent. One implication is clear,
the other one is the celebrated Dvoretzky-Rogers theorem.

2.3 Linear operators


Linear continuous maps between normed spaces, also called operators are es-
sential for the development of the theory. Firstly note the following.

Proposition 2.3.1. Let X, Y be normed spaces (on the same field) and T :
X → Y be linear. Then the following are equivalent:

1. T is continuous;
2. T is continuous at 0;
3. there is c > 0 such that ∥T (x)∥ ≤ c∥x∥ for every x ∈ X;

4. T (BX ) is a bounded set in Y .


Proof. Note that continuity at one point for a linear function equals global
continuity, 1⇔2. Also 3⇒4 and, since 3 makes T Lipschitz with constant c, 3⇒1.
The main trick is to use the homogeneity to show
that the continuity at 0 implies the boundedness of T (BX ), 2⇒4. If c = sup{∥y∥ :
y ∈ T (BX )}, again by homogeneity, we can deduce that ∥T (x)∥ ≤ c ∥x∥, 4⇒3.

A similar statement can be proved for bilinear or multilinear maps. The set
of continuous operators from X to Y is denoted L(X, Y ). Note that L(X, Y )
becomes a normed space with the norm

∥T ∥ = sup{∥T (x)∥ : x ∈ BX }

This norm inherits the completeness from Y , that is L(X, Y ) is a Banach space
if and only if Y is.
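
In finite dimension the operator norm can be computed by hand for some choices of norms. The sketch below is an illustration under assumptions: it takes X = Rn and Y = Rm both with ∥ · ∥∞ , represents T by a matrix (a list of rows), and uses the standard fact, not stated in the text, that in this case ∥T ∥ is the largest row sum of absolute values. The brute-force check evaluates T on the vertices of BX , where the supremum of a convex function over the cube is attained. All names are hypothetical.

```python
from itertools import product

def apply(T, x):
    """Matrix-vector product, with T given as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def norm_inf(x):
    """Maximum norm on R^n."""
    return max(abs(t) for t in x)

def op_norm_inf(T):
    """Operator norm for the maximum norms: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in T)

def op_norm_by_vertices(T):
    """sup of ||T(x)|| over the unit ball, evaluated at the vertices (±1, ..., ±1)."""
    n = len(T[0])
    return max(norm_inf(apply(T, v)) for v in product((-1.0, 1.0), repeat=n))

# Hypothetical 2 x 2 example.
T = [[1.0, -2.0],
     [3.0, 0.5]]
print(op_norm_inf(T), op_norm_by_vertices(T))
```

For other norms on the domain or the range there is in general no such simple closed formula, which is one reason why the abstract definition as a supremum is the convenient one.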

The norm, by its very definition has the following remarkable property: If
T ∈ L(X, Y ) and S ∈ L(Y, Z), then S ◦ T ∈ L(X, Z) and ∥S ◦ T ∥ ≤ ∥S∥∥T ∥.

Let us stress two particularly interesting cases of spaces L(X, Y ). If Y = K,
that is, the scalar field, then we set X ∗ := L(X, K). The space X ∗ , which is
always complete, is called the dual space of X. Duality theory studies how
properties of a space induce properties of its dual, and vice versa. This is
sometimes very useful in linear optimization problems. Note that for any x ∈ X
and x∗ ∈ X ∗ we always have |x∗ (x)| ≤ ∥x∗ ∥∥x∥.

The other case is when Y = X, where we prefer the notation L(X) :=
L(X, X). This space has the structure of an algebra (which will be important
for spectral operator theory) since any two operators T, S ∈ L(X) can always
be composed, that is, ST, T S ∈ L(X), and moreover ∥ST ∥ ≤ ∥S∥∥T ∥. Here
we may consider the invertible operators: T ∈ L(X) is said to be invertible if there
exists S ∈ L(X) such that T S = I and ST = I, where I is the identity operator
on X. In such a case, S is unique and we denote T −1 := S. The invertible
operators are also called isomorphisms when we want to express that they
preserve the linear and topological structure of X. An operator T is called an
isometry if it satisfies ∥T (x)∥ = ∥x∥ for all x ∈ X. It is not difficult to see
that the surjective isometries are isomorphisms.

Finally, a very important observation in finite-dimensional spaces.


Proposition 2.3.2. Every linear or multilinear map defined on a finite
dimensional normed space is continuous.
Proof. Using a basis it is possible to write the map in terms of coordinates
and so an explicit bound on the unit ball can be proved easily for the norms
∥ · ∥1 or ∥ · ∥∞ . Since all the norms are equivalent on the domain space, we
deduce the statement.

2.4 Spaces of functions


The uniform convergence of sequences and series of functions can be under-
stood in the frame of normed spaces. Let us denote by ℓ∞ (M ) the set of all
bounded real functions defined on a set M . For f ∈ ℓ∞ (M ) we denote

∥f ∥∞ = sup{|f (x)| : x ∈ M }.

That norm makes ℓ∞ (M ) a complete normed space and the induced topology
is usually referred to as the topology of uniform convergence. When M has an
additional structure, such as being a metric space, we can study the properties that
are preserved by uniform limits. We already know that this is the case for
continuity. Something more general can be said.
Proposition 2.4.1. Assume that (fn ) ⊂ ℓ∞ (M ) and (xm ) ⊂ M are such that:
1. the limit of (fn ) exists uniformly;
2. limm fn (xm ) exists for every n ∈ N.
Then the following iterated limits exist and satisfy the equality
\[ \lim_{n} \Bigl( \lim_{m} f_n(x_m) \Bigr) = \lim_{m} \Bigl( \lim_{n} f_n(x_m) \Bigr). \]
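
To see why the uniformity in hypothesis 1 cannot simply be dropped, one may look at a standard worked example, which is not taken from the text: assume M = [0, 1), f_n(x) = x^n and x_m = 1 − 1/m. Each f_n is bounded and (f_n) converges pointwise to 0 on M, but not uniformly, and the two iterated limits differ.

```latex
% Assumed illustration: M = [0,1), f_n(x) = x^n, x_m = 1 - 1/m.
% Pointwise limit is 0, but \sup_{x \in [0,1)} x^n = 1 for every n,
% so the convergence is not uniform.
\[
\lim_{n}\Bigl(\lim_{m} f_n(x_m)\Bigr)
  = \lim_{n}\Bigl(\lim_{m} \bigl(1 - \tfrac{1}{m}\bigr)^{n}\Bigr)
  = \lim_{n} 1 = 1,
\qquad
\lim_{m}\Bigl(\lim_{n} f_n(x_m)\Bigr)
  = \lim_{m}\Bigl(\lim_{n} \bigl(1 - \tfrac{1}{m}\bigr)^{n}\Bigr)
  = \lim_{m} 0 = 0 .
\]
```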

A great deal of Analysis is devoted to commutation of limits, understanding
by “limit” the operations in Analysis that rely on the notion of limit, such as series,
derivatives and integrals. The commutation of limits of sequences and series
of functions with the integral will be treated in the corresponding chapter. As
to the commutation of derivatives and limits of sequences, it is easy to give
examples where it fails. However, the situation is more dramatic: the only
linear spaces of C 1 functions where we can expect a good behaviour of limits
are finite dimensional.
Theorem 2.4.2. Let K be a compact metric space and X ⊂ C(K) a (closed)
subspace made up of Lipschitz functions. Then X is finite-dimensional.
Hint of proof. Use Baire’s theorem to show that the Lipschitz constant
is bounded on BX . Now, BX is closed, bounded and equicontinuous, so by
Arzelà-Ascoli BX is compact. That implies that X has finite dimension.

Despite the fact that a uniform limit of Lipschitz functions need not be
Lipschitz, and that uniform convergence does not control the Lipschitz constants
(for instance, the sequence $f_n(x) = n^{-1} \sin(n^2 x)$ on [0, π] converges uniformly
to 0 while its Lipschitz constants are unbounded), it is
possible to endow the set of Lipschitz functions with a norm that makes it
complete. Indeed, denote by L(M ) the set of Lipschitz functions defined on
the metric space M and fix a point x0 ∈ M . Then the number
\[ \|f\| = |f(x_0)| + \sup\left\{ \frac{|f(x) - f(y)|}{d(x, y)} : x, y \in M,\ x \ne y \right\} \]
defines a norm on L(M ) that makes it complete. We leave the proof to the
reader. A variation for differentiable functions is asked among the exercises of
the chapter.
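
For the sine sequence mentioned above, the two norms can be compared by a short computation. This is a sketch under the assumption that the base point is x0 = 0; it uses the fact that for a continuously differentiable function on an interval the best Lipschitz constant is the supremum of the absolute value of the derivative.

```latex
% Worked check for f_n(x) = n^{-1} \sin(n^2 x) on [0, \pi], with x_0 = 0 (assumed).
\[
\|f_n\|_\infty = \sup_{x \in [0,\pi]} \frac{|\sin(n^2 x)|}{n} \le \frac{1}{n} \longrightarrow 0,
\qquad
f_n'(x) = n \cos(n^2 x),
\qquad
\sup_{x \ne y} \frac{|f_n(x) - f_n(y)|}{|x - y|} = \sup_{x \in [0,\pi]} |f_n'(x)| = n .
\]
\[
\|f_n\| = |f_n(0)| + n = n \longrightarrow \infty .
\]
```

So the sequence tends to 0 uniformly but is unbounded in the norm of L([0, π]), which shows that this norm is strictly finer than the uniform one on Lipschitz functions.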

2.5 Complements
In this section we include, without proof, several results that traditionally are
reserved as topics for Functional Analysis, although some proofs are accessible at
this level.

A function p : X → [0, +∞) is said to be sublinear if it satisfies the
properties:
(a) p(tx) = tp(x) for t ≥ 0 and any x ∈ X (positive homogeneity);
(b) p(x + y) ≤ p(x) + p(y) for all x, y ∈ X (triangle inequality).
If p satisfies the stronger property
(A) p(tx) = |t|p(x) for t ∈ K and any x ∈ X (homogeneity)
then p is said to be a seminorm. Of course, p will be a norm provided that
p(x) = 0 if and only if x = 0. Note that:
1. if p is sublinear then x ↦ max{p(x), p(−x)} is a seminorm,
2. if p is a seminorm, then |p(x) − p(y)| ≤ p(x − y).
A Hausdorff topology can be induced also by a family of seminorms (pi )i∈I
if for every x ∈ X \ {0} there is some i ∈ I such that pi (x) > 0. In this case,
a basis of neighbourhoods of a point x can be defined by

{y ∈ X : pi1 (y − x) < ε, . . . , pin (y − x) < ε}

for i1 , . . . , in ∈ I and ε > 0. It is possible to prove that if a sequence of
seminorms (pn ) induces a Hausdorff topology then it can be metrized by the
translation invariant metric
\[ d(x, y) = \sum_{n=1}^{\infty} 2^{-n} \min\{1, p_n(x - y)\}. \]

The topologies defined by a family of seminorms appear quite often and they
will be considered in other chapters, for instance, uniform convergence on
compact subsets of Rn .

The Hahn-Banach theorem guarantees the existence of extensions of linear
forms under very general conditions.

Theorem 2.5.1 (Hahn-Banach). Let X be a real vector space, Y ⊂ X a
subspace, p a sublinear functional defined on X and f a linear
form defined on Y such that f (x) ≤ p(x) for every x ∈ Y . Then there is a
linear form f̃ defined on X such that f̃|Y = f and f̃(x) ≤ p(x) for all x ∈ X.
The formulation with a sublinear functional is the key to prove separation
results for convex sets, which is a matter we are not concerned with here.
Nevertheless, if p is of the form c∥ · ∥, we obtain the following consequence.
Theorem 2.5.2 (Hahn-Banach). Let X be a normed space and x ∈ X. There
exists x∗ ∈ X ∗ with ∥x∗ ∥ = 1 such that x∗ (x) = ∥x∥.
Note that the result informally says that the dual X ∗ is, at least, as large
as X, which was quite evident for a finite dimensional X. A straightforward
application of that result says that X embeds isometrically into X ∗∗ := (X ∗ )∗ .
The closure of X as a subset of X ∗∗ provides a model for the completion of X.
Corollary 2.5.3. Every normed space can be isometrically embedded into a
complete normed space as a dense subset.
A completion can be built “more directly” as a quotient of the space of
Cauchy sequences.

In the finite dimensional case, the following result of Auerbach is very
interesting, and far from being obvious.
Theorem 2.5.4 (Auerbach). Let X be a normed space of dimension n. There
exist bases {x1 , . . . , xn } of X and {x∗1 , . . . , x∗n } of X ∗ such that ∥xk ∥ = ∥x∗k ∥ = 1
for 1 ≤ k ≤ n and x∗j (xk ) = δjk for 1 ≤ j, k ≤ n.
For X a Banach space, the property enjoyed by L(X) and C(K), of being
both a Banach space and an algebra (there is a product compatible with the sum)
with the additional property ∥xy∥ ≤ ∥x∥∥y∥ leads to the notion of Banach
algebras, whose theory is richer than the one of normed spaces, especially when
the algebra is considered over the complex field.

2.6 Rationale and remarks


Normed spaces are the frame for the differential calculus in the next chapter.
Once the metric topology is clear, it is necessary to point out that the normed
spaces have a richer theory. In particular, the properties of finite-dimensional
normed spaces (or subspaces) should be emphasized.

Some aspects of the convergence of sequences and series of functions are bet-
ter understood in the frame of normed spaces because the uniform convergence
is a metric one. However, the most important cases, power and trigonometric
series, have a particular treatment in other subjects along the degree studies.

2.7 Exercises
1. Prove that the notions of boundedness, Cauchy sequence and complete-
ness are invariant by equivalence of norms. Show with an example that
the same does not hold in general metric spaces.

2. For points x, y ∈ X in a normed space, the segment joining them is the
set
[x, y] = {λ x + (1 − λ) y : 0 ≤ λ ≤ 1}.
A set A ⊂ X is said to be convex if [x, y] ⊂ A for every x, y ∈ A. Prove
that balls are convex sets and that a closed set A ⊂ X is convex if and only
if $\frac{1}{2}(x + y)$ ∈ A for every x, y ∈ A.
3. Recall that a metric space M is said to be connected if the only subsets
that are both closed and open are M and ∅. Prove that any two points
in a connected open set of a normed space can be joined by a polygonal
line, that is, a continuous line made up of segments.

4. Find the optimal constants a, b > 0 for the equivalence of norms in Rn

a∥ · ∥2 ≤ ∥ · ∥1 ≤ b∥ · ∥2

5. Define on Rn the norm ∥ · ∥p for p ≥ 1 by the formula
\[ \|(x_1, \ldots, x_n)\|_p = \sqrt[p]{|x_1|^p + \cdots + |x_n|^p}. \]
Prove that for x ∈ Rn and p ≤ q we have ∥x∥p ≥ ∥x∥q and
\[ \lim_{p \to \infty} \|x\|_p = \|x\|_\infty. \]

6. Define on C[a, b] the norm ∥ · ∥p for p ≥ 1 by the formula
\[ \|f\|_p = \sqrt[p]{\int_a^b |f(t)|^p \, dt}. \]
Prove that ∥ · ∥p is actually a norm for p = 1, 2 (the other cases are more
difficult). Show that ∥ · ∥p is not equivalent to ∥ · ∥q if p ≠ q, on C[a, b].
Prove also that for any f ∈ C[a, b]
\[ \lim_{p \to \infty} \|f\|_p = \|f\|_\infty. \]

7. Prove that for every n ∈ N there is a constant Cn such that for all the
n × n matrices with non-negative entries (ai,j ) the following inequality is
verified
\[ \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{n} a_{i,j} \Bigr)^{2} \le C_n \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} a_{i,j} \Bigr)^{2}. \]

8. Prove that the following formula

∥f ∥ = |f (0)| + ∥f ′ ∥∞

defines a norm on C 1 [0, 1]. Show also that C 1 [0, 1] endowed with such a
norm is complete.
9. Let X be a normed space and consider the unit sphere S = {x ∈ X :
∥x∥ = 1}. Show that
d(S, x) = | 1 − ∥x∥ |.

10. A function f : X → R is said to be convex if it satisfies

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)

for every x, y ∈ X and λ ∈ (0, 1). Prove that:

(a) a norm is a convex function;
(b) f (x) = ∥x∥^2 (the square of the norm) is convex too;
(c) d(A, x) is convex if and only if A is convex.
11. Prove that (C[a, b], ∥ · ∥∞ ) is separable.

12. The set of real bounded sequences is denoted ℓ∞ , and the formula
\[ \|(x_n)_{n=1}^{\infty}\|_\infty = \sup\{|x_n| : n \in \mathbb{N}\} \]
for $(x_n)_{n=1}^{\infty} \in \ell^\infty$ defines a norm. Show that (ℓ∞ , ∥ · ∥∞ ) is complete and
non separable.
13. We say that a function Q : X → R defined on a vector space is a quadratic
form if there exists a bilinear form B : X × X → R such that Q(x) =
B(x, x). Show that B is not determined, in general, by Q. However, if
we ask the bilinear form to be symmetric, that is, B(x, y) = B(y, x) for
all x, y ∈ X then
\[ B(x, y) = \frac{1}{2}\bigl(Q(x + y) - Q(x) - Q(y)\bigr). \]
Find the generalization of that result for cubic forms and symmetric
trilinear forms.

14. Show that a quadratic form Q : X → R is continuous if and only if there
is k > 0 such that |Q(x)| ≤ k∥x∥^2 for all x ∈ X.

15. Let X be a finite dimensional normed space.
(a) Show that all the quadratic forms on X are continuous.
(b) If Q : X → R is a quadratic form such that Q(x) > 0 for all
x ≠ 0, then there is a > 0 such that Q(x) ≥ a∥x∥^2 .
16. Let h : R → R be a differentiable function. Study the limit of the
sequence
\[ f_n(x) = n\bigl(h(x + n^{-1}) - h(x)\bigr). \]
Find a necessary and sufficient condition on h for the uniform
convergence of (fn ) on bounded intervals.

17. Consider the sequence of functions fn : [0, 1] → R defined for p > 0 by
\[ f_n(x) = n^{p} x (1 - x^2)^{n}. \]
Study the convergence of the sequence (pointwise and uniform)
depending on the parameter p. Is it possible to say what happens with the limit
of the derivatives?

18. Find the set where the series
\[ \sum_{n=0}^{\infty} e^{-nx} \]
converges. Find also the sets where the convergence is uniform.

19. Let f : R → R be a continuous function. Consider F : R → C[a, b]
defined by F (t)(x) = f (x + t). Prove that F is continuous as a function
into (C[a, b], ∥ · ∥∞ ).

20. Study the convergence of the sequence of functions defined on R by
\[ f_n(x) = \left(1 - \frac{x^2}{n}\right)^{n} \quad \text{if } |x| \le \sqrt{n} \]
and fn (x) = 0 if |x| > √n.

21. Prove Hadamard’s radius formula: Let
\[ R = \frac{1}{\limsup_{n} \sqrt[n]{|a_n|}}. \]
Then the power series $\sum_{n=1}^{\infty} a_n x^n$ converges if |x| < R and diverges for
|x| > R. What about the regions of uniform convergence?
22. Find a sequence of functions fn : [0, 1] → [0, 1] such that $\sum_{n=1}^{\infty} f_n$
converges uniformly, but not absolutely.

Chapter 3

Functions of several real variables: a starter

3.1 Graphical representation


Actually, it is more important to have a “mental representation”, in other words,
to know how to picture a function of several variables. In practice, those functions
appear as a formula f (x, y, . . . ) that may eventually have a vector output. After
the introduction to normed spaces, one could think that functions depend on one
vector variable instead of several numerical ones. That is almost right, but not
quite. We should make the same distinction as in Affine Geometry: points
and vectors are different objects. Eventually, both points and vectors are built
from the same “ingredient” Rn but they are not the same. That distinction is
compulsory when dealing with abstract manifolds: for every point there is a
set of admissible directions, the tangent space, and two directions from tangent
spaces at different points are not even comparable (a special “device” for that
has to be introduced).

Also, in (Newtonian) Physics there is a notion of coordinate-free space.
Coordinates only appear when you fix a coordinate system, made up of one
point where three perpendicular axes meet. You, as observer, cannot even tell
if the axes are moving with time. However, two frames of reference can
move with respect to each other. Then, the validity of the Principle of Inertia
helps you to choose a good frame to develop Mechanics. Therefore, we
should think of the domain of a function on the physical space as composed
of points, so its representation as a formula is just a consequence of fixing a
coordinate system. This is how we should think of the intrinsic nature
of vector operators (chapter on Vector Analysis).

Now, assume you have a function of two variables given by some formula
f (x, y). What is the simplest way to represent it? For pedagogical reasons,
the best answer is the graph, that is, the set

{(x, y, z) : z = f (x, y)}.

However, in practice these functions occur often and the graph is not
advisable in some cases: think of (x, y) being a geographical position (longitude,
latitude) and the function being the height (over the sea level) or the
atmospheric pressure (at ground level). For that case, several curves (level curves)
of the form
{(x, y) : f (x, y) = c}
for some values of the constant c provide a contour map that can be read as
we, allegedly, can read a topographical map. Eventually, the curves, which
necessarily are finite in number, could be replaced by a continuous variation of tone
or colour.

Since thinking in four dimensions is difficult, unless you are A. Einstein or
S. Dalí, for a function of three variables f (x, y, z) the graph is not a good
idea. Fortunately, level curves can be generalized to level surfaces (or more
generally, to level sets) by taking

{(x, y, z) : f (x, y, z) = c}.

As the representation could be difficult to visualise (the surfaces cover one
another like Russian dolls), it would be convenient to choose just one significant
value of c, for instance, when it reaches a maximum. That is exactly what you
see in the pictures of atomic orbitals in Chemistry books.

An interesting situation is when the function has vector values. For
functions of the form f : R → R3 the representation is a curve, and we could
think of it as a trajectory regarding the variable as time. For functions such as
f : R2 → R2 or f : R3 → R3 we may think of them as deformations of the
space, and we can visualise the functions by watching how they act on simple
sets of points: curves, simple domains. . . This is what is usually done in
Complex Analysis since a complex function is, in practice, a function from the plane
to the plane. Another way to think of functions f : R2 → R2 is to consider
the domain composed of points and, on the other hand, the images as vectors.
That is a plane vector field that can be depicted by choosing a regularly
ordered set of points and drawing an arrow on each of them (usually the arrow
starts at the point) that represents the value of the function. In this way, the
wind speed is represented in weather forecasts, for instance.

Be aware that non-cartesian coordinate systems, such as polar or spherical
coordinates, can be used to convert a function defined on points of the plane or
the space into a formula. On the other hand, some properties of functions are
more or less evident depending on the way to represent them. For instance, the
property of the exponential $e^{x+y} = e^x e^y$ is not evident in cartesian coordinates,
but there is a sort of innuendo in polar coordinates $r = e^{\theta}$.
Example 3.1.1. The importance of the choice of coordinates: the ellipse.

The ellipse is defined as the curve made up of points from the plane such
that the sum of the distances to two points (foci) is constant. The curve is
symmetric with respect to the line passing through the foci, as well as the
bisector line of the segment joining the foci. When referred the ellipse to those
axes (X and Y , respectively) the well known cartesian equation is

x2 y 2
+ 2 = 1,
a2 b
where a is the long semi-axis and b the short one. The equation is particularly
simple because of the good choice of the coordinate frame. Note that√the sum
of distances to the foci is 2a and the distance between them is 2c = 2 a2 − b2 .
However, if we try to obtain the polar equation straight from the previous one,
the result is
r2 cos2 θ r2 sin2 θ
+ =1
a2 b2
and so
ab
r=√
b2 cos2 θ + a2 sin2 θ
which is not specially nice. In orden to obtain a better expression, move the
origin to one of the focus (the one on our right). The distance to the focus at
the origin is r. The distance to the other focus is
p q √
(x + 2c)2 + y 2 = (r cos θ + 2c)2 + r2 sin2 θ = r2 + 4cr cos θ + 4c2 .

Therefore,
$$\sqrt{r^2 + 4cr\cos\theta + 4c^2} = 2a - r.$$
Squaring we get

$$r^2 + 4cr\cos\theta + 4c^2 = 4a^2 - 4ar + r^2,$$

thus $cr\cos\theta + ar = a^2 - c^2 = b^2$,
and the polar expression now is

$$r = \frac{b^2}{a + c\cos\theta} = \frac{p}{1 + \epsilon\cos\theta},$$
where $p = b^2/a$ and $\epsilon = c/a$ is the eccentricity. This equation, which is common
to all the conics ($\epsilon = 0$ for the circle, $\epsilon \in (0, 1)$ for the ellipse, $\epsilon = 1$ for the
parabola and $\epsilon > 1$ for the hyperbola), is useful for the description of the movement of planets.
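
As a sanity check of the focal polar equation, the following minimal sketch samples points of $r = p/(1 + \epsilon\cos\theta)$ and verifies that the sum of the distances to the two foci stays equal to $2a$. The function names and the sample values $a = 5$, $b = 3$ are illustrative choices, not part of the text.

```python
import math

def focal_polar_radius(a, b, theta):
    """Radius r(theta) of the ellipse when the origin sits at the right focus."""
    c = math.sqrt(a * a - b * b)   # half the distance between the foci
    p = b * b / a                  # the parameter p = b^2 / a
    eps = c / a                    # eccentricity
    return p / (1.0 + eps * math.cos(theta))

def distance_sum(a, b, theta):
    """Sum of the distances from the point (r, theta) to both foci."""
    c = math.sqrt(a * a - b * b)
    r = focal_polar_radius(a, b, theta)
    x, y = r * math.cos(theta), r * math.sin(theta)
    # One focus is the origin, the other one is at (-2c, 0).
    return math.hypot(x, y) + math.hypot(x + 2.0 * c, y)

a, b = 5.0, 3.0  # illustrative semi-axes
sums = [distance_sum(a, b, 2.0 * math.pi * k / 12) for k in range(12)]
print(sums)  # every entry should agree with 2a up to rounding
```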

3.2 Topology
The topology required to deal with functions of several variables is exactly the
metric topology, where the metric is induced by any of the usual norms, which
turn out to be equivalent. Moreover, guessing that geometry could be of any
help when dealing with limits or continuity is a wrong idea. You may think
that the limit of a function exists because it exists along all the lines going to
that point (radial limit); however, the ordinary (topological) limit may
not exist. That is the case of
$$f(x, y) = \frac{2xy^2}{x^2 + y^4},$$

which has limit 0 at $(x, y) = (0, 0)$ along lines, whereas the limit is not null
along suitable parabolas (for instance, $f(y^2, y) = 1$ for $y \neq 0$). In any case,
radial limits and other variations of limits are useful as training exercises, and
have the more interesting feature that they relate limits in two or more variables
to the functional limit. Indeed, the existence of
radial limits at a point implies the existence of the ordinary limit if the radial
limit is uniform with respect to the angle.
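
The behaviour of this example can be observed numerically. The sketch below evaluates $f$ along one fixed line through the origin and along the parabola $x = y^2$; the chosen angle (1 radian) and the step sizes are arbitrary illustrative values.

```python
import math

def f(x, y):
    """The function 2 x y^2 / (x^2 + y^4), defined away from the origin."""
    return 2.0 * x * y**2 / (x**2 + y**4)

steps = [10.0 ** (-k) for k in range(1, 7)]

# Approach the origin along the line with direction (cos 1, sin 1).
along_line = [f(t * math.cos(1.0), t * math.sin(1.0)) for t in steps]

# Approach the origin along the parabola x = y^2.
along_parabola = [f(t**2, t) for t in steps]

print(along_line)      # the values shrink towards 0
print(along_parabola)  # the values stay at 1 up to rounding
```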

Despite the example, a radially continuous function is not so bad. For
instance, assume that a function is separately continuous, that is, for every
fixed value of all the variables except one, the restricted function is continuous
with respect to the remaining variable. It is possible to prove with the help of
Baire's theorem that a separately continuous function has a dense set of points
of actual continuity.

In finite dimension there is a great availability of compactness, therefore a
map which is continuous and injective behaves locally as a homeomorphism.
It is quite easy to show that the segment $[0, 1]$ is not homeomorphic to the
square $[0, 1]^2$: the map on the border of the square cannot be injective.
However, when we drop injectivity, strange things happen. For instance, there is
a continuous map from $[0, 1]$ onto $[0, 1]^2$. That is the Peano map, which can be
built as the limit of a uniformly convergent sequence of maps. Of course, by
Heine's theorem, that map is also uniformly continuous. However, it cannot be Lipschitz
(recall that Lipschitz for a map between metric spaces $f : (M_1, d_1) \to (M_2, d_2)$
means the existence of a constant $\lambda > 0$ such that $d_2(f(x), f(y)) \le \lambda\, d_1(x, y)$
for any $x, y \in M_1$). The reason can be derived from elementary facts about
the Lebesgue measure. That shows the existence of a huge gap between
continuous maps and Lipschitz maps (in particular $C^1$ maps). The only simple fact
about continuous maps is their definition, almost nothing more. Intuition can be
dangerous.

3.3 Genuine functions on $\mathbb{R}^n$?


The reader could be disappointed if, after stressing the idea that a function of
several variables actually depends on a point of $\mathbb{R}^n$, all the examples are
actually a superposition of functions of one variable applied to the several scalar
variables available. Well, Kolmogorov proved that any continuous function of
several variables can be expressed as a certain superposition (two compositions
and a linear combination). The general result is quite complex to write, so we
will restrict ourselves to only two variables. Given a continuous function $f(x, y)$
there exist six continuous functions $g_0, g_1, \dots, g_5$ and a number $\lambda$ such that
$$f(x, y) = \sum_{k=1}^{5} g_0\big(g_k(x) + \lambda g_k(y)\big).$$

The complicated formulation is due to the fact that the functions $g_1, \dots, g_5$
given by Kolmogorov's theorem are universal, that is, they do not depend on
$f$. Nevertheless, the idea is clear: there is no "genuine" continuous real
function of two or more variables.

3.4 Rationale and remarks
Despite the very general frame of metric and normed spaces, it is sometimes
necessary to point out that the matter is "Functions of several real variables":
not a few, but not too many.

This chapter is a reflection on the notion of function of several real
variables. The idea is to blend some "philosophical" comments with simple
examples and pictures. There is a lot of information to awaken curiosity:
Mechanics as an axiomatic theory; separately continuous functions (Baire
classes and so on); Peano curves (they can satisfy a Hölder condition but not a
Lipschitz one); and the surprising Kolmogorov theorem, of course. All good
topics for a TFG (final degree project).

3.5 Exercises
1. Use polar coordinates to express the set bounded by the triangle with
vertices at $(0, 0)$, $(0, 1)$ and $(1, 0)$.
2. Use polar coordinates to find the equation of a circle that passes through the
origin.
3. Express the set $\{(x, y, z) : 0 \le 2z \le 1 - x^2 - y^2\}$ in spherical
coordinates.
4. Find a function of two variables whose level curves are the family of circles
that are tangent to the $Y$ axis at the origin.
5. Parameterize the curve resulting from the intersection of these surfaces

$$S_1 = \{(x, y, z) : (x - 1)^2 + y^2 + z^2 = 1\}; \quad S_2 = \{(x, y, z) : z^2 = x^2 + y^2\}.$$

6. Eliminate the parameter in the parametric equation of the curve

$$(r\cos at,\ r\sin at,\ bt)$$

with $t \in \mathbb{R}$, that is, find two functions $f(x, y, z)$, $g(x, y, z)$ such that the
curve is the set

$$\{(x, y, z) : f(x, y, z) = 0,\ g(x, y, z) = 0\}.$$

7. Prove that $f(x, y) = \sqrt[4]{x^2 + y^2}$ does not satisfy a Lipschitz condition and
yet it is uniformly continuous on $\mathbb{R}^2$.
8. Prove the existence of the following limit and compute it:

$$\lim_{(x,y)\to(0,0)} \frac{x^3 + y^3}{x^2 + y^2}.$$

9. Let $h : \mathbb{R} \to \mathbb{R}$ be a differentiable function. Consider the function of
two variables
$$f(x, y) = \frac{h(x) - h(y)}{x - y}$$
for $(x, y) \in \mathbb{R}^2$ with $x \neq y$. Compute the iterated, the radial and the
double limits at the points of the form $(x, x)$. Find conditions on $h$ for
$f$ to be continuously extended to $\mathbb{R}^2$.
10. Show that a separately continuous function on $[a, b] \times [c, d]$ is a pointwise
limit of a sequence of continuous functions.

Chapter 4

Differentiable mappings

4.1 The basics


Differentiable maps are those whose increments behave almost linearly on a
small scale (it is hard to think that things could be different in the real world).

Definition 4.1.1. A map $f : D \subset E \to F$ between normed spaces is
differentiable at $x_0 \in D$ if there exists $A \in L(E, F)$ such that

$$f(x) - f(x_0) = A(x - x_0) + o(\|x - x_0\|). \tag{4.1}$$

It is immediate from the definition that

1. the map $f$ is continuous at $x_0$;
2. the element $A \in L(E, F)$ satisfying (4.1) is unique, thus we will write

$$df(x_0) := A;$$

3. the assignment $f \mapsto df(x_0)$ is linear on the maps that are
differentiable at $x_0$;
4. if $f$ is linear (and continuous), then $df(x) = f$ at any $x \in E$.
For real valued functions, the geometrical idea behind the notion of
differentiability is that the "graph of the function $f$ is well
approximated by the tangent plane". However, to be rigorous, the definition
of the tangent plane has to depend on the notion of differentiability.

Definition 4.1.2. The tangent plane to the graph of $f : D \subset \mathbb{R}^n \to \mathbb{R}$
at $(x_0, f(x_0))$ is the set

$$T_{x_0}(f) = \{(x, y) \in \mathbb{R}^{n+1} : y = f(x_0) + df(x_0)(x - x_0)\},$$

provided that $f$ is differentiable at $x_0$.


Our aim now is to develop the rules to compute differentials, as is
done with derivatives for functions of one variable. The first one is Leibniz's
rule for the differentiation of a product. It could be stated for any finite
number of "factors".
Proposition 4.1.3. Let $B : E_1 \times E_2 \to F$ be a continuous bilinear map. Then
$B$ is differentiable at any point and

$$dB(x_0, y_0)(x, y) = B(x, y_0) + B(x_0, y).$$

Proof. Indeed,

$$B(x, y) - B(x_0, y_0) = B(x - x_0, y_0) + B(x_0, y - y_0) + B(x - x_0, y - y_0)$$

and $B(x - x_0, y - y_0) = O(\|x - x_0\| \cdot \|y - y_0\|) = o\big(\sqrt{\|x - x_0\|^2 + \|y - y_0\|^2}\big)$.
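
A quick numerical illustration of Proposition 4.1.3, taking as $B$ the cross product on $\mathbb{R}^3$ (an illustrative choice of bilinear map, with arbitrary base points and increments): the remainder left after subtracting the claimed differential, divided by the norm of the increment, should shrink as the increment does.

```python
import math

def cross(x, y):
    """Cross product on R^3, a bilinear map R^3 x R^3 -> R^3."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def sub(x, y):
    return tuple(a - b for a, b in zip(x, y))

def scale(t, x):
    return tuple(t * a for a in x)

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def remainder_ratio(x0, y0, u, v):
    """|B(x0+u, y0+v) - B(x0, y0) - B(u, y0) - B(x0, v)| over the norm of (u, v)."""
    increment = sub(cross(add(x0, u), add(y0, v)), cross(x0, y0))
    linear_part = add(cross(u, y0), cross(x0, v))
    return norm(sub(increment, linear_part)) / math.sqrt(norm(u) ** 2 + norm(v) ** 2)

x0, y0 = (1.0, 2.0, -1.0), (0.5, -1.0, 3.0)   # arbitrary base point
u0, v0 = (1.0, 0.0, 2.0), (0.0, 1.0, -1.0)    # arbitrary direction of the increment
ratios = [remainder_ratio(x0, y0, scale(10.0 ** (-k), u0), scale(10.0 ** (-k), v0))
          for k in range(1, 6)]
print(ratios)  # the ratios decrease roughly linearly with the size of the increment
```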

Now we will prove the chain rule.


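Theorem 4.1.4. Let $f : E \to F$ and $g : F \to G$ be maps, $x_0 \in \mathrm{dom}(f)$ and
$y_0 = f(x_0) \in \mathrm{dom}(g)$, such that $f$ is differentiable at $x_0$ and $g$ is
differentiable at $y_0$. Then $g \circ f$ is differentiable at $x_0$ and

$$d(g \circ f)(x_0) = dg(y_0) \circ df(x_0).$$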

Proof. By hypotheses we have

f (x) − f (x0 ) = df (x0 )(x − x0 ) + o(∥x − x0 ∥),

g(y) − g(y0 ) = dg(y0 )(y − y0 ) + o(∥y − y0 ∥).


Putting $y = f(x)$ into the second equation we get

$$(g \circ f)(x) - (g \circ f)(x_0) = dg(y_0)(f(x) - f(x_0)) + o(\|x - x_0\|)$$

because the differentiability of $f$ implies $\|f(x) - f(x_0)\| = O(\|x - x_0\|)$. A
further replacement leads to

$$(g \circ f)(x) - (g \circ f)(x_0) = dg(y_0)\big(df(x_0)(x - x_0) + o(\|x - x_0\|)\big) + o(\|x - x_0\|)$$
$$= (dg(y_0) \circ df(x_0))(x - x_0) + o(\|x - x_0\|),$$
which implies the differentiability of the composed map and $d(g \circ f)(x_0) =
dg(y_0) \circ df(x_0)$ as wished.
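
In finite dimension the chain rule says that the Jacobian matrix of $g \circ f$ is the product of the Jacobian matrices. The following sketch checks this with central finite differences; the maps `f` and `g`, the point `x0` and the helper names are illustrative assumptions, and the Jacobians are only approximations.

```python
import math

def f(x):
    """An illustrative map R^2 -> R^2."""
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def g(y):
    """An illustrative map R^2 -> R^2."""
    return [math.exp(y[0]) * y[1], y[0] - y[1]]

def jacobian(func, x, step=1e-6):
    """Central-difference approximation of the Jacobian matrix of func at x."""
    rows = len(func(x))
    columns = []
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += step
        backward[j] -= step
        fp, fm = func(forward), func(backward)
        columns.append([(fp[i] - fm[i]) / (2.0 * step) for i in range(rows)])
    # columns[j][i] holds the derivative of component i with respect to x_j.
    return [[columns[j][i] for j in range(len(x))] for i in range(rows)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

x0 = [0.7, -0.3]
lhs = jacobian(lambda x: g(f(x)), x0)               # Jacobian of the composition
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))   # product of the Jacobians
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # small, limited only by the finite-difference error
```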

The action of the differential on a given $h \in E$ can be computed as a
directional derivative
$$df(x_0)(h) = \left[\frac{d f(x_0 + th)}{dt}\right]_{t=0} = \lim_{t\to 0} \frac{f(x_0 + th) - f(x_0)}{t}.$$
The differentiability implies that the limit is uniform on bounded sets with
respect to $h$. For vector valued functions of one real variable, there is essentially
a unique direction, so we can keep the standard notation
$$f'(x_0) = df(x_0) \in L(\mathbb{R}, F) \sim F$$
and we say that $f$ is derivable at $x_0$ instead of differentiable.

Simple examples such as $f(t) = (\cos t, \sin t, t)$ for $t \in [0, 2\pi]$ show that
the mean value theorem with an equality is no longer true for vector valued
functions:
$$f(2\pi) - f(0) \neq f'(t)(2\pi - 0)$$
for all $t \in [0, 2\pi]$. However, we have the following result, which is enough for most
applications.
Theorem 4.1.5. Let $E$ be a normed space and let $f : [a, b] \to E$ and
$g : [a, b] \to \mathbb{R}$ be continuous functions such that they are derivable on $(a, b)$ and
satisfy $\|f'(t)\| \le g'(t)$ for all $t \in (a, b)$. Then
$$\|f(b) - f(a)\| \le g(b) - g(a).$$
Proof. Take $\varepsilon > 0$ and consider the set
$$I = \{t \in [a, b] : \|f(s) - f(a)\| \le g(s) - g(a) + \varepsilon(s - a) + \varepsilon \text{ for all } a \le s \le t\}.$$
By construction and continuity of the functions it is clear that $I = [a, s]$ for
some $a < s \le b$. If $s = b$ for all $\varepsilon > 0$ we are done, so we may assume $s < b$ in
order to get a contradiction. Then there exists a decreasing sequence
$(s_n) \subset (s, b)$ with limit $s$ such that
$$\|f(s_n) - f(a)\| > g(s_n) - g(a) + \varepsilon(s_n - a) + \varepsilon.$$
Therefore

$$g(s_n) - g(a) + \varepsilon(s_n - a) + \varepsilon < \|f(s_n) - f(s)\| + \|f(s) - f(a)\|$$

$$\le \|f(s_n) - f(s)\| + g(s) - g(a) + \varepsilon(s - a) + \varepsilon$$

and thus
$$g(s_n) - g(s) + \varepsilon(s_n - s) \le \|f(s_n) - f(s)\|.$$
Dividing by $s_n - s$ we have

$$\frac{g(s_n) - g(s)}{s_n - s} + \varepsilon \le \left\|\frac{f(s_n) - f(s)}{s_n - s}\right\|$$

and taking limits we get $g'(s) + \varepsilon \le \|f'(s)\|$, which is a contradiction.

Corollary 4.1.6. Let $f : D \subset E \to F$ be a differentiable map and let
$x, y \in D$ be two points such that the segment $[x, y]$ joining them is contained in $D$. Then

$$\|f(y) - f(x)\| \le \sup\{\|df(z)\| : z \in [x, y]\} \cdot \|y - x\|.$$
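
The helix example and the inequality of Theorem 4.1.5 can both be observed numerically. In the sketch below (sampling density and names are illustrative choices) the gap $\|f(2\pi) - f(0) - f'(t)\,2\pi\|$ never gets close to 0, while the bound with $g(t) = \sqrt{2}\,t$ holds, since $\|f'(t)\| = \sqrt{2}$.

```python
import math

def f(t):
    """The helix (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def df(t):
    """Derivative of the helix."""
    return (-math.sin(t), math.cos(t), 1.0)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

a, b = 0.0, 2.0 * math.pi
increment = tuple(q - p for p, q in zip(f(a), f(b)))

# Gap between the increment and f'(t)(b - a), sampled over [a, b].
gaps = []
for k in range(1001):
    t = a + (b - a) * k / 1000
    gaps.append(norm(tuple(inc - d * (b - a) for inc, d in zip(increment, df(t)))))
smallest_gap = min(gaps)

# Mean value inequality with g(t) = sqrt(2) t.
lhs = norm(increment)
rhs = math.sqrt(2.0) * (b - a)
print(smallest_gap, lhs, rhs)
```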

4.2 Partial derivatives


In practice, functions defined on $\mathbb{R}^n$ are given by means of a formula involving
the coordinates of the point $x = (x_1, \dots, x_n)$ that we can represent simply as
$f(x_1, \dots, x_n)$, understanding an identification of the function and its formula.
For low dimensions we may use $f(x, y)$, $f(x, y, z)$, ... with the same meaning. In
this setting we may study the behaviour of a function with respect to one
variable $x_i$ leaving the others constant, and in particular compute the derivative
with respect to $x_i$ if possible. This is called the partial derivative with respect
to $x_i$ and it is denoted as
$$\frac{\partial f}{\partial x_i}(x_1, \dots, x_n).$$
From the point of view of the previous section, a partial derivative is just a
directional derivative for the direction given by a vector of the canonical basis.
Therefore, the differentiability implies the existence of the partial derivatives.
However, the partial derivatives are easier to compute. For a function that
is differentiable at $x^0 = (x^0_1, \dots, x^0_n)$ we have

$$df(x^0)(\lambda_1 e_1 + \dots + \lambda_n e_n) = \lambda_1\, df(x^0)(e_1) + \dots + \lambda_n\, df(x^0)(e_n)$$
$$= \lambda_1 \frac{\partial f}{\partial x_1}(x^0) + \dots + \lambda_n \frac{\partial f}{\partial x_n}(x^0).$$
If we denote by $dx_i$ the linear map $\lambda_1 e_1 + \dots + \lambda_n e_n \mapsto \lambda_i$, then for any $x \in \mathbb{R}^n$
we may write the previous identity as
$$df(x^0)(x) = dx_1(x)\frac{\partial f}{\partial x_1}(x^0) + \dots + dx_n(x)\frac{\partial f}{\partial x_n}(x^0),$$
which can be rewritten in a more aesthetic way
$$df(x^0) = \frac{\partial f}{\partial x_1}(x^0)\,dx_1 + \dots + \frac{\partial f}{\partial x_n}(x^0)\,dx_n$$
or simply
$$df = \frac{\partial f}{\partial x_1}\,dx_1 + \dots + \frac{\partial f}{\partial x_n}\,dx_n,$$
despite the fact that the partial derivatives $\frac{\partial f}{\partial x_i}$ may have vector values. For
real valued functions compare with the formula of the tangent plane in terms
of the partial derivatives
$$y - f(x^0) = \frac{\partial f}{\partial x_1}(x^0)(x_1 - x^0_1) + \dots + \frac{\partial f}{\partial x_n}(x^0)(x_n - x^0_n).$$
In the old times before the arrival of rigor in Calculus, it was usual to think that
$dx_1, \dots, dx_n$ were infinitesimal increments of the variables. That spirit still
lasts in reasonings that can be found in some Physics and Engineering books.

The chain rule for the differential implies a chain rule for partial derivatives.
Indeed, assume that the variables $x_1, \dots, x_n$ are replaced by derivable functions
$X_1(t), \dots, X_n(t)$. Take $X(t) = (X_1(t), \dots, X_n(t))$. Then
$$\frac{d}{dt}\big(f(X(t))\big) = \frac{\partial f}{\partial x_1}(X(t))\frac{dX_1}{dt}(t) + \dots + \frac{\partial f}{\partial x_n}(X(t))\frac{dX_n}{dt}(t).$$
The typical abuse of language and removal of variables that are obvious leads to
this neat expression
$$\frac{df}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \dots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt},$$
which is reminiscent of the expression of the differential above divided by "$dt$". If $t$
were one of several other variables, then the chain rule would
be expressed with partial derivatives
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \dots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t}.$$

Let us stress once more that the chain rule is valid provided that the (sec-
ond) function is differentiable. So far we have not provided a differentiability
criterion based on the partial derivatives. The following will fill the gap.
Theorem 4.2.1. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be a function such that its first
partial derivatives are defined on a neighbourhood of $x_0 \in D$ and they are also
continuous at $x_0$. Then $f$ is differentiable at $x_0$.
Proof. The idea of the proof is the same in $n$ dimensions as in 2. In order not
to complicate the notation much we will assume something in the middle, say
3 dimensions. The point from the hypothesis will be denoted $p = (x_0, y_0, z_0)$.
Fix $\varepsilon > 0$ and let $\delta > 0$ be such that the partial derivatives exist on $B(p, \delta)$ (we
shall consider the Euclidean norm) and their values at points of $B(p, \delta)$ differ by
less than $\varepsilon$ from the values at $(x_0, y_0, z_0)$. Assume that $\|(x, y, z) - p\| < \delta$; then
the four points $(x_0, y_0, z_0)$, $(x, y_0, z_0)$, $(x, y, z_0)$ and $(x, y, z)$ are in the ball and
so are the segments joining them. We have
$$f(x, y, z) - f(x_0, y_0, z_0) = [f(x, y_0, z_0) - f(x_0, y_0, z_0)] + [f(x, y, z_0) - f(x, y_0, z_0)] + [f(x, y, z) - f(x, y, z_0)]$$
$$= \frac{\partial f}{\partial x}(\xi, y_0, z_0)(x - x_0) + \frac{\partial f}{\partial y}(x, \eta, z_0)(y - y_0) + \frac{\partial f}{\partial z}(x, y, \zeta)(z - z_0),$$
where $\xi \in [x_0, x]$, $\eta \in [y_0, y]$ and $\zeta \in [z_0, z]$ are given by the finite increments
theorem. Now
$$\left| f(x, y, z) - f(p) - \frac{\partial f}{\partial x}(p)(x - x_0) - \frac{\partial f}{\partial y}(p)(y - y_0) - \frac{\partial f}{\partial z}(p)(z - z_0) \right|$$

$$\le \varepsilon|x - x_0| + \varepsilon|y - y_0| + \varepsilon|z - z_0| \le 3\varepsilon\|(x, y, z) - p\|.$$
That means $f$ is differentiable at $p$ as wished.
Corollary 4.2.2. Let $f$ be a function whose first partial derivatives are null on a
connected open domain. Then $f$ is constant.
Proof. In that case $f$ is differentiable by the previous theorem and its
differential is null everywhere. Two arbitrary points can be joined by a $C^1$
path $\gamma$. As $f \circ \gamma$ has null derivative, it is constant and thus the function has
the same value at the endpoints.

We could skip the use of Theorem 4.2.1 in the Corollary by showing that
two points in a connected (open) domain can be joined by a path made of
finitely many segments which are parallel to the axes.

4.3 Second order differentiability and more
Assume that a map f : D ⊂ E → F is differentiable at any point of D. In
such a case we may consider the differential map df : D ⊂ E → L(E, F ). We
may consider the continuity of df with respect to the norm on L(E, F ) and,
moreover, we may consider its further differentiability at some point x0 ∈ D.
In such a case, note that $d(df)(x_0) \in L(E, L(E, F))$. For simplicity, we use
the identification
$$L(E, L(E, F)) = B(E \times E, F),$$
where the right-hand side denotes the bilinear maps on $E \times E$ valued in $F$. Therefore,
$d^2 f(x_0) = d(df)(x_0)$ can be interpreted as a bilinear map. The relation of that
bilinear map to the increment of the function is described in the following result.
Theorem 4.3.1. Let $f : D \subset E \to F$ be twice differentiable at $x_0 \in D$. Then
$$f(x_0 + h) = f(x_0) + df(x_0)(h) + \frac{1}{2}\, d^2 f(x_0)(h, h) + o(\|h\|^2).$$
Proof. By the very definition, given $\varepsilon > 0$ there is $\delta > 0$ such that if $\|h\| < \delta$
then
$$\|df(x_0 + h) - df(x_0) - d^2 f(x_0)(h)\| < \varepsilon\|h\|.$$
The definition of the norm for linear operators implies

$$|df(x_0 + h)(v) - df(x_0)(v) - d^2 f(x_0)(h)(v)| \le \varepsilon\|h\|\|v\|$$

for every $v \in E$. Fix $h \in E$ with $\|h\| < \delta$ and consider the functions $\varphi(t) = t^2$
and
$$g(t) = f(x_0 + th) - f(x_0) - t\, df(x_0)(h) - \frac{t^2}{2}\, d^2 f(x_0)(h, h).$$
The finite increment theorem for two functions says that

$$\frac{g(1) - g(0)}{\varphi(1) - \varphi(0)} = \frac{g'(\tau)}{\varphi'(\tau)}$$

for some $\tau \in (0, 1)$. Therefore we have

$$\left| f(x_0 + h) - f(x_0) - df(x_0)(h) - \frac{1}{2}\, d^2 f(x_0)(h, h) \right| = |g(1)|$$
$$= (2\tau)^{-1} |df(x_0 + \tau h)(h) - df(x_0)(h) - \tau\, d^2 f(x_0)(h, h)|$$
$$= (2\tau)^{-1} |df(x_0 + \tau h)(h) - df(x_0)(h) - d^2 f(x_0)(\tau h, h)|$$
$$\le \frac{\varepsilon\|\tau h\|\|h\|}{2\tau} = \frac{\varepsilon\|h\|^2}{2},$$
which proves the theorem.
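
The second order expansion can be tested numerically on a concrete function. The sketch below uses $f(x, y) = e^x \sin y$ (an illustrative choice, with its gradient and Hessian written by hand) and checks that the remainder divided by $\|h\|^2$ shrinks as $h \to 0$.

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def gradient(x, y):
    """First partial derivatives of f, computed by hand."""
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

def hessian(x, y):
    """Second partial derivatives of f, computed by hand."""
    e = math.exp(x)
    return ((e * math.sin(y), e * math.cos(y)),
            (e * math.cos(y), -e * math.sin(y)))

def remainder_ratio(x0, y0, h):
    """|f(p + h) - f(p) - df(p)(h) - d2f(p)(h, h)/2| divided by |h|^2."""
    g = gradient(x0, y0)
    hess = hessian(x0, y0)
    first = g[0] * h[0] + g[1] * h[1]
    second = sum(hess[i][j] * h[i] * h[j] for i in range(2) for j in range(2))
    remainder = f(x0 + h[0], y0 + h[1]) - f(x0, y0) - first - 0.5 * second
    return abs(remainder) / (h[0] ** 2 + h[1] ** 2)

# Increments shrinking along the arbitrary direction (1, -2).
ratios = [remainder_ratio(0.3, 1.1, (t, -2.0 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # decreasing towards 0, as an o(|h|^2) remainder should
```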

Second order differentiability is related to second order derivation. The
previous result implies that
$$d^2 f(x_0)(h, h) = \left[\frac{d^2 f(x_0 + th)}{dt^2}\right]_{t=0}.$$

In order to discuss what happens in finite dimension, $E = \mathbb{R}^n$, we need to
introduce the second order derivatives
$$\frac{\partial^2 f}{\partial x_j \partial x_k} := \frac{\partial}{\partial x_k}\left(\frac{\partial f}{\partial x_j}\right).$$
Do not mind that convention about the order of derivation too much, since the
derivatives commute under very general conditions. With the new notation and putting
$h = (h_1, \dots, h_n)$ we have
$$\frac{d f(x_0 + th)}{dt} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(x_0 + th)\, h_j,$$

and the second derivative at $t = 0$ is

$$\left[\frac{d^2 f(x_0 + th)}{dt^2}\right]_{t=0} = \sum_{k=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_k}(x_0)\, h_j h_k.$$

The matrix of coefficients of the quadratic form $d^2 f(x_0)$, given by

$$\left(\frac{\partial^2 f}{\partial x_j \partial x_k}(x_0)\right)_{j,k},$$

is called the Hessian matrix. The Hessian is symmetric under very general
conditions, actually whenever $f$ is twice differentiable at $x_0$; however, we will prove a
fairly general result with a simpler proof. We will use the more compact notation
$f_x = \frac{\partial f}{\partial x}$, $f_y = \frac{\partial f}{\partial y}$, $f_{xy} = \frac{\partial^2 f}{\partial x \partial y}$ and $f_{yx} = \frac{\partial^2 f}{\partial y \partial x}$ for the statement and the proof
of the following result.

In all that follows we will consider only real valued functions. Some
results can be extended in an obvious way to functions taking values in a finite
dimensional space.

Theorem 4.3.2. Let f be a real function defined on a neighbourhood of (x0 , y0 )
such that fx , fy , fxy and fyx are also defined and continuous. Then

fxy (x0 , y0 ) = fyx (x0 , y0 ).

Proof. Take h, k small enough and for a function g(x, y) we introduce the
notation
∆x g(x, y) = g(x + h, y) − g(x, y),
∆y g(x, y) = g(x, y + k) − g(x, y).
Note that $\Delta_x(\Delta_y f) = \Delta_y(\Delta_x f)$. Now we will work with one of the terms,
the other one being similar. The finite increments theorem implies

∆x (∆y f )(x0 , y0 ) = (∆y f )x (x0 + θ1 h, y0 )h

for some 0 < θ1 < 1. Note that (∆y f )x = ∆y fx , therefore

∆x (∆y f )(x0 , y0 ) = ∆y fx (x0 + θ1 h, y0 )h.

A second application of the finite increments theorem gives

∆x (∆y f )(x0 , y0 ) = fxy (x0 + θ1 h, y0 + θ2 k)hk.

That implies
$$\lim_{h,k \to 0} \frac{\Delta_x(\Delta_y f)(x_0, y_0)}{hk} = f_{xy}(x_0, y_0).$$
The commutation of the increments claimed at the beginning implies the
commutation of the derivatives.
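
The double increment used in the proof is easy to compute. The sketch below, for the illustrative function $f(x, y) = \sin(xy) + x^3 y$ and an arbitrary base point, compares the quotient $\Delta_x(\Delta_y f)/(hk)$ with the mixed derivative obtained by hand.

```python
import math

def f(x, y):
    return math.sin(x * y) + x ** 3 * y

def mixed_derivative(x, y):
    """f_xy computed by hand: cos(xy) - xy sin(xy) + 3x^2."""
    return math.cos(x * y) - x * y * math.sin(x * y) + 3.0 * x ** 2

def second_difference(x0, y0, h, k):
    """The double increment: f(x0+h, y0+k) - f(x0+h, y0) - f(x0, y0+k) + f(x0, y0)."""
    return f(x0 + h, y0 + k) - f(x0 + h, y0) - f(x0, y0 + k) + f(x0, y0)

x0, y0 = 0.5, 0.8   # arbitrary base point
h = k = 1e-4        # small increments
quotient = second_difference(x0, y0, h, k) / (h * k)
print(quotient, mixed_derivative(x0, y0))  # the two values should be close
```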

The Taylor formula for several variables. The commutativity of the derivations
can be extended to orders higher than 2 if the hypothesis is satisfied. In
particular, for a $C^k$ function all the derivatives commute up to the order $k$, meaning by
order of a derivative the sum of the orders with respect to each variable. In
order to consider formulae involving complicated derivatives it is convenient to
introduce a multi-index notation: let $\alpha = (k_1, \dots, k_n)$ be an $n$-tuple of non-negative
integers and let

$$|\alpha| = k = k_1 + \dots + k_n.$$

We will denote
$$\frac{\partial^k f}{\partial x^\alpha} = \frac{\partial^k f}{\partial x_1^{k_1} \dots \partial x_n^{k_n}}.$$

The fact that the derivations are ordered with the variables implicitly means
that the commutation is assumed. For the next result we will also need
factorials, multi-powers and related functions. Put

$$\alpha! = k_1! \cdots k_n!$$

and
$$\binom{k}{\alpha} = \frac{k!}{\alpha!}.$$
If $x = (x_1, \dots, x_n)$, then put $x^\alpha = x_1^{k_1} \cdots x_n^{k_n}$. With this notation, we can prove
Newton's multinomial formula
$$(x_1 + x_2 + \dots + x_n)^m = \sum_{|\alpha| = m} \binom{m}{\alpha} x^\alpha.$$

Analogously, the Taylor polynomial of degree $m$ of a function $f$ at the point
$p = (x^0_1, \dots, x^0_n)$ is the following
$$T_m(p, x) = \sum_{|\alpha| \le m} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(p)\,(x - p)^\alpha.$$

The reasons for that choice will be clear along the proof of the following result.

Theorem 4.3.3. Let $f$ be a $C^{m+1}(\mathbb{R}^n)$ function and let $c > 0$ be a bound for
the absolute value of the derivatives of order $m + 1$ on $B(p, r)$. Then for
$x \in B(p, r)$, where the ball is taken with respect to the $\|\cdot\|_1$ norm, we have

$$|f(x) - T_m(p, x)| \le \frac{c\, r^{m+1}}{(m + 1)!}.$$

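Proof. Without loss of generality we may assume $p = 0$. Consider the
following auxiliary function $h(t) = f(tx_1, \dots, tx_n)$. The derivatives of $h$ are
$$h'(t) = \sum_{i} \frac{\partial f}{\partial x_i}(tx_1, \dots, tx_n)\, x_i,$$

$$h''(t) = \sum_{i} \sum_{j} \frac{\partial^2 f}{\partial x_i \partial x_j}(tx_1, \dots, tx_n)\, x_i x_j$$

and so on, following the schema of the powers of a multinomial. Therefore, we
can gather the terms in this way
$$h^{(k)}(t) = \sum_{|\alpha| = k} \binom{k}{\alpha} \frac{\partial^k f}{\partial x^\alpha}(tx_1, \dots, tx_n)\, x^\alpha.$$

Now we have
$$h(1) = \sum_{k=0}^{m} \frac{h^{(k)}(0)}{k!} + \frac{h^{(m+1)}(\theta)}{(m + 1)!}$$
where $\theta \in (0, 1)$, by the one variable Taylor formula with Lagrange remainder.
Clearly
$$\sum_{k=0}^{m} \frac{h^{(k)}(0)}{k!} = \sum_{|\alpha| \le m} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(0)\, x^\alpha$$

and
$$\left|\frac{h^{(m+1)}(\theta)}{(m + 1)!}\right| = \left|\sum_{|\alpha| = m+1} \frac{1}{\alpha!} \frac{\partial^{m+1} f}{\partial x^\alpha}(\theta x_1, \dots, \theta x_n)\, x^\alpha\right|$$

$$\le \frac{c}{(m + 1)!} \sum_{|\alpha| = m+1} \binom{m + 1}{\alpha} |x|^\alpha = \frac{c}{(m + 1)!}\,(|x_1| + \dots + |x_n|)^{m+1} = \frac{c\, \|x\|_1^{m+1}}{(m + 1)!}$$
as wished.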
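
The multi-index formula for the Taylor polynomial translates almost literally into code. The following sketch uses the illustrative function $f(x, y) = e^{x + 2y}$, whose partial derivatives are known in closed form, builds $T_3(0, \cdot)$ and compares the error with the bound of Theorem 4.3.3. The constant `c` is a bound valid on the $\|\cdot\|_1$ ball of radius `r`; all names and sample values are assumptions of this sketch.

```python
import math
from itertools import product

def multi_indices(n, order):
    """All n-tuples of non-negative integers whose entries add up to `order`."""
    return [alpha for alpha in product(range(order + 1), repeat=n)
            if sum(alpha) == order]

def f(x, y):
    return math.exp(x + 2.0 * y)

def partial(alpha, x, y):
    """Partial derivative of f for alpha = (k1, k2): 2^k2 * exp(x + 2y)."""
    return 2.0 ** alpha[1] * math.exp(x + 2.0 * y)

def taylor(m, x, y):
    """Taylor polynomial of degree m at the origin, summed over multi-indices."""
    total = 0.0
    for k in range(m + 1):
        for alpha in multi_indices(2, k):
            alpha_factorial = math.factorial(alpha[0]) * math.factorial(alpha[1])
            total += partial(alpha, 0.0, 0.0) * x ** alpha[0] * y ** alpha[1] / alpha_factorial
    return total

m, r = 3, 0.5
x, y = 0.2, -0.25                         # |x| + |y| = 0.45 < r
c = 2.0 ** (m + 1) * math.exp(2.0 * r)    # bounds the derivatives of order m+1 on the ball
error = abs(f(x, y) - taylor(m, x, y))
bound = c * r ** (m + 1) / math.factorial(m + 1)
print(error, bound)
```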

4.4 Applications to extrema


We will use derivatives in order to investigate the (relative) extreme values of a
function $f$ on a domain $D$. A point $x_0 \in D$ is said to be critical if $df(x_0) = 0$. Points
where $df$ does not exist are considered critical too in the literature; however,
we will not consider them here. Let us start with the following observation.
Proposition 4.4.1. If a differentiable function $f$ has a relative extremum at an
interior point of its domain, then necessarily the point is critical.

In this case the domain is not assumed to be open. Typically, problems about
extrema are posed on a compact domain. The existence of extrema is assured
by the Weierstrass theorem; however, finding them is a different matter.

The strategy to compute the absolute extrema on a "regular" compact
domain is the following:
• either the extremum is attained at an interior point, so we can find it among
the critical points;
• or the extremum is on the border: then parametrise the border,
which reduces the dimension by one, and start again.
There is a method that spares us from parametrising the border, the so-called
Lagrange multipliers (see Section 6.3.1), whose explanation relies on the
implicit function theorem. In any case, the iteration of the previous algorithm
will produce a list of points among which the maximum and the minimum are
attained. We only have to check the values of $f$.

In case we are interested in relative extrema, the method above cannot
distinguish them. If the function is $C^2$ we may consider the second order
approximation given by Theorem 4.3.1. Assuming that $df(x_0) = 0$, the local behaviour
of $f$ at $x_0$ is the same as that of $d^2 f(x_0)(x, x)$ at 0 in these cases:
• if $d^2 f(x_0)(x, x) \ge c\|x\|^2$ for some $c > 0$, $f$ has a local minimum at $x_0$;
• if $d^2 f(x_0)(x, x) \le -c\|x\|^2$ for some $c > 0$, $f$ has a local maximum at $x_0$;
• if $d^2 f(x_0)(x, x)$ takes both positive and negative values, then no relative
extremum is attained at $x_0$ (saddle point).
The cases where merely $d^2 f(x_0)(x, x) \ge 0$ or $d^2 f(x_0)(x, x) \le 0$ cannot be
decided with this method. The proof for the first two cases follows straight
from Theorem 4.3.1. The third case can be reduced to one variable: there are
directions such that the restriction of the function to the line has a minimum
at $x_0$, and other directions such that the restriction has a maximum.

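The discussion in the finite dimensional case is done with the help of the
Hessian matrix
$$A = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.$$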

Linear Algebra provides a method, called the Sylvester criterion, to know whether the
associated quadratic form is positive or negative definite by checking $n$ determinants.
The quadratic form $d^2 f$, and so the Hessian, can also be used to check the
convexity of a $C^2$ function.
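
For functions of two variables the Sylvester criterion reduces to checking the sign of two determinants. The sketch below is a minimal classifier for a critical point given its $2 \times 2$ Hessian; the function name and the returned labels are illustrative, and the general $n \times n$ case is deliberately not covered.

```python
def classify_critical_point(hessian):
    """Classify a critical point of a C^2 function of two variables.

    `hessian` is a symmetric 2 x 2 matrix ((a, b), (b, d)).  The leading
    minors a and a*d - b^2 decide the definite cases; a negative determinant
    means that the quadratic form takes both signs.
    """
    (a, b), (_, d) = hessian
    determinant = a * d - b * b
    if determinant > 0 and a > 0:
        return "local minimum"
    if determinant > 0 and a < 0:
        return "local maximum"
    if determinant < 0:
        return "saddle point"
    return "undecided"

# Hessians at the origin of a few simple functions.
print(classify_critical_point(((2.0, 0.0), (0.0, 2.0))))      # x^2 + y^2
print(classify_critical_point(((2.0, 0.0), (0.0, -2.0))))     # x^2 - y^2
print(classify_critical_point(((-2.0, -1.0), (-1.0, -2.0))))  # -x^2 - xy - y^2
print(classify_critical_point(((0.0, 0.0), (0.0, 0.0))))      # x^4 + y^4
```

The last example shows the limitation mentioned in the text: $x^4 + y^4$ has a minimum at the origin, but the second order information alone cannot decide it.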

We will show how differential calculus works in an infinite dimensional
context. The Calculus of Variations appeared as a collection of techniques to find
(or characterize) the extrema of a certain kind of functionals, which usually
involve the function together with its derivatives. The following result is the most basic one, but
it is enough to solve the brachistochrone problem, by reducing the variational
problem to an ordinary differential equation. Let $\phi(y, \dot{y}, x)$ be a function defined
on $\mathbb{R}^2 \times [a, b]$ and consider the associated functional
$$\Phi(f) = \int_a^b \phi(f(x), f'(x), x)\, dx$$

defined for $C^1$ functions $f$ on $[a, b]$.


Theorem 4.4.2 (Euler-Lagrange). Let $A, B \in \mathbb{R}$. The relative extrema of
the functional $\Phi$ among the $C^1$ functions $f$ such that $f(a) = A$ and $f(b) = B$
satisfy the differential equation
$$\frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}\right) = \frac{\partial \phi}{\partial y}.$$
Proof. In order to test the extremality of $f$ we will consider only $C^2$
perturbations $h(x)$ such that $h$ and $h'$ vanish at $a$ and $b$. By hypothesis, the
directional derivative
$$\frac{d}{ds} \int_a^b \phi(f(x) + sh(x), f'(x) + sh'(x), x)\, dx$$
must be 0 at $s = 0$. The derivation with respect to $s$ can be performed under the
integral sign this way (we omit the variable $x$ in some of the functions for the
sake of readability)
$$\int_a^b \left(\frac{\partial \phi}{\partial y}(f + sh, f' + sh', x)\, h + \frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\, h'\right) dx.$$
Now, the formula of integration by parts gives that
$$\int_a^b \frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\, h'\, dx = \left[\frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\, h\right]_a^b - \int_a^b \frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\right) h\, dx$$
$$= -\int_a^b \frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\right) h\, dx.$$
Going back to the derivative of the functional, we have

$$\frac{d}{ds} \int_a^b \phi(f + sh, f' + sh', x)\, dx = \int_a^b \left(\frac{\partial \phi}{\partial y}(f + sh, f' + sh', x) - \frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}(f + sh, f' + sh', x)\right)\right) h\, dx.$$
For $s = 0$ we get that
$$\int_a^b \left(\frac{\partial \phi}{\partial y}(f, f', x) - \frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}(f, f', x)\right)\right) h\, dx = 0$$
for all the perturbations $h$ satisfying the required assumptions. As it is possible
to take $h$ with arbitrarily small support contained in $(a, b)$, we deduce that

$$\frac{\partial \phi}{\partial y}(f, f', x) - \frac{d}{dx}\left(\frac{\partial \phi}{\partial \dot{y}}(f, f', x)\right) = 0$$

for $x \in (a, b)$ as we wanted.
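
As a small numerical illustration of the Euler-Lagrange equation, take $\phi(y, \dot{y}, x) = (\dot{y}^2 + y^2)/2$ on $[0, 1]$ with $f(0) = 0$ and $f(1) = 1$ (an illustrative choice, not taken from the text). The equation becomes $y'' = y$, solved by $f(x) = \sinh x / \sinh 1$. The sketch below approximates the functional with the midpoint rule and checks that perturbing $f$ by $s\,h$, where $h = \sin^2(\pi x)$ and $h'$ vanish at both endpoints, does not decrease it.

```python
import math

def functional(func, dfunc, n=2000):
    """Midpoint-rule approximation of the integral of (y'^2 + y^2)/2 over [0, 1]."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += 0.5 * (dfunc(x) ** 2 + func(x) ** 2) * step
    return total

# Solution of y'' = y with y(0) = 0 and y(1) = 1.
def extremal(x):
    return math.sinh(x) / math.sinh(1.0)

def d_extremal(x):
    return math.cosh(x) / math.sinh(1.0)

# Perturbation h with h and h' vanishing at 0 and 1.
def bump(x):
    return math.sin(math.pi * x) ** 2

def d_bump(x):
    return math.pi * math.sin(2.0 * math.pi * x)

def perturbed_value(s):
    """Value of the functional at f + s h."""
    return functional(lambda x: extremal(x) + s * bump(x),
                      lambda x: d_extremal(x) + s * d_bump(x))

base = perturbed_value(0.0)
values = {s: perturbed_value(s) for s in (-0.1, -0.01, 0.01, 0.1)}
# Central difference approximating the directional derivative at s = 0.
slope = (perturbed_value(1e-4) - perturbed_value(-1e-4)) / 2e-4
print(base, values, slope)
```

Here the functional is convex, so the extremal is actually a minimum; in general the Euler-Lagrange equation only gives a necessary condition.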

4.5 Two applications to Algebra


Here we shall present two interesting applications of the calculus of several
variables to prove two results of Algebra that turn out to be useful in Analysis.

4.5.1 The Fundamental Theorem of Algebra


Theorem 4.5.1. Every non-constant polynomial with complex coefficients has
a complex root.
Proof. Let $p(z) = a_N z^N + \dots + a_0$ with $a_N \neq 0$. Note that

$$|p(z)| \ge |a_N||z|^N - |a_{N-1} z^{N-1} + \dots + a_0|.$$

Therefore, the real function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) = |p(x + iy)|$ satisfies

$$\lim_{(x,y)\to\infty} f(x, y) = +\infty.$$

Together with the continuity, that implies the existence of $(x_0, y_0)$ such that

$$m := f(x_0, y_0) = \inf\{f(x, y) : (x, y) \in \mathbb{R}^2\}.$$

Our aim is to prove that $m = 0$, so assume, in order to get a contradiction,
that $m > 0$. Let $z_0 = x_0 + iy_0$. We may assume that
$z_0 = 0$ just by replacing $p(z)$ by $p(z + z_0)$. Using conjugates, we have

$$|p(z)|^2 = (a_0 + a_n z^n + \dots + a_N z^N)(\bar{a}_0 + \bar{a}_n \bar{z}^n + \dots + \bar{a}_N \bar{z}^N)$$

where $n$ is the index of the first non-null coefficient after $a_0$. We have

$$|p(z)|^2 = a_0 \bar{a}_0 + \bar{a}_0 a_n z^n + a_0 \bar{a}_n \bar{z}^n + \dots$$

where the non-written terms are of degree greater than $n$. Put $\bar{a}_0 a_n =
A(\cos\alpha + i\sin\alpha)$ with $A = |a_0||a_n|$ and $z = r(\cos\theta + i\sin\theta)$, and note that $A \neq 0$. Now we can
use the fact that $|a_0| = m$ and De Moivre's formula to get

$$|p(z)|^2 = m^2 + Ar^n(\cos(n\theta + \alpha) + i\sin(n\theta + \alpha)) + Ar^n(\cos(n\theta + \alpha) - i\sin(n\theta + \alpha)) + O(r^{n+1})$$

$$= m^2 + 2Ar^n\cos(n\theta + \alpha) + O(r^{n+1}).$$
We have

$$0 \le R(r, \theta) := |p(z)|^2 - m^2 = 2Ar^n\cos(n\theta + \alpha) + O(r^{n+1}).$$

Fix $\theta \in [0, 2\pi]$. We deduce thus that

$$0 \le \lim_{r\to 0} r^{-n} R(r, \theta) = 2A\cos(n\theta + \alpha),$$

which is impossible as we can choose $\theta$ such that $\cos(n\theta + \alpha) = -1$.
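
The key step of the proof, the existence of a direction along which $|p|$ decreases when $p(0) \neq 0$, can be made concrete. The sketch below picks $\theta$ with $\cos(n\theta + \alpha) = -1$ for an illustrative polynomial and checks that $|p(re^{i\theta})| < |p(0)|$ for a few small radii. The coefficients, the radii and the helper names are assumptions of this sketch.

```python
import cmath
import math

def poly(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_N z^N by Horner's rule; coeffs = [a_0, ..., a_N]."""
    value = 0j
    for a in reversed(coeffs):
        value = value * z + a
    return value

def descent_direction(coeffs):
    """Return (n, theta): n is the first non-null index after 0, theta the descent angle."""
    n = next(k for k in range(1, len(coeffs)) if coeffs[k] != 0)
    # conj(a_0) a_n = A (cos alpha + i sin alpha)
    alpha = cmath.phase(coeffs[0].conjugate() * coeffs[n])
    # Choose theta so that n * theta + alpha = pi, that is, the cosine equals -1.
    theta = (math.pi - alpha) / n
    return n, theta

coeffs = [2 + 1j, 0j, 1 - 3j, 1 + 0j]   # p(z) = (2 + i) + (1 - 3i) z^2 + z^3
n, theta = descent_direction(coeffs)
start = abs(poly(coeffs, 0j))
moduli = [abs(poly(coeffs, r * cmath.exp(1j * theta))) for r in (0.2, 0.1, 0.05)]
print(start, moduli)  # every modulus is smaller than |p(0)|
```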

4.5.2 Diagonalization of symmetric matrices


Recall that the linear homeomorphisms of $\mathbb{R}^n$ that preserve the Euclidean metric
are characterised by orthogonal matrices, for which the inverse coincides with
the transpose. We will prove the following well-known result from Linear
Algebra.

Theorem 4.5.2. Let $A$ be a symmetric matrix. Then there exists an
orthogonal matrix $Q$ such that $Q^t A Q$ is diagonal.
Proof. For an $n \times n$ matrix $B = (b_{ij})$ define $\sigma(B) = \sum_{i \neq j} b_{ij}^2$. The set
of orthogonal $n \times n$ matrices, call it $\Omega$, can be identified with a closed and
bounded subset of $\mathbb{R}^{n^2}$. Therefore $\Omega$ is compact. Assume that the symmetric
matrix $A$ is fixed and consider the function $f : \Omega \to \mathbb{R}$ defined by

$$f(Q) = \sigma(Q^t A Q).$$

That function attains its minimum value at some orthogonal matrix $Q_0$; put
$B = Q_0^t A Q_0$. We claim
that $B$ is diagonal. The idea is to prove that if $B$ is not diagonal, then we
could obtain a smaller value for $\sigma$ by applying an orthogonal transformation
to $B$. Thus, assume $b_{sr} = b_{rs} \neq 0$ for some $r \neq s$. Consider a rotation
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

on the 2-dimensional space corresponding to the coordinates $r, s$ and extend
it to an orthogonal matrix $Q$ by taking the identity on the $(n - 2)$-dimensional
orthogonal complement. Consider now the matrix $C = Q^t B Q$ and note that
for any pair of indices $i, j$ such that $\{i, j\} \cap \{r, s\} = \emptyset$ we have $b_{ij} = c_{ij}$. In
case the sets have only one element in common, the coefficients change,
but not in a way that modifies the value of $\sigma$. For instance, if $i \notin \{r, s\}$, then it
can be computed that
$$b_{ir}^2 + b_{is}^2 = c_{ir}^2 + c_{is}^2.$$
And for the pair $r, s$ of entries itself we have
$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} b_{rr} & b_{rs} \\ b_{sr} & b_{ss} \end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
= \begin{pmatrix} c_{rr} & c_{rs} \\ c_{sr} & c_{ss} \end{pmatrix}.$$

A tedious computation and the help of some trigonometry gives

$$c_{rs}^2 + c_{sr}^2 = \frac{1}{2}\big((b_{ss} - b_{rr})\sin 2\theta + 2b_{rs}\cos 2\theta\big)^2.$$
For $\theta = 0$ we recover the value $2b_{rs}^2 = b_{rs}^2 + b_{sr}^2$. If $b_{ss} = b_{rr}$, any small
perturbation of $\theta$ reduces the value of the function $\sigma$. If $b_{ss} \neq b_{rr}$, it is possible
to choose the sign of the perturbation, and so the sign of $\sin 2\theta$, in such a way
that we reduce the value of $\sigma$. Since $Q_0 Q$ is orthogonal and $C = (Q_0 Q)^t A (Q_0 Q)$,
this contradicts the minimality of $f(Q_0)$.
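
The rotation argument is the core of the classical Jacobi method for symmetric matrices. The sketch below applies one rotation in the coordinates $(r, s)$, choosing the angle that makes $c_{rs} = 0$ (a particular choice: the proof only needs some angle that reduces $\sigma$), and checks that $\sigma$ drops by exactly $2b_{rs}^2$. The sample matrix and the helper names are illustrative.

```python
import math

def sigma(m):
    """Sum of the squares of the off-diagonal entries."""
    n = len(m)
    return sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)

def rotate(b, r, s):
    """Return Q^t B Q, where Q rotates the coordinates (r, s) so that entry (r, s) vanishes."""
    n = len(b)
    # Solves (b_ss - b_rr) sin(2 theta) + 2 b_rs cos(2 theta) = 0.
    theta = 0.5 * math.atan2(2.0 * b[r][s], b[r][r] - b[s][s])
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    q[r][r], q[r][s] = math.cos(theta), -math.sin(theta)
    q[s][r], q[s][s] = math.sin(theta), math.cos(theta)
    bq = [[sum(b[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(q[k][i] * bq[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

b = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.5],
     [2.0, 0.5, 1.0]]
c = rotate(b, 0, 2)
print(sigma(b), sigma(c))  # sigma decreases by 2 * b[0][2]**2
```

Repeating such rotations over all pairs $(r, s)$ drives $\sigma$ towards 0, which gives a constructive counterpart of the compactness argument.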

4.6 Rationale and remarks
Differentiability is a fundamental notion in Analysis. Students should
get used to all the different versions as well as the right use of the chain rule. In
practice, people do not differentiate functions, they differentiate one variable with respect
to another. From the formal point of view, that always entails abuse of language.
Nevertheless, future mathematicians should get used to that usage in order to
communicate with scientists and engineers.

The Taylor polynomial of second degree with vector values and infinitesimal
remainder is presented mostly to show the complexity of the second differential
from the point of view of the involved operator spaces.

The use of the Euler-Lagrange equation to solve a particular problem
depends on the knowledge of differential equations. Diagonalization of symmetric
matrices is more important in Physics than in Algebra. I like to recall the
notion of tensor of inertia when I have the opportunity.

4.7 Exercises
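1. Compute at any point the directional derivative of
$$\ln(e^x + e^y)$$
along directions parallel to the line $x = y$. Provide a simple explanation
for the result.
2. Calculate
$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} + z\frac{\partial u}{\partial z}$$
for
$$u = \arctan\left(\frac{y^2 + z^2}{x^2}\right).$$
3. Consider the function
$$u = xy\,\frac{x^2 - y^2}{x^2 + y^2}.$$
Compute at $(0, 0)$ the following derivatives:
$$\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial y}\right) \quad \text{and} \quad \frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right).$$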
4. Consider the function on R2 \ {(0, 0)} defined by
x3 y 3
f (x, y) = .
x4 + y 4
Show that is possible to continuously extend the function to R2 . Then,
study its differentiability.
5. Consider a function F : R2 → R of the form F (x, y) = yf (x) + xg(y)
where f, g : R → R are continuous at 0. Prove that F is differentiable
at (0, 0). Find a reasonable hypothesis to guarantee that F is twice
differentiable at (0, 0).
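6. Let $w = f(x, y, z)$ and $z = g(x, y)$. Then
\[ \frac{\partial w}{\partial x} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial x} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x} \]
since $\frac{\partial x}{\partial x} = 1$ and $\frac{\partial y}{\partial x} = 0$. Therefore $\frac{\partial w}{\partial z}\frac{\partial z}{\partial x} = 0$. Now, assume that
$w = x + y + z$ and $z = x + y$. Then we get $\frac{\partial w}{\partial z} = \frac{\partial z}{\partial x} = 1$ and so $1 = 0$. Please,
find the mistake.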
7. Find and classify the critical points of $f(x, y) = (x^2 + y^2)\, e^{x^2 - y^2}$.
8. Find the maximum volume of a rectangular parallelepiped contained in
the ellipsoid
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \]
9. Find the minimum volume ellipsoid
\[ E(a, b, c) = \left\{(x, y, z) : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1\right\} \]
passing through the point $(1, 2, 3)$.
10. Prove that the maximum value of the function
\[ f(x_1, x_2, \dots, x_n) = x_1^2 x_2^2 \cdots x_n^2 \]
on the sphere $S = \{(x_1, x_2, \dots, x_n) : x_1^2 + x_2^2 + \cdots + x_n^2 = 1\}$ is $n^{-n}$. Deduce,
as an application, the arithmetic-geometric mean inequality: for every
$n \in \mathbb{N}$ and $a_k \geq 0$ with $1 \leq k \leq n$ we have
\[ \sqrt[n]{a_1 a_2 \cdots a_n} \leq \frac{a_1 + a_2 + \cdots + a_n}{n}. \]
11. Let the function $f : \mathbb{R}^3 \to \mathbb{R}$ be defined by $f(x, y, z) = \frac{x^2 z^2}{x^2 + y^2}$ for

$(x, y, z) \neq (0, 0, z)$ and $f(0, 0, z) = 0$. Find $D_v f(1, 1, \sqrt{2})$, where $v$ is a
unit vector tangent at $(1, 1, \sqrt{2})$ to the curve
\[ x^2 + y^2 = 2x, \qquad x^2 + y^2 + z^2 = 4. \]

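12. A function $f : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ is said to be homogeneous of degree $m$
if $f(tx) = t^m f(x)$ for every $x \in \mathbb{R}^n \setminus \{0\}$ and $t > 0$. Prove that $f$ is
homogeneous of degree $m$ if and only if
\[ x_1 \frac{\partial f}{\partial x_1} + \cdots + x_n \frac{\partial f}{\partial x_n} = m f \]
for all $x \in \mathbb{R}^n \setminus \{0\}$.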

13. Prove that the function
\[ f(x, y) = \frac{xy}{(x + y)(1 + x)(1 + y)} \]
defined on $\Omega = \{(x, y) : x > 0, y > 0\}$ is uniformly continuous and find
its maximum value.

14. Consider the function $\langle x \mid a \rangle\, e^{-\|x\|^2}$ defined on $\mathbb{R}^n$, and find its maximum
and minimum values.

15. Determine the values of the parameters $a, b \in \mathbb{R}$ for which the surface
\[ z = e^{ax + y^2} + b \cos(x^2 + y^2) \]
is below or above its tangent plane at $(0, 0)$.


16. Prove that $x^3 + y^3 + 6xy$ is convex on $A = \{(x, y) : xy > 1, x > 0\}$.
17. Find a ball centred at $(0, 0, 0)$ on which the following function is convex:
\[ \log(1 + x^2 + y^2 + z^2). \]

18. We say that a function $f : \Omega \to \mathbb{R}$ is analytic on an open domain $\Omega \subset \mathbb{R}^n$
if it admits a power expansion centred at any point of $\Omega$, that is, for
every $(x_1^0, \dots, x_n^0) \in \Omega$ there are coefficients $(a_{k_1,\dots,k_n})$ such that
\[ f(x_1, \dots, x_n) = \sum_{k_1=0}^{\infty} \cdots \sum_{k_n=0}^{\infty} a_{k_1,\dots,k_n} (x_1 - x_1^0)^{k_1} \cdots (x_n - x_n^0)^{k_n} \]
for $(x_1, \dots, x_n)$ in a neighbourhood of $(x_1^0, \dots, x_n^0)$. Prove that an
analytic function in $\Omega$ is $C^\infty(\Omega)$.

19. Prove that the function defined by
\[ f(x, y) = \frac{\sin y - \sin x}{y - x} \]
if $x \neq y$ and $f(x, x) = \cos x$ is analytic on $\mathbb{R}^2$.


20. Find the minimal distance to the origin from the line where the
planes $x + 2y + z = 4$ and $3x + y + 2z = 3$ meet.
21. Find the points of the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$ that are closest to and
farthest from the plane $x + y + z = 10$.
22. Find the points of the ellipse $x^2/9 + y^2/4 = 1$ that are closest to and
farthest from $(1, 0)$.
23. Find the maximum and minimum values of $f(x, y) = x^2 - y^2$ on the set
\[ x^2/16 + y^2/9 \leq 1. \]

Chapter 5

Theorems of the inverse mapping and implicit functions

5.1 Theorem of the inverse mapping


A continuous real function defined on an interval has a (continuous) inverse if
and only if it is strictly monotone (increasing or decreasing). For two or more
variables, the existence of a global inverse is more complicated, so we will relax
our requirements to the existence of a local inverse for regular enough maps. In
order to set suitable hypotheses, let us analyse the one variable case first.
Monotonicity on a neighbourhood of a point $x_0$ is related to the condition
$f'(x_0) \neq 0$, but this condition is not necessary if we do not require regularity of
the inverse: the function $f(x) = x^3$ fails the condition at 0 and it has a local,
and global, continuous inverse, however $f^{-1}(y)$ fails to be differentiable at 0. On
the other hand, the condition $f'(x_0) \neq 0$ is not sufficient either for the existence
of a local inverse if we do not require regularity of the derivative: consider the
function $f(x) = x + 2x^2 \sin(1/x)$ for $x \neq 0$ and $f(0) = 0$, which is differentiable at
every point with $f'(0) = 1$, but fails to be monotone on every neighbourhood
of 0 and so it fails to have a local inverse too.
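
The counterexample above can be explored numerically. The following is a minimal sketch, with sample points chosen only for illustration: for $x \neq 0$ one has $f'(x) = 1 + 4x\sin(1/x) - 2\cos(1/x)$, which is close to $-1$ at the points $1/(2\pi k)$ and close to $3$ at the points $1/((2k+1)\pi)$, so the derivative changes sign arbitrarily near 0.

```python
import math

def f(x):
    # f(x) = x + 2 x^2 sin(1/x) for x != 0, and f(0) = 0
    return x + 2 * x * x * math.sin(1 / x) if x != 0 else 0.0

def df(x):
    # Derivative for x != 0; at 0 the difference quotient gives f'(0) = 1
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x) if x != 0 else 1.0

# Points tending to 0 where the derivative is negative ...
neg = [df(1 / (2 * math.pi * k)) for k in range(1, 6)]
# ... and points tending to 0 where it is positive.
pos = [df(1 / ((2 * k + 1) * math.pi)) for k in range(1, 6)]
# Difference quotients f(h)/h approach f'(0) = 1 as h -> 0.
quotients = [f(h) / h for h in (1e-2, 1e-4, 1e-6)]
```

Since the derivative takes both signs on every interval around 0, the function cannot be monotone there, although the difference quotients at 0 do approach 1.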

The first aim in this chapter is to prove the inverse mapping theorem for
maps defined on subsets of $\mathbb{R}^d$. In this context, the one variable condition
$f'(x_0) \neq 0$ is replaced by the nondegeneracy of $df(x_0)$, that is, it is invertible
as a map on $\mathbb{R}^d$. For that, we need a couple of lemmata to understand the local
behaviour of a $C^1$ map on $\mathbb{R}^d$. That information will play a crucial role in the
proof of the change of variables theorem for multiple integrals.

We will prove the first auxiliary results in the frame of Banach spaces
because we do not need any special property of $\mathbb{R}^d$. Before stating the first
lemma, let us state this “mean value theorem” for vector valued functions,
which is just a corollary of Theorem 4.1.5:

\[ \|f(x) - f(y)\| \leq \sup\{\|df(tx + (1 - t)y)\| : t \in [0, 1]\} \cdot \|x - y\| \]

where we suppose that $f$ is differentiable at every point of the segment joining $x$
and $y$ which, of course, is contained in the domain of $f$.
Lemma 5.1.1. Let $E$ be a Banach space and let $f : D \subset E \to E$ be a
differentiable map such that $B[0, r] \subset D$ for some $r > 0$, $f(0) = 0$, $df(0) = I$
and there is $0 < \eta < 1$ such that

\[ \|df(x) - I\| \leq \eta \]

for every $x \in B[0, r]$. Then, we have

(a) $(1 - \eta)\|x - y\| \leq \|f(x) - f(y)\| \leq (1 + \eta)\|x - y\|$ for any $x, y \in B[0, r]$;
(b) $B[0, (1 - \eta)r] \subset f(B[0, r]) \subset B[0, (1 + \eta)r]$;
(c) there are neighbourhoods $U$ and $V$ of 0 such that $f$ is a homeomorphic
bijection from $U$ onto $V$;
(d) moreover, $f^{-1}$ (defined on $V$) is differentiable at 0 with $d(f^{-1})(0) = I$.
Proof. Applying the mean value theorem to g(x) = f (x) − x and noticing
that ∥dg(x)∥ = ∥df (x) − I∥ ≤ η if x ∈ B[0, r] we have

∥f (x) − f (y) − (x − y)∥ ≤ η∥x − y∥.

In particular, for y = 0, we have ∥f (x) − x∥ ≤ η∥x∥ that we will need later.


The triangle inequality implies that

|∥f (x) − f (y)∥ − ∥x − y∥| ≤ η∥x − y∥

and so we obtain the desired inequality of statement (a)

∥x − y∥ − η∥x − y∥ ≤ ∥f (x) − f (y)∥ ≤ ∥x − y∥ + η∥x − y∥.

Now observe that if x ∈ B[0, r] then ∥f (x)∥ = ∥f (x) − f (0)∥ ≤ (1 + η)∥x∥
and therefore f (x) ∈ B[0, (1 + η)r], which is the right-hand side set inclusion
of statement (b).
The other set inclusion is more delicate and requires Banach’s fixed point
theorem. Assume that y ∈ B[0, (1 − η)r]. We want to find x ∈ B[0, r] such
that y = f (x). Observe that such a point x is a fixed point of the map

ϕ(x) := x − f (x) + y

We claim that ϕ is a contractive map from B[0, r] into itself. Indeed, if
x ∈ B[0, r] then

∥ϕ(x)∥ ≤ ∥x − f (x)∥ + ∥y∥ ≤ η∥x∥ + ∥y∥ ≤ η r + (1 − η)r = r

so ϕ(x) ∈ B[0, r]. Assume now that x, z ∈ B[0, r]. Then

∥ϕ(x)−ϕ(z)∥ = ∥x−f (x)+y−z+f (z)−y∥ = ∥f (z)−f (x)−(z−x)∥ ≤ η∥x−z∥.

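Since η < 1, the map ϕ is contractive as desired. As B[0, r] is a closed subset
of the Banach space E, hence complete, Banach’s fixed point theorem gives
x ∈ B[0, r] with ϕ(x) = x, that is, f (x) = y.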
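
The fixed point argument is constructive, and the iteration can be tried out directly. The sketch below is only an illustration under stated assumptions: the map `f` is a hypothetical near-identity map on $\mathbb{R}^2$ (not taken from the text) with $f(0) = 0$, $df(0) = I$ and $\|df(x) - I\|$ of order $0.2$ on the unit ball, and the target `y` is chosen well inside $B[0, (1 - \eta)r]$.

```python
import math

def f(p):
    # Hypothetical near-identity map on R^2 with f(0) = 0 and df(0) = I
    x, y = p
    return (x + 0.1 * math.sin(y), y + 0.1 * x * x)

def solve(target, steps=60):
    # Iterate phi(x) = x - f(x) + y starting from x = 0
    p = (0.0, 0.0)
    for _ in range(steps):
        fx = f(p)
        p = (p[0] - fx[0] + target[0], p[1] - fx[1] + target[1])
    return p

y = (0.3, -0.2)
x = solve(y)
# Size of f(x) - y after the iteration
residual = max(abs(a - b) for a, b in zip(f(x), y))
```

Each step contracts the error by a factor of at most about $\eta$, so a few dozen iterations already reach machine precision in this example.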


In order to prove (c), note that the inequality in (a) implies that f is one-
to-one on B[0, r], Lipschitz and the inverse f⁻¹ defined on f (B[0, r]) is also
Lipschitz. Therefore, f |B[0,r] is a homeomorphism of B[0, r] onto f (B[0, r]).
Note that f (∂B[0, r]) is closed in E because f⁻¹ preserves Cauchy sequences
and completeness equals closedness in E. Since 0 ̸∈ f (∂B[0, r]), we may take
0 < δ ≤ (1 − η)r such that B(0, δ) ∩ f (∂B[0, r]) = ∅. Then we set V = B(0, δ) and
U = B(0, r) ∩ f⁻¹(V ). By (b) we have V ⊂ f (B[0, r]), so the choice of U and V
implies that f is a bijection between them.
Finally statement (d). In order to show that f⁻¹ is differentiable at 0, we
have to prove that ∥f⁻¹(y) − y∥ = o(∥y∥) where y ∈ V . Fix ε > 0. As f is
differentiable at 0, there is δ > 0 such that

∥f (x) − x∥ = ∥f (x) − f (0) − I(x − 0)∥ < (1 − η)ε∥x∥

for ∥x∥ < δ. Put x = f⁻¹(y) and observe that if y ∈ f (B(0, δ)) we have

∥f⁻¹(y) − y∥ = ∥x − f (x)∥ ≤ (1 − η)ε∥x∥ ≤ ε∥y∥,

where we are using the inequality (1 − η)∥x∥ ≤ ∥f (x)∥ = ∥y∥ from (a). Note that
the last inequality holds for every y ∈ f (B(0, δ)), which is a neighbourhood of 0. As
ε > 0 was arbitrary, that implies the differentiability of f⁻¹ at 0 with d(f⁻¹)(0) = I.
Lemma 5.1.2. Let $f : D \subset E \to E$ be a differentiable map and let $x_0 \in D$ be
such that $df$ is continuous at $x_0$ and $df(x_0)$ has a continuous inverse. Then
for every $0 < \eta < 1$ there exists $\delta > 0$ such that $f|_{B[x_0, \delta]}$ is one-to-one, $f^{-1}$ is
differentiable at $f(x_0)$ and
\[ f(x_0) + df(x_0)(B[0, (1 - \eta)r]) \subset f(B[x_0, r]) \subset f(x_0) + df(x_0)(B[0, (1 + \eta)r]) \]
for every $0 \leq r \leq \delta$. In particular, the image through $f$ of a neighbourhood of
$x_0$ is a neighbourhood of $f(x_0)$. Moreover, $f(U)$ is open whenever $U \subset D$ is
open and $df(x)$ has a continuous inverse at every point $x \in U$.
For the proof we will use the shift map $\tau_h(x) = x + h$.

Proof. Put $y_0 = f(x_0)$ and consider the map
\[ g := df(x_0)^{-1} \circ \tau_{y_0}^{-1} \circ f \circ \tau_{x_0}. \]
Observe that $g(0) = 0$ and $dg(0) = I$ by the chain rule. Moreover, $dg$ is
continuous at 0 and thus there is $\delta > 0$ such that $\|dg(x) - I\| \leq \eta$ for every
$x \in B[0, \delta]$. The application of the previous lemma for $0 < r \leq \delta$ gives us
\[ B[0, (1 - \eta)r] \subset g(B[0, r]) \subset B[0, (1 + \eta)r] \]
and applying $df(x_0)$ to all the members we have
\[ df(x_0)(B[0, (1 - \eta)r]) \subset f(B[x_0, r]) - y_0 \subset df(x_0)(B[0, (1 + \eta)r]) \]
which is equivalent to the set inclusion of the statement. The differentiability
of $f^{-1}$ at $y_0$ is a consequence of the expression
\[ f^{-1} = \tau_{x_0} \circ g^{-1} \circ df(x_0)^{-1} \circ \tau_{y_0}^{-1}. \]
For the last part, just observe that $x + df(x)(B(0, \xi))$ is a neighbourhood of $x$
for every $\xi > 0$ whenever $df(x)$ is nonsingular.
Remark 5.1.3. Note that the $\delta > 0$ given by the previous lemma depends on
the modulus of continuity of $df(x)$ and an upper bound for $\|df(x)^{-1}\|$. Therefore,
if $f$ is $C^1$ then it is easy to modify the statement to get the same $\delta$ for all
the points on a certain neighbourhood of $x_0$.
We can state now the inverse mapping theorem.
Theorem 5.1.4. Let $f : D \subset \mathbb{R}^d \to \mathbb{R}^d$ be a $C^k$ map with $k \geq 1$ and let
$x_0 \in D$ be such that $df(x_0)$ is nonsingular. Then there exist neighbourhoods $U$ of
$x_0$ and $V$ of $y_0 = f(x_0)$ such that $f$ is a bijection of $U$ onto $V$ and the inverse
map $f^{-1}$ defined on $V$ is $C^k$.

Proof. Since $f$ is of class $C^1$, we may restrict our attention to a neighbourhood of
$x_0$ where $df$ is nonsingular. By the previous lemma we may fix neighbourhoods
$U$ of $x_0$ and $V$ of $y_0$ such that $f$ is a bijection of $U$ onto $V$. For any $x \in U$ the
application of the previous lemma gives us that $f^{-1}$ is differentiable at $f(x)$,
thus $d(f^{-1})(y)$ is defined for every $y \in V$. Before proving that $f^{-1}$ is $C^k$,
note that the map sending the nonsingular linear maps $A$ on $\mathbb{R}^d$ to their inverses
$A^{-1}$ is $C^\infty$. Indeed, use the matrix expression for $A$ and observe that the
coefficients of the matrix of $A^{-1}$ are polynomials in the coefficients of $A$ divided
by the determinant, which is a nonvanishing polynomial in the coefficients of
$A$. Now we proceed by induction: if $k = 1$ then $d(f^{-1})(y) = (df(f^{-1}(y)))^{-1}$
is continuous as a composition of continuous maps and so $f^{-1}$ is $C^1$. Assume
that the theorem is proven for $k - 1$ and $f$ is $C^k$. In such a case we know
that $df$ is $C^{k-1}$ and, by the induction hypothesis, $f^{-1}$ is $C^{k-1}$. Therefore,
$d(f^{-1})(y) = (df(f^{-1}(y)))^{-1}$ is $C^{k-1}$ as a composition of $C^{k-1}$ maps, which means
that $f^{-1}$ is $C^k$.
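
The identity $d(f^{-1})(y) = (df(f^{-1}(y)))^{-1}$ used in the proof can be checked numerically on a concrete map. The following is a minimal sketch under stated assumptions: the map $f(x, y) = (x^2 - y^2, 2xy)$ and the base point $(1, 0.5)$ are chosen for illustration, and the local inverse near that point is taken to be the principal complex square root.

```python
import cmath

def f(x, y):
    # f(x, y) = (x^2 - y^2, 2xy), i.e. z -> z^2 in complex notation
    return (x * x - y * y, 2 * x * y)

def f_inv(u, v):
    # Local inverse near points with x > 0: the principal square root
    z = cmath.sqrt(complex(u, v))
    return (z.real, z.imag)

def inverse_of_df(x, y):
    # df(x, y) = [[2x, -2y], [2y, 2x]]; invert the 2x2 matrix by hand
    det = 4 * (x * x + y * y)
    return [[2 * x / det, 2 * y / det], [-2 * y / det, 2 * x / det]]

def numeric_jacobian(g, u, v, h=1e-6):
    # Central differences, one column per variable
    gu = [(a - b) / (2 * h) for a, b in zip(g(u + h, v), g(u - h, v))]
    gv = [(a - b) / (2 * h) for a, b in zip(g(u, v + h), g(u, v - h))]
    return [[gu[0], gv[0]], [gu[1], gv[1]]]

x0, y0 = 1.0, 0.5
u0, v0 = f(x0, y0)
lhs = numeric_jacobian(f_inv, u0, v0)   # approximates d(f^{-1})(y0)
rhs = inverse_of_df(x0, y0)             # (df(x0))^{-1}
error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The two matrices agree up to the accuracy of the finite differences.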

5.2 The implicit function theorem and smooth manifolds
Now we are interested in the following problem: given an equation F (x, y) =
0, find conditions to solve it in the form y = f (x) for some range of values
of x. Of course, not for every x there is y such that F (x, y) = 0, and when
such a y exists it is not necessarily unique. It seems natural to find conditions
ensuring that the function f is as regular as possible and not a random choice
of solutions. For that, a reference value for f should be fixed in the form of a
particular solution (x0 , y0 ) of the equation. If we think of the set F (x, y) = 0
as a curve on the plane, we need it to look like a graph around (x0 , y0 ), so
we should avoid having a vertical tangent there.
Theorem 5.2.1. Let $F : D \subset \mathbb{R}^{n+m} \to \mathbb{R}^m$ be a $C^k$ map, $k \geq 1$. Assume that
$(x_0, y_0) \in D$ is such that $F(x_0, y_0) = 0$ and $dF(x_0, y_0)$ is nonsingular when
restricted to the subspace $\{0\} \times \mathbb{R}^m$. Then there are neighbourhoods $U$ of $x_0$ and
$V$ of $y_0$ with $U \times V \subset D$ and such that for every $x \in U$ there exists a unique
$y \in V$ such that $F(x, y) = 0$, and the map $f : U \to V$ defined by $f(x) = y$ in
that way is $C^k$.
Proof. Consider the map $G : D \subset \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}$ defined by $G(x, y) = (x, F(x, y))$
and observe that $dG(x_0, y_0)$ is nonsingular because of the block decomposition
of its matrix. Therefore, by Theorem 5.1.4, $G$ has a $C^k$ inverse defined on a
neighbourhood of $G(x_0, y_0) = (x_0, 0)$, that we may assume of the form $U \times B(0, \delta)$,
with values in a neighbourhood of $(x_0, y_0)$, that we may assume of the form $U \times V$. The
condition $U \times V \subset D$ can be achieved by shrinking $U$ and $\delta > 0$. Note that if
$x \in U$, then $G^{-1}(x, 0) = (x, y)$ with $y \in V$, and thus $F(x, y) = 0$. That point
$y$ is unique because $G$ is injective on $U \times V$, therefore we may define $f(x) = y$.
Now, the map $f$ can be written as the composition of $G^{-1}$ with a couple of
linear maps, namely the inclusion $x \mapsto (x, 0)$ and the projection $(x, y) \mapsto y$,
and thus it is $C^k$.

Once we know the existence of the implicit function $f = (f_1, \dots, f_m)$ around
$(\bar{x}_0, \bar{y}_0)$ we may be interested in computing its partial derivatives. For that
aim it is enough to differentiate with respect to $x_j$ the equalities
\[ F_i(x_1, \dots, x_n, f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0 \]
for $1 \leq i \leq m$. Then assign the value $\bar{x}_0$ to $(x_1, \dots, x_n)$ and the derivatives
$\frac{\partial f_i}{\partial x_j}$ with $1 \leq i \leq m$ will show up as the solution of a linear system whose
matrix is composed of the last $m$ columns of the matrix of $dF(x_0, y_0)$, which is
nonsingular by hypothesis (note that the possibility of obtaining the derivatives
as the unique solution of the system implies nonsingularity). The derivatives of
higher order can be obtained by further differentiation of the formulas.
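
In the simplest case $n = m = 1$ the linear system reduces to one equation, $\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} f'(x) = 0$. The sketch below compares that formula with a finite difference of the implicit function itself. It is only an illustration: the equation $F(x, y) = y^3 + y - x$ and the point $(2, 1)$ are hypothetical choices, and the implicit function is computed by Newton's method.

```python
def F(x, y):
    # Hypothetical example: F(x, y) = y^3 + y - x, with F(2, 1) = 0
    return y ** 3 + y - x

def dF_dx(x, y):
    return -1.0

def dF_dy(x, y):
    return 3 * y * y + 1  # never zero, so the theorem applies at every solution

def implicit_f(x, y_start=1.0, steps=50):
    # Solve F(x, y) = 0 for y by Newton's method, starting near y0
    y = y_start
    for _ in range(steps):
        y -= F(x, y) / dF_dy(x, y)
    return y

x0, y0 = 2.0, 1.0
# Differentiating F(x, f(x)) = 0 gives dF/dx + dF/dy * f'(x) = 0
slope_formula = -dF_dx(x0, y0) / dF_dy(x0, y0)
h = 1e-5
slope_numeric = (implicit_f(x0 + h) - implicit_f(x0 - h)) / (2 * h)
```

Here `slope_formula` equals $1/4$, and the finite difference of the numerically computed implicit function agrees with it.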

Smooth manifolds. A $d$-dimensional smooth manifold in $\mathbb{R}^n$, for $1 \leq d < n$,
defined implicitly is a set of the form
\[ M = \{x \in \mathbb{R}^n : F(x) = 0\} \]
where $F : \mathbb{R}^n \to \mathbb{R}^{n-d}$ is at least $C^1$ and $dF(x)$ has rank $n - d$ at every
$x \in M$. Smooth manifolds defined that way appear very often in Analysis, for
instance in optimization problems. We will prove that every point of $M$ has a
neighbourhood that can be parameterized by $d$ variables, that is, like the graph
of a map from $\mathbb{R}^d$ to $\mathbb{R}^{n-d}$. That allows us to reduce constrained optimization
problems to unconstrained ones, for instance.
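Proposition 5.2.2. Let $M \subset \mathbb{R}^n$ be a $d$-dimensional smooth manifold and $x_0 \in M$.
Then there is a subset $A \subset \{1, \dots, n\}$ of cardinality $d$, a neighbourhood $U$
of $x_0$ and $f : \pi_A(U) \to \mathbb{R}^{n-d}$ such that $M \cap U$ is the graph of $f$, where $\pi_A$
denotes the projection onto the coordinates indexed by $A$.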
Note that the tangent space to an implicitly defined smooth manifold can
be expressed also implicitly. Indeed, at a point $x_0 \in M$, the tangent space is
the $d$-dimensional subspace
\[ T_{x_0} M = \{x \in \mathbb{R}^n : dF(x_0)(x) = 0\}. \]
Do not confuse it with the tangent manifold at $x_0$, that is, the affine space
$x_0 + T_{x_0} M$. It is not difficult to prove that, for an $(n - 1)$-dimensional manifold
that can be expressed as the graph of a real function, the tangent manifold
coincides with the tangent plane introduced in Section 4.1.

The previous proposition only gives local information on the set, that is,
being a manifold. Additional properties have to be obtained by different tech-
niques.

Example 5.2.3. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a convex $C^1$ function that attains its minimum
$m$ exactly at $(0, 0)$. Show that for any $\lambda > m$, the set

\[ M_\lambda = \{(x, y) \in \mathbb{R}^2 : f(x, y) = \lambda\} \]

is a $C^1$ manifold that is homeomorphic to the circle $\mathbb{T}$.


Firstly note that the only critical points that a convex function can have are
those where the minimum is attained. In this case, the only point where both
partial derivatives vanish at once is $(0, 0)$. That implies that
\[ \left(\frac{\partial f}{\partial x}(x, y), \frac{\partial f}{\partial y}(x, y)\right) \neq (0, 0) \]
for $(x, y) \neq (0, 0)$, or equivalently, for $f(x, y) > m$. Therefore, $M_\lambda$ is a $C^1$
manifold.
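The global result requires a more detailed analysis. Let $r > 0$ and take
\[ s = \inf\{f(x, y) : x^2 + y^2 = r^2\}, \]
which satisfies $s > m$ because the circle is compact and the minimum is attained
only at $(0, 0)$. The convexity of $f$ along each half-line from the origin implies that
\[ f(x, y) \geq m + \frac{s - m}{r}\sqrt{x^2 + y^2} \]
for $x^2 + y^2 \geq r^2$. We easily deduce that $M_\lambda$ is bounded, and thus compact
(it is evidently closed). We also deduce that $f$ is strictly increasing and unbounded
on any half-line starting at $(0, 0)$. The mapping $\phi : M_\lambda \to \mathbb{T}$ defined by
\[ \phi(x, y) = \frac{(x, y)}{\sqrt{x^2 + y^2}} \]
is continuous, one-to-one and, by the intermediate value theorem applied on each
half-line, onto. Since $M_\lambda$ is compact, that implies $\phi$ is a homeomorphism and
therefore $M_\lambda$ is homeomorphic to $\mathbb{T}$, as wished.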
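
The inverse of $\phi$ can be computed in practice: for each direction, the monotonicity of $f$ along the half-line lets us locate the unique point of $M_\lambda$ by bisection. The sketch below is an illustration only; the convex function $f(x, y) = x^2 + 2y^2 + x^4$ (with $m = 0$), the level $\lambda = 3$ and the search radius are hypothetical choices.

```python
import math

def f(x, y):
    # Hypothetical convex C^1 function with minimum m = 0 exactly at (0, 0)
    return x * x + 2 * y * y + x ** 4

def radius(theta, lam, hi=10.0, steps=80):
    # f is strictly increasing along the half-line of angle theta, so bisection
    # finds the unique rho > 0 with f(rho cos(theta), rho sin(theta)) = lam
    lo = 0.0
    c, s = math.cos(theta), math.sin(theta)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid * c, mid * s) < lam:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = 3.0
points = []
for k in range(12):
    theta = 2 * math.pi * k / 12
    rho = radius(theta, lam)
    points.append((rho * math.cos(theta), rho * math.sin(theta)))
```

Each angle yields exactly one point of the level set, which is the numerical counterpart of $\phi$ being a bijection onto the circle.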

Elimination of variables. When we have a system of equations like

f (x, y, z) = 0
g(x, y, z) = 0

we may consider the possibility of eliminating one of the variables, say $z$, in
order to obtain a simpler relation satisfied by the two remaining variables,
namely $h(x, y) = 0$. Geometrically, that is equivalent to finding the equation
satisfied by the orthogonal projection onto the XY plane of the one-dimensional
manifold (curve) in $\mathbb{R}^3$ defined by the previous system. Typically, this projection
is a one-dimensional manifold in $\mathbb{R}^2$, however there could be singular points, for
instance when the tangent vector has null projection on the XY plane. Indeed,
assume that the curve in $\mathbb{R}^3$ is parameterized by $t$ and take the derivatives of
the composed functions
\[ \frac{\partial f}{\partial x} x' + \frac{\partial f}{\partial y} y' + \frac{\partial f}{\partial z} z' = 0, \]
\[ \frac{\partial g}{\partial x} x' + \frac{\partial g}{\partial y} y' + \frac{\partial g}{\partial z} z' = 0. \]
If $t$ is such that $(x'(t), y'(t)) = (0, 0)$ and $z'(t) \neq 0$, that forces
$\left(\frac{\partial f}{\partial z}, \frac{\partial g}{\partial z}\right) = (0, 0)$ at that point. For that reason, the result about elimination
of variables has to be local. Given a point $(x_0, y_0, z_0)$ we can ensure
that $z$ can be eliminated locally if $\frac{\partial f}{\partial z}(x_0, y_0, z_0) \cdot \frac{\partial g}{\partial z}(x_0, y_0, z_0) \neq 0$. Indeed, in
that case Theorem 5.2.1 gives functions $\phi, \gamma$ defined on a neighbourhood
of $(x_0, y_0)$ such that
\[ f(x, y, \phi(x, y)) = 0, \]
\[ g(x, y, \gamma(x, y)) = 0 \]
with $\phi(x_0, y_0) = z_0$ and $\gamma(x_0, y_0) = z_0$. In that case $h(x, y) = \phi(x, y) - \gamma(x, y)$
realises the elimination of $z$ on a neighbourhood of $(x_0, y_0, z_0)$.

5.3 Some applications


Our exposition throughout the section will be carried out with the minimum
number of variables needed, namely two, for the sake of simplicity.

5.3.1 Lagrange multipliers
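Let $f : M \to \mathbb{R}$ be a $C^1$ function defined on a 1-dimensional $C^1$ manifold
$M = \{(x, y) : g(x, y) = 0\}$ (a curve in $\mathbb{R}^2$). We look for the relative extrema
(maxima or minima) of $f$ on $M$. Assume $(x_0, y_0) \in M$ is one of such
points. We can represent $M$ around $(x_0, y_0)$ by a $C^1$ parameterization
$(x(t), y(t))$ with $x_0 = x(0)$, $y_0 = y(0)$. Necessarily we have
\[ \frac{d}{dt} f(x(t), y(t)) \Big|_{t=0} = 0. \]

Applying the chain rule, that is equivalent to

\[ \frac{\partial f}{\partial x}(x_0, y_0)\, x'(0) + \frac{\partial f}{\partial y}(x_0, y_0)\, y'(0) = 0. \]

The chain rule applied to $g(x(t), y(t)) = 0$ at $t = 0$ gives

\[ \frac{\partial g}{\partial x}(x_0, y_0)\, x'(0) + \frac{\partial g}{\partial y}(x_0, y_0)\, y'(0) = 0. \]

As $(x'(0), y'(0)) \neq (0, 0)$, the vectors
\[ \nabla f(x_0, y_0) = \left(\frac{\partial f}{\partial x}(x_0, y_0), \frac{\partial f}{\partial y}(x_0, y_0)\right), \]
\[ \nabla g(x_0, y_0) = \left(\frac{\partial g}{\partial x}(x_0, y_0), \frac{\partial g}{\partial y}(x_0, y_0)\right) \]
are linearly dependent, both being orthogonal to the same nonzero vector of $\mathbb{R}^2$.
From a geometrical point of view, that means that the
level curves of $f$ and $g$ are tangent at $(x_0, y_0)$. The hypothesis on $g$ ($M$ is a
manifold, so $\nabla g(x_0, y_0) \neq 0$) implies the existence of $\lambda \in \mathbb{R}$ such that

\[ \nabla f(x_0, y_0) + \lambda \nabla g(x_0, y_0) = 0. \]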

Note that the argument works the other way around, so the existence of such
a $\lambda$ implies that $(x_0, y_0)$ is a critical point of $f$ on $M$. Consequently, we deduce
that the extrema of $f$ on $M$ are contained among the solutions of the equations

\[ \nabla f(x, y) + \lambda \nabla g(x, y) = 0, \qquad g(x, y) = 0. \tag{5.1} \]

Curiously, solving the system (5.1) is equivalent to searching for the critical
points of the function
\[ F(x, y, \lambda) = f(x, y) + \lambda g(x, y). \]
The new variable $\lambda$ is called Lagrange multiplier and its introduction reduces
the constrained problem of extrema to an unconstrained problem. That can
be done in similar terms with more variables and constraints, adding one
multiplier for each constraint. For instance, looking for the extrema of $f(x, y, z)$ on
the 1-dimensional manifold $\{(x, y, z) : g(x, y, z) = h(x, y, z) = 0\}$ is equivalent
to investigating the critical points of
\[ F(x, y, z, \lambda, \nu) = f(x, y, z) + \lambda g(x, y, z) + \nu h(x, y, z). \]
Example 5.3.1. The Cobb-Douglas production function (with 3 variables)
is a function that models the profits after an investment in different stages of
the manufacturing of a product: materials, machinery... and maybe tech and
marketing too. The function has the form
\[ f(x, y, z) = c\, x^\alpha y^\beta z^\gamma, \]
where $c, \alpha, \beta, \gamma > 0$ and by homogeneity we should have $\alpha + \beta + \gamma = 1$.
We wish to maximize the production $f$ with a limited budget $x + y + z \leq m$.
Obviously, we can restrict ourselves to a budget equal to $m$. Dropping the
constant $c$, which does not affect the maximizer, as Lagrange auxiliary function
we can take
\[ F(x, y, z, \lambda) = x^\alpha y^\beta z^\gamma + \lambda(x + y + z - m). \]
The partial derivatives with respect to $x, y, z$ should be zero:
\[ \alpha x^{\alpha-1} y^\beta z^\gamma + \lambda = 0, \]
\[ \beta x^\alpha y^{\beta-1} z^\gamma + \lambda = 0, \]
\[ \gamma x^\alpha y^\beta z^{\gamma-1} + \lambda = 0. \]
Multiplying by $x, y, z$ respectively and adding we get
\[ (\alpha + \beta + \gamma) x^\alpha y^\beta z^\gamma + \lambda(x + y + z) = x^\alpha y^\beta z^\gamma + \lambda m = 0. \]
Using that information in the first equation we get
\[ \alpha x^{\alpha-1} y^\beta z^\gamma = m^{-1} x^\alpha y^\beta z^\gamma, \]
therefore $x = \alpha m$. Analogously, $y = \beta m$ and $z = \gamma m$.
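
The conclusion $x = \alpha m$, $y = \beta m$, $z = \gamma m$ can be sanity-checked by comparing with random points of the budget constraint. This is a minimal sketch with hypothetical data (the exponents, the budget and the number of trials are illustrative choices, and $c = 1$).

```python
import random

alpha, beta, gamma, m = 0.5, 0.3, 0.2, 10.0  # hypothetical data, alpha + beta + gamma = 1

def production(x, y, z):
    # Cobb-Douglas function with c = 1
    return x ** alpha * y ** beta * z ** gamma

# Candidate maximizer given by the Lagrange multiplier computation
best = production(alpha * m, beta * m, gamma * m)

# At the candidate, the three partial derivatives coincide (they all equal -lambda)
partials = [alpha * best / (alpha * m), beta * best / (beta * m), gamma * best / (gamma * m)]

random.seed(0)
trials = []
for _ in range(10000):
    # Random point with x + y + z = m and x, y, z >= 0
    a, b = sorted(random.uniform(0, m) for _ in range(2))
    trials.append(production(a, b - a, m - b))
largest_trial = max(trials)
```

No random feasible point beats the candidate, in agreement with the computation above.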

5.3.2 Functional dependence
Now we will discuss functional dependence. It is an easy task to check that
the functions $\cos x$ and $\sin x$ are linearly independent. However they are
algebraically dependent since $\cos^2 x + \sin^2 x = 1$. More generally, we say that the
functions $f_k : D \subset \mathbb{R}^n \to \mathbb{R}$ for $k = 1, \dots, m$ are functionally dependent if
there is a nontrivial $F : \Omega \subset \mathbb{R}^m \to \mathbb{R}$ such that

\[ F(f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)) = 0. \]

Here “nontrivial” means that $dF$ has maximal rank. Note that any pair of $C^1$ one
variable functions $f$ and $g$ are functionally dependent on some interval. Indeed,
we may assume that $(f'(x_0), g'(x_0)) \neq (0, 0)$ for some $x_0$, otherwise both functions are
constant and so dependent. Therefore, one of the functions is locally monotone around $x_0$.
Let us assume it is $f$, and thus $f^{-1}$ is defined on some neighbourhood of $f(x_0)$. Now
note that $F(f(x), g(x)) = 0$ where $F(u, v) = g(f^{-1}(u)) - v$ is nontrivial
(rank 1). For a couple of functions $f$ and $g$ defined on an open subset of
$\mathbb{R}^2$, their functional dependence is locally equivalent to a relation of the form
$g(x, y) = G(f(x, y))$ thanks to the implicit function theorem. Observe that
$\nabla g = G' \nabla f$ at every point and thus
\[ \frac{\partial(f, g)}{\partial(x, y)} = 0, \]
where we are using the standard notation for the Jacobian determinant.

In general, if $m > n$ in the definition above the functions are necessarily
functionally dependent. That is a consequence of this important result.
Theorem 5.3.2. Let $f_k : D \subset \mathbb{R}^n \to \mathbb{R}$ for $k = 1, \dots, n$ be functions of class at least
$C^1$. If they are functionally dependent, then
\[ \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)} = 0 \]
on $D$. Conversely, if the Jacobian vanishes on $D$ then the functions $(f_k)$ are
functionally dependent locally at any point of $D$.
Proof. Suppose that the last function can be expressed in terms of the others
by means of some nontrivial $C^1$ function $G(y_1, \dots, y_{n-1})$ as

\[ f_n(x_1, \dots, x_n) = G(f_1(x_1, \dots, x_n), \dots, f_{n-1}(x_1, \dots, x_n)). \]

Then the chain rule implies
\[ \nabla f_n = \sum_{k=1}^{n-1} \frac{\partial G}{\partial y_k} \nabla f_k \]
and so the Jacobian vanishes. The converse is a little more technical, thus we
will prove the particular case $n = 2$, which is enough to show the ideas behind it.
Suppose we are given functions $f(x, y)$ and $g(x, y)$ such that
\[ \frac{\partial(f, g)}{\partial(x, y)} = 0 \]
on $D \subset \mathbb{R}^2$. Consider the system of equations
\[ u - f(x, y) = 0, \]
\[ v - g(x, y) = 0. \]
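Assume that the entries of $\frac{\partial(f, g)}{\partial(x, y)}$ do not all vanish at once on any open
subset, otherwise all the functions are constant there and so they are functionally
dependent. Without loss of generality we may assume that $\frac{\partial f}{\partial y} \neq 0$ on some
open subset. In that case, we may use the first equation to solve $y$ as a function
of $(x, u)$, that is, $y = \phi(x, u)$. Later we will need the derivative $\frac{\partial \phi}{\partial x}$ expressed
in terms of $f$. That can be done by implicit differentiation
\[ 0 = \frac{\partial}{\partial x}\bigl(u - f(x, \phi(x, u))\bigr) = -\frac{\partial f}{\partial x} - \frac{\partial f}{\partial y}\frac{\partial \phi}{\partial x}, \]
therefore $\frac{\partial \phi}{\partial x} = -\left(\frac{\partial f}{\partial y}\right)^{-1} \frac{\partial f}{\partial x}$. Consider the composition

\[ G(x, u) = g(x, \phi(x, u)). \]

Now we compute the partial derivative with respect to $x$
\[ \frac{\partial G}{\partial x} = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\frac{\partial \phi}{\partial x} = \frac{\partial g}{\partial x} - \frac{\partial g}{\partial y}\left(\frac{\partial f}{\partial y}\right)^{-1}\frac{\partial f}{\partial x} = -\left(\frac{\partial f}{\partial y}\right)^{-1}\frac{\partial(f, g)}{\partial(x, y)} = 0. \]
That means that actually $G$ depends only on $u$; the substitution $u = f(x, y)$
and $\phi(x, u) = y$ in order to remove $u$ gives
\[ G(f(x, y)) = g(x, y) \]
which is a functional dependence valid on some open subset.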
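
The direct implication of the theorem is easy to observe numerically. The sketch below is only an illustration with hypothetical functions: $g$ is built as $G(f)$ with $G(u) = \sin u + u^2$, and the Jacobian determinant is approximated by central differences at a few sample points.

```python
import math

def f(x, y):
    return x + y * y

def g(x, y):
    # g = G(f) with G(u) = sin(u) + u^2, so f and g are functionally dependent
    u = x + y * y
    return math.sin(u) + u * u

def jacobian_det(x, y, h=1e-6):
    # Central-difference approximation of d(f, g)/d(x, y)
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    gx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    gy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return fx * gy - fy * gx

samples = [(0.3, -1.2), (1.0, 0.5), (-0.7, 2.0)]
dets = [jacobian_det(x, y) for x, y in samples]
```

All the computed determinants vanish up to the error of the finite differences, as the chain rule predicts.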

5.3.3 Envelope of a family of curves.
Consider a family of curves in $\mathbb{R}^2$ depending on a parameter. The most general
way to express such a family is the implicit form

f (x, y, t) = 0

where $t$ is the parameter. Sometimes it happens that there exists a curve
$\phi(x, y) = 0$ that meets each curve of the family at exactly one point, for
every value of $t$, and such that the curves $\phi(x, y) = 0$ and $f(x, y, t) = 0$ are
tangent there. For instance, the straight lines of the family

\[ x \cos t + y \sin t = 1 \]

are tangent to the circle $x^2 + y^2 = 1$. We say that such a curve $\phi(x, y) = 0$ is
the (or an) envelope of the family. We are going to show how to obtain $\phi$ from
$f$. First of all, note that the curves $f(x, y, t_0) = 0$ and $f(x, y, t) = 0$ for $t \neq t_0$
meet typically at some point $(x(t), y(t))$. If $t \sim t_0$ the point $(x(t), y(t))$ is close
to the envelope. We could formalize this by saying that

\[ (x_0, y_0) = \lim_{t \to t_0} (x(t), y(t)) \in \{(x, y) : \phi(x, y) = 0\} \]

if such a limit exists. Moreover, in such a case we have

\[ 0 = \lim_{t \to t_0} \frac{f(x(t), y(t), t) - f(x(t), y(t), t_0)}{t - t_0} = \frac{\partial f}{\partial t}(x_0, y_0, t_0) \]

provided that $f$ is $C^1$. Therefore, if the curves $f(x, y, t_0) = 0$ and $\phi(x, y) = 0$
meet at the point $(x_0, y_0)$ then
\[ f(x_0, y_0, t_0) = 0, \qquad \frac{\partial f}{\partial t}(x_0, y_0, t_0) = 0. \tag{5.2} \]
We can find $\phi(x, y)$ by removing $t$ from the system of equations
\[ \begin{cases} f(x, y, t) = 0, \\ \dfrac{\partial f}{\partial t}(x, y, t) = 0. \end{cases} \tag{5.3} \]
That can be done likewise for envelopes of families of surfaces depending on
one parameter, or families of spatial curves depending on two parameters.
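
For the family of lines $x \cos t + y \sin t = 1$ mentioned above, the system (5.3) is linear in $(x, y)$ and can be solved for each $t$. The following is a minimal sketch (the helper name `envelope_point` and the sample values of $t$ are illustrative) that solves it by Cramer's rule and confirms that the solutions lie on the unit circle.

```python
import math

def envelope_point(t):
    # Solve f = 0 and df/dt = 0 for the family f(x, y, t) = x cos t + y sin t - 1:
    #    cos(t) x + sin(t) y = 1
    #   -sin(t) x + cos(t) y = 0
    a, b, c, d = math.cos(t), math.sin(t), -math.sin(t), math.cos(t)
    det = a * d - b * c  # equals cos^2 t + sin^2 t = 1
    x = (1 * d - b * 0) / det  # Cramer's rule
    y = (a * 0 - c * 1) / det
    return x, y

points = [envelope_point(2 * math.pi * k / 8) for k in range(8)]
radii = [x * x + y * y for x, y in points]
```

Every solution is the point $(\cos t, \sin t)$, so eliminating $t$ gives $x^2 + y^2 = 1$.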

Example 5.3.3. Find the envelope of all the trajectories of an object which
is thrown from the same point, at the same speed and only affected by the
(uniform) gravitational force.
Without loss of generality, the object departs from the origin. We will consider
only the trajectories contained in a vertical plane XY (the spatial case will
follow by symmetry). Let $v$ be the speed, $\theta$ the launch angle and $g$ the
gravitational force per unit of mass. Elementary Newtonian Physics gives
the trajectory as a function of the time $t$ ($t = 0$ at the moment of departure)

\[ x = (v \cos\theta)\, t, \]
\[ y = (v \sin\theta)\, t - (g/2)\, t^2. \]
The time parameter can be eliminated (put $t = x (v \cos\theta)^{-1}$ in the second
equation)
\[ y = \frac{x v \sin\theta}{v \cos\theta} - \frac{g}{2}\left(\frac{x}{v \cos\theta}\right)^2 = (\tan\theta)\, x - \frac{g}{2v^2}(\tan^2\theta + 1)\, x^2. \]
Therefore, the family of trajectories in terms of the angle $\theta$ is given by
\[ y - (\tan\theta)\, x + \frac{g}{2v^2}(\tan^2\theta + 1)\, x^2 = 0. \]
Differentiation with respect to $\theta$ gives
\[ -(\tan^2\theta + 1)\, x + \frac{g}{2v^2}(2 \tan\theta)(\tan^2\theta + 1)\, x^2 = 0. \]
The factor $\tan^2\theta + 1$ can be eliminated, so $x \tan\theta = v^2/g$. The substitution
above produces
\[ 0 = y - \frac{v^2}{g} + \frac{g}{2v^2}\left(\frac{v^2}{g}\right)^2 + \frac{g}{2v^2}\, x^2 = y - \frac{v^2}{2g} + \frac{g}{2v^2}\, x^2, \]
which shows that the envelope is a larger parabola.
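
The envelope parabola $y = \frac{v^2}{2g} - \frac{g}{2v^2} x^2$ can be compared with the highest point reachable above a given abscissa. The sketch below is an illustration with hypothetical values of $v$, $g$ and $x$; it scans launch angles on a grid of tenths of a degree and measures how far the best trajectory stays below the envelope.

```python
import math

v, g = 10.0, 9.8  # hypothetical speed and gravitational acceleration

def height(x, theta):
    # Trajectory with launch angle theta, written as y in terms of x
    tan = math.tan(theta)
    return tan * x - g / (2 * v * v) * (tan * tan + 1) * x * x

def envelope(x):
    # Envelope obtained above: y = v^2/(2g) - g x^2/(2 v^2)
    return v * v / (2 * g) - g * x * x / (2 * v * v)

x = 4.0
angles = [math.radians(d / 10) for d in range(1, 900)]
highest = max(height(x, th) for th in angles)
gap = envelope(x) - highest
```

No trajectory rises above the envelope, and the best angle on the grid comes very close to touching it, consistent with the condition $x \tan\theta = v^2/g$.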

5.4 Rationale and remarks


Note that Lemma 5.1.1 and Lemma 5.1.2 were stated in the Banach space setting,
however Theorem 5.1.4 and Theorem 5.2.1 are restricted to $\mathbb{R}^d$. The inverse
mapping theorem and the implicit function theorem are still valid in the Banach
frame, but the proof requires knowing that the inversion of operators is
a $C^\infty$ mapping, analytic in fact.

The notion of smooth manifold is studied at length in Differential Geometry.
We merely need the existence of local parameterizations for our purposes. I
have included a discussion on the elimination of variables because it is a very
common operation, and yet it is seldom treated from a theoretical point of view in
modern texts. Functional dependence is also a forgotten classic topic.

There are interesting directions to suggest for a TFG (bachelor's thesis). For
instance, the inverse mapping theorem on $\mathbb{R}^n$ under weaker hypotheses, proved by
Saint-Raymond, or the study of global invertibility following the ideas of Hadamard.

5.5 Exercises
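1. Study the local and global invertibility of the mapping $f : D \subset \mathbb{R}^3 \to \mathbb{R}^3$
defined by
\[ f(x, y, z) = \left(\frac{x}{1 - x - y - z}, \frac{y}{1 - x - y - z}, \frac{z}{1 - x - y - z}\right), \]
where $D = \{(x, y, z) \in \mathbb{R}^3 : x + y + z \neq 1\}$.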
2. Study the local and global invertibility of the mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$
defined by $f(x, y) = (x^2 - y^2, 2xy)$.
3. Consider the mapping $J : \mathbb{R}^2 \setminus \{(0, 0)\} \to \mathbb{R}^2$ defined by means of polar
coordinates on the domain and Cartesian for the image by
\[ (r, \theta) \to ((r + 1/r) \cos\theta, (r - 1/r) \sin\theta). \]
Prove that every point of $\mathbb{R}^2 \setminus \{(-2, 0), (2, 0)\}$ has exactly two preimages.
Find the maximal regions in $\mathbb{R}^2$ where $J$ is a diffeomorphism.
4. Show that the equation $x^2 + xy + y^3 - 11 = 0$ defines $y$ as a function of
$x$ around $x = 1$, taking the value $y = 2$. Compute the first and second
derivatives of that function at $x = 1$.
5. The mapping $f(x, y, z) = (y^3 + z^5, x + z^5, x + y^3)$ is globally invertible on
$\mathbb{R}^3$. Does it satisfy the hypotheses of the inverse mapping theorem at
$(0, 0, 0)$? What is the relation with the possible differentiability of $f^{-1}$?

6. Consider the mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $f(x, y) = (u, v)$ where
$u = x$, $v = y - x^2$ if $x^2 \leq y$, $v = (y^2 - x^2 y)/x^2$ if $0 \leq y < x^2$ and
$v(x, y) = -v(x, -y)$ in case that $y < 0$. Prove that $f$ is differentiable at
$(0, 0)$, compute its differential and show that it is one-to-one. Does it satisfy
the hypotheses of the inverse mapping theorem around $(0, 0)$?
7. Assume that the equation $f(x, y, z) = 0$ defines every variable as a function
of the remaining two. Show that
\[ \frac{\partial x}{\partial y} \frac{\partial y}{\partial z} \frac{\partial z}{\partial x} = -1. \]

8. Find the extreme values of the implicit functions defined by the equation
$y^3 - x^2 y + x^3 - 3 = 0$.

9. Prove that the equation
\[ \sin x + \cos y = 1 \]
defines $y$ as a function of $x$ around $(\pi/2, \pi/2)$. Find the first and second
derivatives of the implicit function at that point.
10. Prove that the equation $x^y - y^x = 0$ defines $y = f(x)$ around $(2, 4)$ and
compute $f'(2)$. Find the largest open interval where $f(x)$ is defined. Is
there any implicit function defined around $(e, e)$? Find the largest open
interval where $f(x)$ is defined.
11. The polynomial $x^3 - \lambda x^2 + \lambda^2 x - 1$ has a unique real root $\rho(\lambda)$ when the
parameter $\lambda$ runs on a neighbourhood of 0. Suppose that $\lambda > 0$ is very
near to 0. Is it possible that $\rho(\lambda) < 1$?
12. Prove that the set of roots of an algebraic polynomial
\[ x^n + a_{n-1} x^{n-1} + \cdots + a_0, \]
regarded as a compact subset in $\mathbb{C}$ with the Hausdorff metric on $\mathbb{R}^2$, is a
continuous function of the $n$ coefficients of the polynomial.
13. Prove that the equation
\[ \cos(x^2 + y) + \sin(x + y) + e^{x^2 y} = 2 \]
defines $y$ as a function $g$ of class $C^\infty$ on a neighbourhood of $x = 0$, and
$g(0) = \pi/2$. Show that $g$ has a local minimum at $x = 0$.

14. Prove that the equation
\[ \pi \cos\theta = t\, \theta \]
has a unique solution $\theta(t)$ for $t$ in a neighbourhood of $3/2$ and find $\theta(3/2)$.
Prove also that $\theta'(t)$ exists on a neighbourhood and compute $\theta'(3/2)$.
15. Prove that the equations
\[ 4x^2 - 3y^2 - z = 0, \]
\[ x^2 + y^2 + z^2 = 24 \]
define a $C^\infty$ curve on a neighbourhood of $(2, -2, 4)$. Find the tangent
line at that point. Show that, actually, the equations define a closed $C^\infty$
curve.
16. Let f : D ⊂ Rn → Rm be a C 1 mapping and x0 ∈ D. Prove that:
(a) if df (x0 ) is one-to-one, then there is a neighbourhood V of x0 such
that f |V is one-to-one;
(b) if df (x0 ) is onto, then there is a neighbourhood V of x0 such that
f (V ) is a neighbourhood of f (x0 ).
17. Check that these functions are functionally dependent and find their
relation
\[ f(x, y) = \frac{x}{y}; \qquad g(x, y) = \frac{x - y}{x + y}. \]
18. Check that these functions are functionally dependent and find their
relation

\[ f(x, y) = 2xy + 2x + 1; \qquad g(x, y) = x^2 y^2 + 2x^2 y + x^2 - 1. \]

19. Check that these functions are functionally dependent and find their
relation

\[ f(x, y, z) = x^2 + y^2 + z^2; \qquad g(x, y, z) = x + y + z; \qquad h(x, y, z) = xy + yz + zx. \]

20. Let $M \subset \mathbb{R}^3$ be a compact $C^1$ manifold of dimension 2, that is, a $C^1$
surface in $\mathbb{R}^3$. Assume that $M$ is oriented and there is a continuous normal
field $\vec{N}$. Prove that $\vec{N}$ takes all the directions of $\mathbb{R}^3$. Show that the statement
is not true if we drop compactness or the manifold is piecewise $C^1$.

21. Consider the set $P \subset \mathbb{R}^3$ defined by the equations
\[ x^2 + 4y^2 = 16, \qquad 9x^2 + 16z^2 = 144. \]
Show that $y, z$ are defined as $C^\infty$ functions of $x$ around $(0, 2, 3)$. Find
the first and second derivatives at that point and the tangent line. Prove
that $P$ is not a manifold.
22. Assume that the function $f$ implicitly defined in a neighbourhood of $x_0$
by $F(x, y) = 0$ and $f(x_0) = y_0$ has a critical point at $x_0$. Prove that if $F$
is $C^2$ and
\[ \frac{\partial^2 F}{\partial x^2}(x_0, y_0) \cdot \frac{\partial F}{\partial y}(x_0, y_0) > 0, \]
then $f$ has a local maximum at $x_0$.
23. Let $F : B_{\mathbb{R}^n} \to B_{\mathbb{R}^n}$ be a $C^1$ contractive mapping. Prove that for every
$t \in (0, 1]$ the mapping $F_t(x) := tF(x)$, which is also defined on $B_{\mathbb{R}^n}$, has
a unique fixed point $x(t)$ that continuously depends on the parameter $t$.
Compute $\lim_{t \to 0^+} x(t)$.
24. Let $f : \mathbb{R} \to \mathbb{R}$ be a $C^2$ function and let $x_0 \in \mathbb{R}$ be such that $f''(x_0) > 0$.
Show that the equality
\[ f'(y) = \frac{f(x) - f(x_0)}{x - x_0} \]
defines $y = g(x)$ implicitly as a function of $x$ on an interval $(x_0, x_1)$. Prove
the existence and find the value of the limit
\[ \lim_{x \to x_0^+} \frac{g(x) - x_0}{x - x_0}. \]

25. An object is thrown from the origin of R2 with the same speed and
variable direction, and its movement is affected only by its weight so
it follows parabolic trajectories. Find the envelope of the family of all
possible trajectories.
26. Let $f(x_1, \dots, x_n) = a_1 x_1 + \cdots + a_n x_n$. Find the maximum of $f$ on the
set
\[ B_p^n = \{(x_1, \dots, x_n) : |x_1|^p + \cdots + |x_n|^p \le 1\}. \]

27. Let $X$ be a Banach space and let $L(X)$ denote the continuous linear
operators acting on $X$ with the operator norm.
(a) Let $A \in L(X)$ be such that $\|I - A\| < 1$, where $I$ is the identity map
on $X$. Show that $A$ is invertible.
(b) Assume that $A \in L(X)$ is invertible. Prove the existence of $\delta > 0$
such that if $B \in L(X)$ with $\|B - A\| < \delta$, then $B$ is invertible.
(c) Deduce that the assignment $A \mapsto A^{-1}$ within the invertible operators
of $L(X)$ is continuous and, moreover, it is $C^\infty$.

Chapter 6

Riemann Integral

6.1 Rectangles and partitions


In all that follows we will assume that the dimension of the space is a fixed
number $d \in \mathbb{N}$. The case $d = 1$ is the one-dimensional Riemann integral that
has been studied previously in first year, although the characterizations of Riemann
integrability in terms of continuity points are not likely to have been studied in that setting.
A rectangle in $\mathbb{R}^d$ will always be a compact rectangle $R = [a_1, b_1] \times \cdots \times [a_d, b_d]$
unless we specify another kind of rectangle (open). The $d$-dimensional volume of
the rectangle is the nonnegative number
\[ m(R) = (b_1 - a_1)(b_2 - a_2) \cdots (b_d - a_d). \]
The rectangle is non degenerate if $m(R) > 0$. Clearly, the topological interior
of $R$ is the set
\[ (a_1, b_1) \times \cdots \times (a_d, b_d). \]
Two rectangles $R_1$ and $R_2$ are said to be not overlapping if they meet only on their
borders.
Any non degenerate rectangle $R$ can be tiled with smaller non degenerate
rectangles $\{R_i\}_{i=1}^n$ which are pairwise not overlapping. To see that, just
consider the rectangles of the form $I_1 \times \cdots \times I_d$ where each $I_k$ is an interval
coming from a finite partition of $[a_k, b_k]$. Then arrange all these rectangles into
a sequence $\{R_i\}_{i=1}^n$. The tiling $\{R_i\}_{i=1}^n$ of $R$ obtained in this way is called a
grill of $R$. It is not difficult to see that $m(R) = \sum_{i=1}^n m(R_i)$ in this case, but
something more general is true. Given a rectangle $R$, a collection $\pi = \{R_i\}_{i=1}^n$ of rectangles
is said to be a partition of $R$ if they are pairwise not overlapping and $\bigcup_{i=1}^n R_i = R$.

Proposition 6.1.1. If $\{R_i\}_{i=1}^n$ is a partition of a rectangle $R$, then
\[ m(R) = \sum_{i=1}^n m(R_i). \]

Proof. Assume that $R = [a_1, b_1] \times \cdots \times [a_d, b_d]$ is non degenerate, since
otherwise the result is trivial. We may also assume that $\{R_i\}_{i=1}^n$ contains
no degenerate rectangle, since after removing them we still have $\bigcup_{i=1}^n R_i = R$
(the union of the interiors is dense in $R$). Fix a coordinate $1 \le k \le d$. The
$k$-projection of $R_i$ is a subinterval of $[a_k, b_k]$. Consider the one-dimensional
partition of $[a_k, b_k]$ generated by all the endpoints of such intervals for $1 \le i \le n$,
and then consider the grill $\{R'_j\}_{j=1}^m$ obtained from those intervals by
cartesian products. For each $1 \le j \le m$ there is exactly one $1 \le i \le n$
such that $R'_j \subset R_i$ since both have nonempty interior. Consider the sets
$A_i = \{j : R'_j \subset R_i\}$ for $1 \le i \le n$, which are disjoint and satisfy $\bigcup_{i=1}^n A_i = \{1, \dots, m\}$.
Observe that $\{R'_j\}_{j \in A_i}$ is a partition of $R_i$ (in fact, a grill of $R_i$). Now
\[ \sum_{i=1}^n m(R_i) = \sum_{i=1}^n \sum_{j \in A_i} m(R'_j) = \sum_{j=1}^m m(R'_j) = m(R). \]

With similar arguments it is possible to prove the following.

Proposition 6.1.2. If $\{R_i\}_{i=1}^n$ is a collection of non overlapping rectangles and
$\{R'_j\}_{j=1}^m$ is another collection of rectangles such that $\bigcup_{i=1}^n R_i \subset \bigcup_{j=1}^m R'_j$, then
\[ \sum_{i=1}^n m(R_i) \le \sum_{j=1}^m m(R'_j). \]

A partition $\pi' = \{R'_j\}_{j=1}^m$ is finer than $\pi = \{R_i\}_{i=1}^n$ if for every $j = 1, \dots, m$
there is $i = 1, \dots, n$ such that $R'_j \subset R_i$. Observe that in this case we have
\[ R_i = \bigcup \{R'_j : R'_j \subset R_i\}. \]
Given two partitions $\pi = \{R_i\}_{i=1}^n$ and $\pi' = \{R'_j\}_{j=1}^m$ it is always possible to
find a third partition which is finer than both. Just take the rectangles $R_i \cap R'_j$ having
nonempty interior.

6.2 Integrals on compact rectangles
Given a bounded function $f : R \to \mathbb{R}$ defined on a rectangle and a partition
$\pi = \{R_i\}_{i=1}^n$ of $R$, we consider the numbers
\[ L(f, \pi) = \sum_{i=1}^n \inf\{f, R_i\}\, m(R_i) \]
\[ U(f, \pi) = \sum_{i=1}^n \sup\{f, R_i\}\, m(R_i) \]
named lower and upper sums respectively. Observe that for partitions $\pi_1 \le \pi_2$
of $R$ (that is, with $\pi_2$ finer than $\pi_1$) we always have
\[ L(f, \pi_1) \le L(f, \pi_2) \le U(f, \pi_2) \le U(f, \pi_1). \]
The Darboux lower and upper integrals of $f$ (on $R$) are defined this way:
\[ \underline{\int} f = \sup\{L(f, \pi) : \pi \text{ partition of } R\} \]
\[ \overline{\int} f = \inf\{U(f, \pi) : \pi \text{ partition of } R\}. \]

Definition 6.2.1. A bounded function $f : R \to \mathbb{R}$ is said to be Riemann integrable
(on $R$) if $\underline{\int} f = \overline{\int} f$. In that case, its integral (in Riemann sense) is that
common value $\int f = \int_R f := \underline{\int} f = \overline{\int} f$.

Recall that the oscillation of a function $f : R \to \mathbb{R}$ on a set $A \subset R$ is the
number
\[ \operatorname{osc}(f, A) = \sup\{|f(x) - f(y)| : x, y \in A\}. \]
In order to establish the properties of integrable functions the following criterion
will be very useful.
Proposition 6.2.2. A bounded function $f : R \to \mathbb{R}$ is Riemann integrable if
and only if for every $\varepsilon > 0$ there is a partition $\pi = \{R_i\}_{i=1}^n$ of $R$ such that
\[ \sum_{i=1}^n \operatorname{osc}(f, R_i)\, m(R_i) < \varepsilon. \]

Hint of Proof. Just notice that osc(f, Ri ) = sup{f, Ri } − inf{f, Ri }.

The first application provides us with an important class of integrable functions.
Corollary 6.2.3. If f : R → R is continuous, then it is Riemann integrable.
Proof. Since R is compact, then f is uniformly continuous. Given ε > 0, take
a partition π = {Ri }ni=1 made of rectangles small enough to guarantee that
osc(f, Ri ) < ε/m(R).
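The criterion of Proposition 6.2.2 can be watched at work on a concrete continuous function. The sketch below is an illustration under stated assumptions, not the author's construction: the function `f`, the domain $[0,1]^2$ and the helper name `darboux_sums` are hypothetical, and the code relies on `f` being nondecreasing in each variable so that the infimum and supremum on each cell sit at opposite corners.

```python
def darboux_sums(f, n):
    """Lower and upper sums of f on the n-by-n uniform grill of [0,1]^2.

    Assumption: f is nondecreasing in each variable, so on every cell the
    infimum is attained at the lower-left corner and the supremum at the
    upper-right corner.
    """
    h = 1.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            lower += f(i * h, j * h) * h * h
            upper += f((i + 1) * h, (j + 1) * h) * h * h
    return lower, upper

def f(x, y):
    # Illustrative choice; its integral over [0,1]^2 is 1/3 + 1/2 = 5/6.
    return x * x + y

for n in (10, 100, 400):
    lo, up = darboux_sums(f, n)
    # up - lo equals the sum of osc(f, R_i) m(R_i), which here is 2/n.
    print(n, lo, up, up - lo)
```

For this particular `f` the difference between the upper and lower sums is exactly $2/n$, so it can be made smaller than any given $\varepsilon$ by refining the grill.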

The reader who is acquainted with the properties of the Riemann integral
for one variable functions will not see anything new in the following result.
Proposition 6.2.4. Let $\mathcal{R}(R)$ denote the set of functions which are Riemann
integrable on $R$. Then
1. $\mathcal{R}(R)$ is a vector space and $\int_R (\alpha f + \beta g) = \alpha \int_R f + \beta \int_R g$ whenever
$f, g \in \mathcal{R}(R)$ and $\alpha, \beta \in \mathbb{R}$.
2. $\mathcal{R}(R)$ is stable by products (so it is an algebra).
3. If $f, g \in \mathcal{R}(R)$ and $f \le g$, then $\int_R f \le \int_R g$.
4. If $f \in \mathcal{R}(R)$, then $f^+, f^-, |f| \in \mathcal{R}(R)$ and $|\int_R f| \le \int_R |f|$.
5. If $f \in \mathcal{R}(R)$ and $S \subset R$ is a rectangle, then $f|_S \in \mathcal{R}(S)$.
6. If $f \in \mathcal{R}(R)$ and $\{R_i\}_{i=1}^n$ is a partition of $R$, then $\int_R f = \sum_{i=1}^n \int_{R_i} f$.

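Hint of Proof. Observe that
\[ \underline{\int}_R f + \underline{\int}_R g \le \underline{\int}_R (f + g) \le \overline{\int}_R (f + g) \le \overline{\int}_R f + \overline{\int}_R g \]
and
\[ \underline{\int}_R \alpha f = \alpha \underline{\int}_R f, \qquad \overline{\int}_R \alpha f = \alpha \overline{\int}_R f \]
for $\alpha > 0$, while if $\alpha < 0$ then
\[ \underline{\int}_R \alpha f = \alpha \overline{\int}_R f, \qquad \overline{\int}_R \alpha f = \alpha \underline{\int}_R f. \]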

Integrability of products can be reduced to integrability of squares of positive
functions. In such a case, we have
\[ \operatorname{osc}(f^2, A) \le 2 \sup\{f, A\} \operatorname{osc}(f, A) \]
which is suitable for that purpose.

6.3 Integrability and continuity points


The goal of this section is to give a characterization of Riemann integrability
by means of the set of continuity points of the function. Let us begin with a
simple but useful observation.
Proposition 6.3.1. If $f \in \mathcal{R}(R)$, $f \ge 0$ and $\int_R f = 0$, then $f(x) = 0$ whenever
$x \in R$ is a point of continuity of $f$.
A bounded set $A \subset \mathbb{R}^d$ is said to be of null content if for every $\varepsilon > 0$ there is
a family of rectangles $\{R_i\}_{i=1}^n$ such that $A \subset \bigcup_{i=1}^n R_i$ and $\sum_{i=1}^n m(R_i) < \varepsilon$.
Notice that being of null content is stable by subsets, finite unions and closures.

A set $A \subset \mathbb{R}^d$ is said to be of null measure if for every $\varepsilon > 0$ there is a family of
rectangles $\{R_i\}_{i=1}^\infty$ such that $A \subset \bigcup_{i=1}^\infty R_i$ and $\sum_{i=1}^\infty m(R_i) < \varepsilon$. Null measure
sets are stable by subsets and countable unions. Of course, null content sets
are of null measure, but the converse is not true: just consider $A = [0, 1] \cap \mathbb{Q}$. As
the countable union of its singletons it is of null measure. On the other hand,
$\overline{A} = [0, 1]$, so this set cannot be of null content. Notice that a compact set of
null measure is of null content since there is no restriction in considering
covers made of open rectangles (slightly larger ones).
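The covering argument behind the null measure of $[0, 1] \cap \mathbb{Q}$ can be sketched in code. This is only an illustration on a finite initial segment of an enumeration of the rationals; the function names `rationals_in_unit_interval` and `cover`, the bound on the denominators and the interval lengths $\varepsilon / 2^{k+1}$ are assumptions made for the example.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """Enumerate the distinct rationals p/q in [0, 1] with q <= max_den."""
    ordered, found = [], set()
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in found:
                found.add(r)
                ordered.append(r)
    return ordered

def cover(points, eps):
    """Cover the k-th point (k = 1, 2, ...) by an open interval of length eps / 2**(k+1)."""
    return [(x - eps / 2 ** (k + 2), x + eps / 2 ** (k + 2))
            for k, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)

# However many points are enumerated, the total length stays below eps / 2.
print(len(points), float(total_length))
```

The total length is bounded by the geometric series $\sum_{k \ge 1} \varepsilon / 2^{k+1} = \varepsilon / 2$ regardless of how far the enumeration goes, which is exactly why countable sets are of null measure even when, as here, their closure is not of null content.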

Given a function $f : R \to \mathbb{R}$, we may define its oscillation at some $x \in R$ as
\[ \operatorname{osc}(f, x) = \inf\{\operatorname{osc}(f, U) : U \text{ neighbourhood of } x\}. \]
Observe that $f$ is continuous at $x$ if and only if $\operatorname{osc}(f, x) = 0$. Moreover, for
every $\delta > 0$ the set $\{x \in R : \operatorname{osc}(f, x) < \delta\}$ is open (relatively to $R$). The
following is the celebrated Riemann-Lebesgue characterization of Riemann
integrability.
Theorem 6.3.2. Let f : R → R be a bounded function defined on a non
degenerate compact rectangle R ⊂ Rd . The following statements are equivalent:

i) f is Riemann integrable on R;
ii) {x ∈ R : osc(f, x) ≥ δ} is of null content for every δ > 0;
iii) the set of discontinuity points of f is of null measure.
Proof. Note that the equivalence between ii) and iii) is a consequence of the
set equality
\[ \{x \in R : \operatorname{osc}(f, x) > 0\} = \bigcup_{n=1}^\infty \{x \in R : \operatorname{osc}(f, x) \ge 1/n\} \]
bearing in mind that the first set consists of the discontinuity points of $f$ and the second
is represented as a union of compact subsets of $R$.
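Suppose that $f$ is Riemann integrable. For $\varepsilon, \delta > 0$, take a partition $\{R_i\}_{i=1}^n$
of $R$ into rectangles such that
\[ \sum_{i=1}^n \operatorname{osc}(f, R_i)\, m(R_i) < \delta \varepsilon. \]
Consider the open set $O = \bigcup_{i=1}^n R_i^\circ$. If $y \in O \cap \{x \in R : \operatorname{osc}(f, x) > \delta\}$, then
$\operatorname{osc}(f, R_i) > \delta$ if $y \in R_i$. Take $N = \{i : 1 \le i \le n, \operatorname{osc}(f, R_i) > \delta\}$ and observe
that
\[ \delta \sum_{i \in N} m(R_i) < \sum_{i \in N} \operatorname{osc}(f, R_i)\, m(R_i) < \delta \varepsilon, \]
so $\sum_{i \in N} m(R_i) < \varepsilon$, and it follows that $O \cap \{x \in R : \operatorname{osc}(f, x) > \delta\}$ is covered by $\{R_i\}_{i \in N}$. Since
$R \setminus O = \bigcup_{i=1}^n \partial R_i$ is of null content and $\varepsilon > 0$ is arbitrary, we deduce that
$\{x \in R : \operatorname{osc}(f, x) > \delta\}$ is of null content.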
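Suppose now that statement ii) holds. Given $\varepsilon > 0$, set $M = \operatorname{osc}(f, R)$ and
take a cover $\{S_j\}_{j=1}^m$ of the set $\{x : \operatorname{osc}(f, x) \ge \varepsilon/m(R)\}$ by open rectangles
such that $\sum_{j=1}^m m(S_j) < \varepsilon/M$. If $O = \bigcup_{j=1}^m S_j$, then $R \setminus O$ is compact. Every
$x \in R \setminus O$ has an open neighbourhood $U_x$ such that $\operatorname{osc}(f, U_x) < \varepsilon/m(R)$. Let
$\xi > 0$ be the Lebesgue number of the covering $\{U_x\}_{x \in R \setminus O}$. Note that $R \setminus O$ is a
finite union of non overlapping rectangles, which can be decomposed into smaller
non overlapping rectangles of diameter less than $\xi$. That family of rectangles
can be extended to a partition $\{R_i\}_{i=1}^n$ of $R$ by adding rectangles filling $R \cap \overline{O}$.
With all these ingredients we have
\[ \sum_{i=1}^n \operatorname{osc}(f, R_i)\, m(R_i) = \sum_{R_i^\circ \subset O} \operatorname{osc}(f, R_i)\, m(R_i) + \sum_{R_i \subset R \setminus O} \operatorname{osc}(f, R_i)\, m(R_i) \]
\[ \le M \sum_{R_i^\circ \subset O} m(R_i) + \frac{\varepsilon}{m(R)} \sum_{R_i \subset R \setminus O} m(R_i) \le M \frac{\varepsilon}{M} + \frac{\varepsilon}{m(R)}\, m(R) = 2\varepsilon. \]
That proves the Riemann integrability of $f$.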


Corollary 6.3.3. If $f \in \mathcal{R}(R)$, $f \ge 0$ and $\int_R f = 0$, then $\{x \in R : f(x) \ne 0\}$
is of null measure.

6.4 Integration on general domains


Let $D \subset \mathbb{R}^d$ be a bounded subset and $f : D \to \mathbb{R}$ a bounded function. We say
that $f$ is Riemann integrable on $D$ if, given a compact rectangle $R \supset D$, the
function $\tilde{f} : R \to \mathbb{R}$ defined as $\tilde{f}(x) = f(x)$ if $x \in D$ and $\tilde{f}(x) = 0$ if $x \in R \setminus D$
is Riemann integrable on $R$. In such a case, we take
\[ \int_D f := \int_R \tilde{f}. \]
It is not difficult to check that the definition is independent of the chosen
rectangle $R$. We write $\mathcal{R}(D)$ for the set of functions which are Riemann integrable
on $D$. Properties of integrable functions on rectangles
extend naturally to $\mathcal{R}(D)$. In a similar fashion, for $f : \mathbb{R}^d \to \mathbb{R}$ with compact
support, that is, if the set $\{x \in \mathbb{R}^d : f(x) \ne 0\}$ is bounded, we may define $\int f$
in terms of integration on rectangles.

The integrability of $f : D \to \mathbb{R}$ depends on the continuity points of the
extended function $\tilde{f}$, which in turn depends both on the values of $f$ and the
“distribution” of $D$ in $\mathbb{R}^d$. It seems to be a good idea to investigate the sets
of $\mathbb{R}^d$ where the continuous functions, at least, are integrable.
Definition 6.4.1. A bounded subset $A \subset \mathbb{R}^d$ is said to be Jordan measurable (or a
Jordan domain) if its indicator function $\chi_A$ is Riemann integrable. In such a
case, the number $c(A) = \int \chi_A$ is called the Jordan content of $A$.
Observe that null content sets are those Jordan measurable sets having
content zero. For a bounded set $A \subset \mathbb{R}^d$ we may define the inner content
$c_*(A) = \underline{\int} \chi_A$ and the outer content as $c^*(A) = \overline{\int} \chi_A$. We have that a bounded
set $A$ is Jordan measurable if and only if $c_*(A) = c^*(A)$, whose interpretation
is related to the Greeks' exhaustion method for areas and volumes.
Proposition 6.4.2. A bounded subset A ⊂ Rd is Jordan measurable if and
only if its boundary ∂A is of null content.

Proof. The discontinuities of χA happen exactly at the points of ∂A.
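The exhaustion idea behind the inner and outer contents can be tried on the closed unit disc, whose boundary is of null content. The following sketch is an illustration only: the function name `disc_contents` and the choice of the disc are assumptions. It computes the lower and upper sums of the indicator function on a uniform grill of $[-1, 1]^2$.

```python
from math import pi

def disc_contents(n):
    """Lower and upper sums of the indicator of the closed unit disc.

    The square [-1, 1]^2 is split into an n-by-n grill. A cell contributes to
    the lower sum when it is contained in the disc (its farthest corner from
    the origin is inside) and to the upper sum when it meets the disc (its
    nearest point to the origin is inside).
    """
    h = 2.0 / n
    inner = outer = 0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            far = max(abs(x0), abs(x1)) ** 2 + max(abs(y0), abs(y1)) ** 2
            nx = min(max(0.0, x0), x1)   # nearest point of the cell to the origin
            ny = min(max(0.0, y0), y1)
            near = nx * nx + ny * ny
            if far <= 1.0:
                inner += 1
            if near <= 1.0:
                outer += 1
    return inner * h * h, outer * h * h

for n in (50, 200, 400):
    low, high = disc_contents(n)
    print(n, low, high, high - low)
```

The gap between the two sums is the total area of the cells that meet the circle, which shrinks as the grill is refined; both sums squeeze the value $\pi$.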

We have defined the Jordan content from the Riemann integral. The other
way around is possible, as the following result shows. The details of the proof
are left to the reader.
Proposition 6.4.3. Let $R \subset \mathbb{R}^d$ be a rectangle.
1. Let $f : R \to [0, +\infty)$ be a bounded function and consider $F = \{(x, t) : x \in
R, 0 \le t \le f(x)\}$. Then
\[ \underline{\int}_R f = c_*(F), \qquad \overline{\int}_R f = c^*(F) \]
where the Jordan content is taken in $\mathbb{R}^{d+1}$. In particular, $f$ is Riemann
integrable if and only if $F$ is Jordan measurable, and then $\int_R f = c(F)$.
2. Bounded sets defined by subgraphs and epigraphs of Riemann integrable
functions are Jordan measurable.
3. A bounded function $f : R \to \mathbb{R}$ is Riemann integrable if and only if its
graph $\{(x, f(x)) : x \in R\}$ is of null content in $\mathbb{R}^{d+1}$.
We have the mean value property of the integral.
Proposition 6.4.4. If $D$ is a Jordan set and $f \in \mathcal{R}(D)$ then
\[ \inf\{f, D\} \le \frac{1}{c(D)} \int_D f \le \sup\{f, D\}. \]
Proof. Just compare $f$ with $\lambda \chi_D$ with $\lambda \in \{\inf\{f, D\}, \sup\{f, D\}\}$ and
integrate.

The characterization Theorem 6.3.2 is extended with no trouble.


Proposition 6.4.5. A bounded function f : D → R is Riemann integrable on
a Jordan domain D if and only if the set of its points of discontinuity is of null
measure (equivalently, the set of points where the oscillation is bigger than δ
is of null content for every δ > 0).
Note that Jordan sets are stable by finite unions, finite intersections and
differences. We say that two Jordan sets A and B do not overlap if A ∩ B ⊂
∂A ∪ ∂B. The problem of measuring sets in Rd is solved in the frame of Jordan
sets.

Proposition 6.4.6. If $\{A_i\}_{i=1}^n \subset \mathbb{R}^d$ is a non overlapping finite family of Jordan
sets, then its union is Jordan as well and $c(\bigcup_{i=1}^n A_i) = \sum_{i=1}^n c(A_i)$.

We can easily deduce the following uniqueness result.

Corollary 6.4.7. Let $k$ be a positive, monotone and additive function defined
on a class of subsets of $\mathbb{R}^d$ that includes the rectangles and is invariant by
translations. Then there exists some $\lambda > 0$ such that $k = \lambda c$.
A Jordan partition of a Jordan set $D$ is a non overlapping finite family
$\{D_i\}_{i=1}^n$ of Jordan sets such that $D = \bigcup_{i=1}^n D_i$. Jordan partitions provide
a good frame for Riemann sums, which give a more explicit way for the
computation of integrals.
Theorem 6.4.8. Let $f \in \mathcal{R}(D)$ where $D$ is a Jordan domain. For every
$\varepsilon > 0$ there is $\delta > 0$ such that if $\{D_i\}_{i=1}^n$ is a Jordan partition of $D$ into sets
of diameter less than $\delta$, then
\[ \left| \sum_{i=1}^n f(t_i)\, c(D_i) - \int_D f \right| < \varepsilon \]
for any choice of points $t_i \in D_i$.

Proof. Without loss of generality we may assume that $D$ is compact. Indeed,
take $\overline{D}$ and extend $f$ to $\overline{D} \setminus D$ as zero. Fix $\varepsilon > 0$. Let $M = \operatorname{osc}(f, D)$. The set
$\{x \in D : \operatorname{osc}(f, x) \ge \varepsilon/2c(D)\}$ is covered by finitely many open rectangles such
that their union is an open set $O$ with $c(O) < \varepsilon/2M$. For any point $x \in D \setminus O$,
take $U_x \ni x$ such that $\operatorname{osc}(f, U_x) < \varepsilon/2c(D)$ and consider the Lebesgue number
$\xi$ of the open cover $\{O\} \cup \{U_x\}_{x \in D \setminus O}$. If $\{D_i\}_{i=1}^n$ is a Jordan partition such that
any set $D_i$ has diameter less than $\xi$, take $N$ to be the set of those indices $i$ for
which $D_i \subset O$. We have
\[ \left| \sum_{i=1}^n f(t_i)\, c(D_i) - \int_D f \right| \le \sum_{i=1}^n \int_{D_i} |f(t_i) - f| \]
\[ = \sum_{i \in N} \int_{D_i} |f(t_i) - f| + \sum_{i \notin N} \int_{D_i} |f(t_i) - f| \]
\[ \le M \sum_{i \in N} c(D_i) + \frac{\varepsilon}{2c(D)} \sum_{i \notin N} c(D_i) \le M c(O) + \frac{\varepsilon}{2c(D)}\, c(D) \le \varepsilon \]
whatever the choice of the points $t_i \in D_i$.
In fact, the thesis of the previous statement, suitably reformulated, implies
Riemann integrability. Indeed, if the Riemann sums
\[ \sum_{i=1}^n f(t_i)\, c(D_i) \]
have a common limit when the Jordan partition $\{D_i\}_{i=1}^n$ is either refined or
the maximum diameter of its sets goes to zero, then the function $f$ must be
integrable on $D$.

The convergence of Riemann sums can be applied to prove the change of
variables formula in a very important particular case.
Theorem 6.4.9. Let $E \subset [0, +\infty) \times [0, 2\pi]$ be a Jordan domain mapped onto the
Jordan domain $D \subset \mathbb{R}^2$ by the map $(r, \theta) \mapsto (r \cos\theta, r \sin\theta)$. Then for any
$f \in \mathcal{R}(D)$ we have
\[ \iint_D f(x, y)\, dx\, dy = \iint_E f(r \cos\theta, r \sin\theta)\, r\, dr\, d\theta. \]

Proof. Without loss of generality we may assume that $E$ is a rectangle, since
the extension of $f$ by zero on the complement does not change the value of
the integrals. Set $\tilde{f}(r, \theta) = f(r \cos\theta, r \sin\theta)$. Take a partition of $E$ with nodes
$\{(r_i, \theta_j)\}_{i=1, j=1}^{n, m}$. The rectangles are mapped onto sectors $D_{i,j}$ having area
\[ c(D_{i,j}) = \frac{r_{i-1} + r_i}{2} (r_i - r_{i-1})(\theta_j - \theta_{j-1}). \]
The associated Riemann sum over $D$ with the evaluation on central points is
\[ \sum_{i=1}^n \sum_{j=1}^m f\left( \frac{r_{i-1} + r_i}{2} \cos\left(\frac{\theta_{j-1} + \theta_j}{2}\right), \frac{r_{i-1} + r_i}{2} \sin\left(\frac{\theta_{j-1} + \theta_j}{2}\right) \right) c(D_{i,j}) \]
which approaches $\iint_D f(x, y)\, dx\, dy$. On the other hand, the sum coincides with
\[ \sum_{i=1}^n \sum_{j=1}^m \tilde{f}\left( \frac{r_{i-1} + r_i}{2}, \frac{\theta_{j-1} + \theta_j}{2} \right) \frac{r_{i-1} + r_i}{2} (r_i - r_{i-1})(\theta_j - \theta_{j-1}) \]
which is a Riemann sum associated to $\iint_E f(r \cos\theta, r \sin\theta)\, r\, dr\, d\theta$. The refining
of the partition in the sense of Theorem 6.4.8 gives the equality of the two
integrals of the thesis.
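The Riemann sum used in the proof is easy to evaluate numerically. The sketch below is an illustration under stated assumptions: the helper name `polar_riemann_sum`, the test function $f(x, y) = x^2$ and the unit disc are hypothetical choices; for them the exact value of the integral is $\pi/4$.

```python
from math import cos, sin, pi

def polar_riemann_sum(f, r_max, n, m):
    """Riemann sum of f over the disc of radius r_max, built as in the proof.

    The rectangle [0, r_max] x [0, 2*pi] is split into n*m equal cells, f is
    evaluated at the image of each central point, and each term is weighted
    by the area r_mid * dr * dth of the corresponding sector D_ij.
    """
    dr, dth = r_max / n, 2 * pi / m
    total = 0.0
    for i in range(n):
        r_mid = (i + 0.5) * dr
        for j in range(m):
            th_mid = (j + 0.5) * dth
            total += f(r_mid * cos(th_mid), r_mid * sin(th_mid)) * r_mid * dr * dth
    return total

def f(x, y):
    # Illustrative integrand; its integral over the unit disc is pi / 4.
    return x * x

approx = polar_riemann_sum(f, 1.0, 200, 200)
print(approx, pi / 4)
```

The factor `r_mid` in each term is exactly the Jacobian weight $r$ of the formula, and it appears here as a consequence of the area of the sectors rather than as an extra hypothesis.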

6.5 Iterated integrals
Until this moment we have not said how Riemann integrals in $\mathbb{R}^d$ are computed.
The idea is to reduce them to iterated integrals in spaces of lower dimension,
which in practice means that everything can be reduced to one-dimensional integrals,
where the calculus of primitive functions is the main device for their computation.

The next result is known as the Fubini theorem for the Riemann integral.


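Theorem 6.5.1. Let $R \subset \mathbb{R}^{d_1}$ and $S \subset \mathbb{R}^{d_2}$ be rectangles and $f \in \mathcal{R}(R \times S)$. For
$x \in R$ take $f_x(y) = f(x, y)$ defined on $S$ and consider its Darboux integrals
\[ L(x) = \underline{\int}_S f_x, \qquad U(x) = \overline{\int}_S f_x. \]
Then $L, U \in \mathcal{R}(R)$ and
\[ \int_{R \times S} f = \int_R L = \int_R U. \]
Moreover, $f_x \in \mathcal{R}(S)$ for $x \in R$ except on a null measure set.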


Proof. Consider partitions into rectangles $\{R_i\}_{i=1}^n$ and $\{S_j\}_{j=1}^m$ of $R$ and $S$
respectively. Observe that $m(R_i \times S_j) = m(R_i)\, m(S_j)$, where each volume is
understood according to the dimension of the space. If $x \in R_i$ then
\[ \sup\{f_x, S_j\} \le \sup\{f, R_i \times S_j\} \]
that implies
\[ U(x) = \overline{\int}_S f_x \le \sum_{j=1}^m \sup\{f_x, S_j\}\, m(S_j) \le \sum_{j=1}^m \sup\{f, R_i \times S_j\}\, m(S_j). \]
Taking supremum on $x \in R_i$ we get to
\[ \sup\{U, R_i\} \le \sum_{j=1}^m \sup\{f, R_i \times S_j\}\, m(S_j) \]
that implies
\[ \overline{\int}_R U \le \sum_{i=1}^n \sup\{U, R_i\}\, m(R_i) \le \sum_{i=1}^n \sum_{j=1}^m \sup\{f, R_i \times S_j\}\, m(R_i \times S_j). \]

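Taking the infimum over all such partitions on the right-hand side we get that
\[ \overline{\int}_R U \le \overline{\int}_{R \times S} f = \int_{R \times S} f. \]
A similar argument will show that
\[ \underline{\int}_R L \ge \underline{\int}_{R \times S} f = \int_{R \times S} f. \]
On the other hand, we have these obvious inequalities
\[ \underline{\int}_R L \le \underline{\int}_R U \le \overline{\int}_R U. \]
All together implies that $\underline{\int}_R U = \overline{\int}_R U$, so $U$ is Riemann integrable on $R$.
Similarly, we have $\underline{\int}_R L = \overline{\int}_R L$ and so the Riemann integrability of $L$, as well as
the equality with $\int_{R \times S} f$. Now observe that $\int_R (U - L) = 0$ and the function
$U - L$ is nonnegative, so $U = L$ except on a null measure set.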

Not only multiple integrals are reduced to iterated integrals. A “typical
exercise” is to transform an impossible iterated integral into a feasible one.
Example 6.5.2.
\[ \int_0^1 \left( \int_x^1 e^{y^2}\, dy \right) dx \]
Firstly, note that
\[ D = \{(x, y) : 0 \le x \le 1, x \le y \le 1\} = \{(x, y) : 0 \le y \le 1, 0 \le x \le y\}. \]
Therefore,
\[ \int_0^1 \left( \int_x^1 e^{y^2}\, dy \right) dx = \iint_D e^{y^2}\, dx\, dy = \int_0^1 \left( \int_0^y e^{y^2}\, dx \right) dy \]
\[ = \int_0^1 \left[ x e^{y^2} \right]_{x=0}^{y} dy = \int_0^1 y e^{y^2}\, dy = \left[ \frac{1}{2} e^{y^2} \right]_{y=0}^{1} = \frac{e - 1}{2}. \]

6.6 Improper integrals
The interesting integrals do not always satisfy the Riemann requirements: boundedness
of the function and boundedness of the domain. In such a case we have
an improper Riemann integral. The approach is simple. Assume that $\int_D f$ is
improper. If we can take an increasing sequence of bounded (Jordan) domains
$(D_n)$ such that $\bigcup_{n=1}^\infty D_n = D$ and $f|_{D_n}$ is bounded (and Riemann integrable,
of course), we may define
\[ \int_D f = \lim_n \int_{D_n} f \]
if the limit exists. That is not totally arbitrary: the way to produce the sequence
$(D_n)$ is standard: on $\mathbb{R}$ we take intervals and on $\mathbb{R}^2$ rectangles or circles,
depending on the geometry of the domain. If the problem is just a singularity
of the function, the domains are obtained by removing a neighbourhood of the
singularity, usually a Euclidean ball centred at the singularity.

For positive functions the existence of the limit is guaranteed by Lebesgue
theory independently of the geometry of the domains. If the function takes positive
and negative values around a singularity, the limit of the integrals could
depend on the choice of the domains. We say that the convergence happens in
principal value if there is convergence when the singularity is skipped symmetrically.
For instance $\int_{-\infty}^{+\infty} f = \lim_{S \to +\infty} \int_{-S}^{S} f$ or $\int_a^b f = \lim_{\varepsilon \to 0^+} \left( \int_a^{c-\varepsilon} f + \int_{c+\varepsilon}^b f \right)$
if $c \in [a, b]$ is the singularity of $f$.
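The dependence on the choice of domains is visible already for $f(x) = 1/x$ on $[-1, 1]$. The sketch below is an illustration, not taken from the notes: the function name `truncated_integral` is hypothetical, and the integrals are evaluated with the primitive $\log|x|$ rather than numerically.

```python
from math import log

def truncated_integral(eps_left, eps_right):
    """int_{-1}^{-eps_left} dx/x + int_{eps_right}^{1} dx/x via the primitive log|x|."""
    left = log(eps_left) - log(1.0)     # log|x| evaluated from -1 to -eps_left
    right = log(1.0) - log(eps_right)   # log|x| evaluated from eps_right to 1
    return left + right

for eps in (1e-2, 1e-4, 1e-6):
    symmetric = truncated_integral(eps, eps)       # singularity skipped symmetrically
    skewed = truncated_integral(eps, 2 * eps)      # asymmetric removal
    print(eps, symmetric, skewed)
```

The symmetric truncations give 0 for every `eps`, so the principal value exists and equals 0, while the asymmetric ones stay at $-\log 2$: the limit depends on how the singularity is removed.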

Now, we will combine the iterated integration technique, the change to
polar variables and the notion of improper integral to obtain a very important
example: the famous Gaussian integral.
Example 6.6.1.
\[ \int_{-\infty}^{+\infty} e^{-x^2}\, dx = \sqrt{\pi} \]
Once we are convinced about the existence (finiteness) of
\[ I = \int_{-\infty}^{+\infty} e^{-x^2}\, dx = \lim_{S \to +\infty} \int_{-S}^{S} e^{-x^2}\, dx \]
we will consider the following improper integral on the plane
\[ J = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{-x^2 - y^2}\, dx\, dy. \]
Using integration on squares $[-S, S]^2$ we deduce that $J = I^2$. However, the
plane integral can be calculated through circles with the same result. Indeed,
the function is positive and every circle is contained in a square and vice versa.
With the help of polar coordinates we have
\[ I^2 = \lim_{R \to +\infty} \int_0^{2\pi} \int_0^R e^{-r^2}\, r\, dr\, d\theta = \lim_{R \to +\infty} 2\pi \left[ -\frac{1}{2} e^{-r^2} \right]_{r=0}^{R} = \pi \]
and therefore we get that $I = \sqrt{\pi}$.
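Both sides of the argument can be checked numerically. The sketch below is only an illustration: the function names `gauss_1d` and `gauss_polar`, the truncation at $S = R = 6$ and the number of nodes are assumptions, and the radial integral uses the closed form $\pi (1 - e^{-R^2})$ obtained from the primitive above.

```python
from math import exp, pi, sqrt

def gauss_1d(S, n):
    """Midpoint rule for the integral of exp(-x^2) over [-S, S]."""
    h = 2.0 * S / n
    return sum(exp(-(-S + (k + 0.5) * h) ** 2) for k in range(n)) * h

def gauss_polar(R):
    """Integral of exp(-x^2 - y^2) over the disc of radius R, from the primitive."""
    return pi * (1.0 - exp(-R * R))

I_S = gauss_1d(6.0, 2000)
print(I_S, sqrt(pi))             # truncated one-dimensional integral
print(I_S ** 2, gauss_polar(6.0))  # square integral versus disc integral
```

The integral over the square $[-6, 6]^2$ and the integral over the disc of radius 6 already agree with $\pi$ to many digits, in line with the sandwich of circles and squares used in the example.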

6.7 Rationale and remarks


The Riemann integral is more constructive, so more pedagogical, than the Lebesgue
integral (I do not share Dieudonné's views [14, p. 146]). The chapter is developed
so that we could eventually skip Lebesgue theory, using Jordan measurability
and content, which is enough for a practical use of integration.

In the same spirit, we include an independent proof of the transformation
of the integral to polar coordinates in case there is no time to fully develop
the change of variables theorem. With the same idea it is possible to justify the
change to spherical coordinates and some other ones simple enough.

Note that the content is denoted $c$, although it coincides with the Lebesgue
measure, denoted $m$ later.

6.8 Exercises
1. Prove that the null content sets in $\mathbb{R}^d$ are stable by finite unions and
closures.

2. Let $R \subset \mathbb{R}^d$ be a rectangle, $D \subset R$ a null content subset and $f : R \to \mathbb{R}$
a bounded function such that $f(x) = 0$ for every $x \in R \setminus D$. Prove that
$f$ is Riemann integrable on $R$ and compute its integral.

3. For $x \in [0, 1]$ consider the set
\[ N(x) = \{n \in \mathbb{N} : \lfloor 3^n x \rfloor - 1 \text{ is a multiple of } 3\} \]
where $\lfloor \cdot \rfloor$ is the integer part of a real number. Consider also the set
\[ D = \{x \in [0, 1] : N(x) \ne \emptyset\}. \]
Finally, for $x \in D$ take $n(x) = \min(N(x))$.
(a) Prove that $[0, 1] \setminus D$ is a null measure set.
(b) Prove that the function $f : [0, 1] \to \mathbb{R}$ defined by $f(x) = 3^{-n(x)}$ if
$x \in D$ and $f(x) = 0$ otherwise, is Riemann integrable.
(c) Calculate $\int_0^1 f(x)\, dx$.
4. A cylindrical cradle is a body limited by the horizontal plane, a vertical
cylinder and a tilted plane that meets the horizontal plane at a diameter
of the cylinder base. Find the volume of a cylindrical cradle of radius $r$
and height $h$.
5. Let $D = [0, 1]^2$. Find the values $\alpha > 0$ for which the improper
integral
\[ \iint_D \frac{dx\, dy}{|x - y|^\alpha} \]
is finite, and calculate its value.

6. Compute
\[ \int_0^{\sqrt{\pi}} \left( \int_y^{\sqrt{\pi}} \sin(x^2)\, dx \right) dy. \]

7. Prove the convergence and compute for $D = \{(x, y) : x, y \ge 0, x^2 + y^2 \le 1\}$
the integral
\[ \iint_D \frac{dx\, dy}{x + y}. \]

8. Let $f : R \to \mathbb{R}$ be a bounded function defined on a rectangle $R \subset \mathbb{R}^d$. Prove
that $f$ is Riemann integrable on $R$ if and only if its graph
\[ \operatorname{graf}(f) = \{(x, f(x)) : x \in R\} \]
has null content. Prove, as a consequence, that the compact smooth
manifolds in $\mathbb{R}^d$ of dimension $d - 1$ have null content.

9. A function $f : [a, b] \to \mathbb{R}$ is said to be a step function if there exists a partition
of $[a, b]$ such that $f$ is constant on the interior of each of the intervals
defined by the partition. A function is said to be ruled if it is a uniform limit
of step functions. Prove the following statements:
(a) Ruled functions have at most countably many discontinuities.
(b) Ruled functions are Riemann integrable on their domains.
(c) A function is ruled if and only if at each $c \in [a, b)$ the limit $\lim_{x \to c^+} f(x)$ exists
and at each $c \in (a, b]$ the limit $\lim_{x \to c^-} f(x)$ exists.
10. Let $f : [0, a] \to [0, b]$ be a decreasing continuous bijection. Prove with
the help of a plane integral that
\[ \int_0^a f(x)\, dx = \int_0^b f^{-1}(x)\, dx. \]

11. Generalize the Riemann-Lebesgue theorem for functions defined on Jordan
domains.
12. Let $D_i \subset \mathbb{R}^d$, $i = 1, \dots, n$, be pairwise disjoint Jordan measurable sets and
let $f_i : D_i \to \mathbb{R}$ be Riemann integrable functions. Prove the Riemann
integrability of the “glued” function $f : D \to \mathbb{R}$ where $D = \bigcup_{i=1}^n D_i$ and
$f(x) = f_i(x)$ if $x \in D_i$. Show also that
\[ \int_D f = \sum_{i=1}^n \int_{D_i} f_i. \]

Chapter 7

Change of Variables in
Integration

7.1 Linear volume transformations


The objective of this section is to prove that for any linear map $T : \mathbb{R}^n \to \mathbb{R}^n$
one has that
\[ m(T(A)) = |\det(T)|\, m(A) \]
where $m$ means volume in the sense of the Jordan content or the Lebesgue
measure (Chapter 9). The statement implicitly includes the measurability
of $T(A)$. Firstly note that the measurability is clear if $\det(T) = 0$ because in
such a case the image is included in a subspace of dimension $n - 1$, so it
has measure 0, or content 0 if it is moreover bounded. If $\det(T) \ne 0$, then
$T$ is a homeomorphism, so it preserves topology and Borel measurability.
We will discuss Lebesgue or Jordan measurability. Lebesgue measurability is
characterized by the fact that a measurable set differs from a Borel set in a set
of null measure. Jordan measurable sets are characterized by having Lebesgue
null boundary. Thus in both cases it is enough to study how $T$ transforms null
sets. The main tool for this purpose is an estimation of the measure of images
under Lipschitz maps.
Proposition 7.1.1. Assume that $\mathbb{R}^n$ is endowed with the supremum norm.
Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^n$ be a Lipschitz map with constant $\lambda$. Then
\[ m^*(f(A)) \le \lambda^n m^*(A) \]
for every $A \subset D$, where $m^*$ denotes the outer Lebesgue measure.

Proof. The result is a consequence of three easy observations. Firstly, the
Lebesgue outer measure can be approximated by coverings of balls (actually
cubes) if the norm we are using is $\|\cdot\|_\infty$ (actually, thanks to a result of Vitali we
may use any norm for the same purpose). Indeed, the outer measure is defined
by coverings of generalized rectangles and those rectangles can be arbitrarily
approached by non-overlapping unions of cubes. The second observation is
that for any ball we have
\[ f(B[x, r]) \subset B[f(x), \lambda r]. \]
Finally we have $m(B[x, \lambda r]) = \lambda^n m(B[x, r])$.

As the notion of null measure does not depend on the norm, we have the following.
Corollary 7.1.2. A locally Lipschitz map $f : D \subset \mathbb{R}^n \to \mathbb{R}^n$ carries Lebesgue
null sets to Lebesgue null sets.
Proof. The domain $D$ can be decomposed into countably many domains where
the restriction of $f$ is Lipschitz, so the previous proposition is applicable.

Now that the measurability of $T(A)$ is not a problem, remember that the
Lebesgue measure is the unique non trivial translation invariant Borel measure
on $\mathbb{R}^n$, up to a multiplicative positive constant. Therefore, for every linear $T$
there is $k(T) \ge 0$ such that
\[ m(T(A)) = k(T)\, m(A). \]
Obviously $k(T)$ is the volume of the image of the unit cube through $T$.
Note also that the constant is multiplicative:
\[ k(TS) = k(T)\, k(S). \]

Now we will prove the following result.

Theorem 7.1.3. Let $k$ be a nonnegative nontrivial multiplicative function defined
on the square matrices of size $n$. Assume that $k$ is continuous at the
identity matrix $I$. Then there is $\lambda \in \mathbb{R}$ such that
\[ k(T) = |\det(T)|^\lambda. \]

Proof. Clearly $k(I) = k(II) = k(I)^2$. Since $k$ is not trivial we have $k(I) = 1$.
If $T$ is invertible we have $k(T)\, k(T^{-1}) = k(I) = 1$. Therefore $k(T) \ne 0$ and
$k(T^{-1}) = k(T)^{-1}$. We deduce that $k$ takes the same value for similar matrices:
\[ k(S^{-1} T S) = k(S^{-1})\, k(T)\, k(S) = k(T). \]
Consider now the diagonal matrices $D_x$ having 1's on the diagonal except the
first entry which takes the value $x \in \mathbb{R}$. Observe that
\[ k(D_x)^2 = k(D_x D_x) = k(D_{-x} D_{-x}) = k(D_{-x})^2 \]
and so $k(D_{-x}) = k(D_x)$. If we define $f(x) = k(D_x)$ then $f(-x) = f(x)$
and $f(xy) = f(x) f(y)$. Since the assignment $x \mapsto D_x$ is affine (among other
properties), hence continuous, the function $f$ is continuous at $x = 1$. The function $g(t) = \log(f(e^t))$
defined on $\mathbb{R}$ satisfies the equation $g(t + s) = g(t) + g(s)$ and it is continuous at
0. It is well known that there is $\lambda \in \mathbb{R}$ such that $g(t) = \lambda t$ and thus $f(x) = |x|^\lambda$.
Now, a diagonal matrix can be written as a product of matrices which are
similar to matrices of the form $D_x$ (a permutation of the basis is a similarity operation),
the $x$'s being the eigenvalues. We deduce that $k(D) = |\det(D)|^\lambda$ for a diagonal
matrix. The result now extends to symmetric matrices, which are known to be
similar to diagonal matrices. Finally, an arbitrary matrix $T$ is similar to its
transpose, implying that $k(T) = k(T^t)$. As $T T^t$ is symmetric we
deduce
\[ k(T) = \sqrt{k(T T^t)} = |\det(T T^t)|^{\lambda/2} = |\det(T)|^\lambda \]
which concludes the proof.

Now we can achieve the objective stated at the beginning of the section.
Theorem 7.1.4. Given a linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ we have
\[ m(T(A)) = |\det(T)|\, m(A) \]
for any set $A \subset \mathbb{R}^n$ which is measurable, either in the sense of Jordan or Lebesgue.


Proof. Assume $\mathbb{R}^n$ is equipped with the supremum norm, and take $C =
B[0, 1/2]$. The set $C$ has $n$-dimensional volume 1. If we set $k(T) =
m(T(C))$, we already know that $m(T(A)) = k(T)\, m(A)$ for any measurable set
and $k(TS) = k(T)\, k(S)$. Therefore, in order to prove the result we only have
to reduce it to the previous theorem, checking that $k$ is continuous at $I$ and that the
constant $\lambda$ must be equal to 1. Let $0 < \varepsilon < 1$. Since the operation of taking
inverses is continuous, there is $0 < \delta < \varepsilon/2$ such that $\|T - I\| < \delta$ implies
$\|T^{-1} - I\| < \varepsilon/2$. We have $\|T\|, \|T^{-1}\| \le 2$ and
\[ T(C) \subset C + B[0, \varepsilon/2] = (1 + \varepsilon) C; \]
\[ T^{-1}(C) \subset C + B[0, \varepsilon/2] = (1 + \varepsilon) C. \]
Applying $T$ to the last inclusion we get $C \subset (1 + \varepsilon) T(C)$, and it follows that
\[ (1 + \varepsilon)^{-1} C \subset T(C) \subset (1 + \varepsilon) C \]
which implies the continuity of $k$ at $I$ because $(1 + \varepsilon)^{-n} \le k(T) \le (1 + \varepsilon)^n$.
Now we have that $k(T) = |\det(T)|^\lambda$ for some $\lambda$. If we set $T = 2I$ we have
\[ 2^n = m(T(C)) = k(T) = |\det T|^\lambda = 2^{\lambda n} \]
and it follows that $\lambda = 1$ as wanted.
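The identity $m(T(C)) = |\det(T)|$ for the unit cube can be tested numerically in the plane. The sketch below is an illustration under stated assumptions: the function name `image_area`, the sample matrix and the grid size are hypothetical, and the area is only estimated by counting midpoints of a fine grill that fall inside the image parallelogram.

```python
def image_area(a, b, c, d, n=400):
    """Estimate the area of T([0,1]^2) for the invertible matrix T = [[a, b], [c, d]].

    Midpoints of an n-by-n grill of the bounding box of the parallelogram are
    pulled back with the inverse of T (Cramer's rule) and counted when they
    land in the unit square.
    """
    det = a * d - b * c
    xs = [0.0, a, b, a + b]   # x-coordinates of the vertices of the image
    ys = [0.0, c, d, c + d]   # y-coordinates of the vertices of the image
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    count = 0
    for i in range(n):
        for j in range(n):
            x, y = x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy
            u = (d * x - b * y) / det
            v = (-c * x + a * y) / det
            if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
                count += 1
    return count * hx * hy

# Hypothetical example: det = 2 * 1.5 - 1 * 0.5 = 2.5.
estimate = image_area(2.0, 1.0, 0.5, 1.5)
print(estimate, abs(2.0 * 1.5 - 1.0 * 0.5))
```

The counting estimate is only accurate up to the cells that meet the boundary of the parallelogram, so the agreement improves as `n` grows; this is the same mechanism by which the boundary, being of null content, does not affect the Jordan content.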

The formula for the transformation of volumes through linear
maps can also be proved by geometrical considerations, which are especially
clear for $\mathbb{R}^2$: showing that a parallelogram is equivalent to a rectangle by
decomposing it into 2 pieces.

7.2 The change of variables theorem


Now we will turn our attention to nonlinear transformations. Let us start with
this fact, which is just Lemma 5.1.2 stated in $\mathbb{R}^n$.
Lemma 7.2.1. Let $T : D \subset \mathbb{R}^n \to \mathbb{R}^n$ be a $C^1$ map and let $x_0 \in D$ be such that
$dT(x_0)$ is nonsingular. Then for every $0 < \eta < 1$ there exists $\delta > 0$ such that
$T|_{B[x_0, \delta]}$ is one-to-one, $T^{-1}$ is differentiable at $T(x_0)$ and
\[ T(x_0) + dT(x_0)(B[0, (1 - \eta)\delta]) \subset T(B[x_0, \delta]) \subset T(x_0) + dT(x_0)(B[0, (1 + \eta)\delta]). \]
In particular, the image through $T$ of a neighbourhood of $x_0$ is a neighbourhood
of $T(x_0)$. Moreover, $T(U)$ is open whenever $U \subset D$ is open and $dT(x)$ is
nonsingular at every point $x \in U$.
After the proof of the lemma a remark was made: the number $\delta > 0$ only
depends on the modulus of continuity of $dT(x)$ and an upper bound for $\|dT^{-1}\|$.

102
Theorem 7.2.2. Let R ⊂ ℝ^n be a compact rectangle and let T : R → ℝ^n be
a one-to-one C^1 map with dT nonsingular on R. Then T(R) is Jordan
measurable and, for every Riemann integrable function f : T(R) → ℝ, the
function f ◦ T is Riemann integrable on R and
\[ \int_{T(R)} f = \int_R (f \circ T)\, |\det(dT)| \]
where the determinant is computed for the matrix of dT with respect to the
canonical bases.
Proof. First of all, we may assume that R has nonempty interior. Otherwise
R would have measure 0, and so would its image T(R), so the result is
trivially true. We may assume that f ≥ 0 as well. By Theorem 5.1.4 we know
that the interior of R is transformed into an open set by T, therefore the
boundary of T(R) is contained in T(∂R), which has null measure. That implies
T(R) is Jordan measurable.
Observe that ∥(dT)^{-1}∥ is bounded on T(R), which implies that T^{-1} is
Lipschitz. If D is the null measure set of discontinuities of f then T^{-1}(D)
is also null. Since the set of discontinuities of f ◦ T is exactly T^{-1}(D) we
get that f ◦ T is Riemann integrable.
We may choose the norm of ℝ^n so that its unit ball is a translation of R. Take
0 < η < 1 and note that now R can be decomposed into N^n non-overlapping
balls of radius 1/N. By the continuity of dT on a larger open set containing R
we may take N large enough to guarantee that the set containment of the
Lemma can be applied with such η to all the balls of radius 1/N. Let x_k with
1 ≤ k ≤ N^n be the centres of the balls covering R and B_k = B[x_k, 1/N]. We
have now
\[ T(x_k) + dT(x_k)\Big(B\Big[0, \frac{1-\eta}{N}\Big]\Big) \subset T(B_k) \subset T(x_k) + dT(x_k)\Big(B\Big[0, \frac{1+\eta}{N}\Big]\Big). \]
Having in mind that m(L(S)) = |det(L)| m(S) for any linear map L and any
compact rectangle S, we get
\[ (1-\eta)^n\, m(B_k)\, |\det(dT(x_k))| \le m(T(B_k)) \le (1+\eta)^n\, m(B_k)\, |\det(dT(x_k))|. \]
Multiplying by f(T(x_k)) and adding we get
\[ (1-\eta)^n \sum_{k=1}^{N^n} f(T(x_k))\, m(B_k)\, |\det(dT(x_k))| \le \sum_{k=1}^{N^n} f(T(x_k))\, m(T(B_k)) \le (1+\eta)^n \sum_{k=1}^{N^n} f(T(x_k))\, m(B_k)\, |\det(dT(x_k))|. \]
The sums are of Riemann type: standard ones at the ends, and one associated
to a Jordan partition of T(R) in the middle. So letting N go to infinity we get
\[ (1-\eta)^n \int_R (f \circ T)\, |\det(dT)| \le \int_{T(R)} f \le (1+\eta)^n \int_R (f \circ T)\, |\det(dT)|. \]
As η can be taken arbitrarily close to 0 we get the desired result.

The result can be extended to general Jordan domains.


Theorem 7.2.3. Let D ⊂ ℝ^n be an open Jordan domain and let T : D → ℝ^n be a
C^1 map such that T is one-to-one and dT is nonsingular on D. Then T(D) is
Jordan measurable and, for every Riemann integrable function f : T(D) → ℝ,
the function f ◦ T is Riemann integrable on D and
\[ \int_{T(D)} f = \int_D (f \circ T)\, |\det(dT)|. \]

Proof. The arguments employed above for the Jordan measurability of T(D)
and the Riemann integrability of f ◦ T can be adapted here with some small
changes. As before T(D) is open and the boundary of T(D) is included in
T(∂D). Here T^{-1} is only locally Lipschitz, which still implies that the set
of discontinuities of f ◦ T is null.
To prove the formula, cover ∂D with a finite union S of compact rectangles
whose volumes sum to less than ε. Then D \ S can be decomposed into a finite
union of non-overlapping rectangles. Applying the previous theorem on each
rectangle, and having in mind that the images by T of the rectangles are
non-overlapping, gives
\[ \int_{T(D \setminus S)} f = \int_{D \setminus S} (f \circ T)\, |\det(dT)|. \]
If M is an upper bound for |f| we have
\[ \left| \int_{T(D)} f - \int_{T(D \setminus S)} f \right| \le \int_{T(D \cap S)} |f| \le M\, m(T(D \cap S)) \]
and
\[ \left| \int_{D} (f \circ T)\, |\det(dT)| - \int_{D \setminus S} (f \circ T)\, |\det(dT)| \right| \le \int_{D \cap S} |f \circ T|\, |\det(dT)| \le M \lambda^n m(S), \]
where λ is an upper bound for ∥dT∥ (assumed finite here), so that
|det(dT)| ≤ λ^n and m(T(D ∩ S)) ≤ λ^n m(S). Both bounds can be made
arbitrarily small, which leads to the desired equality.
Corollary 7.2.4. If D is an open Jordan domain and T : D → ℝ^n is a C^1
map such that T is one-to-one and dT is nonsingular on D, then T(D) is a
Jordan domain and
\[ m(T(D)) = \int_D |\det(dT)|. \]
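As a numerical illustration of the corollary, the sketch below approximates m(T(D)) for the polar map T(r, θ) = (r cos θ, r sin θ) on D = (0, 1) × (0, 2π), whose image is the unit disc minus a segment. The function names, the finite-difference step and the grid size are assumptions made for this example only; the Jacobian is estimated by central differences and integrated with a midpoint Riemann sum.

```python
import math


def jacobian_det(T, u, v, h=1e-6):
    """Central-difference approximation of det(dT) at (u, v) for T: R^2 -> R^2."""
    # Partial derivatives of both components with respect to u and to v.
    d_u = [(p - q) / (2 * h) for p, q in zip(T(u + h, v), T(u - h, v))]
    d_v = [(p - q) / (2 * h) for p, q in zip(T(u, v + h), T(u, v - h))]
    return d_u[0] * d_v[1] - d_v[0] * d_u[1]


def measure_of_image(T, a, b, c, d, N=200):
    """Midpoint Riemann sum of |det(dT)| over the rectangle [a, b] x [c, d]."""
    du, dv = (b - a) / N, (d - c) / N
    total = 0.0
    for i in range(N):
        for j in range(N):
            u, v = a + (i + 0.5) * du, c + (j + 0.5) * dv
            total += abs(jacobian_det(T, u, v)) * du * dv
    return total


def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))


area = measure_of_image(polar, 0.0, 1.0, 0.0, 2 * math.pi)
print(area, math.pi)
```

Here |det(dT)| = r, so the sum approximates the integral of r over D, which is the area π of the disc.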

7.3 The Morse-Sard theorem


So far we have been asking the map T to have nonsingular differential dT
on the interior of the domain. We will see that the singular points of dT
are actually negligible when it comes to integration. That is the spirit of the
following result, known as the Morse-Sard theorem, which we will prove only in
the version for mappings between spaces of the same dimension.

Theorem 7.3.1. Let T : D ⊂ Rn → Rn be a C 1 map and consider the set of


singular points
S = {x ∈ D : dT (x) is singular}.
Then T (S) has null measure.

Proof. First of all note that S is closed. The proof will be by induction on
the dimension n.
Suppose n = 1. It is enough to show that T(S ∩ (a, b)) has null measure for
every bounded interval (a, b) ⊂ D. Note that
in this case dT (x) is singular if and only if T ′ (x) = 0. Given ε > 0, consider
the set
U = {x ∈ (a, b) : |T ′ (x)| < ε}
Then S ∩ (a, b) ⊂ U and T is ε-Lipschitz on every interval composing U thanks
to the mean value theorem. Now apply Proposition 7.1.1 to obtain that

m∗ (T (S ∩ (a, b))) ≤ ε m∗ (S ∩ (a, b)) ≤ ε(b − a).

That implies m∗ (T (S ∩ (a, b))) = 0 as ε is arbitrary.


Assume now that the statement is proven for n − 1. We will write

T(x) = (f1(x1, x2, …, xn), f2(x1, x2, …, xn), …, fn(x1, x2, …, xn)).

Consider the set
\[ Z = \{x \in D : dT(x) = 0\} = \Big\{x \in D : \frac{\partial f_i}{\partial x_j}(x) = 0,\ 1 \le i, j \le n\Big\}. \]
Obviously Z ⊂ S. If B is an arbitrary closed ball and ε > 0, then it is possible
to cover Z ∩ B with finitely many non-overlapping convex sets such that T
is ε-Lipschitz on each of them, thanks to the mean value theorem (in several
variables). Reasoning as in the 1-dimensional case, that gives m*(T(Z ∩ B)) ≤
ε m*(B), which implies m*(T(Z)) = 0 since B and ε are arbitrary.
The objective now is to show that T(S \ Z) has null measure. Note that it
is enough to show that every x0 ∈ S \ Z has a neighbourhood U such that
T((S \ Z) ∩ U) is null. As x0 ∈ S \ Z there are i, j such that
∂f_i/∂x_j (x0) ≠ 0. Reordering the variables and the coordinate functions we
may assume that ∂f_1/∂x_1 (x0) ≠ 0. Consider the map G(x) = (f1(x), x2, …, xn)
and note that dG(x0) is not singular. By the inverse mapping theorem there are
neighbourhoods U of x0 and V of G(x0) such that G is a bijection from U onto V.
The composition H = T ◦ G^{-1} defined on V is of the form

H(y) = (y1 , h2 (y), . . . , hn (y))

and dH(y) is singular if and only if y ∈ A = G((S \ Z) ∩ U), because such
points come from the singular points of dT. On the other hand, dH(y) is
singular if and only if the (n − 1)-dimensional Jacobian
\[ \begin{pmatrix} \dfrac{\partial h_2}{\partial y_2} & \dots & \dfrac{\partial h_2}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial h_n}{\partial y_2} & \dots & \dfrac{\partial h_n}{\partial y_n} \end{pmatrix} \]
is singular at y, thanks to the particular form of H. Evidently
T((S \ Z) ∩ U) = H(A), so it is enough to prove that H(A) is null. For t ∈ ℝ we
will consider the affine hyperplane

P_t = {y = (y1, …, yn) ∈ ℝ^n : y1 = t}

and the map H_t(y2, …, yn) = (h2(t, y2, …, yn), …, hn(t, y2, …, yn)), defined
for (t, y2, …, yn) ∈ V ∩ P_t, whose Jacobian is the matrix just above. The set
of singular points of dH_t is exactly A ∩ P_t. By the induction hypothesis we
get that H_t(A ∩ P_t) is null and therefore

{t} × H_t(A ∩ P_t) = P_t ∩ H(A)

is an (n − 1)-dimensional null set, and this is true for every t ∈ ℝ. A
well-known consequence of Fubini's theorem says that H(A) is an n-dimensional
null set.
7.4 Brouwer fixed point theorem
A spectacular application of the change of variables formula is a simple proof
of the topological theorem about fixed points due to Brouwer.
Theorem 7.4.1. A continuous map from B_{ℝ^n} into itself has a fixed point.
Throughout the section n ∈ ℕ is fixed and we will write B = B_{ℝ^n} and S = ∂B.
By standard techniques it is easy to prove the equivalence of the fixed point
property (FPP) for B with the nonexistence of a retraction of B onto S, that is,
a continuous map from B onto S that fixes the points of S. That can be done
not only in the category of continuous maps but also in that of C^1 maps, which
will be important for the proof.
Lemma 7.4.2. The FPP of B for C^1 maps implies the FPP of B for continuous
maps.
Proof of Theorem 7.4.1. Let P be a C^1 retraction of B onto S. We
will arrive at a contradiction after a witty construction. For every t ∈ [0, 1]
take
P_t(x) = (1 − t)x + tP(x)
and note that P_t is a C^1 map from B into itself that fixes S. We claim that
for t small enough P_t is a homeomorphism onto its image. Indeed, let L be
the Lipschitz constant of P. Then
∥P_t(x) − P_t(y)∥ = ∥(1 − t)(x − y) + t(P(x) − P(y))∥
≥ (1 − t)∥x − y∥ − t∥P(x) − P(y)∥ ≥ (1 − t)∥x − y∥ − Lt∥x − y∥
= (1 − (L + 1)t)∥x − y∥.
Therefore, taking t < (L + 1)^{-1} the inverse of P_t is defined and Lipschitz.
Moreover, for t small enough the map and its inverse are open. Indeed, that is
a consequence of the Inverse Mapping Theorem, since det(dP_t) is nearly 1 for t
close to 0. That implies P_t carries S one-to-one onto ∂P_t(B). As P_t fixes S
we deduce that P_t(B) = B for t small enough.
On the other hand, note that det(dP_t(x)) is a polynomial in t, and so is the
function
\[ h(t) = \int_B \det(dP_t(x))\, dx \]
defined for t ∈ [0, 1]. As P_t is a diffeomorphism for t small enough, the
change of variables formula (Theorem 7.2.3) says that h(t) = m(B) > 0 for such
t. As h is a polynomial, being constant on an interval implies being constant
everywhere. However, h(1) = 0 because P_1 = P collapses B onto S, so dP is
singular at every point. That is a contradiction.
Corollary 7.4.3. Any compact set that is homeomorphic to, or a retract of,
a Euclidean ball has the FPP.

7.5 Assorted changes of variables


In practice we do not need to introduce many new letters to perform a change
of variables: if we put

T(u1, …, un) = (x1(u1, …, un), …, xn(u1, …, un))

then the Jacobian of T is usually denoted
\[ \frac{\partial(x_1, \dots, x_n)}{\partial(u_1, \dots, u_n)} = \det \begin{pmatrix} \dfrac{\partial x_1}{\partial u_1} & \dots & \dfrac{\partial x_1}{\partial u_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial u_1} & \dots & \dfrac{\partial x_n}{\partial u_n} \end{pmatrix} \]
so the change of variables in the integral becomes
\[ \int \cdots \int_{T(D)} f(x_1, \dots, x_n)\, dx_1 \dots dx_n = \int \cdots \int_{D} (f \circ T)(u_1, \dots, u_n) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(u_1, \dots, u_n)} \right| du_1 \dots du_n, \]
which is easy to remember.

7.5.1 Sum of the inverse of the squared integers


Consider the integral
\[ \int_0^1 \int_0^1 \frac{dx\,dy}{1 - xy}. \]
Since (1 − xy)^{-1} = Σ_{n=0}^∞ x^n y^n, integration term by term gives
\[ \int_0^1 \int_0^1 \frac{dx\,dy}{1 - xy} = \sum_{n=0}^{\infty} \int_0^1 \int_0^1 x^n y^n\, dx\,dy = \sum_{n=1}^{\infty} \frac{1}{n^2}, \]
where the equality can be justified by the monotone convergence theorem (an
application of Riemann theory needs a more detailed analysis).
Let T be the triangle with vertices (0, 0), (1, 0) and (1, 1). By symmetry we
have
\[ \int_0^1 \int_0^1 \frac{dx\,dy}{1 - xy} = 2 \iint_T \frac{dx\,dy}{1 - xy}. \]
Consider now the change of variables given by x = v + u, y = v − u, where
(u, v) runs over the triangle D with vertices (0, 0), (1/2, 1/2) and (0, 1). As
the Jacobian is 2 we have
\[ \iint_T \frac{dx\,dy}{1 - xy} = 2 \iint_D \frac{du\,dv}{1 - v^2 + u^2}. \]

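In order to compute the last integral we use the decomposition
\[ \iint_D \frac{du\,dv}{1 - v^2 + u^2} = \int_0^{1/2} \int_0^{v} \frac{du\,dv}{1 - v^2 + u^2} + \int_{1/2}^{1} \int_0^{1-v} \frac{du\,dv}{1 - v^2 + u^2}. \]
The first integral is calculated as follows:
\[ \int_0^{1/2} \frac{1}{\sqrt{1 - v^2}} \arctan\Big(\frac{v}{\sqrt{1 - v^2}}\Big)\, dv = \int_0^{1/2} \arcsin(v)\, \frac{dv}{\sqrt{1 - v^2}} = \frac{1}{2} \arcsin(v)^2 \Big|_0^{1/2} = \frac{\pi^2}{72}. \]
As to the second integral, we have after a first integration
\[ \int_{1/2}^{1} \frac{1}{\sqrt{1 - v^2}} \arctan\Big(\frac{1 - v}{\sqrt{1 - v^2}}\Big)\, dv. \]
Putting v = cos t, then 1 − v = 2 sin^2(t/2) and
\[ \sqrt{1 - v^2} = \sin t = 2 \sin(t/2) \cos(t/2). \]
Therefore
\[ \frac{1 - v}{\sqrt{1 - v^2}} = \frac{\sin(t/2)}{\cos(t/2)} = \tan(t/2), \qquad \arctan\Big(\frac{1 - v}{\sqrt{1 - v^2}}\Big) = \frac{t}{2} = \frac{1}{2} \arccos(v), \]
and with this
\[ \int_{1/2}^{1} \frac{1}{\sqrt{1 - v^2}} \arctan\Big(\frac{1 - v}{\sqrt{1 - v^2}}\Big)\, dv = \frac{1}{2} \cdot \frac{-1}{2} \arccos(v)^2 \Big|_{1/2}^{1} = \frac{\pi^2}{36}. \]
Finally,
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = 4 \Big( \frac{\pi^2}{72} + \frac{\pi^2}{36} \Big) = \frac{\pi^2}{6} \]
as desired.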
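The two one-variable integrals obtained above can be checked numerically. This is a rough sketch, not part of the argument: the midpoint rule, the number of nodes and the function names are choices made for this illustration.

```python
import math


def midpoint(f, a, b, N=2000):
    """Midpoint rule for the integral of f over [a, b] with N subintervals."""
    h = (b - a) / N
    return h * sum(f(a + (i + 0.5) * h) for i in range(N))


def first(v):
    # Integrand left after integrating in u over [0, v].
    s = math.sqrt(1 - v * v)
    return math.atan(v / s) / s


def second(v):
    # Integrand left after integrating in u over [0, 1 - v].
    s = math.sqrt(1 - v * v)
    return math.atan((1 - v) / s) / s


I1 = midpoint(first, 0.0, 0.5)
I2 = midpoint(second, 0.5, 1.0)
basel = 4 * (I1 + I2)
print(I1, math.pi ** 2 / 72)
print(I2, math.pi ** 2 / 36)
print(basel, math.pi ** 2 / 6)
```

The midpoint rule never evaluates the second integrand at v = 1, where the expression takes the form 0/0 although the integrand itself stays bounded.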

7.5.2 Integrals of Euler


They are defined as parametric integrals in one variable; however, the relation
between them is explained with the help of a two-variable integral. Consider
for p, q > 0 the functions
\[ \Gamma(p) = \int_0^{+\infty} t^{p-1} e^{-t}\, dt, \qquad B(p, q) = \int_0^1 t^{p-1} (1 - t)^{q-1}\, dt. \]
Observe that
\[ \begin{aligned} \Gamma(p)\Gamma(q) &= \Big( \int_0^{+\infty} t^{p-1} e^{-t}\, dt \Big) \Big( \int_0^{+\infty} s^{q-1} e^{-s}\, ds \Big) \\ &= \iint_Q t^{p-1} s^{q-1} e^{-t-s}\, dt\, ds = \int_0^{+\infty} \Big( \int_0^1 (rw)^{p-1} ((1 - w) r)^{q-1} e^{-r}\, r\, dw \Big) dr \\ &= \Big( \int_0^{+\infty} r^{p+q-1} e^{-r}\, dr \Big) \Big( \int_0^1 w^{p-1} (1 - w)^{q-1}\, dw \Big) = \Gamma(p + q) B(p, q) \end{aligned} \]
where Q stands for the first quadrant and the change of variables is

t = rw, s = r(1 − w),

whose Jacobian (absolute value) is r. Therefore
\[ B(p, q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p + q)}. \]
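The identity can be tested numerically for particular exponents. The sketch below assumes p, q ≥ 1, so that the integrand of B(p, q) is bounded, and compares a midpoint approximation of the Beta integral with the quotient of Gamma values given by `math.gamma`; the function names and the number of nodes are illustrative choices.

```python
import math


def beta_midpoint(p, q, N=20000):
    """Midpoint approximation of the integral of t^(p-1) (1-t)^(q-1) over [0, 1]."""
    h = 1.0 / N
    total = 0.0
    for i in range(N):
        t = (i + 0.5) * h
        total += t ** (p - 1) * (1 - t) ** (q - 1)
    return h * total


def beta_gamma(p, q):
    """Right-hand side of the identity B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


for p, q in [(2, 3), (2.5, 3.5), (1, 1)]:
    print(p, q, beta_midpoint(p, q), beta_gamma(p, q))
```

For p = 2 and q = 3 both sides equal 1!·2!/4! = 1/12.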

7.5.3 Integrals of Dirichlet
The transformation of the pyramid

{(x1, …, xn) : x1 ≥ 0, …, xn ≥ 0, x1 + ⋯ + xn ≤ 1}

into the cube [0, 1]^n can be performed with this change of variables
\[ \begin{aligned} x_1 + x_2 + \dots + x_n &= u_1 \\ x_2 + \dots + x_n &= u_1 u_2 \\ &\ \,\vdots \\ x_n &= u_1 u_2 \cdots u_n \end{aligned} \]
that leads to x1 = u1(1 − u2), x2 = u1u2(1 − u3) and so on. The transformation
is a diffeomorphism between the interiors of the domains; however, the boundary
collapses (that will not be a problem). The Jacobian can be computed with
the usual tricks to reduce the complexity of the determinant:
\[ \frac{\partial(x_1, \dots, x_n)}{\partial(u_1, \dots, u_n)} = u_1^{n-1} u_2^{n-2} \cdots u_{n-1}. \]
With the help of this transformation one can, for instance, generalise the
formula of Euler above. Let p0, p1, …, pn > 0 and consider the pyramid

D = {(x1, …, xn) : x1 ≥ 0, …, xn ≥ 0, x1 + ⋯ + xn ≤ 1}.

Then
\[ \int \cdots \int_D x_1^{p_1 - 1} x_2^{p_2 - 1} \cdots x_n^{p_n - 1} (1 - x_1 - \dots - x_n)^{p_0 - 1}\, dx_1\, dx_2 \dots dx_n = \frac{\Gamma(p_0)\Gamma(p_1) \cdots \Gamma(p_n)}{\Gamma(p_0 + p_1 + \dots + p_n)}. \]
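For n = 2 the formula can be tested by integrating over the unit square after the change of variables x1 = u1(1 − u2), x2 = u1u2, whose Jacobian is u1. The sketch below is a numerical check under the assumption that all exponents are at least 1 (so the integrand is bounded); the function names and grid size are chosen for the example.

```python
import math


def dirichlet_via_cube(p0, p1, p2, N=200):
    """Midpoint sum over [0,1]^2 of the Dirichlet integrand pulled back by
    x1 = u1*(1 - u2), x2 = u1*u2, including the Jacobian u1 (case n = 2)."""
    h = 1.0 / N
    total = 0.0
    for i in range(N):
        u1 = (i + 0.5) * h
        for j in range(N):
            u2 = (j + 0.5) * h
            x1, x2 = u1 * (1 - u2), u1 * u2
            integrand = x1 ** (p1 - 1) * x2 ** (p2 - 1) * (1 - x1 - x2) ** (p0 - 1)
            total += integrand * u1 * h * h
    return total


def dirichlet_closed_form(*p):
    """Gamma(p0) ... Gamma(pn) / Gamma(p0 + ... + pn)."""
    return math.prod(math.gamma(x) for x in p) / math.gamma(sum(p))


print(dirichlet_via_cube(2, 2, 3), dirichlet_closed_form(2, 2, 3))
```

With p0 = 2, p1 = 2, p2 = 3 the closed form is Γ(2)Γ(2)Γ(3)/Γ(7) = 2/720 = 1/360.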

7.6 Rationale and remarks


The change of variables theorem is stated only for the Riemann integral;
nevertheless it can be adapted to the Lebesgue integral. We will not do
that explicitly, but it is clear that in Lebesgue theory some troubles of the
Riemann integral disappear.

The theorem of Brouwer is usually proved in Topology books with
discretisation and combinatorics. The proof using the change of variables is
due to Milnor and Rogers.

The change of variables of Dirichlet deserves a discussion in the classroom
because it is not evident how a triangle or a pyramid can be transformed into a
square or a cube, respectively.

7.7 Exercises
1. Find the volume of the body limited by the sphere x^2 + y^2 + z^2 = 1 and
the cylinder x^2 + y^2 = 2x.
2. Let D = {(x, y) : x^2 + y^2 ≤ 1}, and calculate
\[ \iint_D \sqrt{1 + x^2 + y^2}\, dx\,dy. \]
3. Let B = {(x, y, z) : x^2 + y^2 + z^2 ≤ 1}, and calculate
\[ \iiint_B \frac{dx\,dy\,dz}{x^2 + y^2 + (z - 2)^2}. \]
4. Let E = {(x, y, z) : 4x^2 + 9y^2 + 36z^2 ≤ 36}, and calculate
\[ \iiint_E (2x + 3y + 6z)^2\, dx\,dy\,dz. \]
5. Prove the convergence and find the value of the integral
\[ \iint_{\mathbb{R}^2} \frac{e^{-x^2 - y^2}}{\sqrt{x^2 + y^2}}\, dx\,dy. \]
6. Prove that Γ(n) = (n − 1)! for n ∈ ℕ.
7. Prove that Γ(1/2) = √π.
8. Prove that, for p > 0, the n-dimensional volume of the set
B_p^n = {(x1, …, xn) : |x1|^p + ⋯ + |xn|^p ≤ 1}
is
\[ \mathrm{vol}(B_p^n) = \frac{2^n\, \Gamma(1/p)^n}{p^n\, \Gamma(n/p + 1)}. \]
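Before attempting a proof of the last formula, it can be sanity-checked against volumes that are known independently. The snippet below is only such a check (the function name is an assumption); it does not replace the argument asked for in the exercise.

```python
import math


def lp_ball_volume(n, p):
    """Value of the formula 2^n Gamma(1/p)^n / (p^n Gamma(n/p + 1))."""
    return (2 ** n) * math.gamma(1 / p) ** n / (p ** n * math.gamma(n / p + 1))


print(lp_ball_volume(2, 2), math.pi)          # unit disc
print(lp_ball_volume(3, 2), 4 * math.pi / 3)  # Euclidean unit ball in R^3
print(lp_ball_volume(2, 1), 2.0)              # square with vertices (±1, 0), (0, ±1)
print(lp_ball_volume(3, 1), 4.0 / 3.0)        # regular octahedron
```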

Chapter 8

Measure Theory and Lebesgue Integral

8.1 Motivation
A moment's reflection on how the notion of area for polygons is treated in
elementary texts shows that the existence of the area and its additive
property are mostly assumed a priori, so the actual task is to compute the
areas of progressively more complicated polygons. In a second stage the area
can be extended to some nonpolygons, such as the circle, assuming monotonicity.
Well, it is possible to provide a sound basis for the elemental method: define
the area for triangles, show that it is independent of the position of the
triangle, prove that it is additive within the triangles, and finally extend it
to polygons by decompositions into triangles. However, knowing in advance the
scope of this method, we could opt for an easier approach which will lead to
the same results.

An elemental measure theory on ℝ^n can be developed as follows. We will
consider in the first step rectangles R = [a1, b1] × ⋯ × [an, bn] whose measure
is the number m(R) = (b1 − a1) ⋯ (bn − an). We may also consider products
of open intervals with the same measure (the border of the rectangles is
negligible) or infinite intervals with the convention +∞ · 0 = 0 · (+∞) = 0.
When a rectangle can be decomposed into a finite number of disjoint rectangles,
or merely non-overlapping ones, namely R = ⋃_{k=1}^n R_k, then
m(R) = Σ_{k=1}^n m(R_k). Indeed, any decomposition can be refined to a grid
decomposition, for which the additive property is just a consequence of the
distributivity of the product with respect to the sum.
We will say that a set is elemental if it is a union of finitely many
rectangles. The measure is extended additively to elemental sets using
non-overlapping decompositions into rectangles. The reason why an elemental set
can be reduced to a finite union of non-overlapping rectangles lies in the fact
that the difference of two rectangles is a finite union of rectangles. Checking
that the definition of m does not depend on how the decomposition is chosen,
and that m is additive with respect to finite disjoint (or non-overlapping)
unions, offers no challenge.
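The grid refinement mentioned above also gives a direct way to compute the measure of an elemental set. The following sketch is an illustration under stated assumptions: rectangles are encoded as lists of closed intervals (a_i, b_i), and the function name is hypothetical. It cuts space along every endpoint and adds the volumes of the grid cells contained in at least one rectangle.

```python
from itertools import product


def measure_of_union(rects):
    """Measure of a finite union of rectangles in R^n.

    Each rectangle is a list of n pairs (a_i, b_i) with a_i <= b_i.
    """
    if not rects:
        return 0
    n = len(rects[0])
    # Grid refinement: collect every endpoint along each coordinate axis.
    cuts = [sorted({e for r in rects for e in r[i]}) for i in range(n)]
    total = 0
    # Every grid cell is a product of consecutive cut intervals.
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        cell = [(cuts[i][j], cuts[i][j + 1]) for i, j in enumerate(idx)]
        inside = any(
            all(r[i][0] <= lo and hi <= r[i][1] for i, (lo, hi) in enumerate(cell))
            for r in rects
        )
        if inside:
            volume = 1
            for lo, hi in cell:
                volume *= hi - lo
            total += volume
    return total


A = [(0, 2), (0, 2)]
B = [(1, 3), (1, 3)]
print(measure_of_union([A, B]))
```

For the two overlapping squares A and B the cells counted are the same ones that appear in m(A) + m(B) − m(A ∩ B) = 4 + 4 − 1.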

The last step of the construction is the extension of the measure to a
more general class of sets. Since the measure must be monotone, given an
arbitrary set A ⊂ ℝ^n its measure m(A), in case it is possible to define it,
must satisfy m(E1) ≤ m(A) ≤ m(E2) whenever E1, E2 ⊂ ℝ^n are elemental
and E1 ⊂ A ⊂ E2. Let us say that a set A ⊂ ℝ^n is measurable (in the
sense of Jordan) if for every ε > 0 there are elemental sets E1, E2 ⊂ ℝ^n with
E1 ⊂ A ⊂ E2 and m(E2 \ E1) < ε. In that case, we can assign a measure to
A by
m(A) = sup{m(E) : E ⊂ A elemental} = inf{m(E) : E ⊃ A elemental}.
The numbers appearing are called inner content and outer content (in Jordan's
sense) of the set A respectively, so measurability appears as the agreement of
inner and outer contents. Finite unions and finite intersections of measurable
sets are measurable and the measure m is additive for finite disjoint
unions of measurable sets.

It is not difficult to prove that a bounded set A is measurable if and only
if its boundary ∂A has measure 0, which in this context means that it can be
covered by finitely many rectangles such that the sum of their measures can
be made arbitrarily small. That implies easily that all the simple geometrical
objects (polygons, circles, etc.) are measurable and their standard
measurement is backed up by a rigorous construction.

The method just sketched above, namely the Jordan theory of measure (see
Chapter 7), is somehow related to the Riemann integral. It serves well at an
elementary level but it has many limitations. For instance, sets as simple as
the rational numbers between 0 and 1 are not measurable. Moreover, the
approximation of the circle area from within using polygons can be understood
as a limit process which implies a decomposition of the circle into countably
many rectangles. The measure m should be countably additive as the geometric
interpretation deserves, however it cannot be. Otherwise, the set of rationals
would have measure 0, since it is a countable union of points.

It would be desirable to define a countably additive measure for (some) sets
of ℝ^n that generalizes the Jordan construction. This is actually possible and
the measure is called the Lebesgue measure. The next sections of the chapter
will be devoted to measures and their construction, and the Lebesgue measure
and its properties will appear as a by-product of more general results.

8.2 Measures
We need a family of sets where we can perform all the required operations
with a countably additive function. Motivated by the previous section we will
introduce algebras and σ-algebras. An algebra of subsets of a set Ω is a family
𝒜 ⊂ P(Ω) which satisfies

1. ∅, Ω ∈ 𝒜;
2. A ∈ 𝒜 implies A^c ∈ 𝒜;
3. ⋃_{k=1}^n A_k ∈ 𝒜 whenever A1, …, An ∈ 𝒜.

We say that a family Σ ⊂ P(Ω) is a σ-algebra if it is an algebra and satisfies
moreover

3'. ⋃_{n=1}^∞ A_n ∈ Σ whenever (A_n)_{n=1}^∞ ⊂ Σ.

As we will see, algebras and measures on them appear naturally; however, the
theory works better with σ-additivity on σ-algebras. On the other hand,
building nontrivial σ-additive measures is a delicate task that we will face in
later sections. Here we will study some properties of systems of sets and
measures provided they are given.

Note that all the σ-algebras that one can define on a nonempty set Ω lie
between the smallest one {∅, Ω} and the biggest one P(Ω). Since the
intersection of σ-algebras is again a σ-algebra, given F ⊂ P(Ω) there is a
smallest σ-algebra containing F, called the σ-algebra generated by F and
denoted σ(F). This σ-algebra can actually be built explicitly from F using
transfinite induction. Among the σ-algebras generated by families of sets in a
topological space we will consider the Borel σ-algebra, which is generated by
the open (equivalently, closed) sets, and the Baire σ-algebra, which is the
smallest one making the continuous functions measurable. Borel and Baire sets
coincide for a metrizable space but they are different in general.
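On a finite set Ω every algebra is a σ-algebra, and the generated one can be obtained by brute force, with no transfinite induction: close the family under complements and pairwise unions until nothing new appears. The following sketch illustrates this; the function name and the sample family are assumptions of the example.

```python
def generated_algebra(omega, family):
    """Smallest family containing `family`, the empty set and omega that is
    closed under complements and finite unions (omega is assumed finite)."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in family}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            # Complement of a, and unions of a with every set found so far.
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in sets:
                    sets.add(c)
                    changed = True
    return sets


sigma = generated_algebra({1, 2, 3, 4}, [{1}, {2}])
for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Here the generating sets {1} and {2} produce the blocks {1}, {2}, {3, 4}, and the generated σ-algebra consists of all unions of these blocks.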

Sometimes it is necessary to check that a given family of sets is a σ-algebra;
however, the stability by complements is far from being obvious. The following
notion can be helpful in those cases. A monotone class on Ω is a family of sets
ℳ ⊂ P(Ω) which is stable by countable monotone unions and intersections,
namely if A1 ⊂ A2 ⊂ ⋯ belong to ℳ then ⋃_{n=1}^∞ A_n ∈ ℳ, and if
B1 ⊃ B2 ⊃ ⋯ belong to ℳ then ⋂_{n=1}^∞ B_n ∈ ℳ.
Theorem 8.2.1. Let 𝒜 be an algebra of subsets of Ω. Then σ(𝒜) is the
smallest monotone class that contains 𝒜.
Proof. The existence of a smallest monotone class ℳ containing 𝒜 is clear,
and obviously it is contained in σ(𝒜) as any σ-algebra is a monotone class;
therefore we have to show the reverse inclusion. Take a set A ⊂ Ω and
consider the class
ℳ(A) = {B ⊂ Ω : A \ B, B \ A, A ∪ B ∈ ℳ}.
Observe that ℳ(A) is a monotone class for any A ⊂ Ω. Suppose that A ∈ 𝒜.
The definition of the set above implies clearly that 𝒜 ⊂ ℳ(A) and therefore
ℳ ⊂ ℳ(A) by minimality. Now suppose that B ∈ ℳ. Since B ∈ ℳ(A), the
symmetry of the definition implies that A ∈ ℳ(B), and this is true for any
A ∈ 𝒜. Therefore 𝒜 ⊂ ℳ(B), and so ℳ ⊂ ℳ(B), for any B ∈ ℳ. Appealing again
to the definition of ℳ(B) we deduce that if A, B ∈ ℳ then A \ B, A ∪ B ∈ ℳ.
The first containment implies the stability by complements (as Ω ∈ ℳ); the
second one implies the stability by countable unions, because now they can be
reduced to monotone unions.

A nonempty set Ω endowed with a σ-algebra Σ is called a measurable space.
A (countably additive or σ-additive, if we want to stress the notion) measure
µ defined on a measurable space (Ω, Σ) is a function µ : Σ → [0, +∞] which
satisfies µ(∅) = 0 and
\[ \mu\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \sum_{n=1}^{\infty} \mu(A_n) \]
whenever (A_n)_{n=1}^∞ ⊂ Σ are mutually disjoint. Sometimes we will have to
consider the weaker notion of finitely additive measure. In that case the
qualification is always necessary to avoid confusion. Finitely additive (and so
σ-additive) measures have the following properties, whose verification is left
to the reader:
1. If A, B ∈ Σ, A ⊂ B and µ(A) < +∞ then µ(B \ A) = µ(B) − µ(A).

2. If A, B ∈ Σ then µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B).

3. If (A_i)_{i=1}^n ⊂ Σ have finite measure then
\[ \mu\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{i} \mu(A_i) - \sum_{i<j} \mu(A_i \cap A_j) + \sum_{i<j<k} \mu(A_i \cap A_j \cap A_k) - \dots \pm \mu\Big( \bigcap_{i=1}^{n} A_i \Big). \]
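The third property (inclusion-exclusion) can be tried out with the counting measure on finite sets. In the sketch below the function name, the default choice µ = `len` and the sample sets are assumptions made for the illustration; the sums run over unordered groups of distinct indices.

```python
from itertools import combinations


def inclusion_exclusion(sets, mu=len):
    """Alternating sum of the measures of all intersections of r distinct sets."""
    total = 0
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(sets, r):
            total += sign * mu(set.intersection(*group))
    return total


sets = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
print(inclusion_exclusion(sets), len(set().union(*sets)))
```

For these three sets the single terms add up to 8, the pairwise intersections {3}, {1}, {4} contribute −3 and the triple intersection is empty, which matches the 5 elements of the union.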

Now some properties that take advantage of the σ-additivity.


Proposition 8.2.2. Let (Ω, Σ, µ) be a measure space and (A_n) ⊂ Σ.
1. If A1 ⊂ A2 ⊂ A3 ⊂ … then µ(⋃_{n=1}^∞ A_n) = lim_n µ(A_n).

2. If A1 ⊃ A2 ⊃ A3 ⊃ … and µ(A1) < +∞ then µ(⋂_{n=1}^∞ A_n) = lim_n µ(A_n).

Proof. In the first case define the sets B_n = A_n \ ⋃_{k<n} A_k and note that
A_n = ⋃_{k=1}^n B_k, the last union being disjoint. Therefore
\[ \lim_n \mu(A_n) = \lim_n \sum_{k=1}^{n} \mu(B_k) = \sum_{k=1}^{\infty} \mu(B_k) = \mu\Big( \bigcup_{k=1}^{\infty} B_k \Big) = \mu\Big( \bigcup_{k=1}^{\infty} A_k \Big). \]
The second case is a consequence of the first when applied to the sets C_n =
A1 \ A_n. Indeed, we have
\[ \mu\Big( \bigcap_{n=1}^{\infty} A_n \Big) = \mu\Big( A_1 \setminus \bigcup_{n=1}^{\infty} C_n \Big) = \mu(A_1) - \mu\Big( \bigcup_{n=1}^{\infty} C_n \Big) = \mu(A_1) - \lim_n \mu(C_n) = \lim_n \mu(A_1 \setminus C_n) = \lim_n \mu(A_n). \]

Now we will give an example in order to discuss a classification of measure
spaces. Consider a set Γ with the σ-algebra P(Γ) of all its subsets. Take
numbers (a_γ) ⊂ [0, +∞] for γ ∈ Γ and define µ(A) = Σ_{γ∈A} a_γ. It is not
difficult to check that the measure is σ-additive even though the sum could be
uncountable. The particular case with a_γ = 1 for all γ ∈ Γ is called the
cardinal measure. Note that the singletons {γ} of positive measure cannot
be decomposed into sets of smaller positive measure. Given a measure space
(Ω, Σ, µ), a set A ∈ Σ with µ(A) > 0 is called an atom if
0 ∈ {µ(B), µ(A \ B)} whenever B ∈ Σ with B ⊂ A. We say that two atoms
A, B ∈ Σ are equivalent if µ((A \ B) ∪ (B \ A)) = 0. Sometimes we may need to
work with sets of finite measure in order to obtain a result and then, in a
second step, extend the result to countable unions of those sets. We say that a
measure space (Ω, Σ, µ) is σ-finite if there exists (A_n) ⊂ Σ with
µ(A_n) < +∞ such that Ω = ⋃_{n=1}^∞ A_n. Note that our example (Γ, P(Γ), µ) is
σ-finite if and only if {γ : a_γ = +∞} = ∅ and {γ : a_γ ≠ 0} is countable. In
particular, the cardinal measure on Γ is σ-finite if and only if Γ is
countable. With the previous definitions we can prove the following results.
Proposition 8.2.3. A σ-finite measure space (Ω, Σ, µ) has at most countably
many nonequivalent atoms, whose union is called the atomic part; its
complement is called the atom-free part in case it has positive measure.
Proof. In this case the atoms must have finite measure. A maximal set of
nonequivalent atoms is necessarily at most countable, again by the
σ-finiteness. Indeed, if Ω is decomposed into disjoint parts with finite
measure (Ω_n) and A is an atom then there is only one n such that
µ(A \ Ω_n) = 0, that is, A ⊂ Ω_n except for a null set. That enforces that,
given m ∈ ℕ, only finitely many nonequivalent atoms essentially contained in
Ω_n have measure greater than 1/m.
Theorem 8.2.4. Let (Ω, Σ, µ) be an atom-free measure space. Then
{µ(A) : A ∈ Σ} = [0, µ(Ω)].
Proof. The application of Zorn's lemma allows us to find a maximal family
𝒜 of sets from Σ which is totally ordered by inclusion and such that µ|𝒜 is
injective. For every 0 < t < µ(Ω) the sets
A_t = ⋃{A ∈ 𝒜 : µ(A) ≤ t} and A^t = ⋂{A ∈ 𝒜 : µ(A) ≥ t}
belong to Σ because they equal a countable union and a countable intersection
respectively. Evidently µ(A_t) ≤ t ≤ µ(A^t). We claim that µ(A_t) = µ(A^t).
Otherwise µ(A^t \ A_t) > 0 and, as there are no atoms, there is
E ⊂ A^t \ A_t such that 0 < µ(E) < µ(A^t \ A_t). The set A_t ∪ E can be added
to 𝒜 preserving the total ordering and the injectivity of µ, therefore
violating the maximality. Now we have µ(A_t) = t (actually A_t = A^t).

As a consequence of these results we have that any σ-finite space can be
decomposed into two parts: the atomic one, which behaves like a measure on
(ℕ, P(ℕ)); and the atom-free part, which resembles the Lebesgue measure on ℝ.
8.3 Construction of measures
The notion of outer measure plays an essential role here. Let Ω be a nonempty
set. A function µ* : P(Ω) → [0, +∞] is called an outer measure if it satisfies:
1. µ*(∅) = 0;
2. µ*(A) ≤ µ*(B) if A ⊂ B;
3. µ*(⋃_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ µ*(A_n).

Obviously an outer measure is not a measure in the sense of the previous
section. The idea is that outer measures are easier to define, and we will show
that an outer measure behaves as a measure on a "rich" σ-algebra.

It is not very difficult to check that the following function for sets of ℝ^n
is an outer measure
\[ m^*(A) = \inf\Big\{ \sum_{n=1}^{\infty} m(R_n) : (R_n)_{n=1}^{\infty} \text{ rectangles},\ A \subset \bigcup_{n=1}^{\infty} R_n \Big\}, \]
namely Lebesgue's outer measure.
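Countable covers are what make countable sets negligible for m*: covering the k-th point of an enumeration with an interval of length ε/2^k gives total length below ε. The sketch below does this, in exact arithmetic, for a finite batch of rationals in [0, 1]; the enumeration by bounded denominator and the function names are assumptions of the example.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_den):
    """All fractions p/q in [0, 1] with denominator q <= max_den, in increasing order."""
    return sorted({Fraction(p, q) for q in range(1, max_den + 1) for p in range(q + 1)})


def cover(points, eps):
    """Cover the k-th point (k = 1, 2, ...) with an open interval of length eps / 2**k."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half = eps / 2 ** (k + 1)
        intervals.append((x - half, x + half))
    return intervals


eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
print(len(points), float(total_length))
```

However many points are enumerated, the total length stays below ε, because it is a partial sum of the geometric series Σ ε/2^k = ε.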

Given an outer measure µ* on a set Ω, we say that A ⊂ Ω is measurable
(in the sense of Caratheodory, with respect to µ*) if

µ*(B) = µ*(B ∩ A) + µ*(B \ A)

for any B ⊂ Ω. Note that one inequality is guaranteed, so checking
measurability reduces to proving that "≥" holds above. The definition of
measurability could be explained using an ephemeral notion of inner measure in
ℝ^n with respect to a bounded rectangle R. For simplicity assume A ⊂ R and
define

m_*(A) = m(R) − m*(R \ A).

In that case, the equality m_*(A) = m*(A), expressing the measurability of A
witnessed by R, is analogous to Jordan's notion of measurability.

Theorem 8.3.1. Let Ω be a nonempty set and µ* an outer measure defined
on its subsets. Then the family of measurable sets is a σ-algebra Σ and µ*|Σ is
a (σ-additive) measure.
Proof. Denote by Σ the family of measurable sets. Clearly ∅, Ω ∈ Σ and
A ∈ Σ if and only if A^c ∈ Σ. As to the union of sets, we will begin with the
union of two sets: assume A1, A2 ∈ Σ and B ⊂ Ω is arbitrary. The
measurability of A1 witnessed by B ∩ (A1 ∪ A2) gives
\[ \begin{aligned} \mu^*(B \cap (A_1 \cup A_2)) &= \mu^*(B \cap (A_1 \cup A_2) \cap A_1) + \mu^*(B \cap (A_1 \cup A_2) \cap A_1^c) \\ &= \mu^*(B \cap A_1) + \mu^*(B \cap A_2 \cap A_1^c). \end{aligned} \]
And the measurability of A2 witnessed by B ∩ A1^c gives
\[ \mu^*(B \cap A_1^c \cap A_2) + \mu^*(B \cap A_1^c \cap A_2^c) = \mu^*(B \cap A_1^c). \]
Now we have
\[ \begin{aligned} \mu^*(B \cap (A_1 \cup A_2)) + \mu^*(B \cap (A_1 \cup A_2)^c) &= \mu^*(B \cap A_1) + \mu^*(B \cap A_2 \cap A_1^c) + \mu^*(B \cap A_1^c \cap A_2^c) \\ &= \mu^*(B \cap A_1) + \mu^*(B \cap A_1^c) = \mu^*(B), \end{aligned} \]
which implies the measurability of A1 ∪ A2.
Clearly, that implies that Σ is closed for finite unions of sets, and so it is
closed for finite intersections and differences via complements. Therefore, in
order to show that Σ is closed for countable unions it is enough to consider
sequences of disjoint sets (A_n) ⊂ Σ. Firstly we will prove by induction the
following formula
\[ \mu^*(B) = \sum_{k=1}^{n} \mu^*(B \cap A_k) + \mu^*\Big( B \cap \bigcap_{k=1}^{n} A_k^c \Big). \]
Indeed, for n = 1 it is just the measurability of A1. Now, the measurability of
A_n, together with the disjointness of the sets, implies
\[ \begin{aligned} \mu^*\Big( B \cap \bigcap_{k=1}^{n-1} A_k^c \Big) &= \mu^*\Big( B \cap \bigcap_{k=1}^{n-1} A_k^c \cap A_n \Big) + \mu^*\Big( B \cap \bigcap_{k=1}^{n} A_k^c \Big) \\ &= \mu^*(B \cap A_n) + \mu^*\Big( B \cap \bigcap_{k=1}^{n} A_k^c \Big). \end{aligned} \]
If we assume the formula is true for n − 1, substituting the last equality
implies that the formula is true for n.
The formula easily implies
\[ \mu^*(B) \ge \sum_{k=1}^{\infty} \mu^*(B \cap A_k) + \mu^*\Big( B \cap \bigcap_{k=1}^{\infty} A_k^c \Big) \ge \mu^*\Big( B \cap \bigcup_{k=1}^{\infty} A_k \Big) + \mu^*\Big( B \cap \Big( \bigcup_{k=1}^{\infty} A_k \Big)^c \Big) \ge \mu^*(B), \]
which gives both the measurability of ⋃_{k=1}^∞ A_k and the σ-additivity of
µ*|Σ.

Note that the class of Caratheodory measurable sets with respect to an
outer measure µ* has the following property: if µ*(A) = 0 then A is
measurable. In particular, the σ-algebra Σ given by the previous theorem
satisfies A ∈ Σ whenever A ⊂ B and µ(B) = 0. Any measure space with such a
property is called complete.

The previous theorem is devoid of content if we do not ensure that there
are many other measurable sets besides ∅ and Ω.
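The warning is not vacuous: there are outer measures whose only measurable sets are ∅ and Ω. The sketch below checks one such example by brute force on a three-point set. The outer measure used (0 on ∅, 2 on Ω, 1 on every other subset) is a hypothetical example chosen for this illustration, and the function names are assumptions.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3})


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def outer(a):
    """Hypothetical outer measure: 0 on the empty set, 2 on OMEGA, 1 otherwise."""
    if not a:
        return 0
    return 2 if a == OMEGA else 1


def is_measurable(a):
    """Caratheodory condition tested against every subset b of OMEGA."""
    return all(outer(b) == outer(b & a) + outer(b - a) for b in subsets(OMEGA))


measurable = [a for a in subsets(OMEGA) if is_measurable(a)]
print([sorted(a) for a in measurable])
```

For instance, A = {1} fails the condition with the witness B = {1, 2}, since µ*(B) = 1 while µ*(B ∩ A) + µ*(B \ A) = 2.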
Theorem 8.3.2. Let 𝒜 be an algebra of subsets of Ω and µ : 𝒜 → [0, +∞]
a function which is σ-additive (on 𝒜). Consider the outer measure generated by
µ as
\[ \mu^*(A) = \inf\Big\{ \sum_{n=1}^{\infty} \mu(A_n) : A \subset \bigcup_{n=1}^{\infty} A_n,\ (A_n) \subset \mathcal{A} \Big\} \]
for any A ⊂ Ω, and let Σ be the σ-algebra of µ*-measurable sets. Then we have
𝒜 ⊂ Σ and µ*|𝒜 = µ.
Proof. Firstly we will show the measurability of any A ∈ 𝒜. Let B ⊂ Ω be
arbitrary. For any cover (A_n) ⊂ 𝒜 of B we have B ∩ A ⊂ ⋃_{n=1}^∞ (A_n ∩ A)
and B \ A ⊂ ⋃_{n=1}^∞ (A_n \ A), both being covers made of sets from 𝒜. We
have
\[ \mu^*(B \cap A) + \mu^*(B \setminus A) \le \sum_{n=1}^{\infty} \mu(A_n \cap A) + \sum_{n=1}^{\infty} \mu(A_n \setminus A) = \sum_{n=1}^{\infty} \mu(A_n), \]
which implies µ*(B ∩ A) + µ*(B \ A) ≤ µ*(B). In order to recover µ from µ*,
assume A ∈ 𝒜 and take a cover (A_n) ⊂ 𝒜 of A. The cover can be improved, in
the sense that Σ_{n=1}^∞ µ(A_n) does not increase, by replacing A_n with
A_n ∩ A and by making the sequence disjoint inductively, A1, A2 \ A1,
A3 \ (A1 ∪ A2), …, so we may assume that (A_n) is a countable decomposition of
A. By hypothesis Σ_{n=1}^∞ µ(A_n) = µ(A), therefore µ*(A) = µ(A).

Perhaps a delicate point in order to apply the previous results is to prove
that the "pre-measure" µ is σ-additive. This verification can be reduced to a
simpler class of subsets ℛ ⊂ 𝒜 provided that a few properties are satisfied.
Proposition 8.3.3. Let ℛ ⊂ P(Ω) be a class of sets and let Σ be the σ-algebra
that it generates. Suppose that:
1. There is a function µ : ℛ → [0, +∞] which is σ-additive (on ℛ);

2. ℛ is a "rectangular class", namely R, S ∈ ℛ implies R ∩ S ∈ ℛ and
there are mutually disjoint sets R1, …, Rn ∈ ℛ such that R \ S = ⋃_{i=1}^n R_i.

Then µ can be extended to Σ as a σ-additive measure.


Sketch of proof. Consider 𝒜 the algebra generated by ℛ. Note that
\[ \mathcal{A} = \Big\{ \bigcup_{i=1}^{n} R_i : R_1, \dots, R_n \in \mathcal{R} \Big\}. \]
It is easy to check the following facts:

1. Every A ∈ 𝒜 can be expressed as a disjoint union A = ⋃_{i=1}^n R_i with
R1, …, Rn ∈ ℛ.

2. The formula µ(A) = Σ_{i=1}^n µ(R_i), using the disjoint decomposition above,
extends unambiguously the measure µ to all of 𝒜.

3. µ is σ-additive on 𝒜.
Now we can apply Theorem 8.3.2 in order to finish the proof.

The family R plays the role of the rectangles in the construction of measures
on Rn and that is the reason for the choice of the name rectangular, obviously.
At this point we can resume the construction of the Lebesgue measure. Recall
that we have a finitely additive measure $m$ defined on the algebra generated
by the rectangles and an exterior measure $m^*$ built from countable covers by
rectangles. We can use Theorem 8.3.2 to show that we recover $m$ from $m^*$,
and according to the reduction to rectangular families, we only have to show
that $m$ is $\sigma$-additive within the rectangles.

Fact 8.3.4. If a rectangle $R \subset \mathbb{R}^d$ is a countable union $R = \bigcup_{n=1}^{\infty} R_n$ of
disjoint (or merely non-overlapping) rectangles then $m(R) = \sum_{n=1}^{\infty} m(R_n)$.

Proof. If one of the sides of $R$ collapses to length 0 then all the measures are 0
and so the equality holds, so we may assume that all the sides of $R$ have length
greater than 0. The case where $R$ has an infinite edge, so that $m(R) = +\infty$, can be
reduced to the bounded case by intersecting with $\|\cdot\|_\infty$-balls centred at the origin.
Assume then that $R$ is bounded and closed (the faces have $d$-dimensional measure 0). Fix
$\varepsilon > 0$ and for every $n \in \mathbb{N}$ take an open $\|\cdot\|_\infty$-ball $B_n$ centred at the origin
such that

\[ m(R_n + B_n) < m(R_n) + 2^{-n}\varepsilon \]

which is possible by the continuous dependence of the measure on the lengths
of the edges. Since the enlarged rectangles $R_n + B_n$ are open and cover the
compact set $R$, there is $n \in \mathbb{N}$ such that $R \subset \bigcup_{k=1}^{n} (R_k + B_k)$. We deduce that

\[ m(R) \le \sum_{k=1}^{n} m(R_k) + \varepsilon \le \sum_{k=1}^{\infty} m(R_k) + \varepsilon \]

and thus $m(R) \le \sum_{n=1}^{\infty} m(R_n)$ since $\varepsilon > 0$ is arbitrary. The reverse inequality
follows from the finite additivity of $m$.
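
As a small sanity check of Fact 8.3.4 in dimension one, the following sketch decomposes $[0, 1)$ into the disjoint intervals $[1 - 2^{-(n-1)}, 1 - 2^{-n})$ and adds up their lengths with exact rational arithmetic. The decomposition and the function names are illustrative choices, not part of the text; the code only illustrates the statement and proves nothing.

```python
from fractions import Fraction

def dyadic_pieces(count):
    """First `count` intervals [1 - 2^-(n-1), 1 - 2^-n), n = 1..count, as (left, right)."""
    return [(1 - Fraction(1, 2 ** (n - 1)), 1 - Fraction(1, 2 ** n))
            for n in range(1, count + 1)]

def total_length(pieces):
    """Sum of the lengths m([left, right)) = right - left."""
    return sum(right - left for left, right in pieces)

pieces = dyadic_pieces(20)
partial_sum = total_length(pieces)
# The gap to m([0, 1)) = 1 is exactly the length of the part not yet covered.
print(partial_sum, 1 - partial_sum)
```

The partial sums equal $1 - 2^{-N}$, so they increase to $m([0,1)) = 1$, as the fact predicts.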

That proves the existence of a σ-additive measure on Rd that extends


the measure of the rectangles, and so the Jordan measure, which is called
the Lebesgue measure. Some of the additional properties and features of the
Lebesgue measure will be developed along the next sections as particular cases.

8.4 Measurable functions


The underlying idea behind Riemann integration theory is that the integral
can be defined for those functions that can be approximated suitably by
functions constant on intervals. Uniform approximation by functions constant on
intervals is enough for the integration of continuous functions. The class of
Riemann integrable functions is slightly larger, but its members are bound to
be continuous almost everywhere. Lebesgue integration theory carries the
definition of the integral over to functions that can be approximated suitably by
functions constant on measurable sets. Although Lebesgue measure and the integration
of functions defined on $\mathbb{R}^n$ are always a main motivation, integration theory will
be developed for a general measure space.

Let $(\Omega, \Sigma)$ be a measurable space. We call a function $s : \Omega \to \mathbb{R}$ simple
if it is a linear combination of characteristic functions of sets from $\Sigma$, namely
$s = \sum_{k=1}^{n} a_k \chi_{A_k}$ with $a_k \in \mathbb{R}$ and $A_k \in \Sigma$ for $k = 1, \dots, n$. If $\mu$ is a finite
measure (even finitely additive) defined on $(\Omega, \Sigma)$ the number

\[ \int s \, d\mu = \sum_{k=1}^{n} a_k \mu(A_k) \]

does not depend on the particular expression of $s$. This is a tedious but
elementary verification based on the algebra structure of $\Sigma$ and the additivity of
$\mu$. Note that the integral $\int d\mu$ defines a linear operator on the space of simple
functions $\mathcal{S}$ that can be naturally extended to any closure of $\mathcal{S}$ with respect
to a topology which makes $\int d\mu$ continuous. For instance, the topology of
uniform convergence would do the job in case $\mu(\Omega) < +\infty$. However, this
is not the way we will follow: the theory is more powerful if the extension of the
integral is done by monotonicity, and the set of integrable functions can then be
described in an easier way.
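
The independence of $\int s \, d\mu$ from the chosen representation can be checked by hand on a toy example. The sketch below assumes a hypothetical finite measure space (six points, all subsets measurable, point masses chosen arbitrarily) and two different expressions of the same simple function; none of these names or numbers come from the text.

```python
from fractions import Fraction

# Hypothetical finite measure space: Omega = {0,...,5}, Sigma = all subsets,
# mu determined by the point masses below.
WEIGHTS = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6),
           3: Fraction(1), 4: Fraction(2), 5: Fraction(1, 4)}

def mu(A):
    """Measure of a subset A of Omega."""
    return sum((WEIGHTS[x] for x in A), Fraction(0))

def integral(representation):
    """Sum of a_k * mu(A_k) for a representation [(a_k, A_k), ...]."""
    return sum((a * mu(A) for a, A in representation), Fraction(0))

def evaluate(representation, x):
    """Value at x of the simple function sum a_k * chi_{A_k}."""
    return sum((a for a, A in representation if x in A), Fraction(0))

# Two expressions of the same function: overlapping sets versus disjoint sets.
rep_overlapping = [(Fraction(2), {0, 1, 2}), (Fraction(3), {2, 3})]
rep_disjoint = [(Fraction(2), {0, 1}), (Fraction(5), {2}), (Fraction(3), {3})]

same_function = all(evaluate(rep_overlapping, x) == evaluate(rep_disjoint, x)
                    for x in WEIGHTS)
print(same_function, integral(rep_overlapping), integral(rep_disjoint))
```

Both representations give the value $11/2$; passing to a common refinement into disjoint sets, as in the second representation, is the idea behind the general verification.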

Let $(\Omega_1, \Sigma_1)$ and $(\Omega_2, \Sigma_2)$ be measurable spaces. We say that a map
$f : \Omega_1 \to \Omega_2$ is measurable if $f^{-1}(A) \in \Sigma_1$ whenever $A \in \Sigma_2$. Note that this
definition is reminiscent of that of continuity. Clearly, the composition of measurable
maps is again measurable. The following statement can be easily proved by the reader
using the minimality of the generated $\sigma$-algebra.

Proposition 8.4.1. If $\Sigma_2 = \sigma(\mathcal{F})$ then $f : \Omega_1 \to \Omega_2$ is measurable if and only
if $f^{-1}(A) \in \Sigma_1$ whenever $A \in \mathcal{F}$.
As we will consider mainly real functions defined on a measurable space
$(\Omega, \Sigma)$, the measurability of $f : \Omega \to \mathbb{R}$ will be understood with respect to the
Borel $\sigma$-algebra on $\mathbb{R}$. The measurability criterion just stated implies that
$f : \Omega \to \mathbb{R}$ is measurable as soon as one of the four types of sets
$f^{-1}((-\infty, a))$, $f^{-1}((-\infty, a])$, $f^{-1}((a, +\infty))$ or $f^{-1}([a, +\infty))$
lies in $\Sigma$ for all $a \in \mathbb{R}$. The properties of measurable
functions are summarized in the following result.

Proposition 8.4.2. Let $(\Omega, \Sigma)$ be a measurable space, let $\mathcal{M}$ denote the set
of measurable real functions defined on it and let $\mathcal{M}_\infty$ denote the set of
measurable functions with values in $\overline{\mathbb{R}} = [-\infty, +\infty]$ (also with the Borel $\sigma$-algebra).
Then:
1. If $f_1, \dots, f_n \in \mathcal{M}$ (or $\mathcal{M}_\infty$) then $(f_1, \dots, f_n) : \Omega \to \mathbb{R}^n$ (or $\overline{\mathbb{R}}^n$) is
measurable for the Borel $\sigma$-algebra on $\mathbb{R}^n$ (resp. $\overline{\mathbb{R}}^n$).
2. $\mathcal{M}$ is an algebra, $\mathcal{M}_\infty$ is stable under inverses, $\mathcal{M}$ and $\mathcal{M}_\infty$ are lattices.
3. $\mathcal{M}_\infty$ is stable under supremums and infimums of countable sets.
4. $\mathcal{M}_\infty$ is stable under $\liminf$ and $\limsup$ of sequences, and thus it is also
stable under limits of pointwise convergent sequences.
5. If $f \in \mathcal{M}$ is bounded then it can be uniformly approximated by simple
functions.
6. If $f \in \mathcal{M}_\infty$ and $f \ge 0$ there is an increasing sequence of simple functions
\[ 0 \le s_1 \le s_2 \le \dots \le f \]
which converges pointwise to $f$.
Proof. (1) In both cases the topologies are generated by rectangles. Note that

\[ (f_1, \dots, f_n)^{-1}([a_1, b_1] \times \dots \times [a_n, b_n]) = \bigcap_{k=1}^{n} f_k^{-1}([a_k, b_k]) \in \Sigma. \]

(2) Binary algebra operations can be expressed as a composition with a
continuous function $* : \mathbb{R}^2 \to \mathbb{R}$. That applies also to lattice operations on the
extended reals. On the other hand, note that $t \mapsto t^{-1}$ is continuous on $[-\infty, +\infty]$.
(3) If $f = \sup\{f_n : n \in \mathbb{N}\}$ then

\[ f^{-1}([-\infty, a]) = \bigcap_{n=1}^{\infty} f_n^{-1}([-\infty, a]) \in \Sigma \]

and the infimum is handled likewise.
(4) Note that

\[ \limsup_n f_n = \inf\{\sup\{f_k : k \ge n\} : n \in \mathbb{N\}}; \qquad \liminf_n f_n = \sup\{\inf\{f_k : k \ge n\} : n \in \mathbb{N}\} \]

so both are measurable.
(5) Assume $f(\Omega) \subset [a, b)$ and fix $\varepsilon > 0$. Take a partition $(t_k)_{k=1}^{n}$ of $[a, b]$,
that is, $a = t_1 < t_2 < \dots < t_n = b$, such that $t_{k+1} - t_k < \varepsilon$, and consider the sets
$A_k = f^{-1}([t_k, t_{k+1}))$. The simple function

\[ s = \sum_{k=1}^{n-1} t_k \chi_{A_k} \]

satisfies $s \le f$ and $\|f - s\|_\infty < \varepsilon$.
(6) The previous construction applied to $f_n = \min\{f, n\}$ produces a simple
function $s'_n \le f_n$ such that $\|f_n - s'_n\|_\infty < 1/n$. Now take $s_n = \max\{s'_1, \dots, s'_n\}$,
which is simple, and the sequence $(s_n)$ is increasing and converges pointwise to $f$.
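
To make item (6) concrete, here is a sketch of the familiar dyadic approximation $s_n = \min\{2^{-n}\lfloor 2^n f \rfloor, n\}$. This is a common variant and not literally the construction used in the proof above (which takes maxima of uniform approximations of the truncations); the function name is an illustrative choice. The code works on single values of $f$, which is enough to see monotonicity and pointwise convergence.

```python
import math

def dyadic_approx(value, n):
    """n-th dyadic approximation of a value in [0, +inf]:
    min(floor(2^n * value) / 2^n, n)."""
    if value >= n:                      # also covers value == math.inf
        return float(n)
    return math.floor(value * 2 ** n) / 2 ** n

# Sample values of f at a few points, including a point where f = +inf.
for value in (0.3, math.pi, 7.25, math.inf):
    print(value, [dyadic_approx(value, n) for n in (1, 2, 4, 8)])
```

For a finite value $v$ and $n > v$ the error $v - s_n$ is below $2^{-n}$, while at points where $f = +\infty$ the approximations are $s_n = n$, which increase to $+\infty$.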

8.5 Integration
Now we are ready to define the integral for positive functions. Let $(\Omega, \Sigma, \mu)$
be a measure space. Recall that the integral was already defined for simple
functions in case $\mu(\Omega) < +\infty$. If we limit ourselves to positive simple
functions $s = \sum_{k=1}^{n} a_k \chi_{A_k}$ with $a_k \ge 0$ we may remove the finiteness hypothesis
and the formula

\[ \int s \, d\mu = \sum_{k=1}^{n} a_k \mu(A_k) \]

will make sense in $[0, +\infty]$. It is easy to see that also in this case the value
does not depend on the particular (positive) representation of $s$.
We define the integral of $f : \Omega \to [0, +\infty]$ with respect to $\mu$ as the value in
$[0, +\infty]$ given by

\[ \int f \, d\mu = \sup\Big\{ \int s \, d\mu : 0 \le s \le f,\ s \in \mathcal{S} \Big\} \]

and $\int_A f \, d\mu = \int \chi_A f \, d\mu$ for $A \in \Sigma$. Note that the computation of $\int s \, d\mu$
may need operations involving $+\infty$ (with the usual convention $0 \cdot (+\infty) = 0$);
however, the restriction to positive values spares us possible trouble. The very
definition implies these almost obvious properties that we will need later.
Proposition 8.5.1. Under the notation and assumptions above we have:
1. if $0 \le f \le g$ are measurable then $\int f \, d\mu \le \int g \, d\mu$;
2. if $A, B \in \Sigma$, $A \subset B$ and $f \ge 0$ is measurable then $\int_A f \, d\mu \le \int_B f \, d\mu$;
3. if $f \ge 0$ is measurable and $\lambda \ge 0$ is a real number then $\int \lambda f \, d\mu = \lambda \int f \, d\mu$.
Theorem 8.5.2 (Monotone convergence theorem). Let $(\Omega, \Sigma, \mu)$ be a measure
space and let $0 \le f_1 \le f_2 \le \dots \le f$ be a sequence of measurable functions defined
on $\Omega$ with values in $[0, +\infty]$ that converges pointwise to $f$. Then

\[ \lim_n \int f_n \, d\mu = \int f \, d\mu. \]

Proof. The limit on the left-hand side exists in $[0, +\infty]$ by monotonicity, and
the inequality

\[ \lim_n \int f_n \, d\mu \le \int f \, d\mu \]

is obvious. For the converse, fix a simple function $0 \le s \le f$ and a number $\lambda \in (0, 1)$. Note
that the sequence of measurable sets

\[ A_n = \{x \in \Omega : f_n(x) \ge \lambda s(x)\} \]

is increasing and $\bigcup_{n=1}^{\infty} A_n = \Omega$. We have

\[ \int f_n \, d\mu \ge \int_{A_n} f_n \, d\mu \ge \lambda \int_{A_n} s \, d\mu. \]

Note that $\nu(A) = \int_A s \, d\mu$ defines a positive measure on $\Sigma$, so taking limits we
have

\[ \lim_n \int f_n \, d\mu \ge \lambda \lim_n \nu(A_n) = \lambda \, \nu(\Omega) = \lambda \int s \, d\mu. \]

Since $\lambda < 1$ is arbitrary, and taking into account the definition of the integral, we
get

\[ \lim_n \int f_n \, d\mu \ge \int f \, d\mu \]

as wished.
Corollary 8.5.3. If $f, g : \Omega \to [0, +\infty]$ are measurable then

\[ \int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu. \]

Proof. Take increasing sequences of simple functions converging pointwise,
$s_n \to f$ and $z_n \to g$. Then

\[ \int (f + g) \, d\mu = \lim_n \int (s_n + z_n) \, d\mu = \lim_n \int s_n \, d\mu + \lim_n \int z_n \, d\mu = \int f \, d\mu + \int g \, d\mu \]

applying the monotone convergence theorem and bearing in mind that the
additivity of the integral was established for simple functions.
Corollary 8.5.4. Let $(\Omega, \Sigma, \mu)$ be a measure space and let $(f_n)$ be a sequence
of measurable functions with values in $[0, +\infty]$. Then

\[ \int \Big( \sum_{n=1}^{\infty} f_n \Big) d\mu = \sum_{n=1}^{\infty} \int f_n \, d\mu. \]

Proof. Just apply the monotone convergence theorem to the increasing
sequence of functions $g_n = \sum_{k=1}^{n} f_k$, whose limit is $\sum_{k=1}^{\infty} f_k$.
Proposition 8.5.5 (Fatou’s lemma). Let $(f_n)$ be a sequence of non-negative
measurable functions. Then

\[ \int \liminf_n f_n \, d\mu \le \liminf_n \int f_n \, d\mu. \]

Proof. Consider the increasing sequence $g_n = \inf\{f_k : k \ge n\}$ and apply the
monotone convergence theorem:

\[ \int \liminf_n f_n \, d\mu = \int \lim_n g_n \, d\mu = \lim_n \int g_n \, d\mu \le \liminf_n \int f_n \, d\mu \]

since $\int g_n \, d\mu \le \int f_n \, d\mu$ for all $n \in \mathbb{N}$.
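
The inequality in Fatou's lemma can be strict. A standard example, not taken from the text and stated here for the Lebesgue measure $m$ on $[0, 1]$, is the sequence of tall narrow spikes $f_n = n \chi_{(0, 1/n)}$:

```latex
\[
  f_n = n\,\chi_{(0,1/n)}, \qquad
  \int f_n \, dm = n \cdot \frac{1}{n} = 1 \quad \text{for all } n, \qquad
  \lim_n f_n(x) = 0 \quad \text{for every } x \in [0,1],
\]
\[
  \int \liminf_n f_n \, dm = 0 \;<\; 1 = \liminf_n \int f_n \, dm .
\]
```

The same sequence is not increasing and admits no integrable dominating function, so it does not contradict the monotone convergence theorem.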

Now we are ready to extend the notion of integral to functions that are not
necessarily positive. We say that $f : \Omega \to [-\infty, +\infty]$ is integrable if it is
measurable and $\int |f| \, d\mu < +\infty$. In such a case we define the integral of $f$ as the
real number

\[ \int f \, d\mu = \int f^{+} \, d\mu - \int f^{-} \, d\mu, \]

where $f^{+} = \max\{f, 0\}$ and $f^{-} = \max\{-f, 0\}$. We will also consider the integrals
over sets, $\int_A f \, d\mu := \int \chi_A f \, d\mu$. The following properties are not a surprise.
Proposition 8.5.6. Let $L^1(\mu)$ denote the set of integrable functions defined
on the measure space $(\Omega, \Sigma, \mu)$. Then
1. $L^1(\mu)$ is a vector lattice;
2. the integral is a linear functional on $L^1(\mu)$;
3. $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$.
The following result is the key to the versatility of the Lebesgue integral.
Theorem 8.5.7 (Dominated convergence theorem). Let $(f_n) \subset L^1(\mu)$ be a
sequence which converges pointwise to $f$. Assume that there is $g \in L^1(\mu)$ such
that $|f_n| \le g$ for all $n \in \mathbb{N}$. Then $f \in L^1(\mu)$ and

\[ \lim_n \int f_n \, d\mu = \int f \, d\mu \quad \text{and} \quad \lim_n \int |f_n - f| \, d\mu = 0. \]

Proof. The integrability of $f$ is clear from the inequality $|f| \le g$. Applying
Fatou’s lemma to the positive sequence $(2g - |f_n - f|)$ we get

\[ \int 2g \, d\mu = \int \lim_n (2g - |f_n - f|) \, d\mu \le \liminf_n \int (2g - |f_n - f|) \, d\mu = \int 2g \, d\mu - \limsup_n \int |f_n - f| \, d\mu. \]

We deduce $\limsup_n \int |f_n - f| \, d\mu = 0$ and thus $\lim_n \int |f_n - f| \, d\mu = 0$, which
easily implies the other part of the statement.

So far we have Lebesgue measure on Rd and the basics of abstract inte-


gration theory. It is time to put them together and to compare the result to
Riemann integration theory.
Proposition 8.5.8. Let $f : D \to \mathbb{R}$ be a Riemann integrable function defined
on a Jordan measurable set $D \subset \mathbb{R}^d$. Then $f$ is Lebesgue measurable and both
integrals, Lebesgue and Riemann, coincide for $f$.
Proof. Jordan measurable sets are Lebesgue measurable, as they differ from
an open set by a null set. Lower and upper sums associated to the Riemann
integral can be understood as the integrals of simple functions bracketing
$f$. Choosing a sequence of successively finer partitions whose lower and upper
sums approach the Riemann integral, the lower and upper sequences of simple functions
converge to limits $\underline{f}$ and $\overline{f}$ which are measurable, satisfy $\underline{f} \le f \le \overline{f}$ and, by
construction,

\[ \int (\overline{f} - \underline{f}) \, dm = 0. \]

That implies that $f$ coincides with $\underline{f}$ and $\overline{f}$ almost everywhere, which gives the
Lebesgue measurability of $f$ and the coincidence of the Riemann and Lebesgue
integrals.

Unfortunately there are some important integrals which are not covered by
Lebesgue theory. For instance, the following one exists in the improper Riemann
sense but not in the Lebesgue sense:

\[ \int_0^{+\infty} \frac{\sin x}{x} \, dx. \]
The convergence theorems cast some light on the following question: when
can we commute derivation and integration? That is, when is the following
formula true?

\[ \frac{\partial}{\partial y} \int f(x, y) \, d\mu(x) = \int \frac{\partial f}{\partial y}(x, y) \, d\mu(x). \]

If we express the derivative by its very definition at $y_0$ we have

\[ \frac{\partial}{\partial y}\Big|_{y = y_0} \int f(x, y) \, d\mu(x) = \lim_{h \to 0} \int h^{-1} \big( f(x, y_0 + h) - f(x, y_0) \big) \, d\mu(x) \]

and this last limit can be written sequentially, taking $h = h_n$ with $\lim_n h_n = 0$.
Thus the question is reduced to knowing the limit

\[ \lim_n \int h_n^{-1} \big( f(x, y_0 + h_n) - f(x, y_0) \big) \, d\mu(x) = \lim_n \int \frac{\partial f}{\partial y}(x, y_0 + \theta(x, n)) \, d\mu(x) \]

where $|\theta(x, n)| < |h_n|$ is given by the mean value theorem. If the family of
functions $\{ \frac{\partial f}{\partial y}(\cdot, y) : y \}$ were dominated by a positive integrable function for $y$
in a neighbourhood of $y_0$ we could apply the dominated convergence theorem.
The analysis of interesting integrals is sometimes trickier. Let us go back
to the improper Riemann, non-Lebesgue integral above.

We will consider the auxiliary function defined for $y > 0$ by the integral

\[ F(y) = \int_0^{+\infty} e^{-xy} \frac{\sin x}{x} \, dx. \]

We claim that

\[ F'(y) = \int_0^{+\infty} \frac{\partial}{\partial y} \Big( e^{-xy} \frac{\sin x}{x} \Big) dx = - \int_0^{+\infty} e^{-xy} \sin x \, dx. \]

In order to show the domination by a positive integrable function observe that
$|e^{-xy} \sin x| \le x e^{-xy}$ and

\[ 0 \le x e^{-xy} \le x e^{-x y_1} \]

if $y \ge y_1 > 0$ and $x \in [0, +\infty)$. Now, using elementary calculus of primitives,

\[ - \int_0^{+\infty} e^{-xy} \sin x \, dx = \frac{-1}{1 + y^2}. \]

Therefore $F(y)$ and $-\arctan y$ differ by a constant. Taking limits as $y$
goes to $+\infty$, it is clear that $F(y)$ goes to 0, thus we have

\[ F(y) = \frac{\pi}{2} - \arctan y. \]
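We claim that

\[ \int_0^{+\infty} \frac{\sin x}{x} \, dx = \lim_{y \to 0^+} F(y) = \frac{\pi}{2}. \]

Indeed, take

\[ a_n(y) = \int_{(n-1)\pi}^{n\pi} e^{-xy} \frac{\sin x}{x} \, dx \]

and notice that $\sum_{n=1}^{\infty} a_n(y)$ is a Leibniz series, that is, signs are alternating
and $|a_n(y)|$ goes monotonically to 0. For $n \ge 3$ odd, we have

\[ 0 \le \sum_{k=n}^{\infty} a_k(y) \le a_n(y) \le \frac{e^{-\pi (n-1) y}}{\pi (n-1)} \int_{(n-1)\pi}^{n\pi} \sin x \, dx \le \frac{2}{\pi (n-1)}. \]

Decompose the integral as follows:

\[ \int_0^{+\infty} e^{-xy} \frac{\sin x}{x} \, dx - \int_0^{(n-1)\pi} e^{-xy} \frac{\sin x}{x} \, dx = \int_{(n-1)\pi}^{+\infty} e^{-xy} \frac{\sin x}{x} \, dx = \sum_{k=n}^{\infty} a_k(y). \]

Taking limits for $y \to 0^+$, the left-hand side converges to

\[ \frac{\pi}{2} - \int_0^{(n-1)\pi} \frac{\sin x}{x} \, dx \]

while the right-hand side remains bounded by $2/(\pi (n-1))$ for $n \ge 3$ odd. Taking
limits in $n$, we get that

\[ \int_0^{+\infty} \frac{\sin x}{x} \, dx = \frac{\pi}{2}. \]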
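
The closed formula $F(y) = \pi/2 - \arctan y$ can be checked numerically. The sketch below uses a hand-written composite Simpson rule on a truncated interval; the cutoff, the number of subintervals and the function names are assumptions made for the illustration. The discarded tail is bounded by $e^{-cy}/(cy)$ for cutoff $c$, which is negligible for the values of $y$ used here.

```python
import math

def integrand(x, y):
    """e^{-xy} * sin(x)/x, extended by continuity to x = 0."""
    if x == 0.0:
        return 1.0
    return math.exp(-x * y) * math.sin(x) / x

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def F_numeric(y, cutoff=60.0, n=60_000):
    """Approximation of F(y); the tail beyond `cutoff` is dropped."""
    return simpson(lambda x: integrand(x, y), 0.0, cutoff, n)

for y in (0.5, 1.0, 2.0):
    print(y, F_numeric(y), math.pi / 2 - math.atan(y))
```

For small $y$ the exponential damping is weak and a fixed cutoff is no longer adequate, which mirrors the fact that the limit $y \to 0^+$ needs the separate argument given above.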

8.6 Approximation and topology


We already know that the approximation of measurable functions by simple
ones is possible uniformly in the bounded case. On the space $L^1(\mu)$ we may
define a seminorm by the formula

\[ \|f\|_1 = \int |f| \, d\mu. \]

Proposition 8.6.1. The set of simple functions supported on sets of finite measure
is dense in $(L^1(\mu), \|\cdot\|_1)$.

Proof. Given $f \in L^1(\mu)$ the sequence $f_n = \min\{n, \max\{f, -n\}\}$ is dominated
by $|f|$ and converges pointwise to $f$, hence also in $\|\cdot\|_1$ by the dominated
convergence theorem, so we may assume $f$ is bounded. Now
consider the sequence $(f_n)$ where $f_n(x) = f(x)$ if $|f(x)| \ge 1/n$ and $f_n(x) = 0$
otherwise. This sequence is also dominated by $|f|$ and converges pointwise to $f$,
and so it converges for the seminorm $\|\cdot\|_1$. The functions $f_n$ have supports of
finite measure, so they can be uniformly approximated by simple functions also with
supports of finite measure.

The previous result makes clear that the approximation of integrable
functions by others reduces to the approximation of simple functions, and thus
to the approximation of characteristic functions. Define a pseudometric on $\Sigma$ by
$d_\mu(A, B) = \mu(A \Delta B)$ where $A \Delta B = (A \setminus B) \cup (B \setminus A)$ is the symmetric
difference. Note that $d_\mu$ is actually the restriction of the seminorm $\|\cdot\|_1$ through
characteristic functions, $d_\mu(A, B) = \|\chi_A - \chi_B\|_1$.
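
The following sketch illustrates why $d_\mu$ is only a pseudometric. It assumes a hypothetical four-point measure space in which one point has mass zero (names and weights are made up for the example), verifies the triangle inequality by brute force over all subsets, and exhibits two different sets at distance zero.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical measure on {a, b, c, d}; the point "d" carries no mass.
WEIGHTS = {"a": Fraction(1, 2), "b": Fraction(1, 4),
           "c": Fraction(1, 4), "d": Fraction(0)}

def mu(A):
    return sum((WEIGHTS[x] for x in A), Fraction(0))

def d_mu(A, B):
    """d_mu(A, B) = mu(A symmetric-difference B)."""
    return mu(A ^ B)

def all_subsets(points):
    points = sorted(points)
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

SIGMA = all_subsets(WEIGHTS)

triangle_ok = all(d_mu(A, C) <= d_mu(A, B) + d_mu(B, C)
                  for A in SIGMA for B in SIGMA for C in SIGMA)
distance_of_distinct_sets = d_mu(frozenset("a"), frozenset("ad"))
print(triangle_ok, distance_of_distinct_sets)
```

The sets $\{a\}$ and $\{a, d\}$ differ by a null set, so their distance is zero although they are not equal.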

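Proposition 8.6.2. Let $(\Omega, \Sigma, \mu)$ be a finite measure space and assume that $\Sigma$ is
generated by an algebra $\mathcal{A}$. Then $\mathcal{A}$ is dense in $(\Sigma, d_\mu)$.
Proof. Consider the set

\[ \mathcal{M} = \{A \in \Sigma : \text{for all } \varepsilon > 0 \text{ there is } B \in \mathcal{A} \text{ with } d_\mu(A, B) < \varepsilon\}. \]

It is obvious that $\emptyset, \Omega \in \mathcal{M}$ and that $A \in \mathcal{M}$ implies $A^c \in \mathcal{M}$, as $A \Delta B = A^c \Delta B^c$.
For the union of two sets note that

\[ (A_1 \cup A_2) \Delta (B_1 \cup B_2) \subset (A_1 \Delta B_1) \cup (A_2 \Delta B_2). \]

Now, as we have stability under complements and unions, we also have stability under
intersections and differences. In particular, in order to see that $\mathcal{M}$ is stable under
countable unions it is enough to consider unions of disjoint sequences $(A_n) \subset \mathcal{M}$.
Given $\varepsilon > 0$, as $\mu(\Omega) < +\infty$ we may take $n \in \mathbb{N}$ such that $\mu(\bigcup_{k > n} A_k) < \varepsilon/2$.
Take $(B_k)_{k=1}^{n} \subset \mathcal{A}$ such that $\sum_{k=1}^{n} d_\mu(A_k, B_k) < \varepsilon/2$. We have

\[ \Big( \bigcup_{k=1}^{\infty} A_k \Big) \Delta \Big( \bigcup_{k=1}^{n} B_k \Big) \subset \bigcup_{k=1}^{n} (A_k \Delta B_k) \cup \bigcup_{k > n} A_k \]

which implies $d_\mu(\bigcup_{k=1}^{\infty} A_k, \bigcup_{k=1}^{n} B_k) < \varepsilon$. Since $\mathcal{M} \subset \Sigma$ is a $\sigma$-algebra that
contains $\mathcal{A}$, they must be the same.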

With similar ideas we can deal with the completion of a measure space.

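Proposition 8.6.3. Let $(\Omega, \Sigma, \mu)$ be a measure space. There exists a complete
measure space $(\Omega, \overline{\Sigma}, \overline{\mu})$ over the same set which is the smallest possible and has
the following property: for every $A \in \overline{\Sigma}$ there is $B \in \Sigma$ such that $\overline{\mu}(A \Delta B) = 0$,
that is, $\Sigma$ is dense in $\overline{\Sigma}$ with respect to $d_{\overline{\mu}}$.
Proof. Evidently, a completion of $(\Omega, \Sigma, \mu)$ must contain the family of sets

\[ \mathcal{N} = \{M \subset \Omega : \exists N \in \Sigma,\ \mu(N) = 0,\ M \subset N\}. \]

Using the same ideas as in the previous proposition it is possible to prove
that

\[ \overline{\Sigma} = \{A \subset \Omega : \exists B \in \Sigma,\ A \Delta B \in \mathcal{N}\} \]

is a $\sigma$-algebra and that $\overline{\mu}(A) = \mu(B)$ if $A \Delta B \in \mathcal{N}$ is well defined.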
Corollary 8.6.4. Let $(\Omega, \overline{\Sigma}, \overline{\mu})$ be the completion of $(\Omega, \Sigma, \mu)$. If $f$ is
$\overline{\Sigma}$-measurable, then there is a $\Sigma$-measurable function $g$ such that $f = g$ almost
everywhere with respect to $\overline{\mu}$.
Proof. For every $t \in \mathbb{Q}$ take a set $N_t \in \Sigma$ with $\mu(N_t) = 0$ and a set
$A_t \in \Sigma$ such that $A_t \subset \{f \le t\}$ and $\{f \le t\} \setminus A_t \subset N_t$. The set $N = \bigcup_{t \in \mathbb{Q}} N_t$
is null. Define $g(x) = f(x)$ if $x \notin N$ and $g(x) = 0$ otherwise. By construction
$g$ fulfils the requirements.

Under the hypotheses of the dominated convergence theorem a sequence
$(f_n) \subset L^1(\mu)$ converges to its limit with respect to the seminorm $\|\cdot\|_1$. It is
interesting that the converse is true after passing to a suitable subsequence.
Theorem 8.6.5. If $\lim_n \|f_n - f\|_1 = 0$ then there is a subsequence $(f_{n_k})$ which
converges to $f$ almost everywhere.
Proof. For every $\varepsilon > 0$ we have

\[ \mu(\{|f_n - f| > \varepsilon\}) \le \frac{1}{\varepsilon} \int |f_n - f| \, d\mu \to 0. \]

Therefore it is possible to find $n_1$ such that

\[ \mu(\{|f_{n_1} - f| > 1\}) \le 1/2. \]

Inductively, it is possible to build an increasing sequence $n_1 < n_2 < \dots$ such
that the sets

\[ A_k = \{|f_{n_k} - f| > 1/k\} \]

satisfy $\mu(A_k) \le 2^{-k}$. Take $A = \bigcap_{k=1}^{\infty} \bigcup_{j \ge k} A_j$ and note that $\mu(A) = 0$. By
construction we have for any $x \in A^c$ that $|f_{n_k}(x) - f(x)| \le 1/k$ from a certain
$k$ on, and so the theorem is proven.
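
Passing to a subsequence in Theorem 8.6.5 is really necessary. The sketch below implements the classical "typewriter" sequence on $[0, 1)$ with Lebesgue measure, a standard example that is not taken from the text: $f_n$ is the indicator of $[j/2^k, (j+1)/2^k)$ when $n = 2^k + j$ with $0 \le j < 2^k$. Its $\|\cdot\|_1$ norms tend to 0, yet at every point the values 0 and 1 both occur infinitely often; the subsequence $n_k = 2^k$ (chosen by hand here, not by the inductive procedure of the proof) converges to 0 at every $x > 0$.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Support [j/2^k, (j+1)/2^k) of f_n, where n = 2^k + j and 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k)

def f(n, x):
    """Value of the n-th typewriter function at x in [0, 1)."""
    left, right = typewriter_interval(n)
    return 1 if left <= x < right else 0

def l1_norm(n):
    """Lebesgue measure of the support, i.e. the L^1 norm of f_n."""
    left, right = typewriter_interval(n)
    return right - left

x = Fraction(1, 3)
print([f(n, x) for n in range(1, 32)])          # zeros and ones keep alternating
print([l1_norm(2 ** k) for k in range(6)])      # norms 1, 1/2, 1/4, ...
print([f(2 ** k, x) for k in range(1, 8)])      # subsequence n_k = 2^k at x
```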

Now we will consider a topological space $X$ (see Appendix A) endowed
with its Borel $\sigma$-algebra $\mathcal{B}$ and a measure $\mu$. We say that $A \subset X$ is regular
if for every $\varepsilon > 0$ there is a closed set $F \subset A$ and an open set $U \supset A$ such
that $\mu(U \setminus F) < \varepsilon$. We say that $\mu$ is regular if all the sets in $\mathcal{B}$ are regular.
We may also use the following weaker form of regularity, which is more convenient for
infinite measures. We say that $A \in \mathcal{B}$ is inner regular if

\[ \mu(A) = \sup\{\mu(F) : F \subset A \text{ closed}\} \]

and outer regular if

\[ \mu(A) = \inf\{\mu(U) : U \supset A \text{ open}\}. \]

Proposition 8.6.6. Let $(X, \mathcal{B}, \mu)$ be a topological space with a finite Borel
measure. If all the open (or closed) sets are regular then $\mu$ is regular.
Proof. Defining the family of sets

\[ \mathcal{M} = \{A \in \mathcal{B} : \forall \varepsilon > 0\ \exists F \subset A \subset U,\ F \text{ closed},\ U \text{ open},\ \mu(U \setminus F) < \varepsilon\} \]

we may proceed by applying the same ideas as in the proof of Proposition 8.6.2.

Since in a metric space every open set is a countable union of closed sets
we have the following.
Corollary 8.6.7. Every finite Borel measure in a metrizable space is regular.
Under additional hypotheses, closed sets can be replaced by compact sets in the
inner approximation.
Theorem 8.6.8. Assume that $X$ is separable and completely metrizable and
$\mu$ is a finite Borel measure on it. Then

\[ \mu(A) = \sup\{\mu(K) : K \subset A \text{ compact}\} \]

for every $A \in \mathcal{B}$.

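Proof. By Corollary 8.6.7 it is enough to show that the result is true
for $A$ closed. Fix $\varepsilon > 0$. For every $n \in \mathbb{N}$ take a countable cover $(B_{n,m})_{m=1}^{\infty}$ of
$A$ by balls of radius less than $1/n$. Now fix $m_n$ such that

\[ \mu\Big(A \setminus \bigcup_{m=1}^{m_n} B_{n,m}\Big) < 2^{-n} \varepsilon. \]

Now the set

\[ B = \bigcap_{n=1}^{\infty} \bigcup_{m=1}^{m_n} B_{n,m} \]

is totally bounded and $\mu(A \setminus B) < \varepsilon$ by construction. The set $K = A \cap \overline{B} \subset A$
is compact, being closed and totally bounded in a complete space, and satisfies
$\mu(K) > \mu(A) - \varepsilon$.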

The results discussed so far could be adapted for σ-finite measures with
some additional hypotheses. For instance, it is easy to see that Theorem 8.6.8 is still
true if the space X can be covered by countably many closed sets of finite
measure. Let us mention that the result is still true even in the σ-finite case
since any Borel subset of the completely metrizable space X can be completely
metrized for the relative topology. In any case, for our most important case
we have the following.
Theorem 8.6.9. A Lebesgue measurable set of $\mathbb{R}^d$ differs from a Borel set by
a null set and it is regular for the Lebesgue measure.
Proof. Let $A \subset \mathbb{R}^d$ be a Lebesgue measurable set. Since the Lebesgue outer
measure can be computed by open covers we may find a $G_\delta$-set (a countable
intersection of open sets) $E \supset A$ such that $m(E) = m(A)$. That implies $E \setminus A$
has null measure. In order to prove the regularity it is enough to work with
Borel sets. If $A$ were bounded the result could be deduced from Corollary
8.6.7. Otherwise, fix $\varepsilon > 0$ and take the sets $C_n = B(0, n) \setminus B(0, n-1)$ for $n \ge 1$. Find a
compact $K_n \subset C_n \cap A$ and an open $U_n \supset C_n \cap A$ such that $m(U_n \setminus K_n) < 2^{-n} \varepsilon$.
Obviously $U = \bigcup_{n=1}^{\infty} U_n$ is open, and $F = \bigcup_{n=1}^{\infty} K_n$ is closed. Indeed, a
convergent sequence in $F$ is bounded, so it meets only finitely many of the sets $K_n$.
Clearly $F \subset A \subset U$ and $m(U \setminus F) < \varepsilon$.

Now we will discuss the approximation of measurable functions by more


regular ones.
Proposition 8.6.10. Let (X, B, µ) be a normal topological space endowed with
a regular Borel measure. Then the continuous functions with support of finite
measure are dense in (L1 (µ), ∥ · ∥1 ).

Proof. By Proposition 8.6.1 it is enough to prove the statement for
characteristic functions $\chi_A$ with $A \in \mathcal{B}$ and $\mu(A) < +\infty$. Fix $\varepsilon > 0$ and take
a closed set $F$ and an open set $U$ with $F \subset A \subset U$ such that $\mu(U) < \infty$ and $\mu(U \setminus F) < \varepsilon$. By
Urysohn’s lemma there is $f : X \to [0, 1]$ continuous such that $f|_F = 1$ and
$f|_{U^c} = 0$. Note that the support of $f$ has finite measure and $\|\chi_A - f\|_1 < \varepsilon$.

8.7 Product measures


Consider two measure spaces $(\Omega_1, \Sigma_1, \mu_1)$ and $(\Omega_2, \Sigma_2, \mu_2)$. Denote by

\[ \Sigma_1 \otimes \Sigma_2 = \sigma(\{A \times B : A \in \Sigma_1,\ B \in \Sigma_2\}). \]

Note that the sets of the form $A \times B$ with $A \in \Sigma_1$ and $B \in \Sigma_2$ form a rectangular
class and the function $\mu(A \times B) = \mu_1(A) \mu_2(B)$ is finitely additive on the
algebra generated by them, just as the area is for rectangles in $\mathbb{R}^2$. If we prove
that $\mu$ is $\sigma$-additive within the class of “rectangles” then $\mu$ has an extension
to a $\sigma$-algebra that includes $\Sigma_1 \otimes \Sigma_2$.
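Proposition 8.7.1. Given measure spaces $(\Omega_1, \Sigma_1, \mu_1)$ and $(\Omega_2, \Sigma_2, \mu_2)$ there
exists a measure $\mu_1 \otimes \mu_2$ on $\Sigma_1 \otimes \Sigma_2$ such that

\[ (\mu_1 \otimes \mu_2)(A \times B) = \mu_1(A) \mu_2(B). \]

If $(\Omega_1, \Sigma_1, \mu_1)$ and $(\Omega_2, \Sigma_2, \mu_2)$ are $\sigma$-finite then the measure on $\Sigma_1 \otimes \Sigma_2$ with
such a property is unique.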
Proof. After the preliminary discussion showing that the sets $A \times B$ form a
rectangular class we only have to check the $\sigma$-additivity. Assume
$R \times S = \bigcup_{n=1}^{\infty} R_n \times S_n$ with the sets $R_n \times S_n$ mutually disjoint.
Now define measurable functions on $R$ by

\[ f_n = \mu_2(S_n) \chi_{R_n}. \]

For every $x \in R$ the sets $\{S_n : x \in R_n\}$ are disjoint and their union is $S$.
Therefore

\[ \mu_2(S) = \sum_{n \,:\, x \in R_n} \mu_2(S_n) = \sum_{n=1}^{\infty} f_n(x) \]

and thus $\sum_{n=1}^{\infty} f_n = \mu_2(S) \chi_R$. The monotone convergence theorem for series
(Corollary 8.5.4) gives

\[ \sum_{n=1}^{\infty} \mu_1(R_n) \mu_2(S_n) = \sum_{n=1}^{\infty} \int f_n \, d\mu_1 = \int \mu_2(S) \chi_R \, d\mu_1 = \mu_1(R) \mu_2(S) \]

as wished. If $(\Omega_1, \Sigma_1, \mu_1)$ and $(\Omega_2, \Sigma_2, \mu_2)$ are $\sigma$-finite then
$(\Omega_1 \times \Omega_2, \Sigma_1 \otimes \Sigma_2, \mu_1 \otimes \mu_2)$ also is. Assume first that $\mu_1 \otimes \mu_2$ is finite.
Proposition 8.6.2 implies that the measure is determined by its values on the algebra
generated by the rectangles. This can be extended to the $\sigma$-finite case in an obvious way.

Since the uniqueness of the product measure is guaranteed under
$\sigma$-finiteness, we will assume that hypothesis for the rest of the discussion. In that
way, the computation of the product measure can be reduced to integration with
respect to the factor measures (cross section integral).
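Theorem 8.7.2. Suppose that $(\Omega_1 \times \Omega_2, \Sigma_1 \otimes \Sigma_2, \mu_1 \otimes \mu_2)$ is $\sigma$-finite and
write $\mu = \mu_1 \otimes \mu_2$. Take a set $A \in \Sigma_1 \otimes \Sigma_2$ and for $x \in \Omega_1$ and $y \in \Omega_2$ denote

\[ A_x = \{z \in \Omega_2 : (x, z) \in A\}; \qquad A^y = \{z \in \Omega_1 : (z, y) \in A\}. \]

Then the functions $f(x) = \mu_2(A_x)$ and $g(y) = \mu_1(A^y)$ are measurable with
respect to the corresponding $\sigma$-algebras and

\[ \mu(A) = \int f \, d\mu_1 = \int g \, d\mu_2. \]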

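Proof. Consider the class $\mathcal{M} \subset \mathcal{P}(\Omega_1 \times \Omega_2)$ of sets for which the statement of the
theorem is true. Clearly, $\mathcal{M}$ contains the rectangles $A \times B$ with $A \in \Sigma_1$, $B \in \Sigma_2$
and, by the reduction to disjoint unions, the sets of the algebra generated by them.
In order to prove that $\mathcal{M}$ actually contains $\Sigma_1 \otimes \Sigma_2$ we will use Theorem 8.2.1.
Indeed, if $(A_n) \subset \mathcal{M}$ is an increasing sequence, then $f_n(x) = \mu_2((A_n)_x)$ and
$g_n(y) = \mu_1((A_n)^y)$ are also increasing, so the monotone convergence theorem applies
to get that

\[ \mu\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \lim_n \int f_n \, d\mu_1 = \lim_n \int g_n \, d\mu_2. \]

Note that $\lim_n f_n(x) = \mu_2((\bigcup_{n=1}^{\infty} A_n)_x)$ and $\lim_n g_n(y) = \mu_1((\bigcup_{n=1}^{\infty} A_n)^y)$, and so
$\bigcup_{n=1}^{\infty} A_n \in \mathcal{M}$. The proof for decreasing sequences is similar, using
dominated convergence instead, if we assume that the measure is finite. Now, the
$\sigma$-finite case follows directly: it suffices to apply the finite case to the
intersections of $A$ with an increasing sequence of rectangles of finite measure
covering $\Omega_1 \times \Omega_2$ and to use monotone convergence once more.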

After the result for sets we will prove the corresponding one for functions. To
make the result more powerful, we will consider measurability with respect to
the completion of the product measure. In this way, the cross section technique
for integration on $\mathbb{R}^d$ will be covered by the result.

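Theorem 8.7.3 (Fubini, Tonelli). Suppose that $(\Omega_1, \Sigma_1, \mu_1)$ and $(\Omega_2, \Sigma_2, \mu_2)$
are complete and $\sigma$-finite. Let $f : \Omega_1 \times \Omega_2 \to \mathbb{R}$ be measurable with respect to
the completion of $\Sigma_1 \otimes \Sigma_2$, assume that $f$ is either positive or integrable, and
put $f_x(\cdot) = f(x, \cdot)$ and $f^y(\cdot) = f(\cdot, y)$ for $x \in \Omega_1$ and $y \in \Omega_2$. Then $\int f_x \, d\mu_2$
and $\int f^y \, d\mu_1$ exist for almost all $x$ and $y$ (with respect to $\mu_1$ and $\mu_2$), they are
measurable on their respective spaces and

\[ \int f \, d(\mu_1 \otimes \mu_2) = \int \Big( \int f_x \, d\mu_2 \Big) d\mu_1 = \int \Big( \int f^y \, d\mu_1 \Big) d\mu_2. \]

In order to avoid possible confusion we will sometimes write the integration
variables in this way:

\[ \int \Big( \int f(x, y) \, d\mu_2(y) \Big) d\mu_1(x) \quad \text{and} \quad \int \Big( \int f(x, y) \, d\mu_1(x) \Big) d\mu_2(y). \]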

Proof. First note that, by Theorem 8.7.2, the result is true for characteristic
functions of sets from $\Sigma_1 \otimes \Sigma_2$, and it extends to simple functions because the
expression is linear. If $f$ is positive and measurable with respect to $\Sigma_1 \otimes \Sigma_2$ the
result is a consequence of this observation and the monotone convergence
theorem. The result then extends to $f$ measurable with respect to $\Sigma_1 \otimes \Sigma_2$ and
integrable by splitting $f$ into positive and negative parts. Now, if $f$ is measurable
with respect to the completion of $\Sigma_1 \otimes \Sigma_2$,
then there is $g$ which is $\Sigma_1 \otimes \Sigma_2$-measurable and coincides with $f$ almost
everywhere. The support of $|f - g|$ is contained in a set $N \in \Sigma_1 \otimes \Sigma_2$ of null
measure. Theorem 8.7.2 implies that the set $N_x$ has null measure for almost
all $x \in \Omega_1$. Then $f(x, \cdot)$ is measurable for those $x$ and coincides with $g(x, \cdot)$
almost everywhere. A similar reasoning works for $f(\cdot, y)$.
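
In the simplest possible setting, two finite sets with weighted point masses, the product measure of a point $(x, y)$ is the product of the weights and the theorem reduces to exchanging the order of finite sums. The sketch below is such a toy instance; the weights and the function $f$ are hypothetical. It only illustrates the shape of the formula; the real content of the theorem lies in the measurability and limit arguments.

```python
from fractions import Fraction

# Hypothetical factor measures given by point masses.
MU1 = {0: Fraction(1, 2), 1: Fraction(3, 2), 2: Fraction(2)}
MU2 = {"u": Fraction(1, 3), "v": Fraction(5, 3)}

def f(x, y):
    """An arbitrary positive function on the product set."""
    return Fraction(x * x + 1) * (2 if y == "u" else 1)

def product_integral():
    """Integral of f against the product measure (mu1 x mu2)({(x, y)}) = mu1{x} * mu2{y}."""
    return sum((f(x, y) * MU1[x] * MU2[y] for x in MU1 for y in MU2), Fraction(0))

def inner_dy_then_dx():
    """Integrate the section f_x against mu2 first, then integrate against mu1."""
    return sum((MU1[x] * sum((f(x, y) * MU2[y] for y in MU2), Fraction(0))
                for x in MU1), Fraction(0))

def inner_dx_then_dy():
    """Integrate the section f^y against mu1 first, then integrate against mu2."""
    return sum((MU2[y] * sum((f(x, y) * MU1[x] for x in MU1), Fraction(0))
                for y in MU2), Fraction(0))

print(product_integral(), inner_dy_then_dx(), inner_dx_then_dy())
```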

We can prove a result on the derivation of parametric integrals with the


help of Fubini’s theorem.
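Proposition 8.7.4. Assume $(\Omega, \Sigma, \mu)$ is $\sigma$-finite and let $f : \Omega \times (a, b) \to \mathbb{R}$ be
measurable with respect to the product measure. Suppose moreover that $f(x, \lambda)$
is derivable with respect to $\lambda \in (a, b)$ for almost all $x \in \Omega$ and

\[ \int_{\Omega \times (a, b)} \Big| \frac{\partial f}{\partial \lambda}(x, \lambda) \Big| \, d\mu(x) \, d\lambda < +\infty. \]

Then the integral is derivable with respect to $\lambda$ and the following equality holds

\[ \frac{\partial}{\partial \lambda} \int f(x, \lambda) \, d\mu = \int \frac{\partial f}{\partial \lambda}(x, \lambda) \, d\mu \]

at the points $\lambda \in (a, b)$ where the second term is continuous.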

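Proof. Put $F(\lambda) = \int f(x, \lambda) \, d\mu$. For $\lambda_1, \lambda_2 \in (a, b)$ we have, by Fubini's theorem,

\[ \int_{\lambda_1}^{\lambda_2} \int \frac{\partial f}{\partial \lambda}(x, \lambda) \, d\mu \, d\lambda = \int \Big( \int_{\lambda_1}^{\lambda_2} \frac{\partial f}{\partial \lambda}(x, \lambda) \, d\lambda \Big) d\mu = \int \big( f(x, \lambda_2) - f(x, \lambda_1) \big) \, d\mu = F(\lambda_2) - F(\lambda_1). \]

Therefore,

\[ \frac{F(\lambda_2) - F(\lambda_1)}{\lambda_2 - \lambda_1} = \frac{1}{\lambda_2 - \lambda_1} \int_{\lambda_1}^{\lambda_2} \int \frac{\partial f}{\partial \lambda}(x, \lambda) \, d\mu \, d\lambda \]

and the result follows from the mean value theorem.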

8.8 Signed measures


Let $(\Omega, \Sigma)$ be a measurable space. We may consider “measures” possibly
taking negative values. We say that $\nu : \Sigma \to \mathbb{R}$ is a signed measure if

\[ \nu\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \sum_{n=1}^{\infty} \nu(A_n) \]

whenever $(A_n)_{n=1}^{\infty} \subset \Sigma$ are mutually disjoint. Note that any permutation of
the sets in the union on the left-hand side leaves the value unchanged, so the
series on the right-hand side has to be unconditionally convergent, which is
the same as absolutely convergent for real numbers, namely

\[ \sum_{n=1}^{\infty} |\nu(A_n)| < +\infty \]

whenever $(A_n)_{n=1}^{\infty} \subset \Sigma$ are mutually disjoint.

The first task with a signed measure is to find sets where the measure
behaves monotonically. Let us say that $A \in \Sigma$ is positive if $\nu(B) \ge 0$ for any
$B \in \Sigma$ with $B \subset A$. Negative sets are defined analogously.
Lemma 8.8.1. Let $\nu$ be a signed measure. Then any set $A \in \Sigma$ with $\nu(A) > 0$
contains a positive set $P \in \Sigma$ with $\nu(P) \ge \nu(A)$.

Proof. Consider

\[ d_1 = \inf\{\nu(B) : B \subset A,\ B \in \Sigma\} \in [-\infty, +\infty). \]

If $d_1 \ge 0$ the set $A$ is already positive and there is nothing to do. Otherwise
$d_1 < 0$. Take a set $B_1 \in \Sigma$, $B_1 \subset A$, such that $\nu(B_1) < \max\{d_1/2, -1\}$. Assume the sets
$B_1, \dots, B_{n-1}$ are already built and take

\[ d_n = \inf\Big\{\nu(B) : B \in \Sigma,\ B \subset A \setminus \bigcup_{k=1}^{n-1} B_k \Big\}. \]

If $d_n \ge 0$ the process stops because we already have a positive
set whose complement in $A$ has negative measure; otherwise take $B_n \in \Sigma$,
$B_n \subset A \setminus \bigcup_{k=1}^{n-1} B_k$, such that $\nu(B_n) < \max\{d_n/2, -1\}$. Assume that the process
never stops, so that we have built a disjoint sequence of sets $(B_n)$. Note that the
sequence $(d_n)$ is nondecreasing, so either $d_n < -1$ for all $n$ or there
is some $n_0$ such that $d_n \ge -1$ for all $n \ge n_0$. The first case cannot occur
because it would imply

\[ \nu\Big( \bigcup_{n=1}^{\infty} B_n \Big) = -\infty \]

which is impossible since $\nu$ takes values in $\mathbb{R}$ only. Therefore, from some $n$ on
we have $0 > d_n > 2\nu(B_n)$, which implies the convergence of the series $\sum_{n=1}^{\infty} d_n$.
In particular, we have $\lim_n d_n = 0$. We claim that $P = A \setminus \bigcup_{n=1}^{\infty} B_n$ is positive.
Indeed, if $\nu(B) < 0$ for some $B \in \Sigma$ with $B \subset P$, there is $n$ such that $d_n > \nu(B)$ and this
contradicts the definition of $d_n$. Finally, $\nu(P) = \nu(A) - \sum_{n=1}^{\infty} \nu(B_n) \ge \nu(A)$.

Let us say that $A \in \Sigma$ is $\nu$-null if $\nu(B) = 0$ for every $B \in \Sigma$ with $B \subset A$.
Theorem 8.8.2 (Hahn decomposition). Given a signed measure of bounded
variation $\nu$ on a measurable space $(\Omega, \Sigma)$ there exist sets $P, N \in \Sigma$ such that
$P \cap N = \emptyset$, $P \cup N = \Omega$, $P$ is positive and $N$ is negative. This decomposition is
unique up to $\nu$-null sets.
Proof. Consider

\[ s = \sup\{\nu(B) : B \in \Sigma \text{ positive}\} < +\infty \]

and take $P_n \in \Sigma$ positive such that $\lim_n \nu(P_n) = s$. It is obvious
that $P = \bigcup_{n=1}^{\infty} P_n$ is positive and $\nu(P) = s$. On the other hand, $N = P^c$ is negative.
Otherwise, there is $A \subset N$ with $\nu(A) > 0$, and we may assume that $A$ is positive
by the lemma. Then $P \cup A$ would be positive and $\nu(P \cup A) > s$, which violates
the definition of $s$. Now, it is clear that if $A \in \Sigma$ is positive then $\nu(A \setminus P) = 0$
and if $A$ is negative then $\nu(A \setminus N) = 0$, which implies the uniqueness of the
decomposition up to null sets.
Corollary 8.8.3. A signed measure is the difference of two positive finite
measures.
Proof. Let $(P, N)$ be the Hahn decomposition of $\nu$. Take $\nu^{+}(A) = \nu(P \cap A)$
and $\nu^{-}(A) = -\nu(N \cap A)$. Obviously, we have $\nu = \nu^{+} - \nu^{-}$.

Define the variation of $\nu$ as the finite positive measure

\[ |\nu|(A) = \nu^{+}(A) + \nu^{-}(A). \]

It is not difficult to see that the variation can be recovered by the formula

\[ |\nu|(A) = \sup\Big\{ \sum_{n=1}^{\infty} |\nu(A_n)| : (A_n)_{n=1}^{\infty} \text{ disjoint},\ \bigcup_{n=1}^{\infty} A_n = A \Big\}. \]

The fact that $|\nu|(\Omega) < +\infty$ is usually expressed by saying that $\nu$ has finite
variation. The formula above is more interesting for vector valued measures
(we will skip the definition, but the reader can easily guess it) because it allows
one to define a positive measure $|\nu|$ that accurately controls $\nu$. However, in the
infinite-dimensional case $|\nu|$ may fail to be finite.
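
For a signed measure given by finitely many point charges the Hahn decomposition can be read off directly: $P$ collects the points with non-negative charge and $N$ the rest. The sketch below assumes such a hypothetical discrete $\nu$ (the charges are made up) and computes $\nu^{+}$, $\nu^{-}$ and $|\nu|$ from it.

```python
from fractions import Fraction

# Hypothetical signed measure on a five-point set, given by point charges.
CHARGES = {"a": Fraction(3), "b": Fraction(-1, 2), "c": Fraction(0),
           "d": Fraction(-2), "e": Fraction(1, 4)}

def nu(A):
    return sum((CHARGES[x] for x in A), Fraction(0))

# A Hahn decomposition: P is positive, N = complement of P is negative.
P = frozenset(x for x, q in CHARGES.items() if q >= 0)
N = frozenset(CHARGES) - P

def nu_plus(A):
    return nu(P & frozenset(A))

def nu_minus(A):
    return -nu(N & frozenset(A))

def variation(A):
    """|nu|(A) = nu^+(A) + nu^-(A)."""
    return nu_plus(A) + nu_minus(A)

omega = frozenset(CHARGES)
print(sorted(P), sorted(N), nu(omega), variation(omega))
```

The point "c" has zero charge, so it could be moved from $P$ to $N$ without affecting anything; this is the non-uniqueness up to $\nu$-null sets mentioned in Theorem 8.8.2.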

A signed measure $\nu$ can also be expressed as an integral with respect to
a positive measure. For that, take $g = \chi_P - \chi_N$ and note that $|g| = 1$ and

\[ \nu(A) = \int_A g \, d|\nu|. \]

Now we will study more general representations of measures as indefinite
integrals with respect to a positive measure. We say that a signed measure $\nu$
is absolutely continuous with respect to a positive measure $\mu$, both defined on
$(\Omega, \Sigma)$, if $\nu(A) = 0$ whenever $A \in \Sigma$ with $\mu(A) = 0$. Note that for the same set
$A$ we have $|\nu|(A) = 0$, which implies that $|\nu|$ is also absolutely continuous with
respect to $\mu$. The name of the property refers to the following characterization,
quite similar to the continuity of real functions.
Proposition 8.8.4. A signed measure ν is absolutely continuous with respect
to µ if and only if for every ε > 0 there is δ > 0 such that µ(A) < δ for A ∈ Σ
implies that |ν(A)| < ε.

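Proof. Without loss of generality we may assume ν positive (otherwise argue
with |ν|). One of the implications is clear. For the converse just assume the
ε-δ property is false. Namely, there is some ε > 0 such that for all n ∈ N there
is An ∈ Σ such that $\mu(A_n) < 2^{-n}$ and ν(An ) ≥ ε. Note now that the set
\[ A = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \]
satisfies µ(A) = 0, because $\mu(\bigcup_{k \ge n} A_k) \le 2^{1-n}$ for every n, and ν(A) ≥ ε, because
ν is finite and $\nu(\bigcup_{k \ge n} A_k) \ge \varepsilon$ for every n. This contradicts absolute continuity.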

Note that a signed measure defined by a function f ∈ L1 (µ) by the formula
\[ \nu(A) = \int_A f \, d\mu \]
is absolutely continuous with respect to µ. The less trivial converse is true
under very general hypotheses.
Theorem 8.8.5 (Radon-Nikodym). Let (Ω, Σ, µ) be a σ-finite measure space
and let ν be a positive measure defined on Σ and absolutely continuous with
respect to µ. Then there exists f : Ω → [0, +∞) measurable such that
\[ \nu(A) = \int_A f \, d\mu \]
for every A ∈ Σ. The function f is uniquely determined µ almost everywhere.


Proof. It is enough to prove the result for a finite measure space as the general
result can be deduced by gluing the functions defined on each finite measure
part. Note that if µ(Ω) < +∞ then ν(Ω) < +∞. Consider the set
\[ \mathcal{F} = \Big\{ f \ge 0 \text{ measurable} : \int_A f \, d\mu \le \nu(A) \text{ for all } A \in \Sigma \Big\}. \]
It is easy to see that if f1 , f2 ∈ F then max{f1 , f2 } ∈ F. Therefore it is possible
to build an increasing sequence (fn ) ⊂ F such that
\[ \lim_n \int f_n \, d\mu = \sup\Big\{ \int f \, d\mu : f \in \mathcal{F} \Big\} \le \nu(\Omega) < +\infty. \]
Take f = limn fn , which is integrable and so finite µ almost everywhere. Without
loss of generality we may assume that f takes only finite values, and also
f ∈ F by the monotone convergence theorem.
We claim f is the function we are looking for. Note that $\nu_0(A) = \nu(A) - \int_A f \, d\mu$
defines a positive measure which is also absolutely continuous with respect to
µ. Suppose that ν0 (Ω) > 0 in order to get a contradiction. Fix ε > 0 such
that ν0 (Ω) > εµ(Ω). Now applying the Hahn decomposition to ν0 − εµ we get
a positive part P with respect to such a measure. Since (ν0 − εµ)(Ω) > 0 we
get (ν0 − εµ)(P ) > 0 and also

ν0 (A ∩ P ) ≥ εµ(A ∩ P )

for every A ∈ Σ. Now we have
\[ \nu(A) = \int_A f \, d\mu + \nu_0(A) \ge \int_A f \, d\mu + \nu_0(A \cap P) \ge \int_A f \, d\mu + \varepsilon\mu(A \cap P) = \int_A (f + \varepsilon\chi_P) \, d\mu \]
for every A ∈ Σ, which implies f + εχP ∈ F. However
\[ \int f \, d\mu \ge \int (f + \varepsilon\chi_P) \, d\mu = \int f \, d\mu + \varepsilon\mu(P) \]
implies that µ(P ) = 0 and so ν0 (P ) = 0 too by absolute continuity. This
contradicts (ν0 − εµ)(P ) > 0, so the theorem is proven.
Corollary 8.8.6 (Signed Radon-Nikodym). Let (Ω, Σ, µ) be a σ-finite measure
space and let ν be a signed measure defined on Σ and absolutely continuous with
respect to µ. Then there exists f ∈ L1 (µ) such that
\[ \nu(A) = \int_A f \, d\mu \]
for every A ∈ Σ. The function f is uniquely determined µ almost everywhere.


Proof. It is enough to apply Theorem 8.8.5 to the positive measures given by
Hahn decomposition of ν, Theorem 8.8.2.
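
On a finite space the Radon-Nikodym density is just the quotient of point masses, which gives a quick sanity check of the statement. This is a sketch under assumptions: the dictionaries `mu` and `nu` and the helper names are hypothetical, and the value of f on µ-null points is set to 0 arbitrarily, in line with the uniqueness only up to null sets.

```python
# Hypothetical finite measure space: mu and nu are given by point masses.
mu = {"a": 0.5, "b": 0.0, "c": 2.0, "d": 1.0}
nu = {"a": 1.0, "b": 0.0, "c": -3.0, "d": 0.0}  # nu vanishes wherever mu does

def measure(weights, A):
    """Measure of the subset A for the point masses in weights."""
    return sum(weights[x] for x in A)

# Absolute continuity on a finite space: every mu-null point is nu-null.
assert all(nu[x] == 0.0 for x in mu if mu[x] == 0.0)

# Density f = d(nu)/d(mu); on mu-null points any value works, we pick 0.
f = {x: (nu[x] / mu[x] if mu[x] > 0 else 0.0) for x in mu}

def integral(density, weights, A):
    """Integral of density over A with respect to the measure weights."""
    return sum(density[x] * weights[x] for x in A)

A = {"a", "c", "d"}
print(measure(nu, A), integral(f, mu, A))  # both numbers agree
```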

8.9 Differentiation
We will develop the Lebesgue theory of differentiation on Rd endowed with
the d-dimensional Lebesgue measure m. The chosen norm on Rd will not play
an essential role, however it must be fixed from the beginning. Instead of the
difference quotients we will use integral averages
\[ A_r(f)(x) = \frac{1}{m(B(x,r))} \int_{B(x,r)} f \, dm \]
for every r > 0 and f ∈ L1 (m). We have to prepare the tools for the main
result. The first one actually expresses a general property of the convolution.
Proposition 8.9.1. The average Ar is a norm 1 operator on L1 (m).

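Proof. Consider the (in)equalities
\begin{align*}
\|A_r(f)\|_1 &= \int \Big| \frac{1}{m(B(y,r))} \int \chi_{B(y,r)}(x) f(x) \, dm(x) \Big| \, dm(y) \\
&\le \frac{1}{m(B(0,r))} \int\!\!\int \chi_{B(y,r)}(x) |f(x)| \, dm(x) \, dm(y) \\
&= \frac{1}{m(B(0,r))} \int\!\!\int \chi_{B(y,r)}(x) |f(x)| \, dm(y) \, dm(x) \\
&= \frac{1}{m(B(0,r))} \int\!\!\int \chi_{B(x,r)}(y) |f(x)| \, dm(y) \, dm(x) \\
&= \int |f(x)| \, dm(x) = \|f\|_1
\end{align*}
since $\chi_{B(x,r)}(y) = \chi_{B(y,r)}(x)$ and $\int \chi_{B(x,r)}(y) \, dm(y) = m(B(0,r))$.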

The previous result (and its proof) can be interpreted in terms of convolu-
tion with a family of kernels.

Now we will prove a version of Vitali’s covering lemma.


Proposition 8.9.2. Let A ⊂ Rd be a measurable set which is covered by a
family F of balls with uniformly bounded radii.
Then there is a disjoint sequence (Bn ) ⊂ F (maybe finite) such that
\[ \sum_n m(B_n) \ge 5^{-d} m(A). \]

Proof. Write R for the radius of a ball or the supremum of the radii of
a family of balls. The choice of balls will be by induction. Take B1 ∈ F
with $R(B_1) \ge 2^{-1} R(\mathcal{F})$. Suppose now that Bk are already chosen for
k = 1, . . . , n − 1. Find Bn ∈ F, disjoint from B1 , . . . , Bn−1 , such that
\[ R(B_n) \ge \frac{1}{2} R(\{B \in \mathcal{F} : B \cap B_k = \emptyset,\ k = 1, \dots, n-1\}) \]
if that choice is possible, otherwise the construction stops. In order to show
that the sequence satisfies the statement, we may assume $\sum_n m(B_n) < \infty$.
Let B′n be the ball with the same center as Bn and radius 5 times bigger.
We claim that every set in F meets some Bn . Indeed, take B ∈ F and assume
Bn ∩ B = ∅ for every n. That would imply R(Bn ) ≥ R(B)/2 for every n, and
therefore the sequence is infinite and $\sum_n m(B_n) = \infty$ against the previous
assumption. Now, since B meets some Bn , assume n is minimum. Then we have
R(Bn ) ≥ R(B)/2 and so B ⊂ B′n (draw a picture). In consequence,
$A \subset \bigcup_n B_n'$, and thus
\[ m(A) \le \sum_n m(B_n') = 5^d \sum_n m(B_n) \]
as desired.

Given a measurable function f its maximal function is defined as
\[ M(f)(x) = \sup_{r>0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f| \, dm. \]

Proposition 8.9.3. Assume f ∈ L1 (Rd ), then for every ε > 0
\[ m(\{M(f) > \varepsilon\}) \le \frac{5^d}{\varepsilon} \|f\|_1. \]
Proof. Fix ε and put A = {M (f ) > ε}. By the definition of the maximal
function, for every x ∈ A there is a ball Bx centred at x such that
\[ \int_{B_x} |f| \, dm > \varepsilon \, m(B_x). \]
Clearly the radii of the balls are uniformly bounded, since m(Bx ) < ∥f ∥1 /ε.
Using Vitali’s lemma there is a disjoint sequence (maybe finite) in
{Bx : x ∈ A} that we denote (Bn ). We have
\[ \|f\|_1 \ge \int_{\bigcup_n B_n} |f| \, dm > \varepsilon \sum_n m(B_n) \ge \varepsilon \, 5^{-d} m(A) \]
which implies the statement.

Theorem 8.9.4. Let f ∈ L1 (Rd ). Then for almost every point x ∈ Rd there
exists the limit
\[ \lim_{r \to 0^+} \frac{1}{m(B(x,r))} \int_{B(x,r)} f \, dm = f(x). \]

Proof. Firstly we will show that the convergence of the averages happens with
respect to ∥ · ∥1 , namely
\[ \lim_{r \to 0^+} \|A_r(f) - f\|_1 = 0. \]
Indeed, if f were continuous with compact support then Ar (f ) would converge
uniformly to f . Since Ar (f ) has support contained in supp(f ) + B(0, r),
the convergence is also with respect to ∥ · ∥1 . Now take ε > 0 and f ∈ L1 (Rd ).
Take g continuous with compact support such that ∥f − g∥1 < ε/3. By
Proposition 8.9.1 that also implies ∥Ar (f ) − Ar (g)∥1 < ε/3. Now, if r > 0 is
small enough to have ∥Ar (g) − g∥1 < ε/3 then

∥Ar (f ) − f ∥1 ≤ ∥Ar (f ) − Ar (g)∥1 + ∥Ar (g) − g∥1 + ∥g − f ∥1 < ε.

Define the mean oscillation
\[ \operatorname{osc}(f,x) = \limsup_{r \to 0^+} \frac{1}{m(B(x,r))} \int_{B(x,r)} f \, dm - \liminf_{r \to 0^+} \frac{1}{m(B(x,r))} \int_{B(x,r)} f \, dm. \]
Note that osc(f, x) ≤ 2 M (f )(x) and the desired convergence at x is equivalent
to osc(f, x) = 0. If f were continuous the result would be trivial, as we have
already seen before. Take ε > 0. In general, for any f ∈ L1 (Rd ) we can find
continuous (compactly supported) functions g such that ∥f − g∥1 is arbitrarily
small. Note that osc(f − g, x) = osc(f, x) and so, by Proposition 8.9.3,
\[ m(\{\operatorname{osc}(f,x) > \varepsilon\}) \le m(\{M(f-g) > \varepsilon/2\}) \le \frac{2 \cdot 5^d}{\varepsilon} \|f - g\|_1 \]
that implies m({osc(f, x) > ε}) = 0. As ε > 0 was arbitrary, osc(f, x) = 0 for
almost every x. Therefore the integral averages of f
converge almost everywhere and the limit must coincide with f almost every-
where by the first part of the proof and Theorem 8.6.5.

It is clear that Theorem 8.9.4 can be extended to functions that are integrable
on every bounded open subset of Rd (locally integrable). In this way the result
naturally applies to characteristic functions.

Corollary 8.9.5. Given a measurable set A ⊂ Rd the limit
\[ \lim_{r \to 0^+} \frac{m(B(x,r) \cap A)}{m(B(x,r))} \]
exists for almost all x ∈ Rd with values 0 or 1.
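
For an interval on the real line the quotient in the corollary can be computed exactly, which makes the behaviour of the limit visible. The sketch below assumes d = 1, A = [0, 1] and B(x, r) = (x − r, x + r); the function name `density_ratio` is a hypothetical choice. The point x = 0 belongs to the null set of exceptions, where the limit exists but equals 1/2.

```python
def density_ratio(x, r, a=0.0, b=1.0):
    """m(B(x, r) ∩ [a, b]) / m(B(x, r)) on the real line, B(x, r) = (x - r, x + r)."""
    overlap = max(0.0, min(x + r, b) - max(x - r, a))
    return overlap / (2.0 * r)

# Interior point, exterior point and boundary point of A = [0, 1].
for x in (0.5, 2.0, 0.0):
    print(x, [density_ratio(x, 10.0 ** -k) for k in range(1, 5)])
```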


Now we will discuss the differentiation of real functions of one variable from
the point of view given by measure theory. Firstly note that we should replace
the previous averages by these more convenient ones
\[ \lim_{r \to 0^+} \frac{1}{r} \int_{x-r}^{x} f(t) \, dt \quad \text{and} \quad \lim_{r \to 0^+} \frac{1}{r} \int_{x}^{x+r} f(t) \, dt. \]
It is not difficult to check that all the previous theory can be adapted to
these averages even though x is not the central point. As a consequence we have
the following.
Corollary 8.9.6. The indefinite integral $F(x) = \int_a^x f(t) \, dt$ of a locally
integrable function on R is differentiable almost everywhere and the equality
F ′ (x) = f (x) holds almost everywhere on its domain.
A related important question is to recognise those functions which are
indefinite integrals of L1 (R) functions. The key idea is the fact that the measure
$\mu(A) = \int_A f \, dm$ is absolutely continuous with respect to the Lebesgue measure
on R. We say that a function F defined on an interval (a, b) of R is absolutely
continuous if for every ε > 0 there is δ > 0 such that for any choice of points

a < a1 < b1 < a2 < b2 < · · · < an < bn < b

with $\sum_{k=1}^{n} (b_k - a_k) < \delta$ we have $\sum_{k=1}^{n} |F(b_k) - F(a_k)| < \varepsilon$. Evidently, an
absolutely continuous function is continuous and also it is of bounded variation on
bounded intervals.
Theorem 8.9.7. Let F : (a, b) → R be an absolutely continuous function. Then
F is differentiable almost everywhere and its derivative F ′ (x) = f (x) is locally
integrable and satisfies
\[ \int_c^d f(x) \, dx = F(d) - F(c) \]
for every interval [c, d] ⊂ (a, b).

Proof. We may assume that [a, b] is finite and the endpoints belong to the domain.
The members of the algebra A generated by the intervals can be represented
as a disjoint finite union of intervals. We define a function on A by
\[ \nu(A) = \sum_{k=1}^{n} (F(b_k) - F(a_k)) \]
where the intervals with endpoints ak < bk make up the decomposition of A. As a
function it is uniformly continuous with respect to the pseudometric
d(A, B) = m(A∆B), Proposition 8.6.2. Since A is dense with respect to d in the
measurable subsets of (a, b), ν extends as a uniformly continuous function to all
those sets. It is not difficult to check that ν is a signed σ-additive measure of
bounded variation and absolutely continuous with respect to m. By the
Radon-Nikodym theorem (Theorem 8.8.5 and Corollary 8.8.6) there is f ∈ L1 (a, b)
such that $\nu(A) = \int_A f \, dm$. Therefore
\[ F(x) = F(a) + \int_a^x f \, dm. \]
By the previous Corollary F is differentiable almost everywhere and the
equality F ′ (x) = f (x) holds almost everywhere.

8.10 Rationale and remarks


This chapter is noticeably longer than others for the sake of its autonomy and
it could be used in an advanced course together with Appendix B. Nevertheless,
the choice of topics is free and may depend on the time devoted to the Riemann
integral. In case one decides to withdraw the Riemann integral from the course,
the chapter on Change of Variables should be adapted for the Lebesgue integral.
My personal choice is to keep the Riemann integral (actually, the integration
of continuous functions with bounded support) in order to regard the Lebesgue
integral as an extended linear operator.

In the classroom, I like to begin the topic with the “proof” of Pythagoras’
Theorem based on different decompositions of a square. In this way, I discuss
to what extent we are using an intuitive notion of area and how we could calm
the need for rigor. We can devise a “naïve” theory of area for polygons based on
equidecomposability and then I would mention the theorems of Bolyai-Gerwien
and Hadwiger-Glur. However, the same program cannot be developed in three
dimensions because of the solution of Dehn to Hilbert’s third problem. That
shows that integral methods are needed even if we restrict ourselves to volumes
of polyhedral bodies.

We go only as far as the differentiation results, for measures and integrals.
Going on will require special properties of L1 (µ), namely completeness with
respect to the (semi)norm, which is traditionally considered Functional Analysis.
That material can be found in the appendix. We also managed to skip,
explicitly, the convolution. Some topics can be properly treated in Advanced
Analysis topics.

As to the methods to carry out the topic, I want to point out the struggle
to make clear the need for countably additive measures and the meaning
of Carathéodory’s measurability definition. Our proof of the Radon-Nikodym
theorem is constructive, instead of von Neumann’s idea using Hilbert
properties of L2 . In our vision, some specific properties of function spaces belong
to Functional Analysis, so they are relegated to the auxiliary chapter.

8.11 Exercises
1. Use the formula for the measure of a union of sets to deduce the area of
a spherical triangle in terms of its angles.
2. Let Sn be the group of permutations of a set of n elements. Let Fn
be the subset of permutations in Sn that fix at least one element. Show the
existence and find the value of the limit
\[ \lim_n \frac{\#(F_n)}{n!}. \]
3. Find the values of α for which
\[ \lim_n n^{\alpha} \int_0^1 (1-x) x^n \cos(\pi x/n) \, dx = 0. \]
4. Find the domain of the function
\[ f(x) = \int_0^{\pi/2} t^x \sin t \, dt. \]
Show that f (x) is convex and it attains an absolute minimum between
0 and 1.

5. Let $f(x) = \int_0^1 \frac{\log(1+xt)}{1+t^2} \, dt$ defined for x > 0. Show that f (1) = π log(2)/8
and
\[ f'(x) = \frac{\log 4 + \pi x - 4\log(1+x)}{4(1+x^2)}. \]
6. Prove that the function
\[ f(x) = \int_0^{+\infty} \Big( x\sqrt{t} + \frac{\cos(xt)}{\sqrt{t}} \Big) e^{-t} \, dt \]
is defined on R, it is continuous and monotone.
7. Let f : R → R be integrable, and for n ∈ N take
\[ f_n(x) = \frac{n}{\pi} \int_{-\infty}^{+\infty} \frac{f(t) \, dt}{1 + n^2 (t-x)^2}. \]
Prove that the integral is finite and if f is continuous at some x ∈ R
then limn fn (x) = f (x).
8. Let f : Rd → R be an integrable function.
(a) Show that $\lim_{t \to +\infty} t \, m(\{x \in \mathbb{R}^d : |f(x)| > t\}) = 0$.
(b) If K ⊂ Rd is compact, show that
\[ \lim_{\|x\| \to +\infty} \int_{x+K} |f(t)| \, dt = 0. \]
9. Prove that a closed interval cannot be written as a countable disjoint
union of smaller closed intervals.

10. Let F ⊂ P(Ω) be a family of sets. Prove that for every A ∈ σ(F) there
is FA ⊂ F countable such that A ∈ σ(FA ).

11. Prove that the cardinality of the Borel sets of R is $\mathfrak{c} = 2^{\mathbb{N}}$ and the
cardinality of Lebesgue measurable sets of R is $2^{\mathfrak{c}}$.
12. Find a set which is not Lebesgue measurable (use an equivalent version of the
Axiom of Choice).
13. Prove that there is no probability µ on P(N) such that µ(nN) = 1/n for
all n ∈ N.

14. For any set A ⊂ R, let D(A) = {x−y : x, y ∈ A}. Prove that if m(A) > 0
then D(A) is a neighbourhood of 0. Show that the converse does not
hold by computing D(T ) where T is the ternary Cantor set.
15. Let f : R → R be a first Baire class function, that is, a pointwise limit of
continuous functions. Show that f −1 (U ) is a countable union of closed
sets for every open U ⊂ R. Deduce that the indicator function of the
rationals is not first Baire class, but it is second Baire class (pointwise
limit of first Baire class functions).
16. Let ⟨x⟩ denote the fractional part of a number x ∈ R. Prove that for
all α ∈ R \ Q and every f ∈ C[0, 1]
\[ \lim_n \frac{1}{n} \sum_{k=1}^{n} f(\langle k\alpha \rangle) = \int_0^1 f(x) \, dx. \]
17. Let (fn ) ⊂ C[a, b] be a bounded sequence of continuous functions pointwise
converging to 0. Show that $\lim_n \int_a^b f_n = 0$ without invoking the
dominated convergence theorem.
18. Let g : R → R be an increasing left-continuous function. Prove that there
is a unique Borel measure µ such that µ([a, b)) = g(b) − g(a) for every
a < b. The integration with respect to µ is called Lebesgue–Stieltjes,
because it extends the older Riemann–Stieltjes integral, and it is denoted
\[ \int_a^b f \, dg. \]

19. Let f, g : [0, +∞) → R be functions such that f is decreasing with
limx→+∞ f (x) = 0 and there is M > 0 such that $\big| \int_x^y g(t) \, dt \big| \le M$ for
every x, y ∈ [0, +∞). Show that if x < y ∈ [0, +∞), then
\[ \int_x^y f(t) g(t) \, dt \le M f(x). \]

20. Show that the product of two absolutely continuous functions (defined
on the same compact interval) is absolutely continuous too.

Chapter 9

Integration on curves and surfaces

9.1 Functions of bounded variation


We will discuss the notion of length of a curve in a normed space, but for that
aim it is better to start with a related notion in the setting of real valued functions.

Given a real function f : [a, b] → R we may consider the following number,
possibly +∞,
\[ V_a^b(f) = \sup\Big\{ \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| : (x_i)_{i=0}^{n} \text{ partition of } [a,b] \Big\} \]

called the variation of f on [a, b]. If $V_a^b(f) < +\infty$ we say that f is of bounded
variation. Note that monotone functions are trivially of bounded variation. A
less trivial example: the existence and boundedness of the derivative implies
bounded variation since
\[ \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| = \sum_{i=1}^{n} |f'(\xi_i)| (x_i - x_{i-1}) \le M(b-a) \]
for some ξi ∈ (xi−1 , xi ) and M > 0 being a bound for |f ′ (x)| on (a, b).
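
As a numerical illustration of the definition, the sums over uniform partitions can be evaluated for f = sin on [0, 2π], whose variation is 4 since the function is monotone on [0, π/2], [π/2, 3π/2] and [3π/2, 2π]. This is a sketch only: the helper names are hypothetical, and uniform partitions are just one convenient family of partitions, so the printed values are lower bounds of the supremum.

```python
import math

def variation_sum(f, points):
    """Sum of |f(x_i) - f(x_{i-1})| over consecutive points of a partition."""
    return sum(abs(f(q) - f(p)) for p, q in zip(points, points[1:]))

def uniform_partition(a, b, n):
    """The n + 1 equally spaced points of [a, b]."""
    return [a + (b - a) * i / n for i in range(n + 1)]

a, b = 0.0, 2.0 * math.pi
for n in (3, 10, 100, 1000):
    print(n, variation_sum(math.sin, uniform_partition(a, b, n)))
```

The derivative bound of the text gives M(b − a) = 2π with M = 1, which is consistent with the value 4 but not sharp.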

Elementary properties of variation are summarised here.


Proposition 9.1.1. Let f : [a, b] → R be a function, then:
(a) $|f(b) - f(a)| \le V_a^b(f)$;
(b) if [c, d] ⊂ [a, b] then $V_c^d(f) \le V_a^b(f)$;
(c) if c ∈ [a, b] then $V_a^c(f) + V_c^b(f) = V_a^b(f)$.
Proof. (a) Note that |f (b)−f (a)| is the sum associated to the trivial partition
of [a, b]. Statements (b) and (c) follow just taking suitable partitions of [a, b]
including the points c, d (respectively, the point c).

The following summarises some properties of the variation when it is finite.


Proposition 9.1.2. Let f, g : [a, b] → R be functions of bounded variation,
then:
(a) $V_a^b(f+g) \le V_a^b(f) + V_a^b(g)$ and $V_a^b(\lambda f) = |\lambda| V_a^b(f)$ for λ ∈ R;
(b) $V_a^x(f)$ is increasing for x ∈ [a, b];
(c) $V_a^x(f) - f(x)$ is increasing for x ∈ [a, b].
Proof. (a) It is just the triangle inequality and homogeneity of the absolute
value. (b) It is a consequence of (c) of the previous proposition. In order to prove
(c), for x ≤ y we have
\[ f(y) - f(x) \le V_x^y(f) = V_a^y(f) - V_a^x(f) \]
and thus
\[ V_a^x(f) - f(x) \le V_a^y(f) - f(y) \]
which finishes the proof.

We are ready for the main characterization.


Theorem 9.1.3. A function f : [a, b] → R is of bounded variation if and only
if it is the difference of two monotone functions.
Proof. If f is of bounded variation, by the previous proposition we have
\[ f(x) = V_a^x(f) - (V_a^x(f) - f(x)) \]
which is a representation of f as a difference of two increasing functions. On the
other hand, monotone functions are of bounded variation and this property is
stable under sums.

Corollary 9.1.4. A function of bounded variation has at most countably many
discontinuities of jump type.
Now we are interested in knowing if continuity of the function is inherited
by its variation.
Theorem 9.1.5. If f : [a, b] → R is continuous and of bounded variation then
$V_a^x(f)$ is continuous for x ∈ [a, b].
Proof. Suppose that $V_a^x(f) \not\to V_a^c(f)$ as x → c for some c ∈ [a, b]. We may
assume that c > a and x < c, otherwise we could make a similar argument. Therefore
there is some η > 0 such that $V_x^c(f) > \eta$ for every x < c. Take a1 = a and find
a partition $(x_i)_{i=0}^{n}$ of [a1 , c] such that
\[ \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| > \eta. \]
By the continuity of f we may assume that |f (xn−1 ) − f (c)| < η/2. Now take
a2 = xn−1 < c and observe that $V_{a_1}^{a_2}(f) > \eta/2$. Proceed likewise to find a3 and so
an increasing sequence (an ) such that $V_{a_n}^{a_{n+1}}(f) > \eta/2$. Thus
\[ V_{a_1}^{a_n}(f) = V_{a_1}^{a_2}(f) + \cdots + V_{a_{n-1}}^{a_n}(f) \ge (n-1)\eta/2. \]
That clearly violates the boundedness of the variation of f .


Corollary 9.1.6. A continuous function of bounded variation is the difference
of two continuous monotone functions.

9.2 Curves in normed spaces


Consider a parameterized (not necessarily continuous) curve γ : [a, b] → X
where X is a normed space. We say that the curve is rectifiable if
\[ L_a^b(\gamma) = \sup\Big\{ \sum_{i=1}^{n} \|\gamma(t_i) - \gamma(t_{i-1})\| : (t_i)_{i=0}^{n} \text{ partition of } [a,b] \Big\} < +\infty. \]
It is easy to check that a rectifiable curve is rectifiable for any equivalent norm
on X, although the length itself is obviously not invariant.

Note the similarities of the length with the variation. Some of the arguments
in the preceding section can be adapted to prove the following properties.

Proposition 9.2.1. Let γ : [a, b] → X be a rectifiable continuous curve. Then
(a) $\|\gamma(b) - \gamma(a)\| \le L_a^b(\gamma)$;
(b) if c ∈ [a, b] then $L_a^c(\gamma) + L_c^b(\gamma) = L_a^b(\gamma)$;
(c) $L_a^t(\gamma)$ is increasing for t ∈ [a, b];
(d) $L_a^t(\gamma)$ is continuous for t ∈ [a, b].
In the following, we will assume that parameterized curves are always
continuous. We will begin with the characterization in finite dimensional spaces.
Theorem 9.2.2. A curve γ : [a, b] → Rd is rectifiable if and only if its
coordinate functions are of bounded variation.
Proof. Being rectifiable is independent of the norm on Rd since all the norms
are equivalent. We will use the ∥ · ∥1 norm to prove the equivalence. Write
γ(t) = (x1 (t), . . . , xd (t)) and observe that
\[ \sum_{i=1}^{n} |x_k(t_i) - x_k(t_{i-1})| \le \sum_{i=1}^{n} \|\gamma(t_i) - \gamma(t_{i-1})\|_1 = \sum_{j=1}^{d} \sum_{i=1}^{n} |x_j(t_i) - x_j(t_{i-1})| \]
where $(t_i)_{i=0}^{n}$ is a partition of [a, b] and k = 1, . . . , d. The first inequality implies
that xk is of bounded variation when γ is rectifiable. The equality on the right
hand side implies that γ is rectifiable if all the functions xj for j = 1, . . . , d are
of bounded variation.

Using a deep result saying that monotone functions have derivative almost
everywhere we can deduce the following corollary.
Corollary 9.2.3. A rectifiable curve γ : [a, b] → Rd has a tangent line at γ(t)
for almost every t ∈ [a, b].
However, this derivative is of little use as it doesn’t show the global
behaviour of the curve unless we assume extra regularity. Indeed, recall that
there exist non trivial monotone functions with null derivative at almost every
point (Cantor’s staircase e.g.).

We will fix the standard of regularity in order to take advantage of the derivative
of the curve. We will say that a curve γ : [a, b] → X is C 1 (please, remark: on
[a, b]) if it has a derivative at every point of [a, b], including the endpoints with
side derivatives, and the derivative is continuous on [a, b]. It is not difficult to
prove that this is equivalent to saying that there exists a C 1 extension of γ to
an open interval containing [a, b]. We will say that a curve γ : [a, b] → X is
piecewise C 1 if it is continuous and there exists a finite partition of [a, b] such
that γ is C 1 on every subinterval of the partition.
Theorem 9.2.4. Let γ : [a, b] → X be a piecewise C 1 curve. Then γ is
rectifiable and
\[ L_a^b(\gamma) = \int_a^b \|\gamma'(t)\| \, dt. \]

Proof. Firstly we will assume that γ is C 1 on [a, b], which implies the uniform
continuity of γ ′ (t) on [a, b]. Given ε > 0 find δ > 0 such that |t − ξ| < δ implies
∥γ ′ (t) − γ ′ (ξ)∥ < ε. Take a partition $(t_i)_{i=0}^{n}$ of [a, b] such that |ti − ti−1 | < δ.
Using the mean value theorem on the auxiliary function
f (t) = γ(t) − γ(ti−1 ) − γ ′ (ξi )(t − ti−1 )
with ξi ∈ [ti−1 , ti ] we get that
∥γ(ti ) − γ(ti−1 ) − γ ′ (ξi )(ti − ti−1 )∥ = ∥f (ti ) − f (ti−1 )∥
≤ sup{∥f ′ (t)∥ : t ∈ [ti−1 , ti ]}(ti − ti−1 ) ≤ ε(ti − ti−1 ).
Therefore
|∥γ(ti ) − γ(ti−1 )∥ − ∥γ ′ (ξi )∥(ti − ti−1 )| ≤ ε(ti − ti−1 ).
Summing over all the intervals of the partition we have
\[ \Big| \sum_{i=1}^{n} \|\gamma(t_i) - \gamma(t_{i-1})\| - \sum_{i=1}^{n} \|\gamma'(\xi_i)\| (t_i - t_{i-1}) \Big| \le \sum_{i=1}^{n} \varepsilon (t_i - t_{i-1}) = \varepsilon(b-a). \]
Since we could take partitions such that the first term approaches the length
and the second one the Riemann integral, taking limits we get
\[ \Big| L_a^b(\gamma) - \int_a^b \|\gamma'(t)\| \, dt \Big| \le \varepsilon(b-a), \]
and it follows that $L_a^b(\gamma) < +\infty$. Now, as ε > 0 was arbitrary we get the equality
between both numbers. Finally, the general case with γ being piecewise C 1
reduces to the last equality by the additivity of the length and the integral
with respect to intervals.

9.3 Some formulas
Despite the generality of the results of the previous section, the Euclidean norm
still plays a fundamental role. If the curve γ is parameterized as (x(t), y(t), z(t))
for t ∈ [a, b], the length is given by
\[ L_a^b(\gamma) = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2} \, dt. \]
In the important case of the graph y = f (x) of a function, the length
for x ∈ [a, b] is
\[ L_a^b = \int_a^b \sqrt{1 + f'(x)^2} \, dx. \]
However, a plane curve could be given in polar form r = ϕ(θ) with θ ∈ [α, β].
We have (locally)
x(θ) = ϕ(θ) cos θ,
y(θ) = ϕ(θ) sin θ.
Differentiation gives

x′ (θ) = −ϕ(θ) sin θ + ϕ′ (θ) cos θ,

y ′ (θ) = ϕ(θ) cos θ + ϕ′ (θ) sin θ.

A simple computation shows that

x′ (θ)2 + y ′ (θ)2 = ϕ(θ)2 + ϕ′ (θ)2 .

Therefore, the length in this case is
\[ L_\alpha^\beta = \int_\alpha^\beta \sqrt{\varphi(\theta)^2 + \varphi'(\theta)^2} \, d\theta. \]
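
The polar formula can be compared numerically with the definition of length by polygonal lines. The sketch below uses the cardioid ϕ(θ) = 1 + cos θ on [0, 2π] purely as an example (its length is 8, since ϕ² + ϕ′² = 4 cos²(θ/2)); the function names, the midpoint rule and the number of nodes are assumptions made for the illustration.

```python
import math

def phi(theta):
    return 1.0 + math.cos(theta)  # cardioid, chosen only as an example

def dphi(theta):
    return -math.sin(theta)

def polar_length(phi, dphi, alpha, beta, n=20000):
    """Midpoint rule for the integral of sqrt(phi^2 + phi'^2) over [alpha, beta]."""
    h = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        t = alpha + (i + 0.5) * h
        total += math.hypot(phi(t), dphi(t))
    return total * h

def polygonal_length(phi, alpha, beta, n=20000):
    """Length of the polygonal line through n + 1 points of the curve."""
    pts = []
    for i in range(n + 1):
        t = alpha + (beta - alpha) * i / n
        pts.append((phi(t) * math.cos(t), phi(t) * math.sin(t)))
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

print(polar_length(phi, dphi, 0.0, 2.0 * math.pi))
print(polygonal_length(phi, 0.0, 2.0 * math.pi))
```

Both printed values should be close to 8, the first one approximating the integral and the second one the supremum in the definition of length.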

9.4 Integration with respect to the arc length


Suppose that γ : [a, b] → X is rectifiable and f is a function defined on γ([a, b]).
Sometimes it is necessary to consider the convergence of Riemann-like sums of
the form
\[ \sum_{i=1}^{n} f(\gamma(\xi_i)) L_{t_{i-1}}^{t_i}(\gamma) \]
for a partition $(t_i)_{i=0}^{n}$ of [a, b] and ξi ∈ [ti−1 , ti ]. For instance, the mass of
a curve in terms of its linear density can be obtained this way. The reader
acquainted with the Riemann-Stieltjes integral can see that the integral could
be expressed as
\[ \int_a^b f(\gamma(t)) \, dL_a^t(\gamma). \]
That implies, in particular, that the existence is guaranteed for f continuous.
A direct proof can be obtained by just mimicking the proof of Riemann
integrability of continuous functions. We will use the following notation
\[ \int_\gamma f \, d\ell = \lim \sum_{i=1}^{n} f(\gamma(\xi_i)) L_{t_{i-1}}^{t_i}(\gamma) \]
and dℓ is called the “arc element”. It is worth noticing that the same limit, when
it exists, can be obtained from the Riemann-like sums
\[ \sum_{i=1}^{n} f(\Xi_i) \|\gamma(t_i) - \gamma(t_{i-1})\| \]
where the point Ξi can be taken from γ([ti−1 , ti ]) or from [γ(ti−1 ), γ(ti )] in case
f is defined on a neighbourhood of γ([a, b]) where it is uniformly continuous.

We will prove a formula to compute that integral in case of regularity of γ.


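Proposition 9.4.1. Let γ : [a, b] → X be a piecewise C 1 curve and f :
γ([a, b]) → R continuous, then
\[ \int_\gamma f \, d\ell = \int_a^b f(\gamma(t)) \|\gamma'(t)\| \, dt. \]
Proof. Take a partition $(t_i)_{i=0}^{n}$ of [a, b] and find ξi ∈ [ti−1 , ti ] such that
\[ L_{t_{i-1}}^{t_i}(\gamma) = \|\gamma'(\xi_i)\| (t_i - t_{i-1}), \]
which is possible on every interval where γ is C 1 by Theorem 9.2.4 and the mean
value theorem for integrals, and so
\[ \sum_{i=1}^{n} f(\gamma(\xi_i)) L_{t_{i-1}}^{t_i}(\gamma) = \sum_{i=1}^{n} f(\gamma(\xi_i)) \|\gamma'(\xi_i)\| (t_i - t_{i-1}). \]
Taking limits with respect to the partitions we will get the desired identity.
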
For the following we will restrict ourselves to the Euclidean norm on Rd
since the scalar product is involved. A very similar notion appears when we
wish to formalise the path integral used to compute the work done by a force.
Suppose that f : γ([a, b]) → Rd is continuous and consider the convergence of
Riemann-like sums of the form
\[ \sum_{i=1}^{n} f(\gamma(\xi_i)) \cdot (\gamma(t_i) - \gamma(t_{i-1})) \]
where “·” is the scalar product, $(t_i)_{i=0}^{n}$ a partition of [a, b] and ξi ∈ [ti−1 , ti ].
Again, the existence of this limit, called the line integral, can be proved by
standard methods and its value is denoted by
\[ \int_\gamma f \cdot d\vec{\ell} \]
(the notation with $d\vec{s}$ is also popular but we will try to avoid it when it could
lead to confusion). Note that we could work in a Hilbert space instead of
Rd because of the properties of the scalar product, or more generally, we
could assume that f takes values in X ∗ , so we may consider sums of terms
f (γ(ξi ))(γ(ti ) − γ(ti−1 )) which is actually what appears in the theory of
integration of differential forms.

In case of regularity we have an explicit formula for the line integral. We
will state it only in the case of Rd with the scalar product.
Proposition 9.4.2. Let γ : [a, b] → Rd be a piecewise C 1 curve and f :
γ([a, b]) → Rd continuous, then
\[ \int_\gamma f \cdot d\vec{\ell} = \int_a^b f(\gamma(t)) \cdot \gamma'(t) \, dt. \]
Proof. Given ε > 0 we have established in the proof of Theorem 9.2.4 that
∥γ(ti ) − γ(ti−1 ) − γ ′ (ξi )(ti − ti−1 )∥ ≤ ε(ti − ti−1 )
for a fine enough partition, and so
|f (γ(ξi )) · (γ(ti ) − γ(ti−1 )) − f (γ(ξi )) · γ ′ (ξi )(ti − ti−1 )| ≤ εM (ti − ti−1 )
where M > 0 is an upper bound for ∥f ∥ on γ([a, b]). Summing all the terms we have
\[ \Big| \sum_{i=1}^{n} f(\gamma(\xi_i)) \cdot (\gamma(t_i) - \gamma(t_{i-1})) - \sum_{i=1}^{n} f(\gamma(\xi_i)) \cdot \gamma'(\xi_i)(t_i - t_{i-1}) \Big| \le \varepsilon M (b-a). \]
Clearly, that implies the desired result as ε > 0 is arbitrary.
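
The agreement between the Riemann-like sums and the parameter integral can be observed numerically. The following sketch assumes a concrete example not taken from the text: the unit circle γ(t) = (cos t, sin t) on [0, 2π] and the field f(x, y) = (−y, x), for which f(γ(t)) · γ′(t) = 1 and the line integral equals 2π. All names are hypothetical, and the intermediate points ξi are taken as midpoints.

```python
import math

def gamma(t):
    return (math.cos(t), math.sin(t))

def dgamma(t):
    return (-math.sin(t), math.cos(t))

def field(p):
    x, y = p
    return (-y, x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def riemann_like_sum(n, a=0.0, b=2.0 * math.pi):
    """Sum of f(gamma(xi_i)) . (gamma(t_i) - gamma(t_{i-1})) on a uniform partition."""
    total = 0.0
    for i in range(1, n + 1):
        t0 = a + (b - a) * (i - 1) / n
        t1 = a + (b - a) * i / n
        xi = 0.5 * (t0 + t1)
        step = tuple(q - p for p, q in zip(gamma(t0), gamma(t1)))
        total += dot(field(gamma(xi)), step)
    return total

def parameter_integral(n, a=0.0, b=2.0 * math.pi):
    """Midpoint rule for the integral of f(gamma(t)) . gamma'(t) over [a, b]."""
    h = (b - a) / n
    mids = (a + (i + 0.5) * h for i in range(n))
    return h * sum(dot(field(gamma(t)), dgamma(t)) for t in mids)

for n in (10, 100, 1000):
    print(n, riemann_like_sum(n), parameter_integral(n))
```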

9.5 Alternative parameterizations
A change of parameterization of a curve γ : [a, b] → X can be easily done
just taking an increasing onto function h : [c, d] → [a, b] and considering γ ◦ h.
The regularity of the new parameterization depends on the quality of both γ
and h. We will try to solve the inverse problem: assume that we have two
parameterizations giving the same curve (image and orientation), are these
two parameterizations linked by a regular change of variables?

We will show that the arc length provides a natural parameterization of
curves. Note that $L_a^t(\gamma)$ is an increasing continuous function defined on [a, b],
and take
\[ \tau(s) = \inf\{t : L_a^t(\gamma) = s\} \]
which is actually a minimum and therefore $L_a^{\tau(s)}(\gamma) = s$ for every
$s \in [0, L_a^b(\gamma)]$. Note as well that τ is strictly increasing and may be discontinuous,
however, the composition γ̃(s) = γ(τ (s)) is continuous with respect to $s \in [0, L_a^b(\gamma)]$.
Indeed, it is 1-Lipschitz (say with s ≤ s0 )
\[ \|\tilde{\gamma}(s) - \tilde{\gamma}(s_0)\| \le L_{\tau(s)}^{\tau(s_0)}(\gamma) = |L_a^{\tau(s)}(\gamma) - L_a^{\tau(s_0)}(\gamma)| = |s - s_0|. \]
This is the parameterization of γ with respect to the arc length, which is
usually indicated by the choice of the letter s as a variable.

The regularity of this parameterization depends on the regularity of γ, and
so on the previous parameterization. If γ is C 1 then $dL_a^t(\gamma)/dt = \|\gamma'(t)\|$ by the
length formula, and thus it is a continuous function. Note that τ (s) is differentiable
at s whenever γ ′ (τ (s)) ̸= 0 by the rule of the derivative of the inverse function.
But if we want to guarantee that τ is C 1 we have to keep γ ′ (t) away from zero.

Proposition 9.5.1. Let γ : [a, b] → X be a C 1 curve such that γ ′ (t) ̸= 0 for
all t ∈ [a, b]. Then its re-parameterization γ̃(s) with respect to the arc length
is C 1 too and ∥γ̃ ′ (s)∥ = 1 for all $s \in [0, L_a^b(\gamma)]$.

Proof. Under the hypotheses $L_a^t(\gamma)$ is strictly increasing and thus τ is an
actual inverse function whose derivative is τ ′ (s) = 1/∥γ ′ (τ (s))∥, which is continuous
too. Moreover
γ̃ ′ (s) = γ ′ (τ (s))τ ′ (s)
which has norm one by the previous formula.

The answer to the question of the beginning of the section turns out as a
corollary.
Corollary 9.5.2. If γ1 : [a, b] → X and γ2 : [c, d] → X are two C 1 parameter-
izations of the same curve (image and orientation) whose derivatives do not
vanish, then there exists a C 1 increasing bijection h : [a, b] → [c, d] such that
γ1 = γ2 ◦ h.
Proof. We have two re-parameterizations by the arc length γ̃1 = γ1 ◦ τ1 and
γ̃2 = γ2 ◦ τ2 . Note that we must have γ̃1 = γ̃2 and therefore γ1 = γ2 ◦ τ2 ◦ τ1−1 .

9.6 Another way to compute the length


In order to discuss the notion of area of a surface we will assume that the “am-
bient space” R3 is equipped with the Euclidean norm. Even in that intuitive
case, the area of a surface cannot be defined in the same way that we defined
the length of a curve. In the case of curves, the length is the limit of the lengths
of the polygonal lines obtained from the nodes placed following partitions. In
this case, the segments the polygonal is made of approach tangent lines of the
curve. However, the faces of the polyhedral surfaces built likewise may not
approach the tangent plane, even for surfaces as simple as the cylinder. For
that reason we will propose an alternative method to compute a curve’s length
that can be adapted to define a notion of area for surfaces. This will be done
in R2 in order to keep a greater analogy for the next section.

Let γ(s) with s ∈ [0, L] be a C 2 curve in R2 already parameterized by the arc
length (so its length is L). We claim that
\[ L = \lim_{\varepsilon \to 0^+} \frac{\operatorname{Area}(\gamma([0,L]) + B[0,\varepsilon])}{2\varepsilon}. \]
This formula means that we can recover the length of γ from its image with
the help of 2-dimensional Lebesgue measure (please, make a picture). The
idea of the proof we are going to sketch is to compute the measure of the set
γ([0, L]) + B[0, ε] with the help of a parameterization. As γ(s) = (x(s), y(s))
is parameterized by the arc length, the vector γ ′ (s) has norm one and so does the
normal vector n(s) = (y ′ (s), −x′ (s)). Define a function of two variables by

F (s, t) = γ(s) + tn(s) = (x(s) + ty ′ (s), y(s) − tx′ (s)).

Now, if ε > 0 is small enough then F is injective when defined on [0, L]×[−ε, ε]
and its image covers γ([0, L]) + B[0, ε]) except the butts which are covered by
two semicircles of radius ε so

Area(γ([0, L]) + B[0, ε]) = Area(F ([0, L] × [−ε, ε])) + πε2

The area of F ([0, L] × [−ε, ε]) can be computed by integrating the absolute
value of the Jacobian of F over [0, L] × [−ε, ε]. Firstly, the computation leads
to
∂F x′ (s) + ty ′′ (s) y ′ (s)
= ′
∂(s, t) y (s) − tx′′ (s) −x′ (s)

= −x′ (s)2 − y ′ (s)2 − ty ′′ (s)x′ (s) + tx′′ (s)y ′ (s) = −1 + O(t)


Thus
\[
\operatorname{Area}(F([0,L]\times[-\varepsilon,\varepsilon])) = \iint_{[0,L]\times[-\varepsilon,\varepsilon]} \bigl(1 + O(t)\bigr)\, ds\, dt = 2\varepsilon L + O(\varepsilon^2)
\]
(actually, the term O(ε²) is null). Taking into account the two ends of the
curve, we still have Area(γ([0, L]) + B[0, ε]) = 2εL + O(ε²). Dividing by 2ε and
taking limits we recover L, which proves the claim.
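
The claim can be tested numerically. The following Python sketch is only an illustration under our own assumptions (the quarter of the unit circle as curve, a grid count as a crude substitute for Lebesgue measure, and the hypothetical helper names `dist_to_arc` and `tube_area`). It estimates the area of γ([0, L]) + B[0, ε] and divides by 2ε; for this particular curve the exact quotient is L + πε/2, so the printed values should approach L = π/2.

```python
import math

L = math.pi / 2  # length of the arc gamma(s) = (cos s, sin s), s in [0, L]

def dist_to_arc(px, py):
    """Distance from (px, py) to the quarter of the unit circle."""
    angle = math.atan2(py, px)
    if 0.0 <= angle <= L:
        # the nearest point of the arc lies on the ray through (px, py)
        return abs(math.hypot(px, py) - 1.0)
    # otherwise the nearest point is one of the two ends of the arc
    return min(math.hypot(px - 1.0, py), math.hypot(px, py - 1.0))

def tube_area(eps, h):
    """Estimate Area(gamma([0, L]) + B[0, eps]) by counting grid cells of side h."""
    n = math.ceil((1.0 + 2.0 * eps) / h)  # the set lies in [-eps, 1 + eps]^2
    count = 0
    for i in range(n):
        x = -eps + (i + 0.5) * h
        for j in range(n):
            y = -eps + (j + 0.5) * h
            if dist_to_arc(x, y) <= eps:
                count += 1
    return count * h * h

for eps in (0.2, 0.1, 0.05):
    ratio = tube_area(eps, eps / 25) / (2 * eps)
    print(f"eps={eps}: ratio={ratio:.4f}, L={L:.4f}")
```

The grid count is only accurate up to an error proportional to the cell size, so the agreement with L is approximate.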

9.7 Area of a C 1 surface with boundary


Some simple examples show that the area of a surface cannot be defined as
simply as the length of a curve, that means, as the supremum of areas of poly-
hedral surfaces built upon the points of a triangulation. Based on the ideas
of the previous section we will propose a method for the definition of the area
of a surface. For the computations, we assume already known the definition
and the properties of the vector product (see Section 11.1 for some historical
information).

Let Γ(u, v) be a parameterized C¹ surface with boundary, which means that:
1. (u, v) ∈ D ⊂ R², Γ(u, v) ∈ R³, Γ is injective;
2. there is D ⊂ C ⊂ R² where Γ(u, v) extends as a C¹ function;
3. ∂Γ/∂u (u, v) × ∂Γ/∂v (u, v) ≠ 0 for (u, v) ∈ D.

Any implicitly defined surface can always be parameterized in that way locally.
Note that ∂Γ/∂u (u, v) and ∂Γ/∂v (u, v) are tangent vectors at the point
Γ(u, v). The condition in terms of the vector product “×” means that they
generate the tangent plane at that point. Moreover, ∂Γ/∂u (u, v) × ∂Γ/∂v (u, v)
is a normal vector to that plane.
Definition 9.7.1. The area (2-dimensional measure) of a compact subset S ⊂
R³ is defined, whenever the limit exists, as
\[
\operatorname{Area}(S) = \lim_{\varepsilon \to 0^+} \frac{\operatorname{vol}(S + B[0,\varepsilon])}{2\varepsilon}.
\]
Any C 1 surface with boundary has an area that can be calculated with
the formula given in the following result, however the result is proved for C 2
surfaces in order to simplify the proof.
Theorem 9.7.2. Assume that Γ(u, v) is a parameterized C² surface with
boundary. Then its area is given by the formula
\[
\iint_D \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv.
\]

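Proof. Set the unitary normal vector as
\[
N = \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\|^{-1} \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v}.
\]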

The points that are at distance less than or equal to ε from Γ(D), with the
exception of those whose minimum distance is attained at Γ(∂D), are covered by
the image of the map
\[
F(u,v,t) = \Gamma(u,v) + t\,N(u,v)
\]
with (u, v, t) ∈ Ωt = D × [−ε, ε]. The computation of the Jacobian, done
more explicitly later, will show that it does not vanish for a small choice of
ε > 0. That implies that F is locally injective on D × [−ε, ε] for such a small
ε. Now we will show that a more careful choice of ε > 0 makes F actually
injective. Indeed, by the Inverse Mapping theorem and the Lebesgue covering
theorem there is δ > 0 such that F is injective on every subset of D × [−ε, ε]
whose diameter does not exceed δ. Now we may assume that ε > 0 is small
enough to satisfy ε < δ/3 and that ∥Γ(u1, v1) − Γ(u2, v2)∥ ≤ 2ε implies
∥(u1, v1) − (u2, v2)∥ < δ/3, which is possible by the uniform continuity of Γ⁻¹
defined on Γ(D). In order to prove global injectivity assume that F(u1, v1, t1) =
F(u2, v2, t2) for two different points. Since |t1|, |t2| ≤ ε we have
∥Γ(u1, v1) − Γ(u2, v2)∥ ≤ 2ε and so ∥(u1, v1, t1) − (u2, v2, t2)∥ < δ/3 + 2ε < δ,
which contradicts the local injectivity.
The volume of Γ(D) + B[0, ε] differs from the volume of F(Ωt) by O(ε²), as
obtained by estimating the volume of those points in Γ(D) + B[0, ε] whose
distance to Γ(D), not greater than ε, is attained at some point of Γ(∂D), where
∂D is composed of finitely many C¹ curves.
In order to compute this volume, as F is injective and C¹, we may use the
change of variable formula
\[
\operatorname{vol}(F(\Omega_t)) = \iiint_{\Omega_t} \left| \frac{\partial F}{\partial(u,v,t)} \right| du\, dv\, dt.
\]

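Note that the partial derivatives that we need for the computation of the
Jacobian can be expressed in vector notation as
\[
\frac{\partial\Gamma}{\partial u} + t\frac{\partial N}{\partial u}; \qquad \frac{\partial\Gamma}{\partial v} + t\frac{\partial N}{\partial v};
\]
and N. Therefore, the Jacobian can be computed as the mixed product of the
three vectors
\[
\begin{aligned}
&\left(\Bigl(\frac{\partial\Gamma}{\partial u} + t\frac{\partial N}{\partial u}\Bigr) \times \Bigl(\frac{\partial\Gamma}{\partial v} + t\frac{\partial N}{\partial v}\Bigr)\right) \cdot N \\
&\quad = \left(\frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} + t\,\frac{\partial\Gamma}{\partial u} \times \frac{\partial N}{\partial v} - t\,\frac{\partial\Gamma}{\partial v} \times \frac{\partial N}{\partial u} + t^2\,\frac{\partial N}{\partial u} \times \frac{\partial N}{\partial v}\right) \cdot N \\
&\quad = \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| + t f + t^2 g
\end{aligned}
\]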
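where f, g are continuous functions on D. Thus
\[
\operatorname{vol}(F(\Omega_t)) = 2\varepsilon \iint_D \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv + O(\varepsilon^2).
\]
Therefore
\[
\lim_{\varepsilon \to 0^+} \frac{\operatorname{vol}(\Gamma(D) + B[0,\varepsilon])}{2\varepsilon} = \iint_D \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv
\]
which completes the proof.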

Actually the formula in the previous theorem is true for C¹ parameterized
surfaces with boundary, but with a more complicated proof. The real purpose of
the theorem is to show that the standard formula for the area does not actually
depend on the parameterization but only on the set of points. There are
alternative methods to introduce the area of a surface as a measure which
depends only on the point set. One of them is the notion of 2-dimensional
Hausdorff measure.
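Example 9.7.3. Compute the area of the torus defined by the parametric
equation
\[
\Gamma(\theta,\phi) = \bigl((R + r\cos\phi)\cos\theta,\; (R + r\cos\phi)\sin\theta,\; r\sin\phi\bigr),
\]
for θ, ϕ ∈ [0, 2π] and with 0 < r < R.

The condition 0 < r < R ensures the regularity of the surface. We have
\[
\frac{\partial\Gamma}{\partial\theta}(\theta,\phi) = \bigl(-(R + r\cos\phi)\sin\theta,\; (R + r\cos\phi)\cos\theta,\; 0\bigr),
\]
\[
\frac{\partial\Gamma}{\partial\phi}(\theta,\phi) = \bigl(-r\sin\phi\cos\theta,\; -r\sin\phi\sin\theta,\; r\cos\phi\bigr).
\]
These vectors are orthogonal, so the modulus of their vector product is just
the product of their moduli
\[
\left\| \frac{\partial\Gamma}{\partial\theta} \times \frac{\partial\Gamma}{\partial\phi} \right\| = (R + r\cos\phi)\,r.
\]
Then we can compute the area
\[
\operatorname{Area} = \int_0^{2\pi}\!\!\int_0^{2\pi} (R + r\cos\phi)\,r\; d\theta\, d\phi = 4\pi^2 R r.
\]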
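
The value 4π²Rr can be checked numerically from the formula of Theorem 9.7.2. The sketch below is an illustration under our own assumptions: the sample radii R = 2 and r = 0.5, central finite differences for the tangent vectors, a midpoint rule for the double integral, and the hypothetical helper names `cross` and `surface_area`.

```python
import math

def cross(a, b):
    """Vector product of two vectors of R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_area(gamma, u0, u1, v0, v1, n=200, d=1e-6):
    """Midpoint rule for the integral of |dG/du x dG/dv| over [u0,u1] x [v0,v1]."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            # central differences approximate the two tangent vectors
            gu = [(p - q) / (2 * d) for p, q in zip(gamma(u + d, v), gamma(u - d, v))]
            gv = [(p - q) / (2 * d) for p, q in zip(gamma(u, v + d), gamma(u, v - d))]
            total += math.sqrt(sum(c * c for c in cross(gu, gv)))
    return total * hu * hv

R, r = 2.0, 0.5  # sample radii with 0 < r < R (our choice)

def torus(theta, phi):
    return ((R + r * math.cos(phi)) * math.cos(theta),
            (R + r * math.cos(phi)) * math.sin(theta),
            r * math.sin(phi))

area = surface_area(torus, 0.0, 2 * math.pi, 0.0, 2 * math.pi)
print(area, 4 * math.pi ** 2 * R * r)
```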

9.8 Alternative expressions for the area


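We will introduce some classic notation
\[
E = \left\| \frac{\partial\Gamma}{\partial u} \right\|^2; \qquad F = \frac{\partial\Gamma}{\partial u} \cdot \frac{\partial\Gamma}{\partial v}; \qquad G = \left\| \frac{\partial\Gamma}{\partial v} \right\|^2
\]
which are the coefficients of the so-called first fundamental form in the
differential geometry of surfaces. With this notation we have
\[
\left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\|^2 = EG - F^2
\]
and therefore
\[
\operatorname{Area} = \iint_D \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv = \iint_D \sqrt{EG - F^2}\; du\, dv
\]
which is easier to compute, especially in the case of an orthogonal
parameterization because F = 0 in that case. An important feature of the formula
for the area based on the first fundamental form coefficients is that it is
independent of the dimension of the ambient space: the formulas are valid in Rᵈ
for any d ≥ 3.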

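Now we will discuss two important particular cases. The first one is the
form adopted by the integral for a surface given as the graph of a function
z = f(x, y) with (x, y) ∈ D. In this case the parameters are the variables x, y
and Γ(x, y) = (x, y, f(x, y)) and so
\[
\frac{\partial\Gamma}{\partial x} = \Bigl(1, 0, \frac{\partial f}{\partial x}\Bigr); \qquad \frac{\partial\Gamma}{\partial y} = \Bigl(0, 1, \frac{\partial f}{\partial y}\Bigr).
\]
We have then
\[
E = 1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2; \qquad F = \frac{\partial f}{\partial x}\frac{\partial f}{\partial y}; \qquad G = 1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2.
\]
After an easy computation we get that
\[
EG - F^2 = 1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2
\]
and
\[
\operatorname{Area} = \iint_D \sqrt{1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2}\; dx\, dy.
\]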
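
As an illustration of the formula for graphs, the following sketch evaluates the integral with a midpoint rule. The example function f(x, y) = (2/3)(x^(3/2) + y^(3/2)) on the unit square is our own choice (it is not taken from the notes); its partial derivatives are √x and √y, so the integrand is √(1 + x + y) and the area has the closed form (4/15)(3^(5/2) − 2^(7/2) + 1). The name `graph_area` is hypothetical.

```python
import math

def graph_area(fx, fy, x0, x1, y0, y1, n=200):
    """Midpoint rule for the integral of sqrt(1 + fx^2 + fy^2) over a rectangle."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += math.sqrt(1.0 + fx(x, y) ** 2 + fy(x, y) ** 2)
    return total * hx * hy

# partial derivatives of f(x, y) = (2/3)(x^(3/2) + y^(3/2)) (assumed example)
fx = lambda x, y: math.sqrt(x)
fy = lambda x, y: math.sqrt(y)

numeric = graph_area(fx, fy, 0.0, 1.0, 0.0, 1.0)
exact = 4.0 / 15.0 * (3 ** 2.5 - 2 ** 3.5 + 1.0)
print(numeric, exact)
```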
The second case of surfaces admitting a particular formula for their area that
we are going to discuss is the case of the surfaces of revolution. Assume that
we rotate around the X-axis the graph of a one variable function y = f(x) ≥ 0
with x ∈ [a, b]. In such a case, in addition to the variable x ∈ [a, b], we have
to consider a rotation parameter θ ∈ [0, 2π], so the surface can be expressed as
\[
\Gamma(x,\theta) = (x,\; f(x)\cos\theta,\; f(x)\sin\theta)
\]
and for the partial derivatives
\[
\frac{\partial\Gamma}{\partial x} = (1,\; f'(x)\cos\theta,\; f'(x)\sin\theta); \qquad \frac{\partial\Gamma}{\partial\theta} = (0,\; -f(x)\sin\theta,\; f(x)\cos\theta)
\]
which points out the (evident) fact that the parameterization is orthogonal, so
F = 0 and
\[
E = 1 + f'(x)^2; \qquad G = f(x)^2.
\]
Therefore the area can be written as
\[
\operatorname{Area} = \int_0^{2\pi}\!\!\int_a^b f(x)\sqrt{1 + f'(x)^2}\; dx\, d\theta = 2\pi \int_a^b f(x)\sqrt{1 + f'(x)^2}\; dx
\]
which admits an intuitive interpretation based on the fact that
\(\int_a^b \sqrt{1 + f'(x)^2}\, dx\) is the length of the graph of f. There is
another physical interpretation for this integral. If we consider that the curve
has a constant linear density, the vertical coordinate of the center of mass of
the segment of graph {(x, f(x)) : x ∈ [a, b]} is
\[
CM_Y = \frac{\int_a^b f(x)\sqrt{1 + f'(x)^2}\; dx}{\int_a^b \sqrt{1 + f'(x)^2}\; dx}.
\]
With this notation, the area generated by the rotation of the curve is
\[
\operatorname{Area} = 2\pi \cdot CM_Y \cdot \operatorname{Length}
\]
which is the classical first Pappus–Guldin theorem. The reader can check that
the result of Example 9.7.3 can be easily obtained in this way.
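
The relation between the area of revolution and the center of mass can be illustrated numerically. The sketch below assumes, as our own example, the segment y = 2x with x ∈ [0, 1], which generates a cone of lateral area 2π√5; the helper names `midpoint`, `revolution_area`, `graph_length` and `cm_y` are hypothetical.

```python
import math

def midpoint(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def revolution_area(f, df, a, b):
    """Area generated by rotating the graph of f around the X axis."""
    return 2 * math.pi * midpoint(lambda x: f(x) * math.sqrt(1 + df(x) ** 2), a, b)

def graph_length(df, a, b):
    """Length of the graph of a function with derivative df."""
    return midpoint(lambda x: math.sqrt(1 + df(x) ** 2), a, b)

def cm_y(f, df, a, b):
    """Vertical coordinate of the center of mass of the graph (constant density)."""
    weighted = midpoint(lambda x: f(x) * math.sqrt(1 + df(x) ** 2), a, b)
    return weighted / graph_length(df, a, b)

# assumed example: the segment y = 2x, x in [0, 1], generates a cone
f, df = (lambda x: 2 * x), (lambda x: 2.0)
area = revolution_area(f, df, 0.0, 1.0)
pappus = 2 * math.pi * cm_y(f, df, 0.0, 1.0) * graph_length(df, 0.0, 1.0)
print(area, pappus, 2 * math.pi * math.sqrt(5))
```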

For the sake of completeness, let us also comment on the second Pappus–Guldin
theorem. Consider now the vertical coordinate of the center of mass of
the trapezoid
\[
\{(x, y) : x \in [a, b],\; y \in [0, f(x)]\}
\]
that is
\[
CM_Y = \frac{\int_a^b f(x)^2\; dx}{2\int_a^b f(x)\; dx}.
\]
Then the volume generated by the rotation can be calculated as
\[
\operatorname{Volume} = 2\pi \cdot CM_Y \cdot \operatorname{Area}
\]
where “Area” is the area of the trapezoid. The formula is evident by integrating
through cross sections. Note that the center of mass involves a quadratic degree.
That was used by Archimedes in his Method to reduce one degree in “integration”,
reducing in that way, for instance, the computation of the area limited
by a parabola to a property of the triangle.
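
A quick numerical illustration of the second theorem, under our own choice of example: the half-disk below the graph of f(x) = √(1 − x²) on [−1, 1] generates the unit ball, whose volume is 4π/3, and the center of mass of the half-disk is at height 4/(3π). The helper name `midpoint` is hypothetical.

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

f = lambda x: math.sqrt(1.0 - x * x)  # upper half of the unit circle

region_area = midpoint(f, -1.0, 1.0)                            # about pi / 2
cm_y = midpoint(lambda x: f(x) ** 2, -1.0, 1.0) / (2 * region_area)
volume = 2 * math.pi * cm_y * region_area                       # second Pappus-Guldin theorem
print(cm_y, 4 / (3 * math.pi))
print(volume, 4 * math.pi / 3)
```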

9.9 Area measure and integration on surfaces


Once we know how to compute the area of a parameterized surface with boundary
we can extend the notion to other kinds of surfaces. If the surface is simply a
finite union of parameterized surfaces with boundary (think of a polyhedron)
we may add the areas of the pieces. In case we have a compact 2-dimensional C¹
manifold it is possible to decompose it into finitely many parameterized
surfaces with boundary with the help of the implicit function theorem. This is
intuitively clear but the details are quite tricky. What is more important is
the fact that the value of the area does not depend on how the decomposition was
done. We may cast some light using notions from measure theory.

On a parameterized C¹ surface with boundary Γ (the parameter domain is
D) embedded into R³ we can define a positive measure S by the formula
\[
S(A) = \iint_{\Gamma^{-1}(A)} \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv
\]
where A is a relative Borel subset of Γ(D), that is, the intersection of Γ(D)
with a Borel subset of R³, and the integral is taken in the Lebesgue sense with
respect to the Lebesgue measure on R². Note that there is no trouble considering
the measure also on ∂Γ(D) = Γ(D̄) \ Γ(D), and its measure is 0. We know that
this measure does not depend on the particular C² parameterization, but a
proof is required for C¹ surfaces.
Theorem 9.9.1. Given a piecewise C¹ surface Σ in Rⁿ (2-dimensional manifold)
there exists a Borel measure S on Rⁿ concentrated on Σ such that, for any
Borel set A ⊂ Σ which is covered by a parameterized C¹ piece Γ,
\[
S(A) = \iint_{\Gamma^{-1}(A)} \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| du\, dv.
\]

Proof. The surface can be represented as a finite or countable union of
non-overlapping parameterized C¹ surfaces with boundary. The formula above
defines a measure on each piece and the sum (series) of all those measures will
be the wanted measure. We have to check that the measure depends neither
on the decomposition into parameterized C¹ surfaces nor on the
parameterizations.
Suppose that Σ has two different decompositions. The intersection of both
decompositions induces a finer decomposition, at most countable. It is not
difficult to see that the problem of uniqueness for S reduces to checking that
it is the same on each of such pieces.
Suppose that Γ1 and Γ2, with domains D1 and D2, parameterize the same piece.
Firstly, note that h = Γ2⁻¹ ◦ Γ1 is an injective C¹ map from D1 onto D2.
Therefore the theorem of change of variables is applicable
\[
\iint_{\Gamma_2^{-1}(A)} \left\| \frac{\partial\Gamma_2}{\partial u} \times \frac{\partial\Gamma_2}{\partial v} \right\| du\, dv
= \iint_{(h^{-1}\circ\Gamma_2^{-1})(A)} \left\| \frac{\partial\Gamma_2}{\partial u} \times \frac{\partial\Gamma_2}{\partial v} \right\| \left| \frac{\partial(u,v)}{\partial(s,t)} \right| ds\, dt
\]

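where h(s, t) = (u(s, t), v(s, t)). Now we have to express the tangent vectors
in terms of Γ1 and the variables s, t. We have Γ1 = Γ2 ◦ h, thus
\[
\frac{\partial\Gamma_1}{\partial s} = \frac{\partial\Gamma_2}{\partial u}\frac{\partial u}{\partial s} + \frac{\partial\Gamma_2}{\partial v}\frac{\partial v}{\partial s}; \qquad
\frac{\partial\Gamma_1}{\partial t} = \frac{\partial\Gamma_2}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial\Gamma_2}{\partial v}\frac{\partial v}{\partial t}.
\]
The vector product gives
\[
\frac{\partial\Gamma_1}{\partial s} \times \frac{\partial\Gamma_1}{\partial t}
= \Bigl(\frac{\partial u}{\partial s}\frac{\partial v}{\partial t} - \frac{\partial v}{\partial s}\frac{\partial u}{\partial t}\Bigr) \frac{\partial\Gamma_2}{\partial u} \times \frac{\partial\Gamma_2}{\partial v}
= \frac{\partial(u,v)}{\partial(s,t)}\, \frac{\partial\Gamma_2}{\partial u} \times \frac{\partial\Gamma_2}{\partial v}
\]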
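and thus
\[
\iint_{\Gamma_2^{-1}(A)} \left\| \frac{\partial\Gamma_2}{\partial u} \times \frac{\partial\Gamma_2}{\partial v} \right\| du\, dv
= \iint_{\Gamma_1^{-1}(A)} \left\| \frac{\partial\Gamma_1}{\partial s} \times \frac{\partial\Gamma_1}{\partial t} \right\| ds\, dt
\]
since (h⁻¹ ◦ Γ2⁻¹)(A) = (Γ2 ◦ h)⁻¹(A) = Γ1⁻¹(A). That proves that the measure
S is independent of the parameterization and so of the decomposition.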
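
The independence of the parameterization can be observed numerically. In the following sketch everything is an assumption of ours for illustration: the piece of surface Γ2(u, v) = (u, v, uv) on the unit square, the change of parameters h(s, t) = ((s² + s)/2, t), finite differences for the tangent vectors and the hypothetical helper `surface_area`. Both parameterizations should give the same area up to discretization error.

```python
import math

def cross(a, b):
    """Vector product of two vectors of R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_area(gamma, u0, u1, v0, v1, n=200, d=1e-6):
    """Midpoint rule for the integral of |dG/du x dG/dv| over [u0,u1] x [v0,v1]."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            gu = [(p - q) / (2 * d) for p, q in zip(gamma(u + d, v), gamma(u - d, v))]
            gv = [(p - q) / (2 * d) for p, q in zip(gamma(u, v + d), gamma(u, v - d))]
            total += math.sqrt(sum(c * c for c in cross(gu, gv)))
    return total * hu * hv

gamma2 = lambda u, v: (u, v, u * v)        # assumed piece of surface on [0,1]^2
h = lambda s, t: ((s * s + s) / 2.0, t)    # C^1 bijection of [0,1]^2, positive Jacobian
gamma1 = lambda s, t: gamma2(*h(s, t))     # the same set with other parameters

area1 = surface_area(gamma1, 0.0, 1.0, 0.0, 1.0)
area2 = surface_area(gamma2, 0.0, 1.0, 0.0, 1.0)
print(area1, area2)
```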
Remark 9.9.2. There are more general approaches to the problem of defining the
area that do not require parameterizations, like the 2-dimensional Hausdorff
measure. However, dealing with Hausdorff measures is not an easy task.
Integration on surfaces with respect to the area measure S plays a very
important role in applications, like integration with respect to the arc length
for curves. The construction of the measure S by means of an integral, together
with standard methods of measure theory (approximation by simple functions),
implies that for a function f defined on a surface Σ which is integrable with
respect to S we have
\[
\int_\Sigma f\, dS = \sum_n \iint_{D_n} f\circ\Gamma_n \left\| \frac{\partial\Gamma_n}{\partial u} \times \frac{\partial\Gamma_n}{\partial v} \right\| du\, dv
\]
where (Γn, Dn) is a finite or countable decomposition of Σ.

We will discuss some cases which may happen in physical applications in
R³. In this setting it is preferred to write the surface integral with a double
integration sign, \(\iint_\Sigma f\, dS\), and vectors are often distinguished
with little arrows above.

Firstly, in many occasions it will be necessary to integrate vector fields
\(\vec F = (f_1, f_2, f_3)\). In that case we have
\[
\iint_\Sigma \vec F\, dS = \Bigl( \iint_\Sigma f_1\, dS,\; \iint_\Sigma f_2\, dS,\; \iint_\Sigma f_3\, dS \Bigr).
\]

However, in many occasions the vector field will be normal to the surface, so it
can be written as \(\vec F = f \vec N\), where \(\vec N\) is a unitary normal
vector field and f a scalar function. Let us remark that at any point there are
two unitary vectors which are normal to the surface. To set a continuous normal
field \(\vec N\) is to choose the orientation of Γ between the two possible ones
for a parameterized surface with boundary (general surfaces, like the Moebius
strip, cannot always be oriented). In that case there is a specific notation
\[
\iint_\Sigma f \vec N\, dS = \iint_\Sigma f\, d\vec S.
\]

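The notation using the oriented element of area \(d\vec S\) is very suitable
because in the case of a parameterized surface Γ we have
\[
\iint_\Gamma f\, d\vec S = \iint_D f\circ\Gamma \; \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v}\; du\, dv
\]
in case the unitary normal \(\vec N\) points in the same direction as
∂Γ/∂u × ∂Γ/∂v. Indeed, in such a case we have
\[
\frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} = \left\| \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \right\| \vec N.
\]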

Finally we will consider the so-called flux of a vector field through a surface.
If the vector field represents the velocities at a given moment of the particles
a moving fluid is composed of, the flux integral computes the volume of fluid
crossing the surface per unit time at that moment. Let Γ be a parameterized
C¹ surface with boundary and assume that there is an orientation given to Γ
which agrees with ∂Γ/∂u × ∂Γ/∂v. Then the flux of a field \(\vec F\) through Γ
is defined as
\[
\iint_\Gamma \vec F \cdot d\vec S = \iint_\Gamma \vec F \cdot \vec N\, dS
\]
and, obviously, it can be computed by
\[
\iint_\Gamma \vec F \cdot d\vec S = \iint_D \vec F \cdot \Bigl( \frac{\partial\Gamma}{\partial u} \times \frac{\partial\Gamma}{\partial v} \Bigr)\, du\, dv
\]
and the mixed product inside can be computed directly by means of
a determinant of the coefficients of the three vectors. The flux integrals can
be interpreted as integrals of differential forms of second degree.
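
The computation of a flux by means of the determinant can be sketched as follows. The example is ours: the radial field F(x, y, z) = (x, y, z) through the upper half of the unit sphere, parameterized by (ϕ, θ) so that ∂Γ/∂ϕ × ∂Γ/∂θ points outwards; the expected flux is 2π. The helper names `mixed_product` and `flux` are hypothetical, and the tangent vectors are approximated by finite differences.

```python
import math

def mixed_product(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c, that is a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def flux(field, gamma, u0, u1, v0, v1, n=200, d=1e-6):
    """Midpoint rule for the integral of F . (dG/du x dG/dv) over [u0,u1] x [v0,v1]."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            gu = [(p - q) / (2 * d) for p, q in zip(gamma(u + d, v), gamma(u - d, v))]
            gv = [(p - q) / (2 * d) for p, q in zip(gamma(u, v + d), gamma(u, v - d))]
            total += mixed_product(field(*gamma(u, v)), gu, gv)
    return total * hu * hv

def hemisphere(phi, theta):
    """Upper half of the unit sphere; phi in [0, pi/2], theta in [0, 2 pi]."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

radial = lambda x, y, z: (x, y, z)
value = flux(radial, hemisphere, 0.0, math.pi / 2, 0.0, 2 * math.pi)
print(value, 2 * math.pi)
```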

9.10 Rationale and remarks


In this chapter we define the length of parameterized curves in normed spaces
and the area of parameterized surfaces in the Euclidean R3 . The difference of
generality between both approaches is due to these facts:
1. Rectangles provide the simplest example of computation of area. Of
course, “rectangles” require a notion of orthogonality that is not obvious
without a Euclidean structure.
2. The restriction to dimension 3 is a consequence of the chosen method of
definition. The proof also involves the use of the vector product, which
is a peculiarity of R³.
We choose the parameterized form of the manifolds because it provides
explicit formulas. If the manifold cannot be fully parameterized, the measure is
extended piecewise. In the case of curves, we simply consider piecewise C¹
curves. For surfaces the construction is a bit more complicated, using the fact
that any surface admits local parameterizations.

In order to characterize rectifiable curves in finite dimensional spaces we
start with the theory of functions of bounded variation. As a consequence,
being rectifiable implies that the curve is necessarily continuous except at
countably many points, and continuous rectifiable curves have tangents almost
everywhere.

The definition of area of a surface is motivated by an alternative approach
to the computation of the length of a curve in R². The purpose of our approach
is to have a “coordinate free” definition for the area; however, the method of
proof gives the existence of the area (and the formula) only for C² surfaces
with boundary. Later, the use of the area formula for piecewise C¹ surfaces is
discussed. Other coordinate free approaches, such as the 2-dimensional Hausdorff
measure, are not simpler.

Integration on curves and surfaces is mainly intended for scalar functions,
since the integration of differential forms is done in another lesson. However,
line and flux integrals are addressed briefly here because the theory of
differential forms, despite its elegance, does not provide a constructive
approach.

9.11 Exercises
1. A logarithmic spiral is a curve that can be represented in polar coordinates
by r = ae^{bθ}. Calculate the length of the arc for θ ∈ [0, 2π].
2. Calculate the length of the cardioid, a classic curve whose polar formula
is
r = 1 + cos θ
where θ ∈ [−π, π].
3. Calculate the length of the closed curve

x = a sin³ t; y = a cos³ t.

4. Find the length of the Viviani curve

x² + y² + z² = 1; x² + y² − x = 0.

5. Prove that the length of a continuous rectifiable curve is a continuous
function of the parameter.

6. Find the explicit formula, in terms of the first fundamental form, for the
length of a curve γ = T ◦ η contained in a C¹ surface
T(u, v) = (x(u, v), y(u, v), z(u, v)), where η : [a, b] → R² is also C¹.
7. Calculate the area of the portion of the sphere x² + y² + z² = 2x inside the
cone x² + y² = z².
8. Calculate the area of the cone x² + y² = z² with z ≥ 0 inside the sphere
x² + y² + z² = 2ax.
9. Consider the surface z = Axy with x² + y² ≤ R², where A, R ≥ 0. Estimate
the value of the parameter A for which the area of the surface is twice the area
of its orthogonal projection on the plane XY.
10. Find the area of the portion of the cone z = √(2x² + 2y²) below the plane
z = x + 1.

11. Calculate the area of the surface z = 2xy with (x, y) ∈ [0, a] × [0, b].
12. Calculate the area of the piece of the paraboloid y² + z² = 2px limited by
the plane x = a.
13. Calculate the area of the piece of the paraboloid x²/a + y²/b = 2z inside
the cylinder x²/a² + y²/b² = 1.
14. Calculate the area of the cone z² = 2xy limited by the planes x = 0,
x = a, y = 0, y = b.
15. Find the area of the surface

z = arcsin(sinh x sinh y)

limited between the planes x = a, x = b with a, b > 0.

16. Calculate the area of the piece of the paraboloid y² + z² = 4ax intercepted
by the cylinder y² = ax and the plane x = 3a.
17. Calculate \(\iint_S z\, dS\) where S is the half-sphere x² + y² + z² = a², z > 0.
18. Calculate the area and volume of a torus.
19. Find with the help of the Pappus–Guldin theorem the position of the
center of mass of a solid homogeneous semicircle.

20. The catenary is the curve with the shape of a hanging chain and it is
modelled by the hyperbolic cosine. The catenoid is the surface generated
by a catenary that rotates around its symmetry axis.
(a) Find the length of an arc of catenary.
(b) Find the area of a catenoid between its vertex and a circular section.

21. Consider the arc of cycloid x(t) = t − sin t, y(t) = 1 − cos t for t ∈ [0, 2π].
(a) Find the length of the arc.
(b) Find the area of the surface generated by the rotation of the arc
around the X axis.

Chapter 10

Differential forms of low degree

10.1 Forms of degree 1


The aim of the following device is to understand the line integral without
appealing to the scalar product. Remember that a very common interpretation
of the path integral is the work done by a force along a trajectory, and the
scalar product is a keystone in such a formulation.

Let X denote a normed space. The set of continuous linear functionals


defined on X is also a normed space denoted usually by X ∗ (the dual space).
The elements of X ∗ act on the elements of X giving a real number: x∗ (x) ∈ R
for x ∈ X and x∗ ∈ X ∗ . Note that in the case that X = Rd we may identify
X ∗ with Rd too, however they have different nature and the norm may differ.
Definition 10.1.1. A differential form ω of degree 1 (also called 1-form) is a
function defined on an open subset of X with values in X∗.
The 1-form is said to be continuous, differentiable... if it is so when
considered as a map between normed spaces. Usually we will consider 1-forms
which are at least continuous, but later we will require more regularity for
some applications.

The most typical example of 1-form is the differential of a differentiable


real function f defined on some open subset D ⊂ X. Indeed, df is defined on
D and df (x) is a continuous linear map from X to R, that is, an element of X ∗
for every x ∈ D. By the way, we may think of scalar functions as differential
forms of degree 0, so the differentiation increases by one the degree of the form.
Later we will increase to degree 2 by a further derivation.

Now we will turn our attention to the finite dimensional case X = Rⁿ.
If we denote a generic point as x = (x1, x2, ..., xn) we may consider the linear
forms given by the assignation of the k-th coordinate x → xk and denote by
dxk this linear form. It turns out that {dx1, dx2, ..., dxn} is a basis of (Rⁿ)∗.
Therefore, any 1-form ω can be expressed in terms of the basis as
\[
\omega = f_1\, dx_1 + f_2\, dx_2 + \dots + f_n\, dx_n
\]
where fk with k = 1, ..., n are scalar functions. In particular, we can express
the differential of a scalar function in this way
\[
df = \frac{\partial f}{\partial x_1}\, dx_1 + \frac{\partial f}{\partial x_2}\, dx_2 + \dots + \frac{\partial f}{\partial x_n}\, dx_n.
\]
If n ≤ 3 we will use the set of variables x, y, z rather than the numeration.

10.2 Integration of 1-forms on paths


Given a parameterized curve γ : [a, b] → D ⊂ Rⁿ and a continuous 1-form ω
defined on D, the integral of ω on (or along) γ is the number
\[
\int_\gamma \omega = \int_a^b \omega(\gamma(t))(\gamma'(t))\, dt.
\]
The first natural question is to know to what extent the definition depends on
γ as a function rather than on the “shape” γ([a, b]). Actually the integral will
depend on the set γ([a, b]) and on the order in which the points are placed,
that is, the sense in which the curve is walked. To decide between the two
possible ways to walk the curve is to set an orientation. To be more precise,
the integral of 1-forms does not change under reparameterizations which preserve
the orientation.
Proposition 10.2.1. Let γ : [a, b] → D ⊂ Rⁿ be piecewise C¹, ω a continuous
1-form defined on D and j : [c, d] → [a, b] an increasing piecewise C¹ bijection.
Then γ̃ = γ ◦ j : [c, d] → D is a piecewise C¹ curve and
\[
\int_{\tilde\gamma} \omega = \int_\gamma \omega.
\]
Proof. The following equalities are not affected by the finite set of points
where the derivatives are not defined:
\[
\begin{aligned}
\int_{\tilde\gamma} \omega &= \int_c^d \omega(\tilde\gamma(t))(\tilde\gamma'(t))\, dt = \int_c^d \omega(\gamma(j(t)))\bigl(\gamma'(j(t))\, j'(t)\bigr)\, dt \\
&= \int_c^d \omega(\gamma(j(t)))(\gamma'(j(t)))\, j'(t)\, dt = \int_a^b \omega(\gamma(\tau))(\gamma'(\tau))\, d\tau = \int_\gamma \omega
\end{aligned}
\]
where we have used the chain rule and the linearity of the form.
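
The invariance under reparameterization can be illustrated numerically. The sketch below uses examples of our own: the form ω = −y dx + x dy, the half circle γ(t) = (cos t, sin t) with t ∈ [0, π], and the increasing bijection j(t) = πt² from [0, 1] onto [0, π]. The helper name `integrate_form` is hypothetical and γ′ is approximated by central differences.

```python
import math

def integrate_form(omega, gamma, a, b, n=2000, d=1e-6):
    """Midpoint rule for the integral of omega(gamma(t))(gamma'(t)) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        # central differences approximate the velocity gamma'(t)
        velocity = [(p - q) / (2 * d) for p, q in zip(gamma(t + d), gamma(t - d))]
        total += sum(c * w for c, w in zip(omega(*gamma(t)), velocity))
    return total * h

omega = lambda x, y: (-y, x)                     # coefficients of -y dx + x dy
gamma = lambda t: (math.cos(t), math.sin(t))     # half circle, t in [0, pi]
j = lambda t: math.pi * t * t                    # increasing bijection [0, 1] -> [0, pi]
gamma_tilde = lambda t: gamma(j(t))

i1 = integrate_form(omega, gamma, 0.0, math.pi)
i2 = integrate_form(omega, gamma_tilde, 0.0, 1.0)
print(i1, i2, math.pi)
```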

The previous proposition motivates the notion of “path”. A path is the
equivalence class of all the curves that can be obtained one from the other
through regular reparameterizations that do not reverse the orientation.
We may perform some operations with paths: if a curve γ can be considered
as the concatenation of two of them γ1, γ2, up to a suitable reparameterization,
we write γ = γ1 + γ2; if some parameterization γ∗ of γ([a, b]) goes backwards
we write γ∗ = −γ. Now we have this obvious result.
Proposition 10.2.2. For paths γ, γ1, γ2 with values on the domain of a
continuous 1-form ω the following equalities hold:
\[
\int_{\gamma_1 + \gamma_2} \omega = \int_{\gamma_1} \omega + \int_{\gamma_2} \omega; \qquad \int_{-\gamma} \omega = -\int_\gamma \omega.
\]

The chain rule is also the trick behind the following result.
Proposition 10.2.3. Let f : D ⊂ Rⁿ → R be C¹ and let γ : [a, b] → D be
piecewise C¹. Then
\[
\int_\gamma df = f(\gamma(b)) - f(\gamma(a)).
\]
In particular, if γ is closed, that is, γ(a) = γ(b), then \(\int_\gamma df = 0\).
Proof. Only one line
\[
\int_\gamma df = \int_a^b df(\gamma(t))(\gamma'(t))\, dt = \int_a^b \frac{d}{dt}\bigl(f(\gamma(t))\bigr)\, dt = f(\gamma(b)) - f(\gamma(a)).
\]
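
A small numerical check of the proposition, with data chosen by us for illustration: f(x, y) = x²y, so df = 2xy dx + x² dy, and the curve γ(t) = (cos t, t) for t ∈ [0, 2]. The helper name `integrate_form` is hypothetical; here the derivative of γ is supplied analytically.

```python
import math

def integrate_form(omega, gamma, dgamma, a, b, n=2000):
    """Midpoint rule for the integral of omega(gamma(t))(gamma'(t)) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += sum(c * w for c, w in zip(omega(*gamma(t)), dgamma(t)))
    return total * h

f = lambda x, y: x * x * y
df = lambda x, y: (2 * x * y, x * x)     # coefficients of df = 2xy dx + x^2 dy
gamma = lambda t: (math.cos(t), t)
dgamma = lambda t: (-math.sin(t), 1.0)

lhs = integrate_form(df, gamma, dgamma, 0.0, 2.0)
rhs = f(*gamma(2.0)) - f(*gamma(0.0))
print(lhs, rhs)
```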

Note that to say that the integral depends only on the starting and ending
points (from now on “endpoints”, with a distinction between them) means that
it does not matter how the path joins them, which is actually equivalent
to saying that the integral along a closed path is always 0. We have the
following important result.
Theorem 10.2.4. Let ω be a continuous 1-form defined on an open set D ⊂ Rⁿ.
Then there exists a C¹ function f : D → R such that ω = df if and only if
\(\int_\gamma \omega\) depends only on the endpoints of γ (equivalently, if
\(\int_\gamma \omega = 0\) for every closed γ ⊂ D).

Proof. Clearly one implication is a consequence of the previous proposition.
Assume now that the line integral depends only on the endpoints of the curve.
We may assume that D is connected, since for a general open set it is enough
to make the following construction on every connected component. Fix a point
x0 ∈ D and for any other point x ∈ D fix a C¹ curve γx ⊂ D starting at x0 and
ending at x (the parameter interval is not relevant). Define now
\(f(x) := \int_{\gamma_x} \omega\). Note that this definition is not ambiguous
by the hypothesis of independence of how the points x0 and x are joined.
Now fix x ∈ D and take δ > 0 small enough to have x + h ∈ D for every h ∈ Rⁿ
with ∥h∥ < δ. Observe that
\[
f(x+h) - f(x) = \int_{\gamma_{x+h}} \omega - \int_{\gamma_x} \omega = \int_{\gamma_{x+h} - \gamma_x} \omega
\]
and that the path γx+h − γx joins the points x and x + h, so it can
be replaced by the segment x + th with t ∈ [0, 1]. We have then
\[
f(x+h) - f(x) = \int_0^1 \omega(x+th)(h)\, dt.
\]
Since ω is continuous at x, given ε > 0 we can take a smaller δ > 0 so that
∥ω(x + th) − ω(x)∥ < ε for all t ∈ [0, 1], which implies
\[
|\omega(x+th)(h) - \omega(x)(h)| \le \varepsilon \|h\|.
\]
As obviously \(\int_0^1 \omega(x)(h)\, dt = \omega(x)(h)\), putting it all together
\[
|f(x+h) - f(x) - \omega(x)(h)| = \left| \int_0^1 \bigl(\omega(x+th)(h) - \omega(x)(h)\bigr)\, dt \right| \le \varepsilon \|h\|
\]
which means that f is differentiable at x and its differential is precisely ω(x),
as wanted.

An 1-form ω is called exact if it has a primitive f , that is, if ω = df . Note


that the primitive is not unique and two primitives of the same form should
differ in a function whose differential is 0 and therefore constant on every con-
nected component of the domain.

The characterization of exact 1-forms in terms of integrals is not a practical
one. First of all note that if an exact 1-form
\(\omega = \sum_{k=1}^n f_k\, dx_k = df\) is C¹ then its coefficients satisfy
fk = ∂f/∂xk and therefore
\[
\frac{\partial f_k}{\partial x_j} = \frac{\partial^2 f}{\partial x_j\, \partial x_k} = \frac{\partial^2 f}{\partial x_k\, \partial x_j} = \frac{\partial f_j}{\partial x_k}
\]
since f is C² and so the commutation of derivatives holds. Amazingly, this
necessary condition turns out to be almost a sufficient condition for exactness.
For that reason we are going to name it: a 1-form
\(\omega = \sum_{k=1}^n f_k\, dx_k\) is called closed if
∂fk/∂xj = ∂fj/∂xk on all its domain and for every pair of indices j, k. We say
that an open subset D is star-shaped if there is a point x0 ∈ D such that for any
other point x ∈ D the segment joining x0 and x is contained in D. In particular,
star-shaped sets are connected, and convex open sets are star-shaped.
Theorem 10.2.5 (Lemma of Poincaré). Let D ⊂ Rⁿ be a star-shaped open
set and let ω be a C¹ closed 1-form on D. Then ω is exact.
Proof. By a translation we may assume that D is star-shaped with respect
to 0, so the segment joining 0 and (x1, ..., xn) is simply (tx1, ..., txn) with
t ∈ [0, 1]. The candidate to primitive is obviously
\[
f(x_1, \dots, x_n) = \int_0^1 \sum_{k=1}^n f_k(tx_1, \dots, tx_n)\, x_k\; dt.
\]
Now, for the partial derivative with respect to xj we have
\[
\begin{aligned}
\frac{\partial f}{\partial x_j}(x_1, \dots, x_n)
&= \int_0^1 \Bigl( f_j(tx_1, \dots, tx_n) + \sum_{k=1}^n \frac{\partial f_k}{\partial x_j}(tx_1, \dots, tx_n)\, t x_k \Bigr)\, dt \\
&= \int_0^1 \Bigl( f_j(tx_1, \dots, tx_n) + \sum_{k=1}^n \frac{\partial f_j}{\partial x_k}(tx_1, \dots, tx_n)\, t x_k \Bigr)\, dt \\
&= \int_0^1 \Bigl( f_j(tx_1, \dots, tx_n) + t\, \frac{d}{dt}\bigl(f_j(tx_1, \dots, tx_n)\bigr) \Bigr)\, dt \\
&= \int_0^1 \frac{d}{dt}\bigl(t f_j(tx_1, \dots, tx_n)\bigr)\, dt = t f_j(tx_1, \dots, tx_n)\Big|_0^1 = f_j(x_1, \dots, x_n)
\end{aligned}
\]
where for the first equality we may exchange derivation and integration by the
regularity of the functions, the second equality makes use of the hypothesis and
the following equalities are based on the chain rule and the derivation of
products. That shows that df = ω as wanted.

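An explicit form for the primitive of ω is given in the proof; however, in
low dimension it is better to proceed in this way: suppose that ω = p dx + q dy
is closed, that is, ∂p/∂y = ∂q/∂x. Then find a partial primitive g of q with
respect to y, that is, ∂g/∂y = q. The function we are looking for can be written
as f = g + h where h does not depend on y, as
\[
\frac{\partial h}{\partial y} = \frac{\partial f}{\partial y} - \frac{\partial g}{\partial y} = q - q = 0.
\]
Then h(x) is a function of a single variable and
\[
h'(x) = \frac{\partial f}{\partial x} - \frac{\partial g}{\partial x} = p - \frac{\partial g}{\partial x}.
\]
The last term should be a function of x only in order to find h. Actually, it is
so because
\[
\frac{\partial}{\partial y}\Bigl(p - \frac{\partial g}{\partial x}\Bigr) = \frac{\partial p}{\partial y} - \frac{\partial^2 g}{\partial x\, \partial y} = \frac{\partial p}{\partial y} - \frac{\partial q}{\partial x} = 0
\]
by the hypothesis. Now find h and we have f = g + h explicitly.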
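
The primitive given in the proof of the Lemma of Poincaré can also be evaluated numerically. The following sketch takes as an assumed example the closed form ω = 2xy dx + x² dy on R² (star-shaped with respect to 0), whose primitive vanishing at the origin is x²y; the helper name `poincare_primitive` is hypothetical.

```python
def poincare_primitive(coeffs, x, n=1000):
    """Midpoint rule for f(x) = integral over [0, 1] of sum_k f_k(t x) x_k dt."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        point = [t * xk for xk in x]
        total += sum(fk * xk for fk, xk in zip(coeffs(*point), x))
    return total * h

# assumed example: omega = 2xy dx + x^2 dy, closed since d(2xy)/dy = 2x = d(x^2)/dx
omega = lambda x, y: (2 * x * y, x * x)

x, y = 1.5, -2.0
value = poincare_primitive(omega, (x, y))
print(value, x * x * y)
```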

Poincaré's result points out that closed forms are exact on special domains.
If the domain is not star-shaped then the result may fail. Indeed, the form
\[
\frac{-y\, dx}{x^2 + y^2} + \frac{x\, dy}{x^2 + y^2}
\]
on R² \ {(0, 0)} is closed (easily checkable) and not exact (consider the integral
around the unit circle). The formula f(x, y) = arctan(y/x) provides a primitive
valid on a half-plane that can be extended to any domain of the form R² \ R
where R is a half-line departing from (0, 0) (f(x, y) being a measure of the
angle between (x, y) and R). The characterization of the domains where every
closed form is exact is actually a topological matter; however, the notions and
proofs involved are beyond the scope of these notes.
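
The failure of exactness can be observed numerically. The sketch below (the helper name `winding_integral` and the two test circles are our own choices) integrates the form along a circle walked anticlockwise: around the origin the result is 2π, while for a circle contained in the half-plane x > 0, where a primitive exists, the result is 0.

```python
import math

def winding_integral(cx, cy, radius, n=2000):
    """Integral of (-y dx + x dy)/(x^2 + y^2) along a circle walked anticlockwise."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        total += (-y * dx + x * dy) / (x * x + y * y)
    return total * h

around_origin = winding_integral(0.0, 0.0, 1.0)   # unit circle
away = winding_integral(3.0, 0.0, 1.0)            # circle inside the half-plane x > 0
print(around_origin, away)
```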

10.3 The Green-Riemann formula


This section is concerned with integration of 1-forms in R² over simple closed
paths, or more generally, the boundary of a bounded open domain provided that
it is piecewise C¹. We will start with simpler elemental domains. We say
that a domain is elemental with respect to the X axis if it is limited by two
vertical lines x = a and x = b and the graphs of two C¹ functions
f, g : [a, b] → R such that f > g. The boundary ∂D of an elemental domain D is
supposed to be oriented anticlockwise, that is, the graph of f is walked from
right to left and the segment on x = a is walked down, for instance. A domain
elemental with respect to the Y axis is defined similarly, or we can think of
how a domain elemental with respect to the X axis looks after a rotation of π/2.

Lemma 10.3.1. Let D be an elemental domain with respect to the X axis and
p(x, y) a C¹ function defined on a domain containing D. Then
\[
\int_{\partial D} p\, dx = -\iint_D \frac{\partial p}{\partial y}\, dx\, dy.
\]

Proof. Note that the vertical segments do not add to the line integral because
the form does not contain dy. The graphs are parameterized by (x, g(x)) and
(x, f(x)), but this last one must be walked backwards. Thus
\[
\begin{aligned}
\int_{\partial D} p\, dx &= \int_a^b p(x, g(x))\, dx - \int_a^b p(x, f(x))\, dx
= -\int_a^b \bigl(p(x, f(x)) - p(x, g(x))\bigr)\, dx \\
&= -\int_a^b \Bigl( \int_{g(x)}^{f(x)} \frac{\partial p}{\partial y}(x, y)\, dy \Bigr)\, dx
= -\iint_D \frac{\partial p}{\partial y}(x, y)\, dx\, dy
\end{aligned}
\]
as wanted.

If we had considered an elemental domain with respect to the Y axis the


proof would have followed the same lines but the anticlockwise orientation of
the domain has an opposite relation to the orientation of the Y axis. Therefore
the analogous result does not contain the sign minus.

Lemma 10.3.2. Let D be an elemental domain with respect to the Y axis and
q(x, y) a C¹ function defined on a domain containing D. Then
\[
\int_{\partial D} q\, dy = \iint_D \frac{\partial q}{\partial x}\, dx\, dy.
\]

It is easy to prove that a convex domain with C¹ boundary is elemental with
respect to both the X and Y axes, and many other domains can be expressed
as a non-overlapping union of domains which are elemental with respect to the
two axes. Putting together the previous lemmas and this observation we have
the following.
Proposition 10.3.3 (Green-Riemann, elemental domains). Let D be a domain
which is elemental with respect to both the X and Y axes, and let
ω = p dx + q dy be a 1-form which is C¹ on a domain that contains D. Then
\[
\int_{\partial D} \omega = \int_{\partial D} p\, dx + q\, dy = \iint_D \Bigl( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \Bigr)\, dx\, dy.
\]
Moreover, the same formula holds if D is a domain that can be decomposed
into a finite non-overlapping union of domains with C¹ boundaries which
are elemental with respect to both the X and Y axes.
Proof. The first statement is just the sum of the formulas provided by the
lemmas. For the second statement we have only to remark that the double
integrals are additive for non-overlapping unions of domains. Namely, if
\(D = \bigcup_{k=1}^n D_k\) then
\[
\sum_{k=1}^n \iint_{D_k} \Bigl( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \Bigr)\, dx\, dy = \iint_{\bigcup_{k=1}^n D_k} \Bigl( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \Bigr)\, dx\, dy.
\]
On the other hand, any non-trivial (not reduced to a single point) C¹ piece
of curve contained in \(\bigcup_{k=1}^n \partial D_k \setminus \partial D\) is
contained in a shared boundary ∂Di ∩ ∂Dj where i ≠ j are unique. This curve does
not contribute to the sum of the integrals over the ∂Dk since it is walked in
opposite directions when computing \(\int_{\partial D_i}\) and
\(\int_{\partial D_j}\), with the subsequent cancellation. That can be expressed
with the “arithmetics of paths” as \(\sum_{k=1}^n \partial D_k = \partial D\).
Therefore
\[
\sum_{k=1}^n \int_{\partial D_k} p\, dx + q\, dy = \int_{\sum_{k=1}^n \partial D_k} p\, dx + q\, dy = \int_{\partial D} p\, dx + q\, dy
\]
which completes the proof of the proposition.

The previous result casts more light on the relation between closed and exact
1-forms. Indeed, if the 1-form is closed then the function inside the double
integral vanishes, so the line integral along the boundary of any elemental
domain is zero. Any Jordan closed curve is the boundary of a region. However,
even if such a curve is $C^1$, it may be impossible to decompose that region into
finitely many elemental domains. We will improve the previous proposition for
more general domains.
Theorem 10.3.4 (Green-Riemann, general). Let D be a bounded open domain
with $C^1$ boundary and let p dx + q dy be a 1-form which is $C^1$ on a domain
that includes D. Then
\[ \int_{\partial D} p\,dx + q\,dy = \iint_D \left( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \right) dx\,dy. \]
Proof. It is enough to prove the result for 1-forms of type p dx, as we have
seen before. Let $F_1 \subset \partial D$ be the finite set of points where the boundary is not
smooth. Let $K \subset \partial D$ be the set of the points of the boundary where
the tangent line is vertical. This set is compact: indeed, its intersection with each
$C^1$ piece γ is the closed set $\{\gamma(t) : \gamma'(t)\cdot(1,0) = 0\}$. The orthogonal projection
$K_X$ of K onto the X axis, which on each piece is $\{\gamma(t)\cdot(1,0) : \gamma'(t)\cdot(1,0) = 0\}$,
is also compact and has measure 0 by Morse-Sard's
theorem. The set $K_X$ can be covered with finitely many open intervals whose
total length can be taken smaller than a number δ that we will specify later. Let
$F_2$ denote the finite set of their endpoints. The vertical lines through the points
of $F = F_1 \cup F_2$ define closed strips $S_k$ with k = 1, . . . , n of two types: strips
of type B (bad) if they contain points of ∂D at which the tangent is vertical;
strips of type A (alright) otherwise. Note that at any point of ∂D ∩ S
with S of type A it is possible to express the boundary locally as the graph of
a function y = f(x). Indeed, the points where the implicit function theorem
is not applicable are contained in the strips of type B. Adding a finite set of
points $F_3$ to F we may moreover assume that any connected part of ∂D ∩ S
for S of type A is a graph of the sort y = f(x). Under the hypotheses, the
number of connected parts of ∂D ∩ S must be finite and thus D ∩ S can be
decomposed into finitely many elemental domains. Therefore
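\[ \int_{\partial(D\cap S)} p\,dx = -\iint_{D\cap S} \frac{\partial p}{\partial y}\,dx\,dy \]
for any S of type A. Now we are going to deal with the strips of type B.
Take ε > 0. Since the set D is bounded and $\partial p/\partial y$ is continuous, we may take δ > 0
small enough to guarantee that
\[ \sum_{k\in B} \left| \iint_{D\cap S_k} \frac{\partial p}{\partial y}\,dx\,dy \right| < \varepsilon. \]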
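On the other hand, if the covering of $K_X$ by open intervals is tight enough,
we may assume that the first component of the derivative ($\gamma'(t)\cdot(1,0)$ for γ a $C^1$
piece of ∂D) is smaller than ε/length(∂D). That implies
\[ \sum_{k\in B} \left| \int_{\partial(D\cap S_k)} p\,dx \right| < \varepsilon. \]
Now we have
\[ \left| \sum_{k\in A\cup B} \iint_{D\cap S_k} \frac{\partial p}{\partial y}\,dx\,dy + \sum_{k\in A\cup B} \int_{\partial(D\cap S_k)} p\,dx \right| < 2\varepsilon. \]
Since the line integrals along the vertical segments added by the strips cancel, we
get
\[ \left| \int_{\partial D} p\,dx + \iint_D \frac{\partial p}{\partial y}\,dx\,dy \right| < 2\varepsilon \]
and, ε > 0 being arbitrary, we arrive at the formula in the statement.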
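
As an illustration of Theorem 10.3.4 on a domain whose boundary has vertical tangents, the following sketch compares both sides of the Green-Riemann formula on the unit disk. The 1-form ω = −y³ dx + x³ dy is a hypothetical example, and the double integral is approximated in polar coordinates with a midpoint rule; for this choice both sides equal 3π/2.

```python
import math

def line_integral(p, q, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of p dx + q dy along gamma."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = gamma(t)
        dx, dy = dgamma(t)
        total += (p(x, y) * dx + q(x, y) * dy) * h
    return total

def disk_integral(func, n=400):
    """Double integral of func over the unit disk, using polar coordinates."""
    hr = 1.0 / n
    ht = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            total += func(r * math.cos(t), r * math.sin(t)) * r * hr * ht
    return total

# Hypothetical example: omega = -y^3 dx + x^3 dy
p = lambda x, y: -y**3
q = lambda x, y: x**3
curl = lambda x, y: 3 * x**2 + 3 * y**2      # dq/dx - dp/dy

gamma = lambda t: (math.cos(t), math.sin(t))     # anticlockwise unit circle
dgamma = lambda t: (-math.sin(t), math.cos(t))

boundary = line_integral(p, q, gamma, dgamma, 0.0, 2 * math.pi)
interior = disk_integral(curl)
print(boundary, interior)
```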

It is necessary to remark that the anticlockwise orientation of ∂D does not always
go "anticlockwise". Consider the square $[0,3]^2$, which is made up of 9
smaller squares of side 1. Remove the central square $[1,2]^2$ and call D the
remainder. Then ∂D has two connected components, and the inner one is
walked clockwise. Indeed, endow the boundaries of the 8 squares whose union
is D with the anticlockwise orientation, draw the arrows and then look.
Another explanation is based on Green-Riemann: if we add $[1,2]^2$ to D with the
anticlockwise orientation on its border, the shared boundary should be
walked the other way around as part of ∂D for the line integrals to cancel.
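
The orientation of the inner component can be tested with the 1-form x dy, whose integral along ∂D equals the area of D by the Green-Riemann formula. The sketch below is only an illustration: the helper `integral_x_dy` is a hypothetical name, and it uses the fact that x dy integrates exactly along a straight segment.

```python
def integral_x_dy(vertices):
    """Exact integral of x dy along the closed polygonal path through vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # on a segment x is affine in the parameter, so the mean value is exact
        total += 0.5 * (x0 + x1) * (y1 - y0)
    return total

outer = [(0, 0), (3, 0), (3, 3), (0, 3)]              # anticlockwise
inner_clockwise = [(1, 1), (1, 2), (2, 2), (2, 1)]    # clockwise
inner_anticlockwise = inner_clockwise[::-1]

right = integral_x_dy(outer) + integral_x_dy(inner_clockwise)
wrong = integral_x_dy(outer) + integral_x_dy(inner_anticlockwise)
print(right, wrong)  # 8.0 is the area of D; 10.0 is not
```

Only the clockwise inner square gives the area 9 − 1 = 8 of D.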

10.4 Forms of degree 2


We will begin with the definition of the algebraic forms of degree 2. Consider a
bilinear continuous map b : X × X → R. We say that b is symmetric if
b(x, y) = b(y, x) and antisymmetric if b(x, y) = −b(y, x) for every x, y ∈ X.
Note that the bilinear continuous forms on X make up a vector space and the
sets of symmetric and antisymmetric forms are linear subspaces. Moreover, the space
of continuous bilinear forms is the direct sum of the symmetric and antisymmetric
subspaces. That is a consequence of the unique decomposition of a bilinear form
b as the sum of a symmetric and an antisymmetric form
\[ b(x,y) = \frac{1}{2}\bigl(b(x,y) + b(y,x)\bigr) + \frac{1}{2}\bigl(b(x,y) - b(y,x)\bigr). \]
If the dimension of X is finite, say n, then the dimension of the space of bilinear
forms (all of them are continuous in this case) is $n^2$. Indeed, if $\{e_1, \dots, e_n\}$ is
a basis of X, the value of a bilinear form b is determined by the $n^2$
coefficients $\{b(e_i, e_j) : 1 \le i, j \le n\}$ as
\[ b\Bigl(\sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j\Bigr) = \sum_{i=1}^n \sum_{j=1}^n x_i y_j\, b(e_i, e_j). \]

Now, observe that a symmetric form is determined by n(n + 1)/2 coefficients
since $b(e_i, e_j) = b(e_j, e_i)$; therefore that number is the dimension of the
subspace of symmetric forms. Analogously, the dimension of the subspace of
antisymmetric forms is n(n − 1)/2. This is a smaller number because
$b(e_i, e_i) = 0$ for all 1 ≤ i ≤ n.
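
In finite dimension a bilinear form is represented by its matrix of coefficients $b(e_i, e_j)$, and the decomposition into symmetric and antisymmetric parts can be carried out entry by entry. The following is a minimal sketch with a hypothetical 3 × 3 matrix B.

```python
def decompose(B):
    """Split the coefficient matrix of a bilinear form into its
    symmetric and antisymmetric parts."""
    n = len(B)
    sym = [[0.5 * (B[i][j] + B[j][i]) for j in range(n)] for i in range(n)]
    alt = [[0.5 * (B[i][j] - B[j][i]) for j in range(n)] for i in range(n)]
    return sym, alt

def evaluate(B, x, y):
    """Value b(x, y) = sum_ij x_i y_j b(e_i, e_j)."""
    n = len(B)
    return sum(x[i] * y[j] * B[i][j] for i in range(n) for j in range(n))

# Hypothetical coefficients b(e_i, e_j)
B = [[1, 2, 0],
     [4, 3, -1],
     [2, 5, 6]]
sym, alt = decompose(B)

n = len(B)
free_sym = n * (n + 1) // 2   # independent coefficients of a symmetric form
free_alt = n * (n - 1) // 2   # independent coefficients of an antisymmetric form
print(sym, alt, free_sym + free_alt == n * n)
```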

Let A2(X) denote the set of all continuous antisymmetric bilinear forms
on X. This space is naturally endowed with the topology of the supremum
norm (on $B_X \times B_X$). In order to keep coherence with the general theory of
alternating forms, which we are not discussing here, A0(X) will denote the scalars
and A1(X) = X*. We will introduce the exterior product in a very restricted
version
∧ : A1(X) × A1(X) → A2(X)
defined as (α ∧ β)(x, y) = α(x)β(y) − α(y)β(x) for α, β ∈ A1(X). There is no
difficulty in checking that ∧ is bilinear and antisymmetric.

In the finite-dimensional case, with X = R^n, we will often consider a basis of
A2(R^n) which is built from the standard basis of A1(R^n) with the help of
the exterior product. Namely, the basis made up of the following elements
\[ (dx_i \wedge dx_j)(e_k, e_l) = \begin{cases} 1 & \text{if } k = i,\ l = j \\ -1 & \text{if } k = j,\ l = i \\ 0 & \text{otherwise} \end{cases} \]

The basis of A2(R^n) can be enumerated as follows
\[ \{dx_i \wedge dx_j : 1 \le i < j \le n\}, \]
however, for reasons that will be clear later, in the case of R^3, where the dimension
of A2(R^3) is 3, we prefer the cyclic ordering $\{dx_2 \wedge dx_3,\ dx_3 \wedge dx_1,\ dx_1 \wedge dx_2\}$,
or alternatively $\{dy \wedge dz,\ dz \wedge dx,\ dx \wedge dy\}$.
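
The restricted exterior product is easy to experiment with. In the sketch below, 1-forms on R^3 are modelled as Python functions of one vector; the helper names `covector` and `wedge` are hypothetical and only serve the illustration.

```python
def covector(coeffs):
    """1-form on R^n with the given coefficients in the basis dx_1, ..., dx_n."""
    return lambda x: sum(c * xi for c, xi in zip(coeffs, x))

def wedge(alpha, beta):
    """Exterior product of two 1-forms: (x, y) -> alpha(x)beta(y) - alpha(y)beta(x)."""
    return lambda x, y: alpha(x) * beta(y) - alpha(y) * beta(x)

n = 3
# standard basis e_1, ..., e_n and its dual basis dx_1, ..., dx_n (0-indexed here)
e = [tuple(1.0 if i == k else 0.0 for i in range(n)) for k in range(n)]
dx = [covector(e[k]) for k in range(n)]

dx1_dx2 = wedge(dx[0], dx[1])
print(dx1_dx2(e[0], e[1]), dx1_dx2(e[1], e[0]), dx1_dx2(e[0], e[2]))
```

The three printed values correspond to the three cases of the definition of $dx_i \wedge dx_j$ on pairs of basic vectors.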

After the algebraic part we are ready for the analytic definition.
Definition 10.4.1. A differential form ω of degree 2 (also called 2-form) is a
function defined on an open subset of X with values in A2(X).
As in the case of 1-forms we are interested in 2-forms which are at least
continuous, and often differentiable or with further regularity. In the case of finite
dimension this regularity is that of the scalar coefficient functions in
\[ \omega = \sum_{1 \le i < j \le n} f_{ij}\, dx_i \wedge dx_j. \]

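Given a differentiable 1-form $\omega = \sum_{i=1}^n f_i\,dx_i$ its exterior derivative is the
2-form defined as
\[ d\omega = \sum_{1 \le i < j \le n} \left( \frac{\partial f_j}{\partial x_i} - \frac{\partial f_i}{\partial x_j} \right) dx_i \wedge dx_j. \]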

It is not difficult to see that the operation of exterior differentiation is linear,
and moreover it satisfies the following analogue of the Leibniz differentiation
rule: given a differentiable function f and a differentiable 1-form ω, then
\[ d(f\omega) = df \wedge \omega + f\,d\omega \]
where the properties of the operation ∧ are used when it comes to simplification.

Note that with the help of exterior differentiation we can rewrite the
condition of closedness for 1-forms: the 1-form ω is closed if and only if dω = 0.

Now we are going to discuss exactness and closedness for 2-forms. Let ω be
a 2-form. We say that ω is exact if there is a 1-form α such that ω = dα. As
in the case of exact 1-forms, the 2-forms which are exact satisfy a sort of identity
with partial derivatives of the coefficients. Firstly, consider an exact 2-form
in R^3 whose primitive $f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3$ is $C^2$, so the 2-form can be
written as
\[ \left( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \right) dx_2 \wedge dx_3 + \left( \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} \right) dx_3 \wedge dx_1 + \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) dx_1 \wedge dx_2. \]

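Note now that the following identity holds
\[ \frac{\partial}{\partial x_1}\left( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \right) + \frac{\partial}{\partial x_2}\left( \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} \right) + \frac{\partial}{\partial x_3}\left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) = 0. \]
In other words (and in standard notation in R^3), if ω = A dy ∧ dz + B dz ∧ dx +
C dx ∧ dy is exact then
\[ \frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} + \frac{\partial C}{\partial z} = 0. \]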
The previous equality satisfied by exact 2-forms can be replaced by $\binom{n}{3}$
equalities in dimension n, which are the key of the next definition. We say that
a 2-form
\[ \omega = \sum_{1 \le i < j \le n} f_{ij}\, dx_i \wedge dx_j \]
is closed if the following equality is satisfied
\[ \frac{\partial f_{ij}}{\partial x_k} + \frac{\partial f_{jk}}{\partial x_i} + \frac{\partial f_{ki}}{\partial x_j} = 0 \]
whenever i, j, k are different integers between 1 and n, with the convention that
$f_{rs} = -f_{sr}$ in case r > s. The expressions above are actually the coefficients
of a 3-form, the exterior differential of ω. We are not going into it, but we
may infer that $d^2(\alpha) = d(d\alpha) = 0$ for every 1-form α, as it was for scalar
functions, namely $d^2(f) = d(df) = 0$ (always under the hypothesis of being
$C^2$). Therefore the iteration of the operation of exterior differential is always
0 (this is not the case for the standard differential).

The Lemma of Poincaré is true also for 2-forms, that is, closed 2-forms
defined on star-shaped domains are exact. Instead of proving that, we will
provide a method to compute primitives of 2-forms in R^3. Consider the form
ω = A dy ∧ dz + B dz ∧ dx + C dx ∧ dy where A, B, C are functions of x, y, z. The
objective is to eliminate z from both the functions and the basis of 2-forms.
Firstly consider a 1-form α = p dx + q dy (p, q are functions of x, y, z). Its
exterior differential is
\[ d\alpha = \left( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \right) dx \wedge dy - \frac{\partial q}{\partial z}\, dy \wedge dz + \frac{\partial p}{\partial z}\, dz \wedge dx. \]
Now we are going to compute p, q so that ω − dα only contains the dx ∧ dy term.
For that it is necessary that these two equations be fulfilled
\[ A = -\frac{\partial q}{\partial z}; \qquad B = \frac{\partial p}{\partial z} \]
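which turns out to be possible with partial primitives. From now on the functions
p, q are supposed known. We have
\[ \omega - d\alpha = \left( C - \frac{\partial q}{\partial x} + \frac{\partial p}{\partial y} \right) dx \wedge dy. \]
We claim that the function between brackets does not depend on z. We will
compute its partial derivative with respect to z
\[ \frac{\partial}{\partial z}\left( C - \frac{\partial q}{\partial x} + \frac{\partial p}{\partial y} \right) = \frac{\partial C}{\partial z} - \frac{\partial^2 q}{\partial x \partial z} + \frac{\partial^2 p}{\partial y \partial z} = \frac{\partial C}{\partial z} + \frac{\partial A}{\partial x} + \frac{\partial B}{\partial y} = 0 \]
where the hypothesis that ω is closed is used for the first time. Once we know
that the function between brackets does not contain z, the problem is reduced
to dimension 2, where finding a primitive is not difficult by partial integration,
as we may assume that the primitive is of the form f(x, y) dx (or g(x, y) dy).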
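
As a small illustration of the method, take the hypothetical closed 2-form ω = x dy ∧ dz + y dz ∧ dx − 2z dx ∧ dy. Partial primitives in z give q = −xz and p = yz, and in this case the bracket C − ∂q/∂x + ∂p/∂y vanishes, so α = yz dx − xz dy should already be a primitive. The sketch below checks this numerically with central differences; the helper names are assumptions made for the example.

```python
def partial(func, i, point, h=1e-5):
    """Central-difference approximation of the i-th partial derivative."""
    hi = list(point)
    lo = list(point)
    hi[i] += h
    lo[i] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

def d_of_1form(p, q, r, point):
    """Coefficients (A, B, C) of d(p dx + q dy + r dz) in the cyclic basis
    dy^dz, dz^dx, dx^dy, evaluated at the given point."""
    A = partial(r, 1, point) - partial(q, 2, point)
    B = partial(p, 2, point) - partial(r, 0, point)
    C = partial(q, 0, point) - partial(p, 1, point)
    return A, B, C

# Candidate primitive alpha = y z dx - x z dy (hypothetical example)
p = lambda x, y, z: y * z
q = lambda x, y, z: -x * z
r = lambda x, y, z: 0.0

# Coefficients of omega = x dy^dz + y dz^dx - 2z dx^dy
omega = lambda x, y, z: (x, y, -2 * z)

point = (0.3, -1.2, 2.0)
print(d_of_1form(p, q, r, point), omega(*point))
```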

10.5 Integration of 2-forms on surfaces


We will consider in the first place parameterized $C^1$ surfaces with boundary
embedded into R^n, which can be described by an injective function Γ : D → R^n
and the following conditions:
1. D ⊂ R^2 is compact with $C^1$ boundary;
2. there is an open set C with D ⊂ C ⊂ R^2 where Γ extends as a $C^1$ function;
3. $\frac{\partial\Gamma}{\partial u}(u,v)$ and $\frac{\partial\Gamma}{\partial v}(u,v)$ are linearly independent for every (u, v) ∈ D.
The integral of a 2-form ω on a parameterized $C^1$ surface with boundary Γ is
defined, when Γ(D) is contained in the domain of ω, by the formula
\[ \int_\Gamma \omega = \iint_D \omega\bigl(\Gamma(u,v)\bigr)\left( \frac{\partial\Gamma}{\partial u}(u,v), \frac{\partial\Gamma}{\partial v}(u,v) \right) du\,dv. \]

Firstly, note that if we interchange the roles of the variables (u, v), taking
Γ̃(v, u) = Γ(u, v) defined on D̃ = {(v, u) : (u, v) ∈ D}, then the value of
the integral is multiplied by −1. Indeed, this is a consequence of the
antisymmetry of ω. This phenomenon is the analogue of the change of sign
in the integral of 1-forms when the path is walked backwards. The principle
behind it is that parameterized surfaces (the ones we are considering) can be given
an "orientation" that plays a role similar to the orientation of curves. In the
case of surfaces embedded into R^3, having an orientation simply means
distinguishing between the two "faces" of the surface, as for instance we can distinguish
between up and down when the surface is given as the graph of a function of
two variables.
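
The definition of the integral and the change of sign under the swap of variables can be observed numerically. This is a sketch under stated assumptions: the surface Γ(u, v) = (u, v, uv) on [0, 1]², the 2-form ω = y dy ∧ dz + z dx ∧ dy and the helper names are hypothetical, and the partial derivatives of Γ are approximated by central differences.

```python
def two_form_value(coeffs, U, V):
    """Value of A dy^dz + B dz^dx + C dx^dy on the pair of vectors (U, V)."""
    A, B, C = coeffs
    return (A * (U[1] * V[2] - U[2] * V[1])
            + B * (U[2] * V[0] - U[0] * V[2])
            + C * (U[0] * V[1] - U[1] * V[0]))

def surface_integral(omega, Gamma, n=200, h=1e-6):
    """Midpoint-rule integral of the 2-form omega over Gamma defined on [0,1]^2."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * step
        for j in range(n):
            v = (j + 0.5) * step
            point = Gamma(u, v)
            Gu = [(a - b) / (2 * h) for a, b in zip(Gamma(u + h, v), Gamma(u - h, v))]
            Gv = [(a - b) / (2 * h) for a, b in zip(Gamma(u, v + h), Gamma(u, v - h))]
            total += two_form_value(omega(*point), Gu, Gv) * step * step
    return total

omega = lambda x, y, z: (y, 0.0, z)           # y dy^dz + z dx^dy
Gamma = lambda u, v: (u, v, u * v)
Gamma_swapped = lambda v, u: Gamma(u, v)      # roles of the variables interchanged

direct = surface_integral(omega, Gamma)
swapped = surface_integral(omega, Gamma_swapped)
print(direct, swapped)  # approximately -1/12 and +1/12
```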

Another issue we have to deal with is to prove that the notion of integral
for 2-forms does not depend on the particular choice of the parameterization,
but only on the shape Γ(D) of the surface together with the orientation, that is,
a statement similar to Proposition 10.2.1.
Proposition 10.5.1. Let Γ : D → R^n be a $C^1$ surface with boundary, ω a
continuous 2-form defined on a set containing Γ(D) and h : D̃ → D a $C^1$
bijection with positive jacobian. Then Γ̃ = Γ ∘ h : D̃ → R^n is a piecewise $C^1$ surface
and
\[ \int_{\tilde\Gamma} \omega = \int_\Gamma \omega. \]

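Proof. Writing the change of variables as h(s, t) = (u(s, t), v(s, t)) and
substituting into the expression of the first integral (some variables are omitted
for the sake of readability) we have
\[ \begin{aligned}
& \iint_{\tilde D} \omega\bigl(\tilde\Gamma(s,t)\bigr)\left( \frac{\partial\tilde\Gamma}{\partial s}(s,t), \frac{\partial\tilde\Gamma}{\partial t}(s,t) \right) ds\,dt \\
&= \iint_{\tilde D} \omega\bigl(\tilde\Gamma(s,t)\bigr)\left( \frac{\partial\Gamma}{\partial u}\frac{\partial u}{\partial s} + \frac{\partial\Gamma}{\partial v}\frac{\partial v}{\partial s},\ \frac{\partial\Gamma}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial\Gamma}{\partial v}\frac{\partial v}{\partial t} \right) ds\,dt \\
&= \iint_{\tilde D} \omega\bigl(\tilde\Gamma(s,t)\bigr)\left( \frac{\partial\Gamma}{\partial u}, \frac{\partial\Gamma}{\partial v} \right)\left( \frac{\partial u}{\partial s}\frac{\partial v}{\partial t} - \frac{\partial v}{\partial s}\frac{\partial u}{\partial t} \right) ds\,dt \\
&= \iint_{\tilde D} \omega\bigl(\Gamma(u(s,t),v(s,t))\bigr)\left( \frac{\partial\Gamma}{\partial u}(u(s,t),v(s,t)), \frac{\partial\Gamma}{\partial v}(u(s,t),v(s,t)) \right) \frac{\partial(u,v)}{\partial(s,t)}\, ds\,dt \\
&= \iint_D \omega\bigl(\Gamma(u,v)\bigr)\left( \frac{\partial\Gamma}{\partial u}(u,v), \frac{\partial\Gamma}{\partial v}(u,v) \right) du\,dv
\end{aligned} \]
where in the first equality we have used the chain rule for derivatives, in the
second equality the bilinearity and antisymmetry of the 2-form, the third equality
just makes explicit the variables involved, and finally the last equality is
due to the change of variables formula for the integral.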

Now we will consider a more general type of surfaces. We say that a
connected set in R^n is an oriented piecewise $C^1$ surface if it is the union of the
images of finitely many parameterized $C^1$ surfaces with boundary, those surfaces
intersect only at points of their boundaries, the intersection, when it happens,
is a non-trivial curve, and finally the orientations induced by the
parameterizations on each piece are compatible. This has a clear meaning
for surfaces in R^3 if we think of orientation with the help of the normal vector
field. In this 3-dimensional setting there is an important example. Assume
that a compact set with nonempty interior has a boundary which is made up
of finitely many parameterized surfaces (with boundary). Then there is a
natural standard orientation: the normal field points to the exterior of the set.
Thus, in the case of an oriented piecewise $C^1$ surface $\Gamma = \Gamma_1 + \cdots + \Gamma_m$, where the $\Gamma_k$
are parameterized $C^1$ pieces, we define
\[ \int_\Gamma \omega = \sum_{k=1}^m \int_{\Gamma_k} \omega \]
for any 2-form ω defined on a domain containing Γ. It is not difficult to check that
the definition does not depend on how Γ is decomposed into $C^1$ parameterized
pieces. For instance, the sphere needs such a decomposition and it can be done
in infinitely many ways. Moreover, removing one point from the sphere, the
remainder is a parameterized surface, and one point less does not matter when
it comes to integration.

10.6 Gauss and Stokes


This section will be developed in R^3, providing results analogous and related
to the Green-Riemann formula. We say that a bounded open domain E ⊂ R^3
with $C^1$ boundary is elemental with respect to the XY plane if its boundary is
contained in the union of the "cylinder" {(x, y, z) : (x, y) ∈ ∂D}, where D is the orthogonal
projection of E onto the XY plane, and the graphs of two $C^1$ functions f, g :
D → R with f > g. Elemental domains with respect to the YZ
and the XZ planes are defined analogously.
Lemma 10.6.1. Let E ⊂ R^3 be an elemental domain with respect to the XY
plane and R(x, y, z) a $C^1$ function defined on a domain containing E. Then
\[ \int_{\partial E} R\,dx \wedge dy = \iiint_E \frac{\partial R}{\partial z}\,dx\,dy\,dz. \]

Proof. The integral of the 2-form on those parts of ∂E contained in the
"cylinder" is null (from the geometrical point of view, the field R dx ∧ dy is
vertical while the normal vectors of the cylinder are horizontal). The upper
and lower parts of the boundary are given by the parameterizations $\Gamma_1(x,y) =
(x, y, f(x,y))$ and $\Gamma_2(x,y) = (x, y, g(x,y))$ with (x, y) ∈ D, where the second one has
to be reversed to agree with the orientation (towards the exterior). As
we have
\[ (dx \wedge dy)\left( \frac{\partial\Gamma_1}{\partial x}, \frac{\partial\Gamma_1}{\partial y} \right) = (dx \wedge dy)\left( \frac{\partial\Gamma_2}{\partial x}, \frac{\partial\Gamma_2}{\partial y} \right) = 1 \]
then
\[ \begin{aligned}
\int_{\partial E} R\,dx \wedge dy &= \iint_D R(x,y,f(x,y))\,dx\,dy - \iint_D R(x,y,g(x,y))\,dx\,dy \\
&= \iint_D \bigl( R(x,y,f(x,y)) - R(x,y,g(x,y)) \bigr)\,dx\,dy = \iint_D \left( \int_{g(x,y)}^{f(x,y)} \frac{\partial R}{\partial z}\,dz \right) dx\,dy \\
&= \iiint_E \frac{\partial R}{\partial z}\,dx\,dy\,dz
\end{aligned} \]
as wanted.

Since the analogous results are true for elemental domains with respect to
the YZ and XZ planes we have the following.
Theorem 10.6.2 (Gauss-Ostrogradsky). Let E ⊂ R^3 be a domain which is
elemental with respect to the three planes XY, YZ and XZ and let P dy ∧ dz +
Q dz ∧ dx + R dx ∧ dy be a 2-form which is $C^1$ on a domain containing E. Then
\[ \int_{\partial E} P\,dy \wedge dz + Q\,dz \wedge dx + R\,dx \wedge dy = \iiint_E \left( \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} \right) dx\,dy\,dz. \]
Moreover, the same formula holds if E is a domain that can be decomposed
into a finite non-overlapping union of domains with $C^1$ boundaries which
are elemental with respect to the three coordinate planes.
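Proof. The result is just the sum of the three equalities
\[ \int_{\partial E} P\,dy \wedge dz = \iiint_E \frac{\partial P}{\partial x}\,dx\,dy\,dz \]
\[ \int_{\partial E} Q\,dz \wedge dx = \iiint_E \frac{\partial Q}{\partial y}\,dx\,dy\,dz \]
\[ \int_{\partial E} R\,dx \wedge dy = \iiint_E \frac{\partial R}{\partial z}\,dx\,dy\,dz \]
where the last one comes from the previous lemma and the other two are
the analogues that can be obtained by switching the coordinate planes.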
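
The three equalities can be tested on the unit cube [0, 1]³, which is elemental with respect to the three coordinate planes although its boundary is only piecewise $C^1$; that the formula applies there is an assumption of this sketch. The coefficients P = x²y, Q = yz, R = z² are a hypothetical example, and on each pair of opposite faces the outward orientation gives a difference of values.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]

def flux_through_unit_cube(P, Q, R, n=40):
    """Integral of P dy^dz + Q dz^dx + R dx^dy over the boundary of [0,1]^3,
    oriented towards the exterior."""
    w = 1.0 / (n * n)
    pts = midpoints(n)
    total = 0.0
    for s in pts:
        for t in pts:
            total += (P(1, s, t) - P(0, s, t)) * w   # faces x = 1 and x = 0
            total += (Q(s, 1, t) - Q(s, 0, t)) * w   # faces y = 1 and y = 0
            total += (R(s, t, 1) - R(s, t, 0)) * w   # faces z = 1 and z = 0
    return total

def volume_integral(func, n=40):
    """Midpoint-rule triple integral of func over [0,1]^3."""
    pts = midpoints(n)
    w = 1.0 / n**3
    return sum(func(x, y, z) for x in pts for y in pts for z in pts) * w

# Hypothetical example
P = lambda x, y, z: x * x * y
Q = lambda x, y, z: y * z
R = lambda x, y, z: z * z
divergence = lambda x, y, z: 2 * x * y + z + 2 * z   # dP/dx + dQ/dy + dR/dz

flux = flux_through_unit_cube(P, Q, R)
volume = volume_integral(divergence)
print(flux, volume)  # both close to 2
```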
Remark 10.6.3. The theorem of Gauss-Ostrogradsky can be proved with a
degree of generality similar to that of the Green-Riemann theorem, but the extra
work is not worth the effort.
Now we will obtain a result which relates the integral of a 1-form along
the relative boundary (the "free points" of the boundaries of the pieces) of an
oriented piecewise $C^1$ surface and the integral of its exterior differential over
that surface. Firstly, given an oriented piecewise $C^1$ surface S we have to assign
an orientation to the relative boundary ∂S. That will be the anticlockwise
orientation when we look at the surface from "above", that is, from the side
the normal vectors point towards. We will say that a piece of the surface is
flat with respect to the plane XY if it can be represented as the graph of a
function z = f(x, y).
Theorem 10.6.4 (Stokes). Let S be an oriented piecewise $C^2$ surface
and let ω be a 1-form which is $C^1$ on a domain containing S. Then
\[ \int_{\partial S} \omega = \int_S d\omega. \]

Proof. Decomposing the surface into $C^2$ pieces, we just have to prove the
result for each piece, which is a $C^2$ surface with boundary. Indeed, the surface
integrals are additive and the integrals of the 1-form cancel on the shared
parts of the boundaries of the pieces. With the help of the implicit function theorem
we may decompose the surface into smaller flat pieces (remember that the
function representing the surface can be extended smoothly beyond its domain).
Therefore we may assume that S is $C^2$ and flat with respect to XY (the other
two cases are obtained likewise). Assume now that S is represented
as z = f(x, y) with (x, y) ∈ D, a domain with $C^1$ boundary. In order to
prove the result we are going to develop both members of the equality. Put
ω = P dx + Q dy + R dz and let (X(t), Y(t)) with t ∈ [a, b] be a parameterization of
∂D. Thus we have
\[ \begin{aligned}
\int_{\partial S} \omega &= \int_a^b \left[ P(\cdot)X'(t) + Q(\cdot)Y'(t) + R(\cdot)\left( \frac{\partial f}{\partial x}(\cdot\cdot)X'(t) + \frac{\partial f}{\partial y}(\cdot\cdot)Y'(t) \right) \right] dt \\
&= \int_a^b \left[ \left( P(\cdot) + R(\cdot)\frac{\partial f}{\partial x}(\cdot\cdot) \right) X'(t) + \left( Q(\cdot) + R(\cdot)\frac{\partial f}{\partial y}(\cdot\cdot) \right) Y'(t) \right] dt \\
&= \int_a^b \bigl[ p(\cdot\cdot)X'(t) + q(\cdot\cdot)Y'(t) \bigr]\,dt = \int_{\partial D} p\,dx + q\,dy
\end{aligned} \]
where (·) = (X(t), Y(t), f(X(t), Y(t))), (··) = (X(t), Y(t)) and
\[ p(x,y) = P(x,y,f(x,y)) + R(x,y,f(x,y))\,\frac{\partial f}{\partial x}(x,y), \]
\[ q(x,y) = Q(x,y,f(x,y)) + R(x,y,f(x,y))\,\frac{\partial f}{\partial y}(x,y). \]
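Therefore
\[ \int_{\partial S} \omega = \int_{\partial D} p\,dx + q\,dy = \iint_D \left( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \right) dx\,dy \]
where the last equality is thanks to the Green-Riemann formula. Now we have
to compute
\[ \frac{\partial p}{\partial y} = \frac{\partial P}{\partial y} + \frac{\partial P}{\partial z}\frac{\partial f}{\partial y} + \left( \frac{\partial R}{\partial y} + \frac{\partial R}{\partial z}\frac{\partial f}{\partial y} \right)\frac{\partial f}{\partial x} + R\,\frac{\partial^2 f}{\partial x \partial y} \]
\[ \frac{\partial q}{\partial x} = \frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z}\frac{\partial f}{\partial x} + \left( \frac{\partial R}{\partial x} + \frac{\partial R}{\partial z}\frac{\partial f}{\partial x} \right)\frac{\partial f}{\partial y} + R\,\frac{\partial^2 f}{\partial y \partial x} \]
where the variables have been omitted for the sake of readability. Since f is $C^2$
the mixed second derivatives coincide, and the terms containing ∂R/∂z cancel as well. Therefore
\[ \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} = \frac{\partial Q}{\partial x} + \frac{\partial Q}{\partial z}\frac{\partial f}{\partial x} + \frac{\partial R}{\partial x}\frac{\partial f}{\partial y} - \frac{\partial P}{\partial y} - \frac{\partial P}{\partial z}\frac{\partial f}{\partial y} - \frac{\partial R}{\partial y}\frac{\partial f}{\partial x} = (*) \]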
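Now we are going to compute the surface integral of the statement. Firstly
\[ d\omega = \left( \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} \right) dy \wedge dz + \left( \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x} \right) dz \wedge dx + \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx \wedge dy \]
which implies
\[ d\omega(U,V) = -\frac{\partial R}{\partial y}\frac{\partial f}{\partial x} + \frac{\partial Q}{\partial z}\frac{\partial f}{\partial x} - \frac{\partial P}{\partial z}\frac{\partial f}{\partial y} + \frac{\partial R}{\partial x}\frac{\partial f}{\partial y} + \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = (**) \]
where $U = (1, 0, \frac{\partial f}{\partial x})$ and $V = (0, 1, \frac{\partial f}{\partial y})$. The equality (∗) = (∗∗) completes the
proof of the theorem.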
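
The reduction used in the proof can be followed on a concrete flat piece. The sketch below is only an illustration under stated assumptions: the surface is the graph of the hypothetical function f(x, y) = x² + xy over the unit disk and the 1-form is ω = z dx + x dy + y dz, for which dω = dy ∧ dz + dz ∧ dx + dx ∧ dy and hence dω(U, V) = 1 − ∂f/∂x − ∂f/∂y.

```python
import math

# Hypothetical graph z = f(x, y) over the unit disk, with its partial derivatives
f = lambda x, y: x * x + x * y
fx = lambda x, y: 2 * x + y
fy = lambda x, y: x

def boundary_integral(n=20000):
    """Integral of z dx + x dy + y dz along the lifted unit circle."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        X, Y = math.cos(t), math.sin(t)
        dX, dY = -math.sin(t), math.cos(t)
        Z = f(X, Y)
        dZ = fx(X, Y) * dX + fy(X, Y) * dY      # chain rule along the boundary
        total += (Z * dX + X * dY + Y * dZ) * h
    return total

def surface_integral(n=400):
    """Integral over the disk of d omega(U, V) = 1 - f_x - f_y (polar coordinates)."""
    hr = 1.0 / n
    ht = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            t = (j + 0.5) * ht
            x, y = r * math.cos(t), r * math.sin(t)
            total += (1 - fx(x, y) - fy(x, y)) * r * hr * ht
    return total

line_side = boundary_integral()
surface_side = surface_integral()
print(line_side, surface_side)  # both close to pi
```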

Remark 10.6.5. The hypothesis $C^2$ in the last theorem contrasts with the $C^1$
assumption in previous results. This is a consequence of the chosen method of
proof; the result can be proved under weaker hypotheses.
Many of the previous results can be expressed in terms of the relation
between the integrals of a (k − 1)-form and its exterior differential, which is a
k-form, on the (k−1)-dimensional smooth boundary of a k-dimensional object,
respectively. Now we are ready for this new point of view:
1. Let γ : [a, b] → R^n be a parameterized piecewise $C^1$ curve (injective). Its
relative boundary is ∂γ = {γ(a), γ(b)}. The orientation of γ induces an
orientation on that two-point set, that is, a distinction between its points. Given a 0-form,
that is, a scalar function f, we define $\int_{\partial\gamma} f = f(\gamma(b)) - f(\gamma(a))$. With
this notation Proposition 10.2.3 becomes
\[ \int_\gamma df = \int_{\partial\gamma} f. \]

2. If ω = p dx + q dy is a 1-form on R^2 which is $C^1$, its exterior differential
is the 2-form
\[ d(p\,dx + q\,dy) = dp \wedge dx + dq \wedge dy = \left( \frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} \right) dx \wedge dy. \]
Since the space of 2-forms in R^2 has dimension 1, they are assimilable to scalar
functions. Taking $\int_D f\,dx \wedge dy = \iint_D f\,dx\,dy$, the Green-Riemann formula
(Theorem 10.3.4) becomes
\[ \int_{\partial D} \omega = \int_D d\omega. \]

3. Let ω = P dy ∧ dz + Q dz ∧ dx + R dx ∧ dy be a 2-form in R^3. Its exterior
differential is a 3-form, which is assimilable to a scalar function because
there is only one basic element in dimension 3, namely dx ∧ dy ∧ dz. The
formula for the exterior differential was hinted at in Section 10.4
\[ d\omega = \left( \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z} \right) dx \wedge dy \wedge dz. \]
After this, the Gauss-Ostrogradsky Theorem 10.6.2 becomes
\[ \int_{\partial E} \omega = \int_E d\omega. \]

4. And, of course, Stokes Theorem 10.6.4 itself follows the same scheme.

All these results can be summed up into a general Cartan-Stokes Theorem
proved in the frame of the theory of differential forms on manifolds.

10.7 Rationale and remarks


According to the syllabus, we will consider only the integration of 1-forms and
2-forms. Occasionally, scalar functions will be considered 0-forms, and 3-forms
appear implicitly in the necessary conditions for a 2-form to be exact.
Therefore, the exterior multiplication is only defined for 1-forms and the definition
of exterior derivative is a restricted one.

The development of the theory is quite standard. Only two comments: the
topological facts are not much stressed (no mention of simply connected
domains, homotopy or homology), and the proof of the Green-Riemann
formula is a real one, meaning that it is not based on the a priori existence of a
nice decomposition of the domain. Such a struggle is not repeated for the R^3
theorems, though.

An interesting comment could be that some differential equations that have
local solutions may or may not have global solutions depending on the domain where
they are considered, and that is a purely topological matter (students may be
advised to look for information on de Rham cohomology).

10.8 Exercises
1. Calculate the integral
\[ \int_\gamma y\,dx - x\,dy \]
where γ is the triangle with vertices (0, 0), (1, 0) and (0, 1), oriented in
the order in which the vertices are listed.

2. Calculate the integral
\[ \int_\gamma (y - z)\,dx + (z - x)\,dy + (x - y)\,dz \]
where γ is the triangle with vertices (a, 0, 0), (0, b, 0) and (0, 0, c) with
a, b, c > 0, oriented in the order in which the vertices are listed.

3. Calculate the integral
\[ \int_\gamma z^2\,dx + x^2\,dy + y^2\,dz, \]
where γ is the spherical triangle with vertices (a, 0, 0), (0, a, 0) and (0, 0, a),
on the sphere centred at (0, 0, 0) with radius a > 0.

4. Consider the differential form
\[ \omega(x,y,z) = \frac{2x}{z}\,dx + \frac{2y}{z}\,dy + \left( 1 - \frac{x^2 + y^2}{z^2} \right) dz, \]
defined on the set
\[ A = \{(x,y,z) \in \mathbb{R}^3 : z \ne 0\}. \]
Show that it is exact and find all its primitives.

5. Find all the functions $\varphi, \psi \in C^1(\mathbb{R})$ with ψ(0) = 0 such that the
differential form
\[ \omega(x,y,z) = (z + z^2)\,dx + \varphi(y)\,\psi(z)\,dy + \bigl( x + 2z(x + y^2) \bigr)\,dz \]
is exact. Then, find all its primitives.

6. Interpret geometrically the integral of the differential form
\[ \omega = \frac{-y\,dx + x\,dy}{x^2 + y^2}, \]
along a closed curve, walked counterclockwise, that encloses the origin. Then
deduce the value of the following integral
\[ \int_0^{2\pi} \frac{dt}{a^2\cos^2 t + b^2\sin^2 t}. \]

7. Calculate
\[ \iint_\varphi z\,dx \wedge dy, \]
where φ is the parameterized surface
\[ \{\varphi(u,v) := (u + v,\ u^2 + v^2,\ u - v) : u, v \in [-1, 1]\}. \]

8. Prove that on an oriented surface M there is a 2-form ω such that, for
every surface with boundary N ⊂ M, the area of N is the absolute
value of $\int_N \omega$.
9. Given the 1-forms on R^3
\[ \omega_1(x_1,x_2,x_3) = x_1\,dx_1 - dx_3 \]
\[ \omega_2(x_1,x_2,x_3) = 2x_3^2\,dx_1 \]
\[ \omega_3(x_1,x_2,x_3) = dx_1 - x_2 x_3\,dx_2 \]
find $\omega = (2\omega_1 - x_2\,\omega_3) \wedge \omega_2$.
10. Given $f(x,y) = x^2 y$, $g(x,y) = x y^2$, find the simplest expression for
\[ \omega(x,y) = df(x,y) \wedge dg(x,y). \]

11. Given two differentiable functions f and g on R^3, assume that
\[ df \wedge dg = \lambda\,dx \wedge dy \]
where λ is a non-null function on R^3. Prove that f and g depend only
on x, y.

12. Compute the exterior derivative of the form
\[ \omega(x,y,z) = y z^2 (\cos xy)\,dx + x z^2 (\cos xy)\,dy + (x + y)\,dz. \]

13. Compute the exterior derivative of the form
\[ \omega(x,y,z) = (2xy + y^2)\,dx + (x^2 + 2xy)\,dy + 3z^2\,dz, \]
and interpret the result.

14. Find all the primitives of the 2-form on R^2
\[ \omega(x,y) = (2x + y - 3x^2 y^2)\,dx \wedge dy. \]

15. Find all the primitives of the 2-form on R^3
\[ \omega(x,y,z) = (y^3 - x^3)\,dx \wedge dy + (x - 2z)\,dy \wedge dz + (2z - y)\,dz \wedge dx. \]

16. Given the 1-form
\[ \omega(x,y,z) = y\,dx - x\,dy + dz, \]
find conditions on the $C^1$ functions u(x, y, z) and v(x, y, z) for the
form ω − v du to be closed. Show that u and v do not depend on z.

17. Let m ≥ 0 and take $r = (x^2 + y^2 + z^2)^{1/2}$ as usual. Find a function f(r)
such that if $\vec F = f(r)(x, y, z)$, then $\operatorname{div}(\vec F) = r^m$. Apply that to express
the integral
\[ \iiint_D r^m\,dx\,dy\,dz \]
in terms of a surface integral on ∂D (regularity is assumed). Find the
value for D = B(0, 1).

18. Let f : R^2 → R be an arbitrary $C^1$ function and consider the 2-form
\[ \omega = f(x,y)\,dx \wedge dy + x^2 y\,dy \wedge dz - x y^2\,dz \wedge dx. \]
Show that ω is exact and find a primitive.

19. Let ω be a $C^1$ 1-form defined on R^3 \ {0}. Show that if ω is closed, then
it is exact. What is the role of the dimension?

20. Let ω be a $C^1$ 1-form defined on R^n \ {0}. Let $R_0 \subset \mathbb{R}^n$ be a half-line
starting at 0 such that ω has primitives on the set $\mathbb{R}^n \setminus R_0$. Prove that ω
has primitives on $\mathbb{R}^n \setminus R$ for R any half-line starting at 0. Does ω have
a primitive on R^n \ {0}?

21. Prove that the following 2-form is closed
\[ \frac{x\,dy \wedge dz}{(x^2 + y^2 + z^2)^{3/2}} + \frac{y\,dz \wedge dx}{(x^2 + y^2 + z^2)^{3/2}} + \frac{z\,dx \wedge dy}{(x^2 + y^2 + z^2)^{3/2}}, \]
and find a primitive defined on a maximal open subset of R^3.

Chapter 11

Classic Vector Analysis

11.1 Operations with vectors in R3


The geometrical interpretation in R^2 of the arithmetical operations in the field
of complex numbers C, thanks to Argand's diagram, is of great help in plane
geometry. That motivated Hamilton to seek a similar arithmetic structure for
R^3. After several fruitless attempts, in 1843 he came up with the idea of defining the
operation in R^4 instead of R^3 and giving up commutativity. He defined
the quaternions as an extension of the complex numbers (and so of the real
ones) consisting of the formal expressions
a + bi + cj + dk
where i, j, k are imaginary units that interact with each other according to these
rules
\[ i^2 = j^2 = k^2 = -1; \quad ij = -ji = k; \quad jk = -kj = i; \quad ki = -ik = j. \]
The real numbers are interpreted as those quaternions such that b = c = d = 0, so
a is called the scalar part and bi + cj + dk is called the vector part of the
quaternion. The arithmetic operations between quaternions are performed
using distributivity, implicitly assumed, in order to reduce the result to the
canonical form above with the help of the relationships between units, minding
that the order matters. Following these rules, the product turns out to be
associative and every nonzero element has an inverse (both left and right).
Namely, the inverse of a + bi + cj + dk is
\[ \frac{a - bi - cj - dk}{a^2 + b^2 + c^2 + d^2}. \]

There is an obvious analogy with complex numbers. The term in the
numerator is called the conjugate and the real number in the denominator is the square
of the modulus. As it happens with complex numbers, the
modulus is multiplicative.

Now we will consider quaternions with real part zero, which are called
purely imaginary and can be identified with the elements of R^3. The product of two purely
imaginary quaternions is not purely imaginary in general
\[ (x_1 i + y_1 j + z_1 k)(x_2 i + y_2 j + z_2 k) = -(x_1 x_2 + y_1 y_2 + z_1 z_2) + (y_1 z_2 - z_1 y_2)\,i + (z_1 x_2 - x_1 z_2)\,j + (x_1 y_2 - y_1 x_2)\,k \]
The scalar part of the result, after changing the sign, can be identified with the
Euclidean scalar product of vectors. The vector part of the product is called
the vector product. If u, v are quaternions with null scalar part, their product
can be represented as
\[ uv = -u \cdot v + u \times v \]
u × v being the vector product. Since the modulus is multiplicative we have
\[ |u|^2 |v|^2 = |u \cdot v|^2 + |u \times v|^2 \]
It is not difficult to check that
\[ u \cdot (u \times v) = v \cdot (v \times u) = 0 \]
which means that u × v is orthogonal to both u and v, so its direction is well
determined in R^3 if u and v are linearly independent. We also have a consequence of
the non-commutativity: u × v = −v × u.
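
The rules for the units determine the product completely. As a minimal sketch, quaternions are represented below as 4-tuples (a, b, c, d) standing for a + bi + cj + dk; the function name `qmul` is a hypothetical choice.

```python
def qmul(p, q):
    """Product of quaternions given as tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# Two purely imaginary quaternions, identified with vectors of R^3
u = (0, 1, 2, 3)
v = (0, 4, 5, 6)
product = qmul(u, v)
scalar_product = -product[0]      # u . v
vector_product = product[1:]      # u x v
print(scalar_product, vector_product)  # 32 (-3, 6, -3)
```

With these vectors one can also confirm by hand that |u|²|v|² = 14 · 77 = 1078 = 32² + 54.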

Some time after the discovery of the quaternions it became clear that, in
order to deal with the Euclidean geometry of R^3, the full
power of their algebraic structure is not necessary. We can work more easily in that frame
just keeping the scalar and vector products once we know their properties.
Let us note that the easiest method to compute the vector product without
appealing to quaternions is the following symbolic determinant
\[ u \times v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} \]
where $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$. Since most of the vectors will be
crowned with a little arrow in the following sections, we will denote from now
on the basis of R^3 derived from the quaternions by $\{\vec{i}, \vec{j}, \vec{k}\}$.

11.2 Differential forms on R3
Throughout this chapter we will consider scalar and vector fields in R^3. These are
simply real functions and functions with values in R^3 defined on some open
domain of R^3, often the whole space. We will follow this terminology (fields)
in order to stress the different nature of the domain, which is made
of points, and the range, which can be made of either numbers or vectors. As
far as vector fields are concerned, it is worth noticing that they can be interpreted both
as differential 1-forms and as 2-forms. Firstly we will establish the identification
between the vectors of R^3 and the alternating forms of degrees 1 and 2 on R^3,
whose respective spaces have dimension 3. Such an identification can be
done with the help of the bases, namely
\[ a\vec{i} + b\vec{j} + c\vec{k} \longleftrightarrow a\,dx + b\,dy + c\,dz; \]
\[ a\vec{i} + b\vec{j} + c\vec{k} \longleftrightarrow a\,dy \wedge dz + b\,dz \wedge dx + c\,dx \wedge dy. \]


These associations are canonical in the sense that the actions of the forms on
vectors $\vec{u}, \vec{v} \in \mathbb{R}^3$, with $\vec{u} = (u_1, u_2, u_3)$, $\vec{v} = (v_1, v_2, v_3)$, can be represented by
the scalar and vector products of the previous section: for 1-forms as
\[ (a\,dx + b\,dy + c\,dz)(\vec{u}) = (a\vec{i} + b\vec{j} + c\vec{k}) \cdot \vec{u} = a u_1 + b u_2 + c u_3 \]
and for 2-forms as follows
\[ (a\,dy \wedge dz + b\,dz \wedge dx + c\,dx \wedge dy)(\vec{u}, \vec{v}) = (a\vec{i} + b\vec{j} + c\vec{k}) \cdot (\vec{u} \times \vec{v}) = \begin{vmatrix} a & b & c \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}. \]
The proof of these equalities can be reduced to checking them on pairs or triplets
of basic vectors thanks to linearity. For instance, for the second one on the
triplet $(\vec{i}, \vec{j}, \vec{k})$ we have $(dy \wedge dz)(\vec{j}, \vec{k}) = 1$ and $\vec{i} \cdot (\vec{j} \times \vec{k}) = \vec{i} \cdot \vec{i} = 1$.
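
The identification for 2-forms can be checked on arbitrary vectors, not only on the basic ones. The sketch below evaluates the 2-form from the definition of the exterior product and compares it with the mixed product and with the 3 × 3 determinant; the numerical data and helper names are hypothetical.

```python
def cross(u, v):
    """Vector product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Euclidean scalar product."""
    return sum(a * b for a, b in zip(u, v))

def det3(r1, r2, r3):
    """Determinant of the 3 x 3 matrix with rows r1, r2, r3."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

def two_form(a, b, c):
    """The form a dy^dz + b dz^dx + c dx^dy as a function of two vectors."""
    return lambda u, v: (a * (u[1] * v[2] - u[2] * v[1])
                         + b * (u[2] * v[0] - u[0] * v[2])
                         + c * (u[0] * v[1] - u[1] * v[0]))

coeffs = (2, -1, 3)
u, v = (1, 2, 3), (4, 5, 6)
values = (two_form(*coeffs)(u, v), dot(coeffs, cross(u, v)), det3(coeffs, u, v))
print(values)  # (-21, -21, -21)
```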

Once we know how to switch from the language of differential forms to the language of vector
analysis, we can establish a relation between the integration of forms
and the integration on curves and surfaces (see the corresponding part in previous
chapters). Using the relations above, any differential form $\omega$ of degree
1 or 2 defined on a domain of $\mathbb{R}^3$ can be transformed into a vector field $\vec F$. If
$\gamma(t)$, $t\in[a,b]$, is a parameterized curve we have

$$\int_\gamma \omega = \int_a^b \omega(\gamma(t))(\gamma'(t))\,dt = \int_a^b \vec F(\gamma(t))\cdot\gamma'(t)\,dt = \int_\gamma \vec F\cdot d\vec\ell.$$

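Now, if $\Gamma$ is a parameterized $C^1$ surface with boundary (with domain $D$) and
$\omega$ is a 2-form that is identified with a vector field $\vec F$, then

$$\iint_\Gamma \omega = \iint_D \omega(\Gamma)\left(\frac{\partial\Gamma}{\partial u},\frac{\partial\Gamma}{\partial v}\right)du\,dv = \iint_D \vec F\cdot\left(\frac{\partial\Gamma}{\partial u}\times\frac{\partial\Gamma}{\partial v}\right)du\,dv = \iint_\Gamma \vec F\cdot d\vec S.$$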

These transformations show that the integration of forms can actually be expressed
as integration with respect to the intrinsic measures of curves
or surfaces. That implies the already proven result that the integration of
forms is invariant under a change of parameterization, except for a change of
sign in case the orientation is reversed.

11.3 Vector operators


The exterior differential acting on differential forms on $\mathbb{R}^3$ can adopt several
forms, after the identification with fields of the previous section, called vector
(differential) operators. Since the correspondence is made through a choice of
an orthonormal basis, the appearance of the differential operators strongly relies
on the associated coordinates "x, y, z" and the associated partial derivatives,
which may give the false impression that the canonical basis and the cartesian
coordinates are privileged. For that reason, we will stress the fact that
the vector operators are intrinsic, that is, they do not depend on the choice
of coordinates. This could be done by direct computation under an orthogonal
change of coordinates, or by proving that exterior differentiation is intrinsic.
However, we will choose alternative methods which moreover shed light on the
geometrical or physical meaning of the vector operators, which is basic for the
applications.

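Under some regularity hypotheses, there are several operations involving
differentiation that can be performed on scalar and vector fields. These
operations can be written with the help of a symbolic operator named nabla

$$\nabla = \left(\frac{\partial}{\partial x},\frac{\partial}{\partial y},\frac{\partial}{\partial z}\right) = \frac{\partial}{\partial x}\,\vec i + \frac{\partial}{\partial y}\,\vec j + \frac{\partial}{\partial z}\,\vec k.$$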

The first operation we will consider is well known: the gradient. Let us
recall that the gradient of a scalar field (function) $f$ is the vector field defined by

$$\nabla f = \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right).$$

Despite the fact that the gradient is defined in terms of the associated cartesian
coordinates, it has an intrinsic meaning. Indeed, its modulus is the maximum
value of the directional derivative over all vectors of norm one and, provided
it is not zero, the gradient points in the maximizing direction.


Given a vector field $\vec F = (f_1,f_2,f_3)$, its divergence is the scalar field defined
by

$$\nabla\cdot\vec F = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}.$$

It is not obvious from this definition that the divergence is an intrinsic notion.
That can be deduced by several methods, for instance by straightforward computation. It is
easier to remark that if we identify the vector field with a (differential) 2-form,
its divergence can be identified with its exterior differential. Therefore, if we
know that the exterior differential is an intrinsic notion independent of the
coordinate system, then the same is true for the divergence. The third method
we will give also provides an interpretation of the divergence. The Gauss-Ostrogradsky
theorem says, with this notation, that if $D\subset\mathbb{R}^3$ is a bounded
open domain and $\vec F$ is $C^1$ on a domain which includes $\overline{D}$, then

$$\iint_{\partial D}\vec F\cdot d\vec S = \iiint_D \nabla\cdot\vec F\,dV.$$

For that reason the divergence can be interpreted as the net rate of flux
leaving/entering a small volume around the point, per unit volume:

$$\nabla\cdot\vec F(x) = \lim_{\varepsilon\to0^+}\frac{\iint_{\partial B(x,\varepsilon)}\vec F\cdot d\vec S}{\mathrm{Vol}(B(x,\varepsilon))}.$$

Suppose that the vector field represents the velocity of a fluid. If the fluid is
incompressible, the net rate of flux is 0, as the amount of
fluid entering the ball equals the amount leaving it, so the divergence is 0. If the
fluid is not incompressible then the divergence represents variations in density at
that point. In other interpretations of vector fields the divergence represents
a magnitude related to the field that is created/destroyed at the point. For
instance, the divergence of the static electric force field represents, up to a constant factor, the charge
per unit volume.
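
The interpretation of the divergence as a flux density can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the test field `F`, the point and the radius are illustrative assumptions, and the flux through the small sphere is approximated with a midpoint rule on the angles $(\theta,\phi)$ of the usual spherical parameterization.

```python
import math

def flux_through_sphere(F, center, eps, n=100):
    """Midpoint-rule approximation of the outward flux of F through the
    sphere of radius eps centred at `center`."""
    cx, cy, cz = center
    dtheta, dphi = 2 * math.pi / n, math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        for j in range(n):
            phi = -math.pi / 2 + (j + 0.5) * dphi
            # outward unit normal N; the area element is eps^2 cos(phi) dtheta dphi
            nx = math.cos(theta) * math.cos(phi)
            ny = math.sin(theta) * math.cos(phi)
            nz = math.sin(phi)
            fx, fy, fz = F(cx + eps * nx, cy + eps * ny, cz + eps * nz)
            total += (fx * nx + fy * ny + fz * nz) * eps**2 * math.cos(phi) * dtheta * dphi
    return total

# Illustrative field (an assumption of this sketch): F = (sin x, y^2, x z),
# whose divergence is cos x + 2 y + x.
def F(x, y, z):
    return (math.sin(x), y * y, x * z)

def div_F(x, y, z):
    return math.cos(x) + 2 * y + x

point, eps = (0.3, -0.2, 0.5), 0.01
ratio = flux_through_sphere(F, point, eps) / (4 / 3 * math.pi * eps**3)
print(ratio, div_F(*point))  # the two numbers should be close
```

Decreasing `eps` (and increasing `n` accordingly) should bring the quotient closer to the divergence at the point, as the limit formula predicts.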


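Given a vector field $\vec F = (f_1,f_2,f_3)$, its rotational or curl is the vector field
defined by

$$\nabla\times\vec F = \left(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z},\ \frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x},\ \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\right).$$

Note that if we identify the vector field $\vec F$ with a 1-form $\omega$, then $\nabla\times\vec F$ can be
identified with $d\omega$. That shows that the definition of the rotational is intrinsic.
In order to have an interpretation we appeal to Stokes' theorem, which
can be rewritten in these terms. Let $\Gamma$ be an oriented parameterized $C^2$ surface
with boundary in $\mathbb{R}^3$; then

$$\int_{\partial\Gamma}\vec F\cdot d\vec\ell = \iint_\Gamma \nabla\times\vec F\cdot d\vec S.$$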

Consider a disc $D(x,\varepsilon,\vec n)$ with center at $x$, radius $\varepsilon$ and contained in a plane
perpendicular to a vector $\vec n$ of norm one. Then, provided that the orientation of
the disc is the one given by $\vec n$, we have

$$\nabla\times\vec F(x)\cdot\vec n = \lim_{\varepsilon\to0^+}\frac{\int_{\partial D(x,\varepsilon,\vec n)}\vec F\cdot d\vec\ell}{\pi\varepsilon^2}.$$

If $\vec F$ is a force field, then $\int_{\partial D(x,\varepsilon,\vec n)}\vec F\cdot d\vec r$ is the work done along a closed
circuit. In case the field is conservative this number is 0, which means that
a mass moving along the circle under the effect of the force does not gain kinetic energy
after completing a turn. If we place a tiny wheel at rest at $x$ whose axis is aligned with
$\vec n$, it will not turn in a conservative field. However, if the field is not
conservative, the wheel will turn with an impulse proportional to $\nabla\times\vec F(x)\cdot\vec n$.
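
As a sanity check of this interpretation, the circulation along a small circle can be compared with the normal component of the curl. The sketch below is only an illustration in plain Python: the field `F`, the orthonormal pair `e1`, `e2` spanning the plane of the disc, the center and the radius are all assumptions chosen for the example.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def circulation(F, center, e1, e2, eps, m=64):
    """Line integral of F along the circle center + eps (cos t e1 + sin t e2),
    0 <= t <= 2 pi, using m equally spaced nodes."""
    total = 0.0
    for k in range(m):
        t = 2 * math.pi * k / m
        c, s = math.cos(t), math.sin(t)
        point = [center[i] + eps * (c * e1[i] + s * e2[i]) for i in range(3)]
        tangent = [eps * (-s * e1[i] + c * e2[i]) for i in range(3)]
        total += dot(F(*point), tangent) * (2 * math.pi / m)
    return total

# Illustrative field (assumption): F = (-y + x z, x^2, y z), curl F = (z, x, 2x + 1).
def F(x, y, z):
    return (-y + x * z, x * x, y * z)

def curl_F(x, y, z):
    return (z, x, 2 * x + 1)

# Orthonormal pair spanning the plane of the disc; n = e1 x e2 orients it.
e1 = (2 / 3, -2 / 3, 1 / 3)
e2 = (2 / 3, 1 / 3, -2 / 3)
n = cross(e1, e2)  # the unit vector (1/3, 2/3, 2/3)

center, eps = (0.5, -1.0, 2.0), 0.01
ratio = circulation(F, center, e1, e2, eps) / (math.pi * eps**2)
print(ratio, dot(curl_F(*center), n))  # both should be close to 7/3
```

For this particular quadratic field the curl is an affine function, so the quotient already agrees with $\nabla\times\vec F\cdot\vec n$ at the center up to rounding; for a general field the agreement only holds in the limit $\varepsilon\to0^+$.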

Now we will consider an operator that involves second order derivatives, the
Laplacian, which is actually a combination of gradient and divergence. Given
a twice differentiable scalar field $f$, take

$$\Delta f = \nabla\cdot(\nabla f) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$

The Laplacian is also written $\nabla^2 f$. That the Laplacian is intrinsic is
clear; however, the combination of the interpretations of the gradient and the
divergence does not shed light on what the Laplacian means. For that reason
we are going to build a direct interpretation. Assume that $f$ has a second order Taylor expansion,
around $0 = (0,0,0)$ for simplicity, of the form

$$f(x,y,z) = f(0) + \frac{\partial f}{\partial x}(0)\,x + \frac{\partial f}{\partial y}(0)\,y + \frac{\partial f}{\partial z}(0)\,z$$
$$\quad + \frac12\left(\frac{\partial^2 f}{\partial x^2}(0)\,x^2 + \frac{\partial^2 f}{\partial y^2}(0)\,y^2 + \frac{\partial^2 f}{\partial z^2}(0)\,z^2 + 2\frac{\partial^2 f}{\partial x\partial y}(0)\,xy + 2\frac{\partial^2 f}{\partial x\partial z}(0)\,xz + 2\frac{\partial^2 f}{\partial y\partial z}(0)\,yz\right) + o(\|(x,y,z)\|^2).$$
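The integration over the sphere $\partial B(0,\varepsilon)$ leads to the cancellation of the first
order terms and the mixed ones (those containing $xy$, $xz$ and $yz$), so there remains

$$\iint_{\partial B(0,\varepsilon)} f\,dS = 4\pi\varepsilon^2 f(0) + \frac{4\pi\varepsilon^4}{6}\frac{\partial^2 f}{\partial x^2}(0) + \frac{4\pi\varepsilon^4}{6}\frac{\partial^2 f}{\partial y^2}(0) + \frac{4\pi\varepsilon^4}{6}\frac{\partial^2 f}{\partial z^2}(0) + o(\varepsilon^4)$$
$$= 4\pi\varepsilon^2 f(0) + \frac{4\pi\varepsilon^4}{6}\,\Delta f(0) + o(\varepsilon^4).$$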
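The trick to compute the integrals easily is the following: by symmetry we
obviously have

$$\iint_{\partial B(0,\varepsilon)} x^2\,dS = \iint_{\partial B(0,\varepsilon)} y^2\,dS = \iint_{\partial B(0,\varepsilon)} z^2\,dS$$

but

$$\iint_{\partial B(0,\varepsilon)} x^2\,dS + \iint_{\partial B(0,\varepsilon)} y^2\,dS + \iint_{\partial B(0,\varepsilon)} z^2\,dS = \iint_{\partial B(0,\varepsilon)} \varepsilon^2\,dS = 4\pi\varepsilon^4.$$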

Therefore the average of $f(x,y,z) - f(0,0,0)$ over the sphere for $\varepsilon > 0$ small
is

$$\frac{1}{4\pi\varepsilon^2}\iint_{\partial B(0,\varepsilon)} (f - f(0))\,dS = \frac{\varepsilon^2}{6}\,\Delta f(0) + o(\varepsilon^2).$$

Therefore, the Laplacian measures the difference between the average of the
function around a point and its value at the point. The functions whose
Laplacian is zero are called harmonic, and we will see later that their value at a
given point is actually the average of the values on spheres centred at it.
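
The relation between the Laplacian and spherical averages lends itself to a quick numerical experiment. The following sketch (plain Python; the function `f`, the point `p` and the radius are illustrative assumptions) averages $f - f(p)$ over a small sphere and rescales by $6/\varepsilon^2$, which should approximate $\Delta f(p)$.

```python
import math

def sphere_average(g, center, eps, n=200):
    """Average of g over the sphere of radius eps centred at `center`
    (midpoint rule in the angles theta, phi)."""
    cx, cy, cz = center
    dtheta, dphi = 2 * math.pi / n, math.pi / n
    total = weight = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        for j in range(n):
            phi = -math.pi / 2 + (j + 0.5) * dphi
            w = math.cos(phi) * dtheta * dphi  # area element divided by eps^2
            x = cx + eps * math.cos(theta) * math.cos(phi)
            y = cy + eps * math.sin(theta) * math.cos(phi)
            z = cz + eps * math.sin(phi)
            total += g(x, y, z) * w
            weight += w
    return total / weight

# Illustrative function (assumption): f = x^2 y + z^3 + cos x,
# whose Laplacian is 2 y + 6 z - cos x.
def f(x, y, z):
    return x * x * y + z**3 + math.cos(x)

def laplacian_f(x, y, z):
    return 2 * y + 6 * z - math.cos(x)

p, eps = (0.2, 0.4, -0.3), 0.05
mean_difference = sphere_average(lambda x, y, z: f(x, y, z) - f(*p), p, eps)
estimate = 6 * mean_difference / eps**2
print(estimate, laplacian_f(*p))  # the two numbers should be close
```

The estimate carries an error of order $\varepsilon^2$ from the Taylor remainder plus a small quadrature error, so only a few correct digits should be expected.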

The last operator we will consider is the Laplacian of a vector field, which
appears in applications to Electromagnetism. If $\vec F = (f_1,f_2,f_3)$ then we define

$$\Delta\vec F = (\Delta f_1,\Delta f_2,\Delta f_3).$$

The fact that this definition is intrinsic is a consequence of the following identity,
whose proof is left to the reader:

$$\Delta\vec A = \nabla(\nabla\cdot\vec A) - \nabla\times(\nabla\times\vec A).$$

11.4 Newtonian potential


The function

$$f(x,y,z) = \frac{1}{\sqrt{x^2+y^2+z^2}}$$

plays a very important role in the study of fields in $\mathbb{R}^3$, not only because it
appears in relation to classical physical fields such as the gravitational and the electromagnetic ones,
but also because its study will provide us with theoretical tools to deal with
very general mathematical problems.

We write its first order derivatives and its iterated second order derivatives:

$$\frac{\partial f}{\partial x} = \frac{-x}{(x^2+y^2+z^2)^{3/2}};\quad \frac{\partial f}{\partial y} = \frac{-y}{(x^2+y^2+z^2)^{3/2}};\quad \frac{\partial f}{\partial z} = \frac{-z}{(x^2+y^2+z^2)^{3/2}};$$

$$\frac{\partial^2 f}{\partial x^2} = \frac{2x^2-y^2-z^2}{(x^2+y^2+z^2)^{5/2}};\quad \frac{\partial^2 f}{\partial y^2} = \frac{2y^2-z^2-x^2}{(x^2+y^2+z^2)^{5/2}};\quad \frac{\partial^2 f}{\partial z^2} = \frac{2z^2-x^2-y^2}{(x^2+y^2+z^2)^{5/2}}.$$

These equalities show that the decay rate at infinity is $r^{-2}$ for the first
derivatives and $r^{-3}$ for the second order ones (including those not computed). Moreover,
adding the three second derivatives clearly gives $\Delta f = 0$, thus $f$ is harmonic on $\mathbb{R}^3\setminus\{0\}$. Now we will consider more
complicated functions built from the function above. In order to keep some
simplicity we introduce the notation $\vec p$ and $\vec r$ for points of $\mathbb{R}^3$; mainly the
first one will be a "free" variable and the second one an "integration" variable.
Assume we are given points $\vec r_1,\dots,\vec r_n$ and numbers $m_1,\dots,m_n$ that we will
call "charges" (or "masses"). The potential produced at $\vec p$ by the charges $m_k$
placed at the points $\vec r_k$ is

$$\Phi(\vec p) = \sum_{k=1}^{n}\frac{m_k}{\|\vec r_k - \vec p\|}.$$

Note that this function is harmonic except at the singularities $\vec r_k$. If we
think of the potential produced by many small charges we arrive naturally
at a generalization of the potential with the help of integration. Let $\mu$ be
a finite signed σ-additive Borel measure with compact support. Under these
assumptions the potential can be defined for points $\vec p$ outside the support of $\mu$ as

$$\Phi(\vec p) = \int\frac{d\mu(\vec r)}{\|\vec r - \vec p\|}.$$

This function is harmonic on the complement of the support of the measure,
as the interchange of differentiation and integration is not a problem in the absence
of singularities. However, the potential $\Phi$ could be defined at more points if
the integral is convergent, although we cannot say anything about the regularity
of $\Phi$ unless we make some assumptions on the measure $\mu$. As we will see later,
for compactly supported measures which are continuous with respect to the
Lebesgue measure (continuous densities from now on, appealing to the physical
origin) the potential $\Phi$ is defined everywhere. For measures that are not compactly
supported, if we impose special decay conditions on $\mu$ at infinity we may even
have the potential defined everywhere. Nevertheless some special cases are
treated by analogy with physical situations.

We will obtain a result which is basic in order to understand the behaviour
of the potential of continuous distributions, but first we have to consider a
particular distribution on a particularly nice surface: the sphere. Consider the
potential produced by a homogeneous charge located on a sphere of radius
$R > 0$ centred at 0. Assume that the density is $\rho$, that is, the quotient of the
charge by the area. By symmetry the potential must depend
only on the distance of the point $\vec p$ to the origin, that is, on the norm $\|\vec p\|$.
Therefore we may assume that the point lies on the positive part of the Z axis
and so $\vec p = (0,0,z_0)$ with $z_0 \ge 0$. Consider the following parameterization of
the sphere

$$\vec r(\theta,\phi) = \begin{cases} x = R\cos\theta\cos\phi;\\ y = R\sin\theta\cos\phi;\\ z = R\sin\phi. \end{cases}$$

The area transformation factor is $R^2\cos\phi$ and the distance to $\vec p$ satisfies

$$\|\vec r - \vec p\|^2 = R^2\cos^2\theta\cos^2\phi + R^2\sin^2\theta\cos^2\phi + (R\sin\phi - z_0)^2 = R^2 - 2Rz_0\sin\phi + z_0^2.$$

The potential at $\vec p$ is given by

$$\Phi(\vec p) = \int_0^{2\pi}\!\!\int_{-\pi/2}^{\pi/2}\frac{\rho R^2\cos\phi\,d\phi\,d\theta}{\sqrt{R^2 - 2Rz_0\sin\phi + z_0^2}} = 2\pi\int_{-\pi/2}^{\pi/2}\frac{\rho R^2\cos\phi\,d\phi}{\sqrt{R^2 - 2Rz_0\sin\phi + z_0^2}} = -\frac{2\pi\rho R}{z_0}\left[\sqrt{R^2 - 2Rz_0\sin\phi + z_0^2}\right]_{-\pi/2}^{\pi/2}$$
$$= \frac{2\pi\rho R}{z_0}\left(\sqrt{(R+z_0)^2} - \sqrt{(R-z_0)^2}\right) = \frac{2\pi\rho R}{z_0}\,(R + z_0 - |R - z_0|).$$

The expression depends on the relative position of the point with respect to
the sphere

$$\Phi(\vec p) = \begin{cases} 4\pi R\rho & \text{if } \|\vec p\| < R;\\[4pt] \dfrac{4\pi R^2\rho}{\|\vec p\|} & \text{if } \|\vec p\| \ge R. \end{cases}$$

Note that the integral converges at the singular points (the sphere of radius $R$),
making the potential continuous on all of $\mathbb{R}^3$, although not differentiable. Moreover,
for exterior points the potential of the charged sphere behaves as if all
the charge ($4\pi R^2\rho$) were concentrated at the origin.
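
The closed form for the potential of the charged sphere can be compared with a direct numerical integration. This is a rough sketch in plain Python, with illustrative default values `R = 1`, `rho = 1` (assumptions of the example): it integrates $\rho/\|\vec r - \vec p\|$ over the sphere, using the rotational symmetry around the Z axis to reduce the computation to one integral in $\phi$.

```python
import math

def shell_potential(z0, R=1.0, rho=1.0, n=2000):
    """Potential at (0, 0, z0) of the sphere of radius R with surface density rho,
    integrating rho / distance over the sphere (midpoint rule in phi)."""
    dphi = math.pi / n
    total = 0.0
    for j in range(n):
        phi = -math.pi / 2 + (j + 0.5) * dphi
        # by rotational symmetry it is enough to take theta = 0 and multiply by 2 pi
        x, z = R * math.cos(phi), R * math.sin(phi)
        dist = math.hypot(x, z - z0)
        total += rho * R**2 * math.cos(phi) / dist * dphi
    return 2 * math.pi * total

def shell_formula(z0, R=1.0, rho=1.0):
    """Closed form obtained in the text."""
    return 4 * math.pi * R * rho if z0 < R else 4 * math.pi * R**2 * rho / z0

for z0 in (0.0, 0.5, 2.0):
    print(z0, shell_potential(z0), shell_formula(z0))
```

Points very close to the sphere ($z_0 \approx R$) would need a finer grid, since the integrand becomes nearly singular there.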

Now we will consider a homogeneous charge on a ball, where the volumetric
density is also denoted by $\rho$. The ball of radius $R > 0$ can be considered as made
up of spheres of radii $0 \le s \le R$. The potential produced at the point $\vec p$ by a sphere of
radius $s$ and thickness $\delta s$ is

$$\Phi_s(\vec p)\,\delta s = \begin{cases} \dfrac{4\pi s^2\rho}{\|\vec p\|}\,\delta s & \text{if } 0 \le s \le \min\{\|\vec p\|, R\};\\[6pt] 4\pi s\rho\,\delta s & \text{if } \min\{\|\vec p\|, R\} < s \le R. \end{cases}$$

In case $\|\vec p\| \ge R$ only the first formula is necessary and therefore

$$\Phi(\vec p) = \int_0^R \frac{4\pi s^2\rho}{\|\vec p\|}\,ds = \frac{4\pi R^3\rho}{3\|\vec p\|}$$

which means that the homogeneously charged ball behaves like a point
charge concentrated at the origin. In case $\|\vec p\| < R$ we have

$$\Phi(\vec p) = \int_0^R \Phi_s(\vec p)\,ds = \int_0^{\|\vec p\|}\frac{4\pi s^2\rho}{\|\vec p\|}\,ds + \int_{\|\vec p\|}^R 4\pi s\rho\,ds = \frac{4\pi\|\vec p\|^3\rho}{3\|\vec p\|} + 2\pi R^2\rho - 2\pi\|\vec p\|^2\rho = 2\pi R^2\rho - \frac{2\pi}{3}\|\vec p\|^2\rho.$$
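Putting both expressions together we have

$$\Phi(\vec p) = \begin{cases} 2\pi R^2\rho - \dfrac{2\pi\|\vec p\|^2\rho}{3} & \text{if } \|\vec p\| < R;\\[6pt] \dfrac{4\pi R^3\rho}{3\|\vec p\|} & \text{if } \|\vec p\| \ge R. \end{cases}$$

In spite of the complicated formula this function is $C^1$. At points not belonging
to the sphere $\|\vec p\| = R$ the regularity is $C^\infty$; however, the potential is not harmonic in
the interior of the ball, since there

$$\frac{\partial^2\Phi}{\partial x^2} = \frac{\partial^2\Phi}{\partial y^2} = \frac{\partial^2\Phi}{\partial z^2} = -\frac{4\pi\rho}{3}$$

and so $\Delta\Phi = -4\pi\rho$ in the interior of the ball.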

Consider now a variable volumetric density $\rho$ with compact support. If
we assume $\rho$ is measurable and bounded, the potential created by the charge
$\mu = \rho\,dV$, where $dV$ represents the 3-dimensional Lebesgue measure, is

$$\Phi(\vec p) = \iiint\frac{\rho(\vec r)\,dV}{\|\vec r - \vec p\|}$$

which is defined everywhere. Indeed, the support of $\rho$ can be included in a
ball and the function $\rho$ is bounded, therefore the convergence of the integral
on balls shown above implies the convergence in this situation. A direct argument
is possible too: if $\vec p$ belongs to the support of $\rho$ it is a singular point; however,
the volume of a ball of radius $\varepsilon > 0$ is proportional to $\varepsilon^3$ while the
integrand goes to $\infty$ proportionally to $\varepsilon^{-1}$, which implies the convergence.
Actually, this argument also works for an integrand whose growth
is proportional to $\varepsilon^{-2}$. That implies the convergence of

$$\iiint\frac{|\rho(\vec r)|\,dV}{\|\vec r - \vec p\|^2} = \iiint |\rho(\vec r)|\,\|\nabla(\|\vec r - \vec p\|^{-1})\|\,dV$$

and therefore the differentiability of $\Phi(\vec p)$ everywhere and the validity of the
following formula (the gradient inside the integral being taken with respect to $\vec p$)

$$\nabla\Phi(\vec p) = \iiint \rho(\vec r)\,\nabla(\|\vec r - \vec p\|^{-1})\,dV.$$

In order to have second order derivatives we need to require some regularity of $\rho$. If
$\rho$ were differentiable at some point $\vec p$ then it would be possible to compensate
the growth of rate $\varepsilon^{-3}$ near the singularity with the "balanced" difference
$\rho(\vec r) - \rho(\vec p)$, implying that we may locally replace $\rho$ by the constant value
$\rho(\vec p)$. This is a delicate task that we are not going to detail here; however, the
interpretation is easy: the Laplacian of $\Phi$ at $\vec p$ can be calculated by decomposing
the charge into two parts: the part inside a small ball of radius $\varepsilon$, where we may
assume that $\rho$ is constant, and the charge outside the small ball, which produces
a potential whose Laplacian is 0 at $\vec p$ since this point is not in its support.
The consequence of that argument, provided the extra regularity of $\rho$, is that
$\rho$ can be recovered from the potential it generates:

$$\Delta\Phi(\vec p) = -4\pi\rho(\vec p).$$

This remarkable formula is known as Poisson's equation. It is natural to consider
the feasibility of the inverse problem: given a function $f$ defined on a domain
$D$ with some regularity hypotheses, is it possible to express $f$ as a potential? In
general the answer is negative. Indeed, the candidate for the
charge is evidently

$$\rho = \frac{-1}{4\pi}\,\Delta f.$$

If $f$ is $C^3$ and $D$ is bounded then the potential

$$\Phi(\vec p) = \frac{-1}{4\pi}\iiint_D\frac{\Delta f(\vec r)\,dV}{\|\vec r - \vec p\|}$$

is a function such that $\Delta\Phi = \Delta f$ on $D$, but $\Phi \ne f$ in general. For instance,
if $f(x,y,z) = x^2 + y^2 + z^2$ and $D = B(0,1)$ the integral formula produces
the potential of a homogeneously charged ball (as above in this section), which
differs from $f$ by a constant. In general, the difference $\Phi - f$ is a harmonic
function on $D$. If $f$ and its derivatives satisfy some particular decay conditions
we could enforce the equality as a consequence of the properties of harmonic
functions we are going to study in the next section.
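
The constant in this example can be made explicit. The short computation below is added as an illustration; it only uses the candidate charge $\rho = \frac{-1}{4\pi}\Delta f$ and the interior potential of the homogeneously charged ball obtained earlier in this section, with $R = 1$.

```latex
\begin{align*}
\Delta f &= 6, \qquad \rho = \frac{-1}{4\pi}\,\Delta f = -\frac{3}{2\pi},\\
\Phi(\vec p) &= 2\pi R^2\rho - \frac{2\pi}{3}\,\|\vec p\|^2\rho
  = -3 + \|\vec p\|^2 = f(\vec p) - 3 \qquad (\|\vec p\| < 1).
\end{align*}
```

So in this case $\Phi - f = -3$ on $D$, a constant and hence a harmonic function, in agreement with the general statement.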

The previous arguments have a nice application. Every $C^2$ vector field can
be decomposed locally as the sum of the gradient of a scalar function and the
rotational of a vector field. Indeed, given $\vec F$ take $\rho = \nabla\cdot\vec F$. If $D \subset \mathbb{R}^3$ is a
ball (or more generally, a bounded star shaped domain) then we may consider
the potential $\Phi$ generated by the density $\rho$ on $D$ and take $f = -(4\pi)^{-1}\Phi$. Now
we have

$$\nabla\cdot(\vec F - \nabla f) = \rho - \rho = 0.$$

Therefore, the field $\vec F - \nabla f$ is closed regarded as a 2-form. Then there exists
a primitive $\vec G$ on $D$ such that $\nabla\times\vec G = \vec F - \nabla f$ and thus

$$\vec F = \nabla f + \nabla\times\vec G$$

as we wanted. It is worth noticing that the decomposition above does not make
any sense from the point of view of the theory of differential forms.
Indeed, we would be adding a 1-form and a 2-form.

11.5 Harmonic functions


In the previous section we have seen that potential functions are harmonic
away from the support of the charge, that is, their Laplacian is zero there. Harmonic functions appear
in many applications, so we will prove some additional properties they
have. Let us start with this straightforward application of the Gauss-Ostrogradsky
theorem: if $f$ is harmonic in a domain that contains the closed ball $B[\vec p, R]$ with
$R > 0$ then

$$\iint_{\partial B[\vec p,R]}\nabla f\cdot d\vec S = \iiint_{B[\vec p,R]}\Delta f\,dV = 0.$$

We can rewrite this equality as

$$\iint_{\partial B[\vec p,R]}\nabla f\cdot\vec N\,dS = 0$$



and the term $\nabla f\cdot\vec N$ can be interpreted as a normal derivative, usually denoted
by $\frac{\partial f}{\partial n}$ (in this particular case it is a radial derivative $\frac{\partial f}{\partial r}$). We may parameterize
the sphere $\partial B[\vec p, r]$ by means of the unit sphere $\partial B[0,1]$ as $\vec p + r\vec x$. In such
a case we have $\vec x = \vec N$ as well. Since the areas of the spheres differ by a factor $r^2$
we have

$$\iint_{\partial B[0,1]}\nabla f(\vec p + r\vec x)\cdot\vec x\,dS(\vec x) = \frac{1}{r^2}\iint_{\partial B[\vec p,r]}\nabla f\cdot\vec N\,dS = 0.$$

Then, integrating with respect to $r$ we get

$$0 = \int_0^R\iint_{\partial B[0,1]}\nabla f(\vec p + r\vec x)\cdot\vec x\,dS(\vec x)\,dr = \iint_{\partial B[0,1]}\int_0^R\frac{d}{dr}f(\vec p + r\vec x)\,dr\,dS(\vec x)$$
$$= \iint_{\partial B[0,1]}\Big[f(\vec p + r\vec x)\Big]_{r=0}^{R}\,dS(\vec x) = \iint_{\partial B[0,1]}\big(f(\vec p + R\vec x) - f(\vec p)\big)\,dS(\vec x)$$

which implies after rescaling (integration over the sphere of radius $R$) that

$$0 = \iint_{\partial B[\vec p,R]}(f - f(\vec p))\,dS = \iint_{\partial B[\vec p,R]}f\,dS - 4\pi R^2 f(\vec p)$$

and so

$$f(\vec p) = \frac{1}{4\pi R^2}\iint_{\partial B[\vec p,R]}f\,dS.$$

This remarkable identity is the so-called mean value property of harmonic
functions: the value at any point can be expressed as the average of the
values over any sphere around that point. The mean value property is true in
any dimension with the corresponding adaptation. Note that in dimension 1
it is evident because the harmonic functions are exactly the affine functions, so
the mean value property just says that the value at the midpoint of a segment is
the arithmetic mean of the values at the endpoints. In dimension 2 the above proof
can be adapted with the use of the Green-Riemann theorem and the ideas to
be discussed in the next section. Nevertheless, in dimension 2 the theory of
harmonic functions has strong bonds with complex analysis, which provides
alternative techniques.
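
The mean value property is easy to test numerically. In the sketch below (plain Python; the two test functions, the point `p` and the radius `R` are illustrative assumptions) the spherical mean of a harmonic function reproduces its value at the center, while for the non-harmonic function $x^2+y^2+z^2$ the mean exceeds the central value by $R^2$.

```python
import math

def sphere_mean(f, center, R, n=200):
    """Average of f over the sphere of radius R centred at `center`
    (midpoint rule in the angles theta, phi)."""
    cx, cy, cz = center
    dtheta, dphi = 2 * math.pi / n, math.pi / n
    total = weight = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        for j in range(n):
            phi = -math.pi / 2 + (j + 0.5) * dphi
            w = math.cos(phi) * dtheta * dphi  # area element divided by R^2
            x = cx + R * math.cos(theta) * math.cos(phi)
            y = cy + R * math.sin(theta) * math.cos(phi)
            z = cz + R * math.sin(phi)
            total += f(x, y, z) * w
            weight += w
    return total / weight

def harmonic(x, y, z):
    # each term has zero Laplacian: x^2 - y^2, x z and e^x cos y
    return x * x - y * y + 3 * x * z + math.exp(x) * math.cos(y)

def not_harmonic(x, y, z):
    # Laplacian equal to 6
    return x * x + y * y + z * z

p, R = (0.3, -0.5, 1.0), 0.7
print(sphere_mean(harmonic, p, R), harmonic(*p))          # nearly equal
print(sphere_mean(not_harmonic, p, R), not_harmonic(*p))  # differ by about R^2
```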

We will go on with the 3-dimensional frame to state and prove the results,
although they are valid in any dimension. The mean value property has a
surprising consequence.
Theorem 11.5.1. Let $D \subset \mathbb{R}^3$ be a connected domain and $f$ a harmonic
function defined on $D$. Then
(a) $f$ does not have strict relative extrema;
(b) if $f$ attains an absolute extreme value on $D$ then $f$ is constant;
(c) if $D$ is bounded and $f$ can be extended continuously to $\overline{D}$ then $f$ attains
its extreme values on $\partial D$.

Proof. We will argue with maxima, the argument with minima being
similar.
Assume that $f$ has a strict relative maximum at $\vec p$. Then there is $\varepsilon > 0$
such that $f|_{\partial B(\vec p,\varepsilon)} < f(\vec p)$. By the continuity of $f$ (a strict inequality
between continuous functions remains strict after integration) we get that

$$\frac{1}{4\pi\varepsilon^2}\iint_{\partial B(\vec p,\varepsilon)} f\,dS < f(\vec p)$$

which contradicts the mean value property.
Now, if the function attains a maximum, the previous argument shows that
actually we have $f|_{\partial B(\vec p,\varepsilon)} = f(\vec p)$ for any $\vec p$ where the maximum is attained
and any $\varepsilon > 0$ such that $B[\vec p,\varepsilon] \subset D$. That shows that the set

$$\{\vec p \in D : f(\vec p) = \max(f)\}$$

is open. As it is clearly closed in $D$ and non-empty, it must be all of $D$ by connectedness.
Finally, the last statement is a consequence of the fact that the maximum of the
continuous extension has to be attained somewhere on the compact set $\overline{D}$. If it is attained in $D$, then the function is constant and so the
maximum is also attained on $\partial D$.

A consequence of the previous result is the uniqueness of the solution of
Dirichlet's problem on bounded domains. In the case of unbounded domains
we have the following.
Corollary 11.5.2. A harmonic function defined on $\mathbb{R}^3$ which vanishes at $\infty$
must be null.
This result can be applied to prove that a function vanishing at $\infty$ such
that its derivatives also vanish at $\infty$ with a suitable rate of decay is actually
a potential with charge given by Poisson's equation.

11.6 Vector Analysis in R2


So far we have discussed results and topics for $\mathbb{R}^3$. The adaptation of the
results to $\mathbb{R}^2$ is not merely a matter of taking $z = 0$ everywhere. Let us start with the
following observation: the Green-Riemann formula

$$\int_{\partial D} p\,dx + q\,dy = \iint_D\left(\frac{\partial q}{\partial x} - \frac{\partial p}{\partial y}\right)dx\,dy$$

can be deduced from Stokes' formula just by setting $z = 0$. However, what is
the analogue of Gauss-Ostrogradsky? In other words, how do we express the
flux integral on the plane and what is its corresponding "divergence" formula?
In order to answer that question, suppose first that the boundary of $D$ is
given by some $C^1$ curve $\vec\gamma(s)$ with $s \in [0, L]$ the arc-length. That implies
$\|\vec\gamma'(s)\| = 1$. If we put $\vec\gamma(s) = (x(s), y(s))$, the integral above over $\partial D$ can be
expressed as

$$\int_{\partial D} p\,dx + q\,dy = \int_0^L \big(p(x(s),y(s))\,x'(s) + q(x(s),y(s))\,y'(s)\big)\,ds$$

whose interpretation has been discussed previously. If we desire a 2-dimensional
flux integral we need to work with the unit normal vector (pointing outside)
$\vec n(s) = (y'(s), -x'(s))$. The corresponding flux integral is

$$\int_{\partial D}\vec F\cdot d\vec\nu := \int_{\partial D}\vec F\cdot\vec n\,ds = \int_0^L \big(p(x(s),y(s))\,y'(s) - q(x(s),y(s))\,x'(s)\big)\,ds$$

where $\vec F = (p, q)$. Note that the last member can be interpreted as a standard
line integral

$$\int_0^L \big(-q(x(s),y(s))\,x'(s) + p(x(s),y(s))\,y'(s)\big)\,ds = \int_{\partial D} -q\,dx + p\,dy.$$

Therefore, applying Green-Riemann we get

$$\int_{\partial D}\vec F\cdot d\vec\nu = \iint_D\left(\frac{\partial p}{\partial x} + \frac{\partial q}{\partial y}\right)dx\,dy$$

which is a genuine version of the Gauss-Ostrogradsky theorem. It is clear
that this formula can be extended to the same hypotheses as Green-Riemann.
Moreover, we may interpret the expression inside the double
integral as a 2-dimensional divergence, and the meaning of this divergence is the
same as in 3 dimensions for two-dimensional fluids (this idealization is also
studied in Fluid Dynamics).

The application of the previous result to the gradient of a scalar function
$f(x, y)$ gives

$$\int_{\partial D}\nabla f\cdot d\vec\nu = \iint_D \Delta f\,dx\,dy.$$

In particular, if the function is harmonic (in 2 dimensions) then the integrals
are zero. In the theory of the Newtonian potential in 3 dimensions an important
role was played by the function $(x^2 + y^2 + z^2)^{-1/2}$, which is essentially the unique non-trivial
harmonic function with spherical symmetry, turning a blind eye to the fact
that it is not defined at 0. The analogous role in $\mathbb{R}^2$ is played by the function
$\phi(x, y) = (-1/2)\log(x^2 + y^2)$, for which

$$\int_{\partial D}\nabla\phi\cdot d\vec\nu = -2\pi$$

on any domain $D$ containing 0. The function $\phi$ can be interpreted as a
3-dimensional potential produced by a charge placed on the Z axis with linear
density 1, some "physical trick" being necessary in order to obtain the result.
Also, the proof of the mean value theorem in the previous section can be adapted
without trouble to obtain

$$f(\vec p) = \frac{1}{2\pi R}\int_{\partial B[\vec p,R]} f\,d\ell$$

for any harmonic function defined on a domain that contains $B[\vec p, R]$.

Finally, we will discuss the application of Green-Riemann to the computation
of areas. It is clear that a choice of functions $p, q$ such that $\frac{\partial q}{\partial x} - \frac{\partial p}{\partial y} = 1$,
as for instance $q = x$, $p = 0$ or $p = -y$, $q = 0$, will imply

$$\mathrm{area}(D) = \int_{\partial D} p\,dx + q\,dy.$$

The advantage that this method offers is that it is often easier to
parameterize the boundary than to express it as a graph. Another choice of
functions $p, q$ which offers some symmetry is the following

$$\mathrm{area}(D) = \frac12\int_{\partial D} -y\,dx + x\,dy.$$
In case the curve has no natural parameterization, the part
contained in one of the half-planes $x > 0$ or $x < 0$ can be parameterized taking as parameter $t = y/x$ on
each piece where it can be done uniquely. The relation among the differentials
$dy = t\,dx + x\,dt$, carried to the last area formula, gives

$$\mathrm{area}(D) = \frac12\int_{\partial D} -tx\,dx + tx\,dx + x^2\,dt = \frac12\int_{\partial D} x^2\,dt$$

which is a remarkable formula that bears some resemblance to the well
known polar formula for the area

$$\mathrm{area}(D) = \frac12\int_\alpha^\beta \rho^2\,d\theta$$

if $D = \{(r\cos\theta, r\sin\theta) : 0 \le r \le \rho(\theta),\ \alpha \le \theta \le \beta\}$.
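
The symmetric area formula $\frac12\int_{\partial D} -y\,dx + x\,dy$ can be tried on curves given only by a parameterization. The following sketch is an illustration in plain Python; the helper `green_area` and the two example curves (an ellipse and a cardioid) are assumptions of the example, not part of the notes.

```python
import math

def green_area(x, y, dx, dy, m=400):
    """Approximates (1/2) * integral of (-y dx + x dy) over a closed curve
    t -> (x(t), y(t)), 0 <= t <= 2 pi, traversed counterclockwise.
    dx, dy are the derivatives of x, y with respect to t."""
    h = 2 * math.pi / m
    return 0.5 * sum((-y(k * h) * dx(k * h) + x(k * h) * dy(k * h)) * h
                     for k in range(m))

# Ellipse with semi-axes a, b: the area should be pi a b.
a, b = 3.0, 2.0
ellipse = green_area(lambda t: a * math.cos(t), lambda t: b * math.sin(t),
                     lambda t: -a * math.sin(t), lambda t: b * math.cos(t))

# Cardioid r = 1 + cos(t) in polar coordinates: the area should be 3 pi / 2.
cardioid = green_area(lambda t: (1 + math.cos(t)) * math.cos(t),
                      lambda t: (1 + math.cos(t)) * math.sin(t),
                      lambda t: -math.sin(t) - math.sin(2 * t),
                      lambda t: math.cos(t) + math.cos(2 * t))

print(ellipse, math.pi * a * b)
print(cardioid, 1.5 * math.pi)
```

Neither region would be convenient to describe as the area under a graph, which is precisely the advantage of the boundary formula mentioned above.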

11.7 Assorted applications


We include some applications coming from different branches of Physics.

11.7.1 Mechanics
Assume that a particle of mass $m$ follows a path $\vec r(t)$ as a consequence of a
force field $\vec F$ acting on it. Newton's second law says that

$$\vec F = m\,\frac{d^2\vec r}{dt^2}.$$

Assume that the force is conservative, that is, it is the gradient of some scalar
function. Take a function $V$ such that $\vec F = -\nabla V$ and call it potential energy.
We already know that

$$\int_{t_1}^{t_2}\vec F\cdot d\vec r = V(\vec r(t_1)) - V(\vec r(t_2))$$

where the integral is a line integral along $\vec r(t)$ with $t_1 \le t \le t_2$ (we follow
the standard notation labelling the integral with the time interval instead of
the path). It is worth noticing that the Leibniz rule for differentiation of
products also works for scalar and vector products. In particular

$$\frac{d}{dt}\left(\frac{d\vec r}{dt}\cdot\frac{d\vec r}{dt}\right) = \frac{d^2\vec r}{dt^2}\cdot\frac{d\vec r}{dt} + \frac{d\vec r}{dt}\cdot\frac{d^2\vec r}{dt^2} = 2\,\frac{d\vec r}{dt}\cdot\frac{d^2\vec r}{dt^2}.$$

We will apply that to the line integral above

$$\int_{t_1}^{t_2}\vec F\cdot d\vec r = \int_{t_1}^{t_2} m\,\frac{d^2\vec r}{dt^2}\cdot\frac{d\vec r}{dt}\,dt = \left[\frac{m}{2}\,\frac{d\vec r}{dt}\cdot\frac{d\vec r}{dt}\right]_{t_1}^{t_2} = \frac{mv^2(t_2)}{2} - \frac{mv^2(t_1)}{2}$$

where $v = \left\|\frac{d\vec r}{dt}\right\|$. The magnitude $mv^2/2$ is called the kinetic energy. Now
from the equality

$$V(\vec r(t_1)) - V(\vec r(t_2)) = \frac{mv^2(t_2)}{2} - \frac{mv^2(t_1)}{2}$$

we get

$$V(\vec r(t_1)) + \frac{mv^2(t_1)}{2} = V(\vec r(t_2)) + \frac{mv^2(t_2)}{2}$$

which is called the law of conservation of energy: the sum of the kinetic and the
potential energy remains constant in time.
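
The conservation law can be observed in a small simulation. The sketch below is only an illustration under stated assumptions: a particle of unit mass moves under the conservative force $\vec F = -\nabla V$ with $V = -1/\|\vec r\|$ (the attractive Newtonian potential), the initial data are arbitrary, and the trajectory is computed with the velocity Verlet scheme, a numerical method chosen for this example that conserves energy only approximately.

```python
import math

def acceleration(r):
    """Force per unit mass F = -grad V for the potential energy V = -1/|r|."""
    d3 = math.sqrt(sum(c * c for c in r)) ** 3
    return [-c / d3 for c in r]

def total_energy(r, v):
    """Kinetic plus potential energy for unit mass."""
    kinetic = 0.5 * sum(c * c for c in v)
    potential = -1.0 / math.sqrt(sum(c * c for c in r))
    return kinetic + potential

def velocity_verlet(r, v, dt, steps):
    """Integrates r'' = acceleration(r) and returns the final (r, v)."""
    a = acceleration(r)
    for _ in range(steps):
        r = [ri + vi * dt + 0.5 * ai * dt * dt for ri, vi, ai in zip(r, v, a)]
        a_new = acceleration(r)
        v = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
        a = a_new
    return r, v

r0, v0 = [1.0, 0.0, 0.0], [0.0, 0.8, 0.3]   # illustrative initial data
r1, v1 = velocity_verlet(r0, v0, dt=1e-3, steps=5000)
print(total_energy(r0, v0), total_energy(r1, v1))  # nearly equal
```

The kinetic and the potential energy vary considerably along the orbit, while their sum stays constant up to the discretization error of the scheme.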

11.7.2 Hydrostatics
In a fluid at rest the pressure $p$ is a scalar field that at any point represents
the magnitude of the force per unit area applied on a face of a tiny plane
surface, under the assumption that the fluid is removed from the other side.
Experimental knowledge says that the force is always normal to the surface
and that its magnitude does not depend on the orientation of the surface at the
same point. Assume that a non-porous body $D$ with $C^1$ boundary is subjected
to a pressure field. The total force applied on $D$ is given by the surface integral

$$-\iint_{\partial D} p\,\vec N\,dS = -\iint_{\partial D} p\,d\vec S$$

where the sign "−" is necessary because the pressure by its very definition is
positive and the normal $\vec N$ points outwards, while the force is exerted
towards the body. The trick to compute this integral, which is vector-valued,
is to reduce it to a flux integral. Consider the field $\vec F = p\,\vec i$. Then

$$\vec i\cdot\iint_{\partial D} p\,d\vec S = \iint_{\partial D}\vec F\cdot d\vec S = \iiint_D \nabla\cdot\vec F\,dV = \iiint_D\frac{\partial p}{\partial x}\,dV$$

by Gauss-Ostrogradsky. The same can be done for $\vec j$
and $\vec k$, and the results can be written simultaneously for all
coordinates as the integral of a vector function

$$-\iint_{\partial D} p\,\vec N\,dS = -\iiint_D\left(\frac{\partial p}{\partial x},\frac{\partial p}{\partial y},\frac{\partial p}{\partial z}\right)dV = -\iiint_D \nabla p\,dV.$$

Consider the hydrostatic pressure at ground level given by

$$p(x,y,z) = c - \rho g z$$

where $c$ is a constant (the pressure at $z = 0$), $\rho$ is the density of the fluid (that
may also depend on the point, but we consider it constant at our scale) and $g$ is
the standard gravity constant at ground level. Since we have $\nabla p = -\rho g\,\vec k$, the
total force exerted on $D$ is

$$-\iiint_D \nabla p\,dV = \iiint_D \rho g\,\vec k\,dV = \mathrm{vol}(D)\,\rho g\,\vec k$$

which is the so-called Archimedes' principle: the total force is exerted vertically
upwards and is equal to the weight of the mass of fluid that the body
$D$ displaces. We may complete this result by calculating the line of action of the
resultant force. Indeed, physical forces are not completely represented
by vectors of $\mathbb{R}^3$. It is necessary to specify the line along which the force acts or,
equivalently, its moment with respect to a given point. The moment of a resultant
force is the sum of the individual moments. Let $\vec r = (x, y, z)$ be the position
of a point of $\partial D$. The total moment of the force exerted by the pressure is

$$-\iint_{\partial D} p\,\vec r\times\vec N\,dS.$$

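Since the result is a vector we will repeat the previous trick, taking the scalar product
with $\vec i$, and so

$$-\vec i\cdot\iint_{\partial D} p\,\vec r\times\vec N\,dS = -\iint_{\partial D} p\,\vec i\cdot(\vec r\times\vec N)\,dS = -\iint_{\partial D} p\,(\vec i\times\vec r)\cdot\vec N\,dS = -\iint_{\partial D} p\,(\vec i\times\vec r)\cdot d\vec S$$

which can be calculated by the Gauss-Ostrogradsky theorem. Using that $p =
c - \rho g z$ we have

$$-p\,(\vec i\times\vec r) = (cz - \rho g z^2)\,\vec j + (\rho g y z - cy)\,\vec k$$

and so

$$\nabla\cdot\big(-p\,(\vec i\times\vec r)\big) = \rho g y.$$

Therefore we have

$$-\vec i\cdot\iint_{\partial D} p\,\vec r\times\vec N\,dS = \rho g\iiint_D y\,dV.$$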

Analogous computations show that

$$-\vec j\cdot\iint_{\partial D} p\,\vec r\times\vec N\,dS = -\rho g\iiint_D x\,dV;$$

$$-\vec k\cdot\iint_{\partial D} p\,\vec r\times\vec N\,dS = 0.$$

Knowing that the center of mass of $D$ (with uniform density) is the point of
coordinates

$$\overrightarrow{CM} = \frac{1}{\mathrm{vol}(D)}\left(\iiint_D x\,dV,\ \iiint_D y\,dV,\ \iiint_D z\,dV\right)$$

our result can be written as

$$-\iint_{\partial D} p\,\vec r\times\vec N\,dS = \overrightarrow{CM}\times\mathrm{vol}(D)\,\rho g\,\vec k$$

which means that the resultant force $\mathrm{vol}(D)\,\rho g\,\vec k$ is exerted along the line passing
through $\overrightarrow{CM}$, exactly as the weight of $D$ would act if it were filled with fluid. In spite
of the technical difficulty of our calculations, there is a much simpler way
to reach the same conclusions: the volume $D$ filled with fluid would be in
equilibrium, so its weight applied at $\overrightarrow{CM}$ compensates all the external forces
and moments over $\partial D$ exerted by the rest of the fluid.
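
Both conclusions, the buoyant force and its line of action, can be checked numerically on a simple body. The sketch below is an illustration in plain Python under stated assumptions: the body is a sphere, and the numerical values of its center, its radius and the constants $c$, $\rho$, $g$ are arbitrary choices for the example. The surface integrals $-\iint p\,\vec N\,dS$ and $-\iint p\,\vec r\times\vec N\,dS$ are approximated with a midpoint rule in the angles.

```python
import math

def pressure_force_and_moment(center, a, c, rho, g, n=200):
    """Force -int p N dS and moment -int p (r x N) dS on the sphere of radius a
    centred at `center`, for the pressure p = c - rho g z."""
    cx, cy, cz = center
    dtheta, dphi = 2 * math.pi / n, math.pi / n
    force = [0.0, 0.0, 0.0]
    moment = [0.0, 0.0, 0.0]
    for i in range(n):
        theta = (i + 0.5) * dtheta
        for j in range(n):
            phi = -math.pi / 2 + (j + 0.5) * dphi
            N = (math.cos(theta) * math.cos(phi),
                 math.sin(theta) * math.cos(phi),
                 math.sin(phi))                      # outward unit normal
            r = (cx + a * N[0], cy + a * N[1], cz + a * N[2])
            p = c - rho * g * r[2]
            dS = a * a * math.cos(phi) * dtheta * dphi
            rxN = (r[1] * N[2] - r[2] * N[1],
                   r[2] * N[0] - r[0] * N[2],
                   r[0] * N[1] - r[1] * N[0])
            for k in range(3):
                force[k] -= p * N[k] * dS
                moment[k] -= p * rxN[k] * dS
    return force, moment

# Illustrative data: a sphere of radius 0.1 centred 2 units below z = 0.
center, a = (0.4, -0.3, -2.0), 0.1
c, rho, g = 101325.0, 1000.0, 9.81
force, moment = pressure_force_and_moment(center, a, c, rho, g)
buoyancy = 4 / 3 * math.pi * a**3 * rho * g          # vol(D) rho g
print(force, (0.0, 0.0, buoyancy))
print(moment, (center[1] * buoyancy, -center[0] * buoyancy, 0.0))
```

The second line of output compares the computed moment with $\overrightarrow{CM}\times\mathrm{vol}(D)\,\rho g\,\vec k$, where the center of mass of the sphere is its center.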

11.7.3 Hydrodynamics
Assume we have a moving fluid such that at every point we have
a velocity $\vec v$, a pressure $p$ and a density $\rho$, which may also depend on time. If
we delimit a region $D$ within the fluid, the conservation of mass implies
that the mass flux through the boundary $\partial D$ per unit time must balance the
variation of mass inside $D$, that is

$$\iint_{\partial D}\rho\,\vec v\cdot d\vec S = -\frac{d}{dt}\iiint_D \rho\,dV = -\iiint_D\frac{\partial\rho}{\partial t}\,dV$$

where the sign "−" is due to the fact that fluid going out counts positively,
and the last equality is just the standard differentiation of integrals with respect to
parameters. Applying Gauss-Ostrogradsky we have

$$0 = \iiint_D \nabla\cdot(\rho\,\vec v)\,dV + \iiint_D\frac{\partial\rho}{\partial t}\,dV = \iiint_D\left(\nabla\cdot(\rho\,\vec v) + \frac{\partial\rho}{\partial t}\right)dV.$$

As this has to be true for every domain $D$ we deduce

$$\nabla\cdot(\rho\,\vec v) + \frac{\partial\rho}{\partial t} = 0$$
which is the so-called continuity equation of fluids. As we said at the beginning,
this equation just expresses the conservation of mass. If the density is constant
(e.g. liquids) we obtain $\nabla\cdot\vec v = 0$: the volume entering equals the volume going out.
Assume now that $\partial D$ encompasses a part of the fluid and moves along with it.
Newton's second and third laws combined say that the acceleration observed
on $D$ (the mass inside) is due to the external forces, notably the effect of
the pressure and the weight if we are studying the problem at ground level.
Assuming that $D$ is small enough to consider $\vec v$ homogeneous on it we have

$$\frac{d\vec v}{dt}\iiint_D \rho\,dV = -\iint_{\partial D} p\,d\vec S + \iiint_D \rho\,\vec F\,dV$$

with $\vec F$ a force per unit mass ($\vec F = -g\,\vec k$ for the ordinary weight, with the Z axis
pointing upwards as in the previous section). After applying Gauss-Ostrogradsky
and rearranging terms

$$0 = \iiint_D \rho\,\frac{d\vec v}{dt}\,dV + \iiint_D \nabla p\,dV - \iiint_D \rho\,\vec F\,dV$$

where $\nabla p$ comes from our previous study of Archimedes' principle. Since
the equality has to be true for $D$ arbitrarily small we get

$$\rho\,\frac{d\vec v}{dt} + \nabla p - \rho\,\vec F = 0.$$
This equation has an important handicap for the applications. In practice it
is easier to determine the speed at a given point and then its variation along
time ∂v⃗/∂t, which is different from dv⃗/dt, which represents the acceleration of
a part of the moving fluid. In order to obtain the relation between both
derivatives assume v⃗ = (vx, vy, vz), all of them being functions of x, y, z, t.
Now for vx we have

\[
\frac{dv_x}{dt}
  = \frac{\partial v_x}{\partial t} + \frac{\partial v_x}{\partial x}\frac{dx}{dt}
    + \frac{\partial v_x}{\partial y}\frac{dy}{dt} + \frac{\partial v_x}{\partial z}\frac{dz}{dt}
  = \frac{\partial v_x}{\partial t} + v_x\frac{\partial v_x}{\partial x}
    + v_y\frac{\partial v_x}{\partial y} + v_z\frac{\partial v_x}{\partial z}
  = \frac{\partial v_x}{\partial t} + \nabla v_x\cdot\vec v.
\]
The same can be done for vy, vz and putting all together we get

\[
\frac{d\vec v}{dt} = \frac{\partial\vec v}{\partial t} + (\vec v\cdot\nabla)\vec v
\]

where the term v⃗ · ∇ acts like a differential operator on each coordinate of
v⃗. The previous equation of fluids now takes the form

\[
\frac{\partial\vec v}{\partial t} + (\vec v\cdot\nabla)\vec v + \frac{1}{\rho}\nabla p - \vec F = 0
\]

which is known as Euler’s equation. Note that if the fluid is stationary, that is,
the speed at any point remains constant in time, then ∂v⃗/∂t = 0.
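Euler’s equation is still quite complicated; however, some reasonable assumptions
can lead to simpler forms. Assume that the fluid is irrotational, which
means free of whirlpools and in terms of equations ∇ × v⃗ = 0. In coordinates
this says ∂vx/∂y = ∂vy/∂x and ∂vx/∂z = ∂vz/∂x, and then

\[
\nabla v_x\cdot\vec v
  = v_x\frac{\partial v_x}{\partial x} + v_y\frac{\partial v_x}{\partial y} + v_z\frac{\partial v_x}{\partial z}
  = v_x\frac{\partial v_x}{\partial x} + v_y\frac{\partial v_y}{\partial x} + v_z\frac{\partial v_z}{\partial x}
  = \frac{\partial\vec v}{\partial x}\cdot\vec v
\]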

and the same is true for y, z. Going back to the place where (v⃗ · ∇)v⃗ first
appeared we get

\[
(\vec v\cdot\nabla)\vec v = \frac{1}{2}\nabla v^2
\]

where v² = v⃗ · v⃗ (remember, under the hypothesis that ∇ × v⃗ = 0). Using this
for Euler’s equation gives us

\[
\frac{\partial\vec v}{\partial t} + \frac{1}{2}\nabla v^2 + \frac{1}{\rho}\nabla p - \vec F = 0
\]


still not quite practical. Assume moreover that the fluid is stationary (∂v⃗/∂t = 0),
the external force is conservative (F⃗ = −∇V) and the density ρ constant
(liquid). Then we have

\[
0 = \frac{1}{2}\nabla v^2 + \frac{1}{\rho}\nabla p + \nabla V
  = \nabla\Big(\frac{v^2}{2} + \frac{p}{\rho} + V\Big)
\]

which implies the important equation

\[
\frac{v^2}{2} + \frac{p}{\rho} + V = \text{constant}
\]
known as Bernoulli’s equation, which is a form of the law of conservation of
energy for fluids. Note that if V is constant, or its variation is negligible with
respect to the pressure (e.g. at ground level the fluid moves approximately
at the same height), then an increase of the speed implies a lowering of the
pressure, which is the so-called Venturi effect. Introducing the effect of the
viscosity, which implies for instance that the speed of the fluid near immobile
objects is zero, it is possible to obtain a set of formulas which model real
fluids much better: the Navier-Stokes equations.
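
The Venturi effect can be illustrated with a small numerical sketch. It assumes a horizontal pipe (V constant), constant density, and that the incompressible continuity equation reduces to area1 · v1 = area2 · v2 between two cross-sections; the function name and the sample data are hypothetical, chosen only for illustration.

```python
def venturi_pressure_drop(rho, v1, area1, area2):
    """Pressure drop p1 - p2 between two sections of a horizontal pipe.

    Assumptions (not taken from the text): constant density rho, so that
    area1 * v1 = area2 * v2, and constant potential V, so that Bernoulli
    gives v1**2 / 2 + p1 / rho = v2**2 / 2 + p2 / rho.
    """
    v2 = v1 * area1 / area2              # incompressible continuity
    return 0.5 * rho * (v2 ** 2 - v1 ** 2)


# Hypothetical data: water (1000 kg/m^3) at 2 m/s entering a throat
# with half the cross-section, so the speed doubles.
drop = venturi_pressure_drop(rho=1000.0, v1=2.0, area1=0.5, area2=0.25)
print(drop)  # 6000.0 (Pa): the pressure is lower where the fluid is faster
```

A narrower section (area2 < area1) always gives a positive drop under these assumptions, which is the lowering of the pressure described above.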

11.7.4 Electromagnetic fields
The well known Coulomb’s law says that two point charges are repelled (or
attracted if they are of different sign) in empty space with a force
proportional to their magnitudes and inversely proportional to the square of the
distance between them. Following standard conventions the intensity of this
force is written

\[
F = \frac{q_1 q_2}{4\pi\epsilon_0 r^2}
\]
where the constant ϵ0 depends on the unit system. The air, or matter in
general, between the charges has some effect that could be included in the
formula but we will not consider it. Note that the field produced by a single
charge is of Newtonian type, thus the theory of Newtonian potential can be applied
to study the field produced by a charge density. Indeed, let E⃗ be the electric
field produced by a continuous electric density ρ. Poisson equation, after the
correction of the 4π term, is

\[
\nabla\cdot\vec E = \frac{\rho}{\epsilon_0}.
\]
Note that there is no “−” sign because the repulsion is the effect suffered by a
positive test charge placed in a field produced by a positive density ρ > 0.
Static electric fields have a potential that simplifies their description and the
mechanical effects on charges.
However, a main ingredient here is the great mobility of charges, notably
through certain substances called conductors. Therefore, the variation with
time of ρ, and so that of E⃗, must be taken into account. The law of
conservation of charge implies that we may apply the fluid model to the electric
current J⃗ (flux of charge per time and surface unit) to obtain

\[
-\frac{d}{dt}\iiint_D \rho\,dV = \iint_{\partial D} \vec J\cdot d\vec S
\]

whose meaning must be obvious at this stage. Commuting derivation and integral,
together with the Gauss-Ostrogradsky theorem and the fact that D is arbitrary,
leads to

\[
-\frac{\partial\rho}{\partial t} = \nabla\cdot\vec J.
\]
Note that so far we are considering the charge to be either positive or negative
and the current is interpreted in a positive sense, that is, if a region receives
a positive flux of charge then the charge “increases”. Actually, what moves
inside conductors are electrons which have negative charge. This fact is not

relevant from the point of view of classic electrodynamics.
When an ordinary conductor is placed into an electric field the charges move
inside and after a while the movement stops because of the electrical resistance.
In the reached equilibrium, the electrostatic potential is constant on the
conductor and so the electric field E⃗ is normal to the surface of the conductor.
The potential could be computed solving the Laplace equation ∆V = 0 with
the boundary conditions imposed by the charges distributed on the conductors
(V is constant on the boundaries). The former argument does not apply to
superconductors which are substances that under certain conditions (extremely
low temperatures) possess null electrical resistance.
A system of two point charges of equal magnitude and different signs placed
at “short” distance is called a dipole. Far from a dipole the intensity of the
field decreases faster than for a single point charge because there is an almost
complete cancellation of effects: the sum of two nearly opposite vectors with
nearly the same modulus. However, near the dipole things are obviously different,
and the effect of the field on another dipole not only will include
attraction/repulsion but also a torque (rotational force).
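
The faster decay of the dipole field can be checked numerically. The sketch below is only an illustration under stated assumptions: the field is evaluated on the axis of the dipole, the charges ±q sit at distance d/2 from the centre, and the constant k stands for 1/(4πϵ0) and is set to 1; all names are hypothetical.

```python
def point_charge_field(r, q=1.0, k=1.0):
    """Intensity of the Coulomb field of a single charge at distance r."""
    return k * q / r ** 2


def on_axis_dipole_field(r, q=1.0, d=0.01, k=1.0):
    """Field on the axis of a dipole, at distance r > d / 2 from its centre.

    The charge +q is at +d/2 and the charge -q at -d/2 along the axis,
    so the two contributions have opposite signs and nearly cancel.
    """
    return k * q * (1.0 / (r - d / 2) ** 2 - 1.0 / (r + d / 2) ** 2)


for r in (1.0, 2.0, 4.0):
    print(r, point_charge_field(r), on_axis_dipole_field(r))

# Doubling the distance divides the point-charge field by 4,
# but the dipole field by (nearly) 8.
ratio_charge = point_charge_field(2.0) / point_charge_field(1.0)
ratio_dipole = on_axis_dipole_field(2.0) / on_axis_dipole_field(1.0)
print(ratio_charge, ratio_dipole)
```

On the axis the exact value is 2kqdr/(r² − d²/4)², which behaves like 2kqd/r³ for r much larger than d; this is the inverse cube decay the ratio reflects.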
This brief discussion on dipoles was aimed to present the magnetic force.
Magnets behave between them like electric dipoles. The magnetic field is
represented by a vector field B⃗ whose effect is not only felt on magnetic materials
but also on moving electric charges according to the formula

\[
\vec F = q\,\vec v\times\vec B
\]
where q and v⃗ are respectively the charge and the speed. Since the magnetic
force is perpendicular to the trajectory it does not modify the kinetic energy
(the work done by F⃗ is 0), however the magnetic field bends the trajectory
and eventually drives trapped ionized moving particles to the poles following
a helix path (look for the explanation of the auroras). It is well known that a
piece of magnet is again a magnet with two different poles, so it is impossible
to isolate “magnetic monopoles”. That leads to consider a larger magnet as
composed of a density of “magnetic dipoles” instead of charges and so in any
arbitrarily small volume there is always a compensation written as

\[
\nabla\cdot\vec B = 0.
\]

There are equations linking E⃗ and B⃗, notably when they vary with time.
Charges in movement (currents) produce a magnetic field, for instance
electromagnets, and variations in the magnetic field induce currents, that is,
variations in the electric field, for instance dynamos. Both phenomena are
modelled by the laws of Faraday-Henry and Ampère-Maxwell

\[
\nabla\times\vec E = -\frac{\partial\vec B}{\partial t};
\]

\[
\nabla\times\vec B = \mu_0\,\vec J + \epsilon_0\mu_0\,\frac{\partial\vec E}{\partial t}.
\]
Note that the application of the divergence (∇·) to the first equation says
nothing new, while for the second we recover the conservation of charge.
The set of four equations, these last two together with ∇ · E⃗ = ρ/ϵ0 and
∇ · B⃗ = 0, is called Maxwell equations, which totally describe the electromagnetic
field. The constant µ0 plays for the magnetism in vacuum a role analogous to ϵ0, and
we are deliberately omitting the modifications that happen inside matter.
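
The remark about applying the divergence to the two laws can be written out as a short worked computation. It only uses the identity ∇ · (∇ × G⃗) = 0, valid for any sufficiently regular field G⃗, together with the four equations stated above, and it assumes enough regularity to exchange ∇· and ∂/∂t.

```latex
% Ampere-Maxwell: the divergence of a curl vanishes, and div E = rho / eps_0
0 = \nabla\cdot(\nabla\times\vec B)
  = \mu_0\,\nabla\cdot\vec J + \epsilon_0\mu_0\,\frac{\partial}{\partial t}(\nabla\cdot\vec E)
  = \mu_0\Big(\nabla\cdot\vec J + \frac{\partial\rho}{\partial t}\Big)

% Faraday-Henry: nothing new, only that div B does not change in time
0 = \nabla\cdot(\nabla\times\vec E)
  = -\frac{\partial}{\partial t}(\nabla\cdot\vec B)
```

The first line is exactly the conservation of charge −∂ρ/∂t = ∇ · J⃗, while the second is consistent with ∇ · B⃗ = 0.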


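We will do more tricky manipulations on Maxwell equations. Since ∇ · B⃗ = 0
there is A⃗ such that ∇ × A⃗ = B⃗. Now we have

\[
\nabla\times\vec E = -\frac{\partial}{\partial t}(\nabla\times\vec A) = -\nabla\times\frac{\partial\vec A}{\partial t}
\]

and so

\[
\nabla\times\Big(\vec E + \frac{\partial\vec A}{\partial t}\Big) = 0.
\]

Therefore there exists a function ϕ such that

\[
-\nabla\phi = \vec E + \frac{\partial\vec A}{\partial t}
\]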


that would allow us to consider that E⃗ has a scalar potential (like in the static
case) but corrected by a term ∂A⃗/∂t coming from the magnetic counterpart of
the field.
Note that the property ∇ × A⃗ = B⃗ is preserved by adding to A⃗ the gradient of a
scalar function. Let us assume for instance that ∇ · A⃗ = 0 (technically that would
require a special decay of B⃗ at infinity, but we will turn a blind eye on it). Under
that hypothesis we would have

\[
\Delta\phi = -\nabla\cdot\vec E = -\frac{\rho}{\epsilon_0}.
\]
This is a very nice consequence, however we have to forget it because there is
a choice for A⃗ that was better for the development of the theory. The vector
potential A⃗ can be chosen to satisfy an apparently stranger condition due
to Lorenz: take A⃗ such that ∇ × A⃗ = B⃗ and

\[
\nabla\cdot\vec A + \epsilon_0\mu_0\,\frac{\partial\phi}{\partial t} = 0.
\]
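Working with the Lorenz condition we have

\[
-\Delta\phi = \nabla\cdot\Big(\vec E + \frac{\partial\vec A}{\partial t}\Big)
  = \frac{\rho}{\epsilon_0} + \frac{\partial}{\partial t}(\nabla\cdot\vec A)
  = \frac{\rho}{\epsilon_0} - \epsilon_0\mu_0\,\frac{\partial^2\phi}{\partial t^2}
\]

that can be rewritten as

\[
\Delta\phi - \epsilon_0\mu_0\,\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\epsilon_0}.
\]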
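On the other hand, the Ampère-Maxwell equation can be transformed as

\[
\nabla\times(\nabla\times\vec A)
  = \mu_0\,\vec J + \epsilon_0\mu_0\,\frac{\partial}{\partial t}\Big(-\nabla\phi - \frac{\partial\vec A}{\partial t}\Big)
  = \mu_0\,\vec J - \epsilon_0\mu_0\,\nabla\Big(\frac{\partial\phi}{\partial t}\Big) - \epsilon_0\mu_0\,\frac{\partial^2\vec A}{\partial t^2}
  = \mu_0\,\vec J + \nabla(\nabla\cdot\vec A) - \epsilon_0\mu_0\,\frac{\partial^2\vec A}{\partial t^2}.
\]

Having in mind that ΔA⃗ = ∇(∇ · A⃗) − ∇ × (∇ × A⃗) we have

\[
\Delta\vec A - \epsilon_0\mu_0\,\frac{\partial^2\vec A}{\partial t^2} = -\mu_0\,\vec J.
\]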
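The couple of equations we have just obtained

\[
\Delta\phi - \epsilon_0\mu_0\,\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\epsilon_0},
\]

\[
\Delta\vec A - \epsilon_0\mu_0\,\frac{\partial^2\vec A}{\partial t^2} = -\mu_0\,\vec J
\]

provide a simpler and quite symmetric version of Maxwell equations in terms
of the potentials ϕ and A⃗, which are more convenient for theoretical studies.
In absence of charges and currents (that is, far away from them in practice),
the equations become homogeneous

\[
\Delta\phi - \epsilon_0\mu_0\,\frac{\partial^2\phi}{\partial t^2} = 0,
\]

\[
\Delta\vec A - \epsilon_0\mu_0\,\frac{\partial^2\vec A}{\partial t^2} = 0.
\]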
It is a very remarkable fact that they still have non trivial solutions which are
of wave type. Note that in the same conditions of absence of charges and
currents E⃗ and B⃗ satisfy similar equations that can be obtained more easily
from Maxwell equations. The speed of the waves is the number

\[
c = \frac{1}{\sqrt{\epsilon_0\mu_0}}
\]
which amazingly coincides with the speed of light. This is even more shocking
if we consider that ϵ0 and µ0 could be determined working with batteries and
wires in a modest laboratory. That led Maxwell to think that light is
actually an electromagnetic wave. Moreover, the system of equations can be
written as a single equation in R⁴ for the pair (A⃗, ϕ) and a 4-dimensional
Laplacian-like operator

\[
\square = \Big(\frac{\partial^2}{\partial x^2}, \frac{\partial^2}{\partial y^2}, \frac{\partial^2}{\partial z^2}, -\frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big).
\]

All this leads to the Theory of Relativity, but our way ends here.
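
The coincidence between 1/√(ϵ0µ0) and the speed of light can be checked with a few lines of code. The numerical values below are not taken from the text: they are the rounded CODATA 2018 recommended values in SI units, so the result is only approximate.

```python
import math

# Rounded CODATA 2018 values (an assumption of this sketch, SI units).
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A^2


def wave_speed(eps0=EPSILON_0, mu0=MU_0):
    """Speed c = 1 / sqrt(eps0 * mu0) of the waves solving the homogeneous equations."""
    return 1.0 / math.sqrt(eps0 * mu0)


print(wave_speed())  # close to the defined speed of light, 299792458 m/s
```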

11.8 Rationale and remarks


The dimension 3 played an important role in the chapter of differential forms.
It was so because of the statement of the classic results. It is not difficult to
see that the general formulation in terms of differential forms allows the
formulation in higher dimensions. However, there are aspects of dimension
3 that are exclusive. For instance, 3 is also the dimension of the 2-forms, so vector
fields can be thought of as either 1-forms or 2-forms.

That peculiarity is related to the so called “vector product”, incidentally


used in the computation of the area of a surface. For that reason, we start
with the quaternion origin of that notion. We show the correspondence be-
tween differential forms and fields.

Vector operators are the classic interpretation of the exterior differential.


The geometrical-physical interpretation is valuable for the applications later.
Then we introduce the Newtonian potential as a way to find a function with a
prescribed Laplacian. The computations are somewhat informal but complete.
The problem of uniqueness leads naturally to the study of harmonic functions.

Among the applications we have chosen some basic hydrostatics (the
theoretical results are compared to heuristic ones), hydrodynamics and
electromagnetic fields, where we get the 4-dimensional form of Maxwell’s
equations at the end.

11.9 Exercises

1. Let f and g be scalar functions, F⃗ and G⃗ be vector fields, all defined
on R³. Prove the following formulas:
(a) ∇(f g) = g ∇f + f ∇g.
(b) ∇ · (f F⃗) = ∇f · F⃗ + f ∇ · F⃗.
(c) ∇ × (f F⃗) = ∇f × F⃗ + f ∇ × F⃗.
(d) ∇ · (F⃗ × G⃗) = (∇ × F⃗) · G⃗ − F⃗ · (∇ × G⃗).
2. Show that ΔF⃗ = ∇(∇ · F⃗) − ∇ × (∇ × F⃗).

3. Prove using orthogonal coordinate transformations (positive for the
curl) in R³ that the vector operators are intrinsic.
4. Find the expression of the vector operators in cylindrical and spherical
coordinates.
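5. Show that the following operations do not define intrinsic operators:

\[
f \longmapsto \Big(\frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial y^2}, \frac{\partial^2 f}{\partial z^2}\Big);
\]

\[
\vec F \longmapsto \Big(
  \frac{\partial f_1}{\partial x} + \frac{\partial f_1}{\partial y} + \frac{\partial f_1}{\partial z},\;
  \frac{\partial f_2}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_2}{\partial z},\;
  \frac{\partial f_3}{\partial x} + \frac{\partial f_3}{\partial y} + \frac{\partial f_3}{\partial z}\Big);
\]

where F⃗ = (f1, f2, f3).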
6. Compute the flux of the field (ax, by, cz) on the sphere centred at the
origin and radius R > 0. Compute the flux of the same field on any
other sphere.

7. Assume that the bounded domain D ⊂ R3 has piecewise C 1 border and
F⃗ is a C 2 field defined on R3 . Find the flux of rot(F⃗ ) through ∂D.

8. Consider on R3 the vector field F⃗ = (x3 /a4 , y 3 /b4 , z 3 /c4 ) where a, b, c > 0.

(a) Describe geometrically the surfaces that are orthogonal to F⃗ .


(b) Compute the flux of F⃗ through the sphere centred at (0, 0, 0) and
radius 1.


9. Find the flux of the field F⃗ = (x + αy, y − αx, β), where α, β ∈ R are
parameters, through the portion of the paraboloid z = 1 − x² − y² above the
plane z = 0 with exterior orientation.

10. Consider the pyramid on [−1, 1]² with vertex at (0, 0, 3) and let L be the
surface made up from the lateral faces with exterior orientation. Let
F⃗ = e^{x+y−2z} (1, 1, 1). Compute

\[
\iint_L \vec F\cdot d\vec S.
\]

11. Show that the integral for the Newtonian potential generated by a
constant linear density ρ > 0 on the Z axis diverges. However, the analogous
integral for the Newtonian force converges on R³ except the Z axis. Find
a function Φ such that F⃗ = ∇Φ and explain why that is compatible with
the first statement of this exercise.

12. Consider the ellipsoid S with formula x²/a² + y²/b² + z²/c² = 1 and let
ρ(x, y, z) be the distance to the origin from the tangent plane to the ellipsoid
at (x, y, z). Show that

\[
\iint_S \rho\,dS = 4\pi abc;
\]

\[
\iint_S \frac{1}{\rho}\,dS = \frac{4\pi}{3}\,abc\,\Big(\frac{1}{a^2} + \frac{1}{b^2} + \frac{1}{c^2}\Big).
\]

Hint:

\[
\frac{1}{\rho(x, y, z)} = \sqrt{\frac{x^2}{a^4} + \frac{y^2}{b^4} + \frac{z^2}{c^4}}.
\]

13. Show that the following function is harmonic

\[
f(x, y, z) = \frac{xy}{(x^2 + y^2 + z^2)^{5/2}}.
\]

Calculate the flux integral

\[
I = \iint_{\partial D} \nabla f\cdot d\vec S
\]

where D ⊂ R³ has C¹ border and (0, 0, 0) ∉ ∂D.


14. Compute the area limited on the first quadrant by the curve

x3 + y 3 = 3xy.

15. Let f be a harmonic function defined on an open set of R². Let D be a
compact disc centred at p with radius r > 0 contained in the domain of f.
Prove that

\[
f(p) = \frac{1}{2\pi r}\int_{\partial D} f\,ds.
\]

16. Let α ≥ 1 and consider the function fα (x, y, z) = (x2 + y 2 + z 2 )α .


(a) Find a simplified expression for ∆fα .
(b) Compute the flux of ∇f1 through the sphere

x2 + y 2 + z 2 = 2z.

17. Let D ⊂ R³ be compact with C¹ border and let f be a scalar C¹ function
defined on a neighbourhood of D. Prove that

\[
\iint_{\partial D} f\,d\vec S = \iiint_D \nabla f\,dV.
\]

18. Let D ⊂ R³ be a compact set with C¹ border. Show that the integral

\[
\iint_{\partial D} \nabla\Big(\frac{1}{\rho}\Big)\cdot d\vec S
\]

where ρ = √(x² + y² + z²) takes the values −4π or 0 depending on 0 being
interior or exterior to D. Interpret the integral in terms of the solid angle
and make a guess on the values in case 0 ∈ ∂D.

19. Prove that if f, g are regular enough in D with D ⊂ R³ (or R²) open,
then

\[
\iiint_D g\,\Delta f\,dV + \iiint_D \nabla g\cdot\nabla f\,dV = \iint_{\partial D} g\,\nabla f\cdot d\vec S.
\]

Prove that a continuous function f : D → R which is harmonic on D
minimizes the “energy integral” among all the regular functions taking
the same values on ∂D, that is,

\[
\iiint_D \|\nabla f\|^2\,dV = \min\Big\{ \iiint_D \|\nabla g\|^2\,dV : g|_{\partial D} = f|_{\partial D} \Big\}.
\]

20. Let f : Rⁿ → R be a C² function whose level sets coincide with the level
sets of a harmonic function. Show that

\[
\frac{\Delta f}{\|\nabla f\|^2}
\]

depends only on the value of f.

21. Let f : R² → R be a C² function such that

\[
\iint_{\partial D} \nabla f\cdot d\vec S \ge 0
\]

for every bounded open set D with C¹ border. Prove that f cannot have
strict relative maximums. What about strict minimums?
22. Find the conditions that should be satisfied by the functions ϕ, ψ in order
for

\[
f(x, y, z) = \phi\big(\sqrt{x^2 + y^2}\big)\,\psi(z)
\]

to be harmonic. (Hint: use ϕ(r) with r = √(x² + y²)).

Chapter 12

Appendix A: The
Stone-Weierstrass theorem

12.1 General Topology


The topology of metric spaces is not enough to cover the needs of the Analysis of
functions of real variables. Indeed, we know that the uniform convergence can
be understood in terms of a metric topology, although pointwise convergence
cannot. For that reason we need general topological spaces.

A topology on a set X is a family τ ⊂ P(X) such that:

1. ∅, X ∈ τ ;
2. if U, V ∈ τ , then U ∩ V ∈ τ ;
3. if (Ui)i∈I ⊂ τ, then ⋃_{i∈I} Ui ∈ τ.
Note that the family of open sets in a metric space is a topology, so we will
refer to the elements of τ as open sets, and their complements will be called
closed sets. A set X endowed with a topology is called a topological space.
The topological space (X, τ) is said to be Hausdorff if for every x, y ∈ X with
x ≠ y there exist U, V ∈ τ with x ∈ U, y ∈ V and U ∩ V = ∅. Clearly,
metric topologies are Hausdorff. Continuity can be defined locally, but we are
only interested in global continuity: a mapping f : (X1, τ1) → (X2, τ2) between
topological spaces is continuous if f⁻¹(V) ∈ τ1 whenever V ∈ τ2.

A topological space (X, τ) is said to be compact if for every family (Ui)i∈I ⊂ τ
with X = ⋃_{i∈I} Ui there exists J ⊂ I finite such that X = ⋃_{i∈J} Ui.
Compactness is a very important property in Analysis since it works quite well
together with continuity.

Theorem 12.1.1 (Weierstrass). Let K be a compact space and f : K → R a


continuous function. Then f is bounded and its maximum and its minimum
are reached at some points of K.

Proof. It is not difficult to prove that compactness is preserved by continuous
images. Therefore, f(K) is a compact subset of R, so it is bounded and closed,
and it contains its maximum and minimum.

As in the metric case, for a compact space K, we will denote C(K) the
set of real continuous functions defined on K, eventually endowed with the
supremum norm.
Theorem 12.1.2 (Urysohn). Let K be a compact Hausdorff space and let
A, B ⊂ K be disjoint closed subsets. Then there exists a continuous function
f : K → [0, 1] such that f|A = 0 and f|B = 1.
Proof. Consider a maximal family F of open sets such that if U ∈ F then
A ⊂ U ⊂ K \ B and Ū ⊂ V for all V ∈ F with U ⊊ V, where Ū denotes the
closure of U. The family F either contains a clopen subset (simultaneously open
and closed) or for any U, V ∈ F with U ⊊ V there is W ∈ F such that
U ⊊ W ⊊ V. In the first case the construction of f is obvious, so we will assume
the second case holds. By induction we may take Ut ∈ F for every dyadic
t ∈ [0, 1] in such a way that t ≤ s implies Ut ⊂ Us. Define now

\[
f(x) = \inf\{t \in [0, 1] : x \in U_t\}
\]

This function satisfies f|A = 0 and f|B = 1. We have to check its continuity.
By construction we have

\[
f^{-1}([0, s)) = \bigcup_{t<s} U_t; \qquad f^{-1}((s, 1]) = \bigcup_{t>s} (K \setminus \overline{U_t})
\]

which implies the continuity of f.

12.2 Approximation by continuous functions
The results in this section exploit two additional structures on C(K), namely
the algebra structure, i.e. C(K) is closed for the standard product of functions,
and the lattice structure, which means that C(K) is closed for the
operations max and min.
Proposition 12.2.1. A linear subspace X ⊂ C(K) is dense if and only if there
is ε ∈ (0, 1) such that for every pair of disjoint closed sets A, B ⊂ K there is
f ∈ X with values in [−1, 1] such that f|A ≤ ε − 1 and f|B ≥ 1 − ε.
Proof. Assume that X is dense and take a continuous function g such that
g|A = −1 + ε/2 is its minimum and g|B = 1 − ε/2 is its maximum. Then find
f ∈ X such that ∥f − g∥∞ < ε/2. Clearly f fulfils the required conditions.
The reverse implication is more delicate. Firstly note that by scaling it is
enough to prove the approximation by elements from X for functions with
values in [−1, 1]. Consider g ∈ C(K) with values in [−1, 1] and take
A = g⁻¹([−1, −1/3]) and B = g⁻¹([1/3, 1]). Apply the hypothesis to find f ∈ X
with its values in [−1, 1], f(A) ⊂ [−1, ε − 1] and f(B) ⊂ [1 − ε, 1]. Take
f1 = 3⁻¹f and set λ = (2 + ε)/3 < 1. An elementary computation, checking
separately the sets A, B and {|g| < 1/3}, shows that ∥g − f1∥∞ ≤ λ,
that is (g − f1)(K) ⊂ [−λ, λ]. We have now that λ⁻¹(g − f1) is a function taking
values in [−1, 1], so we can repeat the previous argument to find f2 such that

\[
-\lambda \le \lambda^{-1}(g - f_1) - f_2 \le \lambda
\]

which implies

\[
\|g - f_1 - \lambda f_2\|_\infty \le \lambda^2.
\]

Inductively we can find a sequence (fn) ⊂ X such that

\[
\|g - f_1 - \lambda f_2 - \cdots - \lambda^{n-1} f_n\|_\infty \le \lambda^n
\]

which proves that g is approximated by elements from X as λⁿ goes to 0.


Theorem 12.2.2. Let X ⊂ C(K) be a vector lattice that contains the
constants and assume that X separates the points of K. Then X is dense in
C(K).

Proof. The linearity and the possibility of adding constants allow us to find a
function fx,y ∈ X such that fx,y(x) = −2 and fx,y(y) = 2 whenever x, y ∈ K
with x ≠ y. Let A ⊂ K be closed and y ∈ K \ A. The family of sets (Ux)x∈A
where Ux = fx,y⁻¹((−∞, −1)) is an open cover of A. Let x1, . . . , xn ∈ A be such
that the corresponding sets cover A. Then

\[
f_{A,y} = \max\{-1, \min\{f_{x_1,y}, f_{x_2,y}, \ldots, f_{x_n,y}\}\}
\]

is a function in X such that fA,y|A = −1 and fA,y(y) = 2. Now, if B ⊂ K is
closed and A ∩ B = ∅ we may consider the open cover of B given by the sets
Vy = fA,y⁻¹((1, +∞)) and take a finite subcover given by points y1, . . . , ym. Thus
the function

\[
f_{A,B} = \min\{1, \max\{f_{A,y_1}, f_{A,y_2}, \ldots, f_{A,y_m}\}\}
\]

satisfies fA,B|A = −1 and fA,B|B = 1. The proof finishes by applying the
denseness criterion Proposition 12.2.1.
Lemma 12.2.3. Let (pn(t)) be the sequence of polynomials defined on [0, 1]
inductively by p1(t) = 0 and pn+1(t) = pn(t) + 2⁻¹(t − pn(t)²). Then (pn(t))
converges uniformly to √t and consequently the function |t| can be uniformly
approximated by polynomials on any bounded interval of R.
Proof. Firstly we show that 0 ≤ pn(t) ≤ √t. Indeed, assuming 0 ≤ pn(t) ≤ √t by
induction and keeping in mind that t ∈ [0, 1] we have

\[
p_{n+1}(t) = p_n(t) + 2^{-1}(\sqrt{t} + p_n(t))(\sqrt{t} - p_n(t))
  \le p_n(t) + \sqrt{t}\,(\sqrt{t} - p_n(t))
  \le p_n(t) + (\sqrt{t} - p_n(t)) = \sqrt{t}.
\]

Now we have pn+1(t) ≥ pn(t) so the sequence is increasing and bounded. The
limit p(t) is the only nonnegative solution of the functional equation
p(t) = p(t) + 2⁻¹(t − p(t)²), that is, p(t) = √t. The convergence is uniform by
Dini’s theorem 1.5.2.
It is clear that the composition pn(t²) converges to |t| uniformly on [−1, 1]. In
order to approximate |t| on any bounded interval we may suppose that it is
of the form [−M, M] with M > 0. It is not difficult to see that the sequence of
polynomials

\[
P_n(t) = M\,p_n\Big(\frac{t^2}{M^2}\Big)
\]

converges uniformly to |t| on [−M, M].
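
The recursion of the lemma is easy to run numerically. The following sketch evaluates the iterates pointwise instead of expanding the polynomials, and estimates the uniform error on a finite grid; the function names and the grid size are illustrative choices, not part of the lemma.

```python
import math


def p_iterate(t, steps):
    """Value at t of the polynomial obtained after `steps` updates
    p <- p + (t - p**2) / 2 starting from p = 0, so that p_1 of the
    lemma corresponds to steps = 0."""
    p = 0.0
    for _ in range(steps):
        p = p + 0.5 * (t - p * p)
    return p


def sup_error(steps, grid_size=1000):
    """Largest value of sqrt(t) - p(t) over a uniform grid of [0, 1]."""
    return max(
        math.sqrt(i / grid_size) - p_iterate(i / grid_size, steps)
        for i in range(grid_size + 1)
    )


def approx_abs(t, M, steps):
    """Polynomial approximation M * p(t**2 / M**2) of |t| on [-M, M]."""
    return M * p_iterate((t / M) ** 2, steps)


for steps in (1, 10, 100, 1000):
    print(steps, sup_error(steps))   # the sup error decreases slowly

print(approx_abs(-2.0, 3.0, 500))    # close to |-2| = 2
```

The slow decay of the uniform error comes from small values of t, where each step only reduces the error by a factor close to 1.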
Theorem 12.2.4. Let X ⊂ C(K) be a subalgebra that contains the constants
and assume that X separates the points of K. Then X is dense in C(K).

Proof. Let X̄ denote the closure of X in C(K), which is again a subalgebra.
Take any f ∈ X and let (Pn(t)) be a sequence of polynomials which
converges uniformly to |t| on the set f(K) ⊂ R. Note that Pn ◦ f ∈ X by the
hypotheses, so |f| ∈ X̄, which implies the lattice property for X̄. Theorem
12.2.2 says that X̄ is dense in C(K) and so X̄ = C(K).

As a consequence we recover the classical theorems of Weierstrass.


Corollary 12.2.5. The following statements are due to K. Weierstrass:

1. The algebraic polynomials in one variable are dense in C[a, b].


2. The polynomials in n variables are dense in C(K) with K ⊂ Rn compact.
3. The trigonometric polynomials are dense in C[0, 2π].

Proof. Only the last statement needs to be addressed. The fact that the
trigonometric polynomials are an actual algebra follows from these well known
equalities
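\[
\cos(\alpha)\cos(\beta) = 2^{-1}(\cos(\alpha + \beta) + \cos(\alpha - \beta))
\]
\[
\sin(\alpha)\sin(\beta) = 2^{-1}(\cos(\alpha - \beta) - \cos(\alpha + \beta))
\]
\[
\sin(\alpha)\cos(\beta) = 2^{-1}(\sin(\alpha + \beta) + \sin(\alpha - \beta))
\]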
and separation of points in [0, π] is done just by sin t and cos t.

Chapter 13

Appendix B: Some properties of


Lp spaces

13.1 Basic properties


Throughout the chapter (Ω, Σ, µ) will be a complete measure space. Occasionally
we could require the measure to be finite or σ-finite, however completeness is
important in order not to care about what happens on a null measure set. The
spaces Lp(µ) for 0 < p < +∞ are defined as

\[
L^p(\mu) = \Big\{ f \text{ measurable} : \int |f|^p\,d\mu < \infty \Big\}.
\]

The case p = ∞ is treated in a different way

L∞ (µ) = {f measurable : ∃M > 0, µ(|f | > M ) = 0}.

We call that property being essentially bounded. We may complete the scale
of spaces by taking L0 (µ) the set of measurable real-valued functions, that was
named M in the chapter of Measure Theory. Firstly note the following fact.
Proposition 13.1.1. Lp (µ) is a vector space for 0 ≤ p ≤ +∞.

Proof. The case L0(µ) was already studied in Measure Theory, and L∞(µ) is
quite obvious. Note that if a, b ≥ 0 then

\[
\Big(\frac{a + b}{2}\Big)^p \le \max\{a^p, b^p\} \le a^p + b^p.
\]

Therefore

\[
\int |f + g|^p\,d\mu \le 2^p \int |f|^p\,d\mu + 2^p \int |g|^p\,d\mu < \infty
\]

if f, g ∈ Lp(µ). Homogeneity is evident.

On these spaces we define a distinguished functional for p > 0

\[
\|f\|_p = \Big(\int |f|^p\,d\mu\Big)^{1/p} \quad \text{if } f \in L^p(\mu) \text{ and } 0 < p < \infty;
\]

∥f ∥∞ = inf{M ≥ 0 : µ(|f | > M ) = 0} if f ∈ L∞ (µ).


It is easy to see that ∥λf ∥p = |λ|∥f ∥p for any λ ∈ R and ∥f ∥p = 0 if and only
if f = 0 almost everywhere. The following is not so obvious.
Proposition 13.1.2. ∥ · ∥p is a seminorm for 1 ≤ p ≤ ∞.
Proof. The case p = ∞ is easy, so we will assume 1 ≤ p < ∞. Once we
know the functional is homogeneous, being a seminorm is equivalent to the
convexity of the “unit ball”

\[
B_p = \{f \in L^p(\mu) : \|f\|_p \le 1\} = \Big\{f \in L^p(\mu) : \int |f|^p\,d\mu \le 1\Big\}.
\]

Let f, g ∈ Bp and λ ∈ [0, 1]. The convexity of t → tᵖ implies

\[
|\lambda f(x) + (1 - \lambda) g(x)|^p \le \lambda |f(x)|^p + (1 - \lambda) |g(x)|^p
\]

Integrating we get

\[
\int |\lambda f + (1 - \lambda) g|^p\,d\mu \le \lambda \int |f|^p\,d\mu + (1 - \lambda) \int |g|^p\,d\mu \le 1
\]

which is the desired convexity of Bp.


For the sake of completeness we will prove that the convexity of Bp implies the
triangle property for ∥ · ∥p. Indeed, we may assume f, g ∈ Lp(µ) are such that
∥f∥p, ∥g∥p ≠ 0. Then f/∥f∥p, g/∥g∥p ∈ Bp and thus

\[
\frac{\|f\|_p}{\|f\|_p + \|g\|_p}\,\frac{f}{\|f\|_p} + \frac{\|g\|_p}{\|f\|_p + \|g\|_p}\,\frac{g}{\|g\|_p} \in B_p
\]

and thus

\[
\Big\| \frac{f + g}{\|f\|_p + \|g\|_p} \Big\|_p \le 1
\]

implying ∥f + g∥p ≤ ∥f∥p + ∥g∥p as desired.

Note that the key inequality for convexity goes the opposite way if p < 1,
however we have the following.
Proposition 13.1.3. The formula d(f, g) := ∥f − g∥p^p defines a translation
invariant pseudometric on Lp(µ) for 0 < p < 1.
Proof. Note that for a, b ≥ 0 and 0 < p ≤ 1 we have (a + b)ᵖ ≤ aᵖ + bᵖ.
Therefore

\[
\int |f - g|^p\,d\mu \le \int |f - h|^p\,d\mu + \int |h - g|^p\,d\mu
\]

and the fact that the pseudometric is translation invariant is obvious.
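
A two-point example makes the situation for p < 1 concrete. The sketch below assumes the counting measure on a set with two points, so functions are just pairs of numbers; the helper names are hypothetical.

```python
def p_functional(f, p):
    """(sum |f_i|**p)**(1/p) for a finite sequence f (counting measure)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


def d(f, g, p):
    """The translation invariant distance ||f - g||_p**p for 0 < p < 1."""
    return sum(abs(x - y) ** p for x, y in zip(f, g))


p = 0.5
f, g, zero = (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)
h = tuple(x + y for x, y in zip(f, g))   # h = f + g

# The triangle inequality fails for the functional ||.||_p itself ...
print(p_functional(h, p), p_functional(f, p) + p_functional(g, p))  # 4.0 2.0
# ... but it holds for d (here with equality).
print(d(h, zero, p), d(h, f, p) + d(f, zero, p))                    # 2.0 2.0
```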

Another useful inequality.


Theorem 13.1.4. Assume that µ(Ω) = 1, we are given a convex function
ϕ : (a, b) → R and f ∈ L1 such that f(x) ∈ (a, b) for almost every point. Then

\[
\phi\Big(\int f\,d\mu\Big) \le \int \phi\circ f\,d\mu
\]

where the integral is taken with value ∞ in case ϕ ◦ f ∉ L1.


Proof. The inequality is evident for simple functions provided that they are
written in “canonical form”, that is, on a partition of Ω. The general
statement follows by taking limits: if sn → f pointwise then ϕ ◦ sn → ϕ ◦ f for
a sequence of simple functions such that sn ∈ (a, b), because a convex function
is continuous. In order to ensure the convergence of the integrals note that for
(ϕ ◦ f)⁻ it is possible to use dominated convergence (the function ϕ is bounded
below by a linear one) and for (ϕ ◦ f)⁺ monotone convergence.
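
For a simple function the inequality of Theorem 13.1.4 is a finite computation, which can be sketched as follows. The partition weights and the values of f are hypothetical data, and the exponential is used as an example of a convex function.

```python
import math


def jensen_sides(phi, values, weights):
    """Return (phi(integral of f), integral of phi(f)) for a simple function.

    `values` are the values of f on the pieces of a finite partition of
    Omega and `weights` the measures of the pieces, which must add up to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-12
    mean = sum(w * v for v, w in zip(values, weights))
    return phi(mean), sum(w * phi(v) for v, w in zip(values, weights))


# f takes the values 0, 1, 3 on pieces of measure 1/2, 1/4, 1/4.
left, right = jensen_sides(math.exp, [0.0, 1.0, 3.0], [0.5, 0.25, 0.25])
print(left, right)   # exp(1) on the left, a larger number on the right
```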

Consider the equivalence relation ∼ given by f ∼ g if f = g almost everywhere.
The quotient spaces Lp(µ)/∼, which we will still denote by Lp(µ), are vector
spaces and the functional ∥ · ∥p is well defined on them. Note that now ∥ · ∥p is
a norm on Lp(µ) if 1 ≤ p ≤ ∞ and ∥ · ∥p^p defines a translation invariant metric
on Lp(µ) if 0 < p < 1.
Theorem 13.1.5. (Lp (µ), ∥ · ∥p ) is complete for 1 ≤ p ≤ ∞.
Proof. Assume p < ∞. It is enough to prove that an absolutely convergent
series is convergent, so assume (fn) ⊂ Lp(µ) with Σ_{n=1}^∞ ∥fn∥p < ∞. Take
gn = Σ_{k=1}^n |fk|, which is an increasing sequence of positive functions. The
triangle property of the norm implies supn ∥gn∥p < ∞. In particular we have

\[
\int \lim_n g_n^p\,d\mu = \lim_n \int g_n^p\,d\mu < \infty
\]

that implies limn gn < ∞ almost everywhere. Therefore the series Σ_{n=1}^∞ fn is
almost everywhere pointwise convergent. Let f be its limit where it is convergent
and take 0 otherwise. Clearly f is a measurable function and we have

\[
\int |f|^p\,d\mu \le \int \Big(\sum_{n=1}^{\infty} |f_n|\Big)^p d\mu = \lim_n \int g_n^p\,d\mu < \infty
\]

meaning that f ∈ Lp(µ). The convergence of the series to f will be a consequence
of the ∥ · ∥p-boundedness of the tails

\[
\Big\| f - \sum_{k=1}^{n} f_k \Big\|_p^p \le \Big\| \sum_{k=n+1}^{\infty} |f_k| \Big\|_p^p \le \Big(\sum_{k=n+1}^{\infty} \|f_k\|_p\Big)^p \to 0
\]

where the last inequality comes from the monotone convergence theorem
applied to the triangle inequality. The case p = ∞ can be handled with the same
standard ideas as the proof of the completeness of ℓ∞ or C(K).

13.2 Convergence
Here we will compare several types of convergence. Firstly we will introduce
the notion of convergence in measure. We say that a sequence (fn) ⊂ L0(µ)
converges in measure to f if

\[
\lim_n \mu(\{|f_n - f| > \varepsilon\}) = 0
\]

for every ε > 0. Clearly, the limit in measure is determined almost everywhere.
Note that Chebyshev inequality says that if f ∈ Lp(µ) and ε > 0 then

\[
\mu(\{|f| \ge \varepsilon\}) \le \varepsilon^{-p} \int |f|^p\,d\mu = \varepsilon^{-p}\,\|f\|_p^p
\]

implying the following.


Proposition 13.2.1. The convergence in norm ∥ · ∥p implies the convergence
in measure.
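
Chebyshev inequality can be checked directly on a simple function. In the sketch below a function on a finite measure space is encoded by its values on the pieces of a partition together with the measures of those pieces; the data are hypothetical.

```python
def chebyshev_sides(values, weights, eps, p):
    """Both sides of mu({|f| >= eps}) <= eps**(-p) * integral of |f|**p
    for a simple function with the given values and piece measures."""
    measure = sum(w for v, w in zip(values, weights) if abs(v) >= eps)
    integral = sum(w * abs(v) ** p for v, w in zip(values, weights))
    return measure, eps ** (-p) * integral


values = [0.1, -0.5, 2.0, 4.0]     # values of f on four pieces
weights = [3.0, 1.0, 0.5, 0.25]    # measures of the pieces
for eps in (0.25, 1.0, 3.0):
    print(eps, chebyshev_sides(values, weights, eps, p=2))
```

Applied to fn − f, the right-hand side tends to 0 when ∥fn − f∥p does, which is how the proposition follows.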

We also have.

Proposition 13.2.2. If µ(Ω) < ∞ then the convergence almost everywhere


implies the convergence in measure.

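Proof. If (fn) converges to f almost everywhere then for every ε > 0 we have

\[
\mu\Big(\bigcap_{n=1}^{\infty} \bigcup_{k\ge n} \{|f_k - f| > \varepsilon\}\Big) = 0.
\]

Thus, if µ(Ω) < ∞, then

\[
\mu(\{|f_n - f| > \varepsilon\}) \le \mu\Big(\bigcup_{k\ge n} \{|f_k - f| > \varepsilon\}\Big) \to 0
\]

as wished.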

The convergence in measure is a topological one.

Proposition 13.2.3. If µ(Ω) < ∞ then the convergence in measure is metrized
by

\[
d(f, g) = \int \min\{|f - g|, 1\}\,d\mu.
\]

Proof. Since the notion is translation invariant we will consider
neighbourhoods of 0. If 0 < ε < 1 then

\[
\mu(|f| > \varepsilon) \le \varepsilon^{-1} \int \min\{|f|, 1\}\,d\mu
\]

and thus the d-convergence implies the convergence in measure. On the other
hand, if (fn) converges in measure to 0 the right-hand side of the following
formula can be made as small as we wish

\[
\int \min\{|f_n|, 1\}\,d\mu \le \varepsilon\,\mu(\Omega) + \mu(\{|f_n| > \varepsilon\})
\]

which implies the convergence in the metric d.

The argument in the proof of the following proposition was already used
to prove Theorem 8.6.5.

Proposition 13.2.4. If a sequence is convergent in measure, then it has a
subsequence which converges almost everywhere.
Proof. Let (fn) be a sequence converging in measure to f. Then it is possible
to find n1 such that

\[
\mu(\{|f_{n_1} - f| > 1\}) \le 1/2.
\]

Inductively it is possible to build an increasing sequence n1 < n2 < . . . such
that the sets

\[
A_k = \{|f_{n_k} - f| > 1/k\}
\]

satisfy µ(Ak) ≤ 2⁻ᵏ. Take A = ⋂_{k=1}^∞ ⋃_{j≥k} Aj and note that µ(A) = 0,
since µ(⋃_{j≥k} Aj) ≤ 2^{1−k} for every k. By construction we have for any
x ∈ Aᶜ that |fnk(x) − f(x)| ≤ 1/k from a certain k on, and so the proposition
is proven.
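
A standard illustration of this proposition, not taken from the text, is the so-called typewriter sequence on [0, 1) with the Lebesgue measure: indicators of dyadic intervals that sweep the unit interval again and again. The sketch below assumes that indexing convention (n = 2ᵏ + j) and uses hypothetical function names.

```python
def typewriter(n, x):
    """n-th typewriter function on [0, 1): the indicator of the interval
    [j / 2**k, (j + 1) / 2**k), where n = 2**k + j with 0 <= j < 2**k."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return 1.0 if j / 2 ** k <= x < (j + 1) / 2 ** k else 0.0


def support_measure(n):
    """Lebesgue measure of {f_n > eps}, the same for every 0 < eps < 1."""
    return 1.0 / 2 ** (n.bit_length() - 1)


x = 0.3
hits = [n for n in range(1, 2 ** 10) if typewriter(n, x) == 1.0]
print(len(hits))                    # 10: one hit per dyadic level, so f_n(x) has no limit
print(support_measure(2 ** 10))     # the measure of the support tends to 0
print([typewriter(2 ** k, x) for k in range(1, 10)])   # subsequence n_k = 2**k
```

The sequence converges to 0 in measure but at no point of [0, 1), while the subsequence with indices 2ᵏ converges to 0 at every x > 0.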

We also have the following.


Theorem 13.2.5 (Egoroff’s theorem). Assume µ(Ω) < ∞ and let (fn ) be a
sequence of measurable functions that converges to f almost everywhere. Then
for every ε > 0 there is a set Ωε ∈ Σ such that µ(Ω\Ωε ) < ε and (fn ) converges
to f uniformly on Ωε .
Proof. Consider the sequence gn = sup{|fk − f| : k ≥ n} which converges to
0 almost everywhere. By Proposition 13.2.2, (gn) converges to 0 in measure.
Therefore, for every n ∈ N we can find kn ∈ N such that

\[
\mu(\{g_{k_n} > 1/n\}) < 2^{-n}\varepsilon.
\]

Take An = {gkn ≤ 1/n} and Ωε = ⋂_{n=1}^∞ An. By construction, it is easy to
check that µ(Ω \ Ωε) < ε. For any n ∈ N and x ∈ Ωε ⊂ An we have

\[
|f_m(x) - f(x)| \le g_{k_n}(x) \le 1/n
\]

for any m ≥ kn, which implies the uniform convergence of (fn) on Ωε.
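
A classical example, added here as an illustration, is fn(x) = xⁿ on [0, 1] with the Lebesgue measure: it converges to 0 except at x = 1, not uniformly on [0, 1], but uniformly on Ωε = [0, 1 − ε/2], whose complement has measure ε/2 < ε. The sketch below uses that xⁿ is increasing, so the supremum is attained at the right endpoint; the function name is hypothetical.

```python
def sup_on_omega_eps(n, eps):
    """sup of x**n over Omega_eps = [0, 1 - eps / 2], attained at the
    right endpoint because x**n is increasing on [0, 1]."""
    return (1.0 - eps / 2) ** n


for n in (10, 100, 1000, 10000):
    # eps = 0 recovers the whole interval, where the sup stays equal to 1
    print(n, sup_on_omega_eps(n, eps=0.01), sup_on_omega_eps(n, eps=0.0))
```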

13.3 Classification of Lp spaces and examples


Let us start with two examples with totally opposite behaviour. For a set $\Gamma$ we
will denote by $\ell^p(\Gamma)$ the space $L^p$ built on the measure space $(\Gamma, \mathcal{P}(\Gamma), \#)$. If
$\Gamma = \mathbb{N}$ we will simply write $\ell^p$. For these spaces we have $\ell^{p_1} \subset \ell^{p_2}$ if $p_1 \le p_2$,
and moreover the inclusion is continuous with norm 1. On the other hand,
if $(\Omega, \Sigma, \mu)$ is a finite measure space the inclusion happens in the reverse way:
$L^{p_2}(\mu) \subset L^{p_1}(\mu)$ if $p_1 \le p_2$. Indeed, assume $f \in L^{p_2}(\mu)$; then
\[ \int |f|^{p_1}\, d\mu = \int_{|f| \le 1} |f|^{p_1}\, d\mu + \int_{|f| > 1} |f|^{p_1}\, d\mu \le \mu(\Omega) + \int_{|f| > 1} |f|^{p_2}\, d\mu < \infty. \]

The norm of the inclusion can be sharply estimated with the help of Hölder’s
inequality, see next section.
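The inclusion $\ell^{p_1} \subset \ell^{p_2}$ with norm 1 means that $\|x\|_{p_2} \le \|x\|_{p_1}$ when $p_1 \le p_2$. The sketch below checks this on a finite sequence and uses the sequence $(1/n)$, a standard example not taken from the text, to suggest that the inclusion $\ell^1 \subset \ell^2$ is strict. The names `lp_norm` and `partial_sum` are hypothetical.

```python
def lp_norm(x, p):
    """The l^p norm of a finite sequence x, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def partial_sum(exponent, terms):
    """Sum of n**(-exponent) for n = 1, ..., terms."""
    return sum(n ** (-exponent) for n in range(1, terms + 1))
```

The partial sums of $\sum 1/n^2$ stay bounded while those of $\sum 1/n$ grow like $\log n$, so $(1/n)$ belongs to $\ell^2$ but not to $\ell^1$.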

The general case happens to be a blend of the two previous ones, although
we will state the general result under the hypothesis of σ-finiteness.
Proposition 13.3.1. Let $(\Omega, \Sigma, \mu)$ be a σ-finite measure space and $1 \le p \le \infty$.
Then $L^p(\mu)$ is isometric to a direct sum of $\ell^p(\Gamma)$ and $L^p(\nu)$ where $\Gamma \subset \mathbb{N}$ and
$\nu$ is an atom-free finite measure (possibly trivial).
Proof. By a result of measure theory we know that $\Omega = \Omega_a \cup \Omega_f$ with
$\Omega_a \cap \Omega_f = \emptyset$, where $\Omega_a$ is atomic and $\Omega_f$ is atom-free. Clearly
\[ \int |f|^p\, d\mu = \int_{\Omega_a} |f|^p\, d\mu + \int_{\Omega_f} |f|^p\, d\mu, \]
which implies that $L^p(\mu)$ is the $\ell^p$-sum of $L^p(\Omega_a)$ and $L^p(\Omega_f)$. Now, let $(A_\gamma)_{\gamma \in \Gamma}$ be an
enumeration of the atoms. The map
\[ T : \ell^p(\Gamma) \to L^p(\Omega_a) \]
defined by $T((x_\gamma)) = \sum_\gamma x_\gamma\, \mu(A_\gamma)^{-1/p} \chi_{A_\gamma}$ is an isometry (details are left to the
reader). For the atom-free part, if it is not of finite measure already, we may
consider a decomposition $\Omega_f = \bigcup_n P_n$ into disjoint sets with $\mu(P_n) = 1$. Consider the measure
$\nu(A) = \sum_n 2^{-n} \mu(P_n \cap A)$, which is finite on $\Omega_f$. The map
\[ S : L^p(\Omega_f) \to L^p(\nu) \]
defined by $S(f) = \sum_n 2^{n/p} \chi_{P_n} f$ is an isometry.

In case $L^p(\mu)$ is separable for some $1 \le p < \infty$ (equivalently, $L^p(\mu)$ is
separable for all $1 \le p < \infty$, or $(\Sigma, d_\mu)$ is separable, see Proposition 8.6.2), it is
possible to choose $\nu$ to be the Lebesgue measure on $[0, 1]$.

13.4 Duality
We will use the following arithmetical inequality: if $1 < p, q < \infty$ satisfy
$1/p + 1/q = 1$ and $a, b \ge 0$ then
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q}, \]
and equality only happens if $a^p = b^q$. The proof of this inequality can be
obtained geometrically by interpreting the summands on the right-hand side
as areas limited by the curve $y = x^{p-1}$ (or equivalently $x = y^{q-1}$).
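As a sanity check, the inequality $ab \le a^p/p + b^q/q$ and its equality case can be tested numerically. This is only a sketch on random samples; the function names and the sampling ranges are assumptions of the example.

```python
import random


def young_gap(a, b, p):
    """Return a**p/p + b**q/q - a*b, where q is the conjugate exponent of p."""
    q = p / (p - 1.0)
    return a**p / p + b**q / q - a * b


def smallest_gap(samples, seed=0):
    """Smallest value of young_gap over random a, b in [0, 5], p in [1.1, 5]."""
    rng = random.Random(seed)
    return min(
        young_gap(rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0), rng.uniform(1.1, 5.0))
        for _ in range(samples)
    )
```

The gap should never be negative beyond rounding error, and it vanishes when $a^p = b^q$, for instance for $a = 2$, $b = 4$, $p = 3$, $q = 3/2$.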

Theorem 13.4.1 (Hölder inequality). Let 1 ≤ p, q ≤ ∞ satisfy 1/p + 1/q = 1.


If f ∈ Lp (µ) and g ∈ Lq (µ) then f g ∈ L1 (µ), ∥f g∥1 ≤ ∥f ∥p ∥g∥q . Moreover, if
the functions are normalized then the equality holds if and only if |f |p = |g|q
almost everywhere.
Proof. The case $1 \in \{p, q\}$ is easy. If $f \in L^\infty$ then the inequality $|f| \le \|f\|_\infty$
holds almost everywhere, so for every integrable function $g$ we have
\[ \int |fg|\, d\mu \le \|f\|_\infty \int |g|\, d\mu = \|f\|_\infty \|g\|_1. \]

Assume $1 < p, q < \infty$. We may also assume $\|f\|_p, \|g\|_q > 0$, otherwise both
members of the inequality turn out to be 0. Consider the norm-one functions
$f/\|f\|_p$ and $g/\|g\|_q$ and apply the arithmetic inequality:
\[ \frac{|fg|}{\|f\|_p \|g\|_q} \le \frac{|f|^p}{p\,\|f\|_p^p} + \frac{|g|^q}{q\,\|g\|_q^q}. \]

Integration gives
\[ \int \frac{|fg|}{\|f\|_p \|g\|_q}\, d\mu \le \int \frac{|f|^p}{p\,\|f\|_p^p}\, d\mu + \int \frac{|g|^q}{q\,\|g\|_q^q}\, d\mu = \frac{1}{p} + \frac{1}{q} = 1. \]

Therefore
\[ \int |fg|\, d\mu \le \|f\|_p \|g\|_q \]
as wanted. The statement about when the equality holds follows easily.
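On a finite set with a weighted counting measure, integrals are finite sums, so Hölder’s inequality can be checked directly. The sketch below assumes that discrete setting; `integral`, `norm_p` and `holder_sides` are hypothetical helper names.

```python
def integral(h, weights):
    """Integral of h with respect to the measure giving mass weights[i] to point i."""
    return sum(v * w for v, w in zip(h, weights))


def norm_p(f, weights, p):
    """The L^p norm of f for the discrete measure defined by weights."""
    return integral([abs(v) ** p for v in f], weights) ** (1.0 / p)


def holder_sides(f, g, weights, p):
    """Return (integral of |fg|, ||f||_p * ||g||_q) with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    lhs = integral([abs(a * b) for a, b in zip(f, g)], weights)
    rhs = norm_p(f, weights, p) * norm_p(g, weights, q)
    return lhs, rhs
```

Choosing $|g| = |f|^{p-1}$, so that $|g|^q = |f|^p$, makes both sides coincide, which illustrates the equality case.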

With the help of Hölder’s inequality we can now obtain a precise bound for
the inclusion operator between $L^p$-spaces when $\mu(\Omega) < \infty$. Assume $p_1 \le p_2$
and $f \in L^{p_2}(\mu)$. Then
\[ \int |f|^{p_1}\, d\mu = \int 1 \cdot |f|^{p_1}\, d\mu \le \Big( \int 1^q\, d\mu \Big)^{1/q} \Big( \int (|f|^{p_1})^{p_2/p_1}\, d\mu \Big)^{p_1/p_2} \]
where $q$ is the conjugate exponent to $p_2/p_1$, that is, $1/q = 1 - p_1/p_2$, and thus
we have
\[ \|f\|_{p_1} \le \mu(\Omega)^{1/p_1 - 1/p_2}\, \|f\|_{p_2}. \]
The sharpness of this bound can be tested on the function $f = 1$.
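The constant $\mu(\Omega)^{1/p_1 - 1/p_2}$ obtained from Hölder’s inequality with the conjugate exponent $1/q = 1 - p_1/p_2$ can be checked on a finite measure space with finitely many points. This is a minimal sketch under that assumption; the names are hypothetical.

```python
def norm_p(f, weights, p):
    """The L^p norm of f for the discrete measure defined by weights."""
    return sum(abs(v) ** p * w for v, w in zip(f, weights)) ** (1.0 / p)


def inclusion_constant(total_mass, p1, p2):
    """Norm bound of the inclusion L^{p2} -> L^{p1} for p1 <= p2 and finite mass."""
    return total_mass ** (1.0 / p1 - 1.0 / p2)
```

For the constant function 1 the two sides agree, since $\|1\|_p = \mu(\Omega)^{1/p}$, so the constant cannot be improved.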

Theorem 13.4.2. Let $(\Omega, \Sigma, \mu)$ be a measure space and $1 \le p, q \le \infty$ conjugate
exponents, that is, $1/p + 1/q = 1$. Then
the map $J : L^q(\mu) \to L^p(\mu)^*$ defined by $J(g)(f) = \int fg\, d\mu$ for $g \in L^q(\mu)$ and
$f \in L^p(\mu)$ is an (injective) isometry. Moreover
1. $L^p(\mu)^* = J(L^q(\mu))$ if $1 < p < \infty$;
2. $L^1(\mu)^* = J(L^\infty(\mu))$ if $\mu$ is σ-finite or a cardinal measure;
3. $J(L^1(\mu)) \subsetneq L^\infty(\mu)^*$ when they are infinite dimensional.


Proof. Hölder’s inequality implies that $J$ is well defined and $\|J(g)\| \le \|g\|_q$.
In order to check that $J$ is actually an isometry, if $1 < p < \infty$ take $f =
\mathrm{sign}(g)|g|^{q-1}$ and note that $\|f\|_p = \|g\|_q^{q-1}$. We have
\[ \int fg\, d\mu = \int |g|^q\, d\mu = \|g\|_q^q = \|f\|_p \|g\|_q. \]

In case $p = \infty$ just take $f = \mathrm{sign}(g)$ with the same proof, and for $p = 1$ it is
possible to find “almost norming” functions taking $\varepsilon > 0$ and $f = \mathrm{sign}(g)\chi_A$ where
$A \subset \{|g| \ge \|g\|_\infty - \varepsilon\}$ with $0 < \mu(A) < \infty$.
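Firstly assume $\mu$ is finite and $1 \le p < \infty$. Let $F$ be a continuous linear
functional defined on $L^p(\mu)$. Then the formula $\nu(A) = F(\chi_A)$ for $A \in \Sigma$
defines a σ-additive signed measure. Indeed, the formula is well defined as
$\chi_A \in L^p(\mu)$ and finite additivity is quite obvious. For the σ-additivity firstly
note that $|\nu(A)| \le \|F\|\, \mu(A)^{1/p}$. If we have a disjoint sequence $(A_n) \subset \Sigma$ then
\[ \big\| \chi_{\bigcup_{k=1}^{\infty} A_k} - \chi_{\bigcup_{k=1}^{n} A_k} \big\|_p = \big\| \chi_{\bigcup_{k=n+1}^{\infty} A_k} \big\|_p = \mu\Big( \bigcup_{k=n+1}^{\infty} A_k \Big)^{1/p} \to 0 \]
as $n$ goes to $\infty$. That implies
\[ \nu\Big( \bigcup_{k=1}^{\infty} A_k \Big) = F\big(\chi_{\bigcup_{k=1}^{\infty} A_k}\big) = \lim_n F\big(\chi_{\bigcup_{k=1}^{n} A_k}\big) = \lim_n \sum_{k=1}^{n} \nu(A_k) = \sum_{k=1}^{\infty} \nu(A_k). \]

We also have that $\nu$ is of bounded variation:
\[ \sum_{n=1}^{\infty} |\nu(A_n)| = \sum_{n=1}^{\infty} \pm F(\chi_{A_n}) = F\Big( \sum_{n=1}^{\infty} \pm\chi_{A_n} \Big) \le \|F\|\, \mu(\Omega)^{1/p}. \]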

The measure $\nu$ is absolutely continuous with respect to $\mu$, so the Radon-Nikodym
theorem gives us a function $g \in L^1(\mu)$ such that
\[ F(\chi_A) = \nu(A) = \int_A g\, d\mu = \int \chi_A\, g\, d\mu. \]

The equality between the extreme terms extends naturally to simple functions and then to any
$f \in L^\infty(\mu)$,
\[ F(f) = \int fg\, d\mu, \]
because of the uniform density of simple functions among the bounded measurable
functions. Now we are going to check that $g$ actually lies in $L^q(\mu)$.
Indeed, if $p = 1$ we claim that $g$ is essentially bounded. Otherwise, for every
$n$ it would be possible to find $A \in \Sigma$ with $\mu(A) > 0$ and $|g(x)| > n$ for $x \in A$.
Taking $f = \mathrm{sign}(g)\chi_A$ we would have
\[ |F(f)| = \int fg\, d\mu \ge n\, \mu(A) = n\, \|f\|_1, \]
which violates the continuity of $F$ as $f \ne 0$. This argument also gives
$\|g\|_\infty \le \|F\|$. If $p > 1$, assume that $A$ is a set where $g$ is bounded. Take
$f = \mathrm{sign}(g)\chi_A |g|^{q-1}$, which is also bounded. Note that $|f|^p = |g|^q$ on $A$. We
have
\[ \int_A |g|^q\, d\mu = \int fg\, d\mu = F(f) \le \|F\|\, \|f\|_p = \|F\| \Big( \int_A |g|^q\, d\mu \Big)^{1/p} \]
and thus
\[ \Big( \int_A |g|^q\, d\mu \Big)^{1/q} = \Big( \int_A |g|^q\, d\mu \Big)^{1 - 1/p} \le \|F\|. \]

Since the bound does not depend on $A$, taking $A_n = \{|g| \le n\}$ and applying
the monotone convergence theorem we get $g \in L^q(\mu)$ and $\|g\|_q \le \|F\|$.
The cases $p = 1$ and $p > 1$ can be put together in the following way: if
$(\Omega, \Sigma, \mu)$ is a general measure space, then for every $A \in \Sigma$ of finite measure there is $g_A \in L^q(\mu)$
supported by $A$ such that $F(f) = \int f g_A\, d\mu$ for every $f \in L^p(\mu)$ supported by
$A$, and $\|g_A\|_q \le \|F\|$. In case $(\Omega, \Sigma, \mu)$ is σ-finite it is clear how to extend
the result. Assume $(A_n)$ are disjoint, cover $\Omega$ and have finite measure. Put
$g_n = g_{A_n}$ and define $g(x) = g_n(x)$ if $x \in A_n$. The function $g$ is measurable and
for every $f \in L^p(\mu)$ and $n \in \mathbb{N}$ we have
\[ F\big(\chi_{\bigcup_{k=1}^{n} A_k} f\big) = \sum_{k=1}^{n} F(\chi_{A_k} f) = \sum_{k=1}^{n} \int_{A_k} fg\, d\mu = \int_{\bigcup_{k=1}^{n} A_k} fg\, d\mu, \]
which means that $g$ represents $F$ on $\bigcup_{k=1}^{n} A_k$. The previous observation gives
\[ \int_{\bigcup_{k=1}^{n} A_k} |g|^q\, d\mu \le \|F\|^q. \]

The monotone convergence theorem implies $g \in L^q(\mu)$. Now $g$ represents $F$
on the whole space because $\chi_{\bigcup_{k=1}^{n} A_k} f \to f$ in the $\|\cdot\|_p$ norm.
So far, the representation theorem has been proved for $1 \le p < \infty$ and $(\Omega, \Sigma, \mu)$
a σ-finite measure space. The general statement for $1 < p < \infty$ follows from
the fact that a continuous functional defined on $L^p(\mu)$ is supported on a σ-finite
subset of $\Omega$. Let $F : L^p(\mu) \to \mathbb{R}$ be a continuous functional which is not
σ-finitely supported in the following sense: for every countable family of sets of
finite positive measure $(A_n)$ there is a set $A$ disjoint from them all and $f \in L^p(\mu)$
supported in $A$ such that $F(f) > 0$. Using transfinite induction it is possible
to find a disjoint collection $(A_\alpha)_{\alpha < \omega_1} \subset \Sigma$ of sets of positive measure such that $A_\alpha$
supports a norm-one element $f_\alpha \in L^p(\mu)$ with $F(f_\alpha) > 0$. A standard argument
with uncountable cardinals shows that there is $\delta > 0$ such that $F(f_\alpha) > \delta$ for
infinitely many $\alpha$’s. We may relabel some of these elements with $\mathbb{N}$. Now note
that $\|\sum_{k=1}^{n} f_k\|_p = n^{1/p}$ and $F(\sum_{k=1}^{n} f_k) > n\delta$. That implies $\|F\| > \delta n^{1 - 1/p}$,
which goes to $\infty$ as $n$ does, a contradiction. Hence $F$ is supported
on a σ-finite set, so we can apply the previous part to get $g \in L^q(\mu)$ supported
on this same set and representing $F$.
The last fact follows from the existence of finitely additive measures on $\mathcal{P}(\mathbb{N})$
which are not σ-additive, which in terms of Banach spaces is just the fact that
$(\ell^1)^{**} \ne \ell^1$. A Hahn-Banach extension of a functional suitably defined by a
finitely additive measure on $(\Omega, \Sigma)$ will do the work.

13.5 Uniform convexity of Lp(µ) for 1 < p < ∞
We say that a Banach space $X$ is uniformly convex if
\[ \delta_X(t) = 1 - \sup\Big\{ \Big\| \frac{x + y}{2} \Big\| : \|x\| = \|y\| = 1,\ \|x - y\| \ge t \Big\} > 0 \]
for all $t \in (0, 2]$. The function $\delta_X(t)$ is called the modulus of uniform convexity
of $X$. The main aim now is to prove the uniform convexity of $L^p(\mu)$ spaces for
$1 < p < \infty$. We will distinguish between two cases with different proofs.
Theorem 13.5.1. Lp (µ) is uniformly convex for 1 < p < ∞.
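Proof. The norm $\|\cdot\|_p$ in $\mathbb{R}^2$ is strictly convex for $1 < p < \infty$. A compactness
argument shows that for every $t > 0$
\[ \delta(t) = 1 - \sup\Big\{ \Big| \frac{x + y}{2} \Big|^p : \frac{|x|^p + |y|^p}{2} = 1,\ |x - y|^p \ge t^p (|x|^p + |y|^p) \Big\} > 0. \]
Using homogeneity we get that if $|a - b|^p \ge t^p (|a|^p + |b|^p)$ then
\[ \Big| \frac{a + b}{2} \Big|^p \le (1 - \delta(t))\, \frac{|a|^p + |b|^p}{2}. \]
Assume $f, g \in L^p(\mu)$ with $\|f\|_p = \|g\|_p = 1$ and $\|f - g\|_p \ge t$. Consider the set
\[ A = \{ t^p (|f|^p + |g|^p) \le 4 |f - g|^p \}, \]
on which the previous inequality applies pointwise with $s = 4^{-1/p}\, t$ in place of $t$.
Note that
\[ \int_{A^c} |f - g|^p\, d\mu \le \frac{t^p}{4} \int_{A^c} (|f|^p + |g|^p)\, d\mu \le \frac{t^p}{2} \]
and therefore
\[ \int_A |f - g|^p\, d\mu \ge \frac{t^p}{2}. \]
We get the following inequality, which we will use soon:
\[ \int_A \frac{|f|^p + |g|^p}{2}\, d\mu \ge \int_A \Big( \frac{|f| + |g|}{2} \Big)^p d\mu \ge \int_A \Big| \frac{f - g}{2} \Big|^p d\mu \ge \frac{t^p}{2^p \cdot 2}. \]
The uniform convexity follows easily now:
\[ 1 - \Big\| \frac{f + g}{2} \Big\|_p^p \ge \int \Big( \frac{|f|^p + |g|^p}{2} - \Big| \frac{f + g}{2} \Big|^p \Big) d\mu \ge \int_A \Big( \frac{|f|^p + |g|^p}{2} - \Big| \frac{f + g}{2} \Big|^p \Big) d\mu \ge \delta(s) \int_A \frac{|f|^p + |g|^p}{2}\, d\mu \ge \frac{t^p\, \delta(s)}{2^{p+1}}. \]
Indeed,
\[ \sup\Big\{ \Big\| \frac{f + g}{2} \Big\|_p : \|f\|_p = \|g\|_p = 1,\ \|f - g\|_p \ge t \Big\} \le \Big( 1 - \frac{t^p\, \delta(s)}{2^{p+1}} \Big)^{1/p} < 1 \]
as wished.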

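In the case $p \ge 2$ it is possible to obtain a sharper inequality by simpler
means. We will use an arithmetical inequality. If $a, b \in \mathbb{R}$ and $2 \le p < \infty$ then
\[ \Big( \Big| \frac{a + b}{2} \Big|^p + \Big| \frac{a - b}{2} \Big|^p \Big)^{1/p} \le \Big( \Big| \frac{a + b}{2} \Big|^2 + \Big| \frac{a - b}{2} \Big|^2 \Big)^{1/2} = \Big( \frac{a^2 + b^2}{2} \Big)^{1/2} \]
and so
\[ \Big| \frac{a + b}{2} \Big|^p + \Big| \frac{a - b}{2} \Big|^p \le \Big( \frac{a^2 + b^2}{2} \Big)^{p/2} \le \frac{|a|^p + |b|^p}{2} \]
because of the convexity of $t \mapsto t^{p/2}$. Now, if $f, g \in L^p(\mu)$ with $\|f\|_p = \|g\|_p = 1$
then
\[ \int \Big| \frac{f + g}{2} \Big|^p d\mu + \int \Big| \frac{f - g}{2} \Big|^p d\mu \le \frac{1}{2} \Big( \int |f|^p\, d\mu + \int |g|^p\, d\mu \Big) = 1 \]
and thus
\[ \Big\| \frac{f + g}{2} \Big\|_p^p \le 1 - \Big\| \frac{f - g}{2} \Big\|_p^p. \]

It follows that
\[ \Big\| \frac{f + g}{2} \Big\|_p \le \Big( 1 - \Big( \frac{t}{2} \Big)^p \Big)^{1/p} \]
if $\|f - g\|_p \ge t$. Therefore $L^p(\mu)$ is uniformly convex with modulus of uniform
convexity
\[ \delta_{L^p(\mu)}(t) \ge 1 - \Big( 1 - \Big( \frac{t}{2} \Big)^p \Big)^{1/p} \sim \frac{t^p}{p\, 2^p} \]
when $t \sim 0$.
The case $1 < p < 2$ is trickier and it is known that $\delta_{L^p(\mu)}(t) \sim c_p t^2$.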
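The bound $\|(f+g)/2\|_p \le (1 - (t/2)^p)^{1/p}$ for $p \ge 2$ can be tested numerically in the finite-dimensional space $\ell^p_n$, which is $L^p$ of the counting measure on $n$ points. The sketch below samples random unit vectors; the function names, the dimension and the range of $p$ are assumptions of the example.

```python
import random


def norm_p(x, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def random_unit_vector(dim, p, rng):
    """A random vector of l^p norm 1."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    n = norm_p(x, p)
    return [t / n for t in x]


def midpoint_norm_and_bound(f, g, p):
    """Return (||(f+g)/2||_p, (1 - (t/2)**p)**(1/p)) with t = ||f - g||_p."""
    t = norm_p([a - b for a, b in zip(f, g)], p)
    mid = norm_p([(a + b) / 2.0 for a, b in zip(f, g)], p)
    bound = max(0.0, 1.0 - (t / 2.0) ** p) ** (1.0 / p)
    return mid, bound


def worst_slack(trials, dim=5, seed=0):
    """Smallest value of bound - midpoint norm over random pairs and p in [2, 6]."""
    rng = random.Random(seed)
    slacks = []
    for _ in range(trials):
        p = rng.uniform(2.0, 6.0)
        f = random_unit_vector(dim, p, rng)
        g = random_unit_vector(dim, p, rng)
        mid, bound = midpoint_norm_and_bound(f, g, p)
        slacks.append(bound - mid)
    return min(slacks)
```

A slack that is nonnegative up to rounding error is consistent with the inequality. The same experiment with $1 < p < 2$ may fail, which matches the remark that this range needs a different argument.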

Chapter 14

Appendix C: Introduction to
Lagrangian and Hamiltonian
mechanics

14.1 Coordinates and speeds


The standard way of presenting Newtonian mechanics is with the help of vec-
tor notation. For instance, the position of a system made up of N particles
is depicted by N spatial vectors (⃗r1 , . . . , ⃗rN ). The equations in Lagrangian
mechanics are scalar, so we need to change to the quite unnatural notation

(x1 , x2 , x3 , . . . , x3N −2 , x3N −1 , x3N ) := (⃗r1 , . . . , ⃗rN )

In many applications the positions of the particles cannot be totally arbitrary,
that is, the variables $\{x_i : 1 \le i \le 3N\}$, and perhaps the time $t$, satisfy
dependence relations, which can be interpreted in the language of smooth
manifolds (see Section 5.2). Assume that locally the position can be described
by $n \le 3N$ independent variables $\{q_j : 1 \le j \le n\}$, which are called generalized
coordinates. In Newtonian mechanics the equations of movement are
elegantly expressed in terms of the cartesian coordinates and their derivatives.
Lagrangian mechanics seeks a form of the equations of movement valid for every
set of coordinates. In particular, expressing the equations of movement
in terms of the generalized coordinates would simplify their resolution.

As the state of a system is determined by the positions and speeds of all
the particles of which it is made up, the first step is to find the relations between
cartesian and generalized coordinates. Put $q = (q_1, \dots, q_n)$ and let $\dot q = (\dot q_1, \dots, \dot q_n)$
be its derivative with respect to time, which we will call generalized speeds. As we said
above,
\[ x_i = x_i(q, t), \]
so the derivative with respect to time will be of the form
\[ \dot x_i = \dot x_i(q, \dot q, t). \]
The explicit expression can be found using the chain rule:
\[ \dot x_i = \sum_j \frac{\partial x_i}{\partial q_j}\, \dot q_j + \frac{\partial x_i}{\partial t}. \]

Note the difference between the total derivative with respect to $t$, which concerns
an actual movement, and the partial derivative with respect to $t$, which
expresses a constraint of the system that depends on time. On the
other hand, we may consider theoretical variations of the coordinates keeping $t$
constant, which are called virtual displacements.
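As an illustration of the chain rule for cartesian speeds, take a single particle in the plane with polar generalized coordinates $q = (r, \theta)$, so that $x = r\cos\theta$, $y = r\sin\theta$ with no explicit dependence on $t$. This choice of coordinates and the function names below are assumptions of the sketch.

```python
import math


def cartesian(q):
    """Cartesian position (x, y) for polar generalized coordinates q = (r, theta)."""
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian(q):
    """Matrix of partial derivatives dx_i/dq_j at q = (r, theta)."""
    r, theta = q
    return [
        [math.cos(theta), -r * math.sin(theta)],
        [math.sin(theta), r * math.cos(theta)],
    ]


def cartesian_speed(q, q_dot):
    """Chain rule: x_dot_i = sum_j (dx_i/dq_j) * q_dot_j (no explicit time term)."""
    jac = jacobian(q)
    return tuple(sum(jac[i][j] * q_dot[j] for j in range(2)) for i in range(2))
```

Along an actual movement $q(t)$, the value returned by `cartesian_speed` should agree with a finite-difference derivative of `cartesian(q(t))`, which is an easy way to test a change of coordinates.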
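Lemma 14.1.1. The following relations hold:
\[ \frac{\partial \dot x_i}{\partial \dot q_j} = \frac{\partial x_i}{\partial q_j}; \]
\[ \frac{\partial \dot x_i}{\partial q_j} = \frac{d}{dt} \Big( \frac{\partial x_i}{\partial q_j} \Big). \]
Proof. We have
\[ \dot x_i = \sum_k \frac{\partial x_i}{\partial q_k}\, \dot q_k + \frac{\partial x_i}{\partial t}, \]
from which the first formula follows trivially. For the second one, just compare
these expressions:
\[ \frac{\partial \dot x_i}{\partial q_j} = \sum_k \frac{\partial^2 x_i}{\partial q_k \partial q_j}\, \dot q_k + \frac{\partial^2 x_i}{\partial t\, \partial q_j} \]
and, by the chain rule,
\[ \frac{d}{dt} \Big( \frac{\partial x_i}{\partial q_j} \Big) = \sum_k \frac{\partial^2 x_i}{\partial q_j \partial q_k}\, \dot q_k + \frac{\partial^2 x_i}{\partial q_j\, \partial t}, \]
which are equal by commutativity of derivation, as the coordinate functions are
supposed regular enough.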

14.2 Forces, work and energy
The force acting on a particle is a quantitative description of its interaction
with the rest of the universe. We may distinguish between the interaction
with other particles of the system and interactions with objects from outside
the system, that is, internal forces and outer forces. We assume that the principle
of superposition holds, that is, that the interactions are additive and so
the total force applied on a particle is the sum of all the individual forces (one
for each interaction) applied on it.

The work done by a force which only depends on the configuration of the
system, along the displacement of a particle, is independent of time in the sense
that the speed of the particle does not matter for the computation. Therefore,
we may consider the work done by forces in virtual displacements. As the force
is variable, it is convenient to use a differential expression that physicists like to
interpret in terms of infinitesimals. In order to tell them apart from real displacements
done as time goes by, we will use $\delta$ instead of $d$ for differentials. The differential
of work expressed in cartesian coordinates appears as
\[ \delta W = \sum_i f_i\, \delta x_i \]
where $\{f_i : 1 \le i \le 3N\}$ are the components of the force suitably enumerated.
In order to express $\delta W$ in terms of the generalized coordinates, observe that
\[ \delta x_i = \sum_j \frac{\partial x_i}{\partial q_j}\, \delta q_j \]
and then
\[ \delta W = \sum_i f_i \Big( \sum_j \frac{\partial x_i}{\partial q_j}\, \delta q_j \Big) = \sum_j \Big( \sum_i f_i\, \frac{\partial x_i}{\partial q_j} \Big) \delta q_j = \sum_j Q_j\, \delta q_j \]
where $Q_j = \sum_i f_i\, \frac{\partial x_i}{\partial q_j}$ are called the generalized components of the force. One
important task is to determine the forces acting on a given system.
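For a concrete case of the formula $Q_j = \sum_i f_i\, \partial x_i / \partial q_j$, consider again one particle in the plane described by polar coordinates $(r, \theta)$. The coordinates and the names below are assumptions of this sketch, not part of the text.

```python
import math


def cartesian(q):
    """Cartesian position (x, y) for polar generalized coordinates q = (r, theta)."""
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))


def generalized_force(q, f):
    """Generalized components (Q_r, Q_theta) of the cartesian force f = (fx, fy)."""
    r, theta = q
    fx, fy = f
    q_r = fx * math.cos(theta) + fy * math.sin(theta)
    q_theta = r * (-fx * math.sin(theta) + fy * math.cos(theta))
    return (q_r, q_theta)


def virtual_work_cartesian(q, dq, f):
    """Work f . (x(q + dq) - x(q)) for a small virtual displacement dq."""
    x0 = cartesian(q)
    x1 = cartesian((q[0] + dq[0], q[1] + dq[1]))
    return sum(fi * (b - a) for fi, a, b in zip(f, x0, x1))
```

Here $Q_r$ is the radial component of the force and $Q_\theta$ is its torque about the origin, so a purely radial force has $Q_\theta = 0$. For a small $\delta q$ the value of `virtual_work_cartesian` agrees with $\sum_j Q_j\, \delta q_j$ up to second-order terms.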

In the very important case that the force is conservative (the work done between
two points does not depend on the trajectory) the differential form $\delta W$
is exact and there exists a function $V = V(q)$ such that $Q_j = -\frac{\partial V}{\partial q_j}$, where the
minus sign is taken for the sake of the physical interpretation of $V$ as potential
energy. The equation we will prove for the movement admits more general
potentials that could possibly depend on the speed too.

Another important case of forces, especially from the Lagrangian point of
view, are the constraint forces, which reduce the degrees of freedom of the system
and so motivate the use of a suitable choice of generalized coordinates. In
many practical cases the constraint forces do no work in virtual displacements
(e.g. they are normal to the movement). The systems where that holds
are called holonomic systems, and the generalized components of the constraint
forces reduce to 0.

If the masses of the particles are suitably enumerated, the kinetic energy
of the system is defined as
\[ T = \frac{1}{2} \sum_i m_i\, \dot x_i^2. \]
Although the energy depends only on the cartesian speeds, in generalized
coordinates it depends on $(q, \dot q, t)$. However, if $t$ does not appear explicitly in
the change of variables, then $T$ is a quadratic function with respect to $\dot q$. Let
us finish with the following easy fact.
Lemma 14.2.1.
\[ m_i\, \ddot x_i = \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot x_i} \Big). \]

14.3 Equations of movement


From now on we will assume that the cartesian coordinates are taken with
respect to an inertial frame. In such a case, Newton’s law applies to every
particle of the system and therefore
\[ m_i\, \ddot x_i = f_i \]
for $1 \le i \le 3N$, with $f_i$ being the $i$-th component of the total force applied to
the corresponding particle of the system. We wish to find the form of those
equations in generalized coordinates; the answer is the following.
Proposition 14.3.1. The equations of movement of a system with $n$ degrees
of freedom in generalized coordinates are
\[ \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot q_j} \Big) - \frac{\partial T}{\partial q_j} = Q_j \]
for $1 \le j \le n$. Moreover, if the system is holonomic the computation of $Q_j$
does not include the constraint forces.
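Proof. Consider the following chain of equalities, where the previous lemmata
are applied together with a trick based on the derivative of a product:
\[ Q_j = \sum_i f_i\, \frac{\partial x_i}{\partial q_j} = \sum_i m_i\, \ddot x_i\, \frac{\partial x_i}{\partial q_j} = \sum_i \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot x_i} \Big) \frac{\partial x_i}{\partial q_j} \]
\[ = \sum_i \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot x_i}\, \frac{\partial x_i}{\partial q_j} \Big) - \sum_i \frac{\partial T}{\partial \dot x_i}\, \frac{d}{dt} \Big( \frac{\partial x_i}{\partial q_j} \Big) \]
\[ = \sum_i \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot x_i}\, \frac{\partial \dot x_i}{\partial \dot q_j} \Big) - \sum_i \frac{\partial T}{\partial \dot x_i}\, \frac{\partial \dot x_i}{\partial q_j} = \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot q_j} \Big) - \frac{\partial T}{\partial q_j}, \]
which gives the desired equality. The observation about constraint forces is a
consequence of the fact that their generalized components with respect to such a set of
coordinates are 0.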

14.4 The Lagrangian


Assume that our system is holonomic and the external forces derive from a
potential $V(q, t)$ in the sense that $Q_j = -\frac{\partial V}{\partial q_j}$ (that includes the conservative
case if there is no dependence on $t$). Then the equations of movement can be
rewritten as
\[ \frac{d}{dt} \Big( \frac{\partial T}{\partial \dot q_j} \Big) - \frac{\partial T}{\partial q_j} + \frac{\partial V}{\partial q_j} = 0. \]
Define the Lagrangian function of the system by
\[ L(q, \dot q, t) = T(q, \dot q, t) - V(q, t); \]
since $V$ does not depend on $\dot q$, the previous equality becomes
\[ \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) - \frac{\partial L}{\partial q_j} = 0. \]

That equation can be obtained under more general assumptions, which suggests
taking the equation as the starting point for the foundations
of Mechanics. Let us state this fact as a theorem.

Theorem 14.4.1. Every mechanical system is characterized by a function
$L(q, \dot q, t)$ in such a way that, for any initial configuration, the trajectories of the
system in time satisfy the equations
\[ \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) - \frac{\partial L}{\partial q_j} = 0 \]
for $1 \le j \le n$, where $n$ is the number of degrees of freedom of the system.
From now on we will assume that the Lagrange equations characterize the
evolution of the system once we know its Lagrangian; note, however, that the derivation
from Newton’s laws to this form was done under specific hypotheses (holonomy,
potentials depending only on positions...).
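As a worked illustration that is not part of the original text, consider the plane pendulum: a particle of mass $m$ attached to a rigid rod of length $\ell$, with the angle $\theta$ measured from the downward vertical as the only generalized coordinate. The symbols $m$, $\ell$, $g$ and $\theta$ are assumptions of this sketch.

```latex
\begin{align*}
x &= \ell\sin\theta, \qquad y = -\ell\cos\theta, \\
T &= \tfrac12 m\,(\dot x^2 + \dot y^2) = \tfrac12 m\ell^2\dot\theta^2,
  \qquad V = m g y = -m g \ell\cos\theta, \\
L &= T - V = \tfrac12 m\ell^2\dot\theta^2 + m g \ell\cos\theta, \\
\frac{d}{dt}\Big(\frac{\partial L}{\partial\dot\theta}\Big) - \frac{\partial L}{\partial\theta}
  &= m\ell^2\ddot\theta + m g \ell\sin\theta = 0
  \quad\Longleftrightarrow\quad
  \ddot\theta = -\frac{g}{\ell}\sin\theta .
\end{align*}
```

The tension of the rod, which is a constraint force normal to the movement, never appears in the computation.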

In order to have a nicer foundation for Mechanics, it is worth noticing
that it is possible to replace the practical but unnatural differential equation
by a “universal minimization principle”.
Theorem 14.4.2. Every mechanical system is characterized by a function
$L(q, \dot q, t)$ in such a way that the trajectory carried out by the system between two
given configurations $(q_1, \dot q_1, t_1)$ and $(q_2, \dot q_2, t_2)$ gives an extremal value of the integral
\[ \int_{t_1}^{t_2} L(q, \dot q, t)\, dt \]
amongst all the other smooth trajectories between the same configurations.
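Proof. For simplicity we will assume that the system has only one degree of
freedom. Actually, this computation was done in Chapter 4; nevertheless we
will repeat it with the notation from Mechanics. From now on $(q, \dot q, t)$ denotes
the trajectory followed by the system, so $(q(t_i), \dot q(t_i)) = (q_i, \dot q_i)$ for $i = 1, 2$.
In order to show the extremality of the real trajectory we will consider $C^2$
perturbations $h(t)$ such that $h(t)$ and $\dot h(t)$ vanish at $t_1, t_2$. Extremality
implies that the directional derivative
\[ \frac{d}{ds} \int_{t_1}^{t_2} L(q + sh, \dot q + s\dot h, t)\, dt \]
must be 0 at $s = 0$. The parametric derivation can be performed under the
integral sign this way:
\[ \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}(q + sh, \dot q + s\dot h, t)\, h + \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t)\, \dot h \Big) dt. \]
Integration by parts gives
\[ \int_{t_1}^{t_2} \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t)\, \dot h\, dt = \Big[ \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t)\, h \Big]_{t_1}^{t_2} - \int_{t_1}^{t_2} \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t) \Big) h\, dt \]
\[ = - \int_{t_1}^{t_2} \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t) \Big) h\, dt. \]
Using this information above we get
\[ \frac{d}{ds} \int_{t_1}^{t_2} L(q + sh, \dot q + s\dot h, t)\, dt = \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}(q + sh, \dot q + s\dot h, t) - \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q}(q + sh, \dot q + s\dot h, t) \Big) \Big) h\, dt \]
and the vanishing of this expression at $s = 0$ implies
\[ \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}(q, \dot q, t) - \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q}(q, \dot q, t) \Big) \Big) h\, dt = 0 \]
for all the perturbations $h$ satisfying the required assumptions. As it is possible
to take $h$ with support as small as we wish contained in $(t_1, t_2)$, we deduce
that
\[ \frac{\partial L}{\partial q}(q, \dot q, t) - \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q}(q, \dot q, t) \Big) = 0 \]
for $t \in (t_1, t_2)$, as we wanted.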
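The extremality can be observed numerically in the simplest case, a free particle of unit mass with $L = \dot q^2/2$, whose true trajectory between $q(0) = 0$ and $q(1) = 1$ is $q(t) = t$. The sketch below perturbs it by $s\,h(t)$ with $h(t) = \sin^2(\pi t)$, chosen so that $h$ and $\dot h$ vanish at both endpoints. The Lagrangian, the perturbation and the midpoint quadrature are all assumptions of this example.

```python
import math


def action(s, steps=2000):
    """Midpoint-rule approximation of the action of q_s(t) = t + s*h(t) on [0, 1].

    Here L = q_dot**2 / 2 and h(t) = sin(pi*t)**2, so h_dot(t) = pi*sin(2*pi*t).
    """
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        q_dot = 1.0 + s * math.pi * math.sin(2.0 * math.pi * t)
        total += 0.5 * q_dot**2 * dt
    return total


def directional_derivative(ds=1e-5):
    """Central-difference approximation of d/ds action(s) at s = 0."""
    return (action(ds) - action(-ds)) / (2.0 * ds)
```

In this case the action is $1/2 + s^2\pi^2/4$, so its derivative at $s = 0$ vanishes and the unperturbed trajectory is in fact a minimum along this family.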

Conservation laws are always a first step for the integration, or at least
simplification, of the equations of movement. The conservation of linear
momentum and angular momentum in Newtonian mechanics can be generalized in
the following fashion. Associated to a generalized coordinate $q_j$ we will consider
the generalized momentum
\[ p_j = \frac{\partial L}{\partial \dot q_j}. \]
We have the following.
Proposition 14.4.3. If the Lagrangian $L$ does not contain explicitly the coordinate
$q_j$ then $p_j$ remains constant along the trajectory of the system.

Proof.
\[ \frac{dp_j}{dt} = \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) = \frac{\partial L}{\partial q_j} = 0. \]

14.5 The Hamiltonian


If the passage from cartesian to generalized coordinates does not involve the time,
the kinetic energy $T$ is a quadratic form with respect to $\dot q$. Assuming as well
that $V$ does not depend on $\dot q$ we have, by Euler’s identity for homogeneous functions of degree 2,
\[ \sum_j p_j\, \dot q_j = \sum_j \frac{\partial T}{\partial \dot q_j}\, \dot q_j = 2T \]
and so the total energy can be written as
\[ T + V = 2T - L = \sum_j p_j\, \dot q_j - L. \]

We can prove the following principle, which is somewhat more general than the
assumptions of the preceding computation.
Theorem 14.5.1. If $L$ does not contain explicitly the time, then the following
magnitude remains constant along the trajectories of the system:
\[ H = \sum_j p_j\, \dot q_j - L. \]

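Proof.
\[ \frac{dH}{dt} = \sum_j \frac{dp_j}{dt}\, \dot q_j + \sum_j p_j\, \ddot q_j - \frac{dL}{dt} \]
\[ = \sum_j \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) \dot q_j + \sum_j \frac{\partial L}{\partial \dot q_j}\, \ddot q_j - \sum_j \frac{\partial L}{\partial q_j}\, \dot q_j - \sum_j \frac{\partial L}{\partial \dot q_j}\, \ddot q_j \]
\[ = \sum_j \Big( \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) - \frac{\partial L}{\partial q_j} \Big) \dot q_j = 0 \]
as claimed.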
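The conservation of $H$ can be observed on a numerical solution of the Lagrange equations. The sketch below assumes a plane pendulum with unit mass and length, $L = \dot\theta^2/2 + g\cos\theta$, so that $H = \dot\theta^2/2 - g\cos\theta$ and $\ddot\theta = -g\sin\theta$. The model, the value of `G` and the classical Runge-Kutta scheme are assumptions of the example.

```python
import math

G = 9.81  # assumed gravitational acceleration; mass and length are set to 1


def energy(theta, omega):
    """H = p*theta_dot - L = omega**2/2 - G*cos(theta) for the unit pendulum."""
    return 0.5 * omega**2 - G * math.cos(theta)


def rhs(theta, omega):
    """Lagrange equation written as a first-order system."""
    return omega, -G * math.sin(theta)


def rk4_step(theta, omega, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = rhs(theta, omega)
    k2 = rhs(theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1])
    k3 = rhs(theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1])
    k4 = rhs(theta + dt * k3[0], omega + dt * k3[1])
    theta += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
    omega += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return theta, omega


def max_energy_drift(theta0, omega0, dt=1e-3, steps=5000):
    """Largest |H(t) - H(0)| observed along the numerical trajectory."""
    h0 = energy(theta0, omega0)
    theta, omega, drift = theta0, omega0, 0.0
    for _ in range(steps):
        theta, omega = rk4_step(theta, omega, dt)
        drift = max(drift, abs(energy(theta, omega) - h0))
    return drift
```

The drift is not exactly zero because the integrator is approximate, but it stays very small compared with the energy itself, which is a useful check on any simulation of a time-independent Lagrangian.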

If the matrix whose coefficients are
\[ \Big( \frac{\partial^2 L}{\partial \dot q_i\, \partial \dot q_j} \Big)_{i,j} \]
has non-null determinant, which is a plausible hypothesis from the
point of view of the kinetic energy, then for the Jacobian we have
\[ \frac{\partial(p_1, p_2, \dots, p_n)}{\partial(\dot q_1, \dot q_2, \dots, \dot q_n)} \ne 0 \]
because the matrix is the same. That implies the possibility of replacing the
set of generalized speeds $\dot q$ by the generalized momenta $p$.

The Hamiltonian is the function $H$ of the preceding theorem when expressed
in terms of the set of variables $(q, p)$. The previous result establishes
the invariance of the Hamiltonian along the trajectories of the system if it does
not contain explicitly the time. Note that we can recover $L$ from $H$, so they
contain the same information about the system. However, the equations of
movement when expressed in terms of the Hamiltonian look a bit different.
Theorem 14.5.2. The trajectories of the system with Hamiltonian $H(q, p)$
with $n$ degrees of freedom are the solutions of the equations
\[ \frac{\partial H}{\partial q_j} = -\frac{dp_j}{dt}; \qquad \frac{\partial H}{\partial p_j} = \frac{dq_j}{dt}. \]
Apparently the number of equations is doubled. The reason is that the
relationships between the variables $p$ and $q$ must be included. This is equivalent to
considering the obvious relations $\dot q = dq/dt$ in Theorem 14.4.1.

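Proof. Using the very definition of $H$ we have
\[ \frac{\partial H}{\partial q_j} = \sum_i p_i\, \frac{\partial \dot q_i}{\partial q_j} - \Big( \frac{\partial L}{\partial q_j} \Big)_{p = \mathrm{const}} = \sum_i p_i\, \frac{\partial \dot q_i}{\partial q_j} - \frac{\partial L}{\partial q_j} - \sum_i \frac{\partial L}{\partial \dot q_i}\, \frac{\partial \dot q_i}{\partial q_j} \]
\[ = \sum_i p_i\, \frac{\partial \dot q_i}{\partial q_j} - \frac{d}{dt} \Big( \frac{\partial L}{\partial \dot q_j} \Big) - \sum_i p_i\, \frac{\partial \dot q_i}{\partial q_j} = -\frac{dp_j}{dt}. \]
And for the other equation note that
\[ \frac{\partial L}{\partial p_j} = \sum_i \frac{\partial L}{\partial q_i}\, \frac{\partial q_i}{\partial p_j} + \sum_i \frac{\partial L}{\partial \dot q_i}\, \frac{\partial \dot q_i}{\partial p_j} = 0 + \sum_i p_i\, \frac{\partial \dot q_i}{\partial p_j}, \]
so
\[ \frac{\partial H}{\partial p_j} = \sum_i p_i\, \frac{\partial \dot q_i}{\partial p_j} + \dot q_j - \frac{\partial L}{\partial p_j} = \sum_i p_i\, \frac{\partial \dot q_i}{\partial p_j} + \dot q_j - \sum_i \frac{\partial L}{\partial \dot q_i}\, \frac{\partial \dot q_i}{\partial p_j} = \frac{dq_j}{dt} \]
as wanted.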
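The passage from $L(q, \dot q)$ to $H(q, p)$ and the resulting equations can be traced on a harmonic oscillator, an example not taken from the text, with $L = \tfrac12 M \dot q^2 - \tfrac12 K q^2$ and $p = M\dot q$. The constants `M` and `K` and all function names are hypothetical; the partial derivatives of $H$ are approximated by central differences.

```python
M, K = 2.0, 3.0  # hypothetical mass and spring constant


def lagrangian(q, q_dot):
    """L = T - V for the harmonic oscillator."""
    return 0.5 * M * q_dot**2 - 0.5 * K * q**2


def speed_from_momentum(p):
    """Invert p = dL/dq_dot = M * q_dot."""
    return p / M


def hamiltonian(q, p):
    """H = p*q_dot - L, expressed in the variables (q, p)."""
    q_dot = speed_from_momentum(p)
    return p * q_dot - lagrangian(q, q_dot)


def hamilton_rhs(q, p, h=1e-6):
    """(dq/dt, dp/dt) = (dH/dp, -dH/dq), using central differences."""
    dh_dq = (hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2.0 * h)
    dh_dp = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2.0 * h)
    return dh_dp, -dh_dq
```

One recovers $H = p^2/(2M) + K q^2/2$, together with $dq/dt = p/M$ and $dp/dt = -Kq$, which is Newton’s law for the spring.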

The solution of the Hamilton system produces trajectories in the phase
space $(q, p)$, which has some advantages with respect to the representation in $(q, \dot q)$.
For instance, if $H$ does not contain the time then the level
curves $H = \mathrm{const}$ are the trajectories of the system. Indeed, the orthogonal field
to the level curves is $\big( \frac{\partial H}{\partial q}, \frac{\partial H}{\partial p} \big)$ and so the field $\big( \frac{\partial H}{\partial p}, -\frac{\partial H}{\partial q} \big)$, which is orthogonal
to the previous one, must be tangent to the level curves of $H$.

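Another reason is that the $2n$-dimensional divergence of the Hamiltonian field is
zero. Indeed,
\[ \sum_j \frac{\partial}{\partial q_j} \Big( \frac{\partial H}{\partial p_j} \Big) + \sum_j \frac{\partial}{\partial p_j} \Big( -\frac{\partial H}{\partial q_j} \Big) = \sum_j \frac{\partial^2 H}{\partial q_j\, \partial p_j} - \sum_j \frac{\partial^2 H}{\partial p_j\, \partial q_j} = 0. \]

Interpreting the Hamiltonian field $\big( \frac{\partial H}{\partial p}, -\frac{\partial H}{\partial q} \big)$ as the speed field of a fluid, that implies
that the $2n$-dimensional volume is preserved by the flow of the system
(as we did in dimension 3, see Section 11.3). That leads to the following
result of Liouville.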
Theorem 14.5.3. Given a mechanical system, its Hamiltonian flow preserves
volumes in the phase space. In particular, given an open set $D$ in $(q, p)$ which
is composed of initial states, let $D_t$ be the evolution of those states after a time
$t > 0$. Then the $2n$-dimensional volume of $D_t$ remains constant and equal to
the volume of $D$.
A well-known application is the so-called Poincaré recurrence theorem: if
the orbits of the system are confined in a bounded set then any initial state
will be arbitrarily approximated by the evolution of the system after some
time. We will finish with a version of the uncertainty principle for mechanical
systems. For simplicity consider only one degree of freedom. Assume that the
position $q$ and momentum $p$ are known with some errors $\Delta q(0)$ and $\Delta p(0)$
at the beginning. Then after some time the combined uncertainty does not
decrease:
\[ \Delta q(t)\, \Delta p(t) \ge \Delta q(0)\, \Delta p(0). \]
Indeed, the product of the uncertainties $\Delta q(t)\Delta p(t)$ represents the area of a
rectangle that contains the evolution through the flow of the system of the
rectangle of sides $\Delta q(0)$ and $\Delta p(0)$ that contains all the possible initial states,
and by Liouville’s theorem that evolved set has the same area as the initial rectangle.
It is somewhat surprising that Classical Mechanics anticipates Heisenberg’s
uncertainty principle.
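Both the preservation of area and the growth of the enclosing rectangle can be seen on the harmonic oscillator $H = p^2/(2m) + k q^2/2$, whose flow is known in closed form and is linear, so a polygon of initial states evolves into the polygon of the evolved vertices. The model, the default parameters and the function names are assumptions of this sketch.

```python
import math


def flow(q, p, t, m=1.0, k=4.0):
    """Exact state at time t of the oscillator starting at (q, p)."""
    w = math.sqrt(k / m)
    c, s = math.cos(w * t), math.sin(w * t)
    return (q * c + p / (m * w) * s, p * c - m * w * q * s)


def polygon_area(vertices):
    """Area of a simple polygon in the (q, p) plane (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        q0, p0 = vertices[i]
        q1, p1 = vertices[(i + 1) % n]
        total += q0 * p1 - q1 * p0
    return abs(total) / 2.0


def bounding_box_area(vertices):
    """Product Delta q * Delta p of the smallest enclosing axis-parallel rectangle."""
    qs = [v[0] for v in vertices]
    ps = [v[1] for v in vertices]
    return (max(qs) - min(qs)) * (max(ps) - min(ps))
```

Starting from a rectangle of sides $\Delta q(0)$ and $\Delta p(0)$, the evolved parallelogram keeps the same area, while its enclosing rectangle, whose sides play the role of $\Delta q(t)$ and $\Delta p(t)$, can only have a larger or equal area.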
