Simone Secchi
A Circle-Line Study
of Mathematical
Analysis
UNITEXT
Volume 141
Editor-in-Chief
Alfio Quarteroni, Politecnico di Milano, Milan, Italy; École Polytechnique Fédérale
de Lausanne (EPFL), Lausanne, Switzerland
Series Editors
Luigi Ambrosio, Scuola Normale Superiore, Pisa, Italy
Paolo Biscari, Politecnico di Milano, Milan, Italy
Ciro Ciliberto, Università di Roma “Tor Vergata”, Rome, Italy
Camillo De Lellis, Institute for Advanced Study, Princeton, New Jersey, USA
Massimiliano Gubinelli, Hausdorff Center for Mathematics, Rheinische
Friedrich-Wilhelms-Universität, Bonn, Germany
Victor Panaretos, Institute of Mathematics, École Polytechnique Fédérale de
Lausanne (EPFL), Lausanne, Switzerland
Lorenzo Rosasco, DIBRIS, Università degli Studi di Genova, Genova, Italy
Center for Brains Mind and Machines, Massachusetts Institute of Technology,
Cambridge, Massachusetts, USA; Istituto Italiano di Tecnologia, Genova, Italy
The UNITEXT - La Matematica per il 3+2 series is designed for undergraduate
and graduate academic courses, and also includes advanced textbooks at a research
level.
Originally released in Italian, the series now publishes textbooks in English
addressed to students in mathematics worldwide.
Some of the most successful books in the series have evolved through several
editions, adapting to the evolution of teaching curricula.
Submissions must include at least 3 sample chapters, a table of contents, and
a preface outlining the aims and scope of the book, how the book fits in with the
current literature, and which courses the book is suitable for.
For any further information, please contact the Editor at Springer:
francesca.bonadei@springer.com
THE SERIES IS INDEXED IN SCOPUS
Simone Secchi
Department of Mathematics and
Applications
University of Milano Bicocca
Milan, Italy
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Cover illustration: The circular structure of the book. Painted by Francesca Vettori
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book is dedicated to Francesca, for her
constant and doubtless support
Preface
defined in terms of sets. But this viewpoint would prevent any student from using
numbers on a daily basis. Students are not supposed to think of √2 as a Dedekind
cut or as an equivalence class of sequences of rational numbers, when they first meet
limits, derivatives and integrals. And I would bet that no experienced analyst ever
learned analysis (or fell in love with analysis) this way. It is a matter of fact that
many students will never feel the need of studying Dedekind cuts, and they will be
happy to know that R can be constructed in one way or another.
A first taste of topology on the real line then comes into play, and this is enough
to have a rigorous definition of all the main tools of analysis in one dimension:
limits, sequences and series, derivatives and Riemann integrals. Our first definition
of the derivative is formally the same definition that one would give in normed vector
spaces. We prove that the derivative is the usual limit of the incremental ratio, but we
stress that a Calculus definition must be changed as soon as we want to differentiate
functions of two variables. Linearization is the keyword of differential calculus, and
we use this in full strength.
Metric spaces are not overemphasized in this book. It is a common opinion
that metric spaces are the natural environment for doing analysis. This is surely
supported by some recent theories, but we prefer to present them for what they
are: a special case of topological spaces. Most textbooks avoid the generality of
topological spaces, and students often feel that mathematical analysis can exist only
if distances can be measured. This is largely false, as we will see. Furthermore, I
totally dislike the habit of keeping analysis and topology disjoint: some branches of
Functional Analysis do need abstract topological structures.
The chapter on differentiation may look short. Since this is not a Calculus
textbook, we do not spend time to compute dozens of elementary derivatives. The
purpose of the chapter is to present the theory of differentiation in one real variable,
so I omit any discussion about the physical meaning of derivatives. I assume that
the average reader of my book knows that “velocity is the derivative of motion, and
acceleration is the derivative of velocity.” Although Mathematics is continuously
stimulated by other sciences, it is indeed a science itself. My students are not
supposed to define derivatives as slopes of a line and integrals as energies.
The chapter on Riemann integration is probably longer than usual: I present a few
results that most textbooks only propose in the special case of continuous functions.
Indeed, I think that the class of Riemann integrable functions should be kept distinct
from the subclass of continuous functions as long as possible. The most important
feature of modern mathematics is the capability to arrange objects according to their
properties: should we deal with differentiable functions only, since they form an
interesting subclass of continuous functions? I do not think so.
A chapter on the so-called elementary functions is necessary in any rigorous
exposition of mathematical analysis. On one hand, it is true that differentiating the
cosine function before providing a definition of the cosine function is nonsense.
On the other hand, however, it would be unrealistic to remove any example which
involves elementary functions from a textbook. In my humble opinion, this is a
problem without solution: we must teach differential and integral calculus before
teaching the construction of the elementary functions. But a formal approach to
such functions is not enough for a good analyst, who needs to learn why elementary
functions actually exist.
Our journey can now proceed backwards, so that we look again at the basic ideas
from a more advanced viewpoint. In particular, I propose a quick-and-dirty review
of axiomatic set theory. Set theory is, somehow, what every mathematician pretends
to know. Students are not exposed to the difficulties of the axiomatization of this
discipline, and they usually ignore what entities are really primitive. A function is
more often than not proposed as a black box that converts a number into another
number. Once sets have been introduced or assumed to exist, there is no need to
suppose that functions are a primitive notion. This may be convenient, but it is
definitely unnecessary.
Axiomatic Set Theory looks close to philosophy, since different and non-equivalent
approaches have been proposed over the decades. I present a review of
John Kelley’s set theory, which I find particularly suited to the analyst’s mindset.
For the sake of completeness, I also list the axioms of Zermelo and Fraenkel, which
are probably dominant among experts.
I then introduce some general topology: open and closed sets, neighborhoods,
and of course limits. I am proud to define nets, a generalization of sequences
Inversion Theorem is proved in Hausdorff spaces, and this is a nice result that
is seldom proposed in Analysis books. My gratitude goes to Antonio Ambrosetti
and Giovanni Prodi, who popularized abstract differential calculus in the Italian
mathematical community.
The two chapters on integration and measure theory are a journey inside the
journey. I present a flavor of Integration Theory based on the Daniell approach:
the integral becomes a suitable extension of a linear functional with some weak
continuity condition. This extension leans on an elementary integral which we may
imagine as the Cauchy integral of continuous functions with compact support. I
believe that this functional-analytic construction may be of great utility to young
mathematicians who do not need all the pathologies of Geometric Measure Theory.
The subsequent chapter returns to the integral via a different path, based on
suitable families of sets called σ-algebras. I would like the reader to understand that
there is a path which connects measurable sets and integrable functions, and that we
can decide in what direction we prefer to go along this path. But even in this second
chapter, I completely avoided the Carathéodory machinery of outer measures, and
the concrete Lebesgue measure is constructed via a Riesz Representation Theorem
à la Rudin.
This book contains a few figures, often realized in a sketchy way. The use of
personal computers would surely allow us to produce perfect figures, but I wanted
to draw the same pictures I would draw on the blackboard. Most drawings were
made by Dr. Francesca Vettori, to whom I am indebted.
Who will read this book? I hope that it can be useful to students of Mathematics
and Physics who wish to go further than standard Calculus. If compared to other
similar textbooks, the main difference here is that we offer a second glance—and
indeed more than a glance—on every traditional topic. Of course, I also hope that
some colleagues may find the book useful for preparing their lectures.
Instructors will surely need to complement this text with some additional
examples. In my opinion, the same example can be enlightening for one student but
obscure for another. A good instructor knows his/her audience, and can
suitably illustrate abstract ideas with concrete examples. Exercises appear after
some important results, so that the reader is invited to solve them before proceeding
further. Several chapters end with a short collection of problems, which may be
solved by collecting the ideas of the whole chapter to which they refer.
I would like to express my gratitude to Dr. Francesca Bonadei and Dr. Francesca
Ferrari of Springer Nature for their support during the preparation of the manuscript.
I am obviously responsible for any misprint appearing in this book. If you are
reading it and you have just found a (serious) error in the text, feel free to contact
me at simone.secchi@unimib.it.
Every scientific book is the result of several months of hard work, but also of several
years of study. First of all I need to acknowledge the support of my family during
the last year. I also want to express my gratitude to my university for providing me
with all the necessary tools that were used during the preparation of the manuscript.
Contents
5.4 Problems
5.5 Comments
Reference
6 Series
6.1 Convergence Tests for Positive Series
6.2 Euler’s Number as the Sum of a Series
6.3 Alternating Series
6.3.1 Product of Series
6.4 Problems
6.5 Comments
Reference
7 Limits: From Sequences to Functions of a Real Variable
7.1 Properties of Limits
7.2 Local Equivalence of Functions
7.3 Comments
8 Continuous Functions of a Real Variable
8.1 Continuity and Compactness
8.2 Intermediate Value Property
8.3 Continuous Invertible Functions
8.4 Problems
9 Derivatives and Differentiability
9.1 Rules of Differentiation, or the Algebra of Calculus
9.2 Mean Value Theorems
9.3 The Intermediate Property for Derivatives
9.4 Derivatives at End-Points
9.5 Derivatives of Derivatives
9.6 Convexity
9.7 Problems
9.8 Comments
References
10 Riemann’s Integral
10.1 Partitions and the Riemann Integral
10.2 Integrable Functions as Elements of a Vector Space
10.3 Classes of Integrable Functions
10.4 Antiderivatives and the Fundamental Theorem
10.5 Problems
10.6 Comments
11 Elementary Functions
11.1 Sequences and Series of Functions
11.2 Uniform Convergence
11.3 The Exponential Function
11.4 Sine and Cosine
Sentences are just statements to which it is possible to attach a binary value: true
(T) or false (F). For example, “Roses are flowers” is a sentence, “dogs have five
legs” is another sentence. But “Any cat is” is not. Sometimes sentences depend on
free variables, as in “The integer n is a prime”, or “The real number x is irrational”.
The variables n and x are free in the sense that they may take any (admissible) value:
compare with “For every integer n, n + 1 > n”. In this sentence, the variable n
is quantified by “For every”, and is not a free variable. Another example is “There
exists a positive real number r such that r² = 2”.
Definition 1.1 (Negation) If A is a sentence, its negation ¬A is the sentence
governed by the following table:
A   ¬A
T   F
F   T
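Definition 1.2 (Conjunction) If A and B are sentences, then their conjunction A ∧ B
is the sentence governed by the following table:
A   B   A ∧ B
T   T   T
T   F   F
F   T   F
F   F   F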
Definition 1.3 (Disjunction) If A and B are sentences, then their disjunction A ∨ B
is the sentence governed by the following table:
A   B   A ∨ B
T   T   T
T   F   T
F   T   T
F   F   F
Remark 1.1 Although the mathematical conjunction agrees with the use of “and” in
everyday language, the mathematical disjunction reflects a use of “or” which may
differ from the use in common language. To be explicit, we may formulate a golden
rule: A ∨ B corresponds to “either A, or B, or both”. In common language we tend
to understand “either A or B, but not both.”
Definition 1.4 (Implication) If A and B are sentences, the sentence A ⇒ B is
defined by the following table:
A   B   A ⇒ B
T   T   T
F   T   T
T   F   F
F   F   T
A   B   A ⇐⇒ B
T   T   T
F   T   F
T   F   F
F   F   T
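The four binary connectives can be tabulated mechanically. The following is a minimal sketch in Python, not part of the original text: the dictionary `CONNECTIVES` and the helpers `truth_table` and `show` are hypothetical names, and encoding A ⇒ B as "(not A) or B" is an assumption that agrees with the table of Definition 1.4.

```python
from itertools import product

# Each connective is modelled as a Python function on booleans.
# Encoding A => B as (not A) or B is an assumption matching Definition 1.4.
CONNECTIVES = {
    "A and B": lambda a, b: a and b,
    "A or B": lambda a, b: a or b,
    "A => B": lambda a, b: (not a) or b,
    "A <=> B": lambda a, b: a == b,
}

def truth_table(connective):
    """Return the rows (A, B, value) for all four assignments of A and B."""
    return [(a, b, connective(a, b)) for a, b in product([True, False], repeat=2)]

def show(value):
    """Render a boolean as T or F, as in the tables of the text."""
    return "T" if value else "F"

for name, connective in CONNECTIVES.items():
    print("A B", name)
    for a, b, value in truth_table(connective):
        print(show(a), show(b), show(value))
    print()
```

The rows are listed in the order (T, T), (T, F), (F, T), (F, F), which differs from the order used in some of the tables above; the content of each table is the same.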
1.2 Quantifiers
∀x A(x)
(∀x)A(x)
∀x, A(x)
(x) A(x).
The last one is clearly the most economical, and the first one is acceptable. Logicians
tend to avoid brackets as far as they may, and also commas are seen as inessential
objects. It is a matter of fact that most mathematicians use brackets freely on their
blackboards, and commas are also ubiquitous.
Remark 1.3 The existential quantifier is not a primitive symbol, since the sentence
∃x A(x) is logically equivalent to (and actually defined as) ¬(∀x ¬A(x)). As a
consequence, the negation of ∃x A(x) is precisely ∀x ¬A(x), and the negation of
∀x A(x) is precisely ∃x ¬A(x).
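On a finite domain the two negation rules of Remark 1.3 can be checked by brute force. The sketch below is only an illustration: the domain, the predicate `is_even` and the helpers `for_all` and `exists` are hypothetical choices, and a finite check is of course not a proof of the general rule.

```python
# Hypothetical finite domain and predicate, chosen only for illustration.
domain = range(1, 11)

def is_even(n):
    return n % 2 == 0

def for_all(predicate, xs):
    """Universal quantifier over a finite collection."""
    return all(predicate(x) for x in xs)

def exists(predicate, xs):
    """Existential quantifier, defined from the universal one as in Remark 1.3."""
    return not for_all(lambda x: not predicate(x), xs)

# The negation of "there exists x with A(x)" versus "for all x, not A(x)".
neg_exists = not exists(is_even, domain)
forall_neg = for_all(lambda x: not is_even(x), domain)

# The negation of "for all x, A(x)" versus "there exists x with not A(x)".
neg_forall = not for_all(is_even, domain)
exists_neg = exists(lambda x: not is_even(x), domain)

print(neg_exists == forall_neg, neg_forall == exists_neg)
```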
2 Sets, Relations, Functions in a Naïve Way
Abstract We start our journey with naïve set theory. In the second half of the book
we will provide a rigorous foundation of these ideas.
The variable x is a dummy variable, in the sense that it can be replaced by any other
symbol without affecting the validity of the definition of the set X.
Example 2.1 To clarify the use of dummy variables, consider
{x | x is a cat} = {C | C is a cat}.
On both sides we are introducing the set of all cats, no matter how we name the
generic cat.
By definition, X = {x | x ∈ X}. Two sets X and Y are equal when they share the
same elements: x ∈ X if and only if x ∈ Y.
Definition 2.2 (Empty Set) The empty set is
∅ = {x | x ≠ x}.
Exercise 2.1 Prove that ∅ contains no element at all. Hint: for every x, the
statement x ≠ x is false.
It should be remarked that the definition of the empty set is meaningful, in the
sense that it does not rely on some intuitive knowledge. The empty set could be
equally defined by means of any statement which is false, for instance
∅ = {x ∈ R | x² = −1}.
Example 2.2 Why don’t we define the opposite of the empty set, namely
U = {x | x = x}?
This object would contain anything, since anything is equal to itself by definition
of equality. It would be desirable to have such a “set”, wouldn’t it? Unfortunately
U cannot be a set, as Russell showed in his celebrated paradox. Let us consider
R = {x | x ∉ x}, the set of all sets which do not belong to themselves. What can we
say about the relation R ∈ R?
Well, if R ∈ R, then R is a set which does not belong to itself, so that R ∉ R.
Vice versa, if R ∉ R, then R is not a set which does not belong to itself, hence R ∈ R.
Formally, R ∈ R if and only if R ∉ R. The consequence of this logical equivalence
is that sets cannot be described unrestrictedly, and the universe U cannot be a set
in the naïve sense. We will see in the second part of this book that Axiomatic Set
Theory can be used to speak of sets without facing Russell’s paradox. But most
mathematicians think of sets naïvely, and so will we for the moment. The only
recommendation is to avoid any use of the universe.
Definition 2.3 (Subsets) If A and B are sets, then A is a subset of B if and only if
each element of A is an element of B: in symbols,
∀x(x ∈ A ⇒ x ∈ B).
Definition 2.4 (Union and Intersection) The union of two sets A and B is the set
A ∪ B of all points that are elements of either A or B (or both):
A ∪ B = {x | (x ∈ A) ∨ (x ∈ B)}.
The intersection of two sets A and B is the set A ∩ B of all points that are elements
of both A and B:
A ∩ B = {x | (x ∈ A) ∧ (x ∈ B)}.
Let us suppose that for each element α of a set A, which is called the index set,
we are given a set Xα. We can extend our definition of union and intersection as
follows:
⋃ {Xα | α ∈ A} = ⋃_{α∈A} Xα = {x | ∃α(α ∈ A ∧ x ∈ Xα)}, (2.1)
⋂ {Xα | α ∈ A} = ⋂_{α∈A} Xα = {x | ∀α(α ∈ A ⇒ x ∈ Xα)}. (2.2)
A particular case arises when the index set is a collection 𝒜 of sets, and in this case
we can write
⋃ {A | A ∈ 𝒜} = {x | x ∈ A for some A ∈ 𝒜}
and similarly
⋂ {A | A ∈ 𝒜} = {x | x ∈ A for each A ∈ 𝒜}.
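For a finite index set, the union and the intersection of an indexed family can be computed directly. The following sketch assumes a hypothetical family X, stored as a Python dictionary from indices to sets; the names `big_union` and `big_intersection` are illustrative only, and the intersection is computed under the assumption that the index set is non-empty.

```python
# Hypothetical indexed family: X[alpha] is the set of multiples of alpha
# between 1 and 30, for alpha in the index set {2, 3, 5}.
index_set = {2, 3, 5}
X = {alpha: {n for n in range(1, 31) if n % alpha == 0} for alpha in index_set}

def big_union(family):
    """All x that belong to family[alpha] for some index alpha."""
    return set().union(*family.values())

def big_intersection(family):
    """All x that belong to family[alpha] for every index alpha.

    Assumes a non-empty index set: with no sets to intersect,
    the result would have to be the universe, which is not a set.
    """
    members = list(family.values())
    return set(members[0]).intersection(*members[1:])

print(sorted(big_union(X)))
print(sorted(big_intersection(X)))
```

Here the intersection consists of the common multiples of 2, 3 and 5 up to 30, while the union collects every number up to 30 divisible by at least one of them.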
Exercise 2.2 For each pair of positive real numbers α and β, let Qα,β be the rectangle
[0, α] × [0, β] in the plane. Describe the sets
⋃ {Qα,β | α > 0, β > 0}, ⋂ {Qα,β | α > 0, β > 0}.
Theorem 2.1 Let A be an index set, and for each α ∈ A let Xα be a subset of a
fixed set Y. Then
(a) If B is a subset of A, then
⋃ {Xβ | β ∈ B} ⊂ ⋃ {Xα | α ∈ A},
and
⋂ {Xβ | β ∈ B} ⊃ ⋂ {Xα | α ∈ A}.
(b) Y \ ⋃ {Xα | α ∈ A} = ⋂ {Y \ Xα | α ∈ A}, and Y \ ⋂ {Xα | α ∈ A} =
⋃ {Y \ Xα | α ∈ A}.
Proof
(a) If x ∈ ⋃ {Xβ | β ∈ B}, then there exists β ∈ B such that x ∈ Xβ. By assumption
β ∈ A, and thus x ∈ ⋃ {Xα | α ∈ A}. If x ∈ ⋂ {Xα | α ∈ A}, then x ∈ Xα for
Exercise 2.3 Prove that indeed (x, y) = (u, v) if and only if x = u and y = v.
Hint: by assumption {{x}, {x, y}} = {{u}, {u, v}}. Consider first the case x = y,
then deal with the general case.
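The set-theoretic pair {{x}, {x, y}} appearing in the hint can be modelled with Python's immutable `frozenset`. The sketch below tests the statement of Exercise 2.3 on a small hypothetical set of elements; the function name `pair` is an assumption, and an exhaustive check on three elements is a sanity test, not a solution of the exercise.

```python
from itertools import product

def pair(x, y):
    """The ordered pair (x, y) encoded as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

# Hypothetical small universe of elements used for the exhaustive check.
elements = [0, 1, 2]

for x, y, u, v in product(elements, repeat=4):
    # Equality of the encoded pairs must coincide with coordinatewise equality.
    assert (pair(x, y) == pair(u, v)) == (x == u and y == v)

# In the case x = y the pair collapses to {{x}}, as suggested by the hint.
print(pair(1, 1) == frozenset({frozenset({1})}))
```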
Definition 2.8 (Relations) A relation is a set of ordered pairs: a relation is
therefore a set whose elements are ordered pairs.
Definition 2.9 The domain of a relation R is the set {x | ∃y((x, y) ∈ R)}. The
range of a relation R is the set {y | ∃x((x, y) ∈ R)}. The field of a relation R is the
union of the domain and of the range of R.
One of the simplest relations is the set of ordered pairs .(x, y) such that x is a
member of a fixed set A, and y is a member of a fixed set B. This relation therefore
reduces to
A × B = {(x, y) | (x ∈ A) ∧ (y ∈ B)},
and is called the cartesian product of A and B: see Fig. 2.4. It is clear that any
relation is a subset of the cartesian product of its domain and its range.
Remark 2.1 The identification of sets and relations usually sounds strange to
students. In this book we will never think of relations or functions like black boxes
which transform elements of some set into elements of some other set.
The inverse of a relation R, denoted by R⁻¹, is the relation obtained by swapping
each of the ordered pairs belonging to R. Formally,
R⁻¹ = {(y, x) | (x, y) ∈ R}.
We remark that, roughly speaking, first comes S, then comes R, and not vice versa.
The domain of R ◦ S is contained in the domain of S, while the range of R ◦ S is
contained in the range of R.
This will be of crucial importance when we introduce functions.
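Since a relation is nothing but a set of ordered pairs, finite relations can be manipulated as Python sets of tuples. The following sketch uses two hypothetical relations S and R. The function `compose` implements the usual composition "first S, then R", namely the set of pairs (x, z) such that (x, y) ∈ S and (y, z) ∈ R for some y; this definition is an assumption based on the convention described in the text.

```python
# Hypothetical relations, given as sets of ordered pairs (tuples).
S = {(1, "a"), (2, "b"), (3, "b")}
R = {("a", "x"), ("b", "y"), ("c", "z")}

def domain(rel):
    """Set of first coordinates of the pairs in rel."""
    return {x for (x, _) in rel}

def range_(rel):
    """Set of second coordinates (trailing underscore avoids the builtin name)."""
    return {y for (_, y) in rel}

def inverse(rel):
    """Swap each ordered pair."""
    return {(y, x) for (x, y) in rel}

def compose(r, s):
    """R o S: first S, then R (assumed definition of the composition)."""
    return {(x, z) for (x, y) in s for (y2, z) in r if y == y2}

def cartesian(a, b):
    """Cartesian product of two sets."""
    return {(x, y) for x in a for y in b}

RS = compose(R, S)
print(sorted(RS))
print(domain(RS) <= domain(S), range_(RS) <= range_(R))
print(S <= cartesian(domain(S), range_(S)))
```

In this example the pair ("c", "z") of R plays no role in the composition, because "c" does not belong to the range of S.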
Definition 2.10 Suppose that R is a relation and X is the set of all points that are
elements of either the domain or the range of R. We say that R is
• reflexive, if each element of X is in relation R with itself;
• symmetric, if xRy whenever yRx;
• antisymmetric, if xRy and yRx imply .x = y;
• transitive, if xRy and yRz imply xRz.
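For a finite relation, each of the four properties can be tested by inspecting all pairs. The sketch below is an illustration under stated assumptions: relations are sets of tuples, and the two example relations on {1, 2, 3} (the usual order and "same parity") are hypothetical choices.

```python
def field(rel):
    """All points in the domain or in the range of the relation."""
    return {point for pair in rel for point in pair}

def is_reflexive(rel):
    return all((x, x) in rel for x in field(rel))

def is_symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)

def is_antisymmetric(rel):
    return all(x == y for (x, y) in rel if (y, x) in rel)

def is_transitive(rel):
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

# Hypothetical examples on the set {1, 2, 3}.
points = {1, 2, 3}
leq = {(x, y) for x in points for y in points if x <= y}
same_parity = {(x, y) for x in points for y in points if (x - y) % 2 == 0}

for name, rel in (("leq", leq), ("same_parity", same_parity)):
    print(name, is_reflexive(rel), is_symmetric(rel),
          is_antisymmetric(rel), is_transitive(rel))
```

The first relation is reflexive, antisymmetric and transitive but not symmetric; the second is reflexive, symmetric and transitive but not antisymmetric, since 1 and 3 are related in both directions.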
It is customary to use the symbol ∼ for equivalence relations, and ≤ for order
relations.
A function is a relation such that no two distinct members have the same first
coordinate. More explicitly, a relation f is a function if for each element x of its
domain there exists a unique element y of its range such that (x, y) ∈ f, see Fig. 2.5.
Uniqueness means that if (x, y) ∈ f and (x, z) ∈ f, then y = z. For a function it
is customary to abandon the general notation (x, y) ∈ f (or xfy) in favor of y =
f(x). Then f(x) is the image of the element x of the domain of f. In mathematical
analysis a function f ⊂ X × Y is denoted by the (more complicated) symbol
f : X → Y, x ↦ f(x).
If A is a set and f is a function, the set
f(A) = {y | y = f(x) for some x ∈ A}
is called the image of the set A under f. Similarly, if B is a set and f is a function,
the set
f⁻¹(B) = {x | f(x) ∈ B}
is called the pre-image of B under f. We notice that f⁻¹(B) is just the image of
the set B under the inverse relation f⁻¹. Clearly f(A) is a subset of the range of f,
while f⁻¹(B) is a subset of the domain of f.
Theorem 2.2 If f is a function and A and B are sets, then
(a) f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B);
(b) f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B);
(c) f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B).
More generally, if we are given a set Xα for each member α of a non-empty index
set C, then
(d) f⁻¹(⋃ {Xα | α ∈ C}) = ⋃ {f⁻¹(Xα) | α ∈ C};
(e) f⁻¹(⋂ {Xα | α ∈ C}) = ⋂ {f⁻¹(Xα) | α ∈ C}.
Proof We prove part (e), leaving the rest of the proof as a simple exercise. A point
x is an element of f⁻¹(⋂ {Xα | α ∈ C}) if and only if f(x) is an element of this
intersection, in which case f(x) ∈ Xα for each α ∈ C. But the latter condition is
equivalent to x ∈ f⁻¹(Xα) for each α ∈ C, i.e. x ∈ ⋂ {f⁻¹(Xα) | α ∈ C}.
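Parts (a), (b) and (c) of Theorem 2.2 can be tested on a finite example. In the sketch below a function is stored as a Python dictionary, which is a set of pairs with unique first coordinates; the function f(x) = x² on {−3, ..., 3}, the sets A, B, U, V and the helper names `image` and `preimage` are hypothetical. The final lines record a standard side observation, not stated in the theorem: images do not commute with intersections in general.

```python
# Hypothetical function f(x) = x * x on the domain {-3, ..., 3}.
f = {x: x * x for x in range(-3, 4)}

def image(f, A):
    """The set of all f(x) with x in A (and in the domain of f)."""
    return {f[x] for x in f if x in A}

def preimage(f, B):
    """The set of all x in the domain of f with f(x) in B."""
    return {x for x in f if f[x] in B}

A, B = {0, 1, 4}, {4, 9}
assert preimage(f, A - B) == preimage(f, A) - preimage(f, B)
assert preimage(f, A | B) == preimage(f, A) | preimage(f, B)
assert preimage(f, A & B) == preimage(f, A) & preimage(f, B)

# Side observation: f(U ∩ V) may be strictly smaller than f(U) ∩ f(V).
U, V = {-2}, {2}
print(image(f, U & V), image(f, U) & image(f, V))
```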
Remark 2.2 Any function f is invertible as a relation. However the inverse relation
f⁻¹ need not be again a function: this happens if and only if for each y there exists
a unique x such that y f⁻¹ x, i.e. f(x) = y. We have proved that the relation f⁻¹
is a function if and only if f is a bijective function. It is customary to say that a
function f : X → Y is invertible if it is bijective.
Remark 2.3 Any injective function f : X → Y can be somehow inverted, in the
sense that we can define a function g : f(X) → X such that g(y) = x if and only
if f(x) = y. In general the domain of g is a proper subset of Y, but the rule which
defines g is exactly the same rule which defines f⁻¹. Many mathematicians do not
require surjectivity in order to define invertible functions. This is fairly reasonable,
since f(X) is the largest subset of Y on which we can define the inverse function of
the injective function f.
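The construction of Remark 2.3 can be sketched for a finite injective function stored as a dictionary. The example f(x) = 2x + 1 on X = {0, 1, 2, 3} with Y = {0, ..., 9}, and the names `is_injective` and `inverse_on_image`, are hypothetical.

```python
def is_injective(f):
    """A finite function is injective when no value is taken twice."""
    return len(set(f.values())) == len(f)

def inverse_on_image(f):
    """For injective f: X -> Y, return g: f(X) -> X with g(y) = x iff f(x) = y."""
    if not is_injective(f):
        raise ValueError("f is not injective, so the inverse relation is not a function")
    return {y: x for x, y in f.items()}

# Hypothetical example: X = {0, 1, 2, 3}, Y = {0, ..., 9}, f(x) = 2x + 1.
X, Y = range(4), set(range(10))
f = {x: 2 * x + 1 for x in X}
g = inverse_on_image(f)

print(g)
print(set(g) < Y)  # the domain f(X) of g is a proper subset of Y
```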
Exercise 2.5 Let f : X → U and g : Y → V denote two functions. Prove that
(x, y) ↦ (f(x), g(y)) defines a function f × g : X × Y → U × V, which we call
2.1 Comments
{x | P(x)} (2.3)
for the definition of a set. In this book we may seem to be lazy, since such a notation
is allowed and even typical. Let us try to elaborate on this issue.
From a very abstract viewpoint, (2.3) contains the troublesome formula
{x | x = x},
which leads to the paradox of the universe. On the contrary, the more precise
formula
{x | x ∈ U ∧ P(x)},
Reference
Abstract Classical mathematical analysis is actually analysis over the field of real
numbers. In a later chapter we will construct the set R of real numbers from the
axioms of set theory, as a completion of the set Q of rational numbers. These are
in turn constructed from the set Z of (signed) integers, which are constructed from
the set N of positive integers, or natural numbers. However, this approach is
time-consuming, and we prefer to quickly introduce real numbers axiomatically.
Important: Warning
The existence of the natural numbers 1, 2, 3, 4, . . . will be taken for granted. This is
a reasonable compromise in a first approach to Analysis. In the second half of the
book we will show that numbers can be defined in terms of sets.
Remark 3.1 The subset P should be thought of as the subset of positive numbers.
The choice of isolating a subset P instead of introducing an order relation is clearly
idiosyncratic.
One summarizes the first five axioms by saying that R is a field. Abstract fields
are algebraic structures, and we will not discuss them in this book. It is noteworthy
that these axioms allow us to recover the basic algebraic rules of manipulation of
numbers.
Theorem 3.1 If x, y, z and w are real numbers, and w ≠ 0, then x + z = y + z
implies x = y. Furthermore, xw = yw implies x = y.
Proof Indeed
x = 1x = x1 = x(ww⁻¹) = (xw)w⁻¹
  = (yw)w⁻¹ = y(ww⁻¹) = y1 = 1y = y.
Similarly,
x = x + (z + (−z)) = (x + z) + (−z)
  = (y + z) + (−z) = y + (z + (−z)) = y.
Theorem 3.2 If x, y, z and w are real numbers such that z ≠ 0 and w ≠ 0, then
1. x0 = 0;
2. −(−x) = x;
3. (w⁻¹)⁻¹ = w;
4. (−1)x = −x;
5. x(−y) = −xy = (−x)y;
6. (−x) + (−y) = −(x + y);
7. (−x)(−y) = xy;
8. (x/z)(y/w) = (xy)/(zw);
9. (x/z) + (y/w) = (wx + zy)/(zw).
The proof is left as an exercise. For instance, x0 + x0 = x(0 + 0) = x0 = 0 + x0,
thus x0 = 0. Or x + (−x) = 0 = (−x) + (−(−x)) = −(−x) + (−x), thus
x = −(−x).
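Before attempting the proof, the nine identities of Theorem 3.2 can be sanity-checked on rational numbers, which form a field inside R. The sketch below uses Python's exact `Fraction` arithmetic; the sample values and the name `check_theorem_3_2` are hypothetical, and passing these checks is evidence on a few samples, not a proof from the axioms.

```python
from fractions import Fraction
from itertools import product

# Rational samples only: Fraction arithmetic is exact, unlike floating point.
samples = [Fraction(n, d) for n in (-3, -1, 0, 2) for d in (1, 2, 5)]
nonzero = [q for q in samples if q != 0]

def check_theorem_3_2(x, y, z, w):
    """Return True if all nine identities hold for x, y and nonzero z, w."""
    return all([
        x * 0 == 0,
        -(-x) == x,
        (w ** -1) ** -1 == w,
        (-1) * x == -x,
        x * (-y) == -(x * y) == (-x) * y,
        (-x) + (-y) == -(x + y),
        (-x) * (-y) == x * y,
        (x / z) * (y / w) == (x * y) / (z * w),
        (x / z) + (y / w) == (w * x + z * y) / (z * w),
    ])

all_hold = all(
    check_theorem_3_2(x, y, z, w)
    for x, y in product(samples, repeat=2)
    for z, w in product(nonzero, repeat=2)
)
print(all_hold)
```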
R = P ∪ {0} ∪ (−P).
We call P the subset of positive real numbers. The set −P is the subset of
negative real numbers. It is customary to write x < y instead of y − x ∈ P ,
or equivalently x − y ∈ −P . Moreover x ≤ y will mean that either x < y or
x = y. Finally, y ≥ x is the same as x ≤ y, and y > x is the same as x < y.
1 The symbols glb E for inf E and lub E for sup E are old-fashioned.
Concretely, s = sup E if and only if s is an upper bound for E, and for each
upper bound b of E there results s ≤ b. A similar statement holds for the infimum
of E, and is left as an exercise.
Example 3.1 Consider E = {x | (∃n ∈ N) (x = 1/(n + 1))}. With the convention
that N = {0, 1, . . .}, we claim that inf E = 0 and sup E = 1. Indeed, 1 ∈ E with the
choice n = 0, and clearly 1/(n + 1) < 1 for each n ≥ 1. On the other hand, 0 is a
lower bound for E, since x ≥ 0 for each x ∈ E. We fix any ε > 0 and prove that ε
is not a lower bound for E. Indeed, there exists n0 ∈ N so large that 1/(n0 + 1) < ε:
any positive integer larger than 1/ε − 1 will suffice. Hence 0 = inf E.
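The choice of n0 in Example 3.1 can be made explicit. In the sketch below, `element` and `witness` are hypothetical helper names, ε is taken to be a positive rational so that all comparisons are exact, and n0 = ⌊1/ε⌋ is one admissible choice among the integers larger than 1/ε − 1.

```python
from fractions import Fraction
from math import floor

def element(n):
    """The element 1/(n + 1) of E, for n = 0, 1, 2, ..."""
    return Fraction(1, n + 1)

def witness(eps):
    """Return an index n0 with 1/(n0 + 1) < eps, for a positive rational eps.

    Any integer n0 > 1/eps - 1 works; floor(1/eps) is one such choice.
    """
    return floor(1 / eps)

for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(3, 1000)):
    n0 = witness(eps)
    assert element(n0) < eps  # so eps is not a lower bound of E
    print(eps, n0, element(n0))

# The largest element of E is attained at n = 0, in agreement with sup E = 1.
print(max(element(n) for n in range(1000)))
```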
Definition 3.2 Let E be a non-empty subset of R. If b ∈ E is an upper bound of E,
then we call it the maximum of E, denoted by b = max E. Similarly, if b ∈ E is a
lower bound of E, then b is called the minimum of E, denoted by b = min E.
Definition 3.3 For each x ∈ R we define the absolute value of x to be
|x| = x if x ≥ 0, and |x| = −x if x < 0.
|x + y| ≤ |x| + |y|.
Proof Let B be the set of all upper bounds of E, and set A = R \ B. Then B ≠ ∅, and
if x ∈ E, then x − 1 ∈ A, so that A ≠ ∅. Let a ∈ A, b ∈ B; then a is not an upper
bound of E, so there exists x ∈ E such that a < x ≤ b and thus a < b. It follows
now from Axiom 8 that there exists a unique real number s such that a ≤ s for each
a ∈ A and s ≤ b for each b ∈ B. If s ∈ A, then there would exist x ∈ E with s < x.
Setting a = (s + x)/2 we would derive a ∈ A because a < x ∈ E, and s < a:
contradiction. Therefore s ∈ B, and s is an upper bound of E. Since s is smaller than
or equal to any upper bound b of E, it follows that s = sup E.
Corollary 3.1 Every non-empty set F ⊂ R that is bounded from below has an
infimum in R.
Proof Let E be the set of all lower bounds of F. It follows that E ≠ ∅ and that
each element of F is an upper bound of E. Hence there exists s = sup E. Let us
now prove that s = inf F. Indeed, it is clear that s is greater than or equal to each lower
bound of F. To conclude we need to check that s is a lower bound of F. Suppose
not. Then there exists y ∈ F such that y < s. Then y is not an upper bound of E,
so there exists x ∈ E with y < x. Since y ∈ F , this is impossible, and s is a lower
bound of F .
The following is a useful characterization of the supremum. The reader is invited
to prove this result, and to provide a similar statement for the infimum.
Theorem 3.6 Let E be a non-empty subset of R. The real number s is the supremum
of E if and only if
1. for each x ∈ E, x ≤ s;
2. for each ε > 0 there exists x ∈ E with s − ε < x.
Exercise 3.2 Consider subsets A and B of R. If A ⊂ B, prove that inf B ≤ inf A ≤
sup A ≤ sup B. Hint: Trivially, inf A ≤ sup A. Pick a ∈ A and observe that a ∈ B.
Hence a ≤ sup B. Since a ∈ A is arbitrary, deduce that sup A ≤ sup B. Now
complete the proof.
Although any mature mathematician should be aware that natural numbers are set-
theoretic objects, many analysts prefer to consider them as real numbers. This is
how we introduce them now, postponing a set-theoretic definition to a later chapter.
Definition 3.4 (Inductive Sets) We say that a set .I ⊂ R is inductive if and only if
(i) .1 ∈ I and
(ii) .x ∈ I implies .x + 1 ∈ I .
3.3 Natural Numbers 23
Definition 3.5 The set N is the smallest (in the sense of set inclusion) inductive
subset of R, i.e.
N = ⋂ {I | I is an inductive subset of R}.
Remark 3.3 Plainly 0 is not a natural number according to our definition. This is
just a matter of choice, since one could replace 1 by 0 in the definition of inductive
sets. To be honest, names are just names.
Important: 0 or 1?
Whether .N must contain 0 is an exhausting discussion. Many—if not most—
mathematicians tend nowadays to include 0, but there is a (funny) issue. We have
defined the natural numbers by picking 1 as the first element, and then by adding it
recursively. If we choose 0 as the first element, we cannot add it recursively, since
0 is the neutral element of the addition in .R. To sum up, starting with 0 breaks the
construction of .N as those numbers obtained by adding up the first element as many
times as we wish.
R+ = {x ∈ R | x ≥ 0}
.
is also inductive.
Exercise 3.3 Prove that each half-line [a, +∞), where a ∈ R and a ≤ 1, is an inductive set.
Proposition 3.1 .N is an inductive set.
Proof Indeed .1 ∈ N, since 1 is an element of each inductive subset of .R. Now let
x ∈ N, so that x is an element of each inductive subset I of .R. Then .x + 1 ∈ I , and
.
\[ 1 + 2 + \cdots + n + (n + 1) = \frac{1}{2} n(n + 1) + (n + 1) = (n + 1)\left(\frac{1}{2} n + 1\right) = \frac{1}{2}(n + 1)(n + 2), \]
and we conclude that .n + 1 ∈ A. Hence A is an inductive set, thus .A ⊃ N. Since
A ⊂ N, we necessarily have .A = N.
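The inductive argument above can be mirrored by a short computation. The sketch below is an illustration only (the function name is ours): it lists the integers n up to a given bound for which 1 + 2 + ··· + n = n(n + 1)/2, accumulating the sum one step at a time just as the induction step does.

```python
def gauss_set_prefix(limit: int) -> list[int]:
    """Return those n in {1, ..., limit} with 1 + 2 + ... + n == n(n + 1)/2."""
    members = []
    running_sum = 0
    for n in range(1, limit + 1):
        running_sum += n  # running_sum is now 1 + 2 + ... + n
        if 2 * running_sum == n * (n + 1):
            members.append(n)
    return members
```

Of course a finite check is not a proof; it only confirms that no counterexample appears in the tested range.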
.
Example 3.5 We prove that 2^n > n^2 for each integer n ≥ 5. Indeed, our statement
is equivalent to 2^{m+4} > (m + 4)^2 for each integer m ≥ 1. Let
A = {m ∈ N | 2^{m+4} > (m + 4)^2}.
= (k + 4)^2 + 2(k + 4) + 1
< (k + 4)^2 + 2(k + 4) + (k + 4)
= (k + 4)^2 + 3(k + 4)
< (k + 4)^2 + (k + 4)(k + 4)
= 2(k + 4)^2 < 2 · 2^{k+4} = 2^{(k+1)+4}.
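As a sanity check of Example 3.5, the following sketch (the function name is ours) searches a finite range for an integer n at which 2^n > n^2 fails. It uses Python's exact integer arithmetic, so no rounding is involved.

```python
def smallest_counterexample(start: int, stop: int):
    """Return the first n in [start, stop) with 2**n <= n**2, or None if there is none."""
    for n in range(start, stop):
        if not 2**n > n**2:
            return n
    return None


def induction_step_holds(k: int) -> bool:
    """Check the key estimate of the induction step: ((k + 1) + 4)^2 < 2 (k + 4)^2."""
    return ((k + 1) + 4) ** 2 < 2 * (k + 4) ** 2
```

The restriction n ≥ 5 cannot be dropped: the search started at n = 2 stops immediately, since 2^2 = 2^2.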
\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}, \]
where \binom{n}{k} = \frac{n!}{k!(n-k)!}.
Exercise 3.4
(i) Prove that for every n ∈ N and every integer k ∈ {1, . . . , n} there results
\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}. \]
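Before attempting a proof, the reader may wish to test the identity in (i) and the binomial formula on a few integers. The sketch below relies on math.comb from the Python standard library; the two function names are ours.

```python
from math import comb


def pascal_rule_holds(n: int, k: int) -> bool:
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for integers 1 <= k <= n."""
    return comb(n, k - 1) + comb(n, k) == comb(n + 1, k)


def binomial_expansion(a: int, b: int, n: int) -> int:
    """Evaluate the sum over k = 0..n of C(n, k) a^k b^(n-k) with exact integers."""
    return sum(comb(n, k) * a**k * b ** (n - k) for k in range(n + 1))
```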
if (v) were false, then we would have .a ∈ N, .n < a + 1 and .(a + 1) − n < 1. From
(iv) it would follow that .a + 1 − n ∈ N, and this contradicts (i).
The following is a fundamental property of the natural numbers. It looks almost
trivial, but we encourage the reader to keep in mind the well-order property, since it
will come back in a more complicated way.
Theorem 3.10 (.N Is Well-Ordered) Any non-empty subset of .N has a minimum.
Proof Let A be a non-empty subset of .N, and suppose it has no smallest element.
Set
Theorem 3.12 Let x ∈ R. Then there exists a unique integer n such that n ≤ x <
n + 1, or equivalently x − 1 < n ≤ x.
Proof Suppose the conclusion holds true for different integers .n < m. Then .n <
m ≤ x and .x < n + 1, thus .n < m < n + 1. Hence there would exist an integer m
between the consecutive integers n and .n + 1, a contradiction to (v) of Theorem 3.9.
Let a be the smallest element of .N that is greater than .|x|. If .x ≥ 0, we take
.n = a − 1. If .x < 0, we take .n = −a + 1 or .−a according as x is an integer or
not.
Definition 3.6 The unique integer n of the previous theorem is called the integral
part of the real number x, and is denoted by .[x].
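The existence part of the proof of Theorem 3.12 is constructive, and it can be transcribed almost literally. The following sketch is restricted to rational inputs (an assumption made so that the test "x is an integer" is exact), and the function name is ours.

```python
from fractions import Fraction


def integral_part(x: Fraction) -> int:
    """Integral part [x] of a rational x, following the proof of Theorem 3.12."""
    # a is the smallest element of N = {1, 2, ...} that is greater than |x|.
    a = 1
    while a <= abs(x):
        a += 1
    if x >= 0:
        return a - 1
    # For x < 0 the answer depends on whether x is an integer.
    return -a + 1 if x.denominator == 1 else -a
```

The linear search for a is deliberately naive: it mirrors the appeal to the well-ordering of N rather than aiming at efficiency.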
A rational number is any real number of the form a/b, for some integers a
and b, b ≠ 0. The set of rational numbers is denoted by the symbol Q. The
elements of R \ Q are called irrational numbers.
The reader will easily check that .Q satisfies Axioms 1–5, so that it is a field itself.
A classical result shows that irrational numbers exist.
Theorem 3.13 √2 ∉ Q.
Proof Assume that √2 = m/n for some integers m, n, n ≠ 0. We may assume that
m and n are coprime, in the sense that the fraction m/n cannot be further reduced.
Then 2 = m^2/n^2, i.e. m^2 = 2n^2. Hence m^2 is an even number, and so is m. Hence
m = 2k for some integer k, and m^2 = 4k^2. Thus 4k^2 = 2n^2, or 2k^2 = n^2. This
yields that n^2 is even, and so is n. But m and n are coprime, a contradiction.
Thus rational numbers do not exhaust .R. However, there are no “holes” between
rational numbers.
Theorem 3.14 (Density of .Q in .R) If x, y are real numbers with .x < y, there
exists .z ∈ Q such that .x < z < y.
Proof The Archimedean property yields b ∈ N such that b > (y − x)^{−1}. Then
b^{−1} < y − x. Let a = [bx] + 1 ∈ Z. Hence a − 1 ≤ bx < a, i.e. a/b ≤ x + b^{−1} and
Theorem 3.15 Every ordered field contains sets isomorphic to the natural num-
bers, the integers, and the rational numbers.
Proof Let us consider any ordered field (X, +, ·, ≤). By induction we may define a
function ϕ : N → X recursively as follows:
ϕ(1) = 1
.
ϕ(n + 1) = ϕ(n) + 1.
Let p and q be different natural numbers, say p < q. There exists n ∈ N such that
q = p + n: we claim that ϕ(p) < ϕ(q).
Indeed, for n = 1 we just have q = p + 1 and ϕ(q) = ϕ(p) + 1 > ϕ(p). For
a general n ∈ N we have ϕ(p + n + 1) = ϕ(p + n) + 1 > ϕ(p + n), and so
ϕ(p + n) > ϕ(p) implies ϕ(p + n + 1) > ϕ(p). By induction ϕ(p + n) > ϕ(p)
and we see that ϕ is injective.
Again by induction we can show that ϕ(p + q) = ϕ(p) + ϕ(q) and ϕ(pq) =
ϕ(p)ϕ(q). Thus ϕ is a bijective function from N onto a subset of X which preserves
sums, products and the order relation. By taking differences of natural numbers we
obtain Z as a subset of X, and by taking quotients of integers we obtain Q as a
subset of X. The proof is complete.
z + w = (a + c, b + d).
.
The set .C is a field under these operations. Indeed the complex number .(0, 0)
satisfies .(a, b) + (0, 0) = (a, b) for any .(a, b) ∈ C. The complex number .(1, 0)
satisfies (a, b)(1, 0) = (a, b) for any (a, b) ∈ C. If z = (a, b), then −z = (−a, −b)
is such that z + (−z) = (0, 0). If z = (a, b) ≠ (0, 0), then the number
\[ z^{-1} = \left( \frac{a}{a^2 + b^2}, \; -\frac{b}{a^2 + b^2} \right) \]
\[ \overline{z} = (a, -b). \]
Definition 3.8 The imaginary unit is the complex number .i = (0, 1).
The main reason for introducing .i is that .(a, b) = (a, 0) + (b, 0)(0, 1). If we
identify .(a, 0) with a, .(b, 0) with b, we can formally write .(a, b) = a + bi.
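Exercise 3.6 Let z = (a, b) and w be complex numbers. Prove that
1. \overline{z} = a − bi
2. \overline{z + w} = \overline{z} + \overline{w}
3. \overline{zw} = \overline{z} \, \overline{w}
4. z + \overline{z} = 2 Re z and z − \overline{z} = 2i Im z
5. z \overline{z} = |z|^2
6. z^{−1} = \overline{z}/|z|^2 provided that z ≠ 0.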
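The identities of Exercise 3.6 can be tested on examples by implementing complex numbers as ordered pairs. The sketch below is an illustration under stated assumptions: the function names are ours, the product rule (a, b)(c, d) = (ac − bd, ad + bc) is the usual one, and the components are taken to be rational so that all computations are exact.

```python
from fractions import Fraction


def add(z, w):
    """Sum of two complex numbers written as pairs."""
    return (z[0] + w[0], z[1] + w[1])


def mul(z, w):
    """Product (a, b)(c, d) = (ac - bd, ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Conjugate (a, -b) of z = (a, b)."""
    return (z[0], -z[1])


def inverse(z):
    """Inverse (a/(a^2 + b^2), -b/(a^2 + b^2)) of z = (a, b) != (0, 0)."""
    a, b = z
    norm_sq = a * a + b * b
    return (a / norm_sq, -b / norm_sq)
```

A test on a handful of pairs does not replace the proofs requested in the exercise, but it is a quick way to catch a sign error.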
Proposition 3.2 Let z, w be complex numbers. Then
(i) .|z| > 0 unless .z = 0, and .|0| = 0;
(ii) .|zw| = |z| |w|;
(iii) |\overline{z}| = |z|;
Proof (i) is obvious from the properties of real numbers. Let z = a + bi, w = c + di.
Then |zw|^2 = (ac − bd)^2 + (ad + bc)^2 = (a^2 + b^2)(c^2 + d^2) = |z|^2 |w|^2. Since the
modulus cannot be negative, (ii) follows. (iii) is trivial. To prove (iv) we just remark
that a^2 ≤ a^2 + b^2, so that |a| ≤ \sqrt{a^2 + b^2}. The same holds for the imaginary part b.
Finally, z\overline{w} + \overline{z}w = 2 Re(z\overline{w}). Hence
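Proof Define A = \sum_{k=1}^{n} |a_k|^2, B = \sum_{k=1}^{n} |b_k|^2, C = \sum_{k=1}^{n} a_k \overline{b_k}. Then
\begin{align*}
0 &\le \sum_{k=1}^{n} |B a_k - C b_k|^2 = \sum_{k=1}^{n} (B a_k - C b_k)(B \overline{a_k} - \overline{C} \, \overline{b_k}) \\
&= B^2 \sum_{k=1}^{n} |a_k|^2 - B \overline{C} \sum_{k=1}^{n} a_k \overline{b_k} - B C \sum_{k=1}^{n} \overline{a_k} b_k + |C|^2 \sum_{k=1}^{n} |b_k|^2 \\
&= B^2 A - B |C|^2 \\
&= B (AB - |C|^2).
\end{align*}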
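\[ z = r (\cos\vartheta + i \sin\vartheta), \]
for some numbers r ≥ 0 and ϑ ∈ [0, 2π). Indeed, r = |z| = \sqrt{x^2 + y^2} (since
\cos^2\vartheta + \sin^2\vartheta = 1), and ϑ is defined as follows:
(a) if x ≠ 0, then
\[ \vartheta = \arctan\frac{y}{x} + k\pi, \]
where
\[ k = \begin{cases} 0 & \text{if } x > 0 \text{ and } y \ge 0 \\ 1 & \text{if } x < 0 \text{ and } y \in \mathbb{R} \\ 2 & \text{if } x > 0 \text{ and } y < 0; \end{cases} \]
(b) if x = 0 and y ≠ 0, then
\[ \vartheta = \begin{cases} \frac{\pi}{2} & \text{if } y > 0 \\ \frac{3}{2}\pi & \text{if } y < 0. \end{cases} \]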
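The case distinction for ϑ translates directly into code. The following sketch (the function name is ours) works in floating-point arithmetic and treats the point z = 0, for which no angle is singled out, as an error; this last choice is an assumption of the sketch.

```python
import math


def argument(x: float, y: float) -> float:
    """Angle in [0, 2*pi) of the non-zero complex number x + iy."""
    if x != 0:
        if x > 0 and y >= 0:
            k = 0
        elif x < 0:
            k = 1
        else:  # x > 0 and y < 0
            k = 2
        return math.atan(y / x) + k * math.pi
    if y > 0:
        return math.pi / 2
    if y < 0:
        return 3 * math.pi / 2
    raise ValueError("the argument of 0 is not defined")
```

Up to rounding, the result agrees with the library function math.atan2(y, x) reduced modulo 2π.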
such that z1 ≠ 0 and z2 ≠ 0 are equal if and only if |z1| = |z2| and ϑ1 − ϑ2 = 2kπ
for some integer k.
3.6 Polar Representation of Complex Numbers 31
Then
Proof Indeed,
= |z1 ||z2 | (cos ϑ1 cos ϑ2 − sin ϑ1 sin ϑ2 + i(sin ϑ1 cos ϑ2 + sin ϑ2 cos ϑ1 ))
= |z1 ||z2 | (cos(ϑ1 + ϑ2 ) + i sin(ϑ1 + ϑ2 )) .
Proof It is enough to recall that z^n = z · · · z (n times) and to apply Proposition 3.3.
When I began to write this book, I did not consider any discussion about the
construction of .R in the first part. As I said before, it seems that a purely axiomatic
definition of the real numbers is more than enough as a first approach to Mathemati-
cal Analysis. But then some colleagues convinced me that an authoritative approach
(“Believe me, real numbers do exist!”) may not be the best choice for an instructor:
the pace of teaching is not the pace of logic.
In this section we (the reader and I) will meet the basic ideas of a popular
construction of .R from .Q. Let me try to introduce the topic.
Imagine you know and use rational numbers (as fractions), but you have no idea
about irrational numbers. What is, for example, √2? Well, Dedekind proposed to
identify real numbers with subsets of Q, in an appropriate way. Roughly speaking,
√2 = {r ∈ Q | r < √2}.
This definition should be prosecuted by the law, of course: nothing should be
defined recursively! But look at the following proposal:
√2 = {r ∈ Q | r ≤ 0 or r^2 < 2}.
This looks much better, doesn’t it? A (real) number is a set of rational numbers; but
what sets?
Definition 3.9 (Dedekind Cuts) A subset L of .Q is a Dedekind cut, if
(a) L ≠ ∅ and L ≠ Q;
(b) for every .x ∈ L there exists .y ∈ L such that .x < y;
(c) if .x ∈ L and .y < x, then .y ∈ L.
In our minds, a cut is a proper, non-empty subset of .Q which has no largest
element (condition (b)) and looks like a half-line starting from .−∞ (condition (c)).
The term cut can be explained as follows: if L is a Dedekind cut, then .Q is cut by L
in two parts, L and .Q \ L, such that any element of L is smaller than any element of
.Q \ L.
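To make the three conditions of Definition 3.9 concrete, we can represent the cut {r ∈ Q | r ≤ 0 or r^2 < 2} by its membership test. The sketch below is an illustration only: the function names are ours, and the explicit formula y = (2r + 2)/(r + 2) used for condition (b) is a classical choice that is not taken from the text.

```python
from fractions import Fraction


def in_sqrt2_cut(r: Fraction) -> bool:
    """Membership test for the cut {r in Q : r <= 0 or r^2 < 2}."""
    return r <= 0 or r * r < 2


def larger_member(r: Fraction) -> Fraction:
    """Given r in the cut, return some y in the cut with r < y (condition (b)).

    For r > 0 we use y = (2r + 2)/(r + 2): then y - r = (2 - r^2)/(r + 2) > 0
    and y^2 - 2 = 2(r^2 - 2)/(r + 2)^2 < 0.
    """
    if r <= 0:
        return Fraction(1)
    return (2 * r + 2) / (r + 2)
```

Iterating larger_member from r = 1 produces rational numbers that stay inside the cut while increasing at every step, which illustrates why the cut has no largest element.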
Lr = {x ∈ Q | x < r}
.
3.7 A Construction of the Real Numbers 33
is a Dedekind cut. However, our initial discussion suggests that this type of subsets
does not exhaust the class of cuts.
We are tempted to define .R as the collection of all Dedekind cuts. The main issue
is that .R must be an ordered field.
We can now abandon the letter L (as in Left) to denote Dedekind cuts, and
use Greek letters instead. Hence .α, .β, .γ , . . . will be real numbers, i.e. cuts.
α + β = {p + q | p ∈ α, q ∈ β} .
.
−α = {r ∈ Q | ∃s (s > r ∧ −s ∉ α)}
αβ = {pq | p ∈ α, q ∈ β} .
.
Then
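\[ \alpha\beta = \begin{cases} -((-\alpha)\beta) & \text{if } \alpha < 0 \text{ and } \beta \ge 0 \\ -(\alpha(-\beta)) & \text{if } \alpha \ge 0 \text{ and } \beta < 0 \\ (-\alpha)(-\beta) & \text{if } \alpha < 0 \text{ and } \beta < 0. \end{cases} \]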
Clearly enough, we have adjusted the signs so that the elementary properties of
the algebraic product with respect to the order relation are satisfied.
Exercise 3.9 The real number 1 is defined as
1 = {r ∈ Q | r < 1} ,
.
where the number 1 on the right-hand side is the rational number 1 = 1/1. Show
that for any α ∈ R, α ≠ 0, there exists α^{−1} ∈ R such that αα^{−1} = 1.
We will not prove in detail that .R with these two algebraic operations is a field:
the proofs are straightforward but boring, and can be a good exercise for the reader.
Instead, a proof of (some version of) completeness is more interesting. Let us state
it as follows.
Theorem 3.18 The ordered field .R (obtained from Dedekind cuts) has the upper
bound property: any non-empty subset, bounded from above, has a least upper
bound in .R.
Proof Let A ⊂ R be a non-empty set, and let β ∈ R be an upper bound for A. Let us
set
γ = ⋃ {α | α ∈ A}.
This definition is meaningful, since the elements of A are sets. We are going to show
that γ = sup A.
Pick any α0 ∈ A, so that α0 ≠ ∅. Since α0 ⊂ γ, γ ≠ ∅. Then γ ⊂ β by
construction, hence γ ≠ Q. Let now p ∈ γ, hence there exists α1 ∈ A such that
p ∈ α1. If q < p, then q ∈ α1 and thus q ∈ γ. If we finally choose f ∈ α1 such that
.γ ∈ R.
3.8 Problems
br = sup B(r)
.
|1 + z|2 + |1 − z|2 .
.
3.9 Comments
The chapter on real numbers is always the most important one in textbooks about
(Real) Mathematical Analysis, and often the least self-contained one. The reason is
that a rigorous definition of R requires a strong background in Set Theory and in
Abstract Algebra. We will see that natural numbers stem directly from Set Theory,
while rational numbers can be defined in terms of natural numbers with an algebraic
construction. These steps are usually omitted, since most students are satisfied with
intuitive definitions like
or
Rational numbers are just quotients of integers.
Needless to say, the first definition is based on the hope that every student
can decide whether an object is a natural number, and the second definition is
meaningless until quotients are defined. In other words, rational numbers are
quotients only when rational numbers already exist, or when real numbers already
exist.
However, it turns out that such an intuition about numbers does suffice to develop
Calculus of one or more real variables. As a result, I believe that only a few graduate
students can construct .R from .Q, and only a small minority of them can define .N
in terms of sets. I was one of those students, and this is why this book contains a
chapter on Axiomatic Set Theory.
The books [1] and [2] are good sources about numbers.
References
Abstract What does it mean that two sets have the same number of elements? This
may appear clear if we can write down all the members in a finite list. The answer
becomes complicated if the sets contain infinitely many elements. In this chapter we
propose a definition of cardinality in an elementary fashion.
finite set {1, 2, . . . , n0 − 1}. Thus at most finitely many positive integers do not
belong to N, and (a) holds.
In other words, our sequences may be considered as functions from an
unbounded interval N ∩ [n0 , +∞) for some n0 ∈ N. In the Comments at the
end of the chapter we will discuss again our definition.
Definition 4.2 (Subsequences) Let s be a sequence, and let k : N → N be a sequence
of positive integers with the property that kn < kn+1 for each n ∈ N. Then the
composition s ◦ k is called a subsequence of s. Explicitly, s ◦ k = {skn }n .
Remark 4.1 In a subsequent chapter we will see that a weaker condition on
the sequence k could be assumed in order to define subsequences. The strong
monotonicity kn < kn+1 is however more popular in the literature.
Definition 4.3 (Equal Cardinality) Two sets A and B are equinumerous (or have
the same cardinality), if there exists a bijective function F : A → B. In this case we
will write A ∼ B, or even #A = #B.
It is an easy exercise in set theory to check that ∼ is actually an equivalence
relation between sets. We will use this fact in the rest of the chapter.
Definition 4.4 We say that a set A has cardinality n, if A ∼ {1, 2, . . . , n}. By
extension, the cardinality of the empty set is zero. A set A is finite, if there exists a
positive integer n such that A has cardinality n. Otherwise it is called infinite. A set
A is countably infinite if A ∼ N, and it is countable if it is either finite or countably
infinite. If A is not countable, we say that A is uncountable.
Exercise 4.1 Prove that the Cartesian product of two finite sets is a finite set. Hint:
this is essentially a “matrix” proof. If X has n members and Y has m members, you
can write down X × Y as a matrix of n rows and m columns. Then just... count the
entries of this matrix.
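The "matrix" counting suggested in the hint of Exercise 4.1 amounts to reading the entries row by row. A minimal sketch of this idea, with function names of our own choosing and with 1-based indices as in Definition 4.4:

```python
def pair_to_index(i: int, j: int, m: int) -> int:
    """Position of entry (i, j) in an n x m matrix read row by row (1-based)."""
    return (i - 1) * m + j


def index_to_pair(k: int, m: int) -> tuple[int, int]:
    """Inverse of pair_to_index: recover (row, column) from the position k."""
    return ((k - 1) // m + 1, (k - 1) % m + 1)
```

The two functions are inverse to each other, so that {1, . . . , n} × {1, . . . , m} ∼ {1, . . . , nm}. Composing with bijections X ∼ {1, . . . , n} and Y ∼ {1, . . . , m} gives the claim for arbitrary non-empty finite sets.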
A countably infinite set S can always be described as S = {s1 , s2 , . . .}, where s
is the bijective function that describes the fact that S ∼ N. In this sense, a countably
infinite set can be seen as a labeled list of points.
4.1 Countable and Uncountable Sets 39
\[ B_n = A_n \setminus \bigcup_{k=1}^{n-1} A_k. \]
Clearly G = {B_1, B_2, B_3, . . .} is a disjoint collection. Setting A = \bigcup_{n=1}^{\infty} A_n, B =
\bigcup_{n=1}^{\infty} B_n, we show that A = B. If x ∈ A, then x ∈ A_k for some k. Let n be the
smallest k with this property, so that x ∉ A_k for k < n. This implies x ∈ B_n, and in
40 4 Elementary Cardinality
\[ v_n = \begin{cases} 1 & \text{if } u_{n,n} \ne 1 \\ 2 & \text{if } u_{n,n} = 1. \end{cases} \]
We claim that no term of the sequence {sn }n can equal y. Indeed y differs from s1
in the first digit, differs from s2 in the second digit, and in general differs from sn in
the n-th digit. But 0 < y < 1 by construction, and this contradicts the assumption
that (0, 1) is countable.
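The diagonal construction can be tried on a finite table of digits. In the sketch below, which is only an illustration, each row is a list holding the first decimal digits of one term of the sequence; the function name is ours, and the new digit is 1 unless the diagonal digit is 1, in which case it is 2.

```python
def diagonal_digits(rows):
    """Return digits v_0, v_1, ... with v_n different from the n-th digit of rows[n]."""
    return [1 if row[n] != 1 else 2 for n, row in enumerate(rows)]
```

Since only the digits 1 and 2 are used, the number y built in this way cannot be an expansion ending with infinitely many 0's or 9's, so that it differs from each listed number and not merely from its written expansion.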
Example 4.1 Every open interval (a, b) of R has the same cardinality as R. Indeed,
we choose a number c ∈ (a, b) and we define f : (a, b) → R as
\[ f(x) = \begin{cases} \dfrac{x - c}{b - x} & \text{if } c \le x < b \\[1ex] \dfrac{x - c}{x - a} & \text{if } a < x \le c. \end{cases} \]
\[ f(x) = \begin{cases} x & \text{if } 0 < x \le 1/2 \\[1ex] \dfrac{1}{4(1 - x)} & \text{if } 1/2 < x < 1. \end{cases} \]
Exercise 4.3 Prove that any infinite set contains a countably infinite subset. Hint:
let X be an infinite set. Pick any x1 ∈ X. Since X is infinite, there exists x2 ∈
4.2 The Schröder-Bernstein Theorem 41
X \ {x1 }. For the same reason, there exists x3 ∈ X \ {x1 , x2 }, and so on. In this way
we construct a subset {xj | j ∈ N} of X which is clearly countably infinite.
Let us denote by c the cardinality of R and by ℵ0 the cardinality of N. From our
discussion it is clear that
ℵ0 < c,
.
in the sense that there exists an injective function from N into R, but there cannot
exist a bijection between these two sets.
Important: Question
Is there any set whose cardinality is strictly larger than ℵ0 and strictly smaller than
c?
We have decided that two sets have the same cardinality if a bijective map exists
which takes one set onto the other. A celebrated result by Schröder and Bernstein
simplifies our task.
Theorem 4.7 (Schröder-Bernstein) If there is a one-to-one function on a set A to
a subset of a set B and there is also a one-to-one function on B to a subset of A,
then A and B have the same cardinality.
\[ F = \begin{cases} f & \text{on } A_E \cup A_I \\ g^{-1} & \text{on } A_O \end{cases} \]
is a bijective map.
Remark 4.2 How do we interpret the previous proof? We have actually constructed
the map F by an inductive process:
E0 = A \ g(B)
.
E1 = g(f (E0 ))
E2 = g(f (E1 ))
...
En+1 = g(f (En )),
and so on. Then we set E = ⋃_n E_n. The function F is constructed in such a way
that F = f on E, and F = g^{−1} on A \ E.
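The inductive description of Remark 4.2 can be turned into a small program. The sketch below is written under explicit assumptions that are ours and not the text's: the partial inverses of f and g are supplied by the caller and return None outside the image, and every backward chain x, g^{−1}(x), f^{−1}(g^{−1}(x)), . . . stops after finitely many steps (otherwise the membership test for E would not terminate).

```python
def build_bijection(f, f_inv, g, g_inv):
    """Return F with F = f on E and F = g^{-1} on A minus E, as in Remark 4.2.

    f_inv and g_inv are partial inverses returning None outside the image.
    """

    def in_E(x):
        # x is in E iff the backward chain starting at x stops in A minus g(B) = E_0.
        while True:
            y = g_inv(x)
            if y is None:  # x is not in g(B), hence x is in E_0
                return True
            x = f_inv(y)
            if x is None:  # the chain stops in B minus f(A): the start was not in E
                return False

    def F(x):
        return f(x) if in_E(x) else g_inv(x)

    return F


# Hypothetical example: A = B = {0, 1, 2, ...} and f(n) = g(n) = n + 1.
def successor(n):
    return n + 1


def predecessor(n):
    return n - 1 if n >= 1 else None


example_F = build_bijection(successor, predecessor, successor, predecessor)
```

In this example E_0 = {0}, E consists of the even numbers, and F swaps 2n and 2n + 1; in particular F is a bijection although neither f nor g is surjective.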
We present a second proof of this important result in Set Theory. We need a
preliminary tool.
Lemma 4.1 Let .X be an ordered set such that every non-empty subset has a
greatest lower bound. If .f : X → X is such that
1. there exists .x ∈ X such that .f(x) ≤ x;
2. for every .x ∈ X, .y ∈ X, .x ≤ y implies .f(x) ≤ f(y),
then .f has a fixed point, i.e. there exists .a ∈ X such that .f(a) = a.
Proof The set
A = {x ∈ X | f(x) ≤ x}
.
A → X \ g(Y \ f (A)).
.
Lemma 4.1 can be applied to the ordered set 2^X of all subsets of X, ordered by inclusion ⊂, with f = F, since
F satisfies condition 2. Condition 1 is also satisfied, since 2^X contains a largest
element. Thus F(A) = A for some A ⊂ X, and the proof follows.
A remarkable fact is that given a set A, one can always construct another set
whose cardinality is different from the cardinality of A. We call P(A) the set of all
subsets of A.
Theorem 4.8 (Cantor) If A ≠ ∅, then there exists no surjective map f : A →
P(A). In particular, A and P(A) do not have the same cardinality.
Proof Let f : A → P(A); we will prove that the set
S = {x ∈ A | x ∉ f(x)}
does not belong to the image of f. Suppose that S ∈ f(A), so that S = f(s) for
some member s ∈ A. If s ∈ S, then s ∉ f(s) = S; if s ∉ S, then s ∈ f(s) = S. In
any case we reach a contradiction.
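For a small finite set A the statement of Cantor's theorem can be verified by brute force. The sketch below, whose function names are ours, builds the set S of the proof for a map f given as a dictionary; the accompanying check runs over all maps from a three-element set into its power set.

```python
from itertools import combinations, product


def power_set(elements):
    """All subsets of a finite collection, as frozensets."""
    return [
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    ]


def cantor_set(domain, f):
    """The set S = {x in A : x not in f(x)} from the proof of Theorem 4.8."""
    return frozenset(x for x in domain if x not in f[x])


def all_maps(domain):
    """Generate every map from domain into its power set, as a dictionary."""
    subsets = power_set(domain)
    for images in product(subsets, repeat=len(domain)):
        yield dict(zip(domain, images))
```

With three elements there are 8^3 = 512 maps, and for each of them the set S is missed by f.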
Exercise 4.4 Suppose that .A = {x}. What is the cardinality of .P(A)? Think
carefully!
4.3 Problems
a_0 z^n + a_1 z^{n−1} + · · · + a_{n−1} z + a_n = 0.
Prove that the set of all algebraic numbers is countable. Hint: given N ∈ N, there
exist only finitely many equations with n + |a0 | + · · · + |an | = N.
4.2 Is the set R \ Q countable?
4.3 Prove that a set E is infinite if and only if E has the same cardinality as a proper
subset of E. Hint: one direction is Exercise 4.3. Conversely, if f : E → E is an
injective function and a ∈ E \ f (E), define recursively a1 = f (a), an+1 = f (an ).
4.4 Comments
References
Abstract The set .R has a rich algebraic structure. What is even more important for
Analysis is that its structure of ordered field with the Completeness Axiom may be
used to generate a topological environment.
We start with a fairly general definition that describes the possibility of measuring a
distance.
Definition 5.1 Let X be a set. A distance on X is a function .d : X × X → R with
the following properties:
1. .d(x, y) ≥ 0 for each .(x, y) ∈ X × X; .d(x, y) = 0 if and only if .x = y.
2. .d(x, y) = d(y, x) for each .(x, y) ∈ X × X.
3. (triangle inequality) .d(x, y) ≤ d(x, z) + d(z, y) for each x, y and z in X.
If a distance d is given on X, we say that (X, d) is a metric space.
Definition 5.2 The standard (or Euclidean) metric on .R is defined as .d(x, y) =
|x − y| for each .(x, y) ∈ R × R.
Whenever a distance is available, we can introduce the idea of neighborhood.
We will come back to this in greater generality; for the time being we stick to the
particular case of the standard metric in .R.
Definition 5.3 An open interval is any set of the form .(a, b) = {x ∈ R | a < x <
b} for some real numbers .a < b. If S is a subset of .R and .x0 is a point, we say
that .x0 is an interior point of S whenever there exists an open interval I such that
.x0 ∈ I ⊂ S. A set S is called open, if each point of S is an interior point of S. A set
(−∞, b) = {x ∈ R | x < b}
[a, +∞) = {x ∈ R | a ≤ x}
and so on.
The sentence “The set I is an interval” means that I is an interval of any kind:
open, closed, half-open, a half-line or even the whole real line.
Definition 5.4 A neighborhood of a point .x0 in .R is any set U such that an open
interval .(a, b) exists with .x0 ∈ (a, b) ⊂ U .
A neighborhood of .x0 is therefore any set that contains an open set that contains
x0 .
.
Exercise 5.1 Prove that if .x0 is a positive real number, there exists a neighborhood
of .x0 whose points are positive numbers. This harmless result will be used to prove
some results about limits of sequences and functions.
Definition 5.5 Let S be a subset of R. A point x0 ∈ R is an accumulation (or limit,
or cluster) point of S, if each neighborhood of x0 contains a point of S different from
x0 itself (see Fig. 5.1).
Recall that a sequence in a set X is any function whose domain is .N \ F for some
finite subset F of .N and whose values lie in X (see Fig. 5.2).
Definition 5.7 Let .s = {sn }n be a sequence in .R. We say that s converges to a limit
L, and we write
\[ L = \lim_{n \to +\infty} s_n, \]
if for every neighborhood U of L there exists a positive integer .n0 such that .n > n0
implies .sn ∈ U .
2 Since we use s to name a sequence, it would be natural to write s → L. This, albeit correct,
might be easily confused with the identical symbol that describes the fact that a variable s tends to
L independently, like in lim_{s→L} f(s).
52 5 Distance, Topology and Sequences on the Set of Real Numbers
The sequence .{sn }n is bounded by Theorem 5.5, thus there exists .C > 0 such that
|sn | < C for each n. Therefore
.
≤ Cε + |M|ε = (C + |M|) ε.
5.1 Sequences and Limits 53
.n → +∞.
Exercise 5.4 Let {s_n}_n be a bounded decreasing sequence of real numbers. Then
s_n → inf_k s_k as n → +∞.
Proof We have already proved that ⋂_{n∈N} I_n ≠ ∅. Let us suppose that two distinct
numbers z < w belong to this intersection. We may choose ε = w − z > 0 in (ii)
and obtain a contradiction. Hence ⋂_{n∈N} I_n contains one and only one element.
Example 5.3 We prove that .R is uncountable as a consequence of Theorem 5.9. As
already noticed before, we need to prove that a closed interval .[a, b] is uncountable.
We suppose on the contrary that .[a, b] is countable. Of course .[a, b] contains
infinitely many elements, so we may suppose that .{xn | n = 1, 2, 3, . . .} is an
enumeration of [a, b]. We are going to construct a number z ∈ [a, b] which is
different from every term x_n of this enumeration. We divide [a, b] into three closed intervals
of equal length, and we choose one of them, called I_1, such that x_1 ∉ I_1. If I_n
has been chosen, we split it into three closed intervals of equal length, and we call I_{n+1}
one of them which does not contain x_{n+1}. The length of I_n converges to zero as
n → +∞, and by construction I_{n+1} ⊂ I_n for each n. Hence there exists a unique
real number z that belongs to every I_n. In particular z ≠ x_n for each n, since z ∈ I_n
but x_n ∉ I_n. This contradiction proves the statement.
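The construction of the intervals I_n in Example 5.3 is effective, and a finite number of steps can be carried out by a program. The sketch below is an illustration under our own assumptions: the points are rational (so that the arithmetic is exact), the intervals are closed, and the function name is ours. A point can belong to at most two of the three closed thirds, so a third avoiding it always exists.

```python
from fractions import Fraction


def avoid(points, a=Fraction(0), b=Fraction(1)):
    """Return a closed interval [lo, hi] inside [a, b] containing none of the points.

    At step n the current interval is split into three closed thirds and the
    first third that does not contain points[n] is kept.
    """
    lo, hi = a, b
    for x in points:
        third = (hi - lo) / 3
        pieces = [(lo, lo + third), (lo + third, lo + 2 * third), (lo + 2 * third, hi)]
        lo, hi = next((l, h) for (l, h) in pieces if not (l <= x <= h))
    return lo, hi
```

After n steps the interval has length (b − a)/3^n and misses the first n points of the list; the number z of the example is the unique point common to all these intervals.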
Theorem 5.10 Let S be a subset of .R, and let x be an accumulation point of S.
Then there exists a sequence .{sn }n of points of S such that .sn → x as .n → +∞.
Proof Consider the open neighborhood U_1 = (x − 1, x + 1) of x, and select a
point s_1 ∈ S such that s_1 ∈ U_1. This is possible because x is an accumulation
point of S. After the points s_1, s_2, . . . , s_{n−1} have been selected in S, choose s_n in S such that
s_n ∈ U_n = (x − 1/n, x + 1/n). Then |s_n − x| < 1/n for each n, and therefore
s_n → x as n → +∞.
We record the following fact, which should be an easy exercise: a sequence .{sn }n
converges to a limit L if and only if every subsequence of .{sn }n converges to L.
Definition 5.8 A set .K ⊂ R is called sequentially compact, if every sequence in K
has a subsequence that converges to a point of K.
Theorem 5.11 (Bolzano-Weierstrass) Every bounded sequence in .R contains a
converging subsequence.
Proof Let s = {s_n}_n be a bounded sequence of real numbers. For some M > 0, this
means that s_n ∈ [−M, M] for each n. If the range of the sequence s is a finite set (in
the sense that s takes on only a finite number of different values), then there exist
infinitely many positive integers n_1 < n_2 < n_3 < . . . such that s_{n_1} = s_{n_2} = s_{n_3} =
. . .. Hence s contains a constant subsequence, which is clearly convergent.
We may now assume that the range of s is an infinite set.
We set I_0 = [−M, M]. We divide I_0 into two parts: at least one of them must
contain infinitely many terms of the sequence, otherwise the range of s would be
finite. Let us call I_1 this part, and we choose the smallest positive integer n_1 such that
s_{n_1} ∈ I_1. Now we split I_1 into two parts as before, we call I_2 the part that contains
infinitely many terms of the sequence s, and we choose the smallest positive integer
n_2 > n_1 such that s_{n_2} ∈ I_2. Proceeding this way, we construct a subsequence {s_{n_k}}_k
of s.
Proof We provisionally agree that the n-th term of our sequence is dominant if
s_m < s_n for each m > n. There are now two cases. It might happen that there exist
infinitely many dominant terms, and let {s_{n_k}}_k be a subsequence consisting solely
of dominant terms. Then s_{n_{k+1}} < s_{n_k} for each k, and thus {s_{n_k}}_k is monotonically
decreasing.
The second case is that only finitely many terms are dominant. In particular there
exists a positive integer n_1 which is greater than the index of every dominant term. Since
s_{n_1} is not dominant, there must exist n_2 > n_1 such that s_{n_2} ≥ s_{n_1}. Suppose that
n_3, . . . , n_{k−1} have been selected so
that n_3 < n_4 < . . . < n_{k−1} and s_{n_1} ≤ s_{n_2} ≤ s_{n_3} ≤ . . . ≤ s_{n_{k−1}}. As before, since
s_{n_{k−1}} is not dominant, there exists n_k > n_{k−1} such that s_{n_k} ≥ s_{n_{k−1}}.
By induction we have constructed an increasing subsequence of {s_n}_n.
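The notion of dominant term is easy to experiment with on a finite list. The sketch below (the function name is ours) returns the positions of the dominant terms of a finite list, with indices starting from 0 as is customary in Python. It only illustrates the first case of the proof, since in a finite list the last term is always dominant.

```python
def dominant_indices(s):
    """Indices n such that s[m] < s[n] for every index m > n of the finite list s."""
    result = []
    best_later = None  # largest term to the right of the current position
    for n in range(len(s) - 1, -1, -1):
        if best_later is None or s[n] > best_later:
            result.append(n)
        best_later = s[n] if best_later is None else max(best_later, s[n])
    return result[::-1]
```

The terms sitting at the returned positions always form a strictly decreasing list, in agreement with the argument above.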
Remark 5.1 Theorem 5.12 is an ingenious trick that essentially reduces the theory
of sequences in R to the theory of monotone sequences. Unfortunately this trick relies
on the order property of the real numbers, and cannot be generalized to metric or
topological spaces.
We now prove the basic characterization of compact subsets of the real line.
Theorem 5.13 A set .K ⊂ R is sequentially compact if and only if it is closed and
bounded.
Proof Suppose K is sequentially compact. If K is not bounded, then there exists
x1 ∈ K such that .|x1 | > 1. Likewise, there exists .x2 ∈ K such that .|x2 | > 2.
.
In general, for each positive integer n there exists .xn ∈ K such that .|xn | > n.
From this we deduce that no converging subsequence of .{xn }n may exist. Indeed,
any converging subsequence .{xnk }k must be bounded by Theorem 5.5. But this
contradicts the fact that .|xnk | > k for each k. Hence K is bounded.
To prove that K is closed, we show that K contains its accumulation points. So,
let .{xn }n be a sequence in K that converges to a point x of .R. We have to show
that .x ∈ K. Since K is sequentially compact, there exists a subsequence .{xnk }k that
converges to a point .y ∈ K. As we have remarked above, there results .y = x since
.{xnk }k is a subsequence of .{xn }n . In particular .x ∈ K, and K is a closed set.
Conversely, suppose that K is closed and bounded, and let .{xn }n be a sequence
in K. Since K is bounded, by Bolzano-Weierstrass there exists a subsequence that
converges to a point x of .R. Since K is closed, .x ∈ K, and thus K is sequentially
compact. The proof is complete.
The definition of limit (Definition 5.7) has a big weakness: we can check if a
number is the limit of a sequence, but we need to have a good candidate at our
disposal. In the framework of real numbers with the Euclidean metric, the existence
of a limit can be ensured by a condition that only involves the terms of the sequence.
Definition 5.9 We say that a sequence .{sn }n of real numbers is a Cauchy sequence
if and only if for every .ε > 0 there exists a positive integer .n0 such that .n > n0 ,
.m > n0 imply .d(sn , sm ) < ε.
For each .n > n0 we have .|sn − L| ≤ |sn − snk0 | + |snk0 − L| < 2ε, and the proof is
complete.
Exercise 5.6 Suppose that {s_n}_n is a sequence of real numbers such that |s_{n+1} −
s_n| < 2^{−n} for each n. Prove that {s_n}_n is a Cauchy sequence. Is this result true if we
suppose that |s_{n+1} − s_n| < 1/n for each n?
We complete this discussion with a few words about divergent sequences.
Definition 5.10 A sequence .{sn }n is divergent if it is not convergent.
This definition is more appreciated if one thinks that sequences may be more
general than sequences of real numbers. In Calculus courses, the term divergent
is usually associated to sequences that “have an infinite limit”. When we think of
sequences of terms in a general set, the meaning of infinity becomes hard to define,
if not impossible at all.
Definition 5.11 Let $s = \{s_n\}_n$ be a sequence of real numbers. We say that s diverges
to $+\infty$ if for every $\varepsilon > 0$ there exists a positive integer $n_0$ such that $n > n_0$ implies
$s_n > 1/\varepsilon$. In this case we write $\lim_{n\to+\infty} s_n = +\infty$, or simply $s_n \to +\infty$ as
$n \to +\infty$.
Similarly, we say that s diverges to $-\infty$ if for every $\varepsilon > 0$ there exists a positive
integer $n_0$ such that $n > n_0$ implies $s_n < -1/\varepsilon$. In this case we write
$\lim_{n\to+\infty} s_n = -\infty$, or simply $s_n \to -\infty$ as $n \to +\infty$.
We remark that the symbols $+\infty$ and $-\infty$ do not represent elements of a numeric
set. We will see that we could extend the set $\mathbb{R}$ by adding them in such a way that the
previous definition becomes a particular case of a general definition of limit for
sequences.
Theorem 5.15 For a sequence $\{s_n\}_n$ of positive real numbers, we have
$$\lim_{n\to+\infty} s_n = +\infty$$
if and only if
$$\lim_{n\to+\infty} \frac{1}{s_n} = 0.$$
Proof Let us suppose that our sequence diverges to .+∞. Let .ε > 0 and .M = 1/ε.
By assumption there exists a positive integer N such that .n > N implies .sn > M.
Therefore .n > N implies .ε > 1/sn > 0. This proves that .1/sn converges to zero.
58 5 Distance, Topology and Sequences on the Set of Real Numbers
Conversely, let .M > 0 and .ε = 1/M. Since .1/sn → 0, there exists a positive
integer N such that .n > N implies .1/sn < ε, or .sn > M. This concludes the proof.
Exercise 5.7 Suppose that there exists a positive integer .N0 such that .sn ≤ tn for
each .n > N0 .
(a) Prove that if .limn→+∞ sn = +∞, then .limn→+∞ tn = +∞.
(b) Prove that if .limn→+∞ tn = −∞, then .limn→+∞ sn = −∞.
(c) Prove that if .limn→+∞ sn and .limn→+∞ tn exist, then .limn→+∞ sn ≤
limn→+∞ tn .
Exercise 5.8 Suppose that $s_n \neq 0$ for each n, and that
$L = \lim_{n\to+\infty} \left| \frac{s_{n+1}}{s_n} \right|$
exists.
(a) Show that if .L < 1 then .limn→+∞ sn = 0. Hint: fix a number a such that
.L < a < 1, and obtain a positive integer N such that .|sn+1 | < a|sn | for each
We collect a few statements that follow from elementary estimates based on the
Binomial Theorem. The reader is invited to appreciate the proofs.
Proposition 5.4 If $p > 0$ is a real number, then
$$\lim_{n\to+\infty} \frac{1}{n^p} = 0.$$
Proof Given $\varepsilon > 0$, just take $n > (1/\varepsilon)^{1/p}$. This is possible by the Archimedean
property of $\mathbb{R}$.
Proposition 5.5 If $p > 0$ is a real number, then
$$\lim_{n\to+\infty} \sqrt[n]{p} = 1.$$
Proof If $p > 1$, put $x_n = \sqrt[n]{p} - 1$. Clearly $x_n > 0$ and Theorem 3.8 yields
$1 + n x_n \le (1 + x_n)^n = p$. Hence $0 < x_n \le (p-1)/n$, and the conclusion follows by squeezing.
If $p = 1$ the conclusion is trivial. If $0 < p < 1$, we set $q = 1/p > 1$ and the claim
reduces to the previous case, since $\sqrt[n]{p} = 1/\sqrt[n]{q}$.
5.3 Lower and Upper Limits 59
Proposition 5.6
$$\lim_{n\to+\infty} \sqrt[n]{n} = 1.$$
Proof Let $x_n = \sqrt[n]{n} - 1$, so that $x_n \ge 0$. By Theorem 3.8
$$n = (1 + x_n)^n \ge \frac{n(n-1)}{2}\, x_n^2.$$
Hence
$$0 \le x_n \le \sqrt{\frac{2}{n-1}}$$
$$\lim_{n\to+\infty} \frac{n^\alpha}{(1+p)^n} = 0.$$
$$(1+p)^n > \binom{n}{k} p^k = \frac{n(n-1)\cdots(n-k+1)}{k!}\, p^k > \frac{n^k p^k}{2^k k!}.$$
Hence
$$0 < \frac{n^\alpha}{(1+p)^n} < \frac{2^k k!}{p^k}\, n^{\alpha-k}$$
for $n > 2k$. Since $\alpha - k < 0$, we conclude by squeezing and Proposition 5.4.
Proposition 5.8 If x is a real number and $-1 < x < 1$, then
$$\lim_{n\to+\infty} x^n = 0.$$
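The limits collected above can be sanity-checked in floating point. The sketch below is only an illustration: the sample values ($p = 5$, $\alpha = 3$, growth rate $0.5$, $x = -0.9$) and the helper name `nth_root_gap` are assumptions made here, and a numerical check at one index says nothing about the limit itself.

```python
import math

def nth_root_gap(value, n):
    """Return value**(1/n) - 1, the quantity called x_n in the proofs."""
    return value ** (1.0 / n) - 1.0

# Sample values, chosen only for illustration.
p, n = 5.0, 10_000
gap_p = nth_root_gap(p, n)           # Proposition 5.5: 0 < x_n <= (p - 1)/n
gap_n = nth_root_gap(float(n), n)    # Proposition 5.6: 0 <= x_n <= sqrt(2/(n - 1))

alpha, growth, m = 3.0, 0.5, 200
ratio = m**alpha / (1.0 + growth) ** m   # n^alpha / (1 + p)^n evaluated at n = m

x = -0.9
power = x**500                       # Proposition 5.8 with x = -0.9

print(gap_p, gap_n, ratio, power)
```

The two gaps respect the bounds obtained in the proofs, and the last two quantities are already far below any reasonable tolerance at moderate indices.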
We now try to describe the loss of convergence for real sequences. The question is:
why can a sequence be divergent?
Definition 5.12 Let $\{a_n\}_n$ be a sequence of real numbers. If it is not bounded above,
we declare that
$$\limsup_{n\to+\infty} a_n = +\infty.$$
We say that a number M is an eventual upper bound [resp. lower bound] for the
sequence if there exists a positive integer $\nu$ such that $a_n \le M$ [resp. $a_n \ge M$]
for each $n \ge \nu$. The limsup of the sequence $\{a_n\}_n$ is the infimum of the set $\mathcal{M}$ of
eventual upper bounds:
$$\limsup_{n\to+\infty} a_n = \inf \mathcal{M}.$$
In a similar fashion, the liminf of $\{a_n\}_n$ is the supremum of the set $\mathcal{N}$ of eventual
lower bounds:
$$\liminf_{n\to+\infty} a_n = \sup \mathcal{N}.$$
Plainly $\liminf_{n\to+\infty} a_n \le \limsup_{n\to+\infty} a_n$ in any case. The inequality can be
strict: if $a_n = (-1)^n$, then $\liminf_{n\to+\infty} a_n = -1$ and $\limsup_{n\to+\infty} a_n = 1$.
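Theorem 5.16 The sequence $\{a_n\}_n$ converges to a limit L if and only if
$$\liminf_{n\to+\infty} a_n = \limsup_{n\to+\infty} a_n = L.$$
Proof The cases $L \in \{-\infty, +\infty\}$ are clear by the initial definition. We now
focus on the case $L \in \mathbb{R}$. Assume that $a_n \to L$: given $\varepsilon > 0$, there exists
a positive integer $\nu$ such that $L - \varepsilon < a_n < L + \varepsilon$ for each $n > \nu$. Hence
$L + \varepsilon \in \mathcal{M}$, and $\limsup_{n\to+\infty} a_n \le L + \varepsilon$ by definition of infimum. Similarly
$L - \varepsilon \in \mathcal{N}$, and $\liminf_{n\to+\infty} a_n \ge L - \varepsilon$ by definition of supremum. Since $\varepsilon$ is
arbitrary and the liminf never exceeds the limsup, we conclude that
$\liminf_{n\to+\infty} a_n = \limsup_{n\to+\infty} a_n = L$.
Conversely, assume that $\liminf_{n\to+\infty} a_n = L = \limsup_{n\to+\infty} a_n$. Fix any $\varepsilon > 0$.
We know that $L + \varepsilon \in \mathcal{M}$ and $L - \varepsilon \in \mathcal{N}$. Hence there exists a positive integer $\nu$
such that for each $n > \nu$ we must have $L - \varepsilon \le a_n \le L + \varepsilon$. Therefore $a_n \to L$,
and the proof is complete.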
Exercise 5.9 Prove that the sequence $\{(-1)^n\}_n$ does not converge.
We provide a useful characterization of liminf and limsup. Sometimes this is
taken as a definition.
Proof The two statements are similar, and we prove the second one. We
set $\lambda_n = \inf_{k \ge n} a_k$, and remark that $\lambda_{n+1} = \inf\{a_{n+1}, a_{n+2}, \ldots\} \ge
\inf\{a_n, a_{n+1}, a_{n+2}, \ldots\} = \lambda_n$. The sequence $n \mapsto \lambda_n$ is thus increasing.
Furthermore, $a_k \ge \lambda_n$ for each $k \ge n$, so that $\lambda_n$ is an eventual lower bound.
By definition, $\liminf_{n\to+\infty} a_n \ge \lambda_n$ and finally $\liminf_{n\to+\infty} a_n \ge \sup_n \lambda_n$. We
need to prove the opposite inequality.
Fix any eventual lower bound $\ell$: there exists a positive integer $\nu$ such that $a_k \ge \ell$
for each $k \ge \nu$. Hence $\ell \le \inf_{k \ge \nu} a_k = \lambda_\nu \le \sup_n \lambda_n$. The element $\ell \in \mathcal{N}$ is
arbitrary, and so $\liminf_{n\to+\infty} a_n \le \sup_n \lambda_n$.
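The quantities $\lambda_n = \inf_{k \ge n} a_k$ used in the proof, and their counterparts $\sup_{k \ge n} a_k$, can be tabulated for a finite stretch of a sequence. The sketch below is an illustration under a clear assumption: the tails are truncated at a finite index N, so the values only approximate the true infima and suprema. The sample sequence $a_k = (-1)^k (1 + 1/k)$ and the helper name `tail_inf_sup` are choices made here.

```python
def tail_inf_sup(terms):
    """For a finite list a_1..a_N return (lambdas, mus), where
    lambdas[i] = min(terms[i:]) and mus[i] = max(terms[i:])."""
    n = len(terms)
    lambdas, mus = [0.0] * n, [0.0] * n
    lo, hi = float("inf"), float("-inf")
    for i in range(n - 1, -1, -1):   # sweep from the last term backwards
        lo, hi = min(lo, terms[i]), max(hi, terms[i])
        lambdas[i], mus[i] = lo, hi
    return lambdas, mus

N = 2000  # truncation index: an arbitrary choice for this illustration
a = [(-1) ** k * (1 + 1 / k) for k in range(1, N + 1)]
lambdas, mus = tail_inf_sup(a)

# Values in the middle of the window, away from the truncation point.
print(lambdas[N // 2], mus[N // 2])
```

The list `lambdas` is increasing and `mus` is decreasing, as in the proof; for this sample sequence they approach $-1$ and $1$ respectively.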
Theorem 5.18 (Monotonicity of liminf and limsup) If $a_n \le b_n$ for each n, then
$\liminf_{n\to+\infty} a_n \le \liminf_{n\to+\infty} b_n$ and $\limsup_{n\to+\infty} a_n \le \limsup_{n\to+\infty} b_n$.
and thus
$$\frac{a_{n+1}}{a_n} < L + \varepsilon.$$
Hence
5.4 Problems
5.1 Let $\{s_n\}_n$ be a sequence of real numbers. The arithmetic means $\sigma_n$ are defined by
$$\sigma_n = \frac{s_0 + s_1 + \cdots + s_n}{n+1}.$$
$$s_n - \sigma_n = \frac{1}{n+1} \sum_{k=1}^{n} k a_k.$$
Suppose that $\lim_{n\to+\infty} n a_n = 0$ and that $\{\sigma_n\}_n$ converges. Prove that $\{s_n\}_n$
converges.
4. Prove the same statement as in 3. under the weaker assumption that $\{n a_n\}_n$
is bounded. As a hint, you may use the following approach. Suppose that
$\lim_{n\to+\infty} \sigma_n = \sigma$ and that $|n a_n| \le M$ for each n. If $m < n$, then
$$s_n - \sigma_n = \frac{m+1}{n-m}\,(\sigma_n - \sigma_m) + \frac{1}{n-m} \sum_{j=m+1}^{n} (s_n - s_j).$$
$$|s_n - s_j| \le \frac{(n-j)M}{j+1} \le \frac{(n-m-1)M}{m+2}.$$
For each $\varepsilon > 0$ and each positive integer n, let m be the positive integer such that
$$m \le \frac{n - \varepsilon}{1 + \varepsilon} < m + 1.$$
5.2 Let $b > 0$ be a given real number. Choose any real number $x_1 > \sqrt{b}$, and
define recursively
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{b}{x_n}\right).$$
1. Prove that the sequence $\{x_n\}_n$ is decreasing, and that $\lim_{n\to+\infty} x_n = \sqrt{b}$.
2. Define $\varepsilon_n = x_n - \sqrt{b}$, and prove that
$$\varepsilon_{n+1} = \frac{\varepsilon_n^2}{2x_n} < \frac{\varepsilon_n^2}{2\sqrt{b}}.$$
Setting $\beta = 2\sqrt{b}$, deduce that
$$\varepsilon_{n+1} < \beta \left(\frac{\varepsilon_1}{\beta}\right)^{2^n}$$
for n = 1, 2, 3, . . .
This problem describes a numerical algorithm for computing the square root of a
given number.
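The recursion of Problem 5.2 is easy to run. The following sketch assumes sample values $b = 2$ and $x_1 = 2$ (any $x_1 > \sqrt{b}$ would do) and a hypothetical helper name `babylonian_sqrt`; it works in floating point, so only a few steps are meaningful before rounding errors dominate. It is meant as a companion to the exercise, not as a solution.

```python
import math

def babylonian_sqrt(b, x1, steps):
    """Iterate x_{n+1} = (x_n + b / x_n) / 2 and return [x_1, ..., x_{steps+1}]."""
    xs = [x1]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] + b / xs[-1]))
    return xs

b, x1 = 2.0, 2.0                 # sample values with x1 > sqrt(b)
xs = babylonian_sqrt(b, x1, 4)   # x_1, ..., x_5
root = math.sqrt(b)
beta = 2.0 * root

# errors[n - 1] is eps_n = x_n - sqrt(b).
errors = [x - root for x in xs]
# bounds[n - 1] is beta * (eps_1 / beta)**(2**n), the claimed bound on eps_{n+1}.
bounds = [beta * (errors[0] / beta) ** (2**n) for n in range(1, 5)]

for n in range(1, 5):
    print(n + 1, xs[n], errors[n], bounds[n - 1])
```

Each step roughly squares the error, which is why four iterations already agree with `math.sqrt` to about eleven decimal places in this run.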
5.3 Let $b > 1$ be a given real number. Choose any real number $x_1 > \sqrt{b}$, and
define recursively
$$x_{n+1} = \frac{b + x_n}{1 + x_n} = x_n + \frac{b - x_n^2}{1 + x_n}.$$
$$s_1 = 0, \qquad s_{2n} = \frac{s_{2n-1}}{2}, \qquad s_{2n+1} = \frac{1}{2} + s_{2n}.$$
5.5 Evaluate
$$\lim_{n\to+\infty} \sum_{k=1}^{n} \frac{1}{\sqrt{n^2 + k}}.$$
Hint: prove that the sum is smaller than 1 and larger than $n/\sqrt{n^2 + n}$.
5.6 We define the sequence of Fibonacci numbers recursively by $u_0 = 0$, $u_1 = 1$
and
$$u_{n+2} = u_n + u_{n+1}.$$
5.5 Comments
Elementary Calculus can be introduced in a way that hides the topological (metric)
structure of the set $\mathbb{R}$. This approach, in my opinion, is too radical: the passage from
analysis in $\mathbb{R}$ to analysis in $\mathbb{R}^N$, $N \ge 2$, requires arguments that I prefer to introduce
from the very beginning. The topology of the real line coincides with the order topology
induced by the usual ordering $\le$, but this is false in higher dimension. The fact that
intervals are basic examples of open sets, convex sets, connected sets, and relatively
compact sets should be seen both as a positive and a negative feature of $\mathbb{R}$.
The book [1] is a wonderful example of modern treatment of Calculus and
Analysis based on topological tools. In my opinion, this book remains a masterwork
in its field, although students may need to work hard before appreciating it.
Reference
Abstract Series are just a special type of sequences. The main feature of numerical
series is that they lead us to finding convergence theorems which do not involve the
value of the limit.
$$\sum_{n=p}^{q} a_n$$
to denote the finite sum $a_p + a_{p+1} + \cdots + a_{q-1} + a_q$. We use the sequence a to
construct a new sequence $s = \{s_n\}_n$ by means of the formula
$$s_n = \sum_{k=1}^{n} a_k.$$
In a really formal world, a series should be defined as an ordered pair $(a, s)$
such that a is a sequence, s is a sequence, and $s_n = \sum_{k=1}^{n} a_k$ for each $n \in \mathbb{N}$.
Remark 6.1 The language about series is very imprecise. In a completely rigorous
world, we should probably remove the word series and continue to use the word
sequence, as in
consider the sequence $\left\{ \sum_{k=1}^{n} \frac{k}{k^2+1} \right\}_n$.
Furthermore, several mathematicians interpret $\sum_{n=1}^{\infty} a_n$ as $\lim_{N\to+\infty} \sum_{n=1}^{N} a_n$,
which is either a real number or a symbol of infinity. Despite these difficulties,
tradition rules, and in this chapter we will freely abuse language and denote a
series with the symbol $\sum_n a_n$.
Definition 6.1 We say that the series $\sum_{n=1}^{\infty} a_n$ converges to s if $\lim_{n\to+\infty} s_n = s$.
In this case, we will often say that s is the sum of the series.
Remark 6.2 It should be clear that sequences and series are the same object.
Indeed, series are sequences by definition. Conversely, the sequence $\{a_n\}_n$ can be
recovered from the sequence $\{s_n\}_n$ by writing $a_n = s_n - s_{n-1}$. Of course this logical
equivalence is not a good reason to forget about numerical series altogether.
We will often write $\sum_n a_n$ or even $\sum a_n$ to denote a series. We agree that the
first index of the sum may also be different from 1, as in $\sum_{n=7}^{\infty} a_n$. Clearly, the
convergence of a series does not depend on the first terms that we add or discard:
remember that the character of a sequence is not altered by the modification of
finitely many terms.
Example 6.1 Let us consider the series
$$\sum_{n=2}^{\infty} \frac{1}{n(n-1)}.$$
Since
$$\frac{1}{n(n-1)} = \frac{1}{n-1} - \frac{1}{n},$$
we see that
$$s_n = \sum_{k=2}^{n} \frac{1}{k(k-1)} = 1 - \frac{1}{n} \to 1$$
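The telescoping identity of Example 6.1 can be confirmed with exact rational arithmetic. This is a minimal sketch; the function name `telescoping_partial_sum` is hypothetical, and checking finitely many indices is of course no substitute for the algebraic argument.

```python
from fractions import Fraction

def telescoping_partial_sum(n):
    """Exact value of s_n = sum_{k=2}^{n} 1 / (k (k - 1))."""
    return sum(Fraction(1, k * (k - 1)) for k in range(2, n + 1))

for n in (2, 5, 10, 100):
    # The second column is the closed form 1 - 1/n.
    print(n, telescoping_partial_sum(n), 1 - Fraction(1, n))
```

Both columns agree for every n, because all the intermediate terms $1/(k-1) - 1/k$ cancel in pairs.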
$$\left| \sum_{k=n}^{m} a_k \right| < \varepsilon$$
Theorem 6.2 Suppose that $a_n \ge 0$ for each n. The series $\sum a_n$ converges if and
only if its partial sums form a bounded sequence.
Proof For a series of non-negative terms, we clearly have
$$s_{n+1} = s_n + a_{n+1} \ge s_n$$
70 6 Series
for every n. In other words, the sequence of partial sums is increasing. The
conclusion follows from Theorem 5.7.
The most important test of convergence is based on comparison. We will see that
actually all convergence tests are based on some comparison argument.
Theorem 6.3 (Comparison Test)
(a) If $|a_n| \le c_n$ for $n \ge N_0$, where $N_0$ is some fixed positive integer, and if $\sum c_n$
converges, then $\sum a_n$ converges as well.
(b) If $a_n \ge d_n \ge 0$ for $n \ge N_0$, and if $\sum d_n$ diverges, then $\sum a_n$ diverges as well.
Proof
(a) Given $\varepsilon > 0$, there exists a positive integer $n_0 \ge N_0$ such that $m \ge n > n_0$
implies $\sum_{k=n}^{m} c_k < \varepsilon$. Hence
$\left| \sum_{k=n}^{m} a_k \right| \le \sum_{k=n}^{m} |a_k| \le \sum_{k=n}^{m} c_k < \varepsilon$.
(b) If $\sum a_n$ converges, by (a) $\sum d_n$ converges. Contradiction.
An important corollary is described in the next result.
Theorem 6.4 (Asymptotic Comparison Test) Let $\sum a_n$ and $\sum b_n$ be series of
positive terms, and suppose that
$$\lim_{n\to+\infty} \frac{a_n}{b_n} = 1.$$
The series $\sum a_n$ converges if and only if $\sum b_n$ converges.
Proof Indeed, there exists a positive integer $N_0$ such that $1/2 < a_n/b_n < 3/2$ for
every $n > N_0$. Hence $\frac{b_n}{2} < a_n < \frac{3}{2} b_n$ for $n > N_0$. The conclusion follows from the
Comparison test.
Example 6.3 The series
$$\sum_{n=1}^{\infty} \frac{1}{n^2}$$
converges. Indeed,
$$\frac{1}{n^2} \le \frac{1}{n(n-1)}$$
leading us to the following definition via the Cauchy condition for convergence.
Definition 6.2 (Absolute Convergence) We say that the series $\sum a_n$ converges
absolutely, if the series $\sum |a_n|$ is convergent.
An easy but not trivial consequence of (6.1) is the next result.
Theorem 6.5 Every absolutely convergent series is convergent.
Proof Let $\sum a_n$ be an absolutely convergent series. By (6.1), the series $\sum a_n$
satisfies the Cauchy condition, and is therefore convergent.
The converse is false, as Exercise 6.4 shows.
Theorem 6.2 says that series of positive terms are somehow easier to deal with, since
no oscillation phenomenon can arise. In this section we develop several convergence
tests for positive series, i.e. series of positive terms.
$$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}.$$
Indeed
$$(1 - x)(1 + x + x^2 + \cdots + x^n) = 1 + x - x + x^2 - x^2 + \cdots + x^n - x^n - x^{n+1} = 1 - x^{n+1}.$$
$$(1 - x)(1 + x + x^2 + \cdots + x^n) = 1 - x^{n+1}$$
by induction.
The following test is usually a difficult one for students. It states a rather
surprising fact: under a monotonicity assumption, only those terms of a very
particular subsequence decide whether a series converges.
Theorem 6.7 (Condensation Test) Suppose that $a_1 \ge a_2 \ge a_3 \ge \ldots \ge 0$. The
series $\sum_{n=1}^{\infty} a_n$ is convergent if and only if the series $\sum_{k=0}^{\infty} 2^k a_{2^k}$ is convergent.
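Proof It suffices to prove that the partial sums of the two series are simultaneously
bounded from above. Set
$$s_n = a_1 + \cdots + a_n, \qquad t_k = a_1 + 2a_2 + \cdots + 2^k a_{2^k}.$$
If $n < 2^k$, then
$$s_n \le a_1 + (a_2 + a_3) + \cdots + (a_{2^k} + \cdots + a_{2^{k+1}-1}) \le a_1 + 2a_2 + \cdots + 2^k a_{2^k} = t_k$$
by the monotonicity of $\{a_n\}_n$. Notice that we have grouped terms in blocks that
begin with a power of 2 and end one step before the subsequent power of 2. We
deduce that $s_n \le t_k$.
On the other hand, if $2^k < n$, we group terms in a different way:
$$s_n \ge a_1 + a_2 + (a_3 + a_4) + \cdots + (a_{2^{k-1}+1} + \cdots + a_{2^k}) \ge \frac{1}{2} a_1 + a_2 + 2a_4 + \cdots + 2^{k-1} a_{2^k} = \frac{1}{2} t_k.$$
In this case, $t_k \le 2 s_n$. In any case the sequences $\{s_n\}_n$ and $\{t_k\}_k$ are both bounded or
unbounded above, and the proof is complete.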
Example 6.4 As a fundamental application, we consider the generalized harmonic
series
$$\sum_{n=1}^{\infty} \frac{1}{n^p},$$
where p is a fixed real number. Clearly $p \le 0$ implies divergence of the series, since
the general term does not converge to zero. For $p > 0$ we use the condensation test,
and look at the series
$$\sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^p} = \sum_{k=0}^{\infty} 2^{(1-p)k}.$$
This is a geometric series, and we know that the latter series converges if and only
if $2^{1-p} < 1$, i.e. $p > 1$.
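The comparison between the partial sums of $\sum 1/n^p$ and those of the condensed series can be observed numerically. The sketch below is only an illustration in floating point: the helper names `partial_sum` and `condensed_sum`, the exponents tested and the index $k = 10$ are assumptions made here. It checks the two-sided estimate $s_n \le t_k$ for $n < 2^k$ and $t_k \le 2 s_n$ for $n > 2^k$.

```python
def partial_sum(p, n):
    """s_n = sum_{j=1}^{n} 1 / j**p."""
    return sum(1.0 / j**p for j in range(1, n + 1))

def condensed_sum(p, k):
    """t_k = sum_{i=0}^{k} 2**i / (2**i)**p = sum_{i=0}^{k} 2**((1 - p) * i)."""
    return sum(2.0 ** ((1.0 - p) * i) for i in range(k + 1))

k = 10
rows = []
for p in (0.5, 1.0, 2.0):
    s_below = partial_sum(p, 2**k - 1)   # an index n < 2^k
    s_above = partial_sum(p, 2**k + 1)   # an index n > 2^k
    t_k = condensed_sum(p, k)
    rows.append((p, s_below, t_k, 2 * s_above))
    print(p, s_below, t_k, 2 * s_above)
```

In each row the three numbers appear in increasing order. For $p = 2$ the condensed sums stay below 2, while for $p \le 1$ they keep growing with k, in agreement with the example.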
We propose the following tests for historical reasons. They are based on a
comparison with a geometric series, and we will comment on the weakness of these
tests after the proof.
Theorem 6.8 (Root and Ratio Tests) The series $\sum a_n$
(a) converges, if $\limsup_{n\to+\infty} \sqrt[n]{|a_n|} < 1$;
(b) diverges, if $\limsup_{n\to+\infty} \sqrt[n]{|a_n|} > 1$;
(c) converges, if $\limsup_{n\to+\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1$;
(d) diverges, if $\left| \frac{a_{n+1}}{a_n} \right| \ge 1$ for each $n \ge n_0$, where $n_0$ is some fixed positive integer.
Proof Put $\alpha = \limsup_{n\to+\infty} \sqrt[n]{|a_n|}$. If $\alpha < 1$, we can choose $\beta$ such that $\alpha < \beta < 1$,
and a positive integer N such that $\sqrt[n]{|a_n|} < \beta$ for each $n \ge N$. Hence $n \ge N$
implies $|a_n| < \beta^n$. Since $\beta < 1$, the comparison test leads to (a).
If $\alpha > 1$, then $\sqrt[n]{|a_n|} > 1$ for infinitely many indices n (otherwise 1 would be an
eventual upper bound). This prevents $a_n$ from converging to 0 as $n \to +\infty$, and the
series $\sum a_n$ is divergent. This proves (b).
Suppose that $\limsup_{n\to+\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1$: we can find $\beta < 1$ and a positive integer
N such that $\left| \frac{a_{n+1}}{a_n} \right| < \beta$ for each $n \ge N$. In particular
$$|a_n| \le |a_N|\, \beta^{-N} \beta^n$$
for each $n \ge N$. Again (c) follows from the comparison theorem. Finally, if $|a_{n+1}| \ge
|a_n|$ for $n \ge n_0$, then the condition $a_n \to 0$ fails, and the series $\sum a_n$ is divergent.
The root and the ratio tests are popular but weak. We know that the series $\sum \frac{1}{n}$
diverges while $\sum \frac{1}{n^2}$ converges. For both series the ratio and the root tests are
inconclusive, since the limsup equals 1.
Remark 6.5 It follows from Theorem 5.19 that the root test is stronger than the ratio
test. In particular, if the root test is inconclusive, the ratio test must be inconclusive
as well.
Example 6.5 Consider the series $\sum_n \frac{n}{n^2+3}$. If we put $a_n = \frac{n}{n^2+3}$, there results
$$\frac{a_{n+1}}{a_n} = \frac{n+1}{n} \cdot \frac{n^2+3}{n^2+2n+4}.$$
We deduce that $\lim_{n\to+\infty} |a_{n+1}/a_n| = 1$. Similarly, $\lim_{n\to+\infty} \sqrt[n]{|a_n|} = 1$. The
root test and the ratio test are inconclusive, although the series is divergent by
comparison:
$$a_n \ge \frac{n}{n^2 + 3n^2} = \frac{1}{4n}.$$
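A quick numerical look at Example 6.5 shows both test quantities drifting towards 1 while the partial sums keep growing. This is an illustrative sketch in floating point; the index $10^6$ and the cutoff 10000 are arbitrary choices made here.

```python
def a(n):
    """General term a_n = n / (n^2 + 3) of Example 6.5."""
    return n / (n * n + 3)

n = 10**6
ratio = a(n + 1) / a(n)        # quantity used by the ratio test
root = a(n) ** (1.0 / n)       # quantity used by the root test

cutoff = 10_000
partial = sum(a(k) for k in range(1, cutoff + 1))
lower = sum(1 / (4 * k) for k in range(1, cutoff + 1))   # comparison series

print(ratio, root, partial, lower)
```

Both `ratio` and `root` are within a small distance of 1, so neither test decides, whereas the partial sum dominates the corresponding partial sum of $\sum 1/(4n)$, as the comparison in the example predicts.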
The typical Calculus approach to the definition of the number e is via the
“fundamental limit”
$$\lim_{n\to+\infty} \left(1 + \frac{1}{n}\right)^n.$$
6.2 Euler’s Number as the Sum of a Series 75
Unfortunately the existence of this limit is not straightforward. In the next theorem
we propose a different approach.
Theorem 6.9 (The Euler Number) The series $\sum_{n=0}^{\infty} \frac{1}{n!}$ converges to a limit that is
denoted by e and called the Euler number. Furthermore, $e = \lim_{n\to+\infty} \left(1 + \frac{1}{n}\right)^n$.
Proof Recall that $0! = 1$ and, for any positive integer n, the factorial of n is defined
as $n! = 1 \cdot 2 \cdots (n-1)\, n$. Since
$$s_n = 1 + \frac{1}{1} + \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} + \cdots + \frac{1}{1 \cdot 2 \cdots n} < 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < 3,$$
the series $\sum_{n=0}^{\infty} \frac{1}{n!}$ converges to a limit $e < 3$. To prove the second part, we
introduce the sequences
$$s_n = \sum_{k=0}^{n} \frac{1}{k!}, \qquad t_n = \left(1 + \frac{1}{n}\right)^n.$$
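$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\, a^{n-k} b^k$$
yields
$$t_n = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{n-1}{n}\right).$$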
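$$t_n \ge 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \cdots + \frac{1}{m!}\left(1 - \frac{1}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right),$$
so that $s_m \le \liminf_{n\to+\infty} t_n$ for any m. Letting $m \to +\infty$, $e \le \liminf_{n\to+\infty} t_n$,
and the proof is complete.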
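The two sequences of the proof can be compared exactly with rational arithmetic. The sketch below is an illustration, not part of the argument: the function names `s` and `t` mirror the symbols $s_n$ and $t_n$ of the proof, and the final comparison with `math.e` relies on floating point.

```python
import math
from fractions import Fraction
from math import factorial

def s(n):
    """Partial sum s_n = sum_{k=0}^{n} 1/k! as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

def t(n):
    """t_n = (1 + 1/n)^n as an exact fraction."""
    return (1 + Fraction(1, n)) ** n

for n in (1, 2, 5, 10, 20):
    print(n, float(t(n)), float(s(n)))

print(math.e - float(s(20)), math.e - float(t(1000)))
```

The output shows $t_n \le s_n < 3$ at every index tried. It also shows how differently the two sequences behave in practice: twenty terms of the series already match `math.e` to machine precision, while $t_{1000}$ is still off in the third decimal place.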
The definition $e = \sum_{n=0}^{\infty} \frac{1}{n!}$ is rather flexible, and allows us to derive a theoretical
property of the Euler number.
Theorem 6.10 The number e is irrational.
Proof We begin with an estimate of the convergence of the series $\sum 1/n!$ to e.
Letting $s_n$ denote the n-th partial sum of this series, we have
$$e - s_n = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \cdots < \frac{1}{(n+1)!}\left(1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right) = \frac{1}{n!\,n}.$$
Therefore $0 < e - s_n < \frac{1}{n!\,n}$ for each positive integer n. Now suppose that $e = p/q$ is
a rational number, where p and q are positive integers. Then $0 < q!(e - s_q) < 1/q$.
The number $q!\,e$ must be an integer, since $e = p/q$. Also
$$q!\,s_q = q!\left(1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{q!}\right)$$
The reader should suspect that a complete analysis of series whose terms do not have
constant sign is out of reach. In this section we focus our attention on a particular
class of series of variable sign. We begin with a general result which reminds us of
the popular formula of integration by parts.
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p.$$
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q} (A_n - A_{n-1})\, b_n = \sum_{n=p}^{q} A_n b_n - \sum_{n=p-1}^{q-1} A_n b_{n+1}.$$
6.3 Alternating Series 77
The last difference is equal to $\sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p$, and the
proof is complete.
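The summation by parts formula can be tested on concrete data. The sketch below makes two assumptions that are not visible in the text above: the sequences are indexed from 0, and $A_{-1} = 0$ by convention, with $A_n = a_0 + \cdots + a_n$. The function name `summation_by_parts_sides` and the sample sequences are hypothetical choices for the illustration.

```python
from fractions import Fraction

def summation_by_parts_sides(a, b, p, q):
    """Return (lhs, rhs) of the summation by parts identity for 0 <= p <= q."""
    # A[n] = a_0 + ... + a_n; the convention A_{-1} = 0 is assumed here.
    A, total = [], 0
    for term in a:
        total += term
        A.append(total)
    A_prev = A[p - 1] if p >= 1 else 0
    lhs = sum(a[n] * b[n] for n in range(p, q + 1))
    rhs = (sum(A[n] * (b[n] - b[n + 1]) for n in range(p, q))
           + A[q] * b[q] - A_prev * b[p])
    return lhs, rhs

size = 12
a = [(-1) ** n for n in range(size)]           # bounded partial sums
b = [Fraction(1, n + 1) for n in range(size)]  # decreasing to zero

lhs, rhs = summation_by_parts_sides(a, b, 3, 9)
print(lhs, rhs)
```

With exact fractions both sides coincide for every admissible pair of indices, which is a useful check on the signs and on the ranges of summation in the formula.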
Theorem 6.11 (Dirichlet’s Test) Suppose
(a) the partial sums $A_n$ of $\sum a_n$ form a bounded sequence;
(b) $b_0 \ge b_1 \ge b_2 \ge \ldots$;
(c) $\lim_{n\to+\infty} b_n = 0$.
Then the series $\sum a_n b_n$ is convergent.
Proof There exists $M > 0$ such that $|A_n| \le M$ for each n. Let $\varepsilon > 0$, and
pick a positive integer $\nu$ such that $b_\nu \le \varepsilon/(2M)$. For $\nu \le p \le q$ we have by
Proposition 6.1
$$\left| \sum_{n=p}^{q} a_n b_n \right| \le \left| \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p \right| \le M \left| \sum_{n=p}^{q-1} (b_n - b_{n+1}) + b_q + b_p \right| = 2M b_p \le 2M b_\nu \le \varepsilon.$$
The series $\sum a_n b_n$ converges by the Cauchy theorem.
Choosing $a_n = (-1)^{n+1}$ and $b_n = |c_n|$ in the previous theorem yields a popular
test for alternating series.
Theorem 6.12 (Leibniz Theorem for Alternating Series) Suppose that
(a) $|c_1| \ge |c_2| \ge |c_3| \ge \ldots$
(b) $c_{2m-1} \ge 0$, $c_{2m} \le 0$ for $m = 1, 2, 3, \ldots$
(c) $\lim_{n\to+\infty} c_n = 0$
Then the series $\sum c_n$ is convergent.
Exercise 6.4 Prove that the series $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ converges, but it does not converge
absolutely. This fact often seems surprising, but we must remember that the
factor $(-1)^n$ contributes to a huge balancing of the terms in the series.
Definition 6.3 (Cauchy Product of Two Series) The Cauchy product of the series
$\sum a_n$ and $\sum b_n$ is the series $\sum c_n$ defined by
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$
Remark 6.6 Properly speaking, the Cauchy product of two series is a discrete
convolution product. Since we do not assume the reader to be familiar with integral
convolutions, we will not use this language in the book.
The convergence of a product of two series is a delicate issue. Consider for
example the series
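$$\sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}} = 1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \cdots$$
Convergence follows from Theorem 6.12. Let us now multiply this series by itself,
obtaining
$$\sum_{n=0}^{\infty} c_n = 1 - \left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\right) + \left(\frac{1}{\sqrt{3}} + \frac{1}{\sqrt{2}\sqrt{2}} + \frac{1}{\sqrt{3}}\right) - \cdots = \sum_{n=0}^{\infty} (-1)^n \sum_{k=0}^{n} \frac{1}{\sqrt{(n-k+1)(k+1)}}.$$
But
$$(n-k+1)(k+1) = \left(\frac{n}{2} + 1\right)^2 - \left(\frac{n}{2} - k\right)^2 \le \left(\frac{n}{2} + 1\right)^2,$$
and
$$|c_n| \ge \sum_{k=0}^{n} \frac{2}{n+2} = \frac{2(n+1)}{n+2}.$$
Since the necessary condition $c_n \to 0$ is violated, the series $\sum c_n$ must diverge.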
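The lower bound on $|c_n|$ can be observed directly by computing a few terms of the Cauchy product. The sketch below works in floating point and uses hypothetical function names; the small tolerance in the comparison only absorbs rounding errors.

```python
import math

def a(n):
    """General term (-1)^n / sqrt(n + 1) of the series being squared."""
    return (-1) ** n / math.sqrt(n + 1)

def cauchy_product_term(n):
    """c_n = sum_{k=0}^{n} a_k a_{n-k}."""
    return sum(a(k) * a(n - k) for k in range(n + 1))

for n in (0, 1, 2, 10, 100):
    c_n = cauchy_product_term(n)
    print(n, c_n, 2 * (n + 1) / (n + 2))
```

The terms alternate in sign and their absolute values stay above $2(n+1)/(n+2) \ge 1$, so they cannot tend to zero, even though the original series converges.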
Here comes the basic convergence result about the product of convergent series.
Theorem 6.13 (Mertens) Suppose that
(a) $\sum_{n=0}^{\infty} a_n$ converges absolutely
(b) $\sum_{n=0}^{\infty} a_n = A$
(c) $\sum_{n=0}^{\infty} b_n = B$
(d) $c_n = \sum_{k=0}^{n} a_k b_{n-k}$.
Then $\sum_{n=0}^{\infty} c_n$ converges.
Proof We follow [1], and set
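$$A_n = \sum_{k=0}^{n} a_k, \qquad B_n = \sum_{k=0}^{n} b_k, \qquad C_n = \sum_{k=0}^{n} c_k, \qquad \beta_n = B_n - B.$$
We compute
$$C_n = a_0 B_n + a_1 B_{n-1} + \cdots + a_n B_0 = a_0 (B + \beta_n) + \cdots + a_n (B + \beta_0) = A_n B + a_0 \beta_n + a_1 \beta_{n-1} + \cdots + a_n \beta_0.$$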
Since $\limsup_{n\to+\infty} (\beta_0 a_n + \cdots + \beta_\nu a_{n-\nu}) = 0$, we find $\limsup_{n\to+\infty} |\gamma_n| \le \varepsilon\alpha$,
and the conclusion follows.
6.4 Problems
6.2 Let {an }n be a sequence with the property that there exists a real number h < 1
such that |an+1 − an | ≤ h|an − an−1 | for each n. Prove that the sequence converges.
6.3 Using the previous problem, show that the sequence defined by choosing any
two real numbers $a_1$ and $a_2$, and defining
$$a_{n+1} = \frac{a_{n-1} + a_n}{2}$$
converges. Compute its limit.
6.4 Let $\{a_n\}_n$ be a sequence of positive real numbers. Prove that the series $\sum_{n=1}^{\infty} a_n$
converges if and only if the series $\sum_{n=1}^{\infty} \frac{a_n}{1+a_n}$ converges.
6.5 Starting from
$$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$$
$$\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1}$$
6.5 Comments
Once upon a time, the treatment of numerical series used to fill up long chapters in
Calculus textbooks. As I have tried to show, the theory of series is indeed a long
collection of sufficient conditions for the convergence of particular sequences of
numbers. In recent years this awareness has become prevalent, and we no longer
annoy our students with awful convergence tests. Last but not least, many of these
tests are based on the algebraic properties of real numbers, and they do not extend
to series of complex numbers, for instance.
Reference
Abstract From a really abstract point of view, the whole theory of limits for
functions of a real variable is an immediate consequence of the theory of limits
for sequences. Furthermore, the definition of limit for a real-valued function can
be easily reduced to a continuity request. In this chapter we introduce limits via
sequential limits, and prove a standard characterization in terms of neighborhoods.
$$f(x) \to L \quad \text{as } x \to p,$$
or
$$\lim_{x\to p} f(x) = L.$$
Remark 7.1 Since $p \in E'$, there exists at least one sequence of points $x_n$ of E, all
different from p, that converges to p. We point out that L has nothing to
do with the value $f(p)$, and indeed p need not even belong to the domain E of f.¹
Figure 7.1 shows a typical example of a function with a finite limit. We now
introduce two more possible definitions of limits, and we finally prove that these
definitions are indeed equivalent.
1 We follow here the traditional definition of limit. In the recent French tradition, the condition
$x_n \ne p$ is omitted. We will come back to this discussion later.
Definition 7.2 (.ε-.δ) Let E be a nonempty subset of .R, and let p be an accumulation
point of E. We say that a number .L ∈ R is the limit of .f (x) as .x → p, if for every
.ε > 0 there exists .δ > 0 such that .x ∈ E and .0 < |x −p| < δ imply .|f (x)−L| < ε.
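Definition 7.3 (Topological Limit) Let E be a nonempty subset of $\mathbb{R}$, and let p be
an accumulation point of E. We say that a number $L \in \mathbb{R}$ is the limit of $f(x)$ as
$x \to p$, if for every neighborhood V of L there exists a neighborhood U of p such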
7 Limits: From Sequences to Functions of a Real Variable 83
that .f (U \ {p}) ⊂ V .
Theorem 7.1 Definitions 7.1, 7.2, and 7.3 are logically equivalent.
Proof Since any neighborhood of a point x contains an interval of the form
.(x − r, x + r) for some .r > 0, it is clear that Definitions 7.2 and 7.3 are logically
equivalent statements. We prove that Definitions 7.1 and 7.3 are equivalent. Suppose
that $f(x) \to L$ as $x \to p$ in the sense of Definition 7.3, and let $\{x_n\}_n$ be a sequence
of points from E such that $x_n \ne p$ for each n, and $x_n \to p$. If V is any neighborhood
of L, we choose a neighborhood U of p as in Definition 7.3. There exists a positive
integer .n0 such that .n > n0 implies .xn ∈ U \ {p}. Then .f (xn ) ∈ V for .n > n0 , and
this proves that .f (xn ) → L as .n → +∞.
Conversely, suppose that Definition 7.3 fails to hold. Then there exists some
neighborhood V of L with the property that no neighborhood U of p satisfies the
condition of Definition 7.3. Fixing such V , it follows that for each positive integer
n there is some $x_n \in E$ such that $|x_n - p| < 1/n$, $x_n \ne p$, and $f(x_n) \notin V$. Thus
.{xn }n is a sequence in .E \ {p} that converges to p, but the sequence .{f (xn )}n does
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \ne p \\ L & \text{if } x = p \end{cases}$$
is continuous at p.
points in .(a, +∞) such that .xn → +∞ as .n → +∞, there results .f (xn ) →
L as .n → +∞. In this case we write .f (x) → L as .x → +∞, or briefly
.limx→+∞ f (x) = L.
• Suppose that E contains a half-line .(−∞, b) for some b. We say that the function
.f : E → R converges to .L ∈ R as .x → −∞, if for any sequence .{xn }n of
2 Too many students tend to believe that this is indeed the general case. No, it isn’t.
Similar definitions may be provided to describe the fact that .limx→+∞ f (x) =
+∞ or .limx→+∞ f (x) = −∞ and so on. The details are left to the reader.
Dealing with several different definitions of limits may seem troublesome. An
interesting way out consists in extending the set $\mathbb{R}$ so that it contains $+\infty$ and $-\infty$.
Definition 7.6 We call the set .R∗ = R ∪ {−∞, +∞} the extended real line. A
neighborhood of .−∞ is any set of the form .(−∞, b) for some .b ∈ R. Analogously,
a neighborhood of .+∞ is any set of the form .(a, +∞) for some .a ∈ R. If .E ⊂ R∗
and .p ∈ R∗ , we say that p is an accumulation point of E if any neighborhood of p
contains a point of E, different than p itself.
The extended real line allows us to summarize all the possible definitions of
limit into a single topological definition.
Definition 7.7 (Limits in $\mathbb{R}^*$) Let $E \subset \mathbb{R}^*$, let $p \in \mathbb{R}^*$ be an accumulation point of E,
and let $L \in \mathbb{R}^*$. We say that $f(x)$ tends to L as $x \to p$, if for every neighborhood
V of L in $\mathbb{R}^*$ there exists a neighborhood U of p in $\mathbb{R}^*$ such that $f(U \setminus \{p\}) \subset V$.
Arithmetic operations in the extended real line present some difficulties. Indeed,
it is clear that $\lim_{x\to+\infty} \frac{x}{x} = 1$, while $\lim_{x\to+\infty} \frac{x^2}{x} = +\infty$ and $\lim_{x\to+\infty} \frac{x}{x^2} = 0$.
There is no hope to define sums and products in $\mathbb{R}^*$ without any exceptional case.
The algebra of limits is completely satisfactory only for finite limits.
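Theorem 7.4 (Algebra of Finite Limits) Retain the assumptions of Definition 7.7.
If
$$\lim_{x\to p} f(x) = L \in \mathbb{R}, \qquad \lim_{x\to p} g(x) = M \in \mathbb{R},$$
then
$$\lim_{x\to p} \big(f(x) + g(x)\big) = L + M, \qquad \lim_{x\to p} f(x)\, g(x) = LM,$$
and
$$\lim_{x\to p} \frac{f(x)}{g(x)} = \frac{L}{M}$$
provided that $M \ne 0$.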
Proof The conclusion follows from the corresponding statements for sequences,
see Theorem 5.6.
is the set $\mathbb{R}^*$. On the contrary the algebra of limits describes an interplay of
topology with the algebraic structure of real numbers. As a matter of fact, we
cannot define algebraic operations with $\pm\infty$ without losing some properties.
To rephrase this, there is no value of $\infty/\infty$ or of $0 \cdot \infty$ that is compatible with
our definition of limits.
Clearly enough, many properties of limits can be deduced from similar properties of
limits for sequences. We just state a few important results that the reader may have
studied in Calculus courses.
Theorem 7.5 (Limits and Order) Let $E \subset \mathbb{R}$, $p \in \mathbb{R}^*$ be an accumulation point
of E, and $L \in \mathbb{R}$.
(a) If $L > 0$, there exists a neighborhood U of p such that $f(x) > 0$ for each
$x \in U \setminus \{p\}$.
(b) If $f(x) > 0$ for each $x \ne p$ that belongs to some neighborhood of p, then
$L \ge 0$.
Proof We first deal with the case .L ∈ R, and we use Definition 7.2. To prove (a), we
select .ε = L/2 so that a neighborhood U of p exists such that .L − L/2 < f (x) <
L + L/2 for each .x ∈ U \ {p}. This shows that .f (x) > L/2 > 0 for such values of
x. If (b) were false, then .L < 0. Applying (a) to .−f would yield a neighborhood of
p in which .−f would be positive, i.e. f would be negative. This is a contradiction.
The case $L = +\infty$ is easier. Indeed, for each $M > 0$ there exists a neighborhood
U of p such that $f(x) > M$ for each $x \in U \setminus \{p\}$. Conclusion (b) follows again by
(a) as above.
Since limits and continuity are related to the local behavior of functions, it seems
rather natural to classify functions according to their asymptotic properties around
a point.
Definition 7.8 Let I be an interval, let .c ∈ I or .c ∈ {−∞, +∞} (in case I is
unbounded), and let f , g be two functions defined on I . We say that f and g are
equivalent at c if and only if there exists a function .u : I → R such that .f (x) =
u(x)g(x) for each .x ∈ I and .limx→c u(x) = 1. In this case we write .f ∼ g as
.x → c.
$$\lim_{x\to c} \frac{f(x)}{g(x)} = 1.$$
Theorem 7.7 Let .F(c) be the set of all functions defined on I . Then .∼ is an
equivalence relation on .F(c).
Proof For each $f \in \mathcal{F}(c)$, $f \sim f$ in a trivial way. Next, if $f \sim g$, there exists u
such that $f = ug$ on I, with $u(x) \to 1$ as $x \to c$. In particular $u \ne 0$ near c, and
thus $g = (1/u) f$ near c. Hence $g \sim f$. To conclude, if $f \sim g$ and $g \sim h$, we can
find functions u and v such that $f = ug$ and $g = vh$ with $u(x) \to 1$, $v(x) \to 1$ as
$x \to c$. Then $f = uvh$ and $u(x)v(x) \to 1$ as $x \to c$. Hence $f \sim h$.
Exercise 7.1 Suppose that $\lim_{x\to c} f(x) = \lim_{x\to c} g(x) = y_0 \ne 0$. Prove that
$f \sim g$ as $x \to c$. Show with a counterexample that the condition $y_0 \ne 0$ is essential.
Exercise 7.2 Suppose that g does not vanish in a neighborhood of c. Prove that
$f = o(g)$ if and only if
$$\lim_{x\to c} \frac{f(x)}{g(x)} = 0.$$
$$\lim_{x\to 0} \frac{e^x - 1}{x} = 1,$$
Indeed,
$$\lim_{x\to c} \frac{f(x) + f_1(x)}{g(x) + g_1(x)} = \lim_{x\to c} \frac{f(x)}{g(x)} \cdot \frac{1 + \frac{f_1(x)}{f(x)}}{1 + \frac{g_1(x)}{g(x)}}.$$
This is often called the principle of negligible terms. Its use in computing limits
is ubiquitous.
7.3 Comments
We will discuss again the definition of limit in the chapter about topology. For the
moment, I point out that our definition remains the most common in contemporary
literature, although some alternatives actually exist. In particular, a few authors
propose the following variant:
.limx→p f (x) = q if and only if for every .ε > 0 there exists .δ > 0 such that .|x − p| < δ
implies .|f (x) − q| < ε.
3 We assume that the reader is familiar with a few limits that involve the elementary functions.
Formal proofs will be given later on, when we discuss these functions from an advanced viewpoint.
The difference is that the case .x = p is not excluded. It is therefore clear that the
condition .f (p) = q is a necessary condition for the existence of the limit with this
definition. If you like this approach, you should always remember that limits are no
longer independent of the value of the functions at the point p.
Chapter 8
Continuous Functions of a Real Variable
Abstract Calculus students tend to believe that continuous functions are those
functions which “vary a little when the independent variable varies a little.” In this
chapter we define continuity in a rigorous way, and we invite the readers to convince
themselves that the previous sentence is actually false.
The second case is .p ∈ E but not an accumulation point of E. This means that p is
isolated, in the sense that there exists a neighborhood U of p such that .U ∩E = {p}.
In this situation, the continuity of f at p is always granted. Indeed, if .ε > 0, we
may choose .δ > 0 so small that the condition .|x − p| < δ is satisfied only by .x = p,
and therefore .|f (x) − f (p)| = |f (p) − f (p)| = 0 < ε.
Remark 8.1 Most Calculus books propose .limx→p f (x) = f (p) as the definition
of continuity, but this is equivalent to ours only under the assumption that p is an
accumulation point of the domain of f .
Recalling Theorem 7.1 and Definition 7.4, we may state
Theorem 8.1 Let .f : E → R. The function f is continuous at the point .p ∈ E if
and only if one of the following conditions is met:
(i) for every .ε > 0 there exists .δ > 0 such that .x ∈ E and .|x − p| < δ imply
.|f (x) − f (p)| < ε;
(iii) for every sequence $\{x_n\}_n$ of points from E that converges to p, there results
$f(x_n) \to f(p)$ as $n \to +\infty$.
The word discontinuity is used as the negation of continuity. In this book we will
not enter into the troublesome classification of discontinuity points, since we believe
that this is of little interest.
Exercise 8.1 Let .f : E → R be a function defined on .E ⊂ R. Prove that f is
continuous at .x0 ∈ E if and only if for every monotonic sequence .{xn }n converging
to .x0 , we have .f (xn ) → f (x0 ). Hint: use Theorem 5.12.
Example 8.1 Let $E = \mathbb{R}$, $p = 0$ and
\[
f(x) = \begin{cases} \dfrac{x}{|x|} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}
\]
Consider the two sequences defined by .xn = 1/n and .yn = −1/n for .n ∈ N. Then
we form the sequence .{zn }n by the rule
x1 , y1 , x2 , y2 , . . . , xn , yn , . . .
.
1, −1, 1, −1, . . .
\[
f(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } 1 < x \le 2. \end{cases}
\]
For each .x ∈ [0, 1) ∪ (1, 2], the function f is continuous at x. However, if x is very
close to 1 but smaller than 1, the value of .f (x) is zero. If x is very close to 1 but
larger than 1, the value of .f (x) is 1. We are allowed to say that f changes a lot when
x is varied a little! In other words, there is much more in the definition of continuity
than the basic idea of “a little on the x-axis becomes a little on the y-axis.”
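The behaviour described above can be observed numerically. The following is only a small sketch: the function name `f` and the sample points are choices made for illustration, and the code simply evaluates the step function of the example at points on both sides of 1.

```python
def f(x: float) -> float:
    """Step function of the example: 0 on [0, 1) and 1 on (1, 2]."""
    if 0 <= x < 1:
        return 0.0
    if 1 < x <= 2:
        return 1.0
    raise ValueError("x is outside the domain [0, 1) U (1, 2]")


# Evaluate f at points approaching 1 from the left and from the right.
for k in range(1, 6):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}: f(1 - h) = {f(1 - h)}, f(1 + h) = {f(1 + h)}")
```

The two sample points get arbitrarily close to each other, while the two values always differ by 1. Since 1 does not belong to the domain, this is compatible with continuity at every point of the domain.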
The algebraic rules for computing (finite) limits immediately imply the following
result.
Theorem 8.2 Suppose that .f : E → R and .g : E → R are continuous at .p ∈ E.
Then
(i) .x → kf (x) is continuous at p for every .k ∈ R;
(ii) .x → f (x) + g(x) is continuous at p;
(iii) .x → f (x)g(x) is continuous at p;
(iv) .x → f (x)/g(x) is continuous at p, provided that the quotient is defined.
As a consequence, any polynomial function $x \mapsto a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$
is continuous on $\mathbb{R}$, as the sum of continuous functions.
Example 8.3 We consider the function .h(x) = [x], where .[x] denotes the largest
integer .n ∈ Z such that .n ≤ x. Given a point .p ∈ Z, we consider the sequence
.xn = p − 1/n. Clearly .xn → p, but .h(xn ) → p − 1, which is different than
is continuous at p.
The previous theorem is indeed a particular case of the following formula about
change of variables in limits.
Theorem 8.4 (Changing Variables in Limits) Let D and E be subsets of .R∗ =
R ∪ {−∞} ∪ {+∞}, let c and p be accumulation points of D and E respectively.
Let .f : D → R∗ be a function, and let .ϕ : E → D be a bijective function such that
Under these assumptions, .limx→c f (x) exists in .R∗ if and only if .limt →p f ◦ ϕ(t)
exists in .R∗ , and in such a case these limits coincide.
Proof We refer to Fig. 8.1. Let us assume that .L = limx→c f (x) exists in .R∗ . Let
V be a neighborhood of L in .R∗ , and let U be a neighborhood of c in .R∗ such that
.f (U ∩ D \ {c}) ⊂ V . Furthermore, given the neighborhood U of c there exists a
\[
\lim_{x \to 0} \frac{e^x - 1}{x} = 1
\]
and
\[
\lim_{x \to 0} \frac{\log(x + 1)}{x} = 1.
\]
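As a purely numerical illustration of these two limits (not a proof), one can tabulate both quotients for small values of x. The function names below are hypothetical; `math.expm1` and `math.log1p` are standard-library functions that evaluate $e^x - 1$ and $\log(1 + x)$ accurately near 0.

```python
import math


def exp_quotient(x: float) -> float:
    """Return (e^x - 1) / x for x != 0."""
    return math.expm1(x) / x


def log_quotient(x: float) -> float:
    """Return log(1 + x) / x for x != 0 and x > -1."""
    return math.log1p(x) / x


# Both columns should approach 1 as x shrinks.
for k in range(1, 8):
    x = 10.0 ** (-k)
    print(f"x = {x:.0e}: {exp_quotient(x):.10f}  {log_quotient(x):.10f}")
```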
We claim that the set .{x ∈ R | ωf (x) < ε} is open for each .ε > 0. Indeed, we
suppose that .ωf (x0 ) < ε, and we must prove that the same inequality holds in an
open interval containing .x0 . By definition, there exists .δ0 such that
For every $x \in (x_0 - \delta/2, x_0 + \delta/2)$ we have $(x - \delta/2, x + \delta/2) \subset (x_0 - \delta, x_0 + \delta)$,
and therefore
Exercise 8.2 Prove that a function f is continuous at a point $x_0$ if and only if
$\omega_f(x_0) = 0$.
The most interesting properties of continuous functions are related to the compact-
ness of their domains.
Theorem 8.5 (Preservation of Compactness) Let $f : E \to \mathbb{R}$ be a continuous
function, and suppose that K is a (sequentially) compact subset of E. Then $f(K)$ is
a (sequentially) compact subset of $\mathbb{R}$.
Proof Let .{yk }k be a sequence in the range .f (K). Hence for each positive integer k,
there exists an element .xk ∈ K such that .f (xk ) = yk . The sequence .{xk }k possesses
a converging subsequence .{xnk }k . Call x its limit. By continuity of f at x, there
results .f (x) = limk→+∞ f (xnk ). Hence .ynk → f (x) as .k → +∞.
\[
|f(x_n) - f(y_n)| = \left| n^2 - \left( n + \frac{1}{n} \right)^2 \right| = 2 + \frac{1}{n^2}.
\]
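The computation can be checked with exact rational arithmetic. This sketch assumes, as the display suggests, that $f(x) = x^2$, $x_n = n$ and $y_n = n + 1/n$; the function name is hypothetical.

```python
from fractions import Fraction


def gap_and_difference(n: int):
    """Return (|x_n - y_n|, |f(x_n) - f(y_n)|) for f(x) = x^2,
    x_n = n and y_n = n + 1/n, computed exactly."""
    x = Fraction(n)
    y = Fraction(n) + Fraction(1, n)
    return abs(x - y), abs(x ** 2 - y ** 2)


for n in (1, 10, 100, 1000):
    gap, diff = gap_and_difference(n)
    print(f"n = {n}: |x_n - y_n| = {gap}, |f(x_n) - f(y_n)| = {diff}")
```

The distance between the two points tends to zero, while the distance between the values stays above 2. This is the quantitative obstruction to uniform continuity.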
.|ynk − x| = |ynk − xnk + xnk − x| and use the triangle inequality). Since f is
Now, the value of $\tilde{f}(a)$ might seem to depend on the choice of the sequence $\{x_n\}_n$. But this is not
the case. Indeed, let $\{y_n\}_n$ be another sequence in $(a, b)$ such that $y_n \to a$. Then the
sequence .{un }n defined by
{x1 , y1 , x2 , y2 , . . . , xn , yn , . . .}
.
The next results are of topological nature, and in a perfect mathematical world they
should be obtained from the properties of connected sets. We present them in this
chapter, and we invite the interested reader to think back on them after studying the
chapter on General Topology.
Theorem 8.10 Let .f : [a, b] → R be a continuous function. If L is a real number
with either .f (a) < L < f (b) or .f (a) > L > f (b), then there exists a point
.c ∈ (a, b) such that .f (c) = L.
case being similar. Therefore we look for a point .c ∈ (a, b) such that .f (c) = 0. Let
us introduce the set
K = {x ∈ [a, b] | f (x) ≤ 0} .
.
.K ⊂ [a, c−δ], against the definition of c as the supremum of K. By the same token,
one excludes the case .f (c) < 0, and the conclusion follows.
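The theorem is an existence statement, but a point c can also be approximated by repeatedly halving the interval. The sketch below is an illustration under stated assumptions: `bisect` is a hypothetical helper, the tolerance is arbitrary, and the test function $x^3 - 2$ is chosen only because it is continuous with opposite signs at the endpoints of $[0, 2]$.

```python
def bisect(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Approximate a zero of a continuous f with f(a) < 0 < f(b)."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    while b - a > tol:
        z = (a + b) / 2
        if f(z) >= 0:
            b = z  # keep the half [a, z]
        else:
            a = z  # keep the half [z, b]
    return (a + b) / 2


root = bisect(lambda x: x ** 3 - 2, 0.0, 2.0)
print(root, root ** 3)
```

At every step the sign condition at the endpoints of the current interval is preserved, and the length of the interval is halved.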
Exercise 8.4 In this exercise we propose a second proof of Theorem 8.10. Suppose
.L = 0 and .f (a) < 0 < f (b). Define .I0 = [a, b] and consider the mid-point
.z = (a + b)/2. If .f (z) ≥ 0, set .a1 = a and .b1 = z. If .f (z) < 0, set .a1 = z and
.b1 = b. This gives rise to an interval .I1 ⊂ I0 . Use this scheme and the principle of
\[
g(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0, \\ 0 & \text{otherwise.} \end{cases}
\]
8.3 Continuous Invertible Functions 99
For each $L \in [-1, 1]$, the equation $\sin z = L$ possesses infinitely many solutions
$z \neq 0$, and therefore g has the intermediate value property. However g is not
continuous at 0 (why?).
Theorem 8.11 If f is increasing on .[a, b] and satisfies the intermediate value
property, then f is continuous on $[a, b]$. The same result holds for a decreasing
function.
Proof We consider a point .c ∈ (a, b): the continuity of f at a and b is left as an
exercise. By monotonicity,
and both limits are finite. Again by monotonicity, .f (c−) ≤ f (c+). We need to
prove that actually .f (c−) = f (c+). We suppose on the contrary that .f (c−) <
f (c+), and pick a number L such that .f (c−) < L < f (c+). We have now two
cases: (i) if $f(c) \neq L$, we reach a contradiction with the intermediate value property.
If (ii) $f(c) = L$, we select $L' \neq L$ with $f(c-) < L' < f(c+)$ and fall into case (i). The proof is complete.
A word of warning: the results of this section are typical of functions of a single
variable. We will see in the chapter on General Topology that the continuity of an
inverse function is not for free. Here the algebraic properties of the real line add a
remarkable richness.
Example 8.8 Let A be a subset of .R, and let f be an injective continuous function
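from A to $\mathbb{R}$. We show that $f^{-1}$ need not be continuous. Consider $A = [0, 1) \cup [2, 3]$
and
\[
f(x) = \begin{cases} x & \text{if } 0 \le x < 1, \\ x - 1 & \text{if } 2 \le x \le 3. \end{cases}
\]
The inverse function, defined on $[0, 2]$, is
\[
f^{-1}(y) = \begin{cases} y & \text{if } 0 \le y < 1, \\ y + 1 & \text{if } 1 \le y \le 2, \end{cases}
\]
which is discontinuous at $y = 1$.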
.f (x) < f (b1 ) for every .x < b1 , hence .f (x1 ) < f (b1 ). Finally, .f (x1 ) < f (x) for
every .x > x1 , and in particular .f (x1 ) < f (x2 ). The claim is proved.
If .f (a) > f (b), in a similar way we prove that f is monotonically decreasing
on I . The proof is complete.
Theorem 8.13 Let g be a function defined on an interval J , and monotonic on J .
The function g is continuous on J if and only if the range .g(J ) is an interval.
Proof Without loss of generality we assume that g is increasing on J . If g is
continuous, then .g(J ) is an interval by Theorem 8.10.
We suppose now that .g(J ) is an interval. If g is discontinuous at some point
.x0 ∈ J , then
For any .x < x0 we have .g(x) < l, while for any .x > x0 we have .g(x) > L. Among
all points of .(l, L), at most one can belong to .g(J ), a set that contains both points
smaller than l and points larger than L. This contradicts the assumption that .g(J ) is
an interval. The proof is complete.
Theorem 8.14 (Continuity of the Inverse Function) If a function f is continuous
on an interval I and invertible, then .f −1 is continuous on .f (I ).
Proof By Theorem 8.12 f is strictly monotonic on I . By Theorem 8.10 .J = f (I )
is an interval. The function .f −1 is defined on J and is monotonic on J . Its range is
I . By Theorem 8.13 .f −1 is therefore continuous.
Example 8.9 The previous results justify several elementary statements. For
instance, the continuity of the exponential function .x → ex is equivalent to the
continuity of the logarithm function .x → log x, since they both map an interval into
an interval.
Important: Warning
We will not spend too much time in proving the continuity of elementary functions.
Most traditional proofs are actually circular, the reason being the lack of a rigorous
definition of the functions themselves. Consider the following sketch of a proof that
$x \mapsto e^x$ is continuous at $x = 0$. For each $\varepsilon > 0$, the chain of inequalities
\[
1 - \varepsilon < e^x < 1 + \varepsilon
\]
is equivalent to
8.4 Problems
\[
\liminf_{x \to y} f(x) = \sup \{ \inf \{ f(x) \mid 0 < |x - y| < \delta \} \mid \delta > 0 \}.
\]
Prove the following statements, and conjecture similar statements for lim inf.
(a) lim supx→y f (x) ≤ A if and only if for every ε > 0 there exists δ > 0 such that
0 < |x − y| < δ implies f (x) ≤ A + ε.
(b) lim supx→y f (x) ≥ A if and only if for every ε > 0 and for every δ > 0 there
exists a point x such that 0 < |x − y| < δ and f (x) ≥ A − ε.
(c) lim infx→y f (x) ≤ lim supx→y f (x) with equality if and only if limx→y f (x)
exists.
(d) If $\limsup_{x \to y} f(x) = A$ and if $\{x_n\}_n$ converges to y with $x_n \neq y$, then
$\limsup_{n \to +\infty} f(x_n) \le A$.
(e) If $\limsup_{x \to y} f(x) = A$, then there exists a sequence $\{x_n\}_n$ such that $x_n \to y$
and $\lim_{n \to +\infty} f(x_n) = A$.
8.2 A function f : R → R is homogeneous of degree one if f (λx) = λf (x) for
each x ∈ R and each λ ∈ R. Prove that f is continuous.
\[
f_n(x) = \begin{cases} 0 & \text{if } x < b_n, \\ a_n & \text{if } x \ge b_n. \end{cases}
\]
Let $f(x) = \sum_n f_n(x)$ for all x. Prove that
1. f is non-decreasing.
2. f is discontinuous on the set A = {bn | n ∈ N}.
3. f is continuous on R \ A.
8.4 Let f and g be real-valued functions, uniformly continuous on a set A.
1. Prove that f + g is uniformly continuous on A.
2. Prove that f ◦ g is uniformly continuous on g(A) ∩ A.
3. Prove that fg is uniformly continuous if A = [a, b]. Give a counterexample if A
is not a compact interval.
8.5 Let A be a non-empty subset of R. A function fA : R → R is defined by
fA (x) = inf {|x − a| | a ∈ A}. Prove that fA is uniformly continuous on A.
8.6 Let K be a compact subset of R, and let f : K → R be a continuous function.
Prove that for each ε > 0 there exists M ∈ R such that |f (x)−f (y)| ≤ M|x −y|+ε
for each x, y ∈ K.
8.7 A function f : [0, 1] → R is upper semicontinuous if given x ∈ [0, 1] and
ε > 0 there exists δ > 0 such that |y − x| < δ implies f (y) < f (x) + ε. Prove
that an upper semicontinuous function on [0, 1] is bounded above and attains its
maximum value at some point of [0, 1].
Chapter 9
Derivatives and Differentiability
The number A is called the derivative of f at .x0 , and is denoted by any of the
symbols
\[
f'(x_0), \quad Df(x_0), \quad df(x_0), \quad \dot{f}(x_0).
\]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 103
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_9
that
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = A. \tag{9.2}
\]
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = A,
\]
then
\[
\frac{f(x) - f(x_0)}{x - x_0} = A + o(1) \quad \text{as } x \to x_0,
\]
is the most convenient characterization of the derivative for proving the chain rule.
We record a third definition of the derivative in terms of continuous functions.
Theorem 9.2 A function .f : (a, b) → R is differentiable at .x0 ∈ (a, b) if and only
if there exists a continuous function .ω : (a, b) → R such that
\[
x \mapsto \frac{f(x) - f(x_0)}{x - x_0}, \qquad x \neq x_0
\]
9.1 Rules of Differentiation, or the Algebra of Calculus 105
can be extended at .x = x0 continuously. Of course this is true if and only if the limit
\[
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}
\]
exists as a real number. This means that f is differentiable at $x_0$, and by (9.3) we
must have $\omega(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$.
Corollary 9.1 If a function is differentiable at a point, then it is continuous at that
point.
Proof This is immediate from (9.3).
Exercise 9.2 Prove the previous Corollary by using each of the equivalent defini-
tions of the derivative.
If two functions f and g are defined on a neighborhood of a point .x0 , we can define
pointwise the functions .f +g and .f ·g: indeed .x → f (x)+g(x) and .x → f (x)g(x)
are well defined in a neighborhood of $x_0$. If $g \neq 0$ in a neighborhood of $x_0$, then the
quotient $x \mapsto f(x)/g(x)$ is also defined.
Theorem 9.3 (Differentiation Rules) Suppose that f and g are defined on a
neighborhood .(a, b) of the point .x0 . Then
(i) the function $f + g$ is differentiable at $x_0$, and $(f + g)'(x_0) = f'(x_0) + g'(x_0)$;
(ii) the function .f · g is differentiable at .x0 , and
Proof The proof of (i) is left as an easy exercise. A standard proof of (ii) is as
follows:
\[
\begin{aligned}
\frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0}
&= \frac{f(x)g(x) - f(x_0)g(x) + f(x_0)g(x) - f(x_0)g(x_0)}{x - x_0} \\
&= \frac{f(x) - f(x_0)}{x - x_0}\, g(x) + f(x_0)\, \frac{g(x) - g(x_0)}{x - x_0} \\
&\to f'(x_0)g(x_0) + f(x_0)g'(x_0).
\end{aligned}
\]
Another proof, based instead on Definition 9.1, starts from the assumptions $f(x) =
f(x_0) + f'(x_0)(x - x_0) + o(1)$, $g(x) = g(x_0) + g'(x_0)(x - x_0) + o(1)$ and then
\[
f(x)g(x) = \bigl( f(x_0) + f'(x_0)(x - x_0) + o(1) \bigr) \bigl( g(x_0) + g'(x_0)(x - x_0) + o(1) \bigr)
\]
where $[\ldots]$ contains all the terms that are multiplied by some $o(1)$ in the algebraic
expansion. It follows that the linearization of fg at $x_0$ (exists and) is $f'(x_0)g(x_0) +
f(x_0)g'(x_0)$.
The proof of (iii) is more traditional. First of all, since differentiability implies
continuity, the condition $g(x_0) \neq 0$ implies that $g \neq 0$ in a neighborhood of $x_0$.
Then we construct
\[
\begin{aligned}
\frac{\dfrac{f(x)}{g(x)} - \dfrac{f(x_0)}{g(x_0)}}{x - x_0}
&= \frac{1}{x - x_0}\, \frac{f(x)g(x_0) - f(x_0)g(x)}{g(x)g(x_0)} \\
&= \frac{1}{x - x_0}\, \frac{f(x)g(x_0) - f(x_0)g(x_0) + f(x_0)g(x_0) - f(x_0)g(x)}{g(x)g(x_0)} \\
&\to \frac{f'(x_0)g(x_0) - f(x_0)g'(x_0)}{g(x_0)^2}.
\end{aligned}
\]
Remark 9.3 Formula (iii) is not easily proved by means of Definition 9.1. The
trouble is that it is not trivial to extract a linearization formula from the quotient
\[
\frac{1}{1 - z} = 1 + z + z^2 + z^3 + \cdots = 1 + z + O(z^2).
\]
Proof From Definition 9.1, we know that there exist functions $\sigma$ and $\tau$ such that
$\sigma(x) = o(1)$ as $x \to x_0$, $\tau(y) = o(1)$ as $y \to f(x_0)$, and
Then
\[
g \circ f(x) = g(f(x_0)) + g'(f(x_0))(f(x) - f(x_0)) + \tau(f(x))(f(x) - f(x_0))
\]
Important: Warning
The Calculus “proof” of the Chain Rule goes as follows:
\[
\begin{aligned}
\frac{g(f(x)) - g(f(x_0))}{x - x_0}
&= \frac{g(f(x)) - g(f(x_0))}{f(x) - f(x_0)}\, \frac{f(x) - f(x_0)}{x - x_0} \\
&\to g'(f(x_0)) f'(x_0).
\end{aligned}
\]
Exercise 9.4 Provide a proof of the Chain Rule according to Theorem 9.2.
Theorem 9.5 (Differentiation of the Inverse Function) Suppose that f is an
invertible function on an interval $(a, b)$. If f is differentiable at a point $x_0 \in (a, b)$
and $f'(x_0) \neq 0$, then $f^{-1}$ is differentiable at $y_0 = f(x_0)$. Moreover,
\[
Df^{-1}(y_0) = \frac{1}{f'(x_0)}.
\]
\[
\lim_{y \to y_0} \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0}
= \lim_{x \to x_0} \frac{1}{\dfrac{f(x) - f(x_0)}{x - x_0}}
= \frac{1}{f'(x_0)}.
\]
The necessity of all the assumptions should be clear from Fig. 9.1.
Example 9.1
1. The function f defined by
\[
f(x) = \begin{cases} x \sin \dfrac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{otherwise} \end{cases}
\]
\[
f'(x) = \sin \frac{1}{x} - \frac{1}{x} \cos \frac{1}{x}.
\]
However
\[
\frac{f(x) - f(0)}{x - 0} = \sin \frac{1}{x}
\]
\[
f(x) = \begin{cases} x^2 \sin \dfrac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{otherwise} \end{cases}
\]
9.2 Mean Value Theorems 109
is differentiable at .x = 0, since
\[
\frac{f(x) - f(0)}{x - 0} = x \sin \frac{1}{x},
\]
\[
f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h}.
\]
Provide an example of a function such that the limit on the right-hand side exists,
but the function is not differentiable at a.
The derivative is a local object that can provide global properties of functions. This
is essentially the basis of mean value theorems.
Theorem 9.6 (Fermat) Let f be a function defined on the interval .[a, b]. If f has
a local maximum or a local minimum at a point .x0 ∈ (a, b), and if .f % (x0 ) exists,
then .f % (x0 ) = 0.
Proof Suppose that $x_0$ is a local maximum of f, so that there exists $\delta > 0$ such that
$a < x_0 - \delta < x_0 < x_0 + \delta < b$. If $x_0 - \delta < x < x_0$, then
\[
\frac{f(x) - f(x_0)}{x - x_0} \ge 0,
\]
since f attains a local maximum at $x_0$. Letting $x \to x_0$ in the last inequality, we get
$f'(x_0) \ge 0$. Similarly, if $x_0 < x < x_0 + \delta$, then
\[
\frac{f(x) - f(x_0)}{x - x_0} \le 0,
\]
obviously h is continuous and differentiable on .(a, b), and .h(a) = h(b). It remains
to prove that the derivative of h vanishes somewhere inside .(a, b). If h turns out to
be constant on .[a, b], then the proof is complete.
If $h(x) > h(a)$ for some $x \in (a, b)$, then h must attain a global maximum inside
$(a, b)$, and at this point $h'$ vanishes by Theorem 9.6.
If $h(x) < h(a)$ for some $x \in (a, b)$, then h must attain a global minimum inside
$(a, b)$, and at this point $h'$ vanishes again by Theorem 9.6.
The simple choice .g = id is surprisingly important: see Fig. 9.2.
Theorem 9.8 (Lagrange) Let f be a function defined on $[a, b]$, which is differentiable
on $(a, b)$ and continuous on $[a, b]$. Then there exists a point $c \in (a, b)$ such
that
Proof For any points $x_1$ and $x_2$ in $(a, b)$, Theorem 9.8 yields $f(x_2) - f(x_1) =
(x_2 - x_1) f'(c)$ for some c between $x_1$ and $x_2$. It is now immediate to conclude
according to the sign of $f'$.
Exercise 9.6 Suppose that f is a differentiable function such that $f'(x) = f(x)$
for each $x \in \mathbb{R}$. If $f(0) = 1$, prove that $f(x) = e^x$ for each $x \in \mathbb{R}$.
Exercise 9.7 Let f be a differentiable function on .R with
\[
L = \sup \{ |f'(x)| \mid x \in \mathbb{R} \} < 1.
\]
(a) Fix any .s0 ∈ R, and define .sn = f (sn−1 ) for each .n = 1, 2, . . . Prove that the
sequence .{sn }n is convergent. Hint: show that .{sn }n is a Cauchy sequence.
(b) Prove the Banach-Caccioppoli Fixed Point Theorem: there exists a point .x ∈ R
such that .f (x) = x.
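The iteration of part (a) is easy to experiment with. In the sketch below the map $f(x) = \frac{1}{2}\cos x$ is a hypothetical choice made only for illustration: its derivative is bounded by 1/2 in absolute value, so the assumption $L < 1$ holds. The helper name `iterate` and the starting point are also assumptions.

```python
import math


def iterate(f, s0: float, steps: int):
    """Return [s_0, s_1, ..., s_steps] with s_n = f(s_{n-1})."""
    values = [s0]
    for _ in range(steps):
        values.append(f(values[-1]))
    return values


def f(x: float) -> float:
    # Hypothetical contraction: |f'(x)| = |sin(x)| / 2 <= 1/2 < 1 on R.
    return 0.5 * math.cos(x)


s = iterate(f, 3.0, 60)
print("last iterate:", s[-1])
print("residual |f(x) - x|:", abs(f(s[-1]) - s[-1]))
```

Consecutive differences shrink at least by the factor L at each step, which is the mechanism behind the Cauchy property suggested in the hint.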
Mean value theorems are typically used in Calculus to derive criteria for the
existence of limits. The well-known result which goes under the name of De
l’Hospital is the most celebrated one.1 We follow [2] for the proof.
Theorem 9.9 (De l'Hospital) Suppose f and g are differentiable on $(a, b)$, and
$g'(x) \neq 0$ for all $x \in (a, b)$, where $-\infty \le a < b \le +\infty$. Suppose
\[
\lim_{x \to a} \frac{f'(x)}{g'(x)} = A, \tag{9.5}
\]
or
then
\[
\lim_{x \to a} \frac{f(x)}{g(x)} = A. \tag{9.8}
\]
1We write De l’Hospital since this is the ancient and original name. Nowadays it is customary to
write De l’Hôpital.
Proof Let us start with the case .−∞ ≤ A < +∞. Let .q > A be any real number,
and choose r such that .A < r < q. By (9.5) there exists a point .c ∈ (a, b) such that
.a < x < c implies
\[
\frac{f'(x)}{g'(x)} < r. \tag{9.9}
\]
If .a < x < y < b, Theorem 9.7 yields a point .t ∈ (x, y) such that
Suppose that (9.6) holds. When .x → a in (9.10) we see that .a < y < c implies
\[
\frac{f(y)}{g(y)} \le r < q \tag{9.11}
\]
Suppose now that (9.7) holds. We fix y in (9.10) and select $c_1 \in (a, y)$ such that
$g(x) > g(y)$ and $g(x) > 0$ for every $x \in (a, c_1)$. Then it follows from (9.10) that
for every .x ∈ (a, c1 ). Letting .x → a in (9.12) we see that the right-hand side of
(9.12) converges to r, and therefore there exists a point .c2 ∈ (a, c1 ) such that
\[
\frac{f(x)}{g(x)} \le r < q \tag{9.13}
\]
for every .x ∈ (a, c2 ). Equations (9.11) and (9.13) imply that .f (x)/g(x) < q for
every .x ∈ (a, c2 ).
If .−∞ < A ≤ +∞, a completely similar argument shows that, given any .p < A,
there exists a point .c3 such that .p < f (x)/g(x) for every .x ∈ (a, c3 ). Since p and
q are arbitrary, we have proved that .f (x)/g(x) → A as .x → a.
Exercise 9.8 For every .x ∈ R, consider the functions
such that .f % (ξ ) = λ.
Proof We define $g(x) = f(x) - \lambda x$. By assumption $g'(a+) < 0$ and $g'(b-) >
0$. It follows that $g(t_1) < g(a)$ and $g(t_2) < g(b)$ for some $t_1$, $t_2$ in $(a, b)$. As a
consequence, the function g must attain its minimum at some point $\xi \in (a, b)$. We
already know that $g'(\xi) = 0$, and thus $f'(\xi) = \lambda$.
Example 9.2 Define the polynomial $P(x) = (x^2 - 1)^2$. Let $f : [0, 1] \to \mathbb{R}$ be the
function such that $f(0) = 0$ and
\[
f(x) = \frac{1}{n^{3/2}}\, P\bigl( 2n(n + 1)x - 2n - 1 \bigr) \quad \text{if } \frac{1}{n + 1} \le x \le \frac{1}{n}.
\]
\[
b_n = \frac{4n + 1}{4n(n + 1)} \to 0.
\]
Nevertheless, Theorem 9.10 applies, and $f'$ attains every positive value $\gamma$ on every
interval $[0, b_n]$.
The idea of linearization is fruitful at inner points of the domain of definition: the
function can be identified, at an infinitesimal scale, with a linear function. It is
nonetheless convenient, from time to time, to extend the definition of derivative
at end-points.
Definition 9.2 Let .f : [a, b] → R be a function. We say that f is differentiable at
a if the limits
\[
f'(a+) = \lim_{x \to a+} \frac{f(x) - f(a)}{x - a}
\]
$f \in \bigcap_{n=0}^{\infty} C^n(a, b)$. Hence, a function is of class $C^\infty$ if and only if it can be
f (k) (x0 )
P (n, x0 ; x) =
. (x − x0 )k
k!
k=0
f %% (x0 )
= f (x0 ) + f % (x0 )(x − x0 ) + (x − x0 )2 + · · ·
2!
f (n) (x0 )
··· + (x − x0 )n . (9.14)
n!
Taylor polynomials express the local behavior of functions, and generalize the
concept of linear approximation which was introduced in the definition of the first
derivative.
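To see the local character of this approximation in a concrete case, one can evaluate the Taylor polynomial of the exponential function, whose derivatives of every order at $x_0$ all equal $e^{x_0}$. This is only a sketch; the name `taylor_exp` and the sample points are assumptions made for the example.

```python
import math


def taylor_exp(n: int, x0: float, x: float) -> float:
    """Taylor polynomial P(n, x0; x) of exp: every derivative at x0 is exp(x0)."""
    return sum(math.exp(x0) * (x - x0) ** k / math.factorial(k) for k in range(n + 1))


# The error divided by (x - x0)^n should shrink as x approaches x0 = 0.
n = 2
for k in range(1, 5):
    x = 10.0 ** (-k)
    error = math.exp(x) - taylor_exp(n, 0.0, x)
    print(f"x = {x:.0e}: error / x^{n} = {error / x ** n:.6e}")
```

For $n = 2$ the printed ratio behaves like $x/6$, in agreement with the next term of the expansion.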
Theorem 9.11 (Local Polynomial Approximation) Let .f : (a, b) → R be n-
times differentiable at a point .x0 ∈ (a, b), and let .P (n, x0 ; ·) be its Taylor
polynomial of degree n. Then
\[
\zeta(x) = \frac{f(x) - P(n, x_0; x)}{(x - x_0)^n}, \tag{9.16}
\]
thus from (9.17) we deduce .limx→x0 ζ(x) = 0. We can define .ζ(x0 ) = 0, so that .ζ
becomes a continuous function on .(a, b). By applying again Theorem 9.9 n times,
we finally see that
Theorem 9.12 (Lagrange Remainder) Let $f : (a, b) \to \mathbb{R}$ be $n + 1$ times
differentiable on $(a, b)$, and let $x_0 \in (a, b)$. For each $x \in (a, b)$, $x \neq x_0$, there
exists a point $\xi$ between $x_0$ and x such that
\[
f(x) = P(n, x_0; x) + \frac{f^{(n+1)}(\xi)}{(n + 1)!} (x - x_0)^{n+1}.
\]
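The formula gives a computable error bound as soon as the derivative of order $n + 1$ can be estimated. As a hedged illustration, take $f = \exp$, $x_0 = 0$ and $x = 1$: since $0 < \xi < 1$, the remainder is at most $e/(n + 1)!$. The function name below is hypothetical.

```python
import math


def exp_taylor_at_zero(n: int, x: float) -> float:
    """Taylor polynomial of degree n of exp centered at 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


# Compare the actual error at x = 1 with the Lagrange bound e / (n + 1)!.
for n in range(1, 11):
    error = abs(math.e - exp_taylor_at_zero(n, 1.0))
    bound = math.e / math.factorial(n + 1)
    print(f"n = {n:2d}: error = {error:.3e}, bound = {bound:.3e}")
```

The degree is kept small on purpose: for larger n the true error falls below the rounding error of floating-point arithmetic and the comparison is no longer meaningful.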
Proof Suppose without loss of generality that .x0 < x. We define .F : [x0 , x] → R,
\[
F'(t) = -\frac{f^{(n+1)}(t)}{n!} (x - t)^n.
\]
Next we introduce the function .G : [x0 , x] → R,
\[
G(t) = \frac{(x - t)^{n+1}}{(n + 1)!}.
\]
We have $F(x) = G(x) = 0$, and $F'(t)/G'(t) = f^{(n+1)}(t)$. We now apply
Theorem 9.7 to F and G on .[x0 , x], and find a point .ξ between .x0 and x such
that
9.6 Convexity
Remark 9.5 The crucial fact is that $\lambda x_1 + \mu x_2$ must be an element of I, as soon as
$x_1$ and $x_2$ belong to I and $\lambda + \mu = 1$. It is easy to check that this is actually correct,
since I is an interval.
Let us manipulate the convexity inequality (9.18). Let x be any point between .x1
and .x2 . We set .x = λx1 + μx2 . Since .λ + μ = 1, we see that
\[
\lambda = \frac{x_2 - x}{x_2 - x_1}, \qquad \mu = \frac{x - x_1}{x_2 - x_1}.
\]
\[
f(x) \le \frac{x_2 - x}{x_2 - x_1} f(x_1) + \frac{x - x_1}{x_2 - x_1} f(x_2). \tag{9.19}
\]
\[
x \mapsto \frac{f(x) - f(x_0)}{x - x_0}
\]
\[
f'(x_1+) \le \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le f'(x_2-). \tag{9.23}
\]
Theorem 9.14 Let f be a differentiable function in the interval .[a, b]. A necessary
and sufficient condition for f to be convex is that .f % be monotonically increasing.
Proof If f is convex, then (9.23) implies $f'(x_1) \le f'(x_2)$, so that $f'$ is increasing.
In the other direction, we remark that convexity is equivalent to (9.22) for all points
$x_1$, x and $x_2$ such that $x_1 < x < x_2$. If $f'$ is increasing, then there exist points $\xi_1$
and $\xi_2$ such that $x_1 < \xi_1 < x < \xi_2 < x_2$ and
9.7 Problems
Give an example to show that the existence of the previous limit does not imply the
differentiability of f at x0 .
9.2 Suppose that f is differentiable at the point $x_0$. Let $\{h_n\}_n$ and $\{k_n\}_n$ be two
nonincreasing sequences which converge to 0. Prove that
\[
\lim_{n \to +\infty} \frac{f(x_0 + h_n) - f(x_0 - k_n)}{h_n + k_n} = f'(x_0).
\]
Give an example to show that the existence of the previous limit does not imply the
differentiability of f at x0 .
9.3 Suppose that f % is continuous on an interval [a, b]. Prove that for every ε > 0
there exists δ > 0 such that
\[
\left| \frac{f(t) - f(x)}{t - x} - f'(x) \right| < \varepsilon
\]
\[
x_{n+1} = x_n - \frac{f(x_n)}{M}
\]
for n = 1, 2, 3, . . . Prove that {xn }n converges to a limit x0 such that f (x0 ) = 0.
Furthermore, prove that
\[
|x_{n+1} - x_n| \le \frac{f(x_1)}{m} \left( 1 - \frac{m}{M} \right)^n.
\]
9.5 Suppose f is a real-valued function defined on the half-line (a, +∞). Suppose
that f is twice differentiable on (a, +∞), and define
\[
M_0 = \sup_{x > a} |f(x)|, \qquad M_1 = \sup_{x > a} |f'(x)|, \qquad M_2 = \sup_{x > a} |f''(x)|.
\]
Prove that $M_1^2 \le 4 M_0 M_2$ as follows: for each $h > 0$ deduce from Taylor's expansion
that there exists $\xi \in (x, x + 2h)$ such that
\[
f'(x) = \frac{f(x + 2h) - f(x)}{2h} - h f''(\xi).
\]
Therefore
\[
|f'(x)| \le h M_2 + \frac{M_0}{h}.
\]
Now optimize the right-hand side with respect to h > 0.
9.6 If f is twice differentiable on $(0, +\infty)$, $f''$ is bounded and $\lim_{x \to +\infty} f(x) =
0$, prove that $\lim_{x \to +\infty} f'(x) = 0$. Hint: consider the limit $a \to +\infty$ in the previous
problem.
\[
\frac{x - 1}{x} < \log x < x - 1.
\]
(b) For each j ∈ N, j > 1, prove that
\[
\log \frac{j + 1}{j} < \frac{1}{j} < \log \frac{j}{j - 1}.
\]
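\[
\log \left( k + \frac{1}{n} \right) < \sum_{j=n}^{kn} \frac{1}{j} < \log \left( k + \frac{k}{n - 1} \right)
\]
and
\[
\lim_{n \to +\infty} \sum_{j=n}^{kn} \frac{1}{j} = \log k.
\]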
\[
\sum_{j=1}^{2n} \frac{(-1)^{j+1}}{j} = \sum_{j=1}^{2n} \frac{1}{j} - 2 \sum_{k=1}^{n} \frac{1}{2k}
\]
that
\[
\sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} = \log 2.
\]
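A quick numerical experiment supports the value of this sum. The sketch below uses exact fractions to check that the alternating partial sum up to $2n$ coincides with $\sum_{j=n+1}^{2n} 1/j$, which is what remains after the subtraction above, and then compares it with $\log 2$. The function names are assumptions made for the example.

```python
import math
from fractions import Fraction


def alternating_partial_sum(n: int) -> Fraction:
    """Exact value of the alternating harmonic sum for j = 1, ..., 2n."""
    return sum(Fraction((-1) ** (j + 1), j) for j in range(1, 2 * n + 1))


def tail_harmonic(n: int) -> Fraction:
    """Exact value of the sum of 1/j for j = n + 1, ..., 2n."""
    return sum(Fraction(1, j) for j in range(n + 1, 2 * n + 1))


for n in (1, 10, 100, 500):
    s = alternating_partial_sum(n)
    print(n, s == tail_harmonic(n), float(s), math.log(2))
```

The convergence is slow: the difference from $\log 2$ decays only like a constant multiple of $1/n$.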
9.8 If $x > 1$, $x \neq e$, prove that there exists one and only one number $f(x) > 0$
such that $f(x) \neq x$ and
\[
x^{f(x)} = (f(x))^x.
\]
Hint: $x^y = y^x$ if and only if $\frac{\log x}{x} = \frac{\log y}{y}$.
9.8 Comments
The standard definition of derivative as the limit of the incremental ratio should
not be considered as the one used by mathematicians from the beginning of
Calculus. They would rather use a principle of disappearing quantities which
roughly correspond to an expansion of functions at first order as in
In this sense, the one-dimensional derivative has progressively lost its definition as
a linearization procedure in favor of an iconic limit:
\[
f'(t) = \lim_{\Delta t \to 0} \frac{f(t + \Delta t) - f(t)}{\Delta t}.
\]
There are several good reasons to define the derivative as a linearization, and the
most important one is that the derivative of a function of several variables is not a
number.
The theory of convex functions is a long but elementary exercise, in the case
of functions of a single real variable. The topic becomes much more exciting in
higher dimensions, where intervals must be replaced by convex sets and a new fact
comes into play: it is possible to draw conclusions about a function on a convex set
by assuming a property of that function on every straight line. The interested reader
may start from [1].
References
Abstract The basic theory of definite integration can be named after Bernhard
Riemann. In this chapter we will propose a rigorous introduction to it. Although
we have in mind Rudin’s lucid chapter in his Principles of Mathematical Analysis,
we prefer to avoid the additional complication of the Stieltjes generalization. By the
way, a later chapter will show that the much more flexible integral of Lebesgue can
be presented without too much effort.
This is by far the worst weakness of the Riemann integral. The rough idea is to
construct finite sums of values of a function at suitable points, and then pass to the
limit in a suitable sense. The next definition gives a name to the selection of the
suitable points.
Definition 10.1 A partition P of a closed and bounded interval .[a, b] is a finite set
of points .x0 , .x1 , . . . , .xn such that .a = x0 < x1 < x2 < . . . < xn−1 < xn = b. We
write $\Delta x_i = x_i - x_{i-1}$ for $i = 1, \ldots, n$.
Let .f : [a, b] → R be a bounded function. To any partition P of .[a, b] we attach
the quantities
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 123
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_10
124 10 Riemann’s Integral
\[
\begin{aligned}
U(f, P) &= \sum_{i=1}^{n} M_i \, \Delta x_i \\
L(f, P) &= \sum_{i=1}^{n} m_i \, \Delta x_i;
\end{aligned}
\]
\[
\overline{\int_a^b} f \, dx = \inf U(f, P), \qquad \underline{\int_a^b} f \, dx = \sup L(f, P)
\]
where inf and sup are taken over all partitions P of .[a, b].
Remark 10.1 Since .mi ≤ Mi , we see that .L(f, P ) ≤ U (f, P ) for every partition
P . Since f is bounded, say .m ≤ f ≤ M on .[a, b], then .m(b − a) ≤ L(f, P ) ≤
U (f, P ) ≤ M(b − a). As a consequence, the upper and the lower integrals of f are
finite. Of course this is the reason why boundedness cannot be dispensed with.
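For a concrete feeling of these quantities, the upper and lower sums can be computed directly when f is increasing, because then the infimum and supremum on each subinterval are attained at its left and right endpoints. The sketch below relies on that assumption; the function name, the uniform partition and the test function $x^2$ on $[0, 1]$ are illustrative choices.

```python
def darboux_sums_increasing(f, a: float, b: float, n: int):
    """Return (L(f, P), U(f, P)) for an INCREASING f and the uniform
    partition of [a, b] into n subintervals."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    # For increasing f: m_i = f(x_{i-1}) and M_i = f(x_i).
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return lower, upper


for n in (10, 100, 1000):
    lower, upper = darboux_sums_increasing(lambda x: x * x, 0.0, 1.0, n)
    print(f"n = {n}: L = {lower:.6f}, U = {upper:.6f}, U - L = {upper - lower:.6f}")
```

Both sums stay between $m(b - a) = 0$ and $M(b - a) = 1$, and their difference equals $(f(b) - f(a))(b - a)/n$ up to rounding.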
10.1 Partitions and the Riemann Integral 125
\[
\overline{\int_a^b} f \, dx = \underline{\int_a^b} f \, dx.
\]
In this case the common value of the upper and the lower integral is denoted by
$\int_a^b f \, dx$, and it is called the Riemann integral of f on $[a, b]$.
Remark 10.2 There are good reasons to criticize the symbol associated to the
Riemann integral. Indeed, the letter x in $\int_a^b f \, dx$ is completely useless. The symbol
$\int_a^b f$ would be a natural choice, but we prefer to use the traditional notation.
Unlike the derivative, the Riemann integral is a global object, in the sense that
it involves the values of the function f on the whole interval .[a, b]. It is a matter
of fact that Definition 10.3 is not particularly concrete. We need to deploy some
general condition for integrability.
Definition 10.4 A partition .P ∗ is a refinement of a partition P of .[a, b], if .P ⊂ P ∗ .
If .P1 and .P2 are two partitions of .[a, b], their union .P ∗ = P1 ∪ P2 is called the
common refinement of .P1 and .P2 .
Exercise 10.1 Prove that the common refinement of two partitions .P1 and .P2 is the
smallest partition which refines both .P1 and .P2 .
Theorem 10.1 If .P ∗ is a refinement of P , then
Proof We suppose first that $P^*$ contains just one point more than P. If this point is
$x^*$, for some index i we must have $x_{i-1} < x^* < x_i$. Let
In the general case, a finite number of points are added to those of P. We just repeat
the same construction for each of them, and we conclude. The proof of the inequality
$U(f, P^*) \le U(f, P)$ is similar, and we omit the details.
Corollary 10.1 $\underline{\int_a^b} f \, dx \le \overline{\int_a^b} f \, dx$.
Proof Let $P_1$ and $P_2$ be arbitrary partitions of $[a, b]$, and let $P^*$ be their common
refinement. As we have just seen,
Taking the supremum over $P_1$, we see that $\underline{\int_a^b} f \, dx \le U(f, P_2)$. Taking now the
infimum over $P_2$ we conclude the proof.
Theorem 10.2 (Integrability Condition) A bounded function f is R-integrable on
[a, b] if and only if for every .ε > 0 there exists a partition .Pε of .[a, b] such that
.
\[
L(f, P) \le \underline{\int_a^b} f \, dx \le \overline{\int_a^b} f \, dx \le U(f, P).
\]
Then (10.3) implies $0 \le \overline{\int_a^b} f \, dx - \underline{\int_a^b} f \, dx < \varepsilon$, and $\overline{\int_a^b} f \, dx = \underline{\int_a^b} f \, dx$ because
$\varepsilon > 0$ is arbitrary.
Conversely, let f be integrable, and let .ε > 0 be fixed. There exist two partitions
.P1 and .P2 such that
\[
U(f, P_2) < \int_a^b f \, dx + \frac{\varepsilon}{2}, \qquad L(f, P_1) > \int_a^b f \, dx - \frac{\varepsilon}{2}.
\]
\[
U(f, P_\varepsilon) \le U(f, P_2) < \int_a^b f \, dx + \frac{\varepsilon}{2} < L(f, P_1) + \varepsilon \le L(f, P_\varepsilon) + \varepsilon,
\]
and in this case $\int_a^b f \, dx = \lim_{n \to +\infty} U(f, P_n) = \lim_{n \to +\infty} L(f, P_n)$. Hint:
apply (10.3) with $\varepsilon = \varepsilon_n \to 0$ as $n \to +\infty$.
(b) For each n, let .Pn be the partition of .[0, 1] into n equally spaced points. Find a
closed formula for .U (f, Pn ) and .L(f, Pn ) in case .f (x) = x. Hint: first prove
by induction that .1 + 2 + . . . + n = n(n + 1)/2.
(c) Deduce that $f(x) = x$ is integrable on $[0, 1]$, and compute $\int_0^1 x \, dx$.
We can now relate our definition of the Riemann integral to the usual approach
of Calculus via Riemann sums. In undergraduate introductions to the integral, it is
customary to select arbitrary points between two consecutive nodes of a partition,
so that sums like
\[
\sum_{i=1}^{n} f(t_i) \, \Delta x_i
\]
\[
\sum_{i=1}^{n} |f(s_i) - f(t_i)| \, \Delta x_i < \varepsilon.
\]
Proof Part (i) follows immediately from Theorem 10.1. Let us prove part (ii). Of
course .f (si ) and .f (ti ) lie in .[mi , Mi ], so that .|f (si )−f (ti )| ≤ Mi −mi . This yields
\[
\sum_{i=1}^{n} |f(s_i) - f(t_i)| \, \Delta x_i \le U(f, P) - L(f, P) < \varepsilon.
\]
\[
\begin{aligned}
L(f, P) &\le \sum_{i=1}^{n} f(t_i) \, \Delta x_i \le U(f, P) \\
L(f, P) &\le \int_a^b f \, dx \le U(f, P).
\end{aligned}
\]
Remark 10.3 Part (iii) actually says that the Riemann integral can be approximated
by a Riemann sum as soon as it exists, see Fig. 10.2.
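A small experiment makes this remark tangible. The sketch below forms a Riemann sum with randomly chosen tags $t_i \in [x_{i-1}, x_i]$ on a uniform partition; the function name, the seed and the test function $x^2$ on $[0, 1]$, whose integral is $1/3$, are assumptions chosen for illustration.

```python
import random


def riemann_sum(f, a: float, b: float, n: int, rng: random.Random) -> float:
    """Riemann sum of f on the uniform partition of [a, b] into n subintervals,
    with one tag picked at random inside each subinterval."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(
        f(rng.uniform(xs[i - 1], xs[i])) * (xs[i] - xs[i - 1])
        for i in range(1, n + 1)
    )


rng = random.Random(0)  # fixed seed, so that the run is reproducible
approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000, rng)
print(approx, abs(approx - 1.0 / 3.0))
```

Whatever tags are selected, the sum lies between the lower and the upper sum of the same partition, so its distance from the integral is at most $U(f, P) - L(f, P)$.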
Functions can be added pointwise, and pointwise multiplied by constants. This fact
induces a vector space structure to the class of R-integrable functions, as the next
result states.
10.2 Integrable Functions as Elements of a Vector Space 129
Theorem 10.4
(a) If .f1 and .f2 are R-integrable on .[a, b] and if c is a real number, then .f1 + f2
and .cf1 are R-integrable; moreover
\[
\begin{aligned}
\int_a^b (f_1 + f_2) \, dx &= \int_a^b f_1 \, dx + \int_a^b f_2 \, dx, \\
\int_a^b c f_1 \, dx &= c \int_a^b f_1 \, dx.
\end{aligned}
\]
(b) If $f_1 \le f_2$ on $[a, b]$, then $\int_a^b f_1 \, dx \le \int_a^b f_2 \, dx$.
(c) If f is R-integrable on $[a, b]$ and if $a < c < b$, then f is R-integrable on $[a, c]$
and on $[c, b]$, and $\int_a^b f \, dx = \int_a^c f \, dx + \int_c^b f \, dx$.
Proof For .f = f1 + f2 and any partition P of .[a, b], we notice that
and
Call P the common refinement of .P1 and .P2 . Then .U (f, P ) − L(f, P ) < 2ε.
Furthermore
\[ U(f_j, P) < \int_a^b f_j\,dx + \varepsilon, \qquad j = 1, 2. \]
Hence
\[ \int_a^b f\,dx \le U(f, P) < \int_a^b f_1\,dx + \int_a^b f_2\,dx + 2\varepsilon, \]
and
\[ \int_a^b f\,dx \le \int_a^b f_1\,dx + \int_a^b f_2\,dx \]
follows from the arbitrariness of $\varepsilon$. Replacing $f_1$ and $f_2$ with $-f_1$ and $-f_2$, we see
that $\int_a^b f\,dx \ge \int_a^b f_1\,dx + \int_a^b f_2\,dx$. The claim about $cf_1$ is easier, and we leave it
as an exercise. We have thus proved (a).
\[ L(f_1, P) \le L(f_2, P), \qquad U(f_1, P) \le U(f_2, P). \]
Hence $\int_a^b f_1\,dx \le U(f_2, P)$, and thus $\int_a^b f_1\,dx \le \int_a^b f_2\,dx$.
To prove (c), write $f = g + h$ in such a way that $g = 0$ on $[c, b]$, $h = 0$ on
$[a, c]$. It is clear that $g$ and $h$ are R-integrable on $[a, b]$, and $\int_a^b g\,dx = \int_a^c f\,dx$,
while $\int_a^b h\,dx = \int_c^b f\,dx$. The conclusion follows from (a).
Exercise 10.3 Suppose that $f(x) > 0$ for each $x \in [a, b]$. If $f$ is integrable on
$[a, b]$, prove that $\int_a^b f\,dx > 0$. Hint: recall that $L(f, P) \le \int_a^b f\,dx \le U(f, P)$ for
\[ h(x) = \begin{cases} 0 & \text{if } x \in [0, 1] \cap \mathbb{Q} \\ 1 & \text{otherwise.} \end{cases} \]
Let $P$ be any partition of $[0, 1]$. Since $\mathbb{Q}$ is dense in $\mathbb{R}$, between two consecutive
nodes $x_{i-1}$ and $x_i$ of the partition there are infinitely many rational points, and
infinitely many irrational points. It follows that $U(h, P) = 1$ and $L(h, P) = 0$.
This clearly shows that $h$ is not R-integrable on $[0, 1]$.
The previous example suggests the following question: are there general proper-
ties that imply the R-integrability of functions?
Theorem 10.5 (Monotonic Functions Are R-integrable) Let f be a bounded
monotonic function on [a, b]. Then f is R-integrable on [a, b].
Proof For the sake of definiteness, we assume that f is increasing. Pick any ε > 0,
and split [a, b] into a number n of equal parts so that (b − a)/n < ε/[f (b) − f (a)].
This gives rise to a partition P of [a, b]. Monotonicity implies that mi = f (xi−1 ),
Mi = f (xi ), so that
\[ U(f, P) - L(f, P) = \sum_{i=1}^n [f(x_i) - f(x_{i-1})]\,\Delta x_i = \frac{b-a}{n}\,[f(b) - f(a)] < \varepsilon. \]
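The telescoping identity used in this proof is easy to observe numerically. The following sketch is only an illustration under our own choices: the increasing function `math.exp`, the interval $[0, 1]$, and the helper name `gap_for_increasing` are assumptions, not part of the text.

```python
import math

def gap_for_increasing(f, a, b, n):
    """U(f, P) - L(f, P) for an increasing f and n equal subintervals of [a, b]."""
    dx = (b - a) / n
    nodes = [a + i * dx for i in range(n + 1)]
    # For increasing f: M_i = f(x_i) and m_i = f(x_{i-1}).
    return sum((f(nodes[i]) - f(nodes[i - 1])) * dx for i in range(1, n + 1))

a, b = 0.0, 1.0
for n in (10, 100, 1000):
    gap = gap_for_increasing(math.exp, a, b, n)
    telescoped = (b - a) / n * (math.exp(b) - math.exp(a))
    # The two columns should agree up to rounding, and both tend to 0.
    print(n, gap, telescoped)
```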
Proof Pick any $\varepsilon > 0$. By Theorem 8.8 $f$ is uniformly continuous. There exists
$\delta > 0$ such that $|f(x) - f(t)| < \varepsilon/(b - a)$ whenever $|x - t| < \delta$. Let $P$ be a partition
of $[a, b]$ such that $\Delta x_i < \delta$ for every $i$. Clearly $M_i - m_i < \varepsilon/(b - a)$ for every $i$,
and thus
\[ U(f, P) - L(f, P) = \sum_{i=1}^n [M_i - m_i]\,\Delta x_i \le \frac{\varepsilon}{b-a}\,(b - a) = \varepsilon. \]
The idea is to split the partition in two parts: a part on which $M_i - m_i < \delta$, and
the remaining part. Precisely, for $i = 0, \dots, n$ we say that $i \in A$ if $M_i - m_i < \delta$,
and $i \in B$ if $M_i - m_i \ge \delta$. For $i \in A$ we have $M_i^* - m_i^* \le \varepsilon$. For $i \in B$, we have
$M_i^* - m_i^* \le 2K$, where $K = \sup_{x\in[a,b]} |\varphi(x)|$. By (10.4) we have
\[ \delta \sum_{i\in B} \Delta x_i \le \sum_{i\in B} (M_i - m_i)\,\Delta x_i \le \delta^2. \]
Hence
\[ U(h, P) - L(h, P) = \sum_{i\in A} (M_i^* - m_i^*)\,\Delta x_i + \sum_{i\in B} (M_i^* - m_i^*)\,\Delta x_i \]
Proof We take $\varphi(t) = t^2$ in Theorem 10.8. This yields that $f^2$ is R-integrable, and
the identity
shows (a).
Taking $\varphi(t) = |t|$ in Theorem 10.8 shows that $|f|$ is R-integrable. Let $c \in
\{-1, +1\}$ be chosen so that $c\int_a^b f\,dx \ge 0$. Then
\[ \left|\int_a^b f\,dx\right| = c\int_a^b f\,dx = \int_a^b cf\,dx \le \int_a^b |f|\,dx. \]
\[ f^+ = \max\{f, 0\}, \qquad f^- = \max\{-f, 0\}, \]
\[ \begin{aligned}
M &= \sup\{f(x) \mid x \in E\} \\
m &= \inf\{f(x) \mid x \in E\} \\
M' &= \sup\{|f(x)| \mid x \in E\} \\
m' &= \inf\{|f(x)| \mid x \in E\}.
\end{aligned} \]
\[ U(f, Q) = \sum_{i=1}^n \left(\sup\{g(y) \mid y_{i-1} \le y \le y_i\}\right) [\phi(y_i) - \phi(y_{i-1})]. \tag{10.5} \]
\[ L(f, Q) = \sum_{i=1}^n \left(\inf\{g(y) \mid y_{i-1} \le y \le y_i\}\right) [\phi(y_i) - \phi(y_{i-1})]. \tag{10.6} \]
\[ \sum_{i=1}^n |\phi'(s_i) - \phi'(t_i)|\,\Delta y_i < \varepsilon \]
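\[ \sum_{i=1}^n g(s_i)[\phi(y_i) - \phi(y_{i-1})] = \sum_{i=1}^n g(s_i)\,\phi'(t_i)\,\Delta y_i \]
that
\[ \left| \sum_{i=1}^n g(s_i)[\phi(y_i) - \phi(y_{i-1})] - \sum_{i=1}^n g(s_i)\,\phi'(s_i)\,\Delta y_i \right| \le M\varepsilon. \]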
In particular we get
\[ \sum_{i=1}^n g(s_i)[\phi(y_i) - \phi(y_{i-1})] \le U(g\phi', P) + M\varepsilon \]
\[ U(g\phi', P) \le \sum_{i=1}^n g(s_i)[\phi(y_i) - \phi(y_{i-1})] + M\varepsilon \]
How do we actually compute Riemann integrals? Despite all the definitions and all
the results we have proved, there is just one universal approach: we need to find an
antiderivative.
Definition 10.5 A function $F$ is an antiderivative of a function $f$ on the interval
$[a, b]$, if $F$ is differentiable on $[a, b]$, and $F' = f$ at every point.
It is customary to collect under the symbol $\int f$ (or $\int f(x)\,dx$) the set of all
antiderivatives of $f$ (on an interval that is not specified in the notation). The symbol
$D^{-1} f$ would probably be a better choice, but it is not customary in the literature.
10.4 Antiderivatives and the Fundamental Theorem 135
\[ f(x) = \begin{cases} c_1 & \text{if } 0 \le x \le 1 \\ c_2 & \text{if } 2 \le x \le 3 \end{cases} \]
is an antiderivative of the zero function on $[0, 1] \cup [2, 3]$. This shows that
Theorem 10.10 does not hold on domains other than intervals.
Theorem 10.11 (Existence of Antiderivatives) Let f be R-integrable on .[a, b].
For .a ≤ x ≤ b we define
\[ F(x) = \int_a^x f(t)\,dt. \]
Hence for every .ε > 0 we see that .|y − x| < ε/M implies .|F (x) − F (y)| < ε.
Now suppose that f is continuous at .x0 . Given .ε > 0 there exists .δ > 0 such that
.|x − x0 | < δ implies .|f (x) − f (x0 )| < ε. As a consequence, if .x0 − δ < s ≤ x0 ≤
then
\[ \int_a^b f\,dx = F(b) - F(a). \]
Proof Fix any $\varepsilon > 0$. A partition $P = \{x_0, \dots, x_n\}$ of $[a, b]$ exists such that
$U(f, P) - L(f, P) < \varepsilon$. The mean value theorem yields points $t_i \in [x_{i-1}, x_i]$ such
that $F(x_i) - F(x_{i-1}) = f(t_i)\,\Delta x_i$. It follows that $\sum_{i=1}^n f(t_i)\,\Delta x_i = F(b) - F(a)$.
Theorem 10.3 now ensures that
\[ \left| F(b) - F(a) - \int_a^b f\,dx \right| < \varepsilon. \]
The arbitrariness of $\varepsilon > 0$ implies that $F(b) - F(a) = \int_a^b f\,dx$.
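The key step of the proof, namely that a Riemann sum built on the mean value points telescopes exactly to $F(b) - F(a)$, can be watched in a concrete case. The sketch below assumes $F(x) = x^3/3$ and $f(x) = x^2$ on $[0, 2]$, where the mean value point on $[x_{i-1}, x_i]$ can be written explicitly; the function names are ours.

```python
import math

def F(x):
    return x ** 3 / 3.0

def f(x):
    return x ** 2

def mean_value_riemann_sum(a, b, n):
    """Riemann sum of f whose tags t_i satisfy F(x_i) - F(x_{i-1}) = f(t_i) * dx."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left, right = a + (i - 1) * dx, a + i * dx
        # For F = x^3/3 the mean value point solves t^2 = (left^2 + left*right + right^2)/3.
        t = math.sqrt((left * left + left * right + right * right) / 3.0)
        total += f(t) * dx
    return total

for n in (1, 3, 10):
    # Up to rounding, every row shows the same value F(2) - F(0).
    print(n, mean_value_riemann_sum(0.0, 2.0, n), F(2.0) - F(0.0))
```

The sum does not depend on $n$ (up to rounding): this is exactly why the proof only needs $U(f, P) - L(f, P) < \varepsilon$ to conclude.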
Example 10.3 Following Archimedes, we can easily claim that $\int_{-1}^1 x^2\,dx = 2/3$,
since $F(x) = \frac{1}{3}x^3$ is an antiderivative of $f(x) = x^2$ on $[-1, 1]$.
Theorem 10.13 (Integration by Parts) Suppose that $F$ and $G$ are differentiable
functions on $[a, b]$, and $F' = f$, $G' = g$ are R-integrable. Then
\[ \int_a^b F(x)g(x)\,dx = F(b)G(b) - F(a)G(a) - \int_a^b f(x)G(x)\,dx. \]
Proof Let $H = FG$, so that Theorem 10.12 yields $\int_a^b H'(x)\,dx = H(b) - H(a)$.
But $H' = fG + Fg$, and we conclude.
Remark 10.6 It is customary to present integration by parts in a more symmetric
manner:
\[ \int_a^b uv'\,dx = u(b)v(b) - u(a)v(a) - \int_a^b u'v\,dx. \]
Exercise 10.6
(a) Choose $u(x) = \arctan x$ and $v(x) = x^2/2$ to compute the integral
$\int_0^1 x \arctan x\,dx$.
(b) Choose now $u(x) = \arctan x$ and $v(x) = (x^2 + 1)/2$ and repeat the
computations.
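Once the exercise has been done by hand, a quick numerical check can confirm that both choices of $v$ lead to the same value. The sketch below uses a plain midpoint rule; the helper `midpoint_integral` and the step count are our own assumptions, and the code only approximates the integrals.

```python
import math

def midpoint_integral(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / steps
    return sum(g(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Direct approximation of the integral of x * arctan(x) over [0, 1].
direct = midpoint_integral(lambda x: x * math.atan(x), 0.0, 1.0)

# Choice (a): v(x) = x^2 / 2, so u'v = x^2 / (2 (1 + x^2)).
value_a = 0.5 * math.atan(1.0) - midpoint_integral(
    lambda x: x * x / (2.0 * (1.0 + x * x)), 0.0, 1.0)

# Choice (b): v(x) = (x^2 + 1) / 2, so u'v = 1/2 and the remaining integral is trivial.
value_b = 1.0 * math.atan(1.0) - midpoint_integral(lambda x: 0.5, 0.0, 1.0)

print(direct, value_a, value_b)
```

The second choice shows why adding a constant to $v$ can pay off: the factor $1 + x^2$ cancels and nothing is left to integrate.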
Remark 10.7 Theorem 10.12 provides an easier proof of Theorem 10.9. Indeed one
observes that, if $F' = f$, then $(F \circ \phi)' = (f \circ \phi)\phi'$. Hence
\[ \int_A^B f(\phi(y))\,\phi'(y)\,dy = F(\phi(B)) - F(\phi(A)) = F(b) - F(a) = \int_a^b f\,dx. \]
10.5 Problems
\[ \int_a^{+\infty} f\,dx = \lim_{b\to+\infty} \int_a^b f\,dx, \]
provided that this limit exists as a real number. Suppose that $f$ is monotonically
decreasing on $[1, +\infty)$ and that $f(x) \ge 0$ for each $x \ge 1$. Prove that the series
$\sum_{n=1}^{+\infty} f(n)$ converges if and only if $\int_1^{+\infty} f\,dx$ exists as a finite number. This is
sometimes called the test of the improper integral for numerical series.
10.4 Let $p$ and $q$ be two positive real numbers such that $\frac{1}{p} + \frac{1}{q} = 1$. Prove the
following statements.
1. If $u \ge 0$ and $v \ge 0$, then
\[ uv \le \frac{u^p}{p} + \frac{v^q}{q}. \]
\[ \left| \int_a^b fg\,dx \right| \le \left( \int_a^b |f|^p\,dx \right)^{1/p} \left( \int_a^b |g|^q\,dx \right)^{1/q}. \]
This is called the Hölder inequality with conjugate exponents p and q. The case
p = q = 2 is usually called the Cauchy-Schwarz inequality.
10.6 Comments
My personal position is that a young (and also an old) mathematician should have
a very clear view of Riemann’s integral for functions of one variable, while a rough
idea of its extension to functions of several variables is more than enough. The
construction I have proposed in this chapter is standard, and can be found in many
textbooks.
Riemann integration theory is weak and easy in one dimension, but it becomes
weak and tricky in two or more dimensions. For this reason I always recommend
colleagues and students to replace Riemann’s integral in .Rn with the (concrete)
Lebesgue integral as soon as possible.
Chapter 11
Elementary Functions
1 It should be remarked that E does not depend on the index n. From a theoretical view point we
could consider sequences of functions .fn : En → R in which every term is defined on a set .En .
For our purposes such a generality can be troublesome.
2 The nature of E is not particularly relevant. In many concrete cases, E is a subset of $\mathbb{R}$ or of $\mathbb{C}$.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 139
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_11
When .m!x ∈ Z, we have .fm (x) = 1. Otherwise we have .fm (x) = 0. Let .f (x) =
limm→+∞ fm (x).
If x is irrational, then .fm (x) = 0 for every m, so .f (x) = 0. For rational values
of .x = p/q, we see that .m!x is an integer for every .m ≥ q, and therefore .f (x) = 1.
To summarize,
\[ f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational} \\ 1 & \text{if } x \text{ is rational.} \end{cases} \]
This shows that the pointwise limit of very smooth functions may well be a very
irregular function.
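Exercise 11.1 Consider
\[ f_n(x) = \frac{x^2}{(1 + x^2)^n} \]
for $x \in \mathbb{R}$ and $n = 0, 1, 2, 3, \dots$ Let $f(x) = \sum_{n=0}^{\infty} f_n(x)$. Show that $f$ is defined
for every real $x$, and that
\[ f(x) = \begin{cases} 0 & \text{if } x = 0 \\ 1 + x^2 & \text{if } x \ne 0. \end{cases} \]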
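The jump of the limit at $x = 0$ is visible in the partial sums. The sketch below is a numerical illustration only; the function name `partial_sum` and the sample points are our own choices.

```python
def partial_sum(x, terms):
    """Sum of x^2 / (1 + x^2)^n for n = 0, ..., terms - 1."""
    ratio = 1.0 / (1.0 + x * x)
    total, term = 0.0, x * x
    for _ in range(terms):
        total += term
        term *= ratio  # next term of the geometric series
    return total

for x in (0.0, 0.1, 1.0, 3.0):
    # The number of terms needed grows as x approaches 0: convergence is not uniform near 0.
    print(x, partial_sum(x, 10), partial_sum(x, 5000), 1.0 + x * x)
```

Every $f_n$ is continuous, yet the sum equals $0$ at the origin and stays close to $1$ nearby, so the limit cannot be continuous.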
so that
\[ \lim_{n\to+\infty} \int_0^1 f_n(x)\,dx = \lim_{n\to\infty} \frac{n^2}{2n + 2} = +\infty. \]
In particular $0 = \int_0^1 f(x)\,dx \ne \lim_{n\to+\infty} \int_0^1 f_n(x)\,dx$.
Exercise 11.2 Consider instead $f_n(x) = nx(1 - x^2)^n$ for $x \in [0, 1]$. Show that
$\lim_{n\to+\infty} \int_0^1 f_n(x)\,dx = 1/2$, while $f(x) = \lim_{n\to+\infty} f_n(x) = 0$ for every $x \in
[0, 1]$.
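A numerical look at this sequence shows both phenomena at once: the integrals settle near $1/2$ while the values at any fixed point collapse to $0$. The sketch below uses a midpoint rule; the names `f_n` and `integral_f_n` and the step count are assumptions made for illustration, and the comparison value $n/(2n+2)$ is the one obtained from the substitution $u = 1 - x^2$.

```python
def f_n(n, x):
    return n * x * (1.0 - x * x) ** n

def integral_f_n(n, steps=100_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    dx = 1.0 / steps
    return sum(f_n(n, (i + 0.5) * dx) for i in range(steps)) * dx

for n in (1, 10, 100, 1000):
    # Columns: n, approximate integral, n / (2n + 2), value at the fixed point x = 0.3.
    print(n, integral_f_n(n), n / (2 * n + 2), f_n(n, 0.3))
```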
To summarize: the pointwise limit of a sequence of functions does not preserve
continuity, differentiability, integrability. The natural question is whether we can
replace our pointwise convergence with another convergence which preserves these
properties.
The quantity $\sup\{|f_n(x) - f(x)| \mid x \in E\}$ is often denoted by $\|f_n - f\|_{\infty,E}$.
Theorem 11.1 (Cauchy Criterion for Uniform Convergence) A sequence {fn }n
of functions defined on a set E converges uniformly if and only if the Cauchy
condition holds: for every ε > 0 there exists a positive integer N such that m ≥ N,
n ≥ N, x ∈ E imply
Proof Suppose that {fn }n converges uniformly on E to a limit f , and let ε > 0
be fixed. By Definition 11.2 there exists a positive integer N such that n ≥ N and
x ∈ E imply |fn (x) − f (x)| < ε/2. Thus
|fn (x) − fm (x)| ≤ |fn (x) − f (x)| + |fm (x) − f (x)| < ε
.
for every n ≥ N, m ≥ N, x ∈ E.
Conversely, suppose that the Cauchy condition holds. For every x ∈ E, the
numerical sequence {fn (x)}n is then a Cauchy sequence in R, so it converges to
some limit that we call f (x). We need to prove that this convergence is uniform on
E. Let ε > 0 be given, and choose a positive integer N such that |fn (x) − fm (x)| <
ε for every m ≥ N, n ≥ N, x ∈ E. Letting m → +∞, since fm (x) → f (x), we
deduce that
\[ \lim_{t\to x} f_n(t) = A_n \]
11.2 Uniform Convergence 143
We choose $n_0 \ge N$ so that $|f(t) - f_{n_0}(t)| < \varepsilon/3$ for all $t \in E$, and $|A_{n_0} - A| < \varepsilon/3$.
With this $n_0$ we choose a neighborhood $V$ of $x$ such that $t \in V \cap E$, $t \ne x$ imply
$|f_{n_0}(t) - A_{n_0}| < \varepsilon/3$. It follows that $|f(t) - A| < \varepsilon$ for every $t \in V \cap E$, $t \ne x$. The
proof is complete.
Corollary 11.1 If {fn }n is a sequence of continuous functions on E that converges
uniformly to a limit f , then f is a continuous function.
Theorem 11.4 (Uniform Convergence and Differentiation) Suppose that $\{f_n\}_n$
is a sequence of functions, differentiable on $[a, b]$ and such that there exists a point
$x_0 \in [a, b]$ such that $\{f_n(x_0)\}_n$ converges. If $\{f_n'\}_n$ converges uniformly on $[a, b]$,
then $\{f_n\}_n$ converges uniformly on $[a, b]$ to a limit $f$, and
for every n ≥ N, m ≥ N. Let us apply the mean value theorem to fn −fm : Eq. (11.1)
yields
\[ |f_n(x) - f_m(x) - (f_n(t) - f_m(t))| \le \frac{|x - t|\,\varepsilon}{2(b - a)} \le \frac{\varepsilon}{2} \tag{11.2} \]
|fn (x) − fm (x)| ≤ |fn (x) − fm (x) − fn (x0 ) + fm (x0 )| + |fn (x0 ) − fm (x0 )|
.
This proves that {fn }n converges uniformly to a limit that we call f . We fix a point
x ∈ [a, b] and introduce the incremental ratios
for any $t \in [a, b] \setminus \{x\}$. Clearly $\lim_{t\to x} \varphi_n(t) = f_n'(x)$. From (11.2) we see that
$n \ge N$, $m \ge N$ imply
\[ |\varphi_n(t) - \varphi_m(t)| \le \frac{\varepsilon}{2(b - a)}. \]
Hence $\{\varphi_n\}_n$ converges uniformly on $[a, b] \setminus \{x\}$. But $\{f_n\}_n$ converges to $f$, hence
$\lim_{n\to+\infty} \varphi_n(t) = \varphi(t)$ uniformly on $[a, b] \setminus \{x\}$. We conclude from Theorem 11.3
that $\lim_{t\to x} \varphi(t) = \lim_{n\to+\infty} f_n'(x)$. The proof is complete.
Theorem 11.5 (Uniform Convergence and Integration) Suppose that each $f_n$ is
R-integrable on $[a, b]$, and suppose that $f_n \to f$ uniformly on $[a, b]$. Then $f$ is
R-integrable on $[a, b]$ and $\int_a^b f\,dx = \lim_{n\to+\infty} \int_a^b f_n\,dx$.
Proof Let
\[ \int_a^b (f_n - \varepsilon_n)\,dx \le \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx \le \int_a^b (f_n + \varepsilon_n)\,dx. \]
11.3 The Exponential Function 145
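Hence
\[ 0 \le \overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx \le 2\varepsilon_n (b - a). \]
Since $\varepsilon_n \to 0$ as $n \to +\infty$, we conclude that $\underline{\int_a^b} f\,dx = \overline{\int_a^b} f\,dx$, and $f$ is
R-integrable. As before,
\[ \left| \int_a^b f\,dx - \int_a^b f_n\,dx \right| = \left| \int_a^b (f - f_n)\,dx \right| \le \varepsilon_n (b - a), \]
and it follows that $\int_a^b f_n\,dx \to \int_a^b f\,dx$ as $n \to +\infty$.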
Remark 11.2 The rigidity of the Riemann integral under passage to the limit is one
of the reasons why it has been superseded by more flexible integrals. We will meet
Lebesgue’s generalization in Chap. 15.
It has been said that the exponential function is the most important function in
Mathematical Analysis. We propose a definition which entails a lot of useful
properties.
Definition 11.3 For each $z \in \mathbb{C}$, we define
\[ \exp z = \sum_{n=0}^{\infty} \frac{z^n}{n!}. \]
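The defining series converges fast enough to be summed directly on a computer. The sketch below truncates it after a fixed number of terms (an assumption that is adequate only for moderate $|z|$) and compares the result with the library function `cmath.exp`; the name `exp_series` is ours.

```python
import cmath

def exp_series(z, terms=60):
    """Partial sum of the series z^n / n! for n = 0, ..., terms - 1."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turns z^n / n! into z^(n+1) / (n+1)!
    return total

z, w = 0.5 + 1.0j, -1.2 + 0.3j
print(exp_series(z), cmath.exp(z))
# Numerical evidence for the functional equation exp(z) exp(w) = exp(z + w).
print(exp_series(z) * exp_series(w), exp_series(z + w))
```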
. exp : C → C
3 We refrain from writing $e^z$ instead of $\exp z$. This is only a pedagogical choice, since we want to
prevent the reader from believing that all properties of this function are trivial since we are dealing
with an ordinary power.
This proves 1. Part 2 is clear from the definitions. Using 1 and 2 we see that
$\exp(-z) \exp z = 1$ for every $z \in \mathbb{C}$, and both 3 and 4 follow. To prove 5, we fix
\[ \delta = \min\left\{ 1, \frac{\varepsilon}{2|\exp z|} \right\}. \]
4. $\lim_{x\to-\infty} \exp x = 0$.
6. $\lim_{x\to+\infty} x^{-n} \exp x = +\infty$ for every integer $n \ge 0$ and every $x \in \mathbb{R}$.
Proof We first notice that .exp(R) ⊂ R. Since .x > 0 implies .exp x > 1 + x, 1
follows for .x ≥ 0. If .x < 0, .exp(−x) = 1/ exp x, and 1 follows also in this case. To
prove 2, we fix .x ∈ R and .h > 0. Then .exp(x + h) = exp x · exp h > exp x since
.exp h > 1. Similarly, .limx→+∞ exp x ≥ limx→+∞ (1 + x) = +∞, and 3 follows.
\[ \lim_{x\to-\infty} \exp x = \lim_{x\to+\infty} \exp(-x) = \lim_{x\to+\infty} \frac{1}{\exp x}, \]
we see that 4 follows from 3. The proof of 6 goes as follows: $x > 0$ implies
\[ x^{-n} \exp x > x^{-n}\,\frac{x^{n+1}}{(n + 1)!} = \frac{x}{(n + 1)!} \]
To conclude, 5 follows from 1, 3, 4 and the fact that .exp is injective on .R.
Theorem 11.7 The restriction of $\exp$ to $\mathbb{R}$ is differentiable at every point, and there
results $\exp' = \exp$.
Proof Just differentiate the series of functions that defines the exponential, and use
Theorem 11.4.
Once the exponential function has been introduced, the logarithm comes into play as
its inverse.
Definition 11.4 The (real) logarithm is defined as
\[ \log = \left( \exp|_{\mathbb{R}} \right)^{-1}. \]
9. $\log(a^{1/n}) = (1/n) \log a$ for every $a > 0$ and $n \in \mathbb{N}$.
10. $\lim_{x\to+\infty} \frac{\log x}{\sqrt[n]{x}} = 0$ for every $n \in \mathbb{N}$.
Proof Properties from 1 to 6 follow from the analogous properties of the exponen-
tial. For .a > 0 and .b > 0 we have
and this proves 7. Properties 8 and 9 are left as a simple exercise about induction.
To prove 10 we fix .ε > 0 and .n ∈ N. By Theorem 11.6 we can choose .α > 1 such
that .y > α implies .y −n exp y > ε−n . By property 4 there exists .β > 1 such that
or
\[ 0 < \frac{\log x}{\sqrt[n]{x}} < \varepsilon. \]
Theorem 11.9 The function $\log$ is differentiable at every point of $(0, +\infty)$, and
there results $(\log)'(y) = 1/y$ for every $y > 0$.
Proof Exercise on the derivative of the inverse function!
Remark 11.3 Another approach is possible, which avoids the use of series. The first
idea is to define a function $\log : (0, +\infty) \to \mathbb{R}$ by
\[ \log x = \int_1^x \frac{dt}{t}. \]
Here we are using the convention $\int_a^b = -\int_b^a$. The main properties of the real
logarithm follow at once, in particular the fact that $\log$ has a continuous inverse.
We call the inverse the real exponential function.
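The integral definition of the logarithm can be tested numerically. The sketch below approximates $\int_1^x dt/t$ with a midpoint rule and compares it with `math.log`; the name `log_by_integral` and the step count are our own assumptions. For $0 < x < 1$ the step is negative, which reproduces the sign convention mentioned in the remark.

```python
import math

def log_by_integral(x, steps=100_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x (x > 0)."""
    dx = (x - 1.0) / steps  # negative when x < 1
    return sum(1.0 / (1.0 + (i + 0.5) * dx) for i in range(steps)) * dx

for x in (0.5, 2.0, 3.0, 6.0):
    print(x, log_by_integral(x), math.log(x))

# Numerical evidence for log(ab) = log a + log b with a = 2, b = 3.
print(log_by_integral(6.0), log_by_integral(2.0) + log_by_integral(3.0))
```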
Proof Observe that (−1)n z2n = (iz)2n and i(−1)n z2n+1 = (iz)2n+1 . Then 1
follows from the definition of sin and cos. Property 2 follows by replacing z by
−z in 1. If we solve the system
\[ \frac{2^{2n}}{(2n)!} > \frac{2^{2n+2}}{(2n + 2)!}, \]
hence
\[ \cos 2 = 1 - \frac{2^2}{2!} + \frac{2^4}{4!} - \frac{2^6}{6!} + \cdots = -1 + \frac{16}{24} - \cdots < -1 + \frac{2}{3} < 0. \]
Since $x \mapsto \cos x$ is a continuous real-valued function on $[0, 2]$ and $\cos 0 = 1 > 0$,
it follows that the set
\[ A = \{x \in [0, 2] \mid \cos x = 0\} \]
functions until they can be rigorously defined. Calculus is built around elementary
functions, but it does not provide sufficient tools to define them without any
reference to geometric or intuitive facts.
Remark 11.4 The approach to trigonometric functions without power series is
slightly involved. A possible approach is to begin with $\arctan : \mathbb{R} \to (-\pi/2, \pi/2)$
in terms of a definite R-integral:
\[ \arctan x = \int_0^x \frac{dt}{1 + t^2}. \]
Two cases are possible. If η > 0, then we pick h ∈ H and m ∈ Z such that
\[ f(x) = \begin{cases} 0 & \text{if } x \text{ is rational} \\ 1 & \text{if } x \text{ is irrational.} \end{cases} \]
It is clear that no smallest positive period exists, although f is surely periodic in the
sense of Definition 11.6.
Polynomials are the most elementary functions of mathematical analysis. They are
built on arithmetic operations, and they turn out to be a flexible class of infinitely
differentiable functions. Of course not all functions are polynomial.
Exercise 11.4 Prove rigorously that not all functions are polynomials. Hint: if $P$ is
a non-constant polynomial, then $\lim_{x\to\pm\infty} |P(x)| = +\infty$.
Nevertheless, polynomials do approximate continuous functions in a strong way.
Theorem 11.14 (Weierstrass Approximation Theorem) If .f : [a, b] → R
is a continuous function and .ε > 0, there exists a polynomial P such that
.supx∈[a,b] |f (x) − P (x)| < ε.
that .f (0) = f (1) = 0. Indeed, the function f may be replaced by the function
This function differs from f by a polynomial (of degree .≤ 1), so that a uniform
approximation of this function by means of polynomials implies that f is approxi-
mated by polynomials.
Finally, for technical reasons, we define .f (x) = 0 for each .x ∈ R \ [0, 1]. Hence
f is defined on the whole real line. For each .n = 1, 2, 3, . . . define
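\[ c_n = \frac{1}{\int_{-1}^1 (1 - x^2)^n\,dx} \]
and
\[ Q_n(x) = c_n (1 - x^2)^n. \]
Trivially, $\int_{-1}^1 Q_n(x)\,dx = 1$ for each $n$. Furthermore,
\[ \begin{aligned}
\int_{-1}^1 (1 - x^2)^n\,dx &= 2 \int_0^1 (1 - x^2)^n\,dx \\
&\ge 2 \int_0^{1/\sqrt{n}} (1 - x^2)^n\,dx \\
&\ge 2 \int_0^{1/\sqrt{n}} (1 - nx^2)\,dx \\
&= \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}}.
\end{aligned} \]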
As a consequence, $c_n < \sqrt{n}$ for each $n$. Now, fix any $\delta > 0$, and observe that
$\delta \le |x| \le 1$ implies
\[ Q_n(x) \le \sqrt{n}\,(1 - \delta^2)^n. \]
The right-hand side converges to zero, hence $Q_n$ converges to zero uniformly in the
region $\delta \le |x| \le 1$, i.e. away from zero.
We introduce the sequence of functions
\[ P_n(x) = \int_{-1}^1 f(x + t)\,Q_n(t)\,dt, \]
11.6 A Continuous Non-differentiable Function 153
\[ P_n(x) = \int_0^1 f(t)\,Q_n(t - x)\,dt, \]
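\[ \begin{aligned}
|P_n(x) - f(x)| &= \left| \int_{-1}^1 (f(x + t) - f(x))\,Q_n(t)\,dt \right| \\
&\le \int_{-1}^1 |f(x + t) - f(x)|\,Q_n(t)\,dt \\
&\le 2M \int_{-1}^{-\delta} Q_n(t)\,dt + \frac{\varepsilon}{2} \int_{-\delta}^{\delta} Q_n(t)\,dt + 2M \int_{\delta}^{1} Q_n(t)\,dt \\
&\le 4M \sqrt{n}\,(1 - \delta^2)^n + \frac{\varepsilon}{2} \\
&< \varepsilon
\end{aligned} \]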
Proof Let us start with .ϕ(x) = |x|, for each .x ∈ [−1, 1]. Then we extend it by
periodicity to .R, i.e. .ϕ(x + 2) = ϕ(x) for each .x ∈ R. It is clear that
|ϕ(s) − ϕ(t)| ≤ |s − t|
. (11.3)
The M-test 11.2 shows that f is defined by a series that converges uniformly on .R,
so that f is a continuous function. We claim that f is differentiable at no point of
.R. To prove this claim, we pick any .x ∈ R and any positive integer m. Let
\[ \delta_m = \pm \frac{1}{2} \cdot 4^{-m}, \]
where the sign is chosen so that the interval $[4^m x, 4^m (x + \delta_m)]$ contains no integer.
Since $4^m |\delta_m| = 1/2$, this is indeed possible. Consider now the incremental ratio
\[ \gamma_n = \frac{\varphi(4^n (x + \delta_m)) - \varphi(4^n x)}{\delta_m}. \]
\[ \ge 3^m - \sum_{n=0}^{m-1} 3^n = \frac{1}{2}\left( 3^m + 1 \right). \]
Letting .m → +∞ we get .δm → 0. Hence f is not differentiable at x, and the proof
is complete.
Remark 11.5 Such a function is a typical example of a fractal curve, whose graph
is essentially impossible to sketch. Our function f is based on the function .ϕ, which
is already irregular at countably many points. However, Weierstrass constructed a
11.7 Asymptotic Estimates for the Factorial Function 155
\[ ab > 1 + \frac{3}{2}\pi. \]
The function f is then a trigonometric series, and each term of the infinite sum is a
smooth function. Once more we see that a limit of regular functions may be a very
irregular function.
\[ (-1)!! = 1, \qquad 0!! = 1, \qquad (n + 1)!! = (n + 1) \cdot (n - 1)!! \]
Exercise 11.5 Prove that .n!! is the product of all odd numbers .m ≤ n when n is
odd, and it is the product of all even numbers .m ≤ n when n is even.
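Theorem 11.16 (Wallis Integrals) If $n \in \mathbb{N}$, then
\[ \int_0^{\pi/2} (\sin x)^{2n+1}\,dx = \frac{(2n)!!}{(2n + 1)!!}. \tag{11.4} \]
\[ \int_0^{\pi/2} (\sin x)^{2n}\,dx = \frac{\pi}{2}\,\frac{(2n - 1)!!}{(2n)!!}. \tag{11.5} \]
\[ \int_0^{\pi/2} (\sin x)^{2n+1}\,dx \le \int_0^{\pi/2} (\sin x)^{2n}\,dx \le \int_0^{\pi/2} (\sin x)^{2n-1}\,dx. \tag{11.6} \]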
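Formulas (11.4) and (11.5) can be compared with a direct numerical integration. The sketch below implements the double factorial with the conventions $(-1)!! = 0!! = 1$ and approximates the integrals by a midpoint rule; the function names and the step count are our own assumptions.

```python
import math

def double_factorial(n):
    """n!! with the conventions (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def sin_power_integral(m, steps=20_000):
    """Midpoint-rule approximation of the integral of (sin x)^m over [0, pi/2]."""
    dx = (math.pi / 2) / steps
    return sum(math.sin((i + 0.5) * dx) ** m for i in range(steps)) * dx

for n in range(5):
    odd_formula = double_factorial(2 * n) / double_factorial(2 * n + 1)
    even_formula = (math.pi / 2) * double_factorial(2 * n - 1) / double_factorial(2 * n)
    # Columns: n, numerical (11.4), formula (11.4), numerical (11.5), formula (11.5).
    print(n, sin_power_integral(2 * n + 1), odd_formula,
          sin_power_integral(2 * n), even_formula)
```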
Proof For every $x \in [0, \pi/2]$ we have $0 \le \sin x \le 1$. Hence, for any such $x$,
the sequence $n \mapsto (\sin x)^n$ is decreasing. In particular (11.6) follows at once from
the monotonicity properties of the Riemann integral. We prove (11.4) and (11.5) by
induction on $n$. They clearly hold true when $n = 0$, and we integrate by parts as
follows:
\[ \begin{aligned}
\int_0^{\pi/2} (\sin x)^m\,dx &= \int_0^{\pi/2} (\sin x)^{m-2} (1 - \cos^2 x)\,dx \\
&= \int_0^{\pi/2} (\sin x)^{m-2}\,dx - \int_0^{\pi/2} \cos x \cdot (\sin x)^{m-2} \cos x\,dx \\
&= \int_0^{\pi/2} (\sin x)^{m-2}\,dx - \frac{1}{m - 1} \int_0^{\pi/2} (\sin x)^m\,dx,
\end{aligned} \]
\[ \frac{(2n)!!}{(2n - 1)!!} \sim \sqrt{n\pi}. \tag{11.7} \]
\[ \binom{2n}{n} \sim \frac{2^{2n}}{\sqrt{n\pi}} \tag{11.8} \]
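\[ \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdot 8 \cdot 8 \cdots 2n \cdot 2n}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdot 7 \cdots (2n - 1) \cdot (2n - 1) \cdot (2n + 1)} = \frac{\pi}{2} + o(1). \tag{11.9} \]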
Proof We set
\[ q = \frac{\sqrt{n\pi}}{\dfrac{(2n)!!}{(2n - 1)!!}} \]
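\[ \begin{aligned}
q &= \frac{2n}{\sqrt{n\pi}} \int_0^{\pi/2} (\sin x)^{2n}\,dx \ge \frac{2n}{\sqrt{n\pi}} \int_0^{\pi/2} (\sin x)^{2n+1}\,dx \\
&= \frac{2n}{\sqrt{n\pi}}\,\frac{(2n)!!}{(2n - 1)!!}\,\frac{1}{2n + 1} = \frac{1}{q}\,\frac{2n}{2n + 1}
\end{aligned} \]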
and
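\[ \begin{aligned}
q &= \frac{2n}{\sqrt{n\pi}} \int_0^{\pi/2} (\sin x)^{2n}\,dx \le \frac{2n}{\sqrt{n\pi}} \int_0^{\pi/2} (\sin x)^{2n-1}\,dx \\
&= \frac{2n}{\sqrt{n\pi}}\,\frac{(2n)!!}{(2n - 1)!!}\,\frac{1}{2n} = \frac{1}{q}.
\end{aligned} \]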
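Theorem 11.18 (Stirling) As $n \to +\infty$,
\[ n! \sim n^n e^{-n} \sqrt{2n\pi}. \]
\[ x_n = n!\,e^n\,n^{-n-1/2}. \]
\[ \log x_{n+1} - \log x_n = \log \frac{x_{n+1}}{x_n} = -\left( n + \frac{1}{2} \right) \log \frac{n + 1}{n} + 1 \sim -\frac{1}{12n^2} \]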
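as $n \to +\infty$. As a consequence, the series $\sum_n (\log x_{n+1} - \log x_n)$ converges, and
\[ k = \lim_{n\to+\infty} \log x_n \]
exists in $\mathbb{R}$. Thus $n! \sim n^n e^{-n} \sqrt{n}\,e^k$. Inserting this into (11.8) we see that
\[ \frac{2^{2n}}{\sqrt{n\pi}} \sim \binom{2n}{n} = \frac{(2n)!}{(n!)^2} \sim \frac{(2n)^{2n} e^{-2n} \sqrt{2n}\,e^k}{n^{2n} e^{-2n}\,n\,e^{2k}} = \frac{2^{2n} \sqrt{2}}{\sqrt{n}\,e^k} = \frac{2^{2n}}{\sqrt{n\pi}}\,\frac{\sqrt{2\pi}}{e^k}, \]
and necessarily $e^k = \sqrt{2\pi}$. The proof is complete.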
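The two quantitative facts used in this proof can be observed numerically: the sequence $x_n = n!\,e^n n^{-n-1/2}$ approaches $\sqrt{2\pi}$, and consecutive differences of $\log x_n$ behave like $-1/(12n^2)$. The sketch below works with logarithms through `math.lgamma` (recall that `lgamma(n + 1)` is $\log n!$) in order to avoid overflow; the name `log_x` is ours.

```python
import math

def log_x(n):
    """log of x_n = n! * e^n * n^(-n - 1/2), computed through lgamma(n + 1) = log(n!)."""
    return math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)

for n in (1, 10, 100, 10_000, 1_000_000):
    # Second column should approach sqrt(2 * pi), shown in the third column.
    print(n, math.exp(log_x(n)), math.sqrt(2 * math.pi))

for n in (10, 100, 1000):
    # The rescaled difference should approach -1.
    print(n, (log_x(n + 1) - log_x(n)) * 12 * n * n)
```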
11.8 Problems
For what values of x does the series converge absolutely? In what intervals does the
series converge uniformly? If the series converges, is f a continuous function?
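11.2 For each $n = 1, 2, 3, \dots$ let
\[ f_n(x) = \begin{cases}
0 & \text{if } x < \frac{1}{n+1} \\
\sin^2 \frac{\pi}{x} & \text{if } \frac{1}{n+1} \le x \le \frac{1}{n} \\
0 & \text{if } x > \frac{1}{n}.
\end{cases} \]
Prove that the sequence $\{f_n\}_n$ converges pointwise to a continuous function, but the
convergence is not uniform.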
11.3 For each real number x, let {x} = x − [x], where [x] denotes the integer part
of x. Let
\[ f(x) = \sum_{n=1}^{\infty} \frac{\{nx\}}{n^2}. \]
Find all discontinuity points of f , and prove that these points form a dense,
countable subset of R. Prove also that f is R-integrable on each bounded interval.
\[ \int_0^1 f(x)\,x^n\,dx = 0 \quad \text{for } n = 0, 1, 2, 3, \dots \]
\[ K_1 \supset K_2 \supset K_3 \supset \dots, \]
Abstract This chapter introduces the second part of the book. Why should we
bother about set theory, again? And what is axiomatic set theory? In some sense,
learning mathematics has a privileged direction: we take something for granted, and
then we proceed. Mathematical analysts are usually satisfied with a good knowledge
of naïve set theory, since this is all they need to do their job. Going backwards
is another story. We are always worried by primitive knowledge, in the sense of
something that we agree to know before we start our journey. So, what is a set?
Why do textbooks begin with the definition of a function as a black box that turns
an element (of a set) into another element (of another set)? And, after all: is this the
only way to begin?
The roots of mathematics are close to philosophy, in the sense that a beginning must
exist. If nothing exists, how can we exist? We only (!) have to choose where to start
from.
It is a general agreement that set theory is indeed the first mathematical chapter
in the big book of all Mathematics. The rigorous construction of sets, functions and
all that has been a long and recent process. If we agree that mathematics should set
axioms and deduce theorems, we should isolate the axioms of set theory.
Nowadays the most popular axiomatization of set theory is ZF, or Zermelo-
Fraenkel. Another axiomatization is due to Bernays, Gödel and Von Neumann. It
is not our purpose to discuss the pros and cons of each theory, since this is a deep
aspect that soon involves mathematical logic. Since this is a book in mathematical
analysis, we prefer to present an overview of two lesser-known theories of sets. The first
one is due to John Kelley, see [3]. The second one is actually a variation on Kelley’s
theme, due to J. D. Monk. These theories share the simple approach of considering
only classes, an undefined category that contains sets. The idea of elements is not a
different object: everything is a class. For the reader’s convenience, a short account
of ZF is also provided.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 163
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_12
164 12 Return to Set Theory
We will see that the word “class” does not appear in any axiom of Kelley’s theory:
instead of assuming the existence of sets, Kelley assumes the existence of classes,
and sets are just special classes.
The natural question now is: why don’t we simply agree that “set = class”? Of
course this is a matter of language, but we always keep in mind that the set of all
sets is not a set.
Important: Variables
We stick here to Kelley’s original habit of using variables in a broad sense, so that
any object should be considered as a class unless otherwise stated. Another popular
approach is to reserve lower case letters to sets, and upper case letters to classes.
Axiom of Extent For each x and each y, it is true that .x = y if and only if
for each z, .z ∈ x if and only if .z ∈ y.
Two classes are equal if and only if each element of each is a member of the
other. The reader will notice that this is the usual definition of equal sets. And here
is the rigorous definition of a set, at last! Sets are just elements of some class.
Definition 12.1 A class x is a set if and only if there exists a class y such that .x ∈ y.
Remark 12.2 Maybe some reader will remember a common approach to naïve set
theory: whenever we name a set, we must agree that it is a subset of some larger set.
This is a naïve response to the paradox of the set of all sets: since we agree to work
12.1 Kelley’s System of Axioms 165
with subsets of a fixed large set, we will never have to deal with the set of all sets.
This is a useful agreement, but it does not face the deep meaning of the paradox.
Now that we have sets, we want to describe the use of the classifier .{. . . | . . .}.
There are two blanks: the first blank for a variable, and a second blank for a formula.
u ∈ {x | . . . x . . .}
.
. ...x ...
is a formula and
. ...u...
9. .{α | A} ∈ β
10. .{α | A} ∈ {β | B}
(c) Formulae are constructed recursively, beginning with the primitive formulae of
(a) and then proceeding via the constructions allowed by (b).
Important: Braces
In old books a peculiar use of braces was common. For instance, one would write
.M = {m} to mean the “typical” element of the class/set M is m. In modern
mathematics and according to our use of the classifier .{. . . | . . .}, .M = {m} means
.{x | x = m}, so that the class/set M contains exactly one element, i.e. m. Please
Luckily enough, our axioms already allow us to introduce new definitions and
prove some theorems. Let us see some of them.
Definition 12.2 (Union of Classes) .x ∪ y = {z | z ∈ x or z ∈ y}.
Definition 12.3 (Intersection of Classes) .x ∩ y = {z | z ∈ x and z ∈ y}.
Theorem 12.1 For each z, .z ∈ x ∪ y if and only if .z ∈ x or .z ∈ y. For each z,
z ∈ x ∩ y if and only if .z ∈ x and .z ∈ y.
.
Proof It follows from the classification axiom that .z ∈ x ∪ y if and only if z is a set,
and .z ∈ x or .z ∈ y. Recalling Definition 12.1, .z ∈ x or .z ∈ y and z is a set if and
only if .z ∈ x or .z ∈ y. By the same token one proves the second statement about
the intersection.
Remark 12.3 Some reader may think that we are playing a strange game. This is
more or less the case. We should always remember that mathematics is concerned
with proving theorems from axioms. In this particular case, it would be much
stranger if our theorems were unexpected ones! We are trying to define rigorously
what we handle every day, this is the point.
Theorem 12.2 For each x, .x ∪ x = x, and .x ∩ x = x.
Proof Every element of x is a member of .x ∪ x, by definition of the union.
Conversely, if .z ∈ x ∪ x, then z is a set and .z ∈ x or .z ∈ x. Hence .z ∈ x. The
second statement is left as an exercise.
Exercise 12.1 Prove the theorem .x ∩ x = x by mimicking the previous proof.
We collect now six statements that follow directly from the properties of the logical
connectives “and”, “or”.
Theorem 12.3 For each x, y and z,
1. .x∪y =y∪x
2. .x∩y =y∩x
3. .(x ∪ y) ∪ z = x ∪ (y ∪ z)
4. .(x ∩ y) ∩ z = x ∩ (y ∩ z)
5. .x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z)
6. .x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z).
Exercise 12.2 Prove the previous Theorem, by showing that any two classes
separated by the symbol .= have the same members.
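Before writing the proofs, the six identities can be checked by brute force on a small example. The sketch below is only a finite sanity check, not a proof: it models classes by the subsets of a three-element set (our assumption), uses Python's `frozenset` operators `|` and `&` for union and intersection, and the function names are illustrative.

```python
from itertools import combinations, product

def all_subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def theorem_12_3_holds(universe):
    """Check the six identities for every triple of subsets of the given finite set."""
    subsets = all_subsets(universe)
    for x, y, z in product(subsets, repeat=3):
        identities = (
            x | y == y | x,
            x & y == y & x,
            (x | y) | z == x | (y | z),
            (x & y) & z == x & (y & z),
            x & (y | z) == (x & y) | (x & z),
            x | (y & z) == (x | y) & (x | z),
        )
        if not all(identities):
            return False
    return True

print(theorem_12_3_holds({0, 1, 2}))
```

Equality of `frozenset` objects is decided by comparing members, which mirrors the axiom of extent used in the exercise.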
Definition 12.4 For each x and y, $x \notin y$ if and only if it is false that $x \in y$.
Definition 12.5 For each x, $\complement x = \{y \mid y \notin x\}$. The class $\complement x$ is the complement of
the class x.
Theorem 12.4 For each x, $\complement \complement x = x$.
Proof By Definition 12.5, $\complement \complement x$ is the class
\[ \{y \mid y \notin \complement x\} = \{y \mid \text{it is false that } y \in \complement x\} = \{y \mid y \in x\}. \]
Theorem 12.5 (De Morgan Laws) For each x and y, $\complement(x \cup y) = \complement x \cap \complement y$, and
$\complement(x \cap y) = \complement x \cup \complement y$.
Proof The second statement is left as an exercise. Let us see why the first
statement is true. For each z, $z \in \complement(x \cup y)$ if and only if z is a set and it is false
that $z \in x \cup y$. Recalling Theorem 12.1, $z \in x \cup y$ if and only if $z \in x$ or $z \in y$.
Consequently, $z \in \complement(x \cup y)$ if and only if z is a set and $z \notin x$ and $z \notin y$. This means
exactly that $z \in \complement x$ and $z \in \complement y$. By Theorem 12.1, $z \in \complement x$ and $z \in \complement y$ if and only if
$z \in \complement x \cap \complement y$. We conclude by the axiom of extent.
Exercise 12.3 Prove that $\complement(x \cap y) = \complement x \cup \complement y$ by mimicking the previous proof.
Definition 12.6 For each x and y, $x \setminus y = x \cap \complement y$.
Remark 12.4 We use the modern symbol $\setminus$ throughout the book. Other symbols are
found in the literature, like $x - y$ or $x \sim y$. Kelley systematically writes $\sim x$ instead
of $\complement x$.
Remark 12.5 In naïve set theory, the class $\complement x$ is usually undefined as a primitive
object. The reason is that all sets must be subsets of some universe U. Hence $\complement x$
must reduce to $U \setminus x$. In other words, only the relative complement of a set must be
defined in naïve set theory.
Definition 12.7 (Empty Class) $\emptyset = \{x \mid x \ne x\}$.
Remark 12.6 The empty class is denoted by 0 in [3]. We prefer to use the dedicated
symbol .∅, since 0 is used with a lot of different meanings in mathematics. Another
popular notation is .{}.
Theorem 12.6 For each x, $x \notin \emptyset$.
Proof By definition equality is reflexive, in the sense that for each x, $x = x$. Hence
it is false that $x \ne x$, and therefore $\emptyset$ cannot contain any element.
Exercise 12.4 Prove that for each x, .∅ ∪ x = x and .∅ ∩ x = ∅.
Definition 12.8 (Universal Class) .U = {x | x = x}. The class .U is called the
universe.
Important: Warning
Beware! .U is not a set! Look ahead in this chapter.
Remark 12.7 We are using here an intrinsic notation. Most readers are probably
familiar with the bound variable notation for a “set of sets”, as we have seen in (2.1)
and (2.2).
since the right-hand side looks like a union of elements, which is naïvely undefined
(what is $0 \cup 1$, if 0 and 1 represent the usual natural numbers we first met at school?).
Well, this is precisely how and when abstraction is needed. To be honest, in naïve
set theory nobody ever writes $\bigcup \{a, b, c\}$, and the issue disappears. But a, b and c
may very well be sets (or classes)!
The popular belief that lower-case letters are elements and upper-case letters are
sets/classes is, in this perspective, tragic. It should be encouraged only if a basic
knowledge of Set Theory is sufficient.
Theorem 12.9 ⋂∅ = U and ⋃∅ = ∅.
Proof For each x, x ∈ ⋂∅ if and only if x is a set and x belongs to each member
of ∅. We already know that ∅ contains no member at all, hence x ∈ ⋂∅ if and only
if x is a set, i.e. x ∈ U.
Similarly, for each x, x ∈ ⋃∅ if and only if x is a set and x belongs to some
member of ∅. Since ∅ has no member, such an x cannot exist. Hence ⋃∅ contains
no member.
Definition 12.11 For each x and y, x ⊂ y if and only if for each z, if z ∈ x then
z ∈ y. In this case we say that x is a subclass of y, or that the class x is contained in
y.
Remark 12.8 It is tempting to confuse ∈ and ⊂. The language does not help, since
x ∈ y is often read “y contains x”. However it would be definitely wrong to use ⊂
as a replacement of ∈.
Exercise 12.6 Prove that ∅ ⊂ ∅, but ∅ ∉ ∅.
We collect some basic properties of subclasses, whose proofs are a straightfor-
ward application of the definition.
Theorem 12.10 For each x, y and z,
1. ∅ ⊂ x and x ⊂ U.
2. x = y if and only if x ⊂ y and y ⊂ x.
3. If x ⊂ y and y ⊂ z, then x ⊂ z.
4. x ⊂ y if and only if x ∪ y = y.
5. x ⊂ y if and only if x ∩ y = x.
6. If x ⊂ y, then ⋃x ⊂ ⋃y and ⋂y ⊂ ⋂x.
7. If x ∈ y, then x ⊂ ⋃y and ⋂y ⊂ x.
Exercise 12.7 Prove the previous Theorem. When union and intersection of classes
are involved, it could be helpful to temporarily switch to a bound variable notation,
e.g. ⋃_{α∈A} x_α instead of ⋃x.
Question Do sets exist?
In words, given a set x there exists a set y such that any subclass of x is a member
of y. Let us see a useful consequence of this axiom.
Theorem 12.11 If x is a set and .z ⊂ x, then z is a set.
Proof If x is a set, there is a set y such that .z ⊂ x implies .z ∈ y. Hence z is a
set.
Exercise 12.8 Read carefully the previous proof, and notice that the axiom of
subsets was not used in its full strength. Hint: by definition, if y is a class and .z ∈ y,
then z is a set. Did we use every property of the classes in the axiom of subsets?
Theorem 12.12 ∅ = ⋂U and U = ⋃U.
Proof If x ∈ ⋂U, then x is a set and since ∅ ⊂ x, it follows that ∅ is also
a set. Then ∅ ∈ U and each member of ⋂U belongs to ∅. It now follows that ⋂U has
no member.
To prove the second statement, Theorem 12.10 implies that ⋃U ⊂ U. On the
other hand, if x ∈ U (i.e. if x is a set) by the axiom of subsets there exists a set y
12.1 Kelley’s System of Axioms 171
2^x = {y | y ⊂ x}.
Theorem 12.14 U = 2^U.
Proof Every element of 2^U is a set and therefore belongs to U. Conversely, every
member of U is a set and is contained in U, so that it belongs to 2^U.
Theorem 12.15 If x is a set, then 2^x is a set, and for each y, y ⊂ x if and only if
y ∈ 2^x.
Exercise 12.9 Prove the previous Theorem, by using the fact that U = 2^U.
We are now ready for the Russell paradox, which has been haunting us so far.
Example 12.1 Let R = {x | x ∉ x}. By the classification axiom, R ∈ R if and only
if R ∉ R and R is a set. Therefore R is not a set.
Theorem 12.16 .U is not a set.
Proof Otherwise .R ⊂ U, and R would be a set by Theorem 12.11.
Definition 12.13 (Singleton) For each x, .{x} = {z | if x ∈ U, then z = x}.
Theorem 12.17 If x is a set, for each y, .y ∈ {x} if and only if .y = x.
Exercise 12.10 Prove the previous Theorem.
Theorem 12.18 1. If x is a set, then .{x} is a set.
2. For each x, .{x} = U if and only if x is not a set.
Proof If x is a set, then {x} ⊂ 2^x and 2^x is a set. Hence {x} is a set. This proves 1.
Let us prove 2. If x is a set, then {x} is a set and therefore is not equal to U (since
U is not a set). If x is not a set, then x ∉ U, so the condition “if x ∈ U, then z = x”
holds for every set z, and {x} = U by definition.
Remark 12.9 The symbol .{xy} has some advantages over .{x, y}. However we
prefer the standard notation with a comma which cannot be confused with “the
singleton of the product xy”, as soon as a product of x and y is defined and denoted
by xy.
If x and y are sets, by the axiom of union also {x, y} is a set. In general, however,
{x, y} is a class.
Theorem 12.19 contains the essential information about ordered couples. For the
sake of completeness we record now a deeper survey of both unordered and ordered
couples. We omit their proofs, since we will not refer to them in the sequel.
Theorem 12.20
1. If x is a set and y is a set, then .{x, y} is a set and .z ∈ {x, y} if and only if .z = x
or .z = y. Furthermore, .{x, y} = U if and only if x is not a set or y is not a set.
2. For each x and y, .(x, y) is a set if and only if x is a set and y is a set. If .(x, y) is
not a set, then .(x, y) = U.
Theorem 12.21 For all x and y, there results
1. ⋂⋂(x, y) = x;
2. ⋂⋂⋃{(x, y)}^{−1} = y.
Proof Indeed statement 1. follows from
⋂⋂(x, y) = ⋂⋂{{x}, {x, y}} = ⋂{x} = x.
Statement 2. follows from 1. and the fact that ⋃{(x, y)}^{−1} = (y, x).
We are therefore led to the following definition, which is mainly interesting in
the case of an ordered pair.
We call .π1 x the first coordinate of x, and .π2 x the second coordinate of x.
We now want to define functions. As we saw in Chap. 1, a function is a particular
type of relation, and relations were defined as sets of ordered pairs. It is by now
clear that we can formulate the following definition.
Definition 12.17 A class r is a relation if and only if for each member z of r there
exist x and y such that .z = (x, y).
Remark 12.10 We state clearly that a relation is not a set of ordered pairs. It is a
class of ordered pairs.
Definition 12.18 The composition r ◦ s of the relations r and s is
r ◦ s = {u | for some x, some y and some z, u = (x, z), (x, y) ∈ s and (y, z) ∈ r}.
Remark 12.11 To save space, we will abbreviate .{u | for some x, some z, .u =
(x, z) and . . . .} with .{(x, z) | . . .}. In particular, .r ◦ s = {(x, z) | for some y,
.(x, y) ∈ s and .(y, z) ∈ r}.
Definition 12.19 For each relation r, r^{−1} = {(x, y) | (y, x) ∈ r}. The relation
r^{−1} is the inverse relation of r.
Exercise 12.11 Prove that, for each relation r and each relation s, (r^{−1})^{−1} = r and
(r ◦ s)^{−1} = s^{−1} ◦ r^{−1}.
Definition 12.20 A relation f is a function if and only if for each x, each y and
each z, if .(x, y) ∈ f and .(x, z) ∈ f , then .y = z.
Two functions can always be composed as relations. But they can also be
composed as functions.
Theorem 12.22 If f is a function and g is a function, then .g ◦ f is a function.
Proof Assume that (x, y) ∈ g ◦ f and (x, z) ∈ g ◦ f. There exists y1 such that (x, y1) ∈
f and (y1, y) ∈ g. Similarly there exists y2 such that (x, y2) ∈ f and (y2, z) ∈ g.
Since f is a function, y1 = y2. Hence (y1, y) ∈ g and (y1, z) ∈ g, and since g is a
function, y = z. The proof is complete.
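The definitions of composition, inverse relation and function can be tried out on finite relations. The sketch below is only an illustration: relations are modelled as Python sets of tuples (an assumption, since the text builds ordered pairs from classes), and the names `compose`, `inverse`, `is_function`, `f`, `g` and `h` are hypothetical.

```python
def compose(r, s):
    """Return r ∘ s = {(x, z) | for some y, (x, y) ∈ s and (y, z) ∈ r}."""
    return {(x, z) for (x, y1) in s for (y2, z) in r if y1 == y2}


def inverse(r):
    """Return the inverse relation {(y, x) | (x, y) ∈ r}."""
    return {(y, x) for (x, y) in r}


def is_function(r):
    """Check that (x, y) ∈ r and (x, z) ∈ r force y = z."""
    images = {}
    for x, y in r:
        if x in images and images[x] != y:
            return False
        images[x] = y
    return True


f = {(1, "a"), (2, "b"), (3, "a")}   # a function
g = {("a", 10), ("b", 20)}           # a function
h = {(1, "a"), (1, "b")}             # a relation that is not a function

g_after_f = compose(g, f)
```

On this example `g_after_f` is again a function, in agreement with Theorem 12.22, and the two identities of Exercise 12.11 can be checked by direct comparison of sets.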
Definition 12.21 For each f ,
. dom f = {x | (∃y)(x, y) ∈ f }
and
. ran f = {y | (∃x)(x, y) ∈ f }.
The class .dom f is the domain of f , the class .ran f is the range of f .
Exercise 12.12 Prove that .dom U = U = ran U. Hint: If .x ∈ U, then x is a set
and both .(x, ∅), .(∅, x) belong to .U.
Definition 12.22 For each f and each x, we define
f(x) = ⋂{y | (x, y) ∈ f}.
Hence z ∈ f(x) if and only if z belongs to the second coordinate of each member
of f whose first coordinate is x. The class f(x) is the value of f at x.
Remark 12.12
⋂{y | (x, y) ∈ f} = {z | for each w, if w ∈ {y | (x, y) ∈ f} then z ∈ w},
Clearly f is not a function, and yet .f (3) = 4, .f (9) = 9 are defined. But what
is the image of 1? In the naïve sense, this question is meaningless, since f is
not a function. In MK theory .f (1) must belong to both .{1} and .{2}, i.e. .f (1) =
∅. By contrast, f(0) = U, since 0 is not an element of the domain of f. Needless
to say, nobody really needs to define the image of a point outside the domain, in
everyday mathematics. But Axiomatic Set Theory exists because we need a coherent
deductive theory that does not lean on intuition to prove or disprove statements about
sets.
Theorem 12.23 If x ∉ dom f, then f(x) = U. If x ∈ dom f, then f(x) ∈ U.
Proof In the first case, {y | (x, y) ∈ f} = ∅, and the intersection of the empty class
is U. In the second case, {y | (x, y) ∈ f} ≠ ∅, and Theorem 12.13 implies that
⋂{y | (x, y) ∈ f} is a set.
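The behaviour of f(x) as an intersection can be mimicked with hereditarily finite sets. The following Python sketch rests on several assumptions that are not part of the theory: sets are modelled by `frozenset`, ordered pairs by Python tuples, and the proper class U by the sentinel string `UNIVERSE`. All names are hypothetical.

```python
# Sentinel standing for the universal class U, which no finite object can represent.
UNIVERSE = "U"


def value(f, x):
    """Return ⋂{y | (x, y) ∈ f}; the intersection of the empty class is U."""
    images = [y for (a, y) in f if a == x]
    if not images:
        return UNIVERSE
    result = images[0]
    for y in images[1:]:
        result = result & y
    return result


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

# A relation that is not a function: zero is related to both one and two.
f = {(zero, one), (zero, two), (one, two)}
```

Here `value(f, zero)` is the intersection of `one` and `two`, which is `one`, while `value(f, two)` returns the sentinel because `two` is not in the domain of `f`.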
Theorem 12.24 If f is a function, then .f = {(x, y) | y = f (x)}.
Exercise 12.13 Prove the previous Theorem.
Theorem 12.25 (Equality of Functions) For each function f and each function
g, .f = g if and only if .f (x) = g(x) for each x.
Proof Indeed .f = g if and only if .{(x, y) | y = f (x)} = {(x, y) | y = g(x)} and
this happens if and only if .f (x) = g(x) for each x.
We are in a position to enlarge our collection of axioms.
Axiom of Amalgamation If x is a set then ⋃x is a set.
Remark 12.14 We should compare Theorem 12.13 with the axiom of amalgama-
tion.
Definition 12.23 For each x and each y,
.x × y = {(u, v) | u ∈ x, v ∈ y}.
Proof Indeed .f ⊂ dom f × ran f . Since the right-hand class is a set, also f is a
set.
Definition 12.24 For each x and each y,
Remark 12.16 Equivalently, the axiom of choice gives us a function c such that
c(x) ∈ x for each non-empty set x.
Definition 12.27 A subset C of a partially ordered set (P, <) is a chain in P if and
only if C is totally ordered by <. An element u ∈ P is an upper bound of C if and
only if c ≤ u for every c ∈ C. Finally, an element a ∈ P is a maximal element if
there is no x ∈ P such that a < x.
Zorn’s Lemma 1 Let .(P , <) be a non-empty partially ordered set. If every chain
in P has an upper bound, then P has a maximal element.
Definition 12.28 A collection .F of sets has finite character if and only if the
following condition holds: for every X, .X ∈ F if and only if every finite subset
of X belongs to .F.
Tukey’s Lemma 2 If a non-empty collection .F of sets has finite character, then .F
has a maximal element with respect to the inclusion .⊂.
Theorem 12.29 The following statements are equivalent:
(i) the Axiom of Choice;
(ii) the Well-ordering Principle;
(iii) Zorn’s Lemma;
(iv) Tukey’s Lemma.
Proof Suppose (i), and let S be any set. Let F be a choice function on the family of
all non-empty subsets of S. Now we define a_0 = F(S), a_ξ = F(S \ {a_η | η < ξ}).
The construction stops as soon as we exhaust all elements of S. Hence (ii) holds.
Now assume that (ii) holds, and let .(P , <) be a non-empty partially ordered
set. Assume that every chain of P has an upper bound. To construct a maximal
element, we start from the assumption that P can be well-ordered, so that there is
an enumeration
P = {p_0, p_1, …, p_ξ, …}, ξ < α,
for some ordinal number α. We set c_0 = p_0 and c_ξ = p_γ, where γ is the smallest
ordinal such that p_γ is an upper bound of the chain C = {c_η | η < ξ} and p_γ ∉ C.
We remark that {c_η | η < ξ} is always a chain, and that p_γ exists unless c_{ξ−1} is
a maximal element of P. In the end, the construction must stop, and we obtain a
maximal element of P. This proves (iii).
Suppose now that (iii) holds. We consider a non-empty family F of sets and we
assume that F has finite character. Clearly F is partially ordered by inclusion. If C is
a chain in F and if A = ⋃{X | X ∈ C}, then every finite subset of A is contained in
some member of C, hence it belongs to F, and
therefore A ∈ F. It follows at once that A is an upper bound of C. We may apply
Zorn’s Lemma and obtain a maximal element of the collection .F. This proves the
validity of (iv).
Finally, assume that (iv) holds, and let .F be a collection of non-empty sets. We
need to construct a choice function on .F. To this aim we consider the collection
Since a subset of a choice function is a choice function, it follows that .G has finite
character. By assumption .G possesses a maximal element F . By maximality, the
domain of F is .F, and the proof is complete.
Our next task is to construct the (positive) integers from set theory. As we will
see, this is somehow a painful task, while the subsequent steps—from integers to
rationals, and from rationals to real numbers—are rather standard.
Remark 12.17 Using Kelley’s words, the axiom of regularity avoids the possibility
that there exists a class z whose members exist by “taking in each other’s laundry, in
the sense that every member of z consists of members of z.”
∀x∀y(x ∈ y ∧ y ∈ x)
.
is false.
Proof If x ∈ y and y ∈ x, then both x and y are sets and are the only members of
{z | z = x or z = y}. To this class we apply the axiom of regularity, and we reach a
contradiction.
Good. Let us now sketch the basic idea to construct (positive) integers from set
theory. We want to start from .0 = ∅, then define .1 = 0∪{0}, .2 = 1∪{1}, .3 = 2∪{2},
and so on. Of course this is not a full definition, since the previous “and so on”
requires explanation. We must now provide a rigorous definition of the ordinals.
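The first steps of this idea can be tried out on a computer before the rigorous definition. The sketch below is only an illustration under stated assumptions: sets are modelled by Python `frozenset` objects, and the names `successor` and `natural` are hypothetical.

```python
def successor(n):
    """Return n ∪ {n}."""
    return n | frozenset({n})


def natural(k):
    """Build the k-th finite ordinal by iterating the successor, starting from ∅."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n


two = natural(2)
three = natural(3)
```

In this model `three` has exactly three members, `two` is both a member and a proper subset of `three`, and this is the kind of compatibility between ∈ and the usual order that the rigorous definition is meant to capture.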
Definition 12.29 (The .∈-Relation) .E = {(x, y) | x ∈ y}.
Theorem 12.32 E is not a set.
Proof Assume that .E ∈ U, then .{E} ∈ U and .(E, {E}) ∈ E. We recall that
.(x, y) = {{x}, {x, y}} and, if .(x, y) is a set, .z ∈ (x, y) if and only if .z = {x} or
.z = {x, y}. We deduce that .E ∈ {E} ∈ {{E}, {E, {E}}} ∈ E. Let us summarize:
With this definition in mind, we see that positive integers are ordered in a way
that is compatible with the usual naïve definition of inequality between numbers.
Axiom of Extent For each x and each y, it is true that .x = y if and only if
for each z, .z ∈ x if and only if .z ∈ y.
Axiom of Amalgamation If x is a set then ⋃x is a set.
By this Axiom, two classes A and B are the same class if and only if they share
the same members.
Definition 12.39 (Sets) A class A is a set if and only if there exists a class B such
that .A ∈ B. A is a proper class if and only if A is not a set.
As a rule, sets will be denoted by lower case letters: a, b,. . . , x, y, z.
Exercise 12.14 Prove that .∀a .∃B .(a ∈ B).
Exercise 12.15 Prove that .∀x(x ∈ A ⇐⇒ x ∈ B) ⇒ A = B.
Definition 12.40 The expressions .A = A, .A = B, .A = C, . . . , .B = A, .B = B,
.B = C, . . . .C = A, .C = B, .C = C,. . . are all set-theoretical formulas, as are .A ∈ A,
.A ∈ B, .A ∈ C, . . . , .B ∈ A, .B ∈ B, .B ∈ C, . . . .C ∈ A, .C ∈ B, .C ∈ C,. . .
and so on for other letters. Letters different than X may also be used.
This axiom allows us to define classes by specifying the properties which each
member must satisfy.
Example 12.2 If .ϕ(X) is .X ∈ X, the class-building axiom allows us to construct
the class of all sets.
Definition 12.41 For any set-theoretical formula .ϕ(X) not involving A, let
. {X | ϕ(X)}
This axiom becomes clearer if we imagine that a is a family of sets, so that there
exists another set b which contains every member of a,
Definition 12.43 (Empty Class) ∅ = {x | x ≠ x}.
Theorem 12.36 For all X, X ∉ ∅.
Proof If X ∈ ∅, then X is a set and X ≠ X, a contradiction.
Definition 12.44 The intersection of two classes A and B is defined as
.A ∩ B = {x | x ∈ A ∧ x ∈ B} .
S(A) = {x | x ∈ A ∨ x = A}.
Remark 12.21 We cannot write .S(A) = A ∪ {A}, since singletons have not been
introduced yet.
Theorem 12.37 If A is a proper class, then S(A) = A.
Proof Suppose that X is any class. If X ∈ A, then X is a set, and X ∈ A or X = A.
Hence X ∈ S(A). Conversely, if X ∈ S(A), then X ∈ A or X = A, and X is a set. But A is not
a set, hence X ≠ A and therefore X ∈ A. By the axiom of extensionality, A = S(A).
. {A, B} = {x | x = A ∨ x = B} .
By imitating Theorem 12.38, one can prove that the ordered pair of two sets is a
set.
show that .b = d. We apply Theorem 12.39 again and we conclude that either .a = c
and .b = d, or .a = d and .b = c. In both cases .a = c and .b = d.
In the second case, .{a} = {c, d} and .{a, b} = {c}. Now .c ∈ {c, d}, so that .c ∈ {a}
and consequently .c = a. Similarly, .d = a and .b = c, so that .a = c and .b = d. The
proof is complete.
Let us turn to relations.
Definition 12.49 A class R is a relation if and only if
As usual, a relation is just any class of ordered pairs. A function is just a “rule”
which assigns to each set of its domain a unique set of its codomain.
Monk’s final axiom is a very strong form of the Axiom of Choice. It is indeed
stronger than the usual one.
With these axioms one can construct the familiar Boolean algebra of sets and of
relations, with minor differences with respect to our previous discussion. Monk’s
Theory of Sets is actually equivalent to Kelley’s. The strength of the Relational
Axiom of Choice may be used to prove stronger results directly in Monk’s theory,
but most mathematicians do not usually see any concrete difference.
12.5 ZF Axioms
As we have stated at the beginning, the most popular axiomatization of Set Theory
goes under the name of Zermelo and Fraenkel. Although we have preferred another
approach, we believe it may be useful for the reader to have at least an account of
ZF. It should be noticed that ZF requires both sets and elements.
Axiom 10 (Extensionality Axiom) Two sets are equal if and only if they
have the same elements. In symbols:
∃A∀x(x ∉ A)
Axiom 12 (Subset Axiom) Let .ϕ(x) be a formula. For every set A there
exists a set S that consists of all the elements .x ∈ A such that .ϕ(x) holds.
In symbols,
∀A∃S∀x(x ∈ S ⇐⇒ (x ∈ A ∧ ϕ(x)))
Axiom 13 (Pairing Axiom) For every u and v there exists a set that consists
of just u and v. In symbols:
∀u∀v∃A∀x(x ∈ A ⇐⇒ (x = u ∨ x = v))
.
Axiom 14 (Union Axiom) For every set F there exists a set U that consists
of all the elements that belong to at least one set in F. In symbols,
∀F∃U∀x(x ∈ U ⇐⇒ ∃A(x ∈ A ∧ A ∈ F))
Axiom 15 (Power Set Axiom) For every set A there exists a set P that
consists of all the sets that are subsets of A. In symbols:
∀A∃P∀x(x ∈ P ⇐⇒ x ⊂ A)
Axiom 16 (Infinity Axiom) There exists a set I that contains the empty set
as an element and whenever .x ∈ I , then .x ∪ {x} ∈ I . In symbols:
∃I (∅ ∈ I ∧ ∀x(x ∈ I ⇒ x ∪ {x} ∈ I ))
.
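The set-building axioms above have finite analogues that can be experimented with. The following Python sketch is only an illustration on hereditarily finite sets, modelled by `frozenset`; the function names are hypothetical, and nothing here touches the Infinity Axiom, which has no finite counterpart.

```python
from itertools import combinations


def separation(a, phi):
    """Subset Axiom: the members of a that satisfy the predicate phi."""
    return frozenset(x for x in a if phi(x))


def pairing(u, v):
    """Pairing Axiom: the set whose members are just u and v."""
    return frozenset({u, v})


def union(family):
    """Union Axiom: all elements belonging to at least one member of the family."""
    return frozenset(x for member in family for x in member)


def power_set(a):
    """Power Set Axiom: the set of all subsets of a."""
    items = list(a)
    return frozenset(
        frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)
    )


a = frozenset({1, 2, 3})
odd_members = separation(a, lambda x: x % 2 == 1)
family = pairing(frozenset({1, 2}), frozenset({2, 3}))
```

Note that `pairing(u, u)` collapses to a singleton, exactly as the formula of the Pairing Axiom prescribes when u = v.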
12.6 From N to Z
(m, n) + (m′, n′) = (m + m′, n + n′).
Definition 12.53 (The Relative Integers) The symbol .Z denotes the set .X/ ∼, i.e.
the set of equivalence classes determined by .∼. The equivalence class of the element
.(m, n) is denoted by .[(m, n)].
a + b = [(m + p, n + q)].
.
Exercise 12.17 Prove that the sum a + b is independent of the particular ordered
couples (m, n) and (p, q) that describe the classes a and b. More precisely, show
that (m + p, n + q) ∼ (m′ + p′, n′ + q′) whenever (m, n) ∼ (m′, n′) and (p, q) ∼
(p′, q′).
Definition 12.56 (Opposite) For each a ∈ Z, we define −a as the unique element
x ∈ Z such that a + x = 0.
Exercise 12.18 Prove that the previous definition is consistent, in the sense that .−a
does exist for each .a ∈ Z.
Definition 12.57 (Difference) For each a and b in .Z, we define .a − b = a + (−b).
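The arithmetic of equivalence classes of pairs can be tested numerically. The sketch below works with representatives (m, n) of natural numbers and makes two assumptions that should be checked against the definitions in use: the equivalence is taken to be (m, n) ∼ (m′, n′) if and only if m + n′ = m′ + n, and the product is taken to be (m, n) ∗ (p, q) = (mp + nq, mq + np). All function names are hypothetical.

```python
def equivalent(a, b):
    """Assumed equivalence: (m, n) ~ (p, q) iff m + q == p + n."""
    (m, n), (p, q) = a, b
    return m + q == p + n


def add(a, b):
    """Componentwise sum of two representatives."""
    return (a[0] + b[0], a[1] + b[1])


def multiply(a, b):
    """Assumed product rule on representatives."""
    (m, n), (p, q) = a, b
    return (m * p + n * q, m * q + n * p)


def opposite(a):
    """A representative of -a: swap the two coordinates."""
    return (a[1], a[0])


def canonical(a):
    """Representative with at least one coordinate equal to zero."""
    m, n = a
    k = min(m, n)
    return (m - k, n - k)


a1, a2 = (3, 5), (0, 2)   # two representatives of the same class ("minus two")
b1, b2 = (7, 1), (6, 0)   # two representatives of the same class ("six")
```

With these representatives, the sums `add(a1, b1)` and `add(a2, b2)` are equivalent, which is a numerical instance of Exercise 12.17, and `add(a1, opposite(a1))` is equivalent to (0, 0).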
By the previous exercise, for each a ∈ Z+ there exists a unique n ∈ N such that
f+(n) = a. We write n = (f+)^{−1}(a).
Exercise 12.20 Prove that (m, n) ∼ (m′, n′) and (p, q) ∼ (p′, q′) imply (m, n) ∗
(p, q) ∼ (m′, n′) ∗ (p′, q′).
By the previous exercise, the product on X can be extended to a product on .Z.
Definition 12.61 For each .a = [(m, n)] and .b = [(p, q)] in .Z, we define
Remark 12.23 In general this product does not possess an inverse element, in the
sense that, given a and b in Z, a ≠ [(0, 0)], the equation a ∗ x = b is not always solvable
in Z.
12.7 From Z to Q
The next extension of our number sets will be the set of rational numbers. While
everybody thinks of a rational number as a fraction like .p/q with .p ∈ Z and .q ∈
N \ {0}, this approach is purely naïve. Indeed we did not—and could not—define a
division operation in .Z. Again, the road goes through equivalence classes.
Definition 12.62 V = {(r, s) ∈ Z × Z | s ≠ 0}. We define an equivalence relation
∼ on V by setting (r, s) ∼ (t, u) if and only if ru = st. Notice that we are writing
rs instead of r ∗ s, and so on. The equivalence class of the element (r, s) will be
denoted by [(r, s)].
Definition 12.63 (Rational Numbers) A rational number is any element of .V / ∼.
If .[(r, s)] ∈ V / ∼, we simply write .r/s. The set .Q is the set whose members are all
the rational numbers.
Exercise 12.21 Prove that the previous definition is consistent: if (r′, s′) ∼ (r, s)
and if (t′, u′) ∼ (t, u), then (ru + st, su) ∼ (r′u′ + s′t′, s′u′).
Definition 12.66 (Multiplication of Rational Numbers) For each .[(r, s)] and
[(t, u)] in .Q, we define
.
and
= [(xy, 1)]
= ι(x ∗ y)
Exercise 12.23
1. Prove that [(1, 1)] is neutral for multiplication: for each [(r, s)] there results
[(r, s)] ∗ [(1, 1)] = [(r, s)].
2. Prove that for each [(r, s)] ≠ [(0, 1)], there exists a unique [(x, y)] such that
[(r, s)] ∗ [(x, y)] = [(1, 1)]. Hint: a good candidate is x/y = s/r.
Remark 12.25 Every rational number can be written as r/s with s ≥ 1.
Indeed, if s ≤ −1, we recall that [(r, s)] = [(−r, −s)]. In the rest of this section,
we will systematically assume that rational numbers are presented in this form.
Definition 12.67 (Ordering) A relation < is defined on Q by [(r, s)] < [(t, u)] if
and only if ru < st.
Remark 12.26 If r/s = r′/s′ and t/u = t′/u′, then rs′ = r′s and tu′ = t′u.
Therefore [(r, s)] < [(t, u)] if and only if ru < st if and only if rus′u′ < sts′u′
if and only if r′suu′ < ss′t′u if and only if r′u′ < s′t′ if and only if [(r′, s′)] <
[(t′, u′)]. We deduce that the ordering definition is consistent in Q.
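The consistency checks of this section can be explored numerically on representatives. In the sketch below a rational number is represented by a pair (r, s) of Python integers with s ≠ 0; the sum follows the formula of Exercise 12.21, while the product (rt, su) is the usual rule and is stated here as an assumption. The function names are hypothetical.

```python
def equivalent(a, b):
    """(r, s) ~ (t, u) iff ru == st."""
    (r, s), (t, u) = a, b
    return r * u == s * t


def add(a, b):
    """Sum on representatives: (ru + st, su)."""
    (r, s), (t, u) = a, b
    return (r * u + s * t, s * u)


def multiply(a, b):
    """Assumed product on representatives: (rt, su)."""
    (r, s), (t, u) = a, b
    return (r * t, s * u)


def normalize(a):
    """Return an equivalent representative with positive second coordinate."""
    r, s = a
    return (-r, -s) if s < 0 else (r, s)


def less_than(a, b):
    """Order of Definition 12.67, applied to representatives with s, u >= 1."""
    (r, s), (t, u) = normalize(a), normalize(b)
    return r * u < s * t


half, half_again = (1, 2), (2, 4)
third, third_again = (1, 3), (-1, -3)
```

The normalization step matters: comparing the raw pairs (1, −2) and (1, 3) with the inequality ru < st would give the wrong answer, which is why the text assumes s ≥ 1 before defining the order.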
12.8 From Q to R
Our last step in set and number theory is a concrete construction of real numbers.
We have introduced .R as a totally ordered field which satisfies Dedekind’s axiom.
But this definition remains somehow vague until we prove that such a numerical
structure exists. To this aim several approaches have been proposed. A classical one
is via Dedekind cuts, see [4].
We present a different approach that relies on a more analytical construction, due
to Cantor.
Definition 12.68 A sequence .x = {xn }n of rational numbers is a Cauchy sequence
of rational numbers if for each rational number .a > 0, there exists .N ∈ N such that
.|xm − xn | < a for each .m ≥ N and each .n ≥ N. We denote by .C the set of all
Proof To prove 1, we first remark that .{xn }n and .{yn }n are bounded sequences as in
Proposition 5.3. Then we merely use the triangle inequality:
Since the two sequences are in C and bounded, the right-hand side of the last
inequality can be made smaller than any given positive rational number by taking m
and n sufficiently large. In a similar way we prove 2 and 3.
Definition 12.72 An equivalence relation on .C is defined as follows: for each .x ∈ C
and each .y ∈ C, .x ∼ y if and only if their difference .x − y is an element of .Z. We
denote by .[x] the equivalence class of the sequence .x ∈ C.
Definition 12.73 A real number is any element of .C/ ∼, i.e. any equivalence class
of a sequence in .C. We denote by .R the set of all real numbers.
Let us stop for a moment: what are we doing? Well, the intuition is that a real
number is the “limit” of a sequence of rational numbers. But this would not be a
good definition, since—of course—there are sequences of rational numbers that do
not converge to a rational number. We are forced to take limits in .Q, but we need to
go outside .Q! The only way out is to remove the requirement that our sequences of
rational numbers have a limit, and this is done by using the Cauchy condition. Then
we identify rational sequences that “converge” to the same number, in the sense that
their difference converges to zero. This is the equivalence relation .∼.
In this way we have avoided any use of the term limit, which would be somehow
misleading at this stage. But we have preserved the original intuition of “adding to
.Q the limits of rational sequences that satisfy the Cauchy property.”
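A concrete example may help. The sketch below produces two different sequences of rational numbers that both “approach” the square root of 2, which is not rational; the choice of sequences (Newton iterates and bisection) and the function names are assumptions made only for illustration. Exact arithmetic is obtained with the standard `fractions` module.

```python
from fractions import Fraction


def newton_sqrt2(n):
    """n-th Newton iterate for x*x = 2, starting from 2; always a rational number."""
    x = Fraction(2)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x


def bisection_sqrt2(n):
    """Left endpoint after n bisection steps on [1, 2]; always a rational number."""
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return lo


tolerance = Fraction(1, 10**6)
newton_gap = abs(newton_sqrt2(6) - newton_sqrt2(5))
cross_gap = abs(newton_sqrt2(5) - bisection_sqrt2(30))
```

Both gaps are smaller than the chosen tolerance: consecutive terms get close to each other (the Cauchy condition), and the two sequences get close to one another (the relation ∼), although no term of either sequence squares exactly to 2.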
for each .[x] and each .[y] in .R, is a field. It is also totally ordered by the
relation .[x] < [y] if and only if .y − x is a positive real number in the sense of
Definition 12.75.
Luckily enough, .Q may be identified with a subset of .R. Indeed, let .ϕ : Q → R
be the function defined by .ϕ(p) = [p̄], where .p̄ is the constant sequence .n → p.
It can be easily proved that .ϕ is injective, so that .ϕ : Q → ϕ(Q) is a bijection.
Furthermore .ϕ(p + q) = ϕ(p) + ϕ(q) and .ϕ(pq) = ϕ(p)ϕ(q).
The last—and the most intriguing—property of .R is clearly the fact that .R has
the least upper bound property: if a subset E of .R has an upper bound, then there
exists the least upper bound of E. The proof of this fact requires some work.
Theorem 12.46 If r = [x] is a real number and if there exists N ∈ N such that
x_n ≥ 0 for each n ≥ N, then r ≥ 0.
Proof By assumption, 0 < s − r. By the previous results, we can choose p ∈ Q such
that 0 < p < s − r. Let x and y be sequences in the classes r and s, respectively.
Now 0 < (s − r) − p implies that the sequence n → y_n − x_n − p is eventually
positive. We choose M ∈ N with y_m > x_m + p for each m ≥ M. Since x is a
Cauchy sequence, there exists N ≥ M, N ∈ N, such that |x_N − x_n| ≤ p/4 for each
n ≥ N.
Let q1 = x_N + p/4 and q2 = x_N + 3p/4. For each n ≥ N, q1 − x_n = p/4 +
x_N − x_n ≥ p/4 − |x_N − x_n| ≥ 0 implies r ≤ q1, and similarly q2 ≤ s. The rational
number q = (q1 + q2)/2 satisfies r < q < s.
Theorem 12.49 Let A be a nonempty subset of .R. If A has an upper bound, then a
number .p ∈ Q exists such that p is not an upper bound of A but .p + 1 is an upper
bound of A.
bound of A.
Theorem 12.50 Let A be a nonempty subset of .R. If A has an upper bound, then
there exist two sequences of rational numbers .{pn }n and .{qn }n such that, for each
.n ∈ N, .pn is not an upper bound of A and .qn is an upper bound of A. Furthermore,
q_n − p_n = 2^{1−n}.
q_{n+1} − p_{n+1} = 2^{−n}, and the proof is complete.
Theorem 12.51 The sequences .{pn }n and .{qn }n are equivalent Cauchy sequences.
Proof By construction, p_n ≤ p_{n+1} ≤ q_{n+1} ≤ q_n for each n. It follows that p_n ≤
p_{n+k} ≤ q_{n+k} ≤ q_n for each k. Thus, if m ≥ n, |p_n − p_m| ≤ |p_n − q_n| ≤ 2^{1−n}.
Since {2^{1−n}}_n is a zero sequence, we see that {p_n}_n is a Cauchy sequence. By the
same token, {q_n}_n is a Cauchy sequence. Since q_n − p_n = 2^{1−n}, {p_n}_n ∼ {q_n}_n.
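One standard way to produce sequences with these properties is repeated bisection, and the following Python sketch assumes that this is the construction intended. The set A is described only through a hypothetical predicate `is_upper_bound` on rational numbers, and the starting point p is assumed not to be an upper bound while p + 1 is one, as in Theorem 12.49.

```python
from fractions import Fraction


def bracketing_sequences(is_upper_bound, p, steps):
    """Return lists [p_1, ..., p_steps] and [q_1, ..., q_steps] built by bisection.

    Assumes p is not an upper bound and p + 1 is an upper bound.
    """
    p = Fraction(p)
    q = p + 1
    ps, qs = [p], [q]
    for _ in range(steps - 1):
        mid = (p + q) / 2
        if is_upper_bound(mid):
            q = mid   # keep q_n an upper bound
        else:
            p = mid   # keep p_n below some element of A
        ps.append(p)
        qs.append(q)
    return ps, qs


def bounds_sqrt2_set(q):
    """Upper bounds of the example set A = {a in Q | a*a < 2}."""
    return q > 0 and q * q >= 2


ps, qs = bracketing_sequences(bounds_sqrt2_set, 1, 20)
```

In the lists `ps` and `qs` the entry of index i corresponds to n = i + 1, so that the difference of corresponding entries is 2^{−i} = 2^{1−n}.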
Theorem 12.52 (Existence of the Least Upper Bound) Let A be a non-empty
subset of .R. If A has an upper bound, then there exists in .R a least upper bound of
A.
Proof We begin with the sequences .{pn }n and .{qn }n constructed above. Since they
are equivalent Cauchy sequences, they represent the same real number r. We claim
that .r = sup A.
Suppose r is not an upper bound of A. Then there exists a ∈ A such that r < a.
By Theorem 12.48 we can select a rational number q with r < q < a. The sequence
n → q − q_n represents the number q − r ∈ R. Hence this sequence must be
eventually positive, and there exists N ∈ N such that q > q_n for each n ≥ N. This
implies q_n < q < a for each n ≥ N, so that q_n is not an upper bound of A, against
the property of q_n. This contradiction proves that r is an upper bound of A.
To see that .r = sup A, we suppose that A possesses an upper bound .s ∈ R with
.s < r. Again, we find .p ∈ Q such that .s < p < r, and .n → pn − p is eventually
positive. As above, .pn > p > s for large .n ∈ N. This implies that .pn is an upper
bound of A, against the main property of .pn . This contradiction shows that no real
number less than r can be an upper bound of A, i.e. .r = sup A.
12.9 About the Uniqueness of R
One of the most folkloristic statements of Mathematical Analysis says that the
system of real numbers is unique, up to isomorphisms. However, the vast majority
of the textbooks that I have used in my life omit a rigorous proof of this (true) fact.
Following the survey [1], we will learn that all models (i.e. constructions) of .R are
indeed equivalent up to renaming objects.
Definition 12.76 Let .K = (K, +, ·, ≤) be a totally ordered field, in the sense
described in Chap. 3. For any non-empty subsets X and Y of .K, we write .X ≤ Y to
mean that
∀x∀y(x ∈ X ∧ y ∈ Y ⇒ x ≤ y).
.
∀x∀y(x ∈ X ∧ y ∈ Y ⇒ x ≤ z ≤ y).
.
The element z separates X and Y , in the sense of the order relation .≤.
As a first step, we prove that Dedekind completeness is equivalent to the property
of the least upper bound.
Theorem 12.53 Let .K be a totally ordered field. The following properties are
equivalent:
(i) .K is Dedekind complete;
(ii) every non-empty subset X of .K which is bounded from above possesses a least
upper bound in .K;
(iii) every non-empty subset Y of K which is bounded from below possesses a greatest
lower bound in K.
Proof Suppose (i) holds true, and let X be a non-empty subset of .K which is
bounded from above. The set .U of all upper bounds of X is non-empty, and trivially
.X ≤ U. By (i), there exists .z ∈ K such that .X ≤ {z} ≤ U. Since .z ∈ U, we
conclude that .z = min U, which implies that z is the least upper bound of X. Hence
(ii) is proved.
Suppose now that (ii) holds, and let X, Y be two non-empty subsets of K such
that X ≤ Y. Since every element of Y is an upper bound of X, the set U of all upper
bounds of X is non-empty. By (ii), there exists z ∈ K such that z = sup X = min U.
But Y ⊂ U, hence z ≤ y for every y ∈ Y, i.e. {z} ≤ Y. On the other hand, X ≤ {z} because
z ∈ U, and thus (i) is proved.
The proof that (i) is equivalent to (iii) is similar, and it is left as an exercise.
Here comes the most technical part: we will be somehow sketchy, and refer to
[1] for the details. First of all, any ordered field contains an isomorphic copy of the
rational numbers.
Theorem 12.54 Let .K be an ordered field. Then there exists a subfield .Q(K) of .K
isomorphic to .Q.
Proof We temporarily denote by .1K the unit element (with respect to the multipli-
cation) of .K. First we embed .N into .K as follows:
.ϕ : n ∈ N → 1K + · · · + 1K ,
Finally, the required copy of the rational numbers is the range of the function
e : Q → K defined as follows: for every r = m/n ∈ Q such that m ∈ Z, n ∈ N,
n > 0 and either m = 0 or m and n are coprime, we let
Proof Pick .x ∈ K and .ε > 0. Since .Q is dense in .K, there exist numbers .p ∈ Q
and .q ∈ Q such that
which in turn imply .x − ε < ϕ(x) < x + ε. Since .ε does not depend on x, we
conclude that .ϕ(x) = x, and the proof is complete.
We are ready to learn that Dedekind complete totally ordered fields are unique up
to monotonic isomorphisms. Although the meaning of the terms could be already
clear, we formalize some definitions.
Definition 12.78 Let .K1 = (K1 , +1 , ·1 , ≤1 ) and .K2 = (K2 , +2 , ·2 , ≤2 ) be ordered
fields. A function .ϕ : K1 → K2 is an increasing isomorphism if and only if .ϕ is
bijective and
Lx = {q ∈ Q | q ≤ x} .
.
It follows that .Lx is bounded from above in .Q. Understanding the identification
of the rational numbers in .K1 and .K2 , we deduce that .Lx is bounded from above
also in .K2 . The Dedekind completeness of .K2 allows us to define .ϕ : K1 → K2 by
declaring that .ϕ(x) is the least upper bound in .K2 of .Lx .
If .x < y in .K1 , then .Lx ⊂ Ly by density of .Q in .K1 , so that .ϕ is increasing.
Furthermore, if .x ∈ Q, then x is trivially the maximum of .Lx in .K2 , and then
.ϕ(x) = x.
To complete the proof, we must ensure that ϕ also preserves the algebraic
operations, namely
(a) ϕ(x + y) = ϕ(x) + ϕ(y);
(b) ϕ(xy) = ϕ(x)ϕ(y)
for every x ∈ K1, y ∈ K1. We will exploit the density of the rational numbers: more
precisely, we claim that, given x ∈ K1, y ∈ K1 and ε > 0, there results
Fix rational numbers p1, q1, p2 and q2 such that
x − ε/2 < p1 < x < p2 < x + ε/2,   y − ε/2 < q1 < y < q2 < y + ε/2.
It follows from the properties of ϕ that
ϕ(x) < ϕ(p1 + ε/2),
or
ϕ(x) − ε/2 < p1.
Similarly it follows that
ϕ(x) − ε/2 < p1,
p2 < ϕ(x) + ε/2,
ϕ(y) − ε/2 < q1,
q2 < ϕ(y) + ε/2.
Putting these inequalities together, we deduce that
But .p1 + q1 < x + y < p2 + q2 , hence .p1 + q1 < ϕ(x + y) < p2 + q2 , and the
proof of (12.1) follows.
We prove (12.2) under the additional assumptions that .0 ≤ x and .0 ≤ y in .K1 .
The general case follows by replacing x and y with .−x and .−y.
Fix rational numbers p1, q1, p2 and q2 such that
It follows that p1 q1 < xy < p2 q2, p2 − r < x < p1 + r, q2 − r < y < q1 + r. The
properties of ϕ imply that p1 q1 < ϕ(xy) < p2 q2, and p2 − r < ϕ(x) < p1 + r,
q2 − r < ϕ(y) < q1 + r. Then
We conclude that
remain logically ambiguous. Theorem 12.56 ensures that we can speak of real
numbers only up to relabeling elements in an increasing manner. Since analysts
need to make computations, what really matters in their eyes are models of R. We
have seen Dedekind cuts and equivalence classes of rational Cauchy sequences, but
other models are possible. In everyday life, most mathematicians simply use the
formal properties of the real numbers to carry on.
References
1. G. Devillanova, G. Molica Bisci, The fabulous destiny of Richard Dedekind. Atti Accad.
Peloritana Pericolanti 99(S1), A18 (2021)
2. T.J. Jech, The Axiom of Choice (North-Holland Publishing Company, 1973)
3. J.L. Kelley, General Topology. Graduate Texts in Mathematics, No. 27 (Springer, New York,
1975). Reprint of the 1955 edition [Van Nostrand, Toronto, Ont.]
4. W. Rudin, Principles of Mathematical Analysis. International Series in Pure and Applied
Mathematics, 3rd edn. (McGraw-Hill Book Co., New York, 1976)
Chapter 13
Neighbors Again: Topological Spaces
I am happy to say that the disease of axiomatic topology has been almost totally cured. Right
now I don’t care a bit whether every .β-capsule of type . is also a T-spot of the second kind.
(Edwin Hewitt)
is usually called the space of the topology τ , and τ is a topology for X. The pair
(X, τ ) is a topological space.
Remark 13.1 Our conditions imply that X = ⋃τ, the union of all members of τ, is
necessarily a member of τ, since τ is a subfamily of itself and every member of τ is
a subset of X. Similarly, the empty set ∅ is a member of τ, since it is the union of
all the elements of the empty subfamily of τ.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 201
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_13
Example 13.1 For a given set X, the topology τ = {X, ∅} is called the indiscrete
topology. This topology possesses only two open sets.
Example 13.2 For a given set X, the topology τ = {U | U ⊂ X} is called the
discrete topology. Every set is open in this topology.
Exercise 13.1
1. Let X = {1, 2, 3} and τ = {∅, {1, 2, 3}, {1, 2}, {3}}. Show that τ is a topology on
X.
2. Let X = {1, 2, 3} and τ = {∅, {1, 2, 3}, {1, 2}, {1, 3}}. Show that τ is not a
topology on X.
3. Let X = {1, 2}. Write down all sets of subsets of X, and decide which are
topologies on X.
4. Let τ consist of ∅, R, and all sets of the form [a, +∞), a ∈ R. Show that τ is not
a topology on R.
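On a finite set, the axioms in Exercise 13.1 can be checked by brute force. The following is only a sketch under our own conventions: the helper names `is_topology` and `powerset` are not part of the text, and the check relies on the fact that, for a finite family, closure under pairwise unions and intersections is enough.

```python
from itertools import chain, combinations

def is_topology(X, tau):
    """Return True if the finite family tau is a topology on the finite set X."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    # Both the empty set and X itself must be open.
    if frozenset() not in tau or X not in tau:
        return False
    # Every member of tau must be a subset of X.
    if any(not U <= X for U in tau):
        return False
    # For a finite family, pairwise unions and intersections suffice.
    return all((U | V) in tau and (U & V) in tau for U in tau for V in tau)

def powerset(S):
    """All subsets of S, as tuples of elements."""
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

X = {1, 2, 3}
tau1 = [set(), {1, 2, 3}, {1, 2}, {3}]      # item 1
tau2 = [set(), {1, 2, 3}, {1, 2}, {1, 3}]   # item 2
print(is_topology(X, tau1), is_topology(X, tau2))

# Item 3: enumerate every family of subsets of {1, 2} and keep the topologies.
subsets = [frozenset(s) for s in powerset({1, 2})]
topologies = [fam for fam in powerset(subsets) if is_topology({1, 2}, fam)]
print(len(topologies))
```

Such a check is no substitute for the proofs requested in the exercise, but it is a convenient way to test a conjectured answer.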
Exercise 13.2 Show that τ = {S ⊂ R | R \ S is a finite set} ∪ {∅} is a topology on
R. This is the topology of finite complements.
Definition 13.2 Let τ1 and τ2 be two topologies for a set X. If τ1 ⊂ τ2 , we say that
τ1 is smaller than τ2 .
Remark 13.2 If τ1 is smaller than τ2, many mathematicians say that τ1 is stronger
than τ2, or equivalently that τ2 is weaker than τ1. This is somewhat counterintuitive,
since smallness is seldom associated with strength. For this reason we will try to
use the set-theoretic language of smaller and larger in the rest of this chapter.
Exercise 13.3 Let X be a set, and let τ be a topology for X. Show that any open
set for τ is also an open set for the discrete topology, and that any open set for
the indiscrete topology is also an open set for τ. Deduce that the indiscrete and the
discrete topology for a set X are respectively the smallest and the largest topology
for X.
Example 13.3 Let us go back to the real line. The usual topology of R is the family
of all subsets U of R such that for any x ∈ U there exists an open interval (a, b)
such that x ∈ (a, b) ⊂ U.
Exercise 13.4 Prove that the usual topology of R is indeed a topology in the sense
of our definition. In particular, notice that any open interval is an open set for the
usual topology.
Exercise 13.5 We define a family σ of subsets of R as follows: W ∈ σ if either
W = ∅, or R \ W is a finite set. Prove that σ is a topology on R, and that this
topology is smaller than the usual topology of R.
Definition 13.3 A subset U of a topological space (X, τ) is a neighborhood of a point
x if and only if U contains an open set to which x belongs.
Remark 13.3 A neighborhood need not be an open set. The reader should be aware
that several mathematicians say that U is a neighborhood of x if and only if x ∈ U
and U is open. The two definitions are not equivalent, of course.
Example 13.4 In R with the usual topology, (0, 1) is clearly a neighborhood of
1/2. But also [0, 1) is a neighborhood of 1/2. On the contrary, (0, 1) is not a
neighborhood of 0, and [0, 1) is not a neighborhood of 0.
Exercise 13.6 If x ∉ U, can U be a neighborhood of x? If x ∈ U, is U necessarily
a neighborhood of x? Discuss.
The whole topology of a space can be described in terms of its neighborhoods.
The next result characterizes open sets.
Theorem 13.1 A set is open if and only if it contains a neighborhood of each of its
points.
Proof Let A be a subset of a topological space. If A is open, then A is itself a
neighborhood of each of its points, so that A contains a neighborhood of each of its
points. Conversely, suppose that A contains a neighborhood of each of its points.
Then each x ∈ A belongs to some open subset of A. The union U of all open subsets
of A is an open set, according to the definition of a topology, and U ⊂ A. Since each
x ∈ A belongs to U, we conclude that A = U, and A is open.
Definition 13.4 The neighborhood system of a point is the family of all neighbor-
hoods of that point.
Definition 13.5 (Closed Sets) In a topological space (X, τ ), a subset A is closed if
and only if X \ A is an open set.
Exercise 13.7 Recall that X \ (X \ A) = A. Deduce that the complement of a
closed set is an open set. In conclusion, a subset is open if and only if its relative
complement is closed.
Exercise 13.8 Luckily enough, any closed interval [a, b] is a closed set of R with
its usual topology. Prove this.
Exercise 13.9 What are the closed sets for the discrete and the indiscrete topologies
on a given set X?
Exercise 13.10 Using De Morgan’s laws from Set Theory, prove that the union
of any two closed subsets is a closed subset. Furthermore, the intersection of any
subfamily of closed sets is a closed set.
Example 13.5 The topology of upper semicontinuity on R is the topology whose
open sets are ∅, R, and all the subsets of the form (−∞, a), where a ∈ R.
Example 13.6 The cofinite topology on a set X is the topology whose closed sets
are those subsets C such that either X = C or C is finite.
Definition 13.6 (Accumulation Points) A point x is an accumulation point of a
subset A of a topological space (X, τ) if and only if every neighborhood of x
contains points of A different from x. The set A′ of all the accumulation points
of A is called the derived set of A.
Theorem 13.2 A subset is closed if and only if it contains its derived set.
Proof Indeed, if A is a subset, each neighborhood of a point x intersects A if and
only if either x ∈ A or x is an accumulation point of A.
Theorem 13.3 If A is a subset, A ∪ A′ is a closed set.
Proof If x is neither a point of A nor an accumulation point of A, then there exists
an open neighborhood U of x which does not intersect A. But U is a neighborhood
of each of its points, and none of these are accumulation points of A. This proves
that the union A ∪ A′ is the complement of an open set.
Definition 13.7 (Closure) The closure Ā of a subset A of a topological space
(X, τ) is the intersection of all closed sets that contain A. Hence Ā is the smallest
closed set containing A.
Theorem 13.4 The closure of any set is the union of the set and of its derived set.
Proof Every accumulation point of a set A is an accumulation point of each set
containing A, and is therefore a member of each closed set containing A. Hence Ā
contains both A and A′. Conversely, A ∪ A′ is closed by Theorem 13.3, and therefore
it contains Ā.
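Theorem 13.4 can be observed concretely in a finite topological space, where both sides of the identity are computable. The sketch below uses hypothetical helper names (`derived_set`, `closure`) and a small three-point space of our own choosing; it computes the closure from Definition 13.7 and the derived set from Definition 13.6, then compares them on every subset.

```python
from itertools import combinations

def derived_set(A, X, tau):
    """Accumulation points of A: every open set containing x meets A \\ {x}."""
    A = frozenset(A)
    return frozenset(
        x for x in X
        if all((U & A) - {x} for U in tau if x in U)
    )

def closure(A, X, tau):
    """Intersection of all closed sets (complements of open sets) containing A."""
    A, result = frozenset(A), frozenset(X)
    for U in tau:
        closed = frozenset(X) - U
        if A <= closed:
            result &= closed
    return result

# A three-point space chosen for illustration only.
X = frozenset({1, 2, 3})
tau = [frozenset(s) for s in ([], [1], [1, 2], [1, 2, 3])]

A = {1}
print(sorted(derived_set(A, X, tau)), sorted(closure(A, X, tau)))

# Compare the closure with A united with its derived set, for every subset of X.
all_agree = all(
    closure(S, X, tau) == frozenset(S) | derived_set(S, X, tau)
    for r in range(len(X) + 1)
    for S in combinations(X, r)
)
print(all_agree)
```

It is enough to test open sets containing x rather than all neighborhoods, because every neighborhood contains such an open set.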
Exercise 13.11 Let A = (−1, 3) ∪ [5, 6) be a subset of R with the usual topology.
What is Ā?
Definition 13.8 (Dense Subsets) A set A is dense in a topological space X if and
only if Ā = X.
Example 13.7 Q is dense in R, since for any real numbers x < y there exists a
rational number r such that x < r < y.
In the first part of the book we have extensively described the real line .R and its
properties. We have seen that a distance function exists on it: .d(x, y) = |x − y|.
In terms of this distance we have defined open sets, accumulation points, and so
forth. Our aim now is to introduce finitely many additional dimensions to the space
R = R¹, a process that is needed in almost every application of modern mathematics
x + y = (x1 + y1, . . . , xN + yN),
λx = (λx1, . . . , λxN)
for each x = (x1, . . . , xN) and y = (y1, . . . , yN). This distance is often called the
Euclidean distance. The open ball centered at some x ∈ R^N with radius r > 0 is
B(x, r) = Br(x) = {y ∈ R^N | d(x, y) < r};
see Fig. 13.1. A subset A of R^N is open if for each x ∈ A there exists r > 0 such
that Br(x) ⊂ A; see Fig. 13.2.
Remark 13.5 From the topological viewpoint, R^N is a metric space. But the vector
space structure is often so important that mathematicians prefer to see R^N as a
normed vector space. For this reason the usual distance d(x, y) is often denoted by
‖x − y‖. This is indeed the standard notation for any distance induced by a norm
‖ · ‖. We will come back to normed vector spaces in the chapter about differentiation
in abstract spaces.
Once we can recognize open sets in R^N, we can speak of closed sets, accumulation
points, closure, interior, and so on. If not explicitly stated, we will always
endow R^N with the usual distance and the usual topology induced by this distance.
Proof Suppose first that B is a base for some topology, U and V are members of
B, and x ∈ U ∩ V. Since U ∩ V is open, some W ∈ B contains x and is contained
in U ∩ V. Conversely, let τ be the family of all unions of members of B. Clearly any
union of members of τ is a union of members of B, hence it is a member of τ. We only
need to show that the intersection of any two members U, V of τ is a member of τ. But if
x ∈ U ∩ V, we can choose U1 and V1 in B and select W ∈ B such that x ∈ U1 ⊂ U
as follows: it consists of .∅, X, all finite intersections of elements of ., and all unions
of these finite intersections.
containing ., and it is both unique and the smallest one with such a property. To
prove the stated description of open sets, we notice that .τ () contains ., and thus
it must contain all the sets listed in the statement of the theorem. On the other hand,
since unions distribute over intersections, the sets listed actually form a topology
on X containing ., and therefore such a topology must contain .τ (). The proof is
complete.
13.4 Subspaces
Definition 13.15 (Relative Topology) Let (X, τ) be a topological space, and let
Y be a subset of X. The induced (or relative) topology on Y is the family of all
intersections of members of τ with Y, and is denoted by τ|Y. Explicitly,
τ|Y = {V ∩ Y | V ∈ τ}.
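For a finite space the relative topology of Definition 13.15 can be written down mechanically. The following sketch uses a hypothetical helper `relative_topology` and an illustrative three-point space that is not taken from the text.

```python
def relative_topology(tau, Y):
    """Family of all intersections V ∩ Y with V open in the ambient space."""
    Y = frozenset(Y)
    return {frozenset(V) & Y for V in tau}

# Ambient space X = {1, 2, 3} with an illustrative topology.
tau = [set(), {1}, {1, 2}, {1, 2, 3}]
tau_Y = relative_topology(tau, {2, 3})
print(sorted(sorted(U) for U in tau_Y))

# {2} is open in the subspace Y = {2, 3} but not in X.
print(frozenset({2}) in tau_Y, {2} in tau)
```

The last line shows a set that is open in the relative topology without being open in the ambient space, which is why openness in a subspace has to be handled with care.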
Proof The set A is closed in Y if and only if Y \ A has the form V ∩ Y for some
V ∈ τ. This is true if and only if A = (X \ V) ∩ Y, and (i) is proved. Statement (ii)
follows directly from the definitions of relative topology and accumulation point.
Finally, the closure of A in τ|Y is the union of A and the set of accumulation points
of A in τ|Y. Then (iii) follows from (ii).
Exercise 13.18 Prove that if Y is open in X, each open set of (Y, τ|Y) is also open
in X. Is this true for closed sets, provided that Y is closed in X?
As a matter of fact, the information that a subset A is open in the relative topology
of Y tells very little about the properties of A as a subset of X.
Example 13.13 Suppose that X = Y ∪ Z, and A ⊂ X is such that A ∩ Y is open
in Y, A ∩ Z is open in Z. Although we might expect that A be open in X, this is
false. Just take A = Y, Z = X \ Y: clearly Y ∩ Y and Y ∩ Z are open in Y and Z
respectively, but Y may well be any subset of X.
A ∩ Cl(B, X) = (A ∩ Y) ∩ Cl(B, X)
             = A ∩ (Y ∩ Cl(B, X))
             = A ∩ Cl(B, Y) = ∅,
and similarly for Cl(A, X) ∩ B.3 Conversely, if A and B are separated in X and
Y = A ∪ B, then
3 We have denoted .Cl(A, X) the closure of A in X, since we need to distinguish the closure in X
Uz = A ∩ (−∞, z),   Vz = A ∩ (z, +∞).
Clearly Uz and Vz are nonempty. Since Uz ⊂ (−∞, z) and Vz ⊂ (z, +∞), they
are separated. Hence A is the union of two nonempty separated sets, and is not
connected.
Conversely, suppose that A is not connected, and has the form A = U ∪ V for
some nonempty separated sets U and V. Let x ∈ U, y ∈ V, assuming x < y for the
sake of definiteness. We define z = sup (U ∩ [x, y]). As a supremum, we know that
z is in the closure of U, so that z ∉ V because U and V are separated. In particular,
x ≤ z < y. If z ∉ U, it follows that x < z < y and z ∉ A. If z ∈ U, then z does not
belong to the closure of V, so that there exists z1 such that z < z1 < y and z1 ∉ V.
Again x < z1 < y and z1 ∉ A.
Important: Beware!
The last result shows that the only connected subsets of the real line (with the usual
topology) are the intervals, of any kind. This is no longer true if we replace R with
Q with the induced topology. Indeed, for any irrational number a, the two sets {x ∈
λx + (1 − λ)y ∈ A
.
To prove (b), we denote by C the union of the members of 𝒜. Suppose that D
is both open and closed in C. For each A ∈ 𝒜, we have that A ∩ D is open and
closed in A, and since A is connected we conclude that either A ⊂ D or A ⊂ C \ D.
We claim that either each member of 𝒜 is contained in D and C \ D = ∅, or each
member is contained in C \ D and D = ∅. Indeed, if A and B are members of 𝒜, it is
impossible that A ⊂ D and B ⊂ C \ D, for in this case A and B would be separated
as subsets of the separated sets D and C \ D. This contradicts the assumption,
proves the claim, and completes the proof.
Definition 13.18 (Connected Components) A (connected) component of a topo-
logical space is a maximal connected subset. More precisely, it is a connected subset
which is properly contained in no other connected subset.
Example 13.14 If a topological space is connected, then it is its only connected
component. Indeed, if A is a connected subset, then A is contained in the connected
topological space, so A cannot be maximal unless it is the whole space.
Example 13.15 In a discrete topological space, each connected component is a
singleton.
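Connected components can be computed by brute force in a small finite space. The sketch below assumes the characterization of connectedness through sets that are both open and closed in the relative topology; the names `is_connected` and `component` and the sample spaces are ours, not the text's.

```python
from itertools import combinations

def is_connected(Y, tau):
    """Y is connected iff the only clopen subsets of (Y, tau|Y) are ∅ and Y."""
    Y = frozenset(Y)
    opens = {U & Y for U in tau}            # relative topology on Y
    return all(U in (frozenset(), Y) for U in opens if (Y - U) in opens)

def component(x, X, tau):
    """Union of all connected subsets of X that contain x."""
    X = frozenset(X)
    others = list(X - {x})
    result = frozenset({x})
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = frozenset(extra) | {x}
            if is_connected(S, tau):
                result |= S
    return result

X = frozenset({1, 2, 3})
discrete = [frozenset(s) for r in range(4) for s in combinations(X, r)]
indiscrete = [frozenset(), X]
mixed = [frozenset(), frozenset({1}), frozenset({2, 3}), X]

print(sorted(component(1, X, discrete)))    # a singleton, as in Example 13.15
print(sorted(component(1, X, indiscrete)))  # the whole space
print(sorted(component(2, X, mixed)))
```

The function `component` relies on the fact that a union of connected sets sharing a point is connected, so the union it returns is itself the maximal connected set containing x.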
Theorem 13.12 Each connected subset of a topological space is contained in a
connected component. Each component is closed. If A and B are distinct connected
components of a space, then A and B are separated.
Proof Let A be a nonempty connected subset of a space X, and let C be the union
of all the connected sets containing A. As a consequence of Theorem 13.11, C is
connected. If D contains C and D is connected, then D ⊂ C, so that D = C.
Hence C is a maximal connected subset, i.e. a connected component of the space.
Each component C is connected by definition, so that its closure C̄ is connected
by Theorem 13.11. Therefore C = C̄ by maximality, and C is closed. Finally, if A
and B are distinct components but they are not separated, then A ∪ B is a connected
subset, a contradiction to the maximality of A and B.
Definition 13.19 (Totally Disconnected Spaces) A topological space X is totally
disconnected if and only if the connected component containing any point .x ∈ X
coincides with .{x}.
The intuition behind connectedness is often expressed by saying that any two
points of the space can be connected by a path. Although it can be proved that this is
not an equivalent definition, it is nonetheless an interesting definition.
Definition 13.20 (Arc-Wise Connected Spaces) A topological space X is arc-wise
connected if and only if for each pair of points x0 and x1 in X there exists a continuous
map α : [0, 1] → X such that α(0) = x0 and α(1) = x1. See Fig. 13.4.
So far we have described topologies in terms of their open sets. This is by far the
most common approach to General Topology. But there are other viewpoints which
can be even more useful to mathematical analysts.
If we think of our previous chapters, we immediately see that elementary real
analysis in one variable leans on a single idea: that of limits. In any reasonable
generalization, limits should therefore play a crucial role. And indeed limits can be
defined as soon as a topology exists on a set. However, in this section we want to
introduce a broader notion that can include sequences, functions, Riemann sums,
and much more.
Recall that a sequence is just a function defined on the complement of a finite
subset N of ℕ. With an abuse of notation, and since our interest is towards the
theory of convergence, we will assume that sequences are functions defined on the
whole set ℕ of natural numbers. Indeed, any two sequences which differ in finitely
many terms have the same character.
The value of a sequence S at n ∈ ℕ is denoted either by the functional symbol
S(n) or by the traditional symbol Sn. We say that a sequence S is in a set A if and
be Ni = 1 for each i, since S(N(i)) = S(1) for each i, and convergence would be
ensured. But this is not an interesting candidate, and some additional condition on
i → N(i) should be imposed.
The usual condition, which we actually used in a previous chapter, is that i < j
implies N(i) < N(j), i.e. a strict monotonicity condition. If this condition is satisfied,
we then say that {SNi}i is a subsequence of S. But is this really the right condition?
Well, not really. What we actually need is that Ni becomes large as i becomes large.
Definition 13.22 (Generalized Subsequences) T is a subsequence of S if and only
if there exists a map N : ℕ → ℕ such that T = S ◦ N, and for each integer m there
exists an integer n such that i ≥ n implies Ni ≥ m.
The previous discussion introduces the following problem: we want to construct
generalized sequences on any topological space in such a way that the convergence
of such generalized sequences characterizes the topology of the space.
Definition 13.23 A binary relation ≥ directs a set D if and only if D ≠ ∅ and
(a) if m, n and p are elements of D such that m ≥ n and n ≥ p, then m ≥ p;
(b) if m ∈ D then m ≥ m;
(c) if m and n are elements of D, then there exists p ∈ D such that p ≥ m and
p ≥ n.
Example 13.18 The real numbers and the natural numbers are directed by their
usual order ≥.
Example 13.19 The family of all neighborhoods of a point in a topological space
is directed by reverse inclusion: U ≥ V if and only if U ⊂ V. Condition
(c) is a consequence of the fact that the intersection of two neighborhoods is a
neighborhood. This is a particularly important example, since it will join the usual
definition of convergence in a topological space with the definition of convergence
of a net.
Example 13.20 The family of all finite partitions of the closed interval [a, b] is a
directed set, when ordered by the relation P2 ≥ P1 if and only if P2 is a refinement
of P1.
Exercise 13.21 Prove that the family of all finite subsets of a set is directed by
direct inclusion: A ≥ B if and only if A ⊃ B.
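The three conditions of Definition 13.23 can be tested exhaustively when D is finite. The sketch below introduces a hypothetical function `directs` and tries it on the subsets of a three-element set ordered by ⊃, which is a finite instance of Exercise 13.21, and on a family that fails condition (c).

```python
from itertools import combinations

def directs(D, geq):
    """Check conditions (a), (b), (c) of a direction on a finite set D."""
    D = list(D)
    if not D:
        return False                         # D must be nonempty
    transitive = all(
        geq(m, p)
        for m in D for n in D for p in D
        if geq(m, n) and geq(n, p)
    )
    reflexive = all(geq(m, m) for m in D)
    has_upper_bounds = all(
        any(geq(p, m) and geq(p, n) for p in D)
        for m in D for n in D
    )
    return transitive and reflexive and has_upper_bounds

# For frozensets, A >= B means that A contains B.
superset = lambda A, B: A >= B

subsets = [frozenset(s) for r in range(4) for s in combinations({1, 2, 3}, r)]
singletons = [frozenset({1}), frozenset({2})]

print(directs(subsets, superset))     # all subsets: two sets have their union above them
print(directs(singletons, superset))  # no member lies above both {1} and {2}
```

The second family shows that reflexivity and transitivity alone do not make a direction: condition (c) is the essential one.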
Definition 13.24 (Nets) A directed set is a pair (D, ≥) such that ≥ directs D. A
net is a pair (S, ≥) such that S is a function and ≥ directs the domain of S. If S is a
function whose domain contains D and D is directed by ≥, then {Sn, n ∈ D, ≥} is
the net (S|D, ≥). A net {Sn, n ∈ D, ≥} is in a set A if and only if Sn ∈ A for each
n ∈ D; it is eventually in A if and only if there exists m ∈ D such that Sn ∈ A for
each n ≥ m. The net is frequently in A if and only if for each m ∈ D there exists
n ∈ D such that n ≥ m and Sn ∈ A.
Important: Notation
The full notation {Sn, n ∈ D, ≥} is too cumbersome for routine use. We will
write {Sn, n ∈ D} when the domain of S plays an explicit role and no confusion can
arise about the direction ≥ in D.
Example 13.21 Let {an}n be a sequence in R. The sequence converges to a (finite)
limit L if and only if the associated net (a, ≥) converges to L in the sense of
Definition 13.25. Indeed, both statements mean: for each neighborhood U of L there
exists an integer N such that n ≥ N implies an ∈ U.
Example 13.22 As another interesting construction based on nets, we consider
summability. Let A be a set, and f : A → R be a real-valued function. We direct
finite subsets of A by ⊃, and for each finite set F ⊂ A we define
S(F, f) = Σ {f(a) | a ∈ F}.
lim_{n→+∞} Σ_{k=1}^{n} f(k) = I.
Example 13.23 We already know that the set 𝒫 of all finite partitions of an interval
[a, b] is a directed set. If f : [a, b] → R is a given function, we can define a net
{L(f, P), P ∈ 𝒫} by letting L(f, P) denote the lower Riemann sum of f with
respect to the partition P. The same can be done for the upper Riemann sum, of
course. Convergence of both nets to a common limit c ∈ R is then equivalent to the
integrability of f on [a, b], and ∫_a^b f dx = c.
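The behaviour of the two nets in Example 13.23 along a chain of refinements can be observed numerically. The sketch below makes a simplifying assumption that is not in the text: f is increasing, so the infimum and the supremum on each subinterval are attained at the left and right endpoints. The function f(x) = x² on [0, 1] and the helper names are chosen only for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def lower_upper(f, P):
    """Lower and upper Riemann sums of an increasing function f on the partition P."""
    pieces = list(zip(P, P[1:]))
    lower = sum(f(a) * (b - a) for a, b in pieces)   # inf attained at the left endpoint
    upper = sum(f(b) * (b - a) for a, b in pieces)   # sup attained at the right endpoint
    return lower, upper

def uniform(n):
    """Uniform partition of [0, 1] into n subintervals."""
    return [Fraction(k, n) for k in range(n + 1)]

f = lambda x: x * x

# uniform(2 * n) refines uniform(n), so this is a chain in the directed set of partitions.
sums = {n: lower_upper(f, uniform(n)) for n in (1, 2, 4, 8, 16)}
for n, (low, up) in sums.items():
    print(n, low, up, up - low)
```

Along the chain the lower sums increase and the upper sums decrease, and the gap between them is 1/n for the uniform partition into n pieces, which is consistent with both nets converging to the integral 1/3.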
Exercise 13.22 Let f : [0, 1] → R be a bounded function. We define the set D
of all ordered pairs (P, ξ) such that P = {a0, . . . , an} is a partition of [0, 1] and
ξ = {ξ1, . . . , ξn} is a finite set of points such that ak−1 ≤ ξk ≤ ak for k = 1, . . . , n.
If (P, ξ) and (Q, η) are elements of D, we define (P, ξ) ≥ (Q, η) if and only if P
is a refinement of Q. It is easy to prove that ≥ directs D. We define the Riemann net
S : D → R such that
S(P, ξ) = Σ_{k=1}^{n} f(ξk)(ak − ak−1).
Finally, let
U(P) = Σ_{k=1}^{n} sup {f(x) | ak−1 ≤ x ≤ ak} (ak − ak−1),
L(P) = Σ_{k=1}^{n} inf {f(x) | ak−1 ≤ x ≤ ak} (ak − ak−1).
1. For every (P, ξ) ∈ D, prove that L(P) ≤ S(P, ξ) ≤ U(P). Prove also that, for
every ε > 0 and for every partition P, there exist ξ and η such that (P, ξ) ∈ D,
(P, η) ∈ D, S(P, ξ) > U(P) − ε and S(P, η) < L(P) + ε.
4. If P, Q are partitions of [0, 1] such that μ(P) < (1/2)δ(Q), prove that U(P) −
L(P) < 2(U(Q) − L(Q)). Hint: observe that every interval of P is contained in
at most two intervals of Q.
5. Prove that if the Riemann net S converges to I, then the integral of f exists and
equals I. Hint: it follows from 3 and 4 that it is sufficient to construct a partition
Q such that U(Q) − L(Q) is small.
Exercise 13.23 Let X be a discrete space; prove that a net S converges to x if and
only if it is eventually equal to x. If X is an indiscrete space, prove that any net
converges to any point of X.
Theorem 13.14 Let X be a topological space.
(a) A point x is an accumulation point of a subset A of X if and only if there exists
a net in A \ {x} which converges to x.
(b) A point x belongs to the closure of a subset A of X if and only if there is a net
in A converging to x.
Proof Suppose x is an accumulation point of A. For every neighborhood U of x
there exists a point SU of A which belongs to U \ {x}. Recall that the family 𝒰 of
all neighborhoods of x is directed by ⊂. Now, if U and V are neighborhoods of x
such that V ⊂ U, then SV ∈ V ⊂ U. The net {SU, U ∈ 𝒰, ⊂} therefore converges
to x. Conversely, if a net in A \ {x} converges to x, then this net has values in every
neighborhood of x, and A \ {x} intersects each neighborhood of x.
To prove (b), we remember that the closure of a subset consists of A together
with the set of all accumulation points of A. If x is an accumulation point of A, by
the preceding discussion there exists a net in A converging to x. Furthermore, if x is
a point of A, the net which is constantly equal to x converges to x. In any case, each
point of the closure of A has a net in A which converges to it. On the other hand,
if a net in A converges to x, then every neighborhood of x intersects A, and x thus
belongs to Ā.
Remark 13.7 The previous proof is a typical example of the use of the Axiom of
Choice. Although this is usually taken for granted, when we define the net U → SU
we actually choose a point SU from the neighborhood U, and this choice can only
be justified by the Axiom of Choice.
We have seen that in an indiscrete space, any net converges to any point. This is
in striking contrast to the familiar case of R^N, in which a sequence can have at most
one limit. Honestly, it is hard to imagine a sequence converging to any point, since
our mind immediately draws pictures in the physical space R³.
Definition 13.26 A topological space X is a Hausdorff space, or a T2-space, if and
only if whenever x and y are distinct points of X, there exist disjoint neighborhoods
of x and y. See Fig. 13.5.
In our intuition, this separation property is usually taken for granted. As we said
before, we always think in R³, which is a metric space. This makes the difference.
Example 13.24 For any positive integer N, the space R^N (with the usual topology) is a
Hausdorff space. Indeed, let x and y be two distinct points of R^N. We call δ = ‖x −
y‖ > 0 their distance. The sets B(x, δ/3) and B(y, δ/3) are disjoint neighborhoods
of x and y.
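The separation property of Definition 13.26 is easy to test on finite spaces, where it contrasts sharply with the metric case just discussed. The sketch below uses a hypothetical function `is_hausdorff`; it looks for disjoint open sets, which is equivalent to looking for disjoint neighborhoods because every neighborhood contains an open set around the point.

```python
from itertools import combinations

def is_hausdorff(X, tau):
    """True if any two distinct points have disjoint open sets around them."""
    for x, y in combinations(list(X), 2):
        separated = any(
            x in U and y in V and not (U & V)
            for U in tau for V in tau
        )
        if not separated:
            return False
    return True

X = frozenset({1, 2, 3})
discrete = [frozenset(s) for r in range(4) for s in combinations(X, r)]
indiscrete = [frozenset(), X]

print(is_hausdorff(X, discrete))     # singletons separate the points
print(is_hausdorff(X, indiscrete))   # the only nonempty open set is X itself
```

The indiscrete space fails the test, in agreement with the earlier observation that every net in such a space converges to every point.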
Theorem 13.15 A topological space is a Hausdorff space if and only if each net in
the space converges to at most one point.
Proof One half of the proof is standard. Indeed, assume that X is a Hausdorff space
and x, y are distinct points of X. By definition, we can pick disjoint neighborhoods
U and V of x and y. Since a net cannot be eventually in each of the disjoint sets U
and V , it is clear that a converging net in X cannot have distinct limits. Conversely,
we proceed by contradiction, assuming that X is not a Hausdorff space and x and y
are distinct points such that every neighborhood of x intersects every neighborhood
of y. We now construct a net in X that converges to both x and y. Let 𝒰x be the
family of all neighborhoods of x, and 𝒰y be the corresponding family of y. Both
families are directed by inclusion. The cartesian product 𝒰x × 𝒰y is ordered by
τ − lim{Sn, n ∈ D, ≥} = x.
Exercise 13.24 Prove that the trick introduced in the previous proof is a general
one. Precisely, if (D, ≥D) and (E, ≥E) are directed sets, prove that D × E is directed
by (d, e) ≥D×E (f, g) if and only if d ≥D f and e ≥E g. This is the product
directed set.
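The key step in Exercise 13.24 is condition (c): an upper bound of two pairs is obtained by taking upper bounds componentwise. The sketch below illustrates this on a concrete product chosen by us, namely a few natural numbers with their usual order and the subsets of {1, 2} ordered by ⊃; the names `geq_product` and `upper_bound` are hypothetical.

```python
from itertools import product

def geq_product(p, q):
    """(d, e) >= (f, g) in the product iff d >= f and e contains g."""
    (d, e), (f, g) = p, q
    return d >= f and e >= g

def upper_bound(p, q):
    """Componentwise upper bound: the larger number and the union of the sets."""
    (d, e), (f, g) = p, q
    return (max(d, f), e | g)

naturals = range(4)
finite_sets = [frozenset(s) for s in ([], [1], [2], [1, 2])]
pairs = list(product(naturals, finite_sets))

# Condition (c): the componentwise bound dominates both pairs.
condition_c = all(
    geq_product(upper_bound(p, q), p) and geq_product(upper_bound(p, q), q)
    for p in pairs for q in pairs
)
print(condition_c)
print(upper_bound((1, frozenset({1})), (3, frozenset({2}))))
```

Reflexivity and transitivity of the product relation follow in the same componentwise way from the corresponding properties of each factor.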
Example 13.25 Cartesian products can be directed even in a much more general
situation. The cartesian product ∏ {Da | a ∈ A} of a family of sets is the set of all
functions d on A such that d(a) ∈ Da for each a ∈ A. Suppose now each Da is
directed by ≥a. The product ∏ {Da | a ∈ A} is directed by d ≥ e if and only if
d(a) ≥a e(a) for each a ∈ A. This is a very natural order relation on the product set.
We now verify that we have constructed a directed set. Let d and e be two members
of the product set. For each a ∈ A there exists a member f(a) of Da such that
f(a) ≥a d(a) and f(a) ≥a e(a). Consequently the function f whose value at a is
Remark 13.9 The word “subnet” should not be taken too seriously. While a subsequence
is always indexed by a selection of natural numbers, a subnet may be indexed
by a set which has nothing to do with the set of indices of the original net. This is
an important fact to remember, if we want to avoid silly mistakes.
We have already seen that the usual approach to subsequences is to require that
the function N be strictly increasing. This of course guarantees that the second
condition is satisfied.
We now try to relate cluster points and convergent subnets in general topological
spaces.
Lemma 13.1 Let S be a net and 𝒜 be a family of sets such that S is frequently in
each member of 𝒜, and such that the intersection of two members of 𝒜 contains a
member of 𝒜. Then there exists a subnet of S that is eventually in each member of
𝒜.
eventually in A.
Definition 13.29 A point x of the space is a cluster point of a net S if and only if S
is frequently in every neighborhood of x.
Theorem 13.16 A point x in a topological space is a cluster point of a net S if and
only if some subnet of S converges to x.
Proof Let x be a cluster point of a net S, and let 𝒰 be the family of all
neighborhoods of x. The intersection of two members of 𝒰 is again a member of 𝒰.
The preceding Lemma applies, and there is a subnet of S which is eventually in each
neighborhood of x. This means that this subnet converges to x. On the other hand, if
x is not a cluster point of S, then there exists a neighborhood U of x such that S
is not frequently in U, and therefore S is eventually in the complement of U. Then
each subnet of S is eventually in the complement of U and hence cannot converge
to x.
We conclude this section with some remarks about the limit of a function in
general spaces. We have seen that nets are a genuine generalization of sequences, in
the sense that any sequence is a net since ℕ is directed by the usual order. But what
about limits of functions?
Example 13.26 Suppose that Y is a topological space, (X, d) is a metric space, A
is a subset of X, x0 an accumulation point of A, and f : A → Y a function. The set
A \ {x0} is directed as follows: x ≥ y if and only if d(x, x0) ≤ d(y, x0). Since x0
The last example shows that we can use nets for defining limits of functions if
the domain of the function is a metric space. In particular we recover the theory
of limits for functions of a real variable. However, the situation is less clear if the
domain of the function is merely a topological space. The following is the ad hoc
definition that we can find in many textbooks.
Definition 13.30 Suppose that Y is a topological space, X is a topological space, A
is a subset of X, x0 an accumulation point of A, and f : A → Y a function. We say
that y0 ∈ Y is a limit of f as x tends to x0 if and only if for every neighborhood V
of y0 there exists a neighborhood U of x0 such that f((A ∩ U) \ {x0}) ⊂ V.
If Y is a Hausdorff space, it is easy to prove that the limit, if it exists, must be unique.
Hence we can write y0 = lim_{x→x0} f(x), and we are happy.
The question now is: can we interpret this definition of limit in the setting of
nets? Well, in order that {f, A \ {x0}} be a net, the domain A \ {x0} must be directed
in such a way that x ≥ y means “x is closer to x0 than y.” In general there is no
canonical way to ensure that such an order relation exists: topologies do not always
allow us to measure the distance between points. Should we give up? No, although the
answer is probably not as elegant as we may hope.
Definition 13.31 (Limits Through Nets) Suppose that Y is a topological space,
X is a topological space, A is a subset of X, x0 an accumulation point of A, and
f : A → Y a function. We say that y0 ∈ Y is a limit of f as x tends to x0 if and only
if for every net {xα, α ∈ D} in A \ {x0} converging to x0, the net {f(xα), α ∈ D}
converges to y0 in Y.
Notice that this is the straightforward generalization of Definition 7.1.
Exercise 13.26 Prove that Definitions 13.30 and 13.31 are indeed equivalent. Hint:
in one direction you will need to construct a suitable net in A \ {x0} by using the
family of neighborhoods of x0 directed by reverse inclusion. Compare with Theorem
7.1.
Remark 13.10 Let A, B be disjoint subsets of a topological space X, and let Y =
A ∪ B. The set A (resp. B) is closed in Y if and only if for each net {Sn, n ∈ D} in A
(resp. in B) converging to a limit x in the relative topology of Y there results x ∈ A
(resp. x ∈ B). Now, Sn → x if and only if Sn eventually lies in every open (with
respect to the topology of Y) neighborhood of x, i.e. if and only if for every open
(with respect to the topology of X) neighborhood U of x there results Sn ∈ U ∩ Y
13.7 Continuous Maps and Homeomorphisms 223
How can we prove that two spaces are, or are not, homeomorphic? It doesn’t
come as a surprise that this is a difficult task. As we said before, topology is exactly
the study of properties that are left invariant under homeomorphisms.
Definition 13.35 A topological invariant is a property which when possessed by a
space is also possessed by each homeomorphic space.
This definition offers the following consequence: if we can exhibit a topological
invariant which is possessed by one space but not by another space, then the two
spaces are not homeomorphic.
Theorem 13.19 If X is a connected space and .f : X → Y is a continuous function,
then .f (X) is also connected.
Proof First of all, we may assume that f is surjective, since the map .f : X → f (X)
is continuous and surjective. Assume that B is both open and closed in Y . Hence
$f^{-1}(B)$ is both open and closed in X by Theorem 13.18. Since X is connected,
\[ P_i : \prod_{j=0}^{n} X_j \to X_i \]
Theorem 13.21 If $\prod_{i=0}^{n} X_i$ is endowed with the product topology, then each $P_i$ is
a continuous function.
Proof Indeed, if .Ui is an open subset of .Xi , then
Proof Let $\tau$ be any topology on $X \times Y$ such that $P_X$ and $P_Y$ are both continuous. If
U is open in X and V is open in Y, then $U \times V$ is open in $\tau$, since $U \times V =
P_X^{-1}(U) \cap P_Y^{-1}(V)$, and this intersection is open as the intersection of two open
subsets. Hence the product topology is smaller than $\tau$, and we conclude by the
arbitrariness of $\tau$.
Example 13.30 The space $\mathbb{R}^N$ is the product space of N copies of $\mathbb{R}^1 = \mathbb{R}$, each
factor being endowed with the usual topology. The N-cells of the form $\prod_{i=1}^{N} (a_i, b_i)$,
where $a_i$ and $b_i$ are real numbers, are a base of the product topology of $\mathbb{R}^N$.
Exercise 13.30 Using some geometric intuition, prove that the product topology of
$\mathbb{R}^N$ is actually equivalent to the usual topology generated by the distance
\[ d(x, y) = \sqrt{\sum_{i=1}^{N} |x_i - y_i|^2}. \]
As a hint, you should prove that each open ball contains an N-cell and is contained
in another N-cell. This is fairly obvious, but you should make an effort to prove it.
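The hint can be checked numerically before attempting a proof. The following Python sketch is only an illustration and proves nothing: the helper names (`inner_cell`, `outer_cell`, `in_cell`, `in_ball`, `check_inclusions`) are hypothetical, and the half-sides $r/\sqrt{N}$ and $r$ are one convenient choice among many. It samples random points and tests that the smaller cell lies inside the open ball of radius r, which in turn lies inside the larger cell.

```python
import math
import random


def inner_cell(center, r):
    """Open cell of half-side r/sqrt(N) around center; it lies inside the ball."""
    h = r / math.sqrt(len(center))
    return [(c - h, c + h) for c in center]


def outer_cell(center, r):
    """Open cell of half-side r around center; it contains the ball."""
    return [(c - r, c + r) for c in center]


def in_cell(x, cell):
    """True if x belongs to the open cell, a product of open intervals."""
    return all(a < xi < b for xi, (a, b) in zip(x, cell))


def in_ball(x, center, r):
    """True if x belongs to the open Euclidean ball B(center, r)."""
    return math.dist(x, center) < r


def check_inclusions(center, r, samples=10_000, seed=0):
    """Test both inclusions on random points drawn from the outer cell."""
    rng = random.Random(seed)
    small, large = inner_cell(center, r), outer_cell(center, r)
    for _ in range(samples):
        x = [rng.uniform(c - r, c + r) for c in center]
        if in_cell(x, small) and not in_ball(x, center, r):
            return False
        if in_ball(x, center, r) and not in_cell(x, large):
            return False
    return True
```

A call such as `check_inclusions([0.0, 0.0, 0.0], 1.0)` only reports whether a counterexample was found among the sampled points. The actual proof rests on the inequalities $\max_i |x_i - y_i| \leq d(x, y) \leq \sqrt{N} \max_i |x_i - y_i|$.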
In some applications to Functional Analysis, finite products of spaces are not
enough. A typical situation is that of function spaces and weak topologies. As we
have seen in Set Theory, the cartesian product of any collection of sets is already a
set of functions.
228 13 Neighbors Again: Topological Spaces
where F is a finite subset of A, and $U_a$ is open in $X_a$ for each $a \in F$. A strange fact
arises from this discussion: a subset of the form $\prod_{a \in A} U_a$, where each $U_a$ is open in
$X_a$, need not be open in the product topology! This however is true if only finitely
for some open sets $U_a \subset X_a$, $a \in A$. This looks rather natural, since it is the
straightforward generalization of the product topology in .X × Y (or in any product
space with finitely many factors). Nevertheless, there is no reason why Theorem
13.24 should hold for such a topology. Indeed, looking at the proof, it is true that
$\{S_n, n \in D\}$ is eventually in $P_a^{-1}(U_a)$, say $S_n \in P_a^{-1}(U_a)$ as soon as $n \geq n_a$.
If however A contains infinitely many elements, it may very well happen that $n_a$
“escapes to infinity”, and it is therefore impossible to select a single element $\bar{n} \in D$
such that $S_n \in P_a^{-1}(U_a)$ when $n \geq \bar{n}$ for every $a \in A$. It was possible for a finite
number of indices a, of course.
Remark 13.11 The previous result somehow motivates the terminology of pointwise
convergence for convergence in the product topology. This is even clearer when
we are dealing with a product of identical factors, $\prod_{a \in A} X = X^A$, which is simply
the collection of all functions from A to X. A net $\{F_n, n \in D\}$ converges to f in $X^A$
if and only if $\{F_n(a), n \in D\}$ converges to $f(a)$ for each $a \in A$.
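A concrete sequence (a special case of a net) shows how the index from which convergence takes place depends on the coordinate a. The Python sketch below uses the illustrative choice $f_n(a) = a^n$ on $[0, 1]$, which does not come from the text, and the hypothetical helper `first_index`. It computes, for a fixed tolerance, the first index after which $f_n(a)$ stays close to the pointwise limit.

```python
def f(n, a):
    """n-th term of the sequence of functions f_n(a) = a**n on [0, 1]."""
    return a ** n


def limit(a):
    """Pointwise limit of f_n on [0, 1]: 0 for a < 1 and 1 at a = 1."""
    return 1.0 if a == 1.0 else 0.0


def first_index(a, eps, max_n=1_000_000):
    """Smallest n >= 1 with |f_n(a) - limit(a)| < eps.

    For 0 <= a <= 1 the error is non-increasing in n, so the
    inequality keeps holding for all larger indices.
    """
    for n in range(1, max_n + 1):
        if abs(f(n, a) - limit(a)) < eps:
            return n
    raise ValueError("no index found below max_n")


if __name__ == "__main__":
    # The required index grows as a approaches 1.
    for a in (0.5, 0.9, 0.99, 0.999):
        print(a, first_index(a, 1e-2))
```

The index grows without bound as a approaches 1, so no single index works for all coordinates at once. The sequence still converges in the product topology, because each coordinate is examined separately.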
Theorem 13.25 The product of Hausdorff spaces is a Hausdorff space.
Proof Let x and y be distinct members of $\prod_{a \in A} X_a$. Then $x_a \neq y_a$ for some a.
By assumption, there exist disjoint open neighborhoods U and V of $x_a$ and $y_a$
respectively. Then $P_a^{-1}(U)$ and $P_a^{-1}(V)$ are disjoint open neighborhoods of x and
y respectively, so that the product space is Hausdorff.
We now take a break, and go back to the comparison of sequences and nets. We
have insisted that nets completely describe a topology, while sequences in general
do not. But why? The following is an example that leans on the product topology.
Example 13.31 We consider the space .X = RR with the product topology. Let
for some finite set .F ⊂ R and some .ε > 0. Such a neighborhood intersects E in
the function h which is 0 on elements of F and 1 elsewhere. Hence $g \in \overline{E}$. We
now claim that no sequence in E can converge to g. Indeed, if $\{f_n\}_n$ is a sequence
in E with $f_n$ being equal to zero on a set $A_n$, then any function which is a limit of
$\{f_n\}_n$ can be zero at most on the countable set $\bigcup_{n=1}^{\infty} A_n$. Clearly g does not meet
this condition, and the claim is proved.
Sequences do not suffice to assign a topology on a set. A natural question is
whether we can add any additional requirement in order to restore the full power of
converging sequences.
Definition 13.40 A topological space satisfies the first countability axiom, or
briefly is first countable, if the neighborhood system of each point of the space has
a countable base.
Theorem 13.26 Let X be a first countable space.
(a) A point x is an accumulation point of a subset A if and only if there exists a
sequence in .A \ {x} which converges to x.
(b) A subset A is open if and only if each sequence which converges to a point of A
is eventually in A.
(c) If x is an accumulation point of a sequence S, there is a subsequence of S
converging to x.
Proof Suppose first that x is an accumulation point of a subset A, and that $U_0$,
$U_1$, . . . , $U_n$, . . . is a countable base of the neighborhood system at x. Let $V_n =
\bigcap_{i=0}^{n} U_i$. Then the sequence $V_0, V_1, \ldots, V_n, \ldots$ is also a base of the neighborhood
system at x with the additional property that $V_{n+1} \subset V_n$ for each n. For each n we
select a point $S_n$ in $V_n \cap (A \setminus \{x\})$, thus obtaining a sequence $\{S_n\}_n$ which clearly
converges to x. The converse of (a) is trivial.
If a subset A is not open in X, then there is a sequence in .X \ A which converges
to a point of A. Such a sequence fails to be eventually in A, and (b) is proved.
Finally, suppose that x is an accumulation point of a sequence S and that
$V_0, V_1, \ldots$ is a countable base for the neighborhood system at x such that $V_{n+1} \subset
V_n$ for each n. For any integer i we choose $N_i$ such that $N_i \geq i$ and $S_{N_i} \in V_i$. Thus
$\{S_{N_i}\}_i$ is a subsequence of S which converges to x.
Remark 13.12 Any metric space .(X, d) satisfies the first axiom of countability.
Indeed the sequence
\[ U_n = \left\{ x \;\middle|\; d(x, x_0) < \frac{1}{n} \right\} \]
13.8 Product Spaces, Quotient Spaces, and Inadequacy of Sequences 231
is a countable base of the neighborhood system at a given point .x0 ∈ X. This is the
reason why mathematical analysis in .RN can be completely explained in terms of
sequences.
A nice consequence of the density of the rational numbers in .R is contained in
the following result.
Theorem 13.27 (Lindelöf) Let $\mathcal{C}$ be a collection of open sets of $\mathbb{R}$. Then there
exists a countable sub-collection $\{O_n\}_n$ of $\mathcal{C}$ such that
\[ \bigcup \{O \mid O \in \mathcal{C}\} = \bigcup_{n=1}^{\infty} O_n. \]
Proof We call $U = \bigcup \{O \mid O \in \mathcal{C}\}$ and we pick any $x \in U$. As such, there exists
$O \in \mathcal{C}$ such that $x \in O$. But O is an open set, hence there exists an open interval
$I_x$ such that $x \in I_x \subset O$. By the density of the rational numbers, there exists an open
interval $J_x$ whose end-points are rational numbers and which satisfies $x \in J_x \subset I_x$.
Since the collection of all open intervals with rational end-points is countable, we see
that $\{J_x \mid x \in U\}$ is countable and $U = \bigcup \{J_x \mid x \in U\}$. Now, for each interval
in $\{J_x \mid x \in U\}$ we select a set O from $\mathcal{C}$ which contains it. In this way we have
constructed a countable sub-collection $\{O_n\}_n$ of $\mathcal{C}$ such that $U = \bigcup_{n=1}^{\infty} O_n$, and the
proof is complete.
This suggests a general definition connecting open covers and countability properties
of the local neighborhoods.
Definition 13.41 A topological space X is a Lindelöf space if and only if every
open cover of X has a countable sub-cover.
After this discussion on the importance of nets, we return to the second
construction of a new space from an old one. Suppose that we are given a function
f from a topological space X onto a set Y . Can we topologize Y so that f is
continuous?
Definition 13.42 Let X be a topological space, and f be a function defined on X
with range Y . The family .U of all sets .U ⊂ Y such that .f −1 (U ) is open in X is the
quotient topology of Y induced by f .
Exercise 13.33 By using the elementary properties of preimages, prove that the
quotient topology is indeed a topology, i.e. that .U satisfies the axioms of a topology.
Suppose that Y has a topology $\tau$ such that f is a continuous function. For each
$U \in \tau$, we then have that $f^{-1}(U)$ is open in X, which proves that U is also open for
the quotient topology. We have thus proved that the quotient topology is the largest
topology on Y such that f is a continuous function.
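For finite spaces the quotient topology can be computed by brute force, which may help to digest the definition. The Python sketch below is only an illustration under simplifying assumptions: the space is finite, the map f is stored as a dictionary, and all names (`powerset`, `preimage`, `quotient_topology`, `is_topology`, and the sample data `X`, `TAU`, `F`) are hypothetical.

```python
from itertools import combinations


def powerset(points):
    """All subsets of a finite collection of points, as frozensets."""
    pts = list(points)
    return [frozenset(c) for k in range(len(pts) + 1)
            for c in combinations(pts, k)]


def preimage(f, subset, domain):
    """Preimage of subset under the map f, given as a dict on domain."""
    return frozenset(x for x in domain if f[x] in subset)


def quotient_topology(domain, opens, f):
    """Family of all U in Y = f(domain) such that the preimage of U is open."""
    open_sets = {frozenset(o) for o in opens}
    image = {f[x] for x in domain}
    return {u for u in powerset(image)
            if preimage(f, u, domain) in open_sets}


def is_topology(points, family):
    """Check the axioms of a topology for a finite family of subsets."""
    whole = frozenset(points)
    if frozenset() not in family or whole not in family:
        return False
    return all((a | b) in family and (a & b) in family
               for a in family for b in family)


# A four-point space with a chain of open sets, and a map onto {"a", "b"}.
X = {1, 2, 3, 4}
TAU = [set(), {1}, {1, 2}, {1, 2, 3, 4}]
F = {1: "a", 2: "a", 3: "b", 4: "b"}
QUOTIENT = quotient_topology(X, TAU, F)
```

In this example the set {"a"} is open in the quotient because its preimage {1, 2} is open in X, whereas {"b"} is not, since {3, 4} is not open. The check `is_topology({"a", "b"}, QUOTIENT)` is a finite instance of the verification required in Exercise 13.33.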
Exercise 13.34 Prove that a set B is closed in the quotient topology if and only if
$f^{-1}(B)$ is a closed subset of X. Hint: $f^{-1}(Y \setminus B) = X \setminus f^{-1}(B)$.
Cartesian product and quotient spaces are actually special cases of more general
constructions of topologies which preserve the continuity of given families of
functions.
Problem Suppose that .Xα is a topological space for each index .α ∈ A, and suppose
that Y is a set. Functions .fα : Xα → Y are given. We wish to find a topology on Y
such that each .fα is a continuous function.
There is a trivial answer, of course, so we add a requirement: we wish to find the
largest topology on Y such that each .fα is continuous.
Definition 13.43 The largest topology on Y such that each .fα is continuous is
called the final topology on Y with respect to .{fα | α ∈ A}. This topology can
be explicitly described: a subset U of Y is open if and only if .fα−1 (U ) is open in .Xα
for each .α ∈ A.
The following result characterizes the final topology.
Theorem 13.29 (Universal Property of the Final Topology) For each .α ∈ A,
suppose that .Xα is a topological space. The topology of a space Y is the final
topology with respect to the functions .fα : Xα → Y if and only if the following
condition is satisfied: for any topological space Z, a function .g : Y → Z is
continuous if and only if .g ◦ fα : Xα → Z is continuous for every .α ∈ A.
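Proof Suppose first that Y is endowed with the final topology with respect to
$\{f_\alpha \mid \alpha \in A\}$. If g is continuous, then $g \circ f_\alpha$ is continuous for every $\alpha$ as the
composition of continuous functions. Conversely, if $g \circ f_\alpha$ is continuous for every $\alpha$, then
\[ f_\alpha^{-1}\bigl(g^{-1}(U)\bigr) = (g \circ f_\alpha)^{-1}(U) \]
is open in $X_\alpha$ for any $\alpha$ and for any open $U \subset Z$. Hence $g^{-1}(U)$ is open in Y and g
is continuous.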
13.9 Initial and Final Topologies 233
Suppose now that the topology of Y satisfies the condition in the Theorem. Then
each .fα : Xα → Y is continuous, since the identity map on Y is continuous. Calling
.Ȳ the set Y endowed with the final topology with respect to .{fα | α ∈ A}, the
Proof Suppose first that g is continuous. Each $f_\alpha \circ g$ is then continuous. On the
other hand, suppose that each $f_\alpha \circ g$ is continuous. Then, for any open $U_\alpha \subset Y_\alpha$,
\[ g^{-1}\bigl(f_\alpha^{-1}(U_\alpha)\bigr) = (f_\alpha \circ g)^{-1}(U_\alpha) \]
More explicitly, a space X is compact if and only if for every open cover
$\{U_\alpha \mid \alpha \in A\}$ of X it is possible to find a finite set $F = \{\alpha_1, \ldots, \alpha_n\} \subset A$ such
that
\[ X \subset U_{\alpha_1} \cup \cdots \cup U_{\alpha_n}. \]
Remark 13.13 The difficulty of the definition is that there is no restriction on the
cardinality of the open cover. Compactness is a highly demanding property.
As a first step, we characterize compactness in terms of an intersection property.
Definition 13.48 A family .A of sets has the finite intersection property if and only
if the intersection of the members of each finite subfamily of .A is non-empty.
Theorem 13.32 A topological space is compact if and only if each family of closed
subsets which has the finite intersection property has a non-empty intersection.
Proof If $\mathcal{A}$ is a family of subsets of a topological space X, then
\[ X \setminus \bigcup_{A \in \mathcal{A}} A = \bigcap_{A \in \mathcal{A}} (X \setminus A). \]
5 The set .An can be called the n-tail of the net .{Sn , n ∈ D}.
The converse statement is less trivial, and generally false. It becomes true under
a separation assumption.
Theorem 13.35 If A is a compact subset of a Hausdorff space X and x is a point of
$X \setminus A$, then there are disjoint neighborhoods of x and A. In particular, each compact
is closed in Y. This proves that $(f^{-1})^{-1}(A)$ is closed, and the continuity of $f^{-1}$
follows.
Exercise 13.36 Provide an alternative proof of the previous result by using nets and
Theorem 13.33.
Theorem 13.37 If A and B are disjoint compact subsets of a Hausdorff space X,
then there exist disjoint neighborhoods of A and B.
Proof By Theorem 13.35, to each $x \in A$ there corresponds a neighborhood of
x and a neighborhood of B which are disjoint. As a consequence there exists a
neighborhood U of x such that $\overline{U} \cap B = \emptyset$, and since A is compact there exists a
finite family $U_i$, $i = 0, \ldots, n$, such that $\overline{U_i} \cap B = \emptyset$ for $i = 0, \ldots, n$ and $A \subset V =
\bigcup \{U_i \mid i = 0, \ldots, n\}$. Then V is a neighborhood of A and $X \setminus \overline{V}$ is a neighborhood
of B which is disjoint from V.
The fundamental example of compact sets for a mathematical analyst is clearly
the N-cell in .RN . We develop here a proof which does not make use of Tychonoff’s
theorem on the product of compact sets.
Definition 13.49 An N-cell in .RN is a cartesian product of N closed and bounded
intervals, i.e. a set of the form
\[ \left\{ x \in \mathbb{R}^N \;\middle|\; \text{for each } i \in \{1, \ldots, N\},\ a_i \leq x_i \leq b_i \right\}, \]
where $a_i$ and $b_i$ are real numbers. Hence an N-cell is the cartesian product
We now consider the general case $N > 1$. Suppose that $I_n$ consists of all points
$x = (x_1, \ldots, x_N)$ such that $a_{n,j} \leq x_j \leq b_{n,j}$ for each n and $j \in \{1, \ldots, N\}$.
We define $I_{n,j} = [a_{n,j}, b_{n,j}]$. For fixed j, the sequence $I_{n,j}$ of 1-cells has non-empty
intersection. Hence there exist real numbers $x_j^*$, $j = 1, \ldots, N$, such that
$a_{n,j} \leq x_j^* \leq b_{n,j}$ for each n and $j \in \{1, \ldots, N\}$. The point $x^* = (x_1^*, \ldots, x_N^*)$ lies
in each $I_n$, and the proof is complete.
Unlike most topological spaces, the Euclidean space .RN possesses a complete
characterization of all compact subsets. This is an important result for analysis, and
we provide a statement that anticipates a more general result about sequences in
compact spaces.
Theorem 13.39 For a subset K of .RN the following statements are equivalent to
each other:
(a) K is closed and bounded;
(b) K is compact;
(c) every sequence in K has an accumulation point in K.
Proof Since any bounded subset is contained in a suitable N-cell, we use Theorem
13.34 to show that (a) implies (b). Suppose now that K is compact and E is an
infinite subset of K. If no point of K is an accumulation point of E, then each point
of K has an open neighborhood which contains at most one point of E. The union
of all these neighborhoods is an open cover of K which has no finite subcover,
since E is an infinite set. Thus (b) implies (c). We prove that (c) implies (a). If K
is unbounded, we can construct a sequence of points .xn ∈ K such that .|xn | > n
for each n. The set of these points is infinite and has no accumulation point in
K, in contradiction with (c). Hence K is bounded. Suppose that K is not closed,
i.e. there exists a point .x0 of .RN which is an accumulation point of K but does
not belong to K. For each positive integer n, there exist points .xn ∈ K such that
.|xn − x0 | < 1/n. Let .S = {xn | n ∈ N}. The set S is infinite, otherwise the positive
\[ \geq |x_0 - y| - \frac{1}{n} \geq \frac{1}{2} |x_0 - y| \]
for all but finitely many values of n. This shows that y cannot be an accumulation
point of S. We have reached a contradiction with (c), so that K must be closed.
What about sequences? Recall that we called sequentially compact any subset K
of .R with the following property: any sequence in K has a converging subsequence
in K. It is evident that the last property can be generalized to any setting.
Definition 13.50 A topological space is sequentially compact if and only if every
sequence in the space has a subsequence which converges to some point of the
space. A subset is sequentially compact if and only if it is sequentially compact in
the relative topology.
We already know that sequences are not sufficient to describe the topology in
the general case. As a matter of fact, sequential compactness is not equivalent to
compactness in a generic topological space.
Definition 13.51 A topological space X satisfies the second axiom of countability
if and only if the topology of X has a countable base.
13.10 Compact Spaces 239
Theorem 13.40 If X satisfies the second axiom of countability, then the following
statements are equivalent:
(a) Every sequence in X has an accumulation point;
(b) for each sequence in X there exists a subsequence converging to a point of X;
(c) X is compact.
Proof If the topology of X has a countable base, then every open cover of X has a
countable subcover. It is sufficient to prove that (a) implies (c), since (a) and (b) are
equivalent by previous results. So, we must show that every open cover of X has a
finite subcover. By the first remark, we may assume that the open cover is a sequence of
open sets
\[ A_0, A_1, A_2, \ldots, A_n, \ldots \]
By induction we set $B_0 = A_0$, and for each $p \in \mathbb{N}$ we define $B_p$ as the first member
of the cover which is not covered by
\[ B_0 \cup B_1 \cup \cdots \cup B_{p-1}. \]
uncountable, requires much more care. We begin with a more refined criterion for
compactness via nets.
Definition 13.53 A net .{Sn , n ∈ D} in a topological space X is a universal net if
and only if for each .E ⊂ X, it is either eventually in E or eventually in .X \ E.
Remark 13.14 A universal net in a topological space converges to any of its
accumulation points. Indeed, if the net is frequently in a set, then it is eventually
in this set.
Theorem 13.42 (Kelley) Let X be a non-empty set. Every net in X has a universal
subnet.
Proof Let .S = {Sn , n ∈ D} be a net in X and let
= {F ⊂ X | S is eventually in F } .
.
Let .A ⊂ X and suppose that A does not belong to .. We claim that .X \ A ∈ .
Indeed, either S is eventually in .X \ A or there exists .B ∈ such that .A ∩ B = ∅. If
S is eventually in .X \ A then .X \ A ∈ ⊂ . Therefore we assume that there exists
.B ∈ such that .A ∩ B = ∅. Then S must be frequently in .X \ A or else it could
D≥n = {p ∈ D | p ≥ n} .
.
1. Let $\mathcal{S}$ be the collection of all subnets of the net S. Since $S \in \mathcal{S}$, we see that
$\mathcal{S} \neq \emptyset$.
2. If .T1 ∈ S and .T2 ∈ S, we define .T1 ≥ T2 if and only if .T1 is a subnet of .T2 . It
can be proved that .≥ is a partial order on .S.
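3. Let $\{T_i \mid i \in I\}$ be$^{8}$ a totally ordered subset of $\mathcal{S}$, where each $T_i$ is defined on a
directed set $E^i$. We set
\[ E = \left\{ E^i_{\geq m_i} \;\middle|\; m_i \in E^i,\ i \in I \right\}, \]
and we order E by defining $E^j_{\geq m_j} \geq E^i_{\geq m_i}$ if and only if $T_j \geq T_i$ and
\[ T_j\bigl(E^j_{\geq m_j}\bigr) \subset T_i\bigl(E^i_{\geq m_i}\bigr). \]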
4. If $E^i_{\geq m_i}$ and $E^j_{\geq m_j}$ are given, and if $T_j \geq T_i$, we can find $m'_j \in E^j$ such that
\[ T^*\bigl(E^i_{\geq m_i}\bigr) = T_i(m_i). \]
8 Here we are using a bound-variable notation of a collection of nets. The variable i is not the
dummy variable which runs over a directed set, but a dummy variable which labels the elements
of the collection. Unfortunately it would be quite difficult to switch to an intrinsic notation.
universal net converges, and therefore the universal net itself must converge by
Theorem 13.24.
We present a slightly different proof of Tychonoff’s Theorem, due to Paul R.
Chernoff [2].
Proof Let $\{X_\alpha \mid \alpha \in A\}$ be an indexed family of non-empty topological spaces,
each of which is compact. A basic neighborhood N of an element $f \in X =
\prod_{\alpha \in A} X_\alpha$ is determined by a finite subset $F \subset A$, together with neighborhoods
$U_\alpha$ of $f(\alpha)$ in $X_\alpha$ for every $\alpha \in F$. Hence N consists of all $h \in X$ such that for
Since any two elements of .L must agree on their common domain, .g0 is a partially
defined element of X. Furthermore, .g0 ∈ P, since every basic neighborhood of .g0
has finite support F , and thus F is contained in the domain of .gλ for some .λ ∈ .
To summarize, .g0 ∈ P and .g0 is an upper bound of .L.
We can now use Zorn’s Lemma, which yields a maximal element g in $\mathcal{P}$. We
want to show that the domain J of g coincides with A. Otherwise, we may choose
$k \in A \setminus J$. Now g is an accumulation point in $\prod_{\alpha \in J} X_\alpha$ of the net $\{(S_n)|_J \mid n \in D\}$,
and thus g is the limit of some subnet $\{(S_{\varphi(\beta)})|_J\}_{\beta \in B}$.
Since $X_k$ is a compact space, the net $\{S_{\varphi(\beta)}(k) \mid \beta \in B\}$ has an accumulation
point .p ∈ Xk . We define a function h with domain .J ∪{k} by setting .h = g on J and
.h(k) = p. It is clear that h is a partial accumulation point of the net .{Sn , n ∈ D},
proof is complete.
Yet another proof of Tychonoff’s Theorem is based on the following result, of
independent interest.
Theorem 13.45 (Alexander Sub-Base Theorem) Let X be a topological space
with a sub-base .B. Then the following are equivalent:
(i) Every open cover has a finite subcover (i.e. X is compact);
(ii) Every sub-basic open cover has a finite subcover.
With a clear choice of words, a sub-basic open cover is merely a cover which
consists of elements taken from the sub-base .B.
Proof We propose a proof by T. Tao. Call an open cover bad if it has no finite
subcover, and good otherwise. It suffices to show that if every sub-basic open cover
is good, then every basic open cover is also good, where basic refers to the basis
\[ \mathcal{B}^* = \{B_1 \cap \cdots \cap B_k \mid B_1, \ldots, B_k \in \mathcal{B},\ k \in \mathbb{N}\}, \]
which is the standard basis associated to the sub-basis $\mathcal{B}$. Suppose for contradiction that
every sub-basic open cover was good, but at least one basic open cover was bad. If
we order the bad basic open covers by set inclusion, observe that every chain of bad
basic open covers has an upper bound that is also a bad basic open cover, namely the
union of all the covers in the chain. Thus, by Zorn’s lemma, there exists a maximal
bad basic open cover
\[ \mathcal{C} = \{U_\alpha \mid \alpha \in A\}. \]
Thus this cover has no finite subcover, but if one adds any new basic open set to this
cover, then there must now be a finite subcover.
Pick a basic open set $U_\alpha$ from this cover $\mathcal{C}$. Then we can write
\[ U_\alpha = B_1 \cap \cdots \cap B_k \]
for some choice of the sub-basic open sets .B1 , . . . , Bk . We claim that at least one of
the .B1 , . . . , Bk also lies in .C. Suppose not, and observe that adding any of the .Bi to
.C enlarges the basic open cover and thus creates a finite subcover; thus .Bi together
with finitely many sets from .C cover X, or equivalently one can cover .X \ Bi with
finitely many sets from .C. Thus one can also cover
\[ X \setminus U_\alpha = \bigcup_{i=1}^{k} (X \setminus B_i) \]
with finitely many sets from .C and thus X itself can be covered by finitely many
sets from .C, a contradiction.
From the above discussion and the axiom of choice, we see that for each basic
set .Uα in .C there exists a sub-basic set .Bα containing .Uα that also lies in .C. (Two
different basic sets .Uα , .Uβ could lead to the same sub-basic set .Bα = Bβ , but this
will not concern us.) Since the .Uα cover X, the .Bα do also. By hypothesis, a finite
number of $B_\alpha$ can cover X, and so $\mathcal{C}$ is good, which gives the desired contradiction.
Proof of Tychonoff’s Theorem via Sub-Bases Let $X = \prod \{X_\alpha \mid \alpha \in A\}$ be a product
of compact spaces. By virtue of the Alexander sub-base Theorem, it suffices to show
that any open cover of X by sub-basic open sets $\{\pi_{\alpha_\beta}^{-1}(U_\beta) \mid \beta \in B\}$ has a finite
sub-cover, where B is some index set, and for each $\beta \in B$, $\alpha_\beta \in A$ and $U_\beta$ is open
in $X_{\alpha_\beta}$.
For each $\alpha \in A$, consider the sub-basic open sets $\pi_\alpha^{-1}(U_\beta)$ that are associated to
those $\beta \in B$ with $\alpha_\beta = \alpha$. If the open sets $U_\beta$ here cover $X_\alpha$, then by compactness
of $X_\alpha$, a finite number of the $U_\beta$ already suffice to cover $X_\alpha$, and so a finite number
of the $\pi_\alpha^{-1}(U_\beta)$ cover X, and we are done. So we may assume that the $U_\beta$ do not
cover $X_\alpha$, thus there exists $x_\alpha \in X_\alpha$ that avoids all the $U_\beta$ with $\alpha_\beta = \alpha$. One then
sees that the point $(x_\alpha)_{\alpha \in A}$ in X avoids all of the $\pi_\alpha^{-1}(U_\beta)$, a contradiction. The
claim follows.
H. Lebesgue proved an interesting result: if $\mathcal{U}$ is an open cover of a closed and bounded
interval of $\mathbb{R}$, then there exists a radius $r > 0$ such that, if $|x - y| < r$, then x and y
are both contained in some member of the cover $\mathcal{U}$. It is not so easy to provide an
intuitive proof without mentioning compactness: although it is evident that each open set
of $\mathbb{R}$ contains an open interval of some length, in general this length depends on the
member of the open cover.
We prove a generalization of Lebesgue’s result valid in any metric space.
Theorem 13.46 (Lebesgue Covering Lemma) If .U is an open cover of a compact
subset A of a metric space .(X, d), then there exists a positive number r such that
the open sphere of radius r about each point of A is contained in some member of
.U.
(r, +∞). Therefore, for each .x ∈ A there is an index i such that .fi (x) > r, and it
follows that the open sphere of radius r about x is contained in .Ui .
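For a finite cover of an interval by open intervals, a number r as in the Lemma can be estimated numerically. The Python sketch below is only an illustration: it assumes that the functions measure the distance from x to the complement of each member of the cover (an assumption suggested by the argument above, not a quotation of it), it evaluates them on a finite grid, and the names `distance_to_complement`, `lebesgue_number_estimate`, `COVER` and `R_EST` are hypothetical. A grid minimum is only an estimate of the true minimum.

```python
def distance_to_complement(x, interval):
    """Distance from x to the complement in R of the open interval (a, b)."""
    a, b = interval
    return max(0.0, min(x - a, b - x))


def lebesgue_number_estimate(cover, left, right, steps=1000):
    """Grid estimate of the minimum over [left, right] of the largest
    distance from x to the complement of a member of the cover."""
    best = float("inf")
    for k in range(steps + 1):
        x = left + (right - left) * k / steps
        best = min(best, max(distance_to_complement(x, u) for u in cover))
    return best


# Two open intervals covering the compact set A = [0, 1].
COVER = [(-0.1, 0.6), (0.4, 1.1)]
R_EST = lebesgue_number_estimate(COVER, 0.0, 1.0)
```

For this cover the estimate is close to 0.1: every point of [0, 1] is the center of an open interval of radius slightly smaller than 0.1 which is contained in one of the two members of the cover.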
\[ P(x) = 0 \]
may not have a solution $x \in \mathbb{R}$. For instance, the equation $x^2 + 1 = 0$ does not
have any real solution, by the obvious fact that $x^2 + 1 \geq 0 + 1 = 1 > 0$.
The Fundamental Theorem of Algebra states that every non-constant polynomial with complex
coefficients possesses at least one complex root. The proof of this important result
is often postponed to a course in Complex Analysis.
In this section we present an elementary proof due to Charles Fefferman, see [3].
Theorem 13.48 Let .n ∈ N, .a0 , . . . , an ∈ C and
\[ P(z) = a_0 + a_1 z + \cdots + a_n z^n \]
Proof We first prove that the function $z \in \mathbb{C} \mapsto |P(z)|$ attains a minimum. To prove
this claim, we notice that
\[ |P(z)| = |z|^n \left| a_n + \frac{a_{n-1}}{z} + \cdots + \frac{a_0}{z^n} \right| \]
for every .z ∈ C \ {0}. Hence there exists a number .M > 0 such that
Since the set $B[0, M] = \{z \in \mathbb{C} \mid |z| \leq M\}$ is compact in $\mathbb{C}$, the continuous function
$z \mapsto |P(z)|$ attains a global minimum at some $z_0 \in B[0, M]$. Hence
\[ |P(z)| \geq |P(z_0)| \quad \text{for every } z \in B[0, M]. \tag{13.2} \]
Since $0 \in B[0, M]$, we see that $|P(z_0)| \leq |P(0)| = |a_0|$, and (13.1) implies that
$|P(z_0)| \leq |P(z)|$ as soon as $|z| > M$. A comparison with (13.2) shows that
\[ |P(z_0)| \leq |P(z)| \quad \text{for every } z \in \mathbb{C}. \tag{13.3} \]
The claim is then proved. As a second and last step, we will show that .P (z0 ) = 0.
Indeed, we exploit the identity $P(z) = P(z_0 + (z - z_0))$ to write $P(z)$ as a sum
of powers of $z - z_0$. More formally, there exists a polynomial Q such that
\[ P(z) = Q(z - z_0). \]
By (13.3), we have
\[ |Q(0)| \leq |Q(z)| \quad \text{for every } z \in \mathbb{C}. \]
We need to prove that $Q(0) = 0$. Let j be the smallest positive integer such that $z^j$ has
a non-zero coefficient in the expansion of the polynomial Q. Then we can write
\[ Q(z) = c_0 + c_j z^j + \cdots + c_n z^n, \]
or equivalently, for a suitable polynomial R,
\[ Q(z) = c_0 + c_j z^j + z^{j+1} R(z), \]
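Pick $N > 0$ so large that $|R(\varepsilon z_1)| \leq N$ for every $\varepsilon \in (0, 1)$. Recalling (13.4) we
see that
\begin{align*}
|Q(\varepsilon z_1)| &\leq |c_0 + c_j \varepsilon^j z_1^j| + \varepsilon^{j+1} |z_1|^{j+1} |R(\varepsilon z_1)| \\
&\leq |c_0 + \varepsilon^j (c_j z_1^j)| + \varepsilon^{j+1} |z_1|^{j+1} N \\
&= |c_0 + \varepsilon^j (-c_0)| + \varepsilon^{j+1} |z_1|^{j+1} N \\
&= (1 - \varepsilon^j) |c_0| + \varepsilon^{j+1} |z_1|^{j+1} N \\
&= |c_0| - \varepsilon^j |c_0| + \varepsilon^{j+1} |z_1|^{j+1} N.
\end{align*}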
Suppose now that $c_0 \neq 0$. Since $\varepsilon$ is arbitrary, we can pick it so small that
This contradicts the fact that Q attains a global minimum at 0. Hence .c0 = 0, and
the proof is complete.
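The second step of the proof can be observed numerically on a specific polynomial. The Python sketch below is only an illustration with hypothetical names (`evaluate`, `descent_point`, `COEFFS`): it takes the sample polynomial $Q(z) = 1 + z^2 + z^3$, chosen here for convenience, computes a point $z_1$ satisfying the relation $c_j z_1^j = -c_0$ used in the chain of inequalities, and evaluates $|Q(\varepsilon z_1)|$ for a few values of $\varepsilon$.

```python
def evaluate(coeffs, z):
    """Horner evaluation of Q(z) = c0 + c1*z + ... + cn*z**n."""
    result = 0j
    for c in reversed(coeffs):
        result = result * z + c
    return result


def descent_point(coeffs):
    """Return (j, z1), where j is the first index >= 1 with c_j != 0
    and z1 is a complex number such that c_j * z1**j = -c0."""
    c0 = complex(coeffs[0])
    j = next(k for k in range(1, len(coeffs)) if coeffs[k] != 0)
    z1 = (-c0 / complex(coeffs[j])) ** (1.0 / j)
    return j, z1


# Q(z) = 1 + z**2 + z**3 has c0 = 1, so z = 0 is not a root of Q.
COEFFS = [1, 0, 1, 1]
J, Z1 = descent_point(COEFFS)
VALUES = [abs(evaluate(COEFFS, eps * Z1)) for eps in (0.5, 0.1, 0.01)]
```

Every entry of `VALUES` is smaller than $|Q(0)| = 1$, so the point 0 cannot be a global minimum of $|Q|$ for this polynomial, in agreement with the argument by contradiction above.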
Example 13.33 It is false that the continuous image of a locally compact space
must be locally compact. This follows from the interesting fact that any topological
space is the continuous one-to-one image of a discrete space. Indeed, let .(X, τ ) be
a topological space, and let Y be the topological space consisting of the set X with
the discrete topology. The identity map from Y to X is continuous and bijective. But
every discrete space is locally compact, while X is an arbitrary topological space.
Definition 13.55 (Nowhere Dense) A set in a topological space is nowhere dense
if and only if its closure has an empty interior.
Theorem 13.49 Let $X = \prod \{X_\alpha \mid \alpha \in A\}$ be a topological product space. If an
infinite number of the coordinate spaces $X_\alpha$ are non-compact, then each compact
subset of X is nowhere dense.
Proof Suppose B is a compact subset of X with an interior point x. Then B contains
a neighborhood U of x which is of the form $U = \bigcap \{P_\alpha^{-1}(V_\alpha) \mid \alpha \in F\}$, for some
finite subset F of A and some open sets $V_\alpha$ in $X_\alpha$. If $\beta \in A \setminus F$, then $P_\beta(B) = X_\beta$
and $X_\beta$ is compact as the continuous image of a compact space. As a consequence,
all but finitely many of the coordinate spaces $X_\alpha$ are compact. The proof is complete.
Theorem 13.50 (Local Compactness of Product Spaces) If a product space is
locally compact, then each coordinate space is locally compact and all but a finite
number of coordinate spaces are compact.
Proof Suppose that a product space is locally compact. Since the projection into a
coordinate space is an open map, each coordinate space is locally compact. Indeed,
if a function is both continuous and open, the image of a compact neighborhood of
a point is a compact neighborhood of the image point.
If infinitely many coordinate spaces are non-compact, then each compact subset
of the product space is nowhere dense by Theorem 13.49. Hence no point can have
a compact neighborhood. The proof is complete.
As a matter of fact, non-compact spaces exist. So the question is: can we somehow
embed a non-compact space into a compact one? To be more precise: can we
construct a compact space which contains the non-compact space as a subspace?
This is a typical problem in mathematical analysis. For instance, it is common
to attach two “points” .−∞ and .+∞ to the real line, so that the resulting set is a
compact space. In Complex Analysis, the complex unit sphere is constructed by
adjoining a single point .∞ to the bi-dimensional space .C and specifying that the
neighborhoods of .∞ are the complements of bounded subsets of .C. In this section
we present an abstract construction along the same lines.
13.11 Compactification of a Space 249
Proof We follow [5]. A set U is open in .X∗ if and only if (a) .U ∩ X is open
in X and (b) whenever .∞ ∈ U , then .X \ U is compact. As a consequence, finite
intersections and arbitrary unions of sets open in .X∗ intersect X in open sets. If .∞ is
a member of the intersection of two open subsets of .X∗ , then the complement of the
intersection is the union of two closed compact subsets of X and is therefore closed
and compact. If .∞ belongs to the union of the members of a family of open subsets
of .X∗ , then .∞ belongs to some member U of the family, and the complement of
the union is a closed subset of the compact set .X \ U and is therefore closed and
compact. Consequently .X∗ is a topological space and X is a subspace.
Let .U be an open cover of .X∗ . Then .∞ is a member of some U in .U and .X \U is
compact, and hence there is a finite subcover of .U. This proves that .X∗ is compact.
If .X∗ is a Hausdorff space, then its open subspace X is a locally compact
Hausdorff space. To conclude, we must show that .X∗ is a Hausdorff space if X
is a locally compact Hausdorff space. It is sufficient to show that, if .x ∈ X, then
there exist disjoint neighborhoods of x and .∞. Since X is locally compact and
Hausdorff, there is a closed compact neighborhood U of x in X, and .X∗ \ U is the
required neighborhood of .∞.
. {V \ {a} | V is a neighborhood of a}
is a filter base which defines a filter $\mathcal{F}$. As we have seen, there exists an ultrafilter $\mathcal{U}$
finer than $\mathcal{F}$. The elements of $\mathcal{F}$ have empty intersection, and the same must be true for
the elements of $\mathcal{U}$. Actually, if $b \neq a$, there exists a neighborhood V of a which
does not contain b, so that $V \setminus \{a\}$ belongs to $\mathcal{F}$ and does not contain b. Clearly,
$V \setminus \{a\}$ does not contain a.
Theorem 13.53 Let X be a set. A filter $\mathcal{U}$ is an ultrafilter if and only if for each
$A \subset X$, either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.
.U is a ultrafilter.
topology on a subset.
Definition 13.63 Let .B be a filter base on X, and .f : X → Y be a map. The family
f (B) = {f (A) | A ∈ B} is a filter base on Y called the direct image of .B under f .
.
The filter generated by .f (B) is called the direct image of the filter generated by .B.
It should be remarked that .f (B) need not be a filter, even in the favorable case in
which .B is a filter.
Example 13.38 Let .{xn }n be a sequence in X, i.e. a function from .N into X. The
Fréchet filter on .N has a direct image .F in X, which we call the Fréchet filter of the
sequence, or the elementary filter associated to the given sequence: this is the set of
all subsets of X which contain all but finitely many terms .xn .
Theorem 13.54 If B is an ultrafilter base of X, its direct image under a map
f : X → Y is an ultrafilter base on Y.
10 Here we are pedantic: without any further assumption on the topology, a filter can converge to
different points, and this is why we write .x ∈ lim F. Nevertheless, the notation .x = lim F is often
used in the literature.
13.12 Filters and Convergence 253
Proof Let us suppose that the last condition is satisfied. In particular we can select
the neighborhood filter .F of x in X. Its image .f (F) converges to .f (x), and thus f
is continuous at x.
Conversely, let us suppose that f is continuous at x. For each neighborhood W
of .f (x) in Y we can pick a neighborhood V of x in X such that .f (V ) ⊂ W . This
shows that W belongs to the image of the neighborhood filter of x, and therefore this
image filter is finer than the neighborhood filter of .f (x) in Y . Now, if a filter .F is
finer than the neighborhood filter of x, its image .f (F) is finer than the neighborhood
filter of .f (x), and thus it converges to .f (x).
An interesting remark we make is that filters and nets are somehow homomorphic
structures.
Theorem 13.58 Consider a set X.
(a) If .F is a filter on X then the set .IF of pairs .(A, p) such that .A ∈ F and .p ∈ A
is directed by .(A, p) ≤ (B, q) if and only if .A ⊃ B. Furthermore the function
\[ (A, p) \in I_F \mapsto p \in X \]
\[ F = \{ A \subset X \mid \exists N \in D \ \forall n \in D \ (n \ge N \Rightarrow S_n \in A) \} \]
Exercise 13.42 Prove Theorem 13.58. Establish a comparison between the conver-
gence of a net and the convergence of the associated filter.
We will not pursue further the study of filters, since it is by now apparent that they
produce the same results as nets. Choosing either nets or filters for a description of
the topology is essentially a matter of taste.
While most textbooks in the USA propose the traditional definition, in France the
school of Bourbaki imposed the second definition from high schools to universities.
The reason is fairly philosophical: Bourbaki believes that (Real) Analysis is a
daughter of General Topology, and in General Topology the true concept is
continuity. What many people call removable discontinuities is a strange animal
that we can discard from the theory of limits.
If we are in love with nets, we may propose the following definition.
Definition 13.67 (Limits with Nets) We say that lim_{x→p} f(x) = q if and only if
for each net {x_n, n ∈ D} such that x_n ≠ p for each n ∈ D and which converges to
p, the net {f(x_n), n ∈ D} converges to q.
This turns out to be equivalent to the traditional definition, and this does not come
as a surprise: it is a mere generalization of the characterization of limits in metric
spaces in terms of sequences. But the price to pay is that the condition
\[ x_n \ne p \quad \text{for each } n \in D \]
does not belong to the general theory of convergent nets. If the domain X is a
subspace of a metric space, there is a nice way out: we can say that .x > y if
and only if .|x − p| < |y − p|. In this way we exclude p in an elegant fashion,
and convergence with respect to the direction .> reduces to the traditional definition
of limit. What about filters? Well, the first idea is to use the “filter” of punctured
neighborhoods of p, but this is not a filter: the point p may get back through the
window if we pass to a superset of a punctured neighborhood.
So, the only way out is to “screw up” the whole topological space X to which p
belongs by removing p once and for all: now punctured neighborhoods of p form a
filter. But this amounts to considering the new function .f|X\{p} which agrees with f
away from the point p. But then q = lim_{x→p} f(x) is equivalent to the requirement
that
\[ g(x) = \begin{cases} f(x) & \text{if } x \ne p \\ q & \text{if } x = p \end{cases} \]
be continuous at p.
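As a numerical illustration of this reformulation, take f(x) = sin(x)/x, which is undefined at p = 0, and q = 1. The sketch below is an assumption-laden example rather than anything from the original text: the choice of f, the sample points 10^{-k}, and the names `g` and `deviations` are all illustrative. It evaluates the patched function g near p and records how far its values are from g(p) = q.

```python
import math

def f(x):
    """The original function, undefined at x = 0."""
    return math.sin(x) / x

def g(x, p=0.0, q=1.0):
    """The patched function: equal to f away from p, equal to q at p."""
    return q if x == p else f(x)

# Distances |g(x) - g(p)| along the sample points x = 10^{-k}, k = 1, ..., 7.
deviations = [abs(g(10.0 ** -k) - g(0.0)) for k in range(1, 8)]
print(deviations)
```

The deviations shrink as x approaches p, which is what continuity of g at p predicts; a finite sample of course proves nothing by itself.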
Let us summarize:
for each closed .A ⊂ X, there exist disjoint open sets U and V such that .x ∈ U ,
.A ⊂ V .
(b) We say that X is normal if the following condition is satisfied: if A and B are
disjoint closed subsets of X, there exist open sets U, V such that A ⊂ U, B ⊂ V, and
U ∩ V = ∅. See Fig. 13.8.
It should be noted that intermediate separation properties have been introduced, but
they are of little importance in our setting.
Exercise 13.43 Prove that a space X is .T1 if and only if every singleton .{x} is a
closed set.
In the basic topological space .R (endowed with its natural distance), and even in any
metric space, a point y is an accumulation point for a subset A if and only if each
neighborhood of y contains infinitely many elements of A. The proof is standard:
just consider open balls .B(y, 1/n) which shrink to y as .n → +∞. However
this argument depends on the metrizability of the topology. The following result
shows that a rather weak separation assumption ensures the validity of the previous
characterization of accumulation points.
Theorem 13.59 Suppose that y is an accumulation point of a subset A of a .T1 -
space X. Then every neighborhood of y contains infinitely many points of A.
Proof Let U be a neighborhood of y, and let F = A ∩ (U \ {y}). We claim that
F is an infinite set. Otherwise, X \ F would be an open set (since singletons are
closed in a T1-space, and F would be a finite union of singletons), and y ∈ X \ F.
But then the neighborhood U ∩ (X \ F) of y cannot contain points of A other than y
itself, and this contradicts the assumption that y is an accumulation point of A.
The proof is complete.
The following technical Lemmas will be useful in the sequel.
Lemma 13.2 Suppose that for each element t of a dense subset D of (0, +∞), F_t
is a subset of a set X such that
(a) t < s implies F_t ⊂ F_s,
(b) X = ∪ {F_t | t ∈ D}.
For x ∈ X let f(x) = inf {t ∈ D | x ∈ F_t}. Then
\[ \{x \in X \mid f(x) < s\} = \bigcup \{F_t \mid t \in D,\ t < s\} \]
\[ \{x \in X \mid f(x) \le s\} = \bigcap \{F_t \mid t \in D,\ t > s\} \]
and the properties of the infimum imply that .{x ∈ X | f (x) < s} is the set of all
.x ∈ X such that for some .t ∈ D, .t < s and .x ∈ Ft . This proves the first identity of
the conclusion.
To prove the second one, we remark that .inf {t ∈ D | x ∈ Ft } ≤ s if and only
if for each .u > s there exists .t < u such that .x ∈ Ft . Conversely, if for each
.t ∈ D such that .t > s it is true that .x ∈ Ft , then .inf {t ∈ D | x ∈ Ft } ≤ s
258 13 Neighbors Again: Topological Spaces
because D is dense in .(0, +∞). We conclude that the set of all x such that
f(x) = inf {t ∈ D | x ∈ F_t} ≤ s coincides with
\[ \{x \in X \mid \text{if } t \in D,\ t > s, \text{ then } x \in F_t\}, \]
\[ \{x \in X \mid f(x) \le s\} = \bigcap \{F_t \mid t \in D,\ t > s\}, \]
and the proof will be complete once we show that this set is identical with
\[ \bigcap \{\overline{F_t} \mid t \in D,\ t > s\}. \]
On the other hand, for each t ∈ D with t > s there exists r ∈ D such that s < r < t,
and thus such that $\overline{F_r} \subset F_t$. The reverse inclusion follows, and the proof is complete.
Theorem 13.60 (Urysohn’s Lemma) Let Y be a Hausdorff space. The following
statements are equivalent:
1. Y is normal;
2. for each pair A, B of disjoint closed sets in Y , there exists a continuous function
.f : Y → R, which we call a Urysohn function, such that (a) .0 ≤ f ≤ 1 on Y ,
(b) .f = 0 on A, (c) .f = 1 on B.
Proof We show that 2. implies 1. Fix two closed sets A and B such that .A ∩ B = ∅.
Denoting by f a Urysohn function for this pair, we set .U = {y ∈ Y | f (y) < 1/2}
and .V = {y ∈ Y | f (y) > 1/2}. Then U and V are disjoint open sets such that
.A ⊂ U , .B ⊂ V .
The converse implication is proved as follows. Let D be the set of positive dyadic
rational numbers, i.e. the set of all numbers of the form p2^{−q}, where p and q range over
13.14 Separation and Existence of Continuous Extensions 259
all positive integers. For every t ∈ D with t > 1, we let F(t) = Y, F(1) = Y \ B,
and F(0) be an open set which contains A and such that $\overline{F(0)} \cap B = \emptyset$. For every
t ∈ D with 0 < t < 1, we write $t = (2m+1)2^{-n}$ and choose, inductively on n, F(t)
to be an open set which contains $\overline{F(2m2^{-n})}$ and such that $\overline{F(t)} \subset F((2m+2)2^{-n})$.
Of course this construction is possible because Y is a normal space. Finally, let
f(y) = inf {t ∈ D | y ∈ F(t)}. Lemma 13.3 shows that f is a continuous function.
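In a metric space a Urysohn function can be written down directly, without the dyadic construction used in the proof: f(x) = d(x, A) / (d(x, A) + d(x, B)). The following minimal sketch uses this metric shortcut, which is a swapped-in technique and not the argument of the text. The sets A = [0, 1] and B = [2, 3] in the real line and the helper names `dist_to_interval` and `urysohn` are assumptions made for illustration.

```python
def dist_to_interval(x, lo, hi):
    """Distance from the real number x to the closed interval [lo, hi]."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0

def urysohn(x, A=(0.0, 1.0), B=(2.0, 3.0)):
    """f = d(., A) / (d(., A) + d(., B)); the denominator is positive because A and B are disjoint."""
    dA = dist_to_interval(x, *A)
    dB = dist_to_interval(x, *B)
    return dA / (dA + dB)

samples = [-1.0, 0.5, 1.25, 1.5, 1.75, 2.5, 4.0]
print([urysohn(x) for x in samples])
```

The sets {f < 1/2} and {f > 1/2} are then disjoint open sets separating A and B, exactly as in the implication from 2. to 1.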
Proof Let
\[ A_+ = \left\{ x \in A \;\middle|\; g(x) \ge \frac{c}{3} \right\}, \qquad A_- = \left\{ x \in A \;\middle|\; g(x) \le -\frac{c}{3} \right\}. \]
These two sets are disjoint and closed in the closed set A ⊂ X, so that both A− and A+
are closed in X. Since X is normal, a Urysohn function h : X → R exists having
value c/3 on A+ and −c/3 on A−. Furthermore −c/3 ≤ h(x) ≤ c/3 for every
x ∈ X.
Theorem 13.64 (Tietze Extension Theorem) Let X be a Hausdorff topological
space. The following statements are equivalent:
1. X is normal;
2. for every closed set A ⊂ X and every continuous function f : A → R, there
exists a continuous function F : X → R such that F coincides with f on A. Furthermore, if
|f(x)| < c for every x ∈ A, then |F(x)| < c for every x ∈ X.
\[ |h_1(x)| \le \frac{1}{3} \cdot \frac{2}{3}\, c, \qquad x \in X \]
\[ |f(x) - h_0(x) - h_1(x)| \le \frac{2}{3} \cdot \frac{2}{3}\, c, \qquad x \in A. \]
Now we proceed by induction, and assume that h_0, h_1, ..., h_n have been defined.
Lemma 13.4 applied to g = f − h_0 − ··· − h_n on A yields h_{n+1} : X → R such
that
\[ |h_{n+1}(x)| \le \frac{1}{3} \left(\frac{2}{3}\right)^{n+1} c, \qquad x \in X \]
\[ |f(x) - h_0(x) - \cdots - h_{n+1}(x)| \le \frac{2}{3} \left(\frac{2}{3}\right)^{n+1} c, \qquad x \in A. \]
\[ |F(x)| \le \sum_{n=0}^{\infty} \frac{1}{3} \left(\frac{2}{3}\right)^{n} c = c. \]
Step 2. .|f (x)| < c for every .x ∈ A. Indeed, the extension F constructed in Step
1 satisfies .|F (x)| ≤ c for every .x ∈ X. We set .A0 = {x ∈ X | |F (x)| = c}. This
set is closed in X and disjoint from A. Therefore there exists a Urysohn function
.ϕ : X → R having value 1 on A and value 0 on .A0 , with .0 ≤ ϕ ≤ 1 everywhere.
We define .G(x) = ϕ(x)F (x), a continuous function such that .G(x) = F (x) =
f (x) for .x ∈ A. Thus G extends f ; furthermore .|G(x)| < c for every .x ∈ X.
Indeed G(x) = 0 if x ∈ A0, while |ϕ(x)| ≤ 1 and |F(x)| < c if x ∈ X \ A0.
Step 3. f is not necessarily bounded. In this case we introduce the function
h : R → (−1, 1) such that
\[ h(x) = \frac{x}{1 + |x|}. \]
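The map h squeezes the real line into (−1, 1), and it has the explicit inverse y ↦ y / (1 − |y|). The following short sketch checks both facts numerically; the name `h_inv` and the sample points are illustrative assumptions, not notation from the text.

```python
import math

def h(x):
    """The map x -> x / (1 + |x|) from R into (-1, 1)."""
    return x / (1.0 + abs(x))

def h_inv(y):
    """Inverse map (-1, 1) -> R, given by y -> y / (1 - |y|)."""
    return y / (1.0 - abs(y))

points = [-1e6, -5.0, -0.25, 0.0, 3.5, 1e6]
images = [h(x) for x in points]
round_trip = [h_inv(h(x)) for x in (-5.0, -0.25, 0.0, 3.5)]
print(images)
print(round_trip)
```

Since h and its inverse are both continuous, composing an unbounded continuous function with h produces a bounded one without losing information, which is what makes a reduction to the bounded case possible.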
Proof As a compact subset of X∗, K is closed in X∗, while U is open in X∗. Since
X∗ is a compact Hausdorff space, it is normal, and there exists an open subset V of
of V in X: indeed the former is a subset of X and the latter is equal to the former
intersected with X. Since $\overline{V}$ is closed in the compact space X∗, it is also compact,
and since V ⊂ X is open in X∗, it is also open in X. Thus V is the desired open
set.
Theorem 13.65 (Urysohn’s Lemma for LCH Spaces) If X is a locally compact
Hausdorff space, K ⊂ U ⊂ X, K is compact and U is open, then there exists a
continuous function f on X such that 0 ≤ f ≤ 1, f ≡ 1 on K, and supp f ⊂ U.
Proof By the previous Proposition, we may choose an open set V with compact
closure such that $K \subset V \subset \overline{V} \subset U$. Since K and X∗ \ V are disjoint closed
subspaces of the normal space X∗, by Theorem 13.60 there is a continuous function
g on X∗ such that 0 ≤ g ≤ 1, g ≡ 1 on K, and g ≡ 0 on X∗ \ V. We define f as the
restriction of g to X. Clearly f is continuous on X, 0 ≤ f ≤ 1, and f ≡ 1 on K.
Since g vanishes outside V, so does f, and this implies
\[ \operatorname{supp} f = \overline{\{x \in X \mid f(x) \ne 0\}} \subset \overline{V} \subset U, \]
One of the most important tools of Mathematical Analysis is the possibility of gluing
together functions with compact support. As a rough idea, several constructions
like Analysis on Manifolds or Partial Differential Equations proceed from local to
global: around every point we are able to construct something, and then we would
like to extend such a construction to the whole space.
If we work in a compact setting, it is intuitive that a compact space behaves much
like a finite space, in the sense that every open cover may be assumed to be finite
from the beginning. However this is only the most favorable case, and quite often
compactness may not be assumed. In this section we introduce the definition of
paracompactness, originally due to J. Dieudonné. We will show that it generalizes
the definition of compactness and that it allows us to define partitions of unity: a
collection of continuous, compactly supported functions whose sum equals one at
any point.
Definition 13.70 A collection .A of subsets of a topological space X is locally finite
in X if and only if every point of X possesses a neighborhood which intersects only
finitely many elements of .A.
Exercise 13.44 In .R with the Euclidean topology, prove that .A = {(n, n + 2) | n
∈ Z} is locally finite.
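A quick numerical experiment can suggest why the family in Exercise 13.44 is locally finite: the interval (x − 1/2, x + 1/2) meets (n, n + 2) exactly when n < x + 1/2 and n + 2 > x − 1/2, which leaves at most three integers n. The sketch below lists them; the function name `intervals_meeting` and the radius 1/2 are illustrative choices, and the computation is an experiment, not a proof.

```python
import math

def intervals_meeting(x, radius=0.5):
    """Integers n such that (n, n + 2) meets the open interval (x - radius, x + radius)."""
    lo, hi = x - radius, x + radius
    # Two open intervals (a, b) and (c, d) intersect iff a < d and c < b.
    candidates = range(math.floor(lo) - 3, math.ceil(hi) + 1)
    return [n for n in candidates if n < hi and n + 2 > lo]

for x in (-7.3, 0.0, 0.5, 2.9):
    print(x, intervals_meeting(x))
```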
13.15 Partitions of Unity and Paracompact Spaces 263
intersect each of them with $\mathbb{R}^n \setminus B_{m-1}$. We call such a finite collection $C_m$. Clearly
\[ S_n(U) = \{x \in X \mid B(x, 1/n) \subset U\}. \]
If V and W are distinct elements of .A, then .d(x, y) ≥ 1/n for every .x ∈ Tn (V )
and every .y ∈ Tn (W ). Indeed, since .< is a well-ordering for .A, we may assume
that .V < W . Since .x ∈ Tn (V ), we have .x ∈ Sn (V ), so the open ball centered at x
with radius 1/n lies in V. On the other hand, since y ∈ Tn(W) and V < W, we see
that y ∉ V. Hence y does not lie in the open ball of center x and radius 1/n.
Since there is no reason why the sets .Tn (U ) should be open, we enlarge them a
little bit, and we set
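\[ E_n(U) = \left\{ x \in X \;\middle|\; d(x, T_n(U)) < \frac{1}{3n} \right\} = \bigcup \left\{ B\!\left(x, \frac{1}{3n}\right) \;\middle|\; x \in T_n(U) \right\}, \]
Finally, we set
\[ E_n = \{E_n(U) \mid U \in A\}. \]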
The collection En is a locally finite collection of open sets which refines A. Indeed,
En(V) ⊂ V for every V ∈ A. Furthermore, for every x ∈ X, the open ball of center
x and radius 1/(6n) intersects at most one element of En. To obtain the desired
countably locally finite refinement of A, we define
\[ E = \bigcup_{n=1}^{\infty} E_n. \]
We only need to check that .E covers X. If x is any point of X, let U be the smallest
element of .A which contains x, with respect to the well-order .<. Since U is an
open set, there exists a positive integer n such that .B(x, 1/n) ⊂ U . By definition
.x ∈ Sn (U ). It follows that .x ∈ Tn (U ), since U is the smallest element of .A which
Proof Since (d) implies (a) in a trivial way, we will prove that (a) implies (b) implies
(c) implies (d).
(a) implies (b) Consider any open cover A of X, and let B be an open refinement
which covers X and is countably locally finite. We may suppose that
$B = \bigcup_{n=1}^{\infty} B_n$, where each B_n is locally finite. For every index i we define
$V_i = \bigcup \{U \mid U \in B_i\}$. For every positive integer n and every U ∈ B_n, define
\[ S_n(U) = U \setminus \bigcup_{i<n} V_i. \]
finitely many elements of .Cn . But U is in .BN , hence U does not intersect any
element of .Cn for .n > N. To summarize, the neighborhood
W1 ∩ W2 ∩ · · · ∩ WN ∩ U
.
(c) implies (d) This proof is slightly longer. We begin with an open cover .A of
X and we use (c) to construct a refinement .B which covers X and is locally
finite. The trick here is to “enlarge” the elements of .B in order to get open sets.
The geometric intuition is somewhat misleading, since we do not have a distance
function to play with. The proof is subtle.
So, for every x ∈ X there exists an open neighborhood of x which intersects
only finitely many elements of B. The collection of these neighborhoods is therefore
an open cover of X. By assumption (c), we may define a closed refinement C of
such an open cover which is still a cover of X and is also locally finite. Clearly,
every element of .C can intersect only finitely many elements of .B.
Now, for every $B \in \mathcal{B}$, we define
\[ \mathcal{C}(B) = \{C \mid C \in \mathcal{C},\ C \subset X \setminus B\}, \]
and
\[ E(B) = X \setminus \bigcup \{C \mid C \in \mathcal{C}(B)\}. \]
The set E(B) is an open subset of X.12 By definition, B ⊂ E(B). The collection
of all sets E(B) need not be a refinement of $\mathcal{A}$, but we can always choose
some $F(B) \in \mathcal{A}$ which contains B. Then $\mathcal{D} = \{E(B) \cap F(B) \mid B \in \mathcal{B}\}$ is a
refinement of $\mathcal{A}$ which covers X, since B ⊂ E(B) ∩ F(B).
To conclude, we still have to prove that $\mathcal{D}$ is locally finite. Pick a point x ∈ X,
and choose an open neighborhood W of x which intersects only finitely many
elements $C_1, \dots, C_k$ of $\mathcal{C}$. Since $\mathcal{C}$ is a cover of X, $W \subset \bigcup_{i=1}^{k} C_i$. To prove
that W intersects only finitely many elements of $\mathcal{D}$, it is now sufficient to prove
that every $C \in \mathcal{C}$ intersects only finitely many elements of $\mathcal{D}$. Suppose that C
intersects E(B) ∩ F(B). Then C is not contained in X \ B, which means that
C ∩ B ≠ ∅. But C intersects only finitely many elements of $\mathcal{B}$, hence it can
intersect no more than the same number of elements of $\mathcal{D}$. The proof is complete.
Proof of Theorem 13.69, Different Flavor Consider an open cover {U_i | i ∈ I} of
the metric space X. As we saw above, we may suppose that the index set I is
well-ordered by ≤. In particular, for every x ∈ X there exists a unique index i ∈ I such
that $x \in U_i \setminus \bigcup \{U_j \mid j < i\}$. Explicitly,
\[ i = \min \{ j \in I \mid x \in U_j \}. \]
12 Here the fact that C is a locally finite collection of closed sets is essential to ensure that
∪ {C | C ∈ C(B)} is a closed set. Recall that an arbitrary union of closed sets is not a closed set, in
general.
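\[ V_{i,n} = \bigcup \{ B(x, 2^{-n}) \mid x \in X_{i,n} \}, \]
\[ X_{i,n} = \left\{ x \in X \;\middle|\; B(x, 3 \cdot 2^{-n}) \subset U_i,\quad x \notin U_j \text{ for } j < i,\quad x \notin V_{j,k} \text{ for } j \in I,\ k < n \right\}. \]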
Our claim is that $\{V_{i,n} \mid i \in I,\ n \in \mathbb{N}\}$ is a locally finite open refinement of
{U_i | i ∈ I}. Of course each element of $\{V_{i,n} \mid i \in I,\ n \in \mathbb{N}\}$ is open. By definition,
$B(x, 2^{-n}) \subset B(x, 3 \cdot 2^{-n}) \subset U_i$ for every ball contributing to $V_{i,n}$, hence $V_{i,n} \subset U_i$:
this proves that $\{V_{i,n} \mid i \in I,\ n \in \mathbb{N}\}$ is an open refinement of {U_i | i ∈ I}. To
complete the proof, we must show that this is a locally finite refinement.
Pick any x ∈ X, fix $i = \min \{ j \in I \mid x \in U_j \}$, and select $n \in \mathbb{N}$ such that
$B(x, 3 \cdot 2^{-n}) \subset U_i$. Two cases are possible: either $x \in V_{j,k}$ for some
j ∈ I and k < n, or $x \in X_{i,n} \subset V_{i,n}$. In any case, $\{V_{i,n} \mid i \in I,\ n \in \mathbb{N}\}$ is a cover
of X.
For every x ∈ X, define $i = \min \{ j \in I \mid x \in V_{j,n} \text{ for some } n \in \mathbb{N} \}$. We can choose
positive integers k and n such that $B(x, 2^{-k}) \subset V_{i,n}$. We are going to prove the
following statements:
(i) if ℓ ≥ n + k, then $B(x, 2^{-n-k})$ does not intersect any $V_{j,\ell}$;
(ii) if ℓ < n + k, then $B(x, 2^{-n-k})$ intersects $V_{j,\ell}$ for at most one index j ∈ I.
These claims imply that the open neighborhood $B(x, 2^{-n-k})$ can meet at most
n + k − 1 elements of $\{V_{i,n} \mid i \in I,\ n \in \mathbb{N}\}$, and the latter is then a locally finite
refinement.
To prove (i), we pick $y \in X_{j,\ell}$. Since ℓ > n, $y \notin V_{i,n}$. But $B(x, 2^{-k}) \subset V_{i,n}$,
hence $d(x, y) \ge 2^{-k}$. Since ℓ ≥ k + 1 and n + k ≥ k + 1, the condition
\[ B(x, 2^{-n-k}) \cap B(y, 2^{-\ell}) \ne \emptyset \]
would imply
\[ d(x, y) < 2^{-n-k} + 2^{-\ell} \le 2^{-k-1} + 2^{-k-1} = 2^{-k}, \]
which is a contradiction. Thus $B(x, 2^{-n-k})$ is disjoint from each ball $B(y, 2^{-\ell})$,
$y \in X_{j,\ell}$. Since these balls cover $V_{j,\ell}$, (i) is proved.
Let us now turn to (ii). For i < j we pick $x \in V_{i,\ell}$, $y \in V_{j,\ell}$. So there exist points
x′ and y′ in X such that
13 We prefer here the use of indices to label the elements of the cover, since an intrinsic notation
would hide the correspondence between the index of the cover and the index of the function in the
partition of unity.
14 The sum is indeed a finite sum by condition (2), in the sense that only finitely many terms are
.Bβ ⊂ Uf (β) .
16 For every α ∈ J, let $V_\alpha$ be the union of all the elements of
\[ \mathcal{B}_\alpha = \{ B_\beta \mid f(\beta) = \alpha \}. \]
finite.
Theorem 13.71 Suppose that X is a paracompact Hausdorff space. If .{Uα | α ∈ J }
is an open cover of X, then there exists a partition of unity dominated by
.{Uα | α ∈ J }.
Proof Invoking the previous Lemma twice, we construct locally finite open covers
{W_α | α ∈ J} and {V_α | α ∈ J} such that
\[ \overline{W_\alpha} \subset V_\alpha, \qquad \overline{V_\alpha} \subset U_\alpha \]
covers X.
Hence, for every x ∈ X we can form the sum
\[ \Psi(x) = \sum_{\alpha \in J} \psi_\alpha(x), \]
since x has a neighborhood which intersects only finitely many of the sets supp ψ_α.
Being locally a finite sum of continuous functions, Ψ is a continuous function at any point
of X. If we set
\[ \varphi_\alpha(x) = \frac{\psi_\alpha(x)}{\Psi(x)}, \]
We already know that there exists a partition of unity {φ_α | α ∈ J} dominated by U.
For each α ∈ J and each x ∈ X such that φ_α(x) ≠ 0, we choose a number k_α with
h(x) < k_α < g(x). The function p : X → R defined by
\[ p(x) = \sum_{\alpha \in J} k_\alpha \varphi_\alpha(x) \]
is continuous and clearly satisfies h(x) < p(x) < g(x) at any x ∈ X.
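To make the normalization step of a partition of unity concrete, here is a minimal sketch on the real line. Everything in it is an illustrative assumption rather than the construction of the text: the cover consists of the intervals U_n = (n − 3/2, n + 3/2) for integers n, the bumps ψ_n are squared tent functions supported in [n − 1, n + 1] ⊂ U_n, and the names `psi` and `partition_of_unity` are hypothetical. Dividing each ψ_n by the locally finite sum of all of them yields functions that add up to one.

```python
import math

def psi(n, x):
    """Squared tent function: positive on (n - 1, n + 1), zero elsewhere."""
    return max(0.0, 1.0 - abs(x - n)) ** 2

def partition_of_unity(x):
    """Return {n: phi_n(x)} for the finitely many n with psi_n(x) > 0."""
    # Only integers n with |x - n| < 1 can contribute, so a short range suffices.
    candidates = range(math.floor(x) - 1, math.floor(x) + 3)
    values = {n: psi(n, x) for n in candidates if psi(n, x) > 0.0}
    total = sum(values.values())   # the normalizing sum; positive at every x
    return {n: v / total for n, v in values.items()}

for x in (-2.3, 0.0, 0.5, 7.9):
    phi = partition_of_unity(x)
    print(x, phi, sum(phi.values()))
```

Only finitely many terms are nonzero near each point, so the sum is a genuine finite sum locally; this is the role played by local finiteness in the general proof.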
Some constructions in mathematical analysis and also in differential geometry
are based on a common idea: one can “fill up” the Euclidean space R^n by a sequence
{K_n}_n of compact sets such that $K_n \subset K_{n+1}^{\circ}$, i.e. each K_n is contained in the interior
of K_{n+1}. This is a trivial fact, since one can consider K_n = {x ∈ R^n | ‖x‖ ≤ n}. We
now show that the same idea also works in more general situations.
Definition 13.74 (Exhaustion by Compact Sets) An exhaustion by compact sets
in a locally compact Hausdorff space X consists of a sequence {K_n}_n of compact
sets of X such that
\[ K_n \subset K_{n+1}^{\circ} \quad \text{and} \quad X = \bigcup_{n=1}^{\infty} K_n. \]
17 In many situations, continuity is not sufficient, and smooth partitions of unity must be used.
This happens in Differential Geometry, but also in the theory of Function Spaces. Smoothness is
ensured by convolution with suitable kernels.
Proof We choose a countable base {B_n}_n of X, such that each $\overline{B_n}$ is compact. We
define $K_1 = \overline{B_1}$ and we assume that K_2, ..., K_n have been chosen. Let m > n be
the smallest integer such that K_n ⊂ B_1 ∪ ··· ∪ B_m, and let
\[ K_{n+1} = \overline{B_1} \cup \cdots \cup \overline{B_m}. \]
We define .K2 = Ui(1) ∪ · · · ∪ Ui(j2 ) . As before, .K2 is compact and intersects finitely
many of the .Ui , i.e.
= Map(X, Y )
.
1. Prove that the sets .E(x, U ), as .x ∈ X and .U ⊂ Y is open, form a subbasis of the
topology of .Y X .
2. Since .M(x, U ) = ∩ E(x, U ), deduce that .{M(x, U ) | x ∈ X, U ⊂ Y is open}
is a subbasis of the pointwise topology of ..
Proposition 13.2 For each finite set .F ⊂ X and every open set .U ⊂ Y , let
\[ M(F, U) = \{f \in \operatorname{Map}(X, Y) \mid f(F) \subset U\} = \bigcap \{M(x, U) \mid x \in F\}. \]
The basic weakness of the finite-open topology on . is clearly the fact that
the construction is independent of any topological structure of the set X. This
topology typically contains very few open sets, and few open sets correspond to
few continuous maps.
Definition 13.77 Let X and Y be topological spaces. For any compact .K ⊂ X and
any open .U ⊂ Y , we define
M(K, U ) = {f ∈ | f (K) ⊂ U } .
.
ω : (, τ ) × X → Y
.
ω(f, x) = f (x).
Using now x and its neighborhood X \ A, we find an open set V such that
$x \in V \subset \overline{V} \subset X \setminus A$. Hence $\overline{V} \cap A = \emptyset$, and 2. implies 3.
Let A be closed, x ∉ A. Pick a neighborhood V of x such that $\overline{V} \cap A = \emptyset$. Then
$A \subset X \setminus \overline{V}$, and $V \cap (X \setminus \overline{V}) = \emptyset$. This proves that 3. implies 1.
Theorem 13.74 (Arens) If X is a locally compact regular space, then the compact-
open topology on . is admissible. More precisely, it is the smallest of all admissible
topologies on ..
Proof Pick .f ∈ , .x ∈ X, and an open set W of Y containing .f (x). Since f is
continuous, .f −1 (W ) is an open set in X which contains x. Since X is a regular
locally compact space, there exists an open neighborhood V of x such that .V is
compact in X and is contained in .f −1 (W ). Then .U = M(V , W ) is an element of
the subbasis of the compact-open topology of . which contains f . It follows that
the evaluation .ω sends .U × V into W , so that the compact-open topology of . is
admissible.
To complete the proof, we will show that the compact-open topology of . is
smaller than any other admissible topology .τ on . without using our assumptions
on the space X. Pick any set of the form .M(K, W ), where K is compact in X and W
is open in Y , and pick an element .f ∈ M(K, W ). Since the topology .τ is admissible,
the evaluation map .ω : (, τ ) × X → Y is continuous. Hence, for every .x ∈ K,
there exist an open neighborhood .Vx of x in X and an open neighborhood .Ux of f
in .(, τ ) such that .ω (Ux × Vx ) ⊂ W . Since K is compact, there are finitely many
points x_1, ..., x_n such that
\[ K \subset V_{x_1} \cup \cdots \cup V_{x_n}. \]
Let
\[ U = U_{x_1} \cap \cdots \cap U_{x_n}. \]
Although a natural setting for this purpose would be the theory of uniform
structures, we believe that a simplified setting is more than enough for the Analyst’s
eye. So we will consider a metric space Y . The following result brings in another
useful property.
Proposition 13.4 Every metric space is homeomorphic to a bounded metric space.
Proof Let (Y, d) be a metric space. We define d∗ : Y × Y → [0, +∞) such that
\[ d^*(x, y) = \frac{d(x, y)}{1 + d(x, y)}, \qquad (x, y) \in Y \times Y. \]
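\[
\begin{aligned}
d^*(x, y) + d^*(x, z) &\ge \frac{d(x, y)}{1 + d(x, y) + d(x, z)} + \frac{d(x, z)}{1 + d(x, y) + d(x, z)} \\
&= \frac{d(x, y) + d(x, z)}{1 + d(x, y) + d(x, z)} = \frac{1}{1 + \dfrac{1}{d(x, y) + d(x, z)}} \\
&\ge \frac{1}{1 + \dfrac{1}{d(y, z)}} = d^*(y, z).
\end{aligned}
\]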
Since t → t/(1 + t) is a homeomorphism of [0, +∞) onto [0, 1), the spaces (Y, d)
and (Y, d∗) are homeomorphic. Finally, the fact that d∗(x, y) ≤ 1 for every x ∈ Y,
y ∈ Y shows that (Y, d∗) is a bounded metric space.
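The properties of d∗ can be tested numerically on a sample. The sketch below is an illustration under stated assumptions: Y is taken to be the real line with d(x, y) = |x − y|, the thirty sample points are pseudo-random with a fixed seed, and the names `d_star` and `violations` are hypothetical. It looks for triples violating the triangle inequality, with a small tolerance for floating-point rounding, and checks the bound d∗ < 1.

```python
import itertools
import random

def d(x, y):
    """Euclidean distance on the real line (the sample metric space)."""
    return abs(x - y)

def d_star(x, y):
    """Bounded metric d* = d / (1 + d)."""
    return d(x, y) / (1.0 + d(x, y))

random.seed(0)
points = [random.uniform(-100.0, 100.0) for _ in range(30)]

# Triples (x, y, z) for which d*(y, z) > d*(x, y) + d*(x, z), up to rounding.
violations = [(x, y, z) for x, y, z in itertools.product(points, repeat=3)
              if d_star(y, z) > d_star(x, y) + d_star(x, z) + 1e-12]
largest = max(d_star(x, y) for x, y in itertools.product(points, repeat=2))
print(len(violations), largest)
```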
In virtue of the previous Proposition we may assume that .(Y, d) is a bounded
metric space.
Definition 13.80 Consider . = Map(X, Y ). The function .d ∗ : × → [0, +∞)
such that
continuous, there exists a neighborhood V of .x0 in X such that .d(y0 , f0 (x)) < δ/2
for every .x ∈ V . Let
δ
U = f ∈ d ∗ (f, f0 ) <
. .
2
There results
For x ∈ X, we denote
\[ W_x = \left\{ y \in Y \;\middle|\; d(f(x), y) < \frac{\delta}{2} \right\}. \]
\[ X = G_{x_1} \cup \cdots \cup G_{x_n}. \]
\[ d(f(x_i), f(x)) < \frac{\delta}{3}, \qquad d(f(x_i), g(x)) < \frac{\delta}{3}. \]
We obtain
\[ d(f(x), g(x)) \le d(f(x), f(x_i)) + d(f(x_i), g(x)) < \frac{2}{3}\,\delta. \]
The collection of open sets {V_b | b ∈ B} covers B, and by compactness there are
points b_1, ..., b_n ∈ B such that
\[ B \subset V_{b_1} \cup \cdots \cup V_{b_n}. \]
Then
\[ \{a\} \times B \subset U \times V \subset \bigcup_{i=1}^{n} U_{b_i} \times V_{b_i} \subset W. \]
We now prove the statement in the general case. We have just proved that for every
.a ∈ A there exist open sets .Ua ⊂ X, .Va ⊂ Y , such that .{a} × B ⊂ Ua × Va ⊂ W .
Again, the collection .{Ua | a ∈ A} of open sets covers A, and by compactness
$A \subset U_{a_1} \cup \cdots \cup U_{a_n}$ for some points a_1, ..., a_n ∈ A.
To conclude, the open sets $U = U_{a_1} \cup \cdots \cup U_{a_n}$ and $V = V_{a_1} \cap \cdots \cap V_{a_n}$ satisfy
\[ A \times B \subset U \times V \subset W. \]
Definition 13.81 A metric space is totally bounded if it can be covered by a finite
number of open balls of radius r, for every .r > 0.
Part of the following result was proved in a more general setting. We present a
version that contains a new characterization of compactness in metric spaces.
Theorem 13.78 If K is a closed subset of a complete metric space .(X, d), the
following statements are equivalent:
(a) K is compact;
(b) every infinite subset of K has an accumulation point;
(c) K is totally bounded.
finite number of steps, because of assumption (b). Thus the open balls with radius .ε
centered at .x1 , . . . , xn cover K, and K is totally bounded.
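The selection procedure behind this step (keep choosing points at distance at least ε from all previously chosen ones until no such point is left) can be sketched on a finite sample. The code below is a hedged illustration: the grid in the unit square, the value ε = 0.3 and the name `greedy_net` are assumptions, and on a finite sample termination is automatic, whereas in the proof it is assumption (b) that forces the procedure to stop.

```python
import math

def greedy_net(points, eps):
    """Select centers one by one, each at distance >= eps from the earlier ones."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# Hypothetical sample: a 21 x 21 grid in the unit square.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
centers = greedy_net(grid, 0.3)

# Every sample point is either a center or was rejected because a center is closer than eps.
covered = all(any(math.dist(p, c) < 0.3 for c in centers) for p in grid)
print(len(centers), covered)
```

The open balls of radius ε around the selected centers therefore cover the sample, which mirrors the conclusion that K is totally bounded.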
Suppose now (c). Consider an open cover of K which has no finite subcover. By
assumption (c), K is a union of finitely many closed sets of diameters less than or
equal to one. One of these, say K1, cannot be covered by finitely many members of
the cover. Now we repeat this scheme with K1 instead of K, and continue. The result is a
every .f ∈ F and every .x ∈ U . The family .F is pointwise totally bounded if the set
.{f (x) | f ∈ F} is totally bounded in Y for every .x ∈ X.
13.16 Function Spaces 279
For every .ε > 0 the compact set .K × {x0 } is a subset of the open set .α −1 ([0, ε)),
and Wallace’s Theorem 13.77 yields an open set .U ⊂ X such that .x0 ∈ U and
$K \times U \subset \alpha^{-1}([0, \varepsilon))$. In particular $d(f(x_0), f(x)) < \varepsilon$ for every f ∈ F and any
x ∈ U.
We now prove the converse implication, assuming that (i) and (ii) hold. It suffices
to prove that .F is totally bounded in .Map(X, Y ). Pick any .ε > 0; the equicontinuity
of the family .F implies that for any .x ∈ X there exists an open neighborhood .Ux of
x such that .d(f (x), f (y)) < ε for every .f ∈ F and every .y ∈ Ux . Since the space
X is compact, there exist finitely many points .x1 , . . . , xn ∈ X such that
\[ X = U_{x_1} \cup \cdots \cup U_{x_n}. \]
We claim that F is the union of open balls centered at g ∈ F with radius 3ε.
Indeed, pick f ∈ F and g ∈ F such that d(f(x_i), g(x_i)) < ε for every i. Then for
any x ∈ X there exists an index i such that $x \in U_{x_i}$, whence
\[ d(f(x), g(x)) \le d(f(x), f(x_i)) + d(f(x_i), g(x_i)) + d(g(x_i), g(x)) < 3\varepsilon, \]
Corollary 13.4 (Arzelà) Let X be a compact space, and let .{fk }k be a sequence of
continuous functions from X to .Rn . If the sequence .{fk }k is pointwise bounded and
equicontinuous, then there exists a uniformly convergent subsequence.
Metric spaces are for sure the most ubiquitous topological structure in Mathematical
Analysis. We dare say that our minds always think in terms of a distance, although
this might be conceptually wrong.
Anyway, metric topologies are particularly easy to work with, and in this section
we try to analyze them and discover when a given topology is induced by a metric.
Definition 13.83 A pseudo-metric (or écart) for a set X is a function d : X × X → R
satisfying the following conditions: for every .x ∈ X, .y ∈ X and .z ∈ X,
(a) .d(x, y) = d(y, x);
(b) .d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality);
(c) .d(x, y) = 0 if .x = y.
The function d is a metric if, in addition to the previous properties, it also satisfies
(d) if .d(x, y) = 0, then .x = y.
A pseudo-metric space is a couple .(X, d) in which X is a set and d is a pseudo-
metric on X.
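A simple pseudo-metric that fails condition (d) is obtained on the plane by comparing first coordinates only. The following sketch is an illustrative assumption, not an example taken from the text; the sample points and the variable names are hypothetical. It checks conditions (a), (b) and (c) on a small sample and exhibits two distinct points at distance zero.

```python
import itertools

def d(p, q):
    """Pseudo-metric on the plane that only compares first coordinates."""
    return abs(p[0] - q[0])

sample = [(0.0, 0.0), (0.0, 1.0), (1.5, -2.0), (-3.0, 4.0)]

symmetric = all(d(p, q) == d(q, p)
                for p, q in itertools.product(sample, repeat=2))
triangle = all(d(p, q) <= d(p, r) + d(r, q)
               for p, q, r in itertools.product(sample, repeat=3))
vanishes_on_diagonal = all(d(p, p) == 0 for p in sample)

# Two distinct points at distance zero: condition (d) fails, so d is not a metric.
degenerate_pair = d((0.0, 0.0), (0.0, 1.0))
print(symmetric, triangle, vanishes_on_diagonal, degenerate_pair)
```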
Remark 13.20 Although the non-degeneracy condition (d) is usually assumed in
any Analysis textbook, it turns out that its relevance from a topological viewpoint is
really small. Of course the results of this section hold for any metric space.
Definition 13.84 (Balls) The open ball of center x and radius r > 0 in a pseudo-metric
space X is the set
\[ B(x, r) = \{y \in X \mid d(x, y) < r\}. \]
Swapping x and y we see that .d(y, A) ≤ d(y, x) + d(x, A), and therefore
If .y ∈ B(x, r), then .|d(x, A) − d(y, A)| ≤ r, and the proof is complete.
Theorem 13.81 In a pseudo-metric space X, the closure of A is the set of points
whose distance to A is equal to zero.
Proof We have to prove that
\[ \overline{A} = \{x \in X \mid d(x, A) = 0\}. \]
By continuity, U and V are open sets, and they are clearly disjoint. Since A and B
are closed, it follows from Theorem 13.81 that .A ⊂ U and .B ⊂ V . This concludes
the proof.
Theorem 13.83 A net {S_n, n ∈ D} in a pseudo-metric space X converges to a point
x if and only if the net {d(x, S_n), n ∈ D} converges to zero.
If the supremum on the right-hand side is infinite, we say that A has infinite
diameter.
We have already proved the following result in Proposition 13.4. We propose here a
different proof.
Theorem 13.84 Let .(X, d) be a pseudo-metric space, and let
Then .(X, e) is a pseudo-metric space whose topology coincides with the topology
induced by d.
Proof Let .a ≥ 0, .b ≥ 0 and .c ≥ 0 be such that .a + b ≥ c. We claim that
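topology. To this aim, we remark that if $V = B(x, 2^{-p})$ is an open ball centered at
x ∈ X and if
\[ U = \left\{ y \in X \;\middle|\; d_n(x_n, y_n) < 2^{-n-p-2} \text{ for every } n \le p + 2 \right\}, \]
\[ d(x, y) < \sum_{n=1}^{p+2} \frac{1}{2^{n+p+2}} + \sum_{n=p+3}^{\infty} \frac{1}{2^{n}} < \frac{1}{2^{p}}. \]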
It follows that each set which is open in the pseudo-metric topology is also open in
the product topology of X. Conversely, we consider any element U of the sub-base
which defines the product topology of X. Hence .U = {x ∈ X | xn ∈ W } for some
open set .W ⊂ Xn . Any .x ∈ U has an open ball of radius r, centered at .xn and
contained in W . Since
\[ d(x, y) \ge \frac{d_n(x_n, y_n)}{2^{n}}, \]
the open ball centered at x with radius .r/2n is a subset of U . This proves that any
element of the sub-base of the product topology is open relative to the pseudo-metric
topology, and the proof is complete.
Definition 13.88 (Isometries) A function .f : X → Y between the pseudo-metric
spaces .(X, d) and .(Y, e) is an isometry if and only if
R = {(x, y) | d(x, y) = 0}
.
on .X×X. The quotient space .X/R is endowed with the quotient topology associated
to R.
Theorem 13.86 Let .(X, d) be a pseudo-metric space, and let
\[ \mathcal{D} = \left\{ \overline{\{x\}} \mid x \in X \right\}. \]
Then .(D, d) is a metric space whose topology is the quotient topology of .X/R.
Furthermore, the projection of X onto .D is an isometry.
Proof We begin with a remark: $u \in \overline{\{x\}}$ if and only if $d(u, x) = 0$, which is true if
and only if $x \in \overline{\{u\}}$. If $u \in \overline{\{x\}}$ and $v \in \overline{\{y\}}$, then
But $x \in \overline{\{u\}}$ and $y \in \overline{\{v\}}$, hence $d(u, v) = d(x, y)$. As a consequence, for every
$A \in \mathcal{D}$, $B \in \mathcal{D}$, the value of $d(A, B)$ coincides with the value of $d(x, y)$ for every
$x \in A$ and $y \in B$. This proves that $(\mathcal{D}, d)$ is a metric space, and the projection of X
onto $\mathcal{D}$ is an isometry.
Let U be an open set in X and $x \in U$. There exists $r > 0$ such that $x \in B(x, r) \subset
U$, hence $\overline{\{x\}} \subset U$. The projection of X onto $\mathcal{D}$ is thus an open map with respect
to the quotient topology, but the projection is also an open map with respect to the
metric topology induced by the distance d between sets. Hence these topologies
must coincide.
A basic question arises at this point: when is a given topology on a set X the
topology associated to a (pseudo)metric?
Definition 13.91 (Metrizable Spaces) A topological space X is metrizable if and
only if its topology coincides with the topology induced by a metric on X. Similarly,
X is pseudo-metrizable if and only if its topology coincides with the topology
induced by a pseudo-metric on X.
Exercise 13.49 Prove that a pseudo-metric is a metric if and only if the associated
topology is .T1 . Deduce that a topological space is metrizable if and only if it is
pseudo-metrizable and its topology is .T1 .
A typical approach to metrizability results is via good embeddings of a given
topological space.
13.17 Cubes and Metrizability 285
Definition 13.92 (Cubes) Any cartesian product of the unit interval .[0, 1],
endowed with the product topology, is called a cube.
Explicitly, a cube is a topological space of the form $[0, 1]^A$, where A is a set, and
its topology is the topology of pointwise convergence. As a product space, each
element of a cube is a function whose domain is a specified set. In view of this
generality, cubes may fail to have good properties, and this is the reason why we
need to add suitable assumptions to the functions of our cubes.
Definition 13.93 Let X be a topological space, and let F be a collection of
functions $f_j \colon X \to Y_j$, $j \in J$, such that each $Y_j$ is a topological space. The evaluation
map $e \colon X \to \prod\{Y_j \mid j \in J\}$ is defined as follows: for every $x \in X$, $e(x)$ is the
function $j \in J \mapsto f_j(x)$.
Thus, roughly speaking the j -th coordinate of .e(x) is .fj (x), or .e(x)j = fj (x).
Definition 13.94 The collection $F = \{f_j \colon X \to Y_j \mid j \in J\}$ distinguishes points
if and only if for every $x \in X$, $y \in X$ such that $x \ne y$ there exists $j \in J$ such that
$f_j(x) \ne f_j(y)$.
Definition 13.95 The collection $F = \{f_j \colon X \to Y_j \mid j \in J\}$ distinguishes points
and closed sets if and only if for every $x \in X$ and every closed set A such that
$x \in X \setminus A$ there exists $j \in J$ such that $f_j(x) \notin \overline{f_j(A)}$.
We summarize in the next result the main topological features of the evaluation map.
Theorem 13.87 Let $F = \{f_j \colon X \to Y_j \mid j \in J\}$ be a collection of continuous
functions. Then
(a) The evaluation map $e \colon X \to \prod\{Y_j \mid j \in J\}$ is continuous.
(b) The evaluation map e is an open map of X onto $e(X)$ if F distinguishes
points and closed sets.
(c) The evaluation map e is injective if and only if F distinguishes points.
Proof Since .Pj ◦ e(x) = fj (x) for every .j ∈ J , by definition of the product
topology it follows that e is continuous, and (a) is proved. To prove (b), we show
that the image under e of an open neighborhood U of a point x contains the
intersection of $e(X)$ and a neighborhood of $e(x)$ in the product topology. Let $j \in J$
be such that $f_j(x)$ does not belong to the closure of $f_j(X \setminus U)$. Now, the set of all
$y \in \prod\{Y_j \mid j \in J\}$ such that $P_j(y) \notin \overline{f_j(X \setminus U)}$ is open, and evidently its
intersection with $e(X)$ is a subset of $e(U)$. This proves that e is an open map of
X onto $e(X)$.
Finally, .e(x) = e(y) if and only if .fj (x) = fj (y) for every .j ∈ J . Hence e is
injective if and only if F distinguishes points.
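Statement (c) is purely set-theoretic, so it can be illustrated on a finite set. The sketch below is a toy model under stated assumptions: the four-point set `X` and the coordinate functions `parity` and `high_bit` are hypothetical choices, and topology plays no role here.

```python
from itertools import combinations

def evaluation_map(family, x):
    """e(x) is the tuple whose j-th coordinate is f_j(x)."""
    return tuple(f(x) for f in family)

def distinguishes_points(family, points):
    """True if any two distinct points are separated by some f_j."""
    return all(
        any(f(x) != f(y) for f in family)
        for x, y in combinations(points, 2)
    )

def is_injective(family, points):
    """True if the evaluation map takes distinct values at distinct points."""
    images = [evaluation_map(family, x) for x in points]
    return len(set(images)) == len(images)

X = [0, 1, 2, 3]                 # a toy four-point set (assumption)
parity = lambda x: x % 2         # hypothetical coordinate functions
high_bit = lambda x: x // 2

good = [parity, high_bit]        # together they separate all points
bad = [parity]                   # 0 and 2 are not separated
```

For both families the two predicates agree, as statement (c) predicts: `good` distinguishes points and yields an injective evaluation map, while `bad` fails on both counts.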
In virtue of the last result, a topological space can be embedded into a cube (i.e.
it is homeomorphic to a subset of a cube) provided that it is possible to construct a
sufficiently rich collection of continuous functions defined on the space. By the very
definition, the existence of such collections seems to be related to some separation
properties of the topology.
Exercise 13.50 Prove that the collection of all continuous functions .f : X → [0, 1]
defined on a completely regular space X distinguishes points and closed sets.
Exercise 13.51 Suppose X is a Tychonoff space, and let F be the collection of
all continuous functions from X to .[0, 1]. Prove that the evaluation map e is a
homeomorphism of X with a subspace of $[0, 1]^F$. Hint: use Theorem 13.87.
Our task now is to prove the converse of the last exercise. We need a preliminary
result.
Theorem 13.88 Any product of Tychonoff spaces is a Tychonoff space.
Proof We say that a continuous function .f : X → [0, 1] defined on a topological
space X is for a pair .(x, U ) if and only if x is a point of X, U is a neighborhood of
x, .f (x) = 0 and .f = 1 on .X \ U . Now, if .f1 , . . . , fn are for .(x, U1 ), . . . , (x, Un ),
then we can set
and conclude that g is for $\left(x, \bigcap_{i=1}^n U_i\right)$. This shows that X is completely regular if
for every x and every neighborhood U of x belonging to a sub-base of the topology
of X, there exists a function for the pair $(x, U)$.
Consider now the case in which X is a product $\prod\{X_\alpha \mid \alpha \in A\}$ of Tychonoff
spaces. Let $x \in X$ and let $U_\alpha$ be a neighborhood of $x_\alpha = P_\alpha(x)$ in $X_\alpha$. If f is a
function for $(x_\alpha, U_\alpha)$, then $f \circ P_\alpha$ is a function for $(x, P_\alpha^{-1}(U_\alpha))$. The collection
of all sets $P_\alpha^{-1}(U_\alpha)$ forms a sub-base of the product topology of X, hence X is
completely regular. The fact that any product of .T1 spaces is a .T1 space completes
the proof.
Theorem 13.89 (Embedding into Cubes) For a topological space X, the following
are equivalent:
(a) X is a Tychonoff space;
(b) X is homeomorphic to a subspace of a cube.
Proof (b) implies (a). The space $[0, 1]$ is a Tychonoff space, thus any cube is a
Tychonoff space by Theorem 13.88, and any subspace of a Tychonoff space is again
a Tychonoff space.
(a) implies (b). Exercise 13.51 and Theorem 13.87 show that the evaluation map e
is a homeomorphism of X onto a subspace of a cube, and the proof is complete.
We are ready to prove a sufficient condition for a topology to be metrizable.
Theorem 13.90 (Urysohn) A regular $T_1$ topological space whose topology has a
countable base is homeomorphic to a subspace of the cube $[0, 1]^{\mathbb{N}}$. In particular, it
is metrizable.
Proof Let us explain the strategy of the proof. A product of countably many
pseudo-metrizable spaces is pseudo-metrizable, see Theorem 13.85. By Theorem 13.87, if
F is a collection of continuous functions on a $T_1$ space X, where an element f of F
maps X to $Y_f$, then the evaluation map is continuous from X to $\prod\{Y_f \mid f \in F\}$,
and it is a homeomorphism as soon as F distinguishes points and closed sets. To
show that X is metrizable, it suffices to construct a countable family (i.e. a sequence)
F of continuous functions, each from X into .[0, 1], such that F distinguishes points
and closed sets.
Let $\mathcal{B}$ be a countable base for the topology of X, and let
\[ \mathcal{A} = \left\{ (U, V) \mid U \in \mathcal{B},\ V \in \mathcal{B},\ \overline{U} \subset V \right\}. \]
\[ d(U_k, X \setminus U_{k+1}) \ge \frac{1}{2^k} - \frac{1}{2^{k+1}} = \frac{1}{2^{k+1}}. \]
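Here comes the hard part of the proof: we select a well-order $\prec$ on $\mathcal{U}$. For every
$k \in \mathbb{N}$ and every $U \in \mathcal{U}$ we introduce
\[ U_k' = U_k \setminus \bigcup \{V_{k+1} \mid V \in \mathcal{U},\ V \prec U\} \subset U_k. \]
In the well-order $\prec$, any pair $(U, V)$ of distinct sets in $\mathcal{U}$ satisfies either $U \prec V$
or $V \prec U$, but not both. Hence either $U_k' \subset X \setminus V_{k+1}$ or $V_k' \subset X \setminus U_{k+1}$. In other
words, either
\[ d(U_k', V_k') \ge d(X \setminus V_{k+1}, V_k) \ge \frac{1}{2^{k+1}} \]
or
\[ d(U_k', V_k') \ge d(U_k, X \setminus U_{k+1}) \ge \frac{1}{2^{k+1}}. \]
Then
\[ U_k'' = \left\{ x \in X \mid d(x, U_k') \le \frac{1}{2^{k+3}} \right\} \]
\[ d(U_k'', V_k'') \ge \frac{1}{2^{k+1}} - 2 \cdot \frac{1}{2^{k+3}} = \frac{1}{2^{k+2}}. \]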
Thus each collection $\mathcal{V}_k = \{U_k'' \mid U \in \mathcal{U}\}$ is locally finite. Indeed, for every point
$x \in X$, the open ball $B(x, 2^{-k-3})$ intersects at most one element of $\mathcal{V}_k$.
The collection $\mathcal{V} = \bigcup\{\mathcal{V}_k \mid k \in \mathbb{N}\}$ is a cover of X. Every $x \in X$ belongs to some
$U \in \mathcal{U}$, hence it belongs to $U_k$ for some $k \ge 1$. As a consequence, $x \in U_k' \subset U_k''$,
when U is the smallest element of $\mathcal{U}$ which contains x. Since $U_k''$ is open and is a
subset of U, the collection $\mathcal{V}$ is a σ-locally finite open refinement of $\mathcal{U}$. The proof
is complete.
Theorem 13.92 Every regular topological space X whose topology has a σ-locally
finite base $\mathcal{B} = \bigcup\{\mathcal{B}_k \mid k \in \mathbb{N}\}$ is a normal space.
Proof Fix two disjoint closed subsets A and B of X. By regularity, a cover .U of
A exists made by open sets that have closure disjoint from B, and a cover .V of B
exists made by open sets that have closure disjoint from A. By assumption we may
express
\[ \mathcal{U} = \bigcup\{\mathcal{U}_k \mid k \in \mathbb{N}\}, \qquad \mathcal{V} = \bigcup\{\mathcal{V}_k \mid k \in \mathbb{N}\}, \]
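We thus see that A has the countable cover $\{U_k \mid k \in \mathbb{N}\}$ by open sets with closure
$\overline{U_k} \subset X \setminus B$, and B has the countable cover $\{V_k \mid k \in \mathbb{N}\}$ by open sets with closure
$\overline{V_k} \subset X \setminus A$. By replacing $U_k$ and $V_k$ with the union of their predecessors in the
\[ U_1 \subset U_2 \subset \ldots \subset U_k \subset \ldots \]
\[ V_1 \subset V_2 \subset \ldots \subset V_k \subset \ldots \]
These sets still have the properties that $\overline{U_k} \subset X \setminus B$ and $\overline{V_k} \subset X \setminus A$. Then the sets
\[ U = \bigcup\left\{ U_k \setminus \overline{V_k} \mid k \in \mathbb{N} \right\}, \qquad V = \bigcup\left\{ V_k \setminus \overline{U_k} \mid k \in \mathbb{N} \right\} \]
are open neighborhoods of A and B, respectively. But they are also disjoint, since
\[ U_k \cap \left(X \setminus \overline{U_n}\right) \supset \left(U_k \setminus \overline{V_k}\right) \cap \left(V_n \setminus \overline{U_n}\right) \subset \left(X \setminus \overline{V_k}\right) \cap V_n, \]
where $U_k \cap (X \setminus \overline{U_n}) = \emptyset$ when $k \le n$, and $(X \setminus \overline{V_k}) \cap V_n = \emptyset$ when $n \le k$.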
Theorem 13.93 (Nagata-Smirnov) For a topological space X the following properties
are equivalent:
(a) X is metrizable;
(b) X is regular and has a σ-locally finite base $\mathcal{B} = \bigcup\{\mathcal{B}_k \mid k \ge 1\}$.
Proof (a) implies (b). Indeed, for every integer $n \ge 1$ the open cover $\mathcal{U}(n)$ by open
balls of radius $1/n$ has a σ-locally finite refinement $\mathcal{B}(n) = \bigcup\{\mathcal{B}_k(n) \mid k \ge 1\}$ by
Stone’s Theorem. Then $\mathcal{B} = \bigcup\{\mathcal{B}(n) \mid n \ge 1\}$ is a σ-locally finite base for the
metric topology of X.
(b) implies (a). Indeed, we begin with a σ-locally finite base $\mathcal{B} = \bigcup\{\mathcal{B}_k \mid k \ge 1\}$
for the topology of X. For every couple $(m, n) \in \mathbb{N} \times \mathbb{N}$ and every $U \in \mathcal{B}_m$, we set
\[ \mathcal{G} = \left\{ V \in \mathcal{B}_n \mid \overline{V} \subset U \right\}. \]
Since .Bm is locally finite, each element .(x, y) ∈ X×X has a neighborhood on which
the summation in the definition of .dm,n is a finite sum. As a consequence, .dm,n is
a continuous function. It is easy to check that .dm,n is a pseudo-metric on X. We
call .Ym,n the set X endowed with the topology induced by the pseudo-metric .dm,n ,
and let .gm,n : X → Ym,n be the inclusion map. It is clear that .gm,n is a continuous
function.
We claim that the countable collection $F = \{g_{m,n} \mid (m, n) \in \mathbb{N} \times \mathbb{N}\}$ separates
points and closed sets. Indeed, pick any $x \in X$ and any closed set A which does
not contain x. There exists a member $U \in \mathcal{B}_m \subset \mathcal{B}$ of the base such that $x \in U \subset
X \setminus A$. By regularity, there exists a second member $V \in \mathcal{B}_n$ of the base such that
$x \in V \subset \overline{V} \subset U \subset X \setminus A$. But then $d_{m,n}(x, A) \ge 1$, since $u_U(V) = \{1\}$ and $u_U$
vanishes on A. As a consequence, $g_{m,n}(x) = x$ does not belong to the closure (with
respect to $d_{m,n}$) of $g_{m,n}(A) = A$. The claim is proved.
The evaluation map
\[ e \colon X \to \prod\{Y_{m,n} \mid (m, n) \in \mathbb{N} \times \mathbb{N}\} \]
13.18 Problems
R2 \ {(x, y) | x ∈ N ∧ y ∈ N}
.
13.6 Let (X, d) be a metric space, and let E be a subset of X. Prove that
$\operatorname{diam} \overline{E} = \operatorname{diam} E$.
13.7 Let (X, d) be a metric space.
1. Call two Cauchy sequences $\{x_n\}_n$, $\{y_n\}_n$ in X equivalent if and only if
\[ \lim_{n \to +\infty} d(x_n, y_n) = 0. \]
where P = [{xn }n ] and Q = [{yn }n ]. Prove that this limit exists, and that
(P , Q) depends only on P and Q, but not on the representatives {xn }n and
{yn }n of P and Q.
3. Prove that (X∗ , ) is a complete metric space.
4. For each point x ∈ X, there is a Cauchy sequence all of whose terms are equal
to x: let Px be the element of X∗ which contains this sequence. Prove that
(Px , Py ) = d(x, y)
.
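\[ N_{a,b} = \{a + kb \mid k \in \mathbb{Z}\}. \]
1. Prove that the collection $\mathcal{B} = \{N_{a,b} \mid a \in \mathbb{Z},\ b \in \mathbb{Z},\ b > 0\}$ is a base for a
topology τ on $\mathbb{Z}$.
2. Prove that each $N_{a,b}$ is both open and closed in $(\mathbb{Z}, \tau)$.
3. Let $P = \{2, 3, \ldots\}$ be the set of prime numbers. Prove that
\[ \mathbb{Z} \setminus \{-1, 1\} = \bigcup\{N_{0,p} \mid p \in P\}. \]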
Deduce that if P were a finite set, then {−1, 1} would be an open set in (Z, τ ).
Hence P is an infinite set.
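The set identity in item 3 can be sanity-checked on a finite window of integers. This is only a numerical illustration, not a proof: the window size `bound` and both helper names are arbitrary assumptions made for the sketch.

```python
def primes_up_to(n):
    """Primes <= n by trial division (adequate for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % q != 0 for q in range(2, int(p ** 0.5) + 1))]

def covered_by_prime_progressions(bound):
    """Integers k with |k| <= bound that lie in some N_{0,p} = pZ, p prime."""
    primes = primes_up_to(bound)
    return {k for k in range(-bound, bound + 1)
            if any(k % p == 0 for p in primes)}

bound = 200  # size of the window, an arbitrary choice
expected = set(range(-bound, bound + 1)) - {-1, 1}
covered = covered_by_prime_progressions(bound)
```

Within the window, `covered` coincides with `expected`: every integer other than 1 and −1 is a multiple of some prime, and 0 is a multiple of every prime.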
13.11 Let Y be a dense subset of a Hausdorff topological space X. If Y is locally
compact, prove that Y is open in X.
13.12 Let (X, d) be a metric space. If A ⊂ X and ε > 0, we set
13.19 Comments
I believe that the chapter on General Topology has a fundamental role in a book like
this one. This is why this chapter is particularly long and full of ideas and results.
The reader may have noticed that I did not insist on a strictly economic exposition:
a few definitions appear twice, and the order of appearance of the main characters is
not always coherent with the tradition. As an example, separation axioms just come
into play when they are needed from the viewpoint of a Mathematical Analyst. As
long as it looks possible, the Hausdorff separation axiom is used alone, because
this is exactly the basic condition ensuring the most important fact of Analysis:
a function converges to at most one point. When the construction of continuous
functions which separate sets becomes necessary, we introduce normality and
regularity. The locally compact Hausdorff case is dealt with separately, although
one might optimize several proofs by reducing to the normal case. I have decided to
do so because locally compact Hausdorff spaces are the natural setting of Measure
Theory, a topic that will be discussed in the next chapters.
While writing this chapter, I had in mind the classical reference [5]. Kelley’s
book remains a great source for the young Analyst who wishes to learn “what every
Analyst should know about topology”, but its main feature is that the whole book
should be read like a romance, from cover to cover. This is nowadays uncommon,
since textbooks are written for the time-lacking reader.
Another standard reference is [6], whose style is opposed to Kelley’s. There are
plenty of pictures, diagrams, sketches, although the book is somehow redundant for
our purposes.
The definitive bible of General Topology is surely [4], whose bibliography is an
encyclopedia of references.
References
1. J.F. Aarnes, P.R. Andenaes, On nets and filters. Math. Scand. 31, 285–292 (1972)
2. P.R. Chernoff, A simple proof of Tychonoff’s theorem via nets. Am. Math. Mon. 99(10), 932–
934 (1992)
3. C. Fefferman, An easy proof of the fundamental theorem of algebra. Am. Math. Mon. 74, 854–
855 (1967)
4. R. Engelking, General Topology. Sigma Series in Pure Mathematics Series Profile, vol. 6
(Heldermann Verlag, Berlin, 1989)
5. J.L. Kelley, General Topology. Graduate Texts in Mathematics, vol. 27 (Springer-Verlag, New
York-Berlin, 1975). Reprint of the 1955 edition [Van Nostrand, Toronto, Ont.]
6. J.R. Munkres, Topology (Prentice Hall, Upper Saddle River, 2000), xvi, 537 p.
7. M.E. Rudin, A new proof that metric spaces are paracompact. Proc. Am. Math. Soc. 20, 603
(1969)
8. L. Schwartz, Analyse, in Topologie Générale et Analyse Fonctionnelle (Hermann, Paris, 1993)
9. A.H. Stone, Paracompactness and product spaces. Bull. Am. Math. Soc. 54, 977–982 (1948)
Chapter 14
Differentiating Again: Linearization
in Normed Spaces
4. .1x = x
for every .x ∈ V , .y ∈ V , .a ∈ R, .b ∈ R. The symbol 0 will be used both for the real
number zero and for the additive identity in V .
Remark 14.1 The systematic use of bold-face fonts to denote vectors, like in .x or .v,
is no longer popular among mathematicians.
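Definition 14.2 Let V be a vector space. Given $E \subset \mathbb{R}$, $a \in \mathbb{R}$, $A \subset V$, $B \subset V$,
$x_0 \in V$, we set
\begin{align*}
A + B &= \{x + y \mid x \in A,\ y \in B\} \\
x_0 + B &= \{x_0 + y \mid y \in B\} \\
EA &= \{ax \mid a \in E,\ x \in A\} \\
aA &= \{ax \mid x \in A\}.
\end{align*}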
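The four set operations of Definition 14.2 can be tried out on finite subsets of the plane. The sketch below assumes vectors are represented as pairs of numbers; the function names and the sample sets `A` and `B` are hypothetical.

```python
def set_sum(A, B):
    """A + B = {x + y | x in A, y in B} for finite sets of 2D vectors."""
    return {(x[0] + y[0], x[1] + y[1]) for x in A for y in B}

def translate(x0, B):
    """x0 + B = {x0 + y | y in B}."""
    return set_sum({x0}, B)

def scale_set(E, A):
    """EA = {a x | a in E, x in A}."""
    return {(a * x[0], a * x[1]) for a in E for x in A}

def scale(a, A):
    """aA = {a x | x in A}."""
    return scale_set({a}, A)

A = {(0, 0), (1, 0)}
B = {(0, 0), (0, 1)}
```

With these sample sets, `scale(2, A)` is a proper subset of `set_sum(A, A)`, which shows that $2A$ and $A + A$ need not coincide when A is not convex.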
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 295
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_14
Definition 14.3 Let V be a vector space, and let .W ⊂ V . We say that W is a vector
subspace of V if .W +W ⊂ W and .RW ⊂ W . Furthermore, we say that W is convex
if .ax + (1 − a)y ∈ W whenever .a ∈ R, .x ∈ W , .y ∈ W and .0 ≤ a ≤ 1. We say
that W is symmetric if $-1W = -W = W$, and that W is balanced if $aW \subset W$ for
every $a \in \mathbb{R}$ such that $|a| \le 1$.
Exercise 14.1 Prove that W is a vector subspace of V if and only if for every .a ∈ R,
b ∈ R, .u ∈ W and .v ∈ W there results .au + bv ∈ W .
.
Definition 14.4 Let V be a vector space, and suppose that .p : V → R. We say that
p is a seminorm on V if
1. .p(x + y) ≤ p(x) + p(y)
2. .p(ax) = |a|p(x)
for every .x ∈ V , .y ∈ V , .a ∈ R. Of course the first property is the triangle inequality
for p, while the second property is a homogeneity property.
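A seminorm may vanish on non-zero vectors, which is what separates it from a norm. The sketch below tests the two defining properties on an integer grid for the illustrative choice $p(x_1, x_2) = |x_1 - x_2|$ on $\mathbb{R}^2$; this example and the grid are assumptions made here, not taken from the text.

```python
from itertools import product

def p(x):
    """Candidate seminorm on R^2: p(x1, x2) = |x1 - x2| (illustrative choice)."""
    return abs(x[0] - x[1])

grid = [(a, b) for a, b in product(range(-3, 4), repeat=2)]
scalars = range(-3, 4)

# Triangle inequality p(x + y) <= p(x) + p(y) on all grid pairs.
triangle_ok = all(p((x[0] + y[0], x[1] + y[1])) <= p(x) + p(y)
                  for x in grid for y in grid)

# Homogeneity p(a x) = |a| p(x) for integer scalars.
homogeneous_ok = all(p((a * x[0], a * x[1])) == abs(a) * p(x)
                     for a in scalars for x in grid)
```

Both checks succeed on the grid, while `p((1, 1))` equals 0 although the vector is not zero, so p is a seminorm that is not a norm.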
Exercise 14.2 Let p be a seminorm on the vector space V . Prove that .p(0) = 0,
p(x) ≥ 0 for every .x ∈ V , and .p(x − y) ≥ |p(x) − p(y)| for every .x ∈ V , .y ∈ V .
.
\[ d(x, y) = \|x - y\|. \]
there exists a compact subset $K_\varepsilon$ of X such that $|f(x)| < \varepsilon$ for every $x \in X \setminus K_\varepsilon$.
If X is a normed vector space (with norm $\|\cdot\|$), this means that for every $\varepsilon > 0$
there exists $M \in \mathbb{R}$ such that $\|x\| > M$ implies $|f(x)| < \varepsilon$, or equivalently that
$\lim_{\|x\| \to +\infty} f(x) = 0$.
\[ \|f\| = \|f\|_\infty = \sup\{|f(t)| \mid t \in X\}. \]
Exercise 14.3 Prove that .C(X) and .C0 (X) are Banach spaces. If X is a compact
Hausdorff space, prove that the normed spaces .C(X), .C0 (X) and .Cc (X) coincide.
Example 14.2 Let $a < b$ and let $n = 0, 1, 2, \ldots$ We denote by $C^n([a, b])$ the vector
space of n-times continuously differentiable functions from $[a, b]$ to $\mathbb{R}$. We define
\[ \|f\|_n = \sum_{k=0}^{n} \left\| f^{(k)} \right\|_\infty \]
for $f \in C^n([a, b])$, where $f^{(k)}$ stands for the k-th derivative of f, $0 \le k \le n$.
Finally, we define
\[ C^\infty([a, b]) = \bigcap_{n=0}^{\infty} C^n([a, b]). \]
vector space of all sequences $\{x_n\}_n$ of real numbers such that $\sum_{n=1}^{\infty} |x_n|^p \in \mathbb{R}$. In
If $p = \infty$, the space $\ell^\infty = \ell^\infty(\mathbb{N})$ is the vector space of all bounded sequences of
real numbers. The norm of $x = \{x_n\}_n \in \ell^\infty$ is
Exercise 14.5 Prove that $\ell^p$ and $\ell^\infty$ are Banach spaces.
Definition 14.7 Let V be a (real) vector space. An inner (or a scalar) product on
V is a function which assigns to every $(x, y) \in V \times V$ a real number $\langle x \mid y \rangle$ such
that
1. (linearity) $\langle ax + by \mid z \rangle = a\langle x \mid z \rangle + b\langle y \mid z \rangle$
2. (symmetry) $\langle x \mid y \rangle = \langle y \mid x \rangle$
3. (positivity) $\langle x \mid x \rangle \ge 0$
4. (non-degeneracy) $\langle x \mid x \rangle = 0$ implies $x = 0$
for every $x \in V$, $y \in V$, $z \in V$, $a \in \mathbb{R}$, $b \in \mathbb{R}$. The norm induced by the inner
product is defined by
\[ \|x\| = \sqrt{\langle x \mid x \rangle}, \qquad x \in V. \]
A Hilbert space is a vector space which is a complete metric space with respect to
the norm induced by an inner product.
Remark 14.4 If V is a complex vector space, in the sense that vectors are multiplied
by complex numbers, condition 2 in the definition of an inner product must be
changed to
2. $\langle x \mid y \rangle = \overline{\langle y \mid x \rangle}$;
in this case the inner product is usually called hermitian instead of symmetric.
If this is the case, the real number in (14.1) is called the operator norm of T, and is
denoted by $\|T\|$ (as usual!). The set of all bounded linear operators from X to Y is
denoted by L(X, Y).
Exercise 14.6 Prove that L(X, Y ) is a Banach space under the operator norm. It
should be remarked that only the completeness of Y is necessary here.
Definition 14.9 Let X be a Banach space. The (topological) dual space of X is
X∗ = L(X, R), i.e. the Banach space of all bounded linear operators from X to R.
Any such operator will be called a bounded linear functional on X. The operator
norm of $f \in X^*$ will be also denoted by $\|f\|_{X^*}$.
Remark 14.5 Sometimes $X'$ is used instead of $X^*$, but we will always stick to our
symbol.
Theorem 14.1 Let X, Y be Banach spaces, and let T : X → Y be a linear operator.
The following are equivalent:
(a) T is continuous at some point x0 ∈ X;
(b) T is continuous at 0 ∈ X;
(c) T is a bounded operator;
(d) T is uniformly continuous.
Proof Obviously (d) implies (c) implies (b) implies (a). Suppose now that (a) holds.
If V is an open neighborhood of the origin of Y , then V1 = V + T x0 is an open
neighborhood of T x0 . By assumption there exists an open neighborhood U1 of x0
such that T (U1 ) ⊂ V1 . But then U = U1 − x0 is an open neighborhood of the origin
of X, and if y and z are elements of X such that y − z ∈ U , then
T y − T z = T (y − z) ∈ V1 − T x0 = V .
.
This shows that T is uniformly continuous, and thus (a) implies (d). The proof is
complete.
g(x) ≤ p(x)
. for all x ∈ W.
Then there exists a linear map1 f : V → R such that g(x) = f (x) for every x ∈ W ,
and
f (x) + α ≤ p(x + x0 )
.
f (x) − α ≤ p(x − x0 ).
1 Here we use linearity in a pure algebraic sense. No reference to any norm is understood.
14.3 The Hahn-Banach Theorem 301
We have just proved that f ≤ h, against the maximality of f . The proof is now
complete.
Corollary 14.1 Let G be a vector subspace of X and let g : G → R be a continuous
linear map whose norm is
g(x)
.*g*G = sup
∗ x ∈ G, x = 0 .
*x*
*f0 | = *x0 *,
. f0 (x0 ) = *x0 *2 .
Proof Let G = Rx0 and g(tx0 ) = t*x0 *2 . We have |g*G∗ = *x0 *2 , so that
Corollary 14.1 applies.
Corollary 14.3 For every x ∈ X there results
\[ \|x\| = \sup\left\{ \frac{|f(x)|}{\|f\|_{X^*}} \;\middle|\; f \in X^* \right\} = \max\left\{ \frac{|f(x)|}{\|f\|_{X^*}} \;\middle|\; f \in X^* \right\}. \]
On the other hand, there exists $f_0 \in X^*$ such that $\|f_0\| = \|x\|$ and $f_0(x) = \|x\|^2$.
We define $f_1 = \|x\|^{-1} f_0$ in such a way that $\|f_1\| = 1$ and $f_1(x) = \|x\|$.
H = {x ∈ X | f (x) = α} ,
.
[f = α] = {x ∈ X | f (x) = α} .
.
\[ B(x_0, r) = \{x \in X \mid \|x - x_0\| < r\}. \]
We claim that $f(x) < \alpha$ for all $x \in B(x_0, r)$. Indeed, suppose that $f(x_1) > \alpha$ for
some $x_1 \in B(x_0, r)$. For every $t \in [0, 1]$, the point $x_t = (1 - t)x_0 + tx_1$ lies in
$B(x_0, r)$, so that $f(x_t) \ne \alpha$ for every $t \in [0, 1]$. But $f(x_t) = \alpha$ for
\[ t = \frac{\alpha - f(x_0)}{f(x_1) - f(x_0)}, \]
which is a contradiction. The claim is proved, and it follows that $f(x_0 + rz) < \alpha$
for every $z \in B(0, 1)$. As a consequence $f \in X^*$ with
\[ \|f\|_{X^*} \le \frac{\alpha - f(x_0)}{r}. \]
Definition 14.12 Let A ⊂ X, B ⊂ X. The hyperplane H = [f = α] separates A
and B if and only if
x ∈ A ⇒ f (x) ≤ α
.
x ∈ B ⇒ f (x) ≥ α.
We say that H strictly separates A and B if there exists ε > 0 such that
x ∈ A ⇒ f (x) ≤ α − ε
.
x ∈ B ⇒ f (x) ≥ α + ε.
\[ C = \{x \in X \mid p_C(x) < 1\}. \]
Proof Since C is open, there exists $r > 0$ such that $B(0, r) \subset C$. By definition,
\[ p_C(x) \le \frac{1}{r}\|x\| \quad \text{for every } x \in X. \]
Hence $M = 1/r$ works. The first property of a gauge is trivial.
Now, assume that $x \in C$. Since C is open, we have $(1 + \varepsilon)x \in C$ for $\varepsilon > 0$
sufficiently small. Hence $p_C(x) \le 1/(1 + \varepsilon) < 1$. Conversely, if $p_C(x) < 1$ then
there exists $0 < \alpha < 1$ such that $(1/\alpha)x \in C$ and thus
\[ x = \alpha \cdot \frac{1}{\alpha} x + (1 - \alpha) \cdot 0 \in C. \]
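Therefore
\[ \frac{tx}{p_C(x) + \varepsilon} + \frac{(1 - t)y}{p_C(y) + \varepsilon} \in C \]
for every $t \in [0, 1]$. Choosing
\[ t = \frac{p_C(x) + \varepsilon}{p_C(x) + p_C(y) + 2\varepsilon} \]
we obtain
\[ \frac{x + y}{p_C(x) + p_C(y) + 2\varepsilon} \in C. \]
Hence
\[ p_C\left( \frac{x + y}{p_C(x) + p_C(y) + 2\varepsilon} \right) < 1, \]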
where pC is the gauge of C. If t > 0, then g(tx0 ) = tpC (x0 ) = pC (tx0 ). Since
0 = pC (0) = pC (x0 − x0 ) ≤ pC (x0 ) + pC (−x0 ) implies pC (−x0 ) ≤ pC (x0 ), it
follows that g(tx0 ) ≤ pC (tx0 ) for every t ≤ 0. In any case, g(x) ≤ pC (x) for every
x ∈ G.
By the Hahn-Banach Theorem, there exists f ∈ X∗ such that f = g on G
and f (x) ≤ pC (x) for every x ∈ X. In particular f (x0 ) = 1. Lemma 14.1 yields
f (x) < 1 for every x ∈ C.
\[ A_\varepsilon = A + B(0, \varepsilon), \qquad B_\varepsilon = B + B(0, \varepsilon), \]
so that $A_\varepsilon$ and $B_\varepsilon$ are convex, open and non-empty sets. We claim that if $\varepsilon > 0$ is
sufficiently small, then $A_\varepsilon \cap B_\varepsilon = \emptyset$. This is the only place where the topological
assumptions on A and B are used. Indeed, if the claim were false, there would
exist sequences $\varepsilon_n \to 0$, $x_n \in A$ and $y_n \in B$ such that $\|x_n - y_n\| \le 2\varepsilon_n$. By
compactness, there would exist a subsequence $y_{n_k} \to x \in B$, and therefore
$x_{n_k} \to x$ as well. But A is closed, hence $x \in A \cap B = \emptyset$, a contradiction. We
can now use part (a) to produce f ∈ X∗ such that [f = α] separates Aε and Bε .
As a consequence,
f (x + εz) ≤ α ≤ f (y + εz)
.
Corollary 14.4 Let Y be a vector subspace such that $\overline{Y} \ne X$. Then there exists
$f \in X^*$, $f \ne 0$, such that
and this implies f (y) = 0 for every y ∈ Y , since λf (y) < α for every λ ∈ R.
.B(x0 , r0 ) ⊂ ω.
Since .U1 is dense, it must intersect the open set .B(x0 , r0 ), so we may choose a point
x1 ∈ B(x0 , r0 ) ∩ U1 and a number .r1 > 0 such that .0 < r1 < r0 /2 and
.
B(x1 , r1 ) ⊂ B(x0 , r0 ) ∩ U1 .
.
the limit as .p → +∞
∈ B(xn , rn )
. for all n ∈ N.
then there exists .n0 ∈ N such that .Xn0 has non-empty interior.
Proof Consider $U_n = X \setminus X_n$. By assumption $\bigcap_{n \in \mathbb{N}} U_n$ is not dense, so there exists
$n_0$ such that $U_{n_0}$ is not dense, and this means that the interior of $X_{n_0}$ is non-empty.
We introduce the principle of uniform boundedness in a non-linear setting.
Although this is not the common statement in Functional Analysis, we believe that
it fits into our topological approach to mathematical analysis.
Definition 14.13 Let X be a topological space. A function $f \colon X \to \mathbb{R}$ is lower
semicontinuous if the set $\{x \in X \mid f(x) \le t\}$ is closed in X for every $t \in \mathbb{R}$.
Theorem 14.6 (Osgood) Let X be a topological space, and suppose that .{fα | α ∈
A} is any family of real-valued lower semicontinuous functions defined on X. If for
every .x ∈ X there exists .Mx > 0 such that
then there exist a non-empty open set .U ⊂ X and a number .M > 0 such that
Each $X_n$ is closed because each $f_\alpha$ is lower semicontinuous. Moreover, $X = \bigcup_{n \in \mathbb{N}} X_n$.
Thus there exists $n_0 \in \mathbb{N}$ such that the interior of $X_{n_0}$ is non-empty. Denoting by U
the interior of $X_{n_0}$, there results
Remark 14.6 The meaning of Osgood’s Theorem is that a uniform bound on a
suitable open set U can be deduced from a pointwise bound (recall that .Mx depends
on x).
We now apply Osgood’s Theorem in a linear setting. The role of the open set U
becomes irrelevant as a consequence of linearity.
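\[ \sup\{\|T_\alpha x\| \mid \alpha \in A\} \le M_x, \]
\[ \sup\{\|T_\alpha x\| \mid \alpha \in A,\ x \in U\} \le M'. \]
In particular, there exist some $x_0 \in U$ and some $\delta > 0$ such that $B(x_0, \delta) \subset U$ and
Finally, if $z \in X$, $z \ne 0$, we set
\[ y = \frac{\delta}{2\|z\|} z. \]
\[ \|T_\alpha z\| = \frac{2\|z\|}{\delta} \|T_\alpha y\| \le \frac{4M'}{\delta} \|z\| \]
for every $\alpha \in A$. This implies $\|T_\alpha\|_{L(X,Y)} \le M = 4M'/\delta$ for every $\alpha \in A$, and the
proof is complete.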
Corollary 14.6 Let X, Y be two Banach spaces, and let .{Tn }n be a sequence of
bounded linear operators from X to Y such that for every .x ∈ X, .Tn x converges as
.n → +∞ to a limit T x. Then there results
Proof Statement (a) follows directly from the Uniform Boundedness Theorem, so
there exists .M > 0 such that
\[ (n \in \mathbb{N}) \wedge (x \in X) \Rightarrow \|T_n x\| \le M\|x\|. \]
Taking the limit as $n \to +\infty$ we find $\|Tx\| \le M\|x\|$ for every $x \in X$. Since the
linearity of T follows from the basic algebra of limits, the proof of (b) is complete.
Finally we have
Since y0 ∈ T (B(0, 1)) we have by symmetry that −y0 ∈ T (B(0, 1)). It follows that
\[ \frac{1}{c}\|x\| < \|Tx\| \quad \text{for every } x \in X. \]
This clearly means that $T^{-1}$ is a bounded linear operator. The proof is complete.
Example 14.4 Let X be a Banach space, and let $\|\cdot\|_1$, $\|\cdot\|_2$ be two norms on X.
Suppose that X is complete under both norms, and suppose also that there exists
C ≥ 0 such that
In other words, the two norms are equivalent. Indeed, the identity operator between
$(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ is bounded and bijective. By the Isomorphism Theorem,
its inverse is also bounded, and the existence of the constant c follows.
Theorem 14.10 (Closed Graph Theorem) Let X, Y be Banach spaces, and
T : X → Y be a linear operator. If
G(T ) = {(x, T x) | x ∈ X}
.
ϕx ∗ (x) = f (x)
. for every x ∈ X.
The weak topology $\sigma(X, X^*)$ is the smallest topology on X such that every function
$\varphi_{x^*}$, $x^* \in X^*$, is continuous.
To be more explicit, we are considering here the setting of Definition 13.45 with
Yα = R for every .α, and .A = X∗ . Hence the weak topology on X is the initial
.
\[ V = \left\{ x \in X \mid |x_i^*(x - x_0)| < \varepsilon \text{ for every } i \in I \right\}, \]
with .ai = xi∗ (x0 ) is an open set for .σ (X, X∗ ). Conversely, let U be a neighborhood
of .x0 for .σ (X, X∗ ). By definition of the initial topology, there exists a neighborhood
W of .x0 such that .W ⊂ U and
\[ W = \bigcap\left\{ \varphi_{x_i^*}^{-1}(\omega_i) \mid i \in I \right\}, \]
where I is a finite set and .ωi is a neighborhood of the number .ai = xi∗ (x0 ) in .R.
Therefore there exists .ε > 0 such that .(ai − ε, ai + ε) ⊂ ωi for every .i ∈ I , and this
implies that .x0 ∈ V ⊂ W ⊂ U . The proof is complete.
Corollary 14.7 A net $\{S_n, n \in D\}$ in X converges weakly to a point x if and only if
$x^*(S_n) \to x^*(x)$ for every $x^* \in X^*$.
If we define
\[ U_1 = \{x \in X \mid x^*(x) < \alpha\}, \qquad U_2 = \{x \in X \mid x^*(x) > \alpha\}, \]
then .U1 and .U2 are open sets for .σ (X, X∗ ), and clearly .x1 ∈ U1 , .x2 ∈ U2 , .U1 ∩U2 =
∅.
In a similar way we introduce a new topology on the dual space .X∗ . Let us recall
once more that .X∗ is always topologized by the operator norm
\[ \|x^*\| = \sup\{|x^*(x)| \mid \|x\| \le 1\} \quad \text{for every } x^* \in X^*. \]
We can then consider the bidual space .X∗∗ of X, i.e. the dual of .X∗ , and this space
is endowed with the norm
*x* = sup |x ∗ (x)| x ∗ ∈ X∗ , *x ∗ * ≤ 1
. for every x ∈ X.
*J (x)* = *x*
. for every x ∈ X.
x ∗ ∈ X∗ → x ∗ (x).
.
J (x)(x ∗ ) = x ∗ (x)
. for every x ∈ X and x ∗ ∈ X∗ .
14.6 Weak and Weak* Topologies 313
By direct computation,
\[ \|J(x)\| = \sup\{|J(x)(x^*)| \mid \|x^*\| \le 1\} = \sup\{|x^*(x)| \mid \|x^*\| \le 1\} = \|x\|. \]
Important: Warning
In general, .J : X → X∗∗ is not surjective. If it is, the space X is called reflexive. Be
careful that a space is reflexive if and only if J is a bijective map; different bijective
maps may exist between X and .X∗∗ , but they do not enter into the definition of
reflexivity.
If
\[ U_1^* = \{x^* \in X^* \mid x^*(x) < \alpha\}, \qquad U_2^* = \{x^* \in X^* \mid x^*(x) > \alpha\}, \]
then .U1∗ and .U2∗ are disjoint open sets such that .x1∗ ∈ U1∗ and .x2∗ ∈ U2∗ . The proof is
complete.
Once (i) and (ii) have been established, we deduce that $R(B^*)$ is compact. Hence
$B^*$ must be compact as a homeomorphic copy of $R(B^*)$. It remains to prove that (i)
R(η). The fact that R is a homeomorphism of .B ∗ onto .R(B ∗ ) follows from a direct
comparison of the basic neighborhoods in the weak* topology and in the product
topology. To conclude, we need to prove that $R(B^*)$ is a closed subset of $[-1, 1]^B$
in the product topology.
Let .f : B → [−1, 1] be a point of the closure of .R(B ∗ ) in the product topology.
By definition of the application R, we only need to check that for every .u ∈ B,
.v ∈ B, .λ ∈ R such that .u + v ∈ B and .λu ∈ B, there results
f (u + v) = f (u) + f (v),
. f (λu) = λf (u).
Now, for every $\varepsilon > 0$, the weak* neighborhood consisting of those $g \in [-1, 1]^B$
such that
contains some element .R(ψε ), and since .ψε is linear, we must have
The proof that $|f(\lambda u) - \lambda f(u)| < 2\varepsilon$ is similar, and thus $f \in R(B^*)$. The proof is
complete.
14.7 Isomorphisms
Theorem 14.17 Let X be a Banach space, and let T ∈ L(X, X) be such that
*T * < 1. Then the operator I − T is invertible, and its inverse operator belongs to
L(X, X).
Proof Consider the series
\[ \sum_{n=0}^{\infty} T^n = I + T + T^2 + \cdots + T^n + \cdots, \]
S ◦ (I − T ) = (I − T ) ◦ S = I,
.
and this implies that S is the inverse operator of I − T . The proof is complete.
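In finite dimension the series of Theorem 14.17 can be summed numerically. The following sketch works with 2×2 real matrices stored as nested lists; the sample matrix `T`, the number of terms and all helper names are assumptions chosen for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(T, terms=200):
    """Partial sum I + T + ... + T^(terms-1), approximating (I - T)^(-1)."""
    n = len(T)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, T)       # next power of T
        total = mat_add(total, power)   # accumulate the partial sum
    return total

# Every row of T has absolute sum 0.3 < 1, so the operator norm induced
# by the sup norm on R^2 is smaller than 1.
T = [[0.2, 0.1], [0.0, 0.3]]
S = neumann_inverse(T)
I_minus_T = [[i - t for i, t in zip(ri, rt)] for ri, rt in zip(identity(2), T)]
product = mat_mul(S, I_minus_T)
```

Up to rounding, `product` is the identity matrix, in agreement with the relation $S \circ (I - T) = I$.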
Definition 14.16 If X and Y are Banach spaces, we denote by Iso(X, Y ) the subset
of L(X, Y ) consisting of all T such that T is invertible and the inverse of T belongs
to L(Y, X).
Remark 14.7 Members of Iso(X, Y) are usually called isomorphisms between X
and Y. In abstract algebra the word isomorphism refers to invertible functions which
preserve some prescribed algebraic structure in X and in Y. In our definition we
add a topological condition, i.e. the continuity of T and of its inverse, and we should
look for a less generic word. It must be said that continuity is the minimal
property that we want to preserve besides linearity in Functional Analysis, and for
this reason we force the word isomorphism to include the preservation of continuity.
Theorem 14.18 Let X and Y be Banach spaces.
(a) Iso(X, Y) is an open subset of L(X, Y).
(b) The function $T \mapsto T^{-1}$ of Iso(X, Y) to L(Y, X) is continuous.
Proof Since the empty set is always open, we will assume that Iso(X, Y) ≠ ∅. Pick
$T_0 \in$ Iso(X, Y). For T : X → Y to be an isomorphism it is necessary and sufficient
that $T_0^{-1} \circ T : X \to X$ be an isomorphism. We set $T_0^{-1} \circ T = I - \Lambda$. If we can
3 More precisely, $\bigl\| \sum_{k=m}^{n} T^k \bigr\| \le \sum_{k=m}^{n} \|T^k\| \le \sum_{k=m}^{n} \|T\|^k$, and the conclusion follows from
the completeness of L(X, X).
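ensure that $\|\Lambda\| < 1$, where $\Lambda = I - T_0^{-1} \circ T = T_0^{-1} \circ (T_0 - T)$, then Theorem 14.17 implies that $I - \Lambda$ is an isomorphism. Since $\|\Lambda\| \le \|T_0^{-1}\| \, \|T - T_0\|$, this is the case as soon as
\[ \|T - T_0\| < \frac{1}{\|T_0^{-1}\|}, \]
hence Iso(X, Y) is open. Moreover $T^{-1} = (I - \Lambda)^{-1} \circ T_0^{-1}$, so that $T^{-1} - T_0^{-1} = \bigl((I - \Lambda)^{-1} - I\bigr) \circ T_0^{-1}$.
But $(I - \Lambda)^{-1} = \sum_{n=0}^{\infty} \Lambda^n$, hence $(I - \Lambda)^{-1} - I = \sum_{n=1}^{\infty} \Lambda^n$, and
\[ \bigl\| (I - \Lambda)^{-1} - I \bigr\| \le \sum_{n=1}^{\infty} \|\Lambda\|^n = \frac{\|\Lambda\|}{1 - \|\Lambda\|}. \]
The right-hand side tends to zero as $T \to T_0$, and (b) follows.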
is linear from $X_k$ to Y. Roughly speaking, multilinear means linear in each variable
separately.
14.8 Continuous Multilinear Applications 317
f (λ1 x1 , . . . , λn xn ) = λ1 λ2 · · · λn f (x1 , . . . , xn )
.
is bounded.
Proof Clearly (a) implies (b). Suppose that (b) holds. Since f is continuous at
the origin, the pre-image of the unit ball of Y is an open neighborhood of the
origin in $X_1 \times \cdots \times X_n$. Hence there exists r > 0 such that $\|x_i\| \le r$ for every
i implies $\|f(x_1, \ldots, x_n)\| \le 1$. By homogeneity, $\|x_i\| \le 1$ for every i implies
$\|f(x_1, \ldots, x_n)\| \le r^{-n}$. Thus (c) holds.
Assume now that (c) holds, and let M > 0 be such that $\|f(x_1, \ldots, x_n)\| \le M$
whenever $\|x_i\| \le 1$ for every i. By homogeneity, for every $x_1, \ldots, x_n$,
\[ \|f(x_1, \ldots, x_n)\| \le M \|x_1\| \cdots \|x_n\|. \]
Moreover, by multilinearity,
\[ f(x_1, \ldots, x_n) - f(a_1, \ldots, a_n) = f(x_1 - a_1, x_2, \ldots, x_n) + f(a_1, x_2 - a_2, x_3, \ldots, x_n) + \cdots + f(a_1, \ldots, a_{n-1}, x_n - a_n). \]
Hence
\[ \|f(x_1, \ldots, x_n) - f(a_1, \ldots, a_n)\| \le M \bigl( \|x_1 - a_1\| \|x_2\| \cdots \|x_n\| + \cdots + \|a_1\| \cdots \|a_{n-1}\| \|x_n - a_n\| \bigr). \]
Suppose that $\|x_i - a_i\| \le \varepsilon$ for every i. It follows that $\|x_i\| \le \|a_i\| + \varepsilon$, and there
exists A > 0 such that $\|x_i - a_i\| \le \varepsilon$ for every i implies $\|a_i\| \le A$ and $\|x_i\| \le A$ for every i. We
deduce that
\[ \|f(x_1, \ldots, x_n) - f(a_1, \ldots, a_n)\| \le M A^{n-1} \sum_{i=1}^{n} \|x_i - a_i\| \le n M A^{n-1} \varepsilon \]
whenever $\|x_i - a_i\| \le \varepsilon$ for every i. Since we may choose A > 0 which does
not depend on ε provided that ε > 0 is sufficiently small, the continuity of f at
$(a_1, \ldots, a_n)$ follows.
Definition 14.17 If $X_1, \ldots, X_n$ and Y are Banach spaces, we denote by
\[ L(X_1, \ldots, X_n; Y) \]
the set of all continuous multilinear applications from $X_1 \times \cdots \times X_n$ to Y. For any
$f \in L(X_1, \ldots, X_n; Y)$ we define the norm
\[ \|f\| = \sup \bigl\{ \|f(x_1, \ldots, x_n)\| \bigm| \|x_i\|_{X_i} \le 1 \text{ for } i = 1, \ldots, n \bigr\}. \]
Exercise 14.9 Prove that $L(X_1, \ldots, X_n; Y)$ is a Banach space with respect to the
norm just defined.
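Example 14.5 Consider three Banach spaces X, Y and Z. We define $\varphi : L(Y, Z) \times L(X, Y) \to L(X, Z)$ such that $\varphi(g, f) = g \circ f$. It is (almost) obvious that $\varphi$ is
bilinear. For every $x \in X$ we notice that
\[ \|\varphi(g, f)(x)\| = \|g(f(x))\| \le \|g\| \cdot \|f(x)\| \le \|g\| \cdot \|f\| \cdot \|x\|. \]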
so that $\|f_x\| \le \|f\| \cdot \|x\|$. This inequality proves the continuity of $f_x$, and allows
us to define the application $g : x \in X \mapsto f_x \in L(Y, Z)$. The previous inequality
becomes now
\[ \|g(x)\| \le \|f\| \cdot \|x\|. \]
Therefore $\|g\| \le \|f\|$.
To summarize, to each $f \in L(X, Y; Z)$ we have associated an application
$g : X \to L(Y, Z)$, which we may denote by $\varphi(f)$. It is immediate to check that
$\varphi$ is linear and that $\|\varphi\| \le 1$. Our next step consists in constructing an application $\psi$
which inverts $\varphi$. We start with $g : X \to L(Y, Z)$, and we notice that g associates to
each $x \in X$ a bounded linear application g(x) from Y to Z. Hence
\[ f : (x, y) \in X \times Y \mapsto g(x)(y) \]
is bilinear, and $\|g(x)(y)\| \le \|g(x)\| \, \|y\|$, which implies
\[ \|f(x, y)\| \le \|g\| \, \|x\| \, \|y\|. \]
This shows that $f \in L(X, Y; Z)$ and $\|f\| \le \|g\|$. We summarize as follows: the
application $\psi$ associates to each $g \in L(X, L(Y, Z))$ the application $f \in L(X, Y; Z)$
in such a way that $\|\psi\| \le 1$.
Theorem 14.20 There exists an isometry⁴ between $L(X, Y; Z)$ and $L(X, L(Y, Z))$.
Proof It is clear that $\varphi \circ \psi$ and $\psi \circ \varphi$ are the identities in the corresponding spaces.
In particular the operator norm of $\psi \circ \varphi$ must be equal to one. Hence $1 = \|\psi \circ \varphi\| \le \|\psi\| \cdot \|\varphi\|$, and the fact that $\|\varphi\| \le 1$ and $\|\psi\| \le 1$ implies $\|\psi\| = 1$, $\|\varphi\| = 1$. This
proves that $\varphi$ is a bounded linear operator which preserves norms, and the proof is
complete.
Proof Let us define p : R → R such that $p(\lambda) = \langle \lambda u + v, \lambda u + v \rangle$. From the
definition of the inner product it follows that p(λ) ≥ 0 for every λ. But
Proof Indeed,
Definition 14.18 (Norms in Inner Product Spaces) The norm induced by the
inner product of a space H is defined by
\[ \|u\| = \sqrt{\langle u, u \rangle} \tag{14.2} \]
for every u ∈ H.
14.9 Inner Product Spaces 321
The fact that this is actually a norm on H follows at once from the triangle
inequality and the algebraic properties of the inner product. Unless otherwise stated,
the norm of an inner product space will refer to (14.2).
Theorem 14.23 (Parallelogram Identity) If H is an inner product space, and if
u ∈ H, v ∈ H, then
\[ \left\| \frac{u + v}{2} \right\|^2 + \left\| \frac{u - v}{2} \right\|^2 = \frac{\|u\|^2 + \|v\|^2}{2}. \]
Proof The identity follows from an expansion of the left-hand side according to the
bilinearity properties of the inner product. The details are left as an exercise.
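A quick numerical sanity check of the identity in R^2 may be useful before attempting the exercise. This is only a sketch: the vectors and the helper names are illustrative choices, and the sup norm is included as an example of a norm that is not induced by an inner product.

```python
import math


def euclidean(x):
    """Norm induced by the usual inner product of R^2."""
    return math.sqrt(sum(c * c for c in x))


def sup_norm(x):
    """Maximum norm, which does not come from an inner product."""
    return max(abs(c) for c in x)


def parallelogram_defect(norm, u, v):
    """Left-hand side minus right-hand side of the parallelogram identity."""
    half_sum = [(a + b) / 2 for a, b in zip(u, v)]
    half_diff = [(a - b) / 2 for a, b in zip(u, v)]
    lhs = norm(half_sum) ** 2 + norm(half_diff) ** 2
    rhs = (norm(u) ** 2 + norm(v) ** 2) / 2
    return lhs - rhs


u, v = [1.0, 0.0], [0.0, 1.0]
defect_euclidean = parallelogram_defect(euclidean, u, v)  # zero up to rounding
defect_sup = parallelogram_defect(sup_norm, u, v)         # equals -0.5
```

The nonzero defect for the sup norm shows that the identity is a genuine restriction on the norm, not a consequence of the norm axioms alone.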
Definition 14.19 (Hilbert Spaces) An inner product space H is a Hilbert space if and
only if it is a complete metric space with respect to the distance $d(u, v) = \|u - v\|$
associated to the norm (14.2).
Theorem 14.24 (Projection on Closed Convex Subsets) Let H be a Hilbert
space, and let K be a closed convex subset of H. For every f ∈ H there exists
one and only one element u ∈ K such that
\[ \|f - u\| = \min \{ \|f - v\| \mid v \in K \}. \]
Moreover, u is characterized by the property
\[ u \in K, \qquad \langle f - u, v - u \rangle \le 0 \quad \text{for every } v \in K. \]
Proof Let us set $d = \inf \{ \|f - v\| \mid v \in K \}$, and let $\{v_n\}_n$ be a sequence in K such
that $d_n = \|f - v_n\| \to d$ as n → +∞. The parallelogram identity, applied to $f - v_n$ and $f - v_m$, shows that
\[ \left\| f - \frac{v_n + v_m}{2} \right\|^2 + \left\| \frac{v_n - v_m}{2} \right\|^2 = \frac{d_n^2 + d_m^2}{2} \]
for every n, m. Since K is convex, $\frac{v_n + v_m}{2} \in K$, and therefore $\left\| f - \frac{v_n + v_m}{2} \right\|^2 \ge d^2$.
Hence
\[ \left\| \frac{v_n - v_m}{2} \right\|^2 \le \frac{d_n^2 + d_m^2}{2} - d^2. \]
This shows that $\|v_n - v_m\|^2$ can be made as small as we please by choosing n and
m sufficiently large. In other words, $\{v_n\}_n$ is a Cauchy sequence in H. Since H is a
complete metric space, $v_n \to u$ as n → +∞, and u ∈ K because K is closed. But
then $d = \|f - u\|$, and we have proved the existence of the desired element of K.
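It follows that
\[ \|u - f\|^2 - \|v - f\|^2 = 2 \langle f - u, v - u \rangle - \|u - v\|^2 \le 0 \]
\[ \langle f - u_1, u_2 - u_1 \rangle \le 0 \]
\[ \langle f - u_2, u_1 - u_2 \rangle \le 0 \]
\[ \langle f - u, v - u \rangle \le 0 \quad \text{for every } v \in K \]
\[ \|P_K f_1 - P_K f_2\| \le \|f_1 - f_2\|. \]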
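The projection and the two inequalities displayed above can be tested numerically. The sketch below assumes H = R^2 and takes K to be the closed unit ball, for which the projection has an explicit formula; the data points and function names are illustrative choices, not part of the text.

```python
import math


def project_unit_ball(f):
    """Projection of f onto the closed unit ball of R^2."""
    norm = math.hypot(f[0], f[1])
    if norm <= 1.0:
        return f
    return (f[0] / norm, f[1] / norm)


def inner(a, b):
    return a[0] * b[0] + a[1] * b[1]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


f1, f2 = (3.0, 4.0), (0.5, -2.0)
u1, u2 = project_unit_ball(f1), project_unit_ball(f2)

# Grid points of the unit ball, used to sample the variational inequality.
grid = [(x / 10, y / 10) for x in range(-10, 11) for y in range(-10, 11)
        if x * x + y * y <= 100]
worst = max(inner(sub(f1, u1), sub(v, u1)) for v in grid)  # should be <= 0

dist_projections = math.hypot(*sub(u1, u2))
dist_data = math.hypot(*sub(f1, f2))  # should dominate dist_projections
```

On this sample, `worst` is not positive and the distance between the projections does not exceed the distance between the data, as the characterization and the non-expansiveness of the projection predict.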
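\[ u \in H, \qquad \langle f - u, v \rangle = 0 \quad \text{for every } v \in K. \]
\[ \|u_1 - u_2\|^2 \le \langle f_1 - f_2, u_1 - u_2 \rangle. \]
\[ g = \frac{g_0 - g_1}{\|g_0 - g_1\|}. \]
\[ \lambda = \frac{\varphi(v)}{\varphi(g)}, \qquad w = v - \lambda g. \]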
or
\[ \langle g, v \rangle = \lambda = \frac{\varphi(v)}{\varphi(g)}. \]
\[ d(x_{n+1}, x_n) \le L^n d(x_1, x_0) \]
\[ = d(x_1, x_0) \frac{L^n}{1 - L}. \]
\[ L^N < \frac{\varepsilon (1 - L)}{d(x_1, x_0)}. \]
\[ d(x_m, x_n) \le d(x_1, x_0) \frac{L^n}{1 - L} < \frac{\varepsilon (1 - L)}{d(x_1, x_0)} \cdot \frac{d(x_1, x_0)}{1 - L} = \varepsilon. \]
since T is continuous. This proves the existence of a fixed point for T. Uniqueness
is easy, since $T(z_1) = z_1$ and $T(z_2) = z_2$ imply $d(z_1, z_2) = d(T(z_1), T(z_2)) \le L \, d(z_1, z_2)$, so that $d(z_1, z_2) = 0$ because L < 1. The proof is complete.
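The iteration used in the proof is easy to run. The following sketch assumes the complete metric space [0, 1] with the usual distance and the map T = cos, which sends [0, 1] into itself and is a contraction there because |sin| ≤ sin 1 < 1 on that interval; the function name and tolerance are illustrative.

```python
import math


def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = T(x_n) until two consecutive iterates are close."""
    x = x0
    for n in range(max_iter):
        nxt = T(x)
        if abs(nxt - x) <= tol:
            return nxt, n + 1
        x = nxt
    raise RuntimeError("the iteration did not converge")


z, steps = fixed_point(math.cos, 1.0)  # z solves cos(z) = z
```

The stopping rule relies on the a priori estimate of the proof: once consecutive iterates are within `tol`, the distance to the fixed point is at most `tol * L / (1 - L)`.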
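Theorem 14.28 (Stampacchia) Let H be a Hilbert space, and let a : H × H → R be
a continuous bilinear form. We assume that there exists α > 0 such that
\[ a(u, u) \ge \alpha \|u\|^2 \quad \text{for every } u \in H. \]
Let K be a closed convex subset of H. For every $\varphi \in H^*$ there exists a unique u ∈ K such that
\[ a(u, v - u) \ge \varphi(v - u) \quad \text{for every } v \in K. \]
\[ \langle Au, u \rangle \ge \alpha \|u\|^2 \quad \text{for every } u \in H. \]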
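To complete the proof, it suffices to show that there exists a unique u ∈ K such that
\[ \langle Au, v - u \rangle \ge \langle f, v - u \rangle \quad \text{for every } v \in K. \]
Let ρ > 0 be a number that will be chosen hereafter. We can rewrite our inequality
as
\[ \langle \rho f - \rho Au + u - u, v - u \rangle \le 0 \quad \text{for every } v \in K, \qquad \text{i.e.} \quad u = P_K(\rho f - \rho Au + u). \]
This is a fixed point problem. More precisely, we are looking for a fixed point u ∈ K
of the function $S : v \in K \mapsto P_K(\rho f - \rho Av + v)$. We may now play with the free
parameter ρ > 0 to ensure that S is a strict contraction on the complete metric space
K.
Indeed, we already know that $P_K$ does not increase distances, hence
\[ \|S v_1 - S v_2\| \le \|(v_1 - v_2) - \rho (A v_1 - A v_2)\|, \]
so that
\[ \|S v_1 - S v_2\|^2 \le \|v_1 - v_2\|^2 - 2\rho \langle A v_1 - A v_2, v_1 - v_2 \rangle + \rho^2 \|A v_1 - A v_2\|^2 \le \bigl(1 - 2\rho\alpha + \rho^2 \|A\|^2\bigr) \|v_1 - v_2\|^2. \]
We now choose
\[ 0 < \rho < \frac{2\alpha}{\|A\|_{L(H,H)}^2}, \]
\[ a(g - u, v - u) \le 0 \quad \text{for every } v \in K, \]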
14.10 Linearization in Normed Vector Spaces 327
\[ \frac{1}{2} a(v, v) - \varphi(v) \]
with respect to v ∈ K. The proof is complete.
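The fixed point scheme of the proof can be tried in a small example. The sketch below rests on several assumptions that are not in the text: H = R^2, K is the unit square [0, 1] × [0, 1], the form is a(u, v) = ⟨Au, v⟩ with a non-symmetric matrix `A` satisfying ⟨Au, u⟩ ≥ 1.5‖u‖², and the functional is represented by the vector `f`. The step `rho = 0.2` lies below 2α/‖A‖², which is roughly 0.457 for this matrix.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))


def project_box(v):
    """Projection onto K = [0, 1] x [0, 1]."""
    return (clip(v[0], 0.0, 1.0), clip(v[1], 0.0, 1.0))


A = ((2.0, 1.0), (0.0, 2.0))  # illustrative coercive, non-symmetric matrix
f = (3.0, -1.0)               # illustrative representative of the functional


def apply_A(u):
    return (A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1])


def S(v, rho):
    """One step of the map v -> P_K(rho*f - rho*A v + v)."""
    Av = apply_A(v)
    return project_box((v[0] + rho * (f[0] - Av[0]),
                        v[1] + rho * (f[1] - Av[1])))


def solve(rho=0.2, steps=200):
    u = (0.5, 0.5)
    for _ in range(steps):
        u = S(u, rho)
    return u


u = solve()
Au = apply_A(u)
residual = (Au[0] - f[0], Au[1] - f[1])
grid = [(x / 10, y / 10) for x in range(11) for y in range(11)]
# The variational inequality <Au - f, v - u> >= 0 sampled on a grid of K.
smallest = min(residual[0] * (v[0] - u[0]) + residual[1] * (v[1] - u[1])
               for v in grid)
```

In this example the iteration settles on the corner u = (1, 0) of the square, and the sampled variational inequality holds there.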
Theorem 14.29 (Lax-Milgram) Let H be a Hilbert space, and let a : H × H → R be a
continuous bilinear form. Suppose that there exists α > 0 such that
\[ a(u, u) \ge \alpha \|u\|^2 \]
for every u ∈ H. For every $\varphi \in H^*$ there exists a unique element u ∈ H such that
\[ a(u, v) = \varphi(v) \quad \text{for every } v \in H. \]
Furthermore, if a is symmetric, i.e. a(v, w) = a(w, v) for every v and w, then the
element u is characterized by
\[ u \in H, \qquad \frac{1}{2} a(u, u) - \varphi(u) = \min \left\{ \frac{1}{2} a(v, v) - \varphi(v) \;\middle|\; v \in H \right\}. \]
Proof The conclusion follows from Stampacchia's Theorem and Theorem 14.25.
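In finite dimension the statement reduces to solving a linear system, and the symmetric case to minimizing a quadratic function. The sketch below assumes H = R^2, a(u, v) = ⟨Au, v⟩ with an illustrative symmetric positive definite matrix `A`, and φ(v) = ⟨b, v⟩; all names are hypothetical.

```python
A = ((2.0, 1.0), (1.0, 3.0))  # illustrative symmetric positive definite matrix
b = (1.0, 2.0)                # illustrative representative of the functional


def a(u, v):
    """Bilinear form a(u, v) = <A u, v>."""
    return sum(A[i][j] * u[j] * v[i] for i in range(2) for j in range(2))


def phi(v):
    return b[0] * v[0] + b[1] * v[1]


def energy(v):
    """The quadratic functional (1/2) a(v, v) - phi(v)."""
    return 0.5 * a(v, v) - phi(v)


# Solve A u = b by Cramer's rule, which is a(u, v) = phi(v) for every v.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
u = ((A[1][1] * b[0] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - A[1][0] * b[0]) / det)

trial_points = [(x / 10, y / 10) for x in range(-20, 21) for y in range(-20, 21)]
lowest_trial_energy = min(energy(v) for v in trial_points)
```

No trial point has lower energy than `u`, which is the minimization property stated for symmetric forms.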
Let X and Y be Banach spaces. In the very particular case .X = Y = Rn , the notion
of derivative has been studied in the first part of the book. As we said there, it is
customary to think of the derivative of a function f at a point a as a real number,
defined as
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}, \]
provided that this limit exists in R. In the general case we need to avoid the division
by x − a, which is now a vector.
\[ \lim_{r \to 0^+} \frac{m(r)}{r} = 0. \]
Exercise 14.11 We say that .f1 ∼ f2 if and only if .f1 and .f2 are tangent at a. Prove
that this is an equivalence relation.
Exercise 14.12 Prove that if .f1 and .f2 are tangent at a, then .f1 −f2 is continuous at
a. If, in particular, .f1 is continuous at a, then .f2 is continuous as well, and .f1 (a) =
f2 (a).
Example 14.6 Let g be a linear function X → Y, and let f(x) = g(x − a). It is
clear that f is tangent to zero at a if and only if $\|g\| = 0$, i.e. if and only if g is
identically zero. Indeed,
\[ m(r) = \|g\| \, r. \]
\[ x \mapsto f(x) - f(a) \]
and
\[ x \mapsto g(x - a) \]
\[ f' : U \to L(X, Y), \qquad x \mapsto f'(x). \]
This is the derivative of f. Recalling that L(X, Y) is a Banach space with respect
to the operator norm, the following definition makes sense.
Definition 14.23 The function f : U → Y is of class $C^1(U)$ if and only if
$f' : U \to L(X, Y)$ is a continuous function.
5
\[ \frac{1}{M} \|x\| \le \|x\|_1 \le M \|x\| \]
for every x ∈ X.
Exercise 14.14 Prove that equivalent norms induce the same topology on X. Hint:
any ball for the first norm contains and is contained in a ball for the second norm.
Hence both norms produce the same open neighborhoods.
Theorem 14.31 Suppose that $\| \cdot \|_1$ is an equivalent norm on X and that $\| \cdot \|_2$ is
an equivalent norm on Y. A function f : U → Y is differentiable at a ∈ U with
respect to the norms $\| \cdot \|_1$ and $\| \cdot \|_2$ if and only if it is differentiable with respect to
the original norms.
Proof Since there is a perfect symmetry between the old and the new norms, it is
enough to prove that the differentiability with respect to the original norms implies
the differentiability with respect to the equivalent norms. Assume therefore that
there exists g ∈ L(X, Y) such that
\[ \lim_{x \to a} \frac{\|f(x) - f(a) - g(x - a)\|}{\|x - a\|} = 0. \]
By assumption
\[ \frac{1}{\|x - a\|_1} \le M \frac{1}{\|x - a\|}. \]
Hence
\[ \frac{\|f(x) - f(a) - g(x - a)\|_2}{\|x - a\|_1} \le M \cdot M' \, \frac{\|f(x) - f(a) - g(x - a)\|}{\|x - a\|}. \]
Proof By assumption
+ g % (f (a))(ϕ(x − a))
+ ψ(f (x) − f (a)).
We need to prove that the second and the third line satisfy
The second estimate follows from the fact that $\|\psi(f(x) - f(a))\| = o(\|f(x) - f(a)\|)$ and that $\|f(x) - f(a)\| \le 2 \|f'(a)\| \cdot \|x - a\|$ holds as long as x is sufficiently
close to a. The proof is complete.
Example 14.7 If U is open in X and if f : U → Y is the restriction of a continuous
linear application, then f is differentiable and $f'(x) = f$ for every x ∈ U. Indeed,
$f(x) - f(a) = f(x - a) + 0$ by linearity, and the conclusion follows.
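Theorem 14.33 Let $\varphi$ : Iso(X, Y) → Iso(Y, X) be such that $\varphi(u) = u^{-1}$ for every
u. Then $\varphi \in C^1(\mathrm{Iso}(X, Y))$, and
\[ \varphi'(u) h = -u^{-1} \circ h \circ u^{-1} \quad \text{for every } h \in L(X, Y). \]
Proof We already know that Iso(X, Y) is open in L(X, Y). We can also consider $\varphi$
as a map into L(Y, X). Let us fix u ∈ Iso(X, Y) and h ∈ L(X, Y) so small that u + h ∈ Iso(X, Y). We have
\[ \varphi(u + h) - \varphi(u) = (u + h)^{-1} - u^{-1} = -(u + h)^{-1} \circ h \circ u^{-1}. \]
It suffices to prove that the difference between $(u + h)^{-1} \circ h \circ u^{-1}$ and $u^{-1} \circ h \circ u^{-1}$
is $o(\|h\|)$. Now,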
whence
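We have proved above that $\varphi'(u) = \psi(u^{-1}, u^{-1})$. The function $(v, w) \mapsto \psi(v, w)$
is bilinear from L(Y, X) × L(Y, X) to L(L(X, Y), L(Y, X)). Furthermore it is
continuous, since
\[ \|\psi(v, w) h\| \le \|v\| \, \|h\| \, \|w\|. \]
Hence
\[ \|\psi(v, w)\| = \sup_{h \in L(X, Y) \setminus \{0\}} \frac{\|\psi(v, w) h\|}{\|h\|} \le \|v\| \, \|w\|. \]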
Indeed, we write
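It suffices to prove that $\|f(h, k)\| = o(\|(h, k)\|)$ as $\|(h, k)\| \to 0$. By definition,
$\|(h, k)\| = \|h\| + \|k\|$, so that
\[ \|f(h, k)\| \le \|f\| \, \|h\| \, \|k\| \le \|f\| \, (\|h\| + \|k\|)^2. \]
Since it is evident that $(\|h\| + \|k\|)^2 = o(\|h\| + \|k\|)$ as $\|(h, k)\| \to 0$, the proof is
complete.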
Exercise 14.16 Generalize the previous example to a bounded multilinear application
$f : X_1 \times \cdots \times X_n \to Z$, where the norm in $X_1 \times \cdots \times X_n$ is defined by
\[ \lim_{\varepsilon \to 0} \frac{F(a + \varepsilon h) - F(a)}{\varepsilon} = g(h) \tag{14.4} \]
for every h ∈ X. If it exists, the application g is unique, and it is denoted by $D_G F(a)$
or by $d_G F(a)$.⁶
⁶ The symbol $F_G'(a)$ is also used, but it may be confused with the Fréchet derivative of a function
called $F_G$. We prefer to avoid such a symbol.
Remark 14.9 The limit which defines the Gâteaux derivative is a limit of a function
of one real variable .ε (which takes values in Y ).
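Since only one real variable ε is involved, the difference quotient in (14.4) can be evaluated directly. The sketch below assumes X = R^2, Y = R and an illustrative polynomial `F`; the point `a`, the direction `h` and the helper name are hypothetical choices.

```python
def F(p):
    """Illustrative smooth function on R^2."""
    x, y = p
    return x * x * y + y ** 3


def gateaux_quotient(F, a, h, eps):
    """Difference quotient (F(a + eps*h) - F(a)) / eps from (14.4)."""
    shifted = (a[0] + eps * h[0], a[1] + eps * h[1])
    return (F(shifted) - F(a)) / eps


a, h = (1.0, 2.0), (0.5, -1.0)
# Value predicted by the gradient (2xy, x^2 + 3y^2) applied to h: 2 - 13 = -11.
exact = 2 * a[0] * a[1] * h[0] + (a[0] ** 2 + 3 * a[1] ** 2) * h[1]
quotients = [gateaux_quotient(F, a, h, eps) for eps in (1e-2, 1e-4, 1e-6)]
```

The entries of `quotients` approach `exact` as ε decreases, with an error roughly proportional to ε.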
Proof We assume that F(u) ≠ F(v), otherwise the proof is trivial. By Corollary 14.3
there exists $\psi \in Y^*$ such that $\|\psi\| = 1$ and $\psi(F(u) - F(v)) = \|F(u) - F(v)\|$. We consider
\[ h(t) = \psi(F(\gamma(t))). \]
Since .h : [0, 1] → R, Theorem 9.8 applies and there exists .ϑ ∈ (0, 1) such that
Since .*ψ* = 1 and .ϑu + (1 − ϑ)v = w ∈ [u, v], the proof is complete.
Remark 14.10 The previous Mean Value Inequality is actually a one-dimensional
result, as the proof clearly shows. We have reduced the infinite-dimensional function
F to the function h of one real variable. However, the most interesting technique of
the proof consists in composing F with .ψ ∈ Y ∗ . This is a standard trick to project
the range of the function to .R, so that the basic Lagrange Theorem may be applied.
The simple example
\[ \vartheta \in [0, 1] \mapsto e^{2\pi i \vartheta} \in \mathbb{C} \]
shows that the Lagrange Theorem does not hold for vector-valued functions.
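The counterexample can be checked by a short computation: the increment of the function over [0, 1] vanishes, while its derivative has constant modulus 2π and therefore never vanishes. The sketch below only evaluates these two quantities; the function names are illustrative.

```python
import cmath
import math


def f(theta):
    """The curve theta -> exp(2*pi*i*theta)."""
    return cmath.exp(2j * math.pi * theta)


def df(theta):
    """Its derivative, 2*pi*i*exp(2*pi*i*theta)."""
    return 2j * math.pi * cmath.exp(2j * math.pi * theta)


increment = f(1.0) - f(0.0)  # zero up to rounding
min_speed = min(abs(df(k / 1000)) for k in range(1001))  # close to 2*pi
```

An identity of the form f(1) − f(0) = f′(ϑ) is therefore impossible, whereas the Mean Value Inequality is trivially satisfied.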
A basic use of Theorem 14.34 is explained in the next regularity result.
Theorem 14.35 Suppose that F : U → Y is G-differentiable in U, and let
\[ D_G F : U \to L(X, Y), \qquad u \mapsto D_G F(u) \]
be continuous (with the standard topologies of each space) at the point $u^*$. Then F
is F-differentiable at $u^*$, and $F'(u^*) = D_G F(u^*)$.
Proof We only need to show that F is F-differentiable at $u^*$, since we already know
that the F-derivative must then coincide with the G-derivative. For every h ∈ X, we
define
\[ R(h) = F(u^* + h) - F(u^*) - D_G F(u^*) h. \]
We need to prove that $R(h) = o(\|h\|)$ as $\|h\| \to 0$. For ε > 0 sufficiently small, the
function R is G-differentiable in B(0, ε), and the Chain Rule yields
\[ D_G R(h) k = D_G F(u^* + h) k - D_G F(u^*) k. \]
But R(0) = 0,
hence the Mean Value Inequality gives
\[ \|R(h)\| \le \sup_{0 \le t \le 1} \|D_G F(u^* + t h) - D_G F(u^*)\| \, \|h\|. \]
The continuity of $D_G F$ at $u^*$ comes now into play for the first time, and yields
\[ \sup_{0 \le t \le 1} \|D_G F(u^* + t h) - D_G F(u^*)\| \to 0 \quad \text{as } \|h\| \to 0. \]
In other words, we have proved that $\|R(h)\| \le o(1) \|h\|$ as $\|h\| \to 0$, and the proof
is complete.
Remark 14.11 The previous result offers a convenient tool for checking the Fréchet-
differentiability of a function. Since the Gâteaux derivative is just a limit in one
real variable, it is usually easier to compute. Then one hopes that the G-derivative
depends continuously on the point at which it is evaluated. Of course this is only
a sufficient condition for Fréchet-differentiability, and in many situations the only
possible approach to F-differentiability is via the basic definition.
For functions of a real variable there is no need to distinguish the nature of the
first derivative and of the second derivative, since at each step we go back to some
suitable function of a real variable. The situation changes drastically for real-valued
functions of two or more variables: if the first derivative is usually defined as a
14.11 Derivatives of Higher Order 337
suitable vector, the second derivative is a matrix. Something is lurking behind the
very rich structure of .Rn , and in this section we want to investigate this situation in
a general setting.
Suppose that .F ∈ C(U, Y ) is differentiable (in the sense of Fréchet) at all points
of the open set .U ⊂ X. We consider .F % : U → L(X, Y ).
Definition 14.26 Let $u^* \in U$. We say that F is twice F-differentiable at $u^*$ if and
only if $F'$ is differentiable at $u^*$. The second F-derivative of F at $u^*$ is then
\[ F''(u^*)(h, k) \]
instead of⁷
\[ F''(u^*)(h)(k). \]
⁷ This notation is formally correct but awful. It means that $F''(u^*)(h)$ is a bounded linear operator
which associates to k ∈ X the element $F''(u^*)(h)(k) \in Y$. Indeed, $F''(u^*)$ associates to each
h ∈ X an element $F''(u^*)(h) \in L(X, Y)$.
When n = 2, $F''(u)$ is independent of u. Compare this with the basic fact that “the
second derivative of $x \mapsto x^2$ is constant.” Can you extend the results of this exercise
to the case n ∈ N, n ≥ 2?
The actual computation of the second derivative can be performed by “fixing”
the first increment h, and differentiating once more with respect to u. The precise
statement is as follows.
Theorem 14.36 Suppose that F : U → Y is twice differentiable at $u^* \in U$. For all
h ∈ X, the function $F_h : X \to Y$ such that $F_h(u) = F'(u) h$ is differentiable at $u^*$
and
\[ F''(u)(h, k) = F''(u)(k, h) \quad \text{for every } h \in X, \ k \in X. \]
Proof Let ε > 0 be given. For any h and k in B(0, ε), we define
\[ \psi(h, k) = F(u + h + k) - F(u + h) - F(u + k) + F(u) \]
\[ \gamma_h(\xi) = F(u + h + \xi) - F(u + \xi). \]
\[ = \sup_{0 \le t \le 1} \|F'(u + h + t k) - F'(u + t k) - F''(u)(h)\| \cdot \|k\|. \]
Now,
and
\[ \le \varepsilon (\|h\| + \|k\|) \|k\| \]
\[ \le \varepsilon (\|k\| + \|h\|) \|h\|. \]
\[ \le 3 \varepsilon (\|k\|^2 + \|h\|^2). \]
By homogeneity, the last inequality remains true for every h and k in X. Since ε > 0
was arbitrary, $F''(u)(h, k) = F''(u)(k, h)$, and the proof is complete.
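The second difference ψ(h, k) used in the proof is also a practical way to approximate the second derivative. The sketch below assumes X = R^2, Y = R and an illustrative polynomial `F`; it compares ψ(th, tk)/t² with the mixed partial derivative for small t. All names are hypothetical.

```python
def F(p):
    """Illustrative smooth function on R^2."""
    x, y = p
    return x * x * y + y ** 3


def add(p, q):
    return (p[0] + q[0], p[1] + q[1])


def second_difference(F, u, h, k):
    """psi(h, k) = F(u + h + k) - F(u + h) - F(u + k) + F(u)."""
    return F(add(add(u, h), k)) - F(add(u, h)) - F(add(u, k)) + F(u)


u = (1.0, 2.0)
t = 1e-3
h, k = (t, 0.0), (0.0, t)  # increments along the two coordinate axes

ratio_hk = second_difference(F, u, h, k) / t ** 2
ratio_kh = second_difference(F, u, k, h) / t ** 2
mixed_partial = 2 * u[0]  # d^2 F / dx dy = 2x for this F
```

Both ratios are close to `mixed_partial`, in agreement with the symmetry of the second derivative.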
The definition of higher order derivatives is now clear. If .n ∈ N, the function
F : U → Y is n-times differentiable at .u∗ ∈ U if and only if .D n−1 F is differentiable
.
A very peculiar aspect of differential calculus in Banach spaces is that every function
is actually considered as a function of a single vector variable. When we study
Calculus in Several Variables, the fact that any point of .Rn is an n-tuple of real
numbers plays a fundamental rôle. This is one of the many places where the
algebraic structure of .Rn conflicts with its geometric structure, and things become
unexpectedly obscure. Indeed, our definition of the Fréchet derivative applies very
well to the case .X = Rn , and no need to distinguish variables appears.
There are however situations in which the domain of a function is given as
a cartesian product of two Banach spaces, and then partial derivatives become a
natural idea. This is the object of this section.
Suppose that X, Y and Z are Banach spaces, and let .(u∗ , v ∗ ) ∈ X × Y be a point.
Two applications can be naturally defined as follows:
σv ∗ (u) = (u, v ∗ )
.
Since these derivatives are independent of .u∗ , .v ∗ , u and v, we will denote them by
.σ and .τ , respectively.
∂1 F (u∗ , v ∗ ).
.
∂2 F (u∗ , v ∗ ).
.
14.13 The Taylor Formula 341
Remark 14.12 Of course u and v are dummy variables, and we prefer to avoid the
popular notation .∂u F , .∂v F to denote partial derivatives. What really matters here is
just the position of the variable with respect to which we are differentiating, and our
notation reflects this fact.
Exercise 14.19 Retain the setting of the previous definition. Prove that the definition
of partial derivatives is equivalent to requiring that there exist $A_u \in L(X, Z)$
and $A_v \in L(Y, Z)$ such that
Hence partial derivatives are just evaluations of the Fréchet derivative, provided that F
is F-differentiable.
\[ F(u) = F(p) + \frac{1}{1!} DF(p)(u - p) + \frac{1}{2!} D^2 F(p)(u - p, u - p) + \cdots + \frac{1}{(m-1)!} D^{m-1} F(p)(u - p, \ldots, u - p) + \frac{1}{(m-1)!} \int_0^1 (1 - t)^{m-1} D^m F(p + t(u - p))(u - p, \ldots, u - p) \, dt. \]
\[ \frac{1}{(m-1)!} \int_0^1 (1 - t)^{m-1} D^m F(p + t(u - p))(u - p, \ldots, u - p) \, dt = \frac{1}{m!} D^m F(p)(u - p, \ldots, u - p). \]
Proof The idea is to reduce to a function of a single variable. Let $\sigma(t) = p + t(u - p)$
for every t ∈ [0, 1]. If $g = F \circ \sigma$, then $g \in C^m([0, 1], Y)$ and
\[ g^{(\ell)}(t) = D^\ell F(\sigma(t))(u - p, \ldots, u - p) \]
for every positive integer ℓ ≤ m. Hence the usual Taylor formula with integral
remainder yields
\[ g(1) = \sum_{\ell=0}^{m-1} \frac{g^{(\ell)}(0)}{\ell!} 1^\ell + \frac{1}{(m-1)!} \int_0^1 (1 - \tau)^{m-1} g^{(m)}(\tau) \, d\tau. \]
F (u) = v,
.
8It is a matter of fact that these two results are indeed equivalent, so that they could be seen as
different flavors of the same result.
14.14 The Inverse and the Implicit Function Theorems 343
We begin with local inversion of a function around suitable points. We will always
deal with continuous functions F : X → Y between Banach spaces. We recall that
Iso(X, Y) denotes the set of bounded linear operators from X to Y which
such that
(a) $F \in \mathrm{Iso}(U, V)$
(b) $F^{-1} \in C^1(V, X)$ and for every v ∈ V there results $(F^{-1})'(v) = (F'(u))^{-1}$,
where $u = F^{-1}(v)$.
(c) If $F \in C^k(X, Y)$ for some k > 1, then $F^{-1} \in C^k(V, X)$
Proof A few reductions may be convenient. Firstly, we assume that $u^* = 0$ and
$v^* = 0$: the general case follows by composition with two translations. If we define
\[ F = I + \Phi, \]
\[ \le \frac{1}{2} \|p - q\|. \]
This shows that $\Phi$ is a contraction and that $\|\Phi(p)\| \le (1/2) \|p\|$ whenever $\|p\| < r$.
Now, for every v ∈ X we introduce the auxiliary function
\[ \Phi_v(u) = v - \Phi(u). \]
The function $\Phi_v$ is also a contraction, and for every u ∈ B(0, r) and v ∈ B(0, r/2)
there results
\[ \|\Phi_v(u)\| \le \|v\| + \|\Phi(u)\| \le \frac{r}{2} + \frac{r}{2} = r. \]
As a consequence, whenever $\|v\| \le r/2$, $\Phi_v$ maps B(0, r) into itself and is a
contraction. It follows that $\Phi_v$ possesses a unique fixed point u ∈ B(0, r) which
satisfies $u = \Phi_v(u)$, i.e.
\[ u = v - \Phi(u). \]
\[ u + \Phi(u) = v, \qquad w + \Phi(w) = z. \]
\[ \|u - w\| \le \|v - z\| + \|\Phi(u) - \Phi(w)\| \le \|v - z\| + \frac{1}{2} \|u - w\|. \]
Hence $\|F^{-1}(v) - F^{-1}(z)\| \le 2 \|v - z\|$, and $F^{-1}$ is actually Lipschitz continuous.
Letting
\[ V = B\left(0, \frac{r}{2}\right), \qquad U = B(0, r) \cap F^{-1}(V), \]
we see that $F|_U \in \mathrm{Iso}(U, V)$. This proves (a).
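The contraction argument behind (a) is constructive, and a one-dimensional sketch may help. Writing Φ for the nonlinear part of F as in the displays above, the assumptions below are illustrative: X = Y = R, Φ(u) = u²/2, so that Φ(0) = 0, Φ′(0) = 0 and |Φ′| ≤ 1/2 on the ball of radius r = 1/2. For |v| ≤ r/2 the iteration u ↦ v − Φ(u) should then converge to the local inverse of F at v.

```python
import math


def Phi(u):
    """Illustrative nonlinear part with Phi(0) = 0 and Phi'(0) = 0."""
    return 0.5 * u * u


def F(u):
    """F = I + Phi, a small perturbation of the identity near 0."""
    return u + Phi(u)


def local_inverse(v, iterations=60):
    """Fixed point of u -> v - Phi(u), i.e. the solution of F(u) = v near 0."""
    u = 0.0
    for _ in range(iterations):
        u = v - Phi(u)
    return u


v = 0.2
u = local_inverse(v)
exact = -1.0 + math.sqrt(1.0 + 2.0 * v)  # root of u + u^2/2 = v near 0
```

For this particular Φ the equation F(u) = v is quadratic, so the iterate can be compared with the closed-form root `exact`.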
To prove (b), we set again $u = F^{-1}(v)$ and from $v = u + \Phi(u)$ we derive
$F^{-1}(v) = v - \Phi(F^{-1}(v))$. But $\Phi(u) = o(\|u\|)$ as $\|u\| \to 0$ and $F^{-1}$ is Lipschitz
continuous, hence $\Phi(F^{-1}(v)) = o(\|v\|)$ and $F^{-1}$ is differentiable at v = 0 with
$(F^{-1})'(0) = I$. To treat the general case, we pick v ∈ B(0, r/2) and $u = F^{-1}(v)$;
we then translate both u and v to the origin of X, and we find that $(F^{-1})'(v) = (F'(u))^{-1}$. The application
\[ (F')^{-1} : v \mapsto (F'(F^{-1}(v)))^{-1} \]
can be factored as
F : × X → Y.
.
The strategy of the proof is to reduce the conclusion to the setting of Theo-
rem 14.39. This is the technical content of the following Lemma.
Lemma 14.3 Let .(λ∗ , u∗ ) ∈ × U . Suppose that
1. F is continuous, .∂2 F exists in .×U , and .∂2 F : ×U → L(X, Y ) is continuous;
2. .∂2 F (λ∗ , u∗ ) is a bounded linear invertible operator from X to Y .
Then the application . : × U → T × Y such that
A = ∂1 F (λ∗ , u∗ ),
. B = ∂2 F (λ∗ , u∗ ).
The local inverse . os . is of the form .(λ, v) = (λ, ϕ(v)) for some function
ϕ : " × V → X defined in an open neighborhood ." × V of .(λ∗ , F (λ∗ , u∗ )) and
.
satisfying .F (λ, ϕ(λ, v)) = v for every .λ ∈ ". This follows at once from the fact
that the first component of . is the identity in .. An easy induction argument shows
also that .ϕ ∈ C k provided that .F ∈ C k .
We define .g(λ) = ϕ(λ, 0) for every .λ ∈ ". It follows that
for every .λ ∈ ". Hence (a) follows, and (b) follows from the fact that . is injective.
Differentiating the identity $F(\lambda, \varphi(\lambda, v)) = v$ yields
\[ \partial_1 F + \partial_2 F \circ \partial_1 \varphi = 0, \qquad \partial_2 F \circ \partial_2 \varphi = I. \]
The nature of Theorem 14.39 is strongly local, since the assumption on the
derivative at a single point does not allow us to prove that the given function is
globally injective. It is then natural to investigate whether a global Inverse Function
Theorem may exist at all. In this section we first present a very classical result in
this direction, whose proof is instructive in its own right. As a further improvement, we
prove a strong result which requires some definitions of Algebraic Topology.
Theorem 14.41 (Hadamard-Caccioppoli) Let X and Y be Banach spaces, and let
$F \in C^1(X, Y)$ be such that $F'(x)^{-1} \in L(Y, X)$ for all x ∈ X. If there exist constants
A > 0 and B > 0 such that
\[ \|F'(x)^{-1}\| \le A \|x\| + B \quad \text{for all } x \in X, \]
14.15 A Global Inverse Function Theorem 347
It is clear that .0 ∈ S, and Theorem 14.39 shows that S is an open set, since
(∂2 F̃ )−1 (t, x) = F % (x)−1 ∈ L(Y, X). We claim that S is closed, and this implies
.
F̃ (t, xt ) = 0
. for every t ∈ (a, b).
This follows from the Implicit Function Theorem. Differentiating both sides we get
\[ F'(x_t) \frac{dx_t}{dt} = y - F(x_0). \]
Therefore
\[ \left\| \frac{dx_t}{dt} \right\| \le \|F'(x_t)^{-1}\| \, \|y - F(x_0)\| \le (A \|x_t\| + B) \|y - F(x_0)\|. \tag{14.5} \]
for every .t > c. It follows easily10 that there exists a constant .C > 0 such that
9 Recall that this means that both F and $F^{-1}$ are differentiable with continuous derivative.
10 This is often called Gronwall’s inequality.
x(0) = x0
.
x(1) = x1
f (x(s)) = y for every s ∈ [0, 1].
This will immediately contradict the local invertibility of F . We set .I = [0, 1] and
T : I × C0 (I, X) → C0 (I, Y ) such that
.
where
S = {t ∈ I | T (t, u) = 0 is solvable}
.
\[ F'(\gamma(s) + u_t(s)) \frac{du_t}{dt} = y - F(\gamma(s)). \]
As before we obtain
\[ \left\| \frac{du_t}{dt} \right\|_{C_0(I, X)} \le \bigl( A \|u_t\|_{C_0(I, X)} + B_1 \bigr) \|y - F \circ \gamma\|_{C_0(I, Y)}, \]
and .γ (1) = y.
Definition 14.31 A topological space X is simply connected if and only if it is
path-connected and for every continuous .p : [0, 1] → X, .q : [0, 1] → X such that
.p(0) = q(0), .p(1) = q(1), there exists a continuous .F : [0, 1] × [0, 1] → X such
The proof is rather long, and requires some preliminaries. In the rest of this
section, X, Y and Z will denote Hausdorff spaces.
Definition 14.33 Let .f : X → Y be a local homeomorphism, and let .p : Z → Y
be a continuous function. A continuous function .p̃ : Z → X is a lifting of p by f if
and only if .f ◦ p̃ = p.
Proposition 14.1 Let f : X → Y be a local homeomorphism, and let p : Z → Y
be a continuous function. If Z is connected and if $\tilde p_1$, $\tilde p_2$ : Z → X are both liftings
of p by f, then either $\tilde p_1 = \tilde p_2$, or $\tilde p_1(z) \ne \tilde p_2(z)$ for every z ∈ Z.
Proof Let $C = \{ z \in Z \mid \tilde p_1(z) = \tilde p_2(z) \}$. We first claim that C is open in Z. If
C = ∅, there is nothing to prove; otherwise we consider $z_0 \in C$ and let $x_0 = \tilde p_1(z_0) = \tilde p_2(z_0)$. Moreover, let U and V be open neighborhoods of $x_0$ and $f(x_0)$ such
that f : U → V has a continuous inverse g : V → U. The set $W = \tilde p_1^{-1}(U) \cap \tilde p_2^{-1}(U)$ is an open neighborhood of $z_0$ and there results $\tilde p_1|_W = \tilde p_2|_W = g \circ p|_W$.
Thus W ⊂ C and C is an open set.
By a very easy argument the complement Z \ C is also open, hence either C = ∅ or C = Z by
connectedness. The proof is complete.
Definition 14.34 A local homeomorphism f : X → Y lifts the paths if and only
if, for every continuous α : [0, 1] → Y such that α(0) ∈ f(X) and for every $x_0 \in f^{-1}(\{\alpha(0)\})$, there exists a lifting $\tilde\alpha$ : [0, 1] → X of α such that $\tilde\alpha(0) = x_0$.
Exercise 14.21 Prove that there exists at most one lifting .α̃ as described in the
previous Definition. Hint: use Proposition 14.1.
Definition 14.35 Any continuous function H : Z × [0, 1] → Y is a homotopy
with base $H_0 : Z \to Y$ such that $z \mapsto H(z, 0)$. A function f : X → Y lifts the
homotopies if, for every homotopy H and any continuous function $\tilde H_0 : Z \to X$
such that $f \circ \tilde H_0 = H_0$, there exists a continuous lifting $\tilde H$ with base $\tilde H_0$, i.e. $f \circ \tilde H = H$
and $\tilde H(z, 0) = \tilde H_0(z)$ for all z ∈ Z.
Proposition 14.2 If a local homeomorphism .f : X → Y between Hausdorff spaces
lifts the paths, then it lifts the homotopies.
Proof Let .t → H̃ (z, t) be the unique lifting of the path .t → H (z, t) with origin
.H̃0 (z), for any .z ∈ Z. It is clear that .f ◦ H̃ = H , and that .H̃ (z, 0) = H̃0 (z). Since
.z ∈ Z is arbitrary, we have thus defined .H̃ : Z × [0, 1] → X. We claim that .H̃ is
continuous.
Let $z_0 \in Z$ and let
\[ D = \bigl\{ t \in [0, 1] \bigm| \tilde H \text{ is not continuous at } (z_0, t) \bigr\}. \]
These two functions agree on W × {b}, hence for every z ∈ W the functions
\[ t \mapsto \tilde H(z, t) \]
\[ t \mapsto (f|_U)^{-1} \circ H(z, t) \]
know that there exists a unique lifting $\tilde\alpha$ of α such that $\tilde\alpha(0) = x_0$. The identity $f \circ \tilde\alpha = \alpha$
yields $f(\tilde\alpha(1)) = y$. Since y ∈ Y is arbitrary, f is surjective.
To prove that f is injective, let .x0 and .x1 be two distinct points of X such that
.f (x0 ) = f (x1 ) = y0 . Since X is path-connected, we can consider a continuous
It follows from the definitions that a constant path11 is lifted to a constant path.
In particular .h̃(0, s) = σ (0) = x0 , .h̃(1, s) = σ (1) = x1 for every .s ∈ [0, 1].
Exploiting the fact that .t → h̃(t, 1) is also constant, we have
x0 = h̃(0, 1) = h̃(1, 1) = x1 .
.
11 This refers to any continuous function .t → α(t) such that .α(t) does not depend on .t ∈ [0, 1].
\[ \omega_\varphi = \bigcap \bigl\{ \overline{\varphi([t, b))} \bigm| 0 \le t < b \bigr\}. \]
Proof We assume that .J = [0, a] for some .0 < a < 1. At the point .f (φ(a)) the
function f has a local inverse, and we can extend .φ to a lifting defined on some
larger interval. This contradicts the maximality of .φ, and therefore the maximal
lifting must be defined on an interval .[0, b) with .0 < b < 1.
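To prove the second statement, we assume that $x_0 \in \omega_\varphi$. Then $f(x_0) = \alpha(b)$
since f is continuous, and thus $f(\overline{\varphi([t, b))}) \subset \overline{f(\varphi([t, b)))}$ and
\[ \bigcap \bigl\{ \overline{f(\varphi([t, b)))} \bigm| 0 \le t < b \bigr\} = \bigcap \{ \alpha([t, b]) \mid 0 \le t < b \} = \{ \alpha(b) \}. \]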
Pick open neighborhoods U and V of .x0 and .f (x0 ) respectively, such that
.f |U : U → V is a homeomorphism, and let g be the inverse of .f |U . We select
.a ∈ [0, b) such that .α([a, b]) ⊂ V and such that .φ(a) ∈ U . We then define
g ◦ α|(a, b]. Again we reach a contradiction with the maximality of .φ, and the proof
is complete.
Proof (of Theorem 14.42) Since (a) trivially implies (b), we only prove that (b)
implies (a). Let f be a proper function. If we can prove that f lifts the paths, an
application of Proposition 14.3 gives the desired result.
We argue by contradiction, assuming the existence of a continuous path
.α : [0, 1] → Y
and of a point .x0 ∈ f −1 (α(0)) such that the maximal lifting .φ of .α with .φ(0) = x0
is defined on a proper subset .[0, b) of .[0, 1]. Then we know that .ωφ = ∅. But
14.16 Critical and Almost Critical Points 353
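$\varphi([0, b)) \subset f^{-1}(\alpha([0, 1]))$ and the latter set is compact since α([0, 1]) is compact
in Y and f is proper.
Every finite collection of closed sets $\{ \overline{\varphi([t_i, b))} \}_i$ has non-empty intersection,
then
\[ \omega_\varphi = \bigcap \bigl\{ \overline{\varphi([t, b))} \bigm| 0 \le t < b \bigr\} \ne \emptyset, \]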
Every (or almost every) student should remember the definition of critical point for a
function f : R → R: we say that $x_0$ is a critical point of f if and only if $f'(x_0) = 0$
(which implies that f must be differentiable at $x_0$). What can we do for functions
between normed vector spaces?
Definition 14.38 Let X, Y be Banach spaces, and let U be an open subset of X. A
point .u ∈ U is a critical point of a function .F : U → Y if and only if .DF (u) ∈
L(X, Y ) is not surjective.
Example 14.9 If .Y = R, a critical point is just a point .u ∈ U such that .DF (u) =
0 ∈ X∗ , i.e.
DF (u)(v) = 0
. for every v ∈ X.
(u) ≤ inf + ε.
.
M
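is a partial order on M. We call $u_0 = u$ and we suppose that $u_n$ is known. Let
\[ S_n = \{ w \in M \mid w \le u_n \}. \]
\[ \Phi(u_{n+1}) \le \inf_{S_n} \Phi + \frac{1}{n+1}. \]
Since $u_{n+1} \le u_n$ we see that $S_{n+1} \subset S_n$. Moreover $S_n$ is a closed subset since $\Phi$ is
lower semicontinuous. Now, for every $w \in S_{n+1}$ we have $w \le u_{n+1} \le u_n$, so that
\[ \varepsilon \, d(w, u_{n+1}) \le \Phi(u_{n+1}) - \Phi(w) \le \inf_{S_n} \Phi + \frac{1}{n+1} - \inf_{S_n} \Phi = \frac{1}{n+1}. \]
As a consequence,
\[ \operatorname{diam} S_{n+1} = \sup \{ d(w_1, w_2) \mid w_1 \in S_{n+1}, \ w_2 \in S_{n+1} \} \le \frac{2}{\varepsilon (n+1)}. \]
and
\[ d(u, v) \le \frac{\Phi(u) - \Phi(v)}{\varepsilon} \le \frac{\inf_M \Phi + \varepsilon - \inf_M \Phi}{\varepsilon} = 1. \]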
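$$\varphi(u) \le \inf_X \varphi + \varepsilon,$$
$$\varphi(v) \le \varphi(u)$$
$$\|u - v\| \le \sqrt{\varepsilon}$$
$$\|D\varphi(v)\| \le \sqrt{\varepsilon}.$$
$$\varphi(v_n) \le \varphi(u_n)$$
$$\|u_n - v_n\| \to 0$$
$$\|D\varphi(v_n)\| \to 0.$$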
Proof We define
$$\frac{1}{2}\, d(v, T(v)) \ge \varphi(v) - \varphi(T(v)).$$
Hence $d(v, T(v)) \le 0$.
Exercise 14.24 Let $(M, d)$ be a complete metric space, $T\colon M \to M$ such that
there exists $L \in [0, 1)$ such that $d(T(u), T(v)) \le L\, d(u, v)$ for every $u \in M$ and
$v \in M$. Use the previous exercise to show that T has a fixed point. Hint: try to
choose a constant $c > 0$ such that $\varphi(u) = c\, d(u, T(u))$ satisfies the conditions of
the previous exercise.
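A numerical sketch of the hint may help. It rests on assumptions that are not in the text: the space is the real line with the usual distance, the contraction is the illustrative map T(x) = cos(x)/2 (so L = 1/2), the candidate constant is c = 1/(1 − L), and the condition required by the previous exercise is taken to be the Caristi-type inequality d(u, T(u)) ≤ ϕ(u) − ϕ(T(u)). The code only tests that inequality at sample points and iterates T; it proves nothing.

```python
import math

L = 0.5  # contraction constant of the illustrative map T below


def T(x: float) -> float:
    """Illustrative contraction on R: |T(x) - T(y)| <= 0.5 * |x - y|."""
    return 0.5 * math.cos(x)


def d(x: float, y: float) -> float:
    """Usual distance on the real line."""
    return abs(x - y)


def phi(u: float, c: float = 1.0 / (1.0 - L)) -> float:
    """Candidate function phi(u) = c * d(u, T(u)) suggested by the hint."""
    return c * d(u, T(u))


def caristi_holds(u: float, tol: float = 1e-12) -> bool:
    """Check d(u, T(u)) <= phi(u) - phi(T(u)), up to rounding error."""
    return d(u, T(u)) <= phi(u) - phi(T(u)) + tol


def iterate(x0: float, steps: int = 100) -> float:
    """Apply T repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = T(x)
    return x


samples = [-10.0 + 0.5 * k for k in range(41)]
all_ok = all(caristi_holds(u) for u in samples)
fixed = iterate(3.0)
print(all_ok, fixed, d(fixed, T(fixed)))
```

With this choice of c, the inequality follows from d(T(u), T(T(u))) ≤ L d(u, T(u)); other constants may work as well.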
At this point, the natural question is: does the sequence $\{v_n\}_n$ of Corollary 14.8
converge?
The answer is negative, in general. This is the reason why the following definition
was introduced.
Definition 14.39 Let X be a Banach space, $F\colon X \to \mathbb{R}$ be a differentiable function.
The function F satisfies the Palais-Smale condition ((PS) for short) if and only if
every sequence $\{u_n\}_n$ in X such that $\{F(u_n)\}_n$ is bounded and $\|DF(u_n)\| \to 0$ as
$n \to +\infty$ has a convergent subsequence.
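To see how the condition can fail, here is a small numerical illustration. The example is an assumption chosen for this sketch, not taken from the text: X = ℝ and F(x) = exp(x), with the sequence u_n = −n. The values F(u_n) stay bounded and the derivatives tend to zero, yet distinct terms stay at distance at least 1, so no subsequence can converge.

```python
import math


def F(x: float) -> float:
    """Illustrative function on R; it has no critical point."""
    return math.exp(x)


def DF(x: float) -> float:
    """Derivative of F."""
    return math.exp(x)


# Candidate Palais-Smale sequence u_n = -n for n = 1, ..., 30.
u = [-float(n) for n in range(1, 31)]
values = [F(x) for x in u]
derivatives = [abs(DF(x)) for x in u]

# Smallest distance between two distinct terms of the sequence.
min_gap = min(abs(a - b) for i, a in enumerate(u) for b in u[i + 1:])

print(max(values), derivatives[-1], min_gap)
```

The printed numbers suggest that this F does not satisfy (PS): the sequence is of Palais-Smale type but escapes to −∞.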
14.17 Problems
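14.1 Let $(X, \|\cdot\|)$ be a normed space, and let $f\colon X \to \mathbb{R}$ be the function such that
$f(u) = \|u\|$ for every $u \in X$. Prove that the Gâteaux derivative of f at u = 0 does
not exist.
14.2 Let $\sigma\colon \mathbb{R}^n \setminus \{0\} \to \mathbb{R}^n$ be the function such that $\sigma(x) = x/\|x\|$ for every
$x \in \mathbb{R}^n \setminus \{0\}$. Prove that
$$D\sigma(x)\colon h \in \mathbb{R}^n \mapsto -\frac{1}{\|x\|^3} \langle x \mid h \rangle\, x + \frac{h}{\|x\|},$$
where $\langle \cdot \mid \cdot \rangle$ denotes the usual inner product of $\mathbb{R}^n$ and $\|\cdot\|$ the associated Euclidean
norm.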
$$p f(x) = \sum_{k=1}^{n} x_k\, \partial_k f(x) \quad \text{for every } x \in C.$$
$$\lim_{n \to +\infty} \frac{F(u + t_n v_n) - F(u)}{t_n} = Av$$
for all $v \in X$, for all sequences $\{t_n\}_n$ in $\mathbb{R} \setminus \{0\}$ such that $t_n \to 0$ and for all sequences
$\{v_n\}_n$ in X such that $v_n \to v$. Similarly, F is weakly Hadamard-differentiable at
$u \in X$ if and only if there exists $A \in L(X, Y)$ such that
$$\lim_{n \to +\infty} \Lambda\!\left( \frac{F(u + t_n v_n) - F(u)}{t_n} \right) = \Lambda(Av)$$
for all $v \in X$ and all $\Lambda \in Y^*$, for all sequences $\{t_n\}_n$ in $\mathbb{R} \setminus \{0\}$ such that $t_n \to 0$ and
for all sequences $\{v_n\}_n$ in X such that $v_n \rightharpoonup v$. Prove the following statements:
1. F is Fréchet-differentiable at u ∈ X if and only if there exists A ∈ L(X, Y ) such
that
$$\lim_{t \to 0} \frac{F(u + tv) - F(u)}{t} = Av$$
$$\lim_{t \to 0} \Lambda\!\left( \frac{F(u + tv) - F(u)}{t} \right) = \Lambda(Av)$$
$$F(t) = \begin{cases} t\, z(\cdot + t^{-1}) & \text{for } t \ne 0 \\ 0 & \text{for } t = 0. \end{cases}$$
14.18 Comments
References
Abstract The standard approach to the Lebesgue integral is via measure theory:
we must define a set function—called a measure—on a set of suitable sets—called
measurable sets, then we can define measurable functions, and finally integrable
functions. The main advantage of this approach is that at the end we have the
highest generality. On the other hand, such a construction requires a good amount
of mathematical education before it can be understood.
Let us present the basic construction of the Riemann integral in $\mathbb{R}^n$. We will see that
much the same ideas that we developed in $\mathbb{R}$ can be adapted. We start with some
notation.
Definition 15.1 (Cell) An n-cell is a cartesian product
$$B = [a_1, b_1] \times \cdots \times [a_n, b_n]$$
$$\operatorname{Vol}(B) = \prod_{k=1}^{n} (b_k - a_k).$$
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 361
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_15
362 15 A Functional Approach to Lebesgue Integration Theory
$$M_k = \sup \{ f(x) \mid x \in B_k \}$$
$$m_k = \inf \{ f(x) \mid x \in B_k \},$$
$$U(P, f) = \sum_{k=1}^{p} M_k \operatorname{Vol}(B_k)$$
$$L(P, f) = \sum_{k=1}^{p} m_k \operatorname{Vol}(B_k),$$
$$L(P, f) \le L(P', f), \qquad U(P', f) \le U(P, f).$$
$$L(P_1, f) \le U(P_2, f).$$
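The behaviour of upper and lower sums under refinement can be checked on a small example. The sketch below makes its own assumptions: the cell is the unit square, the function is f(x, y) = xy, and the partitions are uniform grids. Since this f is non-decreasing in each variable on the square, its supremum and infimum on a sub-cell are attained at the upper-right and lower-left corners, which is what the code uses.

```python
from typing import Tuple


def darboux_sums(n: int) -> Tuple[float, float]:
    """Lower and upper sums of f(x, y) = x * y on [0, 1]^2 for an n-by-n grid.

    On each sub-cell [x0, x1] x [y0, y1] the infimum of f is x0 * y0 and
    the supremum is x1 * y1, because f is monotone in each variable there.
    """
    step = 1.0 / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        for j in range(n):
            x0, x1 = i * step, (i + 1) * step
            y0, y1 = j * step, (j + 1) * step
            vol = (x1 - x0) * (y1 - y0)
            lower += x0 * y0 * vol
            upper += x1 * y1 * vol
    return lower, upper


lower10, upper10 = darboux_sums(10)
lower20, upper20 = darboux_sums(20)  # the 20-grid refines the 10-grid
print(lower10, lower20, upper20, upper10)
```

The four numbers should appear in non-decreasing order and bracket the value 1/4 of the integral of xy over the unit square.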
The number
$$\underline{\int_B} f = \overline{\int_B} f.$$
$$\int_B f,$$
The theory of the Riemann integral can now be developed as it was developed in
$\mathbb{R}$, but this is not our aim. We are going to construct a different integral, called the
Lebesgue integral.
I (αh + βk) = αI h + βI k.
.
I hp < ε,
.
Definition 15.6 Let P (x) be a logical statement depending on the free variable x.
If
. {x ∈ X | P (x)}
is a set of full measure, we say that the property P holds for almost every x ∈ X, or
that P holds almost everywhere in X.
Example 15.1 Let $\{h_p\}_p$ be a sequence of elementary functions. We say that $h_p \to 0$
almost everywhere on X if $h_p(x) \to 0$ (as $p \to +\infty$) for every x in a subset of
full measure of X.
Here is a first application of our definitions. The reader is invited to compare the
next statement with the third property of the elementary integral.
Proposition 15.1 Suppose that a sequence $\{h_p\}_p$ of non-negative elementary
functions is non-increasing and converges to zero almost everywhere. Then
$$\lim_{p \to +\infty} I h_p = 0.$$
Let Z be the subset of X on which the sequence $\{h_p\}_p$ does not converge to zero: by
assumption, Z is a null set. If $\varepsilon > 0$, there exists a non-decreasing sequence $\{k_p\}_p$
of elementary functions such that
$$I k_p < \frac{\varepsilon}{M_1}$$
$$\lim_{p \to +\infty} I h_p \ge 0, \qquad \lim_{p \to +\infty} I k_p \le \frac{\varepsilon}{M_1}$$
and therefore
$$\lim_{p \to +\infty} I h_p - M_1 \lim_{p \to +\infty} I k_p \le 0.$$
15.4 The Class L+ 365
But then
$$0 \le \lim_{p \to +\infty} I h_p \le M_1 \lim_{p \to +\infty} I k_p \le M_1 \cdot \frac{\varepsilon}{M_1} = \varepsilon.$$
is non-decreasing, $\sup_p h_p \ge 1$ on $\bigcup_{n=1}^{\infty} Z_n$, and
$$I h_p \le \sum_{n=1}^{p} I h_{n,p} \le \varepsilon.$$
Remark 15.1 Quite often the arrows $\nearrow$ and $\searrow$ mean monotone convergence at
every point. Be careful, since we will always use them with almost everywhere
convergence.
$$C = \sup_p I h_p \in \mathbb{R}.$$
Proof Let Z be the set of all $x \in X$ such that $f(x) = +\infty$. Replacing $h_p$ with
$h_p - h_1$, we may assume that each $h_p$ is non-negative. Discarding a set of measure
zero, we may then assume that $\{h_p\}_p$ is non-decreasing and converges to $+\infty$ on
the whole set Z.
Pick $\varepsilon > 0$ and $x \in Z$; then the inequality
$$h_p(x) \ge \frac{C}{\varepsilon}$$
must hold from some value of p onwards. Hence Z is covered by the countable
family of sets
$$\left\{ x \;\middle|\; h_p(x) \ge \frac{C}{\varepsilon} \right\}, \quad p \in \mathbb{N}.$$
Hence
$$\sup_p \frac{\varepsilon h_p(x)}{C} \ge 1$$
and
$$I\!\left( \frac{\varepsilon h_p}{C} \right) = \frac{\varepsilon}{C}\, I h_p \le \varepsilon.$$
$$\lim_{p \to +\infty} I h_p = \lim_{p \to +\infty} I k_p.$$
$$n \mapsto h_m - k_n.$$
$$I (h_m - k_n)^+ \searrow 0.$$
$$n \mapsto I (h_m - k_n) = I h_m - I k_n$$
$$I f = \lim_{p \to +\infty} I h_p,$$
$$\sup_p I h_p \in \mathbb{R}$$
$$\sup_p I k_p \in \mathbb{R}.$$
$$I(f + g) = If + Ig.$$
$$I(\alpha f) = \alpha\, If.$$
$$If = \lim_{n \to +\infty} I f_n.$$
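$$\begin{array}{ll}
h_{11} \le \ldots \le h_{1n} \le \ldots, & h_{1n} \nearrow f_1 \\
h_{21} \le \ldots \le h_{2n} \le \ldots, & h_{2n} \nearrow f_2 \\
\cdots & \\
h_{k1} \le \ldots \le h_{kn} \le \ldots, & h_{kn} \nearrow f_k \\
\cdots &
\end{array}$$
It is clear that $h_n$ is an elementary function, and that $h_n \le h_{n+1}$ for every n. Since
$h_n \le \max \{f_1, \ldots, f_n\} = f_n$, we see that
$$I f^* = \lim_{n \to +\infty} I h_n.$$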
$$If = \lim_{n \to +\infty} \sum_{k=1}^{n} I g_k = \sum_{k=1}^{\infty} I g_k.$$
$$n \mapsto \sum_{k=1}^{n} g_k$$
We can now complete the last step of our construction. We have seen that the
class $L^+$ is not closed under multiplication by negative numbers, so that we cannot
subtract elements of $L^+$. This is a gap we need to fill by enlarging the class $L^+$.
Definition 15.9 The class $L = L(X)$ of integrable functions on X is the set of all
functions $\varphi$ on X which can be represented almost everywhere as $\varphi = f - g$, for
some $f \in L^+$ and $g \in L^+$.
It is evident from the previous discussion that L enjoys the following properties:
1. if $\varphi_1 = f_1 - g_1$ and $\varphi_2 = f_2 - g_2$ are elements of L, then $\varphi_1 + \varphi_2 = (f_1 + f_2) - (g_1 + g_2)$,
and therefore $\varphi_1 + \varphi_2 \in L$.
2. If $\varphi = f - g$ and $\alpha \in \mathbb{R}$, then $\alpha\varphi \in L$. Indeed, if $\alpha \ge 0$, then $\alpha\varphi = \alpha f - \alpha g$
and $\alpha f$, $\alpha g$ belong to $L^+$. If $\alpha < 0$, then $-\alpha > 0$ and $\alpha\varphi = (-\alpha) g - (-\alpha) f$,
and it follows again that $\alpha\varphi \in L$.
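3. If $\varphi \in L$, then $\varphi^+$, $\varphi^-$ and $|\varphi|$ belong to L. Indeed, from $\varphi = f - g$ it follows
that
$$|\varphi| = \max\{f, g\} - \min\{f, g\}$$
belongs to L. Then
$$\varphi^+ = \frac{|\varphi| + \varphi}{2}, \qquad \varphi^- = \frac{|\varphi| - \varphi}{2}$$
also belong to L by linearity.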
We propose a formal definition, which will immediately be justified.
Definition 15.10 The integral $I\varphi$ of a function $\varphi \in L$ is defined as
$$I\varphi = If - Ig,$$
where $\varphi = f - g$, $f \in L^+$ and $g \in L^+$.
Proposition 15.3 The integral of $\varphi \in L$ is independent of the representation $\varphi = f - g$.
Proof Let $\varphi = f - g = f_1 - g_1$, for suitable functions f, $f_1$, g and $g_1$ in $L^+$. Since
$f + g_1 = g + f_1$, we have $I(f + g_1) = I(g + f_1)$, or $If + Ig_1 = Ig + If_1$. This
is equivalent to
$$I(f - g) = I(f_1 - g_1),$$
Important: Notation
We have carefully avoided the use of the integral symbol $\int$ for our abstract integral.
The use of $\int_X \varphi$ is clearly possible (and often useful), but our choice of the letter
I highlights the functional nature of our integral. In other words, in our approach
the integral is a linear operator which acts on the vector space $L(X)$, and we prefer
to encourage this viewpoint.
When we were dealing with sequences and series of functions, we realized quite
easily that pointwise convergence of the integrands does not imply the convergence
of the integrals. Uniform convergence was a successful replacement, but the
weakness of the Riemann integral with respect to limits remains a matter of fact.
We want to convince the reader that the Lebesgue (abstract) integral I is much more
flexible. We begin with a technical lemma.
Lemma 15.1 Any $\varphi \in L$ admits a representation with the following property: for
every $\varepsilon > 0$, there exist $f \in L^+$ and $g \in L^+$ such that $\varphi = f - g$, $g \ge 0$ and
$Ig < \varepsilon$.
$$\varphi = f - g = (f - h_n) - (g - h_n) = f_n - g_n.$$
Now $f_n \in L^+$ since $f_n = f - h_n$ is the sum of two elements of $L^+$, and similarly
$g_n = g - h_n \in L^+$. Clearly $g_n \ge 0$ for n sufficiently large, and $Ig_n < \varepsilon$ because
$\lim_{n \to +\infty} Ig_n = 0$.
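Theorem 15.3 (Beppo Levi) Assume that $\varphi_k \in L$, $\varphi_k \ge 0$ for every $k \in \mathbb{N}$, and
$$I\!\left( \sum_{k=1}^{n} \varphi_k \right) \le C, \quad n \in \mathbb{N}$$
for some suitable constant C. Then $\varphi = \sum_{k=1}^{\infty} \varphi_k$ belongs to L, and
$$I\varphi = \sum_{k=1}^{\infty} I\varphi_k.$$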
Proof We use Lemma 15.1 to decompose $\varphi_k = f_k - g_k$, where $f_k$, $g_k \in L^+$, $g_k \ge 0$
and $Ig_k < 2^{-k}$, for every $k = 1, 2, \ldots$. Since $\varphi_k \ge 0$, it turns out that $f_k \ge 0$. It is
easy to check that Theorem 15.2 applies, hence $g = \sum_{k=1}^{\infty} g_k$ belongs to $L^+$ and
$$Ig = \sum_{k=1}^{\infty} Ig_k.$$
The same conclusion holds also for $f_k$, and then $f = \sum_{k=1}^{\infty} f_k \in L^+$, $If = \sum_{k=1}^{\infty} If_k$.
Putting everything together we see that
$$\varphi = \sum_{k=1}^{\infty} \varphi_k = \sum_{k=1}^{\infty} f_k - \sum_{k=1}^{\infty} g_k = f - g \in L$$
and
$$I\varphi = If - Ig = \sum_{k=1}^{\infty} I(f_k - g_k) = \sum_{k=1}^{\infty} I\varphi_k.$$
Recalling that sequences and series are the same mathematical object, we deduce
the following statement.
Theorem 15.4 If $\psi_n \in L$ for $n = 1, 2, \ldots$, $\psi_n \nearrow \psi$ and $I\psi_n \le C$ for every n,
then $\psi \in L$ and $I\psi = \lim_{n \to +\infty} I\psi_n$.
Proof We set $\varphi_1 = \psi_1$, $\varphi_n = \psi_n - \psi_{n-1}$, and the conclusion follows from Theorem
15.3.
A very useful consequence of Beppo Levi’s Convergence Theorem is a sort of
characterization of nonnegative functions whose integral vanishes.
Exercise 15.6 Prove that the integral of a function $\varphi \in L$ such that $\varphi = 0$ almost
everywhere is zero.
The converse implication is contained in the next result.
Theorem 15.5 Let $\varphi_0 \in L$ be a non-negative function such that $I\varphi_0 = 0$. Then
$\varphi_0 = 0$ almost everywhere in X.
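Corollary 15.2 If $Z \subset X$, suppose that for every $\varepsilon > 0$ there exists a sequence of
integrable functions
such that $I\varphi_{\varepsilon,n} < \varepsilon$ for each n and $\sup_n \varphi_{\varepsilon,n} \ge 1$ on Z. Then Z is a set of measure
zero.
Proof Of course there is nothing to prove if each $\varphi_{\varepsilon,n}$ is an elementary function. In
the general case, let $\varphi_\varepsilon = \lim_{n \to +\infty} \varphi_{\varepsilon,n}$. By Beppo Levi’s Theorem, $\varphi_\varepsilon$ is integrable
and
$$I\varphi_\varepsilon = \lim_{n \to +\infty} I\varphi_{\varepsilon,n} \le \varepsilon.$$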
ψ1 = ϕ1
.
ψ2 = min {ϕ1 , ϕ2 }
...
ψn = min {ϕ1 , ϕ2 , . . . , ϕn }
...
Each function $\psi_m$ is non-negative and $\psi_m \ge 1$ on the set Z. Furthermore $\psi_1 \ge \psi_2 \ge \psi_3 \ge \ldots$ and
$$I\psi_m \le I\varphi_{1/m} \le \frac{1}{m}.$$
Setting $\psi = \lim_{m \to +\infty} \psi_m$, we see that $\psi \in L$ and $I\psi = \lim_{m \to +\infty} I\psi_m = 0$. The
limit $\psi$ is non-negative, and $\psi \ge 1$ on Z. Hence the set $Z' = \{x \in X \mid \psi(x) > 0\}$
has measure zero. But then Z has measure zero, because $Z \subset Z'$.
The most powerful Convergence Theorem for our integral is due to Lebesgue.
Theorem 15.6 (Dominated Convergence Theorem) Suppose that .ϕn ∈ L for
every n, .ϕn → ϕ almost everywhere, and there exists a function .ϕ0 ∈ L such
that
we have
. I (|ϕn |) ≤ C.
Theorem 15.7 (Fatou’s Lemma) Suppose that $\varphi_n \in L$, $\varphi_n \ge 0$, $\varphi_n \to \varphi$ almost
everywhere, and for some constant $C \ge 0$ we have $I(|\varphi_n|) \le C$ for every n. Then
$\varphi \in L$ and
$$0 \le I\varphi \le C.$$
Proof We define $\chi_n = \inf \{\varphi_n, \varphi_{n+1}, \ldots\}$, observing that $\chi_n \le \chi_{n+1}$ and $\chi_n \to \varphi$
almost everywhere. Furthermore $\chi_n \le \varphi_n$, $I\chi_n \le I\varphi_n \le C$, and Beppo Levi’s
Theorem for sequences implies that $\varphi \in L$. Since $I\chi_n \nearrow I\varphi$, it follows in particular
that $0 \le I\varphi = \lim_{n \to +\infty} I\chi_n \le C$.
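A classical example shows that the inequality in Fatou’s Lemma can be strict, and that $I\varphi$ need not be the limit of the numbers $I\varphi_n$. The sketch below assumes the concrete setting of the interval [0, 1] with the usual integral and the step functions $\varphi_n = n\,\chi_{(0,1/n)}$, which are chosen for illustration and do not come from the text. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def phi(n: int, x: Fraction) -> Fraction:
    """Step function phi_n = n on the open interval (0, 1/n), and 0 elsewhere."""
    return Fraction(n) if 0 < x < Fraction(1, n) else Fraction(0)


def integral_phi(n: int) -> Fraction:
    """Integral of phi_n over [0, 1]: height n times length 1/n."""
    return Fraction(n) * Fraction(1, n)


# Every phi_n has integral 1, so the constant C = 1 works in Fatou's Lemma.
integrals = [integral_phi(n) for n in range(1, 51)]

# At a fixed point x > 0 the values phi_n(x) vanish as soon as n >= 1/x.
x = Fraction(1, 7)
tail_values = [phi(n, x) for n in range(7, 51)]

print(set(integrals), set(tail_values))
```

Here the pointwise limit is the zero function, whose integral 0 is strictly smaller than the common value 1 of the integrals $I\varphi_n$.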
∞
where the real coefficients .cnp are chosen so that the series . ∞ n=1 p=1 cnp
converges. By Beppo Levi’s Theorem, .ϕ0 is integrable, and .ϕ0 (x) > 0 whenever
.fn (x) > 0. It follows that f is also the limit of the sequence
According to Theorem 15.9, we need to show that the measurability of .fn implies
the measurability of .gn . But
and each function .min {fm (x), nϕ0 (x)} is measurable and bounded by the integrable
function .nϕ0 . The Dominated Convergence Theorem then yields the integrability of
.gn , and the proof is complete.
Corollary 15.3 If $\{f_n\}_n$ is a sequence of measurable functions, then $\inf_n f_n$ and
$\sup_n f_n$ are measurable functions. If each $f_n$ is finite almost everywhere, then
$\liminf_{n \to +\infty} f_n$ and $\limsup_{n \to +\infty} f_n$ are measurable.
Once measurable functions have been defined, measurable sets do not come as a
surprise.
Definition 15.12 A subset $E \subset X$ is measurable if and only if its characteristic
function $\chi_E$ is a measurable function.
We recall that
$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \in X \setminus E. \end{cases}$$
Definition 15.13 A subset E of X has finite measure if and only if $\chi_E \in L(X)$. In
this case, the measure of E is the number
$$\mu(E) = I\chi_E.$$
If $\chi_E$ is not integrable, we set $\mu(E) = +\infty$. By convention, we set $\mu(\emptyset) = 0$.
15.7 Measurable Functions and Measurable Sets 377
Exercise 15.8
1. Prove that a measurable subset of a set of finite measure is a set of finite measure.
2. Prove that any subset of a set of measure zero is a set of measure zero.
Theorem 15.11 The union, intersection and difference of two measurable sets are
measurable sets.
Proof Let E, F be measurable sets. The conclusion follows from the identities
The previous result can be generalized to countable unions as follows.
Theorem 15.12 If each set $E_n$, $n \in \mathbb{N}$, is measurable, then $E = \bigcup_{n=1}^{\infty} E_n$ is
measurable. Moreover, if $E_i \cap E_j = \emptyset$ whenever $i \ne j$, then
$$\mu(E) = \sum_{n=1}^{\infty} \mu(E_n). \tag{15.3}$$
Proof If some .En has infinite measure, then .µ(E) = +∞, and there is nothing to
prove. Otherwise, we write
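$$E = E_1 \cup (E_2 \setminus E_1) \cup (E_3 \setminus E_2) \cup \ldots$$
$$\mu(E) = \lim_{n \to +\infty} \sum_{k=1}^{n} \mu(E_k \setminus E_{k-1}) = \lim_{n \to +\infty} \mu(E_n),$$
$$E_1 = F \cup (E_1 \setminus E_2) \cup (E_2 \setminus E_3) \cup \ldots$$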
and apply (a) again. The term .µ(E1 ) cancels since it is finite, and the proof is
complete.
Up to now, all our integrals have been computed “on the whole space X.” In
many applications it would be convenient to integrate functions over subsets of X.
Although we already have the best possible candidate, i.e. measurable subsets, a
problem arises: consider indeed .µ(X), the measure of the whole space. There is no
need for X to be a measurable set; but even if this were the case, .µ(X) should be
15.8 Integration Over Measurable Sets 379
$$\varphi_0(x) = \sum_{n=1}^{\infty} \frac{1}{n^2} \frac{h_n(x)}{I h_n}$$
does the job, where $\{h_n\}_n$ is the sequence considered in Axiom (d).
Theorem 15.13 The function $\chi_X$ is measurable, so that X is a measurable set. In
particular, the set $X \setminus E$ is measurable for every measurable set $E \subset X$.
Proof Actually, $1 = \lim_{n \to +\infty} \min \{1, n\varphi_0\}$, and Axiom (c) ensures that $\min \{1, n\varphi_0\}$ is
measurable.
Corollary 15.5 If $\varphi$ is a measurable function and a, b, c are real numbers, then
$$\lim_{n \to +\infty} \varphi_{n,c}(x) = \begin{cases} 0 & \text{if } \varphi(x) \le c \\ 1 & \text{if } \varphi(x) > c. \end{cases}$$
Hence $\chi_{[\varphi > c]}$ is the limit of measurable functions, and this implies the measurability
of $[\varphi > c]$.
Conversely, suppose that $[\varphi > c]$ is measurable for every c. Then
is also measurable for every $c < d$. Given $n \in \mathbb{N}$, we consider the function $\varphi_n$ equal
to $k/n$ on the measurable sets
$$E_{k,n} = \left\{ x \in X \;\middle|\; \frac{k}{n} < \varphi(x) \le \frac{k+1}{n} \right\},$$
$k \in \mathbb{Z}$. The function $\varphi_n$ is defined almost everywhere, and differs from $\varphi$ by at most
$1/n$. Moreover
$$\varphi_n = \sum_{k=-\infty}^{\infty} \frac{k}{n}\, \chi_{E_{k,n}},$$
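$$\{x \in X \mid f(x) g(x) > c\} = \bigcup \left\{ \left\{ x \in X \;\middle|\; f(x) > r,\ g(x) > \frac{c}{r} \right\} \;\middle|\; r \in \mathbb{Q},\ r > 0 \right\},$$
and the conclusion follows from the fact that $\mathbb{Q}$ is a countable set.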
Corollary 15.6 If f is a measurable function and E is a measurable set, then $f\chi_E$
is a measurable function. In particular, if f is integrable and E is measurable, then
$f\chi_E$ is integrable.
$$\int_E \varphi \, d\mu = I(\varphi\chi_E).$$
At this point, we will freely write $\int_X \varphi \, d\mu$ for $I\varphi$, as a particular case. Of
course this is just a customary piece of notation that adds no content to the
formal $I(\varphi\chi_E)$.
$$\int_E |\varphi| \, d\mu \le M\mu(E).$$
Proposition 15.7 If $\{E_n \mid n \in \mathbb{N}\}$ is a countable family of mutually disjoint
measurable sets, and if $\varphi$ is integrable (resp. measurable) on $E = \bigcup_{n \in \mathbb{N}} E_n$, then
$\varphi$ is integrable (resp. measurable) on each $E_n$. If $\varphi$ is integrable on E, then
$$\int_E \varphi \, d\mu = \sum_{n=1}^{\infty} \int_{E_n} \varphi \, d\mu.$$
and each function on the right-hand side is measurable. By Theorem 15.10 the
function .χE ϕ is measurable.
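Theorem 15.15 (Absolute Continuity of the Integral) Suppose that $\varphi$ is an
integrable function on X. For every $\varepsilon > 0$ there exists $\delta > 0$ such that if E is a
measurable set and $\mu(E) < \delta$, then
$$\left| \int_E \varphi \, d\mu \right| < \varepsilon.$$
$$\le \int_E \bigl| |\varphi| - h \bigr| \, d\mu + \int_E h \, d\mu \le \frac{\varepsilon}{2} + M\delta < \varepsilon$$
whenever E is a measurable set with $\mu(E) < \delta$.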
Definition 15.15 Let $B = B_1 \cup B_2 \cup \cdots \cup B_n$ be a partition of the basic n-cell B, and
we suppose that the different sub-cells $B_j$ do not have interior points in common.¹
Any function h such that
$$h(x) = \begin{cases} h_1 & \text{if } x \in B_1 \\ h_2 & \text{if } x \in B_2 \\ \;\vdots & \\ h_n & \text{if } x \in B_n \end{cases}$$
for suitable real numbers $h_1, \ldots, h_n$ is a step function. The collection of all step
functions on B is denoted by $H(B)$.
Remark 15.3 The previous definition is somewhat troublesome, since h might be
defined in different ways on the layer $B_i \cap B_j$, for $i \ne j$. This is irrelevant to us, and
we might even leave step functions undefined on the interface layers of the partition.
As we will see in a moment, their values on such layers play no role at all in our
construction.
Definition 15.16 Let $h \in H(B)$ be a step function, and consider the sets
$$B_j = \{ x \in B \mid h(x) = h_j \}, \quad j = 1, \ldots, n.$$
The integral of h is
$$Ih = \sum_{j=1}^{n} h_j \operatorname{Vol}(B_j).$$
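The integral of a step function is a finite sum, so it is easy to compute by hand or by machine. The following sketch is only an illustration: the representation of a cell as a list of intervals, the helper names `vol` and `step_integral`, and the sample data are assumptions made here.

```python
from math import prod
from typing import List, Tuple

Cell = List[Tuple[float, float]]  # one interval (a_k, b_k) per coordinate


def vol(cell: Cell) -> float:
    """Volume of the cell [a_1, b_1] x ... x [a_n, b_n]."""
    return prod(b - a for a, b in cell)


def step_integral(pieces: List[Tuple[float, Cell]]) -> float:
    """Sum of h_j * Vol(B_j) over the pieces (h_j, B_j) of a step function."""
    return sum(h * vol(cell) for h, cell in pieces)


# The cell [0, 1] x [0, 2] split into two sub-cells that share only a face.
pieces = [
    (3.0, [(0.0, 1.0), (0.0, 1.0)]),   # h = 3 on [0, 1] x [0, 1]
    (-1.0, [(0.0, 1.0), (1.0, 2.0)]),  # h = -1 on [0, 1] x [1, 2]
]
total = step_integral(pieces)
print(total)
```

The values of h on the common face of the two sub-cells never enter the computation, which reflects the fact that they play no role in the construction.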
1Geometrically, we are assuming here that the cells .Bj can touch each other only at their
boundaries.
corresponds an integer $m = m(x_0)$ such that $f_m(x_0) < \varepsilon$. By continuity, there
exists a neighborhood $U(x_0)$ of $x_0$ such that $f_m(x) < \varepsilon$ for every $x \in U(x_0)$.
By monotonicity, if $p > m$ then $f_p(x) \le f_m(x) < \varepsilon$ for every $x \in U(x_0)$.
As $x_0$ ranges over K, the neighborhoods $U(x_0)$ form an open cover of K: let
$\{U(x_1), U(x_2), \ldots, U(x_\nu)\}$ be a finite subcover of the compact set K. If q denotes
the largest of the indices $m(x_1), \ldots, m(x_\nu)$ of the functions which participate in this subcover, we see that
$f_r(x) < \varepsilon$ for every $x \in K$ and every $r > q$, and the proof is complete.
As before, we can now turn on the engine of our abstract extension machine, and
obtain a space .L̃(B) of integrable functions and a corresponding integral .I˜. We
now show that .L̃ = L and .I˜ = I , so that the concrete Lebesgue integral can be
equivalently defined in two ways.
Theorem 15.17 There results .L̃(B) = L(B) and .I˜ = I .
Proof Since the proof is long, we split it into several steps.
Step 1. Every continuous function f belongs to L. Indeed, let $\varepsilon > 0$ be given,
and we select a partition $P = \{B_1, \ldots, B_m\}$ of B such that
$$\left| \int_B f - \sum_{j=1}^{m} f(\xi_j) \operatorname{Vol}(B_j) \right| < \varepsilon,$$
where $h_P$ is the step function whose value on $B_j$ is $f(\xi_j)$. Since $h_P$ converges
uniformly to f as the partition P is indefinitely refined, it follows from the
Dominated Convergence Theorem that $f \in L(B)$ and
$$If = \tilde{I} f.$$
Step 2. Every step function h belongs to .L̃. This is actually a density statement
about the approximation of step functions by continuous functions. We consider a
function h which is equal to 1 on a cell B and to 0 outside B. If the dimension n of
the space is 1, a “trapezoidal” graph shows that there exists a continuous (actually
piecewise affine) function that approximates h with any prescribed precision. If
.n > 1, we just pick such an approximating function .fi = fi (xi ) in each variable
.xi , and define .(x1 , . . . , xn ) → f1 (x1 )f2 (x2 ) · · · fn (xn ). As a consequence, each
Step 3. Both constructions lead to the same sets of measure zero.2 Indeed, let
.Z̃ be a set of measure zero with respect to the integral .I˜. Given .ε > 0, there
of step functions .hm with bounded integrals .I hm . Then the integrals .I˜hm = I hm
remain bounded, too, and hence by Beppo Levi’s Theorem .f ∈ L̃, .I˜f = If .
On the other hand, if .f ∈ L̃+ , then f is the limit (almost everywhere) of a non-
decreasing sequence of continuous functions .fm with bounded integrals .I˜fm . As
before the integrals .Ifm = I˜fm remain bounded, .f ∈ L and .If = I˜f . By taking
differences, .L̃ contains every function of L, and vice-versa, with equal integrals.
The proof is now complete.
For the concrete Lebesgue integral, a geometric characterization of sets of
measure zero is possible. We remark that the next result is usually taken as the
definition of measure zero in Euclidean spaces.
Theorem 15.18 Let $Z \subset B$ be a subset of the basic block B. The following
statements are equivalent:
(a) for every $\varepsilon > 0$ there exists a countable (i.e. finite or countably infinite)
collection of n-cells $B_1, B_2, \ldots$, such that $Z \subset \bigcup_{k=1}^{\infty} B_k$ and $\sum_{k=1}^{\infty} \operatorname{Vol}(B_k) < \varepsilon$;
(b) for every $\varepsilon > 0$ there exists a non-decreasing sequence of non-negative step
functions
$$h_1^{(\varepsilon)} \le \cdots \le h_m^{(\varepsilon)} \le \cdots$$
such that $I h_m^{(\varepsilon)} < \varepsilon$ for every m and
$$\sup_m h_m^{(\varepsilon)}(x) \ge 1 \quad \text{for every } x \in Z.$$
Proof Assume that (a) holds, and call $h_m^{(\varepsilon)}$ the step function which is equal to 1
on the cells $B_1, \ldots, B_m$ and equal to 0 outside these cells. Now, every point $x_0 \in Z$
belongs to some cell $B_m$, hence $h_m^{(\varepsilon)}(x_0) = 1$. The remaining properties of the
sequence $\{h_m^{(\varepsilon)}\}_m$ are trivial, and therefore (b) holds.
2 Hence the sentence “almost everywhere” can be interpreted with respect to both integrals.
Conversely, suppose that (b) holds. Let $B_1, \ldots, B_{r_1}$ be the collection of cells
on which the function $h_1^{(\varepsilon)} \ge 1/2$. Then $h_2^{(\varepsilon)}$ is larger than $1/2$ on the same cells,
and also on some other cells $B_{r_1+1}, \ldots, B_{r_2}$. Iterating this construction, we obtain
an infinite collection of cells $B_1, \ldots, B_{r_1}, \ldots, B_{r_2}, \ldots$ with no interior points in
common. Since
$$\sup_m h_m^{(\varepsilon)}(x) \ge 1 \quad \text{for every } x \in Z,$$
the set Z is covered by all these cells. To complete the proof, we need to compute
the sum of the volumes of such cells. If we consider only the cells $B_1, \ldots, B_{r_m}$ on
which $h_m^{(\varepsilon)} \ge 1/2$, it follows from $I h_m^{(\varepsilon)} < \varepsilon$ that
$$\sum_{k=1}^{r_m} \operatorname{Vol}(B_k) \le 2\varepsilon.$$
Letting $m \to +\infty$ we derive $\sum_{k=1}^{\infty} \operatorname{Vol}(B_k) \le 2\varepsilon$. We must now remark that
the cells $B_k$ just considered may not cover Z, since the points of Z need not lie
in the interior of such cells. But this is not an obstruction, since we may replace
$B_k$ by a concentric cell $B_k'$ such that $\operatorname{Vol}(B_k') = 2 \operatorname{Vol}(B_k)$, $Z \subset \bigcup_{k=1}^{\infty} B_k'$ and
$\sum_{k=1}^{\infty} \operatorname{Vol}(B_k') \le 4\varepsilon$. Hence (a) holds, and the proof is complete.
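Condition (a) can be made concrete for a countable set. The sketch below works on the real line (dimension 1) and covers finitely many rational points of [0, 1] by intervals of lengths ε/2, ε/4, ε/8, and so on. The enumeration of the rationals, the bound on denominators and the helper names are assumptions made for illustration; the intervals are allowed to stick out of [0, 1], which a more careful version would trim.

```python
from fractions import Fraction
from typing import List, Tuple


def rationals(max_den: int) -> List[Fraction]:
    """Rationals p/q in [0, 1] with q <= max_den, listed without repetition."""
    seen = set()
    out = []
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out


def cover(points: List[Fraction], eps: Fraction) -> List[Tuple[Fraction, Fraction]]:
    """Centre an interval of length eps / 2**k at the k-th point (k = 1, 2, ...)."""
    cells = []
    for k, x in enumerate(points, start=1):
        half = eps / 2 ** (k + 1)  # half of the length eps / 2**k
        cells.append((x - half, x + half))
    return cells


eps = Fraction(1, 100)
points = rationals(12)
cells = cover(points, eps)
total_length = sum(b - a for a, b in cells)
print(len(points), float(total_length))
```

The total length equals ε(1 − 2^(−N)) for N points, so it stays below ε however many points are listed; this is the usual argument showing that a countable set has measure zero.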
of integrable functions on each set, and that .IX , .IY , .IW = I are the corresponding
abstract integrals. Assume moreover that the family .H (W ) of elementary functions
which generate .L(W ) has the following properties:
(a) for every function .h ∈ H (W ), the function .x → h(x, y) belongs to .L(X) for
almost every .y ∈ Y ;
(b) for every .x ∈ X, the function .y → IX h(x, y) belongs to .L(Y );
(c) .I h = IY (IX h).
15.10 Integration on Product Spaces 387
The family .L(W ) has the same properties: every function .ϕ ∈ L(W ) is integrable in
the first variable for almost every value of the second variable, the integral .IX ϕ(x, ·)
is integrable on Y , and .I ϕ = IY (IX ϕ).
Proof The proof is long, so we split it into several steps. We define the class . of
all functions .ϕ ∈ L(W ) for which the conclusion is true. We will eventually show
that . = L(W ). Since . contains all elementary functions by assumption, we only
need to prove the inclusion .L(W ) ⊂ .
Step 1. . is closed under the formation of linear combinations. This is trivial, and
we omit the details.
Step 2. . is closed under monotonic limits. Let .{ϕn }n be a sequence in . which is
monotonic, and suppose that the sequence of the integrals .I ϕn remains bounded.
Then the pointwise limit .ϕ = limn→+∞ ϕn belongs to .. Indeed, suppose
for the sake of definiteness that $\{\varphi_n\}_n$ is non-decreasing, and put $g_n(y) = I_X \varphi_n(\cdot, y)$.
Then the sequence $\{g_n\}_n$ is also non-decreasing and the sequence of
integrals $I_Y g_n$ remains bounded. Furthermore
$$I_Y g_n = I_Y(I_X \varphi_n) = I\varphi_n \nearrow I\varphi.$$
1
I hm = lim I hm,n ≤
.
n→+∞ m
I h = lim I hm = 0.
m→+∞
. IY g = IY (IX h) = I h = 0
I z = 0 = IY (IX z).
.
1 on Z
=
.
0 on W \ Z.
There results
1
z = lim n min , z ,
.
n→+∞ n
$$h_n \nearrow f, \qquad I h_n \nearrow If.$$
ĥ1 = h
.
This sequence is everywhere non-decreasing, and the functions .ĥn , .hn coincide
almost everywhere, so that .I ĥn = I h. As a result, the function f and the function
.fˆ coincide almost everywhere, and we can write .fˆ = f + z for some function z
which is almost everywhere equal to zero. By Steps 2 and 3, both .fˆ and z belong
to ., so that .f ∈ by Step 1.
Step 5. . contains every function .ϕ ∈ L(W ). Indeed, .ϕ can be written as the
difference of two elements of .L+ (W ).
Theorem 15.20 (Tonelli) Suppose that $\varphi$ is a measurable function on W such that
$\varphi \ge 0$. If the iterated integral $I_Y(I_X \varphi)$ exists, then $\varphi \in L(W)$ and
$$I\varphi = I_Y(I_X \varphi).$$
are measurable; furthermore the function .ϕn is dominated by the integrable function
max {h1 , . . . , hn }. Hence .ϕn is integrable on W , and
.
I ϕn = IY (IX ϕn ) ≤ A
.
$$\int_{(0,1) \times (0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dx \, dy, \tag{15.5}$$
with integrals computed in the concrete Lebesgue sense. Prove that the correspond-
ing iterated integrals exist for either order of integration, and that they coincide for
(15.4) but differ for (15.5). Finally, prove that both integrands are not integrable.
Taking account of the last exercise, we may say that Theorem 15.20 is a partial
converse to Theorem 15.19, and that the non-negativity assumption in Theorem
15.20 is optimal.
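A numerical sanity check of the integrand $(x^2 - y^2)/(x^2 + y^2)^2$ on the unit square can make the failure of the order-exchange tangible. The sketch below is not a proof and relies on two facts stated here as assumptions of the illustration: for fixed x > 0 the inner integral in y over (0, 1) equals 1/(1 + x²), because y/(x² + y²) is an antiderivative in y, and by antisymmetry the inner integral in x equals −1/(1 + y²). The midpoint rule and the number of nodes are arbitrary choices.

```python
import math
from typing import Callable


def integrand(x: float, y: float) -> float:
    """The function (x^2 - y^2) / (x^2 + y^2)^2."""
    return (x * x - y * y) / (x * x + y * y) ** 2


def midpoint(g: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def inner_dy(x: float) -> float:
    """Closed form of the integral in y over (0, 1), for fixed x > 0."""
    return 1.0 / (1.0 + x * x)


# Check the closed form numerically at the sample point x = 0.5.
inner_numeric = midpoint(lambda y: integrand(0.5, y), 0.0, 1.0, 20000)

# Integrate in y first, then in x.
dy_first = midpoint(inner_dy, 0.0, 1.0, 20000)
# Integrate in x first, then in y: the inner integral changes sign.
dx_first = midpoint(lambda y: -inner_dy(y), 0.0, 1.0, 20000)

print(inner_numeric, dy_first, dx_first, math.pi / 4)
```

The two iterated integrals come out close to π/4 and −π/4 respectively, which is consistent with the claim that they exist but differ.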
Let us stop for a moment. In the two main theorems of this Section, the existence
of a class of integrable functions was assumed. The main issue is now to construct
such a class and the corresponding integral.
Definition 15.17 Let X, Y be sets equipped with abstract integrals $I_X$ and $I_Y$. If
$W = X \times Y$, the class $H(W)$ of elementary functions on the product W consists of
$$h\colon (x, y) \in W \mapsto \sum_{j=1}^{m} \alpha_j \chi_{E_j}(x) \chi_{F_j}(y),$$
where $m \in \mathbb{N}$, $\alpha_j \in \mathbb{R}$, and the sets $E_j \subset X$, $F_j \subset Y$ are integrable sets (i.e.
$\chi_{E_j} \in L(X)$, $\chi_{F_j} \in L(Y)$) for every $j = 1, \ldots, m$.
$$Ih = \sum_{j=1}^{m} \alpha_j \mu_X(E_j) \mu_Y(F_j)$$
for every $h \in H(W)$ of the form $h(x, y) = \sum_{j=1}^{m} \alpha_j \chi_{E_j}(x) \chi_{F_j}(y)$. Of course
$\mu_X(E_j) = I_X \chi_{E_j}$ and $\mu_Y(F_j) = I_Y \chi_{F_j}$.
Levi’s Theorem. But the assumption .hn . 0 (everywhere) implies .IX hn (·, y) .
for every .y ∈ Y by Levi’s Theorem, and the claim is proved.
Once the elementary integral on .W = X × Y has been defined, our abstract
machinery yields a class .L(W ) of integrable functions and an associated integral I
which satisfies all the assumptions of Fubini’s Theorem.
Definition 15.19 For every $p > 0$, the set $L^p = L^p(X)$ is defined to be the set of
all measurable functions f for which
$$I|f|^p = \int_X |f|^p \, d\mu \in \mathbb{R}.$$
we see that $f + g \in L^p$. The fact that any real multiple of f belongs to $L^p$ is trivial,
and the proof is complete.
Definition 15.20 The standard norm of $L^p(X)$ is defined by
$$\|f\|_p = \left( I|f|^p \right)^{1/p} = \left( \int_X |f|^p \, d\mu \right)^{1/p}.$$
The fact that $\|\cdot\|_p$ is actually a norm follows from two fundamental inequalities.
Theorem 15.22
(a) (Hölder’s inequality) If $f \in L^p$ and $g \in L^q$ for some numbers $p > 1$, $q > 1$
such that
$$\frac{1}{p} + \frac{1}{q} = 1,$$
then
$$\int_X |fg| \, d\mu \le \|f\|_p \|g\|_q.$$
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$
$$A = \|f\|_p, \qquad B = \|g\|_q.$$
$$F = \frac{|f|}{A}, \qquad G = \frac{|g|}{B},$$
so that $\int_X F^p \, d\mu = 1 = \int_X G^q \, d\mu$. If $x \in X$ is such that $0 < F(x) < +\infty$ and
$0 < G(x) < +\infty$, then there exist real numbers s and t such that
$$F(x) = e^{s/p}, \qquad G(x) = e^{t/q}.$$
$$e^{\frac{s}{p} + \frac{t}{q}} \le \frac{e^s}{p} + \frac{e^t}{q}.$$
It follows that
$$F(x) G(x) \le \frac{F(x)^p}{p} + \frac{G(x)^q}{q}.$$
We integrate this inequality and get (a). The proof of (b) follows easily from (a).
Indeed, let us write
By Hölder’s inequality,
$$\int_X |f| |f + g|^{p-1} \, d\mu \le \left( \int_X |f|^p \, d\mu \right)^{1/p} \left( \int_X |f + g|^{(p-1)q} \, d\mu \right)^{1/q}.$$
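Hölder’s inequality, and the pointwise inequality $ab \le a^p/p + b^q/q$ behind it, can be tested on a finite measure space. In the sketch below the measure is a weighted counting measure on four points; the weights, the sample functions and the exponent p = 3 are arbitrary values chosen for illustration.

```python
from typing import List


def lp_norm(values: List[float], weights: List[float], p: float) -> float:
    """Norm of a function on a finite set, where weights[i] is the mass of point i."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.5, -4.0, 2.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

# Left-hand side: the integral of |f g| with respect to the weighted measure.
lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))
# Right-hand side: the product of the two norms.
rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)

# Pointwise inequality a * b <= a^p / p + b^q / q on a small grid of values.
grid = [0.1 * k for k in range(1, 31)]
young_ok = all(a * b <= a ** p / p + b ** q / q + 1e-12 for a in grid for b in grid)

print(lhs, rhs, young_ok)
```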
Theorem 15.23 (Riesz-Fischer) The space $L^p(X)$ is complete.
Proof Let $\{\varphi_n\}_n$ be a Cauchy sequence in $L^p$. It is sufficient to prove that some
subsequence converges in $L^p$, since the whole sequence will then converge to the
same limit. We first find positive integers $n_1 < n_2 < n_3 < \ldots$ such that
$$\left\| \varphi_{n_{k+1}} - \varphi_{n_k} \right\|_p < \frac{1}{2^k}, \quad k = 1, 2, \ldots.$$
15.11 Spaces of Integrable Functions 393
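Hence the series $\sum_{k=1}^{\infty} |\varphi_{n_{k+1}} - \varphi_{n_k}|$ converges almost everywhere. Indeed,
$$\left\| \sum_{k=1}^{N} \left| \varphi_{n_{k+1}} - \varphi_{n_k} \right| \right\|_p \le \sum_{k=1}^{N} \left\| \varphi_{n_{k+1}} - \varphi_{n_k} \right\|_p < \sum_{k=1}^{N} \frac{1}{2^k} < 1$$
$$\sum_{k=1}^{N} \left( \varphi_{n_{k+1}} - \varphi_{n_k} \right) = \varphi_{n_{N+1}} - \varphi_{n_1}.$$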
This means that the sequence . ϕnk k has a limit .ϕ almost everywhere. For any fixed
k, the function .ϕnj −ϕnk approaches .ϕ −ϕnk almost everywhere as .j → +∞. Since
* *
. *ϕn − ϕn * < 1 , k = 1, 2, . . .
j k p
2k
every index k,
$$|\psi_k| \le |\psi_1| + \sum_{k=1}^{\infty} |\psi_{k+1} - \psi_k| \in L^p(X).$$
Theorem 15.25 (Density of Elementary Functions) The collection of elementary
functions H is dense in .Lp (X).
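$$|h(x)|^p \le \begin{cases} M^p & \text{if } x \in E \\ |h(x)| & \text{if } x \notin E. \end{cases}$$
In other words,
and set
$$f_n(x) = \begin{cases} f(x) & \text{if } x \in E_n \\ 0 & \text{otherwise.} \end{cases}$$
Obviously $f_n \nearrow f$ and $(f - f_n)^p \searrow 0$, and therefore Beppo Levi’s Theorem yields
$f_n \to f$ in $L^p$. Given $\varepsilon > 0$ we can choose a positive integer n such that
$$\|f - f_n\|_p < \frac{\varepsilon}{2}.$$
Since $\chi_{E_n} \le \chi_{E_n}^p \le n^p f^p$, the function $f_n$ is integrable by Hölder’s inequality:
$$\int_X f_n \, d\mu = \int_X \chi_{E_n} f \, d\mu \le \left\| \chi_{E_n} \right\|_q \|f\|_p.$$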
Assume that H is dense in $L^1$. Then a sequence of elementary functions $h_k$ exists
such that $h_k \to f_n$ in $L^1$ as $k \to +\infty$. Replacing $h_k$ by $h_k^+$ we may also assume
that $h_k \ge 0$. Replacing $h_k$ by
$$\min \{h_k, n\} = n \min \left\{ \frac{1}{n} h_k, 1 \right\},$$
we may also assume that $|h_k| \le n$. Here we are using Stone’s axiom. We claim that
$h_k \to f_n$ in $L^p$. Indeed,
$$\|f_n - h_k\|_p^p = \int_X |f_n - h_k|^p \, d\mu = \int_X |f_n - h_k|^{p-1} |f_n - h_k| \, d\mu$$
To complete the proof, it remains to show that H is dense in $L^1(X) = L(X)$. But
every function in L is the difference of two functions in $L^+$, and we need to prove
the claim for functions $f \in L^+$ which are limits (in the norm of $L^1$) of a sequence
of functions $h_n \in H$. The natural choice for this sequence is the sequence which
defines f. Then $h_n \nearrow f$, $I h_n \nearrow If$, and
$$\|f - h_n\|_1 = I(f - h_n) = If - I h_n \to 0$$
as $n \to +\infty$.
Theorem 15.26 (Generalized Hölder’s Inequality) Suppose $1 < p_j < +\infty$,
$u_j \in L^{p_j}(X)$ for $1 \le j \le k$. If
$$\frac{1}{p_1} + \frac{1}{p_2} + \cdots + \frac{1}{p_k} = 1,$$
Proof Exercise.
Theorem 15.27 (Interpolation Inequality) Suppose that $1 \le p < q < r < +\infty$,
$$\frac{1}{q} = \frac{1 - \lambda}{p} + \frac{\lambda}{r}$$
$$\|u\|_q \le \|u\|_p^{1-\lambda} \|u\|_r^{\lambda}.$$
Proof Exercise.
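Before attempting the exercise, one can test the interpolation inequality numerically. The sketch below again uses a weighted counting measure on finitely many points; the weights, the sample function and the exponents p = 1, q = 2, r = 4 are assumptions chosen for illustration. The parameter λ is solved from the relation 1/q = (1 − λ)/p + λ/r.

```python
from typing import List


def lp_norm(values: List[float], weights: List[float], p: float) -> float:
    """Norm of a function on a finite set, where weights[i] is the mass of point i."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def interpolation_lambda(p: float, q: float, r: float) -> float:
    """Solve 1/q = (1 - lam)/p + lam/r for lam, assuming p < q < r."""
    return (1.0 / p - 1.0 / q) / (1.0 / p - 1.0 / r)


weights = [0.2, 1.0, 3.0, 0.5, 1.5]
u = [4.0, -1.0, 0.25, 2.0, -0.5]
p, q, r = 1.0, 2.0, 4.0
lam = interpolation_lambda(p, q, r)

lhs = lp_norm(u, weights, q)
rhs = lp_norm(u, weights, p) ** (1.0 - lam) * lp_norm(u, weights, r) ** lam
print(lam, lhs, rhs)
```

For these exponents λ = 2/3. The inequality itself can be obtained from Hölder’s inequality applied to the product $|u|^{(1-\lambda)q} |u|^{\lambda q}$ with suitable conjugate exponents.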
Theorem 15.28 Suppose that $p \ge 1$ and $\{u_n\}_n$ is a sequence in $L^p(X)$ such that
(a) $\|u_n\|_p \to \|u\|_p$ as $n \to +\infty$,
(b) $u_n \to u$ almost everywhere as $n \to +\infty$.
Then $\|u_n - u\|_p \to 0$ as $n \to +\infty$.
Proof We have almost everywhere
$$0 \le 2^p \left( |u_n|^p + |u|^p \right) - |u_n - u|^p.$$
$$2^{p+1} \int_X |u|^p \, d\mu \le \liminf_{n \to +\infty} \int_X \left( 2^p \left( |u_n|^p + |u|^p \right) - |u_n - u|^p \right) d\mu$$
Proof The assumptions imply $\|u\|_p \le c$. Fix any $\varepsilon > 0$. By homogeneity, there
exists a number $C(\varepsilon) > 0$ such that for every a, b in $\mathbb{R}$,
$$\bigl| |a + b|^p - |a|^p - |b|^p \bigr| \le \varepsilon |a|^p + C(\varepsilon) |b|^p.$$
≤ (2c)p ε + C(ε)|u|p dµ
X
− lim sup |un |p − |un − u|p − |u|p dµ,
n→+∞ X
which means
. lim sup |un |p − |un − u|p − |u|p dµ ≤ (2c)p ε.
n→+∞ X
The formal case $p = \infty$ of $L^p$ gives rise to a very different space of functions.
Definition 15.21 Let $f\colon X \to [0, +\infty]$ be a measurable function. We
consider the set
$$S = \left\{ \alpha \in \mathbb{R} \;\middle|\; \mu\bigl(f^{-1}((\alpha, +\infty])\bigr) = 0 \right\}.$$
The inequalities of Hölder and Minkowski can be extended to the $L^\infty$-case without
much pain. For example,
Theorem 15.30 If $f \in L^1(X)$ and $g \in L^\infty(X)$, then $fg \in L^1(X)$ and $\|fg\|_1 \le \|f\|_1 \|g\|_\infty$.
Proof For almost every $x \in X$ we have $|f(x) g(x)| \le |f(x)| \|g\|_\infty$. The
conclusion follows by integration of this inequality.
If $f' > 0$, then $\omega = (f(a), f(b))$. If $f' < 0$, then $\omega = (f(b), f(a))$. In both cases the claim is proved.
15.13 Changing Variables in Multiple Integrals 399
2. The induction step. We assume the result has been proved in dimension $n-1$. Let $a \in \Omega$ be a fixed point. Since $f$ is a diffeomorphism, $(\partial_1 f_n(a), \dots, \partial_n f_n(a)) \ne 0$. Without loss of generality we may assume that $\partial_n f_n(a) \ne 0$. By the Implicit Function Theorem, there exist $r > 0$, an open set $U \subset \Omega$ such that $a \in U$, an open set $V \subset \mathbb{R}^{n-1}$, and a function $\beta \in C^1(V \times (f_n(a) - r, f_n(a) + r))$ such that for $|t - f_n(a)| < r$ there results
$$\{f_n = t\} \cap U = \left\{ (x', \beta(x', t)) \mid x' \in V \right\}. \tag{15.6}$$
We now split³
$$f = (f', f_n), \qquad h(x', x_n) = (x', f_n(x', x_n)), \qquad \Phi_t(x') = f'(x', \beta(x', t)), \qquad g(x', t) = (\Phi_t(x'), t).$$
We add the assumption that .supp u ⊂ U . Fubini’s Theorem and the induction
hypothesis ensure that4
= dt u(y % , t) dy %
= u(y) dy.
= dx % v(x % , t) dt
3 The prime $'$ does not mean differentiation. It is used to group the first $n - 1$ components of a vector in $\mathbb{R}^n$.
4 All the integrals in the next equations may be computed on $\mathbb{R}^n$, since the integrand functions have compact support.
$$= \int v(h(x)) \, |J_h(x)| \, dx = \int u(y) \, dy.$$
To conclude the proof, we need to remove the condition that the support of $u$ be contained in $U$. By assumption the support of $u$ is a compact subset of $\mathbb{R}^n$, and we can cover it by a finite collection of open sets $U_j$ which satisfy (15.6). If $\{\psi_j\}_j$ is a finite partition of unity subordinated to the covering $\{U_j\}_j$, we have $u = \sum_j \psi_j u$, and the general case follows from the linearity of the integral.
15.14 Comments
The construction of the abstract Lebesgue integral dates back to P.J. Daniell in 1918.
It has the typical elegance of formal axiomatizations, which isolate the essential
properties of the object we want to define. A short but complete reference is [1].
The clean approach of [2] is another interesting source.
To be fairly honest, many analysts are satisfied with the basic Lebesgue integral.
If this is the main purpose, the approach we present in the next chapter might
be preferable. It is nonetheless interesting that the theory of integration can be
completely developed in terms of a functional-analytic completion process: we
construct the Lebesgue integral from the Riemann integral in the same way as we
construct the real numbers from the rational ones.
References
1. G.E. Shilov, B.L. Gurevich, Integral, Measure and Derivative: A Unified Approach (Dover
Publications, New York, 1977)
2. M. Willem, Functional Analysis: Fundamentals and Applications (Birkhäuser/Springer, New
York, 2013)
Chapter 16
Measures Before Integrals
The popular approach to the Lebesgue integral is via abstract measure theory:
we first define a set function—called a measure—on a set of suitable sets—
called measurable sets, then we define measurable functions, and finally integrable
functions. The main pro of this approach is that at the end we have the highest
generality. A troublesome con is that such a construction requires a good amount of
mathematical education before it can be understood.
Our approach follows closely [3]. Historically, the Bourbaki group rejected
abstract measure theory for a long time, since they decided to focus only on Radon
measures defined as continuous linear functionals. There is some reason for doing
this, but the stiffness of Radon measures is an obstacle to several investigations
in Calculus of Variations, Geometric Measure Theory, Differential Geometry, and
so on.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 401
S. Secchi, A Circle-Line Study of Mathematical Analysis,
La Matematica per il 3+2 141, https://doi.org/10.1007/978-3-031-19738-3_16
402 16 Measures Before Integrals
Then h : X → Y is measurable.
Proof Let us write $f(x) = (u(x), v(x))$ for $x \in X$. Since $h = \Phi \circ f$, Theorem 16.1 shows that we only need to prove that $f$ is measurable. To this aim, we suppose that $R = (a, b) \times (c, d)$. It follows that
Proof This follows immediately from the remark that the collection of all open
half-lines of the form (a, +∞) or (−∞, b) is a sub-basis of the standard topology
of R.
Theorem 16.4 If {fn }n is a sequence of real-valued measurable functions on X,
then infn fn , supn fn , lim infn→+∞ fn , lim supn→+∞ fn are measurable.
Proof For example,
$$\left\{ x \in X \;\middle|\; \sup_n f_n(x) > a \right\} = \bigcup_{n=1}^{\infty} \{ x \in X \mid f_n(x) > a \},$$
Exercise 16.3 Prove that any Fσ and any Gδ set are Borel sets.
Definition 16.5 Let $(X, \Sigma)$ be a measurable space. A function $\mu : \Sigma \to [0, +\infty]$ is a measure if and only if
1. $\mu(\emptyset) = 0$,
2. if $\{A_n \mid n \in \mathbb{N}\}$ is a sequence of measurable sets such that $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$$\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n).$$
The space $X$ is of finite measure if and only if $\mu(X) < +\infty$. The space $X$ is $\sigma$-finite if and only if it is the countable union of measurable subsets of finite measure: $X = \bigcup_{n=1}^{\infty} X_n$ with $\mu(X_n) < +\infty$ for every $n$.
Example 16.1 (Counting Measure) For every subset A of a set X, we define
µ(A) = +∞ if A contains infinitely many elements, and µ(A) = #A if A contains
finitely many elements.
Example 16.2 (Dirac Measure) Let X be a set, and let x0 ∈ X be a fixed point. If
A ⊂ X contains x0 , we set µ(A) = 1; otherwise we set µ(A) = 0. This is the Dirac
measure concentrated at x0 .
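On a finite set both examples can be tested directly. The sketch below is only an illustration under the assumption that sets are represented by Python `set` objects; the helper names `counting_measure`, `dirac_measure` and `is_additive_on_disjoint` are hypothetical. It checks additivity on a few pairwise disjoint sets, which is the finite shadow of the countable additivity required of a measure.

```python
from itertools import combinations

def counting_measure(A):
    """Counting measure of a finite set A."""
    return len(A)

def dirac_measure(x0):
    """Return the Dirac measure concentrated at x0, as a function on sets."""
    return lambda A: 1 if x0 in A else 0

def is_additive_on_disjoint(mu, sets):
    """Check mu(union of sets) == sum of mu(A) for pairwise disjoint sets."""
    assert all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    union = set().union(*sets)
    return mu(union) == sum(mu(A) for A in sets)

pieces = [{1, 2}, {3}, {4, 5, 6}]      # pairwise disjoint subsets
delta_3 = dirac_measure(3)             # Dirac measure at the point 3
print(is_additive_on_disjoint(counting_measure, pieces))
print(is_additive_on_disjoint(delta_3, pieces))
```

Only one of the disjoint pieces can contain the point 3, which is why the Dirac measure of the union is again 0 or 1.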
Definition 16.6 A subset $E$ of a measure space $(X, \Sigma, \mu)$ has measure zero if and only if $E \in \Sigma$ and $\mu(E) = 0$. A property holds almost everywhere (a.e. for short) if and only if it holds in $X$ except for a subset of measure zero.
Theorem 16.6 Let $(X, \Sigma, \mu)$ be a measure space.
(a) If $\{A_n\}_n$ is a sequence of measurable subsets such that $A_n \subset A_{n+1}$ for every $n$, then
$$\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \lim_{n \to +\infty} \mu(A_n).$$
(b) If $\{A_n\}_n$ is a sequence of measurable subsets such that $A_n \supset A_{n+1}$ for every $n$, and if $\mu(A_1) < +\infty$, then
$$\mu\left( \bigcap_{n=1}^{\infty} A_n \right) = \lim_{n \to +\infty} \mu(A_n).$$
16.1 General Measure Theory 405
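Proof We may assume that $\mu(A_n) < +\infty$ for every $n$, otherwise the conclusion is trivial. From the identity
$$\bigcup_{n=1}^{\infty} A_n = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \cdots$$
we derive
$$\begin{aligned}
\mu\left( \bigcup_{n=1}^{\infty} A_n \right) &= \mu(A_1) + \mu(A_2 \setminus A_1) + \mu(A_3 \setminus A_2) + \cdots \\
&= \mu(A_1) + \left( \mu(A_2) - \mu(A_1) \right) + \left( \mu(A_3) - \mu(A_2) \right) + \cdots \\
&= \lim_{n \to +\infty} \mu(A_n).
\end{aligned}$$
This proves (a). To prove (b) we set $B_n = A_1 \setminus A_n$, so that $B_n \subset B_{n+1}$ for every $n$. Furthermore $\mu(B_n) = \mu(A_1) - \mu(A_n)$, and
$$\bigcup_{n=1}^{\infty} B_n = A_1 \setminus \bigcap_{n=1}^{\infty} A_n.$$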
Simple functions are the bricks with which we build integration theory. Let us start with an approximation result.
Theorem 16.8 If f : X → R is a function, then there exists a sequence {sn }n of
simple functions such that sn (x) → f (x) for every x ∈ X. If f is measurable, then
each sn can be taken measurable. If f ≥ 0, then we may suppose that sn ≤ sn+1 for
every n.
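Proof We begin with the case $f \ge 0$. For any $n \in \mathbb{N}$ and $i \in \{1, 2, \dots, n \cdot 2^n\}$ we define
$$\begin{aligned}
E_{ni} &= \left\{ x \in X \;\middle|\; \frac{i-1}{2^n} \le f(x) < \frac{i}{2^n} \right\}, \\
F_n &= \{ x \in X \mid f(x) \ge n \}, \\
s_n(x) &= \sum_{i=1}^{n \cdot 2^n} \frac{i-1}{2^n} \chi_{E_{ni}}(x) + n \chi_{F_n}(x).
\end{aligned}$$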
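The dyadic construction of $s_n$ is easy to evaluate pointwise. The following sketch assumes that the level sets are the half-open slabs $(i-1)/2^n \le f(x) < i/2^n$, so that on each slab $s_n$ takes the value $(i-1)/2^n = \lfloor 2^n f(x) \rfloor / 2^n$, and that $s_n = n$ where $f \ge n$. The helper name `dyadic_simple_approx` and the sample function are hypothetical.

```python
import math

def dyadic_simple_approx(f, n):
    """Return s_n for a function f >= 0: (i-1)/2^n on E_ni, and n on F_n."""
    def s_n(x):
        y = f(x)
        if y >= n:                       # x belongs to F_n
            return float(n)
        # x belongs to E_ni with i - 1 = floor(2^n * f(x))
        return math.floor(y * 2 ** n) / 2 ** n
    return s_n

f = lambda x: x * x                      # hypothetical non-negative function
s3 = dyadic_simple_approx(f, 3)
s4 = dyadic_simple_approx(f, 4)
points = [0.0, 0.3, 1.7, 2.5, 10.0]
for x in points:
    print(x, s3(x), s4(x), f(x))
```

At every sample point the values satisfy $s_3 \le s_4 \le f$, and where $f(x) < n$ the error $f(x) - s_n(x)$ is smaller than $2^{-n}$, which is the mechanism behind the pointwise convergence.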
$$s = \sum_{i=1}^{n} c_i \chi_{E_i},$$
$$I_E(s) = \sum_{i=1}^{n} c_i \, \mu(E \cap E_i).$$
$$\int_E f \, d\mu = \int_E f^+ \, d\mu - \int_E f^- \, d\mu.$$
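For a simple function the quantity $I_E(s) = \sum_i c_i \, \mu(E \cap E_i)$ is a finite sum, so it can be computed exactly on a finite measure space. The sketch below assumes that $\mu$ is given by point masses stored in a dictionary `weights`; the names `integrate_simple`, `weights` and the sample data are hypothetical and only illustrate the formula.

```python
def integrate_simple(values_and_sets, mu, E):
    """I_E(s) = sum of c_i * mu(E ∩ E_i) for s = sum of c_i * chi_{E_i}."""
    return sum(c * mu(E & E_i) for c, E_i in values_and_sets)

# Hypothetical measure on X = {"a", "b", "c"} given by point masses.
weights = {"a": 0.5, "b": 1.5, "c": 2.0}
mu = lambda A: sum(weights[x] for x in A)

# s = 3 * chi_{a, b} + 1 * chi_{c}
s = [(3.0, {"a", "b"}), (1.0, {"c"})]
whole = integrate_simple(s, mu, {"a", "b", "c"})
part = integrate_simple(s, mu, {"b", "c"})
print(whole, part)
```

Restricting the domain of integration from $X$ to $E = \{b, c\}$ only changes the intersections $E \cap E_i$, not the coefficients $c_i$.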
We collect several properties of the abstract integral. The easy proofs can be
provided by the reader as an exercise.
Proposition 16.1 Let $X$ be a measurable space, and let $E$ be a measurable subset of $X$.
1. If $f$ is measurable and bounded on $E$, and if $\mu(E) < +\infty$, then $f \in L^1(E)$.
2. If $\mu(E) < +\infty$ and $a \le f \le b$ on $E$, then $a\mu(E) \le \int_E f \, d\mu \le b\mu(E)$.
3. If $f \in L^1(E)$ and $c \in \mathbb{R}$, then $cf \in L^1(E)$, and $\int_E (cf) \, d\mu = c \int_E f \, d\mu$.
4. If $f \in L^1(E)$, $g \in L^1(E)$ and $f \le g$ on $E$, then $\int_E f \, d\mu \le \int_E g \, d\mu$.
5. If $f \in L^1(E)$, $A$ is measurable and $A \subset E$, then $f \in L^1(A)$.
6. If $\mu(E) = 0$ and $f$ is measurable, then $\int_E f \, d\mu = 0$.
Any measurable function induces a new measure by means of the integral. Here
is the precise statement.
Theorem 16.9 Suppose that $f$ is a measurable function on a measurable space $(X, \Sigma, \mu)$ such that $f \ge 0$ on $X$. Then the function
$$\nu : A \in \Sigma \mapsto \int_A f \, d\mu$$
is a measure on $X$.
Proof Consider any sequence $\{A_n\}_n$ of measurable sets such that $A_i \cap A_j = \emptyset$ for $i \ne j$, and $A = \bigcup_{n=1}^{\infty} A_n$. We need to prove that
$$\nu(A) = \sum_{n=1}^{\infty} \nu(A_n).$$
$$\int_B \chi_E \, d\mu = \mu(B \cap E)$$
for every measurable $B$ and from countable additivity of $\mu$. It then follows that the same conclusion holds for every simple function $f$.
In the general case, let $s$ be a simple measurable function such that $0 \le s \le f$ on $X$. We have
$$\int_A s \, d\mu = \sum_{n=1}^{\infty} \int_{A_n} s \, d\mu \le \sum_{n=1}^{\infty} \nu(A_n).$$
Hence
$$\int_A f \, d\mu = \nu(A) \le \sum_{n=1}^{\infty} \nu(A_n).$$
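To prove the reversed inequality, we may clearly restrict to the case $\nu(A_n) \in \mathbb{R}$ for every $n$. Pick $\varepsilon > 0$ and two measurable simple functions $s_1$, $s_2$ with $0 \le s_i \le f$ such that
$$\int_{A_1} s_1 \, d\mu > \int_{A_1} f \, d\mu - \frac{\varepsilon}{2}, \qquad \int_{A_2} s_2 \, d\mu > \int_{A_2} f \, d\mu - \frac{\varepsilon}{2}.$$
Then
$$\begin{aligned}
\nu(A_1 \cup A_2) &\ge \int_{A_1 \cup A_2} \left( s_1 \chi_{A_1} + s_2 \chi_{A_2} \right) d\mu \\
&= \int_{A_1} s_1 \, d\mu + \int_{A_2} s_2 \, d\mu \ge \nu(A_1) + \nu(A_2) - \varepsilon.
\end{aligned}$$
Since this holds for any $\varepsilon > 0$, we see that $\nu(A_1 \cup A_2) \ge \nu(A_1) + \nu(A_2)$. By induction
$$\nu(A_1 \cup \cdots \cup A_n) \ge \sum_{k=1}^{n} \nu(A_k), \qquad n \ge 1,$$
and finally $\nu(A) \ge \sum_{n=1}^{\infty} \nu(A_n)$, since $A \supset \bigcup_{k=1}^{n} A_k$. Since the other properties of a measure are trivially satisfied, the proof is complete.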
16.2 Convergence Theorems 409
Exercise 16.6 Suppose that $A$ and $B$ are measurable sets such that $B \subset A$ and $\mu(A \setminus B) = 0$. If $f$ is integrable, prove that $\int_A f \, d\mu = \int_B f \, d\mu$.
Theorem 16.10 Let $f$ be a measurable function on $X$.
(a) If $f \in L^1(X)$, then $|f| \in L^1(X)$, and $\left| \int_X f \, d\mu \right| \le \int_X |f| \, d\mu$.
(b) If $g \in L^1(X)$ and $|f| \le g$ on $X$, then $f \in L^1(X)$.
$$\int_X |f| \, d\mu = \int_A |f| \, d\mu + \int_B |f| \, d\mu = \int_A f^+ \, d\mu + \int_B f^- \, d\mu,$$
and the last two integrals are finite. Hence $|f| \in L^1(X)$. Recalling that $f \le |f|$ and $-f \le |f|$, we see that
$$\int_X f \, d\mu \le \int_X |f| \, d\mu, \qquad -\int_X f \, d\mu \le \int_X |f| \, d\mu.$$
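$$\int_E f \, d\mu = \lim_{n \to +\infty} \int_E f_n \, d\mu.$$
Proof The sequence $n \mapsto \int_E f_n \, d\mu$ is increasing, so there exists $\alpha \in [0, +\infty]$ such that $\alpha = \lim_{n \to +\infty} \int_E f_n \, d\mu$. Furthermore
$$\alpha \le \int_E f \, d\mu.$$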
$$E_n = \{ x \in E \mid f_n(x) \ge c \, s(x) \}, \qquad n \ge 1.$$
$$\int_E f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} s \, d\mu.$$
By Theorem 16.9 we may let $n \to +\infty$ and conclude that $\alpha \ge c \int_E s \, d\mu$. Since this holds for every $0 < c < 1$, we also have $\alpha \ge \int_E s \, d\mu$. Now the conclusion follows from the arbitrariness of the simple function $s$.
Remark 16.1 The integrals appearing in Beppo Levi’s Theorem may well be
infinite. It should be remembered as a statement about limit of measurable functions,
without any reference to integrability properties.
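Theorem 16.12 (Fatou's Lemma) Suppose $f_n \ge 0$ is a measurable function on a set $E \in \Sigma$ for every $n$, and let $f(x) = \liminf_{n \to +\infty} f_n(x)$ for every $x \in E$. Then
$$\int_E f \, d\mu \le \liminf_{n \to +\infty} \int_E f_n \, d\mu.$$
Proof For every $x \in E$ we set $g_n(x) = \inf_{m \ge n} f_m(x)$. Clearly
$$0 \le g_1 \le g_2 \le g_3 \le \dots, \qquad g_n \le f_n, \qquad \lim_{n \to +\infty} g_n(x) = f(x)$$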
Beppo Levi’s Theorem is a fundamental result for proving basic statements about
integrable functions. We just provide an example.
Proposition 16.2 Suppose that $f_1$ and $f_2$ are integrable functions on a measurable set $E$. If $f = f_1 + f_2$, then $f \in L^1(E)$ and
$$\int_E f \, d\mu = \int_E f_1 \, d\mu + \int_E f_2 \, d\mu.$$
Proof Suppose initially that $f_1 \ge 0$ and $f_2 \ge 0$. Let $\{s_n^1\}_n$ and $\{s_n^2\}_n$ be sequences of measurable simple functions such that $s_n^1 \to f_1$, $s_n^2 \to f_2$ on $E$. If we define $s_n = s_n^1 + s_n^2$, then
$$A = \{ x \in E \mid f(x) \ge 0 \}, \qquad B = \{ x \in E \mid f(x) < 0 \}.$$
$$\int_A f_1 \, d\mu = \int_A f \, d\mu + \int_A (-f_2) \, d\mu = \int_A f \, d\mu - \int_A f_2 \, d\mu.$$
$$\int_B f_1 \, d\mu = \int_B f \, d\mu - \int_B f_2 \, d\mu,$$
and the conclusion follows in this case. The other cases are treated similarly.
Theorem 16.13 (Dominated Convergence Theorem) Let $E$ be a measurable set, and suppose that $\{f_n\}_n$ is a sequence of measurable functions such that
$$\int_E f \, d\mu = \lim_{n \to +\infty} \int_E f_n \, d\mu.$$
fn + g ≥ 0 on E,
.
or
. f dµ ≤ lim inf − fn dµ .
E n→+∞ E
This is equivalent to
We have already remarked that sets of measure zero should be completely invisible in integration theory. Formally, we may agree that two measurable functions $f$ and $g$ are equivalent if and only if the set
$$\{ x \in X \mid f(x) \ne g(x) \}$$
has measure zero in $X$. Then integrable equivalent functions have the same integral.
Furthermore, it would be quite natural to relax the assumptions of Beppo Levi’s
Theorem, or of the Dominated Convergence Theorem, to allow pointwise limits
almost everywhere. Unfortunately this may be troublesome, since a subset of a set
of measure zero need not be measurable. We thus introduce a reasonable definition.
Definition 16.11 A measurable space is complete if and only if the following condition is satisfied: if $E$ is a measurable set of measure zero and if $F \subset E$, then $F$ is a measurable set (and of course the measure of $F$ is then zero).
16.3 Complete Measures 413
$$B \setminus A = \bigcup_{n=1}^{\infty} (B_n \setminus A) \subset \bigcup_{n=1}^{\infty} (B_n \setminus A_n).$$
Since countable unions of sets of measure zero are sets of measure zero, it follows that $E \in \Sigma^*$ if each $E_n \in \Sigma^*$.
If, in particular, the sets $E_n$ are disjoint, then the sets $A_n$ are disjoint, so that
$$\mu(E) = \mu(A) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{\infty} \mu(E_n),$$
In this way our statements can take into account functions defined almost everywhere, without altering the conclusions. Let us prove a useful example.
then the series $f(x) = \sum_{n=1}^{\infty} f_n(x)$ converges for almost every $x \in X$, $f \in L^1(X)$, and
$$\int_X f \, d\mu = \sum_{n=1}^{\infty} \int_X f_n \, d\mu$$
$$\frac{1}{n} \mu(A_n) \le \int_{A_n} f \, d\mu \le \int_E f \, d\mu = 0,$$
which implies $\mu(A_n) = 0$ for every $n$. Since $\{ x \in E \mid f(x) > 0 \} = \bigcup_{n=1}^{\infty} A_n$, it follows that $f = 0$ almost everywhere.
Theorem 16.17 If $f \in L^1(X)$ and $\int_E f \, d\mu = 0$ for every measurable $E \subset X$, then $f = 0$ almost everywhere on $X$.
1 With our original definition of measurable functions, this implication would require the completeness of the measurable space $X$.
16.4 Different Types of Convergence 415
Proof We apply the assumption to $E = \{ x \in X \mid f(x) \ge 0 \}$, so that $\int_E f^+ \, d\mu = 0$. Hence $f^+ = 0$ almost everywhere on $X$. Similarly $f^- = 0$ almost everywhere on $X$, and the proof is complete.
For a sequence of functions, pointwise convergence is often too weak, while uniform convergence is usually too strong for applications. Measure Theory provides some intermediate types of convergence which are of great convenience in mathematical analysis. In this section we want to review some of them. As a rule, all functions will be defined on a measurable space $(X, \Sigma, \mu)$.
Definition 16.13 A sequence of measurable functions $\{f_n\}_n$ converges almost uniformly to a measurable limit $f$ if and only if for every $\varepsilon > 0$ there exists a measurable set $E$ such that $\mu(X \setminus E) < \varepsilon$ and $f_n \to f$ uniformly on $E$.
Exercise 16.7 Prove that almost uniform convergence implies almost everywhere
convergence.
For a converse of the previous exercise, an additional assumption is needed.
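Theorem 16.18 (Severini-Egorov) Suppose that $\mu(X) < +\infty$. If a sequence of measurable functions $\{f_n\}_n$ converges almost everywhere to $f$ in $X$, then $f_n \to f$ almost uniformly.
Proof Removing a set of measure zero, we may assume that $f_n \to f$ pointwise in the whole $X$. For every $k \in \mathbb{N}$ and $m \in \mathbb{N}$ we define the set
$$E(m, k) = \bigcap_{n > m} \left\{ x \in X \;\middle|\; |f_n(x) - f(x)| < \frac{1}{k} \right\}.$$
We have $E(1, k) \subset E(2, k) \subset E(3, k) \subset \cdots$, and $\bigcup_{m \in \mathbb{N}} E(m, k) = X$. As a consequence, given $\varepsilon > 0$, to each $k \in \mathbb{N}$ there corresponds an integer $m_k \in \mathbb{N}$ such that
$$\mu\left( X \setminus \bigcap_{n > m_k} \left\{ x \in X \;\middle|\; |f_n(x) - f(x)| < \frac{1}{k} \right\} \right) < \frac{\varepsilon}{2^k}.$$
If we set
$$E = \bigcap_{k \in \mathbb{N}} \bigcap_{n > m_k} \left\{ x \in X \;\middle|\; |f_n(x) - f(x)| < \frac{1}{k} \right\},$$
it follows that $\mu(X \setminus E) \le \sum_{k=1}^{\infty} \varepsilon \cdot 2^{-k} = \varepsilon$ and $f_n \to f$ uniformly on $E$. The proof is complete.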
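Exercise 16.8 When did we use the assumption $\mu(X) < +\infty$ in the previous proof?
Definition 16.14 A sequence $\{f_n\}_n$ of measurable functions converges in measure to the limit $f$ if and only if for every $\varepsilon > 0$ there results
$$\lim_{n \to +\infty} \mu\left( \{ x \in X \mid |f_n(x) - f(x)| > \varepsilon \} \right) = 0.$$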
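Example 16.3 Let $X = [0, +\infty)$ with the concrete Lebesgue measure, and define $f_n$ on $X$ such that
$$f_n(x) = \begin{cases} 0 & \text{if } 0 \le x < n \\ 1 & \text{if } n \le x < +\infty. \end{cases}$$
Then $f_n \to 0$ pointwise in $X$, but it does not converge to zero in measure. Indeed, for $0 < \varepsilon < 1$, $f_n(x) > \varepsilon$ if and only if $x \ge n$, and the half-line $[n, +\infty)$ has infinite measure.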
Theorem 16.19 If $f_n \to f$ in measure, then there exists a subsequence $\{f_{n_k}\}_k$ which converges to $f$ almost everywhere. If, in addition, $\mu(X) < +\infty$, then convergence almost everywhere implies convergence in measure.
Proof Pick a sequence $n_1 < n_2 < n_3 < \dots$ of positive integers such that the measure of the set
$$E_k = \left\{ x \in X \;\middle|\; |f_{n_k}(x) - f(x)| > \frac{1}{k} \right\}$$
and
$$E = \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} E_k$$
we see that $f(x) = \lim_{k \to +\infty} f_{n_k}(x)$ for every $x \in X \setminus E$, and the first assertion is proved.
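To complete the proof, we assume that $X$ has finite measure and we use Theorem 16.18. For each $\eta > 0$ there exists a measurable set $E$ such that $\mu(X \setminus E) < \eta$ and $f_n \to f$ uniformly on $E$. Then
$$\int_E f \, d\mu = \int_E s \, d\mu + \int_E (f - s) \, d\mu \le \int_E s \, d\mu + \frac{\varepsilon}{2} \le M \mu(E) + \frac{\varepsilon}{2}.$$
$$\int_{X \setminus X_0} f \, d\mu = \int_{X \setminus X_0} (f - s) \, d\mu \le \int_X (f - s) \, d\mu < \varepsilon.$$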
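Proof Trivially, $|f_n - f| \le |f_n| + |f|$. If $X_0$ and $X_1$ are measurable sets with $X_1 \subset X_0$, then
$$\left| \int_X (f_n - f) \, d\mu \right| \le \int_{X_1} |f_n - f| \, d\mu + \int_{X_0 \setminus X_1} (|f_n| + |f|) \, d\mu + \int_{X \setminus X_0} (|f_n| + |f|) \, d\mu.$$
Fix $\varepsilon > 0$. By Proposition 16.3 and the first condition of equi-integrability, there exists a measurable set $X_0 \subset X$ of finite measure such that
$$\int_{X \setminus X_0} (|f_n| + |f|) \, d\mu = \int_{X \setminus X_0} |f_n| \, d\mu + \int_{X \setminus X_0} |f| \, d\mu < \frac{\varepsilon}{3}.$$
By Proposition 16.3 again, we may find a number $\delta > 0$ such that for every measurable $E \subset X$, $\mu(E) < \delta$ implies
$$\int_E (|f_n| + |f|) \, d\mu = \int_E |f_n| \, d\mu + \int_E |f| \, d\mu < \frac{\varepsilon}{3}.$$
$$\int_{X_0 \setminus X_1} (|f_n| + |f|) \, d\mu < \frac{\varepsilon}{3}$$
$$\int_{X_1} |f_n - f| \, d\mu \le \sup \{ |f_n(x) - f(x)| \mid x \in X_1 \} \, \mu(X_1) < \frac{\varepsilon}{3}.$$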
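Theorem 16.21 $\Sigma \times \mathcal{T}$ is the smallest monotone class which contains all elementary sets.
Proof Let $\mathcal{M}$ be the intersection of all monotone classes which contain $\mathcal{E}$. It is clear that $\mathcal{M} \ne \emptyset$, since $X \times Y$ is a monotone class which contains $\mathcal{E}$. But $\Sigma \times \mathcal{T}$ is a monotone class, so that $\mathcal{M} \subset \Sigma \times \mathcal{T}$. For $A_1 \in \Sigma$, $A_2 \in \Sigma$, $B_1 \in \mathcal{T}$, $B_2 \in \mathcal{T}$ we have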
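$$E_x = \{ y \in Y \mid (x, y) \in E \}, \qquad E^y = \{ x \in X \mid (x, y) \in E \}.$$
$$E_x = \begin{cases} B & \text{if } x \in A \\ \emptyset & \text{if } x \notin A. \end{cases}$$
$$f_x = f(x, \cdot), \qquad f^y = f(\cdot, y).$$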
16.5 Measure Theory on Product Spaces 421
Theorem 16.22 Let $(X, \Sigma, \mu)$ and $(Y, \mathcal{T}, \lambda)$ be $\sigma$-finite measurable spaces. Suppose that $Q \in \Sigma \times \mathcal{T}$. If
$$\varphi(x) = \lambda(Q_x), \qquad \psi(y) = \mu(Q^y) \tag{16.1}$$
Proof We call $\Omega$ the collection of all subsets $Q$ of $X \times Y$ for which the conclusion of the theorem holds.
(a) Every measurable rectangle belongs to $\Omega$. Indeed, if $Q = A \times B$ is a measurable rectangle, then $\lambda(Q_x) = \lambda(B) \chi_A(x)$, $\mu(Q^y) = \mu(A) \chi_B(y)$, and both integrals are equal to $\mu(A) \lambda(B)$.
(b) If $Q_1 \subset Q_2 \subset Q_3 \subset \dots$, if $Q_i \in \Omega$ for every $i$ and if $Q = \bigcup_{i=1}^{\infty} Q_i$, then $Q \in \Omega$. Indeed, let $\varphi_i$ and $\psi_i$ be defined as in (16.1) with $Q_i$ instead of $Q$. It follows from the countable additivity of each measure that $\varphi_i \uparrow \varphi$, $\psi_i \uparrow \psi$ pointwise as $i \to +\infty$. Beppo Levi's Theorem yields the conclusion.
(c) If $\{Q_i\}_i$ is a disjoint countable collection of elements of $\Omega$, and if $Q = \bigcup_{i=1}^{\infty} Q_i$, then $Q \in \Omega$. Indeed, this is trivial for a finite collection, since the characteristic function of a union of disjoint sets is just the sum of their characteristic functions. In the general case we may use (b).
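(d) If $\mu(A) < +\infty$, $\lambda(B) < +\infty$, if $A \times B \supset Q_1 \supset Q_2 \supset Q_3 \supset \dots$, if $Q = \bigcap_{i=1}^{\infty} Q_i$ and $Q_i \in \Omega$ for each $i$, then $Q \in \Omega$. Indeed, we can repeat the proof of (b) after replacing Beppo Levi's Theorem with the Dominated Convergence Theorem.
Now, recall that $X = \bigcup_{n=1}^{\infty} X_n$ and $Y = \bigcup_{m=1}^{\infty} Y_m$ for suitable disjoint sets $X_n$, $Y_m$ of finite measure. For every $m \in \mathbb{N}$, $n \in \mathbb{N}$, we define $Q_{mn} = Q \cap (X_n \times Y_m)$. Let us agree that $Q \in \mathcal{M}$ if and only if $Q_{mn} \in \Omega$ for every $m$ and $n$. We have proved in (b) and (d) that $\mathcal{M}$ is a monotone class. Furthermore $\mathcal{E}$ is contained in $\mathcal{M}$ by (a) and (c). Since $\mathcal{M} \subset \Sigma \times \mathcal{T}$, it follows from Theorem 16.21 that $\mathcal{M} = \Sigma \times \mathcal{T}$.
We have proved that for every $Q \in \Sigma \times \mathcal{T}$ we have $Q_{mn} \in \Omega$ for every choice of $m$ and $n$. But $Q = \bigcup_{m,n=1}^{\infty} Q_{mn}$, and the sets $Q_{mn}$ are disjoint. Hence (c) implies that $Q \in \Omega$. The proof is complete.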
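When $X$ and $Y$ are finite sets carrying point masses, the two iterated integrals of the section measures $\lambda(Q_x)$ and $\mu(Q^y)$ reduce to finite double sums, and their equality can be checked by hand or by machine. The sketch below works under that assumption only; the dictionaries `mu` and `lam`, the set `Q` and the helper `iterated_measures` are hypothetical.

```python
def iterated_measures(Q, mu, lam):
    """Integrate the section measures of Q ⊂ X × Y in both orders.

    mu and lam are dictionaries of point masses on the finite sets X and Y.
    """
    # phi(x) = lam(Q_x), integrated against mu
    int_phi = sum(mu[x] * sum(lam[y] for y in lam if (x, y) in Q) for x in mu)
    # psi(y) = mu(Q^y), integrated against lam
    int_psi = sum(lam[y] * sum(mu[x] for x in mu if (x, y) in Q) for y in lam)
    return int_phi, int_psi

mu = {0: 0.5, 1: 1.0, 2: 2.0}          # hypothetical measure on X = {0, 1, 2}
lam = {"a": 0.25, "b": 4.0}            # hypothetical measure on Y = {"a", "b"}
Q = {(0, "a"), (1, "a"), (1, "b"), (2, "b")}   # not a rectangle
int_phi, int_psi = iterated_measures(Q, mu, lam)
print(int_phi, int_psi)
```

Both sums add the products of the point masses over the pairs in $Q$, only in a different order, which is the finite counterpart of the common value of the two integrals.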
Exercise 16.10 Prove that $\mu \times \lambda$ is indeed a measure on $(X \times Y, \Sigma \times \mathcal{T})$, and that it is a $\sigma$-finite measure.
Remark 16.3 We notice that we have defined the product measure $\mu \times \lambda$ by means of two (equal) iterated integrals. This might look strange, since the integral should come after the measure. It would be possible to define $\mu \times \lambda$ without any reference to integration, and we would all agree that $(\mu \times \lambda)(A \times B) = \mu(A) \lambda(B)$ for any measurable rectangle; but a valid formula for an arbitrary element of $\Sigma \times \mathcal{T}$ would be much harder to guess.
The equality of the two integrals in Theorem 16.22 suggests a further step. This
is the content of a celebrated result that we discussed from an abstract viewpoint in
Daniell’s approach to integration theory.
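Theorem 16.23 (Fubini-Tonelli) Let $(X, \Sigma, \mu)$ and $(Y, \mathcal{T}, \lambda)$ be $\sigma$-finite measurable spaces, and let $f$ be a $(\Sigma \times \mathcal{T})$-measurable function on $X \times Y$.
(a) If $0 \le f \le +\infty$, and if
$$\varphi(x) = \int_Y f_x \, d\lambda, \qquad \psi(y) = \int_X f^y \, d\mu$$
then
$$\int_X \varphi \, d\mu = \int_{X \times Y} f \, d(\mu \times \lambda) = \int_Y \psi \, d\lambda.$$
(b) If $f$ is real-valued, if
$$\varphi^*(x) = \int_Y |f|_x \, d\lambda$$
and if $\int_X \varphi^* \, d\mu < +\infty$, then $f \in L^1(X \times Y)$.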
16.6 Measure, Topology, and the Concrete Lebesgue Measure 423
$$\varphi(x) = \int_Y f_x \, d\lambda, \qquad \psi(y) = \int_X f^y \, d\mu.$$
Proof We consider (a). The conclusion holds if $f$ is a simple function. In the general case, there exists a sequence $\{s_n\}_n$ of non-negative simple functions such that $s_n \uparrow f$ pointwise on $X \times Y$. We associate to each $s_n$ a function $\varphi_n$ in the same way as we associated $\varphi$ to $f$. Then
$$\int_X \varphi_n \, d\mu = \int_{X \times Y} s_n \, d(\mu \times \lambda)$$
$$K \subset V_1 \subset \overline{V_1} \subset V_0 \subset \overline{V_0} \subset V.$$
Suppose $n \ge 2$ and that $V_{r_1}$, $V_{r_2}$, ..., $V_{r_n}$ have been chosen so that $r_i < r_j$ implies $\overline{V_{r_j}} \subset V_{r_i}$. Among the rational numbers $r_1, \dots, r_n$, choose the largest one, say $r_i$, such that $r_i < r_{n+1}$, and the smallest one, say $r_j$, such that $r_j > r_{n+1}$. Arguing again as before, we select an open set $V_{r_{n+1}}$ such that
By induction we construct a countable family $\{V_r \mid r \in \mathbb{Q} \cap [0, 1]\}$ of open sets, such that $K \subset V_1$, $\overline{V_0} \subset V$, each $V_r$ has compact closure, and $s > r$ implies $\overline{V_s} \subset V_r$. Define
$$f_r(x) = \begin{cases} r & \text{if } x \in V_r \\ 0 & \text{otherwise,} \end{cases} \qquad g_s(x) = \begin{cases} 1 & \text{if } x \in \overline{V_s} \\ s & \text{otherwise,} \end{cases}$$
and $f = \sup \{ f_r \mid r \in \mathbb{Q} \cap [0, 1] \}$, $g = \inf \{ g_s \mid s \in \mathbb{Q} \cap [0, 1] \}$. It is easy to check that $f$ is lower semicontinuous and that $g$ is upper semicontinuous. Clearly $0 \le f \le 1$, $f(x) = 1$ if $x \in K$, and the support of $f$ is contained in $V$. To conclude, we prove that $f = g$, getting the continuity of $f$.
The inequality $f_r(x) > g_s(x)$ is possible only if $r > s$, $x \in V_r$, and $x \notin \overline{V_s}$. But $r > s$ implies $V_r \subset V_s$. Hence $f_r \le g_s$ for all $r$ and $s$, which yields $f \le g$.
Suppose that $f(x) < g(x)$ for some $x$. There exist rational numbers $r$ and $s$ such that $f(x) < r < s < g(x)$. The inequality $f(x) < r$ implies $x \notin V_r$. The inequality $g(x) > s$ implies $x \in \overline{V_s}$. This clearly contradicts the property that $s > r$ implies $\overline{V_s} \subset V_r$.
We conclude that .f = g, and the proof is complete.
Theorem 16.25 (Finite Partition of Unity) Suppose $V_1, \dots, V_n$ are open subsets of a locally compact Hausdorff space $X$. If $K$ is compact and $K \subset V_1 \cup \cdots \cup V_n$, then there exist functions $h_i \prec V_i$, $i = 1, \dots, n$, such that $\sum_{i=1}^{n} h_i(x) = 1$ for every $x \in K$.
Proof For every $x \in K$ there exists an open neighborhood $W_x$ with compact closure $\overline{W_x} \subset V_i$ for some index $i$ which depends on $x$. By compactness, we select points $x_1, \dots, x_m$ such that $K \subset W_{x_1} \cup \cdots \cup W_{x_m}$. For $1 \le i \le n$, let
$$H_i = \bigcup \left\{ \overline{W_{x_j}} \;\middle|\; \overline{W_{x_j}} \subset V_i \right\}.$$
By Theorem 16.24, there exist functions $g_i$ such that $H_i \prec g_i \prec V_i$. We now define
$$\begin{aligned}
h_1 &= g_1 \\
h_2 &= (1 - g_1) g_2 \\
&\;\;\vdots \\
h_n &= (1 - g_1)(1 - g_2) \cdots (1 - g_{n-1}) g_n.
\end{aligned}$$
$$\sum_{i=1}^{n} h_i = 1 - (1 - g_1) \cdots (1 - g_n).$$
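The telescoping identity $\sum_i h_i = 1 - \prod_i (1 - g_i)$ can be checked at a single point $x$, where only the numbers $g_i(x) \in [0, 1]$ matter. The sketch below makes exactly that assumption; `partition_from_bumps` is a hypothetical helper and the sample values are arbitrary.

```python
import math

def partition_from_bumps(g_values):
    """Given g_1(x), ..., g_n(x) in [0, 1] at one point x, return the values
    h_i(x) = (1 - g_1(x)) ... (1 - g_{i-1}(x)) g_i(x)."""
    h, remaining = [], 1.0
    for g in g_values:
        h.append(remaining * g)      # product of the earlier (1 - g_j), times g_i
        remaining *= 1.0 - g         # update the running product
    return h

g_values = [0.25, 0.5, 1.0, 0.75]    # hypothetical values g_i(x); g_3(x) = 1
h = partition_from_bumps(g_values)
identity_rhs = 1.0 - math.prod(1.0 - g for g in g_values)
print(h, sum(h), identity_rhs)
```

As soon as one factor $g_i(x)$ equals 1 the product vanishes and the $h_i(x)$ sum to 1; this is what happens at every point of $K$, because each such point lies in some $H_i$.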
2 No continuity is assumed here. The term positive means that $f \in C_c(X)$ and $f(X) \subset [0, +\infty)$ imply $\Lambda f \ge 0$.
holds for every open set E and for every measurable set E of finite measure.
(e) The measure .µ is complete.
Proof The proof is rather long, so we fix some general conventions: $K$ will always denote a compact subset of $X$, and $V$ an open subset of $X$.
The uniqueness of the measure $\mu$ is rather easy. Indeed, properties (c) and (d) imply that $\mu$ is determined by its values on compact sets. Let $\mu_1$ and $\mu_2$ be two measures for which the theorem holds; we need to show that $\mu_1(K) = \mu_2(K)$ for every $K$. Fix any such $K$, and let $\varepsilon > 0$. By (b) and (c) there is $V \supset K$ such that $\mu_2(V) < \mu_2(K) + \varepsilon$. Theorem 16.24 provides a function $f$ such that $K \prec f \prec V$, hence
$$\mu_1(K) = \int_X \chi_K \, d\mu_1 \le \int_X f \, d\mu_1 = \Lambda f = \int_X f \, d\mu_2$$
Letting $\varepsilon \to 0$ we find $\mu_1(K) \le \mu_2(K)$. Exchanging $\mu_1$ and $\mu_2$ yields $\mu_1(K) = \mu_2(K)$. Hence uniqueness of $\mu$ is proved.
We now construct $\Sigma$ and $\mu$. For every open $V$, define
for every $E \subset X$. This definition agrees with the previous one if $E$ is open. But beware! Our set function $\mu$ is not countably additive on the whole $2^X$, and for this reason we need to introduce a good $\sigma$-algebra. Let $\mathcal{F}$ be the collection of all $E \subset X$ such that $\mu(E)$ is finite and
Our $\sigma$-algebra is
The rest of the proof consists in showing that $\Sigma$ and $\mu$ have the desired properties. Evidently $\mu$ is monotone, i.e. $\mu(A) \le \mu(B)$ if $A \subset B$, and $\mu(E) = 0$ implies $E \in \mathcal{F}$ and then $E \in \Sigma$. Hence $\mu$ is a complete measure, and (c) holds true by definition.
We first prove that $\mu(V_1 \cup V_2) \le \mu(V_1) + \mu(V_2)$ if $V_1$ and $V_2$ are open sets. Pick any $g \prec V_1 \cup V_2$. There exists a partition of unity consisting of two functions $h_1$, $h_2$ such that $h_i \prec V_i$ and $h_1(x) + h_2(x) = 1$ for all $x \in \operatorname{supp} g$. Hence $h_i g \prec V_i$,
Since $g$ was arbitrary, it follows that $\mu(V_1 \cup V_2) \le \mu(V_1) + \mu(V_2)$. Let us now consider the general case, and we assume without loss of generality that $\mu(E_i)$ is finite for every index $i$ (otherwise the inequality is trivial). Let $\varepsilon > 0$, and consider open sets $V_i \supset E_i$ such that
$$\mu(V_i) \le \mu(E_i) + \frac{\varepsilon}{2^i}.$$
We set $V = \bigcup_{i=1}^{\infty} V_i$ and we choose any $f \prec V$. This says that $f$ has compact support, so that $f \prec V_1 \cup \cdots \cup V_n$ for some $n$. Iterating the previous case we get
$$\le \sum_{i=1}^{\infty} \mu(E_i) + \varepsilon.$$
Since $f$ was arbitrary and $\bigcup_{i=1}^{\infty} E_i \subset V$, we may conclude that
$$\mu\left( \bigcup_{i=1}^{\infty} E_i \right) \le \mu(V) \le \sum_{i=1}^{\infty} \mu(E_i) + \varepsilon.$$
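Of course this claim proves part (b) of the theorem. So, let $K \prec f$ and $0 < \alpha < 1$. Define $V_\alpha = \{ x \in X \mid f(x) > \alpha \}$. It is clear that $K \subset V_\alpha$ and that $\alpha g \le f$ as soon as $g \prec V_\alpha$. As a consequence
$$\mu(K) \le \mu(V_\alpha) = \sup \{ \Lambda g \mid g \prec V_\alpha \} \le \frac{1}{\alpha} \Lambda f.$$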
Therefore $\Lambda f \le \mu(V) < \mu(K) + \varepsilon$, and the proof of the claim is complete.
Claim 3. Every open set $V$ satisfies (16.2). Hence $\mathcal{F}$ contains every open set of finite measure. Indeed, we fix a real number $\alpha$ such that $\alpha < \mu(V)$, and a function $f \prec V$ such that $\alpha < \Lambda f$. If $W$ is any open set such that $K = \operatorname{supp} f \subset W$, then $f \prec W$ and $\Lambda f \le \mu(W)$. Hence $\Lambda f \le \mu(K)$, and this argument provides a compact $K \subset V$ such that $\alpha < \mu(K)$, so that (16.2) holds for $V$.
Claim 4. Suppose that $E_i \in \mathcal{F}$ for $i \in \mathbb{N}$, and that $E_i \cap E_j = \emptyset$ when $i \ne j$. If $E = \bigcup_{i=1}^{\infty} E_i$, then $\mu(E) = \sum_{i=1}^{\infty} \mu(E_i)$. If, in addition, $\mu(E)$ is finite, then also $E \in \mathcal{F}$.
We first prove that $\mu(K_1 \cup K_2) = \mu(K_1) + \mu(K_2)$ if $K_1$ and $K_2$ are disjoint compact sets. Let $\varepsilon > 0$. Urysohn's Lemma provides $f \in C_c(X)$ such that $f \equiv 1$ on $K_1$, $f \equiv 0$ on $K_2$, and $0 \le f \le 1$. By Claim 2 there exists a function $g$ such that $K_1 \cup K_2 \prec g$ and $\Lambda g < \mu(K_1 \cup K_2) + \varepsilon$. We remark that $K_1 \prec fg$ and $K_2 \prec (1 - f) g$. The linearity of $\Lambda$ implies
$$\mu(H_i) > \mu(E_i) - \frac{\varepsilon}{2^i}.$$
The set $K_n = H_1 \cup \cdots \cup H_n$ is compact, and we deduce by induction that
$$\mu(E) \ge \mu(K_n) = \sum_{i=1}^{n} \mu(H_i) > \sum_{i=1}^{n} \mu(E_i) - \varepsilon.$$
Letting first $n \to +\infty$ and then $\varepsilon \to 0$, we see that $\mu(E) \ge \sum_{i=1}^{\infty} \mu(E_i)$, and the conclusion follows from Claim 1. To prove that $E \in \mathcal{F}$ we recall the definition of convergent sequence, and deduce that
$$\mu(E) \le \sum_{i=1}^{N} \mu(E_i) + \varepsilon$$
for some positive integer $N$. Hence $\mu(E) \le \mu(K_N) + 2\varepsilon$, so that $E$ satisfies (16.2). The claim is now proved.
Claim 5. If $E \in \mathcal{F}$ and $\varepsilon > 0$, there exist a compact set $K$ and an open set $V$ such that $K \subset E \subset V$ and $\mu(V \setminus K) < \varepsilon$.
Indeed, our definitions show that there are $K \subset E$ and $V \supset E$ such that
$$\mu(V) - \frac{\varepsilon}{2} < \mu(E) < \mu(K) + \frac{\varepsilon}{2}.$$
Now, $V \setminus K$ is open and $V \setminus K \in \mathcal{F}$ by Claim 3. Hence Claim 4 yields
$$E \subset (E \cap K) \cup (V \setminus K)$$
implies
which implies $E \in \mathcal{F}$. The claim is proved, and part (d) of the theorem as well.
Claim 9. $\mu$ is a measure on $\Sigma$. Indeed, this follows at once from Claims 4 and 8.
Claim 10. $\mu$ represents $\Lambda$ in the sense that $\Lambda f = \int_X f \, d\mu$ for every $f \in C_c(X)$. This is part (a) of the theorem.
Indeed, we observe that it suffices to prove the inequality $\Lambda f \le \int_X f \, d\mu$, since by linearity
and the equality follows. So, we call $K$ the support of $f \in C_c(X)$, and let $[a, b]$ be an interval which contains the range of $f$. For every $\varepsilon > 0$ we choose points $y_0$, $y_1$, ..., $y_n$ such that $y_i - y_{i-1} < \varepsilon$ and $y_0 < a < y_1 < \dots < y_n = b$. We
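$$\mu(V_i) < \mu(E_i) + \frac{\varepsilon}{n}$$
and such that $f(x) < y_i + \varepsilon$ for every $x \in V_i$. We introduce a partition of unity $\{h_i\}_i$ such that $h_i \prec V_i$ and $\sum_{i=1}^{n} h_i = 1$ on $K$. Hence $f = \sum_{i=1}^{n} f h_i$ and Claim 2 yields
$$\mu(K) \le \Lambda\left( \sum_{i=1}^{n} h_i \right) = \sum_{i=1}^{n} \Lambda h_i.$$
Observing that $h_i f \le (y_i + \varepsilon) h_i$ and that $y_i - \varepsilon < f(x)$ for $x \in E_i$, we have
$$\begin{aligned}
\Lambda f &= \sum_{i=1}^{n} \Lambda(h_i f) \le \sum_{i=1}^{n} (y_i + \varepsilon) \Lambda h_i \\
&= \sum_{i=1}^{n} (|a| + y_i + \varepsilon) \Lambda h_i - |a| \sum_{i=1}^{n} \Lambda h_i \\
&\le \sum_{i=1}^{n} (|a| + y_i + \varepsilon) \left( \mu(E_i) + \frac{\varepsilon}{n} \right) - |a| \mu(K) \\
&= \sum_{i=1}^{n} (y_i - \varepsilon) \mu(E_i) + 2\varepsilon \mu(K) + \frac{\varepsilon}{n} \sum_{i=1}^{n} (|a| + y_i + \varepsilon) \\
&\le \int_X f \, d\mu + \varepsilon \left( 2\mu(K) + |a| + b + \varepsilon \right).
\end{aligned}$$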
The arbitrariness of $\varepsilon > 0$ proves that $\Lambda f \le \int_X f \, d\mu$. The theorem is completely proved.
The Riesz Representation Theorem will be our access point to the concrete Lebesgue measure in $\mathbb{R}^n$. Although our approach might be considered rather abstract, it has some advantages over the usual approach via an outer measure and a Carathéodory completion.
Definition 16.21 A measure $\mu$ defined on the $\sigma$-algebra of Borel sets in a locally compact Hausdorff space $X$ is called a Borel measure on $X$.
Definition 16.22 A Borel set $E$ is outer regular if
Finally, the measure $\mu$ is regular if every Borel set is both inner and outer regular.
Remark 16.4 An inspection of Theorem 16.26 shows that the measure induced by the positive linear functional $\Lambda$ is not regular, in general. Indeed, inner regularity is guaranteed only for open sets and for Borel sets of finite measure.
Definition 16.23 A set $E$ in a topological space is $\sigma$-compact if $E$ is a countable union of compact sets.
We can now show that $\sigma$-compactness fills the gap of inner regularity.
Theorem 16.27 Let $X$ be a locally compact, $\sigma$-compact Hausdorff space, and let $\Sigma$, $\mu$ be defined according to Theorem 16.26.
(i) If $E \in \Sigma$ and $\varepsilon > 0$, there exist a closed set $F$ and an open set $V$ such that $F \subset E \subset V$ and $\mu(V \setminus F) < \varepsilon$.
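that
$$\mu\left( V_n \setminus (K_n \cap E) \right) < \frac{\varepsilon}{2^{n+1}}$$
for $n \in \mathbb{N}$. We define $V = \bigcup_{n=1}^{\infty} V_n$, so that $V \setminus E \subset \bigcup_{n=1}^{\infty} (V_n \setminus (K_n \cap E))$ and $\mu(V \setminus E) < \varepsilon/2$. The very same construction applies to $X \setminus E$ in place of $E$, yielding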
an open set $W \supset X \setminus E$ such that $\mu(W \setminus (X \setminus E)) < \varepsilon/2$. With $F = X \setminus W$ we get $F \subset E$ and $E \setminus F = W \setminus (X \setminus E)$. Conclusion (i) follows at once.
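$$\int_X f \, d\lambda = \int_X f \, d\mu$$
$$\lambda(V) = \lim_{n \to +\infty} \int_X g_n \, d\lambda = \lim_{n \to +\infty} \int_X g_n \, d\mu = \mu(V).$$
For a generic Borel set $E$, we fix any $\varepsilon > 0$. By Theorem 16.27, there exist a closed set $F$ and an open set $V$ such that $F \subset E \subset V$ and $\mu(V \setminus F) < \varepsilon$. In particular $\mu(V) \le \mu(F) + \varepsilon \le \mu(E) + \varepsilon$. We apply the previous considerations to the open set $V \setminus F$, and we get $\lambda(V \setminus F) < \varepsilon$ and thus $\lambda(V) < \lambda(E) + \varepsilon$. We conclude that
Hence $|\lambda(E) - \mu(E)| < \varepsilon$ for every $\varepsilon > 0$, and therefore $\mu(E) = \lambda(E)$. The proof is complete.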
Since continuous functions with compact support appear as the basic ingredient
of the Riesz Representation Theorem, we investigate their role in Measure Theory.
$$f(x) = \sum_{n=1}^{\infty} t_n(x), \qquad x \in X.$$
Fix an open set $V$ such that $A \subset V$ and $\overline{V}$ is compact. There exist compact sets $K_n$ and open sets $V_n$ such that $K_n \subset T_n \subset V_n$ and $\mu(V_n \setminus K_n) < \varepsilon/2^n$. By Urysohn's Lemma there are functions $h_n$ such that $K_n \prec h_n \prec V_n$. We define
$$g(x) = \sum_{n=1}^{\infty} \frac{h_n(x)}{2^n}$$
To remove the boundedness condition on $f$, we set $B_n = \{ x \in X \mid |f(x)| > n \}$, so that $\bigcap_{n=1}^{\infty} B_n = \emptyset$. Hence $\mu(B_n) \to 0$ as $n \to +\infty$. But $f$ coincides with the bounded function $(1 - \chi_{B_n}) f$ except on $B_n$, and the proof of the first statement of the theorem is complete in the general case.
$$\varphi(t) = \begin{cases} t & \text{if } -R \le t \le R \\ \dfrac{Rt}{|t|} & \text{if } |t| > R. \end{cases}$$
The function $\varphi$ is continuous, and if $g$ satisfies the first statement of the theorem,
then $g_1 = \varphi \circ g$ satisfies the second statement as well. The proof is complete.
From now on, we want to construct the concrete Lebesgue measure using Theorem
16.26 as a starting point. This allows us to deduce some additional (regularity)
properties of the Lebesgue measure almost for free.
Due to some possible conflict with the index of several sequences, we will denote
by $k \ge 1$ the dimension of our basic Euclidean space $\mathbb{R}^k$. Let us recall that a $k$-cell
is any set of the form
$$W = \left\{ x = (x_1, \ldots, x_k) \in \mathbb{R}^k \mid \alpha_i < x_i < \beta_i, \ 1 \le i \le k \right\},$$
where $\alpha_i$ and $\beta_i$ are given real numbers. Sometimes we can replace some or all
inequality signs $<$ by $\le$: we already suspect that the difference will be a set of
measure zero. Finally, recall that the volume of the $k$-cell $W$ is defined to be
$$\operatorname{Vol}(W) = \prod_{i=1}^{k} (\beta_i - \alpha_i).$$
For our purposes, it will be better to replace the basic spherical neighborhood
$B(a, \delta)$ by a box. More precisely, for $a = (a_1, \ldots, a_k) \in \mathbb{R}^k$ and $\delta > 0$, we define
$$Q(a, \delta) = \{ x \mid a_i \le x_i < a_i + \delta, \ 1 \le i \le k \}.$$
Definition 16.24 For $n \in \mathbb{N}$ we define $P_n$ as the set of all points $x \in \mathbb{R}^k$ whose
coordinates are integral multiples of $2^{-n}$, and we define $\Omega_n$ as the collection of all
$2^{-n}$-boxes with corners at points of $P_n$.
Proof Let $V$ be an open set. Every point $x \in V$ lies in an open ball which lies, in
turn, in $V$. Hence $x \in Q \subset V$ for some suitable $Q$ which belongs to some $\Omega_n$.
Equivalently, $V$ is the union of all boxes which lie in $V$ and which belong to some
$\Omega_n$.
From this collection of boxes we first select those boxes belonging to $\Omega_1$, and
we remove those in $\Omega_2, \Omega_3, \ldots$ which lie in any of the selected boxes. From the
remaining collection, we select those boxes of $\Omega_2$ which lie in $V$, and we remove
those in $\Omega_3, \Omega_4, \ldots$ which lie in any of the selected boxes. This procedure provides
a countable collection of disjoint boxes in $\Omega_1 \cup \Omega_2 \cup \Omega_3 \cup \cdots$ whose union is $V$.
The proof is complete.
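The selection procedure of the proof can be tried out in dimension $k = 1$. The following Python sketch is only an illustration under stated assumptions: the open set is a single interval $(a, b)$, the levels are truncated at a hypothetical parameter `max_level`, and the name `dyadic_decomposition` is ours. It relies on the fact that two dyadic boxes are either nested or disjoint, so a box is discarded exactly when it lies inside a box selected at an earlier level.

```python
from fractions import Fraction
from math import floor, ceil


def dyadic_decomposition(a, b, max_level):
    """Select disjoint half-open dyadic boxes [j/2^n, (j+1)/2^n) inside (a, b).

    Levels n = 1, ..., max_level are visited in increasing order; a box is kept
    only if it lies in (a, b) and is not contained in a box selected earlier.
    """
    a, b = Fraction(a), Fraction(b)
    selected = []
    for n in range(1, max_level + 1):
        side = Fraction(1, 2 ** n)
        for j in range(floor(a / side), ceil(b / side)):
            left, right = j * side, (j + 1) * side
            inside = a < left and right <= b
            covered = any(L <= left and right <= R for L, R in selected)
            if inside and not covered:
                selected.append((left, right))
    return selected


boxes = dyadic_decomposition(0, 1, 10)
total_length = sum(right - left for left, right in boxes)
```

For the interval $(0, 1)$ the sketch keeps one box per level, and the selected boxes exhaust the interval up to a remainder of length $2^{-10}$; letting `max_level` grow recovers the whole open set, as in the proof.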
The next theorem defines the concrete Lebesgue measure in $\mathbb{R}^k$.
Theorem 16.31 There exists a positive complete measure $m$ defined on a $\sigma$-algebra
$\mathcal{M}$ of $\mathbb{R}^k$, with the following properties:
Proof Our proof, as we said, constructs $\mathcal{M}$ and $m$ via the Riesz Representation
Theorem. Clearly enough, we need a positive linear functional to begin with. This
functional is precisely the Riemann (or Cauchy) integral in $\mathbb{R}^k$, which was already
studied in Sect. 15.1. For the reader’s sake we recall here the basic ideas.
Let $f \in C_c(\mathbb{R}^k)$, and define for $n \in \mathbb{N}$ the functional
$$\Lambda_n f = \frac{1}{2^{nk}} \sum_{x \in P_n} f(x),$$
where $P_n$ was introduced in Definition 16.24. Let $W$ be an open $k$-cell containing
the support of $f$. By uniform continuity, there exist an integer $N$ and functions $g$,
$h$ with support in $W$, such that (i) $g$ and $h$ are constant on each box of $\Omega_N$, (ii)
$g \le f \le h$, (iii) $h - g < \varepsilon$.
$$\Lambda_N g = \Lambda_n g \le \Lambda_n f \le \Lambda_n h = \Lambda_N h$$
which implies that the limit $\Lambda f = \lim_{n \to +\infty} \Lambda_n f$ exists (as a finite real number).
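The convergence of the sums $\Lambda_n f$ can be observed numerically. The sketch below is a minimal illustration in dimension $k = 1$; the test function `bump` (equal to $(1 - x^2)^2$ on $[-1, 1]$ and zero elsewhere), the helper name `riemann_functional` and its parameter `radius` are assumptions made for the example. For this choice the limit is $\int_{-1}^{1} (1 - x^2)^2 \, dx = 16/15$.

```python
def riemann_functional(f, n, radius):
    """Lambda_n f = 2^{-n} * (sum of f over the dyadic grid P_n), for k = 1.

    The support of f is assumed to lie in [-radius, radius], so that only
    finitely many grid points contribute to the sum.
    """
    step = 2.0 ** (-n)
    m = int(radius * 2 ** n)
    return step * sum(f(j * step) for j in range(-m, m + 1))


def bump(x):
    # continuous, compactly supported test function (our choice)
    return (1.0 - x * x) ** 2 if abs(x) < 1.0 else 0.0


values = [riemann_functional(bump, n, radius=1) for n in range(1, 11)]
```

The list `values` contains $\Lambda_1 f, \ldots, \Lambda_{10} f$; the entries approach $16/15$ as $n$ grows.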
As a simple exercise, the reader can prove that $\Lambda$ is a positive linear functional on
$C_c(\mathbb{R}^k)$. The $\sigma$-algebra $\mathcal{M}$ and the measure $m$ are now defined according to Theorem
$$\Lambda g_r = \int_{\mathbb{R}^k} g_r \, dm \to m(W)$$
by Beppo Levi’s Theorem, observing that $g_r \nearrow \chi_W$. Thus $m(W) = \operatorname{Vol}(W)$ for
every open $k$-cell $W$. Since any $k$-cell is the intersection of a decreasing sequence
of open $k$-cells, the proof of (a) is complete.
Let us now remark the following: if $\lambda$ is a Borel measure on $\mathbb{R}^k$ and $\lambda(E) = m(E)$
for every box $E$, then the same equality holds for every open set $E$. This is indeed
a consequence of Theorem 16.30. Once this is established, it follows from Theorem
16.28 that the equality holds for every Borel set, since $\lambda$ and $m$ are regular.
Consider statement (c). Let $x \in \mathbb{R}^k$ and define $\lambda(E) = m(E + x)$. Clearly $\lambda$ is a
measure, and (a) implies that $\lambda$ coincides with $m$ on all boxes, and thus on all Borel
sets: this means that $m(E) = m(E + x)$. The same equality holds for every $E \in \mathcal{M}$
because of (b). Hence (c) is proved.
Finally, suppose that $\mu$ satisfies the hypotheses of (d). Let $Q_0$ be a 1-box, and set
$c = \mu(Q_0)$. Since $Q_0$ is the union of $2^{nk}$ disjoint $2^{-n}$-boxes which are translates of
for every $2^{-n}$-box $Q$. Theorem 16.30 implies that $\mu(E) = c\,m(E)$ for all open sets
$E$ of $\mathbb{R}^k$, and the proof of (d) is complete.
Remark 16.5 The symbolism for the concrete Lebesgue measure $m$ is very rich.
Firstly, it may be useful to write $m_k$ in order to denote the dimension of the
Euclidean space. But sometimes $\lambda_k$ is preferred3 to $m_k$. Some books use $\mathcal{L}^k$, and
even $|E|$ to denote the Lebesgue measure of $E$. When integrals come into play, the
pedantic notation
$$\int_{\mathbb{R}^k} f \, dm_k$$
is often replaced by
$$\int_{\mathbb{R}^k} f(x) \, dx,$$
where $x$ is a dummy variable. However, we need to point out the bad habit of
calling $dx$ the Lebesgue measure, which occurs in several Calculus texts.
It is interesting to show that non-measurable sets exist for the concrete Lebesgue
measure.
Theorem 16.32 Every set of positive Lebesgue measure contains a non-measurable subset.
This is indeed a corollary of a more general result.
Theorem 16.33 If $A \subset \mathbb{R}$ and if every subset of $A$ is Lebesgue measurable, then
$m(A) = 0$.
Proof The basic tool in the proof is the structure of $\mathbb{R}$ as a group relative to addition.
Consider $\mathbb{Q}$ as a subgroup of $\mathbb{R}$, and introduce an equivalence relation as follows:
$x \sim y$ if and only if $x - y \in \mathbb{Q}$. For $x \in \mathbb{R}$, we write $[x]_\sim = \{y \in \mathbb{R} \mid y \sim x\}$,
the equivalence class of $x$. Using the Axiom of Choice, we construct a set $E$ which
contains exactly one point from each equivalence class of $\mathbb{Q}$ in $\mathbb{R}$.
Claim 1. If $r \in \mathbb{Q}$, $s \in \mathbb{Q}$ and $r \ne s$, then $(E + r) \cap (E + s) = \emptyset$. Indeed, suppose
$x \in (E + r) \cap (E + s)$. Then $x = y + r = z + s$ for some $y \in E$, $z \in E$, $y \ne z$.
But $m(K + r) = m(K)$, hence $m(K) = 0$. Since this holds for every compact
$K \subset A_t$, we deduce that $m(A_t) = 0$. Claim 2 shows that $A = \bigcup \{A_t \mid t \in \mathbb{Q}\}$.
Since $\mathbb{Q}$ is a countable set, we conclude that $m(A) = 0$, and the proof is complete.
Lusin’s theorem is a powerful result that allows us to approximate different classes
of functions by means of continuous functions.
$$\int_a^b h \, dx \le \overline{\int_a^b} f \, dx.$$
On the other hand, by definition of the upper envelope of $f$, there exists a sequence
$\{\varphi_n\}_n$ of step functions such that $\varphi_n \searrow h$. Since $f$ is bounded on $[a, b]$, The
$$\int_a^b h \, dx = \lim_{n \to +\infty} \int_a^b \varphi_n \, dx \ge \overline{\int_a^b} f \, dx.$$
$$\overline{\int_a^b} f \, dx = \int_a^b h \, dx.$$
$$\underline{\int_a^b} f \, dx = \int_a^b g \, dx.$$
It follows that $f$ is R-integrable on $[a, b]$ if and only if $\int_a^b (h - g) \, dx = 0$. Since
$h \ge g$, we deduce that this happens if and only if $h = g$ a.e. on $[a, b]$. Recalling (a)
above, this is equivalent to the fact that the set of points at which $f$ is discontinuous
has measure zero. The proof is complete.
In the whole section, we will be working on the measurable space $X = \mathbb{R}^N$ with the
standard Lebesgue measure.4
Our aim is to introduce a general technique for regularizing integrable functions.
As a by-product, we will provide a compactness result in $L^p$. Let us start with some
notation.
Definition 16.25 Let $\Omega$ be an open subset of $\mathbb{R}^N$, $N \ge 1$. We write
$$\mathcal{D}(\Omega) = \left\{ u \in C^\infty(\Omega) \mid \text{the support of } u \text{ is a compact subset of } \Omega \right\}.$$
4 The choice of $N$ as the fixed dimension of the Euclidean space allows us to use freely the index
$n$.
$$|\alpha| = \sum_{j=1}^{N} \alpha_j,$$
$$f(x) = \begin{cases} e^{1/x} & \text{if } x < 0 \\ 0 & \text{if } x \ge 0 \end{cases}$$
belongs to $C^\infty(\mathbb{R})$. Hint: show by induction that for every $n \in \mathbb{N}$ and every $x < 0$
there results $D^n f(0) = 0$ and $D^n f(x) = P_n(1/x) e^{1/x}$, where $P_n$ is a polynomial.
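Definition 16.26 (Standard Mollifiers) Let $\rho \colon \mathbb{R}^N \to \mathbb{R}$ be such that
$$\rho(x) = \begin{cases} \dfrac{e^{\frac{1}{|x|^2 - 1}}}{\displaystyle\int_{B(0,1)} e^{\frac{1}{|x|^2 - 1}} \, dx} & \text{if } |x| < 1 \\[2ex] 0 & \text{if } |x| \ge 1. \end{cases}$$
The standard mollifiers are the functions $\rho_n \colon \mathbb{R}^N \to \mathbb{R}$ such that $\rho_n(x) = n^N \rho(nx)$
for every $x \in \mathbb{R}^N$ and every $n \in \mathbb{N}$.
$\rho_n \ge 0$ and $\int_{\mathbb{R}^N} \rho_n(x) \, dx = 1$.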
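The normalization of the standard mollifiers can be checked numerically. The following sketch works in dimension $N = 1$ and is only an illustration: the integrals are approximated by a midpoint rule with a hypothetical number of cells, and the helper names (`bump`, `midpoint_integral`, `rho`, `rho_n`) are ours.

```python
import math


def bump(x):
    # numerator of the mollifier: exp(1 / (x^2 - 1)) inside (-1, 1), zero outside
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0


def midpoint_integral(g, a, b, cells=20000):
    # midpoint rule on [a, b]; the midpoints never touch the endpoints
    h = (b - a) / cells
    return h * sum(g(a + (i + 0.5) * h) for i in range(cells))


NORMALIZATION = midpoint_integral(bump, -1.0, 1.0)


def rho(x):
    return bump(x) / NORMALIZATION


def rho_n(x, n):
    # rho_n(x) = n^N * rho(n x) with N = 1
    return n * rho(n * x)


masses = {n: midpoint_integral(lambda x: rho_n(x, n), -1.0 / n, 1.0 / n)
          for n in (1, 2, 5, 10)}
```

Each entry of `masses` approximates $\int_{\mathbb{R}} \rho_n(x) \, dx$ and is equal to $1$ up to rounding, while the support of $\rho_n$ shrinks to $[-1/n, 1/n]$.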
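Since we will often need to restrict functions to relatively compact subsets of $\Omega$, we
introduce a new space.
Definition 16.27 (Local Lebesgue Spaces) Let $\Omega$ be an open subset of $\mathbb{R}^N$, and
let $\omega$ be such that $\omega$ is open and $\overline{\omega}$ is a compact subset of $\Omega$. For brevity, we will
write $\omega \subset\subset \Omega$.
For every $1 \le p < \infty$ we define
$$L^p_{\mathrm{loc}}(\Omega) = \left\{ u \in \mathbb{R}^{\Omega} \mid u|_{\omega} \in L^p(\omega) \text{ for every } \omega \subset\subset \Omega \right\}.$$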
16.7 Mollifiers and Regularization 441
Although local Lebesgue spaces are not normed spaces,5 we introduce a
definition of convergent sequences.
Definition 16.28 (Convergence in $L^p_{\mathrm{loc}}$) A sequence $\{u_n\}_n$ in $L^p_{\mathrm{loc}}(\Omega)$ converges
to $u$ if and only if for every $\omega \subset\subset \Omega$
Here comes the most useful tool of Harmonic Analysis: the convolution. We
consider a particular case, which however is sufficient for our purposes.
Definition 16.29 (Convolution in Local Lebesgue Spaces) Suppose $u \in L^1_{\mathrm{loc}}(\Omega)$
and $v \in C_c(\mathbb{R}^N)$ are such that
$$\operatorname{supp} v \subset B\left(0, \frac{1}{n}\right).$$
by
$$v * u(x) = \int_{\Omega} v(x - y) u(y) \, dy = \int_{B(0,1/n)} v(y) u(x - y) \, dy.$$
$$\operatorname{supp} v \subset B\left(0, \frac{1}{n}\right).$$
5 The intuitive reason is that there is no norm that can take into account all possible sets $\omega \subset\subset \Omega$
at the same time.
$$D^\alpha (v * u) = (D^\alpha v) * u.$$
Proof We prove the result under the additional assumption $|\alpha| = 1$. The general
case follows by induction on $|\alpha|$. Fix $x \in \Omega_n$: there exists $r > 0$ such that6
$B(x, r) \subset \Omega_n$. Hence
$$\omega = B\left(x, r + \frac{1}{n}\right) \subset\subset \Omega,$$
$$D^\alpha (v * u)(x) = \int_{\omega} D^\alpha v(x - y) u(y) \, dy = (D^\alpha v) * u(x).$$
Theorem 16.37 (Continuity of Translations) Let $\omega \subset\subset \Omega$.
(a) If $u \in C(\Omega)$, then
$$\lim_{y \to 0} \sup_{x \in \omega} \left| \tau_y u(x) - u(x) \right| = 0.$$
(b) If $u \in L^p_{\mathrm{loc}}(\Omega)$ for some $1 \le p < \infty$, then
$$\lim_{y \to 0} \left\| \tau_y u - u \right\|_{L^p(\omega)} = 0.$$
Proof
(a) Choose an open set $U$ such that $\omega \subset\subset U \subset\subset \Omega$. Since $u$ is uniformly
continuous on $U$, the conclusion follows immediately from (16.4).
(b) Pick $\varepsilon > 0$, and choose now an open set $U$ such that $\omega \subset\subset U \subset\subset \Omega$. By
Theorem 16.34 a function $v \in C_c(U)$ exists such that $\|u - v\|_{L^p(U)} \le \varepsilon$. By
(a) there exists $0 < \delta < d(\omega, \partial U)$ such that, if $|y| < \delta$, then
$$\sup_{x \in \omega} \left| \tau_y v(x) - v(x) \right| \le \varepsilon.$$
where $m_N$ denotes the Lebesgue measure in $\mathbb{R}^N$, as usual. Since $\varepsilon > 0$ is
arbitrary, the proof is complete.
Theorem 16.38 (Regularization Theorem)
(a) If $u \in C(\Omega)$, then $\{\rho_n * u\}_n$ converges uniformly to $u$ on every compact subset
of $\Omega$.
(b) If $u \in L^p_{\mathrm{loc}}(\Omega)$, $1 \le p < \infty$, then $\{\rho_n * u\}_n$ converges to $u$ in $L^p_{\mathrm{loc}}(\Omega)$.
Proof
(a) We claim that, for $n \in \mathbb{N}$ sufficiently large,
$$\sup_{x \in \omega} |\rho_n * u(x) - u(x)| \le \sup_{|y| < \frac{1}{n}} \sup_{x \in \omega} \left| \tau_y u(x) - u(x) \right|. \tag{16.4}$$
and (16.4) is proved. The conclusion follows from the continuity of translations.
(b) We claim that, for every $n \in \mathbb{N}$ sufficiently large,
$$\left\| \rho_n * u - u \right\|_{L^p(\omega)} \le \sup_{|y| < 1/n} \left\| \tau_y u - u \right\|_{L^p(\omega)}. \tag{16.5}$$
and (16.5) follows. The conclusion follows from the continuity of translations.
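Part (a) of the Regularization Theorem can be observed on a simple example. The sketch below is an illustration in dimension $N = 1$ under stated assumptions: the function $u(x) = |x|$, the compact set $[-1, 1]$ sampled on a finite grid, and the discretization of $\rho_n * u$ by a weighted sum are all our choices. The weights are normalized so that they sum to one, hence the discrete mollification is a convex combination of values $u(x - y)$ with $|y| < 1/n$.

```python
import math


def bump(t):
    return math.exp(1.0 / (t * t - 1.0)) if abs(t) < 1.0 else 0.0


CELLS = 2000
NODES = [-1.0 + (i + 0.5) * 2.0 / CELLS for i in range(CELLS)]  # midpoints in (-1, 1)
RAW = [bump(t) for t in NODES]
TOTAL = sum(RAW)
WEIGHTS = [w / TOTAL for w in RAW]  # discrete version of rho(t) dt, total mass 1


def mollify(u, x, n):
    # approximates (rho_n * u)(x) = integral of rho(t) u(x - t/n) dt over (-1, 1)
    return sum(w * u(x - t / n) for w, t in zip(WEIGHTS, NODES))


def sup_error(u, n, grid):
    return max(abs(mollify(u, x, n) - u(x)) for x in grid)


grid = [-1.0 + 0.05 * i for i in range(41)]
errors = {n: sup_error(abs, n, grid) for n in (1, 2, 4, 8)}
```

Since $\bigl| |x - y| - |x| \bigr| \le |y|$, the error computed for index $n$ cannot exceed $1/n$, and the values in `errors` decrease as $n$ grows.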
As we promised, the convolution with mollifiers allows us to approximate integrable
functions with smooth functions.
Theorem 16.39 (Smooth Functions Are Dense in Lebesgue Spaces) If $1 \le p < \infty$,
then $\mathcal{D}(\Omega)$ is dense in $L^p(\Omega)$.
Proof Theorem 16.34 ensures the density of $C_c(\Omega)$ in $L^p(\Omega)$. Fix $u \in C_c(\Omega)$ and
an open set $\omega$ such that $\operatorname{supp} u \subset\subset \omega \subset\subset \Omega$. Taking $n$ sufficiently large, the support
of $u_n = \rho_n * u$ is contained in $\omega$, and $u_n \in C^\infty(\mathbb{R}^N)$. It follows that $u_n \in \mathcal{D}(\Omega)$,
and the conclusion follows from Theorem 16.38.
It is not too difficult to convince ourselves that we cannot approximate globally
a continuous function with a smooth function, or equivalently that the uniform
convergence of $\rho_n * u$ to $u$ in part (a) of Theorem 16.38 is optimal.
On the other hand, part (b) extends to the case $\Omega = \mathbb{R}^N$.
Theorem 16.40 Let $1 \le p < \infty$. If $u \in L^p(\mathbb{R}^N)$, then $\|\rho_n * u\|_p \le \|u\|_p$, and
$\rho_n * u \to u$ in $L^p(\mathbb{R}^N)$.
$$= \int_{\mathbb{R}^N} dy \, |u(y)|^p \int_{\mathbb{R}^N} \rho_n(x - y) \, dx$$
$$= \int_{\mathbb{R}^N} |u(y)|^p \, dy.$$
This proves the first part of the theorem. Let now $u \in L^p(\mathbb{R}^N)$ and $\varepsilon > 0$. By
Theorem 16.34 there exists $v \in C_c(\mathbb{R}^N)$ such that $\|v - u\|_p \le \varepsilon$. By Theorem
16.38, $\rho_n * v \to v$ in $L^p(\mathbb{R}^N)$. Fix $m \in \mathbb{N}$ such that, for every $n \ge m$, there results
$\|\rho_n * v - v\|_p \le \varepsilon$. For these values of $n$,
Define
$$U = \left\{ x \in \mathbb{R}^N \mid d(x, \omega) < \frac{1}{n} \right\} \subset\subset \Omega.$$
$$\|u\|_{L^1(U)} \le c_2 |x - y|.$$
$\mathcal{F}$ is also relatively compact in $L^p(\omega)$.7 Now (16.6) implies the existence of a finite
cover of $\mathcal{F}|_\omega$ in $L^p(\omega)$ by balls of radius $2\varepsilon$. We finally use assumption (b) to ensure
the existence of a finite cover of $\mathcal{F}$ in $L^p(\Omega)$ by balls of radius $3\varepsilon$. We have thus
proved that $\mathcal{F}$ is totally bounded in the complete metric space $L^p(\Omega)$, hence it
is relatively compact by Theorem 13.78.
Let us consider a $\sigma$-finite measurable space $(X, \Sigma)$ together with a measure $\mu$, and
let $f \in L^1(X)$ be a non-negative function. It is easy to check that
$$\nu \colon A \in \Sigma \mapsto \int_A f \, d\mu$$
7 We have proved that the space of bounded continuous functions on $\omega$ is continuously embedded
into $L^p(\omega)$.
16.9 The Radon-Nykodim Theorem 447
$$\mu(A) = 0 \ \Rightarrow \ \nu(A) = 0.$$
The main result of this section is the following description of absolutely continuous
measures.
Theorem 16.42 Let $(X, \Sigma)$ be a $\sigma$-finite measurable space with measure $\mu$. If $\nu$
is a finite measure on $(X, \Sigma)$, absolutely continuous with respect to $\mu$, then there
exists a measurable function $f$ on $X$ such that $f \ge 0$, $\int_X f \, d\mu < \infty$ and
$$\nu(E) = \int_E f \, d\mu \quad \text{for every } E \in \Sigma.$$
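On a finite set the theorem becomes elementary, and the density can be written down explicitly. The sketch below is only an illustration under stated assumptions: $X$ is a four-point set with the $\sigma$-algebra of all subsets, the weights of `mu` and `nu` are made up for the example, and the density is taken to be zero at points of $\mu$-measure zero (any other value would do there).

```python
from fractions import Fraction
from itertools import combinations

# point masses of two measures on X = {a, b, c, d}; nu vanishes wherever mu does
mu = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(3), "d": Fraction(1, 4)}
nu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(1, 3), "d": Fraction(0)}


def density(mu, nu):
    """Return f with nu(E) = sum of f * mu over E, assuming nu << mu."""
    if any(mu[x] == 0 and nu[x] != 0 for x in mu):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in mu}


def measure(weights, subset):
    return sum((weights[x] for x in subset), Fraction(0))


def integral(f, mu, subset):
    return sum((f[x] * mu[x] for x in subset), Fraction(0))


f = density(mu, nu)
points = list(mu)
subsets = [c for r in range(len(points) + 1) for c in combinations(points, r)]
identity_holds = all(measure(nu, E) == integral(f, mu, E) for E in subsets)
```

The flag `identity_holds` confirms $\nu(E) = \int_E f \, d\mu$ on all sixteen subsets. In the general setting no such pointwise quotient is available, which is why the proof proceeds through a maximization argument.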
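Lemma 16.1 There exists a set $E \in \Sigma$ such that $\varphi^+(X) = \varphi(E)$. Moreover
$$\varphi(A) \ge 0 \quad \text{if } A \subset E,$$
$$\varphi(A) \le 0 \quad \text{if } A \subset X \setminus E.$$
$$\varphi^+(X) \ge \varphi(E_n) \ge \varphi^+(X) - \frac{1}{2^n} \quad \text{for } n \ge 1.$$
If $A \subset E_n$, writing $\varphi(E_n) = \varphi(A) + \varphi(E_n \setminus A)$, we see that
$$\varphi^+(X) - \frac{1}{2^n} \le \varphi(E_n) \le \varphi(A) + \varphi^+(X).$$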
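satisfies $\varphi(E) = \varphi^+(X)$. To prove the second part, we notice that if $A \subset E$, then
or $\varphi(A) \ge 0$. If $A \subset X \setminus E$, then
$$\int_X g_n \, d\mu \ge S - \frac{1}{n}.$$
We define
$$f_n = \max\{g_1, \ldots, g_n\}, \qquad f = \lim_{n \to +\infty} f_n.$$
It is not difficult to check that $f_n \in \mathcal{I}$ for every $n \ge 1$. For instance, in the case
$n = 2$, for every $E \in \Sigma$ we have
$$\int_E f_2 \, d\mu = \int_{E \cap \{g_1 \ge g_2\}} g_1 \, d\mu + \int_{E \cap \{g_2 > g_1\}} g_2 \, d\mu$$
The general case is similar. Since $f_n \nearrow f$, Beppo Levi’s Theorem yields $f \in \mathcal{I}$
and $\int_X f \, d\mu = S$. The claim is thus proved.
Let us now show that for every $E \in \Sigma$ there results $\nu(E) = \int_E f \, d\mu$. By
definition of $\mathcal{I}$, we only need to show that $\nu(E) \le \int_E f \, d\mu$. Suppose not, so that
there exist $\varepsilon > 0$ and $E_0 \in \Sigma$ such that
$$\int_{E_0} (f + \varepsilon) \, d\mu < \nu(E_0).$$
Since $\nu$ is absolutely continuous with respect to $\mu$, we see that $\mu(E_0) > 0$. If we
define
$$\varphi(E) = \nu(E) - \int_E (f + \varepsilon) \, d\mu,$$
$$\nu(A) - \int_A (f + \varepsilon) \, d\mu \ge 0 \quad \text{if } A \subset F,$$
$$\nu(A) - \int_A (f + \varepsilon) \, d\mu \le 0 \quad \text{if } A \subset G.$$
$$g(x) = \begin{cases} f(x) & \text{if } x \notin F \\ f(x) + \varepsilon & \text{if } x \in F \end{cases}$$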
belongs to $\mathcal{I}$ and
$$\int_X g \, d\mu > \int_X f \, d\mu = S.$$
This is impossible, and the proof is complete in the particular case $\mu(X) < \infty$.
To deal with the general case, let $\{X_i\}_i$ be a sequence of pairwise disjoint
measurable sets such that $\mu(X_i)$ is finite for every $i$ and
$$X = \bigcup_{i=1}^{\infty} X_i.$$
Applying the previous step in each $X_i$, we see that there exist functions $f_i \in
L^1(X_i, \mu)$ such that $f_i \ge 0$ and
$$\nu(E) = \int_E f_i \, d\mu \quad \text{for every } E \subset X_i.$$
$$\nu(E) = \int_E f \, d\mu.$$
$$\sum_{i=1}^{\infty} \int_{X_i} f \, d\mu \le \nu(X),$$
$$\int_E f \, d\mu = \lim_{n \to +\infty} \int_{E \cap \bigcup_{i=1}^{n} X_i} f \, d\mu = \lim_{n \to +\infty} \nu\left( E \cap \bigcup_{i=1}^{n} X_i \right) = \nu(E),$$
The basic result of Riemann integration theory is for sure the formula
$$\int_a^b f'(x) \, dx = f(b) - f(a),$$
which expresses a function as the integral of its derivative. We refer back to Theorem
10.12 for a precise statement.
Important: Question
Does there exist a Fundamental Theorem of Calculus for Lebesgue integrals?
Such a natural question requires several new ideas to be answered. In the rest of
the section, $\lambda = m_1$ will always denote the Lebesgue measure in $\mathbb{R}$, and “almost
everywhere” will always refer to this measure.
Definition 16.32 Let $a \in \mathbb{R}$ and $\delta > 0$. If $\varphi \colon (a, a + \delta) \to \mathbb{R}$, we define
These quantities are called respectively the lower right limit and the upper right limit
of the function $\varphi$ at the point $a$. Similarly, if $\varphi \colon (a - \delta, a) \to \mathbb{R}$, we define the lower
left limit and the upper left limit of $\varphi$ at $a$ as
If $f \colon (a - \delta, a] \to \mathbb{R}$, we define
$$D_- f(a) = \liminf_{h \to 0^-} \frac{f(a + h) - f(a)}{h},$$
$$D^- f(a) = \limsup_{h \to 0^-} \frac{f(a + h) - f(a)}{h}.$$
$$\left\{ \frac{f(a + h_n) - f(a)}{h_n} \right\}_n$$
where $h_n > 0$ and $\lim_{n \to +\infty} h_n = 0$. Conjecture and prove similar statements for
the remaining Dini’s derivatives.
The four Dini’s derivatives describe the lack of differentiability of $f$ at $a$, since it is
clear that $f$ is differentiable at $a$ if and only if the four Dini’s derivatives are finite
and coincide.
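A numerical experiment can make the four Dini's derivatives concrete. The sketch below is a heuristic illustration only: it samples difference quotients along the increments $h = \pm 1/(1000 + 0.01\,j)$, which are our choice, and takes their maximum and minimum as estimates of the upper and lower limits; it is not a proof of the values. The two test functions, $|x|$ and $x \sin(1/x)$ extended by $0$ at the origin, are also our choice.

```python
import math


def difference_quotients(f, a, sign, base=1000.0, samples=2000, step=0.01):
    """Quotients (f(a + h) - f(a)) / h for h = sign / (base + j * step)."""
    increments = [sign / (base + j * step) for j in range(samples)]
    return [(f(a + h) - f(a)) / h for h in increments]


def dini_estimates(f, a):
    right = difference_quotients(f, a, +1.0)  # h > 0
    left = difference_quotients(f, a, -1.0)   # h < 0
    return {"D^+": max(right), "D_+": min(right),
            "D^-": max(left), "D_-": min(left)}


def oscillating(x):
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0


est_abs = dini_estimates(abs, 0.0)
est_osc = dini_estimates(oscillating, 0.0)
```

For $|x|$ at the origin the right estimates equal $1$ and the left estimates equal $-1$, so the one-sided derivatives exist but differ. For $x \sin(1/x)$ the estimates are close to $+1$ and $-1$ on both sides, in agreement with the fact that the difference quotient at $0$ is $\sin(1/h)$.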
Theorem 16.43 Let $(a, b)$ be an open interval, and let $f$ be a real-valued function
defined on $(a, b)$. There exist at most countably many points $x \in (a, b)$ such that
$$D_+ f(x) = D^+ f(x)$$
and
$$D_- f(x) = D^- f(x)$$
For each point $x \in A$ we select a rational number $r(x)$ such that $f'_+(x) < r(x) <
f'_-(x)$. Next we select rational numbers $s(x)$ and $t(x)$ such that $a < s(x) < x <
t(x) < b$,
$$\frac{f(y) - f(x)}{y - x} > r(x) \quad \text{if } s(x) < y < x$$
16.10 A Strong Form of the Fundamental Theorem of Calculus 453
and
$$\frac{f(y) - f(x)}{y - x} < r(x) \quad \text{if } x < y < t(x).$$
But $r(x) = r(y)$, hence $0 < 0$. This contradiction proves that $\varphi$ is injective, and
thus $A$ is a countable set. The proof that $B$ is also countable is similar.
Definition 16.34 Let $E$ be a subset of $\mathbb{R}$. A collection $\mathcal{V}$ of closed intervals, each
having positive measure, is a Vitali cover of $E$ if and only if for every $x \in E$ and
for every $\varepsilon > 0$ there exists an interval $I \in \mathcal{V}$ such that $x \in I$ and $\lambda(I) < \varepsilon$.
Roughly speaking, Vitali covers consist of closed intervals of arbitrarily small
lengths.
Theorem 16.44 (Vitali’s Covering Theorem) Let $\mathcal{V}$ be a non-empty Vitali cover
of a set $E \subset \mathbb{R}$. Then there exists a pairwise disjoint countable collection $\{I_n\}_n \subset \mathcal{V}$
such that
$$\lambda\left( E \cap \left( \mathbb{R} \setminus \bigcup_{n=1}^{\infty} I_n \right) \right) = 0.$$
Proof
First case: $\lambda(E) \in \mathbb{R}$. We fix an open set $V$ which contains $E$ and such that
$\lambda(V) \in \mathbb{R}$. Let
$$\mathcal{V}_0 = \{ I \in \mathcal{V} \mid I \subset V \}.$$
The fact that $\mathcal{V}_0$ is a Vitali cover of $E$ is clear. If $I_1 \in \mathcal{V}_0$ and $E \subset I_1$, the proof
is complete. Otherwise we proceed by induction. Assume that $I_1$, $I_2$, ..., $I_n$ have
been chosen and are pairwise disjoint. If $E \subset \bigcup_{k=1}^{n} I_k$, the proof is complete.
Otherwise we write
$$A_n = \bigcup_{k=1}^{n} I_k, \qquad U_n = V \cap (\mathbb{R} \setminus A_n).$$
The set $A_n$ is closed as a finite union of closed sets, $U_n$ is open, and $U_n \cap E \ne \emptyset$.
Define
$$\delta_n = \sup \{ \lambda(I) \mid I \in \mathcal{V}_0, \ I \subset U_n \}.$$
Next we select $I_{n+1} \in \mathcal{V}_0$ such that $I_{n+1} \subset U_n$ and $\lambda(I_{n+1}) > \delta_n / 2$.
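The inductive selection can be imitated on a finite family of closed intervals. The following sketch is an illustration under stated assumptions: the family is finite (so the procedure stops), intervals are stored as pairs `(left, right)`, and among the admissible candidates the first one longer than $\delta_n / 2$ is taken, which is one possible choice. The helper `enlarge` builds the interval with the same mid-point and five times the length.

```python
def length(interval):
    return interval[1] - interval[0]


def disjoint(i, j):
    # closed intervals [a, b] and [c, d] are disjoint iff b < c or d < a
    return i[1] < j[0] or j[1] < i[0]


def greedy_selection(family):
    """Pick pairwise disjoint intervals, each longer than half the current sup."""
    selected = []
    while True:
        candidates = [i for i in family if all(disjoint(i, s) for s in selected)]
        if not candidates:
            return selected
        delta = max(length(i) for i in candidates)
        # any candidate longer than delta / 2 is admissible; take the first one
        selected.append(next(i for i in candidates if length(i) > delta / 2))


def enlarge(interval, factor=5):
    mid = (interval[0] + interval[1]) / 2
    half = factor * length(interval) / 2
    return (mid - half, mid + half)


family = [(0, 1), (0.5, 3), (2.5, 4), (3.5, 3.75), (6, 7), (6.5, 6.75)]
chosen = greedy_selection(family)
```

Every interval of the family that was not selected meets a selected interval of more than half its length, and is therefore contained in the corresponding five-fold enlargement.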
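If this procedure continues indefinitely, we get an infinite sequence $\{I_n\}_n$ of
pairwise disjoint elements of $\mathcal{V}_0$. Let $A = \bigcup_{n=1}^{\infty} I_n$; we claim that $\lambda(E \cap
(\mathbb{R} \setminus A)) = 0$. Indeed, for every positive integer $n$ there exists a unique closed
interval $J_n$ having the same mid-point as $I_n$ and such that $\lambda(J_n) = 5\lambda(I_n)$. Since
$$\lambda\left( \bigcup_{n=1}^{\infty} J_n \right) \le \sum_{n=1}^{\infty} \lambda(J_n) = 5 \sum_{n=1}^{\infty} \lambda(I_n) = 5\lambda(A) \le 5\lambda(V),$$
we see that
$$\lim_{p \to +\infty} \lambda\left( \bigcup_{n=p}^{\infty} J_n \right) = 0.$$
As a consequence, it suffices to show that $E \cap (\mathbb{R} \setminus A) \subset \bigcup_{n=p}^{\infty} J_n$ for every
$p \in \mathbb{N}$.
$$I \cap A_q \ne \emptyset, \qquad I \cap A_{q-1} = \emptyset.$$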
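This yields $I \cap I_q \ne \emptyset$ and, since $I$ is a subset of $U_{q-1}$, we find $\lambda(I) \le \delta_{q-1} <
2\lambda(I_q)$. Recalling that $\lambda(J_q) = 5\lambda(I_q)$, we see that
$$I \subset J_q \subset \bigcup_{n=p}^{\infty} J_n,$$
hence $x \in \bigcup_{n=p}^{\infty} J_n$. In conclusion $E \cap (\mathbb{R} \setminus A) \subset \bigcup_{n=p}^{\infty} J_n$, and this shows that
$\lambda(E \cap (\mathbb{R} \setminus A)) = 0$. To conclude the first part of the proof, let $\varepsilon > 0$ be given and
But
$$E \cap (\mathbb{R} \setminus A_p) \subset \bigl( E \cap (\mathbb{R} \setminus A) \bigr) \cup \bigcup_{n=p+1}^{\infty} I_n,$$
hence
$$\lambda\bigl( E \cap (\mathbb{R} \setminus A_p) \bigr) \le 0 + \lambda\left( \bigcup_{n=p+1}^{\infty} I_n \right) < \varepsilon.$$
and
$$\mathcal{V}_n = \{ I \in \mathcal{V} \mid I \subset (n, n + 1) \}.$$
It is a simple exercise to check that $\mathcal{V}_n$ is a Vitali cover of $E_n$. Since $\lambda(E_n)$
is finite, the first part of the proof applies and yields a countable pairwise disjoint
collection $\mathcal{I}_n \subset \mathcal{V}_n$ such that
$$\lambda\left( E_n \cap \left( \mathbb{R} \setminus \bigcup \mathcal{I}_n \right) \right) = 0 \quad \text{for every } n \in \mathbb{Z}.$$
To conclude, let $\mathcal{I} = \bigcup_{n \in \mathbb{Z}} \mathcal{I}_n$. Then $\mathcal{I}$ is a countable pairwise disjoint sub-collection
of $\mathcal{V}$ and
$$E \cap \left( \mathbb{R} \setminus \bigcup \mathcal{I} \right) \subset \mathbb{Z} \cup \bigcup_{n \in \mathbb{Z}} \left( E_n \cap \left( \mathbb{R} \setminus \bigcup \mathcal{I}_n \right) \right).$$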
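Since
$$\lambda\left( E \cap \left( \mathbb{R} \setminus \bigcup \mathcal{I} \right) \right) \le \lambda(\mathbb{Z}) + \sum_{n=-\infty}^{\infty} 0 = 0,$$
$$E = \left\{ x \in [a, b) \mid D_+ f(x) < D^+ f(x) \right\}.$$
We claim that $\lambda(E) = 0$. Indeed, we can write $E = \bigcup \{E(u, v) \mid u \in \mathbb{Q}, \ v \in \mathbb{Q},
\ 0 < u < v\}$, where
$$E(u, v) = \left\{ x \in [a, b) \mid D_+ f(x) < u < v < D^+ f(x) \right\}.$$
If we show that $\lambda(E(u, v)) = 0$ for every such $u$ and $v$, the claim follows.
Arguing by contradiction, we assume that for some $0 < u < v$ in $\mathbb{Q}$,
$\lambda(E(u, v)) = \alpha > 0$. Fix $\varepsilon > 0$ so small that
$$0 < \varepsilon < \frac{\alpha (v - u)}{u + 2v}.$$
By the regularity properties of the Lebesgue measure, there exists an open set $U$
such that $E(u, v) \subset U$ and $\lambda(U) < \alpha + \varepsilon$. By definition, to each $x \in E(u, v)$ there
correspond arbitrarily small numbers $h$ such that $[x, x + h] \subset U \cap [a, b]$ and
$$\lambda\left( E(u, v) \cap \left( \mathbb{R} \setminus \bigcup_{i=1}^{m} [x_i, x_i + h_i] \right) \right) < \varepsilon.$$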
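If $V = \bigcup_{i=1}^{m} (x_i, x_i + h_i)$, then $\lambda(E(u, v) \cap (\mathbb{R} \setminus V)) < \varepsilon$. Since $V \subset U$, we have
$$\sum_{i=1}^{m} h_i = \lambda(V) \le \lambda(U) < \alpha + \varepsilon,$$
and thus
$$\sum_{i=1}^{m} \bigl( f(x_i + h_i) - f(x_i) \bigr) < u \sum_{i=1}^{m} h_i < u(\alpha + \varepsilon).$$
As before, there exists a finite pairwise disjoint collection $\{[y_j, y_j + k_j]
\mid j = 1, \ldots, n\}$ such that
$$\lambda\left( E(u, v) \cap V \cap \left( \mathbb{R} \setminus \bigcup_{j=1}^{n} [y_j, y_j + k_j] \right) \right) < \varepsilon.$$
$$\alpha = \lambda(E(u, v)) \le \lambda\bigl( E(u, v) \cap (\mathbb{R} \setminus V) \bigr) + \lambda\bigl( E(u, v) \cap V \bigr) < \varepsilon + \varepsilon + \sum_{j=1}^{n} k_j.$$
$$v(\alpha - 2\varepsilon) < v \sum_{j=1}^{n} k_j < \sum_{j=1}^{n} \bigl( f(y_j + k_j) - f(y_j) \bigr).$$
Recalling that $\bigcup_{j=1}^{n} [y_j, y_j + k_j] \subset \bigcup_{i=1}^{m} [x_i, x_i + h_i]$, the monotonicity of $f$ implies
$$\sum_{j=1}^{n} \bigl( f(y_j + k_j) - f(y_j) \bigr) \le \sum_{i=1}^{m} \bigl( f(x_i + h_i) - f(x_i) \bigr).$$
In conclusion $v(\alpha - 2\varepsilon) < u(\alpha + \varepsilon)$, which is a contradiction. We have thus proved
that $\lambda(E) = 0$, so that the right derivative of $f$ exists as a real number at almost
every point of $[a, b]$. By the same token, the left derivative of $f$ exists and is finite
at almost every point of $[a, b]$. Theorem 16.43 implies that the derivative $f'(x)$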
exists for almost every $x \in [a, b]$. To complete the proof, we must show that the set
$F$ of points $x \in (a, b)$ where $f'(x) = +\infty$ is a set of measure zero.8
Let $\beta > 0$ be given. For every $x \in F$ there exist arbitrarily small numbers $h$ such
that $[x, x + h] \subset (a, b)$ and $f(x + h) - f(x) > \beta h$. We can therefore construct a
countable pairwise disjoint collection $\{[x_n, x_n + h_n] \mid n = 1, 2, \ldots\}$ such that
$$\lambda\left( F \cap \left( \mathbb{R} \setminus \bigcup_{n=1}^{\infty} [x_n, x_n + h_n] \right) \right) = 0.$$
We derive that
$$\beta \lambda(F) \le \beta \sum_{n=1}^{\infty} h_n < \sum_{n=1}^{\infty} \bigl( f(x_n + h_n) - f(x_n) \bigr) \le f(b) - f(a).$$
Since $\beta > 0$ can be arbitrarily large, we see that $\lambda(F) = 0$. The proof of the theorem
is complete.
Definition 16.35 Let $f \colon [a, b] \to \mathbb{R}$ be a function. We define the total variation of
$f$ over $[a, b]$ as
$$V_a^b f = \sup \left\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \ \middle| \ a = x_0 < x_1 < \ldots < x_n = b \right\}.$$
The function $f$ is a function of bounded variation over $[a, b]$ if and only if $V_a^b f \in \mathbb{R}$.
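The sums that appear in the definition are easy to compute for a given partition. The sketch below is a small illustration with names of our own choosing: it evaluates the sum over uniform partitions of $[0, 1]$ for $f(x) = (x - 1/2)^2$, a function that decreases by $1/4$ on $[0, 1/2]$ and increases by $1/4$ on $[1/2, 1]$, so that $V_0^1 f = 1/2$.

```python
def variation_over_partition(f, points):
    """Sum of |f(x_k) - f(x_{k-1})| over consecutive points of the partition."""
    return sum(abs(f(points[k]) - f(points[k - 1])) for k in range(1, len(points)))


def uniform_partition(a, b, n):
    return [a + (b - a) * k / n for k in range(n + 1)]


def parabola(x):
    return (x - 0.5) ** 2


sums = {n: variation_over_partition(parabola, uniform_partition(0.0, 1.0, n))
        for n in (1, 3, 4, 100)}
```

The partition with $n = 1$ gives $0$ and the one with $n = 3$ gives $4/9$, while every partition containing the point $1/2$ already attains the supremum $1/2$. No single partition need realize the total variation in general; this example is special because $f$ is monotone on each side of $1/2$.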
Remark 16.6 The previous definition can be extended to functions of several variables,
but the language of Measure Theory becomes necessary, and the development
is much more involved. We will not enter into the details in this book.
Exercise 16.15 Prove the identity $V_a^b f + V_b^c f = V_a^c f$ for every $a < b < c$.
Exercise 16.16 Prove that the function $x \mapsto V_a^x f$ is non-decreasing on $[a, b]$.
These exercises inspire the next result.
Theorem 16.46 (Jordan Decomposition Theorem) A function of bounded variation
is the difference of two non-decreasing functions.
Proof If $f$ is of bounded variation over $[a, b]$, we write
$$f(x) = V_a^x f - \bigl( V_a^x f - f(x) \bigr).$$
8 Recall that $f$ is a monotone increasing function, hence the derivative $f'(x)$ cannot equal $-\infty$.
Here we set $V_a^a f = 0$. The function $x \mapsto V_a^x f$ is non-decreasing. If $x_1 < x_2$, then
$$\bigl( V_a^{x_2} f - f(x_2) \bigr) - \bigl( V_a^{x_1} f - f(x_1) \bigr) = V_{x_1}^{x_2} f - \bigl( f(x_2) - f(x_1) \bigr) \ge 0$$
Proof We will suppose without loss of generality that all functions $f_n$ are non-decreasing.
Replacing $f_n$ with $f_n - f_n(a)$, we also assume that $f_n \ge 0$. Thus $s =
\sum_{n=1}^{\infty} f_n$ is non-negative and non-decreasing. Therefore $s'$ exists as a finite number
almost everywhere in $(a, b)$. Let
$$s_n = f_1 + \cdots + f_n, \qquad r_n = s - s_n.$$
Each function $f_j$ has finite derivative almost everywhere: there exists a set $A \subset
(a, b)$ such that $\lambda((\mathbb{R} \setminus A) \cap (a, b)) = 0$,
hence
$$\frac{s_n(x + h) - s_n(x)}{h} \le \frac{s(x + h) - s(x)}{h}.$$
As $h \to 0$, this yields $s_n'(x) \le s'(x)$ for every $x \in A$. Since $s_n'(x) \le s_{n+1}'(x)$ is
trivial, we deduce that
$$\sum_{k=1}^{\infty} \bigl( s(b) - s_{n_k}(b) \bigr) < +\infty.$$
We remark that for every $k$ and for every $x \in (a, b)$ we have $0 \le s(x) - s_{n_k}(x) \le
s(b) - s_{n_k}(b)$. Therefore the series $\sum_{k=1}^{\infty} \bigl( s(x) - s_{n_k}(x) \bigr)$ converges. Since the
terms of this series are monotone functions with finite derivative almost everywhere,
the same reasoning as above shows that $\sum_{k=1}^{\infty} \bigl( s'(x) - s_{n_k}'(x) \bigr)$ converges almost
everywhere. It is now clear that $\lim_{k \to +\infty} s_{n_k}'(x) = s'(x)$ almost everywhere in
$(a, b)$. The proof is complete.
Let us now face the main problem of identifying those functions $F$ of the form
$$F(x) = \int_a^x f(t) \, dt$$
The function $F$ is uniformly continuous and of bounded variation, with $V_a^b F =
\int_a^b |f(t)| \, dt$. The same conclusion holds if $f \in L^1(\mathbb{R})$ and $F(x) = \int_{-\infty}^x f(t) \, dt$.
then
$$\sum_{k=1}^{n} |F(x_k) - F(x_{k-1})| = \sum_{k=1}^{n} \left| \int_{x_{k-1}}^{x_k} f(t) \, dt \right| \le \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} |f(t)| \, dt = \int_a^b |f(t)| \, dt.$$
$$s = \sum_{k=1}^{n} \alpha_k \chi_{[x_{k-1}, x_k)}$$
in $L^1(a, b)$, where $a = x_0 < x_1 < \ldots < x_n = b$. We consider the function $\operatorname{sign} f$
defined by
$$\operatorname{sign} f(x) = \begin{cases} 1 & \text{if } f(x) > 0 \\ 0 & \text{if } f(x) = 0 \\ -1 & \text{if } f(x) < 0. \end{cases}$$
$$\| s_m - \operatorname{sign} f \|_1 \le \frac{1}{m}.$$
This inequality is preserved if we replace every $\alpha_k$ such that $|\alpha_k| > 1$ with
$\alpha_k |\alpha_k|^{-1}$. In other words, we may always assume that $|s_m| \le 1$ for every $m$. Since
$s_m$ converges to $\operatorname{sign} f$ in measure, there exists a subsequence $\{s_{m_j}\}_j$ such that
$s_{m_j} \to \operatorname{sign} f$ pointwise almost everywhere in $[a, b]$. The Dominated Convergence
Theorem yields
$$\int_a^b |f(t)| \, dt = \int_a^b f(t) \operatorname{sign} f(t) \, dt = \lim_{j \to +\infty} \int_a^b f(t) s_{m_j}(t) \, dt.$$
But
$$\begin{aligned} \int_a^b f(t) s_{m_j}(t) \, dt &= \sum_{k=1}^{n} \alpha_k \int_{x_{k-1}}^{x_k} f(t) \, dt \\ &= \sum_{k=1}^{n} \alpha_k \bigl( F(x_k) - F(x_{k-1}) \bigr) \\ &\le \sum_{k=1}^{n} |\alpha_k| \, |F(x_k) - F(x_{k-1})| \\ &\le \sum_{k=1}^{n} |F(x_k) - F(x_{k-1})| \\ &\le V_a^b F. \end{aligned}$$
This shows that $\int_a^b |f(t)| \, dt \le V_a^b F$, and the proof is complete.
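Theorem 16.50 If $A$ is a subset of $\mathbb{R}$, then
$$U_1 \supset U_2 \supset \cdots \supset U_n \supset \cdots \supset A$$
and $\lambda(U_n) - 2^{-n} < \lambda(A)$. We call $a = \inf U_1$ and consider the functions
$$\frac{\varphi_n(x + h) - \varphi_n(x)}{h} = \frac{\varphi_n(x) - \varphi_n(x - h)}{h} = 1.$$
Hence $\varphi_n'(x)$ exists at all $x \in U_n$ and $\varphi_n'(x) = 1$. We consider the series
$$\varphi_n(b) - \varphi(b) = \lambda(U_n) - \lambda(A) < \frac{1}{2^n}.$$
For every $x \in [a, b]$ we then have
$$\sum_{n=1}^{\infty} \bigl( \varphi_n(x) - \varphi(x) \bigr) \le \sum_{n=1}^{\infty} \bigl( \varphi_n(b) - \varphi(b) \bigr) \le \sum_{n=1}^{\infty} \frac{1}{2^n} < +\infty.$$
We may set $s(x) = \sum_{n=1}^{\infty} \bigl( \varphi_n(x) - \varphi(x) \bigr)$. By Theorem 16.48 we have that
$$s'(x) = \sum_{n=1}^{\infty} \bigl( \varphi_n'(x) - \varphi'(x) \bigr) \in \mathbb{R}$$
for almost every $x \in (a, b)$, and so also $\lim_{n \to +\infty} \varphi_n'(x) = \varphi'(x)$ for almost every
$x \in (a, b)$. To summarize, we see that $\varphi' = 1$ on $\bigcap_{n=1}^{\infty} U_n$ except on a set of
measure zero, and the first statement of the theorem is proved.
If $A$ is also Lebesgue-measurable, then
As $h \to 0$ and $k \to 0$ along positive values, the first part of the theorem applied
to $\mathbb{R} \setminus A$ yields that $\psi_{\mathbb{R} \setminus A}(x) \to 1$ for a.e. $x \in \mathbb{R} \setminus A$. Hence $\psi_A(x) \to 0$ for a.e.
$x \in \mathbb{R} \setminus A$, and the proof is complete.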
Theorem 16.51 If $f \in L^1(a, b)$ and $F(x) = \int_a^x f(t) \, dt$, then $F'(x) = f(x)$ for
a.e. $x \in (a, b)$.
$$S(x) = \int_a^x s(t) \, dt = \sum_{k=1}^{n} \alpha_k \int_a^x \chi_{A_k}(t) \, dt.$$
Again Theorem 16.50 shows that $S'(x) = s(x)$ for a.e. $x \in (a, b)$. If $f \in L^1(a, b)$,
there exists a sequence $\{s_n\}_n$ of simple measurable functions such that $s_n \le s_{n+1}$
and $s_n(x) \to f(x)$ for all $x \in [a, b]$. If $S_n(x) = \int_a^x s_n(t) \, dt$, Beppo Levi’s Theorem
yields
$$\begin{aligned} F(x) = \int_a^x f(t) \, dt &= \lim_{n \to +\infty} \int_a^x s_n(t) \, dt = \lim_{n \to +\infty} S_n(x) \\ &= S_1(x) + \sum_{n=1}^{\infty} \bigl( S_{n+1}(x) - S_n(x) \bigr) \end{aligned}$$
$$= \lim_{n \to +\infty} S_{n+1}'(x)$$
for a.e. $x \in (a, b)$. Therefore $\lim_{n \to +\infty} S_n'(x) = \lim_{n \to +\infty} s_n(x)$ for a.e. $x \in (a, b)$.
Since $s_n \to f$, we conclude that $F'(x) = f(x)$ for a.e. $x \in (a, b)$.
The hard question is whether the last theorem can be reversed: if $\varphi$ is a continuous
function whose derivative exists almost everywhere, is it true that
$$\varphi(x) - \varphi(a) = \int_a^x \varphi'(t) \, dt \ ?$$
Very technical examples show that this is typically false. The
Fundamental Theorem of Calculus is however valid for a restricted class of functions.
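One classical example of this failure, which we name here only as an illustration since the text above does not single out a specific one, is the Cantor function: it is continuous and non-decreasing on $[0, 1]$, its derivative vanishes almost everywhere, and yet it increases from $0$ to $1$. The sketch below evaluates it through the ternary expansion of $x$; the function name `cantor` and the truncation parameter `depth` are our choices.

```python
def cantor(x, depth=40):
    """Approximate the Cantor function on [0, 1] from the ternary digits of x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            # x lies in a removed middle third: the function is constant there
            return result + scale
        if digit == 2:
            result += scale
        scale /= 2.0
    return result


increment = cantor(1.0) - cantor(0.0)              # total increase over [0, 1]
quotient = (cantor(0.5 + 1e-3) - cantor(0.5)) / 1e-3  # difference quotient inside (1/3, 2/3)
```

Here `increment` equals $1$, while the difference quotient computed inside the middle third is $0$: the integral of the derivative over $[0, 1]$ is $0$ and cannot recover the increment.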
Definition 16.36 Let $J$ be an interval (of any kind, even unbounded), and let
$f \colon J \to \mathbb{R}$ be a function. We say that $f$ is absolutely continuous on $J$ if and only
$$\sum_{k=1}^{n} |f(d_k) - f(c_k)| < \varepsilon$$
$$V_a^b f \le \sum_{k=1}^{n} V_{x_{k-1}}^{x_k} f \le n.$$
Proposition 16.7 Every absolutely continuous function $f$ on $[a, b]$ is continuous,
and can be written as the difference $f_1 - f_2$ of two non-decreasing absolutely
continuous functions $f_1$ and $f_2$.
Proof It is obvious that absolutely continuous functions are continuous. We set
$f_1(x) = V_a^x f$ and $f_2 = f_1 - f$. We need to prove that $f_1$ is absolutely continuous.
$$\sum_{k=1}^{n} |f(d_k) - f(c_k)| < \frac{\varepsilon}{2}$$
whenever the pairwise disjoint intervals $(c_k, d_k)$ satisfy $\sum_{k=1}^{n} (d_k - c_k) < \delta$. Since
$f$ is of bounded variation, each $(c_k, d_k)$ admits a subdivision
such that
$$V_{c_k}^{d_k} f < \sum_{j=0}^{\ell_k - 1} \left| f\bigl(a_{j+1}^{(k)}\bigr) - f\bigl(a_j^{(k)}\bigr) \right| + \frac{\varepsilon}{2n}.$$
Hence
$$\sum_{k=1}^{n} |f_1(d_k) - f_1(c_k)| = \sum_{k=1}^{n} V_{c_k}^{d_k} f < \sum_{k=1}^{n} \sum_{j=0}^{\ell_k - 1} \left| f\bigl(a_{j+1}^{(k)}\bigr) - f\bigl(a_j^{(k)}\bigr) \right| + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
and the proof is complete.
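$$\int_a^b f'(x) \, dx \le f(b) - f(a).$$
Then $\{f_n\}_n$ is a sequence of non-negative measurable functions such that $f_n(x) \to
f'(x)$ for a.e. $x \in (a, b)$. In particular $f'$ is measurable. We apply Fatou’s Lemma:
$$\begin{aligned} \int_a^b f'(x) \, dx &= \int_a^b \lim_{n \to +\infty} f_n(x) \, dx \le \liminf_{n \to +\infty} \int_a^b f_n(x) \, dx \\ &= \liminf_{n \to +\infty} \, n \int_a^b \left( f\left(x + \frac{1}{n}\right) - f(x) \right) dx \\ &= \liminf_{n \to +\infty} \left( n \int_b^{b + \frac{1}{n}} f(x) \, dx - n \int_a^{a + \frac{1}{n}} f(x) \, dx \right) \\ &\le \liminf_{n \to +\infty} \left( n \int_b^{b + \frac{1}{n}} f(b) \, dx - n \int_a^{a + \frac{1}{n}} f(a) \, dx \right) \\ &= f(b) - f(a). \end{aligned}$$
$$|f(x + h) - f(x)| < \frac{h}{c - a} \, \varepsilon.$$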
Since the collection of all such intervals $[x, x + h]$ is a Vitali cover of $E$, there exists
a finite pairwise disjoint collection $\{[x_k, x_k + h_k] \mid k = 1, \ldots, n\}$ such that
$$\lambda\left( E \cap \left( \mathbb{R} \setminus \bigcup_{k=1}^{n} [x_k, x_k + h_k] \right) \right) < \delta.$$
Hence $\lambda((a, c)) = \lambda(E) < \delta + \sum_{k=1}^{n} h_k$. Assuming that $x_1 < x_2 < \cdots < x_n$, it
follows that the sum of the lengths of the intervals
$$|f(a) - f(x_1)| + \sum_{k=1}^{n-1} |f(x_k + h_k) - f(x_{k+1})| + |f(x_n + h_n) - f(c)| < \varepsilon.$$
$$\begin{aligned} |f(a) - f(c)| &\le |f(a) - f(x_1)| + \sum_{k=1}^{n-1} |f(x_k + h_k) - f(x_{k+1})| \\ &\quad + |f(x_n + h_n) - f(c)| + \sum_{k=1}^{n} |f(x_k + h_k) - f(x_k)| \\ &< \varepsilon + \sum_{k=1}^{n} \frac{h_k}{c - a} \, \varepsilon \le 2\varepsilon. \end{aligned}$$
Since $\varepsilon > 0$ is arbitrary, we conclude that $f(a) = f(c)$, and $f$ is constant on $[a, b]$.
Theorem 16.54 (Fundamental Theorem of Calculus) If $f \colon [a, b] \to \mathbb{R}$ is an
absolutely continuous function, then $f' \in L^1([a, b])$ and
$$f(x) = f(a) + \int_a^x f'(t) \, dt$$
statement.
Theorem 16.55 A function $f \colon [a, b] \to \mathbb{R}$ has the form
$$f(x) = f(a) + \int_a^x \varphi(t) \, dt$$
for some $\varphi \in L^1([a, b])$ if and only if $f$ is absolutely continuous on $[a, b]$. In this
case there results $\varphi = f'$ almost everywhere in $[a, b]$.
Exercise 16.17 Prove that a function $f \colon \mathbb{R} \to \mathbb{R}$ has the form $f(x) = \int_{-\infty}^x \varphi(t) \, dt$
for some $\varphi \in L^1(\mathbb{R})$ if and only if $f$ is absolutely continuous on $[-A, A]$ for every
$A > 0$, $V_{-\infty}^{+\infty} f \in \mathbb{R}$ and $\lim_{x \to -\infty} f(x) = 0$. To this aim, proceed as follows.
(1) If $f$ has the form $f(x) = \int_{-\infty}^x \varphi(t) \, dt$, prove that $f$ is absolutely continuous
on every interval $[-A, A]$ and that $V_{-\infty}^{+\infty} f = \int_{\mathbb{R}} |\varphi(t)| \, dt$. Use the Dominated
Convergence Theorem to show that $f(x) \to 0$ as $x \to -\infty$.
(2) Conversely, apply the previous theorem to deduce that $f(x) = f(-A) +
\int_{-A}^x f'(t) \, dt$ for every $A > 0$ and every $x > -A$.
(3) Let $A \to +\infty$ and deduce that $f(x) = \lim_{A \to +\infty} \int_{-A}^x f'(t) \, dt$.
(4) Prove that $\int_{-\infty}^{+\infty} |f'(t)| \, dt = \lim_{n \to +\infty} \int_{-n}^{n} |f'(t)| \, dt = \lim_{n \to +\infty} V_{-n}^{n} f \le
V_{-\infty}^{+\infty} f \in \mathbb{R}$.
(5) Deduce that $f' \in L^1(\mathbb{R})$, so that (3) gives $f(x) = \int_{-\infty}^x f'(t) \, dt$.
16.11 Problems
16.3 In a metric setting, the proof of Urysohn’s Lemma is much easier. Let $(X, d)$
be a metric space. For every non-empty subset $E$ of $X$ we define for all $x \in X$
$$f(x) = \frac{d_A(x)}{d_A(x) + d_B(x)}.$$
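The function of Problem 16.3 can be evaluated directly in a simple case. The sketch below is an illustration under assumptions that we state explicitly: $d_E(x)$ is taken to be the distance $\inf \{ d(x, y) \mid y \in E \}$ from $x$ to $E$ (the displayed definition is not reproduced above), the metric space is $\mathbb{R}$ with the usual distance, and $A$, $B$ are disjoint finite sets, so that the denominator never vanishes.

```python
def distance_to_set(x, points):
    # d_E(x) = inf of |x - y| over y in E; a minimum, since E is finite here
    return min(abs(x - p) for p in points)


def urysohn(x, A, B):
    dA, dB = distance_to_set(x, A), distance_to_set(x, B)
    return dA / (dA + dB)


A = [0.0, 1.0]
B = [3.0, 5.0]
samples = [urysohn(x, A, B) for x in (0.0, 1.0, 2.0, 3.0, 5.0)]
```

The values in `samples` are $0$ on $A$, $1$ on $B$, and $1/2$ at the point $2$, which is equidistant from the two sets; in between, $f$ takes values in $[0, 1]$.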
16.4 Prove that Theorem 16.18 extends to the situation in which the sequence $\{f_n\}_n$
is replaced by a family $\{f_t \mid t \in \mathbb{R}\}$ such that (i) $f_t(x) \to f(x)$ as $t \to +\infty$, for
every $x \in X$; (ii) $t \mapsto f_t(x)$ is continuous, for every $x \in X$.
16.5 Let $u \in L^1_{\mathrm{loc}}(\Omega)$ be such that $\int_{\Omega} u(x) v(x) \, dx = 0$ for every $v \in \mathcal{D}(\Omega)$. Prove
that $u = 0$ almost everywhere on $\Omega$. Hint: observe that $\rho_n * u = 0$ for every $n$.
16.12 Comments
References
1. L.C. Evans, R.F. Gariepy, Measure Theory and Fine Properties of Functions. Textbooks in
Mathematics Series Profile, 2nd edn. (CRC Press, Boca Raton, 2015)
2. G.B. Folland, Real Analysis: Modern Techniques and Their Applications. Pure and Applied
Mathematics. A Wiley-Interscience Series of Texts, Monographs, and Tracts, 2nd edn. (Wiley,
New York, 1999)
3. W. Rudin, Real and Complex Analysis. McGraw-Hill Series in Higher Mathematics, xi, 412 p.
(McGraw-Hill Book Company, New York, 1966)