Topics in Linear and Nonlinear Functional Analysis
Gerald Teschl
Graduate Studies in Mathematics, Volume (to appear)
E-mail: Gerald.Teschl@univie.ac.at
URL: http://www.mat.univie.ac.at/~gerald/
Preface
The present manuscript was written for my course Functional Analysis given
at the University of Vienna in winter 2004 and 2009. The second part consists of
the notes for my course Nonlinear Functional Analysis held at the University
of Vienna in summer 1998, 2001, and 2018. The two parts are essentially
independent. In particular, the first part does not assume any knowledge from
measure theory (at the expense of hardly mentioning Lp spaces). However,
there is an accompanying part on Real Analysis [37], where these topics are
covered.
It is updated whenever I find some errors and extended from time to
time. Hence you might want to make sure that you have the most recent
version, which is available from
http://www.mat.univie.ac.at/~gerald/ftp/book-fa/
Please do not redistribute this file or put a copy on your personal
webpage but link to the page above.
Goals
The main goal of the present book is to give students a concise introduc-
tion which gets to some interesting results without much ado while using a
sufficiently general approach suitable for further studies. Still I have tried
to always start with some interesting special cases and then work my way
up to the general theory. While this unavoidably leads to some duplications,
it usually provides much better motivation and implies that the core ma-
terial always comes first (while the more general results are then optional).
Moreover, this book is not written under the assumption that it will be
read linearly starting with the first chapter and ending with the last. Con-
sequently, I have tried to separate core and optional materials as much as
possible while keeping the optional parts as independent as possible.
Furthermore, my aim is not to present an encyclopedic treatment but to
provide the reader with a versatile toolbox for further study. Moreover, in
contradistinction to many other books, I do not have a particular direction
in mind and hence I am trying to give a broad introduction which should
prepare you for diverse fields such as spectral theory, partial differential
equations, or probability theory. This is related to the fact that I am working
in mathematical physics, an area where you never know what mathematical
theory you will need next.
I have tried to keep a balance between verbosity and clarity in the sense
that I have tried to provide sufficient detail for being able to follow the argu-
ments but without drowning the key ideas in boring details. In particular,
you will find a “show this” from time to time encouraging the reader to check
the claims made (these tasks typically involve only simple routine calcula-
tions). Moreover, to make the presentation student friendly, I have tried
to include many worked-out examples within the main text. Some of them
are standard counterexamples pointing out the limitations of theorems (and
explaining why the assumptions are important). Others show how to use the
theory in the investigation of practical examples.
Acknowledgments
I wish to thank my readers, Olta Ahmeti, Kerstin Ammann, Phillip Bachler,
Batuhan Bayır, Alexander Beigl, Mikhail Botchkarev, Ho Boon Suan, Peng
Du, Christian Ekstrand, Mischa Elkner, Damir Ferizović, Michael Fischer,
Raffaello Giulietti, Melanie Graf, Josef Greilhuber, Julian Grüber, Matthias
Hammerl, Jona Marie Hassenbach, Nobuya Kakehashi, Jerzy Knopik, Niko-
las Knotz, Florian Kogelbauer, Helge Krüger, Reinhold Küstner, Oliver Lein-
gang, Juho Leppäkangas, Annemarie Luger, Joris Mestdagh, Alice Mikikits-
Leitner, Claudiu Mîndrilǎ, Jakob Möller, Caroline Moosmüller, Nikola Ne-
govanovic, Matthias Ostermann, Martina Pflegpeter, Mateusz Piorkowski,
Piotr Owczarek, Fabio Plaga, Tobias Preinerstorfer, Maximilian H. Ruep,
Tidhar Sariel, Chiara Schindler, Christian Schmid, Stephan Schneider, Laura
Shou, Bertram Tschiderer, Liam Urban, Vincent Valmorin, David Wallauch,
Richard Welke, David Wimmesberger, Gunter Wirthumer, Song Xiaojun,
Markus Youssef, Rudolf Zeidler, and colleagues Pierre-Antoine Absil, Nils
C. Framstad, Fritz Gesztesy, Heinz Hanßmann, Günther Hörmann, Aleksey
Kostenko, Wallace Lam, Daniel Lenz, Johanna Michor, Viktor Qvarfordt,
Alex Strohmaier, David C. Ullrich, Hendrik Vogt, Marko Stautz, Maxim
Zinchenko who have pointed out several typos and made useful suggestions
for improvements. Moreover, I am most grateful to Iryna Karpenko who
read several parts of the manuscript, provided long lists of typos, and also
contributed some of the problems. I am also grateful to Volker Enß for mak-
ing his lecture notes on nonlinear Functional Analysis available to me.
Gerald Teschl
Vienna, Austria
January, 2019
Part 1
Functional Analysis
Chapter 1

A first look at Banach and Hilbert spaces
which cannot be satisfied and explains our choice of sign above). In summary,
we obtain the solutions
un(t, x) := cn e^{−(πn)² t} sin(nπx),  n ∈ N.  (1.10)
1.1. Introduction: Linear partial differential equations 5
So we have found a large number of solutions, but we still have not dealt
with our initial condition u(0, x) = u0 (x). This can be done using the
superposition principle which holds since our equation is linear: Any finite
linear combination of the above solutions will again be a solution. Moreover,
under suitable conditions on the coefficients we can even consider infinite
linear combinations. In fact, choosing
u(t, x) := Σ_{n=1}^∞ cn e^{−(πn)² t} sin(nπx),  (1.11)
we see that the solution of our original problem is given by (1.11) if we choose
cn = û0,n (cf. Problem 1.2).
Of course for this last statement to hold we need to ensure that the series
in (1.11) converges and that we can interchange summation and differentia-
tion. You are asked to do so in Problem 1.1.
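As a purely numerical illustration of (1.11) (a rough sketch, not part of the problems: the quadrature rule, the truncation level N, and the sample initial condition u0(x) = x(1 − x) are ad hoc choices), one can compute the coefficients û0,n by quadrature and evaluate the truncated series:

```python
import math

def sine_coeff(u0, n, m=4000):
    # u_hat_{0,n} = 2 * integral_0^1 u0(x) sin(n pi x) dx, via the midpoint rule
    h = 1.0 / m
    return 2.0 * h * sum(u0((j + 0.5) * h) * math.sin(n * math.pi * (j + 0.5) * h)
                         for j in range(m))

def u(t, x, u0, N=40):
    # truncated series (1.11) with cn = u_hat_{0,n}
    return sum(sine_coeff(u0, n) * math.exp(-(math.pi * n) ** 2 * t)
               * math.sin(n * math.pi * x) for n in range(1, N + 1))

u0 = lambda x: x * (1.0 - x)           # sample initial condition vanishing at 0 and 1
print(abs(u(0.0, 0.3, u0) - u0(0.3)))  # small: the series reproduces u0 at t = 0
print(abs(u(1.0, 0.3, u0)))            # tiny: the solution decays rapidly in t
```

At t = 0 the truncation error is governed by the decay of the coefficients û0,n; for t > 0 the factor e^{−(πn)² t} makes the series converge extremely fast, in line with the smoothing discussed here.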
In fact, many equations in physics can be solved in a similar way:
• Reaction-Diffusion equation:
∂/∂t u(t, x) − ∂²/∂x² u(t, x) + q(x)u(t, x) = 0,
u(0, x) = u0(x),
u(t, 0) = u(t, 1) = 0.  (1.14)
Here u(t, x) could be the density of some gas in a pipe and q(x) > 0 describes
that a certain amount per time is removed (e.g., by a chemical reaction).
• Wave equation:
∂²/∂t² u(t, x) − ∂²/∂x² u(t, x) = 0,
u(0, x) = u0(x),  ∂u/∂t (0, x) = v0(x),
u(t, 0) = u(t, 1) = 0.  (1.15)
Here u(t, x) is the displacement of a vibrating string which is fixed at x = 0
and x = 1. Since the equation is of second order in time, both the initial
displacement u0 (x) and the initial velocity v0 (x) of the string need to be
known.
• Schrödinger equation:2
i ∂/∂t u(t, x) = −∂²/∂x² u(t, x) + q(x)u(t, x),
u(0, x) = u0(x),
u(t, 0) = u(t, 1) = 0.  (1.16)
Here |u(t, x)|2 is the probability distribution of a particle trapped in a box
x ∈ [0, 1] and q(x) is a given external potential which describes the forces
acting on the particle.
All these problems (and many others) lead to the investigation of the
following problem
Ly(x) = λy(x),  L := −d²/dx² + q(x),  (1.17)
subject to the boundary conditions
y(a) = y(b) = 0. (1.18)
Such a problem is called a Sturm–Liouville boundary value problem.3
Our example shows that we should prove the following facts about Sturm–
Liouville problems:
(i) The Sturm–Liouville problem has a countable number of eigenval-
ues En with corresponding eigenfunctions un , that is, un satisfies
the boundary conditions and Lun = En un .
(ii) The eigenfunctions un are complete, that is, any nice function u
can be expanded into a generalized Fourier series
u(x) = Σ_{n=1}^∞ cn un(x).
This problem is very similar to the eigenvalue problem of a matrix and we
are looking for a generalization of the well-known fact that every symmetric
matrix has an orthonormal basis of eigenvectors. However, our linear opera-
tor L is now acting on some space of functions which is not finite dimensional
and it is not at all clear what (e.g.) orthogonal should mean in this context.
Moreover, since we need to handle infinite series, we need convergence and
hence we need to define the distance of two functions as well.
Hence our program looks as follows:
2Erwin Schrödinger (1887–1961), Austrian physicist
3 Jacques Charles François Sturm (1803–1855), French mathematician; Joseph Liouville (1809–1882), French mathematician and engineer
provided the sum in (1.13) converges uniformly. Conclude that in this case
the solution can be expressed as
u(t, x) = ∫₀¹ K(t, x, y) u0(y) dy,  t > 0,
where
K(t, x, y) := 2 Σ_{n=1}^∞ e^{−(πn)² t} sin(nπx) sin(nπy)
= (1/2) ( ϑ((x−y)/2, iπt) − ϑ((x+y)/2, iπt) ).
Here

ϑ(z, τ) := Σ_{n∈Z} e^{iπn²τ + 2πinz} = 1 + 2 Σ_{n∈N} e^{iπn²τ} cos(2πnz),  Im(τ) > 0,

is the Jacobi theta function.
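The two series representations of ϑ and the identity expressing K in terms of ϑ can be spot-checked numerically; the following sketch (truncation level and sample point chosen ad hoc) compares the sine series for K with the theta-function expression:

```python
import cmath, math

def theta(z, tau, N=60):
    # theta(z, tau) = sum over n in Z of exp(i pi n^2 tau + 2 pi i n z), |n| <= N
    return sum(cmath.exp(1j * math.pi * n * n * tau + 2j * math.pi * n * z)
               for n in range(-N, N + 1))

def K_series(t, x, y, N=60):
    # the heat kernel as a sine series
    return 2.0 * sum(math.exp(-(math.pi * n) ** 2 * t)
                     * math.sin(n * math.pi * x) * math.sin(n * math.pi * y)
                     for n in range(1, N + 1))

t, x, y = 0.02, 0.3, 0.7
lhs = K_series(t, x, y)
rhs = 0.5 * (theta((x - y) / 2, 1j * math.pi * t)
             - theta((x + y) / 2, 1j * math.pi * t))
print(abs(lhs - rhs.real), abs(rhs.imag))  # both negligibly small
```

The imaginary part of the right-hand side vanishes since the terms for n and −n are complex conjugates of each other for real z.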
It is not hard to see that with this definition C(I) becomes a normed vector
space:
A normed vector space X is a vector space X over C (or R) with a
nonnegative function (the norm) ∥.∥ : X → [0, ∞) such that
• ∥f ∥ > 0 for f ∈ X \ {0} (positive definiteness),
• ∥α f ∥ = |α| ∥f ∥ for all α ∈ C, f ∈ X (positive homogeneity),
and
• ∥f + g∥ ≤ ∥f ∥ + ∥g∥ for all f, g ∈ X (triangle inequality).
If positive definiteness is dropped from the requirements, one calls ∥.∥ a
seminorm.
From the triangle inequality we also get the inverse triangle inequal-
ity (Problem 1.3)
|∥f ∥ − ∥g∥| ≤ ∥f − g∥, (1.20)
which shows that the norm is continuous.
Also note that norms are closely related to convexity. To this end recall
that a subset C ⊆ X is called convex if for every f, g ∈ C we also have
λf + (1 − λ)g ∈ C whenever λ ∈ (0, 1). Moreover, a mapping F : C → R is
called convex if F (λf +(1−λ)g) ≤ λF (f )+(1−λ)F (g) whenever λ ∈ (0, 1)
and f, g ∈ C. In our case the triangle inequality plus homogeneity imply
that every norm is convex:
∥λf + (1 − λ)g∥ ≤ λ∥f ∥ + (1 − λ)∥g∥, λ ∈ [0, 1]. (1.21)
Moreover, choosing λ = 1/2 we get back the triangle inequality upon using
homogeneity. In particular, the triangle inequality could be replaced by
convexity in the definition.
Once we have a norm, we have a distance d(f, g) := ∥f − g∥ (in par-
ticular, every normed space is a special case of a metric space) and hence
we know when a sequence of vectors fn converges to a vector f (namely if
1.2. The Banach space of continuous functions 9
∥fn − f ∥ → 0, that is, for every ε > 0 there is some N such that ∥fn − f ∥ < ε
for all n ≥ N ). We will write fn → f or limn→∞ fn = f , as usual, in this
case. Moreover, a mapping F : X → Y between two normed spaces is
called continuous if for every convergent sequence fn → f from X we have
F (fn ) → F (f ) (with respect to the norm of X and Y , respectively). In
fact, the norm, vector addition, and multiplication by scalars are continuous
(Problem 1.4).
Two normed spaces X and Y are called isomorphic if there exists a lin-
ear bijection T : X → Y such that T and its inverse T −1 are continuous. We
will write X ∼= Y in this case. They are called isometrically isomorphic
if in addition, T is an isometry, ∥T (f )∥ = ∥f ∥ for every f ∈ X.
In addition to the concept of convergence, we also have the concept of
a Cauchy sequence:6 A sequence fn is Cauchy if for every ε > 0 there is
some N such that ∥fn −fm ∥ < ε for all n, m ≥ N . Of course every convergent
sequence is Cauchy but the converse might not be true in general. Hence a
normed space is called complete if every Cauchy sequence has a limit. A
complete normed space is called a Banach space.7
Example 1.1. By completeness, the real numbers R as well as the complex
numbers C with the absolute value as norm are Banach spaces. ⋄
Example 1.2. The space ℓ1(N) of all complex-valued sequences a = (aj)_{j=1}^∞
for which the norm

∥a∥1 := Σ_{j=1}^∞ |aj|  (1.22)

is finite is a Banach space.
To show this, we need to verify three things: (i) ℓ1 (N) is a vector space,
that is, closed under addition and scalar multiplication, (ii) ∥.∥1 satisfies the
three requirements for a norm, and (iii) ℓ1 (N) is complete.
First of all, observe

Σ_{j=1}^k |aj + bj| ≤ Σ_{j=1}^k |aj| + Σ_{j=1}^k |bj| ≤ ∥a∥1 + ∥b∥1

Σ_{j=1}^k |aj − a^n_j| ≤ ε,  n ≥ N.
[Figure: unit balls {a | ∥a∥p ≤ 1} in R² for p = 1/2, 1, 2, 4, ∞]
restriction p ≥ 1). Moreover, for 1 < p < ∞ it is even strictly convex (that
is, the line segment joining two distinct points is always in the interior). This
is related to the question of equality in the triangle inequality and will be
discussed in Problems 1.15 and 1.16. ⋄
Example 1.4. The space ℓ∞(N) of all complex-valued bounded sequences
a = (aj)_{j=1}^∞ together with the norm ∥a∥∞ := sup_{j∈N} |aj|
is a Banach space (Problem 1.11). Note that with this definition, Hölder’s
inequality (1.25) remains true for the cases p = 1, q = ∞ and p = ∞, q = 1.
The reason for the notation is explained in Problem 1.17. ⋄
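Hölder’s inequality Σj |aj bj| ≤ ∥a∥p ∥b∥q (with 1/p + 1/q = 1), including the endpoint cases just mentioned, is easy to spot-check numerically; in the sketch below the helper `pnorm` and the random test data are ad hoc:

```python
import math, random

def pnorm(a, p):
    # lp norm of a finite sequence; p = math.inf gives the sup norm
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(200)]
b = [random.uniform(-1, 1) for _ in range(200)]

# Hoelder: sum |a_j b_j| <= ||a||_p ||b||_q whenever 1/p + 1/q = 1
for p, q in [(1, math.inf), (math.inf, 1), (2, 2), (3, 1.5)]:
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    assert lhs <= pnorm(a, p) * pnorm(b, q) + 1e-12
print("Hoelder inequality holds in all tested cases")
```

Of course such a check is no proof, but it is a useful way to catch wrong conventions (e.g. mixing up p and q) when working with these inequalities.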
By a subspace U of a normed space X, we mean a subset which is closed
under the vector operations. If it is also closed in a topological sense, we call
it a closed subspace. In this context recall that a subset U ⊆ X is called open
if for every point f there is also a ball Bε (f ) := {g ∈ X| ∥f − g∥ < ε} ⊆ X
contained within the set. The closed sets are then defined as the complements
of open sets and one has that a set V ⊆ X is closed if and only if for
every convergent sequence fn ∈ V the limit is also in the set, limn fn ∈ V .
Warning: Some authors require subspaces to be closed.
Example 1.5. Every closed subspace of a Banach space is again a Banach
space. For example, the space c0 (N) ⊂ ℓ∞ (N) of all sequences converging to
zero is a closed subspace. In fact, if a ∈ ℓ∞(N)\c0(N), then lim sup_{j→∞} |aj| = ε > 0 and thus a + b ∉ c0(N) for every b ∈ ℓ∞(N) with ∥b∥∞ < ε. Hence the
complement of c0 (N) is open. ⋄
Now what about completeness of C(I)? A sequence of functions fn
converges to f if and only if
lim_{n→∞} ∥f − fn∥∞ = lim_{n→∞} max_{x∈I} |f(x) − fn(x)| = 0.  (1.27)
For finite dimensional vector spaces the concept of a basis plays a crucial
role. In the case of infinite dimensional vector spaces one could define a
basis as a maximal set of linearly independent vectors (known as a Hamel
basis;10 Problem 1.8). Such a basis has the advantage that it only requires
finite linear combinations. However, the price one has to pay is that such
a basis will be way too large (typically uncountable, cf. Problems 1.7 and
4.4). Since we have the notion of convergence, we can handle countable
linear combinations and try to look for countable bases. We start with a few
definitions.
10 Georg Hamel (1877–1954), German mathematician
The set of all finite linear combinations of a set of vectors {un }n∈N ⊂ X
is called the span of {un }n∈N and denoted by
span{un}_{n∈N} := { Σ_{j=1}^m αj u_{nj} | nj ∈ N, αj ∈ C, m ∈ N }.  (1.30)
Let a = (aj)_{j=1}^∞ ∈ ℓp(N) be given and set a^m := Σ_{n=1}^m an δ^n. Then

∥a − a^m∥p = ( Σ_{j=m+1}^∞ |aj|^p )^{1/p} → 0

since a^m_j = aj for 1 ≤ j ≤ m and a^m_j = 0 for j > m. Hence

a = Σ_{n=1}^∞ an δ^n

and (δ^n)_{n=1}^∞ is a Schauder basis (uniqueness of the coefficients is left as an exercise).

Note that (δ^n)_{n=1}^∞ is also a Schauder basis for c0(N) but not for ℓ∞(N) (try to approximate the constant sequence (1, 1, 1, . . . )).
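The contrast between ℓ1 and ℓ∞ here is easy to observe numerically: the truncation errors ∥a − a^m∥ shrink in ℓ1 but stay put in ℓ∞ for a bounded sequence not converging to zero. A quick sketch (the sample sequences are ad hoc, and the infinite sequences are represented by long finite truncations):

```python
a = [1.0 / (n * n) for n in range(1, 10001)]   # a_n = 1/n^2 lies in l1

def tail_l1(m):
    # || a - a^m ||_1 = sum_{j > m} |a_j|   (within this finite truncation)
    return sum(a[m:])

print(tail_l1(10), tail_l1(100), tail_l1(1000))   # decreasing toward 0

b = [1.0] * 10000                                 # constant sequence in linf

def tail_sup(m):
    # || b - b^m ||_inf = sup_{j > m} |b_j|
    return max(b[m:])

print(tail_sup(10), tail_sup(5000))               # stuck at 1: no convergence
```

This is exactly why the closure of span{δ^n} in ℓ∞(N) is c0(N) rather than all of ℓ∞(N).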
Proof. Since f is uniformly continuous, for given ε we can find a δ < 1/2
(independent of x) such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. Moreover,
we can choose n such that ∫_{δ≤|y|≤1} un(y) dy ≤ ε. Now abbreviate
M := max_{x∈[−1/2,1/2]} {1, |f(x)|} and note

|f(x) − ∫_{−1/2}^{1/2} un(x − y) f(x) dy| = |f(x)| |1 − ∫_{−1/2}^{1/2} un(x − y) dy| ≤ M ε.
Corollary 1.4. The monomials are total and hence C(I) is separable.
Note that while the proof of Theorem 1.3 provides an explicit way of
constructing a sequence of polynomials fn (x) which will converge uniformly
to f (x), this method still has a few drawbacks from a practical point of
view: Suppose we have approximated f by a polynomial of degree n but our
approximation turns out to be insufficient for the intended purpose. First
of all, since our polynomial will not be optimal in general, we could try to
find another polynomial of the same degree giving a better approximation.
However, as this is by no means straightforward, it seems more feasible to
simply increase the degree. However, if we do this, all coefficients will change
and we need to start from scratch. This is in contradistinction to a Schauder
basis where we could just add one new element from the basis (and where it
suffices to compute one new coefficient).
In particular, note that this shows that the monomials are not a Schauder
basis for C(I) since the coefficients would have to satisfy |αn| ∥x^n∥∞ = ∥fn − fn−1∥∞ →
0 and hence the limit must be analytic on the interior of I. This observation
emphasizes that a Schauder basis is more than a set of linearly independent
vectors whose span is dense.
We will see in the next section that the concept of orthogonality resolves
these problems.
Problem* 1.3. Let X be a normed space and f, g ∈ X. Show that |∥f ∥ −
∥g∥| ≤ ∥f − g∥.
Problem* 1.4. Let X be a normed space. Show that the norm, vector
addition, and multiplication by scalars are continuous. That is, if fn → f ,
gn → g, and αn → α, then ∥fn ∥ → ∥f ∥, fn + gn → f + g, and αn gn → αg.
Problem 1.5. Let X be a normed space and g ∈ X. Show that ∥f ∥ ≤
max(∥f − g∥, ∥f + g∥).
Problem 1.6. Let X be a Banach space. Show that Σ_{j=1}^∞ ∥fj∥ < ∞ implies
that

Σ_{j=1}^∞ fj = lim_{n→∞} Σ_{j=1}^n fj
exists. The series is called absolutely convergent in this case. Conversely,
show that a normed space is complete if every absolutely convergent series
converges.
Problem 1.7. While ℓ1 (N) is separable, it still has room for an uncountable
set of linearly independent vectors. Show this by considering vectors of the
form
aα = (1, α, α2 , . . . ), α ∈ (0, 1).
(Hint: Recall the Vandermonde15 determinant. See Problem 4.4 for a gen-
eralization.)
Problem 1.8. A Hamel basis is a maximal set of linearly independent
vectors. Show that every vector space X has a Hamel basis {uα }α∈A . Show
that given a Hamel basis, every x ∈ X can be written as a finite linear
combination x = Σ_{j=1}^n cj u_{αj}, where the vectors u_{αj} and the constants cj
are uniquely determined. (Hint: Use Zorn’s lemma, Theorem A.2, to show
existence.)
Problem* 1.9. Prove Young’s inequality (1.24). Show that equality occurs
precisely if α = β. (Hint: Take logarithms on both sides.)
Problem* 1.10. Show that ℓp (N), 1 ≤ p < ∞, is complete.
Problem* 1.11. Show that ℓ∞ (N) is a Banach space.
Problem 1.12. Is ℓ1 (N) a closed subspace of ℓ∞ (N) (with respect to the
∥.∥∞ norm)? If not, what is its closure?
Problem* 1.13. Show that ℓ∞ (N) is not separable. (Hint: Consider se-
quences which take only the value one and zero. How many are there? What
is the distance between two such sequences?)
Problem 1.14. Show that the set of convergent sequences c(N) is a Banach
space isomorphic to the set of sequences converging to zero, c0(N). (Hint:
Hilbert’s hotel.)
Problem* 1.15. Show that there is equality in the Hölder inequality (1.25)
for 1 < p < ∞ if and only if either a = 0 or |bj |q = α|aj |p for all j ∈ N.
Show that we have equality in the triangle inequality for ℓ1 (N) if and only if
aj b∗j ≥ 0 for all j ∈ N (here the ‘∗’ denotes complex conjugation). Show that
15Alexandre-Théophile Vandermonde (1735–1796), French mathematician, musician and
chemist
we have equality in the triangle inequality for ℓp (N) with 1 < p < ∞ if and
only if a = 0 or b = αa with α ≥ 0.
Problem* 1.16. Let X be a normed space. Show that the following condi-
tions are equivalent.
(i) If ∥x + y∥ = ∥x∥ + ∥y∥ then y = αx for some α ≥ 0 or x = 0.
(ii) If ∥x∥ = ∥y∥ = 1 and x ̸= y then ∥λx + (1 − λ)y∥ < 1 for all
0 < λ < 1.
(iii) If ∥x∥ = ∥y∥ = 1 and x ̸= y then (1/2)∥x + y∥ < 1.
(iv) The function x 7→ ∥x∥2 is strictly convex.
A norm satisfying one of them is called strictly convex.
Show that ℓp (N) is strictly convex for 1 < p < ∞ but not for p = 1, ∞.
Problem 1.17. Show that p0 ≤ p implies ℓp0 (N) ⊂ ℓp (N) and ∥a∥p ≤ ∥a∥p0 .
Moreover, show

lim_{p→∞} ∥a∥p = ∥a∥∞.
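Both claims of Problem 1.17 are easy to observe numerically before proving them (the sample vector and exponents below are ad hoc choices):

```python
def pnorm(a, p):
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

a = [3.0, -2.0, 1.0, 0.5]
sup_norm = max(abs(x) for x in a)
norms = [pnorm(a, p) for p in (1, 2, 4, 8, 32, 128)]
print(norms)                 # nonincreasing in p, as claimed
print(norms[-1], sup_norm)   # || a ||_p approaches || a ||_inf = 3
```

For large p the largest entry dominates the sum, which is precisely the mechanism behind the limit ∥a∥p → ∥a∥∞.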
Problem 1.18. Formally extend the definition of ℓp (N) to p ∈ (0, 1). Show
that ∥.∥p does not satisfy the triangle inequality. However, show that it is a
quasinormed space, that is, it satisfies all requirements for a normed space
except for the triangle inequality which is replaced by
∥a + b∥ ≤ K(∥a∥ + ∥b∥)
with some constant K ≥ 1. Show, in fact,

∥a + b∥p ≤ 2^{1/p−1} (∥a∥p + ∥b∥p),  p ∈ (0, 1).

Moreover, show that ∥.∥p^p satisfies the triangle inequality in this case, but
of course it is no longer homogeneous (but at least you can get an honest
metric d(a, b) := ∥a − b∥p^p which gives rise to the same topology). (Hint:
Show α + β ≤ (α^p + β^p)^{1/p} ≤ 2^{1/p−1}(α + β) for 0 < p < 1 and α, β ≥ 0.)
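The failure of the triangle inequality for p < 1, the quasinorm constant 2^{1/p−1}, and the triangle inequality for ∥.∥p^p can all be observed numerically; the sketch below uses p = 1/2 and random test vectors (all choices ad hoc):

```python
import random

def p_quasi(a, p):
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

p = 0.5
a, b = [1.0, 0.0], [0.0, 1.0]
s = [x + y for x, y in zip(a, b)]
print(p_quasi(s, p))   # 4.0 > ||a||_p + ||b||_p = 2: triangle inequality fails
# ... but the quasinorm bound with K = 2^{1/p - 1} = 2 holds (with equality here):
assert p_quasi(s, p) <= 2 ** (1 / p - 1) * (p_quasi(a, p) + p_quasi(b, p)) + 1e-12

# ||.||_p^p does satisfy the triangle inequality:
random.seed(1)
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(5)]
    b = [random.uniform(-1, 1) for _ in range(5)]
    s = [x + y for x, y in zip(a, b)]
    assert p_quasi(s, p) ** p <= p_quasi(a, p) ** p + p_quasi(b, p) ** p + 1e-12
print("quasinorm checks passed")
```

The choice a = δ¹, b = δ² in the first check shows that the constant 2^{1/p−1} cannot be improved.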
Problem 1.19. Let I be a compact interval and consider X := C(I). Which
of following sets are subspaces of X? If yes, are they closed?
(i) monotone functions
(ii) even functions
(iii) polynomials
(iv) polynomials of degree at most k for some fixed k ∈ N0
(v) continuous piecewise linear functions
(vi) C 1 (I)
(vii) {f ∈ C(I)|f (c) = f0 } for some fixed c ∈ I and f0 ∈ R
1.3. The geometry of Hilbert spaces 19
Problem* 1.22. Show that the following set of functions is a Schauder
basis for C[0, 1]: We start with u1 (t) = t, u2 (t) = 1 − t and then split
[0, 1] into 2n intervals of equal length and let u2n +k+1 (t), for 1 ≤ k ≤ 2n ,
be a piecewise linear peak of height 1 supported in the k’th subinterval:
u2n +k+1 (t) := max(0, 1 − |2n+1 t − 2k + 1|) for n ∈ N0 and 1 ≤ k ≤ 2n .
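The hat functions of Problem 1.22 are easy to implement; the following sketch (a plausibility check of the formula, not a proof of the basis property; the choice of n and k is ad hoc) verifies the peak height and the support claims:

```python
def u_hat(n, k, t):
    # u_{2^n + k + 1}(t) = max(0, 1 - |2^{n+1} t - 2k + 1|):
    # a hat of height 1 on the k-th of 2^n equal subintervals of [0, 1]
    return max(0.0, 1.0 - abs(2 ** (n + 1) * t - 2 * k + 1))

n, k = 3, 5                            # 8 subintervals; the 5th is [1/2, 5/8]
mid = (2 * k - 1) / 2 ** (n + 1)       # midpoint of that subinterval
print(u_hat(n, k, mid))                # 1.0: the peak
print(u_hat(n, k, (k - 1) / 2 ** n), u_hat(n, k, k / 2 ** n))  # 0.0 at endpoints
print(u_hat(n, k, 0.3))                # 0.0 outside the k-th subinterval
```

Note that the peak sits at the midpoint (2k − 1)/2^{n+1} of the k-th subinterval and the support is exactly [(k − 1)/2^n, k/2^n].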
The pair (H, ⟨., .⟩) is called an inner product space. If H is complete
(with respect to the norm (1.36)), it is called a Hilbert space.17
Example 1.9. Clearly, Cn with the usual scalar product
⟨a, b⟩ := Σ_{j=1}^n a*_j bj  (1.37)
[Figure: decomposition f = f∥ + f⊥ of a vector f into components parallel and orthogonal to u]
Proof. It suffices to prove the case ∥g∥ = 1. But then the claim follows
from ∥f ∥2 = |⟨g, f ⟩|2 + ∥f⊥ ∥2 . □
holds.
In this case the scalar product can be recovered from its norm by virtue
of the polarization identity

⟨f, g⟩ = (1/4) ( ∥f + g∥² − ∥f − g∥² + i∥f − ig∥² − i∥f + ig∥² ).  (1.47)
Proof. If an inner product space is given, verification of the parallelogram
law and the polarization identity is straightforward (Problem 1.27).
To show the converse, we define

s(f, g) := (1/4) ( ∥f + g∥² − ∥f − g∥² + i∥f − ig∥² − i∥f + ig∥² ).
Then s(f, f ) = ∥f ∥2 and s(f, g) = s(g, f )∗ are straightforward to check.
Moreover, another straightforward computation using the parallelogram law
shows
s(f, g) + s(f, h) = 2 s(f, (g + h)/2).
Now choosing h = 0 (and using s(f, 0) = 0) shows s(f, g) = 2s(f, g/2) and thus
s(f, g) + s(f, h) = s(f, g + h). Furthermore, by induction we infer (m/2^n) s(f, g) =
s(f, (m/2^n) g); that is, α s(f, g) = s(f, αg) for a dense set of positive rational
numbers α. By continuity (which follows from continuity of the norm) this
holds for all α ≥ 0, and s(f, −g) = −s(f, g), respectively s(f, ig) = i s(f, g),
finishes the proof. □
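The polarization identity (1.47) can be spot-checked in Cn with the scalar product (1.37), which is conjugate-linear in the first argument; the sketch below uses ad hoc sample vectors:

```python
def inner(a, b):
    # <a, b> = sum of a_j^* b_j, as in (1.37)
    return sum(x.conjugate() * y for x, y in zip(a, b))

def norm_sq(a):
    return inner(a, a).real

def polar(f, g):
    # right-hand side of the polarization identity (1.47)
    comb = lambda c: [x + c * y for x, y in zip(f, g)]
    return (norm_sq(comb(1)) - norm_sq(comb(-1))
            + 1j * norm_sq(comb(-1j)) - 1j * norm_sq(comb(1j))) / 4

f = [1 + 2j, -0.5j]
g = [3 - 1j, 2 + 2j]
print(inner(f, g))
print(polar(f, g))   # agrees with the scalar product
```

Beware that the signs in (1.47) depend on the convention for which argument is conjugate-linear; with the opposite convention the roles of f and g are interchanged.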
Suppose we have two norms ∥.∥1 and ∥.∥2 on a vector space X. Then
∥.∥2 is said to be stronger than ∥.∥1 if there is a constant m > 0 such that
∥f ∥1 ≤ m∥f ∥2 . (1.50)
It is straightforward to check the following.
Lemma 1.7. If ∥.∥2 is stronger than ∥.∥1 , then every ∥.∥2 Cauchy sequence
is also a ∥.∥1 Cauchy sequence.
Proof. Choose a basis {uj}_{1≤j≤n} such that every f ∈ X can be written
as f = Σ_j αj uj. Since equivalence of norms is an equivalence relation
(check this!), we can assume that ∥.∥2 is the usual Euclidean norm:
∥f∥2 := ∥Σ_j αj uj∥2 = (Σ_j |αj|²)^{1/2}. Then by the triangle and Cauchy–
Schwarz inequalities,

∥f∥1 ≤ Σ_j |αj| ∥uj∥1 ≤ (Σ_j ∥uj∥1²)^{1/2} ∥f∥2

and we can choose m2 = (Σ_j ∥uj∥1²)^{1/2}.
In particular, if fn is convergent with respect to ∥.∥2 , it is also convergent
with respect to ∥.∥1 . Thus ∥.∥1 is continuous with respect to ∥.∥2 and attains
its minimum m > 0 on the unit sphere S := {u|∥u∥2 = 1} (which is compact
by the Heine–Borel theorem, Theorem B.22). Now choose m1 = 1/m. □
Finally, I remark that a real Hilbert space can always be embedded into
a complex Hilbert space. In fact, if H is a real Hilbert space, then H × H is
a complex Hilbert space if we define
(f1 , f2 )+(g1 , g2 ) = (f1 +g1 , f2 +g2 ), (α+iβ)(f1 , f2 ) = (αf1 −βf2 , αf2 +βf1 )
(1.52)
and

⟨(f1, f2), (g1, g2)⟩ := ⟨f1, g1⟩ + ⟨f2, g2⟩ + i (⟨f1, g2⟩ − ⟨f2, g1⟩).

Here you should think of (f1, f2) as f1 + if2. Note that we have a conjugate
linear map C : H × H → H × H, (f1, f2) ↦ (f1, −f2), which satisfies C² = I
and ⟨Cf, Cg⟩ = ⟨g, f⟩. In particular, we can get our original Hilbert space
back if we consider Re(f) := (1/2)(f + Cf) = (f1, 0).
Problem 1.23. Which of the following bilinear forms are scalar products on
Rn ?
(i) s(x, y) := Σ_{j=1}^n (xj + yj).
is finite. Show
∥q∥ ≤ ∥s∥ ≤ 2∥q∥
with ∥q∥ = ∥s∥ if s is symmetric. (Hint: Use the polarization identity from
the previous problem. For the symmetric case look at the real part.)
Problem* 1.29. Suppose Q is a vector space. Let s(f, g) be a sesquilinear
form on Q and q(f ) := s(f, f ) the associated quadratic form. Show that the
Cauchy–Schwarz inequality
|s(f, g)| ≤ q(f )1/2 q(g)1/2
holds if q(f ) ≥ 0. In this case q(.)1/2 satisfies the triangle inequality and
hence is a seminorm.
(Hint: Consider 0 ≤ q(f + αg) = q(f ) + 2Re(α s(f, g)) + |α|2 q(g) and
choose α = t s(f, g)∗ /|s(f, g)| with t ∈ R.)
Problem* 1.30. Prove the claims made about fn in Example 1.11.
1.4. Completeness
Since L2cont (I) is not complete, how can we obtain a Hilbert space from it?
Well, the answer is simple: take the completion.
If X is an (incomplete) normed space, consider the set of all Cauchy
sequences X . Call two Cauchy sequences equivalent if their difference con-
verges to zero and denote by X̄ the set of all equivalence classes. It is easy
to see that X̄ (and X ) inherit the vector space structure from X. Moreover,
Proof. (Outline) To see that constant sequences are dense, note that we
can approximate [(xn)_{n=1}^∞] by the constant sequence [(x_{n0})_{n=1}^∞] as n0 → ∞.
For completeness, let ξn = [(x_{n,j})_{j=1}^∞] be a Cauchy
sequence in X̄. Without loss of generality (by dropping terms) we can choose
the representatives x_{n,j} such that |x_{n,j} − x_{n,k}| ≤ 1/n for j, k ≥ n. Then it is
not hard to see that ξ = [(x_{j,j})_{j=1}^∞] is its limit. □
The only requirement for a norm which is not immediate is the triangle
inequality (except for p = 1, 2) but this can be shown as for ℓp (cf. Prob-
lem 1.33). ⋄
Problem 1.31. Provide a detailed proof of Theorem 1.10.
Problem 1.32. For every f ∈ L1(I) we can define its integral

∫_c^d f(x) dx
1.5. Compactness
In analysis, compactness is one of the most ubiquitous tools for showing
existence of solutions for various problems. In finite dimensions relatively
compact sets are easily identified as they are precisely the bounded sets by
the Heine–Borel theorem (Theorem B.22). In the infinite dimensional case
the situation is more complicated. Before we look into this, please recall
that for a subset U of a Banach space (or more generally a complete metric
space) the following are equivalent (see Corollary B.20 and Lemma B.26):
• U is relatively compact (i.e. its closure is compact)
• every sequence from U has a convergent subsequence
• U is totally bounded (i.e. it has a finite ε-cover for every ε > 0)
Example 1.14. Consider the bounded sequence (δ^n)_{n=1}^∞ in ℓp(N). Since
∥δ^n − δ^m∥p = 2^{1/p} for n ̸= m, there is no way to extract a convergent
subsequence. ⋄
{Bε(xj)}_{j=1}^n is an ε-cover for K since P_ε^{−1}(Bδ(yj)) ∩ K ⊆ Bε(xj).
For the last claim consider Pε/3 and note that for δ := ε/3 we have
∥x − y∥ ≤ ∥(1 − Pε/3 )x∥ + ∥Pε/3 (x − y)∥ + ∥(1 − Pε/3 )y∥ < ε for x, y ∈ K. □
Proof. Clearly (i) and (ii) are what is needed for Lemma 1.11.
Conversely, if K is relatively compact it is bounded. Moreover, given
δ we can choose a finite δ-cover {Bδ(a^j)}_{j=1}^m for K and some n such that
∥(1 − Pn)a^j∥p ≤ δ for all 1 ≤ j ≤ m (this last claim fails for ℓ∞(N)). Now
is finite. This says that A is bounded if the image of the closed unit ball
B̄1X (0) ∩ D(A) is contained in some closed ball B̄rY (0) of finite radius r (with
the smallest radius being the operator norm). Hence A is bounded if and
only if it maps bounded sets to bounded sets.
Note that if you replace the norm on X or Y , then the operator norm
will of course also change in general. However, if the norms are equivalent
so will be the operator norms.
By construction, a bounded operator satisfies
∥Ax∥Y ≤ ∥A∥∥x∥X , x ∈ D(A), (1.61)
1.6. Bounded operators 31
and hence is Lipschitz25 continuous, that is, ∥Ax − Ay∥Y ≤ ∥A∥∥x − y∥X for
x, y ∈ D(A). Note that ∥A∥ could also be defined as the optimal constant
in the inequality (1.61). In particular, it is continuous. The converse is also
true:
Proof. Choose a basis {xj}_{j=1}^n for D(A) such that every x ∈ D(A) can be
written as x = Σ_{j=1}^n αj xj. By Theorem 1.8 there is a constant m > 0 such
that (Σ_{j=1}^n |αj|²)^{1/2} ≤ m ∥x∥X. Then

∥Ax∥Y ≤ Σ_{j=1}^n |αj| ∥Axj∥Y ≤ m (Σ_{j=1}^n ∥Axj∥Y²)^{1/2} ∥x∥X

and thus ∥A∥ ≤ m (Σ_{j=1}^n ∥Axj∥Y²)^{1/2}. □
However, if we consider A = d/dx : D(A) ⊆ Y → Y defined on D(A) =
C¹[0, 1], then we have an unbounded operator. Indeed, choose un(x) :=
called the dual space of X. The dual space takes the role of coordinate
functions in a Banach space.
Example 1.19. Let X be a finite dimensional space and {uj}_{j=1}^n a basis.
Then every x ∈ X can be uniquely written as x = Σ_{j=1}^n αj uj and we can
consider the dual functionals defined via u*_j(x) := αj for 1 ≤ j ≤ n. The
biorthogonal system {u*_j}_{j=1}^n (which are continuous by Lemma 1.15) forms
a dual basis since any other linear functional ℓ ∈ X* can be written as
ℓ = Σ_{j=1}^n ℓ(uj) u*_j. In particular, X and X* have the same dimension. ⋄
Example 1.20. Let X := ℓp(N). Then the coordinate functions

ℓj(a) := aj

are bounded linear functionals: |ℓj(a)| = |aj| ≤ ∥a∥p and hence ∥ℓj∥ = 1
(since equality is attained for a = δ^j). More generally, let b ∈ ℓq(N), where
1/p + 1/q = 1. Then

ℓb(a) := Σ_{j=1}^∞ bj aj
is a linear functional with norm ∥ℓg∥ = ∥g∥1. Indeed, first of all note that

|ℓg(f)| ≤ ∫_a^b |g(x)f(x)| dx ≤ ∥f∥∞ ∫_a^b |g(x)| dx
In particular, note that the dual space X ∗ is always a Banach space, even
if X is not complete. Moreover, by Theorem 1.16 the completion X̄ satisfies
X̄ ∗ = X ∗ .
The Banach space of bounded linear operators L (X) even has a multi-
plication given by composition. Clearly, this multiplication is distributive
(A + B)C = AC + BC,  A(B + C) = AB + AC,  A, B, C ∈ L (X),  (1.62)
and associative
(AB)C = A(BC), α (AB) = (αA)B = A (αB), α ∈ C. (1.63)
Moreover, it is easy to see that we have
∥AB∥ ≤ ∥A∥∥B∥. (1.64)
In other words, L (X) is a so-called Banach algebra. However, note that
our multiplication is not commutative (unless X is one-dimensional). We
even have an identity, the identity operator I, satisfying ∥I∥ = 1.
Problem 1.39. Show that two norms on X are equivalent if and only if they
give rise to the same convergent sequences.
Problem 1.40. Show that a finite dimensional subspace M ⊆ X of a normed
space is closed.
Problem 1.41. Consider X = Cn and let A ∈ L (X) be a matrix. Equip
X with the norm (show that this is a norm)
∥x∥∞ := max_{1≤j≤n} |xj|
and compute the operator norm ∥A∥ with respect to this norm in terms of
the matrix entries. Do the same with respect to the norm
∥x∥1 := Σ_{1≤j≤n} |xj|.
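For Problem 1.41 the answers are the classical ones: with respect to ∥.∥∞ the operator norm is the maximal absolute row sum, and with respect to ∥.∥1 it is the maximal absolute column sum. The sketch below checks on one ad hoc sample matrix that these bounds are actually attained:

```python
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def opnorm_inf(A):
    # closed form w.r.t. ||.||_inf: maximal absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def opnorm_1(A):
    # closed form w.r.t. ||.||_1: maximal absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

A = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0], [2.0, 2.0, 2.0]]

# the row-sum bound is attained at the sign vector of the worst row (||x||_inf = 1)
i = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
x = [1.0 if a >= 0 else -1.0 for a in A[i]]
assert max(abs(v) for v in mat_vec(A, x)) == opnorm_inf(A)

# the column-sum bound is attained at a canonical basis vector (||delta^j||_1 = 1)
j = max(range(3), key=lambda j: sum(abs(row[j]) for row in A))
e = [1.0 if k == j else 0.0 for k in range(3)]
assert sum(abs(v) for v in mat_vec(A, e)) == opnorm_1(A)
print(opnorm_inf(A), opnorm_1(A))   # 6.0 and 7.0 for this matrix
```

Exhibiting a maximizing vector of norm one is exactly how one proves that the easy upper bounds are in fact equalities.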
Problem* 1.46. Let I be a compact interval. Show that the set of dif-
ferentiable functions C 1 (I) becomes a Banach space if we set ∥f ∥∞,1 :=
maxx∈I |f (x)| + maxx∈I |f ′ (x)|.
Problem* 1.47. Show that ∥AB∥ ≤ ∥A∥∥B∥ for every A, B ∈ L (X).
Conclude that the multiplication is continuous: An → A and Bn → B imply
An Bn → AB.
Problem 1.48. Let A ∈ L (X) be a bijection. Show

∥A^{−1}∥^{−1} = inf_{x∈X, ∥x∥=1} ∥Ax∥.
Problem* 1.49. Suppose B ∈ L (X) with ∥B∥ < 1. Then I + B is invertible
with

(I + B)^{−1} = Σ_{n=0}^∞ (−1)^n B^n.
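The Neumann series of Problem 1.49 can be checked numerically for a small matrix B with ∥B∥ < 1 (the matrix, truncation level, and helper functions below are ad hoc):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B, c=1.0):
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.2, -0.1], [0.05, 0.3]]       # small entries, so ||B|| < 1 in any norm

# partial sums of the Neumann series sum of (-1)^n B^n
S, P = [[0.0, 0.0], [0.0, 0.0]], I   # P runs through B^0, B^1, B^2, ...
for n in range(60):
    S = mat_add(S, P, (-1.0) ** n)
    P = mat_mul(P, B)

# check: (I + B) S is (numerically) the identity
R = mat_mul(mat_add(I, B), S)
err = max(abs(R[i][j] - I[i][j]) for i in range(2) for j in range(2))
print(err)   # essentially 0
```

Since ∥B∥ < 1, the remainder after n terms is bounded by ∥B∥^{n+1}/(1 − ∥B∥), which is why so few terms already suffice here.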
exists and defines a bounded linear operator. Moreover, if f and g are two
such functions and α ∈ C, then
(f + g)(A) = f (A) + g(A), (αf )(A) = αf (a), (f g)(A) = f (A)g(A).
(Hint: Problem 1.6.)
Problem* 1.51. Show that a linear map ℓ : X → C is continuous if and
only if its kernel is closed. (Hint: If ℓ is not continuous, we can find a
sequence of normalized vectors xn with |ℓ(xn )| → ∞ and a vector y with
ℓ(y) = 1.)
Problem 1.52. Show that the norm of a nontrivial linear functional ℓ ∈ X* equals the reciprocal of the distance of the hyperplane ℓ(x) = 1 to the origin:
$$\|\ell\| = \frac{1}{\inf\{\|x\| \,|\, \ell(x) = 1\}}.$$
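A hedged finite-dimensional illustration of Problem 1.52 (not from the text; the functional is an arbitrary example on R² with the Euclidean norm, where ∥ℓ∥ and the distance are both explicit):

```python
# For ℓ(x) = <a, x> on R^2 with the Euclidean norm, ||ℓ|| = |a| and the
# nearest point of the hyperplane ℓ(x) = 1 to the origin is a/|a|^2.
import math, random

a = [3.0, -4.0]
norm_l = math.hypot(*a)                   # ||ℓ|| = 5
closest = [ai / norm_l**2 for ai in a]    # nearest point of {ℓ(x) = 1} to 0
dist = math.hypot(*closest)               # = 1/||ℓ||
print(norm_l * dist)  # ≈ 1.0

# other points on the hyperplane never get closer than dist
random.seed(0)
samples = []
for _ in range(1000):
    t = random.uniform(-10.0, 10.0)
    x = [closest[0] - t * a[1], closest[1] + t * a[0]]  # still ℓ(x) = 1
    samples.append(math.hypot(*x))
print(min(samples) >= dist - 1e-12)  # True
```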
It is not hard to see that this identification is bijective and preserves the
norm (Problem 1.55).
Lemma 1.19. Let $X_j$, j = 1, . . . , n, be Banach spaces. Then $\big(\bigoplus_{p,\,j=1}^{n} X_j\big)^{*} \cong \bigoplus_{q,\,j=1}^{n} X_j^{*}$, where $\frac{1}{p} + \frac{1}{q} = 1$.
Proof. First of all we need to show that (1.67) is indeed a norm. If ∥[x]∥ = 0
we must have a sequence yj ∈ M with yj → −x and since M is closed we
conclude x ∈ M , that is [x] = [0] as required. To see ∥α[x]∥ = |α|∥[x]∥ we
use again the definition
Thus (1.67) is a norm and it remains to show that X/M is complete if X is.
To this end let [xn ] be a Cauchy sequence. Since it suffices to show that some
subsequence has a limit, we can assume ∥[xn+1 ]−[xn ]∥ < 2−n without loss of
generality. Moreover, by definition of (1.67) we can choose the representatives
xn such that ∥xn+1 − xn ∥ < 2−n (start with x1 and then choose the remaining
ones inductively). By construction xn is a Cauchy sequence which has a limit
x ∈ X since X is complete. Moreover, by ∥[xn ]−[x]∥ = ∥[xn −x]∥ ≤ ∥xn −x∥
we see that [x] is the limit of [xn ]. □
Note that the space Cbk (I) could be further refined by requiring the
highest derivatives to be Hölder continuous. Recall that a function f : I → C
is called uniformly Hölder continuous with exponent γ ∈ (0, 1] if
$$[f]_\gamma := \sup_{x\ne y\in I} \frac{|f(x) - f(y)|}{|x - y|^{\gamma}} \tag{1.69}$$
is finite. Clearly, any Hölder continuous function is uniformly continuous
and, in the special case γ = 1, we obtain the Lipschitz continuous func-
tions. Note that for γ = 0 the Hölder condition boils down to boundedness
and also the case γ > 1 is not very interesting (Problem 1.63).
Example 1.26. By the mean value theorem every function f ∈ Cb¹(I) is Lipschitz continuous with [f]₁ ≤ ∥f′∥∞. ⋄
$$\frac{y^\gamma - x^\gamma}{(y - x)^\gamma} = \frac{1 - t^\gamma}{(1 - t)^\gamma} \le \frac{1 - t}{1 - t} = 1.$$
From this one easily gets further examples since the composition of two
Hölder continuous functions is again Hölder continuous (the exponent being
the product). ⋄
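The prototypical example behind the computation above, f(x) = x^γ, can also be probed numerically. A hedged sketch (grid size and γ are arbitrary choices):

```python
# Numerical check that f(x) = x^gamma on [0, 1] is Hölder continuous of
# exponent gamma with seminorm [f]_gamma <= 1: sample the difference
# quotient on a grid.
gamma = 0.5
N = 200
pts = [i / N for i in range(N + 1)]
ratios = [abs(x**gamma - y**gamma) / abs(x - y)**gamma
          for x in pts for y in pts if x != y]
worst = max(ratios)
print(worst <= 1.0 + 1e-12)  # True; the supremum 1 is attained as y -> 0
```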
It is easy to verify that this is a seminorm and that the corresponding
space is complete.
Theorem 1.23. Let I ⊆ R be an interval. The space Cb^{k,γ}(I) of all functions whose derivatives up to order k are bounded and Hölder continuous with exponent γ ∈ (0, 1] forms a Banach space with norm given by adding the Hölder seminorm $[f^{(k)}]_\gamma$ of the highest derivative to the norm of $C_b^k(I)$.
that
$$[g_m]_{\gamma_2} \le \sup_{x\ne y\in I:\,|x-y|\ge\varepsilon} \frac{|g_m(x)-g_m(y)|}{|x-y|^{\gamma_2}} + \sup_{x\ne y\in I:\,|x-y|<\varepsilon} \frac{|g_m(x)-g_m(y)|}{|x-y|^{\gamma_2}} \le 2\|g_m\|_\infty \varepsilon^{-\gamma_2} + [g_m]_{\gamma_1}\varepsilon^{\gamma_1-\gamma_2} \le 2\|g_m\|_\infty \varepsilon^{-\gamma_2} + 2C\varepsilon^{\gamma_1-\gamma_2},$$
implying lim supm→∞ [gm ]γ2 ≤ 2Cεγ1 −γ2 and since ε > 0 is arbitrary this
establishes the claim. □
As pointed out in Example 1.26, the embedding Cb1 (I) ⊆ Cb0,1 (I) is
continuous and combining this with the previous result immediately gives
Corollary 1.25. Suppose I ⊂ R is a compact interval, k1 , k2 ∈ N0 , and
0 ≤ γ1 , γ2 ≤ 1. Then C k2 ,γ2 (I) ⊆ C k1 ,γ1 (I) for k1 + γ1 ≤ k2 + γ2 with the
embeddings being compact if the inequality is strict.
For now continuous functions on intervals will be sufficient for our pur-
pose. However, once we delve deeper into the subject we will also need
continuous functions on topological spaces X. Luckily most of the results
extend to this case in a more or less straightforward way. If you are not
familiar with these extensions you can find them in Section B.8.
Problem 1.63. Let I be an interval. Suppose f : I → C is Hölder continu-
ous with exponent γ > 1. Show that f is constant.
Problem 1.64. Let I := [a, b] be a compact interval and consider C 1 (I).
Which of the following is a norm? In case of a norm, is it equivalent to
∥.∥1,∞ ?
(i) ∥f ∥∞
(ii) ∥f ′ ∥∞
(iii) |f (a)| + ∥f ′ ∥∞
(iv) |f (a) − f (b)| + ∥f ′ ∥∞
(v) $\int_a^b |f(x)|\,dx + \|f\|_\infty$
Problem* 1.65. Suppose X is a vector space and ∥.∥j, 1 ≤ j ≤ n, is a finite family of seminorms. Show that $\|x\| := \sum_{j=1}^{n} \|x\|_j$ is a seminorm. It is a norm if and only if ∥x∥j = 0 for all j implies x = 0.
Problem 1.66. Let I be a compact interval. Show that the product of two
bounded Hölder continuous functions is again Hölder continuous with
[f g]γ ≤ ∥f ∥∞ [g]γ + [f ]γ ∥g∥∞ .
Chapter 2
Hilbert spaces
computes
with equality holding if and only if f lies in the span of {uj }nj=1 .
Of course, since we cannot assume H to be a finite dimensional vec-
tor space, we need to generalize Lemma 2.1 to arbitrary orthonormal sets
{uj}j∈J. We start by assuming that J is countable. Then Bessel's inequality (2.4) shows that
$$\sum_{j\in J} |\langle u_j, f\rangle|^2 \tag{2.5}$$
converges and, for every finite subset K ⊂ J,
$$\Big\|\sum_{j\in K} \langle u_j, f\rangle u_j\Big\|^2 = \sum_{j\in K} |\langle u_j, f\rangle|^2$$
by the Pythagorean theorem. Thus $\sum_{j\in J} \langle u_j, f\rangle u_j$ is a Cauchy sequence if and only if $\sum_{j\in J} |\langle u_j, f\rangle|^2$ is. Now let J be arbitrary. Again, Bessel's
inequality shows that for any given ε > 0 there are at most finitely many j for which |⟨uj, f⟩| ≥ ε (namely at most (∥f∥/ε)²). Hence there are at most countably many j for which |⟨uj, f⟩| > 0. Thus it follows that
$$\sum_{j\in J} |\langle u_j, f\rangle|^2 \tag{2.7}$$
is well defined (as a countable sum over the nonzero terms) and (by completeness) so is
$$\sum_{j\in J} \langle u_j, f\rangle u_j. \tag{2.8}$$
Furthermore, it is also independent of the order of summation.
In particular, by continuity of the scalar product we see that Lemma 2.1
can be generalized to arbitrary orthonormal sets.
Theorem 2.2. Suppose {uj }j∈J is an orthonormal set in a Hilbert space H.
Then every f ∈ H can be written as
$$f = f_\parallel + f_\perp, \qquad f_\parallel := \sum_{j\in J} \langle u_j, f\rangle u_j, \tag{2.9}$$
Proof. The first part follows as in Lemma 2.1 using continuity of the scalar
product. The same is true for the last part except for the fact that every f ∈ span{uj}j∈J can be written as $f = \sum_{j\in J} \alpha_j u_j$ (i.e., f = f∥). To see this,
let fn ∈ span{uj }j∈J converge to f . Then ∥f −fn ∥2 = ∥f∥ −fn ∥2 +∥f⊥ ∥2 → 0
implies fn → f∥ and f⊥ = 0. □
Note that from Bessel’s inequality (which of course still holds), it follows
that the map f → f∥ is continuous.
Of course we are particularly interested in the case where every f ∈ H can be written as $\sum_{j\in J} \langle u_j, f\rangle u_j$. In this case we will call the orthonormal
set {uj }j∈J an orthonormal basis (ONB).
If H is separable it is easy to construct an orthonormal basis. In fact, if
H is separable, then there exists a countable total set $\{f_j\}_{j=1}^{N}$. Here N ∈ N
if H is finite dimensional and N = ∞ otherwise. After throwing away some
vectors, we can assume that fn+1 cannot be expressed as a linear combination
of the vectors f1 , . . . , fn . Now we can construct an orthonormal set as
follows: We begin by normalizing f₁:
$$u_1 := \frac{f_1}{\|f_1\|}. \tag{2.12}$$
for any finite n and thus also for n = N (if N = ∞). Since $\{f_j\}_{j=1}^{N}$ is total, so is $\{u_j\}_{j=1}^{N}$. Now suppose there is some f = f∥ + f⊥ ∈ H for which f⊥ ≠ 0. Since $\{u_j\}_{j=1}^{N}$ is total, we can find an f̂ in its span such that ∥f − f̂∥ < ∥f⊥∥, contradicting (2.11). Hence we infer that $\{u_j\}_{j=1}^{N}$ is an orthonormal basis.
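The construction described above (discard dependent vectors, subtract projections, normalize) is the Gram–Schmidt procedure. A hedged finite-dimensional sketch in R³ with the Euclidean inner product (the input vectors are an arbitrary example):

```python
# Gram-Schmidt sketch: orthonormalize a list of vectors, skipping any
# vector already contained in the span of the previous ones.
import math

def gram_schmidt(vectors):
    basis = []
    for f in vectors:
        # subtract the projection onto the span of the previous vectors
        for u in basis:
            c = sum(ui * fi for ui, fi in zip(u, f))
            f = [fi - c * ui for fi, ui in zip(f, u)]
        nrm = math.sqrt(sum(fi * fi for fi in f))
        if nrm > 1e-12:          # throw away vectors in the span
            basis.append([fi / nrm for fi in f])
    return basis

U = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
gram = [[sum(a * b for a, b in zip(u, v)) for v in U] for u in U]
print(gram)  # ≈ identity matrix: <u_i, u_j> = δ_ij
```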
By continuity of the norm it suffices to check (iii), and hence also (ii), for f in a dense set. In fact, by the inverse triangle inequality for ℓ²(N) and the Bessel inequality we have
$$\Big|\sum_{j\in J} |\langle u_j, f\rangle|^2 - \sum_{j\in J} |\langle u_j, g\rangle|^2\Big| \le \sqrt{\sum_{j\in J} |\langle u_j, f - g\rangle|^2}\;\sqrt{\sum_{j\in J} |\langle u_j, f + g\rangle|^2} \le \|f - g\|\,\|f + g\| \tag{2.19}$$
implying $\sum_{j\in J} |\langle u_j, f_n\rangle|^2 \to \sum_{j\in J} |\langle u_j, f\rangle|^2$ if $f_n \to f$.
It is not surprising that if there is one countable basis, then it follows
that every other basis is countable as well.
Theorem 2.5. In a Hilbert space H every orthonormal basis has the same
cardinality.
Proof. Let {uj }j∈J and {vk }k∈K be two orthonormal bases. We first look
at the case where one of them, say the first, is finite dimensional: J =
{1, . . . , n}. Suppose the other basis has at least n elements {1, . . . , n} ⊆ K. Then $v_k = \sum_{j=1}^{n} U_{k,j} u_j$, where $U_{k,j} := \langle u_j, v_k\rangle$. By $\delta_{j,k} = \langle v_j, v_k\rangle = \sum_{l=1}^{n} U_{j,l}^{*} U_{k,l}$ we see $\sum_{k=1}^{n} U_{k,j}^{*} v_k = u_j$ showing that v₁, . . . , vₙ span H and
Now let us turn to the case where both J and K are infinite. Set $K_j = \{k \in K \,|\, \langle v_k, u_j\rangle \ne 0\}$. Since these are the expansion coefficients of uj with respect to {vk}k∈K, this set is countable (and nonempty). Hence the set $\tilde K = \bigcup_{j\in J} K_j$ satisfies $|\tilde K| \le |J \times \mathbb{N}| = |J|$ (Theorem A.9). But k ∈ K \ K̃
⟨uj , f ⟩. In particular,
Theorem 2.6. Any separable infinite dimensional Hilbert space is unitarily
equivalent to ℓ2 (N).
Of course the same argument shows that every finite dimensional Hilbert
space of dimension n is unitarily equivalent to Cn with the usual scalar
product.
Finally we briefly turn to the case where H is not separable.
Theorem 2.7. Every Hilbert space has an orthonormal basis.
Proof. To prove this we need to resort to Zorn’s lemma (Theorem A.2): The
collection of all orthonormal sets in H can be partially ordered by inclusion.
Moreover, every linearly ordered chain has an upper bound (the union of all
sets in the chain). Hence Zorn’s lemma implies the existence of a maximal
element, that is, an orthonormal set which is not a proper subset of any
other orthonormal set. This maximal element is an ONB by Theorem 2.4
(i). □
Note that |M (f )| ≤ ∥f ∥∞ .
Next one can show that
⟨f, g⟩ := M (f ∗ g)
defines a scalar product on AP (R). To see that it is positive definite (all other
properties are straightforward), let f ∈ AP (R) with ∥f ∥2 = M (|f |2 ) = 0.
Choose a sequence of trigonometric polynomials fn with ∥f − fn ∥∞ → 0. By
∥f ∥ ≤ ∥f ∥∞ we also have ∥f −fn ∥ → 0. Moreover, by the triangle inequality
(which holds for any nonnegative sesquilinear form — Problem 1.29) we have
∥fn ∥ ≤ ∥f ∥ + ∥f − fn ∥ = ∥f − fn ∥ ≤ ∥f − fn ∥∞ → 0, and thus f = 0.
Abbreviating eθ (t) = eiθt we see that {eθ }θ∈R is an uncountable orthonor-
mal set and
f (t) 7→ fˆ(θ) := ⟨eθ , f ⟩ = M (e−θ f )
maps AP (R) isometrically (with respect to ∥.∥) into ℓ2 (R). This map is
however not surjective (take e.g. a Fourier series which converges in mean
square but not uniformly — see later) and hence AP (R) is not complete
with respect to ∥.∥. ⋄
Problem 2.10. Compute PK for the closed unit ball K := B̄1 (0).
Note that if {uk }k∈K ⊆ H1 and {vj }j∈J ⊆ H2 are some orthonormal bases,
then the matrix elements Aj,k := ⟨vj , Auk ⟩H2 for all (j, k) ∈ J × K uniquely
determine ⟨g, Af ⟩H2 for arbitrary f ∈ H1 , g ∈ H2 (just expand f, g with
respect to these bases) and thus A by our theorem.
Example 2.5. Consider ℓ²(N) and let A ∈ L (ℓ²(N)) be some bounded operator. Let $A_{jk} = \langle \delta^j, A\delta^k\rangle$ be its matrix elements such that
$$(Aa)_j = \sum_{k=1}^{\infty} A_{jk}\, a_k.$$
Since $A_{jk}$ are the expansion coefficients of $A^{*}\delta^j$ (see (2.28) below), we have $\sum_{k=1}^{\infty} |A_{jk}|^2 = \|A^{*}\delta^j\|^2$ and the sum is even absolutely convergent. ⋄
Moreover, for A ∈ L (H) the polarization identity (Problem 1.27) implies
that A is already uniquely determined by its quadratic form qA (f ) := ⟨f, Af ⟩.
As a first application we introduce the adjoint operator via Lemma 2.12
as the operator associated with the sesquilinear form s(f, g) := ⟨Af, g⟩H2 .
Theorem 2.13. Let H1 , H2 be Hilbert spaces. For every bounded operator
A ∈ L (H1 , H2 ) there is a unique bounded operator A∗ ∈ L (H2 , H1 ) defined
via
⟨f, A∗ g⟩H1 = ⟨Af, g⟩H2 . (2.28)
Proof. (i) is obvious. (ii) follows from ⟨g, A**f⟩H2 = ⟨A*g, f⟩H1 = ⟨g, Af⟩H2. (iii) follows from ⟨g, (CA)f⟩H3 = ⟨C*g, Af⟩H2 = ⟨A*C*g, f⟩H1. (iv) follows using (2.27) from
$$\|A^{*}\| = \sup_{\|f\|_{H_1}=\|g\|_{H_2}=1} |\langle f, A^{*} g\rangle_{H_1}| = \sup_{\|f\|_{H_1}=\|g\|_{H_2}=1} |\langle Af, g\rangle_{H_2}| = \|A\|$$
and
$$\|A^{*} A\| = \sup_{\|f\|_{H_1}=\|g\|_{H_1}=1} |\langle f, A^{*} A g\rangle_{H_1}| = \sup_{\|f\|_{H_1}=\|g\|_{H_1}=1} |\langle Af, Ag\rangle_{H_2}| = \|A\|^2,$$
where we have used that |⟨Af, Ag⟩H2| attains its maximum when Af and Ag are parallel (compare Theorem 1.5). Finally, ∥AA*∥ = ∥A**A*∥ = ∥A*∥² = ∥A∥². □
Note that ∥A∥ = ∥A∗ ∥ implies that taking adjoints is a continuous op-
eration. For later use also note that (Problem 2.15)
Ker(A∗ ) = Ran(A)⊥ . (2.29)
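In finite dimensions the adjoint is simply the conjugate transpose, and the defining relation (2.28) can be checked directly. A hedged sketch (random matrix and arbitrary test vectors, not from the text):

```python
# On C^3 the adjoint of a matrix is its conjugate transpose; check that
# <f, A* g> = <A f, g> for the inner product conjugate-linear in the
# first argument, as used in the text.
import random

random.seed(1)
n = 3
A = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
     for _ in range(n)]
Astar = [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def inner(x, y):
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def apply(M, x):
    return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

f = [1.0 + 2.0j, -1.0j, 0.5 + 0.0j]
g = [2.0 + 0.0j, 1.0 - 1.0j, 3.0j]
lhs = inner(f, apply(Astar, g))   # <f, A* g>
rhs = inner(apply(A, f), g)       # <A f, g>
print(abs(lhs - rhs) < 1e-12)  # True
```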
For the remainder of this section we restrict to the case of one Hilbert
space. A sesquilinear form s : H×H → C is called nonnegative if s(f, f ) ≥ 0
and it is called coercive if
Re(s(f, f )) ≥ ε∥f ∥2 , ε > 0. (2.30)
We will call A ∈ L (H) nonnegative, coercive if its associated sesquilinear
form is. We will write A ≥ 0 if A is nonnegative and A ≥ B if A − B ≥ 0.
Observe that nonnegative operators are self-adjoint (as their quadratic forms
are real-valued — here it is important that the underlying space is complex;
in case of a real space a nonnegative form is required to be symmetric).
Example 2.12. For any operator A the operators A∗ A and AA∗ are both
nonnegative. In fact ⟨f, A∗ Af ⟩ = ⟨Af, Af ⟩ = ∥Af ∥2 ≥ 0 and similarly
⟨f, AA∗ f ⟩ = ∥A∗ f ∥2 ≥ 0. ⋄
Lemma 2.16. Suppose A ∈ L (H) satisfies ∥Af ∥ ≥ ε∥f ∥ for some ε > 0.
Then Ran(A) is closed and A : H → Ran(A) is a bijection with bounded
inverse, ∥A−1 ∥ ≤ 1ε . If we have the stronger condition |⟨f, Af ⟩| ≥ ε∥f ∥2 ,
then Ran(A) = H.
In particular, this shows A ≥ 0. Moreover, we have |sA (a, b)| ≤ 4∥a∥2 ∥b∥2
or equivalently ∥A∥ ≤ 4.
Next, let
(Qa)j = qj aj
for some sequence q ∈ ℓ∞ (N). Then
$$s_Q(a, b) = \sum_{j=1}^{\infty} q_j\, a_j^{*} b_j$$
and |sQ (a, b)| ≤ ∥q∥∞ ∥a∥2 ∥b∥2 or equivalently ∥Q∥ ≤ ∥q∥∞ . If in addition
qj ≥ ε > 0, then sA+Q (a, b) = sA (a, b) + sQ (a, b) satisfies the assumptions of
the Lax–Milgram theorem and
(A + Q)a = b
has a unique solution a = (A + Q)−1 b for every given b ∈ ℓ2 (N). Moreover,
since (A + Q)−1 is bounded, this solution depends continuously on b. ⋄
Problem* 2.11. Let H1 , H2 be Hilbert spaces and let u ∈ H1 , v ∈ H2 . Show
that the operator
Af := ⟨u, f ⟩v
is bounded and compute its norm. Compute the adjoint of A.
Problem 2.12. Show that under the assumptions of Problem 1.50 one has
f (A)∗ = f # (A∗ ) where f # (z) = f (z ∗ )∗ .
Problem* 2.13. Prove (2.27). (Hint: Use ∥f ∥ = sup∥g∥=1 |⟨g, f ⟩| — com-
pare Theorem 1.5.)
Problem 2.14. Suppose A ∈ L (H1 , H2 ) has a bounded inverse A−1 ∈
L (H2 , H1 ). Show (A−1 )∗ = (A∗ )−1 .
Problem* 2.15. Show (2.29).
Problem* 2.16. Show that every operator A ∈ L (H) can be written as the linear combination of two self-adjoint operators $\mathrm{Re}(A) := \frac{1}{2}(A + A^{*})$ and $\mathrm{Im}(A) := \frac{1}{2i}(A - A^{*})$. Moreover, every self-adjoint operator can be written as a linear combination of two unitary operators. (Hint: For the last part consider $f_\pm(z) = z \pm i\sqrt{1 - z^2}$ and Problems 1.50, 2.12.)
Similarly, if H and H̃ are two Hilbert spaces, we define their tensor prod-
uct as follows: The elements should be products f ⊗ f˜ of elements f ∈ H
and f˜ ∈ H̃. Hence we start with the set of all finite linear combinations of
elements of H × H̃
$$\mathcal{F}(H, \tilde H) := \Big\{\sum_{j=1}^{n} \alpha_j (f_j, \tilde f_j) \,\Big|\, (f_j, \tilde f_j) \in H \times \tilde H,\ \alpha_j \in \mathbb{C}\Big\}. \tag{2.35}$$
and write f ⊗ f˜ for the equivalence class of (f, f˜). By construction, every
element in this quotient space is a linear combination of elements of the type
f ⊗ f˜.
Next, we want to define a scalar product such that
$$\langle f\otimes \tilde f, g\otimes \tilde g\rangle = \langle f, g\rangle_{H}\, \langle \tilde f, \tilde g\rangle_{\tilde H} \tag{2.37}$$
holds. To this end we set
$$s\Big(\sum_{j=1}^{n} \alpha_j (f_j, \tilde f_j), \sum_{k=1}^{n} \beta_k (g_k, \tilde g_k)\Big) = \sum_{j,k=1}^{n} \alpha_j^{*}\beta_k\, \langle f_j, g_k\rangle_{H}\, \langle \tilde f_j, \tilde g_k\rangle_{\tilde H}, \tag{2.38}$$
is a symmetric sesquilinear form on F(H, H̃)/N (H, H̃). To show that this is in fact a scalar product, we need to ensure positivity. Let $f = \sum_i \alpha_i f_i \otimes \tilde f_i \ne 0$ and pick orthonormal bases uj, ũk for span{fi}, span{f̃i}, respectively. Then
$$f = \sum_{j,k} \alpha_{jk}\, u_j \otimes \tilde u_k, \qquad \alpha_{jk} = \sum_{i} \alpha_i \langle u_j, f_i\rangle_{H} \langle \tilde u_k, \tilde f_i\rangle_{\tilde H} \tag{2.40}$$
and we compute
$$\langle f, f\rangle = \sum_{j,k} |\alpha_{jk}|^2 > 0. \tag{2.41}$$
The completion of F(H, H̃)/N (H, H̃) with respect to the induced norm is
called the tensor product H ⊗ H̃ of H and H̃.
Lemma 2.18. If uj , ũk are orthonormal bases for H, H̃, respectively, then
uj ⊗ ũk is an orthonormal basis for H ⊗ H̃.
Example 2.17. A quantum mechanical particle which can only attain two
possible states is called a qubit. Its state space is accordingly C2 and the
two states, usually written as |0⟩ and |1⟩, are an orthonormal basis for C2 .
The state space for two qubits is given by the tensor product C2 ⊗ C2 ∼ = C4 .
An orthonormal basis is given by |00⟩ := |0⟩ ⊗ |0⟩, |01⟩ := |0⟩ ⊗ |1⟩, |10⟩ :=
|1⟩ ⊗ |0⟩, and |11⟩ := |1⟩ ⊗ |1⟩. The state space of n qubits is the n-fold tensor product of C² (isomorphic to $\mathbb{C}^{2^n}$). ⋄
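The identification C² ⊗ C² ≅ C⁴ can be made concrete via the Kronecker product. A hedged sketch (the component ordering below is one common convention, not fixed by the text):

```python
# Two-qubit basis states as Kronecker products of |0> = (1,0), |1> = (0,1).
def kron(a, b):
    return [ai * bj for ai in a for bj in b]

ket0, ket1 = [1.0, 0.0], [0.0, 1.0]
basis = {"00": kron(ket0, ket0), "01": kron(ket0, ket1),
         "10": kron(ket1, ket0), "11": kron(ket1, ket1)}
print(basis["10"])  # [0.0, 0.0, 1.0, 0.0]
```

The four resulting vectors are the standard basis of C⁴, mirroring Lemma 2.18: products of basis elements form a basis of the tensor product.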
Example 2.18. We have ℓ²(N) ⊗ ℓ²(N) = ℓ²(N × N) by virtue of the identification $(a_{jk}) \mapsto \sum_{j,k} a_{jk}\, \delta^j \otimes \delta^k$ where δ^j is the standard basis for ℓ²(N). In
fact, this follows from the previous lemma as in the proof of Theorem 2.6. ⋄
It is straightforward to extend the tensor product to any finite number
of Hilbert spaces. We even note
$$\Big(\bigoplus_{j=1}^{\infty} H_j\Big) \otimes H = \bigoplus_{j=1}^{\infty} (H_j \otimes H), \tag{2.42}$$
where equality has to be understood in the sense that both spaces are unitarily equivalent by virtue of the identification
$$\Big(\sum_{j=1}^{\infty} f_j\Big) \otimes f = \sum_{j=1}^{\infty} f_j \otimes f. \tag{2.43}$$
[Figure: the Dirichlet kernels D₁(x), D₂(x), D₃(x) on (−π, π).]
the present section and then come back later to this when we have further
tools at our disposal.
For our purpose the complex form
$$S(f)(x) = \sum_{k\in\mathbb{Z}} \hat f_k\, e^{ikx}, \qquad \hat f_k := \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-iky} f(y)\,dy \tag{2.46}$$
where
$$D_n(x) = \sum_{k=-n}^{n} e^{ikx} = \frac{\sin((n + 1/2)x)}{\sin(x/2)} \tag{2.48}$$
is known as the Dirichlet kernel (to obtain the second form observe that the left-hand side is a geometric series). Note that Dn(−x) = Dn(x) and that |Dn(x)| has a global maximum Dn(0) = 2n + 1 at x = 0. Moreover, by Sn(1) = 1 we see that $\int_{-\pi}^{\pi} D_n(x)\,dx = 2\pi$.
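Both the closed form of the Dirichlet kernel and its normalization can be confirmed numerically. A hedged sketch (n = 3 and the evaluation point are arbitrary; the quadrature is a plain Riemann sum on a periodic grid):

```python
# Check D_n(x) = sum e^{ikx} = sin((n+1/2)x)/sin(x/2) and ∫ D_n dx = 2π.
import cmath, math

def D_sum(n, x):        # exponential-sum form (well defined at x = 0)
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real

def D_closed(n, x):     # closed form, x ≠ 0
    return math.sin((n + 0.5) * x) / math.sin(x / 2)

diff = abs(D_sum(3, 0.7) - D_closed(3, 0.7))
print(diff)  # ~ 0

M = 2000
grid = [-math.pi + 2 * math.pi * i / M for i in range(M)]
integral = sum(D_sum(3, x) for x in grid) * (2 * math.pi / M)
print(integral)  # ~ 2π (the sum over a periodic grid is exact here)
```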
Since
$$\int_{-\pi}^{\pi} e^{-ikx} e^{ilx}\,dx = 2\pi\,\delta_{k,l} \tag{2.49}$$
the functions ek (x) := (2π)−1/2 eikx are orthonormal in L2 (−π, π) and hence
the Fourier series is just the expansion with respect to this orthonormal set. Thus we obtain
Theorem 2.19. For every square integrable function f ∈ L2 (−π, π), the
Fourier coefficients fˆk are square summable
$$\sum_{k\in\mathbb{Z}} |\hat f_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx \tag{2.50}$$
Proof. To show this theorem it suffices to show that the functions ek form
a basis. This will follow from Theorem 2.22 below (see the discussion after
this theorem). It will also follow as a special case of Theorem 3.11 below
(see the examples after this theorem) as well as from the Stone–Weierstraß
theorem — Problem 2.27. □
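As a hedged numerical illustration of (2.50), take f(x) = x on (−π, π): integration by parts in (2.46) gives f̂ₖ = i(−1)ᵏ/k for k ≠ 0 and f̂₀ = 0, so both sides of Parseval's identity equal π²/3. (The truncation level below is an arbitrary choice.)

```python
# Parseval check for f(x) = x: sum |f̂_k|^2 = sum_{k≠0} 1/k^2 should equal
# (1/2π) ∫ x² dx = π²/3.
import math

K = 200000
lhs = 2 * sum(1.0 / k**2 for k in range(1, K + 1))  # Σ_{k≠0} |f̂_k|²
rhs = math.pi**2 / 3                                # (1/2π) ∫_{-π}^{π} x² dx
print(lhs, rhs)  # both ≈ 3.2899
```

The small residual difference is the tail Σ_{|k|>K} 1/k² ≈ 2/K of the truncated series.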
This gives a satisfactory answer in the Hilbert space L2 (−π, π) but does
not answer the question about pointwise or uniform convergence. The latter
will be the case if the Fourier coefficients are summable. First of all we note
that for integrable functions the Fourier coefficients will at least tend to zero.
Lemma 2.20 (Riemann–Lebesgue lemma). Suppose f ∈ L1 (−π, π), then
the Fourier coefficients fˆk converge to zero as |k| → ∞.
Proof. By our previous theorem this holds for continuous functions. But the
map f → fˆ is bounded from C[−π, π] ⊂ L1 (−π, π) to c0 (Z) (the sequences
vanishing as |k| → ∞) since |fˆk | ≤ (2π)−1 ∥f ∥1 and there is a unique exten-
sion to all of L1 (−π, π). □
It turns out that this result is best possible in general and we cannot say
more about the decay without additional assumptions on f . For example, if
f is periodic of period 2π and continuously differentiable, then integration
by parts shows
$$\hat f_k = \frac{1}{2\pi i k}\int_{-\pi}^{\pi} e^{-ikx} f'(x)\,dx. \tag{2.51}$$
Then, since both k −1 and the Fourier coefficients of f ′ are square summa-
ble, we conclude that fˆ is absolutely summable and hence the Fourier series
converges uniformly. So we have a simple sufficient criterion for summa-
bility of the Fourier coefficients, but can we do better? Of course conti-
nuity of f is a necessary condition for absolute summability but this alone
will not even be enough for pointwise convergence as we will see in Exam-
ple 4.3. Moreover, continuity will not tell us more about the decay of the
Fourier coefficients than what we already know in the integrable case from
the Riemann–Lebesgue lemma (see Example 4.4).
A few improvements are easy: (2.51) holds for any class of functions
for which integration by parts holds, e.g., piecewise continuously differen-
tiable functions or, slightly more general, absolutely continuous functions
(cf. Lemma 4.30 from [37]) provided one assumes that the derivative is
square integrable. However, for an arbitrary absolutely continuous func-
tion the Fourier coefficients might not be absolutely summable: For an
absolutely continuous function f we have a derivative which is integrable
(Theorem 4.29 from [37]) and hence the above formula combined with the Riemann–Lebesgue lemma implies $\hat f_k = o(\frac{1}{k})$. But on the other hand we can choose an absolutely summable sequence $c_k$ which does not obey this asymptotic requirement, say $c_k = \frac{1}{k}$ for $k = l^2$ and $c_k = 0$ else. Then
$$f(x) := \sum_{k\in\mathbb{Z}} c_k e^{ikx} = \sum_{l\in\mathbb{N}} \frac{1}{l^2} e^{il^2 x} \tag{2.52}$$
$$\sum_{k\in\mathbb{Z}} |\hat f_k| \le C_\gamma \|f\|_{0,\gamma}.$$
Proof. The proof starts with the observation that the Fourier coefficients of fδ(x) := f(x − δ) are e^{−ikδ} f̂ₖ. Now for δ := (2π/3)2^{−m} and 2^m ≤ |k| < 2^{m+1} we have |e^{ikδ} − 1|² ≥ 3 implying
$$\sum_{2^m\le|k|<2^{m+1}} |\hat f_k|^2 \le \frac{1}{3}\sum_{k} |e^{ik\delta}-1|^2 |\hat f_k|^2 = \frac{1}{6\pi}\int_{-\pi}^{\pi} |f_\delta(x)-f(x)|^2\,dx \le \frac{1}{3}\,[f]_\gamma^2\, \delta^{2\gamma}.$$
Now the sum on the left has 2 · 2^m terms and hence Cauchy–Schwarz implies
$$\sum_{2^m\le|k|<2^{m+1}} |\hat f_k| \le \frac{2^{(m+1)/2}}{\sqrt 3}\,[f]_\gamma\,\delta^\gamma = \sqrt{\frac{2}{3}}\Big(\frac{2\pi}{3}\Big)^{\gamma} 2^{(1/2-\gamma)m}\,[f]_\gamma.$$
Summing over m, this is finite provided γ > 1/2 and establishes the claim since |f̂₀| ≤ ∥f∥∞. □

[Figure: the Fejér kernels F₁(x), F₂(x), F₃(x) on (−π, π).]
Note, however, that the situation looks much brighter if one looks at mean values
$$\bar S_n(f)(x) := \frac{1}{n}\sum_{k=0}^{n-1} S_k(f)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} F_n(x - y) f(y)\,dy, \tag{2.53}$$
where
$$F_n(x) = \frac{1}{n}\sum_{k=0}^{n-1} D_k(x) = \frac{1}{n}\Big(\frac{\sin(nx/2)}{\sin(x/2)}\Big)^2 \tag{2.54}$$
is the Fejér kernel. To see the second form we use the closed form for the Dirichlet kernel to obtain
$$n F_n(x) = \sum_{k=0}^{n-1} \frac{\sin((k+1/2)x)}{\sin(x/2)} = \frac{1}{\sin(x/2)}\,\mathrm{Im}\sum_{k=0}^{n-1} e^{i(k+1/2)x} = \frac{1}{\sin(x/2)}\,\mathrm{Im}\Big(e^{ix/2}\,\frac{e^{inx}-1}{e^{ix}-1}\Big) = \frac{1-\cos(nx)}{2\sin(x/2)^2} = \Big(\frac{\sin(nx/2)}{\sin(x/2)}\Big)^2.$$
The main difference to the Dirichlet kernel is positivity: Fn(x) ≥ 0. Of course the property $\int_{-\pi}^{\pi} F_n(x)\,dx = 2\pi$ is inherited from the Dirichlet kernel.
In particular, this shows that the functions {ek }k∈Z are total in Cper [−π, π]
(continuous periodic functions) and hence also in Lp (−π, π) for 1 ≤ p < ∞
(Problem 2.26).
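The uniform convergence of the means S̄ₙ(f) can be observed numerically. A hedged sketch for f(x) = |x|, whose Fourier coefficients are available in closed form (grid and values of n are arbitrary choices):

```python
# Fejér/Cesàro means of the Fourier series of f(x) = |x| on (-π, π):
# the sup error over a grid should decrease with n.
import math

def fhat(k):                      # coefficients of |x|: real and even in k
    if k == 0:
        return math.pi / 2
    return ((-1)**k - 1) / (math.pi * k**2)

def cesaro_mean(n, x):            # S̄_n(f)(x) = Σ_{|k|<n} (1 - |k|/n) f̂_k e^{ikx}
    return sum((1 - abs(k) / n) * fhat(k) * math.cos(k * x)
               for k in range(-n + 1, n))

xs = [-math.pi + i * math.pi / 100 for i in range(201)]
errs = [max(abs(cesaro_mean(n, x) - abs(x)) for x in xs) for n in (4, 16, 64)]
print(errs[0] > errs[1] > errs[2])  # True: the sup error decreases
```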
Note that for a given continuous function f this result shows that if
Sn (f )(x) converges, then it must converge to S̄n (f )(x) = f (x). We also
remark that one can extend this result (see Lemma 3.21 from [37]) to show
that for f ∈ Lp (−π, π), 1 ≤ p < ∞, one has S̄n (f ) → f in the sense of Lp .
As a consequence note that the Fourier coefficients uniquely determine f for
integrable f (for square integrable f this follows from Theorem 2.19).
Finally, we look at pointwise convergence.
Theorem 2.23. Suppose
$$\frac{f(x) - f(x_0)}{x - x_0} \tag{2.55}$$
is integrable (e.g. f is Hölder continuous), then
$$\lim_{m,n\to\infty} \sum_{k=-m}^{n} \hat f_k\, e^{ikx_0} = f(x_0). \tag{2.56}$$
Problem 2.25. Show that if $f \in C^{0,\gamma}_{per}[-\pi, \pi]$ is Hölder continuous (cf. (1.69)), then
$$|\hat f_k| \le \frac{[f]_\gamma}{2}\Big(\frac{\pi}{|k|}\Big)^{\gamma}, \qquad k \ne 0.$$
(Hint: What changes if you replace e−iky by e−ik(y+π/k) in (2.46)? Now make
a change of variables y → y − π/k in the integral.)
Problem 2.26. Show that Cper [−π, π] is dense in Lp (−π, π) for 1 ≤ p < ∞.
Problem 2.27. Show that the functions $e_k(x) := \frac{1}{\sqrt{2\pi}} e^{ikx}$, k ∈ Z, form an
orthonormal basis for H = L2 (−π, π). (Hint: Start with K = [−π, π] where
−π and π are identified and use the Stone–Weierstraß theorem.)
Chapter 3
Compact operators
Typically, linear operators are much more difficult to analyze than matrices
and many new phenomena appear which are not present in the finite dimen-
sional case. So we have to be modest and slowly work our way up. A class
of operators which still preserves some of the nice properties of matrices is
the class of compact operators to be discussed in this chapter.
Proof. Let fj0 be a bounded sequence. Choose a subsequence fj1 such that
A1 fj1 converges. From fj1 choose another subsequence fj2 such that A2 fj2
converges and so on. Since there might be nothing left from fjn as n → ∞, we
consider the diagonal sequence fj := fjj . By construction, fj is a subsequence
of fjn for j ≥ n and hence An fj is Cauchy for every fixed n. Now
∥Afj − Afk ∥ = ∥(A − An )(fj − fk ) + An (fj − fk )∥
≤ ∥A − An ∥∥fj − fk ∥ + ∥An fj − An fk ∥
shows that Afj is Cauchy since the first term can be made arbitrarily small
by choosing n large and the second by the Cauchy property of An fj . □
Example 3.2. Let X := ℓ^p(N) and consider the operator
$$(Qa)_j := q_j a_j$$
for some sequence $q = (q_j)_{j=1}^{\infty} \in c_0(\mathbb{N})$ converging to zero. Let $Q_n$ be associated with $q_j^n := q_j$ for j ≤ n and $q_j^n := 0$ for j > n. Then the $Q_n$ have finite rank and $\|Q - Q_n\| = \sup_{j>n} |q_j| \to 0$, so Q is compact by the previous result. ⋄
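A hedged numerical companion to Example 3.2 (the choice q_j = 1/j and the finite index range are arbitrary illustrations):

```python
# For q_j = 1/j the truncations Q_n approximate the diagonal operator Q in
# operator norm: ||Q - Q_n|| = sup_{j > n} |q_j| = 1/(n + 1) -> 0.
J = 10000                      # finite model of the index set N
q = [1.0 / j for j in range((1), J + 1)]

def norm_error(n):
    # (Q - Q_n)a has components q_j a_j for j > n; its norm is sup_{j>n} |q_j|
    return max(abs(qj) for qj in q[n:])

errs = [norm_error(n) for n in (1, 10, 100)]
print(errs)  # = [1/2, 1/11, 1/101], tending to 0
```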
Proof. First of all note that K(., ..) is continuous on [a, b] × [a, b] and hence
uniformly continuous. In particular, for every ε > 0 we can find a δ > 0 such
that |K(y, t) − K(x, t)| ≤ ε for any t ∈ [a, b] whenever |y − x| ≤ δ. Moreover,
∥K∥∞ = supx,y∈[a,b] |K(x, y)| < ∞.
We begin with the case X := L2cont (a, b). Let g := Kf . Then
$$|g(x)| \le \int_a^b |K(x,t)|\,|f(t)|\,dt \le \|K\|_\infty \int_a^b |f(t)|\,dt \le \|K\|_\infty \|1\| \|f\|,$$
where we have used Cauchy–Schwarz in the last step (note that $\|1\| = \sqrt{b - a}$). Similarly,
$$|g(x) - g(y)| \le \int_a^b |K(y,t) - K(x,t)|\,|f(t)|\,dt \le \varepsilon \int_a^b |f(t)|\,dt \le \varepsilon \|1\| \|f\|,$$
(Hint: Fubini.)
Ker(A − z) (3.4)
Theorem 3.5. Let A be symmetric. Then all eigenvalues are real and eigen-
vectors corresponding to different eigenvalues are orthogonal.
by observing that the eigenfunctions for (e.g.) z = 0 and z = 1/2 are not
orthogonal. However, the above formula also shows that we can obtain a
symmetric operator by further restricting the domain. For example, we can
impose Dirichlet boundary conditions4 and consider
$$A_0 := \frac{1}{i}\frac{d}{dx}, \qquad D(A_0) := \{f \in C^1[-\pi, \pi] \,|\, f(-\pi) = f(\pi) = 0\}.$$
Then the above computation shows that A0 is symmetric since the boundary
terms vanish for g, f ∈ D(A0 ). Moreover, note that this domain is still
dense (to see this note that both 1 and x can be approximated by functions
vanishing at the boundary and that every polynomial can be decomposed into
a linear part and a polynomial which vanishes at the boundary). However,
note that since the exponential function has no zeros, we lose all eigenvalues!
The reason for this unfortunate behavior is that A and A0 are adjoint to
each other in the sense that ⟨g, A0 f ⟩ = ⟨Ag, f ⟩ for f ∈ D(A0 ) and g ∈ D(A).
Hence, at least formally, the adjoint of A0 is A and hence A0 is symmetric
but not self-adjoint. This gives a first hint at the fact, that symmetry is not
the same as self-adjointness for unbounded operators.
Returning to our original problem, another choice are periodic boundary
conditions
$$A_p := \frac{1}{i}\frac{d}{dx}, \qquad D(A_p) := \{f \in C^1[-\pi, \pi] \,|\, f(-\pi) = f(\pi)\}.$$
Now we have increased the domain (in comparison to A0 ) such that we are
still symmetric, but such that A is no longer adjoint to Ap. Moreover, we lose some of the eigenfunctions, but not all:
$$\alpha_n := n, \qquad u_n(x) := \frac{1}{\sqrt{2\pi}}\, e^{inx}, \qquad n \in \mathbb{Z}.$$
In fact, the eigenfunctions are just the orthonormal basis from the Fourier
series. ⋄
The previous examples show that in the infinite dimensional case sym-
metry is not enough to guarantee existence of even a single eigenvalue. In
order to always get this, we will need an extra condition. In fact, we will
see that compactness provides a suitable extra condition to obtain an or-
thonormal basis of eigenfunctions. The crucial step is to prove existence of
one eigenvalue, the rest then follows as in the finite dimensional case.
where $(\alpha_j)_{j=1}^{N}$ are the nonzero eigenvalues with corresponding eigenvectors
uj from the previous theorem.
Remark: There are two cases where our procedure might fail to construct
an orthonormal basis of eigenvectors. One case is where there is an infinite
number of nonzero eigenvalues. In this case αn never reaches 0 and all eigen-
vectors corresponding to 0 are missed. In the other case, 0 is reached, but
we might still miss some of the eigenvectors corresponding to 0 (if the kernel
is not separable or if we do not choose the vectors uj properly). In any
case, by adding vectors from the kernel (which are automatically eigenvec-
tors), one can always extend the eigenvectors uj to an orthonormal basis of
eigenvectors.
Example 3.9. We continue Example 3.7 and would like to apply the spectral
theorem to our operator Ap . However, since Ap is unbounded (its eigenvalues
are not bounded), it cannot be compact and hence we cannot apply Theo-
rem 3.7 directly to Ap . However, the trick is to apply it to the resolvent. To
this end we need to solve the inhomogeneous differential equation
implying
$$(A_p - z)^{-1} g(x) = \frac{i\, e^{2\pi i z}}{1 - e^{2\pi i z}} \int_{-\pi}^{\pi} e^{iz(x-y)} g(y)\,dy + i \int_{-\pi}^{x} e^{iz(x-y)} g(y)\,dy$$
Problem 3.12. Let H := L2cont (0, 1). Show that the Volterra integral opera-
tor K : H → H from Problem 3.6 has no eigenvalues except for 0. Show that
0 is not an eigenvalue if K(x, y) > 0. Why does this not contradict Theorem 3.6?
(Hint: Gronwall’s inequality.)
Problem* 3.13. Show that the resolvent RA (z) = (A − z)−1 (provided it
exists and is densely defined) of a symmetric operator A is again symmetric
for z ∈ R. (Hint: g ∈ D(RA (z)) if and only if g = (A − z)f for some
f ∈ D(A).)
Here we have used integration by parts twice (the boundary terms vanish
due to our boundary conditions f (0) = f (1) = 0 and g(0) = g(1) = 0).
Of course we want to apply Theorem 3.7 and for this we would need to
show that L is compact. But this task is bound to fail, since L is not even
bounded (see Example 1.18)!
So here comes the trick (cf. Example 3.9): If L is unbounded its inverse
L−1 might still be bounded. Moreover, L−1 might even be compact and this
is the case here! Since L might not be injective (0 might be an eigenvalue),
we consider the resolvent RL (z) := (L − z)−1 , z ∈ C.
In order to compute the resolvent, we need to solve the inhomogeneous
equation (L − z)f = g. This can be done using the variation of constants
formula from ordinary differential equations which determines the solution
up to an arbitrary solution of the homogeneous equation. This homogeneous
equation has to be chosen such that f ∈ D(L), that is, such that f (0) =
f (1) = 0.
Define
$$f(x) := \frac{u_+(z, x)}{W(z)} \int_0^x u_-(z, t)\, g(t)\,dt + \frac{u_-(z, x)}{W(z)} \int_x^1 u_+(z, t)\, g(t)\,dt, \tag{3.17}$$
Hence Theorem 3.7 applies to (L − z)−1 once we show that we can find a
real z which is not an eigenvalue.
Theorem 3.11. The Sturm–Liouville operator L has a countable number of
discrete and simple eigenvalues En which accumulate only at ∞. They are
bounded from below and can hence be ordered as follows:
$$\min_{x\in[0,1]} q(x) < E_0 < E_1 < \cdots. \tag{3.23}$$
Now, by (2.18), $\sum_{j=0}^{\infty} |\langle u_j, g\rangle|^2 = \|g\|^2$ and hence the first term is part of a convergent series. Similarly, the second term can be estimated independent of x since
$$\alpha_n u_n(x) = R_L(\lambda) u_n(x) = \int_0^1 G(\lambda, x, t)\, u_n(t)\,dt = \langle u_n, G(\lambda, x, \cdot)\rangle$$
implies
$$\sum_{j=m}^{n} |\alpha_j u_j(x)|^2 \le \sum_{j=0}^{\infty} |\langle u_j, G(\lambda, x, \cdot)\rangle|^2 = \int_0^1 |G(\lambda, x, t)|^2\,dt \le M(\lambda)^2,$$
which we call the form domain of L. Here Cp1 [a, b] denotes the set of
piecewise continuously differentiable functions f in the sense that f is con-
tinuously differentiable except for a finite number of points at which it is
continuous and the derivative has limits from the left and right. In fact, any
class of functions for which the partial integration needed to obtain (3.26)
can be justified would be good enough (e.g. the set of absolutely continuous
functions to be discussed in Section 4.4 from [37]).
which implies
$$\sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2 \le q_L(f).$$
In particular, note that this estimate applies to f(y) = G(λ, x, y). Now from the proof of Theorem 3.11 (with λ = 0 and $\alpha_j = E_j^{-1}$) we have $u_j(x) = E_j \langle u_j, G(0, x, \cdot)\rangle$ and hence
$$\sum_{j=m}^{n} |\langle u_j, f\rangle u_j(x)| = \sum_{j=m}^{n} E_j |\langle u_j, f\rangle \langle u_j, G(0, x, \cdot)\rangle| \le \Big(\sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2\Big)^{1/2} \Big(\sum_{j=m}^{n} E_j |\langle u_j, G(0, x, \cdot)\rangle|^2\Big)^{1/2} \le \Big(\sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2\Big)^{1/2} q_L(G(0, x, \cdot))^{1/2},$$
Proof. Using the conventions from the proof of the previous lemma we have
$\langle u_j, G(0, x, \cdot)\rangle = E_j^{-1} u_j(x)$ and since G(0, x, ·) ∈ Q(L) for fixed x ∈ [a, b] we have
$$\sum_{j=0}^{\infty} \frac{1}{E_j}\, u_j(x) u_j(y) = G(0, x, y),$$
Ej
where C(z) := supj |Ej −z| .
Finally, the last claim follows upon computing the integral using (3.28)
and observing ∥uj ∥ = 1. □
which is convergent with respect to our scalar product. If f ∈ Cp1 [0, 1] with
f (0) = f (1) = 0 the series will converge uniformly. For an application of the
trace formula see Problem 3.16. ⋄
90 3. Compact operators
Example 3.11. We could also look at the same equation as in the previous
problem but with different boundary conditions
u′ (0) = u′ (1) = 0.
Then En = π²n² with

un(x) = 1 for n = 0,   un(x) = √2 cos(nπx) for n ∈ N.
Moreover, every function f ∈ L²cont(0, 1) can be expanded into a Fourier
cosine series

f(x) = ∑_{n=0}^∞ fn un(x),   fn := ∫₀¹ un(x) f(x) dx,
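As a quick sanity check one can compute these coefficients numerically; the test function f(x) = x and the grid size below are arbitrary choices, not part of the text.

```python
import numpy as np

# Sanity check of the cosine expansion with the (arbitrarily chosen) test
# function f(x) = x; u_0 = 1 and u_n = sqrt(2) cos(n*pi*x) for n >= 1.
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]
trapz = lambda y: np.sum((y[1:] + y[:-1]) / 2) * dx  # trapezoidal rule

def u(n):
    return np.ones_like(x) if n == 0 else np.sqrt(2) * np.cos(n * np.pi * x)

f = x.copy()
coeffs = [trapz(u(n) * f) for n in range(200)]         # f_n = <u_n, f>
partial = sum(c * u(n) for n, c in enumerate(coeffs))  # truncated series
print(np.max(np.abs(partial - f)))                     # small truncation error
```

Since the even extension of f is continuous, the coefficients decay like 1/n² and the maximal error of the 200-term partial sum is of order 10⁻³.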
⟨f, Af⟩ = ⟨f, ∑_{j=1}^∞ αj γj uj⟩ = ∑_{j=1}^∞ αj |γj|²,   f ∈ D(A),   (3.31)
take some orthogonal basis, take a finite number of coefficients and optimize
them. This is known as the Rayleigh–Ritz method.9
Example 3.13. Consider the Sturm–Liouville operator L with potential
q(x) = x and Dirichlet boundary conditions f (0) = f (1) = 0 on the in-
terval [0, 1]. Our starting point is the quadratic form
Z 1
|f ′ (x)|2 + q(x)|f (x)|2 dx
qL (f ) := ⟨f, Lf ⟩ =
0
which gives us the lower bound

⟨f, Lf⟩ ≥ min_{0≤x≤1} q(x) = 0.
While the corresponding differential equation can in principle be solved in
terms of Airy functions, there is no closed form for the eigenvalues.
First of all we can improve the above bound upon observing 0 ≤ q(x) ≤ 1
which implies
⟨f, L0 f ⟩ ≤ ⟨f, Lf ⟩ ≤ ⟨f, (L0 + 1)f ⟩, f ∈ D(L) = D(L0 ),
where L0 is the Sturm–Liouville operator corresponding to q(x) = 0. Since
the lowest eigenvalue of L0 is π² we obtain

π² ≤ E1 ≤ π² + 1
for the lowest eigenvalue E1 of L.
Moreover, using the lowest eigenfunction f1(x) = √2 sin(πx) of L0 one
obtains the improved upper bound

E1 ≤ ⟨f1, Lf1⟩ = π² + 1/2 ≈ 10.3696.
Taking the second eigenfunction f2(x) = √2 sin(2πx) of L0 we can make the
ansatz f(x) = (1 + γ²)^{−1/2} (f1(x) + γ f2(x)) which gives

⟨f, Lf⟩ = π² + 1/2 + (γ/(1 + γ²)) (3π²γ − 32/(9π²)).

The right-hand side has a unique minimum at γ = 32/(27π⁴ + √(1024 + 729π⁸))
giving the bound

E1 ≤ (5/2)π² + 1/2 − √(1024 + 729π⁸)/(18π²) ≈ 10.3685
which coincides with the exact eigenvalue up to five digits. ⋄
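The bound can be cross-checked numerically, for instance by a finite-difference discretization of L; the grid size below is an arbitrary choice and the computation is an illustration, not part of the text.

```python
import numpy as np

# Cross-check (illustration): approximate the lowest Dirichlet eigenvalue of
# L = -d^2/dx^2 + x on [0,1] by finite differences and compare it with the
# Rayleigh-Ritz bound derived above.
N = 500                                  # number of subintervals (arbitrary)
h = 1.0 / N
x = h * np.arange(1, N)                  # interior grid points
A = (np.diag(2.0 / h**2 + x)
     + np.diag(-np.ones(N - 2) / h**2, 1)
     + np.diag(-np.ones(N - 2) / h**2, -1))
E1 = np.linalg.eigvalsh(A)[0]            # smallest eigenvalue of the discretization

bound = 5 * np.pi**2 / 2 + 0.5 - np.sqrt(1024 + 729 * np.pi**8) / (18 * np.pi**2)
print(E1, bound)                         # both close to 10.3685
```

The discretized eigenvalue lies slightly below the Ritz bound, as it must, and both agree with 10.3685 to the displayed accuracy.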
But is there also something one can say about the next eigenvalues?
Suppose we know the first eigenfunction u1 . Then we can restrict A to
the orthogonal complement of u1 and proceed as before: E2 will be the
minimum of ⟨f, Af ⟩ over all f restricted to this subspace. If we restrict to
9John William Strutt, 3rd Baron Rayleigh (1842–1919), English physicist
9Walther Ritz (1878–1909), Swiss theoretical physicist
3.4. Estimating eigenvalues 93
where

U(f1, . . . , fj) := {f ∈ D(A) | ∥f∥ = 1, f ∈ span{f1, . . . , fj}⊥}.   (3.35)

Proof. We have

inf_{f∈U(f1,...,fj−1)} ⟨f, Af⟩ ≤ αj.

In fact, set f = ∑_{k=1}^j γk uk and choose γk such that f ∈ U(f1, . . . , fj−1).
Then

⟨f, Af⟩ = ∑_{k=1}^j |γk|² αk ≤ αj

and the claim follows.
Conversely, let γk = ⟨uk, f⟩ and write f = ∑_{k=1}^N γk uk + f̃. Then

inf_{f∈U(u1,...,uj−1)} ⟨f, Af⟩ = inf_{f∈U(u1,...,uj−1)} (∑_{k=j}^N |γk|² αk + ⟨f̃, Ãf̃⟩) = αj. □
where the inf is taken over subspaces with the indicated properties.
Problem* 3.18. Prove Theorem 3.16.
Problem 3.19. Suppose A, An are self-adjoint, bounded and An → A.
Then αk (An ) → αk (A). (Hint: For B self-adjoint ∥B∥ ≤ ε is equivalent to
−ε ≤ B ≤ ε.)
Moreover, ∥Kuj∥² = ⟨uj, K*Kuj⟩ = ⟨uj, sj² uj⟩ = sj² shows that we can set

sj := ∥Kuj∥ > 0.   (3.39)
11Hermann Weyl (1885-1955), German mathematician, theoretical physicist and philosopher
3.5. Singular value decomposition of compact operators 95
The numbers sj = sj (K) are called singular values of K. There are either
finitely many singular values or they converge to zero.
Theorem 3.17 (Schmidt; Singular value decomposition of compact opera-
tors). Let K ∈ K (H1 , H2 ) be compact and let sj be the singular values of K
and {uj } ⊂ H1 corresponding orthonormal eigenvectors of K ∗ K. Then
K = ∑_j sj ⟨uj, .⟩ vj,   (3.40)

where vj = sj⁻¹ Kuj. The norm of K is given by the largest singular value
as required. Furthermore,

⟨vj, vk⟩ = (sj sk)⁻¹ ⟨Kuj, Kuk⟩ = (sj sk)⁻¹ ⟨K*Kuj, uk⟩ = sj sk⁻¹ ⟨uj, uk⟩
where

|K| := √(K*K) = ∑_j sj ⟨uj, .⟩ uj,   |K*| = √(KK*) = ∑_j sj ⟨vj, .⟩ vj   (3.43)
In particular, note
sj (AK) ≤ ∥A∥sj (K), sj (KA) ≤ ∥A∥sj (K) (3.46)
whenever K is compact and A is bounded (the second estimate follows from
the first by taking adjoints).
An operator K ∈ L (H1 , H2 ) is called a finite rank operator if its
range is finite dimensional. The dimension
rank(K) := dim Ran(K)
is called the rank of K. Since for a compact operator
Ran(K) = span{vj } (3.47)
we see that a compact operator is finite rank if and only if the sum in (3.40)
is finite. Note that the finite rank operators form an ideal in L (H) just as
the compact operators do. Moreover, every finite rank operator is compact
by the Heine–Borel theorem (Theorem B.22).
Now truncating the sum in the canonical form gives us a simple way to
approximate compact operators by finite rank ones. Moreover, this is in fact
the best approximation within the class of finite rank operators:
Lemma 3.19 (Schmidt). Let K ∈ K (H1 , H2 ) be compact and let its singular
values be ordered. Then
sj (K) = min ∥K − F ∥, (3.48)
rank(F )<j
In particular, the closure of the ideal of finite rank operators in L (H) is the
ideal of compact operators.
Proof. That there is equality for F = Fj−1 follows from (3.41). In general,
the restriction of F to span{u1, . . . , uj} will have a nontrivial kernel. Let
f = ∑_{k=1}^j αk uk be a normalized element of this kernel, then ∥(K − F)f∥² =
∥Kf∥² = ∑_{k=1}^j sk² |αk|² ≥ sj².
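In finite dimensions this is the Eckart–Young theorem, and it is easy to verify numerically with an SVD; the matrix and the index j below are arbitrary choices made for illustration.

```python
import numpy as np

# Finite-dimensional illustration: the spectral-norm distance from K to the
# matrices of rank < j equals s_j, and the truncated SVD attains it.
rng = np.random.default_rng(0)
K = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(K)              # s is sorted decreasingly

j = 3                                    # compare with a rank-(j-1) truncation
F = (U[:, :j - 1] * s[:j - 1]) @ Vh[:j - 1, :]
err = np.linalg.norm(K - F, 2)           # spectral norm of the remainder
print(err, s[j - 1])                     # equal: err is s_j (0-based s[j-1])
```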
Proof. Just observe that K ∗ K compact is all that was used to show Theo-
rem 3.17. □
Corollary 3.21. An operator K ∈ L (H1, H2) is compact (finite rank) if
and only if K* ∈ L (H2, H1) is. In fact, sj(K) = sj(K*) and

K* = ∑_j sj ⟨vj, .⟩ uj.   (3.50)
Proof. First of all note that (3.50) follows from (3.40) since taking adjoints
is continuous and (⟨uj , .⟩vj )∗ = ⟨vj , .⟩uj (cf. Problem 2.11). The rest is
straightforward. □
From this last lemma one easily gets a number of useful inequalities for
the singular values:
Corollary 3.22 (Weyl). Let K1 and K2 be compact and let sj (K1 ) and
sj (K2 ) be ordered. Then
(i) sj+k−1 (K1 + K2 ) ≤ sj (K1 ) + sk (K2 ),
(ii) sj+k−1 (K1 K2 ) ≤ sj (K1 )sk (K2 ),
(iii) |sj (K1 ) − sj (K2 )| ≤ ∥K1 − K2 ∥.
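Property (iii), the Lipschitz dependence of the singular values on the operator, is easy to test numerically in finite dimensions; the random matrices below are an illustration only.

```python
import numpy as np

# Numerical illustration of (iii): singular values are Lipschitz continuous
# with respect to the operator (spectral) norm.
rng = np.random.default_rng(1)
K1 = rng.standard_normal((7, 5))
K2 = rng.standard_normal((7, 5))
s1 = np.linalg.svd(K1, compute_uv=False)   # sorted decreasingly
s2 = np.linalg.svd(K2, compute_uv=False)
gap = np.linalg.norm(K1 - K2, 2)           # operator norm of the difference
print(np.max(np.abs(s1 - s2)), gap)        # first value never exceeds the second
```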
where the minimum is taken over all subspaces with the indicated dimension.
Moreover, the minimum is attained for

M = span{uk}_{k=1}^{j−1},   N = span{vk}_{k=1}^{j−1}.
The two most important cases are p = 1 and p = 2: J2 (H) is the space
of Hilbert–Schmidt operators and J1 (H) is the space of trace class
operators.
Example 3.16. Any multiplication operator by a sequence from ℓp (N) is in
the Schatten p-class of H = ℓ2 (N). ⋄
Example 3.17. By virtue of the Weyl asymptotics (see Example 3.14) the
resolvent of a regular Sturm–Liouville operator is trace class. ⋄
Example 3.18. Let k be a periodic function which is square integrable over
[−π, π]. Then the integral operator
(Kf)(x) = (1/2π) ∫_{−π}^π k(y − x) f(y) dy
Proof. First of all note that (3.55) implies that K is compact. To see this,
let Pn be the projection onto the space spanned by the first n elements of
the orthonormal basis {wj }. Then Kn = KPn is finite rank and converges
to K since
∥(K − Kn)f∥ = ∥∑_{j>n} cj Kwj∥ ≤ ∑_{j>n} |cj| ∥Kwj∥ ≤ (∑_{j>n} ∥Kwj∥²)^{1/2} ∥f∥,

where f = ∑_j cj wj.
Proof. This follows from (3.56) upon using the triangle inequality for H and
for ℓ2 (J). □
But then

∑_{j∈N} ∥Kwj∥² = ∑_{j∈N} ∫_a^b |(Kwj)(x)|² dx = ∫_a^b ∑_{j∈N} |(Kwj)(x)|² dx ≤ (b − a) M²

as claimed. ⋄
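In finite dimensions the sum ∑_j ∥Kwj∥² is the squared Frobenius norm, and its independence of the chosen orthonormal basis can be checked directly; the sketch below uses an arbitrary random matrix and a random orthonormal basis.

```python
import numpy as np

# For a matrix K the sum of ||K w_j||^2 over an orthonormal basis {w_j} is
# the squared Hilbert-Schmidt (Frobenius) norm and is basis independent.
rng = np.random.default_rng(2)
K = rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # columns form a random ONB

hs_standard = sum(np.linalg.norm(K[:, j])**2 for j in range(5))    # basis e_j
hs_random = sum(np.linalg.norm(K @ Q[:, j])**2 for j in range(5))  # basis Q e_j
print(hs_standard, hs_random, np.linalg.norm(K, 'fro')**2)         # all agree
```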
Since Hilbert–Schmidt operators turn out to be easy to identify (cf. also
Section 3.5 from [37]), it is important to relate J1(H) with J2(H):
Lemma 3.26. An operator is trace class if and only if it can be written as
the product of two Hilbert–Schmidt operators, K = K1 K2 , and in this case
we have
∥K∥1 ≤ ∥K1 ∥2 ∥K2 ∥2 . (3.58)
In fact, K1 , K2 can be chosen such that ∥K∥1 = ∥K1 ∥2 ∥K2 ∥2 .
12Ferdinand Georg Frobenius (1849 –1917), German mathematician
In the special case w = w̃ we see tr(K1 K2 ) = tr(K2 K1 ) and the general case
now shows that the trace is independent of the orthonormal basis. □
Clearly for self-adjoint trace class operators, the trace is the sum over
all eigenvalues (counted with their multiplicity). To see this, one just has to
choose the orthonormal basis to consist of eigenfunctions. This is even true
for all trace class operators and is known as Lidskii13 trace theorem (see [27]
for an easy to read introduction).
13Victor Lidskii (1924–2008), Soviet mathematician
3.6. Hilbert–Schmidt and trace class operators 103
for z ∈ C no eigenvalue. ⋄
Example 3.22. For our integral operator K from Example 3.18 we have in
the trace class case

tr(K) = ∑_{j∈Z} k̂j = k(0).
Note that this can again be interpreted as the integral over the diagonal
(2π)−1 k(x − x) = (2π)−1 k(0) of the kernel. ⋄
We also note the following elementary properties of the trace:
Lemma 3.28. Suppose K, K1 , K2 are trace class and A is bounded.
(i) The trace is linear.
(ii) tr(K ∗ ) = tr(K)∗ .
(iii) If K1 ≤ K2 , then tr(K1 ) ≤ tr(K2 ).
(iv) tr(AK) = tr(KA).
Proof. (i) and (ii) are straightforward. (iii) follows from K1 ≤ K2 if and
only if ⟨f, K1 f ⟩ ≤ ⟨f, K2 f ⟩ for every f ∈ H. (iv) By Problem 2.16 and (i),
it is no restriction to assume that A is unitary. Let {wn } be some ONB and
note that {w̃n = Awn } is also an ONB. Then
tr(AK) = ∑_n ⟨w̃n, AKw̃n⟩ = ∑_n ⟨Awn, AKAwn⟩ = ∑_n ⟨wn, KAwn⟩ = tr(KA)
and the claim follows. □
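Properties (ii) and (iv) are easy to confirm for matrices; the small numerical illustration below uses arbitrary random complex matrices (in finite dimensions a general bounded A suffices for (iv)).

```python
import numpy as np

# Matrix illustration of (ii) and (iv): tr(K*) = tr(K)* and tr(AK) = tr(KA).
rng = np.random.default_rng(3)
K = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

print(np.trace(K.conj().T), np.conj(np.trace(K)))  # equal
print(np.trace(A @ K), np.trace(K @ A))            # equal
```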
Proof. To see that a trace class operator (3.40) can be written in such a
way choose fj = uj , gj = sj vj . This also shows that the minimum in (3.64)
is attained. Conversely note that the sum converges in the operator norm
and hence K is compact. Moreover, for every finite N we have
∑_{k=1}^N sk = ∑_{k=1}^N ⟨vk, Kuk⟩ = ∑_{k=1}^N ∑_j ⟨vk, gj⟩⟨fj, uk⟩ = ∑_j ∑_{k=1}^N ⟨vk, gj⟩⟨fj, uk⟩
 ≤ ∑_j (∑_{k=1}^N |⟨vk, gj⟩|²)^{1/2} (∑_{k=1}^N |⟨fj, uk⟩|²)^{1/2} ≤ ∑_j ∥fj∥ ∥gj∥.
This also shows that the right-hand side in (3.64) cannot exceed ∥K∥1 . To
see the last claim we choose an ONB {wk } to compute the trace
tr(K) = ∑_k ⟨wk, Kwk⟩ = ∑_k ∑_j ⟨wk, ⟨fj, wk⟩ gj⟩ = ∑_j ∑_k ⟨⟨wk, fj⟩ wk, gj⟩ = ∑_j ⟨fj, gj⟩. □
• Show that K is Hilbert–Schmidt if and only if ∑_{j∈N} j |k_{j+1}|² < ∞
and this number equals ∥K∥₂².
• Show that K is Hilbert–Schmidt with ∥K∥2 ≤ ∥c∥1 if |kj | ≤ cj ,
where cj is decreasing and summable.
(Hint: For the first item use summation by parts.)
Chapter 4

The main theorems about Banach spaces
Despite the many advantages of Hilbert spaces, there are also situations
where a non-Hilbert space is better suited (in fact the choice of the right
space is typically crucial for many problems). Hence we will devote our
attention to Banach spaces next.
Proof. Suppose X = ⋃_{n=1}^∞ Xn. We can assume that the sets Xn are closed
and none of them contains a ball; in particular, X \Xn is open and nonempty
for every n. We will construct a Cauchy sequence xn which stays away from
all Xn .
Since X \ X1 is open and nonempty, there is a ball Br1 (x1 ) ⊆ X \ X1 .
Reducing r1 a little, we can even assume Br1 (x1 ) ⊆ X \ X1 . Moreover,
since X2 cannot contain Br1 (x1 ), there is some x2 ∈ Br1 (x1 ) that is not
in X2 . Since Br1 (x1 ) ∩ (X \ X2 ) is open, there is a closed ball Br2 (x2 ) ⊆
1René-Louis Baire (1874 –1932), French mathematician
108 4. The main theorems about Banach spaces
Proof. Let {On} be a family of open dense sets whose intersection is not
dense. Then this intersection must be missing some closed ball Bε. This ball
will lie in ⋃_n Xn, where Xn := X \ On are closed and nowhere dense. Now
note that X̃n := Xn ∩ Bε are closed nowhere dense sets in Bε . But Bε is a
complete metric space, a contradiction. □
Countable intersections of open sets are in some sense the next general
sets after open sets and are called Gδ sets (here G and δ stand for the German
words Gebiet and Durchschnitt, respectively). The complement of a Gδ set is
a countable union of closed sets also known as an Fσ set (here F and σ stand
for the French words fermé and somme, respectively). The complement of
a dense Gδ set will be a countable union of nowhere dense sets and hence
by definition meager. Consequently properties which hold on a dense Gδ are
considered generic in this context.
Example 4.2. The irrational numbers are a dense Gδ set in R. To see
this, let xn be an enumeration of the rational numbers and consider the
4.1. The Baire theorem and its consequences 109
and it suffices to show that the family {ℓn }n∈N is not uniformly bounded.
By Example 1.22 (adapted to our present periodic setting) we have
∥ℓn∥ = (1/2π) ∥Dn∥₁.
Now we estimate

∥Dn∥₁ = 2 ∫₀^π |Dn(x)| dx ≥ 2 ∫₀^π |sin((n + 1/2)x)|/(x/2) dx = 4 ∫₀^{(n+1/2)π} |sin(y)| dy/y
 ≥ 4 ∑_{k=1}^n ∫_{(k−1)π}^{kπ} |sin(y)| dy/(kπ) = (8/π) ∑_{k=1}^n 1/k.
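The logarithmic growth of this lower bound (it is (8/π) times the harmonic sum) can be observed numerically; the sketch assumes the standard closed form Dn(x) = sin((n + 1/2)x)/sin(x/2) of the Dirichlet kernel and an arbitrary quadrature resolution.

```python
import numpy as np

# Numerical check of the lower bound, assuming the standard closed form
# D_n(x) = sin((n + 1/2) x) / sin(x / 2) of the Dirichlet kernel.
def dirichlet_l1_norm(n, m=200001):
    x = (np.arange(m) + 0.5) * np.pi / m   # midpoint rule on (0, pi)
    Dn = np.sin((n + 0.5) * x) / np.sin(x / 2)
    return 2 * np.sum(np.abs(Dn)) * np.pi / m

for n in (10, 100, 1000):
    Hn = np.sum(1.0 / np.arange(1, n + 1))            # harmonic sum
    print(n, dirichlet_l1_norm(n), 8 / np.pi * Hn)    # norm exceeds (8/pi) H_n
```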
This raises the question if a similar estimate can be true for continuous
functions. More precisely, can we find a sequence ck > 0 such that
|fˆk | ≤ Cf ck ,
Proof. Set BrX := BrX (0) and similarly for BrY (0). By translating balls
(using linearity of A), it suffices to prove that for every ε > 0 there is a δ > 0
such that BδY ⊆ A(BεX ). (By scaling we could also assume ε = 1 without
loss of generality.)
So let ε > 0 be given. Since A is surjective we have
Y = AX = A (⋃_{n=1}^∞ nBεX) = ⋃_{n=1}^∞ A(nBεX) = ⋃_{n=1}^∞ nA(BεX)
and the Baire theorem implies that for some n, nA(BεX ) contains a ball.
Since multiplication by n is a homeomorphism, the same must be true for
n = 1, that is, BδY (y) ⊂ A(BεX ). Consequently
So it remains to get rid of the closure. To this end choose εn > 0 such that
∑_{n=1}^∞ εn < ε and corresponding δn → 0 such that BδnY ⊂ A(BεnX). Now
for z ∈ Bδ1Y ⊂ A(Bε1X) we have x1 ∈ Bε1X such that Ax1 is arbitrarily close
to z, say z − Ax1 ∈ Bδ2Y ⊂ A(Bε2X). Hence we can find x2 ∈ Bε2X such
that (z − Ax1) − Ax2 ∈ Bδ3Y ⊂ A(Bε3X) and proceeding like this we get a
sequence xn ∈ BεnX such that

z − ∑_{k=1}^n Axk ∈ Bδn+1Y.
112 4. The main theorems about Banach spaces
Conversely, if A is open, then the image of the unit ball contains again
some ball BεY ⊆ A(B1X). Hence by scaling BrεY ⊆ A(BrX) and letting r → ∞
we see that A is onto: Y = A(X). □
Proof. As shown in the proof, if A is not onto, none of the sets A(BnX) will
contain a ball and hence the sets A(BnX) are nowhere dense. Consequently,
Ran(A) = ⋃_n A(BnX) is meager.
Example 4.7. For example, ℓp0 (N) is meager as a subset of ℓp (N) for p0 < p
(which follows from applying the above corollary to the natural embedding
operator — Problem 1.17). ⋄
As another immediate consequence we get the inverse mapping theorem:
Theorem 4.8 (Inverse mapping). Let A ∈ L (X, Y ) be a continuous linear
bijection between Banach spaces. Then A−1 is continuous.
Example 4.8. Consider the operator (Aa)j := (1/j) aj in ℓ2(N). Then
its inverse (A⁻¹a)j = j aj is unbounded (show this!). This is in
agreement with our theorem since its range is dense (why?) but not all
of ℓ2(N): For example, (bj := 1/j)_{j∈N} ∉ Ran(A) since b = Aa gives the
contradiction

∞ = ∑_{j=1}^∞ 1 = ∑_{j=1}^∞ |j bj|² = ∑_{j=1}^∞ |aj|² < ∞.
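The unboundedness of the inverse can be made concrete by looking at finite sections of the operator; the sizes below are arbitrary and the norm is computed as the largest singular value of the finite matrix.

```python
import numpy as np

# Finite sections of (Aa)_j = a_j / j: the inverse multiplies by j, so its
# norm on the first n coordinates is exactly n and blows up as n grows.
for n in (10, 100, 1000):
    An = np.diag(1.0 / np.arange(1, n + 1))
    print(n, np.linalg.norm(np.linalg.inv(An), 2))   # equals n
```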
not guarantee convergence of Axn but it will ensure that if this sequence
converges, then it will converge to the right object, namely Ax.
Example 4.10. Let X := C[0, 1] and consider the unbounded operator (cf.
Example 1.18)
D(A) := C 1 [0, 1], Af := f ′ .
Then A is closed since fn → f and fn′ → g implies that f is differentiable
and f ′ = g. ⋄
Theorem 4.9 (Closed graph). Let A : X → Y be a linear map from a
Banach space X to another Banach space Y . Then A is continuous if and
only if its graph is closed.
Proof. If Γ(A) is closed, then it is again a Banach space. Now the projec-
tion π1 (x, Ax) := x onto the first component is a continuous bijection onto
X. So by the inverse mapping theorem its inverse π1−1 is again continuous.
Moreover, the projection π2 (x, Ax) := Ax onto the second component is also
continuous and consequently so is A = π2 ◦ π1−1 . The converse is easy. □
Problem 4.1. Every subset of a meager set is again meager. Every superset
of a fat set is fat.
Problem 4.2. Let X be a complete metric space. The complement of a
meager set is dense.
Problem 4.3. Consider X := C1 [−1, 1]. Show that M := {x ∈ X|x(−t) =
x(t)} is meager.
Problem 4.4. An infinite dimensional Banach space cannot have a count-
able Hamel basis (see Problem 1.8). (Hint: Apply Baire’s theorem to Xn :=
span{uj }nj=1 .)
Problem 4.5. Let X := C[0, 1]. Show that the set of functions which are
nowhere differentiable contains a dense Gδ . (Hint: Consider Fk := {f ∈
X| ∃x ∈ [0, 1] : |f (x) − f (y)| ≤ k|x − y|, ∀y ∈ [0, 1]}. Show that this set is
closed and nowhere dense. For the first property Bolzano–Weierstraß might
be useful, for the latter property show that the set Pm of piecewise linear
functions whose slopes are bounded below by m in absolute value are dense.
Now observe that Fk ∩ Pm = ∅ for m > k.)
Problem 4.6. Let X be a complete metric space without isolated points.
Show that a dense Gδ set cannot be countable. (Hint: A single point is
nowhere dense.)
Problem 4.7. Let X be the space of sequences with finitely many nonzero
terms together with the sup norm. Consider the family of operators {An }n∈N
given by (An a)j := jaj , j ≤ n and (An a)j := 0, j > n. Then this family
is pointwise bounded but not uniformly bounded. Does this contradict the
Banach–Steinhaus theorem?
Problem 4.8. Let X be a Banach space and Y, Z normed spaces. Show that
a bilinear map B : X ×Y → Z is bounded, ∥B(x, y)∥ ≤ C∥x∥∥y∥, if and only
if it is separately continuous with respect to both arguments. (Hint: Uniform
boundedness principle.)
Problem 4.9. Consider a Schauder basis as in (1.31). Show that the coordinate
functionals αn are continuous. (Hint: Denote the set of all possible
sequences of Schauder coefficients α = (αn)_{n∈N} by A and equip it with the
norm ∥α∥ := supn ∥∑_{k=1}^n αk uk∥. By construction the operator A : A → X,
α ↦ ∑_k αk uk has norm one. Now show that A is complete and apply the
inverse mapping theorem.)
Problem 4.10. Show that a compact symmetric operator in an infinite-di-
mensional Hilbert space cannot be surjective.
Problem 4.11. Show that the operator
D(A) := {a ∈ ℓp (N)|j aj ∈ ℓp (N)}, (Aa)j := j aj ,
simply writes ℓp (N)∗ = ℓq (N). In the case p = ∞ this is not true, as we will
see soon. ⋄
It turns out that many questions are easier to handle after applying a
linear functional ℓ ∈ X ∗ . For example, suppose x(t) is a function R → X
(or C → X), then ℓ(x(t)) is a function R → C (respectively C → C) for
any ℓ ∈ X ∗ . So to investigate ℓ(x(t)) we have all tools from real/complex
analysis at our disposal. But how do we translate this information back to
x(t)? Suppose we have ℓ(x(t)) = ℓ(y(t)) for all ℓ ∈ X ∗ . Can we conclude
4.2. The Hahn–Banach theorem and its consequences 117
x(t) = y(t)? The answer is yes and will follow from the Hahn–Banach
theorem.
We first prove the real version from which the complex one then follows
easily.
Theorem 4.11 (Hahn–Banach3, real version). Let X be a real vector space
and φ : X → R a convex function (i.e., φ(λx+(1−λ)y) ≤ λφ(x)+(1−λ)φ(y)
for λ ∈ (0, 1)).
If ℓ is a linear functional defined on some subspace Y ⊂ X which satisfies
ℓ(y) ≤ φ(y), y ∈ Y , then there is an extension ℓ to all of X satisfying
ℓ(x) ≤ φ(x), x ∈ X.
Proof. Let us first try to extend ℓ in just one direction: Take x ̸∈ Y and
set Ỹ := span{x, Y }. If there is an extension ℓ̃ to Ỹ it must clearly satisfy
ℓ̃(y + αx) := ℓ(y) + αℓ̃(x), y ∈ Y.
So all we need to do is to choose ℓ̃(x) such that ℓ̃(y + αx) ≤ φ(y + αx). But
this is equivalent to

sup_{α>0, y∈Y} (φ(y − αx) − ℓ(y))/(−α) ≤ ℓ̃(x) ≤ inf_{α>0, y∈Y} (φ(y + αx) − ℓ(y))/α

and is hence only possible if

(φ(y1 − α1 x) − ℓ(y1))/(−α1) ≤ (φ(y2 + α2 x) − ℓ(y2))/α2

for every α1, α2 > 0 and y1, y2 ∈ Y. Rearranging this last equation we see
that we need to show

α2 ℓ(y1) + α1 ℓ(y2) ≤ α2 φ(y1 − α1 x) + α1 φ(y2 + α2 x).
Starting with the left-hand side we have
α2 ℓ(y1 ) + α1 ℓ(y2 ) = (α1 + α2 )ℓ (λy1 + (1 − λ)y2 )
≤ (α1 + α2 )φ (λy1 + (1 − λ)y2 )
= (α1 + α2 )φ (λ(y1 − α1 x) + (1 − λ)(y2 + α2 x))
≤ α2 φ(y1 − α1 x) + α1 φ(y2 + α2 x),
where λ := α2/(α1 + α2). Hence one dimension works.
To finish the proof we appeal to Zorn’s lemma (Theorem A.2): Let E
be the collection of all extensions ℓ̃ satisfying ℓ̃(x) ≤ φ(x). Then E can be
partially ordered by inclusion (with respect to the domain, i.e., ℓ̃1 ⊆ ℓ̃2 if
D(ℓ̃1 ) ⊆ D(ℓ̃2 ) and ℓ̃2 |D(ℓ̃1 ) = ℓ̃1 ) and every linear chain has an upper bound
(defined on the union of all domains). Hence there is a maximal element
3Hans Hahn (1879–1934), Austrian mathematician
Proof. Clearly, if ℓ(x) = 0 holds for all ℓ in some total subset, this holds
for all ℓ ∈ X ∗ . If x ̸= 0 we can construct a bounded linear functional on
span{x} by setting ℓ(αx) = α and extending it to X using the previous
corollary. But this contradicts our assumption. □
Example 4.15. Let us return to our example ℓ∞ (N). Let c(N) ⊂ ℓ∞ (N) be
the subspace of convergent sequences. Set
l(x) := lim xn , x ∈ c(N), (4.4)
n→∞
then l is bounded since
|l(x)| = lim |xn | ≤ ∥x∥∞ . (4.5)
n→∞
Hence we can extend it to ℓ∞ (N) by Corollary 4.13. Then l(x) cannot be
written as l(x) = ly (x) for some y ∈ ℓ1 (N) (as in (4.3)) since yn = l(δ n ) = 0
shows y = 0 and hence ℓy = 0. The problem is that span{δ n } = c0 (N) ̸=
ℓ∞ (N), where c0 (N) is the subspace of sequences converging to 0.
Moreover, there is also no other way to identify ℓ∞ (N)∗ with ℓ1 (N), since
ℓ1 (N) is separable whereas ℓ∞ (N) is not. This will follow from Lemma 4.19 (iii)
below. ⋄
Another useful consequence is
Corollary 4.15 (Mazur4). Let Y ⊆ X be a subspace of a normed vector
space and let x0 ∈ X \ Y . Then there exists an ℓ ∈ X ∗ such that (i) ℓ(y) = 0,
y ∈ Y , (ii) ℓ(x0 ) = dist(x0 , Y ), and (iii) ∥ℓ∥ = 1.
Problem 4.21 (Banach limit). Let c̃(N) ⊂ ℓ∞(N) be the subspace of all
bounded sequences for which the limit of the Cesàro means

L(x) := lim_{n→∞} (1/n) ∑_{k=1}^n xk

exists. Note that c(N) ⊆ c̃(N) and L(x) = lim_{n→∞} xn for x ∈ c(N).
Show that L can be extended to all of ℓ∞ (N) such that
(i) L is linear,
(ii) |L(x)| ≤ ∥x∥∞ ,
(iii) L(Sx) = L(x) where (Sx)n = xn+1 is the shift operator,
(iv) L(x) ≥ 0 when xn ≥ 0 for all n,
(v) lim inf n xn ≤ L(x) ≤ lim sup xn for all real-valued sequences.
(Hint: Of course existence follows from Hahn–Banach and (i), (ii) will come
for free. Also (iii) will be inherited from the construction. For (iv) note
that the extension can be assumed to be real-valued and investigate L(e − x) for
x ≥ 0 with ∥x∥∞ = 1 where e = (1, 1, 1, . . . ). (v) then follows from (iv).)
4.3. Reflexivity
If we take the bidual (or double dual) X ∗∗ of a normed space X, then the
Hahn–Banach theorem tells us, that X can be identified with a subspace of
X ∗∗ . In fact, consider the linear map J : X → X ∗∗ defined by J(x)(ℓ) := ℓ(x)
(i.e., J(x) is evaluation at x). Then
Theorem 4.18. Let X be a normed space. Then J : X → X ∗∗ is isometric
(norm preserving).
Example 4.16. This gives another quick way of showing that a normed
space has a completion: Take X̄ := J(X) ⊆ X ∗∗ and recall that a dual
space is always complete (Theorem 1.17). ⋄
Thus J : X → X ∗∗ is an isometric embedding. In many cases we even
have J(X) = X ∗∗ and X is called reflexive in this case. Of course a reflexive
space must necessarily be complete.
Example 4.17. The Banach spaces ℓp (N) with 1 < p < ∞ are reflexive:
Identify ℓp (N)∗ with ℓq (N) (cf. Problem 4.18) and choose c ∈ ℓp (N)∗∗ . Then
there is some a ∈ ℓp(N) such that

c(b) = ∑_{j∈N} bj aj,   b ∈ ℓq(N) ≅ ℓp(N)*.

But this implies c(b) = b(a), that is, c = J(a), and thus J is surjective.
(Warning: It does not suffice to just argue ℓp(N)** ≅ ℓq(N)* ≅ ℓp(N).)
However, ℓ1(N) is not reflexive since ℓ1(N)* ≅ ℓ∞(N) but ℓ∞(N)* ≇ ℓ1(N)
as noted earlier. Things get even a bit more explicit if we look at c0(N),
where we can identify (cf. Problem 4.19) c0(N)* with ℓ1(N) and c0(N)** with
ℓ∞(N). Under this identification J(c0(N)) = c0(N) ⊆ ℓ∞(N). ⋄
that ∥xn∥ = 1 and ℓn(xn) ≥ ∥ℓn∥/2. We will show that {xn}_{n=1}^∞ is total in
X. If it were not, we could find some x ∈ X \ span{xn}_{n=1}^∞ and hence there
where we have used Problem 4.15 to obtain the fourth equality. In summary,
(BA)′ = A′ B ′ (4.10)
Ra := (0, a1 , a2 , . . . ).
which shows (R′b)k = bk+1 upon choosing a = δk. Hence R′ = L is the left
shift: Lb := (b2 , b3 , . . . ). Similarly, L′ = R. ⋄
Example 4.25. Let c ∈ ℓ∞(N) and consider the multiplication operator A
in ℓp(N) with 1 ≤ p < ∞ defined by (Aa)j := cj aj. As in the previous example
X* ≅ ℓq(N) with 1/q + 1/p = 1 and for b ∈ ℓq(N) we have

lb(Aa) = ∑_{j=1}^∞ bj (cj aj) = ∑_{j=1}^∞ (cj bj) aj,
which shows (A′ b)j = cj bj and hence A′ is multiplication with c but now in
ℓq (N). Also note that in the case p = 2 the Hilbert space adjoint A∗ would
be multiplication by the complex conjugate sequence c∗ . ⋄
Example 4.26. Recall that an operator K ∈ L (X, Y ) is called a finite
rank operator if its range is finite dimensional. The dimension of its
range rank(K) := dim Ran(K) is called the rank of K. Choosing a ba-
sis {yj = Kxj }nj=1 for Ran(K) and a corresponding dual basis {yj′ }nj=1 (cf.
Problem 4.23), then x′j := K′y′j is a dual basis for xj and

Kx = ∑_{j=1}^n y′j(Kx) yj = ∑_{j=1}^n x′j(x) yj,   K′y′ = ∑_{j=1}^n y′(yj) x′j.
Of course we can also consider the doubly adjoint operator A′′ . Then a
simple computation
A′′ (JX (x))(y ′ ) = JX (x)(A′ y ′ ) = (A′ y ′ )(x) = y ′ (Ax) = JY (Ax)(y ′ ) (4.11)
shows that the following diagram commutes
          A
   X    −→    Y
 JX ↓          ↓ JY
  X∗∗   −→   Y∗∗
          A′′
Consequently

A′′ ↾ Ran(JX) = JY A JX⁻¹,   A = JY⁻¹ A′′ JX.   (4.12)
Hence, regarding X as a subspace JX (X) ⊆ X ∗∗ and Y as a subspace
JY (Y ) ⊆ Y ∗∗ , then A′′ is an extension of A to X ∗∗ but with values in
Y ∗∗ . In particular, note that B ∈ L (Y ∗ , X ∗ ) is the adjoint of some other
operator B = A′ if and only if B ′ (JX (X)) = A′′ (JX (X)) ⊆ JY (Y ) (for the
converse note that A := JY−1 B ′ JX will do the trick). This can be used to
show that not every operator is an adjoint (Problem 4.30).
Theorem 4.24 (Schauder). Suppose X, Y are Banach spaces and A ∈
L (X, Y ). Then A is compact if and only if A′ is.
since A(B1X(0)) ⊆ K is dense. Thus y′nj is the required subsequence and A′
is compact.
To see the converse note that if A′ is compact then so is A′′ by the first
part and hence also A = JY−1 A′′ JX . □
Theorem 4.25. Suppose X, Y are Banach spaces. If A ∈ L (X, Y ),
then A−1 exists and is in L (Y, X) if and only if (A′ )−1 exists and is in
L (X ∗ , Y ∗ ). Moreover, in this case we have
(A′ )−1 = (A−1 )′ . (4.13)
4.4. The adjoint operator 129
Proof. If Ran(A) is closed, then we can factor out its kernel and restrict Y
to obtain a bijective operator à as in Problem 1.61. By the inverse mapping
theorem (Theorem 4.8) Ã has a bounded inverse. Fix δ < 1 and choose
ε < ∥Ã−1 ∥−1 δ. Then for every y ∈ Ran(A) there is some x ∈ X with
y = Ax and ∥Ax∥ ≥ (ε/δ)∥x∥ after maybe adding an element from the kernel
to x. This x satisfies ε∥x∥ + ∥y − Ax∥ = ε∥x∥ ≤ δ∥y∥ as required.
Conversely, fix y ∈ Ran(A) and recursively choose a sequence xn such
that

ε∥xn∥ + ∥(y − Ax̃n−1) − Axn∥ ≤ δ∥y − Ax̃n−1∥,   x̃n := ∑_{m≤n} xm.
Proof. (i) Follows from ℓ(αn xn + yn ) = αn ℓ(xn ) + ℓ(yn ) → αℓ(x) + ℓ(y). (ii)
Choose ℓ ∈ X ∗ such that ℓ(x) = ∥x∥ (for the limit x) and ∥ℓ∥ = 1. Then
∥x∥ = ℓ(x) = lim inf |ℓ(xn )| ≤ lim inf ∥xn ∥.
(iii) For every ℓ we have that |J(xn )(ℓ)| = |ℓ(xn )| ≤ C(ℓ) is bounded. Hence
by the uniform boundedness principle we have ∥xn ∥ = ∥J(xn )∥ ≤ C.
(iv) If xn is a weak Cauchy sequence, then ℓ(xn ) converges and we can define
j(ℓ) := lim ℓ(xn ). By construction j is a linear functional on X ∗ . Moreover,
4.5. Weak convergence 133
by (iii) we have |j(ℓ)| ≤ sup |ℓ(xn )| ≤ ∥ℓ∥ sup ∥xn ∥ ≤ C∥ℓ∥ which shows
j ∈ X ∗∗ . Since X is reflexive, j = J(x) for some x ∈ X and by construction
ℓ(xn ) → J(x)(ℓ) = ℓ(x), that is, xn ⇀ x.
(v) This follows from
∥xn − xm∥ = sup_{∥ℓ∥=1} |ℓ(xn − xm)|
Item (ii) says that the norm is sequentially weakly lower semicontinuous
(cf. Problem B.19) while the previous example shows that it is not sequen-
tially weakly continuous. However, bounded linear operators turn out to
be sequentially weakly continuous (Problem 4.37). Nonlinear operations are
more tricky as the next example shows:
Example 4.29. Consider L2 (0, 1) and recall (see Example 3.10) that
un(x) := √2 sin(nπx), n ∈ N,

form an ONB and hence un ⇀ 0. However, vn := un² ⇀ 1. In fact, one easily
computes

⟨um, vn⟩ = (√2(1 − (−1)^m)/(mπ)) · (4n²/(4n² − m²)) → √2(1 − (−1)^m)/(mπ) = ⟨um, 1⟩

and the claim follows from Problem 4.41 since ∥vn∥ = √(3/2). ⋄
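A numerical check of this limit (with m = 1, an arbitrary choice) illustrates that un² ⇀ 1 even though un² stays far from 1 in norm.

```python
import numpy as np

# Numerical check (with m = 1): <u_1, u_n^2> tends to <u_1, 1> = 2*sqrt(2)/pi.
x = np.linspace(0, 1, 40001)
dx = x[1] - x[0]
trapz = lambda y: np.sum((y[1:] + y[:-1]) / 2) * dx   # trapezoidal rule

u = lambda n: np.sqrt(2) * np.sin(n * np.pi * x)
target = trapz(u(1))                    # <u_1, 1> = 2*sqrt(2)/pi ~ 0.9003
for n in (5, 50, 500):
    print(n, trapz(u(1) * u(n)**2), target)   # first column approaches target
```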
Example 4.30. Let X := c0(N) and hence X* ≅ ℓ1(N). Let aⁿj := 1 for
1 ≤ j ≤ n and aⁿj := 0 for j > n. Then for every b ∈ ℓ1(N) we have

lim_{n→∞} lb(aⁿ) = lim_{n→∞} ∑_{j=1}^∞ bj aⁿj = lim_{n→∞} ∑_{j=1}^n bj = ∑_{j=1}^∞ bj

and hence aⁿ is a weak Cauchy sequence which does not converge. Indeed,
aⁿ ⇀ a would imply aj = 1 for all j ∈ N (upon choosing b = δj) which is
clearly not in X. The limit is however in X** ≅ ℓ∞(N). ⋄
Remark: One can equip X with the weakest topology for which all ℓ ∈ X ∗
remain continuous. This topology is called the weak topology and it is
given by taking all finite intersections of inverse images of open sets as a
base. By construction, a sequence will converge in the weak topology if and
only if it converges weakly. By Corollary 4.15 the weak topology is Hausdorff,
but it will not be metrizable in general. In particular, sequences do not suffice
to describe this topology. Nevertheless we will stick with sequences for now
and come back to this more general point of view in Section 6.3.
In a Hilbert space there is also a simple criterion for a weakly convergent
sequence to converge in norm (see Theorem 6.19 for a generalization).
Proof. By (ii) of the previous lemma we have lim ∥fn ∥ = ∥f ∥ and hence
∥f − fn ∥2 = ∥f ∥2 − 2Re(⟨f, fn ⟩) + ∥fn ∥2 → 0.
The converse is straightforward. □
Now we come to the main reason why weakly convergent sequences are of
interest: A typical approach for solving a given equation in a Banach space
is as follows:
(i) Construct a (bounded) sequence xn of approximating solutions
(e.g. by solving the equation restricted to a finite dimensional sub-
space and increasing this subspace).
(ii) Use a compactness argument to extract a convergent subsequence.
(iii) Show that the limit solves the equation.
Our aim here is to provide some results for the step (ii). In a finite di-
mensional vector space the most important compactness criterion is bound-
edness (Heine–Borel theorem, Theorem B.22). In infinite dimensions this
breaks down as we have already seen in Section 1.5. We even have
Theorem 4.31 (F. Riesz). The closed unit ball in a Banach space X is
compact if and only if X is finite dimensional.
Of course in the formulation of the above theorem the unit ball could
be replaced with any ball (of positive radius). In particular, if X is infinite
dimensional, a compact set cannot contain a ball, that is, it must have
empty interior. Hence compact sets are always meager in infinite dimensional
spaces.
However, if we are willing to trade norm convergence for weak convergence,
the situation looks much brighter!
Theorem 4.32 (Šmulian5). Let X be a reflexive Banach space. Then every
bounded sequence has a weakly convergent subsequence.
5Vitold Shmulyan (1914–1944), Soviet mathematician
a contradiction. ⋄
Example 4.32. Let X := L1(−1, 1). Every continuous function φ gives rise
to a linear functional

ℓφ(f) := ∫_{−1}^1 f(x) φ(x) dx

in L1(−1, 1)*. Take some nonnegative u1 with compact support, ∥u1∥1 = 1,
and set uk(x) := k u1(kx) (implying ∥uk∥1 = 1). Then we have

∫ uk(x) φ(x) dx → φ(0)
a contradiction.
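The limit of the pairings can be checked numerically; the hat function u₁ below is a hypothetical choice satisfying the stated assumptions (nonnegative, compact support, ∥u₁∥₁ = 1), and φ = cos is an arbitrary continuous test function:

```python
import numpy as np

# Riemann-sum check that u_k(x) = k u_1(k x) concentrates at 0: the pairing
# with a continuous phi tends to phi(0), while ||u_k||_1 = 1 for every k.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
u1 = np.maximum(1.0 - np.abs(x), 0.0)   # hat function with integral 1
phi = np.cos(x)                         # test function with phi(0) = 1

def pairing(k):
    uk = k * np.maximum(1.0 - np.abs(k * x), 0.0)   # u_k(x) = k u_1(k x)
    return float(np.sum(uk * phi) * dx)

vals = [pairing(k) for k in (1, 10, 100)]
```

The pairings approach φ(0) = 1 as k grows, even though the u_k themselves spread no mass: ∥u_k∥₁ = 1 for all k.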
b_k := sign(a_k^{n_j}), k_j ≤ k < k_{j+1}.
Then
|ℓ_b(a^{n_j})| ≥ Σ_{k_j ≤ k < k_{j+1}} |a_k^{n_j}| − |Σ_{1 ≤ k < k_j; k_{j+1} ≤ k} b_k a_k^{n_j}| ≥ 2/3 − 1/3 = 1/3,
contradicting a^{n_j} ⇀ 0. ⋄
It is also useful to observe that compact operators map weakly convergent
sequences to (norm) convergent ones.
Theorem 4.33. Let A ∈ K (X, Y ) be compact. Then xn ⇀ x implies
Axn → Ax. If X is reflexive the converse is also true.
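A minimal numerical sketch, using a truncated model of ℓ² and a hypothetical compact diagonal operator (my choice, not from the text):

```python
import numpy as np

# The basis vectors e_n of (a truncation of) ell^2 satisfy <y, e_n> -> 0 for
# every fixed y while ||e_n|| = 1, so e_n -> 0 weakly but not in norm.  The
# compact diagonal operator with entries 1/j maps them to a norm-null sequence.
N = 1000
idx = np.arange(1, N + 1)
diag = 1.0 / idx                        # diagonal of the compact operator A

def e(n):
    v = np.zeros(N)
    v[n - 1] = 1.0
    return v

y = 1.0 / idx                           # a fixed test element of ell^2
pairings = [np.dot(y, e(n)) for n in (10, 100, 500)]                  # -> 0
image_norms = [np.linalg.norm(diag * e(n)) for n in (10, 100, 500)]   # -> 0
```

The pairings tend to 0 (weak convergence of e_n) while ∥e_n∥ = 1 stays fixed; the image norms ∥A e_n∥ = 1/n tend to 0, illustrating the theorem.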
Then Sn converges to zero strongly but not in norm (since ∥Sn ∥ = 1) and Sn∗
converges weakly to zero (since ⟨x, Sn∗ y⟩ = ⟨Sn x, y⟩) but not strongly (since
∥Sn∗ x∥ = ∥x∥) . ⋄
and we get that the Fourier series does not converge for some L1 function. ⋄
to the fact that the sequence is bounded and each component converges (cf.
Problem 4.42). ⋄
With this notation the proof of Lemma 4.29 (iv) shows that (without
assuming X to be reflexive) every weak Cauchy sequence converges weak-∗.
Similarly, it is also possible to slightly generalize Theorem 4.32 (Prob-
lem 4.43):
Lemma 4.36 (Helly7). Suppose X is a separable Banach space. Then every
bounded sequence ℓn ∈ X ∗ has a weak-∗ convergent subsequence.
Problem* 4.42. Show that if {x_j} ⊆ X is some total set, then ℓ_n ⇀∗ ℓ if
and only if ℓ_n ∈ X^∗ is bounded and ℓ_n(x_j) → ℓ(x_j) for all j.
Problem* 4.43. Prove Lemma 4.36.
Chapter 5

Bounded linear operators
We have started out our study by looking at eigenvalue problems which, from
a historical viewpoint, were one of the key problems driving the development
of functional analysis. In Chapter 3 we have investigated compact operators
in Hilbert space and we have seen that they allow a treatment similar to
what is known from matrices. However, more sophisticated problems will
lead to operators whose spectra consist of more than just eigenvalues. Hence
we want to go one step further and look at spectral theory for bounded
operators. Here one of the driving forces was the development of quantum
mechanics (there even the boundedness assumption is too much — but first
things first). A crucial role is played by the algebraic structure, namely recall
from Section 1.6 that the bounded linear operators on X form a Banach
space which has a (non-commutative) multiplication given by composition.
In order to emphasize that it is only this algebraic structure which matters,
we will develop the theory from this abstract point of view. While the reader
should always remember that bounded operators on a Hilbert space is what
we have in mind as the prime application, examples will apply these ideas
also to other cases thereby justifying the abstract approach.
To begin with, the operators could be on a Banach space (note that even
if X is a Hilbert space, L (X) will only be a Banach space) but eventually
again self-adjointness will be needed. Hence we will need the additional
operation of taking adjoints.
144 5. Bounded linear operators
and
(xy)z = x(yz), α (xy) = (αx)y = x (αy), α ∈ C, (5.2)
and
∥xy∥ ≤ ∥x∥∥y∥. (5.3)
is called a Banach algebra. In particular, note that (5.3) ensures that mul-
tiplication is continuous (Problem 5.1). Conversely one can show that (sep-
arate) continuity of multiplication implies existence of an equivalent norm
satisfying (5.3) (Problem 5.2).
An element e ∈ X satisfying
ex = xe = x, ∀x ∈ X (5.4)
is called identity (show that e is unique) and we will assume ∥e∥ = 1 in this
case (by Problem 5.2 this can be done without loss of generality).
Example 5.1. The continuous functions C(I) over some compact interval
form a commutative Banach algebra with identity 1. ⋄
Example 5.2. The differentiable functions C n (I) over some compact inter-
val do not form a commutative Banach algebra since (5.3) fails for n > 1.
However, the equivalent norm
∥f∥_{∞,n} := Σ_{k=0}^{n} ∥f^{(k)}∥_∞/k!
remedies this problem. ⋄
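Why the factorials help can be traced to the Leibniz rule, which gives ∥(fg)^{(k)}∥_∞/k! ≤ Σ_{i+j=k} (∥f^{(i)}∥_∞/i!)(∥g^{(j)}∥_∞/j!). A numerical spot check of submultiplicativity, with two sample polynomials on I = [0, 1] (hypothetical choices):

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Spot check that ||f||_{inf,n} = sum_k ||f^(k)||_inf / k! is submultiplicative
# on I = [0, 1], here for n = 3 and two sample polynomials.
n = 3
t = np.linspace(0.0, 1.0, 1001)

def weighted_norm(coeffs):
    total, factorial = 0.0, 1.0
    c = np.asarray(coeffs, dtype=float)
    for k in range(n + 1):
        total += np.max(np.abs(P.polyval(t, c))) / factorial
        c = P.polyder(c)                 # pass to the next derivative
        factorial *= (k + 1)
    return total

f = np.array([1.0, 2.0, 0.0, 1.0])       # f(t) = 1 + 2t + t^3
g = np.array([0.5, -1.0, 3.0])           # g(t) = 1/2 - t + 3t^2
lhs = weighted_norm(P.polymul(f, g))
rhs = weighted_norm(f) * weighted_norm(g)
```

Here lhs ≤ rhs, as the Leibniz estimate predicts; without the 1/k! weights the analogous check fails for suitable f, g.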
Example 5.3. Let X be a Banach space. The bounded linear operators
L (X) form a Banach algebra with identity I. ⋄
Example 5.4. The bounded sequences ℓ∞ (N) together with the component-
wise product form a commutative Banach algebra with identity 1. ⋄
Example 5.5. The space of all periodic continuous functions which have an
absolutely convergent Fourier series A together with the norm
∥f∥_A := Σ_{k∈Z} |f̂_k|
respectively
(Σ_{n=0}^{∞} x^n)(e − x) = Σ_{n=0}^{∞} x^n − Σ_{n=1}^{∞} x^n = e. □
In particular, both conditions are satisfied if ∥y∥ < ∥x−1 ∥−1 and the set of
invertible elements G(X) is open and taking the inverse is continuous:
∥(x − y)^{−1} − x^{−1}∥ ≤ ∥y∥∥x^{−1}∥²/(1 − ∥x^{−1}y∥). (5.10)
continuous function. But will it again have a convergent Fourier series, that
is, will it be in the Wiener Algebra? The affirmative answer of this question
is a famous theorem of Wiener, which will be given later in Theorem 7.22. ⋄
The map α 7→ (x − α)−1 is called the resolvent of x ∈ X. If α0 ∈ ρ(x)
we can choose x → x − α0 and y → α − α0 in (5.9) which implies
(x − α)^{−1} = Σ_{n=0}^{∞} (α − α₀)^n (x − α₀)^{−n−1}, |α − α₀| < ∥(x − α₀)^{−1}∥^{−1}. (5.13)
Proof. Equation (5.13) already shows that ρ(x) is open. Hence σ(x) is
closed. Moreover, x − α = −α(e − α1 x) together with Lemma 5.1 shows
(x − α)^{−1} = −(1/α) Σ_{n=0}^{∞} (x/α)^n, |α| > ∥x∥,
which implies σ(x) ⊆ {α| |α| ≤ ∥x∥} is bounded and thus compact. More-
over, taking norms shows
∥(x − α)^{−1}∥ ≤ (1/|α|) Σ_{n=0}^{∞} ∥x∥^n/|α|^n = 1/(|α| − ∥x∥), |α| > ∥x∥,
which implies (x − α)^{−1} → 0 as α → ∞. In particular, if σ(x) is empty,
then ℓ((x − α)^{−1}) is an entire analytic function which vanishes at infinity.
By Liouville’s theorem we must have ℓ((x − α)^{−1}) = 0 for all ℓ ∈ X^∗ in this
case, and so (x − α)^{−1} = 0, which is impossible. □
Example 5.13. The spectrum of the n × n companion matrix A, which has
ones on the superdiagonal, last row (−c₀, −c₁, …, −c_{n−1}), and zeros every-
where else,
is given by the zeros of the polynomial (show this)
det(zI − A) = z^n + c_{n−1}z^{n−1} + ⋯ + c₁z + c₀.
Hence the fact that σ(A) is nonempty implies the fundamental theorem
of algebra, that every non-constant polynomial has at least one zero. ⋄
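This can be checked numerically for a sample cubic (the polynomial below is a hypothetical choice):

```python
import numpy as np

# Companion matrix of z^3 + c_2 z^2 + c_1 z + c_0 for the sample polynomial
# (z - 1)(z - 2)(z - 3) = z^3 - 6 z^2 + 11 z - 6, i.e. (c_0, c_1, c_2) = (-6, 11, -6).
c = np.array([-6.0, 11.0, -6.0])
n = len(c)
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)               # ones on the superdiagonal
A[-1, :] = -c                            # last row: -c_0, -c_1, ..., -c_{n-1}
eigenvalues = np.sort(np.linalg.eigvals(A).real)
```

The computed spectrum {1, 2, 3} is exactly the zero set of the polynomial, as the example asserts.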
As another simple consequence we obtain:
Theorem 5.4 (Gelfand–Mazur3). Suppose X is a Banach algebra in which
every element except 0 is invertible. Then X is isomorphic to C.
The second key ingredient for the proof of the spectral theorem is the
spectral radius
r(x) := sup_{α∈σ(x)} |α| (5.18)
of x. Note that by (5.15) we have
r(x) ≤ ∥x∥. (5.19)
As our next theorem shows, it is related to the radius of convergence of the
Neumann series for the resolvent
(x − α)^{−1} = −(1/α) Σ_{n=0}^{∞} (x/α)^n (5.20)
encountered in the proof of Theorem 5.3 (which is just the Laurent expansion
around infinity).
Theorem 5.6 (Beurling–Gelfand). The spectral radius satisfies
r(x) = inf_{n∈N} ∥x^n∥^{1/n} = lim_{n→∞} ∥x^n∥^{1/n}. (5.21)
Then ℓ((x − α)−1 ) is analytic in |α| > r(x) and hence (5.22) converges
absolutely for |α| > r(x) by Cauchy’s integral formula for derivatives. Hence
for fixed α with |α| > r(x), ℓ(xn /αn ) converges to zero for every ℓ ∈ X ∗ .
Since every weakly convergent sequence is bounded we have
∥x^n∥/|α|^n ≤ C(α)
and thus
lim sup_{n→∞} ∥x^n∥^{1/n} ≤ lim sup_{n→∞} C(α)^{1/n}|α| = |α|.
Since this holds for every |α| > r(x) we have
r(x) ≤ inf_{n∈N} ∥x^n∥^{1/n} ≤ lim inf_{n→∞} ∥x^n∥^{1/n} ≤ lim sup_{n→∞} ∥x^n∥^{1/n} ≤ r(x),
which finishes the proof. □
Note that it might be tempting to conjecture that the sequence ∥x^n∥^{1/n}
is monotone, however this is false in general – see Problem 5.11. By the ratio
test, the Neumann series (5.20) converges for |α| > r(x).
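Gelfand's formula can be watched at work numerically; the non-normal 2 × 2 matrix below is a hypothetical choice whose spectral radius is far below its norm:

```python
import numpy as np

# Gelfand's formula r(x) = lim ||x^n||^{1/n} for a sample non-normal matrix:
# the eigenvalues are 0.5 and 0.25, so r(x) = 0.5, while ||x|| is of order 100.
x = np.array([[0.5, 100.0],
              [0.0,   0.25]])

def root_norm(n):
    # operator (spectral) norm of x^n, taken to the power 1/n
    return np.linalg.norm(np.linalg.matrix_power(x, n), 2) ** (1.0 / n)

vals = [root_norm(n) for n in (1, 10, 60)]
```

The first value is essentially ∥x∥ ≈ 100, while the sequence drifts down toward r(x) = 1/2, consistent with (5.21) and with ∥x^n∥^{1/n} ≥ r(x) for every n.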
Next let us look at some examples illustrating these ideas.
Example 5.14. In X := C(I) we have σ(x) = x(I) and hence r(x) = ∥x∥∞
for all x. ⋄
Example 5.15. Let X := L(C²) and x := [ 0 1; 0 0 ], so that x² = 0 and
consequently r(x) = 0. This is not surprising, since x has the only eigenvalue
0. In particular, the spectral radius can be strictly smaller than the norm
(note that ∥x∥ = 1 in our example). The same is true for any nilpotent
matrix. In general, x will be called nilpotent if xn = 0 for some n ∈ N
and any nilpotent element will satisfy r(x) = 0. Note that in this case the
Neumann series terminates after n terms,
(x − α)^{−1} = −(1/α) Σ_{j=0}^{n−1} (x/α)^j, α ≠ 0,
that is, ∥K^n∥ ≤ ∥k∥^n_∞/n!, which shows
r(K) ≤ lim_{n→∞} ∥k∥_∞/(n!)^{1/n} = 0.
Hence r(K) = 0 and for every λ ∈ C and every y ∈ C[0, 1] the equation
x − λK x = y (5.25)
has a unique solution given by
x = (I − λK)^{−1}y = Σ_{n=0}^{∞} λ^n K^n y. (5.26)
Note that σ(K) = {0} but 0 is in general not an eigenvalue (consider
e.g. k(t, s) = 1). Elements of a Banach algebra with r(x) = 0 are called
quasinilpotent. Since the Neumann series (5.20) converges for |α| > 0 in
this case, the resolvent has an essential singularity at 0 if x is quasinilpotent
(but not nilpotent). ⋄
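A discretization of the Volterra operator illustrates quasinilpotence numerically; the grid size and the left Riemann sum below are ad hoc choices for the sketch:

```python
import numpy as np

# Discretized Volterra operator (K f)(t) = int_0^t f(s) ds on [0, 1], as a
# Riemann sum on N points; ||K^n||^{1/n} decays to 0 although ||K|| does not.
N = 400
K = np.tril(np.ones((N, N))) / N        # lower-triangular Riemann-sum matrix

def root_norm(n):
    return np.linalg.norm(np.linalg.matrix_power(K, n), 2) ** (1.0 / n)

vals = [root_norm(n) for n in (1, 10, 40)]
```

The first value is close to the norm of the Volterra operator (≈ 2/π), while the later values collapse toward 0, mirroring the factorial decay ∥K^n∥ ≲ 1/n! of the continuum operator.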
In the last two examples we have seen a strict inequality in (5.19). If we
regard r(x) as a spectral norm for x, then the spectral norm does not control
the algebraic norm in such a situation. On the other hand, if we had equal-
ity for some x, and moreover, this were also true for any polynomial p(x),
then spectral mapping would imply that the spectral norm supα∈σ(x) |p(α)|
equals the algebraic norm ∥p(x)∥ and convergence on one side would imply
convergence on the other side. So by taking limits we could get an isometric
identification of elements of the form f (x) with functions f ∈ C(σ(x)). But
this is nothing but the content of the spectral theorem and self-adjointness
will be the property which will make all this work.
We end this section with the remark that neither the spectrum nor the
spectral radius is continuous. All one can say is
Lemma 5.7. Let xn ∈ X be a convergent sequence and x := limn→∞ xn .
Then whenever α ∈ ρ(x) we have α ∈ ρ(xn ) eventually and
(xn − α)−1 → (x − α)−1 . (5.27)
Moreover,
lim_{n→∞} σ(x_n) ⊆ σ(x), (5.28)
where lim_{n→∞} σ(x_n) := {α ∈ C | ∃α_n ∈ σ(x_n) → α}, and
r(x) ≥ lim sup_{n→∞} r(x_n). (5.29)
Proof. The first claim is immediate since taking the inverse is continuous
by Corollary 5.2. Furthermore, Corollary 5.2 also shows that for α ∈ ρ(x)
and ∥x − x_n∥ + |α − α_n| < ∥(x − α)^{−1}∥^{−1} we have α_n ∈ ρ(x_n), which implies
the second claim.
Concerning the last claim, observe that r(x_k) ≤ ∥x_k^n∥^{1/n} implies that
lim sup_{k→∞} r(x_k) ≤ ∥x^n∥^{1/n}. □
Example 5.17. That the spectrum can expand is shown by the following
example due to Kakutani4. We consider the bounded linear operators on
ℓ2 (N) and look at shift-type operators of the form
(Aa)j := qj aj+1 ,
where q ∈ ℓ∞ (N). Then we have ∥A∥ = supj∈N |qj | and
(An a)j = (qj qj+1 · · · qj+n−1 )aj+n
with ∥An ∥ = supj∈N |qj qj+1 · · · qj+n−1 |.
Now note that every integer can be written as j = 2^k(2l + 1) and write
k(j) := k in this case. Choose
q_j := e^{−k(j)}.
To compute the above products we group the integers into blocks 2^{m−1}, …, 2^m − 1
of 2^{m−1} elements and observe that K_m := Σ_{j=2^{m−1}}^{2^m−1} k(j) = 2^{m−1} − 1.
Indeed, note that since odd numbers do not contribute to this sum, we can
drop them and divide the remaining even ones by 2 to get the previous block.
This shows K_m = 2^{m−2} + K_{m−1} and establishes the claim. Summing over
all blocks we have Σ_{m=1}^{n} K_m = 2^n − n − 1 implying
∥A^{2^n−1}∥^{1/2^n} = (q₁ q₂ ⋯ q_{2^n−1})^{1/2^n} = exp(−1 + (n + 1)2^{−n}).
Taking the limit n → ∞ shows r(A) = 1/e.
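The combinatorial bookkeeping (the values k(j), the block sums K_m, and the identity K_1 + ⋯ + K_n = 2^n − n − 1) can be verified directly:

```python
# Bookkeeping behind the example: j = 2^k (2l + 1) defines k(j); the block
# sums K_m over 2^{m-1} <= j < 2^m equal 2^{m-1} - 1, and they add up to
# K_1 + ... + K_n = 2^n - n - 1.
def k(j):
    count = 0
    while j % 2 == 0:       # divide out factors of 2
        j //= 2
        count += 1
    return count

def K(m):
    return sum(k(j) for j in range(2 ** (m - 1), 2 ** m))

block_sums = [K(m) for m in range(1, 11)]
```

Both identities hold exactly for every block, which is what makes the product q₁ ⋯ q_{2^n−1} computable in closed form.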
Next define
(A_k a)_j := 0 for k(j) = k and (A_k a)_j := q_j a_{j+1} else,
and observe that A_k is nilpotent since A_k^{2^{k+1}} = 0. Indeed note that (A_k a)_j =
0 for j = 2^k, 2^k·3, 2^k·5, …, which are a distance 2^{k+1} apart. Hence apply-
ing A_k once more the result will vanish at the previous points as well, etc.
Moreover,
((A_k − A)a)_j = q_j a_{j+1} for k(j) = k and 0 else,
implying ∥A_k − A∥ = e^{−k}. Hence we have A_k → A with r(A_k) = 0 → 0 <
e^{−1} = r(A) and σ(A_k) = {0} → {0} ⊊ σ(A). ⋄
Problem* 5.1. Show that the multiplication in a Banach algebra X is con-
tinuous: xn → x and yn → y imply xn yn → xy.
⁴ Shizuo Kakutani (1911–2004), Japanese-American mathematician
5.1. Banach algebras 153
Problem* 5.2. Suppose that X satisfies all requirements for a Banach al-
gebra except that (5.3) is replaced by
∥xy∥ ≤ C∥x∥∥y∥, C > 0.
Of course one can rescale the norm to reduce it to the case C = 1. However,
this might have undesirable side effects in case there is a unit. Show that if
X has a unit e, then ∥e∥ ≥ C −1 and there is an equivalent norm ∥.∥0 which
satisfies (5.3) and ∥e∥0 = 1.
Finally, note that for this construction to work it suffices to assume that
multiplication is separately continuous by Problem 4.8.
(Hint: Identify x ∈ X with the operator Lx : X → X, y 7→ xy in L (X).
For the last part use the uniform boundedness principle.)
Problem* 5.3 (Unitization). Show that if X is a Banach algebra then
C ⊕ X is a unital Banach algebra, where we set ∥(α, x)∥ = |α| + ∥x∥ and
(α, x)(β, y) = (αβ, αy + βx + xy).
Problem 5.4. Show σ(x−1 ) = σ(x)−1 if x is invertible.
Problem 5.5. An element x ∈ X satisfying x2 = x is called a projection.
Compute the spectrum of a projection.
Problem 5.6. If X := L (Lp (I)), then every x ∈ C(I) gives rise to a
multiplication operator Mx ∈ X defined as Mx f := x f . Show r(Mx ) =
∥Mx ∥ = ∥x∥∞ and σ(Mx ) = Ran(x).
Problem 5.7. If X := L (ℓp (N)), then every m ∈ ℓ∞ (N) gives rise to a
multiplication operator M ∈ X defined as (M a)n := mn an . Show r(M ) =
∥M∥ = ∥m∥_∞ and that σ(M) is the closure of Ran(m).
Problem 5.8. Can every compact set K ⊂ C arise as the spectrum of an
element of some Banach algebra?
Problem* 5.9. Suppose x has both a right inverse y (i.e., xy = e) and a
left inverse z (i.e., zx = e). Show that y = z = x−1 .
Problem* 5.10. Suppose xy and yx are both invertible, then so are x and
y:
y −1 = (xy)−1 x = x(yx)−1 , x−1 = (yx)−1 y = y(xy)−1 .
(Hint: Previous problem.)
Problem* 5.11. Let X := L(C²) and compute ∥x^n∥^{1/n} for x := [ 0 α; β 0 ].
Example 5.18. The continuous functions C(I) together with complex con-
jugation form a commutative C ∗ algebra. ⋄
Example 5.19. The Banach algebra L (H) is a C ∗ algebra by Lemma 2.15.
The compact operators K (H) are a ∗-subalgebra. ⋄
Example 5.20. The bounded sequences ℓ∞ (N) together with complex con-
jugation form a commutative C ∗ algebra. The set c0 (N) of sequences con-
verging to 0 are a ∗-subalgebra. ⋄
If X has an identity e, we clearly have e∗ = e, ∥e∥ = 1, (x−1 )∗ = (x∗ )−1
(show this), and
σ(x∗ ) = σ(x)∗ . (5.33)
We will always assume that we have an identity and we note that it is always
possible to add an identity (Problem 5.17).
If X is a C ∗ algebra, then x ∈ X is called normal if x∗ x = xx∗ , self-
adjoint if x∗ = x, and unitary if x∗ = x−1 . Moreover, x is called positive if
x = y 2 for some y = y ∗ ∈ X. Clearly both self-adjoint and unitary elements
are normal and positive elements are self-adjoint. If x is normal, then so is
any polynomial p(x) (it will be self-adjoint if x is and p is real-valued).
As already pointed out in the previous section, it is crucial to identify
elements for which the spectral radius equals the norm. The key ingredient
will be (5.31) which implies ∥x²∥ = ∥x∥² if x is self-adjoint. For unitary
elements we have ∥x∥ = ∥x^∗x∥^{1/2} = ∥e∥^{1/2} = 1.
The next result generalizes the fact that self-adjoint operators have only
real eigenvalues.
Lemma 5.9. If x is self-adjoint, then σ(x) ⊆ R. If x is positive, then
σ(x) ⊆ [0, ∞).
Proof. First of all, Φ is well defined for polynomials p and given by Φ(p) =
p(x). Moreover, since p(x) is normal, spectral mapping implies
∥p(x)∥ = r(p(x)) = sup_{α∈σ(p(x))} |α| = sup_{α∈σ(x)} |p(α)| = ∥p∥_∞
for every polynomial p. Hence Φ is isometric. Next we use that the poly-
nomials are dense in C(σ(x)). In fact, to see this one can either consider
a compact interval I containing σ(x) and use the Tietze extension theorem
(Theorem B.30) to extend f to I and then approximate the extension
using polynomials (Theorem 1.3) or use the Stone–Weierstraß theorem (The-
orem B.42). Thus Φ uniquely extends to a map on all of C(σ(x)) by Theo-
rem 1.16. By continuity of the norm this extension is again isometric. Sim-
ilarly, we have Φ(f g) = Φ(f )Φ(g) and Φ(f )∗ = Φ(f ∗ ) since both relations
hold for polynomials.
To show σ(f(x)) = f(σ(x)) fix some α ∈ C. If α ∉ f(σ(x)), then
g(t) = 1/(f(t) − α) ∈ C(σ(x)) and Φ(g) = (f(x) − α)^{−1} ∈ X shows α ∉ σ(f(x)).
Conversely, if α ∉ σ(f(x)) then g = Φ^{−1}((f(x) − α)^{−1}) = 1/(f − α) is continuous,
which shows α ∉ f(σ(x)). □
We have
µ_{u,v₁+v₂} = µ_{u,v₁} + µ_{u,v₂}, µ_{u,αv} = αµ_{u,v}, µ_{v,u} = µ_{u,v}^∗ (5.38)
and |µu,v |(σ(A)) ≤ ∥u∥∥v∥. Furthermore, µu := µu,u is a positive Borel
measure with µu (σ(A)) = ∥u∥2 .
Proof. Consider the continuous functions on I = [−∥A∥, ∥A∥] and note that
every f ∈ C(I) gives rise to some f ∈ C(σ(A)) by restricting its domain.
Clearly ℓu,v (f ) := ⟨u, f (A)v⟩ is a bounded linear functional and the existence
of a corresponding measure µu,v with |µu,v |(I) = ∥ℓu,v ∥ ≤ ∥u∥∥v∥ follows
from the Riesz representation theorem (Theorem 6.5 from [37]). Since ℓu,v (f )
depends only on the value of f on σ(A) ⊆ I, µu,v is supported on σ(A).
Moreover, if f ≥ 0 we have ℓu (f ) := ⟨u, f (A)u⟩ = ⟨f (A)1/2 u, f (A)1/2 u⟩ =
∥f (A)1/2 u∥2 ≥ 0 and hence ℓu is positive and the corresponding measure µu
is positive. The rest follows from the properties of the scalar product. □
f(A) = ∫ f(t) dP_A(t) = Σ_{j=1}^{m} f(α_j) P_A({α_j}).
Advanced Functional Analysis
Chapter 6
More on convexity
a topological vector space with the usual topology generated by open balls.
As in the case of normed linear spaces, X ∗ will denote the vector space of
all continuous linear functionals on X.
Lemma 6.1. Let X be a vector space and U a convex subset containing 0.
Then
pU (x + y) ≤ pU (x) + pU (y), pU (λ x) = λpU (x), λ ≥ 0. (6.2)
Moreover, {x|pU (x) < 1} ⊆ U ⊆ {x|pU (x) ≤ 1}. If, in addition, X is a
topological vector space and U is open, then U = {x|pU (x) < 1}.
are called locally convex spaces and they will be discussed further in
Section 6.4. For now we just remark that every normed vector space is
locally convex since balls are convex.
is a convex open set which is disjoint from V . Hence by the previous theorem
we can find some ℓ such that Re(ℓ(x)) < c ≤ Re(ℓ(y)) for all x ∈ Ũ and
y ∈ V . Moreover, since Re(ℓ(U )) is a compact interval [e, d], the claim
follows. □
Problem 6.4. Show that Corollary 6.4 fails even in R2 unless one set is
compact.
Problem 6.6 (Bipolar theorem). Let X be a locally convex space and sup-
pose M ⊆ X is absolutely convex. Show (M ◦ )◦ = M . (Hint: Use Corol-
lary 6.4 to show that for every y ̸∈ M there is some ℓ ∈ X ∗ with Re(ℓ(x)) ≤
1 < ℓ(y), x ∈ M .)
A line segment is convex and can be generated as the convex hull of its
endpoints. Similarly, a full triangle is convex and can be generated as the
convex hull of its vertices. However, if we look at a ball, then we need its
entire boundary to recover it as the convex hull. So how can we characterize
those points which determine a convex set via the convex hull?
Let K be a set and M ⊆ K a nonempty subset. Then M is called
an extremal subset of K if no point of M can be written as a convex
combination of two points unless both are in M : For given x, y ∈ K and
λ ∈ (0, 1) we have that
λx + (1 − λ)y ∈ M ⇒ x, y ∈ M. (6.9)
If M = {x} is extremal, then x is called an extremal point of K. Hence
an extremal point cannot be written as a convex combination of two other
points from K.
Note that we did not require K to be convex. If K is convex and M is
extremal, then K \ M is convex. Conversely, if K and K \ {x} are convex,
then x is an extremal point. Note that the nonempty intersection of extremal
sets is extremal. Moreover, if L ⊆ M is extremal and M ⊆ K is extremal,
then L ⊆ K is extremal as well (Problem 6.8).
Example 6.3. Consider R2 with the norms ∥.∥p . Then the extremal points
of the closed unit ball (cf. Figure 1.1) are the boundary points for 1 < p < ∞
and the vertices for p = 1, ∞. In any case the boundary is an extremal set.
Slightly more general, in a strictly convex space, (ii) of Problem 1.16 says
that the extremal points of the unit ball are precisely its boundary points. ⋄
Example 6.4. Consider R3 and let C := {(x1 , x2 , 0) ∈ R3 |x21 + x22 = 1}.
Take two more points x± = (0, 0, ±1) and consider the convex hull K of
M := C ∪ {x+ , x− }. Then M is extremal in K and, moreover, every point
from M is an extremal point. However, if we change the two extra points to
be x± = (1, 0, ±1), then the point (1, 0, 0) is no longer extremal. Hence the
extremal points are now M \ {(1, 0, 0)}. Note in particular that the set of
extremal points is not closed in this case. ⋄
Extremal sets arise naturally when minimizing linear functionals.
Lemma 6.7. Suppose K ⊆ X and ℓ ∈ X ∗ . If
K_ℓ := {x ∈ K | Re(ℓ(x)) = inf_{y∈K} Re(ℓ(y))}
6.2. Convex sets and the Krein–Milman theorem 171
Proof. We want to apply Zorn’s lemma. To this end consider the family
M = {M ⊆ K|compact and extremal in K}
with the partial order given by reversed inclusion. Since K ∈ M this family
is nonempty. Moreover, given a linear chain C ⊂ M we consider M := ⋂C.
Then M ⊆ K is nonempty by the finite intersection property and since it
is closed also compact. Moreover, as the nonempty intersection of extremal
sets it is also extremal. Hence M ∈ M and thus M has a maximal element.
Denote this maximal element by M .
We will show that M contains precisely one point (which is then extremal
by construction). Indeed, suppose x, y ∈ M . If x ̸= y then {x}, {y} are two
disjoint and compact sets to which we can apply Corollary 6.4 to obtain a
linear functional ℓ ∈ X ∗ with Re(ℓ(x)) ̸= Re(ℓ(y)). Then by Lemma 6.7
Mℓ ⊂ M is extremal in M and hence also in K. But by Re(ℓ(x)) ̸= Re(ℓ(y))
it cannot contain both x and y contradicting maximality of M . □
Finally, we want to recover a convex set as the convex hull of its ex-
tremal points. In our infinite dimensional setting an additional closure will
be necessary in general.
Since the intersection of arbitrary closed convex sets is again closed and
convex we can define the closed convex hull of a set U as the smallest closed
convex set containing U , that is, the intersection of all closed convex sets
containing U . Since the closure of a convex set is again convex (Problem 6.11)
the closed convex hull is simply the closure of the convex hull.
Theorem 6.9 (Krein–Milman). Let X be a locally convex Hausdorff space.
Suppose K ⊆ X is convex and compact. Then it is the closed convex hull of
its extremal points.
Now consider Kℓ from Lemma 6.7 which is nonempty and hence contains an
extremal point y ∈ E. But y ̸∈ M , a contradiction. □
While in the finite dimensional case the closure is not necessary (Prob-
lem 6.13), it is important in general as the following example shows.
Example 6.8. Consider the closed unit ball in ℓ1 (N). Then the extremal
points are {eiθ δ n |n ∈ N, θ ∈ R}. Indeed, suppose ∥a∥1 = 1 with λ :=
|aj | ∈ (0, 1) for some j ∈ N. Then a = λb + (1 − λ)c where b := λ−1 aj δ j
and c := (1 − λ)−1 (a − aj δ j ). Hence the only possible extremal points
are of the form eiθ δ n . Moreover, if eiθ δ n = λb + (1 − λ)c we must have
1 = |λbn +(1−λ)cn | ≤ λ|bn |+(1−λ)|cn | ≤ 1 and hence an = bn = cn by strict
convexity of the absolute value. Thus the convex hull of the extremal points
are the sequences from the unit ball which have finitely many terms nonzero.
While the closed unit ball is not compact in the norm topology it will be in
the weak-∗ topology by the Banach–Alaoglu theorem (Theorem 6.10). To
this end note that ℓ¹(N) ≅ c₀(N)^∗. ⋄
Also note that in the infinite dimensional case the extremal points can
be dense.
Example 6.9. Let X = C([0, 1], R) and consider the convex set K = {f ∈
C 1 ([0, 1], R)|f (0) = 0, ∥f ′ ∥∞ ≤ 1}. Note that the functions f± (x) = ±x are
extremal. For example, assume
x = λf (x) + (1 − λ)g(x)
then
1 = λf ′ (x) + (1 − λ)g ′ (x)
which implies f ′ (x) = g ′ (x) = 1 and hence f (x) = g(x) = x.
To see that there are no other extremal functions, suppose |f′(x)| ≤ 1 − ε
on some interval I. Choose a nontrivial continuous function g which is 0
outside I, has integral 0 over I, and ∥g∥_∞ ≤ ε. Let G(x) := ∫₀^x g(t)dt. Then
f = ½(f + G) + ½(f − G) and hence f is not extremal. Thus f± are the
only extremal points and their (closed) convex hull is given by fλ (x) = λx
for λ ∈ [−1, 1].
Of course the problem is that K is not closed. Hence we consider the
Lipschitz continuous functions K̄ := {f ∈ C 0,1 ([0, 1], R)|f (0) = 0, [f ]1 ≤ 1}
(this is in fact the closure of K, but this is a bit tricky to see and we won’t
need this here). By the Arzelà–Ascoli theorem (Theorem 1.13) K̄ is relatively
compact and since the Lipschitz estimate clearly is preserved under uniform
limits it is even compact.
Now note that piecewise linear functions with f ′ (x) ∈ {±1} away from
the kinks are extremal in K̄. Moreover, these functions are dense: Split
[0, 1] into n pieces of equal length using x_j = j/n. Set f_n(x₀) = 0 and
f_n(x) = f_n(x_j) ± (x − x_j) for x ∈ [x_j, x_{j+1}], where the sign is chosen such
that |f(x_{j+1}) − f_n(x_{j+1})| gets minimal. Then ∥f − f_n∥_∞ ≤ 1/n. ⋄
Please note that Example 4.40 shows that B̄1∗ (0) ⊂ X ∗ might not be
sequentially compact in the weak-∗ topology unless X is separable.
If X is a reflexive space and we apply this to X ∗ , we get that the closed
unit ball is compact in the weak topology. In fact, the converse is also true.
Theorem 6.11 (Kakutani). A Banach space X is reflexive if and only if
the closed unit ball B̄1 (0) is weakly compact.
Proof. Suppose X is not reflexive and choose x′′ ∈ B̄1∗∗ (0) \ J(B̄1 (0)) with
∥x′′ ∥ = 1. Then, if B̄1 (0) is weakly compact, J(B̄1 (0)) is weak-∗ compact
(note that J is a homeomorphism if we equip X with the weak and X ∗∗ with
the weak-* topology). Moreover, if we equip X ∗∗ with the weak-* topology,
then its dual is isomorphic to X ∗ (Problem 6.19) and by Corollary 6.4 we
can find some ℓ ∈ X ∗ with ∥ℓ∥ = 1 and
Re(x′′(ℓ)) < inf_{y′′∈J(B̄₁(0))} Re(y′′(ℓ)) = inf_{y∈B̄₁(0)} Re(ℓ(y)) = −1.
Note that in this context Theorem 4.32 says that in a reflexive Banach
space X the closed unit ball B̄1 (0) is weakly sequentially compact. The
converse is also true and known as the Eberlein theorem.
Since the weak topology is weaker than the norm topology, every weakly
closed set is also (norm) closed. Moreover, the weak closure of a set will in
general be larger than the norm closure. However, for convex sets both will
coincide. In fact, we have the following characterization in terms of closed
(affine) half-spaces, that is, sets of the form {x ∈ X|Re(ℓ(x)) ≤ α} for
some ℓ ∈ X ∗ and some α ∈ R.
Theorem 6.12 (Mazur). The weak as well as the norm closure of a convex
set K is the intersection of all half-spaces containing K. In particular, for a
convex set K ⊆ X the following are equivalent:
• K is weakly closed,
6.3. Weak topologies 177
Problem 4.24 the ℓ_j would otherwise be a basis) such that the affine line
x + tx₀ is in this neighborhood and hence also avoids the weak closure of S.
But this is impossible since by the intermediate value theorem there is some
t₀ > 0 such that ∥x + t₀x₀∥ = 1. Hence B̄₁(0) is contained in the weak
closure of S. ⋄
Note that this example also shows that in an infinite dimensional space
the weak and norm topologies are always different! In a finite dimensional
space both topologies of course agree.
Example 6.11. Let H be a Hilbert space and {φ_j} some infinite ONS.
Then we already know φ_j ⇀ 0. Moreover, the convex combination ψ_j :=
(1/j) Σ_{k=1}^{j} φ_k → 0 since ∥ψ_j∥ = j^{−1/2}. ⋄
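In a truncated model the norms of these Cesàro means can be computed exactly; the rows of the identity matrix stand in for the ONS:

```python
import numpy as np

# Truncated model of an ONS: the rows of the identity matrix.  The Cesaro
# means psi_j = (phi_1 + ... + phi_j)/j have norm j^{-1/2}, which tends to 0.
N = 100
phi = np.eye(N)                      # rows play the role of the ONS {phi_j}
norms = [np.linalg.norm(phi[:j].mean(axis=0)) for j in (1, 4, 25)]
```

Orthogonality gives ∥ψ_j∥² = j · (1/j)² = 1/j, so the convex combinations converge in norm even though the φ_j themselves only converge weakly.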
For the last result note that since X ∗∗ is the dual of X ∗ it has a cor-
responding weak-∗ topology and by the Banach–Alaoglu theorem B̄1∗∗ (0) is
weak-∗ compact and hence weak-∗ closed.
Theorem 6.14 (Goldstine4). The image of the closed unit ball B̄1 (0) under
the canonical embedding J into the closed unit ball B̄1∗∗ (0) is weak-∗ dense.
Proof. Let j ∈ B̄1∗∗ (0) be given. Since sets of the form {ȷ̃ ∈ X ∗∗ | |j(ℓk ) −
ȷ̃(ℓk )| < ε, 1 ≤ k ≤ n} provide a neighborhood base (where we can assume
the ℓk ∈ X ∗ to be linearly independent without loss of generality), it suffices
to find some x ∈ B̄1+ε (0) with ℓk (x) = j(ℓk ) for 1 ≤ k ≤ n since then
(1 + ε)−1 J(x) will be in the above neighborhood. Without the requirement
∥x∥ ≤ 1 + ε this follows from surjectivity of the map F : X → Cn , x 7→
(ℓ₁(x), …, ℓ_n(x)). Moreover, given one such x the same is true for every
element from x + Y, where Y := ⋂_k Ker(ℓ_k). But if (x + Y) ∩ B̄_{1+ε}(0) were
empty, we would have dist(x, Y ) ≥ 1 + ε and by Corollary 4.15 we could find
some normalized ℓ ∈ X ∗ which vanishes on Y and satisfies ℓ(x) ≥ 1 + ε. By
Problem 4.24 we have ℓ ∈ span(ℓ1 , . . . , ℓn ) implying
1 + ε ≤ ℓ(x) = j(ℓ) ≤ ∥j∥∥ℓ∥ ≤ 1
a contradiction. □
Note that if B̄1 (0) ⊂ X is weakly compact, then J(B̄1 (0)) is compact
(and thus closed) in the weak-∗ topology on X ∗∗ . Hence Goldstine’s theorem
implies J(B̄1 (0)) = B̄1∗∗ (0) and we get an alternative proof of Kakutani’s
theorem.
Example 6.12. Consider X := c₀(N), X^∗ ≅ ℓ¹(N), and X^{∗∗} ≅ ℓ^∞(N) with
J corresponding to the inclusion c₀(N) ,→ ℓ^∞(N). Then we can consider the
linear functionals ℓj (x) = xj which are total in X ∗ and a sequence in X ∗∗
will be weak-∗ convergent if and only if it is bounded and converges when
composed with any of the ℓj (in other words, when the sequence converges
componentwise — cf. Problem 4.42). So for example, cutting off a sequence
in B̄1∗∗ (0) after n terms (setting the remaining terms equal to 0) we get a
sequence from B̄1 (0) ,→ B̄1∗∗ (0) which is weak-∗ convergent (but of course
not norm convergent).
Also observe that c0 (N) ⊆ ℓ∞ (N) is closed but not weak-∗ closed and
hence Mazur’s theorem does not hold if we replace the weak by the weak-∗
topology. ⋄
Problem 6.14. Show that in an infinite dimensional space, a weakly open
neighborhood of 0 contains a nontrivial subspace. Show the analogue state-
ment for weak-∗ open neighborhoods of 0.
Problem 6.15. Show that a weakly sequentially compact set is bounded.
Similarly, show that a weakly compact set is bounded.
inverse triangle inequality |q(y) − q(x)| ≤ q(y − x) ≤ Σ_{j=1}^{n} c_j q_{α_j}(x − y) < ε
for y ∈ U(x). □
Example 6.17. The weak topology on an infinite dimensional space cannot
be generated by a norm. Indeed, let q be a continuous seminorm and q_{α_j} =
|ℓ_{α_j}| as in the lemma. Then ⋂_{j=1}^{n} Ker(ℓ_{α_j}) has codimension at most n and
hence contains some x ≠ 0, implying that q(x) ≤ Σ_{j=1}^{n} c_j q_{α_j}(x) = 0. Thus
q is no norm. Similarly, the other examples cannot be generated by a norm
except in finite dimensional cases. ⋄
Moreover, note that the topology is translation invariant in the sense that
U (x) is a neighborhood of x if and only if U (x) − x = {y − x|y ∈ U (x)} is
a neighborhood of 0. Hence we can restrict our attention to neighborhoods
of 0 (this is of course true for any topological vector space). Hence if X
and Y are topological vector spaces, then a linear map A : X → Y will be
continuous if and only if it is continuous at 0. Moreover, if Y is a locally
convex space with respect to some seminorms pβ , then A will be continuous
if and only if pβ ◦ A is continuous for every β (Lemma B.11). Finally, since
pβ ◦ A is a seminorm, the previous lemma implies:
Corollary 6.16. Let (X, {q_α}) and (Y, {p_β}) be locally convex vector spaces.
Then a linear map A : X → Y is continuous if and only if for every β
there are some seminorms q_{α_j} and constants c_j > 0, 1 ≤ j ≤ n, such that
p_β(Ax) ≤ Σ_{j=1}^{n} c_j q_{α_j}(x).
It will shorten notation when sums of the type Σ_{j=1}^{n} c_j q_{α_j}(x), which
appeared in the last two results, can be replaced by a single expression c q_α.
This can be done if the family of seminorms {q_α}_{α∈A} is directed, that is,
for given α, β ∈ A there is a γ ∈ A such that q_α(x) + q_β(x) ≤ Cq_γ(x)
for some C > 0. Moreover, if F(A) is the set of all finite subsets of A,
then {q̃_F = Σ_{α∈F} q_α}_{F∈F(A)} is a directed family which generates the same
topology (since every q̃_F is continuous with respect to the original family we
do not get any new open sets).
While the family of seminorms is in most cases more convenient to work
with, it is important to observe that different families can give rise to the
same topology and it is only the topology which matters for us. In fact, it
is possible to characterize locally convex vector spaces as topological vector
spaces which have a neighborhood basis at 0 of absolutely convex sets. Here a
set U is called absolutely convex, if for |α|+|β| ≤ 1 we have αU +βU ⊆ U .
Example 6.18. The absolutely convex sets in C are precisely the (open and
closed) balls. ⋄
Since the sets qα−1 ([0, ε)) are absolutely convex we always have such a ba-
sis in our case. To see the converse note that such a neighborhood U of 0 is
182 6. More on convexity
also absorbing (Problem 6.21) and hence the corresponding Minkowski functional (6.1) is a seminorm (Problem 6.26). By construction, these seminorms generate the topology since if U0 = ⋂_{j=1}^n q_{αj}^{-1}([0, εj)) ⊆ U we have for the corresponding Minkowski functionals pU(x) ≤ pU0(x) ≤ ε^{-1} ∑_{j=1}^n q_{αj}(x), where ε = min_j εj. With a little more work (Problem 6.25), one can even show that it suffices to have a neighborhood basis at 0 of convex open sets.
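Minkowski functionals are easy to experiment with numerically. The sketch below (all names are hypothetical, not from the text) takes the open ellipse U = {(x, y) : x²/4 + y² < 1} as an absolutely convex absorbing set in R² and recovers p_U by bisection; the closed-form seminorm sqrt(x²/4 + y²) is used only to check the output.

```python
import math

# Hypothetical absolutely convex, absorbing neighborhood of 0 in R^2:
# the open ellipse U = {(x, y) : x^2/4 + y^2 < 1}.
def in_U(x, y):
    return x**2 / 4 + y**2 < 1

def minkowski(x, y):
    # p_U(v) = inf{ t > 0 : v/t in U }, computed by bisection in t.
    if x == 0 and y == 0:
        return 0.0
    hi = 1.0
    while not in_U(x / hi, y / hi):  # U is absorbing, so this terminates
        hi *= 2
    lo = 0.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid > 0 and in_U(x / mid, y / mid):
            hi = mid
        else:
            lo = mid
    return hi

# p_U agrees with the explicit seminorm sqrt(x^2/4 + y^2) ...
assert abs(minkowski(1.2, -0.7) - math.sqrt(1.2**2 / 4 + 0.7**2)) < 1e-9
# ... is absolutely homogeneous ...
assert abs(minkowski(2.4, -1.4) - 2 * minkowski(1.2, -0.7)) < 1e-8
# ... and satisfies the triangle inequality on a sample pair.
assert minkowski(1.5, 0.2) <= minkowski(1.2, -0.7) + minkowski(0.3, 0.9) + 1e-9
```

The bisection only uses membership in U, matching the definition p_U(v) = inf{t > 0 | v ∈ tU}; no norm formula is assumed.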
Given a topological vector space X we can define its dual space X ∗ as
the set of all continuous linear functionals. However, while it can happen in
general that the dual space is trivial, X ∗ will always be nontrivial for a lo-
cally convex space since the Hahn–Banach theorem can be used to construct
linear functionals (using a continuous seminorm for φ in Theorem 4.12) and
also the geometric Hahn–Banach theorem (Theorem 6.3) holds; see also its
corollaries. In this respect note that for every continuous linear functional ℓ
in a topological vector space, |ℓ|^{-1}([0, ε)) is an absolutely convex open neighborhood of 0 and hence the existence of such sets is necessary for the existence
of nontrivial continuous functionals. As a natural topology on X ∗ we could
use the weak-∗ topology defined to be the weakest topology generated by the
family of all point evaluations qx (ℓ) = |ℓ(x)| for all x ∈ X. Since different
linear functionals must differ at least at one point, the weak-∗ topology is
Hausdorff. Given a continuous linear operator A : X → Y between locally
convex spaces we can define its adjoint A′ : Y ∗ → X ∗ as before,
A brief calculation
is a Fréchet space.
Note that ∂α : C ∞ (Rm ) → C ∞ (Rm ) is continuous. Indeed by Corol-
lary 6.16 it suffices to observe that ∥∂α f ∥j,k ≤ ∥f ∥j,k+|α| . ⋄
Example 6.21. The Schwartz space6
S(Rm) := {f ∈ C∞(Rm) | sup_x |x^α (∂β f)(x)| < ∞, ∀α, β ∈ N0^m}   (6.21)
together with the seminorms
q_{α,β}(f) := ∥x^α (∂β f)(x)∥∞,   α, β ∈ N0^m,   (6.22)
is a Fréchet space. To see completeness note that a Cauchy sequence fn
is in particular a Cauchy sequence in C ∞ (Rm ). Hence there is a limit
f ∈ C ∞ (Rm ) such that all derivatives converge uniformly. Moreover, since
5Maurice René Fréchet (1878–1973), French mathematician
6Laurent Schwartz (1915–2002), French mathematician
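To get a feeling for the seminorms (6.22) in the case m = 1, one can evaluate a few of them for the Gaussian f(x) = e^{-x²} on a fine grid; every value is finite since the Gaussian decays faster than any polynomial grows. This is a rough numerical sketch (the first derivatives are hard-coded; the grid and tolerances are arbitrary choices):

```python
import math

# f(x) = exp(-x^2) and its first two derivatives (hard-coded for the sketch).
derivs = [
    lambda x: math.exp(-x * x),
    lambda x: -2 * x * math.exp(-x * x),
    lambda x: (4 * x * x - 2) * math.exp(-x * x),
]

def q(alpha, beta, xmax=10.0, n=20000):
    # q_{alpha,beta}(f) = sup_x |x^alpha f^(beta)(x)|, approximated on a grid
    h = 2 * xmax / n
    return max(abs((-xmax + i * h) ** alpha * derivs[beta](-xmax + i * h))
               for i in range(n + 1))

# sup_x |x^3 e^{-x^2}| is attained at x^2 = 3/2:
exact = 1.5 ** 1.5 * math.exp(-1.5)
assert abs(q(3, 0) - exact) < 1e-4
# every listed seminorm is finite (the sup is attained well inside the grid)
for a in range(4):
    for b in range(3):
        assert q(a, b) < float("inf")
```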
Proof. In a normed space every open ball is bounded and hence only the
converse direction is nontrivial. So let U be a bounded open set. By shifting
and decreasing U if necessary we can assume U to be an absolutely convex
open neighborhood of 0 and consider the associated Minkowski functional
q = pU . Then since U = {x|q(x) < 1} and supx∈U qα (x) = Cα < ∞ we infer
qα (x) ≤ Cα q(x) (Problem 6.22) and thus the single seminorm q generates
the topology. □
Finally, we mention that, since the Baire category theorem holds for
arbitrary complete metric spaces, the open mapping theorem (Theorem 4.5),
the inverse mapping theorem (Theorem 4.8) and the closed graph theorem
(Theorem 4.9) hold for Fréchet spaces without modifications. In fact, they
are formulated such that it suffices to replace Banach by Fréchet in these
theorems as well as their proofs (concerning the proof of Theorem 4.5 take
into account Problems 6.21 and 6.28).
Problem* 6.21. In a topological vector space every neighborhood U of 0 is
absorbing.
7Andrei Kolmogorov (1903–1987), Soviet mathematician
6.4. Beyond Banach spaces: Locally convex spaces 185
exists and
d(0, ∑_{j=1}^∞ xj) ≤ ∑_{j=1}^∞ d(0, xj)
in this case (compare also Problem 1.6).
Problem 6.29. Let X be a locally convex Hausdorff space. Then for a
nonzero linear functional f the following are equivalent:
(i) f is continuous.
(ii) Ker(f ) is closed.
(iii) Ker(f ) is not dense in X.
(iv) There exists a neighborhood U of zero such that f (U ) is bounded.
Problem 6.30. Instead of (6.17) one frequently uses
X 1 qn (x − y)
˜ y) :=
d(x, .
2n 1 + qn (x − y)
n∈N
Note that by ∥x∥2 ≤ ∥x∥∞ this norm is equivalent to the usual one: ∥x∥∞ ≤
∥x∥ ≤ 2∥x∥∞ . While with the usual norm ∥.∥∞ this space is not strictly
convex, it is with the new one. To see this we use (i) from Problem 1.16.
Then if ∥x + y∥ = ∥x∥ + ∥y∥ we must have both ∥x + y∥∞ = ∥x∥∞ + ∥y∥∞
and ∥x + y∥2 = ∥x∥2 + ∥y∥2 . Hence strict convexity of ∥.∥2 implies strict
convexity of ∥.∥.
Note however, that ∥.∥ is not uniformly convex. In fact, since by the
Milman–Pettis theorem below, every uniformly convex space is reflexive,
there cannot be an equivalent norm on C[0, 1] which is uniformly convex (cf.
Example 4.20). ⋄
Example 6.27. It can be shown that ℓp (N) is uniformly convex for 1 < p <
∞ (see Theorem 3.11 from [37]). ⋄
Equivalently, uniform convexity implies that if the average of two unit
vectors is close to the boundary, then they must be close to each other.
Specifically, if ∥x∥ = ∥y∥ = 1 and ∥(x + y)/2∥ > 1 − δ(ε) then ∥x − y∥ < ε. The
following result (which generalizes Lemma 4.30) uses this observation:
Theorem 6.19 (Radon–F. Riesz8). Let X be a uniformly convex Banach
space and let xn ⇀ x. Then xn → x if and only if lim sup ∥xn ∥ ≤ ∥x∥.
8Johann Radon (1887–1956), Austrian mathematician
Proof. By Lemma 4.29 (ii) we have in fact lim ∥xn∥ = ∥x∥. If x = 0 there is nothing to prove. Hence we can assume xn ≠ 0 for all n and consider yn := xn/∥xn∥. Then yn ⇀ y := x/∥x∥ and it suffices to show yn → y. Next choose a linear functional ℓ ∈ X∗ with ∥ℓ∥ = 1 and ℓ(y) = 1. Then
ℓ((yn + y)/2) ≤ ∥(yn + y)/2∥ ≤ 1
and letting n → ∞ shows ∥(yn + y)/2∥ → 1. Finally uniform convexity shows yn → y. □
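A standard way to see that the norm condition in this theorem cannot be dropped is the sequence xn := x + en in ℓ²: it converges weakly to x while ∥xn∥ → sqrt(∥x∥² + 1) > ∥x∥ and ∥xn − x∥ = 1. A truncated numerical sketch (the vector length and the test functional are arbitrary choices):

```python
import math

N = 2000  # truncation length for the sketch
x = [1.0, 0.5, 0.25] + [0.0] * (N - 3)      # a fixed vector in l^2
y = [1.0 / (k + 1) for k in range(N)]       # a test functional l = <., y>

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

for n in [10, 100, 1000]:
    xn = list(x)
    xn[n] += 1.0                            # x_n = x + e_n
    # l(x_n) - l(x) = y_n -> 0: the weak limit is x
    diff = inner(xn, y) - inner(x, y)
    assert abs(diff - y[n]) < 1e-9 and y[n] < 0.1
    # but the norms converge to sqrt(||x||^2 + 1), not to ||x||
    assert abs(norm(xn) - math.sqrt(norm(x) ** 2 + 1)) < 1e-12
    # and there is no norm convergence
    assert abs(norm([a - b for a, b in zip(xn, x)]) - 1.0) < 1e-12
```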
For the proof of the next result we need the following equivalent condi-
tion.
Lemma 6.20. Let X be a Banach space. Then
δ(ε) = inf{ 1 − ∥(x + y)/2∥ : ∥x∥ ≤ 1, ∥y∥ ≤ 1, ∥x − y∥ ≥ ε }   (6.24)
for 0 ≤ ε ≤ 2.
Proof. It suffices to show that for given x and y which are not both on the
unit sphere there is an equivalent pair on the unit sphere within the real
subspace spanned by these vectors. By scaling we could get a better pair if
both were strictly inside the unit ball and hence we can assume at least one
vector to have norm one, say ∥x∥ = 1. Moreover, consider
u(t) := (cos(t)x + sin(t)y)/∥cos(t)x + sin(t)y∥,   v(t) := u(t) + (y − x).
Then ∥v(0)∥ = ∥y∥ < 1. Moreover, let t0 ∈ (π/2, 3π/4) be the value such that the line from x to u(t0) passes through y. Then we must have ∥v(t0)∥ > 1 and by the intermediate value theorem there is some 0 < t1 < t0 with ∥v(t1)∥ = 1. Let u := u(t1), v := v(t1). The line through u and x is not parallel to the line through 0 and x + y and hence there are α, λ ≥ 0 such that
(α/2)(x + y) = λu + (1 − λ)x.
Moreover, since the line from x to u is above the line from x to y (since t1 < t0) we have α ≥ 1. Rearranging this equation we get
(α/2)(u + v) = (α + λ)u + (1 − α − λ)x.
Now, consider the convex function f(t) := ∥t u + (1 − t)x∥ which satisfies f(0) = f(1) = 1. Then for 0 ≤ λ ≤ 1, α ≥ 1 we have f(λ) ≤ 1 ≤ f(λ + α) and for λ ≥ 1, α ≥ 1 we have f(λ) ≤ f(λ + α). Hence we always have f(λ) ≤ f(λ + α) or equivalently ∥(x + y)/2∥ ≤ ∥(u + v)/2∥ and u, v is as required. □
Proof. Pick some x′′ ∈ X∗∗ with ∥x′′∥ = 1. It suffices to find some x ∈ B̄1(0) with ∥x′′ − J(x)∥ ≤ ε. So fix ε > 0 and δ := δ(ε), where δ(ε) is the modulus of convexity. Then ∥x′′∥ = 1 implies that we can find some ℓ ∈ X∗ with ∥ℓ∥ = 1 and |x′′(ℓ)| > 1 − δ/2. Consider the weak-∗ neighborhood
U := {y′′ ∈ X∗∗ | |(y′′ − x′′)(ℓ)| < δ/2}
of x′′. By Goldstine's theorem (Theorem 6.14) there is some x ∈ B̄1(0) with J(x) ∈ U and this is the x we are looking for. In fact, suppose this were not the case. Then the set V := X∗∗ \ B̄ε∗∗(J(x)) is another weak-∗ neighborhood of x′′ (since B̄ε∗∗(J(x)) is weak-∗ compact by the Banach–Alaoglu theorem) and appealing again to Goldstine's theorem there is some y ∈ B̄1(0) with J(y) ∈ U ∩ V. Since J(x), J(y) ∈ U we obtain
1 − δ/2 < |x′′(ℓ)| ≤ |ℓ((x + y)/2)| + δ/2  ⇒  1 − δ < |ℓ((x + y)/2)| ≤ ∥(x + y)/2∥,
a contradiction to uniform convexity since ∥x − y∥ ≥ ε. □
Problem 6.35. Find an equivalent norm for ℓ1 (N) such that it becomes
strictly convex (cf. Problems 1.16 and 1.24).
Problem* 6.36. Show that a Hilbert space is uniformly convex. (Hint: Use
the parallelogram law.)
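Following the hint, the parallelogram law gives, for unit vectors with ∥x − y∥ = ε, that ∥(x + y)/2∥ = sqrt(1 − ε²/4), so δ(ε) = 1 − sqrt(1 − ε²/4) > 0. A quick numerical check with random unit vectors in R⁵ (the dimension and sample size are arbitrary):

```python
import math, random

random.seed(1)

def norm(u):
    return math.sqrt(sum(t * t for t in u))

for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(5)]
    y = [random.gauss(0, 1) for _ in range(5)]
    nx, ny = norm(x), norm(y)
    x = [t / nx for t in x]                     # unit vectors
    y = [t / ny for t in y]
    s = norm([a + b for a, b in zip(x, y)])     # ||x + y||
    d = norm([a - b for a, b in zip(x, y)])     # ||x - y||
    # parallelogram law: ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2 = 4
    assert abs(s * s + d * d - 4) < 1e-9
    # hence ||(x+y)/2|| = sqrt(1 - (||x-y||/2)^2), i.e. delta(eps) > 0
    assert abs(s / 2 - math.sqrt(1 - (d / 2) ** 2)) < 1e-9
```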
Problem 6.37. A Banach space X is uniformly convex if and only if ∥xn∥ = ∥yn∥ = 1 and ∥(xn + yn)/2∥ → 1 implies ∥xn − yn∥ → 0.
Advanced Spectral theory
Proof. As already indicated, the first claim follows from Theorem 4.25. The remaining items follow from Lemma 4.26 and (4.7). For example, if α ∈ σp(A′), then Ran(A − α)⊥ = Ker(A′ − α) ≠ {0} and hence Ran(A − α) is not dense, so α ∉ σc(A). If α ∈ σr(A′), then Ker(A′ − α) = {0} and hence Ran(A − α) is dense, so α ∉ σr(A). Etc. In the reflexive case use A ≅ A′′ by (4.12). □
7.1. Spectral theory for compact operators 193
Example 7.4. Consider L′ from the previous example, which is just the right shift in ℓq(N) if 1 ≤ p < ∞. Then σ(L′) = σ(L) = B̄1(0). Moreover, it is easy to see that σp(L′) = ∅. Thus in the reflexive case 1 < p < ∞ we have σc(L′) = σc(L) = ∂B1(0) as well as σr(L′) = σ(L′) \ σc(L′) = B1(0). Otherwise, if p = 1, we only get B1(0) ⊆ σr(L′) and σc(L′) ⊆ σc(L) = ∂B1(0). Hence it remains to investigate Ran(L′ − α) for |α| = 1: If we have (L′ − α)x = y with some y ∈ ℓ∞(N), we must have xj = −α^{−j−1} ∑_{k=1}^j α^k yk. Thus y = ((α∗)^n)_{n∈N} is clearly not in Ran(L′ − α). Moreover, if ∥y − ỹ∥∞ ≤ ε we have |x̃j| = |∑_{k=1}^j α^k ỹk| ≥ (1 − ε)j and hence ỹ ∉ Ran(L′ − α), which shows that the range is not dense and hence σr(L′) = B̄1(0), σc(L′) = ∅. ⋄
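The fact (from the previous example, not reproduced here) that every |α| < 1 is an eigenvalue of the left shift (Lx)j = xj+1, with eigenvector xj = α^j, is easy to check on truncations. A sketch (the truncation length and the sample α are arbitrary):

```python
# Left shift on a truncated sequence: (Lx)_j = x_{j+1}.
def left_shift(x):
    return x[1:] + [0.0]

alpha = complex(0.5, 0.3)              # any alpha with |alpha| < 1
assert abs(alpha) < 1
x = [alpha ** j for j in range(200)]   # candidate eigenvector x_j = alpha^j

Lx = left_shift(x)
# L x = alpha x, up to the truncation in the very last component
for j in range(199):
    assert abs(Lx[j] - alpha * x[j]) < 1e-12
# and x is summable since |alpha| < 1 (geometric decay), so x lies in l^p
assert sum(abs(t) for t in x) < 1 / (1 - abs(alpha)) + 1e-9
```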
Example 7.5. Consider the bilateral left shift (Sx)j = xj+1 on X := ℓp (Z).
By ∥S∥ = 1 we conclude σ(S) ⊆ B̄1 (0). Moreover, since its inverse is the
corresponding right shift (S −1 x)j = xj−1 , we also have ∥S −1 ∥ = 1 and thus
σ(S −1 ) ⊆ B̄1 (0). Since we also have σ(S −1 ) = σ(S)−1 (cf. Problem 5.4), we
arrive at σ(S) ⊆ ∂B1 (0) as well as σ(S −1 ) ⊆ ∂B1 (0).
Now for |α| = 1 there are two cases: If p = ∞, then clearly xj := α^j is in Ker(S − α) and hence σ(S) = σp(S) = ∂B1(0) as well as σ(S−1) = σp(S−1) = ∂B1(0). If p < ∞, then one concludes that if y = (S − α)x has compact support, so has x (as outside the support of y the sequence x must equal a multiple of α^j and this multiple must be 0 since x ∈ ℓp(Z)). Moreover, in this case we further infer ∑_{j∈Z} α^{−j} yj = ∑_{j∈Z} α^{−j}(x_{j+1} − αxj) = 0. Hence
any sequence y with compact support violating this condition cannot be in
the range of S − α. Hence σ(S) = ∂B1 (0). Moreover, since S ′ = S −1 we see
from the fact that σp (S) = σp (S −1 ) = ∅, that σr (S) = σr (S −1 ) = ∅. Hence
σc (S) = σc (S −1 ) = ∂B1 (0). ⋄
Moreover, for compact operators the spectrum is particularly simple (cf.
also Theorem 3.7). We start with the following observation:
and
X ⊇ Ran(A) ⊇ Ran(A2 ) ⊇ Ran(A3 ) ⊇ · · · (7.8)
Lemma 7.4 (F. Riesz lemma). Let X be a normed vector space and Y ⊂ X some subspace which is not dense, Ȳ ≠ X. Then for every ε ∈ (0, 1) there
the range chain of I − K ′ stabilizes at n. The rest follows from the previous
lemma. □
Proof. First of all note that the induced map Ã : X/Ker(A) → Y is injective (Problem 1.61). Moreover, the assumption that the cokernel is finite dimensional says that there is a finite dimensional subspace Y0 ⊂ Y such that Y = Y0 ∔ Ran(A).
Then
 : X/ Ker(A) ⊕ Y0 → Y, Â(x, y) = Ãx + y
is bijective and hence a homeomorphism by Theorem 4.8. Since X̃ :=
X/ Ker(A)⊕{0} is a closed subspace of X/ Ker(A)⊕Y0 we see that Ran(A) =
Â(X̃) is closed in Y . □
Proof. Using Lemma 4.26 and Theorems 4.28, 4.21 we have Ker(A′) = Ran(A)⊥ ≅ (Y/Ran(A))∗ = Coker(A)∗ and Coker(A′) = X∗/Ran(A′) = X∗/Ker(A)⊥ ≅ Ker(A)∗. The second claim follows since for a finite dimensional space the dual space has the same dimension. □
Next we want to look a bit further into the structure of Fredholm op-
erators. First of all, since Ker(A) is finite dimensional, it is complemented
(Problem 4.23), that is, there exists a closed subspace X0 ⊆ X such that
X = Ker(A)∔X0 and a corresponding projection P ∈ L (X) with Ran(P ) =
Ker(A). Similarly, Ran(A) is complemented (Problem 1.62) and there exists
a closed subspace Y0 ⊆ Y such that Y = Y0 ∔ Ran(A) and a corresponding
projection Q ∈ L (Y ) with Ran(Q) = Y0 . With respect to the decomposition
Ker(A) ⊕ X0 → Y0 ⊕ Ran(A) our Fredholm operator is given by
A = ( 0   0
      0   A0 ),   (7.23)
where A0 is the restriction of A to X0 → Ran(A). By construction A0 is
bijective and hence a homeomorphism (Theorem 4.8).
Example 7.14. In case of an Hilbert space we can choose X0 = Ker(A)⊥ =
Ran(A∗ ) and Y0 = Ran(A)⊥ = Ker(A∗ ) by (2.29). In particular, A :
Ker(A)⊥ → Ker(A∗ )⊥ has a bounded inverse. ⋄
Defining
B := ( 0   0
       0   A0^{−1} ),   (7.24)
we get
AB = I − Q, BA = I − P (7.25)
and hence A is invertible up to finite rank operators. Now we are ready for
showing that the index is stable under small perturbations.
Fredholm operators are also used to split the spectrum. For A ∈ L (X)
one defines the essential spectrum
σess (A) := {α ∈ C|A − α ̸∈ Φ0 (X)} ⊆ σ(A) (7.26)
and the Fredholm spectrum
σΦ (A) := {α ∈ C|A − α ̸∈ Φ(X)} ⊆ σess (A). (7.27)
By Dieudonné’s theorem both σess (A) and σΦ (A) are closed. Also note that
we have σc (A) ⊆ σΦ (A). Warning: These definitions are not universally
accepted and several variants can be found in the literature.
Example 7.15. Let X be infinite dimensional and K ∈ K (X). Then
σess (K) = σΦ (K) = {0}. ⋄
Example 7.16. If X is a Hilbert space and A is self-adjoint, then σ(A) ⊆ R
and for α ∈ R\σΦ (A) the identity ind(A−α) = − ind((A−α)∗ ) = − ind(A−
α) shows that the index is always zero. Thus σess (A) = σΦ (A) for self-adjoint
operators. ⋄
By Corollary 7.15 both the Fredholm spectrum and the essential spec-
trum are invariant under compact perturbations:
Theorem 7.16 (Weyl). Let A ∈ L (X), then
σΦ (A + K) = σΦ (A), σess (A + K) = σess (A), K ∈ K (X). (7.28)
Note that if I is a closed ideal, then the quotient space X/I (cf. Lemma 1.20)
is again a Banach algebra if we define
[x][y] = [xy]. (7.32)
Indeed (x + I)(y + I) = xy + I and hence the multiplication is well-defined
and inherits the distributive and associative laws from X. Also [e] is an
identity. Finally,
∥[x][y]∥ = inf_{a∈I} ∥xy + a∥ ≤ inf_{b,c∈I} ∥(x + b)(y + c)∥ ≤ inf_{b∈I} ∥x + b∥ inf_{c∈I} ∥y + c∥ = ∥[x]∥∥[y]∥.   (7.33)
In particular, the quotient map π : X → X/I is a Banach algebra homomor-
phism.
Example 7.20. Consider the Banach algebra L (X) together with the ideal
of compact operators K (X). Then the Banach algebra L (X)/K (X) is
known as the Calkin algebra.4 Atkinson’s theorem (Theorem 7.14) says
that the invertible elements in the Calkin algebra are precisely the images of
the Fredholm operators. ⋄
4John Williams Calkin (1909–1964), American mathematician
Moreover, x̂(M) ⊆ σ(x) and hence ∥x̂∥∞ ≤ r(x) ≤ ∥x∥ where r(x) is
the spectral radius of x. If X is commutative then x̂(M) = σ(x) and hence
∥x̂∥∞ = r(x).
of the unit ball in X∗ and the first claim follows from the Banach–Alaoglu
theorem (Theorem 6.10).
Next (x+y)∧ (m) = m(x+y) = m(x)+m(y) = x̂(m)+ ŷ(m), (xy)∧ (m) =
m(xy) = m(x)m(y) = x̂(m)ŷ(m), and ê(m) = m(e) = 1 shows that the
Gelfand transform is an algebra homomorphism.
Moreover, if m(x) = α then x − α ∈ Ker(m) implying that x − α is
not invertible (as maximal ideals cannot contain invertible elements), that
is α ∈ σ(x). Conversely, if X is commutative and α ∈ σ(x), then x − α is
not invertible and hence contained in some maximal ideal, which in turn is
the kernel of some character m. Whence m(x − α) = 0, that is m(x) = α
for some m. □
and it can only be injective if X is commutative (if xy ≠ yx, then xy − yx ∈ ⋂_{m∈M} Ker(m)). In this case Lemma 7.19 implies
x ∈ Ker(Γ) ⇔ x ∈ Rad(X) := ⋂_{I maximal ideal} I,   (7.36)
this end we will show that all maximal ideals are of the form I = Ker(mx0 )
for some x0 ∈ K. So let I be an ideal and suppose there is no point where all functions in I vanish. Then for every x ∈ K there is a ball B_{r(x)}(x) and a function fx ∈ I such that |fx(y)| ≥ 1 for y ∈ B_{r(x)}(x). By compactness finitely many of these balls will cover K. Now consider f = ∑_j f∗_{xj} f_{xj} ∈ I.
Then f ≥ 1 and hence f is invertible, that is I = C(K). Thus maximal
ideals are of the form Ix0 = {f ∈ C(K)|f (x0 ) = 0} which are precisely the
kernels of the characters m_{x0}(f) = f(x0). Thus M ≅ K as well as f̂ ≅ f. ⋄
Example 7.22. Consider the Wiener algebra A of all periodic continuous
functions which have an absolutely convergent Fourier series. As in the
previous example it suffices to show that all maximal ideals are of the form
I_{x0} = {f ∈ A|f(x0) = 0}. To see this set ek(x) = e^{ikx} and note ∥ek∥A = 1. Hence for every character m(ek) = m(e1)^k and |m(ek)| ≤ 1. Since the
last claim holds for both positive and negative k, we conclude |m(ek )| = 1
and thus there is some x0 ∈ [−π, π] with m(ek ) = eikx0 . Consequently
m(f ) = f (x0 ) and point evaluations are the only characters. Equivalently,
every maximal ideal is of the form Ker(mx0 ) = Ix0 .
So, as in the previous example, M ≅ [−π, π] (with −π and π identified) as well as f̂ ≅ f. Moreover, the Gelfand transform is injective but not
surjective since there are continuous functions whose Fourier series are not
absolutely convergent. Incidentally this also shows that the Wiener algebra
is not a C∗ algebra (despite the fact that we have a natural conjugation which
satisfies ∥f ∗ ∥A = ∥f ∥A — this again underlines the special role of (5.31)) as
the Gelfand–Naimark6 theorem below will show that the Gelfand transform
is bijective for commutative C ∗ algebras. ⋄
Since 0 ̸∈ σ(x) implies that x is invertible, the Gelfand representation
theorem also contains a useful criterion for invertibility.
Corollary 7.21. In a commutative unital Banach algebra an element x is
invertible if and only if m(x) ̸= 0 for all characters m.
And applying this to the last example we get the following famous the-
orem of Wiener:
Theorem 7.22 (Wiener). Suppose f ∈ Cper[−π, π] has an absolutely convergent Fourier series and does not vanish on [−π, π]. Then the function 1/f also has an absolutely convergent Fourier series.
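Wiener's theorem can be watched numerically: for f(θ) = 2 + cos θ, which has a finite (hence absolutely convergent) Fourier series and never vanishes, the Fourier coefficients of 1/f decay geometrically at the rate 2 − √3, so their absolute sum is finite. A sketch using the trapezoidal rule (the test function and grid size are arbitrary; the closed-form decay rate is used only as a check):

```python
import math, cmath

def fourier_coeff(fn, k, n=4096):
    # c_k = (1/2pi) \int_{-pi}^{pi} fn(t) e^{-ikt} dt, trapezoidal rule
    # (spectrally accurate for smooth periodic integrands)
    s = sum(fn(-math.pi + 2 * math.pi * j / n) *
            cmath.exp(-1j * k * (-math.pi + 2 * math.pi * j / n))
            for j in range(n))
    return s / n

g = lambda t: 1.0 / (2.0 + math.cos(t))   # 1/f for f = 2 + cos, f nonvanishing

c = [abs(fourier_coeff(g, k)) for k in range(12)]
# geometric decay at the rate 2 - sqrt(3) ~ 0.268 ...
r = 2 - math.sqrt(3)
for k in range(1, 12):
    assert abs(c[k] / c[k - 1] - r) < 1e-4
# ... so sum_k |c_k| converges: 1/f is again in the Wiener algebra
assert c[0] + 2 * sum(c[1:]) < 1.0 + 1e-6
```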
The first moral from this theorem is that from an abstract point of view
there is only one commutative C ∗ algebra, namely C(K) with K some com-
pact Hausdorff space. Moreover, the formulation also very much resembles
the spectral theorem and in fact, we can derive the spectral theorem by ap-
plying it to C ∗ (x), the C ∗ algebra generated by x (cf. (5.34)). This will even
give us the more general version for normal elements. As a preparation we
show that it makes no difference whether we compute the spectrum in X or
in C ∗ (x).
Lemma 7.25 (Spectral permanence). Let X be a C ∗ algebra and Y ⊆ X
a closed ∗-subalgebra containing the identity. Then σ(y) = σY (y) for every
y ∈ Y , where σY (y) denotes the spectrum computed in Y .
Proof. Clearly we have σ(y) ⊆ σY (y) and it remains to establish the reverse
inclusion. If (y−α) has an inverse in X, then the same is true for (y−α)∗ (y−
α). But the last operator is self-adjoint and hence has real spectrum in Y .
Thus ((y − α)∗(y − α) + i/n)^{−1} ∈ Y and letting n → ∞ shows ((y − α)∗(y −
α))−1 ∈ Y since taking the inverse is continuous and Y is closed. Whence
(y − α)−1 = (y − α∗ )((y − α)∗ (y − α))−1 ∈ Y . □
Unbounded operators
and
D(B) := {f ∈ C 1 [0, 1]|f (0) = f (1) = 0}, Bf := f ′
are two different operators in X := C[0, 1]. Clearly A is an extension of
B. Moreover, both are closed since fn → f and fn′ → g implies that f is
differentiable and f ′ = g. Note that A is densely defined while B is not. ⋄
Be aware that taking sums or products of unbounded operators is tricky
due to the possible different domains. Indeed, if A and B are two operators
between Banach spaces X and Y , so is A + B defined on D(A + B) :=
D(A) ∩ D(B). The problem is that D(A + B) might contain nothing more
than zero. Similarly, if A : D(A) ⊆ X → Y and B : D(B) ⊆ Y → Z, then
the composition BA is defined on D(BA) := {x ∈ D(A)|Ax ∈ D(B)}.
Example 8.2. Consider X := C[0, 1]. Let M be the subspace of trigonomet-
ric polynomials and N be the subspace of piecewise linear functions. Then
both M and N are dense with M ∩ N = {0}. ⋄
If an operator is not closed, you can try to take the closure of its graph,
to obtain a closed operator. If A is bounded this always works (which is
just the content of Theorem 1.16). However, in general, the closure of the
graph might not be the graph of an operator as we might pick up points (x, y1), (x, y2) ∈ Γ(A)‾ with y1 ≠ y2. Since Γ(A)‾ is a subspace, we also have (x, y2) − (x, y1) = (0, y2 − y1) ∈ Γ(A)‾ in this case and thus Γ(A)‾ is the graph of some operator if and only if
Γ(A)‾ ∩ {(0, y)|y ∈ Y } = {(0, 0)}.   (8.2)
If this is the case, A is called closable and the operator Ā associated with Γ(A)‾ is called the closure of A. Any linear subset D ⊆ D(A) with the property that A restricted to D has the same closure, (A|D)‾ = Ā, is called a core for A.
In particular, A is closable if and only if xn → 0 and Axn → y implies
y = 0. In this case
D(A) = {x ∈ X|∃xn ∈ D(A), y ∈ Y : xn → x and Axn → y},
Ax = y. (8.3)
There is yet another way of defining the closure: Define the graph norm
associated with A by
∥x∥A := ∥x∥X + ∥Ax∥Y , x ∈ D(A). (8.4)
Since we have ∥Ax∥ ≤ ∥x∥A we see that A : D(A) → Y is bounded with
norm at most one. Thus far (D(A), ∥.∥A ) is a normed space and it suggests
itself to consider its completion XA . Then one can check (Problem 8.5) that
XA can be regarded as a subset of X if and only if A is closable. In this
case the completion can be identified with D(A) and the closure of A in X
8.1. Closed operators 215
and Aaj = jaj . In fact, if an → a and Aan → b then we have anj → aj and
janj → bj for any j ∈ N and thus bj = jaj for any j ∈ N. In particular,
j=1 = (bj )j=1 ∈ ℓ (N) (c0 (N) if p = ∞). Conversely, suppose (jaj )j=1 ∈
(jaj )∞ ∞ p ∞
Let anj := n1 for 1 ≤ j ≤ n and anj := 0 for j > n. Then ∥an ∥2 = √1n implying
an → 0 but Ban = δ 1 ̸→ 0. ⋄
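The last computation can be reproduced directly. The operator B is only partially legible in this rendering; the sketch below assumes it is the summation functional Ba := (∑_j aj)δ^1 on finitely supported sequences, which matches the stated values ∥a^n∥2 = 1/√n and Ba^n = δ^1:

```python
import math

def a(n, length=10000):
    # a^n: the first n entries equal 1/n, the rest are 0
    return [1.0 / n if j < n else 0.0 for j in range(length)]

def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))

def B(v):
    # hypothetical non-closable functional: Ba = sum_j a_j (times delta^1)
    return sum(v)

for n in [10, 100, 1000]:
    an = a(n)
    assert abs(l2_norm(an) - 1 / math.sqrt(n)) < 1e-12   # a^n -> 0 in l^2
    assert abs(B(an) - 1.0) < 1e-9                       # but B a^n = 1 for all n
# hence (0, delta^1) lies in the closure of the graph: B is not closable
```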
Example 8.5 (Sobolev1 spaces). Let X := Lp (0, 1), 1 ≤ p < ∞, and con-
sider Af := f ′ on D(A) := C 1 [0, 1]. Then it is not hard to see that A is
not closed (take a sequence gn of continuous functions which converges in Lp
to a non-continuous function, cf. Example 1.11, and consider its primitive fn(x) = ∫_0^x gn(y)dy). It is however closable. To see this suppose fn → 0 and fn′ → g in Lp. Then fn(0) = fn(x) − ∫_0^x fn′(y)dy → −∫_0^x g(y)dy. But a sequence of constant functions can only have a constant function as a limit implying g ≡ 0 as required. The domain of the closure is the Sobolev
space W 1,p (0, 1) and this is one way of defining Sobolev spaces. In partic-
ular, W 1,p (0, 1) is a Banach space when equipped with the graph norm. In
this context one chooses the p-norm for the direct sum X ⊕p X such that
There is also another criterion which does not involve the distance to the
kernel.
Corollary 8.5. Suppose A : D(A) ⊆ X → Y is closed. Then Ran(A) is closed if for some given ε > 0 and 0 ≤ δ < 1 we can find for every y ∈ Ran(A) a corresponding x ∈ D(A) such that
ε∥x∥ + ∥y − Ax∥ ≤ δ∥y∥.   (8.8)
Conversely, if Ran(A) is closed this can be done whenever ε < cδ with c from
the previous corollary.
Proof. For the first claim observe: y ′ ∈ Ker(A′ ) implies 0 = (A′ y ′ )(x) =
y ′ (Ax) for all x ∈ D(A) and hence y ′ ∈ Ran(A)⊥ . Conversely, y ′ ∈ Ran(A)⊥
implies y ′ (Ax) = 0 for all x ∈ D(A) and hence y ′ ∈ D(A′ ) with A′ y ′ = 0.
For the second claim observe: x ∈ Ker(A) implies (A′ y ′ )(x) = y ′ (Ax) = 0
for all y ′ ∈ D(A′ ) and hence x ∈ Ran(A′ )⊥ . Conversely, x ∈ Ran(A′ )⊥
implies (A′ y ′ )(x) = 0 for all y ′ ∈ D(A′ ). If we had (x, 0) ̸∈ Γ(A), we could
find (Corollary 4.15) a linear functional (x′ , y ′ ) ∈ X ∗ ⊕ Y ∗ which vanishes on
Γ(A) and satisfies (x′ , y ′ )(x, 0) = 1. The fact that it vanishes on Γ(A) implies
x′ (x) + y ′ (Ax) = 0 for all x ∈ D(A) implying y ′ ∈ D(A′ ) with A′ y ′ = −x′ .
But this gives the contradiction 0 = (A′ y ′ )(x) = −x′ (x) = −1 and hence
(x, 0) ∈ Γ(A), that is, x ∈ Ker(A). □
We end this section with the remark that one could try to define the
concept of a weakly closed operator by replacing the norm topology in the
definition of a closed operator by the weak topology. However, Theorem 6.12
implies that this gives nothing new:
Lemma 8.10. For an operator A : D(A) ⊆ X → Y the following are
equivalent:
• Γ(A) is closed.
• xn ∈ D(A) with xn → x and Axn → y implies x ∈ D(A) and
y = Ax.
• Γ(A) is weakly closed.
• xn ∈ D(A) with xn ⇀ x and Axn ⇀ y implies x ∈ D(A) and
Ax = y.
Problem* 8.1. Show that the kernel Ker(A) of a closed operator A is closed.
Problem* 8.2. A linear functional defined on a dense subspace is closable
if and only if it is bounded.
Problem 8.3. Let (mj)j∈N be a sequence of complex numbers and consider the multiplication operator M in ℓp(N) defined by (Ma)j := mj aj on D(M) := ℓc(N). Under which conditions on m is M closable and what is its closure?
Problem 8.4. Show that the differential operator A = d/dx defined on D(A) = C1[0, 1] ⊂ C[0, 1] (sup norm) is a closed operator. (Compare the example in Section 1.6.)
Problem* 8.5. Show that the completion XA of (D(A), ∥.∥A ) can be re-
garded as a subset of X if and only if A is closable. Show that in this case
the completion can be identified with D(A) and that the closure of A in X
coincides with the extension from Theorem 1.16 of A in XA . In particular,
A is closed if and only if (D(A), ∥.∥A ) is complete.
Problem 8.6. Consider the Sobolev spaces W 1,p (0, 1). Show that
∥f ∥∞ ≤ ∥f ∥1,1
Problem 8.18. Let A be a closed operator. Show that for every α ∈ ρ(A) the expression ∥x∥α := ∥(A − α)x∥ defines a norm on D(A) which is equivalent to the graph norm.
Problem 8.19. Let Xj be Banach spaces. A sequence of operators Aj ∈ C(Xj, Xj+1)
X1 −A1→ X2 −A2→ X3 · · · Xn −An→ Xn+1
is said to be exact if Ran(Aj) = Ker(Aj+1) for 1 ≤ j ≤ n − 1. Show that a sequence is exact if and only if the corresponding dual sequence
X1∗ ←A1′− X2∗ ←A2′− X3∗ · · · Xn∗ ←An′− Xn+1∗
is exact.
Problem* 8.20. Let A : D(A) ⊆ X → X be an unbounded operator and
B ∈ L (X) bounded. Then A and B are said to commute if
BA ⊆ AB.
Of course if A ∈ L (X), then we have equality and this definition reduces to
the usual one. Note that the definition implies in particular, that B leaves
D(A) invariant, BD(A) ⊆ D(A).
Show that if A is invertible, then A commutes with B if and only if A−1
commutes with B. Conclude that if A has a nonempty resolvent set, then A
commutes with B if and only if RA (α) commutes with B for one α ∈ ρ(A).
Moreover, in this case this holds for all α ∈ ρ(A).
Problem* 8.21. Let A : D(A) ⊆ X → X be closable and B ∈ L(X) bounded. Then if A commutes with B, so does its closure Ā.
The set of all α ∈ C for which a Weyl sequence exists is called the
approximate point spectrum
σap (A) := {α ∈ C|∃xn ∈ D(A) : ∥xn ∥ = 1, ∥(A − α)xn ∥ → 0} (8.23)
and the above lemma could be phrased as
∂σ(A) ⊆ σap (A) ⊆ σ(A). (8.24)
Note that there are two possibilities if α ∈ σap (A): Either α is an eigenvalue
or (A−α)−1 exists but is unbounded. By the closed graph theorem the latter
case is equivalent to the fact that Ran(A − α) is not closed. In particular,
we have
σp (A) ∪ σc (A) ⊆ σap (A) ⊆ σ(A). (8.25)
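A concrete Weyl sequence is easy to exhibit for a multiplication operator (a hypothetical example, not from the text): take (Ma)j = aj/j on ℓ²(N). Then 0 is not an eigenvalue since mj = 1/j never vanishes, yet ∥Men∥ = 1/n → 0 for the unit vectors en, so 0 ∈ σap(M):

```python
def M(v):
    # multiplication operator (Ma)_j = a_j / j on truncated sequences (j >= 1)
    return [t / (j + 1) for j, t in enumerate(v)]

N = 1000
for n in [10, 100, 999]:
    e = [0.0] * N
    e[n - 1] = 1.0                           # unit vector e_n, ||e_n|| = 1
    Me = M(e)
    norm_Me = sum(t * t for t in Me) ** 0.5
    assert abs(norm_Me - 1.0 / n) < 1e-12    # ||(M - 0) e_n|| = 1/n -> 0
# M is injective, so 0 is not an eigenvalue: the e_n form a Weyl sequence
# and 0 lies in the approximate point spectrum of M.
```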
8.2. Spectral theory for unbounded operators 225
for α0 , α1 ∈ ρ(A).
However, note that for unbounded operators the spectrum will no longer
be bounded in general and both σ(A) = ∅ as well as σ(A) = C are possible.
Example 8.12. Consider X := C[0, 1] and A = d/dx with D(A) = C1[0, 1]. We obtain the eigenvalues by solving the ordinary differential equation x′(t) = αx(t) which gives x(t) = e^{αt}. Hence every α ∈ C is an eigenvalue, that is, σ(A) = C.
Now let us modify the domain and look at A0 = d/dx with D(A0) = {x ∈ C1[0, 1]|x(0) = 0} and X0 := {x ∈ C[0, 1]|x(0) = 0}. Then the
1
and
Moreover,
σp (A′ ) ⊆ σp (A) ∪· σr (A), σp (A) ⊆ σp (A′ ) ∪· σr (A′ ),
σr (A′ ) ⊆ σp (A) ∪· σc (A), σr (A) ⊆ σp (A′ ), (8.29)
σc (A′ ) ⊆ σc (A), σc (A) ⊆ σr (A′ ) ∪· σc (A′ ).
If in addition, X is reflexive we have σr (A′ ) ⊆ σp (A) as well as σc (A′ ) =
σc (A).
Proof. Literally follow the proof of Lemma 7.1 using the corresponding
results for closed operators: Theorem 8.8, Lemma 8.7, and Theorem 8.6. □
Proof. Throughout this proof α ̸= 0. We first show the formula for the
resolvent of A−1 . To this end choose some α ∈ ρ(A)\{0} and observe that the
right-hand side of (8.31) is a bounded operator from X → Ran(A) = D(A−1 )
and
(A−1 − α−1 )(−αARA (α))x = (−α + A)RA (α)x = x, x ∈ X.
Conversely, if y ∈ D(A−1 ) = Ran(A), we have y = Ax and hence
(−αARA (α))(A−1 − α−1 )y = ARA (α)((A − α)x) = Ax = y.
Thus (8.31) holds and α−1 ∈ ρ(A−1 ). Interchanging the roles of A and A−1
establishes the first part.
Next note that for x ∈ D(A) the equation (A − α)x = y is equivalent to x = (1/α)(Ax − y) and hence (A − α)x ∈ Ran(A^m) implies x ∈ Ran(A^m) for any m ∈ N. Consequently we get that x ∈ Ker((A − α)^n) implies x ∈ Ran(A^m) = D(A^{−m}) for any m ∈ N and 0 = A^{−n}(A − α)^n x = (−α)^n(A^{−1} − α^{−1})^n x. So Ker((A − α)^n) ⊆ Ker((A^{−1} − α^{−1})^n) and equality follows by reversing the roles of A and A^{−1}.
Similarly, x ∈ Ran((A − α)n ) implies x = (A − α)n y for some y ∈ D(An ).
Consequently y = A−n z and x = (A−α)n y = (A−α)n A−n z = (−α)n (A−1 −
α−1 )n z ∈ Ran((A−1 − α−1 )n ). So Ran((A − α)n ) ⊆ Ran((A−1 − α−1 )n ) and
equality follows again by reversing the roles of A and A−1 . □
Theorem 8.15. Suppose RA (α) ∈ K (X) for one α ∈ ρ(A). Then the
spectrum of A consists only of discrete eigenvalues with finite (geometric
and algebraic) multiplicity and we have the splitting into closed invariant
subspaces
X = Ker((A − α)^n) ∔ Ran((A − α)^n),   α ∈ σ(A),   (8.33)
where n is the index of α.
Proof. The claim follows by combining the previous lemma with the spectral
theorem for compact operators (Theorem 7.7 and Lemma 7.5). □
is compact by Problem 3.6 (cf. also Problem 7.4). Hence we get again
σ(A0 ) = ∅ without the need of computing the resolvent. ⋄
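For A0 from Example 8.12 the resolvent can in fact be written down for every α ∈ C as the Volterra operator RA0(α)y(t) = ∫_0^t e^{α(t−s)} y(s) ds, consistent with σ(A0) = ∅. The sketch below checks numerically that x = RA0(α)y satisfies x′ − αx = y with x(0) = 0 (the sample α = 2, y(t) = cos 3t, and the grid are arbitrary choices):

```python
import math

alpha, y = 2.0, (lambda t: math.cos(3 * t))
n = 2000
h = 1.0 / n

# x(t) = int_0^t e^{alpha (t - s)} y(s) ds via the one-step recurrence
# x(t+h) = e^{alpha h} x(t) + int_t^{t+h} e^{alpha(t+h-s)} y(s) ds (trapezoid)
x = [0.0]
for k in range(n):
    t = k * h
    inc = h / 2 * (math.exp(alpha * h) * y(t) + y(t + h))
    x.append(math.exp(alpha * h) * x[-1] + inc)

assert x[0] == 0.0   # boundary condition x(0) = 0 of D(A0)
# check x' - alpha x = y at interior grid points (central differences)
err = max(abs((x[k + 1] - x[k - 1]) / (2 * h) - alpha * x[k] - y(k * h))
          for k in range(1, n))
assert err < 1e-3
```

Since this integral operator exists and is bounded for every α, no complex number can belong to σ(A0).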
Example 8.14. Of course another example of unbounded operators with
compact resolvent are regular Sturm–Liouville operators as shown in Sec-
tion 3.3. ⋄
This result says in particular, that for α ∈ σp (A) we can split X =
X1 ⊕ X2 where both X1 := Ker(A − α)n and X2 := Ran(A − α)n are
invariant subspaces for A. Consequently we can split A = A1 ⊕ A2 , where
A1 is the restriction of A to the finite dimensional subspace X1 (with A1 −α a
nilpotent matrix) and A2 is the restriction of A to X2 . Moreover, Ker(A2 −
α) = {0} by construction. Now note that for β ∈ ρ(A) we must have
β ∈ ρ(A1 ) ∩ ρ(A2 ) with RA (β) = RA1 (β) ⊕ RA2 (β) (cf. Problem 8.25) which
shows that RA2 (β) ∈ K (X2 ) for β ∈ ρ(A). Now since α ̸∈ σp (A2 ) this tells
us α ∈ ρ(A2 ).
for all) α ∈ ρ(A). Moreover, if A commutes with P, the same is true for the closure Ā (Problem 8.21).
Here the integral is defined as a Riemann integral (cf. Section 9.1 for details), explicitly
∮_γ RA(α)dα := lim_{n→∞} ∑_{j=1}^n RA(γ(j/n)) γ′(j/n)/n.   (8.36)
Choosing some x ∈ X and some ℓ ∈ X∗ we get
ℓ(P x) = −(1/2πi) ∮_γ ℓ(RA(α)x)dα,   (8.37)
where now the integral on the right is an ordinary path integral in the com-
plex plane. In particular, this shows that all the convenient facts from
complex analysis about line integrals of holomorphic functions are at our
disposal. For example, we can continuously deform the curve γ within ρ(A)
without changing the integral and the Cauchy integral theorem holds.
Lemma 8.17. Let A ∈ C (X) and γ : [0, 1] → ρ(A) be a Jordan curve. Then
P from (8.35) is a projection which reduces A.
Proof. First of all note that P ∈ L (X) since (see again Section 9.1 for
details) the right-hand side of (8.36) converges in the operator norm. More-
over, since the integral is independent of γ we can take a slightly deformed
path γ ′ in the exterior of γ to compute:
$$\begin{aligned}
P^2 &= -\frac{1}{4\pi^2}\oint_{\gamma'}\oint_{\gamma} R_A(\alpha)R_A(\beta)\,d\alpha\,d\beta\\
&= -\frac{1}{4\pi^2}\oint_{\gamma'}\oint_{\gamma} R_A(\alpha)\,\frac{d\alpha}{\alpha-\beta}\,d\beta + \frac{1}{4\pi^2}\oint_{\gamma'}\oint_{\gamma} R_A(\beta)\,\frac{d\alpha}{\alpha-\beta}\,d\beta\\
&= -\frac{1}{4\pi^2}\oint_{\gamma} R_A(\alpha)\oint_{\gamma'}\frac{d\beta}{\alpha-\beta}\,d\alpha = -\frac{1}{2\pi i}\oint_{\gamma} R_A(\alpha)\,d\alpha = P.
\end{aligned}$$
Here we have used the first resolvent identity (8.26) to obtain the second
line. To obtain the third line we have used Fubini for the first double integral
and the fact that in the second integral the inner integral vanishes by the
Cauchy integral theorem since β lies in the exterior of γ. To obtain the last
line observe that the inner integral equals −2πi since α lies in the interior of
γ′.
Finally, since P commutes with RA (α) it also commutes with A (Prob-
lem 8.20), that is, P reduces A. □
Proof. From ∥Ax∥ ≤ ∥(A + B)x∥ + ∥Bx∥ ≤ ∥(A + B)x∥ + a∥Ax∥ + b∥x∥ we obtain
$$\|Ax\| \le \frac{1}{1-a}\|(A+B)x\| + \frac{b}{1-a}\|x\|,$$
which shows that the graph norms of A and A + B are equivalent. □
$$|f(0)| \le \frac{q\,\varepsilon^{1/q}}{1+q}\,\|f'\|_p + \frac{1}{\varepsilon}\,\|f\|_p.$$
Hence the relative bound is 0 for 1 < p < ∞ and our lemma applies. In
the case p = 1 this argument breaks down. In fact, it turns out that for
p = 1 the A-bound is one. To see this let fn(x) := max(1 − nx, 0) such that ∥fn∥₁ = 1/(2n) and ∥fn′∥₁ = |fn(0)| = 1. Then letting n → ∞ in
$$1 = |f_n(0)| = \|Bf_n\|_1 \le a\|Af_n\|_1 + b\|f_n\|_1 = a + \frac{b}{2n}$$
shows a ≥ 1 and hence the A-bound of B is one and our lemma does not
apply. We will come back to this case below. ⋄
Next, there is also a convenient formula for the resolvent of A + B.
Lemma 8.21. Let A, B be two given operators with D(A) ⊆ D(B) such
that A and A + B are closed. Then we have the second resolvent formula
RA+B (α) − RA (α) = −RA (α)BRA+B (α) = −RA+B (α)BRA (α) (8.40)
for α ∈ ρ(A) ∩ ρ(A + B).
Example 8.21. Consider the operators from Example 8.17 or 8.20. Then, B
is relatively compact since its range is one-dimensional. In fact, the same is
true for any relatively bounded operator B whose range is finite dimensional
(cf. Problem 8.30). ⋄
Example 8.22. Consider the differential operator A : W 1,p (0, 1) ⊆ Lp (0, 1) →
Lp (0, 1), 1 < p < ∞ from Example 8.5. Then since the embedding W 1,p (0, 1) ⊆
Lp (0, 1) is compact (Problem 8.7), every bounded operator B ∈ L (Lp (0, 1))
is relatively compact. For example one can choose B to be a multiplication
operator by a bounded function. ⋄
Again there is an equivalent characterization in terms of resolvents.
Lemma 8.23. Suppose A ∈ C (X) is closed with nonempty resolvent set.
Then B ∈ KA (X) if and only if BRA (α) ∈ K (X) for one (and hence for
all) α ∈ ρ(A).
even need the computation of the A-bound. Moreover, in the case of Exam-
ple 8.20 this now covers the full range 1 ≤ p < ∞. ⋄
Applications of these notions will be given in the next section.
Problem 8.27. Suppose Bj, j = 1, 2, are A-bounded with respective A-bounds aj, j = 1, 2. Show that α1B1 + α2B2 is also A-bounded with A-bound at most |α1|a1 + |α2|a2.
Problem* 8.28. Suppose A is closed and B satisfies D(A) ⊆ D(B):
• Show that 1 + B has a bounded inverse if ∥B∥ < 1.
• Suppose A has a bounded inverse. Then so does A + B if ∥BA⁻¹∥ < 1. In this case we have
$$\|(A+B)^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|BA^{-1}\|}.$$
Problem 8.29. Suppose A, B are linear operators with D(A) ⊆ D(B) and
∥Bx∥ ≤ a∥Ax∥α ∥x∥1−α for all x ∈ D(A) and some a > 0, α ∈ (0, 1). Then
B is relatively bounded with A-bound 0. (Hint: Young’s inequality (1.24).)
Problem 8.30. Show that a relatively bounded operator B ∈ LA (X, Y ) with
finite dimensional range is of the form
$$Bx = \sum_{j=1}^{n} \big(x_j'(x) + y_j'(Ax)\big)\,y_j$$
is called a Fredholm operator if both its kernel and cokernel are finite
dimensional. In this case we define its index as
ind(A) := dim Ker(A) − dim Coker(A). (8.43)
In fact, many results for bounded Fredholm operators carry over to the
unbounded case using the following observation: Let us denote A regarded
as an operator from D(A) equipped with the graph norm ∥.∥A to Y by Â.
Recall that  is bounded (cf. Problem 8.5). Moreover, Ker(Â) = Ker(A)
and Ran(Â) = Ran(A). Consequently, A ∈ C (X, Y ) is Fredholm if and only
if  is.
For example, applying Lemma 7.8 to  shows that a closed operator
with finite cokernel has closed range. In particular, a Fredholm operator has
closed range.
Example 8.24. Consider the operator A from Example 8.12. There we
have seen dim Ker(A − α) = 1 for every α ∈ C. Moreover, the solution of
the inhomogeneous differential equation f ′ − αf = g is given by
$$f(x) = f(0)e^{\alpha x} + \int_0^x e^{\alpha(x-y)} g(y)\,dy,$$
f(x) = f(0)e^{−iαx}
and such a function will never be square integrable unless f(0) = 0. However, for α ∈ R this function is at least bounded and we could try to get approximating eigenfunctions by restricting e^{−iαx} to compact intervals. More precisely, choose φn ∈ Cc∞(R) such that φn is symmetric with φn(x) = 1 for |x| ≤ n, φn(x) = 0 for |x| ≥ n + 1 and such that the piece on [n, n + 1] is independent of n. Set un(x) := φn(x)e^{−iαx}. Then ∥un∥p = (2(C0 + n))^{1/p} while ∥(A − α)un∥p = (2C1)^{1/p} and thus un/∥un∥p is a Weyl sequence. Since the same is true for un(x + rn) we get a singular Weyl sequence by choosing rn such that the supports of un(· + rn) and um(· + rm) are disjoint for n ≠ m. Hence we conclude R ⊂ σΦ(A).
If α ∈ C \ R we can write down the resolvent of A. To this end note that
the solution of the inhomogeneous equation −if ′ − αf = g is given by
$$f(x) = f(a)e^{i\alpha(x-a)} + i\int_a^x e^{i\alpha(x-y)} g(y)\,dy.$$
If g has compact support, we can shift a beyond the support of g and hence
we expect
$$R_A(\alpha)g(x) := i\int_{\pm\infty}^x e^{i\alpha(x-y)} g(y)\,dy, \qquad \mp\mathrm{Im}(\alpha) > 0.$$
Note that here we have shifted a to the left/right of the support depending
on the sign of Im(α) such that the above expression at least formally makes
sense without the compact support assumption. Then Young’s inequality
for convolutions implies
$$\|R_A(\alpha)g\|_p \le \frac{1}{|\mathrm{Im}(\alpha)|}\,\|g\|_p,$$
which shows α ∈ ρ(A) with ∥RA (α)∥ ≤ |Im(α)|−1 .
In summary, we have σ(A) = σΦ (A) = σess (A) = R.
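The resolvent formula above can also be checked on a grid; in the sketch below α = 2i and a Gaussian g are illustrative assumptions, and the integral from −∞ is truncated to the grid. It verifies both that −i f′ − αf reproduces g and the bound ∥R_A(α)g∥₂ ≤ ∥g∥₂/|Im(α)|.

```python
import numpy as np

al = 2j                                   # alpha with Im(alpha) = 2 > 0
x = np.linspace(-10, 10, 4001)
h = x[1] - x[0]
g = np.exp(-x**2)                         # assumed test function

# f(x) = i * int_{-inf}^x e^{i al (x-y)} g(y) dy, as a discrete convolution
kernel = np.exp(1j * al * (x - x[0]))     # e^{i al t} sampled at t = 0, h, 2h, ...
f = 1j * h * np.convolve(g, kernel)[:x.size]

fp = np.gradient(f, h)                    # numerical derivative f'
residual = -1j * fp - al * f              # should reproduce g up to O(h)
norm2 = lambda u: np.sqrt(h * np.sum(np.abs(u)**2))
```

The residual agrees with g up to discretization error, and the discrete L² norms satisfy the resolvent bound for p = 2.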
Moreover, since the embedding W1,p(a, b) ↪ Lp(a, b) is compact (Problem 8.7) and the embedding Lp(a, b) ↪ Lp(R) is bounded for every bounded interval (a, b), every multiplication operator Q with a function q ∈ L∞_c(R) (bounded with compact support) is relatively compact. Moreover, every bounded function q ∈ L∞_0(R) which vanishes as |x| → ∞ can be approximated by bounded functions qn with compact support in the sup norm. Hence we also have Qn → Q in the operator norm, and for any q ∈ L∞_0(R) we have σΦ(A + Q) = σess(A + Q) = R.
However, note that while the spectrum of A + Q could be different from
the spectrum of A, this is not the case here. Indeed one can verify that the
resolvent is given by
$$R_{A+Q}(\alpha)g(x) := i \int_{\pm\infty}^x e^{i\alpha(x-y) - i\int_y^x q(t)\,dt}\, g(y)\,dy, \qquad \mp\mathrm{Im}(\alpha) > 0. \qquad\diamond$$
Problem 8.32. Consider X := C[0, 1] and A = d/dx with D(A) = {f ∈ C¹[0, 1] | f(0) = f(1) = 0}. Investigate when A − α is Fredholm and compute the essential spectrum of A.
Part 3
Nonlinear Functional
Analysis
Chapter 9
Analysis in Banach
spaces
A function f : I → X is called differentiable at t if the limit
$$\dot f(t) := \lim_{\varepsilon\to 0} \frac{f(t+\varepsilon) - f(t)}{\varepsilon} \qquad (9.1)$$
exists. If t is a boundary point, the limit/derivative is understood as the
corresponding onesided limit/derivative.
The set of functions f : I → X which are differentiable at all t ∈ I and
for which f˙ ∈ C(I, X) is denoted by C 1 (I, X). Clearly C 1 (I, X) ⊂ C(I, X).
As usual we set C^{k+1}(I, X) := {f ∈ C¹(I, X) | ḟ ∈ Cᵏ(I, X)}. Note that if A ∈ L(X, Y) and f ∈ Cᵏ(I, X), then Af ∈ Cᵏ(I, Y) and (d/dt)Af = Aḟ.
The following version of the mean value theorem will be crucial.
Theorem 9.1 (Mean value theorem). Suppose f ∈ C¹(I, X). Then
$$|f(t) - f(s)| \le M\,(t - s), \qquad M := \sup_{s\le\tau\le t} |\dot f(\tau)|,$$
for s ≤ t ∈ I.
In particular,
Corollary 9.2. For f ∈ C 1 (I, X) we have f˙ = 0 if and only if f is constant.
X by
$$\int_a^b f(t)\,dt := \sum_{j=1}^{n} x_j\,(t_j - t_{j-1}), \qquad (9.4)$$
where xj is the value of f on (t_{j−1}, t_j). This map satisfies
$$\Big\|\int_a^b f(t)\,dt\Big\| \le \|f\|_\infty (b - a). \qquad (9.5)$$
Now for x, u ∈ X with ∥x∥∞ < R and ∥u∥∞ < δ we have |f (xn + un ) −
f (xn ) − f ′ (xn )un | < ε|un | and hence
∥F (x + u) − F (x) − dF (x)u∥p < ε∥u∥p
which establishes differentiability. Moreover, using uniform continuity of
f on compact sets a similar argument shows that dF : X → L (X, X) is
continuous (observe that the operator norm of a multiplication operator by
a sequence is the sup norm of the sequence) and hence one writes F ∈
C 1 (X, X) as usual. ⋄
Differentiability implies existence of directional derivatives
$$\delta F(x, u) := \lim_{\varepsilon\to 0} \frac{F(x + \varepsilon u) - F(x)}{\varepsilon}, \qquad \varepsilon \in \mathbb{R}\setminus\{0\}, \qquad (9.13)$$
which are also known as Gâteaux derivative2 or variational derivative.
Indeed, if F is differentiable at x, then (9.11) implies
δF (x, u) = dF (x)u. (9.14)
In particular, we call F Gâteaux differentiable at x ∈ U if the limit on the
right-hand side in (9.13) exists for all u ∈ X. However, note that Gâteaux
differentiability does not imply differentiability. In fact, the Gâteaux deriva-
tive might be unbounded or it might even fail to be linear in u. Some authors
require the Gâteaux derivative to be a bounded linear operator and in this
case we will write δF (x, u) = δF (x)u. But even this additional requirement
does not imply differentiability in general. Note that in any case the Gâteaux
derivative is homogenous, that is, if δF (x, u) exists, then δF (x, λu) exists
for every λ ∈ R and
δF (x, λu) = λ δF (x, u), λ ∈ R. (9.15)
Example 9.3. The function F : R² → R given by F(x, y) := x³/(x² + y²) for (x, y) ≠ 0 and F(0, 0) = 0 is Gâteaux differentiable at 0 with Gâteaux derivative
$$\delta F(0, (u, v)) = \lim_{\varepsilon\to 0} \frac{F(\varepsilon u, \varepsilon v)}{\varepsilon} = F(u, v),$$
which is clearly nonlinear.
The function F : R2 → R given by F (x, y) = x for y = x2 and F (x, y) :=
0 else is Gâteaux differentiable at 0 with Gâteaux derivative δF (0) = 0,
which is clearly linear. However, F is not differentiable.
If you take a linear function L : X → Y which is unbounded, then L
is everywhere Gâteaux differentiable with derivative equal to Lu, which is
linear but, by construction, not bounded. ⋄
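The first function of Example 9.3 is easy to probe numerically (a sketch; the difference-quotient step is an arbitrary choice): the Gâteaux derivative at 0 reproduces F itself and is visibly nonlinear in the direction.

```python
# F(x, y) = x^3/(x^2 + y^2) away from the origin, F(0, 0) = 0.
def F(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**3 / (x**2 + y**2)

def gateaux_at_0(u, v, eps=1e-8):
    # directional difference quotient (F(eps*u, eps*v) - F(0, 0)) / eps
    return (F(eps * u, eps * v) - F(0.0, 0.0)) / eps
```

One finds δF(0,(1,0)) = 1 and δF(0,(0,1)) = 0 while δF(0,(1,1)) = 1/2, so additivity in the direction fails.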
2René Gâteaux (1889–1914), French mathematician
In fact, by the chain rule h(ε) := G(f + εg) is differentiable with h′ (0) =
(∂x G)(f )Re(g) + (∂y G)(f )Im(g). Moreover, by the mean value theorem
$$\Big|\frac{h(\varepsilon) - h(0)}{\varepsilon}\Big| \le \sup_{0\le\tau\le\varepsilon} \sqrt{(\partial_x G)(f + \tau g)^2 + (\partial_y G)(f + \tau g)^2}\;|g|$$
Proof. For every ε > 0 we can find a δ > 0 such that |F (x + u) − F (x) −
dF (x) u| ≤ ε|u| for |u| ≤ δ. Now choose M = ∥dF (x)∥ + ε. □
Example 9.6. Note that this lemma fails for the Gâteaux derivative as the
example of an unbounded linear function shows. In fact, it already fails in
R2 as the function F : R2 → R given by F (x, y) = 1 for y = x2 ̸= 0 and
F (x, y) = 0 else shows: It is Gâteaux differentiable at 0 with δF (0) = 0 but
it is not continuous since limε→0 F (ε, ε2 ) = 1 ̸= 0 = F (0, 0). ⋄
Using |ṽ| ≤ ∥dF (x)∥|u| + |o(u)| we see that o(ṽ) = o(u) and hence
G(F (x + u)) = G(y) + dG(y)v + o(u) = G(F (x)) + dG(F (x)) ◦ dF (x)u + o(u)
as required. This establishes the case r = 1. The general case follows from
induction. □
Proof. First of all note that ∂j F (x) ∈ L (R, Y ) and thus it can be regarded
as an element of Y . Clearly the same applies to ∂i ∂j F (x). Let ℓ ∈ Y ∗
be a bounded linear functional, then ℓ ◦ F ∈ C 2 (R2 , R) and hence ∂i ∂j (ℓ ◦
F ) = ∂j ∂i (ℓ ◦ F ) by the classical theorem of Schwarz. Moreover, by our
remark preceding this lemma ∂i ∂j (ℓ ◦ F ) = ∂i ℓ(∂j F ) = ℓ(∂i ∂j F ) and hence
ℓ(∂i ∂j F ) = ℓ(∂j ∂i F ) for every ℓ ∈ Y ∗ implying the claim. □
Finally, note that to each L ∈ L n (X, Y ) we can assign its polar form
L ∈ C(X, Y ) using L(x) = L(x, . . . , x), x ∈ X. If L is symmetric it can be
reconstructed using polarization (Problem 9.9):
$$L(u_1, \dots, u_n) = \frac{1}{n!}\,\partial_{t_1}\cdots\partial_{t_n} L\Big(\sum_{i=1}^{n} t_i u_i\Big). \qquad (9.24)$$
since we can assume |o(ε)| < εδ for ε > 0 small enough, a contradiction. □
Proof. As in the proof of the previous lemma, the case r = 0 is just the
fundamental theorem of calculus applied to f (t) := F (x + tu). For the in-
duction step we use integration by parts. To this end let fj ∈ C 1 ([0, 1], Xj ),
L ∈ L 2 (X1 × X2 , Y ) bilinear. Then the product rule (9.25) and the funda-
mental theorem of calculus imply
$$\int_0^1 L(\dot f_1(t), f_2(t))\,dt = L(f_1(1), f_2(1)) - L(f_1(0), f_2(0)) - \int_0^1 L(f_1(t), \dot f_2(t))\,dt.$$
Hence applying integration by parts with L(y, t) = ty, f1(t) = d^{r+1}F(x + ut), and f2(t) = (1−t)^{r+1}/(r+1)! establishes the induction step. □
Of course this also gives the Peano form for the remainder:
Corollary 9.14. Suppose U ⊆ X and F ∈ C r (U, Y ). Then
$$F(x+u) = F(x) + dF(x)u + \frac{1}{2}d^2F(x)u^2 + \cdots + \frac{1}{r!}d^rF(x)u^r + o(|u|^r). \qquad (9.31)$$
The set of all r times continuously differentiable functions for which this
norm is finite forms a Banach space which is denoted by Cbr (U, Y ).
In the definition of differentiability we have required U to be open. Of
course there is no stringent reason for this and (9.12) could simply be required
for all sequences from U \ {x} converging to x. However, note that the
derivative might not be unique in case you miss some directions (the ultimate
problem occurring at an isolated point). Our requirement avoids all these
issues. Moreover, there is usually another way of defining differentiability at a boundary point: By C^r(Ū, Y) we denote the set of all functions in C^r(U, Y) all whose derivatives of order up to r have a continuous extension to Ū. Note that if you can approach a boundary point along a half-line then the fundamental theorem of calculus shows that the extension coincides with the Gâteaux derivative.
Problem* 9.7. Let X := C([0, 1], R) and suppose f ∈ C 1 (R). Show that
F : X → X, x 7→ f ◦ x
Note that if δ n F (x, u) exists, then δ n F (x, λu) exists for every λ ∈ R and
δ n F (x, λu) = λn δ n F (x, u), λ ∈ R. (9.35)
However, the condition δ²F(x, u) > 0 for all unit vectors u is not sufficient, as there are certain features you might miss when you only look at the function along rays through a fixed point. This is demonstrated by the following example:
Example 9.13. Let X = R² and consider the points (xn, yn) := (1/n, 1/n²). For each point choose a radius rn such that the balls Bn := B_{rn}(xn, yn) are disjoint and lie between two parabolas: Bn ⊂ {(x, y) | x ≥ 0, x²/2 ≤ y ≤
Proof. The necessary conditions have already been established. To see the sufficient conditions note that the assumptions on δ²F imply that there is some ε > 0 such that δ²F(y, u) ≥ c/2 for all y ∈ Bε(x) and all u ∈ ∂B₁(0). Equivalently, δ²F(y, u) ≥ (c/2)|u|² for all y ∈ Bε(x) and all u ∈ X. Hence applying Taylor's theorem to f(t) using f̈(t) = δ²F(x + tu, u) gives
$$F(x+u) = f(1) = f(0) + \int_0^1 (1-s)\ddot f(s)\,ds \ge F(x) + \frac{c}{4}|u|^2$$
for u ∈ Bε(0). □
Then F ∈ C²(X, R) with dF(x)u = Σ_{n∈N} (xn/n² − 4xn³)un and d²F(x)(u, v) = Σ_{n∈N} (1/n² − 12xn²)un vn.
There are two special cases worth noticing: First of all, if L does not depend on q, then the Euler–Lagrange equation simplifies to
$$L_{\dot q}(s, \dot q(s)) = C$$
and this identity can be derived directly from δF (x, u) without performing
the integration by parts (Lemma 3.24 from [37]). In particular, it suffices to
assume L ∈ C 1 and x ∈ C 1 in this case.
Secondly, if L does not depend on s, we can eliminate Lq q̇ from
$$\frac{d}{ds} L(q(s), \dot q(s)) = L_q(q(s), \dot q(s))\,\dot q(s) + L_{\dot q}(q(s), \dot q(s))\,\ddot q(s)$$
using the Euler–Lagrange identity which gives
$$\frac{d}{ds} L(q(s), \dot q(s)) = \dot q(s)\,\frac{d}{ds} L_{\dot q}(q(s), \dot q(s)) + L_{\dot q}(q(s), \dot q(s))\,\ddot q(s).$$
Rearranging this last equation gives the Beltrami identity
L(q(s), q̇(s)) − q̇(s)Lq̇ (q(s), q̇(s)) = C
which must be satisfied by any solution of the Euler–Lagrange equation.
Conversely, if q̇ ̸= 0, a solution of the Beltrami equation also solves the
Euler–Lagrange equation. ⋄
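For a Lagrangian of the form L(q, q̇) = q̇²/2 − V(q) the Beltrami identity reduces to conservation of q̇²/2 + V(q). A numerical sketch (the potential V(q) = 1 − cos q and all step sizes are assumptions) integrates the Euler–Lagrange equation q̈ = −V′(q) and monitors L − q̇L_q̇ along the solution:

```python
import math

V  = lambda q: 1.0 - math.cos(q)          # assumed potential (pendulum)
Vp = lambda q: math.sin(q)

q, p, dt = 1.0, 0.0, 1e-3
beltrami = []
for _ in range(5000):                     # velocity-Verlet steps for q'' = -V'(q)
    p -= 0.5 * dt * Vp(q)
    q += dt * p
    p -= 0.5 * dt * Vp(q)
    beltrami.append(-(0.5 * p * p + V(q)))  # L - q' L_{q'} = -(q'^2/2 + V(q))
```

Up to the (bounded) energy error of the integrator, the recorded quantity stays constant, as the Beltrami identity predicts.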
Example 9.17. For example, for a classical particle of mass m > 0 moving
in a conservative force field described by a potential V ∈ C 1 (Rn , R) the
Lagrangian is given by the difference between kinetic and potential energy
$$L(t, q, \dot q) := \frac{m}{2}\dot q^2 - V(q)$$
and the Euler–Lagrange equations read
mq̈ = −Vq (q),
which are just Newton’s equations of motion. ⋄
Finally we note that the situation simplifies a lot when F is convex. Our
first observation is that a local minimum is automatically a global one.
Lemma 9.17. Suppose C ⊆ X is convex and F : C → R is convex. Every
local minimum is a global minimum. Moreover, if F is strictly convex then
the minimum is unique.
Proof. Suppose x is a local minimum and F (y) < F (x). Then F (λy + (1 −
λ)x) ≤ λF (y) + (1 − λ)F (x) < F (x) for λ ∈ (0, 1) contradicts the fact that x
is a local minimum. If x, y are two global minima, then F (λy + (1 − λ)x) <
F (y) = F (x) yielding a contradiction unless x = y. □
As in the one-dimensional case, convexity can be read off from the second
derivative.
Lemma 9.19. Suppose C ⊆ X is open and convex and F : C → R
has Gâteaux derivatives up to order two. Then F is convex if and only
if δ 2 F (x, u) ≥ 0 for all x ∈ C and u ∈ X. Moreover, F is strictly convex if
δ 2 F (x, u) > 0 for all x ∈ C and u ∈ X \ {0}.
There is also a version using only first derivatives plus the concept of a
monotone operator. A map F : U ⊆ X → X ∗ is monotone if
(F (x) − F (y))(x − y) ≥ 0, x, y ∈ U.
It is called strictly monotone if we have strict inequality for x ̸= y. Mono-
tone operators will be the topic of Chapter 12.
Lemma 9.20. Suppose C ⊆ X is open and convex and F : C → R has
Gâteaux derivatives δF (x) ∈ X ∗ for every x ∈ C. Then F is (strictly)
convex if and only if δF is (strictly) monotone.
Of course we know that the shortest curve between two given points q0 and q1
is a straight line. Notwithstanding that this is evident, defining the length as
the total variation, let us show this by seeking the minimum of the following
functional
$$F(x) := \int_a^b |q'(s)|\,ds, \qquad q(t) = x(t) + q_0 + \frac{t-a}{b-a}(q_1 - q_0),$$
for x ∈ X := {x ∈ C¹([a, b], Rⁿ) | x(a) = x(b) = 0}. Unfortunately our integrand will not be differentiable unless |q′| ≥ c. However, since the absolute value is convex, so is F and it will suffice to search for a local minimum within the convex open set C := {x ∈ X | |x′| < |q₁ − q₀|/(2(b−a))}. We compute
$$\delta F(x, u) = \int_a^b \frac{q'(s)\,u'(s)}{|q'(s)|}\,ds$$
which shows by virtue of the du Bois-Reymond5 Lemma (Lemma 3.24 from
[37]) that q ′ /|q ′ | must be constant. Hence the local minimum in C is indeed
a straight line and this must also be a global minimum in X. However, since
the length of a curve is independent of its parametrization, this minimum is
not unique! ⋄
Example 9.19. Let us try to find a curve q(t) from q(0) = 0 to q(t1 ) = q1
which minimizes
$$F(q) := \int_0^{t_1} \sqrt{\frac{1 + \dot q(t)^2}{t}}\,dt.$$
Note that since the function v ↦ √(1 + v²) is convex, we obtain that F is convex. Hence it suffices to find a zero of
$$\delta F(q, u) = \int_0^{t_1} \frac{\dot q(t)\,\dot u(t)}{\sqrt{t(1 + \dot q(t)^2)}}\,dt,$$
which shows by virtue of the du Bois-Reymond Lemma (Lemma 3.24 from
[37]) that q̇/√(t(1 + q̇²)) = C^{−1/2} is constant or equivalently
$$\dot q(t) = \sqrt{\frac{t}{C - t}}$$
and hence
$$q(t) = C \arctan\Big(\sqrt{\frac{t}{C-t}}\Big) - \sqrt{t(C - t)}.$$
The constant C has to be chosen such that q(t₁) matches the given value q₁. Note that C ↦ q(t₁) decreases from πt₁/2 to 0 and hence there will be a unique C > t₁ for 0 < q₁ < πt₁/2. ⋄
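The closed form for q can be verified numerically (a sketch; C = 2 is an arbitrary choice): central differences of q reproduce the claimed derivative q̇(t) = √(t/(C − t)), and q(0) = 0.

```python
import numpy as np

C = 2.0
q  = lambda t: C * np.arctan(np.sqrt(t / (C - t))) - np.sqrt(t * (C - t))
qd = lambda t: np.sqrt(t / (C - t))        # claimed derivative

t = np.linspace(0.1, 1.8, 200)             # sample points strictly inside (0, C)
h = 1e-6
num_qd = (q(t + h) - q(t - h)) / (2 * h)   # central difference approximation
```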
5Paul du Bois-Reymond (1831–1889), German mathematician
use weak convergence to get compactness and hence we will also need weak
(sequential) continuity of F . However, since there are more weakly than
strongly convergent subsequences, weak (sequential) continuity is in fact a
stronger property than just continuity!
Example 9.21. By Lemma 4.29 (ii) the norm is weakly sequentially lower
semicontinuous but it is in general not weakly sequentially continuous as any
infinite orthonormal set in a Hilbert space converges weakly to 0. However,
note that this problem does not occur for linear maps. This is an immediate
consequence of the very definition of weak convergence (Problem 4.37). ⋄
Hence weak continuity might be too much to hope for in concrete ap-
plications. In this respect note that, for our argument to work, lower semi-
continuity (cf. Problem B.19) will already be sufficient. This is frequently
referred to as the direct method in the calculus of variations due to
Zaremba and Hilbert:
Theorem 9.21 (Variational principle). Let X be a reflexive Banach space
and let F : M ⊆ X → (−∞, ∞]. Suppose M is nonempty, weakly se-
quentially closed and that either F is weakly coercive, that is F (x) → ∞
whenever ∥x∥ → ∞, or that M is bounded. Then, if F is weakly sequentially
lower semicontinuous, there exists some x0 ∈ M with F (x0 ) = inf M F .
If F is Gâteaux differentiable, then
δF (x0 , u) = 0 (9.36)
for every u ∈ X with x0 + εu ∈ M for sufficiently small ε.
Proof. Without loss of generality we can assume F (x) < ∞ for some x ∈ M .
As above we start with a sequence xn ∈ M such that F (xn ) → inf M F <
∞. If M is unbounded, then the fact that F is coercive implies that xn
is bounded. Otherwise, if M is bounded, it is obviously bounded. Hence
by Theorem 4.32 we can pass to a subsequence such that xn ⇀ x0 with
x0 ∈ M since M is assumed sequentially closed. Now, since F is weakly
sequentially lower semicontinuous, we finally get inf M F = limn→∞ F (xn ) =
lim inf n→∞ F (xn ) ≥ F (x0 ). □
we can find a linear functional ℓ which separates {x} and M: Re(ℓ(x)) < c ≤ Re(ℓ(y)), y ∈ M. But this contradicts Re(ℓ(x)) < c ≤ Re(ℓ(xn)) → Re(ℓ(x)). □
and equality would imply max{K(x, u(x)), K(x, v(x))} = K(x, λu(x) + (1 −
λ)v(x)) for a.e. x and hence u(x) = v(x) for a.e. x.
Note that this result generalizes to Cn -valued functions in a straightfor-
ward manner. ⋄
Moreover, in this case our variational principle reads as follows:
Corollary 9.24. Let X be a reflexive Banach space and let M be a nonempty
closed convex subset. If F : M ⊆ X → R is quasiconvex, lower semicontinu-
ous, and, if M is unbounded, weakly coercive, then there exists some x0 ∈ M
with F (x0 ) = inf M F . If F is strictly quasiconvex then x0 is unique.
Example 9.26. Let H be a Hilbert space and let us consider the problem
of finding the lowest eigenvalue of a positive operator A ≥ 0. Of course
this is bound to fail since the eigenvalues could accumulate at 0 without 0 being an eigenvalue (e.g. the multiplication operator with the sequence 1/n in ℓ²(N)). Nevertheless it is instructive to see how things can go wrong (and it
underlines the importance of our various assumptions).
To this end consider its quadratic form qA(f) = ⟨f, Af⟩. Then, since qA^{1/2} is a seminorm (Problem 1.29) and taking squares is convex, qA is convex. If we consider it on M = B̄₁(0) we get existence of a minimum from
Theorem 9.21. However this minimum is just qA (0) = 0 which is not very
interesting. In order to obtain a minimal eigenvalue we would need to take
M = S1 = {f | ∥f ∥ = 1}, however, this set is not weakly closed (its weak
closure is B̄1 (0) as we will see in the next section). In fact, as pointed out
before, the minimum is in general not attained on M in this case.
Note that our problem with the trivial minimum at 0 would also dis-
appear if we would search for a maximum instead. However, our lemma
above only guarantees us weak sequential lower semicontinuity but not weak sequential upper semicontinuity. In fact, note that not even the norm (the quadratic form of the identity) is weakly sequentially upper semicontinuous (cf. Lemma 4.29 (ii) versus Lemma 4.30). If we make the additional assumption
that A is compact, then qA is weakly sequentially continuous as can be seen
from Theorem 4.33. Hence for compact operators the maximum is attained
at some vector f0 . Of course we will have ∥f0 ∥ = 1 but is it an eigenvalue?
To see this we resort to a small ruse: Consider the real function
$$\phi(t) = \frac{q_A(f_0 + tf)}{\|f_0 + tf\|^2} = \frac{\alpha_0 + 2t\,\mathrm{Re}\langle f, Af_0\rangle + t^2 q_A(f)}{1 + 2t\,\mathrm{Re}\langle f, f_0\rangle + t^2\|f\|^2}, \qquad \alpha_0 = q_A(f_0),$$
which has a maximum at t = 0 for any f ∈ H. Hence we must have ϕ′(0) = 2Re⟨f, (A − α₀)f₀⟩ = 0 for all f ∈ H. Replacing f → if we get 2Im⟨f, (A − α₀)f₀⟩ = 0 and hence ⟨f, (A − α₀)f₀⟩ = 0 for all f, that is Af₀ = α₀f₀. So
we have recovered Theorem 3.6. ⋄
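In finite dimensions (where every operator is compact) this argument can be replayed numerically; the symmetric matrix below is an assumed example with largest eigenvalue 3 + √3. Maximizing the quadratic form over the unit sphere by power iteration indeed lands on an eigenvector:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # symmetric; eigenvalues 3, 3 +/- sqrt(3)

f = np.ones(3)                             # start not orthogonal to the maximizer
for _ in range(300):                       # power iteration drives q_A(f)/||f||^2 up
    f = A @ f
    f /= np.linalg.norm(f)

alpha0 = f @ A @ f                         # q_A at the (approximate) maximizer f0
```

The maximizer satisfies Af₀ = α₀f₀ with α₀ the maximal eigenvalue, exactly as the ruse with ϕ above predicts.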
Example 9.27. The classical Poisson problem6 asks for a solution of
−∆u = f
on X := L²_R(Rⁿ) and set F(u) = ∞ if u ∉ H¹_R(Rⁿ) ∩ L³_R(Rⁿ). We also choose M := X. One checks that for u ∈ H¹_R(Rⁿ) ∩ L³_R(Rⁿ) and ϕ ∈ Cc∞(Rⁿ) this
functional has a variational derivative
$$\delta F(u, \phi) = \int_{\mathbb{R}^n} \big(\partial u \cdot \partial\phi + (|u|u + u - f)\phi\big)\,d^n x = 0$$
which coincides with the weak formulation of our problem. Hence a minimizer (which is necessarily in H¹_R(Rⁿ) ∩ L³_R(Rⁿ)) is a weak solution of our nonlinear elliptic problem and it remains to show existence of a minimizer.
First of all note that
$$F(u) \ge \frac{1}{2}\|u\|_2^2 - \|u\|_2\|f\|_2 \ge \frac{1}{4}\|u\|_2^2 - \|f\|_2^2$$
and hence F is coercive. To see that it is weakly sequentially lower semicontinuous, observe that for the first term this follows from convexity, for the second term this follows from Example 9.23 and the last two are easy. Hence the claim follows. ⋄
If we look at Example 9.28 in the case f = 0, our approach will only
give us the trivial solution. In fact, for a linear problem one has nontriv-
ial solutions for the homogenous problem only at an eigenvalue. Since the
Laplace operator has no eigenvalues on Rn (as is not hard to see using the
Fourier transform), we look at a bounded domain U instead. To avoid the
trivial solution we will add a constraint. Of course the natural constraint
is to require admissible elements to be normalized. However, since the unit
sphere is not weakly closed (its weak closure is the unit ball — see Exam-
ple 6.10), we cannot simply add this requirement to M . To overcome this
problem we will use that another way of getting weak sequential closedness
is via compactness:
Proof. This follows from Theorem 4.33 since every weakly convergent se-
quence in X is convergent in Y . □
Proof. If x = F (x) and x̃ = F (x̃), then |x − x̃| = |F (x) − F (x̃)| ≤ θ|x − x̃|
shows that there can be at most one fixed point.
Concerning existence, fix x0 ∈ C and consider the sequence xn = F n (x0 ).
We have
|xn+1 − xn | ≤ θ|xn − xn−1 | ≤ · · · ≤ θn |x1 − x0 |
and hence by the triangle inequality (for n > m)
$$|x_n - x_m| \le \sum_{j=m+1}^{n} |x_j - x_{j-1}| \le \theta^m \sum_{j=0}^{n-m-1} \theta^j\, |x_1 - x_0| \le \frac{\theta^m}{1-\theta}\,|x_1 - x_0|. \qquad (9.41)$$
Thus xn is Cauchy and tends to a limit x. Moreover,
|F (x) − x| = lim |xn+1 − xn | = 0
n→∞
shows that x is a fixed point and the estimate (9.40) follows after taking the
limit m → ∞ in (9.41). □
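The iteration and the a priori estimate (9.41) can be traced on a concrete example (an assumption, not from the book): F(x) = cos(x) maps C = [0.5, 1] into itself and is a contraction there with θ = sin(1) ≈ 0.84.

```python
import math

theta = math.sin(1.0)                     # contraction constant of cos on [0.5, 1]
xs = [1.0]                                # x0
for _ in range(100):
    xs.append(math.cos(xs[-1]))           # x_{n+1} = F(x_n)

xstar = xs[-1]                            # fixed point of cos(x) = x (~0.739085)
bounds = [theta**m / (1 - theta) * abs(xs[1] - xs[0]) for m in range(len(xs))]
```

Each distance |x_m − x| stays below the a priori bound θ^m/(1−θ)|x₁ − x₀| obtained by letting n → ∞ in (9.41).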
Note that our proof is constructive, since it shows that the solution ξ(y)
can be obtained by iterating x − (∂x F (x0 , y0 ))−1 (F (x, y) − F (x0 , y0 )).
Moreover, as a corollary of the implicit function theorem we also obtain
the inverse function theorem.
Theorem 9.31 (Inverse function). Suppose F ∈ C r (U, Y ), r ≥ 1, U ⊆
X, and let dF (x0 ) be an isomorphism for some x0 ∈ U . Then there are
neighborhoods U1 , V1 of x0 , F (x0 ), respectively, such that F ∈ C r (U1 , V1 ) is
a diffeomorphism.
Problem 9.15. Derive Newton’s method for finding the zeros of a twice
continuously differentiable function f (x),
f (x)
xn+1 = F (xn ), F (x) = x − ,
f ′ (x)
from the contraction principle by showing that if x is a zero with f ′ (x) ̸=
0, then there is a corresponding closed interval C around x such that the
assumptions of Theorem 9.27 are satisfied.
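A minimal sketch of the iteration from Problem 9.15 (the function f(x) = x² − 2 and the starting point are assumptions for illustration):

```python
# Newton iteration x_{n+1} = F(x_n) with F(x) = x - f(x)/f'(x).
def newton(f, fp, x, steps=20):
    for _ in range(steps):
        x = x - f(x) / fp(x)
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Since F′ vanishes at a simple zero of f, the iteration contracts (in fact quadratically) once it is close enough, in line with Theorem 9.27.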
Proof. Fix x0 ∈ C(I, U ) and ε > 0. For each t ∈ I we have a δ(t) > 0
such that B2δ(t) (x0 (t)) ⊂ U and |f (x) − f (x0 (t))| ≤ ε/2 for all x with
|x − x0 (t)| ≤ 2δ(t). The balls Bδ(t) (x0 (t)), t ∈ I, cover the set {x0 (t)}t∈I
and since I is compact, there is a finite subcover Bδ(tj ) (x0 (tj )), 1 ≤ j ≤
n. Let ∥x − x0 ∥ ≤ δ := min1≤j≤n δ(tj ). Then for each t ∈ I there is
a tj such that |x0 (t) − x0 (tj )| ≤ δ(tj ) and hence |f (x(t)) − f (x0 (t))| ≤
|f (x(t)) − f (x0 (tj ))| + |f (x0 (tj )) − f (x0 (t))| ≤ ε since |x(t) − x0 (tj )| ≤
|x(t) − x0 (t)| + |x0 (t) − x0 (tj )| ≤ 2δ(tj ). This settles the case r = 0.
Next let us turn to r = 1. We claim that df∗ is given by (df∗ (x0 )x)(t) :=
df (x0 (t))x(t). To show this we use Taylor’s theorem (cf. the proof of Corol-
lary 9.14) to conclude that
|f (x0 (t) + x) − f (x0 (t)) − df (x0 (t))x| ≤ |x| sup ∥df (x0 (t) + sx) − df (x0 (t))∥.
0≤s≤1
By the first part (df )∗ is continuous and hence for a given ε we can find a
corresponding δ such that |x(t) − y(t)| ≤ δ implies ∥df (x(t)) − df (y(t))∥ ≤ ε
and hence ∥df (x0 (t) + sx) − df (x0 (t))∥ ≤ ε for |x0 (t) + sx − x0 (t)| ≤ |x| ≤ δ.
But this shows differentiability of f∗ as required and it remains to show that
df∗ is continuous. To see this we use the linear map
λ : C(I, L (X, Y )) → L (C(I, X), C(I, Y )) ,
T 7→ T∗
where (T∗ x)(t) := T (t)x(t). Since we have
∥T∗ x∥ = sup |T (t)x(t)| ≤ sup ∥T (t)∥|x(t)| ≤ ∥T ∥∥x∥,
t∈I t∈I
Now we come to our existence and uniqueness result for the initial value
problem in Banach spaces.
Theorem 9.33. Let I be an open interval, U an open subset of a Banach
space X and Λ an open subset of another Banach space. Suppose F ∈ C r (I ×
U × Λ, X), r ≥ 1, then the initial value problem
ẋ = F (t, x, λ), x(t0 ) = x0 , (t0 , x0 , λ) ∈ I × U × Λ, (9.49)
has a unique solution x(t, t0 , x0 , λ) ∈ C r (I1 × I2 × U1 × Λ1 , U ), where I1,2 ,
U1 , and Λ1 are open subsets of I, U , and Λ, respectively. The sets I2 , U1 ,
and Λ1 can be chosen to contain any point t0 ∈ I, x0 ∈ U , and λ0 ∈ Λ,
respectively.
ẋ = Ax, x(0) = x₀,
has the solution
x(t) = exp(tA)x₀,
where
$$\exp(tA) := \sum_{k=0}^{\infty} \frac{t^k}{k!} A^k.$$
It is easy to check that the last series converges absolutely (cf. also Prob-
lem 1.50) and solves the differential equation (Problem 9.17). ⋄
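The series can be evaluated directly for a matrix; in the sketch below (the rotation generator and the truncation order are assumptions) the truncated sum reproduces the rotation matrix, the well-known flow of ẋ = Ax in R²:

```python
import numpy as np

def exp_series(A, t, terms=30):
    # truncated exp(tA) = sum_k (tA)^k / k!
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ (t * A) / k          # next term (tA)^k / k!
        out = out + term
    return out

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])               # generator of rotations in R^2
t = 1.0
E = exp_series(A, t)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
```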
Example 9.34. The classical example ẋ = x², x(0) = x₀, in X := R with solution
$$x(t) = \frac{x_0}{1 - x_0 t}, \qquad t \in \begin{cases} (-\infty, 1/x_0), & x_0 > 0,\\ \mathbb{R}, & x_0 = 0,\\ (1/x_0, \infty), & x_0 < 0, \end{cases}$$
shows that solutions might not exist for all t ∈ R even though the differential
equation is defined for all t ∈ R. ⋄
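A crude numerical integration (the step size and stopping time are assumptions) tracks the exact solution for x₀ = 1 and shows the growth as t approaches the blow-up time 1/x₀ = 1:

```python
x0, dt = 1.0, 1e-4
f = lambda x: x * x                       # right-hand side of x' = x^2

x, t = x0, 0.0
while t < 0.9:                            # stop safely before the blow-up at t = 1
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    x += dt / 6.0 * (k1 + 2*k2 + 2*k3 + k4)   # one classical RK4 step
    t += dt

exact = x0 / (1.0 - x0 * t)               # closed-form solution
```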
This raises the question about the maximal interval on which a solution
of the initial value problem (9.49) can be defined.
Suppose that solutions of the initial value problem (9.49) exist locally and
are unique (as guaranteed by Theorem 9.33). Let ϕ1 , ϕ2 be two solutions
of (9.49) defined on the open intervals I1 , I2 , respectively. Let I := I1 ∩
I2 = (T− , T+ ) and let (t− , t+ ) be the maximal open interval on which both
solutions coincide. I claim that (t− , t+ ) = (T− , T+ ). In fact, if t+ < T+ ,
both solutions would also coincide at t+ by continuity. Next, considering the
initial value problem with initial condition x(t+ ) = ϕ1 (t+ ) = ϕ2 (t+ ) shows
that both solutions coincide in a neighborhood of t+ by local uniqueness.
This contradicts maximality of t+ and hence t+ = T+ . Similarly, t− = T− .
Moreover, we get a solution
$$\phi(t) := \begin{cases} \phi_1(t), & t \in I_1,\\ \phi_2(t), & t \in I_2, \end{cases} \qquad (9.52)$$
defined on I1 ∪ I2 . In fact, this even extends to an arbitrary number of
solutions and in this way we get a (unique) solution defined on some maximal
interval.
Theorem 9.34. Suppose the initial value problem (9.49) has a unique local
solution (e.g. the conditions of Theorem 9.33 are satisfied). Then there ex-
ists a unique maximal solution defined on some maximal interval I(t0 ,x0 ) =
(T− (t0 , x0 ), T+ (t0 , x0 )).
Proof. Let S be the set of all solutions ϕ of (9.49) which are defined on an open interval Iϕ. Let I := ∪_{ϕ∈S} Iϕ, which is again open. Moreover, if
t1 > t0 ∈ I, then t1 ∈ Iϕ for some ϕ and thus [t0 , t1 ] ⊆ Iϕ ⊆ I. Similarly for
t1 < t0 and thus I is an open interval containing t0 . In particular, it is of the
form I = (T− , T+ ). Now define ϕmax (t) on I by ϕmax (t) := ϕ(t) for some
ϕ ∈ S with t ∈ Iϕ . By our above considerations any two ϕ will give the same
value, and thus ϕmax (t) is well-defined. Moreover, for every t1 > t0 there is
some ϕ ∈ S such that t1 ∈ Iϕ and ϕmax (t) = ϕ(t) for t ∈ (t0 − ε, t1 + ε) which
shows that ϕmax is a solution. By construction there cannot be a solution
defined on a larger interval. □
The solution found in the previous theorem is called the maximal so-
lution. A solution defined for all t ∈ R is called a global solution. Clearly
every global solution is maximal.
The next result gives a simple criterion for a solution to be global.
Lemma 9.35. Suppose F ∈ C 1 (R×X, X) and let x(t) be a maximal solution
of the initial value problem (9.49). Suppose |F (t, x(t))| is bounded on finite
t-intervals. Then x(t) is a global solution.
Proof. Let (T− , T+ ) be the domain of x(t) and suppose T+ < ∞. Then
|F (t, x(t))| ≤ C for t ∈ (t0 , T+ ) and for t0 < s < t < T+ we have
|x(t) − x(s)| ≤ ∫_s^t |ẋ(τ)| dτ = ∫_s^t |F(τ, x(τ))| dτ ≤ C|t − s|.
Thus x(tn ) is Cauchy whenever tn is and hence limt→T+ x(t) = x+ exists.
Now let y(t) be the solution satisfying the initial condition y(T+ ) = x+ .
Then
x̃(t) := x(t) for t < T+ ,  y(t) for t ≥ T+ ,
is a larger solution contradicting maximality of T+ . □
Example 9.35. Finally, we want to apply this to a famous example, the so-
called FPU lattices (after Enrico Fermi, John Pasta, and Stanislaw Ulam
who investigated such systems numerically). This is a simple model of a
linear chain of particles coupled via nearest neighbor interactions. Let us
assume for simplicity that all particles are identical and that the interaction
is described by a potential V ∈ C 2 (R). Then the equation of motions are
given by
q̈n (t) = V ′ (qn+1 − qn ) − V ′ (qn − qn−1 ), n ∈ Z,
where qn (t) ∈ R denotes the position of the n’th particle at time t ∈ R and
the particle index n runs through all integers. If the potential is quadratic,
V(r) = (k/2) r², then we get the discrete linear wave equation
q̈n(t) = k (qn+1(t) − 2qn(t) + qn−1(t)).
If we use the fact that the Jacobi operator Aqn = −k(qn+1 − 2qn + qn−1) is
a bounded operator in X = ℓᵖ_R(Z) we can easily solve this system as in the
case of ordinary differential equations. In fact, if q 0 = q(0) and p0 = q̇(0)
are the initial conditions then one can easily check (cf. Problem 9.17) that
the solution is given by
q(t) = cos(tA^{1/2}) q0 + (sin(tA^{1/2})/A^{1/2}) p0 .
In the Hilbert space case p = 2 these functions of our operator A could
be defined via the spectral theorem but here we just use the more direct
definition
cos(tA^{1/2}) := Σ_{k=0}^∞ (−1)^k t^{2k}/(2k)! A^k ,   sin(tA^{1/2})/A^{1/2} := Σ_{k=0}^∞ (−1)^k t^{2k+1}/(2k + 1)! A^k .
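For a finite truncation of the chain these operator series can be checked numerically. The following sketch (not from the text; the truncation size N = 20, coupling k = 1, and time t = 1 are ad-hoc choices) compares the power series for cos(tA^{1/2}) with its spectral evaluation:

```python
import numpy as np

# Sketch (not from the text): compare the power series for cos(t A^(1/2)) with
# the spectral definition on a finite N x N truncation of the Jacobi operator.
N, kc, t = 20, 1.0, 1.0                      # ad-hoc truncation, coupling, time
S = np.eye(N, k=1)                           # shift matrix
A = kc*(2*np.eye(N) - S - S.T)               # truncated A, positive semidefinite

# partial sums of cos(t A^(1/2)) = sum_k (-1)^k t^(2k) A^k / (2k)!
C_series, term = np.zeros((N, N)), np.eye(N)
for j in range(40):
    C_series += term
    term = term @ (-(t**2)*A) / ((2*j + 1)*(2*j + 2))

# spectral evaluation via the eigendecomposition A = V diag(w) V^T
w, V = np.linalg.eigh(A)
C_spec = V @ np.diag(np.cos(t*np.sqrt(np.clip(w, 0.0, None)))) @ V.T

print(np.max(np.abs(C_series - C_spec)))     # agreement up to rounding errors
```

Since ∥A∥ ≤ 4k, the series terms decay factorially and 40 terms are far more than enough here.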
In the general case an explicit solution is no longer possible but we are still
able to show global existence under appropriate conditions. To this end
we will assume that V has a global minimum at 0 and hence looks like
V(r) = V(0) + (k/2) r² + o(r²). As V(0) does not enter our differential equation
Now split our equation into a system of two equations according to the above
splitting of the underlying Banach space:
F (µ, x) = 0 ⇔ F1 (µ, u, v) = 0, F2 (µ, u, v) = 0, (9.56)
where x = u+v with u = P x ∈ Ker(A), v = (1−P )x ∈ X0 and F1 (µ, u, v) =
Q F (µ, u + v), F2 (µ, u, v) = (1 − Q)F (µ, u + v).
Since P, Q are bounded, this system is still C 1 and the derivatives are
given by (recall the block structure of A from (7.23))
∂u F1 (µ0 , 0, 0) = 0, ∂v F1 (µ0 , 0, 0) = 0,
∂u F2 (µ0 , 0, 0) = 0, ∂v F2 (µ0 , 0, 0) = A0 . (9.57)
Moreover, since A0 is an isomorphism, the implicit function theorem tells us
that we can (locally) solve F2 for v. That is, there exists a neighborhood U
of (µ0 , 0) ∈ R × Ker(A) and a unique function ψ ∈ C 1 (U, X0 ) such that
F2 (µ, u, ψ(µ, u)) = 0, (µ, u) ∈ U. (9.58)
In particular, by the uniqueness part we have ψ(µ, 0) = 0. Moreover,
∂u ψ(µ0, 0) = −A0⁻¹ ∂u F2(µ0, 0, 0) = 0.
Plugging this into the first equation reduces the original system to the
finite dimensional system
F̃1 (µ, u) = F1 (µ, u, ψ(µ, u)) = 0. (9.59)
Of course the chain rule tells us that F̃1 ∈ C¹. Moreover, we still have
F̃1(µ, 0) = F1(µ, 0, ψ(µ, 0)) = QF(µ, 0) = 0 as well as
∂u F̃1 (µ0 , 0) = ∂u F1 (µ0 , 0, 0) + ∂v F1 (µ0 , 0, 0)∂u ψ(µ0 , 0) = 0. (9.60)
This is known as Lyapunov–Schmidt reduction.
Now that we have reduced the problem to a finite-dimensional system,
it remains to find conditions such that the finite dimensional system has a
nontrivial solution. For simplicity we make the requirement
dim Ker(A) = dim Coker(A) = 1 (9.61)
such that we actually have a problem in R × R → R.
Explicitly, let u0 span Ker(A) and let u1 span X1 . Then we can write
F̃1 (µ, λu0 ) = f (µ, λ)u1 , (9.62)
where f ∈ C 1 (V, R) with V = {(µ, λ)|(µ, λu0 ) ∈ U } ⊆ R2 a neighborhood of
(µ0 , 0). Of course we still have f (µ, 0) = 0 for (µ, 0) ∈ V as well as
∂λ f (µ0 , 0)u1 = ∂u F̃1 (µ0 , 0)u0 = 0. (9.63)
It remains to investigate f . To split off the trivial solution it suggests itself
to write
f (µ, λ) = λ g(µ, λ) (9.64)
We already have
g(µ0 , 0) = ∂λ f (µ0 , 0) = 0 (9.65)
and hence if
∂µ g(µ0, 0) = ∂µ ∂λ f(µ0, 0) ≠ 0   (9.66)
the implicit function theorem implies existence of a function µ(λ) with µ(0) =
µ0 and g(µ(λ), λ) = 0. Moreover,
µ′(0) = − ∂λ g(µ0, 0)/∂µ g(µ0, 0) = − ∂λ² f(µ0, 0)/(2 ∂µ ∂λ f(µ0, 0)).
Note that if Q∂x2 F (µ0 , 0)(u0 , u0 ) ̸= 0 we could have also solved for λ
obtaining a function λ(µ) with λ(µ0 ) = 0. However, in this case it is not
obvious that λ(µ) ̸= 0 for µ ̸= µ0 , and hence that we get a nontrivial
solution, unless we also require Q∂µ ∂x F (µ0 , 0)u0 ̸= 0 which brings us back
to our previous condition. If both conditions are met, then µ′ (0) ̸= 0 and
there is a unique nontrivial solution x(µ) which crosses the trivial solution
non transversally at µ0 . This is known as transcritical bifurcation. If
µ′ (0) = 0 but µ′′ (0) ̸= 0 (assuming this derivative exists), then two solutions
will branch off (either for µ > µ0 or for µ < µ0 depending on the sign of
the second derivative). This is known as a pitchfork bifurcation and is
typical in case of an odd function F (µ, −x) = −F (µ, x).
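A toy one-dimensional illustration (not an example from the text): F(µ, x) = µx − x³ is odd in x, and the two branches x = ±√µ of a pitchfork branch off the trivial solution at µ0 = 0:

```python
import numpy as np

# Toy pitchfork (an illustration, not from the text):
# F(mu, x) = mu*x - x**3 is odd in x; nontrivial zeros x = ±sqrt(mu) exist
# only for mu > 0, branching off the trivial solution at mu0 = 0.
def nontrivial_zeros(mu):
    r = np.roots([-1.0, 0.0, mu, 0.0])       # zeros of -x^3 + mu*x
    r = r[np.abs(r.imag) < 1e-12].real       # keep real roots
    return sorted(float(x) for x in r if abs(x) > 1e-9)

print(nontrivial_zeros(-1.0))    # no nontrivial solutions for mu < 0
print(nontrivial_zeros(0.25))    # the two pitchfork branches ±sqrt(0.25)
```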
Example 9.38. Now we can establish existence of a stationary solution of
the dNLS of the form
un (t) = e−itω ϕn (ω)
Plugging this ansatz into the dNLS we get the stationary dNLS
Hϕ − ωϕ ± |ϕ|2p ϕ = 0.
Of course we always have the trivial solution ϕ = 0.
Applying our analysis to
F(ω, ϕ) = (H − ω)ϕ ± |ϕ|^{2p} ϕ,   p > 1/2,
we have (with respect to ϕ = ϕr + iϕi ≅ (ϕr, ϕi))
∂ϕ F(ω, ϕ)u = (H − ω)u ± 2p|ϕ|^{2(p−1)} ( ϕr²  ϕr ϕi ; ϕr ϕi  ϕi² ) u ± |ϕ|^{2p} u
and in particular ∂ϕ F (ω, 0) = H −ω and hence ω must be an eigenvalue of H.
In fact, if ω0 is a discrete eigenvalue, then self-adjointness implies that H −ω0
is Fredholm of index zero. Moreover, if there are two eigenfunction u and v,
then one checks that the Wronskian W (u, v) = u(n)v(n+1)−u(n+1)v(n) is
constant. But square summability implies that the Wronskian must vanish
and hence u and v must be linearly dependent (note that a solution of Hu =
ω0 u vanishing at two consecutive points must vanish everywhere). Hence
eigenvalues are always simple for our Jacobi operator H. Finally, if u0 is the
eigenfunction corresponding to ω0 we have
∂ω ∂ϕ F (ω0 , 0)u0 = −u0 ̸∈ Ran(H − ω0 ) = Ker(H − ω0 )⊥
and the Crandall–Rabinowitz theorem ensures existence of a stationary so-
lution ϕ for ω in a neighborhood of ω0 . Note that
∂ϕ2 F (ω, ϕ)(u, v) = ±2p(2p + 1)|ϕ|2p−1 sign(ϕ)uv
and hence ∂ϕ2 F (ω, 0) = 0. This is of course not surprising and related to
the symmetry F (ω, −ϕ) = −F (ω, ϕ) which implies that zeros branch off in
symmetric pairs.
Of course this leaves the problem of finding a discrete eigenvalue open.
One can show that for the free operator H0 (with q = 0) the spectrum
is σ(H0 ) = [−2, 2] and that there are no eigenvalues (in fact, the discrete
Fourier transform will map H0 to a multiplication operator in L2 (−π, π)).
10.1. Introduction
Many applications lead to the problem of finding zeros of a mapping f : U ⊆
X → X, where X is some (real) Banach space. That is, we are interested in
the solutions of
f (x) = 0, x ∈ U. (10.1)
In most cases it turns out that this is too much to ask for, since determining
the zeros analytically is in general impossible.
Hence one has to ask some weaker questions and hope to find answers
for them. One such question would be “Are there any solutions, respectively,
how many are there?”. Luckily, these questions allow some progress.
To see how, let us consider the case f ∈ H(C), where H(U) denotes the set
of holomorphic functions on a domain U ⊆ C. Recall the concept of the
winding number from complex analysis. The winding number of a path
γ : [0, 1] → C \ {z0 } around a point z0 ∈ C is defined by
n(γ, z0) := (1/2πi) ∫_γ dz/(z − z0) ∈ Z.   (10.2)
It gives the number of times γ encircles z0 taking orientation into account.
That is, encirclings in opposite directions are counted with opposite signs.
In particular, if we pick f ∈ H(C) one computes (assuming 0 ̸∈ f (γ))
n(f(γ), 0) = (1/2πi) ∫_γ (f′(z)/f(z)) dz = Σ_k n(γ, zk) αk ,   (10.3)
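The integral (10.3) is easy to evaluate numerically. A sketch (the test function f(z) = z² − 1/4 and the grid size are ad-hoc choices); its two zeros ±1/2 lie inside the unit circle, so the winding number of f(γ) around 0 is 2:

```python
import numpy as np

# Numerical sketch of (10.3): f(z) = z^2 - 1/4 has zeros ±1/2 inside the unit
# circle, each of multiplicity one, so n(f(gamma), 0) = 2.
theta = np.linspace(0.0, 2*np.pi, 4001)
z = np.exp(1j*theta)                 # gamma: the unit circle
dz = 1j*z                            # dz/dtheta
integrand = (2*z)/(z**2 - 0.25)*dz   # f'(z)/f(z) * dz/dtheta

# Riemann sum over the periodic grid (spectrally accurate here)
dtheta = theta[1] - theta[0]
n = np.sum(integrand[:-1])*dtheta/(2j*np.pi)
print(n.real)                        # ≈ 2
```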
(1 − t)f (z) + t g(z) is the required homotopy since |f (z) − g(z)| < |g(z)|,
|z| = 1, implying H(t, z) ̸= 0 on [0, 1] × γ. Hence f (z) has one zero inside
the unit circle.
Summarizing, given a (sufficiently smooth) domain U with enclosing Jor-
dan curve ∂U , we have defined a degree deg(f, U, z0 ) = n(f (∂U ), z0 ) =
n(f (∂U ) − z0 , 0) ∈ Z which counts the number of solutions of f (z) = z0
inside U . The invariance of this degree with respect to certain deformations
of f allowed us to explicitly compute deg(f, U, z0 ) even in nontrivial cases.
Our ultimate goal is to extend this approach to continuous functions
f : Rn → Rn . However, such a generalization runs into several problems.
First of all, it is unclear how one should define the multiplicity of a zero. But
even more severe is the fact, that the number of zeros is unstable with respect
to small perturbations. For example, consider fε : [−1, 2] → R, x 7→ x2 − ε.
Then fε has no zeros for ε < 0, one zero for ε = 0, two zeros for 0 < ε ≤ 1,
one for 1 < ε ≤ 4, and none for ε > 4. This shows the following facts.
(i) Zeros with f ′ ̸= 0 are stable under small perturbations.
(ii) The number of zeros can change if two zeros of opposite sign (i.e.,
opposite signs of f′) run into each other.
(iii) The number of zeros can change if a zero drops over the boundary.
Hence we see that we cannot expect too much from our degree. In addition,
since it is unclear how it should be defined, we will first require some basic
properties a degree should have and then we will look for functions satisfying
these properties.
is positive for f ∈ C̄y (U, Rn ) and thus C̄y (U, Rn ) is an open subset of
C(U , Rn ).
Now that these things are out of the way, we come to the formulation of
the requirements for our degree.
A function deg which assigns each f ∈ C̄y (U, Rn ), y ∈ Rn , a real number
deg(f, U, y) will be called degree if it satisfies the following conditions.
(D1). deg(f, U, y) = deg(f − y, U, 0) (translation invariance).
(D2). deg(I, U, y) = 1 if y ∈ U (normalization).
(D3). If U1,2 are open, disjoint subsets of U such that y ̸∈ f (U \(U1 ∪U2 )),
then deg(f, U, y) = deg(f, U1 , y) + deg(f, U2 , y) (additivity).
(D4). If H(t) = (1 − t)f + tg ∈ C̄y (U, Rn ), t ∈ [0, 1], then deg(f, U, y) =
deg(g, U, y) (homotopy invariance).
Before we draw some first conclusions from this definition, let us discuss
the properties (D1)–(D4) first. (D1) is natural since deg(f, U, y) should
have something to do with the solutions of f (x) = y, x ∈ U , which is the
same as the solutions of f (x) − y = 0, x ∈ U . (D2) is a normalization
since any multiple of deg would also satisfy the other requirements. (D3)
is also quite natural since it requires deg to be additive with respect to
components. In addition, it implies that sets where f ̸= y do not contribute.
(D4) is not that natural since it already rules out the case where deg is the
cardinality of f −1 ({y}). On the other hand it will give us the ability to
compute deg(f, U, y) in several cases.
Theorem 10.1. Suppose deg satisfies (D1)–(D4) and let f, g ∈ C̄y (U, Rn ),
then the following statements hold.
(i). We have deg(f, ∅, y) = 0. Moreover, if Ui , 1 ≤ i ≤ N, are disjoint
open subsets of U such that y ∉ f(U \ ⋃_{i=1}^N Ui), then deg(f, U, y) =
Σ_{i=1}^N deg(f, Ui , y).
(ii). If y ̸∈ f (U ), then deg(f, U, y) = 0 (but not the other way round).
Equivalently, if deg(f, U, y) ̸= 0, then y ∈ f (U ).
(iii). If |f (x)−g(x)| < |f (x)−y|, x ∈ ∂U , then deg(f, U, y) = deg(g, U, y).
In particular, this is true if f (x) = g(x) for x ∈ ∂U .
Proof. For the first part of (i) use (D3) with U1 = U and U2 = ∅. For
the second part use U2 = ∅ in (D3) if N = 1 and the rest follows from
induction. For (ii) use N = 1 and U1 = ∅ in (i). For (iii) note that H(t, x) =
(1 − t)f (x) + t g(x) satisfies |H(t, x) − y| ≥ dist(y, f (∂U )) − |f (x) − g(x)| for
x on the boundary. □
Proof. For (i) it suffices to show that deg(., U, y) is locally constant. But
if |g − f | < dist(y, f (∂U )), then deg(f, U, y) = deg(g, U, y) by (D4) since
|H(t) − y| ≥ |f − y| − |g − f | > 0, H(t) = (1 − t)f + t g. The proof of (ii) is
similar.
deg(f, U, 0) = Σ_{i=1}^N deg(f, U(xi), 0).   (10.9)
It suffices to consider one of the zeros, say x1 . Moreover, we can even assume
x1 = 0 and U (x1 ) = Bδ (0). Next we replace f by its linear approximation
around 0. By the definition of the derivative we have
f (x) = df (0)x + |x|r(x), r ∈ C(Bδ (0), Rn ), r(0) = 0. (10.10)
Now consider the homotopy H(t, x) = df (0)x + (1 − t)|x|r(x). In order
to conclude deg(f, Bδ (0), 0) = deg(df (0), Bδ (0), 0) we need to show 0 ̸∈
H(t, ∂Bδ (0)). Since Jf (0) ̸= 0 we can find a constant λ such that |df (0)x| ≥
λ|x| and since r(0) = 0 we can decrease δ such that |r| < λ. This implies
|H(t, x)| ≥ ||df (0)x| − (1 − t)|x||r(x)|| ≥ λδ − δ|r| > 0 for x ∈ ∂Bδ (0) as
desired.
In summary we have
deg(f, U, 0) = Σ_{i=1}^N deg(df(xi), Bδ(0), 0)   (10.11)
Using this lemma we can now show the main result of this section.
Theorem 10.4. Suppose f ∈ C̄y1 (U, Rn ) and y ̸∈ CV(f ), then a degree
satisfying (D1)–(D4) satisfies
deg(f, U, y) = Σ_{x∈f⁻¹({y})} sign Jf(x),   (10.12)
where the sum is finite and we agree to set Σ_{x∈∅} = 0.
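As a concrete sketch of (10.12) (an illustration, not from the text), take the planar map f(x1, x2) = (x1² − x2², 2x1x2), i.e. z ↦ z² on C ≅ R², with U = B2(0) and the regular value y = (1, 0), whose two preimages are known analytically:

```python
import numpy as np

# Sketch of (10.12): f(x1,x2) = (x1^2 - x2^2, 2 x1 x2) (the map z -> z^2),
# U = B_2(0), regular value y = (1, 0) with f^{-1}({y}) = {(1,0), (-1,0)}.
def jac_sign(x1, x2):
    J = np.array([[2*x1, -2*x2],
                  [2*x2,  2*x1]])             # df at (x1, x2)
    return int(np.sign(np.linalg.det(J)))     # sign J_f(x)

preimages = [(1.0, 0.0), (-1.0, 0.0)]         # known analytically for this f
deg = sum(jac_sign(*p) for p in preimages)
print(deg)                                    # 2, as expected for z -> z^2
```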
Up to this point we have only shown that a degree (provided there is one
at all) necessarily satisfies (10.12). Once we have shown that regular values
are dense, it will follow that the degree is uniquely determined by (10.12)
since the remaining values follow from point (iii) of Theorem 10.1. On the
other hand, we don’t even know whether a degree exists since it is unclear
whether (10.12) satisfies (D4). Hence we need to show that (10.12) can be
extended to f ∈ C̄y (U, Rn ) and that this extension satisfies our requirements
(D1)–(D4).
Proof. Since the claim is easy for linear mappings our strategy is as follows.
We divide U into sufficiently small subsets. Then we replace f by its linear
approximation in each subset and estimate the error.
Let CP(f ) := {x ∈ U |Jf (x) = 0} be the set of critical points of f . We
first pass to cubes which are easier to divide. Let {Qi }i∈N be a countable
cover for U consisting of open cubes such that Qi ⊂ U . Then it suffices
to prove that f(CP(f) ∩ Qi) has zero measure since CV(f) = f(CP(f)) =
⋃_i f(CP(f) ∩ Qi) (the Qi's are a cover).
Let Q be any one of these cubes and denote by ρ the length of its edges.
Fix ε > 0 and divide Q into N n cubes Qi of length ρ/N . These cubes don’t
have to be open and hence we can assume that they cover Q. Since df (x) is
uniformly continuous on Q we can find an N (independent of i) such that
|f(x) − f(x̃) − df(x̃)(x − x̃)| ≤ ∫_0^1 |df(x̃ + t(x − x̃)) − df(x̃)| |x − x̃| dt ≤ ερ/N
(10.13)
for x̃, x ∈ Qi . Now pick a Qi which contains a critical point x̃i ∈ CP(f ).
Without restriction we assume x̃i = 0, f (x̃i ) = 0 and set M := df (x̃i ). By
det M = 0 there is an orthonormal basis {bi }1≤i≤n of Rn such that bn is
(e.g., C := max_{x∈Q} |df(x)|). Next, by our estimate (10.13) we even have
f(Qi) ⊆ { Σ_{i=1}^n λi bi | |λi| ≤ (C + ε)√n ρ/N, |λn| ≤ ε√n ρ/N }
and hence the measure of f(Qi) is smaller than C̃ε/Nⁿ. Since there are at most
Nⁿ such cubes, the measure of f(CP(f) ∩ Q) is smaller than C̃ε. □
δy (.) is the Dirac distribution at y. But since we don’t want to mess with
distributions, we replace δy (.) by ϕε (. − y), where {ϕε }ε>0 is a family of
functions such R that ϕε is supported on the ball Bε (0) of radius ε around 0
and satisfies Rn ϕε (x)dn x = 1.
Lemma 10.6 (Heinz). Suppose f ∈ C̄y1 (U, Rn ) and y ̸∈ CV(f ). Then the
degree defined as in (10.12) satisfies
deg(f, U, y) = ∫_U ϕε(f(x) − y) Jf(x) dⁿx   (10.14)
for all positive ε smaller than a certain ε0 depending on f and y. Moreover,
supp(ϕε (f (.) − y)) ⊂ U for ε < dist(y, f (∂U )).
∫_U ϕε(f(x) − y) Jf(x) dⁿx = Σ_{i=1}^N ∫_{U(xi)} ϕε(f(x) − y) Jf(x) dⁿx
= Σ_{i=1}^N sign(Jf(xi)) ∫_{Bε₀(0)} ϕε(x̃) dⁿx̃ = deg(f, U, y),
Our new integral representation makes sense even for critical values. But
since ε0 depends on f and y, continuity is not clear. This will be tackled
next.
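In one dimension the representation (10.14) can be tested directly (a numerical sketch; f, y, ε and the grids are ad-hoc choices). For f(x) = x³ − x on U = (−2, 2) with regular value y = 0.1 there are three preimages with derivative signs +, −, +, so the degree is 1:

```python
import numpy as np

# Numerical sketch of (10.14) in one dimension: f(x) = x^3 - x on U = (-2, 2),
# regular value y = 0.1 (the critical values ±2/(3 sqrt(3)) stay far away).
eps, y = 0.05, 0.1

def phi_eps(u):
    # smooth bump supported in (-eps, eps); normalization is done numerically
    arg = 1.0 - (u/eps)**2
    return np.where(arg > 0, np.exp(-1.0/np.maximum(arg, 1e-300)), 0.0)

ug = np.linspace(-eps, eps, 20001)
Z = np.sum(phi_eps(ug))*(ug[1] - ug[0])      # makes the bump integrate to 1

x = np.linspace(-2.0, 2.0, 400001)
dx = x[1] - x[0]
f, fp = x**3 - x, 3*x**2 - 1
deg = np.sum(phi_eps(f - y)/Z*fp)*dx         # the integral in (10.14)
print(deg)                                   # ≈ 1
```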
The key idea is to show that the integral representation is independent
of ε as long as ε < dist(y, f (∂U )). To this end we will rewrite the difference
as an integral over a divergence supported in U and then apply the Gauss–
Green theorem. For this purpose the following result will be used.
Proof. We compute
div Df(u) = Σ_{j=1}^n ∂xj Df(u)j = Σ_{j,k=1}^n Df(u)j,k ,
where Df (u)j,k is the determinant of the matrix obtained from the matrix
associated with Df (u)j by applying ∂xj to the k-th column. Since ∂xj ∂xk f =
∂xk ∂xj f we infer Df (u)j,k = −Df (u)k,j , j ̸= k, by exchanging the k-th and
the j-th column. Hence
div Df(u) = Σ_{i=1}^n Df(u)i,i .
Now let Jf^{(i,j)}(x) denote the (i, j) cofactor of df(x) and recall the cofactor
expansion of the determinant: Σ_{i=1}^n Jf^{(i,j)} ∂xi fk = δj,k Jf . Using this to expand
as required. □
where f˜ ∈ C̄y1 (U, Rn ) is in the same component of C̄y (U, Rn ), say ∥f − f˜∥∞ <
dist(y, f (∂U )), such that y ∈ RV(f˜).
Proof. We will first show that our integral formula works in fact for all
ε < ρ := dist(y, f (∂U )). For this we will make some additional assumptions:
Let f ∈ C̄²(U, Rⁿ) and choose a family of functions ϕε ∈ C^∞((0, ∞)) with
supp(ϕε) ⊂ (0, ε) such that Sn ∫_0^ε ϕε(r) r^{n−1} dr = 1. Consider
Iε(f, U, y) := ∫_U ϕε(|f(x) − y|) Jf(x) dⁿx.
Then I := Iε1 − Iε2 will be of the same form but with ϕε replaced by φ :=
ϕε1 − ϕε2 , where φ ∈ C^∞((0, ∞)) with supp(φ) ⊂ (0, ρ) and ∫_0^ρ φ(r) r^{n−1} dr =
0. To show that I = 0 we will use our previous lemma with u chosen such
that div(u(x)) = φ(|x|). To this end we make the ansatz u(x) = ψ(|x|)x
such that div(u(x)) = |x|ψ ′ (|x|) + n ψ(|x|). Our requirement now leads to
an ordinary differential equation whose solution is
ψ(r) = (1/rⁿ) ∫_0^r s^{n−1} φ(s) ds.
Moreover, one checks ψ ∈ C^∞((0, ∞)) with supp(ψ) ⊂ (0, ρ). Thus our lemma
shows
I = ∫_U div Df−y(u) dⁿx
and since the integrand vanishes in a neighborhood of ∂U we can extend it
to all of Rⁿ by setting it zero outside U and choose a cube Q ⊃ U. Then
elementary coordinatewise integration gives I = ∫_Q div Df−y(u) dⁿx = 0.
(1 − t)f(x) − tx must have a zero (t0, x0) ∈ (0, 1) × ∂U and hence f(x0) =
(t0/(1 − t0)) x0 . Otherwise, if deg(f, U, 0) = −1 we can apply the same argument to
where the sum is even since for every x ∈ f −1 (0) \ {0} we also have −x ∈
f −1 (0) \ {0} as well as Jf (x) = Jf (−x).
Hence we need to reduce the general case to this one. Clearly if f ∈
C̄0 (U, Rn ) we can choose an approximating f0 ∈ C̄01 (U, Rn ) and replacing f0
by its odd part ½(f0(x) − f0(−x)) we can assume f0 to be odd. Moreover, if
Jf0 (0) = 0 we can replace f0 by f0 (x) + δx such that 0 is regular. However,
if we choose a nearby regular value y and consider f0 (x) − y we have the
problem that constant functions are even. Hence we will try the next best
thing and perturb by a function which is constant in all except one direction.
To this end we choose an odd function φ ∈ C 1 (R) such that φ′ (0) = 0 (since
we don’t want to alter the behavior at 0) and φ(t) ̸= 0 for t ̸= 0. Now we
consider f1(x) = f0(x) − φ(x1) y¹ and note
df1(x) = df0(x) − dφ(x1) y¹ = df0(x) − dφ(x1) f0(x)/φ(x1) = φ(x1) d(f0(x)/φ(x1))
for every x ∈ U1 := {x ∈ U | x1 ≠ 0} with f1(x) = 0. Hence if y¹ is chosen
such that y¹ ∈ RV(h1), where h1 : U1 → Rⁿ, x ↦ f0(x)/φ(x1), then 0 will be
At first sight the obvious conclusion that an odd function has a zero
does not seem too spectacular since the fact that f is odd already implies
f (0) = 0. However, the result gets more interesting upon observing that it
suffices when the boundary values are odd. Moreover, local constancy of the
degree implies that f does not only attain 0 but also any y in a neighborhood
of 0. The next two important consequences are based on this observation:
This theorem is often illustrated by the fact that there are always two
opposite points on the earth which have the same weather (in the sense that
they have the same temperature and the same pressure). In a similar manner
one can also derive the invariance of domain theorem.
Proof. Suppose there were such a map and extend it to a map from U to
Rn by setting the additional coordinates equal to zero. The resulting map
contradicts the invariance of domain theorem. □
f ◦ R ∈ C(B̄ρ (0), B̄ρ (0)). By our previous analysis, there is a fixed point
x = f˜(x) ∈ conv(f (K)) ⊆ K. □
Proof. We equip Rⁿ with the norm |x|₁ := Σ_{j=1}^n |xj| and set ∆ := {x ∈
Rⁿ | xj ≥ 0, |x|₁ = 1}. Then
f : ∆ → ∆,   x ↦ Ax/|Ax|₁
has a fixed point x0 by the Brouwer fixed point theorem. Then Ax0 =
|Ax0|₁ x0 and x0 has positive components since A^m x0 = |Ax0|₁^m x0 has. □
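The fixed point obtained here is a Perron eigenvector. While the proof only invokes Brouwer's theorem, for a strictly positive matrix one can also locate the fixed point by simply iterating the map (a numerical sketch; the random matrix is an arbitrary example, not from the text):

```python
import numpy as np

# Sketch (not part of the proof): for a strictly positive matrix, iterating
# f(x) = Ax/|Ax|_1 on the simplex converges to the fixed point x0, which is a
# positive eigenvector with eigenvalue |Ax0|_1.
rng = np.random.default_rng(0)
A = rng.random((4, 4)) + 0.1          # strictly positive 4x4 matrix

x = np.full(4, 0.25)                  # center of the simplex Delta
for _ in range(200):
    x = A @ x/np.abs(A @ x).sum()

lam = np.abs(A @ x).sum()             # = |A x0|_1
print(lam, x)                         # eigenvalue and positive eigenvector
```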
For each vertex vi in this subdivision pick an element yi ∈ f (vi ). Now de-
fine f k (vi ) = yi and extend f k to the interior of each subsimplex as before.
x^k = Σ_{i=1}^m λ_i^k v_i^k = Σ_{i=1}^m λ_i^k y_i^k ,   y_i^k = f^k(v_i^k).   (10.18)
If f(x) contains precisely one point for all x, then Kakutani's theorem
reduces to Brouwer's theorem (show that the closedness of Γ is equivalent
to continuity of f ).
Now we want to see how this applies to game theory.
An n-person game consists of n players who have mi possible actions to
choose from. The set of all possible actions for the i-th player will be denoted
by Φi = {1, . . . , mi }. An element φi ∈ Φi is also called a pure strategy for
reasons to become clear in a moment. Once all players have chosen their
move φi , the payoff for each player is given by the payoff function
Ri(φ) ∈ R,   φ = (φ1, . . . , φn) ∈ Φ := Φ1 × · · · × Φn   (10.19)
of the i-th player. We will consider the case where the game is repeated a
large number of times and where in each step the players choose their action
according to a fixed strategy. Here a strategy si for the i-th player is a
probability distribution on Φi , that is, si = (s_i^1, . . . , s_i^{mi}) such that s_i^k ≥ 0
and Σ_{k=1}^{mi} s_i^k = 1. The set of all possible strategies for the i-th player is
denoted by Si . The number s_i^k is the probability for the k-th pure strategy
to be chosen. Consequently, if s = (s1, . . . , sn) ∈ S := S1 × · · · × Sn is a collection
of strategies, then the probability that a given collection of pure strategies
gets chosen is
s(φ) = ∏_{i=1}^n si(φ),   si(φ) = s_i^{ki} ,   φ = (k1, . . . , kn) ∈ Φ   (10.20)
(assuming all players make their choice independently) and the expected
payoff for player i is
Ri(s) = Σ_{φ∈Φ} s(φ) Ri(φ).   (10.21)
By construction, Ri : S → R is polynomial and hence in particular continu-
ous.
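For a concrete sketch of (10.20) and (10.21) (a standard example, not taken from the text), consider two players with two actions each and zero-sum payoffs given by matching pennies:

```python
import numpy as np
from itertools import product

# Sketch of (10.20)-(10.21): matching pennies, a 2-player game with
# R1(phi) = ±1 and R2 = -R1 (a standard example, not from the text).
R1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])

def expected_payoff(R, s1, s2):
    # R(s) = sum_phi s(phi) R(phi) with s(phi) = s1[k1]*s2[k2], cf. (10.20)
    return sum(s1[k1]*s2[k2]*R[k1, k2]
               for k1, k2 in product(range(2), repeat=2))

s_mix = np.array([0.5, 0.5])                  # the mixed equilibrium strategy
print(expected_payoff(R1, s_mix, s_mix))      # 0.0
# for two players the sum collapses to the bilinear form s1 @ R @ s2
assert np.isclose(expected_payoff(R1, s_mix, s_mix), s_mix @ R1 @ s_mix)
```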
The question is of course, what is an optimal strategy for a player? If
the other strategies are known, a best reply of player i against s would be
a strategy si satisfying
Ri(s \ si) = max_{s̃i∈Si} Ri(s \ s̃i)   (10.22)
Of course, both players could get the payoff 1 if they both agree to cooperate.
But if one would break this agreement in order to increase his payoff, the
other one would get less. Hence it might be safer to defect.
Now that we have seen that Nash equilibria are a useful concept, we
want to know when such an equilibrium exists. Luckily we have the following
result.
Theorem 10.17 (Nash). Every n-person game has at least one Nash equi-
librium.
deg(g ◦ f, U, y) = Σ_j deg(f, U, Gj) deg(g, Gj , y),   (10.27)
where only finitely many terms in the sum are nonzero (and in particu-
lar, summands corresponding to unbounded components are considered to
be zero).
straightforward calculation
deg(g ◦ f, U, y) = Σ_{x∈(g◦f)⁻¹({y})} sign(J_{g◦f}(x))
= Σ_{x∈(g◦f)⁻¹({y})} sign(Jg(f(x))) sign(Jf(x))
= Σ_{z∈g⁻¹({y})} sign(Jg(z)) Σ_{x∈f⁻¹({z})} sign(Jf(x))
= Σ_{z∈g⁻¹({y})} sign(Jg(z)) deg(f, U, z)
Now choose f˜ ∈ C 1 such that |f (x) − f˜(x)| < 2−1 dist(g −1 ({y}), f (∂U ))
for x ∈ U and define G̃j , L̃l accordingly. Then we have Ll ∩ g −1 ({y}) =
L̃l ∩ g −1 ({y}) by Theorem 10.1 (iii) and hence deg(g, L̃l , y) = deg(g, Ll , y)
by Theorem 10.1 (i) implying
deg(g ◦ f, U, y) = deg(g ◦ f̃, U, y) = Σ_{j=1}^{m̃} deg(f̃, U, G̃j) deg(g, G̃j , y)
= Σ_{l≠0} l deg(g, L̃l , y) = Σ_{l≠0} l deg(g, Ll , y)
= Σ_{l≠0} Σ_{k=1}^{ml} l deg(g, G_{j_k^l} , y) = Σ_{j=1}^m deg(f, U, Gj) deg(g, Gj , y)
The Leray–Schauder
mapping degree
Our next aim is to tackle the infinite dimensional case. The following
example due to Kakutani shows that the Brouwer fixed point theorem (and
hence also the Brouwer degree) does not generalize to infinite dimensions
directly.
Example 11.1. Let X be the Hilbert space ℓ2 (N) and let R be the right
shift given by Rx := (0, x1, x2, . . . ). Define f : B̄1(0) → B̄1(0), x ↦
√(1 − ∥x∥²) δ1 + Rx = (√(1 − ∥x∥²), x1, x2, . . . ). Then a short calculation
shows ∥f(x)∥² = (1 − ∥x∥²) + ∥x∥² = 1 and any fixed point must satisfy
∥x∥ = 1, x1 = √(1 − ∥x∥²) = 0 and xj+1 = xj , j ∈ N, giving the contradiction
xj = 0, j ∈ N. ⋄
However, by the reduction property we expect that the degree should
hold for functions of the type I + F , where F has finite dimensional range.
In fact, it should work for functions which can be approximated by such
functions. Hence as a preparation we will investigate this class of functions.
Proof. Pick {xi}_{i=1}^n ⊆ K such that ⋃_{i=1}^n Bε(xi) covers K. Let {ϕi}_{i=1}^n be
a partition of unity (restricted to K) subordinate to {Bε(xi)}_{i=1}^n, that is,
ϕi ∈ C(K, [0, 1]) with supp(ϕi) ⊂ Bε(xi) and Σ_{i=1}^n ϕi(x) = 1, x ∈ K. Set
Pε(x) := Σ_{i=1}^n ϕi(x) xi ,
then
|Pε(x) − x| = |Σ_{i=1}^n ϕi(x) x − Σ_{i=1}^n ϕi(x) xi| ≤ Σ_{i=1}^n ϕi(x)|x − xi| ≤ ε. □
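A finite-dimensional sketch of this projection (not from the text; the choice of K as the unit circle in R², the centers, the radius ε, and the hat-function partition of unity are all ad-hoc):

```python
import numpy as np

# Sketch of the projection P_eps: centers x_i forming an eps-net of K (here the
# unit circle in R^2) and hat functions as the partition of unity.
eps = 0.3
angles = np.linspace(0, 2*np.pi, 40, endpoint=False)
centers = np.column_stack([np.cos(angles), np.sin(angles)])   # the x_i

def P_eps(x):
    d = np.linalg.norm(centers - x, axis=1)
    phi = np.maximum(eps - d, 0.0)        # supp(phi_i) in B_eps(x_i)
    phi = phi/phi.sum()                   # partition of unity on the net
    return phi @ centers                  # finite rank, values in conv{x_i}

# P_eps moves every point of K by at most eps, as in the estimate above
errs = [np.linalg.norm(P_eps(np.array([np.cos(a), np.sin(a)]))
                       - np.array([np.cos(a), np.sin(a)]))
        for a in np.linspace(0, 2*np.pi, 100)]
print(max(errs))                          # ≤ eps = 0.3
```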
xnm + F (xnm ) such that ynm → y. As before this implies xnm → x and thus
(I + F )−1 (K) is compact. □
Finally note that if F ∈ K(U , Y ) and G ∈ C(Y, Z), then G◦F ∈ K(U , Z)
and similarly, if G ∈ Cb (V , U ), then F ◦ G ∈ K(V , Y ).
Now we are all set for the definition of the Leray–Schauder degree, that
is, for the extension of our degree to infinite dimensional Banach spaces.
Proof. Except for (iv) all statements follow easily from the definition of the
degree and the corresponding property for the degree in finite dimensional
spaces. Considering H(t, x) − y(t), we can assume y(t) = 0 by (i). Since
H([0, 1], ∂U ) is compact, we have ρ = dist(y, H([0, 1], ∂U )) > 0. By Theo-
rem 11.2 we can pick H1 ∈ F([0, 1] × U, X) such that |H(t) − H1 (t)| < ρ,
t ∈ [0, 1]. This implies deg(I + H(t), U, 0) = deg(I + H1 (t), U, 0) and the rest
follows from Theorem 10.2. □
In addition, Theorem 10.1 and Theorem 10.2 hold for the new situation
as well (no changes are needed in the proofs).
Theorem 11.5. Let F, G ∈ K̄y (U, X), then the following statements hold.
(i). We have deg(I + F, ∅, y) = 0. Moreover, if Ui , 1 ≤ i ≤ N, are
disjoint open subsets of U such that y ∉ (I + F)(U \ ⋃_{i=1}^N Ui), then
deg(I + F, U, y) = Σ_{i=1}^N deg(I + F, Ui , y).
(ii). If y ̸∈ (I + F )(U ), then deg(I + F, U, y) = 0 (but not the other way
round). Equivalently, if deg(I + F, U, y) ̸= 0, then y ∈ (I + F )(U ).
(iii). If |F (x) − G(x)| < dist(y, (I + F )(∂U )), x ∈ ∂U , then deg(I +
F, U, y) = deg(I + G, U, y). In particular, this is true if F (x) =
G(x) for x ∈ ∂U .
(iv). deg(I + ., U, y) is constant on each component of K̄y (U, X).
(v). deg(I + F, U, .) is constant on each component of X \ (I + F )(∂U ).
In the same way as in the finite dimensional case we also obtain the
invariance of domain theorem.
Now we can extend the Brouwer fixed point theorem to infinite dimen-
sional spaces as well.
Theorem 11.9 (Schauder fixed point). Let K be a closed, convex, and
bounded subset of a Banach space X. If F ∈ K(K, K), then F has at least
one fixed point. The result remains valid if K is only homeomorphic to a
closed, convex, and bounded subset.
Proof. Consider the open cover {Bρ(x) (x)}x∈X\K for X \ K, where ρ(x) =
dist(x, K)/2. Choose a locally finite refinement {Oλ }λ∈Λ of this cover (see
Lemma B.14) and define
ϕλ(x) := dist(x, X \ Oλ) / Σ_{µ∈Λ} dist(x, X \ Oµ).
Set
F̄(x) := Σ_{λ∈Λ} ϕλ(x) F(xλ)   for x ∈ X \ K,
Theorem 11.11. Let U ⊂ X be open and bounded and let F ∈ K(U , X).
Suppose there is an x0 ∈ U such that
Corollary 11.12. Let F ∈ K(B̄ρ (0), X). Then F has a fixed point if one of
the following conditions holds.
(i) F (∂Bρ (0)) ⊆ B̄ρ (0) (Rothe).
(ii) |F (x) − x|2 ≥ |F (x)|2 − |x|2 for x ∈ ∂Bρ (0) (Altman).
(iii) X is a Hilbert space and ⟨F (x), x⟩ ≤ |x|2 for x ∈ ∂Bρ (0) (Kras-
nosel’skii).
Proof. Our strategy is to verify (11.7) with x0 = 0. (i). F (∂Bρ (0)) ⊆ B̄ρ (0)
and F (x) = αx for |x| = ρ implies |α|ρ ≤ ρ and hence (11.7) holds. (ii).
F (x) = αx for |x| = ρ implies (α − 1)2 ρ2 ≥ (α2 − 1)ρ2 and hence α ≤ 1.
(iii). Special case of (ii) since |F (x) − x|2 = |F (x)|2 − 2⟨F (x), x⟩ + |x|2 . □
≤ (b − a)ε1 + ε2 M = ε.
This implies that F (U ) is relatively compact by the Arzelà–Ascoli theorem
(Theorem B.40). Thus F is compact. □
Proof. Note that, by our assumption on λ, λF + y maps B̄ρ (y) into itself.
Now apply the Schauder fixed point theorem. □
This result immediately gives the Peano theorem for ordinary differential
equations.
and the first part follows from our previous theorem. To show the second,
fix ε > 0 and assume M (ε, ρ) ≤ M̃ (ε)(1 + ρ). Then
|x(t)| ≤ ∫_0^t |f(s, x(s))| ds ≤ M̃(ε) ∫_0^t (1 + |x(s)|) ds
where η > 0 is the viscosity constant and ρ > 0 is the density of the fluid.
In addition to the incompressibility condition ∇ · v = 0 we also require the
boundary condition v|∂U = 0, which follows from experimental observations.
In what follows we will only consider the stationary Navier–Stokes equa-
tion
0 = η∆v − ρ(v · ∇)v − ∇p + K. (11.12)
Our first step is to switch to a weak formulation and rewrite this equation
in integral form, which is more suitable for our further analysis. We pick as
underlying Hilbert space H₀¹(U, R³) with scalar product
⟨u, v⟩ = Σ_{i,j=1}^3 ∫_U (∂j ui)(∂j vi) dx.   (11.13)
Recall that by the Poincaré inequality (Theorem 7.38 from [37]) the corre-
sponding norm is equivalent to the usual one. In order to take care of the
incompressibility condition we will choose
X := {v ∈ H₀¹(U, R³) | ∇ · v = 0}   (11.14)
as our configuration space (check that this is a closed subspace of H₀¹(U, R³)).
Now we multiply (11.12) by w ∈ X and integrate over U:
∫_U (η∆v − ρ(v · ∇)v + K) · w d³x = ∫_U (∇p) · w d³x = −∫_U p (∇ · w) d³x = 0,   (11.15)
where we have used integration by parts (Lemma 7.11 from [37] (iii)) to
conclude that the pressure term drops out of our picture. Using further inte-
gration by parts we finally arrive at the weak formulation of the stationary
Navier–Stokes equation
η⟨v, w⟩ − a(v, v, w) − ∫_U K · w d³x = 0 for all w ∈ X, (11.16)
where
a(u, v, w) := ∑_{j,k=1}^3 ∫_U uk vj (∂k wj) d³x. (11.17)
In other words, (11.16) represents a necessary solvability condition for the Navier–Stokes equations, and a solution of (11.16) will also be called a weak solution. If we can show that a weak solution is in H²(U, R³), then we can undo the integration by parts and obtain (11.15) again. Since the integral on the left-hand side vanishes for all w ∈ X, one can conclude that the expression in parentheses must be the gradient of some function p ∈ L²(U, R), and hence one recovers the original equation. In particular, note that p is determined by v only up to a constant if U is connected.
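To see the pressure term of (11.15) drop out concretely, here is a numerical sanity check of mine (not from the text), using an assumed pressure p = x² + y and the divergence-free field w = (∂ψ/∂y, −∂ψ/∂x) with ψ = (xy(1−x)(1−y))², which vanishes together with its gradient on the boundary of the unit square — a two-dimensional stand-in for U:

```python
# Midpoint-rule check that the pressure term integrates to zero against a
# divergence-free field vanishing on the boundary (2-D toy version of (11.15)).
# psi = (x y (1-x)(1-y))^2, w = (psi_y, -psi_x), p = x^2 + y (all assumed).
def psi_x(x, y):
    return 2*x*(1 - x)*(1 - 2*x) * (y*(1 - y))**2

def psi_y(x, y):
    return (x*(1 - x))**2 * 2*y*(1 - y)*(1 - 2*y)

n, total = 400, 0.0
h = 1.0 / n
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5)*h, (j + 0.5)*h
        w1, w2 = psi_y(x, y), -psi_x(x, y)
        total += (2*x*w1 + 1.0*w2) * h*h   # grad p = (2x, 1)
print(abs(total) < 1e-5)
```

The exact integral is zero by the integration by parts used above; the midpoint rule reproduces this up to discretization error.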
and hence
ηv − B(v, v) = K̃. (11.23)
So in order to apply the theory from our previous chapter, we choose the Banach space Y := L⁴(U, R³) such that X ↪ Y is compact by the Rellich–Kondrachov theorem (Theorem 7.35 from [37]).
Motivated by this analysis we formulate the following theorem which
implies existence of weak solutions and uniqueness for sufficiently small outer
forces.
11.5. Applications to integral and differential equations 323
Monotone maps
Proof. Our first assumption implies that G(x) = F (x) − y satisfies G(x)x =
F (x)x − yx > 0 for |x| sufficiently large. Hence the first claim follows from
Problem 10.2. The second claim is trivial. □
Proof. Set
G(x) := x − t(F (x) − y), t > 0,
then F (x) = y is equivalent to the fixed point equation
G(x) = x.
It remains to show that G is a contraction. We compute
∥G(x) − G(x̃)∥² = ∥x − x̃∥² − 2t⟨F (x) − F (x̃), x − x̃⟩ + t²∥F (x) − F (x̃)∥²
≤ (1 − 2(C/L)(Lt) + (Lt)²)∥x − x̃∥²,
where L is a Lipschitz constant for F (i.e., ∥F (x) − F (x̃)∥ ≤ L∥x − x̃∥). Thus, if t ∈ (0, 2C/L²), G is a uniform contraction and the rest follows from the uniform contraction principle. □
Again observe that our proof is constructive. In fact, the best choice for t is clearly t = C/L², for which the contraction constant θ = 1 − (C/L)² is minimal.
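As a toy illustration of this constructive scheme (my own example, not from the text), take the strongly monotone linear map F(x) = Ax on R² with A = [[3, 1], [−1, 3]], for which ⟨Ax, x⟩ = 3∥x∥² (so C = 3) and ∥Ax∥ = √10 ∥x∥ (so L = √10); the iteration xₙ₊₁ = xₙ − t(F(xₙ) − y) with the optimal t = C/L² then converges to the solution of F(x) = y:

```python
# Iterate G(x) = x - t (F(x) - y) for F(x) = A x, A = [[3, 1], [-1, 3]].
# C = 3 (strong monotonicity), L = sqrt(10) (Lipschitz), optimal t = C/L^2.
def F(x):
    return (3*x[0] + x[1], -x[0] + 3*x[1])

y = (1.0, 2.0)
t = 3.0 / 10.0                      # t = C / L^2
x = (0.0, 0.0)
for _ in range(200):
    fx = F(x)
    x = (x[0] - t*(fx[0] - y[0]), x[1] - t*(fx[1] - y[1]))
print(x)  # converges to the exact solution A^{-1} y = (0.1, 0.7)
```

With this t the contraction constant is √(1 − (C/L)²) ≈ 0.32 per step, so two hundred iterations reach machine precision.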
⟨Ax, x⟩ ≥ C∥x∥²
and
|a(x, z) − a(y, z)| ≤ L∥z∥∥x − y∥. (12.12)
Then there is a unique x ∈ H such that (12.10) holds.
Proof. By the Riesz lemma (Theorem 2.10) there are elements F (x) ∈ H
and z ∈ H such that a(x, y) = b(y) is equivalent to ⟨F (x) − z, y⟩ = 0, y ∈ H,
and hence to
F (x) = z.
By (12.11) the map F is strongly monotone. Moreover, by (12.12) we infer
∥xn ∥ ≤ R. (12.17)
implies F (x) = y.
At the beginning of the 20th century Russell showed with his famous paradox "Is {x | x ∉ x} a set?" that naive set theory can lead to contradictions. Hence it was replaced by axiomatic set theory; more specifically, we will take the Zermelo–Fraenkel set theory (ZF), which assumes the existence of some sets (like the empty set and the integers) and defines what operations are allowed. Somewhat informally (i.e., without writing them using the symbolism of first order logic) they can be stated as follows:
• Axiom of empty set. There is a set ∅ which contains no elements.
• Axiom of extensionality. Two sets A and B are equal A = B
if they contain the same elements. If a set A contains all elements
from a set B, it is called a subset A ⊆ B. In particular A ⊆ B and
B ⊆ A if and only if A = B.
The last axiom implies that the empty set is unique and that any set which
is not equal to the empty set has at least one element.
• Axiom of pairing. If A and B are sets, then there exists a set
{A, B} which contains A and B as elements. One writes {A, A} =
{A}. By the axiom of extensionality we have {A, B} = {B, A}.
• Axiom of union. Given a set F whose elements are again sets, there is a set A = ⋃F containing every element that is a member of some member of F. In particular, given two sets A, B there exists a set A ∪ B = ⋃{A, B} consisting of the elements of both sets. Note that this definition ensures that the union is commutative
332 A. Some set theory
Proof. Otherwise the set of all k for which A(k) is false would have a least element k₀. But by our choice of k₀, A(l) holds for all l ≺ k₀ and thus also for k₀, contradicting our assumption. □
We will also frequently use the cardinality of sets: Two sets A and
B have the same cardinality, written as |A| = |B|, if there is a bijection
φ : A → B. We write |A| ≤ |B| if there is an injective map φ : A → B. Note
that |A| ≤ |B| and |B| ≤ |C| implies |A| ≤ |C|. A set A is called infinite if
|A| ≥ |N|, countable if |A| ≤ |N|, and countably infinite if |A| = |N|.
Theorem A.3 (Schröder–Bernstein). |A| ≤ |B| and |B| ≤ |A| implies
|A| = |B|.
the union of all domains) and by Zorn’s lemma there is a maximal element φm. For φm we have either Am = A or φm(Am) = B since otherwise there is some x ∈ A \ Am and some y ∈ B \ φm(Am) which could be used to extend φm to Am ∪ {x} by setting φm(x) = y. But if Am = A we have |A| ≤ |B| and if φm(Am) = B we have |B| ≤ |A|. □
The cardinality of the power set P(A) is strictly larger than the cardi-
nality of A.
Theorem A.5 (Cantor). |A| < |P(A)|.
This innocent looking result also caused some grief when announced by
Cantor as it clearly gives a contradiction when applied to the set of all sets
(which is fortunately not a legal object in ZFC).
The following result and its corollary will be used to determine the car-
dinality of unions and products.
Lemma A.6. Any infinite set can be written as a disjoint union of countably
infinite sets.
Proof. Without loss of generality we can assume |B| ≤ |A| (otherwise ex-
change both sets). Then |A| ≤ |A × B| ≤ |A × A| and it suffices to show
|A × A| = |A|.
We proceed as before and consider the set of all bijective functions φα : Aα → Aα × Aα with Aα ⊆ A, with the same partial ordering as before. By Zorn’s lemma there is a maximal element φm. Let Am be its domain and let A′m := A \ Am. We claim that |A′m| < |Am|. If not, A′m would have a subset A′′m with the same cardinality as Am and hence we would have a bijection A′′m → A′′m × A′′m which could be used to extend φm. So |A′m| < |Am| and thus |A| = |Am ∪ A′m| = |Am|. Since we have shown |Am × Am| = |Am| the claim follows. □
Example A.4. Note that for A = N we have |P(N)| = |R|. Indeed, since |R| = |Z × [0, 1)| = |[0, 1)| it suffices to show |P(N)| = |[0, 1)|. To this end note that P(N) can be identified with the set of all sequences with values in {0, 1} (the value of the sequence at a point tells us whether it is in the corresponding subset). Now every point in [0, 1) can be mapped to such a sequence via its binary expansion. This map is injective but not surjective since a point can have different binary expansions: |[0, 1)| ≤ |P(N)|. Conversely, given a sequence an ∈ {0, 1} we can map it to the number ∑_{n=1}^∞ an 4⁻ⁿ. Since this map is again injective (note that we avoid expansions which are eventually 1) we get |P(N)| ≤ |[0, 1)|. ⋄
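One can watch the injectivity of the base-4 map on finite truncations (a small sanity check of mine, not part of the text): all 2¹⁰ subsets of {1, …, 10} receive distinct values, while in base 2 distinct sequences can come arbitrarily close to colliding (reflecting 0.1₂ = 0.0111…₂):

```python
# Map a finite 0/1 sequence (a_1, ..., a_N) to sum a_n 4^{-n} and check that
# all 2^10 subsets of {1, ..., 10} receive distinct values.
N = 10
values = set()
for mask in range(2**N):
    a = [(mask >> (n - 1)) & 1 for n in range(1, N + 1)]
    values.add(sum(an * 4**-n for n, an in zip(range(1, N + 1), a)))
print(len(values))  # the base-4 map is injective on these truncations

# In base 2 the sequences (1, 0, 0, ...) and (0, 1, 1, 1, ...) collide in
# the limit, which is why base 2 only gives an injection one way:
tail = sum(2**-n for n in range(2, 60))
print(abs(0.5 - tail) < 1e-15)
```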
Hence we have
|N| < |P(N)| = |R| (A.1)
and the continuum hypothesis states that there are no sets whose cardinality lies in between. It was shown by Gödel and Cohen that it, as well as its negation, is consistent with ZFC and hence cannot be decided within this framework.
Problem A.1. Show that Zorn’s lemma implies the axiom of choice. (Hint:
Consider the set of all partial choice functions defined on a subset.)
Problem A.2. Show |RN | = |R|. (Hint: Without loss we can replace R by
(0, 1) and identify each x ∈ (0, 1) with its decimal expansion. Now the digits
in a given sequence are indexed by two countable parameters.)
This chapter collects some basic facts from metric and topological spaces as
a reference for the main text. I presume that you are familiar with most of
these topics from your calculus course. As a general reference I can warmly
recommend Kelley’s classical book [19] or the nice book by Kaplansky [17].
As always, such a brief compilation introduces a zoo of properties. While sometimes the connections between these properties are straightforward, other times they might be quite tricky. So if at some point you are wondering if there exists an infinite multi-variable sub-polynormal Woffle which does not satisfy the lower regular Q-property, start searching in the book by Steen and Seebach [34].
B.1. Basics
One of the key concepts in analysis is approximation, which boils down to convergence. Defining convergence requires the notion of a distance. Motivated by the Euclidean distance one is led to the following definition:
A metric space is a set X together with a distance function d : X ×X →
[0, ∞) such that for arbitrary points x, y, z ∈ X we have
340 B. Metric and topological spaces
That is, O is closed under finite intersections and arbitrary unions. In-
deed, (i) is obvious, (ii) follows since the intersection of two open balls cen-
tered at x is again an open ball centered at x (explicitly Br1 (x) ∩ Br2 (x) =
Bmin(r1 ,r2 ) (x)), and (iii) follows since every ball contained in one of the sets
is also contained in the union.
Example B.4. Of course every open ball Br (x) is an open set since y ∈
Br (x) implies Bs (y) ⊆ Br (x) for s < r − d(x, y). ⋄
Now it turns out that for defining convergence, a distance is slightly more
than what is actually needed. In fact, it suffices to know when a point is
in the neighborhood of another point. And if we adapt the definition of a
neighborhood by requiring it to contain an open set around x, then we see
that it suffices to know when a set is open. This motivates the following
definition:
A space X together with a family of sets O, the open sets, satisfying
(i)–(iii), is called a topological space. The notions of interior point, limit
point, and neighborhood carry over to topological spaces if we replace open
ball around x by open set containing x.
There are usually different choices for the topology. Two not too inter-
esting examples are the trivial topology O = {∅, X} and the discrete
topology O = P(X) (the power set of X). Given two topologies O1 and
O2 on X, O1 is called weaker (or coarser) than O2 if O1 ⊆ O2 . Conversely,
O1 is called stronger (or finer) than O2 if O2 ⊆ O1 .
Given two topologies on X, their intersection will again be a topology on
X. In fact, the intersection of an arbitrary collection of topologies is again
a topology and one can define the weakest topology with a certain property
to be the intersection of all topologies with this property (provided there is
one at all).
Example B.5. Note that different metrics can give rise to the same topology.
For example, we can equip Rn (or Cn ) with the Euclidean distance d(x, y)
as before or we could also use
d̃(x, y) := ∑_{k=1}^n |xk − yk|. (B.3)
Then
(1/√n) ∑_{k=1}^n |xk| ≤ (∑_{k=1}^n |xk|²)^{1/2} ≤ ∑_{k=1}^n |xk| (B.4)
shows Br/√n(x) ⊆ B̃r(x) ⊆ Br(x), where B, B̃ are balls computed using d, d̃, respectively. In particular, both distances will lead to the same notion of convergence. ⋄
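A quick numerical spot check of the inequalities (B.4) for n = 3 (my own illustration; the sample vectors are arbitrary):

```python
import math

# Check (1/sqrt(n)) sum|x_k|  <=  sqrt(sum x_k^2)  <=  sum|x_k|
# on a few sample vectors in R^3.
samples = [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3.0, -4.0, 12.0), (0.1, 0.2, -0.3)]
for x in samples:
    n = len(x)
    l1 = sum(abs(c) for c in x)                 # sum of absolute values
    l2 = math.sqrt(sum(c*c for c in x))         # Euclidean norm
    assert l1 / math.sqrt(n) <= l2 + 1e-12
    assert l2 <= l1 + 1e-12
print("(B.4) holds on all samples")
```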
Example B.6. We can always replace a metric d by the bounded metric
d̃(x, y) := d(x, y)/(1 + d(x, y)) (B.5)
without changing the topology (since the family of open balls does not
change: Bδ(x) = B̃δ/(1+δ)(x)). To see that d̃ is again a metric, observe
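That d̃ satisfies the triangle inequality follows since s ↦ s/(1 + s) is increasing and subadditive; a brute-force check over sample triples (my own sketch, with d the usual distance on R):

```python
# Verify the triangle inequality for d~(x, y) = |x - y| / (1 + |x - y|)
# over a grid of triples (x, y, z) of real numbers.
def dt(x, y):
    d = abs(x - y)
    return d / (1.0 + d)

pts = [i * 0.7 - 3.0 for i in range(10)]   # ten sample points in [-3, 3.3]
ok = all(dt(x, z) <= dt(x, y) + dt(y, z) + 1e-15
         for x in pts for y in pts for z in pts)
print(ok)
```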
Proof. To see the converse let x and U (x) be given. Then U (x) contains an
open set O containing x which can be written as a union of elements from
B. One of the elements in this union must contain x and this is the set we
are looking for. □
The next definition will ensure that limits are unique: A topological
space is called a Hausdorff space if for any two different points there are
always two disjoint neighborhoods.
Example B.12. Any metric space is a Hausdorff space: Given two different
points x and y, the balls Bd/2 (x) and Bd/2 (y), where d := d(x, y) > 0,
are disjoint neighborhoods. A pseudometric space will in general not be
That is, closed sets are closed under finite unions and arbitrary intersections.
The smallest closed set containing a given set U is called the closure
U̅ := ⋂_{C∈C, U⊆C} C, (B.7)
and the largest open set contained in a given set U is called the interior
U° := ⋃_{O∈O, O⊆U} O. (B.8)
It is not hard to see that the closure satisfies the following axioms (Kuratowski closure axioms):
(i) ∅̅ = ∅,
(ii) U ⊆ U̅,
(iii) U̿ = U̅,
(iv) (U ∪ V)‾ = U̅ ∪ V̅.
In fact, one can show that these axioms can equivalently be used to define the topology by observing that the closed sets are precisely those which satisfy U̅ = U. Similarly, the open sets are precisely those which satisfy U° = U.
Lemma B.3. Let X be a topological space. Then the interior of U is the set of all interior points of U, and the closure of U is the union of U with all limit points of U. Moreover, ∂U = U̅ \ U°.
Proof. The first claim is straightforward. For the second claim observe that by Problem B.7 we have U̅ = X \ (X \ U)°, that is, the closure is the set of all points which are not interior points of the complement. That is, x ∉ U̅ iff there is some open set O containing x with O ⊆ X \ U. Hence, x ∈ U̅ iff for all open sets O containing x we have O ⊄ X \ U, that is, O ∩ U ≠ ∅. Hence, x ∈ U̅ iff x ∈ U or if x is a limit point of U. The last claim is left as Problem B.8. □
B.2. Convergence and completeness 345
(this is not true for a pseudometric). We will also frequently identify the
sequence with its values xn for simplicity of notation.
Note that convergent sequences are bounded. Here a set U ⊆ X is called
bounded if it is contained within a ball, that is, U ⊆ Br (x) for some x ∈ X
and r > 0.
Note that convergence can also be equivalently formulated in topological
terms: A sequence (xn )∞ n=1 converges to x if and only if for every neighbor-
hood U (x) of x there is some N ∈ N such that xn ∈ U (x) for n ≥ N . In a
Hausdorff space the limit is unique. However, sequences usually do not suf-
fice to describe a topology and, in general, definitions in terms of sequences
are weaker (see the example below). This could be avoided by using general-
ized sequences, so-called nets, where the index set N is replaced by arbitrary
directed sets. We will not pursue this here.
Example B.14. For example, we can call a set U ⊆ X sequentially closed
if every convergent sequence from U also has its limit in U . If U is closed,
then every point in the complement is an interior point of the complement,
thus no sequence from U can converge to such a point. Hence every closed
set is sequentially closed. In a metric space (or more generally in a first
countable space) we can find a sequence for every limit point x by choosing
a point (different from x) from every set in a neighborhood base. Hence the
converse is also true in this case. ⋄
Note that the argument from the previous example shows that in a first
countable space sequentially closed is the same as closed. In particular, in
this case the family of closed sets is uniquely determined by the convergent
sequences:
Lemma B.4. Two first countable topologies agree if and only if they give
rise to the same convergent sequences.
to show that a notion of convergence does not stem from a topology (cf.
Problem 5.11 from [37]).
In summary: A metric induces a natural topology and a topology induces
a natural notion of convergence. However, a notion of convergence might
not stem form a topology (or different topologies might give rise to the same
notion of convergence) and a topology might not stem from a metric.
A sequence (xn)∞n=1 in X is called a Cauchy sequence if for every ε > 0 there is some N ∈ N such that d(xn, xm) ≤ ε for all n, m ≥ N.
Proof. From every dense set we get a countable base by considering open
balls with rational radii and centers in the dense set. Conversely, from every
countable base we obtain a dense set by choosing an element from each set
in the base. □
Proof. Let A = {xn }n∈N be a dense set in X. The only problem is that A∩Y
might contain no elements at all. However, some elements of A must at least come arbitrarily close to Y : Let J ⊆ N² be the set of all pairs
(n, m) for which B1/m (xn )∩Y ̸= ∅ and choose some yn,m ∈ B1/m (xn )∩Y for
all (n, m) ∈ J. Then B = {yn,m }(n,m)∈J ⊆ Y is countable. To see that B is
dense, choose y ∈ Y . Then there is some sequence xnk with d(xnk , y) < 1/k.
Hence (nk , k) ∈ J and d(ynk ,k , y) ≤ d(ynk ,k , xnk ) + d(xnk , y) ≤ 2/k → 0. □
B.3. Functions
Next, we come to functions f : X → Y , x 7→ f (x). We use the usual
conventions f (U ) := {f (x)|x ∈ U } for U ⊆ X and f −1 (V ) := {x|f (x) ∈ V }
for V ⊆ Y . Note
U ⊆ f −1 (f (U )), f (f −1 (V )) ⊆ V. (B.14)
The set Ran(f ) := f (X) is called the range of f , and X is called the
domain of f . A function f is called injective or one-to-one if for each
y ∈ Y there is at most one x ∈ X with f (x) = y (i.e., f −1 ({y}) contains at
most one point) and surjective or onto if Ran(f ) = Y . A function f which
is both injective and surjective is called bijective.
Recall that we always have
f⁻¹(⋃α Vα) = ⋃α f⁻¹(Vα), f⁻¹(⋂α Vα) = ⋂α f⁻¹(Vα),
f⁻¹(Y \ V) = X \ f⁻¹(V) (B.15)
as well as
f(⋃α Uα) = ⋃α f(Uα), f(⋂α Uα) ⊆ ⋂α f(Uα),
f(X) \ f(U) ⊆ f(X \ U) (B.16)
with equality if f is injective.
A function f between metric spaces X and Y is called continuous at a
point x ∈ X if for every ε > 0 we can find a δ > 0 such that
dY (f (x), f (y)) ≤ ε if dX (x, y) < δ. (B.17)
Proof. (i) ⇒ (ii) is obvious. (ii) ⇒ (iii): If (iii) does not hold, there is a
neighborhood V of f (x) such that Bδ (x) ̸⊆ f −1 (V ) for every δ. Hence we
can choose a sequence xn ∈ B1/n (x) such that xn ̸∈ f −1 (V ). Thus xn → x
but f (xn ) ̸→ f (x). (iii) ⇒ (i): Choose V = Bε (f (x)) and observe that by
(iii), Bδ (x) ⊆ f −1 (V ) for some δ. □
Show that both are independent of the neighborhood base and satisfy
(i) lim inf x→x0 (−f (x)) = − lim supx→x0 f (x).
(ii) lim inf x→x0 (αf (x)) = α lim inf x→x0 f (x), α ≥ 0.
(iii) lim inf x→x0 (f (x) + g(x)) ≥ lim inf x→x0 f (x) + lim inf x→x0 g(x).
Moreover, show that
lim inf_{n→∞} f(xn) ≥ lim inf_{x→x₀} f(x), lim sup_{n→∞} f(xn) ≤ lim sup_{x→x₀} f(x)
whenever xn → x₀.
Lemma B.11. Let X have the initial topology from a collection of functions
{fα : X → Yα }α∈A and let Z be another topological space. A function
f : Z → X is continuous (at z) if and only if fα ◦ f is continuous (at z) for
all α ∈ A.
If all Yα are Hausdorff and if the collection {fα }α∈A separates points,
that is for every x ̸= y there is some α with fα (x) ̸= fα (y), then X will
again be Hausdorff. Indeed for x ̸= y choose α such that fα (x) ̸= fα (y)
and let Uα , Vα be two disjoint neighborhoods separating fα (x), fα (y). Then
fα⁻¹(Uα), fα⁻¹(Vα) are two disjoint neighborhoods separating x, y. In particular, X = ∏_{α∈A} Xα is Hausdorff if all Xα are.
Note that a similar construction works in the other direction. Let {fα }α∈A
be a collection of functions fα : Xα → Y , where Xα are some topological
spaces. Then we can equip Y with the strongest topology (known as the
final topology) which makes all fα continuous. That is, we take as open
sets those for which fα−1 (O) is open for all α ∈ A.
Example B.19. Let ∼ be an equivalence relation on X with equivalence
classes [x] = {y ∈ X|x ∼ y}. Then the quotient topology on the set of
equivalence classes X/ ∼ is the final topology of the projection map π : X →
X/ ∼. ⋄
Example B.20. Let Xα be a collection of topological spaces. The disjoint
union
X := ⨆_{α∈A} Xα
B.5. Compactness
A cover of a set Y ⊆ X is a family of sets {Uα} such that Y ⊆ ⋃α Uα. A cover is called open if all Uα are open. Any subset of {Uα} which still covers Y is called a subcover.
Lemma B.13 (Lindelöf1). If X is second countable, then every open cover
has a countable subcover.
Proof. Let {Uα } be an open cover for Y , and let B be a countable base.
Since every Uα can be written as a union of elements from B, the set of all
B ∈ B which satisfy B ⊆ Uα for some α form a countable open cover for Y .
1Ernst Leonard Lindelöf (1870–1946), Finnish mathematician
Moreover, for every Bn in this set we can find an αn such that Bn ⊆ Uαn .
By construction, {Uαn } is a countable subcover. □
Proof. Denote the cover by {Oj}j∈N and introduce the sets
Ôj,n := ⋃_{x∈Aj,n} B2⁻ⁿ(x), where
Aj,n := {x ∈ Oj \ (O1 ∪ · · · ∪ Oj−1) | x ∉ ⋃_{k∈N, 1≤l<n} Ôk,l and B3·2⁻ⁿ(x) ⊆ Oj}.
Proof. (i) Observe that if {Oα } is an open cover for f (Y ), then {f −1 (Oα )}
is one for Y .
(ii) Let {Oα } be an open cover for the closed subset Y (in the induced
topology). Then there are open sets Õα with Oα = Õα ∩Y and {Õα }∪{X\Y }
is an open cover for X which has a finite subcover. This subcover induces a
finite subcover for Y .
(iii) Let Y ⊆ X be compact. We show that X \ Y is open. Fix x ∈ X \ Y
(if Y = X there is nothing to do). By the definition of Hausdorff, for
every y ∈ Y there are disjoint neighborhoods V(y) of y and Uy(x) of x. By compactness of Y, there are y1, . . . , yn such that the V(yj) cover Y. But then ⋂_{j=1}^n Uyj(x) is a neighborhood of x which does not intersect Y.
(iv) Note that a cover of the union is a cover for each individual set and
the union of the individual subcovers is the subcover we are looking for.
(v) Follows from (ii) and (iii) since an intersection of closed sets is closed.
□
Proof. It suffices to show that f maps closed sets to closed sets. By (ii)
every closed set is compact, by (i) its image is also compact, and by (iii) it
is also closed. □
Proof. We say that a family F of closed subsets of K has the finite intersection property if every finite subfamily has nonempty intersection. The collection of all such families which contain F is partially ordered by inclusion and every chain has an upper bound (the union of all
at least one of these two intervals, call it I1 , contains infinitely many ele-
ments of our sequence. Let y1 = xn1 be the first one. Subdivide I1 and pick
y2 = xn2 , with n2 > n1 as before. Proceeding like this, we obtain a Cauchy
sequence yn (note that by construction In+1 ⊆ In and hence |yn − ym | ≤ b−a 2n
for m ≥ n). □
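The bisection procedure can be imitated for a concrete bounded sequence (my own sketch, not from the text; since a program cannot test "infinitely many", it keeps the half containing more of the remaining terms — the sequence sin(n) is an arbitrary choice):

```python
import math

def bolzano_indices(seq, a, b, steps):
    """Pick indices n_1 < n_2 < ... by repeated bisection of [a, b]."""
    idx = list(range(len(seq)))
    picked = []
    for _ in range(steps):
        m = (a + b) / 2
        left = [i for i in idx if seq[i] <= m]
        right = [i for i in idx if seq[i] > m]
        if len(left) >= len(right):       # keep the half with more terms
            idx, b = left, m
        else:
            idx, a = right, m
        p = idx[0]                        # first admissible index in that half
        picked.append(p)
        idx = [i for i in idx if i > p]   # later picks must have larger index
    return picked, (a, b)

seq = [math.sin(n) for n in range(20000)]
picked, (a, b) = bolzano_indices(seq, -1.0, 1.0, 10)
vals = [seq[i] for i in picked]
# Indices increase, and later picks lie in nested intervals of length 2/2^k:
print(all(p < q for p, q in zip(picked, picked[1:])),
      max(vals[5:]) - min(vals[5:]) <= 2 / 2**6)
```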
Combining Theorem B.22 with Lemma B.16 (i) we also obtain the ex-
treme value theorem.
Theorem B.24 (Weierstraß). Let X be compact. Every continuous function
f : X → R attains its maximum and minimum.
3Eduard Heine (1821–1881), German mathematician
4Émile Borel (1871–1956), French mathematician and politician
5Bernard Bolzano (1781–1848), Bohemian mathematician, logician, philosopher, theologian and Catholic priest
Proof. (i) ⇒ (ii): Let {xn} be a dense set. Then the balls Bn,m = B1/m(xn) form a base. Moreover, for every n there is some mn such that Bn,m is relatively compact for m ≤ mn. Since those balls are still a base we are done. (ii) ⇒ (iii): Take the union over the closures of all sets in the base. (iii) ⇒ (iv): Let X = ⋃n Kn with Kn compact. Without loss Kn ⊆ Kn+1. For a given compact set K we can find a relatively compact open set V(K) such that K ⊆ V(K) (cover K by relatively compact open balls and choose a finite subcover). Now define U1 := V(K1) and Un+1 := V(U̅n ∪ Kn+1). (iv) ⇒ (i): Each of the sets Un has a countable dense subset by Corollary B.21. The union gives a countable dense set for X. Since every x ∈ Un for some n, X is also locally compact. □
B.6. Separation
The distance between a point x ∈ X and a subset Y ⊆ X is
dist(x, Y) := inf_{y∈Y} d(x, y). (B.22)
A topological space is called normal if for any two disjoint closed sets C1
and C2 , there are disjoint open sets O1 and O2 such that Cj ⊆ Oj , j = 1, 2.
Lemma B.28 (Urysohn). Let X be a topological space. Then X is normal
if and only if for every pair of disjoint closed sets C1 and C2 , there exists a
continuous function f : X → [0, 1] which is one on C1 and zero on C2 .
If in addition X is locally compact and C1 is compact, then f can be
chosen to have compact support.
that f := f s = f i is continuous.
Conversely, given f choose O1 := f −1 ([0, 1/2)) and O2 := f −1 ((1/2, 1]).
For the second claim, observe that there is an open set O0 such that O0
is compact and C1 ⊂ O0 ⊂ O0 ⊂ X \ C2 . In fact, for every x ∈ C1 , there
is a ball Bε (x) such that Bε (x) is compact and Bε (x) ⊂ X \ C2 . Since C1
is compact, finitely many of them cover C1 and we can choose the union of
those balls to be O0 . □
Example B.24. In a metric space we can choose f(x) := dist(x, C2)/(dist(x, C1) + dist(x, C2)) and hence every metric space is normal. ⋄
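For finite sets in the plane this function is easy to evaluate (an illustrative sketch of mine; the sets C1, C2 are arbitrary):

```python
import math

def dist(x, S):
    """Distance from a point x to a finite set S of points in the plane."""
    return min(math.hypot(x[0] - s[0], x[1] - s[1]) for s in S)

C1 = [(0.0, 0.0), (1.0, 0.0)]
C2 = [(5.0, 5.0), (6.0, 5.0)]

def f(x):
    return dist(x, C2) / (dist(x, C1) + dist(x, C2))

print([f(p) for p in C1],            # equals 1.0 on C1
      [f(p) for p in C2],            # equals 0.0 on C2
      0.0 <= f((2.5, 2.5)) <= 1.0)   # values in between lie in [0, 1]
```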
An important class of normal spaces are locally compact Hausdorff spaces.
Lemma B.29. Every locally compact Hausdorff space is normal.
Note that the proof easily extends to (e.g.) Hölder continuous functions by replacing d by d^α in the definition of f̄.
A partition of unity is a collection of functions hα : X → [0, 1] such that ∑α hα(x) = 1. We will only consider the case when the partition of unity is locally finite, that is, when every x has a neighborhood where all but a finite number of the functions hα vanish. Moreover, given a cover {Oβ} of X it is called subordinate to this cover if every hα has support contained in some set Oβ from this cover.
In the case of subsets of Rn we are interested in the existence of smooth
partitions of unity. To this end recall that for every point x ∈ Rn there
ϕ((x − x₀)/r)|x=x₀ = ϕ(0) = e⁻¹. ⋄
Lemma B.32. Let X ⊆ Rn be open and {Oj } a countable open cover.
Then there is a locally finite partition of unity of functions from Cc∞ (X)
subordinate to this cover.
Proof. Let Uj be as in Lemma B.25 (iv). For the compact set U̅j choose finitely many bump functions h̃j,k such that h̃j,1(x) + · · · + h̃j,kj(x) > 0 for every x ∈ U̅j \ Uj−1 and such that supp(h̃j,k) is contained in one of the Ok and in Uj+1 \ U̅j−1. Then {h̃j,k}j,k is locally finite and hence h := ∑_{j,k} h̃j,k
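The normalization step hj,k := h̃j,k/h can be illustrated in one dimension (my own sketch, not from the text; continuous tent functions stand in for the smooth bumps used above):

```python
# Tent functions on overlapping intervals covering [0, 1]; dividing each by
# their sum yields a (merely continuous) partition of unity subordinate to
# the cover by the intervals (c - r, c + r).
centers, r = [0.0, 0.25, 0.5, 0.75, 1.0], 0.3

def tent(c, x):
    return max(0.0, 1.0 - abs(x - c) / r)

def partition(x):
    h = [tent(c, x) for c in centers]
    total = sum(h)              # positive on [0, 1] since the tents cover it
    return [v / total for v in h]

samples = [k / 100 for k in range(101)]
ok = all(abs(sum(partition(x)) - 1.0) < 1e-12 for x in samples)
print(ok)
```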
B.7. Connectedness
Roughly speaking a topological space X is disconnected if it can be split
into two (nonempty) separated sets. This of course raises the question what
In this case the sets from the splitting are both open and closed. A topo-
logical space X is called connected if it cannot be split as above. That
is, in a connected space X the only sets which are both open and closed
are ∅ and X. This last observation is frequently used in proofs: If the set
where a property holds is both open and closed it must either hold nowhere
or everywhere. In particular, any continuous mapping from a connected to
a discrete space must be constant since the inverse image of a point is both
open and closed.
A subset of X is called (dis-)connected if it is (dis-)connected with respect
to the relative topology. In other words, a subset A ⊆ X is disconnected if
there are disjoint nonempty open sets U and V which split A according to
A = (U ∩ A) ∪ (V ∩ A).
Example B.26. In R the nonempty connected sets are precisely the inter-
vals (Problem B.33). Consequently A = [0, 1] ∪ [2, 3] is disconnected with
[0, 1] and [2, 3] being its components (to be defined precisely below). While
you might be reluctant to consider the closed interval [0, 1] as open, it is im-
portant to observe that it is the relative topology which is relevant here. ⋄
The maximal connected subsets (ordered by inclusion) of a nonempty
topological space X are called the connected components of X.
Example B.27. Consider Q ⊆ R. Then every rational point is its own
component (if a set of rational points contains more than one point there
would be an irrational point in between which can be used to split the set). ⋄
In many applications one also needs the following stronger concept. A
space X is called path-connected if any two points x, y ∈ X can be joined
by a path, that is a continuous map γ : [0, 1] → X with γ(0) = x and
γ(1) = y. A space is called locally (path-)connected if for every given
366 B. Metric and topological spaces
point and every open set containing that point there is a smaller open set
which is (path-)connected.
Example B.28. Every normed vector space is (locally) path-connected since
every ball is path-connected (consider straight lines). In fact this also holds
for locally convex spaces. Every open subset of a locally (path-)connected
space is locally (path-)connected. ⋄
Every path-connected space is connected. In fact, if X = U ∪ V were
disconnected but path-connected we could choose x ∈ U and y ∈ V plus a
path γ joining them. But this would give a splitting [0, 1] = γ −1 (U )∪γ −1 (V )
contradicting our assumption. The converse however is not true in general
as a space might be impassable (an example will follow).
Example B.29. The spaces R and Rⁿ, n > 1, are not homeomorphic. In fact, removing any point from R gives a disconnected space while removing a point from Rⁿ still leaves it (path-)connected. ⋄
We collect a few simple but useful properties below.
Lemma B.33. Suppose X and Y are topological spaces.
(i) Suppose f : X → Y is continuous. Then if X is (path-)connected
so is the image f (X).
(ii) Suppose Aα ⊆ X are (path-)connected and ⋂α Aα ≠ ∅. Then ⋃α Aα is (path-)connected.
(iii) A ⊆ X is (path-)connected if and only if any two points x, y ∈ A are contained in a (path-)connected set B ⊆ A.
(iv) Suppose X1, . . . , Xn are (path-)connected. Then so is ∏_{j=1}^n Xj.
(v) Suppose A ⊆ X is connected. Then A̅ is connected.
(vi) A locally path-connected space is path-connected if and only if it is
connected.
A few simple consequences are also worth noting: If two different components contain a common point, their union is again connected, contradicting maximality. Hence two different components are always disjoint.
Moreover, every point is contained in a component, namely the union of all
connected sets containing this point. In other words, the components of any
topological space X form a partition of X (i.e., they are disjoint, nonempty,
and their union is X). Moreover, every component is a closed subset of the
original space X. In the case where their number is finite we can take com-
plements and each component is also an open subset (the rational numbers
from our first example show that components are not open in general). In a
locally (path-)connected space, components are open and (path-)connected
by (vi) of the last lemma. Note also that in a second countable space an
open set can have at most countably many components (take those sets from
a countable base which are contained in some component, then we have a
surjective map from these sets to the components).
Example B.30. Consider the graph of the function f : (0, 1] → R, x ↦ sin(1/x). Then its graph Γ(f) ⊆ R² is path-connected and the closure Γ(f)‾ = Γ(f) ∪ ({0} × [−1, 1]) is connected. However, Γ(f)‾ is not path-connected as there is no path from (1, sin(1)) to (0, 0). Indeed, suppose γ were such a path. Then, since γ₁ covers [0, 1] by the intermediate value theorem (see
ε/2 + ε/2 = ε for n ≥ N. □
Proof. Choose a countable base B for X and let I be the collection of all balls in Cⁿ with rational radius and center. Given O1, . . . , Om ∈ B and I1, . . . , Im ∈ I we say that f ∈ Cc(X, Cⁿ) is adapted to these sets if supp(f) ⊆ ⋃_{j=1}^m Oj
and f (Oj ) ⊆ Ij . The set of all tuples (Oj , Ij )1≤j≤m is countable and for
each tuple we choose a corresponding adapted function (if there exists one
at all). Then the set of these functions F is dense. It suffices to show that
the closure of F contains Cc (X, Cn ). So let f ∈ Cc (X, Cn ) and let ε > 0
be given. Then for every x ∈ X there is some neighborhood O(x) ∈ B such
that |f (x) − f (y)| < ε for y ∈ O(x). Since supp(f ) is compact, it can be
covered by O(x1 ), . . . , O(xm ). In particular f (O(xj )) ⊆ Bε (f (xj )) and we
can find a ball Ij of radius at most 2ε with f (O(xj )) ⊆ Ij . Now let g be
the function from F which is adapted to (O(xj ), Ij )1≤j≤m and observe that
|f (x) − g(x)| < 4ε since x ∈ O(xj ) implies f (x), g(x) ∈ Ij . □
Proof. Suppose the claim were wrong. Then there is some ε > 0 such that for
every δ_n = 1/n we can find x_n, y_n with d_X(x_n, y_n) < δ_n but
d_Y(f(x_n), f(y_n)) ≥ ε. Since X is compact, we can assume that x_n converges
to some x ∈ X (after passing to a subsequence if necessary). Then
d_X(x_n, y_n) → 0 gives y_n → x as well, and continuity of f implies
d_Y(f(x_n), f(y_n)) ≤ d_Y(f(x_n), f(x)) + d_Y(f(x), f(y_n)) → 0, a
contradiction. □
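The compactness hypothesis is essential here. A numeric sketch (mine, not the book's): f(x) = 1/x is continuous on the non-compact set (0, 1], and the pairs x_n = 1/n, y_n = 1/(n+1) get arbitrarily close while their images stay a fixed distance apart, exactly the situation the proof rules out on a compact space:

```python
def f(x):
    return 1.0 / x

for n in range(1, 6):
    xn, yn = 1.0 / n, 1.0 / (n + 1)
    assert xn - yn < 1.0 / n          # the points get arbitrarily close...
    assert abs(f(xn) - f(yn)) > 0.99  # ...but the images stay about 1 apart
```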
and since K is compact, finitely many of them, say U(y_1), …, U(y_j), cover K. Then
f_z = max{f_{y_1,z}, …, f_{y_j,z}} ∈ A
B.8. Continuous functions on metric spaces 373
Proof. Just observe that F̃ = {Re(f ), Im(f )|f ∈ F } satisfies the assump-
tion of the real version. Hence every real-valued continuous function can be
approximated by elements from the subalgebra generated by F̃ ; in particular,
this holds for the real and imaginary parts of every given complex-valued
function. Finally, note that the subalgebra spanned by F̃ is contained in the
∗-subalgebra spanned by F . □
Note that the additional requirement of being closed under complex
conjugation is crucial: the functions holomorphic on the unit disc and
continuous up to the boundary separate points, but they are not dense (since
the uniform limit of holomorphic functions is again holomorphic).
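This failure can be checked by a mean-value computation. In the sketch below (my illustration; the particular polynomial p is an arbitrary choice), the average of p(z)·z over the unit circle vanishes for every polynomial p in z, while the same average for conj(z) equals 1, so no polynomial in z is uniformly close to conj(z) on the circle:

```python
import cmath, math

N = 1024  # sample points on the unit circle

def circle_mean(g):
    """Average of g over N equally spaced points on the unit circle."""
    return sum(g(cmath.exp(2j * math.pi * k / N)) for k in range(N)) / N

p = lambda z: 3 - 2 * z + 5 * z**2                 # an arbitrary polynomial in z
m_poly = circle_mean(lambda z: p(z) * z)           # vanishes for any such p
m_conj = circle_mean(lambda z: z.conjugate() * z)  # equals mean of |z|^2 = 1

print(abs(m_poly), abs(m_conj))  # ≈ 0 and ≈ 1
```

The same computation bounds the sup distance from conj(z) to any polynomial in z from below, since averaging can only shrink a uniform bound.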
Corollary B.43. Suppose K is a compact topological space and consider
C(K). If F ⊂ C(K) separates points, then the closure of the ∗-subalgebra
generated by F is either C(K) or {f ∈ C(K)|f (t0 ) = 0} for some t0 ∈ K.
Proof. Denote by A the closure of the ∗-subalgebra generated by F. There
are two possibilities: either all f ∈ F vanish at one point t0 ∈ K (there
can be at most one such point since F separates points) or there is no such
point.
If there is no such point, then the identity can be approximated by
elements in A: First of all note that |f| ∈ A if f ∈ A, since the polynomials
p_n(t) used to prove this fact can be replaced by p_n(t) − p_n(0), which
contain no constant term. Hence for every point y we can find a nonnegative
function in A which is positive at y, and by compactness we can find a finite
sum of such functions which is positive everywhere, say m ≤ f(t) ≤ M. Now
approximate min(m⁻¹ t, t⁻¹) by polynomials q_n(t) (again a constant term is
not needed).
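The trick of dropping the constant term can be illustrated with Bernstein polynomials (a sketch of mine; using Bernstein polynomials in place of the p_n of the proof is an assumption). As a polynomial in s, the constant term is the value at s = 0, so subtracting it removes the constant term while the uniform approximation of |s| on [−1, 1] survives:

```python
from math import comb

def bernstein_abs(n, s):
    """Degree-n Bernstein approximation of |s| on [-1, 1] (via t = (s + 1)/2)."""
    t = (s + 1) / 2
    return sum(abs(2 * k / n - 1) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

pts = [i / 100 - 1 for i in range(201)]  # grid on [-1, 1]
for n in (10, 40, 160):
    p0 = bernstein_abs(n, 0)  # the constant term removed, as in the proof
    err = max(abs(bernstein_abs(n, s) - p0 - abs(s)) for s in pts)
    print(n, round(err, 3))
```

The value p0 at 0 is of order 1/√n, so removing it costs nothing in the limit.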
Problem B.35. Show that the uniform limit of uniformly continuous func-
tions is again uniformly continuous.
Problem B.41. Let K ⊆ C be a compact set. Show that the set of all
functions f(z) = p(x, y), where p : R² → C is a polynomial and z = x + iy, is
dense in C(K).
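For a numeric impression of Problem B.41 (the setup is entirely mine: K is a grid on the unit square and the fit is least squares, not the minimax approximation), the continuous but non-holomorphic function f(z) = |z| is fitted by polynomials p(x, y) of increasing degree, and the sup error on the grid shrinks:

```python
import numpy as np

xs = np.linspace(-1, 1, 41)
X, Y = np.meshgrid(xs, xs)
F = np.hypot(X, Y).ravel()  # f(z) = |z| sampled on the square

def sup_error(deg):
    """Sup error on the grid of the least-squares fit by x^i y^j, i + j <= deg."""
    cols = [(X**i * Y**j).ravel() for i in range(deg + 1)
                                  for j in range(deg + 1 - i)]
    A = np.stack(cols, axis=1)
    c, *_ = np.linalg.lstsq(A, F, rcond=None)
    return np.max(np.abs(A @ c - F))

for d in (2, 4, 8):
    print(d, round(sup_error(d), 3))
```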