Nonlinear Functional Analysis
Functional Analysis
Gerald Teschl
Graduate Studies
in Mathematics
Volume (to appear)
E-mail: Gerald.Teschl@univie.ac.at
URL: http://www.mat.univie.ac.at/~gerald/
The present manuscript was written for my course Functional Analysis given at the University of Vienna in winter 2004 and 2009. The second part consists of the notes for my course Nonlinear Functional Analysis held at the University of Vienna in summer 1998, 2001, and 2018. The two parts are essentially independent. In particular, the first part does not assume any knowledge from measure theory (at the expense of hardly mentioning Lp spaces). However, there is an accompanying part on Real Analysis [47], where these topics are covered.
It is updated whenever I find some errors and extended from time to
time. Hence you might want to make sure that you have the most recent
version, which is available from
http://www.mat.univie.ac.at/~gerald/ftp/book-fa/
Please do not redistribute this file or put a copy on your personal
webpage but link to the page above.
Goals
The main goal of the present book is to give students a concise introduction which gets to some interesting results without much ado while using a sufficiently general approach suitable for further studies. Still I have tried to always start with some interesting special cases and then work my way up to the general theory. While this unavoidably leads to some duplications, it usually provides much better motivation and implies that the core material always comes first (while the more general results are then optional).
Moreover, this book is not written under the assumption that it will be
read linearly starting with the first chapter and ending with the last. Consequently, I have tried to separate core and optional materials as much as possible while keeping the optional parts as independent as possible.
Furthermore, my aim is not to present an encyclopedic treatment but to
provide the reader with a versatile toolbox for further study. Moreover, in
contradistinction to many other books, I do not have a particular direction
in mind and hence I am trying to give a broad introduction which should
prepare you for diverse fields such as spectral theory, partial differential
equations, or probability theory. This is related to the fact that I am working
in mathematical physics, an area where you never know what mathematical
theory you will need next.
I have tried to keep a balance between verbosity and clarity in the sense that I have tried to provide sufficient detail for being able to follow the arguments but without drowning the key ideas in boring details. In particular, you will find a "show this" from time to time encouraging the reader to check the claims made (these tasks typically involve only simple routine calculations). Moreover, to make the presentation student friendly, I have tried to include many worked out examples within the main text. Some of them are standard counterexamples pointing out the limitations of theorems (and explaining why the assumptions are important). Others show how to use the theory in the investigation of practical examples.
Preliminaries
Content
To the teacher
topics from the second part or some material from unbounded operators in
Hilbert spaces following [46] (where one can start with Chapter 2) or from
unbounded operators in Banach spaces following the book by Kato [24] (e.g.
Sections 3.4, 3.5 and 4.1).
The third part gives a short basis for a course on nonlinear functional
analysis.
Problems relevant for the main text are marked with a "*". A Solutions
Manual will be available electronically for instructors only.
Acknowledgments
I wish to thank my readers, Kerstin Ammann, Phillip Bachler, Batuhan Bayır, Alexander Beigl, Mikhail Botchkarev, Ho Boon Suan, Peng Du, Christian Ekstrand, Damir Ferizović, Michael Fischer, Raffaello Giulietti, Melanie Graf, Josef Greilhuber, Julian Grüber, Matthias Hammerl, Jona Marie Hassenbach, Nobuya Kakehashi, Iryna Karpenko, Jerzy Knopik, Nikolas Knotz, Florian Kogelbauer, Helge Krüger, Reinhold Küstner, Oliver Leingang, Juho Leppäkangas, Joris Mestdagh, Alice Mikikits-Leitner, Claudiu Mîndrilǎ, Jakob Möller, Caroline Moosmüller, Matthias Ostermann, Piotr Owczarek, Martina Pflegpeter, Mateusz Piorkowski, Tobias Preinerstorfer, Maximilian H. Ruep, Tidhar Sariel, Christian Schmid, Laura Shou, Bertram Tschiderer, Liam Urban, Vincent Valmorin, David Wallauch, Richard Welke, David Wimmesberger, Song Xiaojun, Markus Youssef, Rudolf Zeidler, and colleagues Pierre-Antoine Absil, Nils C. Framstad, Fritz Gesztesy, Heinz Hanßmann, Günther Hörmann, Aleksey Kostenko, Wallace Lam, Daniel Lenz, Johanna Michor, Viktor Qvarfordt, Alex Strohmaier, David C. Ullrich, Hendrik Vogt, Marko Stautz, Maxim Zinchenko, who have pointed out several typos and made useful suggestions for improvements. I am also grateful to Volker Enß for making his lecture notes on nonlinear Functional Analysis available to me.
Gerald Teschl
Vienna, Austria
January, 2019
Part 1
Functional Analysis
Chapter 1

A first look at Banach and Hilbert spaces
which cannot be satisfied and explains our choice of sign above). In summary,
we obtain the solutions
$$u_n(t,x) := c_n \mathrm{e}^{-(\pi n)^2 t} \sin(n\pi x), \qquad n \in \mathbb{N}. \tag{1.10}$$
So we have found a large number of solutions, but we still have not dealt
with our initial condition u(0, x) = u0 (x). This can be done using the
superposition principle which holds since our equation is linear: Any finite
linear combination of the above solutions will again be a solution. Moreover,
we see that the solution of our original problem is given by (1.11) if we choose
cn = û0,n (cf. Problem 1.2).
Of course for this last statement to hold we need to ensure that the series in (1.11) converges and that we can interchange summation and differentiation. You are asked to do so in Problem 1.1.
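To make the convergence question of Problem 1.1 concrete, here is a worked special case (added for illustration, not part of the original text): for the particular initial condition u₀(x) = sin(πx) only one Fourier coefficient is nonzero, so the series has a single term and no convergence issues arise.

```latex
% Worked special case: u_0(x) = \sin(\pi x), so c_1 = \hat u_{0,1} = 1 and
% c_n = 0 for n \ge 2. The series then collapses to the single term
u(t,x) = \mathrm{e}^{-\pi^2 t} \sin(\pi x).
% Check: \partial_t u = -\pi^2 u = \partial_x^2 u, u(0,x) = \sin(\pi x),
% and u(t,0) = u(t,1) = 0, so all three conditions hold.
```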
In fact, many equations in physics can be solved in a similar way:
• Reaction-Diffusion equation:
$$\frac{\partial}{\partial t} u(t,x) - \frac{\partial^2}{\partial x^2} u(t,x) + q(x)\, u(t,x) = 0,$$
$$u(0,x) = u_0(x), \qquad u(t,0) = u(t,1) = 0. \tag{1.14}$$
Here u(t, x) could be the density of some gas in a pipe and q(x) > 0 describes
that a certain amount per time is removed (e.g., by a chemical reaction).
• Wave equation:
$$\frac{\partial^2}{\partial t^2} u(t,x) - \frac{\partial^2}{\partial x^2} u(t,x) = 0,$$
$$u(0,x) = u_0(x), \qquad \frac{\partial u}{\partial t}(0,x) = v_0(x),$$
$$u(t,0) = u(t,1) = 0. \tag{1.15}$$
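For comparison, the same separation ansatz used for the heat equation also works here; the following sketch (added for illustration, not from the original text) records the resulting elementary solutions of the wave equation.

```latex
% The ansatz u(t,x) = w(t)\sin(n\pi x) gives w'' = -(n\pi)^2 w, hence
u_n(t,x) = \big( a_n \cos(n\pi t) + b_n \sin(n\pi t) \big) \sin(n\pi x),
\qquad n \in \mathbb{N},
% where a_n, b_n are fixed by the two initial conditions u_0 and v_0.
% Check: \partial_t^2 u_n = -(n\pi)^2 u_n = \partial_x^2 u_n.
```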
• Schrödinger equation:
$$\mathrm{i}\,\frac{\partial}{\partial t} u(t,x) = -\frac{\partial^2}{\partial x^2} u(t,x) + q(x)\, u(t,x),$$
$$u(0,x) = u_0(x), \qquad u(t,0) = u(t,1) = 0. \tag{1.16}$$
Here |u(t, x)|2 is the probability distribution of a particle trapped in a box
x ∈ [0, 1] and q(x) is a given external potential which describes the forces
acting on the particle.
All these problems (and many others) lead to the investigation of the
following problem
$$L y(x) = \lambda y(x), \qquad L := -\frac{d^2}{dx^2} + q(x), \tag{1.17}$$
subject to the boundary conditions
y(a) = y(b) = 0. (1.18)
Such a problem is called a Sturm–Liouville boundary value problem.
Our example shows that we should prove the following facts about Sturm–
Liouville problems:
(i) The Sturm–Liouville problem has a countable number of eigenvalues Eₙ with corresponding eigenfunctions uₙ, that is, uₙ satisfies the boundary conditions and Luₙ = Eₙuₙ.
(ii) The eigenfunctions uₙ are complete, that is, any nice function u can be expanded into a generalized Fourier series
$$u(x) = \sum_{n=1}^{\infty} c_n u_n(x).$$
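The simplest special case already illustrates both facts; this worked example (added here for illustration) takes q = 0 on [0, 1], which is precisely the situation from Section 1.1.

```latex
% For L = -\frac{d^2}{dx^2} on [0,1] with y(0) = y(1) = 0:
% the condition y(0) = 0 forces y(x) = \sin(\sqrt{\lambda}\, x) (up to a constant),
% and y(1) = 0 then requires \sqrt{\lambda} = n\pi. Hence
E_n = (n\pi)^2, \qquad u_n(x) = \sin(n\pi x), \qquad n \in \mathbb{N},
% and completeness (ii) is the classical Fourier sine expansion.
```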
This problem is very similar to the eigenvalue problem of a matrix and we are looking for a generalization of the well-known fact that every symmetric matrix has an orthonormal basis of eigenvectors. However, our linear operator L is now acting on some space of functions which is not finite dimensional and it is not at all clear what (e.g.) orthogonal should mean in this context. Moreover, since we need to handle infinite series, we need convergence and hence we need to define the distance of two functions as well.
Hence our program looks as follows:
• What is the distance of two functions? This automatically leads
us to the problem of convergence and completeness.
• If we additionally require the concept of orthogonality, we are led
to Hilbert spaces which are the proper setting for our eigenvalue
problem.
It is not hard to see that with this definition C(I) becomes a normed vector
space:
A normed vector space X is a vector space over C (or R) together with a nonnegative function (the norm) ‖·‖ : X → [0, ∞) such that
• ‖f‖ > 0 for f ≠ 0 (positive definiteness),
• ‖αf‖ = |α| ‖f‖ for all α ∈ C, f ∈ X (positive homogeneity), and
• ‖f + g‖ ≤ ‖f‖ + ‖g‖ for all f, g ∈ X (triangle inequality).
If positive definiteness is dropped from the requirements, one calls ‖·‖ a seminorm.
From the triangle inequality we also get the inverse triangle inequality (Problem 1.3)
$$\big| \|f\| - \|g\| \big| \le \|f - g\|, \tag{1.20}$$
which shows that the norm is continuous.
Also note that norms are closely related to convexity. To this end recall
that a subset C ⊆ X is called convex if for every x, y ∈ C we also have
λx + (1 − λ)y ∈ C whenever λ ∈ (0, 1). Moreover, a mapping f : C → R is
called convex if f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) whenever λ ∈ (0, 1)
and in our case the triangle inequality plus homogeneity imply that every
norm is convex:
$$\|\lambda x + (1-\lambda) y\| \le \lambda \|x\| + (1-\lambda) \|y\|, \qquad \lambda \in [0,1]. \tag{1.21}$$
Moreover, choosing λ = 1/2 we get back the triangle inequality upon using homogeneity. In particular, the triangle inequality could be replaced by convexity in the definition.
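To spell out the last remark, here is the two-line computation (added for the reader's convenience) recovering the triangle inequality from convexity and homogeneity:

```latex
% Convexity (1.21) with \lambda = 1/2, followed by homogeneity:
\|f + g\| = 2 \left\| \tfrac{1}{2} f + \tfrac{1}{2} g \right\|
\le 2 \left( \tfrac{1}{2}\|f\| + \tfrac{1}{2}\|g\| \right) = \|f\| + \|g\| .
```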
Once we have a norm, we have a distance d(f, g) := ‖f − g‖ and hence we know when a sequence of vectors fₙ converges to a vector f (namely if d(fₙ, f) → 0). We will write fₙ → f or lim_{n→∞} fₙ = f, as usual, in this case. Moreover, a mapping F : X → Y between two normed spaces
this case. Moreover, a mapping F : X → Y between two normed spaces
is called continuous if for every convergent sequence fn → f from X we
have F (fn ) → F (f ) (with respect to the norm of X and Y , respectively). In
fact, the norm, vector addition, and multiplication by scalars are continuous
(Problem 1.4).
In addition to the concept of convergence, we also have the concept of a
Cauchy sequence and hence the concept of completeness: A normed space
is called complete if every Cauchy sequence has a limit. A complete normed
space is called a Banach space.
Example 1.1. By completeness of the real numbers, R as well as C with
the absolute value as norm are Banach spaces.
[Figure: unit spheres of the norms ‖·‖_p in R² for p = 1/2, 1, 2, 4, and ∞.]
is a Banach space (Problem 1.10). Note that with this definition, Hölder’s
inequality (1.25) remains true for the cases p = 1, q = ∞ and p = ∞, q = 1.
The reason for the notation is explained in Problem 1.14.
Example 1.5. Every closed subspace of a Banach space is again a Banach space. For example, the space c₀(N) ⊂ ℓ^∞(N) of all sequences converging to zero is a closed subspace. In fact, if a ∈ ℓ^∞(N) \ c₀(N), then lim sup_{j→∞} |a_j| = ε > 0 and thus ‖a − b‖_∞ ≥ ε for every b ∈ c₀(N).
Now what about completeness of C(I)? A sequence of functions fₙ converges to f if and only if
$$\lim_{n\to\infty} \|f - f_n\|_\infty = \lim_{n\to\infty} \max_{x\in I} |f(x) - f_n(x)| = 0. \tag{1.27}$$
For finite dimensional vector spaces the concept of a basis plays a crucial role. In the case of infinite dimensional vector spaces one could define a basis as a maximal set of linearly independent vectors (known as a Hamel basis; Problem 1.7). Such a basis has the advantage that it only requires finite linear combinations. However, the price one has to pay is that such a basis will be way too large (typically uncountable, cf. Problems 1.6 and 1.7).
$$\|a - a^m\|_p = \left( \sum_{j=m+1}^{\infty} |a_j|^p \right)^{1/p} \to 0$$
and hence
$$a = \sum_{n=1}^{\infty} a_n \delta^n$$
and (δⁿ)_{n=1}^∞ is a Schauder basis (uniqueness of the coefficients is left as an exercise).
Note that (δⁿ)_{n=1}^∞ is also a Schauder basis for c₀(N) but not for ℓ^∞(N) (try to approximate a sequence which does not converge to zero).
(In other words, uₙ has mass one and concentrates near x = 0 as n → ∞.)
Then for every f ∈ C[−1/2, 1/2] which vanishes at the endpoints, f(−1/2) = f(1/2) = 0, we have that
$$f_n(x) := \int_{-1/2}^{1/2} u_n(x - y) f(y)\, dy \tag{1.34}$$
Proof. Since f is uniformly continuous, for given ε we can find a δ < 1/2 (independent of x) such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. Moreover, we can choose n such that $\int_{\delta \le |y| \le 1} u_n(y)\, dy \le \varepsilon$. Now abbreviate M := max_{x∈[−1/2,1/2]} {1, |f(x)|} and note
$$\Big| f(x) - \int_{-1/2}^{1/2} u_n(x-y) f(x)\, dy \Big| = |f(x)|\, \Big| 1 - \int_{-1/2}^{1/2} u_n(x-y)\, dy \Big| \le M \varepsilon .$$
Corollary 1.4. The monomials are total and hence C(I) is separable.
Note that while the proof of Theorem 1.3 provides an explicit way of
constructing a sequence of polynomials fn (x) which will converge uniformly
to f (x), this method still has a few drawbacks from a practical point of
view: Suppose we have approximated f by a polynomial of degree n but our
approximation turns out to be insufficient for the intended purpose. First
of all, since our polynomial will not be optimal in general, we could try to
find another polynomial of the same degree giving a better approximation.
However, as this is by no means straightforward, it seems more feasible to
simply increase the degree. However, if we do this, all coefficients will change
and we need to start from scratch. This is in contradistinction to a Schauder
basis where we could just add one new element from the basis (and where it
suffices to compute one new coefficient).
In particular, note that this shows that the monomials are not a Schauder basis for C(I) since adding monomials incrementally to the expansion gives a uniformly convergent power series whose limit must be analytic. This observation emphasizes that a Schauder basis is more than a set of linearly independent vectors whose span is dense.
We will see in the next section that the concept of orthogonality resolves
these problems.
Problem* 1.4. Let X be a Banach space. Show that the norm, vector addition, and multiplication by scalars are continuous. That is, if fₙ → f, gₙ → g, and αₙ → α, then ‖fₙ‖ → ‖f‖, fₙ + gₙ → f + g, and αₙgₙ → αg.
Problem 1.6. While `1 (N) is separable, it still has room for an uncountable
set of linearly independent vectors. Show this by considering vectors of the
form
aα = (1, α, α2 , . . . ), α ∈ (0, 1).
(Hint: Recall the Vandermonde determinant. See Problem 4.2 for a gener-
alization.)
Problem 1.15. Formally extend the definition of ℓ^p(N) to p ∈ (0, 1). Show that ‖·‖_p does not satisfy the triangle inequality. However, show that it is a quasinormed space, that is, it satisfies all requirements for a normed space except for the triangle inequality, which is replaced by
$$\|a + b\| \le K \big( \|a\| + \|b\| \big)$$
The pair (H, ⟨·,·⟩) is called an inner product space. If H is complete (with respect to the norm (1.36)), it is called a Hilbert space.
Example 1.8. Clearly, Cn with the usual scalar product
$$\langle a, b \rangle := \sum_{j=1}^{n} a_j^*\, b_j \tag{1.37}$$
[Figure: decomposition f = f_∥ + f_⊥ of a vector f into a component parallel and a component orthogonal to a unit vector u.]
Proof. It suffices to prove the case ‖g‖ = 1. But then the claim follows from ‖f‖² = |⟨g, f⟩|² + ‖f_⊥‖².
In this case the scalar product can be recovered from its norm by virtue of the polarization identity
$$\langle f, g \rangle = \frac{1}{4} \left( \|f+g\|^2 - \|f-g\|^2 + \mathrm{i} \|f - \mathrm{i} g\|^2 - \mathrm{i} \|f + \mathrm{i} g\|^2 \right). \tag{1.47}$$
Proof. If an inner product space is given, verification of the parallelogram
law and the polarization identity is straightforward (Problem 1.20).
To show the converse, we define
$$s(f, g) := \frac{1}{4} \left( \|f+g\|^2 - \|f-g\|^2 + \mathrm{i} \|f - \mathrm{i} g\|^2 - \mathrm{i} \|f + \mathrm{i} g\|^2 \right).$$
Then s(f, f) = ‖f‖² and s(f, g) = s(g, f)* are straightforward to check. Moreover, another straightforward computation using the parallelogram law shows
$$s(f, g) + s(f, h) = 2\, s\Big( f, \frac{g+h}{2} \Big).$$
Now choosing h = 0 (and using s(f, 0) = 0) shows s(f, g) = 2 s(f, g/2) and thus s(f, g) + s(f, h) = s(f, g + h). Furthermore, by induction we infer $\frac{m}{2^n} s(f, g) = s(f, \frac{m}{2^n} g)$; that is, α s(f, g) = s(f, αg) for a dense set of positive rational numbers α. By continuity (which follows from continuity of the norm) this holds for all α ≥ 0, and s(f, −g) = −s(f, g), respectively s(f, ig) = i s(f, g), finishes the proof.
The corresponding inner product space is denoted by L²_cont(I). Note that we have
$$\|f\| \le \sqrt{|b-a|}\; \|f\|_\infty \tag{1.49}$$
and hence the maximum norm is stronger than the L²_cont norm.
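For completeness, here is the one-line estimate behind (1.49) (a sketch added here, assuming the inner product on L²_cont(a, b) is the usual $\int_a^b f(x)^* g(x)\,dx$):

```latex
\|f\|^2 = \int_a^b |f(x)|^2\, dx
\le (b-a) \max_{x \in [a,b]} |f(x)|^2 = |b-a|\, \|f\|_\infty^2 ,
% and taking square roots gives \|f\| \le \sqrt{|b-a|}\, \|f\|_\infty.
```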
Suppose we have two norms ‖·‖₁ and ‖·‖₂ on a vector space X. Then ‖·‖₂ is said to be stronger than ‖·‖₁ if there is a constant m > 0 such that
$$\|f\|_1 \le m \|f\|_2 . \tag{1.50}$$
It is straightforward to check the following.
Lemma 1.7. If ‖·‖₂ is stronger than ‖·‖₁, then every ‖·‖₂ Cauchy sequence is also a ‖·‖₁ Cauchy sequence.
Proof. Choose a basis {uⱼ}_{1≤j≤n} such that every f ∈ X can be written as f = Σⱼ αⱼuⱼ. Since equivalence of norms is an equivalence relation (check this!), we can assume that ‖·‖₂ is the usual Euclidean norm: ‖f‖₂ := ‖Σⱼ αⱼuⱼ‖₂ = (Σⱼ |αⱼ|²)^{1/2}. Then, by the triangle and Cauchy–Schwarz inequalities,
$$\|f\|_1 \le \sum_j |\alpha_j| \|u_j\|_1 \le \sqrt{\sum_j \|u_j\|_1^2}\; \|f\|_2$$
and we can choose $m_2 = \sqrt{\sum_j \|u_j\|_1^2}$.
In particular, if fₙ is convergent with respect to ‖·‖₂, it is also convergent with respect to ‖·‖₁. Thus ‖·‖₁ is continuous with respect to ‖·‖₂ and attains its minimum m > 0 on the unit sphere S := {u | ‖u‖₂ = 1} (which is compact by the Heine–Borel theorem, Theorem B.22). Now choose m₁ = 1/m.
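As an explicit illustration of the constants m₁, m₂ (an added example, not from the original text), compare the Euclidean norm on Cⁿ with the maximum norm:

```latex
% For x \in \mathbb{C}^n with \|x\|_\infty := \max_j |x_j| and
% \|x\|_2 := (\sum_j |x_j|^2)^{1/2} one has
\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty ,
% so one can take m_1 = 1 and m_2 = \sqrt{n}. Both bounds are attained,
% by x = \delta^1 and x = (1, \dots, 1) respectively, so they are optimal.
```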
Finally, I remark that a real Hilbert space can always be embedded into
a complex Hilbert space. In fact, if H is a real Hilbert space, then H × H is
a complex Hilbert space if we define
$$(f_1, f_2) + (g_1, g_2) = (f_1 + g_1, f_2 + g_2), \qquad (\alpha + \mathrm{i}\beta)(f_1, f_2) = (\alpha f_1 - \beta f_2, \alpha f_2 + \beta f_1) \tag{1.52}$$
and
$$\langle (f_1, f_2), (g_1, g_2) \rangle = \langle f_1, g_1 \rangle + \langle f_2, g_2 \rangle + \mathrm{i} \big( \langle f_1, g_2 \rangle - \langle f_2, g_1 \rangle \big). \tag{1.53}$$
Here you should think of (f₁, f₂) as f₁ + if₂. Note that we have a conjugate linear map C : H × H → H × H, (f₁, f₂) ↦ (f₁, −f₂), which satisfies C² = I and ⟨Cf, Cg⟩ = ⟨g, f⟩. In particular, we can get our original Hilbert space back if we consider Re(f) = ½(f + Cf) = (f₁, 0).
Problem 1.17. Show that the norm in a Hilbert space satisfies ‖f + g‖ = ‖f‖ + ‖g‖ if and only if f = αg, α ≥ 0, or g = 0. Hence Hilbert spaces are strictly convex (cf. Problem 1.13).
Problem 1.18 (Generalized parallelogram law). Show that, in a Hilbert space,
$$\sum_{1 \le j < k \le n} \|x_j - x_k\|^2 + \Big\| \sum_{1 \le j \le n} x_j \Big\|^2 = n \sum_{1 \le j \le n} \|x_j\|^2$$
for every n ∈ N. The case n = 2 is (1.46).
Problem 1.19. Show that the maximum norm on C[0, 1] does not satisfy
the parallelogram law.
Problem* 1.20. Suppose Q is a complex vector space. Let s(f, g) be a
sesquilinear form on Q and q(f ) := s(f, f ) the associated quadratic form.
Prove the parallelogram law
q(f + g) + q(f − g) = 2q(f ) + 2q(g) (1.54)
is finite. Show
$$\|q\| \le \|s\| \le 2 \|q\|$$
with ‖q‖ = ‖s‖ if s is symmetric. (Hint: Use the polarization identity from the previous problem. For the symmetric case look at the real part.)
Problem* 1.22. Suppose Q is a vector space. Let s(f, g) be a sesquilinear form on Q and q(f) := s(f, f) the associated quadratic form. Show that the Cauchy–Schwarz inequality
$$|s(f, g)| \le q(f)^{1/2} q(g)^{1/2}$$
holds if q(f) ≥ 0. In this case q(·)^{1/2} satisfies the triangle inequality and hence is a seminorm.
(Hint: Consider 0 ≤ q(f + αg) = q(f) + 2 Re(α s(f, g)) + |α|² q(g) and choose α = t s(f, g)*/|s(f, g)| with t ∈ R.)
Problem* 1.23. Prove the claims made about fn in Example 1.10.
1.4. Completeness
Since L2cont is not complete, how can we obtain a Hilbert space from it? Well,
the answer is simple: take the completion.
If X is an (incomplete) normed space, consider the set of all Cauchy
sequences X . Call two Cauchy sequences equivalent if their difference con-
verges to zero and denote by X̄ the set of all equivalence classes. It is easy
to see that X̄ (and X ) inherit the vector space structure from X. Moreover,
Lemma 1.9. If xₙ is a Cauchy sequence in X, then ‖xₙ‖ is also a Cauchy sequence and thus converges.
The only requirement for a norm which is not immediate is the triangle
inequality (except for p = 1, 2) but this can be shown as for `p (cf. Prob-
lem 1.26).
1.5. Compactness
In finite dimensions relatively compact sets are easily identified as they are
precisely the bounded sets by the Heine–Borel theorem (Theorem B.22).
In the infinite dimensional case the situation is more complicated. Before
we look into this, please recall that for a subset U of a Banach space the
following are equivalent (see Corollary B.20 and Lemma B.26):
• U is relatively compact (i.e. its closure is compact)
• every sequence from U has a convergent subsequence
• U is totally bounded (i.e. it has a finite ε-cover for every ε > 0)
Example 1.13. Consider the bounded sequence (δⁿ)_{n=1}^∞ in ℓ^p(N). Since ‖δⁿ − δᵐ‖_p = 2^{1/p} for n ≠ m, there is no way to extract a convergent subsequence.
In particular, the Heine–Borel theorem fails for `p (N). In fact, it turns
out that it fails in any infinite dimensional space as we will see in Theo-
rem 4.30 below. Hence one needs criteria when a given subset is relatively
compact. Our strategy will be based on total boundedness and can be out-
lined as follows: Project the original set to some finite dimensional space
such that the information loss can be made arbitrarily small (by increasing
the dimension of the finite dimensional space) and apply Heine–Borel to the
finite dimensional space. This idea is formalized in the following lemma.
Lemma 1.11. Let X be a metric space and K some subset. Assume that
for every ε > 0 there is a metric space Yn , a surjective map Pn : X → Yn ,
and some δ > 0 such that Pn (K) is totally bounded and d(x, y) < ε whenever
x, y ∈ K with d(Pn (x), Pn (y)) < δ. Then K is totally bounded.
In particular, if X is a Banach space the claim holds if Pₙ can be chosen a linear map onto a finite dimensional subspace Yₙ such that ‖Pₙ‖ ≤ C, PₙK is bounded, and ‖(1 − Pₙ)x‖ ≤ ε for x ∈ K.
{B_ε(xⱼ)}_{j=1}^n is an ε-cover for K since P_n^{-1}(B_δ(yⱼ)) ∩ K ⊆ B_ε(xⱼ).
For the last claim take Pₙ corresponding to ε/3 and note that ‖x − y‖ ≤ ‖(1 − Pₙ)x‖ + ‖Pₙ(x − y)‖ + ‖(1 − Pₙ)y‖ < ε for δ := ε/3.
Proof. Clearly (i) and (ii) are what is needed for Lemma 1.11.
Conversely, if K is relatively compact it is bounded. Moreover, given δ we can choose a finite δ-cover {B_δ(aʲ)}_{j=1}^m for K and some n such that ‖(1 − Pₙ)aʲ‖_p ≤ δ for all 1 ≤ j ≤ m. Now given a ∈ K we have a ∈ B_δ(aʲ) for some j and hence ‖(1 − Pₙ)a‖_p ≤ ‖(1 − Pₙ)(a − aʲ)‖_p + ‖(1 − Pₙ)aʲ‖_p ≤ 2δ as required.
Example 1.14. Fix a ∈ ℓ^p(N) if 1 ≤ p < ∞, or a ∈ c₀(N) if p = ∞. Then K := {b | |bⱼ| ≤ |aⱼ|} ⊂ ℓ^p(N) is compact.
The second application will be to C(I). A family of functions F ⊂ C(I)
is called (pointwise) equicontinuous if for every ε > 0 and every x ∈ I
there is a δ > 0 such that
|f (y) − f (x)| ≤ ε whenever |y − x| < δ, ∀f ∈ F. (1.56)
That is, in this case δ is required to be independent of the function f ∈ F .
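A simple example (added for illustration): a family with a uniform Lipschitz bound is equicontinuous, with a δ that does not even depend on the point x.

```latex
% If F = \{ f \in C(I) : |f(x) - f(y)| \le L\,|x - y| \text{ for all } x, y \in I \}
% for some fixed L > 0, then given \varepsilon > 0 the choice
% \delta := \varepsilon / L works simultaneously for all f \in F and all x \in I:
|f(y) - f(x)| \le L\,|y - x| < L\,\delta = \varepsilon
\quad \text{whenever } |y - x| < \delta .
```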
Theorem 1.13 (Arzelà–Ascoli). Let F ⊂ C(I) be a family of continuous functions. Then every sequence from F has a uniformly convergent subsequence if and only if F is equicontinuous and the set {f(x₀) | f ∈ F} is bounded for one x₀ ∈ I. In this case F is even bounded.
connected and since x0 ∈ Bδj (xj ) for some j, we see that F is bounded:
|f (x)| ≤ supf ∈F |f (x0 )| + nε.
Next consider P : C[0, 1] → Cⁿ, P(f) = (f(x₁), . . . , f(xₙ)). Then P(F) is bounded and ‖f − g‖_∞ ≤ 3ε whenever ‖P(f) − P(g)‖_∞ < ε. Indeed, just note that for every x there is some j such that x ∈ B_{δⱼ}(xⱼ) and thus |f(x) − g(x)| ≤ |f(x) − f(xⱼ)| + |f(xⱼ) − g(xⱼ)| + |g(xⱼ) − g(x)| ≤ 3ε. Hence F is relatively compact by Lemma 1.11.
Conversely, suppose F is relatively compact. Then F is totally bounded and hence bounded. To see equicontinuity fix x ∈ I, ε > 0 and choose a corresponding ε-cover {B_ε(fⱼ)}_{j=1}^n for F. Pick δ > 0 such that y ∈ B_δ(x) implies |fⱼ(y) − fⱼ(x)| < ε for all 1 ≤ j ≤ n. Then f ∈ B_ε(fⱼ) for some j and hence |f(y) − f(x)| ≤ |f(y) − fⱼ(y)| + |fⱼ(y) − fⱼ(x)| + |fⱼ(x) − f(x)| ≤ 3ε, proving equicontinuity.
Example 1.15. Consider the solution yₙ(x) of the initial value problem
$$y' = \sin(n y), \qquad y(0) = 1.$$
(Assuming this solution exists; it can in principle be found using separation of variables.) Then |yₙ′(x)| ≤ 1 and hence the mean value theorem shows that the family {yₙ} ⊆ C([0, 1]) is equicontinuous. Hence there is a uniformly convergent subsequence.
Problem 1.27. Show that a subset F ⊂ c0 (N) is relatively compact if and
only if there is a nonnegative sequence a ∈ c0 (N) such that |bn | ≤ an for all
n ∈ N and all b ∈ F .
Problem 1.28. Find a family in C[0, 1] that is equicontinuous but not
bounded.
Problem 1.29. Which of the following families are relatively compact in
C[0, 1]?
(i) F = {f ∈ C¹[0, 1] | ‖f‖_∞ ≤ 1}
(ii) F = {f ∈ C¹[0, 1] | ‖f′‖_∞ ≤ 1}
(iii) F = {f ∈ C¹[0, 1] | ‖f‖_∞ ≤ 1, ‖f′‖₂ ≤ 1}
and range
Ran(A) := {Af |f ∈ D(A)} = AD(A) ⊆ Y (1.59)
are again linear subspaces. Note that a linear map A will be continuous if
and only if it is continuous at 0, that is, xn ∈ D(A) → 0 implies Axn → 0.
The operator A is called bounded if the operator norm
$$\|A\| := \sup_{f \in D(A),\ \|f\|_X = 1} \|Af\|_Y \tag{1.60}$$
is finite. This says that A is bounded if the image of the closed unit ball
B̄1 (0) ⊂ X is contained in some closed ball B̄r (0) ⊂ Y of finite radius r
(with the smallest radius being the operator norm). Hence A is bounded if
and only if it maps bounded sets to bounded sets.
Note that if you replace the norm on X or Y then the operator norm
will of course also change in general. However, if the norms are equivalent
so will be the operator norms.
By construction, a bounded operator satisfies
$$\|Af\|_Y \le \|A\| \|f\|_X, \qquad f \in D(A), \tag{1.61}$$
and hence is Lipschitz continuous, that is, ‖Af − Ag‖_Y ≤ ‖A‖ ‖f − g‖_X for f, g ∈ D(A). In particular, it is continuous. The converse is also true:
Theorem 1.14. A linear operator A is bounded if and only if it is continu-
ous.
Proof. Choose a basis {xⱼ}_{j=1}^n for X such that every x ∈ X can be written as x = Σ_{j=1}^n αⱼxⱼ. By Theorem 1.8 we can assume ‖x‖_X = (Σ_{j=1}^n |αⱼ|²)^{1/2} without loss of generality. Then, by the triangle and Cauchy–Schwarz inequalities,
$$\|Ax\|_Y \le \sum_{j=1}^{n} |\alpha_j| \|Ax_j\|_Y \le \sqrt{\sum_{j=1}^{n} \|Ax_j\|_Y^2}\; \|x\|_X
$$
However, if we consider A = d/dx : D(A) ⊆ Y → Y defined on D(A) = C¹[0, 1], then we have an unbounded operator. Indeed, choose uₙ(x) :=
$$\ell_j(a) := a_j$$
are bounded linear functionals: |ℓⱼ(a)| = |aⱼ| ≤ ‖a‖_p and hence ‖ℓⱼ‖ = 1. More generally, let b ∈ ℓ^q(N), where 1/p + 1/q = 1. Then
$$\ell_b(a) := \sum_{j=1}^{\infty} b_j a_j$$
The Banach space of bounded linear operators L(X) even has a multiplication given by composition. Clearly, this multiplication is distributive
$$(A + B)C = AC + BC, \qquad A(B + C) = AB + AC, \qquad A, B, C \in L(X) \tag{1.62}$$
and associative
$$(AB)C = A(BC), \qquad \alpha(AB) = (\alpha A)B = A(\alpha B), \qquad \alpha \in \mathbb{C}. \tag{1.63}$$
Moreover, it is easy to see that we have
$$\|AB\| \le \|A\| \|B\|. \tag{1.64}$$
In other words, L(X) is a so-called Banach algebra. However, note that our multiplication is not commutative (unless X is one-dimensional). We even have an identity, the identity operator I, satisfying ‖I‖ = 1.
Problem 1.30. Consider X = Cⁿ and let A ∈ L(X) be a matrix. Equip X with the norm (show that this is a norm)
$$\|x\|_\infty := \max_{1 \le j \le n} |x_j|$$
and compute the operator norm ‖A‖ with respect to this norm in terms of the matrix entries. Do the same with respect to the norm
$$\|x\|_1 := \sum_{1 \le j \le n} |x_j| .$$
Problem* 1.32. Let I be a compact interval. Show that the set of continuously differentiable functions C¹(I) becomes a Banach space if we set ‖f‖_{∞,1} := max_{x∈I} |f(x)| + max_{x∈I} |f′(x)|.
Problem* 1.33. Show that ‖AB‖ ≤ ‖A‖ ‖B‖ for every A, B ∈ L(X). Conclude that the multiplication is continuous: Aₙ → A and Bₙ → B imply AₙBₙ → AB.
Problem 1.34. Let A ∈ L(X) be a bijection. Show
$$\|A^{-1}\|^{-1} = \inf_{f \in X,\ \|f\| = 1} \|Af\| .$$
Problem* 1.35. Suppose B ∈ L(X) with ‖B‖ < 1. Then I + B is invertible with
$$(\mathrm{I} + B)^{-1} = \sum_{n=0}^{\infty} (-1)^n B^n .$$
Consequently, for A, B ∈ L(X, Y), A + B is invertible if A is invertible and ‖B‖ < ‖A⁻¹‖⁻¹.
Problem* 1.36. Let
$$f(z) := \sum_{j=0}^{\infty} f_j z^j, \qquad |z| < R,$$
be a convergent power series and suppose A ∈ L(X) with ‖A‖ < R. Show that
$$f(A) := \sum_{j=0}^{\infty} f_j A^j$$
exists and defines a bounded linear operator. Moreover, if f and g are two such functions and α ∈ C, then
$$(f + g)(A) = f(A) + g(A), \qquad (\alpha f)(A) = \alpha f(A), \qquad (fg)(A) = f(A)g(A).$$
(Hint: Problem 1.5.)
Problem* 1.37. Show that a linear map ℓ : X → C is continuous if and only if its kernel is closed. (Hint: If ℓ is not continuous, we can find a sequence of normalized vectors xₙ with |ℓ(xₙ)| → ∞ and a vector y with ℓ(y) = 1.)
be closed (no problems occur if one of the spaces is finite dimensional — see
Corollary 1.19 below).
Example 1.22. Consider X := ℓ^p(N). Let M = {a ∈ X | a_{2n} = 0} and N = {a ∈ X | a_{2n+1} = n³ a_{2n}}. Then both subspaces are closed and M ∩ N = {0}. Moreover, M ∔ N is dense since it contains all sequences with finite support. However, it is not all of X since aₙ = 1/n² ∉ M ∔ N. Indeed, if we could write a = b + c ∈ M ∔ N, then c_{2n} = 1/(4n²) and hence c_{2n+1} = n/4, contradicting c ∈ N ⊆ X.
A closed subspace M is called complemented if we can find another closed subspace N with M ∩ N = {0} and M ∔ N = X. In this case every x ∈ X can be uniquely written as x = x₁ + x₂ with x₁ ∈ M, x₂ ∈ N and we can define a projection P : X → M, x ↦ x₁. By definition P² = P and we have a complementary projection Q := I − P with Q : X → N, x ↦ x₂. Moreover, it is straightforward to check M = Ker(Q) = Ran(P) and N = Ker(P) = Ran(Q). Of course one would like P (and hence also Q) to be continuous. If we consider the linear operator φ : M ⊕ N → X, (x₁, x₂) ↦ x₁ + x₂, then this is equivalent to the question whether φ⁻¹ is continuous. By the triangle inequality φ is continuous with ‖φ‖ ≤ 1 and the inverse mapping theorem (Theorem 4.6) will answer this question affirmatively.
It is important to emphasize that it is precisely the requirement that N is closed which makes P continuous (conversely, observe that N = Ker(P) is closed if P is continuous). Without this requirement we can always find N by a simple application of Zorn's lemma (order the subspaces which have trivial intersection with M by inclusion and note that a maximal element has the required properties). Moreover, the question which closed subspaces can be complemented is a highly nontrivial one. If M is finite (co)dimensional, then it can be complemented (see Problems 1.44 and 4.26).
Given a subspace M of a linear space X we can define the quotient space X/M as the set of all equivalence classes [x] = x + M with respect to the equivalence relation x ≡ y if x − y ∈ M. It is straightforward to see that X/M is a vector space when defining [x] + [y] = [x + y] and α[x] = [αx] (show that these definitions are independent of the representative of the equivalence class). In particular, for a linear operator A : X → Y the linear space Coker(A) := Y / Ran(A) is known as the cokernel of A. The dimension of X/M is known as the codimension of M.
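A concrete example (added here for illustration) where the quotient can be computed explicitly: take X = C[0, 1] and M = {f ∈ X | f(0) = 0}, which is closed since evaluation at 0 is continuous.

```latex
% With the quotient norm \|[f]\| = \inf_{g \in M} \|f + g\|_\infty (cf. (1.65)):
% choosing g = f(0) - f \in M shows \|[f]\| \le |f(0)|, while
% |(f+g)(0)| = |f(0)| for every g \in M gives \|[f]\| \ge |f(0)|. Hence
\|[f]\| = |f(0)|, \qquad X/M \cong \mathbb{C} \ \text{via} \ [f] \mapsto f(0),
% so M has codimension one, in line with Problem 1.40.
```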
is a Banach space.
Proof. First of all we need to show that (1.65) is indeed a norm. If ‖[x]‖ = 0 we must have a sequence yⱼ ∈ M with yⱼ → −x, and since M is closed we conclude x ∈ M, that is, [x] = [0] as required. To see ‖α[x]‖ = |α| ‖[x]‖ we use again the definition
$$\|\alpha [x]\| = \|[\alpha x]\| = \inf_{y \in M} \|\alpha x + y\| = \inf_{y \in M} \|\alpha x + \alpha y\| = |\alpha| \inf_{y \in M} \|x + y\| = |\alpha| \|[x]\| .$$
Show that X is a Banach space. Show that all norms are equivalent and that
this sum is associative (X1 ⊕p X2 ) ⊕p X3 = X1 ⊕p (X2 ⊕p X3 ).
Problem 1.39. Let Xⱼ, j ∈ N, be Banach spaces. Let X := ⨁_{p, j∈N} Xⱼ be the set of all elements x = (xⱼ)_{j∈N} of the Cartesian product for which the norm
$$\|x\|_p := \begin{cases} \left( \sum_{j \in \mathbb{N}} \|x_j\|^p \right)^{1/p}, & 1 \le p < \infty, \\ \sup_{j \in \mathbb{N}} \|x_j\|, & p = \infty, \end{cases}$$
is finite. Show that X is a Banach space. Show that for 1 ≤ p < ∞ the elements with finitely many nonzero terms are dense and conclude that X is separable if all Xⱼ are.
Problem 1.40. Let ` be a nontrivial linear functional. Then its kernel has
codimension one.
Note that the space Cbk (I) could be further refined by requiring the
highest derivatives to be Hölder continuous. Recall that a function f : I → C
is called uniformly Hölder continuous with exponent γ ∈ (0, 1] if
$$[f]_\gamma := \sup_{x\neq y\in I} \frac{|f(x)-f(y)|}{|x-y|^\gamma} \tag{1.67}$$
is finite. Clearly, any Hölder continuous function is uniformly continuous
and, in the special case γ = 1, we obtain the Lipschitz continuous func-
tions. Note that for γ = 0 the Hölder condition boils down to boundedness
and also the case γ > 1 is not very interesting (Problem 1.45).
Example 1.25. By the mean value theorem every function $f \in C^1_b(I)$ is Lipschitz continuous with $[f]_1 \le \|f'\|_\infty$.
Example 1.26. The prototypical example of a Hölder continuous function is of course $f(x) = x^\gamma$ on $[0,\infty)$ with $\gamma \in (0,1]$. In fact, without loss of generality we can assume $0 \le x < y$ and set $t = \frac{x}{y} \in [0,1)$. Then we have
$$\frac{y^\gamma - x^\gamma}{(y-x)^\gamma} = \frac{1-t^\gamma}{(1-t)^\gamma} \le \frac{1-t}{1-t} = 1,$$
since $t^\gamma \ge t$ and $(1-t)^\gamma \ge 1-t$ for $t \in [0,1)$ and $\gamma \in (0,1]$.
From this one easily gets further examples since the composition of two
Hölder continuous functions is again Hölder continuous (the exponent being
the product).
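The bound $[x^\gamma]_\gamma \le 1$ from Example 1.26 can be checked numerically by maximizing the difference quotients over a grid (this only gives a lower bound for the seminorm; the grid and the test function are illustrative choices):

```python
import numpy as np

def holder_quotients(f, xs, gamma):
    """All quotients |f(x)-f(y)| / |x-y|^gamma over a grid; their maximum
    is a lower bound for the Hoelder seminorm [f]_gamma."""
    x, y = np.meshgrid(xs, xs)
    mask = x != y
    return np.abs(f(x[mask]) - f(y[mask])) / np.abs(x[mask] - y[mask])**gamma

gamma = 0.5
xs = np.linspace(0.0, 10.0, 400)
q = holder_quotients(lambda t: t**gamma, xs, gamma)
print(q.max())  # stays <= 1; the value 1 is attained for pairs with x = 0
```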
It is easy to verify that this is a seminorm and that the corresponding
space is complete.
38 1. A first look at Banach and Hilbert spaces
Theorem 1.21. Let I ⊆ R be an interval. The space $C^{k,\gamma}_b(I)$ of all functions whose derivatives up to order k are bounded and Hölder continuous with exponent γ ∈ (0, 1] forms a Banach space with norm
implying lim supm→∞ [gm ]γ2 ≤ 2Cεγ1 −γ2 and since ε > 0 is arbitrary this
establishes the claim.
As pointed out in the example before, the embedding Cb1 (I) ⊆ Cb0,1 (I) is
continuous and combining this with the previous result immediately gives
For now continuous functions on intervals will be sufficient for our pur-
pose. However, once we delve deeper into the subject we will also need
continuous functions on topological spaces X. Luckily most of the results
extend to this case in a more or less straightforward way. If you are not
familiar with these extensions you can find them in Section B.8.
Problem 1.45. Let I be an interval. Suppose f : I → C is Hölder continu-
ous with exponent γ > 1. Show that f is constant.
Problem* 1.46. Suppose X is a vector space and $\|\cdot\|_j$, 1 ≤ j ≤ n, is a finite family of seminorms. Show that $\|x\| := \sum_{j=1}^{n} \|x\|_j$ is a seminorm. It is a norm if and only if $\|x\|_j = 0$ for all j implies x = 0.
Problem 1.47. Let I be an interval. Show that the product of two bounded Hölder continuous functions is again Hölder continuous with
$$[fg]_\gamma \le \|f\|_\infty [g]_\gamma + [f]_\gamma \|g\|_\infty.$$
Chapter 2
Hilbert spaces
computes
$$\|f - \hat f\|^2 = \|f_\parallel + f_\perp - \hat f\|^2 = \|f_\perp\|^2 + \|f_\parallel - \hat f\|^2 = \|f_\perp\|^2 + \sum_{j=1}^{n} |\alpha_j - \langle u_j, f\rangle|^2,$$
with equality holding if and only if f lies in the span of $\{u_j\}_{j=1}^{n}$.
Of course, since we cannot assume H to be a finite dimensional vec-
tor space, we need to generalize Lemma 2.1 to arbitrary orthonormal sets
{uj }j∈J . We start by assuming that J is countable. Then Bessel’s inequality
(2.4) shows that
$$\sum_{j\in J} |\langle u_j, f\rangle|^2 \tag{2.5}$$
converges absolutely. Moreover, for any finite subset K ⊂ J we have
$$\Bigl\|\sum_{j\in K} \langle u_j, f\rangle u_j\Bigr\|^2 = \sum_{j\in K} |\langle u_j, f\rangle|^2 \tag{2.6}$$
by the Pythagorean theorem and thus $\sum_{j\in J} \langle u_j, f\rangle u_j$ is a Cauchy sequence if and only if $\sum_{j\in J} |\langle u_j, f\rangle|^2$ is. Now let J be arbitrary. Again, Bessel's inequality shows that for any given ε > 0 there are at most finitely many j for which $|\langle u_j, f\rangle| \ge \varepsilon$ (namely at most $\|f\|^2/\varepsilon^2$). Hence there are at most countably many j for which $|\langle u_j, f\rangle| > 0$. Thus it follows that
$$\sum_{j\in J} |\langle u_j, f\rangle|^2 \tag{2.7}$$
is well defined (as a countable sum over the nonzero terms) and (by completeness) so is
$$\sum_{j\in J} \langle u_j, f\rangle u_j. \tag{2.8}$$
Furthermore, it is also independent of the order of summation.
Proof. The first part follows as in Lemma 2.1 using continuity of the scalar product. The same is true for the last part except for the fact that every $f \in \overline{\operatorname{span}\{u_j\}_{j\in J}}$ can be written as $f = \sum_{j\in J} \alpha_j u_j$ (i.e., $f = f_\parallel$). To see this, let $f_n \in \operatorname{span}\{u_j\}_{j\in J}$ converge to f. Then $\|f - f_n\|^2 = \|f_\parallel - f_n\|^2 + \|f_\perp\|^2 \to 0$ implies $f_n \to f_\parallel$ and $f_\perp = 0$.
Note that from Bessel's inequality (which of course still holds), it follows that the map $f \mapsto f_\parallel$ is continuous.
Of course we are particularly interested in the case where every f ∈ H can be written as $f = \sum_{j\in J} \langle u_j, f\rangle u_j$. In this case we will call the orthonormal
set {uj }j∈J an orthonormal basis (ONB).
If H is separable it is easy to construct an orthonormal basis. In fact, if
H is separable, then there exists a countable total set $\{f_j\}_{j=1}^{N}$. Here $N \in \mathbb N$
if H is finite dimensional and N = ∞ otherwise. After throwing away some
vectors, we can assume that fn+1 cannot be expressed as a linear combination
of the vectors f1 , . . . , fn . Now we can construct an orthonormal set as
follows: We begin by normalizing $f_1$:
$$u_1 := \frac{f_1}{\|f_1\|}. \tag{2.12}$$
Next we take $f_2$ and remove the component parallel to $u_1$ and normalize again:
$$u_2 := \frac{f_2 - \langle u_1, f_2\rangle u_1}{\|f_2 - \langle u_1, f_2\rangle u_1\|}. \tag{2.13}$$
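The procedure just described is the Gram–Schmidt process. A short self-contained sketch in $\mathbb C^n$ (a finite dimensional stand-in for H; the input vectors below are illustrative):

```python
import numpy as np

def gram_schmidt(fs):
    """Orthonormalize linearly independent vectors, mirroring (2.12)-(2.13):
    subtract the components along the previous u_j, then normalize."""
    us = []
    for f in fs:
        v = f.astype(complex).copy()
        for u in us:
            v -= np.vdot(u, v) * u   # remove the component parallel to u (<u, v> u)
        us.append(v / np.linalg.norm(v))
    return us

fs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(fs)
G = np.array([[np.vdot(u, v) for v in us] for u in us])
print(np.round(np.abs(G), 10))  # identity matrix: the u_j are orthonormal
```

Note that `np.vdot(u, v)` conjugates its first argument, matching the convention that the scalar product is antilinear in the first entry.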
for any finite n and thus also for n = N (if N = ∞). Since $\{f_j\}_{j=1}^{N}$ is total, so is $\{u_j\}_{j=1}^{N}$. Now suppose there is some $f = f_\parallel + f_\perp \in H$ for which $f_\perp \neq 0$. Since $\{u_j\}_{j=1}^{N}$ is total, we can find an $\hat f$ in its span such that $\|f - \hat f\| < \|f_\perp\|$, contradicting (2.11). Hence we infer that $\{u_j\}_{j=1}^{N}$ is an orthonormal basis.
By continuity of the norm it suffices to check (iii), and hence also (ii), for f in a dense set. In fact, by the inverse triangle inequality for $\ell^2(\mathbb N)$ and the Bessel inequality we have
$$\Bigl|\sum_{j\in J} |\langle u_j, f\rangle|^2 - \sum_{j\in J} |\langle u_j, g\rangle|^2\Bigr| \le \sqrt{\sum_{j\in J} |\langle u_j, f-g\rangle|^2}\,\sqrt{\sum_{j\in J} |\langle u_j, f+g\rangle|^2} \le \|f-g\|\,\|f+g\|, \tag{2.19}$$
implying $\sum_{j\in J} |\langle u_j, f_n\rangle|^2 \to \sum_{j\in J} |\langle u_j, f\rangle|^2$ if $f_n \to f$.
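In finite dimensions Parseval's identity, and hence this continuity statement, can be verified directly. A small sketch with a random orthonormal basis of $\mathbb C^n$ (the dimension and random seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# a random orthonormal basis of C^n: the columns of a unitary Q
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(Z)
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)

coeffs = Q.conj().T @ f   # the expansion coefficients <u_j, f>
# Parseval: sum of |<u_j, f>|^2 equals ||f||^2
print(np.sum(np.abs(coeffs)**2), np.linalg.norm(f)**2)
```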
It is not surprising that if there is one countable basis, then it follows
that every other basis is countable as well.
Theorem 2.5. In a Hilbert space H every orthonormal basis has the same
cardinality.
Proof. Let {uj }j∈J and {vk }k∈K be two orthonormal bases. We first look
at the case where one of them, say the first, is finite dimensional: J =
$\{1,\dots,n\}$. Suppose the other basis has at least n elements: $\{1,\dots,n\} \subseteq K$. Then $v_k = \sum_{j=1}^{n} U_{k,j} u_j$, where $U_{k,j} = \langle u_j, v_k\rangle$. By $\delta_{j,k} = \langle v_j, v_k\rangle = \sum_{l=1}^{n} U_{j,l}^* U_{k,l}$ we see $u_j = \sum_{k=1}^{n} U_{k,j}^* v_k$, showing that K cannot have more than n elements.
Now let us turn to the case where both J and K are infinite. Set $K_j = \{k \in K \,|\, \langle v_k, u_j\rangle \neq 0\}$. Since these are the expansion coefficients of $u_j$ with respect to $\{v_k\}_{k\in K}$, this set is countable (and nonempty). Hence the set $\tilde K = \bigcup_{j\in J} K_j$ satisfies $|\tilde K| \le |J \times \mathbb N| = |J|$ (Theorem A.9). But $k \in K \setminus \tilde K$
It even turns out that, up to unitary equivalence, there is only one sep-
arable infinite dimensional Hilbert space:
A bijective linear operator U ∈ L (H1 , H2 ) is called unitary if U pre-
serves scalar products:
hU g, U f i2 = hg, f i1 , g, f ∈ H1 . (2.20)
By the polarization identity (1.47), this is the case if and only if U preserves
norms: kU f k2 = kf k1 for all f ∈ H1 (note a norm preserving linear operator
is automatically injective). The two Hilbert spaces H1 and H2 are called
unitarily equivalent in this case.
Let H be a separable infinite dimensional Hilbert space and let $\{u_j\}_{j\in\mathbb N}$ be any orthonormal basis. Then the map $U : H \to \ell^2(\mathbb N)$, $f \mapsto (\langle u_j, f\rangle)_{j\in\mathbb N}$, is unitary. Indeed by Theorem 2.4 (iii) it is norm preserving and hence injective. To see that it is onto, let $a \in \ell^2(\mathbb N)$ and observe that by $\|\sum_{j=m}^{n} a_j u_j\|^2 = \sum_{j=m}^{n} |a_j|^2$ the vector $f := \sum_{j\in\mathbb N} a_j u_j$ is well defined and satisfies $a_j = \langle u_j, f\rangle$. In particular,
Theorem 2.6. Any separable infinite dimensional Hilbert space is unitarily
equivalent to `2 (N).
Of course the same argument shows that every finite dimensional Hilbert
space of dimension n is unitarily equivalent to Cn with the usual scalar
product.
Finally we briefly turn to the case where H is not separable.
Theorem 2.7. Every Hilbert space has an orthonormal basis.
Proof. To prove this we need to resort to Zorn’s lemma (see Appendix A):
The collection of all orthonormal sets in H can be partially ordered by in-
clusion. Moreover, every linearly ordered chain has an upper bound (the
union of all sets in the chain). Hence Zorn’s lemma implies the existence of
a maximal element, that is, an orthonormal set which is not a proper subset of any other orthonormal set.
It is not difficult to show that every almost periodic function has a mean
value
$$M(f) := \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t)\,dt$$
and one can show that
$$\langle f, g\rangle := M(f^* g)$$
defines a scalar product on AP (R) (only positivity is nontrivial and it will
not be shown here). Note that kf k ≤ kf k∞ . Abbreviating eθ (t) = eiθt one
computes M (eθ ) = 0 if θ 6= 0 and M (e0 ) = 1. In particular, {eθ }θ∈R is an
uncountable orthonormal set and
f (t) 7→ fˆ(θ) := heθ , f i = M (e−θ f )
maps AP (R) isometrically (with respect to k.k) into `2 (R). This map is
however not surjective (take e.g. a Fourier series which converges in mean
square but not uniformly — see later).
Problem* 2.1. Given some vectors $f_1,\dots,f_n$ we define their Gram determinant as
$$\Gamma(f_1,\dots,f_n) := \det\bigl(\langle f_j, f_k\rangle\bigr)_{1\le j,k\le n}.$$
Show that the Gram determinant is nonzero if and only if the vectors are
linearly independent. Moreover, show that in this case
$$\operatorname{dist}(g, \operatorname{span}\{f_1,\dots,f_n\})^2 = \frac{\Gamma(f_1,\dots,f_n,g)}{\Gamma(f_1,\dots,f_n)}$$
and
$$\Gamma(f_1,\dots,f_n) \le \prod_{j=1}^{n} \|f_j\|^2,$$
with equality if the vectors are orthogonal. (Hint: How does Γ change when
you apply the Gram–Schmidt procedure?)
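The two formulas from Problem 2.1 are easy to test numerically (this is only a sanity check on random vectors, not a proof; dimensions and seed are illustrative):

```python
import numpy as np

def gram(vectors):
    """The Gram determinant det(<f_j, f_k>)."""
    G = np.array([[np.vdot(f, g) for g in vectors] for f in vectors])
    return np.linalg.det(G).real

rng = np.random.default_rng(1)
fs = [rng.standard_normal(4) for _ in range(2)]
g = rng.standard_normal(4)

# dist(g, span{f_1,...,f_n})^2 = Gamma(f_1,...,f_n,g) / Gamma(f_1,...,f_n)
dist2 = gram(fs + [g]) / gram(fs)

# compare against a least-squares projection onto span{f_1,...,f_n}
F = np.column_stack(fs)
coef, *_ = np.linalg.lstsq(F, g, rcond=None)
print(dist2, np.linalg.norm(g - F @ coef)**2)  # the two values agree
```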
Problem 2.2. Let {uj } be some orthonormal basis. Show that a bounded
linear operator A is uniquely determined by its matrix elements Ajk :=
huj , Auk i with respect to this basis.
Problem 2.3. Give an example of a nonempty closed bounded subset of a
Hilbert space which does not contain an element with minimal norm. Can
this happen in finite dimensions? (Hint: Look for a discrete set.)
and fm − f we obtain
kfn − fm k2 = 2(kf − fn k2 + kf − fm k2 ) − 4kf − 21 (fn + fm )k2
≤ 2(kf − fn k2 + kf − fm k2 ) − 4d2 ,
which shows that fn is Cauchy and hence converges to some point which
we call P (f ). By construction kP (f ) − f k = d. If there were another
point P̃ (f ) with the same property, we could apply the parallelogram law
to P (f ) − f and P̃ (f ) − f giving kP (f ) − P̃ (f )k2 ≤ 0 and hence P (f ) is
uniquely defined.
Next, let f ∈ H, g ∈ K and consider g̃ = (1 − t)P (f ) + t g ∈ K, t ∈ [0, 1].
Then
0 ≥ kf − P (f )k2 − kf − g̃k2 = 2tRe(hf − P (f ), g − P (f )i) − t2 kg − P (f )k2
for arbitrary t ∈ [0, 1] shows Re(hf − P (f ), P (f ) − gi) ≥ 0. Consequently
we have Re(hf − P (f ), P (f ) − P (g)i) ≥ 0 for all f, g ∈ H. Now reverse the roles of f and g and add the two inequalities to obtain kP (f ) − P (g)k2 ≤
Rehf − g, P (f ) − P (g)i ≤ kf − gkkP (f ) − P (g)k. Hence Lipschitz continuity
follows.
this raises the question of how this result extends to the infinite dimensional
setting. As a first result we show that the Riesz lemma, Theorem 2.10, im-
plies that a bounded operator A is uniquely determined by its associated
sesquilinear form hg, Af i. In fact, there is a one-to-one correspondence be-
tween bounded operators and bounded sesquilinear forms:
Lemma 2.12. Suppose s : H2 × H1 → C is a bounded sesquilinear form; that
is,
|s(g, f )| ≤ CkgkH2 kf kH1 . (2.24)
Then there is a unique bounded operator A ∈ L (H1 , H2 ) such that
s(g, f ) = hg, Af iH2 . (2.25)
Moreover, the norm of A is given by
$$\|A\| = \sup_{\|g\|_{H_2} = \|f\|_{H_1} = 1} |\langle g, Af\rangle_{H_2}| \le C. \tag{2.26}$$
Note that if {uk }k∈K ⊆ H1 and {vj }j∈J ⊆ H2 are some orthonormal bases,
then the matrix elements Aj,k := hvj , Auk iH2 for all (j, k) ∈ J × K uniquely
determine hg, Af iH2 for arbitrary f ∈ H1 , g ∈ H2 (just expand f, g with
respect to these bases) and thus A by our theorem.
Example 2.5. Consider `2 (N) and let A ∈ L (`2 (N)) be some bounded
operator. Let Ajk = hδ j , Aδ k i be its matrix elements such that
$$(Aa)_j = \sum_{k=1}^{\infty} A_{jk} a_k.$$
Here the sum converges in ℓ2 (N) and hence, in particular, for every fixed j. Moreover, choosing $a^n_k = \alpha_n^{-1} A_{jk}^*$ for $k \le n$ and $a^n_k = 0$ for $k > n$, with $\alpha_n := \bigl(\sum_{k=1}^{n} |A_{jk}|^2\bigr)^{1/2}$, we see $\alpha_n = |(Aa^n)_j| \le \|A\|\,\|a^n\| = \|A\|$. Thus $\sum_{k=1}^{\infty} |A_{jk}|^2 \le \|A\|^2$ and the sum is even absolutely convergent.
Moreover, for A ∈ L (H) the polarization identity (Problem 1.20) implies
that A is already uniquely determined by its quadratic form qA (f ) := hf, Af i.
As a first application we introduce the adjoint operator via Lemma 2.12
as the operator associated with the sesquilinear form s(f, g) := hAf, giH2 .
Example 2.9. Let H := `2 (N), a ∈ `∞ (N) and consider the multiplication
operator
(Ab)j := aj bj .
Then
$$\langle Ab, c\rangle = \sum_{j=1}^{\infty} (a_j b_j)^* c_j = \sum_{j=1}^{\infty} b_j^* (a_j^* c_j) = \langle b, A^* c\rangle$$
with (A∗ c)j = a∗j cj , that is, A∗ is the multiplication operator with a∗ .
Example 2.10. Let H := `2 (N) and consider the shift operators defined via
(S ± a)j := aj±1
with the convention that a0 = 0. That is, S − shifts a sequence to the right
and fills up the left most place by zero and S + shifts a sequence to the left
dropping the left most place:
S − (a1 , a2 , a3 , · · · ) = (0, a1 , a2 , · · · ), S + (a1 , a2 , a3 , · · · ) = (a2 , a3 , a4 , · · · ).
Then
$$\langle S^- a, b\rangle = \sum_{j=2}^{\infty} a_{j-1}^* b_j = \sum_{j=1}^{\infty} a_j^* b_{j+1} = \langle a, S^+ b\rangle,$$
which shows that $(S^-)^* = S^+$. Using symmetry of the scalar product we also get $\langle b, S^- a\rangle = \langle S^+ b, a\rangle$, that is, $(S^+)^* = S^-$.
Note that S + is a left inverse of S − , S + S − = I, but not a right inverse
as S − S + 6= I. This is different from the finite dimensional case, where a left
inverse is also a right inverse and vice versa.
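These relations can be checked on truncated sequences. A minimal sketch (the trailing zero in the test vector keeps the finite window faithful to the infinite dimensional picture):

```python
import numpy as np

def S_minus(a):
    """Right shift: fills the leftmost place with zero (drops the last
    entry of the finite window)."""
    return np.concatenate(([0.0], a[:-1]))

def S_plus(a):
    """Left shift: drops the leftmost place."""
    return np.concatenate((a[1:], [0.0]))

a = np.array([1.0, 2.0, 3.0, 0.0])   # trailing zero: window is faithful
print(S_plus(S_minus(a)))   # S+ S- = I: recovers a
print(S_minus(S_plus(a)))   # S- S+ != I: the first entry a_1 is lost
```

The adjoint relation $\langle S^-a, b\rangle = \langle a, S^+b\rangle$ also holds exactly for these truncations, since the boundary terms vanish.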
Example 2.11. Suppose U ∈ L (H1 , H2 ) is unitary. Then U ∗ = U −1 . This
follows from Lemma 2.12 since hf, giH1 = hU f, U giH2 = hf, U ∗ U giH1 implies
U ∗ U = IH1 . Since U is bijective we can multiply this last equation from the
right with U −1 to obtain the claim. Of course this calculation shows that
Proof. (i) is obvious. (ii) follows from hg, A∗∗ f iH2 = hA∗ g, f iH1 = hg, Af iH2 .
(iii) follows from hg, (CA)f iH3 = hC ∗ g, Af iH2 = hA∗ C ∗ g, f iH1 . (iv) follows
using (2.26) from
$$\|A^*\| = \sup_{\|f\|_{H_1} = \|g\|_{H_2} = 1} |\langle f, A^* g\rangle_{H_1}| = \sup_{\|f\|_{H_1} = \|g\|_{H_2} = 1} |\langle Af, g\rangle_{H_2}| = \|A\|$$
and
$$\|A^*A\| = \sup_{\|f\|_{H_1} = \|g\|_{H_1} = 1} |\langle f, A^*Ag\rangle_{H_1}| = \sup_{\|f\|_{H_1} = \|g\|_{H_1} = 1} |\langle Af, Ag\rangle_{H_2}| = \|A\|^2,$$
where we have used that $|\langle Af, Ag\rangle_{H_2}|$ attains its maximum when Af and Ag are parallel (compare Theorem 1.5).
Note that kAk = kA∗ k implies that taking adjoints is a continuous op-
eration. For later use also note that (Problem 2.11)
Ker(A∗ ) = Ran(A)⊥ . (2.28)
For the remainder of this section we restrict to the case of one Hilbert
space. A sesquilinear form s : H × H → C is called nonnegative if s(f, f ) ≥ 0
and we will call A ∈ L (H) nonnegative, A ≥ 0, if its associated sesquilinear
form is. We will write A ≥ B if A − B ≥ 0. Observe that nonnegative
operators are self-adjoint (as their quadratic forms are real-valued — here it
is important that the underlying space is complex; in case of a real space a
nonnegative form is required to be symmetric).
Example 2.12. For any operator A the operators A∗ A and AA∗ are both
nonnegative. In fact hf, A∗ Af i = hAf, Af i = kAf k2 ≥ 0 and similarly
hf, AA∗ f i = kA∗ f k2 ≥ 0.
Combining the last two results we obtain the famous Lax–Milgram the-
orem which plays an important role in theory of elliptic partial differential
equations.
Theorem 2.16 (Lax–Milgram). Let s : H × H → C be a sesquilinear form
which is
• bounded, |s(f, g)| ≤ Ckf k kgk, and
• coercive, |s(f, f )| ≥ εkf k2 for some ε > 0.
Then for every g ∈ H there is a unique f ∈ H such that
s(h, f ) = hh, gi, ∀h ∈ H. (2.29)
Moreover, $\|f\| \le \frac{1}{\varepsilon}\|g\|$.
In particular, this shows A ≥ 0. Moreover, we have |sA (a, b)| ≤ 4kak2 kbk2
or equivalently kAk ≤ 4.
Next, let
(Qa)j = qj aj
for some sequence $q \in \ell^\infty(\mathbb N)$. Then
$$s_Q(a,b) = \sum_{j=1}^{\infty} q_j a_j^* b_j$$
and |sQ (a, b)| ≤ kqk∞ kak2 kbk2 or equivalently kQk ≤ kqk∞ . If in addition
qj ≥ ε > 0, then sA+Q (a, b) = sA (a, b) + sQ (a, b) satisfies the assumptions of
the Lax–Milgram theorem and
(A + Q)a = b
has a unique solution a = (A + Q)−1 b for every given b ∈ ℓ2 (N). Moreover,
since (A + Q)−1 is bounded, this solution depends continuously on b.
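In finite dimensions the Lax–Milgram mechanism can be tested directly: any real matrix S with $h^{\mathsf T} S h \ge \varepsilon \|h\|^2$ is invertible and the solution of $Sf = g$ obeys $\|f\| \le \|g\|/\varepsilon$. A hypothetical sketch (the matrix below is illustrative; it is not the operator A + Q from the example above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
eps = 0.5
B = rng.standard_normal((n, n))
# coercive form: h^T S h = eps ||h||^2 for real h, since B - B^T is skew
S = eps * np.eye(n) + (B - B.T)

g = rng.standard_normal(n)
f = np.linalg.solve(S, g)           # the unique solution of S f = g
print(np.linalg.norm(f), np.linalg.norm(g) / eps)  # first <= second
```

The bound follows exactly as in the abstract proof: $\varepsilon\|f\|^2 \le f^{\mathsf T} S f = f^{\mathsf T} g \le \|f\|\,\|g\|$.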
Problem* 2.7. Let H1 , H2 be Hilbert spaces and let u ∈ H1 , v ∈ H2 . Show
that the operator
Af := hu, f iv
is bounded and compute its norm. Compute the adjoint of A.
Problem 2.8. Show that under the assumptions of Problem 1.36 one has
f (A)∗ = f # (A∗ ) where f # (z) = f (z ∗ )∗ .
Problem* 2.9. Prove (2.26). (Hint: Use kf k = supkgk=1 |hg, f i| — com-
pare Theorem 1.5.)
Problem 2.10. Suppose A ∈ L (H1 , H2 ) has a bounded inverse A−1 ∈
L (H2 , H1 ). Show (A−1 )∗ = (A∗ )−1 .
Problem* 2.11. Show (2.28).
Problem* 2.12. Show that every operator A ∈ L (H) can be written as the linear combination of two self-adjoint operators $\operatorname{Re}(A) := \frac{1}{2}(A + A^*)$ and $\operatorname{Im}(A) := \frac{1}{2\mathrm i}(A - A^*)$. Moreover, every self-adjoint operator can be written as a linear combination of two unitary operators. (Hint: For the last part consider $f_\pm(z) = z \pm \mathrm i\sqrt{1 - z^2}$ and Problems 1.36, 2.8.)
Problem 2.13 (Abstract Dirichlet problem). Show that the solution of
(2.29) is also the unique minimizer of
$$h \mapsto \frac{1}{2}\operatorname{Re}\,s(h,h) - \operatorname{Re}\langle h, g\rangle$$
if s is nonnegative with $s(w,w) \ge \varepsilon\|w\|^2$ for all $w \in H$.
Similarly, if H and H̃ are two Hilbert spaces, we define their tensor prod-
uct as follows: The elements should be products f ⊗ f˜ of elements f ∈ H
and f˜ ∈ H̃. Hence we start with the set of all finite linear combinations of
elements of H × H̃,
$$\mathcal F(H, \tilde H) := \Bigl\{\sum_{j=1}^{n} \alpha_j (f_j, \tilde f_j) \,\Big|\, (f_j, \tilde f_j) \in H \times \tilde H,\ \alpha_j \in \mathbb C,\ n \in \mathbb N\Bigr\}, \tag{2.33}$$
and write f ⊗ f˜ for the equivalence class of (f, f˜). By construction, every
element in this quotient space is a linear combination of elements of the type
f ⊗ f˜.
is a symmetric sesquilinear form on F(H, H̃)/N (H, H̃). To show that this is in fact a scalar product, we need to ensure positivity. Let $f = \sum_i \alpha_i\, f_i \otimes \tilde f_i \neq 0$
and pick orthonormal bases uj , ũk for span{fi }, span{f˜i }, respectively. Then
$$f = \sum_{j,k} \alpha_{jk}\, u_j \otimes \tilde u_k, \qquad \alpha_{jk} = \sum_i \alpha_i \langle u_j, f_i\rangle_H \langle \tilde u_k, \tilde f_i\rangle_{\tilde H}, \tag{2.38}$$
and we compute
$$\langle f, f\rangle = \sum_{j,k} |\alpha_{jk}|^2 > 0. \tag{2.39}$$
The completion of F(H, H̃)/N (H, H̃) with respect to the induced norm is
called the tensor product H ⊗ H̃ of H and H̃.
Lemma 2.17. If uj , ũk are orthonormal bases for H, H̃, respectively, then
uj ⊗ ũk is an orthonormal basis for H ⊗ H̃.
where equality has to be understood in the sense that both spaces are uni-
tarily equivalent by virtue of the identification
$$\Bigl(\sum_{j=1}^{\infty} f_j\Bigr) \otimes f = \sum_{j=1}^{\infty} f_j \otimes f. \tag{2.41}$$
[Figure: The Dirichlet kernels D1 (x), D2 (x), D3 (x) on (−π, π).]
is known as the Dirichlet kernel (to obtain the second form observe that
the left-hand side is a geometric series). Note that Dn (−x) = Dn (x) and
that $|D_n(x)|$ has a global maximum $D_n(0) = 2n + 1$ at $x = 0$. Moreover, by $S_n(1) = 1$ we see that $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x)\,dx = 1$.
Since
$$\int_{-\pi}^{\pi} \mathrm e^{-\mathrm ikx}\,\mathrm e^{\mathrm ilx}\,dx = 2\pi\,\delta_{k,l}, \tag{2.47}$$
the functions $e_k(x) := (2\pi)^{-1/2}\mathrm e^{\mathrm ikx}$ are orthonormal in L2 (−π, π) and hence the Fourier series is just the expansion with respect to this orthonormal set.
Hence we obtain
Theorem 2.18. For every square integrable function f ∈ L2 (−π, π), the
Fourier coefficients fˆk are square summable
$$\sum_{k\in\mathbb Z} |\hat f_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx. \tag{2.48}$$
Proof. To show this theorem it suffices to show that the functions ek form
a basis. This will follow from Theorem 2.21 below (see the discussion after
this theorem). It will also follow as a special case of Theorem 3.11 below
(see the examples after this theorem) as well as from the Stone–Weierstraß
theorem — Problem 2.21.
This gives a satisfactory answer in the Hilbert space L2 (−π, π) but does
not answer the question about pointwise or uniform convergence. The latter
will be the case if the Fourier coefficients are summable. First of all we note
that for integrable functions the Fourier coefficients will at least tend to zero.
Lemma 2.19 (Riemann–Lebesgue lemma). Suppose f ∈ L1 (−π, π), then
the Fourier coefficients fˆk converge to zero as |k| → ∞.
Proof. By our previous theorem this holds for continuous functions. But the
map f → fˆ is bounded from C[−π, π] ⊂ L1 (−π, π) to c0 (Z) (the sequences
vanishing as |k| → ∞) since |fˆk | ≤ (2π)−1 kf k1 and there is a unique exten-
sion to all of L1 (−π, π).
It turns out that this result is best possible in general and we cannot say
more without additional assumptions on f . For example, if f is periodic of
period 2π and continuously differentiable, then integration by parts shows
$$\hat f_k = \frac{1}{2\pi\mathrm ik}\int_{-\pi}^{\pi} \mathrm e^{-\mathrm ikx} f'(x)\,dx. \tag{2.49}$$
Then, since both k −1 and the Fourier coefficients of f 0 are square summa-
ble, we conclude that fˆ is absolutely summable and hence the Fourier series
converges uniformly. So we have a simple sufficient criterion for summa-
bility of the Fourier coefficients, but can we do better? Of course conti-
nuity of f is a necessary condition for absolute summability but this alone
will not even be enough for pointwise convergence as we will see in Exam-
ple 4.3. Moreover, continuity will not tell us more about the decay of the
Fourier coefficients than what we already know in the integrable case from
the Riemann–Lebesgue lemma (see Example 4.4).
A few improvements are easy: (2.49) holds for any class of functions
for which integration by parts holds, e.g., piecewise continuously differen-
tiable functions or, slightly more general, absolutely continuous functions
(cf. Lemma 4.30 from [47]) provided one assumes that the derivative is
square integrable. However, for an arbitrary absolutely continuous func-
tion the Fourier coefficients might not be absolutely summable: For an
absolutely continuous function f we have a derivative which is integrable
(Theorem 4.29 from [47]) and hence the above formula combined with the
Riemann–Lebesgue lemma implies $\hat f_k = o(\frac{1}{k})$. But on the other hand we can choose an absolutely summable sequence $c_k$ which does not obey this asymptotic requirement, say $c_k = \frac{1}{k}$ for $k = l^2$ and $c_k = 0$ else. Then
$$f(x) := \sum_{k\in\mathbb Z} c_k \mathrm e^{\mathrm ikx} = \sum_{l\in\mathbb N} \frac{1}{l^2}\,\mathrm e^{\mathrm il^2 x} \tag{2.50}$$
Proof. The proof starts with the observation that the Fourier coefficients of $f_\delta(x) := f(x - \delta)$ are $\mathrm e^{-\mathrm ik\delta}\hat f_k$. Now for $\delta := \frac{2\pi}{3}2^{-m}$ and $2^m \le |k| < 2^{m+1}$ we have $|\mathrm e^{\mathrm ik\delta} - 1|^2 \ge 3$ implying
$$\sum_{2^m \le |k| < 2^{m+1}} |\hat f_k|^2 \le \frac{1}{3}\sum_{k\in\mathbb Z} |\mathrm e^{\mathrm ik\delta} - 1|^2 |\hat f_k|^2 = \frac{1}{3}\,\frac{1}{2\pi}\int_{-\pi}^{\pi} |f_\delta(x) - f(x)|^2\,dx \le \frac{1}{3}[f]_\gamma^2\,\delta^{2\gamma}.$$
Now the sum on the left has $2^{m+1}$ terms and hence Cauchy–Schwarz implies
$$\sum_{2^m \le |k| < 2^{m+1}} |\hat f_k| \le 2^{(m+1)/2}\,\frac{[f]_\gamma\,\delta^\gamma}{\sqrt 3} = \sqrt{\tfrac{2}{3}}\Bigl(\frac{2\pi}{3}\Bigr)^{\gamma} 2^{(1/2-\gamma)m}\,[f]_\gamma.$$
Summing over m, this is finite provided $\gamma > \frac{1}{2}$, and establishes the claim since $|\hat f_0| \le \|f\|_\infty$.
Note however, that the situation looks much brighter if one looks at mean values
$$\bar S_n(f)(x) := \frac{1}{n}\sum_{k=0}^{n-1} S_k(f)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} F_n(x-y) f(y)\,dy, \tag{2.51}$$
where
$$F_n(x) = \frac{1}{n}\sum_{k=0}^{n-1} D_k(x) = \frac{1}{n}\Bigl(\frac{\sin(nx/2)}{\sin(x/2)}\Bigr)^2 \tag{2.52}$$
is the Fejér kernel. To see the second form we use the closed form for the
Dirichlet kernel to obtain
$$nF_n(x) = \sum_{k=0}^{n-1} \frac{\sin((k+1/2)x)}{\sin(x/2)} = \frac{1}{\sin(x/2)}\operatorname{Im}\sum_{k=0}^{n-1} \mathrm e^{\mathrm i(k+1/2)x} = \frac{1}{\sin(x/2)}\operatorname{Im}\Bigl(\mathrm e^{\mathrm ix/2}\,\frac{\mathrm e^{\mathrm inx}-1}{\mathrm e^{\mathrm ix}-1}\Bigr) = \frac{1-\cos(nx)}{2\sin(x/2)^2} = \Bigl(\frac{\sin(nx/2)}{\sin(x/2)}\Bigr)^2.$$
[Figure: The Fejér kernels F1 (x), F2 (x), F3 (x) on (−π, π).]
In particular, this shows that the functions {ek }k∈Z are total in Cper [−π, π]
(continuous periodic functions) and hence also in Lp (−π, π) for 1 ≤ p < ∞
(Problem 2.20).
Note that for a given continuous function f this result shows that if Sn (f )(x) converges, then it must converge to $\lim_{n\to\infty}\bar S_n(f)(x) = f(x)$. We also
remark that one can extend this result (see Lemma 3.21 from [47]) to show
that for f ∈ Lp (−π, π), 1 ≤ p < ∞, one has S̄n (f ) → f in the sense of Lp .
As a consequence note that the Fourier coefficients uniquely determine f for
integrable f (for square integrable f this follows from Theorem 2.18).
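The convergence of the Fejér means for continuous periodic functions can be observed numerically. A rough sketch (the grid resolution, the test function |x|, and the evaluation point are illustrative choices):

```python
import numpy as np

def fejer_mean(fvals, x, n):
    """Approximate the Cesaro mean of the Fourier partial sums at x:
    Fourier coefficients are computed from uniform samples fvals (a crude
    quadrature), then summed with the Fejer weights (1 - |k|/n)."""
    N = len(fvals)
    grid = np.linspace(-np.pi, np.pi, N, endpoint=False)
    total = 0.0
    for k in range(-(n - 1), n):
        fk = np.mean(fvals * np.exp(-1j * k * grid))  # ~ (2pi)^{-1} int f e^{-ikx} dx
        total = total + (1 - abs(k) / n) * fk * np.exp(1j * k * x)
    return total.real

grid = np.linspace(-np.pi, np.pi, 512, endpoint=False)
fvals = np.abs(grid)                 # continuous periodic test function f(x) = |x|
approx = fejer_mean(fvals, 1.0, 80)
print(approx)  # approaches f(1) = 1 as n grows
```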
Finally, we look at pointwise convergence.
Problem 2.19. Show that if $f \in C^{0,\gamma}_{per}[-\pi,\pi]$ is Hölder continuous (cf. (1.67)), then
$$|\hat f_k| \le \frac{[f]_\gamma}{2}\Bigl(\frac{\pi}{|k|}\Bigr)^{\gamma}, \qquad k \neq 0.$$
(Hint: What changes if you replace e−iky by e−ik(y+π/k) in (2.44)? Now make
a change of variables y → y − π/k in the integral.)
Problem 2.20. Show that Cper [−π, π] is dense in Lp (−π, π) for 1 ≤ p < ∞.
Problem 2.21. Show that the functions $e_k(x) := \frac{1}{\sqrt{2\pi}}\mathrm e^{\mathrm ikx}$, k ∈ Z, form an
orthonormal basis for H = L2 (−π, π). (Hint: Start with K = [−π, π] where
−π and π are identified and use the Stone–Weierstraß theorem.)
Chapter 3
Compact operators
Typically, linear operators are much more difficult to analyze than matrices
and many new phenomena appear which are not present in the finite dimen-
sional case. So we have to be modest and slowly work our way up. A class
of operators which still preserves some of the nice properties of matrices is
the class of compact operators to be discussed in this chapter.
Proof. Let fj0 be a bounded sequence. Choose a subsequence fj1 such that
A1 fj1 converges. From fj1 choose another subsequence fj2 such that A2 fj2
converges and so on. Since there might be nothing left from fjn as n → ∞, we
consider the diagonal sequence fj := fjj . By construction, fj is a subsequence
of fjn for j ≥ n and hence An fj is Cauchy for every fixed n. Now
kAfj − Afk k = k(A − An )(fj − fk ) + An (fj − fk )k
≤ kA − An kkfj − fk k + kAn fj − An fk k
shows that Afj is Cauchy since the first term can be made arbitrary small
by choosing n large and the second by the Cauchy property of An fj .
Example 3.2. Let X := ℓp (N) and consider the operator
$$(Qa)_j := q_j a_j$$
for some sequence $q = (q_j)_{j\in\mathbb N} \in c_0(\mathbb N)$ converging to zero. Let $Q_n$ be associated with $q_j^n := q_j$ for $j \le n$ and $q_j^n := 0$ for $j > n$. Then the range of
Proof. First of all note that K(., ..) is continuous on [a, b] × [a, b] and hence
uniformly continuous. In particular, for every ε > 0 we can find a δ > 0 such
that |K(y, t) − K(x, t)| ≤ ε for any t ∈ [a, b] whenever |y − x| ≤ δ. Moreover,
kKk∞ = supx,y∈[a,b] |K(x, y)| < ∞.
We begin with the case $X := L^2_{cont}(a,b)$. Let $g := Kf$. Then
$$|g(x)| \le \int_a^b |K(x,t)|\,|f(t)|\,dt \le \|K\|_\infty \int_a^b |f(t)|\,dt \le \|K\|_\infty\,\|1\|\,\|f\|,$$
where we have used Cauchy–Schwarz in the last step (note that $\|1\| = \sqrt{b-a}$). Similarly,
$$|g(x) - g(y)| \le \int_a^b |K(y,t) - K(x,t)|\,|f(t)|\,dt \le \varepsilon \int_a^b |f(t)|\,dt \le \varepsilon\,\|1\|\,\|f\|,$$
Problem* 3.2. Show that the adjoint of the integral operator K on L2cont (a, b)
from Lemma 3.4 is the integral operator with kernel K(y, x)∗ :
$$(K^* f)(x) = \int_a^b K(y,x)^* f(y)\,dy.$$
(Hint: Fubini.)
Problem 3.3. Show that the operator $\frac{d}{dx} : C^2[a,b] \to C[a,b]$ is compact.
(Hint: Arzelà–Ascoli.)
in the sequence q). If z is different from all entries of the sequence then u = 0
and z is no eigenvalue.
Note that in the last example Q will be self-adjoint if and only if q is real-
valued and hence if and only if all eigenvalues are real-valued. Moreover, the
corresponding eigenfunctions are orthogonal. This has nothing to do with
the simple structure of our operator and is in fact always true.
Theorem 3.5. Let A be symmetric. Then all eigenvalues are real and eigen-
vectors corresponding to different eigenvalues are orthogonal.
The previous example shows that in the infinite dimensional case sym-
metry is not enough to guarantee existence of even a single eigenvalue. In
order to always get this, we will need an extra condition. In fact, we will
see that compactness provides a suitable extra condition to obtain an or-
thonormal basis of eigenfunctions. The crucial step is to prove existence of
one eigenvalue, the rest then follows as in the finite dimensional case.
Theorem 3.6. Let H be an inner product space. A symmetric compact
operator A has an eigenvalue α1 which satisfies |α1 | = kAk.
all uk with k < j and hence the eigenvectors {uj } form an orthonormal set.
By construction we also have |αj | = kAj k ≤ kAj−1 k = |αj−1 |. This proce-
dure will not stop unless H is finite dimensional. However, note that αj = 0
for j ≥ n might happen if An = 0.
Theorem 3.7 (Hilbert–Schmidt; Spectral theorem for compact symmetric
operators). Suppose H is an infinite dimensional Hilbert space and A : H →
H is a compact symmetric operator. Then there exists a sequence of real
eigenvalues αj converging to 0. The corresponding normalized eigenvectors
uj form an orthonormal set and every f ∈ H can be written as
$$f = \sum_{j=1}^{\infty} \langle u_j, f\rangle u_j + h, \tag{3.7}$$
we have
kA(f − fn )k ≤ |αn+1 |kf − fn k ≤ |αn+1 |kf k
since f − fn ∈ Hn and kAn k = |αn+1 |. Letting n → ∞ shows A(f∞ − f ) = 0
proving (3.7). Finally, note that without completeness f∞ might not be
well-defined unless h = 0.
Remark: There are two cases where our procedure might fail to construct
an orthonormal basis of eigenvectors. One case is where there is an infinite
number of nonzero eigenvalues. In this case αn never reaches 0 and all eigen-
vectors corresponding to 0 are missed. In the other case, 0 is reached, but
there might not be a countable basis and hence again some of the eigen-
vectors corresponding to 0 are missed. In any case, by adding vectors from
the kernel (which are automatically eigenvectors), one can always extend the
eigenvectors uj to an orthonormal basis of eigenvectors.
Corollary 3.9. Every compact symmetric operator A has an associated or-
thonormal basis of eigenvectors {uj }j∈J . The corresponding unitary map
U : H → `2 (J), f 7→ {huj , f i}j∈J diagonalizes A in the sense that U AU −1 is
the operator which multiplies each basis vector δ j = U uj by the corresponding
eigenvalue αj .
Example 3.7. Let a, b ∈ c0 (N) be real-valued sequences and consider the
operator
(Jc)j := aj cj+1 + bj cj + aj−1 cj−1 .
If A, B denote the multiplication operators by the sequences a, b, respec-
tively, then we already know that A and B are compact. Moreover, using
the shift operators S ± we can write
J = AS + + B + S − A,
which shows that J is self-adjoint since A∗ = A, B ∗ = B, and (S ± )∗ =
S ∓ . Hence we can conclude that J has a countable number of eigenvalues
converging to zero and a corresponding orthonormal basis of eigenvectors.
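A finite truncation makes this tangible: the n × n Jacobi matrix is real symmetric tridiagonal, so its eigenvalues are real and its eigenvectors can be chosen orthonormal. A sketch (the sequences $a_j = 1/j$, $b_j = 1/j^2$ are illustrative elements of $c_0(\mathbb N)$):

```python
import numpy as np

n = 200
j = np.arange(1, n + 1)
a = 1.0 / j        # off-diagonal sequence in c_0(N)  (illustrative)
b = 1.0 / j**2     # diagonal sequence in c_0(N)      (illustrative)

# (J c)_j = a_j c_{j+1} + b_j c_j + a_{j-1} c_{j-1}, truncated to n terms
J = np.diag(b) + np.diag(a[:-1], 1) + np.diag(a[:-1], -1)
evals, evecs = np.linalg.eigh(J)   # J = J^T: real eigenvalues, orthonormal eigenvectors

# the eigenvectors form an orthonormal basis of R^n
print(np.abs(evecs.T @ evecs - np.eye(n)).max())  # ~ 0
```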
In particular, in the new picture it is easy to define functions of our
operator (thus extending the functional calculus from Problem 1.36). To this
end set Σ := {αj }j∈J and denote by B(K) the Banach algebra of bounded
functions F : K → C together with the sup norm.
Corollary 3.10 (Functional calculus). Let A be a compact symmetric op-
erator with associated orthonormal basis of eigenvectors {uj }j∈J and corre-
sponding eigenvalues {αj }j∈J . Suppose F ∈ B(Σ), then
$$F(A)f = \sum_{j\in J} F(\alpha_j)\,\langle u_j, f\rangle u_j \tag{3.9}$$
$$R_A(z)f := \sum_{j\in J} \frac{1}{\alpha_j - z}\,\langle u_j, f\rangle u_j \tag{3.11}$$
Moreover, if we use $\frac{1}{\alpha_j - z} = \frac{\alpha_j}{z(\alpha_j - z)} - \frac{1}{z}$, we can rewrite this as
$$R_A(z)f = \frac{1}{z}\Bigl(\sum_{j=1}^{N} \frac{\alpha_j}{\alpha_j - z}\,\langle u_j, f\rangle u_j - f\Bigr)$$
happens to satisfy u(1) = 0 and these are precisely the eigenvalues we are
looking for.
Note that the fact that L2cont (0, 1) is not complete causes no problems
since we can always replace it by its completion H = L2 (0, 1). A thorough
investigation of this completion will be given later, at this point this is not
essential.
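The characterization of the eigenvalues through u(1) = 0 suggests a numerical shooting method: integrate the initial value problem from x = 0 and locate the zeros of u(1) as a function of z. A rough sketch (step count, bracket, and the choice q = 0 are illustrative; for q = 0 the eigenvalues are $(k\pi)^2$):

```python
import numpy as np

def u_minus_at_1(z, q=lambda x: 0.0, steps=400):
    """Shoot -u'' + (q(x) - z) u = 0, u(0) = 0, u'(0) = 1 across [0, 1]
    with RK4 and return u(1); its zeros in z are the eigenvalues."""
    h = 1.0 / steps
    def rhs(x, u, v):
        return v, (q(x) - z) * u   # (u', v') with v = u'
    u, v, x = 0.0, 1.0, 0.0
    for _ in range(steps):
        k1u, k1v = rhs(x, u, v)
        k2u, k2v = rhs(x + h/2, u + h/2*k1u, v + h/2*k1v)
        k3u, k3v = rhs(x + h/2, u + h/2*k2u, v + h/2*k2v)
        k4u, k4v = rhs(x + h, u + h*k3u, v + h*k3v)
        u += h/6 * (k1u + 2*k2u + 2*k3u + k4u)
        v += h/6 * (k1v + 2*k2v + 2*k3v + k4v)
        x += h
    return u

# bisect u(1) = 0 on a bracket containing pi^2 (the lowest eigenvalue for q = 0)
lo, hi = 9.0, 11.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if u_minus_at_1(lo) * u_minus_at_1(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(lo)  # ~ 9.8696, i.e. pi^2
```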
We first verify that L is symmetric:
$$\langle f, Lg\rangle = \int_0^1 f(x)^* \bigl(-g''(x) + q(x)g(x)\bigr)dx = \int_0^1 f'(x)^* g'(x)\,dx + \int_0^1 f(x)^* q(x) g(x)\,dx = \int_0^1 \bigl(-f''(x)\bigr)^* g(x)\,dx + \int_0^1 f(x)^* q(x) g(x)\,dx = \langle Lf, g\rangle. \tag{3.16}$$
Here we have used integration by parts twice (the boundary terms vanish
due to our boundary conditions f (0) = f (1) = 0 and g(0) = g(1) = 0).
Of course we want to apply Theorem 3.7 and for this we would need to
show that L is compact. But this task is bound to fail, since L is not even
bounded (see Example 1.17)!
So here comes the trick: If L is unbounded its inverse L−1 might still
be bounded. Moreover, L−1 might even be compact and this is the case
here! Since L might not be injective (0 might be an eigenvalue), we consider
RL (z) := (L − z)−1 , z ∈ C, which is also known as the resolvent of L.
In order to compute the resolvent, we need to solve the inhomogeneous
equation (L − z)f = g. This can be done using the variation of constants
formula from ordinary differential equations which determines the solution
up to an arbitrary solution of the homogeneous equation. This homogeneous
equation has to be chosen such that f ∈ D(L), that is, such that f (0) =
f (1) = 0.
Define

f(x) := \frac{u_+(z,x)}{W(z)} \int_0^x u_-(z,t) g(t)\,dt + \frac{u_-(z,x)}{W(z)} \int_x^1 u_+(z,t) g(t)\,dt, \qquad (3.17)

where u_\pm(z,x) are the solutions of the homogeneous differential equation -u_\pm''(z,x) + (q(x) - z) u_\pm(z,x) = 0 satisfying the initial conditions u_-(z,0) = 0, u_-'(z,0) = 1 respectively u_+(z,1) = 0, u_+'(z,1) = 1 and

W(z) := W(u_+(z), u_-(z)) = u_-'(z,x) u_+(z,x) - u_-(z,x) u_+'(z,x) \qquad (3.18)
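For q = 0 and z = -1 the homogeneous solutions are explicit (u_- = sinh x, u_+ = -sinh(1-x), W = -sinh 1), so one can verify numerically that (3.17) indeed solves (L - z)f = g; the right-hand side g below is an arbitrary choice, and the whole block is an illustration rather than part of the text.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

z = -1.0
x = np.linspace(0, 1, 4001)
um = np.sinh(x)          # u_-(z, x): satisfies u(0) = 0, u'(0) = 1
up = -np.sinh(1 - x)     # u_+(z, x): satisfies u(1) = 0, u'(1) = 1
W = -np.sinh(1.0)        # Wronskian (3.18)

g = np.sin(3 * np.pi * x)  # arbitrary inhomogeneity

I1 = cumulative_trapezoid(um * g, x, initial=0)   # int_0^x u_-(t) g(t) dt
J = cumulative_trapezoid(up * g, x, initial=0)
I2 = J[-1] - J                                    # int_x^1 u_+(t) g(t) dt
f = (up * I1 + um * I2) / W                       # formula (3.17)

# Check (L - z)f = -f'' - z f = g in the interior and f(0) = f(1) = 0.
h = x[1] - x[0]
fpp = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2
assert np.allclose(-fpp - z * f[1:-1], g[1:-1], atol=1e-3)
assert abs(f[0]) < 1e-12 and abs(f[-1]) < 1e-12
```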
76 3. Compact operators
Now, by (2.18), \sum_{j=0}^{\infty} |\langle u_j, g\rangle|^2 = \|g\|^2 and hence the first term is part of a convergent series. Similarly, the second term can be estimated independently of x since

\alpha_n u_n(x) = R_L(\lambda) u_n(x) = \int_0^1 G(\lambda, x, t) u_n(t)\,dt = \langle u_n, G(\lambda, x, \cdot)\rangle
implies

\sum_{j=m}^{n} |\alpha_j u_j(x)|^2 \le \sum_{j=0}^{\infty} |\langle u_j, G(\lambda, x, \cdot)\rangle|^2 = \int_0^1 |G(\lambda, x, t)|^2\,dt \le M(\lambda)^2,
which we call the form domain of L. Here C_p^1[a,b] denotes the set of piecewise continuously differentiable functions f in the sense that f is continuously differentiable except for a finite number of points at which it is continuous and the derivative has limits from the left and right. In fact, any class of functions for which the partial integration needed to obtain (3.26) can be justified would be good enough (e.g. the set of absolutely continuous functions to be discussed in Section 4.4 from [47]).
which implies

\sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2 \le q_L(f).
In particular, note that this estimate applies to f (y) = G(λ, x, y). Now
we can proceed as in the proof of the previous theorem (with λ = 0 and
αj = Ej−1 )
\sum_{j=m}^{n} |\langle u_j, f\rangle u_j(x)| = \sum_{j=m}^{n} E_j |\langle u_j, f\rangle \langle u_j, G(0,x,\cdot)\rangle|
\le \Big( \sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2 \Big)^{1/2} \Big( \sum_{j=m}^{n} E_j |\langle u_j, G(0,x,\cdot)\rangle|^2 \Big)^{1/2}
\le \Big( \sum_{j=m}^{n} E_j |\langle u_j, f\rangle|^2 \Big)^{1/2} q_L(G(0,x,\cdot))^{1/2},
Proof. Using the conventions from the proof of the previous lemma we have \langle u_j, G(0,x,\cdot)\rangle = E_j^{-1} u_j(x) and since G(0,x,\cdot) \in Q(L) for fixed x \in [a,b] we have

\sum_{j=0}^{\infty} \frac{1}{E_j} u_j(x) u_j(y) = G(0,x,y),

where C(z) := \sup_j \frac{E_j}{|E_j - z|}.
Finally, the last claim follows upon computing the integral using (3.28)
and observing kuj k = 1.
which is convergent with respect to our scalar product. If f \in C_p^1[0,1] with f(0) = f(1) = 0 the series will converge uniformly. For an application of the trace formula see Problem 3.10.
3.3. Applications to Sturm–Liouville operators 81
Example 3.9. We could also look at the same equation as in the previous
problem but with different boundary conditions
u0 (0) = u0 (1) = 0.
Then

E_n = \pi^2 n^2, \qquad u_n(x) = \begin{cases} 1, & n = 0, \\ \sqrt{2}\cos(n\pi x), & n \in \mathbb{N}. \end{cases}
Moreover, every function f \in L^2_{cont}(0,1) can be expanded into a Fourier cosine series

f(x) = \sum_{n=0}^{\infty} f_n u_n(x), \qquad f_n := \int_0^1 u_n(x) f(x)\,dx,
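As a quick numerical illustration (with an arbitrary test function, not from the text), the cosine series can be summed on a grid:

```python
import numpy as np
from scipy.integrate import trapezoid

x = np.linspace(0, 1, 20001)

def u(n):
    """Neumann eigenfunctions: u_0 = 1, u_n = sqrt(2) cos(n pi x)."""
    return np.ones_like(x) if n == 0 else np.sqrt(2) * np.cos(n * np.pi * x)

f = x * (1 - x)  # arbitrary test function

# Partial sum of the Fourier cosine series sum_n f_n u_n.
approx = np.zeros_like(x)
for n in range(60):
    fn = trapezoid(u(n) * f, x)
    approx += fn * u(n)

assert np.max(np.abs(approx - f)) < 1e-2
```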
\langle f, Af\rangle = \Big\langle f, \sum_{j=1}^{\infty} \alpha_j \gamma_j u_j \Big\rangle = \sum_{j=1}^{\infty} \alpha_j |\gamma_j|^2, \qquad f \in D(A), \qquad (3.31)
where
U (f1 , . . . , fj ) := {f ∈ D(A)| kf k = 1, f ∈ span{f1 , . . . , fj }⊥ }. (3.35)
Proof. We have

\inf_{f \in U(f_1,\dots,f_{j-1})} \langle f, Af\rangle \le \alpha_j.

In fact, set f = \sum_{k=1}^{j} \gamma_k u_k and choose \gamma_k such that f \in U(f_1,\dots,f_{j-1}). Then

\langle f, Af\rangle = \sum_{k=1}^{j} |\gamma_k|^2 \alpha_k \le \alpha_j

and the claim follows.
Conversely, let \gamma_k = \langle u_k, f\rangle and write f = \sum_{k=1}^{N} \gamma_k u_k + \tilde f. Then

\inf_{f \in U(u_1,\dots,u_{j-1})} \langle f, Af\rangle = \inf_{f \in U(u_1,\dots,u_{j-1})} \Big( \sum_{k=j}^{N} |\gamma_k|^2 \alpha_k + \langle \tilde f, \tilde A \tilde f\rangle \Big) = \alpha_j.
where the inf is taken over subspaces with the indicated properties.
Problem* 3.12. Prove Theorem 3.16.
Problem 3.13. Suppose A, An are self-adjoint, bounded and An → A.
Then αk (An ) → αk (A). (Hint: For B self-adjoint kBk ≤ ε is equivalent to
−ε ≤ B ≤ ε.)
Moreover, \|K u_j\|^2 = \langle u_j, K^* K u_j\rangle = \langle u_j, s_j^2 u_j\rangle = s_j^2 shows that we can set

s_j := \|K u_j\| > 0. \qquad (3.39)
The numbers sj = sj (K) are called singular values of K. There are either
finitely many singular values or they converge to zero.
Theorem 3.17 (Singular value decomposition of compact operators). Let
K ∈ C (H1 , H2 ) be compact and let sj be the singular values of K and {uj } ⊂
H1 corresponding orthonormal eigenvectors of K ∗ K. Then
K = \sum_j s_j \langle u_j, \cdot\rangle v_j, \qquad (3.40)
where v_j = s_j^{-1} K u_j. The norm of K is given by the largest singular value

as required. Furthermore,

\langle v_j, v_k\rangle = (s_j s_k)^{-1} \langle K u_j, K u_k\rangle = (s_j s_k)^{-1} \langle K^* K u_j, u_k\rangle = s_j s_k^{-1} \langle u_j, u_k\rangle
In particular, note
sj (AK) ≤ kAksj (K), sj (KA) ≤ kAksj (K) (3.46)
whenever K is compact and A is bounded (the second estimate follows from
the first by taking adjoints).
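In finite dimensions the construction behind (3.39) and (3.40) can be traced step by step with numpy; the random matrix below is an arbitrary example, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 3))

# Orthonormal eigenvectors u_j of K*K with eigenvalues s_j^2.
s2, U = np.linalg.eigh(K.T @ K)
order = np.argsort(s2)[::-1]        # order the singular values decreasingly
s = np.sqrt(s2[order])
U = U[:, order]

# (3.39): s_j = ||K u_j||, and v_j := K u_j / s_j are orthonormal.
assert np.allclose(s, np.linalg.norm(K @ U, axis=0))
V = (K @ U) / s
assert np.allclose(V.T @ V, np.eye(3), atol=1e-10)

# (3.40): K = sum_j s_j <u_j, .> v_j, and the s_j agree with numpy's SVD.
assert np.allclose(K, V @ np.diag(s) @ U.T)
assert np.allclose(s, np.linalg.svd(K, compute_uv=False))
```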
An operator K ∈ L (H1 , H2 ) is called a finite rank operator if its
range is finite dimensional. The dimension
rank(K) := dim Ran(K)
is called the rank of K. Since for a compact operator
Ran(K) = span{vj } (3.47)
we see that a compact operator is finite rank if and only if the sum in (3.40)
is finite. Note that the finite rank operators form an ideal in L (H) just as
the compact operators do. Moreover, every finite rank operator is compact
by the Heine–Borel theorem (Theorem B.22).
Now truncating the sum in the canonical form gives us a simple way to
approximate compact operators by finite rank ones. Moreover, this is in fact
the best approximation within the class of finite rank operators:
Lemma 3.19. Let K \in C(H_1, H_2) be compact and let its singular values be ordered. Then

s_j(K) = \min_{rank(F) < j} \|K - F\|, \qquad (3.48)

where the minimum is attained for
F_{j-1} := \sum_{k=1}^{j-1} s_k \langle u_k, \cdot\rangle v_k. \qquad (3.49)
In particular, the closure of the ideal of finite rank operators in L (H) is the
ideal of compact operators.
Proof. That there is equality for F = F_{j-1} follows from (3.41). In general, the restriction of F to span\{u_1, \dots, u_j\} will have a nontrivial kernel. Let f = \sum_{k=1}^{j} \alpha_k u_k be a normalized element of this kernel, then \|(K - F)f\|^2 =
Proof. Just observe that K ∗ K compact is all that was used to show Theo-
rem 3.17.
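Lemma 3.19 is the finite-dimensional Eckart–Young theorem; a quick numpy check (random matrix, an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((6, 6))
U, s, Vt = np.linalg.svd(K)   # s is sorted decreasingly

j = 3
# F_{j-1}: truncation of the singular value decomposition after j-1 terms,
# as in (3.49).
F = U[:, :j-1] @ np.diag(s[:j-1]) @ Vt[:j-1, :]
assert np.linalg.matrix_rank(F) == j - 1

# (3.48): the operator-norm distance equals the j-th singular value.
assert np.isclose(np.linalg.norm(K - F, 2), s[j-1])
```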
Corollary 3.21. An operator K \in L(H_1, H_2) is compact (finite rank) if and only if K^* \in L(H_2, H_1) is. In fact, s_j(K) = s_j(K^*) and

K^* = \sum_j s_j \langle v_j, \cdot\rangle u_j. \qquad (3.50)
Proof. First of all note that (3.50) follows from (3.40) since taking adjoints is continuous and (\langle u_j, \cdot\rangle v_j)^* = \langle v_j, \cdot\rangle u_j (cf. Problem 2.7). The rest is straightforward.
From this last lemma one easily gets a number of useful inequalities for
the singular values:
Corollary 3.22. Let K1 and K2 be compact and let sj (K1 ) and sj (K2 ) be
ordered. Then
(i) sj+k−1 (K1 + K2 ) ≤ sj (K1 ) + sk (K2 ),
(ii) sj+k−1 (K1 K2 ) ≤ sj (K1 )sk (K2 ),
(iii) |sj (K1 ) − sj (K2 )| ≤ kK1 − K2 k.
where the minimum is taken over all subspaces with the indicated dimension.
Moreover, the minimum is attained for
M = span\{u_k\}_{k=1}^{j-1}, \qquad N = span\{v_k\}_{k=1}^{j-1}.
The two most important cases are p = 1 and p = 2: J2 (H) is the space
of Hilbert–Schmidt operators and J1 (H) is the space of trace class
operators.
Example 3.14. Any multiplication operator by a sequence from `p (N) is in
the Schatten p-class of H = `2 (N).
Example 3.15. By virtue of the Weyl asymptotics (see Example 3.12) the
resolvent of our Sturm–Liouville operator is trace class.
Proof. First of all note that (3.55) implies that K is compact. To see this,
let Pn be the projection onto the space spanned by the first n elements of
the orthonormal basis {wj }. Then Kn = KPn is finite rank and converges
to K since
\|(K - K_n)f\| = \Big\| \sum_{j>n} c_j K w_j \Big\| \le \sum_{j>n} |c_j| \|K w_j\| \le \Big( \sum_{j>n} \|K w_j\|^2 \Big)^{1/2} \|f\|,

where f = \sum_j c_j w_j.
Proof. This follows from (3.56) upon using the triangle inequality for H and
for `2 (J).
But then

\sum_{j\in\mathbb{N}} \|K w_j\|^2 = \sum_{j\in\mathbb{N}} \int_a^b |(K w_j)(x)|^2\,dx = \int_a^b \sum_{j\in\mathbb{N}} |(K w_j)(x)|^2\,dx \le (b-a) M^2

as claimed.
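In finite dimensions the sum \sum_j \|K w_j\|^2 is the squared Frobenius norm and, in particular, does not depend on the chosen orthonormal basis; a short sketch (random matrix and basis, my own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((4, 4))

# sum_j ||K w_j||^2 over the standard basis ...
hs_standard = sum(np.linalg.norm(K @ e) ** 2 for e in np.eye(4))

# ... equals the same sum over any other orthonormal basis.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
hs_rotated = sum(np.linalg.norm(K @ Q[:, j]) ** 2 for j in range(4))

assert np.isclose(hs_standard, hs_rotated)
assert np.isclose(hs_standard, np.linalg.norm(K, 'fro') ** 2)
```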
Since Hilbert–Schmidt operators turn out to be easy to identify (cf. also Section 3.5 from [47]), it is important to relate J_1(H) with J_2(H):
Lemma 3.26. An operator is trace class if and only if it can be written as
the product of two Hilbert–Schmidt operators, K = K1 K2 , and in this case
we have
kKk1 ≤ kK1 k2 kK2 k2 . (3.58)
In fact, K1 , K2 can be chosen such that kKk1 = kK1 k2 kK2 k2 .
Lemma 3.27. If K is trace class, then for every orthonormal basis \{w_n\} the trace

tr(K) = \sum_n \langle w_n, K w_n\rangle \qquad (3.59)

is finite,

|tr(K)| \le \|K\|_1, \qquad (3.60)

and independent of the orthonormal basis.
Clearly for self-adjoint trace class operators, the trace is the sum over
all eigenvalues (counted with their multiplicity). To see this, one just has to
choose the orthonormal basis to consist of eigenfunctions. This is even true
for all trace class operators and is known as Lidskij trace theorem (see [36]
for an easy to read introduction).
Example 3.19. We already mentioned that the resolvent of our Sturm–
Liouville operator is trace class. Choosing a basis of eigenfunctions we see
that the trace of the resolvent is the sum over its eigenvalues and combining
this with our trace formula (3.29) gives
tr(R_L(z)) = \sum_{j=0}^{\infty} \frac{1}{E_j - z} = \int_0^1 G(z, x, x)\,dx

for z \in \mathbb{C} which is not an eigenvalue.
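For q = 0 with Dirichlet boundary conditions (so the eigenvalues are E_n = (n\pi)^2) and z = -1, the diagonal of the Green's function is G(-1,x,x) = sinh(x)sinh(1-x)/sinh(1); both sides of the trace formula then equal (coth(1)-1)/2, which can be confirmed numerically. These explicit choices are mine, for illustration only.

```python
import numpy as np
from scipy.integrate import quad

# Left-hand side: sum over the eigenvalues E_n = (n pi)^2 at z = -1.
n = np.arange(1, 200001)
lhs = (1.0 / (n**2 * np.pi**2 + 1)).sum()

# Right-hand side: integral of the Green's function over the diagonal.
rhs, _ = quad(lambda x: np.sinh(x) * np.sinh(1 - x) / np.sinh(1), 0, 1)

assert abs(lhs - rhs) < 1e-5   # both equal (coth(1) - 1)/2
```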
Example 3.20. For our integral operator K from Example 3.16 we have in the trace class case

tr(K) = \sum_{j\in\mathbb{Z}} \hat{k}_j = k(0).
Note that this can again be interpreted as the integral over the diagonal
(2π)−1 k(x − x) = (2π)−1 k(0) of the kernel.
We also note the following elementary properties of the trace:
Proof. (i) and (ii) are straightforward. (iii) follows from K_1 \le K_2 if and only if \langle f, K_1 f\rangle \le \langle f, K_2 f\rangle for every f \in H. (iv) By Problem 2.12 and (i), it is no restriction to assume that A is unitary. Let \{w_n\} be some ONB and note that \{\tilde w_n = A w_n\} is also an ONB. Then

tr(AK) = \sum_n \langle \tilde w_n, A K \tilde w_n\rangle = \sum_n \langle A w_n, A K A w_n\rangle = \sum_n \langle w_n, K A w_n\rangle = tr(KA)
and

\|K\|_1 = \min \sum_j \|f_j\| \|g_j\|, \qquad (3.64)
Proof. To see that a trace class operator (3.40) can be written in such a
way choose fj = uj , gj = sj vj . This also shows that the minimum in (3.64)
is attained. Conversely note that the sum converges in the operator norm
3.6. Hilbert–Schmidt and trace class operators 95
\le \sum_j \Big( \sum_{k=1}^{N} |\langle v_k, g_j\rangle|^2 \Big)^{1/2} \Big( \sum_{k=1}^{N} |\langle f_j, u_k\rangle|^2 \Big)^{1/2} \le \sum_j \|f_j\| \|g_j\|.
This also shows that the right-hand side in (3.64) cannot exceed kKk1 . To
see the last claim we choose an ONB {wk } to compute the trace
tr(K) = \sum_k \langle w_k, K w_k\rangle = \sum_k \sum_j \langle w_k, \langle f_j, w_k\rangle g_j\rangle = \sum_j \sum_k \langle \langle w_k, f_j\rangle w_k, g_j\rangle = \sum_j \langle f_j, g_j\rangle.
Despite the many advantages of Hilbert spaces, there are also situations
where a non-Hilbert space is better suited (in fact the choice of the right
space is typically crucial for many problems). Hence we will devote our
attention to Banach spaces next.
Proof. Suppose X = \bigcup_{n=1}^{\infty} X_n. We can assume that the sets X_n are closed and none of them contains a ball; that is, X \setminus X_n is open and nonempty for every n. We will construct a Cauchy sequence x_n which stays away from all X_n.
Since X \setminus X_1 is open and nonempty, there is a ball B_{r_1}(x_1) \subseteq X \setminus X_1. Reducing r_1 a little, we can even assume \overline{B_{r_1}(x_1)} \subseteq X \setminus X_1. Moreover, since X_2 cannot contain B_{r_1}(x_1), there is some x_2 \in B_{r_1}(x_1) that is not in X_2. Since B_{r_1}(x_1) \cap (X \setminus X_2) is open, there is a closed ball \overline{B_{r_2}(x_2)} \subseteq B_{r_1}(x_1) \cap (X \setminus X_2). Proceeding recursively, we obtain a sequence (here we
98 4. The main theorems about Banach spaces
Proof. Let \{O_n\} be a family of open dense sets whose intersection is not dense. Then this intersection must be missing some closed ball B_\varepsilon. This ball will lie in \bigcup_n X_n, where X_n := X \setminus O_n are closed and nowhere dense. Now note that \tilde X_n := X_n \cap B_\varepsilon are closed nowhere dense sets in B_\varepsilon. But B_\varepsilon is a complete metric space, a contradiction.
Countable intersections of open sets are in some sense the next general
sets after open sets and are called Gδ sets (here G and δ stand for the German
words Gebiet and Durchschnitt, respectively). The complement of a Gδ set is
a countable union of closed sets also known as an Fσ set (here F and σ stand
for the French words fermé and somme, respectively). The complement of
a dense Gδ set will be a countable union of nowhere dense sets and hence
by definition meager. Consequently properties which hold on a dense Gδ are
considered generic in this context.
Example 4.2. The irrational numbers are a dense Gδ set in R. To see
this, let xn be an enumeration of the rational numbers and consider the
intersection of the open sets On := R \ {xn }. The rational numbers are
hence an Fσ set.
4.1. The Baire theorem and its consequences 99
By continuity of A_\alpha and the norm, each O_n is a union of open sets and hence open. Now either all of these sets are dense and hence their intersection

\bigcap_{n\in\mathbb{N}} O_n = \{x \,|\, \sup_\alpha \|A_\alpha x\| = \infty\}

\|A_\alpha x\| \le \frac{2n}{\varepsilon} \|x\|

for every x.
Corollary 4.4. Let X be a Banach space and Y some normed vector space.
Let {Aα } ⊆ L (X, Y ) be a family of bounded operators. Suppose kAα xk ≤
C(x) is bounded for every fixed x ∈ X. Then {Aα } is uniformly bounded,
kAα k ≤ C.
This raises the question whether a similar estimate can be true for continuous functions. More precisely, can we find a sequence c_k > 0 such that

|\hat f_k| \le C_f c_k,

where C_f is some constant depending on f? If this were true, the linear functionals

\ell_k(f) := \frac{\hat f_k}{c_k}, \qquad k \in \mathbb{Z},
Proof. Set B_r^X := B_r^X(0) and similarly for B_r^Y(0). By translating balls (using linearity of A), it suffices to prove that for every \varepsilon > 0 there is a \delta > 0 such that B_\delta^Y \subseteq A(B_\varepsilon^X).
So let \varepsilon > 0 be given. Since A is surjective we have

Y = AX = A\Big( \bigcup_{n=1}^{\infty} n B_\varepsilon^X \Big) = \bigcup_{n=1}^{\infty} A(n B_\varepsilon^X) = \bigcup_{n=1}^{\infty} n A(B_\varepsilon^X)

and the Baire theorem implies that for some n, \overline{n A(B_\varepsilon^X)} contains a ball. Since multiplication by n is a homeomorphism, the same must be true for n = 1, that is, B_\delta^Y(y) \subset \overline{A(B_\varepsilon^X)}. Consequently
So it remains to get rid of the closure. To this end choose \varepsilon_n > 0 such that \sum_{n=1}^{\infty} \varepsilon_n < \varepsilon and corresponding \delta_n \to 0 such that B_{\delta_n}^Y \subset \overline{A(B_{\varepsilon_n}^X)}. Now for y \in B_{\delta_1}^Y \subset \overline{A(B_{\varepsilon_1}^X)} we have x_1 \in B_{\varepsilon_1}^X such that A x_1 is arbitrarily close
Conversely, if A is open, then the image of the unit ball contains again some ball B_\varepsilon^Y \subseteq A(B_1^X). Hence by scaling B_{r\varepsilon}^Y \subseteq A(B_r^X) and letting r \to \infty we see that A is onto: Y = A(X).
Example 4.5. However, note that, under the assumptions of the open mapping theorem, the image of a closed set might not be closed. For example, consider the bounded linear operator A : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N}), x \mapsto (x_2, x_4, \dots), which is clearly surjective. Then the image of the closed set U = \{x \in \ell^2(\mathbb{N}) \,|\, x_{2n} = x_{2n-1}/n\} is dense (it contains all sequences with finite support) but not all of \ell^2(\mathbb{N}) (e.g. y_n = \frac{1}{n} is missing since this would imply x_{2n} = 1).
As an immediate consequence we get the inverse mapping theorem:
Theorem 4.6 (Inverse mapping). Let A ∈ L (X, Y ) be a continuous linear
bijection between Banach spaces. Then A−1 is continuous.
Example 4.6. Consider the operator (Aa)_j := \frac{1}{j} a_j in \ell^2(\mathbb{N}). Then its inverse (A^{-1}a)_j = j a_j is unbounded (show this!). This is in agreement with our theorem since its range is dense (why?) but not all of \ell^2(\mathbb{N}): For example, (b_j = \frac{1}{j})_{j=1}^{\infty} \notin Ran(A) since b = Aa gives the contradiction

\infty = \sum_{j=1}^{\infty} 1 = \sum_{j=1}^{\infty} |j b_j|^2 = \sum_{j=1}^{\infty} |a_j|^2 < \infty.
Proof. If \Gamma(A) is closed, then it is again a Banach space. Now the projection \pi_1(x, Ax) = x onto the first component is a continuous bijection onto X. So by the inverse mapping theorem its inverse \pi_1^{-1} is again continuous. Moreover, the projection \pi_2(x, Ax) = Ax onto the second component is also continuous and consequently so is A = \pi_2 \circ \pi_1^{-1}. The converse is easy.
If this is the case, A is called closable and the operator A associated with
Γ(A) is called the closure of A.
and (Aa)_j = j a_j. In fact, if a^n \to a and A a^n \to b then we have a^n_j \to a_j and j a^n_j \to b_j for any j \in \mathbb{N} and thus b_j = j a_j for any j \in \mathbb{N}. In particular, (j a_j)_{j=1}^{\infty} = (b_j)_{j=1}^{\infty} \in \ell^p(\mathbb{N}) (c_0(\mathbb{N}) if p = \infty). Conversely, suppose (j a_j)_{j=1}^{\infty} \in

a^n_j := \frac{1}{n} for 1 \le j \le n and a^n_j := 0 for j > n. Then \|a^n\|_2 = \frac{1}{\sqrt{n}}, implying a^n \to 0 but B a^n = \delta^1 \not\to 0.
Example 4.10. Another example are point evaluations in L2 (0, 1): Let x0 ∈
[0, 1] and consider `x0 : D(`x0 ) → C, f 7→ f (x0 ) defined on D(`x0 ) :=
C[0, 1] ⊆ L2 (0, 1). Then fn (x) := max(0, 1 − n|x − x0 |) satisfies fn → 0 but
`x0 (fn ) = 1. In fact, a linear functional is closable if and only if it is bounded
(Problem 4.9).
Lemma 4.8. Suppose A is closable and A is injective. Then \overline{A}^{-1} = \overline{A^{-1}}.
Proof. If we set
Γ−1 = {(y, x)|(x, y) ∈ Γ}
The question when Ran(A) is closed plays an important role when in-
vestigating solvability of the equation Ax = y and the last part gives us a
convenient criterion. Moreover, note that A−1 is bounded if and only if there
is some c > 0 such that
kAxk ≥ ckxk, x ∈ D(A). (4.5)
Indeed, this follows upon setting x = A−1 y in the above inequality which
also shows that c = kA−1 k−1 is the best possible constant. Factoring out
the kernel we even get a criterion for the general case:
Corollary 4.10. Suppose A : D(A) ⊆ X → Y is closed. Then Ran(A) is
closed if and only if
kAxk ≥ c dist(x, Ker(A)), x ∈ D(A), (4.6)
for some c > 0. The sup over all possible c is known as the (reduced) mini-
mum modulus of A.
Proof. Consider the quotient space X̃ := X/ Ker(A) and the induced op-
erator à : D(Ã) → Y where D(Ã) = D(A)/ Ker(A) ⊆ X̃. By construction
Ã[x] = 0 iff x ∈ Ker(A) and hence à is injective. To see that à is closed we
use π̃ : X × Y → X̃ × Y , (x, y) 7→ ([x], y) which is bounded, surjective and
hence open. Moreover, π̃(Γ(A)) = Γ(Ã). In fact, we even have (x, y) ∈ Γ(A)
iff ([x], y) ∈ Γ(Ã) and thus π̃(X × Y \ Γ(A)) = X̃ × Y \ Γ(Ã) implying that
Y \ Γ(Ã) is open. Finally, observing Ran(A) = Ran(Ã) we have reduced it
to the previous corollary.
There is also another criterion which does not involve the distance to the
kernel.
The closed graph theorem tells us that closed linear operators can be
defined on all of X if and only if they are bounded. So if we have an
unbounded operator we cannot have both! That is, if we want our operator
to be at least closed, we have to live with domains. This is the reason why
in quantum mechanics most operators are defined on domains. In fact, there
is another important property which does not allow unbounded operators to
be defined on the entire space:
Theorem 4.12 (Hellinger–Toeplitz). Let A : H \to H be a linear operator on some Hilbert space H. If A is symmetric, that is, \langle g, Af\rangle = \langle Ag, f\rangle for f, g \in H, then A is bounded.
closed and nowhere dense. For the first property Bolzano–Weierstraß might
be useful, for the latter property show that the set of piecewise linear functions
whose slopes are bounded below by some fixed number in absolute value are
dense.)
Problem 4.4. Let X be a complete metric space without isolated points.
Show that a dense Gδ set cannot be countable. (Hint: A single point is
nowhere dense.)
Problem 4.5. Let X be the space of sequences with finitely many nonzero
terms together with the sup norm. Consider the family of operators {An }n∈N
given by (An a)j := jaj , j ≤ n and (An a)j := 0, j > n. Then this family
is pointwise bounded but not uniformly bounded. Does this contradict the
Banach–Steinhaus theorem?
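A small numerical illustration of the family in Problem 4.5 (a sketch, with truncated arrays standing in for finitely supported sequences):

```python
import numpy as np

# (A_n a)_j = j a_j for j <= n and 0 for j > n, acting on (truncations of)
# finitely supported sequences.
def A(n, a):
    out = np.zeros_like(a)
    j = np.arange(1, len(a) + 1)
    out[:n] = (j * a)[:n]
    return out

a = np.zeros(100)
a[:3] = 1.0   # a fixed finitely supported sequence

# Pointwise bounded: sup_n ||A_n a||_sup is finite for each fixed a.
norms = [np.abs(A(n, a)).max() for n in range(1, 50)]
assert max(norms) == 3.0

# But not uniformly bounded: ||A_n|| >= n (apply A_n to the n-th unit vector).
for n in (5, 20, 40):
    e = np.zeros(100)
    e[n - 1] = 1.0
    assert np.abs(A(n, e)).max() == n
```

There is no contradiction to Banach–Steinhaus here precisely because the underlying space is not complete.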
Problem 4.6. Show that a bilinear map B : X × Y → Z is bounded,
kB(x, y)k ≤ Ckxkkyk, if and only if it is separately continuous with respect
to both arguments. (Hint: Uniform boundedness principle.)
Problem 4.7. Consider a Schauder basis as in (1.31). Show that the coordinate functionals \alpha_n are continuous. (Hint: Denote the set of all possible sequences of Schauder coefficients by \mathcal{A} and equip it with the norm \|\alpha\| := \sup_n \| \sum_{k=1}^{n} \alpha_k u_k \|; note that \mathcal{A} is precisely the set of sequences for which this norm is finite. By construction the operator A : \mathcal{A} \to X, \alpha \mapsto \sum_k \alpha_k u_k, has norm one. Now show that \mathcal{A} is complete and apply the inverse mapping theorem.)
Problem 4.8. Show that a compact symmetric operator in an infinite-dimensional
Hilbert space cannot be surjective.
Problem* 4.9. A linear functional defined on a dense subspace is closable
if and only if it is bounded.
Problem 4.10. Show that if A is a closed and B a bounded operator, then
A + B is closed. Show that this in general fails if B is not bounded. (Here
A + B is defined on D(A + B) = D(A) ∩ D(B).)
Problem 4.11. Suppose B ∈ L (X, Y ) is bounded and A : D(A) ⊆ Y → Z
is closed. Then AB : D(AB) ⊆ X → Z is closed, where D(AB) := {x ∈
D(B)|Ax ∈ D(A)}.
Problem 4.12. Show that the differential operator A = \frac{d}{dx} defined on D(A) = C^1[0,1] \subset C[0,1] (sup norm) is a closed operator. (Compare the example in Section 1.6.)
Problem* 4.13. Consider a linear operator A : D(A) ⊆ X → Y , where X
and Y are Banach spaces. Define the graph norm associated with A by
kxkA := kxkX + kAxkY , x ∈ D(A). (4.8)
Show that A : D(A) → Y is bounded if we equip D(A) with the graph norm.
Show that the completion XA of (D(A), k.kA ) can be regarded as a subset of
X if and only if A is closable. Show that in this case the completion can
be identified with D(A) and that the closure of A in X coincides with the
extension from Theorem 1.16 of A in XA . In particular, A is closed if and
only if (D(A), k.kA ) is complete.
Problem 4.14. Let X := \ell^2(\mathbb{N}) and (Aa)_j := j a_j with D(A) := \{a \in \ell^2(\mathbb{N}) \,|\, (j a_j)_{j\in\mathbb{N}} \in \ell^2(\mathbb{N})\} and Ba := (\sum_{j\in\mathbb{N}} a_j) \delta^1. Then we have seen that A is
identifies `p (N)∗ with `q (N) using this canonical isomorphism and simply
4.2. The Hahn–Banach theorem and its consequences 109
writes `p (N)∗ = `q (N). In the case p = ∞ this is not true, as we will see
soon.
It turns out that many questions are easier to handle after applying a
linear functional ` ∈ X ∗ . For example, suppose x(t) is a function R → X
(or C → X), then `(x(t)) is a function R → C (respectively C → C) for
any ` ∈ X ∗ . So to investigate `(x(t)) we have all tools from real/complex
analysis at our disposal. But how do we translate this information back to
x(t)? Suppose we have `(x(t)) = `(y(t)) for all ` ∈ X ∗ . Can we conclude
x(t) = y(t)? The answer is yes and will follow from the Hahn–Banach
theorem.
We first prove the real version from which the complex one then follows
easily.
Theorem 4.13 (Hahn–Banach, real version). Let X be a real vector space
and ϕ : X → R a convex function (i.e., ϕ(λx+(1−λ)y) ≤ λϕ(x)+(1−λ)ϕ(y)
for λ ∈ (0, 1)).
If ` is a linear functional defined on some subspace Y ⊂ X which satisfies
`(y) ≤ ϕ(y), y ∈ Y , then there is an extension ` to all of X satisfying
`(x) ≤ ϕ(x), x ∈ X.
Proof. Let us first try to extend \ell in just one direction: Take x \notin Y and set \tilde Y = span\{x, Y\}. If there is an extension \tilde\ell to \tilde Y it must clearly satisfy

\tilde\ell(y + \alpha x) = \ell(y) + \alpha \tilde\ell(x).

So all we need to do is to choose \tilde\ell(x) such that \tilde\ell(y + \alpha x) \le \varphi(y + \alpha x). But this is equivalent to

\sup_{\alpha > 0,\, y \in Y} \frac{\varphi(y - \alpha x) - \ell(y)}{-\alpha} \le \tilde\ell(x) \le \inf_{\alpha > 0,\, y \in Y} \frac{\varphi(y + \alpha x) - \ell(y)}{\alpha}
and is hence only possible if

\frac{\varphi(y_1 - \alpha_1 x) - \ell(y_1)}{-\alpha_1} \le \frac{\varphi(y_2 + \alpha_2 x) - \ell(y_2)}{\alpha_2}

for every \alpha_1, \alpha_2 > 0 and y_1, y_2 \in Y. Rearranging this last equation we see that we need to show

\alpha_2 \ell(y_1) + \alpha_1 \ell(y_2) \le \alpha_2 \varphi(y_1 - \alpha_1 x) + \alpha_1 \varphi(y_2 + \alpha_2 x).
Starting with the left-hand side we have
α2 `(y1 ) + α1 `(y2 ) = (α1 + α2 )` (λy1 + (1 − λ)y2 )
≤ (α1 + α2 )ϕ (λy1 + (1 − λ)y2 )
= (α1 + α2 )ϕ (λ(y1 − α1 x) + (1 − λ)(y2 + α2 x))
≤ α2 ϕ(y1 − α1 x) + α1 ϕ(y2 + α2 x),
where \lambda = \frac{\alpha_2}{\alpha_1 + \alpha_2}. Hence one dimension works.
To finish the proof we appeal to Zorn’s lemma (see Appendix A): Let E
be the collection of all extensions \tilde\ell satisfying \tilde\ell(x) \le \varphi(x). Then E can be
partially ordered by inclusion (with respect to the domain) and every linear
chain has an upper bound (defined on the union of all domains). Hence there
is a maximal element ` by Zorn’s lemma. This element is defined on X, since
if it were not, we could extend it as before contradicting maximality.
Proof. Clearly, if `(x) = 0 holds for all ` in some total subset, this holds
for all ` ∈ X ∗ . If x 6= 0 we can construct a bounded linear functional on
span{x} by setting `(αx) = α and extending it to X ∗ using the previous
corollary. But this contradicts our assumption.
Example 4.16. Let us return to our example `∞ (N). Let c(N) ⊂ `∞ (N) be
the subspace of convergent sequences. Set
l(x) = lim xn , x ∈ c(N), (4.10)
n→∞
Corollary 4.19. Let Y be a closed subspace and let {xj }nj=1 be a linearly
independent subset of X. If Y ∩ span{xj }nj=1 = {0}, then there exists a
biorthogonal system {`j }nj=1 ⊂ X ∗ such that `j (xk ) = 0 for j 6= k,
`j (xj ) = 1 and `(y) = 0 for y ∈ Y .
Example 4.17. This gives another quick way of showing that a normed
space has a completion: Take X̄ = J(X) ⊆ X ∗∗ and recall that a dual space
is always complete (Theorem 1.17).
Thus J : X → X ∗∗ is an isometric embedding. In many cases we even
have J(X) = X ∗∗ and X is called reflexive in this case.
Example 4.18. The Banach spaces `p (N) with 1 < p < ∞ are reflexive:
Identify `p (N)∗ with `q (N) (cf. Problem 4.21) and choose z ∈ `p (N)∗∗ . Then
But this implies z(y) = y(x), that is, z = J(x), and thus J is surjective.
(Warning: It does not suffice to just argue \ell^p(\mathbb{N})^{**} \cong \ell^q(\mathbb{N})^* \cong \ell^p(\mathbb{N}).)
However, \ell^1 is not reflexive since \ell^1(\mathbb{N})^* \cong \ell^\infty(\mathbb{N}) but \ell^\infty(\mathbb{N})^* \not\cong \ell^1(\mathbb{N}) as noted earlier. Things get even a bit more explicit if we look at c_0(\mathbb{N}), where we can identify (cf. Problem 4.22) c_0(\mathbb{N})^* with \ell^1(\mathbb{N}) and c_0(\mathbb{N})^{**} with \ell^\infty(\mathbb{N}). Under this identification J(c_0(\mathbb{N})) = c_0(\mathbb{N}) \subseteq \ell^\infty(\mathbb{N}).
Example 4.19. By the same argument, every Hilbert space is reflexive. In fact, by the Riesz lemma we can identify H^* with H via the (conjugate linear) map x \mapsto \langle x, \cdot\rangle. Taking z \in H^{**} we have, again by the Riesz lemma, that

z(y) = \langle \langle x, \cdot\rangle, \langle y, \cdot\rangle \rangle_{H^*} = \langle x, y\rangle^* = \langle y, x\rangle = J(x)(y).
Lemma 4.21. Let X be a Banach space.
(i) If X is reflexive, so is every closed subspace.
(ii) X is reflexive if and only if X ∗ is.
(iii) If X ∗ is separable, so is X.
surjective.
(ii) Suppose X is reflexive. Then the two maps

X^* \to X^{***}, \ x' \mapsto x' \circ J_X^{-1}, \qquad X^{***} \to X^*, \ x''' \mapsto x''' \circ J_X,

are inverse of each other. Moreover, fix x'' \in X^{**} and let x = J_X^{-1}(x''). Then J_{X^*}(x')(x'') = x''(x') = J(x)(x') = x'(x) = x'(J_X^{-1}(x'')), that is, J_{X^*} =
that \|x_n\| = 1 and \ell_n(x_n) \ge \|\ell_n\|/2. We will show that \{x_n\}_{n=1}^{\infty} is total in X. If it were not, we could find some x \in X \setminus \overline{span\{x_n\}_{n=1}^{\infty}} and hence there

\ell^1(\mathbb{N})^* \cong \ell^\infty(\mathbb{N}) shows. In fact, this can be used to show that a separable space is not reflexive, by showing that its dual is not separable.
Example 4.20. The space C(I) is not reflexive. To see this observe that
the dual space contains point evaluations `x0 (f ) := f (x0 ), x0 ∈ I. Moreover,
for x0 6= x1 we have k`x0 − `x1 k = 2 and hence C(I)∗ is not separable. You
should appreciate the fact that it was not necessary to know the full dual
space which is quite intricate (see Theorem 6.5 from [47]).
Note that the product of two reflexive spaces is also reflexive. In fact,
this even holds for countable products — Problem 4.24.
Problem 4.15. Let X := C3 equipped with the norm |(x, y, z)|1 := |x| +
|y| + |z| and Y := {(x, y, z)|x + y = 0, z = 0}. Find at least two extensions
of `(x, y, z) := x from Y to X which preserve the norm. What if we take
Y := {(x, y, z)|x + y = 0}?
Problem 4.16. Show that the extension from Corollary 4.15 is unique if X ∗
is strictly convex. (Hint: Problem 1.13.)
Problem* 4.17. Let X be some normed space. Show that

\|x\| = \sup_{\ell \in V,\, \|\ell\| = 1} |\ell(x)|, \qquad (4.12)
Problem 4.25 (Banach limit). Let \bar{c}(\mathbb{N}) \subset \ell^\infty(\mathbb{N}) be the subspace of all bounded sequences for which the limit of the Cesàro means

L(x) := \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} x_k

exists. Note that c(\mathbb{N}) \subseteq \bar{c}(\mathbb{N}) and L(x) = \lim_{n\to\infty} x_n for x \in c(\mathbb{N}).
Show that L can be extended to all of `∞ (N) such that
(i) L is linear,
(ii) |L(x)| ≤ kxk∞ ,
(iii) L(Sx) = L(x) where (Sx)n = xn+1 is the shift operator,
(iv) L(x) ≥ 0 when xn ≥ 0 for all n,
(v) lim inf n xn ≤ L(x) ≤ lim sup xn for all real-valued sequences.
(Hint: Of course existence follows from Hahn–Banach and (i), (ii) will come for free. Also (iii) will be inherited from the construction. For (iv) note that the extension can be assumed to be real-valued and investigate L(e - x) for x \ge 0 with \|x\|_\infty = 1, where e = (1, 1, 1, \dots). (v) then follows from (iv).)
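While a Banach limit itself is not computable, the Cesàro mean appearing in the problem is; e.g. for the alternating sequence x_n = (-1)^{n+1}, which does not converge, the Cesàro means tend to 0 (a quick check, not part of the problem):

```python
import numpy as np

# Cesàro means of the non-convergent sequence x_n = (-1)^(n+1).
N = 10000
x = (-1.0) ** (np.arange(1, N + 1) + 1)
cesaro = np.cumsum(x) / np.arange(1, N + 1)

assert abs(cesaro[-1]) < 1e-3                  # the Cesàro means converge to 0
assert np.max(np.abs(x[1:] - x[:-1])) == 2.0   # while x itself does not converge
```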
Problem* 4.26. Show that a finite dimensional subspace M of a Banach
space X can be complemented. (Hint: Start with a basis {xj } for M and
choose a corresponding dual basis {`k } with `k (xj ) = δj,k which can be ex-
tended to X ∗ .)
where we have used Problem 4.17 to obtain the fourth equality. In summary,
Theorem 4.22. Let A \in L(X, Y), then A' \in L(Y^*, X^*) with \|A\| = \|A'\|.
Rx := (0, x_1, x_2, \dots).

Then for y' \in \ell^q(\mathbb{N})

y'(Rx) = \sum_{j=1}^{\infty} y'_j (Rx)_j = \sum_{j=2}^{\infty} y'_j x_{j-1} = \sum_{j=1}^{\infty} y'_{j+1} x_j
since A(B_1^X(0)) \subseteq K is dense. Thus y'_{n_j} is the required subsequence and A' is compact.
To see the converse note that if A' is compact then so is A'' by the first part and hence also A = J_Y^{-1} A'' J_X.
à : X/ Ker(A) → Ran(A) is the induced map (cf. Problem 1.43) which has
a bounded inverse by Theorem 4.6. By construction ` = A0 (`) ˜ ∈ Ran(A0 ).
(ii) ⇒ (iii): Clear since annihilators are closed.
(iii) ⇒ (i): Let Z = \overline{Ran(A)} and let \tilde A : X \to Z be the range restriction of A. Then \tilde A' is injective (since Ker(\tilde A') = Ran(\tilde A)^\perp = \{0\}) and has the same range Ran(\tilde A') = Ran(A') (since every linear functional in Z^* can be extended to one in Y^* by Corollary 4.15). Hence we can assume Z = Y and hence A' injective without loss of generality.
Suppose Ran(A) were not closed. Then, given \varepsilon > 0 and 0 \le \delta < 1, by Corollary 4.11 there is some y \in Y such that \varepsilon\|x\| + \|y - Ax\| > \delta\|y\| for all x \in X. Hence there is a linear functional \ell \in Y^* such that \delta \le \|\ell\| \le 1 and \|A'\ell\| \le \varepsilon. Indeed consider X \oplus Y and use Corollary 4.17 to choose \bar\ell \in (X \oplus Y)^* such that \bar\ell vanishes on the closed set V := \{(\varepsilon x, Ax) \,|\, x \in X\}, \|\bar\ell\| = 1, and \bar\ell(0, y) = dist((0, y), V) (note that (0, y) \notin V since y \ne 0). Then \ell(\cdot) = \bar\ell(0, \cdot) is the functional we are looking for since dist((0, y), V) \ge \delta\|y\| and (A'\ell)(x) = \bar\ell(0, Ax) = \bar\ell(-\varepsilon x, 0) = -\varepsilon \bar\ell(x, 0). Now this allows us to choose \ell_n with \|\ell_n\| \to 1 and \|A'\ell_n\| \to 0 such that Corollary 4.10 implies
With the help of annihilators we can also describe the dual spaces of
subspaces.
Show that

A'x' = (x'_1, x'_1, \dots).

Conclude that A is not the adjoint of an operator from L(c_0(\mathbb{N})).
Problem* 4.28. Show

Ker(A') \cong Coker(A)^*, \qquad Coker(A') \cong Ker(A)^*

for A \in L(X, Y) with Ran(A) closed.
Problem 4.29. Let X_j be Banach spaces. A sequence of operators A_j \in L(X_j, X_{j+1})

X_1 \xrightarrow{A_1} X_2 \xrightarrow{A_2} X_3 \cdots X_n \xrightarrow{A_n} X_{n+1}

is said to be exact if Ran(A_j) = Ker(A_{j+1}) for 1 \le j \le n-1. Show that a sequence is exact if and only if the corresponding dual sequence

X_1^* \xleftarrow{A_1'} X_2^* \xleftarrow{A_2'} X_3^* \cdots X_n^* \xleftarrow{A_n'} X_{n+1}^*

is exact.
Problem 4.30. Suppose X is separable. Show that there exists a countable
set N ⊂ X ∗ with N⊥ = {0}.
Problem 4.31. Show that for A ∈ L (X, Y ) we have
rank(A) = rank(A0 ).
Problem* 4.32 (Riesz lemma). Let X be a normed vector space and Y ⊂ X
some subspace. Show that if Y 6= X, then for every ε ∈ (0, 1) there exists an
xε with kxε k = 1 and
inf kxε − yk ≥ 1 − ε. (4.24)
y∈Y
Note: In a Hilbert space the claim holds with \varepsilon = 0 for any normalized x in the orthogonal complement of Y and hence x_\varepsilon can be thought of as a replacement of an orthogonal vector. (Hint: Choose a y_\varepsilon \in Y which is close to x and look at x - y_\varepsilon.)
Problem* 4.33. Suppose X is a vector space and \ell, \ell_1, \dots, \ell_n are linear functionals such that \bigcap_{j=1}^{n} Ker(\ell_j) \subseteq Ker(\ell). Then \ell = \sum_{j=1}^{n} \alpha_j \ell_j for some constants \alpha_j \in \mathbb{C}. (Hint: Find a dual basis x_k \in X such that \ell_j(x_k) = \delta_{j,k} and look at x - \sum_{j=1}^{n} \ell_j(x) x_j.)
Proof. (i) Follows from `(αn xn + yn ) = αn `(xn ) + `(yn ) → α`(x) + `(y). (ii)
Choose ` ∈ X ∗ such that `(x) = kxk (for the limit x) and k`k = 1. Then
kxk = `(x) = lim inf `(xn ) ≤ lim inf kxn k.
(iii) For every ` we have that |J(xn )(`)| = |`(xn )| ≤ C(`) is bounded. Hence
by the uniform boundedness principle we have kxn k = kJ(xn )k ≤ C.
(iv) If xn is a weak Cauchy sequence, then `(xn ) converges and we can define
j(`) = lim `(xn ). By construction j is a linear functional on X ∗ . Moreover,
by (iii) we have |j(`)| ≤ sup |`(xn )| ≤ k`k sup kxn k ≤ Ck`k which shows
j ∈ X ∗∗ . Since X is reflexive, j = J(x) for some x ∈ X and by construction
`(xn ) → J(x)(`) = `(x), that is, xn * x.
(v) This follows from
kxn − xm k = sup_{k`k=1} |`(xn − xm )|
Item (ii) says that the norm is sequentially weakly lower semicontinuous
(cf. Problem B.18) while the previous example shows that it is not sequen-
tially weakly continuous (this will in fact be true for any convex function
as we will see later). However, bounded linear operators turn out to be
sequentially weakly continuous (Problem 4.37).
Example 4.27. Consider L2 (0, 1) and recall (see Example 3.8) that
un (x) = √2 sin(nπx), n ∈ N,
form an ONB and hence un * 0. However, vn = un^2 * 1. In fact, one easily computes
hum , vn i = (√2 (1 − (−1)^m )/(mπ)) · (4n^2 /(4n^2 − m^2 )) → √2 (1 − (−1)^m )/(mπ) = hum , 1i
and the claim follows from Problem 4.40 since kvn k = √(3/2).
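This example lends itself to a quick numerical check (an illustration, not part of the text; the quadrature grid and the tested indices are my choices):

```python
import numpy as np

# On L^2(0,1) with u_n(x) = sqrt(2) sin(n pi x), check numerically that
# <u_m, v_n> -> <u_m, 1> for v_n = u_n^2, while ||v_n|| stays at sqrt(3/2).
x = np.linspace(0, 1, 200001)            # fine grid for Riemann-sum quadrature

def u(n):
    return np.sqrt(2) * np.sin(n * np.pi * x)

def inner(f, g):                         # crude L^2 inner product on the grid
    return np.sum(f * g) * (x[1] - x[0])

m = 3
vals = [inner(u(m), u(n) ** 2) for n in (5, 50, 500)]
limit = inner(u(m), np.ones_like(x))     # <u_m, 1> = sqrt(2)(1-(-1)^m)/(m pi)
print(vals, limit)                       # vals approach limit as n grows
print(np.sqrt(inner(u(500) ** 2, u(500) ** 2)))  # stays near sqrt(3/2)
```

The point of the printout is that the inner products converge while the norms do not tend to k1k = 1, so the convergence is weak but not in norm.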
Remark: One can equip X with the weakest topology for which all ` ∈ X ∗
remain continuous. This topology is called the weak topology and it is
given by taking all finite intersections of inverse images of open sets as a
base. By construction, a sequence will converge in the weak topology if and
only if it converges weakly. By Corollary 4.17 the weak topology is Hausdorff,
but it will not be metrizable in general. In particular, sequences do not suffice
to describe this topology. Nevertheless we will stick with sequences for now
and come back to this more general point of view in Section 5.3.
In a Hilbert space there is also a simple criterion for a weakly convergent
sequence to converge in norm (see Theorem 5.19 for a generalization).
Proof. By (ii) of the previous lemma we have lim kfn k = kf k and hence
kf − fn k2 = kf k2 − 2Re(hf, fn i) + kfn k2 → 0.
Now we come to the main reason why weakly convergent sequences are of
interest: A typical approach for solving a given equation in a Banach space
is as follows:
(i) Construct a (bounded) sequence xn of approximating solutions
(e.g. by solving the equation restricted to a finite dimensional sub-
space and increasing this subspace).
(ii) Use a compactness argument to extract a convergent subsequence.
(iii) Show that the limit solves the equation.
Our aim here is to provide some results for the step (ii). In a finite di-
mensional vector space the most important compactness criterion is bound-
edness (Heine–Borel theorem, Theorem B.22). In infinite dimensions this
breaks down as we have already seen in Section 1.5. We even have
Theorem 4.30. The closed unit ball in a Banach space X is compact if and
only if X is finite dimensional.
a contradiction.
Example 4.29. Let X := L1 [−1, 1]. Every bounded integrable ϕ gives rise to a linear functional
`ϕ (f ) := ∫ f (x)ϕ(x) dx
(see Problem 3.29 from [47]). Furthermore, if ukj * u we conclude
∫ u(x)ϕ(x) dx = ϕ(0)
for every continuous ϕ, a contradiction. In fact, uk converges to the Dirac measure centered at 0, which is not in L1 [−1, 1].
Note that the above theorem also shows that in an infinite dimensional
reflexive Banach space weak convergence is always weaker than strong convergence since otherwise every bounded sequence would have a weakly, and thus by
assumption also norm, convergent subsequence contradicting Theorem 4.30.
In a non-reflexive space this situation can however occur.
Example 4.30. In `1 (N) every weakly convergent sequence is in fact (norm) convergent (such Banach spaces are said to have the Schur property). First of all recall that `1 (N)∗ ≅ `∞ (N) and an * 0 implies
lb (an ) = Σ_{k=1}^∞ bk a^n_k → 0, ∀b ∈ `∞ (N).
Now suppose we could find a sequence an * 0 for which lim sup_n kan k1 ≥ ε > 0. After passing to a subsequence we can assume kan k1 ≥ ε/2 and after rescaling the vectors even kan k1 = 1. Now weak convergence an * 0 implies a^n_j = lδj (an ) → 0 for every fixed j ∈ N. Hence the main contribution to the norm of an must move towards ∞ and we can find a subsequence nj and a corresponding increasing sequence of integers kj such that
Σ_{kj ≤k<kj+1} |a^{nj}_k | ≥ 2/3.
Now set
bk = sign(a^{nj}_k ), kj ≤ k < kj+1 .
Then
|lb (a^{nj} )| ≥ Σ_{kj ≤k<kj+1} |a^{nj}_k | − Σ_{1≤k<kj ; kj+1 ≤k} |bk a^{nj}_k | ≥ 2/3 − 1/3 = 1/3,
contradicting a^{nj} * 0.
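The mechanism behind the proof, entries converging to 0 componentwise while the unit of `1-mass escapes to infinity, can be illustrated with a concrete "sliding hump" sequence (the block sizes below are made up for the sketch):

```python
import numpy as np

# A sequence a^0, a^1, ... in l^1 with ||a^j||_1 = 1 whose hump of mass 1
# sits on the block [2^j, 2^{j+1}); componentwise it tends to 0.
N = 6
dim = 2 ** (N + 1)
A = []
for j in range(N):
    a = np.zeros(dim)
    a[2 ** j:2 ** (j + 1)] = 1.0 / 2 ** j   # hump of total mass 1
    A.append(a)

# The functional b in l^infty matching the sign of each hump on its block
# keeps l_b(a^j) = 1, so a^j does not converge weakly to 0.
b = np.zeros(dim)
for j in range(N):
    b[2 ** j:2 ** (j + 1)] = 1.0

for a in A:
    print(np.dot(b, a))   # stays equal to 1 for every j
```

This is exactly the role the signs bk play in the contradiction above.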
It is also useful to observe that compact operators will turn weakly con-
vergent into (norm) convergent sequences.
Theorem 4.32. Let A ∈ C (X, Y ) be compact. Then xn * x implies Axn →
Ax. If X is reflexive the converse is also true.
Then Sn converges to zero strongly but not in norm (since kSn k = 1) and Sn∗ converges weakly to zero (since hx, Sn∗ yi = hSn x, yi) but not strongly (since kSn∗ xk = kxk).
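A numerical sketch of this behavior, assuming Sn is the n-fold left shift on `2 (N) truncated to finitely many coordinates (the truncation dimension and test vector are my choices):

```python
import numpy as np

# S^n = n-fold left shift: ||S^n x|| -> 0 for each fixed x (strong convergence
# to 0) while ||S^n|| = 1; the adjoint right shift preserves norms (up to the
# truncation), so it cannot converge strongly to 0.
dim = 10000
x = 1.0 / np.arange(1, dim + 1)          # a fixed l^2 vector (truncated)

def left_shift(v, n):    # (S^n v)_k = v_{k+n}
    return np.concatenate([v[n:], np.zeros(n)])

def right_shift(v, n):   # adjoint: (S*^n v)_k = v_{k-n}
    return np.concatenate([np.zeros(n), v[:len(v) - n]])

for n in (1, 10, 100, 1000):
    print(np.linalg.norm(left_shift(x, n)),    # tends to 0
          np.linalg.norm(right_shift(x, n)))   # stays near ||x||
```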
Lemma 4.33. Suppose An , Bn ∈ L (X, Y ) are sequences of bounded operators.
(i) s-lim_{n→∞} An = A, s-lim_{n→∞} Bn = B, and αn → α implies s-lim_{n→∞} (An + Bn ) = A + B and s-lim_{n→∞} αn An = αA.
(ii) s-lim_{n→∞} An = A implies kAk ≤ lim inf_{n→∞} kAn k.
(iii) If An x converges for all x ∈ X, then kAn k ≤ C and there is an operator A ∈ L (X, Y ) such that s-lim_{n→∞} An = A.
(iv) If An y converges for y in a total set and kAn k ≤ C, then there is an operator A ∈ L (X, Y ) such that s-lim_{n→∞} An = A.
The same result holds if strong convergence is replaced by weak convergence.
and we get that the Fourier series does not converge for some L1 function.
Lemma 4.34. Suppose An ∈ L (Y, Z), Bn ∈ L (X, Y ) are two sequences of bounded operators.
(i) s-lim_{n→∞} An = A and s-lim_{n→∞} Bn = B implies s-lim_{n→∞} An Bn = AB.
4.4. Weak convergence 129
Further topics on Banach spaces
`(x) = c
a topological vector space with the usual topology generated by open balls.
As in the case of normed linear spaces, X ∗ will denote the vector space of
all continuous linear functionals on X.
Lemma 5.1. Let X be a vector space and U a convex subset containing 0.
Then
pU (x + y) ≤ pU (x) + pU (y), pU (λ x) = λpU (x), λ ≥ 0. (5.2)
Moreover, {x|pU (x) < 1} ⊆ U ⊆ {x|pU (x) ≤ 1}. If, in addition, X is a
topological vector space and U is open, then U = {x|pU (x) < 1}.
Note that two disjoint closed convex sets can be separated strictly if
one of them is compact. However, this will require that every point has
a neighborhood base of convex open sets. Such topological vector spaces
are called locally convex spaces and they will be discussed further in
Section 5.4. For now we just remark that every normed vector space is
locally convex since balls are convex.
Corollary 5.4. Let U , V be disjoint nonempty closed convex subsets of a
locally convex space X and let U be compact. Then there is a linear functional
` ∈ X ∗ and some c, d ∈ R such that
Re(`(x)) ≤ d < c ≤ Re(`(y)), x ∈ U, y ∈ V. (5.6)
is a convex open set which is disjoint from V . Hence by the previous theorem
we can find some ` such that Re(`(x)) < c ≤ Re(`(y)) for all x ∈ Ũ and y ∈
V . Moreover, since Re(`(U )) is a compact interval [e, d], the claim follows.
A line segment is convex and can be generated as the convex hull of its
endpoints. Similarly, a full triangle is convex and can be generated as the
convex hull of its vertices. However, if we look at a ball, then we need its
entire boundary to recover it as the convex hull. So how can we characterize those points which determine a convex set via the convex hull?
Let K be a set and M ⊆ K a nonempty subset. Then M is called
an extremal subset of K if no point of M can be written as a convex
combination of two points unless both are in M : For given x, y ∈ K and
λ ∈ (0, 1) we have that
λx + (1 − λ)y ∈ M ⇒ x, y ∈ M. (5.9)
If M = {x} is extremal, then x is called an extremal point of K. Hence
an extremal point cannot be written as a convex combination of two other
points from K.
Note that we did not require K to be convex. If K is convex, then M is
extremal if and only if K \M is convex. Note that the nonempty intersection
of extremal sets is extremal. Moreover, if L ⊆ M is extremal and M ⊆ K is
extremal, then L ⊆ K is extremal as well (Problem 5.7).
Example 5.3. Consider R2 with the norms k.kp . Then the extremal points
of the closed unit ball (cf. Figure 1.1) are the boundary points for 1 < p < ∞
and the vertices for p = 1, ∞. In any case the boundary is an extremal set.
Slightly more generally, in a strictly convex space, (ii) of Problem 1.13 says
that the extremal points of the unit ball are precisely its boundary points.
Example 5.4. Consider R3 and let C = {(x1 , x2 , 0) ∈ R3 |x21 + x22 = 1}.
Take two more points x± = (0, 0, ±1) and consider the convex hull K of
M = C ∪ {x+ , x− }. Then M is extremal in K and, moreover, every point
from M is an extremal point. However, if we change the two extra points to
be x± = (1, 0, ±1), then the point (1, 0, 0) is no longer extremal. Hence the
extremal points are now M \ {(1, 0, 0)}. Note in particular that the set of
extremal points is not closed in this case.
Extremal sets arise naturally when minimizing linear functionals.
Lemma 5.7. Suppose K ⊆ X and ` ∈ X ∗ . If
K` := {x ∈ K|`(x) = inf Re(`(y))}
y∈K
Proof. We want to apply Zorn’s lemma. To this end consider the family
M = {M ⊆ K|compact and extremal in K}
with the partial order given by reversed inclusion. Since K ∈ M this family is nonempty. Moreover, given a linear chain C ⊂ M we consider M := ⋂C.
Then M ⊆ K is nonempty by the finite intersection property and since it
is closed also compact. Moreover, as the nonempty intersection of extremal
sets it is also extremal. Hence M ∈ M and thus M has a maximal element.
Denote this maximal element by M .
We will show that M contains precisely one point (which is then ex-
tremal by construction). Indeed, suppose x, y ∈ M . If x 6= y we can, by
Corollary 5.4, choose a linear functional ` ∈ X ∗ with Re(`(x)) 6= Re(`(y)).
Then by Lemma 5.7 M` ⊂ M is extremal in M and hence also in K. But
by Re(`(x)) 6= Re(`(y)) it cannot contain both x and y contradicting maxi-
mality of M .
Finally, we want to recover a convex set as the convex hull of its ex-
tremal points. In our infinite dimensional setting an additional closure will
be necessary in general.
Since the intersection of arbitrary closed convex sets is again closed and
convex we can define the closed convex hull of a set U as the smallest closed
convex set containing U , that is, the intersection of all closed convex sets
containing U . Since the closure of a convex set is again convex (Problem 5.8)
the closed convex hull is simply the closure of the convex hull.
Theorem 5.9 (Krein–Milman). Let X be a locally convex space. Suppose
K ⊆ X is convex and compact. Then it is the closed convex hull of its
extremal points.
Now consider K` from Lemma 5.7 which is nonempty and hence contains an
extremal point y ∈ E. But y 6∈ M , a contradiction.
While in the finite dimensional case the closure is not necessary (Prob-
lem 5.9), it is important in general as the following example shows.
Example 5.8. Consider the closed unit ball in `1 (N). Then the extremal
points are {eiθ δ n |n ∈ N, θ ∈ R}. Indeed, suppose kak1 = 1 with λ :=
|aj | ∈ (0, 1) for some j ∈ N. Then a = λb + (1 − λ)c where b := λ−1 aj δ j
and c := (1 − λ)−1 (a − aj δ j ). Hence the only possible extremal points
are of the form eiθ δ n . Moreover, if eiθ δ n = λb + (1 − λ)c we must have
1 = |λbn +(1−λ)cn | ≤ λ|bn |+(1−λ)|cn | ≤ 1 and hence an = bn = cn by strict
convexity of the absolute value. Thus the convex hull of the extremal points
5.2. Convex sets and the Krein–Milman theorem 139
consists of the sequences from the unit ball which have only finitely many nonzero terms.
While the closed unit ball is not compact in the norm topology it will be in
the weak-∗ topology by the Banach–Alaoglu theorem (Theorem 5.10). To
this end note that `1 (N) ≅ c0 (N)∗ .
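The decomposition used in this example can be checked numerically; here is a small sketch in R3, viewed as a finite-dimensional piece of `1 (the concrete entries are made up for the illustration):

```python
import numpy as np

# If ||a||_1 = 1 and some |a_j| = lam in (0,1), then a = lam*b + (1-lam)*c
# with two OTHER unit vectors b, c, so a is not extremal in the unit ball.
a = np.array([0.5, 0.3, 0.2])            # ||a||_1 = 1, take j = 0, lam = 0.5
j = 0
lam = abs(a[j])

b = np.zeros_like(a)
b[j] = a[j] / lam                        # b = lam^{-1} a_j delta^j
c = (a - a[j] * np.eye(3)[j]) / (1 - lam)  # c = (1-lam)^{-1}(a - a_j delta^j)

print(np.allclose(lam * b + (1 - lam) * c, a),   # the convex combination
      np.sum(np.abs(b)), np.sum(np.abs(c)))      # both have norm 1
```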
Also note that in the infinite dimensional case the extremal points can
be dense.
Example 5.9. Let X = C([0, 1], R) and consider the convex set K = {f ∈ C 1 ([0, 1], R) | f (0) = 0, kf ′ k∞ ≤ 1}. Note that the functions f± (x) = ±x are extremal. For example, assume
x = λf (x) + (1 − λ)g(x);
then, taking derivatives,
1 = λf ′ (x) + (1 − λ)g ′ (x),
which implies f ′ (x) = g ′ (x) = 1 and hence f (x) = g(x) = x.
To see that there are no other extremal functions, suppose |f ′ (x)| ≤ 1 − ε on some interval I. Choose a nontrivial continuous function g which is 0 outside I, has integral 0 over I, and satisfies kgk∞ ≤ ε. Let G(x) := ∫_0^x g(t) dt. Then f = (1/2)(f + G) + (1/2)(f − G) and hence f is not extremal. Thus f± are the only extremal points and their (closed) convex hull is given by fλ (x) = λx for λ ∈ [−1, 1].
Of course the problem is that K is not closed. Hence we consider the
Lipschitz continuous functions K̄ := {f ∈ C 0,1 ([0, 1], R)|f (0) = 0, [f ]1 ≤ 1}
(this is in fact the closure of K, but this is a bit tricky to see and we won’t
need this here). By the Arzelà–Ascoli theorem (Theorem 1.13) K̄ is relatively
compact and since the Lipschitz estimate clearly is preserved under uniform
limits it is even compact.
Now note that piecewise linear functions with f ′ (x) ∈ {±1} away from the kinks are extremal in K̄. Moreover, these functions are dense: Split [0, 1] into n pieces of equal length using xj := j/n. Set fn (x0 ) = 0 and fn (x) = fn (xj ) ± (x − xj ) for x ∈ [xj , xj+1 ], where the sign is chosen such that |f (xj+1 ) − fn (xj+1 )| is minimal. Then kf − fn k∞ ≤ 1/n.
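This approximation step is easy to carry out numerically (a sketch; the sample function f and grid sizes are my choices):

```python
import numpy as np

# Build the piecewise linear f_n with slopes +-1 on [x_j, x_{j+1}], x_j = j/n,
# tracking a given 1-Lipschitz f with f(0) = 0; check ||f - f_n||_inf <= 1/n.
f = lambda s: 0.5 * s                    # some function from the set K-bar
n = 20
h = 1.0 / n
xs = np.linspace(0, 1, n + 1)
fn = np.zeros(n + 1)
for j in range(n):
    up, down = fn[j] + h, fn[j] - h      # the two admissible next values
    fn[j + 1] = up if abs(f(xs[j + 1]) - up) <= abs(f(xs[j + 1]) - down) else down

t = np.linspace(0, 1, 2001)
fn_interp = np.interp(t, xs, fn)         # f_n is linear between the nodes
print(np.max(np.abs(f(t) - fn_interp))) # bounded by 1/n = 0.05
```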
Problem* 5.6. Show that the convex hull is given by (5.8).
Problem* 5.7. Show that the nonempty intersection of extremal sets is
extremal. Show that if L ⊆ M is extremal and M ⊆ K is extremal, then
L ⊆ K is extremal as well.
Problem 5.8. Let X be a topological vector space. Show that the closure
and the interior of a convex set is convex. (Hint: One way of showing the
first claim is to consider the continuous map f : X × X → X given by
(x, y) 7→ λx + (1 − λ)y and use Problem B.14.)
defines a metric on the unit ball B1 (0) ⊂ X which can be shown to generate
the weak topology (Problem 5.13). However, on all of X the weak topology
is not first countable unless X is finite dimensional (Problem 5.14).
Similarly, we define the weak-∗ topology on X ∗ as the weakest topology
for which all j ∈ J(X) ⊆ X ∗∗ remain continuous. In particular, the weak-∗
topology is weaker than the weak topology on X ∗ and both are equal if X
is reflexive. Like the weak topology it is Hausdorff (since different linear
functionals must differ at least at one point) and not first countable unless
X is finite dimensional (Problem 5.14). A base for the weak-∗ topology is given by sets of the form
` + ⋂_{j=1}^n |J(xj )|^{-1} ([0, εj )) = {`˜ ∈ X ∗ | |(`˜ − `)(xj )| < εj , 1 ≤ j ≤ n},
` ∈ X ∗ , xj ∈ X, εj > 0. (5.12)
5.3. Weak topologies 141
Proof. Suppose X is not reflexive and choose x′′ ∈ B̄1∗∗ (0) \ J(B̄1 (0)) with kx′′ k = 1. Then, if B̄1 (0) is weakly compact, J(B̄1 (0)) is weak-∗ compact and by Corollary 5.4 we can find some ` ∈ X ∗ with k`k = 1 and
Re(x′′ (`)) < inf_{y′′ ∈J(B̄1 (0))} Re(y′′ (`)) = inf_{y∈B̄1 (0)} Re(`(y)) = −1.
Since the weak topology is weaker than the norm topology every weakly
closed set is also (norm) closed. Moreover, the weak closure of a set will in
general be larger than the norm closure. However, for convex sets both will
coincide. In fact, we have the following characterization in terms of closed
(affine) half-spaces, that is, sets of the form {x ∈ X|Re(`(x)) ≤ α} for
some ` ∈ X ∗ and some α ∈ R.
Theorem 5.12 (Mazur). The weak as well as the norm closure of a convex
set K is the intersection of all half-spaces containing K. In particular, a
convex set K ⊆ X is weakly closed if and only if it is closed.
Finally, we note two more important results. For the first note that
since X ∗∗ is the dual of X ∗ it has a corresponding weak-∗ topology and by
the Banach–Alaoglu theorem B̄1∗∗ (0) is weak-∗ compact and hence weak-∗
closed.
Theorem 5.14 (Goldstine). The image of the closed unit ball B̄1 (0) under
the canonical embedding J into the closed unit ball B̄1∗∗ (0) is weak-∗ dense.
Proof. Let j ∈ B̄1∗∗ (0) be given. Since sets of the form j + ⋂_{k=1}^n |`k |^{-1} ([0, ε)) provide a neighborhood base (where we can assume the `k ∈ X ∗ to be linearly independent without loss of generality) it suffices to find some x ∈ B̄1+ε (0) with `k (x) = j(`k ) for 1 ≤ k ≤ n since then (1 + ε)^{-1} J(x) will be in the above neighborhood. Without the requirement kxk ≤ 1 + ε this follows from surjectivity of the map F : X → Cn , x 7→ (`1 (x), . . . , `n (x)). Moreover, given one such x the same is true for every element from x + Y , where Y := ⋂_{k=1}^n Ker(`k ). So if (x + Y ) ∩ B̄1+ε (0) were empty, we would have dist(x, Y ) ≥ 1 + ε and by Corollary 4.17 we could find some normalized ` ∈ X ∗ which vanishes on Y and satisfies `(x) ≥ 1 + ε. But by Problem 4.33 we have ` ∈ span(`1 , . . . , `n ) implying
1 + ε ≤ `(x) = j(`) ≤ kjk k`k ≤ 1,
a contradiction.
Note that if B̄1 (0) ⊂ X is weakly compact, then J(B̄1 (0)) is compact
(and thus closed) in the weak-∗ topology on X ∗∗ . Hence Goldstine's theorem
implies J(B̄1 (0)) = B̄1∗∗ (0) and we get an alternative proof of Kakutani’s
theorem.
Example 5.12. Consider X = c0 (N), X ∗ ≅ `1 (N), and X ∗∗ ≅ `∞ (N) with
J corresponding to the inclusion c0 (N) ,→ `∞ (N). Then we can consider the
linear functionals `j (x) = xj which are total in X ∗ and a sequence in X ∗∗
will be weak-∗ convergent if and only if it is bounded and converges when
composed with any of the `j (in other words, when the sequence converges
componentwise — cf. Problem 4.41). So for example, cutting off a sequence
in B̄1∗∗ (0) after n terms (setting the remaining terms equal to 0) we get a
sequence from B̄1 (0) ,→ B̄1∗∗ (0) which is weak-∗ convergent (but of course
not norm convergent).
Problem 5.10. Show that in an infinite dimensional space, a weakly open
neighborhood of 0 contains a nontrivial subspace. Show the analogue state-
ment for weak-∗ open neighborhoods of 0.
Problem 5.11. Show that a weakly sequentially compact set is bounded.
Problem 5.12. Show that a convex set K ⊆ X is weakly closed if and only
if it is weakly sequentially closed.
Problem 5.13. Show that (5.11) generates the weak topology on B1 (0) ⊂ X.
Show that (5.13) generates the weak topology on B1∗ (0) ⊂ X ∗ .
Problem 5.14. Show that neither the weak nor the weak-∗ topology is first countable if X is infinite dimensional. (Hint: If there is a countable neighborhood base, you can find, using Problem 5.10, a sequence of unbounded vectors which converges weakly to zero.)
the open neighborhood z + ⋂_{j=1}^n q_{αj}^{-1} ([0, εj )) contains the open neighborhood (Bε (γ), x + ⋂_{j=1}^n q_{αj}^{-1} ([0, εj /(2(|γ| + ε))))) with ε < min_j εj /(2 q_{αj} (x)).
Corollary 5.16. Let (X, {qα }) and (Y, {pβ }) be locally convex vector spaces. Then a linear map A : X → Y is continuous if and only if for every β there are some seminorms qα1 , . . . , qαn and constants cj > 0, 1 ≤ j ≤ n, such that pβ (Ax) ≤ Σ_{j=1}^n cj qαj (x).
It will shorten notation when sums of the type Σ_{j=1}^n cj qαj (x), which appeared in the last two results, can be replaced by a single expression c qα . This can be done if the family of seminorms {qα }α∈A is directed, that is, for given α, β ∈ A there is a γ ∈ A such that qα (x) + qβ (x) ≤ C qγ (x) for some C > 0. Moreover, if F(A) is the set of all finite subsets of A, then {q̃F = Σ_{α∈F} qα }F ∈F(A) is a directed family which generates the same topology (since every q̃F is continuous with respect to the original family we do not get any new open sets).
While the family of seminorms is in most cases more convenient to work
with, it is important to observe that different families can give rise to the
same topology and it is only the topology which matters for us. In fact, it
is possible to characterize locally convex vector spaces as topological vector
spaces which have a neighborhood basis at 0 of absolutely convex sets. Here a
set U is called absolutely convex, if for |α|+|β| ≤ 1 we have αU +βU ⊆ U .
Since the sets qα^{-1} ([0, ε)) are absolutely convex we always have such a basis in our case. To see the converse note that such a neighborhood U of 0 is also absorbing (Problem 5.18) and hence the corresponding Minkowski functional (5.1) is a seminorm (Problem 5.23). By construction, these seminorms generate the topology since if U0 = ⋂_{j=1}^n q_{αj}^{-1} ([0, εj )) ⊆ U we have for the corresponding Minkowski functionals pU (x) ≤ pU0 (x) ≤ ε^{-1} Σ_{j=1}^n qαj (x), where ε = min_j εj . With a little more work (Problem 5.22), one can even
5.4. Beyond Banach spaces: Locally convex spaces 147
{x | qn (x − y) < r/(2^{-n} − r)} are clearly open and convex (note that the intersection is finite). Conversely, for every set of the form (5.14) we can choose ε = min{2^{-αj} εj /(1 + εj ) | 1 ≤ j ≤ n} such that Bε (x) will be contained in this set. Hence both topologies are equivalent (cf. Lemma B.2).
is a Fréchet space.
Note that ∂α : C ∞ (Rm ) → C ∞ (Rm ) is continuous. Indeed by Corol-
lary 5.16 it suffices to observe that k∂α f kj,k ≤ kf kj,k+|α| .
Example 5.20. The Schwartz space
S(Rm ) = {f ∈ C ∞ (Rm ) | sup_x |xα (∂β f )(x)| < ∞, ∀α, β ∈ N_0^m } (5.21)
Proof. In a Banach space every open ball is bounded and hence only the
converse direction is nontrivial. So let U be a bounded open set. By shifting
and decreasing U if necessary we can assume U to be an absolutely convex
open neighborhood of 0 and consider the associated Minkowski functional
q = pU . Then since U = {x|q(x) < 1} and supx∈U qα (x) = Cα < ∞ we infer
qα (x) ≤ Cα q(x) (Problem 5.19) and thus the single seminorm q generates
the topology.
Finally, we mention that, since the Baire category theorem holds for
arbitrary complete metric spaces, the open mapping theorem (Theorem 4.5),
the inverse mapping theorem (Theorem 4.6) and the closed graph theorem
(Theorem 4.7) hold for Fréchet spaces without modifications. In fact, they
are formulated such that it suffices to replace Banach by Fréchet in these
theorems as well as their proofs (concerning the proof of Theorem 4.5 take
into account Problems 5.18 and 5.25).
Then
`((yn + y)/2) ≤ k(yn + y)/2k ≤ 1
and letting n → ∞ shows k(yn + y)/2k → 1. Finally, uniform convexity shows yn → y.
For the proof of the next result we need the following equivalent condition.
Lemma 5.20. Let X be a Banach space. Then
δ(ε) = inf { 1 − k(x + y)/2k | kxk ≤ 1, kyk ≤ 1, kx − yk ≥ ε } (5.24)
for 0 ≤ ε ≤ 2.
Proof. It suffices to show that for given x and y which are not both on
the unit sphere there is a better pair in the real subspace spanned by these
vectors. By scaling we could get a better pair if both were strictly inside the
5.5. Uniformly convex spaces 153
unit ball and hence we can assume at least one vector to have norm one, say
kxk = 1. Moreover, consider
u(t) := (cos(t)x + sin(t)y)/k cos(t)x + sin(t)yk, v(t) := u(t) + (y − x).
Then kv(0)k = kyk < 1. Moreover, let t0 ∈ (π/2, 3π/4) be the value such that
the line from x to u(t0 ) passes through y. Then, by convexity we must have
kv(t0 )k > 1 and by the intermediate value theorem there is some 0 < t1 < t0
with kv(t1 )k = 1. Let u := u(t1 ), v := v(t1 ). The line through u and x is
not parallel to the line through 0 and x + y and hence there are α, λ ≥ 0
such that
(α/2)(x + y) = λu + (1 − λ)x.
Moreover, since the line from x to u is above the line from x to y (since t1 < t0 ) we have α ≥ 1. Rearranging this equation we get
(α/2)(u + v) = (α + λ)u + (1 − α − λ)x.
Now, by convexity of the norm, if λ ≤ 1 we have λ + α > 1 and thus
kλu + (1 − λ)xk ≤ 1 < k(α + λ)u + (1 − α − λ)xk. Similarly, if λ > 1 we
have kλu + (1 − λ)xk < k(α + λ)u + (1 − α − λ)xk again by convexity of the
norm. Hence k(x + y)/2k ≤ k(u + v)/2k and u, v is a better pair.
Proof. Pick some x′′ ∈ X ∗∗ with kx′′ k = 1. It suffices to find some x ∈ B̄1 (0) with kx′′ − J(x)k ≤ ε. So fix ε > 0 and δ := δ(ε), where δ(ε) is the modulus of convexity. Then kx′′ k = 1 implies that we can find some ` ∈ X ∗ with k`k = 1 and |x′′ (`)| > 1 − δ/2. Consider the weak-∗ neighborhood
U := {y′′ ∈ X ∗∗ | |(y′′ − x′′ )(`)| < δ/2}
of x′′ . By Goldstine's theorem (Theorem 5.14) there is some x ∈ B̄1 (0) with J(x) ∈ U and this is the x we are looking for. In fact, suppose this were not the case. Then the set V := X ∗∗ \ B̄ε∗∗ (J(x)) is another weak-∗ neighborhood of x′′ (since B̄ε∗∗ (J(x)) is weak-∗ compact by the Banach–Alaoglu theorem) and appealing again to Goldstine's theorem there is some y ∈ B̄1 (0) with J(y) ∈ U ∩ V . Since J(x), J(y) ∈ U we obtain
1 − δ/2 < |x′′ (`)| ≤ |`((x + y)/2)| + δ/2 ⇒ 1 − δ < |`((x + y)/2)| ≤ k(x + y)/2k,
a contradiction to uniform convexity since kx − yk ≥ ε.
Problem 5.31. Find an equivalent norm for `1 (N) such that it becomes
strictly convex (cf. Problems 1.13 and 1.17).
Problem* 5.32. Show that a Hilbert space is uniformly convex. (Hint: Use
the parallelogram law.)
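For orientation, here is one way the computation behind Problem 5.32 can go (a sketch, not the text's solution): the parallelogram law produces an explicit modulus of convexity.

```latex
% Parallelogram law: \|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2.
% For \|x\| \le 1, \|y\| \le 1 and \|x-y\| \ge \varepsilon this gives
\left\| \frac{x+y}{2} \right\|^2
  = \frac{\|x\|^2 + \|y\|^2}{2} - \left\| \frac{x-y}{2} \right\|^2
  \le 1 - \frac{\varepsilon^2}{4},
\qquad\text{hence}\qquad
\delta(\varepsilon) \ge 1 - \sqrt{1 - \frac{\varepsilon^2}{4}} > 0.
```

Since δ(ε) > 0 for every ε ∈ (0, 2], every Hilbert space is uniformly convex.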
Problem 5.33. A Banach space X is uniformly convex if and only if kxn k = kyn k = 1 and k(xn + yn )/2k → 1 implies kxn − yn k → 0.
We have started out our study by looking at eigenvalue problems which, from
a historic view point, were one of the key problems driving the development
of functional analysis. In Chapter 3 we have investigated compact operators
in Hilbert space and we have seen that they allow a treatment similar to
what is known from matrices. However, more sophisticated problems will
lead to operators whose spectra consist of more than just eigenvalues. Hence
we want to go one step further and look at spectral theory for bounded
operators. Here one of the driving forces was the development of quantum
mechanics (there even the boundedness assumption is too much — but first
things first). A crucial role is played by the algebraic structure, namely recall
from Section 1.6 that the bounded linear operators on X form a Banach
space which has a (non-commutative) multiplication given by composition.
In order to emphasize that it is only this algebraic structure which matters,
we will develop the theory from this abstract point of view. While the reader
should always remember that bounded operators on a Hilbert space is what
we have in mind as the prime application, examples will apply these ideas
also to other cases thereby justifying the abstract approach.
To begin with, the operators could be on a Banach space (note that even
if X is a Hilbert space, L (X) will only be a Banach space) but eventually
again self-adjointness will be needed. Hence we will need the additional
operation of taking adjoints.
156 6. Bounded linear operators
and
(xy)z = x(yz), α (xy) = (αx)y = x (αy), α ∈ C, (6.2)
and
kxyk ≤ kxkkyk. (6.3)
is called a Banach algebra. In particular, note that (6.3) ensures that
multiplication is continuous (Problem 6.1). In fact, one can show that (sep-
arate) continuity of multiplication implies existence of an equivalent norm
satisfying (6.3) (Problem 6.2).
An element e ∈ X satisfying
ex = xe = x, ∀x ∈ X (6.4)
is called identity (show that e is unique) and we will assume kek = 1 in this
case (by Problem 6.2 this can be done without loss of generality).
Example 6.1. The continuous functions C(I) over some compact interval
form a commutative Banach algebra with identity 1.
Example 6.2. The differentiable functions C n (I) over some compact inter-
val do not form a commutative Banach algebra since (6.3) fails for n ≥ 1.
However, the equivalent norm
kf k∞,n := Σ_{k=0}^n kf^{(k)} k∞ /k!
respectively
(Σ_{n=0}^∞ x^n )(e − x) = Σ_{n=0}^∞ x^n − Σ_{n=1}^∞ x^n = e.
In particular, both conditions are satisfied if kyk < kx^{-1} k^{-1} and the set of invertible elements G(X) is open and taking the inverse is continuous:
k(x − y)^{-1} − x^{-1} k ≤ kyk kx^{-1} k^2 /(1 − kx^{-1} yk). (6.10)
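As a numerical illustration (not part of the text), the Neumann series behind these estimates can be summed for a small matrix in the Banach algebra L (C2 ) with the operator norm; the matrix entries are made up for the sketch:

```python
import numpy as np

# Neumann series: (e - x)^{-1} = sum_{n>=0} x^n whenever ||x|| < 1.
x = np.array([[0.2, 0.3],
              [0.1, 0.4]])
assert np.linalg.norm(x, 2) < 1          # operator (spectral) norm below 1

inv = np.zeros((2, 2))
term = np.eye(2)                          # current power x^n, starting at e
for _ in range(200):
    inv += term
    term = term @ x

err = np.max(np.abs(inv @ (np.eye(2) - x) - np.eye(2)))
print(err)                                # essentially zero
```

The partial sums converge geometrically, with rate governed by kxk, exactly as in the proof of Lemma 6.1.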
continuous function. But will it again have a convergent Fourier series, that
is, will it be in the Wiener Algebra? The affirmative answer of this question
is a famous theorem of Wiener, which will be given later in Theorem 6.24.
The map α 7→ (x − α)^{-1} is called the resolvent of x ∈ X. If α0 ∈ ρ(x) we can choose x → x − α0 and y → α − α0 in (6.9), which implies
(x − α)^{-1} = Σ_{n=0}^∞ (α − α0 )^n (x − α0 )^{-n-1} , |α − α0 | < k(x − α0 )^{-1} k^{-1} . (6.13)
Proof. Equation (6.13) already shows that ρ(x) is open. Hence σ(x) is closed. Moreover, x − α = −α(e − (1/α)x) together with Lemma 6.1 shows
(x − α)^{-1} = −(1/α) Σ_{n=0}^∞ (x/α)^n , |α| > kxk,
which implies σ(x) ⊆ {α | |α| ≤ kxk} is bounded and thus compact. Moreover, taking norms shows
k(x − α)^{-1} k ≤ (1/|α|) Σ_{n=0}^∞ kxk^n /|α|^n = 1/(|α| − kxk), |α| > kxk,
The second key ingredient for the proof of the spectral theorem is the spectral radius
r(x) := sup_{α∈σ(x)} |α| (6.18)
of x. Note that by (6.15) we have
r(x) ≤ kxk. (6.19)
As our next theorem shows, it is related to the radius of convergence of the Neumann series for the resolvent
(x − α)^{-1} = −(1/α) Σ_{n=0}^∞ (x/α)^n (6.20)
encountered in the proof of Theorem 6.3 (which is just the Laurent expansion around infinity).
Theorem 6.6 (Beurling–Gelfand). The spectral radius satisfies
r(x) = inf_{n∈N} kx^n k^{1/n} = lim_{n→∞} kx^n k^{1/n} . (6.21)
Then `((x − α)^{-1} ) is analytic in |α| > r(x) and hence (6.22) converges absolutely for |α| > r(x) by Cauchy's integral formula for derivatives. Hence for fixed α with |α| > r(x), `(x^n /α^n ) converges to zero for every ` ∈ X ∗ . Since every weakly convergent sequence is bounded we have
kx^n k/|α|^n ≤ C(α)
and thus
lim sup_{n→∞} kx^n k^{1/n} ≤ lim sup_{n→∞} C(α)^{1/n} |α| = |α|.
Since this holds for every |α| > r(x) we have
r(x) ≤ inf_{n∈N} kx^n k^{1/n} ≤ lim inf_{n→∞} kx^n k^{1/n} ≤ lim sup_{n→∞} kx^n k^{1/n} ≤ r(x),
Note that it might be tempting to conjecture that the sequence kx^n k^{1/n} is monotone; however, this is false in general (see Problem 6.7). To end this section let us look at some examples illustrating these ideas.
Example 6.13. In X := C(I) we have σ(x) = x(I) and hence r(x) = kxk∞
for all x.
Example 6.14. Let X := L (C2 ) and x := ( 0 1 ; 0 0 ), such that x^2 = 0 and consequently r(x) = 0. This is not surprising, since x has the only eigenvalue 0. In particular, the spectral radius can be strictly smaller than the norm (note that kxk = 1 in our example). The same is true for any nilpotent matrix. In general, x will be called nilpotent if x^n = 0 for some n ∈ N, and any nilpotent element will satisfy r(x) = 0.
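The gap between r(x) and kxk, as well as the convergence kx^n k^{1/n} → r(x) from Theorem 6.6, can be observed numerically for a non-normal matrix (a sketch with made-up entries):

```python
import numpy as np

# For a non-normal matrix the norms ||x^n||^{1/n} overshoot the spectral
# radius, but the limit is max |eigenvalue| = r(x).
x = np.array([[0.5, 10.0],
              [0.0, 0.4]])
r = max(abs(np.linalg.eigvals(x)))        # spectral radius = 0.5

for n in (1, 5, 20, 100):
    est = np.linalg.norm(np.linalg.matrix_power(x, n), 2) ** (1.0 / n)
    print(n, est)                         # decreases towards r = 0.5
```

Note kxk is roughly 10 here while r(x) = 0.5, so the first terms of the sequence are far above the limit.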
Example 6.15. Consider the linear Volterra integral operator
K(x)(t) := ∫_0^t k(t, s)x(s) ds, x ∈ C([0, 1]), (6.23)
then, using induction, it is not hard to verify (Problem 6.9)
|K^n (x)(t)| ≤ (kkk_∞^n t^n /n!) kxk∞ . (6.24)
Consequently
kK^n xk∞ ≤ (kkk_∞^n /n!) kxk∞ ,
that is, kK^n k ≤ kkk_∞^n /n!, which shows
r(K) ≤ lim_{n→∞} kkk∞ /(n!)^{1/n} = 0.
Hence r(K) = 0 and for every λ ∈ C and every y ∈ C(I) the equation
x − λK x = y (6.25)
has a unique solution given by
x = (I − λK)^{-1} y = Σ_{n=0}^∞ λ^n K^n y. (6.26)
Note that σ(K) = {0} but 0 is in general not an eigenvalue (consider
e.g. k(t, s) = 1). Elements of a Banach algebra with r(x) = 0 are called
quasinilpotent.
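A sketch illustrating (6.26) numerically for the simplest kernel k(t, s) = 1, where K is just integration and the exact solution of x − λKx = 1 is x(t) = e^{λt} (the trapezoidal discretization is my choice, not the text's):

```python
import numpy as np

# Solve x - lam*K x = 1 with (K v)(t) = integral_0^t v(s) ds by summing the
# Neumann series (6.26); it converges for EVERY lam since r(K) = 0.
lam = 2.0
t = np.linspace(0, 1, 1001)
dt = t[1] - t[0]

def K(v):  # cumulative trapezoidal rule approximating the integral operator
    return np.concatenate([[0.0], np.cumsum((v[1:] + v[:-1]) / 2)]) * dt

x = np.zeros_like(t)
term = np.ones_like(t)                   # current term lam^n K^n y with y = 1
for _ in range(40):
    x += term
    term = lam * K(term)

err = np.max(np.abs(x - np.exp(lam * t)))
print(err)                               # small discretization error
```

The series here is just Σ λ^n t^n /n! = e^{λt}, so only the quadrature limits the accuracy.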
In the last two examples we have seen a strict inequality in (6.19). If we
regard r(x) as a spectral norm for x, then the spectral norm does not control
the algebraic norm in such a situation. On the other hand, if we had equal-
ity for some x, and moreover, this were also true for any polynomial p(x),
then spectral mapping would imply that the spectral norm supα∈σ(x) |p(α)|
6.1. Banach algebras 163
equals the algebraic norm kp(x)k and convergence on one side would imply
convergence on the other side. So by taking limits we could get an isometric
identification of elements of the form f (x) with functions f ∈ C(σ(x)). But
this is nothing but the content of the spectral theorem and self-adjointness
will be the property which will make all this work.
Problem* 6.1. Show that the multiplication in a Banach algebra X is con-
tinuous: xn → x and yn → y imply xn yn → xy.
Problem* 6.2. Suppose that X satisfies all requirements for a Banach al-
gebra except that (6.3) is replaced by
kxyk ≤ Ckxkkyk, C > 0.
Of course one can rescale the norm to reduce it to the case C = 1. However,
this might have undesirable side effects in case there is a unit. Show that if
X has a unit e, then kek ≥ C −1 and there is an equivalent norm k.k0 which
satisfies (6.3) and kek0 = 1.
Finally, note that for this construction to work it suffices to assume that
multiplication is separately continuous by Problem 4.6.
(Hint: Identify x ∈ X with the operator Lx : X → X, y 7→ xy in L (X).
For the last part use the uniform boundedness principle.)
Problem* 6.3 (Unitization). Show that if X is a Banach algebra then
C ⊕ X is a unital Banach algebra, where we set k(α, x)k = |α| + kxk and
(α, x)(β, y) = (αβ, αy + βx + xy).
Problem 6.4. Show σ(x−1 ) = σ(x)−1 if x is invertible.
Problem* 6.5. Suppose x has both a right inverse y (i.e., xy = e) and a
left inverse z (i.e., zx = e). Show that y = z = x−1 .
Problem* 6.6. Suppose xy and yx are both invertible, then so are x and y:
y −1 = (xy)−1 x = x(yx)−1 , x−1 = (yx)−1 y = y(xy)−1 .
(Hint: Previous problem.)
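For X = L(Cⁿ) the formulas of Problem 6.6 can be checked numerically; the following sketch (my own, with a generic random pair of matrices, which is almost surely a valid instance since xy and yx are then invertible) verifies both claimed expressions.

```python
import numpy as np

# Numerical check of Problem 6.6 in X = L(C^4): for a generic pair of
# matrices, xy and yx are invertible and the stated inverses work out.
rng = np.random.default_rng(2)
x = rng.standard_normal((4, 4))
y = rng.standard_normal((4, 4))

x_inv = np.linalg.inv(y @ x) @ y   # the claimed formula (yx)^{-1} y
y_inv = np.linalg.inv(x @ y) @ x   # the claimed formula (xy)^{-1} x
print(np.allclose(x @ x_inv, np.eye(4)), np.allclose(y @ y_inv, np.eye(4)))
```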
Problem* 6.7. Let X := L(C²) and compute ‖xⁿ‖^{1/n} for the matrix
x := ( 0 α ; β 0 ).
Conclude that this sequence is not monotone in general.
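A quick numerical sketch of Problem 6.7 (with the sample values α = 4, β = 1, my choice): since x² = αβ·I, even powers are multiples of the identity while odd powers are multiples of x, so the root sequence oscillates.

```python
import numpy as np

# x = [[0, a], [b, 0]] with a = 4, b = 1: x^2 = a*b*I, so the norms of
# the powers alternate between |ab|^k and |ab|^k * max(|a|, |b|).
a, b = 4.0, 1.0
x = np.array([[0.0, a], [b, 0.0]])

seq = [np.linalg.norm(np.linalg.matrix_power(x, n), ord=2) ** (1.0 / n)
       for n in range(1, 7)]
print(seq)  # oscillates above the spectral radius r(x) = sqrt(a*b) = 2
```

The even entries equal 2 exactly while the odd ones exceed 2, so the sequence is not monotone although its limit, the spectral radius, is 2.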
Problem 6.8. Let X := ℓ∞(N). Show that σ(x) is the closure of {xₙ}_{n∈N}.
Also show that r(x) = ‖x‖ for all x ∈ X.
Problem* 6.9. Show (6.24).
Problem 6.10. Show that L1 (Rn ) with convolution as multiplication is a
commutative Banach algebra without identity (Hint: Lemma 3.20 from [47]).
The next result generalizes the fact that self-adjoint operators have only
real eigenvalues.
Lemma 6.8. If x is self-adjoint, then σ(x) ⊆ R. If x is positive, then
σ(x) ⊆ [0, ∞).
Proof. First of all, Φ is well defined for polynomials p and given by Φ(p) =
p(x). Moreover, since p(x) is normal spectral mapping implies
‖p(x)‖ = r(p(x)) = sup_{α∈σ(p(x))} |α| = sup_{α∈σ(x)} |p(α)| = ‖p‖∞
for every polynomial p. Hence Φ is isometric. Next we use that the poly-
nomials are dense in C(σ(x)). In fact, to see this one can either consider
a compact interval I containing σ(x) and use the Tietze extension theo-
rem (Theorem B.29) to extend f to I and then approximate the extension
using polynomials (Theorem 1.3) or use the Stone–Weierstraß theorem (The-
orem B.41). Thus Φ uniquely extends to a map on all of C(σ(x)) by Theo-
rem 1.16. By continuity of the norm this extension is again isometric. Sim-
ilarly, we have Φ(f g) = Φ(f )Φ(g) and Φ(f )∗ = Φ(f ∗ ) since both relations
hold for polynomials.
To show σ(f(x)) = f(σ(x)) fix some α ∈ C. If α ∉ f(σ(x)), then
g(t) = 1/(f(t) − α) ∈ C(σ(x)) and Φ(g) = (f(x) − α)⁻¹ ∈ X shows α ∉ σ(f(x)).
Conversely, if α ∉ σ(f(x)), then g = Φ⁻¹((f(x) − α)⁻¹) = 1/(f − α)
is continuous, which shows α ∉ f(σ(x)).
If f is given by a power series, then f(A) defined via Φ coincides with f(A)
defined via its power series (cf. Problem 1.36).
Problem* 6.13 (Unitization). Show that if X is a non-unital C ∗ algebra
then C⊕X is a unital C ∗ algebra, where we set k(α, x)k := sup{kαy+xyk|y ∈
X, kyk ≤ 1}, (α, x)(β, y) = (αβ, αy + βx + xy) and (α, x)∗ = (α∗ , x∗ ). (Hint:
It might be helpful to identify x ∈ X with the operator Lx : X → X, y 7→ xy
in L (X). Moreover, note kLx k = kxk.)
Problem* 6.14. Let X be a C∗ algebra and Y a ∗-subalgebra. Show that if
Y is commutative, then so is its closure Ȳ.
Problem 6.15. Show that the map Φ from the spectral theorem is positivity
preserving, that is, f ≥ 0 if and only if Φ(f ) is positive.
Problem 6.16. Let x be self-adjoint. Show that the following are equivalent:
(i) σ(x) ⊆ [0, ∞).
(ii) x is positive.
(iii) ‖λ − x‖ ≤ λ for all λ ≥ ‖x‖.
(iv) ‖λ − x‖ ≤ λ for one λ ≥ ‖x‖.
Problem 6.17. Let A ∈ L(H). Show that A is normal if and only if
‖Au‖ = ‖A∗u‖ for all u ∈ H.
In particular, Ker(A) = Ker(A∗). (Hint: Problem 1.20.)
Problem 6.18. Show that the Cayley transform of a self-adjoint element
x,
y = (x − i)(x + i)−1
is unitary. Show that 1 6∈ σ(y) and
x = i(1 + y)(1 − y)−1 .
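For matrices both claims of Problem 6.18 can be verified directly; the following sketch (the particular symmetric matrix is my choice) checks that the Cayley transform is unitary and that the inversion formula recovers x.

```python
import numpy as np

# Cayley transform of a self-adjoint (here: real symmetric) matrix:
# y = (x - i)(x + i)^{-1}, which should be unitary with x = i(1+y)(1-y)^{-1}.
x = np.array([[1.0, 2.0, 0.0], [2.0, -1.0, 1.0], [0.0, 1.0, 3.0]])
I = np.eye(3)
y = (x - 1j * I) @ np.linalg.inv(x + 1j * I)

print(np.allclose(y.conj().T @ y, I))                        # y is unitary
print(np.allclose(x, 1j * (I + y) @ np.linalg.inv(I - y)))   # inversion formula
```

Note that x + i·I is always invertible because x has real spectrum, which is exactly why the construction needs self-adjointness.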
Problem 6.19. Show if x is unitary then σ(x) ⊆ {α ∈ C||α| = 1}.
Problem 6.20. Suppose x is self-adjoint. Show that
‖(x − α)⁻¹‖ = 1/dist(α, σ(x)),  α ∉ σ(x).
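For self-adjoint matrices the identity of Problem 6.20 can be checked numerically; the sketch below (random symmetric matrix and the sample point α = 1 + 2i, both my choices) compares the resolvent norm with the reciprocal distance to the spectrum.

```python
import numpy as np

# Check of Problem 6.20 for a random real symmetric (self-adjoint) matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
spec = np.linalg.eigvalsh(A)

alpha = 1.0 + 2.0j  # any point off the real axis lies outside the spectrum
res_norm = np.linalg.norm(np.linalg.inv(A - alpha * np.eye(5)), ord=2)
print(res_norm, 1.0 / np.abs(spec - alpha).min())  # the two values agree
```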
Since in the bijective case boundedness of the inverse comes for free from
the inverse mapping theorem (Theorem 4.6), there are basically two things
which can go wrong: Either our map is not injective or it is not surjective.
Moreover, in the latter case one can also ask how far it is away from being
surjective, that is, if the range is dense or not. Accordingly one defines the
point spectrum
σp(A) := {α ∈ σ(A) | Ker(A − α) ≠ {0}}    (6.35)
as the set of all eigenvalues, the continuous spectrum
σc(A) := {α ∈ σ(A) \ σp(A) | Ran(A − α) is dense in X}    (6.36)
and finally the residual spectrum
σr(A) := {α ∈ σ(A) \ σp(A) | Ran(A − α) is not dense in X}.    (6.37)
Clearly we have
σ(A) = σp (A) ∪· σc (A) ∪· σr (A). (6.38)
Here the dot indicates that the union is disjoint. Note that in a Hilbert space
σx (A∗ ) = σx (A0 )∗ for x ∈ {p, c, r}.
Example 6.20. Suppose H is a Hilbert space and A = A∗ is self-adjoint.
Then by (2.28), σr (A) = ∅.
Example 6.21. Suppose X := ℓᵖ(N) and L is the left shift. Then σ(L) =
B̄₁(0). Indeed, a simple calculation shows that Ker(L − α) = span{(α^j)_{j∈N}}
for |α| < 1 if 1 ≤ p < ∞ and for |α| ≤ 1 if p = ∞. Hence σp(L) = B₁(0) for
1 ≤ p < ∞ and σp(L) = B̄₁(0) if p = ∞. In particular, since the spectrum
is closed and ‖L‖ = 1 we have σ(L) = B̄₁(0). Moreover, for y ∈ ℓc(N)
we set x_j := −Σ_{k=j}^∞ α^{j−k−1} y_k such that (L − α)x = y. In particular,
ℓc(N) ⊂ Ran(L − α) and hence Ran(L − α) is dense for 1 ≤ p < ∞. Thus
σc(L) = ∂B₁(0) for 1 ≤ p < ∞. Consequently, σr(L) = ∅.
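The eigenvector claim is easy to see numerically: on a finite truncation of ℓᵖ (a sketch, with n = 50 and α = 1/2 as sample choices) the sequence x_j = α^j is an eigenvector of the left shift up to a geometrically small truncation error.

```python
import numpy as np

# Truncated left shift on the first n coordinates: (Lx)_j = x_{j+1}.
# The candidate eigenvector from Example 6.21 is x_j = alpha^j.
n, alpha = 50, 0.5
x = alpha ** np.arange(1, n + 1)
Lx = np.append(x[1:], 0.0)      # shift left, losing the (n+1)-st entry

residual = np.linalg.norm(Lx - alpha * x)
print(residual)  # only the missing tail alpha^(n+1) contributes
```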
Since A is invertible if and only if A0 is by Theorem 4.26 we obtain:
Lemma 6.10. Suppose A ∈ L (X). Then
σ(A) = σ(A0 ). (6.39)
Moreover,
σp(A′) ⊆ σp(A) ∪· σr(A),   σp(A) ⊆ σp(A′) ∪· σr(A′),
σr(A′) ⊆ σp(A) ∪· σc(A),   σr(A) ⊆ σp(A′),    (6.40)
σc(A′) ⊆ σc(A),   σc(A) ⊆ σr(A′) ∪· σc(A′).
If in addition, X is reflexive we have σr(A′) ⊆ σp(A) as well as σc(A′) =
σc(A).
Proof. This follows from Lemma 4.25 and (4.20). In the reflexive case use
A ≅ A′′.
Example 6.22. Consider L′ from the previous example, which is just the
right shift in ℓ^q(N) if 1 ≤ p < ∞. Then σ(L′) = σ(L) = B̄₁(0). Moreover,
it is easy to see that σp(L′) = ∅. Thus in the reflexive case 1 < p < ∞ we
have σc(L′) = σc(L) = ∂B₁(0) as well as σr(L′) = σ(L′) \ σc(L′) = B₁(0).
Otherwise, if p = 1, we only get B₁(0) ⊆ σr(L′) and σc(L′) ⊆ σc(L) =
∂B₁(0). Hence it remains to investigate Ran(L′ − α) for |α| = 1: If we have
(L′ − α)x = y with some y ∈ ℓ^∞(N) we must have x_j := −α^{−j−1} Σ_{k=1}^j α^k y_k.
Thus y = (ᾱⁿ)_{n∈N} is clearly not in Ran(L′ − α). Moreover, if ‖y − ỹ‖∞ ≤ ε
we have |x̃_j| = |Σ_{k=1}^j α^k ỹ_k| ≥ (1 − ε)j and hence ỹ ∉ Ran(L′ − α), which
shows that the range is not dense and hence σr(L′) = B̄₁(0), σc(L′) = ∅.
Moreover, for compact operators the spectrum is particularly simple (cf.
also Theorem 3.7). We start with the following observation:
and
X ⊇ Ran(A) ⊇ Ran(A2 ) ⊇ Ran(A3 ) ⊇ · · · (6.42)
(K − α)x = y (6.44)
Proof. Consider the continuous functions on I = [−‖A‖, ‖A‖] and note that
every f ∈ C(I) gives rise to some f ∈ C(σ(A)) by restricting its domain.
Clearly ℓ_{u,v}(f) = ⟨u, f(A)v⟩ is a bounded linear functional and the existence
of a corresponding measure µ_{u,v} with |µ_{u,v}|(I) = ‖ℓ_{u,v}‖ ≤ ‖u‖‖v‖ follows
from the Riesz representation theorem (Theorem 6.5 from [47]). Since ℓ_{u,v}(f)
depends only on the value of f on σ(A) ⊆ I, µ_{u,v} is supported on σ(A).
Moreover, if f ≥ 0 we have ℓ_u(f) = ⟨u, f(A)u⟩ = ⟨f(A)^{1/2}u, f(A)^{1/2}u⟩ =
‖f(A)^{1/2}u‖² ≥ 0 and hence ℓ_u is positive and the corresponding measure µ_u
is positive. The rest follows from the properties of the scalar product.
for every u ∈ H. Here the dot inside the union just emphasizes that the sets
are mutually disjoint. Such a family of projections is called a projection-
valued measure. Indeed the first claim follows since χ_R = 1 and by
χ_{Ω₁∪·Ω₂} = χ_{Ω₁} + χ_{Ω₂} if Ω₁ ∩ Ω₂ = ∅ the second claim follows at least for
finite unions. The case of countable unions follows from the last part of the
previous theorem since Σ_{n=1}^N χ_{Ωₙ} = χ_{∪·_{n=1}^N Ωₙ} → χ_{∪·_{n=1}^∞ Ωₙ} pointwise (note
that the limit will not be uniform unless the Ωₙ are eventually empty and
hence there is no chance that this series will converge in the operator norm).
Moreover, since all spectral measures are supported on σ(A) the same is true
for PA in the sense that
PA (σ(A)) = I. (6.54)
I also remark that in this connection the corresponding distribution function
PA (t) := PA ((−∞, t]) (6.55)
is called a resolution of the identity.
Using our projection-valued measure we can define an operator-valued
integral as follows: For every simple function f = Σ_{j=1}^n α_j χ_{Ω_j} (where
Ω_j := f⁻¹(α_j)) one sets ∫_R f(t)dPA(t) := Σ_{j=1}^n α_j PA(Ω_j).
By (6.51) we conclude that this definition agrees with f(A) from Theo-
rem 6.17:
∫_R f(t)dPA(t) = f(A).    (6.57)
Extending this integral to functions from B(σ(A)) by approximating such
functions with simple functions we get an alternative way of defining f (A)
for such functions. This can in fact be done by just using the definition of
a projection-valued measure and hence there is a one-to-one correspondence
between projection-valued measures (with bounded support) and (bounded)
self-adjoint operators such that
A = ∫_R t dPA(t).    (6.58)
Hence, if σ(A) = {α₁, . . . , α_m} is finite, using that any f ∈ B(σ(A)) is then
the simple function f = Σ_{j=1}^m f(α_j)χ_{{α_j}} we obtain
f(A) = ∫_R f(t)dPA(t) = Σ_{j=1}^m f(α_j)PA({α_j}).
points (the value of this function being m(x)). This gives a map, the Gelfand
representation, from X into an algebra of functions.
A nonzero algebra homomorphism m : X → C will be called a multi-
plicative linear functional or character:
m(xy) = m(x)m(y), m(e) = 1. (6.60)
Note that the last equation comes for free from multiplicativity since m is
nontrivial. Moreover, there is no need to require that m is continuous as this
will also follow automatically (cf. Lemma 6.20 below).
As we will see, they are closely related to ideals, that is linear subspaces
I of X for which a ∈ I, x ∈ X implies ax ∈ I and xa ∈ I. An ideal is called
proper if it is not equal to X and it is called maximal if it is not contained
in any other proper ideal.
Example 6.30. Let X := C([a, b]) be the continuous functions over some
compact interval. Then for fixed x0 ∈ [a, b], the linear functional mx0 (f ) :=
f (x0 ) is multiplicative. Moreover, its kernel Ker(mx0 ) = {f ∈ C([a, b])|f (x0 ) =
0} is a maximal ideal (we will prove this in more generality below).
Example 6.31. Let X be a Banach space. Then the compact operators are
a closed ideal in L (X) (cf. Theorem 3.1).
We first collect a few elementary properties of ideals.
Lemma 6.19. Let X be a unital Banach algebra.
(i) A proper ideal can never contain an invertible element.
(ii) If X is commutative every non-invertible element is contained in
a proper ideal.
(iii) The closure of a (proper) ideal is again a (proper) ideal.
(iv) Maximal ideals are closed.
(v) Every proper ideal is contained in a maximal ideal.
Note that if I is a closed ideal, then the quotient space X/I (cf. Lemma 1.18)
is again a Banach algebra if we define
[x][y] = [xy]. (6.61)
Indeed (x + I)(y + I) = xy + I and hence the multiplication is well-defined
and inherits the distributive and associative laws from X. Also [e] is an
identity. Finally,
‖[xy]‖ = inf_{a∈I} ‖xy + a‖ = inf_{b,c∈I} ‖(x + b)(y + c)‖ ≤ inf_{b∈I} ‖x + b‖ inf_{c∈I} ‖y + c‖
= ‖[x]‖‖[y]‖.    (6.62)
In particular, the projection map π : X → X/I is a Banach algebra homo-
morphism.
Example 6.32. Consider the Banach algebra L (X) together with the ideal
of compact operators C (X). Then the Banach algebra L (X)/C (X) is
known as the Calkin algebra. Atkinson’s theorem (Theorem 6.34) says
that the invertible elements in the Calkin algebra are precisely the images of
the Fredholm operators.
Lemma 6.20. Let X be a unital Banach algebra and m a character. Then
Ker(m) is a maximal ideal and m is continuous with kmk = m(e) = 1.
of the unit ball in X∗ and the first claim follows from the Banach–Alaoglu
theorem (Theorem 5.10).
Next (x+y)∧ (m) = m(x+y) = m(x)+m(y) = x̂(m)+ ŷ(m), (xy)∧ (m) =
m(xy) = m(x)m(y) = x̂(m)ŷ(m), and ê(m) = m(e) = 1 shows that the
Gelfand transform is an algebra homomorphism.
Moreover, if m(x) = α then x − α ∈ Ker(m) implying that x − α is
not invertible (as maximal ideals cannot contain invertible elements), that
is α ∈ σ(x). Conversely, if X is commutative and α ∈ σ(x), then x − α is
not invertible and hence contained in some maximal ideal, which in turn is
the kernel of some character m. Whence m(x − α) = 0, that is m(x) = α
for some m.
the Gelfand–Naimark theorem below will show that the Gelfand transform
is bijective for commutative C ∗ algebras.
Since 0 6∈ σ(x) implies that x is invertible the Gelfand representation
theorem also contains a useful criterion for invertibility.
Corollary 6.23. In a commutative unital Banach algebra an element x is
invertible if and only if m(x) 6= 0 for all characters m.
And applying this to the last example we get the following famous the-
orem of Wiener:
Theorem 6.24 (Wiener). Suppose f ∈ Cper[−π, π] has an absolutely con-
vergent Fourier series and does not vanish on [−π, π]. Then the function 1/f
also has an absolutely convergent Fourier series.
The first moral from this theorem is that from an abstract point of view
there is only one commutative C∗ algebra, namely C(K) with K some com-
pact Hausdorff space. Moreover, the formulation also very much resembles
the spectral theorem and in fact, we can derive the spectral theorem by ap-
plying it to C ∗ (x), the C ∗ algebra generated by x (cf. (6.32)). This will even
give us the more general version for normal elements. As a preparation we
show that it makes no difference whether we compute the spectrum in X or
in C ∗ (x).
Lemma 6.27 (Spectral permanence). Let X be a C ∗ algebra and Y ⊆ X
a closed ∗-subalgebra containing the identity. Then σ(y) = σY (y) for every
y ∈ Y , where σY (y) denotes the spectrum computed in Y .
Proof. Clearly we have σ(y) ⊆ σY (y) and it remains to establish the reverse
inclusion. If (y−α) has an inverse in X, then the same is true for (y−α)∗ (y−
α) and (y − α)(y − α)∗ . But the last two operators are self-adjoint and hence
have real spectrum in Y. Thus ((y − α)∗(y − α) + i/n)⁻¹ ∈ Y and letting
n → ∞ shows ((y − α)∗(y − α))⁻¹ ∈ Y since taking the inverse is continuous
and Y is closed. Similarly ((y − α)(y − α)∗)⁻¹ ∈ Y and whence (y − α)⁻¹ ∈ Y
by Problem 6.6.
Proof. First of all note that the induced map Ã : X/Ker(A) → Y is in-
jective (Problem 1.43). Moreover, the assumption that the cokernel is finite
dimensional says that there is a finite dimensional subspace Y₀ ⊂ Y such
that Y = Y₀ ∔ Ran(A). Then
Â : X/Ker(A) ⊕ Y₀ → Y,  Â(x, y) = Ãx + y
is bijective and hence a homeomorphism by Theorem 4.6. Since X/Ker(A)
is a closed subspace of X/Ker(A) ⊕ Y₀ we see that Ran(A) = Â(X/Ker(A))
is closed in Y.
of finite multiplicity (in fact, inspecting this example shows that the converse
is also true).
It is however important to notice that Ran(A)⊥ finite dimensional does
not imply Ran(A) closed! For example consider (Ax)ₙ = (1/n)xₙ in ℓ²(N)
whose range is dense but not closed.
Another useful formula concerns the product of two Fredholm operators.
For its proof it will be convenient to use the notion of an exact sequence:
Let X_j be Banach spaces. A sequence of operators A_j ∈ L(X_j, X_{j+1})
X₁ −A₁→ X₂ −A₂→ X₃ · · · Xₙ −Aₙ→ X_{n+1}
is called exact if Ran(A_j) = Ker(A_{j+1}) for 1 ≤ j ≤ n − 1. Here the maps
which are not explicitly stated are canonical inclusions/projections. Hence
by Problem 6.30, if two operators are Fredholm,
so is the third. Moreover, the formula for the index follows from Prob-
lem 6.31.
Next we want to look a bit further into the structure of Fredholm op-
erators. First of all, since Ker(A) is finite dimensional it is complemented
(Problem 4.26), that is, there exists a closed subspace X0 ⊆ X such that
X = Ker(A)uX0 and a corresponding projection P ∈ L (X) with Ran(P ) =
Ker(A). Similarly, Ran(A) is complemented (Problem 1.44) and there exists
a closed subspace Y0 ⊆ Y such that Y = Y0 u Ran(A) and a corresponding
projection Q ∈ L (Y ) with Ran(Q) = Y0 . With respect to the decomposition
Ker(A) ⊕ X₀ → Y₀ ⊕ Ran(A) our Fredholm operator is given by
A = ( 0 0 ; 0 A₀ ),    (6.75)
Proof. That I − K is Fredholm follows from Lemma 6.11 since K′ is compact
as well and Coker(I − K)∗ ≅ Ker(I − K′) by Problem 4.28. Furthermore, the
index is constant along [0, 1] → Φ(X), α ↦ I − αK and hence ind(I − K) =
ind(I) = 0.
Fredholm operators are also used to split the spectrum. For A ∈ L (X)
one defines the essential spectrum
σess (A) := {α ∈ C|A − α 6∈ Φ0 (X)} ⊆ σ(A) (6.78)
and the Fredholm spectrum
σΦ (A) := {α ∈ C|A − α 6∈ Φ(X)} ⊆ σess (A). (6.79)
By Dieudonné’s theorem both σess (A) and σΦ (A) are closed. Warning:
These definitions are not universally accepted and several variants can be
found in the literature.
Example 6.38. Let X be infinite dimensional and K ∈ C(X). Then σess(K) =
σΦ(K) = {0}.
By Corollary 6.35 both the Fredholm spectrum and the essential spec-
trum are invariant under compact perturbations:
Theorem 6.36 (Weyl). Let A ∈ L (X), then
σΦ (A + K) = σΦ (A), σess (A + K) = σess (A), K ∈ C (X). (6.80)
Nonlinear Functional Analysis

Chapter 7. Analysis in Banach spaces
ḟ(t) := lim_{ε→0} (f(t + ε) − f(t))/ε    (7.1)
exists. If t is a boundary point, the limit/derivative is understood as the
corresponding one-sided limit/derivative.
The set of functions f : I → X which are differentiable at all t ∈ I and
for which f˙ ∈ C(I, X) is denoted by C 1 (I, X). Clearly C 1 (I, X) ⊂ C(I, X).
As usual we set C^{k+1}(I, X) := {f ∈ C¹(I, X)|ḟ ∈ C^k(I, X)}. Note that if
U ∈ L(X, Y) and f ∈ C^k(I, X), then Uf ∈ C^k(I, Y) and (d/dt)Uf = Uḟ.
The following version of the mean value theorem will be crucial.
for s ≤ t ∈ I.
In particular,
Corollary 7.2. For f ∈ C 1 (I, X) we have f˙ = 0 if and only if f is constant.
X by
∫_a^b f(t)dt := Σ_{j=1}^n x_j(t_j − t_{j−1}),    (7.4)
since this holds for simple functions by the triangle inequality and hence for
all functions by approximation.
We remark that it is possible to extend the integral to a larger class of
functions in various ways. The first generalization is to replace step functions
by simple functions (and at the same time one could also replace the Lebesgue
measure on I by an arbitrary finite measure). Then the same approach
defines the integral for uniform limits of simple functions. However, things
only get interesting when you also replace the sup norm by an L¹-type
seminorm: ‖f‖₁ := ∫ ‖f(x)‖ dµ(x). As before the integral can be extended
to all functions which can be approximated by simple functions with respect
to this seminorm. This is known as the Bochner integral and we refer to
Section 5.5 from [47] for details.
In addition, if A ∈ L (X, Y ), then f ∈ R(I, X) implies Af ∈ R(I, Y )
and
A ∫_a^b f(t)dt = ∫_a^b (Af)(t)dt.    (7.7)
Again this holds for step functions and thus extends to all regulated functions
by continuity. In particular, if ℓ ∈ X∗ is a continuous linear functional, then
ℓ(∫_a^b f(t)dt) = ∫_a^b ℓ(f(t))dt,  f ∈ R(I, X).    (7.8)
Moreover, we will use the usual conventions ∫_{t₁}^{t₂} f(s)ds := ∫_I χ_{(t₁,t₂)}(s)f(s)ds
and ∫_{t₂}^{t₁} f(s)ds := −∫_{t₁}^{t₂} f(s)ds. Note that we could replace (t₁, t₂) by a
closed or half-open interval with the same endpoints (why?) and hence
∫_{t₁}^{t₃} f(s)ds = ∫_{t₁}^{t₂} f(s)ds + ∫_{t₂}^{t₃} f(s)ds.
Hence if F ∈ C¹(I, X) then G(t) := ∫_a^t Ḟ(s)ds satisfies Ġ = Ḟ and hence
F(t) = C + G(t) by Corollary 7.2. Choosing t = a finally shows F(a) =
C.
Problem* 7.1 (Product rule). Let X be a Banach algebra. Show that if
f, g ∈ C¹(I, X) then fg ∈ C¹(I, X) and (d/dt)fg = ḟg + fġ.
Problem* 7.2. Let f ∈ R(I, X) and Ĩ := I + t₀. Then f(t − t₀) ∈ R(Ĩ, X)
and
∫_I f(t)dt = ∫_{Ĩ} f(t − t₀)dt.
(dF(x) − dG(x))u = o(u) implying that for every ε > 0 we can find a δ > 0 such that
|(dF(x) − dG(x))u| ≤ ε|u| whenever |u| ≤ δ. By homogeneity of the norm we
conclude ‖dF(x) − dG(x)‖ ≤ ε and since ε > 0 is arbitrary dF(x) = dG(x).
Note that for this argument to work it is crucial that we can approach x
from arbitrary directions u, which explains our requirement that U should
be open.
If I ⊆ R, we have an isomorphism L(R, X) ≡ X and if F : I → X
we will write Ḟ (t) instead of dF (t) if we regard dF (t) as an element of X.
Clearly this is consistent with the definition (7.1) from the previous section.
Example 7.1. Let X be a Hilbert space and consider F : X → R given by
F(x) := |x|². Then
F(x + u) = ⟨x + u, x + u⟩ = |x|² + 2Re⟨x, u⟩ + |u|² = F(x) + 2Re⟨x, u⟩ + o(u).
Hence if X is a real Hilbert space, then F is differentiable with dF(x)u =
2⟨x, u⟩. However, if X is a complex Hilbert space, then F is not differentiable.
The previous example emphasizes that for F : U ⊆ X → Y it makes a big
difference whether X is a real or a complex Banach space. In fact, in case of
a complex Banach space X, we obtain a version of complex differentiability
which of course is much stronger than real differentiability. Note that in this
respect it makes no difference whether Y is real or complex.
Example 7.2. Suppose f ∈ C 1 (R) with f (0) = 0. Let X := `pR (N), then
F : X → X, (xn )n∈N 7→ (f (xn ))n∈N
is differentiable for every x ∈ X with derivative given by the multiplication
operator
(dF (x)u)n = f 0 (xn )un .
First of all note that the mean value theorem implies |f(t)| ≤ M_R|t| for
|t| ≤ R with M_R := sup_{|t|≤R} |f′(t)|. Hence, since ‖x‖∞ ≤ ‖x‖_p, we have
‖F(x)‖_p ≤ M_{‖x‖∞}‖x‖_p and F is well defined. This also shows that multipli-
cation by f′(xₙ) is a bounded linear map. To establish differentiability we
use
f(t + s) − f(t) − f′(t)s = s ∫₀¹ (f′(t + sτ) − f′(t)) dτ
and since f′ is uniformly continuous on every compact interval, we can find
a δ > 0 for every given R > 0 and ε > 0 such that
|f′(t + s) − f′(t)| < ε if |s| < δ, |t| < R.
Now for x, u ∈ X with ‖x‖∞ < R and ‖u‖∞ < δ we have |f(xₙ + uₙ) −
f(xₙ) − f′(xₙ)uₙ| < ε|uₙ| and hence
‖F(x + u) − F(x) − dF(x)u‖_p < ε‖u‖_p
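A finite-dimensional sketch of this estimate (not from the text; f = sin with f(0) = 0 and a truncation to 100 coordinates are my choices as a stand-in for ℓᵖ): the remainder of the superposition operator is numerically o(‖u‖) as the step size shrinks.

```python
import numpy as np

# Superposition operator F(x)_n = f(x_n) with f = sin, so dF(x) acts as
# multiplication by f'(x_n) = cos(x_n), as in Example 7.2.
f, fp = np.sin, np.cos
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
u = rng.standard_normal(100)

# The remainder ||F(x + eps*u) - F(x) - dF(x)(eps*u)|| should be o(eps).
ratios = []
for eps in (1e-1, 1e-2, 1e-3):
    err = np.linalg.norm(f(x + eps * u) - f(x) - fp(x) * (eps * u))
    ratios.append(err / eps)
print(ratios)  # tends to 0 as eps does
```

Since f here is C², the ratio actually decays linearly in ε, slightly better than the mere o(1) that differentiability guarantees.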
δF (0) = I
or, if µ is finite,
|G(z)| ≤ C(1 + |z|^p),  √(|∂ₓG(z)|² + |∂_yG(z)|²) ≤ C(1 + |z|^{p−1}).
Note that the first condition comes for free from the second in the finite case
and also in the general case if G(0) = 0. We only write down the estimates in
the first case and leave the easy adaptions for the second case as an exercise.
Then
N(f) := ∫_X G(f)dµ
is Gâteaux differentiable and we have
δN(f)g = ∫_X ((∂ₓG)(f)Re(g) + (∂_yG)(f)Im(g)) dµ.
In fact, by the chain rule h(ε) := G(f + εg) is differentiable with h′(0) =
(∂ₓG)(f)Re(g) + (∂_yG)(f)Im(g). Moreover, by the mean value theorem
|h(ε) − h(0)|/|ε| ≤ sup_{0≤τ≤ε} √((∂ₓG)(f + τg)² + (∂_yG)(f + τg)²) |g|
Proof. For every ε > 0 we can find a δ > 0 such that |F(x + u) − F(x) −
dF(x)u| ≤ ε|u| for |u| ≤ δ. Now choose M = ‖dF(x)‖ + ε.
Example 7.6. Note that this lemma fails for the Gâteaux derivative as the
example of an unbounded linear function shows. In fact, it already fails in
R2 as the function F : R2 → R given by F (x, y) = 1 for y = x2 6= 0 and
F (x, y) = 0 else shows: It is Gâteaux differentiable at 0 with δF (0) = 0 but
it is not continuous since limε→0 F (ε, ε2 ) = 1 6= 0 = F (0, 0).
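The counterexample of Example 7.6 is easy to probe directly; this small sketch evaluates F along a ray through the origin (where the directional limits vanish) and along the parabola itself (where F stays at 1).

```python
# F = 1 on the punctured parabola y = x^2, F = 0 elsewhere, as in the text.
def F(x, y):
    return 1.0 if (y == x * x and y != 0.0) else 0.0

# Every ray through the origin eventually avoids the parabola, so all
# directional difference quotients at 0 vanish ...
print([F(t * 1.0, t * 1.0) for t in (0.5, 0.1, 0.01)])   # along the ray (1, 1)
# ... yet approaching 0 along the parabola keeps F at 1:
print([F(t, t * t) for t in (0.5, 0.1, 0.01)])
```

A ray (tu, tv) meets the parabola for at most one value of t, which is why all Gâteaux derivatives at the origin vanish while continuity fails.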
Using |ṽ| ≤ kdF (x)k|u| + |o(u)| we see that o(ṽ) = o(u) and hence
G(F (x + u)) = G(y) + dG(y)v + o(u) = G(F (x)) + dG(F (x)) ◦ dF (x)u + o(u)
as required. This establishes the case r = 1. The general case follows from
induction.
Proof. First of all note that ∂_jF(x) ∈ L(R, Y) and thus it can be regarded
as an element of Y. Clearly the same applies to ∂_i∂_jF(x). Let ℓ ∈ Y∗
be a bounded linear functional, then ℓ ◦ F ∈ C²(R², R) and hence ∂_i∂_j(ℓ ◦
F) = ∂_j∂_i(ℓ ◦ F) by the classical theorem of Schwarz. Moreover, by our
remark preceding this lemma ∂_i∂_j(ℓ ◦ F) = ∂_iℓ(∂_jF) = ℓ(∂_i∂_jF) and hence
ℓ(∂_i∂_jF) = ℓ(∂_j∂_iF) for every ℓ ∈ Y∗ implying the claim.
Finally, note that to each L ∈ Lⁿ(X, Y) we can assign its polar form
L ∈ C(X, Y) using L(x) = L(x, . . . , x), x ∈ X. If L is symmetric it can be
reconstructed using polarization (Problem 7.8):
L(u₁, . . . , uₙ) = (1/n!) ∂_{t₁} · · · ∂_{tₙ} L(Σ_{i=1}^n t_i u_i).    (7.24)
Proof. As in the proof of the previous lemma, the case r = 0 is just the
fundamental theorem of calculus applied to f(t) := F(x + tu). For the in-
duction step we use integration by parts. To this end let f_j ∈ C¹([0, 1], X_j),
L ∈ L²(X₁ × X₂, Y) bilinear. Then the product rule (7.25) and the funda-
mental theorem of calculus imply
∫₀¹ L(ḟ₁(t), f₂(t))dt = L(f₁(1), f₂(1)) − L(f₁(0), f₂(0)) − ∫₀¹ L(f₁(t), ḟ₂(t))dt.
Hence applying integration by parts with L(y, t) = ty, f₁(t) = d^{r+1}F(x + ut),
and f₂(t) = (1 − t)^{r+1}/(r + 1)! establishes the induction step.
Of course this also gives the Peano form for the remainder:
Corollary 7.14. Suppose U ⊆ X and F ∈ C^r(U, Y). Then
F(x + u) = F(x) + dF(x)u + (1/2)d²F(x)u² + · · · + (1/r!)d^rF(x)u^r + o(|u|^r).    (7.31)
Proof. Just estimate
‖( ∫₀¹ ((1 − t)^{r−1}/(r − 1)!) d^rF(x + tu)dt − (1/r!)d^rF(x) ) u^r‖
≤ (|u|^r/(r − 1)!) ∫₀¹ (1 − t)^{r−1} ‖d^rF(x + tu) − d^rF(x)‖ dt
≤ (|u|^r/r!) sup_{0≤t≤1} ‖d^rF(x + tu) − d^rF(x)‖.
The set of all r times continuously differentiable functions for which this
norm is finite forms a Banach space which is denoted by Cbr (U, Y ).
In the definition of differentiability we have required U to be open. Of
course there is no stringent reason for this and (7.12) could simply be required
for all sequences from U \ {x} converging to x. However, note that the
derivative might not be unique in case you miss some directions (the ultimate
problem occurring at an isolated point). Our requirement avoids all these
issues. Moreover, there is usually another way of defining differentiability
at a boundary point: By C r (U , Y ) we denote the set of all functions in
C r (U, Y ) all whose derivatives of order up to r have a continuous extension
to U . Note that if you can approach a boundary point along a half-line then
the fundamental theorem of calculus shows that the extension coincides with
the Gâteaux derivative.
Problem 7.5. Let X be a real Hilbert space, A ∈ L(X) and F(x) :=
⟨x, Ax⟩. Compute dⁿF.
Problem* 7.6. Let X := C([0, 1], R) and suppose f ∈ C 1 (R). Show that
F : X → X, x 7→ f ◦ x
is differentiable for every x ∈ X with derivative given by
(dF (x)y)(t) = f 0 (x(t))y(t).
Problem 7.7. Let X := ℓ²(N), Y := ℓ¹(N) and F : X → Y given by
F(x)_j := x_j². Show F ∈ C^∞(X, Y) and compute all derivatives.
Note that if δⁿF(x, u) exists, then δⁿF(x, λu) exists for every λ ∈ R and
δⁿF(x, λu) = λⁿδⁿF(x, u),  λ ∈ R.    (7.35)
However, the condition δ²F(x, u) > 0 for all unit vectors u is not sufficient as
there are certain features you might miss when you only look at the function
along rays through a fixed point. This is demonstrated by the following
example:
Example 7.13. Let X = R² and consider the points (xₙ, yₙ) := (1/n, 1/n²).
For each point choose a radius rₙ such that the balls Bₙ := B_{rₙ}(xₙ, yₙ)
are disjoint and lie between two parabolas: Bₙ ⊂ {(x, y)|x ≥ 0, x²/2 ≤ y ≤
2x²}. Moreover, choose a smooth nonnegative bump function φ(r²) with
support in [−1, 1] and maximum 1 at 0. Now consider
F(x, y) = x² + y² − 2 Σ_{n∈N} ρₙ φ(((x − xₙ)² + (y − yₙ)²)/rₙ²),
where ρₙ = xₙ² + yₙ². By construction F is smooth away from zero.
Moreover, at zero F is continuous and Gâteaux differentiable of arbitrary
order with F(0, 0) = 0, δF((0, 0), (u, v)) = 0, δ²F((0, 0), (u, v)) = 2(u² + v²),
and δ^kF((0, 0), (u, v)) = 0 for k ≥ 3.
In particular, F(ut, vt) has a strict local minimum at t = 0 for every
(u, v) ∈ R² \ {0}, but F has no local minimum at (0, 0) since F(xₙ, yₙ) = −ρₙ.
Clearly F is not differentiable at 0. In fact, note that the Gâteaux derivatives
are not continuous at 0 (the derivatives in Bₙ grow like rₙ⁻²).
Lemma 7.15. Suppose F : U → R has Gâteaux derivatives up to the order
of two. A necessary condition for x ∈ U to be a local minimum is that
δF (x, u) = 0 and δ 2 F (x, u) ≥ 0 for all u ∈ X. A sufficient condition for a
strict local minimum is if in addition δ 2 F (x, u) ≥ c > 0 for all u ∈ ∂B1 (0)
and δ 2 F is continuous at x uniformly with respect to u ∈ ∂B1 (0).
Proof. The necessary conditions have already been established. To see the
sufficient conditions note that the assumptions on δ 2 F imply that there is
some ε > 0 such that δ 2 F (y, u) ≥ 2c for all y ∈ Bε (x) and all u ∈ ∂B1 (0).
Equivalently, δ 2 F (y, u) ≥ 2c |u|2 for all y ∈ Bε (x) and all u ∈ X. Hence
applying Taylor’s theorem to f (t) using f¨(t) = δ 2 F (x + tu, u) gives
Z 1
c
F (x + u) = f (1) = f (0) + (1 − s)f¨(s)ds ≥ F (x) + |u|2
0 4
for u ∈ Bε (0).
F(x) := Σ_{n∈N} (xₙ²/(2n²) − xₙ⁴).
Then F ∈ C²(X, R) with dF(x)u = Σ_{n∈N} (xₙ/n² − 4xₙ³)uₙ and d²F(x)(u, v) =
Σ_{n∈N} (1/n² − 12xₙ²)uₙvₙ.
S(q) := ∫_a^b L(t, q(t), q̇(t))dt
stationary, that is
δS(q) = 0.
q(t) := q(a) + ((t − a)/(b − a))(q(b) − q(a)) + x(t),  x ∈ X.
Proof. Suppose x is a local minimum and F(y) < F(x). Then F(λy + (1 −
λ)x) ≤ λF(y) + (1 − λ)F(x) < F(x) for λ ∈ (0, 1), contradicting the fact that x
is a local minimum. If x, y are two global minima, then F(λy + (1 − λ)x) <
F(y) = F(x), yielding a contradiction unless x = y.
As in the one-dimensional case, convexity can be read off from the second
derivative.
There is also a version using only first derivatives plus the concept of a
monotone operator. A map F : U ⊆ X → X ∗ is monotone if
(F (x) − F (y))(x − y) ≥ 0, x, y ∈ U.
It is called strictly monotone if we have strict inequality for x 6= y. Mono-
tone operators will be the topic of Chapter 12.
Of course we know that the shortest curve between two given points q0 and q1
is a straight line. Notwithstanding that this is evident, defining the length as
the total variation, let us show this by seeking the minimum of the following
functional
F(x) := ∫_a^b |q′(s)|ds,  q(t) = x(t) + q₀ + ((t − a)/(b − a))(q₁ − q₀)
for x ∈ X := {x ∈ C¹([a, b], Rⁿ)|x(a) = x(b) = 0}. Unfortunately our inte-
grand will not be differentiable unless |q′| ≥ c. However, since the absolute
value is convex, so is F and it will suffice to search for a local minimum
within the convex open set C := {x ∈ X | |x′| < |q₁ − q₀|/(2(b − a))}. We
compute
δF(x, u) = ∫_a^b (q′(s) · u′(s)/|q′(s)|) ds
which shows (Lemma 3.24 from [47]) that q 0 /|q 0 | must be constant. Hence
the local minimum in C is indeed a straight line and this must also be a
global minimum in X. However, since the length of a curve is independent
of its parametrization, this minimum is not unique!
Example 7.18. Let us try to find a curve y(x) from y(0) = 0 to y(x1 ) = y1
which minimizes
F(y) := ∫₀^{x₁} √((1 + y′(x)²)/x) dx.
Note that since the function t ↦ √(1 + t²) is convex, we obtain that F is
convex. Hence it suffices to find a zero of
δF(y, u) = ∫₀^{x₁} (y′(x)u′(x)/√(x(1 + y′(x)²))) dx,
which shows (Lemma 3.24 from [47]) that y′/√(x(1 + y′²)) = C^{−1/2} is
constant or equivalently
y′(x) = √(x/(C − x))
and hence
y(x) = C arctan√(x/(C − x)) − √(x(C − x)).
The constant C has to be chosen such that y(x₁) matches the given value
y₁. Note that C ↦ y(x₁) decreases from πx₁/2 to 0 and hence there will be a
unique C > x₁ for 0 < y₁ < πx₁/2.
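The constant C can be found numerically by bisection, exploiting the monotonicity just noted; the following sketch (the sample data x₁ = 1, y₁ = 0.5 are my choice) solves y(x₁) = y₁.

```python
import numpy as np

def y_end(C, x1):
    # y(x1) of the stationary curve for a given C > x1 (formula above)
    return C * np.arctan(np.sqrt(x1 / (C - x1))) - np.sqrt(x1 * (C - x1))

# Bisection on C, using that C -> y(x1) decreases from pi*x1/2 to 0.
x1, y1 = 1.0, 0.5         # sample data satisfying 0 < y1 < pi*x1/2
lo, hi = x1 + 1e-12, 1e6  # bracket: y_end(lo) > y1 > y_end(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if y_end(mid, x1) > y1:
        lo = mid          # value still too large: increase C
    else:
        hi = mid
C = 0.5 * (lo + hi)
print(C, y_end(C, x1))    # y_end(C, x1) now matches y1 numerically
```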
Problem 7.11. Consider the least action principle for a classical one-dimensional
particle. Show that
δ²F(x, u) = ∫_a^b (m u̇(s)² − V″(q(s))u(s)²) ds.
Moreover, show that we have indeed a minimum if V″ ≤ 0.
Proof. Without loss of generality we can assume F (x) < ∞ for some x ∈ M .
As above we start with a sequence xn ∈ M such that F (xn ) → inf M F < ∞.
If M = X then the fact that F is coercive implies that xn is bounded.
Otherwise, it is bounded since we assumed M to be bounded. Hence by
Theorem 4.31 we can pass to a subsequence such that xₙ ⇀ x₀ with x₀ ∈
M since M is assumed sequentially closed. Now since F is weakly se-
quentially lower semicontinuous we finally get inf M F = limn→∞ F (xn ) =
lim inf n→∞ F (xn ) ≥ F (x0 ).
such that F (u) > lim inf F (un ). After passing to a subsequence we can
assume that un (x) → u(x) a.e. and hence K(x, un (x)) → K(x, u(x)) a.e.
Finally applying Fatou’s lemma (Theorem 2.4 from [47]) gives the contra-
diction F (u) ≤ lim inf F (un ). Note that this result generalizes to Cn -valued
functions in a straightforward manner.
Moreover, in this case our variational principle reads as follows:
Corollary 7.24. Let X be a reflexive Banach space and let M be a nonempty
closed convex subset. If F : M ⊆ X → R is quasiconvex, lower semicontinu-
ous, and, if M is unbounded, weakly coercive, then there exists some x0 ∈ M
with F (x0 ) = inf M F . If F is strictly quasiconvex then x0 is unique.
which coincides with the weak formulation of our problem. Hence a minimizer (which is necessarily in H^1_R(R^n) ∩ L^3_R(R^n)) is a weak solution of our
nonlinear elliptic problem and it remains to show existence of a minimizer.
First of all note that

F(u) ≥ (1/2)‖u‖_2^2 − ‖u‖_2 ‖f‖_2 ≥ (1/4)‖u‖_2^2 − ‖f‖_2^2
and hence F is coercive. To see that it is weakly sequentially lower semicontinuous, observe that for the second term this follows from Example 7.21 and the last two are easy. For the first term let un ⇀ u in L^2 and observe
Proof. This follows from Theorem 4.32 since every weakly convergent se-
quence in X is convergent in Y .
and if we can find some u ∈ H^1_0(U) such that this derivative is nonzero, then u0 satisfies

∫_U (∂u0 · ∂u − λ G′(u0) u) d^n x = 0,  u ∈ H^1_0(U),

and hence is a weak solution of the nonlinear eigenvalue problem

−Δu0 = λ G′(u0).
Note that this last condition is for example satisfied if G(0) = 0, G′(x)x > 0 for x ≠ 0, and N0 > 0. Indeed, in this case δN(u0)u0 = ∫_U G′(u0) u0 d^n x > 0, since otherwise we would have u0 = 0, contradicting 0 < N0 = N(u0) = N(0) = 0.
Of course in the case G(x) = (1/2)|x|^2 and N0 = 1 this gives us the lowest eigenvalue of the Laplacian on U with Dirichlet boundary conditions.
Note that using the continuous embeddings H^1 ↪ L^p with 2 ≤ p ≤ ∞ for n = 1, 2 ≤ p < ∞ for n = 2, and 2 ≤ p ≤ 2n/(n − 2) for n ≥ 3, one can improve this result to the case

|G′(x)| ≤ C(1 + |x|^{p−1}).
Problem 7.12. Consider X = C[0, 1] and M = {f | ∫_0^1 f(x) dx = 1, f(0) = 0}. Show that M is closed and convex. Show that d(0, M) = 1 but there is no minimizer. If we replace the boundary condition by f(0) = 1 there is a unique minimizer, and for f(0) = 2 there are infinitely many minimizers.
Problem 7.13. Show that F : M → R is convex if and only if its epigraph
epi F := {(x, a) ∈ M × R|F (x) ≤ a} ⊂ X × R is convex.
Problem* 7.14. Show that F : M → R is quasiconvex if and only if the
sublevel sets F −1 ((−∞, a]) are convex for every a ∈ R.
shows that x is a fixed point and the estimate (7.39) follows after taking the
limit m → ∞ in (7.40).
traction since we have ‖∂_x F(x(y), y)‖ ≤ θ by Theorem 7.9. Hence we get a unique continuous solution x′(y). It remains to show

x(y + v) − x(y) − x′(y)v = o(v).

Let us abbreviate u := x(y + v) − x(y), then using (7.45) and the fixed point property of x(y) we see

(1 − ∂_x F(x(y), y))(u − x′(y)v) = F(x(y) + u, y + v) − F(x(y), y) − ∂_x F(x(y), y)u − ∂_y F(x(y), y)v = o(u) + o(v)

since F ∈ C^1(U × V, U) by assumption. Moreover, ‖(1 − ∂_x F(x(y), y))^{−1}‖ ≤ (1 − θ)^{−1} and u = O(v) (by (7.44)), implying u − x′(y)v = o(v) as desired.
Finally, suppose that the result holds for some r − 1 ≥ 1. Thus, if F is C^r, then x(y) is at least C^{r−1} and the fact that dx(y) satisfies (7.45) shows dx(y) ∈ C^{r−1}(V, U) and hence x(y) ∈ C^r(V, U).
Note that our proof is constructive, since it shows that the solution ξ(y)
can be obtained by iterating x − (∂x F (x0 , y0 ))−1 (F (x, y) − F (x0 , y0 )).
Moreover, as a corollary of the implicit function theorem we also obtain
the inverse function theorem.
Problem 7.15. Derive Newton's method for finding the zeros of a twice continuously differentiable function f(x),

x_{n+1} = F(x_n),  F(x) = x − f(x)/f′(x),

from the contraction principle by showing that if x is a zero with f′(x) ≠ 0, then there is a corresponding closed interval C around x such that the assumptions of Theorem 7.27 are satisfied.
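The iteration in the problem above is easy to run numerically. The sketch below is a minimal illustration; the concrete choice f(x) = x² − 2 (with zero √2, where f′ ≠ 0) is a hypothetical example, not taken from the text.

```python
# Minimal sketch of the Newton iteration from Problem 7.15; the concrete choice
# f(x) = x^2 - 2 (zero sqrt(2), where f'(x) != 0) is a hypothetical example.
def newton(f, df, x0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)   # one application of F(x) = x - f(x)/f'(x)
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
assert abs(root - 2.0 ** 0.5) < 1e-12
```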
Proof. Fix x0 ∈ C(I, U ) and ε > 0. For each t ∈ I we have a δ(t) > 0
such that B2δ(t) (x0 (t)) ⊂ U and |f (x) − f (x0 (t))| ≤ ε/2 for all x with
|x − x0 (t)| ≤ 2δ(t). The balls Bδ(t) (x0 (t)), t ∈ I, cover the set {x0 (t)}t∈I
and since I is compact, there is a finite subcover Bδ(tj ) (x0 (tj )), 1 ≤ j ≤
n. Let kx − x0 k ≤ δ := min1≤j≤n δ(tj ). Then for each t ∈ I there is
a tj such that |x0 (t) − x0 (tj )| ≤ δ(tj ) and hence |f (x(t)) − f (x0 (t))| ≤
|f (x(t)) − f (x0 (tj ))| + |f (x0 (tj )) − f (x0 (t))| ≤ ε since |x(t) − x0 (tj )| ≤
|x(t) − x0 (t)| + |x0 (t) − x0 (tj )| ≤ 2δ(tj ). This settles the case r = 0.
Next let us turn to r = 1. We claim that df∗ is given by (df∗ (x0 )x)(t) :=
df (x0 (t))x(t). To show this we use Taylor’s theorem (cf. the proof of Corol-
lary 7.14) to conclude that
|f(x0(t) + x) − f(x0(t)) − df(x0(t))x| ≤ |x| sup_{0≤s≤1} ‖df(x0(t) + sx) − df(x0(t))‖.
7.6. Ordinary differential equations 225
By the first part (df )∗ is continuous and hence for a given ε we can find a
corresponding δ such that |x(t) − y(t)| ≤ δ implies kdf (x(t)) − df (y(t))k ≤ ε
and hence kdf (x0 (t) + sx) − df (x0 (t))k ≤ ε for |x0 (t) + sx − x0 (t)| ≤ |x| ≤ δ.
But this shows differentiability of f∗ as required and it remains to show that
df∗ is continuous. To see this we use the linear map

λ : C(I, L(X, Y)) → L(C(I, X), C(I, Y)),  T ↦ T∗,

where (T∗ x)(t) := T(t)x(t). Since we have

‖T∗ x‖ = sup_{t∈I} |T(t)x(t)| ≤ sup_{t∈I} ‖T(t)‖ |x(t)| ≤ ‖T‖ ‖x‖,

we infer ‖λ‖ ≤ 1 and hence λ is continuous. Now observe df∗ = λ ◦ (df)∗.
The general case r > 1 follows from induction.
Now we come to our existence and uniqueness result for the initial value
problem in Banach spaces.
Theorem 7.33. Let I be an open interval, U an open subset of a Banach
space X and Λ an open subset of another Banach space. Suppose F ∈ C r (I ×
U × Λ, X), r ≥ 1, then the initial value problem
ẋ = F (t, x, λ), x(t0 ) = x0 , (t0 , x0 , λ) ∈ I × U × Λ, (7.48)
has a unique solution x(t, t0, x0, λ) ∈ C^r(I1 × I2 × U1 × Λ1, U), where I1,2,
U1 , and Λ1 are open subsets of I, U , and Λ, respectively. The sets I2 , U1 ,
and Λ1 can be chosen to contain any point t0 ∈ I, x0 ∈ U , and λ0 ∈ Λ,
respectively.
initial value problem with initial condition x(t+ ) = φ1 (t+ ) = φ2 (t+ ) shows
that both solutions coincide in a neighborhood of t+ by local uniqueness.
This contradicts maximality of t+ and hence t+ = T+ . Similarly, t− = T− .
Moreover, we get a solution

φ(t) := φ1(t) for t ∈ I1,  φ2(t) for t ∈ I2,  (7.51)
defined on I1 ∪ I2 . In fact, this even extends to an arbitrary number of
solutions and in this way we get a (unique) solution defined on some maximal
interval.
Theorem 7.34. Suppose the initial value problem (7.48) has a unique local
solution (e.g. the conditions of Theorem 7.33 are satisfied). Then there ex-
ists a unique maximal solution defined on some maximal interval I(t0 ,x0 ) =
(T− (t0 , x0 ), T+ (t0 , x0 )).
Proof. Let S be the set of all solutions φ of (7.48) which are defined on an open interval Iφ. Let I := ⋃_{φ∈S} Iφ, which is again open. Moreover, if
t1 > t0 ∈ I, then t1 ∈ Iφ for some φ and thus [t0 , t1 ] ⊆ Iφ ⊆ I. Similarly for
t1 < t0 and thus I is an open interval containing t0 . In particular, it is of the
form I = (T− , T+ ). Now define φmax (t) on I by φmax (t) := φ(t) for some
φ ∈ S with t ∈ Iφ . By our above considerations any two φ will give the same
value, and thus φmax (t) is well-defined. Moreover, for every t1 > t0 there is
some φ ∈ S such that t1 ∈ Iφ and φmax (t) = φ(t) for t ∈ (t0 − ε, t1 + ε) which
shows that φmax is a solution. By construction there cannot be a solution
defined on a larger interval.
The solution found in the previous theorem is called the maximal so-
lution. A solution defined for all t ∈ R is called a global solution. Clearly
every global solution is maximal.
The next result gives a simple criterion for a solution to be global.
Lemma 7.35. Suppose F ∈ C 1 (R×X, X) and let x(t) be a maximal solution
of the initial value problem (7.48). Suppose |F (t, x(t))| is bounded on finite
t-intervals. Then x(t) is a global solution.
Proof. Let (T− , T+ ) be the domain of x(t) and suppose T+ < ∞. Then
|F (t, x(t))| ≤ C for t ∈ (t0 , T+ ) and for t0 < s < t < T+ we have
|x(t) − x(s)| ≤ ∫_s^t |ẋ(τ)| dτ = ∫_s^t |F(τ, x(τ))| dτ ≤ C|t − s|.
Hence x(t) is Cauchy as t ↑ T+ and extends continuously to t = T+. Solving the initial value problem with initial condition x(T+) gives a solution y(t), and then

x̃(t) := x(t) for t < T+,  y(t) for t ≥ T+,
is a larger solution contradicting maximality of T+ .
Example 7.31. Finally, we want to apply this to a famous example, the so-
called FPU lattices (after Enrico Fermi, John Pasta, and Stanislaw Ulam
who investigated such systems numerically). This is a simple model of a
linear chain of particles coupled via nearest neighbor interactions. Let us
assume for simplicity that all particles are identical and that the interaction
is described by a potential V ∈ C 2 (R). Then the equation of motions are
given by
q̈n (t) = V 0 (qn+1 − qn ) − V 0 (qn − qn−1 ), n ∈ Z,
where qn (t) ∈ R denotes the position of the n’th particle at time t ∈ R and
the particle index n runs through all integers. If the potential is quadratic, V(r) = (k/2) r^2, then we get the discrete linear wave equation

q̈_n(t) = k (q_{n+1}(t) − 2q_n(t) + q_{n−1}(t)).
If we use the fact that the Jacobi operator (Aq)_n = −k(q_{n+1} − 2q_n + q_{n−1}) is a bounded operator in X = ℓ^p_R(Z), we can easily solve this system as in the case of ordinary differential equations. In fact, if q^0 = q(0) and p^0 = q̇(0)
are the initial conditions then one can easily check (cf. Problem 7.17) that
the solution is given by
q(t) = cos(tA^{1/2}) q^0 + (sin(tA^{1/2})/A^{1/2}) p^0.

In the Hilbert space case p = 2 these functions of our operator A could be defined via the spectral theorem but here we just use the more direct definition

cos(tA^{1/2}) := Σ_{k=0}^∞ ((−1)^k t^{2k}/(2k)!) A^k,   sin(tA^{1/2})/A^{1/2} := Σ_{k=0}^∞ ((−1)^k t^{2k+1}/(2k + 1)!) A^k.
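On the infinite lattice the formula cannot be evaluated directly, but it can be checked on a finite truncation. The sketch below assumes a Dirichlet-type truncation to N sites (an assumption not made in the text, where n runs through all of Z) and verifies numerically that the solution formula satisfies q̈ = −Aq.

```python
# Hedged check of the solution formula on a finite truncation of the lattice
# (Dirichlet-type truncation to N sites is an assumption not made in the text):
# q(t) = cos(t A^{1/2}) q0 + (sin(t A^{1/2})/A^{1/2}) p0 solves q'' = -A q.
import numpy as np

N, spring, t = 40, 1.0, 0.7
A = spring * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
w2, V = np.linalg.eigh(A)              # A = V diag(w2) V^T with w2 > 0
w = np.sqrt(w2)

def q(t, q0, p0):
    c = V @ np.diag(np.cos(t * w)) @ V.T
    s = V @ np.diag(np.sin(t * w) / w) @ V.T   # sin(t A^{1/2}) / A^{1/2}
    return c @ q0 + s @ p0

rng = np.random.default_rng(0)
q0, p0 = rng.standard_normal(N), rng.standard_normal(N)
h = 1e-5                                # finite-difference check of q'' = -A q
qdd = (q(t + h, q0, p0) - 2 * q(t, q0, p0) + q(t - h, q0, p0)) / h**2
assert np.allclose(qdd, -A @ q(t, q0, p0), atol=1e-3)
```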
In the general case an explicit solution is no longer possible but we are still
able to show global existence under appropriate conditions. To this end
we will assume that V has a global minimum at 0 and hence looks like
V (r) = V (0) + k2 r2 + o(r2 ). As V (0) does not enter our differential equation
we will assume V (0) = 0 without loss of generality. Moreover, we will also
introduce p_n := q̇_n to have a first order system

q̇_n = p_n,  ṗ_n = V′(q_{n+1} − q_n) − V′(q_n − q_{n−1}).

Since V′ ∈ C^1(R) with V′(0) = 0 it gives rise to a C^1 map on ℓ^p_R(Z) (see Example 7.2). Since the same is true for shifts, the chain rule implies that the right-hand side of our system is a C^1 map and hence Theorem 7.33 gives
∂x F (µ0 , x0 ) (7.53)
Now split our equation into a system of two equations according to the above
splitting of the underlying Banach space:
Since P, Q are bounded, this system is still C 1 and the derivatives are
given by (recall the block structure of A from (6.75))
∂u F1 (µ0 , 0, 0) = 0, ∂v F1 (µ0 , 0, 0) = 0,
∂u F2 (µ0 , 0, 0) = 0, ∂v F2 (µ0 , 0, 0) = A0 . (7.56)
Moreover, since A0 is an isomorphism, the implicit function theorem tells us
that we can (locally) solve F2 for v. That is, there exists a neighborhood U
of (µ0 , 0) ∈ R × Ker(A) and a unique function ψ ∈ C 1 (U, X0 ) such that
F2 (µ, u, ψ(µ, u)) = 0, (µ, u) ∈ U. (7.57)
In particular, by the uniqueness part we have ψ(µ, 0) = 0. Moreover,
∂_u ψ(µ0, 0) = −A_0^{−1} ∂_u F2(µ0, 0, 0) = 0.

Plugging this into the first equation reduces the original system to the finite dimensional system
F̃1 (µ, u) = F1 (µ, u, ψ(µ, u)) = 0. (7.58)
Of course the chain rule tells us that F̃1 ∈ C^1. Moreover, we still have F̃1(µ, 0) = F1(µ, 0, ψ(µ, 0)) = QF(µ, 0) = 0 as well as
∂u F̃1 (µ0 , 0) = ∂u F1 (µ0 , 0, 0) + ∂v F1 (µ0 , 0, 0)∂u ψ(µ0 , 0) = 0. (7.59)
This is known as Lyapunov–Schmidt reduction.
Now that we have reduced the problem to a finite-dimensional system,
it remains to find conditions such that the finite dimensional system has a
nontrivial solution. For simplicity we make the requirement
dim Ker(A) = dim Coker(A) = 1 (7.60)
such that we actually have a problem in R × R → R.
Explicitly, let u0 span Ker(A) and let u1 span X1 . Then we can write
F̃1 (µ, λu0 ) = f (µ, λ)u1 , (7.61)
where f ∈ C 1 (V, R) with V = {(µ, λ)|(µ, λu0 ) ∈ U } ⊆ R2 a neighborhood of
(µ0 , 0). Of course we still have f (µ, 0) = 0 for (µ, 0) ∈ V as well as
∂λ f (µ0 , 0)u1 = ∂u F̃1 (µ0 , 0)u0 = 0. (7.62)
It remains to investigate f . To split off the trivial solution it suggests itself
to write
f (µ, λ) = λ g(µ, λ) (7.63)
We already have
g(µ0 , 0) = ∂λ f (µ0 , 0) = 0 (7.64)
and hence if
0 6= ∂µ g(µ0 , 0) = ∂µ ∂λ f (µ0 , 0) 6= 0 (7.65)
7.7. Bifurcation theory 233
the implicit function theorem implies existence of a function µ(λ) with µ(0) = µ0 and g(µ(λ), λ) = 0. Moreover,

µ′(0) = −∂_λ g(µ0, 0)/∂_µ g(µ0, 0) = −∂_λ^2 f(µ0, 0)/(2 ∂_µ ∂_λ f(µ0, 0)).
Note that if Q ∂_x^2 F(µ0, 0)(u0, u0) ≠ 0 we could have also solved for λ, obtaining a function λ(µ) with λ(µ0) = 0. However, in this case it is not obvious that λ(µ) ≠ 0 for µ ≠ µ0, and hence that we get a nontrivial solution, unless we also require Q ∂_µ ∂_x F(µ0, 0) u0 ≠ 0, which brings us back to our previous condition. If both conditions are met, then µ′(0) ≠ 0 and there is a unique nontrivial solution x(µ) which crosses the trivial solution non-transversally at µ0. This is known as a transcritical bifurcation. If µ′(0) = 0 but µ″(0) ≠ 0 (assuming this derivative exists), then two solutions will branch off (either for µ > µ0 or for µ < µ0 depending on the sign of
Problem 7.19. Show that if F (µ, −x) = −F (µ, x), then ψ(µ, −u) = −ψ(µ, u)
and µ(−λ) = µ(λ).
Chapter 8
Operator semigroups
Proof. Set

T_n(t) := Σ_{j=0}^n (t^j/j!) A^j.

Then (for m ≤ n)

‖T_n(t) − T_m(t)‖ = ‖Σ_{j=m+1}^n (t^j/j!) A^j‖ ≤ Σ_{j=m+1}^n (|t|^j/j!) ‖A‖^j ≤ (|t|^{m+1}/(m + 1)!) ‖A‖^{m+1} e^{|t| ‖A‖}.
In particular,

‖T(t)‖ ≤ e^{|t| ‖A‖}

and AT(t) = lim_{n→∞} AT_n(t) = lim_{n→∞} T_n(t)A = T(t)A. Furthermore we have Ṫ_{n+1} = AT_n and thus

T_{n+1}(t) = I + ∫_0^t A T_n(s) ds.

Taking limits shows

T(t) = I + ∫_0^t A T(s) ds
or equivalently T (t) ∈ C 1 (R, L (X)) and Ṫ (t) = AT (t), T (0) = I.
Suppose S(t) is another solution, Ṡ = AS, S(0) = I. Then, by the product rule (Problem 7.1), (d/dt) T(−t)S(t) = T(−t)AS(t) − AT(−t)S(t) = 0, implying T(−t)S(t) = T(0)S(0) = I. In the special case T = S this shows T(−t) = T(t)^{−1} and in the general case it hence proves uniqueness, S = T.
Finally, T (t + s) and T (t)T (s) both satisfy our differential equation and
coincide at t = 0. Hence they coincide for all t by uniqueness.
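The partial sums from the proof are easy to test numerically for matrices. The sketch below (matrix size, scaling, truncation order, and sample times are illustrative choices) checks the group law T(t + s) = T(t)T(s) and the inverse relation T(−t) = T(t)^{−1}.

```python
# The partial sums T_n(t) = sum_{j<=n} (t^j/j!) A^j from the proof, for a small
# random matrix A (an illustrative choice); the limit obeys the group law.
import numpy as np

def T(t, A, n=60):
    out, term = np.eye(len(A)), np.eye(len(A))
    for j in range(1, n + 1):
        term = term @ (t * A) / j      # t^j A^j / j!
        out = out + term
    return out

rng = np.random.default_rng(1)
A = 0.5 * rng.standard_normal((4, 4))
t, s = 0.3, 0.8
assert np.allclose(T(t + s, A), T(t, A) @ T(s, A))   # T(t+s) = T(t) T(s)
assert np.allclose(T(-t, A) @ T(t, A), np.eye(4))    # T(-t) = T(t)^{-1}
```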
Lemma 8.2. Let A ∈ L (X) and g ∈ C(I, X). Then (8.4) has a unique
solution given by (8.5).
Example 8.1. For example look at the discrete linear wave equation
q̈n (t) = k qn+1 (t) − 2qn (t) + qn−1 (t) , n ∈ Z.
Factorizing this equation according to
q̇n (t) = pn (t), ṗn (t) = k qn+1 (t) − 2qn (t) + qn−1 (t) ,
we can write this as a first order system

(d/dt) (q_n, p_n) = [0 1; k A_0 0] (q_n, p_n)

with the Jacobi operator (A_0 q)_n = q_{n+1} − 2q_n + q_{n−1}. Since A_0 is a bounded operator on X = ℓ^p(Z) we obtain a well-defined uniformly continuous operator group in ℓ^p(Z) ⊕ ℓ^p(Z).
Problem* 8.1 (Product rule). Suppose f ∈ C^1(I, X) and T ∈ C^1(I, L(X, Y)). Show that Tf ∈ C^1(I, Y) and (d/dt) Tf = Ṫf + Tḟ.
where the domain D(A) is precisely the set of all f ∈ X for which the above
limit exists. By linearity of limits D(A) is a linear subspace of X (and A
is a linear operator) but at this point it is unclear whether it contains any
nontrivial elements. We will however postpone this issue and begin with
the observation that a C0 -semigroup is the solution of the abstract Cauchy
problem associated with its generator A:
Lemma 8.4. Let T (t) be a C0 -semigroup with generator A. If f ∈ D(A)
then T (t)f ∈ D(A) and AT (t)f = T (t)Af . Moreover, suppose g ∈ X with
u(t) := T (t)g ∈ D(A) for t > 0. Then u(t) ∈ C([0, ∞), X) ∩ C 1 ((0, ∞), X)
and u(t) is the unique solution of the abstract Cauchy problem
u̇(t) = Au(t), u(0) = g. (8.8)
This is, for example, the case if g ∈ D(A) in which case we even have
u(t) ∈ C 1 ([0, ∞), X).
Similarly, if T (t) is a C0 -group and g ∈ D(A), then u(t) := T (t)g ∈
C 1 (R, X) is the unique solution of (8.8) for all t ∈ R.
This shows the first part. To show that u(t) is differentiable it remains to compute

lim_{ε↓0} (1/(−ε)) (u(t − ε) − u(t)) = lim_{ε↓0} T(t − ε) (1/ε) (T(ε)f − f)
  = lim_{ε↓0} T(t − ε) (Af + o(1)) = T(t)Af
Note that our proof in fact even shows a bit more: If g ∈ D(A) we have
u ∈ C 1 ([0, ∞), X) and hence not only u ∈ C([0, ∞), X) but also Au = u̇ ∈
C([0, ∞), X). Hence, if we regard D(A) as a normed space equipped with the graph norm ‖f‖_A := ‖f‖ + ‖Af‖, in which case we will write [D(A)], then
g ∈ D(A) implies u ∈ C([0, ∞), [D(A)]). Similarly, u(t) = T (t)g ∈ D(A) for
t > 0 implies u ∈ C((0, ∞), [D(A)]). Moreover, recall that [D(A)] will be a
Banach space if and only if A is a closed operator (cf. Problem 4.13) and the
latter fact will be established in Corollary 8.6 below.
Before turning to some examples, we establish a useful criterion for a
semigroup to be strongly continuous.
Lemma 8.5. A (semi)group of bounded operators is strongly continuous if and only if lim sup_{ε↓0} ‖T(ε)g‖ < ∞ for every g ∈ X and lim_{ε↓0} T(ε)f = f for f in a dense subset.
Proof. We first show that lim supε↓0 kT (ε)gk < ∞ for every g ∈ X implies
that T (t) is bounded in a small interval [0, δ]. Otherwise there would exist
a sequence εn ↓ 0 with kT (εn )k → ∞. Hence kT (εn )gk → ∞ for some g by
the uniform boundedness principle, a contradiction. Thus there exists some M such that sup_{t∈[0,δ]} ‖T(t)‖ ≤ M. Setting ω := log(M)/δ we even obtain (8.6).
(T (t)f )(x) := f (x + t)
tn ) − f (xn )| = 1.
Next consider

u(t) := T(t)g,  v(t) := ∫_0^t u(s) ds,  g ∈ X.  (8.9)
8.2. Strongly continuous semigroups 243
Proof. Since v(t) ∈ D(A) and lim_{t↓0} (1/t) v(t) = g for arbitrary g, we see that D(A) is dense. Moreover, if f_n ∈ D(A) and f_n → f, Af_n → g, then
T(t)f_n − f_n = ∫_0^t T(s) Af_n ds.

Taking n → ∞ and dividing by t we obtain

(1/t) (T(t)f − f) = (1/t) ∫_0^t T(s) g ds.
Taking t ↓ 0 finally shows f ∈ D(A) and Af = g.
Note that by the closed graph theorem we have D(A) = X if and only if
A is bounded. Moreover, since a C0 -semigroup provides the unique solution
of the abstract Cauchy problem for A, we obtain
Corollary 8.7. A C0 -semigroup is uniquely determined by its generator.
Lemma 8.8. Let A be the generator of a C0 -semigroup and f ∈ C([0, ∞), X).
If the inhomogeneous problem
u̇ = Au + f, u(0) = g, (8.12)
has a solution it is necessarily given by Duhamel’s formula
u(t) = T(t)g + ∫_0^t T(t − s) f(s) ds.  (8.13)
Conversely, this formula gives a solution if either one of the following con-
ditions is satisfied:
• g ∈ D(A) and f ∈ C([0, ∞), [D(A)]).
• g ∈ D(A) and f ∈ C 1 ([0, ∞), X).
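For a scalar equation Duhamel's formula can be checked directly. The sketch below is a hedged illustration: the coefficients a = −1/2, g = 1 and the forcing f = cos are assumptions made here for concreteness, and the Duhamel integral is approximated by a midpoint Riemann sum.

```python
# Hedged scalar illustration of Duhamel's formula (8.13): for u' = a u + f(t),
# u(0) = g, the formula u(t) = e^{at} g + int_0^t e^{a(t-s)} f(s) ds gives the
# solution. The choices a = -1/2, g = 1, f = cos are illustrative assumptions.
import math

a, g = -0.5, 1.0
f = math.cos

def u(t, n=20000):
    # Duhamel integral via the midpoint rule
    h = t / n
    integral = h * sum(math.exp(a * (t - (j + 0.5) * h)) * f((j + 0.5) * h)
                       for j in range(n))
    return math.exp(a * t) * g + integral

t, h = 1.0, 1e-4
du = (u(t + h) - u(t - h)) / (2 * h)    # numerical u'(t)
assert abs(du - (a * u(t) + f(t))) < 1e-3
```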
The function u(t) defined by (8.13) is called the mild solution of the
inhomogeneous problem. In general a mild solution is not a solution:
Example 8.3. Let T(t) be a strongly continuous group with an unbounded generator A (e.g. the one from Example 8.2). Choose f0 ∈ X \ D(A) and set g := 0, f(t) := T(t)f0. Then f ∈ C(R, X) and the mild solution is given by

u(t) = T(t) ∫_0^t T(−s) f(s) ds = T(t) ∫_0^t f0 ds = t T(t)f0.

Since T(t) leaves D(A) invariant, we have u(t) ∉ D(A) for all t ≠ 0 and hence u(t) is not a solution.
Problem* 8.3. Show that a uniformly continuous semigroup has a bounded generator. (Hint: Write T(t) = V(t0)^{−1} V(t0)T(t) = ... with V(t) := ∫_0^t T(s) ds and conclude that it is C^1.)
Problem 8.4. Let T (t) be a C0 -semigroup. Show that if T (t0 ) has a bounded
inverse for one t0 > 0 then it extends to a strongly continuous group.
Problem 8.5. Define a semigroup on L^1(−1, 1) via

(T(t)f)(s) = 2f(s − t) for 0 < s ≤ t, and f(s − t) else,

where we set f(s) = 0 for s < −1. Show that the estimate from Lemma 8.3 does not hold with M < 2.
Problem 8.6. Let A be the generator of a C0-semigroup T(t). Show

T(t)f = f + tAf + ∫_0^t (t − s) T(s) A^2 f ds,  f ∈ D(A^2).
Moreover,

‖R_A(z)‖ ≤ M/(Re(z) − ω),  Re(z) > ω.  (8.18)
Proof. Let us abbreviate R_s(z)f := −∫_0^s e^{−zt} T(t)f dt. Then, by virtue of (8.6), ‖e^{−zt} T(t)f‖ ≤ M e^{(ω−Re(z))t} ‖f‖ shows that R_s(z) is a bounded operator satisfying ‖R_s(z)‖ ≤ M(Re(z) − ω)^{−1}. Moreover, this estimate also shows that the limit R(z) := lim_{s→∞} R_s(z) exists (and still satisfies ‖R(z)‖ ≤ M(Re(z) − ω)^{−1}). Next note that S(t) = e^{−zt} T(t) is a semigroup
with generator A − z (Problem 8.11) and hence for f ∈ D(A) we have
R_s(z)(A − z)f = −∫_0^s S(t)(A − z)f dt = −∫_0^s Ṡ(t)f dt = f − S(s)f.
In particular, taking the limit s → ∞, we obtain R(z)(A − z)f = f for
f ∈ D(A). Similarly, still for f ∈ D(A), by Problem 7.3
(A − z) R_s(z)f = −∫_0^s (A − z) S(t)f dt = −∫_0^s Ṡ(t)f dt = f − S(s)f
and taking limits, using closedness of A, implies (A − z)R(z)f = f for
f ∈ D(A). Finally, if f ∈ X choose fn ∈ D(A) with fn → f . Then
R(z)fn → R(z)f and (A − z)R(z)fn = fn → f proving R(z)f ∈ D(A) and
(A − z)R(z)f = f for f ∈ X.
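The Laplace-transform representation R_A(z) = −∫_0^∞ e^{−zt} T(t) dt from the proof can be tested for a matrix generator. The sketch below uses a 2×2 matrix with spectrum in the left half plane and z = 3 (illustrative assumptions), approximating the integral by a truncated midpoint rule.

```python
# Hedged matrix check of the Laplace-transform formula from the proof above:
# R_A(z) = (A - z)^{-1} = -int_0^infty e^{-zt} T(t) dt with T(t) = e^{tA}.
# The 2x2 matrix A (spectrum in Re < 0) and z = 3 are illustrative choices.
import numpy as np

def expm(M, terms=40):                 # truncated exponential series; fine for small ||M||
    out, term = np.eye(len(M)), np.eye(len(M))
    for j in range(1, terms):
        term = term @ M / j
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, -1.0]])
z, h, n = 3.0, 1e-3, 40000             # midpoint rule on [0, 40]
Eh = expm(h * A)                       # one time step of the semigroup
T_t, acc = expm(0.5 * h * A), np.zeros((2, 2))
for j in range(n):
    acc = acc + np.exp(-z * (j + 0.5) * h) * T_t
    T_t = T_t @ Eh                     # T((j+1.5)h) = T((j+0.5)h) Eh
R = np.linalg.inv(A - z * np.eye(2))
assert np.allclose(-h * acc, R, atol=1e-3)
```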
Corollary 8.10. Let T be a C0 -semigroup with generator A satisfying (8.6).
Then
R_A(z)^{n+1} = ((−1)^{n+1}/n!) ∫_0^∞ t^n e^{−zt} T(t) dt,  Re(z) > ω,  (8.19)
and

‖R_A(z)^n‖ ≤ M/(Re(z) − ω)^n,  Re(z) > ω, n ∈ N.  (8.20)
Proof. Abbreviate R_n(z) := ∫_0^∞ t^n e^{−zt} T(t) dt and note that

(R_n(z + ε) − R_n(z))/ε = −R_{n+1}(z) + ε ∫_0^∞ t^{n+2} φ(εt) e^{−zt} T(t) dt
Given these preparations we can now try to answer the question when
A generates a semigroup. In fact, we will be constructive and obtain the
corresponding semigroup by approximation. To this end we introduce the
Yosida approximation
A_n := −n A R_A(ω + n) = −n − n(ω + n) R_A(ω + n) ∈ L(X).  (8.21)

Of course this is motivated by the fact that this is a valid approximation for numbers: lim_{n→∞} −na/(a − ω − n) = a. That we also get a valid approximation for operators is the content of the next lemma.
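A quick matrix sketch of this convergence (the particular 2×2 matrix A, with eigenvalues −1 and −2, and the choice ω = 0 are illustrative assumptions):

```python
# Hedged matrix sketch of the Yosida approximation (8.21): A_n := -n A R_A(w+n)
# converges to A; the 2x2 matrix A (eigenvalues -1, -2) and w = 0 are assumptions.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
w = 0.0

def yosida(n):
    R = np.linalg.inv(A - (w + n) * np.eye(2))   # R_A(w + n)
    return -n * A @ R

errs = [np.linalg.norm(yosida(n) - A) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 0.05
```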
Proof. Necessity has already been established in Corollaries 8.6 and 8.10.
For the converse we use the semigroups
Tn (t) := exp(tAn )
8.3. Generator theorems 249
Note that in combination with the following lemma this also answers the
question when A generates a C0 -group.
The following examples show that the spectral conditions are indeed cru-
cial. Moreover, they also show that an operator might give rise to a Cauchy
problem which is uniquely solvable for a dense set of initial conditions, with-
out generating a strongly continuous semigroup.
Example 8.4. Let

A = [0 A_0; 0 0],  D(A) = X × D(A_0).
we have

R_A(z) = −(1/z) [1 1/z; 0 1],  T(t) = [1 t; 0 1],

which shows that the bound on the resolvent is crucial.
However, for a given operator even the simple estimate (8.26) might be
difficult to establish directly. Hence we outline another criterion.
Example 8.7. Let X be a Hilbert space and observe that for a contraction semigroup the expression ‖T(t)f‖ must be nonincreasing. Consequently, for f ∈ D(A) we must have

(d/dt) ‖T(t)f‖^2 |_{t=0} = 2 Re⟨f, Af⟩ ≤ 0.
Operators satisfying Re⟨f, Af⟩ ≤ 0 are called dissipative and this clearly suggests replacing the resolvent estimate by dissipativity.
To formulate this condition for Banach spaces, we first introduce the
duality set

J(x) := {x′ ∈ X^∗ | x′(x) = ‖x‖^2 = ‖x′‖^2}  (8.27)
of a given vector x ∈ X. In other words, the elements from J (x) are those
linear functionals which attain their norm at x and are normalized to have
the same norm as x. As a consequence of the Hahn–Banach theorem (Corol-
lary 4.15) note that J (x) is nonempty. Moreover, it is also easy to see that
J (x) is convex and weak-∗ closed.
Example 8.8. Let X be a Hilbert space and identify X with X^∗ via x ↦ ⟨x, ·⟩ as usual. Then J(x) = {x}. Indeed, since we have equality ⟨x′, x⟩ = ‖x′‖ ‖x‖ in the Cauchy–Schwarz inequality, we must have x′ = αx for some α ∈ C with |α| = 1, and α^∗ ‖x‖^2 = ⟨x′, x⟩ = ‖x‖^2 shows α = 1.
Example 8.9. If X^∗ is strictly convex (cf. Problem 1.13), then the duality set contains only one point. In fact, suppose x′, y′ ∈ J(x), then z′ := (1/2)(x′ + y′) ∈ J(x) and (‖x‖/2) ‖x′ + y′‖ = z′(x) = (‖x‖/2)(‖x′‖ + ‖y′‖), implying x′ = y′ by strict convexity. Note that the converse is also true: If x′, y′ ∈ J(x) for
Lemma 8.15. Let x, y ∈ X. Then ‖x‖ ≤ ‖x − αy‖ for all α > 0 if and only if there is an x′ ∈ J(x) such that Re(x′(y)) ≤ 0.
where

G(λ, x, y) := (√λ sinh(√λ))^{−1} · { sinh(√λ(1 − x)) sinh(√λ y), y ≤ x;  sinh(√λ(1 − y)) sinh(√λ x), x ≤ y, }
is in D(A) and satisfies (A − λ)f = g. Note that alternatively one could
compute the norm of the resolvent

‖R_A(λ)‖ = (1/λ) (1 − 1/cosh(√λ/2))

(equality is attained for constant functions; while these are not in X, you can approximate them by choosing functions which are constant on [ε, 1 − ε]).
Example 8.13. Another neat example is the following linear delay differential equation:

u̇(t) = ∫_{−1}^0 u(t + s) dν(s),  t > 0;  u(s) = g(s), −1 ≤ s ≤ 0,
where ν is a complex measure. To this end we introduce the operator

Af := f′,  D(A) := {f ∈ C^1[−1, 0] | f′(0) = ∫_{−1}^0 f(s) dν(s)} ⊂ C[−1, 0].
Suppose that we can show that it generates a semigroup T on X = C[−1, 0] and set u(t) := (T(t)f)(0) for f ∈ D(A). Then, since T leaves D(A) invariant,
the function r 7→ (T (t + r)f )(s − r) is differentiable with
(d/dr) (T(t + r)f)(s − r) = (T(t + r)Af)(s − r) − (T(t + r)f′)(s − r) = 0
and we conclude (T (t + r)f )(s − r) = (T (t)f )(s) for −1 + r ≤ s ≤ 0. In
particular, for r = s we obtain u(t + s) = (T (t)f )(s). Hence we obtain
u̇(t) = (d/dt) (T(t)f)(0) = (A T(t)f)(0) = ∫_{−1}^0 (T(t)f)(s) dν(s) = ∫_{−1}^0 u(t + s) dν(s)
and u solves our delay differential equation. Now if g ∈ C[−1, 0] is given we can approximate it by a sequence f_n ∈ D(A). Then u_n(t) := (T(t)f_n)(0)
will converge uniformly on compact sets to u(t) := (T (t)g)(0) and taking the
limit in the differential equation shows that u is differentiable and satisfies
the differential equation.
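As a hedged numerical sketch, take for dν the point mass at s = −1 (an assumption; the example allows any complex measure), so the equation becomes u̇(t) = u(t − 1). With constant history g ≡ 1 the solution on [0, 1] is u(t) = 1 + t, which explicit Euler reproduces:

```python
# Hedged numerical sketch: take for dν the point mass at s = -1 (an assumption;
# the text allows any complex measure), so the equation reads u'(t) = u(t - 1)
# with history u = g ≡ 1 on [-1, 0]; on [0, 1] the solution is u(t) = 1 + t.
N = 1000                        # Euler steps per unit interval
h = 1.0 / N
u = [1.0] * (N + 1)             # u[j] ~ u(-1 + j h), history g ≡ 1
for j in range(N):              # explicit Euler on [0, 1]
    u.append(u[-1] + h * u[j])  # u(t - 1) lags exactly N steps behind
assert abs(u[-1] - 2.0) < 1e-2
```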
Hence it remains to show that A generates a semigroup. First of all we claim that Ã := A − ‖ν‖ is dissipative, where ‖ν‖ is the total variation of ν. As in the previous example, for ℓ ∈ J(f) we can choose ℓ(g) = f(x_0)^∗ g(x_0), where x_0 is chosen such that |f(x_0)| = ‖f‖_∞. Then
Proof. Recall that A is closable if and only if for every x_n ∈ D(A) with x_n → 0 and Ax_n → y we have y = 0. So let x_n be such a sequence and choose another sequence y_n ∈ D(A) such that y_n → y (which is possible since D(A) is assumed dense). Then by dissipativity (specifically Corollary 8.16)
k(A − λ)(λxn + ym )k ≥ λkλxn + ym k, λ>0
and letting n → ∞ and dividing by λ shows
ky + (λ−1 A − 1)ym k ≥ kym k.
Finally λ → ∞ implies ky − ym k ≥ kym k and m → ∞ yields 0 ≥ kyk, that
is, y = 0 and A is closable. To see that A is dissipative choose x ∈ D(A) and
xn ∈ D(A) with xn → x and Axn → Ax. Then (again using Corollary 8.16)
taking the limit in k(A − λ)xn k ≥ λkxn k shows k(A − λ)xk ≥ λkxk as
required.
Consequently:

(d^n/dz^n) R_A(z) = n! R_A(z)^{n+1},  (d/dz) R_A(z)^n = n R_A(z)^{n+1}.
Problem* 8.14. Consider X = C[0, 1] and A = d/dx with D(A) = C^1[0, 1]. Compute σ(A). Do the same for A_0 = d/dx with D(A_0) = {x ∈ C^1[0, 1] | x(0) = 0}.
Problem 8.15. Suppose z0 ∈ ρ(A) (in particular A is closed; also note that
0 ∈ σ(R_A(z_0)) if and only if A is unbounded). Show that for z ≠ 0 we have σ(A) = z_0 + (σ(R_A(z_0)) \ {0})^{−1} and R_{R_A(z_0)}(z) = −1/z − (1/z^2) R_A(z_0 + 1/z) for z ∈ ρ(R_A(z_0)) \ {0}. Moreover, Ker(R_A(z_0) − z)^n = Ker(A − z_0 − 1/z)^n and Ran(R_A(z_0) − z)^n = Ran(A − z_0 − 1/z)^n for every n ∈ N_0.
Problem 8.16. Show that RA (z) ∈ C (X) for one z ∈ ρ(A) if and only
this holds for all z ∈ ρ(A). Moreover, in this case the spectrum of A consists
only of discrete eigenvalues with finite (geometric and algebraic) multiplicity.
(Hint: Use the previous problem to reduce it to Theorem 6.14.)
Problem 8.17. Consider the heat equation (Example 8.12) on [0, 1] with
Neumann boundary conditions u0 (0) = u0 (1) = 0.
Problem 8.18. Consider the heat equation (Example 8.12) on Cb (R) and
C0 (R).
and

‖K(u)(t) − K(v)(t)‖ ≤ M L ∫_0^t ‖u(s) − v(s)‖ ds ≤ M L t sup_{0≤s≤t} ‖u(s) − v(s)‖.

If t_0 is chosen so small that θ := M L t_0 < 1, then K will be a contraction on B̄_r(0) ⊂ C([0, t_0], X). In particular, for two solutions u_j corresponding to g_j with ‖g_j‖ ≤ ‖g‖ we will have ‖u_1 − u_2‖_∞ ≤ (1/(1 − θ)) ‖g_1 − g_2‖ by (7.44).
This establishes the theorem except for the fact that it only shows unique-
ness for solutions which stay within B̄r (0). However, since K maps from
B̄r (0) to its interior Br (0), a potential different solution starting at g ∈ Br (0)
would need to branch off at the boundary, which is impossible since our so-
lution does not reach the boundary.
If solutions are not global, there is still a unique maximal solution: Fix
g ∈ X and let uj be two solutions on [0, tj ) with 0 < t1 < t2 . By the
uniqueness part of our theorem, we will have u1 (t) = u2 (t) for 0 ≤ t < τ
for some τ > 0. Suppose τ < t1 and τ is chosen maximal. Let r :=
max0≤t≤τ ku1 (t)k and 0 < ε < min(τ, t0 (r)/2) with t0 (r) from our theorem.
Then there is a solution v starting with initial condition u1 (τ − ε) which is
defined on [0, 2ε]. Moreover, again by the uniqueness part of our theorem
u_1(t) = v(t − (τ − ε)) = u_2(t) for τ − ε ≤ t ≤ τ + ε, contradicting our assumption that τ is maximal. Hence taking the union (with respect to
their domain) over all mild solutions starting at g, we get a unique solution
defined on a maximal domain [0, t+ (g)). Note that if t+ (g) < ∞, then ku(t)k
must blow up as t → t+ (g):
Lemma 8.22. Let t+ (g) be the maximal time of existence for the mild so-
lution starting at g. If t+ (g) < ∞, then lim inf t→t+ (g) ku(t)k = ∞.
Proof. Assume that ρ := sup_{0≤t<t_+(g)} ‖u(t)‖ < ∞. As above, choose 0 < ε < min(t_+(g), t_0(ρ)) with t_0(ρ) from our theorem. Then the solution v starting
with initial condition u(t+ (g) − ε) extends u to the interval [0, t+ (g) + ε),
contradicting maximality.
Chapter 9

The nonlinear Schrödinger equation

9.1. Local well-posedness in H^r for r > n/2
Note that the mild solution will be a strong solution for g ∈ H^{r+2} since F : H^{r+2} → H^{r+2} is continuous. Moreover, for each initial condition there
is a maximal solution and Lemma 8.22 implies:
Lemma 9.3. This solution exists on a maximal time interval (t− (g), t+ (g))
and if |t± (g)| < ∞ we must have lim inf t→t± (g) ku(t)k = ∞.
Lemma 9.4. Let g ∈ H^r(R^n) with r > n/2 or g ∈ A(R^n). Let t_{+,r}(g), t_{+,A}(g) be the maximal existence times of the solution with initial condition g in these two cases. Then t_{+,r}(g) = t_{+,A}(g).
Moreover, it suffices to take the sup over functions which have support in a
compact rectangle.
We call a pair (p, r) admissible if

2 ≤ p ≤ ∞ for n = 1,  2 ≤ p < 2n/(n − 2) for n ≥ 2,  and  2/r = n/2 − n/p.  (9.18)
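The algebraic relation in (9.18) determines the time exponent r from n and p; a small helper (purely illustrative, written here with exact rational arithmetic) makes the computation explicit:

```python
# Illustrative helper for the admissibility condition (9.18): given n and p,
# the time exponent r is fixed by 2/r = n/2 - n/p (exact rational arithmetic).
from fractions import Fraction

def admissible_r(n, p):
    rhs = Fraction(n, 2) - Fraction(n, p)      # = 2/r
    return Fraction(2, 1) / rhs if rhs != 0 else None   # None encodes r = infinity (p = 2)

# e.g. n = 3, p = 4: 2/r = 3/2 - 3/4 = 3/4, hence r = 8/3
assert admissible_r(3, 4) == Fraction(8, 3)
assert admissible_r(1, 2) is None
```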
Lemma 9.6. Let TS be the Schrödinger group and let (p, r) be admissible
with p > 2. Then we have
(∫_R (∫_R ‖T_S(t − s) g(s)‖_p ds)^r dt)^{1/r} ≤ C ‖g‖_{L^{r′}(L^{p′})},  (9.19)
Proof. Since the case p = 2 follows from unitarity, we can assume p > 2.
The claims about integrability and the last estimate follow from the lemma.
Using unitarity of TS and Fubini we get
∫_R ∫_{R^n} (T_S(t)f)(x) g(t, x) d^n x dt = ∫_{R^n} f(x) (∫_R (T_S(t) g(t))(x) dt) d^n x,

for g ∈ L^{r′}(R, L^{p′}(R^n)) with support in a compact rectangle.
this case we have g(t) ∈ L2 (Rn ) since p0 ≤ 2. This shows that the first and
second estimate are equivalent upon using the above characterization (9.17)
as well as the analogous characterization for the L2 norm.
9.2. Strichartz estimates 267
which shows that the second and the third estimate are equivalent with a
similar argument as before.
Note that using the scaling f(x) → f(x/λ) for λ > 0 shows that the left-hand side of (9.20) scales like λ^{n/p+2/r} while the right-hand side scales like λ^{n/2}. So (9.20) can only hold if n/p + 2/r = n/2.
In connection with the Duhamel formula the following easy consequence is also worth noting:
Proof. The second estimate is immediate from the lemma and the first
estimate follows from (9.21) upon restricting to functions g supported in [0, t] and using the simple change of variables ∫_0^t T(t − s) g(s) ds = ∫_0^t T(s) g(t − s) ds.
Note that apart from unitarity of TS only (9.14) was used to derive these
estimates. Moreover, since TS commutes with derivatives, we can also get
analogous estimates for derivatives:
as well as

  ‖ ∫_0^t T_S(t − s)g(s) ds ‖_{H^k} ≤ C ‖g‖_{L^{r′}(W^{k,p′})},  (9.28)

  ‖ ∫_0^t T_S(t − s)g(s) ds ‖_{L^r(W^{k,p})} ≤ C ‖g‖_{L^{r′}(W^{k,p′})}.  (9.29)
Proof. Consider dense sets f ∈ S(R^n) and g ∈ C_c(R, S(R^n)). Then we have
for example

  ‖∂_j T_S(t)f‖_{L^r(L^p)} = ‖T_S(t)∂_j f‖_{L^r(L^p)} ≤ C ‖∂_j f‖₂

by applying (9.20) to ∂_j f. Combining the estimates for f and its derivatives
gives (9.25). Similarly for the other estimates.
Problem 9.2. Does the translation group T (t)g(x) := g(x−t) satisfy (9.14)?
Problem 9.3. Let u(t) := T_S(t)g for some g ∈ L¹(R^n). Show that u ∈
C(R \ {0}, C₀(R^n)). (Hint: Lemma 4.33 (iv).)

Problem 9.4. Show that (9.20) can only hold if 2/r = n/2 − n/p. (Hint: Look
for a scaling which leaves the Schrödinger equation invariant.)

Problem 9.5. Prove that there is no triple p, q, t with 1 ≤ q < p < ∞, t ∈ R
such that

  ‖T_S(t)g‖_q ≤ C ‖g‖_p.

(Hint: The translation operator T_a f(x) := f(x − a) commutes with T_S(t).
Moreover, we have

  lim_{|a|→∞} ‖f + T_a f‖_p = 2^{1/p} ‖f‖_p,  1 ≤ p < ∞.
taken care of by unitarity and (9.20)). Since the spatial parts of the space-time
norms must match up, we need p′α = p, that is, p = 1 + α. For the
time part an inequality r′α ≤ r is sufficient since in this case L^{r/α} ⊆ L^{r′} by
Hölder's inequality. This imposes the restriction α ≤ 1 + 4/n. In fact, we will
impose a strict inequality since we will use the contribution from Hölder's
inequality to get a contraction. Moreover, note that the dependence on the
initial condition g is controlled by the L² norm alone and this will imply
that our contraction is uniform (in fact Lipschitz on bounded domains) with
respect to the initial condition in L², and so will be the solution.
Theorem 9.10. Suppose 1 < α < 1 + 4/n and consider the Banach space

  X := C([−t₀, t₀], L²(R^n)) ∩ L^r([−t₀, t₀], L^{α+1}(R^n)),  r = 4(α + 1)/(n(α − 1)),  (9.30)

with norm

  ‖f‖ := sup_{t∈[−t₀,t₀]} ‖f(t)‖₂ + ( ∫_{−t₀}^{t₀} ‖f(t)‖_{α+1}^r dt )^{1/r}.  (9.31)

Then for every g ∈ L²(R^n) there is a t₀ = t₀(‖g‖₂) > 0, such that there is a
unique solution u ∈ X of (9.7). Moreover, the solution map g ↦ u(t) will
be Lipschitz continuous from every ball ‖g‖₂ ≤ ρ to X defined with t₀(ρ).
for u ∈ B̄_a(0). Now we choose a = (2 + C)‖g‖₂ and 2C(2 + C) t₀^θ a^{α−1} < 1
such that

  ‖K(u)‖ ≤ (1 + C)‖g‖₂ + 2C t₀^θ (2 + C)^α ‖g‖₂^α < (2 + C)‖g‖₂ = a.

Similarly we can show that K is a contraction. Invoking (9.23) and (9.24)
we have

  ‖K(u) − K(v)‖ ≤ 2C ( ∫_0^{t₀} ‖ |u(t)|^{α−1} u(t) − |v(t)|^{α−1} v(t) ‖_{(α+1)/α}^{r′} dt )^{1/r′}.

Now using (Problem 9.6)

  | |u|^{α−1} u − |v|^{α−1} v | ≤ α ( |u|^{α−1} + |v|^{α−1} ) |u − v|,  u, v ∈ C,

and invoking the generalized Hölder inequality in the form

  ‖ |u|^{α−1} |u − v| ‖_{(α+1)/α} ≤ ‖ |u|^{α−1} ‖_{(α+1)/(α−1)} ‖u − v‖_{α+1} = ‖u‖_{α+1}^{α−1} ‖u − v‖_{α+1}

and then in the previous form with f₁ = ‖u‖_{α+1}, f₂ = ‖u − v‖_{α+1}, we obtain

  ‖K(u) − K(v)‖ ≤ 2αC ( ∫_0^{t₀} ( (‖u‖_{α+1}^{α−1} + ‖v‖_{α+1}^{α−1}) ‖u − v‖_{α+1} )^{r′} dt )^{1/r′}
    ≤ 2αC t₀^θ 2a^{α−1} ( ∫_0^{t₀} ‖u − v‖_{α+1}^r dt )^{1/r}
    ≤ 4αC t₀^θ a^{α−1} ‖u − v‖.

Hence, decreasing t₀ further (if necessary), such that we also have 4αC t₀^θ a^{α−1} <
1, we get a contraction. Moreover, since ‖K_g(u) − K_f(u)‖ = ‖K_{g−f}(0)‖ ≤
(1 + C)‖g − f‖₂ the uniform contraction principle establishes the theorem.
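The mechanism of the uniform contraction principle used here can be seen in a toy model: a family of contractions K_g with a g-independent contraction constant has fixed points depending Lipschitz continuously on the parameter g. The following numerical sketch uses an artificial scalar map (not the NLS operator) purely to illustrate this.

```python
import math

def fixed_point(K, x0=0.0, tol=1e-14, maxiter=200):
    """Banach iteration x -> K(x) for a contraction on R."""
    x = x0
    for _ in range(maxiter):
        xn = K(x)
        if abs(xn - x) < tol:
            return xn
        x = xn
    return x

# K_g(x) = 0.4 sin(x) + g is a contraction with constant 0.4 uniformly in g,
# so the fixed points satisfy |x(g1) - x(g2)| <= |g1 - g2| / (1 - 0.4).
x1 = fixed_point(lambda x: 0.4 * math.sin(x) + 0.1)
x2 = fixed_point(lambda x: 0.4 * math.sin(x) + 0.2)
```

The Lipschitz bound on the fixed points mirrors the estimate ‖K_g(u) − K_f(u)‖ ≤ (1 + C)‖g − f‖₂ in the proof above.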
Theorem 9.13. Suppose n ≥ 3 and 2 ≤ α < (n + 2)/(n − 2). Consider the Banach
space

  X := C([−t₀, t₀], H¹(R^n)) ∩ L^r([−t₀, t₀], W^{1,p}(R^n)),  (9.32)

where

  p = n(α + 1)/(n + α − 1),  r = 4(α + 1)/((n − 2)(α − 1)),  (9.33)

with norm

  ‖f‖ := sup_{t∈[−t₀,t₀]} ‖f(t)‖_{1,2} + ( ∫_{−t₀}^{t₀} ‖f(t)‖_{1,p}^r dt )^{1/r}.  (9.34)

Then for every g ∈ H¹(R^n) there is a t₀ = t₀(‖g‖_{1,2}) > 0, such that there is
a unique solution u ∈ X of (9.7). Moreover, the solution map g ↦ u(t) will
be Lipschitz continuous from every ball ‖g‖_{1,2} ≤ ρ to X defined with t₀(ρ).
where we have applied the generalized Hölder inequality with 1/p′ = (α − 2)/q + 1/q + 1/p
in the first step (requiring α ≥ 2) and the Gagliardo–Nirenberg–Sobolev
inequality (Theorem 7.18 from [47] – since we need p < n, we need to
require n > 2) with 1/q = 1/p − 1/n in the second step. In particular, this imposes

and hence

  ‖ |u|^{α−1} u ‖_{1,p′} ≤ C̃ ‖u‖_{1,p}^α.
Similarly we obtain

  ≤ αC ( ‖∂u‖_p^{α−1} + ‖∂v‖_p^{α−1} ) ‖u − v‖_p
and

In summary,

Now the rest follows as in the proof of Theorem 9.10. Note that in this case
θ = 1 − (α + 1)/r = (2 + n + (2 − n)α)/4, explaining our upper limit for α.
Note that since we have H¹(R^n) ⊆ L^{α+1}(R^n) for n ≥ 3 and α < (n + 2)/(n − 2)

In the focusing case we need to control the L^{α+1} norm in terms of the
H¹ norm using the Gagliardo–Nirenberg–Sobolev inequality.
Thus

  ‖∂u(t)‖₂² = 2E(0) + (2/(α + 1)) ‖u(t)‖_{α+1}^{α+1}
    ≤ 2E(0) + (2C/(α + 1)) ‖g‖₂^{α+1−n(α−1)/2} ‖∂u(t)‖₂^{n(α−1)/2}.  (9.36)
(i). Now if α < 1 + 4/n, then n(α − 1)/2 < 2 and ‖∂u(t)‖₂ remains bounded.

(ii). In the case α = 1 + 4/n this remains still true if (2C/(2 + 4/n)) ‖g‖₂^{4/n} < 1.

(iii). If α > 1 + 4/n we can choose ‖g‖_{1,2} so small such that the given conditions
hold. Note that this is possible since our above calculation shows

  E(0) ≤ (1/2) ‖∂g‖₂² + (2C/(α + 1)) ‖g‖₂^{α+1−n(α−1)/2} ‖∂g‖₂^{n(α−1)/2}.

Now if we start with ‖∂u(0)‖₂² ≤ 1 and assume ‖∂u(t)‖₂² = 1 we get the
contradiction 1 = ‖∂u(t)‖₂² ≤ 2E(0) + (2C/(α + 1)) ‖g‖₂^{α+1−n(α−1)/2} < 1. Hence
‖∂u(t)‖₂² < 1 as desired.
Problem 9.6. Show that the real derivative (with respect to the identification
C ≅ R²) of F(u) = |u|^{α−1} u is given by

Conclude in particular,

and hence

  |v F″(u) w| ≤ (α − 1)(α + 2) |u|^{α−2} |v| |w|.
Problem 9.7. Show that (9.32) is a Banach space. (Hint: Work with test
functions from Cc∞ .)
Problem 9.8. Suppose f ∈ L^{p₀}(I, L^{q₀}(U)) ∩ L^{p₁}(I, L^{q₁}(U)). Show that
f ∈ L^{p_θ}(I, L^{q_θ}(U)) for θ ∈ [0, 1], where

  1/p_θ = (1 − θ)/p₀ + θ/p₁,  1/q_θ = (1 − θ)/q₀ + θ/q₁.

(Hint: Lyapunov and generalized Hölder inequality — Problem 3.12 from
[47] and Problem 3.9 from [47].)
9.4. Blowup in H 1
In this section we will show that solutions are not always global in the focus-
ing case. For simplicity we will only consider the one-dimensional case. We
first complement Theorem 9.13 with a result for the one-dimensional case.
Theorem 9.16. Let n = 1 and α ≥ 2. For every g ∈ H¹(R) there is a t₀ =
t₀(‖g‖_{1,2}) > 0, such that there is a unique solution u ∈ C([−t₀, t₀], H¹(R))
of (9.7). Moreover, the solution map g ↦ u(t) will be Lipschitz continuous
from every ball ‖g‖_{1,2} ≤ ρ to C([−t₀(ρ), t₀(ρ)], H¹(R)).
and

  ‖∂(F(u) − F(v))‖₂ ≤ (α − 1)(α + 2) ( ‖u‖_∞^{α−2} + ‖v‖_∞^{α−2} ) ‖∂u‖₂ ‖u − v‖₂
    + α ‖v‖_∞^{α−1} ‖∂(u − v)‖₂
Proof. Consider H^{1,1}(R) := H¹(R) ∩ L²(R, x² dx) together with the norm
‖f‖² = ‖f‖₂² + ‖f′‖₂² + ‖x f(x)‖₂². Then T_S(t) is a C⁰ group satisfying
‖T_S(t)f‖ ≤ (1 + 2|t|)‖f‖. Moreover, as in the previous theorem one verifies
that F : H^{1,1}(R) → H^{1,1}(R) is locally Lipschitz on bounded sets. In
fact, note that by

  ‖x(F(u)(x) − F(v)(x))‖₂ ≤ α ( ‖u‖_∞^{α−1} + ‖v‖_∞^{α−1} ) ‖x(u(x) − v(x))‖₂

the Lipschitz constant depends only on the H¹ norm. Hence we get existence
of local solutions. Moreover, using (9.7) we obtain

  ‖u(t)‖ ≤ (1 + 2t)‖g‖ + α ∫_0^t (1 + 2(t − s)) ‖u(s)‖_{1,2}^{α−1} ‖u(s)‖ ds,

and Gronwall's inequality implies

  ‖u(t)‖ ≤ (1 + 2t)‖g‖ exp( α ∫_0^t (1 + 2(t − s)) ‖u(s)‖_{1,2}^{α−1} ds ).
Now we are ready to establish blowup for the focusing NLS equation.
Theorem 9.19. Consider the one-dimensional focusing NLS equation with
α ≥ 5. Let g ∈ H¹(R) ∩ L²(R, x² dx) with negative energy E < 0. Then the
corresponding maximal mild solution u satisfies t₊(g) < ∞.
Notice that there are initial conditions with negative energy, since the
two contributions to the energy scale differently. In particular, the energy
will become negative if we scale g with a sufficiently large factor.
Problem 9.10. Show that the one-dimensional focusing NLS equation has
global solutions in H¹(R) if either α < 5 or α = 5 and ‖g‖₂ ≤ (3/4)^{1/4} or α > 5
and ‖g‖_{1,2} sufficiently small. (Hint: Use the estimate from Problem 9.9.)
which shows that, if we flip the sign in front of the nonlinearity (defocusing
case), there is only the trivial solution.
In one dimension one has the explicit solution

  ϕ(x) = ( √(1 + β) / cosh(βx) )^{1/β},  β = (α − 1)/2.  (9.46)
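One can verify numerically that this profile satisfies the standing-wave equation; here we assume the normalized form ϕ″ = ϕ − ϕ^α (the usual form of the nonlinear elliptic problem referred to in this section), and compare a second-order finite difference against it. This is only an illustrative check, not part of the text.

```python
import math

def phi(x, alpha):
    """The explicit 1D profile from (9.46): (sqrt(1+beta)/cosh(beta x))^(1/beta)."""
    beta = (alpha - 1) / 2
    return (math.sqrt(1 + beta) / math.cosh(beta * x)) ** (1 / beta)

def residual(x, alpha, h=1e-4):
    """Finite-difference residual of phi'' - phi + phi^alpha at x."""
    d2 = (phi(x + h, alpha) - 2 * phi(x, alpha) + phi(x - h, alpha)) / h ** 2
    return d2 - phi(x, alpha) + phi(x, alpha) ** alpha
```

The residual is of size O(h²) at every sample point, for both the cubic (α = 3) and quintic (α = 5) cases.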
In higher dimensions we can apply Theorem 7.26 to get existence of solutions:
Theorem 9.20. Suppose n ≥ 2 and 1 < α < (n + 2)/(n − 2). Then the nonlinear
elliptic problem (9.46) has a weak positive radial solution in H¹(R^n).
L^{α+1}_{rad}(R^n, R) and note that the Strauss lemma (Problem 7.31 from [47])
F(|u₀|) = F(u₀) and hence |u₀| is also a minimizer. Rescaling this solution
according to ϕ(x) = λ^{1/(α−1)} |u₀(x)| establishes the claim.
10.1. Introduction
Many applications lead to the problem of finding zeros of a mapping f : U ⊆
X → X, where X is some (real) Banach space. That is, we are interested in
the solutions of
f (x) = 0, x ∈ U. (10.1)
In most cases it turns out that this is too much to ask for, since determining
the zeros analytically is in general impossible.
Hence one has to ask some weaker questions and hope to find answers
for them. One such question would be “Are there any solutions, respectively,
how many are there?”. Luckily, these questions allow some progress.
To see how, let's consider the case f ∈ H(C), where H(U) denotes the set
of holomorphic functions on a domain U ⊂ C. Recall the concept of the
winding number from complex analysis. The winding number of a path
γ : [0, 1] → C \ {z₀} around a point z₀ ∈ C is defined by

  n(γ, z₀) := (1/(2πi)) ∫_γ dz/(z − z₀) ∈ Z.  (10.2)
It gives the number of times γ encircles z0 taking orientation into account.
That is, encirclings in opposite directions are counted with opposite signs.
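The defining integral is easy to evaluate numerically by summing angle increments along a sampled closed path. The following sketch (an illustration, with names of our choosing) recovers the winding number, and by applying a holomorphic map to the samples it also counts zeros inside the path.

```python
import cmath

def winding(path, z0):
    """Approximate n(path, z0) = (1/2 pi i) ∮ dz/(z - z0) for a closed sampled
    path by summing principal-branch angle increments between consecutive
    points; valid when consecutive samples subtend angles < pi at z0."""
    total = 0.0
    n = len(path)
    for k in range(n):
        a = path[k] - z0
        b = path[(k + 1) % n] - z0
        total += cmath.log(b / a).imag
    return round(total / (2 * cmath.pi))
```

For the unit circle around 0 this returns 1, while the image path under z ↦ z² winds twice, in accordance with the argument principle discussed next.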
In particular, if we pick f ∈ H(C) one computes (assuming 0 ∉ f(γ))

  n(f(γ), 0) = (1/(2πi)) ∫_γ f′(z)/f(z) dz = Σ_k n(γ, z_k) α_k,  (10.3)
(1 − t)f(z) + t g(z) is the required homotopy since |f(z) − g(z)| < |g(z)|,
|z| = 1, implying H(t, z) ≠ 0 on [0, 1] × γ. Hence f(z) has one zero inside
the unit circle.
Summarizing, given a (sufficiently smooth) domain U with enclosing Jor-
dan curve ∂U , we have defined a degree deg(f, U, z0 ) = n(f (∂U ), z0 ) =
n(f (∂U ) − z0 , 0) ∈ Z which counts the number of solutions of f (z) = z0
inside U . The invariance of this degree with respect to certain deformations
of f allowed us to explicitly compute deg(f, U, z0 ) even in nontrivial cases.
Our ultimate goal is to extend this approach to continuous functions
f : Rn → Rn . However, such a generalization runs into several problems.
First of all, it is unclear how one should define the multiplicity of a zero. But
even more severe is the fact, that the number of zeros is unstable with respect
to small perturbations. For example, consider f_ε : [−1, 2] → R, x ↦ x² − ε.
Then f_ε has no zeros for ε < 0, one zero for ε = 0, two zeros for 0 < ε ≤ 1,
one for 1 < √ε ≤ 2, and none for √ε > 2. This shows the following facts.
(i) Zeros with f′ ≠ 0 are stable under small perturbations.
(ii) The number of zeros can change if two zeros with opposite sign
change (i.e., opposite signs of f′) run into each other.
(iii) The number of zeros can change if a zero drops over the boundary.
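The instability of the zero count is easy to observe numerically; a brute-force sign-change count (an illustrative sketch, adequate for simple zeros) reproduces the values listed above for f_ε on [−1, 2].

```python
def count_zeros(f, a, b, n=20001):
    """Count zeros of f on [a, b] by counting sign changes (plus exact grid
    zeros) on a uniform grid; adequate for simple zeros away from the grid."""
    xs = [a + (b - a) * k / (n - 1) for k in range(n)]
    vals = [f(x) for x in xs]
    hits = sum(1 for v in vals if v == 0.0)
    hits += sum(1 for v, w in zip(vals, vals[1:]) if v * w < 0)
    return hits
```

Note that the double zero at ε = 0 is invisible to a sign-change count, which is exactly the degree-theoretic point: it carries no stable contribution.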
Hence we see that we cannot expect too much from our degree. In addition,
since it is unclear how it should be defined, we will first require some basic
properties a degree should have and then we will look for functions satisfying
these properties.
is positive for f ∈ C̄_y(U, R^n) and thus C̄_y(U, R^n) is an open subset of
C(Ū, R^n).
Now that these things are out of the way, we come to the formulation of
the requirements for our degree.
A function deg which assigns to each f ∈ C̄_y(U, R^n), y ∈ R^n, a real number
deg(f, U, y) will be called a degree if it satisfies the following conditions.
(D1). deg(f, U, y) = deg(f − y, U, 0) (translation invariance).
(D2). deg(I, U, y) = 1 if y ∈ U (normalization).
(D3). If U₁, U₂ are open, disjoint subsets of U such that y ∉ f(U \ (U₁ ∪ U₂)),
then deg(f, U, y) = deg(f, U₁, y) + deg(f, U₂, y) (additivity).
(D4). If H(t) = (1 − t)f + tg ∈ C̄y (U, Rn ), t ∈ [0, 1], then deg(f, U, y) =
deg(g, U, y) (homotopy invariance).
Before we draw some first conclusions from this definition, let us discuss
the properties (D1)–(D4) first. (D1) is natural since deg(f, U, y) should
have something to do with the solutions of f (x) = y, x ∈ U , which is the
same as the solutions of f (x) − y = 0, x ∈ U . (D2) is a normalization
since any multiple of deg would also satisfy the other requirements. (D3)
is also quite natural since it requires deg to be additive with respect to
components. In addition, it implies that sets where f ≠ y do not contribute.
(D4) is not that natural since it already rules out the case where deg is the
cardinality of f −1 ({y}). On the other hand it will give us the ability to
compute deg(f, U, y) in several cases.
Theorem 10.1. Suppose deg satisfies (D1)–(D4) and let f, g ∈ C̄y (U, Rn ),
then the following statements hold.
(i). We have deg(f, ∅, y) = 0. Moreover, if U_i, 1 ≤ i ≤ N, are disjoint
open subsets of U such that y ∉ f(U \ ∪_{i=1}^N U_i), then deg(f, U, y) =
Σ_{i=1}^N deg(f, U_i, y).

(ii). If y ∉ f(U), then deg(f, U, y) = 0 (but not the other way round).
Equivalently, if deg(f, U, y) ≠ 0, then y ∈ f(U).
(iii). If |f (x)−g(x)| < |f (x)−y|, x ∈ ∂U , then deg(f, U, y) = deg(g, U, y).
In particular, this is true if f (x) = g(x) for x ∈ ∂U .
Proof. For the first part of (i) use (D3) with U1 = U and U2 = ∅. For
the second part use U2 = ∅ in (D3) if N = 1 and the rest follows from
induction. For (ii) use N = 1 and U1 = ∅ in (i). For (iii) note that H(t, x) =
(1 − t)f (x) + t g(x) satisfies |H(t, x) − y| ≥ dist(y, f (∂U )) − |f (x) − g(x)| for
x on the boundary.
Proof. For (i) it suffices to show that deg(., U, y) is locally constant. But
if |g − f | < dist(y, f (∂U )), then deg(f, U, y) = deg(g, U, y) by (D4) since
|H(t) − y| ≥ |f − y| − |g − f | > 0, H(t) = (1 − t)f + t g. The proof of (ii) is
similar.
  deg(f, U, 0) = Σ_{i=1}^N deg(f, U(x_i), 0).  (10.9)
It suffices to consider one of the zeros, say x1 . Moreover, we can even assume
x1 = 0 and U (x1 ) = Bδ (0). Next we replace f by its linear approximation
around 0. By the definition of the derivative we have
f (x) = df (0)x + |x|r(x), r ∈ C(Bδ (0), Rn ), r(0) = 0. (10.10)
Now consider the homotopy H(t, x) = df (0)x + (1 − t)|x|r(x). In order
to conclude deg(f, B_δ(0), 0) = deg(df(0), B_δ(0), 0) we need to show 0 ∉
H(t, ∂B_δ(0)). Since J_f(0) ≠ 0 we can find a constant λ such that |df(0)x| ≥
λ|x| and since r(0) = 0 we can decrease δ such that |r| < λ. This implies
|H(t, x)| ≥ ||df (0)x| − (1 − t)|x||r(x)|| ≥ λδ − δ|r| > 0 for x ∈ ∂Bδ (0) as
desired.
In summary we have

  deg(f, U, 0) = Σ_{i=1}^N deg(df(x_i), B_δ(0), 0)  (10.11)
Using this lemma we can now show the main result of this section.
Theorem 10.4. Suppose f ∈ C̄_y¹(U, R^n) and y ∉ CV(f), then a degree
satisfying (D1)–(D4) satisfies

  deg(f, U, y) = Σ_{x∈f⁻¹({y})} sign J_f(x),  (10.12)

where the sum is finite and we agree to set Σ_{x∈∅} = 0.
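For a concrete instance of (10.12), take the complex squaring map viewed as a map of R². At the regular value (1/4, 0) it has the two preimages ±(1/2, 0), each with positive Jacobian determinant, so the degree on a disk containing both is 2. A small numerical sketch (helper names are ours):

```python
def jac_det(f, x, h=1e-6):
    """Central-difference Jacobian determinant of f: R^2 -> R^2 at x."""
    cols = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(2)])
    return cols[0][0] * cols[1][1] - cols[0][1] * cols[1][0]

def f(v):
    """z -> z^2 written in real coordinates."""
    x, y = v
    return (x * x - y * y, 2 * x * y)

# degree at the regular value y = (0.25, 0): sum of Jacobian signs over preimages
preimages = [(0.5, 0.0), (-0.5, 0.0)]
deg = sum(1 if jac_det(f, p) > 0 else -1 for p in preimages)
```

Both signs are +1 (here J_f = 4(x² + y²) > 0), matching the winding-number count for z² from Section 10.1.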
Up to this point we have only shown that a degree (provided there is one
at all) necessarily satisfies (10.12). Once we have shown that regular values
are dense, it will follow that the degree is uniquely determined by (10.12)
since the remaining values follow from point (iii) of Theorem 10.1. On the
other hand, we don’t even know whether a degree exists since it is unclear
whether (10.12) satisfies (D4). Hence we need to show that (10.12) can be
extended to f ∈ C̄y (U, Rn ) and that this extension satisfies our requirements
(D1)–(D4).
Proof. Since the claim is easy for linear mappings our strategy is as follows.
We divide U into sufficiently small subsets. Then we replace f by its linear
approximation in each subset and estimate the error.
Let CP(f) := {x ∈ U | J_f(x) = 0} be the set of critical points of f. We
first pass to cubes which are easier to divide. Let {Q_i}_{i∈N} be a countable
cover for U consisting of open cubes such that Q_i ⊂ U. Then it suffices
to prove that f(CP(f) ∩ Q_i) has zero measure since CV(f) = f(CP(f)) =
∪_i f(CP(f) ∩ Q_i) (the Q_i's are a cover).
Let Q be any one of these cubes and denote by ρ the length of its edges.
Fix ε > 0 and divide Q into N^n cubes Q_i of length ρ/N. These cubes don't
have to be open and hence we can assume that they cover Q. Since df(x) is
uniformly continuous on Q we can find an N (independent of i) such that

  |f(x) − f(x̃) − df(x̃)(x − x̃)| ≤ ∫_0^1 |df(x̃ + t(x − x̃)) − df(x̃)| |x̃ − x| dt ≤ ερ/N  (10.13)
for x̃, x ∈ Qi . Now pick a Qi which contains a critical point x̃i ∈ CP(f ).
Without restriction we assume x̃_i = 0, f(x̃_i) = 0 and set M := df(x̃_i). Since
det M = 0 there is an orthonormal basis {b_i}_{1≤i≤n} of R^n such that b_n is
(e.g., C := max_{x∈Q} |df(x)|). Next, by our estimate (10.13) we even have

  f(Q_i) ⊆ { Σ_{i=1}^n λ_i b_i : |λ_i| ≤ (C + ε)√n ρ/N, |λ_n| ≤ ε√n ρ/N }

and hence the measure of f(Q_i) is smaller than C̃ε/N^n. Since there are at most
N^n such cubes, the measure of f(CP(f) ∩ Q) is smaller than C̃ε.
δ_y(.) is the Dirac distribution at y. But since we don't want to mess with
distributions, we replace δ_y(.) by φ_ε(. − y), where {φ_ε}_{ε>0} is a family of
functions such that φ_ε is supported on the ball B_ε(0) of radius ε around 0
and satisfies ∫_{R^n} φ_ε(x) d^n x = 1.
Lemma 10.6 (Heinz). Suppose f ∈ C̄_y¹(U, R^n) and y ∉ CV(f). Then the
degree defined as in (10.12) satisfies

  deg(f, U, y) = ∫_U φ_ε(f(x) − y) J_f(x) d^n x  (10.14)
for all positive ε smaller than a certain ε0 depending on f and y. Moreover,
supp(φε (f (.) − y)) ⊂ U for ε < dist(y, f (∂U )).
  ∫_U φ_ε(f(x) − y) J_f(x) d^n x = Σ_{i=1}^N ∫_{U(x_i)} φ_ε(f(x) − y) J_f(x) d^n x
    = Σ_{i=1}^N sign(J_f(x_i)) ∫_{B_{ε₀}(0)} φ_ε(x̃) d^n x̃ = deg(f, U, y),
Our new integral representation makes sense even for critical values. But
since ε0 depends on f and y, continuity is not clear. This will be tackled
next.
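In one dimension the integral representation can be tested directly: for f(x) = x³ − x on U = (−2, 2) and y = 0 the three zeros contribute signs +1, −1, +1, so the integral should converge to 1 as the bump shrinks. A crude Riemann-sum sketch (illustrative only, with a numerically normalized bump):

```python
import math

def bump(t, eps):
    """Unnormalized C-infinity bump supported in (-eps, eps)."""
    u = t / eps
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def degree_integral(f, df, a, b, eps=0.05, n=200000):
    """Riemann-sum version of deg = integral of phi_eps(f(x)) f'(x) dx, with
    phi_eps normalized numerically on the same midpoint grid."""
    dx = (b - a) / n
    xs = [a + (k + 0.5) * dx for k in range(n)]
    Z = sum(bump(x, eps) for x in xs) * dx
    I = sum(bump(f(x), eps) * df(x) for x in xs) * dx
    return I / Z
```

Near each simple zero the substitution u = f(x) turns the integral into ±∫φ_ε, which is exactly the sign count of (10.12).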
The key idea is to show that the integral representation is independent
of ε as long as ε < dist(y, f (∂U )). To this end we will rewrite the difference
as an integral over a divergence supported in U and then apply the Gauss–
Green theorem. For this purpose the following result will be used.
Proof. We compute

  div D_f(u) = Σ_{j=1}^n ∂_{x_j} D_f(u)_j = Σ_{j,k=1}^n D_f(u)_{j,k},

where D_f(u)_{j,k} is the determinant of the matrix obtained from the matrix
associated with D_f(u)_j by applying ∂_{x_j} to the k-th column. Since ∂_{x_j}∂_{x_k} f =
∂_{x_k}∂_{x_j} f we infer D_f(u)_{j,k} = −D_f(u)_{k,j}, j ≠ k, by exchanging the k-th and
the j-th column. Hence

  div D_f(u) = Σ_{i=1}^n D_f(u)_{i,i}.

Now let J_f^{(i,j)}(x) denote the (i, j) cofactor of df(x) and recall the cofactor
expansion of the determinant Σ_{i=1}^n J_f^{(i,j)} ∂_{x_i} f_k = δ_{j,k} J_f. Using this to expand
as required.
where f̃ ∈ C̄_y¹(U, R^n) is in the same component of C̄_y(U, R^n), say ‖f − f̃‖_∞ <
dist(y, f(∂U)), such that y ∈ RV(f̃).
Proof. We will first show that our integral formula works in fact for all
ε < ρ := dist(y, f(∂U)). For this we will make some additional assumptions:
Let f ∈ C̄²(U, R^n) and choose a family of functions φ_ε ∈ C^∞((0, ∞)) with
supp(φ_ε) ⊂ (0, ε) such that S_n ∫_0^ε φ_ε(r) r^{n−1} dr = 1. Consider

  I_ε(f, U, y) := ∫_U φ_ε(|f(x) − y|) J_f(x) d^n x.

Then I := I_{ε₁} − I_{ε₂} will be of the same form but with φ_ε replaced by ϕ :=
φ_{ε₁} − φ_{ε₂}, where ϕ ∈ C^∞((0, ∞)) with supp(ϕ) ⊂ (0, ρ) and ∫_0^ρ ϕ(r) r^{n−1} dr =
0. To show that I = 0 we will use our previous lemma with u chosen such
that div(u(x)) = ϕ(|x|). To this end we make the ansatz u(x) = ψ(|x|) x
such that div(u(x)) = |x| ψ′(|x|) + n ψ(|x|). Our requirement now leads to
an ordinary differential equation whose solution is

  ψ(r) = (1/r^n) ∫_0^r s^{n−1} ϕ(s) ds.
Moreover, one checks ψ ∈ C^∞((0, ∞)) with supp(ψ) ⊂ (0, ρ). Thus our
lemma shows

  I = ∫_U div D_{f−y}(u) d^n x

and since the integrand vanishes in a neighborhood of ∂U we can extend it
to all of R^n by setting it zero outside U and choose a cube Q ⊃ U. Then
elementary coordinatewise integration gives I = ∫_Q div D_{f−y}(u) d^n x = 0.
(1 − t)f(x) − tx must have a zero (t₀, x₀) ∈ (0, 1) × ∂U and hence f(x₀) =
(t₀/(1 − t₀)) x₀. Otherwise, if deg(f, U, 0) = −1 we can apply the same argument to
where the sum is even since for every x ∈ f⁻¹(0) \ {0} we also have −x ∈
f⁻¹(0) \ {0} as well as J_f(x) = J_f(−x).
Hence we need to reduce the general case to this one. Clearly if f ∈
C̄₀(U, R^n) we can choose an approximating f₀ ∈ C̄₀¹(U, R^n) and replacing f₀
by its odd part ½(f₀(x) − f₀(−x)) we can assume f₀ to be odd. Moreover, if
J_{f₀}(0) = 0 we can replace f₀ by f₀(x) + δx such that 0 is regular. However,
if we choose a nearby regular value y and consider f0 (x) − y we have the
problem that constant functions are even. Hence we will try the next best
thing and perturb by a function which is constant in all except one direction.
To this end we choose an odd function ϕ ∈ C¹(R) such that ϕ′(0) = 0 (since
we don't want to alter the behavior at 0) and ϕ(t) ≠ 0 for t ≠ 0. Now we
consider f₁(x) = f₀(x) − ϕ(x₁) y¹ and note

  df₁(x) = df₀(x) − dϕ(x₁) y¹ = df₀(x) − dϕ(x₁) f₀(x)/ϕ(x₁) = ϕ(x₁) d( f₀(x)/ϕ(x₁) )

for every x ∈ U₁ := {x ∈ U | x₁ ≠ 0} with f₁(x) = 0. Hence if y¹ is chosen
such that y¹ ∈ RV(h₁), where h₁ : U₁ → R^n, x ↦ f₀(x)/ϕ(x₁), then 0 will be
At first sight the obvious conclusion that an odd function has a zero
does not seem too spectacular since the fact that f is odd already implies
f (0) = 0. However, the result gets more interesting upon observing that it
suffices when the boundary values are odd. Moreover, local constancy of the
degree implies that f does not only attain 0 but also any y in a neighborhood
of 0. The next two important consequences are based on this observation:
This theorem is often illustrated by the fact that there are always two
opposite points on the earth which have the same weather (in the sense that
they have the same temperature and the same pressure). In a similar manner
one can also derive the invariance of domain theorem.
Proof. Suppose there were such a map and extend it to a map from U to
R^n by setting the additional coordinates equal to zero. The resulting map
contradicts the invariance of domain theorem.
f ◦ R ∈ C(B̄ρ (0), B̄ρ (0)). By our previous analysis, there is a fixed point
x = f˜(x) ∈ conv(f (K)) ⊆ K.
Proof. We equip R^n with the norm |x|₁ := Σ_{j=1}^n |x_j| and set ∆ := {x ∈
R^n | x_j ≥ 0, |x|₁ = 1}. Then

  f : ∆ → ∆,  x ↦ Ax/|Ax|₁
has a fixed point x₀ by the Brouwer fixed point theorem. Then Ax₀ =
|Ax₀|₁ x₀ and x₀ has positive components since A^m x₀ = |Ax₀|₁^m x₀ has.
For each vertex vi in this subdivision pick an element yi ∈ f (vi ). Now de-
fine f k (vi ) = yi and extend f k to the interior of each subsimplex as before.
  x^k = Σ_{i=1}^m λ_i^k v_i^k = Σ_{i=1}^m λ_i^k y_i^k,  y_i^k = f^k(v_i^k).  (10.18)
If f(x) contains precisely one point for all x, then Kakutani's theorem
reduces to Brouwer's theorem (show that the closedness of Γ is equivalent
to continuity of f).
Now we want to see how this applies to game theory.
An n-person game consists of n players who have mi possible actions to
choose from. The set of all possible actions for the i-th player will be denoted
by Φi = {1, . . . , mi }. An element ϕi ∈ Φi is also called a pure strategy for
reasons to become clear in a moment. Once all players have chosen their
move ϕi , the payoff for each player is given by the payoff function
  R_i(ϕ) ∈ R,  ϕ = (ϕ₁, . . . , ϕ_n) ∈ Φ = ∏_{i=1}^n Φ_i  (10.19)
of the i-th player. We will consider the case where the game is repeated a
large number of times and where in each step the players choose their action
according to a fixed strategy. Here a strategy s_i for the i-th player is a
probability distribution on Φ_i, that is, s_i = (s_i^1, . . . , s_i^{m_i}) such that s_i^k ≥ 0
and Σ_{k=1}^{m_i} s_i^k = 1. The set of all possible strategies for the i-th player is
denoted by S_i. The number s_i^k is the probability for the k-th pure strategy
to be chosen. Consequently, if s = (s₁, . . . , s_n) ∈ S = ∏_{i=1}^n S_i is a collection
of strategies, then the probability that a given collection of pure strategies
gets chosen is
  s(ϕ) = ∏_{i=1}^n s_i(ϕ),  s_i(ϕ) = s_i^{k_i},  ϕ = (k₁, . . . , k_n) ∈ Φ  (10.20)
(assuming all players make their choice independently) and the expected
payoff for player i is

  R_i(s) = Σ_{ϕ∈Φ} s(ϕ) R_i(ϕ).  (10.21)
By construction, Ri : S → R is polynomial and hence in particular continu-
ous.
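Formula (10.21) is a straightforward weighted sum over all pure-strategy profiles. A direct implementation for a finite game might look as follows (an illustration, with matching pennies as the test case):

```python
from itertools import product

def expected_payoff(R, strategies):
    """R_i(s) = sum over profiles phi of s(phi) R_i(phi), where s(phi) is the
    product of the players' probabilities as in (10.20); R maps pure-strategy
    profiles (tuples of action indices) to payoffs."""
    total = 0.0
    for phi in product(*(range(len(s)) for s in strategies)):
        prob = 1.0
        for s, k in zip(strategies, phi):
            prob *= s[k]
        total += prob * R[phi]
    return total
```

For matching pennies the uniform mixed strategies give expected payoff 0, which is the Nash equilibrium payoff of that game.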
The question is of course, what is an optimal strategy for a player? If
the other strategies are known, a best reply of player i against s would be
a strategy s_i satisfying

  R_i(s \ s_i) = max_{s̃_i∈S_i} R_i(s \ s̃_i).  (10.22)
Of course, both players could get the payoff 1 if they both agree to cooperate.
But if one would break this agreement in order to increase his payoff, the
other one would get less. Hence it might be safer to defect.
Now that we have seen that Nash equilibria are a useful concept, we
want to know when such an equilibrium exists. Luckily we have the following
result.
Theorem 10.17 (Nash). Every n-person game has at least one Nash equi-
librium.
  deg(g ◦ f, U, y) = Σ_j deg(f, U, G_j) deg(g, G_j, y),  (10.27)

where only finitely many terms in the sum are nonzero (and in particular,
summands corresponding to unbounded components are considered to
be zero).
straightforward calculation

  deg(g ◦ f, U, y) = Σ_{x∈(g◦f)⁻¹({y})} sign(J_{g◦f}(x))
    = Σ_{x∈(g◦f)⁻¹({y})} sign(J_g(f(x))) sign(J_f(x))
    = Σ_{z∈g⁻¹({y})} Σ_{x∈f⁻¹({z})} sign(J_g(z)) sign(J_f(x))
    = Σ_{z∈g⁻¹({y})} sign(J_g(z)) deg(f, U, z)
Now choose f̃ ∈ C¹ such that |f(x) − f̃(x)| < 2⁻¹ dist(g⁻¹({y}), f(∂U))
for x ∈ U and define G̃_j, L̃_l accordingly. Then we have L_l ∩ g⁻¹({y}) =
L̃_l ∩ g⁻¹({y}) by Theorem 10.1 (iii) and hence deg(g, L̃_l, y) = deg(g, L_l, y)
by Theorem 10.1 (i) implying

  deg(g ◦ f, U, y) = deg(g ◦ f̃, U, y) = Σ_{j=1}^{m̃} deg(f̃, U, G̃_j) deg(g, G̃_j, y)
    = Σ_{l≠0} l deg(g, L̃_l, y) = Σ_{l≠0} l deg(g, L_l, y)
    = Σ_{l≠0} Σ_{k=1}^{m_l} l deg(g, G_{j_k^l}, y) = Σ_{j=1}^m deg(f, U, G_j) deg(g, G_j, y)
The Leray–Schauder
mapping degree
Our next aim is to tackle the infinite dimensional case. The following
example due to Kakutani shows that the Brouwer fixed point theorem (and
hence also the Brouwer degree) does not generalize to infinite dimensions
directly.
Example 11.1. Let X be the Hilbert space ℓ²(N) and let R be the right
shift given by Rx := (0, x₁, x₂, . . . ). Define

  f : B̄₁(0) → B̄₁(0),  x ↦ √(1 − ‖x‖²) δ¹ + Rx = (√(1 − ‖x‖²), x₁, x₂, . . . ).

Then a short calculation shows ‖f(x)‖² = (1 − ‖x‖²) + ‖x‖² = 1 and any
fixed point must satisfy ‖x‖ = 1, x₁ = √(1 − ‖x‖²) = 0 and x_{j+1} = x_j, j ∈ N,
giving the contradiction x_j = 0, j ∈ N.
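The norm identity in this example is easy to confirm on finite truncations (the map sends an n-vector to an (n+1)-vector, mirroring the shift); this sketch is only an illustration of the computation above.

```python
import math

def f_shift(x):
    """Finite-dimensional snapshot of x -> (sqrt(1 - ||x||^2), x_1, x_2, ...)."""
    nrm2 = sum(v * v for v in x)
    return [math.sqrt(max(0.0, 1.0 - nrm2))] + list(x)
```

Every point of the closed unit ball lands on the unit sphere, which is why a fixed point would force the contradictory conditions listed above.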
However, by the reduction property we expect that the degree should
hold for functions of the type I + F , where F has finite dimensional range.
In fact, it should work for functions which can be approximated by such
functions. Hence as a preparation we will investigate this class of functions.
Proof. Pick {x_i}_{i=1}^n ⊆ K such that ∪_{i=1}^n B_ε(x_i) covers K. Let {φ_i}_{i=1}^n be
a partition of unity (restricted to K) subordinate to {B_ε(x_i)}_{i=1}^n, that is,
φ_i ∈ C(K, [0, 1]) with supp(φ_i) ⊂ B_ε(x_i) and Σ_{i=1}^n φ_i(x) = 1, x ∈ K. Set

  P_ε(x) = Σ_{i=1}^n φ_i(x) x_i,

then

  |P_ε(x) − x| = | Σ_{i=1}^n φ_i(x) x − Σ_{i=1}^n φ_i(x) x_i | ≤ Σ_{i=1}^n φ_i(x) |x − x_i| ≤ ε.
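The Schauder projection P_ε can be implemented verbatim with hat functions φ_i(x) = max(0, ε − |x − x_i|) on an ε-net; here is a sketch in R² (the net and sample points are hypothetical choices of ours).

```python
import math

def schauder_projection(net, eps):
    """P_eps(x) = sum_i phi_i(x) x_i with a hat-function partition of unity on
    the eps-net; assumes every x of interest lies within eps of some net point."""
    def P(x):
        w = [max(0.0, eps - math.dist(x, xi)) for xi in net]
        s = sum(w)
        return tuple(sum(wi * xi[k] for wi, xi in zip(w, net)) / s
                     for k in range(len(x)))
    return P
```

Since P_ε(x) is a convex combination of net points at distance less than ε from x, the bound |P_ε(x) − x| ≤ ε of the lemma is immediate.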
x_{n_m} + F(x_{n_m}) such that y_{n_m} → y. As before this implies x_{n_m} → x and thus
(I + F)⁻¹(K) is compact.
Proof. Except for (iv) all statements follow easily from the definition of the
degree and the corresponding property for the degree in finite dimensional
spaces. Considering H(t, x) − y(t), we can assume y(t) = 0 by (i). Since
H([0, 1], ∂U ) is compact, we have ρ = dist(y, H([0, 1], ∂U )) > 0. By Theo-
rem 11.2 we can pick H1 ∈ F([0, 1] × U, X) such that |H(t) − H1 (t)| < ρ,
t ∈ [0, 1]. This implies deg(I + H(t), U, 0) = deg(I + H1 (t), U, 0) and the rest
follows from Theorem 10.2.
In addition, Theorem 10.1 and Theorem 10.2 hold for the new situation
as well (no changes are needed in the proofs).
Theorem 11.5. Let F, G ∈ C̄_y(U, X), then the following statements hold.

(i). We have deg(I + F, ∅, y) = 0. Moreover, if U_i, 1 ≤ i ≤ N, are
disjoint open subsets of U such that y ∉ (I + F)(U \ ∪_{i=1}^N U_i), then
deg(I + F, U, y) = Σ_{i=1}^N deg(I + F, U_i, y).

(ii). If y ∉ (I + F)(U), then deg(I + F, U, y) = 0 (but not the other way
round). Equivalently, if deg(I + F, U, y) ≠ 0, then y ∈ (I + F)(U).

(iii). If |F(x) − G(x)| < dist(y, (I + F)(∂U)), x ∈ ∂U, then deg(I +
F, U, y) = deg(I + G, U, y). In particular, this is true if F(x) =
G(x) for x ∈ ∂U.

(iv). deg(I + ., U, y) is constant on each component of C̄_y(U, X).

(v). deg(I + F, U, .) is constant on each component of X \ (I + F)(∂U).
In the same way as in the finite dimensional case we also obtain the
invariance of domain theorem.
Now we can extend the Brouwer fixed point theorem to infinite dimen-
sional spaces as well.
Theorem 11.9 (Schauder fixed point). Let K be a closed, convex, and
bounded subset of a Banach space X. If F ∈ C(K, K), then F has at least
one fixed point. The result remains valid if K is only homeomorphic to a
closed, convex, and bounded subset.
Proof. Consider the open cover {B_{ρ(x)}(x)}_{x∈X\K} for X \ K, where ρ(x) =
dist(x, K)/2. Choose a (locally finite) partition of unity {φ_λ}_{λ∈Λ} subordinate
to this cover (cf. Lemma B.30) and set

  F̄(x) := Σ_{λ∈Λ} φ_λ(x) F(x_λ) for x ∈ X \ K,

where x_λ ∈ K satisfies dist(x_λ, supp φ_λ) ≤ 2 dist(K, supp φ_λ). By construction,
F̄ is continuous except for possibly at the boundary of K. Fix
x₀ ∈ ∂K, ε > 0 and choose δ > 0 such that |F(x) − F(x₀)| ≤ ε for all
x ∈ K with |x − x₀| < 4δ. We will show that |F̄(x) − F(x₀)| ≤ ε for
all x ∈ X with |x − x₀| < δ. Suppose x ∉ K, then |F̄(x) − F(x₀)| ≤
Σ_{λ∈Λ} φ_λ(x) |F(x_λ) − F(x₀)|. By our construction, x_λ should be close to x
for all λ with x ∈ supp φ_λ since x is close to K. In fact, if x ∈ supp φ_λ we
have

  |x − x_λ| ≤ dist(x_λ, supp φ_λ) + diam(supp φ_λ)
    ≤ 2 dist(K, supp φ_λ) + diam(supp φ_λ),

where diam(supp φ_λ) := sup_{x,y∈supp φ_λ} |x − y|. Since our partition of unity is
subordinate to the cover {B_{ρ(x)}(x)}_{x∈X\K} we can find an x̃ ∈ X \ K such that
supp φ_λ ⊂ B_{ρ(x̃)}(x̃) and hence diam(supp φ_λ) ≤ 2ρ(x̃) ≤ dist(K, B_{ρ(x̃)}(x̃)) ≤
dist(K, supp φ_λ). Putting it all together implies that we have |x − x_λ| ≤
3 dist(K, supp φ_λ) ≤ 3|x₀ − x| whenever x ∈ supp φ_λ and thus

  |x₀ − x_λ| ≤ |x₀ − x| + |x − x_λ| ≤ 4|x₀ − x| ≤ 4δ

as expected. By our choice of δ we have |F(x_λ) − F(x₀)| ≤ ε for all λ with
φ_λ(x) ≠ 0. Hence |F̄(x) − F(x₀)| ≤ ε whenever |x − x₀| ≤ δ and we are
done.
Example 11.3. Consider the nonlinear integral equation

  x = F(x),  F(x)(t) := ∫_0^1 e^{−ts} cos(λx(s)) ds

in X := C[0, 1] with λ > 0. Then one checks that F ∈ C(X, X) since

  |F(x)(t) − F(y)(t)| ≤ ∫_0^1 e^{−ts} |cos(λx(s)) − cos(λy(s))| ds
    ≤ ∫_0^1 e^{−ts} λ |x(s) − y(s)| ds ≤ λ ‖x − y‖_∞.
In particular, for λ < 1 we have a contraction and the contraction principle
gives us existence of a unique fixed point. Moreover, proceeding similarly,
one obtains estimates for the norm of F(x) and its derivative:

  ‖F(x)‖_∞ ≤ 1,  ‖F(x)′‖_∞ ≤ 1.

Hence the Arzelà–Ascoli theorem (Theorem B.39) implies that the image of
F is a compact subset of the unit ball and hence F ∈ C(B̄₁(0), B̄₁(0)). Thus
the Schauder fixed point theorem guarantees a fixed point for all λ > 0.
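For λ < 1 the fixed point of Example 11.3 can also be computed by straightforward iteration after discretizing the integral with the trapezoidal rule. This is only a numerical sketch (grid size and iteration count are arbitrary choices of ours):

```python
import math

def solve_integral_eq(lam, n=100, iters=60):
    """Iterate x -> F(x), F(x)(t) = integral over [0,1] of exp(-t s) cos(lam x(s)) ds,
    using the trapezoidal rule on an (n+1)-point grid; converges for lam < 1."""
    ts = [k / n for k in range(n + 1)]
    w = [(0.5 if k in (0, n) else 1.0) / n for k in range(n + 1)]
    K = [[math.exp(-ti * tj) for tj in ts] for ti in ts]
    x = [0.0] * (n + 1)
    for _ in range(iters):
        c = [w[j] * math.cos(lam * v) for j, v in enumerate(x)]
        x = [sum(Kij * cj for Kij, cj in zip(Ki, c)) for Ki in K]
    return ts, x
```

The computed solution is positive, bounded by 1, and decreasing in t, as the kernel e^{−ts} suggests.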
Finally, let us prove another fixed point theorem which covers several
others as special cases.
Theorem 11.11. Let U ⊂ X be open and bounded and let F ∈ C(U̅, X).
Suppose there is an x0 ∈ U such that
F (x) − x0 ≠ α(x − x0 ), x ∈ ∂U, α ∈ (1, ∞). (11.7)
Then F has a fixed point.
Proof. Our strategy is to verify (11.7) with x0 = 0. (i). F (∂Bρ (0)) ⊆ B̄ρ (0)
and F (x) = αx for |x| = ρ implies |α|ρ ≤ ρ and hence (11.7) holds. (ii).
F (x) = αx for |x| = ρ implies (α − 1)²ρ² ≥ (α² − 1)ρ² and hence α ≤ 1.
(iii). Special case of (ii) since |F (x) − x|² = |F (x)|² − 2⟨F (x), x⟩ + |x|².
Proof. Note that, by our assumption on λ, λF + y maps B̄ρ (y) into itself.
Now apply the Schauder fixed point theorem.
This result immediately gives the Peano theorem for ordinary differential
equations.
Theorem 11.15 (Peano). Consider the initial value problem
ẋ = f (t, x), x(t0 ) = x0 , (11.10)
where f ∈ C(I ×U, Rn ) and I ⊂ R is an interval containing t0 . Then (11.10)
has at least one local solution x ∈ C 1 ([t0 −ε, t0 +ε], Rn ), ε > 0. For example,
any ε satisfying εM (ε, ρ) ≤ ρ, ρ > 0 with M (ε, ρ) := max |f ([t0 − ε, t0 + ε] ×
B̄ρ (x0 ))| works. In addition, if M (ε, ρ) ≤ M̃ (ε)(1 + ρ), then there exists a
global solution.
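The Peano theorem guarantees existence but not uniqueness. A standard illustration (our own choice, not from the text) is the initial value problem ẋ = 2√|x|, x(0) = 0, whose right-hand side is continuous but not Lipschitz at 0; both x(t) = 0 and x(t) = t² are solutions, as a quick check confirms:

```python
import math

def f(t, x):
    # continuous but not Lipschitz at x = 0
    return 2.0 * math.sqrt(abs(x))

ok = True
for t in [0.1 * k for k in range(1, 11)]:
    # x(t) = t^2 solves the ODE: x'(t) = 2t and f(t, t^2) = 2t
    ok = ok and abs(2.0 * t - f(t, t * t)) < 1e-12
    # x(t) = 0 solves it as well: x'(t) = 0 and f(t, 0) = 0
    ok = ok and f(t, 0.0) == 0.0
print(ok)  # → True
```

So continuity of f alone cannot yield uniqueness; that requires a Lipschitz-type condition and the contraction principle.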
Recall that by the Poincaré inequality (Theorem 7.30 from [47]) the corre-
sponding norm is equivalent to the usual one. In order to take care of the
incompressibility condition we will choose
X := {v ∈ H01 (U, R3 )|∇ · v = 0}. (11.14)
as our configuration space (check that this is a closed subspace of H01 (U, R3 )).
Now we multiply (11.12) by w ∈ X and integrate over U
∫_U (η∆v − ρ(v · ∇)v − K) · w d³x = ∫_U (∇p) · w d³x
= −∫_U p (∇ · w) d³x = 0, (11.15)
where we have used integration by parts (Lemma 7.9 from [47] (iii)) to
conclude that the pressure term drops out of our picture. Using further inte-
gration by parts we finally arrive at the weak formulation of the stationary
Navier–Stokes equation
η⟨v, w⟩ − a(v, v, w) − ∫_U K · w d³x = 0, for all w ∈ X , (11.16)
where
a(u, v, w) := ∑_{j,k=1}^{3} ∫_U u_k v_j (∂_k w_j ) d³x. (11.17)
Monotone maps
Proof. Our first assumption implies that G(x) = F (x) − y satisfies ⟨G(x), x⟩ =
⟨F (x), x⟩ − ⟨y, x⟩ > 0 for |x| sufficiently large. Hence the first claim follows from
Problem 10.2. The second claim is trivial.
Proof. Set
G(x) := x − t(F (x) − y), t > 0,
then F (x) = y is equivalent to the fixed point equation
G(x) = x.
It remains to show that G is a contraction. We compute
‖G(x) − G(x̃)‖² = ‖x − x̃‖² − 2t⟨F (x) − F (x̃), x − x̃⟩ + t²‖F (x) − F (x̃)‖²
≤ (1 − 2 (C/L)(Lt) + (Lt)²)‖x − x̃‖²,
where L is a Lipschitz constant for F (i.e., ‖F (x) − F (x̃)‖ ≤ L‖x − x̃‖).
Thus, if t ∈ (0, 2C/L²), G is a uniform contraction and the rest follows from the
uniform contraction principle.
Again observe that our proof is constructive. In fact, the best choice
for t is clearly t = C/L², such that the contraction constant θ² = 1 − (C/L)²
is minimal.
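The constructive scheme can be sketched numerically. The map F(x) = 2x + sin(x) on R is our own toy example, not from the text: F′(x) = 2 + cos(x) ∈ [1, 3], so it is strongly monotone with C = 1 and Lipschitz with L = 3, and the optimal step size is t = C/L² = 1/9:

```python
import math

def F(x):
    # strongly monotone: F'(x) = 2 + cos(x) ∈ [1, 3], so C = 1, L = 3
    return 2.0 * x + math.sin(x)

y = 1.0
C, L = 1.0, 3.0
t = C / L**2                  # the optimal step from the remark

x = 0.0
for _ in range(500):
    x = x - t * (F(x) - y)    # the contraction G(x) = x - t (F(x) - y)

res = abs(F(x) - y)
print(res < 1e-8)  # → True
```

The iteration is exactly the fixed point scheme of the proof, so it converges at least geometrically with the contraction constant θ.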
⟨Ax, x⟩ ≥ C‖x‖²
and
|a(x, z) − a(y, z)| ≤ L‖z‖‖x − y‖. (12.12)
Then there is a unique x ∈ H such that (12.10) holds.
Proof. By the Riesz lemma (Theorem 2.10) there are elements F (x) ∈ H
and z ∈ H such that a(x, y) = b(y) is equivalent to ⟨F (x) − z, y⟩ = 0, y ∈ H,
and hence to
F (x) = z.
By (12.11) the map F is strongly monotone. Moreover, by (12.12) we infer
‖xn ‖ ≤ R. (12.17)
implies F (x) = y.
At the beginning of the 20th century Russell showed with his famous paradox
that naive set theory can lead to contradictions. Hence it was replaced by
axiomatic set theory; more specifically, we will use the Zermelo–Fraenkel
set theory (ZF), which assumes the existence of some sets (like the empty
set and the integers) and defines what operations are allowed. Somewhat
informally (i.e., without writing them using the symbolism of first order logic)
they can be stated as follows:
The last axiom implies that the empty set is unique and that any set which
is not equal to the empty set has at least one element.
holds for any other sufficiently rich (such that one can do basic math) system
of axioms. In particular, it also holds for ZFC defined below. So we have to
live with the fact that someday someone might come and prove that ZFC is
inconsistent.
Starting from ZF one can develop basic analysis (including the construc-
tion of the real numbers). However, it turns out that several fundamental
results require yet another construction for their proof:
Given an index set A and for every α ∈ A some set Mα , the product
∏_{α∈A} Mα is defined to be the set of all functions ϕ : A → ⋃_{α∈A} Mα which
assign each element α ∈ A some element mα ∈ Mα . If all sets Mα are
nonempty it seems quite reasonable that there should be such a choice func-
tion which chooses an element from Mα for every α ∈ A. However, no matter
how obvious this might seem, it cannot be deduced from the ZF axioms alone
and hence has to be added:
• Axiom of Choice: Given an index set A and nonempty sets
{Mα }α∈A , their product ∏_{α∈A} Mα is nonempty.
ZF augmented by the axiom of choice is known as ZFC and we accept
it as the fundament upon which our functional analytic house is built.
Note that the axiom of choice is not only used to ensure that infinite
products are nonempty but also in many proofs! For example, suppose you
start with a set M1 and recursively construct some sets Mn such that in
every step you have a nonempty set. Then the axiom of choice guarantees
the existence of a sequence x = (xn )n∈N with xn ∈ Mn .
The axiom of choice has many important consequences (many of which
are in fact equivalent to the axiom of choice and it is hence a matter of taste
which one to choose as axiom).
A partial order is a binary relation "⪯" over a set P such that for all
A, B, C ∈ P:
• A ⪯ A (reflexivity),
• if A ⪯ B and B ⪯ A then A = B (antisymmetry),
• if A ⪯ B and B ⪯ C then A ⪯ C (transitivity).
It is customary to write A ≺ B if A ⪯ B and A ≠ B.
Example A.1. Let P(X) be the collection of all subsets of a set X. Then
P(X) is partially ordered by inclusion ⊆.
It is important to emphasize that two elements of P need not be com-
parable, that is, in general neither A B nor B A might hold. However,
if any two elements are comparable, P will be called totally ordered. A
set with a total order is called well-ordered if every nonempty subset has
a least element.
Proof. Otherwise the set of all k for which A(k) is false would have a least
element k0 . But by our choice of k0 , A(l) holds for all l ≺ k0 and thus also
for k0 , contradicting our assumption.
We will also frequently use the cardinality of sets: Two sets A and
B have the same cardinality, written as |A| = |B|, if there is a bijection
ϕ : A → B. We write |A| ≤ |B| if there is an injective map ϕ : A → B. Note
that |A| ≤ |B| and |B| ≤ |C| implies |A| ≤ |C|. A set A is called infinite if
|A| ≥ |N|, countable if |A| ≤ |N|, and countably infinite if |A| = |N|.
Theorem A.3 (Schröder–Bernstein). |A| ≤ |B| and |B| ≤ |A| implies |A| =
|B|.
The cardinality of the power set P(A) is strictly larger than the cardi-
nality of A.
Theorem A.5 (Cantor). |A| < |P(A)|.
This innocent looking result also caused some grief when announced by
Cantor as it clearly gives a contradiction when applied to the set of all sets
(which is fortunately not a legal object in ZFC).
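Cantor's diagonal argument behind this theorem can even be checked exhaustively on a small finite set (a toy illustration, not part of the text): for every function f : A → P(A), the diagonal set D = {x ∈ A | x ∉ f(x)} is missed by f, so no f is surjective.

```python
from itertools import combinations, product

A = [0, 1, 2]
# the power set P(A): all 8 subsets
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(A, r)]

missed = True
for images in product(subsets, repeat=len(A)):   # every f : A -> P(A)
    f = dict(zip(A, images))
    D = frozenset(x for x in A if x not in f[x]) # Cantor's diagonal set
    missed = missed and all(f[x] != D for x in A)

print(missed)  # → True
```

Indeed, if f(x0) = D for some x0, then x0 ∈ D would hold if and only if x0 ∉ D, which is impossible; the loop merely confirms this for all 8³ = 512 candidate functions.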
The following result and its corollary will be used to determine the car-
dinality of unions and products.
Lemma A.6. Any infinite set can be written as a disjoint union of countably
infinite sets.
Proof. Without loss of generality we can assume |B| ≤ |A| (otherwise ex-
change both sets). Then |A| ≤ |A × B| ≤ |A × A| and it suffices to show
|A × A| = |A|.
We proceed as before and consider the set of all bijective functions ϕα :
Aα → Aα × Aα with Aα ⊆ A with the same partial ordering as before. By
Since this map is again injective (note that we avoid expansions which are
eventually 1) we get |P(N)| ≤ |[0, 1)|.
Hence we have
|N| < |P(N)| = |R| (A.1)
and the continuum hypothesis states that there are no sets whose cardi-
nality lie in between. It was shown by Gödel and Cohen that it, as well as
its negation, is consistent with ZFC and hence cannot be decided within this
framework.
Problem A.1. Show that Zorn’s lemma implies the axiom of choice. (Hint:
Consider the set of all partial choice functions defined on a subset.)
Problem A.2. Show |RN | = |R|. (Hint: Without loss we can replace R by
(0, 1) and identify each x ∈ (0, 1) with its decimal expansion. Now the digits
in a given sequence are indexed by two countable parameters.)
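The hint can be sketched for just two sequences (a toy version of the construction; the full problem interleaves countably many digit sequences via a pairing of the two countable indices, and one must also take care of expansions which are eventually 9):

```python
def interleave(xd, yd):
    # merge two digit sequences alternately into one sequence
    out = []
    for a, b in zip(xd, yd):
        out.extend([a, b])
    return out

def deinterleave(d):
    # recover the two original sequences from the even/odd positions
    return d[0::2], d[1::2]

xd = [1, 4, 1, 5, 9]   # leading digits of 0.14159...
yd = [2, 7, 1, 8, 2]   # leading digits of 0.27182...
merged = interleave(xd, yd)
back_x, back_y = deinterleave(merged)
print(back_x == xd and back_y == yd)  # → True
```

Since the merged expansion determines both originals, the map is injective, which is the heart of the inequality |R²| ≤ |R| and, with the pairing of indices, of |Rᴺ| ≤ |R|.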
Appendix B
This chapter collects some basic facts from metric and topological spaces as
a reference for the main text. I presume that you are familiar with most of
these topics from your calculus course. As a general reference I can warmly
recommend Kelley’s classical book [25] or the nice book by Kaplansky [23].
As always, such a brief compilation introduces a zoo of properties. While
sometimes the connection between these properties is straightforward, at
other times it might be quite tricky. So if at some point you are wondering if
there exists an infinite multi-variable sub-polynormal Woffle which does not
satisfy the lower regular Q-property, start searching in the book by Steen
and Seebach [43].
B.1. Basics
One of the key concepts in analysis is convergence. To define convergence
requires the notion of distance. Motivated by the Euclidean distance one is
led to the following definition:
A metric space is a space X together with a distance function d :
X × X → [0, ∞) such that for arbitrary points x, y, z ∈ X we have
That is, O is closed under finite intersections and arbitrary unions. In-
deed, (i) is obvious, (ii) follows since the intersection of two open balls cen-
tered at x is again an open ball centered at x (explicitly Br1 (x) ∩ Br2 (x) =
Bmin(r1 ,r2) (x)), and (iii) follows since every ball contained in one of the sets
is also contained in the union.
Now it turns out that for defining convergence, a distance is slightly more
than what is actually needed. In fact, it suffices to know when a point is
shows B_{r/√n} (x) ⊆ B̃_r (x) ⊆ B_r (x), where B, B̃ are balls computed using d,
d̃, respectively. In particular, both distances will lead to the same notion of
convergence.
Example B.4. We can always replace a metric d by the bounded metric
d̃(x, y) := d(x, y)/(1 + d(x, y)) (B.5)
without changing the topology (since the family of open balls does not
change: Bδ (x) = B̃_{δ/(1+δ)} (x)). To see that d̃ is again a metric, observe
that f (r) = r/(1 + r) is monotone as well as concave and hence subadditive,
f (r + s) ≤ f (r) + f (s) (cf. Problem B.3).
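A quick numerical sanity check of the triangle inequality for the bounded metric (using X = R with the usual distance, a choice made only for this illustration):

```python
import random

def d(x, y):
    return abs(x - y)

def d_tilde(x, y):
    # the bounded metric d/(1 + d) from (B.5)
    return d(x, y) / (1.0 + d(x, y))

random.seed(0)
ok = True
for _ in range(10000):
    x, y, z = (random.uniform(-10, 10) for _ in range(3))
    # triangle inequality, up to floating point slack
    ok = ok and d_tilde(x, y) <= d_tilde(x, z) + d_tilde(z, y) + 1e-12
print(ok)  # → True
```

Of course this is no proof; the actual argument is the subadditivity of r ↦ r/(1 + r) noted above.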
Every subspace Y of a topological space X becomes a topological space
of its own if we call O ⊆ Y open if there is some open set Õ ⊆ X such that
O = Õ ∩Y . This natural topology O ∩Y is known as the relative topology
(also subspace, trace or induced topology).
Example B.5. The set (0, 1] ⊆ R is not open in the topology of X := R,
but it is open in the relative topology when considered as a subset of Y :=
[−1, 1].
A family of open sets B ⊆ O is called a base for the topology if for each
x and each neighborhood U (x), there is some set O ∈ B with x ∈ O ⊆ U (x).
Since an open set O is a neighborhood of every one of its points, it can be
written as O = ⋃_{O⊇Õ∈B} Õ and we have
Lemma B.1. A family of open sets B ⊆ O is a base for the topology if and
only if every open set can be written as a union of elements from B.
Proof. To see the converse let x and U (x) be given. Then U (x) contains an
open set O containing x which can be written as a union of elements from
B. One of the elements in this union must contain x and this is the set we
are looking for.
to take balls with rational center, and hence Rn (as well as Cn ) is second
countable.
Given two topologies on X their intersection will again be a topology on
X. In fact, the intersection of an arbitrary collection of topologies is again a
topology and hence given a collection M of subsets of X we can define the
topology generated by M as the smallest topology (i.e., the intersection of all
topologies) containing M. Note that if M is closed under finite intersections
and ∅, X ∈ M, then it will be a base for the topology generated by M
(Problem B.9).
Given two bases we can use them to check if the corresponding topologies
are equal.
The next definition will ensure that limits are unique: A topological
space is called a Hausdorff space if for any two different points there are
always two disjoint neighborhoods.
Example B.9. Any metric space is a Hausdorff space: Given two different
points x and y, the balls Bd/2 (x) and Bd/2 (y), where d = d(x, y) > 0,
are disjoint neighborhoods. A pseudometric space will in general not be
Hausdorff since two points of distance 0 cannot be separated by open balls.
The complement of an open set is called a closed set. It follows from
De Morgan’s laws
X \ ⋃_α Uα = ⋂_α (X \ Uα ), X \ ⋂_α Uα = ⋃_α (X \ Uα ) (B.6)
That is, closed sets are closed under finite unions and arbitrary intersections.
The smallest closed set containing a given set U is called the closure
U̅ := ⋂_{C∈C, U⊆C} C, (B.7)
and the largest open set contained in a given set U is called the interior
U° := ⋃_{O∈O, O⊆U} O. (B.8)
It is not hard to see that the closure satisfies the following axioms (Kuratowski
closure axioms):
(i) ∅̅ = ∅,
(ii) U ⊆ U̅,
(iii) the closure of U̅ is again U̅,
(iv) the closure of U ∪ V equals U̅ ∪ V̅.
In fact, one can show that these axioms can equivalently be used to define the
topology by observing that the closed sets are precisely those which satisfy
U̅ = U. Similarly, the open sets are precisely those which satisfy U° = U.
Lemma B.3. Let X be a topological space. Then the interior of U is the
set of all interior points of U, and the closure of U is the union of U with
all limit points of U. Moreover, ∂U = U̅ \ U°.
Proof. The first claim is straightforward. For the second claim observe that
by Problem B.7 we have U̅ = X \ (X \ U)°, that is, the closure is the
set of all points which are not interior points of the complement. That is,
x ∉ U̅ iff there is some open set O containing x with O ⊆ X \ U. Hence,
x ∈ U̅ iff for all open sets O containing x we have O ⊄ X \ U, that is,
O ∩ U ≠ ∅. Hence, x ∈ U̅ iff x ∈ U or if x is a limit point of U. The last
claim is left as Problem B.8.
Example B.10. For any x ∈ X the closed ball
B̄r (x) := {y ∈ X|d(x, y) ≤ r} (B.9)
is a closed set (check that its complement is open). But in general the
closure of the open ball satisfies only
B̅r (x) ⊆ B̄r (x) (B.10)
since an isolated point y with d(x, y) = r will not be a limit point. In Rn
(or Cn ) we have of course equality.
Problem B.1. Show that |d(x, y) − d(z, y)| ≤ d(x, z).
Problem B.2. Show the quadrangle inequality |d(x, y) − d(x′ , y′ )| ≤
d(x, x′ ) + d(y, y′ ).
Proof. From every dense set we get a countable base by considering open
balls with rational radii and centers in the dense set. Conversely, from every
countable base we obtain a dense set by choosing an element from each set
in the base.
Proof. Let A = {xn }n∈N be a dense set in X. The only problem is that A∩Y
might contain no elements at all. However, some elements of A must be at
least arbitrarily close to this intersection: Let J ⊆ N2 be the set of all pairs
(n, m) for which B1/m (xn )∩Y 6= ∅ and choose some yn,m ∈ B1/m (xn )∩Y for
all (n, m) ∈ J. Then B = {yn,m }(n,m)∈J ⊆ Y is countable. To see that B is
dense, choose y ∈ Y . Then there is some sequence xnk with d(xnk , y) < 1/k.
Hence (nk , k) ∈ J and d(ynk ,k , y) ≤ d(ynk ,k , xnk ) + d(xnk , y) ≤ 2/k → 0.
if x = (xn )n∈N and y = (yn )n∈N are Cauchy sequences, so is d(xn , yn ) and
hence we can define a metric on X̄ via
Problem B.11. Let X be a metric space and denote by B(X) the set of all
bounded functions X → C. Introduce the metric
Problem B.12. Let X be a metric space and B(X) as in the previous prob-
lem. Consider the embedding J : X ,→ B(X) defined via
J(x)(y) := d(x, y) − d(x0 , y)
for some fixed x0 ∈ X. Show that this embedding is isometric. Hence J(X)
is another (equivalent) completion of X.
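The isometry can be checked numerically. The formula J(x)(y) := d(x, y) − d(x0, y) (the standard Kuratowski-type choice) together with X = R and d(x, y) = |x − y| are assumptions of this sketch; the sup-distance between J(x1) and J(x2) then equals d(x1, x2), with the supremum attained at y = x1 or y = x2:

```python
import random

def d(a, b):
    return abs(a - b)

x0 = 0.0
random.seed(1)
points = [random.uniform(-5.0, 5.0) for _ in range(50)]

ok = True
for _ in range(200):
    x1, x2 = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    # evaluate J(x1) - J(x2) on a sample that contains x1 and x2,
    # where the supremum of |d(x1, y) - d(x2, y)| is attained
    ys = points + [x1, x2]
    sup = max(abs((d(x1, y) - d(x0, y)) - (d(x2, y) - d(x0, y))) for y in ys)
    ok = ok and abs(sup - d(x1, x2)) < 1e-12
print(ok)  # → True
```

The inequality sup_y |d(x1, y) − d(x2, y)| ≤ d(x1, x2) is just the triangle inequality; plugging in y = x2 shows it is attained.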
B.3. Functions
Next, we come to functions f : X → Y , x 7→ f (x). We use the usual
conventions f (U ) := {f (x)|x ∈ U } for U ⊆ X and f −1 (V ) := {x|f (x) ∈ V }
for V ⊆ Y . Note
U ⊆ f −1 (f (U )), f (f −1 (V )) ⊆ V. (B.14)
The set Ran(f ) := f (X) is called the range of f , and X is called the
domain of f . A function f is called injective or one-to-one if for each
y ∈ Y there is at most one x ∈ X with f (x) = y (i.e., f −1 ({y}) contains at
most one point) and surjective or onto if Ran(f ) = Y . A function f which
is both injective and surjective is called bijective.
Recall that we always have
f −1 (⋃_α Vα ) = ⋃_α f −1 (Vα ), f −1 (⋂_α Vα ) = ⋂_α f −1 (Vα ),
f −1 (Y \ V ) = X \ f −1 (V ) (B.15)
as well as
f (⋃_α Uα ) = ⋃_α f (Uα ), f (⋂_α Uα ) ⊆ ⋂_α f (Uα ),
f (X) \ f (U ) ⊆ f (X \ U ) (B.16)
with equality if f is injective.
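The inclusion f(⋂ Uα) ⊆ ⋂ f(Uα) can indeed be strict when f is not injective; a minimal witness (our own example, not from the text) is f(x) = x²:

```python
def f(x):
    return x * x            # not injective: f(-1) = f(1)

def image(f, U):
    return {f(x) for x in U}

U1, U2 = {-1, 0}, {0, 1}
lhs = image(f, U1 & U2)             # f({0}) = {0}
rhs = image(f, U1) & image(f, U2)   # {0, 1} ∩ {0, 1} = {0, 1}
print(lhs == {0} and rhs == {0, 1} and lhs < rhs)  # → True
```

Here 1 lies in both images because it has a preimage in U1 (namely −1) and a different preimage in U2 (namely 1), but no common preimage in U1 ∩ U2.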
A function f between metric spaces X and Y is called continuous at a
point x ∈ X if for every ε > 0 we can find a δ > 0 such that
dY (f (x), f (y)) ≤ ε if dX (x, y) < δ. (B.17)
If f is continuous at every point, it is called continuous. In the case
dY (f (x), f (y)) = dX (x, y) we call f isometric and every isometry is of
course continuous.
Lemma B.10. Let X, Y be metric spaces. The following are equivalent:
(i) f is continuous at x (i.e., (B.17) holds).
(ii) f (xn ) → f (x) whenever xn → x.
(iii) For every neighborhood V of f (x) the preimage f −1 (V ) is a neigh-
borhood of x.
Proof. (i) ⇒ (ii) is obvious. (ii) ⇒ (iii): If (iii) does not hold, there is a
neighborhood V of f (x) such that Bδ (x) 6⊆ f −1 (V ) for every δ. Hence we
can choose a sequence xn ∈ B1/n (x) such that xn 6∈ f −1 (V ). Thus xn → x
but f (xn ) 6→ f (x). (iii) ⇒ (i): Choose V = Bε (f (x)) and observe that by
(iii), Bδ (x) ⊆ f −1 (V ) for some δ.
Show that both are independent of the neighborhood base and satisfy
(i) lim inf x→x0 (−f (x)) = − lim supx→x0 f (x).
(ii) lim inf x→x0 (αf (x)) = α lim inf x→x0 f (x), α ≥ 0.
(iii) lim inf x→x0 (f (x) + g(x)) ≥ lim inf x→x0 f (x) + lim inf x→x0 g(x).
Moreover, show that
lim inf_{n→∞} f (xn ) ≥ lim inf_{x→x0} f (x), lim sup_{n→∞} f (xn ) ≤ lim sup_{x→x0} f (x)
for every sequence xn → x0 and there exists a sequence attaining equality if
X is a metric space.
Problem* B.17. Show that the supremum over lower semicontinuous func-
tions is again lower semicontinuous.
Problem* B.18. Let X be a topological space and f : X → R. Show that
f is lower semicontinuous if and only if
lim inf_{x→x0} f (x) ≥ f (x0 ), x0 ∈ X.
continuous. In fact, this topology must contain all sets which are inverse
images of open sets U ⊆ X, that is all sets of the form U × Y as well as
all inverse images of open sets V ⊆ Y , that is all sets of the form X × V .
Adding finite intersections we obtain all sets of the form U × V and hence
the same base as before. In particular, a sequence (xn , yn ) will converge if
and only if both components converge.
Note that the product topology immediately extends to the product of
an arbitrary number of spaces X := ∏_{α∈A} Xα by defining it as the weakest
topology which makes all projections πα : X → Xα continuous.
Example B.15. Let X be a topological space and A an index set. Then
X^A = ∏_{α∈A} X is the set of all functions x : A → X and a neighborhood base
at x is given by the sets of functions which coincide with x at a given finite
number of points. Convergence with respect to the product topology corresponds to
pointwise convergence (note that the projection πα is the point evaluation at
α: πα (x) = x(α)). If A is uncountable (and X is not equipped with the trivial
topology), then there is no countable neighborhood base (if there were such a
base, it would involve only a countable number of points, now choose a point
from the complement . . . ). In particular, there is no corresponding metric
even if X has one. Moreover, this topology cannot be characterized with
sequences alone. For example, let X = {0, 1} (with the discrete topology)
and A = R. Then the set F = {x|x−1 (1) is countable} is sequentially closed
but its closure is all of {0, 1}R (every set from our neighborhood base contains
an element which vanishes except at finitely many points).
In fact this is a special case of a more general construction which is often
used. Let {fα }α∈A be a collection of functions fα : X → Yα , where Yα are
some topological spaces. Then we can equip X with the weakest topology
(known as the initial topology) which makes all fα continuous. That is, we
take the topology generated by sets of the forms fα−1 (Oα ), where Oα ⊆ Yα
is open, known as open cylinders. Finite intersections of such sets, known
as open cylinder sets, are hence a base for the topology and a sequence xn
will converge to x if and only if fα (xn ) → fα (x) for all α ∈ A. In particular,
if the collection is countable, then X will be first (or second) countable if all
Yα are.
The initial topology has the following characteristic property:
Lemma B.11. Let X have the initial topology from a collection of functions
{fα : X → Yα }α∈A and let Z be another topological space. A function
f : Z → X is continuous (at z) if and only if fα ◦ f is continuous (at z) for
all α ∈ A.
and some open neighborhoods Oαj of fαj (f (z)). But then f −1 (U ) contains
the neighborhood f −1 (⋂_{j=1}^{n} fαj −1 (Oαj )) = ⋂_{j=1}^{n} (fαj ◦ f )−1 (Oαj ) of z.
If all Yα are Hausdorff and if the collection {fα }α∈A separates points,
that is for every x 6= y there is some α with fα (x) 6= fα (y), then X will
again be Hausdorff. Indeed for x 6= y choose α such that fα (x) 6= fα (y)
and let Uα , Vα be two disjoint neighborhoods separating fα (x), fα (y). Then
fα −1 (Uα ), fα −1 (Vα ) are two disjoint neighborhoods separating x, y. In partic-
ular, X = ∏_{α∈A} Xα is Hausdorff if all Xα are.
Note that a similar construction works in the other direction. Let {fα }α∈A
be a collection of functions fα : Xα → Y , where Xα are some topological
spaces. Then we can equip Y with the strongest topology (known as the
final topology) which makes all fα continuous. That is, we take as open
sets those for which fα−1 (O) is open for all α ∈ A.
Example B.16. Let ∼ be an equivalence relation on X with equivalence
classes [x] = {y ∈ X|x ∼ y}. Then the quotient topology on the set of
equivalence classes X/ ∼ is the final topology of the projection map π : X →
X/ ∼.
Example B.17. Let Xα be a collection of topological spaces. The disjoint
union
X := ⨆_{α∈A} Xα
is usually given the final topology from the canonical injections iα : Xα ,→ X
such that O ⊆ X is open if and only if O ∩ Xα is open for all α ∈ A.
Lemma B.12. Let Y have the final topology from a collection of functions
{fα : Xα → Y }α∈A and let Z be another topological space. A function
f : Y → Z is continuous if and only if f ◦ fα is continuous for all α ∈ A.
B.5. Compactness
A cover of a set Y ⊆ X is a family of sets {Uα } such that Y ⊆ ⋃_α Uα . A
cover is called open if all Uα are open. Any subset of {Uα } which still covers
Y is called a subcover.
Lemma B.13 (Lindelöf). If X is second countable, then every open cover
has a countable subcover.
Proof. Let {Uα } be an open cover for Y , and let B be a countable base.
Since every Uα can be written as a union of elements from B, the set of all
B ∈ B which satisfy B ⊆ Uα for some α form a countable open cover for Y .
Moreover, for every Bn in this set we can find an αn such that Bn ⊆ Uαn .
By construction, {Uαn } is a countable subcover.
Proof. Denote the cover by {Oj }j∈N and introduce the sets
Ôj,n := ⋃_{x∈Aj,n} B_{2^{−n}} (x), where
Aj,n := {x ∈ Oj \ (O1 ∪ · · · ∪ Oj−1 ) | x ∉ ⋃_{k∈N, 1≤l<n} Ôk,l and B_{3·2^{−n}} (x) ⊆ Oj }.
Proof. (i) Observe that if {Oα } is an open cover for f (Y ), then {f −1 (Oα )}
is one for Y .
(ii) Let {Oα } be an open cover for the closed subset Y (in the induced
topology). Then there are open sets Õα with Oα = Õα ∩Y and {Õα }∪{X\Y }
is an open cover for X which has a finite subcover. This subcover induces a
finite subcover for Y .
(iii) Let Y ⊆ X be compact. We show that X \ Y is open. Fix x ∈ X \ Y
(if Y = X there is nothing to do). By the definition of Hausdorff, for
every y ∈ Y there are disjoint neighborhoods V (y) of y and Uy (x) of x. By
compactness of Y , there are y1 , . . . , yn such that the V (yj ) cover Y . But
then ⋂_{j=1}^{n} Uyj (x) is a neighborhood of x which does not intersect Y .
(iv) Note that a cover of the union is a cover for each individual set and
the union of the individual subcovers is the subcover we are looking for.
(v) Follows from (ii) and (iii) since an intersection of closed sets is closed.
Proof. It suffices to show that f maps closed sets to closed sets. By (ii)
every closed set is compact, by (i) its image is also compact, and by (iii) it
is also closed.
Proof. We say that a family F of closed subsets of K has the finite inter-
section property if every finite subfamily has nonempty intersection. The
collection of all such families which contain F is partially
ordered by inclusion and every chain has an upper bound (the union of all
sets in the chain). Hence, by Zorn’s lemma, there is a maximal family FM
(note that this family is closed under finite intersections).
Denote by πα : K → Kα the projection onto the α component. Then
the closures of the sets πα (F ), F ∈ FM , also have the finite intersection
property and since Kα is compact, there is some xα contained in the closure
of πα (F ) for every F ∈ FM . Consequently, if Fα
at least one of these two intervals, call it I1 , contains infinitely many ele-
ments of our sequence. Let y1 = xn1 be the first one. Subdivide I1 and pick
y2 = xn2 , with n2 > n1 as before. Proceeding like this, we obtain a Cauchy
sequence yn (note that by construction In+1 ⊆ In and hence |yn − ym | ≤ (b − a)/2ⁿ
for m ≥ n).
Combining Theorem B.22 with Lemma B.16 (i) we also obtain the ex-
treme value theorem.
Theorem B.24 (Weierstraß). Let X be compact. Every continuous function
f : X → R attains its maximum and minimum.
Proof. (i) ⇒ (ii): Let {xn } be a dense set. Then the balls Bn,m = B1/m (xn )
form a base. Moreover, for every n there is some mn such that Bn,m is
relatively compact for m ≤ mn . Since those balls are still a base we are
done. (ii) ⇒ (iii): Take the union over the closures of all sets in the base.
(iii) ⇒ (vi): Let X = ⋃_n Kn with Kn compact. Without loss Kn ⊆ Kn+1 .
For a given compact set K we can find a relatively compact open set V (K)
such that K ⊆ V (K) (cover K by relatively compact open balls and choose
a finite subcover). Now define U1 := V (K1 ) and Un+1 := V (Kn+1 ∪ U̅n ). (vi) ⇒ (i): Each of the sets Un
has a countable dense subset by Corollary B.21. The union gives a countable
dense set for X. Since every x ∈ Un for some n, X is also locally compact.
B.6. Separation
The distance between a point x ∈ X and a subset Y ⊆ X is
dist(x, Y ) := inf_{y∈Y} d(x, y). (B.22)
A topological space is called normal if for any two disjoint closed sets C1
and C2 , there are disjoint open sets O1 and O2 such that Cj ⊆ Oj , j = 1, 2.
Lemma B.28 (Urysohn). Let X be a topological space. Then X is normal
if and only if for every pair of disjoint closed sets C1 and C2 , there exists a
continuous function f : X → [0, 1] which is one on C1 and zero on C2 .
If in addition X is locally compact and C1 is compact, then f can be
chosen to have compact support.
that f := f s = f i is continuous.
Conversely, given f choose O1 := f −1 ([0, 1/2)) and O2 := f −1 ((1/2, 1]).
For the second claim, observe that there is an open set O0 such that O0
is compact and C1 ⊂ O0 ⊂ O0 ⊂ X \ C2 . In fact, for every x ∈ C1 , there
is a ball Bε (x) such that Bε (x) is compact and Bε (x) ⊂ X \ C2 . Since C1
is compact, finitely many of them cover C1 and we can choose the union of
those balls to be O0 .
Example B.21. In a metric space we can choose f (x) := dist(x, C2 )/(dist(x, C1 ) + dist(x, C2 ))
and hence every metric space is normal.
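This explicit Urysohn function is easy to try out; the particular sets C1 = [−2, −1] and C2 = [1, 2] below are assumptions made only for this sketch:

```python
def dist(x, C):
    # distance from a point x ∈ R to a closed interval C = [a, b]
    a, b = C
    return max(a - x, 0.0, x - b)

C1, C2 = (-2.0, -1.0), (1.0, 2.0)   # two disjoint closed sets

def f(x):
    # one on C1, zero on C2, values in [0, 1] everywhere
    return dist(x, C2) / (dist(x, C1) + dist(x, C2))

print(f(-1.5) == 1.0, f(1.5) == 0.0, 0.0 < f(0.0) < 1.0)  # → True True True
```

The denominator never vanishes precisely because C1 and C2 are closed and disjoint, which is where the metric structure enters.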
Another important result is the Tietze extension theorem:
Theorem B.29 (Tietze). Suppose C is a closed subset of a normal topo-
logical space X. For every continuous function f : C → [−1, 1] there is a
continuous extension f¯ : X → [−1, 1].
obtain a function f2 such that |f (x) − f1 (x) − f2 (x)| ≤ (2/3)² for x ∈ C and
|f2 (x)| ≤ (1/3)(2/3). Continuing this process we arrive at a sequence of functions
fn such that |f (x) − ∑_{j=1}^{n} fj (x)| ≤ (2/3)ⁿ for x ∈ C and |fn (x)| ≤ (1/3)(2/3)ⁿ⁻¹.
By construction the corresponding series converges uniformly to the desired
extension f̄ := ∑_{j=1}^{∞} fj .
we see that all fn (and hence all gn ) are continuous. Moreover, the very same
argument shows that f∞ is continuous and thus we have found the required
partition of unity hj = gj /f∞ .
Example B.22. The standard bump function is φ(x) := exp(1/(|x|² − 1)) for
|x| < 1 and φ(x) := 0 otherwise. To show that this function is indeed smooth
it suffices to show that all left derivatives of f (r) = exp(1/(r − 1)) at r = 1
vanish, which can be done using l'Hôpital's rule. By scaling and translation,
φ((x − x0 )/r) gives a bump function which is supported in Br (x0 ).
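The bump function can be checked directly (the sample points below are our own choice):

```python
import math

def phi(x):
    # the standard bump: exp(1/(x^2 - 1)) inside (-1, 1), zero outside
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0

def bump(x, x0=0.0, r=1.0):
    # scaled and translated copy, supported in B_r(x0)
    return phi((x - x0) / r)

print(bump(0.0) == math.exp(-1.0), bump(1.0) == 0.0,
      bump(2.5, x0=2.0, r=1.0) > 0.0)  # → True True True
```

Note that the value exp(−1) at the center and the vanishing at the boundary match the formula; smoothness at |x| = 1 is exactly the analytic point settled by the l'Hôpital argument above.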
Proof. Let Uj be as in Lemma B.25 (iv). For U j choose finitely many bump
functions h̃j,k such that h̃j,1 (x) + · · · + h̃j,kj (x) > 0 for every x ∈ U j \ Uj−1
and such that supp(h̃j,k ) is contained in one of the Ok and in Uj+1 \ Uj−1 .
Then {h̃j,k }j,k is locally finite and hence h := ∑_{j,k} h̃j,k is a smooth function
B.7. Connectedness
Roughly speaking a topological space X is disconnected if it can be split
into two (nonempty) separated sets. This of course raises the question what
should be meant by separated. Evidently it should be more than just disjoint
since otherwise we could split any space containing more than one point.
Hence we will consider two sets separated if each is disjoint from the closure
of the other. Note that if we can split X into two separated sets X = U ∪ V,
then U̅ ∩ V = ∅ implies U̅ = U (and similarly V̅ = V). Hence both sets
must be closed and thus also open (being complements of each other). This
brings us to the following definition:
A topological space X is called disconnected if one of the following
equivalent conditions holds
• X is the union of two nonempty separated sets.
• X is the union of two nonempty disjoint open sets.
• X is the union of two nonempty disjoint closed sets.
In this case the sets from the splitting are both open and closed. A topo-
logical space X is called connected if it cannot be split as above. That
is, in a connected space X the only sets which are both open and closed
are ∅ and X. This last observation is frequently used in proofs: If the set
where a property holds is both open and closed it must either hold nowhere
or everywhere. In particular, any continuous mapping from a connected to
a discrete space must be constant since the inverse image of a point is both
open and closed.
A subset of X is called (dis-)connected if it is (dis-)connected with respect
to the relative topology. In other words, a subset A ⊆ X is disconnected if
there are disjoint nonempty open sets U and V which split A according to
A = (U ∩ A) ∪ (V ∩ A).
Example B.23. In R the nonempty connected sets are precisely the inter-
vals (Problem B.32). Consequently A = [0, 1] ∪ [2, 3] is disconnected with
[0, 1] and [2, 3] being its components (to be defined precisely below). While
you might be reluctant to consider the closed interval [0, 1] as open, it is im-
portant to observe that it is the relative topology which is relevant here.
The maximal connected subsets (ordered by inclusion) of a nonempty
topological space X are called the connected components of X.
Example B.24. Consider Q ⊆ R. Then every rational point is its own
component (if a set of rational points contains more than one point there
would be an irrational point in between which can be used to split the set).
In many applications one also needs the following stronger concept. A
space X is called path-connected if any two points x, y ∈ X can be joined
by a path, that is a continuous map γ : [0, 1] → X with γ(0) = x and
γ(1) = y. A space is called locally (path-)connected if for every given
point and every open set containing that point there is a smaller open set
which is (path-)connected.
Example B.25. Every normed vector space is (locally) path-connected since
every ball is path-connected (consider straight lines). In fact this also holds
for locally convex spaces. Every open subset of a locally (path-)connected
space is locally (path-)connected.
B.7. Connectedness 355
A few simple consequences are also worth noting: If two different
components contain a common point, their union is again connected, con-
tradicting maximality. Hence two different components are always disjoint.
Moreover, every point is contained in a component, namely the union of all
connected sets containing this point. In other words, the components of any
topological space X form a partition of X (i.e., they are disjoint, nonempty,
and their union is X). Moreover, every component is a closed subset of the
original space X. In the case where their number is finite we can take com-
plements and each component is also an open subset (the rational numbers
from our first example show that components are not open in general). In a
locally (path-)connected space, components are open and (path-)connected
by (vi) of the last lemma. Note also that in a second countable space an
open set can have at most countably many components (take those sets from
a countable base which are contained in some component, then we have a
surjective map from these sets to the components).
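For an open subset of R, whose components are open intervals by the above, the components of a finite union of open intervals can be computed by merging overlapping intervals. A small sketch (our own illustration, not part of the text):

```python
def components(intervals):
    """Merge a finite list of open intervals (a, b) into the connected
    components of their union, returned sorted by left endpoint."""
    comps = []
    for a, b in sorted(intervals):
        # Open intervals lie in the same component iff they genuinely
        # overlap; merely touching endpoints (0,1), (1,2) do not merge,
        # since the common endpoint is missing from the union.
        if comps and a < comps[-1][1]:
            comps[-1][1] = max(comps[-1][1], b)
        else:
            comps.append([a, b])
    return [tuple(c) for c in comps]
```

For instance, `components([(0, 1), (0.5, 2), (3, 4)])` returns the two components `(0, 2)` and `(3, 4)`, mirroring Example B.23.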
Example B.27. Consider the graph of the function f : (0, 1] → R, x ↦
sin(1/x). Then Γ(f) ⊆ R² is path-connected, and its closure, Γ(f) ∪
{0} × [−1, 1], is connected. However, the closure is not path-connected as
there is no path from (1, sin(1)) to (0, 0). Indeed, suppose γ were such a
path. Then, since γ1 covers [0, 1] by the intermediate value theorem (see
below), there is a sequence tn → 1 such that γ1(tn) = 2/((2n + 1)π). But
then γ2(tn) = sin(1/γ1(tn)) = (−1)^n, which does not converge to 0,
contradicting continuity.
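The key computation in this example, sin((2n + 1)π/2) = (−1)^n, is easy to verify numerically (a quick sanity check of ours, not part of the text):

```python
import math

# The points x_n = 2/((2n+1)pi) tend to 0, yet sin(1/x_n) alternates
# between +1 and -1, so the graph points (x_n, sin(1/x_n)) cannot
# converge to (0, 0).
for n in range(6):
    x = 2 / ((2 * n + 1) * math.pi)
    assert abs(math.sin(1 / x) - (-1) ** n) < 1e-9
```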
B.8. Continuous functions on metric spaces 357
since d(fn(z), y) < ε/2 implies d(f(z), y) ≤ d(f(z), fn(z)) + d(fn(z), y) ≤
ε/2 + ε/2 = ε for n ≥ N.
Corollary B.35. Let X be a topological space and Y a complete metric
space. The space Cb (X, Y ) together with the metric d is complete.
Proof. Choose a countable base B for X and let I be the collection of all
balls in C^n with rational radius and center. Given O1, . . . , Om ∈ B and
I1, . . . , Im ∈ I we say that f ∈ Cc(X, C^n) is adapted to these sets if
supp(f) ⊆ O1 ∪ · · · ∪ Om and f(Oj) ⊆ Ij. The set of all tuples
(Oj, Ij), 1 ≤ j ≤ m, is countable and for each tuple we choose a
corresponding adapted function (if one exists at all). Then the set F of
these functions is dense. It suffices to show that the closure of F
contains Cc(X, C^n). So let f ∈ Cc(X, C^n) and let ε > 0 be given. Then
for every x ∈ X there is some neighborhood O(x) ∈ B such that
|f(x) − f(y)| < ε for y ∈ O(x). Since supp(f) is compact, it can be
covered by O(x1), . . . , O(xm). In particular, f(O(xj)) ⊆ Bε(f(xj)) and we
can find a ball Ij of radius at most 2ε with f(O(xj)) ⊆ Ij. Now let g be
the function from F which is adapted to (O(xj), Ij), 1 ≤ j ≤ m, and observe
that |f(x) − g(x)| < 4ε since x ∈ O(xj) implies f(x), g(x) ∈ Ij.
Proof. Suppose the claim were wrong. Then there is some ε > 0 such that for
every δn = 1/n we can find xn, yn with dX(xn, yn) < δn but
dY(f(xn), f(yn)) ≥ ε. Since X is compact, we can assume that xn converges
to some x ∈ X (after passing to a subsequence if necessary). Then we also
have yn → x, implying dY(f(xn), f(yn)) → 0, a contradiction.
The proof will use the fact that the absolute value can be approximated
by polynomials on [−1, 1]. This of course follows from the Weierstraß ap-
proximation theorem but can also be seen directly by defining the sequence
of polynomials pn via

p1(t) := 0,    pn+1(t) := pn(t) + (t² − pn(t)²)/2.    (B.28)

Then this sequence of polynomials satisfies pn(t) ≤ pn+1(t) ≤ |t| and con-
verges pointwise to |t| for t ∈ [−1, 1]. Hence by Dini’s theorem (Prob-
lem B.35) it converges uniformly. By scaling we get the corresponding result
for arbitrary compact subsets of the real line.
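The recursion (B.28) is easy to run numerically; the sketch below (our own illustration, not from the text) iterates it on a grid and checks the monotone convergence to |t|:

```python
def abs_approx(t, n):
    """Evaluate the n-th polynomial of the recursion (B.28),
    p_1 = 0,  p_{k+1}(t) = p_k(t) + (t^2 - p_k(t)^2) / 2,
    which increases monotonically to |t| on [-1, 1]."""
    p = 0.0
    for _ in range(n - 1):
        p = p + (t * t - p * p) / 2
    return p

# Uniform error on a grid in [-1, 1]; convergence is slowest near t = 0,
# with error of order 1/n, so many iterations are needed.
grid = [i / 100 for i in range(-100, 101)]
err = max(abs(abs(t) - abs_approx(t, 2000)) for t in grid)
```

After 2000 iterations the uniform error on the grid is below 0.01, and the values abs_approx(t, n) increase in n while staying below |t|, as claimed.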
Proof. Just observe that F̃ = {Re(f ), Im(f )|f ∈ F } satisfies the assump-
tion of the real version. Hence every real-valued continuous function can be
approximated by elements from the subalgebra generated by F̃ ; in particular,
this holds for the real and imaginary parts of every given complex-valued
function. Finally, note that the subalgebra spanned by F̃ is contained in the
∗-subalgebra spanned by F .
Note that the additional requirement of being closed under complex con-
jugation is crucial: The functions holomorphic on the unit disc and contin-
uous on the boundary separate points, but they are not dense (since the
uniform limit of holomorphic functions is again holomorphic).
Proof. There are two possibilities: either all f ∈ F vanish at one point
t0 ∈ K (there can be at most one such point since F separates points) or
there is no such point.
If there is no such point, then the identity can be approximated by
elements in A: First of all note that |f| ∈ A if f ∈ A, since the polynomials
pn(t) used to prove this fact can be replaced by pn(t) − pn(0), which
contains no constant term. Hence for every point y we can find a nonnegative
function in A which is positive at y, and by compactness we can find a finite
sum of such functions which is positive everywhere, say 0 < m ≤ f(t) ≤ M.
Now approximate min(m⁻¹t, t⁻¹) by polynomials qn(t) (again a constant term
is not needed) to conclude that qn(f) → f⁻¹ ∈ A. Hence 1 = f · f⁻¹ ∈ A as
claimed and so A = C(K) by the Stone–Weierstraß theorem.
If there is such a t0, we have A ⊆ {f ∈ C(K) | f(t0) = 0} and the identity
is clearly missing from A. However, adding the identity to A we get A + C =
C(K) by the Stone–Weierstraß theorem. Moreover, if f ∈ C(K) with f(t0) =
0, we get f = f̃ + α with f̃ ∈ A and α ∈ C. But 0 = f(t0) = f̃(t0) + α = α
implies f = f̃ ∈ A, that is, A = {f ∈ C(K) | f(t0) = 0}.