Graduate Texts in Mathematics
Manfred Einsiedler
Thomas Ward
Functional
Analysis,
Spectral Theory,
and Applications
Graduate Texts in Mathematics 276
Series Editors:
Sheldon Axler
San Francisco State University, San Francisco, CA, USA
Kenneth Ribet
University of California, Berkeley, CA, USA
Advisory Board:
Alejandro Adem, University of British Columbia

David Eisenbud, University of California, Berkeley & MSRI
Irene M. Gamba, The University of Texas at Austin
J.F. Jardine, University of Western Ontario
Jeffrey C. Lagarias, University of Michigan
Ken Ono, Emory University
Jeremy Quastel, University of Toronto
Fadil Santosa, University of Minnesota
Barry Simon, California Institute of Technology
Graduate Texts in Mathematics bridge the gap between passive study and creative
understanding, offering graduate-level introductions to advanced topics in
mathematics. The volumes are carefully written as teaching aids and highlight
characteristic features of the theory. Although these books are frequently used
as textbooks in graduate courses, they are also suitable for individual study.
More information about this series at http://www.springer.com/series/136

Manfred Einsiedler
ETH Zürich
Zürich, Switzerland
Thomas Ward
School of Mathematics
University of Leeds
Leeds, UK
ISSN 0072-5285 ISSN 2197-5612 (electronic)

ISBN 978-3-319-58539-0 ISBN 978-3-319-58540-6 (eBook)
DOI 10.1007/978-3-319-58540-6
Library of Congress Control Number: 2017946473
Mathematics Subject Classification (2010): 46-01, 47-01, 11N05, 20F69, 22B05, 35J25, 35P10, 35P20,
37A99, 47A60
© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein
or for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Believe us, we also asked ourselves what could be the rationale for ‘Yet another
book on functional analysis’.(1) Little indeed can justify this beyond
our own enjoyment of the beauty and power of the topics introduced here.
Functional analysis might be described as a part of mathematics where
analysis, topology, measure theory, linear algebra, and algebra come together
to create a rich and fascinating theory. The applications of this theory are
then equally spread throughout mathematics (and beyond).
We follow some fairly conventional journeys, and have of course been influenced
by other books, most notably that of Lax [59]. While developing the
theory we include reminders of the various areas that we build on (in the
appendices and throughout the text) but we also reach some fairly advanced
and diverse applications of the material usually called functional analysis that
often do not find their place in a course on that topic.
The assembled material probably cannot be covered in a year-long course,
but has grown out of several such introductory courses taught at the
Eidgenössische Technische Hochschule Zürich by the first named author, with
a slightly different emphasis on each occasion. Both the student and
(especially) the lecturer should be brave enough to jump over topics and pick the
material of most interest, but we hope that the student will eventually be
sufficiently interested to find out what happens in the material that was not
covered initially. The motivation for the topics discussed may be found in
Chapter 1.
Notation and Conventions
The symbols $\mathbb{N} = \{1, 2, \dots\}$, $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, and $\mathbb{Z}$ denote the natural
numbers, non-negative integers and integers; $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ denote the rational
numbers, real numbers and complex numbers. The real and imaginary parts
of a complex number are denoted by $x = \Re(x + \mathrm{i}y)$ and $y = \Im(x + \mathrm{i}y)$.
For functions $f, g$ defined on a set $X$ we write $f = O(g)$ or $f \ll g$ if there
is a constant $A > 0$ with $\|f(x)\| \leqslant A\|g(x)\|$ for all $x \in X$. When the implied
constant $A$ depends on a set of parameters $\mathcal{A}$, we write $f = O_{\mathcal{A}}(g)$ or $f \ll_{\mathcal{A}} g$
(but we may also forget the index if the set of parameters will not vary at
all in the discussion). A sequence $a_1, a_2, \dots$ in any space will be denoted $(a_n)$
(or $(a_n)_n$ if we wish to emphasize the index variable of the sequence). For
two $\mathbb{C}$-valued functions $f, g$ defined on $X \smallsetminus \{x_0\}$ for a topological space $X$
containing $x_0$ we write $f = o(g)$ as $x \to x_0$ if $\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0$. This definition
includes the case of sequences by letting $X = \mathbb{N} \cup \{\infty\}$ and $x_0 = \infty$ with
the topology of the one-point compactification. Additional specific notation
introduced throughout the text is collected in an index of notation on p. 600.
Prerequisites
We will assume throughout that the reader is familiar with linear algebra
and quite frequently that she is also familiar with finite-dimensional real
analysis and complex analysis in one variable. Further background and
conventions in topology and measure theory are collected in two appendices, but
let us note that throughout compact and locally compact spaces are implicitly
assumed to be Hausdorff.
Organisation
There are 402 exercises in the text, 221 of these with hints in an appendix,
all of which contribute to the reader’s understanding of the material. A small
number are essential to the development (of the ideas in the section or of later
theories); these are denoted ‘Essential Exercise’ to highlight their significance.
We indicate the dependencies between the various chapters in the Leitfaden
overleaf and in the guide to the chapters that follows it.
Acknowledgements
We are thankful for various discussions with Menny Aka, Uri Bader,
Michael Björklund, Marc Burger, Elon Lindenstrauss, Shahar Mozes, René
Rühr, Akshay Venkatesh, and Benjamin Weiss on some of the topics presented
here. We also thank Emmanuel Kowalski for making available his notes
on spectral theory and allowing us to raid them. We are grateful to several
people for their comments on drafts of sections, including Menny Aka,
Manuel Cavegn, Rex Cheung, Anthony Flatters, Maxim Gerspach, Tommaso
Goldhirsch, Thomas Hille, Guido Lob, Manuel Lüthi, Clemens Macho, Alex
Maier, Andrea Riva, René Rühr, Lukas Ruosch, Georg Schildbach, Samuel
Stark, Andreas Wieser, Philipp Wirth, and Gao Yunting. Special thanks are
due to Roland Prohaska, who proofread the whole volume in four months.
Needless to say, despite these many helpful eyes, some typographical and
other errors will remain — these are of course solely the responsibility of the
authors.
The second named author also thanks Grete for her repeated hospitality
which significantly aided this book’s completion, and thanks Saskia and Toby
for doing their utmost to prevent it.
Manfred Einsiedler, Zürich
Thomas Ward, Leeds
2nd April 2017
Leitfaden
(Chapter dependency diagram; the arrows between chapters are not reproduced here.)
2 Banach Spaces
3 Hilbert Spaces & Fourier Series
4 Completeness
5 Sobolev Spaces & Dirichlet Problem
6 Compact Operators & Weyl’s Law
7 Dual Spaces
8 Weak* Compactness & Locally Convex Spaces
9 Unitary Operators & Fourier Transform
10 Haar Measure, Amenability, & Property (T)
11 Banach Algebras & Pontryagin Duals
12 Spectral Theorems & Pontryagin Duality
13 Unbounded Operators
14 Prime Number Theorem
Guide to Chapters
Chapter 1 is mostly motivational in character and can be skipped for the
theoretical discussions later.
Chapter 4 has a somewhat odd role in this volume. On the one hand
it presents quite central theorems for functional analysis that also influence
many of the definitions later in the volume, but on the other hand, by chance,
the theorems are not crucial for our later discussions.
The dotted arrows in the Leitfaden indicate partial dependencies. Chapter 6
consists of two parts; the discussion of compact groups depends only on
Chapter 3 while the material on Laplace eigenfunctions also builds on material
from Chapter 5. The discussion of the adjoint operator and its properties
in Chapter 6 is crucial for the spectral theory in Chapters 11, 12, and 13.
Moreover, one section in Chapter 8 builds on and finishes our discussion of
Sobolev spaces in Chapters 5 and 6. Finally, some of Chapter 11 needs the
discussion of Haar measures in the first section of Chapter 10.
With these comments and the Leitfaden it should be easy to design many
different courses of different lengths focused around the topic of Functional
Analysis.
Contents
1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 From Even and Odd Functions to Group Representations . . . . 1
1.2 Partial Differential Equations and the Laplace Operator . . . . . 5
1.2.1 The Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 The Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.3 The Mantegna Fresco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 What is Spectral Theory? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 The Prime Number Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Norms and Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.1 Norms and Semi-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Normed Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Semi-Norms and Quotient Norms . . . . . . . . . . . . . . . . . . . 21
2.1.3 Isometries are Affine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.4 A Comment on Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2 Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.1 Proofs of Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2 The Completion of a Normed Vector Space . . . . . . . . . . 36
2.2.3 Non-Compactness of the Unit Ball . . . . . . . . . . . . . . . . . . 38
2.3 The Space of Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.1 The Arzela–Ascoli Theorem . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.2 The Stone–Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . 42
2.3.3 Equidistribution of a Sequence . . . . . . . . . . . . . . . . . . . . . 48
2.3.4 Continuous Functions in $L^p$ Spaces . . . . . . . . . . . . . . . . . 51
2.4 Bounded Operators and Functionals . . . . . . . . . . . . . . . . . . . . . . 55
2.4.1 The Norm of Continuous Functionals on $C_0(X)$ . . . . . . 60
2.4.2 Banach Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5 Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.5.1 The Volterra Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5.2 The Sturm–Liouville Equation . . . . . . . . . . . . . . . . . . . . . 66
2.6 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3 Hilbert Spaces, Fourier Series, Unitary Representations . . 71

3.1 Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.1.1 Definitions and Elementary Properties . . . . . . . . . . . . . . 71
3.1.2 Convex Sets in Uniformly Convex Spaces . . . . . . . . . . . . 75
3.1.3 An Application to Measure Theory . . . . . . . . . . . . . . . . . 83
3.2 Orthonormal Bases and Gram–Schmidt . . . . . . . . . . . . . . . . . . . 86
3.2.1 The Non-Separable Case . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.3 Fourier Series on Compact Abelian Groups . . . . . . . . . . . . . . . . 91
3.4 Fourier Series on $\mathbb{T}^d$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.4.1 Convolution on the Torus . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.4.2 Dirichlet and Fejér Kernels . . . . . . . . . . . . . . . . . . . . . . . . 99
3.4.3 Differentiability and Fourier Series . . . . . . . . . . . . . . . . . . 104
3.5 Group Actions and Representations . . . . . . . . . . . . . . . . . . . . . . . 106
3.5.1 Group Actions and Unitary Representations . . . . . . . . . 107
3.5.2 Unitary Representations of Compact Abelian Groups . 110
3.5.3 The Strong (Riemann) Integral. . . . . . . . . . . . . . . . . . . . . 111
3.5.4 The Weak (Lebesgue) Integral . . . . . . . . . . . . . . . . . . . . . 113
3.5.5 Proof of the Weight Decomposition . . . . . . . . . . . . . . . . . 115
3.5.6 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.6 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4 Uniform Boundedness and the Open Mapping Theorem . . 121

4.1 Uniform Boundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.1.1 Uniform Boundedness and Fourier Series . . . . . . . . . . . . 123
4.2 The Open Mapping and Closed Graph Theorems . . . . . . . . . . . 126
4.2.1 Baire Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.2.2 Proof of the Open Mapping Theorem . . . . . . . . . . . . . . . 128
4.2.3 Consequences: Bounded Inverses and Closed Graphs . . 130
4.3 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5 Sobolev Spaces and Dirichlet’s Boundary Problem . . . . . . . . 135

5.1 Sobolev Spaces and Embedding on the Torus . . . . . . . . . . . . . . 135
5.1.1 $L^2$ Sobolev Spaces on $\mathbb{T}^d$ . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.1.2 The Sobolev Embedding Theorem on $\mathbb{T}^d$ . . . . . . . . . . . . 138
5.2 Sobolev Spaces on Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.2.2 Restriction Operators and Traces . . . . . . . . . . . . . . . . . . . 146
5.2.3 Sobolev Embedding in the Interior . . . . . . . . . . . . . . . . . . 149
5.3 Dirichlet’s Boundary Value Problem and Elliptic Regularity . 152
5.3.1 The Semi-Inner Product . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.3.2 Elliptic Regularity for the Laplace Operator . . . . . . . . . 155
5.3.3 Dirichlet’s Boundary Value Problem . . . . . . . . . . . . . . . . 160
5.4 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6 Compact Self-Adjoint Operators, Laplace Eigenfunctions . 167

6.1 Compact Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.1.1 Integral Operators are Often Compact . . . . . . . . . . . . . . 170
6.2 Spectral Theory of Self-Adjoint Compact Operators. . . . . . . . . 174
6.2.1 The Adjoint Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.2.2 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.2.3 Proof of the Spectral Theorem . . . . . . . . . . . . . . . . . . . . . 178
6.2.4 Variational Characterization of Eigenvalues . . . . . . . . . . 181
6.3 Trace-Class Operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.4 Eigenfunctions for the Laplace Operator . . . . . . . . . . . . . . . . . . . 196
6.4.1 Right Inverse and Compactness on the Torus . . . . . . . . 197
6.4.2 A Self-Adjoint Compact Right Inverse . . . . . . . . . . . . . . 198
6.4.3 Eigenfunctions on a Drum . . . . . . . . . . . . . . . . . . . . . . . . . 199
6.4.4 Weyl’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
7 Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

7.1 The Hahn–Banach Theorem and its Consequences . . . . . . . . . . 209
7.1.1 The Hahn–Banach Lemma and Theorem . . . . . . . . . . . . 209
7.1.2 Consequences of the Hahn–Banach Theorem . . . . . . . . . 212
7.1.3 The Bidual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
7.1.4 An Application of the Spanning Criterion . . . . . . . . . . . 214
7.2 Banach Limits, Amenable Groups, Banach–Tarski . . . . . . . . . . 217
7.2.1 Banach Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.2.2 Amenable Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
7.2.3 The Banach–Tarski Paradox . . . . . . . . . . . . . . . . . . . . . . . 223
7.3 The Duals of $L^p_\mu(X)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.3.1 The Dual of $L^1_\mu(X)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3.2 The Dual of $L^p_\mu(X)$ for $p > 1$ . . . . . . . . . . . . . . . . . . . . . . 230
7.3.3 Riesz–Thorin Interpolation . . . . . . . . . . . . . . . . . . . . . . . . 233
7.4 Riesz Representation: The Dual of C(X) . . . . . . . . . . . . . . . . . . 239
7.4.1 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.4.2 Totally Disconnected Compact Spaces . . . . . . . . . . . . . . 240
7.4.3 Compact Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.4.4 Locally Compact σ-Compact Metric Spaces . . . . . . . . . . 246
7.4.5 Continuous Linear Functionals on $C_0(X)$ . . . . . . . . . . . . 248
7.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
8 Locally Convex Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

8.1 Weak Topologies and the Banach–Alaoglu Theorem . . . . . . . . . 253
8.1.1 Weak* Compactness of the Unit Ball . . . . . . . . . . . . . . . 256
8.1.2 More Properties of the Weak and Weak* Topologies . . 257
8.1.3 Analytic Functions and the Weak Topology . . . . . . . . . . 260
8.2 Applications of Weak* Compactness . . . . . . . . . . . . . . . . . . . . . . 261
8.2.1 Equidistribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.2.2 Elliptic Regularity for the Laplace Operator . . . . . . . . . 270

8.2.3 Elliptic Regularity at the Boundary . . . . . . . . . . . . . . . . . 278
8.3 Topologies on the space of bounded operators . . . . . . . . . . . . . . 290
8.4 Locally Convex Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
8.5 Distributions as Generalized Functions . . . . . . . . . . . . . . . . . . . . 296
8.6 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
8.6.1 Extreme Points and the Krein–Milman Theorem . . . . . 301
8.6.2 Choquet’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
8.7 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
9 Unitary Operators and Flows, Fourier Transform . . . . . . . . . 313

9.1 Spectral Theory of Unitary Operators . . . . . . . . . . . . . . . . . . . . . 313
9.1.1 Herglotz’s Theorem for Positive-Definite Sequences . . . 314
9.1.2 Cyclic Representations and the Spectral Theorem . . . . 316
9.1.3 Spectral Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
9.1.4 Functional Calculus for Unitary Operators . . . . . . . . . . . 323
9.1.5 An Application of Spectral Theory to Dynamics . . . . . . 326
9.2 The Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
9.2.1 The Fourier Transform on $L^1(\mathbb{R}^d)$ . . . . . . . . . . . . . . . . . . 331
9.2.2 The Fourier Transform on $L^2(\mathbb{R}^d)$ . . . . . . . . . . . . . . . . . . 337
9.2.3 The Fourier Transform, Smoothness, Schwartz Space . . 340
9.2.4 The Uncertainty Principle . . . . . . . . . . . . . . . . . . . . . . . . . 342
9.3 Spectral Theory of Unitary Flows . . . . . . . . . . . . . . . . . . . . . . . . 344
9.3.1 Positive-Definite Functions; Cyclic Representations . . . 344
9.3.2 The Case $G = \mathbb{R}^d$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
9.3.3 Stone’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
9.4 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
10 Locally Compact Groups, Amenability, Property (T) . . . . . 353

10.1 Haar Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
10.2 Amenable Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
10.2.1 Definitions and Main Theorem . . . . . . . . . . . . . . . . . . . . . 362
10.2.2 Proof of Theorem 10.15 . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
10.2.3 A More Uniform Følner Set . . . . . . . . . . . . . . . . . . . . . . . . 371
10.2.4 Further Equivalences and Properties . . . . . . . . . . . . . . . . 373
10.3 Property (T) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
10.3.1 Definitions and First Properties . . . . . . . . . . . . . . . . . . . . 375
10.3.2 Main Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
10.3.3 Proof of Každan’s Property (T), Connected Case . . . . . 378
10.3.4 Proof of Každan’s Property (T), Discrete Case . . . . . . . 384
10.3.5 Iwasawa Decomposition and Geometry of Numbers . . . 392
10.4 Highly Connected Networks: Expanders . . . . . . . . . . . . . . . . . . . 400
10.4.1 Constructing an Explicit Expander Family . . . . . . . . . . . 406
10.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
11 Banach Algebras and the Spectrum . . . . . . . . . . . . . . . . . . . . . . . 409

11.1 The Spectrum and Spectral Radius . . . . . . . . . . . . . . . . . . . . . . . 409
11.1.1 The Geometric Series and its Consequences . . . . . . . . . . 411
11.1.2 Using Cauchy Integration . . . . . . . . . . . . . . . . . . . . . . . . . 413
11.2 $C^*$-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
11.3 Commutative Banach Algebras and their Gelfand Duals . . . . . 418
11.3.1 Commutative Unital Banach Algebras . . . . . . . . . . . . . . 419
11.3.2 Commutative Banach Algebras without a Unit . . . . . . . 421
11.3.3 The Gelfand Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
11.3.4 The Gelfand Transform for Commutative $C^*$-algebras . 423
11.4 Locally Compact Abelian Groups . . . . . . . . . . . . . . . . . . . . . . . . . 425
11.4.1 The Pontryagin Dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
11.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
12 Spectral Theory and Functional Calculus . . . . . . . . . . . . . . . . . 433

12.1 Definitions and Basic Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
12.1.1 Decomposing the Spectrum . . . . . . . . . . . . . . . . . . . . . . . . 433
12.1.2 The Numerical Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
12.1.3 The Essential Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
12.2 The Spectrum of a Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
12.2.1 The Correct Upper Bound for the Summing Operator . 439
12.2.2 The Spectrum of S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
12.2.3 No Eigenvectors on the Tree . . . . . . . . . . . . . . . . . . . . . . . 442
12.3 Main Goals: The Spectral Theorem and Functional Calculus . 443
12.4 Self-Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
12.4.1 Continuous Functional Calculus . . . . . . . . . . . . . . . . . . . . 447
12.4.2 Corollaries to the Continuous Functional Calculus . . . . 451
12.4.3 Spectral Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
12.4.4 The Spectral Theorem for Self-Adjoint Operators . . . . . 454
12.4.5 Consequences for Unitary Representations . . . . . . . . . . . 458
12.5 Commuting Normal Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
12.6 Spectral Measures and the Measurable Functional Calculus . . 461
12.6.1 Non-Diagonal Spectral Measures . . . . . . . . . . . . . . . . . . . 461
12.6.2 The Measurable Functional Calculus . . . . . . . . . . . . . . . . 462
12.7 Projection-Valued Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
12.8 Locally Compact Abelian Groups and Pontryagin Duality . . . 473
12.8.1 The Spectral Theorem for Unitary Representations . . . 474
12.8.2 Characters Separate Points . . . . . . . . . . . . . . . . . . . . . . . . 477
12.8.3 The Plancherel Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
12.8.4 Pontryagin Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
12.9 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
13 Self-Adjoint and Symmetric Operators . . . . . . . . . . . . . . . . . . . 487

13.1 Examples and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
13.2 Operators of the Form $T^*T$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
13.3 Self-Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
13.4 Symmetric Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
13.4.1 The Friedrichs Extension . . . . . . . . . . . . . . . . . . . . . . . . . . 499
13.4.2 Cayley Transform and Deficiency Indices . . . . . . . . . . . . 500
13.5 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
14 The Prime Number Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503

14.1 Two Reformulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
14.2 The Selberg Symmetry Formula and Banach Algebra Norm . . 507
14.2.1 Dirichlet Convolution and Möbius Inversion. . . . . . . . . . 507
14.2.2 The Selberg Symmetry Formula . . . . . . . . . . . . . . . . . . . . 509
14.2.3 Measure-Theoretic Reformulation . . . . . . . . . . . . . . . . . . 514
14.2.4 A Density Function and the Continuity Bound . . . . . . . 517
14.2.5 Mertens’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
14.2.6 Completing the Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
14.3 Non-Trivial Spectrum of the Banach Algebra . . . . . . . . . . . . . . . 523
14.4 Trivial Spectrum of the Banach Algebra . . . . . . . . . . . . . . . . . . . 524
14.5 Primes in Arithmetic Progressions . . . . . . . . . . . . . . . . . . . . . . . . 526
14.5.1 Non-Vanishing of Dirichlet L-function at 1 . . . . . . . . . . . 529
Appendix A: Set Theory and Topology . . . . . . . . . . . . . . . . . . . . . . . 537

A.1 Set Theory and the Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . 537
A.2 Basic Definitions in Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
A.3 Inducing Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
A.4 Compact Sets and Tychonoff’s Theorem . . . . . . . . . . . . . . . . . . . 545
A.5 Normal Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
Appendix B: Measure Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551

B.1 Basic Definitions and Measurability . . . . . . . . . . . . . . . . . . . . . . . 551
B.2 Properties of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
B.3 The p-Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
B.4 Near-Continuity of Measurable Functions . . . . . . . . . . . . . . . . . . 558
B.5 Signed Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
Hints for Selected Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
General Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Chapter 1
Motivation†
We start by discussing some seemingly disparate topics that are all intimately
linked to notions from functional analysis. Some of the topics have
been important motivations for the development of the theory that came
to be called functional analysis in the first place. We hope that the variety
of topics discussed in Sections 1.1–1.4 and those mentioned in Section 1.5
will help to give some insight into the central role of functional analysis in
mathematics.
1.1 From Even and Odd Functions to Group Representations
We recall the following elementary notions of symmetry and anti-symmetry
for functions. A function $f : \mathbb{R} \to \mathbb{R}$ is said to be even if $f(-x) = f(x)$ for
all $x \in \mathbb{R}$, and odd if $f(-x) = -f(x)$ for all $x \in \mathbb{R}$. Every function $f : \mathbb{R} \to \mathbb{R}$
can be split into an even and an odd component, since
$$
f(x) = \underbrace{\frac{f(x) + f(-x)}{2}}_{\text{the even part}} + \underbrace{\frac{f(x) - f(-x)}{2}}_{\text{the odd part}}. \tag{1.1}
$$
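The decomposition (1.1) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `even_part` and `odd_part` and the choice of the exponential function as a test case are illustrative assumptions. For the exponential function the two parts are the hyperbolic cosine and sine.

```python
import math


def even_part(f):
    """Return the even part x -> (f(x) + f(-x)) / 2 of f."""
    return lambda x: (f(x) + f(-x)) / 2


def odd_part(f):
    """Return the odd part x -> (f(x) - f(-x)) / 2 of f."""
    return lambda x: (f(x) - f(-x)) / 2


# Illustrative choice: for f = exp the two parts are cosh and sinh.
f = math.exp
e, o = even_part(f), odd_part(f)

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert math.isclose(e(x) + o(x), f(x))             # f = e + o, as in (1.1)
    assert math.isclose(e(-x), e(x))                   # e is even
    assert math.isclose(o(-x), -o(x), abs_tol=1e-12)   # o is odd
    assert math.isclose(e(x), math.cosh(x))
    assert math.isclose(o(x), math.sinh(x), abs_tol=1e-12)
```

The check only samples finitely many points, so it illustrates the identity rather than proving it.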
Exercise 1.1. Is the decomposition of a function into odd and even parts in (1.1) unique?
That is, if $f = e + o$ with $e$ even and $o$ odd, is $e(x) = \frac{f(x) + f(-x)}{2}$?
As one might guess, behind the definition of even and odd functions and
the decomposition in (1.1), is the group $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$ acting on $\mathbb{R}$ via the
map $x \mapsto (-1)^{\ell} x$ for $\ell \in \mathbb{Z}/2\mathbb{Z}$. Here we are using $\ell \in \mathbb{Z}$ as a shorthand for
the coset $\ell + 2\mathbb{Z} \in \mathbb{Z}/2\mathbb{Z}$.
† Chapter 1 is atypical for this book. The reader may, and the lecturer should, skip this
chapter or return to it later, as convenient.
© Springer International Publishing AG 2017 1

M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications,
Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_1
In order to generalize this observation, recall that an action of a group $G$
on a set $X$ is a map $G \times X \to X$, written $(g, x) \mapsto g \cdot x$, with the
properties $g \cdot (h \cdot x) = (gh) \cdot x$ for all $g, h \in G$ and $x \in X$, and $e \cdot x = x$ for all $x \in X$,
where $e \in G$ is the identity element. We will sometimes write $G \curvearrowright X$ for an
action of $G$ on $X$.
Having associated (in an informal way for the moment) the decomposition
of a function into odd and even parts with the action of the group $\mathbb{Z}/2\mathbb{Z}$ on $\mathbb{R}$,
the notion of group action suggests many generalizations of the decomposition.
For this we note that an action of a group $G$ on a space $X$ gives rise
to a linear action on the space of functions on $X$ by the formula $(g, f) \mapsto f^g$
where $f^g(x) = f(g^{-1} \cdot x)$ for all $x \in X$, $g \in G$, and functions $f$ on $X$.
Exercise 1.2. Show that (ℓ_1, ℓ_2)·(x_1, x_2) = ((−1)^{ℓ_1} x_1, (−1)^{ℓ_2} x_2) for ℓ_1, ℓ_2 ∈ Z/2Z
and (x_1, x_2) ∈ R^2 defines an action of (Z/2Z)^2 on R^2. Show that every real- or complex-
valued function on R^2 can be decomposed uniquely into a sum of four functions that are
even in both variables; even in x_1 and odd in x_2; odd in x_1 and even in x_2; and odd in
both variables.
In order to define another action on R^2 we write
\[
k(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}
\]
for the matrix of anti-clockwise rotation on R^2 by the angle φ ∈ R. We
also let T = R/Z be the one-dimensional circle group or 1-torus, and define
the action of φ ∈ T on R2 by the rotation by k(2πφ). Once again we will
sometimes write t as a shorthand for the coset t + Z ∈ R/Z; in particular
the interval [0, 1) may be identified with T using addition modulo 1. We note
that we can also make T into a topological group by declaring cosets to be
close if their representatives can be chosen to be close in R or equivalently
by using the metric d(t_1 + Z, t_2 + Z) = min_{n∈Z} |t_1 − t_2 + n| for t_1, t_2 ∈ R. In
studying any situation with rotational symmetry on R2 one is naturally led
to this action. What is the corresponding decomposition of functions for this
action?
Clearly one distinguished class of functions is given by the functions in-
variant under rotation. That is, functions satisfying f (v) = f (k(2πφ)v) for
all φ ∈ T. The graph of such a function is the surface obtained by rotating a
graph of a real- or complex-valued function on [0, ∞) about the z-axis.
Let us postpone the answers to the above questions and instead consider
a finite analogue to the problem. Fix some integer q > 1 and define the
group G = Z/qZ, which acts on R2 by letting ℓ + qZ ∈ G act by rotation by
the angle 2πℓ/q using k(2πℓ/q). To simplify the notation set K = k(2π/q) and write
the action as G × R^2 ∋ (ℓ + qZ, v) ↦ K^ℓ v (which is well-defined since K^q = I).
This simplification in the notation helps to clarify the underlying structure,
and reflects one of the themes of functional analysis: thinking of progressively
more complicated objects (numbers, vectors, functions, operators) as ‘points’
in a larger space allows the structures to be seen more clearly.
We say that a complex-valued function f on R^2 has weight n for this action
if f(K^ℓ v) = e^{2πinℓ/q} f(v) for every v ∈ R^2 and ℓ + qZ ∈ G (or equivalently
only for ℓ = 1). We now generalize the formula (1.1) to the case of the finite
rotation group considered here. In fact, using the shorthand ζ = e^{2πi/q} we
define for a complex-valued function f on R^2 the functions
\[
f_n(v) = \frac{1}{q} \sum_{\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell} v) \tag{1.2}
\]
for every v ∈ R^2 and n = 0, …, q − 1. Since

\[
f_n(Kv) = \frac{1}{q} \sum_{\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell+1} v) = \frac{1}{q} \sum_{m=1}^{q} \zeta^{-n(m-1)} f(K^{m} v) = \zeta^{n} f_n(v)
\]
for every v ∈ R^2, we see that f_n has weight n. By the geometric series formula,
we see that Σ_{n=0}^{q−1} ζ^{−nℓ} equals q for ℓ = 0 and 0 for ℓ = 1, …, q − 1, so
\[
\sum_{n=0}^{q-1} f_n(v) = \frac{1}{q} \sum_{n,\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell} v) = \frac{1}{q} \sum_{\ell=0}^{q-1} \Biggl( \sum_{n=0}^{q-1} \zeta^{-n\ell} \Biggr) f(K^{\ell} v) = f(v)
\]
for every v ∈ R^2. Therefore, f can be written as a finite sum of functions of
weight n = 0, …, q − 1.
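The two identities just derived can be checked numerically. The following Python sketch evaluates (1.2) directly; the function names rotate and weight_component, the test function f, the point v and the choice q = 5 are all illustrative assumptions.

```python
import cmath
import math

def rotate(v, angle):
    """Apply the anti-clockwise rotation k(angle) to v = (x, y)."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def weight_component(f, n, q):
    """Return f_n from (1.2): v -> (1/q) * sum_l zeta^(-n*l) * f(K^l v)."""
    zeta = cmath.exp(2j * math.pi / q)
    def f_n(v):
        return sum(zeta ** (-n * l) * f(rotate(v, 2 * math.pi * l / q))
                   for l in range(q)) / q
    return f_n

# Hypothetical test function with no particular symmetry.
def f(v):
    x, y = v
    return x ** 3 + 2 * x * y + math.exp(y) + 1j * x

q = 5
zeta = cmath.exp(2j * math.pi / q)
v = (0.7, -0.4)
Kv = rotate(v, 2 * math.pi / q)
parts = [weight_component(f, n, q) for n in range(q)]

# Each f_n has weight n: f_n(K v) = zeta^n f_n(v).
for n, f_n in enumerate(parts):
    assert abs(f_n(Kv) - zeta ** n * f_n(v)) < 1e-12

# The components sum back to f.
assert abs(sum(f_n(v) for f_n in parts) - f(v)) < 1e-12
```

For q = 2 the same code reduces to the even and odd parts with respect to the map v ↦ −v.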
Let us return now to the case of the group SO2 (R) of all rotations k(2πt)
for t ∈ T. To guess what the classes of functions should be, we note that all
of the symmetries of functions considered above can be phrased naturally in
terms of the possible continuous group homomorphisms of the acting group
to the group S1 = {z ∈ C | |z| = 1}. It is easy to show (see Exercise 1.3 for
the third, non-trivial, statement) that
(1) any homomorphism Z/2Z → S^1 has the form ℓ ↦ (±1)^ℓ,
(2) any homomorphism Z/qZ → S^1 has the form ℓ ↦ e^{2πinℓ/q} = ζ^{nℓ} for
some n ∈ {0, …, q − 1}, and
(3) any continuous homomorphism T → S^1 has the form φ ↦ χ_n(φ) = e^{2πinφ}
for some n ∈ Z, so there are infinitely many such homomorphisms, and
they are naturally parameterized by the integers.
For any topological group G, we call the continuous homomorphisms from G
to S1 the (unitary) characters of G.
Exercise 1.3. Show that any character of T has the form claimed in (3) above.
Notice that each of the characters in (1) and (2) corresponds to ex-
actly one type of function in the decompositions of functions discussed
above. Generalizing this correspondence, we turn to (3) and say that a
complex-valued function f : R2 → C has weight n (is of type n) if it sat-
isfies f (k(2πφ)v) = χn (φ)f (v) for all φ ∈ T and v ∈ R2 .
One might now guess — and we will see in Chapter 3 that this is indeed
the case — that any reasonable function f : R2 → C can be written as a
linear combination
\[
f = \sum_{n \in \mathbb{Z}} f_n \tag{1.3}
\]
where f_n has weight n. However, in contrast to (1.1) this is an infinite sum,
so we are no longer talking about a purely algebraic phenomenon. The de-
composition (1.3) — its existence and its properties — lies both in algebra
and in analysis. We therefore have to become concerned both with the algeb-
raic structure and with questions of convergence. Depending on the notion of
convergence used, the class of reasonable functions turns out to vary. These
classes of reasonable functions are in fact important examples of Banach
spaces, which will be defined in Chapter 2.
The discussion above on decompositions into sums of functions of different
weights will later be part of the treatment of Fourier analysis (see Chapter 3).
For this we will initially study the mathematically simpler situation of the
action of T on T by translation, (x, y) ↦ x + y for x, y ∈ T. Adjusting
the definitions above appropriately, we say that a function f : T → C has
weight n ∈ Z if and only if f is a multiple of χn itself. We therefore seek, for
a reasonable function f : T → C, constants cn for n ∈ Z with
\[
f = \sum_{n \in \mathbb{Z}} c_n \chi_n. \tag{1.4}
\]
The right-hand side of (1.4) is called the Fourier series of f . We will see later
that it is relatively straightforward (at least in the abstract sense) to find the
Fourier coefficients cn via the identity
\[
c_n = \int_{\mathbb{T}} f(x) \overline{\chi_n(x)} \, dx
\]
for all n ∈ Z.
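As a numerical illustration of the coefficient formula, the following Python sketch approximates c_n by a Riemann sum, with the complex conjugate of χ_n in the integrand. The names chi and fourier_coefficient, the number of sample points and the test function are assumptions made for the example.

```python
import cmath
import math

def chi(n, x):
    """The character chi_n(x) = exp(2*pi*i*n*x) of T = R/Z."""
    return cmath.exp(2j * math.pi * n * x)

def fourier_coefficient(f, n, samples=256):
    """Approximate c_n = integral over T of f(x) * conj(chi_n(x)) dx
    by a Riemann sum over equally spaced points of [0, 1)."""
    return sum(f(k / samples) * chi(n, k / samples).conjugate()
               for k in range(samples)) / samples

# Hypothetical example: a trigonometric polynomial with known coefficients.
def f(x):
    return 3 * chi(0, x) + (2 - 1j) * chi(1, x) + 0.5 * chi(-4, x)

for n, expected in {0: 3, 1: 2 - 1j, -4: 0.5, 2: 0}.items():
    assert abs(fourier_coefficient(f, n) - expected) < 1e-12
```

For a trigonometric polynomial the equally spaced Riemann sum is exact up to rounding once the number of samples exceeds the range of frequencies involved; for general functions it is only an approximation, and the question of convergence of (1.4) is not touched by this sketch.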
Fourier series arise naturally in many day-to-day applications. A string
or a wind instrument playing a note is producing a periodic pressure wave
with a certain frequency. The tone humans hear usually corresponds to this
frequency, which is called the fundamental in music theory. There are also
higher frequencies, usually integer multiples of the fundamental frequency,
appearing in the wave. These frequencies are called harmonics and the ratios
between the Fourier coefficients of the harmonics and the Fourier coefficient
of the fundamental make the distinctive sound of different instruments (for
example, the flute and the clarinet) when playing the same fundamental note.
Returning to our discussion of symmetries for functions on R2 we will show
similarly that for a reasonable function f : R2 → C the function
\[
f_n(v) = \int_{\mathbb{T}} \overline{\chi_n(\phi)} f(k(2\pi\phi)v) \, d\phi
\]
for n ∈ Z has weight n (compare this with (1.2)), and that (1.3) holds.
Exercise 1.4. Show that if the function f_n(v) = ∫_T \overline{χ_n(φ)} f(k(2πφ)v) dφ is well-defined
(say the integral exists for almost every v ∈ R^2), then it has weight n.
To summarize, we will introduce classes of functions (which will be examples
of Banach spaces), and determine whether for functions in these
classes the Fourier series (1.4) or the weight decomposition (1.3) converges,
and in what sense the convergence does or does not happen.
For functions f : R3 → C one can generalize the discussion above in
many different ways, by considering the actions of various different groups as
follows:
• Z/2Z, giving the familiar generalization of even and odd functions.
• T ≅ SO_2(R) acting by rotations in the x, y-plane about the z-axis. This
gives a generalization of our discussion of functions R2 → C, and we will
be able to treat this case in a similar way to the two-dimensional case.
• SO3 (R), the full group of orientation-preserving rotations of R3 .
The last case in this list is more difficult to analyze than any of the cases dis-
cussed above. The additional complications arise since SO3 (R) is not abelian.
In fact, the group SO3 (R) is simple, and as a result there are no non-trivial
continuous homomorphisms SO3 (R) → S1 , so this cannot be used to define
classes of functions in the same way. The case of SO3 (R) requires the theory
of harmonic analysis and unitary representations of compact groups. We will
not reach these important topics here, but will lay the ground for them and
refer to the treatment in Folland [32, Ch. 5] or [26].
Although we have used actions of geometric origin to motivate the dis-
cussion above, the decompositions described hold more generally for general
linear actions of finite abelian groups on vector spaces (often called group rep-
resentations) and also for unitary representations of compact abelian groups
(with T being the main example of a compact abelian group) on Hilbert
spaces. Hilbert spaces are Banach spaces that are equipped with an inner
product, and will be introduced in Chapter 3 where we will also discuss unit-
ary representations for the first time.
1.2 Partial Differential Equations and the Laplace Operator
There is no need to motivate the study of differential equations, as they are of
central importance across all sciences concerned with measurable quantities
that change with respect to other variables of the system studied. Even the
simplest ordinary differential equations can lead directly to the study of in-
tegral operators, which may be analyzed using tools from functional analysis.
The reader familiar with the theorem of Picard and Lindelöf on existence and
uniqueness of solutions to certain initial value problems and its proof will not
be surprised by this connection. We refer to Section 2.4 for more on this.
However, here we would like to discuss two particular partial differential
equations. As we will see later, the mathematical background needed for
this, most of which comes from functional analysis, is much more interesting
(meaning difficult) than that needed for ordinary differential equations. One
of the objectives of this book is to make the informal discussion in this section
more formal and rigorous. We will cover this topic in Chapters 5 and 6 (apart
from a technical point, which we resolve in Section 8.2).
In both of the partial differential equations that we will discuss, we will
need to express the difference between the value of a function at a point and
its values in a neighbourhood of the point. To make the resulting equations
more amenable for study one uses an infinitesimal version of this difference,
which brings into the picture(2) the Laplace operator ∆ (also sometimes denoted
by ∇^2) defined by
\[
\Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_d^2} \tag{1.5}
\]
for a smooth function f : R^d → R because of the following simple observation.
Proposition 1.5 (Laplace and neighbourhood averages). Let U ⊆ R^d
be an open set, and suppose that f : U → R is a C^2 function. Then
\[
\lim_{r \to 0} \frac{1}{r^2 \operatorname{vol}(B_r(x))} \int_{B_r(x)} \bigl( f(y) - f(x) \bigr) \, dy = c \, \Delta f(x)
\]
for any x ∈ U, where dy denotes integration with respect to the Lebesgue
measure on the r-ball B_r = {y ∈ R^d | ‖y‖ < r} ⊆ R^d and c = 1/(2(d + 2)).
Proof. Suppose for simplicity of notation that x = 0, and apply Taylor
approximation to obtain
\[
f(y) = f(0) + f'(0)y + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2 f}{\partial x_i \partial x_j}(0) \, y_i y_j + o\bigl( \|y\|^2 \bigr),
\]
as y → 0, where f′(0) is the total derivative of f at 0, and we use the
notation o(·) from p. vi. Now in the integral over the r-ball Br the linear
terms cancel out due to the symmetry of the ball. The same argument applies
to the mixed quadratic terms. Thus we are left with
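\[
\int_{B_r} f(y) \, dy = \operatorname{vol}(B_r) f(0) + \frac{1}{2} \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}(0) \int_{B_r} y_i^2 \, dy + \operatorname{vol}(B_r) \, o\bigl( r^2 \bigr). \tag{1.6}
\]
Next notice that ∫_{B_r} y_i^2 dy = ∫_{B_r} y_j^2 dy for all 1 ≤ i, j ≤ d and
\[
\int_{B_r} \|y\|^2 \, dy = \int_{B_1} r^2 \|z\|^2 \, r^d \, dz
\]
using the substitution y = rz. It follows that
\[
\int_{B_r} y_i^2 \, dy = \frac{1}{d} \sum_{j=1}^{d} \int_{B_r} y_j^2 \, dy = \frac{1}{d} \int_{B_r} \|y\|^2 \, dy = \frac{r^{d+2}}{d} \underbrace{\int_{B_1} \|z\|^2 \, dz}_{=:C}.
\]
Combining this with (1.6) gives
\[
\frac{1}{r^2 \operatorname{vol}(B_r)} \int_{B_r} \bigl( f(y) - f(0) \bigr) \, dy = \frac{1}{r^2 \operatorname{vol}(B_r)} \cdot \frac{1}{2} \Delta f(0) \, \frac{r^{d+2}}{d} \, C + o(1) = \underbrace{\frac{C}{2d \operatorname{vol}(B_1)}}_{=c} \Delta f(0) + o(1).
\]
For completeness, we calculate the value of c using d-dimensional spherical
coordinates. Every point z ∈ R^d is of the form z = rv for some r ≥ 0 and
v ∈ S^{d−1} = {w ∈ R^d | ‖w‖ = 1}.
Using this substitution we have
\[
\operatorname{vol}(B_1) = \int_{S^{d-1}} \int_0^1 r^{d-1} \, dr \, dv = \frac{1}{d} \operatorname{vol}(S^{d-1}),
\]
where the integration with respect to v uses the (d − 1)-dimensional volume
measure on the sphere S^{d−1}. Similarly,
\[
C = \int_{B_1} \|z\|^2 \, dz = \int_{S^{d-1}} \int_0^1 r^{d+1} \, dr \, dv = \frac{1}{d+2} \operatorname{vol}(S^{d-1}).
\]
Thus
\[
c = \frac{C}{2d \operatorname{vol}(B_1)} = \frac{\frac{1}{d+2} \operatorname{vol}(S^{d-1})}{2d \cdot \frac{1}{d} \operatorname{vol}(S^{d-1})} = \frac{1}{2(d+2)}.
\]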
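Proposition 1.5 can be sanity-checked numerically in dimension d = 2, where c = 1/8. The sketch below uses a midpoint rule in polar coordinates; the function name ball_average_quotient, the grid sizes, the test function and the base point are assumptions chosen for illustration, and the Laplacian of the test function was computed by hand.

```python
import math

def ball_average_quotient(f, x, r, n_rho=200, n_theta=200):
    """Approximate (1 / (r^2 vol(B_r(x)))) * integral over B_r(x) of (f(y) - f(x)) dy
    for d = 2, using a midpoint rule in polar coordinates around x."""
    fx = f(x)
    total = 0.0
    d_rho, d_theta = r / n_rho, 2 * math.pi / n_theta
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            y = (x[0] + rho * math.cos(theta), x[1] + rho * math.sin(theta))
            total += (f(y) - fx) * rho * d_rho * d_theta  # area element rho d(rho) d(theta)
    return total / (r ** 2 * math.pi * r ** 2)

# Hypothetical smooth test function and its Laplacian, computed by hand.
f = lambda p: math.exp(p[0]) * math.cos(2 * p[1])
laplace_f = lambda p: -3 * math.exp(p[0]) * math.cos(2 * p[1])

d = 2
c = 1 / (2 * (d + 2))          # the constant from Proposition 1.5, here 1/8
x = (0.3, 0.1)
for r in (0.2, 0.1, 0.05):
    # The first printed value should approach the second as r decreases.
    print(r, ball_average_quotient(f, x, r), c * laplace_f(x))
```

The quotient is computed for a few shrinking radii rather than as a limit, so agreement with c∆f(x) is only approximate.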

1.2.1 The Heat Equation
The heat equation describes how temperatures in a region U ⊆ R^d (representing
a physical medium) evolve given an initial temperature distribution and
some prescribed behaviour of the heat at the boundary ∂U . Inside the me-
dium we expect the flow of heat to be proportional to the difference between
the temperature at each point and the temperature in a neighbourhood of the
point. If we write u(x, t) for the temperature of the medium at the point x
at the time t, then this suggests a relationship
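\[
\frac{\partial u}{\partial t} = \underbrace{\text{constant}}_{>0} \, \Delta_x u, \tag{1.7}
\]
where
\[
\Delta_x u = \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_d^2}
\]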
is the Laplace operator with respect to the space variables x1 , . . . , xd only.
Equation (1.7) is called the heat equation. If we take the physical interpreta-
tion of this equation for granted, then we can use it to give heuristic explan-
ations of some of the mathematical phenomena that arise.
Suppose first that we prescribe a time-independent temperature distribu-
tion at the boundary ∂U of the medium U , and then wait until the system
has settled into thermal equilibrium. Experience (that is, physical intuition)
suggests that in the long run (as time goes to infinity) the temperature distri-
bution inside U will reach a stable (time-independent) configuration. That is,
for any prescribed boundary value b : ∂U → R we expect the heat equation
on U to have a time-independent solution. More formally, we expect there to
be a function u : U → R satisfying

\[
\begin{cases} \Delta u = 0 \\ u|_{\partial U} = b. \end{cases} \tag{1.8}
\]
The boundary value problem (1.8) is the Dirichlet boundary value problem,
the partial differential equation ∆u = 0 is called the Laplace equation, and its
solutions are called harmonic functions. Proving what the physical intuition
suggests, namely that the Dirichlet boundary value problem does indeed have
a (smooth) solution, will take us into the theory of Sobolev spaces. We will
prove the existence of smooth solutions for the Dirichlet boundary value
problem in Chapter 5 (and Section 8.2).
Leaving the Dirichlet problem to one side for now, we continue with the
heat equation. Motivated by the methods of linear ordinary differential equa-
tions and their initial value problems, we would like to know how we can find
other solutions to the partial differential equation while ignoring the bound-
ary values. A simple kind of solution to seek would be those with separated
variables, that is solutions of the form
u(x, t) = F (x)G(t)
with x ∈ U ⊆ Rd and t ∈ R. The heat equation would then imply that

\[
F(x) G'(t) = \frac{\partial u}{\partial t} = c (\Delta F(x)) G(t)
\]
and so (we may as well choose all physical constants to make c = 1) the
quotient
\[
\frac{G'(t)}{G(t)} = \frac{\Delta F(x)}{F(x)}
\]
is independent of x and of t, and therefore is a constant (as this is not really
a proof, we will not worry about the division by a quantity that may vanish).
In summary, u(x, t) = F (x)G(t) solves the equation
\[
\frac{\partial u}{\partial t} = \Delta_x u
\]
if G(t) = e^{λt} and ∆F = λF for some constant λ, which one can quickly check
(rigorously). Ignoring for the moment the values of F on the boundary ∂U , it
is easy to find functions with ∆F = λF for any λ ∈ R by using suitable expo-
nential and trigonometric functions. However, these simple-minded solutions
turn out not to be particularly useful. Only those special functions F : U → R
with
\[
\begin{cases} \Delta F = \lambda F & \text{inside } U \\ F|_{\partial U} = 0 \end{cases}
\]
turn out to be useful in the general case. However, it is not clear that such
functions even exist, nor for which values of λ they may exist.
Suppose now that the following non-trivial result — the existence of a
basis of eigenfunctions — (which we will be able to prove in many special
cases in Chapter 6) is known for the region U ⊆ Rd .
Claim. Every sufficiently nice function f : U → R can be decomposed into a
sum f = Σ_n F_n of functions F_n : U → R satisfying
\[
\begin{cases} \Delta F_n = \lambda_n F_n & \text{for some } \lambda_n < 0 \\ F_n|_{\partial U} = 0. \end{cases}
\]
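We may then solve the partial differential equation
\[
\frac{\partial u}{\partial t} = \Delta_x u
\]
with boundary values
\[
\begin{cases} u|_{\partial U \times \{t\}} = 0 & \text{for all } t \\ u|_{U \times \{0\}} = f \end{cases}
\]
using the principle of superposition to obtain the general solution
\[
u(x, t) = \sum_{n} F_n(x) e^{\lambda_n t}. \tag{1.9}
\]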
Since λ_n < 0 for each n ≥ 1, the series (1.9) converges to 0 as t → ∞ if it
is absolutely convergent, in accordance with our physical intuition, since the
boundary condition states that the temperature is kept at 0 at the boundary
of the region for all t ≥ 0. We conclude by mentioning that the claim above
will follow from the study of the spectral theory of an operator, but the
definition of the operator involved will be somewhat indirect.
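The superposition (1.9) can be illustrated in the simplest case. The Python sketch below assumes the one-dimensional domain U = (0, 1), where the functions x ↦ sin(πnx) vanish at the boundary and have eigenvalue −(πn)^2; the initial data and the sample point are hypothetical choices made for the example.

```python
import math

# Assumed setting (not from the text): U = (0, 1), where F_n(x) = b_n sin(pi n x)
# satisfies F_n'' = lambda_n F_n with lambda_n = -(pi n)^2 and F_n(0) = F_n(1) = 0.
coefficients = {1: 1.0, 3: 0.5}   # hypothetical initial data f = sin(pi x) + 0.5 sin(3 pi x)

def f(x):
    return sum(b * math.sin(math.pi * n * x) for n, b in coefficients.items())

def u(x, t):
    """Superposition (1.9): sum over n of F_n(x) * exp(lambda_n * t)."""
    return sum(b * math.sin(math.pi * n * x) * math.exp(-(math.pi * n) ** 2 * t)
               for n, b in coefficients.items())

x, t, h = 0.37, 0.05, 1e-4
# Initial and boundary conditions.
assert math.isclose(u(x, 0.0), f(x))
assert abs(u(0.0, t)) < 1e-12 and abs(u(1.0, t)) < 1e-12
# Heat equation u_t = u_xx, checked with central finite differences.
u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
assert abs(u_t - u_xx) < 1e-4
# The temperature decays towards 0 as t grows.
assert abs(u(x, 2.0)) < 1e-8
```

Here the sum is finite, so no question of convergence arises; the general case is exactly where the functional analysis is needed.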
1.2.2 The Wave Equation
The wave equation describes how an elastic membrane moves. We let u(x, t)
be the vertical position of the membrane at time t above the point with
coordinate x. As the membrane has mass (and hence inertia) our assumption
is that the vertical acceleration — a second derivative of position with respect
to time t — of the membrane at time t above x will be proportional to the
difference between the position of the membrane at that point and at nearby
points. Hence we call
\[
\frac{\partial^2 u}{\partial t^2} = c \Delta_x u \tag{1.10}
\]
the wave equation. As in the case of the heat equation, we may as well choose
physical units to arrange that c = 1.
Once more we may argue from physical intuition that the Dirichlet bound-
ary problem for the wave equation always has a solution. Consider a wire loop
above the boundary ∂U (notice that even at this vague level we are imposing
some smoothness: our physical image of a wire loop may be very distorted
but will certainly be piecewise smooth) and imagine a soap film whose edge
is the wire. Then, after some initial oscillations,† we expect the soap film to
stabilize, giving a solution to the Dirichlet boundary value problem in (1.8)
defined by the shape of the wire.
In this context, what is the meaning of eigenfunctions of the Laplace oper-
ator that vanish on the boundary? To see this, imagine a drum whose skin has
the shape U so that the vibrating membrane is fixed along the boundary ∂U ,
which is simply a flat loop. Suppose now that F : U → R satisfies

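\[
\begin{cases} \Delta F = \lambda F & \text{in } U \\ F|_{\partial U} = 0 \end{cases}
\]
for some λ < 0. Then we see that u(x, t) = F(x) cos(√(−λ) t) satisfies
\[
\frac{\partial^2}{\partial t^2} u(x, t) = F(x) \underbrace{\bigl( -(\sqrt{-\lambda})^2 \bigr)}_{=\lambda} \cos(\sqrt{-\lambda}\, t) = \lambda F(x) \cos(\sqrt{-\lambda}\, t) = \Delta_x \bigl( F \cos(\sqrt{-\lambda}\, t) \bigr)
\]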
and hence solves the wave equation. In other words, if we start the drum
at time t = 0 with the prescribed shape given by the function F, then the
drum will produce a pure tone† of frequency √(−λ)/(2π). This also sheds some
light on which values of λ appear in the claim on p. 9 – namely those that
correspond to the pure frequencies that the drum can produce. For a
one-dimensional drum (that is, a string) these frequencies are easy to understand
(see also Exercise 1.7). However, even for two-dimensional drums the precise
eigenvalues remain mysterious. We will nonetheless be able to count these
shape-specific eigenvalues asymptotically using the method of Weyl from 1911
(see Section 6.4).
† In the real world there would also be a friction term, and the model for this is a modified
wave equation (which we will not discuss further).
Exercise 1.6. Assume that U satisfies the basis of eigenfunctions claim from p. 9, and
that the Dirichlet boundary value problem always has a solution on U .
(a) Combine the above discussions to produce a general procedure to solve the boundary
value problem (no rigorous proof is expected, but find the places that lack rigour)
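\[
\begin{cases} \dfrac{\partial u}{\partial t} = \Delta u & \text{in } U \times [0, \infty); \\ u|_{\partial U \times \{t\}} = b; \quad u|_{U \times \{0\}} = f. \end{cases}
\]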
(b) Repeat (a) for the wave equation.
Exercise 1.7. For a circular string vibrating in one dimension — the wave equation over T
— the basis of eigenfunctions claim is precisely the claim that every nice function can be
represented by its Fourier series. Assuming that this holds, show the basis of eigenfunctions
claim for the domain U = (0, 1) ⊆ R. This relates to the wave equation for the clamped
vibrating string on [0, 1], that is, to the boundary conditions y(0) = y(1) = 0. (In fact the
eigenfunctions are given by x ↦ sin(πnx) with n = 1, 2, …; no rigorous proof is expected,
but explore the connection.)
1.2.3 The Mantegna Fresco
An illustration of how some of the ideas discussed above link together will
be seen in Section 6.4.3, where we discuss eigenfunctions of the Laplacian
on a disk. Here the circular symmetry is exploited as in Section 1.1, and the
eigenfunctions may be used to decompose functions.
A remarkable application of these ideas was made to the problem of recon-
structing a bombed fresco by Andrea Mantegna in a church in Padua. The
damage resulted in the fresco being broken into approximately 88,000 small
pieces which needed to be reassembled using a black and white photograph;
we refer to Fornasier and Toniolo [35] for the detailed description of how cir-
cular harmonics were used to render the computation required practicable.
The partially reconstructed coloured image was then used to build a coloured
image of the entire fresco.
† This preferred frequency for certain physical objects is part of the phenomena of reson-
ance, and the design of large structures like buildings or bridges tries to prevent resonances
that may lead to reinforcement of oscillations by wind, for example.
1.3 What is Spectral Theory?
As we will see later, the topics considered in Sections 1.1 and 1.2 are connected
to spectral theory.
The goal of spectral theory, at its broadest, might be described as an
attempt to ‘classify’ all linear operators. We will restrict our attention to
Hilbert spaces, which is natural for two reasons. Firstly, it is much easier
than the general case of operators on Banach spaces. Secondly, many of the
most important applications belong to this simpler setting of operators on
Hilbert spaces.
In finite-dimensional linear algebra the classification problem for linear
operators is successfully solved by the theory of eigenvalues, eigenspaces,
minimal and characteristic polynomials, which leads to a canonical normal
form (the Jordan normal form) for any linear operator C^n → C^n for n ≥ 1.
We will not be able to get such a general theory if the Hilbert space H is
infinite-dimensional, but it turns out that many operators of great interest
have properties which, in the finite-dimensional case, ensure an even simpler
description. They may belong to any of the special classes of operators defined
on a Hilbert space by means of the adjoint operation T ↦ T^*: self-adjoint
operators, unitary operators, or normal operators. For these, if dim H = n and
we work over C, then there is an orthonormal basis (e1 , . . . , en ) of eigenvectors
of T with corresponding eigenvalues (λ1 , . . . , λn ) so that
\[
T\Bigl( \sum_{j=1}^{n} \alpha_j e_j \Bigr) = \sum_{j=1}^{n} \alpha_j \lambda_j e_j. \tag{1.11}
\]
In other words, the map φ(Σ_{j=1}^{n} α_j e_j) = (α_1, …, α_n) is an isometry
from H to C^n and we may rephrase (1.11) to become
\[
T_1 = \phi \circ T \circ \phi^{-1} \tag{1.12}
\]
where T_1 is the diagonal map defined by T_1 : C^n ∋ (α_i) ↦ (α_i λ_i) ∈ C^n. This
is obvious, but gives a slightly different view of the classification problem.
For any finite-dimensional Hilbert space H, and normal operator T , we have
found a model space and operator (Cn , T1 ), such that (H, T ) is equivalent
to (Cn , T1 ) in the sense of (1.12).
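The equivalence (1.12) can be made concrete in the smallest non-trivial case. The Python sketch below uses a hypothetical real symmetric 2 × 2 matrix whose eigenvectors and eigenvalues were found by hand; the matrix and all helper names are assumptions made for illustration.

```python
import math

# Hypothetical example: a self-adjoint operator T on H = C^2 (here real symmetric).
T = [[2.0, 1.0],
     [1.0, 2.0]]
s = 1 / math.sqrt(2)
basis = [(s, s), (s, -s)]          # orthonormal eigenvectors e_1, e_2
eigenvalues = [3.0, 1.0]           # T e_j = lambda_j e_j

def apply(matrix, v):
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in matrix)

def phi(v):
    """Coordinates (alpha_1, alpha_2) of v in the eigenbasis: alpha_j = <v, e_j>."""
    return tuple(sum(vk * ek for vk, ek in zip(v, e)) for e in basis)

def phi_inverse(alpha):
    """The vector sum_j alpha_j e_j."""
    return tuple(sum(a * e[k] for a, e in zip(alpha, basis)) for k in range(2))

def T1(alpha):
    """The diagonal model operator (alpha_j) -> (alpha_j * lambda_j)."""
    return tuple(a * lam for a, lam in zip(alpha, eigenvalues))

alpha = (0.8, -2.5)
conjugated = phi(apply(T, phi_inverse(alpha)))     # (phi o T o phi^{-1})(alpha)
assert all(math.isclose(a, b) for a, b in zip(conjugated, T1(alpha)))   # (1.12)

# phi is an isometry: the Euclidean norms of v and phi(v) agree.
v = (1.0, 4.0)
assert math.isclose(math.hypot(*v), math.hypot(*phi(v)))
```

Since the example is real, the inner product needs no complex conjugation; for a genuinely complex eigenbasis the coordinates would be computed with the conjugates of the entries of e_j.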
The theory we will describe in Chapters 9, 12 and 13 will be a gener-
alization of this type of normal form reduction. This succeeds because the
model spaces and operators are indeed simple: they are of the type L^2_µ(X)
for some measure space (X, µ), and the operators are multiplication operators
M_g : f ↦ gf for a suitable function g : X → C.
1.4 The Prime Number Theorem
Part of the inherent beauty of mathematics comes from the interplay between
simple problems and the sophisticated theories that are sometimes required
to solve these problems. The natural numbers are among the simplest math-
ematical objects, but number theory tends to use techniques from much of
mathematics to study basic properties of N. Additively N is quite simple,
but multiplicatively N is much more complex as it is generated by the prime
numbers 2, 3, 5, . . . . Perhaps because of mathematics’ omnipresence across all
of the sciences, its absolute (internal) truth, and the pre-eminent role played
by the natural numbers, Gauss is alleged to have said “mathematics is the
queen of the sciences and number theory is the queen of mathematics”.
For modern number theory functional analysis is one of many essential
tools. While we will not be able to really justify this statement without de-
voting a significant proportion of this volume to number theory, we do at-
tempt a partial justification by giving a proof of the prime number theorem
in Chapter 14. Prime numbers have been a source of inspiration for math-
ematicians certainly since Euclid proved (approx. 300 BCE) that there are
infinitely many prime numbers. One of many mysteries concerning the prime
numbers is their distribution or location within the natural numbers.
Exercise 1.8. Writing p_1, p_2, p_3, … for the primes 2, 3, 5, … and π(x) = |{n | p_n ≤ x}|
for the number of primes less than or equal to x, recall Euclid’s argument using the fact
that p_1 p_2 ⋯ p_n + 1 has a prime divisor not in {p_1, …, p_n} to show that there are infinitely
many primes. Use this to show that p_n < 2^{2^n} and deduce that
\[
\log(\log(x)) < \pi(x) \leq x \tag{1.13}
\]
for all large x.
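The bounds in Exercise 1.8 are easy to test on a table of primes, which also shows how weak they are. The following Python sketch uses a standard sieve of Eratosthenes; the function names, the cut-off 100000 and the sample values of x are assumptions made for the example, and the comparison with x / log x anticipates the discussion that follows.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: the list of primes p <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

primes = primes_up_to(100_000)

def prime_count(x):
    """pi(x) = number of primes <= x (valid for x <= 100000 here)."""
    return sum(1 for p in primes if p <= x)

# Bound from Euclid's argument: p_n < 2^(2^n), with p_1 = 2.
for n, p_n in enumerate(primes[:10], start=1):
    assert p_n < 2 ** (2 ** n)

# The two bounds in (1.13), together with the much sharper comparison x / log x.
for x in (10, 100, 1_000, 10_000, 100_000):
    assert math.log(math.log(x)) < prime_count(x) <= x
    print(x, prime_count(x), round(x / math.log(x), 1))
```

A finite check of this kind proves nothing about large x; it only makes the gap between (1.13) and the true growth of π(x) visible.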
The property of being a prime is in some sense ‘deterministic’ but, measured
appropriately, their appearance in N seems to mimic randomness. Leav-
ing finer questions of this nature to one side, one may ask if Euclid’s proof
that there are infinitely many primes can be improved to a more effective
statement concerning the growth rate of π(x). That is, can (1.13) be im-
proved, and if so, to what extent? Attempting to answer this question has
been the source of many developments in number theory and other fields.
For instance, Euler had studied properties of what we now call the
Riemann zeta function ζ(z) = Σ_{n≥1} 1/n^z for z ∈ R, obtaining in 1737 a result
that says (in a manner of speaking) Σ_{n=1}^{x} 1/p_n is approximately the logarithm
of Σ_{n=1}^{x} 1/n, equivalently that Σ_{n=1}^{x} 1/p_n is approximately log log x, presaging
an important result of Mertens from 1874. We will prove a weaker form of
Mertens’ theorem in Chapter 14 (Theorem 14.15). Euler’s paper introduced
several seminal ideas into number theory and arguably is the founding work
of analytic number theory.
Based on tables of primes Gauss (in 1792 or 1793), Legendre (in 1797
or 1798) and Dirichlet (in 1838) made conjectures about the asymptotic
growth of π; the latter two both suggesting that π(x) log x / x → 1 as x → ∞.
Chebyshev was the first to really establish the correct order of growth
for π, proving in 1848 that if the sequence (π(x) log x / x)_{x≥1} converges at all
as x → ∞, then it must converge to 1, and proving in 1851 that there are
constants A, B > 0 with A x/log x ≤ π(x) ≤ B x/log x for all x.
Riemann’s memoir of 1859 introduced methods of complex analysis into
the subject, building on Euler’s use of real analysis by allowing complex
variables and relating analytic properties of the zeta function and its zeros
to arithmetic properties of the primes. Finally, in 1896 Hadamard and de
la Vallée-Poussin independently used complex analysis on the Riemann zeta
function to prove the prime number theorem, π(x) log x / x → 1 as x → ∞.
We will present a proof of the prime number theorem (Theorem 14.1) in
Chapter 14, closely following Tao’s blog [103], which builds on a key step of
Selberg’s elementary proof of the prime number theorem from 1949 in [96]
and Mertens’ theorem. In combining these ingredients to prove the prime
number theorem we will use the language and several tools developed in
this volume. Moreover, we will close our discussion with an application of the
spectral theory of finite abelian groups (as in the discussion from Section 1.1)
proving Dirichlet’s theorem from 1837 on primes in arithmetic progressions
and the prime number theorem for primes in arithmetic progressions.
1.5 Further Topics
We list here two more topics that we will be able to discuss.

• Suppose that f : R^2 → R is a continuous function and the partial derivatives
∂^k f/∂x_1^k and ∂^k f/∂x_2^k exist and are continuous for all k ≥ 1. Then f is
smooth (see Exercises 3.69 and 5.18).
• Another application of functional analysis that we wish to discuss con-
cerns the construction of sparse but highly connected graphs called ex-
panders, which are an important concept in graph theory and computer
science, see Section 10.4.
Chapter 2
Norms and Banach Spaces
In this chapter we start the more formal treatment of functional analysis, giv-
ing the fundamental definitions and introducing some of the basic examples
and their properties. We also discuss some theorems and constructions that
may be considered part of topology or measure theory, to put them into the
context of the theory developed here.
2.1 Norms and Semi-Norms
Throughout this book we will be working with real or complex vector
spaces (V, +, ·) (here + is vector addition, and · scalar multiplication). We
will call the elements of the field simply scalars if we want to avoid making
the distinction between the real and complex case. For instance, in the fun-
damental definitions to come in this section, we treat the real and complex
cases simultaneously.
We will assume familiarity with the following concepts from linear algebra:
vector spaces, subspaces, quotient spaces, dimension (which may be infinite),
linear maps, image and kernel of linear maps. The notion of a basis of a
vector space will only be used to distinguish finite-dimensional vector spaces
from infinite-dimensional ones. We will not usually try to describe the vector
spaces that arise in functional analysis, or the linear maps between them, in
terms of bases. An exception will arise in the study of Hilbert spaces (see
Section 3.1) and in the study of certain (important but nonetheless special)
operators on them (see Section 6.2).
Also recall that a subset K ⊆ V of a vector space is said to be convex if
for k1 , k2 ∈ K and t ∈ [0, 1] we have that the convex combination (1−t)k1 +tk2
also belongs to K.

2.1.1 Normed Vector Spaces
Definition 2.1. Let V be a real or complex vector space. A map ‖·‖ : V → R
is called a norm if it has the following properties:
• (strict positivity) ‖v‖ ≥ 0 for any v ∈ V, and ‖v‖ = 0 if and only if v = 0;
• (homogeneity) ‖αv‖ = |α|‖v‖ for all v ∈ V and scalars α; and
• (triangle inequality) ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all v, w ∈ V.
If ‖·‖ is a norm on V, then (V, ‖·‖) is called a normed vector space.
It is easy to give examples of normed vector spaces, and we list a few
standard examples here (more will appear throughout the text).
Example 2.2. The following are examples of normed real vector spaces, in which we write $v = (v_1, \dots, v_d)^t$ for elements of $\mathbb{R}^d$.
(1) $\mathbb{R}^d$ with the Euclidean norm $\|v\| = \|v\|_2 = \sqrt{|v_1|^2 + \cdots + |v_d|^2}$.
(2) $\mathbb{R}^d$ with $\|v\| = \|v\|_\infty = \max_{1 \leq i \leq d} |v_i|$.
(3) $\mathbb{R}^d$ with $\|v\| = \|v\|_1 = |v_1| + \cdots + |v_d|$.
(4) $\mathbb{R}^d$ with the norm defined by $\|v\|_B = \inf\{\alpha > 0 \mid \tfrac{1}{\alpha} v \in B\}$, where $B$ is a non-empty, open, centrally symmetric (that is, with $B = -B$), convex, bounded (with respect to the Euclidean norm) subset of $\mathbb{R}^d$.
(5) Let $X$ be any topological space (for example, a metric space; see Appendix A), and let $C_b(X) = \{f : X \to \mathbb{R} \mid f \text{ is continuous and bounded}\}$ with the uniform or supremum norm
$$\|f\| = \|f\|_\infty = \sup_{x \in X} |f(x)|.$$
Notice that if $X$ is compact, then $C_b(X)$ coincides with $C(X)$, the space of continuous functions $X \to \mathbb{R}$. We note that our definition of compactness (see Definition A.18) contains the assumption that $X$ is Hausdorff.
(6) A special case of (5) makes $C([0,1])$, and so also the subspace
$$C^1([0,1]) = \{f : [0,1] \to \mathbb{R} \mid f \text{ has a continuous derivative on } [0,1]\},$$
into a normed vector space. A different norm on $C^1([0,1])$ may be obtained by setting $\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$.
(7) Finally, consider the vector space of real polynomials
$$\mathbb{R}[x] = \Bigl\{ f = \sum_{k=0}^{N} c_f(k) x^k \Bigm| N \in \mathbb{N},\ c_f(k) \in \mathbb{R} \Bigr\}$$
on which we can define any of the following norms (thinking of $f \in \mathbb{R}[x]$ really as the finitely supported but infinite vector
$$(c_f(0), c_f(1), \dots, c_f(N), c_f(N+1), \dots) \in \mathbb{R}^{\mathbb{N}_0}$$
of its coefficients with $c_f(k) = 0$ for $k > N$):
(a) $\|f\|_1 = \sum_{k=0}^{\infty} |c_f(k)|$,
(b) $\|f\|_2 = \bigl( \sum_{k=0}^{\infty} |c_f(k)|^2 \bigr)^{1/2}$, or
(c) $\|f\|_\infty = \max_{k \geq 0} |c_f(k)|$.
We could also think of polynomials as defining continuous functions on $[0,1]$, thus embedding $\mathbb{R}[x] \subseteq C^1([0,1]) \subseteq C([0,1])$, so that the norm $\|\cdot\|_{C^1([0,1])}$ or $\|\cdot\|_\infty$ may also be used.
The examples in Example 2.2 all generalize in the obvious way to form normed complex vector spaces, with the exception of (4), where additional conditions on the set $B$ are needed (see Exercise 2.3).
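As a small numerical companion to Example 2.2(1)–(3), the following sketch evaluates the three norms on a sample vector and spot-checks homogeneity and the triangle inequality. The function names and test vectors are illustrative choices, not notation from the text, and a finite check of this kind is of course no substitute for the proofs asked for in Exercise 2.3.

```python
import math

def norm_1(v):
    # Sum of absolute values, as in Example 2.2(3).
    return sum(abs(x) for x in v)

def norm_2(v):
    # Euclidean norm, as in Example 2.2(1).
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def norm_inf(v):
    # Maximum norm, as in Example 2.2(2).
    return max(abs(x) for x in v)

v = [3.0, -4.0, 0.0]
w = [1.0, 2.0, -2.0]
alpha = -2.5

for norm in (norm_1, norm_2, norm_inf):
    v_plus_w = [a + b for a, b in zip(v, w)]
    # Triangle inequality (with a tiny tolerance for rounding).
    assert norm(v_plus_w) <= norm(v) + norm(w) + 1e-12
    # Homogeneity: scaling by alpha scales the norm by |alpha|.
    assert math.isclose(norm([alpha * a for a in v]), abs(alpha) * norm(v))

print(norm_1(v), norm_2(v), norm_inf(v))  # 7.0 5.0 4.0
```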
Exercise 2.3. (a) Verify that Example 2.2(1), (2), (3), (5), (6), and (7) define normed
vector spaces over R or C.
(b) Show that Example 2.2(4) defines a real normed vector space.
(c) Show that for a complex normed vector space $(V, \|\cdot\|)$ the open unit ball
$$B = B_1^V = \{v \in V \mid \|v\| < 1\}$$
has the property that $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$.
(d) Show that if $B \subseteq \mathbb{C}^d$ is non-empty, open, convex, bounded, and satisfies $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$, then there exists a norm on $\mathbb{C}^d$ whose open unit ball is $B$.
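To make the norm $\|\cdot\|_B$ of Example 2.2(4) more concrete, here is a minimal sketch that approximates $\inf\{\alpha > 0 \mid \tfrac{1}{\alpha} v \in B\}$ by bisection, using only a membership test for $B$. The set `in_B` (an open ellipse in $\mathbb{R}^2$), the function name `gauge`, and the tolerance are assumptions made for illustration; the bisection relies on $B$ being open, convex and containing $0$, so that the admissible $\alpha$ form an open half-line.

```python
import math

def in_B(v):
    # Hypothetical choice of B: the open ellipse x^2/4 + y^2 < 1 in R^2.
    return v[0] ** 2 / 4.0 + v[1] ** 2 < 1.0

def gauge(v, in_set, tol=1e-12):
    """Approximate inf{alpha > 0 : v/alpha in B} using only membership tests."""
    if all(x == 0.0 for x in v):
        return 0.0
    lo, hi = 0.0, 1.0
    while not in_set(tuple(x / hi for x in v)):
        hi *= 2.0  # B is open and contains 0, so v/hi lies in B eventually
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if in_set(tuple(x / mid for x in v)):
            hi = mid  # v/mid lies in B: the infimum is at most mid
        else:
            lo = mid  # v/mid is outside B: the infimum is at least mid
    return hi

v = (1.0, -2.0)
exact = math.sqrt(v[0] ** 2 / 4.0 + v[1] ** 2)  # closed form for this ellipse
print(gauge(v, in_B), exact)
```

For this particular ellipse the gauge has the closed form $\sqrt{x^2/4 + y^2}$, which gives an independent check of the bisection.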
Throughout the text we will use notions from topology (see Appendix A
for a summary).
Lemma 2.4 (Associated metric). Suppose that $(V, \|\cdot\|)$ is a normed vector space. Then for every $v, w \in V$ we have
$$\bigl|\, \|v\| - \|w\| \,\bigr| \leq \|v - w\|. \tag{2.1}$$
Moreover, writing $d(v, w) = \|v - w\|$ for $v, w \in V$ defines a metric $d$ on $V$ such that the norm function $\|\cdot\| : V \to \mathbb{R}$ is continuous with respect to the topology induced by the metric $d$.
Proof. For any $v, w \in V$, we have $\|v\| = \|v - w + w\| \leq \|v - w\| + \|w\|$ and similarly $\|w\| = \|w - v + v\| \leq \|v - w\| + \|v\|$, by Definition 2.1 (the triangle inequality and homogeneity), and the two inequalities together give (2.1).
To see that d is a metric we need to check the following defining properties
of a metric:
• (strict positivity) that $d(v, w) \geq 0$ for all $v, w \in V$ and $d(v, w) = 0$ if and only if $v = w$ is clear by strict positivity in Definition 2.1.
• (symmetry) $d(v, w) = d(w, v)$ for all $v, w \in V$ follows by applying the homogeneity in Definition 2.1 with the choice $\alpha = -1$.
• (triangle inequality) Finally, we have
$$d(u, w) = \|u - w\| = \|u - v + v - w\| \leq \|u - v\| + \|v - w\| = d(u, v) + d(v, w).$$
The norm is continuous at $v \in V$ if for every $\varepsilon > 0$ there exists some $\delta > 0$ such that $d(u, v) < \delta$ implies $\bigl|\, \|u\| - \|v\| \,\bigr| < \varepsilon$. By (2.1), we may choose $\delta = \varepsilon$ to see this.
Notice that the triangle inequality makes addition continuous. Indeed, if we write
$$B_\varepsilon^{\|\cdot\|}(v) = \{w \in V \mid \|w - v\| < \varepsilon\}$$
for the ball of radius $\varepsilon$ around $v \in V$, then we have
$$B_{\varepsilon/2}^{\|\cdot\|}(v_1) + B_{\varepsilon/2}^{\|\cdot\|}(v_2) \subseteq B_\varepsilon^{\|\cdot\|}(v_1 + v_2) \tag{2.2}$$
for every $\varepsilon > 0$. This means that $(v, w) \mapsto v + w$ is continuous at $(v_1, v_2)$ and, since $v_1, v_2 \in V$ were arbitrary, shows that addition is continuous.
Scalar multiplication is also continuous. To see this fix a scalar $\alpha$ and a vector $v \in V$, and notice that
$$\beta w - \alpha v = (\beta - \alpha) w - \alpha (v - w).$$
So if $\varepsilon \in (0, 1)$, $|\beta - \alpha| < \frac{\varepsilon}{\|v\| + 1}$ for a scalar $\beta$ and $\|w - v\| < \varepsilon$ for a vector $w \in V$, then $\|w\| < \|v\| + 1$ and hence
$$\|\beta w - \alpha v\| < \varepsilon (1 + |\alpha|). \tag{2.3}$$
This gives continuity of scalar multiplication at $(\alpha, v)$.

We now turn to the sense in which the topology induced by a norm de-
termines the norm.
Lemma 2.5 (Equivalence of norms). Two norms $\|\cdot\|$ and $\|\cdot\|'$ on the same vector space $V$ induce the same topology if and only if there exists a (Lipschitz) constant $c \geq 1$ such that
$$\tfrac{1}{c} \|v\|' \leq \|v\| \leq c \|v\|' \tag{2.4}$$
for all $v \in V$. In this case we call the norms equivalent.
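Proof. If (2.4) holds, then the standard neighbourhoods of $v \in V$,
$$B_\varepsilon^{\|\cdot\|'}(v) = \{w \in V \mid \|w - v\|' < \varepsilon\}$$
and
$$B_\varepsilon^{\|\cdot\|}(v) = \{w \in V \mid \|w - v\| < \varepsilon\}$$
with respect to the two norms satisfy
$$B_{\varepsilon/c}^{\|\cdot\|}(v) \subseteq B_\varepsilon^{\|\cdot\|'}(v) \subseteq B_{c\varepsilon}^{\|\cdot\|}(v).$$
This implies that the topologies have the same notion of neighbourhood, and so are identical.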
Suppose now that the two topologies are the same, so that $B_1^{\|\cdot\|}$ is a neighbourhood of $0$ in this topology. Then there must be some $\varepsilon > 0$ with
$$B_\varepsilon^{\|\cdot\|'} \subseteq B_1^{\|\cdot\|}.$$
Equivalently, $\|v\|' < \varepsilon$ implies that $\|v\| < 1$. For any $v \in V \setminus \{0\}$, if $w = \frac{\varepsilon}{2\|v\|'} v$ then
$$\|w\|' = \frac{\varepsilon}{2\|v\|'} \|v\|' < \varepsilon$$
and so
$$\frac{\varepsilon}{2\|v\|'} \|v\| = \|w\| < 1.$$
This implies that $\|v\| \leq \frac{2}{\varepsilon} \|v\|'$ for all $v \in V$, giving the second inequality in (2.4). Reversing the roles of $\|\cdot\|$ and $\|\cdot\|'$ gives the first inequality, and choosing $c$ to be the larger of the two choices produced for $c$ gives the lemma.
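The rescaling step in this proof can be traced numerically. The sketch below takes $\|\cdot\| = \|\cdot\|_1$ and $\|\cdot\|' = \|\cdot\|_\infty$ on $\mathbb{R}^4$, an illustrative pairing chosen here rather than in the text, for which $\varepsilon = 1/d$ is an admissible radius since $\|v\|_\infty < 1/d$ forces $\|v\|_1 < 1$. It then checks that $w = \frac{\varepsilon}{2\|v\|'} v$ lands in the small ball and that the resulting constant $c = 2/\varepsilon$ works on a few sample vectors.

```python
def norm_1(v):
    return sum(abs(x) for x in v)

def norm_inf(v):
    return max(abs(x) for x in v)

d = 4
eps = 1.0 / d   # assumption: ||v||_inf < eps implies ||v||_1 <= d * ||v||_inf < 1
c = 2.0 / eps   # the constant produced by the proof

def rescale(v):
    # w = eps / (2 ||v||') * v, so that ||w||' = eps / 2 < eps.
    s = eps / (2.0 * norm_inf(v))
    return [s * x for x in v]

samples = [[1.0, -2.0, 3.0, 0.5], [0.0, 0.0, 1e-3, 0.0], [5.0, 5.0, 5.0, 5.0]]
for v in samples:
    w = rescale(v)
    assert norm_inf(w) < eps              # w lies in the eps-ball of the primed norm
    assert norm_1(w) < 1.0                # hence in the unit ball of the other norm
    assert norm_1(v) <= c * norm_inf(v)   # second inequality in (2.4)
```

The constant $2/\varepsilon = 2d$ obtained this way is not optimal (here $d$ would do), which reflects the fact that the proof only needs some constant.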
The phenomenon seen in the proof of Lemma 2.5, where a property on
all of V is determined by the local behaviour at 0, is something that will
occur frequently. For Rd the notion of equivalence of norms has the following
property.
Proposition 2.6 (Equivalence in finite dimensions). Any two norms on $\mathbb{R}^d$ are equivalent, for any $d \geq 1$.
As we will see in the proof, this is related to the compactness of the closed
unit ball in Rd .
Proof of Proposition 2.6. Let $\|\cdot\|_1$ be the norm on $\mathbb{R}^d$ from Example 2.2(3), and let $\|\cdot\|'$ be an arbitrary norm on $\mathbb{R}^d$. It is enough to show that these two norms are equivalent. Write $e_1, \dots, e_d$ for the standard basis of $\mathbb{R}^d$, and let $M = \max_{1 \leq i \leq d} \|e_i\|'$. Then
$$\|v\|' = \Bigl\| \sum_{i=1}^{d} v_i e_i \Bigr\|' \leq \sum_{i=1}^{d} |v_i| \, \|e_i\|' \leq M \|v\|_1, \tag{2.5}$$
where we have used the triangle inequality generalized by induction to finite sums and homogeneity of the norm. This gives one of the inequalities in (2.4).
Using compactness: To obtain the reverse inequality, notice first that
$$S_1 = \{v \in \mathbb{R}^d \mid \|v\|_1 = 1\}$$
is closed and bounded in the standard topology of $\mathbb{R}^d$ (which may either be seen as a consequence of the bounds $\frac{1}{d} \|\cdot\|_1 \leq \|\cdot\|_2 \leq \|\cdot\|_1$ or directly using $\|\cdot\|_1$), and so is compact by the Heine–Borel theorem. Also, (2.5) shows that $v \mapsto \|v\|'$ is a continuous function in the standard topology: (2.1) for $\|\cdot\|'$ and (2.5) together give $\bigl|\, \|v\|' - \|w\|' \,\bigr| \leq M \|v - w\|_1$, giving the continuity claimed just as in the end of the proof of Lemma 2.4 with $\delta = \frac{\varepsilon}{M}$.
Together this implies that
$$m = \min_{v \in S_1} \|v\|' = \|v_0\|'$$
is attained for some $v_0 \in S_1$. By definition of $S_1$ we have $v_0 \neq 0$, so that $m > 0$ by strict positivity of the norm $\|\cdot\|'$. Therefore, $v \in \mathbb{R}^d \setminus \{0\}$ implies that
$$\Bigl\| \frac{v}{\|v\|_1} \Bigr\|' \geq m,$$
or $v \in \mathbb{R}^d$ implies that $\|v\|' \geq m \|v\|_1$, as required.
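The two constants in this proof can be computed for a concrete norm. In the sketch below, `norm_prime` is a hypothetical norm on $\mathbb{R}^2$ chosen for illustration; $M$ is obtained exactly from the standard basis as in (2.5), while $m$ is only estimated by sampling the sphere $S_1$, so `m_est` is an upper bound for the true minimum that happens to be exact for this example.

```python
def norm_1(v):
    return sum(abs(x) for x in v)

def norm_prime(v):
    # Hypothetical norm on R^2: the 1-norm composed with an invertible linear map.
    return abs(v[0] + v[1]) + 2.0 * abs(v[0] - v[1])

# M = max ||e_i||' over the standard basis, as in (2.5).
basis = [(1.0, 0.0), (0.0, 1.0)]
M = max(norm_prime(e) for e in basis)

def sphere_points(n):
    # Sample points of S_1 = {v : ||v||_1 = 1}, the boundary of a diamond.
    for i in range(n + 1):
        t = i / n
        for s1 in (1.0, -1.0):
            for s2 in (1.0, -1.0):
                yield (s1 * t, s2 * (1.0 - t))

# Sampled estimate of m = min over S_1 of ||v||'.
m_est = min(norm_prime(v) for v in sphere_points(1000))

for v in [(2.0, -1.0), (0.3, 0.3), (-4.0, 0.5)]:
    assert m_est * norm_1(v) - 1e-12 <= norm_prime(v) <= M * norm_1(v) + 1e-12
```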

This might suggest that the equivalence of norms is a widespread phe-
nomenon. However, once we leave the setting of finite-dimensional normed
spaces, we will quickly see that a given normed space may have many inequi-
valent norms.
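A quick computation already hints at this for the coefficient norms of Example 2.2(7). The sketch below (with illustrative helper names) evaluates the three norms on the polynomials $1 + x + \cdots + x^N$, whose coefficient vectors consist of $N + 1$ ones; the ratio $\|f\|_1 / \|f\|_\infty$ equals $N + 1$ and so cannot be bounded by a single constant.

```python
import math

def coeff_norm_1(c):
    return sum(abs(a) for a in c)

def coeff_norm_2(c):
    return math.sqrt(sum(a * a for a in c))

def coeff_norm_inf(c):
    return max(abs(a) for a in c)

def all_ones(N):
    # Coefficient list of 1 + x + ... + x^N.
    return [1.0] * (N + 1)

ratios = []
for N in (0, 3, 15, 99):
    c = all_ones(N)
    ratios.append(coeff_norm_1(c) / coeff_norm_inf(c))

print(ratios)  # [1.0, 4.0, 16.0, 100.0]
```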
Exercise 2.7. Show that the norms $\|f\|_\infty$ and $\|f\|_{C^1([0,1])}$ for $f \in C^1([0,1])$ from Example 2.2(5)–(6) are not equivalent. Show that the norm $\|f\|_{C^1([0,1])}$ and the norm defined by $\|f\|_0 = |f(0)| + \|f'\|_\infty$ for $f \in C^1([0,1])$ are equivalent.
Exercise 2.8. Show that no two of the norms on $\mathbb{R}[x]$ from Example 2.2(7) are equivalent. However, some of the pairs of norms do satisfy an inequality of the form $\|f\| \leq c \|f\|'$ for some fixed $c > 0$ and any $f \in \mathbb{R}[x]$. Find those that do and identify the smallest relevant constant $c$ in each case.
Exercise 2.9. Let $V, W$ be normed vector spaces. Show that $V \times W$ with its canonical inherited vector space structure can be made into a normed vector space using any of the norms
$$\|(v, w)\|_p = \bigl( \|v\|_V^p + \|w\|_W^p \bigr)^{1/p}$$
for some $p \in [1, \infty)$, or
$$\|(v, w)\|_\infty = \max\{\|v\|_V, \|w\|_W\}.$$
Show that all of these norms are equivalent, and that they induce the product topology.
The next exercise also shows why we are careful in setting up the theory
of normed spaces instead of just declaring that everything is a generalization
of the finite-dimensional theory.
Exercise 2.10. We define the space $\ell^1(\mathbb{N}) = \{(x_n) \mid \sum_{n=1}^{\infty} |x_n| < \infty\}$ to be the space of all absolutely summable sequences.
• Show that $\|x\|_1 = \sum_{n=1}^{\infty} |x_n|$ for $x \in \ell^1(\mathbb{N})$ defines a norm, and that the subspace $c_c(\mathbb{N}) = \{x \mid x_n = 0 \text{ for all large enough } n\} \subseteq \ell^1(\mathbb{N})$ is dense.
• Let $V = \ell^1(\mathbb{N}) \times \ell^1(\mathbb{N})$ (with any of the equivalent norms from Exercise 2.9) and define the subspaces $V_1 = \ell^1(\mathbb{N}) \times \{0\}$ and
$$V_2 = \{(x, y) \in V \mid n y_n = x_n \text{ for all } n \in \mathbb{N}\}.$$
Show that $V_1, V_2$ are closed subspaces of $V$, but that
$$V_1 + V_2 = \{v_1 + v_2 \mid v_1 \in V_1, v_2 \in V_2\}$$
is not closed.
2.1.2 Semi-Norms and Quotient Norms†
The following weakening of Definition 2.1 is often useful.
Definition 2.11. A non-negative function $\|\cdot\| : V \to \mathbb{R}_{\geq 0}$ on a vector space $V$ is called a semi-norm (or a pseudo-norm) if $\|\cdot\|$ satisfies the homogeneity property and the triangle inequality of a norm.
Thus a semi-norm is allowed to have a non-trivial subset (which will be a
subspace, see below) on which it vanishes. A semi-norm gives rise to a pseudo-
metric, which in turn gives rise to a topology on V . The resulting topology is
Hausdorff if and only if the original semi-norm is a norm in the usual sense.
Indeed, if $v \in V$ has $\|v\| = 0$, then $v$ will belong to every neighbourhood of $0$ in the topology defined by $\|\cdot\|$.
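As a toy illustration, consider the function $v \mapsto |v_1 - v_2|$ on $\mathbb{R}^2$. This example is not taken from the text; it is a semi-norm that vanishes on the line $\{(t, t)\}$. The sketch below spot-checks homogeneity and the triangle inequality, and confirms that non-zero vectors can have semi-norm zero and that such vectors are closed under linear combinations.

```python
def semi_norm(v):
    # Hypothetical semi-norm on R^2; it vanishes exactly on the line {(t, t)}.
    return abs(v[0] - v[1])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(a, v):
    return (a * v[0], a * v[1])

k1, k2 = (2.0, 2.0), (-0.5, -0.5)   # non-zero vectors on which the semi-norm vanishes
v, w = (3.0, 1.0), (-1.0, 4.0)

assert semi_norm(k1) == 0.0 and k1 != (0.0, 0.0)                 # not strictly positive
assert semi_norm(add(scale(-3.0, k1), k2)) == 0.0                # zero set is a subspace
assert semi_norm(scale(-2.0, v)) == 2.0 * semi_norm(v)           # homogeneity
assert semi_norm(add(v, w)) <= semi_norm(v) + semi_norm(w)       # triangle inequality
assert semi_norm(add(v, k1)) == semi_norm(v)                     # shifts by k1 are invisible
```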
Example 2.12. Let $(X, \mathcal{B}, \mu)$ be a measure space (see Appendix B), and define
$$\mathcal{L}^1_\mu(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable and Lebesgue integrable w.r.t. } \mu\}.$$
On this space we can define a semi-norm
$$\|f\|_1 = \int_X |f| \, d\mu,$$
and this is not a norm (unless the measure space $(X, \mu)$ has special properties; see Exercise 2.13).
Exercise 2.13. Characterize those measure spaces $(X, \mathcal{B}, \mu)$ on which the semi-norm from Example 2.12 on the space $\mathcal{L}^1_\mu(X)$ of Lebesgue integrable functions is a norm.
A reader familiar with measure theory may misread Example 2.12, so we should emphasize that $\mathcal{L}^1_\mu(X)$ denotes the space that contains genuine
functions defined at each point of X. The usual solution to the problems
created by the many functions on which the semi-norm vanishes is to define
an equivalence class of a function f to consist of all functions that differ from f
on a null set. This generalizes to a construction that allows any semi-norm
on a vector space to be modified to give a norm (on a related vector space).
To describe this construction, we first prove a simple lemma that starts to
connect the algebraic properties of spaces equipped with a semi-norm to their
topological properties.
† The construction in this section is satisfying and useful at times but, with the exception
of Definition 2.11, is not critical for later developments.
Lemma 2.14 (Kernel of a semi-norm). The kernel
$$V_0 = \{v \in V \mid \|v\| = 0\}$$
of a semi-norm on a vector space is a closed subspace in the topology induced by the semi-norm.
Proof. For $v, w \in V_0$ and any scalar $\alpha$ we have
$$0 \leq \|\alpha v + w\| \leq |\alpha| \, \|v\| + \|w\| = 0,$$
so $V_0$ is a subspace.
By the argument used in Lemma 2.4, we see that the semi-norm $\|\cdot\|$ is continuous with respect to the induced topology. It follows that the pre-image $V_0 = (\|\cdot\|)^{-1}(\{0\})$ is also closed.
Returning to Example 2.12, recall that for $f \in \mathcal{L}^1_\mu(X)$, $\|f\|_1 = 0$ is equivalent to the statement that $f = 0$ almost everywhere with respect to $\mu$. Thus the usual equivalence class of a function $f$ is precisely the coset $f + V_0$ defined by $f$ with respect to the kernel $V_0 \subseteq \mathcal{L}^1_\mu(X)$ of the semi-norm. We define, as is standard, the quotient space
$$L^1_\mu(X) = \mathcal{L}^1_\mu(X) / V_0,$$
and note that the semi-norm $\|\cdot\|_1$ on $\mathcal{L}^1_\mu(X)$ gives rise to a norm, also denoted $\|\cdot\|_1$, on $L^1_\mu(X)$. For an introduction to the function spaces $L^p_\mu(X)$ for $p \in [1, \infty)$ we refer to Appendix B.3. Where the measure is clear from
the context or has a standard choice (for example, the Lebesgue measure
on [0, 1]), it is omitted from the notation. This construction is a special case
of the following.
Lemma 2.15 (Quotient norm). For any vector space $V$ equipped with a semi-norm $\|\cdot\|$, and any closed subspace $W \subseteq V$, the expression
$$\|v + W\|_{V/W} = \inf_{w \in W} \|v + w\|$$
for $v \in V$ defines a norm on the quotient space $V/W = \{v + W \mid v \in V\}$. For the kernel $W = V_0$ we have $\|v + V_0\|_{V/V_0} = \|v\|$ for $v \in V$.
Proof. This is simply a matter of chasing the definitions through the statements. Let $v_1, v_2 \in V$ and $\varepsilon > 0$ be given. Then there exist $w_1, w_2 \in W$ with
$$\|v_i + w_i\| \leq \|v_i + W\|_{V/W} + \varepsilon$$
for $i = 1, 2$. Hence
$$\begin{aligned} \|v_1 + v_2 + W\|_{V/W} &\leq \|v_1 + v_2 + w_1 + w_2\| \\ &\leq \|v_1 + w_1\| + \|v_2 + w_2\| \\ &\leq \|v_1 + W\|_{V/W} + \|v_2 + W\|_{V/W} + 2\varepsilon, \end{aligned}$$
and so, since $\varepsilon > 0$ was arbitrary, the triangle inequality holds for $\|\cdot\|_{V/W}$. Similarly, for any scalar $\alpha$,
$$\|\alpha v_1 + \alpha w_1\| = |\alpha| \, \|v_1 + w_1\| \leq |\alpha| \bigl( \|v_1 + W\|_{V/W} + \varepsilon \bigr),$$
which gives
$$\|\alpha v_1 + W\|_{V/W} \leq |\alpha| \, \|v_1 + W\|_{V/W}.$$
If $\alpha = 0$ then this is clearly an equality, and if $\alpha \neq 0$ then we may apply the above to $\alpha v_1$ and the scalar $\alpha^{-1}$ to give
$$\|v_1 + W\|_{V/W} \leq |\alpha|^{-1} \|\alpha v_1 + W\|_{V/W}.$$
However, this is the remaining half of the homogeneity property.

It remains to check that $\|\cdot\|_{V/W}$ is indeed a norm and not simply a semi-norm. Assume therefore that
$$\|v + W\|_{V/W} = 0.$$
Then for every $\varepsilon > 0$ there exists some $w \in W$ with $\|v - w\| < \varepsilon$. However, this shows that $v$ belongs to the closure $\overline{W}$ of $W$. By assumption $W = \overline{W}$ is closed, so that $v \in W$ and $v + W = W$ is the zero element in the quotient space $V/W$.
For $W = V_0$ we have $\|v + w\| = \|v + w\| + \|-w\| \geq \|v\|$ for every $v \in V$ and $w \in V_0$, with equality for $w = 0$, which gives the final claim.
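For a finite-dimensional feel for the quotient norm, take $V = \mathbb{R}^2$ with the Euclidean norm and $W$ the line spanned by $(1, 1)$. This is an illustrative choice, not an example from the text. In that case the infimum is the Euclidean distance from $v$ to the line, namely $|v_1 - v_2| / \sqrt{2}$. The sketch compares this closed form with a brute-force minimization over a grid of points of $W$; the grid radius and step count are arbitrary assumptions.

```python
import math

def norm_2(v):
    return math.hypot(v[0], v[1])

def quotient_norm_exact(v):
    # Distance from v to the line W = {(t, t)} in the Euclidean norm.
    return abs(v[0] - v[1]) / math.sqrt(2.0)

def quotient_norm_grid(v, radius=10.0, steps=20001):
    # Brute-force approximation of inf over w = (t, t) in W of ||v + w||_2.
    best = float("inf")
    for i in range(steps):
        t = -radius + 2.0 * radius * i / (steps - 1)
        best = min(best, norm_2((v[0] + t, v[1] + t)))
    return best

v = (3.0, 1.0)
print(quotient_norm_exact(v), quotient_norm_grid(v))
```

The value depends only on the coset $v + W$: replacing $v$ by $v + (5, 5)$ leaves `quotient_norm_exact` unchanged.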
Notice that we cannot expect the infimum in Lemma 2.15 to be a minimum
in general (see, for example, Exercise 2.16).
Exercise 2.16. Let $(C([-1, 1]), \|\cdot\|_\infty)$ be the normed vector space defined as in Example 2.2(5). Define
$$W = \Bigl\{ f \in C([-1, 1]) \Bigm| \int_{-1}^{0} f(x) \, dx = \int_{0}^{1} f(x) \, dx = 0 \Bigr\}.$$
Show that $W$ is a closed subspace. Now let $f(x) = x$, calculate $\|f + W\|_{C([-1,1])/W}$, and show that the infimum is not achieved.
2.1.3 Isometries are Affine†
The following strengthening of the triangle inequality has interesting con-
sequences.
† The results in Section 2.1.3 are interesting, but will not be needed later.
Definition 2.17. A norm $\|\cdot\|$ on a vector space $V$ is strictly sub-additive if we have strict inequality in the triangle inequality except in the case when the two vectors are real non-negative multiples, more precisely
$$\|v + w\| < \|v\| + \|w\|$$
for all $v, w \in V$ unless $v, w \in \{t(v + w) \mid t \geq 0\}$.

Exercise 2.18. A normed space $(V, \|\cdot\|)$ is strictly convex if the closed unit ball is a strictly convex set, or equivalently if a line segment with end points $x \neq y$ in the unit sphere $\{v \in V \mid \|v\| = 1\}$ only intersects the unit sphere at its end points. Show that the closed unit ball in a normed linear space is strictly convex if and only if the norm is strictly sub-additive.
A map $f : V \to W$ between normed spaces is an isometry if
$$\|f(v) - f(v_0)\|_W = \|v - v_0\|_V$$
for all $v, v_0 \in V$.
Exercise 2.19. Show that the supremum norm $\|\cdot\|_\infty$ on $\mathbb{R}^2$ is not strictly convex. Give an example to show that an isometry between normed spaces need not be affine, by considering maps of the form $x \mapsto (x, f(x))$ from $(\mathbb{R}, |\cdot|)$ to $(\mathbb{R}^2, \|\cdot\|_\infty)$ for a suitably chosen function $f$.
Theorem 2.20 (Mazur–Ulam$^{(3)}$). Let $V$ and $W$ be normed linear spaces over $\mathbb{R}$, and let $M : V \to W$ be a function. Assume that either
• $M$ is a surjective isometry, or
• $M$ is an isometry and the norm on $W$ is strictly sub-additive.
Then $M$ is affine, that is $M(v) = M_{\mathrm{linear}}(v) + M(0)$ where $M_{\mathrm{linear}} : V \to W$ is a linear isometry.
Proof. Clearly the map $v \mapsto M(v) - M(0)$ is an isometry if $M$ is an isometry, so we may assume that $M(0) = 0$ without loss of generality, and need to show in this case that $M$ is linear.
Midpoint-preserving maps: We claim first that if $M$ preserves mid-points in the sense that
$$M\Bigl( \frac{v_1 + v_2}{2} \Bigr) = \frac{M(v_1) + M(v_2)}{2} \tag{2.6}$$
for all $v_1, v_2 \in V$ and satisfies $M(0) = 0$, then $M$ is linear. To see this, pick $v$ in $V$ and apply (2.6) to the pairs $v$ and $0$, then to $\frac{1}{2} v$ and $0$, and inductively to $\frac{1}{2^k} v$ and $0$ to prove that $M(\frac{1}{2^k} v) = \frac{1}{2^k} M(v)$ for all $k \in \mathbb{N}$ and $v \in V$. Next apply (2.6) to $2v$ and $0$, and inductively to $(\ell + 1)v$ and $(\ell - 1)v$ to prove that $M(\ell v) = \ell M(v)$ for all $\ell \in \mathbb{N}$ and $v \in V$. Finally, apply (2.6) to $v$ and $-v$ to see that $M(-v) = -M(v)$ for all $v \in V$. This gives
$$M\Bigl( \frac{n}{2^k} v \Bigr) = \frac{n}{2^k} M(v)$$
for any $k \in \mathbb{Z}$, $n \in \mathbb{N}$ and $v \in V$, and so by continuity (together with $M(-v) = -M(v)$) we have $M(av) = aM(v)$ for all $a \in \mathbb{R}$ and $v \in V$. With (2.6), this also gives $M(v_1 + v_2) = M(v_1) + M(v_2)$ for all $v_1, v_2 \in V$.
Subadditive norms: The case of the theorem under the sub-additivity hypothesis is now easily obtained. Suppose that $M$ is an isometry and the norm on $W$ is strictly sub-additive. Let $v_1, v_2 \in V$ have mid-point $z = \frac{v_1 + v_2}{2}$, so that
$$\|v_1 - z\| = \|z - v_2\| = \tfrac{1}{2} \|v_1 - v_2\|,$$
and hence (since $M$ is an isometry)
$$\|M(v_1) - M(z)\| = \|M(z) - M(v_2)\| = \tfrac{1}{2} \|M(v_1) - M(v_2)\|.$$
Moreover, $M(v_1) - M(v_2) = (M(v_1) - M(z)) + (M(z) - M(v_2))$. Thus equality holds in the triangle inequality for these two summands, so by strict sub-additivity $(M(v_1) - M(z))$ and $(M(z) - M(v_2))$ must be real non-negative multiples of each other; as they have the same norm this forces them to be equal. Solving this equation for $M(z)$ gives (2.6).
Surjective isometries: Consider now the case where $M$ is assumed to be a surjective isometry (and hence also bijective). We define for $z \in V$ the reflection in $z$ to be the map $\psi_z : V \to V$ defined by $\psi_z(v) = 2z - v$. It is easy to check that $\psi_z^2$ is the identity, so $\psi_z$ has inverse $\psi_z$ and is a bijective isometry. Note that
$$\|\psi_z(v) - z\| = \|v - z\|, \tag{2.7}$$
and
$$\|\psi_z(v) - v\| = 2\|v - z\| \tag{2.8}$$
for all $v \in V$, which also implies that $z$ itself is the only fixed point of $\psi_z$.
Now fix $v_1, v_2 \in V$ and write $z = \frac{v_1 + v_2}{2}$ as before for the midpoint. Let $B$ be the group of all bijective isometries $V \to V$ that fix $v_1$ and $v_2$, and define
$$\lambda = \sup\{\|g(z) - z\| \mid g \in B\}.$$
Since any $g \in B$ is an isometry satisfying $g(v_1) = v_1$ we have
$$\|g(z) - z\| \leq \|g(z) - g(v_1)\| + \|v_1 - z\| = 2\|v_1 - z\|$$
and hence $\lambda < \infty$. We also have
$$\begin{aligned} 2\|g(z) - z\| &= \|\psi_z g(z) - g(z)\| && \text{(by (2.8) for } v = g(z)\text{)} \\ &= \|g^{-1} \psi_z g(z) - z\| && \text{(since } g \text{ is an isometry)} \\ &= \|\psi_z g^{-1} \psi_z g(z) - z\| && \text{(by (2.7) for } v = g^{-1} \psi_z g(z)\text{)} \end{aligned}$$
for any $g \in B$. Furthermore, note that $\psi_z(v_1) = v_2$ and $\psi_z(v_2) = v_1$. It follows that $g \in B$ implies $g' = \psi_z g^{-1} \psi_z g \in B$ and so $\|g'(z) - z\| \leq \lambda$. Combining this with the above gives
$$2\|g(z) - z\| = \|g'(z) - z\| \leq \lambda,$$
for all $g \in B$, and hence by definition of $\lambda \in [0, \infty)$ also $2\lambda \leq \lambda$. This forces $\lambda$ to be $0$, and therefore $g(z) = z$ for all $g \in B$.
Now let $M : V \to W$ be a bijective isometry, and let $z' = \frac{M(v_1) + M(v_2)}{2}$. Then $h = \psi_z M^{-1} \psi_{z'} M \in B$, so $h(z) = z$ and therefore $\psi_{z'} M(z) = M(z)$. On the other hand, the only point fixed by $\psi_{z'}$ is $z'$ itself, so $M(z) = z'$ and $M$ preserves mid-points as required.
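The properties of the reflection $\psi_z$ used in this argument are easy to test numerically. The following sketch works in $\mathbb{R}^2$ with the supremum norm (an illustrative choice; the helper names are not from the text) and checks that $\psi_z$ is an involution swapping $v_1$ and $v_2$, together with the identities (2.7) and (2.8), on a few sample points chosen so that the floating-point arithmetic is exact.

```python
def norm_inf(v):
    return max(abs(x) for x in v)

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def reflect(z, v):
    # The reflection psi_z(v) = 2z - v.
    return tuple(2.0 * zi - vi for zi, vi in zip(z, v))

v1, v2 = (1.0, 4.0), (3.0, -2.0)
z = tuple((a + b) / 2.0 for a, b in zip(v1, v2))   # midpoint (2.0, 1.0)

# psi_z swaps v1 and v2.
assert reflect(z, v1) == v2 and reflect(z, v2) == v1

for v in [(0.0, 0.0), (5.0, -1.5), (2.0, 1.0)]:
    r = reflect(z, v)
    assert reflect(z, r) == v                                   # psi_z squared is the identity
    assert norm_inf(sub(r, z)) == norm_inf(sub(v, z))           # identity (2.7)
    assert norm_inf(sub(r, v)) == 2.0 * norm_inf(sub(v, z))     # identity (2.8)
```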
Exercise 2.21. Show that the vertex set of a graph consisting of vertices $v_1, v_2, v_3, v_c$ and three edges connecting one central vertex $v_c$ to the remaining three vertices $v_1, v_2, v_3$, endowed with the combinatorial distance given by $d(v_j, v_k) = 2(1 - \delta_{jk})$ for $j, k \in \{1, 2, 3\}$ (that is, distance $2$ between distinct outer vertices) and $d(v_j, v_c) = 1$ for $j = 1, 2, 3$, admits no isometric embedding into any Banach space with a strictly sub-additive norm.
2.1.4 A Comment on Notation
On several occasions in this section we considered different norms on the same vector space. This will happen less frequently in the theoretical parts of the text, and most of the time the normed vector space $(V, \|\cdot\|)$ will be equipped with a particular norm. Where we are dealing with a single norm, we will write
$$B_r = B_r^V = \{w \in V \mid \|w\| < r\}$$
for the open ball of radius $r$ around $0$, and
$$B_r(v) = B_r^V(v) = \{w \in V \mid \|w - v\| < r\} = B_r^V + v$$
for the open ball of radius $r$ around $v \in V$.
We will also frequently write $\|\cdot\|_V$ for the natural norm on $V$. For example, in Example 2.2(6) we may write $\|f\|_{C^1([0,1])}$ for the natural norm of a function $f \in C^1([0,1])$, but may also write $\|f\|_{C([0,1])} = \|f\|_\infty$ for the supremum norm of $f \in C^1([0,1])$ thought of as an element of the larger space $C([0,1])$. At this point it is reasonable to ask what makes a norm naturally associated to a given space, and this is partially explained in the next section.
2.2 Banach Spaces
We start by recalling a basic definition from analysis on metric spaces.

Definition 2.22. A sequence $(x_n)$ in a metric space $(X, d)$ is said to be a Cauchy sequence if for any $\varepsilon > 0$ there is an $N = N(\varepsilon)$ such that
$$d(x_m, x_n) < \varepsilon$$
for any $m, n \geq N$. The metric space is called complete if every Cauchy sequence converges to an element of $X$.
From a purely logical point of view, ‘converges to an element of X’ should

be written ‘converges’, but we wish to emphasize here that the limit of the
sequence belongs to X (and not to some strictly larger space that contains X).
The notion of Cauchy sequences gives rise to one of the fundamental types
of normed spaces in functional analysis.
Definition 2.23. A normed vector space $(V, \|\cdot\|)$ is a Banach space if $V$ is complete with respect to (the metric induced by) the norm $\|\cdot\|$.
Once again there are many familiar examples of Banach spaces. As we will see there is often an almost canonical choice of norm $\|\cdot\|_V$ which makes a linear space $V$ into a Banach space $(V, \|\cdot\|_V)$. It is clear that this property of a norm does not define it uniquely. In fact, any equivalent norm would induce the same topology, the same notion of Cauchy sequence, and therefore also make $V$ into a Banach space.
Example 2.24. We start with a small number of examples, and postpone the
proof that these are indeed Banach spaces to Section 2.2.1.
(1) The Euclidean space Rd with any of the norms from Example 2.2(1)–(4)
from Section 2.1.1 forms a Banach space.
(2) Let $X$ be any set. Then $B(X) = \{f : X \to \mathbb{R} \mid f \text{ is bounded}\}$, equipped with the norm
$$\|f\|_\infty = \sup_{x \in X} |f(x)|,$$
is a Banach space. Convergence of a sequence of functions in this space is also called uniform convergence. If $X = \mathbb{N}$, then one often writes
$$\ell^\infty = \ell^\infty(\mathbb{N}) = B(\mathbb{N}).$$
(3) Let X be a topological space. Then
Cb (X) = {f ∈ B(X) | f is continuous}
is a closed subspace of B(X) and so is also a Banach space. Notice that

if X is compact then Cb (X) = C(X).
(4) Let $X$ be a locally compact topological space (that is, a Hausdorff topological space in which every point has a compact neighbourhood, see also Definition A.21). Then
$$C_0(X) = \{f \in C_b(X) \mid \lim_{x \to \infty} f(x) = 0\}$$
is a closed subspace of $C_b(X)$ and hence a Banach space. The notion of the limit of $f(x)$ as $x \to \infty$ used here is defined as follows: $\lim_{x \to \infty} f(x) = A$ if and only if for every $\varepsilon > 0$ there exists some compact set $K \subseteq X$ with $|f(x) - A| < \varepsilon$ for all $x \in X \setminus K$. If $X = \mathbb{N}$ (with the discrete topology), one often writes $c_0 = c_0(\mathbb{N}) = C_0(\mathbb{N})$ for this subspace of $\ell^\infty(\mathbb{N})$.
(5) The space $C^1([0, 1])$ of continuously differentiable functions on $[0, 1]$ with the norm
$$\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$$
is a Banach space.
(6) Let $U \subseteq \mathbb{R}^d$ be non-empty and open, and fix $k \geq 1$. Then the space $C_b^k(U)$ of functions $U \to \mathbb{R}$ for which all partial derivatives up to order $k$ exist and are continuous and bounded on $U$, equipped with the norm
$$\|f\|_{C_b^k(U)} = \max_{\|\alpha\|_1 \leq k} \|\partial_\alpha f\|_\infty,$$
is a Banach space, where $\partial_\alpha$ for $\alpha \in \mathbb{N}_0^d$ stands for the partial differential operator defined by
$$\partial_\alpha f = \frac{\partial^{\|\alpha\|_1}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} f$$
of degree $\|\alpha\|_1 = \alpha_1 + \cdots + \alpha_d$. In the case $\alpha = e_j$ we will call this the $j$th partial derivative and write $\partial_j f = \partial_{e_j} f$.
(7) Fix $p \in [1, \infty)$ and let $(X, \mathcal{B}, \mu)$ be a measure space. Then
$$\|f\|_p = \Bigl( \int_X |f|^p \, d\mu \Bigr)^{1/p}$$
defines a semi-norm on the vector space
$$\mathcal{L}^p_\mu(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable and } \|f\|_p < \infty\}.$$
The associated space of equivalence classes, equal to the quotient
$$L^p_\mu(X) = \mathcal{L}^p_\mu(X) / V_0$$
by the kernel $V_0$ of the semi-norm $\|\cdot\|_p$, is a Banach space. We will write $L^p(X, \mu)$ and $L^p(\mu)$ in place of $L^p_\mu(X)$ when the space is clear, and in particular where it is useful to avoid multiple levels of subscript. Important special cases of this construction include the following:
(a) $(X, \mathcal{B}, \mu) = (\Omega, \mathcal{B}, m)$ where $\Omega$ is a Borel subset of $\mathbb{R}^d$, $\mathcal{B}$ is the Borel $\sigma$-algebra, and $m$ is $d$-dimensional Lebesgue measure on $\Omega$.
(b) $(X, \mathcal{B}, \mu) = (\mathbb{N}, \mathcal{P}(\mathbb{N}), \lambda_{\mathrm{count}})$, where $\lambda_{\mathrm{count}}$ denotes the counting measure, which is defined on any subset of $\mathbb{N}$. In this case we will write
$$\ell^p = \ell^p(\mathbb{N}) = L^p_{\lambda_{\mathrm{count}}}(\mathbb{N}).$$
(8) The analogue of (7) with $p = \infty$ is constructed slightly differently. As before, let $(X, \mathcal{B}, \mu)$ be a measure space. Then
$$\mathcal{L}^\infty(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable}, f \in B(X)\}$$
is already a Banach space with respect to $\|f\|_\infty$. However, one also defines
$$L^\infty_\mu(X) = \mathcal{L}^\infty(X) / W_\mu(X),$$
where
$$W_\mu(X) = \{f \in \mathcal{L}^\infty(X) \mid f = 0 \ \mu\text{-almost everywhere}\}$$
and $L^\infty_\mu(X)$ is equipped with the essential supremum norm defined by
$$\|f\|_{\mathrm{esssup}} = \operatorname*{ess\,sup}_{x \in X} |f(x)| = \inf\bigl\{ \alpha \geq 0 \bigm| \mu(\{x \mid |f(x)| > \alpha\}) = 0 \bigr\}. \tag{2.9}$$
We will generally follow the convention that the essential supremum norm of $f$ is also denoted for simplicity by $\|f\|_\infty$. All of these $\ell^p$ and $L^p$ spaces also have natural complex-valued analogues.
As is customary, we will quickly stop being too careful about the distinction between an element of $\mathcal{L}^\infty(X)$ and the equivalence class defined by it in $L^\infty_\mu(X)$. For example, $|f|(x) = |f(x)|$ for all $x \in X$ really depends on $f \in \mathcal{L}^\infty(X)$ and not just on the equivalence class, but (as we will see later in the proof of completeness) the norm defined in (2.9) is independent of the representative chosen for a given equivalence class.
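On a finite set the definition (2.9) can be evaluated directly, which makes the independence of the representative visible. In the sketch below the measure is given by hypothetical point masses `mu` on a three-point set (all names and values are illustrative assumptions): the set $\{x \mid |f(x)| > \alpha\}$ is null exactly when it contains no point of positive mass, so the infimum is the largest value of $|f|$ on the charged points.

```python
def ess_sup(f, mu):
    """Essential supremum of |f| on a finite set X.

    f and mu are dicts mapping each point of X to a value and to a point mass.
    """
    # {x : |f(x)| > a} is a null set iff every point in it has mass 0, so the
    # infimum in (2.9) is the largest |f(x)| over points of positive mass.
    charged = [abs(f[x]) for x in f if mu[x] > 0]
    return max(charged, default=0.0)

mu = {"a": 1.0, "b": 0.0, "c": 2.5}       # the point "b" is a null set
f = {"a": -1.0, "b": 100.0, "c": 0.5}
g = {"a": -1.0, "b": -7.0, "c": 0.5}      # differs from f only on the null set

sup_f = max(abs(y) for y in f.values())   # ordinary supremum norm of f
print(sup_f, ess_sup(f, mu), ess_sup(g, mu))  # 100.0 1.0 1.0
```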
Exercise 2.25. Show that a product of two normed vector spaces V × W is complete with
respect to one of the norms from Exercise 2.9 if and only if both V and W are complete
with respect to their own norms. Thus the product of two Banach spaces is a Banach space.
2.2.1 Proofs of Completeness
In this subsection we will explain why the examples from Example 2.24 are
indeed Banach spaces. Depending on the background of the reader, parts of
this section may be skipped. In each case it is proving completeness that
really takes up what effort is required. The following principle will be used
several times.
Essential Exercise 2.26. Let (X, d) be a metric space.

(a) Show that if Y is a subset of X that is complete (with respect to the
restriction of the metric on X to Y ), then Y is a closed subset of X.
(b) If X is complete, show that Y ⊆ X is complete if and only if Y is closed.
Example 2.24(1). If two norms are equivalent then they define the same notion of convergence and of Cauchy sequence. Thus it is enough to consider $\mathbb{R}^d$ with the norm $\|\cdot\|_\infty$ by Proposition 2.6. Now a Cauchy sequence $(v_n)$ in $\mathbb{R}^d$ has the property that each component sequence $(v_n^{(i)})$ for a fixed $i$, where $v_n = (v_n^{(1)}, \dots, v_n^{(d)})^t$ for all $n$, is itself a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, there exists a limit $v^{(i)} = \lim_{n \to \infty} v_n^{(i)}$ for each $i$. These limits together define a vector $v = (v^{(1)}, \dots, v^{(d)})^t$ and it is easy to see that $v$ is the limit of $(v_n)$ in $\mathbb{R}^d$.
Example 2.24(2). Let $X$ be any set and let $(f_n)$ be a Cauchy sequence in $B(X)$ with respect to $\|\cdot\|_\infty$. Then for any fixed $x \in X$ the sequence $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$, which therefore has a limit $f(x)$. This defines a function $f : X \to \mathbb{R}$. We need to show that $f \in B(X)$ and $f_n \to f$ as $n \to \infty$ with respect to $\|\cdot\|_\infty$. Since $(f_n)$ is Cauchy, for any $\varepsilon > 0$ there is some $N(\varepsilon)$ with $\|f_m - f_n\|_\infty < \varepsilon$ for all $m, n \geq N(\varepsilon)$, and so $|f_m(x) - f_n(x)| < \varepsilon$ for any $x \in X$ and $m, n \geq N(\varepsilon)$. Now let $m \to \infty$ to see that $|f(x) - f_n(x)| \leq \varepsilon$ for all $n \geq N(\varepsilon)$. Setting $\varepsilon = 1$ and $n = N(1)$ gives $|f(x)| \leq 1 + \|f_{N(1)}\|_\infty$ for any $x \in X$, showing that $f \in B(X)$. For any $\varepsilon > 0$, we obtain $\|f - f_n\|_\infty \leq \varepsilon$ for all $n \geq N(\varepsilon)$ and hence that $f = \lim_{n \to \infty} f_n \in B(X)$, as required. If $X$ has cardinality $d$ then this example reduces to the previous one.
Example 2.24(3). By definition, $C_b(X)$ is a subspace of $B(X)$, and we use the same norm on both spaces. Thus, if $(f_n)$ is a Cauchy sequence in $C_b(X)$ then, by (2), there exists a limit $f = \lim_{n \to \infty} f_n \in B(X)$. It remains to show that $f \in C_b(X)$, that is, to show that $C_b(X)$ is a closed subspace of $B(X)$. This is a familiar argument from real analysis. Fix $x \in X$. Given any $\varepsilon > 0$ there exists some $n$ with $\|f_n - f\|_\infty < \varepsilon$. Since $f_n \in C_b(X)$ is continuous at $x$, there is a neighbourhood $U \subseteq X$ of $x$ with $|f_n(y) - f_n(x)| < \varepsilon$ for all $y \in U$. Therefore,
$$|f(y) - f(x)| \leq \underbrace{|f(y) - f_n(y)|}_{\leq \|f - f_n\|_\infty < \varepsilon} + \underbrace{|f_n(y) - f_n(x)|}_{< \varepsilon} + \underbrace{|f_n(x) - f(x)|}_{\leq \|f - f_n\|_\infty < \varepsilon} < 3\varepsilon$$
for all $y \in U$. As the existence of such a neighbourhood holds for all $\varepsilon > 0$ and $x \in X$, we see that $f \in C_b(X)$ as required.
Example 2.24(4). Once again $C_0(X) \subseteq C_b(X)$ and we use the same norm. So if $(f_n)$ is a Cauchy sequence in $C_0(X)$, then $f = \lim_{n \to \infty} f_n \in C_b(X)$ exists by (3). We only need to show that $f \in C_0(X)$. For this, let $\varepsilon > 0$ and choose $n \in \mathbb{N}$ with $\|f_n - f\|_\infty < \varepsilon$. Since $f_n \in C_0(X)$, there exists some compact set $K \subseteq X$ with $|f_n(x)| < \varepsilon$ for all $x \in X \setminus K$. Thus
$$|f(x)| \leq |f(x) - f_n(x)| + |f_n(x)| < 2\varepsilon$$
for all $x \in X \setminus K$. This implies that $f \in C_0(X)$ as required.

Example 2.24(5). Let $(f_n)$ be a Cauchy sequence in $C^1([0, 1])$ with respect to the norm $\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$. Then each $f_n$ lies in $C([0, 1])$, and $(f_n)$ is a Cauchy sequence with respect to the norm $\|\cdot\|_\infty$. Thus (3) applies and shows that $f_n$ converges uniformly to some $f \in C([0, 1])$. The same argument applies to the sequence $(f_n')$ of derivatives, showing that $(f_n')$ converges uniformly to some $g \in C([0, 1])$. All that remains is to verify that
$$f' = g. \tag{2.10}$$
In order to show (2.10), it is convenient to rephrase the statement as an integral equation. Since $f_n$ is continuously differentiable, we certainly have
$$f_n(x) = f_n(0) + \int_0^x f_n'(t) \, dt. \tag{2.11}$$
Now note that
$$\Bigl| \int_0^x f_n'(t) \, dt - \int_0^x g(t) \, dt \Bigr| \leq \|f_n' - g\|_\infty$$
converges to $0$ as $n \to \infty$. Since $f_n(x) \to f(x)$ and $f_n(0) \to f(0)$ as $n \to \infty$, we see that (2.11) implies
$$f(x) = f(0) + \int_0^x g(t) \, dt.$$
By continuity of $g$, this is equivalent to (2.10), and now it is clear that
$$\|f_n - f\|_{C^1([0,1])} = \max\{\|f_n - f\|_\infty, \|f_n' - g\|_\infty\} \longrightarrow 0$$
as $n \to \infty$, as required.
Example 2.24(6). Let $U \subseteq \mathbb{R}^d$ be an open subset, and let $(f_n)$ be a Cauchy sequence in $C_b^k(U)$ with respect to the norm $\|\cdot\|_{C_b^k(U)}$. Just as in the argument for (5), we know from (3) that for any $\alpha \in \mathbb{N}_0^d$ with $\alpha_1 + \cdots + \alpha_d \leq k$, the sequence $(\partial_\alpha f_n)$ in $C_b(U)$ has a uniform limit $g_\alpha \in C_b(U)$. All that remains is to show that
$$g_\alpha = \partial_\alpha g_0. \tag{2.12}$$
Suppose therefore that $x \in U$ and $i \in \{1, \dots, d\}$. It is enough (by induction) to show that
$$\partial_{e_i} g_\alpha = g_{\alpha + e_i} \tag{2.13}$$
for $\alpha_1 + \cdots + \alpha_d < k$. For the function $f_n$ we have, for sufficiently small $h$, that
$$\partial_\alpha f_n(x + h e_i) = \partial_\alpha f_n(x) + \int_0^h \partial_{\alpha + e_i} f_n(x + t e_i) \, dt,$$
so letting $n \to \infty$ and using the known uniform convergence just as in (5) gives
$$g_\alpha(x + h e_i) = g_\alpha(x) + \int_0^h g_{\alpha + e_i}(x + t e_i) \, dt,$$
which implies (2.13) and hence (2.12) by induction on $\alpha_1 + \cdots + \alpha_d$. As in (5) it now follows that $f_n \to g_0$ with respect to $\|\cdot\|_{C_b^k(U)}$ as $n \to \infty$.
Exercise 2.27. Generalize Example 2.24(6) to give a Banach space over C in two different
ways as follows.
(a) Let U ⊆ Rd be open and consider C-valued bounded differentiable functions with
bounded continuous derivative (here there is little difference from the real case).
(b) Let U ⊆ C (or in Cd ) be open, and consider the space of bounded complex differentiable
functions with bounded derivative.
For Examples 2.24(7) and (8) regarding integrable functions and bounded
measurable functions, we will use two lemmas that we formulate more gen-
erally.
The usual definitions of convergence and absolute convergence of series extend easily to normed vector spaces as follows. A series $\sum_{n=1}^{\infty} v_n$ converges if the sequence of partial sums $(s_N)_{N \geq 1}$ converges, where $s_N = \sum_{n=1}^{N} v_n$ for all $N \geq 1$, and converges absolutely if the real-valued series $\sum_{n=1}^{\infty} \|v_n\|$ converges.
Lemma 2.28 (Absolute convergence). A normed vector space $(V, \|\cdot\|)$ is a Banach space if and only if any absolutely convergent series in $V$ is convergent.
Proof. If $V$ is a Banach space and a series $\sum_{n=1}^{\infty} v_n$ is absolutely convergent, which means that $\sum_{n=1}^{\infty} \|v_n\| < \infty$, then the sequence of partial sums $(s_n)$ defined by $s_n = \sum_{k=1}^{n} v_k$ is a Cauchy sequence, since for $m > n$ we have
$$\|s_m - s_n\| = \Bigl\| \sum_{k=n+1}^{m} v_k \Bigr\| \leq \sum_{k=n+1}^{m} \|v_k\|,$$
and the last sum can be made arbitrarily small by requiring $n$ to be sufficiently large. Since $V$ is complete, the partial sums, and hence the series, converge.
Assume now for the converse that $(V, \|\cdot\|)$ is a normed vector space in which every absolutely convergent series is convergent, and let $(v_n)$ be a Cauchy sequence in $V$. In order to render the Cauchy property more uniform, we choose a subsequence of $(v_n)$ as follows. For each $k \geq 1$ there exists some $N_k$ such that
\[ \|v_m - v_n\| < \frac{1}{2^k} \]
for all $m, n \geq N_k$. Using these numbers we define inductively an increasing sequence $(n_k)$ by $n_1 = N_1$ and $n_k = \max\{n_{k-1} + 1, N_k\}$ for $k \geq 2$. The corresponding subsequence $(v_{n_k})_{k \geq 1}$ satisfies $\|v_{n_{k+1}} - v_{n_k}\| < \frac{1}{2^k}$. Now define
\[ w_k = v_{n_{k+1}} - v_{n_k} \]
for all $k \geq 1$, so that $\sum_{k=1}^\infty \|w_k\| < \sum_{k=1}^\infty \frac{1}{2^k} = 1$ converges, and hence the infinite sum $\sum_{k=1}^\infty w_k = w \in V$ converges by our assumption on the normed space $(V, \|\cdot\|)$. For the $\ell$th partial sum of this series we obtain
\[ \sum_{k=1}^{\ell} w_k = v_{n_{\ell+1}} - v_{n_1}, \]
and so the subsequence $(v_{n_k})$ satisfies $v = \lim_{k \to \infty} v_{n_k} = w + v_{n_1}$. Finally, we use the fact that any Cauchy sequence with a convergent subsequence must converge; we quickly recall the argument. For any $\varepsilon > 0$ choose $N$ with $\|v_m - v_n\| < \varepsilon$ for $m, n \geq N$ and choose $K$ with $\|v_{n_k} - v\| < \varepsilon$ for $k \geq K$. Then if $k \geq K$ has $n_k \geq N$ we have $\|v_m - v\| \leq \|v_m - v_{n_k}\| + \|v_{n_k} - v\| < 2\varepsilon$ for all $m \geq N$, showing that the sequence converges.
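The subsequence construction in the proof can be traced on a concrete example. The following is a minimal sketch, assuming the illustrative Cauchy sequence $v_n = 1/n$ in $\mathbb{R}$ and the choice $N_k = 2^k$ (both are assumptions made here, not taken from the text); the function names are hypothetical.

```python
from fractions import Fraction

# Illustrative Cauchy sequence v_n = 1/n in V = R (exact rational arithmetic).
def v(n):
    return Fraction(1, n)

# |v_m - v_n| < 1/min(m, n), so N_k = 2**k works: for m, n >= N_k the
# difference is strictly below 2**-k.
def N(k):
    return 2 ** k

def subsequence_indices(count):
    """n_1 = N_1 and n_k = max(n_{k-1} + 1, N_k), as in the proof."""
    idx = [N(1)]
    for k in range(2, count + 1):
        idx.append(max(idx[-1] + 1, N(k)))
    return idx

n = subsequence_indices(12)                      # n[k-1] plays the role of n_k
w = [v(n[k + 1]) - v(n[k]) for k in range(11)]   # w[k] is w_{k+1} in the text

# Each |w_k| is below 2**-k, so the series of the w_k converges absolutely.
bounds_hold = all(abs(w[k]) < Fraction(1, 2 ** (k + 1)) for k in range(11))

# The partial sums telescope: w_1 + ... + w_l = v_{n_{l+1}} - v_{n_1}.
telescopes = all(sum(w[:l]) == v(n[l]) - v(n[0]) for l in range(1, 12))
```

Exact fractions are used so that the telescoping identity can be checked with equality rather than up to rounding.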
Lemma 2.29 (Quotients of Banach spaces). If $(V, \|\cdot\|)$ is a Banach space and $W \subseteq V$ is a closed subspace then $(V/W, \|\cdot\|_{V/W})$ is a Banach space.
Proof. Assume that $(v_n)$ is a sequence with
\[ \sum_{n=1}^\infty \|v_n + W\|_{V/W} < \infty. \]
Since Banach spaces can be characterized by absolute convergence (see Lemma 2.28), it suffices to show that
\[ \sum_{n=1}^\infty (v_n + W) \]
exists. For this, choose for each $n \geq 1$ some $w_n \in W$ with
\[ \|v_n + w_n\| \leq \|v_n + W\|_{V/W} + \frac{1}{2^n}. \]
Then $\sum_{n=1}^\infty \|v_n + w_n\| < \infty$, so the limit
\[ v = \sum_{n=1}^\infty (v_n + w_n) \]
exists in $V$ by Lemma 2.28. Also note that the canonical map $\pi : V \to V/W$ (defined by $\pi(v) = v + W$ for all $v \in V$) is continuous since $\|v - v_0\| < \varepsilon$ implies $\|(v + W) - (v_0 + W)\|_{V/W} < \varepsilon$ for all $v, v_0 \in V$ and $\varepsilon > 0$. This implies that
\[ \sum_{n=1}^\infty (v_n + W) = v + W \]
converges.
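As a finite-dimensional sanity check of the quotient norm used in this proof, the following sketch takes $V = \mathbb{R}^2$ with the sup norm and $W = \{(t, t) \mid t \in \mathbb{R}\}$. These choices and all function names are illustrative assumptions. In this case the infimum defining $\|v + W\|_{V/W}$ is attained at $t = -(a+b)/2$ and equals $|a - b|/2$, which the code compares against a brute-force minimization.

```python
# Sketch: quotient norm on V/W for V = R^2 (sup norm), W = {(t, t)}.
# V, W and all function names here are illustrative choices.

def sup_norm(v):
    return max(abs(c) for c in v)

def quotient_norm(v):
    """||v + W|| = inf_t ||(a + t, b + t)||_inf, attained at t = -(a + b)/2."""
    a, b = v
    return abs(a - b) / 2

def quotient_norm_grid(v, steps=4001, radius=10.0):
    """Brute-force approximation of the same infimum over a grid of t."""
    a, b = v
    ts = [-radius + 2 * radius * i / (steps - 1) for i in range(steps)]
    return min(sup_norm((a + t, b + t)) for t in ts)

v = (3.0, 1.0)
closed_form = quotient_norm(v)
brute_force = quotient_norm_grid(v)
```

The inequality `quotient_norm(v) <= sup_norm(v)` reflects the fact, used in the proof, that the canonical map $\pi$ does not increase norms.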
We refer to Appendix B for basic properties of $\|\cdot\|_p$ on $L^p_\mu(X)$, and in particular for the triangle inequality. Moreover, in the proof below and in
the remainder of the book we will need the monotone convergence and the
dominated convergence theorems (Theorems B.7 and B.8, respectively).
Example 2.24(7). Let $(f_n)$ be a sequence in $L^p_\mu(X)$ with
\[ M = \sum_{n=1}^\infty \|f_n\|_p < \infty. \]
By Lemma 2.28 it is enough to show that $\sum_{n=1}^\infty f_n$ converges in $L^p_\mu(X)$. For this, define a sequence of functions $(g_n)$ by
\[ g_n(x) = \sum_{k=1}^n |f_k(x)|. \]
Clearly $g_n(x) \nearrow g(x)$ for some measurable function $g : X \to [0, \infty]$. Note that
\[ \int |g_n|^p \, d\mu = \|g_n\|_p^p \leq \Bigl( \sum_{k=1}^n \|f_k\|_p \Bigr)^p \leq M^p \]
by the triangle inequality for $\|\cdot\|_p$. By monotone convergence, this implies that
\[ \|g\|_p^p = \lim_{n \to \infty} \|g_n\|_p^p \leq M^p, \]
and so $g(x) < \infty$ for $\mu$-almost every $x \in X$. Therefore,
\[ f(x) = \sum_{n=1}^\infty f_n(x) \]
exists for $\mu$-almost every $x \in X$, and hence defines a measurable function $f : X \to \mathbb{R}$. Strictly speaking we have only defined $f$ on the complement of a null set, but we simplify the notation by ignoring this distinction here.
Since we also have $|f(x)| \leq g(x)$ for all $x$, we have $f \in L^p_\mu(X)$. It remains to show that
\[ \Bigl\| \sum_{k=1}^n f_k - f \Bigr\|_p \longrightarrow 0 \tag{2.14} \]
as $n \to \infty$. For this, notice first that $\bigl| \sum_{k=1}^n f_k - f \bigr|^p \leq (2g)^p$ and by definition of $f$ we also have $\bigl| \sum_{k=1}^n f_k - f \bigr|^p \longrightarrow 0$ almost everywhere as $n \to \infty$, so that we may apply dominated convergence to the sequence of integrals defined by
\[ \Bigl\| \sum_{k=1}^n f_k - f \Bigr\|_p^p = \int_X \Bigl| \sum_{k=1}^n f_k - f \Bigr|^p \, d\mu. \]
From this we obtain (2.14), as required.
Example 2.24(8). Since a pointwise limit (and a fortiori a uniform limit) of a sequence of measurable functions is a measurable function, the subspace $\mathscr{L}^\infty(X) \subseteq B(X)$ is closed. Therefore, by part (2) of the example we see that $\mathscr{L}^\infty(X)$ is a Banach space with respect to the norm $\|\cdot\|_\infty$.
Now let $W_\mu = \{f \in \mathscr{L}^\infty \mid f = 0 \ \mu\text{-almost everywhere}\}$. Clearly $W_\mu$ is closed, since if $f_n \in W_\mu$ for all $n \geq 1$ and $f_n \to f$ uniformly, then
\[ \{x \in X \mid f(x) \neq 0\} \subseteq \bigcup_{n \geq 1} \{x \in X \mid f_n(x) \neq 0\} \]
is a $\mu$-null set. Therefore
\[ L^\infty_\mu = \mathscr{L}^\infty / W_\mu \]
is a Banach space with respect to the quotient norm $\|\cdot\|_{\mathscr{L}^\infty/W_\mu}$. It remains to show that
\[ \|f\|_{\mathscr{L}^\infty/W_\mu} = \inf_{g \in W_\mu} \|f + g\|_\infty \]
coincides, as claimed, with the essential supremum norm
\[ \|f\|_{\mathrm{esssup}} = \inf\{\alpha > 0 \mid \mu(\{x \in X \mid |f(x)| > \alpha\}) = 0\} \]
as given in Example 2.24(8). For this, assume first that $\alpha > \|f\|_{\mathrm{esssup}}$ so that $N_\alpha = \{x \in X \mid |f(x)| > \alpha\}$ is a $\mu$-null set, and hence $g_\alpha = -f \mathbb{1}_{N_\alpha} \in W_\mu$. It follows that
\[ \|f\|_{\mathscr{L}^\infty/W_\mu} \leq \|f + g_\alpha\|_\infty \leq \alpha. \]
Since this holds for any $\alpha > \|f\|_{\mathrm{esssup}}$ it follows that
\[ \|f\|_{\mathscr{L}^\infty/W_\mu} \leq \|f\|_{\mathrm{esssup}}. \]
If, on the other hand, $\alpha > \|f\|_{\mathscr{L}^\infty/W_\mu}$, then there exists some $g \in W_\mu$ with
\[ \|f + g\|_\infty < \alpha, \]
and so
\[ \{x \in X \mid |f(x)| > \alpha\} \subseteq \{x \in X \mid g(x) \neq 0\} \]
is a null set. Varying $\alpha$ once more, we see that $\|f\|_{\mathrm{esssup}} \leq \|f\|_{\mathscr{L}^\infty/W_\mu}$.
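The identification of the quotient norm with the essential supremum can be checked by hand on a finite set carrying point masses. The sketch below assumes a four-point space with two null points; the space, the weights, the function values and all names are illustrative and not from the text. On a finite set the infimum is attained, so the code uses $\alpha = \|f\|_{\mathrm{esssup}}$ itself when forming $g_\alpha = -f \mathbb{1}_{N_\alpha}$.

```python
# Sketch: essential supremum on a finite set X with point masses mu({x}).
# The space, the weights and the function values are illustrative assumptions.

mu = {"a": 0.5, "b": 0.0, "c": 1.5, "d": 0.0}   # b and d are mu-null points
f = {"a": -2.0, "b": 100.0, "c": 1.0, "d": -7.0}

def sup_norm(h):
    return max(abs(val) for val in h.values())

def esssup_norm(h):
    """inf{alpha > 0 : mu({|h| > alpha}) = 0}; on a finite set this is the
    largest |h(x)| over the points of positive mass."""
    return max((abs(h[x]) for x in h if mu[x] > 0), default=0.0)

alpha = esssup_norm(f)
# g_alpha = -f * 1_{N_alpha}, where N_alpha = {|f| > alpha} is a null set.
g_alpha = {x: (-f[x] if abs(f[x]) > alpha else 0.0) for x in f}
f_plus_g = {x: f[x] + g_alpha[x] for x in f}
```

Here `sup_norm(f)` is large because of the null points, while `sup_norm(f_plus_g)` matches the essential supremum, mirroring the first inequality in the argument above.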
Exercise 2.30. Show that in the definition of $\|\cdot\|_{\mathrm{esssup}}$ and of $\|\cdot\|_{\mathscr{L}^\infty/W_\mu}$ (from the proof that Example 2.24(8) is a Banach space on p. 35) the infima are actually minima and hence that for $f \in L^\infty_\mu(X)$ we have $|f(x)| \leq \|f\|_{\mathrm{esssup}}$ $\mu$-almost everywhere.
Finally, let us recall two facts from real analysis:

• If a series of real numbers is absolutely convergent, then the sum is independent of any rearrangement of the series. The latter property is also called unconditional convergence.
• If a convergent series of real numbers is not absolutely convergent, then
the series may be rearranged to obtain any value in R∪{±∞} for the sum
of the rearranged series. We say that a series is conditionally convergent
if its convergence properties depend on the order of its elements.
In finite-dimensional spaces unconditional convergence is equivalent to abso-
lute convergence, and in infinite dimensions an absolutely convergent series
is unconditionally convergent (see the exercise below). However, any infinite-
dimensional Banach space contains an unconditionally convergent series that
is not absolutely convergent(4) (see Corollary 3.42 for a particular case).
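The second fact recalled above can be seen numerically. The following is a minimal sketch, assuming the alternating harmonic series $1 - \frac12 + \frac13 - \cdots$ as the conditionally convergent example and a greedy rearrangement rule (take unused positive terms while the running sum is at most the target, otherwise unused negative terms). The target value 2 and the function name are illustrative choices.

```python
import math

def rearranged_partial_sum(target, terms):
    """Greedily rearrange 1 - 1/2 + 1/3 - ... : take unused positive terms
    while the running sum is at most `target`, otherwise negative terms."""
    next_odd, next_even = 1, 2      # denominators of the next unused terms
    total = 0.0
    for _ in range(terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total

# Partial sum in the original order: close to log(2).
original_order = sum((-1) ** (k + 1) / k for k in range(1, 100001))

# Partial sum of the greedy rearrangement: close to the chosen target.
rearranged = rearranged_partial_sum(target=2.0, terms=100000)
```

Every term of the series is eventually used exactly once, because both the positive and the negative parts diverge, so this is a genuine rearrangement and not a subseries.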
Exercise 2.31. Show that in a normed vector space any absolutely convergent series is
unconditionally convergent.
2.2.2 The Completion of a Normed Vector Space
Even though we have seen several examples of Banach spaces above, there
are many natural normed vector spaces that are not Banach spaces. For
example, R[x] is not a Banach space with respect to any of the five norms
discussed in Example 2.2(7) (see also Exercise 2.62). As a result it is useful
to know that any normed vector space has a completion (whose uniqueness
properties we will discuss in Corollary 2.60).
Theorem 2.32 (Existence of a Completion). Let $(V, \|\cdot\|)$ be a normed vector space. Then there exists a Banach space $(B, \|\cdot\|)$ which contains $V$ as a dense subspace, and the indicated norm on $B$ restricts to the original norm on the image of $V$ in $B$.
Proof†. Let $W = \{(v_n) \in V^{\mathbb{N}} \mid (v_n) \text{ is a Cauchy sequence}\}$. It is straightforward to check that $W$ is a vector space. We also define the semi-norm
\[ \|(v_n)\|' = \lim_{n \to \infty} \|v_n\|, \]
which is well-defined as $(\|v_n\|)$ is a Cauchy sequence in $\mathbb{R}$, since $(v_n)$ is a Cauchy sequence in $V$ (due to (2.1)). The kernel of this semi-norm is the space
\[ W_0 = \{(v_n) \mid v_n \to 0 \text{ as } n \to \infty\} \]
of null sequences (that is, sequences converging to 0) in $V$. We define
\[ B = W/W_0 \]
and
\[ \|b\|_B = \lim_{n \to \infty} \|v_n\| \]
where $b = (v_n) + W_0$.
† The proof in this section can be skipped, as many natural normed vector spaces are already Banach spaces, and we will be able to give another shorter construction in Chapter 7 on p. 214.
It follows from our discussion concerning quotient norms (see Lemma 2.15) that $(B, \|\cdot\|_B)$ is a normed vector space. Moreover, $B$ contains an isometric copy of $V$ (that is, there is an isometry $V \to B$), since an element $v \in V$ can be identified with the equivalence class of the constant sequence
\[ \phi(v) = (v, v, \dots) + W_0, \]
with the norm of this coset being
\[ \|\phi(v)\|_B = \lim_{n \to \infty} \|v\| = \|v\| \]
by definition.
We claim that (the image of) $V$ is dense in $B$. Given an equivalence class $b = (v_1, v_2, \dots) + W_0 \in B$ of a Cauchy sequence $(v_n)$, for every $\varepsilon > 0$ there exists some $N$ with $\|v_m - v_n\| < \varepsilon$ for $m, n \geq N$. Then
\[ \|(v_1, v_2, \dots) + W_0 - \phi(v_N)\|_B = \lim_{n \to \infty} \|v_n - v_N\| \leq \varepsilon. \]
Using this for any $\varepsilon > 0$ shows that the image of $V$ is dense in $B$.
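It remains to show that $B$ is complete with respect to $\|\cdot\|_B$. For this, assume that $(b_n)_{n \geq 1}$ is a Cauchy sequence in $B$. Since the image of $V$ is dense in $B$ we can find a sequence $(v_n)$ of vectors in $V$ with
\[ \|b_n - \phi(v_n)\|_B < \frac{1}{n} \]
for each $n \in \mathbb{N}$. Then for every $\varepsilon > 0$ there exists some $N(\varepsilon)$ with
\[ \|b_m - b_n\|_B < \varepsilon \]
and $\frac{1}{m}, \frac{1}{n} < \varepsilon$ for $m, n \geq N(\varepsilon)$, so that
\[ \|v_m - v_n\| \leq \underbrace{\|\phi(v_m) - b_m\|_B}_{< \frac{1}{m} < \varepsilon} + \underbrace{\|b_m - b_n\|_B}_{< \varepsilon} + \underbrace{\|b_n - \phi(v_n)\|_B}_{< \frac{1}{n} < \varepsilon} < 3\varepsilon \]
for $m, n \geq N(\varepsilon)$. Therefore, $(v_n)$ is a Cauchy sequence in $V$. We define
\[ b = (v_1, v_2, \dots) + W_0 \in B \tag{2.15} \]
and see that
\[ \|b - b_m\|_B \leq \|b - \phi(v_m)\|_B + \|\phi(v_m) - b_m\|_B < \underbrace{\lim_{n \to \infty} \|v_n - v_m\|}_{< 3\varepsilon} + \frac{1}{m} < 4\varepsilon \]
for $m \geq N(\varepsilon)$. Thus $b \in B$ defined by (2.15) is the limit of $(b_n)$ and so $B$ is a Banach space.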
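The idea that an element of the completion is a Cauchy sequence modulo null sequences can be illustrated in the metric space $(\mathbb{Q}, |\cdot|)$ (compare Exercise 2.34), chosen here only because exact rational arithmetic is available. The sketch below builds two different rational Cauchy sequences approaching $\sqrt{2}$; the two sequences and their names are illustrative assumptions. Their difference is a null sequence, so both represent the same class in $W/W_0$.

```python
from fractions import Fraction
from math import isqrt

def newton(n):
    """n-th Newton iterate for x^2 = 2 starting from 1 (a rational number)."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def decimal_truncation(n):
    """sqrt(2) truncated to n decimal places, as an exact fraction."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# Two different Cauchy sequences in Q; neither has a limit in Q.
# Their difference tends to 0, so they define the same element of W/W_0.
differences = [abs(newton(n) - decimal_truncation(n)) for n in range(1, 9)]

# The value lim |v_n| assigned to this class is approximately sqrt(2).
class_norm_estimate = float(newton(6))
```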
Exercise 2.33. Let $C_c(\mathbb{R})$ be the vector space of continuous functions $f : \mathbb{R} \to \mathbb{R}$ with
\[ \operatorname{Supp}(f) = \overline{\{x \in \mathbb{R} \mid f(x) \neq 0\}} \]
compact, with the norm $\|\cdot\|_\infty$. Show that this space is not complete, and find a Banach space containing $C_c(\mathbb{R})$ as a dense subspace so that the induced norm obtained by restriction is $\|\cdot\|_\infty$. Can you do the same for the norm $\|f\|_\Psi = \|f\Psi\|_\infty$, where $\Psi : \mathbb{R} \to \mathbb{R}_{>0}$ is a fixed continuous function (for example, $\Psi(x) = e^{x^2}$)?
Exercise 2.34. Generalize Theorem 2.32 to metric spaces as follows. If (X, d) is a metric
space, then a completion of (X, d) is a pair consisting of a complete metric space (X ∗ , d∗ )
and an isometry φ : X → X ∗ with the property that φ(X) is dense in X ∗ . Prove that any
metric space has a completion.
2.2.3 Non-Compactness of the Unit Ball
Many properties of finite-dimensional vector spaces are consequences of the

Heine–Borel theorem, which implies that the closed unit ball in a finite-
dimensional vector space is compact. Correspondingly, many interesting prob-
lems in infinite-dimensional Banach spaces are related to the opposite phe-
nomenon.
Proposition 2.35 (Non-compactness of the unit ball). The closed unit ball $\overline{B_1^V}$ in an infinite-dimensional normed space $V$ is not compact.
Proof†. It is enough to construct a sequence $(v_n)$ in $V$ with
\[ \|v_n\| \leq 1 \tag{2.16} \]
for all $n \geq 1$ and with
\[ \|v_m - v_n\| \geq \frac{1}{2} \tag{2.17} \]
for all $m \neq n$ (for then such a sequence has no Cauchy subsequence, and therefore no convergent subsequence).
Choose $v_1 \in V$ with norm 1 (this is always possible by homogeneity). Suppose that we have already found $v_1, \dots, v_k \in V$ with (2.16) and (2.17) for $1 \leq n \leq k$ and $1 \leq m \neq n \leq k$. The subspace $W = \langle v_1, \dots, v_k \rangle$ spanned by the vectors $v_1, \dots, v_k$ is finite-dimensional, and therefore complete with respect to the induced norm (see Proposition 2.6). Thus $W$ is a closed subspace, and so we may consider the quotient norm $\|\cdot\|_{V/W}$ on the normed vector space $V/W$ as in Lemma 2.15. Since $V$ is not finite-dimensional, the quotient space $V/W$ is non-trivial, and so we may choose some $v \in V$ with $v + W \neq 0$. Thus
\[ d = \|v + W\|_{V/W} > 0. \]
It follows that there exists some $w \in W$ with $\|v + w\| \leq 2d$. Define
\[ v_{k+1} = \frac{1}{\|v + w\|}(v + w), \]
so that $\|v_{k+1}\| = 1$. Also, for $1 \leq n \leq k$, we have $v_n \in W$ and so
\[ \|v_{k+1} - v_n\| \geq \|v_{k+1} + W\|_{V/W} = \frac{1}{\|v + w\|} \|v + W\|_{V/W} \geq \frac{d}{2d} = \frac{1}{2} \]
as required. Thus by induction we obtain a sequence $(v_n)$ with the claimed properties, and hence the proposition.
† The proof in this subsection could be skipped: the result is negative and will only be used in the more concrete setting of Hilbert spaces, where the proof is a simple exercise.
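In the Hilbert space setting mentioned in the footnote, a sequence satisfying (2.16) and (2.17) can be written down directly. The following is a minimal sketch, assuming the standard unit vectors of $\ell^2$ truncated to finitely many coordinates (the truncation dimension and the function names are illustrative assumptions); any two distinct unit vectors are at distance $\sqrt{2} \geq \frac12$.

```python
import math

def unit_vector(n, dim):
    """n-th standard basis vector (0-based), truncated to `dim` coordinates."""
    return [1.0 if i == n else 0.0 for i in range(dim)]

def norm2(x):
    return math.sqrt(sum(c * c for c in x))

dim = 6
vectors = [unit_vector(n, dim) for n in range(dim)]

# (2.16): every vector has norm 1.
norms = [norm2(v) for v in vectors]

# (2.17): all pairwise distances equal sqrt(2), in particular at least 1/2.
distances = [
    norm2([a - b for a, b in zip(vectors[m], vectors[n])])
    for m in range(dim) for n in range(dim) if m != n
]
```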
Given the negative statement in Proposition 2.35, a natural question to
ask is how to characterize compact subsets of a Banach space. This depends
on the space concerned (see Exercise 2.36). A vague principle is that one
tries to extract topological and geometrical properties of finite subsets of
the Banach space, and then compact subsets are sometimes characterized by
suitable uniform versions of those properties. We will also illustrate this in
the next section, where we will prove the Arzela–Ascoli theorem.
Exercise 2.36. Characterize the compact subsets of the following Banach spaces.
(a) The space $c_0$ of null sequences (that is, sequences $(x_n)$ of scalars with $|x_n| \to 0$ as $n \to \infty$) with the norm $\|(x_n)\|_\infty = \sup_{n \geq 1} |x_n| = \max_{n \geq 1} |x_n|$.
(b) The space $\ell^p$ of $p$-summable sequences of scalars with $p \in [1, \infty)$. That is,
\[ \ell^p = \Bigl\{ (x_n) \Bigm| \sum_{n=1}^\infty |x_n|^p < \infty \Bigr\} \]
with the $p$-norm $\|(x_n)\|_p = \bigl( \sum_{n=1}^\infty |x_n|^p \bigr)^{1/p}$.
2.3 The Space of Continuous Functions
To illustrate the failure of compactness of the closed unit ball in a Banach

space, we now discuss the Banach space of continuous functions C(X) on a
compact metric space (X, d). A subset K ⊆ C(X) is said to be equicontinuous
if for every ε > 0 there is a δ > 0 such that
d(x, y) < δ =⇒ |f (x) − f (y)| < ε

for all x, y ∈ X and f ∈ K. The key uniformity here is that a single δ may
be used for all the functions f ∈ K.
2.3.1 The Arzela–Ascoli Theorem
Essential Exercise 2.37. Recall that a function f : X → R on a metric

space (X, d) is uniformly continuous if for any ε > 0 there is some δ > 0 for
which
d(x, y) < δ =⇒ |f (x) − f (y)| < ε
for all x, y ∈ X. Show that any continuous function on a compact metric
space is uniformly continuous and hence that any finite set of continuous
functions is equicontinuous.
Theorem 2.38 (Arzela–Ascoli). Let (X, d) be a compact metric space, and

let C(X) be the Banach space of continuous (real- or complex-valued) func-
tions on X with the supremum norm. A subset K ⊆ C(X) is compact if and
only if K is closed, bounded, and equicontinuous.
Proof. Suppose that $K \subseteq C(X)$ is compact, so it is closed and bounded. We will now show that it is also equicontinuous. Fix $\varepsilon > 0$. Then we may find finitely many functions $f_1, \dots, f_n \in K$ such that
\[ K \subseteq \bigcup_{i=1}^n B_\varepsilon(f_i) \tag{2.18} \]
by compactness, since $\{B_\varepsilon(f) \mid f \in K\}$ is an open cover of $K$. Each $f_i$ is continuous and, since $X$ is compact, each $f_i$ is also uniformly continuous by Exercise 2.37. Since the family $\{f_i\}$ is finite, we can conclude that there is a $\delta > 0$ with
\[ d(x, y) < \delta \implies |f_i(x) - f_i(y)| < \varepsilon \tag{2.19} \]
for $i = 1, \dots, n$. We now combine (2.18) and (2.19) for the given $\varepsilon > 0$. Fix some $f \in K$. By (2.18), there exists some $i$ with $\|f - f_i\|_\infty < \varepsilon$. If $x, y \in X$ and $d(x, y) < \delta$, then
\[ |f(x) - f(y)| \leq \underbrace{|f(x) - f_i(x)|}_{< \varepsilon} + \underbrace{|f_i(x) - f_i(y)|}_{< \varepsilon \text{ by (2.19)}} + \underbrace{|f_i(y) - f(y)|}_{< \varepsilon} < 3\varepsilon, \]
showing equicontinuity.
Proving compactness: Now suppose that $K \subseteq C(X)$ is closed, bounded, and equicontinuous. To show that $K$ is compact, let $(f_n)$ be an arbitrary sequence in $K$. It will be enough to exhibit a Cauchy subsequence of $(f_n)$, since by Example 2.24(3) such a subsequence will converge in $C(X)$, and by our assumption that $K$ is closed the limit will be in $K$.
First notice that $X$ contains a dense countable subset $D$ since $X$ is a compact metric space (a topological space is separable if it contains a dense countable subset). The argument given here shows that a compact metric space is separable. In fact, $X \subseteq \bigcup_{x \in X} B^X_{1/m}(x)$, which implies that
\[ X \subseteq \bigcup_{y \in D_m} B^X_{1/m}(y) \tag{2.20} \]
for some finite subset $D_m \subseteq X$, so that the set $D = \bigcup_{m \geq 1} D_m$ is countable and dense.
Next notice that by our assumption on $K$ there is some $M$ with
\[ \|f_n\|_\infty \leq M \]
for every $n \geq 1$. Let us write $I_M = [-M, M] \subseteq \mathbb{R}$ or $I_M = \overline{B_M^{\mathbb{C}}}$ depending on whether the field of scalars is $\mathbb{R}$ or $\mathbb{C}$. Then by Tychonoff's theorem (Theorem A.20 and Lemma A.17) $I_M^D$ is a compact metric space with respect to the product topology. Define $\phi_n \in I_M^D$ by $\phi_n = f_n|_D$. By compactness of $I_M^D$ there exists a subsequence $(\phi_{n_k})$ which converges in $I_M^D$. This convergence is precisely the statement that
\[ f_{n_k}(y) \to \phi(y) \]
as $k \to \infty$ for all $y \in D$ and some function $\phi : D \to I_M$. Note, however, that at this point no uniformity of the convergence is known.
We now upgrade the argument above to give the desired statement that $(f_{n_k})$ is a Cauchy sequence. Fix $\varepsilon > 0$. Then there exists some $\delta > 0$ with
\[ d(x, y) < \delta \implies |f_{n_k}(x) - f_{n_k}(y)| < \varepsilon \tag{2.21} \]
for all $k \geq 1$ (this is possible by equicontinuity of $K$). Now choose some $m \in \mathbb{N}$ with $\frac{1}{m} \leq \delta$. Since
\[ f_{n_k}(y) \longrightarrow \phi(y) \]
as $k \to \infty$ for $y \in D_m$, each of the sequences $(f_{n_k}(y))_k$ in $I_M$ (with $k$ varying) is a Cauchy sequence. Since $m$ is fixed, there are only finitely many sequences concerned, so there exists some $N(\varepsilon)$ such that $k, \ell \geq N(\varepsilon)$ implies that
\[ |f_{n_k}(y) - f_{n_\ell}(y)| < \varepsilon \tag{2.22} \]
for all $y \in D_m$. Now we combine (2.21) and (2.22) as follows. Given $x \in X$, by (2.20) there is some $y \in D_m$ with $d(x, y) < \frac{1}{m} \leq \delta$. For $k, \ell \geq N(\varepsilon)$ this implies
\[ |f_{n_k}(x) - f_{n_\ell}(x)| \leq \underbrace{|f_{n_k}(x) - f_{n_k}(y)|}_{< \varepsilon \text{ by (2.21)}} + \underbrace{|f_{n_k}(y) - f_{n_\ell}(y)|}_{< \varepsilon \text{ by (2.22)}} + \underbrace{|f_{n_\ell}(y) - f_{n_\ell}(x)|}_{< \varepsilon \text{ by (2.21)}} < 3\varepsilon. \]
Thus $\|f_{n_k} - f_{n_\ell}\|_\infty < 3\varepsilon$ for all $k, \ell \geq N(\varepsilon)$, showing that the subsequence is Cauchy as required.
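To see that equicontinuity cannot be dropped from the theorem, a standard example is the bounded sequence $f_n(x) = x^n$ in $C([0,1])$. The sketch below assumes this example (it is not taken from the text, and the function names are hypothetical) and evaluates two quantities: a lower bound for $\|f_n - f_{2n}\|_\infty$, which rules out a uniformly Cauchy subsequence, and the oscillation of $f_n$ over an interval of length $1/n$ near $1$, which shows that no single $\delta$ works for all $n$.

```python
def f(n, x):
    return x ** n

def witness_distance(n, m):
    """|f_n(x) - f_m(x)| at the point x with x**n = 1/2; a lower bound
    for the sup-norm distance between f_n and f_m on [0, 1]."""
    x = 0.5 ** (1.0 / n)
    return abs(f(n, x) - f(m, x))

# f_n and f_{2n} stay about 1/4 apart in sup norm, however large n is.
gaps = [witness_distance(n, 2 * n) for n in (1, 10, 100, 1000)]

# Equicontinuity fails: points within 1/n of 1 are still moved by about 1 - 1/e.
oscillation = [f(n, 1.0) - f(n, 1.0 - 1.0 / n) for n in (10, 100, 1000)]
```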
Exercise 2.39. (a) Prove the Arzela–Ascoli theorem for any compact space (that is,
without assuming that the space is a metric space). To do this, define a subset K of C(X)
to be equicontinuous if for every ε > 0 and every x ∈ X there exists a neighbourhood U
of x with |f (y) − f (x)| < ε for all f ∈ K.
(b) Extend the Arzela–Ascoli theorem to the space $C_0(X)$ of continuous functions vanishing at infinity with the uniform norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$, where $X$ is a locally compact metric (or just locally compact) space.
2.3.2 The Stone–Weierstrass Theorem
Let X be a compact topological space. We now prove a useful criterion for

a subset of functions to be dense in C(X). However, for this we will need to
distinguish between the space CR (X) of real-valued, and the space CC (X) of
complex-valued, continuous functions on X.
Theorem 2.40 (Stone–Weierstrass). Let $(X, \mathcal{T})$ be a compact topological space.
(a) Suppose that $A \subseteq C_{\mathbb{R}}(X)$ is a collection of functions that satisfy the following properties:
• (Algebra) $A$ is a sub-algebra, meaning that $A$ is a linear subspace of $C_{\mathbb{R}}(X)$ and, for any $f, g \in A$, the pointwise product $fg$ also belongs to $A$;
• (Constants) the constant function $1$ lies in $A$;
• (Separation) the algebra $A$ separates points: for any $x, y \in X$ with $x$ not equal to $y$, there is some function $f \in A$ with $f(x) \neq f(y)$.
Then $A$ is dense in $C_{\mathbb{R}}(X)$ with respect to $\|\cdot\|_\infty$.
(b) Suppose that $A \subseteq C_{\mathbb{C}}(X)$ satisfies all of the properties in (a) and, in addition, has
• (Complex conjugation) $A$ is closed under conjugation, meaning that if $f$ is in $A$ then $\overline{f}$ is in $A$.
Then $A$ is dense in $C_{\mathbb{C}}(X)$ with respect to $\|\cdot\|_\infty$.
Example 2.41. The algebra A = R[t] (or C[t]) of polynomials on a compact
interval X ⊆ R satisfies Theorem 2.40. This recovers the classical Weierstrass
approximation theorem.(5)
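For a concrete feel for polynomial approximation on $[0,1]$, the following sketch uses Bernstein polynomials. This is a different, constructive route to the Weierstrass approximation theorem and not the argument given in this section; the target function $|x - \frac12|$, the degrees and the function names are illustrative assumptions.

```python
from math import comb

def bernstein(f, n, x):
    """n-th Bernstein polynomial of f evaluated at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def sup_error(f, n, samples=201):
    """Maximum of |B_n f - f| over an equally spaced grid in [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

def target(x):
    return abs(x - 0.5)          # continuous, not differentiable at 1/2

errors = [sup_error(target, n) for n in (10, 40, 160)]
```

The grid maximum only approximates the sup norm, but it already shows the errors shrinking as the degree grows.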
Uniform approximation in the complex setting is more complicated.(6) Let us note that closure under complex conjugation is necessary for the conclusion of the theorem to hold due to the following obstacle. If $D \subseteq \mathbb{C}$ is a compact set with non-empty interior (a closed disk, for example), then the limit of any sequence of holomorphic functions uniformly converging on $D$ is also holomorphic in the interior $D^o$, so the continuous function $z \mapsto \overline{z}$ cannot be in the closure of the point-separating algebra $\mathbb{C}[z]$ in $C_{\mathbb{C}}(D)$.
We will start the proof of Theorem 2.40 with the following lemma, and will write $\overline{A}$ for the closure of $A$ with respect to $\|\cdot\|_\infty$ (notice that we also use $\overline{\,\cdot\,}$ to denote complex conjugation, but it should always be clear from the context whether closure or conjugation is meant).
Lemma 2.42. Let $A$ be a sub-algebra of $C_{\mathbb{R}}(X)$ containing the constant functions. Then $\overline{A}$ is also a sub-algebra, and
\[ |f|,\ \max\{f, g\},\ \min\{f, g\} \in \overline{A} \]
for any $f, g \in \overline{A}$.
Proof. It is easy to check that the algebra operations are continuous with respect to $\|\cdot\|_\infty$. (See (2.2)–(2.3) for the vector space operations and generalize the argument to include the product operation for functions; see also Section 2.4.2 for a more general discussion containing this case.) Therefore $\overline{A}$ is also an algebra. Recall that
\[ \sqrt{1 + u} = (1 + u)^{1/2} = \sum_{n=0}^\infty \binom{1/2}{n} u^n \]
is a power series with radius of convergence 1 (this can be shown using real analysis by using the standard estimates for the Taylor approximation of $\sqrt{1 + u}$. Using complex analysis this follows from the fact that $\sqrt{1 + u}$ is holomorphic for $|u| < 1$.) In particular, we have
\[ \sum_{n=0}^\infty \left| \binom{1/2}{n} \right| (1 - \varepsilon)^n < \infty. \]
Studying the coefficients more closely gives $\sum_{n=0}^\infty \left| \binom{1/2}{n} \right| < \infty$, and the reader who knows this may set $\varepsilon = 0$ and simplify the argument accordingly, but we will not use this. Suppose that $f \in \overline{A}$, $M = \|f\|_\infty$, and $\varepsilon > 0$. Then the function
\[ g_\varepsilon = \frac{1}{M^2 + \varepsilon}(f^2 + \varepsilon) \]
is in $\overline{A}$ and takes on values in $[\varepsilon/(M^2 + \varepsilon), 1]$, and so
\[ \sum_{n=0}^\infty \left| \binom{1/2}{n} \right| \|g_\varepsilon - 1\|_\infty^n < \infty, \]
which implies that
\[ \sqrt{g_\varepsilon} = (1 + (g_\varepsilon - 1))^{1/2} = \sum_{n=0}^\infty \binom{1/2}{n} (g_\varepsilon - 1)^n \]
converges with respect to $\|\cdot\|_\infty$ by Example 2.24(3) and Lemma 2.28. We deduce that $\sqrt{f^2 + \varepsilon} \in \overline{A}$. Now
\[ 0 \leq \sqrt{f^2 + \varepsilon} - |f| = \sqrt{f^2 + \varepsilon} - \sqrt{f^2} = \frac{f^2 + \varepsilon - f^2}{\sqrt{f^2 + \varepsilon} + \sqrt{f^2}} \leq \frac{\varepsilon}{\sqrt{\varepsilon}}. \]
In particular, $\bigl\| |f| - \sqrt{f^2 + \varepsilon} \bigr\|_\infty \leq \sqrt{\varepsilon}$, and so the fact that $\sqrt{f^2 + \varepsilon} \in \overline{A}$ for all $\varepsilon > 0$ implies that $|f| \in \overline{A}$. The identities
\[ \max\{f, g\} = \tfrac{1}{2}(f + g) + \tfrac{1}{2}|f - g| \]
and
\[ \min\{f, g\} = \tfrac{1}{2}(f + g) - \tfrac{1}{2}|f - g| \]
give the other parts of the lemma.
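The mechanism of this proof can be tried numerically in the simplest case $f(x) = x$ on $[-1, 1]$, where it produces polynomials approximating $|x|$. The following is a minimal sketch under stated assumptions: $M = 1$, $\varepsilon = 0.01$ and a truncation of the binomial series after 400 terms are illustrative choices, and the function names are hypothetical. The resulting polynomial stays within roughly $\sqrt{\varepsilon}$ of $|x|$, in line with the estimate above.

```python
def binomial_half_coefficients(count):
    """Coefficients binom(1/2, n) for n = 0, ..., count - 1."""
    coeffs = [1.0]
    for n in range(count - 1):
        coeffs.append(coeffs[-1] * (0.5 - n) / (n + 1))
    return coeffs

COEFFS = binomial_half_coefficients(400)

def approx_abs(x, eps=0.01, M=1.0):
    """Truncated series for sqrt(x^2 + eps); a polynomial in x on [-M, M]."""
    scale = M * M + eps
    u = (x * x + eps) / scale - 1.0     # g_eps(x) - 1, which lies in (-1, 0]
    total, power = 0.0, 1.0
    for c in COEFFS:
        total += c * power
        power *= u
    return (scale ** 0.5) * total       # sqrt(scale) * sqrt(g_eps(x))

grid = [i / 100 - 1 for i in range(201)]
max_error = max(abs(approx_abs(x) - abs(x)) for x in grid)
```

Taking a smaller `eps` improves the limit of the approximation but slows the convergence of the series, since $\|g_\varepsilon - 1\|_\infty$ moves closer to 1.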
Proof of Theorem 2.40. We start with the case of an algebra $A \subseteq C_{\mathbb{R}}(X)$. Notice that by Lemma 2.42 the algebra $\overline{A}$ is closed under taking finitely many maxima or minima: if $f_1, \dots, f_n \in \overline{A}$ then
\[ \max\{f_1, \dots, f_n\},\ \min\{f_1, \dots, f_n\} \in \overline{A}. \]
We will use this property for a given $f \in C_{\mathbb{R}}(X)$ and $\varepsilon > 0$ to find a function $f_\varepsilon \in \overline{A}$ with $\|f - f_\varepsilon\|_\infty < \varepsilon$. This then implies that $\overline{A} = C_{\mathbb{R}}(X)$. The construction has three steps.
First step: correct value at two points. Let $x_0, x \in X$ be (not necessarily distinct) points. Then there exists some $h_{x_0, x} \in A$ with
\[ h_{x_0, x}(x_0) = f(x_0), \qquad h_{x_0, x}(x) = f(x). \tag{2.23} \]
Indeed, if $x_0 = x$ then we simply take $h_{x_0, x} = f(x_0) 1 \in A$. If $x \in X \setminus \{x_0\}$ we know that $A$ contains a function $\tilde{h} \in A$ with $\tilde{h}(x) \neq \tilde{h}(x_0)$ since the algebra separates points. In this case, we may find a linear combination $h_{x_0, x}$ of $\tilde{h} \in A$ and the constant function $1 \in A$ with the desired property.
Second step: correct value at one point, nowhere much smaller. Let $x_0 \in X$. As our next step we claim that there exists a function $g_{x_0} \in \overline{A}$ with
\[ g_{x_0}(x_0) = f(x_0), \qquad g_{x_0}(y) > f(y) - \varepsilon \tag{2.24} \]
for all $y \in X$. That is, $g_{x_0}$ is chosen to have the correct value at $x_0$ for the objective of approximating $f$, and to be not much smaller than $f$ at every other point, as illustrated in Figure 2.1.
Fig. 2.1: The function $g_{x_0}$ is constructed by finding $x_1, \dots, x_n$ (in this case, $x_0$ and $x$) with the property that $g_{x_0} = \max\{h_{x_0, x_1}, \dots, h_{x_0, x_n}\} > f - \varepsilon$.
We will construct $g_{x_0}$ as a maximum after finding a finite subcover for the following open cover of $X$. For any $x \in X$ (including $x_0$) there exists an open neighbourhood $O_x$ of $x$ with
\[ y \in O_x \implies h_{x_0, x}(y) > f(y) - \varepsilon, \tag{2.25} \]
where $h_{x_0, x} \in A$ is as in (2.23). This defines an open cover $\{O_x \mid x \in X\}$ of $X$. By compactness there exists some finite subcover
\[ X = O_{x_1} \cup \cdots \cup O_{x_n}. \tag{2.26} \]
We define
\[ g_{x_0} = \max\{h_{x_0, x_1}, \dots, h_{x_0, x_n}\} \in \overline{A}, \]
and notice that $g_{x_0}$ satisfies
\[ g_{x_0}(x_0) = \max\{f(x_0), \dots, f(x_0)\} = f(x_0) \]
by (2.23), and by (2.26) for every $y \in X$ there is some $i \in \{1, \dots, n\}$ for which $y \in O_{x_i}$, and hence
\[ g_{x_0}(y) \geq h_{x_0, x_i}(y) > f(y) - \varepsilon \]
by (2.25).
Fig. 2.2: The function $f_\varepsilon = \min\{g_{x_1}, \dots, g_{x_m}\}$ is constructed with $\|f - f_\varepsilon\|_\infty < \varepsilon$.
Third step: nowhere much smaller, nowhere much bigger. The claim (2.24) above takes care of one half of the need to find an approximation to $f$ within $\overline{A}$. For every $x \in X$ we found some $g_x \in \overline{A}$ that is nowhere much smaller than $f$, and is equal to $f$ at $x$. We now vary the point $x$, and essentially repeat the argument to find an $\varepsilon$-approximation to $f$ within $\overline{A}$. Indeed, for every $x \in X$ there is an open neighbourhood $U_x$ for which
\[ y \in U_x \implies g_x(y) < f(y) + \varepsilon. \tag{2.27} \]
By allowing $x \in X$ to vary this gives an open cover $\{U_x \mid x \in X\}$ of $X$, and once again by compactness there is a finite subcover
\[ X = U_{x_1} \cup \cdots \cup U_{x_m}. \tag{2.28} \]
We define
\[ f_\varepsilon = \min\{g_{x_1}, \dots, g_{x_m}\} \in \overline{A}, \]
and claim that $\|f - f_\varepsilon\|_\infty \leq \varepsilon$, as illustrated in Figure 2.2. For every $y \in X$ we have
\[ g_{x_i}(y) > f(y) - \varepsilon \]
by the property of $g_{x_i}$ in (2.24), and so $f_\varepsilon(y) > f(y) - \varepsilon$. By (2.28) every $y \in X$ lies in some $U_{x_i}$ and so (2.27) implies that
\[ f_\varepsilon(y) \leq g_{x_i}(y) < f(y) + \varepsilon. \]
Since $f_\varepsilon \in \overline{A}$ and $\varepsilon > 0$ was arbitrary, we deduce that $f \in \overline{A}$.

Complex case. In the case of a complex sub-algebra $A \subseteq C_{\mathbb{C}}(X)$ that is closed under conjugation we may consider
\[ A_{\mathbb{R}} = A \cap C_{\mathbb{R}}(X). \]
This is again a sub-algebra that separates points if $A$ separates points. Indeed, if $x, y \in X$ with $x \neq y$ then there is (by the assumption on $A$) some $f \in A$ with $f(x) \neq f(y)$. Let $u = \Re(f)$ and $v = \Im(f)$, so that
\[ u = \frac{f + \overline{f}}{2}, \quad v = \frac{f - \overline{f}}{2i} \in A_{\mathbb{R}} \]
by our assumption on $A$. Thus $A_{\mathbb{R}}$ also contains a function that separates $x$ and $y$. By the real case, $A_{\mathbb{R}}$ is dense in $C_{\mathbb{R}}(X)$, so by splitting an arbitrary function in $C_{\mathbb{C}}(X)$ into real and imaginary parts and approximating each of these with elements of $A_{\mathbb{R}} \subseteq A$ the theorem is proved.
Exercise 2.43. (a) Let $X$ be as in Theorem 2.40. Show that the second requirement (Constants) on $A \subseteq C(X)$ could also be replaced by the requirement
• (Nowhere vanishing) for every $x \in X$ there is a function $f \in A$ with $f(x) \neq 0$.
(b) Let $X$ be a locally compact space. Extend the Stone–Weierstrass theorem to $C_0(X)$ by considering a sub-algebra $A \subseteq C_0(X)$ that separates points, is closed under conjugation, and vanishes nowhere.
Exercise 2.44. Define, for every infinite compact subset $K \subseteq \mathbb{R}$,
\[ \|p\|_K = \|p|_K\|_{C(K)} = \sup_{x \in K} |p(x)|. \]
Show that $\|\cdot\|_K$ and $\|\cdot\|_L$ are inequivalent norms on $\mathbb{R}[x]$ if $K \neq L$ are two different infinite compact subsets.
Exercise 2.45. Let $X$ and $Y$ be two compact spaces. Prove that the linear hull of all functions of the form $(x, y) \in X \times Y \mapsto f(x)g(y)$ for $f \in C(X)$ and $g \in C(Y)$ is dense in $C(X \times Y)$.
The following is also an easy consequence of the Stone–Weierstrass the-

orem.
Lemma 2.46 (Separability). Let (X, d) be a compact metric space. Then

the Banach space C(X) is separable with respect to the topology induced by
the supremum norm.
Proof. The space $X$ is separable (this may be seen, for example, from the proof of Theorem 2.38) so we may choose a countable dense set $\{x_n \mid n \in \mathbb{N}\}$ in $X$. We now define $f_n(x) = d(x, x_n)$ for all $x \in X$ and $n \geq 1$, and claim that these functions separate points in $X$. That is, if $x \neq y$ then there exists some $n$ for which $f_n(x) \neq f_n(y)$. To see this, notice that by density there is some $n$ with $d(x, x_n) = f_n(x) < \frac{1}{2} d(x, y)$, which implies that
\[ f_n(y) = d(y, x_n) \geq d(y, x) - d(x, x_n) > \tfrac{1}{2} d(x, y). \]
Now let $A_{\mathbb{Q}} = \mathbb{Q}[f_0 = 1, f_1, f_2, \dots]$ be the $\mathbb{Q}$-algebra generated by the functions $f_1, f_2, \dots$ together with the constant function $f_0 = 1$. Clearly $A_{\mathbb{Q}}$ is countable, and the closure of $A_{\mathbb{Q}}$ contains the algebra $A = \mathbb{R}[f_0, f_1, f_2, \dots]$. Since $A$ is an algebra that separates points, it is dense in $C_{\mathbb{R}}(X)$ (and $A + iA$ is dense in $C_{\mathbb{C}}(X)$) by the Stone–Weierstrass theorem (Theorem 2.40).
2.3.3 Equidistribution of a Sequence†
As an application of the discussion above, and in particular of the Stone–Weierstrass theorem, we now describe the notion of equidistribution. A sequence $(x_n)_n$ of elements of a metric space $X$ is dense if for every $x \in X$ there is a subsequence $(x_{n_k})_k$ that converges to $x$. A much finer property is given by equidistribution, which roughly speaking corresponds to the sequence spending the right proportion of time in any given part of the space. In this section we will define and discuss this notion carefully for $X = [0, 1]$.
A sequence $(x_n)_{n \geq 1}$ of points in $[0, 1]$ is said to be equidistributed or uniformly distributed if any one of the following equivalent conditions is satisfied:
(1) $\frac{1}{K} \bigl| \{k \in \mathbb{N} \mid 1 \leq k \leq K,\ x_k \in [a, b]\} \bigr| \to b - a$ as $K \to \infty$ for $0 \leq a < b \leq 1$.
(2) $\frac{1}{K} \sum_{k=1}^K f(x_k) \longrightarrow \int_0^1 f(x)\, dx$ as $K \to \infty$ for any $f \in C([0, 1])$ (that is, any continuous function).
(3) $\frac{1}{K} \sum_{k=1}^K f(x_k) \longrightarrow \int_0^1 f(x)\, dx$ as $K \to \infty$ for any $f \in R([0, 1])$ (that is, any Riemann-integrable function).
(4) $\frac{1}{K} \sum_{k=1}^K \chi_n(x_k) \longrightarrow \int_0^1 \chi_n(x)\, dx = \begin{cases} 0 & \text{if } n \neq 0 \\ 1 & \text{if } n = 0 \end{cases}$ as $K \to \infty$ for any $n$ in $\mathbb{Z}$, where $\chi_n(x) = e^{2\pi i n x}$ for all $x \in [0, 1]$.
We will now sketch some of the implications between these equivalent statements (see Exercise 2.48) and will return to the topic of equidistribution in Chapter 8 from a more general point of view.
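Conditions (1) and (2) are easy to probe numerically for a finite initial segment of a sequence. The sketch below compares the irrational rotation $x_k = \{k\sqrt{2}\}$ (the sequence of Example 2.49 with $\alpha = \sqrt{2}$) with the sequence $x_k = 1/k$, which converges to $0$ and is therefore not equidistributed. The interval $[0.2, 0.5]$, the test function $x^2$, the length $K = 50000$ and all names are illustrative assumptions; a finite computation can suggest, but not prove, the limits.

```python
import math

def proportion_in_interval(xs, a, b):
    """(1/K) * #{k : a <= x_k <= b} for the finite list xs = (x_1, ..., x_K)."""
    return sum(1 for x in xs if a <= x <= b) / len(xs)

def average(f, xs):
    """(1/K) * sum of f(x_k), the left-hand side in condition (2)."""
    return sum(f(x) for x in xs) / len(xs)

alpha = math.sqrt(2.0)
rotation = [(k * alpha) % 1.0 for k in range(1, 50001)]   # x_k = {k * alpha}
reciprocals = [1.0 / k for k in range(1, 50001)]          # x_k = 1/k -> 0

a, b = 0.2, 0.5
rotation_share = proportion_in_interval(rotation, a, b)       # near b - a
reciprocal_share = proportion_in_interval(reciprocals, a, b)  # near 0
rotation_mean_square = average(lambda x: x * x, rotation)     # near 1/3
```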
Almost a proof of (4) $\Longrightarrow$ (2). Consider the algebra of trigonometric polynomials
\[ A = \Bigl\{ \sum_{n=-N}^{N} c_n \chi_n \Bigm| c_n \in \mathbb{C},\ N \in \mathbb{N} \Bigr\}. \]
† The results of this section will not be needed in this form later, so may be skipped.
Using the complex version of the Stone–Weierstrass theorem (Theorem 2.40), it follows that $A$ is dense in $C(\mathbb{T})$ with respect to the uniform metric (see Exercise 2.47 and Proposition 3.65), where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. This means that given $f \in C(\mathbb{T})$ and $\varepsilon > 0$, there is some $g \in A$ with
\[ \|f - g\|_\infty = \sup_{x \in \mathbb{T}} |f(x) - g(x)| < \varepsilon, \]
which implies that
\[ \Bigl| \int_0^1 f(x)\, dx - \int_0^1 g(x)\, dx \Bigr| < \varepsilon \]
and
\[ \Bigl| \frac{1}{K} \sum_{k=1}^K f(x_k) - \frac{1}{K} \sum_{k=1}^K g(x_k) \Bigr| < \varepsilon \]
for any $K \geq 1$. If $K$ is sufficiently large then, by assumption,
\[ \Bigl| \frac{1}{K} \sum_{k=1}^K g(x_k) - \int_0^1 g(x)\, dx \Bigr| < \varepsilon. \]
It follows that
\[ \Bigl| \frac{1}{K} \sum_{k=1}^K f(x_k) - \int_0^1 f(x)\, dx \Bigr| < 3\varepsilon, \]
which is not quite the claim in (2) since $C(\mathbb{T})$ and $C([0, 1])$ differ slightly. Indeed, any function $f : \mathbb{T} \to \mathbb{C}$ gives rise to a function $\tilde{f} : \mathbb{R} \to \mathbb{C}$ via the diagram
\[ \tilde{f} : \mathbb{R} \longrightarrow \mathbb{T} \xrightarrow{\ f\ } \mathbb{C}, \]
which we can restrict to $[0, 1]$, defining an element $g \in C([0, 1])$. If $f : \mathbb{T} \to \mathbb{C}$ is continuous then so is $g$, but $g$ will always satisfy $g(0) = g(1)$. On the other hand, if $g \in C([0, 1])$ is a function satisfying $g(0) = g(1)$ then one can define a continuous function $f : \mathbb{T} \to \mathbb{C}$ by $f(t + \mathbb{Z}) = g(t)$ for $t \in [0, 1]$, and obtain the result for such $g$.
The extension to general continuous functions on $[0, 1]$ can be handled by the same method as in the proof that (2) implies (1) below, where we will only assume (2) for all $f \in C(\mathbb{T})$.
Exercise 2.47. Show that A as in the previous proof does indeed satisfy all the assumptions of Theorem 2.40.
Proof of (2) $\Longrightarrow$ (1). Suppose first that $0 < a < b < 1$. Fix $\varepsilon > 0$ and choose continuous functions $f_-, f_+ : [0, 1] \to \mathbb{R}$ with
(a) $0 \leq f_-(x) \leq \mathbb{1}_{[a,b]}(x) \leq f_+(x) \leq 1$ for all $x \in [0, 1]$,
(b) $\int_0^1 (f_+ - f_-)\, dx < \varepsilon$, and
(c) $f_+(0) = f_+(1) = f_-(0) = f_-(1) = 0$.
For example, the functions $f_+$ and $f_-$ could be chosen to be piecewise linear, as illustrated in Figure 2.3. In this case the shaded region can easily be chosen to have total area bounded above by $\varepsilon$, as required in (b). By (c), the functions $f_-$ and $f_+$ also define continuous functions on $\mathbb{T}$.
Fig. 2.3: The function $\mathbb{1}_{[a,b]}$ and the approximations $f_-$ (drawn using dots) and $f_+$ (using dashes).
Since
\[ \frac{1}{K} \sum_{k=1}^K f_-(x_k) \leq \frac{1}{K} \sum_{k=1}^K \mathbb{1}_{[a,b]}(x_k) \leq \frac{1}{K} \sum_{k=1}^K f_+(x_k) \]
for all $K \geq 1$, and the left-hand side converges to $\int_0^1 f_-(x)\, dx$ while the right-hand side converges to $\int_0^1 f_+(x)\, dx$ as $K \to \infty$, we obtain
\[ (b - a) - \varepsilon \leq \liminf_{K \to \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{1}_{[a,b]}(x_k) \leq \limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^K \mathbb{1}_{[a,b]}(x_k) \leq (b - a) + \varepsilon, \]
which implies the claim in (1) for $0 < a < b < 1$. The formula in (1) holds trivially if $f \equiv 1$, so we also get
\[ \frac{1}{K} \sum_{k=1}^K \bigl( \mathbb{1}_{[0,a)}(x_k) + \mathbb{1}_{(b,1]}(x_k) \bigr) \longrightarrow 1 - (b - a) \]
as $K \to \infty$ by taking the difference. Suppose now that $0 = a < b < 1$. Then, for sufficiently small $\varepsilon > 0$, we have
\[ f_- = \mathbb{1}_{[\varepsilon, b]} \leq \mathbb{1}_{[0, b]} \leq \mathbb{1}_{[0, b + \varepsilon)} + \mathbb{1}_{(1 - \varepsilon, 1]} = f_+ \]
and
\[ \int_0^1 (f_+ - f_-)\, dx \leq 3\varepsilon, \]
and the formula in (1) already holds for $f_-$ and $f_+$. As before, this implies the claim for $\mathbb{1}_{[0,b]}$. The case of $0 < a < b = 1$ is similar.
Exercise 2.48. Prove the remaining implications to show that the four characterizations
of equidistribution at the start of this section are indeed equivalent.
Example 2.49. A simple example of an equidistributed sequence may be obtained as follows. Fix α ∈ R∖Q and define xk = {kα} ∈ [0, 1) for k ∈ N,
where {t} denotes the fractional part of the real number t. To see that this
defines an equidistributed sequence, the characterization in (4) is the most
convenient to use. For n = 0, we have χn ≡ 1 and so $\frac{1}{K}\sum_{k=1}^{K} \chi_0(x_k) = 1$ for
all K. If n ≠ 0, then
\[ \frac{1}{K}\sum_{k=1}^{K} e^{2\pi i n k\alpha} = \frac{1}{K}\sum_{k=1}^{K} \bigl(\underbrace{e^{2\pi i n\alpha}}_{\neq 1}\bigr)^{k} = \frac{e^{2\pi i n(K+1)\alpha} - e^{2\pi i n\alpha}}{K\,(e^{2\pi i n\alpha} - 1)} \longrightarrow 0 \]
as K → ∞.
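The decay of these averages can be observed numerically. The following is a minimal sketch, not part of the original text: the function names and the choice α = √2 are assumptions made for illustration, and floating-point arithmetic only approximates an irrational α.

```python
import cmath
import math


def weyl_average(alpha: float, n: int, K: int) -> complex:
    """Return (1/K) * sum_{k=1}^K exp(2*pi*i*n*k*alpha)."""
    total = sum(cmath.exp(2j * math.pi * n * k * alpha) for k in range(1, K + 1))
    return total / K


def interval_frequency(alpha: float, a: float, b: float, K: int) -> float:
    """Fraction of k in 1..K whose fractional part {k*alpha} lies in [a, b]."""
    hits = sum(1 for k in range(1, K + 1) if a <= (k * alpha) % 1.0 <= b)
    return hits / K


alpha = math.sqrt(2)  # illustrative irrational number (an assumption)
for K in (100, 10_000):
    # The Weyl average for n = 1 shrinks like 1/K, and the visiting
    # frequency of [0.2, 0.5] approaches the interval length 0.3.
    print(K, abs(weyl_average(alpha, 1, K)), interval_frequency(alpha, 0.2, 0.5, K))
```

The geometric-sum formula in the example bounds the modulus of the average by 2/(K |e^{2πinα} − 1|), which is the 1/K decay seen in the output.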
An amusing consequence of this example is a special case of Benford’s

law [7].
Exercise 2.50. Use the equidistribution from Example 2.49 to show the following. Write ℓn
for the leading digit of 2^n written in decimal (so the sequence (ℓn) begins (2, 4, 8, 1, 3, . . . )).
Then
\[ \frac{1}{K}\,\bigl|\{k \in \mathbb{N} \mid 1 \le k \le K,\ \ell_k = 1\}\bigr| \longrightarrow \log_{10} 2 \]
as K → ∞.
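The limit in this exercise is easy to observe experimentally. The sketch below is only a numerical illustration under our own naming (the helper functions are hypothetical); it uses exact integer arithmetic and does not replace the proof asked for.

```python
import math


def leading_digit(n: int) -> int:
    """Leading decimal digit of 2**n, computed exactly with Python integers."""
    return int(str(2 ** n)[0])


def frequency_of_leading_one(K: int) -> float:
    """Fraction of k in 1..K for which 2**k has leading digit 1."""
    return sum(1 for k in range(1, K + 1) if leading_digit(k) == 1) / K


print([leading_digit(n) for n in range(1, 6)])  # the first few leading digits
print(frequency_of_leading_one(2000), math.log10(2))
```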
2.3.4 Continuous Functions in Lp Spaces
Another important feature of continuous functions (of compact support) is
that they form a dense subset of the L^p spaces. We say that a measure µ on a
topological space X is locally finite if each point of X has an open neighbourhood
of finite measure.
Proposition 2.51 (Density of Cc(X) in L^p_µ(X)). Let X be a locally compact σ-compact metric space equipped with a locally finite measure µ on the
Borel σ-algebra B(X). Then, for any p ∈ [1, ∞), Cc(X) is dense in L^p_µ(X).
Proof† . We split the proof of the proposition into several steps.

Compact case. We assume first that X is a compact metric space, that µ is
a finite Borel measure on X, and start with some preparatory observations.
Fix p ∈ [1, ∞) and let f ∈ L^p_µ(X). Then
\[ f = \Re f + i\,\Im f, \]
and it is enough to show that each of ℜf and ℑf can be approximated in L^p_µ
by elements of Cc(X). We may therefore assume that f is real-valued, and
by writing f = f^+ − f^− we may also assume that it takes values in [0, ∞).
Now notice that such a real-valued, non-negative function f is the pointwise
limit of the simple functions
\[ f_n(x) = \min\bigl\{n,\ 2^{-n}\lfloor 2^n f(x)\rfloor\bigr\} \nearrow f(x) \]
as n → ∞, which implies that
\[ \|f_n - f\|_p = \Bigl(\int_X |f_n - f|^p \,d\mu\Bigr)^{1/p} \longrightarrow 0 \]
as n → ∞, by dominated convergence.
Thus it is sufficient to show that any simple function $f = \sum_{i=1}^{N} a_i 1_{B_i}$
(where ai ∈ R and Bi ∈ B(X) have µ(Bi) < ∞ for i = 1, . . . , N) can be
approximated by elements of C(X). This in turn will follow if we can show
that the characteristic function of any Borel set can be approximated by
elements of C(X) in the ‖·‖p norm.
Defining a σ-algebra. Having made these initial reductions, we can now
turn to the heart of the argument (still assuming that X is compact). We
define the family
\[ \mathcal{A} = \bigl\{ B \in \mathcal{B} \mid 1_B \in \overline{C(X)}^{\|\cdot\|_p} \bigr\} \]
of all Borel sets whose characteristic function can be approximated by elements of C(X) (the notation indicates that the closure is taken within L^p_µ(X)
and with respect to the norm ‖·‖p).
We claim that A = B, and will prove this by showing that
• A contains any open subset of X, and
• A is a σ-algebra.
Open Subsets. Let O ⊆ X be open. Define the closed set A = X∖O and
the distance function
\[ d(x, A) = \inf_{y \in A} d(x, y). \tag{2.29} \]
† The reader may be familiar with this result for the Lebesgue measure (for example),
and this case is sufficient for much of the material that will follow. Thus she may skip the
general proof and return to it at a later stage if needed.
This distance function satisfies
\[ |d(x_1, A) - d(x_2, A)| \le d(x_1, x_2) \tag{2.30} \]
for all x1, x2 ∈ X. Indeed, given x1, x2 ∈ X and ε > 0, there exists some y ∈ A
for which d(x2, y) ≤ d(x2, A) + ε, and so
\[ d(x_1, A) \le d(x_1, y) \le d(x_1, x_2) + d(x_2, y) \le d(x_1, x_2) + d(x_2, A) + \varepsilon, \]
which implies that d(x1, A) ≤ d(x1, x2) + d(x2, A), and hence (2.30) by the
symmetry between x1 and x2. This shows the continuity of x ↦ d(x, A)
and so it follows that the function defined by fn(x) = min{1, n d(x, A)} lies
in C(X). Moreover, if x ∈ A = X∖O then fn(x) = 0 = 1O(x), while if x ∈ O
then d(x, A) > 0 and fn(x) ↗ 1 = 1O(x). Thus fn ↗ 1O as n → ∞ on X,
and so
\[ \|f_n - 1_O\|_p = \Bigl(\int_O |f_n - 1_O|^p \,d\mu\Bigr)^{1/p} \longrightarrow 0 \]
as n → ∞ by dominated convergence. This shows that O ∈ A, by definition.
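The construction of fn can be made concrete in a simple special case. The sketch below assumes X = [0, 1] with Lebesgue measure and O = (0.3, 0.7); these choices, the grid size and the function names are ours and are not taken from the text. In this case a direct computation gives ‖fn − 1O‖p = (2/(n(p + 1)))^{1/p} for n ≥ 5, which the Riemann sum reproduces approximately.

```python
import numpy as np


def dist_to_complement(x: np.ndarray, lo: float = 0.3, hi: float = 0.7) -> np.ndarray:
    """d(x, A) where A is the complement of the open interval (lo, hi) in [0, 1]."""
    return np.maximum(0.0, np.minimum(x - lo, hi - x))


def lp_error(n: int, p: float, grid_size: int = 200_001) -> float:
    """Riemann-sum approximation of ||f_n - 1_O||_p for f_n = min(1, n d(x, A))."""
    x = np.linspace(0.0, 1.0, grid_size)
    f_n = np.minimum(1.0, n * dist_to_complement(x))
    indicator = ((x > 0.3) & (x < 0.7)).astype(float)
    integral = np.mean(np.abs(f_n - indicator) ** p)  # [0, 1] has total length 1
    return integral ** (1.0 / p)


for n in (10, 100, 1000):
    print(n, lp_error(n, p=2.0), (2.0 / (3.0 * n)) ** 0.5)
```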

Complements. Suppose that A ∈ A. Then there exists a sequence of
functions (fn) in C(X) with ‖fn − 1A‖p → 0 as n → ∞. Using the sequence 1 − fn ∈ C(X), we see that X∖A ∈ A.
Finite Intersections. Suppose that A, B ∈ A. Then there exist sequences
of functions (fn), (gn) in C(X) with ‖fn − 1A‖p → 0 and ‖gn − 1B‖p → 0
as n → ∞. We may assume that fn and gn take on values in [0, 1], for if not
we can replace fn by
\[ \tilde{f}_n = \max\{0, \min\{1, f_n\}\} \]
(which will approximate 1A equally well or better), and similarly gn by $\tilde{g}_n$.
Then fn gn ∈ C(X) and
\[ f_n g_n - 1_{A\cap B} = f_n g_n - 1_A 1_B = (f_n - 1_A)\,g_n + 1_A\,(g_n - 1_B), \]
which implies
\[ \|f_n g_n - 1_{A\cap B}\|_p \le \|f_n - 1_A\|_p \|g_n\|_\infty + \|g_n - 1_B\|_p \longrightarrow 0 \]
as n → ∞. This shows that A ∩ B ∈ A as desired.

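Countable unions. Let A, B ∈ A. Then X∖A, X∖B ∈ A and hence
\[ B \cup A = X \setminus \bigl((X\setminus A) \cap (X\setminus B)\bigr) \in \mathcal{A} \]
by the two steps above. This extends to finite unions by induction.
Now suppose that A1, A2, . . . all lie in A, and fix ε > 0. Then there exists
an ℓ such that
\[ \mu\Bigl(\bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k\Bigr) < \varepsilon^p. \]
Thus
\[ \Bigl\| 1_{\bigcup_{k=1}^{\infty} A_k} - 1_{\bigcup_{k=1}^{\ell} A_k} \Bigr\|_p = \mu\Bigl(\bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k\Bigr)^{1/p} < \varepsilon. \]
However, since $\bigcup_{k=1}^{\ell} A_k \in \mathcal{A}$ for any ℓ ≥ 1, we already know that there exists
an f ∈ C(X) with
\[ \Bigl\| f - 1_{\bigcup_{k=1}^{\ell} A_k} \Bigr\|_p < \varepsilon, \]
and so
\[ \Bigl\| f - 1_{\bigcup_{k=1}^{\infty} A_k} \Bigr\|_p < 2\varepsilon. \]
Since ε > 0 was arbitrary, we deduce that $\bigcup_{k=1}^{\infty} A_k \in \mathcal{A}$.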
Concluding the compact case. By the arguments above, A is a σ-algebra
containing all the open subsets of X. By definition, A ⊆ B and so A = B by
definition of the Borel σ-algebra B. As explained above, this implies that every
simple function, and so also every function, in Lpµ (X) can be approximated
by continuous functions.
Extending to the locally compact case. Let us now extend the above
to the general case where X is locally compact σ-compact metric and µ is
locally finite. By Lemma A.22 we find a sequence (Xm) of compact subsets
of X with $X_m \subseteq X_{m+1}^{o}$ for all m ≥ 1, and with $X = \bigcup_{m=1}^{\infty} X_m$.
Given some f ∈ L^p_µ(X) we first note that the sequence fm = f 1_{Xm} converges to f with respect to ‖·‖p as m → ∞ (by dominated convergence).
Given some ε > 0 we choose m such that ‖f − fm‖p < ε.
Next we apply the compact case above and hence we find some g ∈ C(Xm)
with $\|g - f_m\|_{L^p(X_m,\mu)} < \varepsilon$. Applying Tietze's extension theorem (Proposition A.29) we can extend g to an element $g \in C_c(X_{m+1}^{o}) \subseteq C_c(X)$. Using
again the distance function in (2.29) with A = Xm we define the sequence
\[ g_n(x) = \bigl(1 - \min\{1, n\,d(x, X_m)\}\bigr)\, g(x) \in C_c(X). \]
For x ∈ Xm we have gn(x) = g(x) and for x ∉ Xm we have |gn(x)| ↘ 0
as n → ∞. Now notice that
\[ \|g_n - f_m\|_p^p = \int_{X_{m+1}} |g_n - f_m|^p \,d\mu = \|g - f_m\|_{L^p(X_m,\mu)}^p + \int_{X_{m+1}\setminus X_m} |g_n|^p \,d\mu, \]
where the first expression on the right is less than ε^p by construction of g and
the second expression converges to 0 by dominated convergence as n → ∞.
Therefore, there exists some n ≥ 1 such that ‖gn − fm‖p < 2ε. Combining
this with the choice of fm above, we obtain ‖f − gn‖p < 3ε and gn ∈ Cc(X),
as desired.
2.4 Bounded Operators and Functionals
Just as in linear algebra, linear maps are of fundamental importance in func-

tional analysis. However, in infinite-dimensional normed vector spaces con-
tinuity of linear maps is not guaranteed.
Lemma 2.52 (Continuity and boundedness). Let L : V → W be a linear
map between the two normed vector spaces (V, ‖·‖V) and (W, ‖·‖W). Then L
is continuous if and only if the operator norm
\[ \|L\| = \|L\|_{\mathrm{op}} = \sup_{\substack{v \in V \\ \|v\|_V \le 1}} \|Lv\|_W \]
is finite.
Definition 2.53. A continuous linear map L : V → W between normed

vector spaces is called a bounded linear operator. We denote the space of all
bounded operators from V to W by B(V, W ). For brevity we write B(V )
for B(V, V ). If W = R (or W = C if the field of scalars is C) then we also
write V ∗ for B(V, R) (respectively B(V, C)) the dual space of V , and elements
of the dual space are called linear functionals.
Lemma 2.54 (Space of operators). Let (V, ‖·‖V) and (W, ‖·‖W) be
normed vector spaces. Then the space B(V, W) of bounded linear maps from V
to W is also a normed vector space with addition and scalar multiplication
defined pointwise as in any space of functions, and with the operator norm
from Lemma 2.52. If W is a Banach space, then so is B(V, W), and in particular V* is always a Banach space.
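Proof of Lemma 2.52. The case L = 0 is trivial, so we may assume that L
is not 0. Suppose that ‖L‖op < ∞. Then for any v0 ∈ V we have
\[ L\bigl(v_0 + B^V_{\varepsilon/\|L\|_{\mathrm{op}}}\bigr) \subseteq L(v_0) + B^W_{\varepsilon} \]
since $v \in B^V_{\varepsilon/\|L\|_{\mathrm{op}}} \setminus \{0\}$ implies that
\[ \|Lv\|_W = \underbrace{\|v\|_V}_{<\,\varepsilon/\|L\|_{\mathrm{op}}}\ \underbrace{\bigl\|L\bigl(\|v\|_V^{-1} v\bigr)\bigr\|_W}_{\le\,\|L\|_{\mathrm{op}}} < \varepsilon. \]
Hence ‖L‖op < ∞ implies that L is continuous.
Suppose now that L is continuous. Then there exists some δ > 0 such that
\[ L\bigl(B^V_{\delta}\bigr) \subseteq B^W_{1}. \]
In particular, ‖v‖V ≤ 1 implies that $\|L(\tfrac{\delta}{2} v)\|_W \le 1$, and ‖Lv‖W ≤ 2/δ. As
this holds for all v with ‖v‖V ≤ 1, we deduce that ‖L‖op ≤ 2/δ < ∞.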
As the next exercise shows, the notion of boundedness for an operator

makes a clear distinction between integration and differentiation of real-
valued functions.
Exercise 2.55. Show that the operator I : C([0, 1]) → C([0, 1]) defined as the integral
\[ I(f)(x) = \int_0^x f(t)\,dt \]
is continuous. Use this to shorten the argument in the proof for Example 2.24(5) on p. 30.
Show also that the operator D : C^1([0, 1]) → C([0, 1]) defined as the derivative D(f) = f′
is not continuous if we use the norm ‖·‖∞ on both spaces.
Notice that the definition of the operator norm immediately gives the
general inequality
\[ \|Lv\|_W \le \|L\|_{\mathrm{op}} \|v\|_V, \]
for all v ∈ V, and the operator norm may be characterized as being the
smallest number C with the property that
\[ \|Lv\|_W \le C\|v\|_V, \tag{2.31} \]
for all v ∈ V. We will use both these statements frequently in the sequel
without comment.
Essential Exercise 2.56. Prove that the operator norm of a bounded operator L : V → W between two normed vector spaces is the smallest constant C ≥ 0 such that (2.31) holds for all v ∈ V.
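In finite dimensions the supremum defining the operator norm can be explored numerically. The following sketch assumes Euclidean norms on R^4 and R^3 and a randomly chosen matrix (both are illustrative assumptions, not part of the text); for Euclidean norms the operator norm equals the largest singular value, which NumPy returns via `np.linalg.norm(L, 2)`.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 4))  # a linear map R^4 -> R^3 (hypothetical example)

# Lower bound for the operator norm: sample unit vectors v and maximize ||Lv||.
samples = rng.standard_normal((4, 20_000))
samples /= np.linalg.norm(samples, axis=0)
sampled_sup = np.linalg.norm(L @ samples, axis=0).max()

# For Euclidean norms the supremum is the largest singular value of L.
op_norm = np.linalg.norm(L, 2)

# The inequality ||Lv|| <= ||L|| ||v|| holds for any vector v.
v = rng.standard_normal(4)
print(sampled_sup, op_norm, np.linalg.norm(L @ v) <= op_norm * np.linalg.norm(v))
```

Sampling only ever produces a lower bound for the supremum, which is why the sampled value stays below the exact norm.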
Proof of Lemma 2.54. As indicated in the lemma, for L1, L2 ∈ B(V, W)
and a scalar α we define αL1 + L2 by (αL1 + L2)(v) = αL1(v) + L2(v) for
all v ∈ V. This is clearly another linear map. In order to bound its operator
norm, let v ∈ V with ‖v‖V ≤ 1. Then
\[ \begin{aligned} \|(\alpha L_1 + L_2)(v)\|_W &= \|\alpha L_1(v) + L_2(v)\|_W \\ &\le |\alpha|\,\|L_1(v)\|_W + \|L_2(v)\|_W \\ &\le |\alpha|\,\|L_1\|_{\mathrm{op}} + \|L_2\|_{\mathrm{op}}, \end{aligned} \]
and so ‖αL1 + L2‖op ≤ |α|‖L1‖op + ‖L2‖op. That is, the operator norm
satisfies the triangle inequality and one half of the homogeneity property.
The reverse inequality for homogeneity of the operator norm follows easily
by considering the cases α = 0 and α ≠ 0 separately, as in the proof of
Lemma 2.15. Strict positivity is clear, so we have shown that B(V, W) is a
normed vector space with the operator norm.
Now suppose that W is a Banach space and that (Ln) is a Cauchy sequence
in B(V, W). We claim that
\[ L(v) = \lim_{n\to\infty} L_n(v) \]
defines an element L of B(V, W) which is the limit of the sequence with
respect to the operator norm. To see that L(v) is well-defined it is enough
to check that (Ln(v)) is a Cauchy sequence, which follows at once from the
bound
\[ \|L_m(v) - L_n(v)\|_W = \|(L_m - L_n)(v)\|_W \le \|L_m - L_n\|_{\mathrm{op}} \|v\|_V, \]
which (for fixed v) may be made as small as we please for m, n large by the
Cauchy property for the sequence (Ln).
To see that the limit L is a bounded operator one has to show that it is
linear (which we leave as an exercise) and that it is bounded. For the latter,
assume that v ∈ V has ‖v‖V ≤ 1 and choose N(ε) as in the Cauchy property
for (Ln) with ‖Lm − Ln‖op ≤ ε for m, n ≥ N(ε). Continuity of the norm now
gives
\[ \|L_n(v) - L(v)\|_W = \lim_{m\to\infty} \|L_n(v) - L_m(v)\|_W \le \varepsilon \]
for n ≥ N(ε). Taking the supremum over v with ‖v‖V ≤ 1, we get
\[ \|L - L_n\|_{\mathrm{op}} \le \varepsilon, \]
so L is bounded with ‖L‖op ≤ ‖Ln‖op + ε, and as ε > 0 is arbitrary we also
see that Ln → L as n → ∞ with respect to ‖·‖op.
A word about notation: Where the spaces concerned are clear, or where
we wish to emphasize certain aspects of the spaces, we will for brevity often
use ‖·‖ or ‖·‖X to mean the appropriate norm in that situation. Thus, for
example, depending on context, the symbols ‖L‖, ‖L‖op, and ‖L‖B(V,W) all
mean the same thing. A good exercise for the reader is to ensure that they
can identify the norms in each case.
Lemma 2.57 (Sub-multiplicativity of operator norms). Let V, W, Z be
three normed vector spaces, and let R : V → W and S : W → Z be bounded
operators. Then S ◦ R : V → Z is also a bounded operator, and
\[ \|S \circ R\| \le \|S\|\,\|R\|. \]
In particular, if L : V → V is a bounded operator then ‖L^n‖ ≤ ‖L‖^n for
all n ≥ 1.
Proof. We have ‖S ◦ R(v)‖ ≤ ‖S‖‖R(v)‖ ≤ ‖S‖‖R‖‖v‖ ≤ ‖S‖‖R‖ for
any v ∈ V with ‖v‖ ≤ 1.
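A quick numerical sanity check of the lemma for matrices may be helpful. This is only a sketch: the random matrices and the helper `op` are our own illustrative choices, and the Euclidean norm is assumed on every space.

```python
import numpy as np


def op(M: np.ndarray) -> float:
    """Operator norm of a matrix with respect to Euclidean norms."""
    return float(np.linalg.norm(M, 2))


rng = np.random.default_rng(1)
R = rng.standard_normal((5, 4))  # R : R^4 -> R^5
S = rng.standard_normal((3, 5))  # S : R^5 -> R^3
print(op(S @ R), "<=", op(S) * op(R))

L = rng.standard_normal((4, 4))
for n in range(1, 5):
    # The inequality is usually strict for n >= 2.
    print(n, op(np.linalg.matrix_power(L, n)), "<=", op(L) ** n)
```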
Exercise 2.58. Compute the operator norm of the continuous map f ↦ f when viewed:
(a) as a map from the Banach space C^1([0, 1]) to C([0, 1]) (and where the former is equipped
with the norm ‖f‖_{C^1([0,1])} = max{‖f‖∞, ‖f′‖∞} for f ∈ C^1([0, 1])); and
(b) as a map C([0, 1]) → L^1_m([0, 1]), where m denotes Lebesgue measure on [0, 1].
(c) Compute the operator norm of the composition of the maps from (a) and from (b).
(d) Now restrict the maps in (a), (b) and (c) to the subspace of functions f with f(0) = 0,
and compute the operator norms again.
The following result is both quite easy and extremely useful for the theory
to come.
Proposition 2.59 (Unique extension to completion). Let V be a normed
vector space, let V0 ⊆ V be a dense subspace, and assume that L0 : V0 → W is
a bounded operator into a Banach space W. Then L0 has a unique bounded
extension L : V → W, that is, a bounded linear map L : V → W which
satisfies L|V0 = L0. Moreover, ‖L‖B(V,W) = ‖L0‖B(V0,W).
We implicitly assume here that a subspace V0 ⊆ V is equipped with the
restriction of the norm on V to V0. This is important to remember in applications where the subspace may have other natural norms defined on it.
Proof of Proposition 2.59. For any v ∈ V there is a sequence (vn) in V0
with vn → v as n → ∞. In particular, this implies that (vn) is a Cauchy
sequence in V0, and since
\[ L_0 : V_0 \to W \]
is bounded (and so Lipschitz), it follows that (L0(vn)) is a Cauchy sequence
in W. If (vn′) is another sequence in V0 with vn′ → v as n → ∞ then it is
clear that vn − vn′ → 0 as n → ∞ and so
\[ L_0(v_n) - L_0(v_n') \longrightarrow 0 \]
as n → ∞ since L0 is bounded (and so continuous at 0). Thus it makes sense
to define an operator L on V by
\[ L(v) = \lim_{n\to\infty} L_0(v_n) \in W, \]
because W is a Banach space. Notice that by density and the desired continuity of the extension, this is the only possible definition of a bounded operator
that extends L0. One can quickly check that L is a linear map from V to W.
Moreover, if v ∈ V and (vn) is a sequence in V0 with vn → v as n → ∞, then
\[ \|L(v)\| = \lim_{n\to\infty} \|L_0(v_n)\| \le \|L_0\| \underbrace{\lim_{n\to\infty} \|v_n\|}_{=\,\|v\|}, \]
showing that L is bounded, with ‖L‖ ≤ ‖L0‖. On the other hand L|V0 = L0,
so ‖L‖ ≥ ‖L0‖.
Corollary 2.60. Any two completions B1 and B2 of a given normed vector

space V are isometrically isomorphic.
Here a completion of a normed vector space V is a Banach space B contain-
ing an isometric dense copy of V , just as in the construction in Section 2.2.2.
Proof of Corollary 2.60. Suppose that φ1 : V → B1 and φ2 : V → B2
are isometric embeddings associated to the two completions, as illustrated in
Figure 2.4.
[Diagram: the embeddings φ1 : V → B1 and φ2 : V → B2, together with the maps ψ1 : B1 → B2 and ψ2 : B2 → B1.]
Fig. 2.4: The two given completions φ1, φ2 and the maps ψ1, ψ2 to be constructed.
Since φ1 and φ2 are isometries, the map
\[ \varphi_2 \circ \varphi_1^{-1} : \varphi_1(V) \longrightarrow \varphi_2(V) \subseteq B_2, \qquad \varphi_1(v) \longmapsto \varphi_2(v) \]
is a well-defined bounded operator defined on a dense subset φ1(V) ⊆ B1.
By Proposition 2.59 there is an extension ψ1 : B1 → B2 with norm
\[ \|\psi_1\| = \|\varphi_2 \circ \varphi_1^{-1}\| = 1. \]
Similarly there exists an extension ψ2 : B2 → B1 which extends $\varphi_1 \circ \varphi_2^{-1}$
and which also has norm ‖ψ2‖ = 1. It follows that ψ2 ◦ ψ1 and ψ1 ◦ ψ2
are extensions of the identity map on φ1(V) and on φ2(V) respectively. By
uniqueness of the extension in Proposition 2.59 we must have ψ2 ◦ ψ1 = I_{B1}
and ψ1 ◦ ψ2 = I_{B2}. We also see that ‖b‖ = ‖ψ2(ψ1(b))‖ ≤ ‖ψ1(b)‖ ≤ ‖b‖ for
any b ∈ B1, so that ψ1 is an isometry from B1 to B2 with ψ2 its inverse.
Exercise 2.61. Let D = {z ∈ C | |z| < 1} ⊆ C be the open unit disk, and parameterize
the circle of radius r ∈ (0, 1) by the map γr : [0, 1] → C defined by γr(t) = r e^{2πit}. Let V
be the space of functions $f \in C(\overline{D})$ holomorphic on D, and fix p ∈ [1, ∞).
(a) Equip V with the norm
\[ \|f\|_{H^p(D)} = \sup_{r\in(0,1)} \Bigl(\int_0^1 |f(\gamma_r(t))|^p \,dt\Bigr)^{1/p}. \]
Show that the linear map Ez : f ↦ f(z) is continuous with respect to ‖·‖_{H^p(D)} for
all z ∈ D. Also show that if O ⊆ D is open with compact closure $\overline{O} \subseteq D$, then
\[ V \ni f \longmapsto f|_{\overline{O}} \in C(\overline{O}) \]
is a bounded operator with respect to ‖·‖_{H^p(D)} and ‖·‖∞ on $C(\overline{O})$. In particular, conclude
that there exists a canonical injective map from the completion H^p(D) of V, known as a
Hardy space, into the space of holomorphic functions on D.
(b) Equip V with the norm
\[ \|f\|_{A^p(D)} = \|f\|_{L^p(D)}, \]
and repeat the problems from (a) to obtain the(7) Bergman space A^p(D).
Exercise 2.62. For each of the five norms on R[x] given in Example 2.2(7), find a Banach
space containing R[x] for which the induced norm obtained by restriction coincides with
the given norm on R[x].
2.4.1 The Norm of Continuous Functionals on C0 (X)
Let X be a locally compact metric space, and let µ be a finite Borel measure.
Then
\[ \mu : f \longmapsto \int f \,d\mu \]
is a continuous functional on C0(X). Indeed,
\[ \Bigl|\int f \,d\mu\Bigr| \le \int |f| \,d\mu \le \mu(X)\,\|f\|_\infty \]
shows the continuity by Lemma 2.52. More generally, if µ is a Borel measure
on X and g ∈ L^1_µ(X) then
\[ g\,d\mu : C_0(X) \ni f \longmapsto \int f g \,d\mu \tag{2.32} \]
is also a continuous functional on C0(X). Again this is easy to see since
\[ \Bigl|\int f g \,d\mu\Bigr| \le \int |f|\,|g| \,d\mu \le \|f\|_\infty \|g\|_{L^1_\mu}. \tag{2.33} \]
In fact, a more precise statement holds, but this takes a little more work.
Lemma 2.63 (Operator norm of integration). Suppose that µ is a Borel
measure on a locally compact σ-compact metric space X and g is a function
in L^1_µ(X). Then the norm of the functional on C0(X) defined in (2.32) is
precisely ‖g‖_{L^1_µ}.
Proof†. Let
\[ h(x) = \overline{\arg(g(x))} = \begin{cases} \dfrac{\overline{g(x)}}{|g(x)|} & \text{if } g(x) \neq 0, \\ 0 & \text{if } g(x) = 0. \end{cases} \]
Clearly h ∈ L^∞_µ(X) and
\[ \int h g \,d\mu = \int |g| \,d\mu = \|g\|_{L^1_\mu}. \]
We wish to approximate h by continuous functions. Fix ε > 0. By Lusin's theorem (Theorem B.17) applied to the finite measure ν defined by dν = |g| dµ
there exists a compact set K ⊆ X such that the restriction h|K of h to K
is continuous, and $\int_{X\setminus K} |g| \,d\mu < \varepsilon$. By Tietze's extension theorem (Proposition A.29) the restriction h|K can be extended to a continuous function fε ∈ Cc(X) of compact support. We may assume that ‖fε‖∞ ≤ 1,
because if this is not the case we may replace fε by the continuous function
\[ x \longmapsto \begin{cases} f_\varepsilon(x) & \text{if } |f_\varepsilon(x)| \le 1, \\ \dfrac{f_\varepsilon(x)}{|f_\varepsilon(x)|} & \text{if } |f_\varepsilon(x)| > 1. \end{cases} \]
Thus
\[ \begin{aligned} \|g\,d\mu\|_{\mathrm{op}} \ge \Bigl|\int_X f_\varepsilon g \,d\mu\Bigr| &\ge \Bigl|\int_K f_\varepsilon g \,d\mu\Bigr| - \Bigl|\int_{X\setminus K} f_\varepsilon g \,d\mu\Bigr| \\ &\ge \int_K |g| \,d\mu - \int_{X\setminus K} |g| \,d\mu > \|g\|_{L^1_\mu} - 2\varepsilon. \end{aligned} \]
Since ε > 0 was arbitrary, this shows that
\[ \|g\|_{L^1_\mu} \le \|g\,d\mu\|_{\mathrm{op}}, \]
and the reverse inequality follows from (2.33).
† The result of Section 2.4.1 will initially only be used in more concrete settings. The reader
may therefore skip the proof and return to it later if needed.
Exercise 2.64. Let X be a compact metric space, µ a Borel measure on X, and g a
function in L^1_µ(X). Give and prove a precise criterion in terms of properties of g for the
existence of a function f ∈ C(X) with ‖f‖∞ ≤ 1 such that $\bigl|\int f g \,d\mu\bigr| = \|g\|_1$.
2.4.2 Banach Algebras
In many situations it makes sense to multiply elements of a normed vector

space with each other. Recall that an algebra is a vector space and simultan-
eously a ring in such a way that the two structures are compatible: addition
in the vector space and addition in the ring are the same, the scalar multi-
plication and the ring multiplication satisfy (αx)y = x(αy) = α(xy) for all
scalars α and elements x, y ∈ A.
Definition 2.65. Let A be a Banach space, and assume there is a multiplication operation (x, y) ↦ xy from A × A to A such that addition and
multiplication make A into an algebra, with the sub-multiplicativity property that
\[ \|xy\| \le \|x\|\,\|y\| \]
for all x, y ∈ A. Then A is called a Banach algebra. Elements a, b of an algebra
are said to commute if ab = ba, and the algebra is said to be commutative if
any a, b ∈ A commute.
Recall that a ring or an algebra does not need to have a unit; if a non-
trivial ring A has a unit 1A satisfying 1A a = a1A = a for all a ∈ A then it is
called unital.
The additional axiom on the norm makes the product operation continuous
by the following argument. Fix ε ∈ (0, 1) and x, y ∈ A. Then ‖x′ − x‖ < ε < 1
and ‖y′ − y‖ < ε together imply that
\[ \begin{aligned} \|x'y' - xy\| &\le \|x'(y' - y)\| + \|(x' - x)y\| \\ &\le (\|x'\| + \|y\|)\,\varepsilon \le (\|x\| + 1 + \|y\|)\,\varepsilon. \end{aligned} \tag{2.34} \]
Since ε ∈ (0, 1) was arbitrary, this shows the continuity of the product map
at (x, y) ∈ A × A.
Example 2.66. (1) The continuous functions C(X) on a compact topological

space X with the supremum norm form a Banach algebra with respect to
the pointwise multiplication operation (f g)(x) = f (x)g(x) for all x ∈ X.
Notice that the constant function 1 is a unit in this ring.
(2) Let X be a non-compact topological space. Then C0 (X) is a Banach
algebra with respect to the supremum norm and pointwise multiplication
as in (1) above, but it does not have a unit.
(3) If V is any Banach space, then B(V ) = B(V, V ) is a Banach algebra
with respect to composition. The sub-multiplicativity property of the
operator norm in Definition 2.65 is precisely the content of Lemma 2.57.
The algebra has a unit, namely the identity map I(v) = v for all v ∈ V .
(4) A special case of (3) above is the case V = R^n. By choosing a basis for R^n
we may identify B(R^n) with the space of n × n real matrices.
In a Banach algebra with unit, we can apply many well-known functions
to its elements and obtain new elements of the Banach algebra. For example,
if a is any element of a unital Banach algebra, then we may define
\[ \exp a = \sum_{n=0}^{\infty} \frac{a^n}{n!}, \]
where a^0 = 1_A is the unit in A. The series defines an element of A by
Lemma 2.28. We will return to the topic of Banach algebras in Chapter 11.
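For the matrix algebra of Example 2.66(4) the exponential series can be summed directly. The sketch below is illustrative only: the function name `exp_series`, the number of terms and the test matrix are our assumptions. It uses the standard fact that exp(θJ) is the rotation by the angle θ when J is the generator of planar rotations.

```python
import numpy as np


def exp_series(a: np.ndarray, terms: int = 30) -> np.ndarray:
    """Partial sum of sum_{n>=0} a^n / n! in the algebra of square matrices."""
    result = np.zeros_like(a, dtype=float)
    power = np.eye(a.shape[0])  # a^0 is the unit of the algebra
    for n in range(terms):
        result += power
        power = power @ a / (n + 1)  # now holds a^(n+1) / (n+1)!
    return result


theta = 0.7
J = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of rotations in the plane
E = exp_series(theta * J)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
print(np.max(np.abs(E - rotation)))
```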
2.5 Ordinary Differential Equations
We want to briefly indicate how even the simplest differential equations can
lead directly to the study of integral operators, which may be analyzed using
tools introduced above (and in Chapter 6).
Consider first the differential equation
f ′′ (x) + f (x) = g(x) (2.35)
with the initial values

f (0) = 1, f ′ (0) = 0. (2.36)
Let us recall briefly the familiar approach to solving such an equation. First
one finds all solutions to the homogeneous equation
f ′′ (x) + f (x) = 0,
giving
f (x) = A sin x + B cos x (2.37)
for constants A and B. Then one moves on to the problem of finding one
particular solution fp to the equation
fp′′ (x) + fp (x) = g(x), (2.38)
ignoring the initial values, which may be done by a sophisticated guess if g is

sufficiently simple, or by using the method of variation of parameters (that is,
treating A and B as functions of x rather than constants). Finally, taking the
sum of f from (2.37) and a solution to (2.38), one chooses the constants A
and B in the solution to the homogeneous equation to satisfy the initial
values. Rather than going through this in detail, we claim that the function
\[ f(x) = \cos(x) + \int_0^x \sin(x - t)\,g(t)\,dt \tag{2.39} \]
is a solution to the initial value problem. This is easily checked by a calculation: f(0) = 1 clearly, and
\[ f'(x) = -\sin x + \sin(x - x)\,g(x) + \int_0^x \cos(x - t)\,g(t)\,dt, \]
so f′(0) = 0. Finally,
\[ f''(x) = -\cos x + \cos(x - x)\,g(x) - \int_0^x \sin(x - t)\,g(t)\,dt = -f(x) + g(x), \]
as required. To summarize, we have shown that (2.35)–(2.36) together are
equivalent to (2.39).
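Formula (2.39) can also be checked numerically for a concrete right-hand side. The sketch below assumes g(t) = t, chosen only for illustration, for which the integral evaluates in closed form to f(x) = cos x + x − sin x. The quadrature rule, the step sizes and the function names are likewise our own choices.

```python
import numpy as np


def g(t):
    return t  # hypothetical right-hand side chosen for illustration


def f(x: float, steps: int = 2001) -> float:
    """Evaluate formula (2.39) using the trapezoidal rule on [0, x]."""
    t = np.linspace(0.0, x, steps)
    y = np.sin(x - t) * g(t)
    dt = t[1] - t[0]
    integral = dt * (y.sum() - 0.5 * (y[0] + y[-1]))
    return float(np.cos(x) + integral)


x, h = 0.8, 1e-2
second_derivative = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
residual = second_derivative + f(x) - g(x)  # should be close to 0
print(f(0.0), f(x), np.cos(x) + x - np.sin(x), residual)
```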
2.5.1 The Volterra Equation
If the original differential equation in (2.35) is changed slightly, to take the
form
\[ f''(x) + f(x) = \sigma(x) f(x), \tag{2.40} \]
with the same initial values f(0) = 1 and f′(0) = 0, then the discussion above
does not solve the equation. Nonetheless, the ideas are still useful, since they
transform the equation into the integral equation
\[ f(x) = \cos(x) + \int_0^x \sin(x - t)\,\underbrace{\sigma(t) f(t)}_{g(t)}\,dt. \tag{2.41} \]
Now define k(x, t) = sin(x − t)σ(t) so that (2.41) takes the form
\[ f = u + K(f), \tag{2.42} \]
where u(x) = cos x and
\[ K(f)(x) = \int_0^x k(x, t) f(t)\,dt. \]
Due to its inventors and its nature K is called a Hilbert–Schmidt integral
operator. The function k will be referred to as the kernel of the integral
operator.
Solving the perturbed equation (2.40) with initial values turns out to be
straightforward at the level of abstraction aimed at in functional analysis.
We can rewrite the equation (2.42) as a Volterra equation
\[ (I - K) f = u \]
where I is the identity map. The solution f is then given by applying the
inverse operator (I − K)^{−1} to u, which we may calculate (in this particular
case) using an operator form of the geometric series (in this context the
geometric series is usually called a Neumann series),
\[ (I - K)^{-1} = \sum_{n=0}^{\infty} K^n, \]
and hence
\[ f = \sum_{n=0}^{\infty} K^n u. \]
The heuristic above is made formal in the following lemma.
Lemma 2.67. Suppose that k ∈ C([0, 1]^2). Then
\[ K(f)(x) = \int_0^x k(x, t) f(t)\,dt \]
defines a bounded linear operator K : C([0, 1]) → C([0, 1]) with ‖K‖ ≤ ‖k‖∞,
and more generally with ‖K^n‖ ≤ ‖k‖∞^n / n! for n ≥ 1. In particular, the
geometric series
\[ (I - K)^{-1} = \sum_{n=0}^{\infty} K^n \]
converges in B(C([0, 1])). It follows that the integral equation (I − K)f = u
has a unique solution for any u ∈ C([0, 1]). For u(x) = cos x, σ ∈ C([0, 1])
and k(x, t) = sin(x − t)σ(t) with x, t ∈ [0, 1], this solution belongs to C^2([0, 1]),
and solves the initial value problem
\[ \begin{cases} f'' + f = \sigma f \\ f(0) = 1,\ f'(0) = 0 \end{cases} \tag{2.43} \]
on [0, 1].
Proof. As k is uniformly continuous, it is easy to check that K(f) ∈ C([0, 1])
for every f ∈ C([0, 1]). Indeed, if ε > 0 then there exists some δ > 0 for which
\[ |x_1 - x_2| < \delta \implies |k(x_1, t) - k(x_2, t)| < \varepsilon \]
for all t ∈ [0, 1]. Multiplying by f(t) and integrating from 0 to x shows that
\[ |x_1 - x_2| < \delta \implies |K(f)(x_1) - K(f)(x_2)| < \|f\|_\infty\,\varepsilon. \]
Also, K is linear, and
\[ \|Kf\|_\infty \le \sup_{x\in[0,1]} \Bigl|\int_0^x k(x, t) f(t)\,dt\Bigr| \le \|k\|_\infty \|f\|_\infty, \]
so K defines a bounded linear operator with ‖K‖ ≤ ‖k‖∞.
To prove the estimate on ‖K^n‖ we need to be a bit more careful. For
every x ∈ [0, 1] we have
\[ |K(f)(x)| \le \int_0^x |k(x, t) f(t)|\,dt \le x\,\|k\|_\infty \|f\|_\infty. \]
Suppose we have already shown for x ∈ [0, 1] that
\[ |K^n(f)(x)| \le \frac{x^n}{n!}\,\|k\|_\infty^n \|f\|_\infty. \tag{2.44} \]
Then
\[ \begin{aligned} |K^{n+1}(f)(x)| &\le \int_0^x |k(x, t)|\,|K^n(f)(t)|\,dt \\ &\le \|k\|_\infty^{n+1} \|f\|_\infty \int_0^x \frac{t^n}{n!}\,dt = \|k\|_\infty^{n+1} \|f\|_\infty \frac{x^{n+1}}{(n+1)!} \end{aligned} \]
for all x ∈ [0, 1]. By induction on n, it follows that (2.44) holds for all n ≥ 1.
Hence ‖K^n‖ ≤ ‖k‖∞^n / n! for all n ≥ 1, as claimed.
By Lemma 2.54, B(C([0, 1])) is a Banach space. It follows by Lemma 2.28
that the absolutely convergent series $\sum_{n=0}^{\infty} K^n$ also converges in B(C([0, 1])).
However,
\[ (I - K)\sum_{n=0}^{\infty} K^n = \sum_{n=0}^{\infty} K^n (I - K) = \sum_{n=0}^{\infty} K^n - \sum_{n=1}^{\infty} K^n = I, \]
so the sum $\sum_{n=0}^{\infty} K^n$ is the inverse of I − K and, for any u ∈ C([0, 1]), the
equation (I − K)f = u has the unique solution f = (I − K)^{−1}u. In the
case u(x) = cos x, k(x, t) = sin(x − t)σ(t) for x, t ∈ [0, 1] and σ ∈ C([0, 1]),
the calculation after (2.39) shows that the solution f belongs to C^2([0, 1])
and solves (2.40) with the initial values f(0) = 1 and f′(0) = 0.
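The Neumann series of the lemma translates directly into an iteration that can be run on a grid. The following is a sketch under stated assumptions rather than a definitive implementation: σ is taken to be the constant 1/2 (so that the exact solution cos(√(1 − σ) x) is available for comparison), the integral is discretized by the trapezoidal rule, and the function names, grid size and number of terms are ours.

```python
import numpy as np


def volterra_apply(k, f_vals: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Approximate K(f)(x_i) = int_0^{x_i} k(x_i, t) f(t) dt by the trapezoidal rule."""
    h = x[1] - x[0]
    out = np.zeros_like(f_vals)
    for i in range(1, len(x)):
        y = k(x[i], x[: i + 1]) * f_vals[: i + 1]
        out[i] = h * (y.sum() - 0.5 * (y[0] + y[-1]))
    return out


def neumann_solve(k, u_vals: np.ndarray, x: np.ndarray, terms: int = 25) -> np.ndarray:
    """Partial sum of sum_n K^n u, approximating (I - K)^{-1} u."""
    total = u_vals.copy()
    term = u_vals.copy()
    for _ in range(terms):
        term = volterra_apply(k, term, x)  # next power of K applied to u
        total += term
    return total


sigma = 0.5  # constant sigma, an assumption made so an exact solution is known
x = np.linspace(0.0, 1.0, 401)
f = neumann_solve(lambda s, t: np.sin(s - t) * sigma, np.cos(x), x)
exact = np.cos(np.sqrt(1.0 - sigma) * x)  # solves f'' + f = sigma f, f(0)=1, f'(0)=0
print(np.max(np.abs(f - exact)))
```

The factorial decay ‖K^n‖ ≤ ‖k‖∞^n / n! is the reason a few dozen terms already suffice here.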
2.5.2 The Sturm–Liouville Equation
We now make two more small changes to the initial value problem (2.35)
and (2.36). Fix a parameter λ > 0 and consider instead the Sturm–Liouville
equation
f ′′ + λ2 f = g, (2.45)
with the boundary conditions
f (0) = f (1) = 0.
These boundary conditions (made at the two end points of [0, 1]) replace the
initial value conditions in (2.36), and that change has a surprisingly deep
impact on the resulting equation.
As we recall below the space of functions satisfying (2.45) is, if non-empty,
a two-dimensional affine subspace of functions, so that the additional bound-
ary conditions might lead to a unique solution.
We may proceed just as before. The functions of the form
f (x) = A cos(λx) + B sin(λx)
give all solutions to the homogeneous differential equation f ′′ +λ2 f = 0. Next

one needs to find a particular solution fp to
fp′′ + λ2 fp = g
(ignoring the boundary conditions). After this, one would use the solutions
to the homogeneous differential equation to satisfy the boundary conditions.
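Explicitly, given fp we can calculate the vector
\[ \begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix} \tag{2.46} \]
and try to express it as a linear combination of the two vectors
\[ \begin{pmatrix} \cos(\lambda 0) \\ \cos(\lambda 1) \end{pmatrix} = \begin{pmatrix} 1 \\ \cos\lambda \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \sin(\lambda 0) \\ \sin(\lambda 1) \end{pmatrix} = \begin{pmatrix} 0 \\ \sin\lambda \end{pmatrix}. \]
If
\[ \det\begin{pmatrix} 1 & 0 \\ \cos\lambda & \sin\lambda \end{pmatrix} = \sin\lambda \]
is non-zero, then this is always possible and we find a unique solution to the
boundary value problem. However, if λ ∈ πZ then sin λ = 0 and we may be
unlucky with the value of the vector (2.46): if the vectors
\[ \begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix}, \quad \begin{pmatrix} 1 \\ \cos\lambda \end{pmatrix} \]
are linearly independent, then there will not be a solution to the boundary
value problem.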
This obstruction to being able to find a solution to the boundary value
problem may be phrased in terms of another integral operator.
Lemma 2.68. Define the continuous Green function on [0, 1]^2 by
\[ G(s, t) = \begin{cases} s(t - 1) & \text{for } 0 \le s \le t \le 1; \\ t(s - 1) & \text{for } 0 \le t \le s \le 1. \end{cases} \]
Then for f, h ∈ C([0, 1]) the conditions
\[ \begin{cases} f(0) = f(1) = 0 \\ f \in C^2([0, 1]) \text{ and } f'' = h \end{cases} \tag{2.47} \]
are equivalent to the operator equation f = Kh, where K is the operator
defined by
\[ K(h)(s) = \int_0^1 G(s, t)\,h(t)\,dt. \tag{2.48} \]
Proof. Assume first that f = Kh. Then
\[ f(0) = \int_0^1 \underbrace{G(0, t)}_{=\,0}\,h(t)\,dt = 0, \]
and f(1) = 0 for the same reason. Moreover,
\[ \begin{aligned} f(s) &= \int_0^s t(s - 1)\,h(t)\,dt + \int_s^1 s(t - 1)\,h(t)\,dt, \\ f'(s) &= \cancel{s(s - 1)h(s)} + \int_0^s t\,h(t)\,dt - \cancel{s(s - 1)h(s)} + \int_s^1 (t - 1)\,h(t)\,dt, \end{aligned} \]
and
\[ f''(s) = s\,h(s) - (s - 1)\,h(s) = h(s), \]
so f is a solution of the boundary value problem (2.47).
To see the converse, notice that the boundary value problem has a solution
(by the argument above). However, our previous discussion of the boundary
value problem associated to the Sturm–Liouville equation (2.45) (which needs
to be modified for the case λ = 0) shows that in this case the solution is
unique. Thus the equivalence of (2.47) and f = Kh is established.
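The lemma can be tested on an example where the boundary value problem is solvable by hand. The sketch below assumes h(t) = sin(πt), for which f(s) = −sin(πs)/π² satisfies f″ = h and f(0) = f(1) = 0. The quadrature rule and the function names are our own illustrative choices.

```python
import numpy as np


def green(s: np.ndarray, t: np.ndarray) -> np.ndarray:
    """G(s, t) = s(t - 1) for s <= t and t(s - 1) for t <= s."""
    return np.where(s <= t, s * (t - 1.0), t * (s - 1.0))


def apply_K(h_func, s: np.ndarray, quad_points: int = 4001) -> np.ndarray:
    """Approximate K(h)(s) = int_0^1 G(s, t) h(t) dt by the trapezoidal rule in t."""
    t = np.linspace(0.0, 1.0, quad_points)
    dt = t[1] - t[0]
    values = green(s[:, None], t[None, :]) * h_func(t)[None, :]
    return dt * (values.sum(axis=1) - 0.5 * (values[:, 0] + values[:, -1]))


s = np.linspace(0.0, 1.0, 11)
f = apply_K(lambda t: np.sin(np.pi * t), s)
expected = -np.sin(np.pi * s) / np.pi**2  # solves f'' = sin(pi s), f(0) = f(1) = 0
print(np.max(np.abs(f - expected)))
```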
Exercise 2.69. Modify the argument for the Sturm–Liouville equation for the case λ = 0,
and show that the solution is always unique.
In particular, the fact that sn(x) = sin(πnx) for any n ∈ N satisfies
\[ \begin{cases} s_n(0) = s_n(1) = 0 \\ s_n'' = -(\pi n)^2 s_n \end{cases} \]
implies by Lemma 2.68 that
\[ s_n = -(\pi n)^2 K(s_n). \]
In other words, the values
\[ \mu_n = -(\pi n)^{-2} \]
for n = 1, 2, . . . are eigenvalues of the integral operator K (actually these are
all the eigenvalues of K; see Exercise 2.70).
Thus we can rephrase our earlier observation regarding the equivalent
formulations
\[ \begin{cases} f'' + \lambda^2 f = g \\ f(0) = f(1) = 0 \end{cases} \iff f = K(-\lambda^2 f + g) \iff (I + \lambda^2 K) f = K(g) \]
by saying that this differential equation always has a unique solution for any g
unless λ = πn corresponds to one of the eigenvalues µn = −(πn)^{−2} = −λ^{−2}
of K.
In Chapter 6 we will start the discussion of eigenvalues of operators, and as

discussed in Section 1.3 it is easier for this to restrict to the case of operators
on Hilbert spaces. As we will show in Chapter 6 the operator K also makes
sense in L2 ([0, 1]) and is completely diagonalizable on that space.
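The eigenvalues µn = −(πn)^{−2} can be seen in a finite-dimensional approximation of K. The following sketch replaces the integral in (2.48) by a quadrature sum on a uniform interior grid, which turns K into a symmetric matrix. The grid size and variable names are assumptions made for illustration, and the matrix eigenvalues only approximate those of the operator.

```python
import numpy as np

N = 199
h = 1.0 / (N + 1)
x = h * np.arange(1, N + 1)  # interior grid points of (0, 1)
S, T = np.meshgrid(x, x, indexing="ij")
G = np.where(S <= T, S * (T - 1.0), T * (S - 1.0))  # Green function values
K_matrix = h * G  # quadrature discretization of the integral operator K

# K_matrix is symmetric, so eigvalsh applies; eigenvalues come in ascending
# order, hence the most negative one (approximating mu_1) comes first.
eigenvalues = np.linalg.eigvalsh(K_matrix)
for n in range(1, 4):
    print(n, eigenvalues[n - 1], -1.0 / (np.pi * n) ** 2)
```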
Exercise 2.70 (A special case of elliptic regularity). Suppose that the function f
in L^2([0, 1]) satisfies Kf = λf for some λ in R∖{0}, where K is the operator (2.48)
discussed in connection with the Sturm–Liouville problem. Show that f must be smooth
on (0, 1), and deduce that f and λ must satisfy the conditions found above.
Exercise 2.71. In this exercise we generalize the connection between the Sturm–Liouville
boundary value problem and integral operators. Let a 0 and q > 0. We
define the second-order differential operator
L(f ) = (pf ′ )′ + qf.
Also let α1 , α2 , β1 , β2 ∈ R and define the boundary conditions
B1 (f ) = α1 f (a) + α2 f ′ (a) = 0,
B2 (f ) = β1 f (b) + β2 f ′ (b) = 0.
Assume that f1 and f2 are fundamental solutions † of the differential equation
L(f ) = 0
such that we also have

B1 (f1 ) = B2 (f2 ) = 0,
but $B_1(f_2) \neq 0$ and $B_2(f_1) \neq 0$. Show that
\[ p(f_1 f_2' - f_1' f_2) = c \]
is a constant. Using this, define an associated Green function
\[ G(s, t) = \begin{cases} \frac{1}{c} f_1(s) f_2(t) & \text{for } a \leq s \leq t \leq b, \\ \frac{1}{c} f_1(t) f_2(s) & \text{for } a \leq t \leq s \leq b, \end{cases} \]
and show that for h ∈ C([a, b]) the boundary-value problem

B1 (f ) = B2 (f ) = 0
L(f ) = h
is equivalent to the equation

\[ f(s) = K(h)(s) = \int_a^b G(s, t) h(t) \, dt. \]
Calculate G explicitly for the equation L(f ) = f ′′ , B1 (f ) = f (a) and B2 (f ) = f ′ (b).
† That is, the functions f1 , f2 form a basis of the vector space of all solutions.
2.6 Further Topics
The material in this chapter represents the basic language and some of the
main examples of functional analysis. Let us mention briefly some directions
in which the theory continues.
• In Chapter 3 and the following chapters we will start to see why we
insisted on completeness in the definition of Banach spaces.
• We have seen the definition of dual spaces, but have not yet found a
description of any dual space. This will be corrected in the next chapter
and more generally in Chapter 7, where we will describe the dual spaces
of many of the Banach spaces that we discussed here.
• How can one construct a generalized limit notion that assigns to every
bounded sequence a limit, and still has many of the expected properties?
One such property is linearity (but notice, for example, that lim sup is
not a linear function on the space of bounded sequences). Another such
property is translation-invariance with respect to the underlying group
(for a sequence in the normal sense, this group would be Z). After we
construct this so-called Banach limit, we ask which groups have similar
notions of generalized limits. We will discuss these topics in Sections 7.2
and 10.2.
• Clearly there is some hidden notion of convergence of measures to the
Lebesgue measure in Section 2.3.3. In order to formulate this precisely, we
will need to define an appropriate topology on a space of measures. This
topology will be called the weak* topology (read as ‘weak star’ topology;
see Chapter 8), and as we will show the space of probability measures on
a compact metric space is itself a compact metric space in this topology.
This result helps to provide a coherent setting for many equidistribution
results.
• Some natural spaces (examples include Cc (X) and C ∞ ([0, 1])) do not fit
into the framework of Banach spaces, but do fit into the more general
context of locally convex spaces. These will be introduced in Chapter 8.
• Convexity will also turn out to be fundamental for many discussions in
functional analysis. One of the goals in Chapter 8 will be to analyze how
the extreme points of a convex compact set determine the set.
• Banach algebras will be discussed in greater detail in Chapter 11, which
lays the foundations for the more advanced spectral theory in Chapters 12
and 13.
The reader is advised to continue with the next chapter (or at least the
first three or four sections of it), after which she may select parts of the text.
Chapter 3
Hilbert Spaces, Fourier Series, and
Unitary Representations
In this chapter we define Hilbert spaces as a special case of Banach spaces,

pick up some of the informal claims from Section 1.1, and prove them. In
particular, we will introduce Fourier series in two different settings: the first
abstract, and the second being the — at least soon to be — familiar setting
of the torus. In Section 3.5 we discuss the spectral theory of compact abelian
groups as well as two notions of integrals for Banach space-valued functions.
3.1 Hilbert Spaces
The notion of a Hilbert space is a fundamental idea in functional analysis.

We will see in this section that a Hilbert space is a Banach space of a special
sort, and the additional structure entailed by the extra hypothesis turns out
to be highly significant.
3.1.1 Definitions and Elementary Properties
Definition 3.1. An inner product space or a pre-Hilbert space is a vector
space $V$ over $\mathbb{R}$ (or $\mathbb{C}$) with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ (or $\mathbb{C}$) with the
following properties:
• (Strict positivity) $\langle v, v \rangle > 0$ for all $v \in V \setminus \{0\}$;
• ((Conjugate-)Symmetry) $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$; and
• (Linearity) for any fixed $w \in V$ the map $v \mapsto \langle v, w \rangle$ is linear.
If the first property of strict positivity is replaced by the weaker axiom of
• (Positivity) $\langle v, v \rangle \geq 0$ for all $v \in V$,
then we call $\langle \cdot, \cdot \rangle$ a semi-inner product.

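Notice that over $\mathbb{R}$ a consequence is linearity of the map $w \mapsto \langle v, w \rangle$ in the
second variable for fixed $v$, so that $\langle \cdot, \cdot \rangle$ is bilinear. In the complex case, we
have semi-linearity in the second argument, that is
\[ \langle v, \alpha_1 w_1 + \alpha_2 w_2 \rangle = \overline{\alpha_1} \langle v, w_1 \rangle + \overline{\alpha_2} \langle v, w_2 \rangle \]
for any $v, w_1, w_2 \in V$ and $\alpha_1, \alpha_2 \in \mathbb{C}$. A map $L$ from a complex vector
space $V$ to $\mathbb{C}$ is semi-linear ($\tfrac{1}{2}$-linear) if $L(\alpha_1 v_1 + \alpha_2 v_2) = \overline{\alpha_1} L(v_1) + \overline{\alpha_2} L(v_2)$
for all vectors $v_1, v_2 \in V$ and scalars $\alpha_1, \alpha_2 \in \mathbb{C}$, and a map $B : V \times V \to \mathbb{C}$ is
sesqui-linear ($1\tfrac{1}{2}$-linear) if the map $v \in V \mapsto B(v, w)$ is linear for any $w \in V$
and the map $w \in V \mapsto B(v, w)$ is semi-linear for any $v \in V$. Thus the inner
product $\langle \cdot, \cdot \rangle$ is sesqui-linear.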
In an inner product space, we will see shortly that defining
\[ \|v\| = \sqrt{\langle v, v \rangle} \tag{3.1} \]
gives a norm on $V$.
Proposition 3.2 (Cauchy–Schwarz). Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product
space. Then we have the Cauchy–Schwarz inequality,
\[ |\langle v, w \rangle| \leq \|v\| \|w\| \tag{3.2} \]
for all $v, w \in V$, where equality holds if and only if $v$ and $w$ are linearly
dependent. Moreover, the function $\| \cdot \|$ defined in (3.1) is a norm on $V$, so
that every inner product space is also a normed vector space.
If $\langle \cdot, \cdot \rangle$ is only assumed to be a semi-inner product on $V$, then the induced
function $\|v\| = \sqrt{\langle v, v \rangle}$ for $v \in V$ is a semi-norm and the inequality in (3.2)
also holds in that case.
Definition 3.3. A Hilbert space is an inner product space $(H, \langle \cdot, \cdot \rangle)$ which
is complete with respect to the norm $\| \cdot \|$ induced by the inner product as
in (3.1).
Proof of Proposition 3.2. To see that $\| \cdot \| : V \to \mathbb{R}_{\geq 0}$ from (3.1) defines
a norm, we need to check the following properties:
• strict positivity of $\| \cdot \|$, which follows at once from the strict positivity
property of the inner product, and
• homogeneity of $\| \cdot \|$, which follows from linearity and (conjugate-)symmetry,
since
\[ \|\alpha v\|^2 = \langle \alpha v, \alpha v \rangle = |\alpha|^2 \|v\|^2. \]
For the proof of the triangle inequality we will need the Cauchy–Schwarz
inequality (3.2), which we will prove now. We note that the latter holds
trivially if $w = 0$. So assume that $w \neq 0$. By definition, we have
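\begin{align*}
0 \leq \|v + tw\|^2 &= \langle v + tw, v + tw \rangle \\
&= \langle v, v \rangle + \langle tw, v \rangle + \langle v, tw \rangle + \langle tw, tw \rangle \\
&= \|v\|^2 + t \langle w, v \rangle + \overline{t \langle w, v \rangle} + |t|^2 \|w\|^2 \\
&= \|v\|^2 + 2\Re\bigl(t \, \overline{\langle v, w \rangle}\bigr) + |t|^2 \|w\|^2 \tag{3.3}
\end{align*}
for any scalar $t$ by linearity and (conjugate-)symmetry of the inner product.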

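We set
\[ t = -\frac{\langle v, w \rangle}{\|w\|^2}. \]
Then the inequality (3.3) becomes
\[ 0 \leq \|v\|^2 - 2 \frac{|\langle v, w \rangle|^2}{\|w\|^2} + \frac{|\langle v, w \rangle|^2}{\|w\|^4} \|w\|^2, \]
or
\[ 0 \leq \|v\|^2 \|w\|^2 - |\langle v, w \rangle|^2, \]
giving (3.2).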
Reading the manipulations above in the reverse direction we see that equality
in (3.2) gives
\[ \|v + tw\|^2 = \langle v + tw, v + tw \rangle = 0, \]
which forces $v + tw = 0$ by the strict positivity property. If, on the other hand, $v$
and $w$ are linearly dependent with $v = \alpha w$ for some scalar $\alpha$, then
\[ |\langle v, w \rangle| = |\langle \alpha w, w \rangle| = |\alpha| \|w\|^2 = \|v\| \|w\| \]
by the homogeneity property of $\| \cdot \|$.

It remains to show the
• triangle inequality, which may be seen as follows:
\begin{align*}
\|v + w\|^2 = \langle v + w, v + w \rangle &= \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 \\
&\leq \|v\|^2 + 2\|v\| \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2
\end{align*}
by the Cauchy–Schwarz inequality.

Analyzing the proof above shows that the strict positivity of $\langle \cdot, \cdot \rangle$ was only
needed to obtain strict positivity of the induced norm and in the proof of the
equality case of the Cauchy–Schwarz inequality.
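The steps of this proof can be illustrated numerically in $\mathbb{C}^d$ with the standard inner product. The following is a small sketch under that assumption only; the function names are hypothetical. It computes the gap in (3.2) and the scalar $t$ chosen in the proof, for which $v + tw$ is orthogonal to $w$.

```python
import math

def inner(v, w):
    """Standard inner product on C^d: linear in v, semi-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm(v):
    """Norm induced by the inner product, as in (3.1)."""
    return math.sqrt(inner(v, v).real)

def cauchy_schwarz_gap(v, w):
    """norm(v) * norm(w) - |<v, w>|, which (3.2) asserts is non-negative."""
    return norm(v) * norm(w) - abs(inner(v, w))

def minimizing_t(v, w):
    """The scalar t = -<v, w> / ||w||^2 from the proof (w must be non-zero)."""
    return -inner(v, w) / norm(w) ** 2

def shifted(v, w, t):
    """The vector v + t w."""
    return [a + t * b for a, b in zip(v, w)]
```

For linearly dependent vectors the gap vanishes up to rounding, matching the equality case, and for any other scalar the vector `shifted(v, w, t)` is at least as long as for the value returned by `minimizing_t`.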
As we have seen, the triangle inequality for the norm needs the Cauchy–
Schwarz inequality. Another reason for the importance of this fundamental
inequality is that it gives us continuity of the inner product, which we will
use frequently.
Essential Exercise 3.4. Show that an inner product on an inner product
space is jointly continuous with respect to the induced norm: if $v_n \to v$
and $w_n \to w$ as $n \to \infty$, then $\langle v_n, w_n \rangle \to \langle v, w \rangle$ as $n \to \infty$.
We record a few elementary properties of inner product spaces.

• The parallelogram identity,
\[ \|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2 \tag{3.4} \]
for all $v, w \in V$.
• The relationship with linear functionals: for fixed $w \in V$ the map $\phi_w$
defined by $\phi_w(v) = \langle v, w \rangle$ is a linear functional with norm $\|\phi_w\| = \|w\|$.
• The relationship with geometry: for $w \neq 0$ the vector $\frac{\langle v, w \rangle}{\|w\|^2} w$ (appearing as $-tw$ in
the proof of Proposition 3.2 above) is the orthogonal projection of $v$ onto
the subspace spanned by $w$. Moreover, if $\langle v, w \rangle = 0$ then we recover
Pythagoras' theorem in the form $\|v + w\|^2 = \|v\|^2 + \|w\|^2$.
These are easy to check. For the first, expand the left-hand side to obtain
\begin{align*}
\|v + w\|^2 + \|v - w\|^2 &= \langle v + w, v + w \rangle + \langle v - w, v - w \rangle \\
&= \|v\|^2 + 2\Re \langle v, w \rangle + \|w\|^2 + \|v\|^2 - 2\Re \langle v, w \rangle + \|w\|^2.
\end{align*}
The second claim is a consequence of the linearity of the inner product, the
Cauchy–Schwarz inequality and the definition of the operator norm. The
two final claims follow by expanding $\bigl\langle v - \frac{\langle v, w \rangle}{\|w\|^2} w, w \bigr\rangle$, respectively the square
norm $\|v + w\|^2 = \langle v + w, v + w \rangle$.
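These three properties are easy to test in coordinates. The sketch below assumes the standard inner product on $\mathbb{C}^d$, and the helper names are hypothetical; it computes the defect in the parallelogram identity (3.4) and the orthogonal projection of $v$ onto the line spanned by a non-zero vector $w$.

```python
def inner(v, w):
    """Standard inner product on C^d: linear in v, semi-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm_sq(v):
    """Squared norm <v, v>, a non-negative real number."""
    return inner(v, v).real

def parallelogram_defect(v, w):
    """Left-hand side minus right-hand side of (3.4); zero up to rounding."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return norm_sq(plus) + norm_sq(minus) - 2 * norm_sq(v) - 2 * norm_sq(w)

def project_onto_line(v, w):
    """Orthogonal projection (<v, w> / ||w||^2) w of v onto span{w}, w non-zero."""
    c = inner(v, w) / norm_sq(w)
    return [c * b for b in w]
```

Writing $p$ for the projection and $r = v - p$, one can check that $\langle r, w \rangle = 0$ and $\|v\|^2 = \|p\|^2 + \|r\|^2$, which is Pythagoras' theorem for the orthogonal pair $p$, $r$.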
Exercise 3.5. (a) Show that any real inner product space satisfies the polarization identity
\[ \langle x, y \rangle = \tfrac{1}{4} \bigl( \|x + y\|^2 - \|x - y\|^2 \bigr) \]
which expresses the inner product in terms of the norm.

(b) Show that the parallelogram identity (3.4) characterizes the real inner product spaces
among the real normed spaces in the following sense. If a real normed vector space satisfies
the parallelogram identity, then an inner product can be defined in such a way that the
norm arises from the inner product.
(c) Generalize the polarization identity to complex inner product spaces, and show the
complex analogue of (b).
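Example 3.6. We have already seen several Hilbert spaces without making
explicit the underlying inner product.
(1) $\mathbb{R}^d$ (or $\mathbb{C}^d$) with $\langle v, w \rangle = \sum_{i=1}^d v_i \overline{w_i}$ (also written $v \cdot w$) giving the 2-norm
\[ \|v\|_2 = \Bigl( \sum_{i=1}^d |v_i|^2 \Bigr)^{1/2}. \]
(2) $\ell^2 = \ell^2(\mathbb{N})$, the space of square-summable sequences of scalars, with inner
product
\[ \langle v, w \rangle = \sum_{i=1}^\infty v_i \overline{w_i} \]
and the 2-norm
\[ \|v\|_2 = \Bigl( \sum_{i=1}^\infty |v_i|^2 \Bigr)^{1/2}. \]
Equivalently $\ell^2(\mathbb{N}) = L^2_{\lambda_{\mathrm{count}}}(\mathbb{N})$, where $\lambda_{\mathrm{count}}$ is the counting measure
on $\mathbb{N}$.
(3) $L^2_\mu(X)$ for a measure space $(X, \mathcal{B}, \mu)$ with the inner product
\[ \langle f, g \rangle = \int f \overline{g} \, d\mu, \tag{3.5} \]
giving the 2-norm
\[ \|f\|_2 = \Bigl( \int |f|^2 \, d\mu \Bigr)^{1/2}. \]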
Notice that in Example 3.6(2) and (3), the spaces are themselves defined
as the set of sequences or functions with finite 2-norm. We recall how this
implies that the inner product is well-defined and note that (3) contains (2)
as a special case.
Lemma 3.7. If $(X, \mathcal{B}, \mu)$ is a measure space and $f, g \in L^2_\mu(X)$, then the
right-hand side of (3.5) is well-defined.
Proof. Since $(|f| - |g|)^2 = |f|^2 - 2|fg| + |g|^2 \geq 0$ we have
\[ 2 \int |fg| \, d\mu \leq \int |f|^2 \, d\mu + \int |g|^2 \, d\mu < \infty, \]
which proves that $f \overline{g}$ is integrable.
Definition 3.8. Let V, W be two Hilbert spaces and M : V → W a linear

map. If M is both an isometry and a bijection, then M is called a unitary
operator.
Essential Exercise 3.9. Show that a bijective linear operator $M : V \to W$
is unitary if and only if $\langle M v_1, M v_2 \rangle_W = \langle v_1, v_2 \rangle_V$ for all $v_1, v_2 \in V$.
3.1.2 Convex Sets in Uniformly Convex Spaces
From the equality case of the Cauchy–Schwarz inequality, which is itself used
in the proof of the triangle inequality, it follows quickly that a norm in an
inner product space is strictly sub-additive (see Definition 2.17). Thus the
Mazur–Ulam theorem concerning isometries (Theorem 2.20) applies in par-
ticular to Hilbert spaces.
While the emphasis in this section is on Hilbert spaces, we will isolate a
more abstract convexity property which is precisely what is needed for several
proofs in this section.
Exercise 3.10. (a) Show that the norm in a Hilbert space is strictly sub-additive (see
Definition 2.17).
(b) Show that the norm in a uniformly convex vector space (as defined below) is strictly
sub-additive.
Definition 3.11. A normed vector space $(V, \| \cdot \|)$ is called uniformly convex
if
\[ \|x\|, \|y\| \leq 1 \implies \Bigl\| \frac{x + y}{2} \Bigr\| \leq 1 - \eta(\|x - y\|), \]
for all $x, y \in V$ where $\eta : [0, 2] \to [0, 1]$ is a monotonically increasing function
with $\eta(r) > 0$ for all $r > 0$.
Fig. 3.1: If $x$ and $y$ are not close to each other, then the mid-point $\frac{x+y}{2}$ is uniformly
closer to zero (independent of the choice of $x$ and $y$).
Heuristically, we can think of Definition 3.11 as having the following geometrical
meaning, illustrated in Figure 3.1. If vectors $x$ and $y$ have norm
(length) one, then their mid-point $\frac{x+y}{2}$ has significantly smaller norm unless
$x$ and $y$ are very close together. This accords closely with the geometrical
intuition from finite-dimensional spaces with Euclidean distance.
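Lemma 3.12. A Hilbert space $(H, \langle \cdot, \cdot \rangle)$ is uniformly convex.
Proof. For $x, y \in H$ with $\|x\|, \|y\| \leq 1$ we have
\begin{align*}
\Bigl\| \frac{x + y}{2} \Bigr\| &= \sqrt{\tfrac{1}{2} \|x\|^2 + \tfrac{1}{2} \|y\|^2 - \tfrac{1}{4} \|x - y\|^2} \\
&\leq \sqrt{1 - \tfrac{1}{4} \|x - y\|^2} = 1 - \eta(\|x - y\|)
\end{align*}
by the parallelogram identity, with $\eta(r) = 1 - \sqrt{1 - \tfrac{1}{4} r^2}$.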
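The modulus $\eta$ from this proof can be tested directly in Euclidean space. The following sketch assumes $H = \mathbb{R}^d$ with the Euclidean norm, and the function names are hypothetical; `convexity_slack` returns the difference between the two sides of the inequality in Definition 3.11, which should be non-negative whenever $\|x\|, \|y\| \leq 1$.

```python
import math

def eta(r):
    """Modulus from the proof of Lemma 3.12: 1 - sqrt(1 - r^2 / 4) for 0 <= r <= 2."""
    # max(...) guards against tiny negative values caused by rounding near r = 2.
    return 1.0 - math.sqrt(max(0.0, 1.0 - 0.25 * r * r))

def euclid_norm(v):
    """Euclidean norm on R^d."""
    return math.sqrt(sum(a * a for a in v))

def convexity_slack(x, y):
    """(1 - eta(||x - y||)) - ||(x + y) / 2||; non-negative if ||x||, ||y|| <= 1."""
    mid = [(a + b) / 2.0 for a, b in zip(x, y)]
    diff = [a - b for a, b in zip(x, y)]
    return (1.0 - eta(euclid_norm(diff))) - euclid_norm(mid)
```

For unit vectors the slack is zero up to rounding, reflecting that the first step in the proof is an equality and the second is an equality exactly when $\|x\| = \|y\| = 1$.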
The following theorem, whose conclusion is illustrated in Figure 3.2, will

have many important consequences for the study of Hilbert spaces.
Theorem 3.13 (Unique approximation within a closed convex set).
Let $(V, \| \cdot \|)$ be a Banach space with a uniformly convex norm, let $K \subseteq V$ be
a non-empty closed convex subset, and assume that $v_0 \in V$. Then there exists
a unique element $w \in K$ that is closest to $v_0$ in the sense that $w$ is the only
element of $K$ with
\[ \|w - v_0\| = \inf_{k \in K} \|k - v_0\|. \]
Fig. 3.2: The unique closest element $w$ of $K$ to $v_0$.
The following exercise shows that the unique existence of the best approx-
imation is by no means guaranteed.
Exercise 3.14. (1) Let $K \subseteq V$ be a non-empty compact subset of a normed vector
space $(V, \| \cdot \|)$ or let $V$ be finite-dimensional and $K$ closed. Show the existence of a best
approximation of any $v_0 \in V$ within $K$.
(2) Let $V = \mathbb{R}^2$ equipped with the norm $\| \cdot \|_\infty$ and let $K$ be the closed unit ball. Find a
point $v_0 \in V$ that has more than one best approximation within $K$. Describe the points
that have exactly one best approximation within $K$.
(3) Let $V = \ell^1_{\mathbb{R}}(\mathbb{N})$ equipped with the norm $\| \cdot \|_1$. Let
\[ K = \Bigl\{ (x_n) \in \ell^1_{\mathbb{R}}(\mathbb{N}) \Bigm| x_n \geq 0 \text{ and } \sum_{n=1}^\infty a_n x_n = 1 \Bigr\}, \]
where $(a_n)$ is a fixed sequence in $(0, 1)$ with $\lim_{n \to \infty} a_n = 1$. Show that $K$ is closed and
convex. Let $v_0 = 0$ and show that there is no best approximation of $v_0$ within $K$. Conclude
in particular from this that the closed unit ball in $\ell^1_{\mathbb{R}}(\mathbb{N})$ is not compact and that $\ell^1_{\mathbb{R}}(\mathbb{N})$ is
not uniformly convex.
Exercise 3.15. (8) Let $(X, \mathcal{B}, \mu)$ be a measure space with $L^p_\mu(X)$, for $p \in [1, \infty]$, the associated
function spaces.
(a) Show that $a^p + b^p \leq (a^2 + b^2)^{p/2}$ for any $p \in [2, \infty)$ and $a, b \geq 0$.
(b) Show that $\bigl\| \frac{f+g}{2} \bigr\|_p^p + \bigl\| \frac{f-g}{2} \bigr\|_p^p \leq \frac{1}{2} \|f\|_p^p + \frac{1}{2} \|g\|_p^p$ for any $p \in [2, \infty)$ and $f, g$ in $L^p_\mu(X)$.
(c) Deduce from (b) that $L^p_\mu(X)$ is uniformly convex for $p \in [2, \infty)$.
(d) Show that $L^1_\mu(X)$ and $L^\infty_\mu(X)$ are in general not uniformly convex.
Proof of Theorem 3.13. By translating both the set $K$ and the point $v_0$
by $-v_0$, we may assume without loss of generality that $v_0 = 0$. We define
\[ s = \inf_{k \in K} \|k - v_0\| = \inf_{k \in K} \|k\|. \]
If $s = 0$, then we must have $0 \in K$ since $K$ is closed, and the only choice is
then $w = v_0 = 0$ (the uniqueness of $w$ is a consequence of the strict positivity
of the norm). So assume that $s > 0$. By multiplying by the scalar $\frac{1}{s}$ we
may also assume without loss of generality that $s = 1$. Notice that once we
have found a point $w \in K$ with norm 1, then its uniqueness is an immediate
consequence of the uniform convexity: if $w_1, w_2 \in K$ have $\|w_1\| = \|w_2\| = 1$,
then $\frac{w_1 + w_2}{2} \in K$ because $K$ is convex. Also, $\bigl\| \frac{w_1 + w_2}{2} \bigr\| = 1$ by the triangle
inequality and since $s = 1$. By uniform convexity this implies that $w_1 = w_2$.
Existence: Turning to the existence, let us first give the idea of the proof.
Choose a sequence $(k_n)$ in $K$ with $\|k_n\| \to 1$ as $n \to \infty$. Then the mid-points
$\frac{k_n + k_m}{2}$ also lie in $K$, since $K$ is convex, and thus the mid-point must
have norm greater than or equal to 1, since $s = 1$. Therefore $k_n$ and $k_m$ must
be close together by uniform convexity, so $(k_n)$ is a Cauchy sequence. Since $V$
is complete and $K$ is closed, this will give a point $w \in K$ with $\|w\| = 1 = s$
as required.
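To make this more precise, it is easier to apply uniform convexity to the
normalized vectors
\[ x_n = \frac{1}{s_n} k_n, \]
where $s_n = \|k_n\|$. The mid-point of $x_n$ and $x_m$ can now be expressed as
\[ \frac{x_m + x_n}{2} = \frac{1}{2 s_m} k_m + \frac{1}{2 s_n} k_n = \Bigl( \frac{1}{2 s_m} + \frac{1}{2 s_n} \Bigr) (a k_m + b k_n) \]
with
\[ a = \frac{\frac{1}{2 s_m}}{\frac{1}{2 s_m} + \frac{1}{2 s_n}} > 0, \qquad b = \frac{\frac{1}{2 s_n}}{\frac{1}{2 s_m} + \frac{1}{2 s_n}} > 0, \]
and $a + b = 1$. Therefore $a k_m + b k_n \in K$ by convexity, and so
\[ \Bigl\| \frac{x_m + x_n}{2} \Bigr\| = \Bigl( \frac{1}{2 s_m} + \frac{1}{2 s_n} \Bigr) \|a k_m + b k_n\| \geq \frac{1}{2 s_m} + \frac{1}{2 s_n}. \]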
Let $\eta$ be as in Definition 3.11 and fix $\varepsilon > 0$. Choose $N = N(\varepsilon)$ large enough
to ensure that $m \geq N$ implies that
\[ \frac{1}{s_m} > 1 - \eta(\varepsilon). \]
Then $m, n \geq N$ implies that
\[ \frac{1}{2 s_m} + \frac{1}{2 s_n} > 1 - \eta(\varepsilon), \]
which together with the definition of uniform convexity gives
\[ 1 - \eta(\|x_m - x_n\|) \geq \Bigl\| \frac{x_m + x_n}{2} \Bigr\| > 1 - \eta(\varepsilon). \]
By monotonicity of the function $\eta$ this implies that
\[ \|x_m - x_n\| < \varepsilon \]
for all $m, n \geq N$, showing that $(x_n)$ is a Cauchy sequence. As $V$ is assumed
to be complete, we deduce that $(x_n)$ converges to some $x \in V$ with $\|x\| = 1$.
Since $k_n = s_n x_n$ and $s_n \to 1$ as $n \to \infty$, it follows that $\lim_{n \to \infty} k_n = x$.
As $K$ is closed the limit $x$ belongs to $K$ and by construction is an (hence the
unique) element in $K$ closest to $v_0 = 0$.
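A concrete instance of the theorem, offered only as an illustration: take $V = \mathbb{R}^d$ with the Euclidean norm (uniformly convex by Lemma 3.12) and let $K$ be an axis-parallel box, which is closed and convex. In this special case the closest point is obtained by clamping each coordinate; this shortcut is specific to boxes and is not the method of the proof. The names below are hypothetical.

```python
import itertools
import math

def closest_in_box(v0, lower, upper):
    """Closest point to v0 in the box prod [lower_i, upper_i] (Euclidean norm)."""
    return [min(max(c, lo), hi) for c, lo, hi in zip(v0, lower, upper)]

def distance(u, v):
    """Euclidean distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def box_grid(lower, upper, steps=10):
    """A finite grid of sample points of the box, used only for comparison."""
    axes = [[lo + (hi - lo) * j / steps for j in range(steps + 1)]
            for lo, hi in zip(lower, upper)]
    return itertools.product(*axes)
```

Comparing `distance(v0, closest_in_box(v0, lower, upper))` with the distance from `v0` to each grid point gives a quick sanity check that no sampled element of $K$ is closer.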
Definition 3.16. Let $H$ be a Hilbert space, and $A \subseteq H$ any subset. Then
the orthogonal complement of $A$ is defined to be
\[ A^\perp = \{ h \in H \mid \langle h, a \rangle = 0 \text{ for all } a \in A \}. \]
Corollary 3.17 (Orthogonal decomposition). Let $H$ be a Hilbert space,
and let $Y \subseteq H$ be a closed subspace. Then $Y^\perp$ is a closed subspace with
\[ H = Y \oplus Y^\perp, \]
meaning that every $h \in H$ can be written in the form $h = y + z$ with $y \in Y$
and $z \in Y^\perp$, and $y$ and $z$ are unique with these properties. Moreover, we
have $(Y^\perp)^\perp = Y$ and
\[ \|h\|^2 = \|y\|^2 + \|z\|^2 \tag{3.6} \]
if $h = y + z$ with $y \in Y$ and $z \in Y^\perp$.
In a two-dimensional real vector space, (3.6) is familiar as Pythagoras’
theorem.
Proof of Corollary 3.17. As $h \mapsto \langle h, y \rangle$ is a (continuous linear) functional
for each $y \in Y$, the set $Y^\perp$ is an intersection of closed subspaces and hence is
a closed subspace. Using strict positivity of the inner product, it is easy to see
that $Y \cap Y^\perp = \{0\}$; from this the uniqueness of the decomposition $h = y + z$
with $y \in Y$ and $z \in Y^\perp$ follows at once.
So it remains to show the existence of this decomposition. Fix $h \in H$, and
apply Theorem 3.13 (which is applicable since a Hilbert space is uniformly convex
by Lemma 3.12) with the closed convex set $K = Y$ to find a point $y \in Y$ that
is closest to $h$. Let $z = h - y$, so that for any $v \in Y$ and any scalar $t$ we have
\[ \|z\|^2 \leq \| h - \underbrace{(tv + y)}_{\in Y} \|^2 = \|z - tv\|^2 = \|z\|^2 - 2\Re\bigl(t \langle v, z \rangle\bigr) + |t|^2 \|v\|^2. \]
In particular, for $t \in \mathbb{R}$ the function $t \mapsto \|h - (tv + y)\|^2$ is a quadratic
polynomial with its minimum at $t = 0$. Taking the derivative at $t = 0$
gives $\Re \langle v, z \rangle = 0$ for all $v \in Y$. Similarly, restricting to $t = \mathrm{i}s$ with $s \in \mathbb{R}$
and taking the derivative once more it follows that $\Im \langle v, z \rangle = 0$ for all $v \in Y$.
Thus $z \in Y^\perp$, and hence
\[ \|h\|^2 = \langle h, h \rangle = \langle y + z, y + z \rangle = \|y\|^2 + \|z\|^2, \]
showing (3.6).
It is clear from the definitions that $Y \subseteq (Y^\perp)^\perp$. If $v \in (Y^\perp)^\perp$ then we
may write $v = y + z$ for some $y \in Y$ and $z \in Y^\perp$ by the first part of the
proof. However, $0 = \langle v, z \rangle = \|z\|^2$ implies that $v = y$ and so $(Y^\perp)^\perp = Y$.
An immediate consequence of Corollary 3.17 is the following.
Corollary 3.18 (Orthogonal projection). For a closed subspace $Y$ of a
Hilbert space $H$, the orthogonal projection onto $Y$, defined by
\[ P_Y : H \longrightarrow Y, \qquad h \longmapsto y \]
where $y$ is the unique element of $Y$ with $h - y \in Y^\perp$, is a bounded linear
operator with $\|P_Y\| \leq 1$ (and with $\|P_Y\| = 1$ unless $Y = \{0\}$) satisfying
(and characterized by) $\langle h, y \rangle = \langle P_Y h, y \rangle$ for all $h \in H$ and $y \in Y$. Moreover,
if $H = Y \oplus Y^\perp$, then the orthogonal decomposition from Corollary 3.17 is
given by
\[ h = P_Y h + P_{Y^\perp} h. \]
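In finite dimensions the orthogonal projection can be computed explicitly. The sketch below assumes $H = \mathbb{R}^d$ with the standard inner product and a subspace $Y$ given by a finite spanning list. It orthonormalizes the list by the Gram–Schmidt procedure and sums the projections onto the resulting vectors; this is one convenient way to compute $P_Y$, not the construction used in the proof. All names are hypothetical.

```python
import math

def dot(u, v):
    """Standard inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal list with the same span as `vectors`."""
    basis = []
    for v in vectors:
        r = list(v)
        for e in basis:
            c = dot(r, e)
            r = [a - c * b for a, b in zip(r, e)]
        n = math.sqrt(dot(r, r))
        if n > tol:  # skip vectors already (numerically) in the span
            basis.append([a / n for a in r])
    return basis

def project(h, vectors):
    """Orthogonal projection of h onto the span of `vectors`."""
    y = [0.0] * len(h)
    for e in orthonormalize(vectors):
        c = dot(h, e)
        y = [a + c * b for a, b in zip(y, e)]
    return y
```

With `y = project(h, vectors)` and `z = h - y`, the vector `z` should be orthogonal to every spanning vector, and the squared norms should satisfy (3.6).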
Recall that we write V ∗ = B(V, R) or B(V, C) for the dual space of a

normed vector space V , equipped with the operator norm. The following
gives our first classification of a dual space, and is crucial for the further
development of the theory as well as of its applications.
Corollary 3.19 (Fréchet–Riesz representation). For a Hilbert space $H$
the map sending $h \in H$ to $\phi(h) \in H^*$ defined by $\phi(h)(x) = \langle x, h \rangle$ is a linear
(resp. semi-linear in the complex case) isometric isomorphism between $H$ and
its dual space $H^*$.
Proof. By the axioms of the inner product, we know that $\phi$ is (semi-)linear.
By the Cauchy–Schwarz inequality and since $\phi(h)(h) = \|h\|^2$ we also know
that $\phi$ is isometric. It remains to show that $\phi$ is onto. Let
\[ \ell : H \to \mathbb{R} \ (\text{or } \mathbb{C}) \]
be a linear functional. Then $Y = \ker(\ell)$ is a closed linear subspace of $H$
(since $\ell$ is continuous). If $Y = H$ then $\ell = 0$ and so $\phi(0) = \ell$. So suppose
that $Y \neq H$, in which case we can choose† a non-zero element $z \in Y^\perp$.
We claim that
\[ \ell = \phi\Bigl( \frac{\overline{\ell(z)}}{\|z\|^2} z \Bigr). \]
Indeed, if $x \in H$ then $\ell(z) x - \ell(x) z \in \ker(\ell) = Y$ and so $\langle \ell(z) x - \ell(x) z, z \rangle = 0$
by choice of $z$. In other words, we have shown that $\ell(z) \langle x, z \rangle = \ell(x) \|z\|^2$,
which is equivalent to $\ell(x) = \bigl\langle x, \frac{\overline{\ell(z)}}{\|z\|^2} z \bigr\rangle$ for $x \in H$, as claimed.
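The formula in the proof can be tried out in $\mathbb{C}^d$. The sketch below assumes the standard inner product (linear in the first argument) and a functional `ell` supplied as a Python callable; the caller must supply a non-zero $z$ orthogonal to $\ker(\ell)$, as in the proof. The names are hypothetical.

```python
def inner(x, h):
    """Standard inner product on C^d: linear in x, semi-linear in h."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, h))

def norm_sq(v):
    """Squared norm <v, v>."""
    return inner(v, v).real

def riesz_vector(ell, z):
    """Representing vector (conj(ell(z)) / ||z||^2) z for a linear functional ell,
    where z is a non-zero vector orthogonal to ker(ell)."""
    scale = complex(ell(z)).conjugate() / norm_sq(z)
    return [scale * a for a in z]
```

For a functional of the form $\ell(x) = \sum_i c_i x_i$, any non-zero multiple of $(\overline{c_i})$ is orthogonal to $\ker(\ell)$, and `riesz_vector` then returns a vector $h$ with $\ell(x) = \langle x, h \rangle$ for all $x$.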
The following exercise shows that completeness is essential in Corollaries 3.17
and 3.19.
Exercise 3.20. Consider the space $\ell^2_c$ of sequences with finite support with the $\ell^2$ inner
product, and the subspace $V = \bigl\{ x \in \ell^2_c \bigm| \sum_{n \geq 1} \frac{x_n}{n} = 0 \bigr\}$. Show that $V$ is closed, and
that its orthogonal complement in $\ell^2_c$ is trivial (that is, equal to $\{0\}$). Deduce that the bounded linear functional
sending $(x_n) \in \ell^2_c$ to $\sum_{n \geq 1} \frac{x_n}{n} \in \mathbb{C}$ cannot be represented as $x \mapsto \langle x, y \rangle$ for any $y \in \ell^2_c$.
Exercise 3.21. The following is known as the Lax–Milgram lemma.
(a) Suppose that $H$ is a Hilbert space, and suppose that $B : H \times H \to \mathbb{R}$ (or $\mathbb{C}$) is bilinear
(or sesqui-linear in the complex case). Finally, assume that $B$ is bounded in the sense that
there is some $M > 0$ with
\[ |B(x, y)| \leq M \|x\| \|y\| \]
for all $x, y \in H$. Show that there exists a unique linear operator $T : H \to H$ with
\[ B(x, y) = \langle T x, y \rangle \]
for which $\|T\|_{\mathrm{op}} \leq M$.
(b) Assume in addition that $B$ is coercive, meaning that there exists some $c > 0$ such
that $|B(x, x)| \geq c \|x\|^2$ for all $x \in H$. Show in this case that the operator $T$ from (a) has
an inverse, and that $\|T^{-1}\|_{\mathrm{op}} \leq \frac{1}{c}$.
Exercise 3.22. Use Corollary 3.19 to show that if H is a Hilbert space, then H∗ is also a
Hilbert space, and exhibit a natural isometric isomorphism between H and H∗∗ .
Essential Exercise 3.23. Show that the completion of an inner product

space is a Hilbert space.
Exercise 3.24. Recall the definition of the Hardy space $H^2(D)$ (or the definition of the
Bergman space $A^2(D)$) for $D = B_1^{\mathbb{C}}$ from Exercise 2.61. Show that for every $a \in D$ there
is a function $k_a$ in $H^2(D)$ (respectively $k_a \in A^2(D)$) with
\[ f(a) = \langle f, k_a \rangle_{H^2(D)} \]
or
\[ f(a) = \langle f, k_a \rangle_{A^2(D)} \]
respectively. The function $D \times D \ni (a, w) \mapsto k_a(w)$ is called a reproducing kernel. Determine
$k_a$ explicitly in both cases.
† The space $Y^\perp$ is one-dimensional, since $\ell|_{Y^\perp}$ has trivial kernel. Hence $z$ is really uniquely
determined up to a scalar multiple.
We recall (and extend) the definition of linear hull as follows.
Definition 3.25. Let $(V, \| \cdot \|)$ be a normed vector space, and let $S \subseteq V$
be a subset. The linear hull of $S$, written $\langle S \rangle$, is the smallest subspace of $V$
containing $S$. Thus $\langle S \rangle$ consists of all linear combinations $\sum_{s \in F} c_s s$ for $F \subseteq S$
finite and scalars $c_s$. The closed linear hull $\overline{\langle S \rangle}$ is the smallest closed subspace
of $V$ containing $S$; it is the closure of the linear hull.
Corollary 3.26 (Characterization of the closed linear hull). Let $H$ be
a Hilbert space and $S \subseteq H$ a subset. Then $\overline{\langle S \rangle} = (S^\perp)^\perp$.
Proof. Let $Y = \overline{\langle S \rangle}$ be the closed linear hull. By orthogonal decomposition
of Hilbert spaces (see Corollary 3.17), $Y = (Y^\perp)^\perp$. We claim that $Y^\perp = S^\perp$,
which together with the last statement gives the corollary. To see the claim,
notice that for any $x \in H$ we have
\begin{align*}
x \in S^\perp &\iff \langle x, y \rangle = 0 \text{ for } y \in \langle S \rangle \\
&\iff \langle x, y \rangle = 0 \text{ for } y \in \overline{\langle S \rangle}
\end{align*}
by (semi-)linearity in the second argument and continuity of the inner
product.
We have seen that a closed subspace Y in a Hilbert space has a closed
orthogonal complement Y ⊥ , allowing the Hilbert space to be written as the
direct sum Y ⊕ Y ⊥ . The next two exercises explore the same question for
Banach spaces, where no such simple conclusion can be drawn.
Exercise 3.27. Let (V, k · k) be a normed space. Two subspaces V1 , V2 of V are said to be
algebraically complemented if V1 + V2 = V and V1 ∩ V2 = {0}. In that case we define linear
maps π : V → V /V2 by v 7→ v + V2 , φ : V1 × V2 → V by (v1 , v2 ) 7→ v1 + v2 , and P : V → V1
by v1 + v2 7→ v1 where v ∈ V, v1 ∈ V1 and v2 ∈ V2 . We may call P the projection of V
onto V1 along V2 . Show that the following are equivalent:
(1) V1 and V2 are closed subspaces of V and the map π|V1 is a homeomorphism
(where V /V2 is equipped with the quotient norm from Lemma 2.15);
(2) the map φ is a homeomorphism (where V1 × V2 is equipped with any of the norms
from Exercise 2.9);
(3) P is a bounded operator.
If any of these equivalent conditions hold, then the subspaces are called topologically com-
plemented.
A closed subspace W of a normed space V is called complemented if there

is a closed subspace W ′ with the property that W and W ′ are topologic-
ally complemented. The next exercise shows that a closed subspace is not
necessarily complemented.(9)
Exercise 3.28. Prove that the closed subspace $c_0 = \{ x = (x_n) \in \ell^\infty \mid \lim_{n \to \infty} x_n = 0 \}$ is
not complemented in $\ell^\infty$ as follows.
(1) Show that $(\ell^\infty)^*$ contains a countable subset $A$ with the property that if $x \in \ell^\infty$
has $a(x) = 0$ for all $a \in A$ then $x = 0$, and deduce that the same holds for any space
isomorphic to a closed subspace of $\ell^\infty$.
Using the following steps, show that $V = \ell^\infty / c_0$ does not have the property in (1) and
hence that $c_0$ cannot be complemented by Exercise 3.27.
(2) Use an enumeration of $\mathbb{Q}$ to construct for each $i \in I = \mathbb{R} \setminus \mathbb{Q}$ a sequence
\[ x^{(i)} = (x_n^{(i)}) \in \ell^\infty \]
with values in $\{0, 1\}$ and with infinite support
\[ \operatorname{Supp}(x^{(i)}) = \{ n \in \mathbb{N} \mid x_n^{(i)} = 1 \} \]
in such a way that $\operatorname{Supp}(x^{(i)}) \cap \operatorname{Supp}(x^{(j)})$ is finite for all $i \neq j$.
(3) Show that for any finite non-empty subset $J \subseteq I$ and any list of numbers $(b_i)_{i \in J}$
with $|b_i| = 1$ for all $i \in J$ we have
\[ \Bigl\| \sum_{i \in J} b_i x^{(i)} \Bigr\|_{\ell^\infty / c_0} = 1. \]
(4) Deduce from (3) that for any continuous linear functional $f \in V^*$ and $n \in \mathbb{N}$ the
set $\{ i \in I \mid |f(x^{(i)})| > \frac{1}{n} \}$ is finite. Conclude that for any countable subset $A$ of $V^*$ there
is some $i \in I$ with the property that $a(x^{(i)}) = 0$ for all $a \in A$.
3.1.3 An Application to Measure Theory
We will show in this section how the results from Section 3.1.2 can be used in
measure theory. Before stating the main result, we recall some definitions. A
measure $\nu$ is absolutely continuous with respect to another measure $\mu$, written
$\nu \ll \mu$, if any measurable set $N$ with $\mu(N) = 0$ must also satisfy $\nu(N) = 0$.
Two measures $\mu$ and $\nu$ are singular with respect to each other, written $\nu \perp \mu$,
if there exist disjoint measurable sets $X_\mu, X_\nu \subseteq X$ with $X = X_\mu \sqcup X_\nu$ and
with $\nu(X_\mu) = 0 = \mu(X_\nu)$. Finally, recall that a measure $\mu$ is σ-finite if there
is a decomposition of $X$ into measurable sets,
\[ X = \bigsqcup_{i=1}^\infty X_i, \]
with $\mu(X_i) < \infty$ for all $i \geq 1$.
Proposition 3.29 (Lebesgue decomposition, Radon–Nikodym derivative).
Let $\mu$ and $\nu$ be two σ-finite measures on a measurable space $(X, \mathcal{B})$.
Then $\nu$ can be decomposed as $\nu = \nu_{\mathrm{abs}} + \nu_{\mathrm{sing}}$ into the sum of two σ-finite
measures with $\nu_{\mathrm{abs}} \ll \mu$ and $\nu_{\mathrm{sing}} \perp \mu$. Moreover, there exists a ($\mu$-almost
everywhere uniquely determined) measurable function $f \geq 0$, called
the Radon–Nikodym derivative and often denoted by $f = \frac{d\nu_{\mathrm{abs}}}{d\mu}$, such that
\[ \nu_{\mathrm{abs}}(B) = \int_B f \, d\mu \]
for any measurable $B \subseteq X$.
Proof. Suppose that $\mu$ and $\nu$ are both finite measures (the general case can
be reduced to this case by using the assumption that $\mu$ and $\nu$ are both σ-finite;
see Exercise 3.31).
We define a new measure $m = \mu + \nu$ and will work with the real Hilbert
space $H = L^2_m(X)$. On this Hilbert space we define a linear functional $\phi$ by
\[ \phi(g) = \int g \, d\nu \]
for $g \in H = L^2_m(X)$. Note that equivalence of functions modulo $m$ implies
equivalence of functions modulo $\nu$, and moreover that for $g \in H$ we have
\[ \int |g| \, d\nu \leq \int |g| \, dm \leq \|g\|_{L^2_m} \|1\|_{L^2_m}, \]
where we have used the fact that $m = \mu + \nu$, that $\mu$ is a positive measure,
and the Cauchy–Schwarz inequality on $H$. Therefore, $\phi(g)$ is well-defined
for $g \in H$ and satisfies $|\phi(g)| \leq \|g\|_{L^2_m} \|1\|_{L^2_m}$. By Fréchet–Riesz representation
(Corollary 3.19) there is some $k \in H = L^2_m$ such that
\[ \int g \, d\nu = \phi(g) = \int g k \, dm \tag{3.7} \]
for all $g \in L^2_m(X)$. We claim that $k$ takes values in $[0, 1]$ almost surely with
respect to $m$. Indeed, for any $B \in \mathcal{B}$ we have $0 \leq \nu(B) \leq m(B)$, so (using
(3.7) with $g = 1_B$)
\[ 0 \leq \int_B k \, dm \leq m(B). \]
Using the choices $B = \{ x \in X \mid k(x) < 0 \}$ and $B = \{ x \in X \mid k(x) > 1 \}$
implies the claim that $k$ takes values in $[0, 1]$, $m$-almost surely. Modifying $k$
if necessary we will assume that $k$ only takes values in $[0, 1]$.
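Since $m = \mu + \nu$, we can reformulate (3.7) as
\[ \int g (1 - k) \, d\nu = \int g k \, d\mu. \tag{3.8} \]
This holds by construction for all simple functions $g$, and hence for all non-negative
measurable functions by monotone convergence. Now define
\[ X_{\mathrm{sing}} = \{ x \in X \mid k(x) = 1 \}, \qquad X_\mu = X \setminus X_{\mathrm{sing}} = \{ x \in X \mid k(x) < 1 \}, \]
and $\nu_{\mathrm{sing}} = \nu|_{X_{\mathrm{sing}}}$. By definition, $\nu_{\mathrm{sing}}(X_\mu) = 0$, and by equation (3.8)
applied with $g = 1_{X_{\mathrm{sing}}}$ we also have $\mu(X_{\mathrm{sing}}) = 0$. Therefore $\nu_{\mathrm{sing}} \perp \mu$. We
also define $\nu_{\mathrm{abs}} = \nu|_{X_\mu}$ so that $\nu = \nu_{\mathrm{sing}} + \nu_{\mathrm{abs}}$. Finally, define the function
\[ f = \begin{cases} \frac{k}{1 - k} & \text{on } X_\mu, \\ 0 & \text{on } X_{\mathrm{sing}}. \end{cases} \]
For any measurable $g \geq 0$ we may now apply (3.8) to obtain
\[ \int_X g \, d\nu_{\mathrm{abs}} = \int_{X_\mu} \frac{g}{1 - k} (1 - k) \, d\nu = \int_{X_\mu} \frac{g}{1 - k} k \, d\mu = \int_X g f \, d\mu, \]
which shows that $f = \frac{d\nu_{\mathrm{abs}}}{d\mu}$ is a Radon–Nikodym derivative and also shows
that $\nu_{\mathrm{abs}} \ll \mu$.
If $f_1, f_2$ are both Radon–Nikodym derivatives of $\nu_{\mathrm{abs}}$ with respect to $\mu$,
then
\[ \int_B f_1 \, d\mu = \nu_{\mathrm{abs}}(B) = \int_B f_2 \, d\mu \]
for all measurable $B \subseteq X$, which implies that $f_1 = f_2$ $\mu$-almost surely by
considering $B = \{ x \in X \mid f_1(x) < f_2(x) \}$ and $B = \{ x \in X \mid f_1(x) > f_2(x) \}$.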
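On a finite set the construction in this proof becomes a short computation, since a measure is just a list of point weights. The sketch below is an illustration under that assumption (finite $X$, all subsets measurable, exact rational weights); the function name and the dictionary representation are hypothetical. It follows the proof step by step: form $m = \mu + \nu$, take $k = d\nu/dm$, split $X$ according to whether $k = 1$, and set $f = k/(1-k)$ on $X_\mu$.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, nu):
    """mu, nu: dicts mapping each point of a finite set X to a weight >= 0.
    Returns (f, nu_abs, nu_sing) as dicts over the same points."""
    f, nu_abs, nu_sing = {}, {}, {}
    for x in mu:
        m = Fraction(mu[x]) + Fraction(nu[x])
        # k is the density of nu with respect to m; on m-null points its
        # value is irrelevant, and we pick 0.
        k = Fraction(nu[x]) / m if m != 0 else Fraction(0)
        if k == 1:  # x lies in X_sing
            f[x] = Fraction(0)
            nu_abs[x] = Fraction(0)
            nu_sing[x] = Fraction(nu[x])
        else:       # x lies in X_mu
            f[x] = k / (1 - k)
            nu_abs[x] = Fraction(nu[x])
            nu_sing[x] = Fraction(0)
    return f, nu_abs, nu_sing
```

In this discrete setting $f(x) = \nu(x)/\mu(x)$ wherever $\mu(x) > 0$, and $\nu_{\mathrm{sing}}$ is carried by the points with $\mu(x) = 0 < \nu(x)$.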

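Exercise 3.30. For every $n \geq 1$, let $(X_n, \mathcal{B}_n, \mu_n)$ be a measure space.
(a) Define $X = \bigsqcup_{n=1}^\infty X_n$. Show that $\mathcal{B} = \{ B \subseteq X \mid B \cap X_n \in \mathcal{B}_n \text{ for every } n \geq 1 \}$ defines
a σ-algebra on $X$.
(b) For every $B \in \mathcal{B}$ as in (a) define $\mu(B) = \sum_{n=1}^\infty \mu_n(B \cap X_n)$ and show that $\mu$ is a
measure on $\mathcal{B}$.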
Essential Exercise 3.31. Complete the proof of Proposition 3.29 in the σ-

finite case.
Exercise 3.32. Use Lemma 2.63 and Proposition 3.29 to calculate the norm of the functional
\[ C_c(X) \ni f \longmapsto \int f \, d\mu - \int f \, d\nu \]
for two finite Borel measures $\mu$ and $\nu$ on a locally compact metric space $X$ (which may or
may not be mutually singular).
Exercise 3.33. Let $(X, \mathcal{B})$ be a measurable space and denote the space of signed measures
on $X$ (as defined in Section B.5) by $\mathcal{M}(X)$.
(a) Given a signed measure $d\nu = g \, d\mu$ with a finite measure $\mu$ and $g \in L^1_\mu(X)$, define $\|\nu\|$
to be $\|g\|_{L^1(\mu)}$. Show that this yields a well-defined norm on $\mathcal{M}(X)$.
(b) Show that $\mathcal{M}(X)$ is a Banach space with respect to this norm.
The notion of conditional expectation with respect to a sub-σ-algebra is a
powerful tool, and finds wide applications in probability (see Loève [64], [65])
and ergodic theory (see [27], for example). Roughly speaking it is an ‘orthogonal
projection’ defined on $L^1$, and we invite the reader to construct it in
this manner in the following exercise.
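Exercise 3.34. Let $(X, \mathcal{B}, \mu)$ be a probability space, and let $\mathcal{A} \subseteq \mathcal{B}$ be a sub-σ-algebra.
(a) Show that there exists a bounded operator, called the conditional expectation,
\[ E_\mu(\,\cdot \mid \mathcal{A}) : L^1_\mu(X, \mathcal{B}) \longrightarrow L^1_\mu(X, \mathcal{A}), \qquad f \longmapsto E_\mu(f \mid \mathcal{A}) \]
such that
\[ \int_A f \, d\mu = \int_A E_\mu(f \mid \mathcal{A}) \, d\mu \tag{3.9} \]
for all $A \in \mathcal{A}$.
(b) Show that (3.9) uniquely characterizes $E_\mu(f \mid \mathcal{A}) \in L^1_\mu(X, \mathcal{A})$ as an equivalence class.
(c) Show that $f \in L^1_\mu(X, \mathcal{B})$ and $g \in L^\infty_\mu(X, \mathcal{A})$ implies that $E_\mu(fg \mid \mathcal{A}) = g E_\mu(f \mid \mathcal{A})$.
(d) Show that $\|E_\mu(f \mid \mathcal{A})\|_1 \leq \|f\|_1$ for $f \in L^1_\mu(X, \mathcal{B})$.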
3.2 Orthonormal Bases and Gram–Schmidt
Definition 3.35. A finite or countable list $(x_n)$ of vectors in an inner product
space $(V, \langle \cdot, \cdot \rangle)$ is called orthonormal if
\[ \langle x_m, x_n \rangle = \delta_{mn} = \begin{cases} 1 & \text{if } m = n, \\ 0 & \text{if } m \neq n \end{cases} \]
for all $m, n \geq 1$. In other words, we require that all the vectors have length
one, and are mutually orthogonal.
As one might expect, this notion is fundamental for Hilbert spaces, and
gives rise to the following satisfying abstract result, which as we will see lays
the ground for Fourier analysis.
Proposition 3.36 (The closed linear hull of an orthonormal list).
Let $H$ be a Hilbert space. Then the closed linear hull of an orthonormal
list $(x_n)$ is given by
\[ \overline{\langle \{x_n\} \rangle} = \Bigl\{ \sum_n a_n x_n \Bigm| \text{the sum converges in } H \Bigr\}, \]
where the sum $v = \sum_n a_n x_n$ converges in $H$ if and only if $\sum_n |a_n|^2 < \infty$. In
that case we also have $\|v\| = \bigl( \sum_n |a_n|^2 \bigr)^{1/2}$ and $\langle v, x_m \rangle = a_m$ for $m \geq 1$.
Hence the linear map $\phi$ that sends the sequence $(a_n)$ with $\sum_n |a_n|^2 < \infty$
to $\sum_n a_n x_n \in \overline{\langle \{x_n\} \rangle}$ is a unitary isomorphism of Hilbert spaces.
We note that the series $\sum_n a_n x_n$ need not be absolutely convergent,
since $\ell^2(\mathbb{N}) \supsetneq \ell^1(\mathbb{N})$.
Proof of Proposition 3.36. Suppose first that $(x_1, \dots, x_N)$ is a finite
orthonormal list. Then we may define a map $\phi$ from $\mathbb{K}^N$ (with $\mathbb{K}$ being the
field of scalars $\mathbb{R}$ or $\mathbb{C}$) to $H$ by setting $\phi((a_n)) = \sum_{n=1}^{N} a_n x_n$. Using the
assumption of orthonormality it follows that
\[
\|\phi((a_n))\|^2 = \sum_{m,n} \langle a_m x_m, a_n x_n \rangle = \sum_n |a_n|^2 = \|(a_n)\|_2^2 \tag{3.10}
\]
and
\[
\langle \phi((a_n)), x_j \rangle = \sum_n \langle a_n x_n, x_j \rangle = a_j \tag{3.11}
\]
for $j = 1, \dots, N$. This proves the statements in the case of a finite list.

Now suppose that the list is infinite. In this case we define the space
\[
c_c(\mathbb{N}) = \bigl\{ (a_n) \in \ell^2(\mathbb{N}) \mid a_n = 0 \text{ for all but finitely many } n \bigr\}
\]
and the linear map $\phi : c_c(\mathbb{N}) \to H$ by
\[
(a_n) \longmapsto \sum_{n=1}^{\infty} a_n x_n,
\]
where the sum is actually finite by definition of the space $c_c(\mathbb{N})$, so that
properties (3.10) and (3.11) clearly still hold in this case. We note that $\phi(c_c(\mathbb{N}))$
is the linear hull of the set $\{x_n\}$.
Now notice that $c_c(\mathbb{N}) \subseteq \ell^2(\mathbb{N})$ is dense. Indeed, if $a = (a_n) \in \ell^2(\mathbb{N})$ and
we define
\[
a_n^{(N)} =
\begin{cases}
a_n & \text{if } n \leq N; \\
0 & \text{if } n > N
\end{cases}
\]
for $N \in \mathbb{N}$, then
\[
\bigl\| (a_n) - (a_n^{(N)}) \bigr\|_2^2 = \sum_{n=N+1}^{\infty} |a_n|^2 \longrightarrow 0
\]
as $N \to \infty$. By Proposition 2.59 there is therefore a unique extension of $\phi$
to $\ell^2(\mathbb{N})$, defined by
\[
\phi((a_n)) = \lim_{N \to \infty} \phi((a_n^{(N)})). \tag{3.12}
\]
By continuity of the norm and the inner product, properties (3.10) and (3.11)
extend to all of $\ell^2(\mathbb{N})$. By (3.10), $\phi$ is an isometry from $\ell^2(\mathbb{N})$ onto its image,
so the image is complete and therefore closed in $H$. Since $\phi(c_c(\mathbb{N})) = \langle \{x_n\} \rangle$,
it follows that $\phi(\ell^2(\mathbb{N})) = \overline{\langle \{x_n\} \rangle}$ is the closed linear hull. Finally (3.12) shows
that the series $\sum_{n=1}^{\infty} a_n x_n$ converges if $(a_n) \in \ell^2(\mathbb{N})$, and if $\sum_{n=1}^{\infty} a_n x_n$
converges, then (3.10) (applied to the partial sums) also implies that $\sum_{n=1}^{\infty} |a_n|^2$
converges.
The argument in the proof above can also be used for orthogonal subspaces
as in the following exercise.
Essential Exercise 3.37. (a) Let $(H_n)$ be a finite or countable list of Hilbert
spaces. Then we define the direct Hilbert space sum
\[
\bigoplus_n H_n = \Bigl\{ (v_n) \Bigm| v_n \in H_n \text{ and } \sum_n \|v_n\|^2 < \infty \Bigr\}
\]
to consist of all ‘square summable sequences’. Show that
\[
\langle (v_n), (w_n) \rangle_{\oplus} = \sum_n \langle v_n, w_n \rangle
\]
defines an inner product on the direct sum, making it into a Hilbert space.
(b) Let $H$ be a Hilbert space and $(H_n)$ a finite or countable list of mutually
orthogonal closed subspaces of $H$. Show that there is a canonical isometric
isomorphism
\[
\phi : \bigoplus_n H_n \longrightarrow \overline{\Bigl\langle \bigcup_n H_n \Bigr\rangle}
\]
analogous to Proposition 3.36, and describe the inverse map of $\phi$ using
orthogonal projections.
Definition 3.38. A list of orthonormal vectors in a Hilbert space H is said
to be complete (or to be an orthonormal basis) if its closed linear hull is H.
We note that strictly speaking this notion of a Schauder basis of an infinite-
dimensional Hilbert space does not coincide with the notion of a basis in the
sense of linear algebra as we allow (in contrast to the standard definition)
infinite converging sums to represent arbitrary vectors as linear combinations
of the basis vectors. We invite the reader to compare our discussion here with
the proof of the existence of a basis for an infinite-dimensional vector space
(relying on the axiom of choice and often called a Hamel basis), and hope
that the reader agrees with us that the notion of an orthonormal basis of a
Hilbert space is much more natural than the notion of a Hamel basis in our
context. More importantly, the notion of an orthonormal basis will prove to
be much more useful in the following discussions.
Theorem 3.39 (Gram–Schmidt). Every separable Hilbert space $H$ has an
orthonormal basis. If $H$ is $n$-dimensional, then $H$ is isomorphic to $\mathbb{R}^n$ or $\mathbb{C}^n$.
If $H$ is not finite-dimensional, then $H$ is isomorphic to $\ell^2(\mathbb{N})$.
Here isomorphic means isomorphic as Hilbert spaces, so there is a linear
bijection between the spaces that preserves the inner product. The proof
of Theorem 3.39 is simply an interpretation of the familiar Gram–Schmidt
orthonormalization procedure.
Proof of Theorem 3.39. Let {y1 , y2 , . . . } ⊆ H be a dense countable subset.
We are going to use the vectors {yn } to construct an orthonormal list of
vectors which has the same linear hull. This is built up from the simple
geometrical observation that if a vector v does not lie in the linear span of a
finite set of vectors, then something from the linear span may be added to v
to produce a non-zero vector orthogonal to the linear span.
We may assume that $y_1 \neq 0$ and define $x_1 = \frac{y_1}{\|y_1\|}$. Suppose now that
we have already constructed orthonormal vectors $x_1, \dots, x_n$ by using the
vectors $y_1, \dots, y_k$ with $k \geq n$ in such a way that
\[
V_n = \langle x_1, \dots, x_n \rangle = \langle y_1, \dots, y_k \rangle.
\]
If $y_{k+1} \in V_n$ we simply increase $k$ but not $n$. If $y_{k+1} \notin V_n$ we decompose $y_{k+1}$
into a sum $v + w$ with $v \in V_n$ and $w \in V_n^{\perp}$ using the orthogonal decomposition
in Hilbert spaces (see Corollary 3.17). Since $w \neq 0$, we may define $x_{n+1} = \frac{w}{\|w\|}$
and obtain an extended list of orthonormal vectors satisfying
\[
V_{n+1} = \langle x_1, \dots, x_{n+1} \rangle = \langle y_1, \dots, y_{k+1} \rangle.
\]
Continuing this construction, we see that either H is n-dimensional for

some n > 0 and we have produced an orthonormal basis {x1 , . . . , xn } for H,
or that H is infinite-dimensional, and we construct an infinite list x1 , x2 , . . .
of orthonormal vectors in H. In this case the linear hull of {x1 , x2 , . . . } con-
tains the original dense set {y1 , y2 , . . . } and so the closed linear hull must
be all of H, showing that {xn } is an orthonormal basis of H. The remaining
statement that H is isomorphic to ℓ2 (N) follows from Proposition 3.36.
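The inductive step of the proof can be tried out in finite dimensions. The following is a minimal sketch, assuming vectors in C^d are stored as Python lists; the names inner and gram_schmidt and the numerical tolerance tol (which stands in for the exact test "y_{k+1} lies in V_n") are illustrative choices and not part of the text.

```python
import math

def inner(u, v):
    """Inner product on C^d, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def gram_schmidt(ys, tol=1e-12):
    """Return an orthonormal list with the same linear hull as ys."""
    xs = []
    for y in ys:
        # Subtract the component of y in V_n = <xs>; what remains is w in V_n^perp.
        w = list(y)
        for x in xs:
            c = inner(w, x)
            w = [wi - c * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:
            # y was not in V_n (up to tol): append x_{n+1} = w / ||w||.
            xs.append([wi / norm for wi in w])
        # Otherwise y lies in V_n: move on to the next y without extending xs.
    return xs
```

For instance, gram_schmidt([[1, 1, 0], [2, 2, 0], [1, 0, 1]]) skips the second vector, since it lies in the span of the first, and returns two orthonormal vectors.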
Exercise 3.40. Give a direct proof that the closed unit ball in an infinite-dimensional
Hilbert space is not compact by using the material from this section.
Exercise 3.41. Recall the Hardy and Bergman spaces $H^2(\mathbb{D})$ and $A^2(\mathbb{D})$ on the unit
disk $\mathbb{D} = B_1^{\mathbb{C}}$ from Exercise 2.61. Describe the spaces $H^2(\mathbb{D})$ and $A^2(\mathbb{D})$ in terms of the
sequence of Taylor coefficients $(a_n)$ of the Taylor expansion $f(z) = \sum_{n \geq 0} a_n z^n$ of elements
of the space.
As we have noted above, a convergent sum obtained from an orthonormal

list may not always be absolutely convergent. However, as we will show now, it
is always unconditionally convergent in the sense that the order of summation
is irrelevant.
Corollary 3.42. Let $(x_n)$ be a countable orthonormal list in a Hilbert space $H$
and let $(a_n) \in \ell^2(\mathbb{N})$. Then $\sum_{n=1}^{\infty} a_n x_n$ converges unconditionally, meaning
that for any permutation $\sigma : \mathbb{N} \to \mathbb{N}$ we have $\sum_{m=1}^{\infty} a_{\sigma(m)} x_{\sigma(m)} = \sum_{n=1}^{\infty} a_n x_n$.
In particular, it makes sense to speak of a countable orthonormal basis even
if we do not specify an enumeration of the basis.
Proof. Let $(x_n)$, $(a_n)$, and $\sigma : \mathbb{N} \to \mathbb{N}$ be as in the corollary. By
Proposition 3.36 the series $v = \sum_{n=1}^{\infty} a_n x_n$ converges, and since
\[
\sum_{m=1}^{\infty} |a_{\sigma(m)}|^2 = \sum_{n=1}^{\infty} |a_n|^2
\]
the same applies to $w = \sum_{m=1}^{\infty} a_{\sigma(m)} x_{\sigma(m)}$. Also by Proposition 3.36 we
have $\langle v, x_n \rangle = a_n$ and $\langle w, x_{\sigma(m)} \rangle = a_{\sigma(m)}$ for all $m, n \in \mathbb{N}$. As $\sigma$ is a
permutation we see that $\langle v - w, x_n \rangle = 0$ for all $n \in \mathbb{N}$. As $v - w$ belongs to the closed
linear hull of $\{x_n\}$, it follows that $v = w$, as claimed.
Suppose now B ⊆ H is a countable set consisting of mutually ortho-
gonal unit vectors with dense linear hull. Then we may choose an enumer-
ation B = {xn | n ∈ N} and obtain an orthonormal basis in the sense of
Definition 3.38. By the above, the properties of the orthonormal basis and
also the coordinates $\langle v, x \rangle$ of $v \in H$ associated to a given element $x \in B$
remain unchanged if a different enumeration is used.
3.2.1 The Non-Separable Case
While the motivation generated by natural examples and the notational con-
venience of thinking of countable collections as sequences incline one strongly
to the separable case, there is no reason to restrict attention completely to
separable Hilbert spaces.
Example 3.43. Let $I$ be a set, equipped with the discrete topology and the
counting measure $\lambda_{\mathrm{count}}$ defined on the σ-algebra $\mathcal{P}(I)$ of all subsets of $I$.
Then $\ell^2(I) = L^2(I, \mathcal{P}(I), \lambda_{\mathrm{count}})$ is a Hilbert space, and it comprises all
functions $a : I \to \mathbb{R}$ (or $\mathbb{C}$) for which the support $\operatorname{Supp}(a) = \{ i \in I \mid a_i \neq 0 \}$ is
finite or countable, and for which $\sum_{i \in I} |a_i|^2 = \sum_{i \in \operatorname{Supp}(a)} |a_i|^2 < \infty$.
Theorem 3.44 (Non-separable Gram–Schmidt). Let $H$ be a Hilbert
space that is not separable. Then there is an orthonormal basis consisting of
elements $x_i$ for every $i$ in an uncountable index set $I$. Moreover, we have an
isomorphism $H \cong \ell^2(I)$, where the isomorphism between $H$ and $\ell^2(I)$ is given
by
\[
\ell^2(I) \ni a \longmapsto \sum_{i \in \operatorname{Supp}(a)} a_i x_i
\]
and the sum on the right is countable and convergent.
Proof. We will construct a maximal orthonormal set of vectors by using
Zorn’s lemma (see Appendix A.1). Define a partially ordered† set
\[
\mathcal{F} = \{ (I, x_\cdot) \mid \text{the function } x_\cdot : I \to H \text{ has orthonormal image} \},
\]
with partial order defined by $(I, x_\cdot) \preccurlyeq (J, y_\cdot)$ if $I \subseteq J$ and $x_\cdot = y_\cdot|_I$. In this
partially ordered set every totally ordered subset (or chain) has an upper
bound, which can be found by simply taking the union of the index sets and
the natural extension of the partially defined functions to the union.
† In order to ensure that this definition does indeed define a set, we could add the
requirement that $I$ is a subset of $H$, and let $x_\cdot$ be the identity.
It follows that there exists a maximal element $(I, x_\cdot)$ of this partially
ordered set by Zorn’s lemma. Using this, define an isometry $\phi : \ell^2(I) \to H$
by
\[
a \longmapsto \sum_{i \in \operatorname{Supp}(a)} a_i x_i
\]
first on the subset of all elements $a \in \ell^2(I)$ with $|\operatorname{Supp}(a)| < \infty$, and then,
by applying the automatic extension to the closure (Proposition 2.59), on
all of $\ell^2(I)$. This again defines an isomorphism from $\ell^2(I)$ to the complete,
and hence closed, subspace $Y = \phi(\ell^2(I)) \subseteq H$. We claim that $Y = H$, for
otherwise there would exist some $x \in Y^{\perp}$ of norm one by the orthogonal
decomposition of Hilbert spaces (Corollary 3.17), and using this element $x$
we can define a new element of $\mathcal{F}$ which is strictly bigger than the maximal
element $(I, x_\cdot)$ in the partial order. This contradiction shows the claim, and
hence proves the theorem.
3.3 Fourier Series on Compact Abelian Groups
Definition 3.45. A topological group is a group $G$ that carries a topology
with respect to which the maps $(g, h) \mapsto gh$ and $g \mapsto g^{-1}$ are continuous as
maps $G \times G \to G$ and $G \to G$ respectively. A compact (σ-compact, locally
compact, and so on) group is a topological group for which the topological
space is compact (σ-compact, locally compact, and so on). We similarly ex-
tend other topological and algebraic properties to topological groups. For
example, a metric compact abelian group is a compact metric topological
space with an abelian group structure satisfying the continuity conditions
above.
Below we will be largely concerned with specific metric abelian groups, in
which the circle or 1-torus
\[
\mathbb{T} = \mathbb{R}/\mathbb{Z} \cong \mathbb{S}^1
\]
with its metric inherited from the usual metric on $\mathbb{R}$ (see the footnote on p. 2)
and the $d$-torus
\[
\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d \cong (\mathbb{S}^1)^d
\]
are the main examples (which will also be discussed in Section 3.4 from a
slightly different, more concrete, point of view). The notation $\mathbb{T}$ will be used
for the additive circle and $\mathbb{S}^1 = \{ z \in \mathbb{C} \mid |z| = 1 \}$ for the multiplicative circle.
Here compact and abelian are necessary assumptions for the type of result we
prove, and dropping either of these two assumptions changes the theory sig-
nificantly. However, we assume metrizability mainly for convenience, because
it gives separability of C(G) by Lemma 2.45.
We will use the following facts about compact abelian groups as ‘black
boxes’ (that is, we will not need to know how they are proved at this stage).
However, we will also see in some examples below that these are often easy
to prove if the group is given concretely.
Theorem (Existence of Haar measure†(10) ). Every locally compact σ-
compact metric group G has a left Haar measure mG , satisfying (and, up to
positive multiples, characterized by) the properties:
• mG (K) < ∞ for any compact set K ⊆ G;
• mG (O) > 0 for any non-empty open set O ⊆ G; and
• mG (gB) = mG (B) for all measurable B ⊆ G and g ∈ G.
We will usually be dealing with σ-compact metrizable groups, which sim-
plifies the measure theory needed, but the existence of Haar measure only
requires the group to be locally compact. For G = Td , which as a measur-
able space can be identified with [0, 1)d , the Haar measure is simply the d-
dimensional Lebesgue measure restricted to [0, 1)d .
Exercise 3.46. Show that the Lebesgue measure on [0, 1)d considered as a measure on Td
satisfies all the properties of the Haar measure.
Knowing the defining third property $m_G(x + B) = m_G(B)$ for all Borel
subsets $B \subseteq G$, the formula
\[
\int_G f(x) \, dm(x) = \int_G f(g + x) \, dm(x) \tag{3.13}
\]
follows immediately for simple functions and then by monotone convergence
also for positive integrable functions, and then by taking differences for all
integrable functions.
Before stating the next fact, we recall that a (unitary) character on a
topological group G is a continuous homomorphism
\[
\chi : G \longrightarrow \mathbb{S}^1 = \{ z \in \mathbb{C} \mid |z| = 1 \}. \tag{3.14}
\]
The trivial character is the character defined by $\chi(g) = 1$ for all $g \in G$. A
collection $\mathcal{F}$ of functions on $G$ (and, in particular, a collection of characters)
is said to separate points if for any $g, g' \in G$ with $g \neq g'$ there is some $f \in \mathcal{F}$
with $f(g) \neq f(g')$.
Theorem (Completeness of characters‡ ). On every locally compact σ-
compact metric abelian group G there are enough characters to separate
points.
For $G = \mathbb{T}$ this is trivial, because the single character $\chi$ defined by
\[
\chi(x + \mathbb{Z}) = e^{2\pi i x}
\]
for $x \in \mathbb{R}$ already separates points since it is an isomorphism between $\mathbb{T}$
and $\mathbb{S}^1$. For $G = \mathbb{T}^d$ the characters $\chi_1, \dots, \chi_d$, where
\[
\chi_j(x + \mathbb{Z}^d) = e^{2\pi i x_j}
\]
for $x = (x_1, \dots, x_d)^t \in \mathbb{R}^d$, separate points since if $x \neq y$ we must have
some $j \in \{1, \dots, d\}$ with $x_j \neq y_j$, and then $\chi_j(x) \neq \chi_j(y)$.
† This will be proved in Section 10.1.
‡ This will be established in Section 12.8, and holds more generally for locally compact
abelian groups.
In some discussions about characters we will parameterize the collection
of all characters using some index set. For example, we will see shortly that
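the characters on $\mathbb{T}^d$ are parameterized by elements $n \in \mathbb{Z}^d$ in a natural way
if we define for $n \in \mathbb{Z}^d$ the character $\chi_n$ on $\mathbb{T}^d$ by
\[
\chi_n(x + \mathbb{Z}^d) = e^{2\pi i n \cdot x},
\]
where $n \cdot x$ denotes the usual inner product on $\mathbb{R}^d$. We will write $x \in \mathbb{T}^d$ as a
shorthand for the element $x + \mathbb{Z}^d \in \mathbb{T}^d$, and whenever convenient we identify $x \in \mathbb{T}^d$
with $x \in [0, 1)^d$.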
Assuming the existence of a Haar measure and the completeness of char-
acters as above for a compact metric abelian group G, we will now describe
the theory of Fourier series on $G$. This will give a complete description
of $L^2(G) = L^2_{m_G}(G)$ where $m_G$ is the Haar measure on $G$. For convenience
we normalize $m_G$ to satisfy $m_G(G) = 1$.
Theorem 3.47 (Fourier series). Assume that a metric compact abelian
group $G$ has a Haar measure and satisfies completeness of characters. Then
the set of characters is finite or countably infinite and forms an orthonormal
basis of $L^2(G)$. That is, the set of characters is an orthonormal set and
any $f \in L^2(G)$ may be written as
\[
f = \sum_{\chi} a_\chi \chi,
\]
where the sum, which runs over all the characters of $G$, is convergent(11) with
respect to $\|\cdot\|_2$, the equality is meant as elements of $L^2(G)$, the coefficients
are given by $a_\chi = \langle f, \chi \rangle$, and they satisfy
\[
\sum_{\chi} |a_\chi|^2 = \|f\|_2^2.
\]
The final equality is a form of Parseval’s theorem or Parseval’s formula.

Proof of Theorem 3.47. Let $\chi$ be a non-trivial character on $G$, so that
there is some element $g \in G$ with $\chi(g) \neq 1$. Since $\chi$ is continuous and by
assumption $m(G) = 1$, the function $\chi$ is integrable, and
\[
\int_G \chi(x) \, dm(x) = \int_G \chi(g + x) \, dm(x) = \chi(g) \int_G \chi(x) \, dm(x). \tag{3.15}
\]
In fact, in (3.15) we used the defining invariance property of the Haar measure
extended to integrals as in (3.13) and the fact that a character is in particular
a homomorphism. However, we have chosen $g$ with $\chi(g) \neq 1$ so (3.15) gives
\[
\int_G \chi \, dm = 0.
\]
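Now let $\chi_1, \chi_2$ be any characters, and write $\chi = \chi_1 \overline{\chi_2}$. Then $\chi$ is also a
character, and since $\overline{\chi_2(g)} = \chi_2(g)^{-1}$, we see that $\chi$ is trivial if and only
if $\chi_1 = \chi_2$. Therefore the calculation above gives
\[
\langle \chi_1, \chi_2 \rangle = \int_G \chi_1 \overline{\chi_2} \, dm = \delta_{\chi_1, \chi_2} =
\begin{cases}
m(G) = 1 & \text{if } \chi_1 = \chi_2; \\
0 & \text{if } \chi_1 \neq \chi_2,
\end{cases}
\]
so the characters form an orthonormal set (and this is a consequence of the
properties of the Haar measure).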
To show that there are only countably many characters on $G$, notice that
by orthonormality of the set of characters, the $L^2$ distance between any two
distinct characters is $\sqrt{2}$. By Lemma 2.46, $C(G)$ is separable with respect to
the $\|\cdot\|_\infty$ norm. This extends to $L^2(G)$ with respect to $\|\cdot\|_2$ since the bound
\[
\|f\|_2^2 = \int_G \underbrace{|f(g)|^2}_{\leq \|f\|_\infty^2} \, dm(g) \leq \|f\|_\infty^2
\]
shows that the embedding $C(G) \to L^2(G)$, which we know has dense image
by Proposition 2.51, is continuous. It follows that there can be only countably
many distinct characters, since an uncountable collection would give rise to
an uncountable collection of disjoint open balls of radius $\frac{1}{2}\sqrt{2}$, contradicting
separability.
In order to show completeness we will use the completeness of characters
from p. 92. Define the complex linear hull $\mathcal{A} = \langle \chi \mid \chi \text{ a character on } G \rangle$, and
notice that $\mathcal{A}$ is an algebra since the product of two characters is another
character. Also notice that $\mathcal{A}$ is closed under conjugation, since
\[
\overline{\chi}(g) = \overline{\chi(g)} = \chi(g)^{-1} = \chi(-g)
\]
for $g \in G$ defines another character if $\chi$ is a character. Since by completeness
of characters the algebra $\mathcal{A}$ separates points in $G$, the Stone–Weierstrass
theorem now implies that $\mathcal{A}$ is dense in $C(G)$ with respect to $\|\cdot\|_\infty$. However,
by the continuity of the embedding from $C(G)$ to $L^2(G)$ the closed linear hull
of $\mathcal{A}$ in $L^2(G)$ contains $C(G)$ and so by Proposition 2.51 must be all of $L^2(G)$.
Now the theorem follows from the description of the closed linear hull of an
orthonormal list in Proposition 3.36.
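For a finite cyclic group the statements of Theorem 3.47 can be checked numerically. The sketch below assumes G = Z/qZ with the Haar measure normalized to total mass one (counting measure divided by q) and uses the standard characters x ↦ exp(2πikx/q); the function names are illustrative and not taken from the text.

```python
import cmath

def character(k, q):
    """Values of chi_k(x) = exp(2*pi*i*k*x/q) for x = 0, ..., q-1."""
    return [cmath.exp(2j * cmath.pi * k * x / q) for x in range(q)]

def inner(f, g):
    """<f, g> in L^2(G) for the normalized counting measure on G = Z/qZ."""
    q = len(f)
    return sum(a * b.conjugate() for a, b in zip(f, g)) / q

def fourier_coefficients(f):
    """The coefficients a_chi = <f, chi> for the q characters chi_0, ..., chi_{q-1}."""
    q = len(f)
    return [inner(f, character(k, q)) for k in range(q)]

def reconstruct(coeffs):
    """Evaluate the finite sum of a_chi * chi at every point of G."""
    q = len(coeffs)
    chars = [character(k, q) for k in range(q)]
    return [sum(coeffs[k] * chars[k][x] for k in range(q)) for x in range(q)]
```

With f given as a list of q values, the sum of |a_χ|² over fourier_coefficients(f) agrees with inner(f, f) up to rounding, and reconstruct recovers f, which is the finite form of Parseval's formula and of the expansion in the theorem.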
Exercise 3.48. Let G ⊆ Td be a closed subgroup. Show that any character χ on G is the
restriction of a character of the form χn for some n ∈ Zd (by using the arguments from
the proof of Theorem 3.47).
Exercise 3.49. Find all the characters on G = Z/q Z and prove Theorem 3.47 directly for
this case.
Exercise 3.50 (Uncertainty principle on finite groups). Generalize Exercise 3.49 to
a finite abelian group $G$, and write $\widehat{G}$ for the group of characters on $G$.
(a) Show that $\widehat{G}$ is a group under the operation of pointwise multiplication, and that we
have $|\widehat{G}| = |G|$.
(b) Define the discrete Fourier transform of $f : G \to \mathbb{C}$ to be the function $\widehat{f} : \widehat{G} \to \mathbb{C}$ defined
by $\widehat{f}(\chi) = \frac{1}{|G|} \sum_{g \in G} f(g) \overline{\chi(g)}$. Show that $\sum_{\chi \in \widehat{G}} |\widehat{f}(\chi)|^2 = \frac{1}{|G|} \sum_{g \in G} |f(g)|^2$ (Parseval’s
formula).
(c) Prove that $\|\widehat{f}\|_\infty = \max_{\chi \in \widehat{G}} |\widehat{f}(\chi)| \leq \frac{1}{|G|} \sum_{g \in G} |f(g)| = \|f\|_1$.
(d) Use the Cauchy–Schwarz inequality, Parseval’s formula, and the inequality from (c) to
deduce the following uncertainty principle(12): $|\operatorname{Supp} f| \cdot |\operatorname{Supp} \widehat{f}| \geq |G|$ for $f \in L^2(G) \setminus \{0\}$.
Exercise 3.51. (a) Find all the characters on $G = (\mathbb{Z}/N\mathbb{Z})^{\mathbb{N}}$ (endowed with the product
topology). Show the existence of a Haar measure and the completeness of characters
from p. 92 for this case.
(b) Now set $N = 2$ and notice that $G = (\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ is, as a measure space, isomorphic to $(0, 1)$
with the Lebesgue measure (by using the binary expansion of real numbers). Interpret the
characters of $G$ as maps on $(0, 1)$ to obtain the orthonormal basis known as the Walsh
system.
Exercise 3.52. Let $p \in \mathbb{N}$ be a prime number. Describe all the characters on the compact
group of $p$-adic integers $G = \mathbb{Z}_p$, defined by
\[
G = \varprojlim_{n \to \infty} \mathbb{Z}/(p^n \mathbb{Z}) = \Bigl\{ (z_n) \in \prod_{n=1}^{\infty} \mathbb{Z}/(p^n \mathbb{Z}) \Bigm| z_n \equiv z_{n+1} \bmod p^n \mathbb{Z} \Bigr\}.
\]
Show the existence of a Haar measure and the completeness of characters from p. 92 for
this case.
The following exercise shows how large the class of metric compact abelian
groups really is.
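Exercise 3.53. Let $\Gamma$ be a countable abelian group and use it to define
\[
G = \{ (z_\gamma) \in \mathbb{T}^{\Gamma} \mid z_{\gamma_1 + \gamma_2} = z_{\gamma_1} + z_{\gamma_2} \text{ for all } \gamma_1, \gamma_2 \in \Gamma \}.
\]
(a) Show that $G$ is a metric compact abelian group in the induced topology from the
product topology on $\mathbb{T}^{\Gamma}$.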
(b) Use the theorem on completeness of the characters from p. 92 to show that the group
of characters on G is isomorphic to Γ .
3.4 Fourier Series on Td
The discussion in Section 3.3 applies in particular to the torus G = Td , giving

the basic theory of Fourier series of L2 functions there, but this case is so
important that we will treat it in greater detail here. Along the way, we will
give a proof for Fourier series on the torus that will be independent of the
theorem regarding Fourier series on general groups (Theorem 3.47).
For this section, we will define a character on $\mathbb{T}^d$ to be a function of the
form
\[
\chi_n(x) = e^{2\pi i n \cdot x} = e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)}
\]
for all $x \in \mathbb{T}^d$, for some $n \in \mathbb{Z}^d$. We will see in Corollary 3.67 that these are
indeed all the characters on $\mathbb{T}^d$ in the sense of Section 3.3 (also see
Exercise 3.48). We note that $\overline{\chi_n(x)} = \chi_{-n}(x) = \chi_n(-x)$ for $n \in \mathbb{Z}^d$ and $x \in \mathbb{T}^d$.
A trigonometric polynomial is a finite linear combination
\[
p = \sum_{n \in F} a_n \chi_n
\]
of characters, where $F \subseteq \mathbb{Z}^d$ is a finite set and $a_n \in \mathbb{C}$ for all $n \in F$.

As in the proof of Theorem 3.47, one can use the complex version of the
Stone–Weierstrass theorem to show that every continuous function can be
approximated by a trigonometric polynomial. In this section we will give
another proof of this using convolution.
Theorem 3.54 (Fourier series on the torus). The characters $\chi_n$ with $n$
in $\mathbb{Z}^d$ form an orthonormal basis for $L^2(\mathbb{T}^d)$, so that every $f \in L^2(\mathbb{T}^d)$ is
given by an $L^2$ convergent Fourier series
\[
f = \sum_{n \in \mathbb{Z}^d} a_n \chi_n, \tag{3.16}
\]
where the $a_n$ are the Fourier coefficients defined by
\[
a_n = a_n(f) = \langle f, \chi_n \rangle = \int_{\mathbb{T}^d} f(t) \chi_n(-t) \, dt
\]
for $n \in \mathbb{Z}^d$. Moreover,
\[
\|f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n|^2. \tag{3.17}
\]
Exercise 3.55. (a) Phrase Theorem 3.47 for $d = 1$ using the function $\chi_0 = 1$ and the
functions $x \mapsto \cos(2\pi n x)$ and $x \mapsto \sin(2\pi n x)$ for $n \geq 1$.
(b) For every $n \geq 1$ choose $d_n$ such that $f_n(x) = d_n \sin(\pi n x)$ has norm one in $L^2((0, 1))$.
Show that $(f_n)_{n \geq 1}$ forms an orthonormal basis of $L^2((0, 1))$. Notice that each $f_n$ satisfies
the boundary conditions $f_n(0) = f_n(1) = 0$, which are called the Dirichlet boundary
conditions.
(c) For every $n \geq 0$ choose $d_n$ such that $g_n(x) = d_n \cos(\pi n x)$ has norm one in $L^2((0, 1))$.
Show that $(g_n)_{n \geq 0}$ forms an orthonormal basis of $L^2((0, 1))$. Note that every $g_n$
satisfies $g_n'(0) = g_n'(1) = 0$, which are called the Neumann boundary conditions.
(d) Find an orthonormal basis $(h_n)_{n \geq 1}$ of $L^2((0, 1))$ that consists of smooth functions
satisfying the mixed boundary conditions $h_n(0) = h_n'(1) = 0$ for all $n \geq 1$.
Exercise 3.56. (a) Rephrase Theorem 3.47 for Td for real-valued functions using sine and
cosine functions.
(b) Find an orthonormal basis of L2 ((0, 1)d ) satisfying the Dirichlet boundary conditions
(that is, the basis should consist of smooth functions that vanish on the boundary of [0, 1]d ).
The relation (3.17) is Parseval’s formula, and it may be viewed as an

infinite-dimensional form of Pythagoras’ theorem. We will see later (see The-
orem 4.9 in Section 4.1.1) that it is too much to ask for the Fourier series
of a continuous function to converge uniformly, or even pointwise. However,
some additional smoothness assumptions do imply uniform convergence of
the Fourier series, and this will be the starting point for our excursion into
the theory of Sobolev spaces in Chapter 5 and Section 6.4.
Theorem 3.57 (Differentiability and Fourier series). Suppose that $f$
is a function in $C^k(\mathbb{T}^d)$ for some $k \geq 1$. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}_0^d$ be a
multi-index with $\|\alpha\|_1 \leq k$. Then the Fourier coefficient $a_n(\partial_\alpha f)$ of $\partial_\alpha f$ is
given by
\[
a_n(\partial_\alpha f) = (2\pi i n_1)^{\alpha_1} \cdots (2\pi i n_d)^{\alpha_d} a_n(f). \tag{3.18}
\]
If $k > d/2$, then the Fourier series on the right-hand side of (3.16) converges
absolutely, and
\[
\|f\|_\infty \leq \sum_{n \in \mathbb{Z}^d} |a_n(f)| \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2}.
\]
The reader may wonder in what sense the absolute convergence is meant,
and the answer is in all of them: With respect to $\|\cdot\|_2$, pointwise at every
point, and with respect to $\|\cdot\|_\infty$.
In order to prove these results (independently from the previous section)
we will need to discuss convolution.
3.4.1 Convolution on the Torus
Definition 3.58 (Convolution). Fix $p, q \in [1, \infty]$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$, let $f$
be an element of $L^p(\mathbb{T}^d)$ and $g$ an element of $L^q(\mathbb{T}^d)$. Then the convolution
of $f$ and $g$ is the function $f * g$ defined by
\[
f * g(x) = \int_{\mathbb{T}^d} f(t) g(x - t) \, dt.
\]
A pair of numbers $p, q$ related as in Definition 3.58 are called Hölder
conjugate or conjugate exponents due to Hölder’s inequality $\|fg\|_1 \leq \|f\|_p \|g\|_q$
for $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$ (see Theorem B.15). This implies in particular
that the integral defining $f * g(x)$ exists for all $x \in \mathbb{T}^d$.
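Lemma 3.59. Let $p, q \in [1, \infty]$ be Hölder conjugate numbers, let $f \in L^p(\mathbb{T}^d)$
and $g \in L^q(\mathbb{T}^d)$. Then
(1) $f * g = g * f$;
(2) $f * \chi_n = \Bigl( \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)} \, dt \Bigr) \chi_n$; and
(3) $\langle \chi_m, \chi_n \rangle = \delta_{m,n}$.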
Proof. The first formula follows by a simple substitution (see Exercise 3.46):
\[
f * g(x) = \int_{\mathbb{T}^d} f(t) g(\underbrace{x - t}_{= u}) \, dt = \int_{\mathbb{T}^d} f(x - u) g(u) \, du = g * f(x).
\]
The second formula follows from the definition, since
\begin{align*}
f * \chi_n(x) &= \int_{\mathbb{T}^d} f(t) \chi_n(x - t) \, dt \\
&= \int_{\mathbb{T}^d} f(t) \chi_n(x) \chi_n(-t) \, dt = \Bigl( \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)} \, dt \Bigr) \chi_n(x).
\end{align*}
For the last identity (which is a general property of characters, as we have
seen in Section 3.3), note that
\[
\chi_m(t) \overline{\chi_n(t)} = \chi_{m-n}(t) = e^{2\pi i ((m_1 - n_1) t_1 + \cdots + (m_d - n_d) t_d)},
\]
and integrate this character over $\mathbb{T}^d$ to obtain the result.
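The identities of Lemma 3.59 can be observed numerically for d = 1. The following is a minimal sketch in which the integral over T is replaced by an average over K equally spaced points, an assumption that is exact for trigonometric polynomials whose frequencies are small compared with K; the names chi, convolve and coefficient are illustrative only.

```python
import cmath

def chi(n, x):
    """The character chi_n(x) = exp(2*pi*i*n*x) on T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)

def convolve(f, g, x, K=256):
    """Average over the grid t = j/K approximating the integral of f(t) g(x - t)."""
    return sum(f(j / K) * g(x - j / K) for j in range(K)) / K

def coefficient(f, n, K=256):
    """Grid approximation of a_n(f), the integral of f(t) times conj(chi_n(t))."""
    return sum(f(j / K) * chi(n, j / K).conjugate() for j in range(K)) / K
```

For the trigonometric polynomial f = 3 + 2χ_1 − iχ_{−2}, the value convolve(f, χ_1, x) agrees with 2χ_1(x) up to rounding, in line with part (2), and swapping the two arguments of convolve does not change the result, in line with part (1).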
Lemma 3.60 (Continuity). Let $p, q \in [1, \infty]$ be Hölder conjugate numbers.
If $f \in L^p(\mathbb{T}^d)$ and $p < \infty$ then the shifted function $f^x \in L^p(\mathbb{T}^d)$ defined
by $f^x(t) = f(t - x)$ depends continuously on $x \in \mathbb{T}^d$ in the $\|\cdot\|_p$ norm.
Moreover, if $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$, then $f * g \in C(\mathbb{T}^d)$.
Proof. For $1 \leq p < \infty$, $C(\mathbb{T}^d)$ is dense in $L^p(\mathbb{T}^d)$ by Proposition 2.51. Now
fix $f \in L^p(\mathbb{T}^d)$, $\varepsilon > 0$ and choose $F \in C(\mathbb{T}^d)$ with $\|f - F\|_p < \varepsilon$. Then by
uniform continuity of $F$ there exists some $\delta > 0$ for which
\[
d(x, y) < \delta \implies \|F^x - F^y\|_\infty < \varepsilon \implies \|F^x - F^y\|_p < \varepsilon.
\]
Since shifting functions preserves their integrals and their $p$-norms, we deduce
that $d(x, y) < \delta$ implies that
\[
\|f^x - f^y\|_p \leq \|f^x - F^x\|_p + \|F^x - F^y\|_p + \|F^y - f^y\|_p < 3\varepsilon,
\]
showing the continuity of the map $\mathbb{T}^d \ni x \mapsto f^x \in L^p(\mathbb{T}^d)$.

Now suppose that $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$. By Lemma 3.59(1) we may
switch $f$ and $g$ if necessary and assume $q < \infty$. Fix $\varepsilon > 0$ and choose $\delta > 0$
so that $\|g^x - g^y\|_q < \varepsilon$ holds whenever $d(x, y) < \delta$. Using the fact that
the function $g$ essentially appears in the shifted form $g^x$ in the definition
of $f * g(x)$, we now obtain
\begin{align*}
|f * g(x) - f * g(y)| &= \Bigl| \int_{\mathbb{T}^d} f(t) \bigl( g(x - t) - g(y - t) \bigr) \, dt \Bigr| \\
&\leq \int_{\mathbb{T}^d} |f(t)| \, |g(x - t) - g(y - t)| \, dt \\
&\leq \|f\|_p \|g^x - g^y\|_q \leq \varepsilon \|f\|_p
\end{align*}
by the Hölder inequality, whenever $d(x, y) < \delta$ (strictly speaking the
function $(\tilde{g})^x(t) = g(x - t)$ with $\tilde{g}(t) = g(-t)$ appears in the definition of $f * g(x)$,
but using $\|\tilde{g}^x - \tilde{g}^y\|_q = \|g^x - g^y\|_q$ this does not make much of a difference).
As $\varepsilon > 0$ was arbitrary we see that $f * g$ is continuous.
3.4.2 Dirichlet and Fejér Kernels
Let us assume first that $d = 1$. By Lemma 3.59(2) the $n$th term in the Fourier
series of $f$ is given by $a_n(f) \chi_n = f * \chi_n$ with $a_n(f) = \langle f, \chi_n \rangle$ for every $n \in \mathbb{Z}$.
Thus the partial sums of the Fourier series satisfy
\[
\sum_{n=-N}^{N} a_n(f) \chi_n = f * \Bigl( \sum_{n=-N}^{N} \chi_n \Bigr).
\]
This observation motivates the following definition.
Definition 3.61. The $N$th Dirichlet kernel is the function $D_N \in C(\mathbb{T})$
defined by
\[
D_N = \sum_{n=-N}^{N} \chi_n.
\]
The 8th Dirichlet kernel is illustrated in Figure 3.3.
Fig. 3.3: The 8th Dirichlet kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.

Fig. 3.4: The 8th Fejér kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.
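Lemma 3.62 (Dirichlet kernel). The Dirichlet kernel $D_N$ is real-valued
and can also be expressed in the form
\[
D_N(x) =
\begin{cases}
2N + 1 & \text{if } x = 0 \in \mathbb{T}, \\[4pt]
\dfrac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} = \dfrac{\sin((N + \frac{1}{2}) 2\pi x)}{\sin(\pi x)} & \text{if } x \neq 0,
\end{cases}
\]
and satisfies
\[
\int_{\mathbb{T}} D_N(x) \, dx = 1.
\]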
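Proof. The case $x = 0$ and the integral calculation follow immediately from
the definitions. To check the formula for $x \neq 0$ we notice that the Dirichlet
kernel is a geometric series and use the relation $\sin \varphi = \frac{e^{i\varphi} - e^{-i\varphi}}{2i}$:
\begin{align*}
D_N(x) &= \sum_{n=-N}^{N} \bigl( e^{2\pi i x} \bigr)^n = e^{-2\pi i N x} \Bigl( 1 + \cdots + \bigl( e^{2\pi i x} \bigr)^{2N} \Bigr) \\
&= e^{-2\pi i N x} \, \frac{\bigl( e^{2\pi i x} \bigr)^{2N+1} - 1}{e^{2\pi i x} - 1} = \frac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} \\
&= \frac{e^{2\pi i (N + \frac{1}{2}) x} - e^{-2\pi i (N + \frac{1}{2}) x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{\sin\bigl( (N + \frac{1}{2}) 2\pi x \bigr)}{\sin(\pi x)}.
\end{align*}
The latter formula also implies that $D_N$ is real-valued.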
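As a sanity check on Lemma 3.62, the defining sum and the closed form can be compared numerically. This is a small sketch with illustrative function names; the threshold used to decide when x represents 0 in T and the use of a grid average in place of the integral are assumptions of the sketch (the grid average of D_N is exact once the number of grid points exceeds N).

```python
import cmath
import math

def dirichlet_sum(N, x):
    """D_N(x) as the defining sum of the characters chi_n(x) with |n| <= N."""
    return sum(cmath.exp(2j * cmath.pi * n * x) for n in range(-N, N + 1))

def dirichlet_closed(N, x):
    """D_N(x) from the closed form: 2N+1 at x = 0, a ratio of sines otherwise."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:  # x is (numerically) an integer, that is, 0 in T
        return float(2 * N + 1)
    return math.sin((N + 0.5) * 2 * math.pi * x) / s

def mean_over_grid(N, K):
    """Average of D_N over the points j/K, which equals the integral for K > N."""
    return sum(dirichlet_sum(N, j / K) for j in range(K)).real / K
```

Evaluating dirichlet_closed(8, 0.1) gives a negative number, which illustrates that the Dirichlet kernel changes sign.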

By the above, the Dirichlet kernel is real-valued but takes on both positive
and negative values. By averaging we obtain another kernel that only takes
on non-negative values, which will be crucial later.
Definition 3.63. The $M$th Fejér kernel is the function $F_M \in C(\mathbb{T})$ defined
by
\[
F_M = \frac{1}{M} \sum_{m=0}^{M-1} D_m.
\]
The 8th Fejér kernel is shown in Figure 3.4.
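Lemma 3.64 (Fejér kernel). The $M$th Fejér kernel is given by
\[
F_M(x) =
\begin{cases}
M & \text{if } x = 0, \\[4pt]
\dfrac{1}{M} \Bigl( \dfrac{\sin(M \pi x)}{\sin(\pi x)} \Bigr)^2 & \text{if } x \neq 0
\end{cases}
\]
and satisfies the following properties:
• $F_M(x) \geq 0$ for all $x \in \mathbb{T}$;
• $\int_{\mathbb{T}} F_M(x) \, dx = 1$;
• $F_M(x) \to 0$ as $M \to \infty$ uniformly on every set of the form $[\delta, 1 - \delta]$
for $\delta > 0$.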
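Proof. We first verify the formula claimed for $F_M$. If $x = 0$, then
\[
F_M(0) = \frac{1}{M} \sum_{m=0}^{M-1} (2m + 1) = M.
\]
For $x \neq 0$ we use Lemma 3.62 and obtain
\begin{align*}
F_M(x) &= \frac{1}{M} \sum_{m=0}^{M-1} \frac{e^{2\pi i (m+1) x} - e^{-2\pi i m x}}{e^{2\pi i x} - 1} \\
&= \frac{1}{M (e^{2\pi i x} - 1)} \Bigl( e^{2\pi i x} \sum_{m=0}^{M-1} e^{2\pi i m x} - \sum_{m=0}^{M-1} e^{-2\pi i m x} \Bigr) \\
&= \frac{1}{M} \, \frac{e^{\pi i x}}{e^{2\pi i x} - 1} \Bigl( e^{\pi i x} \, \frac{e^{2\pi i M x} - 1}{e^{2\pi i x} - 1} - e^{-\pi i x} \, \frac{e^{-2\pi i M x} - 1}{e^{-2\pi i x} - 1} \Bigr) \\
&= \frac{1}{M} \, \frac{e^{\pi i x}}{e^{2\pi i x} - 1} \Bigl( \frac{e^{\pi i x}}{e^{2\pi i x} - 1} \bigl( e^{2\pi i M x} - 1 \bigr) + \frac{e^{-\pi i x}}{1 - e^{-2\pi i x}} \bigl( e^{-2\pi i M x} - 1 \bigr) \Bigr) \\
&= \frac{1}{M} \, \frac{1}{(e^{\pi i x} - e^{-\pi i x})^2} \bigl( e^{2\pi i M x} - 2 + e^{-2\pi i M x} \bigr) \\
&= \frac{1}{M} \, \frac{(e^{\pi i M x} - e^{-\pi i M x})^2}{(e^{\pi i x} - e^{-\pi i x})^2} = \frac{1}{M} \, \frac{\sin^2(M \pi x)}{\sin^2(\pi x)}.
\end{align*}
Now it is clear that $F_M(x) \geq 0$.
Since $\int_{\mathbb{T}} D_m(x) \, dx = 1$ for all $m \geq 0$ we also have $\int_{\mathbb{T}} F_M(x) \, dx = 1$. Finally,
for $x \in [\delta, 1 - \delta]$ we have $\sin(\pi x) \geq \frac{\pi}{2} \delta$ and hence
\[
F_M(x) \leq \frac{1}{M} \Bigl( \frac{2}{\pi \delta} \Bigr)^2 \longrightarrow 0
\]
uniformly for $x \in [\delta, 1 - \delta]$ as $M \to \infty$.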
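The three properties of the Fejér kernel can also be checked numerically. The sketch below uses illustrative function names, takes the closed form of the Dirichlet kernel from Lemma 3.62 as given, and again replaces the integral by a grid average, which is exact here as long as the grid has at least M points.

```python
import math

def dirichlet(N, x):
    """Closed form of the Dirichlet kernel D_N(x)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:  # x is (numerically) 0 in T
        return float(2 * N + 1)
    return math.sin((2 * N + 1) * math.pi * x) / s

def fejer_average(M, x):
    """F_M(x) from the definition: the average of D_0, ..., D_{M-1}."""
    return sum(dirichlet(m, x) for m in range(M)) / M

def fejer_closed(M, x):
    """F_M(x) from the closed form in Lemma 3.64."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return float(M)
    return (math.sin(M * math.pi * x) / s) ** 2 / M

def decay_bound(M, delta):
    """The bound (1/M) * (2 / (pi * delta))**2 valid on [delta, 1 - delta]."""
    return (2 / (math.pi * delta)) ** 2 / M
```

On a grid of points in [δ, 1 − δ] the values fejer_closed(M, x) stay below decay_bound(M, δ), and the bound shrinks like 1/M, which is the uniform convergence stated in the lemma.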

Proposition 3.65 (Density of trigonometric polynomials). For a
continuous function $f$ on $\mathbb{T}$ we have $f * F_M \to f$ as $M \to \infty$ with respect to $\|\cdot\|_\infty$.
In particular, trigonometric polynomials are dense in $C(\mathbb{T})$.
This behaviour of the sequence of functions (FM ) with respect to con-
volution is also described by saying that the sequence (FM ) is an approx-
imate identity. As we will see in the proof, this property holds for any se-
quence of functions that satisfy the last three properties of the Fejér kernel
in Lemma 3.64.
Proof of Proposition 3.65. Let $f \in C(\mathbb{T})$ and fix $\varepsilon > 0$. Then there exists
some $\delta \in (0, \frac{1}{2})$ for which $d(x, y) < \delta \implies |f(x) - f(y)| < \varepsilon$. Now we estimate
the difference $f * F_M(x) - f(x)$ as follows. By commutativity of convolution
and the facts that $F_M \geq 0$ and $\int_{\mathbb{T}} F_M(t) \, dt = 1$ we have
\begin{align*}
|f * F_M(x) - f(x)| &= |F_M * f(x) - f(x)| \\
&= \Bigl| \int_{\mathbb{T}} F_M(t) f(x - t) \, dt - \int_{\mathbb{T}} F_M(t) f(x) \, dt \Bigr| \\
&\leq \int_{\mathbb{T}} F_M(t) |f(x - t) - f(x)| \, dt.
\end{align*}
Now we split the range of integration into the interval $[\delta, 1 - \delta]$ and its
complement:
\begin{align*}
|f * F_M(x) - f(x)| \leq{} & \int_{\delta}^{1-\delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{\leq 2\|f\|_\infty} \, dt \\
& + \int_{-\delta}^{\delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{< \varepsilon} \, dt.
\end{align*}
The first integral goes to zero as $M \to \infty$ since $F_M \to 0$ uniformly as $M \to \infty$
on $[\delta, 1 - \delta]$. Hence for large enough $M$, the first integral is smaller than $\varepsilon$.
The second integral is bounded above by $\varepsilon$ as $F_M \geq 0$ and
\[
\int_{\mathbb{T}} F_M(t) \, dt = 1.
\]
As $\delta$ is independent of $x$, the same is true for $M$, and we see that $f * F_M$
converges uniformly to $f$.
To see the final statement of the proposition note that $f * F_M$ is a
trigonometric polynomial (by linearity and Lemma 3.59(2)).
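The uniform convergence in Proposition 3.65 can be observed on a concrete example. The following sketch makes several assumptions that are not in the text: the test function f(x) = |x − 1/2| (extended periodically, so it is continuous on T), a Riemann sum with K points in place of the convolution integral, and a finite set of sample points in place of the supremum.

```python
import math

def fejer(M, x):
    """Closed form of the Fejér kernel F_M(x) from Lemma 3.64."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:  # x is (numerically) 0 in T
        return float(M)
    return (math.sin(M * math.pi * x) / s) ** 2 / M

def f(x):
    """A hypothetical continuous test function on T: |x - 1/2| made 1-periodic."""
    return abs((x % 1.0) - 0.5)

def fejer_mean(M, x, K=512):
    """Riemann-sum approximation of (f * F_M)(x), the integral of F_M(t) f(x - t)."""
    return sum(fejer(M, j / K) * f(x - j / K) for j in range(K)) / K

def sup_error(M, points=32):
    """Largest value of |f * F_M - f| over the sample points i/points."""
    return max(abs(fejer_mean(M, i / points) - f(i / points)) for i in range(points))
```

Comparing sup_error(8) with sup_error(64) shows the error shrinking as M grows; the sample points include the two corners x = 0 and x = 1/2 of f, where the approximation is slowest.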
Exercise 3.66. Analyze where the above proof fails if we replace the Fejér kernel by the
Dirichlet kernel.
Proof of Theorem 3.54. We start with the case d = 1. Proposition 3.65
shows that the linear hull A of the characters (that is, the space of
trigonometric polynomials) is dense in C(T) with respect to $\|\cdot\|_\infty$. Since
$\|\cdot\|_2 \leq \|\cdot\|_\infty$ on C(T) and C(T) is dense in $L^2(\mathbb{T})$, the same holds
with respect to $\|\cdot\|_2$ in $L^2(\mathbb{T})$. By Lemma 3.59, the characters on T
form an orthonormal set. Thus the description of the closed linear hull of the
characters follows from Proposition 3.36 and proves the theorem for d = 1.
The case of d ≥ 2 is similar once we have shown that the space of trigonometric
polynomials is dense in the continuous functions. For this, notice first
that
\[ \widetilde{F_M}(x_1, \dots, x_d) = F_M(x_1) \cdots F_M(x_d) \]
is a trigonometric polynomial satisfying
• $\widetilde{F_M} \geq 0$,
• $\int_{\mathbb{T}^d} \widetilde{F_M}(x)\,dx = 1$, and
• $\int_{[-\delta,\delta]^d} \widetilde{F_M}(x)\,dx = \left( \int_{-\delta}^{\delta} F_M(t)\,dt \right)^d > 1 - \varepsilon$
for ε, δ > 0 and large enough M (how large depending on ε and δ). Next notice
that $f * \widetilde{F_M}$ is a trigonometric polynomial for any $f \in C(\mathbb{T}^d)$. The argument
is now similar to the case d = 1: we again show that the sequence $(\widetilde{F_M})$ is an
approximate identity. Given $f \in C(\mathbb{T}^d)$ and ε > 0 we can choose δ > 0 such
that |f(x − t) − f(x)| < ε for $x \in \mathbb{T}^d$ and $t \in [-\delta, \delta]^d$. This implies that
\[
\begin{aligned}
\left| f * \widetilde{F_M}(x) - f(x) \right|
&\leq \int_{\mathbb{T}^d} |f(x - t) - f(x)|\,\widetilde{F_M}(t)\,dt \\
&\leq \int_{[-\delta,\delta]^d} \underbrace{|f(x - t) - f(x)|}_{< \varepsilon}\,\widetilde{F_M}(t)\,dt
 + \int_{\mathbb{T}^d \setminus [-\delta,\delta]^d} 2\|f\|_\infty\,\widetilde{F_M}(t)\,dt \\
&< \varepsilon + 2\|f\|_\infty\,\varepsilon,
\end{aligned}
\]
as required. As in the case d = 1 this implies that the set of characters forms
an orthonormal basis, and the theorem follows.
Let us briefly describe how the definition of a character in Section 3.3
relates to the characters χn for n ∈ Zd that we used here (see also Exer-
cise 3.48).
Corollary 3.67 (Description of characters). Every character of $\mathbb{T}^d$ in the
sense of the definition given on p. 92 is of the form $\chi = \chi_n$ for some $n \in \mathbb{Z}^d$.
Proof. Let $\chi : \mathbb{T} \to \mathbb{S}^1$ be a continuous homomorphism. By Proposition 3.65,
χ can be approximated uniformly by a trigonometric polynomial f.
If χ does not appear in this trigonometric polynomial, then f is orthogonal
to χ (by the argument on p. 94 based on (3.15)) and so cannot be close to χ.
However, by definition of the Fejér kernel, the characters appearing in f are
those of the form $\chi_n$ for n ∈ Z.
Now suppose that d > 1 and $\chi : \mathbb{T}^d \to \mathbb{S}^1$ is a continuous homomorphism.
Writing $x = (x_1, \dots, x_d)$ we have
\[
\chi(x) = \underbrace{\chi(x_1, 0, 0, \dots, 0)}_{\chi^{(1)}(x_1)}\;
          \underbrace{\chi(0, x_2, 0, \dots, 0)}_{\chi^{(2)}(x_2)} \cdots
          \underbrace{\chi(0, 0, \dots, x_d)}_{\chi^{(d)}(x_d)},
\]
where each $\chi^{(i)} : \mathbb{T} \to \mathbb{S}^1$ is a continuous homomorphism for 1 ≤ i ≤ d. By
the argument above we have $\chi^{(i)} = \chi_{n_i}$ for some $n_i \in \mathbb{Z}$, and so the result
follows.
Exercise 3.68 (Polynomial approximate identities on R). Give another proof of
the Weierstrass approximation theorem (Example 2.41) using the Landau kernel defined
by $L_n(x) = \ell_n (1 - x^2)^n$ for x ∈ R, where $\ell_n > 0$ is chosen to make $\int_{-1}^{1} L_n(x)\,dx = 1$.
(a) Given f ∈ C([0, 1]), extend f continuously to R with support in [−c, 1 + c] for some
small c > 0 (the choice of which will matter in (b)). Show that
\[ L_n * f(x) = \int_{\mathbb{R}} L_n(t) f(x - t)\,dt \]
is a polynomial.
(b) State, prove, and use an appropriate approximate identity property of the sequence $(L_n)$
to show that $L_n * f(x)$ converges uniformly on [0, 1] to f as n → ∞.
3.4.3 Differentiability and Fourier Series
We now turn to the interplay between Fourier series and differentiation, with
the goal of proving Theorem 3.57. As we will see, this relationship will be a
simple but important consequence of integration by parts.
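Suppose that $f \in C^1(\mathbb{T}^d)$ and j ∈ {1, . . . , d}. Notice that
\[
\begin{aligned}
\int_{\mathbb{T}} \partial_j f(x)\,\overline{\chi_n(x)}\,dx_j
&= \Bigl[ f(x)\,\overline{\chi_n(x)} \Bigr]_{x_j = 0}^{x_j = 1} - \int_{\mathbb{T}} f(x)\,\partial_j \overline{\chi_n(x)}\,dx_j \\
&= - \int_{\mathbb{T}} f(x)\,\overline{\partial_j \chi_n(x)}\,dx_j \\
&= 2\pi i n_j \int_{\mathbb{T}} f(x)\,\overline{\chi_n(x)}\,dx_j
\end{aligned}
\tag{3.19}
\]
by integration by parts and periodicity, since
\[ (\partial_j \chi_n)(x) = \partial_j e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)} = 2\pi i n_j\, e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)}. \]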
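Integrating over the remaining variables, we see that the Fourier coefficients
of f and of the partial derivative $\partial_j f$ satisfy the relation
\[
\begin{aligned}
a_n(\partial_j f) = \int_{\mathbb{T}^d} \partial_j f(x)\,\overline{\chi_n(x)}\,dx
&= 2\pi i n_j \int_{\mathbb{T}^d} f(x)\,\overline{\chi_n(x)}\,dx \quad \text{(by (3.19))} \\
&= 2\pi i n_j\, a_n(f).
\end{aligned}
\tag{3.20}
\]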
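The relation (3.20) is easy to test numerically in the case d = 1. The sketch below is an illustration under stated assumptions, not part of the text: it takes $a_n(f) = \int_{\mathbb{T}} f\,\overline{\chi_n}\,dx$, approximates the integral by an N-point Riemann sum, and uses the smooth periodic test function $f(x) = e^{\cos(2\pi x)}$. All function names are hypothetical.

```python
import cmath
import math


def fourier_coefficient(f, n, N=64):
    """Riemann-sum approximation of a_n(f) = integral over T of f(x) conj(chi_n(x)) dx."""
    return sum(f(j / N) * cmath.exp(-2j * math.pi * n * j / N) for j in range(N)) / N


def smooth(x):
    """Test function f(x) = exp(cos(2 pi x)), smooth and 1-periodic."""
    return math.exp(math.cos(2 * math.pi * x))


def smooth_derivative(x):
    """Derivative of the test function, computed by hand."""
    return -2 * math.pi * math.sin(2 * math.pi * x) * smooth(x)


def max_defect(max_n=5):
    """Largest value of |a_n(f') - 2 pi i n a_n(f)| over |n| <= max_n."""
    return max(abs(fourier_coefficient(smooth_derivative, n)
                   - 2j * math.pi * n * fourier_coefficient(smooth, n))
               for n in range(-max_n, max_n + 1))
```

For this rapidly decaying example the defect should be at the level of rounding error, which is what (3.20) predicts.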
Proof of Theorem 3.57. The formula (3.18) follows from (3.20) by induction
on k. To prove the last claim of the theorem, we will show that
\[
\sum_{n \in \mathbb{Z}^d} |a_n(f)| \ll_d \sqrt{\|f\|_2^2 + \|\partial_1^k f\|_2^2 + \cdots + \|\partial_d^k f\|_2^2}
\tag{3.21}
\]
for $f \in C^k(\mathbb{T}^d)$ and $k > \frac{d}{2}$. Assuming (3.21) for the moment, we see that
\[ \sum_{n \in \mathbb{Z}^d} a_n(f) \chi_n \]
is an absolutely convergent series with respect to $\|\cdot\|_\infty$, and so converges to
some limit F in $C(\mathbb{T}^d)$ with respect to $\|\cdot\|_\infty$. However, since $\|\cdot\|_2 \leq \|\cdot\|_\infty$ the
same function F is also a limit with respect to $\|\cdot\|_2$. Hence by Fourier series
on the torus (Theorem 3.54) we have F = f first as an identity in $L^2(\mathbb{T}^d)$
(and hence almost everywhere), but as both functions are continuous, also
in $C(\mathbb{T}^d)$ (and hence everywhere).
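To prove (3.21), we start by expressing the right-hand side in terms of
the Fourier coefficients $a_n = a_n(f)$. By Parseval's theorem in (3.17) applied
to $\partial_{e_j}^k f$ we have
\[
\|\partial_{e_j}^k f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n(\partial_{e_j}^k f)|^2 = \sum_{n \in \mathbb{Z}^d} (2\pi n_j)^{2k}\, |a_n(f)|^2,
\]
where we have used (3.18) in the last step. Therefore we can simplify the sum
under the square root in (3.21) to give the estimate
\[
\begin{aligned}
\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2
&= \sum_{n \in \mathbb{Z}^d} \Bigl( 1 + (2\pi)^{2k} \sum_{j=1}^{d} n_j^{2k} \Bigr) |a_n(f)|^2 \\
&\gg_d \sum_{n \in \mathbb{Z}^d} \bigl( 1 + \|n\|_2^{2k} \bigr) |a_n(f)|^2
\end{aligned}
\]
since $\|n\|_2 \leq \sqrt{d}\,\max_{1 \leq j \leq d} |n_j|$ and hence $\|n\|_2^{2k} \ll_d \sum_{j=1}^{d} |n_j|^{2k}$. We claim
that
\[
\Bigl( \bigl( 1 + \|n\|_2^{2k} \bigr)^{-1/2} \Bigr)_{n \in \mathbb{Z}^d} \in \ell^2(\mathbb{Z}^d)
\tag{3.22}
\]
for $k > \frac{d}{2}$.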
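From this claim the inequality (3.21) follows quickly by the
Cauchy–Schwarz inequality:
\[
\begin{aligned}
\sum_{n \in \mathbb{Z}^d} |a_n(f)|
&= \sum_{n \in \mathbb{Z}^d} \bigl( 1 + \|n\|_2^{2k} \bigr)^{-1/2} \bigl( 1 + \|n\|_2^{2k} \bigr)^{1/2} |a_n(f)| \\
&\leq \Bigl\| \Bigl( \bigl( 1 + \|n\|_2^{2k} \bigr)^{-1/2} \Bigr)_{n \in \mathbb{Z}^d} \Bigr\|_{\ell^2(\mathbb{Z}^d)}
   \sqrt{ \sum_{n \in \mathbb{Z}^d} \bigl( 1 + \|n\|_2^{2k} \bigr) |a_n(f)|^2 } \\
&\ll_d \sqrt{ \|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2 }.
\end{aligned}
\]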
To verify the claim that
\[ \sum_{n \in \mathbb{Z}^d} \frac{1}{1 + \|n\|_2^{2k}} < \infty \]
we split up the sum. Firstly, by running through the possibilities of the signs of
the $n_j$, it is sufficient to show convergence for $n_1, \dots, n_d \geq 0$. Secondly, using
the symmetry of the summands with respect to permutation of the variables,
we may restrict the sum to those $n \in \mathbb{Z}^d$ for which $n_2, n_3, \dots, n_d \leq n_1$, and
we may also assume that $n_1 \geq 1$. Now $1 + \|n\|_2^{2k} \geq n_1^{2k}$, so
\[
\sum_{n \in \mathbb{Z}^d} \frac{1}{1 + \|n\|_2^{2k}}
\ll_d \sum_{n_1 = 1}^{\infty} \sum_{n_2, \dots, n_d = 0}^{n_1} \frac{1}{n_1^{2k}}
= \sum_{n_1 = 1}^{\infty} \frac{(n_1 + 1)^{d-1}}{n_1^{2k}}
\ll_d \sum_{n_1 = 1}^{\infty} \frac{1}{n_1^{2k + 1 - d}},
\]
and the last sum converges if 2k > d. This implies the claim above, the
inequality (3.21), and hence the theorem.
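To see why the threshold 2k > d appears, it may help to run the counting argument in the two smallest cases. The following computation is a worked illustration and not part of the original proof: for d = 1 and k = 1 the sum is finite, while for d = 2 and k = 1 (so that 2k = d) the same bookkeeping gives a divergent lower bound, using $1 + n_1^2 + n_2^2 \leq 3 n_1^2$ for $0 \leq n_2 \leq n_1$ and $n_1 \geq 1$.

```latex
\[
\sum_{n \in \mathbb{Z}} \frac{1}{1 + n^{2}}
  \leq 1 + 2 \sum_{n = 1}^{\infty} \frac{1}{n^{2}}
  = 1 + \frac{\pi^{2}}{3} < \infty
  \qquad (d = 1,\ k = 1),
\]
\[
\sum_{n \in \mathbb{Z}^{2}} \frac{1}{1 + \|n\|_{2}^{2}}
  \geq \sum_{n_1 = 1}^{\infty} \sum_{n_2 = 0}^{n_1} \frac{1}{1 + n_1^{2} + n_2^{2}}
  \geq \sum_{n_1 = 1}^{\infty} \frac{n_1 + 1}{3 n_1^{2}}
  \geq \frac{1}{3} \sum_{n_1 = 1}^{\infty} \frac{1}{n_1} = \infty
  \qquad (d = 2,\ k = 1).
\]
```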
The above is already sufficient to answer another claim from Section 1.5.
We state a special case of the inheritance of smoothness below, and return
again to this topic in Chapter 5.
Exercise 3.69. Let f be a real-valued function defined on an open subset $U \subseteq \mathbb{R}^2$. Suppose
that f is continuous and that $\partial_1^4 f$, $\partial_2^4 f$ exist and are continuous. Show that $\partial_1 \partial_2 f$ exists
and is continuous.
3.5 Group Actions and Representations
In this section we describe, in particular, how functions on $\mathbb{R}^2$
can be decomposed into functions that have special rotational symmetries, as
alluded to in Section 1.1. The same argument will also apply to symmetries
with respect to rotations about the z-axis in $\mathbb{R}^3$ (but not to all rotations
in $\mathbb{R}^3$), and to many other situations. A convenient framework that incorporates
all of these examples and much more is the following set-up.
3.5.1 Group Actions and Unitary Representations
Definition 3.70. Let G be a topological group, and let X be a topological
space. A continuous group action of G on X is a continuous map
\[
\begin{aligned}
{.} \colon G \times X &\longrightarrow X \\
(g, x) &\longmapsto g.x
\end{aligned}
\]
with g.(h.x) = (gh).x for g, h ∈ G and x ∈ X, and e.x = x for all x ∈ X,
where e ∈ G is the identity element.
In this definition we have used multiplicative notation for the group oper-
ation in G, but as usual if G is abelian we will often use additive notation.
Definition 3.71. Let G be a topological group with a continuous action on
a topological space X, and let µ be a measure on the Borel sets of X. Then
we say that the G-action is measure-preserving (or that the measure is
G-invariant) if µ(g.B) = µ(B) for all g ∈ G and any Borel set B ⊆ X.
It is straightforward (see Exercise 3.72) to see that a measure-preserving
action also preserves integration with respect to µ in the sense that
\[
\int_X f^g\,d\mu = \int_X f(g^{-1}.x)\,d\mu(x) = \int_X f(x)\,d\mu(x) = \int_X f\,d\mu
\tag{3.23}
\]
for all integrable functions f and g ∈ G, where we define $f^g(x) = f(g^{-1}.x)$
for all x ∈ X (the inverse in the definition of $f^g$ is only necessary if G is
non-abelian). In particular, if $f_1, f_2 \in L^2_\mu(X)$ then $\|f_1^g\|_2 = \|f_1\|_2$ and, more
generally,
\[ \langle f_1^g, f_2^g \rangle = \langle f_1, f_2 \rangle. \tag{3.24} \]
We can associate to the action of g on X an operator $\pi_g$ on $L^2_\mu(X)$ defined
by
\[ \pi_g f = f^g = f \circ g^{-1}. \]
Using (3.24) and the relation $\pi_g \pi_{g^{-1}} = \pi_{g^{-1}} \pi_g = I$, we see that $\pi_g$ is unitary
(see Definition 3.8 and Exercise 3.9) for all g ∈ G.
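A finite toy model makes the construction concrete. The sketch below is an illustration only: it assumes G = X = Z/N with the action g.x = x + g and counting measure (which is invariant), so that functions on X are lists of length N. The names shift and inner are hypothetical helpers, and the inner product is taken linear in the first argument.

```python
def shift(g, f):
    """(pi_g f)(x) = f(g^{-1}.x) = f(x - g) for the action g.x = x + g of Z/N on itself."""
    n = len(f)
    return [f[(x - g) % n] for x in range(n)]


def inner(f1, f2):
    """Inner product with respect to counting measure, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(f1, f2))
```

In this model the identity (3.24) and the composition rule for the operators $\pi_g$ can be checked directly on any pair of sample vectors, since each $\pi_g$ merely permutes the entries.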
Essential Exercise 3.72. Prove that a group action is measure-preserving
if and only if (3.23) holds for all integrable f.
Definition 3.73. A unitary representation of a topological group G on a
Hilbert space H is a map π : G → B(H), written as $\pi_g$ (or π(g), g.v, or $v^g$
for v ∈ H), such that
• (Identity) $\pi_e = I$, the identity operator on H;
• (Composition) $\pi_{g_1} \circ \pi_{g_2} = \pi_{g_1 g_2}$ for $g_1, g_2 \in G$;
• (Unitary) $\pi_g : H \to H$ is unitary for every g ∈ G; and
• (Continuity) for any given v ∈ H, the map $g \mapsto \pi_g v \in H$ is continuous.

We note that the first three properties together state that π is a homo-
morphism into the group of unitary operators.
Lemma 3.74 (Continuity). Let G be a locally compact metric group
acting continuously on a locally compact σ-compact metric space X. Suppose
that the action is measure-preserving with respect to a locally finite† measure µ
on the Borel sets of X. Then the induced action of G on $H = L^2_\mu(X)$ is
a unitary representation of G on H. More generally, for any p ∈ [1, ∞)
and $f \in L^p_\mu(X)$ we have that $G \ni g \mapsto f^g \in L^p_\mu(X)$ is continuous with respect
to $\|\cdot\|_p$ and satisfies $\|f^g\|_p = \|f\|_p$.
Lemma 3.74 is an important motivation for the study of unitary repres-
entations in general. We will return to this topic in several settings.
Proof of Lemma 3.74. By Exercise 3.72 (see the hints on p. 563), $\pi_g$ defined
by $\pi_g f(x) = f(g^{-1}.x)$ for x ∈ X and $f \in L^p_\mu(X)$ satisfies $\|\pi_g f\|_p = \|f\|_p$ for
every g ∈ G. The first property of a unitary representation holds trivially.
For the second, let f be an element of $L^p_\mu(X)$ and $g_1, g_2 \in G$. Applying the
definition twice we obtain
\[
\pi_{g_1}(\pi_{g_2} f)(x) = \pi_{g_2} f(g_1^{-1}.x) = f(g_2^{-1}.(g_1^{-1}.x))
= f((g_1 g_2)^{-1}.x) = \pi_{g_1 g_2} f(x)
\]
for x ∈ X, giving the second defining property of a unitary representation.

For the continuity we essentially have to repeat the argument from the
proof of Lemma 3.60 regarding convolutions on the torus. So let p ∈ [1, ∞)
and $f \in L^p_\mu(X)$ and fix ε > 0. By Proposition 2.51, there exists some
function $F \in C_c(X)$ with
\[ \|f - F\|_p < \varepsilon. \tag{3.25} \]
Let U be a compact neighbourhood of e ∈ G, so that
\[ K = U.\operatorname{Supp} F \subseteq X \]
is compact and hence has finite measure. The map
\[ (g, x) \longmapsto F(g^{-1}.x) \]
is continuous and so is uniformly continuous on the set U × K. Hence there
exists a δ > 0 for which
\[
d(g, e) < \delta \implies g \in U \text{ and } \bigl| F(g^{-1}.y) - F(y) \bigr| < \frac{\varepsilon}{\sqrt[p]{\mu(K)}}
\]
for all y ∈ K. Note also that $F(g^{-1}.y) \neq 0$ for y ∈ X and g ∈ U implies
$g^{-1}.y \in \operatorname{Supp} F$ and hence y ∈ K. Thus
\[
\|\pi_g F - F\|_p^p = \int_X \bigl| F(g^{-1}.x) - F(x) \bigr|^p\,d\mu(x)
= \int_K \underbrace{\bigl| F(g^{-1}.x) - F(x) \bigr|^p}_{< \varepsilon^p / \mu(K)}\,d\mu(x) < \varepsilon^p
\]
for all g with d(g, e) < δ. Together with (3.25), this gives
\[ \|\pi_g f - f\|_p < 3\varepsilon \]
for all g ∈ U with d(g, e) < δ. Therefore, $G \ni g \mapsto \pi_g f$ is continuous at g = e.
However, this gives the general case since
\[ \|\pi_g f - \pi_{g_0} f\|_p = \|\pi_{g_0^{-1} g} f - f\|_p < 3\varepsilon \]
if we assume that g is sufficiently close to $g_0$ so that $d(g_0^{-1} g, e) < \delta$. As ε > 0
and $g_0 \in G$ were arbitrary, continuity of $g \mapsto \pi_g f$ follows.
† That is, a measure µ with the property that at every point there is an open neighbourhood
of the point with finite µ-measure.
We explain now that any continuous group action and invariant measure
gives rise to a convolution, which will play a critical role in the following
discussions.
Lemma 3.75. Let G be a locally compact σ-compact metric group with a
left Haar measure $m_G$, X a locally compact σ-compact metric space with
a continuous group action of G on X, and µ a locally finite G-invariant
measure on X. Let $\varphi \in L^1(G)$, p ∈ [1, ∞) and $f \in L^p_\mu(X)$. Then the integral
\[ \varphi \mathbin{\dot{\ast}} f(x) = \int_G \varphi(g) f(g^{-1}.x)\,dm_G(g) \]
exists for µ-almost every x ∈ X, and $\|\varphi \mathbin{\dot{\ast}} f\|_p \leq \|\varphi\|_1 \|f\|_p$.
We note that the inequality also shows that $\varphi \mathbin{\dot{\ast}} f \in L^p_\mu(X)$ only depends
on the equivalence classes of $f \in L^p_\mu(X)$ and $\varphi \in L^1(G)$.
Proof of Lemma 3.75. Let (Y, B, ν) be a probability space. Recall that the
map $[0, \infty) \ni t \mapsto t^p$ is convex, which implies that $\left( \int f\,d\nu \right)^p \leq \int f^p\,d\nu$ for
any non-negative simple function f on Y. Using monotone convergence we
obtain the same for any non-negative measurable f on Y (that is, a special
case of Jensen's inequality).
Consider now G, φ, X, and f as in the lemma, but assume first that φ
is a non-negative measurable function with $\int \varphi\,dm_G = 1$, and f ≥ 0. We
apply Jensen's inequality to the function $g \mapsto f(g^{-1}.x)$ and the probability
measure $\varphi\,dm_G$ on G to obtain
\[
\left( \int_G f(g^{-1}.x)\,\varphi(g)\,dm_G(g) \right)^p \leq \int_G f(g^{-1}.x)^p\,\varphi(g)\,dm_G(g).
\]
Fubini's theorem shows that $\varphi \mathbin{\dot{\ast}} f(x) = \int_G f(g^{-1}.x)\,\varphi(g)\,dm_G(g)$ depends
measurably on x ∈ X. Integrating over X with respect to µ gives
\[
\begin{aligned}
\int_X (\varphi \mathbin{\dot{\ast}} f)^p\,d\mu
&\leq \int_X \int_G f(g^{-1}.x)^p\,\varphi(g)\,dm_G(g)\,d\mu(x) \\
&= \|f\|_p^p \int_G \varphi(g)\,dm_G(g) = \|f\|_p^p,
\end{aligned}
\]
where we also applied Fubini's theorem to the function
\[ G \times X \ni (g, x) \longmapsto f(g^{-1}.x)^p\,\varphi(g) \]
and used the fact that $\pi_g$ preserves the p-norm (see Lemma 3.74). Taking
the pth root, we obtain $\|\varphi \mathbin{\dot{\ast}} f\|_p \leq \|f\|_p$ in the case considered.
If $f \in L^p_\mu(X)$ and $\varphi \in L^1(G)$, then we can apply the above to $\tilde{f} = |f|$
and $\tilde{\varphi} = \|\varphi\|_1^{-1} |\varphi|$. Since
\[
\begin{aligned}
|\varphi \mathbin{\dot{\ast}} f|(x) &= \left| \int_G \varphi(g) f(g^{-1}.x)\,dm_G(g) \right| \\
&\leq \|\varphi\|_1 \int_G \tilde{\varphi}(g)\,\tilde{f}(g^{-1}.x)\,dm_G(g) = \|\varphi\|_1\; \tilde{\varphi} \mathbin{\dot{\ast}} \tilde{f}(x),
\end{aligned}
\]
the lemma follows from the previous case.
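The norm inequality of Lemma 3.75 can be tried out in a finite model where X is genuinely different from G. The following sketch is an illustration under stated assumptions: G = Z/4 acts on X = Z/12 by g.x = x + 3g, both carry counting measure, and the helper names action_convolve and p_norm are hypothetical.

```python
def action_convolve(phi, f, step):
    """Convolution of phi on Z/n_g with f on Z/n_x for the action g.x = x + step*g:
    the value at x is the sum over g of phi(g) * f(g^{-1}.x) = phi(g) * f(x - step*g)."""
    n_g, n_x = len(phi), len(f)
    if (step * n_g) % n_x != 0:
        raise ValueError("g.x = x + step*g is not a well-defined action of Z/n_g on Z/n_x")
    return [sum(phi[g] * f[(x - step * g) % n_x] for g in range(n_g))
            for x in range(n_x)]


def p_norm(values, p):
    """The p-norm with respect to counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)
```

For any choice of phi of length 4 and f of length 12, the quantity p_norm(action_convolve(phi, f, 3), p) should not exceed p_norm(phi, 1) * p_norm(f, p), for every p in [1, ∞).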
Essential Exercise 3.76. Prove Lemma 3.75 in the case p = ∞.
We will sometimes consider direct sums of representations as in the next
exercise.
Exercise 3.77. Let G be a topological group, and let $\pi_n$ be a unitary representation of G
on the Hilbert space $H_n$ for n ≥ 1. Define $H_\oplus = \bigoplus_n H_n$ as in Exercise 3.37. Show
that $\pi_{\oplus, g}(v_n) = (\pi_{n, g} v_n)$ for g ∈ G and $(v_n) \in H_\oplus$ defines a unitary representation of G.
3.5.2 Unitary Representations of Compact Abelian Groups
We now describe the decomposition of elements in a complex Hilbert space

into elements of special types with respect to a unitary representation of a
compact abelian group.
Definition 3.78. Let G be a topological group and let π be a unitary
representation of G on the complex Hilbert space H. Let $\chi : G \to \mathbb{S}^1$ be a character
on G. Then v ∈ H is of type χ or has weight χ (type or weight $n \in \mathbb{Z}^d$
if $\chi = \chi_n$ in the case $G = \mathbb{T}^d$) if $\pi_g v = \chi(g) v$ for all g ∈ G. We also define
the weight space $H_\chi = \{ v \in H \mid v \text{ has weight } \chi \}$.
We note that $H_\chi = \bigcap_{g \in G} \ker(\pi_g - \chi(g) I)$ is in fact just a common
eigenspace if one considers all operators $\pi_g$ for g ∈ G simultaneously, and in
particular is closed. Here χ gives us the various eigenvalues χ(g) as g ∈ G
(and hence the operator $\pi_g$) varies.
Lemma 3.79 (Orthogonality). Let G, π, and H be as in Definition 3.78,
then $H_\chi \perp H_\eta$ for any two characters χ ≠ η of G.
Proof. Let $v \in H_\chi$, $w \in H_\eta$ and g ∈ G. Then
\[
\chi(g) \langle v, w \rangle = \langle \chi(g) v, w \rangle = \langle \pi_g v, w \rangle
= \langle v, \pi_{-g} w \rangle = \langle v, \eta(-g) w \rangle = \eta(g) \langle v, w \rangle.
\]
However, for some g ∈ G we have χ(g) ≠ η(g) and so ⟨v, w⟩ = 0.

The following result gives as a special case the decomposition of functions
on $\mathbb{R}^2$ into components of different weights for $\mathrm{SO}_2(\mathbb{R})$.
Theorem 3.80 (Spectral theorem of compact abelian groups). Let G
be a compact metric abelian group and let π be a unitary representation on
a complex Hilbert space H. Then the weight spaces $H_\chi$ are closed, mutually
orthogonal, subspaces of H and
\[ H = \bigoplus_{\chi} H_\chi, \]
where the sum is over all characters χ of G. More concretely, every v ∈ H
can be written as a convergent sum $\sum_\chi v_\chi$ in H, where
\[
v_\chi = \overline{\chi} *_\pi v = \int_G \overline{\chi(g)}\,\pi_g v\,dm_G(g)
\tag{3.26}
\]
has weight χ.
We need to explain the meaning of the convolution by $\overline{\chi}$ in (3.26) by
extending the notion of Riemann or Lebesgue integration to functions taking
values in a Hilbert space (or even a Banach space). In Lemma 3.75 we have
already seen one possible interpretation of the convolution in the case where
the Hilbert space is $L^2_\mu(X)$ and the unitary representation is induced from
a measure-preserving action. Although this is an interesting case, there are
other ways to come by a unitary representation. Hence we will discuss in the
following two subsections two more general definitions of such integrals.
3.5.3 The Strong (Riemann) Integral
One way to interpret $\overline{\chi} *_\pi v$ is to generalize the Riemann integral to this
context, giving rise to a special case of the Bochner or strong integral. This
approach applies to any unitary representation as in Theorem 3.80 (and more
generally). Recall that for a fixed v ∈ H we have assumed in that theorem
that $\pi_g v \in H$ depends continuously on g ∈ G, and since the map $g \mapsto \overline{\chi(g)}$
is continuous we also see that $\overline{\chi(g)}\,\pi_g v \in H$ depends continuously on g ∈ G.
Extracting the essential assumptions from this application we obtain the
assumptions of the next statement.
Proposition 3.81 (Strong integration). Let (X, d) be a compact metric
space and let µ be a finite Borel measure on X. Let V be a Banach space,
and let f : X → V be a continuous map. Then we can define Riemann sums
of the form
\[ R(f, \xi) = \sum_{P \in \xi} \mu(P) f(x_P) \in V, \]
where $\xi = \{P_1, \dots, P_k\}$ is a partition of X into finitely many non-empty
measurable sets and $x_P \in P$ is a point chosen arbitrarily in each partition
element P ∈ ξ. Suppose that
\[ \lim_{n \to \infty} \max_{P \in \xi_n} \operatorname{diam}(P) = 0 \tag{3.27} \]
along a sequence of partitions $(\xi_n)$. Then the sequence of associated Riemann
sums $(R(f, \xi_n))$ forms a Cauchy sequence in V. The limit is a well-defined
Riemann integral $\text{R-}\!\int_X f\,d\mu \in V$ that is independent of the choice of the
partitions and sample points.
Proof. Since f is continuous on a compact metric space, it is also uniformly
continuous. Fix some ε > 0 and let δ > 0 be such that d(x, y) < δ implies
‖f(x) − f(y)‖ < ε. Suppose $\xi = \{P_1, \dots, P_k\}$ and $\zeta = \{Q_1, \dots, Q_\ell\}$ are
two partitions such that $\operatorname{diam}(P_i) < \delta$ and $\operatorname{diam}(Q_j) < \delta$ for all i, j. Let
\[
\eta = \xi \vee \zeta = \{ P_i \cap Q_j \mid P_i \cap Q_j \neq \emptyset,\ i = 1, \dots, k \text{ and } j = 1, \dots, \ell \}.
\]
Fix some choice of sample points for ξ, ζ, and η and note that $x_{P_i} \in P_i$
and $z_{P_i \cap Q_j} \in P_i \cap Q_j \subseteq P_i$ implies $d(x_{P_i}, z_{P_i \cap Q_j}) < \delta$ for every i and j. This
gives
\[
\begin{aligned}
\|R(f, \xi) - R(f, \eta)\|
&= \Bigl\| \sum_{i=1}^{k} {\sum_{j}}' \bigl( f(x_{P_i}) - f(z_{P_i \cap Q_j}) \bigr) \mu(P_i \cap Q_j) \Bigr\| \\
&\leq \sum_{i=1}^{k} {\sum_{j}}' \| f(x_{P_i}) - f(z_{P_i \cap Q_j}) \|\,\mu(P_i \cap Q_j) \\
&\leq \sum_{i=1}^{k} {\sum_{j}}' \varepsilon\,\mu(P_i \cap Q_j) = \varepsilon \mu(X),
\end{aligned}
\]
where we write ${\sum_j}'$ for the sum over those j ∈ {1, . . . , ℓ} with $P_i \cap Q_j \neq \emptyset$.
The same holds for the Riemann sums R(f, ζ) and R(f, η), which implies
that
\[ \|R(f, \xi) - R(f, \zeta)\| \leq 2\mu(X)\varepsilon. \]
This implies the proposition: If $(\xi_n)$ is a sequence satisfying (3.27), then for
every ε > 0 there exists some N such that $\xi = \xi_m$, $\zeta = \xi_n$ satisfy the above
assumptions whenever m, n ≥ N. This implies that $(R(f, \xi_n))$ is a Cauchy
sequence. If $(\zeta_n)$ is another such sequence, we may mix the two sequences
of partitions into another sequence (by, for example, setting $\eta_{2n-1} = \xi_n$
and $\eta_{2n} = \zeta_n$ for all n ∈ N) satisfying (3.27). Since the Riemann sums for
this sequence also form a Cauchy sequence, we see that the limit is indeed
independent of the choice of the sequence of partitions and the choice of the
sample points.
Note that in the context of Theorem 3.80 we may set X = G, the
measure $\mu = m_G$, V = H, and $f(g) = \overline{\chi(g)}\,\pi_g(v)$ and use Proposition 3.81 to
obtain a definition of $\overline{\chi} *_\pi v$.
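A small numerical experiment shows the independence of sample points in Proposition 3.81. The sketch below is an illustration only: it assumes X = [0, 1] with Lebesgue measure, $V = \mathbb{R}^2$ with the Euclidean norm, the map f(t) = (t, t²) whose integral is (1/2, 1/3), and partitions into n intervals of equal length; the function names are hypothetical.

```python
import math


def curve(t):
    """A continuous map from [0, 1] into R^2."""
    return (t, t * t)


def riemann_sum(n, offset):
    """Riemann sum for the partition into n intervals of length 1/n, with the
    sample point (j + offset)/n chosen in the j-th interval (0 <= offset < 1)."""
    sx = sy = 0.0
    for j in range(n):
        x, y = curve((j + offset) / n)
        sx += x / n   # mu(P) = 1/n for every partition element
        sy += y / n
    return (sx, sy)


def distance_to_limit(n, offset):
    """Euclidean distance from the Riemann sum to the exact integral (1/2, 1/3)."""
    sx, sy = riemann_sum(n, offset)
    return math.hypot(sx - 0.5, sy - 1.0 / 3.0)
```

Different values of offset correspond to different choices of sample points; as the mesh 1/n shrinks, all of them should approach the same limit.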
Essential Exercise 3.82. Using the same notation and assumptions as in
Proposition 3.81, show that $\text{R-}\!\int_X f\,d\mu$ depends linearly on f and satisfies
\[ \Bigl\| \text{R-}\!\int_X f\,d\mu \Bigr\| \leq \int_X \|f\|\,d\mu. \]
3.5.4 The Weak (Lebesgue) Integral
In the following we will describe an integral for functions taking values in a

Hilbert space, and this is also the approach that requires the least amount of
structure for the function. On the other hand, this approach does not allow
integration of functions taking values in any Banach space, but works simil-
arly for any dual space (and so also for the class of reflexive Banach spaces to
be defined later). This is a special case of the weak, Pettis or Gelfand–Pettis
integral. In the general setting the properties of the Pettis integral involve
subtle developments in measure theory; we refer to Talagrand [102] for the
details.
Proposition 3.83 (Weak integration). Let (X, B, µ) be a measure space
and let H be a Hilbert space. Let the function f : X → H have the properties
that $x \mapsto \|f(x)\|$ is measurable and integrable, and that for any v ∈ H the
map $x \mapsto \langle v, f(x) \rangle$ is measurable. Then there exists a unique element of H
denoted $\int_X f\,d\mu$ and called the weak integral of f with
\[
\Bigl\langle v, \int_X f\,d\mu \Bigr\rangle = \int_X \langle v, f(x) \rangle\,d\mu(x)
\tag{3.28}
\]
for all v ∈ H. Moreover, $\int_X f\,d\mu$ depends linearly on f and satisfies
\[ \Bigl\| \int_X f\,d\mu \Bigr\| \leq \int_X \|f\|\,d\mu. \]
Proof. To see this we only have to show that the right-hand side of (3.28)
defines a continuous functional on H, for then the Fréchet–Riesz representation
theorem (Corollary 3.19) implies the claimed existence and uniqueness.
By the Cauchy–Schwarz inequality we have
\[
\Bigl| \int_X \langle v, f(x) \rangle\,d\mu(x) \Bigr| \leq \int_X |\langle v, f(x) \rangle|\,d\mu(x) \leq \|v\| \int_X \|f(x)\|\,d\mu(x),
\]
so the hypotheses show that the integral converges, and hence the map is
well-defined. Moreover, for any scalar α and v, w ∈ H we have by linearity of
the inner product that
\[
\int_X \langle \alpha v + w, f(x) \rangle\,d\mu(x) = \alpha \int_X \langle v, f(x) \rangle\,d\mu(x) + \int_X \langle w, f(x) \rangle\,d\mu(x),
\]
showing linearity of the functional.
This shows that $\int_X f\,d\mu \in H$ exists, and that $\bigl\| \int_X f\,d\mu \bigr\| \leq \int_X \|f\|\,d\mu$.
Using uniqueness, one can now check that this definition depends linearly
on f.
Lemma 3.84. Let (X, d) be a compact metric space and let µ be a finite Borel
measure on X. Let H be a Hilbert space, and let f : X → H be a continuous
map. Then $\text{R-}\!\int_X f\,d\mu = \int_X f\,d\mu$, that is, the strong and weak integrals agree.
Proof. Let $(\xi_n)$ be a sequence of partitions as in Proposition 3.81 defining
$\text{R-}\!\int_X f\,d\mu$ as the limit of the Riemann sums $R(f, \xi_n)$. Let w ∈ H and
notice that
\[
\langle w, R(f, \xi_n) \rangle = \sum_{P \in \xi_n} \langle w, f(x_P) \rangle\,\mu(P) = \int_X F_n\,d\mu,
\]
where $F_n$ is the simple function with values $F_n(x) = \langle w, f(x_P) \rangle$ for all x ∈ P
and $P \in \xi_n$. Note that $F_n(x) \to \langle w, f(x) \rangle$ as n → ∞ by continuity of f and
the assumption (3.27) on $(\xi_n)$. Letting n → ∞ we can apply the definition of
the strong integral and dominated convergence to obtain
\[
\Bigl\langle w, \text{R-}\!\int_X f\,d\mu \Bigr\rangle = \int_X \langle w, f(x) \rangle\,d\mu(x).
\]
As this holds for any w ∈ H, the lemma follows from the construction of the
weak integral in Proposition 3.83.
In the context of unitary representations of a group G on a Hilbert space H,
we can use the notions of integration above to define convolution with meas-
ures or with L1 functions as follows.
Definition 3.85. Let π be a unitary representation of a topological group G
on a Hilbert space H. Let µ be a finite measure on G. Then for v ∈ H we
define the convolution operator
\[ \mu *_\pi v = \int_G \pi_g v\,d\mu(g). \]
If G has a left Haar measure $m_G$ and $\varphi \in L^1(G) = L^1_{m_G}(G)$, then for v ∈ H
we define
\[ \varphi *_\pi v = \int_G \varphi(g)\,\pi_g v\,dm_G(g). \]
Essential Exercise 3.86. (a) Let π, G, H, µ, and φ be as in Definition 3.85.
Show that $\|\mu *_\pi v\| \leq \mu(G) \|v\|$ and $\|\varphi *_\pi v\| \leq \|\varphi\|_1 \|v\|$ for all v ∈ H, so
that $\mu *_\pi (\cdot)$ and $\varphi *_\pi (\cdot)$ define two bounded operators on H.
(b) Suppose that the unitary representation π on $H = L^2_\mu(X)$ is induced by a
measure-preserving action of G as in Lemma 3.75. Let ν be a finite measure
on G, $\varphi \in L^1(G)$ and f ∈ H. Generalize Lemma 3.75 to also give a definition
of $\nu \mathbin{\dot{\ast}} f$. Show that the pointwise defined functions $\nu \mathbin{\dot{\ast}} f$ and $\varphi \mathbin{\dot{\ast}} f$ satisfy (3.28)
(or equivalently that $\nu \mathbin{\dot{\ast}} f = \nu *_\pi f$ and $\varphi \mathbin{\dot{\ast}} f = \varphi *_\pi f$).
3.5.5 Proof of the Weight Decomposition
We are now ready to prove Theorem 3.80 which, apart from the generalized
context, is a simple extension of the theorem regarding Fourier series on
compact abelian groups (Theorem 3.47). We will use the assumptions of the
theorem in this section without further remark.
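Lemma 3.87 (Convolution with $\overline{\chi}$). Let χ be a character of G. Then $\overline{\chi} *_\pi v$
has weight χ for any v in H.
Proof. We need to prove that $\pi_g(\overline{\chi} *_\pi v) = \chi(g)(\overline{\chi} *_\pi v)$. To this end, note that
\[
\begin{aligned}
\langle w, \pi_g(\overline{\chi} *_\pi v) \rangle &= \langle \pi_{-g} w, \overline{\chi} *_\pi v \rangle \\
&= \int_G \langle \pi_{-g} w, \overline{\chi(h)}\,\pi_h v \rangle\,dm_G(h) \\
&= \int_G \langle w, \overline{\chi(h)}\,\pi_{g+h} v \rangle\,dm_G(h) \\
&= \int_G \langle w, \overline{\chi(h' - g)}\,\pi_{h'} v \rangle\,dm_G(h') \\
&= \int_G \langle w, \chi(g)\,\overline{\chi(h')}\,\pi_{h'} v \rangle\,dm_G(h') = \langle w, \chi(g)\,\overline{\chi} *_\pi v \rangle
\end{aligned}
\]
for w ∈ H, which gives the lemma.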

Lemma 3.88 (Convolution with $\overline{\chi}$). If χ is a character and $v \in H_\chi$
then we have $\overline{\chi} *_\pi v = v$. If η ≠ χ are two different characters and $v \in H_\eta$,
then $\overline{\chi} *_\pi v = 0$.
Proof. If $v \in H_\chi$ then
\[
\overline{\chi} *_\pi v = \int_G \overline{\chi(g)}\,\pi_g v\,dm_G(g) = \int_G \overline{\chi(g)}\,\chi(g)\,v\,dm_G(g) = v,
\]
since we assume that $m_G(G) = 1$ (strictly speaking, we should argue by
taking the inner product with another vector, as in Proposition 3.83; here,
and in similar cases below, we keep this implicit when convenient). Also
for $v \in H_\eta$ we have
\[
\overline{\chi} *_\pi v = \int_G \overline{\chi(h)}\,\pi_h v\,dm_G = \int_G \overline{\chi(h)}\,\eta(h)\,v\,dm_G = \Bigl( \int_G \overline{\chi}\,\eta\,dm_G \Bigr) v = 0.
\]
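Lemmas 3.87 and 3.88 can be verified by hand in a finite model. The sketch below is an illustration under stated assumptions and not part of the text: G = Z/N with normalised counting measure as Haar measure, the representation is the cyclic shift on $\mathbb{C}^N$, the characters are $\chi_k(g) = e^{2\pi i k g / N}$, and the names chi, rep and weight_component are hypothetical.

```python
import cmath
import math

N = 6  # order of the cyclic group Z/N (an arbitrary choice for the example)


def chi(k, g):
    """The character chi_k(g) = exp(2 pi i k g / N) of Z/N."""
    return cmath.exp(2j * math.pi * k * g / N)


def rep(g, v):
    """The shift representation (pi_g v)(x) = v(x - g) on C^N."""
    return [v[(x - g) % N] for x in range(N)]


def weight_component(k, v):
    """conj(chi_k) *_pi v = (1/N) * sum over g of conj(chi_k(g)) * pi_g v."""
    out = [0j] * N
    for g in range(N):
        coeff = chi(k, g).conjugate() / N
        out = [o + coeff * s for o, s in zip(out, rep(g, v))]
    return out
```

In this model one expects each weight_component(k, v) to satisfy $\pi_g v_k = \chi_k(g) v_k$, the components for k = 0, ..., N − 1 to sum to v, and a second application of the same convolution to act as the identity on the k-th component and as zero on the others.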
Proof of Theorem 3.80. Let H′ be the closed linear hull of $H_\chi$ for all
characters χ of G. By Lemma 3.79 the various weight spaces are mutually
orthogonal, and (as already noted) closed, which gives us the following
description of their closed linear hull
\[
H' = \bigoplus_{\chi} H_\chi = \Bigl\{ \sum_{\chi} v_\chi \Bigm| v_\chi \in H_\chi \text{ and } \sum_{\chi} \|v_\chi\|^2 < \infty \Bigr\},
\]
see Exercise 3.37. If H′ = H then the theorem follows from Exercise 3.86(a)
(which specializes Proposition 3.83) and Lemma 3.88. In fact, (3.26) can then
be shown as follows: if $v = \sum_\eta v_\eta$ is an element of H′ with $v_\eta \in H_\eta$ for every
character η of G, then continuity of convolution implies
\[
\overline{\chi} *_\pi v = \overline{\chi} *_\pi \Bigl( \sum_{\eta} v_\eta \Bigr) = \overline{\chi} *_\pi v_\chi = v_\chi
\]
for any character χ of G.

To see that H′ = H we show that any vector v ∈ H can be approximated
by a vector in the linear hull of the spaces $H_\chi$. Fix v and ε > 0. By continuity
of the unitary representation there exists some δ > 0 such that $\|\pi_g v - v\| < \varepsilon$
for $g \in B_\delta^G$. By Urysohn's lemma (Lemma A.27) there exists a function f
in C(G) with f(0) > 0, f ≥ 0, and f(g) = 0 for all $g \in G \setminus B_\delta^G$.
Replacing f by a multiple of itself we may also suppose $\int_G f\,dm_G = 1$. Using
this function together with the last part of the proof of Theorem 3.47 (or
the more concrete argument from Section 3.4.2 in the case of the torus) we see
that there is a trigonometric polynomial (that is, a finite linear combination
of characters) F with the following properties:
• $F \geq 0$;
• $\int_G F\,dm_G = 1$; and
• $\int_{B_\delta^G} F\,dm_G > 1 - \varepsilon$.
The careful reader might notice that the approximation may not be
real-valued nor have integral one. To deal with this issue we let $F_1$ be the first
ε′-approximation and define $F_2$ to be the function $\frac{1}{2}(F_1 + \overline{F_1}) + \varepsilon'$, which is a
positive 2ε′-approximation since f ≥ 0. Now define $F = \left( \int F_2\,dm_G \right)^{-1} F_2$.
Since $\int f\,dm_G = 1$, we have $\left| \int F_2\,dm_G - 1 \right| < 2\varepsilon'$ and so F is an
ε-approximation if ε′ is sufficiently small.
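Then $F *_\pi v \in H'$ is a finite linear combination of elements from weight
spaces by Lemma 3.87. However, we also claim that
\[ \|F *_\pi v - v\| \leq \varepsilon (1 + 2\|v\|). \tag{3.29} \]
To see this, let w ∈ H, and then notice that
\[
\begin{aligned}
|\langle w, F *_\pi v - v \rangle|
&= \Bigl| \Bigl\langle w, \int_G F(g)\,\pi_g v\,dm_G(g) \Bigr\rangle - \Bigl\langle w, \int_G F\,dm_G\; v \Bigr\rangle \Bigr| \\
&= \Bigl| \int_G \langle w, F(g)(\pi_g v - v) \rangle\,dm_G(g) \Bigr| \\
&\leq \int_{B_\delta^G} \underbrace{|\langle w, \pi_g v - v \rangle|}_{\leq \|w\| \varepsilon}\,F(g)\,dm_G(g)
 + \int_{G \setminus B_\delta^G} \underbrace{|\langle w, \pi_g v - v \rangle|}_{\leq 2\|w\| \|v\|}\,F(g)\,dm_G(g) \\
&\leq \|w\|\,\varepsilon\,(1 + 2\|v\|),
\end{aligned}
\]
which implies (3.29) since we may apply the inequality with $w = F *_\pi v - v$.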

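For the case of the continuous action of T on $\mathbb{R}^2$ defined by
\[
\varphi.\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = k(2\pi\varphi) x =
\begin{pmatrix} \cos(2\pi\varphi) & -\sin(2\pi\varphi) \\ \sin(2\pi\varphi) & \cos(2\pi\varphi) \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\]
for φ ∈ T and all $x = (x_1, x_2)^t \in \mathbb{R}^2$, the above immediately gives the
following corollary (that we hinted at in Section 1.1).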
Corollary 3.89. Every function $f \in L^2(\mathbb{R}^2)$ can be written uniquely as a
sum $\sum_{n \in \mathbb{Z}} f_n$ that converges with respect to $\|\cdot\|_2$, where
\[ f_n(x) = \int_{\mathbb{T}} \chi_n(\varphi)\,f(k(2\pi\varphi) x)\,d\varphi \]
and $f_n$ has weight n with respect to the rotation action of T on $\mathbb{R}^2$ for all
integers n ∈ Z.
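As a concrete instance of Corollary 3.89, consider the following worked example (an illustration, not part of the text). We take $f(x) = x_1 e^{-\|x\|^2}$ and, as a notational assumption, identify $x \in \mathbb{R}^2$ with the complex number $z = x_1 + i x_2$, so that $k(2\pi\varphi)x$ corresponds to $e^{2\pi i \varphi} z$. Only the weights n = 1 and n = −1 occur.

```latex
\[
\bigl( k(2\pi\varphi) x \bigr)_1 = \tfrac{1}{2} \bigl( e^{2\pi i \varphi} z + e^{-2\pi i \varphi}\,\overline{z} \bigr),
\qquad \| k(2\pi\varphi) x \| = \|x\|,
\]
\[
f_n(x) = e^{-\|x\|^2} \int_{\mathbb{T}} e^{2\pi i n \varphi}\,
  \tfrac{1}{2} \bigl( e^{2\pi i \varphi} z + e^{-2\pi i \varphi}\,\overline{z} \bigr)\,d\varphi
  = \begin{cases}
      \tfrac{1}{2}\,\overline{z}\,e^{-\|x\|^2} & \text{if } n = 1, \\
      \tfrac{1}{2}\,z\,e^{-\|x\|^2} & \text{if } n = -1, \\
      0 & \text{otherwise,}
    \end{cases}
\]
\[
f_1\bigl( k(-2\pi\varphi) x \bigr) = \tfrac{1}{2}\,\overline{e^{-2\pi i \varphi} z}\,e^{-\|x\|^2} = \chi_1(\varphi)\,f_1(x),
\qquad f = f_1 + f_{-1}.
\]
```

The last line checks directly that $f_1$ has weight 1, using $\pi_\varphi f(x) = f(\varphi^{-1}.x) = f(k(-2\pi\varphi)x)$; the computation for $f_{-1}$ is analogous.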
Exercise 3.90. Let $f \in C^\infty(\mathbb{R}^2) \cap L^2(\mathbb{R}^2)$ (or, more generally, $f \in C^\infty(\mathbb{R}^2)$). Show that
the decomposition of f given by Corollary 3.89 converges uniformly on compact subsets
of $\mathbb{R}^2$ to f. How much smoothness is needed to arrive at this uniform convergence?
3.5.6 Convolution
We conclude the chapter by generalizing the convolution considered in Sec-

tion 3.4.1 to a more general context, combining it with the discussion regard-
ing the convolution with respect to a unitary representation.
Proposition 3.91 (Convolutions). Let G be a locally compact σ-compact
metric group with a left Haar measure $m_G$. Define the convolution $f_1 * f_2$
of $f_1, f_2 \in L^1(G)$ by
\[ f_1 * f_2(g) = \int f_1(h) f_2(h^{-1} g)\,dm_G(h) \]
for all g ∈ G. The integral defining $f_1 * f_2(g)$ exists for $m_G$-almost every g
in G, and defines an element $f_1 * f_2 \in L^1(G)$ with $\|f_1 * f_2\|_1 \leq \|f_1\|_1 \|f_2\|_1$. In
other words, the convolution makes $L^1(G)$ into a separable Banach algebra.
Suppose in addition π is a unitary representation of G on the Hilbert space H.
Then
\[ f_1 *_\pi (f_2 *_\pi v) = (f_1 * f_2) *_\pi v, \]
where $*_\pi$ is defined in Definition 3.85. In other words, H is a module for the
Banach algebra $L^1(G)$.
Proof. Let G be as in the proposition. Then $G \ni h \longmapsto g.h = gh$ for
every g ∈ G defines (by the definition of topological group) a continuous
group action of G on X = G, called the left action of G (on G). By the
definition of a left Haar measure, $m_G$ is a G-invariant locally finite measure
on G for the left action of G. Therefore, we may apply Lemma 3.75 to
this action, giving $\|f_1 * f_2\|_1 \leq \|f_1\|_1 \|f_2\|_1$ for $f_1, f_2 \in L^1(G)$. Clearly the
map $(f_1, f_2) \mapsto f_1 * f_2$ is bilinear in $f_1, f_2 \in L^1(G)$. It remains to show
associativity for the operation. For this assume that $f_1, f_2, f_3 \in L^1(G)$, and then
note that
\[
\begin{aligned}
(f_1 * f_2) * f_3(g) &= \int_G (f_1 * f_2)(h)\,f_3(h^{-1} g)\,dm_G(h) \\
&= \int_G \int_G f_1(k)\,f_2(\underbrace{k^{-1} h}_{\ell})\,f_3(h^{-1} g)\,dm_G(k)\,dm_G(h).
\end{aligned}
\]
Using Fubini and the substitution $\ell = k^{-1} h$ for a fixed k gives
\[
\begin{aligned}
(f_1 * f_2) * f_3(g) &= \int_G f_1(k) \underbrace{\int_G f_2(\ell)\,f_3(\ell^{-1} k^{-1} g)\,dm_G(\ell)}_{(f_2 * f_3)(k^{-1} g)}\,dm_G(k) \\
&= f_1 * (f_2 * f_3)(g),
\end{aligned}
\]
as required. For the proof of separability of $L^1(G)$ we apply Lemma A.22
to find a sequence $(K_n)$ of compact subsets $K_n \subseteq G$ with $K_n \subseteq K_{n+1}^o$ for
all n ≥ 1 and with $X = \bigcup_{n=1}^{\infty} K_n$. By Lemma 2.46, $C(K_n)$ is separable, which
implies the same for the subset $C_c(K_n^o)$, for every n ≥ 1. Since $m_G(K_n) < \infty$
the inclusion of $C_c(K_n^o)$ into $L^1(G)$ is continuous, and so the image of $C_c(K_n^o)$
in $L^1(G)$ is also separable. By density of $C_c(G) = \bigcup_{n=1}^{\infty} C_c(K_n^o)$ in $L^1(G)$
(Proposition 2.51) this proves separability of $L^1(G)$.
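For the second part we suppose π is a unitary representation of G on the
Hilbert space H and that v, w ∈ H. Then
\[
\begin{aligned}
\langle w, f_1 *_\pi (f_2 *_\pi v) \rangle
&= \Bigl\langle w, \int f_1(h)\,\pi_h \int f_2(g)\,\pi_g v\,dm_G(g)\,dm_G(h) \Bigr\rangle \\
&= \int \overline{f_1(h)}\,\Bigl\langle \pi_{h^{-1}} w, \int f_2(g)\,\pi_g v\,dm_G(g) \Bigr\rangle\,dm_G(h) \\
&= \iint \overline{f_1(h)}\,\langle \pi_{h^{-1}} w, f_2(g)\,\pi_g v \rangle\,dm_G(g)\,dm_G(h) \\
&= \iint \langle w, f_1(h) f_2(g)\,\pi_{hg} v \rangle\,dm_G(g)\,dm_G(h).
\end{aligned}
\]
Using the substitution k = hg in the inner integral and exchanging the order
of integration we obtain
\[
\begin{aligned}
\langle w, f_1 *_\pi (f_2 *_\pi v) \rangle
&= \iint \langle w, f_1(h) f_2(h^{-1} k)\,\pi_k v \rangle\,dm_G(k)\,dm_G(h) \\
&= \int \langle w, f_1 * f_2(k)\,\pi_k v \rangle\,dm_G(k) = \langle w, (f_1 * f_2) *_\pi v \rangle.
\end{aligned}
\]
By the definition of the weak integral in Proposition 3.83, this implies the
proposition.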
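The module property in Proposition 3.91 can also be tested in a finite model. The sketch below is an illustration under stated assumptions: G = Z/4 with counting measure as Haar measure, and the representation of G on $\mathbb{C}^{12}$ given by $(\pi_g v)(x) = v(x - 3g)$. The constants and function names are hypothetical choices made for the example.

```python
N_G = 4            # order of the group G = Z/4
N_X = 12           # dimension of the representation space C^12
STEP = N_X // N_G  # g acts on Z/12 by x -> x + STEP * g


def group_convolve(f1, f2):
    """(f1 * f2)(g) = sum over h of f1(h) f2(h^{-1} g) = f1(h) f2(g - h) on Z/4."""
    return [sum(f1[h] * f2[(g - h) % N_G] for h in range(N_G)) for g in range(N_G)]


def rep_convolve(phi, v):
    """phi *_pi v = sum over g of phi(g) pi_g v, where (pi_g v)(x) = v(x - STEP * g)."""
    return [sum(phi[g] * v[(x - STEP * g) % N_X] for g in range(N_G))
            for x in range(N_X)]


def l1_norm(f):
    """The L^1 norm with respect to counting measure."""
    return sum(abs(value) for value in f)
```

With these definitions, rep_convolve(f1, rep_convolve(f2, v)) and rep_convolve(group_convolve(f1, f2), v) should agree up to rounding, and l1_norm should be submultiplicative under group_convolve.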
Essential Exercise 3.92. Suppose in addition to the assumptions of
Proposition 3.91 that G is abelian and that the Haar measure is invariant
under† $g \mapsto -g$. Show that $f_1 * f_2 = f_2 * f_1$ for any $f_1, f_2 \in L^1(G)$.
† The assumption that G is abelian actually implies this by the uniqueness properties of
the Haar measure (see Proposition 10.2).
Exercise 3.93. Recall from Exercise 3.33 that the space of signed measures on G is a
Banach space. Define a convolution for elements in the space of signed measures on G, and
extend Proposition 3.91 to this case.
Exercise 3.94. Show that there is a continuous injective algebra homomorphism from
the Banach algebra $\ell^1(\mathbb{Z})$ (which may be thought of as $L^1(\mathbb{Z})$ with respect to the counting
measure on Z and convolution) to C(T), where multiplication is pointwise multiplication.
3.6 Further Topics
• Hilbert spaces are at the heart of many developments. We will start to see
this in the context of Sobolev spaces and the Laplace differential operator
in Chapters 5 and 6.
• Another case where the Hilbert space splits into eigenspaces will be con-
sidered in Chapter 6.
• The spectral theory of a single unitary operator (equivalently, a unitary
representation of the group G = Z) is actually more delicate than the
case considered above (see Exercise 6.1, for example), where we showed
that the Hilbert space splits into a sum of generalized eigenspaces (the
weight spaces). We will treat the case of a single unitary operator only
in Chapter 9 (which will build on the material in Chapters 7 and 8).
• The topic of Fourier series on Td leads naturally to the study of the
Fourier integral on Rd (see Section 9.2). The concepts of Fourier series and
Fourier integrals on Td and Rd , respectively, find a common generalization
in the theory of Pontryagin duality (see Section 12.8).
• The case of unitary representations for compact abelian groups considered
in this chapter was quite straightforward and is only the beginning of the
important theory of unitary representations of locally compact groups.
For locally compact abelian groups this is strongly related to Pontryagin
duality; see Sections 11.4 and 12.8. For compact groups the main theorem
in this direction is the Peter–Weyl theorem [85] (which is covered in
Folland [32]). For many other groups that are neither abelian nor compact
this topic is also important and can have many interesting surprises.
• One such surprise may be the so-called property (T) that was introduced
by Každan in 1967 and has become important in many parts of math-
ematics since then. Building on the material in Chapter 9 we will study
this notion in Section 10.3.
• We have seen in this chapter that the notion of a left Haar measure leads
to many interesting concepts. For a concretely given group it is often
not difficult to find its left Haar measure. In Chapter 10 (which relies
on Chapter 7) we will prove the existence of the left Haar measure in
general.
The reader may continue with Chapter 4, 5, 6, or 7 (with some of the
material of Chapter 6 building on Chapter 5).
Chapter 4
Uniform Boundedness and the Open
Mapping Theorem
In this chapter we present the main consequences of completeness for Banach
spaces.
4.1 Uniform Boundedness
Our first result is the principle of uniform boundedness or the
Banach–Steinhaus theorem.
Theorem 4.1 (Banach–Steinhaus). Let $X$ be a Banach space and let $Y$
be a normed vector space. Let $\{T_\alpha \mid \alpha \in A\}$ be a family of bounded linear
operators from $X$ to $Y$. Suppose that for each $x \in X$, the set $\{T_\alpha x \mid \alpha \in A\}$
is a bounded subset of $Y$. Then the set $\{\|T_\alpha\| \mid \alpha \in A\}$ is bounded.
The reader in a hurry may also first prove the Baire category theorem
(Theorem 4.12) and derive Theorem 4.1 relatively quickly from it (see Ex-
ercise 4.16). We refrain from doing this here as it might help her to see the
argument behind the Baire category theorem once here in the concrete ap-
plication and once in the general case.
Proof of Theorem 4.1. Assume first that there is an open ball $B_\varepsilon(x_0)$ on
which $\{T_\alpha x \mid \alpha \in A\}$ is uniformly bounded: that is, there is a constant $K$ such that
\[ \|x - x_0\| < \varepsilon \implies \|T_\alpha x\| \leq K \tag{4.1} \]
for all $\alpha \in A$. Then we claim that it is possible to find a bound on the
family $\{\|T_\alpha\| \mid \alpha \in A\}$ of the norms of the operators. Indeed, for any $y \neq 0$
define
\[ z = \frac{\varepsilon}{2\|y\|} y + x_0. \]
Then $z \in B_\varepsilon(x_0)$ by construction, so (4.1) implies that $\|T_\alpha z\| \leq K$. Now by
linearity of $T_\alpha$ the triangle inequality shows that
\[ \frac{\varepsilon}{2\|y\|} \|T_\alpha y\| - \|T_\alpha x_0\| \leq \left\| \frac{\varepsilon}{2\|y\|} T_\alpha y + T_\alpha x_0 \right\| = \|T_\alpha z\| \leq K, \]
which can be solved for $\|T_\alpha y\|$ to give
\[ \|T_\alpha y\| \leq \frac{2(K + \|T_\alpha x_0\|)}{\varepsilon} \|y\| \leq \frac{2(K + K')}{\varepsilon} \|y\|, \]
where $K' = \sup_\alpha \|T_\alpha x_0\| \leq K < \infty$. It follows that
\[ \|T_\alpha\| \leq \frac{4K}{\varepsilon} \]
for every $\alpha \in A$, as required.
To finish the proof we have to show that there is a ball on which property (4.1)
holds. This is proved by contradiction. Assume that there is no ball
on which (4.1) holds. Fix an arbitrary open ball $B_0$. By assumption there is
a point $x_1 \in B_0$ such that
\[ \|T_{\alpha_1} x_1\| > 1 \]
for some index $\alpha_1 \in A$. Since each $T_\alpha$ is continuous, there is a ball $B_{\varepsilon_1}(x_1)$
with $\|T_{\alpha_1} y\| > 1$ for all $y \in B_{\varepsilon_1}(x_1)$. Assume without loss of generality
that $\overline{B_{\varepsilon_1}(x_1)} \subseteq B_0$ and $\varepsilon_1 < 1$. By assumption, in this new ball the
family $\{T_\alpha x \mid \alpha \in A\}$ is not bounded, so there is a point $x_2 \in B_{\varepsilon_1}(x_1)$ with
\[ \|T_{\alpha_2} x_2\| > 2 \]
for some index $\alpha_2 \in A$. We continue in the same way. By continuity of $T_{\alpha_2}$
there is a ball $B_{\varepsilon_2}(x_2)$ with $\overline{B_{\varepsilon_2}(x_2)} \subseteq B_{\varepsilon_1}(x_1)$ and with $\|T_{\alpha_2} y\| > 2$ for
all $y \in B_{\varepsilon_2}(x_2)$. Assume without loss of generality that $\varepsilon_2 < \frac{1}{2}$.
Repeating this process produces points $x_1, x_2, \ldots$, indices $\alpha_1, \alpha_2, \ldots$, and
positive numbers $\varepsilon_1, \varepsilon_2, \ldots$ such that $\overline{B_{\varepsilon_n}(x_n)} \subseteq B_{\varepsilon_{n-1}}(x_{n-1})$, $\varepsilon_n < \frac{1}{n}$, and
\[ \|T_{\alpha_n} y\| > n \quad \text{for all } y \in B_{\varepsilon_n}(x_n) \]
for all $n \geq 1$. Now the sequence $(x_n)$ is clearly Cauchy (since $x_m \in B_{\varepsilon_n}(x_n)$
for all $m > n$, and so $d(x_m, x_n) < \varepsilon_n < 1/n$), and therefore converges to
some $z \in X$. By construction, $z \in \overline{B_{\varepsilon_{n+1}}(x_{n+1})} \subseteq B_{\varepsilon_n}(x_n)$ and so $\|T_{\alpha_n} z\| > n$ for all $n \geq 1$,
which contradicts the hypothesis that the set $\{T_\alpha z \mid \alpha \in A\}$ is bounded.
Corresponding to the operator norm defined in Lemma 2.52 there is of
course a notion of convergence in the space $B(X, Y)$ of bounded linear
operators from $X$ to $Y$. A sequence $(T_n)$ in $B(X, Y)$ is uniformly convergent
to $T \in B(X, Y)$ if $\|T_n - T\| \to 0$ as $n \to \infty$ (so uniform convergence of a
sequence of operators is simply convergence in the operator norm).
A different (and weaker, despite the name) notion of convergence for a
sequence of operators is given by the following definition. We will discuss
this and other notions of convergence again in Section 8.3.
Definition 4.2. A sequence $(T_n)$ in $B(X, Y)$ is strongly convergent if for
any $x \in X$ the sequence $(T_n x)$ converges in $Y$. If there is a $T \in B(X, Y)$
with $\lim_{n \to \infty} T_n x = T x$ for all $x \in X$, then $(T_n)$ is strongly convergent to $T$.
Corollary 4.3. Let X be a Banach space, and Y any normed vector space.
If a sequence (Tn ) in B(X, Y ) is strongly convergent, then there exists an
operator T ∈ B(X, Y ) such that (Tn ) is strongly convergent to T .
Proof. For each $x \in X$ the sequence $(T_n x)$ is bounded since it is convergent.
By the uniform boundedness principle (Theorem 4.1), there is a constant $K$
such that $\|T_n\| \leq K$ for all $n$. Hence
\[ \|T_n x\| \leq K \|x\| \quad \text{for all } x \in X. \tag{4.2} \]
We now define $T : X \to Y$ by $T x = \lim_{n \to \infty} T_n x$ for all $x \in X$. It is clear
that $T$ is linear, and (4.2) shows that $\|T x\| \leq K \|x\|$ for all $x \in X$, so $T$ is
bounded. The construction of $T$ means that $(T_n)$ converges strongly to $T$.
We note that the conclusions of Theorem 4.1 and of Corollary 4.3 crucially
rely on the assumption that X is a Banach space (see also Exercise 4.6).
(For Corollary 4.3 it is also crucial that we have restricted our attention to
sequences.)
Exercise 4.4. Prove that uniform convergence implies strong convergence, and find an
example of a sequence of bounded operators from a Banach space into a Banach space to
show that strong convergence does not imply uniform convergence.
Exercise 4.5. Phrase the definition of a unitary representation of a metric group (Defin-
ition 3.73) using the notion of strong convergence for operators.
Exercise 4.6. Let $c_c(\mathbb{N}) \subseteq \ell^\infty(\mathbb{N})$ be the space of sequences with finite support equipped
with the supremum norm. Define $T : c_c(\mathbb{N}) \to c_c(\mathbb{N})$ by
\[ T(x_1, x_2, x_3, \ldots) = (x_1, 2x_2, 3x_3, \ldots) \]
for all $(x_1, x_2, x_3, \ldots) \in c_c(\mathbb{N})$. Show that $T$ is not bounded. Construct a sequence of
bounded linear operators $T_k$ on $c_c(\mathbb{N})$ with $T_k x \to T x$ as $k \to \infty$ for all $x$ in $c_c(\mathbb{N})$.
4.1.1 Uniform Boundedness and Fourier Series
This section gives an application of Theorem 4.1 to classical Fourier analysis
on $\mathbb{T}$ (see Section 3.4 for the background). Recall that if $f \in C(\mathbb{T})$ then the
Fourier coefficients of $f$ are defined by $a_m = \langle f, \chi_m \rangle$ where $\chi_m(x) = e^{2\pi i m x}$,
for $m \in \mathbb{Z}$ and $x \in \mathbb{T}$. The $n$th partial sum of the Fourier series is
\[ s_n(x) = \sum_{m=-n}^{n} a_m e^{2\pi i m x}. \]
Recall that one of the basic goals of Fourier analysis is to clarify the
relationship between the sequence of partial sums $(s_n)$ and the function $f$. That is,
we want to understand in what sense the function $s_n$ approximates $f$ for large $n$
(if it does at all). We now ask if the sequence of functions $(s_n)$ converges
uniformly or pointwise to $f$ for $f \in C(\mathbb{T})$.
Recall from Definition 3.61 that the Dirichlet kernel $D_n$ is defined by
\[ D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = \frac{\sin((n + \frac{1}{2}) 2\pi x)}{\sin(\pi x)} \]
for $x \in \mathbb{T}$. By the discussion in Section 3.4.2 we have
\[ s_n(0) = \int_{\mathbb{T}} f(x) D_n(-x) \, dx = \int_{\mathbb{T}} f(x) D_n(x) \, dx. \]
Lemma 4.7. The linear functional $T_n : C(\mathbb{T}) \to \mathbb{R}$ defined by
\[ T_n f = \int_{\mathbb{T}} f(x) D_n(x) \, dx \]
is bounded, with
\[ \|T_n\| = \int_{\mathbb{T}} |D_n(x)| \, dx. \]
This is a very special case of the general argument in Lemma 2.63, but we
include it for the case at hand as this is easier to prove.
Proof. For any function $f \in C(\mathbb{T})$ we have
\[ |T_n f| \leq \int_{\mathbb{T}} |f(x)| |D_n(x)| \, dx \leq \|f\|_\infty \int_{\mathbb{T}} |D_n(x)| \, dx, \]
so
\[ \|T_n\| \leq \int_{\mathbb{T}} |D_n(x)| \, dx. \]
Fix $\delta > 0$. Since $D_n$ is analytic it can only have finitely many sign changes
in $[0, 1]$. Therefore, we may find a continuous (this could even be chosen to
be piecewise-linear, for example) function $f_n$ with $\|f_n\|_\infty \leq 1$ that differs
from $\operatorname{sign}(D_n(x))$ only on a finite union of intervals whose total length is less
than $\frac{1}{\|D_n\|_\infty} \delta$. The triangle inequality for integrals now gives
\[ \left| \int_{\mathbb{T}} f_n(x) D_n(x) \, dx \right| \geq \int_{\mathbb{T}} |D_n(x)| \, dx - 2\delta, \]
which proves the lemma as $\delta > 0$ was arbitrary.

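Lemma 4.8. The Dirichlet kernel $D_n$ from Definition 3.61 satisfies
\[ \int_{\mathbb{T}} |D_n(x)| \, dx = \int_{\mathbb{T}} \left| \frac{\sin((n + \frac{1}{2}) 2\pi x)}{\sin(\pi x)} \right| dx \longrightarrow \infty \]
as $n \to \infty$.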
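Proof. Recall that $|\sin t| \leq |t|$ for all $t \in \mathbb{R}$. It follows that
\[ \int_{\mathbb{T}} \left| \frac{\sin((n + \frac{1}{2}) 2\pi x)}{\sin(\pi x)} \right| dx \geq \int_0^1 \frac{1}{\pi x} |\sin((2n + 1)\pi x)| \, dx. \]
Now $|\sin t| \geq \frac{1}{2}$ for all $t \in \pi\mathbb{Z} + [\frac{\pi}{6}, \frac{5\pi}{6}]$. In particular, it follows that if
\[ (2n + 1)\pi x \in \bigcup_{k=0}^{2n} \big[ (k + \tfrac{1}{6})\pi, (k + \tfrac{5}{6})\pi \big] \]
then
\[ |\sin((2n + 1)\pi x)| \geq \tfrac{1}{2}. \]
The condition on $x$ says that $x$ lies in one of the intervals from $(k + \frac{1}{6})/(2n+1)$
to $(k + \frac{5}{6})/(2n+1)$, each of length $\frac{4}{6}/(2n+1)$, and on such an interval $\frac{1}{\pi x}$ is at
least $\frac{1}{\pi (k + \frac{5}{6})/(2n+1)}$. Together this gives
\[
\begin{aligned}
\int_{\mathbb{T}} \left| \frac{\sin((n + \frac{1}{2}) 2\pi x)}{\sin(\pi x)} \right| dx
&\geq \sum_{k=0}^{2n} \int_{(k + \frac{1}{6})/(2n+1)}^{(k + \frac{5}{6})/(2n+1)} \frac{1}{\pi (k + \frac{5}{6})/(2n+1)} \cdot \frac{1}{2} \, dx \\
&= \frac{1}{2\pi} \sum_{k=0}^{2n} \frac{2n+1}{k + \frac{5}{6}} \cdot \frac{\frac{4}{6}}{2n+1}
= \frac{1}{3\pi} \sum_{k=0}^{2n} \frac{1}{k + \frac{5}{6}} \longrightarrow \infty
\end{aligned}
\]
as $n \to \infty$.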
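The divergence in Lemma 4.8 is slow (logarithmic), which a short numerical experiment makes visible. The following is a minimal sketch, not part of the original text: it approximates $\int_0^1 |D_n(x)| \, dx$ by a midpoint rule and compares it with the harmonic-type lower bound $\frac{1}{3\pi} \sum_{k=0}^{2n} \frac{1}{k + 5/6}$, obtained by bounding $|\sin|$ below by $\frac{1}{2}$ on the middle two thirds of each half-period (our own simplification of the estimate in the proof). The function names, the sample count, and the choice of quadrature rule are illustrative assumptions.

```python
import math


def dirichlet_kernel(n, x):
    """Closed form D_n(x) = sin((2n+1)*pi*x) / sin(pi*x), valid for non-integer x."""
    return math.sin((2 * n + 1) * math.pi * x) / math.sin(math.pi * x)


def l1_norm_dirichlet(n, samples=20000):
    """Midpoint-rule approximation of the integral of |D_n| over [0, 1]."""
    h = 1.0 / samples
    # The midpoints (i + 1/2) * h never hit x = 0 or x = 1, where the
    # closed form has removable singularities.
    return h * sum(abs(dirichlet_kernel(n, (i + 0.5) * h)) for i in range(samples))


def harmonic_lower_bound(n):
    """Lower bound (1 / (3*pi)) * sum_{k=0}^{2n} 1 / (k + 5/6) for the same integral."""
    return sum(1.0 / (k + 5.0 / 6.0) for k in range(2 * n + 1)) / (3.0 * math.pi)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, l1_norm_dirichlet(n), harmonic_lower_bound(n))
```

Both columns grow without bound as $n$ increases, but only at a logarithmic rate, which is why the divergence of Fourier series for continuous functions is hard to observe in small examples.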
Theorem 4.9. There exists a continuous function $f \in C(\mathbb{T})$ whose Fourier
series diverges at $x = 0$.
Proof. As noted before Lemma 4.7, we have $T_n f = s_n(0)$ for all $f \in C(\mathbb{T})$.
Moreover, for a fixed $f \in C(\mathbb{T})$, if the Fourier series of $f$ converges at 0, then
the family $\{T_n f \mid n \geq 1\}$ is bounded (since each element is just a partial
sum of a convergent series). Thus if the Fourier series of $f$ converges at 0
for all $f \in C(\mathbb{T})$, then for each $f \in C(\mathbb{T})$ the set $\{T_n f \mid n \geq 1\}$ is bounded.
By Theorem 4.1, this implies that the set $\{\|T_n\| \mid n \geq 1\}$ is bounded, which
contradicts Lemmas 4.7 and 4.8.
It follows that there must be some $f \in C(\mathbb{T})$ whose Fourier series does not
converge at 0 (and in fact the partial sums must be unbounded).
In principle the proofs of Theorem 4.1 and Theorem 4.9 allow one to con-
struct the function f as in Theorem 4.9 more concretely, at least as the
limit of a Cauchy sequence of explicit continuous functions. Comparing The-
orem 4.9 with the absolute convergence claim in Theorem 3.57 and the result
regarding the Fejér kernel in Proposition 3.65, we see that this limit func-
tion is not continuously differentiable and that the Fourier series of f at 0
is an oscillating function with the property that the Cesàro averages of the
diverging sequence (sn (0)) actually converge to f (0).
4.2 The Open Mapping and Closed Graph Theorems
Recall that a continuous map has the property that the pre-image of any
open set is open, but in general the image of an open set is not open. We now
show that bounded linear maps between Banach spaces on the other hand
have the following special property.
Theorem 4.10 (Open mapping theorem). Let X and Y be Banach
spaces, and let T be a bounded linear map from X onto Y . Then T maps
open sets in X onto open sets in Y .
The assumption that $T$ maps $X$ onto $Y$ is essential. Consider, for example,
the projection $(x, y) \mapsto (x, 0)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ to see this.
The proof of Theorem 4.10 uses the Baire category theorem,(13) which
states that a complete non-empty metric space cannot be written as a count-
able union of nowhere dense subsets.
4.2.1 Baire Category
Definition 4.11. A subset $S \subseteq X$ of a metric space $(X, d)$ is said to be
nowhere dense if for every point $x \in \overline{S}$, and for every $\varepsilon > 0$, $B_\varepsilon(x) \cap (X \setminus \overline{S})$ is
non-empty (equivalently, if $(\overline{S})^o = \emptyset$). A set is called meagre or first category
if it is a countable union of nowhere dense sets.
We will think of a nowhere dense set as being small and want to extend
this to meagre sets. The next result is needed as a justification of that inter-
pretation.
Theorem 4.12 (Baire category theorem). A complete non-empty metric
space cannot be written as a countable union of nowhere dense sets. Indeed,
the complement of a countable union of nowhere dense sets is dense.
This is often described by saying that a complete metric space is of second
category. As we will see, the method of proof is similar to the proof of The-
orem 4.1 (see also Exercise 4.16).
We note that in a normed linear space the closed ball $\overline{B}_r(x)$ coincides
with the closure $\overline{B_r(x)}$ of the open ball. However, in a metric space they
may be entirely different: if any set $X$ is given the discrete metric defined
by $d(x, y) = 1$ if $x \neq y$ and $0$ if $x = y$, then $\overline{B}_1(x) = X$ while $\overline{B_1(x)} = \{x\}$
for any $x \in X$.
Proof of Theorem 4.12. Let $X$ be a complete non-empty metric space,
and suppose that $(X_j)$ is a sequence of nowhere dense subsets of $X$ (that
is, the sets $\overline{X_j}$ all have empty interior for $j = 1, 2, \ldots$). Fix an arbitrary
ball $B_\varepsilon(x_0)$ with $\varepsilon > 0$ and $x_0 \in X$. Since $\overline{X_1}$ does not contain $B_\varepsilon(x_0)$,
there must be a point $x_1$ in $B_\varepsilon(x_0)$ with $x_1 \notin \overline{X_1}$. It follows that there is
some $r_1 > 0$ such that the closed ball $\overline{B}_{r_1}(x_1) = \{y \in X \mid d(y, x_1) \leq r_1\}$
satisfies $\overline{B}_{r_1}(x_1) \subseteq B_\varepsilon(x_0)$ and $\overline{B}_{r_1}(x_1) \cap X_1 = \emptyset$. Assume without loss of
generality that $r_1 < 1$.
Similarly, there is some $x_2 \in X$ and $r_2 > 0$ such that $\overline{B}_{r_2}(x_2) \subseteq B_{r_1}(x_1)$,
and $\overline{B}_{r_2}(x_2) \cap X_2 = \emptyset$, and without loss of generality $r_2 < \frac{1}{2}$. Notice
that $\overline{B}_{r_2}(x_2) \cap X_1 = \emptyset$ since $\overline{B}_{r_2}(x_2) \subseteq B_{r_1}(x_1)$.
Inductively, we construct a sequence of decreasing closed balls $\overline{B}_{r_n}(x_n)$
such that $\overline{B}_{r_n}(x_n) \cap X_j = \emptyset$ for $1 \leq j \leq n$, and $r_n \to 0$ as $n \to \infty$. It follows
that $(x_n)$ is a Cauchy sequence, and the limit $z$ lies in the intersection of all
the closed balls $\overline{B}_{r_n}(x_n)$, so $z \notin X_j$ for all $j \geq 1$. This implies that
\[ z \in B_\varepsilon(x_0) \setminus \bigcup_{j \geq 1} X_j \neq \emptyset, \]
which gives the result since $\varepsilon > 0$ and $x_0 \in X$ were arbitrary.

Exercise 4.13. Prove the Baire category theorem for compact topological spaces (that is,
without the assumption that the space is metric).
By taking complements we can also phrase the Baire category theorem in
terms of $G_\delta$-sets.
Definition 4.14. A countable intersection of open sets in a topological space
is called a Gδ -set.
Corollary 4.15 (Baire category theorem). Let $(X, d)$ be a complete metric
space, and assume that $G_n \subseteq X$ is a dense $G_\delta$-set for each $n \geq 1$. Then
the intersection $\bigcap_{n=1}^{\infty} G_n$ is also a dense $G_\delta$-set.
Proof. By assumption we can write each $G_n$ in the form $G_n = \bigcap_{k=1}^{\infty} O_{n,k}$,
where each $O_{n,k}$ is open and dense. It follows that
\[ \bigcap_{n=1}^{\infty} G_n = \bigcap_{n=1}^{\infty} \bigcap_{k=1}^{\infty} O_{n,k} \]
is a $G_\delta$-set, and that it is sufficient to consider the case where each $G_n = O_n$
is open and dense. In that case, $X_n = X \setminus G_n$ is closed and so
\[ B_\varepsilon(x) \cap (X \setminus \overline{X_n}) = B_\varepsilon(x) \cap (X \setminus X_n) = B_\varepsilon(x) \cap G_n \neq \emptyset \]
for any open ball $B_\varepsilon(x)$ since $G_n$ is dense. Therefore, $X_n$ is nowhere dense
for each $n \geq 1$. By Theorem 4.12 the complement of $\bigcup_{n=1}^{\infty} X_n$ is dense, and
this is precisely $\bigcap_{n=1}^{\infty} G_n$, by construction.
Exercise 4.16. Prove the Banach–Steinhaus theorem (Theorem 4.1) using the Baire cat-
egory theorem (Theorem 4.12).
Let us mention that the notion of a dense Gδ -set is the topological ver-
sion of being a ‘large’ set, while a set is measure-theoretically ‘large’ if its
complement is a null set. Both notions of being large share similar features,
and in particular a countable intersection of large sets in either sense is also
large.(14) However, these two notions are quite different. Example 4.17 shows
how to construct topologically large sets that are measure-theoretically small,
and vice-versa.
Example 4.17. For every $\varepsilon > 0$ there exists an open set $O_\varepsilon \subseteq \mathbb{R}$ which
contains $\mathbb{Q}$ and has Lebesgue measure less than $\varepsilon$. This may be found, for
example, by listing the elements of $\mathbb{Q}$ as $\{x_1, x_2, \ldots\}$ and setting
\[ O_\varepsilon = \bigcup_{k \geq 1} B_{\varepsilon/2^{k+2}}(x_k). \]
Then $G = \bigcap_{n \geq 1} O_{1/n}$ is a dense $G_\delta$ and a null set, and its complement $\mathbb{R} \setminus G$
is meagre and of full measure.
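The measure estimate in Example 4.17 is a geometric series, and it can be checked exactly on a finite truncation of the cover. The sketch below is illustrative only: the enumeration of $\mathbb{Q}$ (zero, followed by the Calkin–Wilf sequence with alternating signs) is one convenient choice among many, and the function names are hypothetical. It builds the first few intervals $B_{\varepsilon/2^{k+2}}(x_k)$ with exact rational arithmetic and sums their lengths, which bounds the measure of their union from above.

```python
from fractions import Fraction
from itertools import islice


def enumerate_rationals():
    """Yield 0 and then every non-zero rational exactly once.

    Positive rationals are listed in Calkin-Wilf order, each followed by
    its negative. This particular enumeration is an illustrative choice.
    """
    yield Fraction(0)
    q = Fraction(1)
    while True:
        yield q
        yield -q
        # Newman's recurrence for the successor in the Calkin-Wilf sequence.
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)


def cover_intervals(eps, count):
    """Return the first `count` intervals B_{eps / 2^(k+2)}(x_k), k = 1, ..., count."""
    intervals = []
    for k, x in enumerate(islice(enumerate_rationals(), count), start=1):
        radius = Fraction(eps) / 2 ** (k + 2)
        intervals.append((x - radius, x + radius))
    return intervals


def total_length(intervals):
    """Sum of the interval lengths, an upper bound for the measure of the union."""
    return sum(right - left for left, right in intervals)
```

For example, `total_length(cover_intervals(Fraction(1, 10), 50))` stays below `Fraction(1, 20)`, in line with the bound $\sum_{k \geq 1} 2 \cdot \varepsilon / 2^{k+2} = \varepsilon / 2 < \varepsilon$ for the full cover.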
The Baire category theorem can be used to show the existence of elements
of a complete metric space with certain properties. If the set of elements of
a complete space which do not satisfy the property can be obtained as a
countable union of nowhere dense sets, there must be elements that satisfy
the property (indeed, there exists a dense set of such elements).
Exercise 4.18. (a) Assume that $X$ and $Y$ are metric spaces. Show that for any $f : X \to Y$
the set $\{x \in X \mid f \text{ is continuous at } x\}$ is a $G_\delta$-set.
(b) Show that the map $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} \frac{1}{q} & \text{if } x = \frac{p}{q} \in \mathbb{Q}; \\ 0 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases} \]
is continuous at each irrational point and discontinuous at each rational point, where we
assume that $\frac{p}{q}$ is written in lowest terms and has $q \geq 1$.
(c) Use (a) to show that no function could have the reverse properties of the function
in (b).

Exercise 4.19. Show that $\{ f \in L^1((0, 1)) \mid \| f|_{(a,b)} \|_\infty = \infty \text{ whenever } 0 \leq a < b \leq 1 \}$
contains a dense $G_\delta$-subset of $L^1((0, 1))$.
Exercise 4.20. Show that the set of functions in C([0, 1]) that are nowhere differentiable
contains a dense Gδ -set.
4.2.2 Proof of the Open Mapping Theorem
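Recall that we write $B_r^X$ and $B_r^Y$ for the open balls of radius $r$ and centre 0
in $X$ and $Y$, respectively.
Lemma 4.21. Assume that $X$ is a normed vector space, $Y$ is a Banach space,
and $T : X \to Y$ is a bounded, surjective linear operator. For any $\varepsilon > 0$, there
is a $\delta > 0$ such that
\[ \overline{T B_\varepsilon^X} \supseteq B_\delta^Y. \tag{4.3} \]
Proof. Since
\[ X = \bigcup_{n=1}^{\infty} n B_\varepsilon^X, \]
and $T$ is onto, we have $Y = T(X) = \bigcup_{n=1}^{\infty} n T B_\varepsilon^X$. By the Baire category
theorem (Theorem 4.12) applied to $Y$ it follows that, for some $n$, the set $n \overline{T B_\varepsilon^X}$
contains some ball $B_r^Y(z)$ in $Y$. Then, by linearity, $\overline{T B_\varepsilon^X}$ must contain the
ball $B_\delta^Y(y)$, where $y = \frac{1}{n} z$ and $\delta = \frac{1}{n} r$. We note that
\[ B_\delta^Y(y) - B_\delta^Y(y) = \{ y_1 - y_2 \mid y_1, y_2 \in B_\delta^Y(y) \} = B_{2\delta}^Y \]
and similarly $B_{2\varepsilon}^X = B_\varepsilon^X - B_\varepsilon^X$. Therefore,
\[ B_{2\delta}^Y \subseteq \overline{T B_\varepsilon^X} - \overline{T B_\varepsilon^X} \subseteq \overline{T B_{2\varepsilon}^X} \]
and (4.3) follows.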

The above lemma only used the fact that Y is a Banach space. Using the
hypothesis that X is also a Banach space, we are able to prove the main step
towards the theorem in the following lemma.
Lemma 4.22. Let $T : X \to Y$ be as in Theorem 4.10. For any $\varepsilon > 0$ there
is a $\delta > 0$ such that
\[ T B_\varepsilon^X \supseteq B_\delta^Y. \tag{4.4} \]
Proof. Choose a sequence $(\varepsilon_n)$ with each $\varepsilon_n > 0$ and with $\sum_{n=1}^{\infty} \varepsilon_n < \varepsilon$. By
Lemma 4.21 there is a sequence $(\delta_n)$ of positive numbers such that
\[ \overline{T B_{\varepsilon_n}^X} \supseteq B_{\delta_n}^Y \tag{4.5} \]
for all $n \geq 1$. Without loss of generality, assume that $\delta_n \to 0$ as $n \to \infty$.
(Actually this holds unless $Y$ is very special indeed.) Now let $\delta = \delta_1$.
Let $y$ be any point in $B_\delta^Y = B_{\delta_1}^Y$. By (4.5) there is a point $x_1 \in B_{\varepsilon_1}^X$ such
that $T x_1$ is as close to $y$ as we wish, say with $\|y - T x_1\| < \delta_2$. Since
\[ y - T x_1 \in B_{\delta_2}^Y, \]
the inclusion (4.5) with $n = 2$ implies that there exists a point $x_2 \in B_{\varepsilon_2}^X$ such
that $\|y - T x_1 - T x_2\| < \delta_3$. Continuing, we obtain a sequence $(x_n)$ in $X$ such
that $\|x_n\| < \varepsilon_n$ for all $n$, and
\[ \Big\| y - T \Big( \sum_{k=1}^{n} x_k \Big) \Big\| < \delta_{n+1}. \tag{4.6} \]
This argument may be paraphrased as follows. At each stage we approximate
the current element $y - T \sum_{k=1}^{n-1} x_k$ in $Y$ up to an error $\delta_{n+1}$ that we know
can be dealt with later. This pushes the problem along until it ultimately
vanishes in the limit.
Since $\|x_n\| < \varepsilon_n$, the series $\sum_n x_n$ is absolutely convergent, so by
Lemma 2.28 it is convergent; write $x = \sum_n x_n$. Then
\[ \|x\| \leq \sum_{n=1}^{\infty} \|x_n\| < \sum_{n=1}^{\infty} \varepsilon_n < \varepsilon. \]
The map $T$ is continuous, so (4.6) shows that $y = T x$, since $\delta_n \to 0$ as $n \to \infty$.
That is, for any $y \in B_\delta^Y$ we have found a point $x \in B_\varepsilon^X$ such that $T x = y$,
proving (4.4).
Proof of Theorem 4.10. Let $O \subseteq X$ be a non-empty open subset and let $x$
be an element of $O$. Then there is a ball $B_\varepsilon^X$ such that
\[ x + B_\varepsilon^X \subseteq O. \]
By Lemma 4.22, $T B_\varepsilon^X \supseteq B_\delta^Y$ for some $\delta > 0$. Hence
\[ T(O) \supseteq T(x + B_\varepsilon^X) = T x + T(B_\varepsilon^X) \supseteq T x + B_\delta^Y. \]
To summarize, we have shown that for every $x \in O$ the point $T x$ is in the
interior of $T(O)$.
Exercise 4.23. Let $X$ and $Y$ be Banach spaces and $T : X \to Y$ a bounded operator.
Show that the following three conditions are equivalent:
(a) $T$ is surjective;
(b) $T$ is open;
(c) there exists a dense subspace $Y' \subseteq Y$ and a constant $c > 0$ such that for any $y \in Y'$
there exists some $x \in X$ with $T x = y$ and $\|x\|_X \leq c \|y\|_Y$.
4.2.3 Consequences: Bounded Inverses and Closed Graphs
As an application of Theorem 4.10, we establish a general property of inverse
maps. As is standard, $T : X \to Y$ means that $T$ is defined on all of $X$, but
sometimes it is convenient (or necessary) to permit an operator $T$ to only be
defined on a domain $D_T$ which is then a (possibly proper) subspace of $X$.
Definition 4.24. Let $T : X \to Y$ be an injective linear operator. Define the
inverse of $T$, denoted $T^{-1}$, by requiring that $T^{-1} y = x$ if and only if $T x = y$.
Then the domain of $T^{-1}$ is the linear subspace $T X \subseteq Y$, and $T^{-1}$ is a linear
operator on its domain.
Clearly $T^{-1} T x = x$ for all $x \in X$, and $T T^{-1} y = y$ for all $y$ in the domain
of $T^{-1}$. We also say that $T^{-1}$ is a left inverse of $T$.
Proposition 4.25 (Bounded inverse). Let $X$ and $Y$ be Banach spaces,
and let $T$ be a bijective bounded operator from $X$ to $Y$. Then $T^{-1}$ is also a
bounded operator.
Proof. Since $T^{-1}$ is a linear operator, we only need to show it is continuous
(which is equivalent to boundedness by Lemma 2.52). By Theorem 4.10, $T$
maps open sets onto open sets. For the map $T^{-1}$ this shows that the
pre-image $(T^{-1})^{-1}(O) = T(O)$ of an open set $O \subseteq X$ is open in $Y$. Therefore, $T^{-1}$
is continuous.
Corollary 4.26 (Equivalent norms). If $X$ is a Banach space with respect
to two norms $\| \cdot \|_{(1)}$ and $\| \cdot \|_{(2)}$ and there is a constant $K$ such that
\[ \|x\|_{(1)} \leq K \|x\|_{(2)} \]
for all $x \in X$, then the two norms are equivalent. That is, there is another
constant $K' > 0$ with
\[ \|x\|_{(2)} \leq K' \|x\|_{(1)} \]
for all $x \in X$.
Proof. Consider the map $T : x \mapsto x$ from $(X, \| \cdot \|_{(2)})$ to $(X, \| \cdot \|_{(1)})$. By
assumption, $T$ is bounded, so by Proposition 4.25, $T^{-1}$ is also bounded, giving
the bound in the other direction.
Definition 4.27. Let $T$ be a linear operator from a normed linear space $X$
into a normed linear space $Y$, with domain a linear subspace $D_T \subseteq X$. The
graph of $T$ is the set
\[ G_T = \{ (x, T x) \mid x \in D_T \} \subseteq X \times Y. \]
If $G_T$ is a closed subspace of $X \times Y$ then $T$ is a closed operator.
Notice as usual that this notion becomes trivial in finite dimensions in the
following sense. If X and Y are finite-dimensional, then the graph of T is
simply some linear subspace, which is automatically closed. Also it is easy
to see that a continuous operator has a closed graph. The next theorem —
the converse — is called the closed graph theorem. Notice that this converse
is not a purely topological fact. For instance, the set consisting of the graph
of the hyperbola xy = 1 and the origin is the closed graph of a discontinuous
function f : R → R.
Theorem 4.28 (Closed graph theorem). Let $X$ and $Y$ be Banach spaces,
and $T : X \to Y$ a linear operator with $D_T = X$. If $T$ is closed, then it is
continuous.
Proof. Fix the norm $\|(x, y)\| = \|x\|_X + \|y\|_Y$ on $X \times Y$. The graph $G_T$ is,
by hypothesis, a closed subspace of $X \times Y$, so $G_T$ is itself a Banach space.
Consider the projection $P : G_T \to X$ defined by $P(x, T x) = x$. Then $P$ is
clearly bounded, linear, and bijective. It follows by Proposition 4.25 that $P^{-1}$
is a bounded linear operator from $X$ to $G_T$, so
\[ \|(x, T x)\| = \|P^{-1} x\| \leq K \|x\|_X \]
for all $x \in X$, for some constant $K$. It follows that $\|x\|_X + \|T x\|_Y \leq K \|x\|_X$
for all $x \in X$, so $T$ is bounded, and hence continuous by Lemma 2.52.
Exercise 4.29 (Hellinger–Toeplitz theorem). Suppose that $A : H \to H$ is a linear
operator on a Hilbert space $H$ that is self-adjoint in the sense that
\[ \langle A x, y \rangle = \langle x, A y \rangle \]
for all $x, y \in H$. Show that this implies $A$ is bounded.
Corollary 4.30. Let $(X, \mu)$ be a $\sigma$-finite measure space and $g : X \to \mathbb{C}$ a
measurable function. If $T : f \mapsto g f$ maps $L^2_\mu(X)$ to $L^2_\mu(X)$, then $g \in L^\infty_\mu(X)$
and $\|T\| = \|g\|_\infty$.
Proof. Notice that the hypotheses in the statement do not require that the
map is continuous, but simply ask that the range lies in $L^2_\mu(X)$. However,
if $(f_n, g f_n)$ has $f_n \to f$ and $g f_n \to \psi$ as $n \to \infty$ in $L^2_\mu(X)$ (that is, a
sequence in the graph that converges to $(f, \psi)$), then we can extract a
subsequence along which both convergences hold $\mu$-almost everywhere. Along
this subsequence $g f_n$ converges almost everywhere to $g f$ and to $\psi$, so that
\[ g f = \psi \in L^2_\mu(X), \]
and hence $(f, \psi)$ also lies in the graph of $T$. It follows that $T$ is closed, and
hence continuous by Theorem 4.28.
Knowing now that $T$ is bounded, there is a constant $C = \|T\| \geq 0$ such
that $\|g f\|_2 \leq C \|f\|_2$ for any $f \in L^2_\mu(X)$. Let
\[ X_C = \{ x \in X \mid |g(x)| > C \}, \]
which we claim is a null set. Assuming the opposite, let $B \subseteq X_C$ be a
measurable subset of positive finite measure and let $f = 1_B$ be its characteristic
function. Then
\[ C^2 \mu(B) < \int_B |g|^2 |f|^2 \, d\mu = \|g f\|_2^2 \leq C^2 \|f\|_2^2 = C^2 \mu(B) \]
gives a contradiction, which implies that $\mu(X_C) = 0$. Hence $|g|$ is almost
everywhere less than or equal to $C$, and in particular $g \in L^\infty_\mu(X)$.
Moreover, $\|g\|_\infty \leq \|T\|$ and the opposite inequality follows directly from the
definition.
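The identity $\|T\| = \|g\|_\infty$ is easy to see in the simplest possible setting. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $X$ to be a finite set with counting measure, so that $L^2_\mu(X)$ is $\mathbb{C}^n$ with the Euclidean norm and $\|g\|_\infty = \max_i |g_i|$. The helper names are hypothetical. The upper bound $\|g f\|_2 \leq \|g\|_\infty \|f\|_2$ holds for every $f$, and the indicator function of a point where $|g|$ is maximal attains it, mirroring the use of $f = 1_B$ in the proof.

```python
import math


def multiply(g, f):
    """Pointwise product g * f, the multiplication operator applied to f."""
    return [gi * fi for gi, fi in zip(g, f)]


def l2_norm(f):
    """L^2 norm with respect to counting measure on a finite set."""
    return math.sqrt(sum(abs(value) ** 2 for value in f))


def sup_norm(g):
    """Essential supremum norm, which is a plain maximum for counting measure."""
    return max(abs(value) for value in g)


def indicator(index, size):
    """Characteristic function of the single point `index` in a set of `size` points."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    g = [1 + 2j, -3, 0.5j, 2]      # hypothetical multiplier
    f = [0.3, -1.2j, 2.0, 1 - 1j]  # hypothetical test vector
    print(l2_norm(multiply(g, f)), sup_norm(g) * l2_norm(f))
    print(l2_norm(multiply(g, indicator(1, 4))), sup_norm(g))
```

In the general $\sigma$-finite case the supremum need not be attained, which is why the proof argues by contradiction with sets $B$ of positive finite measure instead of single points.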
Some important and natural operators are unbounded. An example is the
derivative operator $D_0 : f \mapsto f'$ considered on $L^2((0, 1))$, which is originally
defined on the dense subset $\{ f \in C^1((0, 1)) \mid f, f' \in L^2((0, 1)) \}$. It is not
continuous, but as we will see in Chapter 5, considering the closure of its
graph gives a closed (but still unbounded) weak differential operator defined
on a dense subspace of L2 . In particular, we see that closed operators provide
a suitable framework for studying unbounded operators.
The closed graph theorem says that these generalized operators, namely
closed operators, are usually only defined on a proper subset of the first
Banach space unless they actually are bounded.
4.3 Further Topics
The above consequences of completeness are useful in several different areas.
We indicate below a few applications and extensions of these results.
• As already mentioned, in the next two chapters we will study partial
differential operators as examples of closed operators. We can do this
without developing any theory on unbounded operators, by simply study-
ing the graphs of these operators.
• As we have seen the Banach–Steinhaus theorem (Theorem 4.1) has inter-
esting consequences for the notion of strong convergence (Corollary 4.3).
We will discuss the corresponding topology again in Section 8.3.
• We will encounter multiplication operators as in Corollary 4.30 again in
Chapters 9, 12, and 13.
• The notion of closed operators is the starting point of the theory of
self-adjoint unbounded operators, which we will study systematically in
Chapter 13.
The reader may continue with Chapter 5, 6, or 7 (with some of the material
of Chapter 6 building on Chapter 5).
Chapter 5
Sobolev Spaces and Dirichlet’s
Boundary Problem
Using the theory of Fourier series developed in Section 3.4, we will now de-
velop the notion of Sobolev spaces and prove the Sobolev embedding theorem.
Sobolev spaces combine familiar notions of smoothness (that is, differentiabil-
ity) with bounds on Lp norms. We will set p = 2 and so will have all the tools
of Hilbert spaces at our disposal, but the theory can be extended to all p > 1.
The Sobolev embedding theorem and elliptic regularity for the Laplace op-
erator will allow us to prove in Section 5.3 the existence of solutions to the
Dirichlet boundary value problem introduced in Section 1.2.
5.1 Sobolev Spaces and Embedding on the Torus
In this section we are going to construct Hilbert spaces of functions on $\mathbb{T}^d$
depending on a parameter $k \in \mathbb{N}_0$. Unlike the equivalence classes $f \in L^2(\mathbb{T}^d)$,
these functions may be continuous or even differentiable, depending on $k$ (see
Section 5.1.2).
5.1.1 L2 Sobolev Spaces on Td
Definition 5.1. Let $k \geq 0$ be an integer. We (initially) define the ($L^2$)
Sobolev space $H^k(\mathbb{T}^d)$ to be the closure of $C^\infty(\mathbb{T}^d)$ inside
\[ V = \bigoplus_{\|\alpha\|_1 \leq k} L^2(\mathbb{T}^d), \]
where the direct sum runs over all multi-indices $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$ and a
function $f \in C^\infty(\mathbb{T}^d)$ is identified with the tuple $\phi_k(f) = (\partial_\alpha f)_{\|\alpha\|_1 \leq k} \in V$.
In order to make this definition more palatable, we now describe some
special cases.

(1) If $k = 0$, then there is only the multi-index $\alpha = 0$ and so $H^0(\mathbb{T}^d)$ is
the closure of $C^\infty(\mathbb{T}^d)$ in $L^2(\mathbb{T}^d)$ with respect to $\| \cdot \|_2$. As we have
seen, $C^\infty(\mathbb{T}^d)$ is dense in $C(\mathbb{T}^d)$ with respect to $\| \cdot \|_\infty$ (indeed, the
trigonometric polynomials are already dense) and $C(\mathbb{T}^d)$ is dense in $L^2(\mathbb{T}^d)$
with respect to $\| \cdot \|_2$, so we obtain $H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$. We will also
write $\|f\|_{H^0} = \|f\|_2$ for $f \in H^0(\mathbb{T}^d)$.
(2) Now let $k = 1$, and in this case there are $d + 1$ multi-indices. Hence
\[ H^1(\mathbb{T}^d) = \overline{\phi_1(C^\infty(\mathbb{T}^d))} \]
is the closure of $\phi_1(C^\infty(\mathbb{T}^d))$ in $\big( L^2(\mathbb{T}^d) \big)^{d+1}$, where we used the
embedding $\phi_1 : f \mapsto (f, \partial_1 f, \ldots, \partial_d f) \in V$. So, by our definition, elements
of $H^1(\mathbb{T}^d)$ are $(d + 1)$-tuples of functions on $\mathbb{T}^d$. In order to be able to
think of these as single functions on $\mathbb{T}^d$ (which is how we will think of
Sobolev spaces), notice that the last $d$ terms of the $(d+1)$-tuple are uniquely
determined by the first term. This is clear for $\phi_1(f)$ with $f \in C^\infty(\mathbb{T}^d)$,
but also remains true in the closure $H^1(\mathbb{T}^d)$, as we show next.
Lemma 5.2 (Fourier series of weak derivatives). Suppose that the
vector $(f, f_1, \ldots, f_d)$ belongs to $H^1(\mathbb{T}^d)$ and the Fourier series of $f$ is given by
\[ f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n. \]
Then
\[ f_j = \sum_{n \in \mathbb{Z}^d} 2\pi i n_j c_n \chi_n. \tag{5.1} \]
Proof. For $f \in L^2(\mathbb{T}^d)$ and $n \in \mathbb{Z}^d$, write $a_n(f)$ for the $n$th Fourier
coefficient. We start with the formula
\[ a_n(\partial_j f) = \langle \partial_j f, \chi_n \rangle = 2\pi i n_j \langle f, \chi_n \rangle = 2\pi i n_j a_n(f) \]
for all $n \in \mathbb{Z}^d$ and all $f \in C^\infty(\mathbb{T}^d)$, see (3.18). Using continuity of the inner
product and the definition of $H^1(\mathbb{T}^d)$, this formula automatically extends to
all $(f, f_1, \ldots, f_d) \in H^1(\mathbb{T}^d)$. Expanding $f_j$ into its Fourier series (see
Theorem 3.54) gives the lemma.
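The coefficient identity $a_n(\partial_j f) = 2\pi i n_j a_n(f)$ used in this proof can be checked numerically for a smooth function. The following minimal sketch works in dimension $d = 1$ with a hypothetical trigonometric polynomial whose derivative is known in closed form; the coefficients, the sample count, and all function names are illustrative assumptions. Fourier coefficients are computed by an equally spaced Riemann sum, which is exact (up to rounding) for trigonometric polynomials of low degree.

```python
import cmath


def chi(n, x):
    """Character chi_n(x) = exp(2*pi*i*n*x) on the circle T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)


def fourier_coefficient(func, n, samples=64):
    """Riemann-sum approximation of a_n(func) = integral of func * conj(chi_n) over [0, 1)."""
    return sum(func(k / samples) * chi(-n, k / samples) for k in range(samples)) / samples


# Hypothetical test function f = sum_m c_m chi_m with only three non-zero coefficients.
COEFFS = {2: 3.0, -1: -(1 + 2j), 0: 0.5}


def f(x):
    return sum(c * chi(m, x) for m, c in COEFFS.items())


def df(x):
    """Classical derivative of f, computed term by term."""
    return sum(2j * cmath.pi * m * c * chi(m, x) for m, c in COEFFS.items())


if __name__ == "__main__":
    for n in range(-3, 4):
        lhs = fourier_coefficient(df, n)
        rhs = 2j * cmath.pi * n * fourier_coefficient(f, n)
        print(n, abs(lhs - rhs))
```

The printed differences are at the level of floating-point rounding. The lemma extends the same identity from smooth functions to all of $H^1(\mathbb{T}^d)$ by continuity.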
The lemma now shows in full generality that the first component $f$ of any
element $(f, f_1, \ldots, f_d) \in H^1(\mathbb{T}^d)$ determines all the other components. Thus
we can identify an element of $H^1(\mathbb{T}^d)$ with the associated element $f \in L^2(\mathbb{T}^d)$,
and will write $f \in H^1(\mathbb{T}^d)$ and $\partial_j f = f_j \in L^2(\mathbb{T}^d)$ for $j = 1, \ldots, d$. We will
also call the other components $\partial_j f$ weak derivatives (this will be further
justified in Section 5.2), as these generalize the notion of partial derivative
for smooth functions. However, the norm associated to $f \in H^1(\mathbb{T}^d)$ is
\[ \|f\|_{H^1} = \sqrt{ \|f\|_2^2 + \sum_{j=1}^{d} \|\partial_j f\|_2^2 }. \]
Summarizing the above discussion for the case $k = 1$, we may interpret
$H^1(\mathbb{T}^d)$ as the domain (and, in the original definition, as the graph)
of the closed operator $\nabla : H^1(\mathbb{T}^d) \ni f \mapsto (\partial_{x_1} f, \ldots, \partial_{x_d} f) \in \big( L^2(\mathbb{T}^d) \big)^d$. This
discussion generalizes to any $k \geq 1$ as follows.
Proposition 5.3 (Forgetting regularity, weak derivative). Fix $k$ and $\ell$
with $0 \leq \ell < k$. Then the identity map on $C^\infty(\mathbb{T}^d)$ uniquely extends to
an injective continuous operator $\iota_{k,\ell} : H^k(\mathbb{T}^d) \to H^\ell(\mathbb{T}^d)$ of norm one.
Moreover, for every $j \in \{1, \ldots, d\}$ the partial derivative extends uniquely to
a continuous operator
\[ \partial_j : H^k(\mathbb{T}^d) \longrightarrow H^{k-1}(\mathbb{T}^d) \]
with norm less than or equal to one. Finally, (3.18) holds similarly for all $f$
in $H^k(\mathbb{T}^d)$ and for all $\alpha \in \mathbb{N}_0^d \setminus \{(0, \ldots, 0)\}$ with $\|\alpha\|_1 \leq k$.
Proof. For the first claim consider the map
M M
πk,ℓ : L2 (Td ) −→ L2 (Td )
kαk1 6k kαk1 6ℓ
(fα )kαk1 6k 7−→ (fα )kαk1 6ℓ
and notice that πk,ℓ (φk (f )) = φℓ (f ) for all f ∈ C ∞ (Td ). Therefore the ex-
tended map ık,ℓ is simply the restriction of this projection to H k (Td ), and so
has norm less than or equal to one. Using constant functions we see that the
norm of ık,ℓ is equal to one. Injectivity will follow from the last claim of the
proposition.
For the second claim, regarding the operator ∂ j : H k (Td ) → H k−1 (Td ),
we modify the argument above as follows. Consider the projection map
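\[ \pi_j : \bigoplus_{\|\alpha\|_1 \le k} L^2(\mathbb{T}^d) \longrightarrow \bigoplus_{\|\alpha\|_1 \le k-1} L^2(\mathbb{T}^d), \qquad (f_\alpha)_{\|\alpha\|_1 \le k} \longmapsto (f_{\alpha + e_j})_{\|\alpha\|_1 \le k-1} \]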
which clearly has norm one. Figure 5.1 illustrates the difference between the
projection πk,ℓ and the projection πj in a simple example. For f ∈ C ∞ (Td )
we see that $\pi_j(\phi_k(f)) = \phi_{k-1}(\partial_{e_j}(f))$, which (as above) shows that the
restriction of πj to H k (Td ) is the desired operator ∂ j : H k (Td ) → H k−1 (Td ).
The final claim of the proposition follows from the description of the Four-
ier series of the weak derivative in Lemma 5.2 for k = 1 and induction.
Now justified by Proposition 5.3, we identify an element $f = (f_\alpha)_{\|\alpha\|_1 \le k}$
in H k (Td ) with its first component f0 in L2 (Td ). The other components are
identified with the weak derivative, so we write
\[ f = f_0 = \boldsymbol{\partial}_0 f \in H^k(\mathbb{T}^d) \subseteq L^2(\mathbb{T}^d), \qquad \boldsymbol{\partial}_\alpha f = f_\alpha \in L^2(\mathbb{T}^d) \]
for all multi-indices α with $\|\alpha\|_1 \le k$. In this notation our norm becomes
\[ \|f\|_{H^k} = \sqrt{\sum_{\|\alpha\|_1 \le k} \|\boldsymbol{\partial}_\alpha f\|_2^2}. \]
Fig. 5.1: With d = 2, ℓ = 2, k = 3 and j = 1 the projection defining ık,ℓ
corresponds to the set of multi-indices α highlighted on the left. The projection
defining ∂ j corresponds to the set highlighted on the right.
5.1.2 The Sobolev Embedding Theorem on Td
As we have seen in the discussion above, each of the spaces H k (Td ) consists
of certain L2 functions on Td . For k = 0 we have H 0 (Td ) = L2 (Td ). A natural
question for k ≥ 1 is to ask which functions in L2 (Td ) lie in H k (Td ).
Using Fourier series we can give a formal answer to this, and this will have
interesting and important consequences which will be discussed below. An-
other consequence of this lemma is that it makes it meaningful to define H k
for k ∈ R by using the convergence property in the lemma as a definition —
we will not pursue this further.
Lemma 5.4 (Characterizing $H^k(\mathbb{T}^d)$ by the Fourier series). Let k ≥ 0
be an integer and let $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in L^2(\mathbb{T}^d)$. Then f ∈ H k (Td ) if and
only if
\[ \sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2k} < \infty. \tag{5.2} \]
Proof. For $n = (n_1, \ldots, n_d) \in \mathbb{Z}^d$ and $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}_0^d$ we write $n^\alpha$
for $n_1^{\alpha_1} \cdots n_d^{\alpha_d}$. If $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in H^k(\mathbb{T}^d)$ then, by Proposition 5.3
and by Fourier series on the torus (Theorem 3.54),
\[ (n^\alpha c_n)_{n \in \mathbb{Z}^d} \in \ell^2(\mathbb{Z}^d) \]
for all α with $\|\alpha\|_1 \le k$. We apply this to α = ke1 , ke2 , . . . , ked and see that
\[ \sum_{j=1}^{d} \sum_{n \in \mathbb{Z}^d} n_j^{2k} |c_n|^2 < \infty. \]
Using the bound $\|n\|_2^{2k} \ll \sum_{j=1}^{d} n_j^{2k}$ for all $n \in \mathbb{Z}^d$, we get (5.2) as required.
Conversely, assume (5.2). Then for any $\alpha \in \mathbb{N}_0^d \setminus \{(0, \ldots, 0)\}$ with $\|\alpha\|_1 \le k$
we have $|n^\alpha| \le \|n\|_2^k$, and so
\[ \sum_{n \in \mathbb{Z}^d} |(2\pi i n)^\alpha c_n|^2 < \infty. \]
The characterization of convergence for a series involving an orthonormal
basis (Proposition 3.36) shows that every component of $\phi_k\bigl(\sum_{\|n\|_2 \le N} c_n \chi_n\bigr)$
converges as N → ∞, and so we deduce that the images of these partial sums
converge in H k (Td ). As the first component of the limit vector equals f , it
follows that f ∈ H k (Td ).
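The criterion (5.2) can be explored numerically. The following is a minimal sketch, assuming d = 1 and the hypothetical coefficients c_n = |n|^(-s) for n ≠ 0 (with c_0 = 0); the function name and the sample values of s and k are illustrative choices, not taken from the text. It prints partial sums of the series in (5.2), which stabilize when 2s − 2k > 1 and grow without bound otherwise.

```python
def weighted_partial_sum(s, k, N):
    """Partial sum over 1 <= |n| <= N of |c_n|^2 * |n|^(2k) for c_n = |n|^(-s), d = 1.

    The coefficients c_n = |n|^(-s) are a hypothetical example, not from the text.
    """
    # The terms for n and -n agree, hence the factor 2.
    return 2.0 * sum(n ** (2 * k - 2 * s) for n in range(1, N + 1))


# s = 2, k = 1: the terms are n^(-2), so the series in (5.2) converges.
print(weighted_partial_sum(2.0, 1, 10**3), weighted_partial_sum(2.0, 1, 10**5))

# s = 2, k = 2: the terms are constant, so the series in (5.2) diverges.
print(weighted_partial_sum(2.0, 2, 10**3), weighted_partial_sum(2.0, 2, 10**5))
```

In this example the function with coefficients |n|^(-2) lies in H 1 (T) but not in H 2 (T), in line with Lemma 5.4.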
Exercise 5.5. Show that H 1 (Td ) is a meagre subset of L2 (Td ).
The following theorem shows (in a more constructive manner than the pre-
vious exercise) how special the elements of the subset H k (Td ) within L2 (Td )
become once k is sufficiently large. If k > d/2, then any element of H k (Td )
agrees almost surely (and will be identified) with a continuous function. In-
creasing k further also gives some differentiability of this continuous function.
Theorem 5.6 (Sobolev embedding on the torus). Let k and ℓ be non-negative
integers with k > ℓ + d/2. Then the inclusion map from C ∞ (Td )
to C ℓ (Td ) has a continuous extension to H k (Td ). In particular, any function
f ∈ H k (Td ) has a uniquely defined continuous representative belonging
to C ℓ (Td ) with $\|f\|_{C^\ell} \ll_d \|f\|_{H^k}$, where we also denote the continuous
representative by f .
The proof will show that most of the work has already been done.
Proof of Theorem 5.6. Let us start with the case ℓ = 0. In this case we
already know that
\[ \|f\|_\infty \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2} \]
for f ∈ C ∞ (Td ) by Theorem 3.57. However, the square root on the right-hand
side is bounded above by $\|f\|_{H^k}$, which shows that the inclusion map
\[ \imath : \bigl(C^\infty(\mathbb{T}^d), \|\cdot\|_{H^k}\bigr) \longrightarrow \bigl(C(\mathbb{T}^d), \|\cdot\|_\infty\bigr) \]
is a bounded operator. Since C(Td ) is a Banach space, this operator extends
to H k (Td ) by Proposition 2.59. We still need to argue that this extension
really does select a continuous representative. For this, notice that the
composition of the inclusion maps
\[ C^\infty(\mathbb{T}^d) \xrightarrow{\ \imath\ } C(\mathbb{T}^d) \longrightarrow L^2(\mathbb{T}^d) \]
is the inclusion map φ0 : C ∞ (Td ) → L2 (Td ), whose unique extension
to H k (Td ) is ık,0 (see Proposition 5.3). Hence the composition of the
constructed extension ı : H k (Td ) → C(Td ) with the inclusion into L2 (Td )
coincides with ık,0 and so ı(f ) ∈ C(Td ) is a representative of the equivalence
class ık,0 (f ) ∈ L2 (Td ) associated to f ∈ H k (Td ).
Now let ℓ ≥ 1 satisfy ℓ + d/2 < k. Note (for example, by using the argument
from Example 2.24(6)) that C ℓ (Td ) is a Banach space with the norm
\[ \|f\|_{C^\ell} = \max_{\|\gamma\|_1 \le \ell} \|\partial_\gamma f\|_\infty. \]
We apply the above to ∂γ f and obtain from Proposition 5.3 that
\[ \|\partial_\gamma f\|_\infty \ll_d \|\partial_\gamma f\|_{H^{k-\ell}} \le \|f\|_{H^k} \]
for f ∈ C ∞ (Td ) and $\|\gamma\|_1 \le \ell$. Therefore,
\[ \|f\|_{C^\ell} \ll_d \|f\|_{H^k} \]
for f ∈ C ∞ (Td ), and the inclusion C ∞ (Td ) → C ℓ (Td ) once again gives rise
to a bounded operator ıℓ : H k (Td ) → C ℓ (Td ). Composing with the inclusion
map from C ℓ (Td ) to L2 (Td ) we again see that f ∈ H k (Td ) agrees almost
everywhere with ıℓ (f ) ∈ C ℓ (Td ).
5.2 Sobolev Spaces on Open Sets
Much of the discussion in Section 5.1.1 regarding Sobolev spaces on Td can

be quickly generalized to open subsets of Rd . However, in Section 5.1.2 we
frequently made use of Fourier series in the arguments, so we will go through
the definitions and elementary properties once again, without appealing to
Fourier series.
We will define spaces of functions on an open subset U ⊆ Rd that form
Hilbert spaces, and in which the elements are once more continuous or even
differentiable depending on a regularity parameter k (and the dimension d).
We have a choice regarding the behaviour of the functions at the boundary ∂U
of U ⊆ Rd , giving rise to two different Sobolev spaces for every k ≥ 1.
Definition 5.7. Let d ≥ 1 and k ≥ 0 be integers, and let U ⊆ Rd be an open
subset. Then the (L2 ) Sobolev space H k (U ) is defined† to be the closure of
\[ \bigl\{ (\partial_\alpha f)_\alpha \mid f \in C^\infty(U),\ \partial_\alpha f \in L^2(U) \text{ for } \|\alpha\|_1 \le k \bigr\} \tag{5.3} \]
inside $\bigoplus_{\|\alpha\|_1 \le k} L^2(U)$, where as before we take the direct sum over all $\alpha \in \mathbb{N}_0^d$
with $\|\alpha\|_1 \le k$.
Even though the closure H k (U ) contains many new functions that are not
in C ∞ (U ), those new elements
\[ (f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U) \tag{5.4} \]
still have some of the properties of the elements in the subspace (5.3) used
to define H k (U ). In fact, as we will show below, f = f0 determines all the
other components fα of the vector (5.4), and these are derivatives of f in the
following weaker sense (which, as we will see, turns integration by parts into
the definition of a derivative).
Definition 5.8. Suppose that $\alpha \in \mathbb{N}_0^d$ and f, g ∈ L2 (U ). Then g is called a
weak α-partial derivative (or an α-partial derivative in the sense of
distributions) of f , written g = ∂ α f , if
\[ \int_U f \, \partial_\alpha \phi \, dx = (-1)^{\|\alpha\|_1} \int_U g \phi \, dx \]
for all φ ∈ Cc∞ (U ). In the case α = ej , we will call this the weak jth partial
derivative and write g = ∂ j f .
We view the functions φ appearing in this definition (and similar instances)

as ‘test functions’.
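Example 5.9. Let U = (−1, 1) ⊆ R and define the functions
\[ f(x) = \begin{cases} x & \text{if } x \ge 0, \\ 0 & \text{if } x < 0 \end{cases} \qquad \text{and} \qquad g(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0 \end{cases} \]
for x ∈ (−1, 1). Then f has weak e1 -partial derivative g. In fact, for φ
in Cc∞ ((−1, 1)) we have
\[ \int_{-1}^{1} f \phi' \, dx = \int_0^1 x \phi'(x) \, dx = \bigl[ x \phi(x) \bigr]_0^1 - \int_0^1 \phi \, dx = 0 - \int_{-1}^{1} g \phi(x) \, dx, \]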
as required.
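The identity in Example 5.9 can be checked numerically for one particular test function. The sketch below assumes the standard bump exp(−1/(1 − x²)) as a hypothetical choice of φ, and uses a simple midpoint rule; the helper names are illustrative and not part of the text.

```python
import math


def phi(x):
    """Smooth bump supported in [-1, 1]; a hypothetical choice of test function."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def dphi(x):
    """Classical derivative of phi, computed by the chain rule."""
    return phi(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1.0 else 0.0


def midpoint(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def f(x):
    return x if x >= 0 else 0.0


def g(x):
    return 1.0 if x >= 0 else 0.0


# Left side: integral of f * phi'.  Right side: minus the integral of g * phi.
lhs = midpoint(lambda x: f(x) * dphi(x), -1.0, 1.0)
rhs = -midpoint(lambda x: g(x) * phi(x), -1.0, 1.0)
print(lhs, rhs)
```

Agreement for a single φ is of course only a sanity check; Definition 5.8 requires the identity for all test functions.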
† In the literature another notation that is used is W k,2 . The more general case of W k,p
is defined similarly using Lp (U ) instead of L2 (U ).
Lemma 5.10 (Weak derivatives). Let U ⊆ Rd be open. A weak α-partial
derivative of an L2 (U ) function f is uniquely determined as an element
of L2 (U ) if it exists. If $(f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U)$ then f = f0 has fα = ∂ α f
as a weak α-partial derivative for α with $\|\alpha\|_1 \le k$. In particular, f = f0
determines all the elements of the vector in H k (U ).
For convenience in this section we restrict attention at several points to

real-valued functions. As a C-valued function f on U belongs to H k (U )
(or H0k (U )) for some k ≥ 0 if and only if both ℜ(f ) and ℑ(f ) belong to
that space, this is not a significant restriction.
Proof. If g is a weak α-partial derivative of f then the inner product
\[ \langle g, \phi \rangle = \int_U g \phi \, dx = (-1)^{\|\alpha\|_1} \int_U f \, \partial_\alpha \phi \, dx \]
is determined by f for all φ ∈ Cc∞ (U ). As Cc∞ (U ) is dense in L2 (U ) (see

Exercise 5.11 or Exercise 5.17(e)), we see that g is uniquely determined by f .
Let f ∈ C ∞ (U ) with ∂ej f ∈ L2 (U ) and φ ∈ Cc∞ (U ). Then for every y ∈ Rd
the set Ky = Supp φ∩(y+Rej ) is a compact subset of Uy = U ∩(y+Rej ) which
is relatively open in y + Rej for every y ∈ Rd . Therefore Uy is a countable
union of intervals and finitely many of these are sufficient to cover Ky . Using
integration by parts for the integral along the jth coordinate xj gives
\[ \int_{U_y} f(x) \, \partial_{e_j} \phi(x) \, dx_j = - \int_{U_y} \partial_{e_j} f(x) \, \phi(x) \, dx_j \]
for any y ∈ U , where the boundary terms vanish since φ ∈ Cc∞ (U ). Integ-
rating over the remaining variables shows that ∂ej f is indeed also a weak ej -
partial derivative. By induction on kαk1 , this implies that
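\[ \langle f_0, \partial_\alpha \phi \rangle = (-1)^{\|\alpha\|_1} \langle f_\alpha, \phi \rangle \]
first for all $(f_\alpha)_{\|\alpha\|_1 \le k}$ in the subspace (5.3) inside $\bigoplus_{\|\alpha\|_1 \le k} L^2(U)$, and then
by continuity of the scalar product for all $(f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U)$. This implies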
the lemma.
Essential Exercise 5.11. Let U ⊆ Rd be open. Show that Cc∞ (U ) ⊆ Lp (U )

is dense for any p ∈ [1, ∞).
Definition 5.12 (Modifying Definition 5.7). Lemma 5.10 justifies the

following notational convention. We identify the elements of H k (U ) with
functions f ∈ L2 (U ) and equip the space H k (U ) with the norm
\[ \|f\|_{H^k(U)} = \sqrt{\sum_{\|\alpha\|_1 \le k} \|\boldsymbol{\partial}_\alpha f\|_{L^2(U)}^2}. \]
Using this identification, the subspace (5.3) will from now on be referred to
as C ∞ (U ) ∩ H k (U ).
Proposition 5.13 (Forgetting regularity and the weak partial derivative).
For k ≥ ℓ ≥ 0 there is a natural injection ık,ℓ : H k (U ) → H ℓ (U )
of norm one, extending the identity on C ∞ (U ). For any multi-index α
with $\|\alpha\|_1 \le k$ there is a natural operator
\[ \boldsymbol{\partial}_\alpha : H^k(U) \longrightarrow H^{k - \|\alpha\|_1}(U), \qquad f \longmapsto \boldsymbol{\partial}_\alpha f \]
of norm less than or equal to one, which extends ∂α on C ∞ (U ) ∩ H k (U ).
This may be proved along the same lines as Proposition 5.3.

We may obtain other Sobolev spaces — which will be subspaces of H k (U )
— by requiring additional decay properties at the boundary ∂U .
Definition 5.14. We define $H_0^k(U) = \overline{C_c^\infty(U)} \subseteq H^k(U)$ to be the closure of
all smooth compactly supported functions in H k (U ).
We will see later that elements of H0k (U ) ‘vanish in the square-mean norm
sense’ at ∂U if k ≥ 1.
Let us add the following remark to Definition 5.7. We defined H k (U ) to
consist of those f ∈ L2 (U ) that have weak α-partial derivatives ∂ α f ∈ L2 (U )
for all α with $\|\alpha\|_1 \le k$ such that the vector
\[ (\boldsymbol{\partial}_\alpha f)_{\|\alpha\|_1 \le k} \in \bigoplus_{\|\alpha\|_1 \le k} L^2(U) \]
can be approximated by vectors corresponding to elements of C ∞ (U ) ∩ H k (U ).
One may ask whether this approximation statement can be proved instead
of assumed.
Exercise 5.15. Show that f ∈ L2 (Td ) belongs to H k (Td ) if and only if there exists, for
every $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$, a weak α-partial derivative ∂ α f ∈ L2 (Td ). Here the weak
partial derivative is defined in terms of smooth test functions φ ∈ C ∞ (Td ).
The analogue of Exercise 5.15 also holds(15) for certain open subsets U
of Rd , but we will not use this possible alternative definition of the Sobolev
spaces here (and will return to this question in Section 8.2.2). We note that
due to the boundary of U this equivalence is a bit harder to prove. For this
proof, but more importantly also for the material that follows in this chapter,
we need some more background concerning smooth functions on Rd , which
we outline in the following series of exercises.
Exercise 5.16 (A smooth function on R). Show that the function
\[ \psi(t) = \begin{cases} e^{1/t} & \text{for } t < 0, \\ 0 & \text{for } t \ge 0 \end{cases} \]
is smooth on R.
Essential Exercise 5.17 (Smooth approximate identities on Rd ). Define
a real function ȷ on Rd by
\[ \jmath(x) = c \, \psi(\|x\|_2^2 - 1) = \begin{cases} c \, e^{1/(\|x\|_2^2 - 1)} & \text{for } \|x\|_2 < 1, \\ 0 & \text{for } \|x\|_2 \ge 1, \end{cases} \]
where c > 0 is chosen so that $\int_{\mathbb{R}^d} \jmath(x) \, dx = \int_{B_1} \jmath(x) \, dx = 1$. For ε > 0, also
define $\jmath_\varepsilon(x) = \frac{1}{\varepsilon^d} \jmath\bigl(\frac{x}{\varepsilon}\bigr)$ for x ∈ Rd . Show the following.
(a) ȷ ∈ Cc∞ (Rd ).
(b) The function fε defined by $f_\varepsilon(x) = f * \jmath_\varepsilon(x) = \int_{\mathbb{R}^d} f(y) \jmath_\varepsilon(x - y) \, dy$
converges uniformly to f as ε → 0 on any compact subset of Rd for
any f ∈ C(Rd ).
(c) fε ∈ C ∞ (Rd ) for any f ∈ C(Rd ).
(d) Supp fε ⊆ Supp f + Bε .
(e) Generalize the above in the appropriate sense to any f ∈ Lp (Rd ) and
any p ∈ [1, ∞). Derive the statement in Exercise 5.11 from this.
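The smoothing in Exercise 5.17(b) can be observed numerically without, of course, replacing the proof. The following is a minimal sketch, assuming d = 1, the sample function f(x) = |x|, and a midpoint rule for all integrals; the names `bump`, `mollify` and `sup_error` are hypothetical helpers introduced only for this illustration.

```python
import math


def bump(x):
    """Unnormalized bump exp(1/(x^2 - 1)) on (-1, 1), zero elsewhere (case d = 1)."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0


def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Normalizing constant c, chosen so that the bump integrates to one.
C = 1.0 / midpoint(bump, -1.0, 1.0)


def mollify(f, eps, x):
    """Approximate f_eps(x), the convolution of f with the rescaled bump."""
    kernel = lambda y: C * bump((x - y) / eps) / eps
    return midpoint(lambda y: f(y) * kernel(y), x - eps, x + eps)


def sup_error(eps):
    """Largest deviation of f_eps from f = |.| on a grid in the compact set [-1, 1]."""
    grid = [i / 10.0 for i in range(-10, 11)]
    return max(abs(mollify(abs, eps, x) - abs(x)) for x in grid)


for eps in (0.5, 0.1, 0.02):
    print(eps, sup_error(eps))
```

The printed errors shrink with ε, which is consistent with uniform convergence on compact sets; the grid maximum is only a proxy for the true supremum.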
Exercise 5.18. Generalize and prove the first claim from Section 1.5 for functions defined
on Td or on an open subset of Rd .
Exercise 5.19. Suppose U ⊆ Rd is open, bounded, and star-shaped with centre 0 in the
sense that U ⊆ λU for all λ > 1 (see, for example, Figure 5.2). Let f, f1 , . . . , fd be in L2 (U )
and suppose fj is the weak ej -partial derivative of f for j = 1, . . . , d. Show that f ∈ H 1 (U ).
5.2.1 Examples
We illustrate the theory above with some simple examples, which will be
justified below.
Example 5.20. Let d = 1, U = (0, 1) and k = 1. Then every f ∈ H 1 ((0, 1))
has a continuous representative ı(f ) ∈ C([0, 1]) with
\[ \|\imath(f)\|_\infty \le 2 \|f\|_{H^1}. \tag{5.5} \]
This is again an instance of the Sobolev embedding theorem, which we will

prove for open subsets in Theorem 5.34 below. However, the instance dis-
cussed here can be proven quite directly and independently. Also see Exer-
cise 5.22 and Exercise 5.23.
Example 5.21. Let d = 2, $U = B_{1/2} \setminus \{0\}$ and k = 1. Then the function f
defined by $f(x) = \log|\log\|x\||$ lies in H 1 (U ) and cannot be extended to an
element of C(Ū). Also see Exercise 5.25 and Exercise 5.26.
Justification of Example 5.20. For x, y ∈ U and f ∈ C ∞ (U ) ∩ H 1 (U )
we clearly have
\[ f(y) = f(x) + \int_x^y f'(s) \, ds. \]
Notice that for fixed x and y the integral on the right is a continuous
functional on H 1 (U ), but for the terms f (y) and f (x) this is not clear yet. Now
integrate over x ∈ (0, 1) to get
\[ f(y) = \int_0^1 f(x) \, dx + \int_0^1 \int_x^y f'(s) \, ds \, dx = \int_0^1 f(x) \, dx + \int_0^1 \int_0^1 f'(s) \sigma(y, x, s) \, ds \, dx, \]
where
\[ \sigma(y, x, s) = \begin{cases} 1 & \text{if } x < s < y, \\ -1 & \text{if } y < s < x, \text{ and} \\ 0 & \text{if } s \text{ is not between } x \text{ and } y. \end{cases} \]
Applying Fubini’s theorem we get
\[ f(y) = \int_0^1 f(x) \, dx + \int_0^1 f'(s) k(y, s) \, ds, \tag{5.6} \]
where
\[ k(y, s) = \int_0^1 \sigma(y, x, s) \, dx = \begin{cases} s & \text{if } s < y, \\ s - 1 & \text{if } s > y. \end{cases} \]
Hence (5.6) expresses the value of f at y ∈ U as the sum $\langle f, 1_U \rangle + \langle f', k(y, \cdot) \rangle$,
which is clearly continuous on H 1 ((0, 1)), and since $\|k(y, \cdot)\|_{L^2} \le 1$ we also
have $|f(y)| \le 2 \|f\|_{H^1}$. Moreover, we may use (5.6) for y = 0 and y = 1 as a
definition of f (0) and f (1), and then
\[ |f(y_1) - f(y_2)| = \left| \int_0^1 f'(s) \bigl( k(y_1, s) - k(y_2, s) \bigr) \, ds \right| \le \|f'\|_2 \, \|k(y_1, \cdot) - k(y_2, \cdot)\|_2 = \|f'\|_2 \sqrt{|y_1 - y_2|} \tag{5.7} \]
for all y1 , y2 ∈ [0, 1]. It follows that any f ∈ C ∞ (U ) ∩ H 1 (U ) extends to

a continuous function satisfying (5.5). Applying automatic extension to the
closure (Proposition 2.59) to the so described map from C ∞ (U ) ∩ H 1 (U )
to C(U ), the claims in Example 5.20 follow.
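The representation (5.6) lends itself to a quick numerical sanity check. The sketch below assumes the sample function f(x) = x² (a hypothetical choice) and approximates both integrals in (5.6) by the midpoint rule; `reconstruct` is an illustrative helper name.

```python
def midpoint(func, a, b, n=4000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def k(y, s):
    """Kernel from (5.6): equal to s for s < y and to s - 1 for s > y."""
    return s if s < y else s - 1.0


def reconstruct(f, df, y):
    """Right-hand side of (5.6): integral of f plus integral of f' * k(y, .)."""
    return midpoint(f, 0.0, 1.0) + midpoint(lambda s: df(s) * k(y, s), 0.0, 1.0)


f = lambda x: x * x      # hypothetical sample function
df = lambda x: 2.0 * x   # its classical derivative

for y in (0.0, 0.25, 0.5, 1.0):
    print(y, f(y), reconstruct(f, df, y))
```

The endpoints y = 0 and y = 1 are included on purpose, since the text uses (5.6) there as the definition of the boundary values.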
Exercise 5.22. Let U = (0, 1) and let f : U → C be a function continuous on U and

continuously differentiable at all but finitely many points of U . If we also have f ′ ∈ L2 (U ),
show that f ∈ H 1 (U ).
Exercise 5.23. Show that Example 5.20 can be generalized to the statement that any
function f ∈ H k ((0, 1)) (continuously extended) belongs to C k−1 ([0, 1]). Show also
that H01 ((0, 1)) is mapped under the embedding from Example 5.20 into the space
{f ∈ C([0, 1]) | f (0) = f (1) = 0}.

Exercise 5.24. Show that not every function in H01 ((0, 1))∩H 2 ((0, 1)) is also in H02 ((0, 1)).
Justification of Example 5.21. It is easy to check that f ∈ L2 (U ). Since f

is also smooth on U , we only have to check that ∂j f ∈ L2 (U ). By the chain
rule we have
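\[ \partial_j \bigl( \log|\log\|x\|| \bigr) = \frac{1}{\log\|x\|} \, \frac{1}{\|x\|} \, \frac{x_j}{\|x\|}. \]
Taking the square and integrating with respect to dx dy = r dφ dr we get
\[ \|\partial_j f\|_{L^2(U)} \le \sqrt{\int_0^{1/2} \int_0^{2\pi} \frac{1}{(r \log r)^2} \, r \, d\phi \, dr} \ll \sqrt{\int_0^{1/2} \frac{1}{r (\log r)^2} \, dr} < \infty. \]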
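The finiteness of the last integral can be made explicit. The following short computation is a supplementary worked step, not part of the original argument; it uses only that $-1/\log r$ is an antiderivative of $1/(r (\log r)^2)$ on $(0, 1)$ and that $-1/\log r \to 0$ as $r \to 0^+$.

```latex
\[
  \frac{d}{dr}\Bigl( -\frac{1}{\log r} \Bigr) = \frac{1}{r (\log r)^2}
  \quad\Longrightarrow\quad
  \int_0^{1/2} \frac{dr}{r (\log r)^2}
  = \Bigl[ -\frac{1}{\log r} \Bigr]_{r \to 0^+}^{r = 1/2}
  = \frac{1}{\log 2} - 0
  = \frac{1}{\log 2}.
\]
```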

Exercise 5.25. Extend Example 5.21 by showing that $f(x) = \log|\log\|x\||$ defines an
element of H 1 (B1/2 ).
Exercise 5.26. Let $U = B_1^{\mathbb{R}^d}$, and let $f_\alpha(x) = \|x\|^\alpha$ for x ∈ U . For which values of α do
we have fα ∈ H k (U )?
5.2.2 Restriction Operators and Traces
Essential Exercise 5.27 (Open subsets). Let V ⊆ U ⊆ Rd be open subsets.
Let k ≥ 0.
(a) Show that the restriction ·|V : H k (U ) → H k (V ) is a bounded operator.
(b) Show that the extension operator sending functions in H0k (V ) to H0k (U ),
defined by extending the functions to be zero on U ∖ V , is a bounded operator.
(c) Show that for k ≥ 1 in general H0k (U )|V does not belong to H0k (V ).
Show that one cannot define an extension operator from H k (V ) to H k (U )
by simply extending the functions to be zero on U ∖ V .
In order to get a better geometric understanding of what it means for a
function to belong to H 1 (U ), we will now show that an element f ∈ H 1 (U ),
when restricted to any hyperplane, still belongs to L2 . Notice that since a
hyperplane is a null set, any property claimed for the restriction to a hy-
perplane cannot be demanded — indeed does not even make sense — for
a function f ∈ L2 (U ) (as in this case f is really an equivalence class of
functions). For notational simplicity we start with the case U = (0, 1)d and
describe how Example 5.20 can be generalized to higher dimensions.
Example 5.28. Let U = (0, 1)d , and S = (0, 1)d−1 and write
Sy = S × {y} ⊆ U
for y ∈ [0, 1]. For every y ∈ [0, 1] there is a natural restriction operator
to L2 (Sy ), called the trace on Sy ,
\[ H^1(U) \ni f \longmapsto f|_{S_y} \in L^2(S_y), \]
which for y ∈ (0, 1) is the continuous extension of the restriction operator
\[ C^\infty(U) \cap H^1(U) \ni f \longmapsto f|_{S_y}. \]
Moreover, if we identify the space L2 (Sy ) with L2 (S) for all y ∈ [0, 1] (by
simply identifying Sy with S via the projection Sy ∋ (x, y) 7→ x ∈ S), then
we also have
\[ \bigl\| f|_{S_{y_1}} - f|_{S_{y_2}} \bigr\|_{L^2(S)} \le \|f\|_{H^1(U)} \sqrt{|y_1 - y_2|}. \]
Exercise 5.29. Prove the statements of Example 5.28 by the following steps:
(a) Fix some f ∈ C ∞ (U ) ∩ H 1 (U ) and apply Fubini’s theorem to see that the restriction
of f to {x} × (0, 1) belongs to H 1 ((0, 1)) for almost every x ∈ (0, 1)d−1 . Now apply
Example 5.20 (or more precisely (5.6)) to show that
\[ f(x, y) = \int_0^1 f(x, s) \, ds + \int_0^y s \, \partial_2 f(x, s) \, ds + \int_y^1 (s - 1) \, \partial_2 f(x, s) \, ds. \]
Notice that this also gives a definition for the trace in the cases y = 0 and y = 1. Use this
to estimate the L2 norm of the restriction of f to Sy .
(b) For the last statement show that for any f ∈ C ∞ (U ) ∩ H 1 (U ),
\[ |f(x, y_1) - f(x, y_2)| \le \sqrt{|y_1 - y_2|} \cdot \|\partial_2 f\|_{L^2(\{x\} \times (0,1))}. \]
Exercise 5.30. Prove an extension of Example 5.28 for any bounded open set U ⊆ Rd
and the image $\phi\bigl([0, 1]^{d-1} \times \{0\}\bigr)$ for a smooth map $\phi : [0, 1]^d \to U$.
We now consider a general open set U ⊆ Rd and define the trace for ele-
ments of H01 (U ). For the statement that such functions vanish in the square-
mean sense at ∂U we want to assume that U has a sufficiently regular bound-
ary in the following sense (this may feel familiar after recalling the implicit
function theorem).
Definition 5.31. Let U ⊆ Rd be an open set and k ∈ N0 ∪ {∞}. We
say that U has a C k -smooth boundary if for every $z^{(0)} \in \partial U$ there exists
a neighbourhood $B_\varepsilon(z^{(0)})$, a rotated coordinate system (which we denote
by x1 , . . . , xd−1 , y) so that $z^{(0)}$ corresponds to $(x_1^{(0)}, \ldots, x_{d-1}^{(0)}, y^{(0)})$, and a
function
\[ \phi \in C^k\bigl( B_\varepsilon(x_1^{(0)}, \ldots, x_{d-1}^{(0)}) \bigr) \]
such that $U \cap B_\varepsilon(z^{(0)}) = \{(x, y) \in B_\varepsilon(z^{(0)}) \mid y < \phi(x)\}$, as illustrated in
Figure 5.2. If k = ∞ then we simply say that U has smooth boundary.
Fig. 5.2: A set with smooth boundary.
This includes examples like U = Br (x), but excludes U = (0, 1)d if k ≥ 1
and d ≥ 2. Notice that an open set with a C k -smooth boundary need not
be connected, simply connected, or bounded. Also note that the rotation
within Definition 5.31 does not affect whether a function belongs to H k (U ).
In fact, since a rotation R preserves the H k norm, a convergent sequence (fn )
in C ∞ (U ) ∩ H k (U ) is mapped to another convergent sequence (fn ◦ R)
in H k (R−1 U ).
Exercise 5.32. Let U ⊆ Rd be a bounded open set, and let Φ be a diffeomorphism (a
rotation, for example) defined on a neighbourhood of U. Let k ≥ 0 be an integer. Show
that H k (U ) ∋ f 7→ f ◦ Φ ∈ H k (Φ−1 (U )) is an isomorphism (and in the case of a rotation,
an isometry) between H k (U ) and H k (Φ−1 (U )).
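Proposition 5.33 (Trace on graphs). Let U ⊆ Rd be a bounded open set.
We let ε > 0, denote the variables (z1 , . . . , zd ) ∈ Rd by (x1 , . . . , xd−1 , y), and
assume that
\[ \phi : B_\varepsilon^{\mathbb{R}^{d-1}}(x^{(0)}) \longrightarrow \mathbb{R} \]
is continuous. Then there is a natural restriction operator, called the trace
on the graph $\operatorname{Graph}(\phi) = \{(x, \phi(x)) \mid x \in B_\varepsilon^{\mathbb{R}^{d-1}}(x^{(0)})\}$ of φ,
\[ H_0^1(U) \ni f \longmapsto f|_{\operatorname{Graph}(\phi)} \in L^2(\operatorname{Graph}(\phi)) \]
which satisfies
\[ \bigl\| f|_{\operatorname{Graph}(\phi)} \bigr\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta_\phi} \, \|\boldsymbol{\partial}_d f\|_{L^2(U)}, \]
where δφ is chosen to have $(x, \phi(x) + \delta_\phi) \notin U$ for all $x \in B_\varepsilon(x^{(0)})$ (see
Figure 5.3) and we use the measure dx1 · · · dxd−1 on Graph(φ).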
Consider now the case of a bounded open set U ⊆ Rd with C 0 -smooth
boundary and the function φδ (x) = φ(x) − δ with φ as in Definition 5.31.
Then, by Proposition 5.33, the trace of an element $f \in H_0^1(U)$ on this
translated portion of the boundary has L2 norm of order $O(\sqrt{\delta} \, \|f\|_{H^1(U)})$. This
explains the earlier claim that f ∈ H01 (U ) vanishes in the square-mean sense
at ∂U .
Fig. 5.3: The trace on the graph of φ.
We also note that if we set φ(x) = y0 to be constant we obtain that the L2
norm of the restriction of f to $U \cap (\mathbb{R}^{d-1} \times \{y_0\})$ is bounded by a multiple
of $\|\boldsymbol{\partial}_y f\|_2$. Integrating the square of this inequality over y0 we obtain
\[ \|f\|_2 \ll_U \|\boldsymbol{\partial}_y f\|_2 \tag{5.8} \]
for all f ∈ H01 (U ), where the implicit constant depends on the bounded open
set U .
Proof of Proposition 5.33. Recall that we write x for the first (d − 1)
coordinates and y for the last coordinate. For f ∈ Cc∞ (U ) we have
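\[ f(x, \phi(x)) = - \int_{\phi(x)}^{\phi(x) + \delta_\phi} \partial_d f(x, s) \, ds \tag{5.9} \]
by our assumption on δφ . Taking the square and applying the Cauchy–Schwarz
inequality gives
\[ |f(x, \phi(x))|^2 = \left| \int_{\phi(x)}^{\phi(x) + \delta_\phi} \partial_d f(x, s) \, ds \right|^2 \le \delta_\phi \int_{\phi(x)}^{\phi(x) + \delta_\phi} |\partial_d f(x, s)|^2 \, ds. \tag{5.10} \]
Integrated over x this gives
\[ \bigl\| f|_{\operatorname{Graph}(\phi)} \bigr\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta_\phi} \, \|\partial_d f\|_{L^2(U)} \le \sqrt{\delta_\phi} \, \|f\|_{H^1(U)}. \]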
Using automatic extension to the closure (Proposition 2.59), this implies the
proposition.
5.2.3 Sobolev Embedding in the Interior
We now extend the Sobolev embedding theorem (Theorem 5.6) to open sub-
sets U ⊆ Rd (but, for now, leave open the question regarding the behaviour
of f at ∂U ).
Theorem 5.34 (Sobolev embedding on open subsets of Rd ). Let U be
an open subset of Rd and let ℓ ≥ 0 and k > d/2 + ℓ be integers. Then any
function in H k (U ) (has a continuous representative that) also lies in C ℓ (U ).
In the following exercise we simplify the geometry again and only look at
the unit cube, but go further than in the theorem above.
Exercise 5.35. Let U = (0, 1)d as in Example 5.28.
(a) Extend Example 5.28 by showing that f ∈ H k (U ) implies that f |Sy ∈ H k−1 (Sy )
for y ∈ [0, 1].
(b) Use (a) to prove the following weak version of the Sobolev embedding theorem. For
any ℓ ≥ 0 there is a natural map from H d+ℓ (U ) to C ℓ (U ) that extends the identity on the
functions in C ∞ (U ) ∩ H d+ℓ (U ).
(c) Extend the arguments from (b) to the boundary, by showing that there is a natural
map from H d+ℓ (U ) to C ℓ (Ū).
(d) Now improve the needed regularity in (c) in the following way: Show that there is a
natural map from H k (U ) to C ℓ (S0 ) if k > ℓ + 1 + (d − 1)/2 (by also applying the Sobolev
embedding theorem on S0 ).
The proof of Theorem 5.34 will be (apart from Exercise 5.18) the first
example of a technique that we will use frequently: If a given statement
is already known to hold on Td (where we can use Fourier series to prove
it), then one can sometimes obtain the same statement for open subsets
of Rd by moving the functions or the problem to Td . For this the following
lemma and the notation TdR = Rd /(2RZd ) for R > 0 will be useful. We
also define H k (TdR ) in the same way as we defined H k (Td ) except that we
will use the fundamental domain [−R, R)d and the restriction of the Lebesgue
measure to it to define the L2 norm and the derived Sobolev norms. Of course
the theorems of the previous section also hold in that context (possibly with
different multiplicative constants).
Lemma 5.36 (Transferring regularity). Let U ⊆ Rd be open, k ≥ 1,
and χ ∈ Cc∞ (V ) for some open V ⊆ U . Then Mχ : H k (U ) → H0k (V )
defined by Mχ (f ) = χf is a bounded operator. Let R > 0 and assume now
that U ⊆ BR . For a function f on U we define P (f ) on TdR by first ex-
tending f to [−R, R)d by setting it to be zero outside of U and then identi-
fying [−R, R)d with TdR . Then P : H0k (U ) → H k (TdR ) is a linear isometry.
Finally, f ∈ H k (U ), χ ∈ Cc∞ (U ) and P (χf ) ∈ H ℓ (TdR ) for some ℓ > k
implies that χf ∈ H0ℓ (U ).
Proof. For f ∈ C ∞ (U ) ∩ H k (U ) we have Mχ (f ) = χf ∈ Cc∞ (V ) ⊆ H0k (V )

and the partial derivatives of χf are sums of products of partial derivatives
of χ and partial derivatives of f of lower order (the full expansion is given by
Leibniz’ rule). Since χ ∈ Cc∞ (V ) this gives the estimate
\[ \|\partial_\alpha(\chi f)\|_{L^2(V)} \ll \sup_{\|\beta\|_1 \le \|\alpha\|_1} \|\partial_\beta \chi\|_\infty \, \sup_{\|\gamma\|_1 \le \|\alpha\|_1} \|\partial_\gamma f\|_{L^2(U)} \]
for all α with $\|\alpha\|_1 \le k$, which leads to $\|M_\chi(f)\|_{H^k(V)} \ll_\chi \|f\|_{H^k(U)}$. From
this it follows that the operator Mχ is a bounded operator from H k (U )
into H0k (V ) (and that the weak partial derivatives ∂ α (χf ) for f ∈ H k (U )
are obtained by the same Leibniz rule as the partial derivatives ∂α (χf ) for f
in C ∞ (U )).
For the second statement of the lemma notice first that
P (Cc∞ (U )) ⊆ C ∞ (TdR )
(which would not be true for C ∞ (U ) ∩ H k (U )). Since the norm on H k (TdR )
is defined by integration over the Lebesgue measure on (−R, R)d , and since
U ⊆ (−R, R)d
we obtain $\|P(f)\|_{H^k(\mathbb{T}_R^d)} = \|f\|_{H^k(U)}$ for f in Cc∞ (U ). Hence the automatic

extension of P to an operator from H0k (U ) to H k (TdR ) (Proposition 2.59)
exists and is an isometry.
Assume now that P (χf ) lies in H ℓ (TdR ) for some f ∈ H k (U ), ℓ > k,
and χ ∈ Cc∞ (U ). Let ψ in Cc∞ (U ) be chosen with ψ ≡ 1 on a neighbourhood
of Supp χ and with Supp ψ ⊆ U (see Exercise 5.37). Since any g ∈ C ∞ (TdR )
can be identified with a (2RZ)d -periodic smooth function g on Rd , the map
C ∞ (TdR ) ∋ g 7−→ ψg ∈ Cc∞ (U )
is well-defined. Arguing just as in the first part of the proof, we get that
multiplication by ψ defines a bounded operator from H ℓ (TdR ) to H0ℓ (U ). Ap-
plying this map to g = P (χf ) ∈ H ℓ (TdR ) we get ψP (χf ) = χf ∈ H0ℓ (U ).

The existence of functions in Cc∞ (U ) that are equal to one on large subsets
of U (as used in the above proof) will frequently be useful.
Essential Exercise 5.37 (Smooth approximate characteristic func-
tions). Let K ⊆ U be a compact subset of an open subset U ⊆ Rd . Find a
smooth function ψ ∈ Cc∞ (U ) with ψ|K ≡ 1.
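Proof of Theorem 5.34. Let x0 ∈ U and ε > 0 be such that
\[ V = B_{2\varepsilon}^{\mathbb{R}^d}(x_0) \subseteq U. \]
Assume k > d/2 + ℓ. We fix some χ ∈ Cc∞ (V ) satisfying $\chi|_{B_\varepsilon(x_0)} \equiv 1$. Note
that χ ∈ Cc∞ (V ) ⊆ Cc∞ (BR ) for $R = \|x_0\| + 2\varepsilon$. The theorem follows from
the existence of the following chain of operators
\[ H^k(U) \xrightarrow{\ P \circ M_\chi\ } H^k(\mathbb{T}_R^d) \xrightarrow{\ \imath\ } C^\ell(\mathbb{T}_R^d) \xrightarrow{\ \cdot|_{B_\varepsilon(x_0)}\ } C_b^\ell(B_\varepsilon(x_0)) \longrightarrow L^2(B_\varepsilon(x_0)), \]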
where Mχ : H k (U ) → H0k (V ) and P : H0k (V ) → H k (TdR ) are as in

Lemma 5.36, ı is from the Sobolev embedding theorem for the torus (The-
orem 5.6), and ·|Bε (x0 ) denotes the operator sending a function to its re-
striction to Bε (x0 ). We also note that the composition of these operators
applied to f ∈ H k (U ) simply gives the restriction of f to Bε (x0 ) (since this

holds initially for f ∈ C ∞ (U ) ∩ H k (U )). This gives the theorem (see also
Exercise 5.38).
Exercise 5.38 (Merging lemma for continuous functions). If we wish to be pedantic

the above proof is not yet complete since we have only shown that for every point x there
exists a neighbourhood Bε (x) such that we can find a C ℓ -version of the restriction of f to
the given neighbourhood. However, we actually claimed that there is a version of f on all
of U which is in C ℓ (U ). To complete the proof, prove or recall the following statements:
(a) Suppose that U is covered by a family of open subsets Bτ for τ ∈ T and for every τ ∈ T
we are given some fτ ∈ C(Bτ ). Assume that fτ1 |Bτ1 ∩Bτ2 = fτ2 |Bτ1 ∩Bτ2 for every τ1 , τ2
in T . Then there exists some f ∈ C(U ) with fτ = f |Bτ for all τ ∈ T .
(b) Use the fact that U is σ-compact to construct a countable cover Bn = Bεn (xn ) of U
with B2εn (xn ) ⊆ U .
(c) Complete the proof of Theorem 5.34 above, using (a) and (b).
Exercise 5.39. Let U ⊆ Rd be open and K ⊆ U a compact subset. Show that
\[ \|f\|_{K,\infty} \ll_{K,U} \|f\|_{H^k(U)} \]
for f ∈ H k (U ) and k > d/2.
5.3 Dirichlet’s Boundary Value Problem and Elliptic Regularity
In this section we will combine the discussion of Sobolev spaces from Sec-
tion 5.2, the Fréchet–Riesz representation theorem (Corollary 3.19), and a
simple orthogonality relation to solve the Dirichlet boundary value problem

\[ \begin{aligned} \Delta u &= 0 \\ u|_{\partial U} &= b \end{aligned} \tag{5.11} \]
introduced and motivated in Section 1.2.1 for certain domains U ⊆ Rd and

certain functions b : ∂U → R. Recall that a function g is said to be harmonic
if ∆g = 0, where
\[ \Delta g = \frac{\partial^2 g}{\partial x_1^2} + \cdots + \frac{\partial^2 g}{\partial x_d^2} = \partial_1^2 g + \cdots + \partial_d^2 g \]
is the Laplacian of g.
We note that (with the exception of Lemma 5.48 and its proof) we restrict
our attention in this section to real-valued functions. In the following we will
also write h·, ·iL2 (U) to denote the inner product on L2 (U ), and similarly for
other Hilbert spaces, to emphasize the difference between the various inner
products used, especially for the semi-inner product h·, ·i1 as introduced in
the next lemma. Recall from Definition 3.1 that a semi-inner product satisfies
positivity instead of strict positivity.
Lemma 5.40 (Orthogonality). Let U ⊆ Rd be open, φ ∈ Cc∞ (U ), and

assume that g ∈ C 2 (U ) ∩ H 1 (U ) is a harmonic function. Then φ and g are
orthogonal with respect to the semi-inner product h·, ·i1 defined by
\[ \langle u, v \rangle_1 = \sum_{j=1}^{d} \langle \boldsymbol{\partial}_j u, \boldsymbol{\partial}_j v \rangle_{L^2(U)} \tag{5.12} \]
for u, v ∈ H 1 (U ).
Proof. Fix j ∈ {1, . . . , d} and φ ∈ Cc∞ (U ). Then as in Lemma 5.10 we can

use integration by parts to obtain
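\[ \int_U \partial_j g \, \partial_j \phi \, dx = - \int_U \bigl( \partial_j^2 g \bigr) \phi \, dx \]
since the boundary terms vanish. Integrating over the remaining variables
and summing over all j = 1, . . . , d, we get $\langle g, \phi \rangle_1 = \langle -\Delta g, \phi \rangle_{L^2(U)} = 0$ by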
the assumption on g.
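Lemma 5.40 can be illustrated numerically in dimension d = 2. The following is a minimal sketch under stated assumptions: U = (−1, 1)², the harmonic function g(x, y) = x² − y², a test function φ built as a product of two rescaled bumps, and a midpoint rule on a uniform grid. All of these choices, and the helper names, are hypothetical and serve only as an example; the non-harmonic function x² + y² is included for contrast.

```python
import math


def b(t):
    """Standard smooth bump supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def db(t):
    """Classical derivative of the bump b."""
    return b(t) * (-2.0 * t / (1.0 - t * t) ** 2) if abs(t) < 1.0 else 0.0


def grad_phi(x, y):
    """Gradient of phi(x, y) = b((x - 0.2) / 0.5) * b((y + 0.1) / 0.7).

    The support of phi is a compact rectangle inside U = (-1, 1)^2.
    """
    u, v = (x - 0.2) / 0.5, (y + 0.1) / 0.7
    return db(u) / 0.5 * b(v), b(u) * db(v) / 0.7


def semi_inner(grad_g, n=400):
    """Midpoint approximation of <g, phi>_1, the integral of grad g . grad phi over U."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for m in range(n):
            y = -1.0 + (m + 0.5) * h
            gx, gy = grad_g(x, y)
            px, py = grad_phi(x, y)
            total += gx * px + gy * py
    return total * h * h


harmonic = semi_inner(lambda x, y: (2 * x, -2 * y))     # g = x^2 - y^2, Laplacian 0
not_harmonic = semi_inner(lambda x, y: (2 * x, 2 * y))  # g = x^2 + y^2, Laplacian 4
print(harmonic, not_harmonic)
```

The first value is zero up to quadrature error, while the second is clearly nonzero, matching the identity ⟨g, φ⟩₁ = ⟨−∆g, φ⟩ from the proof.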
Motivated by Lemma 5.40, the approach is to decompose a function f
in C 1 (U ) as f = g + v, where v ∈ H01 (U ) and g is ‘orthogonal to’ H01 (U )
with respect to the semi-inner product h·, ·i1 . As harmonic functions have
this orthogonality property by Lemma 5.40, there is some hope that g will be
harmonic and indeed it will turn out to be. Moreover, v will vanish at ∂U in
the square-mean sense and so f |∂U = g|∂U at least in the square-mean sense.
As we wish to use the semi-inner product from (5.12) in the definition of
the orthogonal complement, we will have to discuss properties of this semi-
inner product. We will then show that g is smooth and harmonic, and it is this
step that relies on a general phenomenon called elliptic regularity, the Laplace
operator being an example of an elliptic differential operator. We will show in
Section 5.3.3, for d = 2, that g extends continuously to the boundary ∂U and
agrees with f |∂U there. Finally, we will discuss in Section 8.2.2 the behaviour
at a smooth boundary in any dimension.
5.3.1 The Semi-Inner Product
Let U ⊆ Rd be an open bounded set.
Lemma 5.41 (Semi-inner product). The semi-inner product $\langle\cdot,\cdot\rangle_1$ restricted
to $H_0^1(U)$ is an inner product, and the norm defined by this inner
product is equivalent to $\|\cdot\|_{H^1(U)}$. The semi-norm $\|\cdot\|_1$ induced by $\langle\cdot,\cdot\rangle_1$
on $C^\infty(U) \cap H^1(U)$ has as its kernel the subspace of all locally constant functions.
Here the kernel is the subspace of all functions $f$ with $\langle f, f\rangle_1 = \|f\|_1^2 = 0$.
A function f on U is called locally constant if for every x ∈ U there is a
neighbourhood V of x such that f |V is constant. If U is connected, then any
locally constant function is constant.
Proof of Lemma 5.41. Let $f \in H_0^1(U)$. We have $\|f\|_{L^2(U)} \ll \|\boldsymbol{\partial}_{x_d} f\|_{L^2(U)}$
by (5.8). Thus
\[
  \sqrt{\langle f, f\rangle_1} \le \|f\|_{H^1(U)}
  = \sqrt{\|f\|_{L^2(U)}^2 + \sum_{j=1}^{d} \|\boldsymbol{\partial}_j f\|_{L^2(U)}^2}
  \ll \sqrt{\langle f, f\rangle_1}
\]
for $f \in H_0^1(U)$, proving the first statement in the lemma.
If $f \in C^\infty(U) \cap H^1(U)$ is locally constant then it is clear that $\langle f, f\rangle_1 = 0$.
On the other hand, if $f \in C^\infty(U) \cap H^1(U)$ has $\langle f, f\rangle_1 = 0$, then $\partial_j f = 0$
almost everywhere and for all $j$, so $f$ is locally constant.
We are now ready to exhibit the desired orthogonal decomposition.
Proposition 5.42 (Existence of weak solution). Let $U \subseteq \mathbb{R}^d$ be an open
bounded set with $C^1$-smooth boundary, and let $f \in C^1(\overline{U})$ (that is, $f$ and
all of its partial derivatives are continuous and extend continuously to $\overline{U}$).
Then there exists some $v$ in $H_0^1(U)$ such that $g = f - v \in H^1(U)$ is weakly
harmonic in the sense that $\langle g, \Delta\varphi\rangle_{L^2(U)} = 0$ for all $\varphi \in C_c^\infty(U)$.
As before with ∂ and ∂ , we will think of this statement as giving meaning

to ‘$\Delta g = 0$’ by writing $\langle \Delta g, \varphi\rangle_{L^2(U)} = \langle g, \Delta\varphi\rangle_{L^2(U)} = 0$ for all $\varphi \in C_c^\infty(U)$.
If ∆g = 0, then we say that g is weakly harmonic.
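Proof of Proposition 5.42. We equip $C_c^\infty(U)$ with the inner product $\langle\cdot,\cdot\rangle_1$.
By Lemma 5.41, $\langle\cdot,\cdot\rangle_1$ defines an inner product on $H_0^1(U)$ whose norm is
equivalent to $\|\cdot\|_{H^1(U)}$, which makes $H_0^1(U)$ into a Hilbert space.
Let $f \in C^1(\overline{U})$ be as in the statement of the proposition, and notice that
\[
  \ell(u) = \langle u, f\rangle_1 = \sum_{j=1}^{d} \langle \boldsymbol{\partial}_j u, \partial_j f\rangle_{L^2(U)} \tag{5.13}
\]
defines a linear functional on $H_0^1(U)$ since
\[
  |\ell(u)| \le \sum_{j=1}^{d} |\langle \boldsymbol{\partial}_j u, \partial_j f\rangle_{L^2(U)}|
  \le \sum_{j=1}^{d} \|\boldsymbol{\partial}_j u\|_{L^2(U)} \|\partial_j f\|_{L^2(U)}
  \ll_f \|u\|_{H_0^1(U)}
\]
for all $u \in H_0^1(U)$. Applying the Fréchet–Riesz representation theorem
(Corollary 3.19) for the Hilbert space $H_0^1(U)$ with the inner product $\langle\cdot,\cdot\rangle_1$ we
find some $v \in H_0^1(U)$ with
\[
  \ell(\varphi) = \langle \varphi, v\rangle_1 \tag{5.14}
\]
for all $\varphi \in H_0^1(U)$. This implies for every $\varphi \in C_c^\infty(U)$ that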
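\[
\begin{aligned}
  \langle \Delta\varphi, f - v\rangle_{L^2(U)}
    &= \sum_{j=1}^{d} \langle \partial_j^2 \varphi, f - v\rangle_{L^2(U)} && \text{(by definition of } \Delta\text{)} \\
    &= -\sum_{j=1}^{d} \langle \partial_j \varphi, \partial_j f - \boldsymbol{\partial}_j v\rangle_{L^2(U)} && \text{(by Lemma 5.10)} \\
    &= -\ell(\varphi) + \ell(\varphi) = 0, && \text{(by (5.13) and (5.14))}
\end{aligned}
\]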
completing the proof.
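The decomposition f = g + v can be imitated in one dimension with finite differences. The following is only a sketch under our own assumptions (U = (0, 1), a uniform grid, and the hypothetical names `solve_tridiagonal`, `decompose` and `example_f`); it is not part of the argument. The discrete version of the equation ℓ(φ) = ⟨φ, v⟩_1 for all φ vanishing at the end points says, after summation by parts, that the second differences of v and f agree. Since harmonic functions of one variable are affine, g = f − v should come out as the line through f(0) and f(1).

```python
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system (sub[0] and sup[-1] are unused)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def decompose(f, num_cells=50):
    """Discrete analogue of f = g + v on U = (0, 1) with v = 0 at both end points."""
    xs = [i / num_cells for i in range(num_cells + 1)]
    fs = [f(x) for x in xs]
    # Discrete weak equation: second differences of v equal those of f
    # at every interior grid point.
    rhs = [fs[i - 1] - 2 * fs[i] + fs[i + 1] for i in range(1, num_cells)]
    n = num_cells - 1
    inner = solve_tridiagonal([1.0] * n, [-2.0] * n, [1.0] * n, rhs)
    v = [0.0] + inner + [0.0]
    g = [fi - vi for fi, vi in zip(fs, v)]
    return xs, fs, v, g

def example_f(x):
    """Hypothetical boundary data extended to [0, 1]."""
    return math.sin(3 * x) + x * x

xs, fs, v, g = decompose(example_f)
# Compare g with the affine function interpolating f(0) and f(1).
line = [fs[0] + (fs[-1] - fs[0]) * x for x in xs]
max_deviation = max(abs(gi - li) for gi, li in zip(g, line))
print(max_deviation)
```

In this toy setting the boundary condition holds exactly because v is forced to vanish at 0 and 1; in higher dimensions this is where the square-mean sense enters.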
5.3.2 Elliptic Regularity for the Laplace Operator
In this section we will upgrade the conclusion from the previous section to
show that the weakly harmonic function g is actually smooth and harmonic.
The principle at work here is much more general, and is called elliptic reg-
ularity. We will again rely on Fourier series in the argument, and this will
only give the result in the interior of U and not at the boundary ∂U . For this
reason, it is natural to start with functions that have little structure on ∂U ,
as in the following definition.
Definition 5.43. A measurable function $f$ on $U$ is called locally $L^p$ for
some $p \in [1, \infty]$ if $\mathbb{1}_K f \in L^p(U)$ for every compact set $K \subseteq U$. In this
case we write $f \in L^p_{\mathrm{loc}}(U)$. A measurable function $f$ on $U$ is called locally
$H^k$ for some $k \in \mathbb{N}_0$ if $\chi f \in H^k(U)$ for all $\chi \in C_c^\infty(U)$. In this case we
write $f \in H^k_{\mathrm{loc}}(U)$.
Notice that the characteristic function 1K localizes f and removes the

values of f near the boundary ∂U ; in the second case χ has the same effect but
is chosen to be C ∞ so as not to disturb any of the smoothness properties of f .
Clearly $L^p(U) \subseteq L^p_{\mathrm{loc}}(U)$, and by Lemma 5.36 we also have $H^k(U) \subseteq H^k_{\mathrm{loc}}(U)$.
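A small worked example may help to separate the local spaces from the global ones. The choice U = (0, 1) and f(x) = 1/x is ours and is only meant as an illustration.

```latex
% Assumed example: U = (0,1) \subseteq \mathbb{R} and f(x) = 1/x.
\[
  \int_0^1 |f(x)|^2 \,dx = \int_0^1 \frac{dx}{x^2} = \infty,
  \qquad\text{so } f \notin L^2(U),
\]
\[
  K \subseteq [a,b] \subseteq (0,1) \text{ compact}
  \;\Longrightarrow\;
  \int_U |\mathbb{1}_K f|^2 \,dx \le \int_a^b \frac{dx}{x^2}
  = \frac{1}{a} - \frac{1}{b} < \infty,
  \qquad\text{so } f \in L^2_{\mathrm{loc}}(U).
\]
```

Since f is smooth on U, the product χf lies in $C_c^\infty(U)$ for every $\chi \in C_c^\infty(U)$, so in this example f also belongs to $H^k_{\mathrm{loc}}(U)$ for every k, although it does not belong to $L^2(U)$.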
Exercise 5.44. Let $f$ be a measurable function on an open subset $U \subseteq \mathbb{R}^d$. Show that $f$
lies in $H^k_{\mathrm{loc}}(U)$ if and only if $f|_{K^o} \in H^k(K^o)$ for every compact $K \subseteq U$, if and only
if $\chi f \in H_0^k(U)$ for all $\chi \in C_c^\infty(U)$ (we have not used this as our definition as we will need
the functions $\chi$ as in Definition 5.43 in the proofs anyway).
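Theorem 5.45 (Elliptic regularity for $\Delta$ inside open subsets of $\mathbb{R}^d$).
Suppose that $U \subseteq \mathbb{R}^d$ is open and bounded, and $g \in H^1_{\mathrm{loc}}(U)$. Assume
that $\Delta g \in H^k_{\mathrm{loc}}(U)$ for $k \in \mathbb{N}_0$, in the sense that there exists some $u \in H^k_{\mathrm{loc}}(U)$
with
\[
  \langle \Delta g, \varphi\rangle_{L^2(U)} = \langle g, \Delta\varphi\rangle_{L^2(U)} = \langle u, \varphi\rangle_{L^2(U)}
\]
for all $\varphi \in C_c^\infty(U)$. Then $g \in H^{k+2}_{\mathrm{loc}}(U)$.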
The assumption of boundedness is not important, but simplifies the dis-

cussion slightly and is sufficient for all our applications of the theorem. Sim-
ilarly, the assumption on the regularity of g could be significantly weakened.
Roughly speaking, the theorem says that if ∆g exists, then the Sobolev-
regularity of g must be two more than that of ∆g. In other words, any
non-smoothness of g will be visible also in ∆g, or there is no cancellation
of singularities when ∆g is calculated from g. This remarkable result has
many striking consequences, a few of which we list here.
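Corollary 5.46. If $g \in H^1_{\mathrm{loc}}(U)$ has $\Delta g = u \in C^\infty(U)$ (or $g$ is weakly
harmonic in the sense that $\Delta g = 0$), then $g \in C^\infty(U)$ satisfies $\Delta g = u$
(respectively is harmonic).
Proof. Since $u \in H^k_{\mathrm{loc}}(U)$ for all $k \in \mathbb{N}_0$, Theorem 5.45 implies that
$g \in H^{k+2}_{\mathrm{loc}}(U)$
for all $k \ge 0$. Hence $\chi g \in H^{k+2}(U)$ for all $k \ge 0$ and all functions $\chi \in C_c^\infty(U)$.
By the Sobolev embedding theorem for open subsets (Theorem 5.34), this
implies that $\chi g \in C^\infty(U)$ for all $\chi \in C_c^\infty(U)$. Choosing $\chi \in C_c^\infty(U)$ equal
to 1 on a neighbourhood of a given $x \in U$ shows that $g \in C^\infty(U)$, since it
is $C^\infty$ in a neighbourhood of each point. Finally, integration by parts gives
\[
  \langle \Delta g, \varphi\rangle = \langle g, \Delta\varphi\rangle = \langle \boldsymbol{\Delta} g, \varphi\rangle = \langle u, \varphi\rangle
\]
for all $\varphi \in C_c^\infty(U)$ (here the first $\Delta g$ is the Laplacian in the usual sense and
$\boldsymbol{\Delta} g$ is the weak one). By density of $C_c^\infty(U) \subseteq L^2(U)$ and continuity of $\Delta g$
and $u$ we see that $\Delta g = u$.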
The following might be even more surprising and will be important in the
next chapter.
Corollary 5.47. If $g \in H^1_{\mathrm{loc}}(U)$ is a weak eigenfunction of $\Delta$ in the sense
that there exists a $\lambda \in \mathbb{C}$ with $\langle \Delta g, \varphi\rangle_{L^2(U)} = \langle g, \Delta\varphi\rangle_{L^2(U)} = \lambda \langle g, \varphi\rangle_{L^2(U)}$ for
all $\varphi \in C_c^\infty(U)$, then $g \in C^\infty(U)$ and $\Delta g = \lambda g$.
Proof. By assumption, $\Delta g = \lambda g \in H^1_{\mathrm{loc}}(U)$ and so by Theorem 5.45 we
also have $g \in H^3_{\mathrm{loc}}(U)$. However, this shows that $\Delta g = \lambda g \in H^3_{\mathrm{loc}}(U)$ and
Theorem 5.45 may be applied again to see that $g \in H^5_{\mathrm{loc}}(U)$, and so on.
It follows that $g \in H^k_{\mathrm{loc}}(U)$ for all $k \ge 0$, and arguing as in the proof of
Corollary 5.46 we see that $g \in C^\infty(U)$.
We will prove Theorem 5.45 in two steps: firstly we deal with the case
of functions on Td (which turns out to be easy because of Fourier series),
and secondly we show how to transfer the theorem from Td to open subsets
of U . Morally the second step (the transfer) should be the easy step as we are
discussing the ‘Laplace operator’ on both of these spaces. However, some care
is necessary as ∆ has different meanings on Td and on U since the spaces
of allowed test functions in the definition of ∆ are C ∞ (Td ) and Cc∞ (U ),
respectively.
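Lemma 5.48 (Elliptic regularity on $\mathbb{T}^d$). Let $g \in L^2(\mathbb{T}^d)$, and assume
that $\Delta g = u \in H^k(\mathbb{T}^d)$ so $\langle \Delta g, \varphi\rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta\varphi\rangle_{L^2(\mathbb{T}^d)} = \langle u, \varphi\rangle_{L^2(\mathbb{T}^d)}$ for
all $\varphi \in C^\infty(\mathbb{T}^d)$. Then $g \in H^{k+2}(\mathbb{T}^d)$.
Proof. If $\Delta g = u$ as in the lemma, then $u$ is uniquely determined by $g$.
Indeed, if $g = \sum_{n \in \mathbb{Z}^d} c_n \chi_n$ is the Fourier series of $g$ then
\[
  u = -\sum_{n \in \mathbb{Z}^d} c_n (2\pi)^2 \|n\|_2^2 \,\chi_n
\]
is the Fourier series of $u$. This follows from Fourier series on the torus
(Theorem 3.54) since the characters $\chi_n$ are eigenfunctions of the Laplace operator:
\[
  \Delta\chi_n = (2\pi i)^2 \|n\|_2^2 \,\chi_n = -(2\pi)^2 \|n\|_2^2 \,\chi_n
\]
and so
\[
  \langle u, \chi_n\rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta\chi_n\rangle_{L^2(\mathbb{T}^d)}
  = -(2\pi)^2 \|n\|_2^2 \underbrace{\langle g, \chi_n\rangle_{L^2(\mathbb{T}^d)}}_{= c_n}.
\]
By assumption $u \in H^k(\mathbb{T}^d)$, which shows that
\[
  \sum_{n \in \mathbb{Z}^d} \bigl| c_n (2\pi)^2 \|n\|_2^2 \bigr|^2 \|n\|_2^{2k} < \infty
\]
by the characterization of $H^k(\mathbb{T}^d)$ in terms of the Fourier series in Lemma 5.4.
However, this is equivalent to
\[
  \sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2(k+2)} < \infty,
\]
which implies that $g \in H^{k+2}(\mathbb{T}^d)$, again by Lemma 5.4.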
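The Fourier argument can be checked numerically on the one-dimensional torus. The sketch below is ours and rests on several assumptions: characters are taken as χ_n(x) = exp(2πinx), the coefficients u_n = 1/|n| are hypothetical sample data, and all function names are made up for the illustration. It first compares a finite-difference second derivative of χ_3 with the eigenvalue −(2π)²·3², and then divides by the eigenvalues to see that the truncated H² sums of g stabilise while the H³ sums keep growing.

```python
import cmath
import math

def chi(n, x):
    """Character chi_n(x) = exp(2*pi*i*n*x) on the one-dimensional torus."""
    return cmath.exp(2j * math.pi * n * x)

def laplace_eigenvalue(n):
    """Eigenvalue of the Laplacian on chi_n, namely -(2*pi)^2 * n^2."""
    return -(2 * math.pi * n) ** 2

def second_difference(func, x, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)

def solve_laplace(u_hat):
    """Coefficients c_n = u_n / eigenvalue(n) of a solution g of Laplace g = u."""
    return {n: u_n / laplace_eigenvalue(n) for n, u_n in u_hat.items() if n != 0}

def sobolev_sum(coeffs, k):
    """Truncated sum of |c_n|^2 * |n|^(2k), the quantity that controls H^k."""
    return sum(abs(c) ** 2 * abs(n) ** (2 * k) for n, c in coeffs.items())

def sample_u(cutoff):
    """Hypothetical data u_n = 1/|n| for 0 < |n| <= cutoff (in L^2, not in H^1)."""
    return {n: 1.0 / abs(n) for n in range(-cutoff, cutoff + 1) if n != 0}

# chi_3 is an eigenfunction: finite difference versus eigenvalue times chi_3.
x0 = 0.2
fd_error = abs(second_difference(lambda x: chi(3, x), x0)
               - laplace_eigenvalue(3) * chi(3, x0))

g_small = solve_laplace(sample_u(1000))
g_large = solve_laplace(sample_u(2000))
# The H^2 sums barely change when the cutoff doubles ...
h2_growth = sobolev_sum(g_large, 2) - sobolev_sum(g_small, 2)
# ... while the H^3 sums double, so no third derivative is gained.
h3_ratio = sobolev_sum(g_large, 3) / sobolev_sum(g_small, 3)
print(fd_error, h2_growth, h3_ratio)
```

This matches the statement of the lemma with k = 0: the sample u lies in $H^0 = L^2$ only, and g gains exactly two derivatives.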

At first it is a bit hard to pinpoint where the non-cancellation of singu-
larities in Theorem 5.45 really comes from. However, if one is determined to
find a single step within the proof to blame for this, then it would be the fact
that the eigenvalues of the character χn grow with the rate knk22 . This would,
for instance, not be true for the non-elliptic (hyperbolic) partial differential
operator D = ∂12 − ∂22 corresponding to the wave equation in two dimensions
— for this operator the eigenvalues on characters can cancel (that is, be zero
or much smaller than knk22 ) and there are also many non-smooth solutions
to Dg = 0.
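The remark about the wave operator can be made concrete on the torus $\mathbb{T}^2$. The following computation is a sketch in the notation of Lemma 5.48, assuming the characters are $\chi_n(x) = e^{2\pi i (n_1 x_1 + n_2 x_2)}$; the function h is introduced only for this illustration.

```latex
\[
  D\chi_n = (2\pi i)^2 (n_1^2 - n_2^2)\,\chi_n = -(2\pi)^2 (n_1^2 - n_2^2)\,\chi_n,
  \qquad n = (n_1, n_2) \in \mathbb{Z}^2,
\]
% The eigenvalue vanishes on the diagonal n_1 = n_2, so any square-summable
% choice of coefficients supported there gives a weak solution of Dg = 0:
\[
  g = \sum_{m \in \mathbb{Z}} c_m \chi_{(m,m)}, \quad \sum_{m \in \mathbb{Z}} |c_m|^2 < \infty
  \;\Longrightarrow\;
  \langle g, D\varphi\rangle_{L^2(\mathbb{T}^2)} = 0
  \quad\text{for all } \varphi \in C^\infty(\mathbb{T}^2).
\]
% Equivalently g(x_1, x_2) = h(x_1 + x_2) with h = \sum_m c_m e^{2\pi i m t} \in L^2(\mathbb{T}).
```

Taking h to be, for example, the indicator function of an interval gives a discontinuous weak solution of Dg = 0, whereas for $\Delta$ the eigenvalue $-(2\pi)^2 \|n\|_2^2$ vanishes only at n = 0.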
Before extending Lemma 5.48 to open subsets (Theorem 5.45), we state
how the localizing step in the definition of $H^\ell_{\mathrm{loc}}(U)$ in which $g$ is replaced
by $\chi g$ affects weak derivatives and the assumption regarding $\Delta g$. As the
statements of the next two lemmas should be easy to believe and the proofs
rely only on the definitions and are arguably a bit tedious, we postpone their
proofs until after the proof of Theorem 5.45.
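Lemma 5.49. Let $U \subseteq \mathbb{R}^d$ be open and $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \ge 1$. Then there
exists a weak partial derivative $\boldsymbol{\partial}_j g \in H^{\ell-1}_{\mathrm{loc}}(U) \subseteq L^2_{\mathrm{loc}}(U)$ with
\[
  \langle \boldsymbol{\partial}_j g, \varphi\rangle_{L^2(U)} = -\langle g, \partial_j \varphi\rangle_{L^2(U)}
\]
for all $\varphi \in C_c^\infty(U)$ and $j = 1, \dots, d$.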
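Lemma 5.50. Let $U \subseteq \mathbb{R}^d$ be open and bounded. If $g$ lies in $H^\ell_{\mathrm{loc}}(U)$ for
some $\ell \ge 1$, $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ with $k \ge 0$, and $\chi \in C_c^\infty(U)$, then
\[
  \Delta(\chi g) = \underbrace{\chi u}_{\in H^k} + \underbrace{(\Delta\chi) g}_{\in H^\ell}
  + 2 \underbrace{\sum_{j=1}^{d} (\partial_j \chi)(\boldsymbol{\partial}_j g)}_{\in H^{\ell-1}}
  \in H^{\min\{k, \ell-1\}}(U).
\]
Notice that if $g$ happened to be smooth, then the formula in the lemma
calculates $\Delta(\chi g)$ since in that case
\[
\begin{aligned}
  \Delta(\chi g) &= \sum_{j=1}^{d} \partial_j^2 (\chi g) = \sum_{j=1}^{d} \partial_j \bigl(\chi(\partial_j g) + (\partial_j \chi) g\bigr) \\
  &= \sum_{j=1}^{d} \bigl( \chi(\partial_j^2 g) + 2(\partial_j \chi)(\partial_j g) + (\partial_j^2 \chi) g \bigr).
\end{aligned}
\]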
Proof of Theorem 5.45. Let $R > 0$ and $U \subseteq B_R$ be an open subset of $\mathbb{R}^d$.
Suppose $k \in \mathbb{N}_0$ and that $g \in H^1_{\mathrm{loc}}(U)$ has $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ weakly. We
wish to show that $g \in H^{k+2}_{\mathrm{loc}}(U)$, and will do so by showing that $g \in H^\ell_{\mathrm{loc}}(U)$
by induction on $\ell \in \{1, \dots, k+2\}$. The case $\ell = 1$ is the assumption in the
theorem. So suppose that $1 \le \ell \le k+1$, $g \in H^\ell_{\mathrm{loc}}(U)$, and fix $\chi \in C_c^\infty(U)$.
Then $\min(k, \ell-1) = \ell-1$ and $\Delta(\chi g) = u_1 \in H^{\ell-1}(U)$ weakly by Lemma 5.50.
This means that
\[
  \langle u_1, \varphi\rangle_{L^2(U)} = \langle \chi g, \Delta\varphi\rangle_{L^2(U)} \tag{5.15}
\]
for all $\varphi \in C_c^\infty(U)$.
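Using the fact that $\chi \in C_c^\infty(U)$ and $U \subseteq B_R$, we can now make a switch
in this formula to $\mathbb{T}^d_R$ as follows. For this we will use Lemma 5.36 and its
notation. Let $\psi \in C_c^\infty(U)$ be such that $\psi \equiv 1$ on $\operatorname{Supp} \chi$. Then $\psi\varphi$ lies
in $C_c^\infty(U)$ for any $\varphi \in C^\infty(\mathbb{T}^d_R)$, where $\mathbb{T}^d_R = \mathbb{R}^d/(2R\mathbb{Z})^d$ and functions
on $\mathbb{T}^d_R$ are identified with $(2R\mathbb{Z})^d$-periodic functions on $\mathbb{R}^d$. Applying (5.15)
to $\psi\varphi$ we get
\[
  \langle P(\psi u_1), \varphi\rangle_{L^2(\mathbb{T}^d_R)} = \langle \psi u_1, \varphi\rangle_{L^2(U)}
  = \langle u_1, \psi\varphi\rangle_{L^2(U)} = \langle \chi g, \Delta(\psi\varphi)\rangle_{L^2(U)}.
\]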
Since ψ is one and its derivatives are zero at any point of Supp χ, we can
remove ψ on the right-hand side. One may wonder why ψ is introduced in
the first place, since it is only brought in so that it can be removed again.
The answer lies in the definition of ∆ which depends crucially on a choice of
test functions. In particular, ∆ is defined differently on Td and on U — we
use ψ to bridge between these two definitions.
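We now obtain
\[
  \langle P(\psi u_1), \varphi\rangle_{L^2(\mathbb{T}^d_R)} = \langle \chi g, \Delta\varphi\rangle_{L^2(U)}
  = \langle P(\chi g), \Delta\varphi\rangle_{L^2(\mathbb{T}^d_R)}
\]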
for any $\varphi \in C^\infty(\mathbb{T}^d_R)$. In fact, by definition of $H^\ell_{\mathrm{loc}}(U)$ we have $\chi g \in H^\ell(U)$.
By Lemma 5.36 $\psi\chi g \in H_0^\ell(U)$, but $\psi\chi g = \chi g$ by our choice of $\psi$, so we
deduce that $\chi g \in H_0^\ell(U)$ and $P(\chi g) \in H^\ell(\mathbb{T}^d_R)$ is defined by Lemma 5.36.
In other words, we have shown that $\Delta(P(\chi g)) = P(\psi u_1) \in H^{\ell-1}(\mathbb{T}^d_R)$
weakly by Lemma 5.36 (where elements of $C^\infty(\mathbb{T}^d_R)$ are used as the test
functions). By Lemma 5.48 it follows that $P(\chi g) \in H^{\ell+1}(\mathbb{T}^d_R)$, which by the
last claim of Lemma 5.36 allows us to pull the statement back to $U$ and
deduce that $\chi g$ lies in $H^{\ell+1}(U)$. Since this holds for all $\chi \in C_c^\infty(U)$ we see
that $g \in H^{\ell+1}_{\mathrm{loc}}(U)$. Repeating the argument and increasing $\ell$ each time, we
eventually reach $\ell = k+1$ and then $g \in H^{\ell+1}_{\mathrm{loc}}(U) = H^{k+2}_{\mathrm{loc}}(U)$.
We now give the proof of the lemmas that were used in the theorem. In the
remainder of the chapter we will only consider the inner product on L2 (U )
and hence will again simply write h·, ·i.
Proof of Lemma 5.49. Fix some $j \in \{1, \dots, d\}$. Let $V \subseteq U$ be an open
subset whose closure $\overline{V} \subseteq U$ is compact, and choose $\chi \in C_c^\infty(U)$ with $\chi \equiv 1$ on $V$.
Then $\chi g \in H^\ell(U)$ has a weak partial derivative along $x_j$, which we will
denote by $g_{\chi,j} \in H^{\ell-1}(U)$. By definition, we now have for the inner products
in $L^2(U)$ that
\[
  \langle g_{\chi,j}, \varphi\rangle = -\langle \chi g, \partial_j \varphi\rangle = -\langle g, \partial_j \varphi\rangle
\]
for all $\varphi \in C_c^\infty(V) \subseteq C_c^\infty(U)$. This shows that $g_{\chi,j}|_V \in H^{\ell-1}(V)$ is the weak
partial derivative of $g|_V$ along $x_j$ and by the properties of weak derivatives
(Lemma 5.10) is therefore uniquely determined by $g|_V$ (and independent
of $\chi$).
Now write $U = \bigcup_{n \ge 1} V_n$ for an increasing sequence of open subsets of $U$
with compact closures within† $U$, and define $g_j$ to be $g_{\chi_n,j}$ on $V_n$ where $\chi_n$
and $g_{\chi_n,j}$ are the functions as above corresponding to the set $V = V_n$. By
Lemma 5.10 the function $g_j$ is well-defined almost everywhere.
Let $\varphi \in C_c^\infty(U)$, then there exists some $n$ with $\operatorname{Supp} \varphi \subseteq V_n$, which
gives $\langle g_j, \varphi\rangle = \langle g_{\chi_n,j}, \varphi\rangle = -\langle g, \partial_j \varphi\rangle$. Moreover, $\varphi g_j = \varphi\chi_n g_{\chi_n,j} \in H^{\ell-1}(U)$
by Lemma 5.36. As these two facts hold for every $\varphi$ in $C_c^\infty(U)$, we see
that $g_j \in H^{\ell-1}_{\mathrm{loc}}(U)$ is a weak partial derivative of $g$ along $x_j$.
† For example, $V_n = \{x \mid d(x, \mathbb{R}^d \smallsetminus U) > \tfrac{1}{n}\} \cap B_n$.
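Proof of Lemma 5.50. By assumption $u \in H^k_{\mathrm{loc}}(U)$ and so $\chi u \in H^k(U)$ by
definition of $H^k_{\mathrm{loc}}(U)$ (Definition 5.43). Similarly, $g \in H^\ell_{\mathrm{loc}}(U)$ and so $(\Delta\chi) g$
lies in $H^\ell(U)$. Finally, by assumption, $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \ge 1$ and so $\boldsymbol{\partial}_j g$ lies
in $H^{\ell-1}_{\mathrm{loc}}(U)$ by Lemma 5.49, which gives $(\partial_j \chi)(\boldsymbol{\partial}_j g) \in H^{\ell-1}(U)$. Therefore,
\[
  \chi u + (\Delta\chi) g + 2 \sum_{j=1}^{d} (\partial_j \chi)(\boldsymbol{\partial}_j g) \in H^{\min\{k, \ell-1\}}(U)
\]
and it remains to show that this function is equal to $\Delta(g\chi)$ weakly. For this,
recall that $\Delta g = u$, let $\varphi \in C_c^\infty(U)$, and calculate
\[
\begin{aligned}
  \Bigl\langle \chi u &+ (\Delta\chi) g + 2 \sum_{j=1}^{d} (\partial_j \chi)(\boldsymbol{\partial}_j g), \varphi \Bigr\rangle \\
  &= \langle u, \chi\varphi\rangle + \langle g, (\Delta\chi)\varphi\rangle + 2 \sum_{j=1}^{d} \langle \boldsymbol{\partial}_j g, (\partial_j \chi)\varphi\rangle \\
  &= \langle g, \Delta(\chi\varphi)\rangle + \langle g, (\Delta\chi)\varphi\rangle - 2 \sum_{j=1}^{d} \langle g, \partial_j((\partial_j \chi)\varphi)\rangle \\
  &= \Bigl\langle g, (\Delta\chi)\varphi + 2 \sum_{j=1}^{d} (\partial_j \chi)(\partial_j \varphi) + \chi\Delta\varphi
     + (\Delta\chi)\varphi - 2 \sum_{j=1}^{d} (\partial_j^2 \chi)\varphi - 2 \sum_{j=1}^{d} (\partial_j \chi)(\partial_j \varphi) \Bigr\rangle \\
  &= \langle g, \chi\Delta\varphi\rangle = \langle \chi g, \Delta\varphi\rangle.
\end{aligned}
\]
As $\varphi \in C_c^\infty(U)$ was arbitrary we see that
\[
  \Delta(\chi g) = \chi u + (\Delta\chi) g + 2 \sum_{j=1}^{d} (\partial_j \chi)(\boldsymbol{\partial}_j g)
\]
weakly.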
5.3.3 Dirichlet’s Boundary Value Problem
For $k \ge 0$ and a bounded open set $U \subseteq \mathbb{R}^d$ the function space $C^k(\overline{U})$ consists
of all continuous functions with $f|_U \in C^k(U)$ such that the partial derivatives
extend continuously to the closure $\overline{U}$. If $U$ has $C^k$-smooth boundary, then
the function space $C^k(\partial U)$ is defined using the assumption that $\partial U$ has the
structure of a manifold; the local charts allow smoothness properties to be
transported onto $\partial U$ from a suitable subset of $\mathbb{R}^{d-1}$; an example of how this
may be done appears in the proof of Theorem 5.51 below.
Theorem 5.51 (Dirichlet’s boundary value problem). Let $U \subseteq \mathbb{R}^d$ be
open and bounded with $C^1$-smooth boundary and let $f \in C^1(\partial U)$. Then there
exists a function $g \in H^1(U)$ such that $g|_U$ in $C^\infty(U)$ is harmonic and $g|_{\partial U}$
is equal to $f$ in the square-mean sense. If $d = 2$ then $g$ extends uniquely to
an element in $C(\overline{U})$, and $g|_{\partial U} = f$.
Proof of the existence of a solution in the square-mean sense.
Since $U$ has $C^1$-smooth boundary, we can find an extension of $f$, again
denoted by $f$, to a function in $C^1(\overline{U})$. We only sketch the argument. In an
open neighbourhood of a point $z^{(0)}$ in $\partial U$ as in Definition 5.31, such a function
can, for example, be chosen to be independent of the $y$-coordinate. Using
compactness we find finitely many open sets $V_1, \dots, V_k$ covering $\partial U$ and
bounded $C^1$-functions $f_j$ defined on $V_j$ with $f_j(x) = f(x)$ for $x \in \partial U \cap V_j$
and $j = 1, \dots, k$. Using a smooth partition $\psi_1, \dots, \psi_k$ of unity (that is,
functions $\psi_j \in C_c^\infty(V_j)$ for $j = 1, \dots, k$ with $\sum_{j=1}^{k} \psi_j|_{\partial U} \equiv 1$; see
Exercise 5.52) with the cover $V_1, \dots, V_k, V_{k+1} = U$ we may define $f = \sum_{j=1}^{k} \psi_j f_j$
(where $\psi_j f_j(x) = 0$ for $x \notin V_j$ and $j = 1, \dots, k$) and restrict it to $\overline{U}$.
Now apply Proposition 5.42 to $f$, giving functions $v$ in $H_0^1(U)$ and $g$
in $H^1(U)$ such that $f = g + v$ with $g$ weakly harmonic in $U$. Now
Corollary 5.46 implies that $g \in C^\infty(U)$, so that $\Delta g$ (in the usual sense) is
well-defined. By integration by parts (see Lemma 5.10), it follows that
\[
  \langle \Delta g, \varphi\rangle = \langle g, \Delta\varphi\rangle = \langle \boldsymbol{\Delta} g, \varphi\rangle = 0
\]
for all $\varphi \in C_c^\infty(U)$ (here $\boldsymbol{\Delta} g$ denotes the weak Laplacian), showing that $\Delta g = 0$.
That is, $g$ is a harmonic function.
Moreover, $v$ vanishes at $\partial U$ in the square-mean sense (Proposition 5.33),
so $g|_{\partial U} = f|_{\partial U}$ in $L^2(\partial U)$.
Essential Exercise 5.52 (Smooth partition of unity). Let $U \subseteq \mathbb{R}^d$ be
open and bounded, let $V_1, \dots, V_k$ be an open cover of $U$ and let $V_0 = \mathbb{R}^d \smallsetminus U$.
Show that there exist smooth functions $\psi_j \in C_c^\infty(V_j)$ for $j = 0, \dots, k$ (always
extended to $\mathbb{R}^d$ by setting $\psi_j(x) = 0$ for $x \in \mathbb{R}^d \smallsetminus V_j$) such that $\sum_{j=0}^{k} \psi_j = 1$.
Using the averaging property of harmonic functions (Proposition 5.53) and
Lemma 5.55 (which only works in two dimensions), we will upgrade the first
part to give the second part of the theorem for d = 2.
Proposition 5.53 (Mean value principle). Let $U \subseteq \mathbb{R}^d$ be open, and
let $\varphi \in C^\infty(U)$ be a harmonic function. Let $x_0 \in U$ and $r > 0$ be chosen
with $B_r(x_0) \subseteq U$. Then the value of the harmonic function at $x_0$ is equal to
the average over the sphere of radius $r$ around $x_0$, that is
\[
  \varphi(x_0) = \frac{1}{\sigma(r\mathbb{S}^{d-1})} \int_{r\mathbb{S}^{d-1}} \varphi(x_0 + x) \,d\sigma(x),
\]
where $\sigma$ denotes the natural area measure on the sphere $r\mathbb{S}^{d-1}$.
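Proof. Without loss of generality $x_0 = 0$. The proof consists of applying
the $d$-divergence theorem to the vector field $f(x) = \varphi(x)\nabla v(x) - v(x)\nabla\varphi(x)$,
where $v : \mathbb{R}^d \smallsetminus \{0\} \to \mathbb{R}$ is an auxiliary function. In fact, $v$ is defined by
\[
  v(x) = \begin{cases}
    \log \|x\|_2 & \text{for } d = 2, \\[2pt]
    \dfrac{1}{\|x\|_2^{d-2}} & \text{for } d > 2.
  \end{cases}
\]
A direct calculation shows that it satisfies
\[
  \nabla v = \begin{cases}
    \dfrac{1}{\|x\|_2} \dfrac{x}{\|x\|_2} & \text{for } x \neq 0 \text{ and } d = 2, \\[6pt]
    \dfrac{2-d}{\|x\|_2^{d-1}} \dfrac{x}{\|x\|_2} & \text{for } x \neq 0 \text{ and } d > 2,
  \end{cases}
\]
and $\Delta v = \operatorname{div} \nabla v = 0$ for $x \neq 0$. For $\varepsilon \in (0, r)$ the divergence theorem applied
to $f$ on the annulus $B_r \smallsetminus B_\varepsilon \subseteq \mathbb{R}^d$ has the form
\[
  \int_{\partial B_r} f \cdot n \,d\sigma - \int_{\partial B_\varepsilon} f \cdot n \,d\sigma
  = \int_{B_r \smallsetminus B_\varepsilon} \operatorname{div} f \,dx_1 \cdots dx_d,
\]
where $n = \frac{x}{\|x\|_2}$ is the normalized outward normal vector to the sphere of
radius $\|x\|_2$ at $x$ and we also write $\sigma$ for the area measure on $\partial B_\varepsilon$. For the
right-hand side, we calculate for $f$ as above
\[
  \operatorname{div} f = \operatorname{div}(\varphi\nabla v - v\nabla\varphi)
  = \nabla\varphi \cdot \nabla v + \varphi\Delta v - \nabla v \cdot \nabla\varphi - v\Delta\varphi = 0.
\]
Therefore
\[
  \int_{\partial B_r} f \cdot n \,d\sigma = \int_{\partial B_\varepsilon} f \cdot n \,d\sigma. \tag{5.16}
\]
For $x$ with $\|x\|_2 = r$ we have
\[
\begin{aligned}
  f(x) \cdot n &= (\varphi(x)\nabla v(x) - v(x)\nabla\varphi(x)) \cdot n \\
  &= \varphi(x) \frac{c_1}{\|x\|_2^{d-1}} \frac{x}{\|x\|_2} \cdot \frac{x}{\|x\|_2} - v(x)\nabla\varphi(x) \cdot n \\
  &= \varphi(x) \frac{c_1}{r^{d-1}} - c_2 \nabla\varphi(x) \cdot n,
\end{aligned}
\]
where $c_1 = 1$ if $d = 2$ or $c_1 = 2 - d$ if $d > 2$ and $c_2 = v(x)$ which is a constant
for $\|x\|_2 = r$ since $v$ depends only on $\|x\|_2$. Furthermore, we notice that
\[
  \int_{\partial B_r} \nabla\varphi \cdot n \,d\sigma
  = \int_{B_r} \underbrace{\operatorname{div} \nabla\varphi}_{= \Delta\varphi = 0} \,dx_1 \cdots dx_d = 0.
\]
Using this and the analogous formula for $\|x\|_2 = \varepsilon$ allows us to write (5.16) as
\[
  \frac{1}{r^{d-1}} \int_{r\mathbb{S}^{d-1}} \varphi \,d\sigma
  = \frac{1}{\varepsilon^{d-1}} \int_{\varepsilon\mathbb{S}^{d-1}} \varphi(x) \,d\sigma.
\]
Now divide by the area of $\mathbb{S}^{d-1}$ and notice that we get
\[
  \frac{1}{\sigma(r\mathbb{S}^{d-1})} \int_{r\mathbb{S}^{d-1}} \varphi \,d\sigma
  = \frac{1}{\sigma(\varepsilon\mathbb{S}^{d-1})} \int_{\varepsilon\mathbb{S}^{d-1}} \varphi \,d\sigma
  \longrightarrow \varphi(0)
\]
as $\varepsilon \to 0$, by continuity of $\varphi$.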
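The mean value principle is easy to test numerically for d = 2. The sketch below is ours: the two sample functions, the centre and radius, and the names `circle_average`, `harmonic` and `not_harmonic` are assumptions chosen for illustration. It averages a function over a circle with equally spaced sample points and compares the result with the value at the centre.

```python
import math

def circle_average(phi, x0, y0, r, samples=256):
    """Average of phi over the circle of radius r centred at (x0, y0),
    using equally spaced points in the angle (trapezoidal rule)."""
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += phi(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / samples

def harmonic(x, y):
    """exp(x) * cos(y), the real part of exp(x + iy), is harmonic on R^2."""
    return math.exp(x) * math.cos(y)

def not_harmonic(x, y):
    """x^2 + y^2 has Laplacian 4, so the mean value property must fail."""
    return x * x + y * y

centre = (0.3, -0.2)
radius = 1.5
gap_harmonic = circle_average(harmonic, *centre, radius) - harmonic(*centre)
gap_other = circle_average(not_harmonic, *centre, radius) - not_harmonic(*centre)
print(gap_harmonic, gap_other)
```

For the harmonic sample the gap is at rounding level, while for x² + y² the circle average exceeds the centre value by r², as a direct computation with cos² + sin² = 1 shows.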
Exercise 5.54. Use Proposition 5.53 to prove that any bounded harmonic function on Rd
is constant.
Lemma 5.55 (Convergence on average). Suppose that $U \subseteq \mathbb{R}^2$ is open
and bounded with $C^1$-smooth boundary. Let $v \in H_0^1(U)$, which we extend to
a function on $\mathbb{R}^2$ by setting it equal to 0 outside $U$. Then for any $z^{(0)} \in \partial U$
we have
\[
  \frac{1}{\varepsilon^2} \int_{z^{(0)} + B_\varepsilon} |v| \,dx\,dy \longrightarrow 0 \tag{5.17}
\]
as $\varepsilon \to 0$, uniformly for all $z^{(0)} \in \partial U$.
Proof. To prove (5.17) for $z^{(0)} \in \partial U$ we use the assumption that $U$ has
$C^1$-smooth boundary and rotate the coordinate system so that $B_\delta(z^{(0)}) \cap U$
can be described as in Definition 5.31 (see also Exercise 5.32). By a further
rotation and by shrinking $\delta$ if necessary we may also assume that $|\phi'(x)| < 1$
for all $x \in B_\delta(x_0)$ with the notation $z^{(0)} = (x_0, y_0)$ and $\phi \in C^1(B_\varepsilon(x_0))$ as in
Definition 5.31. We claim that for $\varepsilon \in (0, \tfrac{1}{4}\delta)$, $x_1 \in B_\delta(x_0)$, and $y_1 = \phi(x_1)$
we have
\[
  \int_{x_1-\varepsilon}^{x_1+\varepsilon} \int_{y_1-\varepsilon}^{y_1+\varepsilon} |v| \,dx\,dy
  \le 2\varepsilon^2 \sqrt{\int_{x_1-\varepsilon}^{x_1+\varepsilon} \int_{y_1-\varepsilon}^{y_1+\varepsilon} |\boldsymbol{\partial}_2 v|^2 \,dx\,dy}, \tag{5.18}
\]
first for $v \in C_c^\infty(U)$, which then extends by continuity to all $v \in H_0^1(U)$. We
will call the domain of integration $[x_1 - \varepsilon, x_1 + \varepsilon] \times [y_1 - \varepsilon, y_1 + \varepsilon]$ the $\varepsilon$-box
around $(x_1, y_1)$.
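To prove (5.18) for $v \in C_c^\infty(U)$ we note first that $(x, y_1 + \varepsilon) \notin U$ for
all $x$ with $|x - x_1| < \varepsilon$ (by the local description of $U$ in terms of $\phi$, the
assumption that $|\phi'(x)| < 1$, and the mean value theorem). If $(x, y) \in U$
with $|x - x_1|, |y - y_1| < \varepsilon$ this gives
\[
  v(x, y) = -\int_{y}^{y_1+\varepsilon} \partial_2 v(x, s) \,ds.
\]
Now integrate the absolute value of $v(x, y)$ with respect to $x$ and $y$ to get
\[
\begin{aligned}
  \int_{x_1-\varepsilon}^{x_1+\varepsilon} \int_{y_1-\varepsilon}^{y_1+\varepsilon} |v(x, y)| \,dy\,dx
  &\le \int_{x_1-\varepsilon}^{x_1+\varepsilon} \int_{y_1-\varepsilon}^{y_1+\varepsilon} \int_{y}^{y_1+\varepsilon} |\partial_2 v(x, s)| \,ds\,dy\,dx \\
  &= \int_{x_1-\varepsilon}^{x_1+\varepsilon} \int_{y_1-\varepsilon}^{y_1+\varepsilon} |\partial_2 v(x, s)|\,|s - y_1 + \varepsilon| \,ds\,dx,
\end{aligned}
\]
where we simply switched the order of integration, used the identity
\[
  \mathbb{1}_{[y, y_1+\varepsilon]}(s) = \mathbb{1}_{[y_1-\varepsilon, s]}(y)
\]
for $s, y \in [y_1 - \varepsilon, y_1 + \varepsilon]$, and evaluated the integral over $y$. Writing $Q_\varepsilon$
for the $\varepsilon$-box $[x_1 - \varepsilon, x_1 + \varepsilon] \times [y_1 - \varepsilon, y_1 + \varepsilon]$ around $(x_1, y_1)$ and applying
Cauchy–Schwarz to the last integral we get
\[
\begin{aligned}
  \iint_{Q_\varepsilon} |v(x, y)| \,dy\,dx
  &\le \sqrt{\iint_{Q_\varepsilon} |\partial_2 v|^2 \,dx\,dy} \,\sqrt{\iint_{Q_\varepsilon} |s - y_1 + \varepsilon|^2 \,ds\,dx} \\
  &\le 2 \sqrt{\iint_{Q_\varepsilon} |\partial_2 v|^2 \,dx\,dy} \,\sqrt{\varepsilon^4}
\end{aligned}
\]
since $|s - y_1 + \varepsilon| \le 2\varepsilon$ for all $s \in (y_1 - \varepsilon, y_1 + \varepsilon)$. This gives the estimate
claimed.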
It is clear that (5.18) implies (5.17) since $\boldsymbol{\partial}_2 v \in L^2(U)$ and hence the $L^2$
norm of $\boldsymbol{\partial}_2 v$ restricted to smaller and smaller subsets of neighbourhoods
of $z^{(1)}$ converges to zero.
The uniformity claim in the lemma also follows from the discussion
above.
Indeed, we only assumed that $x_1 \in (x_0 - \tfrac{\delta}{4}, x_0 + \tfrac{\delta}{4})$ and $\varepsilon \in (0, \tfrac{\delta}{4})$.
Fig. 5.4: The point $z^{(1)} = (x_1, y_1) \in \partial U$ and the $\varepsilon$-box $Q_\varepsilon$, containing the $\varepsilon$-ball.
We can make the $L^2$ norms of $\boldsymbol{\partial}_2 v$ restricted to the $\varepsilon$-box around $z^{(1)}$
uniformly small for all $z^{(1)}$ as above by using the fact that
\[
  \int_{x_0 - \frac{\delta}{2}}^{x_0 + \frac{\delta}{2}} \int_{\phi(x_1) - 2\varepsilon}^{\phi(x_1)} |\boldsymbol{\partial}_2 v| \,dy\,dx \longrightarrow 0
\]
as $\varepsilon \to 0$ by dominated convergence. Since the $\varepsilon$-ball around $z^{(1)}$ as in (5.17) is
contained in the $\varepsilon$-box $Q_\varepsilon$ around $z^{(1)}$ as in (5.18) we obtain (5.17) uniformly
for $z^{(1)} = (x_1, \phi(x_1)) \in \partial U$ and $x_1 \in (x_0 - \tfrac{\delta}{2}, x_0 + \tfrac{\delta}{2})$. Using the fact that $\partial U$
is compact we can find finitely many neighbourhoods of points $z^{(0)} \in \partial U$ as
above, and the lemma follows.
Exercise 5.56. Describe what prevents the proof of Lemma 5.55 from extending to higher
dimensions by trying to emulate the calculations involved.
Proof of pointwise boundary condition in Theorem 5.51. Recall that
we already established that $f$ can be written as $f = g + v$ with $g \in H^1(U)$
harmonic and $v \in H_0^1(U)$. Let $z \in U$ be a point of distance $\varepsilon$ from $\partial U$, and
write
\[
  \overline{f}(z) = \frac{4}{\varepsilon^2 \pi} \int_{z + B_{\varepsilon/2}} f(w) \,dw
\]
for the average of $f$ over the ball of radius $\varepsilon/2$ with centre $z$ (and similarly
define $\overline{g}$ and $\overline{v}$). Then $\overline{g}(z) = g(z)$ by the mean value property in
Proposition 5.53. By uniform continuity of $f$ we also have $\overline{f}(z) - f(z) = o(1)$ as $\varepsilon \to 0$.
Finally, by the convergence on average of $v$ in Lemma 5.55, $\overline{v}(z) = o(1)$
as $\varepsilon \to 0$. Thus
\[
  g(z) - f(z) = \overline{g}(z) - \overline{f}(z) + \overline{v}(z) + o(1)
\]
for all $z \in U$ at distance $\varepsilon$ from $\partial U$ as $\varepsilon \to 0$, where the sum of the three
functions on the right is equal to 0 because $f = g + v$. This shows that $g$ extends continuously
to $\partial U$ and agrees there with $f$, and so concludes the proof of the theorem.
We finish our discussion of the Dirichlet boundary value problem by out-
lining how to establish the uniqueness of solutions by the methods we have
developed so far.
Exercise 5.57. Let U ⊆ Rd be open and bounded with smooth boundary. To avoid com-
plications arising from the geometry of the boundary, you may also suppose that U ⊆ Rd is
convex (for instance, U could be as in Figure 5.2 on p. 148). Let f be a function in C 1 (∂U )
and suppose that g1 , g2 ∈ C ∞ (U ) ∩ H 1 (U ) both solve the Dirichlet boundary value prob-
lem. Taking the difference g = g1 − g2 we obtain an element of H 1 (U ) that vanishes at ∂U
in the square-mean sense. Show that this implies g ∈ H01 (U ) and deduce that g = 0.
5.4 Further Topics
• We will continue our excursion into Sobolev spaces in Chapter 6, where

we will prove the existence of an orthonormal basis consisting of eigen-
functions of the Laplace operator (see Section 6.4).
• In Section 8.2.2 we will return to the topic of elliptic regularity one more
time and will present an argument that also gives the result at the bound-
ary of U .
Chapter 6
Compact Self-Adjoint Operators and
Laplace Eigenfunctions
There is no doubt that eigenvalues and eigenvectors are of fundamental im-

portance in linear algebra and in its applications, both within and outside
mathematics. In finite dimensions eigenvectors always exist over C because
the corresponding eigenvalues arise as zeros of a polynomial. However, even
in finite dimensions the eigenvectors are not guaranteed to give a basis of
the space (because there may be non-trivial Jordan blocks). However, if the
linear map is self-adjoint (over R or over C; see Definition 6.22) or unitary
(over C) then there is an orthonormal basis consisting of eigenvectors. That
is, in these cases the linear map can be diagonalized.
In infinite dimensions the inherent complications of linear algebra in finite
dimensions have added to them entirely new phenomena, illustrated by the
following exercise.
Exercise 6.1. (a) Let $H = \ell^2(\mathbb{Z})$ and let $U : H \to H$ be the operator defined
by $U((x_n)_{n \in \mathbb{Z}}) = (x_{n+1})_{n \in \mathbb{Z}}$, which simply shifts the sequence by one step to the left.
Show that the operator U has no eigenvectors.
(b) Let H = ℓ2 (N) and define the operator S : H → H by S ((xn )n∈N ) = (xn+1 )n∈N , which
again simply shifts the sequence one step to the left, but now ‘forgets’ the first entry of the
sequence. Show that S has uncountably many different eigenvalues. In particular, deduce
that there are too many eigenvalues to hope for a diagonalization (since the space H is
separable, a ‘diagonal’ map would only have countably many eigenvalues).
There is an important class of operators for which some of the difficulties

illustrated in Exercise 6.1 do not arise. These are the compact operators which
will be defined in Section 6.1. In Section 6.2 we then prove that compact self-
adjoint operators can be diagonalized using an orthonormal basis, and relate
this to the Sturm–Liouville equation from Section 2.5.2.
Using this and results from Chapter 5, we will prove in Section 6.4 the
existence of a basis of eigenfunctions of the Laplace operator claimed in Sec-
tion 1.2 for a bounded domain U ⊆ Rd . At first sight this is surprising, since
the Laplace operator ∆ is not even bounded on L2 (U ) (it is also not defined
on all of L2 (U ), which is a related issue by the closed graph theorem (The-
orem 4.28)), but we will find a compact operator defined on all of L2 (U )
whose eigenfunctions are precisely the eigenfunctions of ∆.

6.1 Compact Operators
Bounded linear operators with finite-dimensional image have the property

that the image of a bounded set has compact closure. Requiring the latter
property gives rise to a natural generalization of the class of operators with
finite-dimensional image.
Definition 6.2. Let $V$ and $W$ be normed vector spaces, and let $L : V \to W$
be a linear operator. Then $L$ is said to be a compact operator if the closure
\[
  \overline{L(B_1^V)} \subseteq W
\]
of the image of the unit ball is compact in $W$. We will sometimes write $K(V, W)$
for the space of compact operators, and if $V = W$ we will write $K(V)$ for the
space of compact operators from $V$ to $V$.
We will see in Example 6.5 that $L(\overline{B_1^V})$ is in general not closed, even
if L is a compact operator. Since compact sets are bounded, every compact
operator is also bounded, but the converse does not hold. For example, the
identity operator V → V on an infinite-dimensional normed vector space is
not a compact operator by Proposition 2.35. As noted above, if L : V → W is
a bounded operator and L(V ) is finite-dimensional, then L is a compact op-
erator. We will see many more examples after we prove a few basic properties
of compact operators.
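For a concrete instance of the non-compactness of the identity, one can take $V = \ell^2(\mathbb{N})$ with its standard orthonormal sequence $(e_n)$. This choice is ours, made for illustration; the general statement is the one quoted from Proposition 2.35.

```latex
% Assumed example: V = \ell^2(\mathbb{N}), e_n = standard unit vectors, I = identity.
\[
  \|e_n - e_m\|_2 = \sqrt{\|e_n\|_2^2 + \|e_m\|_2^2} = \sqrt{2}
  \qquad (n \neq m),
\]
% so no subsequence of (e_n) = (I(e_n)) is a Cauchy sequence, although every
% e_n lies in the closure of I(B_1^V). In a metric space compactness implies
% sequential compactness, hence \overline{I(B_1^V)} is not compact.
```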
Lemma 6.3 (Composition). Let V1 , V2 , V3 be normed vector spaces, and
let L1 : V1 → V2 and L2 : V2 → V3 be bounded operators. If L1 or L2 is a
compact operator, then so is L2 ◦ L1 .

Proof. Suppose that $L_1$ is compact. Then $L_2(L_1(B_1^{V_1})) \subseteq L_2(\overline{L_1(B_1^{V_1})})$
and the latter is compact since $L_1(B_1^{V_1})$ has compact closure and $L_2$ is
continuous. It follows that $L_2(L_1(B_1^{V_1}))$ is contained in a compact subset of $V_3$,
so its closure is compact, and therefore $L_2 \circ L_1$ is a compact operator.
If $L_2$ is compact, then $L_2 \circ L_1(B_1^{V_1}) \subseteq L_2(\|L_1\|_{\mathrm{op}} B_1^{V_2}) = \|L_1\|_{\mathrm{op}} L_2(B_1^{V_2})$,
which has compact closure, and so $L_2 \circ L_1$ is again compact.
Exercise 6.4. Let V and W be two normed vector spaces. Show that
K(V, W ) = {L : V → W | L is a compact operator}
is a linear subspace. Deduce that if V = W is a Banach space, then K(V ) = K(V, V ) is a

two-sided ideal in the Banach algebra B(V ). That is, if L ∈ K(V ) and A ∈ B(V ), then A◦L
and L ◦ A lie in K(V ).
Example 6.5. (a) The inclusion map
\[
  \imath : \bigl(C^1([0,1]), \|\cdot\|_{C^1}\bigr) \longrightarrow \bigl(C([0,1]), \|\cdot\|_\infty\bigr)
\]
is a compact operator. This follows from the Arzela–Ascoli theorem
(Theorem 2.38), since
\[
  \imath\bigl(B_1^{C^1([0,1])}\bigr) \subseteq \{f \in C([0,1]) \mid \|f\|_\infty \le 1,\ f \text{ is 1-Lipschitz}\}.
\]
This example shows that it is necessary to take the closure of the image and
not just the image of the closed ball: for example, the function $f$ defined
by $f(x) = |x - \tfrac{1}{2}|$ belongs to $\overline{L(B_1^{C^1([0,1])})}$ but not to $L(\overline{B_1^{C^1([0,1])}})$, where $L = \imath$.
(b) For $f \in C([0,1])$ and $x \in [0,1]$, define
\[
  T(f)(x) = \int_0^x f(t) \,dt.
\]
Then T : C([0, 1]) → C([0, 1]) is compact, since T : C([0, 1]) → C 1 ([0, 1]) is
bounded and the inclusion C 1 ([0, 1]) → C([0, 1]) is compact by (a).
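The smoothing effect of T in part (b) can be seen on an oscillating sequence. The sketch below is an illustration under our own assumptions: the functions f_n(t) = cos(2πnt), the grid size, and the names `volterra`, `sup_norm` and `oscillation` are not from the text. Each f_n has supremum norm 1 and the sequence has no uniformly convergent subsequence, but T(f_n)(x) = sin(2πnx)/(2πn) tends to 0 uniformly.

```python
import math

def volterra(f, grid_size=2000):
    """Approximate T(f)(x) = integral from 0 to x of f(t) dt on a uniform
    grid of [0, 1] with the cumulative trapezoidal rule."""
    h = 1.0 / grid_size
    values = [0.0]
    for i in range(grid_size):
        values.append(values[-1] + 0.5 * h * (f(i * h) + f((i + 1) * h)))
    return values

def sup_norm(values):
    """Maximum absolute value over the grid."""
    return max(abs(v) for v in values)

def oscillation(n):
    """f_n(t) = cos(2*pi*n*t), an element of the closed unit ball of C([0, 1])."""
    return lambda t: math.cos(2 * math.pi * n * t)

# Supremum norms of T(f_n) for a few n, next to the exact value 1 / (2*pi*n).
norms = {n: sup_norm(volterra(oscillation(n))) for n in (1, 2, 4)}
for n, value in norms.items():
    print(n, value, 1 / (2 * math.pi * n))
```

The images T(f_n) are all 1-Lipschitz and uniformly bounded, which is the equicontinuity that the Arzela–Ascoli theorem needs.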
Exercise 6.6. For which $p \ge 1$ is the operator sending $f \in L^p([0,1])$ to the function
in $C([0,1])$ defined by $x \mapsto \int_0^x f(t) \,dt$ a compact operator?
Many of the compact operators that we will encounter have a similar

flavour to the example above. They either map from a space of functions
with more regularity properties (in this instance, differentiability) to a space
of functions with fewer regularity properties (in this case, continuity), or are
integral operators. The next lemma is a useful tool for proving compactness
of bounded operators.
Lemma 6.7 (Uniform approximation). Let V be a normed vector space,

and let W be a Banach space. Suppose that (Ln ) is a sequence of compact
operators V → W , and suppose that Ln → L ∈ B(V, W ) as n → ∞ with
respect to the operator norm. Then L is a compact operator as well.
Lemma 6.7 improves the claim from Exercise 6.4 in that the two-sided
ideal K(V ) in B(V ) is even closed for any Banach space V (see also Exer-
cise 6.8).

Proof of Lemma 6.7. Let $M = \overline{L(B_1^V)} \subseteq W$. Since $W$ is assumed to be a
Banach space, $M$ is complete. It remains to show that $M$ is totally bounded
(see Section A.4 for the notion and for the equivalence to compactness).
Let $\varepsilon > 0$ and choose $L_n$ with $\|L_n - L\| < \varepsilon$. Since $L_n$ is compact, we know
that $\overline{L_n(B_1^V)}$ is compact and hence is totally bounded. It follows that there
exist elements $w_1, \dots, w_m \in \overline{L_n(B_1^V)}$ with
\[
  \overline{L_n(B_1^V)} \subseteq \bigcup_{i=1}^{m} B_\varepsilon^W(w_i).
\]
For each $w_i$ there exists some $v_i \in B_1^V$ with $\|w_i - L_n(v_i)\| < \varepsilon$.
If now $v \in B_1^V$, then for some $i \in \{1, \dots, m\}$ we have
\[
  \|L_n(v) - L_n(v_i)\| < 2\varepsilon.
\]
Now $\|L_n - L\| < \varepsilon$ and $\|v\|, \|v_i\| < 1$ so
\[
  \|L(v) - L(v_i)\| \le \|L(v) - L_n(v)\| + \|L_n(v) - L_n(v_i)\| + \|L_n(v_i) - L(v_i)\| < 4\varepsilon.
\]
It follows that
\[
  L(B_1^V) \subseteq \bigcup_{i=1}^{m} B_{4\varepsilon}^W(L(v_i)),
\]
which implies that the points $L(v_i)$ for $i = 1, \dots, m$ are $5\varepsilon$-dense in the
set $M = \overline{L(B_1^V)}$. As $\varepsilon$ was arbitrary, $M$ is therefore totally bounded, so $M$
is a compact set and hence $L$ is a compact operator.
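A standard way to use Lemma 6.7 is to approximate by operators with finite-dimensional image. The diagonal operator below on $\ell^2(\mathbb{N})$ is our own choice of example and is not taken from the text.

```latex
% Assumed example: L, L_N : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N}) defined by
\[
  L\bigl((x_n)_{n \ge 1}\bigr) = \bigl(\tfrac{1}{n} x_n\bigr)_{n \ge 1},
  \qquad
  L_N\bigl((x_n)_{n \ge 1}\bigr) = \bigl(x_1, \tfrac{1}{2} x_2, \dots, \tfrac{1}{N} x_N, 0, 0, \dots\bigr).
\]
% Each L_N is bounded with finite-dimensional image, hence compact, and
\[
  \|(L - L_N)x\|_2^2 = \sum_{n > N} \frac{|x_n|^2}{n^2} \le \frac{1}{(N+1)^2} \|x\|_2^2,
  \qquad\text{so}\qquad
  \|L - L_N\|_{\mathrm{op}} = \frac{1}{N+1} \longrightarrow 0 .
\]
```

Equality in the operator norm is attained at $x = e_{N+1}$. By Lemma 6.7 the limit L is therefore a compact operator, even though its image is infinite-dimensional.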
Exercise 6.8. Continuing the discussion from Exercise 6.4, show that B(V )/ K(V ) be-
comes a Banach algebra — the Calkin algebra — by defining (A + K(V ))(B + K(V )) to
be AB + K(V ) for all A, B ∈ B(V ) and using the quotient norm k · kB(V )/ K(V ) .
Exercise 6.9. In each of the following, justify your claim.
(a) Is the inclusion map $\imath_{k+1,k} : H^{k+1}(\mathbb{T}^d) \to H^k(\mathbb{T}^d)$ from Proposition 5.3 a compact
operator?
(b) Let $U \subseteq \mathbb{R}^d$ be an open set. Show that the inclusion $\imath_{k+1,k} : C_b^{k+1}(U) \to C_b^k(U)$ is a
compact operator if $\overline{U} \subseteq \mathbb{R}^d$ is compact. Show that for $U = \mathbb{R}$ and $k = 0$ (or for any $k \ge 0$),
the inclusion map $\imath_{1,0}$ (or $\imath_{k+1,k}$) is not a compact operator.
(c) Is the inclusion map $C(\mathbb{T}^d) \to L^2(\mathbb{T}^d)$ a compact operator?
6.1.1 Integral Operators are Often Compact
We explore here briefly the realm of integral operators and show that many
(but not all) are in fact compact operators.
Lemma 6.10 (Integral operators defined by continuous kernels). Assume
that $(X, d_X)$ and $(Y, d_Y)$ are compact metric spaces. Let $\mu$ be a finite
Borel measure on $X$, and let $k$ be a function in $C(X \times Y)$. Then the operator
$K : L^2_\mu(X) \to C(Y)$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is a compact operator.
Proof. We first need to show that $K$ is well-defined. To see this, notice that
\[ \int_X |f(x)||k(x, y)|\, d\mu(x) \le \|f\|_2 \|k(\cdot, y)\|_2 \le \|f\|_2 \|k\|_\infty\, \mu(X)^{1/2}, \]
where $k(\cdot, y)$ denotes the function on $X$ obtained by fixing the coordinate $y$
in $Y$. This shows that the integral defining $K(f)(y)$ is well-defined and that
\[ |K(f)(y)| \le \|k\|_\infty\, \mu(X)^{1/2} \|f\|_2. \tag{6.1} \]
We now must show that $K(f)$ is continuous, and in doing so we will obtain
equicontinuity of the image of the unit ball, which together with (6.1) and
the Arzela–Ascoli theorem will give the compactness of $K$. Since $X \times Y$ is
compact, $k$ is uniformly continuous, and so for any $\varepsilon > 0$ there is a $\delta > 0$
for which $d_Y(y_1, y_2) < \delta$ implies that $|k(x, y_1) - k(x, y_2)| < \varepsilon$ for all $x \in X$.
Therefore
\[ |K(f)(y_1) - K(f)(y_2)| \le \int_X |f(x)||k(x, y_1) - k(x, y_2)|\, d\mu(x) \le \varepsilon\, \mu(X)^{1/2} \|f\|_2 \]
if $d_Y(y_1, y_2) < \delta$, by the same argument as above. Hence $K(f) \in C(Y)$
and the image of the unit ball $B_1^{L^2_\mu(X)}$ is an equicontinuous bounded family
of functions. By the Arzela–Ascoli theorem (Theorem 2.38) the closure
of $K\bigl(B_1^{L^2_\mu(X)}\bigr)$ is a compact subset of $C(Y)$, and so $K$ is a compact operator.

Proposition 6.11 (Hilbert–Schmidt [46]). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$
be $\sigma$-finite measure spaces. Let $k \in L^2_{\mu\times\nu}(X \times Y)$. Then the Hilbert–Schmidt
integral operator $K : L^2_\mu(X) \to L^2_\nu(Y)$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
for $\nu$-almost every $y \in Y$ defines a compact operator.
Exercise 6.12. Assume in addition that X, Y are compact metric spaces and µ, ν are
finite measures on the Borel σ-algebras of X and Y , respectively. Deduce Proposition 6.11
in this case as a corollary of Lemma 6.10.
Proof of Proposition 6.11. Note first that
\[ \int_X |f(x) k(x, y)|\, d\mu(x) \le \|f\|_{L^2_\mu} \left( \int_X |k(x, y)|^2\, d\mu(x) \right)^{1/2}. \tag{6.2} \]
Squaring and integrating over $Y$ gives
\[ \int_Y \left( \int_X |f(x) k(x, y)|\, d\mu(x) \right)^2 d\nu(y) \le \|f\|^2_{L^2_\mu} \|k\|^2_{L^2_{\mu\times\nu}} < \infty \]
by Fubini's theorem. Thus (6.2) is finite almost everywhere, so $K(f)(y)$ is
well-defined for $\nu$-almost every $y$. The bound above also shows that
\[ \|K(f)\|_{L^2_\nu} \le \|f\|_{L^2_\mu} \|k\|_{L^2_{\mu\times\nu}}. \]
Hence $K : L^2_\mu(X) \to L^2_\nu(Y)$ is well-defined, clearly linear, and
\[ \|K\|_{\mathrm{op}} \le \|k\|_{L^2_{\mu\times\nu}}. \]
If $k$ is a simple function of the form
\[ k(x, y) = \sum_{i=1}^{n} c_i \mathbb{1}_{A_i \times B_i}(x, y) \tag{6.3} \]
for some measurable sets $A_i \subseteq X$, $B_i \subseteq Y$ of finite measure and constants $c_i$
in $\mathbb{C}$, then
\[ K(f) = \sum_{i=1}^{n} c_i \left( \int_{A_i} f\, d\mu \right) \mathbb{1}_{B_i} \]
is a bounded operator with finite-dimensional range and so is also compact.

We wish to apply uniform approximation as in Lemma 6.7 to show that the
compactness extends to all operators of the form described in the proposition.
Since we already showed that the operator norm is bounded from above by
the $L^2$ norm of the kernel $k$, we only have to show that any $k \in L^2_{\mu\times\nu}(X \times Y)$
can be written as the limit in $L^2_{\mu\times\nu}(X \times Y)$ of a sequence of functions $(k_n)$
with each $k_n$ of the form (6.3). Indeed, if $K_n$ is the operator associated to $k_n$
then $K - K_n$ is the integral operator with kernel $k - k_n$, so
$\|K - K_n\|_{\mathrm{op}} \le \|k - k_n\|_2 \to 0$ as $n \to \infty$, which together with the
previous discussion and Lemma 6.7 gives the compactness of $K$.
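In order to show that $k \in L^2_{\mu\times\nu}$ can be obtained as the limit of simple
functions as in (6.3), note first that simple functions are dense in $L^2_{\mu\times\nu}$. Hence
it is sufficient to show that a characteristic function $\mathbb{1}_D$ for a measurable
set $D \subseteq X \times Y$ of finite measure can be approximated by functions of the
form $\sum_{i=1}^{m} \mathbb{1}_{A_i \times B_i}$, where
\[ A_1 \times B_1, \dots, A_m \times B_m \]
are all disjoint and have finite $\mu \times \nu$-measure. Let us write $X = \bigcup_{n=1}^{\infty} X_n$
and $Y = \bigcup_{n=1}^{\infty} Y_n$ with $X_1 \subseteq X_2 \subseteq \cdots$, $Y_1 \subseteq Y_2 \subseteq \cdots$ and with $\mu(X_n) < \infty$
and $\nu(Y_n) < \infty$ for all $n \ge 1$. Then
\[ \mathcal{A} = \{ D \in \mathcal{B}_X \otimes \mathcal{B}_Y \mid \text{the claim above holds for } D \cap (X_n \times Y_n) \text{ for all } n \ge 1 \} \]
is a $\sigma$-algebra containing all rectangles $A \times B$ for $A \in \mathcal{B}_X$ and $B \in \mathcal{B}_Y$. It
follows that $\mathcal{A} = \mathcal{B}_X \otimes \mathcal{B}_Y$. Finally, if $D \subseteq X \times Y$ has finite measure, then
\[ \bigl\| \mathbb{1}_D - \mathbb{1}_{D \cap (X_n \times Y_n)} \bigr\|_{L^2_{\mu\times\nu}} \longrightarrow 0 \]
as $n \to \infty$ by dominated convergence. Therefore the simple functions as
in (6.3) are indeed dense, which gives the proposition.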
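As a rough numerical companion to the bound $\|K\|_{\mathrm{op}} \le \|k\|_{L^2_{\mu\times\nu}}$, the sketch below discretizes a kernel on $[0,1]^2$ by midpoint samples. The kernel $e^{-|x-y|}$, the test function, and the grid size are assumptions chosen only for illustration, and all helper names are hypothetical. The discrete inequality holds exactly by the Cauchy–Schwarz inequality for finite sums, mirroring the argument around (6.2).

```python
import math

def discretize_kernel(k, n):
    """Midpoint samples k(x_i, y_j) on an n x n grid of [0, 1]^2."""
    pts = [(i + 0.5) / n for i in range(n)]
    return [[k(x, y) for y in pts] for x in pts], pts

def apply_operator(K, f_vals, n):
    """Approximate K(f)(y_j) by the midpoint sum (1/n) * sum_i f(x_i) k(x_i, y_j)."""
    return [sum(f_vals[i] * K[i][j] for i in range(n)) / n for j in range(n)]

def l2_norm(vals, n):
    """Discrete L^2([0, 1]) norm with midpoint weights 1/n."""
    return math.sqrt(sum(abs(v) ** 2 for v in vals) / n)

def kernel_norm(K, n):
    """Discrete L^2([0, 1]^2) norm of the kernel samples (weights 1/n^2)."""
    return math.sqrt(sum(abs(v) ** 2 for row in K for v in row) / n ** 2)

n = 50  # assumed grid size
K, pts = discretize_kernel(lambda x, y: math.exp(-abs(x - y)), n)
f_vals = [math.sin(3 * x) + x for x in pts]  # assumed test function

lhs = l2_norm(apply_operator(K, f_vals, n), n)   # discrete ||K f||
rhs = l2_norm(f_vals, n) * kernel_norm(K, n)     # discrete ||f|| * ||k||
print(lhs <= rhs)
```

The discretized operator is a matrix, so it has finite rank; refining the grid is loosely analogous to approximating $k$ by the simple functions in (6.3), although the sketch makes no claim about rates of convergence.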
Exercise 6.13. Prove that the collection A in the proof of Proposition 6.11 is a σ-algebra.
Exercise 6.14. Let $g \in L^2(\mathbb{T}^d)$. Show that $L^2(\mathbb{T}^d) \ni f \mapsto f * g \in C(\mathbb{T}^d)$ defines a compact
operator from $(L^2(\mathbb{T}^d), \|\cdot\|_2)$ to $(C(\mathbb{T}^d), \|\cdot\|_\infty)$.
Not all integral operators are compact, as shown by the Holmgren oper-
ators.
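Proposition 6.15 (Holmgren). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be $\sigma$-finite
measure spaces. Let $k : X \times Y \to \mathbb{R}$ be measurable on $X \times Y$, with
\[ \sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y) < \infty \]
and
\[ \sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x) < \infty. \]
Then the integral operator $K$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \tag{6.4} \]
is a bounded operator $K : L^2_\mu \to L^2_\nu$. Moreover,
\[ \|K\| \le \left( \sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y) \right)^{1/2} \left( \sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x) \right)^{1/2} < \infty. \]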
Proof. The proof that the integral in (6.4) makes sense for $\nu$-almost every $y$
in $Y$, and defines an element in $L^2_\nu$, is less straightforward than the proof
of Proposition 6.11, and uses the Fréchet–Riesz representation theorem
(Corollary 3.19). Suppose that $f \in L^2_\mu \setminus \{0\}$ and $g \in L^2_\nu \setminus \{0\}$, and consider the
integral
\[ I = \int_{X \times Y} |f(x) k(x, y) g(y)|\, d(\mu \times \nu)(x, y). \]
Notice that for any real numbers $a, b \ge 0$ and $c > 0$, we always have
\[ ab \le ab + \left( \sqrt{\tfrac{c}{2}}\, a - \sqrt{\tfrac{1}{2c}}\, b \right)^2 = \frac{c a^2}{2} + \frac{b^2}{2c}. \]
Applying this and Fubini's theorem to the definition of $I$ with $a = |f(x)|$
and $b = |g(y)|$ gives
\begin{align*}
I &\le \iint_{X \times Y} |k(x, y)| \left( \tfrac{c}{2} |f(x)|^2 + \tfrac{1}{2c} |g(y)|^2 \right) d\mu(x)\, d\nu(y) \\
  &\le \frac{c}{2} \int_X \int_Y |k(x, y)|\, d\nu(y)\, |f(x)|^2\, d\mu(x) + \frac{1}{2c} \int_Y \int_X |k(x, y)|\, d\mu(x)\, |g(y)|^2\, d\nu(y) \\
  &\le \frac{c}{2} \|f\|^2_{L^2_\mu} \underbrace{\sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y)}_{s_X} + \frac{1}{2c} \|g\|^2_{L^2_\nu} \underbrace{\sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x)}_{s_Y}.
\end{align*}
If $s_X$ or $s_Y$ is $0$, then $k = 0$ $\mu \times \nu$-almost everywhere and the proposition
holds trivially. If not, we optimize the parameter $c$ by setting
$c = \sqrt{\frac{s_Y}{s_X}}\, \frac{\|g\|_{L^2_\nu}}{\|f\|_{L^2_\mu}}$,
and obtain
\[ \int_{X \times Y} |f(x) k(x, y) g(y)|\, d(\mu \times \nu)(x, y) \le \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu} \|g\|_{L^2_\nu}. \]
It follows that $(x, y) \mapsto f(x) k(x, y) g(y)$ is $\mu \times \nu$-integrable on $X \times Y$, and
that
\[ \phi : g \longmapsto \int_Y g(y) \int_X f(x) k(x, y)\, d\mu(x)\, d\nu(y) \tag{6.5} \]
is a continuous functional on $L^2_\nu$ with $\|\phi\| \le \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$. We conclude first
that
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is well-defined $\nu$-almost everywhere. Using the Fréchet–Riesz representation
theorem (Corollary 3.19) the functional $\phi$ can be represented by taking the inner
product with a function in $L^2_\nu$, again with norm bounded by $\sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$.
Varying the element $g$ in (6.5) we see that $K(f)$ must be this function, and
we obtain $\|K(f)\|_{L^2_\nu} \le \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$.
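To make the bound of Proposition 6.15 concrete, here is a minimal sketch for the discrete analogue in which $X = Y = \{0, \dots, n-1\}$ carry counting measure, so that the kernel is a matrix, $s_X$ is the largest absolute row sum and $s_Y$ the largest absolute column sum. The band kernel and the constant test vector are assumptions made for illustration; the helper names are hypothetical.

```python
import math

def schur_bound(k):
    """sqrt(s_X * s_Y) for a matrix kernel k[x][y] with counting measure."""
    s_x = max(sum(abs(v) for v in row) for row in k)
    s_y = max(sum(abs(k[x][y]) for x in range(len(k))) for y in range(len(k[0])))
    return math.sqrt(s_x * s_y)

def apply_kernel(k, f):
    """K(f)(y) = sum_x f(x) k(x, y)."""
    return [sum(f[x] * k[x][y] for x in range(len(k))) for y in range(len(k[0]))]

def norm2(v):
    """Euclidean norm, the L^2 norm for counting measure."""
    return math.sqrt(sum(abs(t) ** 2 for t in v))

n = 200  # assumed size of the discrete space
band = [[1.0 if abs(x - y) <= 1 else 0.0 for y in range(n)] for x in range(n)]
f = [1.0] * n  # assumed test vector

ratio = norm2(apply_kernel(band, f)) / norm2(f)
bound = schur_bound(band)
print(ratio, bound)
```

For this band kernel both suprema equal $3$, so the bound is $3$, and the constant vector already comes within about $0.01$ of it. The bound controls the norm of the operator only; it says nothing about compactness.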
The main difference between Hilbert–Schmidt integral operators and
Holmgren integral operators is that the latter are not automatically com-
pact.
Exercise 6.16. Let $X = Y = \mathbb{R}$ and $\mu = \nu = \lambda$, the Lebesgue measure on $\mathbb{R}$. Define
\[ k(x, y) = \begin{cases} 1 & \text{for } \|x - y\| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
Show that the corresponding Holmgren operator $K$ as defined in Proposition 6.15 is not a
compact operator on $L^2_\lambda(\mathbb{R})$.
6.2 Spectral Theory of Self-Adjoint Compact Operators
There is a general spectral theory of compact operators L : V → V on Banach

spaces. However, as we will discuss later, our applications do not need that
level of generality and the statement and proof for the simpler case of self-
adjoint operators is significantly easier. For these reasons we will restrict to
that case below and refer to Lax [59, Ch. 21] for the general result.
6.2.1 The Adjoint Operator
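Let $H_1, H_2$ be Hilbert spaces, and let $A : H_1 \to H_2$ be a bounded operator.
For any fixed $v_2 \in H_2$ the map $H_1 \ni v_1 \mapsto \langle A v_1, v_2 \rangle_{H_2}$ is linear and bounded
since $|\langle A v_1, v_2 \rangle| \le \|A v_1\| \|v_2\| \le \|A\|_{\mathrm{op}} \|v_2\| \|v_1\|$. Therefore, by the
Fréchet–Riesz representation theorem (Corollary 3.19) applied to $H_1$ there exists some
uniquely determined element, which will be denoted $A^* v_2 \in H_1$, with the
properties that
\[ \langle v_1, A^* v_2 \rangle_{H_1} = \langle A v_1, v_2 \rangle_{H_2} \tag{6.6} \]
for all $v_1 \in H_1$, and
\[ \|A^* v_2\| \le \|A\|_{\mathrm{op}} \|v_2\|. \tag{6.7} \]
This defines a bounded operator $A^* : H_2 \to H_1$, called the adjoint of $A$. This
map is indeed linear, since
\begin{align*}
\langle v_1, A^*(v_2 + \alpha v_2') \rangle &= \langle A v_1, v_2 + \alpha v_2' \rangle = \langle A v_1, v_2 \rangle + \overline{\alpha} \langle A v_1, v_2' \rangle \\
&= \langle v_1, A^* v_2 \rangle + \overline{\alpha} \langle v_1, A^* v_2' \rangle = \langle v_1, A^* v_2 + \alpha A^* v_2' \rangle
\end{align*}
for $v_1 \in H_1$, $v_2, v_2' \in H_2$ and any scalar $\alpha$. By (6.7) we have $\|A^*\|_{\mathrm{op}} \le \|A\|_{\mathrm{op}}$,
so $A^*$ is bounded. Taking conjugates in (6.6) implies that $A^{**} = A$,
so $\|A\|_{\mathrm{op}} = \|A^*\|_{\mathrm{op}}$.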
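For a concrete finite-dimensional check of the defining identity (6.6), the sketch below takes $H_1 = \mathbb{C}^3$ and $H_2 = \mathbb{C}^2$ with the usual inner product, linear in the first argument as in the text, and uses the conjugate transpose as the adjoint. The matrix and vectors are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
def inner(v, w):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def adjoint(A):
    """Conjugate transpose of a rectangular complex matrix."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

A = [[1 + 2j, 0j, 3j],
     [2 - 1j, 1j, 1 + 0j]]   # assumed example, maps C^3 -> C^2
v1 = [1 + 1j, 2 + 0j, -1j]   # element of H1 = C^3
v2 = [3 - 1j, 1j]            # element of H2 = C^2

lhs = inner(matvec(A, v1), v2)            # <A v1, v2> in H2
rhs = inner(v1, matvec(adjoint(A), v2))   # <v1, A* v2> in H1
print(abs(lhs - rhs))
```

Applying `adjoint` twice returns the original matrix, which is the finite-dimensional form of $A^{**} = A$.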
Essential Exercise 6.17. (a) Show that the map A 7→ A∗ is semi-linear.

(b) Let A : H1 → H2 and B : H2 → H3 be bounded operators between
Hilbert spaces. Show that (BA)∗ = A∗ B ∗ .
Exercise 6.18. Show that $\operatorname{im}(T)^\perp = \ker(T^*)$ and $\ker(T)^\perp = \overline{\operatorname{im}(T^*)}$ for a linear
operator $T$ between Hilbert spaces.
The adjoint operation allows us to give an alternate definition of unitarity.
Definition 6.19. An operator U : H1 → H2 between two Hilbert spaces is

unitary if U ∗ U = IH1 and U U ∗ = IH2 , which we also write as U ∗ = U −1 .
Exercise 6.20. (a) Show that an operator U : H1 → H2 is unitary in the sense of Defini-
tion 6.19 if and only if it is a bijective isometry (that is, a bijection with kU vkH2 = kvkH1
for all v ∈ H1 ).
(b) Suppose that U : H1 → H2 is an isometry. Show that U ∗ U = IH1 and that U U ∗ is
the orthogonal projection Pim(U ) from H2 onto the closed subspace im(U ) ⊆ H2 .
Exercise 6.21 (Von Neumann's mean ergodic theorem [78]). Let $U : H \to H$ be a
unitary operator on a Hilbert space $H$ and let $I = \{ v \in H \mid U v = v \}$ be the subspace of
invariant vectors.
(a) Show that $I$ is closed and that $\{ U v - v \mid v \in H \}$ is dense in $I^\perp$.
(b) Show that $\frac{1}{n} \sum_{j=0}^{n-1} U^j v \to P_I v$ as $n \to \infty$, where $P_I$ is the orthogonal projection
onto $I$.
Definition 6.22. A bounded operator A : H → H on a Hilbert space H is

called self-adjoint if A∗ = A.
The next exercise revisits the maps introduced in Exercise 6.1.

Exercise 6.23. (a) Define U : ℓ2 (Z) → ℓ2 (Z) by U ((xn )n∈Z ) = (xn+1 )n∈Z . Show that
the operator U is unitary.
(b) Define S : ℓ2 (N) → ℓ2 (N) by S ((xn )n∈N ) = (xn+1 )n∈N . Show that kSkop = 1, but
that S is not an isometry.
(c) Define T : ℓ2 (N) → ℓ2 (N) by T ((xn )) = (0, x1 , x2 , . . . ), which shifts the sequence to
the right and fills in the first entry of the new sequence with a 0. Show that kT kop = 1,
that T = S ∗ is an isometry, is not surjective, and has no eigenvectors.
Exercise 6.24 (Decomposition of isometries). Let $H$ be a Hilbert space and $U : H \to H$
an isometry. Show that there exists an orthogonal decomposition $H = H_{\mathrm{shift}} \oplus H_{\mathrm{unitary}}$
into two closed subspaces with the property that $H_{\mathrm{shift}} = \bigoplus_{n \ge 0} U^n V$ for some closed
subspace $V$, and $U|_{H_{\mathrm{unitary}}} : H_{\mathrm{unitary}} \to H_{\mathrm{unitary}}$ is unitary.
The next exercise is not simply another example. It turns out to really be
the basis of the powerful spectral theory of normal bounded operators as well
as self-adjoint unbounded operators.
Essential Exercise 6.25. Let (X, B, µ) be a measure space, H = L2µ (X),
let g : X → C be a measurable function, and let Mg be the multiplication
operator Mg : f 7→ gf for f ∈ H.
(a) What properties of g ensure that Mg : H → H is well-defined and
bounded? What is kMg kop ?
(b) When is Mg a bounded self-adjoint operator? That is, what property of g
is equivalent to hMg f1 , f2 i = hf1 , Mg f2 i holding for all f1 , f2 ∈ H? What
property of g is equivalent to Mg being unitary?
(c) When does Mg have λ ∈ C as an eigenvalue?
(d) Suppose that $X = \mathbb{R}$ and let $g(x) = x$, and assume that $\mu$ is an arbitrary
finite compactly supported Borel measure on $\mathbb{R}$. Characterize in terms of $\mu$
the property that $M_g$ can be diagonalized. That is, characterize the property
that $H$ has an orthonormal basis $\{ e_n \mid n \in \mathbb{N} \}$ and a sequence of scalars $(\lambda_n)$
such that $M_g \bigl( \sum_{n=1}^{\infty} x_n e_n \bigr) = \sum_{n=1}^{\infty} \lambda_n x_n e_n$ for every $(x_n) \in \ell^2(\mathbb{N})$.
Exercise 6.26. Let $H = \mathbb{C}^n$ be a finite-dimensional Hilbert space with respect to the
usual inner product. Show that the linear operator defined by a matrix $A = (a_{i,j})$ is
self-adjoint if and only if $A$ is equal to its own conjugate transpose (that is, $a_{i,j} = \overline{a_{j,i}}$ for
all $i, j$). Such matrices are also called Hermitian.
6.2.2 The Spectral Theorem
The spectral theorem presented here generalizes to an infinite-dimensional

setting the familiar fact that a Hermitian matrix has real eigenvalues and
can be diagonalized using a unitary matrix. We will assume separability of
Hilbert spaces in this section in order to make use of an orthonormal basis
that consists of a sequence. Properly formulated, the next result holds more
generally, and in particular allows the kernel of A to be a non-separable space.
Both the finite-dimensional and the inseparable case can easily be extracted
from the proof we give.
Theorem 6.27 (Spectral theorem for compact self-adjoint operat-

ors). Let H be a separable infinite-dimensional Hilbert space, and let A be
a compact self-adjoint operator on H. Then there exists a sequence of real
eigenvalues (λn ) with λn → 0 as n → ∞, and an orthonormal basis {vn } of
eigenvectors with Avn = λn vn for all n > 1.
In other words, a compact self-adjoint operator is diagonalizable, each non-
zero eigenvalue has finite multiplicity, and 0 is the only possible accumulation
point of the set of eigenvalues. Given these properties — which will turn out
to be extremely useful — it is worth asking if there are any such operators.
Clearly such operators exist in the following sense. If $\{e_n\}$ is an orthonormal
basis of a Hilbert space $H$, and $(\lambda_n)$ is a sequence of real numbers
with $\lambda_n \to 0$ as $n \to \infty$, then we may define an operator $A : H \to H$
by
\[ A \left( \sum_{n=1}^{\infty} x_n e_n \right) = \sum_{n=1}^{\infty} \lambda_n x_n e_n \]
for any convergent series $\sum_{n=1}^{\infty} x_n e_n$. It may then be checked that $A$ is compact
and self-adjoint. Of course, Theorem 6.27 does not tell us anything we
did not already know about such an operator.
A more interesting kind of example is found among the integral operators.
Let $H = L^2_\mu(X)$, where $(X, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space, and suppose
that $k \in L^2_{\mu\times\mu}(X \times X)$ satisfies $k(x, y) = \overline{k(y, x)}$ for $\mu \times \mu$-almost every
point $(x, y) \in X \times X$. Then the operator $K$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is compact by Proposition 6.11, and is self-adjoint since
\begin{align*}
\langle f_1, K^*(f_2) \rangle = \langle K(f_1), f_2 \rangle &= \int_X \int_X f_1(x) k(x, y)\, d\mu(x)\, \overline{f_2(y)}\, d\mu(y) \\
&= \int_X f_1(x)\, \overline{\int_X f_2(y) \underbrace{\overline{k(x, y)}}_{= k(y, x)}\, d\mu(y)}\, d\mu(x) = \langle f_1, K(f_2) \rangle
\end{align*}
for all $f_1, f_2 \in L^2_\mu(X)$ by Fubini's theorem. Hence Theorem 6.27 applies, but
in this case it is a priori not at all clear how one could find the eigenvalues
or eigenvectors for the operator.
Example 6.28. Notice that the integral operator from Section 2.5.2 defined
by the kernel
\[ G(s, t) = \begin{cases} s(t - 1) & \text{for } 0 \le s \le t \le 1; \\ t(s - 1) & \text{for } 0 \le t \le s \le 1 \end{cases} \]
satisfies the conditions above, and so the eigenfunctions found in Section 2.5.2
coincide with the eigenvectors which must exist by Theorem 6.27.
In fact as we saw in Section 3.4 (see Exercise 3.55(b) and its hint
on p. 566) the functions $s_1, s_2, \dots$ form an orthonormal basis of $L^2([0, 1])$
which makes $K$ a diagonalizable operator. These notions also explain the
argument from Section 2.5.2 quite clearly: If $g = \sum_{n=1}^{\infty} d_n s_n$ and we are
looking for $f = \sum_{n=1}^{\infty} c_n s_n$ with $(I + \lambda^2 K) f = g$, then $(1 + \lambda^2 \mu_n) c_n = d_n$ for
all $n \in \mathbb{N}$, which can be solved for $c_n$ unless $\lambda^2 = -\mu_n^{-1}$ and $d_n \ne 0$.
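The eigenfunction relation for the kernel $G$ can be checked numerically. The sketch below assumes the normalization $s_m(x) = \sqrt{2} \sin(m \pi x)$ and the eigenvalues $\mu_m = -1/(m\pi)^2$; both are computed directly for the displayed kernel rather than quoted from Section 2.5.2, so they should be read as assumptions about the notation used there. The integral is approximated by the midpoint rule, and the helper names are hypothetical.

```python
import math

def G(s, t):
    """The kernel of Example 6.28 on [0, 1]^2."""
    return s * (t - 1) if s <= t else t * (s - 1)

def apply_K(f, t, n=2000):
    """Midpoint-rule approximation of K(f)(t) = int_0^1 f(s) G(s, t) ds."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * G((i + 0.5) * h, t) for i in range(n)) * h

def max_residual(m, samples=9):
    """Largest |K(s_m)(t) - mu_m s_m(t)| over a few interior points t."""
    s_m = lambda x: math.sqrt(2) * math.sin(m * math.pi * x)  # assumed normalization
    mu_m = -1.0 / (m * math.pi) ** 2                          # assumed eigenvalue
    ts = [(j + 1) / (samples + 1) for j in range(samples)]
    return max(abs(apply_K(s_m, t) - mu_m * s_m(t)) for t in ts)

residuals = [max_residual(m) for m in (1, 2, 3)]
print(residuals)
```

The residuals are at the level of the quadrature error. With these values the excluded case $\lambda^2 = -\mu_n^{-1}$ reads $\lambda^2 = n^2 \pi^2$.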
Exercise 6.29. Let $K$ be the Hilbert–Schmidt integral operator on $L^2_\mu(X)$ defined by a
kernel $k \in L^2_{\mu\times\mu}(X \times X)$ with $k(x, y) = \overline{k(y, x)}$ as above. Prove that the generalized
Fredholm integral equation of the second kind $f = \lambda K(f) + \phi$ has a solution for any
function $\phi \in L^2_\mu(X)$ if and only if $\lambda \lambda_n \ne 1$ for all $n$, where $(\lambda_n)$ is the sequence of
eigenvalues of $K$ on $L^2_\mu(X)$.
We will see another class of compact self-adjoint operators in Section 6.4.
6.2.3 Proof of the Spectral Theorem
Lemma 6.30 (Invariance of orthogonal complement). Let A : H → H

be a bounded operator on a Hilbert space. If V ⊆ H is an A-invariant subspace
(that is, a subspace with A(V ) ⊆ V ), then V ⊥ is A∗ -invariant.
Proof. If v ′ ∈ V ⊥ and v ∈ V , then hA∗ v ′ , vi = hv ′ , Avi = 0. As this holds

for all v ∈ V , we must have A∗ v ′ ∈ V ⊥ .
As we will see, Lemma 6.30 reduces the proof of Theorem 6.27 mostly to
finding a single eigenvector e1 , as we can then apply the lemma to V = he1 i
and A = A∗ to see that V ⊥ is A-invariant.
We now approach the central statement concerning the existence of an
eigenvalue. Before doing this, it is useful to recall how one proves the complete
diagonalizability of self-adjoint operators on $\mathbb{R}^d$. By compactness, we may
choose $e \in S^{d-1} = \{ v \in \mathbb{R}^d \mid \|v\|_2 = 1 \}$ such that the quadratic form $\langle A x, x \rangle$
achieves its maximum at $x = e$. Using Lagrange multipliers one can then
check that $e$ is an eigenvector of $A$. The vector $e$ is then an eigenvector with
eigenvalue $\lambda \in \mathbb{R}$ of absolute value $|\lambda| = \|A\|_{\mathrm{op}}$. This relies in an essential way
on the compactness of the unit sphere $S^{d-1}$, which as we know fails in
infinite-dimensional Hilbert spaces, and it is here that the additional assumptions
on $A$ will become important.
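Lemma 6.31 (The norm and the quadratic form). Let $A : H \to H$ be
a bounded self-adjoint operator on a Hilbert space. Then
\[ \|A\| = \sup_{\|x\| \le 1} |\langle A x, x \rangle|. \tag{6.8} \]
Notice that if $A$ is self-adjoint, then $\langle A x, x \rangle \in \mathbb{R}$ for all $x \in H$, since
\[ \overline{\langle A x, x \rangle} = \langle x, A x \rangle = \langle A^* x, x \rangle = \langle A x, x \rangle. \]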

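Proof of Lemma 6.31. Let us write
\[ s(A) = \sup_{\|x\| \le 1} |\langle A x, x \rangle| \]
for the right-hand side of (6.8). Then, by the Cauchy–Schwarz inequality,
\[ |\langle A x, x \rangle| \le \|A x\| \|x\| \le \|A\| \|x\|^2 \le \|A\| \]
for all $x \in H$ with $\|x\| \le 1$. Hence $s(A) \le \|A\|$.
The proof of the opposite inequality is slightly more involved. For $\lambda > 0$,
we have
\[ \bigl\langle A(\lambda x \pm \tfrac{1}{\lambda} A x), \lambda x \pm \tfrac{1}{\lambda} A x \bigr\rangle = \langle A(\lambda x), \lambda x \rangle + \bigl\langle A^2(\tfrac{1}{\lambda} x), A(\tfrac{1}{\lambda} x) \bigr\rangle \pm 2 \|A x\|^2. \]
Taking the difference of the two equations we see that
\begin{align*}
4 \|A x\|^2 &= \bigl\langle A(\lambda x + \tfrac{1}{\lambda} A x), \lambda x + \tfrac{1}{\lambda} A x \bigr\rangle - \bigl\langle A(\lambda x - \tfrac{1}{\lambda} A x), \lambda x - \tfrac{1}{\lambda} A x \bigr\rangle \\
&\le s(A) \left( \|\lambda x + \tfrac{1}{\lambda} A x\|^2 + \|\lambda x - \tfrac{1}{\lambda} A x\|^2 \right)
\end{align*}
since the two inner products appearing are of the form $\langle A u, u \rangle$ and thus
satisfy $|\langle A u, u \rangle| \le s(A) \|u\|^2$. Now we apply the parallelogram identity (3.4)
to obtain
\[ 4 \|A x\|^2 \le 2 s(A) \left( \lambda^2 \|x\|^2 + \tfrac{1}{\lambda^2} \|A x\|^2 \right). \]
Assuming that $\|A x\| \ne 0$, we set $\lambda^2 = \frac{\|A x\|}{\|x\|}$ and get
\[ 4 \|A x\|^2 \le 2 s(A) \left( \frac{\|A x\|}{\|x\|} \|x\|^2 + \frac{\|x\|}{\|A x\|} \|A x\|^2 \right) = 4 s(A) \|A x\| \|x\|, \]
and so $\|A x\| \le s(A) \|x\|$ for all $x \in H$. This shows that $\|A\| \le s(A)$.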
We are now ready to prove the existence of an eigenvector.
Lemma 6.32 (Main step: finding the first eigenvector). Let A be a

compact self-adjoint operator on a non-trivial Hilbert space. Then either kAk
or −kAk is an eigenvalue of A.
Proof. If $\|A\| = 0$ then $A = 0$ and there is nothing to prove, so we may
assume that $\|A\| > 0$. By Lemma 6.31 there exists a scalar $\alpha$ with $|\alpha| = \|A\|$
and a sequence $(x_n)$ in $H$ with $\|x_n\| = 1$ for all $n \ge 1$ and with $\langle A x_n, x_n \rangle \to \alpha$
as $n \to \infty$. As remarked before the proof of Lemma 6.31, $\langle A x_n, x_n \rangle$ is real
and so $\alpha \in \{ \|A\|, -\|A\| \}$. Now notice that
\begin{align*}
0 \le \|A x_n - \alpha x_n\|^2 &= \|A x_n\|^2 - 2 \Re \bigl( \alpha \langle A x_n, x_n \rangle \bigr) + \alpha^2 \|x_n\|^2 \\
&= \|A x_n\|^2 - 2 \alpha \langle A x_n, x_n \rangle + \alpha^2 \\
&\le 2 \|A\|^2 - 2 \alpha \langle A x_n, x_n \rangle \longrightarrow 2 \|A\|^2 - 2 \|A\|^2 = 0
\end{align*}
as $n \to \infty$. In particular, this shows that $(A x_n)$ converges if and only if $(\alpha x_n)$
converges, and that the limits agree if this is the case. However, since $\|x_n\| = 1$
and $A$ is a compact operator there exists a subsequence $(x_{n_k})$ for which $A x_{n_k}$
converges, say
\[ A x_{n_k} \longrightarrow \alpha x \tag{6.9} \]
as $k \to \infty$ for some $x \in H$. Therefore, $\alpha x_{n_k} \to \alpha x$ as $k \to \infty$ as well, and
hence $x_{n_k} \to x$ as $k \to \infty$. Since $A$ is continuous, we deduce that $A x_{n_k} \to A x$
as $k \to \infty$. Together with (6.9) we have $A x = \alpha x$, and since $\|x_{n_k}\| = 1$ for
all $k \ge 1$ and $x_{n_k} \to x$ as $k \to \infty$ we also have $\|x\| = 1$ and hence $x \ne 0$.
Now we combine the arguments above to prove the spectral theorem for
compact self-adjoint operators.
Proof of Theorem 6.27. By assumption, H is an infinite-dimensional Hil-
bert space and A : H → H is a compact self-adjoint operator. By Lemma 6.32
there exists an eigenvector e1 with eigenvalue λ1 ∈ R, and with |λ1 | = kAk.
We may assume without loss of generality that ke1 k = 1.
Suppose now, for the purposes of an induction argument, that we have
already found orthonormal eigenvectors e1 , . . . , en with corresponding eigen-
values λ1 , . . . , λn . Let Vn = he1 , . . . , en i be the linear span of these vectors,
and notice that A(Vn ) ⊆ Vn since they are eigenvectors for A. By Lemma 6.30
we have A∗ (Vn⊥ ) ⊆ Vn⊥ , but since A∗ = A this means that A(Vn⊥ ) ⊆ Vn⊥ .
Write
An = A|Vn⊥ : Vn⊥ −→ Vn⊥
for the restriction of A to Vn⊥ . Then An is a compact operator because A is
compact, and is self-adjoint because A is self-adjoint.† Therefore, we may
apply Lemma 6.32 again to the operator An : Vn⊥ → Vn⊥ to find an-
other eigenvector en+1 orthogonal to e1 , . . . , en with eigenvalue λn+1 sat-
isfying |λn+1 | = kAn k, and ken+1 k = 1.
Repeating the argument, we find an orthonormal sequence $(e_n)$ of eigenvectors
with $A e_n = \lambda_n e_n$ and $\lambda_n \in \mathbb{R}$. We need to show that $\lambda_n \to 0$
as $n \to \infty$. By construction we have
\[ |\lambda_{n+1}| = \|A_n\| = \|A|_{V_n^\perp}\| \le \|A|_{V_{n-1}^\perp}\| = \|A_{n-1}\| = |\lambda_n|, \]
so that by induction we have
\[ |\lambda_1| \ge |\lambda_2| \ge \cdots. \tag{6.10} \]
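If $\lambda_n \not\to 0$ as $n \to \infty$, then there is some $\varepsilon > 0$ such that $|\lambda_n| \ge \varepsilon$ for
all $n \ge 1$ by (6.10). This shows that $\varepsilon e_n = A \bigl( \tfrac{\varepsilon}{\lambda_n} e_n \bigr) \in A(\overline{B_1})$ for all $n \ge 1$,
and since $e_n \perp e_m$ for $n \ne m$ we must have $\|\varepsilon e_n - \varepsilon e_m\| = \varepsilon \sqrt{2}$ for $n \ne m$.
This shows that the sequence $(\varepsilon e_n)$ lies in $\overline{A(B_1)}$ (which is compact because $A$
is a compact operator) but cannot have a convergent subsequence, which is
a contradiction.
† If $w_1, w_2 \in V_n^\perp$ then $\langle A_n^* w_1, w_2 \rangle = \langle w_1, A_n w_2 \rangle = \langle A^* w_1, w_2 \rangle = \langle A w_1, w_2 \rangle$ and $A w_1$ lies
in $V_n^\perp$, so we have $A_n^* = A|_{V_n^\perp} = A_n$.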
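Also, since
\[ |\lambda_{n+1}| = \|A|_{V_n^\perp}\| \ge \|A|_{V^\perp}\| \]
for all $n \ge 1$, where $V = \langle e_1, e_2, \dots \rangle$, and since $\lambda_n \to 0$, we see that $A|_{V^\perp} = 0$.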
Thus far we have not used the assumption that H is separable, and the
statement at the end of the last paragraph is the general result. Assuming
now that H is separable, we can choose an orthonormal basis of V ⊥ (which
might be zero, in which case the theorem is already proved, or might be
finite-dimensional). Listing this orthonormal basis of V ⊥ together with the
basis of V already constructed proves the theorem.
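The inductive structure of the proof (find an eigenvector, pass to the orthogonal complement, repeat) can be imitated for a small real symmetric matrix. The sketch below swaps in power iteration to find each eigenvector, which is not the variational argument of Lemma 6.32 and only works here because the assumed example matrix has eigenvalues with distinct absolute values. The matrix, the starting vector, and the helper names are all hypothetical.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(v, w):
    """Real inner product."""
    return sum(a * b for a, b in zip(v, w))

def project_out(v, basis):
    """Remove the components of v along the orthonormal vectors in basis."""
    for e in basis:
        c = dot(v, e)
        v = [a - c * b for a, b in zip(v, e)]
    return v

def next_eigenpair(A, basis, iters=500):
    """Power iteration for A restricted to the orthogonal complement of basis."""
    n = len(A)
    v = project_out([1.0 + 0.1 * i for i in range(n)], basis)
    for _ in range(iters):
        v = project_out(matvec(A, v), basis)   # stay inside V_n-perp
        norm = math.sqrt(dot(v, v))
        v = [a / norm for a in v]
    return dot(matvec(A, v), v), v             # Rayleigh quotient and unit vector

A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, -0.5]]   # assumed symmetric example with eigenvalues 3, 1, -0.5
basis, eigenvalues = [], []
for _ in range(len(A)):
    lam, e = next_eigenpair(A, basis)
    eigenvalues.append(lam)
    basis.append(e)
print(eigenvalues)
```

The eigenvalues come out ordered by decreasing absolute value, as in (6.10), and the computed eigenvectors are orthonormal because each one is found inside the orthogonal complement of the previous ones.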
Exercise 6.33. Let H be a separable Hilbert space, and let A1 , A2 , . . . be a sequence of

commuting self-adjoint bounded operators on H. Using Theorem 6.27, state and prove a
simultaneous spectral theorem for the sequence assuming either of the properties below:
(1) An is compact for all n > 1; or
(2) A1 is compact and ker(A1 ) = {0}.
6.2.4 Variational Characterization of Eigenvalues†
In the following we let $A$ be a compact self-adjoint operator on a separable
infinite-dimensional Hilbert space $H$ (or a Hermitian matrix in $\mathrm{Mat}_{n,n}(\mathbb{C})$).
Applying Theorem 6.27 we find a (finite or countable) sequence of positive
eigenvalues $\varphi_1(A) \ge \varphi_2(A) \ge \cdots > 0$ and a (finite or countable) sequence of
negative eigenvalues $\nu_1(A) \le \nu_2(A) \le \cdots < 0$, with corresponding orthonormal
eigenvectors $v_1, v_2, \dots$ and $w_1, w_2, \dots$, respectively, so that
\[ A = \sum_j \varphi_j(A)\, v_j \otimes v_j^* + \sum_j \nu_j(A)\, w_j \otimes w_j^*, \]
where we define $v^*(w) = \langle w, v \rangle$ and $u \otimes v^*(w) = v^*(w) u$ for $u, v, w \in H$. This
decomposition is known as the spectral resolution of $A$.
In many situations it is useful to be able to say something about the
eigenvalues of A + B (even for Hermitian matrices A and B) in terms of
the eigenvalues of A and of B, a fundamentally non-linear problem. The
variational approach to finding eigenvalues dates back to Cauchy’s interlacing
theorem [17] (see Exercise 6.36). There are three elementary observations that
can be made in this direction.
• Assuming that $H$ is infinite-dimensional the spectral resolution shows
that the numerical range $\{ \langle A v, v \rangle \mid \|v\| = 1 \}$ coincides with the real
interval $[\nu_1, \varphi_1]$ unless there are no negative or positive eigenvalues. In
the former case we obtain the numerical range $(0, \varphi_1]$ or $[0, \varphi_1]$ (and hence
set $\nu_1 = 0$) and in the latter case we obtain $[\nu_1, 0)$ or $[\nu_1, 0]$ (and hence
set $\varphi_1 = 0$). In particular,
\[ \varphi_1(A) = \sup_{\|v\| = 1} \langle A v, v \rangle, \tag{6.11} \]
where the supremum is achieved if $\varphi_1 > 0$, and
\[ \nu_1(A) = \inf_{\|v\| = 1} \langle A v, v \rangle, \tag{6.12} \]
which is again achieved if $\nu_1 < 0$. Thus $\varphi_1(A + B) \le \varphi_1(A) + \varphi_1(B)$
and $\nu_1(A + B) \ge \nu_1(A) + \nu_1(B)$.
• In the case of a Hermitian matrix $A \in \mathrm{Mat}_{n,n}(\mathbb{C})$, we do not have to
distinguish between positive and negative eigenvalues and may simply
write $\lambda_1(A) \le \cdots \le \lambda_n(A)$ for its eigenvalues. Viewing the functions $\lambda_j$
on the linear space of Hermitian matrices the above applies (again without
giving $0$ a special role) as well, and we see that $\lambda_n$ is a convex function
and $\lambda_1$ a concave one.
• Also note that the trace map $\operatorname{tr} : \mathrm{Mat}_{n,n}(\mathbb{C}) \to \mathbb{C}$ is linear. Hence
\[ \lambda_1(A + B) + \cdots + \lambda_n(A + B) = \lambda_1(A) + \cdots + \lambda_n(A) + \lambda_1(B) + \cdots + \lambda_n(B) \tag{6.13} \]
for any $A, B \in \mathrm{Mat}_{n,n}(\mathbb{C})$.
† The material of this subsection motivates some later arguments in this chapter but is
strictly speaking not necessary.
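The three observations above can be checked by hand for $2 \times 2$ real symmetric matrices, where the eigenvalues have a closed form. The sketch below is only an illustration under that assumption: the two matrices are arbitrary choices, and `eigenvalues_2x2` is a hypothetical helper based on the standard formula for the eigenvalues of $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues (smaller, larger) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

A = (2.0, 1.0, -1.0)    # assumed entries (a, b, d)
B = (0.5, -2.0, 3.0)    # assumed entries (a, b, d)
S = tuple(x + y for x, y in zip(A, B))   # entries of A + B

lam_A = eigenvalues_2x2(*A)
lam_B = eigenvalues_2x2(*B)
lam_S = eigenvalues_2x2(*S)

print(lam_S[1] <= lam_A[1] + lam_B[1])   # largest eigenvalue is subadditive
print(lam_S[0] >= lam_A[0] + lam_B[0])   # smallest eigenvalue is superadditive
print(abs(sum(lam_S) - sum(lam_A) - sum(lam_B)) < 1e-12)   # trace identity (6.13)
```

Here `lam_S[1]` plays the role of $\lambda_n(A + B)$ and `lam_S[0]` that of $\lambda_1(A + B)$ with $n = 2$.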
More detailed assertions about the relationships between the eigenvalues of
Hermitian matrices are the subject of Horn’s conjecture.(16) We will not go
into the details of this, but state as exercises some special cases which can be
proved with elementary methods and which are widely used in other parts of
mathematics. The first of these is the min-max principle.
Exercise 6.34. Generalize the identities (6.11) and (6.12) by proving the
Courant–Fischer–Weyl theorem or min-max principle for compact self-adjoint operators as follows.
Fix some $k \ge 1$ and in the following let $V$ vary over all $k$-dimensional subspaces of an
infinite-dimensional separable Hilbert space $H$. Show that
\[ \inf_V \max_{v \in V, \|v\| = 1} \langle A v, v \rangle = \nu_k(A), \]
where we set $\nu_k(A) = 0$ if there are fewer than $k$ negative eigenvalues. Similarly
\[ \sup_V \min_{v \in V, \|v\| = 1} \langle A v, v \rangle = \varphi_k(A), \tag{6.14} \]
where we set $\varphi_k(A) = 0$ if there are fewer than $k$ positive eigenvalues. Formulate and prove
the result also for Hermitian matrices.
Exercise 6.35. Deduce from Exercise 6.34 the Weyl monotonicity principle (17) as follows.
For compact self-adjoint operators $A$ and $B$ write $A \le B$ if $\langle A v, v \rangle \le \langle B v, v \rangle$ for all $v$. Show
that if $A \le B$ then $\nu_j(A) \le \nu_j(B)$ and $\varphi_j(A) \le \varphi_j(B)$ (where we set $\nu_j = 0$ and $\varphi_j = 0$
if there are not sufficient eigenvalues of the necessary sign) for all $j$. Formulate and prove
the result also for Hermitian matrices.
Exercise 6.36. Use Exercise 6.34 to prove Cauchy's interlacing theorem as follows.
Let $A \in \mathrm{Mat}_{n,n}(\mathbb{C})$ be a Hermitian matrix. A matrix $B \in \mathrm{Mat}_{m,m}(\mathbb{C})$ with $m \le n$ is called
a compression of $A$ if there is an orthogonal projection $Q$ from $\mathbb{C}^n$ onto an $m$-dimensional
subspace with $Q A Q^* = B$. Show that $\lambda_j(A) \le \lambda_j(B) \le \lambda_{n-m+j}(A)$ for $1 \le j \le m$ in this
case.
6.3 Trace-Class Operators†
The trace is undoubtedly one of the fundamental functions on the space of
matrices. Recall that for any $n \ge 1$ the trace is defined by $\operatorname{tr}(A) = \sum_{k=1}^{n} A_{kk}$
for all $A = (A_{jk}) \in \mathrm{Mat}_{n,n}(\mathbb{C})$ and that it satisfies $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for $A$
and $B$ in $\mathrm{Mat}_{n,n}(\mathbb{C})$, so that
\[ \operatorname{tr}(S^{-1} A S) = \operatorname{tr}(A) \tag{6.15} \]
for any $A \in \mathrm{Mat}_{n,n}(\mathbb{C})$ and $S \in \mathrm{GL}_n(\mathbb{C})$. The identity (6.15) means that
the trace is well-defined on the space of linear maps of a finite-dimensional
vector space (specifically, independent of the choice of basis). Using a Hilbert
space structure on $\mathbb{C}^n$ and fixing an orthonormal basis $v_1, \dots, v_n$ we note
that $\langle A v_j, v_k \rangle$ is the coefficient of $v_k$ when expressing $A v_j$ in terms of the
orthonormal basis for $j, k = 1, \dots, n$. Hence
\[ \operatorname{tr}(A) = \sum_{j=1}^{n} \langle A v_j, v_j \rangle. \]
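The basis independence of $\sum_j \langle A v_j, v_j \rangle$ is easy to observe in the plane. The sketch below assumes a real $2 \times 2$ matrix and compares the standard basis with a rotated orthonormal basis; the matrix, the angle, and the helper names are arbitrary choices made for illustration.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(v, w):
    """Real inner product."""
    return sum(a * b for a, b in zip(v, w))

def trace_in_basis(A, basis):
    """Sum of <A v_j, v_j> over an orthonormal basis (real case)."""
    return sum(dot(matvec(A, v), v) for v in basis)

A = [[1.0, 2.0],
     [3.0, 4.0]]            # assumed example with tr(A) = 5
standard = [[1.0, 0.0], [0.0, 1.0]]
theta = 0.7                 # arbitrary rotation angle
rotated = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

t_standard = trace_in_basis(A, standard)
t_rotated = trace_in_basis(A, rotated)
print(t_standard, t_rotated)
```

Both sums equal the sum of the diagonal entries, in line with (6.15) applied to the orthogonal change of basis.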
It is desirable to extend the definition of the trace to operators on an

infinite-dimensional Hilbert space H. However, since tr(In ) = n for the iden-
tity matrix In ∈ Matn,n (C) and all n > 1, it is clear that the trace cannot
have a reasonable definition on all operators on H, and in particular not on
the identity. The following definition gives the natural domain of the trace
functional.
Definition 6.37. Let $H$ be a Hilbert space. A linear operator $A : H \to H$ is
called trace-class if its trace-class norm
\[ \|A\|_{\mathrm{tc}} = \sup_{(v_n), (w_n)} \sum_{n=1}^{N} |\langle A v_n, w_n \rangle| \]
is finite, where the supremum is taken over all integers N > 0 and over any
two finite lists of orthonormal vectors (v1 , . . . , vN ) and (w1 , . . . , wN ) of the
same length N .
† In this section we present an important class of compact operators. However, it is not
needed for the further developments in this volume.

In the following we will assume that H is separable and complex (once

again separability is only needed to simplify the notation and some steps
in the proofs, but is not crucial for the results and as every real Hilbert
space H has a complexification HC = H ⊗R C = H ⊕ iH as in Exercise 6.51,
the assumption that H is a complex vector space is also not a significant
restriction). We list a few consequences of this definition for trace-class op-
erators A, B : H → H, which in particular justify our calling it a norm.
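• If $v \in H$ is a unit vector with $A v \ne 0$ then we may set $w = \frac{1}{\|A v\|} A v$, and
apply the definition with $N = 1$, $v$ and $w$ to see that
\[ \|A v\| = \langle A v, w \rangle \le \|A\|_{\mathrm{tc}}, \]
so $\|A\|_{\mathrm{op}} \le \|A\|_{\mathrm{tc}}$ and hence a trace-class operator is bounded.
• If $\alpha$ is a scalar, then
\[ \|\alpha A\|_{\mathrm{tc}} = \sup_{(v_n), (w_n)} \sum_{n=1}^{N} |\langle \alpha A v_n, w_n \rangle| = |\alpha| \|A\|_{\mathrm{tc}}. \]
• The triangle inequality follows, since
\begin{align*}
\|A + B\|_{\mathrm{tc}} &= \sup_{(v_n), (w_n)} \sum_{n=1}^{N} |\langle (A + B) v_n, w_n \rangle| \\
&\le \sup_{(v_n), (w_n)} \sum_{n=1}^{N} \bigl( |\langle A v_n, w_n \rangle| + |\langle B v_n, w_n \rangle| \bigr) \le \|A\|_{\mathrm{tc}} + \|B\|_{\mathrm{tc}}.
\end{align*}
Thus the space $\mathrm{TC}(H) = \{ A \in B(H) \mid \|A\|_{\mathrm{tc}} < \infty \}$ of trace-class operators
is a linear subspace of the space of bounded operators, and $\|\cdot\|_{\mathrm{tc}}$ is
a norm on $\mathrm{TC}(H)$.
• Noting that a unitary operator maps an orthonormal list of vectors to an
orthonormal list of vectors, we see that $\|A U\|_{\mathrm{tc}} = \|U A\|_{\mathrm{tc}} = \|A\|_{\mathrm{tc}}$ for
any unitary $U : H \to H$.
• By writing a bounded operator of $H$ as a linear combination of four
unitary operators (see Lemma 6.38 below) we see that if $A \in \mathrm{TC}(H)$
and $B \in B(H)$ then $AB, BA \in \mathrm{TC}(H)$.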
Lemma 6.38 (Four unitary operators). Any bounded operator on a sep-
arable complex Hilbert space may be written as a linear combination of four
unitary operators.
This will be shown using the spectral theory of self-adjoint operators in
Section 12.4.2 (after Corollary 12.45) and will be used here as a black box.
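Theorem 6.39 (Trace functional). Let $H$ be a separable complex Hilbert
space. Then there exists a linear functional $\operatorname{tr} : \mathrm{TC}(H) \to \mathbb{C}$ with the
following properties:
(1) $|\operatorname{tr}(A)| \le \|A\|_{\mathrm{tc}}$,
(2) $\operatorname{tr}(A) = \operatorname{tr}(U^{-1} A U)$, and
(3) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
for all $A \in \mathrm{TC}(H)$, $B \in B(H)$ and unitary $U \in B(H)$. Moreover,
(4) $\operatorname{tr}(A) = \sum_{n=1}^{\infty} \langle A v_n, v_n \rangle$
for any $A \in \mathrm{TC}(H)$ and orthonormal basis $(v_n)$ of $H$.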
Proof of Theorem 6.39: assuming independence. Using $w_n = v_n$ for
all $n \ge 1$ in the definition of $\|A\|_{\mathrm{tc}}$ shows that
\[ \sum_{n=1}^{\infty} |\langle A v_n, v_n \rangle| \le \|A\|_{\mathrm{tc}}, \]
so the right-hand side of (4) converges absolutely and gives a definition of
the linear functional $\operatorname{tr}$ satisfying (1).
The difficult part of the theorem is to show that in (4) the right-hand
side is independent of the choice of the orthonormal basis. Assuming this
for now, property (2) follows at once as the operation sending A to U −1 AU
corresponds to choosing the orthonormal basis (U vn ) in place of (vn ). In
particular, tr(AU ) = tr(U A) for all A ∈ TC(H), and by applying Lemma 6.38
we can write every bounded operator B as a linear combination of four unitary
operators and deduce by linearity of the trace that (3) holds as well.
For the proof that the right-hand side of (4) is independent of the choice
of the orthonormal basis two lemmas are needed.
Lemma 6.40 (Orthonormal approximations). Let (vn ) be an orthonor-

mal basis of a Hilbert space H and let w1 , . . . , wm be orthonormal vectors.
Then for every ε > 0 there exists some N > 1 and orthonormal vec-
tors w1′ , . . . , wm
′
∈ hv1 , . . . , vN i satisfying kwj − wj′ k < ε for j = 1, . . . , m.
Proof. Let $\pi_N : H \to \langle v_1, \dots, v_N\rangle$ be the orthogonal projection. By the
properties of an orthonormal basis (Proposition 3.36) we have $\pi_N(w) \to w$
as $N \to \infty$ for any $w \in H$. Applying this projection to $w_1, \dots, w_m$ gives
\[ w_j'' = \pi_N(w_j) \to w_j \tag{6.16} \]
as $N \to \infty$. For every $N$ we now apply the Gram–Schmidt procedure to
obtain
\begin{align*}
w_1' &= c_1 w_1'' \\
w_2' &= c_2 \bigl(w_2'' - \langle w_2'', w_1'\rangle w_1'\bigr) \\
w_3' &= c_3 \bigl(w_3'' - \langle w_3'', w_1'\rangle w_1' - \langle w_3'', w_2'\rangle w_2'\bigr) \\
&\;\;\vdots \\
w_m' &= c_m \bigl(w_m'' - \langle w_m'', w_1'\rangle w_1' - \cdots - \langle w_m'', w_{m-1}'\rangle w_{m-1}'\bigr),
\end{align*}
where the constants $c_1, \dots, c_m > 0$ are chosen to normalize the vectors to have
unit length. As $w_1, \dots, w_m$ are orthonormal and due to (6.16), a simple induction
on $j = 1, \dots, m$ shows that $c_j$ exists for all large enough $N$, that $c_j \to 1$
and also $w_j' \to w_j$ as $N \to \infty$.
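The construction in this proof can be tried out numerically. The following is a minimal sketch, assuming a finite-dimensional stand-in $\mathbb{R}^{200}$ for $H$ with the standard basis playing the role of $(v_n)$; the function name `orthonormal_approximation` and the test data are hypothetical and not part of the text.

```python
import numpy as np

def orthonormal_approximation(W, N):
    """Project the columns of W onto the first N coordinates and
    re-orthonormalize them by Gram-Schmidt, as in the proof of Lemma 6.40.

    W is a (D, m) array whose columns are orthonormal vectors w_1, ..., w_m.
    """
    m = W.shape[1]
    proj = W.copy()
    proj[N:, :] = 0.0                      # w_j'' = pi_N(w_j)
    out = np.zeros_like(proj)
    for j in range(m):
        v = proj[:, j].copy()
        for i in range(j):
            # subtract <w_j'', w_i'> w_i' (inner product linear in the first slot)
            v -= np.vdot(out[:, i], proj[:, j]) * out[:, i]
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            raise ValueError("N too small: projected vectors are linearly dependent")
        out[:, j] = v / norm               # c_j = 1 / norm
    return out

# Hypothetical test data: three orthonormal vectors with rapidly decaying entries.
rng = np.random.default_rng(0)
D, m, N = 200, 3, 100
decay = 0.8 ** np.arange(D)
W, _ = np.linalg.qr(rng.standard_normal((D, m)) * decay[:, None])
W_approx = orthonormal_approximation(W, N)
max_error = np.max(np.linalg.norm(W - W_approx, axis=0))
```

Here `max_error` plays the role of $\max_j \|w_j - w_j'\|$; increasing `N` should make it smaller, in line with $w_j' \to w_j$.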
Lemma 6.41 (Tail estimate). Let $H$ be a Hilbert space, let $A \in \mathrm{TC}(H)$
and let $(v_n)$ be an orthonormal basis of $H$. Then for every $\varepsilon > 0$ there exists
some $N$ such that for every extension $v_{N+1}', v_{N+2}', \dots$ of $v_1, \dots, v_N$ to an
orthonormal basis of $H$ we have
\[ \sum_{n=N+1}^{\infty} |\langle Av_n', v_n'\rangle| \leq \varepsilon. \]
Proof. Fix some $\varepsilon > 0$ and some $N \geq 0$ and suppose the claim in the
lemma does not hold for $N$. Then there exist orthonormal vectors $w_1, \dots, w_m$
in $\langle v_1, \dots, v_N\rangle^{\perp}$ such that
\[ \sum_{k=1}^{m} |\langle Aw_k, w_k\rangle| > \varepsilon. \tag{6.17} \]
Now apply Lemma 6.40 to the orthonormal vectors $w_1, \dots, w_m$ and the
orthonormal basis $v_{N+1}, v_{N+2}, \dots$ of the Hilbert space $\langle v_1, \dots, v_N\rangle^{\perp}$ to find
some $N' > N$ large enough and a very good orthonormal approximation
$w_1', \dots, w_m'$ to $w_1, \dots, w_m$ inside $\langle v_{N+1}, v_{N+2}, \dots, v_{N'}\rangle$. In particular,
we may suppose that (6.17) also holds for $w_1', \dots, w_m'$.
We now apply the argument above infinitely often to achieve a contradiction
to the hypothesis that $A \in \mathrm{TC}(H)$. Indeed, set $N_0 = 0$ to find
some $N_1 > N_0$ and orthonormal vectors $w_{1,1}, \dots, w_{1,m_1}$ in $\langle v_1, \dots, v_{N_1}\rangle$
so that (6.17) also holds for $w_{1,1}, \dots, w_{1,m_1}$. Assuming we have already
found $N_0 < N_1 < \cdots < N_\ell$ and orthonormal vectors $w_{j,1}, \dots, w_{j,m_j}$
in $\langle v_{N_{j-1}+1}, \dots, v_{N_j}\rangle$ with the same estimate for all $j = 1, \dots, \ell$, we may
apply the same argument to find $w_{\ell+1,1}, \dots, w_{\ell+1,m_{\ell+1}}$ in $\langle v_{N_\ell+1}, \dots, v_{N_{\ell+1}}\rangle$
with the same properties. However, the bound
\[ \ell\varepsilon < \sum_{j=1}^{\ell} \sum_{k=1}^{m_j} |\langle Aw_{j,k}, w_{j,k}\rangle| \leq \|A\|_{\mathrm{tc}} \]
shows that the construction above has to stop, proving the lemma.
Proof of Theorem 6.39: independence. With these two lemmas we are
now ready to prove that $\sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$ is independent of the choice of the
orthonormal basis $(v_n)$ of $H$ for a trace-class operator $A$. So let $(w_n)$ be
another orthonormal basis of $H$, choose a positive $\varepsilon$, and choose $N$ such that
the conclusion of Lemma 6.41 holds for both bases $(v_n)$ and $(w_n)$. Let
\[ V = \langle v_1, \dots, v_N, w_1, \dots, w_N\rangle, \]
and extend $v_1, \dots, v_N$ with vectors $v_{N+1}', \dots, v_M'$ to an orthonormal basis
of $V$. Similarly, we may find an orthonormal basis $w_1, \dots, w_N, w_{N+1}', \dots, w_M'$
of $V$. Define a linear map $A_V : V \to V$ by sending $v \in V$ to $\pi_V(Av)$, where $\pi_V$
is the orthogonal projection $H \to V$. Note that $\langle A_V v, w\rangle = \langle Av, w\rangle$ for any
two $v, w \in V$. By the tail estimate in Lemma 6.41 we have
\[ \sum_{k=N+1}^{M} |\langle Av_k', v_k'\rangle| \leq \varepsilon \]
and
\[ \sum_{k=N+1}^{M} |\langle Aw_k', w_k'\rangle| \leq \varepsilon. \]
Finally, note that
\[ \sum_{k=1}^{N} \langle Av_k, v_k\rangle + \sum_{k=N+1}^{M} \langle Av_k', v_k'\rangle = \sum_{k=1}^{N} \langle Aw_k, w_k\rangle + \sum_{k=N+1}^{M} \langle Aw_k', w_k'\rangle \]
as both sides express the trace of the linear map $A_V$ on the finite-dimensional
space $V$. The choice of $N$ now implies that $\sum_{k=1}^{\infty} \langle Av_k, v_k\rangle$ is within $\varepsilon$ of the
finite sum $\sum_{k=1}^{N} \langle Av_k, v_k\rangle$, which is within $2\varepsilon$ of $\sum_{k=1}^{N} \langle Aw_k, w_k\rangle$. The latter
in turn is within $\varepsilon$ of $\sum_{k=1}^{\infty} \langle Aw_k, w_k\rangle$ again by the choice of $N$. As $\varepsilon > 0$ was
arbitrary the claimed independence follows.
As explained directly after the theorem, Theorem 6.39 follows from the
independence and the black box Lemma 6.38.
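In finite dimensions every operator is trace-class and properties (1) to (4) of Theorem 6.39 can be checked directly. The following is a minimal sketch on $\mathbb{C}^6$, assuming that the trace-class norm of a matrix coincides with its nuclear norm (the sum of its singular values); the helper names `random_unitary` and `trace_in_basis` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

def random_unitary(n, rng):
    # The Q factor of a complex Gaussian matrix is unitary.
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

def trace_in_basis(A, V):
    # sum_n <A v_n, v_n>, where the columns of V form an orthonormal basis
    return sum(np.vdot(V[:, k], A @ V[:, k]) for k in range(V.shape[1]))

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U = random_unitary(n, rng)

t_std = trace_in_basis(A, np.eye(n))      # standard basis
t_rot = trace_in_basis(A, U)              # rotated basis (U e_n)
tc_norm = np.linalg.norm(A, ord="nuc")    # assumed to equal ||A||_tc
```

Comparing `t_std` with `t_rot` illustrates the basis independence in (4), which is the part of the theorem that requires real work in infinite dimensions.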
Proposition 6.42 (Compactness). Every trace-class operator on a complex
Hilbert space $H$ is compact.
Before proving this, notice the following property of the trace-class norm.
As the supremum is taken over $(v_n)_{n=1,\dots,N}$ and $(w_n)_{n=1,\dots,N}$ separately, we
could multiply each $w_n$ by an appropriate scalar $\alpha_n$ with $|\alpha_n| = 1$ to ensure
that
\[ \langle Av_n, \alpha_n w_n\rangle \geq 0. \]
Therefore we may also write
\[ \|A\|_{\mathrm{tc}} = \sup_{(v_n),(w_n)} \Bigl| \sum_{n=1}^{N} \langle Av_n, w_n\rangle \Bigr|, \]
and if $A \in \mathrm{TC}(H)$ then for every $\varepsilon > 0$ there exist finite orthonormal
lists $(v_n)_{n=1,\dots,N}$ and $(w_n)_{n=1,\dots,N}$ with
\[ \sum_{n=1}^{N} \langle Av_n, w_n\rangle > \|A\|_{\mathrm{tc}} - \varepsilon. \]
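In the proof of Proposition 6.42 we will approximate a trace-class operator
by operators with finite-dimensional range and then apply Lemma 6.7. In
order to do this, it will be useful to understand the behaviour of the trace-class
norm for matrices. For this we endow $\mathbb{C}^n$ and $\mathbb{C}^{n+1}$ with the standard
inner product, and identify $\mathbb{C}^n$ with $\mathbb{C}^n \times \{0\} \subseteq \mathbb{C}^{n+1}$.
Lemma 6.43 (Trace-class norm for matrices). Let us assume that a
matrix $A_0 \in \mathrm{Mat}_{n,n}(\mathbb{C})$, the vectors $b, c \in \mathbb{C}^n$, and the scalar $d \in \mathbb{C}$ together
define
\[ A_1 = \begin{pmatrix} A_0 & b \\ c^t & d \end{pmatrix} \in \mathrm{Mat}_{n+1,n+1}(\mathbb{C}) \]
and satisfy $\|A_0\|_{\mathrm{tc}} \geq (1 - \varepsilon^2)\|A_1\|_{\mathrm{tc}}$ for some $\varepsilon \in (0, 1)$. Then
\[ \left\| \begin{pmatrix} b \\ d \end{pmatrix} \right\| \leq \sqrt{5}\,\varepsilon \|A_1\|_{\mathrm{tc}}. \]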
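Proof. From the fact that $e_{n+1} \perp \mathbb{C}^n$ (by the identification between $\mathbb{C}^n$
and $\mathbb{C}^n \times \{0\}$) and the definition of the trace-class norms, we have
\[ \|A_0\|_{\mathrm{tc}} + |d| \leq \|A_1\|_{\mathrm{tc}} \]
and so
\[ |d| \leq \|A_1\|_{\mathrm{tc}} - \|A_0\|_{\mathrm{tc}} \leq \varepsilon^2 \|A_1\|_{\mathrm{tc}} \leq \varepsilon \|A_1\|_{\mathrm{tc}} \tag{6.18} \]
by the hypotheses. The lemma will follow from Pythagoras’ theorem after we
have shown the more delicate estimate
\[ \|b\| \leq 2\varepsilon \|A_1\|_{\mathrm{tc}} \tag{6.19} \]
for the vector $b$.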

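To highlight the main point in the argument for (6.19) let us treat the
case $n = 1$ first. In that case $A_0 = a$ and $a, b, c, d \in \mathbb{C}$. If $\theta \in \mathbb{C}$ has $|\theta| = 1$ then
\[ v = \begin{pmatrix} \sqrt{1 - \varepsilon^2} \\ \varepsilon\theta \end{pmatrix} \quad\text{and}\quad w = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \]
are both unit vectors, and we may apply the definition of the trace-class
norm $\|A_1\|_{\mathrm{tc}}$ to just these two vectors and obtain
\[ |\langle A_1 v, w\rangle| = \bigl| a\sqrt{1 - \varepsilon^2} + b\varepsilon\theta \bigr| \leq \|A_1\|_{\mathrm{tc}}. \]
By choosing the argument of $\theta$ correctly this gives
\[ |a|\sqrt{1 - \varepsilon^2} + |b|\varepsilon \leq \|A_1\|_{\mathrm{tc}}, \tag{6.20} \]
and the assumption of the lemma implies that
\[ \|A_0\|_{\mathrm{tc}}\sqrt{1 - \varepsilon^2} \geq (1 - \varepsilon^2)^{3/2}\|A_1\|_{\mathrm{tc}} \geq (1 - \varepsilon^2)^{2}\|A_1\|_{\mathrm{tc}} \geq (1 - 2\varepsilon^2)\|A_1\|_{\mathrm{tc}}. \tag{6.21} \]
Combining (6.20) and (6.21) with $\|A_0\|_{\mathrm{tc}} = |a|$ gives
\[ (1 - 2\varepsilon^2)\|A_1\|_{\mathrm{tc}} + |b|\varepsilon \leq \|A_1\|_{\mathrm{tc}}, \]
which is equivalent to (6.19).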

The idea of the proof of (6.19) in the general case is similar. However,
since $b$ is a vector for $n \geq 2$, some additional preparations are needed.
By compactness of the closed and bounded subset
\[ \mathrm{U}_n(\mathbb{R}) = \{A \in \mathrm{Mat}_{n,n}(\mathbb{C}) \mid A^*A = I\} \subseteq \mathrm{Mat}_{n,n}(\mathbb{C}) \cong \mathbb{R}^{2n^2} \]
and the comment after the statement of Proposition 6.42 above, there exist
orthonormal bases $v_1, \dots, v_n \in \mathbb{C}^n$ and $w_1, \dots, w_n \in \mathbb{C}^n$ with
\[ \|A_0\|_{\mathrm{tc}} = \sum_{j=1}^{n} \langle A_0 v_j, w_j\rangle. \]
We define a unitary matrix $U \in \mathrm{Mat}_{n,n}(\mathbb{C})$ by requiring that $U^* v_j = w_j$ for
all $j = 1, \dots, n$, so that
\[ \|A_0\|_{\mathrm{tc}} = \sum_{j=1}^{n} \langle A_0 v_j, U^* v_j\rangle = \sum_{j=1}^{n} \langle U A_0 v_j, v_j\rangle = \operatorname{tr}(U A_0). \tag{6.22} \]
Since we have now expressed $\|A_0\|_{\mathrm{tc}}$ as the trace of $U A_0$, independence of
the trace of the choice of orthonormal basis implies that (6.22) holds for an
arbitrary orthonormal basis $v_1', \dots, v_n'$ of $\mathbb{C}^n$. In particular, from the equality
and the definition of the trace-class norm we deduce that
\[ \langle U A_0 v_j', v_j'\rangle \geq 0 \tag{6.23} \]
for any choice of orthonormal basis of $\mathbb{C}^n$ and $j = 1, \dots, n$. Now let the
orthonormal basis $v_1', \dots, v_n'$ of $\mathbb{C}^n$ be chosen so that $Ub = \|b\| v_n'$. We extend $U$
to a unitary operator on $\mathbb{C}^{n+1}$ by setting $U e_{n+1} = e_{n+1}$, and note that
\[ U A_1 e_{n+1} = U(b + d e_{n+1}) = \|b\| v_n' + d e_{n+1} \]
by definition of $A_1$ and the choice of orthonormal basis. We now consider the
two orthonormal lists
\[ v_1', \dots, v_{n-1}', \sqrt{1 - \varepsilon^2}\, v_n' + \varepsilon e_{n+1} \]
and
\[ U^{-1} v_1', \dots, U^{-1} v_{n-1}', U^{-1} v_n'. \]
Using the definition of the trace-class norm $\|A_1\|_{\mathrm{tc}}$ we get
\begin{align*}
\|A_1\|_{\mathrm{tc}} &\geq \sum_{j=1}^{n-1} \langle U A_1 v_j', v_j'\rangle + \sqrt{1 - \varepsilon^2}\, \langle U A_1 v_n', v_n'\rangle + \varepsilon \|b\| \langle v_n', v_n'\rangle \\
&\geq \sqrt{1 - \varepsilon^2}\, \|A_0\|_{\mathrm{tc}} + \varepsilon \|b\|,
\end{align*}
where we have used (6.23) and (6.22) in the last step. This is the analogue
of (6.20) with $|a|$ replaced by $\|A_0\|_{\mathrm{tc}}$. Together with (6.21) we obtain (6.19)
in the general case.
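The estimate of Lemma 6.43 can be sanity-checked on random matrices. The following is a minimal sketch, assuming that the trace-class norm of a matrix equals its nuclear norm (sum of singular values) and taking for $\varepsilon$ the smallest value allowed by the hypothesis; the function name `check_lemma_6_43` and the random test matrices are hypothetical.

```python
import numpy as np

def tc_norm(M):
    # Nuclear norm (sum of singular values), assumed to match ||.||_tc for matrices.
    return np.linalg.norm(M, ord="nuc")

def check_lemma_6_43(A1):
    """Return (eps, ||(b, d)||, sqrt(5) * eps * ||A1||_tc) for the block matrix A1."""
    n = A1.shape[0] - 1
    A0 = A1[:n, :n]
    last_col = A1[:, n]                       # the column (b, d)
    ratio = tc_norm(A0) / tc_norm(A1)
    eps = np.sqrt(max(0.0, 1.0 - ratio))      # smallest eps with ||A0|| >= (1 - eps^2) ||A1||
    lhs = np.linalg.norm(last_col)
    rhs = np.sqrt(5.0) * eps * tc_norm(A1)
    return eps, lhs, rhs

rng = np.random.default_rng(2)
results = []
for _ in range(20):
    A1 = rng.standard_normal((5, 5))
    A1[:, 4] *= 0.05                          # small last column (b, d)
    A1[4, :4] *= 0.05                         # small last row c^t
    results.append(check_lemma_6_43(A1))
```

Each entry of `results` lets one compare the left-hand and right-hand sides of the lemma for one random matrix.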
Proof of Proposition 6.42. Let $A$ be a trace-class operator on $H$. We will
construct, for every $\varepsilon \in (0, 1)$, an operator $A_\varepsilon : H \to H$ with finite-dimensional
range and with
\[ \|A - A_\varepsilon\|_{\mathrm{op}} \ll \varepsilon \|A\|_{\mathrm{tc}}. \tag{6.24} \]
This will imply the proposition by Lemma 6.7.
To construct $A_\varepsilon$ we choose orthonormal vectors $v_1, \dots, v_n$ and $w_1, \dots, w_n$
in $H$ with
\[ \sum_{k=1}^{n} \langle Av_k, w_k\rangle \geq (1 - \varepsilon^2)\|A\|_{\mathrm{tc}} \]
as in the definition of $\|\cdot\|_{\mathrm{tc}}$ and using the comment after Proposition 6.42. We
define $A_\varepsilon$ by setting it equal to $A$ on $\langle v_1, \dots, v_n\rangle$ and to $0$ on $\langle v_1, \dots, v_n\rangle^{\perp}$.
For any vector $v' \in \langle v_1, \dots, v_n\rangle$ and $v'' \in \langle v_1, \dots, v_n\rangle^{\perp}$ we have
\[ (A - A_\varepsilon)(v' + v'') = Av'', \]
and we claim that
\[ \|Av''\| \leq \sqrt{5}\,\varepsilon \|A\|_{\mathrm{tc}} \|v''\|, \tag{6.25} \]
which will imply (6.24). To prove (6.25) we may assume that
\[ v'' = v_{n+1} \in \langle v_1, \dots, v_n\rangle^{\perp} \]
is a unit vector. We write
\[ Av_{n+1} = \sum_{j=1}^{n} b_j w_j + d\, w_{n+1} \]
for $b \in \mathbb{C}^n$, $d \in \mathbb{C}$, and $w_{n+1} \in \langle w_1, \dots, w_n\rangle^{\perp}$ a unit vector.

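We apply Lemma 6.43 to the matrix $A_1 \in \mathrm{Mat}_{n+1,n+1}(\mathbb{C})$ defined by
\[ (A_1)_{jk} = \langle Av_k, w_j\rangle \]
for $j, k = 1, \dots, n + 1$. Then
\[ A_1 = \begin{pmatrix} A_0 & b \\ c^t & d \end{pmatrix} \in \mathrm{Mat}_{n+1,n+1}(\mathbb{C}) \]
for some matrix $A_0 \in \mathrm{Mat}_{n,n}(\mathbb{C})$ and $c \in \mathbb{C}^n$. By choice of the orthonormal
lists $(v_n)$ and $(w_n)$ we have
\[ (1 - \varepsilon^2)\|A\|_{\mathrm{tc}} \leq \sum_{j=1}^{n} \langle Av_j, w_j\rangle = \operatorname{tr} A_0 \leq \|A_0\|_{\mathrm{tc}}. \tag{6.26} \]
Since a different choice of an orthonormal basis of $\mathbb{C}^{n+1}$ corresponds to a
different choice of an orthonormal basis of $\langle v_1, \dots, v_{n+1}\rangle$ or of $\langle w_1, \dots, w_{n+1}\rangle$,
it follows that $\|A_1\|_{\mathrm{tc}} \leq \|A\|_{\mathrm{tc}}$. Arguing a bit more carefully we may find,
again by compactness, some unitary matrix $U \in \mathrm{Mat}_{n+1,n+1}(\mathbb{C})$ with
\[ \|A_1\|_{\mathrm{tc}} = \operatorname{tr}(U^* A_1) = \sum_{j=1}^{n+1} \langle U^* A_1 e_j, e_j\rangle. \]
Since $\langle U^* A_1 e_j, e_j\rangle = \bigl\langle Av_j, \sum_{k=1}^{n+1} u_{kj} w_k \bigr\rangle$ and the vectors $\sum_{k=1}^{n+1} u_{kj} w_k \in H$
are orthonormal for $j = 1, \dots, n + 1$, the inequality $\|A_1\|_{\mathrm{tc}} \leq \|A\|_{\mathrm{tc}}$ follows.
Combining the estimate $\|A_1\|_{\mathrm{tc}} \leq \|A\|_{\mathrm{tc}}$ with (6.26) and Lemma 6.43, we
get (6.25) for any $v'' \in \langle v_1, \dots, v_n\rangle^{\perp}$ with $\|v''\| = 1$.
It follows that $A$ is the limit of $A_\varepsilon$ defined as above as $\varepsilon \searrow 0$ (with respect
to the operator norm), and so Lemma 6.7 implies the proposition.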
The results above regarding the trace and the trace-class are satisfying, but
the concepts would not be important without non-trivial examples of trace-
class operators. We next discuss the relationship with the class of self-adjoint
(compact) operators, which gives us many examples.
We say that a self-adjoint operator $A$ on a Hilbert space $H$ is positive
if $\langle Av, v\rangle \geq 0$ for all $v \in H$.
Proposition 6.44. Let $H$ be a complex Hilbert space and $A$ a bounded operator
on $H$. If $A$ is self-adjoint and positive and $(v_n)$ is an orthonormal basis
of $H$, then $\|A\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$.
In particular, $\operatorname{tr}(A) = \|A\|_{\mathrm{tc}}$, where in the case of a positive operator $A$
with $\|A\|_{\mathrm{tc}} = \infty$ this extends our definition of the trace.
Proof of Proposition 6.44. The inequality $\|A\|_{\mathrm{tc}} \geq \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$ follows
directly from the definition of the trace-class norm. For the opposite
inequality we may suppose that $S = \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$ is finite and let $(x_k)$
and $(y_k)$ be two orthonormal lists of length $K$ as in the definition of the
trace-class norm. We wish to show
\[ \sum_{k=1}^{K} |\langle Ax_k, y_k\rangle| \leq S. \tag{6.27} \]
Using Lemma 6.40 we can find some $N \geq 1$ and orthonormal approximations
of $(x_k)$ and $(y_k)$ within $V = \langle v_1, \dots, v_N\rangle$. Letting $N \to \infty$ later on,
it suffices to show (6.27) for the approximations within $V$ and we will use
the same letters to denote the approximations. We extend the orthonormal
lists $(x_k)$ and $(y_k)$ to orthonormal bases of $V$. Using the comment after
Proposition 6.42 we may adjust the $y_k$ once more and assume without loss of
generality that $\langle Ax_k, y_k\rangle \geq 0$ for $k = 1, \dots, N$ without changing the value of
the left-hand side in (6.27). We also define a unitary operator $U : V \to V$
satisfying $U^* x_k = y_k$ for $k = 1, \dots, N$. In other words, we wish to estimate
\[ \sum_{k=1}^{K} \langle Ax_k, y_k\rangle \leq \sum_{k=1}^{N} \langle Ax_k, y_k\rangle = \sum_{k=1}^{N} \langle Ax_k, U^* x_k\rangle = \operatorname{tr}(U A_V), \tag{6.28} \]
where we let $A_V$ be the positive self-adjoint operator $V \ni v \mapsto \pi_V(Av)$
and $\pi_V : H \to V$ is the orthogonal projection. As a trace (on the finite-dimensional
space $V$) can be calculated in any basis we may also calculate
$\operatorname{tr}(U A_V)$ using an orthonormal basis $v_1', \dots, v_N'$ of $V$ consisting of eigenvectors
of $A_V$. Let $\lambda_1, \dots, \lambda_N$ in $\mathbb{R}_{\geq 0}$ be the corresponding eigenvalues. Since
we have $\operatorname{tr}(U A_V) \geq 0$ by (6.28) and $|\langle U v_n', v_n'\rangle| \leq 1$, we obtain
\begin{align*}
\operatorname{tr}(U A_V) &= \sum_{n=1}^{N} \langle U A_V v_n', v_n'\rangle = \sum_{n=1}^{N} \lambda_n \langle U v_n', v_n'\rangle \\
&\leq \sum_{n=1}^{N} \lambda_n = \operatorname{tr}(A_V) = \sum_{n=1}^{N} \langle Av_n, v_n\rangle \leq S.
\end{align*}
This, together with (6.28), implies (6.27), first for the approximations of $(x_k)$
and $(y_k)$ in $V$, and then using Lemma 6.40 and letting $N \to \infty$ as indicated
earlier for any two lists of orthonormal vectors in $H$. Hence $\|A\|_{\mathrm{tc}} \leq S$ and
the proposition follows.
Exercise 6.45. Let A be a compact operator. Show that A has a polar decomposition of
the form A = QP where ker(A) = ker(Q) = ker(P ), Q|(ker(A))⊥ is an isometry, and P is
positive, self-adjoint, and compact. Show that P is trace-class if and only if A is.
Corollary 6.46 (Lidskiĭ’s theorem [61]). Let $H$ be a separable complex
Hilbert space. A self-adjoint bounded operator $A$ on $H$ is trace-class if and
only if it is compact and its eigenvalues $\lambda_n$ (allowing repetitions as in
Theorem 6.27) satisfy
\[ \sum_{n=1}^{\infty} |\lambda_n| = \|A\|_{\mathrm{tc}} < \infty. \]
If $A$ is indeed trace-class, then
\[ \operatorname{tr}(A) = \sum_{n=1}^{\infty} \lambda_n \]
and this sum converges absolutely.
Proof. Assume first that $A$ is self-adjoint and trace-class. Then Proposition 6.42
implies that $A$ is compact. Using the orthonormal basis consisting
of eigenvectors $v_n$ with eigenvalues $\lambda_n$ from Theorem 6.27 and the definition
of the trace it follows that $\sum_{n=1}^{\infty} |\lambda_n| \leq \|A\|_{\mathrm{tc}} < \infty$ and $\operatorname{tr}(A) = \sum_{n=1}^{\infty} \lambda_n$.
Let now $A$ be a compact self-adjoint operator and assume $\sum_{n=1}^{\infty} |\lambda_n| < \infty$,
where $\lambda_n$ are the eigenvalues of $A$. Again let $(v_n)$ be an orthonormal basis
of $H$ consisting of eigenvectors for $A$ (with corresponding eigenvalues $\lambda_n$).
We let $\alpha_n \in \{\pm 1\}$ be chosen with $\lambda_n \alpha_n = |\lambda_n|$ for all $n \geq 1$. Define the
positive self-adjoint operator $P$ on $H$ by setting $P v_n = |\lambda_n| v_n$ and the unitary
operator $U$ on $H$ by setting $U v_n = \alpha_n v_n$ for all $n \geq 1$ and linearly extending
both to $H$, so that $A = UP$. By Proposition 6.44 we have $\|P\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} |\lambda_n|$,
and by the initial properties of the trace-class norm this gives
\[ \|A\|_{\mathrm{tc}} = \|UP\|_{\mathrm{tc}} = \|P\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} |\lambda_n| < \infty, \]
which implies the corollary.
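For matrices, Proposition 6.44 and Corollary 6.46 reduce to familiar identities between the trace, the eigenvalues and the singular values. The following is a minimal sketch for a random self-adjoint matrix and a random positive matrix, assuming that the nuclear norm computed by NumPy agrees with the trace-class norm; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

A = (Z + Z.conj().T) / 2                 # self-adjoint, eigenvalues of both signs
eigs = np.linalg.eigvalsh(A)             # real eigenvalues lambda_n
tc_A = np.linalg.norm(A, ord="nuc")      # assumed to equal ||A||_tc

P = Z.conj().T @ Z                       # positive and self-adjoint
tc_P = np.linalg.norm(P, ord="nuc")

sum_abs_eigs = np.abs(eigs).sum()        # should match tc_A  (Corollary 6.46)
sum_eigs = eigs.sum()                    # should match tr(A) (Corollary 6.46)
trace_P = np.trace(P).real               # should match tc_P  (Proposition 6.44)
```

The sign pattern of `eigs` corresponds to the unitary $U$ with $U v_n = \alpha_n v_n$ used in the proof above.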

Let us indicate how the trace appears frequently in applications, and how
to calculate it in these special circumstances.
Exercise 6.47. Let $\ell > d$ and $k \in H^{\ell}(\mathbb{T}^d \times \mathbb{T}^d)$. Show that the Hilbert–Schmidt integral
operator $K(f)(x) = \int_{\mathbb{T}^d} k(x, y) f(y)\, dy$ on $L^2(\mathbb{T}^d)$ is trace-class and that the trace is given
by the integral along the diagonal, that is $\operatorname{tr}(K) = \int_{\mathbb{T}^d} k(x, x)\, dx$.
Proposition 6.48. Let $X$ be a compact metric space, let $\mu$ be a finite measure
on $X$, and let $k \in C(X \times X)$ be a continuous kernel with the property that
the associated Hilbert–Schmidt operator $K$ defined by
\[ K(f)(x) = \int_X k(x, y) f(y)\, d\mu(y) \]
is trace-class. Then
\[ \operatorname{tr}(K) = \int_X k(x, x)\, d\mu(x). \]
Proof. Let $(\xi_\ell)$ be a sequence of finite measurable partitions of $X$ that
become finer in the sense that
\[ \max_{P \in \xi_\ell} \operatorname{diam}(P) \to 0 \tag{6.29} \]
as $\ell \to \infty$, and assume that the sequence is refining, meaning that each
element of $\xi_\ell$ is a union of elements of $\xi_{\ell+1}$ for $\ell \geq 1$. For $P \in \xi_\ell$ we also
define the unit vector
\[ w_P = \frac{1}{\sqrt{\mu(P)}}\, \mathbb{1}_P \]
and notice that $\{w_P \mid P \in \xi_\ell\}$ is an orthonormal basis of its linear hull $W_\ell$
for all $\ell \geq 1$ (if $\mu(P) = 0$ we do not associate a vector $w_P$ to this partition
element). Since $(\xi_\ell)$ is refining, we have $W_\ell \subseteq W_{\ell+1}$ for all $\ell \geq 1$.
Now define an orthonormal sequence $(v_n)$ by starting with $w_P$ for $P \in \xi_1$
(in some fixed order) and extending it via Gram–Schmidt first to an orthonormal
basis $v_1, \dots, v_{n(2)}$ of $W_2$, then to an orthonormal basis $v_1, \dots, v_{n(3)}$ of $W_3$,
and so on. It is clear that the assumption (6.29) implies that the characteristic
function of every open set belongs to the closure of $\bigcup_{\ell \geq 1} W_\ell$. This in
turn implies, as in the proof of Proposition 2.51, that
\[ \overline{\bigcup_{\ell \geq 1} W_\ell} = L^2_\mu(X) \]
and hence that $(v_n)$ is an orthonormal basis of $L^2_\mu(X)$. Therefore
\[ \operatorname{tr}(K) = \sum_{n=1}^{\infty} \langle Kv_n, v_n\rangle = \lim_{\ell \to \infty} \sum_{n=1}^{n(\ell)} \langle Kv_n, v_n\rangle = \lim_{\ell \to \infty} \sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle \]
since $\sum_{n=1}^{n(\ell)} \langle Kv_n, v_n\rangle = \operatorname{tr}(\pi_{W_\ell} K|_{W_\ell})$, where $\pi_{W_\ell}$ denotes the orthogonal
projection onto $W_\ell$, and this trace can also be computed in the orthonormal
basis $\{w_P \mid P \in \xi_\ell\}$. Now we may use the definition of $K$ to see that
\[ \sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle = \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_P K(\mathbb{1}_P)\, d\mu = \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_{P \times P} k(x, y)\, d\mu{\times}\mu(x, y). \]
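Now fix $\varepsilon > 0$ and use uniform continuity of $k$ to find an $\ell$ sufficiently large
to ensure that
\[ |k(x, y) - k(x, x)| < \varepsilon \]
whenever $x, y \in P$ for some $P \in \xi_\ell$. We may also suppose that $\ell$ is large
enough to have
\[ \Bigl| \operatorname{tr}(K) - \sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle \Bigr| < \varepsilon. \]
Together we see that
\begin{align*}
\Bigl| \operatorname{tr}(K) - \int_X k(x, x)\, d\mu(x) \Bigr|
&\leq \varepsilon + \sum_{P \in \xi_\ell} \Bigl| \frac{1}{\mu(P)} \int_{P \times P} k(x, y)\, d\mu{\times}\mu(x, y) - \int_P k(x, x)\, d\mu(x) \Bigr| \\
&\leq \varepsilon + \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_{P \times P} \underbrace{|k(x, y) - k(x, x)|}_{< \varepsilon}\, d\mu{\times}\mu(x, y) \leq (1 + \mu(X))\, \varepsilon.
\end{align*}
As $\varepsilon > 0$ was arbitrary, the proposition follows.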
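The partition sums $\sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle$ from this proof can be computed numerically. The following is a minimal sketch on $X = [0, 1]$ with Lebesgue measure and equal-width cells, using the hypothetical smooth kernel $k(x, y) = e^{-(x-y)^2}\cos(x + y)$, for which $\int_0^1 k(x, x)\, dx = \sin(2)/2$; the cell integrals are approximated by a midpoint rule, which is an extra assumption of the sketch.

```python
import numpy as np

def partition_trace(kernel, cells=100, sub=8):
    """Approximate sum over P of (1/mu(P)) * integral of k over P x P
    for the partition of [0, 1] into `cells` intervals of equal length."""
    h = 1.0 / cells
    offsets = (np.arange(sub) + 0.5) * h / sub     # midpoints inside one cell
    total = 0.0
    for i in range(cells):
        pts = i * h + offsets
        X, Y = np.meshgrid(pts, pts, indexing="ij")
        # (1/h) * integral over P x P  is approximately  h * (mean of k on the sub-grid)
        total += h * kernel(X, Y).mean()
    return total

def kernel(x, y):
    # Hypothetical smooth kernel chosen only for illustration.
    return np.exp(-(x - y) ** 2) * np.cos(x + y)

approx = partition_trace(kernel)
diagonal_integral = np.sin(2.0) / 2.0              # exact value of the diagonal integral
```

Refining the partition (larger `cells`) should bring `approx` closer to `diagonal_integral`, mirroring the role of (6.29).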
Exercise 6.49. Generalize Proposition 6.48 by assuming that X is a σ-compact, locally

compact, metric space, µ is locally finite, and k ∈ C(X × X) ∩ L2µ×µ (X × X) defines a
trace-class integral operator.
Exercise 6.50. Let H be a (separable) Hilbert space.

(a) Show that TC(H) is complete (and hence a Banach space) with respect to the trace-
class norm.
(b) What is the closure of TC(H) with respect to the operator norm?
Exercise 6.51. Let H be a real Hilbert space.

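(a) Define $H_{\mathbb{C}} = H \otimes_{\mathbb{R}} \mathbb{C} = H \oplus iH$ and define
\[ \langle a_1 + i a_2, b_1 + i b_2\rangle_{\mathbb{C}} = \langle a_1, b_1\rangle + \langle a_2, b_2\rangle + i\bigl(\langle a_2, b_1\rangle - \langle a_1, b_2\rangle\bigr). \]
Show that $\langle \cdot, \cdot\rangle_{\mathbb{C}}$ is a complex inner product making $H_{\mathbb{C}}$ into a complex Hilbert space.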
(b) We used Lemma 6.38 (concerning complex Hilbert spaces) twice in this section. Use (a)
to show that the results of this section also hold in the case of a real Hilbert space.
Exercise 6.52. Let $(T, \mathcal{B}, \mu)$ be a probability space, $H$ a Hilbert space, and
\[ T \ni t \longmapsto A_t \in \mathrm{TC}(H) \]
a map such that $T \ni t \mapsto \langle A_t v, w\rangle$ is measurable for every $v, w \in H$. Also suppose
that $\int_T \|A_t\|_{\mathrm{tc}}\, d\mu(t) < \infty$. Show that $A = \int A_t\, d\mu(t)$ (defined via $v \mapsto \int A_t v\, d\mu(t)$ as in
Section 3.5.4) is trace-class, and that $\operatorname{tr}(A) = \int \operatorname{tr}(A_t)\, d\mu(t)$.
The next two exercises give a tool for showing that certain Hilbert–Schmidt
integral operators (as in Proposition 6.48) are trace-class.
Exercise 6.53. Let $H$ be a Hilbert space with an orthonormal basis $(e_n)$. Define the
Hilbert–Schmidt norm
\[ \|A\|_{\mathrm{HS}}^2 = \sum_{j,k} |\langle Ae_j, e_k\rangle|^2 \]
and the space of Hilbert–Schmidt operators $\mathrm{HS}(H) = \{A \in B(H) \mid \|A\|_{\mathrm{HS}} < \infty\}$.
(a) Show that A ∈ HS(H) if and only if A∗ ∈ HS(H), and that kA∗ kHS = kAkHS for
all A ∈ HS(H).
(b) Show that the definition of the Hilbert–Schmidt norm is independent of the choice of
orthonormal basis.
(c) Show that HS(H) forms a two-sided ideal in B(H). That is, for any A ∈ HS(H)
and B ∈ B(H) we have AB ∈ HS(H) and BA ∈ HS(H).
(d) Find an inner product on HS(H) which induces the norm k · kHS , and show that HS(H)
is a Hilbert space with this inner product.
(e) Show that HS(H) is also a Banach algebra, meaning that kABkHS 6 kAkHS kBkHS .
(f) Show that HS(H) is a closed subspace of B(H) if and only if H is finite-dimensional.
(g) Show that every Hilbert–Schmidt operator is compact.
(h) Assume now that H = L2 ((0, 1)). For every k ∈ L2 ((0, 1)2 ) we define the associated
Hilbert–Schmidt integral operator as in Proposition 6.11. Show that the space of Hilbert–
Schmidt integral operators corresponds exactly to HS(H). In particular, show that for any
operator A ∈ HS(H) the corresponding kernel kA is given by
X
kA (x, y) = hAei , ej i ei (x)ej (y).
i,j
Exercise 6.54. (a) Show that if A, B ∈ HS(H) then AB is trace-class.

(b) If C is trace-class then there are operators A, B ∈ HS(H) with C = AB.
Exercise 6.55. (18) Let H be a Hilbert space with respect to the inner product h·, ·iH ,
and write k · kH for the induced norm on H. Let h·, ·i0 be a semi-inner product on H, and
write k · k0 for the induced semi-norm on H. Assume that k · k0 6 k · kH .
(a) Show that there exists a unique positive bounded self-adjoint operator A such that
hv, wi0 = hAv, wiH .
The relative trace of k · k0 with respect to k · kH is defined as the trace of A (which might
be infinity).
(b) Let $k > \frac{d}{2}$, $H = H^k(U)$ for some open subset $U \subseteq \mathbb{R}^d$, and $\langle f, g\rangle_0 = f(x)\overline{g(x)}$ for some
fixed $x \in U$. Show that $A$ as in (a) has finite trace (and so $\|\cdot\|_0$ has finite relative trace
with respect to $\|\cdot\|_H$).
(c) Let $\mu$ be a compactly supported measure on $U$. Combine (b) with Exercise 6.52 to show
that the semi-norm $\|f\|_{L^2(\mu)} = \bigl(\int |f|^2\, d\mu\bigr)^{1/2}$ for $f \in H^k(U)$ has finite relative trace with
respect to $\|\cdot\|_H$.
6.4 Eigenfunctions for the Laplace Operator
We will prove in this section the claim from Section 1.2 that for any open
bounded subset U ⊆ Rd there is a basis of L2 (U ) consisting of eigenfunctions
of the Laplace operator such that these functions also vanish (in the square-
mean sense) at the boundary of U .
In the proof we will first go back to the case of the d-dimensional torus,
even though (or actually precisely because) we already have an orthonormal
basis consisting of eigenfunctions of the Laplacian in this setting, namely
the characters. In Section 6.4.2 we will define a right inverse of ∆ defined
on L2 (U ) for an open subset U of Rd — a setting in which we do not know
the eigenfunctions of the Laplacian. Finally, we will ask in Section 6.4.4 about
the growth rate of the eigenvalues and prove Weyl’s law for Jordan measurable
open domains. We start by stating the main theorem, which will be proved
in Section 6.4.2.
Theorem 6.56 (Existence of basis of Laplace eigenfunctions). Let $U$
be an open bounded subset of $\mathbb{R}^d$. Then there exists an orthonormal basis $\{f_n\}$
of $L^2(U)$ of functions in $H_0^1(U)$ which are smooth in $U$ and have $\Delta f_n = \lambda_n f_n$,
with $\lambda_n < 0$ for all $n \geq 1$, and $\lambda_n \to -\infty$ as $n \to \infty$.
6.4.1 Right Inverse and Compactness on the Torus
We already used the fact that the characters on $\mathbb{T}^d$ are eigenfunctions of the
Laplace operator on $\mathbb{T}^d$ in the proof of elliptic regularity (Lemma 5.48). Obtaining
a compact self-adjoint right inverse to $\Delta$ is quite easy on the torus $\mathbb{T}^d$.
Exercise 6.57. Define $L_0^2(\mathbb{T}^d) = \bigl\{ f \in L^2(\mathbb{T}^d) \mid \int_{\mathbb{T}^d} f\, dx = 0 \bigr\}$, and prove that there exists
a compact self-adjoint operator $S : L_0^2(\mathbb{T}^d) \to L_0^2(\mathbb{T}^d)$ with the property that $\Delta S f = f$
for all $f \in L_0^2(\mathbb{T}^d)$.
For the discussion on an open subset we will need the following lemma.
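Lemma 6.58 (Compactness on the torus). The operator
\[ \imath_{1,0} : H^1(\mathbb{T}^d) \longrightarrow H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d) \]
is compact.
Proof. For the proof we define
\[ K = \bigl\{ f \in L^2(\mathbb{T}^d) \mid \|f\|_2 \leq 1 \text{ and } \partial_j f \text{ exists with } \|\partial_j f\|_2 \leq 1 \text{ for } j = 1, \dots, d \bigr\}, \]
and note that $\imath_{1,0}\bigl(B_1^{H^1(\mathbb{T}^d)}\bigr) \subseteq K$. Hence it suffices to show that $K$ is totally
bounded. Let now $f = \sum_{n \in \mathbb{Z}^d} a_n \chi_n \in K$. The definition of $\partial_j f$ implies
that $\langle \partial_j f, \chi_n\rangle = -\langle f, \partial_j \chi_n\rangle = 2\pi i n_j a_n$. Using $\|f\|_2 \leq 1$ and $\|\partial_j f\|_2 \leq 1$
for $j = 1, \dots, d$ we obtain
\[ \sum_{n \in \mathbb{Z}^d} \bigl(1 + \|n\|_2^2\bigr) |a_n|^2 \ll 1. \]
This implies a uniformity claim for the convergence of the Fourier series of
all $f \in K$. Indeed, for any $N \geq 1$ we have
\[ \sum_{\|n\|_2 > N} |a_n|^2 = N^{-2} \sum_{\|n\|_2 > N} N^2 |a_n|^2 \leq N^{-2} \sum_{\|n\|_2 > N} \bigl(1 + \|n\|_2^2\bigr) |a_n|^2 \ll N^{-2}, \]
and so we see that the above tail sum goes to zero uniformly for all $f \in K$
as $N \to \infty$.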
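The tail estimate above is elementary, but it may help to see it for one concrete sequence. The following is a minimal sketch for $d = 1$ with hypothetical Fourier coefficients $a_n = (1 + |n|)^{-2}$, truncated to $|n| \leq 500$; the function name `tail_and_bound` is an assumption of the sketch.

```python
import numpy as np

def tail_and_bound(a, n, N):
    """Return the tail sum over |n| > N of |a_n|^2 and the bound
    N^{-2} * sum over |n| > N of (1 + n^2) |a_n|^2 used in the proof."""
    mask = np.abs(n) > N
    tail = np.sum(np.abs(a[mask]) ** 2)
    bound = np.sum((1 + n[mask] ** 2) * np.abs(a[mask]) ** 2) / N ** 2
    return tail, bound

n = np.arange(-500, 501)
a = 1.0 / (1.0 + np.abs(n)) ** 2          # hypothetical coefficients
pairs = [tail_and_bound(a, n, N) for N in (1, 5, 50)]
```

Each pair in `pairs` consists of a tail sum and its bound; the bound is independent of the particular $f \in K$ once $\sum (1 + n^2)|a_n|^2$ is controlled.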
To see that $K$ is totally bounded we fix some $\varepsilon > 0$ and choose $N$ such
that the above statement becomes
\[ \sum_{\|n\|_2 > N} |a_n|^2 < \varepsilon^2/4 \]
for all $f = \sum_n a_n \chi_n \in K$. Next take a finite $\varepsilon/2$-dense subset of the finite-dimensional
compact set
\[ \Bigl\{ f = \sum_{\|n\|_2 \leq N} a_n \chi_n \;\Big|\; \|f\|_2 \leq 1 \text{ and } \|\partial_j f\|_2 \leq 1 \text{ for } j = 1, \dots, d \Bigr\}. \]
Combining these statements shows that we have found an $\varepsilon$-dense subset of $K$.
As $\varepsilon > 0$ was arbitrary, $K$ is totally bounded, which implies that the closure
of $\imath_{1,0}\bigl(B_1^{H^1(\mathbb{T}^d)}\bigr)$ is compact.
Exercise 6.59. Consider the map ık,ℓ : H k (Td ) → H ℓ (Td ) for k > ℓ > 0.
(a) Characterize those k and ℓ for which the map ık,ℓ is compact.
(b) Characterize those k for which the map ık,0 ı∗k,0 is Hilbert–Schmidt class.
(c) Characterize those k for which the map ık,0 ı∗k,0 is trace-class.
6.4.2 A Self-Adjoint Compact Right Inverse on Open Subsets
The following provides the link between the Laplace operator and our dis-
cussion of compact self-adjoint operators in Theorem 6.27. The compactness
claim is a special case of Rellich’s Theorem.
Proposition 6.60 (Self-adjoint compact right inverse). Let U ⊆ Rd
be a bounded and open subset. Using Lemma 5.41 we equip H01 (U ) with the
inner product h·, ·i1 . Then the map ı = ı1,0 : H01 (U ) −→ H 0 (U ) = L2 (U ) has
the property that ∆(ıı∗ f ) ∈ L2 (U ) exists for all f ∈ L2 (U ) and equals −f . In
other words, ∆ ◦ (−ıı∗ ) = I is the identity on L2 (U ). Finally, S = −ıı∗ is a
compact self-adjoint operator L2 (U ) −→ L2 (U ).
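Proof. Recall the map $\imath : H_0^1(U) \to H^0(U)$ sending $f \mapsto f$ from
Proposition 5.13. The adjoint is a map $\imath^* : H^0(U) \to H_0^1(U)$, and so the composition
$\imath\imath^*$ is indeed a map from $L^2(U)$ to $L^2(U)$. By Exercise 6.17 $\imath\imath^*$ (and
hence $S$) is self-adjoint. Now let $\phi \in C_c^\infty(U)$ and $f \in L^2(U)$. Then
\begin{align*}
\langle -\imath\imath^* f, \Delta\phi\rangle_{L^2(U)} &= -\sum_j \langle \imath\imath^* f, \partial_j^2 \phi\rangle_{L^2(U)} = \sum_j \langle \partial_j \imath\imath^* f, \partial_j \phi\rangle_{L^2(U)} \\
&= \langle \imath^* f, \phi\rangle_1 = \langle f, \imath\phi\rangle_{L^2(U)} = \langle f, \phi\rangle_{L^2(U)}
\end{align*}
shows that $\Delta \circ (-\imath\imath^*) = I$, as claimed.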

It remains to show that $S$ is a compact operator. For this notice that $\imath$ is
the composition of the operators
\[ H_0^1(U) \xrightarrow{\;P\;} H^1(\mathbb{T}_R^d) \xrightarrow{\;\imath_{1,0}\;} H^0(\mathbb{T}_R^d) = L^2(\mathbb{T}_R^d) \xrightarrow{\;\cdot|_U\;} L^2(U), \]
where we choose $R$ such that $U \subseteq B_R$, $P$ is the periodizing operator from
Lemma 5.36, $\imath_{1,0}$ is the operator from Proposition 5.3 that simply forgets
regularity and is compact by Lemma 6.58, and finally $\cdot|_U$ is the restriction
operator to $U$. We also note that we equip $H_0^1(U)$ with the norm derived
from the inner product $\langle \cdot, \cdot\rangle_1$ in Lemma 5.40 but $H^1(\mathbb{T}^d)$ with the standard
Sobolev norm. By Lemma 5.41 this is not an issue since the norm derived
from $\langle \cdot, \cdot\rangle_1$ is equivalent to the standard Sobolev norm on $H_0^1(U)$. As $\imath$ is
the composition of bounded operators and a compact operator, Lemma 6.3
applies, and it follows that $\imath$ and also $S$ are compact operators.
Proof of Theorem 6.56. Let $S = -\imath\imath^* : L^2(U) \to L^2(U)$ be the self-adjoint
compact operator from Proposition 6.60. Theorem 6.27 gives an orthonormal
basis $(f_n)$ of eigenvectors with $S f_n = \mu_n f_n$ and $\mu_n \to 0$ as $n \to \infty$. By
Proposition 6.60 we have $\Delta(S f_n) = f_n$, so that $\Delta(\mu_n f_n) = f_n$. It follows
that $\mu_n \neq 0$, so the eigenvectors $f_n = \mu_n^{-1} S f_n$ all lie in the image of $\imath$ and
hence belong to $H_0^1(U)$. Moreover, $\Delta f_n = \lambda_n f_n$ with $\lambda_n = \frac{1}{\mu_n}$, and so we have $|\lambda_n| \to \infty$
as $n \to \infty$. Since $S = -\imath\imath^*$ we have $\mu_n = \langle S f_n, f_n\rangle_{L^2(U)} = -\langle \imath^* f_n, \imath^* f_n\rangle_1 \leq 0$,
so that $\lambda_n < 0$ for all $n \geq 1$ and $\lambda_n \to -\infty$ as $n \to \infty$.
It remains to show that $f_n$ is smooth in $U$ and $\Delta f_n = \lambda_n f_n$ for all $n \geq 1$,
and this is precisely the statement of Corollary 5.47.
There are very few examples of domains $U$ for which one can write down
the eigenfunctions of the Laplace operator explicitly. Important exceptions
are rectangles $U = (0, a_1) \times (0, a_2) \times \cdots \times (0, a_d)$ and balls $U = B_M^{\mathbb{R}^d}$.
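For the simplest rectangle, the interval $U = (0, a)$ with $d = 1$, the Dirichlet eigenfunctions are $\sin(k\pi x/a)$ with eigenvalues $-(k\pi/a)^2$. The following is a minimal sketch comparing these with the eigenvalues of a finite-difference discretization; the discretization is an assumption of the sketch and not the method of the text, and the function name `dirichlet_laplacian_1d` is hypothetical.

```python
import numpy as np

def dirichlet_laplacian_1d(a, m):
    """Second-difference matrix on m interior grid points of (0, a),
    with zero boundary values built in."""
    h = a / (m + 1)
    main = -2.0 * np.ones(m)
    off = np.ones(m - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h ** 2

a, m = 1.0, 400
evals = np.linalg.eigvalsh(dirichlet_laplacian_1d(a, m))   # ascending order, all negative
numeric = evals[::-1][:5]                                  # five eigenvalues closest to 0
exact = -(np.arange(1, 6) * np.pi / a) ** 2                # -(k pi / a)^2 for k = 1..5
```

The computed eigenvalues are negative and tend to $-\infty$, in line with the sign and growth statements of Theorem 6.56.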
Exercise 6.61. (a) Let $U = (0, a_1) \times \cdots \times (0, a_d)$. Show that the functions $f_n$ arising as
eigenfunctions of the Laplace operator as in Theorem 6.56 can be chosen to take the form
\[ f_n(x) = \sin\bigl(\lambda_n^{(1)} x_1\bigr) \cdots \sin\bigl(\lambda_n^{(d)} x_d\bigr). \]
(b) Let U = {(x1 , x2 ) ∈ (0, 1) × (0, 1) | x1 + x2 < 1}. Find an orthonormal basis of L2 (U )
consisting of eigenfunctions of the Laplace operator and satisfying the Dirichlet boundary
value conditions.
Exercise 6.62. Assume that $d \geq 2$ (or that $d = 2$ for simplicity). Let $U \subseteq \mathbb{R}^d$ be open
and $K \subseteq U$ a compact subset. Let $f \in H_0^1(U)$ be an eigenfunction of $\Delta$ (and of $S$ as in
the proof of Theorem 6.56) such that $\Delta f = \lambda f$ for some $\lambda < -1$. Show that
\[ \|f\|_{K,\infty} \ll_{K,U} |\lambda|^{\frac{d}{4} + \frac{1}{2}} \|f\|_2. \]
6.4.3 Eigenfunctions on a Drum
(19)
We now describe a concrete case of Theorem 6.56. As mentioned earlier,
a concrete description of the Laplace eigenfunctions is generally impossible
unless the domain has special features. Thus a natural case beyond the open
rectangle considered in Exercise 6.61 is to set $U$ to be the open unit disc $B_1^{\mathbb{R}^2}$.
For a given eigenfunction $f \in H_0^1(U)$ of $\Delta$ and some rotation matrix
\[ k(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} \]
for $\phi \in [0, 2\pi)$ as in Section 1.1 we may consider the function $f^k(x) = f(kx)$.
A simple calculation (which may be carried out using Proposition 1.5) shows
that $f^k$ is also an eigenfunction of $\Delta$ on $U$ with the same eigenvalue as $f$.
Since the eigenspace of $H_0^1$ functions of $\Delta$ for a given eigenvalue is finite-dimensional,
it follows that we can find for any given eigenvalue a basis
of the eigenspace with the property that every basis vector also has some
weight $n \in \mathbb{Z}$ for the action of $K$ on $U$ (cf. Corollary 3.89).
Fixing the weight $n \in \mathbb{Z}$ and the eigenvalue $\lambda < 0$, the partial differential
equation $\Delta f = \lambda f$ has a convenient reformulation. In fact, a calculation
reveals that the Laplace operator $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ has the representation
\[ \Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2} \]
in polar coordinates (we will also write $f$ for the eigenfunction in polar coordinates),
and if $f$ has weight $n$ then $f(r, \theta) = F(r) e^{in\theta}$ for a function $F$
on $[0, 1]$. Since $f$ is smooth on $U$, $F$ is smooth on $(0, 1)$. Since $f$ vanishes
on $\partial U$ we have $F(1) = 0$. Moreover, if $n \neq 0$ we must also have $F(0) = 0$
(check this). Finally, the partial differential equation $\Delta f = \lambda f$ now becomes
the ordinary differential equation $\frac{d^2 F}{dr^2} + \frac{1}{r} \frac{dF}{dr} - \frac{n^2}{r^2} F = \lambda F$, or, equivalently,
\[ r^2 \frac{d^2 F}{dr^2} + r \frac{dF}{dr} + \bigl(|\lambda| r^2 - n^2\bigr) F = 0, \tag{6.30} \]
with the conditions on $F(0)$ and $F(1)$ as explained above. The differential
equation
\[ x^2 J_n'' + x J_n' + (x^2 - n^2) J_n = 0 \tag{6.31} \]
on $(0, \infty)$ is known as Bessel’s equation and the solutions are called the Bessel
functions, one of a class of special functions introduced by the astronomer
Bessel in 1817 in connection with the problem of three bodies moving under
mutual gravitational attraction. The two equations (6.30) and (6.31) are
essentially equivalent by setting $x = |\lambda|^{1/2} r$ and $F(r) = J_n(|\lambda|^{1/2} r)$.
Since (6.31) is a linear second-order differential equation there are two
linearly independent real solutions for each $\lambda$ and $n$. The function $J_n$ is
characterized up to a scalar multiple by the condition that $\lim_{x \to 0} J_n(x)$
exists (see Exercise 6.63(b)). Bessel found the integral representation
\[ J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x \sin t - nt)\, dt \tag{6.32} \]
of the function $J_n$ (we refer to Whittaker and Watson [113] for a general
treatment of special functions). We will not develop this theory further, but
refer to Figures 6.1 and 6.2 for a visualization of the resulting functions; in modelling
the behaviour of a drum the time variable is also needed, and so these
illustrations may be thought of as snapshots of an oscillating drum skin (as
alluded to in Section 1.2.2).
Fig. 6.1: Two eigenfunctions with weight n = 0
Fig. 6.2: Two eigenfunctions with weight n = 1
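The integral representation (6.32) lends itself to a quick numerical experiment. The following is a minimal sketch that evaluates $J_n$ by the trapezoid rule and measures the residual of Bessel’s equation (6.31) with central differences; the quadrature, the step sizes, and the function names `bessel_j` and `bessel_residual` are assumptions of the sketch.

```python
import numpy as np

def bessel_j(n, x, samples=2001):
    """Approximate J_n(x) from the integral (6.32) using the trapezoid rule."""
    t = np.linspace(0.0, np.pi, samples)
    f = np.cos(x * np.sin(t) - n * t)
    h = t[1] - t[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1])) / np.pi

def bessel_residual(n, x, h=1e-3):
    """Residual of x^2 J'' + x J' + (x^2 - n^2) J with central differences."""
    jm, j0, jp = bessel_j(n, x - h), bessel_j(n, x), bessel_j(n, x + h)
    d1 = (jp - jm) / (2 * h)
    d2 = (jp - 2 * j0 + jm) / h ** 2
    return x ** 2 * d2 + x * d1 + (x ** 2 - n ** 2) * j0

first_zero_j0 = 2.404825557695773     # first positive zero of J_0 (standard tabulated value)
value_at_zero = bessel_j(0, first_zero_j0)
residual = bessel_residual(1, 3.0)
```

In view of Exercise 6.63(d), zeros of $J_n$ such as `first_zero_j0` are what determine the eigenvalues of weight $n$ on the unit disc.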
Exercise 6.63. Make the discussion of this section complete by the following steps.
(a) Prove that $J_n$ as defined in (6.32) satisfies the differential equation (6.31).
(b) Show that the equation (6.31) has a solution $Y_n$ with $Y_n(x) \to -\infty$ as $x \to 0$ given by
\[ Y_n(x) = \frac{1}{\pi} \int_0^{\pi} \sin(x \sin t - nt)\, dt - \frac{1}{\pi} \int_0^{\infty} \bigl(e^{nt} + (-1)^n e^{-nt}\bigr) e^{-x \sinh t}\, dt. \]
(The solutions $J_n$ and $Y_n$ are referred to as Bessel functions of the first and second kind,
respectively.)
(c) Show that for every $n \in \mathbb{Z}$ there is an eigenfunction of weight $n$.
(d) Show that for every $n \in \mathbb{Z}$ the eigenvalues (and eigenfunctions) of weight $n$ correspond
to the zeros of $J_n$.
6.4.4 Weyl’s Law
The eigenfunctions and eigenvalues arising in Theorem 6.56 are mysterious.

While their existence and some of their properties are readily proved, it is
not usually possible to describe them analytically in closed form unless the
open set is very special, so numerical approximations are often all that is
available. It is, however, possible to count them asymptotically as expressed
in this result of 1911 by Weyl [111].(20) Recall that U ⊆ Rd is said to be
Jordan measurable if it is bounded and its characteristic function is Riemann

integrable.
Theorem 6.64 (Weyl’s law). Let $U \subseteq \mathbb{R}^d$ be open and Jordan measurable.
Let $N(T) = |\{n \mid |\lambda_n| \leq T\}|$ denote the number of eigenvalues of $\Delta$ on $U$ with
eigenfunctions in $H_0^1(U)$ and absolute value bounded by $T$ (with repetitions
allowed, just as in Theorem 6.56). Then
\[ \lim_{T \to \infty} \frac{N(T)}{T^{d/2}} = (2\pi)^{-d} \omega_d\, m(U), \tag{6.33} \]
where $m$ is the Lebesgue measure on $\mathbb{R}^d$ and $\omega_d = m\bigl(B_1^{\mathbb{R}^d}\bigr)$ is the volume of
the unit ball in $\mathbb{R}^d$.
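Weyl’s law can be observed numerically on the unit square. The following is a minimal sketch, assuming the Dirichlet eigenvalues of $U = (0, 1)^2$ are $-\pi^2(m^2 + n^2)$ for integers $m, n \geq 1$ (compare Exercise 6.61), so that the predicted limit in (6.33) is $(2\pi)^{-2} \cdot \pi \cdot 1 = 1/(4\pi)$; the function name `count_square_eigenvalues` is hypothetical.

```python
import numpy as np

def count_square_eigenvalues(T):
    """N(T) for U = (0,1)^2, assuming eigenvalues -pi^2 (m^2 + n^2), m, n >= 1."""
    r2 = T / np.pi ** 2
    mmax = int(np.floor(np.sqrt(r2)))
    count = 0
    for m in range(1, mmax + 1):
        rem = r2 - m * m
        if rem >= 1:
            count += int(np.floor(np.sqrt(rem)))   # admissible n for this m
    return count

weyl_constant = np.pi / (2 * np.pi) ** 2           # (2 pi)^{-d} omega_d m(U) for d = 2
ratio_small = count_square_eigenvalues(1e4) / 1e4
ratio_large = count_square_eigenvalues(1e6) / 1e6
```

The ratio $N(T)/T$ approaches the Weyl constant slowly from below, the deficit being a boundary effect of lower order in $T$.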
In 1966 M. Kac [50] asked ‘Can one hear the shape of a drum?’ As we
explained in Section 1.2.2, the eigenvalues of the Laplacian on an open set U
relate directly to the frequencies at which a membrane with the shape U
would vibrate. Thus the notes one hears from a drum with shape U are
precisely related to the eigenvalues of the Laplacian and the question raised
by Kac asks whether the list of eigenvalues determines U (up to isometric
motions of Rd ). One of the consequences of Theorem 6.64 is that the size of
the drum certainly can be heard in this sense. Kac’s question was answered
in the negative.(21)
Our (by now well-established) approach is to first show the result for the
torus, and we will then apply a technique known as Dirichlet–Neumann brack-
eting to extend the proof to the general case.
Proposition 6.65. Let R > 0 and U = TdR = Rd /(2RZd ) or U = (0, R)d .
Then Weyl’s law holds for the eigenvalues of the Laplacian on U .
Proof. In both cases, write (λ_n) for the eigenvalues and (f_n) for the
associated eigenfunctions in H^1(T^d_R) resp. H_0^1(U). In the case of T^d_R we
know that the basis of eigenfunctions is given by (χ_n), where
\[
\chi_n(x) = e^{2\pi i (2R)^{-1}(n_1 x_1 + \cdots + n_d x_d)}
\]
for x ∈ T^d_R and n ∈ Z^d. The Laplace eigenvalue of χ_n is given by
\[
\lambda_n = -(2\pi)^2 (2R)^{-2} \|n\|_2^2
\]
for all n ∈ Z^d. Hence
\[
N(T) = \bigl|\{n \in \mathbb{Z}^d \mid (2\pi)^2 (2R)^{-2}\|n\|_2^2 \le T\}\bigr|
     = \bigl|\{n \in \mathbb{Z}^d \mid \|n\|_2^2 \le T (2R)^2 (2\pi)^{-2}\}\bigr|
     = \bigl|\mathbb{Z}^d \cap B^{\mathbb{R}^d}_{T^{1/2}\, 2R/(2\pi)}\bigr|
\]
is determined by a lattice point counting problem.

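Using the fundamental domain F = [−1/2, 1/2)^d for Z^d in R^d we have
\[
B^{\mathbb{R}^d}_{S-\sqrt{d}} \subseteq \bigl(\mathbb{Z}^d \cap B^{\mathbb{R}^d}_{S}\bigr) + F \subseteq B^{\mathbb{R}^d}_{S+\sqrt{d}}
\]
for any S > √d (see Figure 6.3). Taking the Lebesgue measure (satisfying
m(F) = 1) we obtain
\[
\omega_d (S-\sqrt{d})^d \le \bigl|\mathbb{Z}^d \cap B^{\mathbb{R}^d}_{S}\bigr| \le \omega_d (S+\sqrt{d})^d
\]
and
\[
\lim_{S\to\infty} \frac{\bigl|\mathbb{Z}^d \cap B^{\mathbb{R}^d}_{S}\bigr|}{S^d} = \omega_d.
\]
Combining the two discussions and setting S to be T^{1/2} 2R/(2π) gives
\[
\lim_{T\to\infty} \frac{N(T)}{(T^{1/2}\, 2R/(2\pi))^d} = \omega_d,
\]
or equivalently (6.33) for U = T^d_R.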
Fig. 6.3: Counting lattice points in a ball for d = 2.
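The lattice point count used above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `unit_ball_volume` and `count_lattice_points` are hypothetical, and the code simply enumerates the points of Z^d in a ball by recursion on the dimension and compares the count with ω_d S^d.

```python
import math

def unit_ball_volume(d):
    """Volume omega_d of the unit ball in R^d (standard Gamma-function formula)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def count_lattice_points(radius_sq, d):
    """Number of n in Z^d with n_1^2 + ... + n_d^2 <= radius_sq (d >= 1)."""
    if radius_sq < 0:
        return 0
    # Largest integer m with m^2 <= radius_sq.
    m = math.isqrt(math.floor(radius_sq))
    if d == 1:
        return 2 * m + 1
    # Fix the last coordinate k and count the remaining d - 1 coordinates.
    return sum(count_lattice_points(radius_sq - k * k, d - 1)
               for k in range(-m, m + 1))

for S in (10, 50, 200):
    count = count_lattice_points(S * S, 2)
    # The ratio should approach 1 as S grows, as in the limit above.
    print(S, count, count / (unit_ball_volume(2) * S ** 2))
```

The sandwich ω_d (S − √d)^d ≤ |Z^d ∩ B_S| ≤ ω_d (S + √d)^d can be verified for each printed value in the same way.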
We now extend the result to U = (0, R)^d. For this let n ∈ N_0^d and note
that the characters (χ_m) on T^d_R for m = (±n_1, ..., ±n_d) all have the same
eigenvalue −4π²(2R)^{-2}‖n‖_2² for the Laplacian. Taking linear combinations of
the characters we obtain the eigenfunctions
\[
f_1(\pi R^{-1} n_1 x_1) \cdots f_d(\pi R^{-1} n_d x_d), \tag{6.34}
\]
where each f_j for j = 1, ..., d is either the sine or cosine function. If n ∈ N^d
this gives 2^d linearly independent eigenfunctions, while for n ∈ N_0^d ∖ N^d this
gives 2^e linearly independent eigenfunctions, where e is the number of non-zero
components of n. Notice that the linear hull of these eigenfunctions coincides
with the linear hull of all characters, and that these are mutually orthogonal.
We claim that the functions that only involve sine functions form an orthogonal
basis of L²(U) consisting of eigenfunctions of ∆ in H_0^1(U). Assuming
the claim, we obtain precisely one eigenfunction for every n ∈ N^d, so
\[
N_U(T) = \bigl|\{n \in \mathbb{N}^d \mid 4\pi^2 (2R)^{-2}\|n\|_2^2 \le T\}\bigr|
       = \bigl|\mathbb{N}^d \cap B^{\mathbb{R}^d}_{T^{1/2}\, 2R/(2\pi)}\bigr|,
\]
and so
\[
\lim_{T\to\infty} \frac{N_U(T)}{T^{d/2}}
  = \lim_{T\to\infty} \frac{N_{\mathbb{T}^d_R}(T)}{2^d\, T^{d/2}}
  = (2\pi)^{-d}\,\omega_d\, m(U).
\]
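As a numerical illustration of this limit, the following sketch counts the Dirichlet eigenvalues π²R^{-2}‖n‖_2² ≤ T of the cube (0, R)^d directly and compares N_U(T)/T^{d/2} with the Weyl constant. The helper names are hypothetical; the convergence is slow because of a boundary correction, so only rough agreement should be expected.

```python
import math

def unit_ball_volume(d):
    """Volume omega_d of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def _positive_points(radius_sq, d):
    """Number of n in N^d (all n_j >= 1) with |n|_2^2 <= radius_sq."""
    if radius_sq < 1:
        return 0
    m = math.isqrt(math.floor(radius_sq))
    if d == 1:
        return m
    return sum(_positive_points(radius_sq - k * k, d - 1)
               for k in range(1, m + 1))

def dirichlet_count(T, R, d):
    """N_U(T) for U = (0, R)^d, using eigenvalues -(pi / R)^2 |n|_2^2."""
    return _positive_points(T * R * R / math.pi ** 2, d)

def weyl_constant(R, d):
    """Right-hand side of (6.33) for U = (0, R)^d, where m(U) = R^d."""
    return (2 * math.pi) ** (-d) * unit_ball_volume(d) * R ** d

for T in (1e3, 1e4, 1e5, 1e6):
    ratio = dirichlet_count(T, 1.0, 2) / T
    print(T, ratio, weyl_constant(1.0, 2))
```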
For the proof of the claim we apply the discussion of even and odd functions
from Section 1.1 in d dimensions. For this we identify L2 ((0, R)d ) with the
subspace of functions in L2 ((−R, R)d ) = L2 (TdR ) that are odd with respect
to all coordinates. More precisely, given f ∈ L2 ((0, R)d ) we define f˜|U = f
and
f˜(ε1 t1 , ε2 t2 , . . . , εd td ) = ε1 · · · εd f˜(t1 , . . . , td )
for ε1 , . . . , εd ∈ {±1} and (t1 , . . . , td ) ∈ U (and the same formula then holds
for all t ∈ (−R, R)^d). Expand f̃ into eigenfunctions of the form (6.34) for
all n ∈ N_0^d. If g is one of these, then either g is the product only of sine
functions or it is even with respect to one or more of the variables; assume
that it is even with respect to x_k. Using the substitution x_k → −x_k in the
inner product we obtain
\[
\langle \tilde f, g\rangle_{L^2((-R,R)^d)} = \langle -\tilde f, g\rangle_{L^2((-R,R)^d)}
\]
and so ⟨f̃, g⟩_{L²((−R,R)^d)} = 0. This shows that f̃ is expressed using products
of sine functions only. We also note that for any f, g ∈ L²((0, R)^d) we have
\[
\langle \tilde f, \tilde g\rangle_{L^2((-R,R)^d)} = 2^d\, \langle f, g\rangle_{L^2((0,R)^d)},
\]
which may be seen by splitting (−R, R)^d into 2^d smaller cubes and substituting
y_j = ±x_j for j = 1, ..., d and x ∈ (0, R)^d on each one of them. It
follows that the functions of the form x ↦ sin(πR^{-1}n_1x_1) ··· sin(πR^{-1}n_dx_d)
for n ∈ N^d are an orthogonal basis of L²(U). As these functions also vanish
on ∂U it follows that they belong to H_0^1(U) (see Exercise 6.66), proving the
proposition. The cautious reader may notice that we have only found an or-
thonormal basis of L2 (U ) in H01 (U ) consisting of eigenfunctions of ∆ as in
Theorem 6.56. However, as ∆ is not a well-defined operator it is not clear
whether this basis is the same as the one in Theorem 6.56. This is resolved
in Lemma 6.67(a).
Essential Exercise 6.66. (a) Show that the function x ↦ sin(πR^{-1}nx) lies
in H_0^1((0, R)) for all n ≥ 1.
(b) Formulate and show the analogous result for U = (0, R)^d.
Lemma 6.67. Let U ⊆ R^d be open and bounded.
(a) If a function f ∈ H_0^1(U) ∩ C^∞(U) satisfies ∆f = λf, then
\[
\langle f, g\rangle_1 = -\lambda\, \langle f, g\rangle_{L^2(U)} \tag{6.35}
\]
for all g ∈ H_0^1(U). If this holds for a non-trivial f then λ < 0 and we
have S(f) = λ^{-1}f (where S = −ıı* is as in Proposition 6.60). In particular,
the eigenspaces inside H_0^1(U) of ∆ coincide with those of S.
(b) If f_1, f_2, ... ∈ H_0^1(U) ∩ C^∞(U) are eigenfunctions of ∆ with
eigenvalues 0 > λ_1 ≥ λ_2 ≥ ··· that form an orthonormal basis of L²(U), then
\[
|\lambda_1|^{-1/2} f_1,\ |\lambda_2|^{-1/2} f_2,\ \ldots \tag{6.36}
\]
form an orthonormal basis of H_0^1(U) with respect to ⟨·,·⟩_1.
(c) Moreover, if g ∈ H_0^1(U) and a_n = ⟨g, f_n⟩_{L²(U)} then g = Σ_{n=1}^∞ a_n f_n
converges in L²(U) and in H_0^1(U).
Proof. For the proof of (a), suppose that f ∈ H_0^1(U) ∩ C^∞(U) satisfies the
equation ∆f = λf. Let φ ∈ C_c^∞(U). Then
\[
\langle f, \varphi\rangle_1 = \sum_j \langle \partial_j f, \partial_j \varphi\rangle_{L^2(U)}
  = -\langle \Delta f, \varphi\rangle_{L^2(U)} = -\lambda\, \langle f, \varphi\rangle_{L^2(U)}.
\]
Since C_c^∞(U) is dense in H_0^1(U), we obtain (6.35) for all g ∈ H_0^1(U). In
particular, we may set g = f to obtain ⟨f, f⟩_1 = −λ⟨f, f⟩_{L²(U)}, which
implies λ < 0 if we assume that f is non-trivial. Recall that ı : H_0^1(U) → L²(U)
is defined by ı(f) = f. With this, (6.35) gives
\[
\langle \imath^*(-\lambda f), g\rangle_1 = \langle -\lambda f, \imath(g)\rangle_{L^2(U)}
  = \langle -\lambda f, g\rangle_{L^2(U)} = \langle f, g\rangle_1
\]
for all g ∈ H_0^1(U). This implies ı*(−λf) = f and S(f) = λ^{-1}f.
For the proof of (b), let f_1, f_2, ... be an orthonormal basis of L²(U)
in H_0^1(U) ∩ C^∞(U) as in the lemma. Then ⟨f_k, f_ℓ⟩_1 = −λ_k⟨f_k, f_ℓ⟩_{L²(U)} = 0
for k, ℓ ≥ 1 with k ≠ ℓ, and ⟨f_k, f_k⟩_1 = |λ_k|⟨f_k, f_k⟩_{L²(U)} = |λ_k| by part (a).
This shows that (6.36) forms an orthonormal list of vectors in H_0^1(U) with
respect to ⟨·,·⟩_1. To see that these form an orthonormal basis of H_0^1(U),
suppose that f ∈ H_0^1(U) is orthogonal to all of these functions with respect
to ⟨·,·⟩_1. Then
\[
0 = \langle f, f_n\rangle_1 = -\lambda_n\, \langle f, f_n\rangle_{L^2(U)}
\]
by assumption and (a). Hence ⟨f, f_n⟩_{L²(U)} = 0 for all n ≥ 1, and so we must
have f = 0 since we assumed that the sequence f_1, f_2, ... is an orthonormal
basis of L²(U), and the map ı : H_0^1(U) → L²(U) is injective.
For the final claim, let g = Σ_{n=1}^∞ b_n f_n be the expansion in H_0^1(U), and
notice that ı : H_0^1(U) → L²(U) is a bounded operator, and in particular must
send a convergent series to a convergent series, so that b_n = a_n for all n ≥ 1.
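The identity ⟨f, g⟩_1 = −λ⟨f, g⟩_{L²(U)} from part (a) can be checked numerically in the simplest case. The sketch below assumes U = (0, R) in dimension one, the normalized eigenfunctions f_n(x) = (2/R)^{1/2} sin(πnx/R) with λ_n = −(πn/R)², and ⟨f, g⟩_1 = ∫ f′g′ as in the proof above; the function names and the midpoint quadrature are illustrative choices only.

```python
import math

def midpoint_integral(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def eigenfunction(n, R):
    return lambda x: math.sqrt(2 / R) * math.sin(math.pi * n * x / R)

def eigenfunction_derivative(n, R):
    return lambda x: (math.sqrt(2 / R) * (math.pi * n / R)
                      * math.cos(math.pi * n * x / R))

def inner_products(n, m, R=1.0):
    """Return (<f_n, f_m>_{L^2}, <f_n, f_m>_1) on U = (0, R)."""
    f, g = eigenfunction(n, R), eigenfunction(m, R)
    df, dg = eigenfunction_derivative(n, R), eigenfunction_derivative(m, R)
    l2 = midpoint_integral(lambda x: f(x) * g(x), 0.0, R)
    h1 = midpoint_integral(lambda x: df(x) * dg(x), 0.0, R)
    return l2, h1

l2, h1 = inner_products(3, 3, R=2.0)
# h1 should be close to |lambda_3| * l2 with |lambda_3| = (3 * pi / 2)^2.
print(l2, h1, (3 * math.pi / 2.0) ** 2)
```

Dividing f_n by |λ_n|^{1/2} therefore gives a vector of norm one with respect to ⟨·,·⟩_1, which is the normalization in (6.36).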

Our proof of Weyl’s law (Theorem 6.64) relies crucially on the following
reformulation of the counting function.
Lemma 6.68 (Variational characterization). Let U ⊆ R^d be open and
bounded. Let f_1, f_2, ..., λ_1, λ_2, ..., and N(T) for T > 0 be as in
Theorem 6.56. Then
\[
N(T) = \max\bigl\{\dim V \mid V \subseteq H_0^1(U),\ \|f\|_1 \le T^{1/2}\|f\|_{L^2} \text{ for all } f \in V\bigr\} \tag{6.37}
\]
for all T > 0.
Proof. Fix some T > 0. Suppose that |λ_1|, ..., |λ_n| ≤ T and |λ_k| > T
for all k > n (so that n = N(T)). Define V_0 = ⟨f_1, ..., f_n⟩. Applying
Lemma 6.67(a) to each f_k for k = 1, ..., n and g = Σ_{ℓ=1}^n a_ℓ f_ℓ we get
\[
\|g\|_1^2 = \Bigl\langle \sum_{k=1}^n a_k f_k, \sum_{\ell=1}^n a_\ell f_\ell \Bigr\rangle_1
  = \sum_{k=1}^n a_k |\lambda_k| \Bigl\langle f_k, \sum_{\ell=1}^n a_\ell f_\ell \Bigr\rangle_{L^2(U)}
  = \sum_{k=1}^n |\lambda_k| |a_k|^2
  \le T\, \Bigl\| \sum_{k=1}^n a_k f_k \Bigr\|_{L^2(U)}^2
\]
for all (a_1, ..., a_n) ∈ C^n. This already gives N(T) = dim V_0 ≤ max dim V
with V as in (6.37).
For the reverse inequality, assume that V ⊆ H_0^1(U) has dim V > N(T).
Then there exists a non-trivial function f ∈ V ∩ V_0^⊥ (because, for example,
any f ∈ V induces a linear functional on V_0 by taking the inner product,
and for dimension reasons the resulting linear map V → V_0^* cannot be
injective). Since V_0^⊥ = ⟨f_{n+1}, f_{n+2}, ...⟩ we may write f = Σ_{k>n} a_k f_k with the
sum converging in L²(U) and in H_0^1(U) by Lemma 6.67(c). Therefore using
Lemma 6.67(a) as above we obtain
\[
\|f\|_1^2 = \Bigl\langle \sum_{k>n} a_k f_k, \sum_{\ell>n} a_\ell f_\ell \Bigr\rangle_1
  = \sum_{k>n} |\lambda_k| |a_k|^2 > T \sum_{k>n} |a_k|^2 = T\, \|f\|_{L^2}^2,
\]
and we see that V does not satisfy the requirement in (6.37). Hence any
subspace V as in (6.37) would satisfy dim V ≤ dim V_0 = N(T) and the
lemma follows.
Proof of Theorem 6.64. Notice first that Lemma 6.68 implies for disjoint
open subsets U_1 and U_2 of a bounded open set U the sub-additivity
\[
N_{U_1}(T) + N_{U_2}(T) \le N_U(T), \tag{6.38}
\]
where we write N_{U′}(T) for the counting function for an open and bounded
domain U′ ⊆ R^d. Indeed, on extending functions to be zero outside U_j we may
write H_0^1(U_j) ⊆ H_0^1(U) for j = 1, 2 as in Exercise 5.27, and, once embedded,
we have H_0^1(U_1) ⊥ H_0^1(U_2), with respect to both ⟨·,·⟩_1 and ⟨·,·⟩_{L²(U)}, since U_1
and U_2 are disjoint, so that we can take the direct sum of the subspaces
realising the maximum for U_1 and U_2 appearing in Lemma 6.68. We note that,
although it is tempting to try, it is not possible to derive the estimate (6.38)
directly by expanding eigenfunctions on U_1 and U_2 to eigenfunctions on U as
there would be no reason to expect these functions to be smooth on ∂U_1 ∩ U.
The variational characterization in Lemma 6.68 avoids this issue.
Now let U ⊆ (−R, R)d be an open Jordan measurable subset. By Jordan
measurability, for any ε > 0 we can divide (−R, R)d into finitely many cubes
so that we can approximate U from the inside and from the outside by finite
disjoint unions of small cubes, as illustrated in Figure 6.4.
Fig. 6.4: Approximating U by two pixelated versions of U , one from inside and
one from outside.
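The pixelated approximation can be made concrete for a specific set. The sketch below takes U to be the open unit disk in R² (an assumed example, not one used in the text) and covers [−1, 1]² by a grid of squares: it sums the areas of the squares contained in U and of the squares meeting U, so that the two totals bracket m(U) = π and their difference shrinks as the grid is refined. The function name `pixel_areas` is hypothetical.

```python
import math

def pixel_areas(cells_per_side):
    """Inner and outer grid approximations of the area of the unit disk."""
    h = 2.0 / cells_per_side
    inner = outer = 0
    for i in range(cells_per_side):
        x0 = -1.0 + i * h
        x1 = x0 + h
        for j in range(cells_per_side):
            y0 = -1.0 + j * h
            y1 = y0 + h
            # Squared distance from the origin to the farthest corner.
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # Squared distance from the origin to the nearest point of the square.
            nx = 0.0 if x0 <= 0.0 <= x1 else min(abs(x0), abs(x1))
            ny = 0.0 if y0 <= 0.0 <= y1 else min(abs(y0), abs(y1))
            near = nx * nx + ny * ny
            if far <= 1.0:    # the open square lies inside the open disk
                inner += 1
            if near < 1.0:    # the closed square meets the open disk
                outer += 1
    return inner * h * h, outer * h * h

for cells in (50, 100, 400):
    print(cells, pixel_areas(cells))
```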
Let I_1 ⊔ ··· ⊔ I_k ⊆ U and U ⊆ O_1 ⊔ ··· ⊔ O_ℓ be the approximation from
inside and outside respectively, where I_1, ..., I_k and O_1, ..., O_ℓ are translates
of the cube (0, R/n)^d, and choose n so large and the approximations so well
that
\[
m\bigl((O_1 \sqcup \cdots \sqcup O_\ell) \smallsetminus (I_1 \sqcup \cdots \sqcup I_k)\bigr) < \varepsilon.
\]
By the sub-additivity mentioned above for the various counting functions
N_U(T) we have
\[
\liminf_{T\to\infty} \frac{N_U(T)}{T^{d/2}}
  \ge \lim_{T\to\infty} \frac{N_{I_1}(T) + \cdots + N_{I_k}(T)}{T^{d/2}}
  = (2\pi)^{-d}\omega_d\, m(I_1 \sqcup \cdots \sqcup I_k)
  > (2\pi)^{-d}\omega_d\, (m(U) - \varepsilon)
\]
by Proposition 6.65.
On the other hand, we may add extra cubes to O_1 ⊔ ··· ⊔ O_ℓ to obtain
\[
O_1 \sqcup \cdots \sqcup O_\ell \sqcup E_1 \sqcup \cdots \sqcup E_n = [-R, R]^d
\]
for some cubes E_1, ..., E_n ⊆ [−R, R]^d. Hence, again by sub-additivity,
\[
\limsup_{T\to\infty} \Bigl( \frac{N_U(T)}{T^{d/2}} + \sum_{j=1}^n \frac{N_{E_j}(T)}{T^{d/2}} \Bigr)
  \le \limsup_{T\to\infty} \frac{N_{(O_1 \sqcup \cdots \sqcup O_\ell)^o}(T) + \sum_{j=1}^n N_{E_j}(T)}{T^{d/2}}
  \le \lim_{T\to\infty} \frac{N_{(-R,R)^d}(T)}{T^{d/2}}
  = (2\pi)^{-d}\omega_d\, m((-R,R)^d).
\]
Since for each j we have
\[
\lim_{T\to\infty} \frac{N_{E_j}(T)}{T^{d/2}} = (2\pi)^{-d}\omega_d\, m(E_j)
\]
it follows that
\[
\limsup_{T\to\infty} \frac{N_U(T)}{T^{d/2}}
  \le (2\pi)^{-d}\omega_d\, m(O_1 \sqcup \cdots \sqcup O_\ell)
  \le (2\pi)^{-d}\omega_d\, (m(U) + \varepsilon).
\]
Since ε > 0 was arbitrary, this and the reverse bound for the limit infimum
above prove the theorem.
Exercise 6.69. Let U ⊆ R^d be open, bounded, and Jordan measurable. Show that
\[
\lim_{n\to\infty} \frac{|\lambda_n|}{n^{2/d}} = (2\pi)^2\, (\omega_d\, m(U))^{-2/d},
\]
where λ_1 ≥ λ_2 ≥ ··· is the ordered list of eigenvalues of ∆ on U.
Exercise 6.70. Assume that d ≥ 2 (or, for simplicity, that d = 2), let U ⊆ R^d be an open,
bounded, Jordan measurable set, and let f ∈ C_c^∞(U). Show that the series Σ_{n=1}^∞ ⟨f, f_n⟩ f_n,
with (f_n) as in Theorem 6.56 ordered as in Exercise 6.69, converges pointwise on U and
uniformly on any compact subset of U.
6.5 Further Topics
• In Section 8.2.2 we return one more time to the topic of Sobolev spaces
and study elliptic regularity up to and including the boundary of U . For
further reading in that direction, we refer to Evans [30].
• The spectral theory of compact self-adjoint operators proven here is only
the starting point. We discuss spectral theory again in Chapter 9 for
unitary operators, in Chapter 11 from a general perspective as a prepar-
ation for Chapter 12 for bounded normal operators, and in Chapter 13
for unbounded self-adjoint operators.
The reader should continue with Chapter 7 and Chapter 8 as these give
important results for the chapters that follow.
Chapter 7
Dual Spaces
Let X be a real (or complex) normed vector space. A bounded linear operator
from X into the normed space R (or C) is a (continuous) linear functional
on X. Recall that the space of all continuous linear functionals is denoted X ∗
or B(X, R) and it is called the dual or conjugate space of X. Lemma 2.54
shows that X ∗ is a Banach space with respect to the operator norm.
In Section 7.1 we prove the Hahn–Banach theorem, a fundamental tool for
constructing linear functionals with prescribed properties. We also discuss
several further consequences of the Hahn–Banach theorem concerned with
the relationship between X and X ∗ . In Section 7.2 we discuss applications
of these results. Finally, in Sections 7.3 and 7.4 we will identify the duals of
many important Banach spaces, leading to examples and counter-examples
to the property of reflexivity.
7.1 The Hahn–Banach Theorem and its Consequences
One of the most important questions one may ask of X ∗ is the following: are
there ‘enough’ elements in X ∗ ? For example, are there enough elements to
separate points? This is answered in great generality using the Hahn–Banach
theorem (Theorem 7.3 below); see Corollary 7.4.
7.1.1 The Hahn–Banach Lemma and Theorem
Even though in the main applications of the Hahn–Banach lemma the func-
tion p below is simply a norm, we will also see applications of this stronger
form of the lemma (with the stated weaker assumptions on the function p)
in Section 7.4 and Section 8.6.1.
Lemma 7.1 (Hahn–Banach lemma). Let X be a real vector space, and
assume that p : X → R is a norm-like function with the properties
\[
p(x_1 + x_2) \le p(x_1) + p(x_2)
\]
and
\[
p(\lambda x_1) = \lambda\, p(x_1)
\]
for all λ ≥ 0 and x_1, x_2 ∈ X. Let Y be a subspace of X, and f : Y → R
a linear function with f(y) ≤ p(y) for all y ∈ Y. Then there exists a linear
functional F : X → R such that F(y) = f(y) for y ∈ Y, and F(x) ≤ p(x) for
all x ∈ X.
To stress the similarities of the assumptions on p to the definition of a

norm from Definition 2.1 (or semi-norm from Definition 2.11) we refer to the
function p as a norm-like function.
Proof of Lemma 7.1. Let K be the set of all pairs (Y_α, g_α) in which Y_α
is a linear subspace of X containing Y, and g_α is a real linear functional
on Y_α with g_α(y) = f(y) for all y ∈ Y, and g_α(x) ≤ p(x) for all x ∈ Y_α. We
make K into a partially ordered set by defining (Y_α, g_α) ≼ (Y_β, g_β) if Y_α ⊆ Y_β
and g_α = g_β|_{Y_α}. It is clear that any totally ordered subset {(Y_λ, g_λ) | λ ∈ I}
has an upper bound given by the subspace Y′ = ⋃_λ Y_λ and the functional
defined by g′(y) = g_λ(y) for y ∈ Y_λ and λ ∈ I. That Y′ is a subspace and
that g′ is well-defined both follow since {(Y_λ, g_λ) | λ ∈ I} is linearly ordered.
Applying Zorn’s lemma: All of this is to prepare the ground for an applic-
ation of Zorn’s lemma, which roughly speaking allows us to make a transfin-
ite induction with choices (the heart of the argument follows in the next
paragraph). Indeed, by Zorn’s lemma (see Section A.1), there is a maximal
element (Y0 , g0 ) in K. All that remains is to check that Y0 is all of X (so we
may take F to be g0 ).
Extending by one dimension: So assume for the purposes of a contradic-
tion that x ∈ X\Y0 , and let Y1 be the vector space spanned by Y0 and x.
Each element z ∈ Y1 may be expressed uniquely in the form z = y + λx
with y ∈ Y0 and λ ∈ R, because x is assumed not to be in the subspace Y0 .
Define a linear function g1 on Y1 by setting g1 (y + λx) = g0 (y) + λc, where
the constant c will be chosen to ensure that g1 is bounded by p. Note that
if y_1, y_2 ∈ Y_0, then
\[
g_0(y_1) - g_0(y_2) = g_0(y_1 - y_2) \le p(y_1 - y_2) \le p(y_1 + x) + p(-x - y_2),
\]
so −p(−x − y_2) − g_0(y_2) ≤ p(y_1 + x) − g_0(y_1). It follows that
\[
A = \sup_{y \in Y_0} \{-p(-x - y) - g_0(y)\} \le \inf_{y \in Y_0} \{p(y + x) - g_0(y)\} = B.
\]
Choose c to be any number in the interval [A, B]. Then, by construction of A
and B,
\[
c \le p(y + x) - g_0(y) \tag{7.1}
\]
and
\[
-p(-x - y) - g_0(y) \le c \tag{7.2}
\]
for all y ∈ Y_0. In order to show the required bound on g_1, we consider scalars
of different sign separately. For λ > 0, multiply (7.1) by λ and substitute (1/λ)y
for y to obtain
\[
\lambda c \le p(y + \lambda x) - g_0(y) \tag{7.3}
\]
from the assumed (positive) homogeneity. Similarly, for λ < 0, multiply (7.2)
by λ, and substitute (1/λ)y for y to obtain λc ≤ |λ| p(−x − (1/λ)y) − λ g_0((1/λ)y). Using
the homogeneity assumption on p for |λ| = −λ we obtain (7.3) again. Since
the assumptions on g_0 also give (7.3) for λ = 0, we obtain
\[
g_1(y + \lambda x) = g_0(y) + \lambda c \le p(y + \lambda x)
\]
for all λ ∈ R and y ∈ Y_0.
A contradiction: Thus we have found (Y_1, g_1) ∈ K with (Y_0, g_0) ≼ (Y_1, g_1)
and Y_0 ≠ Y_1. This contradicts the maximality of (Y_0, g_0) and hence F = g_0
is defined on all of X and satisfies the conclusion of the lemma.
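The one-dimensional extension step can be illustrated in a small case. The following sketch makes several assumptions that are not in the text: X = R², p is the ℓ¹ norm, Y_0 = {(t, 0)} with g_0((t, 0)) = t, and x = (0, 1). It approximates A and B by sampling y on a finite grid (so it only estimates the supremum and infimum) and then checks the bound g_1 ≤ p for one admissible c.

```python
def p(v):
    """Norm-like function on R^2 (here the l^1 norm, an assumed example)."""
    return abs(v[0]) + abs(v[1])

def g0(t):
    """Functional on Y0 = {(t, 0)}: g0((t, 0)) = t, which satisfies g0 <= p."""
    return t

X_NEW = (0.0, 1.0)  # the vector x chosen outside Y0

def extension_interval(samples):
    """Estimate A and B using only the sampled points y = (t, 0)."""
    A = max(-p((-X_NEW[0] - t, -X_NEW[1])) - g0(t) for t in samples)
    B = min(p((t + X_NEW[0], X_NEW[1])) - g0(t) for t in samples)
    return A, B

def g1(t, lam, c):
    """Extension g1(y + lam * x) = g0(y) + lam * c for y = (t, 0)."""
    return g0(t) + lam * c

samples = [k / 10 for k in range(-100, 101)]
A, B = extension_interval(samples)
print(A, B)  # any c between these two values gives an admissible extension
```

In this example every c in [A, B] gives a different extension, which shows that the extension produced by the lemma need not be unique.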
Exercise 7.2. Let X be a real vector space and let K ⊆ X be a convex subset. Suppose
that 0 ∈ K and that for every x ∈ X there is some t > 0 with tx ∈ K. Define the gauge
function p_K(x) = inf{t > 0 | (1/t)x ∈ K}. Show that p_K is norm-like in the sense that it is
non-negative, homogeneous for positive scalars, and satisfies the triangle inequality (the
latter two being assumptions in Lemma 7.1).
For real vector spaces, the Hahn–Banach theorem follows at once (for
complex spaces a little more work is needed).
Theorem 7.3 (Hahn–Banach theorem). Let X be a real or complex
normed space, and Y a linear subspace. Then for any y* ∈ Y* there exists
an x* ∈ X* such that ‖x*‖ = ‖y*‖ and x*(y) = y*(y) for all y ∈ Y.
That is, any linear functional defined on a subspace may be extended to
a linear functional on the whole space, without increasing the norm.
Proof of Theorem 7.3. Assume first that X is a real normed space.
Let p(x) = ‖y*‖‖x‖ and f(x) = y*(x). Apply the Hahn–Banach lemma
(Lemma 7.1) to find an extension x* = F to the whole space. To check
that ‖x*‖ ≤ ‖y*‖, write x*(x) = θ|x*(x)| with θ ∈ {±1}. Then
\[
|x^*(x)| = \theta x^*(x) = x^*(\theta x) \le p(\theta x) = \|y^*\| \|\theta x\| = \|y^*\| \|x\|.
\]
The reverse inequality is clear, so ‖x*‖ = ‖y*‖.
Complex case: Now let X be a complex normed vector space, let Y ⊆ X be
a complex linear subspace, let y* ∈ Y*, and define a real linear functional y*_R
by y*_R(y) = ℜ(y*(y)) for y ∈ Y. Let x*_R : X → R be an extension of y*_R
with ‖x*_R‖ = ‖y*_R‖ ≤ ‖y*‖ (by the real case above). Now define
\[
x^*(x) = x^*_{\mathbb{R}}(x) - i\, x^*_{\mathbb{R}}(ix),
\]
which is once again an R-linear map from X to C. It is also C-linear since
\[
x^*(ix) = x^*_{\mathbb{R}}(ix) - i\, x^*_{\mathbb{R}}(i^2 x) = i\, x^*_{\mathbb{R}}(x) - i^2 x^*_{\mathbb{R}}(ix) = i\, x^*(x).
\]
Moreover, for y ∈ Y we have, by C-linearity of y*,
\[
x^*(y) = x^*_{\mathbb{R}}(y) - i\, x^*_{\mathbb{R}}(iy) = \Re(y^*(y)) - i\,\Re(y^*(iy))
       = \Re(y^*(y)) + i\,\Im(y^*(y)) = y^*(y).
\]
Finally, |x*(x)| = θx*(x) for some θ ∈ C with |θ| = 1, and so
\[
|x^*(x)| = \theta x^*(x) = x^*(\theta x) = x^*_{\mathbb{R}}(\theta x) \le \|y^*\| \|\theta x\| = \|y^*\| \|x\|,
\]
which shows that ‖x*‖ = ‖y*‖ and hence the complex case of the theorem.
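The formula x*(x) = x*_R(x) − i x*_R(ix) says that a complex linear functional is determined by its real part. A minimal sketch of this, assuming X = C² and a functional given by a coefficient vector (the names `y_star`, `real_part` and `rebuilt` are hypothetical):

```python
def y_star(z, coeffs):
    """A complex linear functional on C^n: z -> sum of coeffs[k] * z[k]."""
    return sum(c * w for c, w in zip(coeffs, z))

def real_part(z, coeffs):
    """The associated real linear functional z -> Re(y_star(z))."""
    return y_star(z, coeffs).real

def rebuilt(z, coeffs):
    """Recover the complex functional from its real part via u(z) - i u(iz)."""
    iz = [1j * w for w in z]
    return complex(real_part(z, coeffs), -real_part(iz, coeffs))

coeffs = [2 - 1j, 0.5 + 3j]
z = [1 + 2j, -3 + 0.25j]
print(y_star(z, coeffs), rebuilt(z, coeffs))  # the two values should agree
```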

7.1.2 Consequences of the Hahn–Banach Theorem
Many useful results follow from the Hahn–Banach theorem.

Corollary 7.4 (Separation). Let X be a non-trivial normed vector space.
Then for any x ∈ X there is a functional x* ∈ X* with ‖x*‖ = 1 and
with x*(x) = ‖x‖. Hence, if z ≠ y ∈ X then there exists an x* ∈ X* such
that x*(y) ≠ x*(z).
Proof. Note that we may assume without loss of generality that x ≠ 0.
Apply Theorem 7.3 with Y being the linear hull of x to find an extension
of the linear map y*(ax) = a‖x‖ on Y. Since |y*(ax)| = |a|‖x‖ = ‖ax‖ we
have ‖y*‖ = 1, and so we find an x* ∈ X* with ‖x*‖ = 1 and x*(x) = ‖x‖.
For the last part, take x = y − z.
Notice finally that linear functionals allow us to decompose a vector space
(see Exercises 3.27 and 3.28): let X be a normed vector space, and x* ∈ X*.
The null space or kernel of x* is the closed linear subspace
\[
\ker(x^*) = \{x \in X \mid x^*(x) = 0\}.
\]
If x* ≠ 0, then there is a point x_0 ≠ 0 such that x*(x_0) = 1. Any x ∈ X can
then be written as x = z + λx_0, with λ = x*(x) and z = x − λx_0 ∈ ker(x*).
Thus, X = ker(x*) ⊕ Y, where Y is the one-dimensional space spanned by x_0.
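The decomposition x = z + λx_0 is easy to carry out explicitly. The sketch below assumes X = R³ and a functional x*(x) = w · x given by a non-zero vector w, with the particular choice x_0 = w/(w · w); these choices and the function names are illustrative only.

```python
def functional(w, x):
    """The linear functional x -> w . x on R^n."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decompose(w, x):
    """Write x = z + lam * x0 with z in the kernel and functional(w, x0) = 1."""
    norm_sq = functional(w, w)          # assumes w is not the zero vector
    x0 = [wi / norm_sq for wi in w]
    lam = functional(w, x)
    z = [xi - lam * x0i for xi, x0i in zip(x, x0)]
    return z, lam, x0

z, lam, x0 = decompose([1.0, 2.0, 2.0], [3.0, -1.0, 4.0])
print(z, lam, functional([1.0, 2.0, 2.0], z))  # last value should be (close to) 0
```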
Exercise 7.5. Show that every finite-dimensional subspace of a normed vector space has
a closed linear complement.
The reader should compare the following result for a general normed vector
space to the characterization of the closed linear hull in Hilbert spaces (see
Corollary 3.26).
Corollary 7.6 (Closed linear hull). Let S ⊆ X be a subset of a normed
vector space. Then the closed linear hull of S is precisely the set of all x ∈ X
that satisfy x*(x) = 0 for all x* ∈ X* with x*(S) = {0}. Equivalently,
\[
\overline{\langle S \rangle} = \bigcap_{\substack{x^* \in X^* \\ x^*(S) = \{0\}}} \ker(x^*).
\]
Proof. The inclusion of the left-hand side in the right-hand side is clear
since ker(x*) is a closed subspace for any x* ∈ X*. Suppose that x_0 does not
lie in the closed linear hull of S, and let Y = ⟨x_0⟩ + ⟨S⟩. Then the functional y*
defined by y*(αx_0 + z) = α for z ∈ ⟨S⟩ is bounded. For otherwise there would
exist, for every n ≥ 1, some scalar α_n ≠ 0 and some z_n ∈ ⟨S⟩
with |α_n| > n‖α_n x_0 + z_n‖, which implies that
\[
\bigl\| x_0 + \tfrac{1}{\alpha_n} z_n \bigr\| \le \tfrac{1}{n},
\]
forcing x_0 to lie in the closed linear hull of S. Therefore, y* can be extended
to a continuous linear functional x* on X which satisfies x*(S) = {0}
but x*(x_0) = 1. This shows that x_0 also does not lie in the intersection of
the kernels on the right-hand side, and hence proves the other inclusion.
Exercise 7.7. Let Y ⊆ X be a subspace of a normed linear space X. Show that
\[
\max_{\substack{\|x^*\| \le 1 \\ x^*(Y) = \{0\}}} |x^*(x)| = \inf_{y \in Y} \|x - y\|
\]
for all x ∈ X.
Exercise 7.8. (a) Prove that if the dual space X ∗ of a real normed vector space X is
strictly convex (see Definition 2.17), then the Hahn–Banach extension of a continuous
functional on a subspace to all of X is unique.
(b) Give an explicit example of a situation in which the extension defined by the Hahn–
Banach theorem is not unique.
7.1.3 The Bidual
Corollary 7.9 (Isometric embedding into the bidual). Let X be a
normed vector space. Then
\[
\|x\| = \max_{\substack{x^* \in X^* \\ \|x^*\| \le 1}} |x^*(x)| \tag{7.4}
\]
for any x ∈ X. In particular, the natural linear map
\[
\imath : X \longrightarrow X^{**} = (X^*)^*, \qquad x \longmapsto \imath(x)
\]
from X into the bidual of X that sends x ∈ X to the linear functional ı(x)
defined by ı(x)(x*) = x*(x) for x* ∈ X*, is an isometric embedding.
Definition 7.10. A Banach space is called reflexive if the isometry ı in Co-

rollary 7.9 is a bijection (and hence an isometric isomorphism) from X to X ∗∗ .
As we will see in the next section, some Banach spaces which we have
already encountered are reflexive, but some are not.
Proof of Corollary 7.9. By definition, |x*(x_0)| ≤ ‖x*‖‖x_0‖ ≤ ‖x_0‖ for
all x* ∈ X* with ‖x*‖ ≤ 1 and x_0 ∈ X. Moreover, we may apply Corollary 7.4
to obtain some functional x* ∈ X* of norm one with x*(x_0) = ‖x_0‖, which
proves (7.4). Now notice that
\[
\sup_{\substack{x^* \in X^* \\ \|x^*\| \le 1}} |x^*(x_0)|
\]
is, by definition, precisely the operator norm of ı(x_0) ∈ X**. Hence we have
shown that ı is an isometry, and linearity of ı is easy to check.
Exercise 7.11. Let Y ⊆ X be a closed subspace of a normed vector space.

(a) Show that Y ⊥ = {x∗ ∈ X ∗ | x∗ (Y ) = {0}} is a closed subspace.
(b) Show that (X/Y )∗ = Y ⊥ (that is, that there is a natural isometric isomorphism
between the two).
(c) Show that Y ∗ = X ∗ /Y ⊥ .
(d) Conclude that Y is reflexive if X is reflexive.
Exercise 7.12. Let X be a normed vector space and suppose that the dual X ∗ is separable.
Show that X is also separable. In particular, if X is separable but X ∗ is not, then X cannot
be reflexive. Find an example of a Banach space that is not reflexive for that reason.
The results developed above give another approach to the existence of

completions.
Shorter proof of Theorem 2.32. Let X be a normed vector space. By
Corollary 7.9, X is isometric to ı(X) ⊆ X**. Set B to be the closure of ı(X)
in X**, which is a Banach space by Lemma 2.54 and Exercise 2.26(b).
7.1.4 An Application of the Spanning Criterion†
The description of the closed linear hull in Corollary 7.6 can be used as a
spanning criterion: a subset S of a Banach space X spans X (that is, has X
as its closed linear hull) if and only if there is no non-zero x∗ ∈ X ∗ with the
property that S ⊆ ker(x∗ ).
This is a powerful tool, surprisingly often even without a complete descrip-
tion of the dual space. The following result generalizes the Stone–Weierstrass
theorem on the unit interval. The full result also shows the converse, so the
divergence characterises the density.
† The result of this subsection will not be needed in the remainder of the book.
Theorem 7.13 (Müntz [76]). Suppose that (n_k) is a sequence in N with
\[
n_1 < n_2 < n_3 < \cdots
\]
and with Σ_{k=1}^∞ 1/n_k = ∞, and let p_n(x) = x^n for n ∈ N. Then the linear hull
of {1, p_{n_1}, p_{n_2}, ...} is dense in C([0, 1]).
Proof. Let Y be the closed linear hull of the set {1, pn1 , pn2 , . . . } in C([0, 1]).
By Corollary 7.6 we have to show that if ℓ ∈ C([0, 1])∗ has
ℓ(1) = ℓ(pnk ) = 0 (7.5)
for all k > 1, then ℓ = 0. In fact, it is enough to show that if ℓ ∈ C([0, 1])∗
has (7.5) for all k > 1, then ℓ(pn ) = 0 for all integers n > 1. This is because
Corollary 7.6 then shows that C[x] ⊆ Y , after which the Stone–Weierstrass
theorem (Theorem 2.40) may be applied to give Y = C([0, 1]). So assume
that ℓ ∈ C([0, 1])* satisfies (7.5) for all k ≥ 1, and assume also that there is
some n ∈ N with ℓ(p_n) ≠ 0. We will show that this implies Σ_{k=1}^∞ 1/n_k < ∞.
For ζ ∈ C with ℜ(ζ) > 0, we define p_ζ(t) = t^ζ for t ∈ [0, 1] (with the
convention that 0^ζ = 0). This defines the function p_ζ ∈ C([0, 1])
satisfying ‖p_ζ‖ ≤ 1. Moreover, we have
\[
\lim_{\mathbb{C} \ni \delta \to 0} \frac{t^{\zeta+\delta} - t^{\zeta}}{\delta}
  = \lim_{\mathbb{C} \ni \delta \to 0} t^{\zeta}\, \frac{t^{\delta} - 1}{\delta} = t^{\zeta} \log t,
\]
for all t ∈ [0, 1] and in fact the convergence is with respect to the ‖·‖_∞ norm
(use the complex version of the mean value theorem to check this claimed
uniformity).
Now define f(ζ) = ℓ(p_ζ) for ζ ∈ C with ℜ(ζ) > 0, so |f(ζ)| ≤ ‖ℓ‖.
Furthermore, f is analytic for ℜ(ζ) > 0 since
\[
\lim_{\mathbb{C} \ni \delta \to 0} \frac{f(\zeta + \delta) - f(\zeta)}{\delta} = \ell\bigl(t^{\zeta} \log t\bigr)
\]
exists by the above observation regarding uniform convergence. Finally, we
have f(n_k) = 0 for k ≥ 1 by assumption.
Now define the Blaschke product(22)
\[
B_K(\zeta) = \prod_{k=1}^{K} \frac{\zeta - n_k}{\zeta + n_k},
\]
with simple zeros B_K(n_k) = 0 for k = 1, ..., K, B_K(ζ) ≠ 0 for ℜ(ζ) > 0
and ζ ∉ {n_1, ..., n_K}, and the asymptotic formula
\[
|B_K(\zeta)| \longrightarrow 1
\]
as ℜ(ζ) → 0 or as |ζ| → ∞. Together with f(n_k) = 0 these properties show
that
\[
g_K(\zeta) = \frac{f(\zeta)}{B_K(\zeta)}
\]
is analytic for ℜ(ζ) > 0, and satisfies the estimate
\[
|g_K(\zeta)| \le (1 + \delta)\|\ell\| \tag{7.6}
\]
whenever |ζ| ≥ R(δ) or ℜ(ζ) ≤ ε(δ), where the positive quantities R(δ) and ε(δ)
depend on δ, and δ > 0 is arbitrary.
Applying the maximum principle for g_K on the half-disk
\[
\{\zeta \in \mathbb{C} \mid |\zeta - \varepsilon(\delta)| \le R(\delta),\ \Re(\zeta) \ge \varepsilon(\delta)\}
\]
in Figure 7.1, the function |g_K| must attain its maximum on the boundary
of the half-disk. As we have (7.6) on that boundary, we obtain
\[
\|g_K\| \le \|\ell\| (1 + \delta)
\]
first on the half-disk and, by decreasing ε(δ) and increasing R(δ), on all of
the right half-space. As δ > 0 was arbitrary, we obtain ‖g_K‖_∞ ≤ ‖ℓ‖.
Fig. 7.1: Applying the maximum principle.
Recall that n was chosen so that f(n) = ℓ(p_n) ≠ 0. For ζ = n this shows
that
\[
\prod_{k=1}^{K} \Bigl| 1 + \frac{2n}{n_k - n} \Bigr|
  = \prod_{k=1}^{K} \Bigl| \frac{n + n_k}{n - n_k} \Bigr|
  = |B_K(n)|^{-1} \le \frac{\|\ell\|}{|f(n)|} < \infty,
\]
meaning that we have found an upper bound for the product on the left-hand
side independent of K. Notice that n_k > n for all but finitely many k ∈ N.
Taking the logarithm and using the fact that x ≪ log(1 + x) for all x ∈ [0, 1],
it follows that the sum Σ_{k=1}^K 1/(n_k − n) has an upper bound independent of K.
Multiplying the series term-by-term with (n_k − n)/n_k (and noticing
that (n_k − n)/n_k → 1 as k → ∞), it follows that Σ_{k=1}^∞ 1/n_k < ∞, as claimed.
This contradicts our assumption, and the theorem follows.
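The link between the product |B_K(n)|^{-1} and the series Σ 1/n_k can be seen numerically. The sketch below compares two assumed example sequences with n = 1: the even numbers n_k = 2k, for which Σ 1/n_k diverges, and the squares n_k = (k + 1)², for which it converges. The function name `reciprocal_blaschke` is hypothetical.

```python
def reciprocal_blaschke(n, exponents):
    """Product over k of |(n + n_k) / (n - n_k)|, that is 1 / |B_K(n)|."""
    product = 1.0
    for nk in exponents:
        product *= abs((n + nk) / (n - nk))
    return product

for K in (10, 100, 1000):
    evens = [2 * k for k in range(1, K + 1)]           # sum of 1/n_k diverges
    squares = [(k + 1) ** 2 for k in range(1, K + 1)]  # sum of 1/n_k converges
    print(K, reciprocal_blaschke(1, evens), reciprocal_blaschke(1, squares))
```

For the even numbers the product telescopes to 2K + 1 and is unbounded in K, while for the squares it stays bounded, in line with the dichotomy used in the proof.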
7.2 Banach Limits, Amenable Groups, and the Banach–Tarski Paradox
In this section we start the discussion of amenability and related topics.

Amenability will be one of two (quite different) functional-analytic properties
of a group that we will discuss in Chapter 10.
7.2.1 Banach Limits
On the space c(N) = {(x_n)_{n∈N} ∈ ℓ^∞(N) | lim_{n→∞} x_n exists} we have the
natural linear functional lim defined by
\[
c(\mathbb{N}) \ni (x_n)_{n \in \mathbb{N}} \longmapsto \lim\bigl((x_n)_{n \in \mathbb{N}}\bigr) = \lim_{n\to\infty} x_n.
\]
A natural question is to ask if this rather obvious functional — taking the

limit of sequences that do have a limit — might have an extension to all of
the much larger space ℓ∞ (N). The Hahn–Banach theorem is built for just
such situations, and using it we readily find such a generalized limit in the
form of a linear functional.
Corollary 7.14 (Banach limit). There exists a linear functional, which we
will denote LIM ∈ (ℓ^∞(N))*, with norm one, which may be thought of as a
generalized limit since it satisfies the following properties:
• LIM((a_n)) = lim_{n→∞} a_n if the latter limit exists;
• LIM((a_n)) ∈ [lim inf_{n→∞} a_n, lim sup_{n→∞} a_n] if a_n ∈ R for all n ≥ 1;
• LIM((a_n)) = LIM((a_{n+1})).
The functional LIM is called a Banach limit.
Proof. We work initially over R. Let c(N) ⊆ ℓ^∞(N) and lim ∈ c(N)* be as
given before the statement of the corollary. Notice that ‖lim‖ = 1 since
\[
\Bigl| \lim_{n\to\infty} a_n \Bigr| \le \sup_{n \ge 1} |a_n|.
\]
Let L ∈ (ℓ^∞(N))* be an extension as in Theorem 7.3, with ‖L‖ = ‖lim‖. We
now define
\[
\mathrm{LIM}((a_n)) = L\Bigl(a_1, \frac{a_1 + a_2}{2}, \frac{a_1 + a_2 + a_3}{3}, \ldots\Bigr).
\]
Clearly LIM is linear and extends lim on the subspace c(N), since the Cesàro
averages of a convergent sequence converge to the same limit. This functional
also has norm one, since
\[
\Bigl| \frac{a_1 + \cdots + a_n}{n} \Bigr| \le \|(a_n)\|_\infty
\]
for all n ≥ 1, which implies that
\[
\Bigl| L\Bigl(a_1, \frac{a_1 + a_2}{2}, \ldots\Bigr) \Bigr| \le \|(a_n)\|_\infty.
\]
Moreover,
\[
\mathrm{LIM}\bigl((a_n)_n - (a_{n+1})_n\bigr) = L\Bigl(a_1 - a_2, \frac{a_1 - a_3}{2}, \frac{a_1 - a_4}{3}, \ldots\Bigr) = 0,
\]
since the sequence inside L converges to zero and L extends lim. This implies
the last claim in the corollary.
Let I = inf_{n≥1} a_n and S = sup_{n≥1} a_n, so that |a_n − (I+S)/2| ≤ (S−I)/2 for
all n ≥ 1, and hence
\[
\Bigl| \mathrm{LIM}((a_n)) - \frac{I + S}{2} \Bigr| \le \frac{S - I}{2},
\]
which implies that I ≤ LIM((a_n)) ≤ S. Together with the already established
translation-invariance, we obtain inf_{n≥k} a_n ≤ LIM((a_n)) ≤ sup_{n≥k} a_n, and
so also
\[
\liminf_{n\to\infty} a_n \le \mathrm{LIM}((a_n)) \le \limsup_{n\to\infty} a_n.
\]
We may extend LIM from ℓ^∞_R(N) to ℓ^∞_C(N) by setting
\[
\mathrm{LIM}((a_n)) = \mathrm{LIM}((\Re a_n)) + i\, \mathrm{LIM}((\Im a_n)) \tag{7.7}
\]
for all (a_n) ∈ ℓ^∞_C(N) (see Exercise 7.15).
Exercise 7.15. Show that the extension LIM ∈ (ℓ^∞_C(N))* from (7.7) is C-linear and has
norm one.
By pre-composing with the projection operator from ℓ∞ (Z) to ℓ∞ (N)

defined by (an )n∈Z 7→ (an )n∈N , we can also view LIM as a translation-
invariant linear functional on ℓ∞ (Z).
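The functional L is obtained non-constructively, so a Banach limit cannot be computed in general. The Cesàro averaging step in the construction, however, is explicit, and the sketch below illustrates its two effects on the assumed example a_n = (−1)^n: the averages of the oscillating sequence tend to 0, and the averages of the difference sequence a_n − a_{n+1} equal (a_1 − a_{n+1})/n, which also tend to 0. The function name `cesaro_means` is hypothetical.

```python
def cesaro_means(seq):
    """Return the running averages (a_1 + ... + a_n) / n for n = 1, 2, ..."""
    total, means = 0.0, []
    for n, a in enumerate(seq, start=1):
        total += a
        means.append(total / n)
    return means

N = 10000
a = [(-1) ** n for n in range(1, N + 2)]  # a_1, ..., a_{N+1}
oscillating = cesaro_means(a[:N])
differences = cesaro_means([a[n] - a[n + 1] for n in range(N)])
print(oscillating[-1], differences[-1])   # both are small for large N
```

Since the averaged sequence converges to 0 in this example, the functional constructed in the proof assigns the value 0 to ((−1)^n).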
7.2.2 Amenable Groups
A natural and important question to ask is which other groups G have a

similar invariant functional defined on all of ℓ∞ (G). We assume here that G
is endowed with the discrete topology but note that the notions discussed
have natural analogues for locally compact groups (see Section 10.2).
The following concept was introduced by von Neumann in 1929, and
called messbar (measurable); the modern terminology was introduced by Day
in 1949, perhaps as a pun, as these groups are ‘easy to work with’ and hence
a-men-able (US) / a-mean-able (UK), and are groups that ‘admit a mean’,
hence a-mean-able.
Definition 7.16. A discrete group G is called amenable if there exists a

finitely additive (left-)invariant mean on G. That is, a map m : P(G) → [0, 1]
defined on all subsets of G, with the following properties:
• m(A) > 0 for all A ⊆ G and m(G) = 1;
• m(A1 ∪ A2 ) = m(A1 ) + m(A2 ) for disjoint sets A1 , A2 ⊆ G; and
• m(gA) = m(A) for all g ∈ G and A ⊆ G.
One may think of a mean (which is only required to be finitely additive) as

a poor substitute for a measure (which is countably additive) when a measure
with the desired properties does not exist. This is the case if G is a countable
infinite group, as the only translation-invariant measure in that case is the
counting measure (or a scalar multiple of the counting measure), which is
infinite. However, the invariant mean discussed here takes values in [0, 1].
Example 7.17. Corollary 7.14 together with the next lemma shows G = Z
is amenable. Moreover, any invariant mean on Z (there will turn out to be
many, see Exercise 7.25) will have some reassuringly natural properties. For
example, if E = {n ∈ Z | n is even}, then for any invariant mean m we must
have $m(E) = \frac{1}{2}$ since $2m(E) = m(E) + m(E + 1) = m(\mathbb{Z}) = 1$.
Lemma 7.18. A discrete group $G$ is amenable if and only if there exists a
positive left-invariant functional $\mathrm{LIM} \in (\ell^\infty(G))^*$ of norm one. Here positivity
is the requirement that
• $a \geq 0$ (on all of $G$) implies that $\mathrm{LIM}(a) \geq 0$,
and left-invariance is
• $\mathrm{LIM}(a^h) = \mathrm{LIM}(a)$ for all $h \in G$ and $a \in \ell^\infty(G)$, where $a^h$ is the shifted
map $g \mapsto a(h^{-1}g)$.
Sketch of proof. If LIM is given on $\ell^\infty(G)$ then we can define $m(A)$ to
be $\mathrm{LIM}(1_A)$ for all $A \subseteq G$, and then it is easy to check that $m$ is a left-invariant
mean on $G$. On the other hand, if a left-invariant mean $m$ is given,
then we may obtain every $a \in \ell^\infty(G)$ as the limit of a sequence of finite sums
of the form
\[
\sum_{i=1}^{\ell_n} r_i^{(n)} 1_{A_i^{(n)}}
\]
as $n \to \infty$, where $r_i^{(n)} \in \mathbb{C}$ and $A_i^{(n)} \subseteq G$ for all $n \geq 1$ and $i$. For example,
we may partition the set $B^{\mathbb{C}}_{\|a\|_\infty}$ into finitely many sets $B_1, \ldots, B_\ell$ each of
diameter less than $\frac{1}{n}$, choose $r_i^{(n)}$ in $B_i$ and then define $A_i^{(n)} = a^{-1}(B_i)$ so that
\[
\Bigl\| a - \sum_{i=1}^{\ell_n} r_i^{(n)} 1_{A_i^{(n)}} \Bigr\|_\infty \leq \frac{1}{n}.
\]
Now we can define LIM by
\[
\mathrm{LIM}(a) = \lim_{n\to\infty} \sum_{i=1}^{\ell_n} r_i^{(n)} m(A_i^{(n)}),
\]
and check that LIM is well-defined, linear, bounded of norm one, positive,
and left-invariant.
Exercise 7.19. Provide a detailed proof of Lemma 7.18.
Next we show that the class of amenable groups is closed under many
natural operations that allow us to give more examples.
Proposition 7.20 (Subgroups and quotients). Let G be a discrete group.

(a) If G is amenable, then every subgroup H < G is also amenable.
(b) Let H ⊳ G be a normal subgroup. Then G is amenable if and only if
both H and G/H are amenable.
Proof. For (a), let $\mathrm{LIM}_G$ be a left-invariant positive functional on $\ell^\infty(G)$ of
norm one, as in Lemma 7.18. Let $H < G$ be a subgroup and $S \subseteq G$ a set of
right coset representatives, so that
\[
G = \bigsqcup_{s \in S} Hs.
\]
For any $a \in \ell^\infty(H)$ we define $a_G(hs) = a(h)$ for $h \in H$ and $s \in S$
and $\mathrm{LIM}_H(a) = \mathrm{LIM}_G(a_G)$, which is clearly linear. Moreover, $a \geq 0$ implies
that $a_G \geq 0$, and therefore $\mathrm{LIM}_H(a) \geq 0$. Moreover,
\[
|\mathrm{LIM}_H(a)| = |\mathrm{LIM}_G(a_G)| \leq \|a_G\|_\infty = \|a\|_\infty .
\]
Finally, for $h_0 \in H$ and $s \in S$ we have
\[
(a^{h_0})_G(hs) = a^{h_0}(h) = a(h_0^{-1}h) = a_G(h_0^{-1}hs) = (a_G)^{h_0}(hs),
\]
so $\mathrm{LIM}_H(a^{h_0}) = \mathrm{LIM}_G((a_G)^{h_0}) = \mathrm{LIM}_G(a_G) = \mathrm{LIM}_H(a)$, and Lemma 7.18
shows that $H$ is amenable.
For (b) we again use Lemma 7.18, allowing us to work with functionals
on the space of bounded functions on the groups involved. Assume first
that $G$ is amenable, so that $H$ is amenable by (a). For $a \in \ell^\infty(G/H)$,
define an element $a_G \in \ell^\infty(G)$ by setting $a_G(g) = a(gH)$. Just as in (a)
we define $\mathrm{LIM}_{G/H}(a) = \mathrm{LIM}_G(a_G)$ to obtain a left-invariant positive functional
of norm one on $\ell^\infty(G/H)$, showing that $G/H$ is amenable.
For the converse, assume that $H$ and $G/H$ are both amenable, and
write $\mathrm{LIM}_H$ and $\mathrm{LIM}_{G/H}$ for the associated functionals. For $a \in \ell^\infty(G)$
define the bounded function $\overline{a}$ on $G$ by
\[
\overline{a}(g) = \mathrm{LIM}_H(h \mapsto a(gh))
\]
for $g \in G$. Notice that for $h_0 \in H$
\[
\overline{a}(gh_0) = \mathrm{LIM}_H(h \mapsto a(gh_0h)) = \mathrm{LIM}_H(h \mapsto a(gh)) = \overline{a}(g),
\]
since multiplication by $h_0$ induces a left shift on the function $h \mapsto a(gh)$.
Therefore we can think of $\overline{a}$ as a function on $G/H$ by setting $\overline{a}(gH) = \overline{a}(g)$.
We can now define
\[
\mathrm{LIM}_G(a) = \mathrm{LIM}_{G/H}(\overline{a}) = \mathrm{LIM}_{G/H}\bigl(g \mapsto \mathrm{LIM}_H(h \mapsto a(gh))\bigr),
\]
and it is once again straightforward to check the required properties of $\mathrm{LIM}_G$
to see that $G$ is amenable.
The following exercise gives the first indication of the existence of a maximal
amenable normal subgroup: the amenable radical (also see Exercise 8.25).
Exercise 7.21. (a) Let G be a discrete group and let H1 ⊳ G and H2 < G be two amenable
subgroups with H1 normal. Show that H1 H2 = {h1 h2 | h1 ∈ H1 , h2 ∈ H2 } is also an
amenable subgroup.
(b) Let H1 , . . . , Hn ⊳ G be amenable normal subgroups for some n > 2. Show that the
product H1 H2 · · · Hn ⊳ G is an amenable normal subgroup.
Even though the above shows that many groups are amenable, it is also
easy to give an example of a group that is not amenable.
Example 7.22. The free group $G = F_2$ generated by two elements $\alpha, \beta$ is not
amenable. To see this, suppose that $m$ is an invariant mean on $G$. Clearly the
singletons $\{e\}, \{\alpha\}, \{\alpha\alpha\}, \ldots$ are all disjoint and are left-translates of each
other, so we must have $m(\{e\}) = 0$. Now define
\[
S_\alpha = \{g \in G \mid \text{the reduced form of } g \text{ starts on the left with } \alpha\},
\]
and similarly define sets $S_{\alpha^{-1}}$, $S_\beta$, and $S_{\beta^{-1}}$. Since
\[
G = S_\alpha \sqcup S_{\alpha^{-1}} \sqcup S_\beta \sqcup S_{\beta^{-1}} \sqcup \{e\}
\]
and $m(\{e\}) = 0$, we must have
\[
1 = m(S_\alpha) + m(S_{\alpha^{-1}}) + m(S_\beta) + m(S_{\beta^{-1}}). \tag{7.8}
\]
However,
\[
\alpha^{-1} S_\alpha = S_\alpha \sqcup S_\beta \sqcup S_{\beta^{-1}} \sqcup \{e\},
\]
so
\[
m(S_\alpha) = m(S_\alpha) + m(S_\beta) + m(S_{\beta^{-1}}),
\]
and hence $m(S_\beta) = m(S_{\beta^{-1}}) = 0$ by positivity. Exchanging the roles of $\alpha$
and $\beta$ also shows that $m(S_\alpha) = m(S_{\alpha^{-1}}) = 0$, which together contradict (7.8).
It follows that $F_2$ cannot be amenable.
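The key identity $\alpha^{-1} S_\alpha = S_\alpha \sqcup S_\beta \sqcup S_{\beta^{-1}} \sqcup \{e\}$ can be checked mechanically on words of bounded length. The following Python sketch is an illustration only: the letters `a`, `A`, `b`, `B` are our stand-ins for $\alpha, \alpha^{-1}, \beta, \beta^{-1}$, the empty string stands for $e$, and all function names are hypothetical.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A = a^{-1}, B = b^{-1}


def reduced_words(max_len):
    """All reduced words of length <= max_len; '' is the identity e."""
    words, frontier = [""], [""]
    for _ in range(max_len):
        frontier = [w + s for w in frontier for s in INVERSE
                    if not w or INVERSE[w[-1]] != s]
        words.extend(frontier)
    return words


def starts_with(words, symbol):
    """The words whose reduced form starts on the left with `symbol`."""
    return {w for w in words if w.startswith(symbol)}


def check_paradox(max_len):
    """Compare a^{-1} S_a with S_a, S_b, S_B and {e} on words of bounded length."""
    big = reduced_words(max_len)
    small = set(reduced_words(max_len - 1))
    # Left multiplication by a^{-1} strips the leading 'a' of a word in S_a.
    shifted = {w[1:] for w in starts_with(big, "a")}
    # Everything of length <= max_len - 1 except the words starting with A.
    expected = small - starts_with(small, "A")
    return shifted == expected


print(check_paradox(5))
```

The truncation to length `max_len - 1` on the right-hand side reflects that stripping the first letter shortens each word by one.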
We note that Proposition 7.20 shows inductively that $G = \mathbb{Z}^d$ is amenable.
This may also be seen by noting that a sequence $(F_n)$ of large boxes, for
example the sequence defined by $F_n = [-n, n]^d \cap \mathbb{Z}^d$ for $n \geq 1$, is a Følner
sequence as discussed now.
Definition 7.23. A sequence $(F_n)_n$ of finite subsets of a countable group $G$
is called a Følner sequence if the elements of the sequence are asymptotically
translation invariant in the sense that
\[
\lim_{n\to\infty} \frac{|F_n \triangle hF_n|}{|F_n|} = 0 \tag{7.9}
\]
for all $h \in G$.
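For the boxes $F_n = [-n, n]^d \cap \mathbb{Z}^d$ the ratio in (7.9) can be computed exactly. The sketch below (function names are ours, and $\mathbb{Z}^d$ is written additively, so $hF_n$ becomes $h + F_n$) evaluates it for small $n$; for a unit shift the symmetric difference consists of two opposite faces of the box, giving $2/(2n+1)$.

```python
from fractions import Fraction
from itertools import product


def box(n, d):
    """F_n = [-n, n]^d intersected with Z^d, as a set of integer tuples."""
    return set(product(range(-n, n + 1), repeat=d))


def folner_ratio(n, h):
    """|F_n symmetric-difference (h + F_n)| / |F_n| in Z^d, where d = len(h)."""
    f_n = box(n, len(h))
    shifted = {tuple(x + y for x, y in zip(g, h)) for g in f_n}
    return Fraction(len(f_n ^ shifted), len(f_n))


for n in (1, 10, 50):
    print(n, folner_ratio(n, (1, 0)))
```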
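Lemma 7.24. If a countable group has a Følner sequence, then it is amenable.
Proof. Let $\mathrm{LIM}_{\mathbb{N}}$ be the Banach limit from Corollary 7.14 and let $(F_n)$ be
a Følner sequence. Then for any $a \in \ell^\infty(G)$ we can define
\[
\mathrm{LIM}_G(a) = \mathrm{LIM}_{\mathbb{N}}\Bigl(\frac{1}{|F_n|} \sum_{g \in F_n} a(g)\Bigr),
\]
which is linear, positive, and of norm one. Then
\[
\begin{aligned}
\mathrm{LIM}_G(a^h) &= \mathrm{LIM}_{\mathbb{N}}\Bigl(\frac{1}{|F_n|}\sum_{g\in F_n} a(h^{-1}g)\Bigr) = \mathrm{LIM}_{\mathbb{N}}\Bigl(\frac{1}{|F_n|}\sum_{g\in h^{-1}F_n} a(g)\Bigr) \\
&= \mathrm{LIM}_{\mathbb{N}}\Bigl(\frac{1}{|F_n|}\sum_{g\in F_n} a(g)\Bigr) = \mathrm{LIM}_G(a) \qquad \text{(by (7.9))}
\end{aligned}
\]
showing left-invariance.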
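To see the averaging in this proof at work, the following sketch (an illustration with hypothetical helper names, for $G = \mathbb{Z}$ and $F_n = [-n, n] \cap \mathbb{Z}$) averages the indicator of the even integers and its shift by $h = 1$. The two averages differ by at most $|F_n \triangle hF_n| / |F_n|$ and both approach $\frac12$, in line with Example 7.17.

```python
from fractions import Fraction


def folner_average(a, n):
    """Average of the function a over F_n = [-n, n] in Z."""
    return Fraction(sum(a(g) for g in range(-n, n + 1)), 2 * n + 1)


def shift(a, h):
    """The shifted map a^h : g -> a(g - h), additive notation for a(h^{-1} g)."""
    return lambda g: a(g - h)


def indicator_even(g):
    """Indicator function of the even integers."""
    return 1 if g % 2 == 0 else 0


for n in (10, 100, 1000):
    print(n, folner_average(indicator_even, n),
          folner_average(shift(indicator_even, 1), n))
```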
Exercise 7.25. Show that the invariant means on $\mathbb{Z}$ constructed by using the Følner
sequences defined by $F_n^{(1)} = [0, n]$, $F_n^{(2)} = [-n, 0]$, and $F_n^{(3)} = [n^2, n^2 + n]$ are all different.
Can you construct infinitely many different invariant means on $\mathbb{Z}$?
Exercise 7.26. Show that any countable abelian group is amenable.
Exercise 7.27. Prove that the discrete Heisenberg group
\[
H = \left\{ \begin{pmatrix} 1 & k & \ell \\ 0 & 1 & m \\ 0 & 0 & 1 \end{pmatrix} \;\middle|\; k, \ell, m \in \mathbb{Z} \right\}
\]
with the usual matrix multiplication is amenable.

Exercise 7.28. Show that SL2 (Z) is not amenable. You may use the fact that the
group PSL2 (Z) = SL2 (Z)/{±I} is isomorphic to the free product of Z/2Z and Z/3Z.
7.2.3 The Banach–Tarski Paradox
The following surprising consequence of the axiom of choice was one of the
original motivations for the study of amenable groups and of measurable sets.
Theorem 7.29 (Banach–Tarski paradox(23)). The closed unit ball $B$
in $\mathbb{R}^3$ can be decomposed into finitely many disjoint sets $B = P_1 \sqcup \cdots \sqcup P_m$
with the property that there are isometric motions (that is, compositions of
rotations and translations) of $\mathbb{R}^3$ sending $P_i$ to $P_i'$ for $1 \leq i \leq m$ such that
\[
B \sqcup \left( B + \begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix} \right) = P_1' \sqcup \cdots \sqcup P_m' .
\]
Clearly some of the sets appearing in the decomposition are of necessity
non-measurable. The same result holds in $\mathbb{R}^d$ for any dimension $d \geq 3$, but we
restrict attention to the case $d = 3$ for simplicity. However, in dimension $d = 2$
no such finite paradoxical decomposition can be found.
Notice that while the dimension d plays an essential role, other aspects of
the topology of Rd play no real role here since the sets in the decomposition
are not even measurable, and in particular cannot be described using count-
able unions and intersections of open and closed sets. The real drama takes
place in the group of isometries of $\mathbb{R}^d$, and it is differences between the structure
of this group when $d = 2$ and when $d \geq 3$ that lie behind the existence
of the paradoxical decomposition. For this reason we will endow the group
of isometries $\mathrm{Isom}(\mathbb{R}^d) = \mathrm{O}_d(\mathbb{R}) \ltimes \mathbb{R}^d$ with the discrete topology. Doing so
helps to illuminate the difference between the cases $d = 2$ and $d \geq 3$.
• For d = 2 the group SO2 (R) is abelian, and hence amenable (we will only
be able to prove this a bit later in Exercise 8.23). The translation group R2
and the factor O2 (R)/ SO2 (R) are also abelian, so by Proposition 7.20
we see that Isom(R2 ) is amenable. This helps to prevent paradoxical
decompositions as in the Banach–Tarski paradox for the unit ball in 2
dimensions and using isometries (see Exercise 7.30).
• For $d \geq 3$ the group $\mathrm{SO}_d(\mathbb{R})$ is far from abelian, and in fact contains
a free subgroup (see Proposition 7.31) and so cannot be amenable by
Proposition 7.20 and Example 7.22 (where we use the discrete topology
on $\mathrm{SO}_d(\mathbb{R})$).
Exercise 7.30. (a) Let $G$ be a discrete amenable group, and suppose that $G$ acts on a
set $X$. Show that there exists a (non-canonical) finitely additive $G$-invariant mean $m_X$ (that
is, a function $m_X : \mathcal{P}(X) \to [0, 1]$ satisfying the first two requirements of Definition 7.16
and with $m_X(g \cdot B) = m_X(B)$ for all $B \subseteq X$ and $g \in G$) on the set of subsets $\mathcal{P}(X)$ of $X$.
(b) Show that amenability of $\mathrm{Isom}(\mathbb{R}^2)$ implies that a paradoxical decomposition as in
Theorem 7.29 cannot exist in 2 dimensions.
Proposition 7.31. The rotations
\[
a = \begin{pmatrix} \frac35 & -\frac45 & 0 \\ \frac45 & \frac35 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac35 & -\frac45 \\ 0 & \frac45 & \frac35 \end{pmatrix}
\]
in $\mathrm{SO}_3(\mathbb{R})$ generate a free subgroup $H$ of $\mathrm{SO}_3(\mathbb{R})$.
The statement is essentially geometric: $a$ and $b$ are rotations by an irrational
multiple of $\pi$ fixing two orthogonal axes, as illustrated in Figure 7.2.
The proposition means that any rotation obtained by composing a finite sequence
of rotations $a^{\pm 1}$ and $b^{\pm 1}$ (in which a rotation is never followed by its
inverse) cannot be obtained by any other such sequence of rotations.
Fig. 7.2: Two rotations generating a free subgroup (coordinate axes $x$, $y$, $z$ with the rotations $a$ and $b$).
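Proof of Proposition 7.31. Let
\[
\tilde{a}_+ = 5a = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \qquad
\tilde{a}_- = 5a^{-1} = \begin{pmatrix} 3 & 4 & 0 \\ -4 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix},
\]
and similarly define $\tilde{b}_+ = 5b$ and $\tilde{b}_- = 5b^{-1}$. For some part of the proof we will
be working over the field $\mathbb{F}_5$ (that is, working modulo 5). The matrices arising
can all be viewed as linear transformations of the vector space $\mathbb{Z}^3/5\mathbb{Z}^3 \cong \mathbb{F}_5^3$.
We want to study how they act on the vectors
\[
v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
w_\alpha = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad
w_{\alpha^{-1}} = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad
w_\beta = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}, \quad
w_{\beta^{-1}} = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}.
\]
Notice that it is enough to carry out the calculations for $\tilde{a}_+$, since the situation
for the other matrices is the same up to a permutation of the basis
vectors. Writing $\sim$ for proportionality, we obtain
\[
\begin{aligned}
\tilde{a}_+ v &= \begin{pmatrix} -1 \\ 7 \\ 5 \end{pmatrix} \equiv \begin{pmatrix} 4 \\ 2 \\ 0 \end{pmatrix} \sim \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \\
\tilde{a}_+ w_\alpha &\equiv \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \equiv \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \\
\tilde{a}_+ w_\beta &\equiv \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} \equiv \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \\
\tilde{a}_+ w_{\beta^{-1}} &\equiv \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} \sim \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha,
\end{aligned}
\]
but
\[
\tilde{a}_+ w_{\alpha^{-1}} \equiv \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The same applies to the other matrices, which in summary means that each
of the matrices $\tilde{a}_+, \tilde{a}_-, \tilde{b}_+, \tilde{b}_-$ has its own non-zero eigenvector in $\mathbb{F}_5^3$, maps
the eigenvector of the matrix with the same symbol but opposite sign to the
zero vector, but maps $v$ and the other three to a multiple of its eigenvector.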
Suppose now that $\gamma$ is a reduced word of length $n \geq 1$ in $F_2$, the free
group with generators $\alpha$ and $\beta$ (that is, a finite string of symbols chosen
from $\alpha, \alpha^{-1}, \beta, \beta^{-1}$ with the property that no symbol is immediately followed
by its inverse). Define a homomorphism $\phi : F_2 \to \mathrm{SO}_3(\mathbb{R})$ by defining
it on the generators by $\phi(\alpha) = a$, $\phi(\beta) = b$ and then extending to $F_2$ using
the homomorphism property, and use this to define $\tilde\phi(\gamma) = 5^n \phi(\gamma)$. Equivalently,
$\tilde\phi(\gamma) \in \mathrm{Mat}_{33}(\mathbb{Z})$ may be obtained by multiplying $\tilde{a}_+, \tilde{a}_-, \tilde{b}_+, \tilde{b}_-$ in the
order and multiplicities corresponding to the appearance of $\alpha, \alpha^{-1}, \beta, \beta^{-1}$ in
the word $\gamma$. As the word $\gamma$ is reduced, we see by induction on $n$ and the
calculations above that
\[
\tilde\phi(\gamma) v \in \mathbb{Z}^3
\]
modulo 5 is a non-zero multiple of $w_\eta$ where $\eta \in \{\alpha, \alpha^{-1}, \beta, \beta^{-1}\}$ is the
left-most symbol of $\gamma$. In particular, $\tilde\phi(\gamma)$ is not divisible by 5 and $\phi(\gamma) \neq I$
because $\tilde\phi(\gamma) \neq 5^n I$. Thus $\phi$ is injective and so $\operatorname{im} \phi = \langle a, b \rangle \cong F_2$.
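The modular computations in this proof, including the cases left to symmetry, can be verified by machine. The sketch below is an illustration rather than part of the argument: the letters `a`, `A`, `b`, `B` are our shorthand for $\alpha, \alpha^{-1}, \beta, \beta^{-1}$, the function names are hypothetical, and the induction is only tested for reduced words up to a fixed length.

```python
from itertools import product

A_PLUS = [[3, -4, 0], [4, 3, 0], [0, 0, 5]]    # 5a
A_MINUS = [[3, 4, 0], [-4, 3, 0], [0, 0, 5]]   # 5a^{-1}
B_PLUS = [[5, 0, 0], [0, 3, -4], [0, 4, 3]]    # 5b
B_MINUS = [[5, 0, 0], [0, 3, 4], [0, -4, 3]]   # 5b^{-1}
MATRIX = {"a": A_PLUS, "A": A_MINUS, "b": B_PLUS, "B": B_MINUS}
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}
W = {"a": (2, 1, 0), "A": (1, 2, 0), "b": (0, 2, 1), "B": (0, 1, 2)}
V = (1, 1, 1)


def apply_mod5(matrix, vec):
    """Matrix-vector product reduced modulo 5."""
    return tuple(sum(m * x for m, x in zip(row, vec)) % 5 for row in matrix)


def proportional_mod5(u, w):
    """True if u is a non-zero multiple of w over F_5."""
    return any(tuple(c * x % 5 for x in w) == u for c in range(1, 5))


def word_image(word):
    """(5^n phi(word)) v modulo 5; the right-most symbol acts first."""
    vec = V
    for symbol in reversed(word):
        vec = apply_mod5(MATRIX[symbol], vec)
    return vec


def is_reduced(word):
    """No symbol is immediately followed by its inverse."""
    return all(INVERSE[s] != t for s, t in zip(word, word[1:]))


def check_words(max_len):
    """Check the induction claim for all reduced words of length <= max_len."""
    for n in range(1, max_len + 1):
        for word in product("aAbB", repeat=n):
            if is_reduced(word) and not proportional_mod5(word_image(word), W[word[0]]):
                return False
    return True


print(check_words(6))
```

Working with the integer matrices $5a^{\pm1}$, $5b^{\pm1}$ keeps every step in exact arithmetic, which is the same reason the proof passes to $\mathbb{F}_5$.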
In the proof of Theorem 7.29 the free subgroup $\langle a, b \rangle < \mathrm{SO}_3(\mathbb{R})$ from
Proposition 7.31 will play a critical role. It will be convenient to define two
subsets $B_1, B_2$ of $\mathbb{R}^3$ to be equivalent, written $B_1 \sim B_2$, if they can be
decomposed as $B_1 = P_1 \sqcup \cdots \sqcup P_n$ and $B_2 = Q_1 \sqcup \cdots \sqcup Q_n$ into finitely many
disjoint subsets such that $Q_k$ is the image of $P_k$ under some isometric motion
for $k = 1, \ldots, n$.
Essential Exercise 7.32. Prove that this defines an equivalence relation.

Proof of Theorem 7.29. Let $B = B_1^{\mathbb{R}^3}$ be the closed unit ball in $\mathbb{R}^3$.
Step 1. We claim that $B \sim B \setminus \{0\}$ by using a ‘Hilbert’s Hotel’ argument.(24)
To see this, let $x_0 = (\frac12, 0, 0)^t$ and let $\gamma : \mathbb{R}^3 \to \mathbb{R}^3$ be an irrational rotation
(meaning that $\gamma^n = I$ for some $n \in \mathbb{Z}$ implies that $n = 0$) about the point $x_0$
in the $x$-$y$ plane extended trivially to a rotation about the line parallel to
the $z$-axis through $x_0$, so that the orbit $D = \{\gamma^n(0) \mid n \in \mathbb{N}_0\} \subseteq B$ is infinite.
Therefore
\[
B = (B \setminus D) \sqcup D \sim (B \setminus D) \sqcup \gamma D = B \setminus \{0\},
\]
proving the claim.
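Now let $H < \mathrm{SO}_3(\mathbb{R})$ be the free subgroup constructed in Proposition 7.31.
Since $H$ is countable and every non-trivial rotation in $\mathrm{SO}_3(\mathbb{R})$ has a
single one-dimensional eigenspace with eigenvalue 1, we can find a countable
union $E$ of lines through the origin such that $HE = E$ and with the property
that no vector in $B \setminus E$ is fixed by a non-trivial element of $H$.
Step 2. By using countably many Hilbert Hotel arguments at once we claim
that $B \setminus \{0\} \sim B \setminus E$. To see this, notice that the set $\mathbb{S}^2 \cap E$ is countable and
so the set $P$ of pairs of vectors $v_1, v_2 \in \mathbb{S}^2 \cap E$ with $v_1 \neq v_2$ is also countable.
Therefore
\[
W = \{ w \in \mathbb{R}^3 \mid w \perp v_1 - v_2 \text{ for some } (v_1, v_2) \in P \}
\]
is a countable union of hyperplanes, and so a null set. Fix $x_1 \in \mathbb{R}^3 \setminus (W \cup E)$
and some irrational rotation $\gamma$ about the line $\mathbb{R}x_1$. If now $\gamma^m v_1 = \gamma^n v_2$ for
some $m, n \in \mathbb{N}_0$ and $v_1, v_2 \in \mathbb{S}^2 \cap E$, then, since $\gamma$ is an isometry fixing $x_1$,
\[
\langle v_1, x_1 \rangle = \langle \gamma^m v_1, x_1 \rangle = \langle \gamma^n v_2, x_1 \rangle = \langle v_2, x_1 \rangle
\]
gives $v_1 = v_2$ by our choice of $x_1$. Since $v_1 \notin \mathbb{R}x_1$ and $\gamma$ is an irrational
rotation about the line, we also see that $m = n$. Therefore the various
sets $\gamma^n \mathbb{R}v_1 \setminus \{0\}$, where we vary $n \geq 0$ and the subspace $\mathbb{R}v_1 \subseteq E$, are all
disjoint. Thus
\[
\begin{aligned}
B \setminus \{0\} &= \Bigl(B \setminus \bigsqcup_{n \geq 0} \gamma^n E\Bigr) \sqcup \Bigl(B \cap \bigsqcup_{n \geq 0} \gamma^n E \setminus \{0\}\Bigr) \\
&\sim \Bigl(B \setminus \bigsqcup_{n \geq 0} \gamma^n E\Bigr) \sqcup \Bigl(B \cap \bigsqcup_{n \geq 1} \gamma^n E \setminus \{0\}\Bigr) = B \setminus E,
\end{aligned}
\]
as claimed.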
Step 3. We claim that $B \setminus E \sim (B \setminus E) \sqcup \bigl((B \setminus E) + (3, 0, 0)^t\bigr)$. This is clearly the
main step in the argument, and it is here that we will use the fact that $H$ is
a free group, and in particular the resulting decomposition
\[
H = \{e\} \sqcup S_a \sqcup S_{a^{-1}} \sqcup S_b \sqcup S_{b^{-1}}
\]
obtained by taking the image of the decomposition of $F_2$ in Example 7.22.
In order for the decomposition of $H$ to be useful, we have to find a cross-section
$C$ of $B \setminus E$ satisfying
\[
B \setminus E = \bigsqcup_{c \in C} Hc,
\]
which can be found by a direct application of the axiom of choice (and will
not be measurable). We now decompose $B \setminus E$ into the four disjoint sets
\[
B_1 = S_a C \sqcup \bigsqcup_{n \geq 0} a^{-n}C, \qquad B_2 = S_{a^{-1}}C \setminus \bigsqcup_{n \geq 1} a^{-n}C, \qquad B_3 = S_b C
\]
and $B_4 = S_{b^{-1}}C$.
Applying $a$ to $B_2$ we obtain
\[
\begin{aligned}
aB_2 &= (S_{a^{-1}}C \sqcup C \sqcup S_bC \sqcup S_{b^{-1}}C) \setminus \bigsqcup_{n \geq 0} a^{-n}C \\
&= B_2 \sqcup B_3 \sqcup B_4 \sim B_2 \sqcup \bigl((B_3 \sqcup B_4) + (3, 0, 0)^t\bigr),
\end{aligned}
\]
and applying $b$ to $B_4$ we obtain
\[
\begin{aligned}
bB_4 &= (S_{b^{-1}}C \sqcup C \sqcup S_aC \sqcup S_{a^{-1}}C) \\
&= B_4 \sqcup B_1 \sqcup B_2 \sim B_4 \sqcup \bigl((B_1 \sqcup B_2) + (3, 0, 0)^t\bigr).
\end{aligned}
\]
Leaving $B_1$ and $B_3$ untouched and taking the union this proves the claim in
Step 3.
Applying Step 2 and Step 1 (twice each) backwards, the theorem follows.

7.3 The Duals of $L^p_\mu(X)$
In this section we will present descriptions of dual spaces using a bilinear
pairing. If $X$ and $Y$ are vector spaces and each $y \in Y$ induces a linear
functional on $X$, then we often write $\langle x, y \rangle$ for the value of the functional
associated to $y \in Y$ evaluated at $x \in X$. We always assume that the linear
functional depends linearly on $y \in Y$, and so $\langle \cdot, \cdot \rangle$ is a bilinear functional
on $X \times Y$. We use the word pairing here to signify that, for example, the
map in Exercise 7.33 may be thought of in two ways. It defines on the one
hand a family of functionals parameterized by elements of $\ell^1(\mathbb{N})$ defined on
sequences in $c_0(\mathbb{N})$ and on the other a family of functionals parameterized by
elements of $c_0(\mathbb{N})$ defined on sequences in $\ell^1(\mathbb{N})$. Even though we will prove
this in many cases (indeed, it will often be the key step in an argument), when
we use this notation and terminology we do not assume that $Y$ is indeed the
whole dual to $X$ or vice versa. The reader may start with the following as a
warm-up exercise on how dual spaces may be found.
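Exercise 7.33. (a) Recall that $c_0(\mathbb{N}) = \{(a_n) \mid \lim_{n\to\infty} a_n = 0\} \subseteq \ell^\infty(\mathbb{N})$ is a Banach
space with respect to the supremum norm $\|\cdot\|_\infty$. Show that there is an isometric
isomorphism $(c_0(\mathbb{N}))^* \cong \ell^1(\mathbb{N})$, where the dual pairing is given by
\[
\langle (a_n), (b_n) \rangle = \sum_{n=1}^\infty a_n b_n
\]
for $(a_n) \in c_0(\mathbb{N})$ and $(b_n) \in \ell^1(\mathbb{N})$.
(b) Show that $\bigl(\ell^1(\mathbb{N})\bigr)^* \cong \ell^\infty(\mathbb{N})$, with the same formula for the pairing.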
(c) Show that the Banach limit LIM ∈ (ℓ∞ (N))∗ as in Corollary 7.14 is not in the canonical
image of ℓ1 (N) in (ℓ∞ (N))∗ .
(d) Conclude that neither c0 (N) nor ℓ1 (N) is reflexive.
(e) Now let X be any infinite discrete set, and define c0 (X) (and ℓ1 (X)) to be the space
of all (R-valued or C-valued) functions a on X for which there exists a sequence (xn )
in X with Supp a ⊆ {x1 , x2 , . . .} and such that (a(xn ))n belongs to c0 (N) (resp. ℓ1 (N)).
Generalize (a) and (b) to this context.
7.3.1 The Dual of $L^1_\mu(X)$
We start by generalizing the second part of Exercise 7.33.
Proposition 7.34. Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space. Then
\[
\bigl(L^1_\mu(X)\bigr)^* \cong L^\infty_\mu(X)
\]
under the pairing $\langle f, g \rangle = \int_X fg \, d\mu$ for $f \in L^1_\mu(X)$ and $g \in L^\infty_\mu(X)$. The
operator norm of the functional defined by $g \in L^\infty_\mu(X)$ is the essential supremum
norm $\|g\|_\infty$ (defined on p. 29).
Proof. As indicated, we associate to every $g \in L^\infty_\mu(X)$ the functional
\[
\phi(g) : f \longmapsto \int_X fg \, d\mu
\]
which satisfies
\[
\|\phi(g)\|_{\mathrm{op}} = \sup_{\|f\|_1 \leq 1} \left| \int_X fg \, d\mu \right| \leq \sup_{\|f\|_1 \leq 1} \int_X |f||g| \, d\mu \leq \|g\|_\infty .
\]
For the converse we assume that $g \neq 0$, let $\varepsilon \in (0, \|g\|_\infty)$ and choose a
measurable set $A \subseteq \{x \in X \mid |g(x)| > \|g\|_\infty - \varepsilon\}$ with $\mu(A) > 0$ (which is
possible by definition of the essential supremum) and with $\mu(A) < \infty$ (which
is possible since $\mu$ is $\sigma$-finite). Now define
\[
f = \frac{1}{\mu(A)} 1_A \frac{|g|}{g},
\]
and notice that $\|f\|_1 = 1$ and
\[
\phi(g)(f) = \int_X fg \, d\mu = \frac{1}{\mu(A)} \int_A |g| \, d\mu > \|g\|_\infty - \varepsilon.
\]
This shows that $\|\phi(g)\|_{\mathrm{op}} = \|g\|_\infty$, so $\phi : L^\infty_\mu(X) \to (L^1_\mu(X))^*$ is an isometric
embedding. It remains to show that $\phi$ is onto (this is often the most interesting
part of the identification of a dual space).
For this, assume first that $\mu$ is finite. Then $L^2_\mu(X) \subseteq L^1_\mu(X)$ and
\[
\|f\|_1 \leq \mu(X)^{1/2} \|f\|_2
\]
for all $f \in L^2_\mu(X)$ by the Cauchy–Schwarz inequality. Therefore, a functional
$\ell \in (L^1_\mu(X))^*$ also induces a functional $\ell' \in (L^2_\mu(X))^*$. Since $L^2_\mu(X)$ is
a Hilbert space, the Fréchet–Riesz representation theorem (Corollary 3.19)
now shows that there exists some $g \in L^2_\mu(X)$ with
\[
\ell'(f) = \int_X fg \, d\mu
\]
for all $f \in L^2_\mu(X)$. We now show that $g \in L^\infty_\mu(X)$. Let
\[
A = \{x \in X \mid |g(x)| > \|\ell\|_{\mathrm{op}}\},
\]
so that $f = 1_A \frac{|g|}{g} \in L^2_\mu(X) \subseteq L^1_\mu(X)$ and $\|f\|_1 = \mu(A)$. If $\mu(A) > 0$ then
\[
\|\ell\|_{\mathrm{op}} \mu(A) < \int_A |g| \, d\mu = \int_X fg \, d\mu = |\ell(f)| \leq \|\ell\|_{\mathrm{op}} \mu(A)
\]
gives a contradiction. Thus $\mu(A) = 0$ and so $\|g\|_\infty \leq \|\ell\|_{\mathrm{op}}$. Since $\ell$ and $\phi(g)$
agree on the dense subset $L^2_\mu(X) \subseteq L^1_\mu(X)$, we have $\ell = \phi(g)$, as required.
If $\mu$ is $\sigma$-finite with $X = \bigsqcup_{n=1}^\infty Y_n$ and $\mu(Y_n) < \infty$, then we may apply the
argument above to $\ell|_{L^1_\mu(Y_n)}$ to find some $g_n \in L^\infty_\mu(Y_n)$ with
\[
\|g_n\|_\infty = \bigl\| \ell|_{L^1_\mu(Y_n)} \bigr\|_{\mathrm{op}} \leq \|\ell\|_{\mathrm{op}} .
\]
We define $g(x) = g_n(x)$ for $x \in Y_n$, and obtain a function $g \in L^\infty_\mu(X)$
with $\|g\|_\infty \leq \|\ell\|_{\mathrm{op}}$. We now extend each function in $L^1_\mu(Y_n)$ to an element
of $L^1_\mu(X)$ by setting it to be zero outside $Y_n$. With this the linear span
\[
V = \sum_{n=1}^\infty L^1_\mu(Y_n) \subseteq L^1_\mu(X)
\]
contains all simple functions in its closure, so that we have $\overline{V} = L^1_\mu(X)$. By
construction $\ell$ and $\phi(g)$ coincide on $V$, so once again $\ell = \phi(g)$, as required.
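The norming function $f = \frac{1}{\mu(A)} 1_A \frac{|g|}{g}$ from the first half of the proof is easy to test on a finite measure space. The following sketch makes several simplifying assumptions of ours: $X$ is a finite set with point masses `mu`, functions are lists of values, and $0 < \varepsilon < \|g\|_\infty$ so that $g$ does not vanish on $A$; all names are illustrative.

```python
def norming_function(g, mu, eps):
    """f = 1_A |g| / (mu(A) g) with A = {x : |g(x)| > ||g||_inf - eps}."""
    # Essential supremum: ignore points of measure zero.
    sup = max(abs(v) for v, w in zip(g, mu) if w > 0)
    A = [i for i, (v, w) in enumerate(zip(g, mu)) if w > 0 and abs(v) > sup - eps]
    mass = sum(mu[i] for i in A)
    f = [0j] * len(g)
    for i in A:
        f[i] = abs(g[i]) / (mass * g[i])
    return f


def pairing(f, g, mu):
    """The bilinear pairing <f, g> = integral of f g d(mu)."""
    return sum(x * y * w for x, y, w in zip(f, g, mu))


def l1_norm(f, mu):
    """The L^1 norm of f with respect to mu."""
    return sum(abs(x) * w for x, w in zip(f, mu))


g = [1 + 1j, -2.0, 0.5j, 3.0]
mu = [0.5, 1.0, 2.0, 0.25]
f = norming_function(g, mu, eps=1.5)
print(l1_norm(f, mu), pairing(f, g, mu))
```

Here $\|g\|_\infty = 3$, and the pairing value lies in $(\|g\|_\infty - \varepsilon, \|g\|_\infty]$ while $\|f\|_1 = 1$, as the proof predicts.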

7.3.2 The Dual of $L^p_\mu(X)$ for $p > 1$
Exercise 7.35. Let $p, q \in (1, \infty)$ with $\frac1p + \frac1q = 1$. Show that $(\ell^p(\mathbb{N}))^* \cong \ell^q(\mathbb{N})$.
The following provides us with many examples of reflexive spaces.
Proposition 7.36. Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space, and assume
that $p \in (1, \infty)$ has Hölder conjugate $q$. Then $(L^p_\mu(X))^* \cong L^q_\mu(X)$ via the
pairing
\[
\langle f, g \rangle = \int_X fg \, d\mu
\]
for $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$. The operator norm of the functional determined
by $g$ is precisely $\|g\|_q$.
Proof. For $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$ with $\frac1p + \frac1q = 1$ we have
\[
|\langle f, g \rangle| \leq \|f\|_p \|g\|_q
\]
by the Hölder inequality (Theorem B.15). It follows that the linear functional
defined by $g$ on $L^p_\mu(X)$ is bounded, with norm less than or equal to $\|g\|_q$. If
we set
\[
f = \begin{cases} \frac{|g|}{g} |g|^{q/p} & \text{if } g \neq 0, \\ 0 & \text{if } g = 0 \end{cases}
\]
then
\[
\|f\|_p = \left( \int |g|^q \, d\mu \right)^{1/p} = \|g\|_q^{q/p} < \infty
\]
and
\[
\langle f, g \rangle = \int_X |g|^{1 + \frac{q}{p}} \, d\mu = \int_X |g|^q \, d\mu = \|f\|_p \|g\|_q
\]
shows that the norm of the functional $\phi(g) \in (L^p_\mu(X))^*$ determined by $g$ must
be equal to $\|g\|_q$. It remains to show that every bounded linear functional $\ell$
in $(L^p_\mu(X))^*$ is determined as above by some $g \in L^q_\mu(X)$.
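The extremal function used above to attain the norm can be checked numerically in a discrete setting. In this sketch (our assumptions: a finite set with real-valued functions and positive weights `mu`; the function names are illustrative) the identity $\langle f, g \rangle = \|f\|_p \|g\|_q$ holds up to rounding error, using $1 + \frac{q}{p} = q$.

```python
def lp_norm(f, mu, p):
    """The L^p norm of f with respect to the weights mu, for finite p."""
    return sum(abs(x) ** p * w for x, w in zip(f, mu)) ** (1.0 / p)


def extremal_function(g, p):
    """f = (|g|/g) |g|^(q/p) where g != 0 and f = 0 elsewhere; q is conjugate to p."""
    q = p / (p - 1.0)
    return [abs(v) / v * abs(v) ** (q / p) if v != 0 else 0.0 for v in g]


def pairing(f, g, mu):
    """The bilinear pairing <f, g> = integral of f g d(mu)."""
    return sum(x * y * w for x, y, w in zip(f, g, mu))


p, q = 3.0, 1.5
g = [1.0, -2.0, 0.0, 0.5]
mu = [1.0, 0.5, 2.0, 3.0]
f = extremal_function(g, p)
print(pairing(f, g, mu), lp_norm(f, mu, p) * lp_norm(g, mu, q))
```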
Let ℓ ∈ (Lpµ (X))∗ . Replacing ℓ by ℜ(ℓ), respectively by ℑ(ℓ) if necessary,
we may restrict to real-valued functions on X and to R-linear functionals, as
the complex case then follows by putting together the functions associated
to ℜ(ℓ) and ℑ(ℓ).
So we work over the reals and define
\[
\nu^+(B) = \sup\{\ell(1_A) \mid A \subseteq B \text{ measurable}, \ \mu(A) < \infty\}
\]
for any measurable set $B \subseteq X$. Notice that if $\ell$ were defined by $g$, then $\nu^+(B)$
would be given by $\nu^+(B) = \int_A g \, d\mu = \int_B g^+ \, d\mu$ for $A = \{x \in B \mid g(x) > 0\}$.
Thus for a general $\ell$ we would like to show that $\nu^+ \ll \mu$ is an absolutely
continuous measure on $X$ (which then will give us $g^+$ as a Radon–Nikodym
derivative). Clearly $\nu^+(B_2) \geq \nu^+(B_1) \geq 0$ for measurable $B_1 \subseteq B_2 \subseteq X$. For
measurable disjoint $B_1, B_2 \subseteq X$ and $A_1 \subseteq B_1$, $A_2 \subseteq B_2$ as in the definition
of $\nu^+$, we have
\[
\ell(1_{A_1}) + \ell(1_{A_2}) = \ell(1_{A_1 \cup A_2}) \leq \nu^+(B_1 \cup B_2),
\]
and taking the supremum over $A_1$ and $A_2$ gives
\[
\nu^+(B_1) + \nu^+(B_2) \leq \nu^+(B_1 \cup B_2).
\]
If, on the other hand, $A \subseteq B_1 \cup B_2$ has finite measure we define $A_i = A \cap B_i$
for $i = 1, 2$ and see that
\[
\ell(1_A) = \ell(1_{A_1}) + \ell(1_{A_2}) \leq \nu^+(B_1) + \nu^+(B_2).
\]
Hence on taking the supremum over $A$ we get
\[
\nu^+(B_1) + \nu^+(B_2) = \nu^+(B_1 \cup B_2).
\]
Suppose now that
\[
B = \bigsqcup_{n=1}^\infty B_n .
\]
Then $\sum_{n=1}^N \nu^+(B_n) = \nu^+(B_1 \cup \cdots \cup B_N) \leq \nu^+(B)$ for all $N \geq 1$ by the above
finite additivity and monotonicity, and so
\[
\sum_{n=1}^\infty \nu^+(B_n) \leq \nu^+(B).
\]
To see the converse, let $A \subseteq B$ be measurable with finite measure, and
define $A_n = A \cap B_n$. Clearly $1_A = \sum_{n=1}^\infty 1_{A_n}$ pointwise, but by the dominated
convergence theorem this convergence also holds with respect to $\|\cdot\|_p$
(since $p < \infty$). Since $\ell$ is continuous, it follows that
\[
\ell(1_A) = \sum_{n=1}^\infty \ell(1_{A_n}) \leq \sum_{n=1}^\infty \nu^+(B_n).
\]
As this holds for all $A \subseteq B$ with finite measure we get
\[
\nu^+\Bigl( \bigsqcup_{n=1}^\infty B_n \Bigr) = \sum_{n=1}^\infty \nu^+(B_n),
\]
and so $\nu^+$ is a measure on $X$.
Finally, if $B \subseteq X$ has finite $\mu$-measure, then
\[
\ell(1_A) \leq \|\ell\|_{\mathrm{op}} \|1_A\|_p = \|\ell\|_{\mathrm{op}} \mu(A)^{1/p} \leq \|\ell\|_{\mathrm{op}} \mu(B)^{1/p}
\]
for all $A \subseteq B$, which shows that $\nu^+(B) \leq \|\ell\|_{\mathrm{op}} \mu(B)^{1/p}$. It follows that $\nu^+ \ll \mu$
is absolutely continuous and is $\sigma$-finite since $\mu$ is assumed to be $\sigma$-finite. By
the Radon–Nikodym theorem (see Proposition 3.29) we have $d\nu^+ = g^+ \, d\mu$
for some measurable function $g^+ \geq 0$.
We claim that $g^+ \in L^q_\mu(X)$ is the positive part of the element $g \in L^q_\mu(X)$
we are looking for. For this we first have to check that $g^+ \in L^q_\mu(X)$, which
we do by estimating the $L^q_\mu$ norms of
\[
g_n^+ = \min\{n, g^+ 1_{X_n}\},
\]
where $(X_n)$, with $X_n \subseteq X_{n+1}$ for all $n \geq 1$, is a sequence of sets with finite
measure and $X = \bigcup_{n=1}^\infty X_n$. Notice that $g_n^+ \nearrow g^+$ as $n \to \infty$. Now let $h \geq 0$
be a simple function of the form $h = \sum_{j=1}^m \beta_j 1_{B_j}$, where $\beta_j > 0$ for all $j$ and
with the sets $B_j$ measurable and pairwise disjoint. Then
\[
\begin{aligned}
\int h g_n^+ \, d\mu \leq \int h g^+ \, d\mu &= \sum_{j=1}^m \beta_j \sup\{\ell(1_{A_j}) \mid A_j \subseteq B_j\} \\
&= \sup\Bigl\{ \ell\Bigl( \sum_{j=1}^m \beta_j 1_{A_j} \Bigr) \Bigm| A_j \subseteq B_j \Bigr\},
\end{aligned}
\]
but the expressions inside the last supremum we may estimate by
\[
\ell\Bigl( \sum_{j=1}^m \beta_j 1_{A_j} \Bigr) \leq \|\ell\|_{\mathrm{op}} \Bigl\| \sum_{j=1}^m \beta_j 1_{A_j} \Bigr\|_p \leq \|\ell\|_{\mathrm{op}} \|h\|_p
\]
and so
\[
\int h g_n^+ \, d\mu \leq \|\ell\|_{\mathrm{op}} \|h\|_p .
\]
Using monotone convergence this estimate extends to all positive $h \in L^p_\mu(X)$.
Applying the argument (for $g_n^+ \in L^q_\mu(X)$) from the beginning of the proof this
shows that $\|g_n^+\|_q \leq \|\ell\|_{\mathrm{op}}$ and letting $n \to \infty$ also shows that $\|g^+\|_q \leq \|\ell\|_{\mathrm{op}}$
by monotone convergence.
Now define
\[
\ell^- = \phi(g^+) - \ell \in (L^p_\mu(X))^*
\]
where $\phi(g^+)$ is the functional determined by $g^+ \in L^q_\mu(X)$. Notice that for
all $B \subseteq X$ measurable with $\mu(B) < \infty$ we have
\[
\begin{aligned}
\ell^-(1_B) &= \int 1_B g^+ \, d\mu - \ell(1_B) \\
&= \sup\{\ell(1_A - 1_B) \mid A \text{ measurable}, \ A \subseteq B\} \\
&= \sup\{-\ell(1_C) \mid C \text{ measurable}, \ C \subseteq B\},
\end{aligned} \tag{7.10}
\]
where we used the identity $1_A - 1_B = -1_C$ for $C = B \setminus A$ and $A \subseteq B$.
Using $-\ell$ in the construction above we also obtain a measure $d\nu^- = g^- \, d\mu$
for some $g^- \in L^q_\mu(X)$. For a measurable set $B \subseteq X$ with $\mu(B) < \infty$ we now
obtain from (7.10) that
\[
\int 1_B g^+ \, d\mu - \ell(1_B) = \nu^-(B) = \int 1_B g^- \, d\mu
\]
or equivalently
\[
\ell(1_B) = \int 1_B (g^+ - g^-) \, d\mu = \int 1_B g \, d\mu
\]
for $g = g^+ - g^- \in L^q_\mu(X)$. By the density of simple functions in $L^p_\mu(X)$ we
conclude that $\ell = \phi(g)$, as required.
7.3.3 Riesz–Thorin Interpolation†
The Riesz–Thorin interpolation theorem (also called the Riesz–Thorin convexity
theorem) bounds the norms of linear maps between $L^p$ spaces. This can
be useful because certain $L^p$ spaces have special properties making it easier
to understand properties of operators on them; this particularly applies to
the cases $p = 1$, $2$, and $\infty$.
Proposition 7.37. Let $(X, \mathcal{B}, \mu)$ be a measure space, and assume that
\[
1 \leq q_0 < q < q_1 \leq \infty .
\]
Then
\[
L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X) \subseteq L^q_\mu(X)
\]
and $\|f\|_q \leq \|f\|_{q_0}^{1-t} \|f\|_{q_1}^t$ for all $f \in L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X)$, where $t \in (0, 1)$ is
determined by the relation $\frac1q = \frac{1-t}{q_0} + \frac{t}{q_1}$.
Proof. We first assume that $q_1 = \infty$, and note that $|f|^q \leq |f|^{q_0} \|f\|_\infty^{q - q_0}$
almost everywhere for $f \in L^{q_0}_\mu(X) \cap L^\infty_\mu(X)$. Integrating over $X$ and taking
the $q$th root gives $\|f\|_q \leq \|f\|_{q_0}^{q_0/q} \|f\|_\infty^{(q - q_0)/q}$. Moreover, since $q_1 = \infty$ we
have $\frac1q = \frac{1-t}{q_0}$, giving the proposition in this case.
† The results of this subsection conclude our discussion of Lp -spaces but will not be needed
in the remainder of the book.
Now suppose that $q_1 < \infty$. In this case the numbers $\frac{q_0}{(1-t)q}$ and $\frac{q_1}{tq}$ are
Hölder conjugate by definition of $t$. Let $f \in L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X)$. Applying
Hölder’s inequality (Theorem B.15) gives
\[
\begin{aligned}
\int |f|^q \, d\mu &= \int |f|^{(1-t)q} |f|^{tq} \, d\mu \\
&\leq \bigl\| |f|^{(1-t)q} \bigr\|_{q_0/((1-t)q)} \bigl\| |f|^{tq} \bigr\|_{q_1/(tq)} = \|f\|_{q_0}^{(1-t)q} \|f\|_{q_1}^{tq} .
\end{aligned}
\]
Taking the $q$th root gives the proposition.
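The inequality $\|f\|_q \leq \|f\|_{q_0}^{1-t} \|f\|_{q_1}^t$ is easy to probe numerically. The sketch below works on a finite set with positive weights (a discrete model chosen by us, with hypothetical function names) and returns the gap between the two sides, which should never be negative beyond rounding error.

```python
import random


def lp_norm(f, mu, p):
    """The L^p norm of f with respect to the weights mu; p may be infinite."""
    if p == float("inf"):
        return max(abs(x) for x, w in zip(f, mu) if w > 0)
    return sum(abs(x) ** p * w for x, w in zip(f, mu)) ** (1.0 / p)


def interpolation_gap(f, mu, q0, q1, t):
    """Right-hand side minus left-hand side of ||f||_q <= ||f||_q0^(1-t) ||f||_q1^t."""
    inv_q1 = 0.0 if q1 == float("inf") else 1.0 / q1
    q = 1.0 / ((1 - t) / q0 + t * inv_q1)
    bound = lp_norm(f, mu, q0) ** (1 - t) * lp_norm(f, mu, q1) ** t
    return bound - lp_norm(f, mu, q)


rng = random.Random(0)
f = [rng.uniform(-5, 5) for _ in range(8)]
mu = [rng.uniform(0.1, 2) for _ in range(8)]
print(interpolation_gap(f, mu, 1.0, float("inf"), 0.5))
print(interpolation_gap(f, mu, 2.0, 7.0, 0.3))
```

For an indicator function $f = 1_A$ both sides equal $\mu(A)^{1/q}$, so the gap vanishes and the inequality cannot be improved in general.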

We now consider a linear map that is defined not just on one Lp space but
on several, taking values in possibly different Lq spaces.
Theorem 7.38 (Riesz–Thorin interpolation). Let $(X, \mathcal{B}, \mu)$ and $(Y, \mathcal{C}, \nu)$
be two $\sigma$-finite measure spaces and let $p_0, p_1, q_0, q_1 \in [1, \infty]$. Let
\[
T : L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X) \longrightarrow L^{q_0}_\nu(Y) \cap L^{q_1}_\nu(Y)
\]
be a linear map such that $\|Tf\|_{q_0} \leq M_0 \|f\|_{p_0}$ and $\|Tf\|_{q_1} \leq M_1 \|f\|_{p_1}$ for
all $f \in L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X)$ and some constants $M_0, M_1 > 0$. Then $T$ has
a linear extension to a linear space $D$ of (equivalence classes of) functions
on $X$ into the space $L^{q_0}_\nu(Y) + L^{q_1}_\nu(Y)$ with the following properties. If we
define $p_t$ and $q_t$ for any $t \in (0, 1)$ by $\frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}$ and $\frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}$ then
we have $L^{p_t}_\mu(X) \subseteq D$ and $\|Tf\|_{q_t} \leq M_0^{1-t} M_1^t \|f\|_{p_t}$ for all $f \in L^{p_t}_\mu(X)$. The
conclusion also holds for $t = 0$ if $p_0 < \infty$ and for $t = 1$ if $p_1 < \infty$.
Example 7.39. An interesting example to keep in mind is an application of
the theorem known as the Hausdorff–Young inequality. For this, consider the
map from Theorem 3.47 sending a function $f \in L^2(G)$ for a compact abelian
group $G$ to the map $a^{(f)}$ on the set $\widehat{G}$ of characters defined by
\[
a^{(f)}(\chi) = a^{(f)}_\chi = \langle f, \chi \rangle
\]
for every $\chi \in \widehat{G}$. For $f \in L^2(G)$ we have $a^{(f)} \in \ell^2(\widehat{G})$ and $\|a^{(f)}\|_2 = \|f\|_2$;
for $f \in L^1(G)$ we have $a^{(f)} \in \ell^\infty(\widehat{G})$ with $\|a^{(f)}\|_\infty \leq \|f\|_1$. Formally we
have $p_0 = 2 = q_0$, $p_1 = 1$, $q_1 = \infty$, and $M_0 = M_1 = 1$. The above interpolation
theorem now implies that the map is also defined for functions $f \in L^p(G)$
with $p \in [1, 2]$, taking values in a certain $\ell^q(\widehat{G})$. A short calculation reveals
that in this case $q \in [2, \infty]$ is the Hölder conjugate of $p \in [1, 2]$.
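For a finite cyclic group the Hausdorff–Young inequality can be tested directly. The sketch below rests on assumptions of ours that are not spelled out in the text above: $G = \mathbb{Z}/N\mathbb{Z}$ carries the normalized counting measure, the characters are $\chi_k(x) = e^{2\pi i k x / N}$, the inner product is $\langle f, \chi \rangle = \frac1N \sum_x f(x) \overline{\chi(x)}$, and $\widehat{G}$ carries the counting measure. All function names are illustrative.

```python
import cmath


def fourier_coefficients(f):
    """a(chi_k) = (1/N) sum_x f(x) conj(chi_k(x)) on G = Z/NZ."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n)) / n
            for k in range(n)]


def haar_norm(f, p):
    """L^p norm on Z/NZ with the normalized counting measure."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)


def counting_norm(a, q):
    """l^q norm on the dual group with counting measure; q may be infinite."""
    if q == float("inf"):
        return max(abs(v) for v in a)
    return sum(abs(v) ** q for v in a) ** (1.0 / q)


def hausdorff_young_gap(f, p):
    """||f||_p - ||a^(f)||_q for p in [1, 2] and q the Hölder conjugate of p."""
    q = float("inf") if p == 1 else p / (p - 1.0)
    return haar_norm(f, p) - counting_norm(fourier_coefficients(f), q)


f = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5j]
for p in (1, 1.5, 2):
    print(p, hausdorff_young_gap(f, p))
```

The gap vanishes for $p = 2$ (the Parseval identity in this normalization) and is non-negative for $p \in [1, 2)$.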
Proof of Theorem 7.38 in the case $p_0 = p_1$. Set $D = L^{p_0}_\mu(X)$, and notice
that for $f \in L^{p_0}_\mu(X)$ we have $\|Tf\|_{q_0} \leq M_0 \|f\|_{p_0}$ and $\|Tf\|_{q_1} \leq M_1 \|f\|_{p_0}$.
Applying Proposition 7.37 gives the theorem in this case.
For the general case we will need the following result from complex analysis.
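Lemma 7.40 (Hadamard’s three-lines theorem). Let $\phi$ be a continuous
bounded function on the strip $S = \{z \in \mathbb{C} \mid 0 \leq \Re(z) \leq 1\}$ that is holomorphic
on $S^o$. If $|\phi(z)| \leq M_0$ when $\Re(z) = 0$ and $|\phi(z)| \leq M_1$ when $\Re(z) = 1$,
then $|\phi(t + iu)| \leq M_0^{1-t} M_1^t$ for all $u \in \mathbb{R}$ and $t \in [0, 1]$.
Proof. Assume first that $M_0, M_1 > 0$. For $\varepsilon > 0$, define
\[
\phi_\varepsilon(z) = \phi(z) M_0^{z-1} M_1^{-z} e^{\varepsilon z^2 - \varepsilon}
\]
and note that
\[
\lim_{\varepsilon \to 0} \phi_\varepsilon(z) = \phi(z) M_0^{z-1} M_1^{-z} \tag{7.11}
\]
for all $z \in S$. Moreover, $\phi_\varepsilon$ is continuous on $S$ and holomorphic on $S^o$.
Setting $z = t + iu \in S$ we also have $z^2 = t^2 - u^2 + 2itu$ and so
\[
|\phi_\varepsilon(z)| = |\phi(z)| M_0^{t-1} M_1^{-t} e^{\varepsilon t^2 - \varepsilon u^2 - \varepsilon} \leq |\phi(z)| M_0^{t-1} M_1^{-t} e^{-\varepsilon u^2},
\]
since $t \in [0, 1]$. For $t = \Re(z) = 0$ or $t = \Re(z) = 1$ this gives $|\phi_\varepsilon(z)| \leq 1$ by
assumption on $\phi$. Moreover,
\[
\lim_{|u| \to \infty} |\phi_\varepsilon(z)| = 0,
\]
where $u = \Im(z)$. Applying the maximum modulus theorem on
\[
\{z \in \mathbb{C} \mid 0 \leq \Re(z) \leq 1, \ -N_\varepsilon \leq \Im(z) \leq N_\varepsilon\}
\]
for sufficiently large $N_\varepsilon$ we see that $|\phi_\varepsilon(z)| \leq 1$ for all $z \in S$. By (7.11) it
follows that
\[
|\phi(z)| \leq |M_0^{1-z} M_1^z| = M_0^{1-t} M_1^t
\]
for $z \in S$ with $t = \Re(z)$. If $M_0 = 0$ (or $M_1 = 0$) we may apply the argument
above with $M_0$ (or $M_1$) replaced by any $\delta > 0$ and obtain the lemma by
letting $\delta \to 0$.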
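As a concrete instance of the lemma, take $\phi(z) = 1/(z+1)$, an example of our choosing. It is bounded and holomorphic on the strip, with $|\phi| \leq M_0 = 1$ on $\Re(z) = 0$ and $|\phi| \leq M_1 = \frac12$ on $\Re(z) = 1$, so the lemma predicts $|\phi(t + iu)| \leq 2^{-t}$. The following sketch (function names are ours) samples the strip on a grid and reports the largest value of $|\phi(z)| - M_0^{1-t} M_1^t$ found.

```python
def phi(z):
    """The test function phi(z) = 1 / (z + 1), holomorphic for Re(z) > -1."""
    return 1.0 / (z + 1.0)


def three_lines_bound(m0, m1, t):
    """The interpolated bound M_0^(1-t) M_1^t."""
    return m0 ** (1 - t) * m1 ** t


def max_violation(steps=50, height=20.0):
    """Largest value of |phi(t + iu)| - bound over a grid in the strip."""
    worst = float("-inf")
    for i in range(steps + 1):
        t = i / steps
        for j in range(-steps, steps + 1):
            u = height * j / steps
            gap = abs(phi(complex(t, u))) - three_lines_bound(1.0, 0.5, t)
            worst = max(worst, gap)
    return worst


print(max_violation())
```

The bound is attained only at the boundary points $z = 0$ and $z = 1$, so the reported maximum should be zero up to rounding.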
Proof of Theorem 7.38 in the case $p_0 \neq p_1$. Our first goal is the inequality
\[
\|Tf\|_{q_t} \leq M_0^{1-t} M_1^t \|f\|_{p_t} \tag{7.12}
\]
for a fixed $t \in (0, 1)$ and all $f \in \Sigma_X$. Here $\Sigma_X$ denotes the space of simple
integrable functions on $X$ (and $\Sigma_Y$ is defined similarly). Then $\Sigma_X \subseteq L^p_\mu(X)$
for all $p \in [1, \infty]$ and in particular $T$ is defined on $\Sigma_X$ and satisfies
\[
T(\Sigma_X) \subseteq L^{q_0}_\nu(Y) \cap L^{q_1}_\nu(Y) \subseteq L^{q_t}_\nu(Y)
\]
by the assumption in the theorem and Proposition 7.37. Assume for the
moment that $q_t \in (1, \infty]$. Then the Hölder conjugate $q_t'$ of $q_t$ belongs to $[1, \infty)$
and $\Sigma_Y$ is dense in $L^{q_t'}_\nu(Y)$. Fix some $f \in \Sigma_X$ and assume that
\[
\left| \int (Tf) g \, d\nu \right| \leq M_0^{1-t} M_1^t \|f\|_{p_t} \|g\|_{q_t'} \tag{7.13}
\]
for all $g \in \Sigma_Y$. Then the above and Propositions 7.34 and 7.36 imply (7.12).
The case $q_t = 1$ with $q_t' = \infty$ is only slightly different. Assume again (7.13)
and fix some measurable set $B \subseteq Y$ with $\nu(B) < \infty$. Then
\[
\{g \in \Sigma_Y \mid g(y) = 0 \text{ for } y \in Y \setminus B\}
\]
is dense in $L^\infty_\nu(B)$ and as before (see also Corollary 7.9) we obtain
\[
\|(Tf)|_B\|_1 \leq M_0^{1-t} M_1^t
\]
independent of $B$. Using the fact that $\nu$ is $\sigma$-finite, this again implies (7.12).
For the proof of (7.13) it suffices to fix some $t \in (0, 1)$, some $f \in \Sigma_X$
with $\|f\|_{p_t} = 1$, and some $g \in \Sigma_Y$ with $\|g\|_{q_t'} = 1$. By definition, we may
express $f$ and $g$ as finite sums
\[
f = \sum_{j=1}^m c_j 1_{E_j},
\]
with $c_j \in \mathbb{C}$, $E_j \in \mathcal{B}$, $\mu(E_j) < \infty$ for $1 \leq j \leq m$, and
\[
g = \sum_{k=1}^n d_k 1_{F_k},
\]
with $d_k \in \mathbb{C}$, $F_k \in \mathcal{C}$, $\nu(F_k) < \infty$ for $1 \leq k \leq n$. We may also assume that
the sets $E_j$ are pairwise disjoint, and similarly for the sets $F_k$.
A holomorphic function on the strip. Next define
α(z) = (1 − z)p−1 −1
0 + zp1 ,
β(z) = (1 − z)q0−1 + zq1−1
for z ∈ C and observe that α(t) = p−1

t and β(t) = qt−1 for the fixed t ∈ (0, 1).
Notice that α(t) > 0 (since otherwise p0 = p1 = ∞, a case already considered)
and that β(t) = 1 implies that q0 = q1 = qt = 1, q0′ = q1′ = qt′ = ∞, and
also β(z) = 1 for all z ∈ C. For any z ∈ C we now define
m
X
fz = |cj |α(z)/α(t) arg(cj )1Ej ,
j=1
 n
 X
 |dk |(1−β(z))/(1−β(t)) arg(dk )1Fk if β(t) < 1;
gz = k=1


g if β(t) = 1.
Notice that fz ∈ ΣX , gz ∈ ΣY for all z ∈ C, ft = f , and gt = g. Also define

\[ \phi(z) = \int (T f_z) g_z \, d\nu = \sum_{j,k} A_{jk} |c_j|^{\alpha(z)/\alpha(t)} |d_k|^{(1-\beta(z))/(1-\beta(t))}, \]
where the coefficient
\[ A_{jk} = \arg(c_j d_k) \int (T 1_{E_j}) 1_{F_k} \, d\nu \in \mathbb{C} \]
is independent of z for all j and k. It follows that φ is just a finite linear
combination of exponential functions of the form $z \mapsto a^z$ for some a > 0, so
that φ is entire and bounded on the strip
\[ S = \{ z \in \mathbb{C} \mid 0 \le \Re(z) \le 1 \}. \]
Since $f_t = f$ and $g_t = g$ we also have
\[ \phi(t) = \int (Tf)\, g \, d\nu, \]
which is the quantity that we wish to estimate. The desired estimate will
follow from Lemma 7.40 once we establish its remaining assumptions.
Boundary estimate: Consider therefore z = iu with ℜ(z) = 0 and notice
that $\Re(\alpha(iu)) = p_0^{-1}$ and $\Re(1 - \beta(iu)) = 1 - q_0^{-1} = (q_0')^{-1}$. Since the
sets $E_1, \dots, E_m$, respectively $F_1, \dots, F_n$, are disjoint, this gives
\[ |f_{iu}| = |f|^{\Re(\alpha(iu)/\alpha(t))} = |f|^{p_t/p_0} \]
and
\[ |g_{iu}| = \begin{cases} |g|^{\Re((1-\beta(iu))/(1-\beta(t)))} = |g|^{q_t'/q_0'} & \text{if } \beta(t) < 1; \\ |g| & \text{if } \beta(t) = 1. \end{cases} \]
Using the assumption on T this gives
\[ |\phi(iu)| \le \|T f_{iu}\|_{q_0} \|g_{iu}\|_{q_0'} \le M_0 \|f_{iu}\|_{p_0} \|g_{iu}\|_{q_0'} = \begin{cases} M_0 \|f\|_{p_t}^{p_t/p_0} \|g\|_{q_t'}^{q_t'/q_0'} = M_0 & \text{if } \beta(t) < 1; \\ M_0 \|f\|_{p_t}^{p_t/p_0} \|g\|_\infty = M_0 & \text{if } \beta(t) = 1 \end{cases} \]
since $\|f\|_{p_t} = \|g\|_{q_t'} = 1$. The case of z = 1 + iu with ℜ(z) = 1 works similarly
after noticing that $\Re(\alpha(1 + iu)) = p_1^{-1}$ and $\Re(\beta(1 + iu)) = q_1^{-1}$, which leads
to the bound $|\phi(1 + iu)| \le M_1$.
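The boundary identity used above, namely that the norm of f_{iu} in L^{p_0} equals the p_t/p_0 power of the norm of f in L^{p_t}, can be checked numerically for a simple function. The following is only a sketch under stated assumptions: the function names, the sample coefficients and the weights µ(E_j) are hypothetical, p_0 and p_1 are assumed finite, all coefficients are assumed non-zero, and arg(c) is read as the unit complex number c/|c|.

```python
import math

def interpolated_exponent(p0, p1, t):
    """Return p_t defined by 1/p_t = (1 - t)/p0 + t/p1 (finite p0, p1 assumed)."""
    return 1.0 / ((1.0 - t) / p0 + t / p1)

def lp_norm(coeffs, weights, p):
    """L^p norm of sum_j coeffs[j] * 1_{E_j} for disjoint E_j with mu(E_j) = weights[j]."""
    return sum(abs(c) ** p * w for c, w in zip(coeffs, weights)) ** (1.0 / p)

def f_z(coeffs, p0, p1, t, z):
    """Coefficients |c_j|^{alpha(z)/alpha(t)} * c_j/|c_j| of f_z (all c_j assumed non-zero)."""
    alpha_z = (1 - z) / p0 + z / p1
    alpha_t = (1 - t) / p0 + t / p1
    return [abs(c) ** (alpha_z / alpha_t) * (c / abs(c)) for c in coeffs]
```

For z = iu the modulus of each coefficient is |c_j|^{p_t/p_0}, which is the source of the identity, and for z = t the function f itself is recovered.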
The estimate: By Lemma 7.40 we obtain
\[ |\phi(t)| = \left| \int (Tf)\, g \, d\nu \right| \le M_0^{1-t} M_1^{t}. \]
Since $g \in \Sigma_Y$ was arbitrary subject to $\|g\|_{q_t'} = 1$, this implies (as explained
after (7.13)) that
\[ \|Tf\|_{q_t} \le M_0^{1-t} M_1^{t} \]
for any $f \in \Sigma_X$ with $\|f\|_{p_t} = 1$. By the density of $\Sigma_X$ in $L^{p_t}_\mu(X)$ and
Proposition 2.59 it follows that T has a unique extension $T_{p_t}$ to all of $L^{p_t}_\mu(X)$
with values in $L^{q_t}_\nu(Y)$ such that $\|T_{p_t} f\|_{q_t} \le M_0^{1-t} M_1^{t} \|f\|_{p_t}$ for $f \in L^{p_t}_\mu(X)$.
One linear map: It remains to find one common domain D that contains
$L^{p_t}_\mu(X)$ for all t in (0, 1) and a linear map on D that extends the
extension above. For this we let $D_0 = L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X)$ and
\[ D = \overline{D_0}^{p_0} + \overline{D_0}^{p_1} \subseteq L^{p_0}_\mu(X) + L^{p_1}_\mu(X), \]
where $\overline{\,\cdot\,}^{p_t}$ denotes the closure with respect to $\|\cdot\|_{p_t}$ for t ∈ [0, 1], and we
have
\[ D = L^{p_0}_\mu(X) + L^{p_1}_\mu(X) \]
unless $\infty \in \{p_0, p_1\}$. Applying the assumption of the theorem and Proposition 2.59
we find extensions of T (initially only defined on $D_0$): $T_{p_0}$ extending
T to $\overline{D_0}^{p_0}$ and $T_{p_1}$ extending T to $\overline{D_0}^{p_1}$. We now define
\[ T(f_0 + f_1) = T_{p_0}(f_0) + T_{p_1}(f_1) \]
for $f_0 \in \overline{D_0}^{p_0}$ and $f_1 \in \overline{D_0}^{p_1}$. We claim that this gives a well-defined extension
of T to D with values in $L^{q_0}_\nu(Y) + L^{q_1}_\nu(Y)$. Indeed, if $f_0 + f_1 = f_0' + f_1' \in D$
with $f_0, f_0' \in \overline{D_0}^{p_0}$ and $f_1, f_1' \in \overline{D_0}^{p_1}$ then $f_0 - f_0' = f_1' - f_1 \in D_0$ and
\[ T_{p_0}(f_0) - T_{p_0}(f_0') = T(f_0 - f_0') = T(f_1' - f_1) = T_{p_1}(f_1') - T_{p_1}(f_1). \]
Rearranging the terms, the claim follows.

The map on $L^{p_t}_\mu(X)$: Now let $f \in L^{p_t}_\mu(X)$ and assume without loss of generality
that $p_0 < p_1$. We define the set $B = \{x \in X \mid |f(x)| \le 1\}$ and use
it to split $f = f^s + f^l$ into a component $f^s$ taking on small values and a
component $f^l$ taking on large values, where
\[ f^s = f 1_B \in L^{p_t}_\mu(X) \cap L^{p_1}_\mu(X) \]
and
\[ f^l = f 1_{X \setminus B} \in L^{p_t}_\mu(X) \cap L^{p_0}_\mu(X). \]
If we now choose a sequence $(f_n)$ in $\Sigma_X$ with $|f_n| \le |f|$ for all $n \ge 1$ and
with $f_n \to f$ pointwise as n → ∞ then $f_n^s = f_n 1_B \to f^s$ in $L^{p_t}_\mu(X)$ and
in $L^{p_1}_\mu(X)$ by dominated convergence if $p_1 < \infty$. If however $p_1 = \infty$ then
we can choose the sequence $(f_n)$ of simple functions to also have $f_n^s \to f^s$
with respect to $\|\cdot\|_\infty$ as n → ∞. Similarly, $f_n^l = f_n 1_{X \setminus B} \to f^l$ in $L^{p_t}_\mu(X)$
and in $L^{p_0}_\mu(X)$. Therefore, $T(f_n^s) \to T_{p_t}(f^s)$ in $L^{q_t}_\nu(Y)$ and $T(f_n^s) \to T_{p_1}(f^s)$
in $L^{q_1}_\nu(Y)$ as n → ∞. Choosing a subsequence if necessary, the convergence
also holds pointwise almost everywhere, which gives $T_{p_t}(f^s) = T_{p_1}(f^s)$. The
same argument gives $T_{p_t}(f^l) = T_{p_0}(f^l)$, and $T_{p_t}(f) = T(f)$ follows.
Exercise 7.41. Show that $t \mapsto \log \|T_{p_t}\|$ is convex for t ∈ (0, 1), where
\[ T_{p_t} : L^{p_t}_\mu(X) \to L^{q_t}_\nu(Y) \]
is as in the proof above.
Exercise 7.42. Let G be a locally compact, σ-compact, metrizable, abelian group. Fix
some p ∈ [1, ∞) with Hölder conjugate q and some $F \in L^p(G)$. Show (or recall) that
\[ f * F(x) = \int f(t) F(x - t) \, dm_G(t) \]
is well-defined almost everywhere for $f \in L^1(G)$ with $\|f * F\|_p \le \|f\|_1 \|F\|_p$ and also
for $f \in L^q(G)$ with $\|f * F\|_\infty \le \|f\|_q \|F\|_p$. Apply the Riesz–Thorin interpolation theorem
to obtain a conclusion for all $f \in L^r(G)$ with r ∈ [1, q].
7.4 Riesz Representation: The Dual of C(X)
The next result is useful in many ways. It will allow us to completely de-
scribe C(X)∗ in Section 7.4.5, but it is more often used directly in the form
presented here.
Definition 7.43. Let F(X) be a space of real- or complex-valued functions
on some space X. Then a positive linear functional on F(X) is a linear
functional Λ defined on F(X) with the property that any real-valued function
f ∈ F(X) with f ≥ 0 is mapped to Λ(f) ≥ 0.
Theorem 7.44 (Riesz representation). Let X be a locally compact, σ-compact
metric space, and suppose that Λ is a positive linear functional
defined on $C_c(X)$. Then there exists a uniquely determined locally finite (positive)
Borel measure µ such that
\[ \Lambda(f) = \int_X f \, d\mu \]
for all $f \in C_c(X)$.
Recall that a measure µ is locally finite if every point has a neighbourhood

of finite measure, or equivalently if every compact subset of X has finite meas-
ure. A locally finite Borel measure is also often called a Radon measure. As a
real-valued function in Cc (X) is the difference of two non-negative functions
in Cc (X), a positive linear functional maps a real-valued function to a real
number. Hence we may and will restrict in the proof below to the real case
as this case implies the complex case as well.
Exercise 7.45. Let X be a σ-compact, locally compact metric space. Let µ be a locally
finite measure on X. Show that µ is regular, meaning that
µ(B) = sup{µ(K) | K ⊆ B is compact} = inf{µ(O) | O ⊇ B is open}
for any Borel set B ⊆ X.
We will prove Theorem 7.44 in several steps, first showing the claimed
uniqueness of the measure, then showing existence in the totally disconnected
compact case, then the compact case and finally the general case.
7.4.1 Uniqueness
We will prove uniqueness without assuming the measure to be locally finite,

but this is automatic for a measure representing Λ.
Proof of uniqueness in Theorem 7.44. Let Λ be a positive linear functional
on $C_c(X)$ and suppose µ and ν are two positive measures with
\[ \int_X f \, d\mu = \Lambda(f) = \int_X f \, d\nu \]
for all $f \in C_c(X)$. This implies that µ and ν are locally finite, since for every
compact set K ⊆ X there exists some function $f \in C_c(X)$ with $f \ge 1_K$ by
Urysohn’s lemma (Lemma A.27), which shows that $\mu(K), \nu(K) \le \Lambda(f) < \infty$.
Define m = µ + ν, so that µ ≪ m and ν ≪ m. By Proposition 3.29 there
exist Radon–Nikodym derivatives $f_\mu, f_\nu \ge 0$ with
\[ d\mu = f_\mu \, dm, \qquad d\nu = f_\nu \, dm, \]
and $f_\mu + f_\nu = 1$ m-almost everywhere. Therefore
\[ \int_X g f_\mu \, dm = \int_X g \, d\mu = \Lambda(g) = \int_X g \, d\nu = \int_X g f_\nu \, dm \]
for all $g \in C_c(X)$. Since $C_c(X) \subseteq L^1_m(X)$ is dense by Proposition 2.51, the
functions $f_\mu$ and $f_\nu$ in $L^\infty_m(X)$ determine the same functional on $L^1_m(X)$. By
Proposition 7.34 this implies $f_\mu = f_\nu$ almost everywhere with respect to m.
However, this shows that µ = ν.
7.4.2 Totally Disconnected Compact Spaces
As our first step towards the existence of the measure representing a positive
linear functional we consider the following kind of spaces, where the proof is
quite simple.
Definition 7.46. Let X be a topological space. A set C ⊆ X is called clopen
if it is both open and closed in X. The space X is called totally disconnected
if every open set in X is a union of clopen sets, so the topology has a basis
consisting of clopen sets.
Example 7.47. Before we give the proof, let us give examples of compact
metric totally disconnected spaces.
(1) $X = \{1, \dots, a\}^{\mathbb{N}}$ is a compact metrizable space with respect to the
product topology using the discrete topology on {1, . . . , a}. It is also
totally disconnected, since for any finite collection $F_1, \dots, F_n \subseteq \{1, \dots, a\}$
the set $\pi_1^{-1}(F_1) \cap \cdots \cap \pi_n^{-1}(F_n)$ is both open and closed (here $\pi_j$ is the
projection X → {1, . . . , a} onto the jth coordinate).
(2) More generally, we can also take the product $X = \prod_{n=1}^{\infty} A_n$, where
each $A_n$ is a finite set equipped with the discrete topology. Note that
any closed subset Y ⊆ X is again totally disconnected and compact.
One way to define a metric on X as in (1) or (2) and hence also on Y as
in (2) is to set
\[ d(x, y) = \begin{cases} 0 & \text{if } x = y, \text{ and} \\ \frac{1}{n} & \text{if } x_1 = y_1, \dots, x_{n-1} = y_{n-1}, \text{ but } x_n \ne y_n \end{cases} \]
for all points x, y (see also Lemma A.17). In this metric the open ball of
radius $\frac{1}{n}$ and centre y is given by
\[ B_{1/n}(y) = \{ x \mid x_1 = y_1, \dots, x_n = y_n \} = \pi_1^{-1}(\{y_1\}) \cap \cdots \cap \pi_n^{-1}(\{y_n\}). \]
Also note $B_r(y) = B_{1/n}(y)$ if $\frac{1}{n+1} < r \le \frac{1}{n}$. It follows that there are only
countably many balls and that these are all clopen. As every open set O ⊆ X
is a union of balls it follows that every open set is actually a countable union
of clopen sets. In particular, the clopen sets generate the Borel σ-algebra.
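The description of balls as cylinder sets can be tested on a finite truncation of the sequence space. The sketch below is an illustration under assumptions: sequences are modelled by tuples of a fixed finite length over a small alphabet, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def d(x, y):
    """d(x, y) = 1/n for the first (1-based) index n with x_n != y_n, and 0 if x == y."""
    for n, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return Fraction(1, n)
    return Fraction(0)

def ball(points, centre, radius):
    """Open ball {x : d(x, centre) < radius} inside the finite list `points`."""
    return {x for x in points if d(x, centre) < radius}

def cylinder(points, centre, n):
    """All points that agree with `centre` in the first n coordinates."""
    return {x for x in points if x[:n] == centre[:n]}
```

On this model one can confirm that the ball of radius 1/n is the cylinder fixing the first n coordinates, and that the radius can be varied within (1/(n+1), 1/n] without changing the ball.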
Lemma 7.48. Let X be a totally disconnected compact metric space. Then
the Borel σ-algebra is generated by the clopen sets.
As we have already obtained a proof of the lemma in the setting of Ex-
ample 7.47 and since these cases will be sufficient for the proof of The-
orem 7.44 we leave the proof as an exercise.
Exercise 7.49. (a) Prove Lemma 7.48 in general by showing that in a compact totally
disconnected metric space, there are only countably many clopen sets.
(b) Show that every compact totally disconnected metric space is homeomorphic to a
metric space Y as in Example 7.47(2).
Proof of Theorem 7.44 for totally disconnected compact metric
spaces as in Example 7.47. Let X be a totally disconnected compact metric
space so that the algebra $\mathcal{C} = \{C \subseteq X \mid C \text{ is open and closed}\}$ generates
the Borel σ-algebra of X. Let Λ : C(X) → R be a positive linear functional.
Using Λ we can already define a content $\mu_{\mathcal{C}}$ on the algebra $\mathcal{C}$. In fact, for $C \in \mathcal{C}$
we define
\[ \mu_{\mathcal{C}}(C) = \Lambda(1_C). \]
This is possible since $1_C \in C(X)$ as C is both open and closed. It follows
that
• $\mu_{\mathcal{C}}(C) \ge 0$ for $C \in \mathcal{C}$ (Positivity);
• $\mu_{\mathcal{C}}(C_1 \sqcup C_2) = \mu_{\mathcal{C}}(C_1) + \mu_{\mathcal{C}}(C_2)$ for disjoint $C_1, C_2 \in \mathcal{C}$ (Finite additivity).
By Caratheodory’s extension theorem (see Theorem B.4) we can extend $\mu_{\mathcal{C}}$
to a measure on the Borel σ-algebra $\mathcal{B}$ of X if
\[ \mu_{\mathcal{C}}\Bigl( \bigsqcup_{n=1}^{\infty} C_n \Bigr) = \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n) \]
for any disjoint sets $C_1, C_2, \dots$ in $\mathcal{C}$ with $\bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. In the totally disconnected
compact setting this is quite easy to check. Suppose that $C_n \in \mathcal{C}$ are
disjoint for $n \ge 1$ and $C = \bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. Then C is compact since $C \in \mathcal{C}$
gives that it is a closed subset of X and X is compact. On the other hand the
sets $C_n \in \mathcal{C}$ are open, so $C = \bigsqcup_{n=1}^{\infty} C_n$ is an open cover of a compact set.
It follows that $C = \bigsqcup_{n=1}^{N} C_n$ for some $N \ge 1$, and hence $C_n = \emptyset$ for n > N.
Hence finite additivity gives
\[ \mu_{\mathcal{C}}(C) = \sum_{n=1}^{N} \mu_{\mathcal{C}}(C_n) = \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n), \]
as required.
Therefore, $\mu_{\mathcal{C}}$ can be extended to a measure µ, defined on the Borel σ-algebra
$\mathcal{B}$ of X. By construction
\[ \int_X 1_C \, d\mu = \Lambda(1_C) \]
for $C \in \mathcal{C}$. We wish to extend this formula to all continuous functions. For
this we note that this formula extends trivially to all linear combinations of
characteristic functions of clopen sets. Now note that the linear hull $\mathcal{A}$ of the
characteristic functions of clopen sets is an algebra, contains the constant
function, and separates points. Hence it is dense in C(X) by the Stone–Weierstrass
theorem (Theorem 2.40; this is also easy to see directly but the
given argument is shorter). It follows that for every f ∈ C(X) and ε > 0
there exists some $g \in \mathcal{A}$ (already satisfying $\Lambda(g) = \int_X g \, d\mu$) such that
\[ g - \varepsilon < f < g + \varepsilon. \]
Hence we may apply Λ and $\int \cdot \, d\mu$ and obtain from the positivity of both
these functionals the bounds
\[ \Lambda(f), \int f \, d\mu \in [\Lambda(g) - \varepsilon \Lambda(1), \Lambda(g) + \varepsilon \Lambda(1)] \]
and so
\[ \left| \int f \, d\mu - \Lambda(f) \right| \le 2 \varepsilon \Lambda(1). \]
As this holds for all ε > 0 and all f ∈ C(X), the theorem follows.
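In the simplest totally disconnected case, a finite discrete space, every subset is clopen and the construction above reduces to reading off point masses from Λ. The following sketch illustrates this under assumptions: the space is the finite set {0, 1}^3, the functional is given by hypothetical weights, and all function names are illustrative.

```python
from itertools import product

def make_functional(weights):
    """A positive linear functional f -> sum_x weights[x] * f(x) on functions on a finite set."""
    return lambda f: sum(w * f(x) for x, w in weights.items())

def content(functional, subset):
    """The content mu_C(C) = Lambda(1_C) of a (clopen) subset C."""
    return functional(lambda x: 1 if x in subset else 0)

def integrate(functional, points, f):
    """Integral of f against the measure with point masses Lambda(1_{x})."""
    return sum(f(x) * content(functional, {x}) for x in points)
```

Here `integrate` only uses values of Λ on indicator functions, yet it agrees with Λ on every function, which is the finite analogue of the density argument in the proof.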
7.4.3 Compact Spaces
We now upgrade the result from Section 7.4.2 to the case of a general com-
pact metric space. For this we are going to use the Hahn–Banach lemma
(Lemma 7.1) and the following lemma.
Lemma 7.50 (Symbolic cover). Let X be a compact metric space. Then

there exists a totally disconnected compact metric space Y and a continuous
surjective map φ : Y → X. In fact, we can choose Y as in Example 7.47(2).
Example 7.51. A few cases of this lemma do not need a proof, and should
help explain why one can think of Y as a symbolic cover.
• If X = [0, 1] then we may take $Y = \{0, 1\}^{\mathbb{N}}$ to be the space of all binary
sequences with the map $\phi((a_n)) = \sum_{n=1}^{\infty} a_n 2^{-n}$ sending the binary
sequence to the real number with that binary expansion.
• Let $X \subseteq [-M, M]^d$ be a compact subset of $\mathbb{R}^d$. By composing with an
affine map, we can assume without loss that $X \subseteq [0, 1]^d = X'$. Define
\[ Y' = \bigl( \{0, 1\}^{\mathbb{N}} \bigr)^d \]
and a continuous surjective map just as above
\[ \phi' : Y' \longrightarrow X' = [0, 1]^d, \qquad \bigl( (a_n^{(1)}), \dots, (a_n^{(d)}) \bigr) \longmapsto \Bigl( \sum_{n=1}^{\infty} a_n^{(1)} 2^{-n}, \dots, \sum_{n=1}^{\infty} a_n^{(d)} 2^{-n} \Bigr) \]
and finally $Y = \{ y' \in Y' \mid \phi'(y') \in X \}$ with $\phi = \phi'|_Y$. Then $Y \subseteq Y'$
is closed and so again is a totally disconnected compact metric space,
and φ : Y → X is continuous and surjective.
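The binary expansion map of the first bullet can be explored with exact arithmetic on finite truncations. This is a sketch under assumptions: only finitely many digits are used, so the values are partial sums of the series, and the function names are hypothetical.

```python
from fractions import Fraction

def phi(bits):
    """Partial sum sum_n a_n 2^{-n} of a finite binary string (a_1, ..., a_N), as an exact fraction."""
    return sum(Fraction(a, 2 ** n) for n, a in enumerate(bits, start=1))

def greedy_bits(x, length):
    """First `length` binary digits of x in [0, 1), chosen greedily."""
    bits = []
    for _ in range(length):
        x *= 2
        bit = 1 if x >= 1 else 0
        bits.append(bit)
        x -= bit
    return bits
```

The greedy digits show that every point of [0, 1) is approximated to within 2^{-N} after N digits, which reflects surjectivity in the limit. Comparing the strings 1000... and 0111... shows that the map is not injective: both sequences have value 1/2 in the limit.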
Exercise 7.52. Suppose that X is a compact d-dimensional manifold. Construct Y and φ
as in Lemma 7.50.
We postpone the proof of the lemma until after we have seen why it is
useful for the problem at hand.
Proof of Theorem 7.44 for compact metric spaces. Let X be a
compact metric space, and let Y and φ : Y → X be as in Lemma 7.50.
Let Λ : C(X) → R be a positive linear functional. For f ∈ C(X) we have
\[ \Bigl( \sup_{x \in X} f(x) \Bigr) 1_X - f \ge 0, \]
so
\[ \Lambda\Bigl( \Bigl( \sup_{x \in X} f(x) \Bigr) 1_X - f \Bigr) \ge 0 \]
by positivity, or equivalently
\[ \Lambda(f) \le \Lambda(1_X) \sup_{x \in X} f(x). \]
Now let V = {f ◦ φ | f ∈ C(X)} ⊆ C(Y), where we used the continuity of φ,
and notice that if $f_1 \circ \phi = f_2 \circ \phi$ for $f_1, f_2 \in C(X)$ then $f_1 = f_2$ since φ is
surjective. Thus we may define $\Lambda_V(f \circ \phi) = \Lambda(f)$, which is linear and satisfies
\[ \Lambda_V(f \circ \phi) = \Lambda(f) \le \Lambda(1_X) \sup_{x \in X} f(x) = p(f \circ \phi) \]
for p : C(Y) → R defined by
\[ p(F) = \Lambda(1_X) \sup_{y \in Y} F(y). \]
Note that p satisfies $p(F_1 + F_2) \le p(F_1) + p(F_2)$ and $p(\alpha F_1) = \alpha p(F_1)$
for $F_1, F_2 \in C(Y)$ and α > 0. These are precisely the hypotheses for the
Hahn–Banach lemma (Lemma 7.1), so we conclude that $\Lambda_V$ can be extended
to a functional $\Lambda_Y : C(Y) \to \mathbb{R}$ which still satisfies
\[ \Lambda_Y(F) \le \Lambda(1_X) \sup_{y \in Y} F(y). \]
If $F \ge 0$ then $-F \le 0$ and so
\[ \Lambda_Y(-F) \le \Lambda(1_X) \sup_{y \in Y} (-F(y)) \le 0, \]
or $\Lambda_Y(F) \ge 0$. Hence $\Lambda_Y$ is a positive linear functional on C(Y). By the totally
disconnected compact case in Section 7.4.2 we conclude that there is a measure
$\mu_Y$ on Y with
\[ \Lambda_Y(F) = \int_Y F \, d\mu_Y \]
for all F ∈ C(Y). Applying this to F = f ◦ φ, we see that
\[ \Lambda(f) = \Lambda_Y(f \circ \phi) = \int_Y f \circ \phi \, d\mu_Y. \]
We now define $\mu = \phi_* \mu_Y$ by the formula $\mu(B) = \mu_Y(\phi^{-1}(B))$ for a Borel
set B ⊆ X. Note that by this definition we have
\[ \int_X 1_B \, d\mu = \mu(B) = \mu_Y(\phi^{-1}(B)) = \int_Y 1_B \circ \phi \, d\mu_Y, \]
which extends by linearity to all simple functions, then by monotone convergence
to all positive measurable functions, and then to all integrable functions.
In particular,
\[ \Lambda(f) = \int_Y f \circ \phi \, d\mu_Y = \int_X f \, d\mu \]
for all f ∈ C(X), proving the theorem for a compact metric space X.
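The change of variables formula for the push-forward measure used at the end of the proof can be illustrated for measures with finitely many point masses. The sketch below rests on assumptions: measures are represented as dictionaries from points to masses, and the names `pushforward` and `integral` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu_Y, phi):
    """mu(B) = mu_Y(phi^{-1}(B)) for point masses: the mass of y is added at the point phi(y)."""
    mu = defaultdict(Fraction)
    for y, mass in mu_Y.items():
        mu[phi(y)] += mass
    return dict(mu)

def integral(mu, f):
    """Integral of f against a measure given by finitely many point masses."""
    return sum(f(x) * mass for x, mass in mu.items())
```

Integrating f against the push-forward gives the same value as integrating f ◦ φ against the original measure, which is the identity used to pass from Y back to X.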
We note that the argument above actually proves the following abstract
principle. If φ : Y → X is a continuous surjective map between two compact
spaces, and the Riesz representation theorem holds for Y , then it also holds
for X.
It remains to construct the totally disconnected symbolic cover.
Proof of Lemma 7.50. Recall that since X is a compact metric space,
it is also totally bounded, so for every $m \ge 1$ there exist finitely many
points $x_1^{(m)}, \dots, x_{n(m)}^{(m)} \in X$ with
\[ X = \bigcup_{i=1}^{n(m)} B_{1/m}\bigl( x_i^{(m)} \bigr). \tag{7.14} \]
We define
\[ Z = \prod_{m=1}^{\infty} \{1, \dots, n(m)\} \]
with the product topology from the discrete topologies on each of the
spaces {1, . . . , n(m)}. Then Z is a compact metric space (see Sections A.3
and A.4). We will define Y as a closed subset of Z, and will define φ : Y → X
by
\[ \phi(y) = \lim_{m \to \infty} x_{y(m)}^{(m)}, \]
where y(m) ∈ {1, . . . , n(m)} is the mth coordinate of y and $x_{y(m)}^{(m)}$ is the
corresponding centre of the y(m)-th ball in the cover (7.14). Our definition
of Y will ensure that φ is well-defined (that is, the limit defining φ exists),
continuous, and surjective.
The closed set Y. Define
\[ Y = \Bigl\{ y \in Z \mid B_{1/1}\bigl( x_{y(1)}^{(1)} \bigr) \cap \cdots \cap B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr) \ne \emptyset \text{ for all } m \ge 1 \Bigr\}. \]
We will show that Y is closed by proving that its complement $Z \setminus Y$ is open.
So suppose that $z \in Z \setminus Y$, so that
\[ B_{1/1}\bigl( x_{z(1)}^{(1)} \bigr) \cap \cdots \cap B_{1/m}\bigl( x_{z(m)}^{(m)} \bigr) = \emptyset \]
for some $m \ge 1$. However, this means that all other sequences with the same
first m coordinates also lie in $Z \setminus Y$. That is,
\[ \pi_1^{-1}(\{z(1)\}) \cap \cdots \cap \pi_m^{-1}(\{z(m)\}) \subseteq Z \setminus Y, \]
and the set on the left is an open neighbourhood of z by definition, so $Z \setminus Y$
is open.
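The limit defining φ exists. Let y ∈ Y and m > ℓ, then there exists a
point
\[ x \in B_{1/\ell}\bigl( x_{y(\ell)}^{(\ell)} \bigr) \cap B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr) \]
and so
\[ d\bigl( x_{y(\ell)}^{(\ell)}, x_{y(m)}^{(m)} \bigr) \le d\bigl( x_{y(\ell)}^{(\ell)}, x \bigr) + d\bigl( x, x_{y(m)}^{(m)} \bigr) < \frac{1}{\ell} + \frac{1}{m} < \frac{2}{\ell}. \tag{7.15} \]
This shows that $\bigl( x_{y(m)}^{(m)} \bigr)$ is a Cauchy sequence in X and so has a limit in X.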
Continuity of φ. Let y ∈ Y and fix ε > 0. Choose ℓ with $\frac{4}{\ell} < \varepsilon$. Suppose
that z ∈ Y belongs to the neighbourhood $\pi_\ell^{-1}(\{y(\ell)\})$ defined by the ℓth
coordinate of y. Letting m → ∞ in (7.15) we see that
\[ d\bigl( x_{y(\ell)}^{(\ell)}, \phi(y) \bigr) \le \frac{2}{\ell} \]
and similarly
\[ d\bigl( x_{z(\ell)}^{(\ell)}, \phi(z) \bigr) \le \frac{2}{\ell}. \]
However, by the choice of z we have y(ℓ) = z(ℓ) and so
\[ d\bigl( \phi(z), \phi(y) \bigr) \le \frac{4}{\ell} < \varepsilon. \]
This shows the continuity of φ.

Surjectivity. Let x ∈ X, and choose for every $m \ge 1$ an index y(m)
in {1, . . . , n(m)} with
\[ x \in B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr), \]
which is possible by (7.14). It follows directly from the definitions that y ∈ Y
and that φ(y) = x.
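The coding step in the surjectivity argument can be made concrete for X = [0, 1]. The sketch below is an illustration under assumptions: the centres at level m are taken to be the points i/m (one possible choice of a cover by balls of radius 1/m), indices start at 0 rather than 1, and the function names are hypothetical.

```python
from fractions import Fraction

def centres(m):
    """Centres i/m for i = 0, ..., m; the open balls of radius 1/m around them cover [0, 1]."""
    return [Fraction(i, m) for i in range(m + 1)]

def code(x, depth):
    """For each m <= depth choose an index y(m) whose ball of radius 1/m contains x."""
    y = []
    for m in range(1, depth + 1):
        index = next(i for i, c in enumerate(centres(m)) if abs(x - c) < Fraction(1, m))
        y.append(index)
    return y
```

Since all chosen balls contain x, the selected centres satisfy the bound in (7.15) and approach x at rate 1/m, which is how the lemma recovers x as φ(y).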
7.4.4 Locally Compact σ-Compact Metric Spaces
Knowing Theorem 7.44 for compact metric spaces, we now extend it to σ-compact
locally compact metric spaces using suitable patchworking.
Proof of Theorem 7.44. Let X be a locally compact σ-compact metric
space, and let $\Lambda : C_c(X) \to \mathbb{R}$ be a positive linear functional. By Lemma A.22
there exists a sequence of compact sets $(K_n)$ with $K_n \subseteq K_{n+1}^o$ for all $n \ge 1$
and with $X = \bigcup_{n=1}^{\infty} K_n$.
By Urysohn’s lemma (Lemma A.27) there exists a function $f_n \in C_c(X)$
with $1_{K_n} \le f_n \le 1$ for each $n \ge 1$. If $f \in C_c(K_n^o)$ then
\[ \Bigl( \sup_{x \in K_n^o} f(x) \Bigr) f_n - f \ge 0 \]
and hence
\[ \Lambda(f) \le \Lambda(f_n) \sup_{x \in K_n^o} f(x). \]
We now consider $C_c(K_n^o)$ as a subspace of the space of continuous functions
$C(K_n)$ on $K_n$. The norm-like function
\[ p_n(f) = \Lambda(f_n) \sup_{x \in K_n^o} f(x) \]
for $f \in C(K_n)$ has all the properties needed to apply Lemma 7.1, so $\Lambda|_{C_c(K_n^o)}$
extends to some $\Lambda_n$ defined on $C(K_n)$ and is again positive (use the argument
from Section 7.4.3 to check this), and can be represented by a finite measure
$\mu_n$ defined on the Borel sets in $K_n$. Restricting this measure $\mu_n$ to $K_n^o$,
we obtain a measure $\mu_n = \mu_n|_{K_n^o}$ on $K_n^o$ with
\[ \Lambda(f) = \int_{K_n^o} f \, d\mu_n \]
for all $f \in C_c(K_n^o)$. We claim that these measures can be patched together to
define a locally finite measure µ on X with the desired properties. For this,
notice that $\mu_{n+1}$ is a measure on $K_{n+1}^o$ which satisfies
\[ \Lambda(f) = \int_{K_{n+1}^o} f \, d\mu_{n+1} = \int_{K_n^o} f \, d\mu_{n+1} \]
for all $f \in C_c(K_n^o) \subseteq C_c(K_{n+1}^o)$. By the uniqueness of the measure in Theorem 7.44
(see Section 7.4.1) this shows that $\mu_{n+1}|_{K_n^o} = \mu_n$ for all $n \ge 1$.
Using this compatibility property we may define
\[ \mu(B) = \lim_{n \to \infty} \mu_n(B \cap K_n^o) = \mu_1(B \cap K_1^o) + \sum_{n=2}^{\infty} \mu_n(B \cap K_n^o \setminus K_{n-1}^o) \]
for any measurable B ⊆ X. Alternatively we may also write $\mu = \sum_{n=1}^{\infty} \mu_n'$,
where $\mu_1' = \mu_1$, $X_1 = K_1^o$, and $\mu_n' = \mu_n|_{X_n}$ with $X_n = K_n^o \setminus K_{n-1}^o$ for $n \ge 2$. By Exercise 3.30
this shows that µ is indeed a measure. Note that $\mu|_{K_n^o} = \mu_n$ for $n \ge 1$.
By construction $\{K_n^o \mid n \in \mathbb{N}\}$ is an open cover of X. Hence for a given
compact subset K ⊆ X there is a finite subcover of K so K is contained in
some $K_n^o$, and we have $\mu(K) \le \mu_n(K_n^o) < \infty$, so µ is locally finite. By the
same argument any $f \in C_c(X)$ belongs to some $C_c(K_n^o)$ and hence
\[ \Lambda(f) = \int_{K_n^o} f \, d\mu_n = \int_{K_n^o} f \, d\mu = \int_X f \, d\mu \]
as required.
Exercise 7.53. Let X be a σ-compact locally compact metric space, and let Λ be a positive
linear functional $C_0(X) \to \mathbb{R}$ (where we do not assume that Λ is bounded). Show
that $\Lambda(f) = \int f \, d\mu$ for all $f \in C_0(X)$ for a finite measure µ on X.
7.4.5 Continuous Linear Functionals on C0 (X)
In the remainder of this section we again treat the real and the complex case
simultaneously. The following result describes the dual of C0 (X).
Theorem 7.54 (Riesz representation on $C_0(X)$). Let X be a locally compact
σ-compact metric space, and let $\Lambda \in (C_0(X))^*$ be a continuous linear
functional on the space $C_0(X)$ of continuous functions on X that vanish at
infinity. Then there exists a uniquely determined signed measure µ representing
Λ. That is, there exists a positive finite measure |µ| and some measurable g
with $\|g\|_\infty = 1$ such that dµ = g d|µ| defines a signed measure with
\[ \Lambda(f) = \int_X f \, d\mu = \int_X f g \, d|\mu| \]
for all $f \in C_0(X)$. The operator norm of Λ is equal to $\|g\|_{L^1_{|\mu|}(X)}$, which
shows that $C_0(X)^* \cong \mathcal{M}(X)$ under the pairing
\[ \langle f, \mu \rangle = \int f \, d\mu \]
for $f \in C_0(X)$ and $\mu \in \mathcal{M}(X)$, where $\mathcal{M}(X)$ is equipped with the norm in
Exercise 3.33.
We note that in a sense Theorem 7.54 also gives a polar decomposition for
complex signed measures (see Exercises 7.56 and 7.55).
In the proof below we first construct from the linear functional Λ a positive
linear functional |Λ| (which may be called the positive version of Λ) which
will give rise to the positive finite measure |µ|. The existence of g will then
follow from Proposition 7.34. At first sight the construction of |Λ| is surprising
— we will force positivity, and then linearity is a minor miracle. Comparing
this construction to our discussion of the operator norm of integration in
Lemma 2.63 and its proof should make this less surprising.
Proof of Theorem 7.54. Let Λ be a continuous linear functional on C0 (X).
Uniqueness: To see the uniqueness claim in the theorem, suppose that Λ is
represented by $d\mu_1 = g_1 \, d|\mu_1|$ and also by $d\mu_2 = g_2 \, d|\mu_2|$. Define
\[ \mu = |\mu_1| + |\mu_2| \]
and notice that $|\mu_1|, |\mu_2| \ll \mu$. By Proposition 3.29 this implies that there is
a measurable function $h_j \ge 0$ with $d|\mu_j| = h_j \, d\mu$ for j = 1, 2. This shows
that Λ is represented by $d\mu_j = g_j h_j \, d\mu$ for j = 1, 2, so $(g_1 h_1 - g_2 h_2) \, d\mu$
represents the zero functional on $C_0(X)$. By Lemma 2.63 this implies that
\[ \|g_1 h_1 - g_2 h_2\|_{L^1_\mu} = 0, \]
which in turn implies that $\mu_1 = \mu_2$.

Defining the positive version |Λ|: To prove the existence, define the
positive version of Λ by
\[ |\Lambda|(f) = \sup \{ \Re(\Lambda(g)) \mid g \in C_0(X), |g| \le f \} \]
for any non-negative and continuous $f \in C_{0,\mathbb{R}}(X)$. Clearly $|\Lambda(g)| \le \|\Lambda\|_{\mathrm{op}} \|f\|_\infty$
for all g as in the definition of |Λ|(f) and so
\[ 0 \le |\Lambda|(f) \le \|\Lambda\|_{\mathrm{op}} \|f\|_\infty. \tag{7.16} \]
Moreover, the definition readily implies |Λ|(αf) = α|Λ|(f) for α > 0. In
order to extend |Λ| to an ℝ-linear functional on $C_{0,\mathbb{R}}(X)$ we first consider
functions $f_1, f_2 \in C_{0,\mathbb{R}}(X)$ with $f_1 \ge 0$ and $f_2 \ge 0$, and claim that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2). \tag{7.17} \]
One inequality is quite easy. If $g_i \in C_0(X)$ satisfies $|g_i| \le f_i$ for i = 1, 2, then
\[ |g_1 + g_2| \le |g_1| + |g_2| \le f_1 + f_2 \]
and so $\Re \Lambda(g_1) + \Re \Lambda(g_2) = \Re \Lambda(g_1 + g_2) \le |\Lambda|(f_1 + f_2)$, which shows
that
\[ |\Lambda|(f_1) + |\Lambda|(f_2) \le |\Lambda|(f_1 + f_2). \]
To show the reverse inequality, we need to take some function $g \in C_0(X)$
with $|g| \le f_1 + f_2$ and split it into continuous functions $g = g_1 + g_2$ with the
property that $|g_1| \le f_1$ and $|g_2| \le f_2$. We define
\[ g_1(x) = \begin{cases} g(x) & \text{if } |g(x)| \le f_1(x), \\ \frac{g(x)}{|g(x)|} f_1(x) & \text{if } |g(x)| \ge f_1(x) \text{ and } g(x) \ne 0, \end{cases} \]
which we claim is a continuous function satisfying $|g_1| \le f_1$.

First consider the restriction h of $g_1$ to the set
\[ D = \{ x \in X \mid |g(x)| \ge f_1(x) \}. \]
Clearly h is continuous wherever g(x) ≠ 0. For the points
\[ x \in D_0 = \{ x \in D \mid g(x) = 0 \} \]
we have h(x) = 0 by definition of $g_1$, and $f_1(x) = 0$ by definition of D. It
follows that $|h| = |f_1|$ on D and, since $f_1$ is continuous, we see that $x_0 \in D_0$
and $x \to x_0$ inside D implies that h(x) → 0 as $x \to x_0$ inside D, which proves
that h is also continuous at points in $D_0$. Now notice that $g_1$ is continuous on
the two closed sets D and $\{x \in X \mid |g(x)| \le f_1(x)\}$, the union of which is X.
It follows that $g_1$ is continuous and satisfies $|g_1| \le f_1$ on X. Since $f_1 \in C_0(X)$
we also have $g_1 \in C_0(X)$.
We also define $g_2 = g - g_1 \in C_0(X)$ and notice that
\[ |g_2(x)| = \begin{cases} 0 & \text{if } |g(x)| \le f_1(x), \\ |g(x)| - f_1(x) & \text{if } |g(x)| > f_1(x) \end{cases} \]
so that $|g_2| \le f_2$ by the assumption on g. Hence
\[ \Re(\Lambda(g)) = \Re(\Lambda(g_1)) + \Re(\Lambda(g_2)) \le |\Lambda|(f_1) + |\Lambda|(f_2), \]
which proves that
\[ |\Lambda|(f_1 + f_2) \le |\Lambda|(f_1) + |\Lambda|(f_2) \]
since $g \in C_0(X)$ was arbitrary satisfying $|g| \le f_1 + f_2$. In particular, we now
obtain (7.17).
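The pointwise splitting g = g_1 + g_2 used for the reverse inequality can be checked on sampled values. The sketch below works under assumptions: functions are represented by lists of values at finitely many sample points, so continuity is not tested, and the name `split` is hypothetical.

```python
def split(g, f1):
    """Return (g1, g2) with g = g1 + g2, where g1 is g cut off at modulus f1 pointwise."""
    # Where |g| > f1 >= 0 the value g is non-zero, so the division below is safe.
    g1 = [gx if abs(gx) <= fx else gx / abs(gx) * fx for gx, fx in zip(g, f1)]
    g2 = [gx - hx for gx, hx in zip(g, g1)]
    return g1, g2
```

Given sample values with |g| ≤ f_1 + f_2, the two pieces then satisfy |g_1| ≤ f_1 and |g_2| ≤ f_2 at every sample point, as claimed in the proof.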
Linearity of |Λ|: Now let f be any real-valued function in $C_{0,\mathbb{R}}(X)$. We
extend the definition of |Λ| by the formula
\[ |\Lambda|(f) = |\Lambda|(f^+) - |\Lambda|(f^-), \tag{7.18} \]
where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ are non-negative continuous
functions. We now have
\[ |\Lambda|(\alpha f) = \alpha |\Lambda|(f) \]
for all α ∈ R and $f \in C_{0,\mathbb{R}}(X)$. We note that (7.16) extends to all $f \in C_{0,\mathbb{R}}(X)$
since $|\Lambda|(f^+), |\Lambda|(f^-) \in [0, \|\Lambda\|_{\mathrm{op}} \|f\|_\infty]$. For linearity it remains to show that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2) \tag{7.19} \]
for $f_1, f_2 \in C_{0,\mathbb{R}}(X)$. To see this, notice first that
\[ (f_1 + f_2)^+ - (f_1 + f_2)^- = f_1 + f_2 = f_1^+ - f_1^- + f_2^+ - f_2^- \]
and so $(f_1 + f_2)^+ + f_1^- + f_2^- = (f_1 + f_2)^- + f_1^+ + f_2^+$. We may apply |Λ| to
the latter equation and use the non-negative linearity in (7.17) to get
\[ |\Lambda|\bigl( (f_1 + f_2)^+ \bigr) + |\Lambda|(f_1^-) + |\Lambda|(f_2^-) = |\Lambda|\bigl( (f_1 + f_2)^- \bigr) + |\Lambda|(f_1^+) + |\Lambda|(f_2^+). \]
Rearranging the terms again we get
\[ |\Lambda|\bigl( (f_1 + f_2)^+ \bigr) - |\Lambda|\bigl( (f_1 + f_2)^- \bigr) = |\Lambda|(f_1^+) - |\Lambda|(f_1^-) + |\Lambda|(f_2^+) - |\Lambda|(f_2^-), \]
which is precisely (7.19) by definition of |Λ| in (7.18).

Applying Riesz representation: We have thus shown that |Λ| is a
bounded positive linear functional on $C_{0,\mathbb{R}}(X)$. Restricting it to $C_{c,\mathbb{R}}(X)$ we
may apply Theorem 7.44 and find a positive measure |µ| with
\[ |\Lambda|(f) = \int_X f \, d|\mu| \tag{7.20} \]
for $f \in C_{c,\mathbb{R}}(X)$. Now use local compactness, σ-compactness, and Urysohn’s
lemma (see Lemma A.22 and Lemma A.27) to find some non-negative function
$f_n \in C_{c,\mathbb{R}}(X)$ with $f_n \nearrow 1$ as n → ∞ and apply monotone convergence
to obtain
\[ |\mu|(X) = \lim_{n \to \infty} \int f_n \, d|\mu| = \lim_{n \to \infty} |\Lambda|(f_n) \le \|\Lambda\|_{\mathrm{op}} \tag{7.21} \]
by (7.16). Note that (7.20) extends now to all $f \in C_{0,\mathbb{R}}(X)$ by applying (7.20)
to the sequence $(f_n f)$ in $C_{c,\mathbb{R}}(X)$ together with continuity of |Λ| and dominated
convergence.
Description of Λ: We now return to the study of the original functional Λ.
For any $f \in C_0(X)$ we may apply the definition of |Λ| and |µ| to obtain
\[ |\Lambda(f)| = \alpha \Lambda(f) = \Re(\Lambda(\alpha f)) \le |\Lambda|(|\alpha f|) = \int_X |f| \, d|\mu| \tag{7.22} \]
for some α ∈ C with |α| = 1. However, (7.22) shows that Λ is continuous
with respect to $\|\cdot\|_{L^1_{|\mu|}(X)}$, hence by density of $C_c(X)$ extends to $L^1_{|\mu|}(X)$,
and so by Proposition 7.34 must be of the form
\[ \Lambda(f) = \int_X f g \, d|\mu| \]
for some $g \in L^\infty_{|\mu|}(X)$. Moreover, $\|\Lambda\|_{\mathrm{op}} = \|g\|_{L^1_{|\mu|}(X)}$ by Lemma 2.63, and together
with (7.21) we obtain $\|\Lambda\|_{\mathrm{op}} = \|g\|_{L^1_{|\mu|}(X)} \le \|g\|_\infty |\mu|(X) \le \|g\|_\infty \|\Lambda\|_{\mathrm{op}}$,
and so $\|g\|_\infty = 1$ follows unless $\|\Lambda\|_{\mathrm{op}} = 0$. In the trivial case Λ = 0 we
have |µ| = 0 and may also set g ≡ 1.
Exercise 7.55. In the notation of Theorem 7.54 (and of its proof) show that |g| = 1
for |µ|-almost every x ∈ X.
Exercise 7.56. (a) Recall that $\mu = \nu_1 - \nu_2$ defines a real signed measure µ on X if $\nu_1, \nu_2$
are two finite measures on a measurable space X. Show that for every real signed measure µ
there exist uniquely determined positive measures $\mu^+ \perp \mu^-$ with $\mu = \mu^+ - \mu^-$. Also show
that $\mu^+(B) = \sup\{\mu(A) \mid A \subseteq B \text{ measurable}\}$ and similarly for $\mu^-$. Show the existence
of $\mu^+, \mu^-$ as a corollary of Proposition 3.29 and also of Theorem 7.54 if X is a locally
compact σ-compact metric space.
(b) Suppose that µ is a complex signed measure defined by dµ = h dν for some finite
positive measure ν on X and some $h \in L^1_\nu(X)$. Show that µ also has a representation in
the form dµ = g d|µ| with |g| = 1 everywhere and |µ| being a positive finite measure. Show
that |µ| is uniquely determined, as is g, |µ|-almost everywhere.
Exercise 7.57. Let X be a locally compact σ-compact metric space, and let Λ be a linear
functional on $C_c(X)$ with the property that for any compact K ⊆ X there is a constant
$C_K > 0$ such that $|\Lambda(f)| \le C_K \|f\|_\infty$ for any $f \in C_c(X)$ with Supp f ⊆ K. Show
that Λ can be represented by a signed Radon measure on X, meaning that there exists a
Radon measure µ on X and a locally integrable (that is, integrable on any compact subset)
function g on X such that $\Lambda(f) = \int f g \, d\mu$ for all $f \in C_c(X)$.
Exercise 7.58. Let X = [0, 1] ⊆ R (though the reader will notice that the same conclusions
hold on most compact metric spaces).
(a) Notice that every finite signed measure µ on X defines a linear functional on the
space $\mathscr{L}^\infty(X) = \{f : X \to \mathbb{R} \mid \|f\|_\infty < \infty, f \text{ measurable}\}$ but that $\mathscr{L}^\infty(X)^*$ contains
other functionals as well.
(b) Notice that every function $f \in \mathscr{L}^\infty(X)$ defines a linear functional on the space of finite
signed measures $\mathcal{M}(X) \cong C(X)^*$. Deduce that C(X) is not reflexive. Show that $\mathcal{M}(X)^*$
contains more functionals than those arising from $\mathscr{L}^\infty(X)$.
Exercise 7.59. Find a description of the dual of $C^n([0, 1])$ for all n ∈ N.
7.5 Further Topics
• As we will see in Section 8.6, the Hahn–Banach lemma (Lemma 7.1) is

very useful in the study of closed convex sets (even in the more general
setting of locally convex vector spaces introduced in Section 8.4).
• The explicit description of the dual spaces in this chapter will give us
concrete cases of the weak and the weak* topology in Chapter 8.
• The more general Marcinkiewicz interpolation theorem gives a result sim-
ilar to the Riesz–Thorin interpolation theorem for certain non-linear op-
erators, see Folland [33, Sec. 6.5].
• The Riesz representation theorem (Theorem 7.44) has numerous applic-
ations. It plays a crucial role in obtaining a point in a convex set as a
generalized convex combination of extreme points of the convex set (see
Section 8.6.1), in the spectral theory of unitary and self-adjoint operators
on Hilbert spaces (see Chapters 9 and 12–13), in the spectral theory of
unitary representations of locally compact abelian groups (see Herglotz’s
Theorem 9.6), and also in the construction of the Haar measure of a
locally compact group (see Section 10.1).
Chapter 8
Locally Convex Vector Spaces
In this chapter we introduce the important weak and weak* topologies on

Banach spaces and their duals, prove an important compactness result, in-
troduce two more topologies on B(V, W ), and put these into the general
context of locally convex vector spaces. Finally, we also discuss convex sets
of locally convex vector spaces.
8.1 Weak Topologies and the Banach–Alaoglu Theorem
As we have seen in Proposition 2.35, the unit ball in an infinite-dimensional

normed vector space is not compact in the topology induced by the norm
(which is often called the norm or strong topology). Given the central im-
portance of compactness in much of analysis, this is a significant problem. In
general this is simply something that must be lived with as a price to pay
for the additional power of doing analysis in infinite-dimensional spaces, but
we can also improve the chance of finding compactness by studying weaker
topologies than the norm topology. We also refer to Appendix A.3 since many
definitions of topologies in this chapter are special cases of more general con-
structions discussed there. As usual, for a given normed vector space X the
space X ∗ consists of the linear functionals that are continuous with respect
to the norm topology on X.
Definition 8.1. Let X be a normed vector space with dual space X ∗ . The
weak topology on X is the weakest (coarsest) topology on X for which all the
elements of X ∗ (which are functions on X) are continuous.
Exercise 8.2. Show that the weak and norm topologies coincide for a finite-dimensional
normed vector space.
By the properties of the initial topology (Definition A.15) a neighbourhood
in the weak topology of $x_0 \in X$ is a set containing a set of the form
\[ N_{\ell_1, \dots, \ell_n; \varepsilon}(x_0) = \bigcap_{i=1}^{n} \{ x \in X \mid |\ell_i(x) - \ell_i(x_0)| < \varepsilon \} \]
for some ε > 0 and functionals $\ell_1, \dots, \ell_n \in X^*$. Note that a sequence $(x_n)$
in X converges in the weak topology to x ∈ X if and only if $\ell(x_n) \to \ell(x)$
for every $\ell \in X^*$. However, sequences alone are in general not sufficient to
describe a topology (see Exercise 8.15); one needs to consider filters (or nets)
instead. We therefore generalize this comment to that setting in the next
exercise.
Exercise 8.3. Given a normed vector space X, show that a filter $\mathcal{F} \subseteq \mathcal{P}(X)$ converges in
the weak topology to x ∈ X if and only if $\lim_{\mathcal{F}} \ell = \ell(x)$ for all $\ell \in X^*$ (see Appendix A.2).
If X is infinite-dimensional, then in contrast to Exercise 8.2 the weak
topology and the norm topology are different. To see this notice that
\[ \bigcap_{i=1}^{n} \ker \ell_i \subseteq N_{\ell_1,\dots,\ell_n;\varepsilon}(0), \]
where the left-hand side is a subspace of finite codimension and hence non-trivial.
This implies that no neighbourhood of 0 in the weak topology can be
bounded with respect to the norm of X.
Definition 8.4. Let X be a normed vector space with dual space X ∗ . The
weak* topology (read as ‘weak star’ topology) is the weakest (or coarsest)
topology on X ∗ for which all the evaluation maps x∗ ↦ x∗ (x) corresponding
to x ∈ X are continuous.
Once again we can describe the weak* topology by saying that a neigh-
bourhood of x∗0 ∈ X ∗ is a set containing a set of the form
\[ N_{x_1,\dots,x_n;\varepsilon}(x_0^*) = \bigcap_{i=1}^{n} \{ x^* \in X^* \mid |x^*(x_i) - x_0^*(x_i)| < \varepsilon \} \]
for some ε > 0 and x1 , . . . , xn ∈ X. As before, we can show that the weak*
topology and the norm topology on X ∗ are different if X (and hence if X ∗ )
is infinite-dimensional.
Example 8.5. (a) For a Hilbert space H, the weak and weak* topologies are
identical. The same holds for any reflexive Banach space. However, in general
there is no definition of a weak* topology on a given Banach space as there
may not exist a pre-dual of X, meaning a Banach space Y with X = Y ∗ (see
Example 8.81).
(b) Let X = [0, 1] and consider the sequence of measures (µn ) where
\[ \mu_n = \frac{1}{n} \bigl( \delta_{1/n} + \delta_{2/n} + \cdots + \delta_1 \bigr), \]
viewed (via integration) as functionals on C(X) (see Theorem 7.54; here δt

denotes the point measure defined by δt (A) = 1 if t ∈ A and 0 if not).
Then the sequence of measures (µn ) converges in the weak* topology to the
Lebesgue measure λ, which we also identify with the functional it induces.
Notice that this statement is equivalent to the beginning of the theory of the
Riemann integral for continuous functions, which should help to explain why
the weak* topology is a quite natural notion.
Notice however that (µn ) does not converge in the weak topology, nor a
fortiori in the norm topology. To see the former, notice that every function
in L ∞ ([0, 1]) induces a linear functional on the space M([0, 1]) of finite
signed measures on [0, 1], and that for f = 1_{Q∩[0,1]} we have ∫ f dµn = 1 for
all n, while ∫ f dλ = 0. Thus the weak and weak* topologies on M([0, 1]) are
different.
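The two behaviours in Example 8.5(b) can be observed numerically. The following is a minimal sketch, not part of the original text: the helper names `mu_n`, `square` and `indicator_rationals` are hypothetical, exact rational arithmetic is used so that every sample point k/n is visibly rational, and the test for rationality by type is only an illustration device.

```python
from fractions import Fraction

def mu_n(f, n):
    """Integrate f against mu_n = (1/n) * (delta_{1/n} + ... + delta_1)."""
    return sum(f(Fraction(k, n)) for k in range(1, n + 1)) / n

def square(t):
    # A continuous test function; its Lebesgue integral over [0, 1] is 1/3.
    return t * t

def indicator_rationals(t):
    # Indicator of Q ∩ [0, 1]; every sample point k/n is a Fraction, hence rational.
    return 1 if isinstance(t, Fraction) else 0

for n in (10, 100, 1000):
    # First column tends to 1/3 (weak* convergence tested on a continuous f),
    # second column stays at 1 although the Lebesgue integral of the indicator is 0.
    print(n, float(mu_n(square, n)), mu_n(indicator_rationals, n))
```

The first column is a Riemann sum and approaches the Lebesgue integral, while the second column does not move, matching the failure of weak convergence described above.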
We note that we have already seen other interesting examples of weak*

convergence. In fact, Proposition 3.65 can be interpreted as saying that for
every x ∈ T the measures defined by t ↦ FM (x − t) dt on T converge in the
weak* topology to the Dirac measure δx corresponding to the unit point mass
at x. The reader can analyze the proof of Proposition 3.65 and the material
of this section to prove the following theorem due to Toeplitz.
Exercise 8.6 (Toeplitz). Suppose that (kn ) is a sequence of integrable complex-valued
functions defined on [0, 1], and let x be a point in [0, 1]. Then the measures defined
by kn (t) dt converge in the weak* topology to δx as n → ∞ if and only if all of the
following conditions hold:
(1) ‖kn‖1 ≤ C for some constant C independent of n;
(2) ∫_0^1 kn (t) dt → 1 as n → ∞; and
(3) ∫_0^1 kn (t)g(t) dt → 0 as n → ∞ for all g ∈ C ∞ ([0, 1]) with x ∉ Supp(g).
Lemma 8.7. For a Banach space X the weak topology on X and the weak*
topology on X ∗ are Hausdorff.
Proof. For the weak topology this follows from Corollary 7.4: if y ≠ z in X
there exists some ℓ ∈ X ∗ with ℓ(y) ≠ ℓ(z), so that Nℓ;ε (y) ∩ Nℓ;ε (z) = ∅
for ε = |ℓ(z) − ℓ(y)|/2. The proof for the weak* topology is similar, using the fact
that for x∗1 ≠ x∗2 there exists some x ∈ X with x∗1 (x) ≠ x∗2 (x).
Exercise 8.8. Let X be a Banach space and let (xn ) be a sequence converging to x ∈ X
in the weak topology. Show that sup_{n≥1} ‖xn‖ < ∞. In other words, show that weakly
convergent sequences in Banach spaces are bounded.
The following exercise shows that the weak and the weak* topologies have
natural compatibility properties with respect to bounded operators.
Exercise 8.9. Let A : X → Y be a bounded operator between two Banach spaces X

and Y .
(a) Show that A is also a continuous operator if we equip both X and Y with the respective
weak topologies.
(b) Consider the dual operator A∗ : Y ∗ → X ∗ defined by A∗ (y ∗ ) = y ∗ ◦ A ∈ X ∗ for
all y ∗ ∈ Y ∗ . Show that A∗ is continuous if we endow both dual spaces with the weak*
topologies.
(c) Suppose now that A is a compact operator. Show that Axn → Ax as n → ∞ in the
norm topology on Y whenever xn → x as n → ∞ in the weak topology on X.
8.1.1 Weak* Compactness of the Unit Ball
The importance of the weak* topology comes from the following theorem,
which was alluded to in the introduction to the chapter.
Theorem 8.10 (Banach–Alaoglu). The closed unit ball
\[ B_1^{X^*} = \{ \ell \in X^* \mid \|\ell\|_{\mathrm{op}} \le 1 \} \]
in the dual X ∗ of a normed vector space X is compact in the weak* topology.
Proof. Let B(r) be the closed (and hence compact) ball of radius r > 0
in R or C depending on the field of scalars. By Tychonoff’s theorem (see
Theorem A.20) the space
\[ Y = \prod_{x \in X} B(\|x\|) \]
is compact with respect to the product topology (see Definition A.16). Now
define the embedding
\[ \phi : B_1^{X^*} \longrightarrow Y, \qquad \ell \longmapsto (\ell(x))_{x \in X} \in Y. \]
Let πx : Y → B(‖x‖) be the projection operator (or evaluation map)
corresponding to x ∈ X defined by Y ∋ y ↦ πx (y) = y(x). Then the
neighbourhoods of some y = φ(ℓ0 ) with ℓ0 ∈ $B_1^{X^*}$ in the product topology are sets
containing sets of the form
\[ N = \bigcap_{i=1}^{n} \pi_{x_i}^{-1} \bigl( B_\varepsilon(\ell_0(x_i)) \bigr). \]
Now notice that the pre-image of N under φ takes the form
\[ \phi^{-1}(N) = \bigcap_{i=1}^{n} \{ \ell \in B_1^{X^*} \mid |\ell(x_i) - \ell_0(x_i)| < \varepsilon \} = N_{x_1,\dots,x_n;\varepsilon}(\ell_0), \]
which is precisely one of the neighbourhoods of ℓ0 ∈ $B_1^{X^*}$ defining the weak*
topology on $B_1^{X^*}$. Therefore, φ is a homeomorphism from $B_1^{X^*}$ (with the
restriction of the weak* topology) to a subset of Y (with the product topology).
We claim that
\[ \phi\bigl( B_1^{X^*} \bigr) \subseteq Y \]
is closed, which then implies the theorem, since a closed subset of the
compact space Y is itself compact.
To see the claim, notice first that $\phi(B_1^{X^*})$ consists of all linear maps in Y .
This is because any element y ∈ Y is a scalar-valued function on X with
\[ y(x) \in B(\|x\|) \]
for all x ∈ X, and so if y is linear then ‖y‖ ≤ 1. The claim now follows easily
since linearity is defined by equations and so is a closed condition, as we will
now show. In fact for any scalars α1 , α2 the set
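\[ D_{\alpha_1,\alpha_2} = \{ (\lambda_1, \lambda_2, \lambda_3) \mid \lambda_3 = \alpha_1 \lambda_1 + \alpha_2 \lambda_2 \} \]
is closed, and the joint evaluation map
\[ \pi_{x_1,x_2,\alpha_1 x_1 + \alpha_2 x_2}(y) = \bigl( y(x_1), y(x_2), y(\alpha_1 x_1 + \alpha_2 x_2) \bigr) \]
is continuous by definition of the product topology on Y . Hence the set of all
linear maps in Y is given by
\[ \bigcap_{x_1,x_2,\alpha_1,\alpha_2} \pi_{x_1,x_2,\alpha_1 x_1 + \alpha_2 x_2}^{-1} (D_{\alpha_1,\alpha_2}) \]
and so is closed, and the theorem follows.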
8.1.2 More Properties of the Weak and Weak* Topologies
The weak and weak* topologies are never metrizable for infinite-dimensional
Banach spaces (see Exercise 8.12), but when restricted to the unit ball the
situation is better.
Proposition 8.11. Let D ⊆ X be a dense subset of a normed vector space.
Then the weak* topology restricted to $B_1^{X^*}$ is the weakest topology on $B_1^{X^*}$
for which the evaluation maps ℓ ↦ ℓ(x) are continuous for all x ∈ D. In
particular, if X is separable, then the weak* topology restricted to $B_1^{X^*}$ is
metrizable.
Proof. Suppose that D ⊆ X is dense, and suppose that
\[ N_{x;\varepsilon}(\ell_0) = \{ \ell \in X^* \mid |\ell(x) - \ell_0(x)| < \varepsilon \} \]
is a neighbourhood of ℓ0 ∈ $B_1^{X^*}$ defined by ε > 0 and some arbitrary x ∈ X.
Choose some x′ ∈ D with ‖x − x′‖ < ε/3, and notice that for all ℓ ∈ $B_1^{X^*}$ we
have |ℓ(x) − ℓ(x′ )| < ε/3 and so
\[ N_{x';\varepsilon/3}(\ell_0) \cap B_1^{X^*} \subseteq N_{x;\varepsilon}(\ell_0) \cap B_1^{X^*} \]
by a simple application of the triangle inequality (check this). Thus the
topologies defined on $B_1^{X^*}$ using the evaluation maps for x ∈ D or for x ∈ X
(the latter being the weak* topology by definition) agree.
For the last claim of the proposition, notice that if X is separable, then
by definition there exists a countable dense set D = {x1 , x2 , . . . } ⊆ X. For
every xn ∈ D the weakest topology for which ℓ ↦ ℓ(xn ) is continuous is
the topology induced by the semi-norm ‖ℓ‖_{xn} = |ℓ(xn )|, and so the weak*
topology is the weakest topology on $B_1^{X^*}$ that is stronger than all the
topologies induced by the semi-norms ‖ · ‖_{xn} for n ∈ N. By Lemma A.17 and the
Hausdorff property of the weak* topology from Lemma 8.7, this topology is
metrizable.
Exercise 8.12. Let X be an infinite-dimensional Banach space.

(a) Show that X is not the span of countably many elements of X. That is, show that for
any x1 , x2 , . . . ∈ X we have X ≠ ⟨xn | n ∈ N⟩. Of course we may have $X = \overline{\langle x_n \mid n \in \mathbb{N} \rangle}$, the closure of the span.
(b) Use part (a) to show that the weak* topology does not have a countable basis of
neighbourhoods of 0. Conclude that the weak* topology on X ∗ cannot be metrizable.
(c) Generalize (b) to the weak topology on X.
Let us finish with the following lemma, which answers both of the following
questions for a Banach space affirmatively:
• Does X ∗ as a vector space with the weak* topology characterize X?
• If the weak and weak* topologies on X ∗ agree, does it follow that X is
reflexive?
Lemma 8.13. Let X be a Banach space. A functional on X ∗ is continuous

with respect to the weak* topology if and only if it is an evaluation map, that
is a map of the form f : x∗ ↦ x∗ (x) for some x ∈ X.
Proof. Suppose f is a functional on X ∗ continuous with respect to the
weak* topology. Then $f^{-1}(B_1^{\mathbb{C}})$ is a neighbourhood of 0 in X ∗ , and so there
exist x1 , . . . , xn ∈ X and ε > 0 with
\[ N_{x_1,\dots,x_n;\varepsilon}(0) \subseteq f^{-1}\bigl( B_1^{\mathbb{C}} \bigr). \]
If now x∗ ∈ X ∗ satisfies x∗ (x1 ) = · · · = x∗ (xn ) = 0 then any multiple of x∗
belongs to Nx1 ,...,xn ;ε (0) and therefore |f (M x∗ )| < 1 for all scalars M . This
implies that f (x∗ ) = 0, or in other words that f induces a functional on
\[ Y = X^* \Big/ \bigcap_{i=1}^{n} \ker \phi_{x_i} \cong \operatorname{im} \begin{pmatrix} \phi_{x_1} \\ \phi_{x_2} \\ \vdots \\ \phi_{x_n} \end{pmatrix} = V, \]
where φx (x∗ ) = x∗ (x) for x∗ ∈ X ∗ and x ∈ X. However, the dual of V
is generated by the restrictions of the coordinate functions to V , and these
correspond to the functionals φx1 , . . . , φxn on Y , so that f must be a linear
combination of the form
\[ f = \sum_{i=1}^{n} \alpha_i \phi_{x_i} = \phi_{\sum_{i=1}^{n} \alpha_i x_i} \]
for some scalars α1 , . . . , αn , as claimed.

Lemma 8.13 in particular determines the vector space X (and its weak
topology) as the space of continuous functionals on X ∗ with respect to the
weak* topology, which answers the first question above affirmatively. The
second question can also be answered using the lemma. Suppose the weak
and weak* topologies on X ∗ agree and ℓ ∈ X ∗∗ is a linear functional on X ∗ .
By definition ℓ is also continuous with respect to the weak topology, and so
also with respect to the weak* topology by assumption. Lemma 8.13 shows
that ℓ(x∗ ) = x∗ (x) for all x∗ ∈ X ∗ for some x ∈ X, which shows that X is
reflexive since ℓ ∈ X ∗∗ was arbitrary.
Exercise 8.14. Let X be a reflexive Banach space. Let (xn ) in X be a bounded sequence.
Show that (xn ) has a weakly convergent subsequence. Notice that this follows immediately
from Theorem 8.10 and Proposition 8.11 if X ∗ is separable; show it in general.
Exercise 8.15. We know that the weak topology and the norm topology on infinite-
dimensional Banach spaces are different. In contrast to this, show that a sequence in ℓ1 (N)
converges in the weak topology if and only if it converges in the norm topology.
Exercise 8.16. Fix p ∈ (1, ∞).

(a) Prove that a sequence (fn ) in ℓp (N) converges weakly to f ∈ ℓp (N) if and only if there
is some M with ‖fn‖p ≤ M for all n ≥ 1 and fn (k) → f (k) as n → ∞ for each k ∈ N.
(b) Find a sequence in ℓp (N) that converges weakly but not in norm.
Exercise 8.17. Let X, Y be normed vector spaces, and let T : X → Y be linear. Show
that T is a bounded operator if and only if T is sequentially continuous with respect to
the weak topology, that is, xn → x weakly in X as n → ∞ implies that T xn → T x weakly
in Y as n → ∞.
Exercise 8.18. Let X be an infinite-dimensional normed vector space. Show that the weak
closure of the unit sphere S = {x ∈ X | ‖x‖ = 1} is the closed unit ball
\[ B_1^X = \{ x \in X \mid \|x\| \le 1 \}. \]

8.1.3 Analytic Functions and the Weak Topology†
As we have seen, weak convergence and norm convergence are in general
quite different. There are, however, situations in which weak convergence
can be upgraded to norm convergence. Analytic functions taking values in a
Banach space provide one setting where this phenomenon is seen.
Definition 8.19. Let G ⊆ C be an open set, and let X be a complex Banach

space. A function f : G → X is called (strongly) analytic if for every ζ ∈ G
the limit
\[ f'(\zeta) = \lim_{h \to 0} \frac{f(\zeta + h) - f(\zeta)}{h} \]
exists in the norm topology. Also f is called weakly analytic if for every ℓ ∈ X ∗
and ζ ∈ G the limit
\[ (\ell \circ f)'(\zeta) = \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h} \]
exists.
Notice that in the definition of weak analyticity we do not see immediately

whether we can associate to f and ζ a weak limit of the difference quotient
defining f ′ (ζ). What we can associate to f and ζ ∈ G in terms of a derivative
is a weak* limit in X ∗∗ ,
\[ X^* \ni \ell \longmapsto \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h} \in \mathbb{C}, \]
which is bounded by a corollary of the Banach–Steinhaus theorem
(Corollary 4.3). However, much more is true.
Theorem 8.20 (Dunford). Let G ⊆ C be an open set and let X be a Banach

space. Then every weakly analytic function f : G → X is analytic.
Proof. Let ℓ ∈ X ∗ , so that by assumption ℓ ◦ f : G → C is analytic.

Choose ζ ∈ G, ε > 0 sufficiently small, and h ∈ $B_\varepsilon^{\mathbb{C}}$. Then
\[ \ell \circ f(\zeta + h) = \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \frac{\ell \circ f(z)}{z - (\zeta + h)} \, dz \]
by the Cauchy integral formula, where the integral is a contour integral over
a circular path with positive orientation winding once around ζ with radius
of ε. Therefore we have
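\[ \begin{aligned} \ell \circ f(\zeta + h) - \ell \circ f(\zeta) &= \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \ell \circ f(z) \left( \frac{1}{z - (\zeta + h)} - \frac{1}{z - \zeta} \right) dz \\ &= \frac{h}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \frac{\ell \circ f(z)}{(z - (\zeta + h))(z - \zeta)} \, dz. \end{aligned} \tag{8.1} \]
† This subsection will not be needed in the remainder of the book.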
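For h ≠ h′ in $B_\varepsilon^{\mathbb{C}} \setminus \{0\}$ we write
\[ x(h, h') = \frac{1}{h - h'} \left( \frac{f(\zeta + h) - f(\zeta)}{h} - \frac{f(\zeta + h') - f(\zeta)}{h'} \right) \]
for the second-order difference quotient. We claim that x(h, h′ ) is uniformly
bounded for h ≠ h′ in $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$. This will give the theorem, since it implies
that
\[ h \longmapsto \frac{f(\zeta + h) - f(\zeta)}{h} \]
is a Lipschitz function on $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$ and so, since X is complete, has a limit as h → 0.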
To prove the claim let ℓ ∈ X ∗ and use (8.1) to calculate that
\[ \begin{aligned} \ell(x(h, h')) &= \frac{1}{2\pi i (h - h')} \oint_{|z - \zeta| = \varepsilon} \left[ \frac{\ell \circ f(z)}{(z - (\zeta + h))(z - \zeta)} - \frac{\ell \circ f(z)}{(z - (\zeta + h'))(z - \zeta)} \right] dz \\ &= \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \frac{\ell \circ f(z)}{(z - \zeta)(z - (\zeta + h))(z - (\zeta + h'))} \, dz, \end{aligned} \tag{8.2} \]
where the factor h − h′ has been cancelled in the second line.
Notice that the denominator in the integral on the right-hand side of (8.2)
is uniformly bounded away from zero, and the numerator is bounded above
by M ‖ℓ‖ for some constant M depending only on f , ζ, and ε. It follows that
\[ |\ell(x(h, h'))| \le M' \|\ell\| \]
for some constant M ′ not depending on h ≠ h′ ∈ $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$. By Corollary 7.9
we see that ‖x(h, h′ )‖ ≤ M ′ , which proves the claim and hence the theorem.
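The identity (8.1) used in the proof above can be checked numerically in the scalar case. The following is a minimal sketch under stated assumptions: the exponential function stands in for ℓ ∘ f, the contour integral is approximated by the trapezoid rule on the circle, and the names `contour_side`, `samples`, and the particular values of ζ, h and ε are hypothetical choices with |h| < ε.

```python
import cmath
import math

def contour_side(g, zeta, h, eps, samples=400):
    """Right-hand side of (8.1) for a scalar analytic g, approximated by the
    trapezoid rule on the circle |z - zeta| = eps (requires |h| < eps)."""
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        w = eps * cmath.exp(1j * theta)          # w = z - zeta
        z = zeta + w
        dz = 1j * w * (2 * math.pi / samples)    # dz = i * w * d(theta)
        total += g(z) / ((z - (zeta + h)) * (z - zeta)) * dz
    return h / (2j * math.pi) * total

zeta, h, eps = 0.3 + 0.2j, 0.25 - 0.1j, 1.0
lhs = cmath.exp(zeta + h) - cmath.exp(zeta)      # left-hand side of (8.1)
rhs = contour_side(cmath.exp, zeta, h, eps)
print(abs(lhs - rhs))                            # discrepancy of the two sides
```

The trapezoid rule is used here because it converges very quickly for periodic analytic integrands; any other quadrature on the circle would serve the same illustrative purpose.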

Another instance where weak convergence can be upgraded to strong con-
vergence arises in the proof of a version of the mean ergodic theorem for a
measure-preserving group action. We refer to [27, Sec. 8.7], where a simple
version of an argument due to Greschonig and Schmidt [42] is presented.
8.2 Applications of Weak* Compactness
The Banach–Alaoglu theorem (Theorem 8.10) is quite helpful for the con-
struction of the Haar measure on compact abelian groups and invariant means
(see Section 3.3, Section 7.2 and Section 10.2).
Exercise 8.21. Let G be a compact metric abelian group. Show that there exists a G-
invariant positive functional Λ : CR (G) → R with Λ(1) = 1, and deduce the existence of a
Haar measure on G.
Exercise 8.22. Let H be a separable Hilbert space, and suppose that A ∈ B(H) is a
compact operator on H. Show that $A(B_1^H)$ is compact and that A∗ is also a compact
operator.
Exercise 8.23 (Discrete abelian groups are amenable). Let G be any abelian
discrete group. Define for any finitely generated subgroup H < G the set SH to be the set
of all positive functionals L ∈ (ℓ∞ (G))∗ which have norm one and are left-invariant
under elements of H. Show that the intersection $\bigcap_{H \text{ f.g.}} S_H$ taken over all finitely generated
subgroups H in G is non-empty, and deduce that G is amenable by applying Lemma 7.18.
Exercise 8.24. Use Exercise 8.23 and the Riesz representation theorem to give a different
proof of the existence of Haar measure on a compact abelian group.
The next exercise generalizes Exercise 7.21(b) and shows the existence of
a maximal amenable normal subgroup called the amenable radical of G.
Exercise 8.25 (Amenable radical). Let G be a discrete group. Let A be a set and
suppose that Hα ⊳ G is an amenable normal subgroup for any α ∈ A. Show that the
subgroup ⟨Hα | α ∈ A⟩ generated by these subgroups is an amenable normal subgroup.
The following exercise gives an analogue to the existence claim in The-

orem 3.13.
Exercise 8.26. Let X be a Banach space and K ⊆ X ∗ a non-empty weak* closed subset.
Show that for any x∗0 ∈ X ∗ we have ‖x∗0 − k0‖ = min_{k∈K} ‖x∗0 − k‖ for some k0 ∈ K.
8.2.1 Equidistribution
The combination of the Riesz representation theorem for functionals on C(X)

(Theorem 7.54) and the compactness of the unit ball in the weak* topology
in the Banach–Alaoglu theorem (Theorem 8.10) provide the basic tools for
studying sequences of probability measures.(25)
Proposition 8.27. Let X be a compact metric space. Then the space P(X)
of probability measures defined on the Borel σ-algebra of X forms a compact
metric space in the weak* topology. The same applies to
\[ \mathcal{M}_{\le T}(X) = \{ \mu \mid \mu \text{ is a positive measure on } X \text{ with } \mu(X) \le T \} \]
for all T > 0.
Proof. By the Riesz representation theorem (Theorem 7.54) we have
\[ C(X)^* \cong \mathcal{M}(X) \supseteq \mathcal{P}(X), \]
where M(X) is the space of finite signed measures defined on the Borel
σ-algebra of X. By Theorem 7.44 the set of probability measures is given by
\[ \mathcal{P}(X) = \Bigl\{ \mu \in \mathcal{M}(X) \Bigm| \int 1_X \, d\mu = 1 \Bigr\} \cap \bigcap_{f \ge 0} \Bigl\{ \mu \in \mathcal{M}(X) \Bigm| \int f \, d\mu \ge 0 \Bigr\}, \]
where the intersection is taken over all f ∈ C(X) with f ≥ 0. Since each of
the sets in the intersection is closed in the weak* topology, we see that P(X)
is closed as well.
By the Banach–Alaoglu theorem (Theorem 8.10), and since
\[ \mathcal{P}(X) \subseteq B_1^{C(X)^*}, \]
this implies that P(X) is compact in the weak* topology. By Lemma 2.46
we know that C(X) is separable, so by Proposition 8.11 the weak* topology
on P(X) is metrizable. The same argument applies to $\mathcal{M}_{\le T}(X)$.
Exercise 8.28. Let X be a locally compact σ-compact metric space. Show for any T > 0
that $\mathcal{M}_{\le T}(X)$ (defined as in Proposition 8.27) is compact with respect to the weak*
topology defined by C0 (X) and the identification between C0 (X)∗ and M(X) in the Riesz
representation (Theorem 7.54). Also show that the space of probability measures P(X) is
necessarily not compact if X is not compact.
Definition 8.29. Let X be a compact metric space, and let (µn ) be a se-
quence of probability measures in P(X). We say that (µn ) equidistributes
with respect to a probability measure m ∈ P(X) if µn → m as n → ∞ in
the weak* topology; that is, if
\[ \int_X f \, d\mu_n \longrightarrow \int_X f \, dm \]
as n → ∞ for all f ∈ C(X). A sequence (xn ) equidistributes with respect
to m ∈ P(X) if the averages µn = (1/n)(δx1 + · · · + δxn ) of the Dirac measures
at x1 , . . . , xn equidistribute with respect to m.
One is often interested in equidistribution with respect to a natural given

measure like the Lebesgue measure on T. In that case the natural measure is
often not mentioned, and we simply talk about a sequence of measures being
equidistributed. For the case of the Lebesgue measure on Td the following
provides a characterization of equidistribution.
Lemma 8.30. A sequence (µn ) of probability measures on Td equidistributes
if and only if
\[ \int_{\mathbb{T}^d} \chi_k \, d\mu_n \longrightarrow \int_{\mathbb{T}^d} \chi_k \, dx = 0 \]
for all k ∈ $\mathbb{Z}^d \setminus \{0\}$.
Essential Exercise 8.31. Prove Lemma 8.30 using the density of the trigo-
nometric polynomials in C(Td ).
Exercise 8.32. Assume that 1, α1 , . . . , αd ∈ R are linearly independent over Q. Show that
\[ \frac{1}{N} \sum_{n=0}^{N-1} f\bigl( n(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \longrightarrow \int_{\mathbb{T}^d} f(x) \, dx \]
for any f ∈ C(Td ). Use this to generalize Exercise 2.50 to a statement about powers of 2
and 3 with the same exponent.
Exercise 8.33. Assume that α1 , . . . , αd ∈ R are linearly independent over Q. Show that
\[ \frac{1}{T} \int_0^T f\bigl( t(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \, dt \longrightarrow \int_{\mathbb{T}^d} f(x) \, dx \]
as T → ∞, for any f ∈ C(Td ).
Lemma 8.30 already gives some examples of equidistributed sequences,

generalizing Example 2.49 (see also Exercises 8.33 and 8.32).
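For d = 1 the criterion of Lemma 8.30 can be tried out on the sequence (nα). The following is a minimal numerical sketch, not a proof: the names `weyl_average` and `rotation_orbit` are hypothetical, α is a floating-point stand-in for √2, and the printed bound 1/(N |sin(πkα)|) is the elementary estimate for a geometric sum.

```python
import cmath
import math

def weyl_average(points, k):
    """Average of chi_k(x) = exp(2 pi i k x) over a finite list of points of T."""
    return sum(cmath.exp(2j * math.pi * k * x) for x in points) / len(points)

alpha = math.sqrt(2)                 # floating-point stand-in for an irrational number
N = 10_000
rotation_orbit = [(n * alpha) % 1.0 for n in range(N)]

for k in range(1, 6):
    # Geometric sum estimate: |sum_{n<N} e^{2 pi i k n alpha}| <= 1 / |sin(pi k alpha)|.
    bound = 1.0 / (N * abs(math.sin(math.pi * k * alpha)))
    print(k, abs(weyl_average(rotation_orbit, k)), bound)
```

Each average is of size O(1/N), which is the decay that Lemma 8.30 asks for in the case of the empirical measures of (nα).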
Equidistribution results like this are a starting point for more general res-
ults obtained by Weyl [112]. We will only discuss a special case, and outline
a proof along the lines of a slightly more recent approach due to Fursten-
berg [36].
Proposition 8.34. If α ∈ R∖Q, then the sequence (xn ) defined by xn = n²α
modulo Z for all n ≥ 1 is equidistributed in T.
The approach of Furstenberg is to study not just n²α modulo Z but in
fact orbits of points (x, y) ∈ T² under the map T : T² → T² defined by
\[ T(x, y) = (x + \alpha, y + 2x + \alpha). \tag{8.3} \]
Notice that
\[ \begin{aligned} T(0, 0) &= (\alpha, \alpha), \\ T^2(0, 0) &= (2\alpha, 4\alpha), \\ &\ \ \vdots \\ T^n(0, 0) &= (n\alpha, n^2\alpha), \end{aligned} \]
so that Proposition 8.34 will certainly follow from the stronger result that
the orbit {T^n (0, 0) | n ≥ 0} is equidistributed in T². Dynamical questions
of this sort — concerning equidistribution of an orbit under iteration of a
map — are part of ergodic theory. We will briefly outline how one can use
the Banach–Alaoglu theorem (Theorem 8.10) to prove Proposition 8.34 using
ideas from ergodic theory without developing this theory further, and refer
to [27] for a more thorough treatment.
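The orbit formula T^n(0, 0) = (nα, n²α) and the equidistribution claimed in Proposition 8.34 can both be explored by direct iteration of the map (8.3). The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the rational value 3/17 is used only to verify the orbit formula exactly (the formula holds for every α), and the floating-point value of √2 stands in for an irrational α.

```python
import cmath
import math
from fractions import Fraction

def skew_step(point, alpha):
    """One application of T(x, y) = (x + alpha, y + 2x + alpha) on the torus."""
    x, y = point
    return ((x + alpha) % 1, (y + 2 * x + alpha) % 1)

def orbit_of_origin(alpha, length):
    """Return the list [T^0(0,0), T^1(0,0), ..., T^(length-1)(0,0)]."""
    point = (alpha * 0, alpha * 0)    # zero of the same numeric type as alpha
    points = []
    for _ in range(length):
        points.append(point)
        point = skew_step(point, alpha)
    return points

def weyl_average(values, k=1):
    """Average of exp(2 pi i k t) over the given points t of T."""
    return sum(cmath.exp(2j * math.pi * k * t) for t in values) / len(values)

# Exact check of T^n(0,0) = (n*beta, n^2*beta) modulo 1 for a rational test value.
beta = Fraction(3, 17)
exact_orbit = orbit_of_origin(beta, 60)
print(all(p == ((n * beta) % 1, (n * n * beta) % 1) for n, p in enumerate(exact_orbit)))

# Floating-point experiment: the averages of chi_1 along (n^2 * alpha) should be small.
alpha = math.sqrt(2)
second_coordinates = [y for _, y in orbit_of_origin(alpha, 20_000)]
print(abs(weyl_average(second_coordinates)))
```

A small average for a single character is of course only consistent with equidistribution; by Lemma 8.30 the statement concerns all non-trivial characters in the limit.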
Definition 8.35. Let X be a compact metric space, and let T : X → X

be a continuous transformation. A Borel probability measure µ on X is
said to be T -invariant (T is called measure-preserving with respect to µ)
if µ(T −1 B) = µ(B) for Borel measurable sets B ⊆ X. The triple (X, T, µ) is
called a measure-preserving system. A T -invariant probability measure µ is
said to be ergodic if any Borel measurable set B ⊆ X with µ(T −1 B△B) = 0
has µ(B) ∈ {0, 1}.
Ergodicity is the natural notion of indecomposability in ergodic theory
(which includes the study of measure-preserving systems). To see this, notice
that if B ⊆ X is measurable with µ(T⁻¹B△B) = 0 and µ(B) ∈ (0, 1), then
we can decompose the measure into a convex combination
\[ \mu = \mu(B) \, \frac{1}{\mu(B)} \mu|_B + \mu(X \setminus B) \, \frac{1}{\mu(X \setminus B)} \mu|_{X \setminus B}, \]
where one can quickly check that $\frac{1}{\mu(B)} \mu|_B$ and $\frac{1}{\mu(X \setminus B)} \mu|_{X \setminus B}$ are two
different T -invariant probability measures.
Thus a non-ergodic measure-preserving system can be decomposed into
two disjoint measure-preserving systems (where in both systems we still con-
sider the map T : X → X). Pursuing the idea that a non-ergodic measure
is one that can be decomposed in this way leads to the following alternate
characterization of ergodicity.
Proposition 8.36. Let X be a compact metric space, and let T : X → X be
a continuous transformation. Then the space
P T (X) = {µ ∈ M(X) | µ is a T -invariant probability measure}
is a weak* compact convex subset of M(X). The extreme points of P T (X)

are precisely the ergodic measures in P T (X).
We recall that µ ∈ P T (X) is extremal if it cannot be expressed as a proper
convex combination µ = sν1 + (1 − s)ν2 with s ∈ (0, 1) and ν1 , ν2 distinct
elements of P T (X). We will discuss extreme points from an abstract point of
view in Section 8.6.1.
The characterization in Proposition 8.36 is interesting because it relates
an intrinsic property of a T -invariant probability measure (ergodicity, as in
Definition 8.35) with a property regarding the relative position of this meas-
ure in the space of all T -invariant probability measures.
For the proof we will make use of the following construction. Given a
measurable map θ : (X, B) → (Y, C) between two measurable spaces the
push-forward of a measure µ on (X, B) is the measure θ∗ µ on (Y, C) defined
by θ∗ µ(A) = µ(θ⁻¹A) for all A ∈ C. Notice that ∫_Y f dθ∗ µ = ∫_X f ◦ θ dµ
for all integrable functions f on (Y, C), which follows for simple functions
directly from the definition of θ∗ µ and then for positive measurable functions
by monotone convergence.
If X and T : X → X are as in the proposition, then the uniqueness part
of Riesz representation (Theorem 7.44) implies that µ is T -invariant if and
only if
\[ \int_X f \, dT_*\mu = \int_X f \circ T \, d\mu = \int_X f \, d\mu \tag{8.4} \]
for all f ∈ C(X).

Sketch of Proof of Proposition 8.36. By Proposition 8.27, P(X) is
compact in the weak* topology. By the characterization in (8.4), the sub-
set P T (X) ⊆ P(X) is therefore weak* closed and so also compact. It is easy
to see that P T (X) is convex, and the discussion before the proposition shows
that a non-ergodic invariant measure is not extreme.
Suppose now that µ is not extreme, and write µ = sν1 + (1 − s)ν2 with
some s ∈ (0, 1) and ν1 , ν2 distinct measures in P T (X). Clearly ν1 ≪ µ since s
lies in (0, 1), so there is a measurable function f1 ≥ 0 with dν1 = f1 dµ. We
claim that f1 is T -invariant in the sense that f1 ◦ T = f1 almost everywhere
with respect to µ. To see this let B ⊆ X be a measurable set and note that
by T -invariance of µ we have
\[ \nu_1(B) = \int_X 1_B f_1 \, d\mu = \int_X (1_B f_1) \circ T \, d\mu = \int_{T^{-1}B} f_1 \circ T \, d\mu, \]
and, by T -invariance of ν1 ,
\[ \nu_1(B) = \nu_1(T^{-1}B) = \int_{T^{-1}B} f_1 \, d\mu. \]
Let us assume† now that T has a continuous inverse (which is the case for
the map on T² considered above to which this result will be applied). Then
the above implies that f1 = f1 ◦ T almost everywhere with respect to µ,
since T⁻¹(T B) = B shows that all measurable sets are pre-images.
Since ν1 ≠ µ the function f1 is not equal to 1 almost everywhere with
respect to µ, and has
\[ \int_X f_1 \, d\mu = \nu_1(X) = 1. \]
Therefore, B = f1⁻¹([0, 1)) satisfies µ(B△T⁻¹B) = 0 and has µ(B) ∈ (0, 1),
so µ is not ergodic.
† This is not necessary; we refer to [27] for the general case.
The compactness of P(X) can be used to obtain elements of P T (X) from
sequences of approximately invariant measures.
Essential Exercise 8.37. Let X be a non-trivial compact metric space
and T : X → X be continuous. For any sequence (νn ) in P(X), define a
sequence (µn ) by
\[ \mu_n = \frac{1}{n} \sum_{j=0}^{n-1} T_*^j \nu_n \]
for all n ≥ 1. Show that any weak* limit µ of a subsequence of (µn ) is
T -invariant, and deduce that P T (X) is non-empty.(26)
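The averaging construction in Essential Exercise 8.37 can be observed on a finite space, where every map is continuous and measures are vectors of point masses. The following is a minimal sketch under stated assumptions: the map `T`, the starting measure `nu` (taken independent of n for simplicity) and all function names are hypothetical, and exact rational arithmetic is used so that the telescoping identity T∗µn − µn = (1/n)(T∗^n ν − ν) gives a defect of at most 2/n.

```python
from fractions import Fraction

def push_forward(T, measure):
    """Push-forward T_* mu on {0, ..., m-1}: (T_* mu)({y}) = mu(T^{-1}{y})."""
    image = [Fraction(0)] * len(measure)
    for x, mass in enumerate(measure):
        image[T[x]] += mass
    return image

def cesaro_average(T, nu, n):
    """mu_n = (1/n) * sum_{j=0}^{n-1} T_*^j nu."""
    total = [Fraction(0)] * len(nu)
    current = list(nu)
    for _ in range(n):
        total = [a + b for a, b in zip(total, current)]
        current = push_forward(T, current)
    return [a / n for a in total]

def invariance_defect(T, mu):
    """Sum over points y of |T_* mu({y}) - mu({y})|; zero exactly for invariant mu."""
    return sum(abs(a - b) for a, b in zip(push_forward(T, mu), mu))

T = [1, 2, 3, 1, 0]                       # hypothetical map: 4 -> 0 -> 1 -> 2 -> 3 -> 1
nu = [Fraction(0)] * 4 + [Fraction(1)]    # Dirac mass at the point 4

for n in (1, 10, 100):
    print(n, invariance_defect(T, cesaro_average(T, nu, n)))
```

The defect tends to zero as n grows, so any limit of the averages is invariant; on a general compact metric space the same conclusion requires the weak* compactness of P(X).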
With these general facts about continuous transformations on compact

metric spaces at our disposal, we will return to a consideration of the trans-
formation (8.3). We start by explaining Example 2.49 in this language.
Clearly the map Rα : T → T defined by Rα (x) = x + α preserves the
Lebesgue measure λT , so (T, Rα , λT ) is a measure-preserving system.
Lemma 8.38. If α ∈ R∖Q then λT is ergodic for Rα .
Proof. Suppose that B ⊆ T is a measurable set with λT (B△Rα⁻¹B) = 0.
Then the characteristic function 1B satisfies $1_B \circ R_\alpha = 1_{R_\alpha^{-1}B} = 1_B$ as
elements of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$. Thus for the Fourier series expansion
\[ 1_B = \sum_{m \in \mathbb{Z}} c_m \chi_m, \]
which converges in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, we have
\[ 1_B \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m, \]
where we have used the fact that $U_{R_\alpha} : f \mapsto f \circ R_\alpha$ is an isometry of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$
and hence maps a convergent series to a convergent series. Notice that
\[ \chi_m \circ R_\alpha(x) = e^{2\pi i m (x + \alpha)} = e^{2\pi i m \alpha} \chi_m(x), \]
so that (by uniqueness of Fourier coefficients) we must have $c_m e^{2\pi i m \alpha} = c_m$
for all m ∈ Z. Since α ∈ R∖Q this implies that cm = 0 for m ∈ Z∖{0},
so 1B = c0 in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, which implies that λT (B) ∈ {0, 1} as required.
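In fact, Rα has a stronger property, called unique ergodicity: λT is the
only probability measure invariant under Rα if α ∈ R∖Q. This also implies Lemma 8.38
by Proposition 8.36. To see this stronger result, let µ be any Rα -invariant
probability measure, and calculate
\[ \int_{\mathbb{T}} \chi_m \, d\mu = \int_{\mathbb{T}} \chi_m \circ R_\alpha \, d\mu = e^{2\pi i m \alpha} \int_{\mathbb{T}} \chi_m \, d\mu, \]
which implies that
\[ \int_{\mathbb{T}} \chi_m \, d\mu = \begin{cases} 1 & \text{for } m = 0; \\ 0 & \text{for } m \neq 0. \end{cases} \]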
Since this is a property shared by λT , and the trigonometric polynomials

are dense in C(T) by Proposition 3.65, we deduce that µ = λT . Using Ex-
ercise 8.37 together with the exercise below gives an alternative approach to
Example 2.49. Of course this approach is more complicated, but it can also be
used in situations where a direct calculation of the sort used in Example 2.49
is not feasible.
Essential Exercise 8.39. (a) Let Z be a topological space, let (zn ) be a
sequence in Z, and let z ∈ Z. Show that the following are equivalent:
• $\lim_{n \to \infty} z_n = z$.
• For every subsequence $(z_{n_k})$ there is a subsequence $(z_{n_{k_\ell}})$ such that $\lim_{\ell \to \infty} z_{n_{k_\ell}} = z$.
(b) Assume in addition that Z is a compact metric space, and show that the
following gives another equivalent condition:
• For every convergent subsequence $(z_{n_k})$ we have $\lim_{k \to \infty} z_{n_k} = z$.
(c) Assume now that α ∈ R∖Q, and use this, together with the fact
that $\mathcal{P}^{R_\alpha}(\mathbb{T})$ only contains the measure λT and Exercise 8.37, to show the
equidistribution of (nα) in T.
We now describe the procedure for obtaining equidistribution of the orbits

of the map T defined by T (x, y) = (x + α, y + 2x + α) on T² discussed earlier,
leaving some of the steps as exercises.
Essential Exercise 8.40. Show that the Lebesgue measure $\lambda_{\mathbb{T}^2}$ is T -invariant
and ergodic.
Sketch of Proof of Proposition 8.34. As discussed just after the
statement of the proposition, it is enough to show that every T -orbit
\[ (x, y),\ T(x, y),\ T^2(x, y),\ \dots \]
is equidistributed with respect to $\lambda_{\mathbb{T}^2}$. Notice that the first coordinates of
points in the orbit are precisely the points in the orbit
\[ x,\ R_\alpha(x),\ R_\alpha^2(x),\ \dots \]
in T for the transformation Rα . We already know that this sequence is
equidistributed with respect to λT . Write δx for the Dirac measure at x ∈ T,
so the equidistribution of the Rα -orbit of x is equivalent to the statement
\[ \frac{1}{n} \sum_{j=0}^{n-1} T_*^j (\delta_x \times \lambda_{\mathbb{T}}) \longrightarrow \lambda_{\mathbb{T}^2} \tag{8.5} \]
as n → ∞ (check this).
Fix some ρ ∈ (0, 1/2) and write $\lambda_{y,\rho} = \lambda|_{B_\rho(y)}$ for the Lebesgue measure
restricted to the ρ-ball Bρ (y) = (y − ρ, y + ρ) ⊆ T around y ∈ T, and consider
the average
\[ \frac{1}{2\rho n} \sum_{j=0}^{n-1} T_*^j (\delta_x \times \lambda_{y,\rho}). \tag{8.6} \]
We want to show that these averages converge to $\lambda_{\mathbb{T}^2}$ in the weak* topology.
Proposition 8.27 and Exercise 8.39 imply that for this it is enough to show
that any convergent subsequence has $\lambda_{\mathbb{T}^2}$ as its limit. So assume (nk ) is the
index sequence of a convergent subsequence, and denote the limit by µ1 .
Using the convergence in (8.5), we see that
\[ \frac{1}{(1 - 2\rho) n_k} \sum_{j=0}^{n_k - 1} T_*^j \bigl( \delta_x \times (\lambda_{\mathbb{T}} - \lambda_{y,\rho}) \bigr) \longrightarrow \mu_2 \]
converges as k → ∞. By Exercise 8.37 we have µ1 , µ2 ∈ $\mathcal{P}^T(\mathbb{T}^2)$. We also
have
\[ \lambda_{\mathbb{T}^2} = 2\rho \mu_1 + (1 - 2\rho) \mu_2 \]
by (8.5). Together with Exercise 8.40 and Proposition 8.36 this implies
that $\lambda_{\mathbb{T}^2}$ = µ1 = µ2 . Using Exercise 8.39(b) this shows that the average
in (8.6) converges to $\lambda_{\mathbb{T}^2}$ as n → ∞.
Using the structure of the map T it is now not too difficult to upgrade the
above to the statement in the proposition. Fix some function f ∈ C(T²) and
some ε > 0. By uniform continuity of f there is some ρ ∈ (0, 1/2) such that
\[ d\bigl( (x_1, y_1), (x_2, y_2) \bigr) < \rho \implies |f(x_1, y_1) - f(x_2, y_2)| < \varepsilon \]
(where d denotes the usual metric on T²). With this choice of ρ we have
\[ \left| \frac{1}{n} \sum_{j=0}^{n-1} f\bigl( T^j(x, y) \bigr) - \frac{1}{2\rho n} \sum_{j=0}^{n-1} \int_{-\rho}^{\rho} f\bigl( T^j(x, y + z) \bigr) \, dz \right| < \varepsilon \]
since T^j (x, y + z) = T^j (x, y) + (0, z) has distance less than ρ from T^j (x, y)
for all z ∈ (−ρ, ρ). Using the convergence of (8.6) to $\lambda_{\mathbb{T}^2}$, it follows that
\[ \limsup_{n \to \infty} \left| \int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} - \frac{1}{n} \sum_{j=0}^{n-1} f\bigl( T^j(x, y) \bigr) \right| \le \limsup_{n \to \infty} \left| \int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} - \frac{1}{2\rho n} \sum_{j=0}^{n-1} \int f \, dT_*^j(\delta_x \times \lambda_{y,\rho}) \right| + \varepsilon = \varepsilon. \]
Since f ∈ C(T²) and ε > 0 were arbitrary, the proposition follows.
8.2.2 Elliptic Regularity for the Laplace Operator†
We show in this and the next subsection how the Banach–Alaoglu theorem
can help to prove elliptic regularity for weak solutions to equations of the
form ∆g = u with g in $H_0^1(U)$, u in L²(U ), and U ⊆ R^d open and bounded.
In this section we essentially reprove Theorem 5.45 using different methods.
In the next subsection we will assume that U has smooth boundary and
will show the regularity (unlike in Section 5.3.2) up to and including the
boundary. For convenience we will consider only R-valued functions.
Definition 8.41 (Difference quotients). Let U ⊆ Rd and V ⊆ U be open
subsets. For any f ∈ L2 (U ), j = 1, . . . , d and h ∈ R such that V + hej is
contained in U we define the difference quotient Djh f ∈ L2 (V ) by
\[
  D_j^h f(x) = \frac{f(x + he_j) - f(x)}{h}
\]
for almost every x ∈ V .
As one might expect the difference quotient and the weak partial derivative
are related. The first connection below is a direct application of our definition
of the Sobolev spaces.
Lemma 8.42 (Bounding the difference quotient). Let V ⊆ U ⊆ Rd be
open subsets and s > 0 such that V + [−s, s]ej ⊆ U for some j ∈ {1, . . . , d}.
Then, for any function f ∈ H 1 (U ),
\[
  \|D_j^h f\|_{L^2(V)} \le \|\partial_j f\|_{L^2(U)}
\]
for $0 < |h| \le s$.
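The bound in Lemma 8.42 can be checked numerically in a simple one-dimensional case. The following is only a sketch under assumptions that are not part of the text: the sets U = (0, 3) and V = (1, 2), the shift bound s = 1, the sample function f(x) = sin(3x), and the midpoint-rule helper `l2_norm_sq` are all hypothetical choices made for illustration.

```python
import math

def l2_norm_sq(func, a, b, n=4000):
    """Midpoint-rule approximation of the integral of func(x)**2 over (a, b)."""
    dx = (b - a) / n
    return sum(func(a + (k + 0.5) * dx) ** 2 for k in range(n)) * dx

def f(x):
    # Hypothetical smooth sample function on U.
    return math.sin(3.0 * x)

def df(x):
    # Classical derivative of f, which is also its weak derivative.
    return 3.0 * math.cos(3.0 * x)

def difference_quotient(func, h):
    """Return the function x -> (func(x + h) - func(x)) / h."""
    return lambda x: (func(x + h) - func(x)) / h

# Assumed setup: U = (0, 3), V = (1, 2), s = 1, so V + [-s, s] lies in U.
U, V, s = (0.0, 3.0), (1.0, 2.0), 1.0

def check_bound(h):
    """Return (norm of D^h f in L^2(V), norm of f' in L^2(U)) for 0 < |h| <= s."""
    lhs = math.sqrt(l2_norm_sq(difference_quotient(f, h), *V))
    rhs = math.sqrt(l2_norm_sq(df, *U))
    return lhs, rhs
```

For every admissible shift h the first returned value should not exceed the second, in line with the lemma; the quadrature error of the midpoint rule is far smaller than the gap between the two sides for this particular f.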
Proof. If $f \in C^\infty(U) \cap H^1(U)$ then
\[
  D_j^h f(x) = \frac{f(x + he_j) - f(x)}{h} = \int_0^1 \partial_j f(x + the_j) \, dt
\]
for all $x \in V$ and $0 < |h| \le s$. By integrating the square of this equation,
applying Cauchy–Schwarz, translation invariance of the Lebesgue measure,
and Fubini’s theorem we obtain
\[
\begin{aligned}
  \int_V |D_j^h f(x)|^2 \, dx &= \int_V \Bigl| \int_0^1 \partial_j f(x + the_j) \, dt \Bigr|^2 dx \\
  &\le \int_V \int_0^1 |\partial_j f(x + the_j)|^2 \, dt \, dx \le \int_U |\partial_j f(x)|^2 \, dx
\end{aligned}
\]
whenever $0 < |h| \le s$. Approximating $f \in H^1(U)$ by elements of the
intersection $H^1(U) \cap C^\infty(U)$ then gives the lemma.
† This and the next subsection finish our discussion of Sobolev spaces and the Laplace
operator. In particular, this material will not be needed in the remainder of the book.
The above lemma will be useful but it will be more powerful when com-
bined with its partial converse, which is a corollary of the Banach–Alaoglu
theorem.
Corollary 8.43 (Existence of weak derivative). Let V ⊆ U ⊆ Rd be

open subsets and s > 0 satisfying V + [−s, s]ej ⊆ U for some j ∈ {1, . . . , d}.
Suppose that f ∈ L2 (U ) and C > 0 satisfy
\[
  \|D_j^h f\|_{L^2(V)} \le C \tag{8.7}
\]
for all $0 < |h| \le s$. Then $f|_V$ has a weak partial derivative $\partial_j f$ on $V$ satisfying
$\|\partial_j f\|_{L^2(V)} \le C$.
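Proof. Let $\phi \in C_c^\infty(V)$ and note that this implies that $D_j^h \phi$ converges
uniformly to $\partial_j \phi$ as $h \to 0$. Indeed, we have
\[
  D_j^h \phi(x) = \frac{\phi(x + he_j) - \phi(x)}{h} = \partial_j \phi(x + \xi_h h e_j) \longrightarrow \partial_j \phi(x)
\]
as $h \to 0$ for some $\xi_h \in (0, 1)$ by the mean value theorem and continuity
of $\partial_j \phi$. Since $\partial_j \phi$ is continuous with compact support, it is uniformly
continuous, so the convergence is uniform and for $f \in L^2(U)$ we have
\[
  \langle f, D_j^h \phi \rangle_{L^2(V)} \longrightarrow \langle f, \partial_j \phi \rangle_{L^2(V)} \tag{8.8}
\]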
as $h \to 0$. On the other hand, since $\phi$ has compact support within $V$ we may
also shift integration to obtain a discrete analogue of the formula defining
the weak derivative in Definition 5.8,
\[
\begin{aligned}
  \langle f, D_j^h \phi \rangle_{L^2(V)} &= \frac{1}{h} \int_V f(x) \bigl( \phi(x + he_j) - \phi(x) \bigr) \, dx \\
  &= \frac{1}{h} \int_V f(y - he_j) \phi(y) \, dy - \frac{1}{h} \int_V f(x) \phi(x) \, dx \\
  &= - \langle D_j^{-h} f, \phi \rangle_{L^2(V)}
\end{aligned} \tag{8.9}
\]
for all sufficiently small h.
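The shift of integration behind (8.9) has an exact counterpart on a uniform grid, where it becomes summation by parts. The sketch below is an illustration under assumptions of our own: the grid on [0, 2), the test function `bump` supported in (0.5, 1.5), the merely continuous sample function f, and the restriction to shifts h = m·dx are all hypothetical.

```python
import math

def at(values, k):
    """Value at grid index k, extended by zero outside the grid."""
    return values[k] if 0 <= k < len(values) else 0.0

def diff_quotient(values, m, dx):
    """Grid analogue of D^h for the shift h = m * dx, with m a nonzero integer."""
    h = m * dx
    return [(at(values, k + m) - at(values, k)) / h for k in range(len(values))]

def inner(u, v, dx):
    """Riemann sum standing in for the L^2 inner product."""
    return sum(a * b for a, b in zip(u, v)) * dx

def bump(x, centre=1.0, radius=0.5):
    """Smooth test function supported in (centre - radius, centre + radius)."""
    t = (x - centre) / radius
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def summation_by_parts(m, n=200, dx=0.01):
    """Return both sides of the grid version of (8.9) for the shift h = m * dx."""
    xs = [k * dx for k in range(n)]
    f = [abs(x - 0.7) + math.cos(5.0 * x) for x in xs]  # continuous, not smooth
    phi = [bump(x) for x in xs]  # vanishes near both ends of the grid
    lhs = inner(f, diff_quotient(phi, m, dx), dx)
    rhs = -inner(diff_quotient(f, -m, dx), phi, dx)
    return lhs, rhs
```

Because the test function vanishes near the ends of the grid, the two returned values agree up to rounding for small shifts, without any smoothness assumption on f.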

By the assumption (8.7) the functions $D_j^{-1/n} f|_V$ for $n \in \mathbb{N}$ with $\frac{1}{n} \le s$
form a bounded sequence of elements of the Hilbert space $L^2(V)$ satisfying
$\|D_j^{-1/n} f\|_{L^2(V)} \le C$. By the Banach–Alaoglu theorem (Theorem 8.10
and Proposition 8.11) there exists a subsequence $(n_k)$ with the property
that $D_j^{-1/n_k} f|_V$ converges in the weak* topology to some function $v \in L^2(V)$
as $k \to \infty$ with $\|v\|_{L^2(V)} \le C$. We claim that $v$ is the weak partial derivative
sought in the corollary. In fact, by (8.9) we now have for any $\phi \in C_c^\infty(V)$
\[
  \langle f, D_j^{1/n_k} \phi \rangle_{L^2(V)} = - \langle D_j^{-1/n_k} f, \phi \rangle_{L^2(V)} \longrightarrow - \langle v, \phi \rangle_{L^2(V)}
\]
as $k \to \infty$. Together with (8.8) this gives $\langle f, \partial_j \phi \rangle_{L^2(V)} = - \langle v, \phi \rangle_{L^2(V)}$ for
any function $\phi \in C_c^\infty(V)$, which proves the corollary.
Exercise 8.44. Using the same assumptions as in Corollary 8.43 show that the difference
quotients Djh f converge weakly in L2 (V ) to ∂ j f as h → 0. Do they also converge strongly?
In order for Corollary 8.43 to be useful we wish to upgrade the conclusion

and obtain functions in H 1 . For this we need some more preparations, which
partially already featured in some exercises. For any open subset U ⊆ Rd we
will identify elements f ∈ L2 (U ) with their trivial extension to all of Rd (by
setting the extension to be equal to zero outside of U ). By a slight abuse of
terminology we will also say that f ∈ L2 (U ) has compact support in Rd if
(for some representative of f ) the set {x ∈ U | f (x) ≠ 0} has compact closure
in Rd .
Proposition 8.45 (Convolution and derivatives). Let U ⊆ Rd be open.
Let f ∈ L2 (U ) have compact support in Rd and let χ ∈ Cc∞ (Rd ). Then the
convolution product f ∗ χ defined by
\[
  f * \chi(x) = \int f(y) \chi(x - y) \, dy = \int f(x - z) \chi(z) \, dz
\]
for x ∈ Rd is smooth with compact support, and its derivatives are given
by ∂α (f ∗ χ) = f ∗ ∂α χ for all α ∈ Nd0 . If f ∈ L2 (U ) has a weak α-partial
derivative fα ∈ L2 (U ) for some α ∈ Nd0 and the open subset V ⊆ U has the
property that V − Supp χ ⊆ U , then we also have ∂α (f ∗ χ)|V = (fα ∗ χ)|V .
Similarly, if g ∈ L2 (U ) has compact support and satisfies the equation ∆g = u
then ∆(g ∗ χ)|V = (u ∗ χ)|V .
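Proof. Let $f$ be as in the proposition and note that $f \in L^1(\mathbb{R}^d)$. Since we
know $\chi \in C_c^\infty(\mathbb{R}^d)$ has compact support, the set $\{x \in U \mid f(x) \ne 0\} + \operatorname{Supp} \chi$
has compact closure and it is easy to see that $f * \chi$ vanishes outside of this set. Note
that dominated convergence implies that $f * \chi$ is continuous. Moreover, for
any $j \in \{1, \dots, d\}$ we have
\[
\begin{aligned}
  \partial_j (f * \chi)(x) &= \lim_{h \to 0} \frac{1}{h} \bigl( f * \chi(x + he_j) - f * \chi(x) \bigr) \\
  &= \lim_{h \to 0} \int f(y) \, \frac{\chi(x + he_j - y) - \chi(x - y)}{h} \, dy = f * \partial_j \chi(x)
\end{aligned}
\]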
for all x ∈ Rd by the mean value theorem and dominated convergence. Induc-
tion now shows that f ∗ χ ∈ Cc∞ (Rd ) and ∂α (f ∗ χ) = f ∗ ∂α χ for all α ∈ Nd0 ,
as claimed.
Assume next that f ∈ L2 (U ) has the weak α-partial derivative fα ∈ L2 (U )
for some α ∈ Nd0 . Note that fα also has compact support, as it vanishes by
Lemma 5.10 almost everywhere on every open subset on which f vanishes
almost everywhere. Also suppose for χ ∈ Cc∞ (Rd ) that the open subset V ⊆ U
satisfies V − Supp χ ⊆ U . Now let φ ∈ Cc∞ (V ) and consider
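\[
\begin{aligned}
  \langle f * \chi, \partial_\alpha \phi \rangle_{L^2(V)} &= \int_V \int_{\operatorname{Supp} \chi} f(x - y) \chi(y) \, dy \, \partial_\alpha \phi(x) \, dx \\
  &= \int_{\operatorname{Supp} \chi} \chi(y) \, \langle f, \partial_\alpha (\lambda_{-y} \phi) \rangle_{L^2(U)} \, dy,
\end{aligned}
\]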
where λ−y φ(x) = φ(x + y) has support Supp φ − y and defines for all y
in Supp χ an element of Cc∞ (U ) since V − Supp χ ⊆ U . Using the fact that fα
is the weak partial derivative of f we obtain
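\[
\begin{aligned}
  \langle f * \chi, \partial_\alpha \phi \rangle_{L^2(V)} &= (-1)^{\|\alpha\|_1} \int_{\operatorname{Supp} \chi} \chi(y) \, \langle f_\alpha, \lambda_{-y} \phi \rangle_{L^2(U)} \, dy \\
  &= (-1)^{\|\alpha\|_1} \int_{\operatorname{Supp} \chi} \chi(y) \int_V f_\alpha(x - y) \phi(x) \, dx \, dy \\
  &= (-1)^{\|\alpha\|_1} \langle f_\alpha * \chi, \phi \rangle_{L^2(V)}
\end{aligned}
\]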
for any φ ∈ Cc∞ (V ). By uniqueness of the weak derivative (Lemma 5.10) and
continuity we now obtain ∂α (f ∗ χ)|V = (fα ∗ χ)|V (pointwise) as required.
This argument also gives the claim in the proposition for g ∈ L2 (U )
with ∆g = u ∈ L2 (U ). Indeed, with the same arguments concerning the
support of λ−y φ with y ∈ Supp χ we obtain
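\[
\begin{aligned}
  \langle g * \chi, \Delta \phi \rangle_{L^2(V)} &= \int_V \int_{\operatorname{Supp} \chi} g(x - y) \chi(y) \, dy \, \Delta \phi(x) \, dx \\
  &= \int_{\operatorname{Supp} \chi} \chi(y) \, \langle g, \Delta (\lambda_{-y} \phi) \rangle_{L^2(U)} \, dy \\
  &= \int_{\operatorname{Supp} \chi} \chi(y) \, \langle u, \lambda_{-y} \phi \rangle_{L^2(U)} \, dy \\
  &= \int_{\operatorname{Supp} \chi} \chi(y) \int_V u(x - y) \phi(x) \, dx \, dy = \langle u * \chi, \phi \rangle_{L^2(V)},
\end{aligned}
\]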
as required.
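We will now use a non-negative function $\jmath \in C_c^\infty(\mathbb{R}^d)$ as in Exercise 5.17,
so that $\operatorname{Supp} \jmath$ is the closed unit ball and $\int \jmath \, dx = 1$, and define the scaled
function $\jmath_\varepsilon(x) = \varepsilon^{-d} \jmath(\frac{x}{\varepsilon})$ for all $x \in \mathbb{R}^d$ and $\varepsilon > 0$.
Lemma 8.46 (Approximate identity in $L^2(\mathbb{R}^d)$). Let $U \subseteq \mathbb{R}^d$ be open
and let $f \in L^2(U)$. Then $f * \jmath_\varepsilon(x) = \int f(x - y) \jmath_\varepsilon(y) \, dy$ converges to $f$
in $L^2(\mathbb{R}^d)$ as $\varepsilon \to 0$.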
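Proof. By Lemma 3.75 $f_\varepsilon = f * \jmath_\varepsilon$ again belongs to $L^2(\mathbb{R}^d)$ for any $\varepsilon > 0$.
Next, notice that
\[
  f_\varepsilon(x) = \int \varepsilon^{-d} \jmath(\tfrac{y}{\varepsilon}) f(x - y) \, dy = \int \jmath(z) f(x - \varepsilon z) \, dz,
\]
for $x \in \mathbb{R}^d$. Therefore
\[
\begin{aligned}
  \|f_\varepsilon - f\|_{L^2(\mathbb{R}^d)}^2 &= \int \Bigl| \int \bigl( f(x - \varepsilon z) - f(x) \bigr) \jmath(z) \, dz \Bigr|^2 dx \\
  &\le \iint \bigl| f(x - \varepsilon z) - f(x) \bigr|^2 \jmath(z) \, dz \, dx \\
  &= \int \| \lambda_{\varepsilon z} f - f \|_{L^2(\mathbb{R}^d)}^2 \, \jmath(z) \, dz
\end{aligned}
\]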
by Jensen’s inequality (see the first paragraph of the proof of Lemma 3.75)
and Fubini’s theorem. By Lemma 3.74 and dominated convergence the latter
converges to zero as ε → 0.
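The convergence in Lemma 8.46 can be observed numerically in dimension one. The following sketch rests on assumptions that are not in the text: f is taken to be the indicator function of [0, 1], the mollifier is the standard bump `exp(-1/(1 - z^2))` normalised numerically, and all integrals are replaced by midpoint sums.

```python
import math

def bump(z):
    """Unnormalised smooth bump supported in the closed unit interval."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

def midpoints(a, b, n):
    """Midpoints of n equal subintervals of (a, b), together with the step."""
    step = (b - a) / n
    return [a + (k + 0.5) * step for k in range(n)], step

Z, DZ = midpoints(-1.0, 1.0, 200)
MASS = sum(bump(z) for z in Z) * DZ          # numerical normalising constant
WEIGHTS = [bump(z) * DZ / MASS for z in Z]   # quadrature weights summing to 1

def f(x):
    # Hypothetical sample function: the indicator of [0, 1].
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def mollify(func, eps, x):
    """Approximate the convolution of func with the scaled mollifier at x."""
    return sum(w * func(x - eps * z) for w, z in zip(WEIGHTS, Z))

def l2_error(func, eps, a=-1.0, b=2.0, n=600):
    """Approximate L^2 distance between func and its mollification on (a, b)."""
    xs, dx = midpoints(a, b, n)
    return math.sqrt(sum((mollify(func, eps, x) - func(x)) ** 2 for x in xs) * dx)
```

Evaluating `l2_error(f, eps)` for eps = 0.4, 0.2, 0.1 should give a decreasing sequence, even though the mollified functions never converge uniformly to the discontinuous f.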
We note that the following corollary to Proposition 8.45 will be combined
with Corollary 8.43.
Corollary 8.47 (Weak derivatives and $H^k$). Let $U \subseteq \mathbb{R}^d$ be open and
let $k \ge 1$ be an integer. Suppose that $f \in L^2_{\mathrm{loc}}(U)$ has, for all $\alpha \in \mathbb{N}_0^d$
with $\|\alpha\|_1 \le k$, a weak partial derivative $\partial_\alpha f \in L^2_{\mathrm{loc}}(U)$. Then $f$ lies
in $H^k_{\mathrm{loc}}(U)$. In fact, we have $\chi f \in H^k(\mathbb{R}^d)$ for all $\chi \in C_c^\infty(U)$.
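Proof. Let $\chi \in C_c^\infty(U)$. Extending the product rule for differentiation we
have that $\partial_j(\chi f) = (\partial_j \chi) f + \chi (\partial_j f) \in L^2(U)$ for $j = 1, \dots, d$ (check this),
which generalizes inductively to the Leibniz rule for weak differentiation $\partial_\alpha$
with $\|\alpha\|_1 \le k$. In particular, $\chi f$ has a weak α-partial derivative on $U$. We
now show that $\chi f$ has weak partial derivatives $\partial_\alpha(\chi f)$ on all of $\mathbb{R}^d$, where
we extend these functions trivially from $U$ to all of $\mathbb{R}^d$. Since $\operatorname{Supp} \chi \subseteq U$
is compact, there exists a function $\psi \in C_c^\infty(U)$ with $\psi \equiv 1$ on $\operatorname{Supp} \chi$ (see
Exercise 5.37). If now $\phi \in C_c^\infty(\mathbb{R}^d)$, then $\psi\phi \in C_c^\infty(U)$. Using in addition
that $\operatorname{Supp}(\chi f)$ and $\operatorname{Supp} \partial_\alpha(\chi f)$ are contained in $\operatorname{Supp} \chi \subseteq U$, we obtain
\[
\begin{aligned}
  \langle \chi f, \partial_\alpha \phi \rangle_{L^2(\mathbb{R}^d)} = \langle \chi f, \partial_\alpha(\psi\phi) \rangle_{L^2(U)} &= (-1)^{\|\alpha\|_1} \langle \partial_\alpha(\chi f), \psi\phi \rangle_{L^2(U)} \\
  &= (-1)^{\|\alpha\|_1} \langle \partial_\alpha(\chi f), \phi \rangle_{L^2(\mathbb{R}^d)}
\end{aligned}
\]
for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$. Applying Proposition 8.45 and Lemma 8.46
we see that $(\chi f) * \jmath_\varepsilon$ is in $C_c^\infty(\mathbb{R}^d)$ and that
\[
  \partial_\alpha \bigl( (\chi f) * \jmath_\varepsilon \bigr) = \bigl( \partial_\alpha(\chi f) \bigr) * \jmath_\varepsilon
\]
approximates $\partial_\alpha(\chi f)$ as $\varepsilon \to 0$ for any $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$. From this it
follows that $\chi f \in H^k(\mathbb{R}^d)$ and with the identification of functions we also see
that $\chi f \in H^k(U)$ (see Definition 5.7). Since $\chi \in C_c^\infty(U)$ was arbitrary, this
proves $f \in H^k_{\mathrm{loc}}(U)$, and hence the corollary.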
The argument that we will present here gives an alternate proof of The-
orem 5.45 (avoiding Fourier series).
Theorem 8.48 (Elliptic regularity). Let $U \subseteq \mathbb{R}^d$ be open, $g \in H^1_{\mathrm{loc}}(U)$,
let $k \ge 0$ and suppose that $\Delta g = u \in H^k_{\mathrm{loc}}(U)$. Then $g \in H^{k+2}_{\mathrm{loc}}(U)$.
Second proof of elliptic regularity on open sets. By definition

(Definition 5.43) we have to show that χg ∈ H k+2 (U ) for all χ ∈ Cc∞ (U ).
We initially assume that k = 0, fix some χ ∈ Cc∞ (U ), and consider f = χg,
which is an element of H 1 (U ). By Lemma 5.50 we have
∆f = v ∈ H 0 (U ) = L2 (U ),
and we also know that v vanishes on U ∖ Supp χ.

We have to show that f ∈ H 2 (U ), for which, using Corollary 8.47, we need
to show that ∂i ∂j f exists in L2 (U ) for all i, j = 1, . . . , d. It will, however, be
more convenient to work on Rd .
Extending to all of Rd . We claim that, after extending f and v trivially
from U to all of Rd , we have f ∈ H 1 (Rd ) and the relation ∆f = v actually
holds on Rd . In fact, the first claim follows from the assumption on g and
Corollary 8.47. For the second we again apply the same argument using a
function ψ ∈ Cc∞ (U ) satisfying ψ ≡ 1 on Supp χ. Indeed, for φ ∈ Cc∞ (Rd ) we
have ψφ ∈ Cc∞ (U ) and
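\[
  \langle f, \Delta\phi \rangle_{L^2(\mathbb{R}^d)} = \langle f, \Delta(\psi\phi) \rangle_{L^2(U)} = \langle v, \psi\phi \rangle_{L^2(U)} = \langle v, \phi \rangle_{L^2(\mathbb{R}^d)}.
\]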
Thus f satisfies ∆f = v on Rd as φ ∈ Cc∞ (Rd ) was arbitrary.

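Convolution. Next we let $\varepsilon > 0$ and define $f_\varepsilon = f * \jmath_\varepsilon$ and $v_\varepsilon = v * \jmath_\varepsilon$. By
Proposition 8.45 we have that $f_\varepsilon, v_\varepsilon \in C_c^\infty(\mathbb{R}^d)$ satisfy $\Delta f_\varepsilon = v_\varepsilon$. We now use
integration by parts for smooth functions of compact support and obtain
\[
  \|\partial_i \partial_j f_\varepsilon\|_2^2 = \int (\partial_i \partial_j f_\varepsilon)(\partial_i \partial_j f_\varepsilon) \, dx = - \int (\partial_i^2 \partial_j f_\varepsilon)(\partial_j f_\varepsilon) \, dx = \langle \partial_i^2 f_\varepsilon, \partial_j^2 f_\varepsilon \rangle
\]
for all $i, j \in \{1, \dots, d\}$, which gives
\[
  \sum_{i,j=1}^d \|\partial_i \partial_j f_\varepsilon\|_2^2 = \sum_{i,j=1}^d \langle \partial_i^2 f_\varepsilon, \partial_j^2 f_\varepsilon \rangle = \int (\Delta f_\varepsilon)^2 \, dx = \|v_\varepsilon\|_2^2. \tag{8.10}
\]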
A uniform estimate implies regularity. Using Lemma 8.42 we see that
\[
  \|D_i^h \partial_j f_\varepsilon\|_2 \le \|\partial_i \partial_j f_\varepsilon\|_2 \le \|v_\varepsilon\|_2
\]
for all $i, j \in \{1, \dots, d\}$ and real numbers $h$ with $0 < |h| \le 1$. We now let $\varepsilon \to 0$
and obtain from (8.10), Proposition 8.45, and Lemma 8.46 that
\[
  \|D_i^h \partial_j f\|_2 \le \|v\|_2
\]
for all $h$ with $0 < |h| \le 1$. Applying Corollary 8.43, this implies that $\partial_i \partial_j f$
exists in $L^2(\mathbb{R}^d)$ and by Corollary 8.47 it follows that $f \in H^2_{\mathrm{loc}}(\mathbb{R}^d)$. Using the
same function $\psi \in C_c^\infty(U)$ as above this implies $f = \psi f \in H^2(\mathbb{R}^d) \cap H^2(U)$,
so $g \in H^2_{\mathrm{loc}}(U)$ since $f = \chi g$ and $\chi \in C_c^\infty(U)$ was arbitrary.
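Induction on k. The theorem now follows by induction on $k \ge 0$. The
case $k = 0$ is proven above. So assume now that $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ for
some $k \ge 1$. Since we then also have $u \in H^{k-1}_{\mathrm{loc}}(U)$ we obtain from the
inductive hypothesis that in fact $g \in H^{k+1}_{\mathrm{loc}}(U)$. Let $\chi \in C_c^\infty(U)$. Lemma 5.50
then gives for $f = \chi g$ that $\Delta f = v \in H^k(U)$. If $\alpha \in \mathbb{N}_0^d$ satisfies $\|\alpha\|_1 \le k$,
then $\partial_\alpha f \in L^2(U)$ satisfies $\Delta \partial_\alpha f = \partial_\alpha v$ since
\[
\begin{aligned}
  \langle \partial_\alpha f, \Delta\phi \rangle &= (-1)^{\|\alpha\|_1} \langle f, \partial_\alpha \Delta\phi \rangle = (-1)^{\|\alpha\|_1} \langle f, \Delta \partial_\alpha \phi \rangle \\
  &= (-1)^{\|\alpha\|_1} \langle v, \partial_\alpha \phi \rangle = \langle \partial_\alpha v, \phi \rangle
\end{aligned}
\]
for all $\phi \in C_c^\infty(U)$. Hence the argument above for $k = 0$ applies to $\partial_\alpha f$ and
shows that $\partial_\alpha f \in H^2_{\mathrm{loc}}(U)$. As $\alpha \in \mathbb{N}_0^d$ is arbitrary with $\|\alpha\|_1 \le k$, it follows
that $f$ satisfies the assumption of Corollary 8.47 for the integer $k + 2$ and
hence $f = \psi f \in H^{k+2}(U)$, or equivalently that $g \in H^{k+2}_{\mathrm{loc}}(U)$ since $f = \chi g$
and $\chi \in C_c^\infty(U)$ was arbitrary. This concludes the induction and the proof
of the theorem.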
The above proof of elliptic regularity, and in particular the step in (8.10),
was tailored very closely to the Laplace operator on open subsets in Rd .
In order to also obtain the regularity at the boundary we start by giving
a different argument which will be more amenable for generalizations (even
though it will be a bit more involved). For this we will use the following
inequality, which is also known as the Cauchy inequality with an ε. For any
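measure space $(X, \mathcal{B}, \mu)$, functions $u, v \in L^2_\mu(X)$, and $\varepsilon > 0$ we have
\[
  \langle u, v \rangle_{L^2(X,\mu)} \le \int_X |uv| \, d\mu \le \varepsilon \|u\|_2^2 + \frac{1}{4\varepsilon} \|v\|_2^2. \tag{8.11}
\]
The first inequality is the triangle inequality and the second follows from
integrating the inequality
\[
  0 \le \Bigl( \sqrt{\varepsilon} |u| - \frac{|v|}{2\sqrt{\varepsilon}} \Bigr)^2 = \varepsilon |u|^2 + \frac{|v|^2}{4\varepsilon} - |u||v|
\]
over $X$.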
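The pointwise inequality that is integrated to give (8.11) is easy to test on numbers. The helper below is a hypothetical illustration (its name and the sample values are ours); it returns the gap between the two sides, which should never be negative and which vanishes exactly when |b| = 2ε|a|.

```python
def cauchy_with_epsilon_gap(a, b, eps):
    """Return eps*a^2 + b^2/(4*eps) - |a*b| for real a, b and eps > 0.

    By completing the square this equals
    (sqrt(eps)*|a| - |b|/(2*sqrt(eps)))**2, so it should be non-negative.
    """
    return eps * a * a + b * b / (4.0 * eps) - abs(a * b)
```

For instance, with a = b = 3 and ε = 1/2 the two sides coincide, which is the equality case used when one sets ε = 1/2 in the estimates of the third proof below.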
Third proof of elliptic regularity on open sets. As in the second
proof of elliptic regularity above, we multiply g by some χ ∈ Cc∞ (U ) and
apply Lemma 5.50. This shows that it suffices to consider the case U = Rd and
a function g ∈ H01 (Rd ) with compact support satisfying ∆g = u ∈ H k (Rd ).
We also initially set $k = 0$ and will use Corollary 8.43 after bounding $\|D_\ell^h g_j\|_2$
for $\ell, j = 1, \dots, d$, where $g_j = \partial_j g$ and $h \in \mathbb{R}$ with $0 < |h| \le 1$.

Recall that for any φ ∈ Cc∞ (Rd ) we have
\[
  \langle g, \phi \rangle_1 = \sum_{j=1}^d \langle \partial_j g, \partial_j \phi \rangle = - \langle g, \Delta\phi \rangle = - \langle u, \phi \rangle,
\]
see also Lemma 5.41. Approximating any v ∈ H01 (Rd ) by smooth functions
with compact support this formula extends to φ = v. We set
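\[
  v = - D_\ell^{-h}(D_\ell^h g) \in H_0^1(\mathbb{R}^d) \tag{8.12}
\]
for some fixed $\ell \in \{1, \dots, d\}$ and $h \in \mathbb{R}$ satisfying $0 < |h| \le 1$. Therefore
\[
  \underbrace{\langle g, - D_\ell^{-h}(D_\ell^h g) \rangle_1}_{L} = \underbrace{\langle u, D_\ell^{-h}(D_\ell^h g) \rangle}_{R}, \tag{8.13}
\]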
where L denotes the left-hand side and R denotes the right-hand side.
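Studying the left-hand side. By definition of $\langle \cdot, \cdot \rangle_1$, we have
\[
  L = - \sum_{j=1}^d \langle g_j, \partial_j D_\ell^{-h}(D_\ell^h g) \rangle = - \sum_{j=1}^d \langle g_j, D_\ell^{-h}(D_\ell^h g_j) \rangle,
\]
where we used the fact that $\partial_j$ and $D_\ell^h$ commute (check this). Finally, we
apply the same argument as in (8.9), which gives our main term
\[
  M = L = \sum_{j=1}^d \langle D_\ell^h g_j, D_\ell^h g_j \rangle = \sum_{j=1}^d \|D_\ell^h g_j\|_2^2.
\]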
This is precisely what we wish to estimate, and it is the only term that is
quadratic in the difference quotient of the weak partial derivatives of g.
Bounding the right-hand side. We are aiming to convert (8.13) into an
estimate on M that is uniform with respect to h. For this, we need to bound
the right-hand side R of (8.13). In fact, we have for any ε > 0 that
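\[
  |R| = \Bigl| \int u \, D_\ell^{-h}(D_\ell^h g) \, dx \Bigr| \le \int |D_\ell^{-h}(D_\ell^h g)| |u| \, dx \le \varepsilon \|D_\ell^{-h}(D_\ell^h g)\|_2^2 + \frac{1}{4\varepsilon} \|u\|_2^2
\]
by (8.11) (Cauchy’s inequality with an ε). In the first expression of the bound
on the right we use the fact that $D_\ell^h g \in H_0^1(\mathbb{R}^d)$ and the bound on the
difference quotient by the weak partial derivative in Lemma 8.42 to obtain
\[
  \|D_\ell^{-h}(D_\ell^h g)\|_2^2 \le \|\partial_\ell (D_\ell^h g)\|_2^2 = \|D_\ell^h g_\ell\|_2^2.
\]
This gives
\[
  |R| \le \varepsilon \|D_\ell^h g_\ell\|_2^2 + \frac{1}{4\varepsilon} \|u\|_2^2 \le \varepsilon M + \frac{1}{4\varepsilon} \|u\|_2^2, \tag{8.14}
\]
which on setting $\varepsilon = \frac{1}{2}$ gives $|R| \le \frac{1}{2} M + \frac{1}{2} \|u\|_2^2$.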
Putting the estimates together. Using (8.13) and the estimate for $R$
in (8.14) we finally see that $M \le \frac{1}{2} M + \frac{1}{2} \|u\|_2^2$ and so
\[
  \sum_{j=1}^d \|D_\ell^h g_j\|_2^2 = M \le \|u\|_2^2.
\]
Note that this upper bound is independent of h and holds for all ℓ in {1, . . . , d}.
Applying Corollary 8.43 we see that ∂ ℓ gj exists for all ℓ, j in {1, . . . , d}. In
other words, all degree two weak partial derivatives of g exist, and so g lies
in H 2 (Rd ) by Corollary 8.47.
Induction on k. The theorem again follows by induction on k as in the
second proof of elliptic regularity above.
8.2.3 Elliptic Regularity at the Boundary
In this subsection we will use the Banach–Alaoglu theorem (much as in the

third proof of elliptic regularity starting on p. 276) to prove elliptic regularity
up to the boundary. For this we need to assume that U has smooth boundary
as in Definition 5.31.
Theorem 8.49 (Elliptic regularity up to the boundary). Let U ⊆ Rd
be bounded and open with smooth boundary. Let g ∈ H01 (U ), k ≥ 0, and
suppose that ∆g = u ∈ H k (U ). Then g ∈ H k+2 (U ).
We define
\[
  C^\infty(\overline{U}) = \bigcap_{k \ge 0} C^k(\overline{U})
\]
to consist of all smooth functions on $U$ with the property that the function
and all partial derivatives can be extended continuously to $\overline{U}$ (see also the
first paragraph of Section 5.3.3). We consider $C^\infty(\overline{U})$ as a subspace of $C(\overline{U})$.
Proposition 8.50 (Sobolev embedding up to the boundary). Let U
be a bounded and open subset of Rd with smooth boundary. Then
\[
  \bigcap_{k \ge 0} H^k(U) = C^\infty(\overline{U}). \tag{8.15}
\]
The above theorem and proposition together allow us to complete our dis-
cussion of the Dirichlet boundary value problem and the eigenfunctions of the
Laplace operator, which previously had only weaker than desired conclusions
regarding the behaviour of the functions near the boundary.
Corollary 8.51 (Smooth solutions). Let U ⊆ Rd be bounded and open

with smooth boundary. Then
• for any $f \in C^\infty(\partial U)$ the weak solution to the Dirichlet boundary value
problem $\Delta g = 0$, $g|_{\partial U} = f$ from Theorem 5.51 belongs to $C^\infty(\overline{U})$ and
satisfies the boundary value condition pointwise, and
• the Laplace eigenfunctions $f \in H_0^1(U)$ with $\Delta f = \lambda f$ from Theorem 6.56
also belong to $C^\infty(\overline{U})$ and vanish at the boundary.
We first assume Theorem 8.49 and Proposition 8.50 and show how these
imply the corollary.
Proof of Corollary 8.51. For the Dirichlet boundary value problem we
recall from the proof of Theorem 5.51 that we first extended $f \in C^\infty(\partial U)$
to all of $U$, which under our assumptions leads to a function $f \in C^\infty(\overline{U})$.
Proposition 5.42 then gives a function $v \in H_0^1(U)$ with $g = f - v$ satisfying
$\Delta g = 0$. In other words, $\Delta v = \Delta f \in H^k(U)$ for all $k \ge 0$, which by
Theorem 8.49 and Proposition 8.50 gives $v, g \in C^\infty(\overline{U})$. Proposition 5.33
now implies that $v$ vanishes at $\partial U$ pointwise.
Similarly, suppose $f \in H_0^1(U)$ is an eigenfunction of $\Delta$ from Theorem 6.56.
In this case, $\Delta f = \lambda f$ implies $f \in H^3(U)$ by Theorem 8.49, then $f \in H^5(U)$
and so on. Together with Proposition 8.50 this gives $f \in C^\infty(\overline{U})$.
Proof of Proposition 8.50. The claim in (8.15) is (unlike the corollary)
a purely local statement. In fact, we claim that it is enough to show that
every point $z^{(0)} \in \overline{U}$ has an open neighbourhood $V \subseteq \mathbb{R}^d$ with the following
property. Every function
\[
  f \in \bigcap_{k \ge 0} H^k(U \cap V)
\]
with support in $\overline{U} \cap V$ is continuous and can be continuously extended to $\overline{U}$
so that the extension vanishes outside of $V$. Assuming this property for now
we find by compactness a finite cover $V_1, \dots, V_n$ of $\overline{U}$ comprising such
neighbourhoods. By Exercise 5.52 we can find a corresponding smooth partition
of unity $\psi_1, \dots, \psi_n$, which we use to localize $f$ to the functions
\[
  f_j = f \psi_j \in \bigcap_{k \ge 0} H^k(U \cap V_j)
\]
with $\operatorname{Supp} f_j \subseteq \overline{U} \cap \operatorname{Supp} \psi_j \subseteq \overline{U} \cap V_j$ for $j = 1, \dots, n$. The local statement
then implies that $f_j$ can be continuously extended to all of $\overline{U}$ for $j = 1, \dots, n$
so that the extension vanishes outside of $V_j$. This implies that $f = f_1 + \cdots + f_n$
has an extension to $\overline{U}$ too. Applying this argument to all partial derivatives
then gives (8.15).
Interior points. We now prove the local statement starting with the
case z (0) ∈ U . Here we may take V = U and apply Sobolev embedding
on open sets (Theorem 5.34) to conclude that $f \in C(U)$, which together
with $\operatorname{Supp} f \subseteq U$ proves that $f$ can be extended continuously to $\overline{U}$ by setting
it equal to 0 on $\partial U$.
Boundary points, flattening the boundary. So consider now some
element z (0) ∈ ∂U . We may translate and rotate the coordinate system so
that z (0) = 0 and use Definition 5.31 to find some ε > 0 so that
\[
  U \cap B_\varepsilon(0) = \{ (x, y) \in B_\varepsilon(0) \mid y < \phi(x) \}
\]
for some $\phi \in C^\infty(B_\varepsilon^{\mathbb{R}^{d-1}}(0))$. To simplify the discussion we define the new
open sets U ′ = (−δ, δ)d−1 × (−δ, 0), V ′ = (−δ, δ)d , and the map Φ by
Φ(x1 , . . . , xd ) = (x1 , . . . , xd−1 , xd + φ(x1 , . . . , xd−1 )),
where δ ∈ (0, ε/d) is chosen so that Φ(V ′ ) ⊆ Bε (0). In particular, Φ and all
the partial derivatives of Φ will be bounded on V ′ and Φ(U ′ ) ⊆ U ∩Bε (0). We
note that Φ maps the Lebesgue measure on V ′ ⊆ Rd to the Lebesgue measure
on the open set V = Φ(V ′ ) ⊆ Bε (0). This implies that every f ∈ C ∞ (U ∩ V )
is of the form f = g ◦ Φ−1 for g = f ◦ Φ ∈ C ∞ (U ′ ∩ V ′ ). Moreover, by the
multi-dimensional chain rule and induction we have
\[
  \|f\|_{H^k(U \cap V)} \ll_k \|f \circ \Phi\|_{H^k(U' \cap V')} \ll_k \|f\|_{H^k(U \cap V)}
\]
for all $k \ge 0$ and $f \in C^\infty(U \cap V)$. Therefore, the completions $H^k(U \cap V)$
and $H^k(U' \cap V')$ are also isomorphic under the map
\[
  H^k(U \cap V) \ni f \longmapsto f \circ \Phi \in H^k(U' \cap V')
\]
for all $k \ge 0$. In particular, it is enough to prove the desired local statement
for U ′ and V ′ instead of U and V .
Simplifying the notation further we apply a linear map, set δ = 1, and
may suppose in the following that V = (−1, 1)d and U = (−1, 1)d−1 × (0, 1).
Boundary points, trace operators on a box. We define S = (−1, 1)d−1
and will need the trace operator on the hyperplanes Sy = S × {y} inside U
for all possible values of the height parameter $y \in (0, 1)$. The trace operators
for $y \searrow 0$ will allow us to extend functions from $U$ to $\overline{U} \cap V = U \cup S_0$
with $S_0 = S \times \{0\}$. We note that these trace operators already featured in
Section 5.2.2 but (except for Exercise 5.29 and Exercise 5.35) not in the gen-
erality needed here. For completeness we quickly go through the construction
of these operators once more.
To define the trace operators we note that for any y1 , y2 ∈ (0, 1), any
function f ∈ C ∞ (U ), and x ∈ S we have
\[
  f(x, y_2) = f(x, y_1) + \int_{y_1}^{y_2} \partial_d f(x, t) \, dt. \tag{8.16}
\]
If $f \in C^\infty(U) \cap H^1(U)$ we may integrate over $y_1 \in (0, 1)$ to obtain
\[
  f(x, y_2) = \int_0^1 f(x, y) \, dy + \int_0^1 \partial_d f(x, t) k(y_2, t) \, dt \tag{8.17}
\]
for some bounded measurable function $k$ (see (5.6)), almost every $x \in S$,
and $y_2 \in (0, 1)$. Clearly for a fixed height $y \in (0, 1)$ the trace map
\[
  \cdot|_{S_y} : C^\infty(U) \cap H^1(U) \longrightarrow C^\infty(S), \qquad f \longmapsto \bigl( S \ni x \longmapsto f(x, y) \bigr)
\]
is linear. Using (8.17) and Cauchy–Schwarz for the integration over $t \in (0, 1)$
it is also easy to see that the trace map is bounded with respect to $\|\cdot\|_{H^1(U)}$
and $\|\cdot\|_{L^2(S)}$. Therefore the trace map is defined on $H^1(U)$ and takes values
in $L^2(S)$. For $y_1, y_2 \in (0, 1)$ we may also use (8.16) to obtain
\[
\begin{aligned}
  \int_S |f(x, y_2) - f(x, y_1)|^2 \, dx &= \int_S \Bigl| \int_{y_1}^{y_2} \partial_d f(x, t) \, dt \Bigr|^2 dx \\
  &\le \int_S \int_0^1 |\partial_d f(x, t)|^2 \, dt \, dx \; |y_2 - y_1|
\end{aligned}
\]
by Cauchy–Schwarz for $f \in C^\infty(U) \cap H^1(U)$, which gives
\[
  \|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{L^2(S)} \le \|\partial_d f\|_{L^2(U)} \sqrt{|y_2 - y_1|} \tag{8.18}
\]
first for $f \in C^\infty(U) \cap H^1(U)$ and then for all $f \in H^1(U)$. In particular, the
map $(0, 1) \ni y \mapsto f|_{S_y} \in L^2(S)$ is uniformly continuous for any $f \in H^1(U)$.
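The Hölder-type estimate (8.18) can be sanity-checked for d = 2 on the box U = (−1, 1) × (0, 1). The sketch below makes several assumptions that are not in the text: the sample function f(x, y) = cos(x)·y², the helper names, and the replacement of all integrals by midpoint sums.

```python
import math

def f(x, y):
    # Hypothetical smooth sample function on U = (-1, 1) x (0, 1).
    return math.cos(x) * y * y

def d_y_f(x, y):
    # Partial derivative of f in the last coordinate direction.
    return 2.0 * y * math.cos(x)

def midpoints(a, b, n):
    """Midpoints of n equal subintervals of (a, b), together with the step."""
    step = (b - a) / n
    return [a + (k + 0.5) * step for k in range(n)], step

def trace_difference_norm(y1, y2, n=400):
    """Approximate L^2(S) norm of the difference of the traces at heights y1, y2."""
    xs, dx = midpoints(-1.0, 1.0, n)
    return math.sqrt(sum((f(x, y2) - f(x, y1)) ** 2 for x in xs) * dx)

def derivative_norm(n=200):
    """Approximate L^2(U) norm of the last partial derivative of f."""
    xs, dx = midpoints(-1.0, 1.0, n)
    ys, dy = midpoints(0.0, 1.0, n)
    return math.sqrt(sum(d_y_f(x, y) ** 2 for x in xs for y in ys) * dx * dy)

def holder_bound_holds(y1, y2):
    """Check the inequality (8.18) for the sample function at two heights."""
    bound = derivative_norm() * math.sqrt(abs(y2 - y1))
    return trace_difference_norm(y1, y2) <= bound
```

For this f both sides can also be computed by hand: the inequality reduces to |y₂ − y₁|(y₁ + y₂)² ≤ 4/3, which holds for all heights in (0, 1).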
We wish to combine the above with the Sobolev embedding theorem on S
to obtain similar conclusions with respect to the supremum norm. Clearly
if we have f ∈ C ∞ (U ) then f |Sy ∈ C ∞ (S) for all y ∈ (0, 1). Applying the
trace operator to the partial derivatives of a function in H k (U ) along the
various directions in $S$ now shows that $\cdot|_{S_y} : H^k(U) \to H^{k-1}(S)$ is a bounded
operator for all $k \ge 1$ and that (8.18) implies
\[
  \|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{H^{k-1}(S)} \le \|f\|_{H^k(U)} \sqrt{|y_2 - y_1|} \tag{8.19}
\]
for $f \in H^k(U)$, $k \ge 1$, and $y_1, y_2 \in (0, 1)$. Let now $k > 1 + \frac{d-1}{2}$. Using the
Sobolev embedding theorem (Theorem 5.34) on $S$ we obtain that $H^k(U)|_{S_y}$
belongs to $C(S)$. For $\kappa \in (0, 1)$ the proof of Theorem 5.34 (see Exercise 5.39)
also shows that on the compact subset $K = [-1 + \kappa, 1 - \kappa]^{d-1} \subseteq S$ we have
\[
  \|g\|_{K,\infty} \ll \|g\|_{H^{k-1}(S)}
\]
for all $g \in H^{k-1}(S)$. Together with (8.19) we obtain
\[
  \sup_{x \in K} |f(x, y_1) - f(x, y_2)| \ll_\kappa \|f\|_{H^k(U)} \sqrt{|y_2 - y_1|}
\]
for $f \in H^k(U)$ and $y_1, y_2 \in (0, 1)$. This also shows that for any sequence $(y_n)$
with $y_n \to 0$ as $n \to \infty$ the sequence of functions defined by $K \ni x \mapsto f(x, y_n)$
for $n \ge 1$ is a Cauchy sequence with respect to $\|\cdot\|_{K,\infty}$.
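Boundary points, conclusion. Thus if $k > 1 + \frac{d-1}{2}$ and $f \in H^k(U \cap V)$
has $\operatorname{Supp} f \subseteq \overline{U} \cap V$, then we can find $\kappa > 0$ so that
\[
  \operatorname{Supp} f \subseteq [-1 + \kappa, 1 - \kappa]^{d-1} \times [0, 1 - \kappa]
\]
and apply the above to see that $f$ can be continuously extended to $\overline{U}$. This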
proves the remaining local statement. As mentioned before, this discussion
also applies to all partial derivatives of f and hence completes the proof of
the proposition.
Much as in the proof of Proposition 8.50, the statement in Theorem 8.49
can be reduced to a purely local statement. Indeed, suppose the following
holds.
(Local statement) For any $z^{(0)} \in \overline{U}$ there exists a neighbourhood $V$
of $z^{(0)}$ in $\mathbb{R}^d$ so that if $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$ for some $k \ge 0$
and $\operatorname{Supp} g \subseteq \overline{U} \cap V$, then $g \in H^{k+2}(U)$.
Then we may find a finite cover of $\overline{U}$ consisting of such neighbourhoods and an
associated smooth partition of unity. Together with Lemma 5.50 this reduces
the proof to the local statements (check this).
Moreover, for any interior point z (0) ∈ U we may take V = U , apply elliptic
regularity on open subsets (Theorem 5.45 or Theorem 8.48), and multiply g
by a smooth ψ ∈ Cc∞ (U ) with ψ ≡ 1 on Supp g to obtain the local statement.
We have therefore reduced the proof of Theorem 8.49 to the following:
(Local statement at boundary points) For any $z^{(0)} \in \partial U$ there
exists a neighbourhood $V \subseteq \mathbb{R}^d$ so that $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$
for some $k \ge 0$ and $\operatorname{Supp} g \subseteq \overline{U} \cap V$ implies that $g \in H^{k+2}(U)$.
Proof of Theorem 8.49, Flattening of the boundary. As in the proof

of Proposition 8.50 we wish to flatten out the boundary by a diffeomorph-
ism. Indeed, the assumption that U has smooth boundary implies for any
boundary point z (0) ∈ ∂U that there exists some δ > 0 and a diffeomorph-
ism Φ defined on V ′ = (−δ, δ)d such that Φ(0) = z (0) and Φ(U ′ ) = U ∩ V ,
where U ′ = (−δ, δ)d−1 × (0, δ) and V = Φ(V ′ ). The proof of Proposi-
tion 8.50 also shows that we may assume that Φ extends to a diffeomorphism
on a neighbourhood of V ′ , that Φ maps the Lebesgue measure on V ′ to
the Lebesgue measure on V , and induces an isomorphism between H k (U ′ )
and H k (U ∩ V ) for all k > 0.
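Assume now $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$ for some $k \ge 0$
and $\operatorname{Supp} g \subseteq \overline{U} \cap V$. Then $\langle g, \Delta\phi \rangle_{L^2(U)} = \langle u, \phi \rangle_{L^2(U)}$ for any $\phi \in C_c^\infty(U \cap V)$.
We define $g' = g \circ \Phi \in H_0^1(U')$, $u' = u \circ \Phi \in H^k(U')$, and obtain
\[
  \langle g', \Delta(\phi) \circ \Phi \rangle_{L^2(U')} = \langle u', \phi \circ \Phi \rangle_{L^2(U')}. \tag{8.20}
\]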
Here φ ◦ Φ ∈ Cc∞ (U ′ ) is again an arbitrary smooth function with com-

pact support in U ′ , which we will denote by ϕ ∈ Cc∞ (U ′ ). We will use
the shorthand Ψ = Φ−1 for the inverse diffeomorphism and wish to ex-
press ∆(φ) ◦ Φ = (∆(ϕ ◦ Ψ )) ◦ Φ in terms of partial derivatives of ϕ. By
the chain rule
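\[
  \partial_\ell (\varphi \circ \Psi) = \sum_{i=1}^d \bigl( (\partial_i \varphi) \circ \Psi \bigr) (\partial_\ell \Psi_i)
\]
and the product rule
\[
  \partial_\ell^2 (\varphi \circ \Psi) = \sum_{i,j=1}^d \bigl( (\partial_i \partial_j \varphi) \circ \Psi \bigr) (\partial_\ell \Psi_i)(\partial_\ell \Psi_j) + \sum_{i=1}^d \bigl( (\partial_i \varphi) \circ \Psi \bigr) (\partial_\ell^2 \Psi_i)
\]
for all $\ell \in \{1, \dots, d\}$, and hence
\[
  \Delta(\phi) \circ \Phi = \Delta(\varphi \circ \Psi) \circ \Phi = \sum_{i,j=1}^d a_{i,j} \, \partial_i \partial_j \varphi + \sum_{i=1}^d b_i \, \partial_i \varphi,
\]
where $a_{i,j} = \sum_{\ell=1}^d \bigl( (\partial_\ell \Psi_i)(\partial_\ell \Psi_j) \bigr) \circ \Phi$ and $b_i = (\Delta \Psi_i) \circ \Phi$ belong to $C^\infty(U')$
for all $i, j \in \{1, \dots, d\}$. Hence the relationship in (8.20) between $g'$ and $u'$
can also be expressed as
\[
  \langle g', P\varphi \rangle_{L^2(U')} = \langle u', \varphi \rangle_{L^2(U')}
\]
for all $\varphi \in C_c^\infty(U')$, where $P$ is the degree two partial differential operator
\[
  P(\varphi) = \sum_{i,j=1}^d a_{i,j} \, \partial_i \partial_j \varphi + \sum_{i=1}^d b_i \, \partial_i \varphi
\]
with smooth coefficients $a_{i,j} \in C^\infty(U')$ and $b_i \in C^\infty(U')$ for $i, j \in \{1, \dots, d\}$.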

We remark that the coefficients ai,j form a symmetric positive-definite
matrix. More precisely, for any x′ ∈ U ′ we have ai,j (x′ ) = aj,i (x′ ) for all i, j
and on setting x = Φ(x′ ) we also have that
\[
\begin{aligned}
  \sum_{i,j=1}^d a_{i,j}(x') v_i v_j &= \sum_{\ell=1}^d \Bigl( \sum_{i=1}^d (\partial_\ell \Psi_i)(x) v_i \Bigr) \Bigl( \sum_{j=1}^d (\partial_\ell \Psi_j)(x) v_j \Bigr) \\
  &= \|D\Psi|_x v\|^2 \ge \theta \|v\|^2
\end{aligned}
\]
for all $v \in \mathbb{R}^d$ and some constant $\theta$. Here $D\Psi|_x$ is the total derivative of $\Psi$
at $x = \Phi(x')$ and by invertibility of $D\Psi|_x$ and compactness of $\overline{U'}$ the
constant $\theta > 0$ can be chosen uniformly for all $x' \in U'$. This makes $P$ into
a uniformly elliptic operator of degree two (as defined below), and we can
apply the argument presented below.
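To make the coefficients concrete, consider d = 2 with a flattening map of the form Φ(x₁, x₂) = (x₁, x₂ + φ(x₁)), so that Ψ(y₁, y₂) = (y₁, y₂ − φ(y₁)). The sketch below computes the resulting matrix (a_{i,j}) and a lower bound θ; the boundary profile φ = sin (so φ′ = cos) and the helper names are hypothetical choices for illustration only.

```python
import math

def coefficient_matrix(p):
    """Matrix (a_{i,j}) for Psi(y1, y2) = (y1, y2 - phi(y1)), where p = phi'(y1).

    The partial derivatives of Psi are d1 Psi1 = 1, d2 Psi1 = 0,
    d1 Psi2 = -p and d2 Psi2 = 1, and a_{i,j} is the sum over l of
    (dl Psi_i) * (dl Psi_j).
    """
    return [[1.0, -p], [-p, 1.0 + p * p]]

def smallest_eigenvalue(a):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return 0.5 * (trace - math.sqrt(trace * trace - 4.0 * det))

def ellipticity_constant(dphi, points):
    """Minimum over the sample points of the smallest eigenvalue of (a_{i,j})."""
    return min(smallest_eigenvalue(coefficient_matrix(dphi(x))) for x in points)
```

Here the determinant of (a_{i,j}) is always 1 and the trace is 2 + φ′², so the smallest eigenvalue stays bounded away from zero as long as φ′ is bounded, which is the uniform ellipticity used in the text.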
Let U ⊆ Rd be open and bounded. We use functions ai,j , bi , c ∈ C ∞ (U )
for i, j = 1, . . . , d to define a partial differential operator P by
\[
  P(\phi) = \sum_{i,j=1}^d a_{i,j} \, \partial_i \partial_j \phi + \sum_{i=1}^d b_i \, \partial_i \phi + c\phi
\]
for φ ∈ Cc∞ (U ). The operator P is called a uniformly elliptic operator of

degree two if ai,j = aj,i for all i, j ∈ {1, . . . , d} and there exists some uniform
constant θ > 0 with
\[
  \sum_{i,j=1}^d a_{i,j}(x) v_i v_j \ge \theta \|v\|^2 = \theta \sum_{i=1}^d v_i^2
\]
for all v ∈ Rd and x ∈ U . These uniformly elliptic operators satisfy the

following (by now familiar) result.
Theorem 8.52 (Elliptic regularity for P ). Let U ⊆ Rd be bounded and

open with smooth boundary, and let P be a uniformly elliptic operator of
degree two. Let $g \in H_0^1(U)$, $k \ge 0$, and $u \in H^k(U)$. Suppose that
\[
  \langle g, P\phi \rangle_{L^2(U)} = \langle u, \phi \rangle_{L^2(U)}
\]
for all $\phi \in C_c^\infty(U)$. Then $g \in H^{k+2}(U)$.
We will not prove this theorem in detail (see Exercise 8.53), but instead
consider only the ‘local version’ needed to complete the proof of elliptic reg-
ularity of the Laplace operator in Theorem 8.49.
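(Local statement at boundary of a box) Let $V = (-1, 1)^d$
and $U = (-1, 1)^{d-1} \times (0, 1)$, and let $k \ge 0$. Assume that $g \in H_0^1(U)$
with
\[
  \operatorname{Supp} g \subseteq \overline{U} \cap V = (-1, 1)^{d-1} \times [0, 1),
\]
and $u \in H^k(U)$ with
\[
  \langle g, P\phi \rangle_{L^2(U)} = \langle u, \phi \rangle_{L^2(U)} \tag{8.21}
\]
for all $\phi \in C_c^\infty(U)$. Then
\[
  g \in H^{k+2}(U). \tag{8.22}
\]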
Proof of the local statement for Theorem 8.52 at a flat bound-

ary. We first assume that k = 0 and let φ be a function in Cc∞ (U ). By
assumption,
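\[
  \int g \Bigl( \sum_{i,j=1}^d a_{i,j} (\partial_i \partial_j \phi) + \sum_{i=1}^d b_i (\partial_i \phi) + c\phi \Bigr) dx = \int u\phi \, dx. \tag{8.23}
\]
By integration by parts and the product rule we also have
\[
\begin{aligned}
  \int g a_{i,j} (\partial_i \partial_j \phi) \, dx &= - \int \partial_i (g a_{i,j}) (\partial_j \phi) \, dx \\
  &= - \int a_{i,j} (\partial_i g)(\partial_j \phi) \, dx - \int g (\partial_i a_{i,j})(\partial_j \phi) \, dx \\
  &= - \int a_{i,j} (\partial_i g)(\partial_j \phi) \, dx + \int \partial_j (g \, \partial_i a_{i,j}) \phi \, dx
\end{aligned}
\]
and
\[
  \int g b_i \, \partial_i \phi \, dx = - \int \partial_i (g b_i) \phi \, dx
\]
for all $i, j \in \{1, \dots, d\}$. Using this we can rewrite (8.23) in the form
\[
  - \sum_{i,j=1}^d \int a_{i,j} (\partial_i g)(\partial_j \phi) \, dx = \int \tilde{u} \phi \, dx \tag{8.24}
\]
for all $\phi \in C_c^\infty(U)$, where
\[
  \tilde{u} = u - \sum_{i,j=1}^d \partial_j (g \, \partial_i a_{i,j}) + \sum_{i=1}^d \partial_i (g b_i) - cg \in L^2(U). \tag{8.25}
\]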
This allows us to ignore the first and zero order terms in the definition of P .
Since (8.24) only involves the first derivatives of φ ∈ Cc∞ (U ), it also holds by
continuity for any φ = v ∈ H01 (U ).
The choice of v. We fix some ℓ ∈ {1, . . . , d − 1}. Then
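\[
  v = D_\ell^{-h} D_\ell^h g \in H_0^1(U)
\]
for small enough $h \in \mathbb{R}$ by the assumptions $g \in H_0^1(U)$, $\operatorname{Supp} g \subseteq \overline{U} \cap V$,
and $\ell \ne d$. With this choice, (8.24) becomes
\[
  \underbrace{- \sum_{i,j=1}^d \int a_{i,j} (\partial_i g)(\partial_j D_\ell^{-h} D_\ell^h g) \, dx}_{L} = \underbrace{\int \tilde{u} \, D_\ell^{-h} D_\ell^h g \, dx}_{R}.
\]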
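Studying the left-hand side. Using the assumption $\operatorname{Supp} g \subseteq \overline{U} \cap V$ and
the abbreviations $g_i = \partial_i g$, and $a_{i,j}^h(x) = a_{i,j}(x + he_\ell)$, we see that
\[
\begin{aligned}
  L &= \sum_{i,j=1}^d \int D_\ell^h (a_{i,j} g_i) \, D_\ell^h g_j \, dx \\
  &= \underbrace{\sum_{i,j=1}^d \int a_{i,j}^h \, D_\ell^h g_i \, D_\ell^h g_j \, dx}_{M} + \underbrace{\sum_{i,j=1}^d \int (D_\ell^h a_{i,j}) \, g_i \, D_\ell^h g_j \, dx}_{E}
\end{aligned}
\]
since the product rule for $D_\ell^h$ has the form
\[
\begin{aligned}
  D_\ell^h (a_{i,j} g_i)(x) &= \frac{1}{h} \bigl( a_{i,j}(x + he_\ell) g_i(x + he_\ell) - a_{i,j}(x) g_i(x) \bigr) \\
  &= \frac{1}{h} \Bigl( a_{i,j}(x + he_\ell) \bigl( g_i(x + he_\ell) - g_i(x) \bigr) + \bigl( a_{i,j}(x + he_\ell) - a_{i,j}(x) \bigr) g_i(x) \Bigr) \\
  &= a_{i,j}^h(x) \, D_\ell^h g_i(x) + (D_\ell^h a_{i,j})(x) \, g_i(x)
\end{aligned}
\]
for $i, j \in \{1, \dots, d\}$ and $x \in U$. For $x \in U$ close to $\partial U \setminus V$ we have $g_i(x) = 0$.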

Extending gi and ai,j trivially, this also holds for all x ∈ Rd .
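The product rule for difference quotients used above is an exact algebraic identity, which the following one-dimensional sketch confirms on sample values. The coefficient a(t) = 1 + t², the function g = sin, and the helper names are hypothetical and serve only as an illustration.

```python
import math

def dq(func, h):
    """Difference quotient: the function x -> (func(x + h) - func(x)) / h."""
    return lambda x: (func(x + h) - func(x)) / h

def product_rule_sides(a, g, x, h):
    """Return D^h(a*g)(x) and a(x + h) * D^h g(x) + D^h a(x) * g(x)."""
    lhs = dq(lambda t: a(t) * g(t), h)(x)
    rhs = a(x + h) * dq(g, h)(x) + dq(a, h)(x) * g(x)
    return lhs, rhs
```

Unlike the classical Leibniz rule, the first summand carries the shifted coefficient a(x + h); this is why the shifted coefficients appear in the main term M.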
The term M is our main term since the uniform ellipticity assumption
on P implies that
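\[
  M \ge \theta \sum_{i=1}^d \|D_\ell^h g_i\|_2^2. \tag{8.26}
\]
The extra term $E$ may be bounded using the Cauchy inequality with an ε as
in (8.11) to obtain
\[
\begin{aligned}
  |E| &\le \sum_{i,j=1}^d \Bigl( \varepsilon \|D_\ell^h g_j\|_2^2 + \frac{1}{4\varepsilon} \|(D_\ell^h a_{i,j}) g_i\|_2^2 \Bigr) \\
  &\le \varepsilon \theta^{-1} d M + \frac{\kappa d}{4\varepsilon} \sum_{i=1}^d \|g_i\|_2^2,
\end{aligned} \tag{8.27}
\]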
where we also used (8.26) and bounded the supremum norm of Dℓh ai,j
on Supp gj by some constant κ > 0.
Bounding the right-hand side. Since our right-hand side has the same
shape as the right-hand side of (8.13) the argument on p. 277 applies to give
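\[
  |R| \le \varepsilon \|D_\ell^h g_\ell\|_2^2 + \frac{1}{4\varepsilon} \|\tilde{u}\|_2^2 \le \varepsilon \theta^{-1} M + \frac{1}{4\varepsilon} \|\tilde{u}\|_2^2. \tag{8.28}
\]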
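Putting the estimates together. Using L = M + E = R together with the estimates in (8.26)–(8.28), we obtain
\[
|M| = |R - E| \leq |R| + |E| \leq \varepsilon\theta^{-1}(d + 1)M + \tfrac{1}{\varepsilon} C,
\]
where C > 0 depends only on $(a_{i,j})_{i,j}$, $(g_i)_i$ and $\tilde{u}$ but not on h. We choose ε to be $\frac{\theta}{2(d+1)}$, so that $\varepsilon\theta^{-1}(d+1) = \frac12$ and the term $\frac12 M$ can be absorbed into the left-hand side. This allows us to obtain the estimate $|M| \leq \frac{2}{\varepsilon} C$ and hence
\[
\sum_{i=1}^{d} \|D_\ell^h g_i\|_2^2 \leq \theta^{-1}|M| \leq \frac{2}{\theta\varepsilon}\, C \tag{8.29}
\]
for all sufficiently small h ∈ R ∖ {0} and ℓ ∈ {1, . . . , d − 1}.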

Existence of second weak partial derivatives. The uniform estimate in (8.29) and Corollary 8.43 applied to the subset
\[
W_s = (-1 + s, 1 - s)^{d-1} \times (0, 1) \subseteq U
\]
for some s > 0 implies that $\boldsymbol{\partial}_\ell(g_i|_{W_s})$ exists for any i ∈ {1, . . . , d} and for any ℓ ∈ {1, . . . , d − 1}. We choose s > 0 such that
\[
\operatorname{Supp} g \subseteq (-1 + 2s, 1 - 2s)^{d-1} \times [0, 1).
\]
Extending $\boldsymbol{\partial}_\ell(g_i|_{W_s})$ trivially we obtain the existence of $\boldsymbol{\partial}_\ell g_i = \boldsymbol{\partial}_\ell \boldsymbol{\partial}_i g$ on U for i ∈ {1, . . . , d} and ℓ ∈ {1, . . . , d − 1} (check this).
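To show the existence of the partial derivative $\boldsymbol{\partial}_d \boldsymbol{\partial}_d g$ we use the argument above and the operator P more directly. In fact, we note that $a_{d,d} \geq \theta$ on U by the uniform ellipticity assumption on P. Let $\varphi \in C_c^\infty(U)$. Using $\frac{1}{a_{d,d}}\varphi$ in (8.24) we obtain
\[
\int a_{d,d}\, \boldsymbol{\partial}_d g\; \partial_d\Bigl(\tfrac{1}{a_{d,d}}\varphi\Bigr) dx
= -\sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \int a_{i,\ell}\, \boldsymbol{\partial}_i g\; \partial_\ell\Bigl(\tfrac{1}{a_{d,d}}\varphi\Bigr) dx
- \int \tilde{u}\, \tfrac{1}{a_{d,d}}\, \varphi \,dx
\]
and hence also
\[
\int \boldsymbol{\partial}_d g\, \partial_d \varphi \,dx
= -\int a_{d,d}\, \boldsymbol{\partial}_d g\; \partial_d\Bigl(\tfrac{1}{a_{d,d}}\Bigr) \varphi \,dx
+ \sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \int \boldsymbol{\partial}_\ell\bigl(a_{i,\ell}\, \boldsymbol{\partial}_i g\bigr) \tfrac{1}{a_{d,d}}\, \varphi \,dx
- \int \tilde{u}\, \tfrac{1}{a_{d,d}}\, \varphi \,dx,
\]
which shows the existence of
\[
\boldsymbol{\partial}_d \boldsymbol{\partial}_d g
= a_{d,d}\, \boldsymbol{\partial}_d g\; \partial_d\Bigl(\tfrac{1}{a_{d,d}}\Bigr)
- \sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \boldsymbol{\partial}_\ell\bigl(a_{i,\ell}\, \boldsymbol{\partial}_i g\bigr) \tfrac{1}{a_{d,d}}
+ \tilde{u}\, \tfrac{1}{a_{d,d}} \in L^2(U). \tag{8.30}
\]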
Since Supp g ⊆ U ∩ V by assumption, we may extend g and its partial derivatives trivially from $(-1, 1)^{d-1} \times (0, 1)$ to the half-space $\mathbb{R}^{d-1} \times (0, \infty)$ and again obtain a function g together with all its degree one and two partial derivatives.
Extending Corollary 8.47. From the existence of all second-order partial
derivatives of g on U we would like to conclude that g ∈ H 2 (U ). For this we
again combine Proposition 8.45 and Lemma 8.46, much as in the argument
in the proof of Corollary 8.47.
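By continuity of the regular representation in Lemma 3.74 we may find, for every ε > 0, some s > 0 so that the function $g_s$ defined by
\[
g_s(x) = g(x + se_d)
\]
for $x \in \mathbb{R}^{d-1} \times (-s, \infty)$ and extended trivially outside that set satisfies
\[
\|\boldsymbol{\partial}_\alpha g - \boldsymbol{\partial}_\alpha g_s\|_2 < \varepsilon
\]
for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$. By Proposition 8.45 and Lemma 8.46 the function $g^\delta = g_s * \jmath_\delta$ is smooth, and satisfies $\|g^\delta - g\|_2 < 2\varepsilon$ for δ ∈ (0, s) sufficiently small. Also by Proposition 8.45 and our shift of the functions the derivatives $\partial_\alpha g^\delta$ of $g^\delta$ for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$ can be expressed on U by convolution of $\boldsymbol{\partial}_\alpha g_s$ with $\jmath_\delta$ and so $\|\partial_\alpha g^\delta - \boldsymbol{\partial}_\alpha g\|_{L^2(U)} < 2\varepsilon$ for sufficiently small δ ∈ (0, s) and $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$. Therefore $g \in H^2(U)$, which concludes the case k = 0.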
Induction on k > 0. The argument above gives the base of the induction
on k. Suppose we already know (8.22) for k − 1 ≥ 0 and assume again that $u \in H^k(U)$. By the inductive hypothesis we already know $g \in H^{k+1}(U)$. We again fix some ℓ ∈ {1, . . . , d − 1} and claim that $\boldsymbol{\partial}_\ell g \in H_0^1(U)$ and that there exists some $u_\ell \in H^{k-1}(U)$ with
\[
\langle \boldsymbol{\partial}_\ell g, P\varphi \rangle = \langle u_\ell, \varphi \rangle \tag{8.31}
\]
for all $\varphi \in C_c^\infty(U)$. The inductive hypothesis then implies $\boldsymbol{\partial}_\ell g \in H^{k+1}(U)$. By varying ℓ ∈ {1, . . . , d − 1} this shows the existence of all partial derivatives $\boldsymbol{\partial}_\alpha g$ for $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k + 2$ except for $\alpha = (k + 2)e_d$. To obtain this partial derivative we take the kth partial derivative of $\boldsymbol{\partial}_d^2 g$ as in (8.30) with respect to the dth coordinate and obtain the existence of $\boldsymbol{\partial}_{(k+2)e_d}\, g$ by using the other partial derivatives of degree at most (k + 2). In fact, if we have $u \in H^k(U)$ and $g \in H^{k+1}(U)$ then $\tilde{u} \in H^k(U)$ by (8.25). Using $\boldsymbol{\partial}_\ell g \in H^{k+1}(U)$ for each ℓ ∈ {1, . . . , d − 1} and (8.30) we also have $\boldsymbol{\partial}_d^2 g \in H^k(U)$. As in the base case, this implies that $g \in H^{k+2}(U)$.
For the proof of the first part of the claim we note that $f \in H^1(U)$ and Supp(f) ⊆ U ∩ V implies
\[
D_\ell^h f(x) = \frac{1}{h} \int_0^h \boldsymbol{\partial}_\ell f(x + se_\ell)\,ds \tag{8.32}
\]
for sufficiently small h ∈ R ∖ {0} and almost every x ∈ U. Indeed, this holds for any $f \in H^1(U) \cap C^\infty(U)$ and $x, x + he_\ell \in U$, extends by continuity to any $f \in H^1(U)$ and almost every x ∈ U with $x + he_\ell \in U$, and then holds for f with Supp f ⊆ U ∩ V for almost every x ∈ U and sufficiently small h ≠ 0. By the continuity of the regular representation in Lemma 3.74 the identity (8.32) implies that
\[
\boldsymbol{\partial}_\ell f = \lim_{h \to 0} D_\ell^h f.
\]
Using this for $g \in H^2(U)$ and its partial derivatives $\boldsymbol{\partial}_i g$ for i = 1, . . . , d we find for a given ε > 0 some h > 0 such that
\[
\|\boldsymbol{\partial}_\ell g - D_\ell^h g\|_2 < \varepsilon
\]
and
\[
\|\boldsymbol{\partial}_\ell \boldsymbol{\partial}_i g - D_\ell^h \boldsymbol{\partial}_i g\|_2 < \varepsilon
\]
for i = 1, . . . , d. Since $g \in H_0^1(U)$, Supp(g) ⊆ U ∩ V and ℓ ∈ {1, . . . , d − 1} we have $D_\ell^h g \in H_0^1(U)$ for all sufficiently small h, and since ε > 0 was arbitrary this implies $\boldsymbol{\partial}_\ell g \in H_0^1(U)$, as claimed.
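For the second part (8.31) of the claim we let $\varphi \in C_c^\infty(U)$ and calculate
\begin{align*}
\langle \boldsymbol{\partial}_\ell g, P\varphi \rangle
&= -\langle g, \partial_\ell(P\varphi) \rangle \\
&= -\Bigl\langle g,\ \partial_\ell\Bigl( \sum_{i,j=1}^{d} a_{i,j}\, \partial_i \partial_j \varphi + \sum_{i=1}^{d} b_i\, \partial_i \varphi + c\varphi \Bigr) \Bigr\rangle \\
&= -\Bigl\langle g,\ P\partial_\ell \varphi + \sum_{i,j=1}^{d} (\partial_\ell a_{i,j})\, \partial_i \partial_j \varphi + \sum_{i=1}^{d} (\partial_\ell b_i)\, \partial_i \varphi + (\partial_\ell c)\, \varphi \Bigr\rangle \\
&= -\langle u, \partial_\ell \varphi \rangle - \sum_{i,j=1}^{d} \langle g\, \partial_\ell a_{i,j},\ \partial_i \partial_j \varphi \rangle - \sum_{i=1}^{d} \langle g\, \partial_\ell b_i,\ \partial_i \varphi \rangle - \langle g\, \partial_\ell c,\ \varphi \rangle \\
&= \Bigl\langle \boldsymbol{\partial}_\ell u - \sum_{i,j=1}^{d} \boldsymbol{\partial}_i \boldsymbol{\partial}_j (g\, \partial_\ell a_{i,j}) + \sum_{i=1}^{d} \boldsymbol{\partial}_i (g\, \partial_\ell b_i) - g\, \partial_\ell c,\ \varphi \Bigr\rangle \\
&= \langle u_\ell, \varphi \rangle,
\end{align*}
where
\[
u_\ell = \underbrace{\boldsymbol{\partial}_\ell u}_{\in H^{k-1}(U)}
- \underbrace{\sum_{i,j=1}^{d} \boldsymbol{\partial}_i \boldsymbol{\partial}_j (g\, \partial_\ell a_{i,j})}_{\in H^{k-1}(U)}
+ \underbrace{\sum_{i=1}^{d} \boldsymbol{\partial}_i (g\, \partial_\ell b_i)}_{\in H^{k}(U)}
- \underbrace{g\, \partial_\ell c}_{\in H^{k+1}(U)}
\]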
belongs to H k−1 (U ), as claimed. This concludes the induction and hence the
proof of Theorem 8.49.
Exercise 8.53. Complete the proof of Theorem 8.52 following the steps below.
(a) State and give a detailed proof of the extension of Corollary 8.47 that was used in the
above proof.
(b) Generalize Lemma 5.50 to allow the uniformly elliptic operator P instead of just ∆.
(c) Use the assumption that U has smooth boundary and a smooth partition of unity to
localize the situation. Apply the above proof on each of the local statements.
8.3 Topologies on the space of bounded operators
Let X and Y be Banach spaces. Then we have seen that the space B(X, Y )
of bounded linear operators from X to Y together with the operator norm is
again a Banach space.
Definition 8.54. Let X and Y be Banach spaces. The topology on B(X, Y )

induced by the operator norm is called the uniform operator topology.
Since any Banach space has a weak topology, there is of course also a
weak topology on B(X, Y ). There are, however, further topologies that make
special use of the fact that B(X, Y ) is a space of maps.
Definition 8.55. Let X and Y be Banach spaces. The strong operator topology on B(X, Y) is the weakest topology for which the evaluation maps
\[
B(X, Y) \ni L \longmapsto Lx \in Y
\]
are continuous for every x ∈ X, where we use the norm topology on Y.
In other words, a neighbourhood of $L_0 \in B(X, Y)$ in the strong operator topology is a set containing a set of the form
\[
N_{x_1,\dots,x_n;\varepsilon}(L_0) = \bigcap_{i=1}^{n} \bigl\{ L \in B(X, Y) \mid \|Lx_i - L_0 x_i\| < \varepsilon \bigr\}
\]
for some $x_1, \dots, x_n \in X$ and ε > 0. Equivalently, we could define the strong operator topology by using all neighbourhoods defined by the semi-norms
\[
\|L\|_{x_1,\dots,x_n} = \max\{\|Lx_1\|, \dots, \|Lx_n\|\}.
\]
The strong operator topology is in many situations more natural than the uniform topology, and the study of unitary representations (see Definition 3.73 for the general definition) is an example.
Example 8.56. Let $H = L^2(\mathbb{R})$ and define for x ∈ R the unitary map
\[
\rho_x : H \to H
\]
by $\rho_x f(t) = f(t - x)$ for t ∈ R and f ∈ H. We claim that
\[
\|\rho_x - \rho_y\| = 2(1 - \delta_{xy}) =
\begin{cases}
2 & \text{if } x \neq y; \\
0 & \text{if } x = y.
\end{cases}
\]
In fact, if x < y and M > 0 then we can define a function f ∈ H (illustrated in Figure 8.1) by
\[
f(t) =
\begin{cases}
0 & \text{for } t < 0; \\
(-1)^m & \text{for } t \in [m(y - x), (m + 1)(y - x)) \text{ with } 0 \leq m < M; \\
0 & \text{for } t \geq M(y - x).
\end{cases}
\]
Fig. 8.1: The function f in Example 8.56.
Then $\rho_{y-x} f$ satisfies
\[
|(f - \rho_{y-x} f)(t)| = |2f(t)| = 2
\]
for almost every t ∈ (y − x, M(y − x)), so that
\[
\|f - \rho_{y-x} f\|_2 \geq 2\sqrt{(M - 1)(y - x)}
\]
while $\|f\|_2 = \sqrt{M(y - x)}$. As M ≥ 1 was arbitrary, this shows that
\[
\|\rho_x - \rho_y\| = \|I - \rho_{y-x}\| = 2
\]
since $\|\rho_x\| = \|\rho_{-x}\| = 1$.

The claim implies that the map $\mathbb{R} \ni x \mapsto \rho_x \in B(H, H)$ is not continuous with respect to the uniform operator topology. However, it is continuous with respect to the strong operator topology, for if $f_1, \dots, f_n \in H$ and ε > 0 are given, then for y sufficiently close to x we have
\[
\|\rho_y f_i - \rho_x f_i\|_2 < \varepsilon
\]
for i = 1, . . . , n by the continuity property from Lemma 3.74. Thus, for y sufficiently close to x we have $\rho_y \in N_{f_1,\dots,f_n;\varepsilon}(\rho_x)$, as required.
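The norm computation in Example 8.56 can be reproduced exactly, because f is constant on blocks of length y − x and the translation by y − x moves every block one step to the right. The following is a minimal sketch under that observation; the function name `shift_ratio` and the list-based encoding of f are illustrative assumptions.

```python
import math

def shift_ratio(M):
    """Return ||f - rho_{y-x} f||_2 / ||f||_2 for the step function with M blocks.

    f takes the value (-1)^m on the m-th block of length y - x, so both norms
    equal sqrt(y - x) times a Euclidean norm of block values, and the block
    length cancels in the ratio.
    """
    f = [(-1) ** m for m in range(M)]
    padded = f + [0]       # one extra block so that the shifted copy fits
    shifted = [0] + f      # rho_{y-x} f moves every block one step to the right
    diff_sq = sum((u - v) ** 2 for u, v in zip(padded, shifted))
    return math.sqrt(diff_sq / M)

for M in (1, 10, 1000):
    # second column: the lower bound 2 * sqrt((M - 1) / M) used in the text
    print(M, shift_ratio(M), 2.0 * math.sqrt((M - 1) / M))
```

The ratio stays below 2 for every M but approaches 2 as M grows, independently of how small y − x is, which is exactly why the map x ↦ ρ_x fails to be continuous in the uniform operator topology.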
Exercise 8.57. Let X and Y be Banach spaces. Show that the strong operator topology on B(X, Y) has the following properties:
(1) it is Hausdorff;
(2) it is weaker than the uniform operator topology (defined by the operator norm);
(3) a sequence $(T_n)$ in B(X, Y) converges to $T_0 \in B(X, Y)$ as n → ∞ in the strong operator topology if and only if $T_n(v) \to T_0(v)$ as n → ∞ for all v ∈ X; and
(4) a filter $\mathcal{F}$ on B(X, Y) converges to $T_0 \in B(X, Y)$ if the filter generated by
\[
\bigl\{ \{Tv \mid T \in F\} \mid F \in \mathcal{F} \bigr\}
\]
converges to $T_0 v$ for all v ∈ X.
Another topology on B(X, Y ) is built up using functionals on Y .
Definition 8.58. Let X and Y be Banach spaces. The weak operator topology on B(X, Y) is the weakest topology with respect to which the maps
\[
B(X, Y) \ni L \longmapsto y^*(Lx)
\]
are continuous for all x ∈ X and $y^* \in Y^*$.
Equivalently, the weak operator topology can be defined using the neighbourhoods defined by the semi-norms
\[
\|L\|_{x_1,y_1^*;\,x_2,y_2^*;\,\dots;\,x_n,y_n^*} = \max\{|y_1^*(Lx_1)|, \dots, |y_n^*(Lx_n)|\}.
\]
Exercise 8.59. Assume that X and Y are infinite-dimensional Banach spaces. Show that
the uniform topology, the weak topology, the strong operator topology, and the weak
operator topology are all different Hausdorff topologies on B(X, Y ).
8.4 Locally Convex Vector Spaces
Even if we were initially only interested in Banach spaces, the last few sections
should have left no doubt that the next definition is natural and unavoidable.
It gives a class of topological vector spaces generalizing normed vector spaces.
Definition 8.60. Let X be a vector space (over R or C) and suppose that
\[
\{\|\cdot\|_\alpha \mid \alpha \in A\}
\]
is a family of semi-norms on X with the property that for every x ∈ X ∖ {0} there is some α ∈ A with $\|x\|_\alpha > 0$. Then the locally convex topology on X induced by the semi-norms is the topology for which a neighbourhood of the point $x_0 \in X$ is a set containing a set of the form
\[
N_{\alpha_1,\dots,\alpha_n;\varepsilon}(x_0) = \bigcap_{i=1}^{n} B_\varepsilon^{\|\cdot\|_{\alpha_i}}(x_0) = \Bigl\{ x \in X \mid \max_{i=1,\dots,n} \|x - x_0\|_{\alpha_i} < \varepsilon \Bigr\}.
\]
The vector space X together with this topology is called a locally convex vector space.
Equivalently, a locally convex topology is the weakest topology that is stronger than those defined by a collection of semi-norms. Enlarging the collection of semi-norms if necessary, we may assume that for $\alpha_1, \dots, \alpha_n \in A$ the semi-norm
\[
\|x\|' = \max_{i=1,\dots,n} \|x\|_{\alpha_i}
\]
also belongs to the collection (that is, coincides with $\|\cdot\|_\alpha$ for some α ∈ A). If this is the case, then the neighbourhoods of $x_0 \in X$ are sets containing a ball of the form
\[
B_\varepsilon^{\|\cdot\|_\alpha}(x_0) = \{ x \in X \mid \|x - x_0\|_\alpha < \varepsilon \}
\]
for some α ∈ A and ε > 0.

An equivalent definition of locally convex vector spaces is obtained by requiring that the topology on the vector space X is Hausdorff, makes addition and scalar multiplication continuous, and has a basis of neighbourhoods of the point 0 ∈ X consisting of absorbent balanced convex sets. Here a convex set C ⊆ X is balanced if for any x ∈ C and scalar ρ with |ρ| ≤ 1 we also have ρx ∈ C, and is absorbent if for any x ∈ X there exists some α > 0 with αx ∈ C. We refer to Exercise 8.72 and Conway [19, Sec. IV.1] for the equivalence, as we will not need it in this form.
Essential Exercise 8.61. Show that a locally convex vector space (as in
Definition 8.60) has the property that addition and scalar multiplication are
continuous, and that 0 ∈ X has a basis consisting of absorbent balanced
convex sets.
As the next exercise shows, even if a locally convex vector space topology
cannot be described using a norm, the locally convex structure is enough to
obtain results similar to those obtained as corollaries of the Hahn–Banach
theorem (Theorem 7.3).
Exercise 8.62. Let X be a locally convex vector space. Show that the space $X^*$ of continuous linear functionals on X separates points.
We have seen many examples of locally convex vector spaces. These include normed vector spaces with their norm or weak topology, duals of Banach spaces with the weak* topology, and the space B(X, Y) of operators between two Banach spaces with any of the topologies discussed in Section 8.3. However, there are further spaces that we have neglected so far because they do not fit well (or at all) into the framework of normed spaces.
Example 8.63. (1) The space $C^\infty([0, 1])$ is a locally convex vector space with the semi-norms
\[
\|f\|_{C^n([0,1])} = \max_{j=0,\dots,n} \|f^{(j)}\|_\infty
\]
for n ∈ N. Notice that even though each of these semi-norms is already a norm, we still have to use all of them to define the locally convex topology on $C^\infty([0, 1])$ we are interested in, namely the topology of uniform convergence of all derivatives. Notice that differentiation
\[
D : C^\infty([0, 1]) \ni f \longmapsto f' \in C^\infty([0, 1])
\]
is a continuous operator on $C^\infty([0, 1])$.
(2) Let $U \subseteq \mathbb{R}^d$ be an open set. Then
\[
C_b^\infty(U) = \bigcap_{k \geq 0} C_b^k(U),
\]
with $C_b^k(U)$ defined as in Example 2.24(6), is another example of a locally convex vector space if we use all of the norms $\|\cdot\|_{C_b^k(U)}$ for k ≥ 1.
(3) Let $U \subseteq \mathbb{R}^d$ be an open set. Another important notion of convergence in analysis for functions on U is the notion of uniform convergence on compact subsets. For example, on the space C(U) this notion is captured if we use the collection of semi-norms $\{\|\cdot\|_{K,\infty} \mid K \subseteq U \text{ compact}\}$, where
\[
\|f\|_{K,\infty} = \sup_{x \in K} |f(x)|
\]
for f ∈ C(U) is the supremum norm of the restriction to K.
(4) Let $U \subseteq \mathbb{R}^d$ be an open set. We can also make $C_c(U)$ into a locally convex space in a natural way by endowing it with the collection of semi-norms
\[
\{\|\cdot\|_F \mid F \in C(U)\},
\]
where $\|f\|_F = \|fF\|_\infty$ for $f \in C_c(U)$ is the supremum norm taken after multiplication by F ∈ C(U). The corresponding notion of convergence is less familiar but is natural for elements of $C_c(U)$ (see Exercise 8.64 below). The convergence is uniform across U, and this remains true after multiplication with any continuous function on U, however rapidly it might increase towards ∂U.
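To illustrate the remark in Example 8.63(1) that all of the semi-norms are needed, the following minimal sketch looks at the sequence f_n(x) = sin(nx)/n on [0, 1], which tends to 0 in the supremum norm while its derivatives do not. The choice of sequence, the grid-based approximation of the supremum, and the helper names are assumptions made for illustration only.

```python
import math

def sup_norm(f, samples=10001):
    """Approximate the supremum of |f| on [0, 1] by sampling a uniform grid."""
    return max(abs(f(i / (samples - 1))) for i in range(samples))

def f_n(n):
    """The function x -> sin(n x) / n."""
    return lambda x: math.sin(n * x) / n

def f_n_prime(n):
    """Its derivative x -> cos(n x)."""
    return lambda x: math.cos(n * x)

for n in (1, 10, 100):
    # columns: n, sampled sup norm of f_n, sampled sup norm of f_n'
    print(n, sup_norm(f_n(n)), sup_norm(f_n_prime(n)))
```

In this sketch the first column of norms is bounded by 1/n, whereas the second stays equal to 1 because cos(0) = 1. So (f_n) converges to 0 with respect to the semi-norm with n = 0 but not with respect to the one with n = 1, and the two semi-norms define different notions of convergence.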
Exercise 8.64. We use the notation from Example 8.63(4) in this exercise.
(a) Show that f ∈ C(U) belongs to $C_c(U)$ if and only if $\|f\|_F < \infty$ for all F ∈ C(U).
(b) Suppose that $(f_n)$ is a sequence of functions in $C_c(U)$ that converges in $C_c(U)$ to some $f \in C_c(U)$. Show that there exists a compact set K ⊆ U such that
\[
\operatorname{Supp}(f_n), \operatorname{Supp}(f) \subseteq K
\]
for all n ≥ 1.

In general, the topology of a locally convex vector space is not metrizable.

One important situation in which it is metrizable is when it is sufficient to
use countably many semi-norms. This is the case in Example 8.63(1), (2)
and (3) (for the latter recall that an open subset of Rd is σ-compact), but
is not for Example 8.63(4). If the locally convex topology on X is given by
the semi-norms k · kn for n ∈ N, then we can define a metric on X as in
Lemma A.17, leading to the following definition.
Definition 8.65. A Fréchet space is a locally convex vector space X whose topology is defined by countably many semi-norms $\|\cdot\|_n$ for n ∈ N, such that X is complete with respect to the metric
\[
d(x, y) = \sum_{n=1}^{\infty} \frac{1}{2^n}\, \frac{\|x - y\|_n}{1 + \|x - y\|_n}. \tag{8.33}
\]
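The metric in (8.33) is easy to evaluate when only finitely many semi-norms are non-zero on the vectors in question. The following is a minimal sketch for the coordinate semi-norms p_n(x) = |x_n| on finite lists of real numbers; restricting to finitely many semi-norms, and the names `frechet_metric` and `coordinate_seminorms`, are assumptions made for illustration.

```python
def frechet_metric(x, y, seminorms):
    """Sum of 2^{-n} p_n(x - y) / (1 + p_n(x - y)) over the given semi-norms p_1, p_2, ..."""
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for n, p in enumerate(seminorms, start=1):
        value = p(diff)
        total += 2.0 ** (-n) * value / (1.0 + value)
    return total

def coordinate_seminorms(N):
    """The semi-norms p_n(v) = |v_n| for n = 1, ..., N (stored 0-based)."""
    return [lambda v, k=k: abs(v[k]) for k in range(N)]

seminorms = coordinate_seminorms(3)
zero = [0.0, 0.0, 0.0]
x = [1.0, 0.0, 0.0]
print(frechet_metric(x, zero, seminorms))
print(frechet_metric([2.0 * t for t in x], zero, seminorms))
```

Each summand is bounded by 2^{-n}, so d never exceeds 1, and doubling x does not double its distance to 0. The metric therefore induces the locally convex topology without being induced by a norm.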
Exercise 8.66. Suppose that the topology of a locally convex vector space X is induced by countably many semi-norms $\|\cdot\|_n$ for n ∈ N.
(a) Show that a sequence $(x_n)$ in X is a Cauchy sequence with respect to the metric in (8.33) if and only if $(x_n)$ is a Cauchy sequence with respect to all of the semi-norms $\|\cdot\|_n$ for n ∈ N.
(b) Show that if two families of semi-norms $\{\|\cdot\|_n\}$ and $\{\|\cdot\|'_n\}$ make X into a locally convex vector space with the same topology, then X is complete with respect to d if and only if X is complete with respect to d′, where d′ is defined using $\{\|\cdot\|'_n\}$ (just as in (8.33)).
(c) Show that the spaces from Example 8.63(1), (2) and (3) are Fréchet spaces.
The following exercise indicates why we restricted attention to the study of locally convex vector spaces instead of considering the larger class of topological vector spaces.
Exercise 8.67 (A topological vector space with trivial dual). Let MF([0, 1]) denote
the space of all (equivalence classes of) complex-valued measurable functions on [0, 1],
where functions are equivalent if they agree almost everywhere with respect to Lebesgue
measure m on [0, 1]. Given $f_0 \in \mathrm{MF}([0, 1])$ and ε > 0 we define the ε-neighbourhood of $f_0$ by
\[
U_\varepsilon(f_0) = \bigl\{ f \in \mathrm{MF}([0, 1]) \mid m\bigl(\{x \in [0, 1] \mid |f(x) - f_0(x)| > \varepsilon\}\bigr) < \varepsilon \bigr\}.
\]
(a) Show that the above system of neighbourhoods defines a basis of neighbourhoods with
respect to a topology on MF([0, 1]). We note that the corresponding notion of convergence
is called convergence in measure.
(b) Show that the vector space operations in MF([0, 1]) are continuous, making MF([0, 1])
into a so-called topological vector space.
(c) Show that the dual space
\[
\mathrm{MF}([0, 1])^* = \{\ell : \mathrm{MF}([0, 1]) \to \mathbb{C} \mid \ell \text{ is linear and continuous}\}
\]
is given by $\mathrm{MF}([0, 1])^* = \{0\}$.

(d) Show that if a sequence (fn ) in MF([0, 1]) converges almost everywhere to f in MF([0, 1]),
then it also converges in measure to f . Give an example of a sequence that converges in
measure but not almost everywhere.
(e) Show that a sequence (fn ) in MF([0, 1]) converges in measure to f in MF([0, 1]) if every
subsequence of (fn ) has a subsequence that converges almost everywhere to f .
8.5 Distributions as Generalized Functions
Both in applications and within mathematics it is often useful to have a generalized notion of function to allow, for example, a function F on R with the property that
\[
\int_{\mathbb{R}} \varphi(x) F(x)\,dx = \varphi(0) \tag{8.34}
\]
for any ‘nice’ function φ : R → R. Such an F might represent a point mass (a dimensionless object of mass 1 located at 0), or be a mathematical representation of an impulse in physics. Since F is certainly not a function, one needs to develop a new theory that includes such objects.(27) The theory
of distributions allows for such generalized functions, and permits them to
be differentiated, multiplied by smooth functions, and so on. Of course if
we were only interested in expressions of the form in (8.34) then we could
simply study measures, since (8.34) is simply the integral against the Dirac
measure δ0 at the origin. However, within the space of measures it does not
normally make sense to take derivatives (and this is the case for δ0 ), while it
is possible to define a derivative map in the space of distributions.
The most direct approach to distributions superficially seems to be a cheat:
We declare a distribution to be a linear continuous functional (that is, a linear
continuous map to the base field R or C) on a space of nice test functions {φ}.
Here the definition of ‘nice’ may vary, to give different classes of distributions.
For example, we could fix an open subset $U \subseteq \mathbb{R}^d$ and take all $\varphi \in C_c^\infty(U)$ as test functions.
Requiring continuity of the linear functional is natural but needs a topology on $C_c^\infty(U)$. We declare the topology on $C_c^\infty(U)$ by introducing the following systems of semi-norms, which make $C_c^\infty(U)$ into a locally convex vector space. In fact, for every $\alpha \in \mathbb{N}_0^d$ and F ∈ C(U) we define the semi-norm
\[
\|f\|_{\alpha,F} = \|(\partial_\alpha f) F\|_\infty
\]
for $f \in C_c^\infty(U)$. Using F = 1 and α = 0 shows that these include $\|\cdot\|_\infty$, so that the topology is indeed Hausdorff. We define the space D(U) of distributions on U to be the space of continuous linear functionals on the locally convex vector space $C_c^\infty(U)$.
This definition of a distribution is a cheat because we have finessed the
problem that no function F satisfies (8.34) by simply declaring F to be
the distribution (that is, continuous linear functional) which sends the test
function φ to φ(0) without giving a more direct generalization of functions
on R. We may write this formally as
\[
\langle F, \varphi \rangle = \varphi(0),
\]
where we write $\langle F, \varphi \rangle$ for the action of the functional F on the test function φ. One sometimes also writes $\int_{\mathbb{R}} F\varphi$ for $\langle F, \varphi \rangle$, especially if we continue to think of F as a generalized function, but whenever one wants to prove something about F one has to go back to the formal definition of F as a functional on $C_c^\infty(U)$. Even though this may look dubious at first sight, the intuition
provided by the viewpoint that F is a generalized function is often useful,
and will stay consistent with the formal treatment of F as a linear functional.
Our discussion of the Dirichlet boundary value problem (in Section 5.2–5.3)
and the eigenfunctions of the Laplace operator (in Section 6.4) have already
made use of the viewpoint provided by distribution theory. However, we will
not develop the theory here, referring to the monograph of Schwartz [94, 95]
for a thorough treatment.
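The viewpoint that a distribution is nothing more than a functional on test functions can be made concrete. The following minimal sketch takes U = R, models the Dirac distribution and the Heaviside function as Python callables acting on test functions, and defines the derivative of a distribution F by φ ↦ −⟨F, φ′⟩ (the first-order convention of Exercise 8.70(b) below). The bump function, the finite-difference derivative, the midpoint-rule integration, and all function names are assumptions made for illustration.

```python
import math

def bump(x, centre=0.3, radius=1.0):
    """Smooth test function supported in [centre - radius, centre + radius]."""
    s = (x - centre) / radius
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1.0 else 0.0

def derivative(phi, h=1e-5):
    """Central difference approximation of phi'."""
    return lambda x: (phi(x + h) - phi(x - h)) / (2.0 * h)

def dirac(phi):
    """The distribution phi -> phi(0)."""
    return phi(0.0)

def heaviside(phi, upper=2.0, steps=20000):
    """<H, phi> = integral of phi over (0, upper) by the midpoint rule.

    This assumes that phi vanishes beyond `upper`, as the bump above does.
    """
    width = upper / steps
    return width * sum(phi((k + 0.5) * width) for k in range(steps))

def distributional_derivative(F):
    """Return the functional phi -> -<F, phi'>."""
    return lambda phi: -F(derivative(phi))

print(dirac(bump))                                  # <delta, phi> = phi(0)
print(distributional_derivative(heaviside)(bump))   # <H', phi>, numerically close to phi(0)
print(distributional_derivative(dirac)(bump))       # <delta', phi> = -phi'(0)
```

Up to discretization error, the first two printed values agree. This reflects the fact that the distributional derivative of the Heaviside function is the Dirac distribution, even though the Heaviside function has no classical derivative at 0.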
Exercise 8.68. Show that any integrable function on an open subset $U \subseteq \mathbb{R}^d$ gives rise to a distribution. That is, any f in $L^1(U)$ defines a linear functional $F_f$ on the space $C_c^\infty(U)$ of smooth compactly supported functions via
\[
\langle F_f, \varphi \rangle = \int f(x)\varphi(x)\,dx.
\]
Prove that the resulting map $f \mapsto F_f$ is linear and injective. Actually it is sufficient to assume that $f \in L^1_{\mathrm{loc}}(U)$, the space of locally integrable functions, that is, measurable functions that are integrable on any compact set.
Exercise 8.69. Show that no measurable and locally integrable function f : R → R has
the property (8.34) for all φ ∈ Cc∞ (R).
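Exercise 8.70. Let $U \subseteq \mathbb{R}^d$ be open and $\alpha \in \mathbb{N}_0^d$.
(a) Show that the linear map
\[
\partial_\alpha : C_c^\infty(U) \longrightarrow C_c^\infty(U), \qquad f \longmapsto \partial_\alpha f
\]
is continuous with respect to the locally convex topology on $C_c^\infty(U)$.
(b) Define $\boldsymbol{\partial}_\alpha = -\partial_\alpha^* : D(U) \to D(U)$ by $\boldsymbol{\partial}_\alpha F = -F \circ \partial_\alpha$ for all F ∈ D(U). Show that we have $\boldsymbol{\partial}_\alpha \psi = \partial_\alpha \psi$ for $\psi \in C^\infty(U)$ if we identify ψ with the distribution
\[
F_\psi : C_c^\infty(U) \ni f \longmapsto \int f\psi \,dm
\]
as in Exercise 8.68. In other words, the operator $\boldsymbol{\partial}_\alpha$ extends differentiation on $C^\infty(U)$ to D(U).
(c) Show that for $\psi \in C^\infty(U)$ and F ∈ D(U) we can define a new distribution
\[
\psi \cdot F : C_c^\infty(U) \ni \varphi \longmapsto \langle \psi \cdot F, \varphi \rangle = \langle F, \psi\varphi \rangle
\]
which depends linearly on $\psi \in C^\infty(U)$ and linearly on F ∈ D(U). Prove also the product rule
\[
\boldsymbol{\partial}_j(\psi \cdot F) = (\partial_j \psi) \cdot F + \psi \cdot (\boldsymbol{\partial}_j F).
\]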
8.6 Convex Sets
A set K ⊆ X in a vector space is called absorbent if for any x ∈ X there exists some α > 0 with αx ∈ K. We note that for a convex set K ⊆ X with 0 ∈ K and some given x ∈ X the set $\{t > 0 \mid \frac{1}{t}x \in K\}$ is either empty, an open interval $(p_K(x), \infty)$, or a closed interval $[p_K(x), \infty)$ for some $p_K(x) \geq 0$. Moreover, if $K = B_1^{\|\cdot\|}$ for some semi-norm $\|\cdot\|$ on X, then K is an absorbent convex set and $p_K(x) = \|x\|$ for any x ∈ X. A partial converse is given by the following result, which gives a solution to Exercise 7.2.
Lemma 8.71. Let K ⊆ X be an absorbent convex set in a vector space. Define the gauge function $p_K : X \to \mathbb{R}_{\geq 0}$ by
\[
p_K(x) = \inf\{t > 0 \mid \tfrac{1}{t}x \in K\}.
\]
Then $p_K(\alpha x) = \alpha p_K(x)$ and $p_K(x + y) \leq p_K(x) + p_K(y)$ for all α ≥ 0 and x, y ∈ X.
Proof. The positive homogeneity follows directly from the definition. Suppose now that x, y ∈ X and $t_x, t_y > 0$ have
\[
\frac{1}{t_x}x,\ \frac{1}{t_y}y \in K. \tag{8.35}
\]
Then
\[
\frac{1}{t_x + t_y}(x + y) = \frac{t_x}{t_x + t_y}\,\frac{1}{t_x}x + \frac{t_y}{t_x + t_y}\,\frac{1}{t_y}y
\]
also lies in K, since K is convex. Thus $p_K(x + y) \leq t_x + t_y$, and since this holds for all $t_x, t_y$ with (8.35), the triangle inequality follows.
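The gauge function of Lemma 8.71 can be approximated numerically whenever membership in K can be tested. The following minimal sketch uses bisection, relying on the fact noted above that {t > 0 | x/t ∈ K} is an interval unbounded to the right. The membership tests `ellipse` and `strip`, the bound `t_max`, and the function name `gauge` are assumptions made for illustration.

```python
def gauge(in_K, x, t_max=1e6, iterations=200):
    """Approximate p_K(x) = inf{t > 0 : x / t in K} by bisection.

    Assumes K is convex and absorbent with x / t_max in K, so that the set of
    admissible t is an interval reaching up to t_max.
    """
    if not in_K([c / t_max for c in x]):
        raise ValueError("x / t_max is not in K; increase t_max")
    lo, hi = 0.0, t_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid == 0.0:
            break
        if in_K([c / mid for c in x]):
            hi = mid
        else:
            lo = mid
    return hi

# Two hypothetical absorbent convex sets in the plane.
ellipse = lambda v: v[0] ** 2 / 4.0 + v[1] ** 2 <= 1.0            # symmetric
strip = lambda v: -1.0 <= v[0] <= 2.0 and -1.0 <= v[1] <= 1.0     # not symmetric

print(gauge(ellipse, [2.0, 0.0]), gauge(ellipse, [0.0, 3.0]))
print(gauge(strip, [2.0, 0.0]), gauge(strip, [-2.0, 0.0]))
```

For the ellipse the gauge is a norm, while for the non-symmetric set the values at x and −x differ. This is why Lemma 8.71 only asserts positive homogeneity and the triangle inequality.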
Exercise 8.72. Use Lemma 8.71 to prove the converse to Exercise 8.61. More precisely, let X be a vector space endowed with a Hausdorff topology. Assume that addition and scalar multiplication are continuous and that 0 ∈ X has a basis of neighbourhoods consisting of absorbent balanced convex sets. Show that X is a locally convex space in the sense of Definition 8.60.
In the following discussion concerning convex sets we will frequently restrict to real locally convex vector spaces. This is not a severe restriction as every complex locally convex vector space X can also be considered a real vector space, and every continuous linear functional $\ell_{\mathbb{R}}$ on X has the form $\ell_{\mathbb{R}} = \Re\,\ell_{\mathbb{C}}$ for a continuous linear functional $\ell_{\mathbb{C}}$ on X (see the proof of the complex case in Theorem 7.3).
The next result strengthens Corollary 7.4 and Exercise 8.62, and is readily
explained by Figure 8.2.
Theorem 8.73 (Separation from convex sets). Let X be a locally convex vector space. Let K ⊆ X be a closed convex set, and suppose that z ∈ X ∖ K. Then there exists a continuous linear functional $\ell \in X^*$ and a constant c ∈ R such that ℓ(y) ≤ c < ℓ(z) for all y ∈ K.
Fig. 8.2: Separation of z ∉ K from the convex set K by a closed hyperplane (labelled ℓ(x) = c in the figure).
Proof. Since z ∉ K and K is closed, X ∖ K is a neighbourhood of z, and in particular $N_{\alpha_1,\dots,\alpha_n;\varepsilon}(z) \subseteq X \setminus K$ for some $\alpha_1, \dots, \alpha_n \in A$ and ε > 0 (see Definition 8.60 for the notation). We define $U = N_{\alpha_1,\dots,\alpha_n;\varepsilon/2}(0)$, so that z + 2U ⊆ X ∖ K.
Without loss of generality we may assume that 0 ∈ K (for otherwise we can just translate both K and z by the negative of an element of K). Define
\[
M = K + U = \{y + u \mid y \in K,\ u \in U\}
\]
and notice that M is convex because both K and U are (check this) and that M is absorbent as it contains U.
We now apply Lemma 8.71 to obtain the norm-like function $p_M$. By definition, we have
\[
p_M(\cdot) \leq \tfrac{2}{\varepsilon} \max\{\|\cdot\|_{\alpha_1}, \dots, \|\cdot\|_{\alpha_n}\} \tag{8.36}
\]
since U ⊆ M.
We claim that $p_M(z) > 1$. For otherwise there exists a sequence $(\lambda_n)$ with $\lambda_n \to 1$ as n → ∞ and with
\[
\tfrac{1}{\lambda_n} z = k_n + u_n \in M = K + U
\]
for all n ≥ 1. Clearly $\frac{1}{\lambda_n} z$ and $u_n$ are bounded in the semi-norms
\[
\|\cdot\|_{\alpha_1}, \dots, \|\cdot\|_{\alpha_n},
\]
so the same holds also for $k_n$. Now rewrite the above equation as
\[
z = k_n + (\lambda_n - 1)k_n + \lambda_n u_n
\]
and notice that for large enough n we have
\[
(\lambda_n - 1)k_n + \lambda_n u_n \in \tfrac{1}{2}U + \tfrac{3}{2}U = 2U,
\]
since $\lambda_n \to 1$ as n → ∞. However, this contradicts z + 2U ⊆ X ∖ K. Therefore we must have $p_M(z) > 1$.
Now define $Z = \mathbb{R}z$ and a functional $\ell \in Z^*$ by $\ell(\alpha z) = \alpha p_M(z)$. Observe that $\ell(\alpha z) \leq p_M(\alpha z)$ for all α ∈ R (for α ≥ 0 we have equality and for α < 0 it follows from $\ell(\alpha z) < 0 \leq p_M(\alpha z)$) and ℓ(z) > 1. By the Hahn–Banach lemma (Lemma 7.1) ℓ extends to all of X with $\ell(x) \leq p_M(x)$ for all x ∈ X. This implies that ℓ(y) ≤ 1 for y ∈ K ⊆ M. Moreover, this estimate also gives continuity of ℓ since (8.36) implies that
\[
\ell(x) \leq \tfrac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \dots, \|x\|_{\alpha_n}\},
\]
which upgrades to
\[
|\ell(x)| \leq \tfrac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \dots, \|x\|_{\alpha_n}\}
\]
by linearity of ℓ and since the right-hand side is a semi-norm. This gives the theorem (for c = 1).
Since the weak topology is, for infinite-dimensional vector spaces, strictly
coarser than the norm topology, there is no reason why a set that is closed
in the norm topology should be closed in the weak topology. However, for
convex sets the situation is better.
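Corollary 8.74. Suppose that K is a convex set in a real Banach space X. Then the norm and weak closure of K agree. That is, $\overline{K}^{\mathrm{norm}} = \overline{K}^{\mathrm{weak}}$.
Proof. Suppose that $z \notin \overline{K}^{\mathrm{norm}}$ and apply Theorem 8.73 to find a continuous linear functional $\ell \in X^*$ with ℓ(y) ≤ c < ℓ(z) for all $y \in \overline{K}^{\mathrm{norm}}$ and some c ∈ R. Therefore, $\ell^{-1}((c, \infty)) \subseteq X \setminus K$ is a neighbourhood of z in the weak topology and $z \notin \overline{K}^{\mathrm{weak}}$ follows. Thus $\overline{K}^{\mathrm{weak}} \subseteq \overline{K}^{\mathrm{norm}}$. The reverse inclusion $\overline{K}^{\mathrm{norm}} \subseteq \overline{K}^{\mathrm{weak}}$ is clear as the norm topology is stronger than the weak topology.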
Exercise 8.75. Generalize Corollary 8.74 to complex Banach spaces.
Exercise 8.76. Suppose that K, L ⊆ X are disjoint convex sets in a locally convex vector space X over R. Suppose one of them has non-empty interior. Show that there exists a non-trivial continuous linear functional ℓ and a constant c ∈ R such that ℓ(x) ≤ c ≤ ℓ(y) for all x ∈ K and y ∈ L.
Exercise 8.77. Let X be a normed vector space over R, and let K ⊆ X be a non-empty closed and convex subset. Show that
\[
\inf_{x \in K} \|z - x\| = \sup_{\substack{\ell \in X^* \\ \|\ell\| = 1}} \Bigl( \ell(z) - \sup_{x \in K} \ell(x) \Bigr)
\]
for any z ∈ X ∖ K.
Exercise 8.78. Let X be a real Banach space and let $\imath : X \to X^{**}$ be the embedding of X into its bidual as in Corollary 7.9. Show that $\imath(B_1^X)$ is dense in $B_1^{X^{**}}$ when $X^{**}$ is equipped with the weak* topology.
8.6.1 Extreme Points and the Krein–Milman Theorem
An important concept for convex sets, both abstractly and for many concrete
applications (see, for example, Proposition 8.36), is the notion of extreme
points.
Definition 8.79. Let X be a locally convex space and let K ⊆ X be a convex
subset. An element x ∈ K is an extreme point of K if x cannot be expressed
as a proper convex combination of points of K (that is, if x = sy + (1 − s)z
with y, z ∈ K and s ∈ (0, 1) then we must have x = y = z).
As illustrated in Figure 8.3, the set of extreme points of a convex set will not be closed in general, even in a finite-dimensional setting. In infinite-dimensional spaces, the situation is more complex still and the extreme points may even be dense (see Exercise 8.84). The smallest closed convex subset of a locally convex space X that contains A ⊆ X is called the closed convex hull of A and is the intersection of all closed convex sets containing A, or equivalently the closure of the convex hull of A.
Fig. 8.3: The set of extreme points need not be closed: Here two cone-like objects
are glued together at their base so that a single straight line connects the two cone
points in the resulting convex set. The extreme points are the two ends together
with all but one point of the central circle.
Theorem 8.80 (Krein–Milman). Let X be a locally convex space over R,

and let K ⊆ X be a compact convex subset. Then K is the closed convex
hull of its extreme points. In particular, if K is non-empty then K has some
extreme points.
For the proof the following extension of the definition of extreme points
will be useful. A subset E ⊆ K of a convex set is called an extremal subset
of K if E is convex, non-empty, and if x = sy + (1 − s)z for x ∈ E, y, z ∈ K
and s ∈ (0, 1) forces y, z ∈ E. To better understand this notion, it may be
helpful to the reader to find all extremal subsets of a polygon in R2 or of a
polytope in R3 .
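In the simplest finite-dimensional situation, where K is the convex hull of finitely many points in the plane, the extreme points are the vertices of the resulting polygon and can be computed directly. The following minimal sketch uses the monotone chain convex hull algorithm for this purpose; the algorithm, the sample point set, and the function names are assumptions made for illustration and play no role in the proof below.

```python
def cross(o, a, b):
    """Signed area of the parallelogram spanned by a - o and b - o."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def extreme_points(points):
    """Extreme points of the convex hull of a finite set of points in the plane."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # drop points that are not strict corners (including collinear ones)
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    return lower[:-1] + upper[:-1]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0),
          (0.5, 0.5), (0.5, 0.0), (1.0, 0.5)]
print(extreme_points(square))
```

Interior points and midpoints of edges are discarded, since each of them is a proper convex combination of other points of the hull. The remaining corners generate the whole square as their convex hull, in line with the statement of Theorem 8.80.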
Proof of Theorem 8.80. We assume without loss of generality that K ≠ ∅. The proof uses Zorn’s lemma, applied to the set
\[
\mathcal{F} = \{E \subseteq K \mid E \text{ is an extremal closed subset of } K\}
\]
with the partial order defined by $E_1 \succcurlyeq E_2$ if $E_1 \subseteq E_2$ (so that smaller sets are larger in this order). Note in particular that $K \in \mathcal{F}$, so that $\mathcal{F}$ is non-empty. We need to show that for any linearly ordered subset $\{E_\alpha \mid \alpha \in I\}$ there exists an element $E \in \mathcal{F}$ with $E \succcurlyeq E_\alpha$ for every α ∈ I. We claim that
\[
E = \bigcap_{\alpha \in I} E_\alpha
\]
is such an element. For this we only need to show that $E \in \mathcal{F}$, as the fact that $E \succcurlyeq E_\alpha$ for every α ∈ I then follows directly from the definition of $\succcurlyeq$.
Since each Eα is closed and convex, the same holds for the intersection E.
Since each Eα is non-empty and {Eα | α ∈ I} is linearly ordered, we see
that every finite intersection Eα1 ∩ · · · ∩ Eαn is non-empty, because it must
coincide with one of the sets Eα1 , . . . , Eαn . Since K is compact, we see that
the intersection E is non-empty (see Appendix A.4). It remains to show
that E is an extremal subset. Suppose therefore that x = sy + (1 − s)z ∈ E
with y, z ∈ K and s ∈ (0, 1). Then x ∈ Eα for all α ∈ I as E ⊆ Eα . By
extremality of Eα this forces y, z ∈ Eα for all α ∈ I, and so y, z ∈ E as
required.
In summary, we have shown that we are in a position to use Zorn’s lemma, so that there must be a maximal element E of $\mathcal{F}$. In our setting, this is a minimal closed extremal subset of K. We claim that E = {x} is a singleton, which then implies that x must be an extreme point of K. Indeed, if E contains two distinct points $x_0, y_0$, then by Theorem 8.73 there exists a continuous linear functional ℓ on X with $\ell(x_0) < \ell(y_0)$. However, by compactness this implies that
\[
E' = \{ z \in E \mid \ell(z) = \sup \ell|_E \}
\]
is a non-empty proper closed convex subset of E. It is also an extremal subset of K, since if x = sy + (1 − s)z ∈ E′ with y, z ∈ K and s ∈ (0, 1), then we must have y, z ∈ E as E is extremal and so ℓ(x) = sℓ(y) + (1 − s)ℓ(z) and $\ell(y), \ell(z) \leq \ell(x) = \max \ell|_E$, which implies that y, z ∈ E′, as required, since s ∈ (0, 1). However, this is a contradiction since E ⊆ K was supposed to be a minimal closed extremal subset of K. Therefore, $E = \{x_0\}$ is a singleton and we have shown that the set of extreme points of K is non-empty.
Now let M denote the closed convex hull of the set of all extreme points
of K. Clearly M ⊆ K and we need to show that M = K.
Suppose that x0 lies in K∖M . By Theorem 8.73 there exists a continuous
linear functional ℓ with ℓ(y) ≤ c < ℓ(x0 ) for all y ∈ M . Now let
F = {x ∈ K | ℓ(x) = max ℓ|K },
and notice that F ⊆ K∖M is a closed convex subset of K. Therefore, F is
compact and by the above argument there exists an extreme point x ∈ F .
We claim that x is also an extreme point of K, which gives a contradiction
since then x ∈ M ⊆ K∖F , by definition of M .
So suppose that x = sy + (1 − s)z with y, z ∈ K and s ∈ (0, 1). Then
ℓ(x) = sℓ(y) + (1 − s)ℓ(z)
and ℓ(y), ℓ(z) ≤ max ℓ|K = ℓ(x), which implies that y, z ∈ F since s ∈ (0, 1)
and hence x = y = z by extremality of x in F .
This contradiction shows that K = M is the closed convex hull of the set
of extreme points.
The Krein–Milman theorem, together with the Banach–Alaoglu theorem
(Theorem 8.10), can produce some striking consequences.
Example 8.81. Let us show that c0 (N) has no pre-dual. In other words, there
is no Banach space X with the property that X ∗ is isometrically isomorphic
to c0 (N). Indeed, suppose that there is such a Banach space. Then, by the
Banach–Alaoglu theorem, the unit ball of c0 (N) would be weak* compact.
Thus, by the Krein–Milman theorem,† the unit ball would have to contain
some extreme point $(a_n)_{n \ge 1}$. We complete the argument by showing that
there cannot be such an extreme point of the unit ball.
By definition, $|a_n| \le 1$ for all $n \ge 1$ and $\lim_{n\to\infty} a_n = 0$. Therefore, there
exists some $n_0$ with $|a_{n_0}| < \tfrac12$, and then the sequences $(b_n)$ and $(c_n)$ defined by
\[ b_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ a_n + \tfrac12 & \text{for } n = n_0 \end{cases} \]
and
\[ c_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ a_n - \tfrac12 & \text{for } n = n_0 \end{cases} \]
are different, both belong to the unit ball by construction, and we have
\[ (a_n) = \tfrac12 (b_n) + \tfrac12 (c_n), \]
which shows that (an ) is not an extreme point.
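The construction in Example 8.81 is completely explicit, and the following short sketch carries it out for sequences that are given by finitely many initial terms followed by zeros. The function name `split_unit_ball_point` and the finite-list encoding are assumptions made for illustration only; they are not part of the text.

```python
from typing import List, Tuple


def split_unit_ball_point(a: List[float]) -> Tuple[List[float], List[float]]:
    """Return sequences b != c in the unit ball of c0 with a = (b + c) / 2.

    `a` lists finitely many initial terms of a sequence in the closed unit
    ball of c0; every later term is understood to be 0 (an assumption made
    so that the sequence can be stored as a finite list).
    """
    if any(abs(t) > 1 for t in a):
        raise ValueError("a must lie in the closed unit ball")
    # First index n0 with |a[n0]| < 1/2; the implicit zero tail supplies
    # one (namely index len(a)) if no listed term qualifies.
    n0 = next((i for i, t in enumerate(a) if abs(t) < 0.5), len(a))
    padded = list(a) + [0.0] * max(0, n0 + 1 - len(a))
    b, c = list(padded), list(padded)
    b[n0] += 0.5  # |b[n0]| <= |a[n0]| + 1/2 < 1
    c[n0] -= 0.5  # |c[n0]| <= |a[n0]| + 1/2 < 1
    return b, c
```

For instance, calling the function on the initial terms 1, −0.75, 0.5 modifies the first zero term of the tail, exactly as in the choice of n0 above.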

Exercise 8.82. In Example 8.81 we showed that V ∗ is never isometrically isomorphic
to c0 (N). Generalize the result in two ways as follows. Show that there is no Banach
space V with the property that V ∗ is isomorphic to C0 (X), where X is a σ-compact,
locally compact, non-compact space and the isomorphism is only assumed to be a bounded
operator with a bounded inverse.
Exercise 8.83. Let X be a σ-compact, locally compact metric space.
(a) Find the extreme points of the closed unit ball $B_1^{M(X)}$ in the space M(X) of signed
measures on X, and the extreme points of the convex set P(X) of all probability measures on X.
(b) Assume in addition that X is compact and infinite. Show that the assumptions of the
Krein–Milman theorem (Theorem 8.80) hold, but that P(X) is not the convex hull of its
extreme points. In other words, taking the closure of the convex hull is important in infinite
dimensions.
(c) Assume now instead that X is non-compact. Show that the conclusion of the Krein–
Milman theorem (Theorem 8.80) holds for P(X) (despite the fact that the assumptions do
not).
† This assumes we are working over R, but as mentioned before, the arguments extend
easily to C.
In many applications where convex subsets of Banach spaces or locally
convex spaces appear, the extreme points play a special role. One instance of
this arose in our brief excursion into ergodic theory (see Section 8.2.1), where
the ergodic measures (which may now be seen to exist in great generality
due to the Krein–Milman theorem) are precisely the extreme points of the
convex set of invariant probability measures.
The following exercise (or rather part (c) of it) shows how badly intuition
can fail for convex sets in infinite dimensions.
Exercise 8.84. Let K = {f ∈ CR ([0, 1]) | f (0) = 0, f is 1-Lipschitz}.
(a) Show that K is convex and compact in the norm topology.
(b) Show that any function f ∈ K which is piecewise linear and has slope ±1 wherever f
is differentiable is an extreme point.
(c) Show that the extreme points in K are dense in K.
(d) Describe all the extreme points of K.
8.6.2 Choquet’s Theorem
We now further refine the Krein–Milman theorem by showing that every point
of a compact convex set can be obtained as a ‘generalized convex combination’
of extreme points of K. However — even after taking account of convergence
questions — convex combinations alone will not be sufficiently general, as
the next example shows.
Exercise 8.85. Let X and P(X) ⊆ C(X)∗ be as in Exercise 8.83. Describe the elements
of P(X) that can be written as a convergent (in norm, or equivalently in the weak* topology)
convex combination $\sum_{n=1}^{\infty} c_n \nu_n$ of extreme points $\nu_n \in P(X)$ with $c_n \ge 0$ for
all $n \ge 1$ and $\sum_{n=1}^{\infty} c_n = 1$. Now let X = [0, 1] and give examples of Borel probability
measures that cannot be obtained as such limits.
Definition 8.86. Let K ⊆ X be a compact convex subset of a locally convex

vector space, and suppose that the induced topology on K is metrizable. Let µ
be a Borel probability measure on K and x ∈ K. We say that µ represents x
(or that x is the barycentre of µ) if
\[ \ell(x) = \int_K \ell \, d\mu \]
for every ℓ ∈ X ∗ .
Notice that each ℓ ∈ X ∗ is continuous on K and hence is integrable with
respect to any µ on K as in the definition above.
Essential Exercise 8.87. Show that the barycentre of a Borel probability
measure µ on a metrizable compact convex subset is uniquely determined
by µ.
Throughout the discussion of this subsection we will assume that the in-
duced topology on K ⊆ X is metrizable, writing simply (as above) that K is
a metrizable subset of X. With Proposition 8.11 and Exercise 8.12 in mind,
it should be clear why we do not wish to assume that X itself is metrizable.
Lemma 8.88. Let K be a metrizable compact convex subset of a locally con-

vex vector space X over R. Let µ be a probability measure on K. Then µ has
a barycentre in K.
Proof. For every ℓ ∈ X ∗ we define the closed hyperplane

\[ H_\ell = \Bigl\{ x \in X \Bigm| \ell(x) = \int_K \ell \, d\mu \Bigr\}. \]
Notice that for a fixed ℓ ∈ X ∗ , this hyperplane is not empty since ℓ is linear.
The lemma is equivalent to the statement that
\[ K \cap \bigcap_{\ell \in X^*} H_\ell \neq \emptyset. \]
However, since K is compact and the sets Hℓ are closed, it is sufficient to

show that
\[ K \cap H_{\ell_1} \cap \dots \cap H_{\ell_n} \neq \emptyset \tag{8.37} \]
for any $\ell_1, \dots, \ell_n \in X^*$ and $n \ge 1$.
For this, consider the continuous linear map
\[ L : X \longrightarrow \mathbb{R}^n, \qquad x \longmapsto (\ell_1(x), \dots, \ell_n(x)) \]
and the compact convex set L(K) ⊆ Rn . We claim that

\[ b = \Bigl( \int_K \ell_1 \, d\mu, \dots, \int_K \ell_n \, d\mu \Bigr) \in L(K), \tag{8.38} \]
and note that this is equivalent to (8.37). Hence the claim implies the lemma.
Suppose therefore that (8.38) does not hold. Then by Theorem 8.73 (applied
to $L(K) \subseteq \mathbb{R}^n$) there exists a functional φ defined by $\phi(t) = \sum_{j=1}^{n} a_j t_j$
for $t \in \mathbb{R}^n$ and some row vector $a \in \mathbb{R}^n$ such that
\[ \phi(b) > \sup_{x \in K} \phi(L(x)). \]
Defining $\ell^* = \sum_{j=1}^{n} a_j \ell_j = \phi \circ L \in X^*$ we now obtain
\[ \int_K \ell^* \, d\mu = \sum_{j=1}^{n} a_j \int_K \ell_j \, d\mu = \phi(b) > \sup_{x \in K} \phi(L(x)) = \sup_{x \in K} \ell^*(x), \]
which gives a contradiction since µ is assumed to be a probability measure.
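In the simplest situation, where X = Rⁿ and µ is a finitely supported probability measure, the barycentre is just the weighted average of the atoms, and the defining identity ℓ(x) = ∫ ℓ dµ can be checked directly. The following sketch illustrates this special case only; the helper names `barycentre` and `integrate_functional` are hypothetical and chosen for this illustration.

```python
from typing import List, Sequence


def barycentre(points: Sequence[Sequence[float]],
               weights: Sequence[float]) -> List[float]:
    """Barycentre of mu = sum_i weights[i] * delta_{points[i]} on R^n."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must define a probability measure")
    dim = len(points[0])
    # Coordinate j of the barycentre is the integral of the j-th coordinate
    # functional against mu.
    return [sum(w * p[j] for p, w in zip(points, weights)) for j in range(dim)]


def integrate_functional(ell: Sequence[float],
                         points: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> float:
    """Integral against mu of the linear functional x -> sum_j ell[j] * x[j]."""
    return sum(w * sum(e * t for e, t in zip(ell, p))
               for p, w in zip(points, weights))
```

For the uniform measure on the four vertices of the unit square in R², the barycentre is the centre (1/2, 1/2), and every linear functional integrates to its value at the centre.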

Exercise 8.89. Let K ⊆ X be a metrizable compact convex subset of a locally convex

vector space over R, and let M ⊆ K be a closed subset. Prove that the closure of the
convex hull of M is precisely the set of all barycentres of all Borel probability measures µ
with Supp µ ⊆ M .
Theorem 8.90 (Choquet’s theorem). Let K ⊆ X be a metrizable compact
convex subset of a locally convex space X over R. Then the set of extreme
points ext(K) of K is Borel measurable. Moreover, for any x0 ∈ K there
exists a Borel probability measure µ on K with µ(ext(K)) = 1 that
represents x0 .
While the issues arising in the proof are functional-analytic, the intuition
behind the statement is essentially geometric, which is more visible in a finite-
dimensional version illustrated in the next exercise.
Exercise 8.91 (Carathéodory’s form of Minkowski’s theorem). Let K ⊆ Rn be a
compact convex subset. Show that any point x0 ∈ K is a convex combination of (n + 1)
extreme points of K.
For the proof of Theorem 8.90 we will need the following lemma and some
notation. We write A for the space of affine functions on X, that is, functions
of the form a(x) = ℓ(x) + c for some ℓ ∈ X ∗ and c ∈ R. Moreover, recall
that $\|f\|_{K,\infty}$ denotes the supremum norm of a function f restricted to some
subset K ⊆ X.
Lemma 8.92 (Upper envelope). Let K ⊆ X be as in Theorem 8.90. Given
a bounded function f : K → R we define the upper envelope of f by
\[ \overline{f}(x) = \inf\{ a(x) \mid a \in A \text{ and } a \ge f \text{ on } K \} \]
for all x ∈ K. Then
(1) $f \le \overline{f} \le \|f\|_{K,\infty}$.
(2) $\overline{f}$ is concave (that is, $\overline{f}(\lambda x + (1-\lambda)y) \ge \lambda \overline{f}(x) + (1-\lambda)\overline{f}(y)$ for all x, y
in K and λ with $0 \le \lambda \le 1$).
(3) $\overline{f}$ is upper-semicontinuous (that is, the pre-image $\overline{f}^{-1}((-\infty, c))$ is open
for every c ∈ R), and so in particular Borel measurable.
(4) If f is concave and upper-semicontinuous then $\overline{f} = f$.
(5) Given $r \ge 0$, a ∈ A, and another bounded function g on K, we have:
(a) $\overline{rf} = r\overline{f}$,
(b) $\overline{f + a} = \overline{f} + a$,
(c) $\overline{f + g} \le \overline{f} + \overline{g}$, and
(d) $|\overline{f} - \overline{g}| \le \|f - g\|_{K,\infty}$.
Proof. Since the constant function $\|f\|_{K,\infty}$ belongs to A, (1) follows at once
from the definition of $\overline{f}$.
Given x, y ∈ K, λ ∈ [0, 1] and a ∈ A with $a \ge f$ on K we see that
\[ a(\lambda x + (1-\lambda)y) = \lambda a(x) + (1-\lambda)a(y) \ge \lambda \overline{f}(x) + (1-\lambda)\overline{f}(y) \]
by definition of $\overline{f}(x)$ and $\overline{f}(y)$. The claim in (2) follows by taking the infimum
over a.
For (3), let c ∈ R and x0 ∈ K such that $\overline{f}(x_0) < c$. Then there exists
some a ∈ A with $a \ge f$ and $a(x_0) < c$. Clearly
\[ \{x \in K \mid a(x) < c\} \subseteq \{x \in K \mid \overline{f}(x) < c\} \]
and, since a is continuous, the former is an open neighbourhood of x0 . Since x0
was an arbitrary element of $\{x \in K \mid \overline{f}(x) < c\}$ we obtain (3).
For (4), assume that f is concave and upper-semicontinuous. We claim
that this implies that the set
\[ M = \{(x, c) \in K \times \mathbb{R} \mid c \le f(x)\} \]
of points on or underneath the graph of f is convex and closed. Here we
consider M as a subset of X′ = X × R, which is the locally convex vector
space with the collection of semi-norms defined by $\|(x, c)\|'_\alpha = \max\{\|x\|_\alpha, |c|\}$
for (x, c) ∈ X′ and every semi-norm $\|\cdot\|_\alpha$ of the locally convex vector space X.
To see that M is convex, fix $(x_1, c_1), (x_2, c_2) \in M$ and λ ∈ [0, 1]. Then we
have $c_1 \le f(x_1)$, $c_2 \le f(x_2)$, and
\[ \lambda c_1 + (1-\lambda)c_2 \le \lambda f(x_1) + (1-\lambda)f(x_2) \le f(\lambda x_1 + (1-\lambda)x_2), \]
which shows that $\lambda(x_1, c_1) + (1-\lambda)(x_2, c_2) \in M$.
Notice that in order to show that M is closed in X′ it is enough to show
that M is closed in K × R, since the latter is closed in X′. So suppose
that $(x_0, c_0) \in (K \times \mathbb{R}) \setminus M$. By definition, this means that $f(x_0) < c_0$, so we
can choose $c \in (f(x_0), c_0)$, and apply the definition of upper-semicontinuity
to see that $f^{-1}((-\infty, c)) \times (c, \infty)$ is open in K × R, contains $(x_0, c_0)$, and
does not intersect M. This means that M is closed in K × R and hence in X′.
Now let x0 ∈ K and $f(x_0) < c_0$, so that $z = (x_0, c_0) \notin M$. Applying
Theorem 8.73 we find some functional $\ell' \in (X')^*$ with $\ell'((x, c)) < \ell'((x_0, c_0))$
for all (x, c) ∈ M. Clearly $\ell'((x, c)) = \ell(x) + \alpha c$ for some $\ell \in X^*$ and α ∈ R.
Since for x0 we have $(x_0, f(x_0)) \in M$, $f(x_0) < c_0$ and
\[ \ell(x_0) + \alpha f(x_0) < \ell(x_0) + \alpha c_0, \]
we see that α > 0. Dividing ℓ′ by α we may thus assume that α = 1. Using
the points (x, f(x)) ∈ M for x ∈ K we obtain $\ell(x) + f(x) < \ell(x_0) + c_0$
for all x ∈ K. Then $a(x) = -\ell(x) + c_0 + \ell(x_0)$ defines a function a ∈ A
with f(x) < a(x) for all x ∈ K, and so $\overline{f}(x_0) \le a(x_0) = c_0$. Since x0 ∈ K
and $c_0 > f(x_0)$ were arbitrary we deduce that $\overline{f} = f$ as required.
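Now let $r \ge 0$, a ∈ A and g be as in (5). For r = 0 the statement is clear.
For r > 0 and a function a ∈ A on X we have $a \ge f$ if and only if $ra \ge rf$.
Therefore (a) follows from standard properties of the infimum.
Next notice that $a_f \ge f$, $a_g \ge g$ and $a_f, a_g \in A$ implies that
\[ A \ni a_f + a_g \ge f + g \]
as in the definition of $\overline{f + g}$. Hence $a_f + a_g \ge \overline{f + g}$, and taking the infimum
over $a_f$ and $a_g$ separately gives $\overline{f} + \overline{g} \ge \overline{f + g}$, which is (c).
Applying this for g = a ∈ A and using $\overline{a} = a$ and $\overline{-a} = -a$, we see that
\[ \overline{f + a} \le \overline{f} + \overline{a} = \overline{f} + a = \overline{f + a - a} + a \le \overline{f + a} + \overline{-a} + a = \overline{f + a}, \]
and so $\overline{f + a} = \overline{f} + a$, giving (b).
The proof of (d) is similar: In fact
\[ \overline{f} = \overline{f - g + g} \le \overline{f - g} + \overline{g} \]
and thus $\overline{f} - \overline{g} \le \overline{f - g} \le \|f - g\|_{K,\infty}$ by (1). Reversing the roles of f and g
gives $|\overline{f} - \overline{g}| \le \|f - g\|_{K,\infty}$ and hence the lemma.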
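For K = [0, 1] ⊆ R the upper envelope of a continuous function is its least concave majorant, and a discrete analogue can be computed from finitely many samples by taking the upper convex hull of the sample points. The sketch below does this on a grid; it is only an approximation of the construction in Lemma 8.92 (the envelope is computed for the sampled values, not for the function itself), and the name `upper_envelope_on_grid` is an assumption of this illustration.

```python
from typing import List, Sequence


def upper_envelope_on_grid(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least concave majorant of the points (xs[i], ys[i]), evaluated at xs.

    xs must be strictly increasing. The vertices of the upper convex hull
    are found by a monotone-chain scan; between consecutive vertices the
    envelope is the chord joining them.
    """
    hull: List[int] = []  # indices of the upper-hull vertices found so far
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # cross >= 0 means point i1 lies on or below the chord from i0 to i.
            cross = ((xs[i1] - xs[i0]) * (ys[i] - ys[i0])
                     - (ys[i1] - ys[i0]) * (xs[i] - xs[i0]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    env = list(ys)
    for i0, i1 in zip(hull, hull[1:]):
        for i in range(i0, i1 + 1):
            t = (xs[i] - xs[i0]) / (xs[i1] - xs[i0])
            env[i] = (1 - t) * ys[i0] + t * ys[i1]
    return env
```

On samples of the convex function x ↦ x² the result is the chord x ↦ x, while samples of the concave function x ↦ x(1 − x) are returned unchanged, in line with properties (1) and (4) of the lemma.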
Proof of Theorem 8.90. By Lemma 2.46, C(K) is separable and so the
same applies to the subspace A ⊆ C(K) of affine functions (when restricted
to K and using the supremum norm $\|\cdot\|_{K,\infty}$ on K). Hence we may choose a
dense countable subset $D = \{a_n \mid n \in \mathbb{N}\} \subseteq A$ with $\overline{D} = \overline{A} \subseteq C(K)$. Notice
that D is still dense if we remove from it any (potentially contained) $a_n$
with $a_n|_K = 0$, so we may assume that $\|a_n\|_{K,\infty} \neq 0$ for all n ∈ N. Moreover,
since X∗ separates points by Theorem 8.73, we know that D separates points
in K. Now define the function
\[ F = \sum_{n=1}^{\infty} \frac{1}{n^2 \|a_n\|_{K,\infty}^2} \, a_n^2, \]
whose properties are crucial for the proof of the theorem. Since the series
converges uniformly on K, we see that F ∈ C(K). We claim that F is strictly
convex, meaning that
\[ F(\lambda x + (1-\lambda)y) < \lambda F(x) + (1-\lambda)F(y) \tag{8.39} \]
for all x ≠ y ∈ K and λ ∈ (0, 1). To see this, note that
\[ a_n(\lambda x + (1-\lambda)y) = \lambda a_n(x) + (1-\lambda)a_n(y) \]
for any n ∈ N, x ≠ y ∈ K and λ ∈ (0, 1) so that

\[ a_n(\lambda x + (1-\lambda)y)^2 \le \lambda a_n(x)^2 + (1-\lambda)a_n(y)^2 \tag{8.40} \]
by convexity of the map $t \mapsto t^2$. Also, since D separates points we have
\[ a_{n_0}(x) \neq a_{n_0}(y) \]
for some $n_0 \in \mathbb{N}$ and hence a strict inequality in (8.40) for this choice of $n = n_0$
(by strict convexity of $t \mapsto t^2$). Summing over n gives (8.39).
We now fix x0 ∈ K, which we wish to represent. Using x0 we define the
subspace
\[ V = A + \mathbb{R}F \subseteq C(K), \]
the linear functional
\[ \Lambda(a + cF) = a(x_0) + c\,\overline{F}(x_0) \]
for any a + cF ∈ V, and the function p defined by $p(f) = \overline{f}(x_0)$ for all f
in C(K).
By Lemma 8.92(5), the function p is norm-like, as required in the Hahn–
Banach lemma (Lemma 7.1). Also, by Lemma 8.92(5) we have
\[ \Lambda(a + cF) = a(x_0) + c\,\overline{F}(x_0) = \overline{a + cF}(x_0) = p(a + cF) \]
whenever $c \ge 0$. For c < 0 the function cF is concave and continuous, so
that $\overline{cF} = cF$ by Lemma 8.92(4). Using also the inequality in Lemma 8.92(1)
(multiplied by c < 0) and Lemma 8.92(5b) we now see that
\begin{align*}
\Lambda(a + cF) = a(x_0) + c\,\overline{F}(x_0) &\le a(x_0) + cF(x_0) \\
&= a(x_0) + \overline{cF}(x_0) = \overline{a + cF}(x_0) = p(a + cF)
\end{align*}
also for c < 0. Hence all the assumptions of the Hahn–Banach lemma are
satisfied and so there is an extension of Λ (which we again denote by Λ) to
all of C(K) satisfying
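\[ \Lambda(f) \le p(f) = \overline{f}(x_0) \le \|f\|_{K,\infty} \]
for all f ∈ C(K) (where the last inequality is given by Lemma 8.92(1)). For
a non-negative function f ∈ C(K) we also have $\overline{-f} \le 0$ (since the constant
function 0 belongs to A and dominates −f), so $\Lambda(-f) \le p(-f) \le 0$ and hence $\Lambda(f) \ge 0$.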
Since 1 ∈ A ⊆ V we have Λ(1) = 1(x0 ) = 1 by definition of Λ. Therefore
we may apply the Riesz representation theorem (Theorem 7.44) to obtain a
Borel probability measure µ on K with
\[ \Lambda(f) = \int_K f \, d\mu \tag{8.41} \]
for all f ∈ C(K). Applying Λ to linear functionals ℓ ∈ X∗ ⊆ A we see that
\[ \int_K \ell \, d\mu = \Lambda(\ell) = \ell(x_0), \]
so x0 is the barycentre of the probability measure µ. It remains to see
that ext(K) is measurable and that µ(ext(K)) = 1.
For this, we claim that
\[ \int_K F \, d\mu = \overline{F}(x_0) = \int_K \overline{F} \, d\mu. \tag{8.42} \]
The first of these equations follows directly by applying the functional to F
since µ satisfies (8.41) and Λ(F) was defined as $\overline{F}(x_0)$. The proof of the
second equality is less direct: By Lemma 8.92(1) we have $F \le \overline{F}$ and hence
\[ \int_K F \, d\mu \le \int_K \overline{F} \, d\mu. \]
On the other hand a ∈ A and $a \ge F$ implies that $a \ge \overline{F}$ and therefore
\[ \int_K \overline{F} \, d\mu \le \int_K a \, d\mu = \Lambda(a) = a(x_0). \]
Taking the infimum over these a ∈ A we get
\[ \int_K \overline{F} \, d\mu \le \overline{F}(x_0) \]
by definition of $\overline{F}$. However, since the first equality in (8.42) is already proven
these inequalities together prove (8.42).
In particular, we see from (8.42) and the inequality $F \le \overline{F}$ that
\[ \mu\bigl(\{x \in K \mid F(x) < \overline{F}(x)\}\bigr) = 0. \tag{8.43} \]
We claim that
\[ \operatorname{ext}(K) \supseteq \{x \in K \mid F(x) = \overline{F}(x)\}. \tag{8.44} \]
To see this, let z = λx + (1 − λ)y ∈ K be a non-extreme point (that is,
with x ≠ y ∈ K and λ ∈ (0, 1)). Since F is strictly convex, this gives
\begin{align*}
F(z) &< \lambda F(x) + (1-\lambda)F(y) \\
&\le \lambda \overline{F}(x) + (1-\lambda)\overline{F}(y) \le \overline{F}(z)
\end{align*}
by Lemma 8.92(1) and the concavity of $\overline{F}$ from Lemma 8.92(2). Hence
\[ F(z) < \overline{F}(z), \]
and the claim follows. Note that once we have shown the measurability
of ext(K), this claim implies that $\mu(K \setminus \operatorname{ext}(K)) = 0$ by (8.43).
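To see that ext(K) is measurable, let d denote a metric on K that induces
the topology on K and notice that
\begin{align*}
K \setminus \operatorname{ext}(K) &= \{\lambda x + (1-\lambda)y \mid \lambda \in (0, 1),\ x \neq y \in K\} \\
&= \bigcup_{n \in \mathbb{N}} \underbrace{\bigl\{\lambda x + (1-\lambda)y \bigm| \lambda \in [\tfrac1n, 1 - \tfrac1n];\ x, y \in K \text{ with } d(x, y) \ge \tfrac1n\bigr\}}_{= F_n}
\end{align*}
is a countable union of sets $F_n$ each of which is compact since $F_n$ is a
continuous image of the compact set
\[ [\tfrac1n, 1 - \tfrac1n] \times \{(x, y) \in K^2 \mid d(x, y) \ge \tfrac1n\} \]
for each $n \ge 1$. In particular $K \setminus \operatorname{ext}(K)$, and hence also ext(K), is Borel
measurable. Therefore, $\mu(K \setminus \operatorname{ext}(K)) = 0$ by (8.43) and (8.44) and µ
represents x0 by the argument after (8.41). This proves the theorem.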
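The roles of F and its upper envelope in the proof become transparent in the simplest case K = [0, 1] ⊆ R. The following worked computation is only an illustrative sketch: as a simplifying assumption we use the strictly convex function F(x) = x² in place of the series defining F in the proof, and write $\overline{F}$ for its upper envelope in the sense of Lemma 8.92.

```latex
% Worked example under the stated assumptions: K = [0,1], F(x) = x^2.
\begin{align*}
&\text{If } a \in A \text{ and } a \ge F \text{ on } [0,1], \text{ then } a(0) \ge 0,\ a(1) \ge 1,
  \text{ so } a(x) = (1-x)\,a(0) + x\,a(1) \ge x. \\
&\text{Since } a(x) = x \text{ itself dominates } F \text{ on } [0,1], \text{ we get } \overline{F}(x) = x. \\
&\{x \in K \mid F(x) = \overline{F}(x)\} = \{x \in [0,1] \mid x^2 = x\} = \{0, 1\} = \operatorname{ext}(K). \\
&\text{For } \mu = (1 - x_0)\,\delta_0 + x_0\,\delta_1: \quad
  \int_K \ell \, d\mu = \ell(x_0) \text{ for linear } \ell, \qquad
  \int_K F \, d\mu = x_0 = \overline{F}(x_0).
\end{align*}
```

In this example the set where F and its upper envelope agree is exactly ext(K), and the measure representing x0 is concentrated on it.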
Exercise 8.93. Let K be a metrizable compact convex subset of a locally convex vector
space over R, and let x0 ∈ K. Show that x0 is an extreme point if and only if µ = δx0 is
the only Borel probability measure on K that represents x0 .
The material above follows the monograph of Phelps [86] loosely. We refer
to those notes for many interesting applications of Choquet’s theorem as well
as the generalization of this result to the case of general compact convex sets
in locally convex vector spaces (without the metrizability assumption) in the
form of the Choquet–Bishop–de Leeuw theorem.
8.7 Further Topics
Many proofs and theories depend on weak* compactness, the notion of locally
convex vector spaces, or the study of extreme points of convex subsets. We
only mention a few samples and give further references.
• Decay of Matrix Coefficients for Simple Lie Groups (the Howe–Moore
Theorem): If a simple non-compact Lie group G acts unitarily on a Hilbert
space H without non-zero G-fixed vectors, then the matrix coefficients
$\langle \pi_g v, w \rangle$ decay to zero as g → ∞ in G, for any v, w ∈ H. In
the language of ergodic theory, this means that every measure-preserving
ergodic G-action is mixing. This may sound complicated but the proof
for SLd (R) only needs as inputs the equality case of the Cauchy–Schwarz
inequality, the Banach–Alaoglu theorem, and matrix multiplication. We
refer to [27, Sec. 11.4] for a discussion of the easier case G = SL2 (R), and
to [25] for the general case. The weak* compactness is here used on the
Hilbert space H.
• In the study of von Neumann algebras two more topologies on B(X, Y )
are used (particularly in the case where X = Y is a Hilbert space):
the ultra-strong operator topology and the ultra-weak operator topology.

We refer to von Neumann [79] for the original formulation of the ultra-
strong topology, and to the monograph of Takesaki [101, Ch. II] for a full
treatment.
• As discussed in Section 8.5 the locally convex vector space Cc∞ (U ), also
written D(U ), is the space of test functions for distributions on U ⊆ Rd in
the sense that one can define distributions as continuous linear functions
on D(U ). We refer to Folland [33, Ch. 9] for a treatment of distributions.
• The Fréchet space S (Rd ) of Schwartz functions on Rd has important
connections to Fourier transforms, and is the space of test functions for
tempered distributions on Rd (see Section 9.2.3 and Folland [33, Ch. 9]).
• There are further general classes of locally convex vector spaces. Among
them are the nuclear spaces (C ∞ ([0, 1]), Cb∞ (U ) and S (Rd ) are ex-
amples) and the LF-spaces (these are strict inductive limits of Fréchet
spaces; examples include Cc (U ) and Cc∞ (U )). We refer to Bourbaki [13]
or Trèves [106] for more details.
• For a topological group G we will define in Section 9.3.1 the notion of a
positive-definite function. If G is locally compact and abelian, the extreme
points of the set of positive-definite functions (properly normalized) will
be the set of characters of G. For more general groups the extreme points
give rise to the irreducible unitary representations of the given group G
(see Exercises 9.55 and 12.59).
Chapter 9
Unitary Operators and Flows, Fourier
Transform
In this chapter we return to the topic of spectral theory by considering
unitary operators. Moreover, we generalize our discussion of Fourier series by
considering the Fourier transform for functions on Rd and use it to obtain
the spectral theory of unitary flows (unitary representations of R or Rd ). As
is natural in discussions concerning spectral theory (which will generalize ei-
genvalues and eigenvectors) we will only consider separable complex Hilbert
spaces in this chapter.
9.1 Spectral Theory of Unitary Operators
Let H1 , H2 be Hilbert spaces. Recall that a linear operator U : H1 → H2 is

said to be unitary if U is surjective and
\[ \|Uv\|_{H_2} = \|v\|_{H_1} \]
for all v ∈ H1 (or, equivalently, if $U^* = U^{-1}$). Operators V1 : H1 → H1

and V2 : H2 → H2 on Hilbert spaces are said to be unitarily isomorphic
if there is a unitary operator U : H1 → H2 with U V1 = V2 U . In contrast
to the spectral theory of compact self-adjoint operators, it is not in general
true that unitary operators on a Hilbert space are diagonalizable (that is,
unitarily isomorphic to a diagonal operator). This may be seen in the next
model example.
Example 9.1. Let µ be a finite measure on T, let $H = L^2_\mu(\mathbb{T}) = L^2(\mathbb{T}, \mu)$
and write again $\chi_1(x) = e^{2\pi i x}$ for x ∈ T. Define the unitary multiplication
operator $U = M_{\chi_1} : H \ni f \longmapsto M_{\chi_1}(f) = \chi_1 f \in H$, which gives a special case
of Exercise 6.25(b). Note that any eigenvalue of a unitary operator like U must
have absolute value one. Moreover, this unitary operator U has $\lambda = e^{2\pi i x_0}$ as
an eigenvalue if and only if $f = 1_{\{x_0\}}$ is non-zero as an element of $H = L^2_\mu(\mathbb{T})$.
That is, $\lambda = e^{2\pi i x_0}$ is an eigenvalue if and only if x0 is an atom of µ, meaning
that µ({x0 }) > 0. Moreover, U is diagonalizable if and only if µ is atomic,
meaning that $\mu = \sum_{k=1}^{\infty} c_k \delta_{x_k}$ for some $c_k \ge 0$ and $x_k \in \mathbb{T}$.
Operators of the type seen in Example 9.1 are not difficult to deal with
even though they are usually not diagonalizable. Having abandoned the false
hope that all unitary operators will be diagonalizable (that is, describable
ultimately in terms of only countably many scalar multiplications on the
ground field), the next best hope one might have is that they can be fully
described in terms of multiplication by characters as in Example 9.1 (at the
expense of allowing the underlying measure µ to vary). That this is in fact
true is the content of the spectral theory of unitary operators.
Theorem 9.2 (Spectral theory of unitary operators). Let H be a separable
complex Hilbert space and let U : H → H be a unitary operator.
Then H can be split into a countable direct sum
\[ H = \bigoplus_{n \ge 1} H_n \]
of closed mutually orthogonal subspaces $H_n$, invariant under U and U∗, such
that for each $n \ge 1$ the unitary operator $U_n = U|_{H_n} : H_n \to H_n$ is unitarily
isomorphic to the multiplication operator $M_{\chi_1} : L^2(\mathbb{T}, \mu_n) \to L^2(\mathbb{T}, \mu_n)$ for
some finite measure $\mu_n$ on T.
We will add some more information on the emerging sequence of measures

in Section 9.1.3.
9.1.1 Herglotz’s Theorem for Positive-Definite Sequences
Although this is not immediately apparent, a useful concept for the proof of
Theorem 9.2 is the notion of a positive-definite sequence.
Definition 9.3. A sequence $(p_n)_{n \in \mathbb{Z}}$ of complex numbers is called
positive-definite if for any finite sequence $(c_n)_{n \in \mathbb{Z}} \in c_c(\mathbb{Z})$ of complex numbers we
have
\[ \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \, p_{m-n} \ge 0, \]
meaning that the sum is real and non-negative.
It is not obvious that non-trivial positive-definite sequences exist at all.

There are two ways to construct examples.
Example 9.4 (First basic construction). Let µ be a finite measure on T. Then
the Fourier coefficients $p_n(\mu)$ of µ defined by
\[ p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu \]
for n ∈ Z form a positive-definite sequence.

Example 9.5 (Second basic construction). Let U : H → H be a unitary
operator on a Hilbert space, and fix some v ∈ H. Then the inner products
\[ p_n(v) = \langle U^n v, v \rangle_H \]
for n ∈ Z form a positive-definite sequence.

Both these claims require justification.
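Proof of statement in Examples 9.4 and 9.5. Notice first that Example 9.4
is a special case of Example 9.5. Indeed, if $H = L^2(\mathbb{T}, \mu)$, $U = M_{\chi_1}$,
and $v = 1_{\mathbb{T}} = 1$ is the constant function, then $U^n v = \chi_n$ and so
\[ p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu = \langle U^n v, v \rangle_{L^2(\mathbb{T}, \mu)} = p_n(v) \]
for all n ∈ Z. Thus it is enough to consider the sequence $(p_n(v))$ from
Example 9.5. Let $(c_n)$ be a finite complex sequence as in Definition 9.3. Then
\begin{align*}
\sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \, p_{m-n}(v) &= \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \langle U^m v, U^n v \rangle_H \\
&= \Bigl\langle \sum_{m \in \mathbb{Z}} c_m U^m v, \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\rangle_H \ge 0,
\end{align*}
since the inner product is positive-definite.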
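The positivity just proved can be observed numerically for an atomic measure, where the Fourier coefficients are finite sums. The following sketch evaluates the quadratic form from Definition 9.3 for such a measure; the helper names `chi`, `fourier_coefficients` and `quadratic_form`, and the encoding of a finite sequence as a dictionary, are assumptions made for this illustration.

```python
import cmath
from typing import Callable, Dict, Sequence


def chi(n: int, x: float) -> complex:
    """Character chi_n(x) = exp(2 pi i n x) on T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)


def fourier_coefficients(atoms: Sequence[float],
                         weights: Sequence[float]) -> Callable[[int], complex]:
    """Return n -> p_n(mu) for mu = sum_k weights[k] * delta_{atoms[k]}."""
    return lambda n: sum(w * chi(n, x) for x, w in zip(atoms, weights))


def quadratic_form(c: Dict[int, complex], p: Callable[[int], complex]) -> complex:
    """sum_{m,n} c_m * conj(c_n) * p_{m-n} over the finite support of c."""
    return sum(cm * complex(cn).conjugate() * p(m - n)
               for m, cm in c.items() for n, cn in c.items())
```

For an atomic measure the value of the form equals the weighted sum of |Σ c_m χ_m(x_k)|² over the atoms x_k, which makes the non-negativity visible directly.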

The main step towards the proof of Theorem 9.2 is the following descrip-
tion of all positive-definite sequences.
Theorem 9.6 (Herglotz’s theorem). Let $(p_n)_{n \in \mathbb{Z}}$ be a positive-definite
sequence. Then there exists a uniquely determined finite measure µ on T for
which
\[ p_n = p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu \]
for all n ∈ Z.
Note that Herglotz’s theorem shows in particular that the Examples 9.4
and 9.5 both give rise to all positive-definite sequences.
Proof of Theorem 9.6. Let $(p_n)$ be a positive-definite sequence. Fix
some $N \ge 1$. For any θ ∈ T we may use the defining property for a
positive-definite sequence with the finite sequence
\[ c_n = \begin{cases} \chi_{-n}(\theta) & \text{if } 1 \le n \le N, \\ 0 & \text{otherwise,} \end{cases} \]
of coefficients to obtain that the function
\[ F_N(\theta) = \frac{1}{N} \sum_{m,n=1}^{N} \chi_{-m+n}(\theta) \, p_{m-n} \]
is non-negative. Therefore, we may define a positive measure $\mu_N$ by the
formula $d\mu_N(\theta) = F_N(\theta) \, d\theta$, that is, $\mu_N(B) = \int_B F_N(\theta) \, d\theta$ for any
measurable B ⊆ T. Notice that
\[ \mu_N(\mathbb{T}) = \int_{\mathbb{T}} F_N(\theta) \, d\theta = \frac{1}{N} \sum_{m=1}^{N} p_0 = p_0. \]
By the Riesz representation theorem (Theorem 7.44) and the Banach–
Alaoglu theorem (Theorem 8.10) in the combined form of Proposition 8.27
this sequence has a subsequence $(\mu_{N_k})$ converging in the weak* topology to
a positive measure µ. Since $\mu_{N_k}(\mathbb{T}) = p_0$ for all k we also have $\mu(\mathbb{T}) = p_0$.
Now let ℓ ∈ Z and calculate
\begin{align*}
\int_{\mathbb{T}} \chi_\ell \, d\mu &= \lim_{k \to \infty} \int_{\mathbb{T}} \chi_\ell F_{N_k}(\theta) \, d\theta \\
&= \lim_{k \to \infty} \frac{1}{N_k} \sum_{m,n=1}^{N_k} p_{m-n} \int_{\mathbb{T}} \chi_{\ell-m+n}(\theta) \, d\theta
 = \lim_{k \to \infty} \frac{N_k - |\ell|}{N_k} \, p_\ell = p_\ell,
\end{align*}
where we used the fact that $\int_{\mathbb{T}} \chi_{\ell-m+n} \, d\theta = \delta_{m,n+\ell}$ (which is 1 if m = n + ℓ
and 0 otherwise) and $|[1, N] \cap ([1, N] + \ell) \cap \mathbb{Z}| = N - |\ell|$. This proves the
existence of the finite measure µ as in the theorem. For uniqueness we note
that the algebra of characters is dense in C(T) and refer to the uniqueness
statement in Theorem 7.44.
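The two computations in the proof, namely the non-negativity of $F_N$ and the formula $\int \chi_\ell F_N \, d\theta = \frac{N - |\ell|}{N} p_\ell$, can be checked numerically. The sketch below does so for a sequence supplied as a function `p`; the helper names `chi`, `F` and `moment` are assumptions of this illustration. The integral is computed by an equispaced Riemann sum, which is exact (up to rounding) for trigonometric polynomials whose frequencies are smaller in absolute value than the number of sample points.

```python
import cmath
from typing import Callable


def chi(n: int, x: float) -> complex:
    """Character chi_n(x) = exp(2 pi i n x) on T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)


def F(N: int, theta: float, p: Callable[[int], complex]) -> complex:
    """F_N(theta) = (1/N) * sum_{m,n=1}^{N} chi_{-m+n}(theta) * p_{m-n}."""
    return sum(chi(n - m, theta) * p(m - n)
               for m in range(1, N + 1) for n in range(1, N + 1)) / N


def moment(N: int, ell: int, p: Callable[[int], complex]) -> complex:
    """Integral over T of chi_ell * F_N with respect to Lebesgue measure.

    The integrand only involves frequencies of absolute value at most
    |ell| + N - 1 < M, so the M-point Riemann sum below is exact up to
    floating-point rounding.
    """
    M = 2 * N + abs(ell) + 1
    return sum(chi(ell, j / M) * F(N, j / M, p) for j in range(M)) / M
```

With `p` taken to be the Fourier coefficients of an atomic measure, `moment(N, ell, p)` reproduces $\frac{N - |\ell|}{N} p_\ell$ for $|\ell| < N$, which tends to $p_\ell$ as N grows.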
9.1.2 Cyclic Representations and the Spectral Theorem
Recall the notion of a unitary representation from Definition 3.73 and notice
that a unitary operator U defines (and is defined by) an associated unitary
representation π of the group Z given by πn = U n for n ∈ Z.
Definition 9.7. A unitary representation π of a group G on a Hilbert
space H is called cyclic if
\[ H = H_v = \overline{\langle \pi_g v \mid g \in G \rangle} \]
for some v ∈ H. We will call v a generator of the cyclic representation. A
closed subspace H′ ⊆ H is π-invariant if $\pi_g H' \subseteq H'$ for all g ∈ G. The cyclic
subspace $H_v = \overline{\langle \pi_g v \mid g \in G \rangle}$ generated by v ∈ H is the minimal π-invariant
closed subspace containing v.
If G = Z then we also refer to a cyclic representation of Z as a cyclic
Hilbert space with respect to the unitary operator π1 : H → H.
Cyclic representations can be thought of as the building blocks of more

general unitary representations, as the following construction shows. If π is
a unitary representation of G on a separable Hilbert space H (or $\pi : G \circlearrowright H$
is a unitary representation), then we can write H as a direct sum
\[ H = \bigoplus_{n \ge 1} H_n \]
of pairwise orthogonal closed π-invariant subspaces $H_n$, each of which is a
cyclic representation. Indeed, if $w_1, w_2, \dots$ is an orthonormal basis of H as
in Theorem 3.39 and
\[ H_1 = H_{w_1} = \overline{\langle \pi_g w_1 \mid g \in G \rangle}, \]
then $H_1$ is π-invariant. Together with the fact that $\pi_g^* = \pi_{g^{-1}}$ for all g ∈ G
and Lemma 6.30, this implies that $H_1^\perp$ is also π-invariant. Define $H_2 = H_{w_2^\perp}$,
where $w_2^\perp \in H_1^\perp$ is the orthogonal projection of $w_2$ onto $H_1^\perp$. Again the
spaces $H_1 \oplus H_2$ and $(H_1 \oplus H_2)^\perp$ are π-invariant, and we can continue the
process by defining $H_3 = H_{w_3^\perp}$, where $w_3^\perp \in (H_1 \oplus H_2)^\perp$ is the orthogonal
projection of $w_3$ onto $(H_1 \oplus H_2)^\perp$. Clearly $w_1 \in H_1$, $w_2 \in H_1 \oplus H_2$, and
$w_3 \in H_1 \oplus H_2 \oplus H_3$.
Repeating this construction inductively gives a sequence of pairwise
orthogonal, closed, π-invariant, cyclic subspaces $(H_n)$ with $H = \bigoplus_{n \ge 1} H_n$ (see
Exercise 3.37).
Therefore the following corollary to Theorem 9.6 will also be the main step
towards the proof of Theorem 9.2.
Corollary 9.8 (Cyclic spaces). Let U : H → H be a unitary operator on
a complex Hilbert space such that H is cyclic with respect to U, with
\[ H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle} \]
for some v ∈ H. Then there is a uniquely determined finite measure $\mu_v$
on T, called the spectral measure of v with respect to U, such that U is
unitarily isomorphic to the multiplication operator $M_{\chi_1}$ on $L^2(\mathbb{T}, \mu_v)$ with
the vector v ∈ H corresponding to $1 \in L^2(\mathbb{T}, \mu_v)$.
If $\phi : H \to L^2(\mathbb{T}, \mu_v)$ is the unitary isomorphism as in the diagram then
‘corresponding’ here means that φ(v) = 1. In other words, for cyclic Hilbert
spaces we have a commutative diagram
\[
\begin{array}{ccc}
H & \xrightarrow{\ U\ } & H \\
\Big\downarrow{\scriptstyle \phi} & & \Big\downarrow{\scriptstyle \phi} \\
L^2(\mathbb{T}, \mu_v) & \xrightarrow{\ M_{\chi_1}\ } & L^2(\mathbb{T}, \mu_v)
\end{array}
\]
of unitary maps.
Together with the discussion before Theorem 9.2, we see that spectral
measures should be thought of as a replacement for, or a generalization of,
eigenvalues. As we will see during the proof, the spectral measure $\mu_v$ stores
precisely the values of the inner products $\langle U^n v, v \rangle$ for a given unitary
operator U on a Hilbert space H and vector v ∈ H. In the case of an eigenvector
with eigenvalue $\lambda_0 = e^{2\pi i x_0}$, we obtain the Dirac measure $\|v\|^2 \delta_{x_0}$. At the
opposite extreme, it could be that the vectors $\dots, U^{-1}v, v, Uv, \dots$ are all
mutually orthogonal, in which case $\mu_v$ is the multiple $\|v\|^2 m_{\mathbb{T}}$ of the Lebesgue
measure.
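In finite dimensions the spectral measure can be written down by hand. For a diagonal unitary matrix U with diagonal entries $e^{2\pi i x_k}$ on $\mathbb{C}^d$, the measure $\sum_k |v_k|^2 \delta_{x_k}$ has Fourier coefficients $\langle U^n v, v \rangle$, so by the uniqueness in Herglotz's theorem it is the spectral measure of v. The following sketch checks this identity numerically; the function names are hypothetical, and the restriction to diagonal matrices is an assumption made to keep the example short.

```python
import cmath
from typing import Dict, Sequence


def chi(n: int, x: float) -> complex:
    """Character chi_n(x) = exp(2 pi i n x) on T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)


def spectral_measure(xs: Sequence[float], v: Sequence[complex]) -> Dict[float, float]:
    """Atoms and masses of mu_v for U = diag(chi_1(x_k)) acting on C^d."""
    mu: Dict[float, float] = {}
    for x, vk in zip(xs, v):
        mass = abs(vk) ** 2
        if mass > 0:
            mu[x] = mu.get(x, 0.0) + mass  # equal eigenvalues share one atom
    return mu


def orbit_inner_product(n: int, xs: Sequence[float], v: Sequence[complex]) -> complex:
    """<U^n v, v>, with the inner product linear in the first argument."""
    u = [chi(1, x) ** n * vk for x, vk in zip(xs, v)]  # the vector U^n v
    return sum(uk * complex(vk).conjugate() for uk, vk in zip(u, v))


def fourier_coefficient(n: int, mu: Dict[float, float]) -> complex:
    """p_n(mu) = integral of chi_n against the atomic measure mu."""
    return sum(mass * chi(n, x) for x, mass in mu.items())
```

If v is an eigenvector, say v = 2e₁ with eigenvalue $e^{2\pi i x_0}$, then `spectral_measure` returns the single atom x0 with mass 4 = ‖v‖², matching the Dirac measure described above.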
Proof of Corollary 9.8. Let v ∈ H be a generator of
\[ H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle}. \]
By Example 9.5, we know that if $p_n(v) = \langle U^n v, v \rangle_H$ for n ∈ Z then $(p_n)$ is a
positive-definite sequence. By Theorem 9.6 there exists a uniquely determined
finite measure $\mu_v$ on T with
\[ p_n(v) = p_n(\mu_v) = \int \chi_n \, d\mu_v \]
for all n ∈ Z. We wish to define a unitary map
\[ \phi : H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle} \longrightarrow L^2(\mathbb{T}, \mu_v) \]
to be the unique extension of the map that sends any finite linear combination
of the vectors $U^n v$ to the corresponding trigonometric polynomial,
\[ \phi : \sum_{|n| \le N} c_n U^n v \longmapsto \sum_{|n| \le N} c_n \chi_n. \]
While this is a natural attempt at defining the map φ, it is not clear whether
it produces a well-defined map. Curiously (at first encounter), this will follow
from the map being an isometry: for any finite complex sequence $(c_n)$, we
have
\begin{align*}
\Bigl\| \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\|_H^2
 &= \sum_{m,n \in \mathbb{Z}} \langle c_m U^m v, c_n U^n v \rangle_H
  = \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \underbrace{\langle U^{m-n} v, v \rangle_H}_{p_{m-n}(v) = p_{m-n}(\mu_v)} \\
 &= \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \int \chi_{m-n} \, d\mu_v \\
 &= \int \Bigl( \sum_{m \in \mathbb{Z}} c_m \chi_m \Bigr) \overline{\Bigl( \sum_{n \in \mathbb{Z}} c_n \chi_n \Bigr)} \, d\mu_v
  = \Bigl\| \sum_{n \in \mathbb{Z}} c_n \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)}^2.
\end{align*}
We now show that this implies that φ is well-defined on the set of finite linear
combinations by the following argument. If $\sum_{n \in \mathbb{Z}} c_n U^n v = \sum_{n \in \mathbb{Z}} c'_n U^n v$ for
finite complex sequences $(c_n)$ and $(c'_n)$, then
\[ 0 = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c'_n) U^n v \Bigr\|_H = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c'_n) \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)} \]
and so $\sum_{n \in \mathbb{Z}} c_n \chi_n = \sum_{n \in \mathbb{Z}} c'_n \chi_n$ in $L^2(\mathbb{T}, \mu_v)$. Clearly the map φ so defined
is now an isometry on a dense subspace of $H_v$, and so extends by the
automatic extension to the closure (Proposition 2.59) to an isometry from $H_v$
into $L^2(\mathbb{T}, \mu_v)$. Furthermore, the image of φ contains all trigonometric
polynomials on T, which form a dense subset of C(T) by Proposition 3.65
(and thus also of $L^2(\mathbb{T}, \mu_v)$). Since $H_v = H$ is a Hilbert space and φ is an
isometry, we see that $\phi(H_v) \subseteq L^2(\mathbb{T}, \mu_v)$ is complete and dense, and hence
equal to $L^2(\mathbb{T}, \mu_v)$.
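It remains to check that $\phi \circ U = M_{\chi_1} \circ \phi$. Let $(c_n)$ be a finite complex
sequence. Then
\[ U \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n U^{n+1} v, \]
and so
\[ \phi \Bigl( U \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n \chi_{n+1} = \chi_1 \sum_{n \in \mathbb{Z}} c_n \chi_n = M_{\chi_1} \phi \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr). \]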
That is, the desired formula holds on a dense subset of Hv and so by continuity
on all of Hv . This proves the corollary.
Proof of Theorem 9.2. Let U be a unitary operator on a complex separable
Hilbert space H. By the discussion after Definition 9.7, H can be written
as a direct sum of closed mutually orthogonal cyclic subspaces. Applying
Corollary 9.8 to each of them proves the theorem.
Exercise 9.9. (a) Let $H_v \cong L^2(\mathbb{T}, \mu_v)$ be as in Corollary 9.8. Let w ∈ Hv and suppose
that f ∈ L2 (T, µv ) corresponds to w. Characterize the property Hv = Hw in terms of f .
(b) Apply (a) to the unitary operator defined by $U((a_n)) = (a_{n-1})$ on $H = \ell^2(\mathbb{Z})$ and the
vector $(v_n)$ with $v_n = 0$ for $n \neq 0$ and $v_0 = 1$.
The spectral theory of unitary operators has the spectral theory of self-
adjoint operators as a consequence, as the next exercise shows. However, we
will also give an independent and much more detailed treatment of this theory
in Chapter 12.
Exercise 9.10. (a) For any bounded operator A : V → V on a Banach space V and any
power series $f(z) = \sum_{k=0}^{\infty} c_k z^k$ whose radius of convergence is bigger than $\|A\|$, show
that the natural definition of f (A) as the limit of the sequence of operators obtained as
partial sums makes sense. Show that if $g(z) = \sum_{n=0}^{\infty} d_n (z - c_0)^n$ is the inverse function
to f defined in a neighbourhood of $f(0) = c_0$ (represented by another power series)
and $\sum_{n=0}^{\infty} |d_n| \bigl( \sum_{k=0}^{\infty} |c_k| \|A\|^k \bigr)^n < \infty$, then we have g(f (A)) = A.
(b) Let A : H → H be a non-zero self-adjoint operator. Replacing A by $\frac{1}{2\|A\|} A$ we may
assume that $\|A\| = \frac12$. Apply part (a) to A and the power series corresponding to $e^{iz}$ to
obtain a unitary operator U : H → H. Show that $\|U - I\| \le e^{1/2} - 1 < 1$, and that A can
be recovered from U via the power series representing $\frac{1}{i} \log(z)$ in a neighbourhood of 1.
(c) Apply Theorem 9.2 to U and show that one can describe A on H by a direct sum of
multiplication operators as in Exercise 6.25(b). In fact, for each of the direct summands
the measure space can be chosen to be a copy of R together with a measure supported
in $[-\|A\|, \|A\|]$ and the multiplication operator can be chosen to be $M_I(f)(x) = x f(x)$.
For simplicity we have been working in this section with a single unitary
operator (or the group Z) but the approach can be generalized to several
commuting unitary operators (the group Zd ) as outlined in the following
exercise.
Exercise 9.11. (a) Define positive-definite functions on Zd (so that the sequence case
corresponds to d = 1), and generalize Herglotz’s theorem to this context.
(b) State and prove a corollary to part (a) regarding the spectral theory of d commuting
unitary operators, so that Theorem 9.2 corresponds to the case d = 1.
9.1.3 Spectral Measures
In this subsection we will strengthen the spectral theorem for unitary op-
erators by studying the sequence of spectral measures appearing in it more
carefully. We start with a few immediate consequences of the definition of
the spectral measures.
Lemma 9.12 (Behaviour of spectral measures). Let U : H → H be a
unitary operator on a separable complex Hilbert space H.
(a) For any v ∈ H we have $\mu_v(\mathbb{T}) = \|v\|^2$.
(b) If v, w ∈ H satisfy $H_v \perp H_w$, then $\mu_{v+w} = \mu_v + \mu_w$.
(c) If $w \in H_v$ then $\mu_w \ll \mu_v$.
(d) If $v_1, v_2, \ldots \in H$ are such that the cyclic spaces that they generate are
orthogonal, $w = \sum_k w_k \in \bigoplus_{k=1}^{\infty} H_{v_k}$, and $w_k$ corresponds to the function
$f_k \in L^2(\mathbb{T}, \mu_{v_k})$ for all $k \ge 1$, then $d\mu_w = \sum_{k=1}^{\infty} |f_k|^2 \, d\mu_{v_k}$.
(e) If $\mu_v \perp \mu_w$ then $H_v \perp H_w$.
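Proof. (a) By definition we have $\mu_v(\mathbb{T}) = \int \chi_0 \, d\mu_v = \langle U^0 v, v \rangle = \|v\|^2$.
(b) Assume that $H_v \perp H_w$. Then
\[
\begin{aligned}
\int \chi_n \, d\mu_{v+w} &= \langle U^n (v + w), v + w \rangle_H \\
&= \langle U^n v, v \rangle_H + \langle U^n w, w \rangle_H = \int \chi_n \, d\mu_v + \int \chi_n \, d\mu_w
\end{aligned}
\]
for all n ∈ Z, which implies that $\mu_{v+w} = \mu_v + \mu_w$ by uniqueness of the
spectral measures (Corollary 9.8).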
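(c) Suppose that $w \in H_v$ and recall that $H_v \cong L^2(\mathbb{T}, \mu_v)$ with U corresponding
to $M_{\chi_1}$ on $L^2(\mathbb{T}, \mu_v)$. If now $f \in L^2(\mathbb{T}, \mu_v)$ is the image of w under the
unitary isomorphism, then
\[ \langle U^n w, w \rangle_H = \langle M_{\chi_1}^n f, f \rangle_{L^2(\mathbb{T}, \mu_v)} = \int \chi_n |f|^2 \, d\mu_v \]
for all n ∈ Z implies that $d\mu_w = |f|^2 \, d\mu_v$, and in particular $\mu_w \ll \mu_v$.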

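(d) Let $w = \sum_{k=1}^{\infty} w_k$ with $w_k \in H_{v_k}$ for all $k \ge 1$, so that $\sum_{k=1}^{\infty} \|w_k\|^2 < \infty$.
Then $\mu = \sum_{k=1}^{\infty} \mu_{w_k}$ is finite by (a). For any $k \ge 1$, let $f_k \in L^2(\mathbb{T}, \mu_{v_k})$ be
the function corresponding to $w_k \in H_{v_k} \cong L^2(\mathbb{T}, \mu_{v_k})$, so $d\mu_{w_k} = |f_k|^2 \, d\mu_{v_k}$
by the argument in (c). We now have
\[ \langle U^n w, w \rangle_H = \sum_{k=1}^{\infty} \langle U^n w_k, w_k \rangle_H = \sum_{k=1}^{\infty} \int \chi_n |f_k|^2 \, d\mu_{v_k} = \int \chi_n \, d\mu \]
for all n ∈ Z, which implies (d).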

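(e) Assume that $\mu_v \perp \mu_w$ and write w = x + y with $x \in H_v$ and $y \in H_v^{\perp}$. Then we
have $H_x \subseteq H_v$ and $H_y \subseteq H_v^{\perp}$ so that $\mu_w = \mu_x + \mu_y$ by (b). However, (c) states
that $\mu_x \ll \mu_v$, which implies that $\mu_x = 0$ since $\mu_x \le \mu_w$ and $\mu_w \perp \mu_v$. Thus x = 0 by (a)
and $w = y \in H_v^{\perp}$. As $H_v^{\perp}$ is a closed subspace that is invariant under U and $U^{-1}$,
it follows that $H_w \subseteq H_v^{\perp}$.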

It is easy to check that the relation ν ≪ µ ≪ ν of mutual absolute continu-
ity for measures µ and ν is an equivalence relation on the space of measures
of a given measurable space, giving rise to the measure equivalence class of
a given measure.
Corollary 9.13 (Maximal spectral type). Let H and U be as in the spec-

tral theorem (Theorem 9.2). The sequence of spectral measures in the spectral
theorem can be chosen so that µ1 ≫ µ2 ≫ · · · . In this case µ1 is the maximal
spectral measure in the sense that µv ≪ µ1 for any v ∈ H, and this property
uniquely characterizes the measure equivalence class of µ1 , which is called the
maximal spectral type of U .
Proof†. In the proof of Theorem 9.2 (more precisely, in the argument after
Definition 9.7), we found a sequence of vectors $w_1, w_2, \ldots$ in H such that
\[ H = \bigoplus_{n \ge 1} H_{w_n} \]
(where the sum is possibly finite). Each of the vectors has a spectral measure
$\nu_n = \mu_{w_n}$ for $n \ge 1$. Applying the Lebesgue decomposition theorem (from
Proposition 3.29) to $\nu_1$ and $\nu_n$ for $n \ge 2$ we define
\[ \nu_n = \nu_n^{ac} + \nu_n^{\perp} \tag{9.1} \]
with $\nu_n^{ac} \ll \nu_1$ and $\nu_n^{\perp} \perp \nu_1$. From $\nu_n^{\perp} \perp \nu_1$ it follows that there exists some
measurable $B_n \subseteq \mathbb{T}$ such that $\nu_n^{\perp}(B_n) = 0$ and $\nu_1(\mathbb{T} \setminus B_n) = 0$, which implies
with (9.1) that $\nu_n^{ac} = \nu_n|_{B_n}$ and $\nu_n^{\perp} = \nu_n|_{\mathbb{T} \setminus B_n}$. We will use the set $B_n$ to
decompose $w_n$ into two components. In fact, under the unitary isomorphism
between $H_{w_n}$ and $L^2(\mathbb{T}, \nu_n)$, let $w_n^{\perp} \in H_{w_n}$ be the vector corresponding
to $c_n 1_{\mathbb{T} \setminus B_n}$, where $c_n > 0$ is chosen so that $\|w_n^{\perp}\| \le \frac{1}{2^n}$.
Let $w = w_1 + \sum_{n \ge 2} w_n^{\perp}$, which converges absolutely. Using Lemma 9.12(d)
we now find that
\[ d\mu_w = d\nu_1 + \sum_{n \ge 2} c_n^2 1_{\mathbb{T} \setminus B_n} \, d\nu_n , \]
or equivalently
\[ \mu_w = \nu_1 + \sum_{n \ge 2} c_n^2 \nu_n^{\perp} . \]
From this we see that $\nu_n = \nu_n^{ac} + \nu_n^{\perp} \ll \mu_w$ for all $n \ge 2$, and for n = 1 this
holds trivially. In particular, Lemma 9.12(d) now shows that $\mu_v \ll \mu_w$ for
any $v \in H = \bigoplus_{n \ge 1} H_{w_n}$, as claimed in the corollary. It is easy to see that
this property uniquely characterizes the measure class of $\mu_w$.
We claim now that $H_{w_1} \subseteq H_w$. To see this, notice that by construction
$\nu_1 \perp \sum_{n \ge 2} c_n^2 \nu_n^{\perp}$ and let $B \subseteq \mathbb{T}$ have $\nu_1(\mathbb{T} \setminus B) = 0 = \nu_n^{\perp}(B)$ for all $n \ge 2$.
Then by Corollary 9.8 and Lemma 9.12(d), $H_w$ contains an element x corresponding
to $1_B \in L^2(\mathbb{T}, \mu_w)$ with spectral measure $\mu_x = \mu_w|_B = \nu_1$
and an element y = w − x corresponding to $1_{\mathbb{T} \setminus B}$ with spectral measure
$\mu_y = \sum_{n \ge 2} c_n^2 \nu_n^{\perp}$. Since $\mu_y \perp \nu_1 = \mu_{w_1}$ we obtain from Lemma 9.12(e)
that $y \in H_{w_1}^{\perp}$. The same argument shows that $x \in H_{w_1}$ even though
it requires some additional thought: Recall from our construction above
that $w_n^{\perp} \in H_{w_n}$ satisfies $\mu_{w_n^{\perp}} \perp \mu_{w_1} = \nu_1$. Applying Lemma 9.12(c)
and (d), this gives $\mu_z \perp \nu_1$ for all $z \in \bigoplus_{n \ge 2} H_{w_n^{\perp}}$. Since $\mu_x = \nu_1$ we
may apply Lemma 9.12(e) again to see that $x \perp \bigoplus_{n \ge 2} H_{w_n^{\perp}}$. Moreover, we
have $x \in H_w \subseteq H_{w_1} \oplus \bigoplus_{n \ge 2} H_{w_n^{\perp}}$, which implies that $x \in H_{w_1}$. Therefore,
we have
\[
\begin{aligned}
w &= w_1 + \sum_{n \ge 2} w_n^{\perp} && \text{(by construction of } w\text{)} \\
&= x + y && \text{(by construction of } x, y\text{)}
\end{aligned}
\]
with $x \in H_{w_1}$ and $y \in H_{w_1}^{\perp}$, so that $w_1 = x \in H_w$ and hence also $H_{w_1} \subseteq H_w$,
as claimed.
† The corollary will not be needed in the remainder of the book, and the reader may skip
its proof.
The corollary follows by repeating this argument inductively. For this,
we note that we can describe the space $H_w^{\perp}$ in the next step as a sum of
cyclic spaces such that the first space is generated by the orthogonal projection
$w_2'$ of $w_2$ to $H_w^{\perp}$. Repeating now the argument above constructs a new
vector $w'$ such that $\mu_v \ll \mu_{w'}$ for all $v \in H_w^{\perp}$ and its cyclic subspace contains
the vector $w_2'$. It follows that $w_2$ belongs to the sum $H_w \oplus H_{w'}$. Continuing
in this way, we can make sure that the direct sum of $H_w, H_{w'}, \ldots$ still
contains $w_1, w_2, \ldots$ and so coincides with H. By construction, the spectral
measures of $w, w', w'', \ldots$ have the claimed absolute continuity property.
The next two exercises extend the discussion above, and we invite the
reader to consider in particular the case in which the measures arising are
atomic, so that we are simply discussing eigenvalues and their multiplicities.
Exercise 9.14. Given a unitary operator U on a separable complex Hilbert space H and
a finite Borel measure ρ on T, write Hρ = {v ∈ H | µv ≪ ρ}. Show that Hρ is a closed
subspace of H that is invariant under U and U ∗ , and that we have µw ⊥ ρ for any
vector w ∈ (Hρ )⊥ .
Exercise 9.15. (a) Given finite measures µ and ν on T with µ ≪ ν ≪ µ, show that the
corresponding multiplication operators from Example 9.1 are unitarily isomorphic.
(b) Reformulate Corollary 9.13 to show the existence of a sequence of finite measures $(\nu_n)$
and a measure $\nu_\infty$ with $\nu_m \perp \nu_n$ for all $m \neq n$ with $m, n \in \mathbb{N} \cup \{\infty\}$ with the property
that $H \cong \bigoplus_{n \ge 1} L^2(\mathbb{T}, \nu_n)^n \oplus L^2(\mathbb{T}, \nu_\infty)^{\mathbb{N}}$ and the isomorphism carries U to the sum of
the associated multiplication operators.
(c) Show that the unitary isomorphism from (b) takes $H^{(1)} = \{ v \in H \mid \mu_v \perp \mu_w \ \forall\, w \in H_v^{\perp} \}$
to $L^2(\mathbb{T}, \nu_1)$, and hence in particular deduce that $H^{(1)}$ is a closed subspace.
(d) Show that the unitary isomorphism in (b) takes the closed subspace
\[ H^{(2)} = \bigl\{ v \in H \mid \exists\, v_2 \text{ with } H_v \perp H_{v_2},\ \mu_v = \mu_{v_2} \text{ and } \mu_w \perp \mu_v \ \forall\, w \in (H_v \oplus H_{v_2})^{\perp} \bigr\} \]
to $L^2(\mathbb{T}, \nu_2)^2$.
(e) Generalize (d) to higher multiplicities and conclude that the sequence of the measure
classes of $\nu_1, \nu_2, \ldots, \nu_\infty$ and subspaces in H corresponding to $L^2(\mathbb{T}, \nu_n)^n$ resp. $L^2(\mathbb{T}, \nu_\infty)^{\mathbb{N}}$
are uniquely determined by U .
9.1.4 Functional Calculus for Unitary Operators
As in Exercise 9.10 it is relatively straightforward to obtain a definition

of h(A) for an analytic function h (defined by a power series) and bounded
operator A (whose norm is less than the radius of convergence of the power
series). For a multiplication operator Mg as in Exercise 6.25 one can go much
further. For example, one could define h(Mg ) by setting it equal to Mh◦g for
any bounded measurable function h. The reader should verify at this point
that this definition does generalize the prior definition for analytic functions
to measurable functions. Since Theorem 9.2 and Exercise 9.10 describe ar-
bitrary unitary or self-adjoint operators in terms of multiplication operators,
this allows one to also define the operators obtained by applying h to these.
However, from this definition it is not clear whether the result is independent
of the choices made to describe the operator on H as a sum of multiplication
operators. As it turns out, this is the case, and we will discuss this ‘functional
calculus’ in greater detail and in a more general setting in Chapter 12. Here
we aim to give a first taste of this theory, by discussing simpler instances of
the results for a single unitary operator.
For the discussion in this subsection it is convenient to use χ1 as an iso-
morphism from T to S1 = {z ∈ C | |z| = 1}, and transport the spectral
measures µv on T provided by Corollary 9.8 to S1 . We will still use the same
symbol for these measures so that their characterizing property becomes
\[ \langle U^n v, v \rangle_H = \int_{S^1} z^n \, d\mu_v(z) \]
for v ∈ H and n ∈ Z. Note that the multiplication operator in Corollary 9.8

then has the form MI (f )(z) = zf (z) for all z ∈ S1 and f ∈ L2 (S1 , µv ),
where I(z) = z denotes the identity map on C.
To obtain a good definition of h(U ) the trick is to define more general
spectral measures.
Definition 9.16. Let U : H → H be a unitary operator. A complex-valued

measure µv,w on S1 is the spectral measure of v, w ∈ H with respect to U if
\[ \int_{S^1} z^n \, d\mu_{v,w}(z) = \langle U^n v, w \rangle_H \tag{9.2} \]
for all n ∈ Z.
Proposition 9.17 (Non-diagonal spectral measures). Let U : H → H

be a unitary operator on a separable complex Hilbert space H. For every v, w
in H the spectral measure µv,w exists and is uniquely determined by v and w.
Moreover, the spectral measure depends linearly on v and semi-linearly on w.
For every $h \in \mathscr{L}^\infty(S^1)$ there exists a bounded operator h(U ) : H → H that is
characterized by the property
\[ \langle h(U) v, w \rangle_H = \int_{S^1} h \, d\mu_{v,w} \]
for all v, w ∈ H. If $H \cong \bigoplus_{n \ge 1} L^2(S^1, \mu_n)$ as in Theorem 9.2, then h(U )
corresponds to the direct sum of the multiplication operators $M_h$ on $L^2(S^1, \mu_n)$
for $n \ge 1$.
The idea behind the proof of this corollary to Bochner’s theorem is simple,
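and relies on the polarization identity
\[ \langle U^n v, w \rangle_H = \frac14 \sum_{\ell=0}^{3} i^{\ell} \bigl\langle U^n (v + i^{\ell} w), v + i^{\ell} w \bigr\rangle_H \tag{9.3} \]
which is easily checked and gives the existence, by setting
\[ \mu_{v,w} = \frac14 \sum_{\ell=0}^{3} i^{\ell} \mu_{v + i^{\ell} w} . \tag{9.4} \]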
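The polarization identity (9.3) and the resulting formula (9.4) can be checked numerically. The following is a small sketch, again assuming a hypothetical diagonal unitary operator on C^3 (the vectors `V`, `W` and the angles `THETAS` are made up for illustration). For such an operator the measure µv,w has the atom v_j · conj(w_j) at the j-th eigenvalue, and (9.4) reduces to the scalar polarization identity at each atom.

```python
import cmath
import math

# Assumed toy model (not from the text): U = diag(exp(2*pi*i*theta_j)) on C^3.
THETAS = [0.1, 0.35, 0.8]
V = [1 + 1j, 2.0, -0.5j]
W = [0.3, -1j, 2 + 1j]


def inner(x, y):
    """Inner product, linear in x and conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply_power(n, x):
    """Apply U^n to x entrywise."""
    return [cmath.exp(2j * math.pi * n * th) * xj for th, xj in zip(THETAS, x)]


def polarized(n):
    """Right-hand side of (9.3)."""
    total = 0j
    for ell in range(4):
        u = [a + (1j ** ell) * b for a, b in zip(V, W)]
        total += (1j ** ell) * inner(apply_power(n, u), u)
    return total / 4


# (9.3): <U^n v, w> agrees with the polarized sum for several n.
identity_errors = [
    abs(inner(apply_power(n, V), W) - polarized(n)) for n in range(-4, 5)
]

# (9.4): the atoms of mu_{v,w} built from the diagonal measures mu_{v + i^l w}
# agree with the direct description v_j * conj(w_j).
atoms_from_94 = [
    sum((1j ** ell) * abs(a + (1j ** ell) * b) ** 2 for ell in range(4)) / 4
    for a, b in zip(V, W)
]
atoms_direct = [a * complex(b).conjugate() for a, b in zip(V, W)]
atom_errors = [abs(p - q) for p, q in zip(atoms_from_94, atoms_direct)]

print(max(identity_errors), max(atom_errors))
```

The convention used here (linear in the first argument, conjugate-linear in the second) matches the statement that µv,w depends linearly on v and semi-linearly on w.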
An important case is to consider the characteristic functions h = 1B for a

measurable set B ⊆ T (see the proof of Corollary 9.13 and Exercise 9.21). In
particular, for every Borel subset B ⊆ S1 there exists an orthogonal projection
operator E(B) = 1B (U ).
Definition 9.18. The function E : B(S1 ) → B(H) is called a projection-
valued measure.
For a given measurable B ⊆ S1 one should think of E(B) as the projection
operator that projects onto the closed subspace on which all ‘generalized
eigenvalues belong to B’.
Proof of Proposition 9.17. Since the spectral measures $\mu_{v + i^{\ell} w}$ for the vectors
$v + i^{\ell} w \in H$ exist by Corollary 9.8, equation (9.3) shows that $\mu_{v,w}$ as
defined in (9.4) satisfies the desired relationship. Since trigonometric polynomials
(which on $S^1$ are linear combinations of $z^n$ for n ∈ Z) are dense
in $C(S^1)$ by Proposition 3.65, and complex-valued measures are naturally
identified with linear functionals on $C(S^1)$ (by Theorem 7.54), it follows that
the spectral measure $\mu_{v,w}$ is uniquely determined by (9.2). Since the inner
product is sesqui-linear, this also implies the claimed sesqui-linearity.
We claim that
\[ \|\mu_{v,w}\| \le 4 \|v\| \|w\| \tag{9.5} \]
(see Exercise 3.33 and Theorem 7.54 for the norm of $\mu_{v,w}$, and also Exercise 9.19
for the correct upper bound). First note that by sesqui-linearity $\mu_{tv,sw}$
is equal to $t \overline{s} \mu_{v,w}$ for all t, s ∈ C, and in particular $\mu_{v,w} = 0$ if one of
the vectors is zero. This allows us to assume that v, w ∈ H are non-zero.
Multiplying v by $\sqrt{\|w\| / \|v\|}$ and w by the inverse of this factor changes
neither $\mu_{v,w}$ nor the product $\|v\| \|w\|$, so we may assume without loss of
generality that $\|v\| = \|w\|$.
Using now the definition of $\mu_{v,w}$ in (9.4) with these new vectors and the
formula $\mu_{v + i^{\ell} w}(S^1) = \|v + i^{\ell} w\|^2 \le 4 \|v\|^2$ for $0 \le \ell \le 3$, we obtain (9.5).
Now let $h \in \mathscr{L}^\infty(S^1)$ so that
\[ \Bigl| \int_{S^1} h \, d\mu_{v,w} \Bigr| \le 4 \|h\|_\infty \|v\| \|w\| . \]
Fix v ∈ H and consider the function $\ell_{h,v}$ defined by
\[ \ell_{h,v}(w) = \int_{S^1} h(z) \, d\mu_{v,w}(z) \]
for w ∈ H. Then $w \mapsto \ell_{h,v}(w)$ is a bounded semi-linear functional on H, and
so by the Fréchet–Riesz representation theorem (Corollary 3.19) there exists
some $v_h \in H$ with $\ell_{h,v}(w) = \langle v_h, w \rangle$ for all w ∈ H. Moreover,
\[ \|v_h\| = \|\ell_{h,v}\| \le 4 \|h\|_\infty \|v\| \]
and $v_h$ depends linearly on v. Hence $h(U) v = v_h$ defines a bounded linear
operator with $\|h(U)\|_{\mathrm{op}} \le 4 \|h\|_\infty$.
We note that $v' \in H_v$ and $w \in H_v^{\perp}$ implies $\langle U^n v', w \rangle_H = 0$ for all n ∈ Z,
hence $\mu_{v',w} = 0$ and so $\langle h(U) v', w \rangle_H = 0$ for all $h \in \mathscr{L}^\infty(S^1)$. As this holds
for all $v' \in H_v$ and $w \in H_v^{\perp}$, we see that $h(U) H_v \subseteq H_v$. Thus for the proof
of the description of h(U ) via the spectral theorem it suffices to describe the
restriction of h(U ) to a cyclic representation, or equivalently describe $h(M_I)$
on $L^2(S^1, \mu)$ for some finite measure µ on $S^1$. If $f, g \in L^2(S^1, \mu)$, then
\[ \langle M_I^n f, g \rangle_{L^2(S^1, \mu)} = \int_{S^1} z^n f \overline{g} \, d\mu \]
for all n ∈ Z. Hence $d\mu_{f,g} = f \overline{g} \, d\mu$ is the spectral measure of f, g with respect
to $M_I$, giving
\[ \langle h(M_I) f, g \rangle_{L^2(S^1, \mu)} = \int_{S^1} h f \overline{g} \, d\mu = \langle M_h f, g \rangle_{L^2(S^1, \mu)} . \]
This implies the proposition.
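In finite dimensions the functional calculus and the projections E(B) = 1_B(U) are easy to write down. The sketch below assumes a hypothetical diagonal unitary operator on C^3, so that h(U) simply multiplies the j-th coordinate by h(z_j); the test function `h` and the set B (the open upper half of the circle) are illustrative choices, not taken from the text.

```python
import cmath
import math

# Assumed toy model (not from the text): U = diag(z_1, z_2, z_3) with z_j on S^1.
THETAS = [0.1, 0.35, 0.8]
EIGS = [cmath.exp(2j * math.pi * th) for th in THETAS]
V = [1 + 1j, 2.0, -0.5j]
W = [0.3, -1j, 2 + 1j]


def inner(x, y):
    """Inner product, linear in x and conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def h_of_U(h, x):
    """h(U) x for the diagonal U: multiply the j-th entry by h(z_j)."""
    return [h(z) * xj for z, xj in zip(EIGS, x)]


def integrate_vw(h):
    """Integral of h against mu_{v,w} = sum_j v_j conj(w_j) delta_{z_j}."""
    return sum(h(z) * a * complex(b).conjugate() for z, a, b in zip(EIGS, V, W))


def h(z):
    """A hypothetical bounded test function on S^1."""
    return z ** 2 + 3 / z


def indicator(z):
    """Characteristic function of B = open upper half of S^1."""
    return 1.0 if z.imag > 0 else 0.0


def complement(z):
    """Characteristic function of the complement of B."""
    return 1.0 - indicator(z)


# Characterizing property of h(U): <h(U) v, w> equals the integral of h d mu_{v,w}.
calculus_error = abs(inner(h_of_U(h, V), W) - integrate_vw(h))

# E(B) = 1_B(U) is a projection, and E(B) v + E(S^1 \ B) v recovers v.
Ev = h_of_U(indicator, V)
EEv = h_of_U(indicator, Ev)
recombined = [a + b for a, b in zip(Ev, h_of_U(complement, V))]
projection_error = max(abs(a - b) for a, b in zip(Ev, EEv))
partition_error = max(abs(a - b) for a, b in zip(recombined, V))

print(calculus_error, projection_error, partition_error)
```

Here E(B)v keeps exactly the coordinates of v whose eigenvalue lies in B, which is the finite-dimensional meaning of projecting onto the subspace whose eigenvalues belong to B.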
Exercise 9.19. Improve the estimate in (9.5) to the inequality $\|\mu_{v,w}\| \le \|v\| \|w\|$ for
all v, w ∈ H.
Exercise 9.20. Let U be a unitary operator on a separable complex Hilbert space and h
a function in $\mathscr{L}^\infty(S^1)$. When is h(U ) unitary or self-adjoint? What is the norm $\|h(U)\|_{\mathrm{op}}$?
Exercise 9.21. Let U be a unitary operator on a separable complex Hilbert space H.
Suppose that we are given a decomposition $S^1 = \bigsqcup_k B_k$ for some (finite or countable)
list of measurable sets $B_k$. Let v ∈ H. Show that $v = \sum_k E(B_k) v$. Describe the spectral
measure $\mu_{E(B_k) v}$ for all k in terms of the spectral measure $\mu_v$ and explain the meaning
of $E(B_k) v$ in the case when $\mu_v$ is atomic.
9.1.5 An Application of Spectral Theory to Dynamics
In this subsection we will use the spectral theory of unitary operators in

the context of measure-preserving systems (see Definition 8.35). In fact, we
will assume here that X is a compact metric space, µ is a Borel probability
measure on X, and that T : X → X is measure-preserving, continuous, and
invertible, so that T −1 : X → X is also continuous. This guarantees that the

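operator $U_T : L^2(X, \mu) \to L^2(X, \mu)$ defined by $U_T f = f \circ T$ is unitary, since
\[ \langle U_T f, g \rangle = \int_X \bigl( (f \circ T) \, \overline{g} \bigr) \circ T^{-1} \, d\mu = \int_X f \, \overline{(g \circ T^{-1})} \, d\mu = \langle f, U_{T^{-1}} g \rangle \]
for all $f, g \in L^2(X, \mu)$, so that $U_T^* = U_{T^{-1}} = U_T^{-1}$.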
Definition 9.22. Let (X, µ, T ) be an invertible measure-preserving system

as above. Then we say that T has purely discrete spectrum if UT is diagon-
alizable (equivalently, if for any f ∈ L2 (X, µ) the spectral measure µf with
respect to UT is atomic). We say that T has Lebesgue spectrum if for any
\[ f \in L^2_0(X, \mu) = \Bigl\{ f \in L^2(X, \mu) \Bigm| \int f \, d\mu = 0 \Bigr\} \]
the spectral measure µf with respect to UT is absolutely continuous with

respect to the Lebesgue measure on T.
Essential Exercise 9.23. (a) Prove that the rotation Rα : x 7→ x + α

for x, α ∈ T has purely discrete spectrum with respect to the Lebesgue meas-
ure on T.
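(b) Prove that the map
\[ A : \begin{pmatrix} x \\ y \end{pmatrix} \longmapsto \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ x + y \end{pmatrix} \]
preserves Lebesgue measure on $\mathbb{T}^2$ and has Lebesgue spectrum.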
We recall that if (Z, B) and (Z ′ , B ′ ) are measurable spaces, µ is a meas-

ure on Z and φ : Z → Z ′ is a measurable map, then we can define the
push-forward measure $\phi_* \mu$ on $Z'$ by the formula $\phi_* \mu(B') = \mu(\phi^{-1}(B'))$ for
all sets B ′ ∈ B ′ . Using this notion, invariance of a measure µ under a
map T : X → X is precisely the condition that the push-forward T∗ µ coin-
cides with the original measure µ.
Definition 9.24 (Furstenberg [37]). Let (X, νX , T ) and (Y, νY , S) be two

measure-preserving systems. A joining of these systems is a measure ρ on
the product X × Y with (T × S)∗ ρ = ρ, (πX )∗ ρ = νX , and (πY )∗ ρ = νY
where πX , πY denote the natural projections from X × Y onto X, Y respect-
ively. The two systems are called disjoint, written T ⊥ S, if ρ = νX × νY is
the only possible joining.
Furstenberg introduced the notion of joinings and also gave the first classes
of disjoint systems.
Theorem 9.25 (Disjointness for spectral reasons). Suppose (X, νX , T )

and (Y, νY , S) are two measure-preserving systems. Suppose moreover that for
any f in L20 (X, νX ) and any g in L20 (Y, νY ) the spectral measures µf (with
respect to UT on L2 (X, νX )) and µg (with respect to US on L2 (Y, νY )) are
mutually singular. Then the two systems are disjoint.
Proof. Let ρ be a joining of the two systems. For every $f \in L^2(X, \nu_X)$ we
have $f \circ \pi_X \in L^2(X \times Y, \rho)$ with $\|f \circ \pi_X\|_{L^2(X \times Y, \rho)} = \|f\|_{L^2(X, \nu_X)}$ and
\[ U_{T \times S}(f \circ \pi_X) = U_T(f) \circ \pi_X . \]
Moreover,
\[ \bigl\langle U_{T \times S}^n (f \circ \pi_X), f \circ \pi_X \bigr\rangle_{L^2(X \times Y, \rho)} = \langle U_T^n f, f \rangle_{L^2(X, \nu_X)} \]
for all n ∈ Z, which implies that the spectral measure $\mu_{f \circ \pi_X}$ defined using
the unitary operator $U_{T \times S} : L^2(X \times Y, \rho) \to L^2(X \times Y, \rho)$ agrees with $\mu_f$. A
similar statement holds for $g \in L^2(Y, \nu_Y)$.
Applying this to f ∈ L20 (X, νX ) and g ∈ L20 (Y, νY ), using the assumption in
the theorem and Lemma 9.12(e) we see that f ◦ πX ⊥ g ◦ πY . Now let A ⊆ X
and B ⊆ Y be measurable sets and define f = 1A − νX (A) ∈ L20 (X, νX )
and g = 1B − νY (B) ∈ L20 (Y, νY ) to obtain
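\[
\begin{aligned}
0 = \langle f \circ \pi_X, g \circ \pi_Y \rangle &= \langle 1_{A \times Y} - \nu_X(A), 1_{X \times B} - \nu_Y(B) \rangle \\
&= \langle 1_{A \times Y}, 1_{X \times B} \rangle - \nu_X(A) \langle 1, 1_{X \times B} \rangle - \nu_Y(B) \langle 1_{A \times Y}, 1 \rangle + \nu_X(A) \nu_Y(B) \\
&= \rho(A \times B) - \nu_X(A) \nu_Y(B),
\end{aligned}
\]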
where all the inner products are taken in L2 (X × Y, ρ). As this holds for all
measurable A ⊆ X and B ⊆ Y , we deduce that ρ = νX × νY .
We wrap up the discussion of disjointness, and our excursion into ergodic
theory, by discussing a consequence of disjointness for the dynamics of indi-
vidual points.
Let X be a compact metric space, T : X → X a continuous map, and µ
a T -invariant and ergodic probability measure on X. A consequence of the
pointwise ergodic theorem (one of the fundamental results in ergodic theory,
see [27, Ch. 2, Sec. 4.4.2]) is that µ-almost every point x ∈ X satisfies
\[ \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \longrightarrow \int_X f \, d\mu \]
as N → ∞ for all f ∈ C(X), in which case x is called µ-generic. Exer-

cise 8.39 and the map in the proof of Proposition 8.34 give examples of
systems in which every point is generic for the (Lebesgue) measure on the
space. However, for the map A : T2 → T2 in Exercise 9.23(b) it is easy to
see that rational points in Q2 /Z2 are not generic. There are also irrational
non-generic points, but as the Lebesgue measure m on T2 is invariant and
ergodic for A, one obtains from the ergodic theorem that m-almost every
point in T2 is m-generic. With these examples in mind, we can now give a

pointwise corollary of the discussion of spectral measures in the following
exercise.
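The notion of a generic point can be explored numerically. The following sketch computes Birkhoff averages for two of the maps mentioned above: an irrational rotation of the circle (with the assumed angle α = √2 and the assumed test function cos(2πx)), and the toral automorphism A from Exercise 9.23(b) started at the rational point (1/2, 1/2). All function names are illustrative. Exact rational arithmetic is used for the second orbit so that its periodicity is not blurred by rounding.

```python
import math
from fractions import Fraction


def birkhoff_average(f, step, x0, n_terms):
    """Average of f along the first n_terms points of the orbit of x0 under step."""
    total = 0.0
    x = x0
    for _ in range(n_terms):
        total += f(x)
        x = step(x)
    return total / n_terms


ALPHA = math.sqrt(2)          # assumed irrational rotation angle


def rotate(x):
    """Rotation R_alpha on T = R/Z."""
    return (x + ALPHA) % 1.0


def f_circle(x):
    """Test function on T with integral 0 against Lebesgue measure."""
    return math.cos(2 * math.pi * x)


def cat_map(p):
    """The automorphism A(x, y) = (y, x + y) on T^2, in exact arithmetic."""
    x, y = p
    return (y % 1, (x + y) % 1)


def f_torus(p):
    """Test function on T^2 with integral 0 against Lebesgue measure."""
    return math.cos(2 * math.pi * float(p[0]))


# For the rotation the averages approach the integral 0.
avg_rotation = birkhoff_average(f_circle, rotate, 0.0, 10_000)

# The rational point (1/2, 1/2) has period 3 under A, and the averages of
# f_torus along its orbit stay at -1/3 instead of approaching 0.
start = (Fraction(1, 2), Fraction(1, 2))
avg_rational = birkhoff_average(f_torus, cat_map, start, 300)

print(avg_rotation, avg_rational)
```

A single test function cannot prove genericity, which requires convergence for all f ∈ C(X), but one function for which the averages fail to converge to the integral is enough to show that a point is not generic.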
Essential Exercise 9.26. Let X, Y be compact metric spaces, T : X → X
and S : Y → Y continuous maps, and νX ∈ P T (X), νY ∈ P S (Y ) be invariant
and ergodic probability measures. Suppose that the two systems are disjoint.
(a) Let (ρn ) be a sequence of probability measures on X × Y with the prop-
erty that (πX )∗ ρn → νX and (πY )∗ ρn → νY as n → ∞ with respect to the
weak* topology. Suppose that any limit limk→∞ ρnk of a weak* convergent
subsequence is T × S-invariant. Show that limn→∞ ρn = νX × νY .
(b) Let x ∈ X be νX -generic for T and let y ∈ Y be νY -generic for S. Use (a)
to show that (x, y) is νX × νY -generic for T × S.
(c) Show that the result of (b) applies in particular to the case of the ro-
tation T = Rα : T → T and the automorphism S = A : T2 → T2 from
Exercise 9.23.
9.2 The Fourier Transform
The Fourier transform generalizes the (important and satisfying) theory of

Fourier series on Td to an (equally important and satisfying) theory for func-
tions on Rd , which will in particular lead to a generalization of the spectral
theory of unitary operators (Theorem 9.2) to unitary flows (Theorem 9.58).
The analogue of the Fourier coefficients of a function on $\mathbb{T}^d$ will be the Fourier
transform $\hat{f}$ of a function f on $\mathbb{R}^d$, defined by
\[ \hat{f}(t) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx, \tag{9.6} \]
where $x, t \in \mathbb{R}^d$ and $x \cdot t = x_1 t_1 + \cdots + x_d t_d$ is the usual inner product. The
analogue of the Fourier series will be the Fourier back transform (or reverse
transform) $\check{h}$ of a function h on $\mathbb{R}^d$, defined by
\[ \check{h}(x) = \int_{\mathbb{R}^d} h(t) e^{2\pi i x \cdot t} \, dt. \tag{9.7} \]
The analogue of the fact that the Fourier series represents the original function
(where this is true) is a Fourier inversion formula $f = (\hat{f})^{\vee}$. However,
the way in which the optimistic identity $f = (\hat{f})^{\vee}$ needs to be interpreted
as a mathematical theorem is more involved. For example, if $f \in L^2(\mathbb{R}^d)$
then there is no reason to expect the integral defining the Fourier transform
in (9.6) to exist. However, we will still be able to obtain a sensible definition
of the Fourier transform as an extension of a densely defined bounded
operator.
We also note that we will think of $x \in \mathbb{R}^d$ as the space variable and
of $t \in \mathbb{R}^d$ as the frequency variable. In fact, any $t \in \mathbb{R}^d$ defines the wave
function $x \mapsto e^{2\pi i x \cdot t}$ for which t gives the frequency and the direction of
the wave. In that sense, $\hat{f}(t)$ should be interpreted as the correlation of the
function f and the wave with frequency t. The formula $f = (\hat{f})^{\vee}$ then means
that one can reconstruct the original function f as a suitable superposition
of the waves with frequency t and amplitude $\hat{f}(t)$.
We start with a concrete example.
Example 9.27 (Gaussian distribution). Let $f(x) = e^{-\pi \|x\|^2}$ for $x \in \mathbb{R}^d$.
Then $\hat{f}(t) = e^{-\pi \|t\|^2}$.
Proof. Suppose first that d = 1, and start by calculating $\hat{f}(0)$. By definition,
\[ \hat{f}(0) = \int_{\mathbb{R}} e^{-\pi x^2} \, dx. \]
Thus
\[
\begin{aligned}
\hat{f}(0)^2 &= \int_{\mathbb{R}^2} e^{-\pi x^2} e^{-\pi y^2} \, dx \, dy && \text{(by Fubini)} \\
&= \int_0^{\infty} \int_0^{2\pi} e^{-\pi r^2} r \, d\theta \, dr && \text{(in polar coordinates)} \\
&= 2\pi \int_0^{\infty} e^{-\pi r^2} r \, dr = \int_0^{\infty} e^{-s} \, ds = 1 && \text{(where } \pi r^2 = s\text{)}
\end{aligned}
\]
and as $\hat{f}(0) > 0$ we get $\hat{f}(0) = 1$. To verify the claimed formula for a general t
in R we will use Cauchy's integral theorem for complex path integrals
applied to the holomorphic function $\mathbb{C} \ni z \mapsto e^{-\pi z^2}$. We integrate over
a rectangular path γ with corners at ±M and ±M + it as illustrated in
Figure 9.1.
[Fig. 9.1: The contour γ, a rectangle with corners −M, M, M + it and −M + it.]

Then by Cauchy's integral theorem $\oint_{\gamma} e^{-\pi z^2} \, dz = 0$. Bringing the integrals over the
third and the fourth piece of the path to the other side we obtain
\[ \int_{-M}^{M} e^{-\pi x^2} \, dx + i \int_0^t e^{-\pi (M + is)^2} \, ds = i \int_0^t e^{-\pi (-M + is)^2} \, ds + \int_{-M}^{M} e^{-\pi (x + it)^2} \, dx. \tag{9.8} \]
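Now notice that
\[ e^{-\pi (x + is)^2} = e^{-\pi (x^2 - s^2 + 2isx)} \]
which implies that for fixed t ∈ R and $|s| \le |t|$ we have
\[ \bigl| e^{-\pi (\pm M + is)^2} \bigr| \le e^{-\pi (M^2 - t^2)} , \]
so that $e^{-\pi (\pm M + is)^2} \to 0$ uniformly for $|s| \le |t|$ as M → ∞. Thus
letting M → ∞ in (9.8) we see that
\[ 1 = \int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = \underbrace{\int_{-\infty}^{\infty} e^{-\pi x^2} e^{-2\pi i t x} \, dx}_{\hat{f}(t)} \; e^{\pi t^2} , \]
which gives $\hat{f}(t) = e^{-\pi t^2}$ for all t ∈ R.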
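For d > 1 notice that
\[ f(x) = e^{-\pi \|x\|^2} = e^{-\pi x_1^2} \cdots e^{-\pi x_d^2} = f_1(x_1) \cdots f_1(x_d) \]
is a product of d copies of the function $f_1(x) = e^{-\pi x^2}$ discussed above, so
\[
\begin{aligned}
\hat{f}(t) &= \int_{\mathbb{R}^d} f_1(x_1) \cdots f_1(x_d) e^{-2\pi i (x_1 t_1 + \cdots + x_d t_d)} \, dx \\
&= \underbrace{\int_{\mathbb{R}} f_1(x_1) e^{-2\pi i x_1 t_1} \, dx_1}_{= \hat{f}_1(t_1)} \cdots \underbrace{\int_{\mathbb{R}} f_1(x_d) e^{-2\pi i x_d t_d} \, dx_d}_{= \hat{f}_1(t_d)} \\
&= e^{-\pi t_1^2} \cdots e^{-\pi t_d^2} = e^{-\pi \|t\|^2}
\end{aligned}
\]
by Fubini's theorem and the case d = 1.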
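The identity in Example 9.27 can be sanity-checked numerically for d = 1. The following is a minimal sketch that approximates the integral (9.6) by the trapezoid rule on a truncated interval. The function name `fourier_transform`, the truncation at |x| = 8 and the step size are assumptions made here for illustration; they are harmless for the Gaussian because it decays so quickly.

```python
import cmath
import math


def fourier_transform(f, t, half_width=8.0, step=0.01):
    """Trapezoid-rule approximation of the integral (9.6) for d = 1."""
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        x = k * step
        weight = 0.5 if abs(k) == n else 1.0   # half weight at the two endpoints
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * step


def gaussian(x):
    """The function f(x) = exp(-pi x^2) from Example 9.27 with d = 1."""
    return math.exp(-math.pi * x * x)


# Compare the numerical transform with the claimed closed form exp(-pi t^2).
errors = [
    abs(fourier_transform(gaussian, t) - math.exp(-math.pi * t * t))
    for t in (0.0, 0.5, 1.0, 2.0)
]
print(max(errors))
```

The printed error should be at the level of floating-point rounding. The normalization with 2π in the exponent of (9.6) is what makes the Gaussian exactly its own transform.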
9.2.1 The Fourier Transform on L1 (Rd )
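Lemma 9.28 (Basic inequality). The Fourier transform in (9.6) is defined
for every $f \in L^1(\mathbb{R}^d)$ and $t \in \mathbb{R}^d$ and satisfies $\|\hat{f}\|_\infty \le \|f\|_1$.
Proof. For $f \in L^1(\mathbb{R}^d)$ and $t \in \mathbb{R}^d$ we have
\[ |\hat{f}(t)| = \Bigl| \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx \Bigr| \le \int_{\mathbb{R}^d} |f(x)| \, dx = \|f\|_1 , \]
which proves the lemma.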

The next result is the first of many duality principles involving Fourier
transforms.
Proposition 9.29 (Duality between shift and phase shift). For $x_0$
and $t_0 \in \mathbb{R}^d$ we define the shift operator $\lambda_{x_0}$ and the multiplication operator
$M_{\chi(t_0)}$ on $L^1(\mathbb{R}^d)$ by
\[ \lambda_{x_0}(f) : x \longmapsto f(x - x_0) \]
and
\[ M_{\chi(t_0)}(f) : x \longmapsto e^{2\pi i x \cdot t_0} f(x). \]
Then $\widehat{\lambda_{x_0}(f)} = M_{\chi(-x_0)}(\hat{f})$ and $\widehat{M_{\chi(t_0)}(f)} = \lambda_{t_0}(\hat{f})$.
We note that by a phase shift we mean multiplication by a character, the

reason being this proposition.
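Proof of Proposition 9.29. By definition,
\[
\begin{aligned}
\widehat{\lambda_{x_0}(f)}(t) &= \int_{\mathbb{R}^d} f(x - x_0) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f(y) e^{-2\pi i (y + x_0) \cdot t} \, dy = e^{-2\pi i x_0 \cdot t} \hat{f}(t) = M_{\chi(-x_0)}(\hat{f})(t)
\end{aligned}
\]
for all $x_0, t \in \mathbb{R}^d$ and
\[
\begin{aligned}
\widehat{M_{\chi(t_0)}(f)}(t) &= \int_{\mathbb{R}^d} e^{2\pi i x \cdot t_0} f(x) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot (t - t_0)} \, dx = \hat{f}(t - t_0) = \lambda_{t_0}(\hat{f})(t)
\end{aligned}
\]
for all $t_0, t \in \mathbb{R}^d$.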
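Proposition 9.30 (Duality for linear transformations). Let $f \in L^1(\mathbb{R}^d)$
and let $A \in \mathrm{GL}_d(\mathbb{R})$ be an invertible matrix. Then $f \circ A \in L^1(\mathbb{R}^d)$, and
\[ \widehat{f \circ A} = \frac{1}{|\det A|} \, \hat{f} \circ (A^t)^{-1} . \]
Proof. We use the definition and a substitution to get
\[
\begin{aligned}
\widehat{f \circ A}(t) &= \int_{\mathbb{R}^d} f(Ax) e^{-2\pi i x \cdot t} \, dx \\
&= \frac{1}{|\det A|} \int_{\mathbb{R}^d} f(Ax) |\det A| e^{-2\pi i (Ax) \cdot (A^t)^{-1} t} \, dx \\
&= \frac{1}{|\det A|} \hat{f}\bigl( (A^t)^{-1} t \bigr)
\end{aligned}
\]
for all $t \in \mathbb{R}^d$.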
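Propositions 9.29 and 9.30 can be checked numerically in dimension d = 1, where the matrix A is a non-zero scalar a and $(A^t)^{-1} t = t/a$. The sketch below uses a trapezoid-rule approximation of (9.6); the test function, the parameters `X0`, `T0`, `A_SCALE` and the helper name `ft` are assumptions chosen for illustration.

```python
import cmath
import math


def ft(f, t, half_width=10.0, step=0.01):
    """Trapezoid-rule approximation of the Fourier transform (9.6) for d = 1."""
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        x = k * step
        weight = 0.5 if abs(k) == n else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * step


def f(x):
    """A hypothetical integrable test function (a Gaussian times a polynomial)."""
    return (1.0 + x) * math.exp(-math.pi * x * x)


X0, T0, A_SCALE = 0.7, -0.4, 2.5   # assumed shift, phase and scaling parameters


def shifted(x):
    """lambda_{x0}(f)."""
    return f(x - X0)


def modulated(x):
    """M_{chi(t0)}(f)."""
    return cmath.exp(2j * math.pi * x * T0) * f(x)


def scaled(x):
    """f composed with the 1 x 1 matrix A = (a)."""
    return f(A_SCALE * x)


t = 0.3
# Proposition 9.29: a shift becomes a phase shift, and a phase shift becomes a shift.
shift_error = abs(ft(shifted, t) - cmath.exp(-2j * math.pi * X0 * t) * ft(f, t))
phase_error = abs(ft(modulated, t) - ft(f, t - T0))
# Proposition 9.30 for d = 1: transform of f(a x) is |a|^{-1} times f-hat at t / a.
scale_error = abs(ft(scaled, t) - ft(f, t / A_SCALE) / abs(A_SCALE))

print(shift_error, phase_error, scale_error)
```

All three errors should be tiny, since the quadrature is very accurate for rapidly decaying smooth functions; this is a numerical illustration and not a substitute for the proofs above.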
Proposition 9.31 (Duality of convolution and multiplication (I)).
For $f_1, f_2 \in L^1(\mathbb{R}^d)$ recall that the convolution $f_1 * f_2 \in L^1(\mathbb{R}^d)$ defined
by
\[ f_1 * f_2(x) = \int_{\mathbb{R}^d} f_1(y) f_2(x - y) \, dy \]
satisfies $\|f_1 * f_2\|_1 \le \|f_1\|_1 \|f_2\|_1$ and $f_1 * f_2 = f_2 * f_1$ (so that $L^1(\mathbb{R}^d)$ is a
commutative Banach algebra). The Fourier transform of $f_1 * f_2$ is given by
\[ \widehat{f_1 * f_2} = \hat{f}_1 \hat{f}_2 . \]
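Proof. Applying Fubini's theorem and a substitution we see that
\[
\begin{aligned}
\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dy \, dx &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dx \, dy \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y)| |f_2(z)| \, dz \, dy = \|f_1\|_1 \|f_2\|_1 .
\end{aligned}
\]
Thus the integral defining $f_1 * f_2(x)$ exists for almost every $x \in \mathbb{R}^d$, and
\[ \|f_1 * f_2\|_1 \le \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dy \, dx = \|f_1\|_1 \|f_2\|_1 . \]
For commutativity we see that
\[ f_1 * f_2(x) = \int f_1(y) f_2(x - y) \, dy = \int f_1(x - z) f_2(z) \, dz = f_2 * f_1(x) \]
by using the substitution z = x − y for any fixed $x \in \mathbb{R}^d$.
Now let $t \in \mathbb{R}^d$ and apply Fubini's theorem to the definition of $\widehat{f_1 * f_2}(t)$
to see that
\[
\begin{aligned}
\widehat{f_1 * f_2}(t) &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f_1(y) f_2(x - y) \, dy \, e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f_1(y) \underbrace{\int_{\mathbb{R}^d} f_2(x - y) e^{-2\pi i (x - y) \cdot t} \, dx}_{\hat{f}_2(t)} \, e^{-2\pi i y \cdot t} \, dy = \hat{f}_1(t) \hat{f}_2(t).
\end{aligned}
\]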
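The duality of convolution and multiplication can also be observed numerically for d = 1. The sketch below replaces all integrals by Riemann sums on a fixed grid; the two Gaussian-type test functions, the grid on [−6, 6] and the helper names are assumptions made for illustration, chosen so that truncation errors are negligible.

```python
import cmath
import math

STEP = 0.05
GRID = [k * STEP for k in range(-120, 121)]   # sample points in [-6, 6]


def f1(x):
    """First hypothetical test function."""
    return math.exp(-math.pi * x * x)


def f2(x):
    """Second hypothetical test function (a shifted, narrower Gaussian)."""
    return math.exp(-2.0 * math.pi * (x - 0.3) ** 2)


def riemann_sum(values):
    """Riemann-sum approximation of an integral over the grid."""
    return STEP * sum(values)


def transform(samples, t):
    """Approximate the Fourier transform (9.6) for d = 1 from samples on GRID."""
    return riemann_sum(
        s * cmath.exp(-2j * math.pi * x * t) for s, x in zip(samples, GRID)
    )


f1_samples = [f1(x) for x in GRID]
f2_samples = [f2(x) for x in GRID]
# Convolution f1 * f2 evaluated at every grid point.
conv_samples = [riemann_sum(f1(y) * f2(x - y) for y in GRID) for x in GRID]

# Transform of the convolution versus the product of the transforms.
errors = [
    abs(transform(conv_samples, t) - transform(f1_samples, t) * transform(f2_samples, t))
    for t in (0.0, 0.4, 1.0)
]

# Norm inequality ||f1 * f2||_1 <= ||f1||_1 ||f2||_1 (an equality here, since
# both test functions are non-negative).
l1_conv = riemann_sum(abs(c) for c in conv_samples)
l1_bound = riemann_sum(f1_samples) * riemann_sum(f2_samples)

print(max(errors), l1_conv, l1_bound)
```

The direct evaluation of the convolution costs a number of operations quadratic in the grid size, which is acceptable for this small illustration.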
Exercise 9.32. Show that the convolution in Proposition 9.31 is associative.
The impatient reader may use the propositions above together with Ex-
ample 9.27 to show that the Fourier transform extends to an isometry
from L2 (Rd ) to L2 (Rd ) via the steps of the following exercise.
Exercise 9.33. (a) Show that
\[ \mathcal{A} = \Bigl\{ x \longmapsto \sum_{\text{finite}} c_i e^{-\pi a_i \|x - x_i\|^2 + 2\pi i x \cdot t_i} \Bigm| c_i \in \mathbb{C},\ a_i > 0,\ x_i, t_i \in \mathbb{R}^d \Bigr\} \]
is a sub-algebra of $C_0(\mathbb{R}^d)$ that separates points and is closed under conjugation.
(b) Show that $\widehat{\mathcal{A}} = \mathcal{A}$ and that $f = (\hat{f})^{\vee} = (\check{f})^{\wedge}$ for all $f \in \mathcal{A}$.
(c) Show that $\mathcal{A} \subseteq C_0(\mathbb{R}^d)$ is dense with respect to $\|\cdot\|_\infty$.
(d) Show that $\mathcal{A} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in both $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$ with respect
to the norms $\|\cdot\|_1$ and $\|\cdot\|_2$ respectively (which is not an immediate consequence of (c)
since $\mathbb{R}^d$ has infinite Lebesgue measure). Show that if $F \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ and ε > 0
then there exists a single function $f \in \mathcal{A}$ with $\|f - F\|_1 < \varepsilon$ and $\|f - F\|_2 < \varepsilon$.
(e) Show that $\|\hat{f}\|_2 = \|f\|_2$ for all $f \in \mathcal{A}$ so that the Fourier transform extends to a unitary
map on $L^2(\mathbb{R}^d)$ with inverse given by the Fourier back transform. Moreover, the extension
agrees with the Fourier transform defined by the Lebesgue integral on $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$.
Proposition 9.34 (Riemann–Lebesgue lemma). The Fourier transform
maps $L^1(\mathbb{R}^d)$ into $C_0(\mathbb{R}^d)$.
Proof. If $t_n \to t$ in $\mathbb{R}^d$ as n → ∞, then also $f(x) e^{-2\pi i x \cdot t_n} \to f(x) e^{-2\pi i x \cdot t}$
for almost every $x \in \mathbb{R}^d$ and so
\[ \hat{f}(t_n) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t_n} \, dx \longrightarrow \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx = \hat{f}(t) \]
as n → ∞ by the dominated convergence theorem. Therefore $\hat{f}$ is a bounded
continuous function on $\mathbb{R}^d$ by Lemma 9.28. It remains to show that $\hat{f}$ lies
in $C_0(\mathbb{R}^d)$, which we will do by an approximation argument.
Suppose first that $f = 1_{[a_1, b_1] \times \cdots \times [a_d, b_d]}$ is the characteristic function of a
rectangle. Then, by Fubini's theorem,
\[
\begin{aligned}
\hat{f}(t) &= \int_{\mathbb{R}^d} 1_{[a_1, b_1]}(x_1) \cdots 1_{[a_d, b_d]}(x_d) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{a_1}^{b_1} e^{-2\pi i x_1 t_1} \, dx_1 \cdots \int_{a_d}^{b_d} e^{-2\pi i x_d t_d} \, dx_d .
\end{aligned}
\]
Each factor can be calculated explicitly, in fact
\[ \int_a^b e^{-2\pi i x t} \, dx = \begin{cases} \dfrac{e^{-2\pi i b t} - e^{-2\pi i a t}}{-2\pi i t} & \text{for } t \neq 0, \text{ and} \\ b - a & \text{for } t = 0, \end{cases} \tag{9.9} \]
so each factor lies in $C_0(\mathbb{R})$. It follows that $\hat{f} \in C_0(\mathbb{R}^d)$ if f is the characteristic
function of a rectangle.
By linearity the same holds for finite linear combinations of such functions.
Since $C_0(\mathbb{R}^d)$ is complete with respect to $\|\cdot\|_\infty$ and the Fourier
transform is continuous from $L^1(\mathbb{R}^d)$ to $C_b(\mathbb{R}^d)$, the same holds for any element
$f \in L^1(\mathbb{R}^d)$ that can be approximated in $L^1(\mathbb{R}^d)$ by such finite linear
combinations, which is all of $L^1(\mathbb{R}^d)$ (see the argument on p. 172).
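The closed form (9.9) makes the decay in the Riemann–Lebesgue lemma explicit for the indicator of an interval: its modulus is at most 1/(π|t|). The following sketch, with an assumed interval [−1, 2] and illustrative function names, compares (9.9) with a midpoint-rule quadrature and tabulates the decay.

```python
import cmath
import math


def indicator_transform(a, b, t):
    """Closed form (9.9) for the Fourier transform of the indicator of [a, b]."""
    if t == 0:
        return complex(b - a)
    numerator = cmath.exp(-2j * math.pi * b * t) - cmath.exp(-2j * math.pi * a * t)
    return numerator / (-2j * math.pi * t)


def midpoint_transform(a, b, t, pieces=20000):
    """Midpoint-rule approximation of the same integral, for comparison."""
    h = (b - a) / pieces
    return h * sum(
        cmath.exp(-2j * math.pi * (a + (k + 0.5) * h) * t) for k in range(pieces)
    )


A, B = -1.0, 2.0   # assumed interval
quadrature_error = abs(indicator_transform(A, B, 0.7) - midpoint_transform(A, B, 0.7))

# Decay: |transform(t)| <= 1 / (pi |t|), and also <= b - a by Lemma 9.28.
decay = [(t, abs(indicator_transform(A, B, t))) for t in (1.5, 10.5, 100.5, 1000.5)]
for t, size in decay:
    print(t, size, 1.0 / (math.pi * t))
```

The decay is only of order 1/|t|, which is consistent with Exercise 9.35: this transform is continuous and vanishes at infinity, but it is not integrable.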
Exercise 9.35. Show that the Fourier transform calculated in (9.9) does not belong
to L1 (R).
As mentioned above, we will show that the Fourier back transform is the
inverse of the Fourier transform. However, as we will see, this requires additional
assumptions on the function, since the hypothesis $f \in L^1(\mathbb{R}^d)$ does
not imply that $\hat{f} \in L^1(\mathbb{R}^d)$ (as seen in Exercise 9.35), so there is no reason
to expect that the Fourier back transform will be defined on $\hat{f}$.
Theorem 9.36 (Fourier inversion). If $f \in L^1(\mathbb{R}^d)$ has $\hat{f} \in L^1(\mathbb{R}^d)$,
then f agrees almost everywhere with the continuous function $(\hat{f})^{\vee} \in C_0(\mathbb{R}^d)$.
Despite the additional assumption, this theorem already implies that
any $L^1$ function is uniquely determined by its Fourier transform. However, if
the latter is not integrable, it is unclear how to recover f from $\hat{f}$.
Corollary 9.37 (Injectivity). If $f_1, f_2 \in L^1(\mathbb{R}^d)$ have $\hat{f}_1 = \hat{f}_2$, then $f_1 = f_2$.
Proof. Given $f_1, f_2 \in L^1(\mathbb{R}^d)$ as in the corollary, the function $f = f_1 - f_2$
satisfies $\hat{f} = 0 \in L^1(\mathbb{R}^d)$. Applying Theorem 9.36 this implies that f = 0 and
proves the corollary.
In order to prove Theorem 9.36 we need a preparatory lemma.
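Lemma 9.38. If $f, g \in L^1(\mathbb{R}^d)$ then
\[ \int_{\mathbb{R}^d} \hat{f} g \, dx = \int_{\mathbb{R}^d} f \hat{g} \, dy. \]
Proof. Once again, this is a simple application of Fubini's theorem, as
\[
\begin{aligned}
\int_{\mathbb{R}^d} \hat{f}(x) g(x) \, dx &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y) e^{-2\pi i y \cdot x} \, dy \, g(x) \, dx \\
&= \int_{\mathbb{R}^d} f(y) \int_{\mathbb{R}^d} g(x) e^{-2\pi i y \cdot x} \, dx \, dy = \int_{\mathbb{R}^d} f(y) \hat{g}(y) \, dy.
\end{aligned}
\]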
Proof of Theorem 9.36. Let $f \in L^1(\mathbb{R}^d)$ such that $\hat{f} \in L^1(\mathbb{R}^d)$. By Proposition 9.34
we have $(\hat{f})^{\vee} \in C_0(\mathbb{R}^d)$. We need to show that $(\hat{f})^{\vee}$ agrees with f
almost everywhere. To achieve this, we will use Lemma 9.38 for f and the
phase-shifted stretched Gaussian distribution
\[ \phi_{r,x_0}(t) = e^{2\pi i t \cdot x_0} \phi_r(t) \quad \text{where} \quad \phi_r(t) = e^{-\pi \|rt\|^2} \]
for $x_0 \in \mathbb{R}^d$ and r > 0. Using Lemma 9.38 we can define the function $f_r$ in
two equivalent ways by
\[ f_r(x_0) = \int_{\mathbb{R}^d} \hat{f}(t) \phi_{r,x_0}(t) \, dt = \int_{\mathbb{R}^d} f(x) \widehat{\phi_{r,x_0}}(x) \, dx \tag{9.10} \]
for all $x_0 \in \mathbb{R}^d$. We will use the two sides of this formula to show that $f_r$
converges as r → 0 both to $(\hat{f})^{\vee}$ and to f (in two different ways).
Pointwise convergence. We first show that $f_r \to (\hat f)^{\vee}$ pointwise as $r \to 0$ (where we will use the left-hand integral in (9.10)). Since
\[ f_r(x_0) = \int_{\mathbb{R}^d} \hat f(t) e^{2\pi i t\cdot x_0} e^{-\pi \|rt\|^2}\,dt \]
and $e^{-\pi \|rt\|^2} \to 1$ as $r \to 0$, we obtain
\[ f_r(x_0) \longrightarrow \int_{\mathbb{R}^d} \hat f(t) e^{2\pi i t\cdot x_0}\,dt = (\hat f)^{\vee}(x_0) \]
by the dominated convergence theorem, for any $x_0 \in \mathbb{R}^d$.

Convergence in $L^1$. We next show that $f_r \to f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$ (which will use the right-hand integral in (9.10)).
The proof of this step is a bit more involved and relies on the interpretation of the right-hand integral as a convolution with the approximate identity $\widehat{\phi_r}$ (it is easy to see, from the proof that follows for example, that $\widehat{\phi_r}$ has properties similar to the Fejér kernel in Section 3.4.2 and the function used in Exercise 5.17; see also Exercise 8.6). By Example 9.27 and Proposition 9.30 we have
\[ \widehat{\phi_r}(x) = r^{-d} e^{-\pi \|x/r\|^2}, \]
and by Proposition 9.29 we also have
\[ \widehat{\phi_{r,x_0}}(x) = r^{-d} e^{-\pi \|(x - x_0)/r\|^2}. \]
This gives
\[ f_r(x_0) = \int_{\mathbb{R}^d} f(x) r^{-d} e^{-\pi \|(x_0 - x)/r\|^2}\,dx = f * \widehat{\phi_r}(x_0) \]
and $f * \widehat{\phi_r} \in L^1(\mathbb{R}^d)$ by Proposition 9.31. We may bring the difference of $f_r$ and $f$ at $x_0$ into the form
\[
f * \widehat{\phi_r}(x_0) - f(x_0)
= \int f(x_0 - x) r^{-d} e^{-\pi \|x/r\|^2}\,dx - f(x_0)
= \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi \|z\|^2}\,dz
\]
by using the substitution $z = x/r$ (and recalling that $\int_{\mathbb{R}^d} e^{-\pi \|z\|^2}\,dz = 1$). On taking the norm we obtain
\[
\| f * \widehat{\phi_r} - f \|_1
= \int \left| \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi \|z\|^2}\,dz \right| dx_0
\le \iint | f(x_0 - rz) - f(x_0) |\, e^{-\pi \|z\|^2}\,dz\,dx_0
\le \int \| \lambda_{rz}(f) - f \|_1\, e^{-\pi \|z\|^2}\,dz
\]
by Fubini's theorem. Now notice that $\| \lambda_{rz}(f) - f \|_1 \to 0$ as $r \to 0$ for $z \in \mathbb{R}^d$ by Lemma 3.74. Therefore
\[ \| f * \widehat{\phi_r} - f \|_1 \longrightarrow 0 \]
as $r \to 0$ by dominated convergence, which proves the claimed convergence in $L^1$.
The limits coincide. Since every sequence that converges in L1 (Rd ) has a
subsequence that converges almost everywhere (see, for example, the proof
of completeness of Lp spaces on p. 34), the two statements regarding the
convergence properties of fr as r → 0 together prove the theorem.
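The $L^1$ convergence step can be illustrated numerically. The following is only a sketch under stated assumptions: we take $d = 1$ and the specific choice $f = \mathbf{1}_{[0,1]}$ (not from the text), for which $f * \widehat{\phi_r}$ has a closed form in terms of the error function, and we approximate the $L^1$ norm by a midpoint rule on a truncated interval. The helper names are hypothetical.

```python
import math

def smoothed_indicator(x, r):
    """(f * phi_r_hat)(x) for f = indicator of [0, 1] and the Gaussian
    kernel r^{-1} exp(-pi (x/r)^2), evaluated in closed form via erf."""
    s = math.sqrt(math.pi) / r
    return 0.5 * (math.erf(s * x) - math.erf(s * (x - 1.0)))

def l1_error(r, lo=-3.0, hi=4.0, step=1e-3):
    """Midpoint-rule approximation of || f * phi_r_hat - f ||_1.
    The interval [lo, hi] is an assumption; the integrand is negligible outside."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        f = 1.0 if 0.0 <= x <= 1.0 else 0.0
        total += abs(smoothed_indicator(x, r) - f) * step
    return total

# The error should shrink as r -> 0.
errors = [l1_error(r) for r in (0.5, 0.1, 0.02)]
print(errors)
```

The printed values decrease with $r$, in line with the dominated convergence argument above; the example says nothing about the rate for general $f$.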
9.2.2 The Fourier Transform on L2 (Rd )
As mentioned before, the Fourier transform behaves quite well on $L^2(\mathbb{R}^d)$ (after overcoming the minor obstacle that the defining integral does not make sense).
Theorem 9.39 (Plancherel formula). If $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, then $\hat f$ lies in $L^2(\mathbb{R}^d)$ with $\|\hat f\|_2 = \|f\|_2$ and the map $f \mapsto \hat f$ extends continuously to a unitary operator on $L^2(\mathbb{R}^d)$ (whose inverse is the continuous extension of $f \mapsto \check f$).
Proof: A dense subspace. We define the space of functions
\[ V = \{ f \in L^1(\mathbb{R}^d) \mid \hat f \in L^1(\mathbb{R}^d) \}. \]
By Theorem 9.36 we have $f = (\hat f)^{\vee} \in C_0(\mathbb{R}^d)$ almost everywhere for $f \in V$. Hence $V \subseteq L^\infty(\mathbb{R}^d)$ consists of essentially bounded functions and so
\[ |f|^2 = f \overline{f} \in L^1(\mathbb{R}^d) \]
for all $f \in V$, so $V \subseteq L^2(\mathbb{R}^d)$. We claim that $V$ is actually dense in $L^2(\mathbb{R}^d)$. For this, notice first that $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in $L^2(\mathbb{R}^d)$ (since it contains simple functions), so that it is enough to approximate a given function in the intersection $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ by an element of $V$ with respect to $\|\cdot\|_2$.
Using the same notation as in the proof of Theorem 9.36, we already know that $f * \widehat{\phi_r}$ converges to $f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$. By Proposition 9.31 we see that
\[ \widehat{f * \widehat{\phi_r}} = \hat f\, \phi_r \]
belongs to $L^1(\mathbb{R}^d)$ (since it is a product of an element in $C_0(\mathbb{R}^d)$ with an element of $L^1(\mathbb{R}^d)$), which shows that $f * \widehat{\phi_r} \in V$. Analyzing the argument, we see that only a slight modification is necessary to also obtain $L^2$ convergence of $f * \widehat{\phi_r} \to f$ as $r \to 0$. Indeed, using Jensen's inequality (see the first paragraph of the proof of Lemma 3.75) with the probability measure $e^{-\pi |z|^2}\,dz$ on $\mathbb{R}^d$ we get
\[
\| f * \widehat{\phi_r} - f \|_2^2
= \int \left| \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi |z|^2}\,dz \right|^2 dx_0
\le \iint | \lambda_{rz} f(x_0) - f(x_0) |^2 e^{-\pi |z|^2}\,dz\,dx_0
= \int \| \lambda_{rz} f - f \|_2^2\, e^{-\pi |z|^2}\,dz \longrightarrow 0
\]
as $r \to 0$, again by dominated convergence. This gives the claimed density.
Unitarity. We will now show that the Fourier transform preserves the inner product. For this, let $f, g \in V$ and define $h = \overline{\hat g}$. Then
\[
\hat h(x) = \int_{\mathbb{R}^d} \overline{\hat g(t)}\, e^{-2\pi i x\cdot t}\,dt
= \overline{ \int_{\mathbb{R}^d} \hat g(t) e^{2\pi i x\cdot t}\,dt }
= \overline{ (\hat g)^{\vee}(x) } = \overline{g(x)}
\]
almost everywhere by Theorem 9.36. Applying Lemma 9.38 we see that
\[
\langle f, g \rangle_{L^2(\mathbb{R}^d)}
= \int_{\mathbb{R}^d} f \overline{g}\,dx
= \int_{\mathbb{R}^d} f \hat h\,dx
= \int_{\mathbb{R}^d} \hat f h\,dt
= \int_{\mathbb{R}^d} \hat f\, \overline{\hat g}\,dt
= \langle \hat f, \hat g \rangle_{L^2(\mathbb{R}^d)}.
\]
In other words, we have shown that the Fourier transform preserves the inner product for elements in $V$. It follows that the Fourier transform extends to an isometry from $L^2(\mathbb{R}^d)$ to itself, which we again denote by
\[ L^2(\mathbb{R}^d) \ni f \longmapsto \hat f \in L^2(\mathbb{R}^d). \]
Since $\widehat{V} = V$ (which follows directly from Theorem 9.36) is dense in $L^2(\mathbb{R}^d)$ (by the above), the extension is surjective. Clearly the same discussion applies to the Fourier back transform. Moreover, since $(\hat f)^{\vee} = f$ for $f \in V$ by Theorem 9.36, the same holds for all $f \in L^2(\mathbb{R}^d)$.
As we will use the same symbol $\hat f$ for the Fourier transform of $f$ defined for $f \in L^1(\mathbb{R}^d)$ by (9.6) and defined for $f \in L^2(\mathbb{R}^d)$ by the unique continuous extension from $V \subseteq L^2(\mathbb{R}^d)$, we still need to check that for a function $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ these definitions agree. Fortunately, most of the required work has already been done. Let $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, so that $V \ni f * \widehat{\phi_r} \to f$ as $r \to 0$ both in $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$. As both notions of Fourier transforms are continuous we obtain $\widehat{f * \widehat{\phi_r}} \to \hat f_{L^1}$ in $C_0(\mathbb{R}^d)$ and $\widehat{f * \widehat{\phi_r}} \to \hat f_{L^2}$ in $L^2(\mathbb{R}^d)$ as $r \to 0$, where we write $\hat f_{L^1}$ for the Fourier transform defined using (9.6) and $\hat f_{L^2}$ for the Fourier transform obtained by the above continuous extension. Taking a sequence $(r_n)$ with $r_n \to 0$ as $n \to \infty$ such that $\widehat{f * \widehat{\phi_{r_n}}} \to \hat f_{L^2}$ almost everywhere as $n \to \infty$ we deduce that $\hat f_{L^2} = \hat f_{L^1}$ almost everywhere, as desired.
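As a numerical sanity check of the Plancherel formula, consider the assumed example $f = \mathbf{1}_{[0,1]}$ (so $\|f\|_2^2 = 1$), whose transform from (9.9) satisfies $|\hat f(t)| = |\sin(\pi t)|/(\pi |t|)$. The sketch below approximates $\int |\hat f|^2$ over a large truncated window and, for contrast, shows that the truncated integrals of $|\hat f|$ keep growing with the window. The window sizes and step are arbitrary choices, and the function names are hypothetical.

```python
import math

def fhat_abs_sq(t):
    """|f^(t)|^2 for f = indicator of [0, 1]: (sin(pi t) / (pi t))^2, and 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return (math.sin(math.pi * t) / (math.pi * t)) ** 2

def integrate(g, T, step):
    """Midpoint rule for the integral of g over [-T, T]."""
    n = int(round(2 * T / step))
    return sum(g(-T + (k + 0.5) * step) for k in range(n)) * step

# Should be close to ||f||_2^2 = 1 (the neglected tail is of order 1/T).
l2_norm_sq = integrate(fhat_abs_sq, T=200.0, step=0.01)

# Truncated L^1 integrals of |f^| over growing windows.
l1_partial = [integrate(lambda t: math.sqrt(fhat_abs_sq(t)), T, 0.01)
              for T in (10.0, 100.0)]
print(l2_norm_sq, l1_partial)
```

This is consistent with the situation described in the text: $\hat f$ lies in $L^2(\mathbb{R})$ with the same norm as $f$, although the defining integral of the back transform need not converge absolutely.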

Using the Plancherel formula we can give the reverse direction of the duality we first encountered in Proposition 9.31.
Corollary 9.40 (Duality of convolution and multiplication (II)). For functions $f, g \in L^2(\mathbb{R}^d)$ the pointwise product $fg \in L^1(\mathbb{R}^d)$ has Fourier transform $\widehat{fg} = \hat f * \hat g$.
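Proof. Note that since $\hat f, \hat g \in L^2(\mathbb{R}^d)$, the integral in
\[ \hat f * \hat g(t) = \int_{\mathbb{R}^d} \hat f(s) \hat g(t - s)\,ds \]
exists for every $t \in \mathbb{R}^d$ by the Cauchy–Schwarz inequality. Also note that
\[
\hat{\overline{g}}(t) = \int_{\mathbb{R}^d} \overline{g(x)}\, e^{-2\pi i x\cdot t}\,dx
= \overline{ \int_{\mathbb{R}^d} g(x) e^{2\pi i x\cdot t}\,dx }
= \overline{ \hat g(-t) }.
\]
Since the map $f \mapsto \hat f$ is unitary on $L^2(\mathbb{R}^d)$, this gives
\[
\hat f * \hat g(0) = \int_{\mathbb{R}^d} \hat f(t) \hat g(-t)\,dt
= \langle \hat f, \hat{\overline{g}} \rangle_{L^2(\mathbb{R}^d)}
= \langle f, \overline{g} \rangle_{L^2(\mathbb{R}^d)}
= \int_{\mathbb{R}^d} f g\,dx = \widehat{fg}(0).
\]
Using Proposition 9.29 and $\lambda_{-t_0}(\hat g)(-t) = \hat g(t_0 - t)$ for all $t \in \mathbb{R}^d$ we can extend this to
\[
\hat f * \hat g(t_0) = \int_{\mathbb{R}^d} \hat f(t) \hat g(t_0 - t)\,dt
= \hat f * \lambda_{-t_0}(\hat g)(0)
= \hat f * \widehat{M_{\chi(-t_0)}(g)}(0)
= \int_{\mathbb{R}^d} f\, M_{\chi(-t_0)}(g)\,dx = \widehat{fg}(t_0)
\]
for all $t_0 \in \mathbb{R}^d$.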
Exercise 9.41. Show that the unitary operator $L^2(\mathbb{R}^d) \ni f \mapsto \hat f \in L^2(\mathbb{R}^d)$ is completely diagonalizable and has only four eigenvalues.
Exercise 9.42. Use the Riesz–Thorin interpolation theorem to prove the Hausdorff–Young inequality. Fix $p \in (1, 2)$. Show that the Fourier transform on $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ can be extended to all $f \in L^p(\mathbb{R}^d)$ so that $\|\hat f\|_q \le \|f\|_p$ where $\frac{1}{p} + \frac{1}{q} = 1$.
9.2.3 The Fourier Transform, Smoothness, Schwartz Space
As with Fourier series in Section 3.4, smoothness and decay properties of the Fourier transform are closely related. For $x \in \mathbb{R}^d$ and $\alpha \in \mathbb{N}_0^d$ we write $x^\alpha$ for $x_1^{\alpha_1} \cdots x_d^{\alpha_d}$ and define $M_{(cI)^\alpha} f(x) = (cx)^\alpha f(x)$ for any function $f$ on $\mathbb{R}^d$ and scalar $c$.
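Proposition 9.43 (Duality between differentiation and multiplication by monomials). If $x \mapsto x^\alpha f(x)$ lies in $L^1(\mathbb{R}^d)$ for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$, then $\hat f \in C^k(\mathbb{R}^d)$, and
\[ \partial_\alpha \hat f = \widehat{ M_{(-2\pi i I)^\alpha}(f) } \]
for all $\alpha$ with $\|\alpha\|_1 \le k$. If $f \in C^k(\mathbb{R}^d)$ and $\partial_\alpha f \in L^1(\mathbb{R}^d)$ for $\|\alpha\|_1 \le k$ and $\partial_\alpha f \in C_0(\mathbb{R}^d)$ for $\|\alpha\|_1 \le k - 1$, then
\[ \widehat{\partial_\alpha f}(t) = M_{(2\pi i I)^\alpha} \hat f(t) \]
for all $\alpha$ with $\|\alpha\|_1 \le k$.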
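Proof. Suppose that $f$ and $x \mapsto x_j f(x)$ lie in $L^1(\mathbb{R}^d)$. Then
\[
\partial_j \hat f(t) = \lim_{h \to 0} \frac{\hat f(t + h e_j) - \hat f(t)}{h}
= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i x\cdot(t + h e_j)} - e^{-2\pi i x\cdot t}}{h}\,dx
= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i h x_j} - 1}{h}\, e^{-2\pi i x\cdot t}\,dx
\]
if the limit exists. Now notice that
\[ \frac{e^{-2\pi i h x_j} - 1}{h} \longrightarrow -2\pi i x_j \]
as $h \to 0$, and that this quotient is bounded in absolute value by $2\pi |x_j|$ by the two-dimensional mean value theorem for differentiation. Applying the dominated convergence theorem we deduce that the above limit exists and is equal to
\[ \partial_j \hat f(t) = \int_{\mathbb{R}^d} f(x) (-2\pi i x_j) e^{-2\pi i x\cdot t}\,dx = \widehat{ M_{(-2\pi i I)^{e_j}} f }(t), \]
and so the first part of the proposition now follows by induction on $k$.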

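Now suppose $f \in C^1(\mathbb{R}^d)$, $f \in C_0(\mathbb{R}^d)$ and $f, \partial_j f \in L^1(\mathbb{R}^d)$. Then
\[ \widehat{\partial_j f}(t) = \int_{\mathbb{R}^d} \frac{\partial f}{\partial x_j}(x)\, e^{-2\pi i x\cdot t}\,dx. \]
By Fubini's theorem we may evaluate this integral by first integrating over $x_j$. Assuming as we may that this one-dimensional integral is finite, we apply integration by parts to obtain
\[
\int_{\mathbb{R}} \frac{\partial f}{\partial x_j}(x)\, e^{-2\pi i x\cdot t}\,dx_j
= \lim_{M \to \infty} \Bigl[ f(x) e^{-2\pi i x\cdot t} \Bigr]_{x_j = -M}^{x_j = M}
- \lim_{M \to \infty} \int_{-M}^{M} f(x) (-2\pi i t_j) e^{-2\pi i x\cdot t}\,dx_j
= 2\pi i t_j \int_{\mathbb{R}} f(x) e^{-2\pi i x\cdot t}\,dx_j
\]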
by all of our assumptions on f . Integrating over the remaining variables this

gives the second formula in the proposition in the case α = ej and the general
case now follows by induction.
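The first identity of Proposition 9.43 can be checked numerically in a simple case. This is a sketch under assumptions that are not part of the text: $d = 1$, $f(x) = e^{-\pi x^2}$ (whose transform is again $e^{-\pi t^2}$ by Example 9.27), the multi-index $\alpha = e_1$, and a midpoint rule on a truncated interval standing in for the integral over $\mathbb{R}$.

```python
import cmath
import math

def fourier(g, t, L=8.0, step=1e-3):
    """Midpoint-rule approximation of the integral of g(x) e^{-2 pi i x t}
    over [-L, L]; adequate here because g decays like a Gaussian."""
    n = int(round(2 * L / step))
    total = 0j
    for k in range(n):
        x = -L + (k + 0.5) * step
        total += g(x) * cmath.exp(-2j * math.pi * x * t)
    return total * step

def f(x):
    """The Gaussian exp(-pi x^2), which is its own Fourier transform."""
    return math.exp(-math.pi * x * x)

def xf(x):
    """M_{(-2 pi i I)} f, that is, x -> (-2 pi i x) f(x)."""
    return (-2j * math.pi * x) * f(x)

def dfhat(t):
    """Derivative of f^(t) = exp(-pi t^2), computed by hand."""
    return -2.0 * math.pi * t * math.exp(-math.pi * t * t)

# Compare the transform of M f with the derivative of f^ at a few points.
gaps = [abs(fourier(xf, t) - dfhat(t)) for t in (0.0, 0.5, 1.3)]
print(gaps)
```

The gaps are at the level of quadrature and rounding error, as the proposition predicts for this particular $f$.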
Proposition 9.43 says in particular that a smooth function in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ whose derivatives also lie in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ (for example, any element of $C_c^\infty(\mathbb{R}^d)$) has a Fourier transform with super-polynomial decay. That is, the Fourier transform $\hat f$ multiplied by any polynomial is bounded (and still decays). Similarly, a function that has super-polynomial decay has a smooth Fourier transform. Given these observations, the next definition describes a natural class of functions invariant under differentiation and under Fourier transforms.
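Definition 9.44. The Schwartz space on $\mathbb{R}^d$ is defined by
\[ \mathcal{S}(\mathbb{R}^d) = \bigl\{ f : \mathbb{R}^d \to \mathbb{C} \mid f \text{ is smooth and } \| x^\alpha \partial_\beta f \|_\infty < \infty \text{ for all } \alpha, \beta \in \mathbb{N}_0^d \bigr\}. \]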
The following exercises describe the main properties of S (Rd ) and of the
Fourier transform on S (Rd ).
Essential Exercise 9.45. (a) Show that $\mathcal{S}(\mathbb{R}^d)$ is a Fréchet space (see Definition 8.65) with the seminorms $\|f\|_{\alpha,\beta} = \| x^\alpha \partial_\beta f \|_\infty$ for $f \in \mathcal{S}(\mathbb{R}^d)$ and $\alpha, \beta \in \mathbb{N}_0^d$.
(b) If the seminorms $\|f\|'_{\alpha,\beta} = \| \partial_\beta ( x^\alpha f(x) ) \|_\infty$ are used instead, do you get the same Fréchet space?
(c) What happens if we replace the supremum norms by 1-norms or 2-norms?
(d) Show that $\mathcal{S}(\mathbb{R}^d) \subseteq L^p(\mathbb{R}^d)$ for all $p \in [1, \infty]$.
Essential Exercise 9.46. Show that the Fourier transform $f \mapsto \hat f$ maps $\mathcal{S}(\mathbb{R}^d)$ to itself, is a continuous operator, and has the Fourier back transform $f \mapsto \check f$ as its continuous inverse.
Exercise 9.47. Prove the Poisson summation formula:
\[ \sum_{n \in \mathbb{Z}^d} f(n) = \sum_{n \in \mathbb{Z}^d} \hat f(n) \]
for $f \in \mathcal{S}(\mathbb{R}^d)$.
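Before attempting a proof, the formula can be tested numerically on a family where both sides are explicit. The sketch below assumes $d = 1$ and $f(x) = e^{-\pi a x^2}$ with $a > 0$, whose transform is $\hat f(t) = a^{-1/2} e^{-\pi t^2/a}$ by the Gaussian example and the scaling rule; the parameter value and truncation are arbitrary choices, and this is an illustration rather than a proof.

```python
import math

def theta_sum(a, N=50):
    """Sum over |n| <= N of exp(-pi a n^2); the omitted tail is negligible
    for the values of a used below."""
    return sum(math.exp(-math.pi * a * n * n) for n in range(-N, N + 1))

a = 0.5
lhs = theta_sum(a)                        # sum of f(n) with f(x) = exp(-pi a x^2)
rhs = theta_sum(1.0 / a) / math.sqrt(a)   # sum of f^(n) with f^(t) = a^{-1/2} exp(-pi t^2 / a)
print(lhs, rhs)
```

The two sums agree to rounding error for this choice of $f$, as the Poisson summation formula requires.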
We will use the following in Chapters 11 and 14.
Essential Exercise 9.48. Show that $\widehat{C_c^\infty(\mathbb{R})}$ is a dense subspace of $L^p(\mathbb{R})$ for any $p \in [1, \infty)$.
9.2.4 The Uncertainty Principle
The Fourier transform viewed as a homeomorphism $\mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ (see Exercise 9.46) has multiple physical interpretations.
• A function $f \in \mathcal{S}(\mathbb{R})$ with $\|f\|_2 = 1$ may describe the probability distribution of the position of a particle, so that the probability of the particle being in $B \subseteq \mathbb{R}$ is $\int_B |f(x)|^2\,dx$. In this case $\hat f$ gives a probability distribution of the momentum of the particle. Here Heisenberg's uncertainty principle from quantum mechanics states that if the position distribution $f$ is strongly localized at one position, then the momentum distribution $\hat f$ is forced to be spread out over a big range.
• A function $f \in \mathcal{S}(\mathbb{R})$ may describe a sound, in which case $\hat f$ describes the frequencies present. Sampling $f$ for a short time may be thought of as localizing $f$ by multiplying it by a function in $\mathcal{S}(\mathbb{R})$ supported on a short interval. It is intuitively clear that if the sampling interval is very short, then we cannot get too much information about the frequencies present: $f$ and $\hat f$ cannot both be strongly localized.
It turns out that both of these observations are a consequence of the Cauchy–Schwarz inequality, and they have a convenient precise formulation as follows.(28) As before, we write $M_I(f)(x) = x f(x)$ for $x \in \mathbb{R}^d$ and $f \in L^2(\mathbb{R}^d)$. Note that if we think of $f \in L^2(\mathbb{R})$ with $\|f\|_2 = 1$ as giving us a probability distribution for the position of a particle, then $\|M_I(f)\|_2^2 = \int_{\mathbb{R}} |x|^2 |f(x)|^2\,dx$ gives the expectation of the squared distance to the origin. If that expectation were to be small, then $f$ would certainly be quite localized near the origin. Similarly, $\|M_I(\hat f)\|_2$ measures how much $\hat f$ is localized at zero momentum (see also Exercise 9.51 for other positions and momenta).
Theorem 9.49 (Uncertainty principle). For $f \in \mathcal{S}(\mathbb{R})$ we have
\[ \|M_I(f)\|_2\, \|M_I(\hat f)\|_2 \ge \frac{1}{4\pi} \|f\|_2^2. \tag{9.11} \]
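Proof. First notice that for $z, w \in \mathbb{C}$ we have
\[ z\overline{w} + \overline{z}w = 2\Re(z\overline{w}). \tag{9.12} \]
Then
\[
\|f\|_2^2 = \int_{\mathbb{R}} |f(x)|^2\,dx = \int_{\mathbb{R}} f(x)\overline{f(x)}\,dx
= \underbrace{\Bigl[ x |f(x)|^2 \Bigr]_{-\infty}^{\infty}}_{=0 \text{ as } f \in \mathcal{S}(\mathbb{R})}
- \int_{\mathbb{R}} x \bigl( f'(x)\overline{f(x)} + f(x)\overline{f'(x)} \bigr)\,dx
= -2 \int_{\mathbb{R}} x\, \Re\bigl( f(x)\overline{f'(x)} \bigr)\,dx \quad \text{(by (9.12))}.
\]
Hence
\[ \|f\|_2^2 = -2\, \Re \int_{\mathbb{R}} x f(x) \overline{f'(x)}\,dx \le 2 \|M_I(f)\|_2\, \|f'\|_2 \]
by the Cauchy–Schwarz inequality. Finally we may apply (a very special case of) Proposition 9.43 to obtain
\[
\|f'\|_2 = \| \widehat{f'} \|_2 \quad \text{(by Theorem 9.39)}
\qquad = 2\pi \|M_I(\hat f)\|_2 \quad \text{(by Prop. 9.43)},
\]
so that
\[ \|f\|_2^2 \le 4\pi \|M_I(f)\|_2\, \|M_I(\hat f)\|_2, \]
as claimed in the theorem.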
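The inequality (9.11) can be explored numerically without computing any Fourier transform, by using the identity $\|M_I(\hat f)\|_2 = \|f'\|_2/(2\pi)$ from the last step of the proof. The sketch below assumes two real-valued test functions chosen for illustration (a Gaussian and $x e^{-\pi x^2}$), with derivatives entered by hand, and a midpoint rule on a truncated interval; all names are hypothetical.

```python
import math

def l2_norm(g, L=8.0, step=1e-3):
    """Midpoint-rule approximation of the L^2(R) norm of a real function g
    that is negligible outside [-L, L]."""
    n = int(round(2 * L / step))
    return math.sqrt(sum(g(-L + (k + 0.5) * step) ** 2 for k in range(n)) * step)

def uncertainty_ratio(f, df):
    """Left-hand side of (9.11) divided by its right-hand side, using
    ||M_I f^||_2 = ||f'||_2 / (2 pi) as in the proof of Theorem 9.49."""
    lhs = l2_norm(lambda x: x * f(x)) * l2_norm(df) / (2.0 * math.pi)
    rhs = l2_norm(f) ** 2 / (4.0 * math.pi)
    return lhs / rhs

def gauss(x):
    return math.exp(-math.pi * x * x)

def dgauss(x):
    return -2.0 * math.pi * x * gauss(x)

def herm(x):
    return x * gauss(x)

def dherm(x):
    return (1.0 - 2.0 * math.pi * x * x) * gauss(x)

ratio_gauss = uncertainty_ratio(gauss, dgauss)
ratio_herm = uncertainty_ratio(herm, dherm)
print(ratio_gauss, ratio_herm)
```

A ratio of at least 1 means that (9.11) holds; the Gaussian sits at the boundary case while the second function leaves a clear margin.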
Exercise 9.50. Show that if for some $f \in \mathcal{S}(\mathbb{R})$ we have equality in (9.11) then $f$ has the form $f(x) = A e^{-B^2 x^2}$ for constants $A \in \mathbb{C}$ and $B > 0$.
Exercise 9.51. Extend Theorem 9.49 by showing that for any $f \in \mathcal{S}(\mathbb{R})$ and $x_0, t_0$ in $\mathbb{R}$ we have
\[ \|M_{I - x_0}(f)\|_2\, \|M_{I - t_0}(\hat f)\|_2 \ge \frac{1}{4\pi} \|f\|_2^2, \]
and that equality holds only if
\[ f(x) = A e^{2\pi i x t_0} e^{-B^2 (x - x_0)^2} \]
for constants $A \in \mathbb{C}$ and $B > 0$.
We note that in Exercise 3.50 we saw another instance of an uncertainty

principle for finite abelian groups.
9.3 Spectral Theory of Unitary Flows
Recall from Definition 3.73 the definition of a unitary representation of a

topological group G. In the case G = R, or more generally G = Rd , we will
also refer to it as a unitary flow. The important notion of a positive-definite
sequence generalizes easily to this setting.
9.3.1 Positive-Definite Functions and Cyclic Representations
Definition 9.52. Let $G$ be a topological group. A continuous function
\[ p : G \to \mathbb{C} \]
is called a continuous positive-definite function if for any choice of constants $c_1, \dots, c_\ell \in \mathbb{C}$ and $g_1, \dots, g_\ell \in G$ we have
\[ \sum_{m,n=1}^{\ell} c_m \overline{c_n}\, p(g_n^{-1} g_m) \ge 0. \]
Just as in Section 9.1, this notion is intimately connected to cyclic representations of the group (Definition 9.7).
Lemma 9.53 (Matrix coefficients). Let $G$ be a topological group and assume that $\pi : G \circlearrowright H$ is a unitary representation of $G$ on $H$. For any $v \in H$ the function $p_{\pi,v} : G \to \mathbb{C}$ defined by $p_{\pi,v}(g) = \langle \pi_g v, v \rangle$ for $g \in G$, also known as the principal matrix coefficient of $v$, is a continuous positive-definite function. Moreover, the function $p_{\pi,v}$ uniquely characterizes the cyclic representation $H_v$ generated by the element $v$. More precisely, if $\pi' : G \circlearrowright H'$ is another unitary representation and there is some $v' \in H'$ with $p_{\pi,v} = p_{\pi',v'}$ then there is a unitary isomorphism $\Psi : H_v \to H'_{v'}$ with $\Psi(v) = v'$ and $\Psi \circ \pi_g = \pi'_g \circ \Psi$ for all $g \in G$.
If there is a unitary isomorphism with the properties of $\Psi$ in the lemma, then we say that the representations $\pi$ and $\pi'$ are unitarily isomorphic.
Proof of Lemma 9.53. The argument is essentially the same as that used in the sequence case (see the justification for Example 9.5 and the proof of Corollary 9.8), so we will be brief. Let $c_1, \dots, c_\ell \in \mathbb{C}$ and $g_1, \dots, g_\ell \in G$. Then
\[
\sum_{m,n=1}^{\ell} c_m \overline{c_n}\, p_{\pi,v}(g_n^{-1} g_m)
= \sum_{m,n=1}^{\ell} c_m \overline{c_n} \langle \pi_{g_m} v, \pi_{g_n} v \rangle
= \left\langle \sum_{m=1}^{\ell} c_m \pi_{g_m} v, \sum_{n=1}^{\ell} c_n \pi_{g_n} v \right\rangle \ge 0,
\]
which gives the first claim of the lemma.

Now suppose that $\pi'$, $H'$, and $v'$ have the properties stated in the lemma, namely $p_{\pi,v} = p_{\pi',v'}$. Then the elements
\[ \sum_{m=1}^{\ell} c_m \pi_{g_m} v \in H_v \qquad \text{and} \qquad \sum_{m=1}^{\ell} c_m \pi'_{g_m} v' \in H'_{v'} \]
have the same norms in their respective Hilbert spaces as both norms can be expressed as above in terms of the positive-definite function $p$. We can define $\Psi$ on a dense subset of $H_v$ by setting
\[ \Psi\left( \sum_{m=1}^{\ell} c_m \pi_{g_m} v \right) = \sum_{m=1}^{\ell} c_m \pi'_{g_m} v', \]
and this is a well-defined isometry mapping from a dense subset of $H_v$ onto a dense subset of $H'_{v'}$. The lemma follows by taking the automatic continuous extension from Proposition 2.59.
Using only the definition we can prove the following elementary properties
of positive-definite functions.
Lemma 9.54 (Properties of positive-definite functions). Let $G$ be a topological group and let $p : G \to \mathbb{C}$ be a continuous positive-definite function on $G$. Then $p(g^{-1}) = \overline{p(g)}$ for all $g \in G$ and $\|p\|_\infty = p(e)$.
Proof. Applying the defining property for a positive-definite function with the choices $g_1 = e$, $c_1 = 1$, $g_2 = g \in G$, and $c_2 = \alpha \in \mathbb{C}$, we obtain
\[ p(e) + |\alpha|^2 p(e) + \alpha p(g) + \overline{\alpha}\, p(g^{-1}) \ge 0. \]
As this holds for all $\alpha \in \mathbb{C}$, we may set $\alpha = 0$ and see that $p(e) \ge 0$. Now use both $\alpha = 1$ and $\alpha = i$ to see that $p(g^{-1}) = \overline{p(g)}$. Finally, if $p(g) \ne 0$, we may set $\alpha = -|p(g)|/p(g)$ to see that $2p(e) - 2|p(g)| \ge 0$. It follows that $|p(g)| \le p(e)$ (which also holds if $p(g) = 0$), with equality for $g = e$, giving the lemma.
Exercise 9.55. Let G be a topological group.

(a) Show that any positive-definite function p on G has the form p = pπ,v for some cyclic
unitary representation π on a Hilbert space H and generator v ∈ H.
(b) Show that P1 (G) = {p ∈ Cb (G) | p is positive-definite and p(e) = 1} is convex, and
show that if p ∈ P1 (G) is extreme then the associated unitary representation in (a) is
irreducible (that is, has no non-trivial proper π-invariant closed subspaces).
The converse of the statement in Exercise 9.55(b) also holds, and will be
shown later (see Exercise 12.59).
9.3.2 The Case G = Rd
The following describes all positive-definite functions for Rd and hence once
again all cyclic representations of Rd .
Theorem 9.56 (Bochner's theorem for $\mathbb{R}^d$). Let $d \ge 1$ and suppose that $p : \mathbb{R}^d \to \mathbb{C}$ is a continuous positive-definite function. Then there exists a uniquely determined finite measure $\mu_p$ on $\mathbb{R}^d$ satisfying
\[ p(x) = \int_{\mathbb{R}^d} e^{2\pi i x\cdot t}\,d\mu_p(t) \]
for all $x \in \mathbb{R}^d$.
We note that this means that $p(x) = \langle M_{\chi(x)} 1, 1 \rangle_{L^2(\mathbb{R}^d, \mu_p)}$, where $M_{\chi(x)}$ is defined by $M_{\chi(x)}(f)(t) = e^{2\pi i x\cdot t} f(t)$ for all $f \in L^2(\mathbb{R}^d, \mu_p)$ and $t \in \mathbb{R}^d$.
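The easy direction behind Bochner's theorem (the Fourier transform of a finite measure is positive-definite) can be seen concretely for a finitely supported measure. The sketch below assumes $d = 1$ and a hypothetical measure $\mu = \sum_k w_k \delta_{t_k}$ with made-up atoms and weights, and compares the quadratic form of Definition 9.52 with the manifestly non-negative expression $\sum_k w_k \bigl| \sum_m c_m e^{2\pi i x_m t_k} \bigr|^2$.

```python
import cmath
import math
import random

def p(x, atoms):
    """p(x) = sum_k w_k exp(2 pi i x t_k) for the measure sum_k w_k delta_{t_k}."""
    return sum(w * cmath.exp(2j * math.pi * x * t) for t, w in atoms)

def quadratic_form(cs, xs, atoms):
    """sum_{m,n} c_m conj(c_n) p(x_m - x_n): Definition 9.52 for the group R."""
    return sum(cm * cn.conjugate() * p(xm - xn, atoms)
               for cm, xm in zip(cs, xs) for cn, xn in zip(cs, xs))

def weighted_squares(cs, xs, atoms):
    """sum_k w_k |sum_m c_m exp(2 pi i x_m t_k)|^2, which is visibly >= 0."""
    return sum(w * abs(sum(c * cmath.exp(2j * math.pi * x * t)
                           for c, x in zip(cs, xs))) ** 2
               for t, w in atoms)

random.seed(0)
atoms = [(-1.5, 0.2), (0.0, 1.0), (0.7, 0.5)]   # hypothetical (t_k, w_k) pairs
xs = [random.uniform(-2, 2) for _ in range(6)]
cs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
q = quadratic_form(cs, xs, atoms)
s = weighted_squares(cs, xs, atoms)
print(q, s)
```

The two values agree up to rounding, and $|p(x)| \le p(0) = \mu(\mathbb{R})$ as in Lemma 9.54. The substance of the theorem is the converse, which this sketch does not address.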
Essential Exercise 9.57. Let $\mu$ be a finite measure on $\mathbb{R}^d$ with $d \ge 1$. Show that $\mathbb{R}^d \ni x \mapsto M_{\chi(x)}$ defines a unitary representation of $\mathbb{R}^d$ on $L^2(\mathbb{R}^d, \mu)$ (and, in particular, also satisfies the continuity requirement of a unitary representation in Definition 3.73).
We postpone the proof of Bochner’s theorem and first discuss one of its
corollaries, the spectral theorem.
Theorem 9.58 (Spectral theorem for $\mathbb{R}^d$). Let $d \ge 1$ and suppose that $\pi$ is a unitary representation $\mathbb{R}^d \circlearrowright H$ on a separable complex Hilbert space $H$. Then there is a decomposition $H = \bigoplus_{n \ge 1} H_{v_n}$ for some sequence $(v_n)$ in $H$. Moreover, for every $v \in H$ the unitary representation $\pi : \mathbb{R}^d \circlearrowright H_v$ is unitarily isomorphic to the unitary representation $M_{\chi(x)}$ on $L^2(\mathbb{R}^d, \mu_v)$, where $\mu_v$ is the spectral measure of $v \in H$ (obtained from $p_{\pi,v}$ and Theorem 9.56) and $M_{\chi(x)}$ is the unitary multiplication operator on $L^2(\mathbb{R}^d, \mu_v)$, as above.
Proof. The argument after Definition 9.7 shows that $H$ can be written as an orthogonal direct sum of cyclic representations $H_{v_n}$ for some vectors $v_1, v_2, \dots \in H$. We apply Bochner's theorem (Theorem 9.56) to a cyclic representation, say $H_v$ for $v \in H$, to find the spectral measure. Lemma 9.53, the comment after Theorem 9.56, and Exercise 9.57 show that the cyclic representation is isomorphic to the cyclic representation generated by $1$ inside $L^2(\mathbb{R}^d, \mu)$. It remains to show that this representation is all of the space $H' = L^2(\mathbb{R}^d, \mu)$.
Suppose therefore that f ∈ L2 (Rd , µ) belongs to the orthogonal comple-
ment of H1′ , so that
Z

f (t)e2πix·t dµ(t) = f, Mχ(−x) 1 = 0
Rd
for all x ∈ Rd . Let g ∈ S (Rd ). Since (b g )q = g and b g ∈ L1 (Rd ), we obtain

Z Z
hf, giL2 (µ) = f g dµ = g )q dµ
f (b
Rd Rd
Z Z
= f (t) gb(x)e2πix·t dx dµ(t)
Rd Rd
Z Z
= gb(x) f (t)e2πix·t dµ(t) dx = 0
Rd Rd
by Fubini's theorem. Recalling that $\mathcal{S}(\mathbb{R}^d) \supseteq C_c^\infty(\mathbb{R}^d)$ is dense in $L^2_\mu(\mathbb{R}^d)$, we see that $f = 0$.
Since this holds for all $f \in (H'_1)^\perp$ it follows that $H'_1 = H' = L^2(\mathbb{R}^d, \mu)$, as required.
For the proof of Bochner’s theorem it will be convenient to reformulate
the defining property of positive-definite functions in terms of convolution as
in the next lemma.
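Lemma 9.59 (Positive-definite functions and convolutions). Assume that $d \ge 1$ and suppose $p : \mathbb{R}^d \to \mathbb{C}$ is a continuous positive-definite function on $\mathbb{R}^d$. For $f \in L^1(\mathbb{R}^d)$, define $\tilde f \in L^1(\mathbb{R}^d)$ by $\tilde f(x) = \overline{f(-x)}$ for $x \in \mathbb{R}^d$. Then
\[ \int_{\mathbb{R}^d} \bigl( f * \tilde f \bigr)\, p \,dx \ge 0. \]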
Proof. We first suppose that $f \in C_c(\mathbb{R}^d)$ has $\operatorname{Supp}(f) \subseteq [-M, M]^d$ for some $M > 0$. In the following we write $\{P_1, \dots, P_n\}$ for a partition of $[-M, M]^d$ into squares and assume $x_i \in P_i$ for $i = 1, \dots, n$. Then
\[
\int_{\mathbb{R}^d} \bigl( f * \tilde f \bigr) p \,dx
= \int_{\mathbb{R}^d} \int_{[-M,M]^d} f(y)\, \overline{f(\underbrace{y - x}_{=z})}\, p(x)\,dy\,dx
= \int_{[-M,M]^d} \int_{[-M,M]^d} f(y) \overline{f(z)}\, p(y - z)\,dy\,dz
= \lim \sum_{i,j=1}^{n} f(x_i) m(P_i)\, \overline{f(x_j)} m(P_j)\, p(x_i - x_j)
\]
is a limit of Riemann sums of the form in Definition 9.52, so $\int ( f * \tilde f ) p \,dx \ge 0$ whenever $f \in C_c(\mathbb{R}^d)$. Approximating an arbitrary function $f \in L^1(\mathbb{R}^d)$ by such functions (using the continuity of a product in a Banach algebra from Proposition 9.31) gives the result.
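Proof of Theorem 9.56. Let $p : \mathbb{R}^d \to \mathbb{C}$ be a continuous positive-definite function. Let us first prove the uniqueness and assume $\mu_p$ is as in the theorem. For $f \in \mathcal{S}(\mathbb{R}^d)$ we then have
\[
\int_{\mathbb{R}^d} f \,d\mu_p = \int_{\mathbb{R}^d} (\hat f)^{\vee} \,d\mu_p
= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \hat f(x) e^{2\pi i x\cdot t}\,dx\,d\mu_p(t)
= \int_{\mathbb{R}^d} \hat f(x) p(x)\,dx
\]
by Fubini's theorem. In particular, we see that $p$ determines $\int_{\mathbb{R}^d} f \,d\mu_p$ uniquely for every $f \in C_c^\infty(\mathbb{R}^d) \subseteq \mathcal{S}(\mathbb{R}^d)$. Since $C_c^\infty(\mathbb{R}^d)$ is dense in $C_0(\mathbb{R}^d)$ (which contains $C_c(\mathbb{R}^d)$) it follows that $p$ determines $\mu_p$ uniquely by the Riesz representation (Theorem 7.44).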
We now turn to the existence. We will obtain the measure $\mu = \mu_p$ by defining a positive functional on $C_0(\mathbb{R}^d)$, whose properties will be proved in several steps.
First step: Definition on the Schwartz space. We initially define the functional on $\mathcal{S}(\mathbb{R}^d)$ (which only contains rapidly-decaying smooth functions). For $f$ in $\mathcal{S}(\mathbb{R}^d)$ let
\[ \Lambda(f) = \int_{\mathbb{R}^d} \hat f\, p \,dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(t) e^{-2\pi i x\cdot t}\,dt\; p(x)\,dx, \]
which is well-defined since $\hat f \in L^1(\mathbb{R}^d)$ and $\|p\|_\infty = p(0) < \infty$.

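Second step: Positivity. The assumption on $p$ immediately gives a certain amount of positivity. Suppose $f \in \mathcal{S}(\mathbb{R}^d)$ and note that
\[
\hat{\overline{f}}(x) = \int_{\mathbb{R}^d} \overline{f(t)}\, e^{-2\pi i x\cdot t}\,dt
= \overline{ \int_{\mathbb{R}^d} f(t) e^{2\pi i x\cdot t}\,dt } = \tilde{\hat f}(x)
\]
(see Lemma 9.59 for the definition of $\tilde f$) and that $|f|^2 \in \mathcal{S}(\mathbb{R}^d)$. By the duality of multiplication and convolution (Corollary 9.40) we obtain
\[ \widehat{|f|^2} = \hat f * \hat{\overline{f}} = \hat f * \tilde{\hat f}, \]
and so
\[ \Lambda(|f|^2) = \int \bigl( \hat f * \tilde{\hat f} \bigr)\, p \,dx \ge 0 \]
by Lemma 9.59.
We wish to upgrade the above positivity statement to say that $f \ge 0$ and $f \in C_c^\infty(\mathbb{R}^d)$ implies $\Lambda(f) \ge 0$. So let $f \in C_c^\infty(\mathbb{R}^d)$ be non-negative, and define
\[ h_\varepsilon(t) = \sqrt{ f(t) + \varepsilon e^{-\pi \|t\|^2} }. \]
Notice that $h_\varepsilon \in \mathcal{S}(\mathbb{R}^d)$ so that
\[ \Lambda(f) + \varepsilon \Lambda\bigl( e^{-\pi \|t\|^2} \bigr) = \Lambda\bigl( |h_\varepsilon|^2 \bigr) \ge 0 \]
for all $\varepsilon > 0$, and hence $\Lambda(f) \ge 0$.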

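Third step: Boundedness with respect to $\|\cdot\|_\infty$. Next we wish to obtain an estimate for $\Lambda(f)$ for any $f \in C_c^\infty(\mathbb{R}^d)$ that only depends on $\|f\|_\infty$. We assume first that $f \in C_c^\infty(\mathbb{R}^d)$ is real-valued, and fix $\varepsilon > 0$. Then, for sufficiently large $a > 0$, we have $f(t) < (1 + \varepsilon) \|f\|_\infty e^{-\pi \|t/a\|^2}$ for all $t \in \mathbb{R}^d$, and similarly for $-f$. Therefore, we have $(1 + \varepsilon) \|f\|_\infty e^{-\pi \|t/a\|^2} - f(t) = |h|^2$ where
\[ h(t) = \sqrt{ (1 + \varepsilon) \|f\|_\infty e^{-\pi \|t/a\|^2} - f(t) }, \]
and $h \in \mathcal{S}(\mathbb{R}^d)$, which as above gives the inequality
\[ (1 + \varepsilon) \|f\|_\infty \Lambda\bigl( e^{-\pi \|t/a\|^2} \bigr) - \Lambda(f) \ge 0. \]
Note that $e^{-\pi \|t/a\|^2}$ also has a square root inside $\mathcal{S}(\mathbb{R}^d)$ which gives
\[ \Lambda\bigl( e^{-\pi \|t/a\|^2} \bigr) \ge 0 \]
and so $\Lambda(f) \in \mathbb{R}$ satisfies $\Lambda(f) \le (1 + \varepsilon) \|f\|_\infty \Lambda\bigl( e^{-\pi \|t/a\|^2} \bigr)$. The same estimate also holds for $\Lambda(-f)$. However,
\[
\Lambda\bigl( e^{-\pi \|t/a\|^2} \bigr)
= \int_{\mathbb{R}^d} \underbrace{ a^d e^{-\pi \|ax\|^2} }_{\|\cdot\|_1 = 1}\, p(x)\,dx \le \|p\|_\infty = p(0)
\]
can be bounded independently of $a > 0$. It follows that
\[ |\Lambda(f)| \le \|f\|_\infty\, p(0) \tag{9.13} \]
for any $\mathbb{R}$-valued $f \in C_c^\infty(\mathbb{R}^d)$.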

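If $f \in C_c^\infty(\mathbb{R}^d)$ is $\mathbb{C}$-valued let $\alpha \in \mathbb{C}$ have absolute value one and satisfy $|\Lambda(f)| = \alpha \Lambda(f)$. Now note $|\Lambda(f)| = \Lambda(\alpha f) = \Lambda(\Re(\alpha f))$ (since $\Lambda$ maps $\mathbb{R}$-valued functions into $\mathbb{R}$) and apply (9.13) for $\Re(\alpha f)$ to obtain
\[ |\Lambda(f)| = |\Lambda(\Re(\alpha f))| \le \|f\|_\infty\, p(0), \]
showing that (9.13) holds for all $f \in C_c^\infty(\mathbb{R}^d)$.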

Last Step: Existence and properties of the measure. By the previous step the functional $\Lambda|_{C_c^\infty(\mathbb{R}^d)}$ extends continuously to a functional $\Lambda_0$ on $C_0(\mathbb{R}^d)$. Furthermore, every non-negative function in $C_0(\mathbb{R}^d)$ can be approximated with respect to the supremum norm by non-negative functions in $C_c^\infty(\mathbb{R}^d)$. Hence the extension $\Lambda_0$ is a positive continuous linear functional on $C_0(\mathbb{R}^d)$ and so defines, by the Riesz representation theorem (Theorem 7.44 and Theorem 7.54), a finite positive measure $\mu$ on $\mathbb{R}^d$ satisfying
\[ \int_{\mathbb{R}^d} \hat f\, p \,dx = \Lambda(f) = \int_{\mathbb{R}^d} f \,d\mu \tag{9.14} \]
for all $f \in C_c^\infty(\mathbb{R}^d)$.
Fix some non-negative $h \in C_c^\infty(\mathbb{R}^d)$ with $\int_{\mathbb{R}^d} h \,dx = 1$ and notice that
\[ h_{r,x_0}(x) = r^{-d} h\bigl( (x - x_0)/r \bigr) \]
approximates the $\delta$-measure at $x_0$ as $r \to 0$. Concretely, since $p$ is continuous we may apply dominated convergence and the substitution $y = (x - x_0)/r$ to see that
\[
p(x_0) = \lim_{r \to 0} \int_{\mathbb{R}^d} h(y) p(x_0 + ry)\,dy
= \lim_{r \to 0} \int_{\mathbb{R}^d} r^{-d} h\bigl( (x - x_0)/r \bigr) p(x)\,dx
= \lim_{r \to 0} \int_{\mathbb{R}^d} h_{r,x_0}(x) p(x)\,dx. \tag{9.15}
\]
In order to be able to combine this with (9.14) we calculate the Fourier back transform of $h_{r,x_0}$,
\[
f_{r,x_0}(t) = (h_{r,x_0})^{\vee}(t) = \int_{\mathbb{R}^d} r^{-d} h\bigl( (x - x_0)/r \bigr) e^{2\pi i x\cdot t}\,dx
= \int_{\mathbb{R}^d} h(y) e^{2\pi i (x_0 + ry)\cdot t}\,dy = e^{2\pi i x_0\cdot t}\, \check h(rt),
\]
where we again used the substitution $y = (x - x_0)/r$. By Fourier inversion (Theorem 9.36) we also have $h_{r,x_0} = \widehat{f_{r,x_0}}$. With (9.14) and (9.15) we now obtain
\[
p(x_0) = \lim_{r \to 0} \Lambda(f_{r,x_0})
= \lim_{r \to 0} \int_{\mathbb{R}^d} e^{2\pi i x_0\cdot t}\, \check h(rt)\,d\mu(t)
= \int_{\mathbb{R}^d} e^{2\pi i x_0\cdot t}\,d\mu(t)
\]
by dominated convergence and since $\check h(0) = \int_{\mathbb{R}^d} h \,dx = 1$.
We remark that the properties of spectral measures discussed in Section 9.1.3 hold in the setting of unitary flows for essentially the same reasons, but we will not pursue this here.
9.3.3 Stone’s Theorem
We already saw a connection between self-adjoint operators and unitary operators in Exercise 9.10. The spectral theorem for unitary flows allows us to expand this connection, leading to a complete description of a unitary flow in terms of a ‘potentially unbounded self-adjoint’ operator. For simplicity we restrict to the case of a one-parameter unitary flow (that is, a unitary representation of $G = \mathbb{R}$).
Theorem 9.60 (Stone's theorem). Suppose that $\pi : \mathbb{R} \circlearrowright H$ is a unitary representation of $\mathbb{R}$ on a separable complex Hilbert space $H$. Then the subspace
\[ D = \left\{ v \in H \;\middle|\; \lim_{x \to 0} \frac{\pi_x v - v}{x} \text{ exists in } H \right\} \]
of differentiable vectors is dense in $H$, and $D$ is the natural domain of the closed operator
\[ D \ni v \longmapsto A(v) = \frac{1}{2\pi i} \lim_{x \to 0} \frac{\pi_x v - v}{x}. \]
Moreover, there exists an increasing sequence of closed, $A$-invariant, $\pi$-invariant subspaces $D_1 \subseteq D_2 \subseteq \cdots \subseteq D$ such that $\bigcup_{\ell \ge 1} D_\ell$ is dense in $H$, $A|_{D_\ell} : D_\ell \to D_\ell$ is self-adjoint, $\|A|_{D_\ell}\| \le \ell$, and $\pi_x|_{D_\ell} = \exp(2\pi i x A|_{D_\ell})$ is defined by a convergent power series for all $x \in \mathbb{R}$ and $\ell \ge 1$.
Proof. We will apply the spectral theorem (Theorem 9.58) for unitary flows to describe the unitary representation in terms of multiplication operators by scalars. As a slight simplification, we assume that $H = H_v$ is cyclic for some $v \in H$ and refer to Exercise 9.62 for the general case.
Then by the spectral theorem we have $H_v \cong L^2_{\mu_v}(\mathbb{R})$, where $v$ corresponds to $1$ and $\pi_x$ corresponds to $M_{\chi(x)}$ for all $x \in \mathbb{R}$. As the isomorphism between $H_v$ and $L^2_{\mu_v}(\mathbb{R})$ is unitary it maps convergent sequences to convergent sequences and hence also differentiable vectors (as in the definition of $D$) for $\pi$ precisely to the differentiable vectors for $M_{\chi(\cdot)}$. In other words, it suffices to prove the theorem in the case where $\pi = M_{\chi(\cdot)}$ and $H = L^2_\mu(\mathbb{R})$ for a finite measure $\mu$ on $\mathbb{R}$. We note that the spectral theorem provides a finite measure, but the proof stays the same if $\mu$ is only locally finite. In this case, we claim that $D$ is given by $D = \{ f \in L^2_\mu(\mathbb{R}) \mid M_I(f) \in L^2_\mu(\mathbb{R}) \}$. Indeed, for $f \in L^2_\mu(\mathbb{R})$ we have
\[ \lim_{x \to 0} \frac{\pi_x f - f}{x}(t) = \lim_{x \to 0} \frac{e^{2\pi i x t} - 1}{x} f(t) = 2\pi i t f(t) \]
pointwise wherever $f(t)$ is defined. Therefore $f \in D$ forces $M_I(f) \in L^2_\mu(\mathbb{R})$, where $M_I(f)(t) = t f(t)$ for $t \in \mathbb{R}$.
For the other inclusion, note that $\left| \frac{e^{2\pi i x t} - 1}{x} \right| \le 2\pi |t|$ by the mean value theorem for vector-valued differentiable functions. If now $f, M_I(f) \in L^2_\mu(\mathbb{R})$ then
\[
\left\| \frac{\pi_x f - f}{x} - 2\pi i M_I(f) \right\|^2
= \int \underbrace{ \left| \frac{e^{2\pi i x t} - 1}{x} f(t) - 2\pi i t f(t) \right|^2 }_{\le (4\pi |t f(t)|)^2} \,d\mu(t)
\]
converges to zero as $x \to 0$ by dominated convergence, which proves the claimed description of $D$ and that $A = M_I$.
We now show that $M_I$ is a closed operator in the sense of Definition 4.27. So suppose that $(f_n, M_I(f_n)) \in \operatorname{Graph}(M_I)$ converges to $(f, g)$. Then (after taking a subsequence) we may assume that $f_n(t) \to f(t)$ almost everywhere and $t f_n(t) = M_I(f_n)(t) \to g(t)$ almost everywhere, and hence $g \in L^2_\mu(\mathbb{R})$ where $t f(t) = g(t)$ almost everywhere and so $f \in D$ and $M_I(f) = g$, as required.
The increasing sequence of closed subspaces can be defined by setting
\[ D_\ell = L^2_{\mu|_{[-\ell,\ell]}}([-\ell, \ell]) \subseteq L^2_\mu(\mathbb{R}) \]
for $\ell \ge 1$, where a function defined on $[-\ell, \ell]$ is extended to be defined on $\mathbb{R}$ by setting it equal to zero outside $[-\ell, \ell]$. These subspaces are clearly $M_{\chi(x)}$- and $M_I$-invariant for all $x \in \mathbb{R}$. Moreover, $\bigcup_{\ell \ge 1} D_\ell$ contains all continuous functions of compact support and therefore is dense in $L^2_\mu(\mathbb{R})$ by Proposition 2.51. Moreover, $M_I|_{D_\ell} : D_\ell \to D_\ell$ is self-adjoint, $\|M_I|_{D_\ell}\| \le \ell$, and $\exp(2\pi i x M_I|_{D_\ell}) = M_{\exp(2\pi i x I)}|_{D_\ell} = M_{\chi(x)}|_{D_\ell}$ for all $x \in \mathbb{R}$ and $\ell \ge 1$.
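In the model $H = L^2_\mu(\mathbb{R})$ everything in the proof happens pointwise in $t$, so the key facts can be checked on scalars. The sketch below is an illustration under assumptions: a few sample points $t \in [-2, 2]$ stand in for the support of $\mu|_{[-\ell,\ell]}$ with $\ell = 2$, and the series is truncated after a fixed number of terms. It compares the power series for $\exp(2\pi i x t)$ with the exponential, the difference quotient with $2\pi i t$, and tests the bound $|(e^{2\pi i x t} - 1)/x| \le 2\pi |t|$.

```python
import cmath
import math

def exp_series(z, terms=80):
    """Partial sum of the exponential power series sum_k z^k / k!."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

x = 0.3
ts = [-2.0, -0.5, 0.0, 1.25, 2.0]   # hypothetical sample points of [-2, 2]

# On D_ell the flow is given by a convergent power series in the bounded operator M_I.
series_gap = max(abs(exp_series(2j * math.pi * x * t) - cmath.exp(2j * math.pi * x * t))
                 for t in ts)

def quotient(h, t):
    """Difference quotient (e^{2 pi i h t} - 1) / h from the proof."""
    return (cmath.exp(2j * math.pi * h * t) - 1) / h

# The quotient approaches 2 pi i t and is dominated by 2 pi |t|.
quotient_gap = max(abs(quotient(1e-6, t) - 2j * math.pi * t) for t in ts)
bound_ok = all(abs(quotient(h, t)) <= 2 * math.pi * abs(t) + 1e-12
               for h in (1.0, 0.1, 1e-3) for t in ts)
print(series_gap, quotient_gap, bound_ok)
```

The domination is what allows dominated convergence in the proof once $M_I(f) \in L^2_\mu(\mathbb{R})$; for unbounded $t$ the power series is no longer available on all of $H$, which is why the subspaces $D_\ell$ are introduced.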
Exercise 9.61. Let $H$ be a separable complex Hilbert space, let $D_1 \subseteq D_2 \subseteq \cdots$ be a sequence of closed subspaces such that $\bigcup_{\ell \ge 1} D_\ell$ is dense in $H$, and let $A : \bigcup_{\ell \ge 1} D_\ell \to H$ be a linear map such that $A(D_\ell) \subseteq D_\ell$, $\|A|_{D_\ell}\| \le \ell$, and $A|_{D_\ell} : D_\ell \to D_\ell$ is self-adjoint for all $\ell \ge 1$. Show that there exists a uniquely defined unitary representation $\pi$ of $\mathbb{R}$ on $H$ such that $D_\ell$ is $\pi$-invariant and $\pi_x|_{D_\ell} = \exp(2\pi i x A|_{D_\ell})$ for all $x \in \mathbb{R}$ and $\ell \ge 1$.
Exercise 9.62. Prove Theorem 9.60 in the general case where $H = \bigoplus_{n \ge 1} H_{v_n}$ for some sequence of vectors $(v_n)$.
Exercise 9.63. Apply the results above to the unitary flow $(\rho_x f)(y) = f(y + x)$ for $x, y \in \mathbb{R}$ and $f \in L^2(\mathbb{R})$.
(a) Use the Fourier transform and the proof of Theorem 9.60 to show that
\[ D = \{ f \in L^2(\mathbb{R}) \mid t \longmapsto t \hat f(t) \text{ lies in } L^2(\mathbb{R}) \}. \]
(b) Show that $C_c^\infty(\mathbb{R}) \subseteq D$ and that $C_c^\infty(\mathbb{R})$ is dense in $D$ when $D$ is endowed with the norm in $\operatorname{Graph}(A)$ where $A$ is defined as in Theorem 9.60.
(c) Show that $\operatorname{Graph}(A) = H_0^1(\mathbb{R})$ and that $A = \frac{1}{2\pi i} \partial_x$.
(d) Show moreover that $H^1(\mathbb{R}) = H_0^1(\mathbb{R})$.
Exercise 9.64. Generalize Theorem 9.60 to unitary representations of $\mathbb{R}^d$ by studying the space $D$ of vectors for which all partial derivatives $\lim_{t \to 0} \frac{\pi_{t e_j} v - v}{t}$ exist.
9.4 Further Topics
• We will study unitary representations of more general topological groups

in the next chapter (which in part will need the results regarding unitary
flows of this chapter) and in Chapter 12.
• The theory of the Fourier transform and the Schwartz space (together
with the results of Chapter 11) will be used in Chapter 14 to prove the
prime number theorem.
• Our introduction to spectral theory will continue in Chapters 11, 12,
and 13.
Chapter 10
Locally Compact Groups, Amenability,
Property (T)
In this chapter we turn our attention to topological groups. We have seen

the importance of the Haar measure for the theory of Fourier series already
in Chapter 3. After proving the existence of the Haar measure(29) we study
two different classes of groups. The notion of amenable group was already
discussed in Chapter 7 but here we will drop the assumption of discreteness
and discuss the property in greater detail. After this we will study groups
with property (T); these are in some sense (see Exercise 10.35) opposite to
amenable groups. Finally, we will link property (T) to the topic of expander
graphs in Section 10.4.
10.1 Haar Measure
Using the Riesz representation theorem (Theorem 7.44) we now prove a ver-
sion of the existence of Haar measures from p. 92. Throughout this section
we will be working with real-valued functions.
Theorem 10.1 (Existence of Haar measure). Let G be a locally compact,
σ-compact, metrizable group. Then there exists a (left) Haar measure $m_G$ on G:
that is, there is a locally finite Borel measure $m_G$ (that is, a Radon measure)
that is positive on non-empty open sets and satisfies $m_G(gB) = m_G(B)$ for every
Borel measurable set B ⊆ G and all g ∈ G.
Write $\lambda_g$ for the left regular representation of G on functions (or equivalence
classes of functions) on G defined by $\lambda_g(f)(h) = f(g^{-1}h)$ for g, h ∈ G.
This indeed defines a representation since
$$\lambda_{g_1}(\lambda_{g_2}(f))(h) = \lambda_{g_2}(f)(g_1^{-1}h) = f(g_2^{-1}g_1^{-1}h) = \lambda_{g_1g_2}(f)(h)$$
for all g1 , g2 , h ∈ G and functions f on G.
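The identity just verified can be checked mechanically on a small non-abelian group. The following Python sketch does this for the symmetric group S3; the choice of group, the helper names `mul`, `inv`, `lam`, and the test function `f` are illustrative assumptions of ours and do not appear in the text.

```python
from itertools import permutations

# Elements of S3 as tuples p, where p[i] is the image of i.
G = list(permutations(range(3)))

def mul(p, q):
    """Composition p∘q: first apply q, then p."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def lam(g, f):
    """Left regular representation: (lam_g f)(h) = f(g^{-1} h)."""
    g_inv = inv(g)
    return {h: f[mul(g_inv, h)] for h in G}

# An arbitrary test function on G (values chosen only for illustration).
f = {h: i * i + 1 for i, h in enumerate(G)}

# Check lam_{g1}(lam_{g2} f) = lam_{g1 g2} f for all pairs (g1, g2).
is_homomorphism = all(
    lam(g1, lam(g2, f)) == lam(mul(g1, g2), f) for g1 in G for g2 in G
)
print(is_homomorphism)
```

The inverse in the definition of `lam` is what makes the order of `g1` and `g2` come out correctly on a non-abelian group.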

Proof of Theorem 10.1. To motivate the following argument, fix some function
$$\phi \in C_c^+(G) = \{f \in C_c(G) \mid f \geq 0\} \setminus \{0\}$$
and think of it as a gauge† function. If now another non-trivial function $f$
in $C_c^+(G)$ can be approximated (in a suitable sense) by sums of the form
$$\sum_{j=1}^n c_j \lambda_{g_j}\phi,$$
then we would expect that the (yet to be defined) integral
$$\int_G f \, dm_G$$
will be approximated by
$$\sum_{j=1}^n c_j \int_G \phi \, dm_G.$$
In particular, this would express an approximation to the integral of a general
function in terms of the integral of a single chosen function.
In order to follow this through it is clear that the gauge function will need
to be allowed to vary. Roughly speaking, the more localized φ is, the more
functions can be approximated in this way. As an extreme example, if G is
compact then φ could be a constant and only the constant functions could be
approximated (also see Figure 10.1). For that reason, we will fix throughout
another function $f_0 \in C_c^+(G)$ and normalize all expressions so that the (yet
to be constructed) Haar measure $m_G$ will satisfy $\int_G f_0 \, dm_G = 1$.
Hence we start the formal argument by defining for functions $\phi, f \in C_c^+(G)$
the expression
$$M(f : \phi) = \inf\Bigl\{\sum_{j=1}^n c_j \Bigm| f \leq \sum_{j=1}^n c_j \lambda_{g_j}\phi \text{ for some } c_1, \dots, c_n > 0,\ g_1, \dots, g_n \in G\Bigr\}$$
and the normalized quantity
$$\Lambda_\phi(f) = \frac{M(f : \phi)}{M(f_0 : \phi)}.$$
We may think of $\sum_{j=1}^n c_j \lambda_{g_j}\phi$ as a φ-cover of f and of $\sum_{j=1}^n c_j$ as the total
weight of the φ-cover. Notice that $\{g \in G \mid \phi(g) > \frac{1}{2}\|\phi\|_\infty\}$ is a non-empty
open subset of G, and since $f \in C_c(G)$ has compact support it is easy to
see that a cover of f as in the definition of $M(f : \phi)$ exists, and so $M(f : \phi)$
is a well-defined non-negative real number. Moreover, if $f_0 \leq \sum_{j=1}^n c_j \lambda_{g_j}\phi$
† The word ‘gauge’ means a fixed standard of measure like a ruler.
then $\|f_0\|_\infty \leq \sum_{j=1}^n c_j \|\phi\|_\infty$ and so $M(f_0 : \phi) \geq \|f_0\|_\infty \|\phi\|_\infty^{-1} > 0$, which
implies that $\Lambda_\phi(f) \in \mathbb{R}_{\geq 0}$ is well-defined.
We collect a few immediate properties of Λφ for a scalar α > 0 and func-
tions f, f1 , f2 ∈ Cc+ (G):
• (left-invariance) Λφ (λg f ) = Λφ (f );
• (positive homogeneity) Λφ (αf1 ) = αΛφ (f1 );
• (monotonicity) $\Lambda_\phi(f_1) \leq \Lambda_\phi(f_2)$ whenever $f_1 \leq f_2$; and
• (sub-additivity) $\Lambda_\phi(f_1 + f_2) \leq \Lambda_\phi(f_1) + \Lambda_\phi(f_2)$.
These properties are immediate consequences of the definition of $M(f : \phi)$
and standard properties of the infimum. For instance, if $f_1 \leq \sum_{j=1}^n c_j \lambda_{g_j}\phi$
and $f_2 \leq \sum_{k=1}^m d_k \lambda_{h_k}\phi$ for some scalars $c_1, \dots, c_n, d_1, \dots, d_m > 0$ and group
elements $g_1, \dots, g_n, h_1, \dots, h_m \in G$, then we obtain a φ-cover of $f_1 + f_2$ in
the form $f_1 + f_2 \leq \sum_{j=1}^n c_j \lambda_{g_j}\phi + \sum_{k=1}^m d_k \lambda_{h_k}\phi$ and so $M(f_1 + f_2 : \phi)$ is
bounded above by $\sum_{j=1}^n c_j + \sum_{k=1}^m d_k$. Since the φ-covers of $f_1$ and $f_2$ were
arbitrary this implies that
$$M(f_1 + f_2 : \phi) \leq M(f_1 : \phi) + M(f_2 : \phi),$$
and the claimed sub-additivity follows after dividing by M (f0 : φ).
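To see how the choice of gauge function affects additivity, consider the finite cyclic group Z/6 with two extreme gauges. For the indicator of the identity a cover must put weight at least f(g) at each point g, so M(f : φ) is the sum of the values of f; for the constant function 1 every translate equals 1, so M(f : φ) is the maximum of f. The Python sketch below evaluates Λφ in both cases. The group, the functions `f0`, `f1`, `f2`, and the helper names are illustrative assumptions of ours.

```python
from fractions import Fraction

n = 6  # the cyclic group Z/6, written additively; functions are lists of length n

def M_delta(f):
    # Gauge phi = indicator of {0}: a phi-cover needs weight >= f(g) at each g.
    return sum(f)

def M_const(f):
    # Gauge phi = 1: every translate is 1, so the total weight must be >= max f.
    return max(f)

f0 = [Fraction(1)] * n                        # normalizing function (arbitrary choice)
f1 = [Fraction(v) for v in (1, 1, 0, 0, 0, 0)]
f2 = [Fraction(v) for v in (0, 0, 0, 1, 1, 0)]
f_sum = [a + b for a, b in zip(f1, f2)]

def Lam(M, f):
    """Normalized quantity Lambda_phi(f) = M(f : phi) / M(f0 : phi)."""
    return M(f) / M(f0)

# Defect Lambda(f1) + Lambda(f2) - Lambda(f1 + f2); zero means additivity holds.
additive_gap_delta = Lam(M_delta, f1) + Lam(M_delta, f2) - Lam(M_delta, f_sum)
additive_gap_const = Lam(M_const, f1) + Lam(M_const, f2) - Lam(M_const, f_sum)
print(additive_gap_delta, additive_gap_const)
```

The localized gauge gives an additive functional, while the constant gauge is only sub-additive, in line with the remark that the gauge function must be allowed to become more localized.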

The main step in the argument is to upgrade the sub-additivity of Λφ to
an ‘approximate additivity’ property. For this we have to study not one gauge
function but many (see Figure 10.1). To prepare for this, we first show that
$$\begin{cases} M(f : \phi) \leq M(f : f_0)\, M(f_0 : \phi), \\ \Lambda_\phi(f) \leq M(f : f_0) \end{cases} \tag{10.1}$$
whenever $\phi, f \in C_c^+(G)$. Note that the second line follows from the first on
dividing by $M(f_0 : \phi)$. For the proof of the first line in (10.1), suppose that
$$f \leq \sum_{j=1}^n c_j \lambda_{g_j} f_0$$
and
$$f_0 \leq \sum_{k=1}^m d_k \lambda_{h_k}\phi$$
are an $f_0$-cover and a φ-cover of $f$ and of $f_0$, respectively. We then have
$$\lambda_{g_j} f_0 \leq \sum_{k=1}^m d_k \lambda_{g_j h_k}\phi$$
for all j and we obtain the φ-cover
$$f \leq \sum_{j=1}^n \sum_{k=1}^m c_j d_k \lambda_{g_j h_k}\phi,$$
which gives
$$M(f : \phi) \leq \Bigl(\sum_{j=1}^n c_j\Bigr)\Bigl(\sum_{k=1}^m d_k\Bigr).$$
Since the f0 -cover of f and the φ-cover of f0 were arbitrary, this implies (10.1).
Fig. 10.1 (graphs of φ, f1, f2): Using φ, f1, f2 ∈ Cc(R) as shown, it is clear that Λφ is not additive in
general. Here this failure of additivity happens because the gauge function is not
sufficiently localized to measure the functions f1 and f2.
We now come to the approximate additivity property mentioned above.
For any two functions $f_1, f_2 \in C_c^+(G)$ and ε > 0 we claim that there exists a
neighbourhood U of e ∈ G with the property that
$$\Lambda_\phi(f_1 + f_2) \leq \Lambda_\phi(f_1) + \Lambda_\phi(f_2) \leq \Lambda_\phi(f_1 + f_2) + \varepsilon \tag{10.2}$$
for any non-zero function $\phi \in C_c^+(G)$ with support contained in U.

Notice that the first inequality in (10.2) is the sub-additivity shown above,
and the second inequality requires an argument that splits a φ-cover of $f_1 + f_2$
into two separate φ-covers of $f_1$ and $f_2$ without too much loss of precision. We
will do this using an approximate partition of unity as follows. By Urysohn’s
lemma (Lemma A.27) we may find a function $F \in C_c^+(G)$ such that F ≡ 1
on $\operatorname{Supp}(f_1 + f_2)$. Using it we define
$$\delta = \min\Bigl\{1, \frac{\varepsilon}{3M(f_1 + f_2 + F : f_0)}\Bigr\} \tag{10.3}$$
and the functions $p_1$ and $p_2$ by
$$p_k(g) = \begin{cases} \dfrac{f_k(g)}{f_1(g) + f_2(g) + \delta F(g)} & \text{for } g \in \operatorname{Supp} f_k; \\ 0 & \text{for } g \notin \operatorname{Supp} f_k \end{cases}$$
for k = 1, 2; notice that $p_1, p_2 \in C_c^+(G)$. By uniform continuity of $p_1$ and $p_2$
there exists some neighbourhood U of e ∈ G such that u ∈ U and $g \in \operatorname{Supp} p_k$
implies that
$$|p_k(gu^{-1}) - p_k(g)| < \delta$$
for k = 1, 2. Suppose now that $\phi \in C_c^+(U)$ (which we could also write
as $C_c^+(G) \cap C_c(U)$ with the usual convention that functions on U may be
extended to functions on G by setting them to be zero on $G \setminus U$) and
$$f_1 + f_2 + \delta F \leq \sum_{j=1}^n c_j \lambda_{g_j}\phi \tag{10.4}$$
is a φ-cover of $f_1 + f_2 + \delta F$. Multiplying this inequality by $p_k$ gives
$$f_k(g) = (f_1 + f_2 + \delta F)(g)\, p_k(g) \leq \sum_{j=1}^n c_j\, p_k(g)\, \phi(g_j^{-1}g)$$
for all g ∈ G. Fixing g ∈ G and one j in the sum, we see that either
$$p_k(g)\phi(g_j^{-1}g) = 0$$
or $g \in \operatorname{Supp} p_k$ and $g_j^{-1}g = u \in \operatorname{Supp}\phi \subseteq U$, which implies that $g_j = gu^{-1}$
and
$$p_k(g)\phi(g_j^{-1}g) \leq (p_k(g_j) + \delta)\, \phi(g_j^{-1}g)$$
in either case. Taking the sum over j gives the φ-cover
$$f_k \leq \sum_{j=1}^n c_j (p_k(g_j) + \delta)\, \lambda_{g_j}\phi$$
and so
$$M(f_k : \phi) \leq \sum_{j=1}^n c_j (p_k(g_j) + \delta)$$
for k = 1, 2. Taking the sum over k and using the bound $p_1 + p_2 \leq 1$ we
obtain
$$M(f_1 : \phi) + M(f_2 : \phi) \leq \sum_{j=1}^n c_j (1 + 2\delta).$$
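Taking the infimum over all φ-covers of $f_1 + f_2 + \delta F$ in (10.4) and dividing
by $M(f_0 : \phi)$ shows that
$$\Lambda_\phi(f_1) + \Lambda_\phi(f_2) \leq (1 + 2\delta)\Lambda_\phi(f_1 + f_2 + \delta F) \leq \Lambda_\phi(f_1 + f_2) + \delta\Lambda_\phi(F) + 2\delta\Lambda_\phi(f_1 + f_2 + \delta F).$$
Since δ ≤ 1, monotonicity bounds both $\Lambda_\phi(F)$ and $\Lambda_\phi(f_1 + f_2 + \delta F)$ by $\Lambda_\phi(f_1 + f_2 + F)$.
Applying (10.1) for φ, $f_0$ and $f = f_1 + f_2 + F$ we arrive at
$$\Lambda_\phi(f_1) + \Lambda_\phi(f_2) \leq \Lambda_\phi(f_1 + f_2) + 3\delta M(f_1 + f_2 + F : f_0) \leq \Lambda_\phi(f_1 + f_2) + \varepsilon,$$
by our choice of δ in (10.3). This proves the second inequality in (10.2).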

It remains to apply a compactness argument and the Riesz representation
theorem (Theorem 7.44). For this, notice that
$$\Omega = \prod_{f \in C_c^+(G)} [0, M(f : f_0)]$$
is a compact topological space, and that by (10.1) any non-zero function φ
in $C_c^+(G)$ defines an element $\Lambda_\phi \in \Omega$ (by thinking of the product space Ω as
a space of real-valued functions on $C_c^+(G)$). For a neighbourhood U of e we
can then define
$$\Omega(U) = \overline{\{\Lambda_\phi \mid \phi \in C_c^+(G) \text{ and } \operatorname{Supp}\phi \subseteq U\}}.$$
It is clear that $U_1 \subseteq U_2$ implies that $\Omega(U_1) \subseteq \Omega(U_2)$, which in turn shows
that $\Omega(U_1) \cap \dots \cap \Omega(U_n)$ is non-empty for any finite collection of neighbourhoods
$U_1, \dots, U_n$ of e. It follows that the intersection
$$\bigcap_U \Omega(U)$$
taken over all neighbourhoods U of e ∈ G is non-empty by compactness
of Ω. Let Λ be an element in this intersection. We note that left-invariance,
positive homogeneity, and monotonicity of all functions $\Lambda_\phi$ implies the same
properties for Λ (check this). Moreover, Λ is in addition additive. In fact,
given functions $f_1, f_2 \in C_c^+(G)$ and ε > 0 there exists a neighbourhood U
satisfying (10.2). Since Λ ∈ Ω(U) there exists a $\phi \in C_c^+(G)$ with $\operatorname{Supp}\phi \subseteq U$
and with
$$|\Lambda_\phi(f_1) - \Lambda(f_1)| < \varepsilon,$$
$$|\Lambda_\phi(f_2) - \Lambda(f_2)| < \varepsilon,$$
and
$$|\Lambda_\phi(f_1 + f_2) - \Lambda(f_1 + f_2)| < \varepsilon.$$
Using (10.2) for $\Lambda_\phi$ we see that
$$\Lambda(f_1 + f_2) - 3\varepsilon \leq \Lambda(f_1) + \Lambda(f_2) \leq \Lambda(f_1 + f_2) + 4\varepsilon.$$
As ε > 0 was arbitrary, this shows that Λ is additive in the sense that
$$\Lambda(f_1 + f_2) = \Lambda(f_1) + \Lambda(f_2)$$
for all $f_1, f_2 \in C_c^+(G)$.

Now extend Λ to all of $C_c(G)$ by setting Λ(0) = 0 and
$$\Lambda(f) = \Lambda(f^+) - \Lambda(f^-), \tag{10.5}$$
where $f^+ = \max\{0, f\}$ and $f^- = \max\{0, -f\}$. Applying the argument used
immediately after (7.19), we see that Λ is now a positive linear functional
on $C_c(G)$, so by the Riesz representation theorem (Theorem 7.44) there exists
a unique locally finite measure m with
$$\Lambda(f) = \int_G f \, dm$$
for all $f \in C_c(G)$. Since
$$\int_G \lambda_g(f) \, dm = \Lambda(\lambda_g(f)) = \Lambda(f) = \int_G f \, dm$$
for any g ∈ G and $f \in C_c(G)$ it follows that m is left-invariant. Also note
that $\Lambda(f_0) = 1 = \int f_0 \, dm$. If O ⊆ G is a non-empty open subset, then every
compact set can be covered by finitely many left translates of O, showing
that m(O) = 0 would imply that m(K) = 0 for every compact set, and in
particular that $\int_G f_0 \, dm = 0$, a contradiction. It follows that m is positive on
non-empty open subsets, which completes the proof that $m = m_G$ is a left
Haar measure on G.
Proposition 10.2 (Uniqueness of the Haar measure). Let G be a locally
compact, σ-compact metrizable group. Then the left Haar measure is unique
up to a positive scalar multiple.
For the proof of uniqueness, the following will be useful.
Lemma 10.3 (Positive overlaps). Let G be as above, and suppose that m
is a left Haar measure. If $B_1, B_2 \subseteq G$ are Borel measurable sets with $m(B_1)$
and $m(B_2)$ positive, then $\{g \in G \mid m(gB_1 \cap B_2) > 0\}$ is non-empty.
Moreover, $m(B_1^{-1}) > 0$, where $B_1^{-1} = \{g^{-1} \mid g \in B_1\}$.
Proof. Let $B_1, B_2 \subseteq G$ be as in the lemma. Then
$$m(gB_1 \cap B_2) = \int 1_{gB_1}(h) 1_{B_2}(h) \, dm(h) = \int 1_{hB_1^{-1}}(g) 1_{B_2}(h) \, dm(h),$$
so by Fubini’s theorem we have
$$\int m(gB_1 \cap B_2) \, dm(g) = \int 1_{B_2}(h) \underbrace{\int 1_{hB_1^{-1}}(g) \, dm(g)}_{= m(hB_1^{-1}) = m(B_1^{-1})} dm(h) = m(B_1^{-1}) m(B_2).$$
Setting $B_2$ briefly equal to G, we obtain $m(B_1) m(G) = m(B_1^{-1}) m(G)$ and
see that $m(B_1^{-1}) > 0$ (since in measure theory 0 · ∞ = 0). With this we now
obtain the lemma since we see that the set {g ∈ G | m(gB1 ∩ B2 ) > 0} must
have positive measure with respect to m.
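On a finite group with counting measure the Fubini identity in this proof becomes a finite double count: summing |gB1 ∩ B2| over all g gives |B1⁻¹||B2|. The Python sketch below checks this on S3; the group, the sets `B1` and `B2`, and the helper names are illustrative assumptions of ours.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the symmetric group S3 with counting measure

def mul(p, q):
    """Composition p∘q of permutations given as tuples."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def translate(g, B):
    """Left translate gB = {g b : b in B}."""
    return {mul(g, b) for b in B}

B1 = set(G[:2])   # arbitrary non-empty subsets, chosen for illustration
B2 = set(G[1:5])

# Left-hand side: "integral" over g of m(g B1 ∩ B2).
total_overlap = sum(len(translate(g, B1) & B2) for g in G)
# Right-hand side: m(B1^{-1}) m(B2).
B1_inv = {inv(b) for b in B1}
print(total_overlap, len(B1_inv) * len(B2))
```

Since the total overlap is positive, at least one translate gB1 must meet B2, which is the discrete counterpart of the lemma.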
Proof of Proposition 10.2. Suppose that $m_1, m_2$ are left Haar measures
on G. Define $m = m_1 + m_2$, so that m is a left Haar measure and $m_1, m_2 \ll m$.
By the Radon–Nikodym theorem (Proposition 3.29) there exist measurable
functions $f_1, f_2 \geq 0$ with $dm_i = f_i \, dm$ for i = 1, 2.
We claim that $f_1$ is constant m-almost everywhere (and so $f_2$ is also). This
then implies that $m_1 = c_1 m$ and $m_2 = c_2 m$ for some constants $c_1, c_2 > 0$,
and so the proposition follows.
Assume now that the claim does not hold. Then there exist sets $B_1, B_2 \subseteq G$
of positive m measure such that $f_1(x) < f_1(y)$ for all $x \in B_1$ and $y \in B_2$.
We can find these sets, for example, as pre-images of two distinct intervals
$[\frac{k}{n}, \frac{k+1}{n})$ and $[\frac{\ell}{n}, \frac{\ell+1}{n})$ for some integers k < ℓ and n ≥ 1, for otherwise
$f_1$ is constant m-almost everywhere. By Lemma 10.3 there exists a g ∈ G
with $m(gB_1 \cap B_2) > 0$. For any E ⊆ G we also have
$$\int_E f_1(x) \, dm(x) = m_1(E) = m_1(g^{-1}E) = \int_{g^{-1}E} f_1 \, dm = \int_E f_1(g^{-1}x) \, dm(x)$$
by the left-invariance of $m_1$ and m. If we now take $E \subseteq gB_1 \cap B_2$ with
positive and finite measure, then y ∈ E implies $y \in B_2$ and $g^{-1}y \in B_1$,
hence $f_1(y) > f_1(g^{-1}y)$ and so
$$\int_E f_1(y) \, dm(y) > \int_E f_1(g^{-1}y) \, dm(y).$$
This contradiction proves the claim that f1 must be constant almost every-
where, and hence the proposition.
Essential Exercise 10.4. Let G be a locally compact, σ-compact, metrizable
group and let f be a measurable complex-valued function on G with
the property that $m_G(\{g \in G \mid f(k^{-1}g) \neq f(g)\}) = 0$ for every k ∈ G. Show
that f = c almost everywhere for some constant c ∈ C.
Exercise 10.5. Let G be a locally compact, σ-compact metrizable group with left Haar
measure mG . Prove the following assertions.
(a) For any continuous automorphism θ of G there exists a positive number $\operatorname{mod}_G(\theta)$
with $m_G(\theta^{-1}(B)) = \operatorname{mod}_G(\theta)\, m_G(B)$ for all Borel subsets B ⊆ G.
(b) Applying (a) to inner automorphisms $\theta_g$ defined by $\theta_g(h) = ghg^{-1}$ for all g, h in G
defines a map $\Delta_G : G \to \mathbb{R}_{>0}$ by $\Delta_G(g) = \operatorname{mod}_G(\theta_g)$. Show that $\Delta_G$, known as the
modular character, is a continuous group homomorphism with respect to multiplication
on $\mathbb{R}_{>0}$. Use this to give a formula for $\int f(gh^{-1}) \, dm(g)$.
(c) Show that
$$m_G^{(\mathrm{right})}(B) = \int_B \Delta_G(g)^{-1} \, dm_G(g)$$
defines a right-invariant Haar measure on G and show that $m_G^{(\mathrm{right})}(B) = m_G(B^{-1})$ for
all Borel subsets B ⊆ G.
(d) Show that on a compact metrizable group a left Haar measure is also a right Haar
measure.
A group G is called unimodular if any left Haar measure $m_G$ is also a right
Haar measure. Thus, for example, Exercise 10.5(d) says that compact groups
are unimodular.
Essential Exercise 10.6 (Approximate identity). Let G be a locally
compact, σ-compact, metrizable group and let $(U_n)$ be a sequence of open
neighbourhoods of the identity e ∈ G with $\operatorname{diam}(U_n) \to 0$ as n → ∞. Let $(\psi_n)$
be a sequence of non-negative functions in $L^1(G)$ with the property that $\psi_n$
vanishes outside of $U_n$ and satisfies $\int_G \psi_n \, dm_G = 1$ for all n ≥ 1 (for example,
we may set $\psi_n = \frac{1}{m_G(U_n)} 1_{U_n}$). Show that
$$\lim_{n \to \infty} \psi_n * f = \lim_{n \to \infty} f * \psi_n = f$$
for all $f \in L^1(G)$.

Exercise 10.7. Show that if a locally compact σ-compact metrizable group G has the
property that mG (G) is finite, then G is compact.
Exercise 10.8. A Haar measure on the additive reals (R, +) is (up to a scalar multiple)
the Lebesgue measure dx. Show that a Haar measure on the multiplicative reals $(\mathbb{R} \setminus \{0\}, \cdot)$
is given by $\frac{dx}{|x|}$.
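Before attempting a proof, the claim of Exercise 10.8 can be tested numerically: the measure of an interval [a, b] with respect to dx/|x| should not change when the interval is multiplied by a non-zero constant. The Python sketch below does this with a midpoint rule; the function name `mult_measure`, the interval, the scaling factor, and the step count are illustrative assumptions of ours, and the computation is a plausibility check rather than a proof.

```python
import math

def mult_measure(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of dx/|x| over the
    interval with endpoints a and b (which must not contain 0)."""
    lo, hi = min(a, b), max(a, b)
    h = (hi - lo) / steps
    return sum(h / abs(lo + (i + 0.5) * h) for i in range(steps))

a, b = 1.0, 3.0
c = -2.5  # multiplication by c maps [a, b] onto the interval with endpoints c*a, c*b

original = mult_measure(a, b)
scaled = mult_measure(c * a, c * b)
print(original, scaled, math.log(b / a))
```

Both values agree with log(b/a) up to discretization error, as invariance under multiplication predicts.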
Exercise 10.9. Let G be the group of affine transformations $x \mapsto ax + b$ with a ≠ 0
and a, b ∈ R, which may also be thought of as the matrix group
$$G = \Bigl\{\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \Bigm| a, b \in \mathbb{R},\ a \neq 0\Bigr\}$$
under matrix multiplication. Show that $dm_G = \frac{da \, db}{a^2}$ defines a left Haar measure on G
and $dm_G^{(\mathrm{right})} = \frac{da \, db}{|a|}$ defines a right Haar measure on G. Compute the modular character
on G (as defined in Exercise 10.5).
Exercise 10.10. Show that $dm_{\mathrm{GL}_d(\mathbb{R})}(g) = \frac{dg}{|\det g|^d}$ defines a left and right Haar measure
on $\mathrm{GL}_d(\mathbb{R})$, where dg denotes Lebesgue measure on the space of real d × d matrices.
10.2 Amenable Groups†
Using the material of Chapter 8 we continue the discussion from Sec-
tion 7.2.2, where the concept of amenability was introduced for discrete
groups.
† Apart from Exercise 10.35 in Section 10.3, this section will not be used later.
10.2.1 Definitions and Main Theorem
In this section we will always assume that either

• G is a discrete (but not necessarily countable) group and m = mG denotes
the counting measure defined by m(A) = |A| for all A ⊆ G; or
• G is a locally compact σ-compact metrizable group and m = mG denotes
a left Haar measure defined on the Borel σ-algebra of G.
We recall that in either case the dual space to $L^1(G)$ is precisely $L^\infty(G)$
by Proposition 7.34 resp. Exercise 7.33(e). Let us also introduce the convex
set
$$P(G) = \Bigl\{f \in L^1(G) \Bigm| f \geq 0 \text{ a.e. and } \int_G f \, dm_G = 1\Bigr\}$$
of probability distributions, and the convex set M(G) of means on G defined
by
$$M(G) = \{M \in L^\infty(G)^* \mid M \text{ is positive and } M(1) = 1\},$$
where $M \in L^\infty(G)^*$ is called positive if $\Phi \in L^\infty(G)$ with Φ ≥ 0 almost
everywhere implies M(Φ) ≥ 0.
For a function f on G we write λg f (h) = f (g −1 h) for g, h ∈ G. Notice that
this definition extends to equivalence classes of functions, so λg is an operator
on any function space Lp (G) with p ∈ [1, ∞]. We can now extend the notion
of amenability via a suitable form of the characterization in Lemma 7.18.
Definition 10.11. We say that G is amenable if there exists a left-invariant

mean M on L∞ (G), meaning a mean M ∈ M (G) satisfying in addition the
left-invariance property M (Φ) = M (λg Φ) for any Φ ∈ L∞ (G) and g ∈ G.
The link between amenability and geometric properties of a group seen

in Lemma 7.24 also extends to this setting (see also Proposition 10.19 for a
strengthening).
Definition 10.12. A group G admits Følner sets if for any compact subset K
of G and ε > 0 there exists a measurable set F ⊆ G of positive and finite m-measure
with
$$\frac{m(kF \triangle F)}{m(F)} < \varepsilon$$
for all k ∈ K. In this case we will also call F a Følner set (for (K, ε)).
Exercise 10.13. Suppose that G is a locally compact σ-compact metrizable group that
admits Følner sets. Show that there exists a sequence (called a Følner sequence) (Fn )
of measurable sets with positive and finite m-measure so that for any fixed k ∈ G we
have m(kFn △Fn )/m(Fn ) → 0 as n → ∞ and the convergence is uniform on compact
subsets of G.
In the discrete case K and F are finite sets and Definition 10.12 may be
thought of as follows. The Cayley graph Γ (G, K) associated to G and the
subset K (which may or may not generate G) is the graph with vertices given
by elements of G, with edges joining g to kg for any k ∈ K. Then G admits
Følner sets means that for any ε > 0 there is a finite set F such that the
number of edges in Γ (G, K) leaving F is at most ε|F |. This stands in stark
contrast to the property of being an expander graph (see Section 10.4).
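For a concrete discrete example, take G = Z (written additively) and F = {0, 1, ..., N − 1}. Translating F by k changes it in exactly 2|k| points when |k| ≤ N, so the ratio in Definition 10.12 is 2|k|/N and tends to zero as N grows. The Python sketch below computes these ratios; the helper names `shift_set`, `folner_ratio`, `is_folner` and the sample parameters are illustrative assumptions of ours.

```python
def shift_set(k, F):
    """Translate kF of a finite subset F of the additive group Z."""
    return {k + x for x in F}

def folner_ratio(k, F):
    """|kF △ F| / |F| with respect to counting measure."""
    return len(shift_set(k, F) ^ F) / len(F)

def is_folner(F, K, eps):
    """Check the condition of Definition 10.12 for the finite set K."""
    return all(folner_ratio(k, F) < eps for k in K)

# Ratios for the generator k = 1 and intervals of growing length.
ratios = [folner_ratio(1, set(range(N))) for N in (10, 100, 1000)]
print(ratios)

K = range(-3, 4)  # a finite (hence compact) subset of Z
print(is_folner(set(range(100)), K, 0.01), is_folner(set(range(1000)), K, 0.01))
```

A long interval is a Følner set for (K, ε) as soon as its length exceeds 2 max|k| / ε, which is why Z admits Følner sets.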
It should be clear that the two notions above — amenability and admitting
Følner sets — are related. In fact, our main goal in this section is to prove
Lemma 7.24 and its converse in this more general setting. For the more
difficult part of the equivalence one more definition will be useful.
Definition 10.14 (Reiter’s condition). A group G fulfills the Reiter condition
in $L^1$ if for any compact set K ⊆ G and ε > 0 there exists
some f ∈ P(G) with
$$\|\lambda_k f - f\|_1 < \varepsilon$$
for all k ∈ K. We say that $L^2(G)$ has almost invariant vectors (or that G
fulfills the Reiter condition in $L^2$) if for any compact set K ⊆ G and ε > 0
there exists some $f \in L^2(G)$ with $\|f\|_2 = 1$ and with
$$\|\lambda_k f - f\|_2 < \varepsilon$$
for all k ∈ K.
Theorem 10.15. Let G be a discrete group or a locally compact σ-compact

metrizable group. Then the following are equivalent:
(1) G is amenable;
(2) G admits Følner sets;
(3) G fulfills the Reiter condition in L1 ; and
(4) L2 (G) has almost invariant vectors.
10.2.2 Proof of Theorem 10.15
We will restrict ourselves in the following to R-valued functions, but it is not
hard to see that with a bit more work this can be avoided.
Proof that (2)⇐⇒(3)⇐⇒(4). Suppose (2) holds and F is a Følner set for
a compact subset K ⊆ G and ε > 0 as in Definition 10.12. Then $f = \frac{1}{m(F)} 1_F$
lies in P(G), $\lambda_k 1_F(g) = 1_F(k^{-1}g) = 1_{kF}(g)$, and so
$$\|\lambda_k f - f\|_1 = \frac{1}{m(F)} \int |1_{kF} - 1_F| \, dm = \frac{1}{m(F)}\, m(kF \triangle F) < \varepsilon$$
for all k ∈ K. Similarly, if we set $f_2 = \frac{1}{\sqrt{m(F)}} 1_F$ we see that
$$\|\lambda_k f_2 - f_2\|_2 = \Bigl(\int_G \frac{1}{m(F)} \bigl(1_{kF}(g) - 1_F(g)\bigr)^2 \, dm(g)\Bigr)^{1/2} = \sqrt{\frac{m(kF \triangle F)}{m(F)}} < \sqrt{\varepsilon}.$$
Since ε > 0 and K ⊆ G were arbitrary, we see that (2) implies (3) and (4).
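The two norm computations above can be confirmed numerically on G = Z with counting measure and F = {0, ..., N − 1}. The Python sketch below represents finitely supported functions as dictionaries; the helper names `shift` and `norm_diff` and the values of N and k are illustrative assumptions of ours.

```python
import math

def shift(k, f):
    """(lam_k f)(g) = f(g - k) on the additive group Z; f is a dict with finite support."""
    return {g + k: v for g, v in f.items()}

def norm_diff(f, h, p):
    """The l^p norm of f - h for finitely supported f, h."""
    keys = set(f) | set(h)
    return sum(abs(f.get(g, 0.0) - h.get(g, 0.0)) ** p for g in keys) ** (1.0 / p)

N, k = 50, 3
F = set(range(N))
f1 = {g: 1.0 / N for g in F}             # f = 1_F / m(F), an element of P(G)
f2 = {g: 1.0 / math.sqrt(N) for g in F}  # f2 = 1_F / sqrt(m(F)), a unit vector in l^2

sym_diff = len({g + k for g in F} ^ F)   # m(kF △ F)
l1 = norm_diff(shift(k, f1), f1, 1)
l2 = norm_diff(shift(k, f2), f2, 2)
print(l1, sym_diff / N, l2, math.sqrt(sym_diff / N))
```

The first norm equals the Følner ratio and the second equals its square root, so a Følner set for (K, ε) yields an ε-invariant vector in ℓ¹ and a √ε-invariant vector in ℓ².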
Assuming that L2 (G) has almost invariant vectors. If G satisfies (4)
and f2 ∈ L2 (G) satisfies kf2 k2 = 1 and kλk f2 − f2 k2 < ε for all k in the
compact set K ⊆ G, then we define f (g) = f2 (g)2 for all g ∈ G and see
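immediately that f ≥ 0 and $\|f\|_1 = \|f_2\|_2^2 = 1$. Moreover, for k ∈ K we also
have
$$\begin{aligned} \|\lambda_k f - f\|_1 &= \int_G \bigl|f_2(k^{-1}g)^2 - f_2(g)^2\bigr| \, dm(g) \\ &= \int_G \bigl|f_2(k^{-1}g) - f_2(g)\bigr| \, \bigl|f_2(k^{-1}g) + f_2(g)\bigr| \, dm(g) \\ &= \langle |\lambda_k f_2 - f_2|, |\lambda_k f_2 + f_2| \rangle_{L^2(G)} \\ &\leq \|\lambda_k f_2 - f_2\|_2 \, \|\lambda_k f_2 + f_2\|_2 \leq 2\varepsilon. \end{aligned}$$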
Therefore (4) implies (3).
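The squaring trick can be tested on G = Z with a normalized tent function playing the role of the almost invariant vector. The Python sketch below compares the ℓ¹ distance for f = f2² with the Cauchy–Schwarz bound from the display above; the tent function, its width `R`, the shift `k`, and the helper names are illustrative assumptions of ours.

```python
import math

R = 40
raw = {g: float(R - abs(g)) for g in range(-R + 1, R)}  # tent function on Z
norm2 = math.sqrt(sum(v * v for v in raw.values()))
f2 = {g: v / norm2 for g, v in raw.items()}             # unit vector in l^2
f = {g: v * v for g, v in f2.items()}                   # f = f2^2, so ||f||_1 = 1

def shift(k, u):
    """(lam_k u)(g) = u(g - k)."""
    return {g + k: v for g, v in u.items()}

def combine(u, w, sign):
    """The function u + sign * w on the union of the supports."""
    keys = set(u) | set(w)
    return {g: u.get(g, 0.0) + sign * w.get(g, 0.0) for g in keys}

def norm(u, p):
    return sum(abs(v) ** p for v in u.values()) ** (1.0 / p)

k = 1
lhs = norm(combine(shift(k, f), f, -1), 1)      # ||lam_k f - f||_1
eps = norm(combine(shift(k, f2), f2, -1), 2)    # ||lam_k f2 - f2||_2
cauchy_schwarz = eps * norm(combine(shift(k, f2), f2, +1), 2)
print(lhs, cauchy_schwarz, 2 * eps)
```

The three printed numbers appear in increasing order, matching the chain of inequalities in the proof.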

Assuming the Reiter condition in L1 (G). Assume now that (3) holds.
We wish to find a Følner set as in Definition 10.12. Therefore let K ⊆ G
be compact and assume without loss of generality that mG (K) > 0. Further
fix ε > 0 and let f ∈ P(G) be as in Reiter’s condition in Definition 10.14.
For every α > 0 we define the measurable set Fα = {g ∈ G | f (g) > α},
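which will be a Følner set if we choose α carefully. By Fubini’s theorem we
have
$$\int_0^\infty m(F_\alpha) \, d\alpha = \int_0^\infty \int_G 1_{F_\alpha}(g) \, dm(g) \, d\alpha = \int_G \int_0^\infty 1_{F_\alpha}(g) \, d\alpha \, dm(g) = \int_G f(g) \, dm(g) = \|f\|_1 = 1.$$
Moreover, for any k ∈ K we also have
$$\begin{aligned} \int_0^\infty m(kF_\alpha \triangle F_\alpha) \, d\alpha &= \int_0^\infty \int_G |1_{kF_\alpha}(g) - 1_{F_\alpha}(g)| \, dm(g) \, d\alpha \\ &= \int_G \int_0^\infty \bigl|1_{F_\alpha}(k^{-1}g) - 1_{F_\alpha}(g)\bigr| \, d\alpha \, dm(g) \\ &= \int_G \bigl|f(k^{-1}g) - f(g)\bigr| \, dm(g) = \|\lambda_k f - f\|_1 < \varepsilon. \end{aligned}$$
Integrating this over K we obtain
$$\int_0^\infty \int_K m(kF_\alpha \triangle F_\alpha) \, dm(k) \, d\alpha < \varepsilon m(K) = \int_0^\infty \varepsilon m(K) m(F_\alpha) \, d\alpha.$$
Therefore there must exist some α ∈ (0, ∞) such that
$$\int_K m(kF_\alpha \triangle F_\alpha) \, dm(k) < \varepsilon m(K) m(F_\alpha). \tag{10.6}$$
In the case when G is discrete, K is finite, and this gives
$$|kF_\alpha \triangle F_\alpha| < \varepsilon |K| |F_\alpha|$$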
for all k ∈ K. Since ε > 0 was arbitrary this proves (2) in the discrete case.
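The two level-set identities used in this argument can be verified exactly for a finitely supported probability vector on Z, since the integrand in α is then a step function. The Python sketch below uses exact rational arithmetic; the vector `weights`, the shift `k`, and the helper names are illustrative assumptions of ours.

```python
from fractions import Fraction

# A probability vector on Z with finite support (values chosen for illustration).
weights = {0: 4, 1: 3, 2: 3, 3: 1, 5: 1}
total = sum(weights.values())
f = {g: Fraction(w, total) for g, w in weights.items()}

def level_set(alpha):
    """F_alpha = {g : f(g) > alpha}."""
    return {g for g, v in f.items() if v > alpha}

def integrate_over_levels(size_of):
    """Integrate alpha -> size_of(F_alpha) over (0, infinity).
    F_alpha only changes at the values of f, so the integrand is a step function."""
    cuts = [Fraction(0)] + sorted(set(f.values()))
    return sum((hi - lo) * size_of(level_set(lo)) for lo, hi in zip(cuts, cuts[1:]))

k = 1
mass = integrate_over_levels(len)                                    # integral of m(F_alpha)
sym = integrate_over_levels(lambda F: len({g + k for g in F} ^ F))   # integral of m(kF_alpha △ F_alpha)
l1_shift = sum(abs(f.get(g - k, 0) - f.get(g, 0)) for g in range(-10, 20))  # ||lam_k f - f||_1
print(mass, sym, l1_shift)
```

Because the average over α of m(kF_α △ F_α) equals the ℓ¹ distance between λ_k f and f, some level set must do at least as well as the average, which is how the proof selects α.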
In the non-discrete case, the statement in (10.6) is an averaged form of
the inequality we are seeking, and as a result seems to be weaker than what
we need. For the upgrade we use the fact that ε > 0 was arbitrary: we have
shown that for any ε > 0 and δ > 0 there exists a measurable set $F = F_\alpha$
such that
$$\int_K m(kF \triangle F) \, dm(k) < \varepsilon\delta m(F) < \infty.$$
In particular, we must have m(N) < δ if
$$N = \{k \in K \mid m(kF \triangle F) \geq \varepsilon m(F)\}.$$
Summarising, we have shown for any compact set K, any ε > 0, and any δ > 0
that there exists a measurable set F with finite measure and a subset N ⊆ K
with m(N) < δ such that
$$m(kF \triangle F) < \varepsilon m(F) \tag{10.7}$$
for all $k \in K \setminus N$.
We now use the group structure to upgrade this and deduce the existence
of Følner sets. Define $K_1 = K \cup K^2$ and $\delta = \frac{1}{2} m(K)$. Now apply the argument
above to $K_1$, an arbitrary ε > 0, and this choice of δ. This gives a measurable
set F ⊆ G of finite measure satisfying (10.7) for all $k_1 \in K_1 \setminus N$ and some
exceptional set $N \subseteq K_1$ of measure $m(N) < \frac{1}{2} m(K)$. Now fix some k ∈ K.
We see that (10.7) holds for $k_1 \in K \setminus N$ where $m(K \setminus N) > \frac{1}{2} m(K)$, and also
that (10.7) holds for all $kk_1 \in (kK) \setminus N$ where $m((kK) \setminus N) > \frac{1}{2} m(K)$. Since
left translation by k preserves the measure m, it follows that there exists
some $k_1 \in K$ such that (10.7) holds both for $k_1$ and for $kk_1$. Therefore
$$m(kF \triangle F) \leq m\bigl((kF \triangle kk_1F) \cup (kk_1F \triangle F)\bigr) \leq m(F \triangle k_1F) + \varepsilon m(F) < 2\varepsilon m(F)$$
for any k ∈ K, proving (2).

To summarize, we have shown that (2), (3) and (4) are equivalent. We
now turn to the equivalence between (1) and (3), which is where functional
analysis will play an important role. In the following we will frequently use
the left-invariance of m in the form
$$\langle \lambda_k f, \Phi \rangle = \int f(k^{-1}g)\Phi(g) \, dm(g) = \int f(g')\Phi(kg') \, dm(g') = \langle f, \lambda_{k^{-1}}\Phi \rangle$$
for $f \in L^1_m(G)$, $\Phi \in L^\infty(G)$ and k ∈ G.

Proof that (3)=⇒(1) in Theorem 10.15. Assume that G fulfills Reiter’s
condition. This shows that for a given ε > 0 and compact K ⊆ G the function
f as in Definition 10.14 satisfies
$$|\langle f, \lambda_{k^{-1}}\Phi - \Phi \rangle| = |\langle f, \lambda_{k^{-1}}\Phi \rangle - \langle f, \Phi \rangle| = |\langle \lambda_k f - f, \Phi \rangle| \leq \varepsilon \|\Phi\|_\infty$$
for k ∈ K and $\Phi \in L^\infty(G)$. Taking the image of such functions under the
embedding map ı into the dual of $L^\infty(G)$ we see that
$$A(\varepsilon, \Phi, k) = \{M \in M(G) \mid |M(\Phi_i - \lambda_{k_j}\Phi_i)| \leq \varepsilon \|\Phi_i\|_\infty \text{ for all } i, j\}$$
is non-empty for any choice of ε > 0,
$$\Phi = (\Phi_1, \dots, \Phi_\ell) \in (L^\infty(G))^\ell,$$
$$k = (k_1, \dots, k_n) \in G^n,$$
and any ℓ, n ∈ N. By definition
$$A(\varepsilon, \Phi, k) \subseteq M(G)$$
is weak* closed and contained in the closed unit ball of $L^\infty(G)^*$ (check this).
Since any finite intersection of such sets will contain another such set we see
that the collection of sets of the form $A(\varepsilon, \Phi, k)$ has the finite intersection
property. By the Banach–Alaoglu theorem (Theorem 8.10) it follows that
the intersection over these sets is non-empty. By definition, this intersection
consists of all left-invariant means on $L^\infty(G)$.
For the converse, which is perhaps the most surprising part of the whole
proof, we will need the following lemma.
Lemma 10.16. Let G be as above, and let
ı : L1 (G) −→ L1 (G)∗∗ = L∞ (G)∗
be the natural embedding into the bidual of L1 (G). Then the weak* closure of
the image of P(G) under ı in L∞ (G)∗ is M (G).
Proof. Assume for the purpose of a contradiction that there is some
mean M ∈ M(G) that is not in the weak* closure K of ı(P(G)). Applying
Theorem 8.73 to $X = L^\infty(G)^*$ equipped with the weak* topology, the
closed set K, and M ∉ K gives a continuous linear functional on X separating
M from K. By Lemma 8.13 this functional is an evaluation map at
some $\Phi \in L^\infty(G)$. Hence the conclusion of Theorem 8.73 is precisely that
there is some c ∈ R with
$$\int_G f\Phi \, dm = \langle \Phi, \imath(f) \rangle \leq c < M(\Phi)$$
for all f ∈ P(G). This implies that Φ ≤ c almost everywhere, since otherwise
we could find a measurable set B ⊆ G of finite positive measure with Φ(g) > c
for g ∈ B, and then setting $f = \frac{1}{m(B)} 1_B \in P(G)$ leads to a contradiction.
However, Φ ≤ c almost everywhere also implies that M(Φ) ≤ c by the properties
of M ∈ M(G). This contradiction proves the lemma.
We start with the discrete case as it is significantly easier.
Proof of (1)=⇒(3) in Theorem 10.15 for discrete G. Assume that
there exists a left-invariant mean M. Using M we wish to find, for any ε > 0
and finite K ⊆ G, a function f ∈ P(G) such that
$$\|\lambda_k f - f\|_1 < \varepsilon$$
for all k ∈ K. Define the bounded linear operator
$$D : \ell^1(G) \longrightarrow \ell^1(G)^K, \qquad f \longmapsto (\lambda_k f - f)_{k \in K}.$$
Note that D(P(G)) is convex, and we wish to show that $0 \in \overline{D(P(G))}^{\mathrm{norm}}$.
By Corollary 8.74 we know that $\overline{D(P(G))}^{\mathrm{norm}}$ is also closed in the weak
topology. Therefore it is enough to show that
$$0 \in \overline{D(P(G))}^{\mathrm{weak}} = \overline{D(P(G))}^{\mathrm{norm}}. \tag{10.8}$$
The dual of $\ell^1(G)^K$ is given by $(\ell^\infty(G))^K$ and it suffices to find, for
every $\Phi_1, \dots, \Phi_n \in \ell^\infty(G)$ and ε > 0, some f ∈ P(G) with
$$|\langle \lambda_k f - f, \Phi_j \rangle| < \varepsilon \tag{10.9}$$
for all k ∈ K and j = 1, . . . , n. The left-hand side of (10.9) may be rewritten
as
$$|\langle \lambda_k f - f, \Phi_j \rangle| = |\langle f, \lambda_{k^{-1}}\Phi_j - \Phi_j \rangle| = |\langle \lambda_{k^{-1}}\Phi_j - \Phi_j, \imath(f) \rangle|. \tag{10.10}$$
Note that
$$\langle \lambda_{k^{-1}}\Phi_j - \Phi_j, M \rangle = M(\lambda_{k^{-1}}\Phi_j - \Phi_j) = 0$$
for the invariant mean M. By Lemma 10.16 we know that ı(P(G)) is dense
in M(G) with respect to the weak* topology, so there must exist an element f
of P(G) for which (10.10) is less than ε for all k ∈ K and j = 1, . . . , n, which
proves (10.9), (10.8), and hence that G fulfills the Reiter condition in $L^1$.
In the non-discrete case another ingredient is needed.
Lemma 10.17 (Topological left-invariant mean). Let G be a σ-compact,

locally compact, metrizable amenable group. Then there also exists a ‘topolo-
gically left-invariant mean’ on L∞ (G), that is, a mean Mtop on L∞ (G) such
that
Mtop (f ∗ Φ) = Mtop (Φ)
for any f ∈ P(G) and Φ ∈ L∞ (G).
Proof. We start by noting that the definition of f ∗ Φ in Lemma 3.75 and
Exercise 3.76 makes sense at every g ∈ G and easily implies that
$$\|f * \Phi\|_\infty \leq \|f\|_1 \|\Phi\|_\infty \tag{10.11}$$
for any $f \in L^1(G)$ and $\Phi \in L^\infty(G)$.
Given $f_0, f_1 \in P(G)$ and $\Phi \in L^\infty(G)$ we define
$$\Phi_0 = f_0 * \Phi$$
and claim that
$$M(f_1 * \Phi_0) = M(\Phi_0) \tag{10.12}$$
if M is a left-invariant mean on $L^\infty(G)$.
For this we first recall from Lemma 3.74 that for a given $f_0$ and ε > 0
there exists a neighbourhood U of e ∈ G such that
$$\|\lambda_k f_0 - f_0\|_1 < \varepsilon$$
for all k ∈ U. Using the left-invariance of the Haar measure we obtain
$$\begin{aligned} \lambda_k\Phi_0(g) = (f_0 * \Phi)(k^{-1}g) &= \int_G f_0(h)\Phi(h^{-1}k^{-1}g) \, dm(h) \\ &= \int_G f_0(k^{-1}h_1)\Phi(h_1^{-1}g) \, dm(h_1) \quad (\text{with } h_1 = kh) \\ &= \bigl((\lambda_k f_0) * \Phi\bigr)(g). \end{aligned} \tag{10.13}$$
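On a finite group with counting measure the convolution is a finite sum, and both the identity (10.13) and the estimate (10.11) can be checked directly. The Python sketch below does so on S3 with exact rational arithmetic; the group, the particular `f0` and `Phi`, and the helper names are illustrative assumptions of ours.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # the symmetric group S3 with counting measure

def mul(p, q):
    """Composition p∘q of permutations given as tuples."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def lam(k, f):
    """(lam_k f)(g) = f(k^{-1} g)."""
    k_inv = inv(k)
    return {g: f[mul(k_inv, g)] for g in G}

def conv(f, Phi):
    """(f * Phi)(g) = sum over h of f(h) Phi(h^{-1} g)."""
    return {g: sum(f[h] * Phi[mul(inv(h), g)] for h in G) for g in G}

f0 = {g: Fraction(i + 1, 21) for i, g in enumerate(G)}  # a probability vector
Phi = {g: (-1) ** i * (i + 2) for i, g in enumerate(G)}  # an arbitrary bounded function

# (10.13): lam_k (f0 * Phi) = (lam_k f0) * Phi for every k.
identity_holds = all(lam(k, conv(f0, Phi)) == conv(lam(k, f0), Phi) for k in G)
# (10.11) with ||f0||_1 = 1: the sup norm does not increase under convolution.
sup_bound_holds = max(abs(v) for v in conv(f0, Phi).values()) <= max(abs(v) for v in Phi.values())
print(identity_holds, sup_bound_holds)
```

The check uses a non-abelian group on purpose, since the placement of the translation on the left factor matters there.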
Together with (10.11) we deduce that
$$\|\lambda_k\Phi_0 - \Phi_0\|_\infty = \|(\lambda_k f_0 - f_0) * \Phi\|_\infty \leq \|\lambda_k f_0 - f_0\|_1 \|\Phi\|_\infty \leq \varepsilon \|\Phi\|_\infty$$
for all k ∈ U. In other words, for $\Phi_0$ the left regular representation satisfies
the continuity claim appearing in Lemma 3.74, but with respect to the $\|\cdot\|_\infty$
norm. This property of $\Phi_0$ is called left uniform continuity of $\Phi_0$. As this is
precisely the assumption for the strong integral discussed in Proposition 3.81,
it follows that for $f_1 \in C_c(G)$ the integral $\text{R-}\!\int f_1(g)\lambda_g\Phi_0 \, dm_G(g)$ can be obtained
as a limit with respect to $\|\cdot\|_\infty$ of Riemann sums of the form
$$\sum_{P \in \xi} f_1(g_P)\, \lambda_{g_P}\Phi_0\, m_G(P),$$
where ξ is a finite partition of $\operatorname{Supp}(f_1)$ and $g_P \in P$ for each P ∈ ξ. As
convergence with respect to $\|\cdot\|_\infty$ implies pointwise convergence, we see
that $\text{R-}\!\int f_1(g)\lambda_g\Phi_0 \, dm_G(g) = f_1 * \Phi_0$. Applying the continuous functional M
we see that
$$M(f_1 * \Phi_0) = \lim_\xi \sum_{P \in \xi} f_1(g_P)\, m(P)\, M(\lambda_{g_P}\Phi_0) = M(\Phi_0) \int_G f_1 \, dm$$
since $M(\lambda_g\Phi_0) = M(\Phi_0)$ for any g ∈ G. Using the estimate (10.11) again and
the density of $C_c(G)$ in $L^1(G)$ this extends to all $f_1 \in L^1(G)$. Restricting to
functions in P(G) the claim in (10.12) follows.
We now make the definition
$$M_{\mathrm{top}}(\Phi) = M(\Phi_0) = M(f_0 * \Phi)$$
for some $f_0 \in P(G)$. Note that $M_{\mathrm{top}}(1) = M(1) = 1$ and that Φ ≥ 0 almost
surely implies $f_0 * \Phi \geq 0$ and $M_{\mathrm{top}}(\Phi) \geq 0$. We also claim that this definition
is independent of $f_0$. Using this independence we see that
$$M_{\mathrm{top}}(f_1 * \Phi) = M(f_0 * f_1 * \Phi) = M_{\mathrm{top}}(\Phi)$$
for any $f_1 \in P(G)$ by associativity of convolution (cf. the proof of Proposition 3.91),
and the lemma follows.
To see the independence let $(\psi_n)_n$ be an approximate identity in $L^1(G)$
(see Exercise 10.6) so that
$$\lim_{n \to \infty} \|f_0 * \psi_n - f_0\|_1 = 0,$$
and so by (10.11) and continuity of M also
$$\lim_{n \to \infty} M(f_0 * \psi_n * \Phi) = M(f_0 * \Phi).$$
Combining this with (10.12) and $\psi_n \in P(G)$ we see that
$$\lim_{n \to \infty} M(\psi_n * \Phi) = M(f_0 * \Phi),$$
which gives the claim and the lemma.
Proof of (1) =⇒ (3) in Theorem 10.15 without discreteness.
Fix $\psi_1, \dots, \psi_n \in P(G)$. Applying the same argument as in the discrete case
but to the map
$$D : L^1(G) \longrightarrow L^1(G)^n, \qquad f \longmapsto (\psi_j * f - f)_j$$
and using the existence of a topological left-invariant mean as in Lemma 10.17,
we conclude that it is possible to find, for every ε > 0, some f ∈ P(G) such
that
$$\|\psi_j * f - f\|_1 < \varepsilon \tag{10.14}$$
for j = 1, . . . , n.
Now fix some ψ ∈ P(G) and some dense countable subset $\{g_1, g_2, \dots\} \subseteq G$
with $g_1 = e$. Define $\psi_j = \lambda_{g_j}\psi$ and apply the argument above to $\psi_1, \dots, \psi_n$
and $\varepsilon = \frac{1}{n}$. This shows that there exists a sequence $(f_n)$ in P(G) with
$$\|\lambda_{g_j}\psi * f_n - f_n\|_1 \longrightarrow 0$$
as n → ∞ for every j.
Now let K ⊆ G be a compact subset and fix ε > 0. By Lemma 3.74 there
exists some neighbourhood U of e ∈ G such that
$$\|\lambda_u\psi - \psi\|_1 < \varepsilon$$
for u ∈ U. By density of $\{g_1, g_2, \dots\}$ we have $G = \bigcup_{j=1}^\infty g_jU$. By compactness
of K there exists some ℓ such that
$$K \subseteq \bigcup_{j=1}^\ell g_jU. \tag{10.15}$$
By construction we can choose n large enough to ensure that
$$\|\lambda_{g_j}\psi * f_n - f_n\|_1 < \varepsilon$$
for j = 1, . . . , ℓ. For k ∈ K there exists by (10.15) some j ≤ ℓ with $k = g_ju$
for some u ∈ U, which shows that
$$\|\lambda_k\psi - \lambda_{g_j}\psi\|_1 = \|\lambda_{g_j}(\lambda_u\psi - \psi)\|_1 < \varepsilon.$$
Thus we deduce (after recalling that $g_1 = e$ and $f_n \in P(G)$) that
$$\begin{aligned} \|\lambda_k\psi * f_n - \psi * f_n\|_1 \leq{} & \|(\lambda_k\psi - \lambda_{g_j}\psi) * f_n\|_1 \\ & + \|\lambda_{g_j}\psi * f_n - f_n\|_1 \\ & + \|f_n - \psi * f_n\|_1 < 3\varepsilon. \end{aligned}$$
Finally, notice that
$$\begin{aligned} \bigl((\lambda_k\psi) * f_n\bigr)(g) &= \int \lambda_k\psi(h) f_n(h^{-1}g) \, dm(h) \\ &= \int \psi(k^{-1}h) f_n(h^{-1}g) \, dm(h) \\ &= \int \psi(h_1) f_n(h_1^{-1}k^{-1}g) \, dm(h_1) = \lambda_k(\psi * f_n)(g) \end{aligned}$$
for all g ∈ G and k ∈ K. Hence the function $\psi * f_n \in P(G)$ satisfies Reiter’s
condition for K ⊆ G and 3ε.
Exercise 10.18. Fill in the details of the argument leading to (10.14).
10.2.3 A More Uniform Følner Set
For subsets A, B ⊆ G of a group we define AB = {ab | a ∈ A, b ∈ B}.
Proposition 10.19. Let G be an amenable group as in Theorem 10.15. Then
for every non-empty compact K ⊆ G and every ε > 0 there exists a measurable
Følner set F ⊆ G with finite measure such that $m((KF) \triangle F) < \varepsilon m(F)$.
Proof. Let $U = U^{-1}$ be a compact neighbourhood of the identity e ∈ G.
Given a Følner set F for (U, ε) we define a function f : G → R by
$$f(g) = \frac{1}{m(U)} \int_U 1_F(ug) \, dm(u) = \frac{1}{m(Ug)} \int_{Ug} 1_F \, dm,$$
where we will think of f (g) as the proportion of positive answers in the neigh-
bourhood U g of g to the question of whether g should belong to an improved
version of F . In case G is not unimodular, we multiplied the integral and the
denominator in the first expression by ∆G (g), used m(U g) = ∆G (g)m(U )
in the denominator and the substitution h = ug ∈ U g for u ∈ U in the
integral (at first reading it may be helpful to assume that G is unimodu-
lar as this simplifies some of the expressions arising). Given any majority
parameter α ∈ (0, 1) we also define the set Fα = Fα (F, U ) by
$$F_\alpha = \{g \in G \mid f(g) \geq \alpha\} = \{g \in G \mid m(F \cap Ug) \geq \alpha m(Ug)\},$$
which will be a more well-rounded version of F. The defining property of F
and the definition of f together with Fubini’s theorem imply that
$$\|f - 1_F\|_1 \leq \frac{1}{m(U)} \int_U \|1_{u^{-1}F} - 1_F\|_1 \, dm(u) < \varepsilon m(F).$$
This gives
$$\beta\, m(\{g \in G \mid |f - 1_F|(g) \geq \beta\}) < \varepsilon m(F)$$
for all β ∈ (0, 1). Setting β = min{α, 1 − α} we obtain
$$m(F_\alpha \triangle F) < \frac{\varepsilon}{\min\{\alpha, 1 - \alpha\}}\, m(F). \tag{10.16}$$
Applying this construction will give us the desired Følner set. To this end,
fix some non-empty compact subset K ⊆ G. Since K is compact and U has
non-empty interior there exist k1 , . . . , kn ∈ K such that
\[
K \subseteq \bigcup_{j=1}^{n} k_j U.
\]
Since K is assumed to be non-empty we have n ≥ 1. Suppose now that F is a Følner set for $(K \cup U^2, \frac{\varepsilon}{n})$. Set $\alpha = \frac12$ and define the associated set
\[
F' = F_{1/2}(F, U)
\]
as above, so that
\[
m(F' \triangle F) < \frac{2\varepsilon}{n}\, m(F) \tag{10.17}
\]
by (10.16). Assuming $\varepsilon < \frac14$ we have
\[
m(F) \ll m(F') \ll m(F).
\]
Since n ≥ 1 we see, from the Følner property of F for $k_1 \in K$ and (10.17), that
\[
m(F' \setminus (KF')) \leq m(F' \setminus (k_1 F')) \ll \varepsilon m(F').
\]
For the second inequality we first claim that
\[
UF' \subseteq F_\alpha = F_\alpha(F, U^2) \tag{10.18}
\]
for the parameter
\[
\alpha = \frac{m(U)}{2\, m(U^2) \max_{u \in U} \Delta(u)},
\]
which only depends on our choice of the neighbourhood U. In fact, for u ∈ U and $g \in F' = F_{1/2}(F, U)$ we have
\[
m(F \cap U^2 ug) \geq m(F \cap Ug) > \tfrac12 m(Ug) = \frac{m(U)}{2\, m(U^2)}\, m(U^2 g) \geq \alpha\, m(U^2 ug),
\]
which implies (10.18).

Since F is a Følner set for $(U^2, \frac{\varepsilon}{n})$ we obtain from (10.18) and (10.16) that
\[
m(UF' \setminus F) \leq m(F_\alpha \setminus F) \ll \frac{\varepsilon}{n}\, m(F) \ll \frac{\varepsilon}{n}\, m(F'),
\]
where the implicit constant only depends on the choice of U. Using the fact that F is a Følner set for $(\{k_1, \dots, k_n\}, \frac{\varepsilon}{n})$ and (10.17), this gives
\[
m(k_j UF' \setminus F') \leq m(k_j UF' \setminus k_j F) + m(k_j F \setminus F) + m(F \setminus F') \ll \frac{\varepsilon}{n}\, m(F')
\]
for j = 1, . . . , n. Taking the union and recalling that $K \subseteq \bigcup_{j=1}^{n} k_j U$, we obtain
\[
m(KF' \setminus F') \leq m\Bigl(\bigcup_{j=1}^{n} k_j UF' \setminus F'\Bigr) \ll \varepsilon m(F').
\]
Since ε was arbitrary, this concludes the proof.
10.2.4 Further Equivalences and Properties
We conclude the discussion of amenability with a number of exercises that

extend the treatment above and generalize various earlier topics to the level
of generality of this section.
Exercise 10.20. Let G be a discrete group. Show that G is amenable if and only if every
finitely generated subgroup of G is amenable.
Exercise 10.21. Let G be a σ-compact, locally compact, metric group.

(1) Show that Definition 10.12 and Definition 10.14 could equivalently be formulated by
using only finite subsets K ⊆ G.
(2) Show that if G is amenable, then there exists a mean that is left-invariant and topolo-
gically left-invariant.
Unless otherwise noted G will be, as in Theorem 10.15, either a discrete

group or a locally compact σ-compact metrizable group.
Exercise 10.22. Let G be an amenable group.
(1) Show that there exists a bi-invariant mean on L∞ (G), that is, one which is left-invariant
and right-invariant (defined in the same way).
(2) Assume that G is in addition unimodular. Show that G admits bi-invariant Følner
sets, in the sense that they are almost invariant under left and right translation by a given
compact subset K ⊆ G.
An action of a topological group G on a locally convex vector space V is affine if every g ∈ G acts via a map $V \ni v \mapsto \pi_g^{\mathrm{aff}}(v) = \pi_g^{\mathrm{lin}}(v) + w_g$, where $w_g$ depends continuously on g and $\pi_g^{\mathrm{lin}}(v)$ depends continuously on (g, v) ∈ G × V, and linearly on v.
Exercise 10.23. Show that the following properties are equivalent.
(1) G is amenable.
(2) If G acts continuously on a compact metric space X (see Definition 3.70), then there
exists a G-invariant Borel probability measure on X.
(3) If G acts continuously by affine maps on a locally convex space V and K ⊆ V is
compact, convex, and G-invariant, then there exists a point x0 ∈ K that is fixed under all
elements of G.
Exercise 10.24. Generalize Proposition 7.20 to the groups considered here:

(1) Show that if G is amenable and H < G is a closed subgroup then H is amenable.
(2) Show that if H ⊳ G is a closed normal subgroup with the property that both H
and G/H are amenable, then G is also amenable.
Exercise 10.25. Let H < G be a closed subgroup with the property that X = G/H
supports a finite G-invariant Borel measure. Show that G is amenable if and only if H is.
In the remainder of the section we assume that G is discrete and generated

by a finite symmetric set S, where a subset S of a group G is symmetric
if s ∈ S implies that s−1 ∈ S. The associated length function assigns to each
element g ∈ G the number ℓS (g) ∈ N0 defined by the length of the shortest
representation of g as a product of elements of S, and the associated growth
function is defined by
\[
\gamma_S(n) = |\{g \in G \mid \ell_S(g) \leq n\}|
\]
for n ∈ N0 . In order to define a growth property intrinsic to the group G

rather than the pair (G, S), write γ ∼ γ ′ for functions γ, γ ′ : N → N if there
exist positive constants C1 , C2 , κ1 , κ2 such that
\[
C_1 \gamma'(\kappa_1 n) \leq \gamma(n) \leq C_2 \gamma'(\kappa_2 n)
\]
for all n ≥ 1.

Exercise 10.26. Let G be generated by a symmetric set S. Show that setting
d(g, h) = ℓS (gh−1 )
defines a metric on G.
Exercise 10.27. Show that the equivalence class [γS ]∼ of the growth function of a finitely
generated group is well-defined (meaning that it is independent of the choice of symmetric
generating set), allowing us to write γ (G) for any representative of the equivalence class.
As a result we may make the following definition. A finitely generated

infinite group G has
• polynomial growth if $\gamma^{(G)} \sim p_a$ for some a > 0, where $p_a(n) = n^a$ for all n ≥ 1;
• exponential growth if $\gamma^{(G)} \sim \exp$, where $\exp(n) = e^n$ for n ≥ 1;
• sub-exponential growth if $\limsup_{n\to\infty} \gamma^{(G)}(n)^{1/n} \leq 1$; and
• intermediate growth if it is of neither polynomial nor exponential growth.
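The growth function can be computed directly for small radii. The following is a minimal sketch, assuming a group given by a hypothetical `multiply` function and a list of generators (none of these names come from the text); it finds word lengths by breadth-first search and illustrates the difference between polynomial growth (Z² with the standard generators) and exponential growth (the free group on two letters, stored as reduced words).

```python
from collections import deque

def growth(identity, generators, multiply, radius):
    """Return [gamma_S(0), ..., gamma_S(radius)] by breadth-first search."""
    length = {identity: 0}          # word length ell_S of each element found so far
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        if length[g] == radius:
            continue                # do not expand beyond the requested radius
        for s in generators:
            h = multiply(g, s)
            if h not in length:
                length[h] = length[g] + 1
                queue.append(h)
    counts = [0] * (radius + 1)     # number of elements of each exact length
    for n in length.values():
        counts[n] += 1
    total, balls = 0, []
    for c in counts:                # gamma_S(n) counts all lengths <= n
        total += c
        balls.append(total)
    return balls

# Z^2 with the symmetric generating set {(+-1, 0), (0, +-1)}
def add(g, s):
    return (g[0] + s[0], g[1] + s[1])

z2 = growth((0, 0), [(1, 0), (-1, 0), (0, 1), (0, -1)], add, 3)

# Free group on two letters: reduced words as tuples over {1, -1, 2, -2}
def append_letter(word, s):
    if word and word[-1] == -s:
        return word[:-1]            # cancel s^{-1} s
    return word + (s,)

f2 = growth((), [1, -1, 2, -2], append_letter, 3)
print(z2, f2)
```

For Z² the ball sizes follow the quadratic 2n² + 2n + 1, while for the free group they follow 2·3ⁿ − 1.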
Exercise 10.28. (1) Show that a group of sub-exponential growth is amenable.
(2) Show that the Heisenberg group in Exercise 7.27 has polynomial growth.
(3) Show that the group
\[
\left\{ \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \;\middle|\; a \in 2^{\mathbb{Z}},\ b \in \mathbb{Z}[\tfrac12] \right\}
\]
is finitely generated, amenable, and has exponential growth.
10.3 Property (T)

†
In this section we will connect the spectral theory of unitary flows in Sec-
tion 9.3 to the discussion of expanders in Section 10.4. As we will see, the
connection will be via another property that topological groups may have.
10.3.1 Definitions and First Properties
Let us start with some fundamental definitions where we will assume that G
is a topological group and π is a unitary representation of G on a complex
Hilbert space H.
Definition 10.29 (Almost-invariant vectors). Given ε > 0 and a sub-

set Q ⊆ G, we say that a unit vector v ∈ H is (Q, ε)-almost invariant if
\[
\sup_{g \in Q} \|\pi_g v - v\| \leq \varepsilon.
\]
We also say that the unitary representation π has almost-invariant vectors if

it has (Q, ε)-almost invariant unit vectors for any ε > 0 and compact Q ⊆ G.
The case of (G, 0)-almost invariant vectors corresponds trivially to invariant vectors. Also note that every unit vector is trivially (G, 2)-almost invariant. Another elementary but less immediate observation is contained in the following exercise which relies on the geometry of Hilbert spaces.
Exercise 10.30. Suppose ε ∈ (0, 1). Assume that v ∈ H is a (G, ε)-almost invariant unit
vector. Show that there exists a non-zero vector that is invariant under all of G.
Definition 10.31 (Spectral gap). We say that π has spectral gap if π re-
stricted to (HG )⊥ does not have almost-invariant vectors, where
HG = {v ∈ H | πg v = v for all g ∈ G}
is the subspace of G-invariant vectors.
Equivalently, we have spectral gap if there exist a compact subset Q ⊆ G and some ε > 0 such that every unit vector v ∈ (HG)⊥ is moved at least by ε by some g ∈ Q; more precisely if $\sup_{g \in Q} \|\pi_g v - v\| \geq \varepsilon$.
Definition 10.32 (Property (T)). Let H < G be a closed subgroup of a

topological group G. We say that (G, H) has relative property (T) if whenever
a unitary representation π of G has almost-invariant vectors, then it has a
non-zero vector fixed by H. We say that G has property (T) if (G, G) has
relative property (T).
† This section will not be used later in the book except for Section 10.4. Amenability
and property (T) will be (almost) exclusive. We note that apart from Exercise 10.35 the
following will be independent of Section 10.2.
We note that the letter ‘T’ in property (T) stands for the trivial repres-
entation and that the parentheses indicate a neighbourhood of the trivial
representation. In fact, there is a definition of a topology on the family of
irreducible unitary representations of a topological group G — the Fell to-
pology — such that property (T) is equivalent to the trivial representation
being isolated in that topology.
Finding groups without property (T) is quite easy.
Example 10.33. Let G = Z or G = R. Then G does not have property (T).
Justification of Example 10.33. Let H = L2 (G) and use the regular

representation (λ, H) defined by λx f (y) = f (y − x) for all x, y ∈ G. Let mG
denote the Haar measure on G (that is, counting or Lebesgue measure).
Let $F_n = [-n, n]$ in G for all n ≥ 1. Then
\[
f_n = m_G(F_n)^{-1/2}\, 1_{F_n}
\]
has norm 1 and is almost-invariant in the sense that
\[
\|\lambda_x f_n - f_n\| = \frac{m_G\bigl((F_n + x) \triangle F_n\bigr)^{1/2}}{m_G(F_n)^{1/2}} \longrightarrow 0
\]
as n → ∞, uniformly on compact sets.

If G did have property (T), then $L^2(G)$ would have to contain a non-zero G-invariant function. However, a G-invariant function in $L^2(G)$ would have to be constant (see Exercise 10.4). Since $m_G(G) = \infty$, no non-zero constant function can lie in $L^2(G)$. Therefore, G does not have property (T).
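The decay of the translation defect in the justification above can be checked numerically for G = Z. The following is a minimal sketch under that assumption; the function name `translation_defect` is hypothetical and the computation uses only the counting measure on Z.

```python
import math

def translation_defect(n, x):
    """Return ||lambda_x f_n - f_n|| in l^2(Z), where f_n = |F_n|^{-1/2} 1_{F_n}
    and F_n = {-n, ..., n}."""
    F = set(range(-n, n + 1))
    shifted = {y + x for y in F}    # support of lambda_x f_n, that is F_n + x
    sym_diff = F ^ shifted          # symmetric difference of F_n + x and F_n
    return math.sqrt(len(sym_diff) / len(F))

# The defect over the compact (finite) set {-3, ..., 3} shrinks as n grows.
for n in (10, 100, 1000):
    print(n, max(translation_defect(n, x) for x in range(-3, 4)))
```

For |x| ≤ 2n + 1 the symmetric difference has exactly 2|x| elements, so the defect equals the square root of 2|x|/(2n + 1), which tends to zero for every fixed x.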
Exercise 10.34. Show that if G is a topological group with property (T), and φ is a
continuous homomorphism from G to G′ with dense image, then G′ also has property (T).
Conclude that the free group F (with at least one generator) does not have property (T).
Exercise 10.35. Let G be a discrete or locally compact σ-compact metrizable group. Show
that G is compact if and only if G is amenable and has property (T).
Comparing Definitions 10.31 and 10.32, we see that a unitary representa-

tion of a group with property (T) always has a spectral gap. The next lemma
shows that more is true.
Lemma 10.36 (Uniform spectral gap). Suppose G is a locally compact σ-compact metrizable group. Then it suffices to consider only separable Hilbert spaces in the definition of property (T). Moreover, assuming that G has property (T), all unitary representations of G have uniform spectral gap in the sense that there exist some ε > 0 and Q ⊆ G compact such that for any unitary representation π on a Hilbert space H and any unit vector v ∈ (HG)⊥, there is some g ∈ Q with $\|\pi_g v - v\| \geq \varepsilon$.
Proof. Note that by Lemma A.22, G can be written as $G = \bigcup_{n=1}^{\infty} Q_n$ for some compact subsets $Q_n \subseteq G$ with $Q_n \subseteq Q_{n+1}^{o}$ for all n ≥ 1.
Suppose that G does not satisfy the uniform spectral gap property in the lemma. Then for every n ≥ 1 there is a unitary representation $(\pi_n, \mathcal{H}_n)$ without fixed vectors such that there exists a vector $v_n \in \mathcal{H}_n$ that is $(Q_n, \frac1n)$-almost invariant. Since the unitary representation is continuous (and G is separable), it follows that $S_n = \{\pi_{n,g} v_n \mid g \in G\} \subseteq \mathcal{H}_n$ is separable. It follows that the closed linear hull $\mathcal{H}_n' = (\mathcal{H}_n)_{v_n}$ of $S_n$ is a separable G-invariant subspace of $\mathcal{H}_n$. Now define $\mathcal{H} = \bigoplus_n \mathcal{H}_n'$ with the natural unitary representation $\pi = \bigoplus_n \pi_n|_{\mathcal{H}_n'}$ of G on $\mathcal{H}$ (see Exercise 3.77) and notice that $(\pi, \mathcal{H})$ has no non-zero G-invariant vectors. Moreover, it has almost-invariant vectors since for every ε > 0 and compact K ⊆ G there exists some n such that $K \subseteq Q_n$, $\frac1n \leq \varepsilon$, and hence $v_n \in \mathcal{H}_n' \subseteq \mathcal{H}$ is (K, ε)-almost invariant. It follows that the failure of the uniform spectral gap property implies the existence of a unitary representation on a separable Hilbert space without spectral gap. This proves both statements of the lemma.
Exercise 10.37. Show that a discrete group with property (T) is finitely generated.
10.3.2 Main Theorems
In the following we will consider the groups $\mathrm{SL}_d(\mathbb{R})$ endowed with the topology induced by the inclusion $\mathrm{SL}_d(\mathbb{R}) \subseteq \mathrm{Mat}_{d,d}(\mathbb{R}) \cong \mathbb{R}^{d^2}$. Každan gave the definition of property (T) in 1967 and also gave the first examples of such groups.
Theorem 10.38 (Každan). SL3 (R) has property (T).
We note that G = SL2 (R) does not have property (T), but despite this,
many of its natural (and all of its irreducible) unitary representations have
spectral gap; we refer to [26] for references and a detailed discussion. The
main tool for proving the above theorem is the following relative version.

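Theorem 10.39 (Každan). $(\mathrm{ASL}_2(\mathbb{R}), \mathbb{R}^2)$ has relative property (T), where
\[
\mathrm{ASL}_2(\mathbb{R}) = \mathrm{SL}_2(\mathbb{R}) \ltimes \mathbb{R}^2 = \left\{ \begin{pmatrix} A & x \\ 0 & 1 \end{pmatrix} \;\middle|\; A \in \mathrm{SL}_2(\mathbb{R}),\ x \in \mathbb{R}^2 \right\}.
\]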
As we will see there is a way to push property (T) from the group SL3 (R)
to its discrete counterpart SL3 (Z).
Corollary 10.40 (Každan). SL3 (Z) has property (T).
As Margulis showed in 1988 discrete groups with property (T) quickly give
rise to expander families, which we will introduce in the next section.
10.3.3 Proof of Každan’s Property (T), Connected Case
For the proof of Theorem 10.38 we need the following property of unitary rep-
resentations of G = SLd (R) for d = 2, 3 (due to Mautner [69] and Moore [75]).
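For this we define the subgroup
\[
U_{12} = \left\{ u_x = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\}
\]
of $\mathrm{SL}_2(\mathbb{R})$. Identifying $\mathbb{R}^2$ with the subspace $\mathbb{R}^2 \times \{0\}^{d-2}$ of $\mathbb{R}^d$, we obtain an embedding
\[
g \longmapsto \begin{pmatrix} g & \\ & I_{d-2} \end{pmatrix}
\]
for $g \in \mathrm{SL}_2(\mathbb{R})$ of $\mathrm{SL}_2(\mathbb{R})$ into $\mathrm{SL}_d(\mathbb{R})$ for d ≥ 3 and may think of $U_{12}$ also as a subgroup of $\mathrm{SL}_d(\mathbb{R})$. Conjugating $U_{12}$ with permutation matrices we obtain other subgroups of $\mathrm{SL}_d(\mathbb{R})$, which we will refer to as elementary unipotent subgroups.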
Proposition 10.41 (Mautner phenomenon). Let $\pi : \mathrm{SL}_d(\mathbb{R}) \circlearrowright \mathcal{H}$ for some d ≥ 2 be a unitary representation. Suppose that $v \in \mathcal{H}$ satisfies either
• πa v = v for a non-trivial positive diagonal matrix a ∈ SLd (R), or
• πu v = v for all elements u ∈ U of an elementary unipotent subgroup U .
Then v is an invariant vector, meaning that πg v = v for all g ∈ SLd (R).
For the proof we will use the following algebraic fact for K = R. For any
field K the group SLd (K) is generated by the elementary unipotent subgroups
(defined as above but with x ∈ K). This may be seen using a modified Gauss
elimination algorithm: given any g ∈ SLd (K) it is clear that the first column is
non-zero. Multiplying g on the left by elements of U12 (or another elementary
unipotent subgroup) corresponds to the row operation of adding a multiple
of the second row to the first row (or the same with any two other rows). For
example, for d = 2 we have
\[
\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + xc & b + xd \\ c & d \end{pmatrix}
\]
for all x, a, b, c, d ∈ R. Using such operations we can obtain matrices $g'$, $g''$ and $\tilde g$ that satisfy increasingly stronger properties:
• $g'_{21} \neq 0$,
• $g''_{11} = 1$,
• $\tilde g_{11} = 1$, and $\tilde g_{21} = \tilde g_{31} = \cdots = \tilde g_{d1} = 0$.
Multiplying on the right by the same type of matrices corresponds to column operations which allows us to find now a matrix $\hat g \in HgH$ satisfying
• $\hat g_{11} = 1$, $\hat g_{1k} = \hat g_{k1} = 0$ for k ≥ 2,
where H is the subgroup of $\mathrm{SL}_d(K)$ generated by the elementary unipotent subgroups. Using induction on the number of variables we see that $\hat g \in H$, which implies that g ∈ H and thus $H = \mathrm{SL}_d(K)$.
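For d = 2 the elimination argument can be made completely explicit. The following is a minimal sketch over the field of rational numbers (an assumption made so that the arithmetic is exact); the helper names `upper`, `lower`, `mul`, `factor` and `product` are hypothetical. It writes a matrix of determinant 1 as a product of at most four elementary unipotent matrices.

```python
from fractions import Fraction

def upper(x):
    """Element of U12: left multiplication adds x times row 2 to row 1."""
    return ((Fraction(1), Fraction(x)), (Fraction(0), Fraction(1)))

def lower(x):
    """Element of the other elementary unipotent subgroup of SL_2."""
    return ((Fraction(1), Fraction(0)), (Fraction(x), Fraction(1)))

def mul(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def factor(g):
    """Write g in SL_2(Q) as a list of elementary unipotent matrices whose
    product (in order) is g."""
    (a, b), (c, d) = [[Fraction(e) for e in row] for row in g]
    assert a * d - b * c == 1, "g must have determinant 1"
    if c == 0:
        # Here a != 0, so adding row 1 to row 2 makes the (2,1) entry non-zero.
        return [lower(-1)] + factor(mul(lower(1), g))
    # With c != 0 one checks directly that
    # g = upper((a - 1) / c) * lower(c) * upper((d - 1) / c).
    return [upper((a - 1) / c), lower(c), upper((d - 1) / c)]

def product(factors):
    result = upper(0)               # the identity matrix
    for m in factors:
        result = mul(result, m)
    return result

g = ((2, 5), (0, Fraction(1, 2)))   # lower left entry zero, determinant 1
print(factor(g))
print(product(factor(g)) == g)
```

The case distinction mirrors the steps in the text: first arrange a non-zero entry below the diagonal, then clear the remaining entries by row and column operations.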
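Proof of Proposition 10.41. As we will see we only have to multiply matrices and use continuity of the unitary representation. Let us first consider the case d = 2 and a diagonal positive matrix
\[
a = \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix},
\]
where we may assume without loss of generality that t > 1. Let
\[
u = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \in U_{12}
\]
for some x ∈ R and notice that
\[
\lim_{n\to\infty} a^{-n} \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} a^{n} = \lim_{n\to\infty} \begin{pmatrix} 1 & t^{-2n}x \\ 0 & 1 \end{pmatrix} = I.
\]
If $\pi_a v = v$ then continuity of the unitary representation implies that
\[
\|\pi_u v - v\| = \|\pi_u \pi_{a^n} v - \pi_{a^n} v\| = \|\pi_{a^{-n} u a^{n}} v - v\| \longrightarrow 0
\]
as n → ∞. Therefore $\pi_u v = v$ for all $u \in U_{12}$. Using the same argument with $\pi_{a^{-1}} v = v$ and the relation
\[
\lim_{n\to\infty} a^{n} \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix} a^{-n} = I,
\]
we see that v is fixed by both elementary unipotent subgroups, and hence by all of $\mathrm{SL}_2(\mathbb{R})$.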
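Staying with the case d = 2, suppose now that v is fixed by the subgroup $U_{12}$. Define
\[
g_n = \begin{pmatrix} 1 & 0 \\ \frac1n & 1 \end{pmatrix}
\]
and calculate
\[
u_n g_n u_{-n/2} = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \frac1n & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac n2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ \frac1n & \frac12 \end{pmatrix},
\]
which shows that
\[
\lim_{n\to\infty} u_n g_n u_{-n/2} = \begin{pmatrix} 2 & 0 \\ 0 & \frac12 \end{pmatrix} = a_2.
\]
Using continuity of the unitary representation again we see that
\[
\begin{aligned}
\|\pi_{a_2} v - v\| &= \lim_{n\to\infty} \|\pi_{u_n} \pi_{g_n} \pi_{u_{-n/2}} v - v\| \\
&= \lim_{n\to\infty} \|\pi_{g_n} v - \pi_{u_{-n}} v\| = \lim_{n\to\infty} \|\pi_{g_n} v - v\| = 0
\end{aligned}
\]
since $g_n \to I$ as n → ∞. Therefore, v is also invariant under $a_2$ and the first part of the proof shows that v is fixed by all of $\mathrm{SL}_2(\mathbb{R})$. The case of the other elementary unipotent subgroup follows by the same argument.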
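The two matrix computations used in the case d = 2 can be verified with exact rational arithmetic. The following is a minimal sketch; the helper names `mat`, `mul`, `u`, `conjugated` and `contracted` are hypothetical and only illustrate the identities from the proof.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """The 2x2 matrix [[a, b], [c, d]] with rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(p, q):
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def u(x):
    """The element u_x of U12."""
    return mat(1, x, 0, 1)

def conjugated(n):
    """Return u_n g_n u_{-n/2} with g_n = [[1, 0], [1/n, 1]]."""
    g_n = mat(1, 0, Fraction(1, n), 1)
    return mul(mul(u(n), g_n), u(Fraction(-n, 2)))

def contracted(t, x, n):
    """Return a^{-n} u_x a^n for a = diag(t, 1/t)."""
    t = Fraction(t)
    a_n = mat(t**n, 0, 0, t**-n)
    a_minus_n = mat(t**-n, 0, 0, t**n)
    return mul(mul(a_minus_n, u(x)), a_n)

for n in (1, 10, 1000):
    print(n, conjugated(n), contracted(2, 5, n))
```

The first product always has rows (2, 0) and (1/n, 1/2), and the second equals u_y with y = t^{-2n} x, so both converge as claimed.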
Let us note that the first argument above also applies for a non-trivial diagonal matrix $a \in \mathrm{SL}_d(\mathbb{R})$ with positive eigenvalues $a_1, \dots, a_d > 0$ in the following way: If $\pi_a v = v$ and, for example, $a_1 \neq a_2$, then v is also fixed by the subgroup obtained by embedding $\mathrm{SL}_2(\mathbb{R})$ into the upper left 2-by-2 block in $\mathrm{SL}_d(\mathbb{R})$.
Suppose now that d = 3 and v is fixed by a non-trivial positive diagonal matrix a with eigenvalues $a_1, a_2, a_3$. Assume that $a_1 \neq a_2$ (the other cases are similar, or can be reduced to this one by using permutation matrices). In this case v is fixed by the subgroup H obtained by embedding $\mathrm{SL}_2(\mathbb{R})$ into the upper left 2-by-2 block in $\mathrm{SL}_3(\mathbb{R})$ and in particular by
\[
a' = \begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac12 & 0 \\ 0 & 0 & 1 \end{pmatrix} \in H.
\]
Since the eigenvalues of $a'$ satisfy $a'_1 \neq a'_3$ and $a'_2 \neq a'_3$ we may repeat the argument for $\mathrm{SL}_2(\mathbb{R})$ twice more and see that v is fixed by all elementary unipotent subgroups, which implies that v is fixed by all of $\mathrm{SL}_3(\mathbb{R})$.
Remaining with the case d = 3, suppose that v is fixed by an elementary unipotent subgroup U. Since U is again contained in a subgroup $H \cong \mathrm{SL}_2(\mathbb{R})$ we see that v is invariant under a non-trivial positive diagonal element to which we may apply the arguments above.
The case d > 3 follows similarly by induction and will not be needed later,
so we leave this part of the proof to the reader (see Exercise 10.42(a)).
Exercise 10.42. (a) Confirm that the case d > 3 in Proposition 10.41 may be seen using
the same argument.
(b) Suppose that $u \in \mathrm{SL}_d(\mathbb{R})$ is a non-trivial unipotent element (that is, $u \neq I$ and all eigenvalues of u are equal to 1). Show that for any unitary representation $\pi : \mathrm{SL}_d(\mathbb{R}) \circlearrowright \mathcal{H}$ any $v \in \mathcal{H}$ with $\pi_u v = v$ is invariant under all of $\mathrm{SL}_d(\mathbb{R})$.
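Proof of Theorem 10.38 assuming Theorem 10.39. Note that
\[
\mathrm{ASL}_2(\mathbb{R}) = \mathrm{SL}_2(\mathbb{R}) \ltimes \mathbb{R}^2 = \left\{ \begin{pmatrix} A & x \\ 0 & 1 \end{pmatrix} \;\middle|\; A \in \mathrm{SL}_2(\mathbb{R}),\ x \in \mathbb{R}^2 \right\}
\]
is a closed subgroup of $\mathrm{SL}_3(\mathbb{R})$.
Suppose π is a unitary representation of $\mathrm{SL}_3(\mathbb{R})$ on a Hilbert space $\mathcal{H}$ that has almost-invariant vectors. Restricting π to $\mathrm{ASL}_2(\mathbb{R})$ we obtain a representation of $\mathrm{ASL}_2(\mathbb{R})$ that has almost-invariant vectors. Then by Theorem 10.39, $\mathcal{H}$ contains a non-zero vector v that is fixed by
\[
\begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix}
\]
for all $x \in \mathbb{R}^2$. These matrices include an elementary unipotent subgroup of $\mathrm{SL}_3(\mathbb{R})$, so by the Mautner phenomenon (Proposition 10.41) this implies that v is fixed by all of $\mathrm{SL}_3(\mathbb{R})$. It follows that $\mathrm{SL}_3(\mathbb{R})$ has property (T).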
For the proof of Theorem 10.39 we will use the spectral measures from
Bochner’s theorem for unitary flows (Theorem 9.56).
Lemma 10.43 (Normalizer and push-forward). Let $G = \mathrm{ASL}_2(\mathbb{R})$ and suppose π is a unitary representation of G on a complex Hilbert space $\mathcal{H}$. Let $\mu_v$ denote the spectral measure of $v \in \mathcal{H}$ with respect to the restriction of π to $\mathbb{R}^2 \lhd G$. For every $A \in \mathrm{SL}_2(\mathbb{R})$ and $v \in \mathcal{H}$ we then have $\mu_{\pi_A v} = (A^{\mathrm{t}})^{-1}_* \mu_v$ where $\pi_A$ is the unitary operator obtained from the matrix A, thought of as an element of $\mathrm{ASL}_2(\mathbb{R})$.
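Proof. Recall that for any $v \in \mathcal{H}$ the spectral measure $\mu_v$ is uniquely determined by the property
\[
\langle \pi_{u(x)} v, v \rangle = \int_{\mathbb{R}^2} e^{2\pi i x \cdot t}\, d\mu_v(t)
\]
for all $x \in \mathbb{R}^2$, where we use the injective homomorphism $u : \mathbb{R}^2 \to \mathrm{ASL}_2(\mathbb{R})$ defined by
\[
u(x) = \begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix}
\]
for $x \in \mathbb{R}^2$. Applying this to $\pi_A v$, we have
\[
\begin{aligned}
\int_{\mathbb{R}^2} e^{2\pi i x \cdot t}\, d\mu_{\pi_A v}(t) &= \langle \pi_{u(x)} \pi_A v, \pi_A v \rangle = \langle \pi_{u(A^{-1}x)} v, v \rangle \\
&= \int_{\mathbb{R}^2} e^{2\pi i A^{-1}x \cdot t}\, d\mu_v(t) = \int_{\mathbb{R}^2} e^{2\pi i x \cdot (A^{\mathrm{t}})^{-1} t}\, d\mu_v(t) \\
&= \int_{\mathbb{R}^2} e^{2\pi i x \cdot s}\, d\bigl((A^{\mathrm{t}})^{-1}_* \mu_v\bigr)(s),
\end{aligned}
\]
where we used $A^{-1} u(x) A = u(A^{-1}x)$ for all $x \in \mathbb{R}^2$. Hence $\mu_{\pi_A v} = (A^{\mathrm{t}})^{-1}_* \mu_v$ by uniqueness of the spectral measure.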
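The two algebraic identities behind this proof, namely the conjugation rule A⁻¹u(x)A = u(A⁻¹x) and the transpose rule A⁻¹x · t = x · (Aᵗ)⁻¹t, can be checked on a concrete example. The following is a minimal sketch with hypothetical helper names and one arbitrarily chosen matrix A of determinant 1.

```python
from fractions import Fraction

def mul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def embed(A):
    """View A in SL_2 as the block matrix [[A, 0], [0, 1]] in ASL_2."""
    return [[A[0][0], A[0][1], 0], [A[1][0], A[1][1], 0], [0, 0, 1]]

def u(x):
    """The translation u(x) = [[I, x], [0, 1]]."""
    return [[1, 0, x[0]], [0, 1, x[1]], [0, 0, 1]]

def inverse2(A):
    """Inverse of a 2x2 matrix of determinant 1."""
    (a, b), (c, d) = A
    return [[d, -b], [-c, a]]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def apply(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]

def dot(x, t):
    return x[0] * t[0] + x[1] * t[1]

A = [[2, 3], [1, 2]]                            # determinant 1
x = [Fraction(1, 3), Fraction(-5, 7)]
t = [Fraction(4), Fraction(1, 2)]
A_inv = inverse2(A)

lhs = mul(mul(embed(A_inv), u(x)), embed(A))    # A^{-1} u(x) A
rhs = u(apply(A_inv, x))                        # u(A^{-1} x)
print(lhs == rhs)
print(dot(apply(A_inv, x), t) == dot(x, apply(inverse2(transpose(A)), t)))
```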
Lemma 10.44 (Continuity of spectral measures). Let π be a unitary representation of $\mathbb{R}^d$ for some d ≥ 1 on a complex Hilbert space $\mathcal{H}$. If v and w in $\mathcal{H}$ have norm one, then the difference of their spectral measures satisfies $\|\mu_v - \mu_w\| \leq 4\|v - w\|$.
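Proof. First decompose $w = w_1 + w_2$ with $w_1 \in \mathcal{H}_v$ and $w_2 \in \mathcal{H}_v^{\perp}$. By the properties of the orthogonal decomposition we have $\|v - w_1\| \leq \|v - w\|$ and $\|w_2\| \leq \|v - w\| \leq 2$. Just as in Lemma 9.12(b) it is easy to see that
\[
\mu_w = \mu_{w_1} + \mu_{w_2}
\]
which gives
\[
\begin{aligned}
\|\mu_v - \mu_w\| &= \|\mu_v - \mu_{w_1} - \mu_{w_2}\| \\
&\leq \|\mu_v - \mu_{w_1}\| + \|\mu_{w_2}\| \leq \|\mu_v - \mu_{w_1}\| + 2\|v - w\|
\end{aligned}
\]
since $\|\mu_{w_2}\| = \|w_2\|^2$.
It remains to bound $\|\mu_v - \mu_{w_1}\|$. First recall that in the spectral theorem (Theorem 9.58) the generator v corresponds to $1 \in L^2_{\mu_v}(\mathbb{R}^d)$ and $w_1$ corresponds to some function $f \in L^2_{\mu_v}(\mathbb{R}^d)$. Also note that $d\mu_{w_1} = |f|^2\, d\mu_v$ by the same argument as in the proof of Lemma 9.12(c). Therefore,
\[
\begin{aligned}
\|\mu_v - \mu_{w_1}\| &= \int_{\mathbb{R}^d} \bigl|1 - |f|^2\bigr|\, d\mu_v \\
&= \int_{\mathbb{R}^d} h\bigl(1 - |f|^2\bigr)\, d\mu_v = \langle h1, 1 \rangle_{L^2_{\mu_v}} - \langle hf, f \rangle_{L^2_{\mu_v}},
\end{aligned}
\]
where $h(t) = \operatorname{sign}(1 - |f(t)|^2)$. By the Cauchy–Schwarz inequality we deduce that
\[
\begin{aligned}
\|\mu_v - \mu_{w_1}\| &= \langle h1 - hf, 1 \rangle_{L^2_{\mu_v}} + \langle hf, 1 - f \rangle_{L^2_{\mu_v}} \\
&\leq \|v - w_1\|\|v\| + \|w_1\|\|v - w_1\| \leq 2\|v - w_1\| \leq 2\|v - w\|
\end{aligned}
\]
since $1 \in L^2_{\mu_v}(\mathbb{R}^d)$ corresponds to v and $f \in L^2_{\mu_v}(\mathbb{R}^d)$ to $w_1 \in \mathcal{H}_v$. Together with the above this gives the lemma.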
The last preparatory step for the proof of Theorem 10.39 is the following
negative result.
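Lemma 10.45 (No invariant measures). The natural action of $\mathrm{SL}_2(\mathbb{R})$ on the projective line $\mathbb{P}^1(\mathbb{R}) = (\mathbb{R}^2 \setminus \{0\})/\!\sim$ has no invariant probability measures.
The natural action here is given by
\[
\mathrm{SL}_2(\mathbb{R}) \ni \begin{pmatrix} a & b \\ c & d \end{pmatrix} : \begin{bmatrix} x \\ y \end{bmatrix} \longmapsto \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix},
\]
where we write $\begin{bmatrix} x \\ y \end{bmatrix}$ for the equivalence class (with respect to proportionality) of a vector $\begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \setminus \{0\}$.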
Proof of Lemma 10.45. Notice that SO2 (R) ⊆ SL2 (R) acts transitively
on P1 (R), the kernel M of the action consists of ±I, and that SO2 (R)/M acts
simply transitively on P1 (R). Fixing an element of P1 (R), say the element cor-
responding to the x-axis, to correspond to the identity of SO2 (R)/M , we may
identify P1 (R) with SO2 (R)/M so that the action corresponds to translation
on the group. By uniqueness of the Haar measure (Proposition 10.2) there
is only one $\mathrm{SO}_2(\mathbb{R})$-invariant probability measure on $\mathbb{P}^1(\mathbb{R})$. However, other elements of $\mathrm{SL}_2(\mathbb{R})$ do not preserve that measure. For example, the action of $\begin{pmatrix} e & \\ & e^{-1} \end{pmatrix}$ does not preserve that probability measure (check this).
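The final check can be carried out numerically. The following is a minimal sketch, assuming that lines through the origin are parametrized by their angle θ in [0, π), so that the rotation-invariant probability measure is dθ/π; the function names `act` and `arc_mass` are hypothetical.

```python
import math

def act(theta, t=math.e):
    """Angle in [0, pi) of the image of the line at angle theta under diag(t, 1/t)."""
    x, y = t * math.cos(theta), math.sin(theta) / t
    return math.atan2(y, x) % math.pi

def arc_mass(theta0, theta1):
    """Mass of the arc [theta0, theta1] in [0, pi) for the measure d(theta)/pi."""
    return (theta1 - theta0) / math.pi

# The arc of lines with angle between 0 and pi/4, and its image under diag(e, 1/e).
# On [0, pi/2) the action is increasing in theta, so the image is again an arc.
before = arc_mass(0.0, math.pi / 4)
after = arc_mass(act(0.0), act(math.pi / 4))
print(before, after)
```

An invariant measure would have to give the arc and its image the same mass, but the image is much shorter because the diagonal matrix pushes lines towards the x-axis.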
Proof of Theorem 10.39. Let π be a unitary representation of $\mathrm{ASL}_2(\mathbb{R})$ on a Hilbert space $\mathcal{H}$, and suppose it has almost invariant vectors. We note that we may assume that $\mathcal{H}$ is a complex Hilbert space, for if $\mathcal{H}$ is a real Hilbert space we may complexify it (see Exercise 6.51 and Exercise 10.46) and can extend the given representation to a unitary representation on a complex Hilbert space that will also have almost invariant vectors. Let $(Q_n)$ be a sequence of compact subsets in $\mathrm{ASL}_2(\mathbb{R})$ with $Q_n \subseteq Q_{n+1}$ for which
\[
\bigcup_{n \geq 1} Q_n = \mathrm{ASL}_2(\mathbb{R}).
\]
Then for every n ≥ 1 there exists some $(Q_n, \frac1n)$-almost invariant vector $v_n \in \mathcal{H}$ with $\|v_n\| = 1$.
Let $\mu_{v_n}$ be the spectral measure of $v_n$ with respect to $\mathbb{R}^2 \lhd \mathrm{ASL}_2(\mathbb{R})$ for each n ≥ 1. If, for some n ≥ 1, we have $\mu_{v_n}(\{0\}) > 0$, then by the spectral theorem (Theorem 9.58)
\[
\mathcal{H}_{v_n} \cong L^2_{\mu_{v_n}}(\mathbb{R}^2) \ni 1_{\{0\}}
\]
contains a non-zero vector that is invariant under $\mathbb{R}^2$. This is precisely the statement that we want to prove.
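So suppose that $\mu_{v_n}(\{0\}) = 0$ for all n ≥ 1, and project $\mu_{v_n}$ to a measure
\[
\nu_n = p_* \mu_{v_n}
\]
on $\mathbb{P}^1(\mathbb{R})$, where $p : \mathbb{R}^2 \setminus \{0\} \to \mathbb{P}^1(\mathbb{R})$ denotes the natural projection map $p(v) = [v]$ for all $v \in \mathbb{R}^2 \setminus \{0\}$. Since $\mathbb{P}^1(\mathbb{R})$ is compact we may apply Proposition 8.27 and choose a subsequence $(\nu_{n_k})$ such that $\nu_{n_k} \to \nu$ in the weak* topology as k → ∞ for some probability measure ν on $\mathbb{P}^1(\mathbb{R})$. We claim that ν is invariant under the action of $\mathrm{SL}_2(\mathbb{R})$ on $\mathbb{P}^1(\mathbb{R})$.
To show this, let $f \in C(\mathbb{P}^1(\mathbb{R}))$, consider the function
\[
F = f \circ p \in \mathcal{L}^\infty(\mathbb{R}^2 \setminus \{0\}),
\]
and extend it by, for example, setting F(0) = 0. For $A \in Q_n \cap \mathrm{SL}_2(\mathbb{R})$ the vector $v_n$ satisfies $\|\pi_A v_n - v_n\| \leq \frac1n$ and so we have
\[
\begin{aligned}
\int_{\mathbb{P}^1(\mathbb{R})} f \circ (A^{\mathrm{t}})^{-1}\, d\nu_n &= \int_{\mathbb{R}^2} F \circ (A^{\mathrm{t}})^{-1}\, d\mu_{v_n} = \int_{\mathbb{R}^2} F\, d(A^{\mathrm{t}})^{-1}_* \mu_{v_n} \\
&= \int_{\mathbb{R}^2} F\, d\mu_{v_n} + O_f\bigl(\tfrac1n\bigr) = \int_{\mathbb{P}^1(\mathbb{R})} f\, d\nu_n + O_f\bigl(\tfrac1n\bigr)
\end{aligned}
\]
by Lemmas 10.43 and 10.44. Now let $n = n_k$ and take k → ∞ to see that
\[
\int_{\mathbb{P}^1(\mathbb{R})} f \circ (A^{\mathrm{t}})^{-1}\, d\nu = \int_{\mathbb{P}^1(\mathbb{R})} f\, d\nu
\]
for all $f \in C(\mathbb{P}^1(\mathbb{R}))$ and $A \in \mathrm{SL}_2(\mathbb{R})$. However, this shows that ν is $\mathrm{SL}_2(\mathbb{R})$-invariant, which contradicts Lemma 10.45.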
Exercise 10.46. Let π be a unitary representation of a topological group on a real Hilbert

space H. Let HC be the complexification of H as in Exercise 6.51. Show that π can be
extended to a unitary representation on HC , which has almost-invariant (or invariant)
vectors if and only if the original representation has almost invariant (or invariant) vectors.
Exercise 10.47. Show that SLd (R) has property (T) for all d > 3.
10.3.4 Proof of Každan’s Property (T), Discrete Case
The connection between SL3 (R) and its discrete subgroup SL3 (Z) is largely
controlled by the fact that SL3 (Z) is a lattice in SL3 (R). We will not discuss
the important notion of lattices in detail, but instead will work with the
following form of the result, which will be proved after its significance is
established.
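Theorem 10.48 ($\mathrm{SL}_3(\mathbb{Z})$ is a lattice). There exists a Borel subset F of the group $G = \mathrm{SL}_3(\mathbb{R})$, called a fundamental domain for $\mathrm{SL}_3(\mathbb{Z})$ in $\mathrm{SL}_3(\mathbb{R})$, such that $m_G(F) < \infty$ and $G = \bigsqcup_{\gamma \in \mathrm{SL}_3(\mathbb{Z})} F\gamma$.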
Apart from this result, we will also need a simple form of induction of
unitary representations which will allow us to lift a unitary representation
of SL3 (Z) to a unitary representation of SL3 (R).
To explain this more generally, we let Γ < G be a discrete subgroup of a locally compact, σ-compact, metrizable, unimodular group, and let F ⊆ G be a fundamental domain for Γ in G, that is, a Borel subset such that
\[
G = \bigsqcup_{\gamma \in \Gamma} F\gamma. \tag{10.19}
\]
Simple examples include $\Gamma = \mathbb{Z}^d < G = \mathbb{R}^d$ with $F = [0, 1)^d$ for any d ≥ 1; we refer to [25] for more details on the properties of fundamental domains. Furthermore, let $\pi_\Gamma : \Gamma \circlearrowright \mathcal{H}_\Gamma$ be a unitary representation of Γ on a separable Hilbert space $\mathcal{H}_\Gamma$.
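In the simple example of the integer lattice in Euclidean space, the decomposition (10.19) says that every vector is uniquely the sum of a point of the unit cube and an integer vector. The following is a minimal sketch of this decomposition in additive notation; the function name `reduce` is hypothetical and rational coordinates are used so that the arithmetic is exact.

```python
import math
from fractions import Fraction

def reduce(g):
    """Write g in R^d as g = f + gamma with f in F = [0, 1)^d and gamma in Z^d.
    Returns the pair (f, gamma)."""
    gamma = tuple(math.floor(c) for c in g)       # integer part, coordinate by coordinate
    f = tuple(c - k for c, k in zip(g, gamma))    # remainder in [0, 1)
    return f, gamma

g = (Fraction(-7, 3), Fraction(5, 2))
f, gamma = reduce(g)
print(f, gamma)
```

Uniqueness of the pair corresponds to the disjointness of the translates of the fundamental domain.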
Using these objects, we now define a new Hilbert space HG equipped

with a unitary representation of G called the induced representation. It is
possible to give this definition abstractly and in a coordinate-free way, but
using a Hilbert space isomorphism between HΓ and ℓ2 (N) we can make the
definition of HG more explicit. We implicitly assume here that HΓ is infinite-
dimensional. In the finite-dimensional case the construction is slightly easier
and adapting the notation to this case is straightforward. In the description
afforded by this isomorphism we can therefore assume that HΓ = ℓ2 (N) and
that Γ acts unitarily on ℓ2 (N). Using the fundamental domain F ⊆ G for Γ
in G we give the initial definition
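\[
\mathcal{H}_G = \bigoplus_{n \in \mathbb{N}} L^2(F) = \Bigl\{ (f_1, f_2, \dots) \in L^2(F)^{\mathbb{N}} \Bigm| \sum_{n=1}^{\infty} \|f_n\|_2^2 < \infty \Bigr\}.
\]
In the following we will think of $f \in \bigoplus_{n \in \mathbb{N}} L^2(F)$ as a measurable $\ell^2(\mathbb{N})$-valued and square-integrable function $f : F \ni g \mapsto f(g) = (f_1(g), f_2(g), \dots)$. We note that f(g) belongs to $\ell^2(\mathbb{N})$ for almost every g ∈ F by Fubini’s theorem. We also define the norm of $f = (f_1, f_2, \dots)$ by
\[
\|f\|_{\mathcal{H}_G} = \Bigl( \sum_{n=1}^{\infty} \|f_n\|_2^2 \Bigr)^{1/2} = \Bigl( \int_F \|f(g)\|_2^2\, dm_G(g) \Bigr)^{1/2}.
\]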
Using the unitary representation of Γ on $\ell^2(\mathbb{N})$ and the decomposition of G in (10.19) into right translates of F we now extend the domain of definition of $f \in \bigoplus_{n \in \mathbb{N}} L^2(F)$ (resp. of the functions $f_1, f_2, \dots$ appearing in any function $f \in \mathcal{H}_G$) to all of G. Given g ∈ F and γ ∈ Γ we define
f (gγ) = πΓ (γ)−1 f (g), (10.20)
which is well-defined for every g ∈ F with the property that
f (g) = (f1 (g), f2 (g), . . .) ∈ ℓ2 (N),
and hence for almost every g ∈ F . For those g, we have
\[
f(g) = \lim_{n\to\infty} (f_1(g), \dots, f_n(g), 0, 0, \dots)
\]
and
\[
f(g\gamma) = \lim_{n\to\infty} \pi_\Gamma(\gamma)^{-1} (f_1(g), \dots, f_n(g), 0, 0, \dots) = \lim_{n\to\infty} \sum_{k=1}^{n} f_k(g)\, \pi_\Gamma(\gamma)^{-1} e_k
\]
shows that the components of f(gγ) are a convergent sum of finite linear combinations of $f_1(g), f_2(g), \dots$. In particular, f(gγ) depends measurably on g ∈ F. We will frequently identify f on F with the extension of f to all of G using (10.20). Moreover, we modify the definition of $\mathcal{H}_G$ so that $\mathcal{H}_G$ consists of those functions on G obtained from elements in $\bigoplus_{n \in \mathbb{N}} L^2(F)$ using (10.20), as explained above.
Even though we used the fundamental domain F in the above construc-
tion quite prominently, the ‘algebraic’ property of the elements of HG is
independent of the choice of F .
Lemma 10.49 (Equivariance property). Let Γ < G, F , πΓ , HΓ , and HG

be as above. Then for any f in HG and almost every g ∈ G we have
f (gγ) = πΓ (γ)−1 f (g)
for every γ ∈ Γ .
Proof. Since Γ is countable it is enough to show this for a fixed γ ∈ Γ .

Let f ∈ HG , g ∈ G and η ∈ Γ have gη ∈ F and f (gη) ∈ ℓ2 (N). Then
f (g) = f (gηη −1 ) = πΓ (η)f (gη) (10.21)
by definition of f ∈ HG . Similarly, we also have for any γ ∈ Γ that
f (gγ) = f (gηη −1 γ) = πΓ (η −1 γ)−1f (gη) = πΓ (γ)−1 πΓ (η)f (gη) = πΓ (γ)−1f (g)
by (10.21). This gives the lemma.

Just as in the algebraic property discussed above, the resulting norm
on HG is also independent of the choice of F .
Lemma 10.50 (Independence of norm). Let Γ < G, F , πΓ , HΓ , and HG

be as above. Let $F'$ be another fundamental domain for Γ in G. Then
\[
\|f\|_{\mathcal{H}_G} = \Bigl( \int_F \|f(g)\|_2^2\, dm_G(g) \Bigr)^{1/2} = \Bigl( \int_{F'} \|f(g)\|_2^2\, dm_G(g) \Bigr)^{1/2}. \tag{10.22}
\]
In the case where πΓ is the trivial representation of Γ , the above contains

the following lemma. However, as it is easier we will give an independent
proof of the next lemma as a warmup for the proof of Lemma 10.50. Because
of this motivation we refrain from introducing the natural viewpoint of the
continuous action of G on the homogeneous space G/Γ ; see Raghunathan [89]
and [25] for this viewpoint.
Lemma 10.51 (Equality of measures). Let Γ < G be as above. Sup-

pose F, F ′ ⊆ G are both measurable fundamental domains as in (10.19).
Then mG (F ) = mG (F ′ ) and this quantity is called the co-volume of Γ .
Moreover, for every $g_0 \in G$ the invertible map $F \ni g \mapsto g' = g_0 g \gamma_g \in F$, with $\gamma_g \in \Gamma$ uniquely determined by g and $g_0$, is measure-preserving. Finally, if B ⊆ G is measurable with $m_G(B) > m_G(F)$, then there exists some g ∈ B and $\gamma \in \Gamma \setminus \{e\}$ with gγ ∈ B.
Proof. By assumption on F in (10.19) we have
\[
F' = \bigsqcup_{\gamma \in \Gamma} F' \cap (F\gamma^{-1}). \tag{10.23}
\]
Multiplying $F' \cap (F\gamma^{-1})$ on the right by γ, we obtain the sets $(F'\gamma) \cap F$. These are again disjoint by assumption on $F'$ and their union equals F. Therefore,
\[
m_G(F') = \sum_{\gamma \in \Gamma} m_G\bigl(F' \cap (F\gamma^{-1})\bigr) = \sum_{\gamma \in \Gamma} m_G\bigl((F'\gamma) \cap F\bigr) = m_G(F)
\]
by unimodularity of G.
Suppose now that $g_0 \in G$ and F is a measurable fundamental domain. Then $F' = g_0 F$ is another fundamental domain and (10.23) defines a map φ by
\[
F \ni g \longmapsto g_0 g \in F' \longmapsto g_0 g \gamma_g \in F
\]
(with $\gamma_g \in \Gamma$ being uniquely determined by the condition $g_0 g \gamma_g \in F$). It is clear that the inverse to this map is given by the same procedure but using $g_0^{-1}$. To see that these maps are measure-preserving we consider the function φ above. Now let B ⊆ F be measurable, and note that φ(B) is defined by piecewise right translation of the set $g_0 B \subseteq F' = g_0 F$ back to F. In other words, we use (10.23) and apply the same cut-and-translate procedure to obtain the desired equality
\[
\begin{aligned}
m_G(B) = m_G(g_0 B) &= \sum_{\gamma \in \Gamma} m_G\bigl((g_0 B) \cap (F\gamma^{-1})\bigr) \\
&= \sum_{\gamma \in \Gamma} m_G\bigl((g_0 B\gamma) \cap F\bigr) = m_G\Bigl( \bigsqcup_{\gamma \in \Gamma} (g_0 B\gamma) \cap F \Bigr) = m_G(\varphi(B)).
\end{aligned}
\]
Strictly speaking we should prove that $m_G(\varphi^{-1}(B)) = m_G(B)$ as in the definition of a measure-preserving map (Definition 8.35), but since φ is invertible this distinction is not important.
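Suppose now that B ⊆ G is measurable with $m_G(B) > m_G(F)$. Applying (10.19) we see that
\[
m_G(F) < m_G(B) = \sum_{\gamma \in \Gamma} m_G\bigl(B \cap (F\gamma^{-1})\bigr) = \sum_{\gamma \in \Gamma} m_G\bigl((B\gamma) \cap F\bigr),
\]
which implies the existence of $\gamma_1 \neq \gamma_2 \in \Gamma$ and $g_1, g_2 \in B$ with $g_1\gamma_1 = g_2\gamma_2$ as required.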
Proof of Lemma 10.50. Let F and $F'$ again denote two fundamental domains so that $F = \bigsqcup_{\gamma \in \Gamma} F \cap (F'\gamma^{-1})$. Applying the first equality in (10.22) (which is the definition of the norm on $\mathcal{H}_G$ using F) we obtain
\[
\|f\|_{\mathcal{H}_G}^2 = \sum_{\gamma \in \Gamma} \int_{F \cap (F'\gamma^{-1})} \|f(g)\|_2^2\, dm_G(g) = \sum_{\gamma \in \Gamma} \int_{(F\gamma) \cap F'} \|f(h\gamma^{-1})\|_2^2\, dm_G(h)
\]
by using the substitution $g = h\gamma^{-1}$ for $g \in F \cap F'\gamma^{-1}$ and $h \in F\gamma \cap F'$ and unimodularity of G. Using the defining formula $f(h) = f(g\gamma) = \pi_\Gamma(\gamma)^{-1} f(g)$ we see that
\[
\|f(h)\|_2 = \|f(g)\|_2 = \|f(h\gamma^{-1})\|_2
\]
and so
\[
\|f\|_{\mathcal{H}_G}^2 = \sum_{\gamma \in \Gamma} \int_{(F\gamma) \cap F'} \|f(h)\|_2^2\, dm_G(h) = \int_{F'} \|f(h)\|_2^2\, dm_G(h),
\]
where we used the consequence $F' = \bigsqcup_{\gamma \in \Gamma} (F\gamma) \cap F'$ of (10.19).
We are now ready to prove the main properties of the unitary induction
(which in a sense combines the unitary representation πΓ and the measure-
preserving maps discussed in Lemma 10.51).
Proposition 10.52. Let G be a locally compact σ-compact metrizable uni-
modular group and Γ < G a lattice (so that there exists a fundamental do-
main F as in (10.19) with mG (F ) < ∞). Given a unitary representation πΓ
of Γ on a separable Hilbert space HΓ , the Hilbert space HG constructed above
admits a unitary representation πG of G defined by πG,g0 f (g) = f (g0−1 g)
for g0 , g ∈ G and f ∈ HG . Moreover, HΓ has a non-trivial Γ -fixed vector if
and only if HG has a non-trivial G-fixed vector, and HG has almost invariant
vectors if HΓ has almost invariant vectors.
Note that the formula defining πG,g0 is the same formula as for the left
regular representation on the space of functions on G, but that the space and
the norm are different.
Exercise 10.53. Let Γ < G be a discrete subgroup of a locally compact, σ-compact,
metrizable, unimodular group G. Let πΓ be the left regular representation of Γ on ℓ2 (Γ )
defined by πΓ,γ0 f (γ) = f (γ0−1 γ) for all f ∈ ℓ2 (Γ ) and γ0 , γ ∈ Γ . Show that the induced
representation πG is then unitarily isomorphic to the left regular representation of G.
Proof of Proposition 10.52. Let g0 ∈ G and f ∈ HG. Then
\[
  \|\pi_{G,g_0}(f)\|_{H_G}^2 = \int_F \|f(g_0^{-1} g)\|_2^2 \, dm_G(g)
  = \int_{g_0^{-1}F} \|f(h)\|_2^2 \, dm_G(h)
\]
by left-invariance of the Haar measure mG. However, by Lemma 10.50 the
latter is equal to ‖f‖²_{HG} since g0⁻¹F = F′ is also a fundamental domain.
Hence πG,g0 is a unitary operator on HG for any g0 ∈ G. That πG is a
homomorphism from G into the group of unitary operators of HG follows by
the same argument as for the regular representation on p. 353.
Lifting invariant vectors. Suppose now that v ∈ HΓ is an invariant unit
vector. Then we can define
\[
  f(g) = \frac{1}{\sqrt{m_G(F)}}\, v
\]
for all g ∈ F, and the extension of f to G will be given by the same
formula for all g ∈ G. Notice that f ∈ HG
since mG(F) < ∞. We therefore obtain a unit vector of HG that is invariant
with respect to G.
Pushing invariant vectors back to HΓ . Suppose for the opposite
direction that HG has a G-invariant unit vector f . Since G acts trans-
itively on itself, this implies that f (g) = v for some non-zero v in HΓ
and almost every g ∈ G (by using Exercise 10.4 for each component fj
of f = (f1 , f2 , . . .)). Since we also have f (gγ) = πΓ (γ)−1 f (g) for almost
every g ∈ G and all γ ∈ Γ we see that v ∈ HΓ is a non-zero Γ -invariant
vector.
Lifting almost invariant vectors. Suppose next that HΓ has almost
invariant vectors, let K ⊆ G be a compact subset and let ε > 0. Recall
that mG (F ) < ∞ since Γ < G is a lattice. By regularity of mG there exists
a compact subset L ⊆ F such that mG(F∖L) < ε mG(F). Since L⁻¹KL ⊆ G
is compact and Γ < G is discrete, the set Q = Γ ∩ (L−1 KL) is a finite subset
of Γ . Suppose now that v ∈ HΓ is a (Q, ε)-almost invariant unit vector. Much
as in the discussion of invariant vectors, we define f ∈ HG by setting
\[
  f(g) = \begin{cases} v & \text{for } g \in L, \\ 0 & \text{for } g \in F\setminus L \end{cases}
\]
and use the formula f (gγ) = πΓ (γ)−1 f (g) for all g ∈ F and γ ∈ Γ to extend f
to a function f ∈ HG .
Now let k ∈ K and g ∈ F . Then
πG,k f (g) = f (k −1 g) = πΓ (γ)f (k −1 gγ)
for all γ ∈ Γ. Choose γg ∈ Γ such that k⁻¹gγg = g′ ∈ F. Using this notation
we can then write
\[
  \|\pi_{G,k} f - f\|_{H_G}^2
  = \int_F \bigl\| \pi_\Gamma(\gamma_g) f(\underbrace{k^{-1} g \gamma_g}_{=g' \in F}) - f(g) \bigr\|_2^2 \, dm_G(g).
\]
Next we decompose the integral into the subsets:
• g ∈ F∖L (with f(g) = 0 and ‖f(g′)‖₂ ≤ ‖v‖₂ = 1);
• g ∈ L and g′ = k⁻¹gγg ∈ F∖L (with f(g) = v and f(g′) = 0); and
• g, g′ ∈ L (with f(g) = f(g′) = v).
In the first case we will use mG(F∖L) < ε mG(F), in the second case
\[
  m_G(\{g \in L \mid g' \in F\setminus L\}) \le m_G(F\setminus L) < \varepsilon\, m_G(F)
\]
(which follows since F ∋ g ↦ g′ ∈ F is measure-preserving by Lemma 10.51),
and in the third case we see that
\[
  \gamma_g = g^{-1} k g' \in L^{-1} K L \cap \Gamma = Q
\]
and so ‖πΓ(γg)v − v‖₂ < ε. Together this gives
\[
  \|\pi_{G,k} f - f\|_{H_G}^2 < 2\varepsilon\, m_G(F) + \varepsilon^2 m_G(F).
\]
We may assume that ε < 1/2, so that
\[
  \|f\|_2 = \sqrt{m_G(L)} > \sqrt{\frac{m_G(F)}{2}}.
\]
Since ε > 0 was arbitrary, this shows that HG has almost invariant vectors.
Continuity. The alert reader will have noticed that the arguments above
do not finish the proof, because it remains to be shown that HG is indeed a
unitary representation, and in particular satisfies the continuity requirement.
We will use Lemma 3.74 for this, but only indirectly.
Let us begin by noting that it is enough to prove continuity at the iden-
tity e ∈ G for the following reason. Suppose that for any sequence (kn )
in G with kn → e as n → ∞ we have πG,kn f → f as n → ∞ for
any f ∈ HG . It follows that if (gn ) is a sequence in G with gn → g as n → ∞
then πG,gn f −πG,g f = πG,g (πG,kn f −f ) → 0 since kn = g −1 gn → e as n → ∞
and πG,g is continuous.
In the following, we identify ⊕_{n∈N} L²(F) with L²(F × N) ⊆ L²(G × N)
and endow the latter with the unitary representation defined by
\[
  \lambda_g(f)(h, n) = f(g^{-1}h, n)
\]
for (h, n) ∈ G × N, g ∈ G, and f ∈ L²(G × N).
Given any f ∈ HG, we consider f|F ∈ ⊕_{n∈N} L²(F) as an element
of L²(G × N) satisfying f|F(g, n) = fn(g) for g ∈ F and f|F(g, n) = 0
for g ∈ G∖F, and n ∈ N. Applying Lemma 3.74 to f|F ∈ L²(G × N), we see
that there exists, for every ε > 0, a neighbourhood V of e such that
\[
  \|\lambda_g f|_F - f|_F\|_2 < \varepsilon \tag{10.24}
\]
whenever g ∈ V. Given one such g ∈ V we write

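\[
  F = \underbrace{\bigl(F \cap (g^{-1}F) \cap (gF)\bigr)}_{=F_{\mathrm{in}}}
  \sqcup \Bigl( \underbrace{\bigl(F\setminus(gF)\bigr)}_{=B_+} \cup \underbrace{\bigl(F\setminus(g^{-1}F)\bigr)}_{=B_-} \Bigr)
  = F_{\mathrm{in}} \sqcup B,
\]
that is, we decompose F into the part Fin ⊆ F that stays inside F (under
the action of g and of g⁻¹) and its relative complement
\[
  B = \bigl(F\setminus(gF)\bigr) \cup \bigl(F\setminus(g^{-1}F)\bigr) = F\setminus F_{\mathrm{in}}
\]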
(see Figure 10.2). It may help to think of B as the bad set on which λg
and πG,g are quite different. We need to estimate its significance.
Fig. 10.2: The circle depicts F, and the action of g translates the circle to the
right, giving rise to the decomposition F = Fin ⊔ B. (Labels in the figure: the
regions B+, Fin, B− and the translates g⁻¹F, F, gF.)
Using these sets, and recalling that HG ∋ f′ ↦ f′|F ∈ L²(F × N)
is a unitary isomorphism, we also decompose f ∈ HG into f = fin + fB
with fin, fB ∈ HG satisfying fin|F = f|F 1Fin and fB|F = f|F 1B. Using
the identity λg 1B− = 1gB− this implies that λg(f|F 1B−) vanishes outside
of gB− and hence in particular it vanishes on F. Similarly λg(1F) = 1gF
shows that λg(f|F) vanishes on B+. Combining these with (10.24), it follows
that
\[
  \|f|_F 1_{B_-}\|_2 = \|\lambda_g(f|_F 1_{B_-})\|_2 \le \|\lambda_g f|_F - f|_F\|_2 < \varepsilon,
\]
\[
  \|f|_F 1_{B_+}\|_2 \le \|\lambda_g f|_F - f|_F\|_2 < \varepsilon,
\]
and so
\[
  \|f_B\|_{H_G} < 2\varepsilon. \tag{10.25}
\]
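For fin we claim that
\[
  (\pi_{G,g} f_{\mathrm{in}})|_F = \lambda_g (f_{\mathrm{in}}|_F). \tag{10.26}
\]
Indeed, if h ∈ Fin ∪ B−∖B+ = F ∩ gF, then g⁻¹h ∈ F and
\[
  \pi_{G,g} f_{\mathrm{in}}(h) = f_{\mathrm{in}}(g^{-1}h) = \lambda_g (f_{\mathrm{in}}|_F)(h)
\]
as required. On the other hand, if h ∈ B+, then h ∈ F but g⁻¹h ∉ F.
Let γ ∈ Γ∖{e} be such that g⁻¹hγ ∈ F, which implies g⁻¹hγ ∈ B− and
hence by Lemma 10.49
\[
  \pi_{G,g} f_{\mathrm{in}}(h) = \pi_\Gamma(\gamma) f_{\mathrm{in}}(g^{-1}h\gamma) = 0
  = f_{\mathrm{in}}|_F(g^{-1}h) = \lambda_g (f_{\mathrm{in}}|_F)(h),
\]
as claimed.
Combining (10.25)–(10.26) with (10.24) we can now obtain
\[
\begin{aligned}
  \|\pi_{G,g} f - f\|_{H_G} &\le \|\pi_{G,g} f_{\mathrm{in}} - f_{\mathrm{in}}\|_{H_G} + 4\varepsilon \\
  &= \|\lambda_g f_{\mathrm{in}}|_F - f_{\mathrm{in}}|_F\|_2 + 4\varepsilon \\
  &< \|\lambda_g f|_F - f|_F\|_2 + 8\varepsilon < 9\varepsilon.
\end{aligned}
\]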
Since this holds for any g ∈ V , we obtain the continuity of the unitary
representation and hence the theorem.
Proposition 10.54. Let G be a locally compact σ-compact metrizable unimodular
group and let Γ < G be a lattice in G. If G has property (T) then Γ
also has property (T).
Proof. Let πΓ : Γ ↻ HΓ be a unitary representation on a separable Hilbert
space that has almost invariant vectors. Applying Proposition 10.52
we find the unitary representation πG : G ↻ HG, also with almost invariant
vectors. By assumption G has property (T) so that HG has a non-trivial
G-invariant vector. By Proposition 10.52, this implies that HΓ has a
non-trivial Γ-invariant vector. It follows that Γ has property (T).
It should now be clear how to combine the arguments to obtain a proof
of Corollary 10.40: By Theorem 10.38, the topological group SL3 (R) has
property (T). In Theorem 10.48 we claimed that the discrete subgroup SL3 (Z)
is a lattice in SL3 (R). Hence Proposition 10.54 shows that SL3 (Z) also has
property (T). For the proof of the lattice property in Theorem 10.48 we have
to make a short excursion into the ‘geometry of numbers’.
10.3.5 Iwasawa Decomposition, Geometry of Numbers, and Reduction Theory
Semi-simple groups have distinguished subgroups, some of which permit the
group to be decomposed. The Iwasawa or KAN decomposition of SL3(R)
concerns the following subgroups:
• the compact special orthogonal group
\[
  K = \mathrm{SO}_3(\mathbb{R}) = \bigl\{ g \in \mathrm{SL}_3(\mathbb{R}) \mid g g^t = I \bigr\}
\]
(where we write I for the identity matrix), which is also sometimes written
SO(3, R);
• the positive diagonal subgroup
\[
  A = \left\{ \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}
  \;\middle|\; a_1, a_2, a_3 > 0 \text{ and } a_1 a_2 a_3 = 1 \right\};
\]
• and the unipotent subgroup
\[
  N = \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
  \;\middle|\; x, y, z \in \mathbb{R} \right\}.
\]
Lemma 10.55 (Iwasawa or KAN decomposition). Any element of SL3 (R)

can be written uniquely in the form kan with k ∈ K, a ∈ A and n ∈ N .
Proof. As we will see, this is simply a reformulation of the familiar Gram–Schmidt
procedure in R³ (see the proof of Theorem 3.39). Writing the matrix
g = (w1, w2, w3) ∈ SL3(R) in terms of its column vectors w1, w2, w3 ∈ R³
we may apply the Gram–Schmidt orthonormalization procedure to obtain
\[
\begin{aligned}
  v_1 &= a_1^{-1} w_1 && \text{with } a_1 = \|w_1\| > 0, \\
  v_2 &= a_2^{-1}(w_2 - \langle w_2, v_1\rangle v_1) = a_2^{-1}(w_2 - n_{12} a_1 v_1) && \text{for some } a_2 > 0,\ n_{12} \in \mathbb{R}, \\
  v_3 &= a_3^{-1}(w_3 - n_{13} a_1 v_1 - n_{23} a_2 v_2) && \text{for some } a_3 > 0 \text{ and } n_{13}, n_{23} \in \mathbb{R},
\end{aligned}
\]
with the property that {v1, v2, v3} is an orthonormal basis of R³. This gives
\[
  (v_1, v_2, v_3)
  \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}
  \begin{pmatrix} 1 & n_{12} & n_{13} \\ 0 & 1 & n_{23} \\ 0 & 0 & 1 \end{pmatrix}
  = (w_1,\ n_{12} a_1 v_1 + a_2 v_2,\ n_{13} a_1 v_1 + n_{23} a_2 v_2 + a_3 v_3) = g.
\]
We define k, a, n to be the first, second, and third matrix in the equation
above. Clearly det n = 1 and det a > 0. Since k has orthonormal column
vectors we have det k = ±1 (since kᵗk = I and so (det k)² = 1). Since
we know that det g = 1, this implies that det k = det a = 1, so we have
established the existence of the decomposition.
If kan = g = k ′ a′ n′ are both decompositions of this form, then
an(a′ n′ )−1 = k −1 k ′ ∈ K ∩ AN
since K and AN are both subgroups. Since all elements of K are diagonalizable
over C with eigenvalues of absolute value one, we see that K ∩ AN = {I},
which implies k = k ′ and an = a′ n′ . Similarly, since A ∩ N = {I} we now see
in the same way that a = a′ and n = n′ .
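The existence part of the proof above is effectively an algorithm. As a minimal sketch (assuming floating-point arithmetic and a matrix stored as a list of rows; the function names `iwasawa` and `matmul` are hypothetical and do not come from the text), the Gram–Schmidt steps can be carried out as follows:

```python
import math

def matmul(x, y):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(x[i][l] * y[l][j] for l in range(3)) for j in range(3)]
            for i in range(3)]

def iwasawa(g):
    """Return (k, a, n) with g = k * diag(a) * n, following Lemma 10.55.

    g is a 3x3 matrix (list of rows) with det g = 1; k has orthonormal
    columns v1, v2, v3, a is the list [a1, a2, a3], n is upper unipotent.
    """
    cols = [[g[i][j] for i in range(3)] for j in range(3)]  # w1, w2, w3
    vs, a = [], []
    n = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for j, w in enumerate(cols):
        r = list(w)
        for i, v in enumerate(vs):
            c = sum(x * y for x, y in zip(w, v))  # <w_j, v_i> = n_ij * a_i
            n[i][j] = c / a[i]
            r = [x - c * y for x, y in zip(r, v)]
        a_j = math.sqrt(sum(x * x for x in r))    # positive since det g != 0
        a.append(a_j)
        vs.append([x / a_j for x in r])
    k = [[vs[j][i] for j in range(3)] for i in range(3)]    # columns v1, v2, v3
    return k, a, n

# Example: a quarter turn about the third axis times an element of AN.
g = [[0.0, -1.0, -1.0], [2.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
k, a, n = iwasawa(g)
```

In this example the input was built from a = diag(2, 1, 1/2), n12 = 1/2, n13 = 0, n23 = 1, so by the uniqueness statement of the lemma the routine should recover exactly these values up to rounding.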
A lattice in Rd is a subgroup of the form Λ = gZd for some g ∈ GLd (R).
Recall from Lemma 10.51 that the co-volume of Λ is defined by the Lebesgue
measure of any fundamental domain F ⊆ Rd for Λ. Using F = g[0, 1)d we
see that the co-volume is given by |det g|.
The next result is part of a theory from 1896 due to Minkowski, who also
invented the descriptive name ‘geometry of numbers’ for it (see [74] for a
reprint and the monograph of Lekkerkerker [60] for more material in this
direction).
Proposition 10.56 (Choice of basis). Let Λ < R3 be a lattice of co-

volume 1. Then there exists some g ∈ SL3 (R) with Λ = gZ3 with the property
that the matrices a ∈ A, n ∈ N, and k ∈ K in the Iwasawa decomposition
g = kan satisfy a1 ≪ a2 ≪ a3 and n12, n13, n23 ∈ [−1/2, 1/2).
Proof. For the proof of the proposition we will also use a version of the
conclusion in two dimensions. Let us assume first that d ≥ 2 and Λ ⊆ Rd is
a discrete subgroup. Let w1 ∈ Λ be a shortest non-zero vector of Λ, let V be
the space (Rw1)⊥ and let p : Rd → V denote the orthogonal projection. We
claim that any non-zero vector p(w) ∈ p(Λ) has
\[
  \|p(w)\| \ge \tfrac{\sqrt{3}}{2} \|w_1\|. \tag{10.27}
\]
To prove this, suppose that w ∈ Λ satisfies
\[
  0 < \|v\| < \tfrac{\sqrt{3}}{2} \|w_1\|
\]
where v = p(w). Clearly w = v + tw1 for some t ∈ R, and we may add
an integer multiple of w1 ∈ Λ to w ∈ Λ (without changing v) and suppose
without loss of generality that t ∈ [−1/2, 1/2). However, since v ⊥ w1 this gives
\[
  0 < \|w\|^2 = \|v\|^2 + t^2 \|w_1\|^2 < \tfrac{3}{4}\|w_1\|^2 + \tfrac{1}{4}\|w_1\|^2 = \|w_1\|^2,
\]
which contradicts our choice of w1.

The two-dimensional case. We continue with a version of the statement
for Λ < R2 , where we will assume that Λ is a discrete subgroup containing two
linearly independent vectors. We will also show that this implies that Λ
is a lattice (not necessarily of co-volume 1).
As above we choose a non-zero w1 ∈ Λ of minimal norm, define V to
be (Rw1 )⊥ , and p : R2 → V to be the orthogonal projection. Since Λ con-
tains two linearly independent vectors, the kernel of p is one-dimensional,
and (10.27) shows that p(Λ) is discrete, so there exists some non-zero vec-
tor u = p(w2 ) ∈ p(Λ) of minimal length. As above, we may suppose
that w2 = u + tw1 with t ∈ [−1/2, 1/2).
Suppose now that w ∈ Λ so that p(w) ∈ p(Λ). Since u is of minimal length
it is easy to see (by integer division with remainder) that p(w) = n2 u for
some n2 ∈ Z. Now consider w − n2 w2 ∈ ker(p) = Rw1 . Again because w1 is
of minimal length in Λ it follows that w − n2 w2 = n1 w1 for some n1 ∈ Z. In
other words, we have shown that Λ = (w1 , w2 )Z2 = Zw1 + Zw2 is a lattice
generated (as a group) by w1 , w2 ∈ R2 .
Applying the Gram–Schmidt orthonormalization procedure as in the proof
of Lemma 10.55 but for g = (w1, w2) ∈ GL2(R) we obtain v1 = a1⁻¹w1
with a1 = ‖w1‖, and v2 = a2⁻¹(w2 − ⟨w2, v1⟩v1) = a2⁻¹(w2 − n12 a1 v1) with
\[
  n_{12} = a_1^{-1} \langle w_2, v_1\rangle = a_1^{-1}\, t\, \langle w_1, v_1\rangle = t \in [-\tfrac12, \tfrac12)
\]
and a2 = ‖w2 − n12 w1‖ = ‖u‖. This gives
\[
  g = (w_1, w_2) = (v_1, v_2)
  \begin{pmatrix} a_1 & \\ & a_2 \end{pmatrix}
  \begin{pmatrix} 1 & n_{12} \\ & 1 \end{pmatrix}.
\]
With the claim in (10.27) we now obtain a2 = ‖u‖ = ‖p(w2)‖ ≫ ‖w1‖ = a1,
which gives the result for dimension 2.
The three-dimensional case. Now suppose that Λ < R3 is a lattice with
co-volume 1, so that Λ can be written g0 Z3 for some g0 ∈ SL3 (R). We again
choose w1 ∈ Λ∖{0} of minimal length. To simplify the discussion we may
apply some k ∈ SO3(R) to rotate w1 to the first coordinate axis. In other
words, we may assume that w1 = ‖w1‖e1 is a vector of Λ with minimal length.
Let p : R³ → {0} × R² be the orthogonal projection with kernel Re1. Applying
the claim in (10.27), we see that v ∈ p(Λ) implies v = 0 or ‖v‖ ≫ ‖w1‖.
Since Λ contains 3 linearly independent vectors and ker(p) is one-dimensional
we see that p(Λ) contains at least two linearly independent vectors. Applying
the two-dimensional case to p(Λ) we find w2 , w3 ∈ Λ such that v2 = p(w2 )
and v3 = p(w3 ) satisfy p(Λ) = Zv2 + Zv3 , that v2 is the shortest vector
of p(Λ) and the size s of the orthogonal projection of v3 onto the orthogonal
complement of Rv2 has s ≫ ‖v2‖. For simplicity we may apply another
rotation of {0} × R² fixing R × {(0, 0)} to Λ and suppose that v2 = ‖v2‖e2.
This gives us
\[
  (w_1, w_2, w_3) = \begin{pmatrix} \|w_1\| & * & * \\ 0 & \|v_2\| & * \\ 0 & 0 & * \end{pmatrix}.
\]
Replacing w3 by −w3 if necessary, we may assume that
\[
  (w_1, w_2, w_3) = \begin{pmatrix} a_1 & * & * \\ 0 & a_2 & * \\ 0 & 0 & a_3 \end{pmatrix}
  = \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}
    \begin{pmatrix} 1 & n_{12} & n_{13} \\ 0 & 1 & n_{23} \\ 0 & 0 & 1 \end{pmatrix} \tag{10.28}
\]
with a3 = s > 0. To summarize, we have obtained a3 ≫ a2 ≫ a1. Moreover,
for any w ∈ Λ there exist ℓ2, ℓ3 ∈ Z with p(w) = ℓ2v2 + ℓ3v3. Considering
\[
  w - \ell_2 w_2 - \ell_3 w_3 \in \mathbb{R} w_1
\]
and the choice of w1, we also find ℓ1 ∈ Z with
\[
  w = \ell_1 w_1 + \ell_2 w_2 + \ell_3 w_3,
\]
which shows that Λ = (w1, w2, w3)Z³ = Zw1 + Zw2 + Zw3. Multiplying
(w1, w2, w3) on the right by
\[
  \begin{pmatrix} 1 & \ell_1 & 0 \\ 0 & 1 & \ell_2 \\ 0 & 0 & 1 \end{pmatrix}
\]
with ℓ1, ℓ2 ∈ Z allows us to modify n12, n23 by integers while preserving the
lattice. Thus we may assume that n12, n23 ∈ [−1/2, 1/2). Multiplying (w1, w2, w3)
after this on the right by
\[
  \begin{pmatrix} 1 & 0 & \ell \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
for some ℓ ∈ Z finally allows us to also obtain n13 ∈ [−1/2, 1/2).
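The two-dimensional step of this proof can be made algorithmic. The sketch below is not the argument of the text but the classical Lagrange–Gauss reduction, swapped in here because it produces a basis with the same properties: w1 is a shortest non-zero lattice vector, and the coefficient n12 lies in the closed interval [−1/2, 1/2] (the half-open normalization of the proposition is not enforced). It assumes an integer basis of two linearly independent vectors; the names `reduce_basis` and `shape` are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def reduce_basis(u, v):
    """Lagrange-Gauss reduction of an integer basis (u, v) of a planar lattice.

    Returns a basis (w1, w2) of the same lattice with |w1| <= |w2| and
    |<w1, w2>| <= |w1|^2 / 2, so that w1 is a shortest non-zero vector.
    """
    while True:
        if dot(v, v) < dot(u, u):
            u, v = v, u                              # keep u the shorter vector
        mu = round(Fraction(dot(u, v), dot(u, u)))   # nearest integer
        if mu == 0:
            return u, v
        v = (v[0] - mu * u[0], v[1] - mu * u[1])     # same lattice, shorter v

def shape(w1, w2):
    """Return (a1^2, a2^2, n12) of the basis (w1, w2) as exact fractions."""
    a1_sq = Fraction(dot(w1, w1))
    n12 = Fraction(dot(w1, w2), dot(w1, w1))
    a2_sq = Fraction(dot(w2, w2)) - n12 * n12 * a1_sq  # |w2 - n12 w1|^2
    return a1_sq, a2_sq, n12

w1, w2 = reduce_basis((3, 1), (7, 4))
a1_sq, a2_sq, n12 = shape(w1, w2)
```

For the reduced basis the inequality 4·a2² ≥ 3·a1² is exactly the estimate (10.27) applied to u = p(w2), which is the source of the implied constant in a1 ≪ a2.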
Lemma 10.57. The group SL3(R) is unimodular, and the Haar measure
on SL3(R) decomposes with respect to the Iwasawa decomposition into the
product of the Haar measure mK on K and the right Haar measure m^{(r)}_{AN}
on AN.
Proof. Notice first that SL3 (R) ⊆ Mat33 (R) = R9 is defined by a single
equation and hence is a hypersurface. We will define the Haar measure mSL3 (R)
on SL3 (R) using the Lebesgue measure mR9 by the following trick. Define a
measure µ on SL3 (R) by µ(B) = mR9 ({tg | t ∈ [0, 1], g ∈ B}) for any Borel
measurable set B ⊆ SL3 (R). To see that the set on the right-hand side is
measurable note that U = {m ∈ Mat33 (R) | det m ∈ (0, 1)} is open since the
determinant map is continuous, and on U the map
\[
  \varphi : U \longrightarrow (0, 1) \times \mathrm{SL}_3(\mathbb{R}), \qquad
  m \longmapsto \Bigl( \det m,\ \frac{1}{\sqrt[3]{\det m}}\, m \Bigr)
\]
is a homeomorphism. It follows that {tg | t ∈ [0, 1], g ∈ B} is also given
by {0} ∪ φ⁻¹((0, 1) × B) ∪ B and so is measurable.
If now B = K is compact, then so is {tg | t ∈ [0, 1], g ∈ B} ⊆ R9 , which
gives µ(B) < ∞. Also, if B = O is non-empty and open, then
{tg | t ∈ [0, 1], g ∈ B}
contains the non-empty open set {tg | t ∈ (0, 1), g ∈ B} and so in particu-
lar µ(B) > 0. Now let B ⊆ SL3 (R) be measurable and g0 ∈ SL3 (R). Then
\[
\begin{aligned}
  \mu(g_0 B) &= m_{\mathbb{R}^9}(\{t g_0 g \mid t \in [0, 1],\ g \in B\}) \\
  &= m_{\mathbb{R}^9}(g_0 \{t g \mid t \in [0, 1],\ g \in B\}) \\
  &= m_{\mathbb{R}^9}(\{t g \mid t \in [0, 1],\ g \in B\}) = \mu(B),
\end{aligned}
\]
since left multiplication by g0 scales Lebesgue measure on Mat33(R) by the
Jacobian of the map m ↦ g0 m which is |det g0|³ = 1, and so preserves the
Lebesgue measure. This shows that µ is a left Haar measure, so we may
write µ = mSL3 (R) . On the other hand exactly the same argument applies to
right multiplication, so µ is also a right Haar measure, and hence SL3 (R) is
unimodular.
For the second claim define a map
\[
  \psi : K \times AN \longrightarrow \mathrm{SL}_3(\mathbb{R}), \qquad (k, an) \longmapsto k(an)^{-1},
\]
and note that the Gram–Schmidt procedure in the proof of Lemma 10.55
shows that ψ is a homeomorphism. Define a measure ν on K × AN by
ν(B) = mSL3 (R) (ψ(B))
for any measurable set B ⊆ K × AN. Since ψ is a homeomorphism, ν is finite
on compact sets and positive on non-empty open sets. Given some k ∈ K
and an ∈ AN we also have
\[
  \nu((k, an)B) = m_{\mathrm{SL}_3(\mathbb{R})}(\psi((k, an)B))
  = m_{\mathrm{SL}_3(\mathbb{R})}\bigl(k\,\psi(B)\,(an)^{-1}\bigr)
  = m_{\mathrm{SL}_3(\mathbb{R})}(\psi(B)) = \nu(B).
\]
Thus ν is a left Haar measure on K × AN , which by uniqueness of Haar

measure means that ν is a scalar multiple of mK × mAN . Recalling that the
inverse map AN → AN sending an to (an)−1 maps the left Haar measure to
the right Haar measure, the result follows.
For the calculation coming up we also need to know the right Haar measure
on the subgroup AN < SL3 (R) explicitly in terms of coordinates.
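Lemma 10.58. Using the coordinates (a1, a2, n12, n13, n23) ∈ R²_{>0} × R³
corresponding to
\[
  \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}
  \begin{pmatrix} 1 & n_{12} & n_{13} \\ 0 & 1 & n_{23} \\ 0 & 0 & 1 \end{pmatrix} \in AN \tag{10.29}
\]
with a3 = (a1a2)⁻¹, the right Haar measure m^{(r)}_{AN} is given by
\[
  \frac{a_1}{a_2}\,\frac{a_1}{a_3}\,\frac{a_2}{a_3}\;\frac{da_1\,da_2}{a_1 a_2}\; dn_{12}\, dn_{13}\, dn_{23}. \tag{10.30}
\]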
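Proof. Multiplying the matrix in (10.29) on the right by
\[
  \begin{pmatrix} 1 & m_{12} & m_{13} \\ 0 & 1 & m_{23} \\ 0 & 0 & 1 \end{pmatrix}
\]
gives the map (in our chosen coordinates)
\[
  (a_1, a_2, n_{12}, n_{13}, n_{23}) \longmapsto (a_1, a_2, n_{12} + m_{12},\ m_{13} + n_{13} + n_{12} m_{23},\ n_{23} + m_{23}),
\]
and it is easy to see that this preserves the measure defined by (10.30).
Multiplying on the right by
\[
  \begin{pmatrix} b_1 & 0 & 0 \\ 0 & b_2 & 0 \\ 0 & 0 & b_3 \end{pmatrix}
\]
with b1, b2 > 0 and b3 = (b1b2)⁻¹ we obtain the map
\[
  (a_1, a_2, n_{12}, n_{13}, n_{23}) \longmapsto
  \Bigl( a_1 b_1,\ a_2 b_2,\ \tfrac{b_2}{b_1} n_{12},\ \tfrac{b_3}{b_1} n_{13},\ \tfrac{b_3}{b_2} n_{23} \Bigr).
\]
Let f be a positive measurable function on H = R²_{>0} × R³. Then in
\[
  \int_H f\Bigl( a_1 b_1, a_2 b_2, \tfrac{b_2}{b_1} n_{12}, \tfrac{b_3}{b_1} n_{13}, \tfrac{b_3}{b_2} n_{23} \Bigr)\,
  \frac{a_1}{a_2}\frac{a_1}{a_3}\frac{a_2}{a_3}\, \frac{da_1\,da_2}{a_1 a_2}\, dn_{12}\, dn_{13}\, dn_{23}
\]
we may substitute m12 = (b2/b1)n12, m13 = (b3/b1)n13, and m23 = (b3/b2)n23 to obtain
\[
  \int_H f(a_1 b_1, a_2 b_2, m_{12}, m_{13}, m_{23})\,
  \frac{a_1}{a_2}\frac{a_1}{a_3}\frac{a_2}{a_3}\,\frac{b_1}{b_2}\frac{b_1}{b_3}\frac{b_2}{b_3}\,
  \frac{da_1\,da_2}{a_1 a_2}\, dm_{12}\, dm_{13}\, dm_{23}.
\]
Using the substitution c1 = a1b1 and c2 = a2b2 (and setting c3 = a3b3) we
obtain
\[
  \int_H f(c_1, c_2, m_{12}, m_{13}, m_{23})\,
  \frac{c_1}{c_2}\frac{c_1}{c_3}\frac{c_2}{c_3}\, \frac{dc_1\,dc_2}{c_1 c_2}\, dm_{12}\, dm_{13}\, dm_{23}.
\]
As AN is generated by these two types of elements, this proves the lemma.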

The proof of Theorem 10.48 is now (essentially) reduced to a calculation.
Proof of Theorem 10.48. Since SL3(R) acts on lattices Λ = gZ³ by left
multiplication, and the stabilizer of Z³ under this action is precisely SL3(Z),
we have the identification
\[
  \mathrm{SL}_3(\mathbb{R}) / \mathrm{SL}_3(\mathbb{Z}) \cong \bigl\{ g\mathbb{Z}^3 \mid g \in \mathrm{SL}_3(\mathbb{R}) \bigr\}.
\]
Applying Proposition 10.56 we see that there exists some c > 0 such
that SL3 (R) = B SL3 (Z), where B = KD is called a Siegel set, K = SO3 (R)
and the Borel measurable set D consists of all matrices in AN as in (10.29)
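satisfying the conditions
\[
  0 < a_1 \le c\,a_2 \le c^2 a_3, \qquad a_3 = \frac{1}{a_1 a_2}, \qquad \text{and } n_{12}, n_{13}, n_{23} \in [-\tfrac12, \tfrac12).
\]
By Lemma 10.57 we can calculate mSL3(R)(B) by calculating m^{(r)}_{AN}(D). Note
that the conditions on the diagonal entries a1, a2, a3 imply that a1 ∈ (0, c1]
and a2 ∈ [c2 a1, c3 a1^{−1/2}] for some constants c1, c2, c3 > 0. By Lemma 10.58
\[
\begin{aligned}
  m^{(r)}_{AN}(D) &\le \int_0^{c_1} \int_{c_2 a_1}^{c_3 a_1^{-1/2}}
    \frac{a_1}{a_2}\, \frac{a_1}{(a_1 a_2)^{-1}}\, \frac{a_2}{(a_1 a_2)^{-1}}\, \frac{da_2}{a_2}\, \frac{da_1}{a_1} \\
  &= \int_0^{c_1} \int_{c_2 a_1}^{c_3 a_1^{-1/2}} a_1^3\, a_2 \, da_2 \, da_1 \\
  &= \int_0^{c_1} \tfrac12\, a_1^3 \Bigl[ a_2^2 \Bigr]_{c_2 a_1}^{c_3 a_1^{-1/2}} da_1
   = \int_0^{c_1} \tfrac12\, a_1^3 \bigl( c_3^2 a_1^{-1} - c_2^2 a_1^2 \bigr)\, da_1 < \infty.
\end{aligned}
\]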
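For readers who want to see the constants, here is one admissible choice, worked out under the assumption that D is defined by 0 < a1 ≤ c a2 ≤ c² a3 with a3 = (a1a2)⁻¹ as above; the specific values of c1, c2, c3 are a derivation added here, not taken from the text. The final line evaluates the last integral explicitly.

```latex
\[
\begin{aligned}
  a_1 \le c\,a_2 \;&\Longrightarrow\; a_2 \ge c^{-1} a_1
      && (c_2 = c^{-1}), \\
  c\,a_2 \le c^2 a_3 = \frac{c^2}{a_1 a_2} \;&\Longrightarrow\; a_2 \le c^{1/2} a_1^{-1/2}
      && (c_3 = c^{1/2}), \\
  a_1 \le c\,a_2 \le c^{3/2} a_1^{-1/2} \;&\Longrightarrow\; a_1 \le c
      && (c_1 = c), \\[4pt]
  \int_0^{c_1} \tfrac12\, a_1^3 \bigl( c_3^2 a_1^{-1} - c_2^2 a_1^2 \bigr)\, da_1
  &= \tfrac12 \int_0^{c_1} \bigl( c_3^2 a_1^2 - c_2^2 a_1^5 \bigr)\, da_1
   = \frac{c_3^2 c_1^3}{6} - \frac{c_2^2 c_1^6}{12}.
\end{aligned}
\]
```

The point of the computation is that the integrand is bounded near a1 = 0, so the only non-compact direction of the Siegel set contributes finite measure.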
To prove the theorem we have to show that there exists a fundamental
domain contained in B, that is, a Borel measurable subset F ⊆ B = KD
such that SL3(R) = ⊔_{γ∈SL3(Z)} Fγ.
For this we first prove the following injectivity claim: for any g0 ∈ SL3 (R)
there exists an open neighbourhood U of g0 such that h, h′ ∈ U and hγ = h′
for some γ ∈ SL3 (Z) implies γ = I. Indeed, if this were not true, we could
find two sequences (hn), (h′n) converging to g0 as n → ∞ such that
\[
  h_n^{-1} h_n' = \gamma_n \in \mathrm{SL}_3(\mathbb{Z}) \setminus \{I\}.
\]
However, this contradicts the fact that SL3(Z) is a discrete subgroup of SL3(R).
Next write SL3(R) = ⋃_n Kn as a countable union of compact subsets (for
example, define Kn to be the intersection of SL3(R) with closed balls in R⁹
of radius n ≥ 1 around 0). For each n choose a finite cover Un,1, . . . , Un,mn
of Kn such that the above injectivity claim holds on each of these sets.
To simplify the notation, let us summarize the above by saying that we
have found a countable list of open sets U1 , U2 , . . . satisfying the injectivity
claim and covering all of SL3(R). We now define F1 = B ∩ U1 and
\[
  F_2 = (B \cap U_2) \setminus (F_1\, \mathrm{SL}_3(\mathbb{Z})).
\]
Assuming that F1, . . . , Fn−1 are already defined, we define
\[
  F_n = (B \cap U_n) \setminus \bigl( (F_1 \cup \dots \cup F_{n-1})\, \mathrm{SL}_3(\mathbb{Z}) \bigr).
\]
We claim that the set F = ⋃_{n=1}^∞ Fn is the desired fundamental domain. First
note that by construction we have F ⊆ B. Moreover, for a given g ∈ SL3(R)
we know that (g SL3(Z)) ∩ B ≠ ∅ and hence there exists a minimal n ∈ N
such that
\[
  g\, \mathrm{SL}_3(\mathbb{Z}) \cap B \cap U_n \ne \emptyset.
\]
By the injectivity property on Un this intersection then consists of a single
element gγ for some γ ∈ SL3(Z). By minimality of n we see that
\[
  g\gamma \notin (F_1 \cup \dots \cup F_{n-1})\, \mathrm{SL}_3(\mathbb{Z}),
\]
so that we have {gγ} = g SL3(Z) ∩ Fn. Finally, by construction it also follows
that g SL3(Z) ∩ Fm = ∅ for all m ≠ n, giving {gγ} = g SL3(Z) ∩ F. Hence F
is a fundamental domain and the theorem follows.
10.4 Highly Connected Networks: Expanders
In designing large connected networks (for example, connecting many com-

puters and servers) one is often confronted with two competing constraints:
• (High connectivity) Starting from any vertex, it should be easy to reach
any other vertex quickly (that is, in few steps).
• (Sparsity) The network should be economical, meaning that there should
not be an unnecessarily large number of edges in the network.
Clearly it is easy to achieve the first at the expense of the second by using
a complete graph (in which every pair of vertices has an edge joining them),
and it is easy to achieve the second at the expense of the first (by arranging
the edges so that the vertices are strung along a single line, so as to achieve
connectivity at the lowest possible cost).
Exercise 10.59. Analyze the number of edges as a function of the number of vertices in
the two extreme constructions of connected networks from above.
Of course there is another option of creating a centre vertex with a direct

connection to each of the existing vertices (something that might be called a
hub for an airline), but the centre vertex created in this way would be very
costly (or even technically impossible because of limits to the number of edges
at a single vertex) and would defeat the objective of achieving sparsity.
The notion of expander graphs is an attempt to achieve a balance between
the two constraints. In order to describe expanders, we will need some basic
notation from graph theory.
A graph G = (V, E) is a set of vertices V (the nodes of the network) and
edges E ⊆ V ×V giving the list of direct connections between vertices. We will
always assume that the graph is undirected , so each edge goes both ways and
the set E is symmetric. In particular, a pair of vertices is at most connected
by one edge. We will also assume that the graph is simple, meaning that
there is never an edge from a vertex to itself. Formally, the set of edges is a
subset of (V × V)∖{(a, a) | a ∈ V} with the property that (a, b) ∈ E if and
only if (b, a) ∈ E. We identify (a, b) ∈ E with (b, a) so that |E| equals the
total number of edges in the graph, each of which is viewed as a two-way
connection.
The requirement of sparsity is achieved by requiring that the graph G be k-
regular for a fixed k. A graph G = (V, E) is said to be k-regular if, for any
vertex v ∈ V, there are exactly k edges from v to other vertices in V. We will
fix k and look for k-regular graphs with a large number of vertices (it is easy
to see that a k-regular graph on n vertices exists if and only if n ≥ k+1 and nk
is even). Notice that this will impose a sparsity condition on the graph, since
the number of edges |E| will be a linear function of the number of vertices |V|
(in contrast to the case of a complete graph, for which |E| = ½|V|(|V| − 1)).
In order to define the notion of high connectivity, we will need some pre-
parations. A graph G = (V, E) is called connected if for any two v, w ∈ V there
exists a path from v to w in that there is a list v = v0 , v1 , v2 , . . . , vn = w
of vertices in V with (vi , vi+1 ) ∈ E for i = 0, . . . , n − 1. Such a path may
consist of a single vertex, so each vertex is connected to itself by a path of
length zero. Notice that there is a natural metric on any connected graph:
we may define d(v, w) to be the minimal length of a path from v to w (that
is, the minimal number of edges in a path joining v to w; see Figure 10.3 and
Exercise 10.60). In this metric the diameter of a connected graph G is the
minimal N ∈ N with the property that for any two vertices v and w there is
a path of length no more than N connecting v to w.
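To make the definitions of distance and diameter concrete, the following sketch computes both by breadth-first search. It assumes a graph stored as a dictionary mapping each vertex to the list of its neighbours; this representation and all function names are illustrative choices, not notation from the text.

```python
from collections import deque

def distances_from(adj, v):
    """Return {w: d(v, w)} for all vertices w reachable from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first visit gives the shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Largest distance between two vertices of a connected graph."""
    best = 0
    for v in adj:
        dist = distances_from(adj, v)
        if len(dist) < len(adj):
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best

def path_graph(n):
    """Vertices 0, ..., n-1 strung along a single line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def cycle_graph(n):
    """Vertices 0, ..., n-1 arranged around a circle (n >= 3)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete_graph(n):
    """Every pair of distinct vertices joined by an edge."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}
```

On the two extreme constructions discussed earlier, `diameter(path_graph(n))` equals n − 1 and `diameter(complete_graph(n))` equals 1, which is the trade-off between sparsity and connectivity in its starkest form.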
Exercise 10.60. Verify that the notion of distance on a graph defines a metric on the set
of vertices of a connected graph.
Fig. 10.3: Two points v, w at distance 3 in a connected graph.
The smaller the diameter is in comparison with |V|, the better the connectivity
of the graph is. The worst case with the vertices strung out on a line (or
if we seek a 2-regular graph, arranged around a circle) has diameter |V| − 1
(or ⌊|V|/2⌋). The other extreme case of a complete graph has diameter 1. In
the case of expander graphs we will see that such families may be found with
diameter N ≪ log |V|. The implied constant will depend on k and on ξ (as
in Definition 10.61) but is not allowed to depend on the particular graph G.
Considering the growth rate of the logarithm, it should be clear that this is
a formulation of high connectivity.
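Definition 10.61 (Expanders). A sequence of finite k-regular graphs
\[
  (G_i = (V_i, E_i))_{i \ge 1}
\]
is an expander family if there exists a constant ξ ∈ (0, 1) (independent of i)
with
\[
  |\partial S| \ge \xi \min\bigl\{ |S|,\ |V_i \setminus S| \bigr\}
\]
for any subset S ⊆ Vi and any i ≥ 1, where
\[
\begin{aligned}
  \partial S = {} & \{ v \in S \mid \text{there exists a } w \in V_i \setminus S \text{ with } (v, w) \in E \} \\
  & \cup \{ v \in V_i \setminus S \mid \text{there exists a } w \in S \text{ with } (v, w) \in E \}
\end{aligned}
\]
is called the boundary of S.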
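For a single small graph the best possible constant in this definition can be found by exhaustive search. The sketch below is only meant to illustrate the definition: it assumes the same dictionary-of-neighbours representation as a convenience, the helper names are hypothetical, and the running time is exponential in the number of vertices.

```python
from fractions import Fraction
from itertools import combinations

def boundary(adj, S):
    """Vertex boundary of S: vertices with a neighbour on the other side."""
    S = set(S)
    inner = {v for v in S if any(w not in S for w in adj[v])}
    outer = {w for v in S for w in adj[v] if w not in S}
    return inner | outer

def expansion_constant(adj):
    """Minimum of |dS| / min(|S|, |V \\ S|) over non-empty proper subsets S.

    Since S and its complement have the same boundary, it suffices to
    look at subsets with at most half of the vertices.
    """
    vertices = list(adj)
    best = None
    for size in range(1, len(vertices) // 2 + 1):
        for S in combinations(vertices, size):
            ratio = Fraction(len(boundary(adj, S)), size)
            if best is None or ratio < best:
                best = ratio
    return best

def cycle_graph(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}
```

For the cycle on an even number n ≥ 4 of vertices the minimum is attained by an arc of n/2 consecutive vertices, whose boundary consists of 4 vertices, so the value is 8/n. Any ξ in (0, 1) not exceeding the computed minimum works for that one graph; the content of the definition is that a single ξ works for the whole sequence.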
A few comments are in order. We first note that the above definition of
the boundary of a subset of the vertex set of a graph does not coincide with
the boundary of S considered in the metric space Vi (the latter is empty
since Vi is discrete). Any finite collection of finite k-regular connected graphs
(formally, a sequence as in Definition 10.61 that repeats these) is an expander
family. As this is not at all interesting — and in particular does not achieve
the real benefit of the slower growth rate from the logarithmic bound on the
diameter — one usually requires in addition that |Vi | → ∞ as i → ∞. Notice
that we must also have k ≥ 3, because k = 2 corresponds to a sequence of
regular polygons, which we quickly see cannot be an expander family.
An expander family consists of connected graphs, but as already mentioned
much more is true.
Proposition 10.62 (Small diameter). For an expander family (Gi)i, we
have diam Gi ≪ log |Vi|.
Proof. Given some vertex v ∈ Vi we claim that the metric ball
\[
  B_a(v) = \{ w \in V_i \mid d(v, w) \le a \}
\]
has more than |Vi|/2 elements if the integer a satisfies
\[
  a \ge D = \frac{\log(|V_i|/2)}{\log\bigl(1 + \xi/(k+1)\bigr)}.
\]
Assuming the claim, suppose that v, w ∈ Vi are any pair of vertices and set a
equal to ⌈D⌉. Then, by the claim, each of |Ba(v)| and |Ba(w)| is greater
than |Vi|/2, so that these two balls must have non-empty intersection. By the
triangle inequality, it follows that
\[
  d(v, w) \le 2(D + 1) \ll_{\xi,k} \log |V_i|,
\]
giving the proposition.

To prove the claim, let n ≥ 0, set S = Bn(v) and assume |S| ≤ |Vi|/2. We
then have
\[
  B_{n+1}(v) \setminus B_n(v) = \partial S \setminus S.
\]
Note that every element of ∂S ∩ S must connect to one element of ∂S∖S and at
most k elements of ∂S ∩ S can connect to the same element of ∂S∖S. We can
use this to define a map from ∂S ∩ S to ∂S∖S that is at most k-to-1, showing
that |∂S ∩ S| ≤ k|∂S∖S|. This, together with |∂S| = |∂S ∩ S| + |∂S∖S|, gives
\[
  |\partial S \setminus S| \ge \frac{1}{k+1} |\partial S|.
\]
Together with the defining property of expander graphs, and assuming as we
may that ξ ∈ (0, 1), we deduce that
\[
  |\partial B_n(v) \setminus B_n(v)| \ge \frac{\xi}{k+1} |B_n(v)|.
\]
By induction we now prove |B0(v)| = 1, |B1(v)| = k + 1 > 1 + ξ/(k+1), and
\[
  |B_{n+1}(v)| = |B_n(v)| + |\partial B_n(v) \setminus B_n(v)|
  \ge \Bigl( 1 + \frac{\xi}{k+1} \Bigr) |B_n(v)|
  \ge \Bigl( 1 + \frac{\xi}{k+1} \Bigr)^{n+1}
\]
for all n with |Bn(v)| ≤ |Vi|/2. Since for n = a ≥ D the lower bound is greater
than or equal to |Vi|/2, this proves the claim.
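To get a feeling for the size of the implied constant, the bound 2⌈D⌉ used in the proof can be evaluated numerically. This is a small sketch under the assumption that the number of vertices is at least 3; the function name `diameter_bound` and the sample parameters are hypothetical.

```python
import math

def diameter_bound(num_vertices, k, xi):
    """Bound 2 * ceil(D) on the diameter from the proof of Proposition 10.62.

    D = log(|V| / 2) / log(1 + xi / (k + 1)), for a k-regular graph with
    expansion constant xi and num_vertices >= 3 vertices.
    """
    D = math.log(num_vertices / 2) / math.log(1 + xi / (k + 1))
    return 2 * math.ceil(D)

# A 3-regular graph on a million vertices with xi = 1/2.
bound = diameter_bound(10**6, 3, 0.5)
```

With these sample values D is approximately 111.4, so the bound is 224: much smaller than the diameter of order 10⁶ for a line or a circle, although the constant in front of the logarithm is far from optimal.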
Thus expander families achieve a balance between the two constraints of
high connectivity (with logarithmic growth of the diameter) and sparsity
of the graph (with only linear growth of the number of edges and a fixed
number of edges at every vertex). However, several questions remain, the
most pressing of which are the following.
• Do expander families exist?
• What is their connection to functional analysis?
The first examples of expander families were found by Pinsker [87] (trans-
lated in [88]) using a non-constructive probabilistic argument. The same year
Margulis [67] (translation in [68]) was able to give an explicit construction(30)
using Každan’s Property (T) for the group SL3 (Z).
Towards the proof of this, we now exhibit a connection between the ex-
pander property and properties of eigenvalues of linear maps associated to
the graphs.
Let G = (V, E) be a finite graph and identify V with the set {1, 2, . . . , |V|}.
The adjacency matrix AG of the graph G is the matrix with |V| rows and |V|
columns and with entries in {0, 1} so that (AG )i,j = 1 if and only if there is
an edge from vertex i to vertex j. A simple graph G with adjacency matrix
\[
  A_G = \begin{pmatrix}
    0 & 1 & 0 & 1 & 1 & 0 \\
    1 & 0 & 1 & 0 & 0 & 1 \\
    0 & 1 & 0 & 1 & 0 & 1 \\
    1 & 0 & 1 & 0 & 1 & 0 \\
    1 & 0 & 0 & 1 & 0 & 1 \\
    0 & 1 & 1 & 0 & 1 & 0
  \end{pmatrix}
\]
is shown in Figure 10.4.
Fig. 10.4: A connected 3-regular graph on 6 vertices.
Several properties of the graph are reflected in the properties of the ad-
jacency matrix. The matrix AG is symmetric by our standing assumption
on the graph G = (V, E). We also define MG = (1/k)AG, which is an averaging
operator in the following sense. A vector x ∈ R|V| may be thought of as a
function on the set of vertices, and applying MG to x gives a new function
which at the vertex i is equal to the mean of the values of the function x at
all the neighbours of i. By analogy with the discussion in Section 1.2, one also
studies the graph Laplace operator ∆G = I − MG . Since MG is symmetric, it
is diagonalizable and has only real eigenvalues. Moreover,
\[
  \sum_i |(M_G x)_i| = \sum_i \Bigl| \sum_j (M_G)_{i,j}\, x_j \Bigr|
  \le \sum_{i,j} (M_G)_{i,j}\, |x_j| = \sum_j |x_j|,
\]
since
\[
  \sum_i (M_G)_{i,j} = 1 \tag{10.31}
\]
for all j by construction. Therefore, any eigenvalue λ of MG has |λ| ≤ 1 and
by (10.31) we see that λ1 = 1 is an eigenvalue (with the constant vectors as
eigenvectors). The relationship between the eigenvalues and connectivity is
illustrated by the following elementary lemma.
Lemma 10.63 (Connectivity). A k-regular graph is connected if and only
if 1 is a simple eigenvalue of MG .
Essential Exercise 10.64. Prove Lemma 10.63.
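As a numerical illustration of Lemma 10.63 (not a proof, and not a solution to the exercise), one can estimate the second-largest eigenvalue of MG for the graph of Figure 10.4. The sketch below assumes plain power iteration on (I + MG)/2 restricted to vectors of mean zero; the shift by the identity is a choice made here so that the dominant eigenvalue on that subspace corresponds to λ2 rather than to the most negative eigenvalue. All names are hypothetical.

```python
import math

# Adjacency matrix of the 3-regular graph from Figure 10.4.
A = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]
K = 3  # degree of every vertex

def average(x):
    """Apply M_G = (1/k) A_G: replace each value by the mean over the neighbours."""
    return [sum(A[i][j] * x[j] for j in range(len(A))) / K for i in range(len(A))]

def second_eigenvalue(iterations=200):
    """Estimate lambda_2(M_G) by power iteration on (I + M_G)/2 on mean-zero vectors."""
    x = [float(i) for i in range(len(A))]        # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(x) / len(x)
        x = [xi - mean for xi in x]              # remove the constant eigenvector
        y = [(xi + mi) / 2 for xi, mi in zip(x, average(x))]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return sum(xi * mi for xi, mi in zip(x, average(x)))  # Rayleigh quotient

lam2 = second_eigenvalue()
```

This graph is a triangular prism (the triangles 1, 4, 5 and 2, 3, 6 joined by the edges 1-2, 3-4, 5-6), for which the eigenvalues of MG are 1, 1/3, 0, 0, −2/3, −2/3. The estimate `lam2` is therefore close to 1/3, strictly below 1, in agreement with the fact that the graph is connected.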
What we need next is a quantitative version of this relationship, which is
given by the following proposition.
Proposition 10.65 (Eigenvalues and expanders). Let (Gi = (Vi, Ei))_{i≥1}
be a sequence of graphs. For each i, let Mi = MGi be the averaging operator
for Gi, and order its eigenvalues λ1(Mi) = 1 ≥ λ2(Mi) ≥ · · · ≥ λ|Vi|(Mi).
Suppose that there exists some ε > 0 with
\[
  \lambda_2(M_i) \le 1 - \varepsilon \tag{10.32}
\]
for all i ≥ 1. Then the sequence of graphs is an expander family.

The uniform estimate in (10.32) is called a spectral gap for the sequence
of graphs. The converse of Proposition 10.65 also holds, but we will not need
this direction (we refer to Lubotzky [66] for the proof).
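For a single small graph the quantity $\lambda_2(M)$ in (10.32) can be estimated numerically. The sketch below uses power iteration on the shifted operator (M + I)/2, restricted to vectors of mean zero; this numerical method, the starting vector, and the function names are assumptions made for illustration and are not taken from the text.

```python
import math

# Adjacency matrix of the 3-regular graph from Figure 10.4.
A = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]
k = 3
M = [[entry / k for entry in row] for row in A]  # averaging operator


def apply(mat, x):
    """Return the matrix-vector product mat * x."""
    return [sum(m * v for m, v in zip(row, x)) for row in mat]


def remove_mean(x):
    """Project x onto the orthogonal complement of the constant vectors."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def second_eigenvalue(mat, iterations=300):
    """Estimate lambda_2 by power iteration on (mat + I)/2 away from constants.

    The shift moves all eigenvalues into [0, 1], so the iteration converges to
    the largest eigenvalue on the complement of the constants, not to the one
    of largest absolute value.
    """
    x = normalize(remove_mean([float(i) for i in range(1, len(mat) + 1)]))
    for _ in range(iterations):
        mx = apply(mat, x)
        shifted = [(u + v) / 2 for u, v in zip(mx, x)]
        x = normalize(remove_mean(shifted))
    mx = apply(mat, x)
    return sum(u * v for u, v in zip(mx, x))  # Rayleigh quotient of mat at x


lambda2 = second_eigenvalue(M)
spectral_gap = 1 - lambda2
```

For this particular graph (a triangular prism) the estimate should come out close to 1/3, so that (10.32) holds for it with any ε up to about 2/3. A spectral gap for a family requires the same ε to work for every graph in the sequence.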
Proof of Proposition 10.65. Let ε > 0 be as in the statement of the
proposition. Let $G = G_i$ and $M = M_i$ for some fixed i, so that $\lambda_2(M) \leq 1 - \varepsilon$.
Also let S ⊆ V be any subset with $|S| \leq \frac{|V|}{2}$. We again think of vectors
in $\mathbb{R}^{|V|}$ as functions on V, and notice that $M(1_S) = 1_S + f_{\partial S}$, where $f_{\partial S}$
is a vector that vanishes outside of ∂S and has absolute value less than or equal
to 1 on the elements of ∂S. We will estimate $\|f_{\partial S}\|_2$ from above and below, and the
resulting estimate will prove the claim. In fact, it follows that
\[
\|M(1_S) - 1_S\|_2 = \|f_{\partial S}\|_2 \leq \sqrt{|\partial S|}.
\]
On the other hand, M is diagonalizable and so we may expand $1_S$ into a sum
of eigenvectors†
\[
1_S = \sum_j v_j, \tag{10.33}
\]
corresponding to the eigenvalues $\lambda_1 = 1 \geq \lambda_2 \geq \cdots \geq \lambda_{|V|}$. This then gives
\[
M(1_S) = \sum_j \lambda_j v_j,
\]
and finally
\[
M(1_S) - 1_S = \sum_{j=2}^{|V|} (\lambda_j - 1) v_j.
\]
Furthermore, since M is symmetric we can assume that the vectors $v_j$ are
orthogonal to each other, so
\[
\|M(1_S) - 1_S\|_2 \geq \min_{2 \leq j \leq |V|} |\lambda_j - 1|
\sqrt{\sum_{j=2}^{|V|} \|v_j\|_2^2}
\geq \varepsilon \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2.
\]
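† If $\lambda_j = \lambda_{j+1}$ for some j ∈ {2, . . . , |V| − 1} we may and will assume $v_{j+1} = 0$.

Thus we need to relate the last norm to the size of S. To this end, notice
that the constant vector 1 is an eigenvector for the eigenvalue $\lambda_1 = 1$,
$\|1\|_2 = \sqrt{|V|}$, and $\langle 1_S, 1 \rangle = |S|$, so the orthogonal projection
of $1_S$ onto 1 is $\frac{|S|}{|V|} 1$. Therefore, as in (10.33), we may subtract
from $1_S$ the vector $v_1 = \frac{|S|}{|V|} 1$ and obtain
\[
\begin{aligned}
\Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2 &= \Bigl\| 1_S - \tfrac{|S|}{|V|} 1 \Bigr\|_2 \\
&\geq \Bigl( 1 - \tfrac{|S|}{|V|} \Bigr) \|1_S\|_2 && \text{(by restricting the sum to } S\text{)} \\
&\geq \tfrac{1}{2} \sqrt{|S|} && \text{(since } \tfrac{|S|}{|V|} \leq \tfrac{1}{2}\text{)}
\end{aligned}
\]
and putting these inequalities together gives
\[
\sqrt{|\partial S|} \geq \|M(1_S) - 1_S\|_2
\geq \varepsilon \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2
\geq \frac{\varepsilon}{2} \sqrt{|S|}.
\]
As this holds for all subsets S ⊆ V with $|S| \leq \frac{|V|}{2}$ and all graphs in the
sequence $(G_i)_{i \geq 1}$, we see that this is an expander family with $\xi = \frac{\varepsilon^2}{4}$.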
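The key inequality $\|M(1_S) - 1_S\|_2 \geq \frac{\varepsilon}{2}\sqrt{|S|}$ from the proof can be verified exhaustively on a small example. The sketch below does this for the graph of Figure 10.4 in exact arithmetic. It assumes the value ε = 2/3, which corresponds to $\lambda_2 = 1/3$ for this particular graph; that value is not stated in the text, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Adjacency matrix of the 3-regular graph from Figure 10.4.
A = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]
n, k = len(A), 3
eps = Fraction(2, 3)  # assumption: 1 - lambda_2 for this specific graph


def defect_norm_squared(subset):
    """Return ||M(1_S) - 1_S||_2^2 exactly for the vertex subset S."""
    indicator = [1 if v in subset else 0 for v in range(n)]
    total = Fraction(0)
    for i in range(n):
        averaged = Fraction(sum(A[i][j] * indicator[j] for j in range(n)), k)
        total += (averaged - indicator[i]) ** 2
    return total


# Compare squared quantities so that no square roots are needed.
checks = []
for size in range(1, n // 2 + 1):
    for subset in combinations(range(n), size):
        lhs = defect_norm_squared(set(subset))
        rhs = (eps / 2) ** 2 * size
        checks.append((subset, lhs, rhs, lhs >= rhs))

all_hold = all(ok for *_, ok in checks)
```

All 41 subsets with at most three vertices are tested; `all_hold` records whether the inequality held in every case.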
10.4.1 Constructing an Explicit Expander Family
Corollary 10.66 (Explicit expander family). Let $S = S^{-1}$ be a finite
symmetric set of generators (not containing the identity) of $\Gamma = \mathrm{SL}_3(\mathbb{Z})$.
Let $(V_n)$ be a sequence of finite sets on each of which Γ acts transitively, with
the property that the elements of $S^2 \setminus \{I\}$ have no fixed points for the action.
We define the sequence of graphs $(G_n)$ by $G_n = (V_n, E_n)$ for all $n \geq 1$, where
vertices v, w in $V_n$ are connected in $G_n$ if $s \cdot v = w$ for some s ∈ S. Then $(G_n)$
is a sequence of |S|-regular graphs that form an expander family.
We note that the generating set S, for example, could be taken to comprise
the 12 matrices given by
\[
\begin{pmatrix} 1 & \pm 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
together with all their conjugates by permutation matrices (a permutation
matrix is one obtained by permuting the rows or the columns of the identity
matrix). Furthermore, the sequence of sets with the transitive actions could, for
example, be $V_n = \mathrm{SL}_3(\mathbb{Z}/p_n\mathbb{Z})$, where $p_n$ denotes the nth odd prime number.
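The count of 12 matrices can be confirmed by carrying out the conjugations explicitly. The following sketch builds the set with plain integer matrices; the helper names are illustrative and the code only checks the properties required of S in Corollary 10.66 (it does not show that the set generates $\mathrm{SL}_3(\mathbb{Z})$).

```python
from itertools import permutations

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def matmul(X, Y):
    """Product of two 3x3 integer matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(X[i][m] * Y[m][j] for m in range(3)) for j in range(3))
        for i in range(3)
    )


def transpose(X):
    return tuple(zip(*X))


def det3(X):
    (a, b, c), (d, e, f), (g, h, i) = X
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def permutation_matrix(p):
    """Matrix with a 1 in position (i, p[i]) for each row i."""
    return tuple(tuple(1 if p[i] == j else 0 for j in range(3)) for i in range(3))


generators = set()
for sign in (1, -1):
    E = ((1, sign, 0), (0, 1, 0), (0, 0, 1))
    for p in permutations(range(3)):
        P = permutation_matrix(p)
        # The inverse of a permutation matrix is its transpose.
        generators.add(matmul(matmul(P, E), transpose(P)))

has_twelve = len(generators) == 12
all_det_one = all(det3(g) == 1 for g in generators)
symmetric = all(
    any(matmul(g, h) == IDENTITY for h in generators) for g in generators
)
```

The resulting set consists of the elementary matrices with a single off-diagonal entry equal to +1 or -1, which explains both the count (6 positions, 2 signs) and the symmetry under inversion.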
In this section we will prove Corollary 10.66. By Corollary 10.40 we know
that $\Gamma = \mathrm{SL}_3(\mathbb{Z})$ has property (T). In the following argument we let $S = S^{-1}$
be a finite symmetric set of generators of Γ (not containing the identity). Such
a set exists by Exercise 10.37 (or Exercise 10.67 below).
Essential Exercise 10.67. Prove that SL3 (Z) is generated by the 12 ele-
ments given just after Corollary 10.66. (Note, however, that the argument
after Proposition 10.41 does not apply directly since Z is not a field.)
Proof of Corollary 10.66. By Lemma 10.36 all unitary representations
of Γ have uniform spectral gap, meaning that there exist a finite $Q_0 \subseteq \Gamma$
and $\varepsilon_0 > 0$ such that for any unitary representation $\pi : \Gamma \circlearrowright H$ there are
no $(Q_0, \varepsilon_0)$-almost invariant vectors in $(H^\Gamma)^\perp$. Notice that $Q_0 \subseteq S^k$ for
some $k \geq 1$ and that an $(S, \frac{\varepsilon_0}{k})$-almost invariant vector is also $(Q_0, \varepsilon_0)$-almost
invariant. In other words, after replacing $\varepsilon_0$ by $\frac{\varepsilon_0}{k}$ we may assume
that $Q_0 = S$ and that for any unitary representation $\pi : \Gamma \circlearrowright H$ there are
no $(S, \varepsilon_0)$-almost invariant vectors in $(H^\Gamma)^\perp$.
Suppose now that $(V_n)$ is a sequence of finite sets with the property that
for every n the group Γ acts transitively on $V_n$ and there are no elements
of $V_n$ that are fixed by an element of $S^2 \setminus \{I\}$.
We note that $V = \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ for any odd prime p satisfies these
assumptions. In fact, V is itself a group, and we may use the reduction modulo p
map $\varphi : \Gamma = \mathrm{SL}_3(\mathbb{Z}) \to \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ to define the action of Γ by $\gamma \cdot v = \varphi(\gamma) v$
for all $v \in V = \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$. Transitivity of this action follows since the
subgroup $\varphi(\mathrm{SL}_3(\mathbb{Z}))$ contains all elementary unipotent subgroups of $\mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$
and since these generate $\mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ (by the argument after the statement of
Proposition 10.41 applied to the field $K = \mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$).
We return to the general case and fix some $n \geq 1$. Since Γ acts on the
finite set $V_n$ we also obtain the unitary representation
\[
\pi_n : \Gamma \circlearrowright H_n = \ell^2(V_n) = \mathbb{C}^{V_n}
\]
defined by $(\pi_{n,\gamma} f)(v) = f(\gamma^{-1} \cdot v)$ for all γ ∈ Γ and $f \in H_n$. By transitivity of Γ
a Γ-invariant function in $H_n$ must be constant, that is $H_n^\Gamma = \mathbb{C}1$. Suppose
now that $f \in (H_n^\Gamma)^\perp$ is a unit vector. By the uniform spectral gap property
above, there exists some γ ∈ S such that
\[
\|\pi_{n,\gamma} f - f\| > \varepsilon_0. \tag{10.34}
\]
We now show that this uniform claim implies that the sequence of
graphs $(G_n)$ defined by $G_n = (V_n, E_n)$ as in Corollary 10.66 is an expander
family by using Proposition 10.65. For this, let ε > 0 and suppose in addition
that $f \in (H_n^\Gamma)^\perp$ is an eigenvector for the averaging operator $M_n = M_{G_n}$
associated to the graph $G_n$ and eigenvalue $\lambda_2(M_n) \geq 1 - \varepsilon$. By definition of
the graph structure and the averaging operator we have
\[
M_n(f) = \frac{1}{|S|} \sum_{\gamma \in S} \pi_{n,\gamma} f,
\]
and so
\[
1 - \varepsilon \leq \lambda_2(M_n) \langle f, f \rangle = \Re \langle M_n(f), f \rangle
= \frac{1}{|S|} \sum_{\gamma \in S} \Re \langle \pi_{n,\gamma} f, f \rangle.
\]
Fix some γ ∈ S. By using in addition that $\Re \langle \pi_{n,\gamma'} f, f \rangle \leq \|f\|^2 = 1$ for
all $\gamma' \in S \setminus \{\gamma\}$, we deduce from this that
$1 - \varepsilon \leq \frac{1}{|S|} \bigl( |S| - 1 + \Re \langle \pi_{n,\gamma} f, f \rangle \bigr)$
and therefore $1 - |S| \varepsilon \leq \Re \langle \pi_{n,\gamma} f, f \rangle$. However, this shows that
\[
\|\pi_{n,\gamma} f - f\|^2 = \|\pi_{n,\gamma} f\|^2 - 2 \Re \langle \pi_{n,\gamma}(f), f \rangle + \|f\|^2
\leq 2 - 2 (1 - |S| \varepsilon) = 2 |S| \varepsilon
\]
for all γ ∈ S. Setting $\varepsilon = \frac{\varepsilon_0^2}{2|S|}$, this contradicts (10.34). In other words, using
this ε we have $\lambda_2(M_n) < 1 - \varepsilon$ for every n, which shows that the assumptions
in Proposition 10.65 are satisfied, and so Corollary 10.66 follows.
10.5 Further Topics
• The above concludes our main discussion of non-abelian topological

groups. Abelian groups will appear in Section 11.4 and Section 12.8, which
together give, among other things, a complete classification of unitary rep-
resentations for locally compact abelian groups. We refer to Folland [32]
and [26] for more on the general theory of unitary representations of
locally compact groups.
• Homogeneous spaces are quotients of the form G/Γ, where $\mathrm{SL}_3(\mathbb{R})/\mathrm{SL}_3(\mathbb{Z})$
is a very important example. These spaces have many interesting and
important connections to geometry (see Helgason [44] and Ratcliffe [90]),
ergodic theory and dynamical systems [27, Ch. 9–11] and [25], algeb-
raic groups (see, for example, the monograph of Witte Morris [116]), and
number theory (an introduction to this large area of interaction may be
found in the monographs of Diamond and Shurman [22] and Serre [97]).
• Amenable groups play an important role in ergodic theory, see [27, Ch. 8].
• For further reading on property (T) we refer the reader to the monograph
of Bekka, de la Harpe and Valette [6], to [26], and for an account of the
special role played by property (T) in ergodic theory to the work of
Gorodnik and Nevo [41].
Chapter 11
Banach Algebras and the Spectrum
In this chapter we will study Banach algebras as introduced in Section 2.4.2.
For most of the discussion we will work over C and assume that the Banach
algebra A is unital, meaning that there is a multiplicative unit $1_A$. A
multiplicative unit $1_A \in A$ is an element with $1_A a = a 1_A = a$ for all a ∈ A.
We assume that $1_A \neq 0$, or equivalently that $A \neq \{0\}$ (in the literature
one sometimes also sees the assumption $\|1_A\| = 1$, which will hold in all the
examples that we will consider). At first sight it seems as if the assumption
that A is unital excludes the important example $(L^1(\mathbb{R}^d), +, *)$, but this may
be overcome by the simple construction in the following exercise.
Essential Exercise 11.1 (Adding a unit). Let A be a complex Banach
algebra. Define the algebra $A_1 = A \oplus \mathbb{C}$ with the convention that we write
the elements of $A_1$ in the form a + λI with a ∈ A and λ ∈ C, use the
norm $\|a + \lambda I\| = \|a\|_A + |\lambda|$, the obvious linear structure as a vector space
over C, and the multiplication (a + λI)(b + µI) = (ab + λb + µa) + λµI. Show
that with these definitions $A_1$ is a unital Banach algebra with $1_{A_1} = I$ being
its multiplicative unit.
11.1 The Spectrum and Spectral Radius
We say that an element a of a unital Banach algebra A is invertible if there

exists some b ∈ A called the inverse of a with ab = ba = 1A .
Definition 11.2. Let A be a unital Banach algebra over C. The spectrum of
an element a ∈ A is the set σ(a) = {λ ∈ C | a − λ1A is not invertible}. The
resolvent set is its complement ρ(a) = {λ ∈ C | a − λ1A is invertible}.
Let us note that the above generalizes the notion of an eigenvalue in the
following way: If A is the algebra of linear maps on Cd , then the spectrum
of an element T ∈ A equals the set of eigenvalues of T .

Exercise 11.3. Let X be a compact topological space, and let A = C(X). Find the
spectrum of f ∈ C(X) as an element of the Banach algebra C(X).
Essential Exercise 11.4. In this exercise we describe the spectrum of mul-

tiplication operators Mg : H → H as in Exercise 6.25.
(a) Let µ be a compactly supported finite (or σ-finite) measure on C, and
write $M_I$ for the multiplication operator corresponding to the identity map,
so $(M_I(f))(z) = z f(z)$ for $f \in L^2_\mu(\mathbb{C})$. Show that the spectrum $\sigma(M_I)$ within
the algebra of bounded operators $B(L^2_\mu(\mathbb{C}))$ equals the support of µ.
(b) Let (X, B, µ) be a σ-finite measure space, $H = L^2_\mu(X)$ and g : X → C a
bounded measurable function. Show that the spectrum of the multiplication
operator $M_g$ within the algebra of bounded operators coincides with the
essential range of g, which is defined to consist of all λ ∈ C with the property
that $\mu(g^{-1}(U)) > 0$ for all neighbourhoods U of λ.
The following theorem will show that the spectrum is always non-empty,
and so provides us with generalized eigenvalues. Since even in finite dimen-
sions eigenvalues may be complex, we will only consider Banach algebras
over C.
Definition 11.5. For an element a of a complex unital Banach algebra the
spectral radius is $\max_{\lambda \in \sigma(a)} |\lambda|$.
Theorem 11.6 (Spectrum and spectral radius formula). Let A be a
complex unital Banach algebra. Then for every a ∈ A the spectrum σ(a) is a
non-empty compact subset of C. Moreover, the spectral radius satisfies
\[
\max_{\lambda \in \sigma(a)} |\lambda| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \tag{11.1}
\]
This theorem is the first of many that relate the algebraic to the topolo-
gical structure in Banach algebras. The spectrum and the spectral radius of
an element are defined in purely algebraic terms, whereas the limit is defined
in terms of the norm. One surprising consequence is the following observa-
tion: If A is a unital Banach algebra contained in a larger Banach algebra B
(with compatible structures), then it is possible for an element a ∈ A to
be non-invertible in A but to be invertible in B. Thus the spectrum of an
element depends on the algebra it is viewed in, and σB (a) ⊆ σA (a) with
strict containment being a possibility (see Exercise 11.7). Despite this, the
spectral radius of a ∈ A is not changed when it is viewed as an element of B,
since Theorem 11.6 expresses it in terms of the norms of powers of a, which
are not affected by the switch from A to B (by the implicit compatibility
assumption).
Exercise 11.7. Let $U : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ be the unitary shift operator from Exercises 6.1
and 6.23(a), so $U((x_n)) = (x_{n+1})$.
(a) Show that the spectrum of U considered within the algebra B of all bounded operators
on $\ell^2(\mathbb{Z})$ is given by $\mathbb{S}^1 = \{\lambda \in \mathbb{C} \mid |\lambda| = 1\}$.
(b) Now consider the Banach algebra A generated by U (obtained by taking the closed
linear hull of $U^0 = I, U, U^2, \ldots$). Show that the spectrum of U within A is $\{\lambda \in \mathbb{C} \mid |\lambda| \leq 1\}$.
The above mentioned dependence of the spectrum of an element on the am-

bient algebra will not cause any confusion: In the abstract setting considered
here we will only work with one algebra at a time, and in the application of
these results in the context of operators on a Hilbert space H we will always
consider the algebra B(H) of all bounded linear operators on H.
Exercise 11.8 (There are no interesting Banach fields). Use Theorem 11.6 to show
that C is the only Banach algebra over C that is also a field.
Finally, let us comment on the precise shape of the spectral radius
formula (11.1). It will be relatively straightforward to show that scalars λ ∈ C
with $|\lambda| > \|a\|$ cannot belong to the spectrum of a ∈ A. However, it is also
clear that in general the norm may be much larger than the spectral radius.
Self-adjoint and, more generally, normal operators on Hilbert spaces will form
a nice exception to this. In fact, even in the elementary case of the algebra
of two-by-two matrices (equipped with the operator norm) the norm of the
matrix
\[
a = \begin{pmatrix} 1 & C \\ 0 & 1 \end{pmatrix}
\]
can be made arbitrarily large by increasing the value of C, but the spectrum
always consists simply of 1 ∈ C. The right-hand side of the spectral radius
formula (11.1) essentially ignores the original size of the matrix a and instead
looks at the exponential growth rate of the norm of $a^n$. In the case at hand
the norm of $a^n$ grows linearly in n, which makes the right-hand side equal to one
(and thus equal to the left-hand side).
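The behaviour described for this matrix can be observed numerically. The sketch below uses the closed form $a^n = \begin{pmatrix} 1 & nC \\ 0 & 1 \end{pmatrix}$ and computes the operator norm of a real two-by-two matrix as the square root of the largest eigenvalue of its Gram matrix; the value C = 100 and the function names are illustrative assumptions.

```python
import math


def operator_norm(mat):
    """Operator (spectral) norm of a real 2x2 matrix ((p, q), (r, s)).

    Uses the largest eigenvalue of the Gram matrix A^T A, which has trace
    p^2 + q^2 + r^2 + s^2 and determinant det(A)^2.
    """
    (p, q), (r, s) = mat
    trace = p * p + q * q + r * r + s * s
    det = (p * s - q * r) ** 2
    discriminant = max(trace * trace - 4 * det, 0.0)  # guard against rounding
    return math.sqrt((trace + math.sqrt(discriminant)) / 2)


def power_of_a(C, n):
    """The n-th power of a = ((1, C), (0, 1)), which equals ((1, n*C), (0, 1))."""
    return ((1.0, n * C), (0.0, 1.0))


C = 100.0  # illustrative choice
norm_a = operator_norm(power_of_a(C, 1))

# n-th roots of ||a^n|| for increasing n: they decrease towards 1.
roots = {n: operator_norm(power_of_a(C, n)) ** (1.0 / n) for n in (1, 10, 100, 10_000)}
```

The norm of a itself is slightly larger than C, while the n-th roots of the norms of the powers approach the spectral radius 1 as n grows.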
Exercise 11.9. Let $k \in C([0, 1]^2)$ be a continuous function, so that
\[
K(f)(x) = \int_0^x k(x, t) f(t) \, dt
\]
for f ∈ C([0, 1]) defines an operator K : C([0, 1]) → C([0, 1]). Determine σ(K).
For the proof of Theorem 11.6 we will use Cauchy integration on the
complex plane and convergent geometric series in the unital Banach algebra.
11.1.1 The Geometric Series and its Consequences
Given a unital Banach algebra A we will set $a^0 = 1_A$ for any a ∈ A as
is customary. Also recall that the inverse of an invertible element a ∈ A is
uniquely determined by a.
Proposition 11.10. The set U of invertible elements of a unital Banach

algebra A is open. Moreover, for any a ∈ A the resolvent set ρ(a) is open
in C, so the spectrum is a closed set.
Proof. If a ∈ A and $\|a\| < 1$, then the inverse of $1_A - a$ is given by the
geometric series
\[
(1_A - a)^{-1} = \sum_{n=0}^{\infty} a^n. \tag{11.2}
\]
Indeed, since the right-hand side converges absolutely we may take the
product and obtain
\[
(1_A - a) \Bigl( \sum_{n=0}^{\infty} a^n \Bigr) = \Bigl( \sum_{n=0}^{\infty} a^n \Bigr) (1_A - a)
= \sum_{n=0}^{\infty} a^n - \sum_{n=1}^{\infty} a^n = 1_A
\]
as desired. This shows that $B_1(1_A) \subseteq U$.
Now let $a_0 \in U$ be any invertible element and let a ∈ A satisfy
$\|a - a_0\| < \|a_0^{-1}\|^{-1}$. Then
we claim that a is also invertible, which will then show that U ⊆ A is open.
To prove the claim, notice that
\[
a = a_0 + (a - a_0) = a_0 \bigl( \underbrace{1_A + a_0^{-1} (a - a_0)}_{\in B_1(1_A)} \bigr)
\]
is a product of two elements of U and so lies in U.
Finally, for any a ∈ A the resolvent set $\rho(a) = \{\lambda \in \mathbb{C} \mid a - \lambda 1_A \in U\}$ is
the pre-image of an open set under a continuous mapping, and so is open.
Therefore the spectrum $\sigma(a) = \mathbb{C} \setminus \rho(a)$ is closed.
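The geometric series (11.2) can be tried out in the Banach algebra of two-by-two matrices. The following sketch sums finitely many terms for one matrix of norm less than one and checks that the result is an approximate inverse of $1_A - a$; the specific matrix, the number of terms, and the helper names are illustrative assumptions.

```python
import math

I2 = ((1.0, 0.0), (0.0, 1.0))


def matmul(X, Y):
    """Product of two 2x2 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(X[i][m] * Y[m][j] for m in range(2)) for j in range(2))
        for i in range(2)
    )


def add(X, Y):
    return tuple(tuple(x + y for x, y in zip(rx, ry)) for rx, ry in zip(X, Y))


def sub(X, Y):
    return tuple(tuple(x - y for x, y in zip(rx, ry)) for rx, ry in zip(X, Y))


def neumann_partial_sum(a, terms):
    """Return 1 + a + a^2 + ... + a^(terms - 1)."""
    total = ((0.0, 0.0), (0.0, 0.0))
    power = I2
    for _ in range(terms):
        total = add(total, power)
        power = matmul(power, a)
    return total


a = ((0.2, 0.5), (-0.3, 0.1))  # illustrative matrix
# The Frobenius norm bounds the operator norm from above, so ||a|| < 1 here.
frobenius = math.sqrt(sum(v * v for row in a for v in row))

inverse_approx = neumann_partial_sum(a, 200)
residual = sub(matmul(sub(I2, a), inverse_approx), I2)
max_residual = max(abs(v) for row in residual for v in row)
```

Since the truncation error is bounded by a power of the norm of a, `max_residual` is at the level of floating-point rounding for this choice of a.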
Proposition 11.11. Let A be a unital Banach algebra over C, and let a ∈ A.
If λ ∈ C has $|\lambda| > \sqrt[m]{\|a^m\|}$ for some $m \geq 1$, then λ ∈ ρ(a). In particular,
the spectrum σ(a) is a closed subset of the closed ball $\overline{B_{\|a\|}^{\mathbb{C}}}$ and so is compact.
Proof. Let a ∈ A and λ ∈ C be as in the proposition. Then we claim that
\[
(\lambda 1_A - a)^{-1} = \lambda^{-1} \bigl( 1_A + \lambda^{-1} a + \cdots + \lambda^{-(m-1)} a^{m-1} \bigr)
\sum_{n=0}^{\infty} \lambda^{-mn} a^{mn}
\]
(here and below we will sometimes study $\lambda 1_A - a$ instead of $a - \lambda 1_A$, which
clearly will not make any difference). For this notice first that by assumption
\[
\sum_{n=0}^{\infty} \|\lambda^{-mn} a^{mn}\| \leq \sum_{n=0}^{\infty} \bigl( \underbrace{|\lambda|^{-m} \|a^m\|}_{< 1} \bigr)^n < \infty,
\]
so the series $\sum_{n=0}^{\infty} \lambda^{-mn} a^{mn}$ converges absolutely. Moreover, by combining
the first three factors in the product
\[
(\lambda 1_A - a) \, \lambda^{-1} \bigl( 1_A + \lambda^{-1} a + \cdots + \lambda^{-(m-1)} a^{m-1} \bigr)
\sum_{n=0}^{\infty} \lambda^{-mn} a^{mn}
\]
we obtain that it equals
\[
\bigl( 1_A - \lambda^{-m} a^m \bigr) \sum_{n=0}^{\infty} \lambda^{-mn} a^{mn} = 1_A
\]
by (11.2). Noting that the factors commute with each other (which follows
easily from continuity of multiplication in the algebra), this proves the claim,
and so λ ∈ ρ(a).
The case m = 1 gives the remaining statement about the spectrum.
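The gain from allowing m > 1 in Proposition 11.11 is visible in a simple special case. The following worked computation is an added illustration under the assumption that a ∈ A satisfies $a^2 = 0$ (for instance a strictly upper triangular two-by-two matrix); it applies the formula from the proof with m = 2.

```latex
% Assumption for this illustration: a^2 = 0 in A.
\[
\sqrt[2]{\|a^2\|} = 0 < |\lambda| \quad \text{for every } \lambda \neq 0,
\]
% Only the term n = 0 of the series survives, since a^{2n} = 0 for n >= 1.
\[
(\lambda 1_A - a)^{-1}
= \lambda^{-1} \bigl( 1_A + \lambda^{-1} a \bigr) \sum_{n=0}^{\infty} \lambda^{-2n} a^{2n}
= \lambda^{-1} 1_A + \lambda^{-2} a,
\]
% Direct verification of the inverse.
\[
(\lambda 1_A - a) \bigl( \lambda^{-1} 1_A + \lambda^{-2} a \bigr)
= 1_A + \lambda^{-1} a - \lambda^{-1} a - \lambda^{-2} a^2 = 1_A .
\]
```

So every λ ≠ 0 lies in ρ(a), whereas the case m = 1 alone would only exclude scalars with $|\lambda| > \|a\|$.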
Exercise 11.12. Assume that $(a_n)$ and $(b_n)$ are sequences in a Banach algebra
satisfying $a_n b_n = b_n a_n$ for all $n \geq 1$ and with $\lim_{n \to \infty} a_n = a$, $\lim_{n \to \infty} b_n = b$. Show
that ab = ba.
11.1.2 Using Cauchy Integration
We have shown that σ(a) ⊆ C is compact for any element a ∈ A, but have
yet to show that σ(a) is non-empty. This existence theorem uses Cauchy
integration, and to prepare for this we need the following lemma concerning
the resolvent.
Lemma 11.13 (Resolvent function). Let a be an element of a unital
Banach algebra A over C. Then the resolvent function R : ρ(a) → A
defined by $R(\lambda) = (\lambda 1_A - a)^{-1}$ is an analytic function in the sense that
for any $\lambda_0 \in \rho(a)$ there is an open neighbourhood of $\lambda_0$ on which R is given
by an absolutely convergent power series
\[
R(\lambda) = \sum_{n=0}^{\infty} b_n (\lambda - \lambda_0)^n
\]
with coefficients $b_n \in A$.
Proof. We use essentially the same formulas as those that arise in the proof
of Proposition 11.10. Let a ∈ A and $\lambda_0 \in \rho(a)$ be as in the lemma. Suppose
that λ ∈ C satisfies $|\lambda - \lambda_0| < \|(\lambda_0 1_A - a)^{-1}\|^{-1}$. Then
\[
\begin{aligned}
\lambda 1_A - a &= (\lambda_0 1_A - a) - (\lambda_0 - \lambda) 1_A \\
&= (\lambda_0 1_A - a) \bigl( 1_A - (\lambda_0 - \lambda) (\lambda_0 1_A - a)^{-1} \bigr),
\end{aligned}
\]
which shows that
\[
R(\lambda) = (\lambda_0 1_A - a)^{-1} \sum_{n=0}^{\infty} \bigl( (\lambda_0 1_A - a)^{-1} \bigr)^n (-1)^n (\lambda - \lambda_0)^n
\]
is, for $|\lambda - \lambda_0| < \|(\lambda_0 1_A - a)^{-1}\|^{-1}$, an absolutely convergent power series,
as claimed.
With this analyticity we are ready to prove the first part of Theorem 11.6.
Proof that σ(a) is non-empty. Let a be an element of a unital Banach
algebra, and suppose that σ(a) is empty. We first sketch an argument that
produces a contradiction from this assumption, and then fill in the details.
An entire function. Since σ(a) is empty, the resolvent function
\[
R(\lambda) = (\lambda 1_A - a)^{-1}
\]
is an entire function (that is, is an analytic function defined on all of C). It
follows by Cauchy's integral formula that
\[
\oint_{\gamma} R(z) \, dz = 0
\]
for any closed piecewise differentiable path γ in C. The alert reader may notice
that this usage of Cauchy integration is a bit unorthodox, but should read
on; this will be resolved below. In particular, if γ is the closed positively
oriented path with centre 0 and radius $\|a\| + 1$, then
\[
R(z) = (z 1_A - a)^{-1} = \tfrac{1}{z} \bigl( 1_A - \tfrac{1}{z} a \bigr)^{-1} = \sum_{n=0}^{\infty} z^{-n-1} a^n \tag{11.3}
\]
for any z on the path γ, and the sum is absolutely convergent. Therefore,
\[
\begin{aligned}
0 = \oint_{\gamma} R(z) \, dz &= \oint_{|z| = \|a\| + 1} \sum_{n=0}^{\infty} z^{-n-1} a^n \, dz \\
&= \sum_{n=0}^{\infty} a^n \oint_{|z| = \|a\| + 1} z^{-n-1} \, dz = 2 \pi i \, 1_A,
\end{aligned} \tag{11.4}
\]
since
\[
\oint_{|z| = \|a\| + 1} z^{-n-1} \, dz =
\begin{cases} 2 \pi i & \text{if } n = 0, \\ 0 & \text{if } n \neq 0. \end{cases}
\]
Now $1_A \neq 0$, and so (11.4) shows that the assumption σ(a) = ∅ leads to a
contradiction.
Using the standard Cauchy integral formula. The difficulty with
the argument sketched above is that most of the integrals are integrals of
A-valued functions. Even though it is possible to make sense of integration for
A-valued functions (see Proposition 3.81), we do not need to extend the Cauchy
integral formula for A-valued functions because of the following argument
(which could be used to prove such an extension).
Let $\ell \in A^*$ be a linear functional with $\ell(1_A) \neq 0$ (such a functional is
guaranteed to exist by Theorem 7.3) and consider $\ell \circ R : \rho(a) = \mathbb{C} \to \mathbb{C}$. By
Lemma 11.13, R(z) can locally be represented as a power series. By continuity
of ℓ, the same holds for ℓ ◦ R. It follows that ℓ ◦ R : C → C is an entire
function (in the usual sense of complex analysis). Using this entire function
in the calculation in (11.4) we see that
\[
0 = \oint_{|z| = \|a\| + 1} \ell \circ R(z) \, dz
= \sum_{n=0}^{\infty} \ell(a^n) \oint_{|z| = \|a\| + 1} z^{-n-1} \, dz = 2 \pi i \, \ell(1_A) \neq 0.
\]
It follows that ℓ ◦ R cannot be defined on all of C, and so σ(a) is non-empty.

For the spectral radius formula in Theorem 11.6 we also need the follow-
ing elementary property of sub-additive and sub-multiplicative real-valued
sequences.
Definition 11.14. A real sequence $(\alpha_n)$ is sub-additive if $\alpha_{m+n} \leq \alpha_m + \alpha_n$
for all $m, n \geq 1$, and is sub-multiplicative if $\alpha_n \geq 0$ and $\alpha_{m+n} \leq \alpha_m \alpha_n$ for
all $m, n \geq 1$.
Lemma 11.15 (Fekete's lemma). Let $(\alpha_n)$ be a real sequence.
(1) If $(\alpha_n)$ is sub-additive then
\[
\lim_{n \to \infty} \frac{\alpha_n}{n} = \inf_{n \geq 1} \frac{\alpha_n}{n}.
\]
(2) If $(\alpha_n)$ is non-negative and sub-multiplicative then
\[
\lim_{n \to \infty} \sqrt[n]{\alpha_n} = \inf_{n \geq 1} \sqrt[n]{\alpha_n}.
\]
Proof. Suppose first that $\alpha_n \geq 0$ is sub-multiplicative. If $\alpha_{n_0} = 0$ for some $n_0$,
then $\alpha_n = 0$ for all $n \geq n_0$ and in this case the claim is trivial. On the other
hand, if all $\alpha_n$ are strictly positive the statement follows from (1) applied to
the sequence $(\log \alpha_n)$.
So consider now a real-valued sub-additive sequence $(\alpha_n)$ and let
\[
\alpha = \inf_{n \in \mathbb{N}} \frac{\alpha_n}{n},
\]
so that $\frac{\alpha_n}{n} \geq \alpha$ for all $n \geq 1$. Let β > α be arbitrary and pick $k \geq 1$
such that $\frac{\alpha_k}{k} < \beta$. For any $n \geq k$ we apply division with remainder to
get n = mk + j for some j ∈ {0, . . . , k − 1} and $m \geq 1$. By the sub-additivity
property we then have (omitting the terms involving j if j = 0)
\[
\frac{\alpha_n}{n} \leq \frac{\alpha_{mk}}{n} + \frac{\alpha_j}{n}
\leq \frac{m \alpha_k}{n} + \frac{j \alpha_1}{n}
= \frac{mk}{n} \frac{\alpha_k}{k} + \frac{j \alpha_1}{n}.
\]
If now n = mk + j is large enough we see that the right-hand side is less
than β, which proves the statement.
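A concrete sub-additive sequence makes the statement of part (1) tangible. The sketch below uses the sequence $\alpha_n = n + \sqrt{n}$, which is an example chosen here for illustration and does not appear in the text; its ratios $\alpha_n / n = 1 + 1/\sqrt{n}$ decrease to the infimum 1 without attaining it.

```python
import math


def alpha(n):
    """Illustrative sub-additive sequence: alpha_n = n + sqrt(n)."""
    return n + math.sqrt(n)


# Check alpha_{m+n} <= alpha_m + alpha_n on a finite range
# (a small tolerance absorbs floating-point rounding).
N = 60
subadditive = all(
    alpha(m + n) <= alpha(m) + alpha(n) + 1e-12
    for m in range(1, N + 1)
    for n in range(1, N + 1)
)

# Ratios alpha_n / n along a sparse set of indices.
indices = (1, 10, 100, 10_000, 1_000_000)
ratios = [alpha(n) / n for n in indices]
```

The finite check is of course no proof of sub-additivity; here it follows from $\sqrt{m+n} \leq \sqrt{m} + \sqrt{n}$.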
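Proof of Theorem 11.6. Notice first that the sequence $(\alpha_n)$ defined by
\[
\alpha_n = \|a^n\|
\]
for $n \geq 1$ is sub-multiplicative, since
\[
\alpha_{m+n} = \|a^{m+n}\| \leq \|a^m\| \|a^n\| = \alpha_m \alpha_n
\]
for all $m, n \geq 1$. Thus $\sqrt[m]{\|a^m\|}$ converges to $\inf_{m \geq 1} \sqrt[m]{\|a^m\|}$ by Lemma 11.15.
Proposition 11.11 shows that, if λ ∈ C satisfies
\[
|\lambda| > \inf_{m \geq 1} \sqrt[m]{\|a^m\|},
\]
then λ ∈ ρ(a). Thus
\[
\max_{\lambda \in \sigma(a)} |\lambda| \leq \inf_{m \geq 1} \sqrt[m]{\|a^m\|} = \lim_{m \to \infty} \sqrt[m]{\|a^m\|}.
\]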
Using the Cauchy integral formula. The reverse inequality is more
involved. It involves a refinement of the proof that σ(a) is non-empty, and
will use the Cauchy integral formula again. Let $s = \max_{\lambda \in \sigma(a)} |\lambda|$, so that the
resolvent function $R(z) = (z 1_A - a)^{-1}$ is analytic on $\{z \in \mathbb{C} \mid |z| > s\} \subseteq \rho(a)$.
Pick $\ell \in A^*$ with $\|\ell\| \leq 1$ and fix ε > 0. Using the positively oriented closed
path with centre 0 and radius s + ε, we see that
\[
\Bigl| \oint_{|z| = s + \varepsilon} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz \Bigr| \ll_{a, \varepsilon} (s + \varepsilon)^m
\]
for all $m \geq 0$, where $|z^m| = (s + \varepsilon)^m$ and the implicit constant only depends on
the restriction of $R(z) = (z 1_A - a)^{-1}$ to $\{z \in \mathbb{C} \mid |z| = s + \varepsilon\}$, and in particular
does not depend on m and ℓ. Expanding the circle to the radius $\|a\| + 1$ does
not change the integral, so that we may use (11.3) again to see that
\[
\begin{aligned}
\oint_{|z| = s + \varepsilon} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz
&= \oint_{|z| = \|a\| + 1} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz \\
&= \sum_{n=0}^{\infty} \ell(a^n) \oint_{|z| = \|a\| + 1} z^{-n+m-1} \, dz = 2 \pi i \, \ell(a^m).
\end{aligned}
\]
Together this gives
\[
|\ell(a^m)| \ll_{a, \varepsilon} (s + \varepsilon)^m
\]
for all $m \geq 1$ and $\ell \in A^*$ with $\|\ell\| \leq 1$. Using Corollary 7.4 we deduce that
\[
\|a^m\| \ll_{a, \varepsilon} (s + \varepsilon)^m.
\]
Taking the mth root and the limit we see that the implicit constant
disappears, and we get
\[
\lim_{m \to \infty} \sqrt[m]{\|a^m\|} \leq s + \varepsilon = \max_{\lambda \in \sigma(a)} |\lambda| + \varepsilon.
\]
Since this holds for any ε > 0 the theorem follows.

The results of this and the following section will be used in Chapter 12 to
derive the spectral theory of bounded self-adjoint operators and their func-
tional calculus.
11.2 C ∗ -algebras
Definition 11.16. A Banach algebra A over C is a $C^*$-algebra if it has a
star operator ∗ : A → A with the following properties:
• ∗ is semi-linear;
• $(ab)^* = b^* a^*$ for a, b ∈ A;
• $(a^*)^* = a$ for a ∈ A; and
• $\|a^* a\| = \|a\|^2$ for a ∈ A (the $C^*$-property of the norm).
Example 11.17. (a) The algebra of bounded operators B(H) on a Hilbert
space H has a star operator, namely the map that sends A ∈ B(H) to its
adjoint $A^* \in B(H)$ (introduced in Section 6.2.1). For this star operator we
already know all the desired properties with the exception of the last (critical)
property. To see this last property, let A ∈ B(H), and notice that $A^* A$ is
self-adjoint since $(A^* A)^* = A^* (A^*)^* = A^* A$. Hence, by Lemma 6.31,
\[
\|A^* A\| = \sup_{\|x\| \leq 1} |\langle A^* A x, x \rangle| = \sup_{\|x\| \leq 1} \langle A x, A x \rangle = \|A\|^2,
\]
as required. Therefore B(H) is a $C^*$-algebra.
(b) The space of bounded functions B(X), of continuous bounded
functions $C_b(X)$, of measurable bounded functions $\mathcal{L}^\infty(X)$, and of measurable
essentially bounded functions $L^\infty_\mu(X)$ are all commutative unital $C^*$-algebras.
For these multiplication is defined pointwise, and the star operator is
pointwise complex conjugation.
Essential Exercise 11.18. Show that $\|a^*\| = \|a\|$ for a ∈ A if A is a
$C^*$-algebra.
Definition 11.19. Let A be a $C^*$-algebra. Then an element a ∈ A is called
self-adjoint if $a^* = a$, and is called normal if $a^* a = a a^*$.
Essential Exercise 11.20. Let A be a unital $C^*$-algebra. Show that the
unit $1_A$ is self-adjoint and has $\|1_A\| = 1$.
For normal elements in a $C^*$-algebra the spectral radius formula simplifies.
Proposition 11.21 (Spectral radius formula for normal elements).
Let A be a unital $C^*$-algebra, and let a ∈ A be a normal element. Then the
spectral radius satisfies $\max_{\lambda \in \sigma(a)} |\lambda| = \|a\|$.
Proof. We will prove by induction on n that
\[
\|a^{2^n}\| = \|a\|^{2^n}. \tag{11.5}
\]
The case n = 0 is trivial. For n = 1 we have
\[
\|a^2\|^2 = \|(a^2)^* a^2\| = \|(a^* a)^* (a^* a)\| = \|a^* a\|^2 = \|a\|^4,
\]
where we used the $C^*$-property of the norm for $a^2$, normality of a, and the
$C^*$-property of the norm for $a^* a$ and for a. Now suppose that (11.5) holds for a
given $n \geq 1$ and set $b = a^{2^n}$. Then
\[
\|a^{2^{n+1}}\| = \|b^2\| = \|b\|^2 = \|a^{2^n}\|^2 = \|a\|^{2^{n+1}},
\]
where we used the definition of b, the case n = 1 for the normal element b,
and the inductive hypothesis. This concludes the induction, proving (11.5)
for all $n \geq 0$. Applying Theorem 11.6 now gives the proposition.
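The role of normality in the step n = 1 can be seen in the $C^*$-algebra of two-by-two matrices, where the star operator on a real matrix is the transpose. The sketch below compares one normal and one non-normal matrix; both matrices and all function names are illustrative choices that do not appear in the text.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(X[i][m] * Y[m][j] for m in range(2)) for j in range(2))
        for i in range(2)
    )


def transpose(X):
    return tuple(zip(*X))


def operator_norm(mat):
    """Square root of the largest eigenvalue of the Gram matrix A^T A."""
    (p, q), (r, s) = mat
    trace = p * p + q * q + r * r + s * s
    det = (p * s - q * r) ** 2
    discriminant = max(trace * trace - 4 * det, 0.0)  # guard against rounding
    return math.sqrt((trace + math.sqrt(discriminant)) / 2)


def is_normal(mat):
    """For a real matrix the adjoint is the transpose: test a* a == a a*."""
    return matmul(transpose(mat), mat) == matmul(mat, transpose(mat))


normal = ((1.0, -2.0), (2.0, 1.0))  # a rotation composed with a scaling
shear = ((1.0, 1.0), (0.0, 1.0))    # not normal

normal_gap = operator_norm(normal) ** 2 - operator_norm(matmul(normal, normal))
shear_gap = operator_norm(shear) ** 2 - operator_norm(matmul(shear, shear))
```

For the normal matrix the gap $\|a\|^2 - \|a^2\|$ vanishes up to rounding, in line with (11.5), while for the shear it is strictly positive, consistent with its spectral radius 1 being smaller than its norm.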
Starting in Section 12.5, we will use the results of this section and their
refinements in Section 11.3.4 to obtain the spectral theory of commutative
$C^*$-subalgebras of bounded operators on a Hilbert space.
11.3 Commutative Banach Algebras and their Gelfand Duals
Recall that the dual space A∗ of a Banach algebra A consists of all bounded
linear functionals A → C. If A is in addition commutative (with ab = ba for
all a, b ∈ A) then it is useful to study algebra homomorphisms. The trivial
map χ defined by χ(a) = 0 for all a ∈ A may also be considered an algebra
homomorphism, but we will exclude this trivial map in the discussion below.
Definition 11.22. Let A be a commutative Banach algebra over C. Then
the Gelfand dual σ(A) is the set of all non-trivial (equivalently, surjective)
continuous algebra homomorphisms χ : A → C (which are also called
characters). That is,
\[
\sigma(A) = \bigl\{ \chi \in A^* \setminus \{0\} \mid \chi(ab) = \chi(a) \chi(b) \text{ for all } a, b \in A \bigr\}.
\]
11.3.1 Commutative Unital Banach Algebras
If the Banach algebra that we consider also has a unit, then we can link the
notion of algebra homomorphisms to the spectrum of the elements of the
algebra. The following result establishes this link and a great deal more.
Theorem 11.23 (Properties of the Gelfand dual). Let A be a commutative
unital Banach algebra over C. Then $\sigma(A) \subseteq \overline{B_1^{A^*}}$ is non-empty and
weak* compact, and σ(a) = {χ(a) | χ ∈ σ(A)} for every a ∈ A.
We start the proof of Theorem 11.23 by showing that any algebra homo-
morphism χ : A → C is continuous(31) (and so strictly speaking the continuity
hypothesis in Definition 11.22 could be dropped).
Lemma 11.24. Let A be a commutative Banach algebra, and let χ : A → C
be an algebra homomorphism. Then χ is continuous and $\|\chi\| \leq 1$.
Proof. Suppose there is an element a ∈ A with $\|a\| < 1$ and with $|\chi(a)| > 1$.
Replacing a by a/χ(a) we may assume that $\|a\| < 1$ and χ(a) = 1. Then the
series $b = \sum_{n=1}^{\infty} a^n$ converges and satisfies a + ab = b, so that
\[
1 + \chi(b) = \chi(a) + \chi(a) \chi(b) = \chi(b),
\]
a contradiction. This implies that χ is continuous and has $\|\chi\| \leq 1$.

For the next steps we will need to use some more terminology from basic
algebra. Recall that an ideal J of a commutative algebra A is a subspace such
that AJ = {ab | a ∈ A, b ∈ J} ⊆ J, and that for any ideal J the quotient A/J
is also a commutative algebra with multiplication given by
\[
(a + J)(b + J) = ab + J
\]
for all a, b ∈ A. An ideal J ⊆ A is proper if $J \neq A$. A maximal ideal M in
a unital commutative algebra A is a proper ideal such that if J is an ideal
with M ⊆ J ⊆ A then J = M or J = A. The quotient of a unital algebra by a
maximal ideal M is always a field, for if $a + M \in (A/M) \setminus \{0\}$ then J = Aa + M
is an ideal strictly bigger than M, and so must be A. Since A has a unit $1_A$,
we have $ba + m = 1_A$ for some b ∈ A and m ∈ M, so every non-zero element
of A/M has a multiplicative inverse.
The next lemma examines these general notions for Banach algebras.
Lemma 11.25. Let A be a commutative unital Banach algebra. The closure
of any ideal in A is an ideal, and any maximal ideal is closed.
Proof. The first claim is an easy consequence of the fact that the
multiplication map A × A → A is continuous by the discussion in Section 2.4.2.
For the second claim, notice that a proper ideal J ⊆ A cannot contain $1_A$,
nor indeed any invertible element. By Proposition 11.10 this implies that $1_A$
is not an element of the closure $\overline{J}$. Since a maximal ideal M is proper, and its
closure $\overline{M}$ is also a proper ideal, we see that $M = \overline{M}$ is closed.
We note that Lemma 11.25 gives a second proof that any algebra homo-
morphism χ : A → C on a commutative unital Banach algebra is continuous:
Given a non-trivial algebra homomorphism χ : A → C its kernel M = ker χ
is a maximal ideal, and so is closed. Then χ equals the composition of
the continuous projection A → A/M and the isomorphism A/M → C
induced by χ (which is continuous by finite-dimensionality), hence we see
that χ : A → A/M → C is a continuous map.
For the proof of Theorem 11.23 we need one more algebraic result.
Lemma 11.26. Let R be a commutative ring with a unit, and let J0 ⊆ R be

a proper ideal. Then there exists a maximal ideal M ⊆ R containing J0 .
Proof. This is a direct application of Zorn's lemma. Define a set
\[
S = \{ J \subseteq R \mid J \text{ is an ideal and } J_0 \subseteq J \subsetneq R \}
\]
with the partial order defined by inclusion. If $\{J_\alpha \mid \alpha \in I\}$ is a linearly
ordered subset (a chain) in S, then
\[
J = \bigcup_{\alpha \in I} J_\alpha
\]
is again an ideal. Moreover, since each $J_\alpha$ is proper, we have $1_R \notin J_\alpha$ for
all α in I, and so $1_R \notin J$, showing that J is also proper. By Zorn's lemma,
it follows that the set S contains a maximal element, which by construction
is a maximal ideal containing $J_0$.
Proof of Theorem 11.23. Note that an algebra homomorphism χ : A → C
is non-trivial if and only if $\chi(1_A) = 1$. Indeed, if $\chi(a) \neq 0$ for some a ∈ A
then $\chi(a) = \chi(1_A a) = \chi(1_A) \chi(a)$ shows that $\chi(1_A) = 1$. By the definition
and Lemma 11.24 we have
\[
\begin{aligned}
\sigma(A) &= \bigl\{ \chi \in \overline{B_1^{A^*}} \mid \chi(ab) = \chi(a) \chi(b) \text{ for all } a, b \in A \text{ and } \chi(1_A) = 1 \bigr\} \\
&= \overline{B_1^{A^*}} \cap \bigcap_{a, b \in A} \bigl\{ \chi \in A^* \mid \chi(ab) = \chi(a) \chi(b) \bigr\} \cap \bigl\{ \chi \in A^* \mid \chi(1_A) = 1 \bigr\}.
\end{aligned}
\]
Since the sets $\{\chi \in A^* \mid \chi(1_A) = 1\}$ and $\{\chi \in A^* \mid \chi(ab) = \chi(a) \chi(b)\}$ are
closed in the weak* topology for every a, b ∈ A, we see that σ(A) is weak*
compact by the Banach–Alaoglu theorem (Theorem 8.10).
Now let a0 ∈ A be non-invertible, so that J = Aa0 is a proper ideal.
By Lemma 11.26 there is a maximal ideal M ⊆ A containing J . By
Lemma 11.25, M is closed. We claim that B = A/M is also a Banach al-
gebra. To see this, we equip B with the quotient norm (from Section 2.1.2)
which makes B into a Banach space by Lemma 2.29. Since M is an ideal,

multiplication is well-defined on A/M. Finally,
kab + MkA/M 6 k(a + m1 )(b + m2 )kA 6 ka + m1 kA kb + m2 kA
for all a, b ∈ A and all m1 , m2 ∈ M, which implies that
k(a + M)(b + M)kA/M 6 ka + MkA/M kb + MkA/M
by taking the infimum over m1 and m2 ∈ M, as required.

Thus A/M is a Banach algebra and a field (since M is maximal). We
claim this implies that
A/M = C(1A + M) ∼
= C.
Indeed (solving Exercise 11.8), if a + M ∈ A/M, then σ(a + M) 6= ∅ by

Theorem 11.6, and so a−λ1A +M is non-invertible for some λ ∈ C. However,
since A/M is a field this implies that a+M = λ1A +M and hence the claim.
Together we have shown that if a0 ∈ A is non-invertible, then there exists
a non-trivial algebra homomorphism χ : A → A/M ∼ = C with χ(a0 ) = 0.
Applying this for any a ∈ A to a − λ1A for λ ∈ σ(a), we see that for any
such λ there is some χ ∈ σ(A) with χ(a) = λ.
On the other hand, if χ(a) = λ for some a ∈ A, λ ∈ C and χ ∈ σ(A), then
χ(a − λ1A ) = 0
and hence a − λ1A cannot be invertible (since χ ≠ 0). Together we have

shown the theorem.
Example 11.27 (Stone–Čech compactification). Let A = ℓ∞ (N), which is a

Banach algebra with respect to the pointwise product. Clearly for any n0 ∈ N
the map defined by
χn0 ((an )) = an0
is an algebra homomorphism, and N ∋ n0 ↦ χn0 defines a map from N
to σ(A). The compact (but non-metrizable) topological space σ(A) is called
the Stone–Čech compactification of N and is denoted βN.
Exercise 11.28. (a) Show that the image of N is dense in β N.
(b) Show that ℓ∞ (N) can be canonically identified with C(β N).
(c) Show that β N is non-metrizable.
11.3.2 Commutative Banach Algebras without a Unit
While the notions of invertibility and spectrum are linked to the existence
of a unit, the definition of the Gelfand dual is not. However, the topological
properties of σ(A) are changed by the absence of a unit.
Corollary 11.29 (Properties of the Gelfand dual). Let A be a commutative
Banach algebra over C. Then σ(A) ⊆ $\overline{B_1^{A^*}}$ is locally compact
(and σ(A) ∪ {0} is compact) in the weak* topology on A∗ . For any a ∈ A we
also have
\[ \max_{\chi \in \sigma(A) \cup \{0\}} |\chi(a)| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \]
Recall that if X is a compact space and x0 ∈ X is any point, then the

space Y = X∖{x0 } is in general only locally compact. Moreover, in the case
when Y is not compact, the one-point compactification of Y is homeomorphic
to (and so can be identified with) X = Y ∪ {x0 }, where the point x0 takes
the role of ∞.
Exercise 11.30. Recall the definition of the one-point compactification of a locally com-
pact space Y and show the above claim.
Proof of Corollary 11.29. The proof that σ(A) ∪ {0} is compact in

the weak* topology is the same as in the proof of Theorem 11.23 since
Lemma 11.24 implies that
\[ \sigma(A) \cup \{0\} = \bigl\{\chi \in \overline{B_1^{A^*}} \mid \chi(ab) = \chi(a)\chi(b) \text{ for all } a, b \in A\bigr\}, \]
which is easily seen to be a closed subset of $\overline{B_1^{A^*}}$ in the weak* topology.
For the last claim of the corollary note that if A has a unit, then The-
orem 11.23 applies and gives the statement. So assume that A does not have
a unit, and consider the algebra A1 = A ⊕ C with the multiplication and
norm as in Exercise 11.1. As argued in the beginning of the proof of The-
orem 11.23, χ1 (1A ) = 1 for any χ1 ∈ σ (A1 ) so that χ1 is uniquely determined
by χ = χ1 |A ∈ σ(A) ∪ {0}. Moreover, any χ ∈ σ(A) ∪ {0} can be extended
to a character χ1 ∈ σ (A1 ) by setting
χ1 (a + λ1A ) = χ(a) + λ
for any a + λ1A ∈ A1 , which allows us to identify σ (A1 ) with σ(A) ∪ {0}.
Applying Theorems 11.23 and 11.6 to A1 now gives
\[ \max_{\chi \in \sigma(A) \cup \{0\}} |\chi(a)| = \max_{\chi \in \sigma(A_1)} |\chi(a)| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \]

Exercise 11.31. Show that every character χ ∈ σ(A) can be extended to a character χ1 ,
as claimed in the proof of Corollary 11.29.
11.3.3 The Gelfand Transform
Definition 11.32. Let A be a commutative Banach algebra with Gelfand

dual σ(A). Then the map (·)o : A → C(σ(A)) defined by
f o (χ) = χ(f )
for f ∈ A and χ ∈ σ(A) is called the Gelfand transform.
Just as in Theorem 11.23 we will always use the weak* topology on σ(A).
Proposition 11.33. Let A be a commutative Banach algebra. The Gelfand

transform is an algebra homomorphism from A into C0 (σ(A)) (or C(σ(A))
if A has a unit) so that (f1 f2 )^o = f1^o f2^o for all f1 , f2 ∈ A. Moreover, it
satisfies ‖f^o‖∞ ≤ ‖f‖ for all f ∈ A.
Proof. By definition of the weak* topology, f^o(χ) = χ(f ) depends continuously
on χ ∈ σ(A) for each f ∈ A. By Lemma 11.24,
\[ |f^o(\chi)| = |\chi(f)| \le \|\chi\| \|f\| \le \|f\| \]
for all χ ∈ σ(A), and so ‖f^o‖∞ ≤ ‖f‖. Moreover, 0 ∈ A∗ plays the role of
infinity in the one-point compactification of σ(A) in Corollary 11.29. This
gives f^o ∈ C0 (σ(A)), as required. Finally, f1 , f2 ∈ A and χ ∈ σ(A) implies
that (f1 f2 )^o(χ) = χ(f1 f2 ) = χ(f1 )χ(f2 ) = f1^o(χ) f2^o(χ), which shows that the
Gelfand transform is an algebra homomorphism from A into C0 (σ(A)).
11.3.4 The Gelfand Transform for Commutative C ∗ -algebras
The Gelfand transform has good additional properties for C ∗ -algebras.
Corollary 11.34. Let A be a commutative unital C ∗ -algebra. Then the Gelfand
transform is an isometric algebra isomorphism from A onto C(σ(A))
satisfying $(a^*)^o = \overline{a^o}$ for all a ∈ A.
Proof. From Proposition 11.33 we know that the Gelfand transform is an

algebra homomorphism. For a ∈ A the norm ‖a^o‖∞ of the Gelfand transform
equals the spectral radius of a (see Theorem 11.23). By Proposition 11.21 we
get ‖a^o‖∞ = ‖a‖, since in a commutative C ∗ -algebra every element a ∈ A
is normal. This shows that (·)^o : A → C(σ(A)) is an isometric algebra
homomorphism between A and a complete sub-algebra of C(σ(A)) which
contains the unit since (1A )^o = 1, and separates points since χ1 ≠ χ2 ∈ σ(A)
implies that there exists some a ∈ A with
\[ a^o(\chi_1) = \chi_1(a) \neq \chi_2(a) = a^o(\chi_2). \]
Since σ(A) is a compact space we can apply the Stone–Weierstrass theorem

(Theorem 2.40) provided that we can also show that the image is closed under
conjugation. Once this is done, we can conclude that the image of the Gelfand
transform is both dense in C(σ(A)) and complete, and therefore must be all
of C(σ(A)).
To show closure under conjugation, it suffices to prove that $\chi(a^*) = \overline{\chi(a)}$
for all a ∈ A and χ ∈ σ(A), which also implies that $(a^*)^o = \overline{a^o}$ for all a ∈ A,
as claimed in the corollary. This in turn follows if we know that a = a∗ ∈ A
implies that χ(a) ∈ R for every χ ∈ σ(A). Indeed, any a ∈ A can be written
as
\[ a = \frac{a + a^*}{2} + i\,\frac{a - a^*}{2i} = a_\Re + i a_\Im, \]
where both $a_\Re = \frac{a + a^*}{2}$ and $a_\Im = \frac{a - a^*}{2i}$ are self-adjoint. Assuming that $\chi(a_\Re)$
and $\chi(a_\Im)$ are real, we deduce that
\[ \chi(a^*) = \chi(a_\Re) - i\chi(a_\Im) = \overline{\chi(a_\Re) + i\chi(a_\Im)} = \overline{\chi(a_\Re + i a_\Im)} = \overline{\chi(a)}. \]
The following lemma then finishes the proof of the corollary.
Lemma 11.35. Let a = a∗ ∈ A be a self-adjoint element of a unital C ∗ -

algebra. Then σ(a) ⊆ R.
As we will see in the course of the proof, this can be deduced from Pro-
position 11.11. This might be a little confusing initially. How can a property
like max_{λ∈σ(a)} |λ| ≤ ‖a‖ imply that σ(a) is real? One way of viewing the
situation is to apply a vertical translation to the set σ(a), as illustrated in
Figure 11.1.
Fig. 11.1: Many possible λ ∈ C that satisfy the constraint |λ| ≤ ‖a‖ might not
satisfy the constraint |λ − iy| ≤ ‖a − iy1A‖ if the norm of a − iy1A for y ∈ R
is $\sqrt{\|a\|^2 + |y|^2}$ as Figure 11.1 suggests. Taking y → ∞ and y → −∞ shows that
the spectrum σ(a) is a subset of R.
Proof of Lemma 11.35. By Proposition 11.11 we know that the spectral
radius of a − iy1A is at most ‖a − iy1A‖.
Let λ ∈ σ(a) and y ∈ R. Then λ − iy ∈ σ(a − iy1A ), and
\[
|\lambda - iy|^2 \le \|a - iy1_A\|^2 = \|(a - iy1_A)^*(a - iy1_A)\| = \|(a + iy1_A)(a - iy1_A)\|
= \|a^2 + y^2 1_A\| \le \|a^2\| + y^2 \|1_A\| = \|a\|^2 + y^2,
\]
where we have used the C ∗ -property, the fact that a and 1A are self-adjoint,
and the fact that ‖1A‖ = 1 (see Exercise 11.20 and its hint on p. 581).
Writing λ = x0 + iy0 ∈ C with x0 , y0 ∈ R, the calculation above gives
\[ x_0^2 + y^2 - 2yy_0 + y_0^2 = x_0^2 + (y - y_0)^2 \le \|a\|^2 + y^2 \]
for all y ∈ R. Cancelling y² on both sides gives x0² + y0² − 2yy0 ≤ ‖a‖² for
all y ∈ R. If y0 ≠ 0 then the left-hand side is unbounded as y → ∞ or y → −∞,
which is impossible. Hence y0 = 0, and so σ(a) ⊆ R, as claimed.

Exercise 11.36. Let A be a commutative C ∗ -algebra.

(a) Show that the Gelfand transform is an isometry onto C0 (σ(A)).
(b) Show that σ(A) is compact if and only if A is unital.
(c) Assume now that A is not unital. Show that it is possible to define a norm on AI = A⊕C
so that AI is again a C ∗ -algebra. (The norm from Exercise 11.1 may not do this.)
11.4 Locally Compact Abelian Groups
An important special case of the Gelfand transform is given by the following

proposition, but we first need a definition. The reader may make this more
familiar by assuming that G = Rd or G = Td (cf. Exercise 1.3, which easily
generalizes to Td and Rd ).
Definition 11.37. Let G be a locally compact metrizable abelian group. The

dual group, character group, or Pontryagin dual of G is the abelian group
\[ \widehat{G} = \operatorname{Hom}(G, S^1) = \{\chi : G \to S^1 \mid \chi \text{ is a continuous homomorphism}\}, \]
where S1 is the multiplicative unit circle and the group operation is pointwise
multiplication.
Proposition 11.38 (Algebra homomorphisms on L1 (G)). Let G be a σ-

compact locally compact metrizable abelian group, which we equip with a Haar
measure m = mG . Then L1 (G) is a separable commutative Banach algebra
with respect to the convolution defined by
\[ f_1 * f_2(g) = \int_G f_1(h) f_2(g - h) \, dm(h) \]
for f1 , f2 ∈ L1 (G). The Gelfand dual σ(L1 (G)) of all non-trivial algebra
homomorphisms from L1 (G) onto C is a locally compact σ-compact metrizable
space which can be identified with the Pontryagin dual Ĝ. The Gelfand
transform can be identified with the Fourier back transform.
We now explain the two identifications in more detail. If χG ∈ Ĝ is a
continuous group homomorphism
\[ \chi_G : G \longrightarrow S^1 = \{z \in \mathbb{C} \mid |z| = 1\}, \]
then it gives rise to an algebra homomorphism χA on A = L1 (G) defined by
\[ \chi_A(f) = \int_G f(g) \chi_G(g) \, dm(g), \]
which is well-defined since f ∈ L1 (G) and χG ∈ L∞ (G). The first identification
claimed is the statement that every algebra homomorphism on L1 (G)
has this shape. This also explains the second identification as follows. The
Fourier back transform of an element f ∈ L1 (G) is the function f̌ on Ĝ
defined by
\[ \check{f}(\chi_G) = \int_G f(g) \chi_G(g) \, dm(g) \]
for χG ∈ Ĝ. Since we identify the Pontryagin dual Ĝ with the Gelfand
dual σ(L1 (G)), we see that every character χG ∈ Ĝ corresponds precisely
to one χA ∈ σ(A) and vice versa, and that
\[ \check{f}(\chi_G) = \chi_A(f) = f^o(\chi_A) \]
is the Fourier back transform and is at the same time also the Gelfand
transform.
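The passage from a character of G to a character of L1 (G) can be checked numerically in the simplest possible case. The following is only a sketch under assumptions that are not in the text: G is taken to be the finite cyclic group Z/N with counting measure as Haar measure, and the names `convolve`, `character`, and `chi_A` are hypothetical helpers chosen for this illustration.

```python
import cmath

def convolve(f1, f2):
    """Convolution on Z/n with counting measure: (f1*f2)(g) = sum_h f1(h) f2(g-h)."""
    n = len(f1)
    return [sum(f1[h] * f2[(g - h) % n] for h in range(n)) for g in range(n)]

def character(k, n):
    """Values of the character chi_k(g) = exp(2*pi*i*k*g/n) of Z/n."""
    return [cmath.exp(2j * cmath.pi * k * g / n) for g in range(n)]

def chi_A(f, chi):
    """The functional induced by a character: chi_A(f) = sum_g f(g) chi(g)."""
    return sum(fg * cg for fg, cg in zip(f, chi))

# Assumed test data: two arbitrary functions on Z/6.
N = 6
f1 = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]
f2 = [0.0, 1.0, 1.0, 4.0, -2.0, 0.25]
conv = convolve(f1, f2)

# Largest failure of multiplicativity chi_A(f1*f2) = chi_A(f1) chi_A(f2) over all characters.
max_defect = max(
    abs(chi_A(conv, character(k, N))
        - chi_A(f1, character(k, N)) * chi_A(f2, character(k, N)))
    for k in range(N)
)
print(max_defect)
```

The printed defect should be at the level of floating-point rounding, reflecting that each χA is multiplicative with respect to convolution.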
Proof of Proposition 11.38. By Proposition 3.91, L1 (G) is a separable
Banach algebra. By Exercise 3.92 (see also Lemma 3.59(1)), L1 (G) is commut-
ative. The proof that every continuous group homomorphism χG : G → S1
gives rise to an algebra homomorphism
\[ \chi_A : L^1(G) \ni f \longmapsto \int_G f \chi_G \, dm \]
is very similar to the proof for the case G = Rd in Proposition 9.31 and is
therefore left to the reader.
Corollary 11.29 shows that σ(L1 (G)) is locally compact in the weak* to-
pology. The claimed metrizability follows from Proposition 8.11 since L1 (G)
is separable. Finally, the σ-compactness follows since σ(L1 (G)) ∪ {0} is also
compact and metrizable by Corollary 11.29 and Proposition 8.11.
The main claim of the proposition is therefore that every non-trivial al-
gebra homomorphism χA : L1 (G) → C arises from some continuous group
homomorphism χG : G → S1 . So let χA ∈ σ(L1 (G)). Then by Lemma 11.24
we have ‖χA‖ ≤ 1. By Proposition 7.34 there is an element χ ∈ L∞ (G)
with ‖χ‖∞ ≤ 1 such that
\[ \chi_A(f) = \int_G f \chi \, dm \]
for all f ∈ L1 (G). We have to show that χ can be chosen in Cb (G) and with
the property that χ(gh) = χ(g)χ(h) for all g, h ∈ G (which will also imply
that χ(g) ∈ S1 for all g ∈ G).
For this proof we apply the algebra homomorphism property of the
map χA ∈ σ(L1 (G)) for f, f0 ∈ L1 (G) and obtain together with Fubini’s
theorem that
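\[
\begin{aligned}
\int_G f(h) \chi(h) \chi_A(f_0) \, dm(h) = \chi_A(f) \chi_A(f_0) &= \chi_A(f * f_0) \\
&= \int_G \int_G f(h) f_0(g - h) \, dm(h) \, \chi(g) \, dm(g) \\
&= \int_G f(h) \int_G f_0(g - h) \chi(g) \, dm(g) \, dm(h).
\end{aligned}
\]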
As this holds for any fixed f0 and for all f ∈ L1 (G), the uniqueness property
in Proposition 7.34 implies that
\[ \chi(h) \chi_A(f_0) = \int_G f_0(g - h) \chi(g) \, dm(g) = \chi_A(f_0^h) \tag{11.6} \]
for almost every h ∈ G, where we write f0^h(g) = f0 (g − h) for g, h ∈ G as
usual. We now fix some f1 ∈ L1 (G) such that χA (f1 ) ≠ 0 and define
\[ \chi_G(h) = \chi_A(f_1)^{-1} \chi_A(f_1^h) \]
for h ∈ G, so that χG = χ almost everywhere and thus χA (f ) = ∫_G f χG dm
for all f ∈ L1 (G).
Now note that χA (f0^h) depends continuously on h ∈ G for any f0 ∈ L1 (G)
by the continuity claim in Lemma 3.74. Therefore χG is continuous and we
may replace χ by its continuous representative χG , which implies in turn
that (11.6) holds for χG in fact for all h ∈ G (as both sides of the equation
are now continuous with respect to h ∈ G). Applying the definition of χG
and this version of (11.6) for f0 = f1^{g2} and h = g1 we obtain
\[ \chi_G(g_1 + g_2) = \chi_A(f_1)^{-1} \chi_A(f_1^{g_1 + g_2}) = \chi_G(g_1) \chi_A(f_1)^{-1} \chi_A(f_1^{g_2}) = \chi_G(g_1) \chi_G(g_2) \]
for g1 , g2 ∈ G. In other words, χG : G → C is a continuous homomorphism
to the multiplicative structure of C. Since χG is bounded and not identically
zero, it follows that χG is non-zero everywhere, and that χG takes values
in S1 . This shows that χA is defined by χG ∈ Ĝ. The identification between
the Fourier back transform and the Gelfand transform follows from this, as
explained before the proof.
The next example shows that the Fourier transform (or in general the
Gelfand transform) is not an isometry.
Example 11.39. Let G = R and f1 = 1[0,1] ∈ L1 (R). Then
\[
\hat{f}_1(t) =
\begin{cases}
\dfrac{e^{-2\pi i t} - 1}{-2\pi i t} = e^{-\pi i t} \, \dfrac{e^{\pi i t} - e^{-\pi i t}}{2\pi i t} = e^{-\pi i t} \, \dfrac{\sin(\pi t)}{\pi t} & \text{for } t \neq 0, \\
1 & \text{for } t = 0,
\end{cases}
\]
so ‖f̂1‖∞ = 1 = ‖f1‖1 , but the maximum value of |f̂1 (t)| is attained precisely
at the point t = 0. Now consider
\[ f_2(x) = 1_{[0,1]}(x) - 1_{[-1,0]}(x) = f_1(x) - f_1(-x) \]
with
\[ \hat{f}_2(t) = \hat{f}_1(t) - \hat{f}_1(-t) \]
for t ∈ R and f̂2 (0) = 0. Hence |f̂2 (t)| achieves its maximum for some t0 ≠ 0,
so that
\[ \|\hat{f}_2\|_\infty = |\hat{f}_2(t_0)| \le |\hat{f}_1(t_0)| + |\hat{f}_1(-t_0)| < 2\|f_1\|_1 = \|f_2\|_1, \]
showing that the Fourier transform (and hence a Gelfand transform) need
not be an isometry.
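The strict inequality in Example 11.39 can be seen numerically. The sketch below uses the closed forms |f̂1 (t)| = |sin(πt)/(πt)| and |f̂2 (t)| = 2 sin²(πt)/(π|t|), which follow from the formulas in the example; the finite grid on [−20, 20] and the helper names are assumptions made only for this illustration, so the result is a sanity check rather than a proof.

```python
import math

def abs_hat_f1(t):
    """|f1^(t)| = |sin(pi t) / (pi t)| for t != 0, and 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return abs(math.sin(math.pi * t) / (math.pi * t))

def abs_hat_f2(t):
    """|f2^(t)| = |f1^(t) - f1^(-t)| = 2 sin(pi t)^2 / (pi |t|) for t != 0, and 0 at t = 0."""
    if t == 0.0:
        return 0.0
    return 2.0 * math.sin(math.pi * t) ** 2 / (math.pi * abs(t))

# Sample both functions on a grid; outside [-20, 20] both are bounded by 2 / (20 * pi).
grid = [k / 1000.0 for k in range(-20000, 20001)]
sup_f1 = max(abs_hat_f1(t) for t in grid)
sup_f2 = max(abs_hat_f2(t) for t in grid)
norm_f1, norm_f2 = 1.0, 2.0  # L^1 norms of f1 and f2

print(sup_f1, norm_f1)
print(sup_f2, norm_f2)
```

On this grid the supremum of |f̂1| equals ‖f1‖1 while the supremum of |f̂2| stays well below ‖f2‖1 = 2, in line with the example.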
Exercise 11.40. Let G be as in Proposition 11.38. When does L1 (G) have a unit with
respect to convolution?
As the following exercise shows, the theory developed above is quite power-
ful. In fact the original proof of the Wiener lemma [114] was complicated and
the Gelfand theory allows for a clean simple proof.
Exercise 11.41 (Wiener lemma for C(Td )). Let f ∈ C(Td ) be the limit of an
absolutely convergent Fourier series with f (x) ≠ 0 for all x ∈ Td . Show that 1/f is also the limit
of an absolutely convergent Fourier series.
Exercise 11.42. (a) (Wiener theorem for L1 (Td )) Let f ∈ L1 (Td ). Show that the
span ⟨λy f | y ∈ Zd ⟩ is dense in L1 (Td ) if and only if f̂(n) = ∫ f (x)χn (x) dx ≠ 0 for
all n ∈ Zd .
(b) (Wiener theorem for L1 (Rd )) Let f ∈ L1 (Rd ). Show that ⟨λy f | y ∈ Rd ⟩ is dense
in L1 (Rd ) if and only if f̂(t) ≠ 0 for all t ∈ Rd .
11.4.1 The Pontryagin Dual
We now show that Ĝ is again a topological group.
Proposition 11.43 (Ĝ is a topological group). Let G be a σ-compact
locally compact metrizable abelian group and let Ĝ be the dual group equipped
with the weak* topology from the identification with σ(L1 (G)) in Proposition 11.38.
Then Ĝ is also a locally compact σ-compact metrizable abelian
group. The weak* topology on Ĝ is the topology of uniform convergence on
compact sets, that is equivalent to the topology defined by the neighbourhoods
\[ U_{K,\varepsilon}(\chi_0) = \{\chi \in \widehat{G} \mid \|\chi - \chi_0\|_{K,\infty} < \varepsilon\} \]
of χ0 ∈ Ĝ for compact sets K ⊆ G and ε > 0.
Proof. By Proposition 2.51, Cc (G) is dense in L1 (G) which, together with
Proposition 8.11, shows that the weak* topology on
\[ \widehat{G} = \sigma(L^1(G)) \subseteq \overline{B_1^{L^1(G)^*}} \]
can be defined by functions in Cc (G). So let N_{f1,...,fn;ε}(χ0 ) be a neighbourhood
of χ0 ∈ Ĝ defined by some functions f1 , . . . , fn ∈ Cc (G)∖{0} and ε > 0.
Let M = max_{j=1,...,n} ‖fj‖1 and K = ⋃_{j=1}^{n} Supp(fj ). If now χ ∈ U_{K,ε/M}(χ0 ),
then
\[ \Bigl| \int_G f_j (\chi - \chi_0) \, dm \Bigr| \le \int_K |f_j| |\chi - \chi_0| \, dm < \|f_j\|_1 \frac{\varepsilon}{M} \le \varepsilon \]
for j = 1, . . . , n. This shows that UK,ε/M (χ0 ) ⊆ Nf1 ,...,fn ;ε (χ0 ), which gives
one direction for the equivalence of the two topologies.
For the reverse direction, we fix some χ0 ∈ Ĝ, a compact subset K ⊆ G,
and ε > 0. We need to find some f0 , f1 , . . . , fn ∈ L1 (G) and δ > 0 so that
\[ N_{f_0, f_1, \dots, f_n; \delta}(\chi_0) \subseteq U_{K,\varepsilon}(\chi_0). \tag{11.7} \]
We start by defining $f_0 = \frac{1}{m(B_0)} 1_{B_0}$ for some compact neighbourhood B0
of 0 ∈ G such that
\[ \Bigl| \frac{1}{m(B_0)} \int_{B_0} \chi_0 \, dm - 1 \Bigr| \le \frac{1}{3}. \]
We will now use a similar argument as in the proof of Proposition 11.38. In
fact, if χ ∈ N_{f0;1/3}(χ0 ) then |f̌0 (χ) − 1| ≤ 2/3 and so |f̌0 (χ)| ≥ 1/3. Moreover,
using the relation
\[ (f_0^h)^{\vee}(\chi) = \int_G f_0(g - h) \chi(g) \, dm(g) = \chi(h) \check{f}_0(\chi), \]
we see that
\[ \chi(h) = \bigl(\check{f}_0(\chi)\bigr)^{-1} \int_G f_0^h \chi \, dm \]
for every h ∈ G. This also gives
\[ |\chi(h) - \chi(0)| \le 3 \|f_0^h - f_0\|_1 \tag{11.8} \]
for all χ ∈ N_{f0;1/3}(χ0 ) and h ∈ G. In other words, the equi-continuity properties
of all elements χ ∈ N_{f0;1/3}(χ0 ) at 0 are controlled by the continuity of the
map G ∋ h ↦ f0^h ∈ L1 (G).
We set δ = min{ε/5, 1/3}. Using (11.8) and Lemma 3.74 we find some open
neighbourhood B ⊆ G of 0 ∈ G with compact closure such that h ∈ B, g0 ∈ G,
and χ ∈ N_{f0;1/3}(χ0 ) implies
\[ |\chi(g_0 + h) - \chi(g_0)| = |\chi(h) - \chi(0)| < \delta, \tag{11.9} \]
where we used the fact that χ ∈ Ĝ is a character. Since K ⊆ G is compact,

there exists a finite collection g1 , . . . , gn ∈ K such that
\[ K \subseteq \bigcup_{j=1}^{n} (g_j + B). \tag{11.10} \]
We define $f_j = \frac{1}{m(B)} 1_{g_j + B}$ for j = 1, . . . , n and claim that N_{f0,f1,...,fn;δ}(χ0 )
is the neighbourhood we were looking for. Indeed, let
\[ \chi \in N_{f_0, f_1, \dots, f_n; \delta}(\chi_0) \subseteq N_{f_0; \frac{1}{3}}(\chi_0) \]
and fix some j ∈ {1, . . . , n}. Using (11.9) for g0 = gj and g = gj + h ∈ gj + B
we obtain |χ(g) − χ(gj )| < δ and hence also
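\[ \bigl| \check{f}_j(\chi) - \chi(g_j) \bigr| = \Bigl| \int f_j \chi \, dm - \chi(g_j) \Bigr| < \delta. \tag{11.11} \]
For any g ∈ gj + B we can now combine (11.9) and (11.11) for χ and χ0 , and
the assumption χ ∈ N_{fj;δ}(χ0 ) to obtain
\[
\begin{aligned}
|\chi(g) - \chi_0(g)| &\le \delta + |\chi(g_j) - \chi_0(g_j)| + \delta \\
&\le \bigl|\chi(g_j) - \check{f}_j(\chi)\bigr| + \bigl|\check{f}_j(\chi) - \check{f}_j(\chi_0)\bigr| + \bigl|\check{f}_j(\chi_0) - \chi_0(g_j)\bigr| + 2\delta \\
&\le \bigl|\check{f}_j(\chi) - \check{f}_j(\chi_0)\bigr| + 4\delta < 5\delta \le \varepsilon
\end{aligned}
\]
for all g ∈ gj + B. Varying j and using (11.10) this implies (11.7). Thus, the
neighbourhoods of χ0 ∈ Ĝ in the weak* topology are precisely the neighbourhoods
of χ0 with respect to the topology of uniform convergence on compact
subsets of G.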
With this identification of the topology on Ĝ, it is straightforward to check
that the group operations are continuous. Indeed,
\[ \overline{U_{K,\varepsilon}(\chi_0)} = U_{K,\varepsilon}(\overline{\chi_0}) \]
for all compact K ⊆ G, all ε > 0, and χ0 ∈ Ĝ shows that the map
\[ \widehat{G} \ni \chi \mapsto \chi^{-1} = \overline{\chi} \]
is continuous. Similarly, for χ0 , η0 ∈ Ĝ we therefore know that χ ∈ U_{K,ε/2}(χ0 )
and η ∈ U_{K,ε/2}(η0 ) imply
\[ \|\chi\eta - \chi_0\eta_0\|_{K,\infty} \le \|\chi\eta - \chi_0\eta\|_{K,\infty} + \|\chi_0\eta - \chi_0\eta_0\|_{K,\infty} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
showing continuity of the group operation Ĝ × Ĝ ∋ (χ, η) ↦ χη ∈ Ĝ.
The following exercises give further examples of the duality between a
group and its dual, both viewed as topological groups.
Exercise 11.44. (a) Suppose that G is a compact metrizable abelian group. Show that Ĝ
is discrete (and countable).
(b) Suppose that G is a countable discrete abelian group. Show that Ĝ is compact (and
metrizable).
Exercise 11.45. Let G be a σ-compact locally compact metrizable abelian group.
(a) Suppose that G is connected as a topological space. Show that Ĝ has no torsion
elements, meaning that for any χ ∈ Ĝ an identity χ^n = 1 for some n > 1 implies that χ = 1
is the trivial character.
(b) Suppose that G is compact and not connected as a topological space. Show that Ĝ has
torsion elements: there is some χ ∈ Ĝ∖{1} and some n > 1 such that χ^n = 1.
11.5 Further Topics
• The study of L1 (G) for a locally compact abelian group can lead to a vast
generalization of the theory of Fourier series and the Fourier transform
to all such groups. This is known as Pontryagin duality or harmonic
analysis on locally compact abelian groups and will be discussed further
in Section 12.8.
• Another important class of Banach algebras with additional structure are
the von Neumann algebras. These are special C ∗ -sub-algebras of B(H) for
a Hilbert space H. We refer to Blackadar [11] for an overview.
Chapter 12
Spectral Theory and Functional
Calculus
In this chapter we use results from Chapter 11 to prove the spectral theorem
and develop the functional calculus for single self-adjoint operators and for
certain commutative C ∗ -algebras (arising, for example, from unitary representations
of locally compact abelian groups). As an example of a self-adjoint
operator we discuss the Laplace operator on a regular tree.
12.1 Definitions and Basic Lemmas
In this section we will study the spectrum (as defined for abstract algebras in
Section 11.1) in the context of bounded operators on a Hilbert space. More
precisely, we fix a complex Hilbert space H, let A = B(H) be the Banach
algebra of bounded operators, and study the spectrum of some T ∈ A.
12.1.1 Decomposing the Spectrum
Since an operator with non-trivial kernel cannot be invertible, it is clear that

any eigenvalue of T ∈ B(H) belongs to the spectrum of T . It is usual to call
the set of eigenvalues the discrete or point spectrum.
Definition 12.1 (Discrete spectrum). We say that λ ∈ C belongs to the

discrete spectrum of T ∈ B(H), and write λ ∈ σdisc (T ), if ker(T − λI) ≠ {0}.
As we have already seen in Exercise 6.25 and Example 9.1, the discrete
spectrum may well be empty for a given bounded operator. For the operators
in these examples the notion of eigenvector has to be generalized to a sequence
of approximate eigenvectors in the following sense.
Definition 12.2 (Approximate point spectrum). We say that λ ∈ C be-

longs to the approximate point spectrum of T ∈ B(H), and write λ ∈ σappt (T ),

if there is a sequence of approximate eigenvectors (vn ) in H with ‖vn‖ = 1
for all n ≥ 1, and with ‖(T − λI)vn‖ → 0 as n → ∞.
For normal operators we will see that the approximate point spectrum
coincides with the whole spectrum. We now try to describe the non-discrete
part of the spectrum further.
Definition 12.3 (Approximate spectrum). We say that λ ∈ C belongs

to the approximate spectrum of T ∈ B(H), and write λ ∈ σapprox (T ), if there
is a sequence of approximate eigenvectors (vn ) with vn ∈ (ker(T − λI))⊥
and ‖vn‖ = 1 for all n ≥ 1, and with ‖(T − λI)vn‖ → 0 as n → ∞.
Definition 12.4 (Continuous spectrum). We say that λ ∈ C belongs to

the continuous spectrum of T ∈ B(H) if λ ∈ σcont (T ) = σappt (T )∖σdisc (T ).
We note that the notion of continuous spectrum is quite standard, that of

approximate spectrum less so. These two parts of the spectrum are similar,
and should both be thought of as a ‘complement’ to σdisc (T ) inside σappt (T ).
The advantage of σapprox (T ) over σcont (T ) is discussed in the exercises below.
Note also that
σappt (T ) = σdisc (T ) ⊔ σcont (T ) = σdisc (T ) ∪ σapprox (T ).
In general the approximate point spectrum may not yet describe the whole
spectrum, which motivates the next definition.
Definition 12.5 (Residual spectrum). We say that λ ∈ C belongs to the

residual spectrum of T ∈ B(H), and write λ ∈ σresid (T ), if λ ∉ σdisc (T ) and
im(T − λI) is not dense in H, that is, $\overline{\operatorname{im}(T - \lambda I)} \neq H$.
Exercise 12.6. (a) Show that σappt (T ) is a closed subset of C for any T ∈ B(H), and
that σapprox (T ) is a closed subset of C for any normal operator T ∈ B(H).
(b) Let H = L2 ([0, 1])2 and define T ∈ B(H) by T (f, g) = (MI f, f ) for all (f, g) ∈ H (so
that T (f, g)(x) = (xf (x), f (x)) for x ∈ (0, 1)). Show that σapprox (T ) is equal to (0, 1], and
in particular is not closed.
(c) Find an example of an operator T ∈ B(H) for which σdisc (T ) and σcont (T ) are not closed
subsets of C. More specifically, find an example of a self-adjoint operator for which σdisc (T )
is countable and dense in σapprox (T ) = σappt (T ) = [0, 1].
Exercise 12.7. Suppose Tj ∈ B(Hj ) for j = 1, 2 are bounded operators on two Hilbert
spaces H1 , H2 . Let T = T1 × T2 ∈ B(H1 × H2 ). Show that σdisc (T ) = σdisc (T1 )∪ σdisc (T2 ),
and similarly for σappt and σapprox . Find an example of a pair of self-adjoint operators
showing that the corresponding statement does not hold for the continuous spectrum.
Roughly speaking, for multiplication operators the discrete spectrum cor-

responds to atoms, and we would expect the continuous spectrum to corres-
pond to the continuous part of the measure, as discussed in the following
refinement of Exercise 11.4.
Exercise 12.8. (a) Let µ be a compactly supported σ-finite measure on C, and let
(MI (v))(z) = zv(z)
for v ∈ L2µ (C) be the multiplication operator corresponding to the identity map on C.
Show that
σdisc (MI ) = {λ ∈ C | µ({λ}) > 0},

σappt (MI ) = σ(MI ) = Supp(µ),
σapprox (MI ) = {λ ∈ C | µ(U∖{λ}) > 0 for every neighbourhood U of λ},
σcont (MI ) ⊇ Supp(µcont ),
and that σresid (MI ) is empty (here µcont is the measure determined by the decomposi-
tion µ = µcont + µdisc , where µcont has no atoms and µdisc is purely atomic).
(b) Let (X, B, µ) be a σ-finite measure space, and let g : X → C be a bounded measurable
function. Generalize (a) to the multiplication operator Mg on L2µ (X).
(c) Let X = [0, 1] ⊆ R, and let λcount be the counting measure on Q ∩ [0, 1] considered
as a σ-finite measure on X. Let MI be as in part (a). Describe each of the parts of the
spectrum of MI .
Example 12.9. Let T : ℓ2 (N) → ℓ2 (N) be the operator from Exercise 6.23(c)
defined by T (vn ) = (0, v1 , v2 , . . . ). Then ‖T v‖ = ‖v‖ for any v ∈ ℓ2 (N), and
so
\[ 0 \notin \sigma_{\mathrm{appt}}(T) = \sigma_{\mathrm{disc}}(T) \cup \sigma_{\mathrm{approx}}(T). \]
However, the image of T is the proper closed subspace {v ∈ ℓ2 (N) | v1 = 0},
so 0 ∈ σresid (T ).
Exercise 12.10. For the operator T from Example 12.9, show that
σdisc (T ) = ∅,
σapprox (T ) = σcont (T ) = S1 = {λ ∈ C | |λ| = 1}, and
σresid (T ) = B1C = {λ ∈ C | |λ| < 1}.
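As a numerical sanity check related to Example 12.9 and Exercise 12.10 (not a solution), the following sketch tests candidate approximate eigenvectors for the shift at a point λ on the unit circle. The choice v_n = n^{-1/2}(λ^{-1}, ..., λ^{-n}), the value of λ, and the helper names are assumptions made for this illustration; for this choice a direct calculation gives ‖(T − λI)v_n‖ = √(2/n).

```python
import cmath
import math

def shift(v):
    """Right shift T(v1, v2, ...) = (0, v1, v2, ...) on a finitely supported sequence."""
    return [0.0] + list(v)

def norm(v):
    """The l^2 norm of a finite sequence."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def defect(lam, n):
    """||(T - lam I) v_n|| for the unit vector v_n = n^{-1/2} (lam^{-1}, ..., lam^{-n})."""
    v = [lam ** (-k) / math.sqrt(n) for k in range(1, n + 1)]
    tv = shift(v)
    v_padded = v + [0.0]  # pad so that both sequences have the same length
    return norm([a - lam * b for a, b in zip(tv, v_padded)])

lam = cmath.exp(0.7j)  # an arbitrary point on the unit circle
sizes = (10, 100, 1000)
defects = [defect(lam, n) for n in sizes]
print(defects)

# The shift is an isometry, so 0 itself admits no approximate eigenvectors.
sample = [1.0, -2.0, 0.5]
print(norm(shift(sample)), norm(sample))
```

The defects shrink like √(2/n), which is consistent with the unit circle lying in σappt (T ), while the last line illustrates ‖T v‖ = ‖v‖.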
The next lemma gives the main relationship between the parts of the
spectrum from this section and the spectrum in the sense of Definition 11.2
for A = B(H).
Lemma 12.11 (Decomposition of spectrum). Let H be a complex Hilbert

space, and let T ∈ B(H). Then
σ(T ) = σappt (T ) ∪ σresid (T ) = σdisc (T ) ∪ σapprox (T ) ∪ σresid (T ).
Moreover, the residual spectrum is empty if T is normal, so in this case the

spectrum coincides with the approximate point spectrum.
Proof. If λ ∉ σ(T ) then, by definition, (T − λI)^{−1} ∈ B(H) and so vn ∈ H
with ‖vn‖ = 1 for all n ≥ 1 implies
\[ 1 = \|v_n\| = \|(T - \lambda I)^{-1} (T - \lambda I) v_n\| \le \|(T - \lambda I)^{-1}\| \, \|(T - \lambda I) v_n\|. \]
This shows that (vn ) cannot be a sequence of approximate eigenvectors, and
hence λ ∉ σappt (T ). Finally, if λ ∉ σ(T ) then T − λI is an onto map, and
so λ ∉ σresid (T ).
The reverse inclusion can be shown almost as directly. Suppose that
\[ \lambda \notin \sigma_{\mathrm{appt}}(T) \cup \sigma_{\mathrm{resid}}(T). \]
Then T − λI is injective, since in particular λ ∉ σdisc (T ), and there exists
some ε > 0 with
\[ \varepsilon \|v\| \le \|(T - \lambda I) v\| \tag{12.1} \]
for all v ∈ H since λ ∉ σapprox (T ). Therefore (T − λI) : H → im(T − λI) is
bijective and has an inverse (T − λI)^{−1} : im(T − λI) → H that is continuous
by (12.1). This implies that im(T − λI) is complete (check this), and so is
a closed subspace of H. Since λ ∉ σresid (T ), it follows that im(T − λI) = H
and so T − λI is invertible, so that λ ∉ σ(T ).
Suppose now that T : H → H is a normal operator, and that λ ∈ C has
\[ V = \overline{\operatorname{im}(T - \lambda I)} \neq H. \]
By normality, T ∗ (T − λI)v = (T − λI)(T ∗ v) for v ∈ H, which implies in
particular that V is T ∗ -invariant. By Lemma 6.30 we deduce that V ⊥ is T -
invariant. Now let v ∈ V ⊥∖{0}. Then (T − λI)v ∈ V ⊥ by T -invariance,
and (T − λI)v ∈ V by definition. This implies that (T − λI)v = 0, and
so λ ∈ σdisc (T ). It follows that σresid (T ) = ∅.
12.1.2 The Numerical Range
The following definition is useful because it gives an ‘upper bound’ for the
spectrum.
Definition 12.12. The numerical range of T ∈ B(H) is the set
N (T ) = {hT v, vi | v ∈ H, kvk = 1}.
Lemma 12.13. The spectrum of T ∈ B(H) is contained in the closure of the

numerical range of T .
Proof. We have to show that $\lambda \in \mathbb{C} \setminus \overline{N(T)}$ implies that λ ∉ σ(T ). By
assumption, |⟨(T − λI)v, v⟩| = |⟨T v, v⟩ − λ| ≥ ε for some fixed ε > 0 and
all v ∈ H with ‖v‖ = 1. This shows that λ ∉ σapprox (T ), and that any
vector v ∈ H with ‖v‖ = 1 is not orthogonal to im(T − λI), so λ ∉ σresid (T ).
By Lemma 12.11, we deduce that λ ∉ σ(T ).
Exercise 12.14. Show that N (T ) is really only an upper bound for the spectrum of an
operator T ∈ B(H) by showing that N (T ) is the convex hull of the eigenvalues of T if T is
diagonalizable, that is, if H admits an orthonormal basis consisting of eigenvectors of T .
The following is a direct consequence of Lemma 12.13 (giving an easy

alternative to the argument used in Lemma 11.35 in the setting of bounded
operators on a Hilbert space).
Lemma 12.15. If T ∈ B(H) is self-adjoint then σ(T ) ⊆ R.
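Proof. For any v ∈ H we have $\langle Tv, v\rangle = \overline{\langle v, Tv\rangle} = \overline{\langle T^*v, v\rangle} = \overline{\langle Tv, v\rangle}$ so
that ⟨T v, v⟩ ∈ R and hence $\sigma(T) \subseteq \overline{N(T)} \subseteq \mathbb{R}$ by Lemma 12.13.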
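For a finite-dimensional illustration of Lemma 12.13 and Lemma 12.15, the following sketch samples the numerical range of one self-adjoint 2×2 matrix. The matrix, the random sampling, and the helper names are assumptions made for this example only; the eigenvalues are obtained from the standard closed form for a Hermitian 2×2 matrix.

```python
import math
import random

def quadratic_form(T, v):
    """<Tv, v> for a 2x2 matrix T and v in C^2 (inner product linear in the first slot)."""
    tv = [T[0][0] * v[0] + T[0][1] * v[1], T[1][0] * v[0] + T[1][1] * v[1]]
    return tv[0] * v[0].conjugate() + tv[1] * v[1].conjugate()

def random_unit_vector(rng):
    """A random unit vector in C^2 obtained by normalizing Gaussian coordinates."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    n = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / n for x in v]

# Assumed example: T equals its conjugate transpose, so it is self-adjoint.
T = [[2.0 + 0j, 1.0 - 1.0j], [1.0 + 1.0j, -1.0 + 0j]]

# Eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]].
a, d, b = T[0][0].real, T[1][1].real, T[0][1]
mean = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
eigenvalues = (mean - radius, mean + radius)

rng = random.Random(0)
samples = [quadratic_form(T, random_unit_vector(rng)) for _ in range(2000)]
print(eigenvalues)
print(min(z.real for z in samples), max(z.real for z in samples))
```

All sampled values of ⟨T v, v⟩ are real up to rounding and lie between the two eigenvalues, so in this example the numerical range is a real interval containing the spectrum.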
12.1.3 The Essential Spectrum
In this section we describe another notion of spectrum through a series of

exercises. Recall the definition of the Calkin algebra B(H)/ K(H) from Ex-
ercise 6.8. The spectrum of an operator T ∈ B(H) when considered in the
Calkin algebra is the essential spectrum of T , denoted σess (T ), and the spec-
tral radius of T in this algebra is the essential radius.
Definition 12.16. A bounded operator T ∈ B(H) on a separable Hilbert

space H is called a Fredholm operator if T is almost injective in the sense
that dim(ker(T )) < ∞, and almost surjective in the sense that T (H) is closed
and dim(H/T (H)) < ∞.
Clearly invertible operators are Fredholm, as is the operator in Ex-

ample 12.9.
Exercise 12.17. Show that I − A is Fredholm for any compact operator A ∈ K(H) on a
separable Hilbert space H.
Exercise 12.18 (Atkinson’s theorem). Let T ∈ B(H) be a bounded operator on a

separable Hilbert space H. Prove that T is Fredholm if and only if there exists some
operator S ∈ B(H) such that ST − I and T S − I are both compact.
Exercise 12.19. Let (X, µ) be a σ-finite measure space and g ∈ L∞ µ (X). Show that the
essential spectrum σess (Mg ) of the normal operator Mg is given by
σess (Mg ) = σapprox (Mg ) ∪ {λ ∈ σdisc (Mg ) | dim(ker(Mg − λI)) = ∞}.
12.2 The Spectrum of a Tree
In this section we want to study the spectrum of the Laplace operator on

a (p + 1)-regular tree (which has strong connections to the properties of the
random walk on the tree).
Let us recall that a graph is a set of vertices V together with a set of
edges E ⊆ V × V. We will assume that the graph is undirected, meaning
that (v, w) ∈ E if and only if (w, v) ∈ E for all v, w ∈ V. We write v ∼ w if

there is an edge (v, w) ∈ E joining v to w.
More concretely, we fix an integer p ≥ 2 (the case p = 1 is quite different
and much easier, see Exercise 12.21) and suppose that (V, E) is a (p + 1)-
regular tree. This means that V is countably infinite, connected, every ver-
tex v ∈ V is connected to exactly (p + 1) further vertices by edges in E, and
there are no loops (see Figure 12.1).
Fig. 12.1: The 3-regular tree is illustrated here by showing all vertices of distance
no more than 4 from a given initial vertex v0 (also called the root). Of course the
pattern repeats indefinitely, from w and from all the other vertices at distance 4
from our chosen root v0 .
At first sight there are three natural operators that we can define on ℓ2 (V)
using the tree structure (and our discussion will also involve a fourth). In the
following we fix p ≥ 2 and a (p + 1)-regular tree (V, E).
Definition 12.20. The averaging operator on ℓ2 (V) is defined by
\[ T(f)(v) = \frac{1}{p + 1} \sum_{w \sim v} f(w), \]
for f ∈ ℓ2 (V). It replaces the value of a function at a vertex v by the aver-

age T (f )(v) of all values f (w) at the direct neighbours w ∼ v in the tree. The
summing operator is defined by S = (p + 1)T , simply summing the values at
the immediate neighbours. Finally, the Laplace operator ∆ = I − T compares
the value at each vertex with the average over all its immediate neighbours.
Clearly T , S, and ∆ are essentially equivalent. If one is understood well,
then the same applies to the other two.
Exercise 12.21. Set p = 1, so that we may think of the (p + 1)-regular graph as having
vertex set V = Z and edge set E = {(n, n ± 1) | n ∈ Z}. Show that the summing operator S
is self-adjoint, and describe its spectrum.
Exercise 12.22. Show that the summing operator S : ℓ2 (V) → ℓ2 (V) on a (p + 1)-regular
tree is a self-adjoint bounded operator with ‖S‖ ≤ p + 1. Show that there is no
eigenvalue λ ∈ σdisc (S) of absolute value |λ| = p + 1.
12.2.1 The Correct Upper Bound for the Summing Operator
While it is not difficult to see that ‖S‖ ≤ p + 1, one might also guess that
this upper bound is not the real value of ‖S‖. Indeed, the proof of the last
statement of Exercise 12.22 already hints at this. Due to the very rapid
growth in the number of vertices in balls $B_n^{V}(v_0)$ (measured with respect to
the natural path length on the tree), elements of ℓ2 (V) must decay rather
rapidly. We start by calculating ‖S‖ and go on to discuss the spectrum of S
on ℓ2 (V).
Theorem 12.23. (32) Let p ≥ 2 and let (V, E) be a (p + 1)-regular tree. The
summing operator $S : \ell^2(V) \to \ell^2(V)$ satisfies $\|S\| = 2\sqrt{p}$.
Proof of the lower bound in Theorem 12.23. We show that $\|S\| \geq 2\sqrt{p}$
by considering the function $f = f_N$ defined by
\[ f(v) = \begin{cases} p^{-\frac{1}{2} d(v,v_0)} & \text{if } d(v,v_0) \leq N; \\ 0 & \text{if } d(v,v_0) > N, \end{cases} \]
where d(·, ·) denotes the distance function on V (cf. p. 401), $v_0 \in V$ is a fixed
initial vertex of V, and N ≥ 1 is an arbitrary integer. First note that
\[ \|f\|_2^2 = 1 + \sum_{v \sim v_0} p^{-1} + \cdots + \sum_{d(v,v_0)=N} p^{-N} = 1 + (p+1)p^{-1} + \cdots + (p+1)p^{N-1}p^{-N} = 1 + N\bigl(1 + \tfrac{1}{p}\bigr). \tag{12.2} \]
Now calculate
\[ S(f)(v) = p^{-(n-1)/2} + \sum_{\substack{w \sim v \\ d(w,v_0)=n+1}} p^{-(n+1)/2} = \sqrt{p}\,p^{-n/2} + p\,p^{-(n+1)/2} = 2\sqrt{p}\,f(v) \]
whenever $1 \leq n = d(v,v_0) < N$. This gives
\[ \|S(f)\|_2^2 \geq (2\sqrt{p})^2 \sum_{n=1}^{N-1} \sum_{d(v,v_0)=n} |f(v)|^2 \geq (2\sqrt{p})^2 (N-1)\bigl(1+\tfrac{1}{p}\bigr) \]
by using the same calculation as for $\|f\|_2^2$ again. On dividing this lower bound
by (12.2) and letting N → ∞ we deduce that $\|S\| \geq 2\sqrt{p}$.
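The lower-bound computation can be checked numerically. The following Python sketch is an illustration only and not part of the original argument: it assumes that a radial function is stored as a list indexed by the distance n = d(v, v0), and it uses the fact that the sphere of radius n ≥ 1 contains (p + 1)p^(n−1) vertices. The helper names sphere_size, apply_S_radial, norm_sq and quotient are hypothetical.

```python
import math

def sphere_size(p, n):
    """Number of vertices at distance n from the root of the (p+1)-regular tree."""
    return 1 if n == 0 else (p + 1) * p ** (n - 1)

def apply_S_radial(p, g):
    """Apply the summing operator S to a radial function.

    g[n] is the value at distance n from the root. The function is assumed
    to vanish beyond its last index, and the last entry should be 0 so that
    the support of S(g) still fits into the list.
    """
    out = []
    for n in range(len(g)):
        outer = g[n + 1] if n + 1 < len(g) else 0.0
        if n == 0:
            out.append((p + 1) * outer)       # the root has p + 1 neighbours
        else:
            out.append(g[n - 1] + p * outer)  # one inner and p outer neighbours
    return out

def norm_sq(p, g):
    """Squared l^2 norm of a radial function on the tree."""
    return sum(sphere_size(p, n) * abs(x) ** 2 for n, x in enumerate(g))

def quotient(p, N):
    """The ratio ||S f_N|| / ||f_N|| for the test function f_N of the proof."""
    f = [p ** (-n / 2) for n in range(N + 1)] + [0.0]
    return math.sqrt(norm_sq(p, apply_S_radial(p, f)) / norm_sq(p, f))

if __name__ == "__main__":
    for N in (10, 50, 200):
        print(N, quotient(2, N), 2 * math.sqrt(2))
```

The quotients stay below 2√p and approach it only slowly as N grows, in line with the factor (N − 1)/N hidden in the estimate. Very large N should be avoided in this sketch, since p^(−N) eventually underflows in floating-point arithmetic.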

Exercise 12.24. Show that the sequence $\frac{1}{\|f_N\|} f_N$ in the previous proof is a sequence
of approximate eigenvectors of S in the sense of Definition 12.2.
For the proof of the upper bound we use an argument that goes back to
work of Gabber and Galil [38], which can also be used for other graphs.
Proof of the upper bound in Theorem 12.23. Let G = (V, E) be an
undirected graph with the property that every vertex v ∈ V has at most N
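neighbours for some fixed N ∈ N. The summing operator S is again defined
by
\[ S(f)(v) = \sum_{w \sim v} f(w) \]
for v ∈ V. Notice that
\[ \|S(f)\|_2^2 = \sum_{v \in V} \Bigl| \sum_{w \sim v} f(w) \Bigr|^2 \leq \sum_{v \in V} N^2 \max_{w \sim v} |f(w)|^2 \leq N^3 \sum_{w \in V} |f(w)|^2 \]
for all $f \in \ell^2(V)$, so that ‖S‖ < ∞. Given $f_1, f_2 \in \ell^2(V)$ we have
\[ \langle f_1, Sf_2 \rangle = \sum_{v \in V} f_1(v)\, \overline{\sum_{w \sim v} f_2(w)} = \sum_{v \in V} \sum_{w \sim v} f_1(v)\overline{f_2(w)} = \sum_{w \in V} \sum_{v \sim w} f_1(v)\overline{f_2(w)} = \langle Sf_1, f_2 \rangle, \]
which shows that $S = S^*$ is self-adjoint.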

We let $\vec{E}$ denote the set of edges in the directed graph $\vec{G} = (V, \vec{E})$, where
we replace each edge in G by two edges going in either direction, so formally
$\vec{E} = \{(v, w) \in V \times V \mid v \sim w\}$. Suppose that $\lambda : \vec{E} \to \mathbb{R}_{>0}$ is a function
satisfying $\lambda(w, v) = \lambda(v, w)^{-1}$ for each pair of neighbours (v, w) ∈ E, and
suppose that
\[ \rho = \sup_{v \in V} \sum_{w \sim v} \lambda(v, w) < \infty. \]
We claim that this implies ‖S‖ ≤ ρ.

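To prove the claim fix some $f \in \ell^2(V)$. Then for any two neighbours v ∼ w
in G we have
\[ |f(v)|^2 \lambda(v,w) + |f(w)|^2 \lambda(w,v) \mp \bigl( f(v)\overline{f(w)} + f(w)\overline{f(v)} \bigr) = \Bigl| f(v)\sqrt{\lambda(v,w)} \mp f(w)\sqrt{\lambda(w,v)} \Bigr|^2 \geq 0 \]
or equivalently
\[ \pm 2\Re\bigl( f(v)\overline{f(w)} \bigr) \leq |f(v)|^2 \lambda(v,w) + |f(w)|^2 \lambda(w,v). \]
Summing over all neighbouring vertices we obtain from this
\begin{align*}
\pm 2\Re \langle f, Sf \rangle &= \pm 2\Re \sum_{v \in V} f(v)\, \overline{\sum_{w : w \sim v} f(w)} = \pm 2\Re \sum_{v,w : v \sim w} f(v)\overline{f(w)} \\
&\leq \sum_{v} \sum_{w : w \sim v} |f(v)|^2 \lambda(v,w) + \sum_{w} \sum_{v : v \sim w} |f(w)|^2 \lambda(w,v) \\
&\leq 2\rho \|f\|_2^2.
\end{align*}
Since S is self-adjoint we have ⟨f, Sf⟩ = ⟨Sf, f⟩ ∈ R and therefore
\[ |\langle Sf, f \rangle| \leq \rho \|f\|_2^2 \]
for any $f \in \ell^2(V)$. Using Lemma 6.31 this implies that ‖S‖ ≤ ρ.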
It remains to define λ(v, w) in the context of the (p + 1)-regular tree so
that $\rho = 2\sqrt{p}$. We again use a root $v_0 \in V$ and define
\[ \lambda(v,w) = \begin{cases} p^{-1/2} & \text{if } d(w,v_0) = d(v,v_0) + 1; \\ p^{1/2} & \text{if } d(w,v_0) = d(v,v_0) - 1. \end{cases} \]
With this we obtain
\[ \sum_{w \sim v_0} \lambda(v_0, w) = (p+1)p^{-1/2} = p^{1/2} + p^{-1/2} < 2\sqrt{p} \]
in the case $v = v_0$ and
\[ \sum_{w \sim v} \lambda(v, w) = p^{1/2} + p\,p^{-1/2} = 2\sqrt{p} \]
in the case $d(v, v_0) \geq 1$.
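The weight argument applies to any graph of bounded degree, so it can be tested on a finite piece of the tree. The sketch below is an illustration under stated assumptions, not the book's construction: it builds the ball of a given radius around the root (where leaves simply have fewer neighbours), uses the same weights λ(v, w), and checks |⟨Sf, f⟩| ≤ ρ‖f‖² for random real-valued f. The names build_ball, weight, rho and quadratic_form are hypothetical.

```python
import math
import random

def build_ball(p, radius):
    """Adjacency lists and depths for the ball of the given radius around
    the root (vertex 0) of the (p+1)-regular tree."""
    adj, depth, frontier = [[]], [0], [0]
    for r in range(1, radius + 1):
        new_frontier = []
        for v in frontier:
            for _ in range(p + 1 if v == 0 else p):  # the root has p + 1 children
                w = len(adj)
                adj.append([v])
                depth.append(r)
                adj[v].append(w)
                new_frontier.append(w)
        frontier = new_frontier
    return adj, depth

def weight(p, depth, v, w):
    """lambda(v, w): p^(-1/2) for an edge leading away from the root,
    p^(1/2) for an edge leading towards it."""
    return p ** -0.5 if depth[w] == depth[v] + 1 else p ** 0.5

def rho(p, adj, depth):
    """Supremum over v of the sum of lambda(v, w) over neighbours w of v."""
    return max(sum(weight(p, depth, v, w) for w in adj[v]) for v in range(len(adj)))

def quadratic_form(adj, f):
    """<Sf, f> for a real-valued function f on the finite graph."""
    return sum(f[v] * f[w] for v in range(len(adj)) for w in adj[v])

if __name__ == "__main__":
    p = 2
    adj, depth = build_ball(p, 5)
    f = [random.uniform(-1, 1) for _ in adj]
    bound = rho(p, adj, depth) * sum(x * x for x in f)
    print(abs(quadratic_form(adj, f)), "<=", bound)
```

On such a ball the supremum ρ is attained at every interior non-root vertex and equals 2√p, so the finite computation mirrors the two cases treated above.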
12.2.2 The Spectrum of S
We outline in this section how to obtain the complete description of the
spectrum of S.
Proposition 12.25. The spectrum of the summing operator S on a (p + 1)-
regular tree is the interval $[-2\sqrt{p}, 2\sqrt{p}]$.
We will leave the details of the proof as an exercise, explaining just the
crucial ideas. Since S is self-adjoint, σ(S) ⊆ R by Lemma 12.15. By Theorem 12.23
we know that $\|S\| = 2\sqrt{p}$, so $\sigma(S) \subseteq [-2\sqrt{p}, 2\sqrt{p}]$. For the reverse
inclusion we generalize Exercise 12.24 and give for each θ ∈ [0, π] a sequence
of approximate eigenvectors $(f_N)$ for $\lambda = 2\sqrt{p}\cos\theta$. So we again fix a root
vertex $v_0$ and define $f_N$ by
\[ f_N(v) = \begin{cases} \bigl(e^{i\theta} p^{-1/2}\bigr)^{d(v,v_0)} & \text{if } d(v,v_0) \leq N; \\ 0 & \text{otherwise.} \end{cases} \]
Exercise 12.26. Calculate $Sf_N$, and show that $\frac{1}{\|f_N\|} f_N$ is a sequence of approximate
eigenvectors of S for $\lambda = 2\sqrt{p}\cos\theta$.
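A numerical experiment can make the approximate-eigenvector property plausible before it is proved. The following sketch is an illustration only and does not replace the exercise: it evaluates the residual ‖S f_N − λ f_N‖ / ‖f_N‖ for the radial functions f_N above, using that the sphere of radius n ≥ 1 has (p + 1)p^(n−1) vertices. The function name residual is hypothetical.

```python
import cmath
import math

def residual(p, theta, N):
    """Return ||S f_N - lam f_N|| / ||f_N|| with lam = 2 sqrt(p) cos(theta)."""
    z = cmath.exp(1j * theta) / math.sqrt(p)
    f = [z ** n for n in range(N + 1)] + [0.0]   # radial profile, zero beyond N
    lam = 2 * math.sqrt(p) * math.cos(theta)
    err_sq = 0.0
    norm_sq = 0.0
    for n in range(N + 2):
        size = 1 if n == 0 else (p + 1) * p ** (n - 1)   # vertices at distance n
        outer = f[n + 1] if n <= N else 0.0
        # S adds the p + 1 neighbours of the root, or one inner and p outer ones.
        Sf = (p + 1) * outer if n == 0 else f[n - 1] + p * outer
        err_sq += size * abs(Sf - lam * f[n]) ** 2
        norm_sq += size * abs(f[n]) ** 2
    return math.sqrt(err_sq / norm_sq)

if __name__ == "__main__":
    for N in (25, 50, 100, 200):
        print(N, residual(2, math.pi / 3, N))
```

The error is concentrated at the root and at the two outermost spheres and stays bounded, while ‖f_N‖² grows linearly in N, so the printed residuals decay roughly like N^(−1/2).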
12.2.3 No Eigenvectors on the Tree
We outline in this subsection, via a series of exercises, a proof of the fact that
the summing operator S on the (p + 1)-regular tree has no discrete spectrum.
For this proof we will use yet another normalization of the averaging and
summing operators. We refer to this as the unitarily normalized summation,
\[ U_1 = \tfrac{1}{\sqrt{p}} S. \]
In fact we will also need the operators $U_n$ for n ≥ 0 as defined in the next
exercise.
Exercise 12.27. For any n ≥ 0, let $U_n$ be the operator that maps any function f on
a (p + 1)-regular tree to the function $U_n(f)$ defined by
\[ U_n(f)(v) = \frac{1}{p^{n/2}} \sum_{\substack{k \leq n \\ k \equiv n \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_k v} f(w), \]
where $w \sim_k v$ means that w and v have distance k in the (p + 1)-regular tree. Then the
sequence of operators $(U_n)$ satisfies $U_0 = I$, $U_1 = \tfrac{1}{\sqrt{p}} S$, and
\[ U_{n+1} = U_1 \circ U_n - U_{n-1} \]
for n ≥ 1.
The recurrence relation above is classical.(33)
Definition 12.28. The Chebyshev polynomials of the second kind are the
polynomials $U_n \in \mathbb{Z}[x]$ defined recursively by $U_0(x) = 1$, $U_1(x) = 2x$,
and $U_{n+1}(x) = 2xU_n(x) - U_{n-1}(x)$ for n ≥ 1.
This sequence of polynomials has the following concrete connection to
trigonometric functions.
Exercise 12.29. Let x = cos θ for some θ ∈ (0, π). Show that
\[ U_n(x) = \frac{\sin\bigl((n+1)\theta\bigr)}{\sin\theta} \tag{12.3} \]
for all n ≥ 1.
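The recursion and the trigonometric identity (12.3) are easy to compare numerically. The sketch below is a sanity check rather than a proof; the function name chebyshev_U is hypothetical.

```python
import math

def chebyshev_U(n, x):
    """U_n(x) from U_0 = 1, U_1 = 2x and U_{n+1} = 2x U_n - U_{n-1}."""
    prev, cur = 1.0, 2.0 * x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur

if __name__ == "__main__":
    theta = 0.7
    for n in range(6):
        lhs = chebyshev_U(n, math.cos(theta))
        rhs = math.sin((n + 1) * theta) / math.sin(theta)
        print(n, lhs, rhs)
```

The two printed columns should agree up to rounding error for every θ in (0, π).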
Exercise 12.30. Suppose that $f \in \ell^2(V)$ is an eigenfunction for $U_1$ with corresponding
eigenvalue λ ∈ [−2, 2], and derive a contradiction as follows.
(a) Show that
\[ \sum_{w \sim_{2n} v} |f(w)|^2 \geq \frac{1}{2} \Bigl( U_{2n}(\cos\theta) - \frac{1}{p}\, U_{2n-2}(\cos\theta) \Bigr)^2 |f(v)|^2 \]
for any n ≥ 1.
(b) Show that it is enough to consider the case λ ≥ 0 so that we may write λ = 2 cos θ for
some $\theta \in [0, \frac{\pi}{2}]$.
(c) Assuming f(v) ≠ 0, show that
\[ \sum_{d(w,v) \leq n} |f(w)|^2 \longrightarrow \infty \]
as n → ∞, and conclude that $U_1$ (or, equivalently, S) has no eigenfunctions in $\ell^2(V)$.
12.3 Main Goals: The Spectral Theorem and Functional Calculus
The main goal of this chapter is to establish two related theorems about
normal operators, the first of which gives a complete classification of normal
operators in terms of operators as in the next example (which featured in
other forms before).
Example 12.31. Let H = L2 (X, µ) for a σ-finite measure space (X, µ), and
let g : X → C be a bounded measurable function. The multiplication operator
$M_g$ is then normal on H. We claim that the spectrum $\sigma(M_g)$ is the essential
range of g, consisting of all z ∈ C with the property that $\mu(g^{-1}(U)) > 0$
for any neighbourhood U of z. Note first that we have $M_g - \lambda I = M_{g-\lambda}$.
If $X_\lambda = \{x \in X \mid g(x) = \lambda\}$ has positive measure (which clearly implies
that λ belongs to the essential range), then λ lies in $\sigma_{\mathrm{disc}}(M_g)$ since, for example,
$1_B \in \ker(M_g - \lambda I) \setminus \{0\}$ for any measurable $B \subseteq X_\lambda$ of positive finite
measure. If on the other hand λ lies in $\sigma_{\mathrm{disc}}(M_g)$ and $v \in \ker(M_g - \lambda I) \setminus \{0\}$,
then $\mu(\{x \in X \mid v(x) \neq 0\} \setminus X_\lambda) = 0$, so that $\mu(X_\lambda) > 0$.
So suppose now g(x) ≠ λ almost everywhere. Then we can solve the equation
$(M_g - \lambda I)u = v$, formally, for any $v \in L^2(X, \mu)$, by putting
\[ u = (g - \lambda)^{-1} v, \]
and this is in fact the only solution as a set-theoretic function on X. It follows
that $\lambda \notin \sigma(M_g)$ if and only if the operator
\[ v \longmapsto (g - \lambda)^{-1} v \]
is a bounded linear map on $L^2(X, \mu)$. By Corollary 4.30, we know this is
equivalent to asking that $(g - \lambda)^{-1}$ be an $L^\infty_\mu$ function on X. This translates
to the condition that there exists some C > 0 such that
\[ \mu(\{x \in X \mid |(g(x) - \lambda)^{-1}| > C\}) = 0, \]
or equivalently that $\mu(\{x \in X \mid |g(x) - \lambda| < \tfrac{1}{C}\}) = 0$, which means that λ is
not in the essential range of g. This chain of equivalences gives the claim.
It is convenient to observe that the essential range coincides with the
support of the push-forward measure ν = g∗ µ on C so that the above can be
reformulated as
σ(Mg ) = Supp(g∗ µ). (12.4)
In particular, if X is a bounded subset of C and g(z) = I(z) = z, then the
spectrum of MI is the support of µ.
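In the simplest possible setting the formula (12.4) can be seen directly. The sketch below assumes a finite set X with point masses, so that the multiplication operator is a diagonal matrix on the points of positive mass; this finite model and the names essential_range and is_eigenvalue are assumptions made for illustration only.

```python
def essential_range(g, mu):
    """Essential range of g on a finite set X with point masses mu[x] >= 0.

    Points of zero mass are invisible in L^2(X, mu), so the values of g
    there do not contribute to the spectrum of the multiplication operator.
    """
    return {g[x] for x in mu if mu[x] > 0}

def is_eigenvalue(lam, g, mu):
    """True if the level set {g = lam} has positive measure."""
    return any(mu[x] > 0 and g[x] == lam for x in mu)

if __name__ == "__main__":
    mu = {"a": 1.0, "b": 0.0, "c": 2.5}
    g = {"a": 1 + 2j, "b": 7, "c": -3}
    print(essential_range(g, mu))     # the value 7 is taken only on a null set
    print(is_eigenvalue(7, g, mu))
```

In this finite model every point of the spectrum is an eigenvalue; the continuous part of the spectrum only appears for measures without atoms.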
Exercise 12.32. Let (X, µ) be a σ-finite measure space. Show that there exists a finite
measure ν on X with ν ≪ µ ≪ ν and a unitary isomorphism Φ : L2 (X, µ) → L2 (X, ν)
such that Φ ◦ Mg = Mg ◦ Φ whenever g : X → C is measurable and bounded and Mg acts
on the spaces L2 (X, µ) and L2 (X, ν), respectively.
Theorem 12.33 (Spectral theorem for normal operators). Let H be
a separable complex Hilbert space, and let T ∈ B(H) be a normal operator
on H. Then there exists a finite measure space (X, µ), a bounded measurable
function g : X → C, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that
\[ \begin{array}{ccc} H & \xrightarrow{\ T\ } & H \\ \downarrow{\scriptstyle \phi} & & \downarrow{\scriptstyle \phi} \\ L^2_\mu(X) & \xrightarrow{\ M_g\ } & L^2_\mu(X) \end{array} \]
commutes.
As we will see, we can always choose X = σ(T) × N, which we will identify
with the countable disjoint union $\bigsqcup_{n \in \mathbb{N}} \sigma(T)$. Moreover, the measure µ on X
will be obtained from countably many spectral measures which we will define
using the continuous functional calculus. Finally, g will be the bounded con-
tinuous map g(z, n) = z on X.
The second goal is to establish the measurable functional calculus, which
allows us to obtain normal operators f (T ) from any normal T ∈ B(H) and
any bounded measurable f ∈ L ∞ (σ(T )). Notice that the function f in this
formulation lies in the space L ∞ (σ(T )) defined in Example 2.24(8) rather
than L∞ µ (σ(T )) for some measure µ, because we do not have, at this stage,
any preferred measure (see Exercise 12.71 for more on this). For a given
normal T ∈ B(H) this assignment
L ∞ (σ(T )) ∋ f 7−→ f (T ) ∈ B(H) (12.5)
has many natural functorial properties:
(FC1) (Polynomials) If $f(z) = \sum_{j=0}^{n} a_j z^j$ for all z ∈ σ(T), then $f(T) = \sum_{j=0}^{n} a_j T^j$.
(FC2) (Continuity) The map in (12.5) is continuous, with $\|f(T)\|_{\mathrm{op}} \leq \|f\|_\infty$
for f ∈ L ∞ (σ(T )), and is an isometry on C(σ(T )), meaning that
\[ \|f(T)\|_{\mathrm{op}} = \|f\|_\infty \tag{12.6} \]
for f ∈ C(σ(T )).

(FC3) (Algebra) The map in (12.5) is an algebra homomorphism. In partic-
ular, f1 (T ) commutes with f2 (T ) for f1 , f2 ∈ L ∞ (σ(T )). Moreover,
(f (T ))∗ = f (T ∗ )
for f ∈ L ∞ (σ(T )).

(FC4) (Multiplication operators) If H is unitarily isomorphic to L2µ (X) and T
in B(H) corresponds (via φ and the commutative diagram) to Mg on L2µ (X)
as in Theorem 12.33, then f ◦ g is defined almost everywhere and f (T ) cor-
responds to Mf ◦g for any f ∈ L ∞ (σ(T )).
(FC5) (Commuting operators) If V ⊆ H is a closed subspace that is invariant
under both T and T ∗ , then V is also f (T )-invariant for all f ∈ L ∞ (σ(T )).
Moreover, if S ∈ B(H) commutes with the normal operator T ∈ B(H) and
its adjoint, then S also commutes with f (T ) for all f ∈ L ∞ (σ(T )).

(FC6) (Iteration) If f ∈ L ∞ (σ(T )) and h ∈ L ∞ (f (σ(T ))), then
h(f (T )) = (h ◦ f )(T ).
Theorem 12.34 (Measurable functional calculus for normal operators).
Let H be a complex Hilbert space, and T ∈ B(H) a normal operator.
Then there exists a functional calculus for T — that is, a uniquely determined
map as in (12.5) with the properties (FC1)–(FC6).
Example 12.35. (a) Suppose T ∈ B(H) is a normal operator on a complex
Hilbert space H, and suppose that $f(z) = \sum_{n \geq 0} a_n z^n$ is a power series with
radius of convergence R such that $\sigma(T) \subseteq B_R^{\mathbb{C}}$. Then we may restrict the
function f to σ(T) so that the power series converges uniformly and absolutely.
By combining (FC1) with (FC2), it follows that $f(T) = \sum_{n \geq 0} a_n T^n$
is also defined by the absolutely converging power series.
(b) Let (X, µ) be a finite (or σ-finite) measure space, g : X → C a bounded
measurable function, and let
Mg : L2µ (X) → L2µ (X)
be the multiplication operator as in Example 12.31. For f ∈ C[z] it follows
from (FC1) that $f(M_g) = M_{f \circ g}$. Hence it is reasonable to expect that this
holds more generally for $f \in C(\sigma(M_g))$ or even f ∈ L ∞ (σ(Mg )) as in (FC4).
Note that the composition f ◦ g is well-defined in $L^\infty_\mu(X)$, although the image
of g might not lie entirely in $\sigma(M_g)$ (and so in the domain of the continuous or
measurable function). In fact, the description of the spectrum $\sigma(M_g)$ in (12.4)
shows that
\[ \mu(\{x \in X \mid g(x) \notin \sigma(M_g)\}) = g_*\mu(\mathbb{C} \setminus \sigma(M_g)) = 0, \]
(the complement of the support being the largest open set with measure 0)
so that, for almost every x ∈ X, g(x) lies in $\sigma(M_g)$ and therefore f(g(x)) is
defined for almost every x (we can set the value of the function f(g(x)) on
the zero-measure subset where $g(x) \notin \sigma(M_g)$ to be 0). This shows that all
the expressions in property (FC4) make sense.
It is tempting to say that in view of (FC4) the existence of the map

in (12.5) is simply a consequence of Theorem 12.33. However, if we really
used Theorem 12.33 and (FC4) as the definition of the functional calculus
then we would not know whether it is canonical — that is, independent of the
isomorphism φ in Theorem 12.33. The fact that we will define the functional
calculus independently of the isomorphism φ, but nonetheless obtain (FC4)
as one of its properties, demonstrates that there is only one reasonable way
to define f (T ) for f ∈ L ∞ (σ(T )). In particular, we obtain the uniqueness
claimed in Theorem 12.34.
Exercise 12.36. Show that Theorem 12.33, (FC4), and (FC5) together imply the unique-
ness claim in Theorem 12.34 (despite the difference in the assumptions on H).
For simplicity we will start with the case of self-adjoint operators, which
only needs the material from Section 11.1 and Section 11.2. In Section 12.5
we will start the discussion of commutative C ∗ -sub-algebras of B(H), which
includes the case of finitely many commuting normal operators and builds on
Section 11.3.
Before continuing, let us note a slightly confusing point in the notation
for the functional calculus. As usual I denotes the identity map x 7→ x
and 1 denotes the constant function x 7→ 1. Thus (FC1) states in particular
that 1(T ) = I and I(T ) = T for any normal operator T ∈ B(H). The
connection to multiplication operators should help to explain why this makes
sense.
12.4 Self-Adjoint Operators
The goal of this section is to show how to define an operator f (T ) where

the operator T ∈ B(H) is a self-adjoint operator and f ∈ C(σ(T )) and to
use this functional calculus to clarify the relationship between the spectrum
of an operator and its action on vectors. This will imply Theorem 12.33 for
these operators.
12.4.1 Continuous Functional Calculus
For certain functions f the definition of f(T) for an operator T on a Hilbert
space H is clear. For example, if
\[ p(z) = \sum_{j=0}^{d} a_j z^j \]
is a polynomial with coefficients in C restricted to σ(T), then the only reasonable
definition for p(T) is
\[ p(T) = \sum_{j=0}^{d} a_j T^j \in B(H), \]
where $T^0$ is defined to be the identity I on H.

In fact, this polynomial definition makes sense for any T ∈ B(H), not
only for T normal, but there is a technical point which explains why only
normal operators are really suitable here. If σ(T ) is finite, then a polynomial
of unknown degree is not uniquely determined by its restriction to σ(T ). Thus
in this case the definition above a priori only gives a map C[T ] → B(H), not
one defined on C(σ(T )). We cannot hope to have a functional calculus only
depending on the spectrum if this dependency is real, and simple examples
show that it sometimes is. Consider, for example, the operator $A \in B(\mathbb{C}^2)$
given by the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
Clearly σ(A) = {0}, so the polynomials $p_1(z) = z$ and $p_2(z) = z^2$ coincide
when restricted to σ(A), but $p_1(A) = A \neq 0 = p_2(A)$.
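The counterexample can be verified in a few lines. The sketch below uses plain nested lists for matrices; the helper name matmul is hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1], [0, 0]]
A_squared = matmul(A, A)   # this is p2(A) for p2(z) = z^2

if __name__ == "__main__":
    print("p1(A) =", A)
    print("p2(A) =", A_squared)
```

Since A is nilpotent but non-zero, the two polynomials agree on σ(A) = {0} and still give different operators.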
However, if we assume that T is normal, this problem does not arise, because
then (as we will show) for every p ∈ C[z] we have $\|p(T)\|_{\mathrm{op}} = \|p\|_{\infty,\sigma(T)}$,
as claimed in (12.6). This suggests that we should attempt to define, for any
function f ∈ C(σ(T)), the functional calculus for T applied to f by
\[ \mathrm{FC}_T(f) = \lim_{n \to \infty} p_n(T), \tag{12.7} \]
where $(p_n)$ is a sequence of polynomials with $\|f - p_n\|_{\infty,\sigma(T)} \to 0$ as n → ∞,
which will then allow us to define f(T) to be $\mathrm{FC}_T(f)$.
This definition is indeed sensible and possible, and the basic properties of
this construction are given in the following theorem. Roughly speaking, any
operation on (or property of) the function f which is reasonable corresponds
to an analogous operation on (or property of) f (T ).
Theorem 12.37 (Continuous functional calculus). Let H be a complex
Hilbert space and T ∈ B(H) a self-adjoint bounded operator. Then there exists
a unique linear map
\[ \mathrm{FC} = \mathrm{FC}_T : C(\sigma(T)) \longrightarrow B(H), \]
which we will also denote by f ↦ f(T) = FC(f), with the following properties:
(1) For any polynomial $p(z) = \sum_{j=0}^{d} a_j z^j \in \mathbb{C}[z]$ we have
\[ \mathrm{FC}(p) = p(T) = \sum_{j=0}^{d} a_j T^j \]
(that is, FC(f) is just f(T) when f is a polynomial, extending the definition
above).
(2) For any f ∈ C(σ(T)) we have
\[ \|\mathrm{FC}(f)\|_{\mathrm{op}} = \|f\|_{\infty,\sigma(T)}. \tag{12.8} \]
(3) The map FC is a Banach algebra homomorphism, meaning that
\[ \mathrm{FC}(f_1 f_2) = \mathrm{FC}(f_1)\mathrm{FC}(f_2) \]
for $f_1, f_2 \in C(\sigma(T))$ and FC(1) = I. For any f ∈ C(σ(T)), we have
\[ \mathrm{FC}(f)^* = \mathrm{FC}(\overline{f}) \]
(that is, $f(T)^* = \overline{f}(T)$), and in particular f(T) is normal.
(4) If $\lambda \in \sigma_{\mathrm{disc}}(T)$ is in the point spectrum then
\[ \ker(T - \lambda I) \subseteq \ker(f(T) - f(\lambda)I). \]
If $T = M_g$ for some bounded measurable g : X → C and a finite measure
space (X, µ), then $f(M_g) = M_{f \circ g}$ for all $f \in C(\sigma(M_g))$.
As already observed, the essence of the proof of existence of FC is to show
that (12.7) is a valid definition. After having established its existence we will
again simply write f (T ) = FCT (f ) for all f ∈ C(σ(T )).
Lemma 12.38. Let H be a complex Hilbert space.
(1) For T ∈ B(H) and a polynomial p ∈ C[z], define p(T ) ∈ B(H) as before.
Then
σ(p(T )) = p(σ(T )). (12.9)
(2) Let T ∈ B(H) be normal and let p ∈ C[z] be a polynomial. Then
kp(T )kop = kpk∞,σ(T ) . (12.10)
Proof. For (1), observe first that the statement is trivially true if p is a
constant. If p has degree at least one, then fix λ ∈ C and factor the polynomial
p(z) − λ in C[z] to give
\[ p(z) - \lambda = \alpha \prod_{1 \leq i \leq d} (z - \lambda_i), \]
for some $\alpha \in \mathbb{C} \setminus \{0\}$ and complex numbers $\lambda_1, \dots, \lambda_d \in \mathbb{C}$ (not necessarily
distinct). Since p ↦ p(T) is an algebra homomorphism, it follows that
\[ p(T) - \lambda I = \alpha \prod_{1 \leq i \leq d} (T - \lambda_i I). \]
If $\lambda \notin p(\sigma(T))$, then the solutions $\lambda_i$ to the equation p(z) = λ are not in σ(T),
so each factor $T - \lambda_i I$ is invertible, and hence p(T) − λI is invertible. It follows
that
σ(p(T )) ⊆ p(σ(T )).
Conversely, if λ ∈ p(σ(T )), then one of the λi must lie in σ(T ). Because the
factors commute, we can assume without loss of generality that either i = 1
if T − λi I is not surjective — in which case p(T ) − λI is not surjective either,
or i = d if T − λi I is not injective — in which case neither is p(T ) − λI. In
all situations, λ ∈ σ(p(T )), proving the reverse inclusion by Proposition 4.25.
The use of Proposition 4.25 here can be avoided by using the fact that
σ(T ) = σappt (T ) ∪ σresid (T )
in Lemma 12.11 and arguing for λi ∈ σappt (T ) as in the case where T − λi I

is not injective.
For (2), we note first that FC(p) = p(T) is normal if T is. By the improved
spectral radius formula (Proposition 11.21), we have
\[ \|p(T)\|_{\mathrm{op}} = \max_{\lambda \in \sigma(p(T))} |\lambda|, \]
and by (12.9), we get
\[ \|p(T)\|_{\mathrm{op}} = \max_{\lambda \in p(\sigma(T))} |\lambda| = \max_{\lambda \in \sigma(T)} |p(\lambda)| = \|p\|_{\infty,\sigma(T)}, \]
as desired.
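Part (1) of the lemma can be observed in finite dimensions without any eigenvalue solver. The sketch below assumes an upper triangular matrix T, whose spectrum is the set of its diagonal entries, and evaluates p(T) by Horner's scheme; the names matmul, poly_of_matrix and poly are hypothetical, and the example matrix is chosen arbitrarily.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, T):
    """Evaluate p(T) = sum_j coeffs[j] T^j by Horner's scheme."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, T)
        for i in range(n):
            result[i][i] += a      # add a * I
    return result

def poly(coeffs, z):
    """Evaluate the scalar polynomial p(z) = sum_j coeffs[j] z^j."""
    value = 0
    for a in reversed(coeffs):
        value = value * z + a
    return value

if __name__ == "__main__":
    T = [[1, 5, -2], [0, 2, 7], [0, 0, -1]]   # spectrum {1, 2, -1}
    coeffs = [3, 0, 1]                        # p(z) = z^2 + 3
    pT = poly_of_matrix(coeffs, T)
    print("sigma(p(T)) =", {pT[i][i] for i in range(3)})
    print("p(sigma(T)) =", {poly(coeffs, t) for t in (1, 2, -1)})
```

The matrix in this example is not normal, which is consistent with the fact that part (1) needs no normality; the norm identity in part (2) is a separate statement and does rely on it.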
Proof of Theorem 12.37. Let T ∈ B(H) be self-adjoint so that σ(T) is a
compact subset of R. By Lemma 12.38, we deduce that the map
\[ \mathrm{FC} : (\mathbb{C}[z], \|\cdot\|_{\infty,\sigma(T)}) \longrightarrow B(H) \]
sending p to p(T) is linear and continuous (indeed, is an isometry). Hence
it extends uniquely, using the Stone–Weierstrass theorem (Theorem 2.40)
and the automatic extension to the closure (Proposition 2.59), to a map
defined on C(σ(T )), and the extension remains isometric, as claimed in (2).
By continuity, the properties FC(f1 f2 ) = FC(f1 )FC(f2 ) and FC(f )∗ = FC(f¯),

which are valid for polynomials (using T = T ∗ for the latter), pass to the limit
and are true for all f ∈ C(σ(T )). It follows that FC (C(σ(T ))) is commutative
and closed under taking adjoints, and in particular f (T ) is normal for all
functions f ∈ C(σ(T )). This proves (1), (2), and (3).
To prove (4) we choose for f ∈ C(σ(T)) a sequence $(p_n)$ in C[z] such
that $\|p_n - f\|_{\infty,\sigma(T)} \to 0$ as n → ∞. If now v ∈ ker(T − λI) then Tv = λv,
and by induction and linearity $p_n(T)v = p_n(\lambda)v$, for all n ≥ 1, and we deduce
that f(T)(v) = f(λ)v.
Finally, assume that $T = M_g$. Then by (1) we have $p(M_g) = M_{p \circ g}$ for
all p ∈ C[x] and by (2), $M_{p_n \circ g} = p_n(M_g) \to f(M_g)$ as n → ∞. However, by
Corollary 4.30 and the discussion in Example 12.35(b) we also have
\[ \|M_{p_n \circ g} - M_{f \circ g}\| = \|M_{p_n \circ g - f \circ g}\| = \|p_n \circ g - f \circ g\|_{\mathrm{esssup}} = \|p_n - f\|_{\infty,\sigma(T)} \longrightarrow 0 \]
as n → ∞. Together these imply that $f(M_g) = M_{f \circ g}$.
Exercise 12.39. Analyze the proof of Theorem 12.37 above and find out where the argu-
ment fails for a normal operator that is not self-adjoint.
The following definition will help us introduce spectral measures in the

next section.
Definition 12.40. Let T ∈ B(H) be a bounded operator on a complex Hilbert
space. We say that T is a positive operator, written T ≥ 0, if it is
self-adjoint and has ⟨Tv, v⟩ ≥ 0 for all v ∈ H.
The requirement that T is self-adjoint is redundant, as the next exercise
shows.
Exercise 12.41. Let H be a complex Hilbert space. Show that any T ∈ B(H) with the
property that ⟨Tv, v⟩ ∈ R for all v ∈ H is self-adjoint.
Corollary 12.42. Let H be a complex Hilbert space, and let T ∈ B(H) be
self-adjoint. If f ∈ C(σ(T)) is non-negative, then f(T) is a positive operator.
Proof. If f ∈ C(σ(T)) satisfies f ≥ 0, then we can write $f = (\sqrt{f})^2 = g^2$
where $g = \sqrt{f} \geq 0$ is also continuous on σ(T). Then g(T) is well-defined
by the continuous functional calculus in Theorem 12.37, is self-adjoint (by
Theorem 12.37(3) since g is real-valued), and
\[ \langle f(T)v, v \rangle = \langle g(T)^2 v, v \rangle = \langle g(T)v, g(T)v \rangle \geq 0, \]
for all v ∈ H, which shows that f(T) ≥ 0.

In the case of a self-adjoint compact operator the continuous functional
calculus discussed above is quite straightforward.
Example 12.43. Let H be a separable complex Hilbert space, and let T
in K(H) be a compact self-adjoint operator. Applying Theorem 6.27 we have
\[ Tv = \sum_{n \geq 1} \lambda_n \langle v, e_n \rangle e_n, \]
where $(\lambda_n)$ is the sequence of (real) eigenvalues of T with $(e_n)$ the sequence of
corresponding eigenvectors. If dim H < ∞ then the spectrum simply consists
of the eigenvalues, and if dim H = ∞ then
\[ \sigma(T) = \{0\} \cup \{\lambda_1, \lambda_2, \dots\} \]
and for f ∈ C(σ(T)) we have by Theorem 12.37(4)
\[ f(T)v = \sum_{\lambda \in \sigma(T)} f(\lambda) P_\lambda v, \]
where $P_\lambda \in B(H)$ is the orthogonal projection onto ker(T − λI).
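In finite dimensions the formula for f(T) can be evaluated by hand. The sketch below is an illustration under a restrictive assumption: T is a real symmetric 2×2 matrix with two distinct eigenvalues λ and µ, so that the spectral projections are given by P_λ = (T − µI)/(λ − µ). The names eigenvalues_sym2, spectral_projections and apply_function are hypothetical.

```python
import math

def eigenvalues_sym2(T):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = T
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def spectral_projections(T):
    """Projections P_lam = (T - mu I) / (lam - mu) onto the two eigenspaces.

    Assumes that the two eigenvalues lam and mu are distinct.
    """
    lam, mu = eigenvalues_sym2(T)

    def proj(l, m):
        return [[(T[i][j] - (m if i == j else 0)) / (l - m) for j in range(2)]
                for i in range(2)]

    return {lam: proj(lam, mu), mu: proj(mu, lam)}

def apply_function(f, T):
    """f(T) as the sum of f(lam) P_lam over the eigenvalues lam of T."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, P in spectral_projections(T).items():
        for i in range(2):
            for j in range(2):
                out[i][j] += f(lam) * P[i][j]
    return out

if __name__ == "__main__":
    T = [[2.0, 1.0], [1.0, 2.0]]       # eigenvalues 1 and 3
    R = apply_function(math.sqrt, T)   # a positive square root of T
    print(R)
```

Applying the function x ↦ x recovers T itself, and applying the square root produces a symmetric matrix whose square is T.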
12.4.2 Corollaries to the Continuous Functional Calculus †
The following exercise generalizes Lemma 12.38(1) to any continuous func-
tion.
Exercise 12.44 (Spectral mapping theorem). Let H be a complex Hilbert space, and
let T ∈ B(H) be a self-adjoint operator. Show that σ(f (T )) = f (σ(T )) for any f ∈ C(σ(T )).
Corollary 12.45 (Positive roots). Let T ∈ B(H) be a positive operator.
For any n ≥ 1, there exists a positive operator, denoted $T^{1/n}$, with the property
that $(T^{1/n})^n = T$.
We note that such an operator is unique, but we will only prove this a
little later (see Exercise 12.72).
Proof. Since T ≥ 0, we have σ(T) ⊆ [0, ∞), so the function $f : x \mapsto x^{1/n}$ is
defined and continuous on σ(T). Since $f(x)^n = x$ for all x ≥ 0, the functional
calculus implies that $f(T)^n = T$. Moreover f ≥ 0, and hence f(T) ≥ 0 by
Corollary 12.42.
The case n = 2 is sufficient to prove Lemma 6.38, which claimed that any
bounded operator on a separable complex Hilbert space may be written as a
sum of four unitary operators and was used in our discussion of trace-class
operators.
Proof of Lemma 6.38. Let B be a bounded operator on the complex Hilbert
space H. Then
\[ B = \tfrac{1}{2}(B + B^*) + i\,\tfrac{1}{2i}(B - B^*) \]
and both $\tfrac{1}{2}(B + B^*)$ and $\tfrac{1}{2i}(B - B^*)$ are self-adjoint. Thus it remains to show
that every self-adjoint operator can be written as a linear combination of two
unitary operators.
† The results of this subsection help to explain the functional calculus and how it can be
used further, but will not be needed later.
So let A be a self-adjoint operator on H and assume without loss of generality
that $\|A\|_{\mathrm{op}} \leq 1$. Then $I - A^2$ is positive since it is clearly self-adjoint
and
\[ \langle (I - A^2)v, v \rangle = \|v\|^2 - \|Av\|^2 \geq 0 \]
for all v ∈ H. By Corollary 12.45 we find an operator $U = A + i(I - A^2)^{1/2}$
satisfying
\[ UU^* = U^*U = A^2 + (I - A^2) = I \]
and $\tfrac{1}{2}(U + U^*) = A$, which shows that A is the linear combination of two
unitary operators.
The next corollary, which will be generalized later, starts to show how the
functional calculus can be used to provide detailed information about the
spectrum.
Corollary 12.46 (Isolated points). Let H be a complex Hilbert space and
let T ∈ B(H) be a bounded self-adjoint operator. Let λ ∈ σ(T ) be an isolated
point meaning that there is some ε > 0 for which σ(T ) ∩ (λ − ε, λ + ε) = {λ}.
Then λ ∈ σdisc (T ).
Proof. The fact that λ is isolated implies that the function
\[ f = 1_{\{\lambda\}} : \sigma(T) \to \mathbb{C} \]
which maps λ to 1 and $\sigma(T) \setminus \{\lambda\}$ to 0 is a continuous function on σ(T).
Hence we can define an operator P = f(T) ∈ B(H). We claim that P is
non-zero, and is a projection to ker(T − λI). This will show that λ is in the
discrete spectrum.
Firstly, P ≠ 0 because $\|P\|_{\mathrm{op}} = \|f\|_{\infty,\sigma(T)} = 1$ by the functional calculus.
Clearly $f = f^2$ in C(σ(T)), so
\[ P = f(T) = f(T)^2 = P^2, \]
and
\[ P = f(T) = \overline{f}(T) = P^* \]
since f is real-valued, which shows that P is an orthogonal projection.
Moreover, we have an identity of continuous functions
\[ [(I - \lambda 1)f](z) = (z - \lambda)f(z) = 0 \]
for all z ∈ σ(T), so by the functional calculus we get (T − λI)P = 0, which
shows that 0 ≠ im(P) ⊆ ker(T − λI).
Exercise 12.47. Extend Corollary 12.46 by showing that $\lambda \notin \sigma_{\mathrm{approx}}(T)$.
Exercise 12.48 (Polar decomposition). Let H1 , H2 be complex Hilbert spaces and

suppose that T : H1 → H2 is a bounded operator. Show that there exist a positive
self-adjoint operator A ∈ B(H1 ) and a bounded operator U : H1 → H2 with the property
that U |ker(T ) = 0, A|ker(T ) = 0, U |(ker(T ))⊥ : (ker(T ))⊥ → H2 is an isometry, and T = U A.
12.4.3 Spectral Measures
Using the functional calculus, we can now clarify how the spectrum represents
an operator T and its action on vectors v ∈ H.
Proposition 12.49 (Spectral measure). Let T ∈ B(H) be a self-adjoint
operator on a complex Hilbert space H. Then for any v ∈ H there exists a
uniquely determined measure $\mu_v$ on σ(T), depending on T and on v, such
that
\[ \int_{\sigma(T)} f(x)\,d\mu_v(x) = \langle f(T)v, v \rangle \]
for all f ∈ C(σ(T)). In particular, we have
\[ \mu_v(\sigma(T)) = \|v\|^2, \tag{12.11} \]
so $\mu_v$ is a finite measure. This measure is called the spectral measure associated
to v (with respect to T).
Proof. This is a direct application of the Riesz representation theorem (Theorem 7.44).
Indeed, the linear functional
\[ \ell : C(\sigma(T)) \longrightarrow \mathbb{C}, \qquad f \longmapsto \langle f(T)v, v \rangle \]
is well-defined and positive, since if f ≥ 0 we have f(T) ≥ 0 by Corollary 12.42,
and so ⟨f(T)v, v⟩ ≥ 0 by definition. Hence there exists a uniquely
determined positive locally finite measure $\mu_v$ on σ(T) such that
\[ \langle f(T)v, v \rangle = \ell(f) = \int_{\sigma(T)} f(x)\,d\mu_v(x) \]
for all f ∈ C(σ(T)). Moreover, taking f = 1, we obtain (12.11) (which also
implies that $\|\ell\| = \|v\|^2$).
Example 12.50. Let T : H → H be as in Example 12.43. Then we have
\[ \int_{\sigma(T)} f(x)\,d\mu_v(x) = f(0)\|P_0(v)\|^2 + \sum_{n \geq 1} f(\lambda_n)\|P_{\lambda_n}(v)\|^2 \]
for all continuous functions f on σ(T). Therefore, as a measure on σ(T), $\mu_v$
is a series of Dirac measures at the eigenvalues $\lambda_n$ (including 0 if 0 is an
eigenvalue) with $\mu_v(\{\lambda_n\})$ equal to $\|P_{\lambda_n}(v)\|^2$.
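For a symmetric matrix the spectral measure is a finite sum of point masses and can be computed directly. The sketch below assumes that an orthonormal eigenbasis is already known and supplied by hand; the names spectral_measure and integrate, and the example matrix with eigenvalues 3 and 1, are chosen for illustration only.

```python
import math

def spectral_measure(eigenpairs, v):
    """Point masses mu_v({lam}) = ||P_lam v||^2 for a real symmetric matrix
    given by orthonormal eigenpairs [(lam, e), ...]; the contributions of
    repeated eigenvalues are added up."""
    mu = {}
    for lam, e in eigenpairs:
        coeff = sum(vi * ei for vi, ei in zip(v, e))   # <v, e>
        mu[lam] = mu.get(lam, 0.0) + coeff ** 2
    return mu

def integrate(f, mu):
    """Integral of f against a measure given as a dict of point masses."""
    return sum(f(lam) * mass for lam, mass in mu.items())

if __name__ == "__main__":
    # T = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with these eigenvectors.
    s = 1 / math.sqrt(2)
    pairs = [(3.0, (s, s)), (1.0, (s, -s))]
    v = (3.0, 1.0)
    mu = spectral_measure(pairs, v)
    print("total mass:", sum(mu.values()))                  # equals ||v||^2
    print("integral of x:", integrate(lambda x: x, mu))     # equals <Tv, v>
```

Comparing the total mass with ‖v‖² and the integral of x with ⟨Tv, v⟩ gives a quick consistency check of (12.11) and of the defining property of µ_v.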
This example indicates how, roughly speaking, one can think of µv in gen-
eral. The spectral measure indicates how the vector v is spread out across the
spectrum; in general, any individual point λ ∈ σ(T ) carries a vanishing pro-
portion of the vector, because µv ({λ}) is often zero. However, µv (U ) > 0 for
a subset U ⊆ σ(T ) indicates that a positive proportion of the vector belongs
to the ‘generalized eigenspace’ corresponding to that part of the spectrum.
We will discuss this interpretation of the spectrum again in Section 12.7.
Essential Exercise 12.51. Let (X, µ) be a finite measure space and let T
be the multiplication operator Mg for some bounded measurable g : X → R.
Describe the spectral measure of v ∈ L2µ (X).
12.4.4 The Spectral Theorem for Self-Adjoint Operators
Using spectral measures, we can now give a complete description of a self-

adjoint operator in B(H) (essentially by adapting the arguments from Sec-
tion 9.1.2).
To see how this works, consider first some v ∈ H and the associated
spectral measure $\mu_v$, so that
\[ \langle f(T)v, v \rangle = \int_{\sigma(T)} f(x)\,d\mu_v(x) \]
for all continuous functions f defined on the spectrum of T. In particular, if
we apply this to $|f|^2 = \overline{f} f$ and use the properties of the continuous functional
calculus in Theorem 12.37, we get
\[ \|f(T)v\|^2 = \langle f(T)v, f(T)v \rangle = \langle (\overline{f} f)(T)v, v \rangle = \int_{\sigma(T)} |f(x)|^2\,d\mu_v(x) = \|f\|^2_{L^2(\sigma(T),\mu_v)}. \]
In other words, the map φ defined by
\[ \phi : \{f(T)v \mid f \in C(\sigma(T))\} \longrightarrow L^2(\sigma(T), \mu_v), \qquad f(T)v \longmapsto f \]
is an isometry. We note that the above also implies that φ is well-defined,
since $f_1(T)v = f_2(T)v$ for $f_1, f_2 \in C(\sigma(T))$ implies
\[ 0 = \|(f_1 - f_2)(T)v\| = \|f_1 - f_2\|_{L^2(\sigma(T),\mu_v)} \]
and so $f_1 = f_2$ in $L^2(\sigma(T), \mu_v)$. Using the automatic extension to the closure
(Proposition 2.59) we can extend the above map to an isometry, again denoted
by φ, from the closed subspace
\[ H_v = \overline{\{f(T)v \mid f \in C(\sigma(T))\}} \]
into $L^2(\sigma(T), \mu_v)$. Since $\mu_v$ is a finite positive measure, continuous functions
are dense in the Hilbert space $L^2_{\mu_v}(\sigma(T))$ (by Proposition 2.51), which implies
that φ is onto (since the image is complete due to the isometry property).
Next we show that the subspace Hv is invariant under T . Indeed, to see
that T (Hv ) ⊆ Hv , it is enough to show that T (f (T )v) ∈ Hv for f ∈ C(σ(T )).
For this let us again write I for the function σ(T ) ∋ x 7→ x. In this notation
the functional calculus in Theorem 12.37 gives I(T ) = T and
(If )(T ) = T f (T ).
Applying this operator to v gives
T f (T )v = (If )(T )(v)
and hence
φ ◦ T (f (T )v) = If = MI φ(f (T )v)
for all f ∈ C(σ(T )). By the density of the vectors f (T )v ∈ Hv for func-
tions f ∈ C(σ(T )) we obtain
φ ◦ T = MI ◦ φ. (12.12)
Thus the above discussion proves a special case of Theorem 12.33, namely
the case where T is self-adjoint and there exists some vector v with Hv = H.
It is important in this reasoning to keep track of the measure µv , which
depends on the vector v, and to remember that elements of L2 are actually
equivalence classes of functions. Indeed, it could well be that µv has support
which is much smaller than the spectrum, and then the values of a continu-
ous function f outside the support are irrelevant in viewing f as an element
of L2µv . In particular, the map C(σ(T )) → L2µv (σ(T )) is not necessarily in-
jective.
Definition 12.52. Let H be a Hilbert space and T ∈ B(H). The cyclic subspace
generated by a vector v ∈ H (also called the cyclic vector for $H_v$)
equals the closure
\[ H_v = \overline{\{f(T)v \mid f \in C(\sigma(T))\}} = \overline{\langle T^n v \mid n \in \mathbb{N}_0 \rangle}. \]
For a unital sub-algebra A ⊆ B(H) the cyclic subspace generated by v is
defined by $H_v = \overline{Av}$.
The equivalence of the two definitions of Hv follows from the density of

the subspace of polynomials in C(σ(T )) and Theorem 12.37. We also note
that Hv is separable.
It is not always the case that T admits a cyclic vector for all of H. However,
we have the following lemma which allows us to reduce many questions to
the cyclic case.
Lemma 12.53. Let H be a Hilbert space, and let T ∈ B(H) be a self-adjoint
operator. Then there exists a family $(H_i)_{i \in I}$ of non-zero, pairwise orthogonal,
closed subspaces of H such that $H = \bigoplus_{i \in I} H_i$ is the orthogonal direct sum of
the $H_i$, $T(H_i) \subseteq H_i$ for all i, and T restricted to $H_i$ is, for all i, a self-adjoint
bounded operator in $B(H_i)$ with a cyclic vector.
Essential Exercise 12.54. Prove Lemma 12.53.
Notice that if H is separable, the index set in the above result is either
finite or countable, since each Hi is non-zero.
We can now prove Theorem 12.33 for a single self-adjoint operator.
Theorem 12.55 (Spectral theorem for self-adjoint operators). Let H

be a separable complex Hilbert space and T ∈ B(H) a continuous self-adjoint
operator. Then there exists a finite measure space (X, µ), a unitary isomorph-
ism
φ : H → L2µ (X)
and a bounded measurable function g : X → R, such that
Mg ◦ φ = φ ◦ T.
In fact, we can set X = σ(T ) × N and g(z, n) = z for z ∈ σ(T ) and n ∈ N.
Proof. Consider a (possibly finite) family $(H_n)_{n \geq 1}$ of pairwise orthogonal non-zero closed subspaces of H, spanning H, for which $T(H_n) \subseteq H_n$ and T has a cyclic vector $v_n \neq 0$ on $H_n$ as in Lemma 12.53. By replacing $v_n$ with $n^{-1}\|v_n\|^{-1}v_n$, we can assume that $\|v_n\|^2 = n^{-2}$ (without changing $H_n$). Let $\mu_n = \mu_{v_n}$ be the spectral measure associated to $v_n$ (and T), so that
\[ \mu_n(\sigma(T)) = \|v_n\|^2 = n^{-2} \]
for all $n \geq 1$. If the list of subspaces is finite, $H_1, \dots, H_{n_0}$ say, then we set $H_n = \{0\}$ for $n > n_0$ and still work with the index set N.
By the argument at the beginning of this section, we have unitary maps
\[ \phi_n : H_n \to L^2_{\mu_n}(\sigma(T)), \]
such that $\phi_n \circ T = M_I \circ \phi_n$ and $M_I$ is the multiplication operator corresponding to the function I defined by I(z) = z for z ∈ σ(T). Now define X = σ(T) × N with the product topology, and define the locally finite positive measure µ by $\mu(A \times \{n\}) = \mu_n(A)$ for $n \geq 1$ and measurable A ⊆ σ(T). It is easily checked (see Exercise 3.30) that this indeed defines a measure on X. Moreover, in this context measurable functions on X correspond one-to-one with sequences of measurable functions $(f_n)$ on σ(T) by mapping f to $(f_n)$ with $f_n(z) = f(z, n)$ for all (z, n) ∈ σ(T) × N, and
\[ \int_X f(x)\,d\mu(x) = \sum_{n \geq 1} \int_{\sigma(T)} f_n(z)\,d\mu_n(z) \]
whenever this makes sense (for example, if $f \geq 0$, equivalently $f_n \geq 0$ for all n, or if f is integrable, which is equivalent to $f_n$ being $\mu_n$-integrable for all n and the sum of the integrals of $|f_n|$ being convergent). In particular,
\[ \mu(X) = \sum_{n \geq 1} \mu_n(\sigma(T)) = \sum_{n \geq 1} n^{-2} < \infty, \]
so that (X, µ) is a finite measure space. Moreover, the map
\[ \bigoplus_n L^2(\sigma(T), \mu_n) \longrightarrow L^2(X, \mu), \qquad (f_n) \longmapsto f \]
is a unitary isomorphism (cf. Exercise 3.37) which we will use implicitly in the following. Now recall that $H = \bigoplus_n H_n$ so that we can construct φ by defining
\[ \phi\Bigl(\sum_n w_n\Bigr) = \bigl(\phi_n(w_n)\bigr)_n \]
for all $\sum_n w_n \in \bigoplus_n H_n = H$. Since $\|\phi_n(w_n)\|_{L^2(\sigma(T),\mu_n)} = \|w_n\|_H$, this defines a unitary map with inverse given by
\[ \phi^{-1}(f) = \sum_{n \geq 1} \phi_n^{-1}(f_n) \in \bigoplus_n H_n = H \]
for $f = (f_n)_n \in L^2(X, \mu)$.

Now consider the map g : X → C sending (z, n) to z, which is bounded and measurable. Then by (12.12) we have
\[ \phi(T(w_n)) = \phi_n(T(w_n)) = M_I(\phi_n(w_n)) = M_g(\phi(w_n)) \]
for all $w_n \in H_n$. As this holds for all $n \geq 1$ we see that $\phi \circ T = M_g \circ \phi$.

This spectral theorem is extremely useful. It immediately implies a number
of results which could also be proved directly from the continuous functional
calculus, but less transparently so.
Note that the method of proof (treating first the case of cyclic operators,
and then extending the result to direct sums) may also be a shorter approach
to some of the other corollaries, since in the cyclic case one knows that the
multiplication operator $M_I$ can be taken to correspond to the identity function I : σ(T) ∋ x ↦ x on the spectrum.
Corollary 12.56 (Positivity). Let H be a separable complex Hilbert space and let T ∈ B(H) be a self-adjoint operator. Then for any f ∈ C(σ(T)) we have f(T) ≥ 0 if and only if f ≥ 0.
Proof. Because of Corollary 12.42, we only need to check that f(T) ≥ 0 implies that f ≥ 0. Now two unitarily equivalent operators are simultaneously either positive or not, so it suffices to consider an operator of the form $T = M_g$ acting on $L^2_\mu(X)$ for a finite measure space (X, µ). Without loss of generality we may assume that X = σ(T) × N and g(z, n) = z as in Theorem 12.55. Recall that $f(M_g) = M_{f \circ g}$ for any function f ∈ C(σ(T)) by Theorem 12.37(4). Now set $v = \mathbb{1}_{\{(z,n) \mid f(z) < 0\}}$ to obtain
\[ \int_{\{(z,n) \mid f(z) < 0\}} f(z)\,d\mu(z, n) = \langle f(M_g)v, v \rangle \geq 0, \]
which implies that $g_*\mu(\{z \mid f(z) < 0\}) = \mu(\{(z, n) \mid f(z) < 0\}) = 0$ and so f ≥ 0, since $\sigma(T) = \sigma(M_g) = \operatorname{Supp}(g_*\mu)$ by Example 12.31.
12.4.5 Consequences for Unitary Representations
As the following exercises show, the material above is also useful for the study
of unitary representations.
Exercise 12.57. Let G be a topological group, H1 and H2 complex Hilbert spaces, and
let π1 : G ↻ H1 and π2 : G ↻ H2 be unitary representations of G. Suppose that π1 and π2
are isomorphic in the sense that there exists a bijective bounded operator T from H1
to H2 with T π1 (g) = π2 (g)T for all g ∈ G. Show that this implies that π1 and π2
are also unitarily isomorphic, meaning that T can be chosen to be in addition a unitary
isomorphism T : H1 → H2 .
Exercise 12.58 (Schur’s lemma). (a) Let π1 : G ↻ H1 and π2 : G ↻ H2 be unitary representations of a topological group G, and let B : H1 → H2 be a bounded operator with Bπ1(g) = π2(g)B for all g ∈ G. Show that if π1 is irreducible (that is, there are no closed π1-invariant subspaces in H1 other than {0} and H1) then $B^*B = \lambda I_{H_1}$ for some λ ≥ 0 and if π2 is also irreducible then $BB^* = \lambda I_{H_2}$.
(b) Suppose now that π1 = π2 is irreducible and deduce that $B = \lambda I_{H_1}$ for some λ ∈ C.
Exercise 12.59. Let G be a topological group, and recall the set P1 (G) of normalized
positive-definite functions in Cb (G) from Exercise 9.55. Show that p ∈ P1 (G) is extreme
in P1 (G) if and only if the associated unitary representation from Exercise 9.55(a) is
irreducible.
12.5 Commuting Normal Operators
The following is a natural generalization of the spectral theorem for normal

operators (Theorem 12.33), showing that any commutative C ∗ -sub-algebra
of B(H) is unitarily equivalent to a C ∗ -sub-algebra of multiplication operat-
ors.
Theorem 12.60 (Spectral theorem for commuting normal operators). Let H be a separable complex Hilbert space, and let A ⊆ B(H) be a separable commutative unital C∗-sub-algebra of B(H). Then there exists a finite measure space (X, µ), a unitary isomorphism $\phi : H \to L^2_\mu(X)$ and for every a ∈ A a bounded function $g_a \in L^\infty_\mu(X)$ such that
\[ \begin{array}{ccc} H & \xrightarrow{\;a\;} & H \\ {\scriptstyle\phi}\big\downarrow & & \big\downarrow{\scriptstyle\phi} \\ L^2_\mu(X) & \xrightarrow{\;M_{g_a}\;} & L^2_\mu(X) \end{array} \]
commutes. In fact, we can choose X = σ(A) × N and $g_a = a^o$, where we identify the function $a^o$ with the function defined by $a^o(x, n) = a^o(x)$ for (x, n) ∈ σ(A) × N and all a ∈ A, and the map that sends a ∈ A to $g_a \in L^\infty_\mu(X)$ is a C∗-isomorphism preserving products, the adjoint operation, and norms.
Clearly Theorem 12.33 is a special case of Theorem 12.60 (cf. Exer-
cise 12.61). Using Section 12.4 the following proof will be much shorter than
the proof of the case of a single self-adjoint operator above. By Corollary 11.34
the Gelfand transform
\[ A \ni a \longmapsto a^o \in C(\sigma(A)) \]
is an isometry and an algebra isomorphism satisfying
\[ (a^*)^o = \overline{a^o} \tag{12.13} \]
for all a ∈ A. (We note that Lemma 11.35 in the proof of Corollary 11.34 can here be replaced by Lemma 12.15.) We recall that σ(A) is the generalization of the spectrum of a single operator and note that in the following the inverse map
\[ C(\sigma(A)) \ni f = a^o \longmapsto a \in A \]
should be thought of as a generalized continuous functional calculus.
Proof of Theorem 12.60. Fix v ∈ H and define a linear functional
\[ \Lambda : C(\sigma(A)) \to \mathbb{C} \]
by
\[ \Lambda(a^o) = \langle av, v \rangle \]
for every a ∈ A. We claim that Λ is a positive functional on C(σ(A)). Suppose that a ∈ A with $a^o \geq 0$. Then there exists some $b = b^* \in A$ (defined using the Gelfand transform by $b^o = \sqrt{a^o}$) with $b^2 = a$. The claimed positivity now follows, since
\[ \Lambda(a^o) = \langle av, v \rangle = \langle bv, bv \rangle \geq 0. \]
By the Riesz representation theorem (Theorem 7.44) there exists a positive finite measure $\mu_v$ on σ(A) such that
\[ \langle av, v \rangle = \int_{\sigma(A)} a^o\,d\mu_v \]
and it follows that
\[ \|av\|^2 = \langle av, av \rangle = \langle a^*av, v \rangle = \int (a^*a)^o\,d\mu_v = \int |a^o|^2\,d\mu_v \]
for all a ∈ A by (12.13). Just as in Sections 9.1.2 and 12.4.4, this induces a unitary isomorphism between the cyclic subspace $H_v = \overline{Av}$ and $L^2_{\mu_v}(\sigma(A))$ which sends av ∈ Av to $a^o \in C(\sigma(A))$. In particular, for a, b ∈ A we have
\[ \phi(abv) = (ab)^o = a^o b^o = a^o\phi(bv). \]
Fixing a ∈ A, this extends by continuity to the statement
\[ \phi(aw) = M_{a^o}\phi(w) \]
for all $w \in H_v$. As in Sections 9.1.2 and 12.4.4 this extends to a proof of Theorem 12.60 as follows. If $w_1, w_2, \dots$ is an orthonormal basis of H then we define $H_1 = H_{w_1}$, $H_2 = H_{w_2^\perp}$ where $w_2^\perp \in H_1^\perp$ is the orthogonal projection of $w_2$ to $H_1^\perp$, similarly $H_3 = H_{w_3^\perp}$ with $w_3^\perp \in (H_1 \oplus H_2)^\perp$, and so on. Replacing $w_n$ by a scalar multiple for each $n \geq 1$ we may assume that $\sum_{n \geq 1} \|w_n^\perp\|^2 < \infty$. Define
\[ X = \sigma(A) \times \mathbb{N} \cong \bigsqcup_{n \in \mathbb{N}} \sigma(A) \]
with measure
\[ \mu = \bigsqcup_{n \in \mathbb{N}} \mu_{w_n^\perp}, \]
where the disjoint union notation indicates that we consider $\mu_{w_n^\perp}$ as a measure on σ(A) × {n} and then take the sum to obtain the measure µ on X. With this
\[ L^2_\mu(X) \cong \bigoplus_{n \in \mathbb{N}} L^2(X, \mu_{w_n^\perp}) \cong \bigoplus_{n \in \mathbb{N}} H_{w_n^\perp} = H, \tag{12.14} \]
and application of a ∈ A leaves each subspace $H_{w_n^\perp}$ invariant and corresponds to multiplication by $a^o \in C(\sigma(A))$ on $L^2(\sigma(A), \mu_{w_n^\perp}) \subseteq L^2(X, \mu)$. As this holds for all n ∈ N the map in (12.14) gives the unitary isomorphism
\[ \phi : H \to L^2_\mu(X), \]
with the required properties.
Exercise 12.61. (a) Suppose that A is the unital commutative C∗-algebra generated by T, a normal operator on a complex Hilbert space H, in the sense that
\[ A = \overline{\langle T^m (T^*)^n \mid m, n \geq 0 \rangle} \]
where $T^0 = (T^*)^0 = I$. Show that ı : σ(A) ∋ φ ↦ φ(T) ∈ σ(T) defines a homeomorphism between compact metric spaces and use this to deduce Theorem 12.33 from Theorem 12.60.
(b) Now consider the algebra $A = \overline{\langle I, T_1, T_1^*, T_2, T_2^* \rangle}$ generated by two commuting normal operators $T_1, T_2 \in B(H)$ on a complex Hilbert space. Show that
\[ \imath : \sigma(A) \ni \phi \longmapsto (\phi(T_1), \phi(T_2)) \in \sigma(T_1) \times \sigma(T_2) \]
is continuous and injective. Give a concrete example to show that the image of ı may not
be all of σ(T1 ) × σ(T2 ).
Exercise 12.62. State and prove a spectral theorem for normal compact operators as a
corollary of Theorem 12.60.
12.6 Spectral Measures and the Measurable Functional Calculus
For the proof of Theorem 12.34 we now discuss some more general spectral
measures. As it makes little difference whether we consider a single (self-
adjoint or normal) operator or a commutative Banach algebra (as in the
previous section), we will do the latter. The reader only interested in the
case of a single self-adjoint operator T may replace the use of Theorem 12.60
below by Theorem 12.55, set σ(A) = σ(T ) as in Exercise 12.61(a) and replace
the operation $a^o \mapsto a \in A$ by the continuous functional calculus
\[ C(\sigma(T)) \ni f \longmapsto f(T). \]
12.6.1 Non-Diagonal Spectral Measures
Definition 12.63. Let H be a complex Hilbert space and let A ⊆ B(H) be a separable commutative unital C∗-sub-algebra. For v, w ∈ H a non-diagonal spectral measure is a finite complex-valued measure $\mu_{v,w}$ on σ(A) with
\[ \int_{\sigma(A)} a^o\,d\mu_{v,w} = \langle av, w \rangle \tag{12.15} \]
for all $a^o \in C(\sigma(A))$.
Proposition 12.64. Let H be a complex Hilbert space, and assume that A is a separable commutative unital C∗-sub-algebra of B(H). Then for every pair v, w ∈ H there exists a uniquely determined finite complex-valued spectral measure $\mu_{v,w}$ on σ(A) satisfying (12.15). Moreover, the measure $\mu_{v,w}$ depends sesqui-linearly on v, w ∈ H and satisfies $\|\mu_{v,w}\| \leq \|v\|\|w\|$.
Proof. We may assume without loss of generality that H is separable, for otherwise we may replace H by the closure of Av + Aw, which is separable by the assumption on A. Recall from Corollary 11.34 that
\[ C(\sigma(A)) = \{a^o \mid a \in A\}. \]
Recall that linear functionals on C(σ(A)) can be uniquely identified with complex-valued measures by Theorem 7.54. We now apply this to the linear functional
\[ \Lambda_{v,w} : C(\sigma(A)) \cong A \ni a^o \longmapsto \langle av, w \rangle \]
which satisfies
\[ |\langle av, w \rangle| \leq \|av\|\|w\| \leq \|a\|\|v\|\|w\| = \|a^o\|_\infty \|v\|\|w\| \]
by Cauchy–Schwarz, the definition of the operator norm, and the isometry claim in Corollary 11.34. This shows that $\Lambda_{v,w} \in C(\sigma(A))^*$ is a bounded functional and we obtain the uniquely defined complex-valued measure $\mu_{v,w}$ on σ(A) satisfying (12.15) and $\|\mu_{v,w}\| \leq \|v\|\|w\|$.
Since uniqueness and existence are now shown, the sesqui-linearity follows
easily from the sesqui-linearity of the inner product on H.
The following exercise clarifies how the more general non-diagonal spectral
measures can be constructed from the diagonal spectral measures µv = µv,v
appearing in the spectral theorem.
Exercise 12.65. Let H and A be as in Proposition 12.64. Let v, w ∈ H and decompose w into $w = w_0 + w^\perp$ with $w_0 \in H_v$ and $w^\perp \in H_v^\perp$. Use the spectral theorem to express $\mu_{v,w}$ in terms of $\mu_v$ and the vector $f_0 \in L^2_{\mu_v}(\sigma(A))$ corresponding to $w_0$.
12.6.2 The Measurable Functional Calculus
Using the spectral measures from above, we can now define for every measurable function $f \in \mathscr{L}^\infty(\sigma(A))$ a corresponding operator $f_H \in B(H)$. In the case of A being generated by I and a normal operator T this gives a definition of f(T) for a function $f \in \mathscr{L}^\infty(\sigma(T))$.
Proposition 12.66. Let H be a complex Hilbert space and let A ⊆ B(H) be a separable commutative unital C∗-sub-algebra. For any $f \in \mathscr{L}^\infty(\sigma(A))$ there exists a bounded operator $f_H$ which is uniquely characterized by the property
\[ \langle f_H v, w \rangle = \int_{\sigma(A)} f\,d\mu_{v,w} \tag{12.16} \]
for all v, w ∈ H. Moreover, the operator norm of $f_H$ satisfies $\|f_H\| \leq \|f\|_\infty$.
Proof. Since $\mu_{v,w}$ is a finite complex-valued measure, and $f \in \mathscr{L}^\infty(\sigma(A))$ is bounded, the integral $\int_{\sigma(A)} f\,d\mu_{v,w}$ exists. Moreover,
\[ \Bigl| \int_{\sigma(A)} f\,d\mu_{v,w} \Bigr| \leq \|f\|_\infty \|\mu_{v,w}\| \leq \|f\|_\infty \|v\|\|w\| \]
by Proposition 12.64. Thus for a fixed v ∈ H the map
\[ w \longmapsto \overline{\int_{\sigma(A)} f\,d\mu_{v,w}} \]
is linear and bounded with operator norm bounded by $\|f\|_\infty\|v\|$. Therefore, by Fréchet–Riesz representation (Corollary 3.19) there exists some uniquely determined $v_f$ with
\[ \|v_f\| \leq \|f\|_\infty \|v\| \tag{12.17} \]
for which
\[ \langle v_f, w \rangle = \overline{\langle w, v_f \rangle} = \int_{\sigma(A)} f\,d\mu_{v,w} \]
for all w ∈ H. By linearity of $v \mapsto \mu_{v,w}$ and the bound (12.17), we see that
\[ v \longmapsto v_f = f_H v \]
defines a bounded operator $f_H$ with (12.16), and $\|f_H\| \leq \|f\|_\infty$.

Proposition 12.66 defines the measurable functional calculus. We now dis-
cuss its main properties, which will also give the proof of the existence claim
in Theorem 12.34 (recall that uniqueness was the content of Exercise 12.36).
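As a sanity check of (12.15) and (12.16), here is a minimal finite-dimensional sketch. It assumes that A is generated by a single diagonal matrix, so that σ(A) is the finite set of diagonal entries and every function on it is trivially continuous; the point is only to make the formulas concrete. The names `nondiagonal_measure` and `inner` are illustrative, not notation from the text.

```python
import math

def nondiagonal_measure(eigenvalues, v, w):
    """Point masses of mu_{v,w} for T = diag(eigenvalues):
    mu_{v,w}({lam}) is the sum of v_i * conj(w_i) over i with eigenvalue lam."""
    mu = {}
    for lam, x, y in zip(eigenvalues, v, w):
        mu[lam] = mu.get(lam, 0) + x * complex(y).conjugate()
    return mu

def inner(v, w):
    """Inner product, linear in the first and conjugate-linear in the second argument."""
    return sum(x * complex(y).conjugate() for x, y in zip(v, w))

eigs = [1, 1, 2, 3]
v = [1 + 2j, 1, -1j, 2]
w = [1, 1j, 3, 1 - 1j]
mu_vw = nondiagonal_measure(eigs, v, w)

f = lambda lam: 1 if lam >= 2 else 0            # an indicator function on the spectrum
f_H_v = [f(lam) * x for lam, x in zip(eigs, v)] # f_H acts as diag(f(eigenvalues))

lhs = inner(f_H_v, w)                                    # <f_H v, w>
rhs = sum(f(lam) * mass for lam, mass in mu_vw.items())  # integral of f against mu_{v,w}

total_variation = sum(abs(mass) for mass in mu_vw.values())
norm = lambda u: math.sqrt(sum(abs(x) ** 2 for x in u))
```

The two sides of (12.16) agree, and the total variation of `mu_vw` is bounded by the product of the norms of v and w, in line with Proposition 12.64.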
Proposition 12.67. Let H be a complex Hilbert space, and let A ⊆ B(H) be a separable commutative unital C∗-sub-algebra. The measurable functional calculus $\mathscr{L}^\infty(\sigma(A)) \ni f \mapsto f_H \in B(H)$ has the following properties:
(FC1) If $f = a^o \in C(\sigma(A))$, then $f_H = a \in A$.
(FC2) $\|f_H\| = \|f\|_\infty$ for any f in C(σ(A)) and $\|f_H\| \leq \|f\|_\infty$ for any f in $\mathscr{L}^\infty(\sigma(A))$.
(FC3) $(f_H)^* = (\overline{f})_H$ and $(f_1 f_2)_H = (f_1)_H (f_2)_H$ for $f_1, f_2, f \in \mathscr{L}^\infty(\sigma(A))$.
In particular, properties (FC1)–(FC3) in Theorem 12.34 hold.
Proof. Recall that Corollary 11.34 gives the existence of a C∗-algebra isomorphism $C(\sigma(A)) \ni f = a^o \mapsto f_H = a \in A$ (see Theorem 12.37 in the case of a single self-adjoint operator).
Also recall that in Proposition 12.64 we derived the existence of the family of finite complex-valued measures $\{\mu_{v,w}\}$ on σ(A) with
\[ \langle f_H v, w \rangle = \langle av, w \rangle = \int_{\sigma(A)} a^o\,d\mu_{v,w} = \int_{\sigma(A)} f\,d\mu_{v,w} \tag{12.18} \]
for all $f = a^o \in C(\sigma(A))$, which in Proposition 12.66 we turned around to use (12.18) as the definition of $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. Hence this definition of the measurable functional calculus extends the definition of the continuous functional calculus (that is, of the map $C(\sigma(A)) \ni a^o \mapsto a \in A$), and hence satisfies (FC1). By Corollary 11.34 and Proposition 12.66 above we also have (FC2).
To prove (FC3) we argue in the following way. First, by Corollary 11.34 we already know (FC3) for continuous functions. We will use this to encode the properties in (FC3) into properties of the non-diagonal spectral measures $\mu_{v,w}$, which in turn will give the same properties for measurable functions.
Let us start with $f_H^* = (\overline{f})_H$, which we know by Corollary 11.34 for $f = a^o \in C(\sigma(A))$. We claim this implies that $\mu_{v,w} = \overline{\mu_{w,v}}$. To see this, let a ∈ A and notice that
\[ \int \overline{a^o}\,d\mu_{v,w} = \langle a^*v, w \rangle = \langle v, aw \rangle = \overline{\langle aw, v \rangle} = \overline{\int a^o\,d\mu_{w,v}} = \int \overline{a^o}\,d\overline{\mu_{w,v}}, \]
for any v, w ∈ H, and since this holds for all $f = a^o \in C(\sigma(A))$ the claim follows. Now we use essentially the same identity (in a slightly different order, and with a different logic) to deduce that $f_H^* = (\overline{f})_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. So let $f \in \mathscr{L}^\infty(\sigma(A))$. Then
\[ \langle f_H^* v, w \rangle = \langle v, f_H w \rangle = \overline{\langle f_H w, v \rangle} = \overline{\int f\,d\mu_{w,v}} = \int \overline{f}\,d\mu_{v,w} = \langle (\overline{f})_H v, w \rangle \]
for all v, w ∈ H, as required.

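We now show that
\[ (f_1 f_2)_H = (f_1)_H (f_2)_H \tag{12.19} \]
for $f_1, f_2 \in \mathscr{L}^\infty(\sigma(A))$. Again we know this property for $f_1, f_2 \in C(\sigma(A))$. We claim that this implies
\[ d\mu_{(f_2)_H v, w} = f_2\,d\mu_{v,w} \tag{12.20} \]
for $f_2 \in \mathscr{L}^\infty(\sigma(A))$ and v, w ∈ H.
For $f_1, f_2 \in C(\sigma(A))$ we have
\[ \int_{\sigma(A)} f_1\,d\mu_{(f_2)_H v, w} = \langle (f_1)_H (f_2)_H v, w \rangle = \langle (f_1 f_2)_H v, w \rangle = \int_{\sigma(A)} f_1 f_2\,d\mu_{v,w}. \]
As this holds for all $f_1 \in C(\sigma(A))$ we obtain (12.20) for $f_2 \in C(\sigma(A))$ and for all v, w ∈ H. Using $\mu_{v,w} = \overline{\mu_{w,v}}$ this also shows that
\[ d\mu_{v,(f_1)_H w} = d\overline{\mu_{(f_1)_H w, v}} = \overline{f_1}\,d\overline{\mu_{w,v}} = \overline{f_1}\,d\mu_{v,w} \]
for all $f_1 \in C(\sigma(A))$. For general $f_2 \in \mathscr{L}^\infty(\sigma(A))$ we now see that
\[ \int_{\sigma(A)} f_1\,d\mu_{(f_2)_H v, w} = \langle (f_1)_H (f_2)_H v, w \rangle = \langle (f_2)_H v, (\overline{f_1})_H w \rangle = \int_{\sigma(A)} f_2\,d\mu_{v,(\overline{f_1})_H w} = \int_{\sigma(A)} f_1 f_2\,d\mu_{v,w} \]
for all $f_1 \in C(\sigma(A))$ and v, w ∈ H, which implies the claim in (12.20).
We now derive (12.19) from (12.20) for $f_1, f_2 \in \mathscr{L}^\infty(\sigma(A))$. Indeed, applying (12.20) we see that
\[ \langle (f_1)_H (f_2)_H v, w \rangle = \int_{\sigma(A)} f_1\,d\mu_{(f_2)_H v, w} = \int_{\sigma(A)} f_1 f_2\,d\mu_{v,w} = \langle (f_1 f_2)_H v, w \rangle. \]
As v, w ∈ H were arbitrary, we derive (12.19) and conclude (FC3).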
Proposition 12.68. Under the same hypotheses as in Proposition 12.64, the measurable functional calculus has the following properties:
(FC4) Suppose µ is a finite measure on X = σ(A) × N so that the operators a in A correspond to multiplication operators $M_{a^o}$ via a unitary isomorphism $\phi : H \to L^2_\mu(X)$ as in Theorem 12.60. Then for any f in $\mathscr{L}^\infty(\sigma(A))$ the operator $f_H$ corresponds to $M_f$ (with f(x, n) = f(x) for (x, n) ∈ σ(A) × N).
(FC5) If V ⊆ H is a closed subspace such that aV ⊆ V for all a ∈ A, then V is also invariant under $f_H$ for all $f \in \mathscr{L}^\infty(\sigma(A))$. Moreover, if S ∈ B(H) commutes with all a ∈ A, then S commutes with $f_H$ for all $f \in \mathscr{L}^\infty(\sigma(A))$.
In the case of a normal operator T ∈ B(H) the measurable functional calculus satisfies (FC4)–(FC6) in Theorem 12.34.
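Proof. We first prove (FC5). Suppose that S ∈ B(H) commutes with all elements a ∈ A. We extend this again to $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$ using the spectral measures. For these, we have that $\mu_{Sv,w} = \mu_{v,S^*w}$ since
\[ \int_{\sigma(A)} a^o\,d\mu_{Sv,w} = \langle aSv, w \rangle = \langle Sav, w \rangle = \langle av, S^*w \rangle = \int_{\sigma(A)} a^o\,d\mu_{v,S^*w} \]
for all a ∈ A. Now let $f \in \mathscr{L}^\infty(\sigma(A))$, and notice that
\[ \langle f_H Sv, w \rangle = \int_{\sigma(A)} f\,d\mu_{Sv,w} = \int_{\sigma(A)} f\,d\mu_{v,S^*w} = \langle f_H v, S^*w \rangle = \langle S f_H v, w \rangle \]
for all v, w ∈ H, showing that $f_H S = S f_H$.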

To complete the proof of (FC5), we still have to consider an invariant subspace V ⊆ H. By Lemma 6.30 the closed subspace $V^\perp$ is $a^*$-invariant for all a ∈ A and hence, since A is closed under taking adjoints, also a-invariant for all a ∈ A. This implies that every a ∈ A commutes with the orthogonal projection $P_V : H \to H$ onto V, since for v ∈ V we have av ∈ V and
\[ aP_V(v) = av = P_V(av), \]
and since for $w \in V^\perp$ we also have $aw \in V^\perp$ and
\[ aP_V(w) = 0 = P_V(aw). \]
By what we have already proved, $P_V$ commutes with $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. If now v ∈ V then
\[ f_H v = f_H \circ P_V(v) = P_V \circ f_H(v) \in V \]
shows the remaining claim in (FC5).

For the proof of (FC4) it suffices, because of (FC5), to prove the cyclic case. That is, to prove (FC4) in the case where X = σ(A) and µ is the spectral measure on σ(A) corresponding to the generator of H. For v, w ∈ H we then have
\[ d\mu_{v,w} = \phi(v)\overline{\phi(w)}\,d\mu \]
since
\[ \langle av, w \rangle = \int a^o\,\phi(v)\overline{\phi(w)}\,d\mu \]
for all a ∈ A by the spectral theorem (Theorem 12.60). Therefore
\[ \langle f_H v, w \rangle = \int_{\sigma(A)} f\,d\mu_{v,w} = \int_{\sigma(A)} f\,\phi(v)\overline{\phi(w)}\,d\mu = \langle M_f\phi(v), \phi(w) \rangle_{L^2(\sigma(A),\mu)} \]
for any v, w ∈ H and for $f \in \mathscr{L}^\infty(\sigma(A))$. This proves (FC4) as in the proposition.
It remains to prove (FC4) and (FC6) in Theorem 12.34 (since (FC4) above
differs slightly from (FC4) in Theorem 12.34).
So suppose that T is unitarily isomorphic to the multiplication operator $M_g : L^2_\mu(X) \to L^2_\mu(X)$ for some bounded measurable g : X → C, and let $f \in \mathscr{L}^\infty(\sigma(M_g))$. Then f ◦ g is defined almost everywhere (specifically, on $g^{-1}(\sigma(M_g))$; see Examples 12.31 and 12.35(b)). For $v, w \in L^2_\mu(X)$ we see that $d\mu_{v,w}$ is the push-forward of $v\overline{w}\,d\mu$ under g (by solving Exercise 12.51) since
\[ \langle f(M_g)v, w \rangle = \int_X (f \circ g)\,v\overline{w}\,d\mu = \int_{\sigma(M_g)} f\,d\mu_{v,w} \]
first for f ∈ C[z] and then for all $f \in C(\sigma(M_g))$ by the properties of the functional calculus in Theorem 12.37. Therefore,
\[ \langle f(M_g)v, w \rangle = \int_{\sigma(M_g)} f\,d\mu_{v,w} = \int_X (f \circ g)\,v\overline{w}\,d\mu = \langle M_{f \circ g}v, w \rangle \]
for all $v, w \in L^2_\mu(X)$ and $f \in \mathscr{L}^\infty(\sigma(M_g))$, which proves (FC4).
For the proof of (FC6) we assume first that H is cyclic. By Theorem 12.55 there is a finite measure space (X, µ), some bounded measurable g : X → C, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that
\[ \phi \circ T = M_g \circ \phi. \]
By (FC4) we have $\phi \circ f(T) = f(M_g) \circ \phi = M_{f \circ g} \circ \phi$ for all $f \in \mathscr{L}^\infty(\sigma(T))$. Next note that by Example 12.31 (specifically, by (12.4)) we have
\[ \sigma(M_{f \circ g}) \subseteq \overline{f(\sigma(T))}. \]
Let $h \in \mathscr{L}^\infty(\overline{f(\sigma(T))})$ and apply (FC4) twice more to see that
\[ \phi \circ h(f(T)) = h(M_{f \circ g}) \circ \phi = M_{h \circ f \circ g} \circ \phi = (h \circ f)(M_g) \circ \phi = \phi \circ (h \circ f)(T), \]
which gives h(f(T)) = (h ◦ f)(T), as claimed in (FC6). If H is not cyclic, then we decompose H into a direct sum of cyclic subspaces and apply (FC5) and the above case.
Exercise 12.69. Suppose that H is a separable complex Hilbert space and that
A ⊆ A′ ⊆ B(H)
are two separable commutative unital C ∗ -sub-algebras.

(a) Suppose that H and the action of A on H is described as in Theorem 12.60. Generalize (12.4) from Example 12.31 to this context by showing that
\[ \sigma(A) = \operatorname{Supp}\bigl((\pi_{\sigma(A)})_*\mu\bigr), \]
where $\pi_{\sigma(A)} : X = \sigma(A) \times \mathbb{N} \to \sigma(A)$ is the projection to the first factor.
(b) Show that the restriction map $\pi : \sigma(A') \ni \phi' \mapsto \phi'|_A \in \sigma(A)$ is continuous.
(c) Let $\mu_{v,w}$ be the spectral measure for A on σ(A) and let $\mu'_{v,w}$ be the spectral measure for A′ on σ(A′) for v, w ∈ H. Show that $\pi_*\mu'_{v,w} = \mu_{v,w}$.
(d) Show that the two notions of measurable calculus are compatible in the sense that any $f \in \mathscr{L}^\infty(\sigma(A))$ defines some $f' \in \mathscr{L}^\infty(\sigma(A'))$ by f′ = f ◦ π which satisfies $f_H = f'_H$.
(e) Show that π is surjective.
Exercise 12.70. Generalize the results of Section 9.1.3 to the context of a single normal
operator T ∈ B(H) or a separable commutative unital C ∗ -sub-algebra of B(H).
Exercise 12.71. In the notation of Theorem 12.34, fix a normal operator T, suppose it has a description as a multiplication operator $M_g$ on some measure space $L^2_\mu(X)$, and let $\nu = g_*\mu$ be the push-forward measure on C. Show that f(T) is now well-defined with $f \in L^\infty_\nu(\sigma(T))$ by proving that if $f_1, f_2 \in \mathscr{L}^\infty(\sigma(T))$ agree ν-almost everywhere, then $f_1(T) = f_2(T)$.
Exercise 12.72. Let T ∈ B(H) be a positive self-adjoint bounded operator on a complex separable Hilbert space. Show that for every $n \geq 1$ there is only one positive operator S in B(H) with $S^n = T$.
12.7 Projection-Valued Measures
In this section, we describe another version of the spectral theorem, which

is essentially equivalent but sometimes more convenient. Moreover, it allows
us to examine some concepts from Section 9.1.4 in greater detail. The idea
is to generalize the following interpretation of the spectral theorem (Theorem 6.27) for a compact self-adjoint operator T ∈ K(H). If we denote by $P_\lambda$ the orthogonal projection onto ker(T − λI) for λ ∈ R as in Example 12.43, then we have
\[ v = \sum_{\lambda \in \mathbb{R}} P_\lambda(v), \qquad T(v) = \sum_{\lambda \in \mathbb{R}} \lambda P_\lambda(v), \qquad f(T)(v) = \sum_{\lambda \in \mathbb{R}} f(\lambda) P_\lambda(v) \]
for all v ∈ H and f ∈ C(σ(T)), where the series are well-defined because $P_\lambda$ is 0 for λ ∉ σ(T).
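The three identities can be verified by hand for a small symmetric matrix. The sketch below uses the 2 × 2 matrix with rows (2, 1) and (1, 2), chosen here purely for illustration; its eigenvalues are 1 and 3 with eigenvectors (1, −1) and (1, 1), and the helper functions `matmul` and `lincomb` are hypothetical names.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lincomb(terms):
    """Sum of c * M over (c, M) pairs of scalars and equal-sized matrices."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(cols)] for i in range(rows)]

T = [[F(2), F(1)], [F(1), F(2)]]
# Orthogonal projections onto ker(T - I) and ker(T - 3I).
P1 = [[F(1, 2), F(-1, 2)], [F(-1, 2), F(1, 2)]]
P3 = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]

identity = lincomb([(1, P1), (1, P3)])       # v = sum of P_lambda(v)
T_rebuilt = lincomb([(1, P1), (3, P3)])      # T = sum of lambda * P_lambda
f = lambda x: x ** 2
f_of_T = lincomb([(f(1), P1), (f(3), P3)])   # f(T) = sum of f(lambda) * P_lambda
```

With f(x) = x² the last matrix coincides with T², as the continuous functional calculus requires.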
To generalize this, it is natural to expect that one must replace the summa-
tions with appropriate integrals. Thus some form of integration for functions
taking values in B(H) is needed. Moreover, ker(T − λI) may be zero for all λ,
and so the projections must be generalized. We start by considering these
two questions abstractly.
Definition 12.73 (Projection-valued measure). Let H be a complex Hilbert space and let P(H) denote the set of orthogonal projections onto closed subspaces in B(H). A (finite) projection-valued measure Π on H is a map
\[ \mathcal{B} \longrightarrow P(H), \qquad B \longmapsto \Pi_B \]
defined on the Borel σ-algebra $\mathcal{B}$ of a compact metric space X and taking values in the set of projections, such that the following hold:
(1) $\Pi_\emptyset = 0$ and $\Pi_X = I$.
(2) If $(B_n)$ is a sequence (or finite list) of pairwise disjoint Borel subsets of X, and
\[ B = \bigsqcup_{n \geq 1} B_n, \]
then
\[ \Pi_B = \sum_{n \geq 1} \Pi_{B_n} \tag{12.21} \]
where the series converges in the strong operator topology (see Section 8.3).
We note that (12.21) simply means that $\Pi_B(v) = \sum_{n \geq 1} \Pi_{B_n}(v)$ for v ∈ H
(see Exercise 8.57). In the study of a single normal operator T on H we
will set X = σ(T ) ⊆ C, and more generally X = σ(A) in the study of a
commutative separable unital C ∗ -sub-algebra A ⊆ B(H). Also notice that
Definition 12.73 resembles in some ways the definition of a (finite) Borel
measure on X. The discussion below will reveal further parallels to Lebesgue
integration.
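A minimal finite-dimensional sketch of Definition 12.73, assuming X is the finite spectrum of a diagonal matrix and that $\Pi_B$ selects the coordinates whose eigenvalue lies in B (the function names are illustrative only):

```python
def projection(eigenvalues, B):
    """Pi_B for T = diag(eigenvalues): the diagonal 0/1 matrix that keeps
    exactly the coordinates i whose eigenvalue lies in the set B."""
    n = len(eigenvalues)
    return [[1 if (i == j and eigenvalues[i] in B) else 0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

eigs = [1, 1, 2, 3]
X = set(eigs)
zero = projection(eigs, set())        # property (1): Pi of the empty set is 0
identity = projection(eigs, X)        # property (1): Pi_X = I

# property (2) for a finite disjoint decomposition X = {1} u {2, 3}
disjoint_sum = add(projection(eigs, {1}), projection(eigs, {2, 3}))

B1, B2 = {1, 2}, {2, 3}
product = matmul(projection(eigs, B1), projection(eigs, B2))
```

In this toy case the product of `projection(eigs, B1)` and `projection(eigs, B2)` is the projection attached to the intersection of B1 and B2, and the two factors commute.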
Lemma 12.74. Let H be a complex Hilbert space and Π a projection-valued measure on H defined on the σ-algebra $\mathcal{B}$ of Borel subsets of a compact metric space X. If $X = \bigsqcup_{j=1}^n B_j$ is a disjoint decomposition of X into measurable subsets $B_1, \dots, B_n \in \mathcal{B}$, then $H = \bigoplus_{j=1}^n \operatorname{im} \Pi_{B_j}$ is an orthogonal direct sum of the closed subspaces $\operatorname{im} \Pi_{B_1}, \dots, \operatorname{im} \Pi_{B_n} \subseteq H$.
Proof. By the defining properties of projection-valued measures we have
\[ v = \sum_{j=1}^n \Pi_{B_j} v \]
with $\Pi_{B_j} v \in H_j = \operatorname{im} \Pi_{B_j}$ and it remains to show that
\[ H_j \perp H_k \]
for $1 \leq j \neq k \leq n$. So suppose $w = \Pi_{B_j} v$ so that $w = \Pi_{B_j} w$. Since
\[ B_j \cap B_k = \emptyset \]
we have $\Pi_{B_j \cup B_k} = \Pi_{B_j} + \Pi_{B_k}$ by the properties of Π. Applying this operator to w we obtain $\Pi_{B_j \cup B_k} w = w + \Pi_{B_k} w$, and taking the inner product with w gives
\[ \|\Pi_{B_j \cup B_k} w\|^2 = \langle \Pi_{B_j \cup B_k} w, w \rangle = \|w\|^2 + \langle \Pi_{B_k} w, w \rangle = \|w\|^2 + \|\Pi_{B_k} w\|^2 \geq \|w\|^2. \]
However $\|\Pi_{B_j \cup B_k} w\| \leq \|w\|$, and it follows that $\Pi_{B_k} w = 0$ or equivalently that $w \in \ker \Pi_{B_k} = H_k^\perp$. Since this holds for all $w \in H_j = \operatorname{im} \Pi_{B_j}$ and all $1 \leq j \neq k \leq n$, the lemma follows.
Exercise 12.75. Let H, X, and Π be as in Lemma 12.74. Show that
\[ \Pi_{B_1} \Pi_{B_2} = \Pi_{B_1 \cap B_2} = \Pi_{B_2} \Pi_{B_1} \]
for any $B_1, B_2 \in \mathcal{B}$.
As expected, the point of projection-valued measures is that one can in-

tegrate with respect to them, and construct operators in B(H) using this
formalism.
Proposition 12.76 (Integration and uniform convergence). Let H be a complex Hilbert space and let Π be a projection-valued measure on H defined on the Borel σ-algebra $\mathcal{B}$ of a compact metric space X. For any $f \in \mathscr{L}^\infty(X)$ there exists a bounded operator
\[ T = \int_X f(\lambda)\,d\Pi_\lambda, \]
which can be constructed as the uniform limit of the following simple approximation. For any ε > 0 and measurable partition $\xi = \{P_1, \dots, P_m\}$ of $B^{\mathbb{C}}_{\|f\|_\infty}$ with $\operatorname{diam} P_j \leq \varepsilon$ and a choice of sample points $\lambda_j \in P_j$ for j = 1, …, m we define the simple function
\[ f_\xi = \sum_{j=1}^m \lambda_j \mathbb{1}_{f^{-1}(P_j)} \tag{12.22} \]
and its integral
\[ \int f_\xi(\lambda)\,d\Pi_\lambda = \sum_{j=1}^m \lambda_j \Pi_{f^{-1}(P_j)}, \]
which satisfies $\bigl\|T - \int f_\xi(\lambda)\,d\Pi_\lambda\bigr\| \leq \varepsilon$.
Proof of Proposition 12.76. As indicated in the proposition we define the integral of a simple function
\[ f = \sum_{j=1}^m \lambda_j \mathbb{1}_{B_j} \]
when $\lambda_j \neq \lambda_k$ for $1 \leq j \neq k \leq m$ and $X = \bigsqcup_{j=1}^m B_j$ with $B_1, \dots, B_m \in \mathcal{B}$ by
\[ \int_X f(\lambda)\,d\Pi_\lambda = \sum_{j=1}^m \lambda_j \Pi_{B_j} \in B(H). \]
This definition makes sense since the additional assumptions on $\lambda_1, \dots, \lambda_m$ and $B_1, \dots, B_m$ as above ensure that the presentation of f as a sum is unique.
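Suppose now that $\xi = \{P_1, \dots, P_m\}$ and $\zeta = \{Q_1, \dots, Q_n\}$ are finite partitions of $B^{\mathbb{C}}_{\|f\|_\infty}$, $f \in \mathscr{L}^\infty(X)$ and ε > 0 as in the proposition. Choose also two collections of sample points $(\lambda_j)_{j=1,\dots,m}$ and $(\lambda'_k)_{k=1,\dots,n}$ with $\lambda_j \in P_j$ and $\lambda'_k \in Q_k$ for all $1 \leq j \leq m$, $1 \leq k \leq n$, so that we may define $f_\xi$ by (12.22) and $f_\zeta$ similarly. Let $\eta = \{P_j \cap Q_k \mid 1 \leq j \leq m, 1 \leq k \leq n\}$ be the common refinement of ξ and ζ. Applying the defining properties of Π we see that
\[ \Pi_{f^{-1}(P_j)} = \sum_{k=1}^n \Pi_{f^{-1}(P_j \cap Q_k)} \]
for every $1 \leq j \leq m$ and similarly for $f^{-1}(Q_k)$. Write
\begin{align*}
A_{\xi,\zeta} &= \int f_\xi(\lambda)\,d\Pi_\lambda - \int f_\zeta(\lambda)\,d\Pi_\lambda \\
&= \sum_{j=1}^m \lambda_j \Pi_{f^{-1}(P_j)} - \sum_{k=1}^n \lambda'_k \Pi_{f^{-1}(Q_k)} \\
&= \sum_{(j,k) : P_j \cap Q_k \neq \emptyset} (\lambda_j - \lambda'_k)\,\Pi_{f^{-1}(P_j \cap Q_k)}.
\end{align*}
If now v ∈ H with $\|v\| \leq 1$, then
\[ \|A_{\xi,\zeta} v\|^2 = \sum_{(j,k) : P_j \cap Q_k \neq \emptyset} |\lambda_j - \lambda'_k|^2\,\|\Pi_{f^{-1}(P_j \cap Q_k)} v\|^2 \tag{12.23} \]
by Lemma 12.74. Using the assumption that $\operatorname{diam} P_j, \operatorname{diam} Q_k \leq \varepsilon$, we see that
\[ |\lambda_j - \lambda'_k| \leq 2\varepsilon \tag{12.24} \]
whenever $P_j \cap Q_k \neq \emptyset$. Putting this into (12.23) gives $\|A_{\xi,\zeta} v\| \leq 2\varepsilon\|v\| \leq 2\varepsilon$, again by Lemma 12.74.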
If we now choose a sequence of partitions $(\xi_N)$ with $\max_{P \in \xi_N} \operatorname{diam} P \to 0$
as N → ∞, then the argument above shows that $\int f_{\xi_N}(\lambda)\,d\Pi_\lambda$ forms a
Cauchy sequence with respect to the operator norm (for any choice of the
sample points). Just as in the last part of the proof of Proposition 3.81, this
also implies that the limit is independent of the choice of sample points and
the choice of the sequence of partitions.
In order to improve the estimate from 2ε (as above) to ε in the last part
of the proposition, we fix a partition ξ and construct the partitions ξN as
above so that they are finer than ξ (that is, every partition element of ξN is
contained in one of the partition elements of ξ). This allows us to make the
cosmetic improvement of the 2ε in (12.24) to ε, giving the estimate
\[ \Bigl\| \int f_\xi(\lambda)\,d\Pi_\lambda - \int f_{\xi_N}(\lambda)\,d\Pi_\lambda \Bigr\| \leq \varepsilon, \]

and letting N → ∞ gives the proposition.
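The ε-estimate can be observed numerically in a toy situation. The sketch below makes several simplifying assumptions that are not in the text: Π comes from a diagonal matrix with rational eigenvalues, f is real-valued, and instead of partitioning a disc in C we cover the interval [−bound, bound] by half-open cells of length ε, using left endpoints as sample points. For diagonal operators the operator norm of a difference is the largest absolute difference of diagonal entries.

```python
from fractions import Fraction as F
import math

def simple_approximation(f, eps, bound):
    """Return f_xi, which replaces f(x) by the left endpoint of the cell
    [-bound + k*eps, -bound + (k+1)*eps) that contains f(x)."""
    def f_xi(x):
        k = math.floor((f(x) + bound) / eps)  # index of the cell containing f(x)
        return -bound + k * eps               # sample point of that cell
    return f_xi

def integral_against_Pi(h, eigenvalues):
    """Diagonal entries of the integral of h against Pi for T = diag(eigenvalues)."""
    return [h(lam) for lam in eigenvalues]

def operator_norm_of_difference(d1, d2):
    """Operator norm of the difference of two diagonal operators."""
    return max(abs(a - b) for a, b in zip(d1, d2))

eigs = [F(k, 7) for k in range(8)]   # spectrum inside [0, 1]
f = lambda x: x * x - F(1, 3)        # real-valued with |f| <= 1 on the spectrum
eps = F(1, 10)
f_xi = simple_approximation(f, eps, bound=F(1))
err = operator_norm_of_difference(integral_against_Pi(f, eigs),
                                  integral_against_Pi(f_xi, eigs))
```

Every value of `f_xi` lies in the same cell as the corresponding value of `f`, so `err` is at most `eps`, mirroring the bound in Proposition 12.76.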

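Exercise 12.77. Let H, X, and Π be as in Proposition 12.76. Show that $\int f(\lambda)\,d\Pi_\lambda$ depends linearly on $f \in \mathscr{L}^\infty(X)$ and that
\[ \Bigl\| \int_X f(\lambda)\,d\Pi_\lambda \Bigr\| \leq \|f\|_\infty, \]
\[ \Bigl( \int_X f(\lambda)\,d\Pi_\lambda \Bigr)^* = \int_X \overline{f(\lambda)}\,d\Pi_\lambda, \]
and
\[ \int_X f_1(\lambda)\,d\Pi_\lambda \int_X f_2(\lambda)\,d\Pi_\lambda = \int_X (f_1 f_2)(\lambda)\,d\Pi_\lambda \]
for any $f, f_1, f_2 \in \mathscr{L}^\infty(X)$.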
Exercise 12.78 (Strong convergence). Let H, X, Π be as in Proposition 12.76, and assume that $(f_n)$ is a sequence of functions in $\mathscr{L}^\infty(X)$ with $\sup_{n \geq 1} \|f_n\|_\infty < \infty$ and $f_n(x) \to f(x)$ as n → ∞ for every x ∈ X. Show that
\[ \int_X f_n(\lambda)\,d\Pi_\lambda \longrightarrow \int_X f(\lambda)\,d\Pi_\lambda \]
as n → ∞ in the strong operator topology.
It remains to establish the connection between the functional calculus as

in Section 12.6 and the projection-valued measures considered here.
Theorem 12.79. Let H be a complex Hilbert space and let A ⊆ B(H) be a

separable commutative unital C ∗ -sub-algebra (for example, the unital algebra
generated by a single normal operator T ). Then there exists a projection-
valued measure Π on H defined on the σ-algebra B of Borel subsets of the
space X = σ(A) (respectively, σ(T ) in the case of a single normal operator)
such that
\[ f_H = \int_X f(\lambda)\,d\Pi_\lambda \]
for any $f \in \mathscr{L}^\infty(X)$.
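Proof. We define the projection-valued measure $\Pi_B$ for a Borel set $B \in \mathcal{B}$ in X using the functional calculus by setting
\[ \Pi_B = (\mathbb{1}_B)_H \]
(resp. $\Pi_B = \mathbb{1}_B(T)$ in the case of a single normal operator). Each $\Pi_B$ is an orthogonal projection by (FC3), since $\mathbb{1}_B^2 = \mathbb{1}_B = \overline{\mathbb{1}_B}$. To show that Π satisfies the property in Definition 12.73, suppose that $B = \bigsqcup_{n \geq 1} B_n$ with $B_n \in \mathcal{B}$ for all $n \geq 1$ and fix some v ∈ H. Then
\begin{align*}
\Bigl\| \Bigl( \Pi_B - \sum_{n=1}^N \Pi_{B_n} \Bigr) v \Bigr\|^2
&= \Bigl\| \Bigl( \mathbb{1}_B - \sum_{n=1}^N \mathbb{1}_{B_n} \Bigr)_H v \Bigr\|^2 \\
&= \Bigl\| \bigl( \mathbb{1}_{B \setminus \bigsqcup_{n=1}^N B_n} \bigr)_H v \Bigr\|^2
= \Bigl\langle \bigl( \mathbb{1}_{B \setminus \bigsqcup_{n=1}^N B_n} \bigr)_H v, v \Bigr\rangle \\
&= \int_X \mathbb{1}_{B \setminus \bigsqcup_{n=1}^N B_n}\,d\mu_v
= \mu_v\Bigl( B \setminus \bigsqcup_{n=1}^N B_n \Bigr) \longrightarrow 0
\end{align*}
as N → ∞, which shows (12.21).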

Now let $f \in \mathscr{L}^\infty(X)$ with $f_\xi$ a simple approximation to f as in Proposition 12.76. Then
\[ \int_X f_\xi(\lambda)\,d\Pi_\lambda = \sum_{P_j \in \xi} \lambda_j \bigl( \mathbb{1}_{f^{-1}(P_j)} \bigr)_H = (f_\xi)_H \]
by linearity of the functional calculus. Using a sequence of partitions $(\xi_n)$ with the property that $f_{\xi_n} \to f$ uniformly as n → ∞ as in the proof of Proposition 12.76, we see that
\[ \int_X f_{\xi_n}(\lambda)\,d\Pi_\lambda \longrightarrow \int_X f(\lambda)\,d\Pi_\lambda \]
as n → ∞ by Proposition 12.76, and
\[ (f_{\xi_n})_H \longrightarrow f_H \]
as n → ∞ by the continuity bound of the functional calculus ((FC2) in Proposition 12.67). This gives the theorem.
12.8 Locally Compact Abelian Groups and Pontryagin Duality
In this section we study the relationship between the unitary representations

of a locally compact σ-compact metric abelian group G and its dual or char-
acter group as defined in Definition 11.37. As a consequence of the results of
the previous and the current chapter we will also prove the completeness of
characters claimed on p. 92.
As we have seen, a surprising and satisfying fact is that the dual group $\widehat{G}$ of a locally compact abelian group G is also a locally compact abelian group. This in turn allows us to repeat the operation of forming the dual group to $\widehat{G}$, giving the bidual of G, which will be canonically isomorphic to G as a topological group. This duality or reflexivity of locally compact abelian groups is called Pontryagin duality.
12.8.1 The Spectral Theorem for Unitary Representations
Let π be a unitary representation of G on H as in the corollary, and recall

the definition of the operator f ∗π from Section 3.5.4 for any f ∈ L1 (G).
Essential Exercise 12.80. Suppose that π : G ↻ H is a unitary representation of a locally compact σ-compact metric abelian group G on a Hilbert space H. Show that $(f *_\pi)^* = \tilde{f} *_\pi$ for any $f \in L^1(G)$, where $\tilde{f}$ is defined by $\tilde{f}(g) = \overline{f(-g)}$ for all g ∈ G.
Corollary 12.81. Let G be a locally compact σ-compact metric abelian group and let $\widehat{G}$ be its dual group (as in Definition 11.37, and equipped with the weak* topology as in Proposition 11.38). Let π : G → B(H) be a unitary representation of G on a separable complex Hilbert space H. Then there exists a finite measure µ on $X = \widehat{G} \times \mathbb{N}$ (resp. $X = \widehat{G}$ if H is cyclic) and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that
\[ \phi \circ \pi_g = M_g \circ \phi \tag{12.25} \]
for all g ∈ G, where $M_g$ is the multiplication operator defined by the function $X \ni (\chi, n) \mapsto \chi(g)$. Moreover, a unitary isomorphism $\phi : H \to L^2_\mu(X)$ satisfies (12.25) if and only if it satisfies $\phi \circ (f *_\pi) = M_{\check{f}} \circ \phi$ for any $f \in L^1(G)$.
The proof of the corollary consists largely of assembling the evidence that
we have already proved it.
Proof of Corollary 12.81. By Proposition 11.38, L1 (G) is a separable
commutative Banach algebra. Applying Exercise 11.1 we obtain the separable
commutative unital Banach algebra L1 (G) ⊕ C, whose elements we will write
as f + λI where I denotes the multiplicative unit of the algebra.
Using Exercise 3.86 we define the bounded operator
\[ \imath : L^1(G) \oplus \mathbb{C} \ni f + \lambda I \longmapsto f *_\pi + \lambda I \in B(H). \]
By Proposition 3.91 and Exercise 3.92 it follows that the closure A of the image $\imath(L^1(G) \oplus \mathbb{C})$ is a separable commutative unital sub-algebra of B(H). By Exercise 12.80, we see that A is also a C*-sub-algebra. Applying Theorem 12.60 we find a unitary isomorphism
\[ \phi : H \to L^2_\mu(X) \]
for some finite measure µ on X = σ(A) × N satisfying $\phi \circ a = M_{a^o} \circ \phi$ for all a ∈ A.
We will deduce (12.25) from this formula. However, instead of carrying
the factor N around in the following discussions we simplify the notation and
assume that the unitary representation is cyclic, and hence X = σ(A) (see
Exercise 12.82). The general case follows easily from this by either dropping
that assumption after one has understood the argument below, or by putting
the various cyclic subspaces back together as we have done many times before.
Since $\imath : L^1(G) \oplus \mathbb{C} \to A$ has dense image, a linear functional on A is uniquely determined by its restriction to the image $\imath(L^1(G) \oplus \mathbb{C})$. Equivalently, the dual map
\[ \imath^* : A^* \longrightarrow \bigl( L^1(G) \oplus \mathbb{C} \bigr)^* \]
is injective. By Exercise 8.9(b) (see also the hint on p. 574), $\imath^*$ is continuous with respect to the weak* topology. By Theorem 11.23, X = σ(A) is compact, and hence the restriction of $\imath^*$ to X is a homeomorphism to
\[ \imath^*(X) \subseteq X' = \sigma(L^1(G) \oplus \mathbb{C}). \]
Next we define the measure $\mu' = (\imath^*)_* \mu$, which also gives us the identification $L^2_\mu(X) = L^2_{\mu'}(X')$. Fix some $f \in L^1(G)$, then we have $\phi \circ (f *_\pi) = M_{a^o} \circ \phi$ with $a = \imath(f) = f *_\pi$. We use $\imath^*$ to identify X with the subset $\imath^*(X) \subseteq X'$, and claim that in this sense the function $f^o$ extends the function $a^o$. Indeed if χ ∈ X = σ(A), then
\[ a^o(\chi) = \chi(a) = \chi(\imath(f)) = \chi \circ \imath(f) = f^o(\chi \circ \imath) = f^o(\imath^*(\chi)). \]
In other words, we obtain the following slightly more convenient description: the unitary isomorphism $\phi : H \to L^2_{\mu'}(X')$ satisfies $\phi \circ (f *_\pi) = M_{f^o} \circ \phi$ for all $f \in L^1(G)$.
By Corollary 11.29 (and its proof) we have the identification
\[ X' = \sigma(L^1(G) \oplus \mathbb{C}) = \sigma(L^1(G)) \cup \{0\}, \]
which is also a homeomorphism. We claim that µ′ actually gives full measure to $\sigma(L^1(G))$ and zero measure to the extra point 0. This follows from continuity of the unitary representation. Indeed, if $\phi(v) = 1_{\{0\}}$ and µ′({0}) is positive then v ∈ H is non-zero and there exists a compact neighbourhood B of 0 ∈ G such that $\Re \langle \pi_g v, v \rangle > 0$ for all g ∈ B. This implies that $1_B *_\pi v$ is non-zero, since
\[ \Re \langle 1_B *_\pi v, v \rangle = \int_B \Re \langle \pi_g v, v \rangle \, dm(g) > 0. \]
On the other hand, $(1_B)^o(0) = 0$ and hence $M_{(1_B)^o} 1_{\{0\}} = 0$. Recalling the formula $\phi \circ (f *_\pi) = M_{f^o} \circ \phi$ we derive a contradiction and see that µ′({0}) = 0.

Finally, we recall from Proposition 11.38 that the Gelfand dual $\sigma(L^1(G))$ can be identified with the Pontryagin dual group $\widehat{G}$ and the Gelfand transform can be identified with the Fourier back transform. Simplifying the notation, we can summarize the above by saying that the spectral theorem shows the existence of a finite measure µ on $X = \widehat{G}$ and a unitary isomorphism
\[ \phi : H \to L^2_\mu(X) \]
such that $\phi \circ (f *_\pi) = M_{\check{f}} \circ \phi$ for all $f \in L^1(G)$.

This implies (12.25) by using an approximate identity (which also gives the first direction of the claimed equivalence in the corollary). Let $(B_k)$ be a decreasing sequence of open neighbourhoods of 0 ∈ G that form a basis of the topology at 0, define $\psi_k = \frac{1}{m(B_k)} 1_{B_k}$, and fix some $g_0 \in G$. Applying the argument above to the function f defined by
\[ f(g) = \psi_k^{g_0}(g) = \psi_k(g - g_0), \]
we obtain
\[ \phi \circ (\psi_k^{g_0} *_\pi) = M_{(\psi_k^{g_0})^\vee} \circ \phi \tag{12.26} \]
for all k. Fix some v ∈ H. We will prove that $M_{(\psi_k^{g_0})^\vee}(\phi(v))$ converges to $M_{g_0}(\phi(v))$ (as defined in the corollary) as k → ∞ and that $\phi(\psi_k^{g_0} *_\pi v)$ converges to $\phi(\pi_{g_0} v)$, which then gives (12.25).
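To see that $M_{(\psi_k^{g_0})^\vee}(\phi(v))$ converges to $M_{g_0}(\phi(v))$, we note that
\[ \bigl| (\psi_k^{g_0})^\vee(\chi) \bigr| = \Bigl| \int_G \psi_k^{g_0} \chi \, dm \Bigr| \leq \| \psi_k^{g_0} \|_1 = 1 \]
for all $\chi \in \widehat{G}$, so that $\| (\psi_k^{g_0})^\vee \|_\infty \leq 1$, that
\[ \lim_{k \to \infty} (\psi_k^{g_0})^\vee(\chi) = \lim_{k \to \infty} \frac{1}{m(B_k)} \int_{B_k + g_0} \chi \, dm = \chi(g_0), \]
and that with this dominated convergence implies that
\[ \lim_{k \to \infty} \bigl\| M_{(\psi_k^{g_0})^\vee}(\phi(v)) - M_{g_0}(\phi(v)) \bigr\|_2^2 = \lim_{k \to \infty} \int_X \bigl| (\psi_k^{g_0})^\vee(\chi) - \chi(g_0) \bigr|^2 |\phi(v)|^2 \, d\mu(\chi) = 0, \]
as claimed.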
Since φ is a unitary isomorphism, (12.26) shows that $(\psi_k^{g_0} *_\pi v)$ converges in H. In order to identify the limit $\tilde{v} \in H$, fix some w ∈ H and use the definition of $\psi_k^{g_0} *_\pi$ to see that
\[
\begin{aligned}
\langle \tilde{v}, w \rangle = \lim_{k \to \infty} \langle \psi_k^{g_0} *_\pi v, w \rangle
&= \lim_{k \to \infty} \int_G \psi_k^{g_0}(h) \langle \pi_h v, w \rangle \, dm(h) \\
&= \lim_{k \to \infty} \frac{1}{m(B_k)} \int_{B_k + g_0} \langle \pi_h v, w \rangle \, dm(h) = \langle \pi_{g_0} v, w \rangle
\end{aligned}
\]
by the continuity property of unitary representations. Since w ∈ H is arbitrary, it follows that $\psi_k^{g_0} *_\pi v$ converges to $\pi_{g_0} v$ as k → ∞. Therefore (12.25) follows by taking the limit of the equation (12.26) as k → ∞.
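To prove the corollary it remains to show that (12.25) for some unitary isomorphism $\phi : H \to L^2_\mu(X)$ for a finite measure on $X = \widehat{G} \times \mathbb{N}$ implies that
\[ \phi \circ (f *_\pi) = M_{\check{f}} \circ \phi \]
for any $f \in L^1(G)$. This follows from the definition of $f *_\pi$ and Fubini's theorem. Indeed, using (12.25) in the form $\phi(\pi_g u)(\chi, n) = \chi(g) \phi(u)(\chi, n)$ for all $(\chi, n) \in \widehat{G} \times \mathbb{N}$ we obtain
\[
\begin{aligned}
\langle \phi(f *_\pi u), f_1 \rangle = \langle f *_\pi u, \phi^* f_1 \rangle
&= \int_G f(g) \langle \pi_g u, \phi^* f_1 \rangle \, dm(g) \\
&= \int_G \int_X \phi(\pi_g u)(\chi, n) \overline{f_1(\chi, n)} f(g) \, d\mu(\chi, n) \, dm(g) \\
&= \int_X \int_G f(g) \chi(g) \, dm(g) \, \phi(u)(\chi, n) \overline{f_1(\chi, n)} \, d\mu(\chi, n) \\
&= \bigl\langle M_{\check{f}}(\phi(u)), f_1 \bigr\rangle
\end{aligned}
\]
for all u ∈ H and $f_1 \in L^2_\mu(X)$.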
Essential Exercise 12.82. In the third paragraph of the proof of Corol-

lary 12.81 we assumed ‘without loss of generality’ that the Hilbert space H
is cyclic. However, this contained a small cheat as we did not clarify whether
we meant cyclic with respect to the unitary representation π (as in Defin-
ition 9.7) or cyclic with respect to the sub-algebra A obtained from L1 (G)
and convolution (as in Definition 12.52). Show that these two notions are
equivalent.
Exercise 12.83. Given a unitary representation π of a locally compact σ-compact metric abelian group G apply Theorem 12.79 to the closure of the image of the algebra $L^1(G) \oplus \mathbb{C}$ to obtain a projection-valued measure Π on $\widehat{G}$ such that $\pi_g$ is given by $\int_{\widehat{G}} \chi(g) \, d\Pi_\chi$ for all g ∈ G.
12.8.2 Characters Separate Points
Using the spectral theorem for unitary representations as in the last corollary,
we turn to the question of whether the Pontryagin dual group is sufficiently
rich to separate points, as claimed on p. 92.
Theorem 12.84 (Completeness of characters). On every locally compact σ-compact metric abelian group G there are enough characters to separate points. That is, if g, h ∈ G have g ≠ h, then there exists a character $\chi \in \widehat{G}$ with χ(g) ≠ χ(h).
Proof. Let G be a locally compact σ-compact metric abelian group, and let $g_0 \in G \setminus \{0\}$ be non-trivial. Then it is easy to see (for example, by using characteristic functions of sufficiently small compact neighbourhoods of 0 in G) that the unitary operator
\[ \lambda_{g_0} : L^2(G) \longrightarrow L^2(G), \qquad f \longmapsto \bigl( f^{g_0} : h \mapsto f(h - g_0) \bigr) \]
is not the identity map. Therefore, if we apply Corollary 12.81 to the regular representation λ of G on $L^2(G)$ we see that there exists some $\chi \in \widehat{G}$ with $\chi(g_0) \neq 1$. Applying this to $g_0 = g - h$ for some g, h ∈ G with g ≠ h, we obtain completeness of characters, as claimed.
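As a sanity check of Theorem 12.84 in the simplest setting, the following sketch takes G = Z/nZ, a finite and hence compact abelian group whose characters are $\chi_t(g) = e^{2\pi i t g / n}$. The choice of group and the function names are illustrative assumptions and do not come from the text. The code searches for a character separating two given points.

```python
import cmath

def character(n, t, g):
    """Character chi_t of Z/nZ evaluated at g: exp(2*pi*i*t*g/n)."""
    return cmath.exp(2j * cmath.pi * t * g / n)

def separating_character(n, g, h, tol=1e-9):
    """Return some t with chi_t(g) != chi_t(h), or None if no character separates g and h."""
    for t in range(n):
        if abs(character(n, t, g) - character(n, t, h)) > tol:
            return t
    return None

def all_pairs_separated(n):
    """Check that every pair of distinct elements of Z/nZ is separated by some character."""
    return all(
        separating_character(n, g, h) is not None
        for g in range(n) for h in range(n) if g != h
    )
```

Here `separating_character(n, g, h)` plays the role of the character χ with χ(g − h) ≠ 1 produced in the proof; for a finite cyclic group the character $\chi_1$ already separates all points.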
12.8.3 The Plancherel Formula
Recall from Proposition 11.43 that $\widehat{G}$ is a σ-compact locally compact metric abelian group if G is and let us note that in the next subsection we will use this to establish Pontryagin duality. We show in this subsection that by applying Corollary 12.81 to the regular representation of G we obtain a generalization of the Fourier transform to more general abelian groups. Because of these results it is natural to treat G and $\widehat{G}$ in the same way, and in particular to use the same additive notation in both groups. This is a familiar process in functional analysis, where notation supports abstraction of ideas: The dual group $\widehat{G}$ is initially defined as a collection of maps taking values in $S^1$ with the operation of pointwise multiplication, which is therefore most naturally written multiplicatively. However, given the developed structure of $\widehat{G}$ it is now natural to think of the dual group operation additively as follows. However, this is really just a change of notation and not an isomorphism between two differently defined objects. Hence nothing needs to be proved, but the relation between the old and the new notation needs to be clarified.
Let us write t for an element of $\widehat{G}$ (as we did in Section 9.2), use additive notation for the group operation, and write $\chi_t : G \to S^1$ for the character on G corresponding to $t \in \widehat{G}$. In particular, this means that $\chi_0 = 1$ and
\[ \chi_{t_1 + t_2} = \chi_{t_1} \chi_{t_2} \]
for $t_1, t_2 \in \widehat{G}$. If we want to remove the discrimination between G and $\widehat{G}$ even further (as we will do, for example, in the next subsection) we also write
\[ \langle g, t \rangle = \chi_t(g) \in S^1 \]
for the dual pairing of g ∈ G and $t \in \widehat{G}$. For $t \in \widehat{G}$ write $M_t : L^2(G) \to L^2(G)$ for the multiplication operator defined by $M_t(f)(g) = \langle g, t \rangle f(g)$ for g ∈ G. Finally, we will write $\widehat{\lambda}$ for the regular representation of $\widehat{G}$ on (equivalence classes of) functions f on $\widehat{G}$, so that
\[ \widehat{\lambda}_{t_0}(f)(t) = f(t - t_0) \]
for all $t, t_0 \in \widehat{G}$.
Theorem 12.85 (Plancherel formula). Let G be a σ-compact locally compact metrizable abelian group with Haar measure $m_G$. Then there exists a normalization of the Haar measure $m_{\widehat{G}}$ on $\widehat{G}$ and a unitary isomorphism $\phi : L^2(G) \to L^2(\widehat{G})$ which extends the Fourier back transform $f \mapsto \check{f}$ on $L^1(G) \cap L^2(G)$ to all of $L^2(G)$ and satisfies $\phi \circ \lambda_g = M_g \circ \phi$ as well as $\phi \circ M_t = \widehat{\lambda}_{-t} \circ \phi$ for all g ∈ G and $t \in \widehat{G}$.
We note that this generalizes Theorem 9.39 and Proposition 9.29, except
that we work here with the Fourier back transform. We split the proof of
the theorem into several steps. Our argument below may not be the most
direct approach, but will also help us to prove Pontryagin duality in the next
subsection. We will assume the hypotheses of Theorem 12.85 throughout.
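The three claims of Theorem 12.85 can be checked numerically in a toy case. The sketch below assumes G = Z/nZ with counting measure, so that the Fourier back transform is the finite sum $\check{f}(t) = \sum_g f(g) \chi_t(g)$ and the matching Haar measure on the dual gives each point mass 1/n. The group, the sample vector `f`, and all helper names are illustrative assumptions.

```python
import cmath

def chi(n, t, g):
    """Character chi_t(g) = exp(2*pi*i*t*g/n) of Z/nZ."""
    return cmath.exp(2j * cmath.pi * t * g / n)

def back_transform(f):
    """Fourier back transform: f_check(t) = sum_g f(g) chi_t(g)."""
    n = len(f)
    return [sum(f[g] * chi(n, t, g) for g in range(n)) for t in range(n)]

def translate(f, g0):
    """Regular representation: (lambda_{g0} f)(h) = f(h - g0)."""
    n = len(f)
    return [f[(h - g0) % n] for h in range(n)]

def modulate(f, t0):
    """Multiplication operator: (M_{t0} f)(g) = chi_{t0}(g) f(g)."""
    n = len(f)
    return [chi(n, t0, g) * f[g] for g in range(n)]

def norm_sq(f, weight=1.0):
    """Squared L^2 norm with respect to weight * counting measure."""
    return weight * sum(abs(x) ** 2 for x in f)

def max_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

f = [1, 2 - 1j, 0, 3j, -1, 0.5]
n = len(f)
f_check = back_transform(f)

# Plancherel: ||f||^2 on G equals ||f_check||^2 on the dual with mass 1/n per point.
plancherel_gap = abs(norm_sq(f) - norm_sq(f_check, 1.0 / n))

# phi o lambda_g = M_g o phi, where (M_g F)(t) = chi_t(g) F(t).
g0 = 2
lhs1 = back_transform(translate(f, g0))
rhs1 = [chi(n, t, g0) * f_check[t] for t in range(n)]

# phi o M_t = lambda_hat_{-t} o phi, i.e. (chi_{t0} f)^check(t) = f_check(t + t0).
t0 = 4
lhs2 = back_transform(modulate(f, t0))
rhs2 = [f_check[(t + t0) % n] for t in range(n)]
```

The normalization 1/n is the "normalization of the Haar measure" in the theorem for this particular choice of $m_G$; a different scaling of $m_G$ would change it accordingly.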
Lemma 12.86 (A Gaussian on G). There exists a $\psi \in V = L^1(G) \cap L^2(G)$ with $\check{\psi}(t) > 0$ for all $t \in \widehat{G}$.
Proof. Recall that for an approximate identity $\psi_k = \frac{1}{m(B_k)} 1_{B_k} \in V$ (as in the proof of Corollary 12.81), we have $\check{\psi}_k(t) \to 1$ as k → ∞ for any element t of $\widehat{G}$. In particular, for every $t \in \widehat{G}$ there exists some k ∈ N with $\check{\psi}_k(t) \neq 0$. Moreover, with $\tilde{\psi}_k(g) = \overline{\psi_k(-g)}$ we have
\[ (\tilde{\psi}_k)^\vee(t) = \int_G \overline{\psi_k(-g)} \chi_t(g) \, dm(g) = \int_G \overline{\psi_k(h) \chi_t(h)} \, dm(h) = \overline{\check{\psi}_k(t)} \]
for every $t \in \widehat{G}$ and k ∈ N. Therefore, $(\tilde{\psi}_k * \psi_k)^\vee = |\check{\psi}_k|^2 \geq 0$ for every k ∈ N. Setting $\psi = \sum_{k=1}^\infty c_k \psi_k * \tilde{\psi}_k$ for some rapidly decaying positive sequence $(c_k)_k$ we obtain $\psi \in L^1(G) \cap L^2(G)$. By the above we have $\check{\psi} = \sum_{k=1}^\infty c_k |\check{\psi}_k|^2 > 0$ and the lemma follows.
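The key identity in this proof, that the back transform of $\psi * \tilde{\psi}$ equals $|\check{\psi}|^2$ and is therefore non-negative, can be tested on a finite group. The sketch below assumes G = Z/8Z with counting measure and takes ψ to be a normalized indicator of {−1, 0, 1}; these choices and the helper names are illustrative assumptions only.

```python
import cmath

def back_transform(f):
    """Fourier back transform on Z/nZ: f_check(t) = sum_g f(g) exp(2*pi*i*t*g/n)."""
    n = len(f)
    return [sum(f[g] * cmath.exp(2j * cmath.pi * t * g / n) for g in range(n))
            for t in range(n)]

def reflect_conj(f):
    """psi_tilde(g) = conj(psi(-g))."""
    n = len(f)
    return [complex(f[(-g) % n]).conjugate() for g in range(n)]

def convolve(a, b):
    """(a * b)(g) = sum_h a(h) b(g - h) on Z/nZ."""
    n = len(a)
    return [sum(a[h] * b[(g - h) % n] for h in range(n)) for g in range(n)]

n = 8
# Normalized indicator of the neighbourhood {-1, 0, 1} of 0 in Z/8Z.
psi = [1 / 3 if g in (0, 1, n - 1) else 0.0 for g in range(n)]
psi_check = back_transform(psi)
prod_check = back_transform(convolve(psi, reflect_conj(psi)))

# Largest deviation between (psi * psi_tilde)^check and |psi_check|^2.
gap = max(abs(prod_check[t] - abs(psi_check[t]) ** 2) for t in range(n))
# The transform of psi * psi_tilde is non-negative up to rounding.
min_real = min(z.real for z in prod_check)
```

A single term $|\check{\psi}_k|^2$ may vanish at some points, which is why the lemma sums over k with positive weights to obtain strict positivity.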
Lemma 12.87 (Correcting measure and isomorphism). If we apply Corollary 12.81 to the regular representation λ of G we may assume without loss of generality that the resulting measure space is defined using $X = \widehat{G}$ (instead of $\widehat{G} \times \mathbb{N}$). Moreover, assuming the conclusions of Corollary 12.81 it is possible to replace the original spectral measure by an absolutely continuous σ-finite measure µ such that φ also satisfies
\[ \phi(f)(t) = \check{f}(t) \tag{12.27} \]
for $t \in \widehat{G}$ and all $f \in V = L^1(G) \cap L^2(G)$.
Proof. Applying Corollary 12.81 to the unitary representation λ, we obtain a finite measure $\mu_0$ on $X = \widehat{G} \times \mathbb{N}$ and a unitary isomorphism
\[ \phi_0 : L^2(G) \longrightarrow L^2_{\mu_0}(X) \]
such that $\phi_0 \circ (f *_\lambda) = M_{\check{f}} \circ \phi_0$ for all $f \in L^1(G)$.

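Suppose that $f_1, f_2, f \in V = L^1(G) \cap L^2(G)$. Then, by definition of $f_1 *_\lambda f_2$ and Fubini's theorem,
\[
\begin{aligned}
\langle f_1 *_\lambda f_2, f \rangle &= \int f_1(g) \langle \lambda_g(f_2), f \rangle \, dm(g) \\
&= \iint f_1(g) f_2(\underbrace{h - g}_{k}) \overline{f(h)} \, dm(g) \, dm(h) \\
&= \iint f_1(h - k) f_2(k) \overline{f(h)} \, dm(k) \, dm(h) = \langle f_2 *_\lambda f_1, f \rangle,
\end{aligned}
\]
which proves $f_1 *_\lambda f_2 = f_2 *_\lambda f_1$ by density of $V \subseteq L^2(G)$ (see also Exercises 3.86 and 3.92). Applying the unitary isomorphism $\phi_0$ this gives
\[ \check{f}_1 \phi_0(f_2) = \check{f}_2 \phi_0(f_1) \tag{12.28} \]
almost everywhere with respect to $\mu_0$.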

We now set $f_1 = \psi$ with ψ ∈ V as in Lemma 12.86 and define
\[ w(t, n) = \frac{\phi_0(\psi)(t, n)}{\check{\psi}(t)} \]
for $(t, n) \in \widehat{G} \times \mathbb{N}$. Setting $f_2 = f \in V$ and dividing (12.28) by $\check{\psi}$ we obtain
\[ \phi_0(f)(t, n) = w(t, n) \check{f}(t) \tag{12.29} \]
for all f ∈ V and $\mu_0$-almost every (t, n) ∈ X. This represents the main step towards the lemma, which we will obtain by modifying the unitary isomorphism and the measure as follows.
Since $\phi_0 : L^2(G) \to L^2_{\mu_0}(X)$ is an isomorphism and $V = L^1(G) \cap L^2(G)$ is dense in $L^2(G)$, we see from (12.29) that $w(t, n) \neq 0$ $\mu_0$-almost everywhere. Using this we define the σ-finite measure $\mu_1$ on X by
\[ \frac{d\mu_1}{d\mu_0} = |w|^2, \]
and the map $\phi_1 = M_{w^{-1}} \circ \phi_0$ (with inverse $\phi_0^{-1} \circ M_w$) which satisfies
\[
\begin{aligned}
\| \phi_1(f) \|^2_{L^2_{\mu_1}(X)} &= \int_X |w|^{-2} |\phi_0(f)|^2 \, d\mu_1 = \int_X |\phi_0(f)|^2 |w|^{-2} \frac{d\mu_1}{d\mu_0} \, d\mu_0 \\
&= \| \phi_0(f) \|^2_{L^2_{\mu_0}(X)} = \| f \|_2^2
\end{aligned}
\]
for all $f \in L^2(G)$. Hence $\phi_1$ is a unitary isomorphism $\phi_1 : L^2(G) \to L^2_{\mu_1}(X)$, and
\[ \phi_1(f)(t, n) = \check{f}(t) \tag{12.30} \]
for all f ∈ V and $\mu_1$-almost every (t, n) ∈ X. Since any two multiplication operators on X commute, the new unitary isomorphism still satisfies the conclusions of the spectral theorems.
To summarize, $\phi_1 : L^2(G) \to L^2_{\mu_1}(X)$ is a unitary isomorphism satisfying (12.30). Finally, since $\phi_1(V)$ is dense in $L^2_{\mu_1}(X)$, this implies that every element of $L^2_{\mu_1}(X)$ can be expressed as a pointwise limit of a sequence in $\phi_1(V)$ and so has a representative that only depends on $t \in \widehat{G}$. Let $p : X \to \widehat{G}$ denote the projection to the first coordinate of $X = \widehat{G} \times \mathbb{N}$, and write $X = \bigcup_{n \geq 1} X_n$ as a union of sets $X_n \subseteq X$ with finite measure. Then for every n ≥ 1 we may use the above observation for the function $1_{X_n} \in L^2_{\mu_1}(X)$ and see that there exists a measurable set $Y_n \subseteq \widehat{G}$ with $\mu_1(X_n \triangle (Y_n \times \mathbb{N})) = 0$. It follows that
\[ \mu_1 \Bigl( X \setminus \bigcup_{n \geq 1} Y_n \times \mathbb{N} \Bigr) = 0 \]
and so $p_* \mu_1$ is a σ-finite measure on $\widehat{G}$ with $L^2_{p_* \mu_1}(\widehat{G}) = L^2_{\mu_1}(X)$.
Simplifying the notation, we may assume that $\phi : L^2(G) \to L^2_\mu(\widehat{G})$ is a unitary isomorphism, that µ is a σ-finite measure on $\widehat{G}$, and that in addition to the claims of the spectral theorem it also satisfies (12.27).
Essential Exercise 12.88. Show that $L^1(G)^\vee = \{ \check{f} \mid f \in L^1(G) \} \subseteq C_0(\widehat{G})$ is dense.
Lemma 12.89 (Spectral theorem produces Haar measure). Let µ be a σ-finite measure on $\widehat{G}$ satisfying the conclusion of Lemma 12.87. Then the measure $\mu = m_{\widehat{G}}$ is a Haar measure on $\widehat{G}$.
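Proof. Throughout the proof we will use the function ψ from Lemma 12.86. We first claim that µ is locally finite. Notice that $\check{\psi} \in C_0(\widehat{G})$ by definition of the topology on $\widehat{G}$ in Propositions 11.33 and 11.38. Hence
\[ O_{t_0} = \bigl\{ t \in \widehat{G} \mid |\check{\psi}|^2(t) > \tfrac{1}{2} |\check{\psi}|^2(t_0) \bigr\} \]
is a neighbourhood of $t_0 \in \widehat{G}$. Together with $\check{\psi} = \phi(\psi) \in L^2_\mu(\widehat{G})$ (as assumed in the lemma) and $|\check{\psi}|^2 \in L^1_\mu(\widehat{G})$, it follows that $\mu(O_{t_0}) < \infty$. Since $t_0 \in \widehat{G}$ was arbitrary, it follows that µ is locally finite, as claimed.
Next we note that for $t_0 \in \widehat{G}$ and $f \in L^1(G)$ we have $(\chi_{t_0} f)^\vee = \widehat{\lambda}_{-t_0} \check{f}$ since
\[ (\chi_{t_0} f)^\vee(t) = \int_G (\chi_{t_0} f) \chi_t \, dm = \check{f}(t_0 + t) = \bigl( \widehat{\lambda}_{-t_0} \check{f} \bigr)(t) \tag{12.31} \]
for all $t \in \widehat{G}$.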
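Below we combine (12.31) for $f = \psi \in L^1(G)$ with a similar claim for the spectral measure of $\psi \in L^2(G)$ and will obtain the lemma from this. By the assumptions in the lemma we can define the spectral measures $\mu_F$ on $\widehat{G}$ for the algebra $L^1(G)$ acting on functions $F \in L^2(G)$ by $d\mu_F = |\phi(F)|^2 \, d\mu$ since
\[ \langle f *_\lambda F, F \rangle_{L^2(G)} = \int_{\widehat{G}} \check{f} \, |\phi(F)|^2 \, d\mu. \]
We will now show that the spectral measures satisfy
\[ \mu_{\chi_{t_0} \psi} = (T_{t_0})_* \mu_\psi \tag{12.32} \]
for all $t_0 \in \widehat{G}$, where we use the translation defined by $T_{t_0}(t) = t - t_0$ for all $t \in \widehat{G}$ and the push-forward of the measure (as defined on p. 265). Indeed, for $f \in L^1(G)$ we have (by our definitions)
\[
\begin{aligned}
\int_{\widehat{G}} \check{f} \, d\mu_{\chi_{t_0} \psi} &= \bigl\langle f *_\lambda (\chi_{t_0} \psi), \chi_{t_0} \psi \bigr\rangle_{L^2(G)} \\
&= \int_G f(g) \bigl\langle \lambda_g (\chi_{t_0} \psi), \chi_{t_0} \psi \bigr\rangle_{L^2(G)} \, dm(g) \\
&= \int_G f(g) \chi_{t_0}(-g) \underbrace{\bigl\langle \chi_{t_0} \lambda_g \psi, \chi_{t_0} \psi \bigr\rangle_{L^2(G)}}_{= \langle \lambda_g \psi, \psi \rangle} \, dm(g) \\
&= \bigl\langle (\chi_{-t_0} f) *_\lambda \psi, \psi \bigr\rangle_{L^2(G)} = \int_{\widehat{G}} (\chi_{-t_0} f)^\vee \, d\mu_\psi = \int_{\widehat{G}} \check{f} \, d\bigl( (T_{t_0})_* \mu_\psi \bigr),
\end{aligned}
\]
since $(\chi_{-t_0} f)^\vee(t) = \widehat{\lambda}_{t_0} \check{f}(t) = \check{f}(t - t_0) = \check{f} \circ T_{t_0}(t)$ for all $t \in \widehat{G}$ by (12.31). This proves the claim (12.32) by the uniqueness properties of the spectral measures (which follow from Exercise 12.88) as $f \in L^1(G)$ was arbitrary.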
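We now combine (12.31), (12.32), and the assumption that $\phi(f) = \check{f}$ for any f ∈ V. For some test function $F \in C_c(\widehat{G})$ we then have
\[
\begin{aligned}
\int_{\widehat{G}} F |\check{\psi}|^2 \, d\mu &= \int_{\widehat{G}} F \, d\mu_\psi = \int_{\widehat{G}} F \circ T_{-t_0} \, d(T_{t_0})_* \mu_\psi \\
&= \int_{\widehat{G}} \widehat{\lambda}_{-t_0} F \, d\mu_{\chi_{t_0} \psi} = \int_{\widehat{G}} \widehat{\lambda}_{-t_0} F \, \bigl| (\chi_{t_0} \psi)^\vee \bigr|^2 \, d\mu \\
&= \int_{\widehat{G}} \widehat{\lambda}_{-t_0} F \; \widehat{\lambda}_{-t_0} |\check{\psi}|^2 \, d\mu = \int_{\widehat{G}} \widehat{\lambda}_{-t_0} \bigl( F |\check{\psi}|^2 \bigr) \, d\mu.
\end{aligned}
\]
Replacing F by $F |\check{\psi}|^{-2} \in C_c(\widehat{G})$ we also obtain
\[ \int_{\widehat{G}} F \, d\mu = \int_{\widehat{G}} \widehat{\lambda}_{-t_0} F \, d\mu \]
for any $t_0 \in \widehat{G}$ and $F \in C_c(\widehat{G})$. By the uniqueness property of the measure in the Riesz representation theorem (Theorem 7.54) we deduce that µ is invariant under translation.
Finally, note that µ(O) > 0 for any non-empty open subset, since otherwise every compact subset could be covered by finitely many translates of O and hence would have measure 0. Since µ ≠ 0 we deduce that µ is a Haar measure.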

Proof of Theorem 12.85. By Lemma 12.87 we may apply Corollary 12.81 and assume that $X = \widehat{G}$ and $\phi(f) = \check{f}$ for all $f \in V = L^1(G) \cap L^2(G)$. Applying Lemma 12.89 we also see that the measure is given by the Haar measure $\mu = m_{\widehat{G}}$. The formula $\phi \circ \lambda_g = M_g \circ \phi$ holds by Corollary 12.81. Finally, $\phi(M_{t_0} f) = \widehat{\lambda}_{-t_0}(\phi(f))$ holds initially for f ∈ V, but knowing that µ is the Haar measure on $\widehat{G}$ extends easily to all of $L^2(G)$. This concludes the proof of the theorem.
12.8.4 Pontryagin Duality
We are now ready to establish a complete symmetry between G and its dual group $\widehat{G}$. Using Proposition 11.43 we can define the dual group $\widehat{\widehat{G}}$ of the dual group $\widehat{G}$ of G and are led to the question of reflexivity of locally compact abelian groups. Fortunately, the situation here is much better than that for Banach spaces in Chapter 7, as the next result shows. Let us prepare for it with the following exercise.
Essential Exercise 12.90. Let G be a locally compact σ-compact metric abelian group with dual group $\widehat{G}$.
(a) Show that $G \times \widehat{G} \ni (g, t) \mapsto \langle g, t \rangle \in S^1$ is continuous.
(b) Show that the map $\imath : G \to \widehat{\widehat{G}}$ defined by $\imath(g)(t) = \langle g, t \rangle = \chi_t(g)$ for g ∈ G and $t \in \widehat{G}$ is a continuous and injective homomorphism of groups.
(c) Suppose $g_n \to \infty$ as n → ∞. Show that for any $f \in L^2(G)$ the sequence $\lambda_{g_n} f$ converges weakly to 0.
(d) Show that ı is a proper map, meaning that $g_n \to \infty$ as n → ∞ implies that $\imath(g_n) \to \infty$ as n → ∞.
(e) Show that ı is closed (that is, the image of every closed set is again closed).
Corollary 12.91 (Pontryagin duality). Let G be a locally compact σ-compact metric abelian group. Then the canonical map $\imath : G \to \widehat{\widehat{G}}$ is an isomorphism of topological groups.
Proof. By Exercise 12.90 the map ı is a continuous closed injective homomorphism of topological groups. It only remains to show that it is surjective; continuity of its inverse will then follow from ı being a closed map. Let $\phi : L^2(G) \to L^2(\widehat{G})$ be as in Theorem 12.85, which satisfies
\[ \phi \circ M_t = \widehat{\lambda}_{-t} \circ \phi \]
for all $t \in \widehat{G}$. We will now read this formula backwards and derive the corollary from it. For this we define a unitary isomorphism
\[ U : L^2(\widehat{G}) \to L^2(\widehat{G}) \]
by reflecting functions through $0 \in \widehat{G}$, that is by defining $U(f)(t) = f(-t)$ for $f \in L^2(\widehat{G})$ and $t \in \widehat{G}$. Using U and φ we also define
\[ \psi = \phi^{-1} \circ U : L^2(\widehat{G}) \to L^2(G), \]
which is also a unitary isomorphism. Now notice that $U \circ \widehat{\lambda}_t = \widehat{\lambda}_{-t} \circ U$ since
\[ U(\widehat{\lambda}_t(f))(t') = \widehat{\lambda}_t(f)(-t') = f(-t' - t) \]
and
\[ \widehat{\lambda}_{-t}(U(f))(t') = U(f)(t' + t) = f(-t' - t) \]
for all $f \in L^2(\widehat{G})$ and $t, t' \in \widehat{G}$. Since $\phi^{-1} \circ \widehat{\lambda}_{-t} = M_t \circ \phi^{-1}$ it follows that
\[ \psi \circ \widehat{\lambda}_t = \phi^{-1} \circ U \circ \widehat{\lambda}_t = \phi^{-1} \circ \widehat{\lambda}_{-t} \circ U = M_t \circ \phi^{-1} \circ U = M_t \circ \psi \]
for all $t \in \widehat{G}$, which is the conclusion of Corollary 12.81 (if we were to apply it to $\widehat{G}$ and $\widehat{\lambda}$).
We now apply Lemma 12.87, which says in our context (and using the injection $\imath : G \to \widehat{\widehat{G}}$) that the image measure $\mu = \imath_* m_G$ can be modified by a density so that $\psi(f) = \check{f}$ for f in $L^1(\widehat{G}) \cap L^2(\widehat{G})$. By Lemma 12.89, this new measure $dm_{\widehat{\widehat{G}}} = |w|^2 \, d\mu$ is a Haar measure on $\widehat{\widehat{G}}$. However, by Exercise 12.90 the image $\imath(G) \subseteq \widehat{\widehat{G}}$ is a closed subgroup, so $\operatorname{Supp} \mu \subseteq \imath(G)$. Since $m_{\widehat{\widehat{G}}} \bigl( \widehat{\widehat{G}} \setminus \imath(G) \bigr) = 0$ we see that $\widehat{\widehat{G}} = \imath(G)$, as claimed in the corollary.
We close with several exercises developing certain functorial aspects of
Pontryagin duality. Throughout these exercises the groups arising are as-
sumed to be locally compact σ-compact metric abelian groups, as usual.
Exercise 12.92. Show that if H < G is a closed subgroup, then G/H is also a locally
compact σ-compact metric abelian group with respect to the quotient topology.
Exercise 12.93. Let H < G be a closed subgroup, and define the annihilator group
\[ H^\perp = \{ t \in \widehat{G} \mid \langle h, t \rangle = 1 \text{ for all } h \in H \}. \]
(a) Show that $\widehat{G/H} \cong H^\perp$.
(b) Using the canonical isomorphism between G and $\widehat{\widehat{G}}$ we can also define the double annihilator $(H^\perp)^\perp$ as a subgroup of G. Show that $(H^\perp)^\perp = H$.
(c) Deduce from this that $\widehat{H} \cong \widehat{G}/H^\perp$.
Exercise 12.94. Let θ : H → G be a continuous homomorphism.
(a) Show that $\widehat{\theta}(\chi_t) = \chi_t \circ \theta$ for $t \in \widehat{G}$ defines a continuous homomorphism $\widehat{\theta}$ from $\widehat{G}$ to $\widehat{H}$.
(b) Show that θ is injective (or has dense image) if and only if $\widehat{\theta}$ has dense image (respectively is injective).
Exercise 12.95. (a) Show that $\widehat{G_1 \times G_2} \cong \widehat{G_1} \times \widehat{G_2}$ for any groups $G_1, G_2$ as above.
(b) Let $(G_n)$ be a sequence of compact groups. Show that the direct product $\prod_{n \geq 1} G_n$ is again a compact metric abelian group, and that its dual is given by the direct sum
\[ \widehat{\prod_{n \geq 1} G_n} \cong \bigoplus_{n \geq 1} \widehat{G_n} = \Bigl\{ (t_n) \in \prod_{n \geq 1} \widehat{G_n} \Bigm| t_n = 0 \text{ for all but finitely many } n \geq 1 \Bigr\} \]
with the discrete topology.
Exercise 12.96. Let $(G_n)$ be a sequence of compact groups and suppose in addition that there is a surjective continuous homomorphism $\phi_n : G_{n+1} \to G_n$ for each n ≥ 1. The projective limit of the system $(G_n, \phi_n)$ is defined by
\[ \varprojlim (G_n, \phi_n) = \Bigl\{ (g_n) \in \prod_{n \geq 1} G_n \Bigm| \phi_n(g_{n+1}) = g_n \text{ for all } n \geq 1 \Bigr\}. \]
Show that this is again a compact metric abelian group (with the topology inherited from the product topology), and that
\[ \widehat{\varprojlim (G_n, \phi_n)} = \bigcup_{n \geq 1} \widehat{G_n}, \]
where we use the injective continuous homomorphism $\widehat{\phi_n} : \widehat{G_n} \to \widehat{G_{n+1}}$ to identify the group $\widehat{G_n}$ with a subgroup of $\widehat{G_{n+1}}$; this direct limit is also written $\varinjlim (\widehat{G_n}, \widehat{\phi_n})$.
Exercise 12.97. Formulate and prove the dual statements to Exercise 12.95–12.96 (start-
ing with direct sums, respectively direct limits).
12.9 Further Topics
• Spectral theory will be developed further in Chapter 13, where we study

the spectral theory of unbounded self-adjoint operators.
• We refer to Folland [32] and [26] for more on the theory of abstract
harmonic analysis and unitary representations of non-abelian groups.
• For more material on abelian harmonic analysis and the structure of
abelian topological groups, we refer to Hewitt and Ross [45].
Chapter 13
Self-Adjoint and Symmetric Operators
13.1 Examples and Definitions
In this chapter we will generalize the spectral theorem from Chapter 12 to

the case of unbounded self-adjoint operators (the formal definition will be
given below). The model case for such an operator is again a multiplication
operator.
Example 13.1. Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space, and let g : X → R be measurable. The multiplication operator $M_g : f \mapsto gf$ has the natural domain
\[ D_{M_g} = \bigl\{ f \in L^2_\mu(X) \mid gf \in L^2_\mu(X) \bigr\}. \]
Clearly
\[ \langle M_g(f_1), f_2 \rangle = \int_X g f_1 \overline{f_2} \, d\mu = \langle f_1, M_g(f_2) \rangle \]
for $f_1, f_2 \in D_{M_g}$. This suggests that $M_g$ is a self-adjoint operator, which is unbounded if $g \notin L^\infty_\mu(X)$ (though this statement requires a proof after we have seen the formal definitions).
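To see why the domain $D_{M_g}$ is genuinely smaller than the whole space, one can take X = {1, 2, 3, ...} with counting measure and g(n) = n. This concrete choice, and the function names below, are assumptions made for illustration. Then f(n) = 1/n lies in $L^2_\mu(X)$ while gf = 1 does not, and the symmetry identity can be checked on finitely supported sequences.

```python
def partial_norm_sq(values):
    """Sum of |x|^2 over a finite list of values."""
    return sum(abs(x) ** 2 for x in values)

def g(n):
    """Unbounded real multiplier g(n) = n."""
    return float(n)

def f(n):
    """Square-summable sequence f(n) = 1/n."""
    return 1.0 / n

def norms_up_to(N):
    """Partial sums of ||f||^2 and ||g f||^2 over {1, ..., N}."""
    f_vals = [f(n) for n in range(1, N + 1)]
    gf_vals = [g(n) * f(n) for n in range(1, N + 1)]
    return partial_norm_sq(f_vals), partial_norm_sq(gf_vals)

def inner(a, b):
    """<a, b> = sum a(n) conj(b(n)) for finitely supported sequences."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

# Symmetry <M_g f1, f2> = <f1, M_g f2> on finitely supported sequences.
f1 = [1 + 2j, 0.5, -1j, 3.0]
f2 = [2.0, 1j, 1 - 1j, -0.25]
gs = [g(n) for n in range(1, 5)]
lhs = inner([gv * x for gv, x in zip(gs, f1)], f2)
rhs = inner(f1, [gv * y for gv, y in zip(gs, f2)])
```

The first partial sum returned by `norms_up_to(N)` stays bounded (by $\pi^2/6$) as N grows, while the second equals N up to rounding, so f is not in $D_{M_g}$.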
The example above as well as the following ones and Exercise 4.29 show
that unbounded self-adjoint operators cannot reasonably be required to be
defined on the whole Hilbert space. In contrast to Definition 4.27, we will in
this chapter always assume that X = H and Y = H′ are complex Hilbert
spaces, and that the domain DT ⊆ H is dense.
Definition 13.2. Let H and H′ be complex Hilbert spaces, let $D_T \subseteq H$ be a subspace, and let $T : D_T \to H'$ be a linear operator. Then we write
\[ (D_T, T) : H \longrightarrow H'. \]
If $D_T$ is a dense subspace then we say that T is a densely defined operator from H to H′. We say that T is closable if the closure $\overline{\operatorname{Graph}(T)}$ is again the graph of a densely defined operator $(D_{\overline{T}}, \overline{T}) : H \to H'$. We say that T is a closed operator if Graph(T) is closed. If $(D_T, T) : H \to H'$ and $(D_S, S) : H \to H'$ are linear operators, then we say that T is equal to S if $D_T = D_S$ and T = S, and say that S is an extension of T, written T ⊆ S, if $D_T \subseteq D_S$ and $S|_{D_T} = T$.
Of course bounded operators between two Hilbert spaces are special cases
of this definition and in this case we will keep using the notation B : H → H′ .
We note that the inverse and composition of operators will be understood
here as in set theory: If
(DT , T ) : H → H′
is injective, then
(DT −1 , T −1 ) : H′ → H
is simply the inverse map, and it is densely defined if DT −1 = T (DT ) is dense
in H′ . If (DT , T ) : H → H′ and (DS , S) : H′ → H′′ are densely defined
operators, then
DST = {v ∈ DT | T v ∈ DS }
is a subspace and ST : DST → H′′ is linear, but in general it is not clear
whether this defines a densely defined operator.
Lemma 13.3 (Adjoint operator). Let $(D_T, T) : H \to H'$ be a densely defined operator between complex Hilbert spaces. Then there exists a closed operator $(D_{T^*}, T^*) : H' \to H$, called the adjoint, satisfying
\[ \langle T v, w \rangle_{H'} = \langle v, T^* w \rangle_H \]
for all $v \in D_T$ and all vectors w belonging to the domain
\[ D_{T^*} = \bigl\{ w \in H' \mid D_T \ni v \longmapsto \langle T v, w \rangle_{H'} \text{ is bounded} \bigr\}. \]
Moreover, $T^*$ is densely defined if and only if T is closable. In this case the adjoint of the adjoint, $T^{**}$, is equal to the closure $\overline{T}$ of the operator T.
We will prove this lemma together with Lemma 13.8, but only after we
have seen a few more examples. We note again that in the lemma above
and in the following definition equality of operators entails equality of their
domains.
Definition 13.4. Let (DT , T ) : H → H be a densely defined operator on a

complex Hilbert space. If T = T ∗ then T is said to be self-adjoint.
Essential Exercise 13.5. (a) Check that Example 13.1 indeed defines a self-
adjoint operator in the sense of Definition 13.4.
(b) When does a complex-valued measurable function on a σ-finite meas-
ure space (X, B, µ) define a densely defined, closable, closed, self-adjoint, or
bounded multiplication operator?
Example 13.6. Let $\frac{d}{dx} : C_c^\infty(\mathbb{R}) \to H = L^2(\mathbb{R})$ be the differentiation operator, and define an operator T by
\[ \operatorname{Graph}(T) = \overline{\operatorname{Graph}\bigl( \tfrac{d}{dx} \bigr)}. \]
By Definitions 5.7, 5.14, and the properties of the weak derivative (Lemma 5.10 applied with d = k = 1) this indeed defines a map
\[ T : D_T = H_0^1(\mathbb{R}) \longrightarrow L^2(\mathbb{R}). \]
The map $iT : D_{iT} = D_T \to L^2(\mathbb{R})$ can be checked to be an unbounded self-adjoint operator which is conjugate to an unbounded self-adjoint multiplication operator as in Example 13.1. The unitary isomorphism is given by the Fourier transform: by Proposition 9.43 we have
\[ \widehat{\tfrac{d}{dx} f}(t) = 2\pi i t \widehat{f}(t) \]
for $f \in C_c^\infty(\mathbb{R})$. From this one can deduce that
\[ \widehat{D_T} = \widehat{H_0^1(\mathbb{R})} = D_{M_g} = \{ f \in L^2(\mathbb{R}) \mid \| t f(t) \|_2 < \infty \} \]
where g(t) = −2πt, and that the diagram
\[
\begin{array}{ccc}
D_{iT} = D_T & \xrightarrow{\ iT\ } & L^2(\mathbb{R}) \\
\big\downarrow \widehat{\ \cdot\ } & & \big\downarrow \widehat{\ \cdot\ } \\
D_{M_g} = \widehat{D_T} & \xrightarrow{\ M_g\ } & L^2(\mathbb{R})
\end{array}
\]
commutes and completely describes iT (and hence T and $D_T = H_0^1(\mathbb{R})$) in terms of a multiplication operator (and its domain). We refer to Exercise 9.63 and its hints on p. 578.
The following exercise shows that we have to be more careful about the
domain of unbounded operators, in contrast to the discussions in the previous
chapters which mostly involved bounded operators. In fact, the principle of
automatic extension (Proposition 2.59) has been used extensively throughout
the text but fails in many ways for the unbounded operators we have just
introduced.
Exercise 13.7. Let X = (0, 1) and consider again the operator
\[ \frac{d}{dx} : C_c^\infty((0, 1)) \longrightarrow L^2((0, 1)). \]
(a) Recall that $T_0 : H_0^1((0, 1)) \to L^2((0, 1))$ sending f to $\partial_1 f$ extends the operator $\frac{d}{dx}$ to a closed operator $(D_{T_0}, T_0) : L^2((0, 1)) \to L^2((0, 1))$.
(b) Recall that $T_p : H^1(\mathbb{T}) \to L^2(\mathbb{T})$ sending f to $\partial_1 f$ also extends the operator $\frac{d}{dx}$ to a closed operator $(D_{T_p}, T_p) : L^2((0, 1)) \to L^2((0, 1))$.
(c) Show that $T_0 \subsetneq T_p = -T_p^* \subsetneq -T_0^*$, and describe $(D_{T_0^*}, T_0^*)$.
We will base our discussion of the spectral theory of self-adjoint operators

on the following lemma.
Lemma 13.8 (Orthogonal decomposition into two graphs). Let
\[ (D_T, T) : H \to H' \]
be a closed densely defined operator between two complex Hilbert spaces. The orthogonal complement of the closed set
\[ \operatorname{Graph}(T) \subseteq H \times H' \]
is given by $\widetilde{\operatorname{Graph}(T^*)}$, where
\[ \widetilde{\ \cdot\ } : H' \times H \longrightarrow H \times H', \qquad (w, v) \longmapsto (v, -w). \]
Proof of Lemmas 13.3 and 13.8. Let $(D_T, T) : H \to H'$ be a densely defined operator. Notice that if $w \in D_{T^*}$, so that the linear map
\[ D_T \ni v \mapsto \langle T v, w \rangle_{H'} \]
is bounded by definition, then this linear functional can be uniquely extended from the dense subset $D_T$ to H. In particular, by Fréchet–Riesz representation (Corollary 3.19) there exists a uniquely defined $T^* w \in H$ with
\[ \langle T v, w \rangle_{H'} = \langle v, T^* w \rangle_H \tag{13.1} \]
for all $v \in D_T$. It is easy to check that $D_{T^*}$ is a linear subspace and that this defines the linear operator $T^* : D_{T^*} \to H$.
For the proof of Lemma 13.3 we wish to show next that $T^*$ is closed. For this it is useful to first prove that
\[ \operatorname{Graph}(T)^\perp = \widetilde{\operatorname{Graph}(T^*)}, \tag{13.2} \]
which in particular will imply Lemma 13.8. Let $w \in D_{T^*}$ so that (13.1) holds for all $v \in D_T$. By definition,
\[ (T^* w, -w) \in \widetilde{\operatorname{Graph}(T^*)} \]
and
\[ \langle (v, T v), (T^* w, -w) \rangle_{H \times H'} = \langle v, T^* w \rangle_H - \langle T v, w \rangle_{H'} = 0 \]
for all $v \in D_T$. On the other hand, if $(v', -w) \in \operatorname{Graph}(T)^\perp$ so that
\[ \langle (v, T v), (v', -w) \rangle_{H \times H'} = \langle v, v' \rangle_H - \langle T v, w \rangle_{H'} = 0 \]
for all $v \in D_T$, then $D_T \ni v \mapsto \langle T v, w \rangle_{H'}$ is bounded. Thus we have $w \in D_{T^*}$ and $v' = T^* w$, so
\[ (v', -w) = (T^* w, -w) \in \widetilde{\operatorname{Graph}(T^*)}. \]
Hence Lemma 13.8 follows, and in particular $T^*$ is a closed operator.
We now show that $T^*$ is densely defined if and only if T is closable. For this, note that $w_0 \in (D_{T^*})^\perp$ if and only if
\[ \langle (0, w_0), (T^* w, -w) \rangle_{H \times H'} = 0 \]
for all $w \in D_{T^*}$, or equivalently $(0, w_0) \in \bigl( \widetilde{\operatorname{Graph}(T^*)} \bigr)^\perp$. By (13.2) and the characterization of the closed linear hull in Corollary 3.26 this is in turn equivalent to
\[ (0, w_0) \in \overline{\operatorname{Graph}(T)}. \]
If now T is closable, then $(0, w_0) \in \overline{\operatorname{Graph}(T)}$ implies that $w_0 = 0$ and so $(D_{T^*})^\perp = \{0\}$ and thus $\overline{D_{T^*}} = H'$. On the other hand, if T is not closable, then there exists a non-zero vector $(0, w_0) \in \overline{\operatorname{Graph}(T)}$ and so the element $w_0 \in (D_{T^*})^\perp$ shows that $T^*$ is not densely defined.
For the final remark of Lemma 13.3 we apply (13.2) to T and to $T^*$ (and also note that the operator $\widetilde{\ \cdot\ }$ is unitary and $\widetilde{\widetilde{(v, w)}} = -(v, w)$ for all $(v, w) \in H \times H'$) to see that
\[
\begin{aligned}
\overline{\operatorname{Graph}(T)} = \operatorname{Graph}(T)^{\perp\perp} &= \bigl( \widetilde{\operatorname{Graph}(T^*)} \bigr)^\perp \\
&= \widetilde{\operatorname{Graph}(T^*)^\perp} = \widetilde{\widetilde{\operatorname{Graph}(T^{**})}} = \operatorname{Graph}(T^{**}),
\end{aligned}
\]
as claimed.
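The orthogonality behind (13.2) can be checked directly in finite dimensions, where every operator is bounded and everywhere defined. The sketch below assumes $H = \mathbb{C}^2$, $H' = \mathbb{C}^3$ and an arbitrarily chosen matrix T; the matrix, the vectors and the helper names are illustrative assumptions.

```python
def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def inner(a, b):
    """<a, b> = sum a_i conj(b_i)."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

def graph_pairing(T, v, w):
    """<(v, Tv), (T*w, -w)> in H x H', which equals <v, T*w> - <Tv, w>."""
    Tv = mat_vec(T, v)
    Tsw = mat_vec(adjoint(T), w)
    return inner(v + Tv, Tsw + [-x for x in w])

T = [[1 + 1j, 2], [0, -3j], [0.5, 1j]]  # a linear map from C^2 to C^3
v = [1 - 2j, 0.5j]                      # element of H = C^2
w = [2, 1 + 1j, -1j]                    # element of H' = C^3
pairing = graph_pairing(T, v, w)
```

In this setting Graph(T) has dimension 2 and the image of Graph(T*) under the tilde map has dimension 3, so together they account for all of the 5-dimensional space H × H′, in line with Lemma 13.8.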
13.2 Operators of the Form T ∗ T
As we have seen, differentiation can often be used to define a closed operator T

which sends a function to its total derivative. Moreover, T ∗ is then often the
negative of the divergence on vector fields, so that T ∗ T is often some kind of
Laplace operator. This observation also holds true in other cases more general
than those considered here, and this motivates the following discussion.
Theorem 13.9 (Spectral theory of $T^*T$). Let $(D_T, T) : H \to H'$ be a densely defined closed linear operator. Then $(D_{T^*T}, T^*T) : H \to H$ is a densely defined self-adjoint operator which is unitarily isomorphic to a multiplication operator
\[ (D_{M_g}, M_g) : L^2_\mu(X) \longrightarrow L^2_\mu(X) \]
for some finite measure space $(X, \mu)$ and measurable function $g : X \to [0, \infty)$.
Proof. The proof of the theorem essentially comprises a careful analysis of

Figure 13.1.
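[Figure 13.1 omitted; it shows the points $(v, 0)$ and $(w, 0)$ in $H$, the point $(w, Tw)$ on $\operatorname{Graph}(T)$, and the subspace $\operatorname{Graph}(T)^{\perp} = \widetilde{\operatorname{Graph}(T^*)}$.]
Fig. 13.1: We obtain a bounded operator $B : H \to H$ by sending $v$ to $w = Bv$ using two orthogonal projections.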
Let us write PGraph : H × H′ → H × H′ for the orthogonal projection onto

the closed subspace Graph(T ) ⊆ H × H′ , ıH : H → H × H′ for the embedding
map v 7→ (v, 0), and PH : H × H′ → H for the projection map (v, w) 7→ v.
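Note that $v, w \in H$ and $w' \in H'$ implies
\[ \langle \imath_H(v), (w, w') \rangle_{H \times H'} = \langle v, w \rangle_{H} = \langle v, P_H(w, w') \rangle_{H} \]
so that $\imath_H^* = P_H$. Also note that $P_{\mathrm{Graph}}^* = P_{\mathrm{Graph}} = P_{\mathrm{Graph}}^2$. Now define
\[ B = P_H \circ P_{\mathrm{Graph}} \circ \imath_H, \]
so that
\[ B^* = \imath_H^* \circ P_{\mathrm{Graph}}^* \circ P_H^* = P_H \circ P_{\mathrm{Graph}} \circ \imath_H = B \]
is self-adjoint. Moreover,
\[ \|B\| \leq \|P_H\| \, \|P_{\mathrm{Graph}}\| \, \|\imath_H\| = 1. \tag{13.3} \]
Also, by definition,
\[ \langle Bv, v \rangle_{H} = \langle P_H P_{\mathrm{Graph}} \imath_H(v), v \rangle_{H} = \langle P_{\mathrm{Graph}} \imath_H(v), P_{\mathrm{Graph}} \imath_H(v) \rangle_{H \times H'} \geq 0 \tag{13.4} \]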
for any v ∈ H. To summarize, B : H → H is a positive self-adjoint bounded

operator with spectrum in [0, 1].
We now relate B to T ∗ T , after which we can simply apply Theorem 12.55

to B and obtain the spectral theorem for T ∗ T . In fact, we claim that†
\[ B = (I + T^*T)^{-1}, \tag{13.5} \]
or more precisely that

(a) $(I + T^*T)B = I$ and, in particular, $\operatorname{im}(B) \subseteq D_{T^*T}$;
(b) $B(I + T^*T) = I_{D_{T^*T}}$ and, in particular, $D_{T^*T} \subseteq \operatorname{im}(B)$.
Together this implies that $D_{T^*T} = \operatorname{im}(B)$, that $B$ is injective, and finally that
\[ T^*T = B^{-1} - I \]
is completely determined by the operator $B$.
To prove (a) we chase the equations defining $B$ (see Figure 13.1). Let $v \in H$ and $w = Bv$ so that (by definition) $w \in D_T$, $(w, Tw) \in \operatorname{Graph}(T)$, and
\[ (w, Tw) - (v, 0) \in \operatorname{Graph}(T)^{\perp} = \widetilde{\operatorname{Graph}(T^*)} \]
by Lemma 13.8. This gives
\[ (w - v, Tw) = (w, Tw) - (v, 0) = (-T^*Tw, Tw), \]
so $w \in D_{T^*T}$ and $w - v = -T^*Tw$, or equivalently
\[ (I + T^*T)Bv = w + T^*Tw = v. \]
To prove (b), we essentially use the same formulas. Fix $w \in D_{T^*T}$ and define $v = w + T^*Tw$. Then
\[ (w, Tw) \in \operatorname{Graph}(T), \qquad (T^*Tw, -Tw) \in \widetilde{\operatorname{Graph}(T^*)}, \]
and
\[ (v, 0) = (w, Tw) + (T^*Tw, -Tw), \]
which implies that $w = Bv = B(I + T^*T)w$, as claimed.
Now apply Theorem 12.55 to $B$ to find a finite measure space $(X, \mu)$ and some bounded measurable function $h \in L^\infty_\mu(X)$ so that $B$ and $M_h$ are unitarily isomorphic. Since $B$ is injective and satisfies (13.3)–(13.4), $h$ takes values in $(0, 1]$ $\mu$-almost everywhere. After modifying $h$ on a null set we may therefore assume that $h$ takes values in $(0, 1]$ everywhere.
Using the same isomorphism $\phi$ we claim that $T^*T$ is isomorphic to $M_g$ for $g = \frac{1}{h} - 1$. Indeed,
\[ \phi(D_{T^*T}) = \phi(\operatorname{im}(B)) = \operatorname{im}(M_h) = \{ f \in L^2_\mu(X) \mid \tfrac{1}{h} f \in L^2_\mu(X) \} = D_{M_g}, \]
and, since $T^*Tw = B^{-1}w - w$ for all $w \in D_{T^*T}$, we also see that
\[ \phi(T^*Tw) = \phi(B^{-1}w) - \phi(w) = M_h^{-1}\phi(w) - \phi(w) = M_g\phi(w) \]
for all such $w$. Applying Exercise 13.5(a) or 13.10 gives the theorem.
† The alert reader may at this point feel a sense of déjà vu (cf. Exercise 13.12).
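The construction of $B$ can be checked by hand in finite dimensions. The following is a minimal sketch, assuming $H = \mathbb{R}^2$, $H' = \mathbb{R}^k$ and a real matrix $T$ (so that $T^*$ is the transpose); the helper names and the example matrix are illustrative choices and not part of the text. It computes $B = (I + T^*T)^{-1}$ with exact rational arithmetic and evaluates the inner product of $(Bv - v, TBv)$ with a graph element $(u, Tu)$, which the argument for (a) says must vanish.

```python
from fractions import Fraction as F

def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def mat_vec(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def inverse_2x2(m):
    """Inverse of an invertible 2-by-2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent_B(T):
    """B = (I + T^t T)^{-1} for a real k-by-2 matrix T (H = R^2, H' = R^k)."""
    s = mat_mul(transpose(T), T)
    return inverse_2x2([[1 + s[0][0], s[0][1]], [s[1][0], 1 + s[1][1]]])

def graph_defect(T, v, u):
    """Inner product in H x H' of (Bv - v, TBv) with the graph element (u, Tu)."""
    w = mat_vec(resolvent_B(T), v)
    first = sum((wi - vi) * ui for wi, vi, ui in zip(w, v, u))
    second = sum(x * y for x, y in zip(mat_vec(T, w), mat_vec(T, u)))
    return first + second

# Hypothetical example data: a 3-by-2 matrix with rational entries.
T = [[F(1), F(2)], [F(0), F(3)], [F(-1), F(1)]]
v = [F(1), F(-4)]
print(resolvent_B(T))
print(graph_defect(T, v, [F(2), F(5)]))  # orthogonality of (Bv - v, TBv) to Graph(T)
```

Exact fractions are used so that the orthogonality relation can be tested for equality rather than up to rounding.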
Exercise 13.10. Let B : H → H be an injective self-adjoint bounded operator on a

Hilbert space H. Show directly that the inverse (DB−1 , B −1 ) : H → H is a self-adjoint
operator with domain DB−1 = im(B).
Exercise 13.11 (The influence of the domain). Let $H = L^2((0, 1))$.
(a) Let $(D_{T_0}, T_0) = \big(H^1_0((0, 1)), \frac{d}{dx}\big)$ be the weak derivative map restricted to the space $H^1_0((0, 1))$. Show that $T_0^* T_0$ equals the negative of the second weak derivative on
\[ D_{T_0^* T_0} = H^1_0((0, 1)) \cap H^2((0, 1)) \]
(these are the Dirichlet boundary conditions), and that its eigenfunctions are (scalar multiples of) the functions $x \mapsto \sin(\pi n x)$ for $n \in \mathbb{N}$.
(b) Let $(D_T, T) = \big(H^1((0, 1)), \frac{d}{dx}\big)$. Show that $T^*T$ coincides with the negative of the second weak derivative on
\[ D_{T^*T} = \{ f \in H^2((0, 1)) \mid f' \in H^1_0((0, 1)) \} \]
(which are the Neumann boundary conditions), and that its eigenfunctions are the functions $x \mapsto \cos(\pi n x)$ for $n \in \mathbb{N}_0$.
(c) Let $(D_{T_p}, T_p) = \big(H^1(\mathbb{T}), \frac{d}{dx}\big)$. Show that $T_p^* T_p$ coincides with $-\Delta$ on $H^2(\mathbb{T})$ (which corresponds to the periodic boundary conditions).
(d) Show that $T_0^* T_0$, $T^*T$, and $T_p^* T_p$ are all different and no one extends any other.
Exercise 13.12. Compare the general construction of this section to the arguments of
Section 6.4.2.
Exercise 13.13. Let $G = (V, E)$ be an undirected simple graph as in Section 10.4 (but possibly infinite) such that any $v \in V$ has finitely many neighbours, and let $\vec{E}$ be the set of oriented edges as in Section 12.2. Let $H = L^2(V)$ and $H' = L^2(\vec{E})$, where we simply use the counting measure on the vertices in $V$ and the edges in $\vec{E}$. Now define
\[ (Tf)((v_1, v_2)) = f(v_2) - f(v_1) \]
for any edge $(v_1, v_2) \in \vec{E}$ (with $v_1 \neq v_2 \in V$) and for any function $f$ on $V$, giving an operator $(D_T, T) : L^2(V) \to L^2(\vec{E})$.
(a) Show that $T$ is a bounded operator if and only if there exists some $N \in \mathbb{N}$ such that every $v \in V$ has at most $N$ neighbours.
(b) Describe the operators $T^*$ and $T^*T$ where they are defined.
Exercise 13.14. Let G = (V, E) be a finite undirected graph as in Section 10.4, but now
glue for every edge e ∈ E connecting two vertices v1 , v2 ∈ V a compact line segment Se of
length ℓe > 0 between v1 and v2 . We assume that the graph is undirected and we put for
any two vertices at most one line segment linking them directly. This defines a topological
space Q, called a metric graph, consisting of a network of compact line segments (one for
each edge in the graph) that are glued together at the vertices of the graph. Endow Q with
the measure obtained from using the Lebesgue measure on each line segment Se , which in
particular leads to
\[ L^2(Q) = \sum_{e \in E} L^2(S_e). \]
Define $H^1(Q)$ to be the space of all continuous functions on $Q$ such that the restriction to the compact line segment $S_e \subseteq Q$ belongs to $H^1(S_e)$ for every edge $e \in E$. Define the operator $T : H^1(Q) \to L^2(Q)$ by setting $(Tf)|_{S_e} = \partial f_e$, where $f_e = f|_{S_e}$ and the weak derivative is taken in $H^1(S_e)$ with respect to the fixed orientation on $S_e$. Then
\[ (H^1(Q), T) : L^2(Q) \to L^2(Q) \]
is a densely defined operator (check this). The study of the eigenfunctions of T ∗ T is called
the theory of quantum graphs.
(a) Describe the operators T ∗ and T ∗ T and their domains, especially in relationship to
the behaviour of the functions in the domain at the vertices.
(b) Show that there exists an orthonormal basis of L2 (Q) consisting of eigenfunctions
of T ∗ T .
(c) Assume now that G consists of four vertices with one vertex in the centre and three ver-
tices connected to it. Prove a version of Weyl’s law for the operator T ∗ T on the associated
quantum graph.
13.3 Self-Adjoint Operators
Using the construction from the last section we can also prove the spectral
theorem for general self-adjoint operators.
Theorem 13.15. Let $(D_T, T) : H \to H$ be a densely defined self-adjoint operator. Then there exists a finite measure space $(X, \mu)$ and a real-valued measurable function $g : X \to \mathbb{R}$ such that $(D_T, T)$ is unitarily isomorphic to $(D_{M_g}, M_g)$, meaning that there is a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that $\phi(D_T) = D_{M_g}$ and the diagram
\[ \begin{array}{ccc} H \supseteq D_T & \xrightarrow{\ T\ } & H \\ \downarrow{\scriptstyle\phi} & & \downarrow{\scriptstyle\phi} \\ L^2_\mu(X) \supseteq D_{M_g} & \xrightarrow{\ M_g\ } & L^2_\mu(X) \end{array} \]
commutes.
Since a self-adjoint operator T as in Theorem 13.15 is also closed, it is clear
that we could directly apply the method of the previous section to T . Note,
however, that a simple application of Theorem 13.9 only gives a description
of T 2 , which does not allow a description of T . In fact, T 2 has a potentially
smaller domain, and may have lost some information about T (namely the
sign of eigenvalues or approximate eigenvalues). To compensate we will study
two operators: B as in the previous section, and A = T B, as in Figure 13.2.
Proof of Theorem 13.15. Let $B = (I + T^*T)^{-1} = (I + T^2)^{-1}$ be as in the proof of Theorem 13.9. We also define
\[ A = TB = P_{H,2} \circ P_{\mathrm{Graph}} \circ \imath_H, \]
where $P_{H,2}(v, w) = w$ is the projection to the second copy of $H$ in $H \times H$, see Figure 13.2.
[Figure 13.2 omitted; it shows $\operatorname{Graph}(T)$ together with the points $(v, 0)$, $(w, 0) = (Bv, 0)$, $(w, Tw)$ and $(0, Tw) = (0, Av)$.]
Fig. 13.2: For the proof of Theorem 13.15 we study the operators $A$ and $B$.
Below we will apply Theorem 12.60 to the bounded operators A and B,
and to do this we first have to show that A is normal (in fact, it is self-adjoint;
this is something we already know for B by the proof of Theorem 13.9), and
that A and B commute. To prepare for this, we first claim that
BT ⊆ T B = A. (13.6)
To prove the claim, fix $w \in D_T$ and define $v = Bw$ so that
\[ (I + T^2)v = w \]
by (13.5). Since $w \in D_T$ this shows that $v \in D_{T^3}$ and
\[ (I + T^2)Tv = T(I + T^2)v = Tw, \]
which by (13.5) means that $BTw = Tv$. Since $v = Bw$, we have shown that
\[ BTw = TBw \]
for all $w \in D_T$ and hence the claim in (13.6).

To prove that $A^* = A$ we argue as follows. For $w \in D_T$ and $v \in H$ we have
\[ \langle Av, w \rangle = \langle TBv, w \rangle = \langle Bv, Tw \rangle = \langle v, BTw \rangle = \langle v, Aw \rangle \]
since $T$ and $B$ are self-adjoint and by (13.6). Thus $A^*w = Aw$ for all elements $w$ of the dense subset $D_T \subseteq H$. Since $A$ is a bounded operator, it follows that $A^* = A$.
Moreover,
BA = BT B ⊆ T BB = AB
by (13.6). Since both AB and BA are defined on all of H this shows that A
and B commute.
Next we have to show that A and B together uniquely determine T (so
that when A and B are realized as multiplication operators we have some
hope of deducing a similar realization for T ). We claim that T = B −1 A and,
in particular,
DT = DB −1 A = {v ∈ H | Av ∈ im(B)}.
To see this, note that B −1 B = I since B is injective and hence
T = B −1 BT ⊆ B −1 T B = B −1 A
by (13.6). For the converse recall the construction of B in the proof of The-
orem 13.9 (see also Figures 13.2 and 13.3) and the definition of A. With these
we obtain
\[ (Bv, Av) = (Bv, TBv) \in \operatorname{Graph}(T), \tag{13.7} \]
\[ (Bv - v, Av) = (Bv - v, TBv) \in \operatorname{Graph}(T)^{\perp} \]
for any $v \in H$. Let $v \in D_{B^{-1}A}$, and replace the latter instance of $v$ with $B^{-1}Av$ to obtain
\[ (Av - B^{-1}Av, TAv) \in \operatorname{Graph}(T)^{\perp}. \]
Since $T$ is self-adjoint, Lemma 13.8 shows that $\operatorname{Graph}(T)^{\perp} = \widetilde{\operatorname{Graph}(T)}$ so that we have equivalently
\[ (T^2Bv, -TBv + B^{-1}Av) \in \operatorname{Graph}(T). \tag{13.8} \]
Taking the sum of (13.7) and (13.8) and using the identity (I + T 2 )B = I
gives
(v, B −1 Av) ∈ Graph(T ).
Thus v ∈ DT and T v = B −1 Av, as claimed (see Figure 13.3).
Now we apply Theorem 12.60 to A and B to obtain a finite measure
space (X, µ) and two functions gA : X → R and gB : X → (0, ∞) such
that A and B are conjugate to MgA and MgB , respectively. Since we have
shown that DT and T are purely defined in terms of A and B, we can finally
use the same unitary isomorphism φ to describe (DT , T ) as follows:

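\begin{align*}
\phi(D_T) &= \phi\big(\{ v \in H \mid Av \in \operatorname{im}(B) \}\big) \\
&= \{ f \in L^2_\mu(X) \mid M_{g_A}(f) \in \operatorname{im}(M_{g_B}) \} \\
&= \Big\{ f \in L^2_\mu(X) \;\Big|\; \frac{g_A}{g_B} f \in L^2_\mu(X) \Big\} = D_{M_g},
\end{align*}
where we set $g = \frac{g_A}{g_B}$, and also
\[ \phi(Tv) = \phi(B^{-1}Av) = M_{g_B}^{-1} M_{g_A} \phi(v) = M_g \phi(v) \]
for all $v \in D_T$.
[Figure 13.3 omitted; it shows $\operatorname{Graph}(T)$ and $\operatorname{Graph}(T)^{\perp}$ together with the points $(v, 0)$, $(Bv, 0)$, $(0, Av)$, $(0, B^{-1}Av)$ and $(Bv - v, Av)$.]
Fig. 13.3: As the proof of Theorem 13.15 shows, the two marked segments are translates of each other.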
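To see why both $A$ and $B$ are needed, it may help to look at a single point of $X$ at which $T$ acts as multiplication by a real number $t$. The following is a minimal sketch under that assumption (the function names are illustrative and not from the text): there $B$ acts as $1/(1+t^2)$ and $A = TB$ as $t/(1+t^2)$, so $g_B$ forgets the sign of $t$ while the quotient $g_A/g_B$ recovers $t$.

```python
from fractions import Fraction as F

def g_B(t):
    """Multiplier of B = (I + T^2)^{-1} at a point where T acts as t."""
    return 1 / (1 + t * t)

def g_A(t):
    """Multiplier of A = T B at the same point."""
    return t * g_B(t)

def g(t):
    """Multiplier of T recovered as the quotient g_A / g_B."""
    return g_A(t) / g_B(t)

for t in [F(-3), F(-1, 2), F(0), F(2), F(7, 3)]:
    # g_B cannot distinguish t from -t, while g_A carries the sign.
    print(t, g_B(t), g_A(t), g(t))
```

Note that $g_B$ takes values in $(0, 1]$ and $|g_A| \leq \frac{1}{2}$, matching the fact that $A$ and $B$ are bounded even when $T$ is not.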
Exercise 13.16 (Schur’s lemma for densely defined closed operators). Assume that $\pi_1 : G \circlearrowright H_1$ and $\pi_2 : G \circlearrowright H_2$ are unitary representations of a topological group $G$ such that $\pi_1$ is irreducible. Moreover, assume that $(D_T, T) : H_1 \to H_2$ is a densely defined closed operator satisfying $\pi_1(g)D_T \subseteq D_T$ and $T\pi_1(g) = \pi_2(g)T$ on $D_T$ for all $g \in G$. Show that $D_T = H_1$ and that $T$ is bounded, and deduce that the conclusions of Schur’s lemma (Exercise 12.58) hold in this setting.
13.4 Symmetric Operators
In this section we will discuss another class of unbounded operators appearing

in applications, which is closely related to the class of self-adjoint operators
discussed above. The requirement that the operator be densely defined in
Definition 13.17 is sometimes dropped, and in the physics literature the term
Hermitian is sometimes used for symmetric.
Definition 13.17. A densely defined operator $(D_S, S) : H \to H$ on a Hilbert space $H$ is called symmetric if $\langle Su, v \rangle = \langle u, Sv \rangle$ for all $u, v \in D_S$, or equivalently if $S \subseteq S^*$.
Because of the satisfyingly complete description of self-adjoint operators in

the previous section, it is often useful to extend a given symmetric operator
to a self-adjoint operator. As we will see, this is sometimes but not always
possible.
13.4.1 The Friedrichs Extension
Theorem 13.18. Let $H$ be a complex Hilbert space and let $(D_S, S) : H \to H$ be a densely defined symmetric operator that is also positive in the sense that $\langle Su, u \rangle \geq 0$ for all $u \in D_S$. Then there exists a positive self-adjoint extension $S_F \supseteq S$.
Exercise 13.19 (Quantum harmonic oscillator). Define an operator $H$ with domain $D_H = \mathcal{S}(\mathbb{R}) \subseteq L^2(\mathbb{R})$ by $H(f)(x) = -\frac{1}{2}\frac{d^2}{dx^2} f(x) + \frac{1}{2} x^2 f(x)$. Show that $H$ is positive and symmetric but unbounded.
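Proof of Theorem 13.18. Let $(D_S, S) : H \to H$ be as in the theorem. For any $u, v \in D_S$ we define the semi-inner product $\langle \cdot, \cdot \rangle_S$ by
\[ \langle u, v \rangle_S = \langle Su, v \rangle = \langle u, Sv \rangle, \]
and let $\|u\|_S = \sqrt{\langle u, u \rangle_S}$ be the induced semi-norm. We let $V_0 \subseteq D_S$ denote the kernel of $\|\cdot\|_S$ and define $H_S$ to be the completion of $D_S/V_0$ with respect to $\|\cdot\|_S$. We denote the extension of $\langle \cdot, \cdot \rangle_S$ (and of $\|\cdot\|_S$) to the completion again by $\langle \cdot, \cdot \rangle_S$ (and $\|\cdot\|_S$), and the canonical map from $D_S$ to $H_S$ by $\imath_0$. We claim that $\imath_0$ is closable (as in Definition 13.2) and will write $\imath$ for the closed operator with the property that
\[ \operatorname{Graph}(\imath) = \overline{\operatorname{Graph}(\imath_0)} \subseteq H \times H_S. \]
To see that $\overline{\operatorname{Graph}(\imath_0)}$ is indeed a graph we assume that a sequence $((u_n, \imath_0 u_n))$ in $\operatorname{Graph}(\imath_0)$ converges to $(0, v)$ in $H \times H_S$. For any $w \in D_S$ we then have
\[ \langle v, \imath_0 w \rangle_S = \lim_{n \to \infty} \langle \imath_0 u_n, \imath_0 w \rangle_S = \lim_{n \to \infty} \langle u_n, Sw \rangle = 0 \]
since $\lim_{n \to \infty} u_n = 0$ with respect to $\|\cdot\|$. Since $\imath_0(D_S)$ is dense in $H_S$ we see that $v = 0$, as required.
Since $(D_\imath, \imath) : H \to H_S$ is densely defined and closed, it follows from Theorem 13.9 that $\imath^*\imath$ is a densely defined self-adjoint operator. We claim that $S \subseteq \imath^*\imath$. Suppose therefore that $u \in D_S$. Then
\[ \langle Su, v \rangle = \langle \imath_0 u, \imath_0 v \rangle_S = \langle \imath u, \imath v \rangle_S \]
for all $v \in D_S = D_{\imath_0}$. However, by the density of $\operatorname{Graph}(\imath_0)$ in $\operatorname{Graph}(\imath)$ this equality extends, giving
\[ \langle Su, v \rangle = \langle \imath u, \imath v \rangle_S \]
for all $v \in D_\imath$. Hence for $u \in D_S$ the map
\[ D_\imath \ni v \longmapsto \langle \imath v, \imath u \rangle_S = \langle v, Su \rangle \]
is bounded with respect to $\|\cdot\|$. This implies that $\imath u \in D_{\imath^*}$ so $u \in D_{\imath^*\imath}$ and
\[ Su = \imath^*\imath u, \]
as required. Thus $S_F = \imath^*\imath$ is a self-adjoint extension of $S$, which is clearly positive.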
The following exercise gives some properties of the Friedrichs extension.
Exercise 13.20. Using the same notation as in the proof of Theorem 13.18, show in turn
the following statements.
(a) $\operatorname{Graph}(\imath)$ is isomorphic to the completion $H_1$ of $D_S$ with respect to the norm derived from the inner product $\langle u, v \rangle_1 = \langle u, v \rangle + \langle Su, v \rangle$ for $u, v \in D_S$, and $H_1$ can be identified with the subspace $D_\imath$ of $H$ allowing us to write $D_S \subseteq H_1 \subseteq H$.
(b) The domain of $S_F = \imath^*\imath$ consists of all $u \in H_1$ such that $H_1 \ni v \longmapsto \langle v, u \rangle_1$ is bounded with respect to $\|\cdot\|$, and in that case $\langle v, u \rangle_1 = \langle v, u + S_F u \rangle$ for all $v \in H_1$.
13.4.2 Cayley Transform and Deficiency Indices
We finish this chapter (and hence our discussion of spectral theory) with a
series of exercises concerning work going back to von Neumann on the ex-
istence of self-adjoint extensions of a general symmetric operator. The main
tool for this discussion is the Cayley transform, the definition of which may
at first be a little surprising. To motivate the definition we recall that the
spectrum of a self-adjoint bounded operator is a compact subset of R. Gener-
alizing this definition we suppose now that (DT , T ) : H → H is a self-adjoint
operator on a complex Hilbert space H and define its resolvent set by
ρ(T ) = {λ ∈ C | T − λI = B −1 for some B ∈ B(H)}
and its spectrum by $\sigma(T) = \mathbb{C} \setminus \rho(T)$.

Exercise 13.21. Let (DT , T ) : H → H be a self-adjoint operator on a complex Hilbert
space H. Show that σ(T ) ⊆ R.
Next note that the function $\phi(z) = \frac{z - i}{z + i}$ maps $\mathbb{R}$ bijectively onto the unit circle with the point 1 removed, suggesting a way to associate a unitary operator to a self-adjoint operator.
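The scalar map can be explored numerically. The following is a minimal sketch (the function names are illustrative): it evaluates $\phi(t) = (t - i)/(t + i)$ at a few real $t$, checks that the image lies on the unit circle and avoids the point 1, and applies the inverse map $u \mapsto i(1 + u)/(1 - u)$, obtained by solving $u = \phi(z)$ for $z$.

```python
def cayley(z):
    """The map z -> (z - i) / (z + i)."""
    return (z - 1j) / (z + 1j)

def inverse_cayley(u):
    """The inverse map u -> i (1 + u) / (1 - u), defined for u != 1."""
    return 1j * (1 + u) / (1 - u)

for t in [-100.0, -1.0, 0.0, 0.5, 3.0]:
    u = cayley(t)
    # |u| should equal 1, u should differ from 1, and the inverse should return t.
    print(t, abs(u), abs(u - 1), inverse_cayley(u).real)
```

The same algebra, with $z$ replaced by an operator, is what lies behind the formulas $(S - iI)(S + iI)^{-1}$ and $i(I + U)(I - U)^{-1}$ for the operator versions of these maps.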
Exercise 13.22. Let (DT , T ) : H → H be a self-adjoint operator on a complex Hilbert
space H.
(a) Show that T + iI is injective and im(T + iI) = H.
(b) Show that U (T v + iv) = T v − iv for v ∈ DT defines a unitary operator U : H → H.
The Cayley transform generalizes Exercise 13.22 to the setting of symmet-

ric operators, where we will have to stop relying on the spectral theorem.
Let (DS , S) : H → H be a symmetric operator on a complex Hilbert space.
The Cayley transform of S is the operator
US (Sv + iv) = Sv − iv
for all $v \in D_S$ with natural domain $D_{U_S} = \{ Sv + iv \mid v \in D_S \}$. The Cayley transform is a partial isometry with the properties that $I - U_S$ is injective and that $\operatorname{im}(I - U_S) = \{ v - U_S v \mid v \in D_{U_S} \}$ is dense in $H$. In summary, we may write $U_S = (S - iI)(S + iI)^{-1}$.
Essential Exercise 13.23. Let H and (DS , S) be as above. Show that

(a) DS ∋ v 7→ Sv + iv is injective,
(b) US is an isometry on its domain,
(c) I − US is injective, and
(d) im(I − US ) is dense in H.
The Cayley transform has an inverse operation, which allows us to associ-

ate to any given partially defined isometry (DU , U ) : H → H for which I − U
is injective and im(I − U ) is dense in H a densely defined symmetric oper-
ator (DSU , SU ) : H → H by setting DSU = im(I − U ) and
SU (w − U w) = i(w + U w)
for all w ∈ DU . Thus we may write SU = i(I + U )(I − U )−1 .
Essential Exercise 13.24. Let H and (DU , U ) be as above. Show that SU

as defined above is a densely defined symmetric operator.
Essential Exercise 13.25. Show that the procedure above is indeed the
inverse to the Cayley transform by the following steps.
(a) Given a densely defined symmetric operator (DS , S) : H → H, show
that S = SUS .
(b) Given a partially defined isometry (DU , U ) : H → H for which I − U is
injective and im(I − U ) is dense, show that U = USU .
Essential Exercise 13.26. Suppose that U : H → H is unitary. Show

that I − U is injective if and only if im(I − U ) is dense in H. If U has
these equivalent properties, show that S = SU is self-adjoint.
If S and S ′ are densely defined symmetric operators, it is clear from the

definition of
DUS = {Sv + iv | v ∈ DS }
and the relation US = (S − iI)(S + iI)−1 that S ⊆ S ′ implies US ⊆ US ′ .
Similarly, if U and U ′ are partial isometries with the properties above, one
sees that U ⊆ U ′ implies SU ⊆ SU ′ . If U ′ is even unitary, then the density
of im(I − U ′ ) follows from the corresponding property of U , so that by Exer-
cise 13.26 we also know that I − U ′ is injective. Hence in the case of a unitary
extension U ′ of U , the properties above are automatically satisfied. Using
this, the remaining part of Exercise 13.26, and Exercise 13.22 we deduce
that the problem of finding self-adjoint extensions of symmetric operators is
equivalent to finding unitary extensions of (certain) partial isometries.
Exercise 13.27. Give an example of a densely defined symmetric operator that does not
have a self-adjoint extension.
The following exercise presents the main results of this section.
Essential Exercise 13.28. Let (DS , S) : H → H be a densely defined sym-

metric operator on a separable complex Hilbert space H. We define the defi-
ciency indices n+ and n− by

n± (S) = dim {Sv ± iv | v ∈ DS }⊥ ∈ N0 ∪ {∞}.
(a) Show that S has a self-adjoint extension if and only if n+ (S) = n− (S).
(b) A symmetric operator is called essentially self-adjoint if it has a unique
self-adjoint extension. Show that S is essentially self-adjoint if and only
if n+ (S) = n− (S) = 0.
Exercise 13.29. Find an example of an essentially self-adjoint operator that is not self-
adjoint.
Exercise 13.30. Show that we have n+ (S) = n− (S) = 1 for the operator S = iT0 from
Exercise 13.7. Deduce that we can parameterise the self-adjoint extensions Sα of S (in a
natural manner) by elements α ∈ S1 .
13.5 Further Topics
• In the formulation of quantum mechanics due to Dirac and von Neumann,

physical observables (momentum, position, spin, angular momentum and
so on) are represented by self-adjoint operators on a Hilbert space; we
refer to Reed and Simon [91] for this.
• As we have already seen in Chapters 5 and 6, many boundary value
problems in the study of partial differential equations have natural de-
scriptions in terms of self-adjoint operators.
Chapter 14
The Prime Number Theorem
It would be difficult to overstate the importance of functional analysis in

number theory, as almost all the topics discussed in this volume play a found-
ational role in modern number theory. It would be impossible to really justify
that statement here, and we instead follow Tao [103] and give a proof of the
classical prime number theorem using Banach algebras (as in Chapter 11),
Fourier analysis (as in Chapter 9), the weak* topology on the dual of Cc (R)
(using Chapter 8), and some elementary number theory. By the last phrase
we mean a version of Mertens’ theorem in Section 14.2.5 (which predates the
first proof of the prime number theorem) and Selberg’s symmetry formula in
Section 14.2.2 (which was a key ingredient of the ‘elementary’ proof of the
prime number theorem due to(34) Selberg [96]; see also Erdős [29]). In Section 14.5 we discuss a generalization of the prime number theorem to primes
in arithmetic progressions.
14.1 Two Reformulations
Gauss observed in 1792–93 (at the age of 15 or 16), that the density of primes close to $x$ seemed to be approximately $\frac{1}{\log x}$, leading to the suggestion that the prime counting function
\[ \pi(x) = |\{ p \leq x \mid p \text{ is a prime in } \mathbb{N} \}| \]
has the asymptotic growth rate $\frac{x}{\log x}$, and this statement is now called the
prime number theorem. After many partial and weaker results, Hadamard
and (independently) de la Vallée-Poussin extended work in which Riemann
introduced complex-analytic methods to give the first proofs of the prime
number theorem in 1896. We will not concern ourselves with the error rate
in this approximation; the best conjectured error rate is a reformulation of
the famous Riemann hypothesis.(35)

Recall that given positive functions $f, g$ on an interval $[a, \infty)$ we say that $f$ and $g$ are asymptotic, written $f \sim g$, if $\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1$ or equivalently if $f(x) = g(x) + o(g(x))$ as $x \to \infty$.
Theorem 14.1 (Prime number theorem). For the prime counting function $\pi(x) = \sum_{p \leq x} 1$ (where $p$ runs over the primes in $\mathbb{N}$) we have $\pi(x) \sim \frac{x}{\log x}$ as $x \to \infty$.
As it turns out, it is more convenient to work with the von Mangoldt

function $\Lambda$, defined by
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for a prime } p \text{ and some } k \geq 1; \\ 0 & \text{otherwise} \end{cases} \]
for $n \geq 1$. In the remainder of the chapter $p, p_1, p_2, \ldots$ will always denote primes in $\mathbb{N}$ and the letters $d, m, n, q$ will usually denote positive integers.
For brevity we will refer to ‘the prime number theorem’ as ‘PNT’ throughout
this chapter.
Lemma 14.2 (First reformulation of PNT). If
\[ \sum_{1 \leq n \leq x} \Lambda(n) \sim x \tag{14.1} \]
as $x \to \infty$ then Theorem 14.1 follows.
Proof. Suppose that (14.1) holds, and fix some small $\delta > 0$. Then we also have
\[ \sum_{1 \leq n \leq x^{1-\delta}} \Lambda(n) = x^{1-\delta} + o(x^{1-\delta}) = o_\delta(x) \]
as $x \to \infty$ (which is also easy to see directly). Taking the difference we see that
\[ \sum_{x^{1-\delta} < p \leq x} \log p + \sum_{\substack{x^{1-\delta} < p^k \leq x \\ k > 1}} \log p = \sum_{x^{1-\delta} < n \leq x} \Lambda(n) = x + o_\delta(x). \]
The first sum on the left is the one we are interested in, so we wish to estimate the second sum. For this, notice that $n = p^k \in (x^{1-\delta}, x]$ with $k > 1$ implies that $k \leq \frac{\log x}{\log p} \leq \frac{\log x}{\log 2}$ and $p \leq x^{1/2}$. Therefore
\[ \sum_{\substack{x^{1-\delta} < p^k \leq x \\ k > 1}} \log p \leq \frac{\log x}{\log 2} \sum_{1 \leq n \leq x^{1/2}} \log(x^{1/2}) = O(x^{1/2} \log^2 x) = o(x), \]
and substituting this into the above we see that

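\[ \sum_{x^{1-\delta} < p \leq x} \log p = x + o_\delta(x). \]
Since $\log p \leq \log x$ for $p \in (x^{1-\delta}, x]$ we obtain
\[ x + o_\delta(x) = \sum_{x^{1-\delta} < p \leq x} \log p \leq \log x \sum_{x^{1-\delta} < p \leq x} 1, \]
which, after dividing by $\log x$, gives the lower bound in
\[ \frac{x}{\log x} + o_\delta\Big(\frac{x}{\log x}\Big) \leq \sum_{x^{1-\delta} < p \leq x} 1 \leq \frac{x}{(1-\delta)\log x} + o_\delta\Big(\frac{x}{\log x}\Big). \]
The upper bound follows similarly using $\log p \geq (1-\delta)\log x$ for $p \in (x^{1-\delta}, x]$. Since
\[ \sum_{p \leq x^{1-\delta}} 1 \leq x^{1-\delta} = o_\delta\Big(\frac{x}{\log x}\Big) \]
we may add the small primes back in to obtain
\[ \frac{x}{\log x} + o_\delta\Big(\frac{x}{\log x}\Big) \leq \pi(x) \leq \frac{x}{(1-\delta)\log x} + o_\delta\Big(\frac{x}{\log x}\Big). \]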
As δ > 0 is arbitrary, this gives the lemma.
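The two quantities compared in the lemma can be tabulated for moderate $x$. The following is a minimal sketch (function names are illustrative) that uses a sieve of Eratosthenes to compute $\sum_{n \leq x} \Lambda(n)$ and $\pi(x)$ and prints the ratios $\frac{1}{x}\sum_{n \leq x}\Lambda(n)$ and $\pi(x)\frac{\log x}{x}$. Both are expected to approach 1, although the second does so slowly, and a finite table is of course no substitute for the proof.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes p <= limit."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def chebyshev_psi(x):
    """Sum of Lambda(n) over 1 <= n <= x, i.e. of log p over prime powers p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

for x in [10**2, 10**3, 10**4, 10**5]:
    pi_x = len(primes_up_to(x))
    print(x, chebyshev_psi(x) / x, pi_x * math.log(x) / x)
```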
Exercise 14.3. Prove that Theorem 14.1 implies (14.1) in Lemma 14.2.
In the second reformulation below due to Tao we will start to see the
connection to functional analysis.
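Proposition 14.4 (Second reformulation of PNT). Define the $\Lambda$-semi-norm $\|\cdot\|_\Lambda$ by
\[ \|f\|_\Lambda = \limsup_{h \to \infty} \Big| \sum_{n \geq 1} \frac{\Lambda(n)}{n} f(\log n - h) - \int_{\mathbb{R}} f(t)\,dt \Big| \tag{14.2} \]
for $f \in C_c(\mathbb{R})$. If $\|f\|_\Lambda = 0$ for every $f \in C_c(\mathbb{R})$, then Theorem 14.1 holds.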

It is easy to see from the properties of the limit supremum that
\[ \|\lambda f\|_\Lambda = |\lambda| \, \|f\|_\Lambda \]
and
\[ \|f_1 + f_2\|_\Lambda \leq \|f_1\|_\Lambda + \|f_2\|_\Lambda \]
for any $f_1, f_2 \in C_c(\mathbb{R})$ and $\lambda \in \mathbb{C}$. In other words, $\|\cdot\|_\Lambda$ defines a semi-norm on $C_c(\mathbb{R})$ once we have checked that it is well-defined in the sense that $\|f\|_\Lambda < \infty$ for every $f \in C_c(\mathbb{R})$. We will prove this in the next section.
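Proof of Proposition 14.4. Supposing that the semi-norm is identically zero amounts to assuming that
\[ \sum_{n \geq 1} \frac{\Lambda(n)}{n} f\Big(\log \frac{n}{x}\Big) = \int_{\mathbb{R}} f(t)\,dt + o(1) \]
as $x = e^h \to \infty$ for any $f \in C_c(\mathbb{R})$. If now $g \in C_c((0, \infty))$ then we may set $f(t) = e^t g(e^t)$ to define a function $f \in C_c(\mathbb{R})$ and conclude that
\begin{align*}
\sum_{n \geq 1} \frac{\Lambda(n)}{n} \, \frac{n}{x} \, g\Big(\frac{n}{x}\Big) &= \sum_{n \geq 1} \frac{\Lambda(n)}{n} f\Big(\log \frac{n}{x}\Big) \\
&= \int_{\mathbb{R}} g(e^t) e^t \,dt + o(1) = \int_0^\infty g(u)\,du + o(1),
\end{align*}
or equivalently
\[ \sum_{n \geq 1} \Lambda(n) g\Big(\frac{n}{x}\Big) = x \int_0^\infty g(u)\,du + o(x). \]
Applying this to compactly supported functions $0 \leq g_- \leq \mathbb{1}_{[\frac{1}{2}, 1]} \leq g_+ \leq 1$ satisfying $\int (g_+ - g_-)\,dx < \delta$ for some $\delta > 0$ (see Figure 2.3 on p. 50) we obtain
\[ \sum_{\frac{1}{2}x < n \leq x} \Lambda(n) \leq \sum_{n \geq 1} \Lambda(n) g_+\Big(\frac{n}{x}\Big) \leq (\tfrac{1}{2} + \delta)x + o_\delta(x) \]
and
\[ \sum_{\frac{1}{2}x < n \leq x} \Lambda(n) \geq \sum_{n \geq 1} \Lambda(n) g_-\Big(\frac{n}{x}\Big) \geq (\tfrac{1}{2} - \delta)x + o_\delta(x). \]
As $\delta > 0$ is arbitrary, this also gives
\[ \sum_{\frac{1}{2}x < n \leq x} \Lambda(n) = \tfrac{1}{2}x + o(x). \tag{14.3} \]
We sum estimates of this form to get the desired claim. To handle the error term carefully we fix some $\varepsilon > 0$ and suppose $M \geq 1$ is such that the error term is bounded in absolute value by $\varepsilon x$ whenever $x \geq M$. Also note that for a fixed $\varepsilon$ both $M$ and $\sum_{n \leq M} \Lambda(n)$ are $o_\varepsilon(x)$. Hence we may write
\[ \sum_{n \leq x} \Lambda(n) = \sum_{\frac{1}{2}x < n \leq x} \Lambda(n) + \sum_{\frac{1}{4}x < n \leq \frac{1}{2}x} \Lambda(n) + \cdots + \sum_{\frac{1}{2^{\ell+1}}x < n \leq \frac{1}{2^{\ell}}x} \Lambda(n) + o_\varepsilon(x), \]
where $\ell \geq 0$ is chosen maximally with $\frac{1}{2^{\ell}}x \geq M$. Applying the asymptotic in (14.3) to each sum we see that the main terms add to $x - \frac{1}{2^{\ell+1}}x = x + o_\varepsilon(x)$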
and the error terms add up to no more than 2εx by choice of M . As ε > 0
was arbitrary, we see that (14.1) in Lemma 14.2 follows.
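The dyadic decomposition used at the end of the proof can be illustrated numerically. The following is a minimal sketch (function names and the sample values of $x$ and $M$ are illustrative assumptions): it computes the block sums of $\Lambda(n)$ over $(x/2^{j+1}, x/2^{j}]$ for $j = 0, \ldots, \ell$ with $\ell$ maximal such that $x/2^{\ell} \geq M$, prints each block relative to its main term $x/2^{j+1}$, and confirms that the blocks together with the remaining small range add up to the full sum.

```python
import math

def mangoldt(n):
    """von Mangoldt function: log p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n), so n itself is prime

def block_sum(lo, hi):
    """Sum of Lambda(n) over integers n with lo < n <= hi."""
    return sum(mangoldt(n) for n in range(math.floor(lo) + 1, math.floor(hi) + 1))

def dyadic_blocks(x, M):
    """Block sums over (x/2^(j+1), x/2^j] for j = 0..l, with l maximal such that x/2^l >= M."""
    blocks = []
    j = 0
    while x / 2 ** j >= M:
        blocks.append(block_sum(x / 2 ** (j + 1), x / 2 ** j))
        j += 1
    return blocks

x, M = 4096, 64
blocks = dyadic_blocks(x, M)
for j, s in enumerate(blocks):
    print(j, s / (x / 2 ** (j + 1)))  # ratio of each block to its main term
print(sum(blocks) + block_sum(0, x / 2 ** len(blocks)), block_sum(0, x))
```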
Exercise 14.5. Show that PNT implies that the Λ-semi-norm vanishes on Cc (R).
14.2 The Selberg Symmetry Formula and Banach Algebra Norm
Theorem 14.6. The function $f \longmapsto \|f\|_\Lambda$ on $C_c(\mathbb{R})$ defined in (14.2) is a semi-norm with
\[ \|f\|_\Lambda \leq \|f\|_1 \tag{14.4} \]
and $\|f_1 * f_2\|_\Lambda \leq \|f_1\|_\Lambda \|f_2\|_\Lambda$ for $f, f_1, f_2 \in C_c(\mathbb{R})$.
This will be an important step towards the proof of PNT. Assuming that the semi-norm is not identically zero will allow us to construct a Banach algebra homomorphism from $L^1(\mathbb{R})$ to the completion $A_\Lambda$ of $C_c(\mathbb{R})$ with respect† to the semi-norm $\|\cdot\|_\Lambda$, which will induce a dual homomorphism from the space of characters $A_\Lambda^{o}$ of $A_\Lambda$ into $L^1(\mathbb{R})^{o} \cong \mathbb{R}$. This will eventually lead to a contradiction.
For the proof of Theorem 14.6 we will need some elementary tools from
number theory: the Selberg symmetry formula and Mertens’ theorem.
14.2.1 Dirichlet Convolution and Möbius Inversion
Given two functions $f_1, f_2 : \mathbb{N} \to \mathbb{R}$ we define the Dirichlet convolution‡ by
\[ (f_1 *_D f_2)(n) = \sum_{d \mid n} f_1(d) f_2\Big(\frac{n}{d}\Big), \]
where the sum is taken over all divisors $d$ of $n \in \mathbb{N}$ (including both 1 and $n$ itself).
The special case of convolution with the constant function 1 is of particular interest as it simply corresponds to taking the sum over all divisors,
\[ (f *_D 1)(n) = \sum_{d \mid n} f(d). \]
† The reader may easily check that the formal mechanism of taking the completion with
respect to a semi-norm gives the same result as first forming the quotient with respect to
the kernel of the semi-norm and then taking the completion with respect to the norm.
‡ This is often just denoted $f_1 * f_2$ as it is a multiplicative convolution on the semigroup $\mathbb{N}$. However, as we make more use of the additive convolution in this volume we reserve the unadorned $*$ for the latter.
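In the case of the von Mangoldt function we claim that
\[ (\Lambda *_D 1)(n) = \sum_{d \mid n} \Lambda(d) = \log n. \tag{14.5} \]
In fact, if $n = p^k$ is a prime power, then
\[ \sum_{d \mid n} \Lambda(d) = \sum_{\ell = 0}^{k} \Lambda(p^\ell) = k \log p = \log n, \]
and if $n = p_1^{k_1} \cdots p_a^{k_a}$, then
\[ \sum_{d \mid n} \Lambda(d) = \sum_{d \mid p_1^{k_1}} \Lambda(d) + \cdots + \sum_{d \mid p_a^{k_a}} \Lambda(d) = \log n. \]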
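The identity (14.5) is easy to test by machine for small $n$. The following is a minimal sketch (all function names are illustrative) that implements the Dirichlet convolution directly from its definition as a sum over divisors and compares $\sum_{d \mid n} \Lambda(d)$ with $\log n$.

```python
import math

def divisors(n):
    """All positive divisors of n, including 1 and n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet(f1, f2):
    """Dirichlet convolution: n -> sum over d | n of f1(d) * f2(n / d)."""
    return lambda n: sum(f1(d) * f2(n // d) for d in divisors(n))

def mangoldt(n):
    """von Mangoldt function: log p if n = p^k with k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)

def one(n):
    """The constant function 1."""
    return 1

summed = dirichlet(mangoldt, one)
for n in [1, 2, 12, 30, 64, 97, 360]:
    print(n, summed(n), math.log(n))  # the last two columns should agree
```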
The map $f \longmapsto f *_D 1$ has an inverse operation, known as Möbius inversion. We define the Möbius function $\mu : \mathbb{N} \to \mathbb{Z}$ by
\[ \mu(n) = \begin{cases} 0 & \text{if } p^2 \mid n \text{ for some prime } p \text{ and} \\ (-1)^\ell & \text{if } n = p_1 \cdots p_\ell, \end{cases} \]
where $p_1, \ldots, p_\ell$ denote distinct primes. Notice that the second case includes the statement that $\mu(1) = (-1)^0 = 1$ as 1 is taken to be a product of no primes.
Proposition 14.7 (Möbius inversion). Given functions $f, g : \mathbb{N} \to \mathbb{R}$ we have $g = f *_D 1$ if and only if $f = g *_D \mu$. Moreover $\delta_1 = 1 *_D \mu$, where
\[ \delta_1(n) = \begin{cases} 1 & \text{for } n = 1, \\ 0 & \text{otherwise.} \end{cases} \]
We note that in particular this allows us to reformulate (14.5), giving an alternate definition of the von Mangoldt function as
\[ \Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}. \]
Proof of Proposition 14.7. As we will see the proposition is a consequence of unique prime factorization and is a number-theoretic version of the inclusion–exclusion principle. We start by noting that Dirichlet convolution is commutative and associative since
\[ (f_1 *_D f_2)(n) = \sum_{n = de} f_1(d) f_2(e) = (f_2 *_D f_1)(n) \]
and
\[ \big((f_1 *_D f_2) *_D f_3\big)(n) = \sum_{n = def} f_1(d) f_2(e) f_3(f) = \big(f_1 *_D (f_2 *_D f_3)\big)(n) \]
for any three functions $f_1, f_2, f_3$ on $\mathbb{N}$ and all $n \in \mathbb{N}$. Also note that the function $\delta_1$ is an identity for Dirichlet convolution since
\[ (f *_D \delta_1)(n) = \sum_{n = de} f(d) \delta_1(e) = f(n). \]
Thus it is sufficient to show that $1 *_D \mu = \delta_1$, or equivalently that
\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{for } n = 1, \\ 0 & \text{otherwise.} \end{cases} \]
For $n = 1$ this is clear. Suppose that $n = p_1^{k_1} \cdots p_\ell^{k_\ell} > 1$ with $p_1, \ldots, p_\ell$ distinct primes. Then
\[ \sum_{d \mid n} \mu(d) = \mu(1) + \sum_{r=1}^{\ell} \sum_{j_1, \ldots, j_r} \mu(p_{j_1} \cdots p_{j_r}) = \sum_{r=0}^{\ell} \binom{\ell}{r} (-1)^r = (1 - 1)^\ell = 0, \]
where the inner sum over $j_1, \ldots, j_r$ runs over all choices of $r$ distinct indices $j_1 < \cdots < j_r$ within $\{1, \ldots, \ell\}$.
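Möbius inversion can be tried out on small inputs. The following is a minimal sketch (function names and the test function $n \mapsto n^2$ are illustrative choices) that computes $\mu$ by trial division, checks that $\sum_{d \mid n} \mu(d)$ behaves like $\delta_1$, and recovers a function $f$ from $g = f *_D 1$ by convolving with $\mu$.

```python
def divisors(n):
    """All positive divisors of n, including 1 and n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet(f1, f2):
    """Dirichlet convolution: n -> sum over d | n of f1(d) * f2(n / d)."""
    return lambda n: sum(f1(d) * f2(n // d) for d in divisors(n))

def mobius(n):
    """Moebius function: 0 if a square of a prime divides n, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def one(n):
    """The constant function 1."""
    return 1

def square(n):
    """An arbitrary test function f(n) = n^2."""
    return n * n

g = dirichlet(square, one)        # g = f *_D 1
recovered = dirichlet(g, mobius)  # g *_D mu, which should equal f again
for n in range(1, 13):
    print(n, mobius(n), dirichlet(one, mobius)(n), square(n), recovered(n))
```

Integer arithmetic is used throughout, so the comparison between `square(n)` and `recovered(n)` is exact.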
14.2.2 The Selberg Symmetry Formula
Summarizing the above discussions for the von Mangoldt function we have
\[ \Lambda(n) = (\mu *_D \log)(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for a prime } p \text{ and some } k \geq 1; \\ 0 & \text{otherwise,} \end{cases} \]
for every $n \in \mathbb{N}$. By analogy we define the second von Mangoldt function by
\[ \Lambda_2 = \mu *_D \log^2. \]
We start by describing the second von Mangoldt function more carefully.
Lemma 14.8 (Second von Mangoldt function). For every $n \in \mathbb{N}$ we have
\[ \Lambda_2(n) = \Lambda(n) \log n + (\Lambda *_D \Lambda)(n) = \begin{cases} (2k - 1) \log^2 p & \text{if } n = p^k; \\ 2 \log p_1 \log p_2 & \text{if } n = p_1^{k_1} p_2^{k_2}; \\ 0 & \text{otherwise,} \end{cases} \]
where $p_1$ and $p_2$ denote different primes and we assume $k, k_1, k_2 \geq 1$.

Proof. Below we will use the fact that Λ(n) = log p for n = pk and k > 1
and Λ(n) = 0 otherwise without explicit reference. Define f to be the second
expression in the lemma, that is
X n
f (n) = Λ(n) log n + Λ(d)Λ
d
d|n
for n ∈ N. We first claim that f also equals the third expression (defined by
the three cases). In fact, f (1) = 0 and if n = pk then
k−1
X
f (n) = log p log pk + log2 p = (2k − 1) log2 p.
ℓ=1
If n = pk11 pk22 for two different primes p1 , p2 and k1 , k2 > 1, then Λ(n) = 0
and
f (n) = Λ(pk11 )Λ(pk22 ) + Λ(pk22 )Λ(pk11 ) = 2 log p1 log p2 .
Finally,
if n has three or more prime factors, then clearly f (n) = 0 since
for dn either d or nd must have at least two prime divisors.
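Now let $n = p_1^{k_1} \cdots p_\ell^{k_\ell}$ and calculate
\begin{align*}
(f \overset{D}{*} 1)(n) = \sum_{d \mid n} f(d) &= \sum_{j} \sum_{a=1}^{k_j} f(p_j^a) + \sum_{j_1 < j_2} \sum_{a_1=1}^{k_{j_1}} \sum_{a_2=1}^{k_{j_2}} f(p_{j_1}^{a_1} p_{j_2}^{a_2}) \\
&= \sum_{j} \sum_{a=1}^{k_j} (2a - 1) \log^2 p_j + 2 \sum_{j_1 < j_2} k_{j_1} k_{j_2} \log p_{j_1} \log p_{j_2} \\
&= \sum_{j} k_j^2 \log^2 p_j + 2 \sum_{j_1 < j_2} k_{j_1} k_{j_2} \log p_{j_1} \log p_{j_2} \\
&= \Big( \sum_{j} k_j \log p_j \Big)^2 = \log^2 n,
\end{align*}
where the sum over j, respectively over pairs $j_1 < j_2$, always ranges over indices from 1 to ℓ. Using
Proposition 14.7 we see that $f = \mu \overset{D}{*} \log^2 = \Lambda_2$, as claimed.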
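The two descriptions of Λ2 in Lemma 14.8 can be compared numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the Dirichlet convolution $\mu \overset{D}{*} \log^2$ is evaluated by a brute-force sum over divisors, and floating-point logarithms are compared up to a small tolerance.

```python
import math


def factorize(n):
    """Prime factorization of n as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def mobius(n):
    """Moebius function read off from the factorization."""
    factors = factorize(n)
    if any(k > 1 for k in factors.values()):
        return 0
    return (-1) ** len(factors)


def lambda2_by_convolution(n):
    """(mu *_D log^2)(n): sum over d | n of mu(d) * log(n/d)^2."""
    return sum(mobius(d) * math.log(n // d) ** 2
               for d in range(1, n + 1) if n % d == 0)


def lambda2_closed_form(n):
    """The three-case expression from Lemma 14.8."""
    factors = factorize(n)
    if len(factors) == 1:
        (p, k), = factors.items()
        return (2 * k - 1) * math.log(p) ** 2
    if len(factors) == 2:
        p1, p2 = factors
        return 2 * math.log(p1) * math.log(p2)
    return 0.0


if __name__ == "__main__":
    for n in (1, 8, 12, 30, 49):
        print(n, lambda2_by_convolution(n), lambda2_closed_form(n))
```

For each n the two printed values should agree up to rounding error; in particular both vanish as soon as n has three distinct prime factors.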
Recalling the reformulation (14.1) of PNT in Lemma 14.2 and thinking
of Λ2 as a modified version of Λ we might be interested in $\sum_{n \leq x} \Lambda_2(n)$. As
Selberg noticed, the latter sum is much easier to understand, but its asymp-
totic description is still useful for obtaining PNT. This was an important
ingredient in Selberg’s elementary proof of PNT, and is also crucial for the
argument of Tao presented here.
While not strictly relevant, it might be helpful to provide a reason why the
Selberg symmetry formula is easier to obtain than PNT. The second formula
in Lemma 14.8 shows that Λ2 might be thought of as a weighted counting
function for products of primes, and since this set is larger than the set of
primes it might be easier to study. Whatever the true rationale may be, it is
still surprising that the following argument relies only on elementary analysis
and Möbius inversion.
Proposition 14.9 (Selberg symmetry formula). We have
\[
\sum_{n \leq x} \Lambda_2(n) = 2x \log x + O(x)
\]
for $x \geq 1$.
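Proof. We fix some $x \geq 1$. The proof will use Möbius inversion and the
following (elementary) asymptotic estimates, which we will prove below.
\begin{align*}
\sum_{m \leq y} \frac{(y/m)^{-1}}{m} &= \frac{\lfloor y \rfloor}{y} = 1 + O\Big(\frac{1}{y}\Big); \tag{14.6} \\
\sum_{m \leq y} \frac{1}{m} &= \log y + c_1 + O\Big(\frac{1}{y}\Big); \tag{14.7} \\
\sum_{m \leq y} \frac{\log(y/m)}{m} &= \frac{1}{2} \log^2 y + c_1 \log y + c_2 + O\Big(\frac{\log(1 + y)}{y}\Big); \tag{14.8} \\
\frac{1}{y} \sum_{m \leq y} \log^2 m &= \log^2 y - 2 \log y + 2 + O\Big(\frac{\log^2(1 + y)}{y}\Big), \tag{14.9}
\end{align*}
for some constants $c_1, c_2 \in \mathbb{R}$ and all $y \geq 1$.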

Using Möbius inversion. Given a function g : N → R we may define f by
\[
f(n) = (\mu \overset{D}{*} g)(n) = \sum_{n = dm} \mu(d) g(m)
\]
for all $n \geq 1$, which gives
\[
\sum_{n \leq x} f(n) = \sum_{n \leq x} \sum_{n = dm} \mu(d) g(m) = \sum_{d \leq x} \mu(d) \sum_{m \leq x/d} g(m),
\]
where we changed the order of summation. Applying this with $g(n) = \log^2 n$
and $f(n) = \Lambda_2(n)$ gives
\[
\frac{1}{x} \sum_{n \leq x} \Lambda_2(n) = \sum_{d \leq x} \frac{\mu(d)}{d} \, \frac{1}{x/d} \sum_{m \leq x/d} \log^2 m, \tag{14.10}
\]
and we see that the estimate (14.9) might be useful for $y = \frac{x}{d}$. Multiplying
(14.8) by 2, (14.7) by a constant $c_3$, (14.6) by a constant $c_4$, and summing
we can choose the constants to match† the right-hand side of the asymptotic
(14.9) to obtain
\[
\frac{1}{y} \sum_{m \leq y} \log^2 m = \sum_{m \leq y} \frac{2 \log(\frac{y}{m}) + c_3 + c_4 (\frac{y}{m})^{-1}}{m} + O\Big(\frac{\log^2(1 + y)}{y}\Big).
\]
† Explicitly, $c_3 = -2 - 2c_1$ and $c_4 = 2 - 2c_2 - c_1 c_3$.
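To simplify the expressions we introduce the shorthand
\[
F(t) = 2 \log t + c_3 + c_4 t^{-1}
\]
and put the above approximate identity with $y = \frac{x}{d}$ into (14.10) to obtain
\begin{align*}
\frac{1}{x} \sum_{n \leq x} \Lambda_2(n) &= \sum_{d \leq x} \frac{\mu(d)}{d} \sum_{m \leq x/d} \frac{F(\frac{x}{dm})}{m} + O\Big( \sum_{d \leq x} \frac{\mu(d)}{d} \, \frac{\log^2(1 + \frac{x}{d})}{x/d} \Big) \\
&= \sum_{d \leq x} \mu(d) \sum_{dm \leq x} \frac{F(\frac{x}{dm})}{dm} + O\Big( \frac{1}{x} \sum_{d \leq x} \log^2\Big(1 + \frac{x}{d}\Big) \Big) \\
&= \sum_{n \leq x} \frac{F(\frac{x}{n})}{n} \sum_{d \mid n} \mu(d) + O\Big( \frac{1}{x} \sum_{d \leq x} \log^2\Big(1 + \frac{x}{d}\Big) \Big),
\end{align*}
where we set n = dm and exchanged the order of summation again. Next we
recall that $\sum_{d \mid n} \mu(d) = (1 \overset{D}{*} \mu)(n) = \delta_1(n)$ for all $n \geq 1$ by Proposition 14.7,
and claim that
\[
\sum_{d \leq x} \log^2\Big(1 + \frac{x}{d}\Big) = O(x). \tag{14.11}
\]
Together we obtain
\[
\frac{1}{x} \sum_{n \leq x} \Lambda_2(n) = F(x) + O(1) = 2 \log x + c_3 + c_4 \frac{1}{x} + O(1) = 2 \log x + O(1)
\]
and the proposition follows.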
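The statement of Proposition 14.9 can be explored numerically. The following Python sketch is an illustration only: it tabulates Λ2 from the closed form in Lemma 14.8 using a smallest-prime-factor sieve (the names `lambda2_table` and `selberg_remainder` are hypothetical) and evaluates the normalized remainder $\frac{1}{x} \sum_{n \leq x} \Lambda_2(n) - 2 \log x$, which the proposition asserts is O(1). A finite computation of this kind cannot prove the bound; it only shows the remainder staying of moderate size on the tested range.

```python
import math


def lambda2_table(limit):
    """Lambda_2(n) for 0 <= n <= limit via the closed form of Lemma 14.8."""
    spf = list(range(limit + 1))              # smallest prime factor of each n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    table = [0.0] * (limit + 1)
    for n in range(2, limit + 1):
        prime_powers = []                     # list of (prime, exponent)
        m = n
        while m > 1:
            p, k = spf[m], 0
            while m % p == 0:
                m //= p
                k += 1
            prime_powers.append((p, k))
        if len(prime_powers) == 1:
            p, k = prime_powers[0]
            table[n] = (2 * k - 1) * math.log(p) ** 2
        elif len(prime_powers) == 2:
            (p1, _), (p2, _) = prime_powers
            table[n] = 2 * math.log(p1) * math.log(p2)
    return table


def selberg_remainder(x, table):
    """(1/x) * sum_{n <= x} Lambda_2(n) - 2 log x for an integer x >= 1."""
    return sum(table[: x + 1]) / x - 2 * math.log(x)


if __name__ == "__main__":
    table = lambda2_table(100_000)
    for x in (10, 100, 1_000, 10_000, 100_000):
        print(x, selberg_remainder(x, table))
```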

Riemann sums. It remains to prove the estimates (14.7)–(14.9) and (14.11)
(the bound (14.6) is clear), all of which are simple exercises in Riemann
integration.
Recall that a non-decreasing function $f : [1, \infty) \to \mathbb{R}_{\geq 0}$ satisfies
\[
\int_1^y f(t) \, dt - f(y) \leq \int_1^{\lfloor y \rfloor} f(t) \, dt \leq \sum_{m=2}^{\lfloor y \rfloor} f(m) \leq \int_2^{\lfloor y \rfloor + 1} f(t) \, dt \leq \int_2^y f(t) \, dt + f(y + 1),
\]
and so
\[
\Big| \sum_{m=1}^{\lfloor y \rfloor} f(m) - \int_1^y f(t) \, dt \Big| \leq f(1) + f(y + 1).
\]
Applying this with $f(t) = \log^2 t$ gives
\[
\sum_{m \leq y} \log^2 m = y \log^2 y - 2y \log y + 2y + O(\log^2(1 + y)),
\]
which then gives (14.9) after dividing by y.

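A similar estimate holds for non-increasing functions, so that
\[
\Big| \sum_{d \leq x} \log^2\Big(1 + \frac{x}{d}\Big) - \int_1^x \log^2\Big(1 + \frac{x}{t}\Big) \, dt \Big| \leq \log^2(1 + x) + \log 2 = O(x).
\]
Using the substitution $u = \frac{x}{t}$ (with $du = x(-t^{-2}) \, dt$) we see that
\[
\int_1^x \log^2\Big(1 + \frac{x}{t}\Big) \, dt = \int_1^x \log^2(1 + u) \frac{x}{u^2} \, du \leq x \int_1^\infty \frac{\log^2(1 + u)}{u^2} \, du = O(x),
\]
which gives (14.11).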

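For the statements in (14.7) and (14.8) we have to be more careful as these
involve finer estimates. For (14.7), let $f(t) = \frac{1}{\lfloor t \rfloor}$ and note that
\[
\sum_{m \leq y} \frac{1}{m} = \int_1^{\lfloor y \rfloor + 1} f(t) \, dt.
\]
Also note that $\big| f(t) - \frac{1}{t} \big| \ll \frac{1}{t^2}$. Hence
\begin{align*}
\sum_{m \leq y} \frac{1}{m} &= \log y + \int_1^{\lfloor y \rfloor + 1} f(t) \, dt - \int_1^y \frac{1}{t} \, dt \\
&= \log y + \int_1^\infty \Big( f(t) - \frac{1}{t} \Big) \, dt - \int_y^\infty \Big( f(t) - \frac{1}{t} \Big) \, dt + \int_y^{\lfloor y \rfloor + 1} f(t) \, dt \\
&= \log y + c_1 + O\Big(\frac{1}{y}\Big)
\end{align*}
proves (14.7).

In light of (14.7), it is sufficient to consider the sum $\sum_{m \leq y} \frac{\log m}{m}$ for the
proof of (14.8). We define $f(t) = \frac{\log \lfloor t \rfloor}{\lfloor t \rfloor}$, and by differentiating $t \mapsto \frac{\log t}{t}$ the
mean value theorem gives
\[
\Big| f(t) - \frac{\log t}{t} \Big| \ll \frac{1 + \log t}{t^2}.
\]
Note that
\[
\int_1^y \frac{\log t}{t} \, dt = \frac{1}{2} \log^2 y
\]
and that
\[
\int_y^\infty \frac{\log t}{t^2} \, dt = \int_{\log y}^\infty u e^{-u} \, du = \Big[ -u e^{-u} - e^{-u} \Big]_{\log y}^{\infty} = \frac{\log y + 1}{y}.
\]
Hence, if we set
\[
-c_2 = \int_1^\infty \Big( f(t) - \frac{\log t}{t} \Big) \, dt
\]
then we have, similarly,
\begin{align*}
\sum_{m \leq y} \frac{\log m}{m} &= \frac{1}{2} \log^2 y + \int_1^y f(t) \, dt - \int_1^y \frac{\log t}{t} \, dt + \int_y^{\lfloor y \rfloor + 1} f(t) \, dt \\
&= \frac{1}{2} \log^2 y - c_2 - \int_y^\infty \Big( f(t) - \frac{\log t}{t} \Big) \, dt + O\Big(\frac{\log(1 + y)}{y}\Big) \\
&= \frac{1}{2} \log^2 y - c_2 + O\Big(\frac{\log(1 + y)}{y}\Big).
\end{align*}
Multiplying this formula by −1 and (14.7) by log y we obtain (14.8) as the
sum.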
14.2.3 Convolution of Measures and a Measure-Theoretic

Reformulation of Selberg’s Symmetry Formula
In order to get closer to functional analysis we associate to any given
function f : N → R a Radon measure $\nu_f$ on [0, ∞) defined by
\[
\nu_f = \sum_{n=1}^{\infty} \frac{f(n)}{n} \delta_{\log n}.
\]
Lemma 14.10 (Convolution of measures). For two functions $f_1, f_2$ on N
we have
\[
\nu_{f_1} * \nu_{f_2} = \nu_{f_1 \overset{D}{*} f_2},
\]
where the convolution of two Radon measures $\nu_1, \nu_2$ on [0, ∞) is defined by
\[
(\nu_1 * \nu_2)(B) = \iint 1_B(t_1 + t_2) \, d\nu_1(t_1) \, d\nu_2(t_2)
\]
for any Borel subset B ⊆ R, and is again a Radon measure on [0, ∞).
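Proof. For any x > 0 we have
\[
\nu_1 * \nu_2([-x, x]) = \nu_1 * \nu_2([0, x]) \leq \nu_1([0, x]) \, \nu_2([0, x]) < \infty,
\]
which shows that $\nu_1 * \nu_2$ is again a Radon measure. Let B ⊆ R be measurable.
Then
\begin{align*}
(\nu_{f_1} * \nu_{f_2})(B) &= \sum_{m_1=1}^{\infty} \sum_{m_2=1}^{\infty} 1_B(\log m_1 + \log m_2) \frac{f_1(m_1)}{m_1} \frac{f_2(m_2)}{m_2} \\
&= \sum_{n=1}^{\infty} 1_B(\log n) \frac{1}{n} \sum_{d \mid n} f_1(d) f_2\Big(\frac{n}{d}\Big) \\
&= \int 1_B \, d\nu_{f_1 \overset{D}{*} f_2} = \nu_{f_1 \overset{D}{*} f_2}(B),
\end{align*}
as claimed.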
For a Radon measure ν on R and some h ∈ R we define the shifted measure
$\lambda_h \nu$ by
\[
\int_{\mathbb{R}} f \, d\lambda_h \nu = \int_{\mathbb{R}} f(t - h) \, d\nu(t).
\]
Clearly any Radon measure defines a functional on Cc (R). If we endow Cc (R)

with a locally convex vector space structure as in Example 8.63(4) then these
functionals are in fact continuous. However, we will not need this, even though
we are interested in the associated weak* topology on the space of Radon
measures (which is also referred to as the vague topology).
Exercise 14.11. Show that a Radon measure defines a continuous functional on the locally
convex vector space Cc (R) defined by the semi-norms from Example 8.63(4).
Lemma 14.12 (Third reformulation of PNT via weak* convergence).
The semi-norm $\| \cdot \|_\Lambda$ as in Proposition 14.4 vanishes on all of Cc(R) if and
only if the measures $\lambda_h \nu_\Lambda$ weak* converge to Lebesgue measure m as h → ∞.
Proof. Fix some f ∈ Cc(R) and h ∈ R. Then
\[
\int_{\mathbb{R}} f \, d\lambda_h \nu_\Lambda = \int_{\mathbb{R}} f(t - h) \, d\nu_\Lambda(t) = \sum_{n \geq 1} \frac{\Lambda(n)}{n} f(\log n - h).
\]
Hence we see that $\|f\|_\Lambda = 0$ is equivalent to
\[
\int_{\mathbb{R}} f \, d\lambda_h \nu_\Lambda \longrightarrow \int_{\mathbb{R}} f \, dm
\]
as h → ∞.
We will not use this lemma except as motivation for the next step, which
relies on the Selberg symmetry formula.
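Corollary 14.13 (Measure-theoretic Selberg symmetry). We define
the measure $\nu_{\mathrm{sym}} = \nu_{\Lambda_2 / \log}$. Then
\[
d\nu_{\mathrm{sym}}(t) = d\nu_\Lambda(t) + \frac{1}{t} \, d(\nu_\Lambda * \nu_\Lambda)(t)
\]
and $\lambda_h \nu_{\mathrm{sym}}$ converges to 2m in the weak* topology as h → ∞.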
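Proof. By Lemma 14.10 we have $\nu_\Lambda * \nu_\Lambda = \nu_{\Lambda \overset{D}{*} \Lambda}$ and so
\begin{align*}
\nu_{\mathrm{sym}}(B) = \nu_{\Lambda_2 / \log}(B) &= \sum_{n \geq 1} 1_B(\log n) \frac{\Lambda_2(n)}{n \log n} \\
&= \sum_{n \geq 1} 1_B(\log n) \Big( \frac{\Lambda(n)}{n} + \frac{(\Lambda \overset{D}{*} \Lambda)(n)}{n \log n} \Big) \\
&= \nu_\Lambda(B) + \int 1_B(t) \frac{1}{t} \, d\nu_{\Lambda \overset{D}{*} \Lambda}(t)
\end{align*}
by the properties of Λ2 in Lemma 14.8. This gives the identity in the corollary.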
To obtain the claimed convergence we apply the Selberg symmetry formula
(Proposition 14.9). By this formula,
\[
\sum_{n \leq cx} \Lambda_2(n) = 2cx \log x + O(x) = 2cx \log x + o(x \log x)
\]
for any constant c > 0. Dividing by x log x this gives the asymptotic
\[
\frac{1}{x \log x} \sum_{ax < n \leq bx} \Lambda_2(n) = 2(b - a) + o(1) \tag{14.12}
\]
as x → ∞ for any 0 < a < b. Now fix ε > 0 and assume that x is large
enough to ensure that $ax < n \leq bx$ implies that
\[
1 - \varepsilon \leq \frac{\log x}{\log n} \leq 1 + \varepsilon.
\]
Multiplying this by $\frac{1}{\log x} \Lambda_2(n)$ and summing over n ∈ (ax, bx] leads to
\[
1 - \varepsilon \leq \frac{\frac{1}{x} \sum_{ax < n \leq bx} \frac{\Lambda_2(n)}{\log n}}{\frac{1}{x \log x} \sum_{ax < n \leq bx} \Lambda_2(n)} \leq 1 + \varepsilon.
\]
By (14.12) we know an asymptotic for the denominator, so the same asymptotic
holds for the numerator since ε > 0 was arbitrary. Therefore, we have
\begin{align*}
\sum_{n \geq 1} 1_{(\log a, \log b]}(\log n - \log x) \, e^{\log n - \log x} \frac{\Lambda_2(n)}{n \log n} &= \sum_{ax < n \leq bx} \frac{n}{x} \frac{\Lambda_2(n)}{n \log n} \\
&= 2(b - a) + o(1)
\end{align*}
as x → ∞. Here the left-hand side equals the integral of the function defined
by $f(t) = 1_{(\log a, \log b]}(t) e^t$ with respect to $\lambda_{\log x} \nu_{\mathrm{sym}}$, so that
\[
\int 1_{(\log a, \log b]}(t) e^t \, d\lambda_{\log x} \nu_{\mathrm{sym}} = 2(b - a) + o(1) = 2 \int 1_{(\log a, \log b]}(t) e^t \, dt + o(1)
\]
as x → ∞ (or, equivalently, as h = log x → ∞). In other words, the conver-

gence claim already holds for fa,b (t) = 1(log a,log b] (t)et for any a < b in (0, ∞).
Finally, we can approximate any f ∈ Cc (R) uniformly from above and below
by finite linear combinations of such functions. The corollary follows.
14.2.4 A Density Function and the Continuity Bound
Notice that Corollary 14.13 in particular gives $0 \leq \nu_\Lambda \leq \nu_{\mathrm{sym}}$ and so
\[
0 \leq \lambda_h \nu_\Lambda \leq \lambda_h \nu_{\mathrm{sym}} \tag{14.13}
\]
for any h ∈ R. Using the Banach–Alaoglu theorem and Radon–Nikodym
derivatives we can conclude from this the following result.
Proposition 14.14. For any sequence $(h_n)$ with $h_n \to \infty$ as n → ∞ there
exists a subsequence $(h_{n_k})$ and a Borel measurable function D : R → [0, 2]
such that for any f ∈ Cc(R) we have
\[
\lim_{k \to \infty} \int f \, d\lambda_{h_{n_k}} \nu_\Lambda = \int f D \, dm.
\]
Proof. Fix some integer $\ell \geq 1$. Since by Corollary 14.13
\[
\lambda_h \nu_{\mathrm{sym}}([-\ell, \ell]) \leq \int f_\ell \, d\lambda_h \nu_{\mathrm{sym}} \longrightarrow 2(2\ell + 1)
\]
as h → ∞ for the function defined by
\[
f_\ell(x) = \begin{cases} 1 & \text{if } |x| \leq \ell, \\ \ell + 1 - |x| & \text{if } |x| \in [\ell, \ell + 1], \text{ and} \\ 0 & \text{if } |x| \geq \ell + 1, \end{cases}
\]
we see that $(\lambda_{h_n} \nu_\Lambda)|_{[-\ell, \ell]}$ can be identified with a bounded sequence of
functionals on C([−ℓ, ℓ]). Let $R_\ell = \sup_{n \geq 1} \lambda_{h_n} \nu_\Lambda([-\ell, \ell])$. By the Banach–Alaoglu
theorem (Theorem 8.10 and Proposition 8.11) the closed ball
\[
B_\ell = \overline{B_{R_\ell}^{C([-\ell, \ell])^*}}
\]
of radius $R_\ell$ is compact and metrizable in the weak* topology. Therefore
\[
X = \prod_{\ell \geq 1} B_\ell
\]
is compact and metrizable in the product topology (see Appendix A.3).
Unfolding the definitions, it follows that $(\lambda_{h_n} \nu_\Lambda)$ has a subsequence $(\lambda_{h_{n_k}} \nu_\Lambda)$
such that $\int f \, d\lambda_{h_{n_k}} \nu_\Lambda$ converges as k → ∞ for any f ∈ Cc(R). As
integration and the limit are linear, by taking the limit we obtain a linear
functional on Cc(R). Moreover, this functional is non-negative for any
non-negative f ∈ Cc(R) and so the Riesz representation theorem (Theorem 7.44)
shows that there is a Radon measure µ on R such that
\[
\lim_{k \to \infty} \int f \, d\lambda_{h_{n_k}} \nu_\Lambda = \int f \, d\mu,
\]
for all f ∈ Cc(R). By Corollary 14.13, $f \geq 0$ also implies that
\[
\int f \, d\mu = \lim_{k \to \infty} \int f \, d\lambda_{h_{n_k}} \nu_\Lambda \leq \lim_{k \to \infty} \int f \, d\lambda_{h_{n_k}} \nu_{\mathrm{sym}} = 2 \int f \, dm.
\]
Using the density of Cc(R) in $L^1_{\mu + m}(\mathbb{R})$ we can approximate the characteristic
function $1_B$ of any bounded measurable set by a non-negative function
in Cc(R) simultaneously with respect to both µ and m, which implies
that $\mu(B) \leq 2m(B)$. Using Proposition 3.29 it follows that µ is absolutely
continuous with respect to m and that the Radon–Nikodym derivative
$D = \frac{d\mu}{dm}$ takes values in [0, 2] almost everywhere.
Recall from Lemma 14.12 that we wish to show that D ≡ 1.
Proof of first inequality in Theorem 14.6. Fix some f ∈ Cc(R) and
choose a sequence $(h_n)$ with $h_n \to \infty$ as n → ∞ such that
\[
\|f\|_\Lambda = \lim_{n \to \infty} \Big| \sum_{m \geq 1} \frac{\Lambda(m)}{m} f(\log m - h_n) - \int_{\mathbb{R}} f \, dm \Big|.
\]
Applying Proposition 14.14 we can choose a subsequence such that $\lambda_{h_{n_k}} \nu_\Lambda$
converges to the measure D dm with D taking values in [0, 2]. Therefore
\[
\|f\|_\Lambda = \Big| \int f (D - 1) \, dm \Big| \leq \int |f| \, dm = \|f\|_1.
\]
As f ∈ Cc(R) was arbitrary the inequality (14.4) in Theorem 14.6 follows.
14.2.5 Mertens’ Theorem
The second number-theoretic input needed is one of several results known as
Mertens’ theorem, which we state in the following form.
Theorem 14.15 (Mertens’ theorem). We have
\[
\sum_{n \leq x} \frac{\Lambda(n)}{n} = \log x + O(1) \tag{14.14}
\]
for $x \geq 1$.
Proof. We first claim that
\[
\sum_{n \leq x} \Lambda(n) = O(x), \tag{14.15}
\]
which will allow us to control error terms in the calculation below. The
bound (14.15) is a trivial consequence of PNT (in the form of the
statement (14.1)), but fortunately the results developed above are sufficient to
prove (14.15) quite directly. In fact, the continuity bound $\|f\|_\Lambda \leq \|f\|_1$
in (14.4) (proven above) implies that
\[
\limsup_{x \to \infty} \Big| \sum_{n \geq 1} \frac{\Lambda(n)}{n} f\Big(\log \frac{n}{x}\Big) \Big| \leq 2 \|f\|_1
\]
for every f ∈ Cc(R). Using this for $f(t) = e^t g_+(e^t)$ and a non-negative
function $g_+ \in C_c((0, \infty))$ with $1_{[\frac{1}{2}, 1]} \leq g_+$ we obtain (much as in the proof of
Proposition 14.4) that
\[
\limsup_{x \to \infty} \frac{1}{x} \sum_{\frac{1}{2} x < n \leq x} \Lambda(n) \leq 2 \int_0^\infty g_+(t) \, dt,
\]
and so
\[
\sum_{\frac{1}{2} x < n \leq x} \Lambda(n) \leq Cx
\]
for some $C \geq 1$ and all $x \geq 1$. Therefore
\begin{align*}
\sum_{n \leq x} \Lambda(n) &= \sum_{\frac{1}{2} x < n \leq x} \Lambda(n) + \sum_{\frac{1}{4} x < n \leq \frac{1}{2} x} \Lambda(n) + \cdots + \sum_{\frac{1}{2^{\ell+1}} x < n \leq \frac{1}{2^{\ell}} x} \Lambda(n) \\
&\leq Cx + \tfrac{1}{2} Cx + \cdots + \tfrac{1}{2^{\ell}} Cx \leq 2Cx,
\end{align*}
where $\ell = \lfloor \log_2 x \rfloor$, so that $\frac{1}{2^{\ell}} x \geq 1$ but $\frac{1}{2^{\ell+1}} x < 1$. This proves (14.15).
Working towards (14.14), we first recall from (14.5) that $\log n = \sum_{d \mid n} \Lambda(d)$,
which implies that
\[
\sum_{n \leq x} \log n = \sum_{n \leq x} \sum_{d \mid n} \Lambda(d) = \sum_{d \leq x} \Lambda(d) \Big\lfloor \frac{x}{d} \Big\rfloor.
\]
On the right-hand side we use $\lfloor \frac{x}{d} \rfloor = \frac{x}{d} + O(1)$ and (14.15) to obtain
\[
\sum_{n \leq x} \log n = x \sum_{d \leq x} \frac{\Lambda(d)}{d} + O(x). \tag{14.16}
\]
On the left-hand side we again use monotonicity of $t \mapsto \log t$ to replace the
sum by an integral
\[
\sum_{n \leq x} \log n = \int_1^x \log t \, dt + O\big(\log(1 + x)\big) = x \log x + O(x) \tag{14.17}
\]
as in the proof of the Selberg symmetry formula above. Combining (14.16)
and (14.17) and dividing by x gives the theorem.
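For orientation, the bound in Mertens' theorem can be observed numerically. The Python sketch below is an illustration under stated assumptions: the helper names `von_mangoldt_table` and `mertens_remainder` are hypothetical, Λ is tabulated with a simple sieve, and the code only reports the remainder $\sum_{n \leq x} \frac{\Lambda(n)}{n} - \log x$ for a few values of x. Such a computation is consistent with (14.14) but is of course not a proof of it.

```python
import math


def von_mangoldt_table(limit):
    """Lambda(n) for 0 <= n <= limit: log p at prime powers p^k, else 0."""
    table = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            q = p
            while q <= limit:             # mark p, p^2, p^3, ...
                table[q] = math.log(p)
                q *= p
    return table


def mertens_remainder(x, table):
    """sum_{n <= x} Lambda(n)/n - log x for an integer x >= 1."""
    return sum(table[n] / n for n in range(1, x + 1)) - math.log(x)


if __name__ == "__main__":
    table = von_mangoldt_table(100_000)
    for x in (10, 100, 1_000, 10_000, 100_000):
        print(x, mertens_remainder(x, table))
```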
14.2.6 Completing the Proof
Proof of Banach algebra inequality in Theorem 14.6. It remains to
show that for any $f_1, f_2 \in C_c(\mathbb{R})$ we have
\[
\Big| \int (f_1 * f_2) \, d\lambda_h \nu_\Lambda - \int (f_1 * f_2) \, dm \Big| \leq \|f_1\|_\Lambda \|f_2\|_\Lambda + o_{f_1, f_2}(1) \tag{14.18}
\]

as h → ∞. Note that the left-hand side is the integral of f = f1 ∗ f2 with

respect to the measure† λh νΛ − m. Next we define m+ = m|[0,∞) and notice
that λh m+ → m as h → ∞. For that reason we can equivalently consider
the integral of f with respect to λh applied to νΛ − m+ . It will be helpful to
write the latter measure in a different way as a sum of measures.
For this, notice first that the convolution of $m_+ = m|_{[0, \infty)}$ with itself is
given by
\[
(m_+ * m_+)(B) = \int_0^\infty \int_0^\infty 1_B(t + s) \, dt \, ds = \int_0^\infty \int_0^u 1_B(u) \, ds \, du = \int_0^\infty 1_B(t) \, t \, dt
\]
for any Borel subset B ⊆ R, where we used the substitution u = t + s (for
a fixed s) and Fubini’s theorem. In particular, we can write the relationship
above in the convenient but notationally illogical form
\[
\frac{1}{t} \, d(m_+ * m_+)(t) = m_+ .
\]
† Note that this ‘signed Radon measure’ may be undefined on unbounded Borel sets, but
interpreting these measures as functionals on Cc(R) gives the right viewpoint.

Splitting into three signed measures. Working in the dual space
of Cc(R), it follows from Corollary 14.13 that we can write
\begin{align*}
\nu_\Lambda - m_+ &= \underbrace{\nu_{\mathrm{sym}} - 2m_+}_{= \rho_1} - \frac{1}{t} \, d(\nu_\Lambda * \nu_\Lambda)(t) + m_+ \\
&= \rho_1 - \underbrace{\frac{1}{t} \, d\big( (\nu_\Lambda - m_+) * (\nu_\Lambda - m_+) \big)(t)}_{= \rho_2} + \underbrace{2m_+ - 2 \frac{1}{t} \, d(\nu_\Lambda * m_+)(t)}_{= \rho_3}
\end{align*}
as a sum of three signed measures. For the first of these we recall from
Corollary 14.13 that
\[
\lambda_h \rho_1 = \lambda_h (\nu_{\mathrm{sym}} - 2m_+) \longrightarrow 2m - 2m = 0 \tag{14.19}
\]
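as h → ∞. For the third we calculate $\nu_\Lambda * m_+$ and obtain
\begin{align*}
(\nu_\Lambda * m_+)(B) &= \sum_{n \geq 1} \int_0^\infty 1_B(t + \log n) \, dt \, \frac{\Lambda(n)}{n} \\
&= \sum_{n \geq 1} \int_0^\infty 1_B(s) 1_{[\log n, \infty)}(s) \, ds \, \frac{\Lambda(n)}{n} \\
&= \int_0^\infty 1_B(s) \sum_{n \leq e^s} \frac{\Lambda(n)}{n} \, ds
\end{align*}
for a Borel subset B ⊆ R. Therefore, $\frac{1}{t} \, d(\nu_\Lambda * m_+)$ is equal to the absolutely
continuous measure whose Radon–Nikodym derivative with respect to m is
given by
\[
\frac{1}{t} \sum_{n \leq e^t} \frac{\Lambda(n)}{n}.
\]
By Mertens’ theorem (Theorem 14.15) this function is 1 + o(1) as t → ∞.
Hence
\[
\lambda_h \frac{1}{t} \, d(m_+ * \nu_\Lambda) \longrightarrow m \tag{14.20}
\]
as h → ∞, and we also have
\[
\lambda_h \rho_3 \longrightarrow 0 \tag{14.21}
\]
as h → ∞. To summarize, we have split $\nu_\Lambda - m_+$ into the sum $\rho_1 - \rho_2 + \rho_3$,
where $\rho_1$ and $\rho_3$ satisfy (14.19) resp. (14.21).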
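Shifting ρ2. We claim that in addition
\[
\Big| \int f_1 * f_2 \, d\lambda_h \rho_2 \Big| \leq \|f_1\|_\Lambda \|f_2\|_\Lambda + o(1) \tag{14.22}
\]
as h → ∞ for $f_1, f_2 \in C_c(\mathbb{R})$. Together with what we have proved above
about $\rho_1$ and $\rho_3$, this will imply (14.18).
Hence it remains to prove (14.22) for $f_1, f_2 \in C_c(\mathbb{R})$. We set $f = f_1 * f_2$
and recall that by definition of $\lambda_h$ and of $\rho_2$ we have
\[
\int f \, d\lambda_h \rho_2 = \int f(t - h) \, d\rho_2(t) = \int f(t - h) \frac{1}{t} \, d\big( (\nu_\Lambda - m_+) * (\nu_\Lambda - m_+) \big)(t).
\]
To be able to operate with this expression we note that
\[
\frac{1}{t} \, d\big( (\nu_\Lambda + m_+) * (\nu_\Lambda + m_+) \big)
\]
is bounded from above by
\[
\frac{1}{t} \, d\big( (\nu_\Lambda + m_+) * (\nu_\Lambda + m_+) \big) + \nu_\Lambda = \nu_{\mathrm{sym}} + \frac{2}{t} \, d(\nu_\Lambda * m_+) + m_+
\]
which when shifted by $\lambda_h$ converges to 5m as h → ∞ by Corollary 14.13
and (14.20). If $g_h, g \in C_c(\mathbb{R})$ with $g \geq 0$ satisfy $|g_h| \leq o(1) g$ as h → ∞, then
the above gives
\[
\Big| \int g_h \, d\lambda_h \rho_2 \Big| \leq o(1) \int g(t - h) \frac{1}{t} \, d\big( (\nu_\Lambda + m_+) * (\nu_\Lambda + m_+) \big)(t) = o_g(1)
\]
as h → ∞. We apply this to g = |f| and
\[
g_h(t) = f(t) \Big( \frac{t + h}{h} - 1 \Big).
\]
Note that the requirement above is satisfied, since $\frac{t + h}{h} = 1 + o(1)$ as h → ∞
uniformly for t ∈ Supp f by compactness. We obtain
\[
\int f(t - h) \Big( \frac{t}{h} - 1 \Big) \underbrace{\frac{1}{t} \, d\big( (\nu_\Lambda - m_+) * (\nu_\Lambda - m_+) \big)(t)}_{d\rho_2(t)} = \int g_h \, d\lambda_h \rho_2 = o_f(1)
\]
as h → ∞, and so we also have
\[
\int f \, d\lambda_h \rho_2 = \frac{1}{h} \int f(t - h) \, d\big( (\nu_\Lambda - m_+) * (\nu_\Lambda - m_+) \big)(t) + o_f(1). \tag{14.23}
\]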
Finally, we recall that $f = f_1 * f_2$ and calculate
\begin{align*}
\frac{1}{h} \int (f_1 * f_2)(t - h) \, & d\big( (\nu_\Lambda - m_+) * (\nu_\Lambda - m_+) \big)(t) \\
&= \frac{1}{h} \iint (f_1 * f_2)(t_1 + t_2 - h) \, d(\nu_\Lambda - m_+)(t_1) \, d(\nu_\Lambda - m_+)(t_2) \\
&= \frac{1}{h} \iiint f_1(u) f_2(t_1 + t_2 - h - u) \, du \, d(\nu_\Lambda - m_+)(t_1) \, d(\nu_\Lambda - m_+)(t_2) \\
&= \frac{1}{h} \iiint f_1(t_1 - r) f_2(t_2 + r - h) \, dr \, d(\nu_\Lambda - m_+)(t_1) \, d(\nu_\Lambda - m_+)(t_2) \\
&= \frac{1}{h} \int \underbrace{\int f_1 \, d\lambda_r (\nu_\Lambda - m_+)}_{= I_1(r)} \; \underbrace{\int f_2 \, d\lambda_{h - r} (\nu_\Lambda - m_+)}_{= I_2(h - r)} \, dr,
\end{align*}
where we used the substitution $u = t_1 - r$ and Fubini’s theorem (extended
by linearity to signed Radon measures and functions with compact support).
Depending on the support of $f_1$ and $f_2$ there exists some R ∈ R with the
property that the first inner integral $I_1(r)$ vanishes if r < R and the second
inner integral $I_2(h - r)$ vanishes if h − r < R. Given ε > 0 there also exists
some S such that
\[
|I_1(r)| \leq \|f_1\|_\Lambda + \varepsilon
\]
for $r \geq S$ and
\[
|I_2(h - r)| \leq \|f_2\|_\Lambda + \varepsilon
\]
for $h - r \geq S$. Together with the bound (14.23) this gives
\begin{align*}
\Big| \int (f_1 * f_2) \, d\lambda_h \rho_2 \Big| &\leq \frac{1}{h} \Big| \int_R^{h - R} I_1(r) I_2(h - r) \, dr \Big| + o_{f_1, f_2}(1) \\
&\leq O_{f_1, f_2}(1) \frac{1}{h} + \frac{1}{h} \int_S^{h - S} \big| I_1(r) I_2(h - r) \big| \, dr + o_{f_1, f_2}(1) \\
&\leq \frac{h - 2S}{h} (\|f_1\|_\Lambda + \varepsilon)(\|f_2\|_\Lambda + \varepsilon) + o_{f_1, f_2}(1)
\end{align*}
as h → ∞. As ε > 0 was arbitrary, this proves the claim in (14.22) and hence
the theorem.
14.3 Non-Trivial Spectrum of the Banach Algebra
We assume in this section that the semi-norm k·kΛ defined in Proposition 14.4
is non-trivial. Let AΛ be the completion of Cc (R) with respect to k · kΛ and
note that Theorem 14.6 shows that there is a Banach algebra homomorphism
Φ : L1 (R) → AΛ .
Essential Exercise 14.16. Give the details of the argument that deduces
the existence of a Banach algebra homomorphism Φ as above from The-
orem 14.6.
Theorem 14.17 (Spectrum of $A_\Lambda$). Suppose that $\| \cdot \|_\Lambda$ is a non-trivial
semi-norm on Cc(R), so that the associated Banach algebra $A_\Lambda$ is non-trivial.
Then there exists some ξ ∈ R such that
\[
C_c(\mathbb{R}) \ni f \longmapsto \widehat{f}(\xi) = \int_{\mathbb{R}} f(t) e^{-2\pi i t \xi} \, dt
\]
is continuous with respect to $\| \cdot \|_\Lambda$. Given such a ξ, if f ∈ Cc(R) and $f(t) e^{-2\pi i t \xi}$
is non-negative for all t ∈ R, then $\|f\|_\Lambda = \|f\|_1$.
Proof. Notice that every character χ of AΛ gives rise to the character χ ◦ Φ

on L1 (R). If χ is a non-trivial character, then χ ◦ Φ is also non-trivial since Φ
has dense image (by definition of AΛ ). As every non-trivial character of L1 (R)
has the form
f 7−→ fb(ξ)
for some ξ ∈ R (see Proposition 11.38), it is sufficient to show that AΛ has a
non-trivial character.
By the spectral radius formula (Corollary 11.29) the existence of a non-trivial
character follows if we can find some element $f \in A_\Lambda$ whose spectral
radius $\lim_{n \to \infty} \|f^{*n}\|_\Lambda^{1/n}$ is non-zero.
Suppose now that $g_0 \in C_c(\mathbb{R})$ has $\|g_0\|_\Lambda > 0$. By the density of $\mathscr{S}(\mathbb{R})$
in $L^1(\mathbb{R})$ we can find some $g_1 \in \mathscr{S}(\mathbb{R})$ with $\|g_1 - g_0\|_1 < \|g_0\|_\Lambda$, so that by
Theorem 14.6 we also have $\|g_1\|_\Lambda > 0$. In fact, we may even assume that $\widehat{g_1}$
lies in $C_c^\infty(\mathbb{R})$ since these functions are also dense in $L^1(\mathbb{R})$ (see Exercise 9.48).
Now let $f \in \mathscr{S}(\mathbb{R})$ be chosen so that $\widehat{f} \equiv 1$ on $\operatorname{Supp} \widehat{g_1}$. By Proposition 9.31
this implies that
\[
\widehat{g_1 * f^{*n}} = \widehat{g_1} \, \widehat{f}^{\,n} = \widehat{g_1}
\]
and so $g_1 * f^{*n} = g_1$ by Fourier inversion (Theorem 9.36). That is, f behaves
like an identity for the element $g_1$. Applying the continuous algebra
homomorphism $\Phi : L^1(\mathbb{R}) \to A_\Lambda$ we see that
\[
0 < \|g_1\|_\Lambda = \|g_1 * f^{*n}\|_\Lambda \leq \|g_1\|_\Lambda \|f^{*n}\|_\Lambda
\]
and so $\|f^{*n}\|_\Lambda \geq 1$ for all $n \geq 1$. As argued above, this gives the existence
of ξ ∈ R as in the theorem.
Now let one such ξ ∈ R be fixed and suppose that f ∈ Cc(R) has
\[
f(t) e^{-2\pi i t \xi} \geq 0
\]
for all t ∈ R. We then have
\[
\|f\|_1 = \int |f(t)| \, dt = \int f(t) e^{-2\pi i t \xi} \, dt = \widehat{f}(\xi) \leq \|f\|_\Lambda
\]
since the norm of a character is at most one by Lemma 11.24. By
Theorem 14.6 this shows that $\|f\|_1 = \|f\|_\Lambda$, as claimed.
14.4 Trivial Spectrum of the Banach Algebra
Theorem 14.18 (Trivial spectrum). For every ξ ∈ R there exists some f
in Cc(R) such that $f(t) e^{-2\pi i t \xi} \geq 0$ for all t ∈ R and $\|f\|_\Lambda < \|f\|_1$.
Notice that Theorems 14.17 and 14.18 together show that kf kΛ = 0 for
every f ∈ Cc (R). Proposition 14.4 then gives the PNT (Theorem 14.1).
Proof of Theorem 14.18 for ξ ≠ 0. In this case we will only use that
the density function D in Proposition 14.14 takes values in [0, 2] ⊆ R almost
everywhere. Let
\[
f_0(t) = \begin{cases} 1 & \text{if } |t| \leq 1, \\ 2 - |t| & \text{if } |t| \in [1, 2], \\ 0 & \text{otherwise,} \end{cases}
\]
and for a fixed ξ ≠ 0 we define $f(t) = f_0(t) e^{2\pi i t \xi}$. Choose a sequence $(h_n)$
with $h_n \to \infty$ as n → ∞ for which
\[
\|f\|_\Lambda = \lim_{n \to \infty} \Big| \int f \, d\lambda_{h_n} \nu_\Lambda - \int f \, dm \Big|.
\]
By Proposition 14.14 we may find a Borel measurable function D : R → [0, 2]
with
\[
\|f\|_\Lambda = \Big| \int f (D - 1) \, dm \Big|.
\]
Choose θ ∈ R with
\[
\|f\|_\Lambda = e^{i\theta} \int f (D - 1) \, dm,
\]
so that with the bound $|D(t) - 1| \leq 1$ for all t ∈ R we obtain
\begin{align*}
\|f\|_\Lambda &= \Re \int_{-2}^{2} e^{i\theta} e^{2\pi i t \xi} f_0(t) (D(t) - 1) \, dt \\
&= \int_{-2}^{2} \cos(\theta + 2\pi t \xi) f_0(t) (D(t) - 1) \, dt \\
&< \int_{-2}^{2} f_0(t) \, dt = \|f_0\|_1 = \|f\|_1,
\end{align*}
where the strict inequality follows from ξ ≠ 0. Thus the theorem follows in
this case.
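Proof of Theorem 14.18 for ξ = 0. In this case we will use Mertens’
theorem one more time. Define a function $f_0 \in C_c(\mathbb{R})$ by
\[
f_0(t) = \begin{cases} 1 & \text{if } t \in [-N, 0], \\ 1 - t & \text{if } t \in [0, 1], \\ N + 1 + t & \text{if } t \in [-(N + 1), -N], \\ 0 & \text{otherwise,} \end{cases}
\]
for some N to be determined later. By Mertens’ theorem (Theorem 14.15)
we have
\begin{align*}
\sum_{n \geq 1} \frac{\Lambda(n)}{n} f_0(\log n - h) &\leq \sum_{e^{h - (N + 1)} \leq n \leq e^{h + 1}} \frac{\Lambda(n)}{n} \\
&= \log e^{h + 1} - \log e^{h - (N + 1)} + O(1) = N + O(1)
\end{align*}
for $h \geq N + 1$ and
\begin{align*}
\sum_{n \geq 1} \frac{\Lambda(n)}{n} f_0(\log n - h) &\geq \sum_{e^{h - N} \leq n \leq e^{h}} \frac{\Lambda(n)}{n} \\
&= \log e^{h} - \log e^{h - N} + O(1) = N + O(1)
\end{align*}
for $h \geq N$. Choosing N sufficiently large and using $\int f_0(t) \, dt = N + 1$, we
find that
\[
\Big| \sum_{n \geq 1} \frac{\Lambda(n)}{n} f_0(\log n - h) - \int f_0(t) \, dt \Big| < N < \|f_0\|_1
\]
for all sufficiently large h. Thus $\|f_0\|_\Lambda < \|f_0\|_1$ and the theorem follows.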
14.5 Primes in Arithmetic Progressions
The prime number theorem generalizes to give an asymptotic density for

primes in arithmetic progressions, strengthening Dirichlet’s classical result
that an arithmetic progression {nq + a | n ∈ N} contains infinitely many
primes if gcd(q, a) = 1. In this section we once again follow Tao’s blog [103]
and indicate, largely through a sequence of exercises, how the arguments
above can be adapted to obtain additional asymptotic results which combine
to prove the following.
Theorem 14.19 (PNT in arithmetic progressions). Fix $q \geq 1$. Then
\[
\sum_{\substack{p \leq x; \\ p \equiv a \ (\mathrm{mod}\ q)}} 1 \sim \frac{x}{\varphi(q) \log x} \tag{14.24}
\]
as x → ∞, where a ∈ Z has gcd(a, q) = 1 and $\varphi(q) = |(\mathbb{Z}/q\mathbb{Z})^\times|$ is the Euler
totient function of q.
Essential Exercise 14.20. Show that in order to prove Theorem 14.19 it is
enough to show that
\[
\sum_{\substack{n \leq x; \\ n \equiv a \ (\mathrm{mod}\ q)}} \Lambda(n) \sim \frac{x}{\varphi(q)}
\]
for any a ∈ Z with gcd(a, q) = 1.

The sum in Exercise 14.20 lacks a certain structure, so we decompose the
characteristic function of the arithmetic progression into more convenient
expressions. Characters of the multiplicative group $(\mathbb{Z}/q\mathbb{Z})^\times$ are usually called
multiplicative characters, so a function $\chi : (\mathbb{Z}/q\mathbb{Z})^\times \to \mathbb{S}^1$ is a multiplicative
character if χ(ab) = χ(a)χ(b) for all $a, b \in (\mathbb{Z}/q\mathbb{Z})^\times$. These are of course
simply characters on this abelian multiplicative group in the sense of Fourier
analysis. As is customary, we think of a multiplicative character χ as defining
a function χ′ : Z → C called a Dirichlet character of level or modulus q by
defining χ′(k) = 0 if gcd(q, k) ≠ 1 and χ′(k) = χ(k + qZ) if gcd(q, k) = 1
for k ∈ Z. For convenience we will again write χ for χ′.
Essential Exercise 14.21. (a) For any a ∈ Z with gcd(q, a) = 1 show that
the function $f_a = 1_{\{k \in \mathbb{Z} \mid k \equiv a \ (\mathrm{mod}\ q)\}}$ can be expressed as a linear combination
of Dirichlet characters of modulus q.
(b) Show that in order to prove Theorem 14.19 it is enough to show that
\[
\sum_{n \leq x} \chi(n) \Lambda(n) = o(x) \tag{14.25}
\]
for any non-trivial Dirichlet character χ of modulus q.
Now fix a non-trivial Dirichlet character χ of modulus q.
Essential Exercise 14.22. Adapt Proposition 14.4 and deduce that it is
sufficient to show that
\[
\|f\|_\chi = \limsup_{h \to \infty} \Big| \sum_{n \geq 1} \frac{\chi(n) \Lambda(n)}{n} f(\log n - h) \Big|
\]
vanishes for all f ∈ Cc(R).
Essential Exercise 14.23. Show that $\| \cdot \|_\chi$ from Exercise 14.22 defines a
semi-norm on Cc(R) satisfying $\|f\|_\chi \leq \|f\|_1$ for every f ∈ Cc(R).
We now study a ‘twisted’ version of $\nu_\Lambda$ from Section 14.2.3, namely
\[
\nu_{\chi\Lambda} = \sum_{n=1}^{\infty} \frac{\chi(n) \Lambda(n)}{n} \delta_{\log n},
\]
which is a complex-valued Radon measure on [0, ∞).
Essential Exercise 14.24. Using the fact that λh νΛ → m in the weak*

topology as h → ∞, show that for any sequence (hn ) with hn → ∞ as n → ∞
there exists a subsequence (hnk ) such that λhnk νχΛ converges in the weak*
topology to Dχ dm for some density function Dχ taking values in the convex
hull of the values of χ in C.
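Working towards the proof of the algebra inequality, we replace the second
von Mangoldt function with the twisted version $\chi\Lambda_2$, which by Lemma 14.8
satisfies
\begin{align*}
\chi(n) \Lambda_2(n) &= \sum_{d \mid n} \chi(d) \mu(d) \chi\big(\tfrac{n}{d}\big) \log^2\big(\tfrac{n}{d}\big) \tag{14.26} \\
&= \chi(n) \Lambda(n) \log n + \sum_{d \mid n} \chi(d) \Lambda(d) \chi\big(\tfrac{n}{d}\big) \Lambda\big(\tfrac{n}{d}\big) \\
&= \chi(n) \Lambda(n) \log n + (\chi\Lambda \overset{D}{*} \chi\Lambda)(n).
\end{align*}
By using a complex analogue of Lemma 14.10, we can
define $\nu^{\chi}_{\mathrm{sym}}$ by
\[
d\nu^{\chi}_{\mathrm{sym}} = d\nu_{\chi\Lambda_2 / \log} = d\nu_{\chi\Lambda} + \frac{1}{t} \, d(\nu_{\chi\Lambda} * \nu_{\chi\Lambda}),
\]
where the second equality follows from the formula above, just as in the proof
of Corollary 14.13.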
Essential Exercise 14.25. (a) Show that
\[
\frac{1}{y} \sum_{n \leq y} \chi(n) \log^2 n = O\Big(\frac{\log^2(1 + y)}{y}\Big)
\]
for $y \geq 1$.
(b) Deduce the twisted version of the Selberg symmetry formula,
\[
\sum_{n \leq x} \chi(n) \Lambda_2(n) = O(x)
\]
for $x \geq 1$.
(c) Show that $\lambda_h \nu^{\chi}_{\mathrm{sym}} \to 0$ in the weak* topology as h → ∞.
Essential Exercise 14.26. Show that kf1 ∗ f2 kχ 6 kf1 kχ kf2 kχ for all func-
tions f1 , f2 ∈ Cc (R).
Essential Exercise 14.27. Show that Theorem 14.17 also holds in a similar
way for the semi-norm k · kχ .
Essential Exercise 14.28. Use Exercise 14.27 to prove Theorem 14.18 for
the semi-norm k · kχ and ξ 6= 0.
It remains to establish the analogue of Theorem 14.18 for $\| \cdot \|_\chi$ and ξ = 0.
In this case we previously used the full force of Mertens’ theorem
(Theorem 14.15). Here we replace this with the statement
\[
\sum_{n \leq x} \frac{\chi(n) \Lambda(n)}{n} = O(1) \tag{14.27}
\]
for $x \geq 1$, which we prove in the following subsection (this is also due to
Dirichlet).
Essential Exercise 14.29. Assuming (14.27), prove Theorem 14.18 for k·kχ
and ξ = 0, and conclude the proof of Theorem 14.19.
14.5.1 Non-Vanishing of Dirichlet L-function at 1
In this section we will prove (14.27). This will require a brief excursion into
the beginnings of analytic number theory; we refer to Serre [97] for more
details. The tools needed are basic properties of Dirichlet series and the
Abel summation formula. Following a convention going back to Riemann, we
write s = σ+it with σ, t ∈ R for any s ∈ C. There are shorter proofs of (14.27)
which do not use complex analysis; we refer, for example, to Tao [103] for the
details.
Essential Exercise 14.30. Given a sequence $(a_n)$ of complex numbers,
associate to it a formal Dirichlet series $\sum_{n \geq 1} \frac{a_n}{n^s}$. Suppose that the set of s ∈ C
for which the series converges absolutely is neither the empty set nor all of C.
Show that there exists some $\sigma_a \in \mathbb{R}$, the abscissa of absolute convergence,
such that the series converges absolutely if $\sigma > \sigma_a$ but does not converge
absolutely if $\sigma < \sigma_a$. Moreover, show that the convergence is uniform on
compact subsets of $\{s \in \mathbb{C} \mid \Re(s) > \sigma_a\}$, so that the Dirichlet series defines a
holomorphic function on this half-plane.
A function θ : N → C is said to be completely multiplicative if
θ(mn) = θ(m)θ(n)
for all m, n > 1 and is multiplicative if the same property holds for all m, n > 1
with gcd(m, n) = 1. In particular, Dirichlet characters are completely multi-
plicative and the Möbius function is multiplicative.
Theorem 14.31 (Dirichlet L-functions). Fix $q \geq 1$ and let χ be a Dirichlet
character of modulus q. Define
\[
L(s, \chi) = \sum_{n \geq 1} \frac{\chi(n)}{n^s} = \sum_{n : \gcd(n, q) = 1} \frac{\chi(n)}{n^s},
\]
which converges absolutely for ℜ(s) > 1.

(1) If χ is non-trivial, then the series defining L(s, χ) converges uniformly
on any compact subset of $H_+ = \{s \in \mathbb{C} \mid \Re(s) > 0\}$, so that the series
for L(s, χ) defines a holomorphic function on the right half-plane $H_+$.
Moreover,
\[
\Big| L(1, \chi) - \sum_{n \leq x} \frac{\chi(n)}{n} \Big| \ll x^{-1} \tag{14.28}
\]
for $x \geq 1$.
(2) Writing $\chi_0$ for the trivial character, $L(s, \chi_0)$ has a meromorphic extension
to the half-plane $H_+$ which has a simple pole at s = 1, and is holomorphic
on $H_+ \smallsetminus \{1\}$.
(3) If χ is a non-trivial character, then L(1, χ) ≠ 0.
Using in particular the last statement of this theorem, we will be able to

prove the remaining step for the proof of Theorem 14.19.
Corollary 14.32. The bound (14.27) holds for any non-trivial Dirichlet
character χ.
The following is a rather simple but useful tool for our discussions.
Lemma 14.33 (Abel summation). For any sequences $(a_n)$ and $(b_n)$,
\[
S_m = \sum_{n=1}^{m-1} A_n (b_n - b_{n+1}) + A_m b_m
\]
where $A_m = \sum_{n=1}^{m} a_n$ and $S_m = \sum_{n=1}^{m} a_n b_n$, for all $m \geq 1$.
Proof. Notice that $a_n = A_n - A_{n-1}$ for $n \geq 1$ and $A_0 = 0$, so that
\[
S_m = \sum_{n=1}^{m} (A_n - A_{n-1}) b_n = \sum_{n=1}^{m} A_n b_n - \sum_{n=1}^{m-1} A_n b_{n+1}.
\]
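Since Lemma 14.33 is an exact algebraic identity, it can be checked with rational arithmetic. The Python sketch below is an illustration only; the function name `abel_sides` and the sample sequences (the values of the non-trivial character of modulus 4 for $a_n$, and $b_n = \frac{1}{n}$) are choices made here, not taken from the text.

```python
from fractions import Fraction


def abel_sides(a, b):
    """Return (S_m, right-hand side of Lemma 14.33) for a_1..a_m and b_1..b_m.

    The lists are 0-indexed, so a[n - 1] stores a_n and b[n - 1] stores b_n.
    """
    m = len(a)
    partial = []                      # partial[n - 1] = A_n = a_1 + ... + a_n
    total = 0
    for value in a:
        total += value
        partial.append(total)
    s_m = sum(x * y for x, y in zip(a, b))
    rhs = sum(partial[n] * (b[n] - b[n + 1]) for n in range(m - 1))
    rhs += partial[m - 1] * b[m - 1]  # boundary term A_m * b_m
    return s_m, rhs


if __name__ == "__main__":
    a = [Fraction(c) for c in (1, 0, -1, 0, 1, 0, -1)]
    b = [Fraction(1, n) for n in range(1, 8)]
    print(abel_sides(a, b))
```

Both entries of the returned pair should coincide for any choice of finite sequences of equal positive length.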
Proof of Corollary 14.32. We now use L(1, χ) ≠ 0 to deduce (14.27).
We calculate
\[
\sum_{n \leq x} \frac{\chi(n) \log n}{n} = \sum_{n \leq x} \frac{\chi(n)}{n} \sum_{n = dm} \Lambda(d) = \sum_{d \leq x} \frac{\chi(d) \Lambda(d)}{d} \underbrace{\sum_{m \leq x/d} \frac{\chi(m)}{m}}_{S_k},
\]
where $S_k = \sum_{m \leq k} \frac{\chi(m)}{m}$ with $k = \lfloor \frac{x}{d} \rfloor$ is the partial sum appearing in
Theorem 14.31(1). By (14.28) we then have $|S_k - L(1, \chi)| \ll \frac{1}{k}$. Substituting this
into the expression above gives
\begin{align*}
\sum_{n \leq x} \frac{\chi(n) \log n}{n} &= \sum_{d \leq x} \frac{\chi(d) \Lambda(d)}{d} \Big( L(1, \chi) + O\Big(\frac{d}{x}\Big) \Big) \\
&= L(1, \chi) \sum_{d \leq x} \frac{\chi(d) \Lambda(d)}{d} + O\Big( \frac{1}{x} \sum_{d \leq x} \Lambda(d) \Big) \\
&= L(1, \chi) \sum_{d \leq x} \frac{\chi(d) \Lambda(d)}{d} + O(1)
\end{align*}
by (14.15).
We want to show that the left-hand side in the last calculation is also O(1),
as then (14.27) follows since L(1, χ) ≠ 0. For this we use Lemma 14.33
with $a_n = \chi(n)$ and $b_n = \frac{\log n}{n}$. Note that
\[
A_m = \sum_{n=1}^{m} a_n
\]
satisfies $|A_m| \leq \varphi(q)$ since $\sum_{n=0}^{q-1} \chi(n) = 0$ and χ(n + q) = χ(n) for all n ∈ N.
This gives
\[
\Big| \sum_{n \leq x} \frac{\chi(n) \log n}{n} \Big| \leq \varphi(q) \sum_{n=1}^{\ell - 1} |b_n - b_{n+1}| + \varphi(q) b_\ell,
\]
where $\ell = \lfloor x \rfloor$. Since the sequence $(b_n)$ is monotonically decreasing for
sufficiently large n with limit 0, we obtain a telescoping sum and deduce
that this term is indeed O(1). As mentioned above, this finishes the proof
since L(1, χ) ≠ 0.
We split the proof of Theorem 14.31 into several steps.

Proof of Theorem 14.31(1) and (2). Since $\big| \frac{\chi(n)}{n^s} \big| \leq \frac{1}{n^\sigma}$, the series defining
L(s, χ) converges absolutely for σ > 1.

Uniform convergence on compact subsets of $H_+$. Let χ be a non-trivial
character, $K \subseteq H_+$ a compact subset, and choose $\sigma_0$ so as to ensure
that $\Re(s) \geq \sigma_0 > 0$ for all s ∈ K. We apply Abel summation with $a_n = \chi(n)$
and $b_n = n^{-s}$ to obtain
\[
\sum_{n=1}^{m} \frac{\chi(n)}{n^s} = S_m = \sum_{n=1}^{m-1} A_n \big( n^{-s} - (n + 1)^{-s} \big) + A_m m^{-s}. \tag{14.29}
\]
Note that $|m^{-s}| \leq m^{-\sigma_0} \to 0$ as m → ∞. Since the derivative of $x \mapsto x^{-s}$
with respect to x is $-s x^{-s-1}$, we see that
\[
\big| n^{-s} - (n + 1)^{-s} \big| \leq R n^{-\sigma_0 - 1}
\]
for all s ∈ K, where R > 0 satisfies $K \subseteq B_R^{\mathbb{C}}$. Together with $|A_n| \leq \varphi(q)$ for
all $n \geq 1$ we see that (14.29) converges uniformly on K. This shows that
\[
L(s, \chi) = \sum_{n=1}^{\infty} A_n \big( n^{-s} - (n + 1)^{-s} \big)
\]
is a holomorphic function on $H_+$, and setting s = 1 we also obtain
\[
|L(1, \chi) - S_m| = \Big| \sum_{n=m}^{\infty} A_n \big( n^{-1} - (n + 1)^{-1} \big) - A_m m^{-1} \Big| \ll m^{-1},
\]
after again using the triangle inequality and $|A_n| \leq \varphi(q)$, since the sum
telescopes. This gives the claim in (14.28).
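To see the rate in (14.28) in a concrete case, one may take the non-trivial Dirichlet character of modulus 4, for which L(1, χ) is the Leibniz series with value π/4. The Python sketch below is an illustration under this specific choice; the names `chi4` and `partial_sum` are hypothetical. It prints x times the error of the partial sum, which (14.28) predicts stays bounded.

```python
import math


def chi4(n):
    """The non-trivial Dirichlet character of modulus 4 (values 0, 1, 0, -1)."""
    return (0, 1, 0, -1)[n % 4]


def partial_sum(x):
    """Partial sum of chi4(n) / n over 1 <= n <= x."""
    return sum(chi4(n) / n for n in range(1, x + 1))


if __name__ == "__main__":
    target = math.pi / 4          # value of the Leibniz series 1 - 1/3 + 1/5 - ...
    for x in (10, 100, 1_000, 10_000):
        error = abs(target - partial_sum(x))
        print(x, x * error)
```

For this alternating series the error is at most the first omitted term, which is smaller than 1/x, so the printed products stay below 1.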
Properties of L(s, χ0 ). Let χ0 be the trivial character of modulus q. It
will be convenient to start by recalling some properties of the Riemann zeta
function, defined for ℜ(s) > 1 by
X∞
1
ζ(s) = s
.
n=1
n
One easily see that this series converges absolutely for ℜ(s) > 1, and so defines
a holomorphic function there by Exercise 14.30. To obtain the extension
to H+ and the pole at s = 1, we write
\[ \zeta(s) - \frac{1}{s-1} = \sum_{n=1}^{\infty} n^{-s} - \int_1^{\infty} x^{-s} \, dx = \sum_{n=1}^{\infty} \int_n^{n+1} \bigl( n^{-s} - x^{-s} \bigr) \, dx, \]
and as in the proof of the first part of the theorem, we see that the series on
the right-hand side converges uniformly on any compact subset K ⊆ H+.
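As a sanity check of the series on the right-hand side, one can evaluate it at s = 1, where each term equals $\frac{1}{n} - \log\frac{n+1}{n}$. The value there is the constant term of the Laurent expansion of ζ at s = 1, which is Euler's constant γ (a standard fact not stated in the text). A minimal Python sketch, with a hypothetical function name:

```python
import math

def zeta_minus_pole_at_one(N):
    """Partial sum of sum_n int_n^{n+1} (1/n - 1/x) dx = sum_n (1/n - log(1 + 1/n))."""
    return sum(1.0 / n - math.log1p(1.0 / n) for n in range(1, N + 1))

EULER_GAMMA = 0.5772156649015329  # reference value of Euler's constant
approx = zeta_minus_pole_at_one(100000)
print(approx, abs(approx - EULER_GAMMA))
```

The terms are of size about $1/(2n^2)$, so the truncation error after N terms is roughly 1/(2N).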
Returning to the trivial Dirichlet character χ0 of modulus q, we will see
that the difference between L(s, χ0 ) and ζ (or more precisely, their ratio)
is relatively benign. Let p1 , . . . , pℓ be the finite list of primes that divide q.
Using unique factorization and
\[ L(s, \chi_0) = \sum_{n : \gcd(n,q)=1} n^{-s} \]
we obtain
\[ L(s, \chi_0) (1 - p_1^{-s})^{-1} \cdots (1 - p_\ell^{-s})^{-1} = \Bigl( \sum_{n : \gcd(n,q)=1} n^{-s} \Bigr) \prod_{j=1}^{\ell} \bigl( 1 + p_j^{-s} + p_j^{-2s} + \cdots \bigr) = \sum_{n \ge 1} n^{-s} = \zeta(s) \tag{14.30} \]
for ℜ(s) > 1 by absolute convergence of all the series involved. Since the
functions $s \mapsto (1 - p^{-s})^{\pm 1}$ are holomorphic on H+, we may use the results
for ζ above and deduce the same properties for L(s, χ0).
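The relation (14.30) can be checked numerically in the rearranged form $L(s, \chi_0) = \zeta(s) \prod_{p \mid q} (1 - p^{-s})$. The sketch below takes q = 6 and s = 2 (illustrative choices made here) and uses the known value ζ(2) = π²/6; the function names are hypothetical.

```python
import math

def L_trivial(s, q, N):
    """Partial sum of L(s, chi_0): sum of n^{-s} over n <= N coprime to q."""
    return sum(n ** (-s) for n in range(1, N + 1) if math.gcd(n, q) == 1)

def euler_factor_product(s, primes):
    """Product of (1 - p^{-s}) over the given primes."""
    prod = 1.0
    for p in primes:
        prod *= 1.0 - p ** (-s)
    return prod

q, primes_dividing_q, s = 6, (2, 3), 2.0
lhs = L_trivial(s, q, 200000)
rhs = (math.pi ** 2 / 6) * euler_factor_product(s, primes_dividing_q)  # zeta(2) = pi^2 / 6
print(lhs, rhs)
```

Both sides are close to π²/9; the discrepancy is the tail of the truncated series, which is smaller than 1/N.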
The last part of the proof of Theorem 14.31 requires some facility with
Dirichlet series provided by the following exercise and lemma.
Essential Exercise 14.34. Show that if $f(s) = \sum_{n \ge 1} \frac{a_n}{n^s}$ converges
absolutely for ℜ(s) > σ0, then $-\sum_{n \ge 1} \frac{a_n \log n}{n^s}$ converges absolutely and uniformly
on compact subsets of {s ∈ C | ℜ(s) > σ0}, and $f'(s) = -\sum_{n \ge 1} \frac{a_n \log n}{n^s}$
there.
Lemma 14.35. Let $(a_n)$ be a real sequence with $a_n \ge 0$ for all n ≥ 1 and
suppose the Dirichlet series $f(s) = \sum_{n \ge 1} \frac{a_n}{n^s}$ converges for ℜ(s) > 1. Suppose
that f can be extended to a meromorphic function on H+, also denoted f.
Then either
• $\sum_{n \ge 1} \frac{a_n}{n^s}$ converges absolutely for ℜ(s) > 0 and f is holomorphic on H+,
or
• there exists some σ0 > 0 such that $\sum_{n \ge 1} \frac{a_n}{n^{\sigma_0}} = \infty$, $\sum_{n \ge 1} \frac{a_n}{n^s}$ converges
absolutely for ℜ(s) > σ0, and f has a pole at σ0.
Proof. Define
\[ \sigma_0 = \inf \Bigl\{ \sigma > 0 \Bigm| \sum_{n \ge 1} \frac{a_n}{n^\sigma} < \infty \Bigr\}. \]
By non-negativity of the coefficients $a_n$ and monotonicity of $\sigma \mapsto n^{-\sigma}$, we see
that the series $\sum_{n \ge 1} \frac{a_n}{n^s}$ converges absolutely for ℜ(s) > σ0, and so defines a
holomorphic function there (see Exercise 14.30), which must therefore coincide
with f. If σ0 = 0 then we are in the first case of the lemma. If σ0 > 0
and $\sum_{n \ge 1} \frac{a_n}{n^{\sigma_0}} = \infty$, then $f(s) = \sum_{n \ge 1} \frac{a_n}{n^s}$ for ℜ(s) > σ0. Moreover, it is
easy to see (for example, using the monotone convergence theorem) by
non-negativity of the coefficients that
\[ \lim_{\sigma \searrow \sigma_0} f(\sigma) = \lim_{\sigma \searrow \sigma_0} \sum_{n \ge 1} \frac{a_n}{n^\sigma} = \sum_{n \ge 1} \frac{a_n}{n^{\sigma_0}} = \infty, \]
which shows that f must have a pole at σ0.

It remains to rule out the possibility that σ0 > 0 and $\sum_{n \ge 1} \frac{a_n}{n^{\sigma_0}} < \infty$.
Assuming this is the case and using the assumption that f is meromorphic
on H+ we will deduce that $\sum_{n \ge 1} \frac{a_n}{n^{\sigma_0 - \varepsilon}} < \infty$ for some ε ∈ (0, σ0), which
will be a contradiction to the definition of σ0. Since the coefficients are
non-negative we have
\[ \sum_{n \ge 1} \frac{a_n}{n^{\sigma_0}} = \lim_{\sigma \searrow \sigma_0} \sum_{n \ge 1} \frac{a_n}{n^\sigma} = \lim_{\sigma \searrow \sigma_0} f(\sigma), \]
which shows that f must be holomorphic at σ0 (since if σ0 were to be a
pole of f we would have $\lim_{\sigma \searrow \sigma_0} |f(\sigma)| = \infty$). By the same argument and
Exercise 14.34 we then have
\[ \sum_{n \ge 1} \frac{a_n (\log n)^k}{n^{\sigma_0}} = (-1)^k f^{(k)}(\sigma_0) < \infty \]
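for all k ≥ 0. By the Taylor expansion of f at σ0, this gives for sufficiently
small ε > 0 that
\begin{align*}
f(\sigma_0 - \varepsilon) &= \sum_{k=0}^{\infty} \frac{1}{k!} f^{(k)}(\sigma_0) (-\varepsilon)^k = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{n \ge 1} \frac{a_n (\log n)^k}{n^{\sigma_0}} \varepsilon^k \\
&= \sum_{n \ge 1} a_n \sum_{k=0}^{\infty} \frac{1}{k!} (-\log n)^k n^{-\sigma_0} (-\varepsilon)^k
\end{align*}
since all terms are again non-negative. The inner sum is precisely the Taylor
expansion of $s \mapsto n^{-s}$ at σ0 and so gives $n^{-(\sigma_0 - \varepsilon)}$ and hence
\[ \sum_{n \ge 1} \frac{a_n}{n^{\sigma_0 - \varepsilon}} = f(\sigma_0 - \varepsilon) < \infty, \]
which contradicts the definition of σ0.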
Proof of Theorem 14.31(3): non-vanishing of L(1, χ). Summarizing
the arguments above, we have obtained the meromorphic function
s ↦ L(s, χ0)
on H+ for the trivial Dirichlet character χ0 with a simple pole at s = 1, and
the holomorphic functions s ↦ L(s, χ) on H+ for any non-trivial Dirichlet
character χ. If L(1, χ) = 0 for some non-trivial Dirichlet character χ, then
the function $\zeta_q$ defined by
\[ \zeta_q(s) = \prod_{\chi} L(s, \chi) \]
would be holomorphic on H+, since the zero at s = 1 would cancel the simple
pole of L(s, χ0). Here the product is taken over all the characters.
We will see that $\zeta_q$ has a pole at 1 by using Euler product expansions.
Unique factorization in the integers and complete multiplicativity of χ show
that
\[ L(s, \chi) = \sum_{n \ge 1} \frac{\chi(n)}{n^s} = \prod_{p} \Bigl( 1 - \frac{\chi(p)}{p^s} \Bigr)^{-1} = \prod_{p : \gcd(p,q)=1} \Bigl( 1 - \frac{\chi(p)}{p^s} \Bigr)^{-1} \]
for ℜ(s) > 1. This may be seen by extending the argument for (14.30) to all
primes and using absolute convergence. Taking the product over all Dirichlet
characters χ again gives the function
\[ \zeta_q(s) = \prod_{\chi} \prod_{p : \gcd(p,q)=1} \Bigl( 1 - \frac{\chi(p)}{p^s} \Bigr)^{-1}. \tag{14.31} \]
Clearly the set of Dirichlet characters of modulus q forms a group with
respect to pointwise multiplication, and for a fixed p with gcd(p, q) = 1 the
image of the homomorphism $\chi \mapsto \chi(p) \in S^1 \subseteq \mathbb{C}$ is a subgroup consisting of
all roots of unity of order $f(p) \mid \phi(q)$, and the kernel of this homomorphism
has $g(p) = \frac{\phi(q)}{f(p)}$ elements. It follows that
\[ \prod_{\chi} \bigl( 1 - \chi(p) p^{-s} \bigr) = \prod_{k=1}^{f(p)} \bigl( 1 - \omega_{f(p)}^{k} p^{-s} \bigr)^{g(p)} = \bigl( 1 - p^{-f(p)s} \bigr)^{g(p)}, \]
where $\omega_{f(p)}$ is a primitive f(p)th root of unity. Using this in the
expression (14.31) for $\zeta_q(s)$ gives
\begin{align*}
\zeta_q(s) &= \prod_{p : \gcd(p,q)=1} \bigl( 1 - p^{-f(p)s} \bigr)^{-g(p)} \\
&= \prod_{p : \gcd(p,q)=1} \bigl( 1 + p^{-f(p)s} + p^{-2f(p)s} + \cdots \bigr)^{g(p)} \tag{14.32} \\
&= \sum_{n : \gcd(n,q)=1} \frac{a_n}{n^s} = \sum_{n} \frac{a_n}{n^s}
\end{align*}
for ℜ(s) > 1, where we expanded the Euler product once again into a
convergent Dirichlet series with certain coefficients $a_n$. Notice that the precise
form of (14.32) shows that $a_n \in \mathbb{N}_0$ for all n ∈ N. By Lemma 14.35 the
series $\sum_{n \ge 1} \frac{a_n}{n^s}$ either converges absolutely for ℜ(s) > 0 and is holomorphic
on H+ or there exists some σ0 > 0 such that $\zeta_q$ has a pole at σ0. In the latter
case it follows that σ0 = 1 and L(1, χ) ≠ 0 for every non-trivial Dirichlet
character χ ≠ χ0.
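The character identity $\prod_\chi (1 - \chi(p) z) = (1 - z^{f(p)})^{g(p)}$ used above (with $z = p^{-s}$) can be verified numerically. The following Python sketch makes two simplifying assumptions that are not in the text: the modulus q is prime, so that the characters can be written down from a primitive root, and f(p) is computed as the multiplicative order of p modulo q (a standard fact). All function names are hypothetical.

```python
import cmath

def characters_mod_prime(q, root):
    """All q - 1 Dirichlet characters modulo a prime q, as dicts on 1..q-1.
    `root` must be a primitive root modulo q."""
    log = {}                      # discrete logarithm: log[root^j mod q] = j
    x = 1
    for j in range(q - 1):
        log[x] = j
        x = (x * root) % q
    if len(log) != q - 1:
        raise ValueError("root is not a primitive root modulo q")
    return [
        {a: cmath.exp(2j * cmath.pi * k * log[a] / (q - 1)) for a in log}
        for k in range(q - 1)
    ]

def multiplicative_order(p, q):
    """Order f(p) of p in the group of units modulo q."""
    f, x = 1, p % q
    while x != 1:
        x = (x * p) % q
        f += 1
    return f

def both_sides(q, root, p, z):
    """Return (prod over chi of (1 - chi(p) z), (1 - z^f(p))^g(p))."""
    lhs = 1
    for chi in characters_mod_prime(q, root):
        lhs *= 1 - chi[p % q] * z
    f_p = multiplicative_order(p, q)
    g_p = (q - 1) // f_p          # phi(q) = q - 1 for prime q
    return lhs, (1 - z ** f_p) ** g_p

lhs, rhs = both_sides(7, 3, 2, 0.3 + 0.1j)   # q = 7, primitive root 3, p = 2
print(lhs, rhs)
```

Here p = 2 has order f(2) = 3 modulo 7, so each cube root of unity occurs as χ(2) for exactly g(2) = 2 characters.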
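It remains to show that the former case cannot occur. To see this, notice
that for any prime p with gcd(p, q) = 1 and σ > 0 we have
\[ \bigl( 1 + p^{-f(p)\sigma} + p^{-2f(p)\sigma} + \cdots \bigr)^{g(p)} \ge 1 + p^{-\phi(q)\sigma} + p^{-2\phi(q)\sigma} + \cdots \]
and hence under the assumption that $\zeta_q(\sigma) = \sum_{n \ge 1} \frac{a_n}{n^\sigma}$ converges for σ > 0,
\begin{align*}
\zeta_q(\sigma) = \sum_{n \ge 1} \frac{a_n}{n^\sigma} &= \prod_{p : \gcd(p,q)=1} \bigl( 1 + p^{-f(p)\sigma} + p^{-2f(p)\sigma} + \cdots \bigr)^{g(p)} \\
&\ge \prod_{p : \gcd(p,q)=1} \bigl( 1 + p^{-\phi(q)\sigma} + p^{-2\phi(q)\sigma} + \cdots \bigr) = L(\phi(q)\sigma, \chi_0) \tag{14.33}
\end{align*}
also for all σ > 0. Using (14.33) we obtain
\[ \lim_{\sigma \searrow \frac{1}{\phi(q)}} \zeta_q(\sigma) = \lim_{\sigma \searrow \frac{1}{\phi(q)}} L(\phi(q)\sigma, \chi_0) = \infty, \]
a contradiction to $\zeta_q$ being holomorphic on H+.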

In this chapter we have discussed a single result of great historical import-
ance, which involved, in addition to functional analysis, some ideas from ana-
lytic number theory. For a thorough introduction to this important subject
we refer to the guided course in the work of Murty [77] and the monograph
of Iwaniec and Kowalski [49].
Appendix A: Set Theory and Topology
A.1 Set Theory and the Axiom of Choice
We will be using naive set theory, and in particular will use without specific
reference the axioms of Zermelo–Fraenkel set theory with the axiom of choice
(we refer to Kelley [51] for a good general source on all of the material in this
appendix). This does require some caution. For example, it does not permit
there to be a set that contains all sets, for if there were such a ‘universal’ set V
then its subset C = {A ∈ V | A ∉ A} forces the statement C ∈ C ⇐⇒ C ∉ C,
which is contradictory.
Here are some basic properties of sets that we will use without comment.
(1) A set will never contain itself.
(2) For every set S of sets there is a set $\bigcup_{A \in S} A$, the union, containing all
elements that are contained in some A ∈ S.
(3) For every set A there is a power set P(A) containing all subsets of A.
(4) Any condition on the elements of a set can be used to define a new set,
namely the subset of all elements that satisfy the condition.
Examples of sets include the empty set ∅, the natural numbers N, the real
numbers R, the set of functions R → C, which may also be written as $\mathbb{C}^{\mathbb{R}}$,
and so on.
The following axiom of set theory is less intuitive than those above, but it
plays a central role in analysis.
Axiom of Choice. Suppose that $A_\imath$ is a non-empty set for all ı ∈ I. Then
there is a function
\[ f : I \longrightarrow \bigcup_{\imath \in I} A_\imath \]
with $f(\imath) \in A_\imath$ for all ı ∈ I.

In other words, the Cartesian product $\prod_{\imath \in I} A_\imath$ (which by definition
comprises all such functions) is non-empty.

While this axiom appears quite innocent (indeed, it appears almost obvious),
it turns out to have a number of exotic consequences.(36) The axiom of
choice has many equivalent formulations, one of which is Zorn’s lemma, which
is particularly useful in analysis. In order to state this, recall that a partial
order on a set S is a relation ≼ with the reflexivity property that a ≼ a
for all a ∈ S, the transitivity property that a ≼ b, b ≼ c ⟹ a ≼ c for
all a, b, c ∈ S, and the anti-symmetry property that a ≼ b, b ≼ a ⟹ a = b
for all a, b ∈ S. A partial order is a linear order if for every pair a, b ∈ S
we have either a ≼ b or b ≼ a. A maximal element in a partially ordered
set (S, ≼) is an element m ∈ S for which m ≼ a for some a ∈ S implies
that a = m.
Zorn’s lemma. Let (S, ≼) be a partially ordered set, and suppose that for
every linearly ordered subset L ⊆ S there exists an element m ∈ S with ℓ ≼ m
for all ℓ ∈ L. Then there exists a maximal element m ∈ S.
One might imagine setting out to prove Zorn’s lemma inductively along
the following lines. Starting with a single element (which certainly forms a
linearly ordered set) one can build larger and larger linearly ordered subsets.
If the current linearly ordered subset L has a maximal element, then it may
also be a maximal element for S, in which case we are done. Otherwise, one
can use the assumed property and add an element to L which is bigger than
every element of L. Repeating this inductively (by transfinite induction, and
noting that this procedure only ends once a maximal element in S is found),
Zorn’s lemma follows. However, in the course of the proof one has to make
(potentially uncountably) many choices, and doing this carefully reveals that
the argument needs the axiom of choice.
A.2 Basic Definitions in Topology
The notion of an open set is fundamental for defining continuity and conver-
gence.
Definition A.1. Let X be a set. A family T ⊆ P(X) of subsets of X is called
a topology on X if
• ∅, X ∈ T;
• if O1, O2 ∈ T then O1 ∩ O2 ∈ T;
• if $O_i \in T$ for all i ∈ I, where I is an arbitrary index set, then $\bigcup_{i \in I} O_i \in T$.
The pair (X, T) is called a topological space. The elements of a topology are
called open sets and a set A ⊆ X with X∖A ∈ T is called closed. A set that
is both open and closed is called a clopen set.
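On a finite set the axioms of Definition A.1 can be checked mechanically. The following Python sketch is only an illustration with hypothetical function names; it relies on the observation that for a finite family, closure under pairwise unions and intersections already gives closure under arbitrary unions and finite intersections.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the axioms of Definition A.1 for a family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(O) for O in T}
    if frozenset() not in T or X not in T:
        return False
    if any(not O <= X for O in T):
        return False
    # For a finite family, pairwise closure implies closure under
    # finite intersections and arbitrary unions.
    return all(O1 & O2 in T and O1 | O2 in T for O1, O2 in combinations(T, 2))

def closed_sets(X, T):
    """Complements of the open sets."""
    X = frozenset(X)
    return {X - frozenset(O) for O in T}

X = {1, 2, 3}
T_good = [set(), {1}, {1, 2}, {1, 2, 3}]
T_bad = [set(), {1}, {2}, {1, 2, 3}]      # {1} ∪ {2} is missing
clopen = {frozenset(O) for O in T_good} & closed_sets(X, T_good)
print(is_topology(X, T_good), is_topology(X, T_bad), clopen)
```

In the first example the only clopen sets are ∅ and X.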
Given a point x in a topological space, a neighbourhood of x is a set V
containing an open set U that contains x. We will often want to assume that
neighbourhoods are open sets, in which case we will speak of open
neighbourhoods. A topological space is called Hausdorff if for any points x1 ≠ x2 in X
there exist neighbourhoods U1 of x1 and U2 of x2 such that U1 ∩ U2 = ∅.
Many of the topological spaces that we will study are particularly well-
behaved ones arising from a metric.
Definition A.2. A function d : X × X → R is called a metric if it satisfies
the following properties:
• (strict positivity) d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y, for
all x, y ∈ X;
• (symmetry) d(x, y) = d(y, x) for all x, y ∈ X;
• (triangle inequality) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.
The pair (X, d) is called a metric space. A set O ⊆ X in a metric space is
called open if for any x ∈ O there is some ε > 0 such that
Bε (x) = {y ∈ X | d(x, y) < ε} ⊆ O.
The set Bε (x) is called an open ε-ball around x.
It is easy to check that the collection of all open sets in a metric space
defines a topology on the metric space. If instead of strict positivity we only
have
• (positivity) d(x, y) ≥ 0 for all x, y ∈ X
then we say that d is a pseudo-metric; this also gives rise to a topology in
the same way.
Definition A.3. A function f : X → Y between topological spaces (X, TX )

and (Y, TY ) is continuous if f −1 (O) ∈ TX for all O ∈ TY .
Definition A.4. Let X be a set and suppose that T1 and T2 are two topo-
logies on X. If the identity map I : X → X viewed as a map from (X, T1 )
to (X, T2 ) is continuous (which means that T2 ⊆ T1 ), then T2 is said to be
weaker or coarser than T1 , and T1 is called stronger or finer than T2 .
As is well-known from analysis, we say that a sequence (xn ) in a Haus-

dorff topological space X converges to x, written limn→∞ xn = x, if for
every neighbourhood U of x there exists some N such that xn ∈ U for
all n > N . While this notion is sufficient for metric spaces, for more gen-
eral topological spaces we also need the notions of filters and convergent
filters (see also Exercise A.12). If the topology is given by a metric d, then
this is equivalent to the property that for any ε > 0 there is some N such
that n > N =⇒ d(xn , x) < ε. Sufficiency means, for example, that we can
characterize continuity for functions between metric spaces using convergence
of sequences.
Definition A.5. Let X be a set. A family F ⊆ P(X) of subsets of X is a

filter if
• X ∈ F but ∅ ∉ F;
• if F1 , F2 ∈ F then F1 ∩ F2 ∈ F; and
• if F ∈ F and F ⊆ B ⊆ X, then B ∈ F.
Example A.6. (a) Let (X, T ) be a topological space and x ∈ X. Then
Ux = {U ∈ P(X) | there exists some O ∈ T with x ∈ O ⊆ U }
is a filter, called the neighbourhood filter of x.

(b) Let X = N and set
F∞ = {B ⊆ N | there exists some N ∈ N with n ∈ B for all n > N }.
Then F∞ is a filter, called the tail filter of N.

(c) While this is not needed here, we mention that a directed set (as in the
definition of nets) gives rise to a generalization of tail filters.
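For a finite topological space the neighbourhood filter of Example A.6(a) can be listed explicitly and the filter axioms of Definition A.5 checked by brute force. The sketch below is an illustration only; the space and the function names are hypothetical choices.

```python
from itertools import combinations

def powerset(X):
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def neighbourhood_filter(X, T, x):
    """U_x = {U ⊆ X | there is an open O with x ∈ O ⊆ U}."""
    opens = [frozenset(O) for O in T]
    return {U for U in powerset(X) if any(x in O and O <= U for O in opens)}

def is_filter(X, F):
    """Check the three axioms of Definition A.5 on a finite set X."""
    X = frozenset(X)
    if X not in F or frozenset() in F:
        return False
    if any(A & B not in F for A in F for B in F):
        return False
    return all(B in F for A in F for B in powerset(X) if A <= B)

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]
U1 = neighbourhood_filter(X, T, 1)   # all supersets of {1}
U2 = neighbourhood_filter(X, T, 2)   # all supersets of {1, 2}
print(is_filter(X, U1), is_filter(X, U2), U1 >= U2)
```

In this example U1 ⊇ U2, so U1 is finer than U2 in the sense of Definition A.7.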
Definition A.7. Let F1 , F2 ⊆ P(X) be filters on a set X. Then F1 is finer

than F2 , or F2 is coarser than F1 , if F1 ⊇ F2 .
Definition A.8. Let X be a Hausdorff topological space, and let F ⊆ P(X)

be a filter. We say that F converges to x ∈ X, written x = lim F , if F is
finer than the neighbourhood filter Ux .
Exercise A.9. Let X be a Hausdorff topological space, and let F ⊆ P(X) be a filter.
Show that the limit lim F is unique if it exists.
Definition A.10. Let M be a set, F ⊆ P(M ) a filter, X a topological space,

and f : M → X a map. We say that f converges along F to x ∈ X, written
as limF f = x, if the image filter
f (F ) = {B ⊆ X | there exists some A ∈ F with f (A) ⊆ B}
is finer than Ux (that is, the image filter converges to x).

Exercise A.11. Let M = N, let X be a Hausdorff topological space, and let the
function f : N → X correspond to a sequence (f(n)). Show that the sequence (f(n)) converges
in the usual sense if and only if f converges along F∞, and in this case
\[ \lim_{n \to \infty} f(n) = \lim_{F_\infty} f, \]
where F∞ is the tail filter from Example A.6(b).
Exercise A.12. Let X, Y be Hausdorff topological spaces, and let f : X → Y be a map.

Show that f is continuous if and only if for all x ∈ X we have limUx f = f (x), where Ux
is the neighbourhood filter from Example A.6(a).
A more direct generalization of sequences and their convergence properties

is given by nets. These are also known as Moore–Smith sequences, and while
they are an intuitive route to generalize sequences, they are a less natural
starting point for the important notion of ultrafilters (see Definition A.23).
To define nets we first need to define the directed sets that will replace
the use of N and its order ≤ that give the domain of a sequence. A directed
set is a set D together with a binary relation ≲ that satisfies the following
properties:
• (Reflexivity) We have n ≲ n for all n ∈ D.
• (Transitivity) If ℓ ≲ m and m ≲ n for some ℓ, m, n ∈ D, then ℓ ≲ n.
• (Filter property) If ℓ1, ℓ2 ∈ D, then there exists some n ∈ D with ℓ1 ≲ n
and ℓ2 ≲ n.
Exercise A.13. Show that Example A.6(b) can be generalized as follows. For any directed
set (D, ≲) the set
{B ⊆ D | there exists some ℓ ∈ D such that {n ∈ D | ℓ ≲ n} ⊆ B}
forms a filter: the tail filter of (D, ≲).
A function f : D → X from a directed set D to a topological space X is
called a net. A net f : D → X converges to x0 ∈ X if for
every neighbourhood U of x0 there exists some ℓ0 ∈ D such that n ∈ D
and ℓ0 ≲ n implies that f(n) ∈ U.
Exercise A.14. (a) Show that a net f : D → X converges to x0 ∈ X if and only if f
converges to x0 along the tail filter of (D, ≲).
(b) Conversely, given a function f : M → X on some set M taking values in a topological
space X and a filter F ⊆ P(M), show how to define a directed set (D, ≲) and a net so that
the net converges to x0 ∈ X if and only if f converges to x0 ∈ X along F.
A.3 Inducing Topologies
If (X, T ) is a topological space and Y ⊆ X is any subset, then the topology

on Y induced from the topology on X is the weakest topology on Y for which
the identity inclusion map Y ↪ X is continuous. Equivalently, the induced
topology on Y is {Y ∩ O | O ∈ T }.
Suppose that f : X → Y is a map between two sets. If Y has a topo-
logy TY , then there is a weakest topology on X which makes f continuous.
This topology is given by f −1 (TY ) = {f −1 (O) | O ∈ TY }. If on the other
hand X has a topology TX , then there is a strongest topology on Y which
makes f continuous. It is given by {O ⊆ Y | f −1 (O) ∈ TX }. The former case
has an important generalization as follows.
Definition A.15. Let X be a set, and let fı : X → Yı for ı ∈ I be a family
of maps from X to topological spaces (Yı , TYı ). Then the initial (or weak, or
limit, or projective) topology induced by these maps is the weakest topology

for which all of the maps are continuous.
The open sets in the initial topology are arbitrary unions of finite intersections
of elements of $f_\imath^{-1}(T_{Y_\imath})$ for various ı ∈ I. The initial topology can also
be characterized by the following universal property. A function g : Z → X
from a topological space Z is continuous if and only if $f_\imath \circ g : Z \to Y_\imath$ is
continuous for each ı ∈ I.
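For a single map f : X → Y between finite sets, the weakest topology $f^{-1}(T_Y)$ on X and the strongest topology $\{O \subseteq Y \mid f^{-1}(O) \in T_X\}$ on Y described above can be computed directly. The following Python sketch uses a small example chosen here for illustration; the function names are hypothetical.

```python
from itertools import combinations

def subsets(S):
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def preimage(f, X, O):
    """f^{-1}(O) for a map f given as a dict on X."""
    return frozenset(x for x in X if f[x] in O)

def weakest_topology(X, f, T_Y):
    """f^{-1}(T_Y): the weakest topology on X making f continuous."""
    return {preimage(f, X, O) for O in T_Y}

def strongest_topology(X, Y, f, T_X):
    """{O ⊆ Y | f^{-1}(O) ∈ T_X}: the strongest topology on Y making f continuous."""
    return {O for O in subsets(Y) if preimage(f, X, O) in T_X}

X = {"a", "b", "c"}
Y = {0, 1}
f = {"a": 0, "b": 0, "c": 1}
T_Y = {frozenset(), frozenset({1}), frozenset(Y)}   # only {1} is a proper non-empty open set
T_X = weakest_topology(X, f, T_Y)
print(T_X)
print(strongest_topology(X, Y, f, T_X))
```

In this example f is surjective, and pushing the weakest topology on X forward again recovers the original topology on Y.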
A particular case of the initial topology is the product topology.
Definition A.16. Suppose that $(Y_\imath, T_\imath)$ for ı ∈ I is a collection of topological
spaces. Define
\[ X = \prod_{\imath \in I} Y_\imath. \]
The product topology on X is the initial topology induced by the projection
maps
\[ \pi_\jmath : (y_\imath)_{\imath \in I} \longmapsto y_\jmath \]
from X to $Y_\jmath$ for all ȷ ∈ I.
Another case is given by the topology generated by a family of topologies.

Suppose that X is a set, and for all ı ∈ I we have a topology Tı on X.
Then we may consider the identity map I : X → X as a map from X to
the topological space (X, Tı ) for each ı ∈ I, and associate to X the weakest
topology that is finer than all the topologies Tı for ı ∈ I.
Notice that the product topology, or the weakest topology that is finer
than a given family of topologies, may not be metric (that is, derived from a
metric) even if the original topologies were metric. However, there is a special
situation in which the metric property is preserved by taking products.
Lemma A.17. Let X be a set and suppose that dn : X ×X → R is a sequence

of pseudo-metrics. Then the weakest topology that is finer than the topologies
induced by dn for n ∈ N is itself induced by a pseudo-metric. In particular, the
countable product of metric spaces is a metric space in the product topology.
Proof. For the main part of the argument it is important to know that we
may assume that $d_n$ only takes on values in [0, 1). To see this, we claim that
if $d_n$ is any pseudo-metric then $\overline{d}_n = \frac{d_n}{1 + d_n}$ is a pseudo-metric that defines
the same topology as $d_n$ does.
Positivity and symmetry of $\overline{d}_n$ are clear since they hold for $d_n$. Hence it
is enough to check the triangle inequality for $\overline{d}_n$. For this, notice first that
the function $u \mapsto \frac{u}{1+u}$ maps from [0, ∞) to [0, 1), is monotone increasing and
satisfies
\[ \frac{u+v}{1+u+v} \le \frac{u}{1+u} + \frac{v}{1+v} \tag{A.1} \]
for u, v ∈ [0, ∞). The inequality (A.1) follows from the inequality
\begin{align*}
(u+v)(1+u)(1+v) &= (u+v)(1+u+v+uv) \\
&\le (1+u+v)(u+v+2uv) \\
&= (1+u+v) \bigl( u(1+v) + v(1+u) \bigr)
\end{align*}
after dividing by (1 + u + v)(1 + u)(1 + v). It follows that if x, y, z ∈ X, then
\begin{align*}
\overline{d}_n(x, y) = \frac{d_n(x, y)}{1 + d_n(x, y)} &\le \frac{d_n(x, z) + d_n(z, y)}{1 + d_n(x, z) + d_n(z, y)} \\
&\le \frac{d_n(x, z)}{1 + d_n(x, z)} + \frac{d_n(z, y)}{1 + d_n(z, y)} = \overline{d}_n(x, z) + \overline{d}_n(z, y),
\end{align*}
as required. It is clear that $d_n(x, y) < \varepsilon$ for x, y ∈ X implies that $\overline{d}_n(x, y) < \varepsilon$.
For the converse, notice that $\overline{d}_n(x, y) < \frac{\varepsilon}{1+\varepsilon}$ implies that $d_n(x, y) < \varepsilon$ for
all ε > 0 and x, y ∈ X, since $u \mapsto \frac{u}{1+u}$ is strictly monotonically increasing. This
implies that $d_n$ and $\overline{d}_n$ define the same open sets.
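The inequality (A.1) and the resulting bounded metric can be spot-checked numerically. The sketch below samples random pairs (u, v) and applies the transformation to the usual metric |x − y| on the real line; the sampling range and the function names are arbitrary choices made for illustration.

```python
import random

def bounded(u):
    """The map u -> u / (1 + u) from [0, inf) to [0, 1)."""
    return u / (1.0 + u)

def smallest_gap(samples=10000, seed=0):
    """Smallest observed value of u/(1+u) + v/(1+v) - (u+v)/(1+u+v)."""
    rng = random.Random(seed)
    gap = float("inf")
    for _ in range(samples):
        u, v = rng.uniform(0, 50), rng.uniform(0, 50)
        gap = min(gap, bounded(u) + bounded(v) - bounded(u + v))
    return gap

def d_bar(x, y):
    """Bounded version of the usual metric |x - y| on the real line."""
    return bounded(abs(x - y))

print(smallest_gap())                        # non-negative up to rounding, as (A.1) predicts
print(d_bar(0, 3), d_bar(0, 1) + d_bar(1, 3))
```

A random test of this kind cannot replace the algebraic proof above; it only guards against misreading the inequality.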
So suppose that $d_n : X \times X \to [0, 1)$ is a pseudo-metric for each n ≥ 1.
We define
\[ d(x, y) = \sum_{n=1}^{\infty} \frac{1}{2^n} d_n(x, y). \]
Since this sum converges on X × X, it defines another pseudo-metric on X.
We claim that the topology induced by d is precisely the weakest topology
that is finer than all the topologies induced by $d_n$ for n ≥ 1.
Suppose first that O ⊆ X is an open set with respect to d, and let x ∈ O.
By definition there exists an ε > 0 with
\[ B^d_\varepsilon(x) = \{ y \in X \mid d(x, y) < \varepsilon \} \subseteq O. \]
Now choose N with $\sum_{n=N+1}^{\infty} \frac{1}{2^n} < \frac{\varepsilon}{2}$. Then
\[ \bigcap_{n=1}^{N} B^{d_n}_{\varepsilon/(2N)}(x) \subseteq B^d_\varepsilon(x) \subseteq O \]
since if y ∈ X satisfies $d_n(y, x) < \frac{\varepsilon}{2N}$ for n = 1, . . . , N then
\[ d(x, y) \le \sum_{n=1}^{N} d_n(x, y) + \sum_{n=N+1}^{\infty} \frac{1}{2^n} < \varepsilon. \]
As this holds for all x ∈ O, we see that O is a union of finite intersections of
sets that are open with respect to the topology induced by $d_n$.
The converse is similar. Suppose O is a union of finite intersections of sets
that are open with respect to $d_n$. Let x ∈ O and suppose that
\[ x \in \bigcap_{n=1}^{N} O_n \subseteq O, \]
where $O_n$ is open with respect to $d_n$ for n = 1, . . . , N. Then we may as well
assume $O_n = B^{d_n}_\varepsilon(x)$ for some ε > 0. We claim that
\[ B^d_{\varepsilon/2^N}(x) \subseteq \bigcap_{n=1}^{N} O_n \subseteq O, \]
which then implies that O is open with respect to the pseudo-metric d. So
suppose y ∈ X satisfies
\[ \sum_{n=1}^{\infty} \frac{1}{2^n} d_n(y, x) = d(y, x) < \frac{\varepsilon}{2^N}, \]
then $\frac{1}{2^n} d_n(y, x) < \frac{\varepsilon}{2^N}$ and for n ∈ N with 1 ≤ n ≤ N this implies
that $d_n(y, x) < \varepsilon$, hence $y \in O_n$ and so the claim. The first part of the
lemma follows.
Now suppose that
\[ X = \prod_{n=1}^{\infty} X_n \]
where each $(X_n, d_n)$ is a metric space, and we define
\[ d\bigl( (x_n), (y_n) \bigr) = \sum_{n=1}^{\infty} \frac{1}{2^n} \frac{d_n(x_n, y_n)}{1 + d_n(x_n, y_n)}. \]
Then d is a pseudo-metric by the argument above. However,
\[ d\bigl( (x_n), (y_n) \bigr) = 0 \implies d_n(x_n, y_n) = 0 \]
for all n ≥ 1, which implies that $(x_n) = (y_n)$, so d is a metric on X. The
topology induced by the pseudo-metric
\[ \bigl( (x_k)_k, (y_k)_k \bigr) \longmapsto d_n(x_n, y_n) \]
is precisely the weakest topology for which the projection to $X_n$ is continuous.
By the first part of the lemma, this shows that the topology induced by d
is the weakest topology for which all the projections are continuous, so d
induces the product topology.
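The weights $2^{-n}$ mean that coordinates far out in the sequence barely influence d, which is the metric counterpart of the fact that basic open sets in the product topology restrict only finitely many coordinates. The following Python sketch illustrates this for real sequences truncated to finitely many coordinates (a simplification made here), with each $d_n$ the usual metric on R.

```python
import random

def product_metric(x, y):
    """d(x, y) = sum_n 2^{-n} |x_n - y_n| / (1 + |x_n - y_n|) for finite tuples,
    standing in for sequences that agree beyond the last listed coordinate."""
    total = 0.0
    for n, (a, b) in enumerate(zip(x, y), start=1):
        dn = abs(a - b)
        total += 2.0 ** (-n) * dn / (1.0 + dn)
    return total

def triangle_holds(trials=1000, length=30, seed=0):
    """Random spot-check of the triangle inequality (up to rounding)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.uniform(-5, 5) for _ in range(length)] for _ in range(3))
        if product_metric(x, y) > product_metric(x, z) + product_metric(z, y) + 1e-12:
            return False
    return True

zero = (0.0,) * 30
far_out = zero[:19] + (1000.0,) + zero[20:]   # differs from zero only in coordinate 20
near_in = (0.001,) + zero[1:]                 # differs slightly in coordinate 1
print(product_metric(zero, far_out), product_metric(zero, near_in))
```

A change of size 1000 in coordinate 20 moves the point by less than $2^{-20}$, which is less than the effect of a change of size 0.001 in the first coordinate.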
A.4 Compact Sets and Tychonoff’s Theorem
Compactness is a fundamental notion for all of analysis, and in particular for

functional analysis. It plays a role in analysis a little like finiteness does in
combinatorics.
Definition A.18. Let (X, T) be a Hausdorff topological space. A family of
sets U is called an open cover if U consists of open sets and
\[ X \subseteq \bigcup_{O \in U} O. \]
The space (X, T) is called compact if every open cover has a finite subcover,
that is, a finite subset V ⊆ U which is also an open cover.
An alternative and equivalent condition for compactness can be given in
terms of closed sets. A collection of sets $\{A_\imath \mid \imath \in I\}$ has the finite intersection
property if
\[ \bigcap_{\ell=1}^{k} A_{\imath_\ell} \ne \emptyset \]
for any finite subset $\{\imath_1, \ldots, \imath_k\} \subseteq I$, and has the infinite intersection property
if
\[ \bigcap_{\imath \in I} A_\imath \ne \emptyset. \]
Then a Hausdorff topological space (X, T) is compact if and only if every
family of closed sets with the finite intersection property also has the infinite
intersection property.
Recall that a metric space (X, d) is called complete if every sequence (xn )
with the Cauchy property that for every ε > 0 there is some N = N (ε) for
which
m, n > N =⇒ d(xm , xn ) < ε
is convergent, meaning that there is some x∗ ∈ X with the property that for
any ε > 0 there is some N = N (ε) such that
n > N =⇒ d(xn , x∗ ) < ε.
For metric spaces there are further equivalent properties characterizing com-
pactness.
• A metric space (X, d) is sequentially compact if any sequence (xn ) in X
has a convergent subsequence.
• A metric space (X, d) is compact if and only if it is complete and
totally bounded, meaning that for every ε > 0 there is a finite set of
points {x1, . . . , xn} in X with
\[ X = \bigcup_{i=1}^{n} B_\varepsilon(x_i). \]
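The finite cover by ε-balls in the definition of total boundedness can be produced greedily. The sketch below does this for a random sample of points in the unit square; a finite sample is of course trivially totally bounded, so this only illustrates the construction, and the names and parameters are hypothetical.

```python
import math
import random

def greedy_eps_net(points, eps):
    """Pick centres greedily so that every point lies in an open eps-ball around some centre."""
    centres = []
    for p in points:
        # p becomes a new centre only if no existing ball B_eps(c) contains it
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres

rng = random.Random(1)
sample = [(rng.random(), rng.random()) for _ in range(500)]
centres = greedy_eps_net(sample, 0.25)
print(len(centres))
```

By construction the chosen centres are pairwise at distance at least ε, which bounds their number in terms of ε alone when the points come from a bounded subset of the plane.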
Exercise A.19. Recall the proofs that the different notions of compactness coincide for
metric spaces.
Compactness is closed under taking products in the following sense.
Theorem A.20 (Tychonoff). Let I be an index set, and suppose for ı ∈ I
that $(X_\imath, T_\imath)$ is a compact topological space. Then $\prod_{\imath \in I} X_\imath$ is compact with
respect to the product topology.
The notion of compactness has many useful extensions and generalizations.

We will only need two of these.
Definition A.21. A Hausdorff topological space is called locally compact if
every point has a neighbourhood which is compact in the induced topology.
A topological space is called σ-compact if it can be written as $\bigcup_{n=1}^{\infty} K_n$ with
each $K_n$ compact in the induced topology.
Lemma A.22. Let X be a locally compact space. Then for every compact
subset K ⊆ X there exists an open subset O ⊆ X with compact closure that
contains K. If X is in addition σ-compact, then there exists a sequence of
compact sets $(K_n)$ such that $X = \bigcup_{n=1}^{\infty} K_n$ and $K_n \subseteq K_{n+1}^{o}$ for all n ≥ 1.
Proof. Let K ⊆ X be compact. Since X is locally compact, any x ∈ K has
an open neighbourhood $U_x$ with compact closure. Applying compactness to
the open cover $\{U_x \mid x \in K\}$ we get $K \subseteq O = U_{x_1} \cup \cdots \cup U_{x_m}$ for some
finite collection of points x1, . . . , xm ∈ K. By construction, O is open with
compact closure.
Suppose now X is σ-compact. Then there exists a sequence of compact
sets $(Q_n)$ with $X = \bigcup_{n=1}^{\infty} Q_n$. We first define K1 = Q1 and then construct $K_n$
inductively as follows. Suppose $K_n \supseteq Q_n$ has already been constructed.
Applying the above argument to $K_n$ gives some open set $O_n$ with compact
closure that contains $K_n$. Now define $K_{n+1} = \overline{O_n} \cup Q_{n+1}$. The sequence
constructed in this way satisfies all the desired properties.
Compactness can also be characterized in terms of filters, and for this
another notion is useful.
Definition A.23. Let X be a set and F ⊆ P(X) a filter. Then F is an

ultrafilter if for every B ⊆ X we have B ∈ F or X∖B ∈ F.
Proposition A.24. Let X be a Hausdorff topological space. Then the follow-

ing are equivalent.
(1) X is compact.
(2) Every filter on X has a finer filter that converges to some x ∈ X.

(3) Every ultrafilter converges.
The implication (3) =⇒ (1) uses the axiom of choice in the form of
Zorn’s lemma (to show that any filter is contained in an ultrafilter; see Ex-
ercise A.25).
Exercise A.25. (a) Use Zorn’s lemma to show that every filter has a finer filter that is
an ultrafilter.
(b) Prove Proposition A.24.
(c) Use Proposition A.24 to prove Tychonoff’s theorem.
A.5 Normal Spaces
A circle of useful constructions concerns ways to approximate functions with

continuous functions. The appropriate level of generality is provided by nor-
mal spaces; as the name suggests, many of the topological spaces that arise
in mathematics have this property (in particular, any metric space is normal
and any compact space is normal).
Definition A.26. A topological space (X, T ) is said to be normal if for any

closed sets A, B in X with A ∩ B = ∅ there are open sets U ⊇ A and V ⊇ B
with U ∩ V = ∅.
This definition, which says that disjoint closed sets can be separated by
open sets, may be thought of as requiring that there are ‘enough’ open sets.
An important consequence is that there are ‘enough’ continuous functions in
the following sense (this presentation is taken from Tao’s blog [103]).
Lemma A.27 (Urysohn’s lemma). Let (X, T ) be a topological space.

Then the following properties of X are equivalent.
(1) X is a normal space.
(2) For every closed set K ⊆ X and every open set U ⊇ K, there is an open
set V and a closed set L with U ⊇ L ⊇ V ⊇ K.
(3) For every pair of closed sets K and L in X with K ∩ L = ∅, there exists
a continuous function f : X → [0, 1] with
\[ f(x) = \begin{cases} 1 & \text{if } x \in K, \\ 0 & \text{if } x \in L. \end{cases} \]
(4) For every closed set K ⊆ X and every open set U ⊇ K, there exists a
continuous function f : X → [0, 1] with $1_K \le f \le 1_U$.
For metric spaces the proof of the difficult step below is rather simple:
Given two disjoint closed sets K and L we can define the function f as in (3)
by
\[ f(x) = \frac{d(x, L)}{d(x, K) + d(x, L)} \]
for x ∈ X, where d(·, K) and d(·, L) are the continuous distance functions
defined in (2.29).
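The metric formula is easy to evaluate in a concrete case. The following Python sketch takes X = R with the usual metric and finite (hence closed) disjoint sets K and L; these choices and the function names are made here for illustration only.

```python
def dist_to_set(x, S):
    """d(x, S) = min over s in S of |x - s| for a finite non-empty set S of reals."""
    return min(abs(x - s) for s in S)

def urysohn(x, K, L):
    """f(x) = d(x, L) / (d(x, K) + d(x, L)) for disjoint finite sets K and L."""
    d_K, d_L = dist_to_set(x, K), dist_to_set(x, L)
    return d_L / (d_K + d_L)   # the denominator is positive since K and L are disjoint

K, L = [0.0, 1.0], [3.0]
values = [urysohn(x, K, L) for x in (0.0, 1.0, 2.0, 3.0)]
print(values)
```

The function equals 1 on K and 0 on L, and takes the value 1/2 at the point 2, which is equidistant from K and L.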
Proof. The implications (3) ⇐⇒ (4) and (1) ⇐⇒ (2) are clear, since a set
is closed if and only if its complement is open.
Assume now that (3) holds. Given disjoint closed sets K, L ⊆ X, let f be
the function given by (3). Then the open sets U = {x ∈ X | f (x) > 0.9}
and V = {x ∈ X | f (x) < 0.1} show (1).
Assume next that (2) holds, let K = K1 be a closed set, and let U = U0
be an open set with K1 ⊆ U0. By (2), we can find a closed set $K_{1/2}$ and an
open set $U_{1/2}$ with
\[ U_0 \supseteq K_{1/2} \supseteq U_{1/2} \supseteq K_1. \]
Applying (2) again twice gives closed sets $K_{1/4}$, $K_{3/4}$ and open sets $U_{1/4}$, $U_{3/4}$
with
\[ U_0 \supseteq K_{1/4} \supseteq U_{1/4} \supseteq K_{1/2} \supseteq U_{1/2} \supseteq K_{3/4} \supseteq U_{3/4} \supseteq K_1. \]
Continuing in exactly the same way and setting K0 = X and U1 = ∅, we
construct for every rational $q \in D = \{ \frac{a}{2^n} \mid n \ge 0, a \in \mathbb{Z}, 0 \le a \le 2^n \}$ a closed
set $K_q$ and an open set $U_q$ with $K_q \supseteq U_q$ for q ∈ D and with $U_{q_1} \supseteq K_{q_2}$ for
all q1, q2 ∈ D with q1 < q2. Now define
\[ f(x) = \begin{cases} 0 & \text{for } x \notin U_0, \\ \sup\{ q \in D \mid x \in U_q \} & \text{otherwise.} \end{cases} \]
Notice that we also have
\[ f(x) = \begin{cases} 1 & \text{for } x \in K_1, \\ \inf\{ q \in D \mid x \notin K_q \} & \text{otherwise.} \end{cases} \]
It is easy to check that
\[ f^{-1}((s, \infty)) = \{ x \in X \mid f(x) > s \} = \bigcup_{q > s} U_q \]
for any s ≥ 0 and $f^{-1}((s, \infty)) = X$ for s < 0. Similarly,
\[ f^{-1}((-\infty, s)) = \{ x \in X \mid f(x) < s \} = \bigcup_{q < s} X \smallsetminus K_q \]
for s ≤ 1 and $f^{-1}((-\infty, s)) = X$ for s > 1. Hence both $f^{-1}((s, \infty))$
and $f^{-1}((-\infty, s))$ are open sets for any real s, so f is continuous and (4)
follows.
For a continuous function f on a topological space X with values in R, C,
or a vector space, we define the support of f to be
\[ \operatorname{Supp} f = \overline{\{ x \in X \mid f(x) \ne 0 \}} \]
and say that f is supported on a subset Y ⊆ X if Supp f ⊆ Y.

Lemma A.28 (Partition of unity). Let X be a normal topological space,
and assume that {Kα | α ∈ A} is a collection of closed sets covering X,
and {Uα | α ∈ A} is an open cover of X with the property that Uα ⊇ Kα
for each α ∈ A. Suppose further that each x ∈ X has an open neighbourhood
that non-trivially intersects only finitely many Uα . Then for each α ∈ A there
exists a continuous function fα : X → [0, 1] supported on Uα such that
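\[ \sum_{\alpha \in A} f_\alpha(x) = 1 \]
for all x ∈ X.
Proof. By Urysohn’s lemma (Lemma A.27), there exists for each α ∈ A an
open set $V_\alpha$ containing $K_\alpha$ with $\overline{V_\alpha} \subseteq U_\alpha$. Furthermore, there is a continuous
function $g_\alpha : X \to [0, 1]$ which is equal to 1 on $K_\alpha$ and equal to 0 on $X \smallsetminus V_\alpha$. In
particular, $g_\alpha$ is supported on $U_\alpha$. Then $g = \sum_{\alpha \in A} g_\alpha$ is well-defined and
continuous (by the local finiteness assumption, only finitely many terms are
non-zero near any given point) and is bounded below by 1 (since the sets $K_\alpha$
cover X). Setting $f_\alpha = g_\alpha / g$ for α ∈ A gives the result.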
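The construction in the proof can be carried out by hand in a simple case. The sketch below takes X = [0, 1] with the closed cover K1 = [0, 0.5], K2 = [0.5, 1] and the open cover U1 = [0, 0.7), U2 = (0.3, 1]; these sets, the intermediate sets V1 = [0, 0.6), V2 = (0.4, 1], and the piecewise linear functions are all choices made here for illustration.

```python
def clamp01(t):
    return max(0.0, min(1.0, t))

def g1(x):
    """Equal to 1 on K1 = [0, 0.5] and 0 outside V1 = [0, 0.6)."""
    return clamp01(6.0 - 10.0 * x)

def g2(x):
    """Equal to 1 on K2 = [0.5, 1] and 0 outside V2 = (0.4, 1]."""
    return clamp01(10.0 * x - 4.0)

def partition(x):
    """(f1(x), f2(x)) with f_alpha = g_alpha / g and g = g1 + g2."""
    g = g1(x) + g2(x)          # g >= 1 because K1 and K2 cover X
    return g1(x) / g, g2(x) / g

grid = [i / 200 for i in range(201)]
print(partition(0.45), partition(0.5), partition(0.9))
```

On the overlap (0.4, 0.6) both functions are non-zero, and dividing by g is what makes them sum to 1 there.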
Proposition A.29 (Tietze’s extension theorem). Let X be a normal
topological space, A ⊆ X a closed subset, and let f : A → R (or C) be a
bounded continuous function. Then there exists a bounded continuous
function F : X → R (or C) with $F|_A = f$. If in addition X is locally compact
and A is compact, then we can find such an extension F in $C_c(X)$.
We note that the assumption of boundedness is not essential, but does
simplify the proof and is sufficient for our purposes.
Proof of Proposition A.29. If f is complex-valued then we may use the
following argument for ℜ(f) and ℑ(f) separately, so it is enough to consider
the real-valued case. If |f(x)| ≤ M for all x ∈ A then we may also apply
the following argument to $\frac{1}{M} f$, so we may assume without loss of generality
that f is a continuous function from A to [−1, 1].
Define sets $B_- = f^{-1}([-1, -\frac{1}{3}])$ and $B_+ = f^{-1}([\frac{1}{3}, 1])$. By definition
and by continuity of f, $B_- \subseteq A$ and $B_+ \subseteq A$ are disjoint closed
sets. By Urysohn’s lemma (Lemma A.27) there exists a continuous
function g : X → [0, 1] with $g|_{B_-} = 0$ and $g|_{B_+} = 1$. Define $h_1 = \frac{2}{3}(g - \frac{1}{2})$. We
claim that $\|f - h_1|_A\|_\infty \le \frac{2}{3}$ by considering each possibility in turn:
• If $x \in B_-$ then $f(x) \in [-1, -\frac{1}{3}]$ and $h_1(x) = -\frac{1}{3}$, so $|f(x) - h_1(x)| \le \frac{2}{3}$.
• If $x \in B_+$ then $f(x) \in [\frac{1}{3}, 1]$ and $h_1(x) = \frac{1}{3}$, so $|f(x) - h_1(x)| \le \frac{2}{3}$.
• Finally, if $x \in A \smallsetminus (B_- \cup B_+)$, then $f(x) \in (-\frac{1}{3}, \frac{1}{3})$ and $|h_1(x)| \le \frac{1}{3}$,
so $|f(x) - h_1(x)| \le \frac{2}{3}$ again.
We interpret the argument above as follows. Every continuous function
f : A → [−1, 1]
has an approximation h1|A which is the restriction of a continuous
function h1 : X → [−1, 1] to A, with ‖f − h1|A‖∞ ≤ 2/3. Applying this general
statement to f2 = (3/2)(f − h1|A) we find some continuous function h2 : X → [−1, 1]
with ‖f2 − h2|A‖∞ ≤ 2/3 or, equivalently, with
\[ \bigl\| f - \bigl(h_1 + \tfrac{2}{3} h_2\bigr)|_A \bigr\|_\infty = \tfrac{2}{3} \bigl\| \underbrace{\tfrac{3}{2}(f - h_1|_A)}_{= f_2} - h_2|_A \bigr\|_\infty \le \bigl(\tfrac{2}{3}\bigr)^2. \]
Continuing inductively starting with f3 = (3/2)(f2 − h2|A), we find
functions h1, h2, ..., hn : X → [−1, 1] with
\[ \bigl\| f - \bigl(h_1 + \tfrac{2}{3} h_2 + \cdots + \bigl(\tfrac{2}{3}\bigr)^{n-1} h_n\bigr)|_A \bigr\|_\infty \le \bigl(\tfrac{2}{3}\bigr)^n. \tag{A.2} \]
We set
\[ F = \sum_{n=1}^{\infty} \bigl(\tfrac{2}{3}\bigr)^{n-1} h_n, \]
and notice that
\[ \Bigl| F(x) - \sum_{n=1}^{m} \bigl(\tfrac{2}{3}\bigr)^{n-1} h_n(x) \Bigr| \le 3 \bigl(\tfrac{2}{3}\bigr)^{m} \]
for any m ≥ 1 and x ∈ X, so F is bounded, the convergence is uniform,
and F ∈ C(X) (see the proof of Example 2.24(3) on p. 30). By (A.2) we
have f = F|A, as required.
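The iteration in the proof can be followed numerically. The sketch below is an illustration only, under assumptions not taken from the text: X is replaced by a finite grid in [0, 1], A by every tenth grid point, f by the restriction of sin(7x), and the Urysohn function g by the explicit quotient of distances d(x, B−)/(d(x, B−) + d(x, B+)), which is available in any metric space. The names dist, urysohn and approximate are hypothetical helpers.

```python
import math

X = [k / 100 for k in range(101)]                 # finite stand-in for X
A = [x for k, x in enumerate(X) if k % 10 == 0]   # stand-in for the closed set A
f = {a: math.sin(7 * a) for a in A}               # f : A -> [-1, 1]

def dist(x, S):
    # Distance from x to the finite set S (infinite if S is empty).
    return min((abs(x - s) for s in S), default=math.inf)

def urysohn(B_minus, B_plus):
    """Return g : X -> [0, 1] with g = 0 on B_minus and g = 1 on B_plus."""
    def g(x):
        dm, dp = dist(x, B_minus), dist(x, B_plus)
        if math.isinf(dm) and math.isinf(dp):
            return 0.5          # both sets empty: any constant works
        if math.isinf(dm):
            return 1.0          # B_minus empty
        if math.isinf(dp):
            return 0.0          # B_plus empty
        return dm / (dm + dp)
    return g

def approximate(phi):
    """One step of the proof: h = (2/3)(g - 1/2), so |phi - h| <= 2/3 on A."""
    B_minus = [a for a in A if phi[a] <= -1 / 3]
    B_plus = [a for a in A if phi[a] >= 1 / 3]
    g = urysohn(B_minus, B_plus)
    return lambda x: (2 / 3) * (g(x) - 0.5)

F = {x: 0.0 for x in X}     # partial sums of F = sum (2/3)^(n-1) h_n
phi = dict(f)               # phi plays the role of f_n, starting with f_1 = f
errors = []
for n in range(1, 21):
    h = approximate(phi)
    for x in X:
        F[x] += (2 / 3) ** (n - 1) * h(x)
    phi = {a: 1.5 * (phi[a] - h(a)) for a in A}   # f_{n+1} = (3/2)(f_n - h_n|A)
    errors.append(max(abs(f[a] - F[a]) for a in A))
```

After n steps the entry errors[n - 1] is the sup-norm distance on A between f and the n-th partial sum, which by (A.2) should not exceed (2/3)^n.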
If X is also assumed to be locally compact and A ⊆ X is compact, then
by Lemma A.22 there exists an open set O ⊇ A with compact closure. Now
extend f, first by using the definition
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{for } x \in A, \\ 0 & \text{for } x \in X \setminus O. \end{cases} \]
Then Ã = A ∪ X∖O is closed, and f̃ is a continuous function on Ã which (by
the argument above) can be extended to a continuous function F ∈ C(X).
By construction Supp(F) is contained in the closure of O, which is compact,
so F ∈ Cc(X), as required.
Appendix B: Measure Theory
Measure theory is one approach to making rigorous the idea of the size (or
length, volume, and so on) of a set in an abstract setting. We refer to the
notes of Tao [105] for a good general introduction to measure theory. By
carefully controlling the complexity of the sets allowed in the theory, the
basic intuition (for example, that the volume of the disjoint union of two
sets is the sum of their volumes) can be developed into a powerful theory,
indispensable in several fields including functional analysis and probability.
B.1 Basic Definitions and Measurability
The path to the definition of the Lebesgue integral starts with a discussion
about which sets (and hence which functions) are allowed in the theory.
Definition B.1. Let X be a set. A family A ⊆ P(X) of subsets of X is called
an algebra if it satisfies the following properties:
• ∅, X ∈ A;
• if A ∈ A then Aᶜ = X∖A ∈ A;
• if A1, ..., An ∈ A then ⋃_{i=1}^{n} Ai ∈ A;
and if, in addition,
• if A1, A2, ··· ∈ A then ⋃_{n=1}^{∞} An ∈ A
then A is a σ-algebra.
If A is a σ-algebra, then we call the pair (X, A) a measurable space and
the elements of A measurable sets or A-measurable sets.
It is straightforward to check that the intersection of any collection of σ-
algebras is also a σ-algebra. Hence for any family C ⊆ P(X) of subsets there is
a unique smallest σ-algebra containing C, called the σ-algebra generated by C,
and denoted σ(C). If X is a topological space, then the σ-algebra generated by
all open subsets of X is called the Borel σ-algebra, and is denoted B or B(X).
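On a finite set the generated σ-algebra can be computed by brute force, since countable unions reduce to finite ones. The following Python sketch is an illustration only; the function name generated_sigma_algebra and the example family C = {{1}, {1, 2}} on X = {1, 2, 3, 4} are assumptions, not taken from the text.

```python
from itertools import combinations

def generated_sigma_algebra(X, C):
    """Smallest family containing C that is closed under complements and unions."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if (X - a) not in family:       # close under complements
                family.add(X - a)
                changed = True
        for a, b in combinations(current, 2):
            if (a | b) not in family:       # close under (finite) unions
                family.add(a | b)
                changed = True
    return family

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
```

In this example the atoms of σ(C) are {1}, {2} and {3, 4}, so σ(C) consists of the unions of these atoms; the points 3 and 4 cannot be separated by measurable sets.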

Definition B.2. A function φ : X → Y between measurable spaces (X, AX)
and (Y, AY) is called measurable if φ⁻¹(A) ∈ AX for all A ∈ AY.
If Y is a topological space and φ is a map from a measurable space (X, A)
to Y then we will usually assume unless explicitly indicated otherwise that
we are dealing with the Borel σ-algebra on Y to define measurability of φ.
In particular, in such a setting φ is measurable if and only if φ⁻¹(O) ∈ A
for every open set O ⊆ Y, since φ⁻¹ preserves all set-theoretic operations
and the Borel σ-algebra is generated by the open sets in Y. This applies in
particular to the cases Y = R and Y = C.
Pointwise limits of sequences of measurable functions are measurable in
the following sense. If (fn) is a sequence of measurable functions fn : X → Y
for each n ≥ 1, where Y is a metric locally compact σ-compact space, and
for each x ∈ X the sequence (fn(x)) converges to some f(x) in Y, then the
pointwise limit function f : X → Y is measurable.
In order to define the integral of a measurable function one needs a precise
notion of size or measure of a measurable set.
Definition B.3. A function µ : A → R ∪ {∞} defined on a σ-algebra A
of subsets of a set X is called a (positive) measure if it has the following
properties:
• (positivity) µ(A) ≥ 0 for A ∈ A;
• (σ-additivity) if An ∈ A for all n ≥ 1 and An ∩ Am = ∅ for all m ≠ n,
then
\[ \mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n), \]
where the sum on the right-hand side may or may not converge.
Theorem B.4 (Carathéodory extension [15]). Let A be an algebra of
subsets of X, and assume that µ : A → [0, ∞] is a function satisfying the
following:
(1) if A1, A2, ... are disjoint members of A with ⋃_{n=1}^{∞} An ∈ A, then
\[ \mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n); \]
(2) there is a countable collection {An | n ∈ N} with An ∈ A and µ(An) < ∞
for all n ≥ 1, and with X = ⋃_{n=1}^{∞} An.
Then there is a measure µ̄ on the smallest σ-algebra containing A
(equivalently, on the σ-algebra generated by A) that extends µ in the sense that
µ̄(A) = µ(A)
for any A ∈ A.
The triple (X, A, µ) consisting of a space X, a σ-algebra A of subsets

of X, and a measure µ is called a measure space, and is called a probability
space if µ(X) = 1. If the measure µ of a measure space (X, A, µ) satisfies
condition (2) of Theorem B.4 then we say that the measure and the measure
space are σ-finite. We will assume from now on that we are given some σ-finite
measure µ on a measurable space (X, B).
Definition B.5. A measurable function f : X → C is called simple if
µ({x ∈ X | f(x) ≠ 0}) < ∞
and we have finite range |f(X)| < ∞. In other words, f is simple if
\[ f = \sum_{n=1}^{N} a_n 1_{B_n} \tag{B.1} \]
for some constants an ∈ C and Bn ∈ B with µ(Bn) < ∞ for 1 ≤ n ≤ N. The
integral of the function f in (B.1) is defined to be
\[ \int_X f \, d\mu = \sum_{n=1}^{N} a_n \mu(B_n). \tag{B.2} \]
One can show rather easily that the integral defined in (B.2) is independent
of the particular description of f as a finite sum as in (B.1). For the next
definition, the analogous claim is an important step in the theory (this is
essentially the monotone convergence theorem discussed below).
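The independence of (B.2) from the chosen representation can be seen in a small example. The following Python sketch is an illustration only, on an assumed finite measure space with four points and exact rational weights; the names weights, mu, integral and evaluate, and the two representations rep1 and rep2, are hypothetical.

```python
from fractions import Fraction

# Assumed example: a measure on X = {0, 1, 2, 3} given by point masses.
weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(2)}

def mu(B):
    # Measure of a subset B of X.
    return sum((weights[x] for x in B), Fraction(0))

def integral(rep):
    # Formula (B.2) for a representation [(a_1, B_1), ..., (a_N, B_N)].
    return sum((a * mu(B) for a, B in rep), Fraction(0))

def evaluate(rep, x):
    # Value at x of the simple function sum a_n 1_{B_n}.
    return sum(a for a, B in rep if x in B)

# Two different representations of the same simple function.
rep1 = [(3, {0, 1}), (5, {2})]
rep2 = [(3, {0}), (3, {1, 2}), (2, {2})]
```

Both representations describe the function with values 3, 3, 5, 0 at the points 0, 1, 2, 3, and both give the integral 10/3.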
Definition B.6. Suppose that f : X → R≥0 ∪ {∞} is measurable. We define
the integral of f as the limit
\[ \int_X f \, d\mu = \lim_{n \to \infty} \int_X f_n \, d\mu, \]
where (fn) is a sequence of simple measurable functions fn : X → R≥0 with
0 ≤ fm ≤ fn ≤ f
for m ≤ n, and f(x) = lim_{n→∞} fn(x) for all x ∈ X.
Implicit in this definition is the fact that (on σ-finite measure spaces) any
non-negative measurable function is a pointwise limit of simple functions.
Notice also that we permit sets to have infinite measure and functions to
have infinite integral. If f : X → [0, ∞] = R≥0 ∪ {∞}, then we define
\[ \int_X f \, d\mu = \infty \]
if µ({x ∈ X | f(x) = ∞}) > 0, and
\[ \int_X f \, d\mu = \int_X f \, 1_{\{x \in X \mid f(x) < \infty\}} \, d\mu \]
otherwise. Here the product ∞ · 0 is defined to be 0. The function f is called
integrable if
\[ \int_X f \, d\mu < \infty. \]
If f : X → R and both f⁺ = max{0, f} and f⁻ = max{0, −f} are integrable,
then we define
\[ \int_X f \, d\mu = \int_X f^{+} \, d\mu - \int_X f^{-} \, d\mu \]
and say that f is integrable. Finally, if f : X → C and ℜ(f), ℑ(f) are
integrable, then we define
\[ \int f \, d\mu = \int \Re(f) \, d\mu + i \int \Im(f) \, d\mu \]
and once again say that f is integrable.
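The approximation by simple functions that is implicit in Definition B.6 is often realised by dyadic truncation, fn = min(n, ⌊2ⁿ f⌋/2ⁿ). The Python sketch below illustrates only this standard device and is not claimed to be the construction intended in the text; it ignores the additional cut-off to sets of finite measure that is needed on a σ-finite space, and the test function f(x) = x² on [0, 3] and the name simple_approx are assumptions.

```python
import math

def simple_approx(f, n):
    """n-th dyadic approximation: finitely many values k / 2^n, capped at n."""
    def f_n(x):
        return min(n, math.floor(2 ** n * f(x)) / 2 ** n)
    return f_n

f = lambda x: x * x                              # assumed example, 0 <= f <= 9
points = [k / 50 for k in range(151)]            # sample points in [0, 3]
approx = [simple_approx(f, n) for n in range(1, 13)]
```

At each sample point the values approx[0](x), approx[1](x), ... increase with n, stay below f(x), and once n exceeds the bound 9 they lie within 2⁻ⁿ of f(x).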
B.2 Properties of the Integral
The space of integrable functions forms a vector space, and the integral is
a linear function on that vector space. Moreover, the integral satisfies the
following fundamental continuity properties, each of which is a consequence
of the σ-additivity of the measure µ.
Theorem B.7 (Monotone convergence). Let (X, B, µ) be a measure space,
and let (fn) be a sequence of measurable functions fn : X → [0, ∞] for n ≥ 1
with fn ↗ f as n → ∞. That is, fm ≤ fn for m ≤ n, and
f(x) = lim_{n→∞} fn(x)
for all x ∈ X. Then f is measurable and
\[ \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu. \]
Theorem B.8 (Dominated convergence). Let (X, B, µ) be a measure
space, and let (fn) be a sequence of complex-valued measurable functions with
f(x) = lim_{n→∞} fn(x)
for all x ∈ X. Assume that there is an integrable function g : X → R≥0
with |fn| ≤ g for n ∈ N. Then f is integrable and
\[ \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu. \]
Definition B.9. We write
ℒ^1_µ(X) = {f : X → C | f is integrable}
for the space of integrable functions on a measure space (X, B, µ), and define
\[ \|f\|_1 = \int |f| \, d\mu \]
for any measurable function f : X → C.
Notice that f ∈ ℒ^1_µ ⟺ ‖f‖1 < ∞, and
\[ \Bigl| \int f \, d\mu \Bigr| \le \|f\|_1 \]
for all f ∈ ℒ^1_µ(X). It is easy to check that ‖λf‖1 = |λ|‖f‖1 and
‖f + g‖1 ≤ ‖f‖1 + ‖g‖1
for all f, g ∈ ℒ^1_µ and λ ∈ C.

Definition B.10. A measurable set N ⊆ X is called a null set if µ(N ) = 0.
We say that a property holds almost everywhere (also written a.e., or where
the measure is not obvious from the context, µ-almost everywhere) if it holds
on the complement of a null set.
Thus, for example, if f ∈ ℒ^1_µ(X) then ‖f‖1 = 0 ⟺ f = 0 almost
everywhere with respect to µ. It is often convenient to relax the requirement
that a null set be measurable and call a set N a null set if there is a measurable
set N′ with µ(N′) = 0 and N′ ⊇ N.
Definition B.11. We define
L^1_µ(X) = ℒ^1_µ(X)/∼,
where the equivalence relation ∼ is µ-almost everywhere equality, meaning
that f ∼ g if and only if f = g µ-almost everywhere.
While the natural notation for the element of L^1_µ(X) containing a
function f in ℒ^1_µ(X) is [f]∼, it is conventional to simply write f ∈ L^1_µ(X), with
the understanding that such a function f is only defined up to equivalence
under ∼.
Finally, we turn to integration of functions of several variables.
Theorem B.12 (Fubini’s theorem). For σ-finite measure spaces (X, B, µ)
and (Y, C, ν) there exists a unique σ-finite measure µ × ν on X × Y such that
(µ × ν)(B × C) = µ(B)ν(C)
for all B ∈ B and C ∈ C. If now f is a measurable function on X × Y,
integrable in the sense that
\[ \int_{X \times Y} |f(x, y)| \, d(\mu \times \nu) < \infty, \]
then for almost every x ∈ X and almost every y ∈ Y, respectively, the integrals
\[ h(x) = \int_Y f(x, y) \, d\nu(y), \qquad g(y) = \int_X f(x, y) \, d\mu(x) \]
exist, the resulting functions h and g are integrable, and
\[ \int_{X \times Y} f \, d(\mu \times \nu) = \int_X h \, d\mu = \int_Y g \, d\nu. \tag{B.3} \]
This may also be written in a more familiar form as
\[ \int_{X \times Y} f(x, y) \, d(\mu \times \nu)(x, y) = \begin{cases} \displaystyle \int_X \int_Y f(x, y) \, d\nu(y) \, d\mu(x) \\[2ex] \displaystyle \int_Y \int_X f(x, y) \, d\mu(x) \, d\nu(y). \end{cases} \tag{B.4} \]
Theorem B.13 (Tonelli’s theorem). Let f : X × Y → [0, ∞] be a

non-negative measurable function on the product of two σ-finite measure
spaces (X, B, µ) and (Y, C, ν). Then (B.4) holds again.
We will simply refer to ‘Fubini’s theorem’ whenever an interchange of the

order of integration is justified by an application of either of these theorems;
the reader may check in each case that the application is justified.
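For measures on finite sets the identities (B.3) and (B.4) reduce to rearranging a finite double sum, which can be checked exactly. The Python sketch below is an illustration only; the two measures mu and nu and the function f are assumed examples, not taken from the text.

```python
from fractions import Fraction as Fr

mu = {"a": Fr(1, 2), "b": Fr(3, 2)}            # assumed measure on X = {a, b}
nu = {0: Fr(1, 3), 1: Fr(2, 3), 2: Fr(1)}      # assumed measure on Y = {0, 1, 2}

def f(x, y):
    # An arbitrary real-valued function on X x Y taking both signs.
    return (1 if x == "a" else -2) * (y - 1) + y * y

# Integral against the product measure, (mu x nu)({(x, y)}) = mu(x) nu(y).
product = sum(f(x, y) * mu[x] * nu[y] for x in mu for y in nu)

# The partial integrals h(x) and g(y) from the theorem.
h = {x: sum(f(x, y) * nu[y] for y in nu) for x in mu}
g = {y: sum(f(x, y) * mu[x] for x in mu) for y in nu}

iter_xy = sum(h[x] * mu[x] for x in mu)        # integrate over Y first, then X
iter_yx = sum(g[y] * nu[y] for y in nu)        # integrate over X first, then Y
```

All three quantities product, iter_xy and iter_yx agree exactly, as (B.3) predicts; in the infinite setting this is precisely what requires the integrability or non-negativity hypotheses.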
B.3 The p-Norm
Definition B.14. For any p ∈ [1, ∞) we define
\[ \|f\|_p = \Bigl( \int_X |f|^p \, d\mu \Bigr)^{1/p} \]
for any measurable f : X → C,
ℒ^p_µ(X) = {f : X → C measurable | ‖f‖p < ∞},
and L^p_µ(X) = ℒ^p_µ(X)/∼, where again f ∼ g if f = g µ-almost everywhere.
The measure µ is often clear from the context of the space X, and in that
case we will simply write L^p(X). Thus, for example, L^p(R) denotes L^p_m(R)
where m is Lebesgue measure on R.
Once again it is clear that ‖λf‖p = |λ|‖f‖p, and ‖f + g‖p ≤ ‖f‖p + ‖g‖p
holds as well for any measurable f, g and λ ∈ C. We will review the proof of
this triangle inequality, starting with the following important step.
Theorem B.15 (Hölder’s inequality). Let p, q ∈ (1, ∞) satisfy 1/p + 1/q = 1
(in which case q is called the conjugate exponent of p). Then
\[ \int_X |fg| \, d\mu \le \|f\|_p \|g\|_q \]
for any measurable functions f, g : X → C.

For p = q = 2 this is the Cauchy–Schwarz inequality (see also Proposi-
tion 3.2).
Proof of Theorem B.15. For ‖f‖p = 0 or ‖g‖q = 0 we have fg = 0 µ-almost
everywhere, and so ∫ |fg| dµ = 0. Assume that ‖f‖p > 0 and ‖g‖q > 0.
If either is infinite, then the inequality holds trivially. So it is enough to
consider the case ‖f‖p, ‖g‖q ∈ (0, ∞). Dividing by ‖f‖p and by ‖g‖q we may
also assume that ‖f‖p = ‖g‖q = 1.
Suppose now that x ∈ X satisfies |f(x)| > 0 and |g(x)| > 0. Then we
may choose s, t ∈ R with |f(x)| = e^{s/p} and |g(x)| = e^{t/q}. By convexity of the
function v ↦ e^v on R, we see that
\[ |fg|(x) = e^{s/p + t/q} \le \tfrac{1}{p} e^{s} + \tfrac{1}{q} e^{t} = \tfrac{1}{p} |f(x)|^p + \tfrac{1}{q} |g(x)|^q, \tag{B.5} \]
and the inequality between the left-hand side and the right-hand side of (B.5)
also holds trivially if f(x) = 0 or g(x) = 0. Integrating (B.5) over x ∈ X gives
\[ \int |fg| \, d\mu \le \tfrac{1}{p} \|f\|_p^p + \tfrac{1}{q} \|g\|_q^q = 1, \]
proving the theorem.
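Both the pointwise inequality (B.5) and Hölder’s inequality itself are easy to test numerically for a weighted counting measure on a finite set. The following Python sketch is an illustration only; the exponent p = 3, the random weights and the random complex vectors are assumptions made for the example, and p_norm is a hypothetical helper.

```python
import random

def p_norm(values, weights, p):
    # ||f||_p with respect to the measure giving mass weights[i] to the point i.
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)

random.seed(0)
p = 3.0
q = p / (p - 1)      # conjugate exponent, so that 1/p + 1/q = 1
weights = [random.uniform(0.1, 2.0) for _ in range(50)]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

lhs = sum(w * abs(a * b) for a, b, w in zip(f, g, weights))   # integral of |fg|
rhs = p_norm(f, weights, p) * p_norm(g, weights, q)           # ||f||_p ||g||_q
```

Here lhs does not exceed rhs, and each pair of values satisfies |ab| ≤ |a|^p/p + |b|^q/q, which is the inequality (B.5) before integration.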
Theorem B.16 (Triangle inequality). For measurable functions f and g
from X to C we have ‖f + g‖p ≤ ‖f‖p + ‖g‖p for any p ∈ [1, ∞).
Proof. For p = 1 the inequality follows by integrating the triangle
inequality |f(x) + g(x)| ≤ |f(x)| + |g(x)| for complex numbers over x ∈ X. Assume
from now on that p > 1. Then we have |f + g|^p ≤ |f||f + g|^{p−1} + |g||f + g|^{p−1},
and integrating over X and applying Hölder’s inequality (Theorem B.15)
gives
\[ \|f + g\|_p^p \le \|f\|_p \, \|(f + g)^{p-1}\|_q + \|g\|_p \, \|(f + g)^{p-1}\|_q, \tag{B.6} \]
where q is the conjugate exponent of p. Notice that q(p − 1) = p implies
\[ \|(f + g)^{p-1}\|_q = \Bigl( \int |f + g|^p \, d\mu \Bigr)^{1/q} = \|f + g\|_p^{p/q}. \]
If now ‖f + g‖p ∈ (0, ∞) we can divide (B.6) by ‖f + g‖p^{p/q} and the theorem
follows since p − p/q = 1.
However, if ‖f + g‖p = 0 then the inequality in the theorem is trivially
satisfied. Finally, note that
|f + g|^p ≤ (|f| + |g|)^p ≤ 2^p max{|f|^p, |g|^p} ≤ 2^p (|f|^p + |g|^p)
implies ‖f + g‖p^p ≤ 2^p (‖f‖p^p + ‖g‖p^p). Hence if ‖f + g‖p = ∞, then ‖f‖p = ∞
or ‖g‖p = ∞ and the theorem also holds in this case.
B.4 Near-Continuity of Measurable Functions
Even though measurable functions are typically very far from being continu-
ous, if we are working with a finite measure on the Borel σ-algebra of a metric
space then they are nearly continuous in the following sense.
Theorem B.17 (Lusin: near-continuity of measurable functions).
Let X be a metric space, let µ be a finite measure on the Borel σ-algebra of X,
let Y be a separable metric space, and let f : X → Y be (Borel) measurable.
Then for every ε > 0 there exists a closed set K ⊆ X with µ(X∖K) < ε
such that f |K is continuous. If X is σ-compact, then K can be chosen to be
compact.
As the proof will show, we will in essence produce the continuity of f |K
by removing very small open subsets around every possible discontinuity. To
do this we will use the following regularity property of measures on metric
spaces.
Lemma B.18 (Regularity of measures). Let (X, d) be a metric space and
let µ be a finite Borel measure on X. Then for every Borel set B ⊆ X and
every ε > 0 there exists a closed set K ⊆ X and an open set O ⊆ X with
K⊆B⊆O
and µ(O∖K) < ε.
Proof. Consider the family A ⊆ B of sets B ∈ B with the property that
for every ε > 0 there exists a closed set K ⊆ B and an open set O ⊇ B
with µ(O∖K) < ε. The statement of the lemma is then A = B, which we will
prove in stages. By definition of the Borel σ-algebra B, it is enough to show
that A is a σ-algebra containing all the open sets.
Closure under complements. Since taking complements switches open
and closed sets, A is closed under taking complements. Explicitly, if B lies
in A and for a given ε > 0 we have K ⊆ B ⊆ O as in the definition of A,
then X∖O ⊆ X∖B ⊆ X∖K and µ((X∖K)∖(X∖O)) = µ(O∖K) < ε, which
shows that X∖B ∈ A.
Open sets. Using the continuous distance function x ↦ d(x, A) for a closed
subset A of X from (2.29) it follows that
\[ A = \bigcap_{n \ge 1} \underbrace{\{x \in X \mid d(x, A) < \tfrac{1}{n}\}}_{= O_n, \text{ an open set}} \]
is a decreasing countable intersection of open sets (that is, a Gδ-set). From
the properties of the measure it now follows that B = A satisfies the claim
of the lemma with K = A and O = On with n depending on ε > 0. Since
closed sets belong to the family A, open sets also belong to this family by
the previous step.
Finite unions. Suppose B1, B2 ∈ A and ε > 0. Then there exist
K1 ⊆ B1 ⊆ O1
and
K2 ⊆ B2 ⊆ O2
as in the definition of A, with µ(O1∖K1) < ε and µ(O2∖K2) < ε. Now
define K = K1 ∪ K2, B = B1 ∪ B2 and O = O1 ∪ O2 so that K is closed, O is
open, and K ⊆ B ⊆ O. Moreover, µ(O∖K) ≤ µ(O1∖K1) + µ(O2∖K2) < 2ε,
and since ε > 0 was arbitrary we deduce that B = B1 ∪ B2 ∈ A. By induction,
the same holds for any finite union.
Countable unions. By the steps above, B1 ∪ ··· ∪ Bn ∈ A if B1, ..., Bn ∈ A
for any n ≥ 1. Therefore, and since we are interested in the union of these
sets, we may assume that B1, B2, ··· ∈ A satisfy Bn ⊆ Bn+1 for all n ≥ 1.
Define B′1 = B1 and B′n+1 = Bn+1∖Bn ∈ A for all n ≥ 1, so that
\[ \sum_{n=1}^{\infty} \mu(B_n') = \mu\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) < \infty \]
by our assumption that µ is a finite measure. Therefore, for any ε > 0 there
exists some m ≥ 1 with
\[ \sum_{n=m}^{\infty} \mu(B_{n+1}') < \varepsilon. \]
Since Bm ∈ A, there exists some closed K ⊆ Bm and open O′ ⊇ Bm
with µ(O′∖K) < ε. Since B′n ∈ A there must exist open sets On ⊇ B′n
with µ(On∖B′n) < ε/2^n for n > m. Now define
\[ O = O' \cup \bigcup_{n=m+1}^{\infty} O_n \]
and notice that
\[ K \subseteq \bigcup_{n=1}^{\infty} B_n \subseteq O \]
and
\[ \mu(O \setminus K) \le \mu(O' \setminus K) + \mu\Bigl(\bigcup_{n=m+1}^{\infty} O_n\Bigr) < \varepsilon + \mu\Bigl(\bigcup_{n=m+1}^{\infty} (O_n \setminus B_n')\Bigr) + \mu\Bigl(\bigcup_{n=m+1}^{\infty} B_n'\Bigr) < 3\varepsilon. \]
It follows that
\[ \bigcup_{n=1}^{\infty} B_n \in A. \]
Conclusion. By the above A is a σ-algebra that contains all open sets, and
as mentioned above this forces A = B.
Proof of Theorem B.17. Let f : X → Y be as in the statement of the
theorem. By definition of measurability and of the Borel σ-algebra, the
pre-image of every open set is Borel measurable in X. We wish to find, for
every ε > 0, a closed set K ⊆ X with µ(X∖K) < ε such that
(f|K)⁻¹(U) = K ∩ f⁻¹(U)
is open in K for every open set U ⊆ Y.
By our assumptions on Y, there exists a countable basis of the topology
of Y (for example, using all balls of radius 1/n for n ≥ 1 with centres at the
points of a countable dense subset). Let {Un | n ≥ 1} be such a basis. Now
apply Lemma B.18 to each of the sets f⁻¹(Un) ⊆ X to find a closed set Kn
and an open set On with Kn ⊆ f⁻¹(Un) ⊆ On and with µ(On∖Kn) < ε/2^n
for n ≥ 1. Now define
\[ K = \bigcap_{n=1}^{\infty} (K_n \cup X \setminus O_n). \]
Notice first that Kn ∪ X∖On is closed for all n ≥ 1, and so K is also closed.
Second, we have
\[ \mu(X \setminus K) = \mu\Bigl( X \setminus \bigcap_{n=1}^{\infty} (K_n \cup X \setminus O_n) \Bigr) \le \sum_{n=1}^{\infty} \mu\bigl( \underbrace{X \setminus (K_n \cup X \setminus O_n)}_{= O_n \setminus K_n} \bigr) < \varepsilon \]
by construction of Kn and On. Finally, notice that
\[ (f|_K)^{-1}(U_n) = K \cap f^{-1}(U_n) = K \cap O_n \]
is an open subset of K (in the induced topology). Since this holds for all the
sets Un in the basis, it follows that f|K is continuous.
If now in addition X = ⋃_{n=1}^{∞} Ln is a countable union of compact sets,
then K′ = K ∩ ⋃_{n=1}^{N} Ln satisfies the final claim of the theorem if N is
sufficiently large.
In the setting considered in this section there is a convenient formulation
of the support of a measure as follows.
Definition B.19. The support Supp µ of a Borel measure µ on the Borel σ-
algebra of a metric space X is the set of all points x ∈ X with the property
that every neighbourhood of x has positive measure.
Notice that with this definition µ(X∖Supp µ) = 0 for the spaces
considered in this section.
B.5 Signed Measures
Let (X, B) be a measurable space. We define a (real- or complex-)valued
signed measure ν by a finite measure µ and some (real- or complex-)valued
function g ∈ ℒ^1_µ(X) and the formula dν = g dµ. More concretely, ν is in that
case the (real- or complex-)valued function on B defined by ν(B) = ∫_B g dµ.
By dominated convergence (applied for the measure µ), ν is σ-additive on B.
The signed measure can also be used to integrate bounded measurable
functions by setting
\[ \int f \, d\nu = \int f g \, d\mu \]
for f ∈ ℒ^∞(X). Using dominated convergence and a sequence of simple
functions to approximate f ∈ ℒ^∞(X), we see that ∫ f dν only depends on
the function ν (that is, on the signed measure) and not on the choices involved
in the representation of ν in terms of µ and g ∈ ℒ^1_µ(X).
Exercise B.20 (Polar decomposition). Let ν be a signed measure on a measurable
space (X, B). Show that the representation dν = g dµ consisting of µ and g ∈ ℒ^1_µ(X) can
be chosen so that |g(x)| = 1 for all x ∈ X.
Using the Radon–Nikodym theorem (Proposition 3.29) it can be shown
that the set of signed measures as defined above forms a vector space. To see
this, suppose that dν1 = g1 dµ1 and dν2 = g2 dµ2 are signed measures as above,
and λ1, λ2 are scalars. Then we may define the finite measure µ = µ1 + µ2
which satisfies µ1, µ2 ≪ µ and so dµ1 = f1 dµ and dµ2 = f2 dµ for some
non-negative functions f1, f2 ∈ ℒ^1_µ(X). This gives the presentation dνj = gj fj dµ
for j = 1, 2, and so d(λ1ν1 + λ2ν2) = (λ1g1f1 + λ2g2f2) dµ defines the linear
combination λ1ν1 + λ2ν2 of the signed measures ν1 and ν2.
Hints for Selected Problems
Exercise 1.3 (p. 3): One way to start the proof is to lift φ to a function φ : R → C and to
show that φ must be differentiable and satisfies a differential equation by comparing φ(x)
with
\[ \psi(x) = \int_x^{x+\varepsilon} \varphi(t) \, dt = \int_0^{\varepsilon} \varphi(x + t) \, dt = \varphi(x) \, c \]
(for ε small enough to ensure that c = ∫_0^ε φ(t) dt ≠ 0).
Exercise 1.7 (p. 11): Extend the given function first to an odd function on (−1, 1) and
then by periodicity to a function on R/2Z. Then use the Fourier series.
Exercise 2.7 (p. 20): For the first part consider rapidly oscillating functions, which can
have ‖·‖_{C([0,1])} small and ‖·‖_{C¹([0,1])} large. For the second, use the fundamental theorem
of calculus.
Exercise 2.9 (p. 20): Find a way to use Proposition 2.6.

Exercise 2.18 (p. 24): Suppose the unit ball is strictly convex and v, w ∈ V∖{0}
satisfy ‖v + w‖ = ‖v‖ + ‖w‖ and ‖v‖ ≤ ‖w‖. Use the estimate
\[ \bigl\| \|v\|^{-1} v + \|w\|^{-1} w \bigr\| \ge \bigl\| \|v\|^{-1} v + \|v\|^{-1} w \bigr\| - \bigl\| \|v\|^{-1} w - \|w\|^{-1} w \bigr\| \]
\[ = \|v\|^{-1} \|v + w\| - \bigl( \|v\|^{-1} - \|w\|^{-1} \bigr) \|w\| \]
\[ = \|v\|^{-1} \bigl( \|v\| + \|w\| \bigr) - \bigl( \|v\|^{-1} - \|w\|^{-1} \bigr) \|w\| = 2 \]
to conclude from strict convexity of the unit ball that ‖v‖⁻¹v = ‖w‖⁻¹w.
Exercise 2.26 (p. 29): For (a) assume that (yn ) is a sequence in Y converging to x ∈ X,
and note that (yn ) must be a Cauchy sequence. For the reverse implication in (b) assume
that (yn ) is a Cauchy sequence in Y and note that it then is also a Cauchy sequence in X.
Exercise 2.39 (p. 42): For (b) we note that in the formulation of the compactness criterion
in C0 (X) (which is not given in the exercise) an extra uniformity condition regarding decay
at infinity is necessary.
Exercise 2.43 (p. 47): Notice first that without constants the given proof cannot be
applied. Add a point xnew , forming Xnew = X ⊔ {xnew }. Extend functions in A to Xnew
by setting f (xnew ) = 0 for all f ∈ A. Define Anew = A + R1 (or A + C1) and apply

Theorem 2.40. For (a) let xnew be an isolated point of Xnew . For (b) define the topology
on Xnew so that this space is the one-point compactification of X.
Exercise 2.48 (p. 51): Recall that a Riemann integrable function is a function that can
be approximated from above and below by step functions such that the integral of the
difference is arbitrarily small. With this in mind repeat the argument for (2) =⇒ (1) to
show that (1) =⇒ (3).
Exercise 2.50 (p. 51): Express this in terms of the orbit of 0 under t ↦ t + log₁₀ 2
modulo 1.
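The statement of Exercise 2.50 is not reproduced here; assuming that it concerns the leading decimal digits of the powers 2ⁿ, the link with the hint is that 2ⁿ has leading digit d exactly when the fractional part of n log₁₀ 2 lies in [log₁₀ d, log₁₀(d + 1)). The Python sketch below is an illustration under that assumption: it compares the observed digit frequencies for 1 ≤ n ≤ N with the lengths log₁₀(1 + 1/d) of these intervals. The cut-off N = 5000 is an arbitrary choice.

```python
import math

N = 5000
counts = {d: 0 for d in range(1, 10)}
for n in range(1, N + 1):
    leading = int(str(2 ** n)[0])       # leading decimal digit of 2^n (exact integers)
    counts[leading] += 1

freq = {d: counts[d] / N for d in counts}                  # observed frequencies
benford = {d: math.log10(1 + 1 / d) for d in counts}       # interval lengths
```

The observed frequencies in freq are close to the interval lengths in benford, which is the behaviour one expects if the orbit is equidistributed modulo 1.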
Exercise 2.56 (p. 56): Apply (2.31) for ‖v‖ ≤ 1 and consider L(‖v‖⁻¹v) for non-zero
vectors v ∈ V.
Exercise 2.61 (p. 59): Use the Cauchy integral formula to see that Ez and
ıO : V ∋ f ↦ f|O ∈ C(O)
are continuous. To prove injectivity, define Dr = {z ∈ C | |z| < r} for r < 1, Vr = V(Dr)
and its completion H^p(Dr). Now consider the maps
\[ H^p(D) \ni f \longmapsto \bigl( \imath_{D_r}(f) \mid r < 1 \bigr) \in \prod_{r<1} C(D_r) \]
and
\[ \prod_{r<1} C(D_r) \ni (f_r \mid r < 1) \longmapsto (f_r \mid r < 1) \in \prod_{r<1} H^p(D_r), \]
and notice that for f ∈ V the composition of the two maps is given by
V ∋ f ↦ (f|Dr | r < 1)
which satisfies ‖f‖_{H^p(D)} = sup_{r<1} ‖f|Dr‖_{H^p(Dr)}. Extend this to all f ∈ H^p(D) and
consider now the case where f ∈ H^p(D) satisfies ı_{Dr}(f) = 0 for all r < 1.
Exercise 2.70 (p. 69): Using the discussion concerning (2.47) the assumption is equivalent
to f ′′ = λf with the boundary conditions f (0) = f (1) = 0.
Exercise 3.4 (p. 73): Use the same argument as in (2.34).
Exercise 3.5 (p. 74): For (b) express ⟨x1, y⟩ + ⟨x2, y⟩ in terms of the norm using the
definition and apply the parallelogram identity separately to the positive and the negative
parts to obtain (1/2)⟨x1 + x2, 2y⟩. Setting x2 = 0, this gives ⟨x1, y⟩ = (1/2)⟨x1, 2y⟩. Now consider
rational multiples of x1 to prove linearity of ⟨·, ·⟩. For part (c) verify first the complex
polarization identity
\[ \langle x, y \rangle = \frac{1}{4} \sum_{k=0}^{3} i^k \, \|x + i^k y\|^2 \]
for elements x, y ∈ H of a complex inner product space H.
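The complex polarization identity can be tested numerically in Cⁿ. The Python sketch below is an illustration only; it assumes the convention that the inner product is linear in the first argument and conjugate-linear in the second, and the helper names inner, norm_sq and polarization, as well as the random test vectors, are hypothetical.

```python
import random

def inner(x, y):
    # Standard inner product on C^n: linear in x, conjugate-linear in y (assumed convention).
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def polarization(x, y):
    # Right-hand side of the identity: (1/4) * sum_{k=0}^{3} i^k ||x + i^k y||^2.
    total = 0
    for k in range(4):
        z = [a + (1j ** k) * b for a, b in zip(x, y)]
        total += (1j ** k) * norm_sq(z)
    return total / 4

random.seed(1)
x = [complex(random.random(), random.random()) for _ in range(5)]
y = [complex(random.random(), random.random()) for _ in range(5)]
```

Up to rounding error, polarization(x, y) agrees with inner(x, y), so the inner product is recovered from the norm alone.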
Exercise 3.9 (p. 75): Use the polarization identity from Exercise 3.5.
Exercise 3.10 (p. 76): For (a), analyze the proof of the triangle inequality using the
equality case of the Cauchy–Schwarz inequality in Proposition 3.2. For (b) show that the
closed unit ball is strictly convex and apply Exercise 2.18.
Exercise 3.15 (p. 77): For (a), use the inequality (a²/(a² + b²))^q + (b²/(a² + b²))^q ≤ 1.
For (b) use Jensen’s inequality.
Exercise 3.21 (p. 81): For (a) apply Corollary 3.19 to the linear functional sending y
to B(x, y) for a fixed x ∈ H. For (b), notice that ‖Tx‖ ≥ c‖x‖ and show that this implies
that T(H) ⊆ H is closed. Finally, x ∈ T(H)^⊥ implies ⟨Tx, x⟩ = 0 ≥ c‖x‖².
Exercise 3.22 (p. 81): Define ⟨φ(x), φ(y)⟩_{H*} = ⟨x, y⟩_H.
Exercise 3.23 (p. 81): Either use Section 2.2.2, or define an inner product on the closure
of the image of H in the double dual.
Exercise 3.28 (p. 82): For (1) simply use the evaluation maps for every n ∈ N. For (2)
use the axiom of choice to fix for every i ∈ I a sequence (y_m^{(i)}) of rationals approaching i
and define x_n^{(i)} to be equal to 1 if the nth rational number appears in the sequence (y_m^{(i)})
and to be equal to 0 otherwise. Now use (2) for proving (3), (3) for (4), and that I is
uncountable to conclude.
Exercise 3.30 (p. 85): For (b) recall (from analysis or as a trivial case of Fubini’s theorem)
that if a_{mn} ≥ 0 then Σ_{m=1}^{∞} Σ_{n=1}^{∞} a_{mn} = Σ_{n=1}^{∞} Σ_{m=1}^{∞} a_{mn} (where the sum is also
allowed to be infinity).
Exercise 3.31 (p. 85): One approach is to use the result for the case of a finite measure
space and apply Exercise 3.30.
Exercise 3.33 (p. 85): First show that ν can be identified with a linear functional
on ℒ^∞(X) and show that ‖ν‖ is precisely the operator norm. For (b) use Lemma 2.28,
Exercise B.20, and the fact that µ = Σ_{n=1}^{∞} µn defines a finite measure if each µn is a finite
measure for n ≥ 1 and Σ_{n=1}^{∞} µn(X) < ∞.
Exercise 3.34 (p. 86): Working first with real-valued L² functions, start with the
projection operator P : L²_µ(X, B) → L²_µ(X, A) and show that ‖P(f)‖1 ≤ ‖f‖1.
Exercise 3.37 (p. 87): For (a), show first that the inner product is well-defined on ⊕_n Hn
and satisfies all the properties of an inner product. Then show that a Cauchy sequence
in the sum gives rise to Cauchy sequences in each Hn for all n. For (b) use (a) and the
canonical map (vn) ↦ Σ_n vn from the abstract Hilbert space sum ⊕_n Hn into H.
Exercise 3.48 (p. 94): Recall that χ_{e1}, ..., χ_{ed} are sufficient to separate points on T^d,
and that the group of characters generated by these are all the characters of the stated
form.
Exercise 3.49 (p. 95): The characters already appeared implicitly in Section 1.1.
Exercise 3.50 (p. 95): For (d), write L^p(G) for functions on G with respect to normalized
Haar measure, and ℓ^p(Ĝ) for functions on Ĝ with respect to counting measure. Notice
that f = 1_{Supp(f)} f so that
\[ \|f\|_1 = \|1_{\operatorname{Supp}(f)} f\|_1 \le \|1_{\operatorname{Supp}(f)}\|_2 \|f\|_2 = |\operatorname{Supp}(f)|^{1/2} |G|^{-1/2} \|f\|_2 \]
by the Cauchy–Schwarz inequality. Similarly
\[ \|\hat{f}\|_2 = \|1_{\operatorname{Supp}(\hat{f})} \hat{f}\|_2 \le \|1_{\operatorname{Supp}(\hat{f})}\|_2 \|\hat{f}\|_\infty = |\operatorname{Supp}(\hat{f})|^{1/2} \|\hat{f}\|_\infty. \]
Combine these inequalities with the estimate from (c).
Exercise 3.53 (p. 95): For (a) prove first that T^Γ is a metric compact abelian group
and show that G is a closed subgroup. For (b) notice that G can be identified with the
group of characters on G and that every γ0 ∈ Γ defines the continuous group
homomorphism χ_{γ0} : G ∋ (zγ) ↦ e^{2πi z_{γ0}} ∈ S¹, which is non-trivial by the theorem on completeness of
characters (applied to Γ). If now χ is a character on G, then there exists a neighbourhood U
of 0 ∈ G such that χ(U) ⊆ B^{S¹}_{1/10}(1). By the definition of the product topology there exist
finitely many γ1, ..., γd ∈ Γ such that H = {(zγ) ∈ G | z_{γ1} = ··· = z_{γd} = 0} ⊆ U.
Using that H is a subgroup and χ is a homomorphism, show that χ(H) = 1, which
shows that χ is well-defined on G/H and depends only on the coordinates z_{γ1}, ..., z_{γd} for
any (zγ) ∈ G. Combine this with Exercise 3.48 to conclude that χ can be expressed in
terms of χ_{γ1}, ..., χ_{γd}.
Exercise 3.55 (p. 96): For part (b), consider the odd extension of a given function f
in L²((0, 1)) and apply part (a) rephrased for L²((−1, 1)). For part (c) consider the even
extension.
Exercise 3.56 (p. 96): Use de Moivre’s formula (e^{2πiφ})^n = cos(2πnφ) + i sin(2πnφ).
Exercise 3.69 (p. 106): Localize to a small open subset Bδ(x) ⊆ U by multiplying by
a function in C_c^∞(Bδ(x)) which is equal to 1 on B_{δ/2}(x). Treat the new localized function
as an element on T². Now generalize Theorem 3.57 to give an inequality concerning (and
as a result, the existence of) ∂1∂2 f at x. This exercise should become easier after reading
Theorem 5.6.
Exercise 3.72 (p. 107): Do this via a familiar sequence of approximations, first for indic-
ator functions of measurable sets, then for simple functions, then for non-negative functions
by monotone convergence, and finally for all integrable functions.
Exercise 3.76 (p. 110): First use the case p = 1 to see that
\[ \Bigl\{ x \in X \Bigm| \int_G \varphi(g) f(g^{-1}.x) \, dm_G(g) \ne 0 \Bigr\} \]
is a null set for any f ∈ ℒ^∞(X) with f = 0 µ-almost everywhere. Therefore, if two
bounded measurable functions f1 and f2 on X are equivalent modulo µ, then
\[ \int_G \varphi(g) f_1(g^{-1}.x) \, dm_G(g) = \int_G \varphi(g) f_2(g^{-1}.x) \, dm_G(g) \]
for almost every x ∈ X. Use this to see that φ ∗ f is well-defined for an equivalence class of
functions f ∈ L^∞_µ(X).
Exercise 3.77 (p. 110): For the continuity requirement approximate (vn) by a finitely
supported vector (v1, ..., vk, 0, ...) for some k ≥ 1 and then use continuity of the unitary
representations π1, ..., πk.
Exercise 3.82 (p. 113): Prove the same statements first for the Riemann sums.
Exercise 3.86 (p. 115): Use Proposition 3.83 for the first part. For part (b) go through
the proof of Lemma 3.75 to see that it also works for a measure ν. Then take a second
function f′ ∈ L²_µ(X) and apply Fubini’s theorem to ⟨f′, ν ∗ f⟩ (and similarly to ⟨f′, φ ∗ f⟩).
Exercise 3.90 (p. 118): Use integration by parts just as in the proof of Theorem 3.57 to
bound χn π∗ f uniformly on compact subsets of R2 .
Exercise 3.92 (p. 119): Repeat the argument for Lemma 3.59(1).
Exercise 3.93 (p. 119): For µ, ν ∈ M(G) and any Borel measurable B ⊆ G define
\[ \mu * \nu(B) = \iint 1_B(gh) \, d\mu(g) \, d\nu(h). \]
Exercise 4.4 (p. 123): See Example 8.56 for the counter-example.
Exercise 4.16 (p. 128): Consider the closed sets
Xn = {x ∈ X | ‖Tα x‖ ≤ n for all α ∈ A}
for n ≥ 1.
Exercise 4.18 (p. 128): Define the oscillation of f at x ∈ X by
\[ \operatorname{osc}_f(x) = \inf_{\varepsilon > 0} \operatorname{diam}(f(B_\varepsilon(x))), \]
so that f is continuous at x if and only if osc_f(x) = 0. Show that the set
{x ∈ X | osc_f(x) < c}
is open for c > 0, and that
\[ \bigcap_{n \ge 1} \{x \in X \mid \operatorname{osc}_f(x) < \tfrac{1}{n}\} \]
gives the set of points where f is continuous.
Exercise 4.19 (p. 128): Show that
\[ O_{(a,b),n} = \Bigl\{ f \in L^1((0, 1)) \Bigm| \int_a^b |f| \, dx > n(b - a) \Bigr\} \]
is open in L¹((0, 1)) for all 0 ≤ a < b ≤ 1, and show that
\[ D_{(a,b),n} = \bigcup_{(c,d) : a < c < d < b} O_{(c,d),n} \]
is open and dense. Now take the intersection over (a, b) ∈ Q² with a < b and n ∈ N.
Exercise 4.20 (p. 128): Consider for every n ∈ N the set B_n^+ of functions f in C([0, 1])
with the property that there exists some x ∈ [0, 1/2] such that
\[ \Bigl| \frac{f(x + h) - f(x)}{h} \Bigr| \le n \]
for all h ∈ (0, 1/2]. Use compactness of [0, 1/2] to show that each B_n^+ is closed. Show
that C([0, 1])∖B_n^+ is dense, for example by using piecewise linear functions. Repeat the
argument considering difference quotients for x ∈ [1/2, 1] and h ∈ [−1/2, 0) to define B_n^−.
Conclude that C([0, 1])∖⋃_{n∈N} (B_n^+ ∪ B_n^−) is dense and consists of functions that are nowhere
differentiable.
Exercise 4.23 (p. 130): Use the argument from the proof of Lemma 4.22.
Exercise 5.11 (p. 142): Recall first that Cc (U ) ⊆ Lp (U ) is dense by Proposition 2.51.
Given some $f \in C_c(U)$ choose some open $V$ with compact closure $\overline{V} \subseteq U$. Now apply the
Stone–Weierstrass theorem (in the form of Exercise 2.43(b)) to $C_c^\infty(V) \subseteq C_0(V)$.
Exercise 5.15 (p. 143): Describe the relationship between the Fourier coefficients of $f$
and of $\partial^\alpha f$ and use Lemma 5.4. Alternatively, convolve with a suitable version of $\jmath_\varepsilon$ from
Exercise 5.17 and show that the resulting smooth (or, using Exercise 5.17, $L^2$) function
actually approximates $f$ with respect to the norm on $H^k(\mathbb{T}^d)$.
Exercise 5.16 (p. 144): Show by induction that the derivatives of $\psi$ for $t < 0$ are of the
form $p(\tfrac1t)\psi(t)$ for some real-valued polynomials $p$ and show that such functions converge
to $0$ as $t \nearrow 0$. Use this and the mean value theorem for differentiation to show that all
derivatives of $\psi$ at $t = 0$ vanish.
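The induction step in Exercise 5.16 can be sketched as follows. The normalization $\psi(t) = e^{1/t}$ for $t < 0$ is an assumption made here for illustration (the definition of $\psi$ is not restated in this hint), and the polynomial $q$ is a name introduced only for this sketch.

```latex
% Assumption (not from the hint itself): \psi(t) = e^{1/t} for t < 0,
% so that \psi'(t) = -t^{-2}\,\psi(t) on (-\infty, 0).
% Induction step: if \psi^{(m)}(t) = p(1/t)\,\psi(t) for t < 0, then
\[
  \frac{d}{dt}\Bigl[p\bigl(\tfrac1t\bigr)\psi(t)\Bigr]
  = -\tfrac{1}{t^{2}}\,p'\bigl(\tfrac1t\bigr)\psi(t)
    - \tfrac{1}{t^{2}}\,p\bigl(\tfrac1t\bigr)\psi(t)
  = q\bigl(\tfrac1t\bigr)\psi(t),
  \qquad q(s) = -s^{2}\bigl(p'(s) + p(s)\bigr).
\]
% Decay: substituting s = 1/t, which tends to -\infty as t \nearrow 0,
\[
  \lim_{t \nearrow 0} p\bigl(\tfrac1t\bigr)\psi(t)
  = \lim_{s \to -\infty} p(s)\,e^{s} = 0 .
\]
```

Since $q$ is again a real polynomial, the form $p(\tfrac1t)\psi(t)$ is preserved under differentiation, which is what the induction needs.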
Exercise 5.17 (p. 144): For (a) use Exercise 5.16. For (b) argue as in the proof of
Theorem 3.54. For (c), differentiate under the integral (which may be justified by dominated
convergence). In (e) the appropriate convergence is with respect to $\|\cdot\|_p$, which can be
obtained using the density of $C_c(U) \subseteq L^p(U)$, Lemma 3.75, and parts (b) and (d).
Exercise 5.18 (p. 144): Localize the function $f \in C(U)$ to a small set using a smooth
function of compact support (for example, replace $f$ with $f(x)\jmath_\varepsilon(x - x_0)$ for $\jmath_\varepsilon$ from
Exercise 5.17) and consider it as a smooth function on $\mathbb{T}^d$ (see also Lemma 5.36). Then
use (3.18), $\sum_{j=1}^d n_j^{2k} \asymp \|n\|_2^{2k}$ for $n \in \mathbb{Z}^d$ and $k \geq 1$, Lemma 5.4, and Theorem 5.6.
Exercise 5.19 (p. 144): Note that for $\lambda \in (0,1)$ the function $f^\lambda(x) = f(\lambda x)$ is defined on a
slightly larger version of the set $U$. Let $\varepsilon > 0$ and recall the function $\jmath_\varepsilon$ from Exercise 5.17.
Show that the restriction of $f_j * \jmath_\varepsilon$ is the weak $e_j$-partial derivative of $f * \jmath_\varepsilon$ on the
set $U_\varepsilon = \{x \in U \mid x + B_\varepsilon \subseteq U\}$. Now choose some $\lambda < 1$ sufficiently close to $1$ and $\varepsilon > 0$
sufficiently small and show that the smooth function $f^\lambda * \jmath_\varepsilon$ is defined on $U$, its restriction
is close to $f$ in $L^2$, and that the same applies to the weak partial derivatives.
Exercise 5.25 (p. 146): Either convolve with an approximate identity (that is, with $\jmath_\varepsilon$
from Exercise 5.17) or show that the functions $f_n = \min\{n, f\}$
all lie in $H^1(B_{1/2})$.
Exercise 5.27 (p. 146): For (a) (and (b)) consider first functions in $C^\infty(U) \cap H^k(U)$
(respectively $C_c^\infty(V)$). For (c) consider for instance $d = 1$, $V = (0, \tfrac12) \subseteq U = (0,1)$, find
some $\chi \in C_c^\infty(U)$ with $\chi(\tfrac12) \neq 0$ and show that $\chi|_V \in H^1(V) \setminus H_0^1(V)$ using the argument
in Example 5.20.
Exercise 5.30 (p. 147): Use the regular map φ to pull back any function
f0 ∈ C ∞ (U ) ∩ H 1 (U )
(or f0 ∈ H 1 (U )) to an element
f ∈ C ∞ ((0, 1)d ) ∩ H 1 ((0, 1)d )
(or f ∈ H 1 ((0, 1)d )) and then apply Example 5.28.

Exercise 5.35 (p. 150): Here elements of function spaces on the closed cube are defined
to have the claimed degree of smoothness in the interior of the cube, and in addition have
the property that all the claimed partial derivatives extend continuously to the closure.
For (a) apply the trace operator in Example 5.28 (see Exercise 5.29) for every $\alpha \in \mathbb{N}_0^{d-1}$
with $\|\alpha\|_1 \leq k - 1$ to see that the map
\[ H^k(U) \ni f \longmapsto \partial^\alpha f|_{S_y} \in L^2(S_y) \]
is a bounded operator. Together these show that $H^k(U) \ni f \mapsto f|_{S_y} \in H^{k-1}(U)$ is also
bounded. For (b) first generalize (a) to prove that
\[ \|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{H^{k-1}(S)} \ll \|f\|_{H^k(U)} \sqrt{|y_1 - y_2|}. \]
Next set $\ell = 0$ and use induction on the dimension to prove that $\|f\|_\infty \ll \|f\|_{H^d(U)}$.
Finally, take $\ell \geq 1$ and apply the first part to bound $\|\partial^\alpha f\|_\infty$ for all $\alpha$ with $\|\alpha\|_1 \leq \ell$.
For (c) use the arguments in (b) together with $y \searrow 0$.
Exercise 5.37 (p. 151): Choose $\varepsilon > 0$ such that $K + B_{3\varepsilon} \subseteq U$. Let $\jmath_\varepsilon$ be the function
from Exercise 5.17 and consider $\jmath_\varepsilon * 1_{K + B_\varepsilon}$.
Exercise 5.39 (p. 152): Apply the arguments behind Lemma 5.36 and Theorem 5.34 using
some fixed χ ∈ Cc∞ (U ) with χ|K ≡ 1.
Exercise 5.52 (p. 161): Set $K_j = V_j \setminus \bigcup_{i \neq j} V_i$ for $j = 0, \ldots, k$ and apply Lemma A.28 to
find a continuous partition of unity. Combine this with Exercise 5.17 to obtain the smooth
partition of unity.
Exercise 5.54 (p. 163): Average the function over large balls with different centres and
use Proposition 5.53 and the boundedness assumption to estimate the difference between
the values at the two centres.

Exercise 5.57 (p. 165): Assume first either that $U$ is a set of the form $U \cap B_\varepsilon(z^{(0)})$ or
is convex as in Definition 5.31. Show that for $\phi \in C^\infty(U)$ we have
\[ \langle \partial_j g, \phi \rangle_{L^2(U)} = -\langle g, \partial_j \phi \rangle_{L^2(U)} \]
so $g$ satisfies the usual integration by parts formula but even for $\phi \in C^\infty(U)$. Then for $\lambda > 1$
show that the function $g^\lambda$ defined by
\[ g^\lambda(x) = \begin{cases} g(\lambda x) & \text{for } \lambda x \in U, \\ 0 & \text{for } \lambda x \notin U \end{cases} \]
is in $H_0^1(U)$ (for example, by using similar arguments to Exercise 5.19), take $\lambda \searrow 1$, and
conclude that $g \in H_0^1(U)$. Finally, use Lemmas 5.40, 5.41, and $\Delta g = 0$. For more general
sets as in Definition 5.31 use a smooth partition of unity to localize $g$ to sets of the
form $U \cap B_\varepsilon(z^{(0)})$ (without destroying the feature that $g$ vanishes in the square-mean
sense at the boundary).
Exercise 6.1 (p. 167): For (a) note that an eigenvalue would have absolute value one and
the eigenvector would have to be a sequence with constant absolute value. For (b) consider
geometric sequences.
Exercise 6.6 (p. 169): Use Hölder's inequality and the Arzela–Ascoli theorem to prove
compactness for $p > 1$. For $p = 1$ compactness fails as one sees from studying a sequence $(f_n)$
of positive functions with integral one and support $[\tfrac12 - \tfrac1n, \tfrac12 + \tfrac1n]$ for $n \geq 3$.
Exercise 6.9 (p. 170): The special case of k = 0 in (a) is treated in detail in Lemma 6.58
and the method there also works for general k > 0. For (b) use the Arzela–Ascoli theorem
(Theorem 2.38) in the case of compact closure and consider a fixed non-zero function and
all its shifts in the case of U = R. In (c) the answer is negative, for example because the
closure of the image of the unit ball contains all characters.
Exercise 6.12 (p. 171): Use the first part of the proof of Proposition 6.11, Proposi-
tion 2.51, and Lemma 6.7.
Exercise 6.16 (p. 174): Consider the images under $K$ of the functions $f_n = 1_{[3n,3n+1]}$
for $n \geq 1$, all of which have $L^2$ norm one.
Exercise 6.24 (p. 176): Set $V = U(H)^\perp$ and show that $U^n(V) \perp U^m(V)$ for all
integers $m, n$ with $0 \leq n < m$. Define $H_{\mathrm{shift}} = \bigoplus_{n \geq 0} U^n V$ and show that
\[ H_{\mathrm{unitary}} = H_{\mathrm{shift}}^\perp = \bigcap_{n \geq 0} U^n H \]
satisfies the claims in the exercise.
Exercise 6.29 (p. 178): Expand φ and f in terms of the orthonormal basis of The-
orem 6.27, and compare coefficients.
Exercise 6.33 (p. 181): In both cases show that $\bigcap_{n \in J} \ker(A_n - \lambda_n I)$ is invariant
under $A_1, A_2, \ldots$ for any choice of $\lambda_n$ and any choice of index set $J \subseteq \mathbb{N}$. Show moreover
that this intersection is finite-dimensional if $\lambda_1 \neq 0$. Now apply Theorem 6.27 to $A_1$ and
to $A_n$ restricted to the eigenspaces of $A_1$.
Exercise 6.34 (p. 182): To prove the second inequality (6.14) first take the linear hull V0
of the eigenvectors v1 , . . . , vk corresponding to the first k positive eigenvalues (assume first
that there are at least k positive eigenvalues) and calculate the minimum. Then let W be
the linear hull of V0⊥ and the k-th eigenvector vk (also belonging to V0 ), and note that
any k-dimensional subspace V will intersect W non-trivially.
Exercise 6.45 (p. 192): Consider the self-adjoint compact operator A∗ A and apply The-
orem 6.27. Using that basis, define P such that P 2 = A∗ A, and define Q so that A = QP .
Exercise 6.47 (p. 193): Calculate the trace-class norm for k a character (it will be 1) and
use absolute convergence of Fourier series (Theorem 6.47).
Exercise 6.49 (p. 195): For a fixed compact set $Y \subseteq X$ use the proof of Proposition 6.48
to show that $\int_Y |k(x,x)|\,d\mu(x) \leq \|K\|_{\mathrm{tc}}$. Conclude that $k$ is integrable along the diagonal.
Fix an increasing sequence of compact sets with $X = \bigcup_{n \geq 1} Y_n$. For every $n$ consider a
sequence of partitions $(\xi_{n,\ell})_{\ell \geq 1}$ of $Y_n \setminus Y_{n-1}$ as in the proof of Proposition 6.48 (where
we set $Y_0 = \emptyset$). Use this sequence (by enumerating $\mathbb{N}^2$ in some fashion) to define an
orthonormal basis. To conclude, use in addition Lemma 6.41.
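Exercise 6.50 (p. 195): For (a), suppose that $(A_k)$ is a Cauchy sequence with respect
to $\|\cdot\|_{\mathrm{tc}}$. Since $\|\cdot\|_{\mathrm{op}} \leq \|\cdot\|_{\mathrm{tc}}$ we have $\lim_{k \to \infty} A_k = A \in B(H)$. For given $k, \ell, N \geq 1$ and
any list of orthonormal vectors $(v_n)_{n=1,\ldots,N}$ and $(w_n)_{n=1,\ldots,N}$ we have
\[ \sum_{n=1}^N |\langle (A_k - A_\ell)v_n, w_n \rangle| \leq \|A_k - A_\ell\|_{\mathrm{tc}}. \]
Fixing $(v_n)$ and $(w_n)$ and letting $\ell \to \infty$ gives
\[ \sum_{n \geq 1} |\langle (A_k - A)v_n, w_n \rangle| < \infty. \]
For $k = 1$ this shows $A \in \mathrm{TC}(H)$, and taking $k \to \infty$ then gives $\|A_k - A\|_{\mathrm{tc}} \to 0$.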
Exercise 6.52 (p. 195): Let $(v_n)$ and $(w_n)$ be lists of orthonormal vectors, and notice that
\[ \sum_{n=1}^N |\langle A v_n, w_n \rangle| = \sum_{n=1}^N \left| \int_T \langle A_t v_n, w_n \rangle\, d\mu(t) \right| \leq \int_T \|A_t\|_{\mathrm{tc}}\, d\mu(t) \]
for all $N \geq 1$. This gives the first claim; the argument for the trace of $A$ is similar.
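Exercise 6.53 (p. 195): Part (a) follows quickly from the identity
\[ |\langle A e_j, e_k \rangle| = |\langle A^* e_k, e_j \rangle| \]
for all $j, k$. For (b), let $(f_n)$ be a different orthonormal basis and note that
\[ \sum_{k \geq 1} |\langle A e_j, e_k \rangle|^2 = \|A e_j\|^2 = \sum_{k \geq 1} |\langle A e_j, f_k \rangle|^2, \]
and so
\[ \sum_{j,k \geq 1} |\langle A e_j, e_k \rangle|^2 = \sum_{j,k \geq 1} |\langle A e_j, f_k \rangle|^2 = \sum_{j,k \geq 1} |\langle e_j, A^* f_k \rangle|^2. \]
Arguing similarly one can also replace $e_j$ by $f_j$. For (c), suppose first that $B = U$ is unitary
and apply Lemma 6.38 to conclude the argument. For (d) define
\[ \langle A_1, A_2 \rangle_{\mathrm{HS}} = \sum_{j,k \geq 1} \langle A_1 e_j, e_k \rangle \overline{\langle A_2 e_j, e_k \rangle} \]
and show that the $(a_{mn}) \in \ell^2(\mathbb{N}^2)$ correspond precisely to operators $A \in \mathrm{HS}(H)$ by setting
\[ A\Bigl(\sum_{j \geq 1} c_j e_j\Bigr) = \sum_{k \geq 1} \Bigl(\sum_{j \geq 1} a_{jk} c_j\Bigr) e_k \]
(Proposition 6.11 with $X = Y = \mathbb{N}$ shows this is a well-defined bounded operator). For (e)
suppose $A, B \in \mathrm{HS}(H)$ and calculate
\[ \|AB\|_{\mathrm{HS}}^2 = \sum_{j,k \geq 1} |\langle AB e_j, e_k \rangle|^2 = \sum_{j,k \geq 1} |\langle B e_j, A^* e_k \rangle|^2
   \leq \sum_{j,k \geq 1} \|B e_j\|^2 \|A^* e_k\|^2 = \|B\|_{\mathrm{HS}}^2 \|A\|_{\mathrm{HS}}^2 \]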
by part (a) and (b) and its proof. For (f) assume that $H$ is infinite-dimensional, define $B e_n$
to be $n^{-1/2} e_n$ and show that $B \notin \mathrm{HS}(H)$. For (g) and (h) apply Proposition 6.11.
Exercise 6.54 (p. 196): Let $A, B \in \mathrm{HS}(H)$ and let $(v_n)$ and $(w_n)$ be two orthonormal
lists. Then
\[ \sum_{n=1}^N |\langle AB v_n, w_n \rangle| \leq \sum_{n=1}^N \|B v_n\| \|A^* w_n\|
   \leq \Bigl(\sum_{n=1}^N \|B v_n\|^2\Bigr)^{1/2} \Bigl(\sum_{n=1}^N \|A^* w_n\|^2\Bigr)^{1/2} \]
shows that $\|AB\|_{\mathrm{tc}} \leq \|A\|_{\mathrm{HS}} \|B\|_{\mathrm{HS}}$ by Exercise 6.53(a) and (b) (and its proof above).
For (b) suppose first that $P$ is positive, self-adjoint and trace-class, and find $A \in \mathrm{HS}(H)$
with $P = A^2$. Then apply Exercise 6.45.
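The last step in the estimate for $\|AB\|_{\mathrm{tc}}$ in Exercise 6.54 can be sketched as follows. This is only an illustration under the assumptions that $\|B\|_{\mathrm{HS}}^2 = \sum_{k} \|B e_k\|^2$ for an orthonormal basis $(e_k)$ and that $\|B^*\|_{\mathrm{HS}} = \|B\|_{\mathrm{HS}}$ (as the hint for Exercise 6.53(a) suggests); the basis $(e_k)$ is introduced here only for the sketch.

```latex
% Bessel's inequality for the orthonormal list (v_n), applied to each vector B^* e_k:
\[
  \sum_{n=1}^N \|B v_n\|^2
  = \sum_{n=1}^N \sum_{k \geq 1} |\langle B v_n, e_k \rangle|^2
  = \sum_{k \geq 1} \sum_{n=1}^N |\langle v_n, B^* e_k \rangle|^2
  \leq \sum_{k \geq 1} \|B^* e_k\|^2
  = \|B^*\|_{\mathrm{HS}}^2 = \|B\|_{\mathrm{HS}}^2 .
\]
% The same argument with the orthonormal list (w_n) and the vectors A e_k gives
\[
  \sum_{n=1}^N \|A^* w_n\|^2
  = \sum_{k \geq 1} \sum_{n=1}^N |\langle w_n, A e_k \rangle|^2
  \leq \sum_{k \geq 1} \|A e_k\|^2 = \|A\|_{\mathrm{HS}}^2 .
\]
```

Taking square roots and multiplying the two bounds gives the product $\|A\|_{\mathrm{HS}} \|B\|_{\mathrm{HS}}$, uniformly in $N$ and in the choice of orthonormal lists.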
Exercise 6.55 (p. 196): For (a), show that for any $w \in H$ the map $H \ni v \mapsto \langle v, w \rangle_0$
is a bounded linear operator that depends semi-linearly on $w$. Conclude that it must be
of the form $\langle v, w \rangle_0 = \langle v, Aw \rangle_H$ for a bounded operator $A$. Use the properties of $\langle \cdot, \cdot \rangle_0$ to
show that $A$ is positive and self-adjoint. For (b) recall that $H \ni f \mapsto f(x)$ is a bounded
functional and show that $A$ as in (a) is of the form $A(v) = \langle v, v_x \rangle v_x$ for some $v_x \in H$.
For (c) show that $\sup_{x \in K} \|v_x\|$ is finite for all compact subsets $K$ of $U$ (for example, by
analyzing the arguments leading to Theorem 5.34).
Exercise 6.62 (p. 199): Apply the argument behind Theorem 5.45 to prove that
\[ \|\chi f\|_{H^k(U)} \ll_{\chi,k} |\lambda|^{k/2} \|f\|_2 \]
for some fixed $\chi \in C_c^\infty(U)$ and $k \geq 1$. Apply Exercise 5.39.
Exercise 6.63 (p. 201): For (a), differentiate under the integral sign to express $J_n'$ and $J_n''$
as integrals. Simplify
\[ x^2 J_n''(x) + (x^2 - n^2) J_n(x) \]
using the identities $\sin^2 t + \cos^2 t = 1$ and $a^2 - b^2 = (a - b)(a + b)$ and integration
by parts. Notice that the resulting expression coincides with $-x J_n'(x)$. For (b), repeat
the argument for the first integral in the expression for $Y_n$. The boundary terms from
the partial integration cancel with the corresponding expression arising from treating the
second integral in the same way (differentiating under the integral needs to be justified
as the domain is unbounded), via the identity $\sinh^2 t + 1 = \cosh^2 t$. For (c), notice that
if $f \in L^2(U)$ has weight $n$ then $f$ is orthogonal to all eigenfunctions of weight $m \in \mathbb{Z} \setminus \{n\}$.
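Exercise 6.66 (p. 204): Given $f(x) = \sin(\pi R^{-1} n_1 x_1) \cdots \sin(\pi R^{-1} n_d x_d)$ for $x \in (0,R)^d$
and $f(x) = 0$ for $x \in \mathbb{R}^d \setminus (0,R)^d$, define
\[ f_\lambda(x) = f\bigl( (\tfrac R2, \ldots, \tfrac R2) + \lambda (x - (\tfrac R2, \ldots, \tfrac R2)) \bigr) \]
for $\lambda > 1$ and $\tilde f = f_\lambda * \jmath_\varepsilon$ (cf. Exercise 5.17; also see the proof of Corollary 8.47).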
Exercise 6.70 (p. 208): Use $\Delta f_n = \lambda_n f_n$ and $f \in C_c^\infty(U)$ to first show that
\[ |\langle f, f_n \rangle| \ll_{f,k} |\lambda_n|^{-k} \]
for any $k \geq 1$. Then fix some compact $K \subseteq U$ and use Exercise 6.62 to bound $\|f_n\|_{K,\infty}$ in
terms of $|\lambda_n|$, whose growth rate we know.
Exercise 7.5 (p. 212): Construct the complement as a kernel of a linear map, using the
Hahn–Banach theorem.
Exercise 7.12 (p. 214): Consider a dense countable subset $\{\ell_1, \ell_2, \ldots\}$ of $X^*$ and choose
for every $\ell_n$ some $x_n \in X$ with $\|x_n\| = 1$ and $|\ell_n(x_n)| \geq \|\ell_n\|/2$. Now take the $\mathbb{Q}$-linear
(or $\mathbb{Q}(i)$-linear) hull of $\{x_n\}$, which is countable, and show that it is dense.
Exercise 7.15 (p. 218): After establishing linearity over $\mathbb{C}$ choose $\theta \in \mathbb{C}$ with $|\theta| = 1$
and $\theta \operatorname{LIM}((a_n)) = \operatorname{LIM}((\theta a_n)) \geq 0$ to prove that the complex extension has norm one.
Exercise 7.26 (p. 222): If H is abelian and finitely generated, then H is a quotient
of some Zd and so has Følner sequences. Use this for finitely generated subgroups of a
countable abelian group G to find a Følner sequence for G.
Exercise 7.27 (p. 222): One approach is to construct a box-like (not cube-like) Følner
sequence. An alternative is to write the group as a semi-direct product and use Proposi-
tion 7.20.
Exercise 7.28 (p. 223): Emulate the strategy used to show that a free group is not
amenable in Example 7.22.
Exercise 7.30 (p. 223): For (a), let $m_G$ be a finitely additive left-invariant mean on $G$.
Let $x_0 \in X$ and $B \subseteq X$ and define $m_X(B) = m_G(\{g \in G \mid g \cdot x_0 \in B\})$. For (b), note
that setting $X = \mathbb{R}^2$ does not immediately work in order to prove that (a) implies (b), as
one would have $m_X(K) = 0$ for any bounded set $K$ in $\mathbb{R}^2$. Instead, use $m_X$ as in (a) to
construct a finitely additive function $m$ defined on all bounded sets by setting
\[ m(B) = c_n\, m_X(2^{n+1} \mathbb{Z}^2 + B) \]
for any subset $B$ of $[-2^n, 2^n)^2$ and define $c_n > 0$ so that $m([0,1)^2) = 1$, and show that
the definition does not depend on $n$.
Exercise 7.45 (p. 240): Check the claim first for open subsets, and then argue along the
lines used to prove Proposition 2.51.
Exercise 7.49 (p. 241): For (a) note that $X$ has a countable base $\{U_n\}$ for the topology,
and write every clopen set as a union of finitely many $U_n$. For (b) use this to construct an
injective continuous map from $X$ to $\{0,1\}^{\mathbb{N}}$.
Exercise 7.53 (p. 248): Apply Theorem 7.44 to obtain a locally finite measure representing
the restriction of $\Lambda$ to $C_0(X)$. Assuming that $\mu(X) = \infty$, find some function $f \in C_0(X)$
for which $\int_X f\, d\mu = \infty$, and then use positivity to obtain a contradiction. Finally, show
that $\mu$ represents $\Lambda$ on all of $C_0(X)$ by showing that $\Lambda$ is necessarily bounded.
Exercise 7.57 (p. 252): Combine the argument in Section 7.4.4 with Theorem 7.54.
Exercise 7.58 (p. 252): For (a), notice that $\ell^\infty(\mathbb{N})$ can be embedded into the Banach
space $\mathscr{L}^\infty(X)$ using the subset $\{\tfrac1n \mid n \in \mathbb{N}\}$. Now extend the Banach limit from $\ell^\infty(\mathbb{N})$
to $\mathscr{L}^\infty(X)$ and show that it does not arise from a signed measure on $X$. For (b), if $f$ is a
non-measurable bounded function $X \to \mathbb{R}$ then $f$ induces a linear functional on the space
\[ \{\mu \in M(X) \mid |\mu|(B) = 0 \text{ for all } B \subseteq X \setminus D \text{ for some countable set } D \subseteq X\}, \]
since for each such measure one can define $\int f\, d\mu$ as a countable sum. Now extend this
functional to all of $M(X)$.
Exercise 8.6 (p. 255): To see that (1) is necessary apply Theorem 4.1.
Exercise 8.8 (p. 255): Apply Theorem 4.1.
Exercise 8.9 (p. 256): For (b) prove $(A^*)^{-1} N_{x_1,\ldots,x_n;\varepsilon}(A^* y_0^*) = N_{Ax_1,\ldots,Ax_n;\varepsilon}(y_0^*)$.
Exercise 8.12 (p. 258): For (a) use the Baire category theorem (Theorem 4.12). For (b)
assume that the neighbourhoods of the form $N_{x_1,\ldots,x_n;1/n}(0)$ form a basis of the weak*
topology neighbourhoods of $0 \in X^*$ and conclude that $X$ is the linear hull of $\{x_1, x_2, \ldots\}$
by using the same argument as in the proof of Lemma 8.13.
Exercise 8.14 (p. 259): Apply Exercises 7.11–7.12 to reduce to the separable case.
Exercise 8.15 (p. 259): Suppose that there is a sequence that converges weakly but not
in norm. Show that this implies that there is a sequence $(f_n)$ in $\ell^1(\mathbb{N})$ such that $\|f_n\|_1 = 1$
for all $n \geq 1$ but for which $f_n$ converges weakly to $0$ as $n \to \infty$. Use this to construct
a strictly increasing sequence of natural numbers $(I_j)$ and a subsequence $(f_{n_j})$ such
that $\sum_{k=1}^{I_{j-1}} |f_{n_j}(k)| \leq \tfrac15$ and $\sum_{k=I_j+1}^{\infty} |f_{n_j}(k)| \leq \tfrac15$ for all $j \geq 1$, where we set $I_0 = 0$.
Using this partition, construct an element $h$ in $\ell^\infty(\mathbb{N})$ for which $\sum_{k=1}^\infty f_{n_j}(k) h(k)$ does
not converge to $0$ as $j \to \infty$.
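One possible choice of $h$ in Exercise 8.15, given here only as a sketch, is the block-wise conjugate sign of the subsequence. The notation $\operatorname{sgn}(z) = z/|z|$ (with $\operatorname{sgn}(0) = 0$) is introduced for this illustration and is not taken from the hint.

```latex
% Define h on the j-th block (I_{j-1}, I_j] of the partition of \mathbb{N}:
\[
  h(k) = \overline{\operatorname{sgn}\bigl(f_{n_j}(k)\bigr)}
  \quad\text{for } I_{j-1} < k \leq I_j, \qquad\text{so } \|h\|_\infty \leq 1 .
\]
% On the j-th block f_{n_j}(k) h(k) = |f_{n_j}(k)|, and \|f_{n_j}\|_1 = 1 gives
\[
  \sum_{k = I_{j-1}+1}^{I_j} f_{n_j}(k)\, h(k)
  = 1 - \sum_{k=1}^{I_{j-1}} |f_{n_j}(k)| - \sum_{k = I_j + 1}^{\infty} |f_{n_j}(k)|
  \geq 1 - \tfrac15 - \tfrac15 = \tfrac35 .
\]
% The two tails contribute at most 1/5 each in absolute value since |h| \leq 1, hence
\[
  \Bigl| \sum_{k=1}^{\infty} f_{n_j}(k)\, h(k) \Bigr|
  \geq \tfrac35 - \tfrac15 - \tfrac15 = \tfrac15
  \quad\text{for all } j \geq 1 .
\]
```

This contradicts the weak convergence of $(f_n)$ to $0$, since $h$ defines a bounded functional on $\ell^1(\mathbb{N})$.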
Exercise 8.21 (p. 261): For every $g \in G$ consider the map $L_g : C_{\mathbb{R}}(G) \to C_{\mathbb{R}}(G)$ defined
by $(L_g f)(x) = f(gx)$. Show that $\{\Lambda \in C(G)^* \mid \Lambda = \Lambda \circ L_{g_1} = \cdots = \Lambda \circ L_{g_n},\ \Lambda \geq 0,\ \Lambda(1) = 1\}$
is a closed non-empty subset of the unit ball in $C(G)^*$ for any $g_1, \ldots, g_n \in G$. To see that
these sets are non-empty, use induction and suppose that $\Lambda_0$ belongs to the set defined
by $g_1, \ldots, g_{n-1} \in G$. Then any weak* limit of $\frac1K \sum_{k=0}^{K-1} \Lambda_0 \circ L_{g_n}^k$ will belong to the set
defined by $g_1, \ldots, g_n \in G$. See also Exercise 8.37 and the discussion there.
Exercise 8.22 (p. 262): For both parts of the exercise, let $(v_n)$ be any sequence in $H$
with $\|v_n\| \leq 1$, assume without loss of generality that $v_n \to v \in H$ as $n \to \infty$ in the weak*
topology, and recall Exercise 8.6. Use compactness of $A$ to prove that $\|AA^*(v_n - v)\| \to 0$
as $n \to \infty$ and consider
\[ \|A^*(v_n - v)\|^2 = \langle A^*(v_n - v), A^*(v_n - v) \rangle = \langle v_n - v, AA^*(v_n - v) \rangle. \]
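The identity in Exercise 8.22 can be combined with the Cauchy–Schwarz inequality as in the following sketch. It assumes $\|v\| \leq 1$, which holds because the closed unit ball is weakly closed; this remark is added here and is not part of the hint.

```latex
% Cauchy--Schwarz applied to the last inner product in the identity:
\[
  \|A^*(v_n - v)\|^2
  = \langle v_n - v,\, AA^*(v_n - v) \rangle
  \leq \|v_n - v\|\, \|AA^*(v_n - v)\|
  \leq 2\, \|AA^*(v_n - v)\| \longrightarrow 0
  \quad\text{as } n \to \infty .
\]
```

So $A^* v_n \to A^* v$ in norm along the chosen subsequence.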
Exercise 8.23 (p. 262): Show first that SH is weak* closed and non-empty. Us-
ing Theorem 8.10 deduce that the intersection is only empty if some finite intersec-
tion SH1 ∩ · · · ∩ SHn is empty. However, H = H1 + · · · + Hn is another finitely generated
subgroup and so SH1 ∩ · · · ∩ SHn = SH is non-empty.
Exercise 8.24 (p. 262): Give G the discrete topology, so that by amenability there is a
Banach limit in (ℓ∞ (G))∗ . Restrict this to C(G) and deduce the existence of a translation-
invariant measure from the Riesz representation theorem.
Exercise 8.26 (p. 262): Suppose without loss of generality that $x_0^* = 0$. Now apply weak*
compactness to the weak* closed subsets $B_s^{X^*} \cap K$ for $s > \inf_{k \in K} \|k\|$.
Exercises 8.32–8.33 (p. 264): Both exercises require the generalization of Section 2.3.3
to Td .
Exercise 8.40 (p. 268): For ergodicity, use Fourier series as in the proof of Lemma 8.38.
Exercise 8.44 (p. 272): Apply Exercise 8.39 to obtain weak* convergence on the set $V + B_{s/2}$.
To obtain strong convergence, express the difference quotient at a point $x \in V$ and
for $h$, $0 < |h| < \tfrac s2$, as an integral of shifts of the weak derivative (cf. Lemma 8.42) and
apply Lemma 3.74.
Exercise 8.59 (p. 292): The uniform operator topology is the only topology that has
neighbourhoods that are bounded with respect to the operator norm. If $x_n$ in $X$ and $y_n^*$
in $Y^*$ have norm one for all $n \geq 1$, then
\[ L \longmapsto \sum_{n=1}^\infty \frac{1}{2^n}\, y_n^*(L x_n) \]
is a continuous functional on $B(X,Y)$ and so also continuous with respect to the weak
topology. Choosing the sequence $(x_n)$ carefully makes this functional not continuous with
respect to the strong, nor the weak, operator topology. Finally, notice that for the strong
operator topology and $x \in X \setminus \{0\}$ there exists a neighbourhood, namely $N_{x;1}(0)$, such
that $\{Lx \mid L \in N_{x;1}(0)\} \subseteq Y$ is bounded while there is no such neighbourhood in the weak
operator topology.
Exercise 8.62 (p. 293): Apply the Hahn–Banach lemma (Lemma 7.1).
Exercise 8.67 (p. 295): For (c) suppose that $\ell : \mathrm{MF}([0,1])^* \to \mathbb{C}$ is continuous and linear.
Suppose $\varepsilon > 0$ is chosen so that $f \in U_\varepsilon(0)$ implies that $|\ell(f)| < 1$. Given any $f \in \mathrm{MF}([0,1])$
use a partition of $[0,1]$ to split $f$ into a finite sum $f = \sum_{k=1}^n f_k$ such that $\lambda f_k \in U_\varepsilon(0)$ for
all $\lambda \in \mathbb{C}$.
Exercise 8.75 (p. 300): Look at the proof of Theorem 7.3 to see how to obtain a complex-
linear functional from a real-linear functional.
Exercise 8.76 (p. 300): Without loss of generality we may assume that 0 is an interior
point of K. Fix some y0 ∈ L so that 0 is an interior point of M = K − L + y0 . Since K
and L are disjoint, K − L cannot contain 0 and M does not contain y0 . Now let g be the
gauge function of M so g(y0 ) > 1. Define f (λy0 ) = λg(y0 ) for all scalars λ. Extend f to
the whole space with f (x) 6 g(x) for all x using the Hahn–Banach lemma, and notice
that f (x) 6 1 for all x ∈ M and f (y0 ) > 1.
Exercise 8.77 (p. 300): Let $r = \inf_{x \in K} \|z - x\|$. Hence for every $\varepsilon > 0$ there is an $x_0 \in K$
such that $v = z - x_0$ satisfies $\|v\| < r + \varepsilon$ and hence for $\ell \in X^*$ with $\|\ell\| = 1$ we have
\[ r + \varepsilon > \ell(v) = \ell(z) - \ell(x_0) \geq \ell(z) - \sup_{x \in K} \ell(x). \]
For the converse set $L = B_r(z)$ and apply Exercise 8.76 for $K$ and $L$.

Exercise 8.78 (p. 300): Consider the compact convex set $K = \overline{\imath(B_1^X)}$ with the closure
taken with respect to the weak* topology, assume that $\ell \in (B_1^X)^{**} \setminus K$, and apply
Theorem 8.73 using the weak* topology on $X^{**}$.
Exercise 8.84 (p. 304): For (a) use the Arzela–Ascoli theorem. For (c) use piecewise
linear functions as in (b) to approximate a given function f ∈ K. For (d) use the fact that
any f ∈ K is almost everywhere differentiable with derivative in [−1, 1].
Exercise 8.87 (p. 304): Notice that two possible barycentres of µ cannot be separated
by X ∗ .
Exercise 8.89 (p. 306): To see that the set of barycentres is closed, show that
{(µ, x) ∈ C(M )∗ × K | µ is a probability measure on M, x is the barycentre of µ}
is a compact subset of C(M )∗ × X and consider the projection map to K.
Exercise 8.91 (p. 306): Use induction on the dimension n. If x0 is a boundary point,
then there exists a hyperplane V that contains x0 with the property that K lies in one of
the closed half-spaces with boundary V . If x0 is an interior point, take any extreme point y
and find a boundary point z such that x0 is in the line segment from y to z.
Exercise 8.93 (p. 311): One direction is clear using Theorem 8.90. For the other direction,
suppose that $\mu$ represents $x_0$ and $\mu \neq \delta_{x_0}$. Then there exists some $y$ in $\operatorname{Supp}(\mu) \setminus \{x_0\}$,
a linear functional $\ell \in X^*$, and an open neighbourhood $U$ of $y$ with $\mu(U) \in (0,1)$
and $\sup_{z \in U} \ell(z) < \ell(x_0)$. Now use $\mu = \lambda \frac{1}{\mu(U)} \mu|_U + (1 - \lambda) \frac{1}{\mu(K \setminus U)} \mu|_{K \setminus U}$ with $\lambda = \mu(U)$
and the existence of barycentres (Lemma 8.88) to see that $x_0$ is not extreme.
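The conclusion of Exercise 8.93 can be sketched as follows, assuming that $\ell$ is real-valued and that the normalized restrictions have barycentres in $K$ as provided by Lemma 8.88. The names $\mu_1$, $\mu_2$, $x_1$, $x_2$ are introduced here for illustration only.

```latex
% Notation for this sketch: \mu_1 = \mu(U)^{-1}\mu|_U and
% \mu_2 = \mu(K \setminus U)^{-1}\mu|_{K \setminus U}, with barycentres x_1, x_2 \in K.
% Applying any \ell' \in X^* to \mu = \lambda\mu_1 + (1-\lambda)\mu_2 and using that
% X^* separates points gives
\[
  x_0 = \lambda x_1 + (1 - \lambda) x_2, \qquad \lambda = \mu(U) \in (0,1) .
\]
% The functional \ell distinguishes x_1 from x_0:
\[
  \ell(x_1) = \int \ell \, d\mu_1 \leq \sup_{z \in U} \ell(z) < \ell(x_0) .
\]
```

Hence $x_1 \neq x_0$, so $x_0$ is a non-trivial convex combination of points of $K$.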
Exercise 9.9 (p. 319): For (a) the precise condition is $f(x) \neq 0$ for $\mu_v$-almost every $x$. Simplifying
the notation, assume that $H = L^2(\mathbb{T}, \mu)$. If $f$ vanishes on a set $B$ of positive measure,
then clearly $H_f \perp 1_B$. So suppose that $f \neq 0$ almost everywhere. Then clearly $gf \in H_f$
for $g$ any character, hence for $g$ any trigonometric polynomial, hence for $g \in C(\mathbb{T})$, hence
for $g = 1_O$ for any open set $O$ by dominated convergence, and finally for $g = 1_G$ for
any $G_\delta$-set $\bigcap_{n \geq 1} O_n$. Since any measurable set coincides modulo $\mu$ with a $G_\delta$-set, we may
apply dominated convergence once again to obtain the case $g \in L^\infty_\mu(\mathbb{T})$. Apply this to
the function defined by $g_n = \frac1f 1_{\{x \in \mathbb{T} \mid |f(x)| > 1/n\}}$ to obtain $1_{\{x \in \mathbb{T} \mid |f(x)| > 1/n\}} \in H_f$ for
all $n \geq 1$ and conclude that $1 \in H_f$. In (b) the spectral measure is given by the Lebesgue
measure.
Exercise 9.10 (p. 320): For (a) note first that $\sum_{n=0}^\infty d_n \bigl(\sum_{k=1}^\infty c_k z^k\bigr)^n = z$ for sufficiently
small $z$, and as an identity in the ring $\mathbb{C}[[z]]$ of formal power series. Using the assumption
that $\sum_{n=0}^\infty |d_n| \bigl(\sum_{k=1}^\infty |c_k| \|A\|^k\bigr)^n < \infty$ we see first that
\[ g(f(A)) = \sum_{n=0}^\infty d_n \Bigl(\sum_{k=1}^\infty c_k A^k\Bigr)^n \approx \sum_{n=0}^N d_n \Bigl(\sum_{k=1}^K c_k A^k\Bigr)^n \]
if $N$ and $K$ are sufficiently large, so
\[ \sum_{n=0}^N d_n \Bigl(\sum_{k=1}^K c_k A^k\Bigr)^n = A + \sum_{\ell = \min\{N,K\}+1}^{NK} e_{N,K,\ell} A^\ell, \]
where the last sum can be made arbitrarily small if $N$ and $K$ are sufficiently large.
Exercise 9.14 (p. 323): Use Theorem 9.2 to obtain
\[ H = \bigoplus_{n \geq 1} H_{w_n} \cong \bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_{w_n}) \]
and prove that $H_\rho$ can be expressed as the direct sum of certain subspaces of $L^2(\mathbb{T}, \mu_{w_n})$.
Exercise 9.15 (p. 323): For (a), multiply by the square root of the Radon–Nikodym
derivative of one measure with respect to the other. Using (a), we may modify the measures
in Corollary 9.13 to satisfy µn = µ1 |Bn for some nested sequence of Borel sets
B1 = T ⊇ B2 ⊇ · · · ,
hence (b) follows by repeating that argument. For (c), show that $f \in L^2(\mathbb{T}, \nu_1)$ satisfies
the property defining H(1). For the converse note that we can define in a measurable way
for any $u \in \mathbb{C}^n$ with $n \geq 2$ or $u \in \ell^2(\mathbb{N})$ a vector $u'$ in the same space with $u' \perp u$ and
with $\|u'\| = \|u\|$, for example by first projecting onto $\mathbb{C}^2 \subseteq \mathbb{C}^n \subseteq \ell^2(\mathbb{N})$ and there using
the orthogonal direction or a suitable multiple of the first basis vector if the projection
is zero. Now consider a general function $F = (f_1, f_2, \ldots, f_\infty)$ with $f_n : (\mathbb{T}, \nu_n) \to \mathbb{C}^n$
and $f_\infty : (\mathbb{T}, \nu_\infty) \to \ell^2(\mathbb{N})$. If $f_n \neq 0$ for some $n \in \{\infty, 2, 3, \ldots\}$ then, using the argument
above, construct a function $f_n'$ with $f_n(x) \perp f_n'(x)$ and with $\|f_n(x)\| = \|f_n'(x)\|$ for $\nu_n$-almost
every $x$. Set $f_m' = 0$ for all $m \neq n$ and conclude that $F$ does not satisfy the property
defining H(1), showing the reverse inclusion. For (d) argue in a similar way: For $F$ given
by $(0, f_2, 0, \ldots, 0)$, define $F_2$ by rotating $f_2$ and show the defining property for H(2). We
note again that for $n \geq 3$ we can measurably define for every $u_1, u_2 \in \mathbb{C}^n$ or in $\ell^2(\mathbb{N})$ a
vector $u'$ orthogonal to both $u_1$ and $u_2$ with $\|u'\| = \|u_1\|$. Using this argue as before.
Exercise 9.19 (p. 326): Fix some v ∈ H and describe the unitary operator U on Hv by a
multiplication operator on L2 (S1 , µv ). Now calculate the spectral measures µv,w = µv,P (w)
in that context, where P : H → Hv is the orthogonal projection.
Exercise 9.23 (p. 327): For $R_\alpha$ note that the characters are eigenfunctions. For $A$ show
that a character is mapped to a character but that the orbit of any non-trivial character is
infinite, and then apply Lemma 9.12.
Exercise 9.33 (p. 334): For (c), apply the Stone–Weierstrass theorem (see Exercise 2.43).
For (d), first approximate $g$ simultaneously in $L^1(\mathbb{R}^d)$ and $L^2(\mathbb{R}^d)$ by some function $f_0$
in $C_c(\mathbb{R}^d)$. Then approximate $e^{\pi \|x\|^2} f_0(x)$ by some function $f_1 \in A$ with respect to $\|\cdot\|_\infty$,
and notice that $f(x) = e^{-\pi \|x\|^2} f_1(x)$ will then approximate $g$ with respect to $\|\cdot\|_1$ and $\|\cdot\|_2$.
For (e) consider $f_1, f_2 \in A$ and express the inner product in the form
\[ \langle f_1, f_2 \rangle = \int f_1 \overline{f_2}\, dx = \widehat{f_1 \overline{f_2}}(0) \]
and use part (b) and Proposition 9.34.
Exercise 9.41 (p. 340): Show that the four-fold Fourier transform of a function is again
the original function, and apply the argument used in Section 1.1 (or Theorem 3.80). Also
consider the function f in Example 9.27 together with λx0 f and products of such functions
to prove that all four possible eigenvalues appear.
Exercise 9.47 (p. 342): Consider the associated (well-defined) function $g$ in $C^\infty(\mathbb{T}^d)$
defined by $g(x) = \sum_{n \in \mathbb{Z}^d} f(n + x)$.
Exercise 9.48 (p. 342): First show that we can approximate any function in $L^p(\mathbb{R})$ by
a function of compact support, so it is enough to approximate a compactly supported
function $f \in L^p(\mathbb{R})$ (so that $\widehat{f} \in C^\infty(\mathbb{R})$). Let $h_1 \in C_c^\infty(\mathbb{R})$ be a non-trivial real-valued
function with $h_1(x) = h_1(-x)$, define $h$ to be $h_1 * h_1$, multiply by a scalar so that $h(0) = 1$,
and set $h_r(x) = h(rx)$ for all $r > 0$. Now prove that $\widehat{h_r} * f \to f$ in $L^p(\mathbb{R})$ as $r \to \infty$ using
Jensen's inequality as in the proof of Theorem 9.39.
Exercise 9.50 (p. 343): Use the condition for equality in the Cauchy–Schwarz inequality
to deduce that f must satisfy a differential equation of the form f ′ (x) = λxf (x) for
some λ ∈ R.
Exercise 9.51 (p. 343): Notice that if for f we have equality, then we have equality in
Exercise 9.50 for g(x) = e−2πixt0 f (x) + x0 .
Exercise 9.55 (p. 345): For (a) let $\mathbb{C}[G]$ be the space of finitely supported complex-valued
measures and use $p$ to define a semi-inner product on $\mathbb{C}[G]$ with
\[ p(g) = \langle \delta_g, \delta_e \rangle = \langle \delta_{hg}, \delta_h \rangle \]
for all $g, h \in G$. Then show that $\pi_h(\delta_g) = \delta_{hg}$ extends to a unitary representation on
the completion of C[G] modulo the kernel of the semi-norm induced by the semi-inner
product. For (b) assume that p is extreme and the unitary representation in (a) is reducible,
decompose the generator into the components corresponding to an invariant subspace and
its orthocomplement, and study the matrix coefficient of these three vectors.
Exercise 9.63 (p. 352): For (b) notice that this is equivalent to $\widehat{C_c^\infty(\mathbb{R})}$ being dense
in $\operatorname{Graph}(A) = \{(f,g) \in L^2(\mathbb{R}) \times L^2(\mathbb{R}) \mid g(t) = t f(t) \text{ for } t \in \mathbb{R}\}$. For this improve the
argument for Exercise 9.48 (also see its hint on p. 577) by proving that
\[ M_I(\widehat{h_r} * f) = (M_I \widehat{h_r}) * f + \widehat{h_r} * (M_I f) \]
and showing that $(M_I \widehat{h_r}) * f \to 0$ in $L^2(\mathbb{R})$ as $r \to \infty$. For (d) extend Proposition 9.43 to
weak derivatives.
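The product-rule identity in Exercise 9.63 can be checked pointwise as in the following sketch. It assumes that $M_I$ denotes multiplication by the identity function, $(M_I f)(t) = t f(t)$, and that all integrals converge; the letter $\eta$ stands in for $\widehat{h_r}$ and is introduced here only for readability.

```latex
% Assumption: (M_I f)(t) = t f(t); \eta is a placeholder for \widehat{h_r}.
% Split the factor t as (t - s) + s inside the convolution integral:
\[
  t\,(\eta * f)(t)
  = \int_{\mathbb{R}} \bigl((t - s) + s\bigr)\, \eta(t - s)\, f(s)\, ds
  = \int_{\mathbb{R}} (M_I \eta)(t - s)\, f(s)\, ds
    + \int_{\mathbb{R}} \eta(t - s)\, (M_I f)(s)\, ds ,
\]
% which is the claimed identity
\[
  M_I(\eta * f) = (M_I \eta) * f + \eta * (M_I f) .
\]
```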
Exercise 10.4 (p. 360): Set $B_1 = f^{-1}\bigl(\{z \in \mathbb{C} \mid \Re(z) \in [\tfrac kn, \tfrac{k+1}{n}),\ \Im(z) \in [\tfrac{\ell}{n}, \tfrac{\ell+1}{n})\}\bigr)$ for
some $k, \ell \in \mathbb{Z}$ and $n \in \mathbb{N}$, and set $B_2 = G \setminus B_1$. Combine the assumption in the exercise
and Lemma 10.3 to conclude that $m_G(B_1) = 0$ or $m_G(B_2) = 0$. Vary $k, \ell, n$ to conclude
the proof.
Exercise 10.5 (p. 360): For (a) show that $\theta_* m_G(B) = m_G(\theta^{-1} B)$ defines a left Haar
measure and use Proposition 10.2. For the continuity in (b) let $B = K$ be a fixed compact
set and use the regularity of the measure $m_G$. For (c) use the substitution formula
\[ \int f \circ \theta \, dm_G = \int f \, d\theta_* m_G. \]
For (d) apply (a) with $B = G$.

Exercise 10.6 (p. 361): For $f \in C_c(G)$ we may use uniform continuity to argue that
\[ \psi_n * f(g) = \int \psi_n(h) \underbrace{f(h^{-1} g)}_{\approx f(g)} \, dm_G(h) \]
since $\psi_n$ vanishes outside $U_n$. For the convolution on the right a different argument is
needed as follows. Write
\[ f * \psi_n(g) = \int_G f(h) \psi_n(\underbrace{h^{-1} g}_{= k^{-1}}) \, dm_G(h) = \int_G f(gk) \psi_n(k^{-1}) \, dm_G(k), \]
using the substitution $gk = h$. From here (depending on how much one wishes to assume
about the sequence of functions) one could assume that each $\psi_n$ is symmetric in the sense
that $\psi_n(g) = \psi_n(g^{-1})$, or use the fact that the modular character is itself a continuous
function so the difference between integrating against $\psi_n(k^{-1})$ and $\psi_n(k)$ for $k \in U_n$ is
small. To deduce the result for $f \in L^1(G)$ use the usual approximation arguments.
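The first approximation in Exercise 10.6 can be made quantitative as in the following sketch. It assumes, as is usual for an approximate identity but not restated in the hint, that $\psi_n \geq 0$, $\int \psi_n \, dm_G = 1$, and that $\psi_n$ vanishes outside $U_n$.

```latex
% Assumptions for this sketch: \psi_n \geq 0, \int \psi_n \, dm_G = 1, \psi_n = 0 outside U_n.
\[
  \bigl| \psi_n * f(g) - f(g) \bigr|
  = \Bigl| \int \psi_n(h) \bigl( f(h^{-1} g) - f(g) \bigr) \, dm_G(h) \Bigr|
  \leq \sup_{h \in U_n} \bigl| f(h^{-1} g) - f(g) \bigr| .
\]
```

By uniform continuity of $f \in C_c(G)$ the right-hand side tends to $0$ uniformly in $g$ once the neighbourhoods $U_n$ shrink to the identity.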
Exercise 10.7 (p. 361): Let $U = U^{-1}$ be a compact neighbourhood of the identity $e$ in
the group $G$. Find a maximal collection of disjoint left translates $g_1 U, g_2 U, \ldots$ and show
that this collection must be finite. Now show that $G = \bigcup_i g_i U^2$.
Exercise 10.20 (p. 373): Use Proposition 7.20(a) and the Følner condition. For the
converse use the same argument as in Exercise 8.23.
Exercise 10.21 (p. 373): For (1) notice that the proof that (3) =⇒ (1) in Theorem 10.15
only uses finite sets. For (2) show, for example, that a function f in P(G) satisfying Reiter’s
condition in Definition 10.14 for a compact K ⊆ G and ε > 0 also satisfies a topological
version for all f0 ∈ P(G) that vanish outside of K. Use this to induce a left-invariant
mean that is also topologically left-invariant.
Exercise 10.22 (p. 373): For (1) show that $G \times G$ is amenable. Then use the left-invariant
mean $M_2$ on $G \times G$ to define $M(\phi) = M_2((g_1, g_2) \mapsto \phi(g_1 g_2^{-1}))$. For (2) convolve
a function $f$ as in Reiter's condition with its flipped version $\tilde f(g) = f(g^{-1})$ and show
that $f * \tilde f$ satisfies Reiter's condition for left- and right-multiplication.
Exercise 10.23 (p. 373): If G is discrete and uncountable, then combine the following
with the conclusion in Exercise 10.20. So assume that G is σ-compact, locally compact,
and metric. For (1) $\Rightarrow$ (2), define for $f \in C(X)$ the functional $\Lambda(f) = M(g \mapsto f(gx))$
for some left-invariant mean $M$ on $G$ and some $x \in X$. For (2) $\Rightarrow$ (3) one would like to
use the action of $G$ on $K$ to find an invariant measure $\mu$ and then apply Lemma 8.88 to
find a $G$-invariant barycentre of $\mu$ in $K$. However, as $K$ is not assumed to be metrizable
this requires a small work-around as follows. Let $\|\cdot\|_1, \ldots, \|\cdot\|_m$ be a finite collection of
semi-norms on $V$ and write $G = \bigcup_{n=1}^\infty G_n$ with $G_n$ compact and $G_n$ contained in $G_{n+1}^o$
for all $n \geq 1$. Show that the semi-norms
\[ \|v\|_{k,n} = \sup_{g \in G_n} \|\pi_g^{\mathrm{lin}}(v)\|_k \]
are finite for any $1 \leq k \leq m$ and $n \geq 1$ and are compatible with the topology on $V$.
Define $V_0 = \{v \in V \mid \|v\|_{k,n} = 0 \text{ for all } 1 \leq k \leq m \text{ and } n \geq 1\}$, set $W = V/V_0$ and
define $p : V \to W$ by $p(v) = v + V_0$. Equip $W$ with the collection of quotient semi-norms
induced by $\|\cdot\|_{k,n}$ and show that $p$ is continuous. Show that $G$ acts continuously on $W$
and that $p$ is equivariant for the $G$-action. By applying the argument for the metrizable
case outlined above, show that the set
\[ \{v \in K \mid \|\pi_g^{\mathrm{aff}}(v) - v\|_1 = \cdots = \|\pi_g^{\mathrm{aff}}(v) - v\|_m = 0 \text{ for all } g \in G\} \]
is closed and non-empty for any finite collection $\|\cdot\|_1, \ldots, \|\cdot\|_m$ of semi-norms on $V$.

Finally apply compactness. For (3) $\Longrightarrow$ (1) we would like to use the compact convex set
\[ M(G) \subseteq (L^\infty(G))^* \]
and the linear action $\lambda_g^*$ for the left regular representation $\lambda_g : L^\infty(G) \to L^\infty(G)$. If $G$ is discrete this is possible, but in general this does not define a continuous affine action of the sort considered in (3). For this reason, define the subspace
\[ \mathrm{LUC}(G) = \{\varphi \in L^\infty(G) \mid \|\lambda_g \varphi - \varphi\|_\infty \to 0 \text{ as } g \to e\} \]
of left uniformly continuous functions on $G$, and its dual space $X = (\mathrm{LUC}(G))^*$ with the topology induced by the semi-norms $\|M\|_{K,\varphi} = \sup_{g \in K} |M(\lambda_g \varphi)|$ for $M \in (\mathrm{LUC}(G))^*$, where $K \subseteq G$ is a non-empty compact subset and $\varphi \in \mathrm{LUC}(G)$. Show that on any bounded subset $B$ of $(\mathrm{LUC}(G))^*$ the topology induced by these semi-norms agrees with the weak* topology on $B$. Show that the action $\lambda_g^*$ for $g \in G$ on $X$ satisfies the assumptions of (3)
and deduce that there exists a left-invariant mean on LUC(G). Use the argument from
the proof of Lemma 10.17 and the step (1) =⇒ (3) in Theorem 10.15 to complete the
argument.
Exercise 10.24 (p. 374): For (1), show that the Reiter condition for G implies the Reiter
condition for H. Let f ∈ Cc (G) ∩ P(G) satisfy the Reiter condition for ε > 0 and a
finite $K \subseteq H$. Define the space $X = H\backslash G$ with the usual map $g \mapsto Hg$ from $G \to X$, and the probability measure $\nu$ on $X$ by
\[ \nu(B) = \int_G 1_B(Hg) f(g)\,dm(g). \]
It suffices to find $g \in G$ such that
\[ F(g) = \sum_{k \in K} \frac{\int |f(k^{-1}hg) - f(hg)|\,dm_H(h)}{\int f(hg)\,dm_H(h)} \]
is defined and bounded above by $|K|\varepsilon$. Show that $F(h_0 g) = F(g)$ for every $h_0$ in $H$ (even if $H$ is not unimodular), that $\nu\bigl(\{Hg \mid \int f(hg)\,dm_H(h) = 0\}\bigr) = 0$, and choose a compact subset $L \subseteq H$ so that $f(g) > 0$ and $f(hg) > 0$ (or $f(k^{-1}hg) > 0$) implies $h \in L$. Then
\begin{align*}
\int_X F(Hg)\,d\nu &= \int_G F(Hg) f(g)\,dm_G(g) \\
&= \frac{1}{m_H(L)} \int_G F(Hg) \int_L f(hg)\,dm_H(h)\,dm_G(g) \\
&= \frac{1}{m_H(L)} \sum_{k \in K} \int_G \int_L |f(k^{-1}hg) - f(hg)|\,dm_H(h)\,dm_G(g) \\
&= \sum_{k \in K} \|\lambda_k f - f\|_1 < |K|\varepsilon.
\end{align*}
Finally, use Exercise 10.21. For (2) use Exercise 10.23 (either (2) or (3)).
Exercise 10.25 (p. 374): Use Exercises 10.23(2) and 10.24(1).
Exercise 10.28 (p. 374): For (1), fix a generating set and use metric open balls of
increasing radius to define a Følner sequence.
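As an illustration only (the exercise concerns a more general class of groups), the following sketch takes $G = \mathbb{Z}^2$ with the standard generating set, which is an assumed example and not taken from the text. The word-metric balls are then the $\ell^1$-balls, and the Følner ratio $|(s + B_n) \,\triangle\, B_n| / |B_n|$ can be computed directly. Closed balls of integer radius are used here; for an integer-valued word metric these coincide with open balls of slightly larger radius. The names `ball` and `folner_ratio` are hypothetical.

```python
from itertools import product

# Assumed example: Z^2 with the standard symmetric generating set.
GENERATORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def ball(n):
    """Word-metric ball of radius n in Z^2: all (x, y) with |x| + |y| <= n."""
    return {(x, y) for x, y in product(range(-n, n + 1), repeat=2)
            if abs(x) + abs(y) <= n}

def folner_ratio(n):
    """Largest value of |(s + B_n) symmetric-difference B_n| / |B_n| over generators s."""
    b = ball(n)
    worst = 0
    for sx, sy in GENERATORS:
        shifted = {(x + sx, y + sy) for x, y in b}
        worst = max(worst, len(shifted ^ b))
    return worst / len(b)

# The ratio decays roughly like 2/n, so the balls form a Folner sequence here.
for n in (1, 5, 25, 125):
    print(n, folner_ratio(n))
```

In this example $|B_n| = 2n^2 + 2n + 1$ while each translate differs from $B_n$ in $4n + 2$ points, which is the polynomial-growth mechanism the hint relies on.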
Exercise 10.30 (p. 375): Take the closed convex hull K of {πg v | g ∈ G} and apply
Theorem 3.13 with v0 = 0.
Exercise 10.35 (p. 376): Apply the definitions and Exercises 10.7 and 10.30.
Exercise 10.37 (p. 377): For any finite subset $F \subseteq G$ define the subgroup $H_F = \langle F \rangle$ generated by $F$. Note that $G$ acts on the quotient space $G/H_F$ and also unitarily on $\ell^2(G/H_F)$. Consider the direct product representation of $G$ on $\bigoplus_F \ell^2(G/H_F)$.
Exercise 10.47 (p. 384): Combine Theorem 10.38 and Proposition 10.41.
Exercise 10.53 (p. 388): Given a function $f \in L^2(G)$ show that for almost every $g \in G$ the function $f_g : \Gamma \ni \gamma \mapsto f(g\gamma)$ belongs to $\ell^2(\Gamma)$. Define $\varphi : L^2(G) \to H_G$ by $\varphi(f)(g) = f_g$ for all $g \in G$.
Exercise 10.64 (p. 404): Show that the characteristic function of a ‘connected compon-
ent’ is also an eigenfunction for eigenvalue one, and use the level sets of a non-constant
eigenfunction for the converse.
Exercise 10.67 (p. 406): Combine the argument after Proposition 10.41 with division
with remainder in Z.
Exercise 11.4 (p. 410): For $\lambda = 0$, use Corollary 4.30 (or Exercise 6.25) to see that $M_g$ is invertible if and only if $\mu(\{0\}) = 0$ (respectively, $g$ is non-zero $\mu$-almost everywhere) and $z \mapsto \frac{1}{z}$ (resp. $\frac{1}{g}$) is essentially bounded with respect to $\mu$.
Exercise 11.7 (p. 410): For (a) use the isomorphism between ℓ2 (Z) and L2 (T) provided
by Fourier series (Theorem 3.54). For (b), you may show that A is isomorphic (as a Banach
algebra) to the algebra generated by S with S as in Exercise 6.1(b).
Exercise 11.9 (p. 411): Use Lemma 2.67 and Theorem 11.6.
Exercise 11.12 (p. 413): Recall from Section 2.4.2 that multiplication is continuous.
Exercise 11.18 (p. 417): Use the $C^*$-property of the norm to first show $\|a\| \leq \|a^*\|$ for all $a \in A$.
Exercise 11.20 (p. 417): Start with the identity $1_A^* 1_A = 1_A 1_A^* = 1_A^*$, apply the star operator and then use the $C^*$-property.
Exercise 11.36 (p. 425): For (a), combine Proposition 11.21, Corollary 11.29, and Exercise 2.43(b). For (c), use (a) and the fact that $C_0(\sigma(A)) \oplus \mathbb{C} \cong C(\sigma(A) \cup \{\infty\})$.
Exercise 11.41 (p. 428): The Banach algebra of limits of absolutely convergent Fourier series with pointwise multiplication is isometrically isomorphic to $\ell^1(\mathbb{Z}^d)$ with convolution. Apply Theorem 11.23 and Proposition 11.38 to $\mathbb{Z}^d$ and $\widehat{\mathbb{Z}^d} \cong \mathbb{T}^d$.
Exercise 11.42 (p. 428): Show first that if $G$ is a locally compact metrizable abelian group then
\[ V = \langle \lambda_y f \mid y \in G \rangle \]
cannot be $L^1(G)$ if $\hat f(t) = 0$ for some $t \in \widehat{G}$. Next show that $L^1(G) * f \subseteq V$ (for example, using Propositions 2.51, 3.81, and 3.91). For (a), take $G = \mathbb{T}^d$ and show that $\chi_n \in V$ for all $n \in \mathbb{Z}^d$ by Lemma 3.59(2). For (b), set $A = L^1(G) \oplus \mathbb{C}$ as in Exercise 11.1. Replace $f$ by $\tilde f * f$ if necessary to assume $\hat f \geq 0$. Fix some $g \in \mathscr{S}(\mathbb{R}^d)$ such that $\hat g \in C_c^\infty(\mathbb{R}^d)$. Let $h \in \mathscr{S}(\mathbb{R}^d)$ have $\hat h \in C_c^\infty(\mathbb{R}^d)$, $\hat h \in [0, 1]$ and $\hat h \equiv 1$ on $\operatorname{Supp}(\hat g)$. Show that $h * g = g$ and that $1_A - h + f \in A$ is invertible. Use this to show that $g \in V$ and apply Exercise 9.48.
Exercise 11.45 (p. 431): For (a), suppose $\chi \in \widehat{G} \setminus \{1\}$ has $\chi^n = 1$ for some $n \geq 1$ and notice that $\chi$ then takes values in a discrete subgroup of $S^1$. For (b), suppose that $G = O_1 \sqcup O_2$ is a partition into two non-empty clopen sets. Show that there exists a neighbourhood $U$ of $e \in G$ with $U + O_j = O_j$ for $j = 1, 2$. Define $H = \langle U \rangle$ and show that $H$ is a proper open subgroup of $G$, and that $G/H$ is a finite abelian group.
Exercise 12.10 (p. 435): For the description of $\sigma_{\mathrm{resid}}(T)$, prove that
\[ (\operatorname{im}(T - \lambda I))^\perp = \ker\bigl(T^* - \overline{\lambda} I\bigr) \]
and then use this together with an explicit description of $T^*$ (see Exercises 6.23(c) and 6.1(b)).
Exercise 12.17 (p. 437): Since the kernel of I − A is the eigenspace of A for eigenvalue 1,
almost injectivity follows directly from compactness of A (see, for example, Exercise 3.40).
The proof that im(I − A) is closed is a little more involved. Assume first that
\[ (I - A)v_n = v_n - Av_n \to w \]
as $n \to \infty$ with $v_n \in \ker(I - A)^\perp$. Show that $(v_n)$ is bounded (for example, by assuming that $\|v_n\| \to \infty$ as $n \to \infty$ and applying compactness for $v_n' = \|v_n\|^{-1} v_n$). Finally, use compactness of $A$ to conclude $w \in \operatorname{im}(I - A)$. To prove that $T = I - A$ is almost surjective assume that $V = (T(H))^\perp$ is infinite-dimensional, and let $(v_n)$ be an orthonormal basis of $V$ so that $\langle v_n, v_n - Av_n \rangle = 0$ for all $n \geq 1$. Now choose a subsequence $(v_{n_k})$ with $Av_{n_k} \to w$ as $k \to \infty$ and derive a contradiction.
Exercise 12.18 (p. 437): For the first direction assume that $T$ is Fredholm, let $H_1$ be $(\ker(T))^\perp$, $H_2$ be $\operatorname{im}(T)$, and use Proposition 4.25 to show that $T|_{H_1} : H_1 \to H_2$ has a bounded inverse. Define $S|_{H_2}$ to be $(T|_{H_1})^{-1}$ and $S|_{H_2^\perp} = 0$. For the converse apply Exercise 12.17 to the compact operators $ST - I$ and $TS - I$.
Exercise 12.21 (p. 438): Use Fourier series and the isomorphism $\ell^2(\mathbb{Z}) \cong L^2(\mathbb{T})$.
Exercise 12.27 (p. 442): The base cases $n = 0$ and $n = 1$ hold trivially by definition. For $n \geq 1$ consider
\[ \frac{1}{\sqrt{p}} S\bigl(U_n(f)\bigr)(v) = \frac{1}{\sqrt{p}} \sum_{v' \sim v} U_n(f)(v') = \frac{1}{p^{(n+1)/2}} \sum_{v' \sim v} \Biggl( \sum_{\substack{k \leq n, \\ k \equiv n \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_k v'} f(w) \Biggr) \]
and count how often the term $f(w)$ appears in this sum, distinguishing between the cases $d(w, v) = n + 1$ and $d(w, v) \leq n - 1$.
Exercise 12.29 (p. 442): Use the addition formula for $\sin\bigl((n + 1)\theta + \theta\bigr)$.
Exercise 12.30 (p. 442): For (a) use the operator $U_{2n} - \frac{1}{p} U_{2n-2}$ and Cauchy–Schwarz on the finite set $\{w \mid w \sim_{2n} v\}$. For (b) define $\tilde f(v) = (-1)^{d(v, v_0)} f(v)$ for a fixed vertex $v_0$. For (c) treat the case $\theta = 0$ first. If $\theta > 0$ recall first that $p > 2$ and use Exercise 12.29 and (b) to deduce that it is enough to show that there are infinitely many $n$ with
\[ |\sin((2n + 1)\theta)| > \tfrac{1}{2} + \varepsilon \]
for some fixed $\varepsilon > 0$. Note that this holds, for example, if $(2n + 1)\theta \in \pi\mathbb{Z} + [\frac{\pi}{4}, \frac{3\pi}{4}]$. Now consider the following three cases: If $\theta \leq \frac{\pi}{4}$, then every closed interval of length $\frac{\pi}{2}$ is visited by the rotation on $\mathbb{R}/(\pi\mathbb{Z})$ by $2\theta$ infinitely often. If $\theta = \frac{\pi}{2} - \varphi$ for some $\varphi \in (0, \frac{\pi}{4}]$, then
\[ (2n + 1)\theta + \mathbb{Z}\pi = \tfrac{\pi}{2} - (2n + 1)\varphi + \mathbb{Z}\pi \]
and the same argument applies. In the only remaining case $\theta = \frac{\pi}{2}$, any $n$ works.
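The case distinction in (c) can be explored numerically. The sketch below (the function name `good_indices` and the sample angles are illustrative choices, not from the text) lists the indices $n$ for which $(2n+1)\theta$ lands in $\pi\mathbb{Z} + [\frac{\pi}{4}, \frac{3\pi}{4}]$, where $|\sin((2n+1)\theta)| \geq \frac{\sqrt{2}}{2}$. It is a finite experiment and does not replace the rotation argument.

```python
import math

def good_indices(theta, n_max):
    """Indices 0 <= n < n_max with (2n+1)*theta in pi*Z + [pi/4, 3*pi/4]."""
    hits = []
    for n in range(n_max):
        r = ((2 * n + 1) * theta) % math.pi  # representative in [0, pi)
        if math.pi / 4 <= r <= 3 * math.pi / 4:
            hits.append(n)
    return hits

# One sample angle for each of the three cases discussed in the hint.
for theta in (0.1, math.pi / 2 - 0.3, math.pi / 2):
    hits = good_indices(theta, 1000)
    worst = min(abs(math.sin((2 * n + 1) * theta)) for n in hits)
    print(f"theta={theta:.4f}: {len(hits)} hits, min |sin| on hits = {worst:.4f}")
```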

Exercise 12.32 (p. 444): Use multiplication by $\bigl(\frac{d\mu}{d\nu}\bigr)^{1/2}$.
Exercise 12.44 (p. 451): Show that if $f \in C(\sigma(T))$ then $\lambda \notin f(\sigma(T)) \Longrightarrow \lambda \notin \sigma(f(T))$ by applying the functional calculus for $g(z) = \frac{1}{f(z) - \lambda}$. For the converse, let $\lambda$ be an element of $\sigma_{\mathrm{appt}}(T)$ (noting that there is no residual spectrum in this case) and generalize the argument for Theorem 12.37(4) to this case, again using a sequence of polynomials.
Exercise 12.48 (p. 453): Define the positive self-adjoint operators $B = T^*T$ and $A = \sqrt{B}$ using Corollary 12.45. Show that $\|Tv\| = \|Av\|$ for all $v \in H_1$ and
\[ (\operatorname{im}(A))^\perp = \ker(A) = \ker(T). \]
Define $Uv = 0$ for $v \in \ker(T)$ and $UAv = Tv$ for $Av \in \operatorname{im}(A)$. Show that $U$ is well-defined and extends to an isometry on $(\ker(T))^\perp$ satisfying the claims in the exercise.
Exercise 12.51 (p. 454): Show that
\[ \langle f(T)v, v \rangle = \int_X f(g(x)) |v(x)|^2\,d\mu(x) = \int_{\mathbb{C}} f(y)\,d\mu_v(y) \]
where $\mu_v$ is the push-forward under $g$ of the measure $|v|^2\,d\mu$.
Exercise 12.54 (p. 456): Adapt the argument from Section 9.1.2. If H is not separable,
combine these arguments with Zorn’s lemma.
Exercise 12.57 (p. 458): Apply the polar decomposition from Exercise 12.48 to find an
isometry U : H1 → H2 and a positive self-adjoint operator A ∈ B(H1 ) with T = U A.
Deduce that U and A are bijective and show Aπ1 (g) = π1 (g)A and U π1 (g) = π2 (g)U for
all g ∈ G.
Exercise 12.58 (p. 458): For (a), consider the self-adjoint operator $A = B^*B$ with
\[ \pi_1(g)A = A\pi_1(g) \]
for all $g \in G$. If $\sigma(A)$ contains more than one point, then there exist two non-zero functions $f_1, f_2 \in C(\sigma(A))$ such that $f_1 f_2 = 0$, which implies that $V = \ker(f_1(A))$ is a closed proper subspace. Then show that $V$ is invariant under $\pi_1(g)$ for all $g \in G$. For (b), apply the same argument to $\frac{B + B^*}{2}$ and $\frac{B - B^*}{2i}$.
Exercise 12.59 (p. 458): By Exercise 9.55(b) all that remains is to show that irreducibility of the unitary representation implies extremality. Suppose therefore that $\pi_\varphi : G \curvearrowright H_\varphi$ and $\varphi = \lambda\varphi_1 + (1 - \lambda)\varphi_2$ for some $\lambda \in (0, 1)$ and $\varphi_1, \varphi_2 \in P(G)$. Construct $\pi_1 : G \curvearrowright H_1$ with generator $v_1$ and $\pi_2 : G \curvearrowright H_2$ with generator $v_2$ using Exercise 9.55(a) so that
\begin{align*}
\varphi(g) = \lambda\varphi_1(g) + (1 - \lambda)\varphi_2(g) &= \lambda\langle \pi_1(g)v_1, v_1 \rangle + (1 - \lambda)\langle \pi_2(g)v_2, v_2 \rangle \\
&= \bigl\langle \pi(g)(\lambda^{1/2}v_1 + (1 - \lambda)^{1/2}v_2),\ \lambda^{1/2}v_1 + (1 - \lambda)^{1/2}v_2 \bigr\rangle,
\end{align*}
where $\pi(g) = \pi_1(g) \times \pi_2(g)$ on $H = H_1 \times H_2$. Thus $v = \lambda^{1/2}v_1 + (1 - \lambda)^{1/2}v_2$ generates a cyclic sub-representation $H_v$ of $H$ isomorphic to $H_\varphi$ by Lemma 9.53. Consider the orthogonal projection $P$ from $H_v \subseteq H_1 \times H_2$ onto $H_1$ and apply Schur’s lemma (Exercise 12.58) to deduce that the unitary representations $\pi_\varphi$ and $\pi_1$ are unitarily isomorphic under an isomorphism sending $v$ to $v_1$. Similarly for $\pi_2$, and hence $\varphi = \varphi_1 = \varphi_2$.
Exercise 12.61 (p. 461): For (a) and the first part of (b) notice that ı is continuous
by definition of the weak* topology. For the example in (b) set T2 = f (T1 ) for some
function f ∈ C(σ(T1 )), or consider a measure µ on σ(T1 ) × σ(T2 ) whose support projects
surjectively onto each coordinate and define both operators as multiplication operators, or
use, for example, two diagonal 3-by-3 matrices T1 , T2 , each with two different eigenvalues
such that T1 T2 has 3 different eigenvalues.
Exercise 12.65 (p. 462): Show that dµv,w = f0 dµv satisfies (12.15) for all a ∈ A.
Exercise 12.69 (p. 467): For (a) assume that µ(U × N) = 0 for some non-empty open
set $U \subseteq \sigma(A)$. Use some non-zero $f \in C_c(U) \hookrightarrow C(\sigma(A)) \cong A$ to derive a contradiction. For (b), notice that continuity of $\pi$ follows from the definition of the weak* topology. For (c), write $\hat a$ for the Gelfand transform of $a \in A$ when considered as an element of $A'$ and show that $\hat a = a^o \circ \pi$ for all $a \in A$. Now use the characterizing property of spectral measures. For (d) use (c). For (e), note that $\pi(\sigma(A')) \subseteq \sigma(A)$ is compact and that by (c) we have $\operatorname{Supp}(\mu_{v,w}) \subseteq \pi(\sigma(A'))$ for all $v, w \in H$. Now apply (a).
Exercise 12.72 (p. 468): Note that $ST = TS$. By (FC5) this gives $ST^{1/n} = T^{1/n}S$, where $T^{1/n}$ is defined as in Corollary 12.42. Apply Theorem 12.60 to the $C^*$-algebra generated by $I$, $S$ and $T^{1/n}$ to realize both as multiplication operators $M_g$ resp. $M_h$ on $L^2(X, \mu)$ for a finite measure space $(X, \mu)$ and two positive functions $g, h$ in $L^\infty_\mu(X)$ with $g^n = h^n$ $\mu$-almost everywhere.
Exercise 12.75 (p. 470): Consider first the case B1 ⊆ B2 and show that in this
case im ΠB1 ⊆ im ΠB2 using the argument in the proof of Lemma 12.74.
Exercise 12.77 (p. 472): First deal with simple functions using the properties of a
projection-valued measure and Exercise 12.75.
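Exercise 12.78 (p. 472): It suffices to consider the case $f = 0$. Fix $v \in H$ and show that $\mu_v(B) = \langle \Pi_B v, v \rangle$ for $B \in \mathcal{B}$ defines a finite measure on $X$. Then show that
\[ \Bigl\| \int_X f_n(\lambda)\,d\Pi_\lambda v \Bigr\|^2 = \Bigl\langle \int_X |f_n|^2(\lambda)\,d\Pi_\lambda v, v \Bigr\rangle = \int_X |f_n|^2\,d\mu_v, \]
and apply dominated convergence.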
Exercise 12.82 (p. 477): In both contexts the cyclic subspace is the minimal invariant
closed subspace containing a given v ∈ H and so it suffices to show that the notions of
invariance are equivalent. It is easy to verify using only the definition of convolution that
a closed subspace that is invariant under the unitary representation is invariant under
convolution. To see the converse, use the same approximation argument as in the proof of
Corollary 12.81.
Exercise 12.88 (p. 481): Show that $\widehat{L^1(G)} \subseteq C_0(\widehat{G})$ is a subalgebra that is closed under conjugation and separates points. Then apply Exercise 2.43.
Exercise 12.90 (p. 483): For (a) suppose that $g_n \to g_0$ and $t_n \to t_0$ as $n \to \infty$. Recall from Proposition 11.43 that the topology on $\widehat{G}$ can be defined by uniform convergence on compact sets, and apply this to $K = \{g_n \mid n \geq 1\} \cup \{g_0\}$. For (b), notice first that (a) implies that $\imath(g) : \widehat{G} \to S^1$ is continuous. Moreover, uniform continuity of $\langle \cdot, \cdot \rangle$ restricted to $K \times L$ for some compact subset $L \subseteq \widehat{G}$ shows that $\imath(g_n)|_L \to \imath(g_0)|_L$ uniformly as $n \to \infty$. By Proposition 11.43 this shows that $\imath(g_n) \to \imath(g_0)$ and so $\imath : G \to \widehat{\widehat{G}}$ is continuous. For (c) approximate $f_1, f_2 \in L^2(G)$ by $f_1', f_2' \in C_c(G)$ and notice that $\langle \lambda_g f_1', f_2' \rangle = 0$ once $g$ is outside a certain compact subset. For (d), apply Theorem 12.85 to see that $M_{g_n} f \to 0$ in
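the weak topology as $n \to \infty$ for $f \in L^2(\widehat{G})$ and a second time to see that $\widehat{\widehat{\lambda}}_{\imath(g_n)} f \to 0$ weakly as $n \to \infty$ for $f \in L^2(\widehat{\widehat{G}})$. Now use continuity of the unitary representation $\widehat{\widehat{\lambda}}$ of $\widehat{\widehat{G}}$ on $L^2(\widehat{\widehat{G}})$ to conclude that $\imath(g_n) \to \infty$ as $n \to \infty$.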
Exercise 12.93 (p. 484): For (a) notice that a character on $G/H$ can be lifted to $G$ using composition with the quotient map $G \to G/H$. For (b) use Theorem 12.84 on $G/H$ to show that $(H^\perp)^\perp \subseteq H$. For (c) apply (a) to $\widehat{G}/H^\perp$.
Exercise 12.94 (p. 484): For (b) suppose first that $\theta$ has dense image and conclude from $\widehat{\theta}(\chi_t) = 1$ for some $t \in \widehat{G}$ that $t = 0$. For the converse, use Exercise 12.93 to find a non-trivial character $t \in \widehat{G} \cap (\operatorname{im}\theta)^\perp$ if $\operatorname{im}\theta$ is not dense in $G$.
Exercise 12.95 (p. 485): For the isomorphism between the dual group of the product
and the direct sum of the dual groups show that the elements of the direct sum define
characters and that these separate points.
Exercise 12.96 (p. 485): By definition $\varprojlim (G_n, \varphi_n)$ is a subgroup of $\prod_{n \geq 1} G_n$. Combine Exercise 12.93 and Exercise 12.95.
Exercise 12.97 (p. 485): Use Exercise 12.96 and Pontryagin duality.
Exercise 13.7 (p. 489): For (c) note that
\[ H_0^1((0, 1)) \subseteq H^1(\mathbb{T}) \subseteq H^1((0, 1)) \]
with $1 \in H^1(\mathbb{T}) \setminus H_0^1((0, 1))$ and $I \in H^1((0, 1)) \setminus H^1(\mathbb{T})$ where $I(x) = x$ for $x \in (0, 1)$. Use Fourier series to show that $T_p = -T_p^*$. Use the definition of weak derivatives to show that $T = -T_0^*$ is the weak derivative on $H^1((0, 1))$.
Exercise 13.10 (p. 494): Show first that $(\operatorname{im}(B))^\perp = \{0\}$ and deduce that $B^{-1}$ is densely defined. To see that $B^{-1}$ is self-adjoint prove that $\langle B^{-1}u, v \rangle = \langle u, B^{-1}v \rangle$ for $u, v \in \operatorname{im}(B)$ and that $\langle B(B^{-1})^*u, v \rangle = \langle u, v \rangle$ for any $u \in D_{(B^{-1})^*}$ and $v \in H$.
Exercise 13.11 (p. 494): To see that in (a) and (b) there are no other eigenfunctions than
the given ones, use elliptic regularity (Theorem 5.34 and Example 5.20) to conclude that
the eigenfunctions satisfy certain differential equations with boundary conditions.
Exercise 13.13 (p. 494): For (a) assume first that there exists a bound $N$ on the number of neighbours and show that $T_{\mathrm{initial}}(f)((v_1, v_2)) = f(v_1)$ and $T_{\mathrm{terminal}}(f)((v_1, v_2)) = f(v_2)$ for any $f \in L^2(V)$ and $(v_1, v_2) \in \vec{E}$ defines a pair of bounded operators
\[ T_{\mathrm{initial}}, T_{\mathrm{terminal}} : L^2(V) \to L^2(\vec{E}) \]
with $T = T_{\mathrm{terminal}} - T_{\mathrm{initial}}$. For the converse consider functions $f = \delta_{v_n}$ so that the vertex $v_n \in V$ has more than $n$ neighbours. In (b) the operator $T^*$ is defined on a subset of $L^2(\vec{E})$ and maps $g \in D_{T^*}$ to $T^*(g)(v) = \sum_{w \sim v} \bigl(g(w, v) - g(v, w)\bigr)$ for all $v \in V$.
Exercise 13.14 (p. 494): In (a), show that $D_{T^*}$ is defined by Kirchhoff’s law: That is, a function
\[ f \in L^2(Q) = \bigoplus_{e \in \vec{E}} L^2(S_e) \]
is in the domain of $T^*$ if each function $f_e = f|_{S_e}$ of $f$ belongs to $H^1(S_e) \subseteq C(S_e)$ and
\[ \sum_{e = (v, w) \in \vec{E}} f_e(v) = \sum_{e = (w, v) \in \vec{E}} f_e(v) \]
at every vertex $v \in V$. For (b) argue as in Section 6.4.2. For (c) show that the eigenfunctions
are on each interval defined by an appropriate trigonometric function that vanishes on
the three vertices that are not in the centre. Use the Kirchhoff condition in the centre
to find the constraint for the eigenvalues. Assume first that the ratios of the lengths are
incommensurable and reduce the counting to the counting of poles of another trigonometric
function.
Exercise 13.16 (p. 498): Using the natural unitary representation on H1 × H2 show
that Graph(T ) is invariant and that the operators B in the proofs of Theorems 13.9
and 13.15 commute with the unitary representation. Now apply Exercise 12.58(b).
Exercise 13.19 (p. 499): Show that the eigenvalues of H are unbounded, and then use
Exercise 4.29.
Exercise 13.21 (p. 500): By the spectral theorem (Theorem 13.15) it is sufficient to
consider the multiplication operator Mg for a real-valued function g on a finite measure
space.
Exercise 13.22 (p. 500): Use Exercise 13.21.
Exercise 13.23 (p. 501): For (a) take the inner product with $v \in D_S$. For (b) simply calculate $\|Sv \pm iv\|^2$; for (c) and (d) notice that
\[ (I - U_S)(Sv + iv) = Sv + iv - (Sv - iv) = 2iv \]
for all $v \in D_S$.
Exercise 13.25 (p. 501): For (a) let $v \in D_S$ and $w = Sv + iv \in D_U$ for $U = U_S$ so that $(I - U)w = 2iv$ and
\[ S_U(I - U_S)w = i(I + U_S)(w) = i(Sv + iv + Sv - iv) = 2iSv, \]
giving $D_{S_U} = D_S$ and $S_U = S$. For (b), let $w \in D_U$ and $v = w - Uw \in D_S$ for $S = S_U$ so that $Sv + iv = iw + iUw + iw - iUw = 2iw$ and
\[ U_S(S_U v + iv) = S_U v - iv = iw + iUw - iw + iUw = 2iUw, \]
giving $D_{U_S} = D_U$ and $U_S = U$.
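A one-dimensional sanity check of these identities may be helpful. In the sketch below a real number $s$ plays the role of a self-adjoint operator on $\mathbb{C}$ (an assumption made purely for illustration), so that $U_S$ becomes the scalar $u = (s - i)/(s + i)$ and $S_U$ becomes $i(1 + u)/(1 - u)$. The function names are hypothetical.

```python
def cayley(s):
    """Cayley transform of a real scalar s: the u with u * (s + i) = s - i."""
    return (s - 1j) / (s + 1j)

def inverse_cayley(u):
    """Inverse transform: the s with s * (1 - u) = i * (1 + u); requires u != 1."""
    return 1j * (1 + u) / (1 - u)

for s in (-3.0, 0.0, 0.5, 10.0):
    u = cayley(s)
    # |u| = 1 mirrors the isometry property, and (1 - u)(s + i) = 2i mirrors
    # the identity (I - U_S)(Sv + iv) = 2iv with v = 1.
    print(s, abs(u), (1 - u) * (s + 1j), inverse_cayley(u))
```

The value $u = 1$ is never attained for real $s$, which corresponds to the injectivity of $I - U_S$ in the operator setting.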
Exercise 13.26 (p. 501): To show that SU is self-adjoint apply Theorem 9.2 to see that
it is sufficient to consider unitary multiplication operators. Then apply Exercise 13.5(a).
Exercise 13.27 (p. 502): Show that U = T from Example 12.9 is an isometry defined on
the whole Hilbert space for which I − U is injective and im(I − U ) is dense, and U cannot
be extended to a unitary operator.
Exercise 13.28 (p. 502): By applying the Cayley transform (and its inverse) it is sufficient
to consider the associated partial isometries.
Exercise 13.30 (p. 502): Show that if $g \in L^2((0, 1))$ satisfies
\[ 0 = \langle if' \pm if, g \rangle = i(\langle f', g \rangle \pm \langle f, g \rangle) \]
for all $f \in H_0^1((0, 1))$ it follows that $g \in H^1((0, 1))$ satisfies the equation $g' = \pm g$. Conclude from this that $n_+(S) = n_-(S) = 1$.
Exercise 14.16 (p. 523): First take the quotient of $C_c(\mathbb{R})$ by the kernel of $\|\cdot\|_\Lambda$ using Lemma 2.15. Apply Theorem 2.32 to obtain the completion $A_\Lambda$ of the quotient of $C_c(\mathbb{R})$. Use the Banach algebra inequality in Theorem 14.6 to show that the convolution operation extends to $A_\Lambda$ and gives it the structure of a Banach algebra. Now use (14.4) and the automatic extension property in Proposition 2.59 to extend the canonical map from $C_c(\mathbb{R})$ to $A_\Lambda$ to a map from $L^1(\mathbb{R})$ to $A_\Lambda$.
Exercise 14.20 (p. 526): Argue as in the proof of Lemma 14.2.
Exercise 14.21 (p. 527): For (a) use the fact that the characters on the abelian group $(\mathbb{Z}/q\mathbb{Z})^\times$ form an orthonormal basis of $L^2_m\bigl((\mathbb{Z}/q\mathbb{Z})^\times\bigr)$ where $m$ is the counting measure multiplied by $\frac{1}{\varphi(q)}$ (notice that this is a special case of Exercise 3.50). For (b) notice that the coefficient of the trivial character in the Fourier expansion of $f_a$ is $\frac{1}{\varphi(q)}$. Now combine the assumption and PNT itself in the form (14.1) (see also Exercise 14.3).
Exercise 14.22 (p. 527): Argue along the lines of the proof of Proposition 14.4, but use
Lemma 14.12 to control the error term (as we cannot use monotonicity).
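Exercise 14.23 (p. 527): Estimate
\[ \|f\|_\chi = \limsup_{h \to \infty} \Bigl| \sum_n \frac{\chi(n)\Lambda(n)}{n} f(\log n - h) \Bigr| \leq \limsup_{h \to \infty} \sum_n \frac{\Lambda(n)}{n} |f|(\log n - h) \]
and apply Lemma 14.12.
Exercise 14.24 (p. 527): For any $b \in \mathbb{Z}$ define $\Lambda_b = \Lambda f_b = \Lambda 1_{\{k \in \mathbb{Z} \mid k \equiv b \ (\mathrm{mod}\ q)\}}$ so that $\Lambda = \sum_{b=0}^{q-1} \Lambda_b$ and $\nu_\Lambda = \sum_{b=0}^{q-1} \nu_{\Lambda_b}$. Now use the convergence in Lemma 14.12 and argue as in the proof of Proposition 14.14 to find a subsequence on which $(\lambda_h \nu_{\Lambda_b})$ converges for all $b = 0, \ldots, q - 1$. Finally, note that $\nu_{\chi\Lambda} = \sum_{b=0}^{q-1} \chi(b)\nu_{\Lambda_b}$.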
Exercise 14.25 (p. 528): For (a) apply Abel summation (Lemma 14.33) with the choices $a_n = \chi(n)$ and $b_n = \log^2 n$ and use the fact that $|A_n| \leq q$ for all $n \geq 1$ to see that
\[ \Bigl| \sum_{n=1}^{m} A_n(b_n - b_{n+1}) \Bigr| \leq q \sum_{n=1}^{m-1} (b_{n+1} - b_n) + q b_m = 2q b_m \leq 2q \log^2(1 + y) \]
where $m = \lfloor y \rfloor$. For (b) use (14.26), re-order the summation, and apply (14.11). For (c) argue as in Corollary 14.13 (using Corollary 14.13 to control errors).
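The estimate in (a) can be checked numerically in a small case. The sketch below uses the standard summation-by-parts identity $\sum_{n=1}^{m} a_n b_n = A_m b_m + \sum_{n=1}^{m-1} A_n (b_n - b_{n+1})$, which may be phrased differently in Lemma 14.33. It assumes $b_n = (\log n)^2$ and takes the non-trivial character modulo 4 as an illustrative choice; `chi` and `abel_check` are hypothetical names.

```python
import math

Q = 4  # modulus of the illustrative character (an assumption, not from the text)

def chi(n):
    """The non-trivial Dirichlet character modulo 4."""
    return {1: 1, 3: -1}.get(n % Q, 0)

def abel_check(m):
    """Return (direct sum, summation-by-parts value, bound 2*q*b_m, max |A_n|)."""
    a = [chi(n) for n in range(1, m + 1)]
    b = [math.log(n) ** 2 for n in range(1, m + 1)]  # assumed b_n = (log n)^2
    partial = []  # partial[n-1] = A_n = a_1 + ... + a_n
    total = 0
    for x in a:
        total += x
        partial.append(total)
    direct = sum(x * y for x, y in zip(a, b))
    by_parts = partial[-1] * b[-1] + sum(
        partial[n] * (b[n] - b[n + 1]) for n in range(m - 1))
    return direct, by_parts, 2 * Q * b[-1], max(abs(A) for A in partial)

for m in (10, 100, 1000):
    print(m, abel_check(m))
```

The bound uses only that the partial sums $A_n$ stay bounded and that $b_n$ is increasing with $b_1 = 0$, so the telescoping sum collapses to $b_m$.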
Exercise 14.26 (p. 528): The argument is similar to the proof of the algebra inequality for $\|\cdot\|_\Lambda$, but much simpler. Use Exercise 14.25(c) to obtain
\[ \|f_1 * f_2\|_\chi = \limsup_{h \to \infty} \Bigl| \int f_1 * f_2\,d\lambda_h\rho \Bigr| \]
where $\rho$ is defined by $d\rho = \frac{1}{t}\,d(\nu_{\chi\Lambda} * \nu_{\chi\Lambda})$, and then repeat the argument for (14.22).
Exercise 14.27 (p. 528): Verify that the proof of Theorem 14.17 works if $\|\cdot\|_\Lambda$ is simply replaced by $\|\cdot\|_\chi$ throughout.
Exercise 14.28 (p. 528): Use the same $f_0$ and $f$ as in the corresponding case of Theorem 14.18. If now $\|f\|_\chi = \int f D_\chi\,dm = \|f\|_1 = \|f_0\|_1$ for the density $D_\chi$ from Exercise 14.24, then $|D_\chi(t)| = 1$ almost everywhere for $t$ in $[-2, 2]$. However, this forces $D_\chi \in \operatorname{im}\chi$ almost everywhere, which leads to a contradiction.
Exercise 14.29 (p. 529): Use (14.27) as a replacement for Mertens’ theorem in the proof of the $\xi = 0$ case in Theorem 14.18. Use this, Exercise 14.27 and Exercise 14.28 to conclude that $\|\cdot\|_\chi = 0$ for any non-trivial Dirichlet character $\chi$. Conclude by using Exercise 14.22.
Exercise 14.34 (p. 533): If $\Re(s) = \sigma > \sigma_0$ then $\bigl|\frac{a_n \log n}{n^s}\bigr| \ll \frac{|a_n|}{n^{(\sigma + \sigma_0)/2}}$, which implies that $g = -\sum_{n \geq 1} \frac{a_n \log n}{n^s}$ converges absolutely and uniformly on compact subsets of the half plane $\{s \in \mathbb{C} \mid \Re(s) > \sigma_0\}$. Integrating $g$ term-by-term along line segments shows that $f' = g$.
Exercise B.20 (p. 561): Define $d\mu' = |g|\,d\mu$ and
\[ g'(x) = \begin{cases} 1 & \text{for } g(x) = 0, \\ \arg g(x) & \text{for } g(x) \neq 0. \end{cases} \]
Notes
(1) (Page v) This description — natural in light of the fact that there seem to be more than
seven hundred books in Mathematical Reviews whose title contains the phrase ‘Functional
Analysis’ — appears in the preface to the monograph of Aubin [2] and is doubtless older
than that.
(2) (Page 6) The Laplace operator is intimately connected with both geometry and phys-
ics. An elegant brief discussion in the notes of Arnold [1, Ch. 4] points out the connection
between the Laplace operator applied to a surface f : R2 → R with |f | small viewed as a
perturbation of a flat sheet, the area of the surface defined by f , and the work required
to bend the surface into this shape. An aspect we are not able to explore here — essential
to the physical meaning of the Laplace operator — is reflected in Arnold’s comment “The
enemies of physics define the Laplace operator in their mathematical textbooks by [rela-
tion (1.5)], which renders this physical object relativistically meaningless (it depends not
only on the function to which the operator is applied, but also on the choice of the coordin-
ate system). On the contrary, the operators [. . . ] and ∆ depend only on the Riemannian
metric and do not depend on the coordinate system.”
(3) (Page 24) The proof here is taken from a note by Väisälä [107], and the original result
is in a paper of Mazur and Ulam [70].

(4) (Page 36) The Dvoretzky–Rogers theorem [24], answering a question of Banach, states
that a Banach space is finite-dimensional if and only if every unconditionally convergent

series is absolutely convergent. The difficult part of this result is to show that in any
infinite-dimensional Banach space there is an unconditionally convergent series that is not
absolutely convergent. This is often relatively easy to show for a concretely given Banach
space (and in particular, for a Hilbert space) but in general requires analysis of the geometry
of convex bodies in Banach spaces.
(5) (Page 42) A more constructive proof can be given using Bernstein polynomials [10] which are defined by $B_{f,n}(x) = \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} f\bigl(\frac{k}{n}\bigr)$ for any function $f \in C([0, 1])$ and $n \geq 1$. The original proof due to Weierstrass [109] uses convolutions with a Gaussian heat kernel and is much closer in spirit to Exercise 3.68.
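The Bernstein construction is easy to experiment with. The following sketch evaluates $B_{f,n}$ directly from the formula above and estimates the uniform error on a grid; the test function $|x - \frac{1}{2}|$, the grid size, and the helper names are illustrative assumptions.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate B_{f,n}(x) = sum_k C(n,k) x^k (1-x)^(n-k) f(k/n)."""
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) * f(k / n)
               for k in range(n + 1))

def sup_error(f, n, grid=201):
    """Maximum of |B_{f,n} - f| over an evenly spaced grid in [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

f = lambda x: abs(x - 0.5)  # continuous, but not differentiable at 1/2

for n in (4, 16, 64, 256):
    print(n, sup_error(f, n))
```

The error decays slowly for this non-smooth function, which reflects the fact that the construction trades speed of convergence for explicitness.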
(6) (Page 43) The strongest result in this direction is Mergelyan’s theorem [71]. This states that if $X \subseteq \mathbb{C}$ is a compact set for which $\mathbb{C} \setminus X$ is connected, then any continuous function $X \to \mathbb{C}$ whose restriction to the interior $X^o$ is holomorphic, is a uniform limit of a sequence of polynomials. Without the additional hypothesis that the function be holomorphic on the interior the result is simply false, as indicated. If $\mathbb{C} \setminus X$ is not connected a similar result holds using rational functions instead of polynomials.

(7) (Page 59) Bergman [8] introduced the space of holomorphic functions in a complex
domain with sufficiently regular behaviour at the boundary to ensure they are absolutely
integrable. Part of their importance is that they are Banach spaces; we refer to the mono-
graph of Hedenmalm, Korenblum and Zhu [43] for an accessible treatment.
(8) (Page 77) In fact, the space $L^p_\mu(X)$ (or $\ell^p(\mathbb{N})$) is uniformly convex for any $p$ in $(1, \infty)$, but the proof for $p$ in $(1, 2)$ is more involved; we refer to Clarkson [18] for the details.
(9) (Page 82) The property that all closed subspaces are complemented in fact characterizes Hilbert spaces in the following sense. Lindenstrauss and Tzafriri [63] showed that if $(V, \|\cdot\|)$ is a Banach space in which every closed subspace is complemented then the norm is equivalent to one induced by a scalar product.
(10) (Page 92) In fact the existence of a left-invariant Borel measure is closely related to
local compactness. Weil [110] showed that if a group has a left-invariant measure for which
a convolution can be defined, then there is a topology on the group with the property
that the completion of the group in that topology is locally compact, and the left-invariant
measure is essentially the Haar measure on the completion. Oxtoby [83], in investigating
what invariant measures can be found on groups that are not locally compact, showed that
a complete separable metric group possesses a left-invariant Borel measure if and only if
the group is locally compact and dense in itself.
(11) (Page 93) In particular, the convergence in L2 does not imply convergence of the
Fourier series at any given point, and a priori does not even imply convergence almost
everywhere. In the classical setting G = T, these questions have been of central importance.
Dirichlet proved that the Fourier series converges at each point if f ∈ C 1 (T), and Paul
du Bois-Reymond showed that there is a function f ∈ C(T) whose Fourier series diverges
at one point. Lusin conjectured that the Fourier series converges almost everywhere to
the function for f ∈ L2 (T), and Kolmogorov [56] found a function in L1 (T) whose Fourier
series diverges almost everywhere. Carleson [16] proved the convergence almost everywhere
for f ∈ L2 (T), an extremely difficult result later extended to f ∈ Lp (T) for p ∈ (1, ∞)
by Hunt [48]. We refer to Lacey [58] for a modern, approachable, account. The situation
is more complicated for functions on compact abelian groups, in part because there is no
canonical way to sum over the group of characters.
(12) (Page 95) This form of uncertainty principle is pointed out for finite cyclic groups
as part of a wider investigation by Donoho and Stark [23]. In the case where G is the
group $\mathbb{Z}/p\mathbb{Z}$ for a prime $p$, Tao [104] proved the stronger result that
\[ |\operatorname{Supp} f| + |\operatorname{Supp} \hat f| \geq |G| + 1, \]
but the proof requires methods in matrix theory beyond our scope.
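The contrast between prime and composite order can be seen in a brute-force experiment. The sketch below restricts attention to indicator functions of non-empty subsets of $\mathbb{Z}/N\mathbb{Z}$, which is an illustrative simplification and of course no substitute for the theorem. The helper names and the numerical tolerance are assumptions.

```python
import cmath
from itertools import combinations

def dft(f):
    """Discrete Fourier transform on Z/NZ, where N = len(f)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * t / n) for x in range(n))
            for t in range(n)]

def support_size(values, tol=1e-9):
    """Number of entries that are non-zero up to the tolerance tol."""
    return sum(1 for v in values if abs(v) > tol)

def min_support_sum(n):
    """Minimum of |Supp f| + |Supp f^| over indicators of non-empty subsets of Z/nZ."""
    best = None
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            f = [1 if x in subset else 0 for x in range(n)]
            s = support_size(f) + support_size(dft(f))
            best = s if best is None else min(best, s)
    return best

for n in (4, 5, 7):
    print(n, min_support_sum(n))
```

For $N = 4$ the indicator of the subgroup $\{0, 2\}$ has a transform supported on $\{0, 2\}$ as well, so the sum drops below $N + 1$; no such subgroup exists when $N$ is prime.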
(13) (Page 126) The Baire category theorem is a powerful tool across much of topology
and analysis. It was shown by Osgood [81] for R, and independently by Baire [3] for Rd .
It was later applied in functional analysis by Banach and Steinhaus [4].
(14) (Page 128) This analogy is pursued in a monograph by Oxtoby [82], motivated by work
of Sierpiński [98] and Erdős [28], who showed that under the assumption of the continuum
hypothesis there is an injective function $f : \mathbb{R} \to \mathbb{R}$ with $f = f^{-1}$ with the property
that f (A) is a null set if and only if A is of first category. An approach to constructing sets
with prescribed Diophantine approximation properties was given by Schmidt via what we
now call Schmidt games [93]. The simplest of these takes the following form: Let X be a
metric space, S ⊆ X any subset, and fix constants α, β ∈ (0, 1). The game is played as
follows: the first player, Bob, chooses any open ball B0 ⊆ X with radius ρ0 . Then Alice,
the second player, chooses a ball B1 ⊆ B0 with radius ρ1 = αρ0 . Bob then chooses a
ball B2 ⊆ B1 with radius ρ2 = βρ1 , Alice chooses a ball B3 ⊆ B2 with radius ρ3 = αρ2 ,
and so on. The intersection of all the balls Bn for n > 1 comprises a single point x. If x ∈ S
then Alice wins the game, if not Bob wins. If Alice can force a victory, then the set S is
called (α, β)-winning, and S is said to be α-winning if it is (α, β)-winning for all β ∈ (0, 1).
Clearly S needs to be dense if it is (α, β)-winning, and it may be shown that there are
some null sets that are also meagre and α-winning. Moreover, any countable intersection
of α-winning sets is again α-winning.
(15) (Page 143) This is shown by Meyers and Serrin [72]; if the closure is taken of functions
that are smooth up to the boundary then the situation is different. We refer to Evans [30]
for an accessible account.
(16) (Page 182) Horn’s conjecture [47], which was proved in two parts, one by Klyachko [53] and the other by Knutson and Tao [54], says the following. If $A$ and $B$ are Hermitian $n \times n$ matrices, then an ordered triple $(I, J, K)$ of subsets of $\{1, \ldots, n\}$ with the same cardinality is called admissible if the inequality
\[ \sum_{i \in I} \lambda_i(A + B) \leq \sum_{i \in J} \lambda_i(A) + \sum_{i \in K} \lambda_i(B) \]
holds. Horn’s conjecture was that all such admissible inequalities together with the trace identity (6.13) characterize the possible eigenvalues of pairs of Hermitian matrices and their sum. We refer to the survey article by Knutson and Tao [55] for the details and references.
(17) (Page 182) This is one of a large number of results in matrix analysis and its applica-
tions by Weyl [111]. Courant and Hilbert [20, p. 286] give this inequality the following phys-
ically intuitive meaning, familiar to anyone who has used a stringed musical instrument: If
a dynamical system stiffens, then the frequency of its fundamental tone or resonance, and
that of all the overtones, increases.
(18) (Page 196) This relation between semi-norms and traces may be found in the work of
Bernstein and Reznikov [9].

(19) (Page 199) We refer to Courant and Hilbert [20] for a thorough classical treatment of
Bessel functions.
(20) (Page 202) Weyl’s motivation came from a problem in black body radiation, though it
was well understood at the time that the mathematical questions also arose in the theory
of vibrations. The result was foreshadowed by Lord Rayleigh [100] in 1877, who used a
three-dimensional lattice point counting problem to count vibrational modes in a cube,
allowing him to asymptotically count the number of ‘overtones’. Sommerfeld and Lorentz
conjectured in 1910 that in fact the quantity was also independent of the shape, giving
the context in which Weyl proved this remarkable theorem. Weyl also gave error terms in
dimensions 2 and 3, and conjectured the form of a second term in terms of the area of the
boundary of U in dimension 3.
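A lattice point count of the kind described here can be imitated in a few lines. The sketch below is an illustration only, under the assumption (not made in the note) that the domain is the cube (0, π)^3 with Dirichlet boundary conditions, for which the eigenvalues of −∆ are m^2 + n^2 + k^2 with m, n, k positive integers. Counting eigenvalues up to T is then a count of lattice points in one octant of a ball of radius √T, whose volume πT^{3/2}/6 is the expected leading term. The function names are hypothetical.

```python
import math


def count_modes(T):
    """Count triples (m, n, k) of positive integers with m^2 + n^2 + k^2 <= T."""
    total = 0
    m = 1
    while m * m + 2 <= T:
        n = 1
        while m * m + n * n + 1 <= T:
            # Number of k >= 1 with k^2 <= T - m^2 - n^2.
            total += math.isqrt(T - m * m - n * n)
            n += 1
        m += 1
    return total


def leading_term(T):
    """Volume of one octant of the ball of radius sqrt(T) in R^3."""
    return math.pi * T ** 1.5 / 6.0


# Ratio of the exact count to the leading term for growing T.
ratios = {T: count_modes(T) / leading_term(T) for T in (100, 1000, 10000)}
for T, ratio in ratios.items():
    print(T, round(ratio, 4))
```

The ratio stays below 1, since each counted triple corresponds to a unit cube lying inside the octant, and it moves towards 1 as T grows; the remaining gap is a boundary effect of the kind that a second term is meant to capture.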
(21) (Page 202) Milnor [73] noted that a remarkable pair of lattices in R16 constructed by
Witt [115] gives rise to a pair of 16-dimensional tori that have the same eigenvalues but
different shapes. Much later Gordon, Webb, and Wolpert [40] exhibited two non-convex
polygons in R2 with the same eigenvalues but different shapes. In the positive direction,
Zelditch [117] showed that the answer to Kac’s question is yes for a large class of convex
subsets of R2 with analytic boundary.
(22) (Page 215) A sequence $(z_n)$ of complex numbers with $|z_n| < 1$ for all $n \geq 1$ is said
to satisfy the Blaschke condition [12] if $\sum_{n=1}^{\infty} (1 - |z_n|) < \infty$; in this case the Blaschke
product $B(z) = \prod_{n=1}^{\infty} \frac{|z_n|}{z_n} \frac{z_n - z}{1 - \overline{z_n} z}$, where the product is taken over all $n$ with $z_n \neq 0$, and
with a factor z if zn = 0, is analytic in the open unit disk and vanishes at each zn . Finite
Blaschke products as used here may be characterized, up to a constant factor of modulus one,
as the analytic functions on the open unit disk with continuous extension to the closed unit
disk that map the unit circle to itself.
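For a finite sequence the definition can be tested directly. The following sketch is an illustration only; the function name blaschke and the sample zeros are hypothetical choices. It evaluates the finite product with the normalization used in this note, and checks numerically that the product vanishes at each prescribed zero and has modulus one at sample points of the unit circle.

```python
import cmath
import math


def blaschke(zeros, z):
    """Evaluate the finite Blaschke product with the given zeros at the point z."""
    value = complex(1.0, 0.0)
    for a in zeros:
        a = complex(a)
        if a == 0:
            value *= z  # a zero at the origin contributes the factor z
        else:
            value *= (abs(a) / a) * (a - z) / (1 - a.conjugate() * z)
    return value


zeros = [0.5, -0.3 + 0.4j, 0.0]  # sample zeros inside the open unit disk

# Largest value of |B| at the prescribed zeros (should be numerically zero).
max_at_zeros = max(abs(blaschke(zeros, a)) for a in zeros)

# Largest deviation of |B| from 1 at sample points of the unit circle.
circle = [cmath.exp(2j * math.pi * k / 12) for k in range(12)]
max_circle_error = max(abs(abs(blaschke(zeros, z)) - 1.0) for z in circle)

print(max_at_zeros, max_circle_error)
```

The second check reflects the identity |a − z| = |1 − āz| for |z| = 1, which is what makes each factor unimodular on the circle.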
(23) (Page 223) This was shown by Banach and Tarski in 1924 [5]; we refer to the monograph
of Wagon [108] for more details, other related paradoxical decompositions, and the history
of this kind of result.
(24) (Page 226) This alludes to the observation that if every room is occupied in a hotel with
infinitely many rooms then a new guest can always be accommodated. If there are countably
many rooms, this is done by moving each guest to the ‘next’ room: “Sobald nun ein neuer
Gast hinzukommt, braucht der Wirt nur zu veranlassen, dass jeder der alten Gäste das
Zimmer mit der um 1 höheren Nummer bezieht, und es wird für den Neuangekommenen
das Zimmer 1 frei” (from a lecture of Hilbert in 1924; see [31, p. 730]).
(25) (Page 262) We refer to Parthasarathy [84] for a more detailed treatment of the theory
of probability measures on compact metric spaces, and to [27, Ch. 4] for material on
equidistribution from a dynamical point of view.
(26) (Page 267) This is the Kryloff–Bogoliouboff Theorem [57], and it means that a con-
tinuous transformation on a compact metric space always gives rise to one (and perhaps
to many) measure-preserving systems.
(27) (Page 296) The theory of distributions is of central importance in partial differen-
tial equations, where it sometimes allows solutions to be found in the sense of distribu-
tions when they cannot be readily found in the classical sense (as seen in Chapters 5
and 6). The theory of generalized functions was initiated by Sobolev [99] to provide weak
solutions to certain partial differential equations, and then developed systematically by
Schwartz [94], [95].
(28) (Page 342) The uncertainty principle has many extensions, generalizations, and applic-
ations. We would struggle to do better than to quote Folland and Sitaram [34] both for its
extensive bibliography and for its elegant description: “The uncertainty principle is partly
a description of a characteristic feature of quantum mechanical systems, partly a state-
ment about the limitations of one’s ability to perform measurements on a system without
disturbing it, and partly a meta-theorem in harmonic analysis that can be summed up as
follows. A non-zero function and its Fourier transform cannot both be sharply localized.”
(29) (Page 353) The approach developed here is close to that of von Neumann, whose
lectures on the original work of Haar are now available in a convenient form [80].
(30) (Page 403) Margulis’ argument showed in particular that the quotients SL3 (Z)/Λ by
finite index subgroups Λ are (via a standard graph structure on them) an expander family.
To prove this, we will discuss unitary representations of the group SL3 (Z) (that is, actions
of SL3 (Z) by unitary transformations on a Hilbert space). There is also a family of certain
finite quotients SL2 (Z)/Λ that give an expander family, but the proof of this lies deeper
and goes beyond what we will be able to cover. We refer to the monographs of Sarnak [92],
Lubotzky [66] or the notes [26] for the details.
(31) (Page 419) This is the simplest result in the topic of automatic continuity, which
asks for algebraic conditions on Banach algebras A and B that ensure that any algebra
homomorphism χ : A → B is continuous. We refer to the monograph of Dales [21] for a
thorough account.
(32) (Page 439) We refer to the monograph of Lubotzky [66, Sec. 4.5] and the papers of
Kesten [52] and Buck [14] for more details (and for generalizations to other Cayley graphs).
(33) (Page 442) The reader should not confuse the word ‘classical’ with ‘outdated’. Apart
from playing an important role in approximation theory and differential equations these
relations and the resulting polynomials are, in part because of their relation to regular
trees, of great importance for number theory and related areas; we refer to the work of
Lindenstrauss on arithmetic quantum unique ergodicity [62] for a striking instance of this.
(34) (Page 503) The priority for the elementary proof and for some of the steps toward it
is contested; we refer to Goldfeld [39] for a detailed account.

(35) (Page 503) The conventional complex-analytic proof of the PNT involves showing that
it is equivalent to the non-vanishing of the Riemann zeta function $s \mapsto \sum_{n \geq 1} \frac{1}{n^s}$ on
the line ℜ(s) = 1. The question of error rates in the prime number theorem also involves
behaviour of the Riemann zeta function on the critical line ℜ(s) = 1/2, and hence ultimately
the Riemann hypothesis itself.
(36) (Page 538) Some of these are well explained in the monograph of Wagon [108]; a par-
ticularly striking one is the existence of paradoxical decompositions, which was discussed
in Section 7.2.3.
References
1. V. I. Arnold, Mathematical understanding of nature (American Mathematical Society,
Providence, RI, 2014).
2. J.-P. Aubin, Applied functional analysis, in Pure and Applied Mathematics (New
York) (Wiley-Interscience, New York, second ed., 2000).
3. R. Baire, ‘Sur les fonctions de variables réelles’, Annali di Mat.(3) III (1899), 1–123.
4. S. Banach and H. Steinhaus, ‘Sur le principe de la condensation de singularités’,
Fundamenta 9 (1927), 50–61.
5. S. Banach and A. Tarski, ‘Sur la décomposition des ensembles de points en parties
respectivement congruentes’, Fund. Math. 6 (1924), 244–277.
6. B. Bekka, P. de la Harpe, and A. Valette, Kazhdan’s property (T), in New Mathem-
atical Monographs 11 (Cambridge University Press, Cambridge, 2008).
7. F. Benford, ‘The law of anomalous numbers’, Proc. Amer. Phil. Soc. 78 (1938), no. 4,
551–572.
8. S. Bergman, The Kernel Function and Conformal Mapping, in Mathematical Sur-
veys, No. 5 (American Mathematical Society, New York, N. Y., 1950).
9. J. Bernstein and A. Reznikov, ‘Sobolev norms of automorphic functionals’, Int. Math.
Res. Not. (2002), no. 40, 2155–2174.
10. S. N. Bernstein, ‘Démonstration du Théorème de Weierstrass fondée sur le calcul des
Probabilités’, Comm. Soc. Math. Kharkov 2 XIII (1912), no. 1, 1–2.
11. B. Blackadar, Operator algebras, in Encyclopaedia of Mathematical Sciences 122
(Springer-Verlag, Berlin, 2006). Theory of C ∗ -algebras and von Neumann algebras,
Operator Algebras and Non-commutative Geometry, III.
12. W. Blaschke, ‘Eine Erweiterung des Satzes von Vitali über Folgen analytischer Funk-
tionen’, Berichte über die Verhandlungen der Königlich-Sächsischen Gesellschaft der
Wissenschaften zu Leipzig, Mathematisch-Physische Klasse 67 (1915), 194–200.
13. N. Bourbaki, ‘Sur certains espaces vectoriels topologiques’, Ann. Inst. Fourier Gren-
oble 2 (1950), 5–16 (1951).
14. M. W. Buck, ‘Expanders and diffusers’, SIAM J. Algebraic Discrete Methods 7
(1986), no. 2, 282–304.
15. C. Carathéodory, Vorlesungen über reelle Funktionen, in Third (corrected) edition
(Chelsea Publishing Co., New York, 1968).
16. L. Carleson, ‘On convergence and growth of partial sums of Fourier series’, Acta
Math. 116 (1966), 135–157.
17. A. L. Cauchy, Sur l’équation à l’aide de laquelle on détermine les inégalités séculaires
des mouvements des planètes, in Oeuvres Complètes (IInd Série), 9 (Gauthier–Villars,
1829).
18. J. A. Clarkson, ‘Uniformly convex spaces’, Trans. Amer. Math. Soc. 40 (1936), no. 3,
396–414.

19. J. B. Conway, A course in functional analysis, in Graduate Texts in Mathematics 96
(Springer-Verlag, New York, second ed., 1990).
20. R. Courant and D. Hilbert, Methods of mathematical physics. Vol. I (Interscience
Publishers, Inc., New York, N.Y., 1953).
21. H. G. Dales, Banach algebras and automatic continuity, in London Mathematical
Society Monographs. New Series 24 (The Clarendon Press Oxford University Press,
New York, 2000). Oxford Science Publications.
22. F. Diamond and J. Shurman, A first course in modular forms, in Graduate Texts in
Mathematics 228 (Springer-Verlag, New York, 2005).
23. D. L. Donoho and P. B. Stark, ‘Uncertainty principles and signal recovery’, SIAM J.
Appl. Math. 49 (1989), no. 3, 906–931.
24. A. Dvoretzky and C. A. Rogers, ‘Absolute and unconditional convergence in normed
linear spaces’, Proc. Nat. Acad. Sci. U. S. A. 36 (1950), 192–197.
25. M. Einsiedler and T. Ward, Homogeneous dynamics and applications.
http://www.personal.leeds.ac.uk/~mattbw. In preparation.
26. M. Einsiedler and T. Ward, Unitary representations and spectral gap.
http://www.personal.leeds.ac.uk/~mattbw. In preparation.
27. M. Einsiedler and T. Ward, Ergodic theory with a view towards number theory, in
Graduate Texts in Mathematics 259 (Springer-Verlag London Ltd., London, 2011).
28. P. Erdős, ‘Some remarks on set theory’, Ann. of Math. (2) 44 (1943), 643–646.
29. P. Erdős, ‘On a new method in elementary number theory which leads to an elementary
proof of the prime number theorem’, Proc. Nat. Acad. Sci. U. S. A. 35 (1949),
374–384.
30. L. C. Evans, Partial differential equations, in Graduate Studies in Mathematics 19
(American Mathematical Society, Providence, RI, second ed., 2010).
31. W. Ewald and W. Sieg (eds.), David Hilbert’s lectures on the foundations of arith-
metic and logic 1917–1933, in David Hilbert’s Foundational Lectures 6 (Springer-
Verlag, Berlin, 2013).
32. G. B. Folland, A course in abstract harmonic analysis, in Studies in Advanced Math-
ematics (CRC Press, Boca Raton, FL, 1995).
33. G. B. Folland, Real analysis, in Pure and Applied Mathematics (New York) (John
Wiley & Sons Inc., New York, second ed., 1999). Modern techniques and their ap-
plications, A Wiley-Interscience Publication.
34. G. B. Folland and A. Sitaram, ‘The uncertainty principle: a mathematical survey’,
J. Fourier Anal. Appl. 3 (1997), no. 3, 207–238.
35. M. Fornasier and D. Toniolo, ‘Fast, robust and efficient 2d pattern recognition for
re-assembling fragmented images’, Pattern Recognition 38 (2005), 2074–2087.
36. H. Furstenberg, ‘Strict ergodicity and transformation of the torus’, Amer. J. Math.
83 (1961), 573–601.
37. H. Furstenberg, ‘Disjointness in ergodic theory, minimal sets, and a problem in Dio-
phantine approximation’, Math. Systems Theory 1 (1967), 1–49.
38. O. Gabber and Z. Galil, ‘Explicit constructions of linear-sized superconcentrators’,
J. Comput. System Sci. 22 (1981), no. 3, 407–420.
39. D. Goldfeld, ‘The elementary proof of the prime number theorem: an historical per-
spective’, in Number theory (New York, 2003), pp. 179–192 (Springer, New York,
2004).
40. C. Gordon, D. Webb, and S. Wolpert, ‘Isospectral plane domains and surfaces via
Riemannian orbifolds’, Invent. Math. 110 (1992), no. 1, 1–22.
41. A. Gorodnik and A. Nevo, The ergodic theory of lattice subgroups, in Annals of
Mathematics Studies 172 (Princeton University Press, Princeton, NJ, 2010).
42. G. Greschonig and K. Schmidt, ‘Ergodic decomposition of quasi-invariant probability
measures’, Colloq. Math. 84/85 (2000), no. 2, 495–514.
43. H. Hedenmalm, B. Korenblum, and K. Zhu, Theory of Bergman spaces, in Graduate
Texts in Mathematics 199 (Springer-Verlag, New York, 2000).
44. S. Helgason, Differential geometry, Lie groups, and symmetric spaces, in Pure and
Applied Mathematics 80 (Academic Press, Inc. [Harcourt Brace Jovanovich, Publish-
ers], New York-London, 1978).
45. E. Hewitt and K. A. Ross, Abstract harmonic analysis. Vol. I, in Grundlehren der
Mathematischen Wissenschaften 115 (Springer-Verlag, Berlin, second ed., 1979).
46. D. Hilbert and E. Schmidt, Integralgleichungen und Gleichungen mit unendlich vielen
Unbekannten, in Teubner-Archiv zur Mathematik [Teubner Archive on Mathematics],
11 (BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1989). Edited and with a fore-
word and afterword by A. Pietsch.
47. A. Horn, ‘Eigenvalues of sums of Hermitian matrices’, Pacific J. Math. 12 (1962),
225–241.
48. R. A. Hunt, ‘On the convergence of Fourier series’, in Orthogonal Expansions and
their Continuous Analogues (Proc. Conf., Edwardsville, Ill., 1967), pp. 235–255
(Southern Illinois Univ. Press, Carbondale, Ill., 1968).
49. H. Iwaniec and E. Kowalski, Analytic number theory, in American Mathematical
Society Colloquium Publications 53 (American Mathematical Society, Providence,
RI, 2004).
50. M. Kac, ‘Can one hear the shape of a drum?’, Amer. Math. Monthly 73 (1966), no. 4,
part II, 1–23.
51. J. L. Kelley, General topology (Springer-Verlag, New York-Berlin, 1975). Reprint of
the 1955 edition [Van Nostrand, Toronto, Ont.], Graduate Texts in Mathematics, No.
27.
52. H. Kesten, ‘Symmetric random walks on groups’, Trans. Amer. Math. Soc. 92 (1959),
336–354.
53. A. A. Klyachko, ‘Stable bundles, representation theory and Hermitian operators’,
Selecta Math. (N.S.) 4 (1998), no. 3, 419–445.
54. A. Knutson and T. Tao, ‘The honeycomb model of GLn (C) tensor products. I. Proof
of the saturation conjecture’, J. Amer. Math. Soc. 12 (1999), no. 4, 1055–1090.
55. A. Knutson and T. Tao, ‘Honeycombs and sums of Hermitian matrices’, Notices
Amer. Math. Soc. 48 (2001), no. 2, 175–186.
56. A. Kolmogorov, ‘Une série de Fourier-Lebesgue divergente presque partout’, Funda-
menta math. 4 (1923), 324–328.
57. N. Kryloff and N. Bogoliouboff, ‘La théorie générale de la mesure dans son application
à l’étude des systèmes dynamiques de la mécanique non linéaire’, Ann. of Math. (2)
38 (1937), no. 1, 65–113.
58. M. T. Lacey, ‘Carleson’s theorem: proof, complements, variations’, Publ. Mat. 48
(2004), no. 2, 251–307.
59. P. D. Lax, Functional analysis, in Pure and Applied Mathematics (New York) (Wiley-
Interscience [John Wiley & Sons], New York, 2002).
60. C. G. Lekkerkerker, Geometry of numbers, in Bibliotheca Mathematica, Vol.
VIII (Wolters-Noordhoff Publishing, Groningen; North-Holland Publishing Co.,
Amsterdam-London, 1969).
61. V. B. Lidskiı̆, ‘Non-selfadjoint operators with a trace’, Dokl. Akad. Nauk SSSR 125
(1959), 485–487.
62. E. Lindenstrauss, ‘Invariant measures and arithmetic quantum unique ergodicity’,
Ann. of Math. (2) 163 (2006), no. 1, 165–219.
63. J. Lindenstrauss and L. Tzafriri, ‘On the complemented subspaces problem’, Israel
J. Math. 9 (1971), 263–269.
64. M. Loève, Probability theory. I, in Graduate Texts in Mathematics 45 (Springer-
Verlag, New York, fourth ed., 1977).
65. M. Loève, Probability theory. II, in Graduate Texts in Mathematics 46 (Springer-
Verlag, New York, fourth ed., 1978).
66. A. Lubotzky, Discrete groups, expanding graphs and invariant measures, in Modern
Birkhäuser Classics (Birkhäuser Verlag, Basel, 2010). With an appendix by Jonathan
D. Rogawski, Reprint of the 1994 edition.
67. G. A. Margulis, ‘Explicit constructions of expanders’, Problemy Peredači Informacii
9 (1973), no. 4, 71–80.
68. G. A. Margulis, ‘Explicit constructions of expanders’, Problems of Information Trans-
mission 9 (1975), no. 4.
69. F. I. Mautner, ‘Geodesic flows on symmetric Riemann spaces’, Ann. of Math. (2) 65
(1957), 416–431.
70. S. Mazur and S. Ulam, ‘Sur les transformations isométriques d’espaces vectoriels
normés’, C. R. Math. Acad. Sci. Paris 194 (1932), 946–948.
71. S. N. Mergelyan, ‘Uniform approximations to functions of a complex variable’, Amer.
Math. Soc. Translation 1954 (1954), no. 101, 99.
72. N. G. Meyers and J. Serrin, ‘H = W ’, Proc. Nat. Acad. Sci. U.S.A. 51 (1964),
1055–1056.
73. J. Milnor, ‘Eigenvalues of the Laplace operator on certain manifolds’, Proc. Nat.
Acad. Sci. U.S.A. 51 (1964), 542.
74. H. Minkowski, Geometrie der Zahlen, in Bibliotheca Mathematica Teubneriana, Band
40 (Johnson Reprint Corp., New York, 1968).
75. C. C. Moore, ‘The Mautner phenomenon for general unitary representations’, Pacific
J. Math. 86 (1980), no. 1, 155–169.
76. C. H. Müntz, ‘Über den Approximationssatz von Weierstraß’, Schwarz-Festschr.
(1914), 303–312.
77. M. R. Murty, Problems in analytic number theory, in Graduate Texts in Mathematics
206 (Springer-Verlag, New York, 2001). Readings in Mathematics.
78. J. v. Neumann, ‘Proof of the quasi-ergodic hypothesis’, Proc. Nat. Acad. Sci. U.S.A.
18 (1932), 70–82.
79. J. v. Neumann, ‘On a certain topology for rings of operators’, Ann. of Math. (2) 37
(1936), no. 1, 111–115.
80. J. v. Neumann, Invariant measures (American Mathematical Society, Providence,
RI, 1999).
81. W. F. Osgood, ‘Non-uniform convergence and the integration of series term by term.’,
Amer. J. Math. 19 (1897), 155–190.
82. J. C. Oxtoby, Measure and category. A survey of the analogies between topological and
measure spaces (Springer-Verlag, New York, 1971). Graduate Texts in Mathematics,
Vol. 2.
83. J. C. Oxtoby, ‘Invariant measures in groups which are not locally compact’, Trans.
Amer. Math. Soc. 60 (1946), 215–237.
84. K. R. Parthasarathy, Probability measures on metric spaces, in Probability and Math-
ematical Statistics, No. 3 (Academic Press Inc., New York, 1967).
85. F. Peter and H. Weyl, ‘Die Vollständigkeit der primitiven Darstellungen einer
geschlossenen kontinuierlichen Gruppe’, Math. Ann. 97 (1927), no. 1, 737–755.
86. R. R. Phelps, Lectures on Choquet’s theorem, in Lecture Notes in Mathematics 1757
(Springer-Verlag, Berlin, second ed., 2001).
87. M. S. Pinsker, ‘On the complexity of a concentrator’, in Proceedings of the Seventh
International Teletraffic Congress (Stockholm, 1973), 318 (1973), 318/1–318/4. un-
published.
88. M. S. Pinsker, ‘On the complexity of a concentrator’, Problems of Information Trans-
mission 9 (1975), no. 4, 325–332.
89. M. S. Raghunathan, Discrete subgroups of Lie groups (Springer-Verlag, New York,
1972).
90. J. G. Ratcliffe, Foundations of hyperbolic manifolds, in Graduate Texts in Mathem-
atics 149 (Springer, New York, second ed., 2006).
91. M. Reed and B. Simon, Methods of modern mathematical physics. II. Fourier ana-
lysis, self-adjointness (Academic Press [Harcourt Brace Jovanovich, Publishers], New
York-London, 1975).
92. P. Sarnak, Some applications of modular forms, in Cambridge Tracts in Mathematics
99 (Cambridge University Press, Cambridge, 1990).
93. W. M. Schmidt, ‘On badly approximable numbers and certain games’, Trans. Amer.
Math. Soc. 123 (1966), 178–199.
94. L. Schwartz, Théorie des distributions. Tome I, in Actualités Sci. Ind., no. 1091 =
Publ. Inst. Math. Univ. Strasbourg 9 (Hermann & Cie., Paris, 1950).
95. L. Schwartz, Théorie des distributions. Tome II, in Actualités Sci. Ind., no. 1122 =
Publ. Inst. Math. Univ. Strasbourg 10 (Hermann & Cie., Paris, 1951).
96. A. Selberg, ‘An elementary proof of the prime-number theorem’, Ann. of Math. (2)
50 (1949), 305–313.
97. J.-P. Serre, A course in arithmetic (Springer-Verlag, New York-Heidelberg, 1973).
Translated from the French, Graduate Texts in Mathematics, No. 7.
98. W. Sierpiński, ‘Sur les fonctions jouissant de la propriété de Baire de fonctions con-
tinues’, Ann. of Math. (2) 35 (1934), no. 2, 278–283.
99. S. Soboleff, ‘Méthode nouvelle à résoudre le problème de Cauchy pour les équations
linéaires hyperboliques normales’, Rec. Math. [Mat. Sbornik] N.S. 1(43) (1936), no. 1,
39–72.
100. J. W. Strutt, The theory of sound. Second edition, revised and enlarged. Volume II.
(London: Macmillan. 520 S. 8◦ , 1896) (English). 3rd Baron Rayleigh.
101. M. Takesaki, Theory of operator algebras. I (Springer-Verlag, New York, 1979).
102. M. Talagrand, ‘Pettis integral and measure theory’, Mem. Amer. Math. Soc. 51
(1984), no. 307, ix+224.
103. T. Tao, A Banach algebra proof of the prime number theorem; Urysohn’s lemma; The
prime number theorem in arithmetic progressions; Elementary multiplicative number
theory (https://terrytao.wordpress.com). Accessed: 29th October 2015.
104. T. Tao, ‘An uncertainty principle for cyclic groups of prime order’, Math. Res. Lett.
12 (2005), no. 1, 121–127.
105. T. Tao, An introduction to measure theory, in Graduate Studies in Mathematics 126
(American Mathematical Society, Providence, RI, 2011).
106. F. Trèves, Topological vector spaces, distributions and kernels (Academic Press, New
York, 1967).
107. J. Väisälä, ‘A proof of the Mazur–Ulam theorem’, Amer. Math. Monthly 110 (2003),
no. 7, 633–635.
108. S. Wagon, The Banach-Tarski paradox, in Encyclopedia of Mathematics and its Ap-
plications 24 (Cambridge University Press, Cambridge, 1985). With a foreword by
Jan Mycielski.
109. K. Weierstrass, ‘Über die analytische Darstellbarkeit sogenannter willkürlicher Func-
tionen einer reellen Veränderlichen’, Verl. d. Kgl. Akad. d. Wiss. Berlin 2 (1885),
633–639.
110. A. Weil, L’intégration dans les groupes topologiques et ses applications, in Actual.
Sci. Ind., no. 869 (Hermann et Cie., Paris, 1940).
111. H. Weyl, ‘Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Dif-
ferentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)’,
Math. Ann. 71 (1911), 441–479.
112. H. Weyl, ‘Über die Gleichverteilung von Zahlen mod Eins’, Math. Ann. 77 (1916),
313–352.
113. E. T. Whittaker and G. N. Watson, A course of modern analysis, in Cambridge
Mathematical Library (Cambridge University Press, Cambridge, 1996).
114. N. Wiener, ‘Tauberian theorems’, Ann. of Math. (2) 33 (1932), no. 1, 1–100.
115. E. Witt, ‘Eine Identität zwischen Modulformen zweiten Grades’, Abh. Math. Sem.
Hansischen Univ. 14 (1941), 323–337.
116. D. Witte Morris, Introduction to arithmetic groups (Deductive Press, 2015).
117. S. Zelditch, ‘Spectral determination of analytic bi-axisymmetric plane domains’,
Geom. Funct. Anal. 10 (2000), no. 3, 628–677.
Notation
N, natural numbers, v
N0, non-negative integers, v
Z, integers, v
Q, rational numbers, v
R, real numbers, v
C, complex numbers, v
ℜ(·), ℑ(·), real and imaginary parts, v
≪, o, O, relations between growth in functions, vi
k(φ), matrix of rotation through φ on R2, 2
SO2(R), group of rotations of the plane, 3
χn, character $\varphi \mapsto e^{2\pi i n \varphi}$, 3
∆, Laplace operator, 6
Sd−1, (d − 1) unit sphere in Rd, 7
C(X), continuous functions on X, 16
ℓ1(N), space of summable sequences, 20
cc, space of finitely supported sequences, 20
Lµ1(X), space of integrable functions, 21
B(X), space of bounded functions, 27
Cb(X), space of continuous bounded functions, 27
C0(X), space of continuous functions vanishing at infinity, 27
λcount, counting measure, 28
L∞(X), space of bounded measurable functions, 29
c0, space of null sequences, 39
CR(X), CC(X), real- and complex-valued continuous functions, 42
1A, indicator function of the set A, 50
{·}, fractional part of a real number, 51
B(V, W), bounded linear maps V to W, 55
B(V), bounded linear maps V to V, 55
V∗, continuous linear functionals on V, 55
Hp(D), Hardy space, 59
Ap(D), Bergman space, 59
Y⊥, orthogonal complement of Y in a Hilbert space, 79
⟨S⟩, linear hull of S, 82
µ1 ⊥ µ2, µ1 and µ2 are mutually singular, 83
P(X), set of all subsets of X, 90
Hχ, weight space associated to character χ, 110
Gδ, countable intersection of open sets, 127
Hk(Td), Sobolev space, 135
$n^\alpha$, shorthand for $(n_1^{\alpha_1}, \dots, n_d^{\alpha_d})$, 138
∆, weak Laplace operator, 154
K(V, W), K(V), space of compact operators, 168
A∗, adjoint of operator A, 175
HS(H), space of Hilbert–Schmidt operators on H, 195
ωd, volume of the unit ball in Rd, 202
TdR, scaled torus, 202
F2, free group on two generators, 225
ΣX, simple integrable functions on X, 235
T∗µ, push-forward of a measure, 265
D(U), space of distributions on U, 296
L1, space of equivalence classes of integrable functions, 297
Cc∞, space of smooth compactly supported functions, 297
L1loc, space of equivalence classes of locally integrable functions, 297
P(X), space of Borel probability measures on X, 304
D(U), space of test functions, 312
S(X), space of Schwartz functions on X, 312
L2(X, µ), alternative for L2µ(X), space of square-integrable functions, 313
δa,b, function equal to 1 if a = b and 0 if not, 316
UT, unitary operator associated to T, 327
$x^\alpha$, shorthand for the monomial $x_1^{\alpha_1} \cdots x_d^{\alpha_d}$, 340
S(Rd), Schwartz space of functions on Rd, 341
π : G ↷ H, unitary representation, 344
P1(G), positive-definite functions on a group, 345
λg, left regular representation, 353
λg, shift in domain, 362
AB, product of sets in a group, 371
HG, subspace of invariant vectors in a unitary representation, 375
P1, projective line, 382
AG, adjacency matrix for a graph, 403
MG, averaging operator for a graph, 404
ρ(a), resolvent set of operator a, 409
B(H), algebra of bounded operators on a Hilbert space, 417
βN, Stone–Čech compactification of N, 421
σdisc, discrete spectrum, 433
σappt, approximate point spectrum, 434
σapprox, approximate spectrum, 434
σcont, continuous spectrum, 434
σresid, residual spectrum, 434
σess, essential spectrum, 437
Λ, von Mangoldt function, 504
Λ2, second von Mangoldt function, 509
Bε(·), ε ball in a metric space, 539
lim F, limit of a convergent filter, 540
limF f, convergence along a filter, 540
σ(C), σ-algebra generated by C, 551
General Index
Abel summation formula, 529
absolute convergence, 32
absolutely continuous, 83
absorbent, 293
adjacency matrix, 403
adjoint, 12, 175
  densely defined operator, 488
  operator, 175
    densely defined, 488
  star operator, 417
admits Følner sets, 362
affine
  action, 373
  map, 24
  subspace, 66
alert reader, 390, 414, 493
algebra, 48
  Banach, 61
    automatic continuity, 592
    bounded operators, 433
    C∗, 417
    continuous functions, 62
    dual space, 418
    Gelfand dual, 418
    Gelfand transform, 422
    homomorphism, 448
    ideal, 168
    integrable functions, 333
    maximal ideal, 419
    von Neumann, 431
    spectral radius, 410
    spectrum, 409
    unital, 409
  C∗, 417
    normal element, 417
    self-adjoint element, 417
    star operator, 417
  Calkin, 170, 437
  commutative, 61
  homomorphism, 418
  von Neumann, 311, 431
algebraically complemented, 82
almost
  everywhere, 22, 555
  injective, 437
  invariant, 363, 376
    vector, 375
  surjective, 437
amenable, 362
  fixed point of action, 373
  group, 70, 219
    admits Følner sets, 362
    Følner sets, 371
    left-invariant mean, 362
    quotient, subgroup, 220
    Reiter condition, 363
  growth in groups, 374
  radical, 221, 262
analytic, 414
  Blaschke product, 591
  boundary, 591
  methods, 503
  power series, 323
  strongly, 260
  weakly, 260
annihilator
  double, 484
approximate
  eigenvalue, 495
  eigenvector, 434
  identity, 102
  invariant measure, 266
  point spectrum, 434
  spectrum, 434
Arzela–Ascoli theorem, 39
  locally compact, 42
asymptotic, 504
atom, 314
atomic measure, 314
automatic continuity, 592
averaging operator, 404
axiom of choice, 223, 537

Baire category theorem, 126
  topological, 127
balanced, 293
ball, 6, 18
  closed, 77
  metric, 402
  non-compactness, 38
  open, 26
  unit, 17
Banach
  Alaoglu theorem, 256
  algebra, 61
    automatic continuity, 592
    bounded operators, 433
    C∗, 417
    continuous functions, 62
    dual space, 418
    examples, 62
    field, 411
    Gelfand dual, 418
    Gelfand transform, 422
    generated, 410
    homomorphism, 448
    ideal, 168
    integrable functions, 118, 333
    inverse of an element, 411
    maximal ideal, 419
    von Neumann, 431
    resolvent, 411
    spectral radius, 410
    spectrum, 409
    unital, 62, 409
    without a unit, 421
  limit, 217
  space, 4
    compact operator, 174
    dual, 70, 209
    reflexive, 113, 209, 214
    topology, 253
    trace-class norm, 195
    uniformly convex, 77
  Steinhaus theorem, 121
    application to Fourier analysis, 123
  Tarski paradox, 223
barycentre, 304
  existence, 305
  invariant, 579
  measure, 310
  uniquely determined, 304
base field, 296
Benford’s law, 51
Bergman space, 590
Bernstein polynomial, 589
Bessel
  equation, 200
  function, 200
bidual, 213
  isometric embedding, 213
bilinear pairing, 227
binary relation, 541
  reflexive, transitive, filter property, 541
black box, 91, 184, 187
Blaschke product, 591
Bochner
  integral, 111
  theorem, 346, 381
    for Rd, 346
Borel
  measure, 170
  probability measure, 373
  set, 240
  σ-algebra, 28, 51
    Lusin’s theorem, 558
boundary
  conditions, 66
    Dirichlet, 199, 494
    Neumann, 494
    periodic, 494
  of a set in a graph, 402
  value problem, 67
    Dirichlet, 8
bounded
  derivative, 28
  function, 16, 29
  functional, 418
  linear operator, 55
    extension, 58
  operator, 55
    compact, 169
  sequence, 517
    limit, 70

C∗-algebra, 417
  normal element, 417
  spectral radius formula, 418
  self-adjoint element, 417
  star operator, 417
Calkin algebra, 170
Carathéodory extension theorem, 242, 552
category
  first, 126, 590
  second, 126
Cauchy
  formula, 330
  inequality, 276
  inequality with an ε, 276
  integral formula, 260, 414, 564
  integration, 411
  interlacing theorem, 183
  Schwarz inequality, 72
  sequence, 27, 545
    convergent subsequence, 33
cautious reader, 204
Cayley transform, 500
Čech, 421
Césaro average, 218
chain, 90
character, 3
  modular, 371
  multiplicative, 527
  separate points, 92
  weight, 110
characteristic
  function, 50
    smooth approximate, 151
  polynomial, 12
cheat, 296, 477
Chebyshev polynomial of the second kind, 442
Choquet’s theorem, 306
circle rotation, 267
clopen, 240
  set, 538
closable operator, 487
closed
  graph theorem, 131
  linear hull, 82, 213
  operator, 131
  set, 538
coarser
  filter, 540
  strictly, 300
  topology, 539
coercive, 81
common refinement, 471
commutative
  algebra, 419
    ideal, 419
    quotient, 419
  Banach algebra, 418
    unital, 419
  C∗-algebra, 423
  diagram, 317
  ring, 420
compact, 545
  integral operator, 170, 177
  intersection property, 545
  operator, 167–169
    Hilbert–Schmidt, 64, 171
    ideal in a Banach algebra, 168
    preserved by limits, 169
    regularity property, 169
    spectral theorem, 177
  sequentially, 545
  totally bounded, 546
  Tychonoff, 546
compactification
  one-point, vi
  Stone–Čech, 421
complemented, 82
  algebraically, 82
  topologically, 82
complete, 27, 88
  diagonalizability, 178
  metric space
    Baire category theorem, 126
completely multiplicative function, 529
completeness of characters, 92
completion, 36, 38, 58
  Hardy space, 59
  metric space, 38
  unique extension, 58
compression, 183
concave, 182, 306
conditional
  convergence, 36
  expectation, 85, 86
conjugate
  exponent, 97, 557
  space, 209
connected
  graph, 401
  network, 400
content, 241
continuous
  addition, 18
  extension, 139
  function, 539
  functions dense in Lp, 51
  group action, 107
  scalar multiplication, 18
  sequentially, 259
  uniformly, 40
convergence
  absolute, 32
    Banach space, 33
  almost everywhere, 590
  conditional, 36
  equivalent norm, 30
  Fourier series, 197, 590
  in measure, 295
  measure, 70
  strong, 123
  unconditional, 36
  uniform, 27, 122
convergent
  convex combination, 304
  filter, 539
  sequence, 539
convex, 15
  combination, 15, 304
  hull, 301
  set, 70
    absorbent, 293
    balanced, 293
  space, 76
  strictly, 24
convolution, 96, 118
  associative, 369
  Dirichlet, 507
  multiplicative, 507
  operator, 115
Courant–Fischer–Weyl theorem, 182
cover, 40
  finite subcover, 545
  open, 545
co-volume, 393
cross-section, 227
cyclic
  Hilbert space, 316
  operator, 458
  representation, 316
    generator, 316
  subspace, 317
  vector, 455, 456, 590

decay
  at infinity, 563
  boundary, 143
  Fourier transform, 340
  super-polynomial, 341
deficiency index, 502
δ-measure, 350
dense, 48
densely defined operator, 487
  adjoint, 488
  self-adjoint, 488
diagonal group, 392
diameter, 401
differentiable vector, 351
differential equation
  fundamental solution, 69
  partial, 6, 8, 9, 200, 592
directed set, 541
  tail field, 541
Dirichlet
  boundary condition, 199, 494
  boundary problem
    wave equation, 10
boundary value problem, 8, 135, 152, 161, 165 essentially
bounded, 337
character, 527 self-adjoint, 502
convolution, 507 Euler totient function, 526
kernel, 99, 100 even function, part, 1
Neumann bracketing, 202 expander
series, 529, 533 family, 402
theorem, 526 graph, 375, 400
discrete spectrum, 433 logarithmically small diameter, 402
distribution, 296
divergence theorem, 162 expectation, 342
dual conditional, 85, 86
Banach algebra, 418 exponential growth, 374
Gelfand, 418 extension, 488
space, 209 bounded linear operator, 58
duality, 473 Carathéodory theorem, 552
Dvoretzky–Rogers theorem, 589 continuous, 139
dynamical systems, 327 natural, 90
operator, 146
edge, 400 Tietze theorem, 549
eigenvalue, 12, 167 extremal subset, 301
cancellation, 157 extreme, 266
variational characterization, 181 point, 70
eigenvector, 12, 167
approximate, 434 false hope, 314
elliptic Fejér kernel, 100, 102
differential operator, 153 Fekete’s lemma, 415
regularity, 69, 135, 153, 155 Fell topology, 376
on the torus, 157 filter, 539
entire function, 414 compactness, 546
equicontinuous, 39 convergence, 540
equidistributed, 48 convergence along, 540
probability measure, 263 convergent, 254
sequence of measures, 263 finer, 547
equivalence relation, 225 finer, coarser, 540
equivalent norm, 18, 131 neighbourhood, 540
ergodic, 265 tail, 540
circle rotation, 267 finer, 540
mean theorem, 175, 261 filter, 540
relation to indecomposable, 265 topology, 539
theory, 304 finite intersection property, 302
essential first category, 126, 590
radius, 437 Følner
range, 410, 443 sequence, 222
spectrum, 437 sets, 362
supremum norm, 29
Fourier distribution, 330
analysis, 4 G_δ-set, 127
back transform, 475 Gelfand
coefficient, 4 dual, 418
inversion theorem, 335 and Pontryagin dual, 426
series, 4 Pettis integral, 113
convergence, 590 transform, 422
convergence almost everywhere, 590 not an isometry, 428
generalized function, 296
diverges almost everywhere, 590 generic point, 328
geometric series, 65
non-convergent, 125 geometry of numbers, 392
transform, 329, 426, 428 Gram–Schmidt procedure, 88
Gaussian, 330 non-separable, 90
not an isometry, 428 graph, 400, 437
Fredholm operator, 437 adjacency matrix, 403
frequency variable, 330 averaging operator, 407
Friedrichs extension, 499 boundary of a set, 402
Fréchet connected, 401, 404
Riesz theorem, 80 diameter, 401
space, 295, 312, 341 edge, vertex, 400
Fubini theorem, 270, 556 expander, 401–403
function family, 405
continuous, 539 k-regular, 400
even, 1 Laplace operator, 404
even, odd part linear operator, 131
uniqueness, 1 metric, 401, 494
generalized, 296 oriented edge, 494
odd, 1 path, 401
simple, 553 quantum, 495
integral, 553 regular tree, 438
test, 296 simple, 400
weight, 3 sparsity, 400
functional, 209, 296 spectral gap, 405
calculus, 324, 454 undirected, 400, 437
continuous, 444 Weyl law, 495
measurable, 444 Green function, 67
fundamental group
domain, 150, 384 action, 2
solution, 69 affine, 373
tone, 591 associated unitary operator, 107
fundamental frequency, 4
measure-preserving, 107
gauge function, 211 amenable, 70, 219
Gauss elimination, 378 character, 3
Gaussian, 479
continuous action, 107 orthogonal complement, 79
diagonal, 392 orthogonal projection, 80
left action, 118 real, 74
orthogonal, 392 Hölder
topological, 91, 353 conjugate, 97, 234
unipotent, 392 inequality, 97, 234, 557
unitary representation, 107 Holmgren operator, 174
growth, 374 homogeneous, 211
polynomial, exponential, intermediate, 374 ordinary differential equation, 63
Horn conjecture, 182
Haar measure, 92, 353, 396 Howe–Moore theorem, 311
Hadamard three-lines theorem, 234 hull
Hahn–Banach closed linear, 82
lemma, 209 linear, 82
theorem, 211 hyperplane, 305
Hamel basis, 88 hypersurface, 396
Hardy space, 59, 81, 89
harmonic ideal, 419
function, 152, 161 commutative algebra, 419
mean value principle, 161 maximal, 419
weak, 154 proper, 419
weakly, 154, 156 impatient reader, 334
harmonic function, 8 inclusion–exclusion principle, 508
Hausdorff space, 539 induced representation, 385
Hausdorff–Young inequality, 340 induction, 384
heat inequality
equation, 7, 8, 10 Cauchy–Schwarz, 72
kernel, 589 Hölder, 234, 557
Heine–Borel theorem, 20 Jensen, 109
Hellinger–Toeplitz theorem, 132 triangle, 539, 557
Herglotz infinite
theorem, 252 dimensional, 167
positive-definite sequence, 314 Hilbert space, 88
torus, 320 dimensional space, 38
Hermitian, 176 intersection property, 545
matrix, 176 initial
Hilbert topology, 542
hotel, 226 values, 63
Schmidt inner product, 71
norm, 195 space, 71
operator, 64, 171 sesqui-linear, 72
space, 71 strict positivity, 71
infinite-dimensional, 88 integral
norm is strictly sub-additive, 76 Bochner, 111
Gelfand–Pettis, 113 equation, 8
Lebesgue, 551 operator, 6
operator, 62 compact self-adjoint, 198
compact, 170 eigenfunction, 157
kernel, 64 elliptic regularity, 155
Pettis, 113 graph, 404
simple function, 553 infinitesimal, 6
strong, 368 regular tree, 437
weak, 113 lattice, 384
interlacing theorem, 183 Minkowski’s first theorem, 393
intermediate growth, 374 Lax–Milgram lemma, 81
invariant Lebesgue
ergodic measure, 265 decomposition, 83, 322
measure, 265 integral, 111, 551
irreducible spectrum, 318, 327
representation, 458 left
unitary representation, 345, 458, 498 action, 118
Haar measure, 92
isometry, 24 inverse, 130
isomorphic left-invariant
unitarily, 313 functional, 219
isomorphism mean, 362
isometric, 58 topological, 368
Iwasawa decomposition, 393 measure, 590
Haar measure, 396 Leibniz’ rule, 150
LF space, 312
Jensen inequality, 565 Lidskiĭ’s theorem, 193
joining, 327 limit
Jordan Banach, 217
block, 167 topology, 542
measurable, 196 Lindenstrauss–Tzafriri theorem, 590
normal form, 12 linear
kernel, 22, 64 functional, 55, 209
Dirichlet, 99, 100 hull, 82
Fejér, 100 closed, 82
heat, 589 operator, 55
Landau, 104 extension, 58
reproducing, 81 order, 538
semi-norm, 22 Lipschitz
Kirchhoff’s law, 585 condition, 58
Krein–Milman theorem, 301 constant, 18
locally
λ-semi-norm, 505 compact group, 473
Landau kernel, 104 bidual, 473
Laplace constant, 153
convex ergodic, 265
extreme point, 301 locally finite, 51, 108, 239
topology, 292 preserving, 107
vector space, 293 preserving system, 265
finite, 51 disjoint, 328
measure, 108 probability, 262
H^k, 155
L^p, 155
Lusin’s theorem, 60, 558 regular, 240, 558
σ-finite, 83
von Mangoldt function, 504, 528
space, 553
second, 509 σ-finite, 553
Mantegna fresco, 11 spectral, 444
matrix Mergelyan’s theorem, 589
adjacency, 403 meromorphic extension, 533
coefficient, 311, 344
Mertens’ theorem, 503
compression, 183 metric, 539
diagonal, 12 graph, 494
group, 361
pseudo, 539
Hermitian, 176, 181, 182 space, 539
permutation, 406 completion, 38
rotation, 2 separable, 41
self-adjoint, 176 metrizable
Mautner phenomenon, 378, 381
group, 239, 353
maximal subset, 305
element, 538 topology, 257, 304
ideal, 420
min-max principle, 182
spectral type, 321 Minkowski
maximum modulus theorem, 235 first theorem, 393
Mazur–Ulam theorem, 24 theorem
meagre, 126 Carathéodory’s form, 306
mean, 362
miracle, 248
ergodic theorem, 175, 261 mixing, 311
value principle, 161 Möbius inversion, 508
measurable
modular character, 360, 371
functional calculus, 444, 462, 464 de Moivre formula, 566
multiplication operator, 176
Jordan, 202 self-adjoint, 176
measure spectrum, 410, 435
absolutely continuous, 83
multiplicative
atomic, 314, 326 convolution, 507
Borel, 176, 265 function, 529
equivalence class, 321
inverse, 419
Fourier coefficient, 314 unit, 409
Haar, 353 multiplicity, 323
invariant, 107, 265
Müntz’ theorem, 214 spectral theorem, 177
densely defined, 487
natural extension, 90 adjoint, 488
neighbourhood, 538 self-adjoint, 488
filter, 540 eigenvalues, 68
net, 541 equality, 488
von Neumann Fredholm, 437
algebra, 311, 431 Hilbert–Schmidt, 64, 171
series, 64 integral, 64, 171
Neumann boundary conditions, 494 compact, self-adjoint, 177
non-diagonal spectral measure, 461 Laplace, 438
norm, 16 multiplication, 176
defines a metric, 18 norm, 55
equivalent, 18 partial differential, 284
operator, 55 positive, 191, 450
pseudo-, 21 restriction, 147
semi-, 21 self-adjoint, 132
trace-class, 195 spectral theorem, 177
uniformly convex, 77 summing, 438
normal symmetric, 498
space, 547 trace-class, 183
topological space, 547 unbounded, 487
normed uniformly elliptic, 284
linear space unitary, 175, 313
dual, 209 spectral theory, 314
space unitary multiplication, 313
bidual, 213 order
vector space linear, 538
conjugate, 209 partial, 538
inner product, 71 ordinary differential equation, 62
nowhere dense, 126 homogeneous, 63
nuclear space, 312 initial value, 63
null set, 555 Sturm–Liouville, 66
numerical range, 181 Volterra, 64
odd function, part, 1 oriented
one-point compactification, vi edge, 494
open path, 414, 416
cover, 545 orthogonal
mapping theorem, 126 complement, 79
neighbourhood, 539 group, 392
set, 538 projection, 80, 185
operator orthonormal, 86
averaging, 407, 438 basis, 88
closable, 487
compact, 168 pairing, 227
paradoxical decomposition, 223 pre-dual, 254, 303
parallelogram identity, 74, 564 pre-Hilbert space, 71
characterizes Hilbert space, 74 prime number theorem, 503, 525
Parseval error rate, 503
formula, 95, 97 principal matrix coefficient, 344
theorem, 93 probability
partial distribution, 362
differential equation, 6, 8, 9, 200, 592 measure, 70, 262, 265
equidistribution, 263
heat equation, 7 ergodic, 265
wave equation, 10 invariant, 265
differential operator, 133, 157, 284 space, 553
product topology, 542
isometry, 501 projection, 82
order, 538 bounded, 82
maximal element, 538 orthogonal, 80, 185
reflexive, 538 valued measure, 325, 468
transitive, 538 projective
partition, 112 topology, 542
finer, 471 property (T), 375–377
of unity, 161, 549 relative, 375
approximate, 356 spectral gap, 376
path, 401 pseudo-metric, 21, 539
permutation matrix, 406 pseudo-norm, 21
Pettis integral, 113 push-forward, 265
phase shift, 332 Pythagoras’ theorem, 79
Plancherel formula, 337, 339, 479
PNT, 504 quantum graph, 495
Poisson summation formula, 342
polar decomposition, 192, 453, 561 Radon
polarization identity, 74, 325, 564 measure, 239, 353, 514
polynomial vague topology, 515
Bernstein, 589 Nikodym derivative, 83
characteristic, 12 σ-finite case, 85
growth, 374 signed measure, 520
amenable, 374 random walk, 437
trigonometric, 48 reduced word, 225
Pontryagin duality, 120, 426, 473 reflection, 25
positive operator, 191, 450 reflexive, 113, 209, 214, 254
positive-definite reflexivity, 473
function, 344, 346 regular
boundedness, 345 graph, 400
sequence, 314 connected, 404
Herglotz’s theorem, 315 measure, 240, 558
power set, 90 representation, 376, 478
regularity
elliptic, 69, 153, 155 semi-inner product, 499
Laplace operator, 135 semi-linearity, 72
on the torus, 157 semi-norm, 21
Sobolev, 156 absorbent set, 298
Reiter condition, 363 continuous, 22
relative trace, 196 defines a norm, 22
Rellich’s theorem, 198 Fréchet
representation space, 295
induced, 385 kernel, 22
induction, 384 locally convex topology, 292
irreducible, 345, 458, 498 strong operator topology, 290
left regular, 368 sequence, 539
regular, 478 dense, 48
unitarily isomorphic, 344 equidistributed, 48
unitary, 107 sequentially
reproducing kernel, 81 compact, 545
residual spectrum, 434 continuous, 259
resolvent, 411, 413 sesqui-linearity, 72
function, 413, 416 set
set, 409, 412, 500 clopen, 538
resonance, 11, 591 closed, 538
restriction operator, 147 open, 538
trace, 148 theory, 537
Riemann Siegel set, 398
hypothesis, 503 simple
integral, 111, 112, 368 approximation, 473
sum, 112 function, 235, 553
zeta function, 532, 592 simultaneous spectral theorem, 181
Riemann–Lebesgue lemma, 334 smooth
Riesz representation, 239 boundary, 148
locally compact, 248 partition of unity, 161
on C(X), 248 Sobolev
Riesz–Thorin interpolation, 233 embedding theorem, 135, 139, 150
Schauder basis, 88
regularity, 156
Schmidt game, 590 space, 8, 135
Schur lemma, 458 space variable, 330
Schwartz space, 341 spectral
second category, 126 gap, 375, 405
Selberg symmetry formula, 503, 507
measure, 317, 324, 444, 453
self-adjoint, 167 atomic, 327
integral operator, 177 continuity, 381
operator, 132
non-diagonal, 461
densely defined, 488 radius, 410, 418
positive, 191 resolution, 181
spectral theorem, 177
simultaneous theorem, 181 support of a function, 549
theorem, 176 symmetric
compact self-adjoint operator, 177 operator, 498
proof, 180 tail filter, 540, 541
theory, 12, 70 Taylor
unbounded self-adjoint, 487 approximation, 6, 43
unitary flow, 344 coefficient, 89
unitary operator, 314 expansion, 89
unitary operators, 313 test function, 141, 296
spectrum, 410 thermal equilibrium, 8
approximate, 434 Tietze’s extension theorem, 61, 549
approximate point, 434 Toeplitz’s theorem, 255
discrete, point, 433 Tonelli’s theorem, 556
essential, 437 topological
essential range, 443 complement, 82
Lebesgue, 318 group, 91, 353
pure discrete, 327 abelian, 91
residual, 434 character, 92
stadium, 165, 568 space
star operator, 417 compact, 545
star-shaped, 144 neighbourhood, 538
Stone normal, 547
Čech compactification, 421 vector space, 295
theorem, 351 topology, 538
Weierstrass theorem, 42, 49 coarser, weaker, 539
locally compact, 47 finer, 543
strictly convex, 24 induced from a subset, 541
strong initial, limit, projective, weak, 542
analytic, 260
convergence, 123, 261 locally convex, 292
integral, 111, 112, 114, 368 product, 542
operator topology, 290 strong, 253
topology, 253 strong operator, 290, 469
stronger topology, 539 stronger, finer, 539
Sturm–Liouville ultra-strong operator, 312
boundary value problem ultra-weak operator, 312
integral operator, 69 uniform convergence on compact sets, 429
equation, 66, 68, 69, 167
sub-additive uniform operator, 290
sequence, 415 vague, 515
strictly, 24, 76 weak, 253
sub-multiplicative sequence, 415 operator, 292
super-polynomial decay, 341 weak*, 254
superposition, 330 torsion element, 431
total
derivative, 6, 491 group, 392
weight, 354 subgroup, 378
totally unique
bounded, 169, 245, 546 best approximation, 77
compact, 545 ergodicity, 267
disconnected, 240, 241 extension, 58
cover, 245 prime factorization, 508
ordered set, 90 solution, 66
trace, 147, 183 unit, 62
map, 182 ball, 19
relative, 196 compact in weak* topology, 256
restriction operator, 148
trace-class ball is non-compact, 38, 39, 89
compact, 187 sphere, 259
norm, 183, 187, 195 unital, 62
operator, 183 Banach algebra, 409
tree unitary, 167, 175
averaging operator, 438 flow, 344
Laplace operator, 438 isomorphism, 313
regular, 438 operator, 107, 175
summing operator, 438 representation, 107
triangle inequality, 539 cyclic, 316
L^p norm, 557 irreducible, 345, 458, 498
trick, 396 upper envelope, 306
trigonometric polynomial, 48, 96 upper-semicontinuous, 306, 307
Tychonoff’s theorem, 41, 546 Urysohn’s lemma, 547
ultrafilter, 546, 547 vague topology, 515
convergent, 547 vector space
uncertainty principle, 342, 592 inner product, 71
unconditional convergence, 36 norm, 16
uniform topological, 295
boundedness, 121 vertex, 400
convergence, 27 Volterra equation, 64
convergence on compact sets, 294 Walsh system, 95
operator topology, 290 wave equation, 10
uniformly Dirichlet boundary problem, 10
continuous, 40
convergent, 122 vibrating string, 11
convex norm, 77 weak
convex space, 76 analytic, 260
distributed, 48 function is analytic, 260
elliptic operator, 284 derivative, 141
unimodular, 361 harmonic, 154
unipotent
integral, 113, 114 Landau kernel, 104
operator topology, 292 weight, 110
topology, 252, 253, 542 of a function, 3
weak* topology, 252, 254 space, 110
weaker topology, 539 Weyl
weakly harmonic, 156 law, 196, 202
function, 154 monotonicity principle, 182
Weierstrass Wiener lemma, 428
approximation theorem, 42
Zorn’s lemma, 538
