Harmonic Analysis

A Course in Harmonic Analysis
Jason Murphy
Missouri University of Science and Technology
Contents
1 Introduction
2 Fourier analysis, part I
    2.1 Separation of variables
    2.2 Fourier series in general
    2.3 Fourier series, revisited
    2.4 Convergence of Fourier series
    2.5 The Fourier transform
        2.5.1 Remarks about pointwise convergence
    2.6 Applications to PDE
    2.7 The Fourier transform of distributions
    2.8 The Paley–Wiener theorem
    2.9 Exercises
3 Fourier analysis, part II
    3.1 Sampling of signals
    3.2 Discrete Fourier transform
    3.3 Fast Fourier transform
    3.4 Compressed sensing
    3.5 Exercises
4 Abstract Fourier analysis
    4.1 Preliminaries
    4.2 Locally compact abelian groups
    4.3 Compact groups
    4.4 Examples
    4.5 Exercises
5 Wavelet transforms
    5.1 Continuous wavelet transforms
    5.2 Discrete wavelet transforms
    5.3 Multiresolution analysis
    5.4 Exercises
6 Classical harmonic analysis, part I
    6.1 Interpolation
    6.2 Some classical inequalities
    6.3 Hardy–Littlewood maximal function
    6.4 Calderón–Zygmund theory
    6.5 Exercises
7 Classical harmonic analysis, part II
    7.1 Mihlin multiplier theorem
    7.2 Littlewood–Paley theory
    7.3 Coifman–Meyer multipliers
    7.4 Oscillatory integrals
    7.5 Exercises
8 Semiclassical and microlocal analysis
    8.1 Semiclassical analysis
    8.2 Microlocal analysis
    8.3 Exercises
9 Sharp inequalities
    9.1 Sharp Gagliardo–Nirenberg
    9.2 Sharp Sobolev embedding
    9.3 Exercises
10 Restriction theory and related topics
    10.1 Restriction theory
    10.2 Strichartz estimates
    10.3 Application: local well-posedness for NLS
    10.4 More restriction theory
    10.5 Exercises
11 Additional topics
    11.1 Linear Scattering Theory
        11.1.1 Solving the free Helmholtz equation
        11.1.2 Existence and uniqueness for the Helmholtz equation
        11.1.3 The distorted Fourier transform
A Prerequisite material
    A.1 Lebesgue spaces
    A.2 Hilbert spaces
    A.3 Analysis tools
    A.4 Exercises
Chapter 1
Introduction
These notes were written to accompany the courses Math 6461 and Math
6462 (Harmonic Analysis I and II) at Missouri University of Science & Tech-
nology.
The goal of these notes is to provide an introduction to a range of topics
and techniques in harmonic analysis, covering material that is interesting
not only to students of pure mathematics, but also to those interested in
applications in computer science, engineering, physics, and so on. We will
focus on giving an overall sense of the available results and the analytic tech-
niques used to prove them; in particular, complete generality or completely
optimal results may not always be pursued. Technical details will sometimes
be left to the reader to work out as exercises; solving these exercises is an
important part of solidifying the reader’s understanding of the material. At
times we will not develop the full theory but rather give a survey of results,
along with citations to references containing full details.
These notes are organized as follows:
• In Chapter 2, we introduce Fourier series, motivating their develop-

ment through an application to solving PDE (a common theme for
us). We then develop the Fourier transform, also providing some ap-
plications to PDE. Other topics are discussed, including questions of
pointwise convergence, the Fourier transform on distributions, and the
Paley–Wiener theorem.
• In Chapter 3 we discuss the topic of sampling of signals (e.g. the

Shannon–Nyquist theorem), as well as the discrete and fast Fourier
transform. We close this chapter with a discussion of compressed sensing,
providing a relatively complete presentation of the result of Candes,
Romberg, and Tao [4] on reconstruction of signals using randomly
sampled Fourier coefficients.
• In Chapter 4, we present a survey of results in abstract Fourier
analysis, relying primarily on the textbook of Folland [10]. In particular,
we demonstrate how many of the preceding topics may be viewed un-
der the same umbrella (i.e. Fourier analysis on locally compact abelian
groups). Most results are stated without proof. We then briefly discuss
the case of Fourier analysis on compact groups and present a few im-
portant examples in detail (namely, SU (2) and SO(n) for n ∈ {3, 4}).
• In Chapter 5, we discuss the continuous and discrete wavelet trans-

forms, as well as the notion of multiresolution analysis. In addition to
wavelet transforms, we also frequently discuss the ‘windowed’ Fourier
transform. Our primary reference is the book of Daubechies [7]. This
chapter provides a relatively brief introduction to a very rich topic
with a wide range of applications.
• In Chapter 6, we begin discussing what I have called ‘classical’ har-

monic analysis (although this distinction of ‘classical’ versus ‘modern’
should not be taken too seriously). This includes the theory of interpo-
lation of linear operators, some ‘classical inequalities’ (like convolution
inequalities and Sobolev embedding), the Hardy–Littlewood maximal
function (and vector maximal function), and finally the Calderón–
Zygmund theory for singular integral operators.
• In Chapter 7, we continue the study of ‘classical’ topics in harmonic

analysis. We first prove the Mihlin multiplier theorem. We then develop
Littlewood–Paley theory (including the Littlewood–Paley square
function estimate and some fractional calculus estimates). After this,
we will prove the non-endpoint cases of the Coifman–Meyer multiplier
theorem. Finally, we study oscillatory integrals (proving, for exam-
ple, the stationary phase theorem and providing some applications to
PDE).
• In Chapter 8, we begin our study of more ‘modern’ topics in harmonic

analysis. We begin with a study of semiclassical analysis. This is
not actually a subfield of harmonic analysis; however, it is closely
related due to the frequent analysis of oscillatory integrals. We give
a brief introduction based on the textbook of Martinez [20]. In the
semiclassical setting, we get as far as the proof of L2 boundedness
for pseudodifferential operators. We then give a short introduction to

microlocal analysis, still following the presentation of [20].
• In Chapter 9, we continue our study of ‘modern’ topics and turn to

the question of sharp inequalities and existence of optimizers. We
consider two examples, namely, the Gagliardo–Nirenberg inequality
and Sobolev embedding. For Gagliardo–Nirenberg, we present a proof
based on radial decreasing rearrangements and the compactness of the
radial Sobolev embedding. For Sobolev embedding, we present a proof
based on profile decompositions, thus giving a short introduction to
‘concentration-compactness’ techniques (which have come to play an
important role in the setting of nonlinear PDE).
• In Chapter 10, we prove some basic results in ‘restriction theory’. This

refers to the question of when it makes sense to restrict a function’s
Fourier transform to a surface. We begin with a result due to Strichartz
for the paraboloid. This result can be interpreted as a space-time
estimate for solutions to the linear Schrödinger equation. We take a
slight detour to prove a wider range of such estimates (which now go
by the name of Strichartz estimates). We then return to restriction
theory and prove the ‘Tomas–Stein’ result (up to the endpoint) for the
case of the sphere.
• In Chapter 11, we will collect some additional topics. Currently, this
chapter contains an introduction to linear scattering theory (the
Helmholtz equation and the distorted Fourier transform).
• Finally, in the appendix, we have collected some prerequisite material

for the reader’s reference.
The material from these notes has been drawn from many different
sources. In addition to the references listed in the bibliography, this in-
cludes the author’s personal notes from a harmonic analysis course given by
M. Visan at UCLA.
The author gratefully acknowledges support from the University of Mis-
souri Affordable and Open Educational Resources Initiative Award, as well
as the students of Math 6461-6462 for their useful feedback and corrections.
Chapter 2
Fourier analysis, part I
2.1 Separation of variables

Consider the following partial differential equation (PDE):
\[
\begin{cases}
\partial_t u = \partial_x^2 u, & (t, x) \in (0, \infty) \times (0, 1), \\
u(0, x) = f(x), & x \in (0, 1), \\
u(t, 0) = u(t, 1) = 0, & t \in [0, \infty),
\end{cases}
\tag{2.1}
\]

where f : (0, 1) → R is some given function. This is the well-known heat

equation. This is an example of an initial-value problem (the solution is
specified at t = 0), as well as a boundary-value problem (the values of the
solution are prescribed at the boundary of the spatial domain (0, 1)).
One approach to solving PDEs like this is the method of separation of
variables, which entails looking for separated solutions of the form
u(t, x) = p(t)q(x).
Using (2.1) and rearranging, we find that for $u$ to be a solution we must
have
\[
\frac{-p'(t)}{p(t)} = \frac{-q''(x)}{q(x)}.
\]
As the left-hand side depends only on $t$ and the right-hand side depends
only on $x$, both sides must be constant, and we are led to the problem
\[
p'(t) = -\lambda p(t) \quad\text{and}\quad -q''(x) = \lambda q(x) \quad\text{for some constant } \lambda.
\]
The equation for $p$ is solvable for any $\lambda$; indeed, $p(t) = e^{-\lambda t} p(0)$ does the
job. The problem for $q$ is more interesting, since in addition to the ordinary
differential equation (ODE) it must also satisfy the boundary conditions
$q(0) = q(1) = 0$.
One finds that there are nontrivial solutions only for special choices of $\lambda$,
namely, $\lambda = (n\pi)^2$ for some integer $n > 0$. A corresponding solution is then
given by $q(x) = \sin(n\pi x)$.
What we have therefore discovered is that the method of separation of
variables yields a countable family of solutions to the heat equation satisfying
the boundary conditions, namely
\[
c_n e^{-(n\pi)^2 t} \sin(n\pi x) \quad\text{for any } n > 0 \text{ and } c_n \in \mathbb{R}.
\]
Furthermore, any linear combination of these solutions solves the PDE and
satisfies the boundary condition. Therefore, we can solve the initial-value
problem in (2.1) provided we can find $c_n$ such that
\[
f(x) = \sum_{n=1}^{\infty} c_n \sin(n\pi x).
\]
The possibility of finding such expansions and understanding the sense in
which they hold are precisely the questions addressed by the study of Fourier
series.
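To make the separated solutions concrete, the following Python sketch approximates the coefficients numerically and sums a truncated series. It assumes the standard orthogonality relation $\int_0^1 \sin(n\pi x)\sin(m\pi x)\,dx = \frac12 \delta_{nm}$, which gives $c_n = 2\int_0^1 f(x)\sin(n\pi x)\,dx$ (a formula not derived in the text above). The function names, the midpoint quadrature, and the initial profile $f(x) = x(1-x)$ are illustrative choices only, not part of the notes.

```python
import math


def sine_coefficient(f, n, samples=2000):
    """Approximate c_n = 2 * integral_0^1 f(x) sin(n pi x) dx by the midpoint rule."""
    h = 1.0 / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h
        total += f(x) * math.sin(n * math.pi * x)
    return 2.0 * h * total


def heat_solution(f, t, x, terms=50):
    """Truncated series u(t, x) = sum_{n=1}^{terms} c_n exp(-(n pi)^2 t) sin(n pi x)."""
    u = 0.0
    for n in range(1, terms + 1):
        c_n = sine_coefficient(f, n)
        u += c_n * math.exp(-((n * math.pi) ** 2) * t) * math.sin(n * math.pi * x)
    return u


def initial_data(x):
    # Hypothetical initial profile, chosen only because it vanishes at x = 0 and x = 1.
    return x * (1.0 - x)


# At t = 0 the truncated series should approximately reproduce the initial data.
print(heat_solution(initial_data, 0.0, 0.5), initial_data(0.5))
```

For positive times the factor $e^{-(n\pi)^2 t}$ damps the high modes very quickly, so in this example the solution is soon dominated by the $n = 1$ term.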
Remark 2.1.1. Separation of variables has led us to the ‘eigenvalue problem
for the Dirichlet Laplacian’. That is, we were led to look for eigenvalues of
$-\partial_x^2$ on the domain (0, 1) with eigenfunctions q satisfying the ‘Dirichlet’
boundary conditions q(0) = q(1) = 0. This is a good perspective to keep
in mind if one wishes to generalize the discussion above to a wider class of
equations and boundary conditions.
In Section 2.3 we will prove the following theorem.
Theorem 2.1.2. Let $F : [-L, L] \to \mathbb{C}$ satisfy $F(-L) = F(L)$ and assume
$F \in L^2([-L, L])$. There exist $c_n \in \mathbb{C}$ such that $F$ can be expanded in the
Fourier series
\[
F(x) = \sum_{n \in \mathbb{Z}} c_n e_n(x) \quad\text{as functions in } L^2([-L, L]),
\]
where
\[
e_n(x) := e^{\frac{i n \pi x}{L}}.
\]
The Fourier coefficients $c_n$ are given by
\[
c_n = \langle F, e_n \rangle = \frac{1}{2L} \int_{-L}^{L} F(x) \bar{e}_n(x) \, dx.
\]
We may also write
\[
c_n = \hat{F}(n).
\]
Using this result, one can recover sine series and cosine series, which
are often used in the solution of PDEs via separation of variables (see the
exercises).
Note that we may identify periodic functions on $[-L, L]$ with functions
on the torus or on the circle, which we may denote by $\mathbb{T}_L$.
2.2 Fourier series in general

Instead of proving Theorem 2.1.2 directly, let us begin with a more general
perspective, inspired largely by the presentation in [15].
Consider the space $L^2(E)$ for some measurable $E \subset \mathbb{R}^d$. Recall that $L^2$
admits an inner product given by
\[
\langle f, g \rangle = \int_E f(x) \bar{g}(x) \, dx, \qquad \|f\|_{L^2(E)} = \|f\| = \sqrt{\langle f, f \rangle}.
\]
If $\langle f, g \rangle = 0$, then we call $f$ and $g$ orthogonal. A set $\{\phi_\alpha\}_{\alpha \in A}$ is
orthogonal if any two of its elements are orthogonal and orthonormal if it is
orthogonal and $\|\phi_\alpha\| = 1$ for all $\alpha \in A$. By convention, we always assume
that orthogonal sets consist only of nonzero elements. Using separability,
one can show that any orthogonal set in $L^2$ is necessarily countable (see
the exercises). An orthogonal set $\{\phi_k\}$ is complete if $\langle f, \phi_k \rangle = 0$ for all $k$
implies $f = 0$.
Suppose now that $\{\phi_k\}$ is an orthonormal set in $L^2$. For $f \in L^2$, we
define the Fourier coefficients of $f$ (with respect to $\{\phi_k\}$) by
\[
c_k = \langle f, \phi_k \rangle = \int_E f \bar{\phi}_k.
\]
We define the Fourier series of $f$ (with respect to $\{\phi_k\}$) by
\[
S[f] = \sum_k c_k \phi_k.
\]
We define the partial Fourier series by
\[
s_N = \sum_{k=1}^{N} c_k \phi_k.
\]
We will prove the following:

Theorem 2.2.1. Suppose $\{\phi_k\}$ is a complete orthonormal set in $L^2$. Then
for every $f \in L^2$, the Fourier series $S[f]$ converges to $f$ in $L^2$.
Remark 2.2.2. If one supposes that there is a decomposition of the form
$f = \sum_k c_k \phi_k$, one can already see that we should have $c_k = \langle f, \phi_k \rangle$ by taking
the inner product of both sides with $\phi_k$ and using orthonormality.
Remark 2.2.3. The space $L^2$ admits complete orthonormal sets. To see
this, first take a countable dense subset ($L^2$ is separable), and then apply
the Gram–Schmidt algorithm from linear algebra to find an orthonormal
basis for $L^2$. This means an orthonormal set whose span is dense in $L^2$
(where span means the collection of all finite linear combinations). Using
Cauchy–Schwarz, one can then prove that any orthonormal basis for $L^2$ must
be complete (see the exercises).
The key property of Fourier series is that they give the best $L^2$
approximation using linear combinations of the $\{\phi_k\}$.
Lemma 2.2.4. Let $\{\phi_k\}$ be an orthonormal set in $L^2$ and $f \in L^2$.
(i) Given $N$, the best $L^2$ approximation to $f$ by linear combinations of
$\phi_1, \dots, \phi_N$ is given by the partial Fourier series $s_N$.
(ii) (Bessel’s inequality) We have $c := \{c_k\} \in \ell^2$ and
\[
\|c\|_{\ell^2} \le \|f\|_{L^2},
\]
where $\{c_k\}$ are the Fourier coefficients of $f$.

Proof. Fix $N$ and $\gamma := (\gamma_1, \dots, \gamma_N)$ and consider linear combinations of the
form
\[
F = F(\gamma) = \sum_{k=1}^{N} \gamma_k \phi_k.
\]
By orthonormality,
\[
\|F\|^2 = \sum_{k=1}^{N} |\gamma_k|^2.
\]
Thus, recalling $c_k := \langle f, \phi_k \rangle$, we can write
\begin{align*}
\|f - F\|^2 &= \Big\langle f - \sum \gamma_k \phi_k, \; f - \sum \gamma_k \phi_k \Big\rangle \\
&= \|f\|^2 - \sum_{k=1}^{N} [\bar{\gamma}_k c_k + \gamma_k \bar{c}_k] + \sum_{k=1}^{N} |\gamma_k|^2 \\
&= \|f\|^2 + \sum_{k=1}^{N} |c_k - \gamma_k|^2 - \sum_{k=1}^{N} |c_k|^2.
\end{align*}
It follows that
\[
\min_\gamma \|f - F(\gamma)\|^2 = \|f\|^2 - \sum_{k=1}^{N} |c_k|^2
\]
and
\[
\operatorname{argmin}_\gamma \|f - F(\gamma)\|^2 = (c_1, \dots, c_N).
\]
This proves (i). Furthermore (evaluating at $\gamma = (c_1, \dots, c_N)$) we can deduce
\[
\sum_{k=1}^{N} |c_k|^2 = \|f\|^2 - \|f - s_N\|^2,
\]
which yields Bessel’s inequality upon sending $N \to \infty$.
If equality holds in Bessel’s inequality (i.e. $\|c\|_{\ell^2} = \|f\|_{L^2}$), we say $f$
satisfies Parseval’s formula. From the proof of Bessel’s inequality, we
deduce the following:
Proposition 2.2.5. Parseval’s formula holds if and only if $S[f]$ converges
to $f$ in $L^2$.
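As a numerical illustration of Bessel’s inequality and Parseval’s formula, the following Python sketch works on $E = (0, 1)$ with the orthonormal set $\phi_k(x) = \sqrt{2}\sin(k\pi x)$ and the function $f(x) = x$. Both of these are choices made here for illustration (they are not taken from the notes), and the inner products are only approximated by a midpoint rule. For this $f$ one has $\|f\|^2 = \frac13$, and the partial sums $\sum_{k \le N} |c_k|^2$ should increase towards that value without exceeding it.

```python
import math


def inner_product(f, g, samples=2000):
    """Midpoint-rule approximation of <f, g> = integral_0^1 f(x) g(x) dx (real-valued case)."""
    h = 1.0 / samples
    return h * sum(f((j + 0.5) * h) * g((j + 0.5) * h) for j in range(samples))


def phi(k):
    """k-th element of the orthonormal set sqrt(2) sin(k pi x) in L^2((0, 1))."""
    return lambda x: math.sqrt(2.0) * math.sin(k * math.pi * x)


def bessel_partial_sums(f, n_max):
    """Return the partial sums sum_{k <= N} |c_k|^2 for N = 1, ..., n_max."""
    sums, running = [], 0.0
    for k in range(1, n_max + 1):
        c_k = inner_product(f, phi(k))
        running += c_k ** 2
        sums.append(running)
    return sums


def f(x):
    # Illustrative choice of function; any square-integrable f would do.
    return x


norm_sq = inner_product(f, f)          # approximately 1/3
partial = bessel_partial_sums(f, 100)  # nondecreasing, bounded by norm_sq
print(norm_sq, partial[0], partial[-1])
```

The remaining gap `norm_sq - partial[-1]` is (up to quadrature error) exactly $\|f - s_N\|^2$, in line with the identity obtained in the proof of Lemma 2.2.4.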
In particular, we have reduced the proof of Theorem 2.2.1 to proving that
Parseval’s formula always holds whenever $\{\phi_k\}$ is a complete orthonormal
set.
Before proving this, we need a result that allows us to use Fourier
coefficients to define $L^2$ functions.
Proposition 2.2.6 (Riesz–Fischer). Let $\{\phi_k\}$ be an orthonormal set in $L^2$
and $\{c_k\} \in \ell^2$. There exists an $f \in L^2$ such that $S[f] = \sum c_k \phi_k$ and $f$
satisfies Parseval’s formula.
Proof. Write $t_N = \sum_{k=1}^{N} c_k \phi_k$. For $M < N$, orthonormality implies
\[
\|t_N - t_M\|^2 = \sum_{k=M+1}^{N} |c_k|^2.
\]
Thus $\{c_k\} \in \ell^2$ implies $\{t_N\}$ is Cauchy and hence converges to some $f \in L^2$.
Now observe for $N \ge k$
\[
\int f \bar{\phi}_k = \int (f - t_N) \bar{\phi}_k + \int t_N \bar{\phi}_k = \int (f - t_N) \bar{\phi}_k + c_k,
\]
which tends to $c_k$ as $N \to \infty$ by Cauchy–Schwarz and the fact that $t_N \to f$
in $L^2$. As the left-hand side does not depend on $N$, we get $\langle f, \phi_k \rangle = c_k$.
Thus $S[f] = \sum c_k \phi_k$ and $t_N = s_N(f)$. In particular, Parseval’s
formula follows from the fact that $t_N \to f$ in $L^2$.
This result does not guarantee uniqueness. However, one does have
uniqueness if the set $\{\phi_k\}$ is complete. Indeed, if $f$ and $g$ have the same
Fourier coefficients then $f - g$ is perpendicular to each $\phi_k$, and hence
$f = g$ by completeness.
Finally, we turn to the following:
Finally, we turn to the following:
Proposition 2.2.7. An orthonormal set $\{\phi_k\}$ is complete if and only if
Parseval’s formula holds for every $f \in L^2$.
Proof. If $\{\phi_k\}$ is complete and $f \in L^2$, then Bessel’s inequality implies that
the Fourier coefficients $\{c_k\}$ are in $\ell^2$. Thus (by Riesz–Fischer) there exists
$g \in L^2$ with $S[g] = \sum c_k \phi_k$ and $\|g\|^2 = \sum |c_k|^2$. Because $f, g$ have the
same Fourier coefficients and $\{\phi_k\}$ is complete, we get $f = g$ a.e. Thus
$\|f\|^2 = \|g\|^2 = \sum |c_k|^2$.
Conversely, if $\langle f, \phi_k \rangle = 0$ for all $k$ and $\|f\|^2 = \sum |\langle f, \phi_k \rangle|^2$, then $\|f\| = 0$,
which shows that the $\{\phi_k\}$ are complete.
Proof of Theorem 2.2.1. Theorem 2.2.1 now follows from the combination
of Proposition 2.2.5 and Proposition 2.2.7.
One can consider even more general settings and prove similar results
in the setting of abstract Hilbert spaces. However, at this point we will
return to the more specific setting of Fourier series for periodic functions on
[−L, L].
2.3 Fourier series, revisited

In light of Theorem 2.2.1, to prove Theorem 2.1.2, we need only verify
that the set $\{\frac{1}{\sqrt{2L}} e_n : n \in \mathbb{Z}\}$ is orthonormal and complete in $L^2([-L, L])$,
where we recall
\[
e_n(x) := e^{\frac{i n \pi x}{L}}.
\]
To simplify formulas, let us fix $L = \pi$ in what follows; in particular,
$e_n(x) = e^{inx}$.
A direct computation shows that
\[
\tfrac{1}{2\pi} \langle e_n, e_m \rangle = \tfrac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(n-m)x} \, dx = \delta_{nm},
\]
where
\[
\delta_{nm} = \begin{cases} 1 & n = m, \\ 0 & \text{otherwise.} \end{cases}
\]
One calls $\delta_{nm}$ the Kronecker delta. Thus the family $\{\frac{1}{\sqrt{2\pi}} e_n\}$ is
orthonormal.
It remains to prove that this set is complete.
Lemma 2.3.1. Let $f \in L^2([-\pi, \pi])$. If $\langle f, e_n \rangle = 0$ for all $n$, then $f = 0$.
Proof. By assumption, we have
\[
S_N f \equiv 0, \quad\text{where}\quad S_N f(x) = \tfrac{1}{2\pi} \sum_{n=-N}^{N} \langle f, e_n \rangle e_n(x).
\]
We can rewrite
\[
S_N f(x) = f * D_N(x),
\]
where $D_N$ is the Dirichlet kernel given by
\[
D_N(x) = \tfrac{1}{2\pi} \sum_{n=-N}^{N} e_n(x).
\]
If we could prove that $S_N f \to f$ (in some sense), we would be finished.
However, this is difficult because the kernels $D_N$ do not form a family of
good kernels. In fact, we can compute the Dirichlet kernel explicitly:
\[
D_N(x) = \frac{1}{2\pi} \frac{\sin([N + \frac12] x)}{\sin(\frac12 x)} \tag{2.2}
\]
(see the exercises). One can verify, for example, that the $L^1$-norm of $D_N$
grows like $\log N$. (Again, see the exercises.)
As we will see, averaging improves the situation. In particular, if we
define the Cesàro means by
\[
\sigma_N f = \tfrac{1}{N} \sum_{n=0}^{N-1} S_n f,
\]
then we may write
\[
\sigma_N f(x) = f * F_N(x),
\]
where $F_N$ is the Fejér kernel given by
\[
F_N(x) = \tfrac{1}{N} \sum_{n=0}^{N-1} D_n(x) = \tfrac{1}{2\pi N} \sum_{n=0}^{N-1} \sum_{k=-n}^{n} e^{ikx}.
\]
By assumption, we have $\sigma_N f \equiv 0$. On the other hand, we will prove that
the $F_N$ are a family of good kernels, so that $\sigma_N f \to f$ in $L^2$ as $N \to \infty$ (see
Lemma A.3.2). From this we can conclude $f = 0$, as desired.
First, a direct computation shows
\[
\int_{-\pi}^{\pi} F_N(x) \, dx = \tfrac{1}{2\pi N} \sum_{n=0}^{N-1} \sum_{k=-n}^{n} 2\pi \delta_{k0} = 1.
\]
For the next property, we use the identity
\[
F_N(x) = \frac{1}{2\pi N} \frac{[\sin(\frac{N}{2} x)]^2}{[\sin(\frac12 x)]^2},
\]
which we also leave as an exercise. In particular, $F_N(x) \ge 0$, so that
\[
\int_{-\pi}^{\pi} |F_N(x)| \, dx = 1
\]
as well. Finally, we fix $\delta > 0$ and observe that $|\sin(\frac12 x)| \gtrsim \delta$ for $\delta < |x| \le \pi$.
Thus, using the identity above for example, we find
\[
\int_{|x| > \delta} |F_N(x)| \, dx \lesssim \frac{1}{N \delta^2} \to 0 \quad\text{as } N \to \infty.
\]
The result follows.
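The contrast between the Dirichlet and Fejér kernels can be observed numerically. The Python sketch below evaluates the closed-form expressions (2.2) and the Fejér identity on a midpoint grid and approximates three quantities: the $L^1$-norm of $D_N$, the total mass of $F_N$, and the mass of $F_N$ away from the origin. The helper names, the grid size, and the choice $\delta = 0.5$ are assumptions made for illustration; the quadrature is a rough approximation, not a proof of the stated bounds.

```python
import math


def dirichlet(N, x):
    """D_N(x) = sin((N + 1/2) x) / (2 pi sin(x / 2)), for x not a multiple of 2 pi."""
    return math.sin((N + 0.5) * x) / (2.0 * math.pi * math.sin(0.5 * x))


def fejer(N, x):
    """F_N(x) = sin(N x / 2)^2 / (2 pi N sin(x / 2)^2), for x not a multiple of 2 pi."""
    return math.sin(0.5 * N * x) ** 2 / (2.0 * math.pi * N * math.sin(0.5 * x) ** 2)


def integrate_abs(kernel, N, region=lambda x: True, samples=20000):
    """Midpoint rule for the integral of |kernel(N, x)| over the part of [-pi, pi] where region(x) holds."""
    h = 2.0 * math.pi / samples  # an even number of samples keeps x = 0 off the grid
    total = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        if region(x):
            total += abs(kernel(N, x))
    return h * total


def far_from_origin(x):
    # Region |x| > delta with the illustrative choice delta = 0.5.
    return abs(x) > 0.5


for N in (8, 32, 128):
    print(N, integrate_abs(dirichlet, N), integrate_abs(fejer, N),
          integrate_abs(fejer, N, far_from_origin))
```

In such an experiment the first column should grow slowly with $N$ (consistent with logarithmic growth), the second should stay equal to 1, and the third should decay roughly like $1/N$.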
2.4 Convergence of Fourier series

We have now seen that Fourier series for periodic functions in $L^2([-L, L])$
(or equivalently, for functions on the torus/circle) converge in the sense of
$L^2$. It is a natural question to ask in what other senses the Fourier series of
a function converge.
The arguments in the proof of Lemma 2.3.1 show that if $f \in L^p$ with
$1 \le p < \infty$, then the Cesàro means $\sigma_N f$ converge to $f$ in $L^p$. Indeed, the
Fejér kernels are good kernels. Similarly, if $f$ is (uniformly) continuous, then
$\sigma_N f$ converges to $f$ uniformly. One can further show that for $f \in L^1$ and a
point $x$ at which both one-sided limits exist,
\[
\lim_{N \to \infty} \sigma_N f(x) = \tfrac12 [f(x+) + f(x-)], \tag{2.3}
\]
where $f(x\pm)$ denotes limits from the right/left (see the exercises). In
particular, $\sigma_N f(x)$ will converge to $f(x)$ at any point of continuity.
As for the convergence of $S_N f$ to $f$, we have so far established
convergence in $L^2$-norm. Other notions of convergence are much more subtle.
Let us begin with some negative results, which are essentially consequences
of the fact that the Dirichlet kernels are unbounded in $L^1$, along with the
uniform boundedness principle. We will continue to work on the interval
$[-\pi, \pi]$ for convenience.
Proposition 2.4.1. The following hold:
(i) The Fourier series of an $L^1$ function need not converge in $L^1$.
(ii) The Fourier series of a continuous function need not converge
uniformly.
(iii) There exists a continuous function with Fourier series diverging at a
point.
Proof. Recall that the partial Fourier sums of a function $f$ are given by
\[
S_N f = f * D_N, \qquad D_N(x) = \frac{1}{2\pi} \frac{\sin([N + \frac12] x)}{\sin(\frac12 x)},
\]
where we recall that
\[
\|D_N\|_{L^1} \gtrsim \log N.
\]
We also recall the Fejér kernels $F_N = \frac{1}{N} \sum_{n=0}^{N-1} D_n$.
(i) For each $n$, we may view $S_n$ as a linear operator from $L^1$ to $L^1$. We
define the operator norm of $S_n$ by
\[
\|S_n\|_{L^1 \to L^1} = \sup\{\|S_n f\|_{L^1} : f \in L^1 \text{ with } \|f\|_{L^1} = 1\}.
\]
Recalling that the Fejér kernels $F_N$ are uniformly bounded in $L^1$ (by 1), we
have
\[
\|S_n(F_N)\|_{L^1} \le \|S_n\|_{L^1 \to L^1} \|F_N\|_{L^1} \le \|S_n\|_{L^1 \to L^1}.
\]
On the other hand, $S_n(F_N) = \sigma_N(D_n)$ (check!), which yields
\[
\|S_n(F_N)\|_{L^1} = \|\sigma_N(D_n)\|_{L^1} \to \|D_n\|_{L^1} \quad\text{as } N \to \infty,
\]
where we use that the $F_N$ are good kernels. Thus $\|S_n\|_{L^1 \to L^1} \ge \|D_n\|_{L^1}$,
and we conclude that
\[
\lim_{n \to \infty} \|S_n\|_{L^1 \to L^1} = \infty. \tag{2.4}
\]
This implies that $S_n f$ cannot converge to $f$ in $L^1$ for every $f \in L^1$. Indeed, if
$S_n f \to f$ in $L^1$ for every $f \in L^1$, then $\{S_n f\}$ would be bounded in $L^1$ for
each $f$, and (2.4) would contradict the uniform boundedness principle.
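(ii) Similar to (i), we will show that
\[
\lim_{n \to \infty} \|S_n\|_{L^\infty \to L^\infty} = \infty
\]
by showing that
\[
\|S_n\|_{L^\infty \to L^\infty} \gtrsim \|D_n\|_{L^1}.
\]
We let $\psi_n(x) = \operatorname{signum}[D_n(x)]$, except in small intervals around the $2n$
points of discontinuity of $\operatorname{signum}[D_n(x)]$. In particular, we can make $\psi_n$
continuous, with
\[
\|\psi_n\|_{L^\infty} \le 1 \quad\text{uniformly in } n.
\]
Fixing $\varepsilon > 0$ and choosing the total length of the small intervals $\{I_k\}_{k=1}^{2n}$ to
be smaller than $\frac{\varepsilon}{2n}$, we may derive that
\[
\|S_n \psi_n\|_{L^\infty} \ge |S_n \psi_n(0)| \gtrsim \Big| \int D_n(x) \psi_n(x) \, dx \Big| \gtrsim \|D_n\|_{L^1} - \varepsilon.
\]
Indeed, we have
\[
\int D_n \psi_n(x) \, dx = \int |D_n(x)| \, dx + \sum_{k=1}^{2n} \int_{I_k} D_n(x) \big[ \psi_n(x) - \operatorname{signum}[D_n(x)] \big] \, dx.
\]
Using the fact that $\sup_x |D_n'(x)| \lesssim n^2$ and $\sum_{k=1}^{2n} |I_k| \le \frac{\varepsilon}{2n}$, we may derive
that $|D_n| \lesssim 1$ uniformly over $\cup I_k$, and hence the sum above is controlled
by $\varepsilon$ as claimed. This completes the proof.
(iii) Finally, consider the functionals $\ell_n : C([-\pi, \pi]) \to \mathbb{C}$ defined by
$f \mapsto S_n f(0)$. The proof of (ii) shows that $\|\ell_n\| \to \infty$. Thus, there must
exist $f$ such that $\ell_n(f) = S_n f(0)$ is unbounded in $n$ (and in particular does
not converge), for otherwise we would reach a contradiction
to the uniform boundedness principle.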
The previous result shows failure of convergence (in general) in $L^1$ and
$L^\infty$, but in fact $L^p$ convergence does hold for $1 < p < \infty$. Instead of proving
this general fact, let us simply prove the following positive result.
Proposition 2.4.2. Let $f$ be of bounded variation on $\mathbb{T}$. Then
\[
\lim_{n \to \infty} S_n f(x) = \tfrac12 [f(x+) + f(x-)],
\]
where $f(x\pm)$ denotes limits from the right/left. In particular, $S_n f(x)$
converges to $f(x)$ at any point of continuity of $f$.
Remark 2.4.3. This result is related to the well-known Gibbs phenomenon.
In particular, we see that at a jump discontinuity the Fourier series will
converge to the middle of the jump. It turns out that near the jump the
Fourier series will ‘overshoot’ and ‘undershoot’ the function on either side
of the jump in a way that does not diminish as one increases the number of
terms in the series.
Example 2.4.1. Let $f(x) = 0$ for $|x| \in (\frac{\pi}{2}, \pi)$ and $f(x) = a$ for $|x| < \frac{\pi}{2}$. Then
\[
c_n = \tfrac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx} \, dx = \tfrac{a}{2\pi} \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} e^{-inx} \, dx =
\begin{cases} \frac{a}{2} & n = 0, \\ \frac{a}{\pi n} \sin(\frac{n\pi}{2}) & n \neq 0. \end{cases}
\]
As this is an even sequence, the partial Fourier sums are given by
\[
S_N f(x) = \sum_{n=-N}^{N} c_n e^{inx} = \tfrac{a}{2} + \tfrac{2a}{\pi} \sum_{n=1}^{N} \tfrac{1}{n} \sin(\tfrac{n\pi}{2}) \cos(nx).
\]
Only odd $n = 2k + 1$ contribute, with $\sin(\frac{(2k+1)\pi}{2}) = (-1)^k$, so we may rewrite this as
\[
S_{2K+1} f(x) = \tfrac{a}{2} + \tfrac{2a}{\pi} \sum_{k=0}^{K} \frac{(-1)^k \cos((2k+1)x)}{2k+1}.
\]
In particular, note that $S_{2K+1} f(\pm\frac{\pi}{2}) \equiv \frac{a}{2}$, as we expect. However, let us
now consider $x = \frac{\pi}{2} + \varepsilon$. Then (employing some trig identities) the series
becomes
\[
g_K(\varepsilon) := -\sum_{k=0}^{K} \frac{\sin((2k+1)\varepsilon)}{2k+1},
\]
with $g_K(0) \equiv 0$ as expected. However we can see that $|g_K'(0)| = K + 1$, which
suggests that this series can reach size 1 even over an interval of length $1/K$.
In fact, more detailed analysis (or studying this example in Mathematica)
would reveal that (for all large $K$) $g_K(\varepsilon)$ reaches size about $-.92$ for
$\varepsilon = 0+$ and $+.92$ for $\varepsilon = 0-$, and hence we see that the Fourier series will
undershoot/overshoot the correct value of the function by a fixed amount,
yielding approximate values
\[
\tfrac{a}{2} \mp \tfrac{a}{2} \big( \tfrac{4}{\pi} \cdot .92 \big)
\]
as $x$ approaches $\frac{\pi}{2}$ from the right/left.
This is just a special case of the more general Gibbs phenomenon, in
which one sees the Fourier series undershooting/overshooting the correct
value at a jump discontinuity by about 9% of the jump on each side.
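The overshoot in Example 2.4.1 is easy to observe numerically. The following Python sketch evaluates the partial sums $S_N f$ from the cosine formula in the example and records the largest value of $S_N f(x) - a$ on a grid just to the left of the jump at $x = \frac{\pi}{2}$. The function names, the window width, and the grid resolution are illustrative assumptions; the computation only samples the partial sums and is not a substitute for the analysis of $g_K$.

```python
import math


def partial_sum(x, N, a=1.0):
    """S_N f(x) = a/2 + (2a/pi) * sum_{n=1}^{N} sin(n pi/2) cos(n x) / n for the step function."""
    total = 0.0
    for n in range(1, N + 1):
        total += math.sin(0.5 * n * math.pi) * math.cos(n * x) / n
    return 0.5 * a + (2.0 * a / math.pi) * total


def max_overshoot(N, a=1.0, window=0.2, steps=2000):
    """Largest value of S_N f(x) - a on a grid in (pi/2 - window, pi/2), where f = a."""
    best = -float("inf")
    for j in range(1, steps + 1):
        x = 0.5 * math.pi - window * j / steps
        best = max(best, partial_sum(x, N, a) - a)
    return best


for N in (51, 201):
    # The overshoot stays near 9% of the jump a, even as N grows.
    print(N, partial_sum(0.5 * math.pi, N), max_overshoot(N))
```

Increasing $N$ moves the peak closer to the jump but, in line with the discussion above, should not reduce its height.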
Proof of Proposition 2.4.2. As we will see, the key is to establish some decay
for the Fourier coefficients $\hat{f}(n)$ of $f$.
In fact, we claim that
\[
|\hat{f}(n)| \lesssim \tfrac{1}{|n|} \|f\|_{BV} \quad\text{for } |n| \ge 1. \tag{2.5}
\]
Indeed, this follows from the integration by parts formula for Riemann–
Stieltjes integrals (see Proposition A.1.3):
\[
|\hat{f}(n)| = \Big| \tfrac{1}{2\pi} \int e^{-inx} f(x) \, dx \Big| = \Big| \tfrac{1}{2\pi i n} \int e^{-inx} \, df(x) \Big| \lesssim \tfrac{1}{|n|} \|f\|_{BV}.
\]
The next step is to restate things in terms of $\sigma_n f$, which are known to
satisfy the conclusion of the proposition (cf. (2.3)). First, observe that by
expanding the definition of $(n + 1)\sigma_{n+1}$, we can deduce the identity
\begin{align*}
\sigma_{n+1} f(x) &= \sum_{|k| \le n} \big[ 1 - \tfrac{|k|}{n+1} \big] \hat{f}(k) e^{ikx} \\
&= S_n f(x) - \sum_{|k| \le n} \tfrac{|k|}{n+1} \hat{f}(k) e^{ikx}.
\end{align*}
Now let $m > n$ be an integer to be determined shortly. Similar to the above,
we can write
\[
\sigma_{m+1} f(x) = S_n f(x) - \sum_{|k| \le n} \tfrac{|k|}{m+1} \hat{f}(k) e^{ikx} + \sum_{n < |k| \le m} \big( 1 - \tfrac{|k|}{m+1} \big) \hat{f}(k) e^{ikx}.
\]
Now we can see that any linear combination of the form
\[
\alpha \sigma_{n+1} f + (1 - \alpha) \sigma_{m+1} f
\]
will produce a single copy of $S_n f$ plus some ‘error terms’. As any such
combination converges to the desired limit as $n, m \to \infty$, we can complete
the proof if we can find a suitable combination for which we can control the
error terms.
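The sums involving $|k| \le n$ are the most problematic, as they do not
tend to zero individually; for example, applying (2.5), the best estimate we
have for the term appearing in $\sigma_{n+1}$ yields
\[
\sum_{|k| \le n} \tfrac{|k|}{n+1} |\hat{f}(k)| \lesssim \tfrac{n}{n+1},
\]
which does not converge to 0 as $n \to \infty$. Thus, we choose a combination
in order to make the two sums over $|k| \le n$ cancel. In particular (choosing
$\alpha = -\frac{n+1}{m-n}$, so that $1 - \alpha = \frac{m+1}{m-n}$), we may write
\[
S_n f = -\tfrac{n+1}{m-n} \sigma_{n+1} f + \tfrac{m+1}{m-n} \sigma_{m+1} f - \tfrac{m+1}{m-n} \sum_{n < |k| \le m} \big( 1 - \tfrac{|k|}{m+1} \big) \hat{f}(k) e^{ikx}.
\]
Our final step is to show that for $m$ chosen suitably depending on $n$, this
final term can be made arbitrarily small (uniformly as $n \to \infty$). In fact,
using (2.5), we can estimate
\[
\sum_{n < |k| \le m} \Big| \tfrac{m+1}{m-n} \big( 1 - \tfrac{|k|}{m+1} \big) \hat{f}(k) \Big| \lesssim \sum_{n < |k| \le m} \tfrac{1}{|k|} \lesssim \log(\tfrac{m}{n}),
\]
where we have used the bound
\[
\Big| \tfrac{m+1}{m-n} \big( 1 - \tfrac{|k|}{m+1} \big) \Big| = \tfrac{m+1-|k|}{m-n} \le \tfrac{m-n}{m-n} = 1 \quad\text{for } n < |k| \le m.
\]
Given $\varepsilon > 0$, the result now follows by choosing $m$ to be the integer part of
$(1 + \delta)n$ for small $\delta = \delta(\varepsilon) > 0$, so that $\log(\frac{m}{n}) \le \log(1 + \delta)$ is small.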
Remark 2.4.4. In the preceding proof, we used some regularity condition
on $f$ to prove that the Fourier coefficients converged to zero quantitatively
as $n \to \infty$. In fact, from Bessel’s inequality we know that the Fourier
coefficients of an $L^2$ function always tend to zero as $n \to \infty$; this is also sometimes
called the Riemann–Lebesgue lemma. However, without imposing some
regularity conditions, it is possible to have a sequence of Fourier coefficients
converging to zero arbitrarily slowly.
The phenomenon that smoothness yields decay of Fourier coefficients
(and vice versa!) is an important fact in Fourier analysis. Revisiting the
proof of (2.5), for example, one can see that being $k$-times continuously
differentiable would imply decay of the Fourier coefficients like $|n|^{-k}$ (by
repeating the integration by parts $k$ times). In fact, more can be said. In
the case of the torus, it is a fact that a function is analytic if and only if its
Fourier coefficients decay exponentially. We will not pursue this result on
the torus, but will prove a related result (the Paley–Wiener theorem) for the
Fourier transform in Section 2.8. Also see the exercises, where these
topics are explored a bit more.
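The link between regularity and decay can be checked on two simple examples. The Python sketch below approximates the coefficients $\hat{f}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$ for the step function of Example 2.4.1 (with $a = 1$) and for the continuous function $f(x) = |x|$. The second function and its closed-form coefficients $\frac{(-1)^n - 1}{\pi n^2}$ are standard facts brought in here for comparison; they do not appear in the notes. The midpoint quadrature and all names are illustrative assumptions.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=8000):
    """Midpoint-rule approximation of (1 / 2 pi) * integral_{-pi}^{pi} f(x) exp(-i n x) dx."""
    h = 2.0 * math.pi / samples
    total = 0.0 + 0.0j
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return h * total / (2.0 * math.pi)


def step(x):
    # Step function of Example 2.4.1 with a = 1: bounded variation, with jumps at +-pi/2.
    return 1.0 if abs(x) < 0.5 * math.pi else 0.0


def tent(x):
    # Continuous and piecewise smooth (its periodic extension has corners at 0 and +-pi).
    return abs(x)


for n in (1, 3, 5, 21):
    # For odd n one expects n * |c_n| = 1/pi for the step and n^2 * |c_n| = 2/pi for the tent.
    print(n, n * abs(fourier_coefficient(step, n)), n * n * abs(fourier_coefficient(tent, n)))
```

The jump discontinuity produces decay like $|n|^{-1}$, matching (2.5), while removing the jump (but keeping a corner) improves the decay to $|n|^{-2}$.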
To close the discussion, let us finally mention the deep result of Carleson:
Theorem 2.4.5 (Carleson). Fourier series of functions in $L^2$ converge
pointwise almost everywhere.
We will briefly discuss this result below in Section 2.5.1.
One can extend much of what we have done above to the case of higher
dimensional tori, although we will not pursue the details here. We also
remark that it is possible to study analogues of Fourier series on more general
groups than the torus (e.g. compact Lie groups). We will venture briefly in
this direction in Chapter 4 below. For now, we turn to the extension of the
preceding ideas from $[-L, L]$ to the whole real line.
2.5 The Fourier transform

We have seen that for a periodic $L^2$ function $f : [-L, L] \to \mathbb{C}$, we can write
$f$ as a linear combination of waves of frequencies $\frac{n}{2L}$, namely,
\[
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{\frac{i n \pi x}{L}}, \qquad c_n = \tfrac{1}{2L} \int_{-L}^{L} f(x) e^{-\frac{i n \pi x}{L}} \, dx. \tag{2.6}
\]
The Fourier transform extends this to the case $L \to \infty$. For $f : \mathbb{R} \to \mathbb{C}$,
we define $\hat{f} : \mathbb{R} \to \mathbb{C}$ formally by
\[
\hat{f}(\xi) = \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-ix\xi} \, dx.
\]
That is, $\hat{f}(\xi)$ is the ‘Fourier coefficient’ at frequency $\xi \in \mathbb{R}$. The question is
then whether or not we can recover $f$ from $\hat{f}$; i.e. do we have an analogue
of (2.6)?
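Suppose that $f : \mathbb{R} \to \mathbb{C}$ satisfies $f(x) = 0$ for $|x| > M$, and let $L > M$.
Then
\[
c_n = \tfrac{\pi}{L} \tfrac{1}{2\pi} \int_{\mathbb{R}} f(x) e^{-\frac{i n \pi x}{L}} \, dx = \tfrac{1}{\sqrt{2\pi}} \tfrac{\pi}{L} \hat{f}(\tfrac{n\pi}{L}),
\]
and hence for fixed $x \in \mathbb{R}$ we have
\[
f(x) = \tfrac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} \tfrac{\pi}{L} \hat{f}(\tfrac{n\pi}{L}) e^{\frac{i n \pi x}{L}}.
\]
Writing $\varepsilon = \frac{\pi}{L}$ and $G(y) = \hat{f}(y) e^{ixy}$, we can send $L \to \infty$ (i.e. $\varepsilon \to 0$) to
formally deduce
\[
f(x) = \tfrac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} \varepsilon G(\varepsilon n) \to \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} G(\xi) \, d\xi = \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \hat{f}(\xi) e^{ix\xi} \, d\xi.
\]
Thus we arrive formally at the Fourier inversion formula
\[
f(x) = \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \hat{f}(\xi) e^{ix\xi} \, d\xi, \quad\text{where}\quad \hat{f}(\xi) = \tfrac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-ix\xi} \, dx.
\]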
We call the function $\hat{f}$ the Fourier transform of $f$. This extends naturally
to higher dimensions as follows: for $f : \mathbb{R}^d \to \mathbb{C}$ and $\xi \in \mathbb{R}^d$,
\[
\hat{f}(\xi) := (2\pi)^{-\frac{d}{2}} \int_{\mathbb{R}^d} f(x) e^{-ix\xi} \, dx,
\]
where $x\xi$ really denotes $x \cdot \xi = x_1 \xi_1 + \cdots + x_d \xi_d$.
Remark 2.5.1. Recall the viewpoint that the Fourier series of a function
is an expansion in terms of eigenfunctions of −∂x2 (or, in higher dimensions,
the Laplacian). Here we see that the Fourier transform of a function is an
expansion in terms of generalized eigenfunctions of −∂x2 on R (or, in higher
dimensions, the Laplacian). The difference is that the Laplacian has discrete
eigenvalues (or spectrum) on a compact domain, while it has continuous
spectrum on Rd . Spectral theory allows for a unified interpretation of Fourier
series/transform, namely, as a spectral resolution of the Laplacian. We
will also see a unification of Fourier series and Fourier transform through
the perspective of Fourier analysis on locally compact abelian groups in
Chapter 4.
Remark 2.5.2. There are other normalizations for the Fourier transform.
A common one is to define
\[ \hat f(\xi) = \int e^{-2\pi ix\xi} f(x)\,dx. \]
We will use this normalization in the next chapter and at times below.
We turn to the details. Note that fˆ is not necessarily well-defined for

an arbitrary function f : Rd → C, as the integral may not converge. On the
other hand, for any f ∈ L1 we have fˆ well-defined as a bounded function on
Rd .
We will begin by restricting to a nice function space, which (as we will
see) is very compatible with the Fourier transform.
Definition 2.5.3 (Schwartz space). We define
S(Rd ) = {f ∈ C ∞ (Rd ) : xα ∂ β f ∈ L∞ for all multi-indices α, β}.

The Schwartz space is a topological vector space, with the topology
generated by the open sets
\[ \{f \in \mathcal{S}(\mathbb{R}^d) : \|x^\alpha \partial^\beta (f - g)\|_{L^\infty} < \varepsilon\} \]
for some $g \in \mathcal{S}(\mathbb{R}^d)$, $\varepsilon > 0$, and multi-indices $\alpha, \beta$. More can be said about
the structure of Schwartz space, but it will not be too relevant for our
discussions here.
If f is a Schwartz function, then f is absolutely integrable, and hence
the Fourier transform of f is well-defined pointwise. In fact, we will show
that fˆ is also a Schwartz function! We begin with the following lemma.
Lemma 2.5.4. Let f ∈ S(Rd ).
• If g(x) = ∂ α f (x), then ĝ(ξ) = (iξ)α fˆ(ξ).
• If g(x) = (−ix)α f (x), then ĝ(ξ) = ∂ α fˆ(ξ).
Proof. Let us consider the simplest case of $d = 1$ and a single derivative
or power of $x$, leaving the rest as an exercise. First, if $g(x) = f'(x)$, then
integration by parts (and the fact that $f \to 0$ as $|x| \to \infty$) yields
\[ \hat g(\xi) = (2\pi)^{-\frac12} \int_{\mathbb{R}} e^{-ix\xi} f'(x)\,dx = (2\pi)^{-\frac12}\, i\xi \int_{\mathbb{R}} e^{-ix\xi} f(x)\,dx = i\xi \hat f(\xi), \]
as desired. Similarly, if $g(x) = -ixf(x)$, then
\[ \hat g(\xi) = -(2\pi)^{-\frac12} \int_{\mathbb{R}} ix\,e^{-ix\xi} f(x)\,dx = (2\pi)^{-\frac12} \int_{\mathbb{R}} f(x)\,\tfrac{d}{d\xi}[e^{-ix\xi}]\,dx = \tfrac{d}{d\xi} \hat f(\xi). \]
This completes the proof.
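As a numerical sanity check of Lemma 2.5.4 in dimension one, the following sketch approximates the Fourier transform by a Riemann sum and compares both sides of each identity. The helper `fourier_transform`, the grids, the step `h` and the shifted Gaussian test function are illustrative choices, not part of the notes; the derivative in $\xi$ is approximated by a central difference.

```python
import numpy as np

def fourier_transform(values, x, xi):
    """Riemann-sum approximation of (2*pi)^(-1/2) * int f(x) exp(-i x xi) dx."""
    dx = x[1] - x[0]
    phases = np.exp(-1j * np.outer(xi, x))  # one row per frequency xi
    return phases @ values * dx / np.sqrt(2.0 * np.pi)

x = np.linspace(-20.0, 20.0, 4001)   # spatial grid; f is negligible at the ends
xi = np.linspace(-5.0, 5.0, 101)     # frequencies at which we compare

f = np.exp(-(x - 1.0) ** 2 / 2.0)    # test function (a shifted Gaussian)
f_prime = -(x - 1.0) * f             # its exact derivative

f_hat = fourier_transform(f, x, xi)

# First identity: the transform of f' equals (i xi) * f_hat.
err_derivative = np.max(np.abs(fourier_transform(f_prime, x, xi) - 1j * xi * f_hat))

# Second identity: the transform of (-i x) f equals d/dxi of f_hat,
# with the xi-derivative approximated by a central difference of step h.
h = 1e-4
d_f_hat = (fourier_transform(f, x, xi + h) - fourier_transform(f, x, xi - h)) / (2.0 * h)
err_multiplication = np.max(np.abs(fourier_transform(-1j * x * f, x, xi) - d_f_hat))

print(err_derivative, err_multiplication)
```

Both errors should be small; the second is limited by the finite-difference step rather than by the quadrature.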
This lemma already suggests the connection between the Fourier trans-
form and partial differential equations (PDE): it interchanges taking deriva-
tives and multiplication by x. We leave the following corollary as an exercise:
Corollary 2.5.5. If f ∈ S(Rd ) then fˆ ∈ S(Rd ).
Thus, defining the transformation F which takes f ∈ S(Rd ) and returns
fˆ ∈ S(Rd ), we can see that F (also called the Fourier transform) is a well-
defined linear transformation on S. Linearity is straightforward to check and
is left to the reader. In fact, more is true:
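Theorem 2.5.6. The Fourier transform $\mathcal{F}$ is a bijection on Schwartz space
$\mathcal{S}(\mathbb{R}^d)$, and the Fourier inversion formula holds. That is,
\[ f(x) = (2\pi)^{-\frac{d}{2}} \int_{\mathbb{R}^d} \hat f(\xi)e^{ix\xi}\,d\xi \quad\text{for } f \in \mathcal{S}(\mathbb{R}^d). \]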
To prove this theorem, we need a few auxiliary lemmas.
Lemma 2.5.7 (Multiplication formula). For $f, g \in \mathcal{S}(\mathbb{R}^d)$, we have
\[ \int_{\mathbb{R}^d} f(x)\hat g(x)\,dx = \int_{\mathbb{R}^d} \hat f(y)g(y)\,dy. \]
Proof. This is a consequence of Fubini’s theorem.
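Lemma 2.5.8. Let $f \in \mathcal{S}(\mathbb{R}^d)$. Then the following hold:
• If $g(x) = f(-x)$, then $\hat g(\xi) = \hat f(-\xi)$.
• If $g(x) = f(x - h)$, then $\hat g(\xi) = e^{-ih\xi} \hat f(\xi)$.
• If $g(x) = f(\lambda x)$ with $\lambda > 0$, then $\hat g(\xi) = \lambda^{-d} \hat f(\tfrac{\xi}{\lambda})$.
Proof. Let us verify the third identity and leave the first two as exercises.
This follows from the change of variables $y = \lambda x$. Indeed,
\[ \int_{\mathbb{R}^d} e^{-ix\xi} f(\lambda x)\,dx = \lambda^{-d} \int_{\mathbb{R}^d} e^{-\frac{iy\xi}{\lambda}} f(y)\,dy. \]
The result follows.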
As a consequence of the first two identities, we observe that
if g(y) = f (x − y), then ĝ(y) = e−ixy fˆ(−y).

Lemma 2.5.9. Define $f(x) = e^{-|x|^2/2}$. Then $f \in \mathcal{S}(\mathbb{R}^d)$ and $\hat f = f$.
Proof. Let us prove this identity for the case $d = 1$. We leave it to the reader
to see why this implies the general case.
We make an ODE argument. Using the fact that $f'(x) = -xf(x)$ and
Lemma 2.5.4, we find that $\frac{d}{d\xi} \hat f = -\xi \hat f$. Therefore $\hat f(\xi) = e^{-\xi^2/2} \hat f(0)$. However,
\[ \hat f(0) = (2\pi)^{-\frac12} \int_{\mathbb{R}} e^{-x^2/2}\,dx = 1. \]
Thus the result follows.
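The fixed-point property of the Gaussian is easy to test numerically. The sketch below (grid sizes are arbitrary illustrative choices) approximates $\hat f$ for $f(x) = e^{-x^2/2}$ in dimension one by a Riemann sum and compares it with $e^{-\xi^2/2}$; it also checks the normalization $\hat f(0) = 1$ used in the proof.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 4001)   # integration grid
dx = x[1] - x[0]
xi = np.linspace(-6.0, 6.0, 121)     # frequencies at which we compare

f = np.exp(-x ** 2 / 2.0)

# f_hat(xi) = (2 pi)^(-1/2) * int f(x) exp(-i x xi) dx, approximated by a Riemann sum.
f_hat = np.exp(-1j * np.outer(xi, x)) @ f * dx / np.sqrt(2.0 * np.pi)

gaussian_error = np.max(np.abs(f_hat - np.exp(-xi ** 2 / 2.0)))
f_hat_at_zero = np.sum(f) * dx / np.sqrt(2.0 * np.pi)

print(gaussian_error, f_hat_at_zero)
```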
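In the following, we will define $K(x) = (2\pi)^{-\frac{d}{2}} e^{-|x|^2/2}$ and for $\varepsilon > 0$ set
\[ K_\varepsilon(x) = \varepsilon^{-d} K(\tfrac{x}{\varepsilon}). \]
It follows that $K_\varepsilon$ form a family of good kernels as $\varepsilon \to 0$. Furthermore, by
the scaling property of the Fourier transform proven above, we have that
\[ \text{if } G_\varepsilon(x) = K(\varepsilon x), \text{ then } \hat G_\varepsilon = K_\varepsilon. \]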
We can now prove Theorem 2.5.6.
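Proof of Theorem 2.5.6. We begin with the inversion formula for $f \in \mathcal{S}(\mathbb{R}^d)$.
Using the lemmas above, we compute
\[ \begin{aligned} f * K_\varepsilon(x) &= \int_{\mathbb{R}^d} f(x - y)K_\varepsilon(y)\,dy \\ &= \int_{\mathbb{R}^d} f(x - y)\hat G_\varepsilon(y)\,dy \\ &= \int_{\mathbb{R}^d} e^{-ixy} \hat f(-y)K(\varepsilon y)\,dy = \int_{\mathbb{R}^d} \hat f(y)e^{ixy} K(-\varepsilon y)\,dy. \end{aligned} \]
Sending $\varepsilon$ to zero (using on the left that $K_\varepsilon$ is a family of good kernels, and on
the right applying dominated convergence and noting $K(0) = (2\pi)^{-\frac{d}{2}}$), we deduce
the inversion formula
\[ f(x) = (2\pi)^{-\frac{d}{2}} \int_{\mathbb{R}^d} \hat f(y)e^{ixy}\,dy. \]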
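To see that $\mathcal{F}$ is a bijection, we can define $\tilde{\mathcal{F}} : \mathcal{S} \to \mathcal{S}$ via
\[ \tilde{\mathcal{F}} g(x) = (2\pi)^{-\frac{d}{2}} \int_{\mathbb{R}^d} e^{ix\xi} g(\xi)\,d\xi \]
and observe that the Fourier inversion formula yields $\tilde{\mathcal{F}} \circ \mathcal{F} = 1$ on $\mathcal{S}$.
Combining this with the fact that $\tilde{\mathcal{F}} f(y) = \mathcal{F} f(-y)$, it follows that $\mathcal{F} \circ \tilde{\mathcal{F}} = 1$
on $\mathcal{S}$ as well. We conclude that $\tilde{\mathcal{F}} = \mathcal{F}^{-1}$ and $\mathcal{F}$ is a bijection on $\mathcal{S}$.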
While the Schwartz space is clearly well-suited for the Fourier transform,
it is not the end of the story. Given what we have learned about Fourier
series, it is natural to seek an extension of the Fourier transform to L2 (Rd ).
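The key to this is the Plancherel formula. Before we state and prove it,
let us recall the inner product structure on $L^2(\mathbb{R}^d)$, namely,
\[ \langle f, g \rangle = \int_{\mathbb{R}^d} f(x)\bar g(x)\,dx, \quad\text{and}\quad \|f\|_{L^2} = \sqrt{\langle f, f \rangle}. \]
Theorem 2.5.10 (Plancherel). For $f, g \in \mathcal{S}(\mathbb{R}^d)$, we have
\[ \langle f, g \rangle = \langle \hat f, \hat g \rangle. \]
In particular, $\|f\|_{L^2} = \|\hat f\|_{L^2}$.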
We begin with a convolution identity that is of more general use.
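Lemma 2.5.11. For $f, g \in \mathcal{S}(\mathbb{R}^d)$,
\[ \mathcal{F}(f * g)(\xi) = (2\pi)^{\frac{d}{2}} \hat f(\xi)\hat g(\xi). \]
Proof. We compute directly:
\[ \mathcal{F}(f * g)(\xi) = (2\pi)^{-\frac{d}{2}} \iint g(y)e^{-iy\xi} f(x - y)e^{-i(x - y)\xi}\,dx\,dy = (2\pi)^{\frac{d}{2}} \hat f(\xi)\hat g(\xi), \]
where we have used Fubini’s theorem.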
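The convolution identity, including the factor $(2\pi)^{d/2}$ specific to this normalization, can be checked numerically in dimension one. The following is a minimal sketch: the helper `fourier_transform`, the grids and the two Gaussian test functions are illustrative assumptions, and the convolution is approximated by a discrete sum on a symmetric grid.

```python
import numpy as np

def fourier_transform(values, x, xi):
    """Riemann-sum approximation of (2*pi)^(-1/2) * int f(x) exp(-i x xi) dx."""
    dx = x[1] - x[0]
    return np.exp(-1j * np.outer(xi, x)) @ values * dx / np.sqrt(2.0 * np.pi)

x = np.linspace(-20.0, 20.0, 2001)   # symmetric grid with an odd number of points
dx = x[1] - x[0]
xi = np.linspace(-4.0, 4.0, 81)

f = np.exp(-x ** 2 / 2.0)
g = np.exp(-(x - 1.0) ** 2)

# (f * g)(x_n) is approximated by dx * sum_m f(x_m) g(x_n - x_m). On a symmetric
# grid with an odd number of points, mode="same" returns these sums at the x_n.
conv = np.convolve(f, g, mode="same") * dx

lhs = fourier_transform(conv, x, xi)
rhs = np.sqrt(2.0 * np.pi) * fourier_transform(f, x, xi) * fourier_transform(g, x, xi)
convolution_error = np.max(np.abs(lhs - rhs))

print(convolution_error)
```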
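Now we can prove the Plancherel formula:
Proof. Define $G(x) = \bar g(-x)$, so that $\hat G(\xi) = \overline{\hat g(\xi)}$. Then by the Fourier
inversion formula and the convolution identity above,
\[ \begin{aligned} \int f(x)\bar g(x)\,dx &= f * G(0) \\ &= (2\pi)^{-\frac{d}{2}} \int \mathcal{F}(f * G)(\xi)\,d\xi \\ &= \int \hat f(\xi)\hat G(\xi)\,d\xi = \int_{\mathbb{R}^d} \hat f(\xi)\overline{\hat g(\xi)}\,d\xi, \end{aligned} \]
as desired.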
Using Theorem 2.5.10, we can now extend the Fourier transform to a

linear operator acting on L2 . In fact, the Fourier transform acts as a unitary
operator on L2 .
Theorem 2.5.12. The Fourier transform extends from S(Rd ) to a unitary

map on L2 (Rd ).
We will use the following lemma, which is left as an exercise.
Lemma 2.5.13. Schwartz space is dense in L2 .

Proof of Theorem 2.5.12. We let f ∈ L2 and choose fn ∈ S(Rd ) such that

fn → f in L2 -norm. As F restricted to S(Rd ) is an isometry, it follows that
{fˆn } is Cauchy in L2 . We define fˆ to be the L2 limit of fˆn and set Ff = fˆ.
To see that $\mathcal{F}f$ is well-defined, suppose $g_n$ is another sequence in $\mathcal{S}(\mathbb{R}^d)$
that converges to $f$ in $L^2$. Let $h_n$ be the interlacing of $f_n$ and $g_n$ (that is,
$f_1, g_1, f_2, g_2, \dots$), so that $h_n \to f$ in $L^2$. Then $\hat h_n$ is Cauchy in $L^2$ and hence
converges. As the subsequence $\hat f_n$ converges to $\hat f$, it follows that the
subsequence $\hat g_n$ must also converge to $\hat f$.
The fact that F is an isometry on L2 follows from the corresponding
property on Schwartz space. Let us finally show that F maps onto L2 (so
that F is unitary). As the range of F contains a dense subclass of L2
(namely, the Schwartz functions), it suffices to show that the range of F
(which is a linear subspace of L2 ) is closed. To this end, we let g be in the
L2 -closure of the range of F, so that there exist fn ∈ L2 so that fˆn → g. As
$\mathcal{F}$ is an isometry, $\{f_n\}$ is Cauchy in $L^2$. Denoting the limit by $f$, we apply
the isometry property one more time to deduce that $\hat f_n \to \hat f$. This implies
$\hat f = g$, which completes the proof.
In fact, the Fourier transform extends from $\mathcal{S}(\mathbb{R}^d)$ to $L^p(\mathbb{R}^d)$ for all
$1 \le p \le 2$, with a corresponding bound (known as the Hausdorff–Young
inequality):
\[ \|\hat f\|_{L^{p'}} \lesssim \|f\|_{L^p} \quad\text{for all } 1 \le p \le 2, \text{ where } \tfrac{1}{p} + \tfrac{1}{p'} = 1. \]
In fact, this is the only range for which this is possible. That is, if the
estimate
\[ \|\hat f\|_{L^q} \lesssim \|f\|_{L^p} \]
holds for all Schwartz functions, then $q = p'$ and $1 \le p \le 2$. We will discuss
the Hausdorff–Young inequality later in the setting of interpolation; as for
the second point, we leave it as an exercise (Exercise 2.9.14).
2.5.1 Remarks about pointwise convergence

Let us briefly discuss the question of pointwise convergence of the Fourier
transform; this is closely related to the problem of convergence of Fourier
series (cf. Theorem 2.4.5 above). We will not discuss the full proof of
Carleson’s theorem; we refer the reader to [14, 13] for a streamlined proof.
Instead, let us show how the result is implied by a ‘weak type (2, 2) bound’
for a suitable ‘maximal operator’ (cf. Definition 6.1.1 below, for example).
In particular, this discussion may serve in part to preview some of the topics
to be covered later in these notes.
Recall (from Theorem 2.5.6) that we have the Fourier inversion formula
\[ f(x) = \lim_{N \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-N}^{N} e^{ix\xi} \hat f(\xi)\,d\xi, \quad x \in \mathbb{R}, \tag{2.7} \]
for all Schwartz functions f ∈ S(R). Carleson’s theorem [2, 9] states that the
same convergence holds almost everywhere for functions f that are merely
in L2 (R).
The integral appearing on the right-hand side of (2.7) can be written
in terms of the convolution of $f$ with the Dirichlet kernel $D_N(x) = \frac{\sin Nx}{\pi x}$.
Thus, the convolution is a combination of the singular integral part $\frac{1}{x}$ (which
would fall under the purview of Calderón–Zygmund theory, cf. Section 6.4
below) and the oscillatory part $\sin Nx$, which requires some additional
techniques.
Following [14], we will work with the (equivalent) one-sided inversion
formula
\[ f(x) = \lim_{N \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{N} e^{ix\xi} \hat f(\xi)\,d\xi, \tag{2.8} \]
which again holds on R for any Schwartz function f . As Schwartz functions
are dense in L2 (Exercise 2.9.13), the problem reduces to proving that the
set of functions for which almost everywhere convergence holds is closed.
For such purposes, it is common to introduce a suitable maximal function.
In particular, we define the Carleson operator
\[ \mathcal{C}f(x) = \sup_{N} \Bigl| \int_{-\infty}^{N} e^{ix\xi} \hat f(\xi)\,d\xi \Bigr|, \quad x \in \mathbb{R}. \]
We can then show that a weak type (2, 2) bound for C implies the desired
result. (One can compare this to the fact that the weak type (1, 1) bound for
the Hardy–Littlewood maximal function implies the Lebesgue differentiation
theorem, for example; see Proposition 6.3.5.)
Proposition 2.5.14. Suppose that $\mathcal{C}$ obeys the weak type (2, 2) bound
\[ |\{\mathcal{C}f > \lambda\}| \lesssim \lambda^{-2} \|f\|_{L^2}^2. \]
Then (2.8) holds almost everywhere for any $f \in L^2$.

Proof. Let $f \in L^2$ and $\varepsilon > 0$. Choose $g \in \mathcal{S}$ such that
\[ \|f - g\|_{L^2} < \varepsilon^{\frac{3}{2}}. \]
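Now, setting
\[ Lf(x) = \limsup_{N \to \infty} \Bigl| f(x) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{N} e^{ix\xi} \hat f(\xi)\,d\xi \Bigr|, \]
we note that since (2.8) holds for $g$ (for all $x$), we may write
\[ Lf \le \mathcal{C}(f - g) + |f - g|. \]
Using the weak type (2, 2) bound, we find
\[ |\{\mathcal{C}(f - g) > \tfrac12 \varepsilon\}| \lesssim \varepsilon^{-2} \|f - g\|_{L^2}^2 \lesssim \varepsilon. \]
Similarly, by Tchebychev’s inequality,
\[ |\{|f - g| > \tfrac12 \varepsilon\}| \lesssim \varepsilon^{-2} \|f - g\|_{L^2}^2 \lesssim \varepsilon. \]
Thus
\[ |\{Lf > \varepsilon\}| \lesssim \varepsilon \]
for any $\varepsilon > 0$. It follows that $Lf = 0$ almost everywhere, which implies the
desired result.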
Actually proving the weak type bound for the Carleson operator is quite
an undertaking. We again refer the interested reader to [14, 13] and end our
discussion of Carleson’s theorem here.
2.6 Applications to PDE

In this section, we discuss a few applications of the Fourier transform to the
solution of some linear partial differential equations.
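Example 2.6.1. Consider the Poisson/Laplace equation
\[ -\Delta u = f, \qquad u : \mathbb{R}^3 \to \mathbb{R}, \]
where $f : \mathbb{R}^3 \to \mathbb{R}$ is a given function. Here $\Delta$ is the Laplacian,
\[ \Delta u = \sum_{j=1}^{3} \frac{\partial^2 u}{\partial x_j^2}. \]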
Applying the Fourier transform and Lemma 2.5.4, we find that the equation
is equivalent to
\[ |\xi|^2 \hat u = \hat f, \quad\text{so that}\quad \hat u = |\xi|^{-2} \hat f = (2\pi)^{-\frac{3}{2}} \mathcal{F}(K * f)(\xi), \]
where $K = \mathcal{F}^{-1}(|\xi|^{-2})$. In particular, $u(x) = (2\pi)^{-\frac{3}{2}} K * f(x)$.
Here we reach a bit of a subtle point: the function |ξ|−2 is not a Schwartz
function, nor is it even an L2 function! Let us ignore this subtlety for
the moment—it can be resolved by the theory of distributions (see below).
Instead, let us see if we can compute a formula for K anyway.
There is an elegant way to compute K(x) exactly using the Gamma
function (see the exercises). Let us instead argue by symmetry to deduce
the form of K(x). First, we observe by Lemma 2.5.8 that
\[ |\lambda\xi|^{-2} = \lambda^{-2} |\xi|^{-2} \implies K(\lambda x) = \lambda^{-1} K(x). \]
Next, because $|\xi|^{-2}$ is invariant under rotations, so is $K$ (see the exercises).
Consequently, $K$ is constant on the unit sphere. It follows that
\[ K(x) = |x|^{-1} K(\tfrac{x}{|x|}) = c|x|^{-1} \quad\text{for some } c \in \mathbb{R}. \]
To compute $c$, let us go back to the PDE. Noting that
\[ u(x) = (2\pi)^{-\frac{3}{2}} c\,|x|^{-1} * f, \quad\text{we want}\quad -|x|^{-1} * \Delta f = c^{-1} (2\pi)^{\frac{3}{2}} f. \]
Using translation invariance, it is enough to evaluate both sides at $x = 0$.
Thus, we are left to find $c$ such that
\[ -\int_{\mathbb{R}^3} |x|^{-1} \Delta f(x)\,dx = c^{-1} (2\pi)^{\frac{3}{2}} f(0). \]
A computation using integration by parts (see the exercises) yields
\[ -\int_{\mathbb{R}^3} |x|^{-1} \Delta f(x)\,dx = 4\pi f(0). \]
Thus $(2\pi)^{\frac{3}{2}} c^{-1} = 4\pi$, and we conclude
\[ u(x) = \frac{1}{4\pi|x|} * f \quad\text{solves}\quad -\Delta u = f \]
in three dimensions.
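The constant $4\pi$ can be checked numerically on radial test functions. The sketch below is an illustration, not a proof: it takes $f(x) = e^{-a|x|^2}$ (so $f(0) = 1$), uses the explicit radial Laplacian $f'' + \frac{2}{r} f'$, and integrates in polar coordinates with a simple Riemann sum. The function name `poisson_pairing`, the values of `a`, and the grid parameters are arbitrary choices made for this example.

```python
import numpy as np

def poisson_pairing(a, dr=1e-3, r_max=40.0):
    """Approximate -int_{R^3} |x|^{-1} (Laplacian f)(x) dx for f(x) = exp(-a |x|^2).

    In polar coordinates dx = 4 pi r^2 dr, so the integrand is 4 pi r (Laplacian f)(r).
    """
    r = np.arange(0.0, r_max, dr)
    f = np.exp(-a * r ** 2)
    # For this radial f: f'' + (2/r) f' = (4 a^2 r^2 - 6 a) f.
    laplacian_f = (4.0 * a ** 2 * r ** 2 - 6.0 * a) * f
    return -4.0 * np.pi * np.sum(r * laplacian_f) * dr

values = {a: poisson_pairing(a) for a in (0.5, 1.0, 2.0)}
print(values, 4.0 * np.pi)
```

Each computed value should be close to $4\pi f(0) = 4\pi$, independently of the width parameter `a`.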
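Example 2.6.2 (Heat equation). We next consider the heat equation on
$(0, \infty) \times \mathbb{R}^d$:
\[ \begin{cases} u_t - \Delta u = 0, & (t, x) \in (0, \infty) \times \mathbb{R}^d, \\ u(0, x) = f(x), & x \in \mathbb{R}^d. \end{cases} \]
We apply the Fourier transform in the $x$ variables only. We find
\[ \hat u_t(t, \xi) = \mathcal{F}(\Delta u)(t, \xi) \iff \hat u_t(t, \xi) = -|\xi|^2 \hat u(t, \xi). \]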
For each $\xi$, this is an ODE in $t$ that we can solve:
\[ \hat u(t, \xi) = \hat u(0, \xi)e^{-t|\xi|^2} = e^{-t|\xi|^2} \hat f(\xi). \]
Thus
\[ u(t, x) = \mathcal{F}^{-1}[\hat f\, e^{-t|\xi|^2}](x) = (2\pi)^{-d/2} [f * \mathcal{F}^{-1}(e^{-t|\xi|^2})](x), \]
and again we need to compute an inverse Fourier transform.
Fortunately, we have already done this computation! Recall that
\[ \mathcal{F}(e^{-|x|^2/2})(\xi) = e^{-|\xi|^2/2}. \]
Thus (by Lemma 2.5.8), we have
\[ \mathcal{F}(e^{-|x|^2/4t})(\xi) = (2t)^{\frac{d}{2}} e^{-t|\xi|^2}. \]
We conclude that the solution to the heat equation is given by
\[ u(t, x) = (4\pi t)^{-\frac{d}{2}}\, e^{-|\cdot|^2/4t} * f(x) = (4\pi t)^{-\frac{d}{2}} \int_{\mathbb{R}^d} e^{-|x - y|^2/4t} f(y)\,dy. \]
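As an illustration of the heat kernel formula in dimension $d = 1$, the following sketch convolves Gaussian initial data with the heat kernel on a grid and compares the result with the closed-form solution available for that particular choice of data. The time `t = 0.5`, the grid, and the initial data $f(x) = e^{-x^2/2}$ are assumptions made only for this example.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 3001)   # symmetric grid with an odd number of points
dx = x[1] - x[0]
t = 0.5

f = np.exp(-x ** 2 / 2.0)            # initial data u(0, x)
heat_kernel = (4.0 * np.pi * t) ** (-0.5) * np.exp(-x ** 2 / (4.0 * t))

# u(t, x_n) is approximated by dx * sum_m heat_kernel(x_n - x_m) f(x_m).
u = np.convolve(heat_kernel, f, mode="same") * dx

# For this Gaussian data the convolution can be done by hand:
# u(t, x) = (1 + 2t)^(-1/2) * exp(-x^2 / (2 (1 + 2t))).
u_exact = (1.0 + 2.0 * t) ** (-0.5) * np.exp(-x ** 2 / (2.0 * (1.0 + 2.0 * t)))
heat_error = np.max(np.abs(u - u_exact))

print(heat_error)
```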
2.7 The Fourier transform of distributions

In computing the solution to the Poisson equation, we quickly ran into
the problem of taking Fourier transforms or inverse Fourier transforms of
functions that are not even L2 . In fact, one can extend the Fourier transform
quite naturally to the setting of ‘tempered distributions’, which includes a
much larger class of functions than Schwartz space or L2 . Without delving
too deeply into this topic, let us introduce some of the main points.
A tempered distribution $u$ is a continuous linear functional acting on
Schwartz space, that is, $u : \mathcal{S}(\mathbb{R}^d) \to \mathbb{C}$. The set of all tempered distributions
is denoted $\mathcal{S}'(\mathbb{R}^d)$, or just by $\mathcal{S}'$. In this setting, one often calls the elements
of $\mathcal{S}$ (which are the arguments of elements of $\mathcal{S}'$) test functions.
Schwartz functions themselves may be embedded in the set of tempered
distributions through the mapping $T : \mathcal{S} \to \mathcal{S}'$ given by
\[ Tu(f) = \int uf\,dx \quad\text{for } u, f \in \mathcal{S}. \]
As an exercise, one should check that the map T is injective, and hence we
can identify a function u with the distribution T u. In fact, the mapping T
makes sense for any function that is integrable against Schwartz functions,
hence a very large class of functions may naturally be viewed as distributions

(e.g. any Lp function multiplied by any polynomial).
Not all distributions are given by functions. A classical example is the
Dirac delta distribution $\delta_0 \in \mathcal{S}'$, defined by
\[ \delta_0(f) = f(0) \quad\text{for } f \in \mathcal{S}. \]
We remark that if $K_n$ is a family of good kernels, then $K_n \to \delta_0$ ‘in the
sense of distributions’.
The multiplication formula (Lemma 2.5.7) reveals how to extend the
Fourier transform to the space of distributions. Recalling that
\[ \int f \hat g = \int \hat f g \quad\text{for all } f, g \in \mathcal{S}, \]
we define the Fourier transform of a distribution $u \in \mathcal{S}'$ to be the
distribution $\hat u \in \mathcal{S}'$ satisfying
\[ \hat u(f) = u(\hat f) \quad\text{for all } f \in \mathcal{S}. \]
This definition guarantees that $\hat u$ agrees with the usual definition of the
Fourier transform in the case that $u$ actually arises from a Schwartz function
under the mapping $T$ introduced above. Thus the Fourier transform $\mathcal{F}$
extends to a mapping $\mathcal{F} : \mathcal{S}' \to \mathcal{S}'$.
Similarly, if we define $\mathcal{F}^* : \mathcal{S}' \to \mathcal{S}'$ by $\mathcal{F}^* u(f) = u(\mathcal{F}^{-1} f)$, we can
deduce that $\mathcal{F}\mathcal{F}^* = \mathcal{F}^*\mathcal{F} = \mathrm{Id}$, and hence $\mathcal{F}^* = \mathcal{F}^{-1}$ and $\mathcal{F}$ is a bijection
on $\mathcal{S}'$.
To define operations on distributions, one first observes how these
operations behave on Schwartz functions. For example, given a multi-index $\alpha$
and $u, f \in \mathcal{S}$, integration by parts yields
\[ \int (\partial^\alpha u) f = (-1)^{|\alpha|} \int u\,\partial^\alpha f. \]
Then for a distribution $u \in \mathcal{S}'$, we define $\partial^\alpha u \in \mathcal{S}'$ by
\[ \partial^\alpha u(f) = (-1)^{|\alpha|} u(\partial^\alpha f) \quad\text{for } f \in \mathcal{S}. \]
Note that this allows us to take derivatives of non-differentiable functions!

Similarly, a computation shows that the correct definition of $f * u$ (for
$f \in \mathcal{S}$ and $u \in \mathcal{S}'$) should be
\[ (f * u)(g) = u(\tilde f * g), \quad\text{where } \tilde f(x) = f(-x), \ g \in \mathcal{S}. \]
Alternately, one can define $f * u$ as a function via
\[ (f * u)(x) = u(\tau_x \tilde f), \qquad \tau_x \tilde f(y) = \tilde f(y - x) = f(x - y). \tag{2.9} \]
One can check that these two definitions agree.

Finally, for $u \in \mathcal{S}'$ and a (moderately well-behaved) function $f$, we can
define $fu \in \mathcal{S}'$ via $[fu](g) = u(fg)$.
Using the definitions introduced above, one can verify that all of the nice
properties of the Fourier transform continue to hold in the setting of distributions,
e.g.
\[ \widehat{\partial^\alpha u} = (i\xi)^\alpha \hat u, \qquad \mathcal{F}(f * u) = (2\pi)^{\frac{d}{2}} \hat f \hat u, \]
and so on. The moral is that one can often perform ‘formal’ computations
with the Fourier transform, even if the functions involved are not Schwartz.
The resulting computations will typically be valid, provided they are
interpreted in the appropriate sense.
Let us conclude this section with an example.
Example 2.7.1. Consider $\delta_0 \in \mathcal{S}'$. Then
\[ \hat\delta_0(f) = \delta_0(\hat f) = \hat f(0) = (2\pi)^{-\frac{d}{2}} \int f. \]
Thus $\hat\delta_0 = (2\pi)^{-\frac{d}{2}}$.
2.8 The Paley–Wiener theorem

For the final topic of this section, let us return to the idea that Fourier
transformation exchanges decay and smoothness.
We present a classical result on the line known as the Paley–Wiener
theorem.
Theorem 2.8.1 (Paley–Wiener theorem). For $f \in L^2(\mathbb{R})$, the following are
equivalent:
(i) $f$ is the restriction to $\mathbb{R}$ of a function $F$ defined on a strip
$\{x + iy : x \in \mathbb{R}, |y| < a\} \subset \mathbb{C}$ that is holomorphic and satisfies
\[ \int |F(x + iy)|^2\,dx \lesssim 1 \quad\text{for all } |y| < a. \]
(ii) $e^{a|\xi|} \hat f \in L^2(\mathbb{R})$.

Proof. Suppose (ii) holds. We then define
\[ F(z) = \frac{1}{\sqrt{2\pi}} \int e^{iz\xi} \hat f(\xi)\,d\xi, \]
which satisfies $F|_{\mathbb{R}} = f$ by the Fourier inversion formula. This defines a
holomorphic function on the strip $\{|y| < a\}$ due to the exponential decay of
$\hat f$. Furthermore, by Plancherel,
\[ \int |F(x + iy)|^2\,dx = \int |\hat f(\xi)|^2 e^{-2y\xi}\,d\xi \lesssim \|\hat f e^{a|\xi|}\|_{L^2}^2 \]
uniformly for $|y| < a$. This implies (i).
Next suppose (i) holds. Denoting $f_y(x) = F(x + iy)$ (so that $f_0 = f$), we
will show $\hat f_y(\xi) = \hat f(\xi)e^{-\xi y}$. Then by Plancherel’s theorem (just as above),
we will have
\[ \int |\hat f(\xi)|^2 e^{-2\xi y}\,d\xi \lesssim 1 \]
for $|y| < a$, yielding (ii).

We would be able to say fˆy (ξ) = fˆ(ξ)e−ξy immediately if (ii) already
held. We therefore utilize a family of good kernels to introduce compactly
supported approximating functions (which, in particular, satisfy (ii)).
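We utilize the following family, which forms a family of good kernels:
\[ K_\lambda(x) = \lambda K(\lambda x), \]
where
\[ K(x) = \frac{1}{2\pi} \Bigl( \frac{\sin(x/2)}{x/2} \Bigr)^2 = \frac{1}{2\pi} \int_{-1}^{1} (1 - |\xi|)e^{ix\xi}\,d\xi \]
(check!).
We set
\[ G_\lambda(z) = K_\lambda * F(z) = \int_{\mathbb{R}} F(z - w)K_\lambda(w)\,dw. \]
Then $G_\lambda$ is holomorphic in $\{|y| < a\}$. We now define
\[ g_{\lambda,y}(x) = G_\lambda(x + iy) = K_\lambda * f_y(x). \]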
In particular
ĝλ,y (ξ) = K̂λ (ξ)fˆy (ξ) for each λ.
Now observe that each ĝλ,y (ξ) has compact support, specifically, in [−λ, λ].
In particular, (ii) holds for ĝλ,y and hence
ĝλ,y (ξ) = ĝλ,0 (ξ)e−ξy .

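As $\hat K(\tfrac{\xi}{\lambda})$ is supported in $[-\lambda, \lambda]$ and does not vanish on $(-\lambda, \lambda)$, it follows that
\[ \hat f_y(\xi) = \hat f_0(\xi)e^{-\xi y} \quad\text{for } |\xi| < \lambda. \]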

Sending λ → ∞ yields the result.
Using this theorem, we can prove the following important fact:
Corollary 2.8.2. Let f ∈ L2 (R) and let fˆ ∈ L2 (R) denote the Fourier
transform of f . Then f and fˆ cannot both be compactly supported (unless
f ≡ 0).
Sketch of proof. Suppose fˆ and f are both compactly supported. Then
ea|ξ| fˆ ∈ L2 for any a > 0. Thus f is the restriction of an entire function F .
However, F vanishes on R\[−M, M ], and hence (by the uniqueness theorem
of complex analysis) F ≡ 0. This implies f ≡ 0.
2.9 Exercises
Exercise 2.9.1. The 1d Dirichlet Laplacian on $(0, 1)$ is the operator $-\partial_x^2$
defined on the set of smooth functions $f : (0, 1) \to \mathbb{R}$ satisfying $f(0) =
f(1) = 0$. The 1d Neumann Laplacian is also defined to be $-\partial_x^2$ but on the
set of smooth functions $f : (0, 1) \to \mathbb{R}$ satisfying $f'(0) = f'(1) = 0$.
(i) Find the eigenfunctions and eigenvalues for the Dirichlet Laplacian.
(ii) Find the eigenfunctions and eigenvalues for the Neumann Laplacian.
Exercise 2.9.2. Using Fourier series, show the following:
(i) For $f : (0, 1) \to \mathbb{R}$ smooth and satisfying $f(0) = f(1) = 0$, we have
\[ f(x) = \sum_{n=1}^{\infty} a_n \sin(n\pi x) \quad\text{for some } a_n \in \mathbb{R}. \]
Moreover, find a formula for the coefficients $a_n$.

(ii) For $f : (0, 1) \to \mathbb{R}$ smooth and satisfying $f'(0) = f'(1) = 0$, we have
\[ f(x) = \sum_{n=0}^{\infty} b_n \cos(n\pi x) \quad\text{for some } b_n \in \mathbb{R}. \]
Moreover, find a formula for the coefficients $b_n$.

Exercise 2.9.3. Suppose $\{\varphi_k\}$ is a complete orthonormal set in $L^2$ and $f, g \in
L^2$. Let $\{\hat f_k\}$ and $\{\hat g_k\}$ be the Fourier coefficients of $f, g$. Use Parseval’s
theorem to prove the following:
\[ \langle f, g \rangle = \sum_{k} \hat f_k \overline{\hat g_k}. \]
Exercise 2.9.4. Show that any orthogonal set in L2 is at most countably

infinite.
Exercise 2.9.5. Show that any orthogonal basis in L2 is complete. In par-
ticular, there exists a complete orthonormal basis for L2 .
Exercise 2.9.6. Compute the formula for the Dirichlet kernel by showing
\[ \sum_{n=-N}^{N} e^{inx} = \frac{\sin[(N + \frac12)x]}{\sin(\frac12 x)}. \]
Compute the formula for the Fejér kernel by showing
\[ \frac{1}{2\pi N} \sum_{n=0}^{N-1} \sum_{k=-n}^{n} e^{ikx} = \frac{1}{2\pi N}\, \frac{[\sin(\frac{N}{2} x)]^2}{[\sin(\frac12 x)]^2}. \]
Hint. Write $e^{inx} = (e^{ix})^n$ and sum the geometric series.

Exercise 2.9.7. Prove Fejér’s theorem: for $f \in L^1(\mathbb{T})$,
\[ \lim_{n \to \infty} \sigma_n f(x) = \tfrac12 [f(x+) + f(x-)], \]
provided these limits exist. Hint: Use the facts that the Fejér kernels are
good kernels that are positive, even, and decay away from $x = 0$.
Exercise 2.9.8. Let f be a function on the torus with Fourier coefficients
fˆ(n).
(i) Show that if $f$ is $k$-times differentiable and $f^{(k)} \in L^1$, then
\[ |\hat f(n)| \le \min_{0 \le j \le k} |n|^{-j} \|f^{(j)}\|_{L^1}. \]
(ii) Show that if $f$ is Hölder continuous of order $\alpha \in (0, 1]$ then
\[ |\hat f(n)| \lesssim |n|^{-\alpha}. \]
Exercise 2.9.9. Let f be a function on the torus. Show that f is analytic if

and only if there exist K > 0 and a > 0 such that |fˆ(n)| ≤ Ke−a|n| .
Exercise 2.9.10. This exercise appears as Theorem I.4.1 in [16]: Let $\{a_n\}_{n=-\infty}^{\infty}$
be an even sequence of nonnegative numbers such that $a_n \to 0$ as $|n| \to \infty$.
Suppose that for $n > 0$ we have
\[ a_{n+1} + a_{n-1} - 2a_n \ge 0. \]
Show that there exists a nonnegative function $f \in L^1(\mathbb{T})$ such that the
Fourier coefficients of $f$ are given by $\hat f(n) = a_n$.
Exercise 2.9.11. Show that the Dirichlet kernels satisfy $\|D_n\|_{L^1} \gtrsim \log n$ for
large $n$.
Exercise 2.9.12. Let $f$ be a Schwartz function. Show that for any
multi-indices $\alpha, \beta$ and any $1 \le p \le \infty$ we have $x^\alpha \partial^\beta f \in L^p$.
Exercise 2.9.13. Show that Schwartz space is dense in L2 .
Exercise 2.9.14. Show that if an estimate of the form
\[ \|\hat f\|_{L^q} \lesssim \|f\|_{L^p} \]
holds for Schwartz functions, then $q = p'$ and $1 \le p \le 2$. Hint. For the first
part, use a scaling argument. For the second, consider $f(x) = e^{-(1+it)|x|^2/2}$
and send $t \to \infty$.
Exercise 2.9.15. Let A be a d × d invertible matrix with real entries. Show
that if g(x) = f (Ax), then
ĝ(ξ) = | det A|−1 fˆ((At )−1 ξ).
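Exercise 2.9.16. Show that
\[ \mathcal{F}\bigl[ \pi^{-\frac{d-\alpha}{2}} \Gamma(\tfrac{d-\alpha}{2}) |x|^{\alpha-d} \bigr] = \pi^{-\frac{\alpha}{2}} \Gamma(\tfrac{\alpha}{2}) |\xi|^{-\alpha}. \]
This appears in [28, Lemma 1, p.117]. It can be computed using the Gamma
function
\[ \Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\,dt, \]
as we now sketch. First, show that
\[ \int_0^\infty e^{-\pi t|x|^2}\, t^{\frac{d-\alpha}{2}}\,\tfrac{dt}{t} = \pi^{-\frac{d-\alpha}{2}} \Gamma(\tfrac{d-\alpha}{2}) |x|^{\alpha-d}. \]
Now compute the Fourier transform, using the fact that the Fourier transform
of a Gaussian is a Gaussian (which follows from computing the appropriate
Gaussian integral):
\[ \int \int_0^\infty e^{-2\pi ix\xi} e^{-\pi t|x|^2}\, t^{\frac{d-\alpha}{2}}\,\tfrac{dt}{t}\,dx = \int_0^\infty e^{-\frac{\pi|\xi|^2}{t}}\, t^{-\frac{\alpha}{2}}\,\tfrac{dt}{t} = \pi^{-\frac{\alpha}{2}} |\xi|^{-\alpha} \Gamma(\tfrac{\alpha}{2}), \]
where the last equality comes from a change of variables.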

Exercise 2.9.17. Show that for $f \in C_c^\infty(\mathbb{R}^3)$,
\[ -\int |x|^{-1} \Delta f(x)\,dx = 4\pi f(0). \]
Exercise 2.9.18. Solve the Poisson equation −∆u = f in dimensions d ≥ 4.

Exercise 2.9.19. Use the Fourier transform to solve the wave equation
$\partial_{tt} u - \partial_{xx} u = 0$ with initial condition $(u(0, x), \partial_t u(0, x)) = (f(x), g(x))$ in one space
dimension.
Exercise 2.9.20. Let c ∈ Rd . Use the Fourier transform to solve the transport
equation
∂t u + c · ∇u = 0
with initial condition u(0, x) = f (x).
Exercise 2.9.21. Solve the linear Schrödinger equation
i∂t u + ∆u = 0, u(0, x) = f (x)
on both the torus and Rd .

Exercise 2.9.22. For $f$ a Schwartz function, define the tempered distribution
$Tf$ by
\[ Tf(g) = \int fg\,dx. \]
Show that the mapping $f \mapsto Tf$ is injective.

Exercise 2.9.23. Let $T$ be the mapping as in Exercise 2.9.22. Show that if
$f$ is a Schwartz function then $[Tf]' = T[f']$. (The notation on the left refers
to the distributional derivative of $Tf$.)
Exercise 2.9.24. Compute the second distributional derivative of the
function $f(x) = |x|$.
Exercise 2.9.25. Compute the Fourier transform of ∂ α δ0 .
Exercise 2.9.26. Give full details for the proof of Corollary 2.8.2.
Chapter 3
Fourier analysis, part II
In this section, we continue our study of the Fourier transform and
consider several applied topics. We will use the following normalization for the
Fourier transform:
\[ \hat f(\xi) = \int_{\mathbb{R}} e^{-2\pi it\xi} f(t)\,dt, \]
and similarly for Fourier series.
3.1 Sampling of signals

Consider a time-dependent signal (e.g. an audio recording). This is mod-
eled as a function f : R → R, which we write as f = f (t) to keep the
interpretation of ‘time’ clear. In practice, we should consider signals that
are compactly supported in time, but let us return to this point later.
We will consider the problem of reconstructing a signal $f$ using a
collection of samples $\{f(t_k)\}_{k=-\infty}^{\infty}$. We will assume that the sampling rate is
constant, e.g. $t_k = kr$ for some $r > 0$. (The sampling rate would then be
$1/r$, i.e. the number of samples taken per second.) In practice, one can
only take finitely many samples, in which case we would like as good of an
approximation of $f$ as possible.
We begin with the observation that this problem is essentially hopeless
unless we restrict to bandlimited signals, that is, signals with compact
Fourier support.
Example 3.1.1. Fix $0 < r < 1$. Let
\[ f(x) = \chi_{[-1,1]}(x) \quad\text{and}\quad g(x) = \chi_{[-1,1]}(x) \cos(\tfrac{2\pi x}{r}). \]
Then
\[ g(kr) = \chi_{[-1,1]}(kr) \cos(2\pi k) = \chi_{[-1,1]}(kr) = f(kr) \]
for all $k \in \mathbb{Z}$. Thus $f$ and $g$ are indistinguishable by sampling at the points
$t_k = kr$.
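Example 3.1.1 can be reproduced directly. In the sketch below the spacing `r = 0.25` and the range of sample indices are arbitrary choices; the two signals agree at every sample point $kr$ but differ by 2 halfway between two samples.

```python
import numpy as np

r = 0.25                    # sampling spacing, 0 < r < 1
k = np.arange(-12, 13)      # sample indices; the points k*r cover [-3, 3]
t_k = k * r

def indicator(x):
    """Characteristic function of [-1, 1]."""
    return np.where(np.abs(x) <= 1.0, 1.0, 0.0)

def f(x):
    return indicator(x)

def g(x):
    return indicator(x) * np.cos(2.0 * np.pi * x / r)

# The two signals coincide at every sample point t_k = k r ...
samples_agree = bool(np.allclose(f(t_k), g(t_k)))

# ... but differ halfway between two samples, where cos(pi) = -1.
midpoint_gap = float(abs(f(r / 2.0) - g(r / 2.0)))

print(samples_agree, midpoint_gap)
```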
The previous example hints at the general principle that in order to
faithfully represent a signal with sampling rate $r^{-1}$, the signal should not
contain frequencies higher than $r^{-1}$. (Actually, both signals above contain
arbitrarily high frequencies due to the fact that they are compactly
supported in time.)
Let us turn to the positive result. We define the function
\[ \operatorname{sinc}(x) := \frac{\sin \pi x}{\pi x}. \]
Theorem 3.1.1 (Shannon–Nyquist sampling theorem). Let $r > 0$. Let
$f \in \mathcal{S}(\mathbb{R})$ and suppose $\hat f(\xi) = 0$ for all $|\xi| \ge \frac{1}{2r}$. Then
\[ f(t) = \sum_{k \in \mathbb{Z}} f(kr) \operatorname{sinc}[\tfrac{1}{r}(t - kr)]. \]
In particular, to reconstruct a function that is bandlimited to frequencies
$|\xi| \le \frac{1}{2r}$, one must use a sampling rate at least twice the highest frequency
(i.e. $\frac{1}{r}$). The frequency $\frac{1}{2r}$ is called the Nyquist frequency associated to the
sampling rate $r^{-1}$.
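The reconstruction formula can be tried out numerically. The following is a sketch under stated assumptions: the test signal $\operatorname{sinc}(t)^2$ is not a Schwartz function (so it lies outside the hypotheses as stated), but it is bandlimited to $[-1, 1]$, decays like $t^{-2}$, and has a closed form, which makes it convenient for illustration. The series is truncated at `num_terms`, a hypothetical parameter of this example.

```python
import numpy as np

def f(t):
    """Test signal sinc(t)^2; its transform is the triangle supported on [-1, 1]."""
    return np.sinc(t) ** 2          # np.sinc(x) = sin(pi x) / (pi x)

def reconstruct(t, r, num_terms=2000):
    """Truncated Shannon series: sum over |k| <= num_terms of f(kr) sinc((t - kr)/r)."""
    k = np.arange(-num_terms, num_terms + 1)
    return np.sinc((t[:, None] - k[None, :] * r) / r) @ f(k * r)

t = np.linspace(-3.0, 3.0, 121)

# r = 0.4: the cutoff 1/(2r) = 1.25 exceeds the band limit 1, so the series recovers f.
error_fast = np.max(np.abs(reconstruct(t, 0.4) - f(t)))

# r = 1: the cutoff 1/(2r) = 0.5 is below the band limit. Here f(k) = 0 for k != 0,
# so the series collapses to sinc(t), which is not f.
error_slow = np.max(np.abs(reconstruct(t, 1.0) - f(t)))

print(error_fast, error_slow)
```

With the faster sampling rate the error is at the level of the truncation of the series, while the slower rate produces an error of order one.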
Note that a function with compactly supported Fourier transform is an-
alytic (cf. the Paley–Wiener theorem); thinking of such functions as ‘infinite
degree polynomials’, it is perhaps not surprising that the function may be
reconstructed using only countably many sample points.
Remark 3.1.2. The appearance of the sinc function in Theorem 3.1.1 may
seem a bit mysterious. One approach for deriving the formula appearing in
the Shannon–Nyquist theorem is to use the fact that
\[ \hat f = \chi_{[-\frac{1}{2r}, \frac{1}{2r}]}\, [\Pi_{\frac{1}{r}} * \hat f] \]
and take the inverse Fourier transform. Here $\Pi_N$ denotes the distribution
\[ \Pi_N(\varphi) = \sum_{k \in \mathbb{Z}} \varphi(kN), \]
and the effect of convolution with $\Pi_N$ is to periodize the function, cf.
\[ \Pi_N * \varphi(x) = \sum_{k \in \mathbb{Z}} \varphi(x - kN). \]
Computing the inverse Fourier transform of $\chi_{[-\frac{1}{2r}, \frac{1}{2r}]}$ (see Lemma 3.1.3
below) and $\Pi_{\frac{1}{r}}$ (see Exercise 3.5.1), one can deduce the formula appearing in
Theorem 3.1.1. See Exercise 3.5.2.
We turn to Theorem 3.1.1. Let us begin by recording the key property
of the sinc function.
Lemma 3.1.3. The function sinc belongs to $L^2(\mathbb{R})$. Its Fourier transform
is given by
\[ \mathcal{F}[\operatorname{sinc}](\xi) = \chi_{[-\frac12, \frac12]}(\xi). \]
In particular, for any $r > 0$ and $k \in \mathbb{Z}$,
\[ \mathcal{F}[\operatorname{sinc}(\tfrac{1}{r}(\cdot - kr))](\xi) = r\chi_{[-\frac{1}{2r}, \frac{1}{2r}]}(\xi)e^{-2\pi i\xi kr}. \]
Proof. That $\operatorname{sinc} \in L^2$ follows from the fact that it is bounded near $x = 0$
and decays like $|x|^{-1}$ for large $x$.
Next, we compute
\[ \mathcal{F}^{-1}[\chi_{[-\frac12, \frac12]}](x) = \int_{-1/2}^{1/2} e^{2\pi ix\xi}\,d\xi = \frac{\sin(\pi x)}{\pi x} = \operatorname{sinc}(x), \]
which yields the desired identity. The final identity then follows from
applying the scaling and translation identities for the Fourier transform, obtained
by a change of variables.
Corollary 3.1.4. For $r > 0$, the family $\{\operatorname{sinc}(\tfrac{1}{r}(\cdot - kr))\}_{k \in \mathbb{Z}}$ forms an
orthogonal basis for the subspace of $L^2$ defined by
\[ H = \{g \in L^2 : \hat g \text{ is supported in } [-\tfrac{1}{2r}, \tfrac{1}{2r}]\}. \]
Proof. By Plancherel’s theorem and Lemma 3.1.3, the family is orthogonal.
Now if $g \in H$ is orthogonal to every element of this family, then again
applying Plancherel’s theorem we see that the Fourier series of $\hat g$ is identically
zero, and hence $g$ is zero. This shows that the family is complete, as needed.
The next lemma is an important result of independent interest. It is

known as the Poisson summation formula.
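Lemma 3.1.5 (Poisson summation formula). For $\varphi \in \mathcal{S}$, we have
\[ \sum_{k \in \mathbb{Z}} \varphi(k) = \sum_{k \in \mathbb{Z}} \hat\varphi(k). \]
Proof. Let
\[ \Phi(t) = \sum_{k \in \mathbb{Z}} \varphi(t - k). \]
Note that $\Phi(0)$ gives the left-hand side of the Poisson summation formula.
Next, observe that $\Phi(t)$ is periodic of period one. Thus it has a Fourier
series expansion
\[ \Phi(t) = \sum_{m \in \mathbb{Z}} \hat\Phi(m)e^{2\pi imt}. \]
The Fourier coefficients $\hat\Phi(m)$ are computed as follows:
\[ \begin{aligned} \hat\Phi(m) &= \int_0^1 e^{-2\pi imt} \Phi(t)\,dt \\ &= \sum_{k \in \mathbb{Z}} \int_0^1 e^{-2\pi imt} \varphi(t - k)\,dt \\ &= \sum_{k \in \mathbb{Z}} \int_{-k}^{-k+1} e^{-2\pi imt} \varphi(t)\,dt = \hat\varphi(m), \end{aligned} \]
where we have changed variables and used $e^{-2\pi imk} \equiv 1$. In particular, we
now see that $\Phi(0)$ also equals the right-hand side of the Poisson summation
formula, and thus the result follows.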
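The Poisson summation formula can be checked on a Gaussian. The sketch below assumes the standard transform pair $\varphi(x) = e^{-\pi a x^2}$, $\hat\varphi(\xi) = a^{-1/2} e^{-\pi \xi^2/a}$ in the normalization of this chapter; the width `a = 0.3` and the truncation of the sums are arbitrary illustrative choices.

```python
import numpy as np

a = 0.3                      # width parameter of the Gaussian
k = np.arange(-60, 61)       # truncation of both sums; the tails are negligible

# phi(x) = exp(-pi a x^2) has transform phi_hat(xi) = a^(-1/2) exp(-pi xi^2 / a)
# for the normalization phi_hat(xi) = int exp(-2 pi i x xi) phi(x) dx.
sum_phi = np.sum(np.exp(-np.pi * a * k ** 2))
sum_phi_hat = np.sum(np.exp(-np.pi * k ** 2 / a)) / np.sqrt(a)
poisson_gap = abs(sum_phi - sum_phi_hat)

print(sum_phi, sum_phi_hat, poisson_gap)
```

For a small width `a` the left-hand sum has many significant terms while the right-hand sum is dominated by $k = 0$, which illustrates how the formula trades a slowly converging sum for a rapidly converging one.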
We next prove a lemma from which Theorem 3.1.1 will follow directly.
Lemma 3.1.6. Let $r > 0$ and $f \in \mathcal{S}(\mathbb{R})$. Define
\[ g(t) = \sum_{k \in \mathbb{Z}} f(kr) \operatorname{sinc}[\tfrac{1}{r}(t - kr)]. \]
Then $g \in L^2$, $g(kr) = f(kr)$ for all $k \in \mathbb{Z}$, and
\[ \hat g(\xi) = \chi_{[-\frac{1}{2r}, \frac{1}{2r}]}(\xi) \sum_{k \in \mathbb{Z}} \hat f(\xi - \tfrac{k}{r}). \]
Proof. That $g \in L^2$ follows from the fact that it is a linear combination
of orthogonal functions with rapidly decaying coefficients; this latter fact
follows from the assumption that $f \in \mathcal{S}$. Next,
\[ g(\ell r) = \sum_{k \in \mathbb{Z}} f(kr) \operatorname{sinc}[\ell - k] = \sum_{k \in \mathbb{Z}} f(kr) \frac{\sin \pi(\ell - k)}{\pi(\ell - k)} = f(\ell r). \]
Now we compute the Fourier transform. By Lemma 3.1.3, we have
\[ \hat g(\xi) = r\chi_{[-\frac{1}{2r}, \frac{1}{2r}]}(\xi) \sum_{k \in \mathbb{Z}} f(kr)e^{-2\pi i\xi kr}. \]
Applying the Poisson summation formula to the function
\[ h(x) = f(xr)e^{-2\pi i\xi xr} \]
(for fixed $r > 0$ and $\xi \in \mathbb{R}$) now yields
\[ \hat g(\xi) = \chi_{[-\frac{1}{2r}, \frac{1}{2r}]}(\xi) \sum_{k \in \mathbb{Z}} \hat f(\xi - \tfrac{k}{r}), \]
as was needed to show.
Proof of Theorem 3.1.1. Define $g(t)$ as in Lemma 3.1.6. In particular,
\[ \hat g(\xi) = \chi_{[-\frac{1}{2r}, \frac{1}{2r}]}(\xi) \sum_{k \in \mathbb{Z}} \hat f(\xi - \tfrac{k}{r}). \]
Thus, if $\hat f$ is supported in $[-\frac{1}{2r}, \frac{1}{2r}]$, it follows that $\hat g = \hat f$. This yields the
result.
We have seen that to reconstruct a signal using a discrete set of samples,

we need two things: (i) we need the signal to be bandlimited, and (ii)
we need (countably) infinitely many samples. In practice, neither of these
will be satisfied. Indeed, since our signals will be compactly supported in
time, they cannot be bandlimited. This is the content of Corollary 2.8.2.
Furthermore, clearly we can only take finitely many samples of any signal.
Thus, in practice one should decide upon a feasible sampling rate
(or a feasible maximum frequency that one hopes to capture) and then take
as many samples as possible at that sampling rate.
If one uses too low a sampling rate, one can run into the problem
of ‘aliasing’. This can be understood by considering Lemma 3.1.6. In
particular, for a Schwartz function $f$ we take samples at the points $kr$ and
define the function
\[ g(t) = \sum_{k\in\mathbb{Z}} f(kr)\,\operatorname{sinc}\bigl[\tfrac{1}{r}(t-kr)\bigr] \]
as a possible candidate for reconstructing $f$. In particular, note that
$g(kr) = f(kr)$.
Suppose that $\hat f$ is supported in $[-\frac{N}{2},\frac{N}{2}]$. If $r^{-1} > N$ then we see that
$\hat g(\xi)$ will contain one single copy of $\hat f(\xi)$, confirming what we already know:
for a sufficiently high sample rate of a bandlimited signal, we can reproduce
$f$ by a discrete set of samples. On the other hand, if $r^{-1} < N$, then $\hat g$ will
contain extra (shifted) copies of $\hat f$. In particular, certain frequencies will be
‘counted twice’ (or more than twice); this is called aliasing. Thus, if the
sample rate is too low, the function $g$ will not be a faithful reproduction of
the signal. Indeed, in this case we will not have $\hat g = \hat f$.
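As a small numerical illustration of aliasing (a sketch only, using pure cosines in place of a Schwartz function and the hypothetical choice $r = 1/4$), a cosine of frequency 3 sampled at rate $r^{-1} = 4$ produces exactly the same samples as a cosine of frequency 1. The helper name `samples` is an assumption made for this example.

```python
import math

def samples(freq, r, count):
    """Return cos(2*pi*freq*k*r) for k = 0, ..., count-1 (samples at the points k*r)."""
    return [math.cos(2 * math.pi * freq * k * r) for k in range(count)]

r = 0.25                     # sample spacing, so the sample rate is 1/r = 4
high = samples(3.0, r, 16)   # frequency 3 is too high for this sample rate
low = samples(1.0, r, 16)    # frequency 1 = 4 - 3 is its alias

# The two signals cannot be told apart from these samples.
max_gap = max(abs(a - b) for a, b in zip(high, low))
```

Here `max_gap` is zero up to rounding, so no reconstruction from these samples can distinguish the two frequencies.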
3.2 Discrete Fourier transform

Recall that given a continuous signal $f : \mathbb{R} \to \mathbb{R}$, we will generally only
be able to take finitely many samples, say $\{f(t_n)\}_{n=0}^{N-1}$. In this section we
describe how to define a corresponding discrete version of the Fourier
transform of $f$.
Let us write the sampled signal as the following distribution on $\mathbb{R}$:
\[ f_d(t) := \sum_{n=0}^{N-1} f(t_n)\,\delta(t - t_n), \]
where $t_n = nr$ for some $r > 0$. In order for this to faithfully reproduce $f$,
we should assume that $\hat f$ is supported on $[0, r^{-1}]$, say. (Previously it was
$[-\frac{1}{2r},\frac{1}{2r}]$, but let us shift here for convenience.) We should also assume that
the bulk of the support of $f$ is the interval $[0, Nr]$.
Now note that
\[ \hat f_d(\xi) = \sum_{n=0}^{N-1} f(t_n)\,e^{-2\pi i \xi t_n}. \]
We would like to use finitely many samples of $\hat f_d$ to approximate $\hat f_d$. Since $f$ is
supported mostly in $[0, Nr]$, we should take a sample rate of $Nr$. Thus we
set $s_m = \frac{m}{Nr}$ (for $m = 0, \dots, N-1$, which should cover the entire support
of $\hat f_d$) and define
\[ F_d(s_m) = \hat f_d(s_m) = \sum_{n=0}^{N-1} f(t_n)\,e^{-2\pi i s_m t_n}, \]
i.e.
\[ F_d\bigl(\tfrac{m}{Nr}\bigr) = \sum_{n=0}^{N-1} f(nr)\,e^{-2\pi i m n/N}. \]
We regard $\{F_d(s_m)\}$ as an approximate discrete version of $\hat f$. In fact, by
considering the Riemann sum approximation to the continuous integral, we
expect
\[ F_d\bigl(\tfrac{m}{Nr}\bigr) \sim r^{-1}\int_0^{Nr} f(x)\,e^{-2\pi i x m/(Nr)}\,dx = r^{-1}\hat f\bigl(\tfrac{m}{Nr}\bigr). \]
With this motivating example in mind, we proceed to the definition of
the discrete Fourier transform.
Definition 3.2.1. Let $f \in \mathbb{C}^N$ have entries $f[n]$, $n = 0, \dots, N-1$. The
discrete Fourier transform of $f$ is the vector
\[ \hat f = \mathcal{F}f \in \mathbb{C}^N \]
with entries
\[ \hat f[m] = \sum_{n=0}^{N-1} e^{-2\pi i m n/N} f[n]. \]
Note that $\mathcal{F} : \mathbb{C}^N \to \mathbb{C}^N$ is a linear transformation and hence is
represented by an $N \times N$ matrix $\mathcal{F}$ with entries $\mathcal{F}_{mn} = e^{-2\pi i m n/N}$ (where we
index $m, n \in \{0, 1, \dots, N-1\}$). Writing
\[ \omega = \omega_N = e^{2\pi i/N}, \]
we can also write $\mathcal{F}_{mn} = \omega^{-mn}$.

It is also convenient to introduce the vector $\omega$ with entries $\omega[k] = \omega^k$,
where $k \in \{0, \dots, N-1\}$. Then the $n$th column of $\mathcal{F}$ is given by $\omega^{-n}$, where
this notation refers to component-wise operation. In particular,
\[ \mathcal{F}f = \sum_{n=0}^{N-1} \omega^{-n} f[n]. \]
To make sure the notation is clear, here is $\mathcal{F}$ in the case $N = 3$:
\[ \mathcal{F} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega^{-1} & \omega^{-2} \\ 1 & \omega^{-2} & \omega^{-4} \end{pmatrix}. \]
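The definition translates directly into a few lines of code. The following is a minimal sketch using only the Python standard library; the function names `dft_matrix` and `dft` are chosen for illustration and do not refer to any particular library.

```python
import cmath

def dft_matrix(N):
    """Return F as nested lists with F[m][n] = omega**(-m*n), omega = exp(2*pi*i/N)."""
    omega = cmath.exp(2j * cmath.pi / N)
    return [[omega ** (-m * n) for n in range(N)] for m in range(N)]

def dft(f):
    """Apply Definition 3.2.1 directly: fhat[m] = sum_n exp(-2*pi*i*m*n/N) * f[n]."""
    N = len(f)
    F = dft_matrix(N)
    return [sum(F[m][n] * f[n] for n in range(N)) for m in range(N)]

F3 = dft_matrix(3)       # the 3 x 3 matrix displayed above
fhat = dft([1, 2, 3])    # fhat[0] is the sum of the entries of the input
```

This direct approach uses on the order of $N^2$ multiplications, which is fine for small experiments.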
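Just as in the continuous case, the discrete Fourier transform is invertible.
To invert the matrix $\mathcal{F}$, it suffices to solve the equations
\[ \mathcal{F}x = \delta_k \quad \text{for each } k = 0, \dots, N-1, \]
where $\delta_k$ has a 1 in the $k$th position and zero elsewhere. (The columns of
$\mathcal{F}^{-1}$ are then the solutions $\vec u_k$.) By analogy with the continuous case, we
might expect that we should take $x = \omega^k$. Let us check:
\[ \mathcal{F}\omega^k = \sum_{n=0}^{N-1} \omega^{-n}\,\omega^k[n] = \sum_{n=0}^{N-1} \omega^{-n}\,\omega^{kn}. \]
Now $\omega^{-n}[m] = \omega^{-nm}$, and so
\[ \omega^{-n}[m]\,\omega^{kn} = \omega^{n(k-m)}. \]
Now we claim that
\[ \sum_{n=0}^{N-1} \omega^{n(k-m)} = N\delta_{km}, \tag{3.1} \]
which will imply
\[ \mathcal{F}\omega^k = N\delta_k, \]
whence
\[ \mathcal{F}^{-1} = \tfrac{1}{N}\,\overline{\mathcal{F}}, \]
where the bar denotes entrywise complex conjugation (the $k$th column of $\overline{\mathcal{F}}$ is $\omega^k$).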
For (3.1), we just need to check that the sum is zero when $k \neq m$, i.e.
\[ \sum_{n=0}^{N-1} (\omega^\ell)^n = 0 \quad \text{for any } \ell \in \{-(N-1), \dots, -1, 1, \dots, N-1\}. \]
In fact, this is a geometric series with $\omega^\ell \neq 1$, and so equals
\[ \sum_{n=0}^{N-1} (\omega^\ell)^n = \frac{1 - \omega^{\ell N}}{1 - \omega^\ell}. \]
Since $\omega^N = 1$, the numerator is zero, as desired.
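The inversion formula and the orthogonality relation (3.1) are easy to check numerically. The sketch below assumes the inverse is given by $\frac{1}{N}$ times the entrywise conjugate of the DFT matrix; the names `dft`, `idft` and `geometric_sum` are illustrative.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

def idft(fhat):
    """Inverse transform: f[n] = (1/N) * sum_m exp(+2*pi*i*m*n/N) * fhat[m]."""
    N = len(fhat)
    return [sum(cmath.exp(2j * cmath.pi * m * n / N) * fhat[m] for m in range(N)) / N
            for n in range(N)]

def geometric_sum(ell, N):
    """Left-hand side of (3.1) with k - m = ell."""
    omega = cmath.exp(2j * cmath.pi / N)
    return sum(omega ** (n * ell) for n in range(N))

f = [1.0, -2.0, 0.5j, 3.0, 0.0, 1.0 + 1.0j]
roundtrip = idft(dft(f))   # should reproduce f up to rounding
```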

Given $f \in \mathbb{C}^N$, its Fourier transform $\hat f$ can naturally be extended to a
periodic sequence of period $N$, i.e. $\hat f(m + \ell N) = \hat f(m)$ for any integer $\ell$.
This follows from the fact that $\omega^{-k(m+N)} = \omega^{-km}$. In particular we can
view
\[ \hat f[m] = \sum_{k=0}^{N-1} \omega^{-km} f[k] \quad \text{for all } m \in \mathbb{Z}. \]
Because the inverse Fourier transform has the same general structure as
the Fourier transform, it is also natural to assume that the original discrete
signal f is also periodic of period N . This will be helpful below.
We now turn to some properties of the discrete Fourier transform that
parallel what we already know for the continuous Fourier transform.
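Lemma 3.2.2 (Plancherel). We have the following:
\[ \mathcal{F}f \cdot \mathcal{F}g = N\,(f \cdot g), \]
where $f \cdot g = \sum_k f[k]\,\bar g[k]$ denotes the standard complex inner product on
$\mathbb{C}^N$.
Proof. We have
\[ \mathcal{F}f \cdot \mathcal{F}g = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} f[j]\,\bar g[k]\;\omega^{-j}\cdot\omega^{-k}. \]
By the computations above,
\[ \omega^{-j}\cdot\omega^{-k} = \sum_{\ell=0}^{N-1} \omega^{\ell(k-j)} = N\delta_{kj}, \]
where $\delta_{kj}$ is the Kronecker delta. Thus
\[ \mathcal{F}f \cdot \mathcal{F}g = N\sum_{k=0}^{N-1} f[k]\,\bar g[k] = N\, f \cdot g, \]
as desired.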
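A quick numerical check of the Plancherel identity, as a sketch with arbitrarily chosen test vectors; `inner` is a hypothetical helper implementing the inner product $\sum_k u[k]\,\bar v[k]$.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

def inner(u, v):
    """Standard complex inner product: sum_k u[k] * conj(v[k])."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

f = [1, 2j, -1, 0.5]
g = [0.25, 1, 1 - 1j, 3]
lhs = inner(dft(f), dft(g))    # Ff . Fg
rhs = len(f) * inner(f, g)     # N (f . g)
```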
We also note the following identities (left as an exercise):
\[ \mathcal{F}\bigl(f[\cdot - a]\bigr) = \omega^{-a}\,\mathcal{F}f, \qquad \mathcal{F}\bigl(\omega^{a} f\bigr) = (\mathcal{F}f)[\cdot - a]. \tag{3.2} \]
This requires the interpretation of $f$, $\mathcal{F}f$ as periodic sequences. Furthermore,
the product of two vectors should be interpreted as component-wise product,
i.e. $(vw)[k] = v[k]\,w[k]$.
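The two identities in (3.2) can be verified numerically. The sketch below uses periodic indexing via `% N`; the values of `N`, `a` and the test vector are arbitrary choices for illustration.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

N, a = 8, 3
omega = cmath.exp(2j * cmath.pi / N)
f = [complex(k * k, -k) for k in range(N)]
Ff = dft(f)

shifted = [f[(n - a) % N] for n in range(N)]              # f[. - a], read periodically
modulated = [omega ** (a * n) * f[n] for n in range(N)]   # component-wise product omega^a f

lhs1, rhs1 = dft(shifted), [omega ** (-a * m) * Ff[m] for m in range(N)]
lhs2, rhs2 = dft(modulated), [Ff[(m - a) % N] for m in range(N)]
```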
We next turn to convolution of sequences. This also requires the
interpretation of $f \in \mathbb{C}^N$ as an infinite periodized sequence. The convolution
of $f$ and $g$ is then defined by
\[ f * g[m] = \sum_{k=0}^{N-1} f[k]\,g[m-k], \quad m \in \{0, \dots, N-1\}. \]
The convolution of two periodized elements of $\mathbb{C}^N$ is then another periodized
element of $\mathbb{C}^N$.
Lemma 3.2.3. The following identities hold. First,
\[ \mathcal{F}(f * g) = (\mathcal{F}f)(\mathcal{F}g), \]
where as usual the product on the right is component-wise product. Next,
\[ \mathcal{F}(fg) = N^{-1}\,\mathcal{F}f * \mathcal{F}g, \]
where again $fg$ denotes component-wise product.

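Proof. We have
\[
N\,\mathcal{F}^{-1}(\hat f\hat g)[m] = \sum_{n=0}^{N-1} \hat f[n]\,\hat g[n]\,\omega^{mn}
= \sum_{n,k,\ell} f[k]\,g[\ell]\,\omega^{n(m-\ell-k)}
= N\sum_{k,\ell} f[k]\,g[\ell]\,\delta_{\ell(m-k)}
= N\sum_{k} f[k]\,g[m-k] = N\, f * g[m],
\]
and the desired identity follows from applying $\mathcal{F}$. Now, an analogous
computation shows that
\[ \mathcal{F}^{-1}(a * b) = N\,(\mathcal{F}^{-1}a)(\mathcal{F}^{-1}b). \]
Thus
\[ \mathcal{F}^{-1}\bigl[N^{-1}\,\mathcal{F}f * \mathcal{F}g\bigr] = fg, \]
which implies the second identity upon applying $\mathcal{F}$.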
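Both identities of Lemma 3.2.3 can be checked on small vectors. In this sketch `cconv` is a hypothetical name for the periodic convolution defined above, and the test vectors are arbitrary.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

def cconv(f, g):
    """Periodic convolution: (f*g)[m] = sum_k f[k] g[m-k], indices taken mod N."""
    N = len(f)
    return [sum(f[k] * g[(m - k) % N] for k in range(N)) for m in range(N)]

f = [1, 2, 0, -1, 0.5j, 3]
g = [0, 1, 1j, 2, -2, 1]
N = len(f)

lhs1 = dft(cconv(f, g))                                   # F(f * g)
rhs1 = [a * b for a, b in zip(dft(f), dft(g))]            # (Ff)(Fg)
lhs2 = dft([a * b for a, b in zip(f, g)])                 # F(fg)
rhs2 = [c / N for c in cconv(dft(f), dft(g))]             # N^{-1} Ff * Fg
```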
Applying convolutions to signals (or, equivalently, multiplying their
discrete Fourier transforms by functions) allows us to apply various ‘filters’
to our signal (e.g. low-pass, high-pass, band-pass filters, and many other
variations). We will not pursue this topic here, but would like to remark
that this is a starting point for many important applications.
We would also like to make the following observation, which is useful
to keep in mind in applications. For real signals (i.e. vectors in $\mathbb{R}^N$), the
Fourier transform has some symmetry properties. In particular, the discrete
Fourier transform splits at index $N/2$. Note that
\[ \mathcal{F}f[N/2] = \sum_{k=0}^{N-1} (-1)^k f[k], \]
and in particular is real. One finds in general that
\[ \mathcal{F}f\bigl[\tfrac{N}{2} + k\bigr] = \overline{\mathcal{F}f\bigl[\tfrac{N}{2} - k\bigr]} \]
for $k = 0, \dots, \frac{N}{2} - 1$. Note $\mathcal{F}f[0]$ is just the sum of the components of
$f$. One calls $\mathcal{F}f[0]$ the ‘DC’ component; the frequencies $m = 1, \dots, \frac{N}{2} - 1$
the ‘negative frequencies’; and the frequencies $m = \frac{N}{2} + 1, \dots, N - 1$ the
‘positive frequencies’. All of the information about $\mathcal{F}f$ is contained in the
DC component, negative frequencies, and the value $\mathcal{F}f[N/2]$.
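The symmetry for real signals is a conjugate symmetry about the index $N/2$, which a short computation confirms. The sketch below uses an arbitrary real vector of even length.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

f = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]   # a real signal with N = 8
N = len(f)
half = N // 2
Ff = dft(f)

dc = Ff[0]                                             # sum of the entries of f
alternating = sum((-1) ** k * f[k] for k in range(N))  # should equal Ff[N/2]
```

In particular, for real input only about half of the entries of the transform need to be stored.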
3.3 Fast Fourier transform

In the previous section, we saw that the discrete Fourier transform and its
inverse are simply linear transformations on CN and hence are represented
by $N \times N$ matrices. In applications, however, it can be rather costly to
perform matrix multiplications. In general, multiplication of an $N \times N$
matrix by an $N \times 1$ vector has computational complexity $O(N^2)$.
The fast Fourier transform gives an efficient method for computing the
discrete Fourier transform. This approach actually appears in work of Gauss
(in 1805, modeling orbits by ‘Fourier series’). The algorithm as it will be
described here is more commonly associated to Cooley and Tukey.
In the following, we will typically assume that $N = 2^j$ for some $j \in \mathbb{N}$.
We will write $\mathcal{F} = \mathcal{F}_p$ for the discrete Fourier transform on sequences of
length $p$. We recall $\omega_a = e^{2\pi i/a}$.
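Lemma 3.3.1. Let $f \in \mathbb{C}^N$ with $N$ even. Then for $m = 0, 1, \dots, N/2$,
\[ \mathcal{F}_N f[m] = \mathcal{F}_{N/2} f_e[m] + \omega_N^{-m}\,\mathcal{F}_{N/2} f_o[m], \]
\[ \mathcal{F}_N f[m + N/2] = \mathcal{F}_{N/2} f_e[m] - \omega_N^{-m}\,\mathcal{F}_{N/2} f_o[m], \]
where
\[ f_e[n] = f[2n] \quad\text{and}\quad f_o[n] = f[2n+1] \quad\text{for } n = 0, \dots, \tfrac{N}{2} - 1. \]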
Proof. Let $m \in \{0, 1, \dots, N/2\}$. Then, splitting into even and odd parts,
\[ \mathcal{F}_N f[m] = \sum_{k=0}^{N/2-1} f[2k]\,e^{-2\pi i m(2k)/N} + \sum_{k=0}^{N/2-1} f[2k+1]\,e^{-2\pi i m(2k+1)/N}. \]
The first term equals
\[ \sum_{k=0}^{N/2-1} f_e[k]\,e^{-2\pi i m k/(N/2)} = \mathcal{F}_{N/2} f_e[m]. \]
The second term is treated similarly to produce $\mathcal{F}_{N/2} f_o[m]$, except there
appears an extra power of $e^{-2\pi i m/N} = \omega_N^{-m}$.
Next consider $\mathcal{F}_N f[m + N/2]$. In this case we only need to observe that
\[ e^{-2\pi i (m+N/2)(2k)/N} = e^{-2\pi i m(2k)/N}\,e^{-2\pi i k} = e^{-2\pi i m(2k)/N}, \]
which leads to the $\mathcal{F}_{N/2} f_e[m]$ term again. On the other hand,
\[ e^{-2\pi i (m+N/2)(2k+1)/N} = e^{-2\pi i m(2k+1)/N}\,e^{-2\pi i (k+\frac12)} = -e^{-2\pi i m(2k+1)/N}, \]
which accounts for the minus sign in the formula above. This completes the
proof.
For $N = 2^j$, this lemma can be iterated until one is reduced to computing
$\mathcal{F}_1$, which is trivial. Computing the discrete Fourier transform this way is
the fast Fourier transform algorithm.
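Iterating Lemma 3.3.1 gives a short recursive procedure. The following is a minimal sketch of that recursion (not an optimized implementation), assuming the input length is a power of two; `fft` and `dft` are illustrative names, with `dft` the direct $O(N^2)$ computation used for comparison.

```python
import cmath

def dft(f):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

def fft(f):
    """Radix-2 FFT obtained by iterating Lemma 3.3.1; len(f) must be a power of two."""
    N = len(f)
    if N == 1:
        return list(f)            # F_1 is the identity
    even = fft(f[0::2])           # F_{N/2} f_e
    odd = fft(f[1::2])            # F_{N/2} f_o
    out = [0j] * N
    for m in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * m / N) * odd[m]   # omega_N^{-m} F_{N/2} f_o[m]
        out[m] = even[m] + t
        out[m + N // 2] = even[m] - t
    return out

signal = [complex(k % 5, (-1) ** k) for k in range(16)]
fast, slow = fft(signal), dft(signal)
```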
We will look at the fast Fourier transform in more detail below. First,
let us see what computational advantage it has.
Proposition 3.3.2. Let $F(N)$ be the number of operations it takes to compute
the discrete Fourier transform with the fast Fourier transform algorithm.
Then
\[ F(N) \sim N\log N. \]
Recalling that computing the discrete Fourier transform using matrix
multiplication takes $O(N^2)$ elementary operations (e.g. additions and
multiplications), we see that the fast Fourier transform provides a huge
computational advantage.
Proof. Using Lemma 3.3.1, we find that
\[ F(N) = 2F(N/2) + cN \quad\text{for some } c > 0. \]
Rearranging, this yields
\[ \frac{F(N)}{N} = \frac{F(N/2)}{N/2} + c. \]
Assuming $N = 2^j$ and setting $a_j = \frac{F(2^j)}{2^j}$, this simply reads
\[ a_j = a_{j-1} + c, \quad\text{so that}\quad a_j = jc + a_0. \]
But $a_0 = F(1)$, the number of operations needed to compute the discrete
Fourier transform of a single point. In particular, $a_0 = 0$, so that $a_j = jc$.
Thus, since $j = \log_2 N$,
\[ F(N) = cN\log_2 N, \]
as claimed.
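The recurrence in the proof can be evaluated directly. The sketch below takes the hypothetical normalization $c = 1$ and confirms that the count equals $N\log_2 N$ for powers of two.

```python
def fft_cost(N, c=1):
    """Operation count from the recurrence F(N) = 2 F(N/2) + c N with F(1) = 0."""
    if N == 1:
        return 0
    return 2 * fft_cost(N // 2, c) + c * N

# Compare with the N^2 cost of a direct matrix-vector product.
costs = {N: fft_cost(N) for N in (2, 8, 1024)}
direct = {N: N * N for N in (2, 8, 1024)}
```

For $N = 1024$ this gives 10240 operations against 1048576 for the direct product.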
Let us now look closer at the fast Fourier transform algorithm. As we will
see, this algorithm amounts to a factorization of FN . Recalling Lemma 3.3.1,
we need to define an operation that sorts a vector into its even and odd
indices.
Definition 3.3.3. For $f \in \mathbb{C}^N$ with $N$ even, let
\[ \pi^0 f[k] = f[2k] \quad\text{and}\quad \pi^1 f[k] = f[2k+1] \]
for $k \in \{0, 1, \dots, \frac{N}{2} - 1\}$. In particular, $\pi^0 f \in \mathbb{C}^{N/2}$ and $\pi^1 f \in \mathbb{C}^{N/2}$. We
denote sequential applications in a contravariant fashion, namely
\[ \pi^{01} f = \pi^1\pi^0 f, \]
and so on. In particular, the input will always be a vector in $\mathbb{C}^N$, but the
output is a vector in $\mathbb{C}^{N/2^j}$, where $j$ is the number of digits appearing in
superscripts.
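Next let us write $I_N$ for the $N \times N$ identity matrix and $\Omega_N$ for the $N \times N$
diagonal matrix with $\Omega_N[m,m] = \omega_{2N}^{-m}$. Then Lemma 3.3.1 may be written
as
\[ \mathcal{F}_N f = B_N \begin{pmatrix} \mathcal{F}_{N/2}\,\pi^0 f \\ \mathcal{F}_{N/2}\,\pi^1 f \end{pmatrix}, \qquad B_N = \begin{pmatrix} I_{N/2} & \Omega_{N/2} \\ I_{N/2} & -\Omega_{N/2} \end{pmatrix}. \]
Now we repeat the process, yielding
\[ \mathcal{F}_N f = B_N \cdot \operatorname{diag}(B_{N/2}) \begin{pmatrix} \mathcal{F}_{N/4}\,\pi^{00} f \\ \mathcal{F}_{N/4}\,\pi^{01} f \\ \mathcal{F}_{N/4}\,\pi^{10} f \\ \mathcal{F}_{N/4}\,\pi^{11} f \end{pmatrix}, \]
where $\operatorname{diag}(B_{N/2})$ is the $N \times N$ block diagonal matrix with $B_{N/2}$ in the
upper left and lower right blocks.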
Now continue this until we reach a vector with $N$ applications of $\mathcal{F}_1 = \mathrm{Id}$.
This leads to
\[ \mathcal{F}_N f = \prod_{k=0}^{\log N - 1} \operatorname{diag}(B_{N/2^k})\;\bigl(\pi^{c(k)} f\bigr)_{k=0}^{N-1}, \tag{3.3} \]
where diag should always be taken with a suitable interpretation, and $c(k)$
denotes the unique sequence of $\log N$ binary digits corresponding to the
element $k \in \{0, 1, \dots, N-1\}$. For example, if $N = 8$ then $c(4) = 100$,
$c(5) = 101$, and so on. By uniqueness, we may also define the map $c^{-1}$ taking
sequences of binary digits to the corresponding integer (so that $c^{-1}(101) = 5$
in the example just given).
Let us continue from (3.3). The vector $(\pi^{c(k)} f)_{k=0}^{N-1}$ consists of some
rearrangement of the entries of $f$. We can understand exactly which
rearrangement occurs through the following lemma.
Lemma 3.3.4. Given a sequence $d = d_1 \cdots d_j$ of binary digits, define $Rd =
d_j \cdots d_1$ to be the reversal of $d$. Then
\[ \pi^d f = f[c^{-1}(Rd)] \]
for any vector $f \in \mathbb{C}^{2^j}$, where $c^{-1}$ is the map from sequences to integers
introduced above. Equivalently,
\[ \pi^{d_j} \cdots \pi^{d_1} f = f[c^{-1}(d_j \cdots d_1)]. \]
Proof. We proceed by induction. If $j = 1$ then $d = Rd$ and so the claim
boils down to $\pi^d f = f[d]$ for $d \in \{0, 1\}$ and $f \in \mathbb{C}^2$, which is true by
definition.
Now suppose the result holds up to level $j - 1$. Then
\[ \pi^{d_j \cdots d_2}\,\pi^{d_1} f = (\pi^{d_1} f)[c^{-1}(d_j \cdots d_2)]. \]
There are two cases, namely $d_1 \in \{0, 1\}$. Let us first assume $d_1 = 0$. In this
case,
\[ (\pi^0 f)[k] = f[2k], \quad\text{so that}\quad (\pi^0 f)[c^{-1}(d_j \cdots d_2)] = f[2c^{-1}(d_j \cdots d_2)]. \]
It therefore remains to show that
\[ 2c^{-1}(d) = c^{-1}(d0) \]
for any binary sequence $d$, where we note the slight abuse of notation in the
map $c^{-1}$ above. Indeed, multiplying a number by 2 just increases each power
of 2 in its binary expansion, which equivalently shifts the binary sequence
to the left.
If $d_1 = 1$ then $(\pi^1 f)[k] = f[2k+1]$, so that
\[ (\pi^1 f)[c^{-1}(d_j \cdots d_2)] = f[2c^{-1}(d_j \cdots d_2) + 1]. \]
Thus we need to check
\[ 2c^{-1}(d) + 1 = c^{-1}(d1). \]
In fact, multiplying by 2 appends a zero to the sequence (as we just saw),
and adding one changes this zero to a one. This completes the proof.
Returning to the setting of (3.3),
\[ \pi^{c(k)} f = f[c^{-1}(Rc(k))]. \]
Thus we see that performing the fast Fourier transform consists of first
sorting the indices of $f$ according to the rule above and subsequently
multiplying by $\log N$ explicit block diagonal matrices. (This also provides another
derivation of the $O(N\log N)$ computational complexity of the fast Fourier
transform.)
Note that the sorting described above defines a linear transformation on
$\mathbb{R}^N$ and hence is represented by an $N \times N$ matrix $P$. In particular, imposing
\[ (Pf)[j] = f[c^{-1}(Rc(j))] \]
leads to
\[ P[j,k] = \begin{cases} 1 & c(k) = Rc(j) \\ 0 & \text{otherwise.} \end{cases} \]
Thus for each row there will be precisely one nonzero entry (with value 1).
Example 3.3.1. If $N = 8$, the fast Fourier transform is computed by
\[
B_8 \begin{pmatrix} B_4 & 0 \\ 0 & B_4 \end{pmatrix}
\begin{pmatrix} B_2 & 0 & 0 & 0 \\ 0 & B_2 & 0 & 0 \\ 0 & 0 & B_2 & 0 \\ 0 & 0 & 0 & B_2 \end{pmatrix}
\begin{pmatrix} f[0] \\ f[4] \\ f[2] \\ f[6] \\ f[1] \\ f[5] \\ f[3] \\ f[7] \end{pmatrix}.
\]
The sorting matrix has the form
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]
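The factorization can be turned into an iterative procedure: sort by bit reversal, then apply the block diagonal matrices built from $B_2, B_4, \dots, B_N$ in turn. The following is a sketch under the assumption that $N$ is a power of two; the names `bit_reverse` and `fft_iterative` are illustrative.

```python
import cmath

def bit_reverse(k, bits):
    """Reverse the `bits`-digit binary expansion of k, i.e. compute c^{-1}(R c(k))."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (k & 1)
        k >>= 1
    return out

def fft_iterative(f):
    """Sort by bit reversal, then apply the butterflies B_2, B_4, ..., B_N block by block."""
    N = len(f)
    bits = N.bit_length() - 1
    x = [complex(f[bit_reverse(k, bits)]) for k in range(N)]   # this is P f
    size = 2
    while size <= N:
        half = size // 2
        for start in range(0, N, size):                  # one copy of B_size per block
            for m in range(half):
                w = cmath.exp(-2j * cmath.pi * m / size)  # diagonal entry of Omega_{size/2}
                a, b = x[start + m], w * x[start + half + m]
                x[start + m], x[start + half + m] = a + b, a - b
        size *= 2
    return x

order = [bit_reverse(k, 3) for k in range(8)]   # the sorted index order for N = 8
```

For $N = 8$ the list `order` reproduces the index order $0, 4, 2, 6, 1, 5, 3, 7$ of the example.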
Remark 3.3.5. If $N$ is not of the form $2^j$, a common trick is to ‘pad’ with
zeros until $N$ is of this form. Despite increasing the dimension, this can still
result in a computational advantage. We will not pursue this topic here.
3.4 Compressed sensing

We next turn to an introduction to the area of compressed sensing. In
many applications, signals are in some sense ‘sparse’, and because of this
it is often possible to (i) reconstruct signals using far fewer measurements
than expected and (ii) compress these signals significantly without losing
information (for purposes of data storage, for example). This is a field that
has developed rapidly in recent years and has many important applications.
We will primarily present the results of [4]; we also use [11] as a reference. At
times, certain standard probabilistic estimates may be used without proof.
Most of the proof will be presented; however, some very technical elements
will be relegated to the exercises.
We focus on the problem of recovering sparse signals from small sets of
Fourier coefficients. Sparseness can be measured using the ‘$\ell^0$ norm’ (it is
not a norm, nor even a quasi-norm): for $f \in \mathbb{C}^N$,
\[ \|f\|_{\ell^0} = |\operatorname{supp}(f)|, \qquad \operatorname{supp}(f) = \{j : f[j] \neq 0\}. \]
Here $|\cdot|$ denotes counting measure.

We begin with the following preliminary result.
Theorem 3.4.1. Let $f \in \mathbb{C}^N$ satisfy $\|f\|_{\ell^0} = s$. If $N \geq 2s$, then $f$ can be
reconstructed from its first $2s$ Fourier coefficients $\{\hat f[n] : n = 0, 1, \dots, 2s-1\}$.
Remark 3.4.2. More generally, if $N$ is prime and $\|f\|_{\ell^0} = s$, then $f$ can
be reconstructed from any collection of $2s$ Fourier coefficients. We will be
content to prove Theorem 3.4.1. See [25] for the more general result.
Proof of Theorem 3.4.1. We need to determine $S = \operatorname{supp}(f)$ and $\{f[j] : j \in
S\}$. The proof will actually demonstrate how one may reconstruct $f$.
Consider
\[ p(t) = \tfrac{1}{N}\prod_{k\in S}\bigl(1 - e^{-2\pi i k/N}\,e^{2\pi i t/N}\bigr). \]
This is a trigonometric polynomial, i.e. a polynomial in $e^{2\pi i t/N}$. Note
that $p(t) = 0$ for $t \in S$.
Furthermore, since $f[t] = 0$ for $t \notin S$, we have
\[ p(t)f(t) = 0 \quad\text{for all } 0 \leq t \leq N-1. \]
In particular, by Lemma 3.2.3,
\[ \hat p * \hat f = N\,\widehat{pf} = 0, \]
where $pf$ denotes component-wise product. We may rewrite this as
\[ \sum_{k=0}^{N-1} \hat p(k)\,\hat f(j-k) = 0 \tag{3.4} \]
for $j = 0, \dots, N-1$.
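Now observe (by considering the inverse Fourier transform) that $\hat p(k)$ is
the coefficient of $p(t)$ appearing with $e^{2\pi i k t/N}$ (up to the common factor $\frac{1}{N}$).
In particular, $\hat p(0) = 1$ and
(since $p$ is a trigonometric polynomial of degree $\leq s$) $\hat p(k) = 0$ for $k > s$.
Rewriting (3.4) for $s \leq j \leq 2s-1$ leads to
\[
\begin{aligned}
\hat f[s] + \hat p[1]\hat f[s-1] + \cdots + \hat p[s]\hat f[0] &= 0,\\
\hat f[s+1] + \hat p[1]\hat f[s] + \cdots + \hat p[s]\hat f[1] &= 0,\\
&\;\;\vdots\\
\hat f[2s-1] + \hat p[1]\hat f[2s-2] + \cdots + \hat p[s]\hat f[s-1] &= 0.
\end{aligned}
\]
We rewrite this as a linear system
\[
\begin{pmatrix} \hat f[s-1] & \hat f[s-2] & \cdots & \hat f[0] \\ \vdots & & \cdots & \vdots \\ \hat f[2s-2] & \cdots & \cdots & \hat f[s-1] \end{pmatrix}
\begin{pmatrix} \hat p[1] \\ \vdots \\ \hat p[s] \end{pmatrix}
= -\begin{pmatrix} \hat f[s] \\ \vdots \\ \hat f[2s-1] \end{pmatrix}. \tag{3.5}
\]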
Now, the matrix on the left and the vector on the right are known quantities.
In particular, given $f \in \mathbb{C}^N$ and knowledge of its first $2s$ Fourier
coefficients, we can write down the system (3.5), which must have at least
one solution (namely $\{\hat p[k]\}_{k=1}^{s}$). However, this solution may not be unique.
In the following, we find some solution $\hat q$ to (3.5), which we extend to $\mathbb{C}^N$
by setting $\hat q[0] = 1$ and $\hat q[k] = 0$ for $k > s$.
Then, reversing the steps above, we have $\hat q * \hat f[j] = 0$ for $s \leq j \leq
2s-1$. In particular $qf$ (component-wise product) has Fourier transform
vanishing on $s$ consecutive indices. We claim that this implies $qf \equiv 0$. To
see this, we first note that since $f = 0$ outside of $S$, to compute the vector
of Fourier coefficients $\widehat{qf}[j]$ for $s \leq j \leq 2s-1$ it suffices to multiply the
vector $\{qf[j]\}_{j\in S}$ by the $s \times s$ submatrix $A$ of $\mathcal{F}_N$ defined by choosing the
rows $s$ through $2s-1$ of $\mathcal{F}_N$ and the columns defined by indices in $S$. Thus
it remains to check that $A$ is invertible.
To see this, note that $A$ is of the form $A[i,j] = \omega^{-(s+i)n_j}$, where $i =
0, \dots, s-1$ and $S = \{n_j : j = 0, \dots, s-1\}$. In particular, after factoring
out $\omega^{-sn_j}$ from each column, $A$ is the transpose of a Vandermonde matrix
(i.e. a matrix whose rows are geometric progressions) corresponding to the
parameters $\omega^{-n_0}, \dots, \omega^{-n_{s-1}}$. Thus $\det A$ is a nonzero multiple of the product
over $0 \leq j < k \leq s-1$ of $\omega^{-n_j} - \omega^{-n_k}$, and so the result follows provided
$\omega^{n_j} \neq \omega^{n_k}$ for any $j \neq k$. In fact, this follows from the fact that $0 \leq n_j, n_k \leq
N-1$ and $n_j \neq n_k$.
We conclude that $A$ is invertible, and hence $qf[j] = 0$ for $j \in S$. In
particular, $q[j] = 0$ for $j \in S$. Now, recalling that $\hat q[k] = 0$ for $k > s$ (so
that $q$ is a trigonometric polynomial of degree $\leq s$), we see that the fact
that $q[j] = 0$ for $j \in S$ actually identifies $S$. That is, $S$ is given precisely by
the zeros of $q$.
Finally, to find $f[j]$ for $j \in S$, note that $\hat f[k]$ for $k = 0, \dots, 2s-1$
are given by $2s$ linear equations involving the unknowns $f[j]$. Solving this
system yields $f[j]$.
Remark 3.4.3. Let us summarize the proof above. To reconstruct a signal
$f$ with $\|f\|_{\ell^0} = s$ from its first $2s$ Fourier coefficients, proceed as follows:
• Find a solution $\hat q$ to the linear system (3.5). Extend to $\mathbb{C}^N$ by setting
$\hat q[0] = 1$ and $\hat q[k] = 0$ for $k > s$.
• Find the zeros of $q = \mathcal{F}^{-1}\hat q$. This identifies $S$.
• Solve the linear system that produces the first $2s$ Fourier coefficients
of $f$ from the unknowns $f[j]$, $j \in S$.
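The three steps above can be sketched in code as follows. This is an illustration under stated assumptions, not a robust implementation: it assumes $f$ has exactly $s$ nonzero entries and exact (noise-free) data, it locates the zeros of $q$ by taking the $s$ smallest values of $|q[t]|$, and in the last step it uses only the first $s$ of the $2s$ equations (which already form an invertible Vandermonde system). The names `solve` and `recover` are hypothetical.

```python
import cmath

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def recover(fhat_head, N, s):
    """Recover an s-sparse f in C^N from fhat[0], ..., fhat[2s-1]."""
    # Step 1: solve (3.5); row i encodes the equation with j = s + i.
    T = [[fhat_head[s + i - k] for k in range(1, s + 1)] for i in range(s)]
    qhat = [1] + solve(T, [-fhat_head[s + i] for i in range(s)])
    # Step 2: the zeros of q = F^{-1} qhat identify S.
    q = [sum(qhat[k] * cmath.exp(2j * cmath.pi * k * t / N) for k in range(s + 1)) / N
         for t in range(N)]
    S = sorted(sorted(range(N), key=lambda t: abs(q[t]))[:s])
    # Step 3: solve for the values f[n], n in S (using the first s coefficients only).
    V = [[cmath.exp(-2j * cmath.pi * m * n / N) for n in S] for m in range(s)]
    f = [0j] * N
    for n, v in zip(S, solve(V, fhat_head[:s])):
        f[n] = v
    return f

N, s = 16, 3
f = [0j] * N
f[2], f[7], f[11] = 1, -2 + 1j, 0.5
recovered = recover(dft(f)[:2 * s], N, s)
```

With noisy data the zero-finding step in particular would need more care; that is outside the scope of this sketch.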
The results just discussed suggest that in general, one should expect
that knowledge of a signal’s Fourier coefficients on a set $\Omega$ should suffice
to reconstruct signals with support of around the same size as $\Omega$. Strictly
speaking, this is not true.
Example 3.4.1. Suppose $N$ is a perfect square. Define $f$ by $f[j\sqrt{N}] = 1$ for
$j = 0, 1, \dots, \sqrt{N} - 1$ and $f[k] = 0$ otherwise. Then $\|f\|_{\ell^0} = \sqrt{N}$. Let us
compute the Fourier transform. We have
\[ \hat f[k] = \sum_{\ell=0}^{N-1} e^{-2\pi i \ell k/N} f[\ell] = \sum_{j=0}^{\sqrt{N}-1} e^{-2\pi i j k/\sqrt{N}}. \]
Now, if $k = p\sqrt{N}$ for some $p$, then the sum yields $\sqrt{N}$. Otherwise, summing
the geometric series (as we did when computing the inverse Fourier transform
in general) yields zero. Thus
\[ \hat f = \sqrt{N}\,f. \]
In particular, we may choose $\Omega$ to be the set of frequencies precisely avoiding
$\{p\sqrt{N} : p = 0, \dots, \sqrt{N} - 1\}$. Then $|\Omega| = N - \sqrt{N}$, but knowing the Fourier
coefficients on $\Omega$ cannot distinguish $f$ (supported on a set of much smaller
size $\sqrt{N}$) from the zero signal.
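The computation in this example is easy to confirm numerically. The sketch below uses the hypothetical choice $N = 16$.

```python
import cmath
import math

def dft(f):
    """Discrete Fourier transform with the convention exp(-2*pi*i*m*n/N)."""
    N = len(f)
    return [sum(cmath.exp(-2j * cmath.pi * m * n / N) * f[n] for n in range(N))
            for m in range(N)]

N = 16
root = math.isqrt(N)                                    # sqrt(N) = 4
f = [1.0 if k % root == 0 else 0.0 for k in range(N)]   # ones at multiples of sqrt(N)
fhat = dft(f)

Omega = [k for k in range(N) if k % root != 0]          # frequencies avoiding supp(fhat)
largest_on_Omega = max(abs(fhat[k]) for k in Omega)     # zero up to rounding
```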
The result that we will present essentially restores the intuition intro-
duced above (which was just shown to be wrong, strictly speaking). The
key is that one must incorporate a probabilistic viewpoint.
Before stating the result, let us first note that the reconstruction problem
under consideration is equivalent to the following $\ell^0$ minimization problem:
\[ \text{minimize } \|g\|_{\ell^0} \quad\text{subject to}\quad \hat g|_\Omega = \hat f|_\Omega. \tag{$P_0$} \]
This turns out to be computationally expensive and not particularly robust
(e.g. if one is dealing with noisy measurements). In practice, one instead
may consider the following $\ell^1$ minimization problem:
\[ \text{minimize } \|g\|_{\ell^1} \quad\text{subject to}\quad \hat g|_\Omega = \hat f|_\Omega. \tag{$P_1$} \]
The result we will prove will ultimately construct solutions to $(P_1)$.

Before moving on to the main result, let us quickly show that solving
$(P_1)$ yields ‘sparse’ signals (in the real-valued case, at least).
Proposition 3.4.4. Suppose there exists a unique minimizer $g$ to the
problem $(P_1)$ over $\mathbb{R}^N$. Then $\|g\|_{\ell^0} \leq |\Omega|$.
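Proof. The problem consists of minimizing the $\ell^1$ norm subject to a
constraint of the form $Ag = \hat f|_\Omega$ for $A \in \mathbb{C}^{|\Omega|\times N}$. In fact, $A$ is just a submatrix of
$\mathcal{F}_N$.
Let $g$ be the unique minimizer and $S = \operatorname{supp}(g)$. Writing $a_j$ for the
columns of $A$, we will show that $\{a_j : j \in \operatorname{supp}(g)\}$ is independent, which
implies $|\operatorname{supp}(g)| \leq |\Omega|$, as desired.
Suppose $Av = 0$ for some $v$ with $\operatorname{supp} v \subset S$. Suppose toward a
contradiction that $v \neq 0$.
As $g$ is the unique minimizer of $(P_1)$ and $A(g + tv) = Ag$, we have
\[ \|g\|_{\ell^1} < \|g + tv\|_{\ell^1} = \sum_{j\in S} \operatorname{sign}(g[j] + tv[j])\,(g[j] + tv[j]) \]
for any $t \neq 0$. Choosing
\[ |t| < \min_{j\in S} |g[j]|\;\|v\|_{\ell^\infty}^{-1}, \]
we are guaranteed that $\operatorname{sign}(g[j] + tv[j]) = \operatorname{sign}(g[j])$ for each $j$. Thus
\[ \|g\|_{\ell^1} < \sum_{j\in S} \operatorname{sign}(g[j])\,g[j] + t\sum_{j\in S} \operatorname{sign}(g[j])\,v[j] = \|g\|_{\ell^1} + t\sum_{j\in S} \operatorname{sign}(g[j])\,v[j]. \]
This gives a contradiction: the inequality must hold for small $t$ of either
sign, which would force $t\sum_{j\in S}\operatorname{sign}(g[j])\,v[j] > 0$ both for $t > 0$ and for $t < 0$.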
Let us finally state the main result of this section.
Theorem 3.4.5 (Candès–Romberg–Tao). Let $f \in \mathbb{C}^N$ and $M \geq 1$. There
exists $C_M > 0$ such that the following holds:
Suppose $f$ is supported on some set $S$. Choose $\Omega$ of size $|\Omega| = N_\omega$
uniformly at random. If
\[ |\Omega| = N_\omega \geq C_M\,|S|\log N, \]
then with probability at least $1 - O(N^{-M})$ the minimizer of $(P_1)$ is unique
and equals $f$.
Remark 3.4.6. In Theorem 3.4.5, the Fourier coefficients are randomly
sampled. In particular, given $N_\omega$, we choose $\Omega$ uniformly at random from
all sets of this size. Thus each of the $\binom{N}{N_\omega}$ possible subsets is equally
likely. The result says that the fraction of such subsets from which we can
reconstruct $f$ is at least $1 - O(N^{-M})$, provided $N_\omega \geq C_M |S|\log N$.
Remark 3.4.7. This theorem is optimal. Consider again the vector in
Example 3.4.1. To have a chance of recovering $f$, the set $\Omega$ must overlap
$W = \operatorname{supp}\hat f$ in at least one point. Now, choosing $\Omega$ uniformly at random,
we have
\[ \mathbb{P}(\Omega \cap W = \emptyset) = \binom{N - \sqrt{N}}{|\Omega|} \div \binom{N}{|\Omega|}. \]
Now, we should already be assuming that $|\Omega| > |S| = \sqrt{N}$. Under this
assumption, we get the following lower bound:
\[ \mathbb{P}(\Omega \cap W = \emptyset) \geq \bigl(1 - \tfrac{2|\Omega|}{N}\bigr)^{\sqrt{N}}. \]
See Exercise 3.5.4. Therefore, if we hope for $\mathbb{P}(\Omega \cap W = \emptyset) \leq N^{-M}$, we need
\[ \sqrt{N}\,\log\bigl(1 - \tfrac{2|\Omega|}{N}\bigr) \leq -M\log N. \]
Supposing we also want to avoid the case when $|\Omega|$ is comparable to $N$ (so
that $|\Omega| \ll \frac{1}{2}N$), we can view $\log(1 - \frac{2|\Omega|}{N})$ as comparable to $-\frac{2|\Omega|}{N}$, whence
\[ |\Omega| \gtrsim_M \sqrt{N}\log N \sim_M |S|\log N. \]
In particular, the $\log N$ appearing in Theorem 3.4.5 cannot be avoided.
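For concrete numbers, the probability of missing $W$ and the lower bound $(1 - 2|\Omega|/N)^{\sqrt{N}}$ can be computed exactly. The sketch below uses the hypothetical values $N = 64$ and $|\Omega| = 16$; `miss_probability` is an illustrative name.

```python
import math
from fractions import Fraction

def miss_probability(N, size):
    """P(Omega misses W) when |W| = sqrt(N) and Omega is uniform among sets of the given size."""
    root = math.isqrt(N)
    return Fraction(math.comb(N - root, size), math.comb(N, size))

N, size = 64, 16
exact = miss_probability(N, size)
lower = Fraction(N - 2 * size, N) ** math.isqrt(N)   # (1 - 2|Omega|/N)^{sqrt(N)}
```

In this instance the exact probability is roughly 0.085, comfortably above the lower bound $2^{-8}$.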
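We turn to the proof of Theorem 3.4.5. In the following, we denote
the restricted Fourier transform by $\mathcal{F}_{S\to\Omega}$; that is, if $e : \ell^2(S) \to \ell^2(\mathbb{C}^N)$ is
extension by zero, then $\mathcal{F}_{S\to\Omega} : \ell^2(S) \to \ell^2(\Omega)$ is given by
\[ \mathcal{F}_{S\to\Omega} f = \mathcal{F}_N(ef)|_\Omega \quad\text{for all } f \in \ell^2(S). \]
For complex vectors $f \in \mathbb{C}^N$ supported on a set $S$, we let the sign vector
$\operatorname{sgn}(f)$ be defined by
\[ \operatorname{sgn}(f)[n] = \frac{f[n]}{|f[n]|} \quad\text{for } n \in S, \]
with $\operatorname{sgn}(f) = 0$ off $S$.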

We will prove the following proposition, which we will then use in the
proof of Theorem 3.4.5.
Proposition 3.4.8. Let $\Omega \subset \{0, \dots, N-1\}$. Suppose $f \in \mathbb{C}^N$ and $S =
\operatorname{supp}(f)$. Suppose that there exists $P \in \mathbb{C}^N$ such that
• $\operatorname{supp}\hat P \subset \Omega$,
• $P[t] = \operatorname{sgn} f[t]$ on $S$,
• $|P[t]| < 1$ for $t \notin S$.
If $\mathcal{F}_{S\to\Omega}$ is injective, then the minimizer to $(P_1)$ is unique and equals
$f$. (Conversely, if $f$ is the unique minimizer of $(P_1)$ then there exists $P$ as
above.)
Proof of Proposition 3.4.8. Let us prove only the forward direction, which
is most directly useful for us.
Suppose such a vector $P$ exists. Suppose that $g$ satisfies $\hat g|_\Omega = \hat f|_\Omega$. Set
$h = g - f$. Then on $S$ we have
\[ |g| = |f + h| \geq |f| + \operatorname{Re}\bigl[h\,\overline{\operatorname{sgn}(f)}\bigr] = |f| + \operatorname{Re}[h\bar P]. \]
To establish this inequality, write $f = |f|e^{i\theta}$ (i.e. $e^{i\theta} = \operatorname{sgn}(f)$) and $h =
|h|e^{i\alpha}$; then the inequality is equivalent to
\[ \bigl||f| + |h|\cos\beta + i|h|\sin\beta\bigr| \geq |f| + |h|\cos\beta, \qquad \beta = \alpha - \theta. \]
Outside of $S$, we have
\[ |g| = |h| \geq \operatorname{Re}[h\bar P], \]
since $|P| < 1$.
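It follows that
\[ \|g\|_{\ell^1} \geq \|f\|_{\ell^1} + \sum_{n=0}^{N-1} \operatorname{Re}\bigl(h[n]\bar P[n]\bigr). \]
Applying Plancherel, using the properties of $P$, and recalling $\hat h = 0$ on $\Omega$,
we deduce that
\[ \sum_{n=0}^{N-1} \operatorname{Re}\bigl(h[n]\bar P[n]\bigr) = \tfrac{1}{N}\sum_{n=0}^{N-1} \operatorname{Re}\bigl(\hat h[n]\,\overline{\hat P[n]}\bigr) = 0. \]
It follows that $\|g\|_{\ell^1} \geq \|f\|_{\ell^1}$, so that $f$ is a minimizer of $(P_1)$.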

Now suppose that $\|g\|_{\ell^1} = \|f\|_{\ell^1}$. Then (considering the argument
above) we must have
\[ |h[n]| = \operatorname{Re}\bigl(h[n]\bar P[n]\bigr) \quad\text{for } n \notin S. \]
(Indeed, otherwise we obtain
\[ \|f\|_{\ell^1} = \|g\|_{\ell^1} > \|f\|_{\ell^1} + \underbrace{\sum_{n=0}^{N-1} \operatorname{Re}\bigl[h(n)\bar P(n)\bigr]}_{=0}, \]
a contradiction.) However, since $|P[n]| < 1$ for $n \notin S$, this implies $h \equiv 0$ off
of $S$. By the assumption that $\mathcal{F}_{S\to\Omega}$ is injective, we also have that $h = 0$ on
$S$. In particular, $f = g$, and hence $f$ is the unique minimizer of $(P_1)$.
The strategy will now be to construct a suitable polynomial satisfying
the first two properties in Proposition 3.4.8 and to show that it satisfies the
desired upper bound with high probability. We would like to choose
\[ P := \mathcal{F}^*_{\mathbb{C}^N\to\Omega}\,\mathcal{F}_{S\to\Omega}\,\bigl(\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}\bigr)^{-1} e^*\operatorname{sgn}(f), \tag{3.6} \]
where the notation means the following. First, $*$ denotes adjoint (i.e. conjugate
transpose). In particular, recalling that $e$ is extension by zero, we have
that $e^* : \ell^2(\mathbb{C}^N) \to \ell^2(S)$ is simply restriction to $S$.
If we can define such $P$ (given $S$ and $\Omega$), then $P$ automatically has
Fourier support in $\Omega$. Furthermore, we claim that $e^*P = e^*\operatorname{sgn}(f)$. In fact,
this follows from
\[ e^*\mathcal{F}^*_{\mathbb{C}^N\to\Omega} = \bigl(\mathcal{F}_{\mathbb{C}^N\to\Omega}\,e\bigr)^* = \mathcal{F}^*_{S\to\Omega}. \]
Thus, fixing $f$ and its support $S$, the proof of Theorem 3.4.5 boils down
to proving that if $\Omega$ is chosen uniformly at random from sets of size $\gtrsim_M
|S|\log N$, then
1. The operator $\mathcal{F}_{S\to\Omega}$ is injective with probability $1 - O(N^{-M})$, and
2. The function $P$ defined by (3.6) satisfies $|P| < 1$ off of $S$ with
probability $1 - O(N^{-M})$.
Indeed, if item (1) is satisfied then $\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}$ is necessarily invertible.
(Both are equivalent to $\mathcal{F}_{S\to\Omega}$ having full column rank.)
It turns out to be simpler to prove these when one uses a different
probabilistic model than simply selecting $\Omega$ uniformly at random. In particular,
let us first consider the Bernoulli model. Given $0 < \tau < 1$, we create the
random sequence
\[ I_\omega = \begin{cases} 1 & \text{with probability } \tau \\ 0 & \text{with probability } 1 - \tau. \end{cases} \tag{3.7} \]
We can then define a random set of Fourier coefficients by $\Omega = \{\omega :
I_\omega = 1\}$. The size $|\Omega|$ is random and follows a binomial distribution with
$\mathbb{E}(|\Omega|) = \tau N$. In fact, when $N$ is large, one has $|\Omega| \sim \tau N$ with high
probability (by large deviation estimates).
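The Bernoulli model is simple to simulate. The sketch below assumes the indicators $I_\omega$ are drawn independently (as the binomial distribution of $|\Omega|$ indicates) and uses hypothetical values for $N$, $\tau$ and the number of trials; `bernoulli_set` is an illustrative name.

```python
import random

def bernoulli_set(N, tau, rng):
    """Draw Omega = {w : I_w = 1}, with each I_w equal to 1 independently with probability tau."""
    return [w for w in range(N) if rng.random() < tau]

rng = random.Random(0)            # fixed seed so that the experiment is repeatable
N, tau, trials = 256, 0.25, 400
sizes = [len(bernoulli_set(N, tau, rng)) for _ in range(trials)]
mean_size = sum(sizes) / trials   # should be close to tau * N = 64
```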
We will prove the following two propositions.
Proposition 3.4.9 (Invertibility). Let $S \subset \{0, \dots, N-1\}$ and $M \geq 1$. Choose $\Omega$
according to the Bernoulli model with parameter $\tau$. Suppose
\[ \tau N \gtrsim_M |S|\log N. \]
Then $\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}$ is invertible with probability at least $1 - O(N^{-M})$.
Proposition 3.4.10 (Bounds). Under the assumptions of Proposition 3.4.9,
the function $P$ defined by (3.6) satisfies $|P| < 1$ off the set $S$ with probability
at least $1 - O(N^{-M})$.
Assuming these two results, let us complete the proof of Theorem 3.4.5.
Proof of Theorem 3.4.5, assuming Propositions 3.4.9 and 3.4.10. Let $F(\Omega)$
be the event that no polynomial $P$ exists as in Proposition 3.4.8 if we choose
the set $\Omega$ of Fourier coefficients.
Let $\Omega$ be of size $N_\omega$ drawn uniformly, and $\Omega'$ be drawn according to the
Bernoulli model with $\tau = \frac{N_\omega}{N}$. Writing $\Omega_k$ for a set of frequencies chosen
uniformly at random with $|\Omega_k| = k$, we have
\[ \mathbb{P}(F(\Omega')) = \sum_{k=0}^{N} \mathbb{P}\bigl(F(\Omega') : |\Omega'| = k\bigr)\,\mathbb{P}(|\Omega'| = k) = \sum_{k=0}^{N} \mathbb{P}(F(\Omega_k))\,\mathbb{P}(|\Omega'| = k). \]
Now note that $\mathbb{P}(F(\Omega_k))$ is decreasing in $k$ (it only gets easier to reconstruct
using larger sets). We also claim that
\[ \mathbb{P}(|\Omega'| \leq \tau N) > \tfrac{1}{2}, \]
which follows from the fact that $\tau N$ is an integer and hence the median of
the random variable $|\Omega'|$. Thus
\[ \mathbb{P}(F(\Omega')) \geq \sum_{k=1}^{N_\omega} \mathbb{P}(F(\Omega_k))\,\mathbb{P}(|\Omega'| = k) \geq \mathbb{P}(F(\Omega))\sum_{k=1}^{N_\omega} \mathbb{P}(|\Omega'| = k) \geq \tfrac{1}{2}\,\mathbb{P}(F(\Omega)). \]
In particular, if we can bound the probability of failure for the Bernoulli
model, then the probability of failure for the uniform model will be no more
than twice as large.
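The key to proving both Propositions 3.4.9 and 3.4.10 is to establish
certain probabilistic estimates for random matrices. From this point on, we
assume the Bernoulli model holds and also assume $|\tau N| > M\log N$.
Define
\[ Hf[t] = -\sum_{\omega\in\Omega}\;\sum_{s\in S,\ s\neq t} e^{2\pi i \omega(t-s)/N} f[s] \tag{3.8} \]
and set $H_0 = e^*H$. Writing $I_S$ for the identity operator on $\ell^2(S)$ (so that
$e^*e = I_S$), we have
\[ e - \tfrac{1}{|\Omega|}H = \tfrac{1}{|\Omega|}\,\mathcal{F}^*_{\mathbb{C}^N\to\Omega}\mathcal{F}_{S\to\Omega}, \qquad I_S - \tfrac{1}{|\Omega|}H_0 = \tfrac{1}{|\Omega|}\,\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}. \]
In particular, introducing $H_0$ separates the diagonal term of $\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}$
(which equals $|\Omega|$ identically) from the oscillatory off-diagonal. To define $P$
as in (3.6), we then wish to have
\[ P = \bigl(e - \tfrac{1}{|\Omega|}H\bigr)\bigl(I_S - \tfrac{1}{|\Omega|}H_0\bigr)^{-1} e^*\operatorname{sgn}(f). \tag{3.9} \]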
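The relation between $H_0$ and the restricted Fourier transform can be checked entry by entry. The sketch below uses small hypothetical sets `S` and `Omega` and verifies that $|\Omega|\,I_S - H_0$ agrees with $\mathcal{F}^*_{S\to\Omega}\mathcal{F}_{S\to\Omega}$; the function names are illustrative.

```python
import cmath

N = 12
S = [1, 4, 5, 9]            # hypothetical support set
Omega = [0, 2, 3, 7, 10]    # hypothetical set of frequencies

def H0_entry(t, s):
    """Matrix entry of H_0 = e^* H read off from (3.8); the diagonal vanishes."""
    if s == t:
        return 0j
    return -sum(cmath.exp(2j * cmath.pi * w * (t - s) / N) for w in Omega)

def gram_entry(t, s):
    """Entry (t, s) of F_{S->Omega}^* F_{S->Omega}, where F[w, s] = exp(-2*pi*i*w*s/N)."""
    return sum(cmath.exp(-2j * cmath.pi * w * t / N).conjugate()
               * cmath.exp(-2j * cmath.pi * w * s / N) for w in Omega)

# Largest deviation between |Omega| I_S - H_0 and the Gram matrix over all entries.
gap = max(abs((len(Omega) if s == t else 0) - H0_entry(t, s) - gram_entry(t, s))
          for t in S for s in S)
```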
To prove invertibility (cf. Proposition 3.4.9), we need to estimate the

operator norm of H0 . We will prove the following below:
Lemma 3.4.11 (Moment bounds). Let $\tau \leq (1 + e)^{-1}$ and $n_0 = \frac{\tau N}{4|S|(1-\tau)}$.
If $n \leq n_0$, then
\[ \mathbb{E}\bigl[\operatorname{tr}(H_0^{2n})\bigr] \leq 2\Bigl(\frac{4}{e(1-\tau)}\Bigr)^{n} n^{n+1}\,|\tau N|^{n}\,|S|^{n+1}. \]
To prove the upper bound in Proposition 3.4.10, we will also need estimates
on $H$. The crucial estimate will be the following:
Lemma 3.4.12 (Moment bounds, II). Let $\tau \leq (1 + e)^{-1}$ and $n_0 = \frac{\tau N}{4|S|(1-\tau)}$.
For $n = km \leq n_0$,
\[ \mathbb{E}\bigl[|H_0^{m}\operatorname{sgn}(f)|^{2k}\bigr] \leq 2\Bigl(\frac{4}{e(1-\tau)}\Bigr)^{n} n^{n+1}\,|\tau N|^{n}\,|S|^{n} \]
uniformly on $S$.
Assuming these moment bounds for now, let us complete the proof of
Propositions 3.4.9 and 3.4.10. Then, finally, we will prove the moment
bounds and thereby complete the proof of Theorem 3.4.5.
Proof of Proposition 3.4.9. We fix $M > 0$. We need to prove invertibility of
the matrix
\[ I_S - \tfrac{1}{|\Omega|}H_0, \]
where $H_0 = e^*H$ and $H$ is as in (3.8). For this, we essentially need to show
that we have the following bound on the operator norm:
\[ \|H_0\| < c\,|\Omega| \]
for some $0 < c < 1$ (see Exercise 3.5.5). Recall that $\mathbb{E}\{|\Omega|\} = \tau N$. We first
deal with the probability that $|\Omega|$ is far from its expectation.
We will use a standard large deviation estimate, namely
\[ \mathbb{P}\{|\Omega| < \mathbb{E}(|\Omega|) - t\} \leq \exp\Bigl\{-\frac{t^2}{2\,\mathbb{E}(|\Omega|)}\Bigr\} \]
for any $t > 0$. Applying this with the choice
\[ t = \tau N\varepsilon_M, \quad\text{where}\quad \varepsilon_M := \sqrt{\frac{2M\log N}{\tau N}} \tag{3.10} \]
yields
\[ \mathbb{P}\{B_M\} \leq N^{-M}, \quad\text{where}\quad B_M = \{|\Omega| < (1 - \varepsilon_M)\tau N\}. \]
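Next, let $A_M$ denote the event $\{\|H_0\| \geq \frac{\tau N}{\sqrt{2}}\}$. We would like to bound
$\mathbb{P}\{A_M\}$. For this we will rely on Lemma 3.4.11 and the fact that (since $H_0$
is self-adjoint)
\[ \|H_0\|^{2n} = \|H_0^n\|^2 \leq \operatorname{tr}(H_0^{2n}). \]
See Exercise 3.5.6. In particular, for any $n \leq \frac{\tau N}{4|S|(1-\tau)}$, we use Tchebychev
and Lemma 3.4.11 to estimate
\[
\begin{aligned}
\mathbb{P}\bigl\{\|H_0\| \geq \tfrac{\tau N}{\sqrt{2}}\bigr\} &= \mathbb{P}\bigl\{\|H_0\|^{2n} \geq (\tfrac{\tau N}{\sqrt{2}})^{2n}\bigr\} \\
&\leq \mathbb{P}\bigl\{\operatorname{tr}(H_0^{2n}) \geq (\tfrac{\tau N}{\sqrt{2}})^{2n}\bigr\} \\
&\leq \frac{2^n}{(\tau N)^{2n}}\,\mathbb{E}\{\operatorname{tr}(H_0^{2n})\} \\
&\leq \frac{2^n}{(\tau N)^{2n}}\cdot 2\Bigl(\frac{4}{e(1-\tau)}\Bigr)^{n} n^{n+1}(\tau N)^{n}|S|^{n+1}.
\end{aligned}
\]
Recalling the assumed lower bound $\tau N \gtrsim_M |S|\log N$, we see that we may
choose $n \sim_M \log N$, say $n = (M+1)\log N$. Choosing constants appropriately,
the upper bound above is of the form
\[ 2n\Bigl(\frac{8n|S|}{\tau N(1-\tau)}\Bigr)^{n}|S|\,e^{-n} \lesssim |\tau N|\,N^{-(M+1)} \lesssim N^{-M}. \]
In conclusion, we have shown that on $A_M^c \cap B_M^c$, we have
\[ \|H_0\| \leq \frac{\tau N}{\sqrt{2}} \leq \frac{|\Omega|}{\sqrt{2}(1-\varepsilon_M)} < c\,|\Omega| \]
for some uniform $0 < c < 1$. In fact, this holds with the Frobenius norm
of $H_0$. This shows the desired invertibility with the desired probability and
completes the proof.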
So far we have established that we may define the function (3.6) with high
probability. We turn to the proof of the upper bounds in Proposition 3.4.10,
which will again rely on Lemma 3.4.11 and Lemma 3.4.12.
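Proof of Proposition 3.4.10. We need to prove bounds for
\[
P = \bigl(e - \tfrac{1}{|\Omega|} H\bigr)\bigl(I_S - \tfrac{1}{|\Omega|} H_0\bigr)^{-1} e^* \operatorname{sgn}(f)
\]
on the complement of $S$. We begin by writing
\begin{align*}
\bigl(I_S - \tfrac{1}{|\Omega|} H_0\bigr)^{-1}
&= \bigl(I_S - \tfrac{1}{|\Omega|^n} H_0^n\bigr)^{-1}\Bigl(\sum_{m=0}^{n-1} \tfrac{1}{|\Omega|^m} H_0^m\Bigr) \\
&= \Bigl(I_S + \sum_{p=1}^{\infty} \tfrac{1}{|\Omega|^{np}} H_0^{np}\Bigr)\Bigl(\sum_{m=0}^{n-1} \tfrac{1}{|\Omega|^m} H_0^m\Bigr),
\end{align*}
where we have used the identity $(1-M)^{-1} = (1-M^n)^{-1}(1+M+\cdots+M^{n-1})$.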
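The factorization just used is easy to test on a random matrix. This is a minimal sketch under the assumption that the matrix has Frobenius norm less than one (here normalized to $\tfrac12$); the names `M`, `R`, and `alpha` are illustrative, with `M` playing the role of $\tfrac{1}{|\Omega|}H_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 4
M = rng.standard_normal((d, d))
M *= 0.5 / np.linalg.norm(M, 'fro')     # rescale so the Frobenius norm is 1/2

I = np.eye(d)
Mn = np.linalg.matrix_power(M, n)
partial = sum(np.linalg.matrix_power(M, m) for m in range(n))   # I + M + ... + M^(n-1)

lhs = np.linalg.inv(I - M)
rhs = np.linalg.inv(I - Mn) @ partial

# Remainder R = sum_{p >= 1} M^(np), so that (I - M^n)^(-1) = I + R.
R = np.linalg.inv(I - Mn) - I
alpha = np.linalg.norm(M, 'fro')
print(np.abs(lhs - rhs).max(), np.linalg.norm(R, 'fro'), alpha**n / (1 - alpha**n))
```

Since the Frobenius norm is submultiplicative, the geometric series gives $\|R\|_F \le \alpha^n/(1-\alpha^n)$ whenever $\|M\|_F \le \alpha < 1$.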
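In the following, we denote
\[
R = \sum_{p=1}^{\infty} \tfrac{1}{|\Omega|^{np}} H_0^{np}
\]
and regard this as a remainder term. In fact, writing $\|H_0\|_F = \sqrt{\operatorname{tr} H_0 H_0^*}$ for the Frobenius norm, we have the following implication:
\[
\|H_0\|_F \le \alpha|\Omega| \implies \|R\|_F \le \tfrac{\alpha^n}{1-\alpha^n}.
\]
Using the Cauchy–Schwarz inequality, we can write
\[
\|R\|_\infty \le |S|^{1/2}\,\|R\|_F,
\]
where
\[
\|R\|_\infty := \sup_i \sum_j |R[i,j]| = \sup_{\|x\|_{\ell^\infty}\le 1} \|Rx\|_{\ell^\infty}.
\]
Indeed, here $|S|$ is the number of columns of $R$. In particular, we have the following implication:
\[
\|H_0\|_F \le \alpha|\Omega| \implies \|R\|_\infty \le |S|^{1/2}\,\tfrac{\alpha^n}{1-\alpha^n}. \tag{3.11}
\]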
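As we will see, this will deal with the contribution of $R$ in the formulas above. Thus we will focus on proving estimates for the truncated series
\[
\sum_{m=0}^{n-1} \tfrac{1}{|\Omega|^m} H_0^m.
\]
We claim that we may write
\[
P = P_0 + P_1 \quad\text{off of } S,
\]
where
\[
P_0 = D_n \operatorname{sgn}(f), \qquad P_1 = \tfrac{1}{|\Omega|} H R e^* (I + D_{n-1}) \operatorname{sgn}(f), \qquad D_n = \sum_{m=1}^{n} \tfrac{1}{|\Omega|^m} H_0^m.
\]
To this end, note that by (3.9) we have
\[
P = -\tfrac{1}{|\Omega|} H \bigl(I_S - \tfrac{1}{|\Omega|} H_0\bigr)^{-1} e^* \operatorname{sgn}(f) \quad\text{off of } S.
\]
Continuing from above, some rearrangement shows that the claim boils down to the identity
\[
e^* \sum_{m=0}^{n-1} |\Omega|^{-m} (He^*)^m \operatorname{sgn}(f) = \sum_{m=0}^{n-1} |\Omega|^{-m} (e^*H)^m \operatorname{sgn}(f).
\]
We prove this by induction. In particular, if $m = 0$ then $e^* \operatorname{sgn}(f) = \operatorname{sgn}(f)$ since $f$ is supported on $S$. Next, if $e^*(He^*)^m \operatorname{sgn}(f) = (e^*H)^m \operatorname{sgn}(f)$, then
\[
(e^*H)^{m+1} \operatorname{sgn} f = e^*H\bigl[e^*(He^*)^m \operatorname{sgn} f\bigr] = e^*(He^*)^{m+1} \operatorname{sgn} f,
\]
as desired.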
Now, choosing any $a_0, a_1 > 0$ with $a_0 + a_1 = 1$, we begin with the bound
\[
\mathbb{P}\Bigl(\sup_{t\notin S} |P(t)| > 1\Bigr) \le \mathbb{P}(\|P_0\|_\infty > a_0) + \mathbb{P}(\|P_1\|_\infty > a_1).
\]
Let us first focus on proving bounds for $P_0$; we will return to $P_1$ below.

For the $P_0$ term, we will use the moment bounds in Lemma 3.4.12. Recalling the proof of Proposition 3.4.9, we have the set $B_M$ on which $|\Omega| < (1-\varepsilon_M)\tau N$, where $\varepsilon_M$ is as in (3.10).
On the complement of $B_M$, we have
\[
|P_0| \le \sum_{m=1}^{n} Y_m, \qquad Y_m = \tfrac{1}{(1-\varepsilon_M)^m (\tau N)^m}\, |H_0^m \operatorname{sgn}(f)|.
\]
We suppose $n = 2^J - 1$ for some $J$ and let $\beta_j$ be positive numbers such that
\[
\sum_{j=0}^{J-1} 2^j \beta_j \le a_0.
\]
Then, by Tchebychev,
\[
\mathbb{P}\Bigl(\sum_{m=1}^{n} Y_m > a_0\Bigr) \le \sum_{j=0}^{J-1} \sum_{m=2^j}^{2^{j+1}-1} \mathbb{P}(Y_m > \beta_j) \le \sum_{j=0}^{J-1} \sum_{m=2^j}^{2^{j+1}-1} \beta_j^{-2K_j}\, \mathbb{E}\{|Y_m|^{2K_j}\},
\]
where $K_j := 2^{J-j}$. Now, for $2^j \le m < 2^{j+1}$, we have $n \le K_j m < 2n$. Thus, recalling $|S| \lesssim \tau N/n$ (with $n \sim_M \log N$), we can apply Lemma 3.4.12 to get a bound like
\[
\mathbb{E}\{|Y_m|^{2K_j}\} \lesssim (1-\varepsilon_M)^{-2n}\, n\, e^{-n} \alpha^{n}
\]
for some $\alpha \in (0,1)$ (one should think of $\alpha \sim \tfrac12$, but this parameter will be specified more precisely below). If we choose $\beta_j^{-K_j} \equiv \beta_0^{-n}$, then summing the above gives
\[
\mathbb{P}\{|P_0(t)| > a_0\} \le 2(1-\varepsilon_M)^{-2n}\, n^2 e^{-n} \alpha^{n} \beta_0^{-2n}.
\]
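With $\beta_0 \sim .42$, one has $\sum_j 2^j \beta_j \le .91$ and hence one can conclude
\[
\mathbb{P}\{|P_0(t)| > a_0\} \le \varepsilon_n := 2(1-\varepsilon_M)^{-2n}\, n^2 e^{-n} \alpha^{2n} (.42)^{-2n},
\]
where $a_0 \sim .91$. In particular, we have a set $A(t)$ with $\mathbb{P}\{A(t)\} > 1 - \varepsilon_n$ and $|P_0(t)| < .91$ on $A(t) \cap B_M^c$. As a consequence,
\[
\mathbb{P}\Bigl(\sup_t |P_0(t)| > a_0\Bigr) \le N^{-M} + N\varepsilon_n. \tag{3.12}
\]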
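The numerical claim $\sum_j 2^j\beta_j \le .91$ can be explored with a few lines of Python. This sketch rests on an assumption about how the weights are defined: we read the condition $\beta_j^{-K_j} \equiv \beta_0^{-n}$ as $\beta_j = \beta_0^{\,n/K_j}$ with the parameter $\beta_0 = .42$, $K_j = 2^{J-j}$, and $n = 2^J - 1$. The function name `dyadic_sum` is ours.

```python
def dyadic_sum(beta0, J):
    """Sum over j = 0, ..., J-1 of 2^j * beta_j, with beta_j = beta0 ** (n / K_j)."""
    n = 2**J - 1
    total = 0.0
    for j in range(J):
        K_j = 2**(J - j)
        beta_j = beta0 ** (n / K_j)      # solves beta_j ** (-K_j) == beta0 ** (-n)
        total += 2**j * beta_j
    return total

sums = {J: dyadic_sum(0.42, J) for J in (10, 12, 16)}
for J, value in sums.items():
    print(J, value)
```

Under this reading the exponent $n/K_j$ is close to $2^j$ for large $J$, so the sum is dominated by its first three or four terms.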
We now need to deal with $P_1$. For this, we observe
\[
P_1 = \tfrac{1}{|\Omega|} H R e^* \bigl(\operatorname{sgn}(f) + Q_0\bigr), \qquad Q_0 := D_{n-1} \operatorname{sgn}(f),
\]
so that
\[
\|P_1\|_\infty \le \tfrac{1}{|\Omega|} \|HR\|_\infty \bigl(1 + \|Q_0\|_\infty\bigr).
\]
Note the argument just given for $P_0$ applies equally well to $Q_0$.
Consider the event $E = \{\|H_0\|_F \le \alpha|\Omega|\}$. As we saw in the proof of Proposition 3.4.9, the probability of $E$ exceeds $1 - O(N^{-M})$. Using the crude bound $\|H\|_\infty \le |S||\Omega|$ (since $H$ has $|S|$ columns and each entry is bounded by $|\Omega|$) and (3.11), we have
\[
\tfrac{1}{|\Omega|} \|R\|_\infty \|H\|_\infty \le |S|^{3/2}\, \tfrac{\alpha^n}{1-\alpha^n} \quad\text{on } E.
\]
Putting together the pieces and recalling the bound (3.12) (for $Q_0$), we see that
\[
\|P_1\|_\infty \le 2|S|^{3/2}\, \tfrac{\alpha^n}{1-\alpha^n} \le a_1 \tag{3.13}
\]
with probability $1 - O(N^{-M})$, provided that the second inequality holds.

Recall that $a_1 \sim .09$ is just a fixed constant here.
It remains to put together the pieces and complete the proof of Proposition 3.4.10. Recall that we choose $n \sim (M+1)\log N$ and that we have a free parameter $\alpha$ appearing in the definition of $\varepsilon_n$, which needs to be chosen small. The choice is ultimately dictated by the fact that we want the probability in (3.12) to be $O(N^{-M})$. In particular, we should take
\[
\alpha = .42(1-\varepsilon_M).
\]
It remains to check that the final inequality in (3.13) holds. Using the crude bound $|S| \le N$, we need
\[
2N^{3/2}\, \frac{[.42(1-\varepsilon_M)]^n}{1 - [.42(1-\varepsilon_M)]^n} \le .09,
\]
where we recall
\[
\varepsilon_M = \sqrt{\tfrac{2M\log N}{\tau N}} \quad\text{and}\quad n \sim (M+1)\log N.
\]
In particular, the inequality above holds for $N, M$ reasonably large. This completes the proof.
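To get a feeling for what "reasonably large" means, one can evaluate the left-hand side of the last inequality directly. This sketch takes $n = (M+1)\log N$ exactly and, as a simplifying assumption, sets $\varepsilon_M = 0$ by default, which is the least favorable case because a positive $\varepsilon_M$ only shrinks the base $.42(1-\varepsilon_M)$. The function name `final_lhs` is ours.

```python
from math import log

def final_lhs(N, M, eps_M=0.0):
    """2 N^(3/2) x^n / (1 - x^n) with x = .42 (1 - eps_M) and n = (M + 1) log N."""
    n = (M + 1) * log(N)
    x_to_n = (0.42 * (1 - eps_M)) ** n
    return 2 * N**1.5 * x_to_n / (1 - x_to_n)

values = {(N, M): final_lhs(N, M) for N in (10**3, 10**6) for M in (1, 2, 3)}
for key, value in sorted(values.items()):
    print(key, value, value <= 0.09)
```

Since $.42^{(M+1)\log N} = N^{(M+1)\log .42}$ with $\log .42 \approx -0.87$, the left-hand side behaves like $2N^{3/2 - 0.87(M+1)}$, which decays in $N$ as soon as $M \ge 1$.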
Finally, we turn to the probabilistic estimates in Lemma 3.4.11 and
Lemma 3.4.12. The proofs are similar, so let us focus on Lemma 3.4.11.
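Proof of Lemma 3.4.11. Let us write the matrix elements of the $|S| \times |S|$ matrix $H_0$ as follows:
\[
H_0(t,t') = \begin{cases} 0 & t = t', \\ c(t-t') & t \ne t', \end{cases}
\qquad\text{where}\quad c(u) := \sum_{\omega\in\Omega} e^{2\pi i\omega u/N}.
\]
In particular, the diagonal entries of $H_0^{2n}$ are given by
\[
H_0^{2n}(t_1,t_1) = \sum_{t_2,\dots,t_{2n}:\ t_j \ne t_{j+1}} c(t_1 - t_2) \cdots c(t_{2n} - t_1),
\]
where we write $t_{2n+1} = t_1$. It follows that
\begin{align*}
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\}
&= \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}} \mathbb{E} \sum_{\omega_1,\dots,\omega_{2n}\in\Omega} \exp\Bigl\{\tfrac{2\pi i}{N} \sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})\Bigr\} \\
&= \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}}\ \sum_{0\le\omega_1,\dots,\omega_{2n}\le N-1} \exp\Bigl\{\tfrac{2\pi i}{N} \sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})\Bigr\}\ \mathbb{E} \prod_{j=1}^{2n} I_{\omega_j},
\end{align*}
where $I_{\omega_j}$ is the random variable defined in (3.7) and we have used the linearity of expectation.
Now, for any $\omega = \{\omega_1,\dots,\omega_{2n}\}$, we may define an equivalence relation $R(\omega)$ on $A = \{1,\dots,2n\}$ by imposing that
\[
j\,R(\omega)\,k \quad\text{iff}\quad \omega_j = \omega_k.
\]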
We claim that the expectation above depends only on the set of equivalence classes of $R(\omega)$, denoted $A/R(\omega)$. This is because the $I_{\omega_j}$ are independent and identically distributed. In particular,
\[
\mathbb{E} \prod_{j=1}^{2n} I_{\omega_j} = \tau^{|A/R(\omega)|}.
\]
(The number of equivalence classes in $A/R(\omega)$ tells you how many times you should really multiply the probability $\tau$ to compute the expected value.) With this in mind, we rewrite
\[
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\} = \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}}\ \sum_{R\in\mathcal{P}(A)} \tau^{|A/R|} \sum_{\omega\in\Omega(R)} \exp\Bigl\{\tfrac{2\pi i}{N} \sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})\Bigr\}, \tag{3.14}
\]
where $\mathcal{P}(A)$ is the set of all equivalence relations on $A$ and
\[
\Omega(R) = \{\omega : R(\omega) = R\} = \{\omega : \omega_a = \omega_b \iff aRb\}.
\]
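The identity $\mathbb{E}\prod_j I_{\omega_j} = \tau^{|A/R(\omega)|}$ used above can be confirmed by brute force for a small example. The sketch below assumes that the $I_\omega$ are independent indicators of the events $\{\omega \in \Omega\}$, each equal to $1$ with probability $\tau$ (our reading of the Bernoulli model); the tuple `omegas` is an arbitrary choice.

```python
from itertools import product

def expected_product(omegas, N, tau):
    """Exact E[prod_j I_{omega_j}], enumerating all 2^N outcomes of the Bernoulli model."""
    total = 0.0
    for keep in product((0, 1), repeat=N):          # keep[w] == 1 means frequency w is in Omega
        prob = 1.0
        for bit in keep:
            prob *= tau if bit else 1 - tau
        if all(keep[w] for w in omegas):            # the product of indicators equals 1
            total += prob
    return total

N, tau = 6, 0.3
omegas = (2, 5, 2, 2, 0, 5)        # 2n = 6 indices taking 3 distinct values
value = expected_product(omegas, N, tau)
classes = len(set(omegas))         # |A / R(omega)|
print(value, tau**classes)
```

Repeated indices contribute only once because an indicator satisfies $I^2 = I$, which is why only the number of classes matters.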
In particular, the equation (3.14) implicitly contains sums defined by imposing $\omega_a \ne \omega_b$ for some $a, b$. The next step will be to rewrite the sums in a way that avoids such 'exclusions', so that we can end up writing sums as products that are easily understood. Here is the relevant identity:
Lemma 3.4.13 (Inclusion-exclusion formula).
\[
\sum_{\omega\in\Omega(R)} f(\omega) = \sum_{R'\le R} (-1)^{|A/R| - |A/R'|} \Bigl(\prod_{A'\in A/R'} (|A'/R| - 1)!\Bigr) \sum_{\omega\in\Omega_\le(R')} f(\omega),
\]
where
\[
R' \le R \quad\text{if}\quad aRb \implies aR'b
\]
and
\[
\Omega_\le(R) = \{\omega : R(\omega) \le R\} = \{\omega : aRb \implies \omega_a = \omega_b\}.
\]
The proof of this lemma is outlined in Exercise 3.5.7.

Let us continue from (3.14), writing the inner sum as
\[
\sum_{R\in\mathcal{P}(A)} \tau^{|A/R|} \sum_{\omega\in\Omega(R)} f(\omega),
\qquad\text{where}\quad
f(\omega) = \exp\Bigl\{\tfrac{2\pi i}{N} \sum_{j=1}^{2n} \omega_j (t_j - t_{j+1})\Bigr\}.
\]
We apply Lemma 3.4.13 to this expression. By rearranging the sums over $R \in \mathcal{P}(A)$ and over $R' \le R$, we may rewrite the expression above as
\[
\sum_{R'\in\mathcal{P}(A)} T(R') \sum_{\omega\in\Omega_\le(R')} f(\omega),
\]
where
\[
T(R') := \sum_{R\ge R'} \tau^{|A/R|} (-1)^{|A/R| - |A/R'|} \prod_{A'\in A/R'} (|A'/R| - 1)!.
\]
We now claim that (by splitting $A$ into equivalence classes of $A/R'$ and further splitting the relations $R''$ on $A' \in A/R'$ by number of equivalence classes), we may rewrite this as
\[
T(R') = \prod_{A'\in A/R'}\ \sum_{k=1}^{|A'|} S(|A'|,k)\, \tau^k (-1)^{|A'|-k} (k-1)!, \tag{3.15}
\]
where $S$ denotes the Stirling number of the second kind, i.e.
\[
S(n,k) = \#\{R \in \mathcal{P}(A) : |A/R| = k\}, \quad\text{with } \#A = n.
\]
See Exercise 3.5.9. We denote the sum appearing above by $F_{|A'|}(\tau)$, i.e.
\[
F_n(\tau) := \sum_{k=1}^{n} (k-1)!\, S(n,k) (-1)^{n-k} \tau^k. \tag{3.16}
\]
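Using this notation, let us continue from above to finally express the desired expected value in a way that is amenable to estimation. So far, we have arrived at
\[
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\} = \sum_{R\in\mathcal{P}(A)}\ \sum_{t_1,\dots,t_{2n}:\ t_j \ne t_{j+1}}\ \prod_{A'\in A/R} F_{|A'|}(\tau) \sum_{\omega\in\Omega_\le(R)} f(\omega),
\]
with $f$ as above. Now let us work on this final sum. Note that for any equivalence class $A' \in A/R$ and $\omega \in \Omega_\le(R)$, we have $\omega_a = \omega_b$ for any $a, b \in A'$. Denote this common value by $\omega_{A'}$. Denote also $t_{A'} = \sum_{a\in A'} (t_a - t_{a+1})$. Then we can write
\[
\exp\Bigl\{\tfrac{2\pi i}{N} \sum_{j} \omega_j (t_j - t_{j+1})\Bigr\} = \prod_{A'\in A/R} \exp\bigl\{\tfrac{2\pi i}{N}\, \omega_{A'} t_{A'}\bigr\}.
\]
Using this,
\[
\sum_{\omega\in\Omega_\le(R)} f(\omega) = \prod_{A'\in A/R}\ \sum_{\omega_{A'}} \exp\bigl\{\tfrac{2\pi i}{N}\, \omega_{A'} t_{A'}\bigr\},
\]
where the sum is over all possible $\omega_{A'} \in \{0,\dots,N-1\}$. But now we observe that the inner sum equals $N$ when $t_{A'} = 0$ and equals zero otherwise. In conclusion:
\[
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\} = \sum_{R\in\mathcal{P}(A)}\ \sum_{t\in T(R)} N^{|A/R|} \prod_{A'\in A/R} F_{|A'|}(\tau), \tag{3.17}
\]
where
\[
T(R) = \{t_1,\dots,t_{2n} \ \text{s.t.}\ t_j \ne t_{j+1} \text{ and } t_{A'} = 0 \text{ for all } A' \in A/R\}.
\]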
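The orthogonality fact behind the last step (the sum over $\omega_{A'}$ is either $N$ or $0$) is quick to verify numerically. This is a minimal sketch; note that the sum only sees $t$ modulo $N$, so "$t_{A'} = 0$" is interpreted here as $t_{A'} \equiv 0 \pmod N$, which is our reading.

```python
import numpy as np

N = 12
omega = np.arange(N)

def character_sum(t):
    """Sum over omega in {0, ..., N-1} of exp(2 pi i omega t / N)."""
    return np.exp(2j * np.pi * omega * t / N).sum()

sums = {t: character_sum(t) for t in range(-N, N + 1)}
for t in (0, 1, 5, N):
    print(t, np.round(sums[t], 10))
```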
This formula implies that we may disregard any $R$ such that some equivalence class in $A/R$ is a singleton. Indeed, if $A' \in A/R$ equals $\{j\}$, then $t_{A'} = t_j - t_{j+1} \ne 0$ (because of the constraint on the set $T(R)$). Disregarding such relations, we can get the bound
\[
\#T(R) \le |S|^{2n - |A/R| + 1}.
\]
This follows from the fact that the condition $t_{A'} = 0$ imposes $|A/R|$ constraints on the $t_j$, of which $|A/R| - 1$ are independent, since $\sum_{j=1}^{2n} (t_j - t_{j+1}) = 0$ holds automatically. Thus, continuing from (3.17),
\[
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\} \le \sum_{k=1}^{n}\ \sum_{R\in\mathcal{P}(A,k)} N^k |S|^{2n-k+1} \prod_{A'\in A/R} F_{|A'|}(\tau), \tag{3.18}
\]
where $\mathcal{P}(A,k)$ contains equivalence relations on $A$ with $k$ equivalence classes and no singleton classes. We will now estimate $F_n(\tau)$ and then the final inner sum and product.
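We claim
\[
F_n(\tau) \le G(n) :=
\begin{cases}
\dfrac{\tau}{1-\tau} & \log\dfrac{\tau}{1-\tau} \le 1 - n, \\[2ex]
e^{(n-1)\bigl(\log(n-1) - \log\log\frac{1-\tau}{\tau} - 1\bigr)} & \log\dfrac{\tau}{1-\tau} > 1 - n.
\end{cases}
\tag{3.19}
\]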
Sketch of proof. Recall the definition of $F_n$ in (3.16). Now, note that the Stirling numbers satisfy the recurrence relation
\[
S(n+1,k) = S(n,k-1) + kS(n,k).
\]
Indeed, if $a \in A$ and $R \in \mathcal{P}(A)$ has $k$ equivalence classes, then either $a$ is not equivalent to any other element of $A$ (so $R$ has $k-1$ equivalence classes on $A\setminus\{a\}$) or $a$ belongs to one of the $k$ equivalence classes of $A\setminus\{a\}$. Using this recurrence and induction, one can prove the identity
\[
F_n(\tau) = \sum_{k=1}^{\infty} (-1)^{n+k} g(k), \quad\text{where}\quad g(x) = \frac{\tau^x x^{n-1}}{(1-\tau)^x} \tag{3.20}
\]
for $0 \le \tau \le \tfrac12$, say. We leave this as an exercise (see also [4]). Now $g$ is increasing for $0 < x < x_*$ and decreasing for $x > x_*$, with
\[
x_* = \frac{n-1}{\log\bigl(\frac{1-\tau}{\tau}\bigr)}.
\]
The different ranges of $\tau$ correspond to $x_* \le 1$ or $x_* > 1$; in either case one gets the appropriate bound by looking at the alternating series.
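The identity (3.20) can be checked numerically for small $n$ before attempting the proof. The sketch below computes Stirling numbers from the recurrence above, evaluates $F_n(\tau)$ from (3.16), and compares it with a truncation of the alternating series; the truncation length `terms` and the test value $\tau = 0.2$ are arbitrary choices.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via S(n, k) = S(n-1, k-1) + k S(n-1, k)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def F(n, tau):
    """F_n(tau) as defined in (3.16)."""
    return sum(factorial(k - 1) * stirling2(n, k) * (-1)**(n - k) * tau**k
               for k in range(1, n + 1))

def F_series(n, tau, terms=400):
    """Truncation of the series in (3.20): sum of (-1)^(n+k) k^(n-1) (tau/(1-tau))^k."""
    r = tau / (1 - tau)
    return sum((-1)**(n + k) * k**(n - 1) * r**k for k in range(1, terms + 1))

tau = 0.2
for n in range(1, 7):
    print(n, F(n, tau), F_series(n, tau))
```

For instance, $F_2(\tau) = \tau^2 - \tau$, which matches $-r/(1+r)^2$ with $r = \tau/(1-\tau)$.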
Continuing from (3.18), we replace $F_{|A'|}(\tau)$ with $G(|A'|)$. We are then faced with estimating
\[
Q(2n,k) := \sum_{R\in\mathcal{P}(A,k)}\ \prod_{A'\in A/R} G(|A'|).
\]
We claim
\[
Q(n,k) \le G(2)^k (2n)^{n-k}, \tag{3.21}
\]
where we note that $G(2) = \frac{\tau}{1-\tau}$. This will complete the proof as follows. Noting
\[
\frac{N G(2)}{4n|S|} \ge 1,
\]
we can apply (3.21) to see
\begin{align*}
\mathbb{E}\{\operatorname{tr}(H_0^{2n})\}
&\le \sum_{k=1}^{n} N^k |S|^{2n-k+1} G(2)^k (4n)^{2n-k} \\
&\le |S|^{2n+1} (4n)^{2n} \sum_{k=1}^{n} \Bigl(\tfrac{N G(2)}{4n|S|}\Bigr)^{k} \\
&\le n\, |S|^{2n+1} (4n)^{2n} \Bigl(\tfrac{N G(2)}{4n|S|}\Bigr)^{n} \\
&\le n\, |S|^{n+1} N^n G(2)^n (4n)^n,
\end{align*}
which is the desired estimate (recalling $G(2) = \frac{\tau}{1-\tau}$ and $\tau \le \frac{1}{1+e}$).$^{\mathrm{I}}$
It remains to verify (3.21), which we only sketch (and leave the details as an exercise). The key is to establish the recursive estimate
\[
Q(n,k) \le (n-1)\bigl[Q(n-1,k) + G(2)\,Q(n-2,k-1)\bigr]
\]
for $n \ge 3$, $k \ge 1$, for then (3.21) can be deduced by induction (in $n \ge 3$, for fixed $k$). For the recursive estimate, fix any $\alpha \in \{1,\dots,n\}$ and let $R \in \mathcal{P}(\{1,\dots,n\},k)$. Two situations are possible:
(i) $[\alpha]_R$ contains only one other element (note that there are $n-1$ choices). Removing $[\alpha]_R$ from the product gives the $(n-1)G(2)Q(n-2,k-1)$ term.
(ii) $[\alpha]_R$ has more than two elements, so that removing $\alpha$ from $\{1,\dots,n\}$ yields an equivalence relation in $\mathcal{P}' = \mathcal{P}(\{1,\dots,n\}\setminus\{\alpha\},k)$. Now let $R' \in \mathcal{P}'$ and write $A_1,\dots,A_k$ for the corresponding classes. Then $\alpha$ is attached to one of these classes $A_i$, and we claim that $G(|[\alpha]_R|) \le |A_i|\,G(|A_i|)$. Indeed, this follows from $G(n+1) \le nG(n)$ (a consequence of log convexity of $G$). Thus the total contribution to $Q(n,k)$ is bounded by
\[
\sum_{R'\in\mathcal{P}'}\ \sum_{i=1}^{k} |A_i| \prod_{A'\in(\{1,\dots,n\}\setminus\{\alpha\})/R'} G(|A'|).
\]
As $\sum_{i=1}^{k} |A_i| = n-1$, the contribution becomes $(n-1)Q(n-1,k)$. This gives the recursive estimate.
3.5 Exercises
Exercise 3.5.1. Prove that
\[
\mathcal{F}^{-1}[\Pi_N] = N^{-1}\,\Pi_{N^{-1}}. \tag{3.22}
\]
Hint. Use the Poisson summation formula to treat the case $N = 1$. Then compute the general case by scaling.
Exercise 3.5.2. Derive the formula appearing in Theorem 3.1.1 by following
the scheme outlined in Remark 3.1.2.
Exercise 3.5.3. Prove the identities (3.2).
$^{\mathrm{I}}$ Actually, we will miss by the factor $e^{-n}$ if $n$ is not too large. However, in the application above, $n$ was of size $\sim \log N$, which may be expected to be large. Recovering this factor in general requires an additional argument; see [4].
Exercise 3.5.4. Suppose $n \ge b > a \ge 0$ are integers with $n - b - a > 0$. Show that
\[
\frac{\binom{n-a}{b}}{\binom{n}{b}} \ge \Bigl(1 - \tfrac{2b}{n}\Bigr)^{a}.
\]
(Hint: You can use induction on $a$.)

Exercise 3.5.5. Show that a matrix I − A is invertible if kAk < 12 , where k · k
denotes operator norm.
Exercise 3.5.6. Let $H \in \mathbb{C}^{N\times N}$ and let $H^*$ denote the adjoint (i.e. conjugate transpose) of $H$. Let $\|H\|$ denote the operator norm of $H$. Show that if $H$ is self-adjoint, then $\|H\|$ equals the largest (in magnitude) eigenvalue of $H$.
Show also that, for general $H$,
\[
\|H\| = \|H^*\| = \sqrt{\|HH^*\|}
\]
and that
\[
\|H\| \le \|H\|_F := \sqrt{\operatorname{tr}(HH^*)}.
\]
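Before proving these facts, it can help to see them hold on a random example. The following sketch uses a random Hermitian matrix (so that the eigenvalue statement applies) and also checks the trace inequality $\|H\|^{2n} \le \operatorname{tr}(H^{2n})$ used in the proof of Proposition 3.4.9; the dimension and the power `n` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = B + B.conj().T                                   # a self-adjoint test matrix

op = np.linalg.norm(H, 2)                            # operator norm (largest singular value)
spec = np.abs(np.linalg.eigvalsh(H)).max()           # largest eigenvalue in magnitude
fro = np.linalg.norm(H, 'fro')                       # sqrt(tr(H H^*))
op_adj = np.linalg.norm(H.conj().T, 2)
op_HHstar = np.linalg.norm(H @ H.conj().T, 2)

n = 3
trace_2n = np.trace(np.linalg.matrix_power(H, 2 * n)).real
print(op, spec, fro, op**(2 * n), trace_2n)
```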
Exercise 3.5.7. Prove Lemma 3.4.13 by completing the following argument. The details are found in Section IV B of [4].
One can pass from $A$ to $A/R$ to assume that $R$ is simply equality $=$. After relabeling $A$ as $A = \{1,\dots,n\}$, the formula reduces to
\[
\sum_{\omega_1,\dots,\omega_n \text{ distinct}} f(\omega) = \sum_{R} (-1)^{n - |A/R|} \Bigl(\prod_{A'\in A/R} (|A'| - 1)!\Bigr) \sum_{\omega\in\Omega_\le(R)} f(\omega), \tag{3.23}
\]
where the sum is over all equivalence relations $R$ on $A$. This formula may be proved by induction. The base case $n = 1$ follows because both sides equal $\sum_\omega f(\omega)$. Suppose the formula has been proven up to level $n-1$. Then rewrite the left-hand side as
\[
\sum_{\omega_1,\dots,\omega_{n-1} \text{ distinct}} \Bigl[\sum_{\omega_n} f(\omega',\omega_n) - \sum_{j=1}^{n-1} f(\omega',\omega_j)\Bigr], \tag{3.24}
\]
and apply the inductive hypothesis to get a new expression for the left-hand side. Here $\omega' = (\omega_1,\dots,\omega_{n-1})$.
Now work on the right-hand side of (3.23), with the aim of arriving at the same expression just derived. Note that any equivalence relation $R$ on $A$ can be restricted to an equivalence relation $R'$ on $A' = \{1,\dots,n-1\}$. Then $R$ can be formed from $R'$ either by adding $\{n\}$ as a new equivalence class (in which case we write $R = \{R',\{n\}\}$), or by having $nRj$ for some $j \in A'$, in which case we write $R = \{R',\{n\}\}/(n=j)$. In the latter case, there may be multiple ways to recover $R$; in particular there are $|[j]_{R'}|$ ways, where $[\cdot]$ denotes equivalence class. It follows that for any function $F$ defined on equivalence relations $R$ on $A$, we can write
\[
\sum_{R} F(R) = \sum_{R'} F(\{R',\{n\}\}) + \sum_{R'} \sum_{j=1}^{n-1} \tfrac{1}{|[j]_{R'}|}\, F(\{R',\{n\}\}/(n=j)).
\]
Apply this identity to the right-hand side of (3.23). This produces two terms that can be shown to match the two terms arising from (3.24). To make the second terms match, one must utilize the identity
\[
\tfrac{1}{|[j]_{R'}|} \prod_{A'\in A/(\{R',\{n\}\}/(n=j))} (|A'| - 1)! = \prod_{A'\in\{1,\dots,n-1\}/R'} (|A'| - 1)!.
\]
Using the above as a guide, complete the proof.

Exercise 3.5.8. Prove (3.20).
Exercise 3.5.9. Prove equation (3.15).
Chapter 4
Abstract Fourier analysis
In this chapter, we will take a tour through some topics in abstract harmonic
analysis. Our goal will not be to give a thorough theoretical treatment,
but rather to show how many of the preceding topics can be understood as
special cases of a more general theory. In particular, many preliminary
results will simply be quoted as needed; the interested reader is encouraged
to pick up [10] to find complete details. We will also explore related topics
in some new settings (e.g. in the setting of compact Lie groups).
4.1 Preliminaries
Definition 4.1.1. A topological group is a group G with a topology
such that the group operation and inverse operation are continuous (from
G × G → G and from G → G, respectively).
We will restrict our attention to groups whose topology is Hausdorff

(i.e. around any two distinct points one can find disjoint neighborhoods).
We will typically consider either compact or locally compact groups. Here
locally compact means that every point has a compact neighborhood.
Definition 4.1.2. A left Haar measure on G is a nonzero Radon measure

µ (i.e. a Borel measure, finite on compact sets, outer regular on Borel sets,
and inner regular on open sets) on G that satisfies µ(xE) = µ(E) for all
Borel sets E ⊂ G and every x ∈ G. A right Haar measure instead
satisfies µ(Ex) = µ(E).
Example 4.1.1. If $G = \mathbb{R}\setminus\{0\}$ (with multiplication), then $\frac{dx}{|x|}$ is a Haar measure on $G$.
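The invariance in this example can be illustrated numerically: dilating an interval by any nonzero factor $c$ should not change its $\frac{dx}{|x|}$-measure. The sketch below uses a simple midpoint rule; the interval `E` and the factors in `scales` are arbitrary choices, and `mu` is our name for the approximate measure.

```python
import numpy as np

def mu(a, b, num=200_000):
    """Midpoint-rule approximation of the dx/|x| measure of [a, b], assuming 0 is not in [a, b]."""
    edges = np.linspace(a, b, num + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(np.diff(edges) / np.abs(mid))

E = (0.5, 3.0)
scales = [-7.0, 0.1, 2.5]
base = mu(*E)                                                   # should be log(3.0 / 0.5)
measures = [mu(*sorted((c * E[0], c * E[1]))) for c in scales]  # measure of the dilated set cE
print(base, measures)
```

Each value is $\log 6$ up to discretization error, since $\int_a^b \frac{dx}{|x|} = \log\frac{b}{a}$ depends only on the ratio of the endpoints.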
Example 4.1.2. If $G = GL(n,\mathbb{R})$ (the group of invertible $n \times n$ matrices), then $|\det T|^{-n}\,dT$ is a left and right Haar measure on $G$ (where $dT$ is Lebesgue measure on the space of $n \times n$ matrices).
Example 4.1.3. If $G$ is the $ax+b$ group of all affine transformations $x \mapsto ax+b$ on $\mathbb{R}$ (with $a > 0$ and $b \in \mathbb{R}$), then $a^{-2}\,da\,db$ is a left Haar measure and $a^{-1}\,da\,db$ is a right Haar measure on $G$. This measure will appear in the setting of wavelets.
The basic facts we need about Haar measure are the following:
• Every locally compact group possesses a left Haar measure ([10, Theorem 2.10]). Left Haar measure is unique up to a multiplicative constant ([10, Theorem 2.20]).
• If $\lambda$ is a left Haar measure and $x \in G$, then $\lambda_x(E) := \lambda(Ex)$ is again a left Haar measure. Thus, by uniqueness, there exists $\Delta(x) > 0$ so that $\lambda_x = \Delta(x)\lambda$. This defines a function (the modular function) $\Delta : G \to (0,\infty)$.
We need some facts about Banach algebras as well.
Definition 4.1.3. A Banach algebra refers to a Banach space with a product $*$ such that $\|x * y\| \le \|x\|\,\|y\|$. An involution is a map $x \mapsto x^*$ such that
\[
(x+y)^* = x^* + y^*, \quad (\lambda x)^* = \bar\lambda x^*, \quad (xy)^* = y^* x^*, \quad (x^*)^* = x.
\]
A Banach algebra equipped with an involution is called a $*$-algebra. An algebra is called unital if it contains a unit element. If $A$ and $B$ are $*$-algebras, a $*$-homomorphism from $A$ to $B$ is a homomorphism $\phi$ such that $\phi(x^*) = \phi(x)^*$.
Example 4.1.4. If $H$ is a Hilbert space, then $L(H)$ (the space of bounded operators on $H$) is a unital Banach algebra using the operator norm and composition of operators. The involution is given by $T \mapsto T^*$ (the adjoint of $T$). In fact, this makes $L(H)$ a $C^*$-algebra, which means $\|x^* x\| = \|x\|^2$ for all $x$.
We need the following facts about Banach algebras:
• The spectrum of a commutative Banach algebra is the set of all

nonzero homomorphisms from the algebra to C. For unital Banach
algebras, the spectrum is compact (in the weak star topology).
• If $G$ is a locally compact group, then the space $L^1(G)$ forms a Banach $*$-algebra under the product given by convolution, defined by
\[
f * g(x) = \int f(y)\, g(y^{-1}x)\,dy,
\]
where $dy$ denotes Haar measure, and involution given by $f^*(x) = \Delta(x^{-1})\,\overline{f(x^{-1})}$, where $\Delta$ is the modular function.
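For a concrete picture of the last fact, one can take $G = \mathbb{Z}_k$ with counting measure, where $L^1(G)$ is just $\mathbb{C}^k$. The sketch below is written in additive notation and makes two assumptions: the modular function is identically $1$ (as $\mathbb{Z}_k$ is abelian), and the involution includes complex conjugation, as the requirement $(\lambda x)^* = \bar\lambda x^*$ suggests. The helper names `conv`, `invol`, and `norm1` are ours.

```python
import numpy as np

def conv(f, g):
    """(f * g)(x) = sum_y f(y) g(x - y) on Z_k with counting measure."""
    k = len(f)
    return np.array([sum(f[y] * g[(x - y) % k] for y in range(k)) for x in range(k)])

def invol(f):
    """f^*(x) = conj(f(-x)); the modular function is 1 on an abelian group."""
    k = len(f)
    return np.conj(f[(-np.arange(k)) % k])

def norm1(f):
    """L^1 norm with respect to counting measure."""
    return np.abs(f).sum()

rng = np.random.default_rng(4)
k = 7
f = rng.standard_normal(k) + 1j * rng.standard_normal(k)
g = rng.standard_normal(k) + 1j * rng.standard_normal(k)

print(norm1(conv(f, g)), norm1(f) * norm1(g))
print(np.abs(invol(conv(f, g)) - conv(invol(g), invol(f))).max())
```

The first line illustrates $\|f*g\|_1 \le \|f\|_1\|g\|_1$ and the second the rule $(f*g)^* = g^* * f^*$.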
Fourier analysis on groups is closely connected to the topic of repre-

sentation theory.
We first define the notion of ∗-representation.
Definition 4.1.4. A ∗-representation of a Banach ∗-algebra A on a Hilbert

space H is a ∗-homomorphism φ from A to L(H) (the space of bounded
operators on H). We call φ nondegenerate if there is no nonzero v ∈ H
such that φ(x)v = 0 for all x ∈ A.
Next, we have the notion of a unitary representation of a group.
Definition 4.1.5. A unitary representation of a group G is a homomor-

phism π from G into the group U (Hπ ) of unitary operators on some nonzero
Hilbert space Hπ that is continuous with respect to the strong operator
topology. The dimension of the representation space Hπ is called the degree
of π.
The definition above means $\pi(xy) = \pi(x)\pi(y)$ with $\pi(x^{-1}) = \pi(x)^{-1} = \pi(x)^*$, with
\[
x \mapsto \pi(x)u
\]
continuous from $G$ to $H_\pi$ for any $u \in H_\pi$.
If $\pi_1$ and $\pi_2$ are unitary representations of the same group $G$, we define an intertwining operator for $\pi_1$ and $\pi_2$ to be a bounded linear map $T : H_{\pi_1} \to H_{\pi_2}$ such that $T\pi_1(x) = \pi_2(x)T$ for all $x \in G$. The set of such operators is denoted by $C(\pi_1,\pi_2)$. We call $\pi_1$ and $\pi_2$ unitarily equivalent if $C(\pi_1,\pi_2)$ contains a unitary operator $U$, so that $\pi_2(x) = U\pi_1(x)U^{-1}$.
Group representations essentially allow group elements to be represented
by matrices, with the group operation replaced by matrix multiplication.
This has applications in algebraic problems, as it can reduce questions about
group theory to problems in linear algebra. Group representations are also
widely found in modern physics, where the groups in question are typically
symmetry groups for some physical model.
The two notions of representation above are related, in the sense that any unitary representation $\pi$ of $G$ corresponds to a $*$-representation of $L^1(G)$, which we may still denote by $\pi$. In particular, for $f \in L^1(G)$ we define the bounded operator $\pi(f) \in L(H_\pi)$ by
\[
\pi(f) = \int_G f(x)\pi(x)\,dx,
\]
where $dx$ denotes Haar measure. We interpret this operator in the weak sense, namely
\[
\langle \pi(f)u, v\rangle_{H_\pi} = \int_G f(x)\,\langle \pi(x)u, v\rangle_{H_\pi}\,dx.
\]
A closed subspace M of Hπ is called invariant for the representation π

if π(x)M ⊂ M for all x ∈ G. If π admits a nontrivial invariant subspace,
then π is called reducible. Otherwise, π is irreducible.
If G is an abelian group (i.e. xy = yx for all x, y ∈ G), then every
irreducible representation of G is one-dimensional (see [10, Corollary 3.6]).
This still leaves the question of the existence of such representations (besides
the trivial representation π0 (x) ≡ Id). For this, we quote the following
theorem, known as the Gelfand–Raikov theorem (see [10, Theorem 3.34]).
Theorem 4.1.6 (Gelfand–Raikov). Let $G$ be a locally compact group. Then for any distinct $x, y \in G$, there exists an irreducible representation $\pi$ such that $\pi(x) \ne \pi(y)$.
4.2 Locally compact abelian groups

Let $G$ be a locally compact abelian group and suppose $\pi$ is an irreducible representation of $G$. In particular, $\pi$ is one-dimensional and hence we may take $H_\pi = \mathbb{C}$. In this case we may write
\[
\pi(x)z = \langle x,\xi\rangle z, \quad z \in \mathbb{C},
\]
where $x \mapsto \langle x,\xi\rangle$ denotes a continuous homomorphism from $G$ into $\mathbb{T}$ (the circle group). We call $\xi$ a character of $G$, and denote the set of all characters by $\hat G$, which we call the dual group of $G$. Indeed, $\hat G$ forms an abelian group. The group operation is given by
\[
\langle x,\xi_1\xi_2\rangle := \langle x,\xi_1\rangle\langle x,\xi_2\rangle \quad (x \in G,\ \xi_j \in \hat G),
\]
with
\[
\langle x,\xi^{-1}\rangle = \langle x^{-1},\xi\rangle = \overline{\langle x,\xi\rangle}.
\]
We give $\hat G$ the weak$^*$ topology (inherited as a subset of $L^\infty(G)$). It turns out (cf. [10, Theorem 3.31]) that this coincides with the topology of 'compact convergence' on $G$, under which the group operations are continuous. This also guarantees (via [10, Proposition 1.10(c)] and Alaoglu's theorem) that $\hat G$ is locally compact.
In fact, $\hat G$ can be identified with the spectrum of $L^1(G)$. Indeed, $\xi$ gives a nondegenerate $*$-homomorphism of $L^1(G)$ on $\mathbb{C}$ via
\[
\xi(f) = \int_G \langle x,\xi\rangle f(x)\,dx. \tag{4.1}
\]
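Proposition 4.2.1. If $G$ is compact and its Haar measure is normalized so that $|G| = 1$, then $\hat G$ is an orthonormal set in $L^2(G)$.
Proof. Let $\xi \in \hat G$. Then $|\xi|^2 \equiv 1$. As $|G| = 1$, this yields
\[
\|\xi\|_{L^2(G)} = 1.
\]
Now, if $\xi \ne \eta$, then there exists $x_0 \in G$ so that $\langle x_0,\xi\eta^{-1}\rangle \ne 1$. Writing $dx$ for Haar measure, we then have
\begin{align*}
\int \langle x,\xi\rangle\overline{\langle x,\eta\rangle}\,dx
&= \int \langle x,\xi\eta^{-1}\rangle\,dx \\
&= \langle x_0,\xi\eta^{-1}\rangle \int \langle x_0^{-1}x,\xi\eta^{-1}\rangle\,dx \\
&= \langle x_0,\xi\eta^{-1}\rangle \int \langle x,\xi\eta^{-1}\rangle\,dx,
\end{align*}
where we have used the translation invariance of Haar measure. This implies
\[
\int \langle x,\xi\rangle\overline{\langle x,\eta\rangle}\,dx = 0,
\]
as desired.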
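For a finite example of this orthonormality, take $G = \mathbb{Z}_k$ with normalized counting measure (so $|G| = 1$). The sketch below assumes the characters are $x \mapsto e^{2\pi i m x/k}$ for $m = 0,\dots,k-1$, and forms their Gram matrix in $L^2(G)$; the value of `k` is arbitrary.

```python
import numpy as np

k = 8
x = np.arange(k)
# Row m holds the values of the character x -> exp(2 pi i m x / k) on Z_k.
chars = np.exp(2j * np.pi * np.outer(np.arange(k), x) / k)

# Gram matrix with respect to normalized counting measure: <xi, eta> = (1/k) sum_x xi(x) conj(eta(x)).
gram = chars @ chars.conj().T / k
print(np.round(np.abs(gram), 10))
```

The Gram matrix is the identity, which is the orthonormality statement of the proposition in this case.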
We will next prove the following result.
Proposition 4.2.2. If $G$ is discrete then $\hat G$ is compact. If $G$ is compact then $\hat G$ is discrete.
Proof. If $G$ is discrete, then $L^1(G)$ has a unit element. Indeed, we take $\delta(1) = 1$ and $\delta = 0$ otherwise. Thus the spectrum of $L^1(G)$ (which is identified with $\hat G$) is compact.
Next, suppose $G$ is compact. Using Proposition 4.2.1, we observe that
\[
\int \xi = \begin{cases} 1 & \xi = 1, \\ 0 & \xi \ne 1. \end{cases}
\]
In particular,
\[
\bigl\{f \in L^\infty(G) : \bigl|\textstyle\int f\bigr| > \tfrac12\bigr\} \cap \hat G = \{1\}.
\]
We claim that this implies that $\{1\}$ is open in $\hat G$. Indeed, because $G$ is compact, we have that the constant function $1$ is in $L^1(G)$ (so that $\int f$ may be viewed as $1(f)$ through the identification of $(L^1)^*$ with $L^\infty$), and hence the set on the left is weak$^*$ open. This in turn implies that every singleton set in $\hat G$ is open, i.e. $\hat G$ is discrete.
The next result puts the Fourier transform, Fourier series, and the dis-
crete Fourier transform under the same umbrella.
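Theorem 4.2.3. We have the following:
• $\hat{\mathbb{R}} = \mathbb{R}$ with $\langle x,\xi\rangle = e^{2\pi i x\xi}$.
• $\hat{\mathbb{T}} = \mathbb{Z}$ with $\langle \alpha,n\rangle = \alpha^n$.
• $\hat{\mathbb{Z}} = \mathbb{T}$ with $\langle n,\alpha\rangle = \alpha^n$.
• If $\mathbb{Z}_k$ is the additive group of integers modulo $k$, then $\hat{\mathbb{Z}}_k = \mathbb{Z}_k$ with $\langle m,n\rangle = e^{2\pi i mn/k}$.
Remark 4.2.4. If we write $\alpha \in \mathbb{T}$ as $\alpha = e^{2\pi i x}$ for some $x \in [-1,1]$ then we recover the familiar pairing $\langle \alpha,n\rangle = e^{2\pi i xn}$.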
Proof. If $\phi \in \hat{\mathbb{R}}$, then $\phi(0) = 1$ (these represent the identity elements in $\mathbb{R}$ and $\mathbb{T}$, respectively). Thus, by continuity, there exists $a > 0$ so that
\[
A := \int_0^a \phi(t)\,dt \ne 0.
\]
Now (as $\phi$ is a homomorphism)
\[
A\phi(x) = \int_0^a \phi(x+t)\,dt = \int_x^{a+x} \phi(t)\,dt,
\]
which implies (by the fundamental theorem of calculus) that $\phi$ is differentiable, with
\[
\phi'(x) = A^{-1}[\phi(a+x) - \phi(x)] = c\,\phi(x), \quad c := A^{-1}[\phi(a) - 1].
\]
Thus $\phi(t) = e^{ct}$; however, since $|\phi| = 1$ we may write $c = 2\pi i\xi$ for some $\xi \in \mathbb{R}$.
Next, since $\mathbb{T}$ can be identified with $\mathbb{R}/\mathbb{Z}$ (via the identification of $x \in \mathbb{R}/\mathbb{Z}$ with $\alpha = e^{2\pi i x}$), the characters of $\mathbb{T}$ are the characters of $\mathbb{R}$ that are trivial on $\mathbb{Z}$, so the result follows from above.
Now if $\phi \in \hat{\mathbb{Z}}$ then $\alpha := \phi(1) \in \mathbb{T}$ and $\phi(n) = [\phi(1)]^n = \alpha^n$ (by the homomorphism property).
Finally, the characters of $\mathbb{Z}_k$ are the characters of $\mathbb{Z}$ that are trivial on $k\mathbb{Z}$. Thus they are of the form $\phi(n) = \alpha^n$ where $\alpha$ is a $k$-th root of unity.
One can also check that if $G_1,\dots,G_n$ are locally compact Abelian groups, then
\[
(G_1 \times \cdots \times G_n)\hat{\ } = \hat G_1 \times \cdots \times \hat G_n.
\]
This allows us to extend the previous result to see $\widehat{\mathbb{R}^n} = \mathbb{R}^n$, $\widehat{\mathbb{T}^n} = \mathbb{Z}^n$, $\widehat{\mathbb{Z}^n} = \mathbb{T}^n$, and finally $\hat G = G$ for any finite Abelian group $G$.
To define the Fourier transform on $G$, we make use of (4.1). In particular, the Fourier transform is the map from $L^1(G)$ to $C(\hat G)$ given by
\[
\mathcal{F}f(\xi) = \hat f(\xi) := \int_G \overline{\langle x,\xi\rangle}\, f(x)\,dx.
\]
Even in this generality, the Fourier transform enjoys many of the familiar properties that we are used to. For example, it defines a norm-decreasing $*$-homomorphism from $L^1(G)$ to $C_0(\hat G)$. It can also be extended to complex Radon measures on $G$ via
\[
\hat\mu(\xi) = \int_G \overline{\langle x,\xi\rangle}\,d\mu(x).
\]
This defines a bounded continuous function on $\hat G$. The reverse works as well: if $\mu$ is a complex Radon measure on $\hat G$, then
\[
\phi_\mu(x) = \int_{\hat G} \langle x,\xi\rangle\,d\mu(\xi)
\]
defines a bounded continuous function on $G$; furthermore, the mapping $\mu \mapsto \phi_\mu$ is linear and injective. A fundamental result in the theory (known as Bochner's theorem, cf. [10, Theorem 4.19]) states that if $\phi$ is a continuous function of positive type on $G$ (i.e. $\int_G [f^* * f]\phi \ge 0$ for all $f \in L^1(G)$), then there exists a unique positive measure $\mu$ on $\hat G$ such that $\phi = \phi_\mu$.
One also has suitable notions of Fourier inversion formulas in this generality. For example, if $f = \phi_\mu$ for some complex Radon measure $\mu$ on $\hat G$ and additionally $f \in L^1(G)$, then $\hat f \in L^1(\hat G)$ and
\[
f(x) = \int \langle x,\xi\rangle\, \hat f(\xi)\,d\xi
\]
provided Haar measure $d\xi$ on $\hat G$ is suitably normalized relative to the given Haar measure on $G$. We can also write $d\mu_f(\xi) = \hat f(\xi)\,d\xi$. One calls $d\xi$ the dual measure of the given Haar measure on $G$.
Example 4.2.1. If we identify $\hat{\mathbb{R}}$ with $\mathbb{R}$ via $\langle x,\xi\rangle = e^{2\pi i\xi x}$, then Lebesgue measure is its own dual. Indeed, the inversion formula holds with both $dx$ and $d\xi$ given by Lebesgue measure. If we instead identify $\hat{\mathbb{R}}$ with $\mathbb{R}$ with $\langle x,\xi\rangle = e^{i\xi x}$, then the dual of $dx$ is $\frac{1}{2\pi}\,d\xi$. If we use the Haar measure $\frac{1}{\sqrt{2\pi}}\,dx$, then the measure is again its own dual.
We also have the following result related to Proposition 4.2.2.
Proposition 4.2.5. If $G$ is compact and Haar measure is chosen so that $|G| = 1$, then the dual measure on $\hat G$ is counting measure. If $G$ is discrete and Haar measure is taken to be counting measure, then the dual measure on $\hat G$ satisfies $|\hat G| = 1$.
Proof. Suppose $G$ is compact. Let $g \equiv 1$. Then (using Proposition 4.2.1) we have $\hat g = \chi_{\{1\}}$. It follows that
\[
g(x) = \sum_{\xi\in\hat G} \langle x,\xi\rangle\, \hat g(\xi),
\]
where we have used that $\langle x,1\rangle \equiv 1$. This shows that the dual measure on $\hat G$ must be counting measure.
On the other hand, if $G$ is discrete then we let $g = \chi_{\{1\}}$. Then $\hat g \equiv 1$ and
\[
g(x) = \int_{\hat G} \langle x,\xi\rangle\,d\xi,
\]
so that $|\hat G| = 1$.
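Example 4.2.2. The groups $\mathbb{T}$ and $\mathbb{Z}$ are dual. The dual measures can be taken to be normalized Lebesgue measure $\frac{d\theta}{2\pi}$ and counting measure. Then Fourier inversion becomes
\[
\hat f(n) = \int_0^{2\pi} f(\theta)\, e^{-in\theta}\,\tfrac{d\theta}{2\pi}, \qquad f(\theta) = \sum_{n\in\mathbb{Z}} \hat f(n)\, e^{in\theta}.
\]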
Example 4.2.3. If $G = \mathbb{Z}_k$ then the dual of counting measure is counting measure divided by $k$ (so that $|\mathbb{Z}_k| = 1$). Fourier inversion reads
\[
\hat f(m) = \sum_{n=0}^{k-1} f(n)\, e^{-2\pi i mn/k}, \qquad f(n) = \tfrac{1}{k} \sum_{m=0}^{k-1} \hat f(m)\, e^{2\pi i mn/k}.
\]
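The inversion formula on $\mathbb{Z}_k$ is the discrete Fourier transform pair, and it can be checked directly. In the sketch below the sums run over one full set of representatives $0,\dots,k-1$ of $\mathbb{Z}_k$; the comparison with `numpy.fft.fft` relies on NumPy using the same sign convention $e^{-2\pi i mn/k}$ for the forward transform. The last lines illustrate the Plancherel identity for this choice of dual measures.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 10
f = rng.standard_normal(k) + 1j * rng.standard_normal(k)
idx = np.arange(k)

# Forward transform with counting measure on Z_k.
f_hat = np.array([np.sum(f * np.exp(-2j * np.pi * m * idx / k)) for m in range(k)])
# Inversion with the dual measure (counting measure divided by k).
f_back = np.array([np.sum(f_hat * np.exp(2j * np.pi * idx * n / k)) / k for n in range(k)])

norm_f = np.sum(np.abs(f)**2)                # squared L^2 norm, counting measure
norm_f_hat = np.sum(np.abs(f_hat)**2) / k    # squared L^2 norm, dual measure
print(np.abs(f_back - f).max(), norm_f, norm_f_hat)
```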
The general form of the Plancherel theorem is given by the following (see
[10, Theorem 4.26]).
Theorem 4.2.6 (Plancherel). The Fourier transform on L1 (G) ∩ L2 (G)
extends uniquely to a unitary isomorphism from L2 (G) to L2 (Ĝ). Conse-
quently, if G is compact and |G| = 1, then Ĝ is an orthonormal basis for
L2 (G).
We turn to our final main result concerning Fourier analysis on locally compact Abelian groups, namely, the Pontrjagin duality theorem. Recall that by definition, elements of $\hat G$ are characters on $G$. We can also view elements of $G$ as characters on $\hat G$. Indeed, for $x \in G$ we can define a character $\Phi(x)$ on $\hat G$ via
\[
\langle \xi,\Phi(x)\rangle = \langle x,\xi\rangle.
\]
It follows that $\Phi$ defines a group homomorphism from $G$ to $\hat{\hat G}$. The Pontrjagin duality theorem (see [10, Theorem 4.32]) states the following:
Theorem 4.2.7 (Pontrjagin duality). If G is a locally compact Abelian
group, then Φ is an isomorphism of topological groups.
According to this theorem, we may freely write $\langle x,\xi\rangle$ or $\langle \xi,x\rangle$ for the pairing between $G$ and $\hat G$.
One consequence of this theorem is the other form of Fourier inversion: if $f \in L^1(G)$ and $\hat f \in L^1(\hat G)$, then
\[
f(x) = \int_{\hat G} \langle x,\xi\rangle\, \hat f(\xi)\,d\xi
\]
almost everywhere. We also have the dual form of Proposition 4.2.2; that is, if $\hat G$ is compact then $G$ is discrete, and if $\hat G$ is discrete then $G$ is compact.
Fourier analysis in the setting of groups can be applied to express an

arbitrary unitary representation of a locally compact Abelian group in terms
of irreducible representations (i.e. characters). We close this section by
stating the following result (see [10, Theorem 4.45]).
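Theorem 4.2.8. Suppose $\pi$ is a unitary representation of a locally compact Abelian group $G$. There exists a unique regular $H_\pi$-projection-valued measure $P$ on $\hat G$ so that
\begin{align*}
\pi(x) &= \int_{\hat G} \langle x,\xi\rangle\,dP(\xi) \quad\text{for } x \in G, \\
\pi(f) &= \int_{\hat G} \xi(f)\,dP(\xi) \quad\text{for } f \in L^1(G),
\end{align*}
where $\xi(f) = \int_G \langle x,\xi\rangle f(x)\,dx$.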
4.3 Compact groups

We next discuss some of the basic results of representation theory and
Fourier analysis for compact (but not necessarily Abelian) groups. We will
focus on introducing the relevant terms and stating the necessary results;
we will then work through some specific examples.
We begin with the following (see [10, Theorem 5.2]).
Theorem 4.3.1. If G is compact, then any irreducible representation of G
is finite-dimensional. Every unitary representation of G is a direct sum of
irreducible representations.
When G is an abelian group, we saw that Ĝ is a set of continuous func-
tions on G. The general definition of Ĝ is the set of unitary equivalence
classes of irreducible representations of G. In the abelian case, we consid-
ered characters of G. In the general case the corresponding set of functions
is the set of matrix elements of the irreducible representations of G.
Definition 4.3.2. Suppose π is a unitary representation of G. The functions
\[ \phi_{u,v}(x) = \langle \pi(x)u, v\rangle_{H_\pi} \]
for u, v ∈ Hπ are called the matrix elements of π.

Note that if u, v belong to an orthonormal basis {ej }, then φu,v (x) is one
of the entries of the matrix for π(x) in that basis, cf.
\[ \pi_{ij}(x) = \langle \pi(x)e_j, e_i\rangle. \]

We denote the span of the matrix elements of π by Eπ . This defines a

subspace of C(G) and depends only on the unitary equivalence class of π.
The matrix elements of irreducible representations can be used to build an
orthonormal basis for L2 (G). This relies on two main results.
First, we have the Schur orthogonality relations (see [10, Theorem 5.8]):
Theorem 4.3.3. Let π, π′ be irreducible unitary representations of G.
Consider Eπ , Eπ′ as subspaces of L2 (G).
• If [π] ≠ [π′] then Eπ ⊥ Eπ′ .
• If {ej } is an orthonormal basis for Hπ and πij is defined as above, then
\[ \{ \sqrt{\dim H_\pi}\, \pi_{ij} : i, j = 1, \dots, \dim H_\pi \} \]
is an orthonormal basis for Eπ .
The next result we need is the following theorem (see [10, Theorem 5.11]).
Theorem 4.3.4. Let E denote the linear span of
\[ \bigcup_{[\pi] \in \hat G} E_\pi. \]
Then E is dense in C(G) in the uniform norm and in Lp (G) for all p < ∞.
We now state the main result (called the Peter–Weyl theorem [10, The-
orem 5.12]). In the following, given an equivalence class [π] we assume we
have chosen one fixed representative π.
Theorem 4.3.5 (Peter–Weyl theorem). Let G be a compact group. Then
\[ L^2(G) = \bigoplus_{[\pi] \in \hat G} E_\pi, \]
and if
\[ \pi_{ij}(x) = \langle \pi(x)e_j, e_i\rangle, \]
then the set
\[ \{ \sqrt{\dim H_\pi}\, \pi_{ij} : i, j = 1, \dots, \dim H_\pi,\ [\pi] \in \hat G \} \]
is an orthonormal basis for L2 (G).

This is the starting point for Fourier analysis on compact groups. In
particular, for f ∈ L2 (G) we get the representation
\[ f = \sum_{[\pi] \in \hat G} \sum_{i,j=1}^{\dim H_\pi} c^\pi_{ij}\, \pi_{ij}, \qquad c^\pi_{ij} = \dim H_\pi \int_G f(x)\, \overline{\pi_{ij}(x)}\, dx. \]
As stated, this requires that we choose an orthonormal basis for each
Hπ . Alternately, we can define the Fourier transform of f ∈ L1 (G) at π
to be the operator fˆ(π) : Hπ → Hπ given by
\[ \hat f(\pi) = \int f(x)\, \pi(x^{-1})\, dx = \int f(x)\, \pi(x)^*\, dx. \]
In particular, given an orthonormal basis for Hπ , fˆ(π) is represented
by the matrix
\[ \hat f(\pi)_{ij} = \int f(x)\, \overline{\pi_{ji}(x)}\, dx = \frac{1}{\dim H_\pi}\, c^\pi_{ji}, \]
where the coefficients are as above. In this case, we get
\[ \sum_{i,j} c^\pi_{ij}\, \pi_{ij}(x) = \dim H_\pi \sum_{i,j} \hat f(\pi)_{ji}\, \pi_{ij}(x) = \dim H_\pi\, \operatorname{tr}\big[\hat f(\pi)\pi(x)\big]. \]
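Thus we arrive at the Fourier inversion formula
\[ f(x) = \sum_{[\pi] \in \hat G} \dim H_\pi\, \operatorname{tr}\big[\hat f(\pi)\pi(x)\big], \]
where convergence should be understood in the L2 sense. The Parseval
formula now reads
\[ \|f\|_{L^2(G)}^2 = \sum_{[\pi] \in \hat G} \dim H_\pi\, \operatorname{tr}\big[\hat f(\pi)^* \hat f(\pi)\big]. \]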
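The inversion and Parseval formulas can be checked by hand on a small example. The sketch below is an illustration only, under assumptions that are not part of the notes: it takes G to be the finite (hence compact) group S_3, with normalized counting measure in the role of Haar measure, and uses the standard fact that the trivial, sign, and two-dimensional standard representations represent all classes in Ĝ. The helper names (perm_matrix, fourier_transform, invert) are hypothetical.

```python
import itertools
import numpy as np

# Assumption: G = S_3 with normalized counting measure as Haar measure.
perms = list(itertools.permutations(range(3)))

def perm_matrix(p):
    """Matrix sending the basis vector e_i to e_{p(i)}."""
    P = np.zeros((3, 3))
    for i, pi in enumerate(p):
        P[pi, i] = 1.0
    return P

# Orthonormal basis (as columns) of the invariant plane {v : v_1 + v_2 + v_3 = 0}.
Q = np.array([[1.0, -1.0, 0.0], [1.0, 1.0, -2.0]]).T
Q /= np.linalg.norm(Q, axis=0)

# One representative from each irreducible class: trivial, sign, standard.
irreps = {
    "trivial": lambda p: np.eye(1),
    "sign": lambda p: np.array([[np.rint(np.linalg.det(perm_matrix(p)))]]),
    "standard": lambda p: Q.T @ perm_matrix(p) @ Q,
}

def fourier_transform(f, rep):
    """f_hat(pi) = integral over G of f(x) pi(x)^* dx."""
    return sum(f[p] * rep(p).conj().T for p in perms) / len(perms)

def invert(f_hat, x):
    """Sum over classes of dim(H_pi) * tr[f_hat(pi) pi(x)]."""
    return sum(
        rep(x).shape[0] * np.trace(f_hat[name] @ rep(x))
        for name, rep in irreps.items()
    )

rng = np.random.default_rng(0)
f = {p: complex(rng.normal(), rng.normal()) for p in perms}
f_hat = {name: fourier_transform(f, rep) for name, rep in irreps.items()}

inversion_error = max(abs(invert(f_hat, p) - f[p]) for p in perms)
norm_sq = sum(abs(f[p]) ** 2 for p in perms) / len(perms)
parseval_sum = sum(
    rep(perms[0]).shape[0] * np.trace(f_hat[name].conj().T @ f_hat[name]).real
    for name, rep in irreps.items()
)
print(inversion_error, abs(norm_sq - parseval_sum))
```

Both printed numbers should be at the level of rounding error. Working with operators fˆ(π) rather than coefficients means no basis of Hπ has to be singled out beyond the one used to write down the matrices.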
We turn to one more formulation. If π is a finite-dimensional unitary

representation of G, we define the character χπ of π by the function
χπ (x) = tr π(x).
In fact, this depends only on the equivalence class of π. A direct computation
shows
\[ \operatorname{tr}[\hat f(\pi)\pi(x)] = \int f(y) \operatorname{tr}[\pi(y^{-1})\pi(x)]\, dy = \int f(y) \operatorname{tr} \pi(y^{-1}x)\, dy = f * \chi_\pi(x), \]
so that the Fourier inversion formula may be written
\[ f = \sum_{[\pi] \in \hat G} \dim H_\pi\, f * \chi_\pi. \]
In particular, dim Hπ f ∗ χπ is the orthogonal projection of f onto Eπ .

We introduce one final notion before working through some examples.
A function f on G is called central if f is constant on conjugacy classes,
i.e. $f(yxy^{-1}) = f(x)$ for all x, y ∈ G. For example, the character of any
finite-dimensional representation is central, since tr[AB] = tr[BA] gives
\[ \chi_\pi(yxy^{-1}) = \operatorname{tr}[\pi(y)\pi(x)\pi(y)^{-1}] = \operatorname{tr}[\pi(x)] = \chi_\pi(x). \]
We denote the set of central functions with the prefix Z, e.g. ZC(G) and
ZLp (G). The linear span of {χπ : [π] ∈ Ĝ} is dense in ZC(G) as well as
ZLp (G) (see [10, Proposition 5.25]). One has that Lp (G) and C(G) form
Banach algebras under convolution, with ZLp (G) and ZC(G) their centers.
Our final result (appearing as [10, Proposition 5.23]), states:
Proposition 4.3.6. We have that
{χπ : [π] ∈ Ĝ}
forms an orthonormal basis for ZL2 (G).
4.4 Examples
We work through the details of some special examples, namely SU (2) and
SO(n) for n ∈ {3, 4}.
Example 4.4.1 (SU (2)). Let U (n) denote the group of unitary transforma-
tions of Cn , that is, the set of n × n matrices T satisfying T ∗ T = I. We let
SU (n) be the subgroup consisting of T ∈ U (n) with det T = 1. Note that
T ∈ U (n) if and only if T T ∗ = I, so that the rows of T are an orthonormal
set.
When n = 2, we can write
\[ T := \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in U(2) \]
if and only if $|a|^2 + |b|^2 = |c|^2 + |d|^2 = 1$ and $a\bar c + b\bar d = 0$. In particular,
(a, b) is a unit vector and $(c, d) = e^{i\theta}(-\bar b, \bar a)$ for some θ ∈ R. It follows that
$\det T = e^{i\theta}$, and so T ∈ SU (2) if and only if $e^{i\theta} = 1$. Writing
\[ U_{a,b} = \begin{pmatrix} a & b \\ -\bar b & \bar a \end{pmatrix}, \]
we have
\[ SU(2) = \{ U_{a,b} : a, b \in \mathbb{C},\ |a|^2 + |b|^2 = 1 \}. \]
The correspondence of $U_{a,b}$ with $(a, b) = (1, 0)U_{a,b}$ identifies SU (2) with
the unit sphere S 3 ⊂ C2 (where the identity element is identified
with (1, 0)).
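Three one-parameter subgroups of SU (2) are of particular interest:
\[ F(\theta) = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}, \quad G(\phi) = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}, \quad H(\psi) = \begin{pmatrix} \cos\psi & i\sin\psi \\ i\sin\psi & \cos\psi \end{pmatrix}. \]
These are three mutually orthogonal great circles in the sphere that intersect
at ±Id.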
Proposition 4.4.1. Every T ∈ SU (2) is conjugate to precisely one matrix

F (θ) as above, with 0 ≤ θ ≤ π.
Proof. Unitary matrices are normal, so by the spectral theorem there exists
V ∈ U (2) with
\[ V T V^{-1} = \operatorname{diag}(\alpha, \beta). \]
For T ∈ SU (2) we have $\beta = \bar\alpha = e^{-i\theta}$ for some θ ∈ [−π, π]. In particular,
$V T V^{-1} = F(\theta)$. By replacing V with $[\det V]^{-1/2} V$, we may assume that
V ∈ SU (2) as well. Furthermore, using
\[ F(-\theta) = H(\tfrac{1}{2}\pi)\, F(\theta)\, H(-\tfrac{1}{2}\pi), \]
we may reduce to θ ∈ [0, π]. To conclude, note that if θ1 , θ2 ∈ [0, π] then
F (θ1 ) and F (θ2 ) have different eigenvalues (and hence are not conjugate)
unless θ1 = θ2 .
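The parametrization of SU (2) and the conjugation identity used in the proof are easy to test numerically. The following is a minimal sketch with NumPy; the function names U, F, G, H mirror the notation of the text, while the test values of (a, b) and θ are arbitrary choices made for illustration.

```python
import numpy as np

def U(a, b):
    """The matrix U_{a,b}; it lies in SU(2) when |a|^2 + |b|^2 = 1."""
    return np.array([[a, b], [-np.conj(b), np.conj(a)]], dtype=complex)

def F(theta):
    return U(np.exp(1j * theta), 0.0)

def G(phi):
    return U(np.cos(phi), np.sin(phi))

def H(psi):
    return U(np.cos(psi), 1j * np.sin(psi))

# A point (a, b) on the unit sphere S^3 in C^2 (arbitrary test values).
v = np.array([0.3 + 0.4j, -0.5 + 0.2j])
a, b = v / np.linalg.norm(v)
T = U(a, b)

is_unitary = np.allclose(T.conj().T @ T, np.eye(2))
det_T = np.linalg.det(T)
first_row = np.array([1.0, 0.0]) @ T        # recovers (a, b) = (1, 0) U_{a,b}

# Conjugation by H(pi/2) replaces theta by -theta.
theta = 0.7
flipped = H(np.pi / 2) @ F(theta) @ H(-np.pi / 2)

# The class of T is read off from its trace: tr T = 2 cos(theta_T), theta_T in [0, pi].
theta_T = np.arccos(np.clip(np.trace(T).real / 2.0, -1.0, 1.0))
eigen_angles = np.sort(np.angle(np.linalg.eigvals(T)))
print(is_unitary, det_T, theta_T, eigen_angles)
```

Since conjugate matrices share eigenvalues, the angle theta_T computed from the trace identifies the unique F (θ) with 0 ≤ θ ≤ π in the conjugacy class of T.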
This implies the following corollary.
Corollary 4.4.2. Let g be a continuous function on SU (2) and set
\[ g^0(\theta) := g(F(\theta)). \]
Then $g \mapsto g^0$ is an isomorphism from the algebra of continuous central
functions on SU (2) to C([0, π]).
We now describe a family of unitary representations of SU (2). We let P
denote the space of all polynomials
\[ P(z, w) = \sum c_{jk}\, z^j w^k \]
in two complex variables, and Pm ⊂ P be the space of homogeneous
polynomials of degree m, i.e.
\[ \mathcal{P}_m = \Big\{ \sum_{j=0}^m c_j\, z^j w^{m-j} : c_j \in \mathbb{C} \Big\}. \]
Now let σ denote normalized surface measure on S 3 . Then we can view P
as a subset of L2 (σ) with
\[ \langle P, Q\rangle = \int_{S^3} P\, \overline{Q}\, d\sigma. \]
We will show that the monomials $z^j w^k$ are orthogonal in P.
To this end, given (z, w) ∈ C2 , we introduce polar coordinates
\[ (z, w) = Z = rZ', \qquad r = |Z| = \sqrt{|z|^2 + |w|^2}, \qquad Z' \in S^3. \]
Denote Lebesgue measure on C2 by $d^4Z$ and Lebesgue measure on C by $d^2z$
or $d^2w$. Then
\[ d^4Z = d^2z\, d^2w = c\, r^3\, dr\, d\sigma(Z'), \]
where $c = 2\pi^2$ is the Euclidean surface measure of S 3 (cf. the following
lemma).
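Lemma 4.4.3. Suppose f : C2 → C and $f(aZ) = a^m f(Z)$ for a > 0. Then
\[ \int_{S^3} f(Z')\, d\sigma(Z') = \frac{1}{\pi^2\, \Gamma(\frac{m}{2} + 2)} \int_{\mathbb{C}^2} f(Z)\, e^{-|Z|^2}\, d^4Z, \]
where Γ(·) is the Gamma function.

Proof. We compute
\[ \begin{aligned}
\int_{\mathbb{C}^2} f(Z)\, e^{-|Z|^2}\, d^4Z &= c \int_0^\infty \int_{S^3} f(rZ')\, e^{-r^2} r^3\, d\sigma(Z')\, dr \\
&= c \int_0^\infty r^{m+3} e^{-r^2}\, dr \int_{S^3} f(Z')\, d\sigma(Z') \\
&= \frac{c}{2}\, \Gamma(\tfrac{m}{2} + 2) \int_{S^3} f(Z')\, d\sigma(Z').
\end{aligned} \]
To complete the proof, take f = 1 (so that m = 0), which gives
\[ \frac{c}{2} = \int_{\mathbb{C}^2} e^{-|Z|^2}\, d^4Z = \Big( \int_{\mathbb{R}} e^{-t^2}\, dt \Big)^4 = \pi^2. \]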
We now can prove the following.

Proposition 4.4.4. Let p, q, r, s denote nonnegative integers. Then
\[ \int z^p \bar z^q w^r \bar w^s\, d\sigma(z, w) = \begin{cases} 0 & q \neq p \text{ or } s \neq r, \\ \dfrac{p!\, r!}{(p+r+1)!} & q = p \text{ and } s = r. \end{cases} \]
Consequently, the spaces Pm are mutually orthogonal in L2 (σ), and
\[ \Big\{ \sqrt{\tfrac{(m+1)!}{j!\,(m-j)!}}\; z^j w^{m-j} : 0 \le j \le m \Big\} \]
forms an orthonormal basis for Pm .

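Proof. By the previous lemma (applied with m = p + q + r + s), the integral equals
\[ \frac{1}{\pi^2\, \Gamma(\frac{1}{2}(p+q+r+s) + 2)} \int z^p \bar z^q e^{-|z|^2}\, d^2z \int w^r \bar w^s e^{-|w|^2}\, d^2w. \]
These integrals can be computed using polar coordinates, viz.
\[ \int z^p \bar z^q e^{-|z|^2}\, d^2z = \int_0^{2\pi} \int_0^\infty e^{i(p-q)\theta}\, r^{p+q+1} e^{-r^2}\, dr\, d\theta = \begin{cases} 0 & p \neq q, \\ 2\pi \cdot \tfrac{1}{2}\Gamma(p+1) & p = q. \end{cases} \]
When q = p and s = r, the product of the two factors is $\pi\, p! \cdot \pi\, r!$ and
$\Gamma(p + r + 2) = (p + r + 1)!$. This implies the result.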
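As a sanity check on Proposition 4.4.4, one can estimate a few of these integrals by Monte Carlo. The sketch below assumes the standard fact that normalizing a Gaussian vector in R4 yields a uniformly distributed point on S 3 ; the sample size, the seed, and the particular exponents are arbitrary illustrative choices, and the agreement is only up to sampling error.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Uniform samples on S^3: normalize standard Gaussian vectors in R^4 = C^2.
n_samples = 400_000
x = rng.normal(size=(n_samples, 4))
x /= np.linalg.norm(x, axis=1, keepdims=True)
z = x[:, 0] + 1j * x[:, 1]
w = x[:, 2] + 1j * x[:, 3]

def moment(p, q, r, s):
    """Monte Carlo estimate of the integral of z^p zbar^q w^r wbar^s d(sigma)."""
    return np.mean(z**p * np.conj(z)**q * w**r * np.conj(w)**s)

def exact(p, r):
    """Value p! r! / (p + r + 1)! from Proposition 4.4.4 (case q = p, s = r)."""
    return math.factorial(p) * math.factorial(r) / math.factorial(p + r + 1)

for p, r in [(1, 0), (1, 1), (2, 1)]:
    print(p, r, moment(p, p, r, r).real, exact(p, r))

off_diagonal = abs(moment(2, 1, 0, 1))   # q != p, so the integral vanishes
print(off_diagonal)
```

The estimates should match the exact values to roughly two or three decimal places, which is the accuracy one expects from this many samples.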
We can now describe a representation π of SU (2) on P and Pm . Given
Ua,b , we will define π(Ua,b ) : P → P via
\[ [\pi(U_{a,b})P](z, w) = P\big(U_{a,b}^{-1}(z, w)\big) = P(\bar a z - b w,\ \bar b z + a w), \tag{4.2} \]
where we use the natural action of SU (2) on C2 . Note that Pm is invariant
under π. We denote the subrepresentation of π on Pm by πm . Then each
πm is a unitary representation of SU (2) on Pm (with respect to the inner
product in L2 (σ)).
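To make (4.2) concrete, one can compute the matrix of πm (Ua,b ) in the orthonormal basis of Proposition 4.4.4 by expanding the transformed monomials. The following sketch does this with plain polynomial multiplication; the helper names poly_power, pi_m and random_su2 are hypothetical, and the degree m = 3 and the random group elements are arbitrary test choices.

```python
import math
import numpy as np

def U(a, b):
    """The matrix U_{a,b} (in SU(2) when |a|^2 + |b|^2 = 1)."""
    return np.array([[a, b], [-np.conj(b), np.conj(a)]], dtype=complex)

def poly_power(coeffs, n):
    """Coefficients (lowest degree first) of the n-th power of a polynomial."""
    out = np.array([1.0 + 0.0j])
    for _ in range(n):
        out = np.convolve(out, coeffs)
    return out

def pi_m(m, a, b):
    """Matrix of pi_m(U_{a,b}) in the basis e_j = c_j z^j w^(m-j), 0 <= j <= m,
    with c_j = sqrt((m+1)! / (j! (m-j)!))."""
    c = [math.sqrt(math.factorial(m + 1) / (math.factorial(j) * math.factorial(m - j)))
         for j in range(m + 1)]
    M = np.zeros((m + 1, m + 1), dtype=complex)
    for k in range(m + 1):
        # [pi(U) e_k](z, w) = c_k (conj(a) z - b w)^k (conj(b) z + a w)^(m-k).
        # Put w = 1; by homogeneity the z^j coefficient is that of z^j w^(m-j).
        first = poly_power(np.array([-b, np.conj(a)]), k)
        second = poly_power(np.array([a, np.conj(b)]), m - k)
        coeffs = np.convolve(first, second)
        for j in range(m + 1):
            M[j, k] = c[k] * coeffs[j] / c[j]
    return M

def random_su2(rng):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    a, b = v / np.linalg.norm(v)
    return a, b

rng = np.random.default_rng(1)
(a1, b1), (a2, b2) = random_su2(rng), random_su2(rng)
a12, b12 = (U(a1, b1) @ U(a2, b2))[0]      # first row of the product

m = 3
M1, M2, M12 = pi_m(m, a1, b1), pi_m(m, a2, b2), pi_m(m, a12, b12)
print(np.allclose(M1.conj().T @ M1, np.eye(m + 1)), np.allclose(M12, M1 @ M2))
```

Both checks (unitarity of the matrix and the homomorphism property) should print True; this is a numerical spot check of the claims, not a substitute for the verification asked for in the exercises.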
Our next goal is to show that each πm is irreducible. We begin with a
lemma.
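Lemma 4.4.5. If M is a π-invariant subspace of Pm and P ∈ M , then
\[ z \frac{\partial P}{\partial w} \in M \quad \text{and} \quad w \frac{\partial P}{\partial z} \in M. \]
Proof. Let G(φ) be as defined above. By assumption, for any φ ≠ 0, we
have
\[ \frac{1}{\phi} \big[ \pi(G(\phi))P - P \big] \in M. \]
As φ → 0, the coefficients of this polynomial approach those of
\[ \tilde P := \frac{d}{d\phi}\, \pi(G(\phi))P \Big|_{\phi=0}. \]
As Pm is finite dimensional, we have that M is closed in Pm . Thus, P̃ ∈ M .
Now we compute
\[ \tilde P = \frac{d}{d\phi} \big[ P(z\cos\phi - w\sin\phi,\ z\sin\phi + w\cos\phi) \big] \Big|_{\phi=0} = z \frac{\partial P}{\partial w} - w \frac{\partial P}{\partial z}. \]
A similar argument using H(ψ) yields
\[ z \frac{\partial P}{\partial w} + w \frac{\partial P}{\partial z} = i\, \frac{d}{d\psi}\, \pi(H(\psi))P \Big|_{\psi=0} \in M. \]
Adding and subtracting yields the result.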
Theorem 4.4.6. For m ≥ 0, each πm is irreducible.
Proof. Suppose M is a nonzero invariant subspace of Pm . We need to show
M = Pm .
Let 0 ≠ P ∈ M . Write
\[ P(z, w) = \sum_{j=0}^m c_j\, z^j w^{m-j} \]
and let J denote the largest j such that $c_j \neq 0$. Then we have
\[ \Big( w \frac{\partial}{\partial z} \Big)^J P(z, w) = c_J\, J!\, w^m. \]
By the previous lemma, this implies $w^m \in M$. Now, by applying $z \frac{\partial}{\partial w}$ (and
the lemma) successively, we can deduce
\[ z w^{m-1} \in M, \quad z^2 w^{m-2} \in M, \quad \dots, \quad \text{and finally } z^m \in M. \]
It follows that M = Pm , as desired.

Our next goal is to show that the πm ’s give a complete list of the irre-
ducible representations of SU (2):
Theorem 4.4.7. [SU (2)]ˆ = {[πm ] : m ≥ 0}.
Proof. First note that none of the πm ’s are equivalent representations, be-
cause they all have different dimensions (and different characters, as we will
see).
Now, let χm be the character of πm , and define
\[ \chi_m^0(\theta) = \chi_m(F(\theta)) \]
as in the corollary above. Note that the orthogonal basis vectors $z^j w^{m-j}$ for
Pm are eigenvectors for πm (F (θ)); indeed, using (4.2) with $a = e^{i\theta}$ and b = 0,
we get
\[ \pi_m(F(\theta))(z^j w^{m-j}) = e^{-i(2j-m)\theta}\, z^j w^{m-j}. \]
Thus
\[ \chi_m^0(\theta) = \sum_{j=0}^m e^{-i(2j-m)\theta} = \frac{\sin((m+1)\theta)}{\sin\theta} \tag{4.3} \]
(see Exercise 4.5.3). It follows that $\chi_0^0(\theta) \equiv 1$, $\chi_1^0(\theta) = 2\cos\theta$, and more
generally
\[ \chi_m^0(\theta) - \chi_{m-2}^0(\theta) = 2\cos m\theta \quad \text{for } m \ge 2 \tag{4.4} \]
(see Exercise 4.5.4). Thus the span of $\{\chi_m^0\}$ equals the span of {cos mθ}.
The latter is uniformly dense in C([0, π]); thus, by Corollary 4.4.2, the span
of {χm } is uniformly dense in the space of continuous central functions on
SU (2).
This means that the only central function in L2 (SU (2)) orthogonal to all χm
is the zero function. By Proposition 4.3.6, this shows that {χm } must include
all possible irreducible characters. This completes the proof.
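The identities (4.3) and (4.4) used in the proof can be spot-checked numerically before proving them in the exercises. The snippet below is only such a check at one arbitrarily chosen angle; the function name chi0 is a hypothetical stand-in for the restricted character.

```python
import numpy as np

def chi0(m, theta):
    """Character of pi_m at F(theta): the sum of the eigenvalues appearing in (4.3)."""
    j = np.arange(m + 1)
    return np.sum(np.exp(-1j * (2 * j - m) * theta))

theta = 0.9                                   # arbitrary test angle in (0, pi)
rows = []
for m in range(2, 6):
    closed_form = np.sin((m + 1) * theta) / np.sin(theta)
    difference = chi0(m, theta) - chi0(m - 2, theta)
    rows.append((m, chi0(m, theta), closed_form, difference, 2 * np.cos(m * theta)))
    print(m, chi0(m, theta).real, closed_form, difference.real, 2 * np.cos(m * theta))
```

In each row the second and third entries should agree, as should the fourth and fifth.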
To do ‘Fourier analysis’ on SU (2) (i.e. to write down a decomposition
of L2 (SU (2))), we now need to compute the matrix elements of the
representations πm relative to the orthonormal bases given in Proposition 4.4.4.
We set
\[ e_j(z, w) = \sqrt{\tfrac{(m+1)!}{j!\,(m-j)!}}\; z^j w^{m-j}. \]
In what follows, we reparametrize SU (2) by replacing b with b̄. So we set
\[ \pi_m(a, b) = \pi_m(U_{a,\bar b}), \qquad \pi_m^{jk}(a, b) = \langle \pi_m(a, b)e_k, e_j\rangle. \]
Now, by definition of the representations, we have
\[ \begin{aligned}
\sqrt{\tfrac{(m+1)!}{k!\,(m-k)!}}\, (\bar a z - \bar b w)^k (b z + a w)^{m-k} &= [\pi_m(a, b)e_k](z, w) \\
&= \sum_j \pi_m^{jk}(a, b)\, e_j(z, w) \\
&= \sum_j \sqrt{\tfrac{(m+1)!}{j!\,(m-j)!}}\; \pi_m^{jk}(a, b)\, z^j w^{m-j}.
\end{aligned} \]
If we set $z = e^{2\pi i t}$ and w = 1, then the sum becomes a Fourier series and the
$\pi_m^{jk}(a, b)$ are computed like Fourier coefficients:
\[ \pi_m^{jk}(a, b) = \sqrt{\tfrac{j!\,(m-j)!}{k!\,(m-k)!}} \int_0^1 (\bar a e^{2\pi i t} - \bar b)^k (b e^{2\pi i t} + a)^{m-k} e^{-2\pi i j t}\, dt. \]
When k = 0, one can compute
\[ \pi_m^{j0}(a, b) = \frac{1}{\sqrt{m+1}}\, e_j(b, a) \tag{4.5} \]
(see Exercise 4.5.5). In particular $\{\pi_m^{j0} : 0 \le j \le m\}$ span Pm . Here the
factor $\sqrt{m+1}$, with $m + 1 = \dim H_{\pi_m}$, is what is needed to normalize the
matrix elements.
Let us also discuss the span of $\{\pi_m^{jk} : 0 \le j \le m\}$. The identities
above show that $\pi_m^{jk}(a, b)$ is a polynomial in the variables (a, b, ā, b̄) that is
homogeneous of degree m − k in (a, b) and of degree k in (ā, b̄). That is, it
has bidegree (m − k, k).
Furthermore, as a function on C2 , each $\pi_m^{jk}$ is harmonic:
\[ \sum_{n=1}^4 \frac{\partial^2 \pi_m^{jk}}{\partial x_n^2} = 0, \qquad a = x_1 + i x_2, \quad b = x_3 + i x_4 \]
(check!).
We conclude this example with the following: We identify SU (2) with
the unit sphere in C2 by identifying Ua,b with (a, b̄). Then the Peter–Weyl
decomposition
\[ L^2 = \bigoplus_{m=0}^\infty E_{\pi_m} \]
agrees with the decomposition of functions on the sphere into ‘spherical
harmonics’. We then have the further decomposition
\[ E_{\pi_m} = \bigoplus_{p+q=m} H_{p,q}, \]
where $H_{p,q}$ is the span of $\pi_{p+q}^{jq}$ for 0 ≤ j ≤ p + q. This is a grouping of the
spherical harmonics of degree m according to their bidegree.
Example 4.4.2 (SO(3)). We write SO(n) for the group of rotations on Rn .
To describe these groups (and their connections to SU (2)) we will make use
of the quaternions, denoted by H.
As a real vector space, we may identify H with R × R3 , with elements
denoted by (a, x). Multiplication in H is given by
(a, x)(b, y) = (ab − x · y, bx + ay + x × y),
where · denotes dot product and × denotes cross product. One can verify
that this product is associative and that |ξη| = |ξ| |η| (where ξ, η ∈ R4 and
| · | denotes euclidean length in R4 ).
The subspace R × {0} is the center of H. We identify this with R and
view it as the ‘real axis’, and we identify {0} × R3 with R3 . Then instead
of writing (a, x), we may write a + x. Denoting the standard basis of R3 by
i, j, k, we then have
\[ (a, x) = a + x = a + x_1 i + x_2 j + x_3 k. \]
The multiplication law is determined by giving the products of these vectors:
\[ i^2 = j^2 = k^2 = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j. \]
The conjugate of ξ = (a, x) is ξ̄ = (a, −x). Note that
\[ \xi\bar\xi = \bar\xi\xi = a^2 + |x|^2 = |\xi|^2. \]
Defining
U (H) = {ξ ∈ H : |ξ| = 1},
we find that U (H) forms a group. For ξ ∈ U (H), the map
\[ \eta \mapsto \xi\eta\xi^{-1} \]
is an isometric linear map from H to H that leaves the center R (and hence
the subspace R3 ) invariant. The restriction of this map to R3 is thus an
element of SO(3) denoted by κ(ξ), i.e.
\[ \kappa(\xi)x = \xi x \xi^{-1}. \]
This belongs to SO(3) (and not just O(3)) because κ is a continuous map
on the connected set U (H).
Theorem 4.4.8. The map κ is a 2-to-1 homomorphism from U (H) onto
SO(3). In particular,
\[ SO(3) \cong U(\mathbb{H})/\{\pm 1\}. \]
Proof. Let ξ ∈ U (H). Write
ξ = cos θ + [sin θ]u,
where θ ∈ [0, π] and u ∈ R3 is a unit vector. The angle θ is unique, as is u
(except when sin θ = 0, which corresponds to ξ = ±1 and κ(±1) = I). In
particular, we may assume θ ∈ (0, π).
Now let v ⊥ u be another unit vector in R3 and define w = u × v. Then
{u, v, w} forms an orthonormal basis for R3 satisfying
\[ uv = -vu = w, \qquad wu = -uw = v. \]
An explicit computation yields
\[ \xi(au + bv + cw)\xi^{-1} = au + (b\cos 2\theta - c\sin 2\theta)v + (c\cos 2\theta + b\sin 2\theta)w, \tag{4.6} \]
so that κ(ξ) is a rotation of angle 2θ about the u-axis.
Let us now show that every rotation of R3 is of this form. To see this,
note that if T ∈ SO(3) then the eigenvalues of T have absolute value 1,
have product equal to 1, and the non-real eigenvalues come in conjugate
pairs. Thus (ignoring the case that T = I) we have that 1 is an eigenvalue
of multiplicity one, and T is then a rotation about the u-axis (where u is
the corresponding unit eigenvector).
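The map κ can be explored numerically by implementing the quaternion product directly from its definition. In the sketch below a quaternion (a, x) is stored as a length-4 array; the helper names qmul, qconj and kappa are hypothetical, and the angle and axis are arbitrary test values.

```python
import numpy as np

def qmul(p, q):
    """Quaternion product (a, x)(b, y) = (ab - x.y, bx + ay + x cross y)."""
    a, x = p[0], p[1:]
    b, y = q[0], q[1:]
    return np.concatenate(([a * b - x @ y], b * x + a * y + np.cross(x, y)))

def qconj(p):
    """Conjugate (a, x) -> (a, -x); this is the inverse of a unit quaternion."""
    return np.concatenate(([p[0]], -p[1:]))

def kappa(xi):
    """3x3 matrix of x -> xi x xi^{-1} on R^3, for a unit quaternion xi."""
    cols = []
    for e in np.eye(3):
        image = qmul(qmul(xi, np.concatenate(([0.0], e))), qconj(xi))
        cols.append(image[1:])
    return np.column_stack(cols)

theta = 0.4
u = np.array([1.0, 2.0, 2.0]) / 3.0            # unit vector: the rotation axis
xi = np.concatenate(([np.cos(theta)], np.sin(theta) * u))
R = kappa(xi)

print(np.allclose(R.T @ R, np.eye(3)), np.linalg.det(R))   # R lies in SO(3)
print(np.allclose(R @ u, u))                                # the axis u is fixed
print(np.trace(R), 1 + 2 * np.cos(2 * theta))               # rotation angle is 2*theta
print(np.allclose(kappa(-xi), R))                           # kappa is 2-to-1
```

The trace of a rotation by angle α is 1 + 2 cos α, so the third line confirms the doubling of the angle, and the last line shows that ξ and −ξ give the same rotation.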
Let us now connect H with SU (2). We write

a + bi + cj + dk = (a + bi) + (c + di)j
and observe that the algebra structure on H restricted to elements of the
form a + bi coincides with that of C, with
j(a + bi) = (a − bi)j.
In particular, we can identify H with C2 , where multiplication corresponds
to
(z + wj)(u + vj) = (zu − wv̄) + (zv + wū)j,
with z, w, u, v ∈ C. This identity shows that we may define an isomorphism
by identifying z + wj with the complex matrix
\[ \begin{pmatrix} z & w \\ -\bar w & \bar z \end{pmatrix} = U_{z,w}. \]
In particular, this gives an identification of U (H) with SU (2). We therefore
have the following:
Corollary 4.4.9. We have
\[ SO(3) \cong SU(2)/\{\pm I\}. \]
This shows that the representations of SO(3) are given by representa-

tions of SU (2) that are trivial on ±I. Since the irreducible representation πm
of SU (2) on Pm satisfies πm (−I) = (−1)m I, one can deduce the following:
[SO(3)]ˆ = {[ρk ] : k = 0, 1, 2, . . . },
where
ρk (Ad(U )) = π2k (U ).
Here Ad(U )A := U AU −1 .
Example 4.4.3 (SO(4)). We define L, R : SU (2) → L(U (H)) via
L(ξ)ζ = ξζ and R(ξ)ζ = ζξ −1 ,
where ξ, ζ ∈ U (H) ∼= SU (2). Because the Euclidean norm on H is multiplica-

tive, it follows that the images of SU (2) under L, R are closed subgroups of
SO(4) if we identify H with R4 .
These subgroups commute (this is the associative law) and intersect only
at ±1. To see this, suppose L(ξ) = R(η). Then
ξ = L(ξ)1 = R(η)1 = η −1 , so that L(ξ) = R(ξ −1 ).
Thus ξ belongs to the center of H, i.e. R. Then ξ = ±1 and η = ξ −1 = ±1.

Next, we have the following.
Theorem 4.4.11. If T ∈ SO(4), then there exist ξ, η ∈ SU (2) (unique up to
a common factor of ±1) such that T = L(ξ)R(η). Thus
\[ SO(4) \cong [SU(2) \times SU(2)]/\{\pm(1, 1)\}. \]
Proof. Let T ∈ SO(4) and ζ = T (1) (where we view 1 = (1, 0) ∈ R × R3 as

above). Write S = L(ζ)−1 T , so that S ∈ SO(4) and S(1) = 1. In particular,
S leaves the real axis pointwise fixed, and hence can be viewed as a rotation
of the orthogonal subspace R3 . By the preceding example, we saw that S
must be given by conjugation by some η ∈ U (H). In the present notation,

that means
S = L(η)R(η).
Thus T = L(ζ)S = L(ξ)R(η), where ξ := ζη. Uniqueness and the fact that
(ξ, η) 7→ L(ξ)R(η) is a homomorphism follow from the remarks above.
Using this result, one can describe the irreducible representations of

SO(4) in terms of those of SU (2) × SU (2), which may ultimately be de-
scribed in terms of the irreducible representations of SU (2). Without delv-
ing into the theory, one defines the representation πmn of SU (2) × SU (2) on
Pm ⊗ Pn via
πmn (ξ, η) = πm (ξ) ⊗ πn (η)
(where ⊗ denotes tensor product). The conclusion is the following:
[SO(4)]ˆ = {[ρmn ] : m, n ≥ 0, m ≡ n mod 2},
where
ρmn (L(ξ)R(η)) := πmn (ξ, η).
The restriction that m, n have the same parity comes from the fact that
πm (−1) = (−1)m I.
4.5 Exercises
Exercise 4.5.1. Recover the theory of Fourier series using the Peter–Weyl
theorem and the discussion thereafter.
Exercise 4.5.2. Verify that the maps defined in (4.2) are unitary represen-
tations of SU (2).
Exercise 4.5.3. Compute the sum in (4.3).
Exercise 4.5.4. Prove the identity (4.4).
Exercise 4.5.6. Verify (4.6).
Chapter 5
Wavelet transforms
In this section we give an introduction to wavelet transforms, which are

mathematical tools that have a wide range of applications within mathe-
matics as well as in physics, engineering, and so on. Our primary reference
is [7], which covers much more ground than we can hope to cover here.
5.1 Continuous wavelet transforms

Given ψ : R → C, we define a family of ‘wavelets’ by rescaling and
translating ψ:
\[ \psi^{a,b}(x) = |a|^{-1/2}\, \psi\Big( \frac{x-b}{a} \Big), \]
where a ∈ R\{0} and b ∈ R. These are L2 normalized, i.e.
\[ \|\psi^{a,b}\|_{L^2} \equiv \|\psi\|_{L^2}, \]
and we will assume that $\|\psi\|_{L^2} = 1$.
We assume $\int \psi = 0$, and more generally we will assume the admissibility
condition
\[ C_\psi := 2\pi \int |\xi|^{-1}\, |\hat\psi(\xi)|^2\, d\xi < \infty \]
(i.e. $\psi \in \dot H^{-1/2}$). The necessity of this condition will become apparent below.
The continuous wavelet transform with respect to this wavelet family
is defined by
\[ Tf(a, b) = \langle f, \psi^{a,b}\rangle. \]
The wavelet transform yields a resolution of the identity in the following
sense.
Proposition 5.1.1. For $\psi \in \dot H^{-1/2}$, we have
\[ f = C_\psi^{-1} \iint Tf(a, b)\, \psi^{a,b}\, a^{-2}\, da\, db \]
in the weak sense; i.e. for all f, g ∈ L2 ,
\[ \langle f, g\rangle = C_\psi^{-1} \iint Tf(a, b)\, \overline{Tg(a, b)}\, a^{-2}\, da\, db. \]
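Proof. The proof is a direct computation. We begin by using Plancherel’s
theorem and the definition of the wavelet transform to write
\[ \begin{aligned}
\iint Tf(a, b)\, \overline{Tg(a, b)}\, a^{-2}\, da\, db = \iint &\Big[ \int \hat f(\xi)\, |a|^{1/2} e^{-ib\xi}\, \overline{\hat\psi(a\xi)}\, d\xi \Big] \\
\times &\Big[ \int \overline{\hat g(\eta)}\, |a|^{1/2} e^{ib\eta}\, \hat\psi(a\eta)\, d\eta \Big]\, a^{-2}\, da\, db.
\end{aligned} \]
Now consider the functions
\[ F_a(\xi) := |a|^{1/2}\, \hat f(\xi)\, \overline{\hat\psi(a\xi)} \quad \text{and} \quad G_a(\xi) := |a|^{1/2}\, \hat g(\xi)\, \overline{\hat\psi(a\xi)}. \]
Then
\[ \int \hat f(\xi)\, |a|^{1/2} e^{-ib\xi}\, \overline{\hat\psi(a\xi)}\, d\xi = (2\pi)^{1/2}\, \hat F_a(b), \]
with a similar expression for the remaining integral. Applying Plancherel
(in the db integral), we continue from above to write
\[ \begin{aligned}
\iint Tf(a, b)\, \overline{Tg(a, b)}\, a^{-2}\, da\, db &= 2\pi \iint F_a(\xi)\, \overline{G_a(\xi)}\, a^{-2}\, d\xi\, da \\
&= 2\pi \int \hat f(\xi)\, \overline{\hat g(\xi)} \int |\hat\psi(a\xi)|^2\, |a|^{-1}\, da\, d\xi \\
&= C_\psi \langle f, g\rangle,
\end{aligned} \]
where we have changed variables in the da integral above and then applied
Plancherel once more.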
Remark 5.1.2. It is possible to be a bit more quantitative about the sense

in which convergence holds, but we will not concern ourselves with all of the
details here. See [7] for more details.
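For a concrete feel for the admissibility constant, one can compute Cψ for a specific wavelet. The sketch below uses the L2 -normalized ‘Mexican hat’ ψ(x) = c(1 − x²)e^{−x²/2}, which is an assumed example that does not appear in the notes, together with the convention fˆ(ξ) = (2π)^{−1/2} ∫ f (x)e^{−ixξ} dx that is consistent with the proof above. The grids and the test frequency are arbitrary choices; for this ψ a short calculation gives Cψ = 8√π/3.

```python
import numpy as np

# Assumed example wavelet (not from the notes): psi(x) = c (1 - x^2) exp(-x^2 / 2),
# with c chosen so that the L^2 norm of psi equals 1.
c = 2.0 / (np.sqrt(3.0) * np.pi**0.25)

def psi(x):
    return c * (1.0 - x**2) * np.exp(-x**2 / 2.0)

def psi_hat(xi):
    # exp(-x^2/2) is its own transform in the unitary convention, and
    # psi = -c d^2/dx^2 exp(-x^2/2), so psi_hat(xi) = c xi^2 exp(-xi^2/2).
    return c * xi**2 * np.exp(-xi**2 / 2.0)

x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
norm_sq = np.sum(psi(x)**2) * dx          # should be 1
mean = np.sum(psi(x)) * dx                # should be 0, i.e. psi_hat(0) = 0

# Check the closed form of psi_hat at one frequency by direct quadrature.
xi0 = 1.3
psi_hat_numeric = np.sum(psi(x) * np.exp(-1j * x * xi0)) * dx / np.sqrt(2.0 * np.pi)

# |xi|^{-1} |psi_hat|^2 is even, so integrate over xi > 0 and double.
xi = np.linspace(0.0, 12.0, 12001)[1:]
C_psi = 2.0 * np.pi * 2.0 * np.sum(psi_hat(xi)**2 / xi) * (xi[1] - xi[0])
print(norm_sq, mean, C_psi, 8.0 * np.sqrt(np.pi) / 3.0)
```

The vanishing of ψˆ at ξ = 0 is what makes the integrand |ξ|^{−1}|ψˆ(ξ)|² integrable near the origin, which is the role played by the assumption ∫ψ = 0.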
There are several variations of the situation described above. For
example, if we use a real-valued ψ, then we get
\[ \hat\psi(-\xi) = \overline{\hat\psi(\xi)}, \]
and we can write
\[ \tilde C_\psi = 2\pi \int_0^\infty |\xi|^{-1}\, |\hat\psi(\xi)|^2\, d\xi = 2\pi \int_{-\infty}^0 |\xi|^{-1}\, |\hat\psi(\xi)|^2\, d\xi. \]
Then we have
\[ f = \tilde C_\psi^{-1} \int_0^\infty \int_{\mathbb{R}} Tf(a, b)\, \psi^{a,b}\, a^{-2}\, db\, da. \]
One can also investigate using bandlimited functions, wavelets, or using
complex-valued wavelets with real-valued signals, and so on.
One important variation involves using different wavelets for the decom-
position and the reconstruction of the signal f . In particular, we have:
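Proposition 5.1.3. If ψ1 , ψ2 satisfy
\[ \int |\hat\psi_1(\xi)|\, |\hat\psi_2(\xi)|\, |\xi|^{-1}\, d\xi < \infty, \]
then
\[ f = C_{\psi_1,\psi_2}^{-1} \iint \langle f, \psi_1^{a,b}\rangle\, \psi_2^{a,b}\, a^{-2}\, da\, db \]
in the weak sense, where
\[ C_{\psi_1,\psi_2} = 2\pi \int \overline{\hat\psi_1(\xi)}\, \hat\psi_2(\xi)\, |\xi|^{-1}\, d\xi. \]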
Proof. The proof proceeds exactly as in Proposition 5.1.1.
As before, under some reasonable conditions on f , the convergence above

can be shown to hold in stronger senses (e.g. pointwise at points of continuity
for f ).
Reproducing kernel Hilbert spaces.
The continuous wavelet transform is related to the notion of a repro-
ducing kernel Hilbert space, as we briefly explain here.
Definition 5.1.4. Let H be a Hilbert space of real-valued functions on
some set X. For x ∈ X, define the evaluation functional Lx : H → R by
Lx f = f (x). We call H a reproducing kernel Hilbert space (rkHs) if for all
x ∈ X, Lx defines a bounded operator on H.
By the Riesz representation theorem (see Section A.2), if H is a re-

producing kernel Hilbert space, then for every x ∈ X, there exists unique
gx ∈ H such that
f (x) = Lx f = hf, gx i for all f ∈ H.
If the inner product is given by integration, this becomes
\[ f(x) = \int f(y)\, g(x, y)\, dy \quad \text{for all } f \in H. \]
The function g is called the reproducing kernel for the Hilbert space.
Now let us return to the continuous wavelet transform. For f ∈ L2 (R),
we have
\[ C_\psi^{-1} \iint |Tf(a, b)|^2\, a^{-2}\, da\, db = \int |f(x)|^2\, dx, \]
so that T maps L2 (R) isometrically into the Hilbert space
\[ L^2(\mathbb{R}^2;\ C_\psi^{-1} a^{-2}\, da\, db). \]
Let H denote the closed subspace given by the image of L2 under T . Then
H is a reproducing kernel Hilbert space. In fact,
\[ F(a, b) = \langle f, \psi^{a,b}\rangle = C_\psi^{-1} \iint K(a, b; \alpha, \beta)\, F(\alpha, \beta)\, \alpha^{-2}\, d\alpha\, d\beta, \]
where
\[ K(a, b; \alpha, \beta) = \langle \psi^{\alpha,\beta}, \psi^{a,b}\rangle. \]
That is, point evaluation is given by integration against a kernel.
The windowed Fourier transform.
We turn to a comparison of the wavelet transform with the so-called
windowed Fourier transform. Given a smooth function g supported
near the origin, we define the windowed Fourier transform by
\[ F_g f(\omega, t) = \langle f, g^{\omega,t}\rangle, \quad \text{where } g^{\omega,t}(x) = e^{i\omega x}\, g(x - t). \]
That is, Fg f (ω, t) represents the Fourier transform of the portion of f (x)
around the point x = t. This is a natural quantity to consider in the setting
of signal processing.
Arguing as in the proof of Proposition 5.1.1, we have the following:
Proposition 5.1.5. We have
\[ f = \frac{1}{2\pi \|g\|_{L^2}^2} \iint F_g f(\omega, t)\, g^{\omega,t}\, d\omega\, dt \]
in the weak sense. That is, for all f1 , f2 ∈ L2 ,
\[ \langle f_1, f_2\rangle = \frac{1}{2\pi \|g\|_{L^2}^2} \iint F_g f_1(\omega, t)\, \overline{F_g f_2(\omega, t)}\, d\omega\, dt. \]
One can use any g ∈ L2 for the windowing function. Typically, one
normalizes $\|g\|_{L^2} = 1$. The windowed Fourier transform also maps L2 to a
reproducing kernel Hilbert space, viz.
\[ F(\omega, t) = \frac{1}{2\pi} \iint K(\omega, t; \omega', t')\, F(\omega', t')\, d\omega'\, dt', \]
where
\[ F(\omega, t) = F_g f(\omega, t), \qquad K(\omega, t; \omega', t') = \langle g^{\omega',t'}, g^{\omega,t}\rangle. \]
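The isometry in Proposition 5.1.5 (with f1 = f2 = f ) can be tested by brute-force quadrature. In the sketch below the signal f, the Gaussian window g, and all grid parameters are assumptions chosen so that every integrand is negligible outside the grids; the integrals are replaced by Riemann sums.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 801)
omega = np.linspace(-12.0, 12.0, 241)
t = np.linspace(-8.0, 8.0, 161)
dx, dw, dt = x[1] - x[0], omega[1] - omega[0], t[1] - t[0]

f = np.exp(-x**2 / 2.0) * np.cos(2.0 * x)             # test signal (an assumption)

def g(y):
    """Gaussian window with L^2 norm equal to 1."""
    return np.pi**-0.25 * np.exp(-y**2 / 2.0)

# F[i, j] = <f, g^{omega_i, t_j}>
#         = integral of f(x) exp(-i omega_i x) conj(g(x - t_j)) dx.
windowed = f[:, None] * np.conj(g(x[:, None] - t[None, :]))    # shape (x, t)
phases = np.exp(-1j * omega[:, None] * x[None, :])             # shape (omega, x)
F = (phases @ windowed) * dx                                   # shape (omega, t)

lhs = np.sum(np.abs(F)**2) * dw * dt / (2.0 * np.pi)
rhs = np.sum(np.abs(f)**2) * dx
print(lhs, rhs)
```

The two printed numbers should agree to many digits. Writing the transform as a matrix product keeps the computation fast, at the cost of holding the full (ω, x) phase matrix in memory.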
Construction of operators and time-frequency localization.

We saw in Proposition 5.1.1 the following reconstruction formula:
\[ f = C_\psi^{-1} \iint \langle f, \psi^{a,b}\rangle\, \psi^{a,b}\, a^{-2}\, da\, db. \]
This can also be viewed as a ‘resolution of the identity’, i.e.
\[ \mathrm{Id} = C_\psi^{-1} \iint \langle \cdot, \psi^{a,b}\rangle\, \psi^{a,b}\, a^{-2}\, da\, db \]
for an admissible ‘mother wavelet’ ψ. Similarly, we have a resolution of the
identity using the windowed Fourier transform:
\[ \mathrm{Id} = \frac{1}{2\pi} \iint \langle \cdot, g^{\omega,t}\rangle\, g^{\omega,t}\, d\omega\, dt \]
for an L2 -normalized windowing function.
Note that in both cases, we are reconstructing the identity operator as
a linear combination of rank one projections. That is, for a unit vector φ the
operator
\[ f \mapsto \langle f, \phi\rangle \phi \]
is simply the rank one projection onto the span of φ.
For the identity operator, we give each projection equal weight. However,
by varying the weight given to each projection operator, we can construct a
wide variety of operators.
Let us consider this first in the setting of the windowed Fourier trans-
form. Inspired by applications in quantum mechanics, we switch to the posi-
tion/momentum variables (p, q) (instead of (t, ω)). Given a weight function
w(p, q) and an L2 -normalized window function, we may define the operator
\[ W = \frac{1}{2\pi} \iint w(p, q)\, \langle \cdot, g^{p,q}\rangle\, g^{p,q}\, dp\, dq. \]
If w is unbounded, then W may be an unbounded operator. However, for

reasonably chosen w and g, one can get a densely defined operator.
Example 5.1.1. With $w(p, q) = p^2$, one gets
\[ W = -\frac{d^2}{dx^2} + C_g\, \mathrm{Id}, \quad \text{where} \quad C_g = \int \xi^2\, |\hat g(\xi)|^2\, d\xi. \]
With w(p, q) = v(q), one gets
\[ W f(x) = V_g(x) f(x), \qquad V_g = v * |g|^2. \]
These correspond to the “quantized versions” of the phase space function

w(p, q), up to the additional g-dependent parts. Ideas related to these ap-
peared in work of Lieb establishing the validity of the Thomas–Fermi model
in the limit as the nuclear charge tends to infinity (i.e. very heavy atoms).
Let us consider the operator corresponding to time-frequency local-
ization. Recall from Corollary 2.8.2 that a non-trivial function and its
Fourier transform cannot both be compactly supported. Nonetheless, in
practice it is of great utility to localize signals in both space and frequency
as much as possible. We will consider two notions of time-frequency local-
ization.
Example 5.1.2. First, we make use of the orthogonal projection operators
QT and PΩ , given by
\[ Q_T f(x) = 1_{[-T,T]}(x)\, f(x) \quad \text{and} \quad \mathcal{F}[P_\Omega f](\xi) = 1_{[-\Omega,\Omega]}(\xi)\, \hat f(\xi). \]
Suppose f is time-limited, i.e. f = QT f . If we transmit f over a bandlimited
channel, the result is of the form PΩ QT f. To measure how faithfully the
transmitted signal represents the original signal, we may compute
\[ \frac{\|P_\Omega Q_T f\|_{L^2}^2}{\|f\|_{L^2}^2} = \frac{\langle Q_T P_\Omega Q_T f, f\rangle}{\|f\|_{L^2}^2}. \]
The largest value of this ratio corresponds to the largest eigenvalue of the
symmetric operator QT PΩ QT , which is an integral operator with integral
kernel
\[ (Q_T P_\Omega Q_T)(x, y) = 1_{[-T,T]}(x)\, 1_{[-T,T]}(y)\, \frac{\sin[\Omega(x-y)]}{\pi(x-y)} \]
(exercise). The eigenvalues and eigenvectors of this operator are known, due
to the fact that this operator commutes with the operator
\[ A = \frac{d}{dx} (T^2 - x^2) \frac{d}{dx} - \frac{\Omega^2}{\pi^2}\, x^2, \]
which was previously studied in the context of solving the Helmholtz
equation via separation of variables. The eigenfunctions are called ‘prolate
spheroidal wave functions’.
The eigenvalues may be put in decreasing order. One finds (by a rescaling
argument) that the eigenvalues of QT PΩ QT depend only on the product
T Ω. It turns out that (for fixed T Ω) the eigenvalues begin close to 1 before
suddenly plunging down to essentially zero. There are about 2T Ω/π eigenvalues
near one before a plunge region of width about log(T Ω).
This gives a rigorous version of something empirically observed long ago.
In particular, in a time and band-limited region, there are 2T Ω/π ‘degrees
of freedom’ (i.e. independent functions that are essentially time and band-
limited in this way). This is proportional to the area of [−T, T ] × [−Ω, Ω];
it also corresponds to the number of sampling times within [−T, T ] dictated
by Shannon’s sampling theorem (Theorem 3.1.1) for a function with Fourier
support in [−Ω, Ω].
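The plunge behaviour of the eigenvalues is easy to observe numerically. The sketch below discretizes the integral kernel of QT PΩ QT with a midpoint rule on [−T, T ]; the values of T, Ω and the grid size N are arbitrary assumptions, and the discretization only approximates the true eigenvalues.

```python
import numpy as np

T, Omega = 5.0, 4.0                     # arbitrary choices with T * Omega = 20
N = 400
h = 2.0 * T / N
x = -T + h * (np.arange(N) + 0.5)       # midpoints of a uniform grid on [-T, T]

# Kernel sin(Omega (x - y)) / (pi (x - y)); note np.sinc(u) = sin(pi u) / (pi u).
diff = x[:, None] - x[None, :]
K = (Omega / np.pi) * np.sinc(Omega * diff / np.pi)

# Midpoint-rule discretization of the integral operator: the matrix h * K.
eigenvalues = np.linalg.eigvalsh(h * K)[::-1]      # decreasing order
count_near_one = int(np.sum(eigenvalues > 0.5))
print(count_near_one, 2.0 * T * Omega / np.pi)
print(eigenvalues[:20])
```

The number of eigenvalues above 1/2 should be close to 2T Ω/π ≈ 12.7, with the printed list dropping from values near 1 to values near 0 within a few indices.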
Example 5.1.3. Next, returning to the discussion above, let us use the
windowed Fourier transform to define an operator that corresponds to
time-frequency localization to a set S ⊂ R2 . We define
\[ L_S = \frac{1}{2\pi} \iint_{(\omega,t) \in S} \langle \cdot, g^{\omega,t}\rangle\, g^{\omega,t}\, d\omega\, dt. \]
Note that 0 ≤ LS ≤ Id in the sense of operators. The lower bound is
immediate, while the upper bound follows from the resolution of the identity
property:
\[ |\langle L_S f, f\rangle| \le \frac{1}{2\pi} \iint |\langle f, g^{\omega,t}\rangle|^2\, d\omega\, dt = \|f\|_{L^2}^2. \]
Suppose S is a bounded subset of R2 and ϕn is an orthonormal basis for
L2 (R). Then
\[ \sum_n \langle L_S \varphi_n, \varphi_n\rangle = \frac{1}{2\pi} \iint_S \sum_n |\langle \varphi_n, g^{\omega,t}\rangle|^2\, d\omega\, dt = \frac{1}{2\pi} \iint_S \|g^{\omega,t}\|_{L^2}^2\, d\omega\, dt = \frac{1}{2\pi}\, |S|. \]
This shows that LS is a trace class operator (see Section A.2).

By the spectral theorem, we may find a complete set of eigenvectors
φn forming an orthonormal basis for L2 with nonnegative eigenvalues λn
decreasing to zero.
The interpretation of LS f is that we build up f out of time-frequency
localized pieces hf, g ω,t ig ω,t , only using (ω, t) ∈ S.
Even if S = [−Ω, Ω] × [−T, T ], the operator LS will not be the same as
QT PΩ QT . In this construction, we are free to choose more general sets S.
In general, it is difficult to compute the eigenvalues and eigenvectors of
the operator LS constructed above, in which case the construction is not
particularly useful. However, there is a special case in which things can be
computed explicitly.
Example 5.1.4. Consider the operator $L_R = L_{S_R}$ defined as above, with
\[ S = S_R = \{ (\omega, t) : \omega^2 + t^2 \le R^2 \} \]
and
\[ g(x) = g_0(x) := \pi^{-1/4}\, e^{-x^2/2}. \]
In this case, things can be computed fairly explicitly. We only mention some
of the results; more details may be found in [7, pp38–40].
With these choices of g0 and SR , one finds by direct computation that
LR commutes with the harmonic oscillator Hamiltonian, defined by
\[ H = -\frac{d^2}{dx^2} + x^2 - 1. \]
In fact, one can compute the action of the unitary group $e^{-isH}$ on g0
explicitly and prove that
\[ L_R\, e^{-isH} = e^{-isH}\, L_R, \]
which implies the result. The proof boils down to showing that the action
of $e^{-isH}$ on $g_0^{\omega,t}$ simply produces (up to a phase factor) some other $g_0^{\omega_s,t_s}$,
where $(\omega_s, t_s)$ are given by a rotation matrix applied to (ω, t). One then
applies change of variables (given by a rotation, which leaves the domain SR
invariant) to prove the identity above.
As LR commutes with H, these operators are simultaneously diagonal-

izable. Fortunately, it is a well-known exercise in quantum mechanics to
compute the eigenvalues and eigenvectors of the harmonic oscillator (see
e.g. [12]). The eigenvalues are simply {2n} and the eigenfunctions are the
Hermite functions
\[ \phi_n = 2^{-n/2}\, (n!)^{-1/2}\, \Big( x - \frac{d}{dx} \Big)^n g_0(x). \]
The eigenvalues for LR can be found (after some computation), using
\[ \lambda_n(R) = \langle L_R \phi_n, \phi_n\rangle = \frac{1}{n!} \int_0^{R^2/2} s^n e^{-s}\, ds. \]
This is called an incomplete Γ-function and the behavior as a function of n
and R is understood:
For each R, the λn (R) decrease monotonically as n increases. They start
close to 1 and then ‘plunge’ down to zero. One finds
\[ \max\{ n : \lambda_n(R) \ge \tfrac{1}{2} \} \sim \tfrac{1}{2} R^2, \]
which (writing it as πR2 /2π) is again the area of the time-frequency
localization region divided by 2π. The width of the ‘plunge region’ is larger this
time, but still essentially negligible compared to $\tfrac{1}{2} R^2$. The eigenfunctions
turn out to be independent of the size of SR ; that is, the R-dependence is
completely represented through the eigenvalues.
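The eigenvalues λn (R) are simple to tabulate, since repeated integration by parts gives the closed form λn (R) = 1 − e^{−x} Σ_{k≤n} x^k /k! with x = R²/2. The sketch below uses this identity; the function name lam and the value R = 6 are illustrative choices.

```python
import math

def lam(n, R):
    """lambda_n(R) = (1/n!) * integral_0^{R^2/2} s^n e^{-s} ds, evaluated via
    1 - e^{-x} * sum_{k=0}^{n} x^k / k!  with x = R^2 / 2."""
    x = R * R / 2.0
    term, partial = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k
        partial += term
    return 1.0 - math.exp(-x) * partial

R = 6.0
values = [lam(n, R) for n in range(60)]
n_half = max(n for n, v in enumerate(values) if v >= 0.5)
print(n_half, R * R / 2.0)
print([round(v, 3) for v in values[:30]])
```

The index n_half should be within one of R²/2 = 18, and the printed values show the eigenvalues staying near 1, then dropping to near 0 over a range of indices comparable to R.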
So far, we have focused on using the windowed Fourier transform to build
operators. Of course, a similar construction is possible using the wavelet
transform. That is, given a weight w(a, b), we may define
\[ W = C_\psi^{-1} \iint w(a, b)\, \langle \cdot, \psi^{a,b}\rangle\, \psi^{a,b}\, a^{-2}\, da\, db. \]
Once again, one may be interested in using a weight w(a, b) that simply cuts off
to a subset S. Once again this leads to operators LS satisfying 0 ≤ LS ≤ 1,
which (provided S is compact and does not contain a = 0) are trace class.
As before, the eigenvalues/eigenvectors can be difficult to analyze except in
some special cases. One such case involves taking $\hat\psi(\xi) = 2\xi e^{-\xi} \chi_{[0,\infty)}(\xi)$
and using the identity
\[ 1 = C_\psi^{-1} \int_0^\infty \int_{\mathbb{R}} \big[ \langle \cdot, \psi_+^{a,b}\rangle\, \psi_+^{a,b} + \langle \cdot, \psi_-^{a,b}\rangle\, \psi_-^{a,b} \big]\, a^{-2}\, db\, da, \]
where $\psi_+ = \psi$ and $\hat\psi_-(\xi) = \hat\psi(-\xi)$. One then considers localization to regions
\[ S_C = \{ (a, b) \in (0, \infty) \times \mathbb{R} : a^2 + b^2 + 1 \le 2aC \} \]
corresponding to the disks $|z - iC|^2 \le C^2 - 1$ (with z = b + ia) in the upper half of the
complex plane. The role of the harmonic oscillator Hamiltonian is played by
a different operator that commutes with LC and is diagonalized by Laguerre
polynomials. We refer the interested reader to [7] and the references therein.
Let us now consider one final ‘purely mathematical’ application of the
continuous wavelet transform.
Characterization of local regularity.
We will prove two theorems relating regularity of functions to their
wavelet transforms. The first is a global result; we have seen similar re-
sults in the setting of the Fourier transform or Fourier series. The second is
a local result; this type of result is typically not possible using the Fourier
transform or windowed Fourier transform.
Theorem 5.1.6. Suppose
\[
\int(1+|x|)|\psi(x)|\,dx<\infty\quad\text{and}\quad\hat\psi(0)=0.
\]
If f is bounded and Hölder continuous of order α, then
\[
|\langle f,\psi^{a,b}\rangle|\lesssim|a|^{\alpha+\frac12}.\tag{5.1}
\]
Conversely, suppose ψ is compactly supported and $f\in L^2(\mathbb{R})$ is bounded and
continuous. If (5.1) holds, then f is Hölder continuous of order α.
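Proof. The first statement follows from a direct computation. Using $\int\psi=0$,
\begin{align*}
|\langle\psi^{a,b},f\rangle| &\lesssim |a|^{-\frac12}\int|\psi(\tfrac{x-b}{a})|\,|f(x)-f(b)|\,dx\\
&\lesssim |a|^{-\frac12}\int|\psi(\tfrac{x-b}{a})|\,|x-b|^\alpha\,dx\\
&\lesssim |a|^{\alpha+\frac12}\int|\psi(y)|\,|y|^\alpha\,dy,
\end{align*}
as claimed.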
We turn to the converse statement. Let $\psi_2\in C_c^\infty$ with $\int\psi_2=0$. We
can choose $\psi_2$ so that the constant $C_{\psi,\psi_2}=1$. Using the resolution of the
identity,
\[
f(x) = \iint\langle f,\psi^{a,b}\rangle\,\psi_2^{a,b}(x)\,a^{-2}\,da\,db.
\]
We split the integral into the part where |a| ≤ 1 and |a| > 1. We call
these two terms fS and fL (for small and large scales, respectively).
First consider the large scale piece, which is actually already guaranteed
to be Lipschitz. We begin by showing fL is uniformly bounded. In fact,
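\begin{align*}
|f_L(x)| &\lesssim \|f\|_{L^2}\|\psi\|_{L^2}\iint_{|a|\ge1}|a|^{-\frac52}\,|\psi_2(\tfrac{x-b}{a})|\,da\,db\\
&\lesssim \|f\|_{L^2}\|\psi\|_{L^2}\|\psi_2\|_{L^1}\int_{|a|\ge1}|a|^{-\frac32}\,da\lesssim1.
\end{align*}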
Now let |h| ≤ 1 and write
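\begin{align*}
|f_L(x+h)-f_L(x)| &\le \iiint_{|a|\ge1}|a|^{-3}\,|f(y)|\,|\psi(\tfrac{y-b}{a})|\,\bigl|\psi_2(\tfrac{x+h-b}{a})-\psi_2(\tfrac{x-b}{a})\bigr|\,dy\,db\,da\\
&\lesssim |h|\iiint_S|a|^{-4}|f(y)|\,da\,db\,dy,
\end{align*}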
where (using the compact support of ψ and ψ2 )
S = {(a, b, y) : |a| ≥ 1, |y − b| ≤ |a|R, |x − b| ≤ |a|R}
for some R > 0. In particular, using Hölder’s inequality,
\[
|f_L(x+h)-f_L(x)|\lesssim_R|h|\,\|f\|_{L^2}\int_{|a|\ge1}|a|^{-4}\cdot|a|^{\frac32}\,da\lesssim|h|.
\]
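We turn to the small scale piece, beginning again with uniform boundedness:
\begin{align*}
|f_S(x)| &\lesssim \iint_{|a|\le1}|a|^{\alpha+\frac12}\,|a|^{-\frac12}\,|\psi_2(\tfrac{x-b}{a})|\,a^{-2}\,da\,db\\
&\lesssim \|\psi_2\|_{L^1}\int_{|a|\le1}|a|^{-1+\alpha}\,da\lesssim1.
\end{align*}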
Next, for |h| ≤ 1, we consider fS (x + h) − fS (x) and split the integral into
two regions. First, we get the contribution
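\begin{align*}
&\iint_{|a|\le|h|}|a|^\alpha\bigl[|\psi_2(\tfrac{x-b}{a})|+|\psi_2(\tfrac{x+h-b}{a})|\bigr]\,a^{-2}\,da\,db\\
&\qquad\lesssim\iint_S|a|^{\alpha-2}\,da\,db\lesssim\int_{|a|\le|h|}|a|^{-1+\alpha}\,da\lesssim|h|^\alpha,
\end{align*}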
where
S = {(a, b) : |a| ≤ |h| and |x − b| ≤ |a|R}
for some R > 0. Next, using the fact that ψ2 is Lipschitz, the contribution
of |h| ≤ |a| ≤ 1 is controlled by
\[
\iint_{\tilde S}|a|^\alpha\,|h|\,|a|^{-1}\,a^{-2}\,da\,db\lesssim|h|\int_{|h|\le|a|\le1}|a|^{-2+\alpha}\,da\lesssim|h|^\alpha,
\]
where we have denoted
S̃ = {(a, b) : |h| ≤ |a| ≤ 1 and |x − b| ≤ |a|R}
for some R > 0. This completes the proof.
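The forward bound (5.1) can be checked numerically in a simple case. The sketch below makes several choices that are not in the text: it takes ψ to be the Haar function (which has integrable first moment and mean zero), takes $f(x)=\min(|x|^\alpha,1)$ (bounded and Hölder continuous of order α), fixes b = 0, and approximates the inner product with a midpoint rule. It then estimates the exponent of decay in a, which should be close to $\alpha+\tfrac12$.

```python
import math


def haar(y: float) -> float:
    """Haar function: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= y < 0.5:
        return 1.0
    if 0.5 <= y < 1.0:
        return -1.0
    return 0.0


def f(x: float, alpha: float) -> float:
    """A bounded function that is Hoelder continuous of order alpha."""
    return min(abs(x) ** alpha, 1.0)


def coeff(a: float, b: float, alpha: float, n: int = 4000) -> float:
    """<f, psi^{a,b}> with psi^{a,b}(x) = a^{-1/2} psi((x - b)/a), a > 0.

    After substituting x = a*y + b the integral runs over y in [0, 1];
    it is approximated by the midpoint rule with n nodes.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += f(a * y + b, alpha) * haar(y)
    return math.sqrt(a) * total * h


def decay_exponent(alpha: float, b: float = 0.0) -> float:
    """Slope of log|<f, psi^{a,b}>| against log a between two small scales."""
    a1, a2 = 2.0 ** -4, 2.0 ** -8
    c1, c2 = abs(coeff(a1, b, alpha)), abs(coeff(a2, b, alpha))
    return math.log(c1 / c2) / math.log(a1 / a2)


if __name__ == "__main__":
    for alpha in (0.3, 0.5, 0.8):
        print(alpha, decay_exponent(alpha), alpha + 0.5)
```

At b = 0 the function is exactly homogeneous on the support of the wavelet, so the observed slope matches $\alpha+\tfrac12$ up to rounding; at other values of b the theorem only guarantees an upper bound.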
Next, we turn to the following characterization of local Hölder regularity.
Theorem 5.1.7. Suppose
\[
\int|\psi(x)|(1+|x|)\,dx<\infty\quad\text{and}\quad\hat\psi(0)=0.
\]
If f is bounded and Hölder continuous of order α at $x_0$, then
\[
|\langle f,\psi^{a,x_0+b}\rangle|\lesssim|a|^{\frac12}(|a|^\alpha+|b|^\alpha).
\]
Conversely, suppose ψ is compactly supported and $f\in L^2$ is bounded and
continuous. If there exists γ > 0 such that
\[
|\langle f,\psi^{a,b}\rangle|\lesssim|a|^{\gamma+\frac12}\quad\text{and}\quad|\langle f,\psi^{a,b+x_0}\rangle|\lesssim|a|^{\frac12}\Bigl(|a|^\alpha+\frac{|b|^\alpha}{|\log|b||}\Bigr)\tag{5.2}
\]
(uniformly in a, b), then f is Hölder continuous of order α at $x_0$.
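Proof. By applying a translation, we may assume $x_0=0$.
Again, the first part follows from a computation: using $\int\psi=0$,
\begin{align*}
|\langle f,\psi^{a,b}\rangle| &\le \int|f(x)-f(0)|\,|a|^{-\frac12}|\psi(\tfrac{x-b}{a})|\,dx\\
&\lesssim \int|x|^\alpha|a|^{-\frac12}|\psi(\tfrac{x-b}{a})|\,dx\\
&\lesssim |a|^{\alpha+\frac12}\int|\psi(y)|\,|y+\tfrac ba|^\alpha\,dy\lesssim|a|^{\frac12+\alpha}+|a|^{\frac12}|b|^\alpha,
\end{align*}
as desired.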
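We turn to the converse statement. In fact, the argument proceeds
similarly to the proof of Theorem 5.1.6. We actually only need to establish a
suitable bound on $|f_S(h)-f_S(0)|$ for $|h|\le1$. This time we split into four
pieces:
\begin{align}
&\iint_{|a|\le|h|^{\alpha/\gamma}}|a|^\gamma\,|\psi_2(\tfrac{h-b}{a})|\,a^{-2}\,da\,db,\tag{5.3}\\
&\iint_{|h|^{\alpha/\gamma}\le|a|\le|h|}\Bigl(|a|^\alpha+\frac{|b|^\alpha}{|\log|b||}\Bigr)|\psi_2(\tfrac{h-b}{a})|\,a^{-2}\,da\,db,\tag{5.4}\\
&\iint_{|a|\le|h|}\Bigl(|a|^\alpha+\frac{|b|^\alpha}{|\log|b||}\Bigr)|\psi_2(-\tfrac ba)|\,a^{-2}\,da\,db,\tag{5.5}\\
&\iint_{|h|\le|a|\le1}\Bigl(|a|^\alpha+\frac{|b|^\alpha}{|\log|b||}\Bigr)\bigl|\psi_2(\tfrac{h-b}{a})-\psi_2(-\tfrac ba)\bigr|\,a^{-2}\,da\,db.\tag{5.6}
\end{align}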
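We first estimate
\[
(5.3)\lesssim\int_{|a|\le|h|^{\alpha/\gamma}}|a|^{-1+\gamma}\|\psi_2\|_{L^1}\,da\lesssim|h|^\alpha,
\]
which is acceptable.
Next, supposing the support of $\psi_2$ is contained in $[-R,R]$, we have
\begin{align*}
(5.4) &\lesssim_{\|\psi_2\|_{L^1}} \int_{|a|\le|h|}|a|^{-1+\alpha}\,da+\int_{|h|^{\alpha/\gamma}\le|a|\le|h|}\frac{(|a|R+|h|)^\alpha}{|\log(|a|R+|h|)|}\,|a|^{-1}\,da\\
&\lesssim |h|^\alpha\Bigl[1+\frac{1}{|\log|h||}\int_{|h|^{\alpha/\gamma}\le|a|\le|h|}|a|^{-1}\,da\Bigr]\lesssim|h|^\alpha.
\end{align*}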
Similarly,
\[
(5.5)\lesssim\int_{|a|\le|h|}|a|^{-1+\alpha}\,da+\int_{|a|\le|h|}\frac{(|a|R)^\alpha}{|\log(|a|R)|}\,|a|^{-1}\,da\lesssim|h|^\alpha.
\]
[To solve the latter integral, make the substitution u = log(|a|).]

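Finally, using the Lipschitz bound for $\psi_2$,
\begin{align*}
(5.6) &\lesssim |h|\int_{|h|\le|a|\le1}|a|^{-3}\Bigl[|a|^\alpha+\frac{(|a|R+|h|)^\alpha}{|\log(|a|R+|h|)|}\Bigr](|a|R+|h|)\,da\\
&\lesssim |h|\int_{|h|\le|a|\le1}\Bigl[|a|^{-3+\alpha}(|a|R+|h|)+|a|^{-3}\frac{(|a|R+|h|)^{1+\alpha}}{|\log(|a|R)|}\Bigr]\,da\\
&\lesssim |h|^\alpha.
\end{align*}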
5.2 Discrete wavelet transforms

We turn to the discrete wavelet transform. We let ψ be an admissible
function. We will restrict to positive scales a > 0, so that the admissibility
condition becomes
\[
C_\psi = \int_0^\infty|\xi|^{-1}|\hat\psi(\xi)|^2\,d\xi = \int_{-\infty}^0|\xi|^{-1}|\hat\psi(\xi)|^2\,d\xi<\infty.
\]
For a fixed dilation step $a_0>1$, we will restrict to the discrete set of
scales $a=a_0^m$ where $m\in\mathbb{Z}$. We next fix a translation parameter $b_0>0$ and
define the rescaled and translated wavelets
\[
\psi_{m,n}(x) = a_0^{-\frac m2}\psi(a_0^{-m}x-nb_0) = a_0^{-\frac m2}\psi\Bigl(\frac{x-nb_0a_0^m}{a_0^m}\Bigr),
\]
where $m,n\in\mathbb{Z}$.
The basic questions we are interested in are whether functions are uniquely
determined by their wavelet coefficients hf, ψm,n i, as well as how functions
may be reconstructed using wavelets and wavelet coefficients.
We will need the notion of a frame.
Frames and reconstruction.
Definition 5.2.1. A family of functions $\{\varphi_j\}_{j\in J}$ in a Hilbert space H is
called a frame if there exist $0<A\le B<\infty$ such that for all $f\in H$,
\[
A\|f\|^2\le\sum_{j\in J}|\langle f,\varphi_j\rangle|^2\le B\|f\|^2.
\]
We call A, B the frame constants. If A = B then we call $\{\varphi_j\}$ a tight
frame.
If $\{\varphi_j\}$ is a tight frame, then for any $f\in H$ we have
\[
\|f\|^2 = \tfrac1A\sum_j|\langle f,\varphi_j\rangle|^2,
\]
which implies (via the polarization identity) that
\[
f = A^{-1}\sum_j\langle f,\varphi_j\rangle\varphi_j\tag{5.7}
\]
in the weak sense.

Frames (even tight frames) need not be orthonormal bases.
Example 5.2.1. In $H=\mathbb{C}^2$, $\{\varphi_j\}_{j=1}^3$ defined by
\[
\varphi_1=(0,1),\quad\varphi_2=(-\tfrac{\sqrt3}{2},-\tfrac12),\quad\varphi_3=(\tfrac{\sqrt3}{2},-\tfrac12)
\]
form a tight frame with $A=\tfrac32$.
In this case, $A=\tfrac32$ gives the redundancy ratio (i.e. we are using three
vectors in a two-dimensional space). If A = 1 then a tight frame is an
orthonormal basis.
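The tight frame property in Example 5.2.1 is easy to verify by direct computation. The following is a small sketch; the helper names `inner`, `frame_energy` and `tight_reconstruct` are illustrative choices, and the inner product is taken to be linear in the first argument. It checks that $\sum_j|\langle f,\varphi_j\rangle|^2=\tfrac32\|f\|^2$ and that (5.7) reproduces f for one sample vector.

```python
import math

# The three vectors of Example 5.2.1.
PHI = [(0.0, 1.0),
       (-math.sqrt(3) / 2, -0.5),
       (math.sqrt(3) / 2, -0.5)]


def inner(u, v):
    """Inner product on C^2, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def frame_energy(f, frame=PHI):
    """sum_j |<f, phi_j>|^2."""
    return sum(abs(inner(f, phi)) ** 2 for phi in frame)


def tight_reconstruct(f, frame=PHI, A=1.5):
    """A^{-1} sum_j <f, phi_j> phi_j, as in (5.7)."""
    coeffs = [inner(f, phi) for phi in frame]
    return tuple(sum(c * phi[i] for c, phi in zip(coeffs, frame)) / A
                 for i in range(2))


if __name__ == "__main__":
    f = (1 + 2j, -3 + 0.5j)
    norm_sq = sum(abs(x) ** 2 for x in f)
    print(frame_energy(f) / norm_sq)   # ratio should be close to 3/2
    print(tight_reconstruct(f))        # should be close to f
```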
Proposition 5.2.2. If {ϕj } is a tight frame with bound A = 1 and kϕj k ≡ 1,
then {ϕj } is an orthonormal basis.
Proof. By definition, if hf, ϕj i ≡ 0 then f = 0. Thus, it suffices to verify
orthonormality. To this end, we observe
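\[
\|\varphi_j\|^2 = \sum_k|\langle\varphi_j,\varphi_k\rangle|^2 = \|\varphi_j\|^4+\sum_{k\ne j}|\langle\varphi_j,\varphi_k\rangle|^2.
\]
As $\|\varphi_j\|=1$, this implies $\langle\varphi_j,\varphi_k\rangle=0$ for any $j\ne k$.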
We turn to the question of reconstruction. In the case of a tight frame,

one has (5.7). To deal with more general frames, we first introduce the
notion of the frame operator.
Definition 5.2.3. If $\{\varphi_j\}_{j\in J}$ is a frame in H, the frame operator F is
the linear operator $F:H\to\ell^2(J)$ defined by $(Ff)_j=\langle f,\varphi_j\rangle$.
The frame operator is bounded, as $\|Ff\|^2\le B\|f\|^2$. The adjoint of F is
computed via
\[
\langle F^*c,f\rangle = \langle c,Ff\rangle = \sum_jc_j\overline{\langle f,\varphi_j\rangle} = \sum_jc_j\langle\varphi_j,f\rangle,
\]
which yields
\[
F^*c = \sum_{j\in J}c_j\varphi_j.
\]
Note that $\|F^*\|=\|F\|\le B^{\frac12}$ and that
\[
\sum_j|\langle f,\varphi_j\rangle|^2 = \|Ff\|^2 = \langle F^*Ff,f\rangle.
\]
Thus, the frame condition may be written
\[
A\,\mathrm{Id}\le F^*F\le B\,\mathrm{Id}.
\]
As a consequence, we find that $F^*F$ is invertible (cf. the lemma below). In
fact, one has $B^{-1}\,\mathrm{Id}\le(F^*F)^{-1}\le A^{-1}\,\mathrm{Id}$.
Lemma 5.2.4. Let S be a positive bounded linear operator on a Hilbert
space H satisfying S ≥ α Id. Then S is invertible, with S −1 bounded by
α−1 .
Proof. Exercise.
Proposition 5.2.5. Suppose {ϕj } is a frame and F is the frame operator.

Define
ϕ̃j = (F ∗ F )−1 ϕj .
Then {ϕ̃j } is a frame with frame constants B −1 and A−1 . The associated
frame operator F̃ satisfies
F̃ = F (F ∗ F )−1 ,
as well as
F̃ ∗ F̃ = (F ∗ F )−1 , F̃ ∗ F = F ∗ F̃ = Id.
Finally, $\tilde FF^*=F\tilde F^*$ is the orthogonal projection in $\ell^2$ onto $R(F)=R(\tilde F)$.
Proof. Exercise.
One calls $\{\tilde\varphi_j\}$ the dual frame of $\{\varphi_j\}$. (The dual frame of $\{\tilde\varphi_j\}$ is simply the
$\{\varphi_j\}$ again.) The conclusions of Proposition 5.2.5 may be succinctly written
as
\[
\sum_j\langle f,\varphi_j\rangle\tilde\varphi_j = f = \sum_j\langle f,\tilde\varphi_j\rangle\varphi_j.
\]
This yields a reconstruction formula for f using $\langle f,\varphi_j\rangle$, while simultaneously
writing f as a linear combination of the $\varphi_j$.
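For a frame that is not tight, the dual frame has to be computed from $(F^*F)^{-1}$. The sketch below works in $\mathbb{R}^2$ with the frame $(1,0),(0,1),(1,1)$, which is an example chosen here for illustration and does not appear in the text; the helper names are likewise hypothetical. In this real, finite-dimensional setting $F^*F=\sum_j\varphi_j\varphi_j^T$ is a 2×2 matrix that can be inverted by hand.

```python
def frame_matrix(frame):
    """F*F = sum_j phi_j phi_j^T as a 2x2 matrix (real case)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in frame:
        for i in range(2):
            for k in range(2):
                s[i][k] += p[i] * p[k]
    return s


def inverse_2x2(s):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def dual_frame(frame):
    """phi~_j = (F*F)^{-1} phi_j."""
    inv = inverse_2x2(frame_matrix(frame))
    return [(inv[0][0] * p[0] + inv[0][1] * p[1],
             inv[1][0] * p[0] + inv[1][1] * p[1]) for p in frame]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def synthesize(coeffs, vectors):
    """sum_j c_j v_j."""
    return (sum(c * v[0] for c, v in zip(coeffs, vectors)),
            sum(c * v[1] for c, v in zip(coeffs, vectors)))


if __name__ == "__main__":
    frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    dual = dual_frame(frame)
    f = (0.7, -2.0)
    # Both expansions should return f.
    print(synthesize([dot(f, p) for p in frame], dual))
    print(synthesize([dot(f, q) for q in dual], frame))
```

In higher dimensions one would not invert $F^*F$ by an explicit formula; the iterative scheme discussed later in this section is one alternative.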
Note that typically the ϕj are not even linearly independent, and thus
there may be many linear combinations of the ϕj that yield f . The particular
linear combination given above has a minimality property.
Proposition 5.2.6. If $f=\sum_jc_j\varphi_j$ but we do not have $c_j\equiv\langle f,\tilde\varphi_j\rangle$, then
\[
\sum_j|c_j|^2>\sum_j|\langle f,\tilde\varphi_j\rangle|^2.
\]
Proof. First note that $f=\sum_jc_j\varphi_j$ is equivalent to $f=F^*c$. Now decompose
$c=\tilde Fg+b$ where $\tilde Fg\in R(\tilde F)=R(F)$ and $b\in[R(F)]^\perp=\mathrm{nul}(F^*)$.
Then
\[
\|c\|^2 = \|\tilde Fg\|^2+\|b\|^2.
\]
Now (since $F^*\tilde F=\mathrm{Id}$)
\[
f = F^*c = F^*\tilde Fg+F^*b\implies f=g.
\]
Then
\[
\sum_j|c_j|^2 = \|c\|^2 = \|\tilde Ff\|^2+\|b\|^2 = \sum_j|\langle f,\tilde\varphi_j\rangle|^2+\|b\|^2,
\]
which implies the result.
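Example 5.2.2. In the example above we had
\[
v = \tfrac23\sum_{j=1}^3\langle v,\varphi_j\rangle\varphi_j.
\]
However, since $\sum_j\varphi_j=0$, we also have
\[
v = \tfrac23\sum_{j=1}^3[\langle v,\varphi_j\rangle+\alpha]\varphi_j\quad\text{for any }\alpha\in\mathbb{C}.
\]
However, the minimal length representation occurs when we choose α = 0.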
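The non-uniqueness and the minimality at α = 0 can both be seen in a few lines. This is a sketch for real vectors v and real α only (the statement above allows complex α); the function names are illustrative. Since $\sum_j\langle v,\varphi_j\rangle=0$, the squared length of the coefficient sequence works out to $\tfrac23\|v\|^2+\tfrac43\alpha^2$.

```python
import math

PHI = [(0.0, 1.0),
       (-math.sqrt(3) / 2, -0.5),
       (math.sqrt(3) / 2, -0.5)]


def coefficients(v, alpha):
    """c_j = (2/3) (<v, phi_j> + alpha) for a real vector v and real alpha."""
    return [(2.0 / 3.0) * (v[0] * p[0] + v[1] * p[1] + alpha) for p in PHI]


def synthesize(c):
    """sum_j c_j phi_j."""
    return (sum(cj * p[0] for cj, p in zip(c, PHI)),
            sum(cj * p[1] for cj, p in zip(c, PHI)))


def energy(c):
    """sum_j |c_j|^2."""
    return sum(cj * cj for cj in c)


if __name__ == "__main__":
    v = (2.0, -1.0)
    for alpha in (0.0, 0.5, -2.0):
        c = coefficients(v, alpha)
        # The synthesized vector is always v; the energy grows with |alpha|.
        print(alpha, synthesize(c), energy(c))
```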

The $\{\tilde\varphi_j\}$ also play a special role in the decomposition for f, in the sense
that if $f=\sum_j\langle f,\varphi_j\rangle u_j$ then
\[
\sum_j|\langle u_j,g\rangle|^2\ge\sum_j|\langle\tilde\varphi_j,g\rangle|^2
\]
for all $g\in H$.
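We return to the question of reconstruction. Given a frame $\{\varphi_j\}$, the
problem boils down to computing the inverse of $F^*F$. Suppose $r=\frac BA-1\ll1$.
Then we may expect $F^*F\sim\tfrac12(A+B)\mathrm{Id}$, and hence
\[
(F^*F)^{-1}\sim\tfrac{2}{A+B}\mathrm{Id},\quad\text{and so}\quad\tilde\varphi_j\sim\tfrac{2}{A+B}\varphi_j.
\]
With this in mind, we set $R=\mathrm{Id}-\tfrac{2}{A+B}F^*F$ and write
\[
f = \tfrac{2}{A+B}\sum_j\langle f,\varphi_j\rangle\varphi_j+Rf.
\]
By construction, we have
\[
-\tfrac{B-A}{B+A}\mathrm{Id}\le R\le\tfrac{B-A}{B+A}\mathrm{Id},\quad\text{and so}\quad\|R\|\le\tfrac{B-A}{B+A}=\tfrac{r}{2+r}.
\]
This shows that simply writing
\[
f\sim\tfrac{2}{A+B}\sum_j\langle f,\varphi_j\rangle\varphi_j
\]
yields an approximate reconstruction of f with error of size $\sim r\|f\|$.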

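Pushing this further, we can describe an algorithm for reconstruction
based on the fact that
\[
F^*F = \tfrac{A+B}{2}(\mathrm{Id}-R),\quad\text{so that}\quad(F^*F)^{-1} = \tfrac{2}{A+B}(\mathrm{Id}-R)^{-1}.
\]
As $\|R\|\le\frac{B-A}{B+A}<1$, we can write
\[
(\mathrm{Id}-R)^{-1} = \sum_{k=0}^\infty R^k.
\]
Then
\[
\tilde\varphi_j = \tfrac{2}{A+B}\sum_{k=0}^\infty R^k\varphi_j.
\]
The approximation above corresponds to keeping only the k = 0 term. More
generally, we can define
\[
\tilde\varphi_j^N = \tfrac{2}{A+B}\sum_{k=0}^NR^k\varphi_j = \tilde\varphi_j-\tfrac{2}{A+B}\sum_{k=N+1}^\infty R^k\varphi_j = [\mathrm{Id}-R^{N+1}]\tilde\varphi_j.
\]
It follows that
\begin{align*}
\Bigl\|f-\sum_j\langle f,\varphi_j\rangle\tilde\varphi_j^N\Bigr\| &= \sup_{\|g\|=1}\Bigl|\Bigl\langle f-\sum_j\langle f,\varphi_j\rangle\tilde\varphi_j^N,g\Bigr\rangle\Bigr|\\
&= \sup_g\Bigl|\sum_j\langle f,\varphi_j\rangle\langle\tilde\varphi_j-\tilde\varphi_j^N,g\rangle\Bigr|\\
&= \sup_g\Bigl|\sum_j\langle f,\varphi_j\rangle\langle R^{N+1}\tilde\varphi_j,g\rangle\Bigr|\\
&\le \|R\|^{N+1}\|f\|\le\bigl(\tfrac{r}{2+r}\bigr)^{N+1}\|f\|.
\end{align*}
As for actually computing the $\tilde\varphi_j^N$, one can use an iterative algorithm
(that actually is relatively practical to implement) and write
\[
\tilde\varphi_j^N = \tfrac{2}{A+B}\varphi_j+R\tilde\varphi_j^{N-1}.
\]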
See [7] and the references therein for more details.
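The iteration can be applied directly to the frame coefficients rather than to each $\tilde\varphi_j^N$. The following sketch again uses the illustrative frame $(1,0),(0,1),(1,1)$ in $\mathbb{R}^2$ (not from the text), for which $F^*F$ has eigenvalues 1 and 3, so one may take A = 1 and B = 3. It computes $f_N=\sum_j\langle f,\varphi_j\rangle\tilde\varphi_j^N$ through $f_N=f_0+Rf_{N-1}$ with $f_0=\tfrac{2}{A+B}F^*Ff$, and compares the error with $\|R\|^{N+1}\|f\|$.

```python
import math

FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
A, B = 1.0, 3.0          # extreme eigenvalues of F*F = [[2, 1], [1, 2]]
K = 2.0 / (A + B)        # relaxation factor 2/(A+B)


def analysis(f):
    """F f = (<f, phi_j>)_j."""
    return [f[0] * p[0] + f[1] * p[1] for p in FRAME]


def synthesis(c):
    """F* c = sum_j c_j phi_j."""
    return (sum(cj * p[0] for cj, p in zip(c, FRAME)),
            sum(cj * p[1] for cj, p in zip(c, FRAME)))


def apply_R(u):
    """R u = u - (2/(A+B)) F*F u."""
    s = synthesis(analysis(u))
    return (u[0] - K * s[0], u[1] - K * s[1])


def frame_algorithm(coeffs, N):
    """Approximate f from its frame coefficients using N iterations."""
    g0 = tuple(K * x for x in synthesis(coeffs))
    f = g0
    for _ in range(N):
        r = apply_R(f)
        f = (g0[0] + r[0], g0[1] + r[1])
    return f


if __name__ == "__main__":
    f = (3.0, -1.0)
    rate = (B - A) / (B + A)
    for N in range(6):
        approx = frame_algorithm(analysis(f), N)
        err = math.hypot(f[0] - approx[0], f[1] - approx[1])
        print(N, err, rate ** (N + 1) * math.hypot(*f))
```

Only the analysis and synthesis maps are needed, never an explicit inverse, which is presumably why the text calls the scheme practical.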

Frames of wavelets.
We return to the setting of the discretized wavelet transform, with
\[
\psi_{m,n}(x) = a_0^{-m/2}\psi(a_0^{-m}x-nb_0).
\]
We will next discuss necessary and sufficient conditions for this family to
form a frame in L2 (R). The following appear as Theorem 3.3.1 and Propo-
sition 3.3.2 in [7].
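Theorem 5.2.7. Suppose $\psi_{m,n}$ form a frame for $L^2$ with frame bounds
A, B. Then
\begin{align*}
\tfrac{b_0\log a_0}{2\pi}A &\le \int_0^\infty\xi^{-1}|\hat\psi(\xi)|^2\,d\xi\le\tfrac{b_0\log a_0}{2\pi}B,\\
\tfrac{b_0\log a_0}{2\pi}A &\le \int_{-\infty}^0|\xi|^{-1}|\hat\psi(\xi)|^2\,d\xi\le\tfrac{b_0\log a_0}{2\pi}B.
\end{align*}
Conversely, suppose the following hold:
\[
\inf_{1\le|\xi|\le a_0}\sum_{m\in\mathbb{Z}}|\hat\psi(a_0^m\xi)|^2>0,\qquad\sup_{1\le|\xi|\le a_0}\sum_{m\in\mathbb{Z}}|\hat\psi(a_0^m\xi)|^2<\infty,
\]
and
\[
\beta(s) := \sup_\xi\sum_m|\hat\psi(a_0^m\xi)|\,|\hat\psi(a_0^m\xi+s)|
\]
decays at least as fast as $(1+|s|)^{-(1+\varepsilon)}$ for some ε > 0. Then there exists
$b_0^*>0$ such that the $\psi_{m,n}$ form a frame for any $b_0<b_0^*$, with the following
frame bounds:
\begin{align*}
A &= \tfrac{2\pi}{b_0}\Bigl\{\inf_{1\le|\xi|\le a_0}\sum_m|\hat\psi(a_0^m\xi)|^2-\sum_{k\ne0}\bigl[\beta(\tfrac{2\pi}{b_0}k)\beta(-\tfrac{2\pi}{b_0}k)\bigr]^{\frac12}\Bigr\},\\
B &= \tfrac{2\pi}{b_0}\Bigl\{\sup_{1\le|\xi|\le a_0}\sum_m|\hat\psi(a_0^m\xi)|^2+\sum_{k\ne0}\bigl[\beta(\tfrac{2\pi}{b_0}k)\beta(-\tfrac{2\pi}{b_0}k)\bigr]^{\frac12}\Bigr\}.
\end{align*}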
Instead of going through the technical details of the proof of Theorem 5.2.7,
we will focus on discussing a few examples below. At this point,
we only remark that the conditions on ψ are satisfied, if (for example)
\[
|\hat\psi(\xi)|\lesssim|\xi|^\alpha(1+|\xi|)^{-\gamma},\quad\text{with }\alpha>0\text{ and }\gamma>\alpha+1.
\]
Roughly speaking, if ψ is an admissible mother wavelet (in the sense of

the continuous wavelet transform), then we can expect that the discretized
ψm,n will form a frame for (a0 , b0 ) close to (1, 0). In fact, for some examples
we can use (a0 , b0 ) rather far from this special value.
It is convenient to consider the value $a_0=2$. If we hope to have an
(almost) tight frame, then the bounds
\[
A\le\tfrac{2\pi}{b_0}\sum_m|\hat\psi(a_0^m\xi)|^2\le B
\]
imply that $\sum_m|\hat\psi(2^m\xi)|^2$ should be roughly constant (for $\xi\ne0$). This is a
rather strong condition. One way to remedy this is to use multiple wavelets
$\psi^1,\dots,\psi^N$, and to consider the frame obtained from $\{\psi^\nu_{m,n}\}$ (setting $a_0=2$
for each).
The analogue of Theorem 5.2.7 for the windowed Fourier transform is
given by the following. In this case, we consider the functions
gm,n (x) = eimω0 x g(x − nt0 )
for m, n ∈ Z.
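Theorem 5.2.8. If $g_{m,n}$ form a frame for $L^2$ with frame bounds A, B, then
\[
A\le\tfrac{2\pi}{\omega_0t_0}\|g\|_{L^2}^2\le B.
\]
Conversely, suppose
\[
\inf_{0\le x\le t_0}\sum_{n\in\mathbb{Z}}|g(x-nt_0)|^2>0,\qquad\sup_{0\le x\le t_0}\sum_{n\in\mathbb{Z}}|g(x-nt_0)|^2<\infty,
\]
and
\[
\beta(s) := \sup_{0\le x\le t_0}\sum_n|g(x-nt_0)|\,|g(x-nt_0+s)|
\]
decays at least as fast as $(1+|s|)^{-(1+\varepsilon)}$ for some ε > 0. Then there exists
$\omega_0^*>0$ so that the $g_{m,n}$ form a frame whenever $\omega_0<\omega_0^*$, with the following
frame bounds:
\begin{align*}
A &= \tfrac{2\pi}{\omega_0}\Bigl\{\inf_x\sum_n|g(x-nt_0)|^2-\sum_{k\ne0}\bigl[\beta(\tfrac{2\pi}{\omega_0}k)\beta(-\tfrac{2\pi}{\omega_0}k)\bigr]^{\frac12}\Bigr\},\\
B &= \tfrac{2\pi}{\omega_0}\Bigl\{\sup_x\sum_n|g(x-nt_0)|^2+\sum_{k\ne0}\bigl[\beta(\tfrac{2\pi}{\omega_0}k)\beta(-\tfrac{2\pi}{\omega_0}k)\bigr]^{\frac12}\Bigr\}.
\end{align*}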
Example 5.2.3 (The Mexican hat function). The Mexican hat function ψ is
the second derivative of the Gaussian $e^{-x^2/2}$. Normalizing the $L^2$ norm and
imposing ψ(0) > 0, one finds
\[
\psi(x) = \tfrac{2}{\sqrt3}\pi^{-\frac14}(1-x^2)e^{-x^2/2}.
\]
If one uses at least two voices, this yields an essentially tight frame for
$b_0\le.75$. This wavelet is often used in computer vision applications.
Example 5.2.4. For the windowed Fourier transform, the Gaussian
$g(x)=\pi^{-\frac14}e^{-x^2/2}$ is commonly used. It turns out that the ratio $\omega_0t_0/2\pi$ is relevant.
In particular, the $g_{m,n}$ form a frame whenever $\omega_0t_0<2\pi$. This is related
to the notion of time-frequency density, which is discussed in detail in [7,
Chapter 4].
Time-frequency localization in frames.

We have the following result regarding time-frequency localization.
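Theorem 5.2.9. Suppose $\psi_{m,n}(x)=a_0^{-m/2}\psi(a_0^{-m}x-nb_0)$ forms a frame
with bounds A, B. Suppose that
\[
|\psi(x)|\lesssim\langle x\rangle^{-\alpha}\quad\text{and}\quad|\hat\psi(\xi)|\lesssim|\xi|^\beta\langle\xi\rangle^{-(\beta+\gamma)}
\]
for some α > 1, β > 0, and γ > 0. Then for any ε > 0, $0<\Omega_0<\Omega_1$, and
T > 0, there exists a finite set $\mathcal{B}$ such that for any $f\in L^2$,
\[
\Bigl\|f-\sum_{(m,n)\in\mathcal{B}}\langle f,\psi_{m,n}\rangle\widetilde{\psi_{m,n}}\Bigr\|_{L^2}\le\sqrt{\tfrac BA}\Bigl[\varepsilon\|f\|_{L^2}+\|f\|_{L^2(|x|>T)}+\|\hat f\|_{L^2(|\xi|\le\Omega_0)}+\|\hat f\|_{L^2(|\xi|>\Omega_1)}\Bigr].
\]
The set $\mathcal{B}$ is defined by
\[
\mathcal{B} = \{(m,n): m_0\le m\le m_1,\ |nb_0|\le a_0^{-m}T+t\},\tag{5.8}
\]
where $m_0,m_1,t$ are chosen depending on $\Omega_0,\Omega_1,T,\varepsilon$.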

The analogous result for the windowed Fourier transform is the following.
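Theorem 5.2.10. Suppose $g_{m,n}(x)=e^{im\omega_0x}g(x-nt_0)$ form a frame with
bounds A, B. Suppose that
\[
|g(x)|\lesssim\langle x\rangle^{-\alpha}\quad\text{and}\quad|\hat g(\xi)|\lesssim\langle\xi\rangle^{-\alpha}
\]
with α > 1. Then for any ε > 0, there exist $t_\varepsilon,\omega_\varepsilon$ such that for any $f\in L^2$
and T, Ω > 0, we have
\[
\Bigl\|f-\sum_{(m,n)\in\mathcal{B}}\langle f,g_{m,n}\rangle\widetilde{g_{m,n}}\Bigr\|_{L^2}\le\sqrt{\tfrac BA}\Bigl[\varepsilon\|f\|_{L^2}+\|f\|_{L^2(|x|>T)}+\|\hat f\|_{L^2(|\xi|>\Omega)}\Bigr],
\]
where $\mathcal{B}=\{(m,n):|m\omega_0|\le\Omega+\omega_\varepsilon,\ |nt_0|\le T+t_\varepsilon\}$.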
We will only sketch a proof of Theorem 5.2.9. Similar ideas suffice to

establish Theorem 5.2.10.
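Sketch of the proof of Theorem 5.2.9. We define $\mathcal{B}$ as in (5.8). We will estimate
the norm by duality. Fix $h\in L^2$ with $\|h\|_{L^2}=1$. Then
\[
\langle f,h\rangle-\sum_{\mathcal{B}}\langle f,\psi_{m,n}\rangle\langle\widetilde{\psi_{m,n}},h\rangle = \sum_{\mathcal{B}^c}\langle f,\psi_{m,n}\rangle\langle\widetilde{\psi_{m,n}},h\rangle.
\]
This can be controlled by two sums. First, we have
\[
\sum_{\mathcal{B}_1}\bigl[|\langle P_\Omega f,\psi_{m,n}\rangle|+|\langle(1-P_\Omega)f,\psi_{m,n}\rangle|\bigr]\,|\langle\widetilde{\psi_{m,n}},h\rangle|,
\]
where
\[
\mathcal{B}_1 = \{(m,n): m<m_0\text{ or }m>m_1\}.
\]
Here we write $P_\Omega$ to be the Fourier multiplier operator cutting off the frequency
support to $\Omega_0\le|\xi|\le\Omega_1$. Next, we have
\[
\sum_{\mathcal{B}_2}\bigl[|\langle Q_Tf,\psi_{m,n}\rangle|+|\langle(1-Q_T)f,\psi_{m,n}\rangle|\bigr]\,|\langle\widetilde{\psi_{m,n}},h\rangle|,
\]
where
\[
\mathcal{B}_2 = \{(m,n): m_0\le m\le m_1\text{ and }|nb_0|>a_0^{-m}T+t\}.
\]
Using the fact that the $\widetilde{\psi_{m,n}}$ are a frame with frame bounds $B^{-1},A^{-1}$,
we can estimate
\begin{align*}
\sum_{\mathcal{B}_1}|\langle(1-P_\Omega)f,\psi_{m,n}\rangle|\,|\langle\widetilde{\psi_{m,n}},h\rangle| &\le \Bigl(\sum_{m,n}|\langle(1-P_\Omega)f,\psi_{m,n}\rangle|^2\Bigr)^{\frac12}\Bigl(\sum_{m,n}|\langle\widetilde{\psi_{m,n}},h\rangle|^2\Bigr)^{\frac12}\\
&\le B^{\frac12}\|(1-P_\Omega)f\|_{L^2}A^{-\frac12}\|h\|_{L^2}\\
&\le \sqrt{\tfrac BA}\bigl[\|\hat f\|_{L^2(|\xi|\le\Omega_0)}+\|\hat f\|_{L^2(|\xi|>\Omega_1)}\bigr].
\end{align*}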
Similarly, the contribution of $(1-Q_T)f$ to the $\mathcal{B}_2$ sum is controlled by
the $\|f\|_{L^2(|x|>T)}$ term. It remains to show that the remaining terms can be
controlled by $\varepsilon\sqrt{\tfrac BA}\|f\|_{L^2}$. We deal only with the $P_\Omega$ term, leaving the $Q_T$
term as an exercise.
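Applying Cauchy–Schwarz and writing $g=P_\Omega f$, we are led to estimate
\begin{align*}
\sum_{\mathcal{B}_1}|\langle g,\psi_{m,n}\rangle|^2 &= \sum_{\mathcal{B}_1}\Bigl|\int\hat g(\xi)\,a_0^{\frac m2}\,\overline{\hat\psi(a_0^m\xi)}\,e^{ib_0a_0^mn\xi}\,d\xi\Bigr|^2\\
&= \sum_{\mathcal{B}_1}a_0^m\Bigl|\int_0^{2\pi b_0^{-1}a_0^{-m}}e^{ib_0a_0^mn\xi}\sum_{\ell\in\mathbb{Z}}\hat g(\xi+2\pi\ell a_0^{-m}b_0^{-1})\,\overline{\hat\psi(a_0^m\xi+2\pi\ell b_0^{-1})}\,d\xi\Bigr|^2\\
&= \tfrac{2\pi}{b_0}\sum_{m\in\mathcal{B}_1'}\int_0^{2\pi b_0^{-1}a_0^{-m}}\Bigl|\sum_\ell\hat g(\xi+2\pi\ell a_0^{-m}b_0^{-1})\,\overline{\hat\psi(a_0^m\xi+2\pi\ell b_0^{-1})}\Bigr|^2\,d\xi,
\end{align*}
where we use Plancherel (on the torus) in the final step and
\[
\mathcal{B}_1' = \{m: m<m_0\text{ or }m>m_1\}.
\]
Expanding this out, this becomes
\[
\tfrac{2\pi}{b_0}\sum_{\ell\in\mathbb{Z},\,m\in\mathcal{B}_1'}\int_{S_-}|\hat f(\xi)|\,|\hat f(\xi-2\pi\ell a_0^{-m}b_0^{-1})|\,|\hat\psi(a_0^m\xi)|\,|\hat\psi(a_0^m\xi-2\pi\ell b_0^{-1})|\,d\xi,
\]
where
\[
S_\pm = \{\Omega_0\le|\xi|,\ |\xi\pm2\pi\ell a_0^{-m}b_0^{-1}|\le\Omega_1\}.
\]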
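We now apply Cauchy–Schwarz and a change of variables. We are led to
estimate
\[
b_0^{-1}\sum_\ell\prod_{\sigma\in\pm}\Bigl(\int_{S_\sigma}|\hat f(\xi)|^2F_\sigma(\xi)\,d\xi\Bigr)^{\frac12},
\]
where
\[
F_\pm = \sum_{m\in\mathcal{B}_1'}|\hat\psi(a_0^m\xi)|^{2-\lambda}\,|\hat\psi(a_0^m\xi\pm2\pi b_0^{-1}\ell)|^\lambda
\]
for some 0 < λ < 1 to be chosen below. Now, by assumption,
\[
|\hat\psi(a_0^m\xi)|\,|\hat\psi(a_0^m\xi\pm2\pi b_0^{-1}\ell)|\lesssim\langle a_0^m\xi\rangle^{-\gamma}\langle a_0^m\xi\pm2\pi b_0^{-1}\ell\rangle^{-\gamma}\lesssim\langle\ell\rangle^{-\gamma}.
\]
Continuing from above,
\[
\sum_{\mathcal{B}_1}|\langle P_\Omega f,\psi_{m,n}\rangle|^2\lesssim b_0^{-1}\|P_\Omega f\|_{L^2}^2\sum_\ell\langle\ell\rangle^{-\gamma\lambda}\sup_{\Omega_0\le|\xi|\le\Omega_1}\sum_{m\in\mathcal{B}_1'}|\hat\psi(a_0^m\xi)|^{2(1-\lambda)}.
\]
The sum in ℓ converges provided γλ > 1, while
\begin{align*}
\sup_{\Omega_0\le|\xi|\le\Omega_1}\sum_{m\in\mathcal{B}_1'}|\hat\psi(a_0^m\xi)|^{2(1-\lambda)} &\lesssim \sum_{m>m_1}\langle a_0^m\Omega_0\rangle^{-2\gamma(1-\lambda)}+\sum_{m<m_0}(a_0^m\Omega_1)^{2\beta(1-\lambda)}\\
&\lesssim (\Omega_0a_0^{m_1})^{-2\gamma(1-\lambda)}+(\Omega_1a_0^{m_0})^{2\beta(1-\lambda)}.
\end{align*}
Choosing $\lambda=\tfrac12(1+\gamma^{-1})$ (say) and then choosing $m_0$ sufficiently negative
and $m_1$ sufficiently positive, we can arrange
\[
\sum_{\mathcal{B}_1}|\langle P_\Omega f,\psi_{m,n}\rangle|^2\le B\varepsilon^2\|f\|_{L^2}^2,
\]
as desired. The $Q_Tf$ term is left as an exercise. This completes the proof.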
Redundancy in frames
We close this section by making a brief comment about redundancy in
frames. For frames that are tight or close to tight, this can be measured by
the size of $\tfrac12(A+B)$.
Suppose $\{\varphi_j\}_{j\in J}$ is a frame. If additionally $\{\varphi_j\}$ is an orthonormal basis
then the map $f\mapsto\langle f,\varphi_j\rangle$ is a unitary map from H onto $\ell^2(J)$. If the frame
is redundant, then the range is a strict subset of $\ell^2(J)$.
Recall that the reconstruction formula
\[
f = \sum_{j\in J}\langle f,\varphi_j\rangle\tilde\varphi_j
\]
involves a projection onto the range of this map, written $f=\tilde F^*Ff$.

Now, if the coefficients hf, ϕj i were modified by adding some αj (e.g. be-
cause of a roundoff error in a numerical computation) then the reconstructed
function would be given by
fapp = F̃ ∗ (F f + α).
However, since F̃ ∗ includes a projection onto the range of F , the part of the
sequence α that is orthogonal to the range of F does not contribute. Thus,
we expect
\[
\|f-f_{\mathrm{app}}\| = \|\tilde F^*\alpha\|
\]
to be smaller than kαk. This effect should become even more pronounced
if the frame is more redundant, since in this case the range of F becomes
even ‘smaller’.
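Example 5.2.5. Let $u_1=(1,0)$ and $u_2=(0,1)$, giving an orthonormal basis
for $\mathbb{C}^2$. Let
\[
e_1=u_2,\quad e_2=-\tfrac{\sqrt3}{2}u_1-\tfrac12u_2,\quad e_3=\tfrac{\sqrt3}{2}u_1-\tfrac12u_2.
\]
Then $e_1,e_2,e_3$ is a tight frame with frame bound $\tfrac32$.
Suppose we add $\alpha_j\varepsilon$ to the coefficients $\langle f,u_j\rangle$ or $\langle f,e_j\rangle$, where $\alpha_j$ are
independent normal random variables. Then one can compute
\[
\mathbb{E}\Bigl\|f-\sum_j[\langle f,u_j\rangle+\alpha_j\varepsilon]u_j\Bigr\|^2 = 2\varepsilon^2,
\]
while
\[
\mathbb{E}\Bigl\|f-\tfrac23\sum_j[\langle f,e_j\rangle+\alpha_j\varepsilon]e_j\Bigr\|^2 = \tfrac43\varepsilon^2.
\]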
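Both expectations follow from the fact that the cross terms vanish for independent mean-zero noise, so that $\mathbb{E}\|s\sum_j\alpha_j\varepsilon v_j\|^2=s^2\varepsilon^2\sum_j\|v_j\|^2$ when each $\alpha_j$ has variance one. The sketch below assumes real standard normal $\alpha_j$ (the example does not specify the variance) and uses illustrative function names; it evaluates the exact formula and compares it with a seeded Monte Carlo estimate.

```python
import math
import random

ONB = [(1.0, 0.0), (0.0, 1.0)]
S3 = math.sqrt(3) / 2
FRAME = [(0.0, 1.0), (-S3, -0.5), (S3, -0.5)]


def expected_error(vectors, scale, eps):
    """Exact E|| scale * sum_j alpha_j eps v_j ||^2 for independent alpha_j
    with mean 0 and variance 1: scale^2 eps^2 sum_j ||v_j||^2."""
    return (scale * eps) ** 2 * sum(v[0] ** 2 + v[1] ** 2 for v in vectors)


def simulated_error(vectors, scale, eps, trials=20000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ex = ey = 0.0
        for v in vectors:
            a = rng.gauss(0.0, 1.0)
            ex += scale * a * eps * v[0]
            ey += scale * a * eps * v[1]
        total += ex * ex + ey * ey
    return total / trials


if __name__ == "__main__":
    eps = 1.0
    print(expected_error(ONB, 1.0, eps), simulated_error(ONB, 1.0, eps))
    print(expected_error(FRAME, 2 / 3, eps), simulated_error(FRAME, 2 / 3, eps))
```

The ratio of the two exact values is $\tfrac23$, the reciprocal of the redundancy factor $\tfrac32$.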

5.3 Multiresolution analysis

We turn to the notion of a multiresolution analysis.
Definition 5.3.1. A multiresolution analysis is a sequence of closed
subspaces $V_j\subset L^2$ satisfying the following conditions:
(i) $V_j\subset V_{j-1}$ for all $j\in\mathbb{Z}$,
(ii) $\overline{\bigcup_{j\in\mathbb{Z}}V_j}=L^2(\mathbb{R})$,
(iii) $\bigcap_{j\in\mathbb{Z}}V_j=\{0\}$,
(iv) $f\in V_j\iff f(2^j\cdot)\in V_0$,
(v) $f\in V_0\implies f(\cdot-n)\in V_0$ for all $n\in\mathbb{Z}$,
(vi) There exists $\phi\in V_0$ (called the ‘scaling function’) such that
\[
\{\phi_{0,n}: n\in\mathbb{Z}\}
\]
is an orthonormal basis for $V_0$, where
\[
\phi_{j,n}(x) := 2^{-\frac j2}\phi(2^{-j}x-n).
\]
Thus, all the spaces are scaled versions of a central space V0 , which is
invariant under integer translations. The final condition (vi) may be relaxed
considerably, but we begin with this simpler setting.
We will write Pj for the orthogonal projection onto Vj .
Example 5.3.1 (Haar multiresolution analysis). Let
\[
V_j = \{f\in L^2(\mathbb{R}): f|_{[2^jk,2^j(k+1))}=\text{constant for all }k\in\mathbb{Z}\}.
\]
We may take $\phi(x)=1_{[0,1]}(x)$.

We will prove the following theorem.
Theorem 5.3.2. If $\{V_j\}_{j\in\mathbb{Z}}$ forms a multiresolution analysis of $L^2$, then
there exists an orthonormal wavelet basis $\{\psi_{j,k}: j,k\in\mathbb{Z}\}$ such that
\[
P_{j-1} = P_j+\sum_k\langle\cdot,\psi_{j,k}\rangle\psi_{j,k}.
\]
Proof. Let $W_j$ denote the orthogonal complement of $V_j$ in $V_{j-1}$, so that
\[
V_{j-1} = V_j\oplus W_j.
\]
Thus
\[
V_j = V_J\oplus\bigoplus_{k=0}^{J-j-1}W_{J-k}\quad\text{for all }j<J.
\]
By (ii) and (iii), this implies
\[
L^2 = \bigoplus_{j\in\mathbb{Z}}W_j.
\]
Next, note that the $W_j$ also inherit the scaling property (iv):
\[
f\in W_j\iff f(2^j\cdot)\in W_0.
\]
Using the scaling/translation property, the problem therefore reduces to finding
$\psi\in W_0$ such that $\psi(\cdot-k)$ produces an orthonormal basis for $W_0$.
To do this, we will utilize the scaling function φ of (vi). In particular,
since $\phi\in V_0\subset V_{-1}$ and $\phi_{-1,n}$ are an orthonormal basis for $V_{-1}$, we may
write
\[
\phi = \sum_nh_n\phi_{-1,n},\quad\text{with }h_n=\langle\phi,\phi_{-1,n}\rangle\text{ and }\sum_n|h_n|^2=1.
\]
We rewrite this as
\[
\phi(x) = \sqrt2\sum_nh_n\phi(2x-n),\quad\text{so that}\quad\hat\phi(\xi) = \tfrac{1}{\sqrt2}\sum_nh_ne^{-in\xi/2}\hat\phi(\xi/2).
\]
Defining
\[
m_0(\xi) = \tfrac{1}{\sqrt2}\sum_nh_ne^{-in\xi},
\]
this becomes $\hat\phi(\xi)=m_0(\xi/2)\hat\phi(\xi/2)$, with $m_0$ a 2π periodic function.

We will now show that with
\[
\hat\psi(\xi) := e^{i\xi/2}\,\overline{m_0(\tfrac\xi2+\pi)}\,\hat\phi(\tfrac\xi2),\tag{5.9}
\]
the functions $\{\psi(\cdot-k)\}$ form an orthonormal basis for $W_0$ (and so $\{\psi_{j,k}\}$
form a wavelet basis for $L^2$).
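To this end, we let $f\in W_0$, i.e. $f\in V_{-1}$ and $f\perp V_0$. By assumption,
we have
\[
f = \sum_nf_n\phi_{-1,n},\quad f_n=\langle f,\phi_{-1,n}\rangle,
\]
which becomes (as in the above computation)
\[
\hat f(\xi) = m_f(\xi/2)\hat\phi(\xi/2),\quad m_f(\xi) = \tfrac{1}{\sqrt2}\sum_nf_ne^{-in\xi}.
\]
The assumption that $f\perp\phi_{0,k}$ for all k becomes
\[
0 = \int_0^{2\pi}e^{ik\xi}\sum_\ell\hat f(\xi+2\pi\ell)\,\overline{\hat\phi(\xi+2\pi\ell)}\,d\xi,
\]
so that
\[
\sum_\ell\hat f(\xi+2\pi\ell)\,\overline{\hat\phi(\xi+2\pi\ell)}\equiv0.
\]
Inserting the identities above, we get
\[
\sum_\ell m_f(\tfrac\xi2+\pi\ell)\,\overline{m_0(\tfrac\xi2+\pi\ell)}\,|\hat\phi(\tfrac\xi2+\pi\ell)|^2 = 0.
\]
As $m_f$ and $m_0$ are 2π periodic, this becomes
\begin{align*}
0 &= m_f(\tfrac\xi2)\,\overline{m_0(\tfrac\xi2)}\sum_{\ell\text{ even}}|\hat\phi(\tfrac\xi2+\pi\ell)|^2\\
&\quad+m_f(\tfrac\xi2+\pi)\,\overline{m_0(\tfrac\xi2+\pi)}\sum_{\ell\text{ odd}}|\hat\phi(\tfrac\xi2+\pi\ell)|^2.
\end{align*}
We next claim
\[
\sum_{\ell\in\mathbb{Z}}|\hat\phi(\xi+2\pi\ell)|^2\equiv\tfrac{1}{2\pi},\tag{5.10}
\]
which then implies
\[
m_f(\cdot)\,\overline{m_0(\cdot)}+m_f(\cdot+\pi)\,\overline{m_0(\cdot+\pi)}\equiv0.\tag{5.11}
\]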
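To prove (5.10), we rely on orthogonality. In particular,
\begin{align*}
\delta_{k0} &= \int\phi(x)\,\overline{\phi(x-k)}\,dx\\
&= \int e^{ik\xi}|\hat\phi(\xi)|^2\,d\xi = \int_0^{2\pi}e^{ik\xi}\sum_{\ell\in\mathbb{Z}}|\hat\phi(\xi+2\pi\ell)|^2\,d\xi,
\end{align*}
which implies (5.10).
Now, arguing as we did to derive (5.11), we have
\[
\sum_\ell|m_0(\tfrac\xi2+\pi\ell)|^2\,|\hat\phi(\tfrac\xi2+\pi\ell)|^2\equiv\tfrac{1}{2\pi},
\]
which yields
\[
|m_0(\xi)|^2+|m_0(\xi+\pi)|^2\equiv1.
\]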
In particular, $m_0(\cdot)$ and $m_0(\cdot+\pi)$ cannot both equal zero on a set of nonzero
measure, and hence (5.11) implies that there exists a 2π-periodic function
λ so that
\[
m_f(\xi) = \lambda(\xi)\,\overline{m_0(\xi+\pi)}\quad\text{and}\quad\lambda(\xi)+\lambda(\xi+\pi)\equiv0.
\]
We rewrite the final expression as
\[
\lambda(\xi) = e^{i\xi}\nu(2\xi)
\]
for some 2π-periodic ν. Returning to our formula for $\hat f$, we get
\[
\hat f(\xi) = e^{i\xi/2}\,\overline{m_0(\tfrac\xi2+\pi)}\,\nu(\xi)\,\hat\phi(\tfrac\xi2),
\]
which (by periodicity) allows us to write
\[
\hat f(\xi) = \sum_k\nu_ke^{-ik\xi}\hat\psi(\xi),\quad\text{i.e.}\quad f = \sum_k\nu_k\psi(\cdot-k)
\]
for suitable $\nu_k$.
It therefore remains to verify that the $\{\psi(\cdot-k)\}$ belong to $V_{-1}\cap V_0^\perp$
and are orthonormal. This we leave as an exercise!
Remark 5.3.3. The proof above provides an example of a wavelet that one
can use. Such examples are not unique; one could modify $\hat\psi$ by multiplying
by any 2π-periodic function that has magnitude one almost everywhere.
Using this freedom, we will take
\[
\psi = \sum_ng_n\phi_{-1,n},\quad g_n = (-1)^nh_{-n+1}.
\]
Let us mention a few examples.

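Example 5.3.2. In the Haar multiresolution analysis, we take $\phi=1_{[0,1]}$. In
this case
\[
h_n = \sqrt2\int\phi(x)\phi(2x-n)\,dx = \begin{cases}\frac{1}{\sqrt2} & n=0,1\\ 0 & \text{else.}\end{cases}
\]
It follows that
\[
\psi = \tfrac{1}{\sqrt2}\phi_{-1,0}-\tfrac{1}{\sqrt2}\phi_{-1,1} = \begin{cases}1 & 0\le x<\frac12\\ -1 & \frac12\le x<1\\ 0 & \text{else.}\end{cases}
\]
This is the Haar wavelet basis.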
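The identities from the proof of Theorem 5.3.2 can be checked numerically for concrete filters. The sketch below evaluates $m_0$ on a grid and measures how far $|m_0(\xi)|^2+|m_0(\xi+\pi)|^2$ is from 1, and it builds $g_n=(-1)^nh_{-n+1}$ for real filters. The Haar filter is the one from the example above; the four-tap Daubechies filter `D4` is an additional filter that is not discussed in the text and is included only as a second test case. All function names are illustrative.

```python
import cmath
import math

# Haar filter from Example 5.3.2: h_0 = h_1 = 1/sqrt(2).
HAAR = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}

# Four-tap Daubechies filter (assumption: not taken from the text).
_S3 = math.sqrt(3)
D4 = {n: c / (4 * math.sqrt(2))
      for n, c in enumerate((1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3))}


def m0(h, xi):
    """m_0(xi) = 2^{-1/2} sum_n h_n e^{-i n xi}."""
    return sum(c * cmath.exp(-1j * n * xi) for n, c in h.items()) / math.sqrt(2)


def qmf_defect(h, samples=64):
    """max over a grid of | |m0(xi)|^2 + |m0(xi + pi)|^2 - 1 |."""
    worst = 0.0
    for i in range(samples):
        xi = 2 * math.pi * i / samples
        val = abs(m0(h, xi)) ** 2 + abs(m0(h, xi + math.pi)) ** 2
        worst = max(worst, abs(val - 1.0))
    return worst


def wavelet_filter(h):
    """g_n = (-1)^n h_{1-n} for a real filter h, returned as {n: g_n}."""
    return {1 - n: (-1) ** (1 - n) * c for n, c in h.items()}


if __name__ == "__main__":
    print(qmf_defect(HAAR), qmf_defect(D4))   # both should be near zero
    print(wavelet_filter(HAAR))               # g_0 = 1/sqrt(2), g_1 = -1/sqrt(2)
```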

Example 5.3.3 (Meyer basis). Define φ via its Fourier transform:
\[
\hat\phi(\xi) = \tfrac{1}{\sqrt{2\pi}}\cdot\begin{cases}1 & |\xi|\le\frac{2\pi}{3}\\ \cos[\tfrac\pi2\nu(\tfrac{3}{2\pi}|\xi|-1)] & \frac{2\pi}{3}\le|\xi|\le\frac{4\pi}{3},\\ 0 & \text{else,}\end{cases}
\]
where ν satisfies ν(x) = 0 for x ≤ 0, ν(x) = 1 for x ≥ 1, and ν(x) + ν(1 − x) = 1.
We then take $V_0$ to be the closed subspace spanned by the (orthonormal)
set $\{\phi(\cdot-k)\}$. In this case, it turns out that
\[
\hat\psi(\xi) = \sqrt{2\pi}\,e^{i\xi/2}[\hat\phi(\xi+2\pi)+\hat\phi(\xi-2\pi)]\hat\phi(\tfrac\xi2).
\]
See [7] (say) for more details.

In practice, orthonormality of the basis $\phi(\cdot-k)$ can be relaxed to requiring
that
\[
A\sum_k|c_k|^2\le\Bigl\|\sum_kc_k\phi(\cdot-k)\Bigr\|_{L^2}^2\le B\sum_k|c_k|^2
\]
(in which case they are called a Riesz basis). In many examples, one starts
with the scaling function φ and then defines $V_0$ by taking $\{\phi(\cdot-k)\}$ as an
(orthonormal) basis. The spaces $V_j$ are then the closed subspaces spanned
by
\[
\phi_{j,k}(x) = 2^{-j/2}\phi(2^{-j}x-k).
\]
This construction will lead to a multiresolution analysis provided
\[
\phi(x) = \sum_nc_n\phi(2x-n)
\]
for some $\{c_n\}\in\ell^2$, with
\[
0<\alpha\le\sum_\ell|\hat\phi(\xi+2\pi\ell)|^2\le\beta<\infty.
\]
See [7] for more detail.

We close this section by describing the connection between multiresolu-
tion analysis and subband filtering schemes, which play an important role
in applications.
Suppose we begin with some initial approximation to a function f, say
$f^0=P_0f$. We can then write
\[
f^0 = f^1+\delta^1\in V_1\oplus W_1 = V_0,
\]
where $f^1=P_1f^0$ is the next coarser approximation to f in the multiresolution
analysis, and $\delta^1=f^0-f^1=Q_1f^0$ represents the information lost in
the transition from $f^0$ to $f^1$. Here $Q_1$ denotes orthogonal projection onto
$W_1$.
Recalling that $V_j,W_j$ have orthonormal bases $\phi_{j,k},\psi_{j,k}$, we may write
\[
f^j = \sum_nc_n^j\phi_{j,n},\quad\delta^1 = \sum_nd_n^1\psi_{1,n}.
\]
In fact, we can describe these coefficients $c_k^1$ and $d_k^1$. To see this, first recall
that by construction we may write
\[
\psi_{j,k} = \sum_ng_n\phi_{j-1,2k+n} = \sum_ng_{n-2k}\phi_{j-1,n},
\]
and similarly
\[
\phi_{j,k} = \sum_nh_{n-2k}\phi_{j-1,n}.
\]
Using this, we deduce
\[
c_k^1 = \sum_nh_{n-2k}c_n^0\quad\text{and}\quad d_k^1 = \sum_ng_{n-2k}c_n^0.
\]
In compact notation, $c^1=Hc^0$ and $d^1=Gc^0$.
Continuing this, we can write $f^1=f^2+\delta^2$ with $c^2=Hc^1$ and $d^2=Gc^1$.
In practice, one stops this after a finite number of levels. In particular,
we write the information in $c^0$ in the vectors $d^1,\dots,d^J$ and a final coarse
approximation $c^J$.
This process is invertible, and can be summarized via
\[
c_k^j = \sum_nh_{n-2k}c_n^{j-1},\qquad d_k^j = \sum_ng_{n-2k}c_n^{j-1},
\]
\[
c_n^{j-1} = \sum_k\bigl[h_{n-2k}c_k^j+g_{n-2k}d_k^j\bigr].
\]
In engineering applications, these are the analysis and synthesis steps of
a ‘subband filtering scheme’. In the analysis part, the incoming sequence
is convolved with two filters (one ‘low-pass’ and one ‘high-pass’), and the
resulting sequences are subsampled (i.e. only the even or odd entries are
retained). In the synthesis part, the two sequences are upsampled (zeros are
inserted between entries), filtered again, and added.
This type of subband filtering scheme is central to many applications of
wavelets (e.g. the JPEG2000 compression algorithm).
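The analysis and synthesis formulas can be tried out on a short sequence. The sketch below is restricted to the Haar filters $h_0=h_1=\tfrac{1}{\sqrt2}$, $g_0=\tfrac{1}{\sqrt2}$, $g_1=-\tfrac{1}{\sqrt2}$ and to finite sequences whose length is divisible by $2^J$; both restrictions, and the function names, are simplifying assumptions made here so that no boundary handling is needed.

```python
import math

S = 1 / math.sqrt(2)
H = {0: S, 1: S}      # low-pass (scaling) filter h_n
G = {0: S, 1: -S}     # high-pass (wavelet) filter g_n = (-1)^n h_{1-n}


def analyze(c):
    """One analysis step: c^j_k = sum_n h_{n-2k} c^{j-1}_n, same with g for d^j_k."""
    half = len(c) // 2
    coarse = [sum(H[n - 2 * k] * c[n] for n in (2 * k, 2 * k + 1))
              for k in range(half)]
    detail = [sum(G[n - 2 * k] * c[n] for n in (2 * k, 2 * k + 1))
              for k in range(half)]
    return coarse, detail


def synthesize(coarse, detail):
    """One synthesis step: c^{j-1}_n = sum_k [h_{n-2k} c^j_k + g_{n-2k} d^j_k]."""
    out = [0.0] * (2 * len(coarse))
    for k, (ck, dk) in enumerate(zip(coarse, detail)):
        for n in (2 * k, 2 * k + 1):
            out[n] += H[n - 2 * k] * ck + G[n - 2 * k] * dk
    return out


def decompose(c, levels):
    """Return the final coarse sequence c^J and the details d^1, ..., d^J."""
    details = []
    for _ in range(levels):
        c, d = analyze(c)
        details.append(d)
    return c, details


def reconstruct(c, details):
    """Invert decompose by applying the synthesis step from coarse to fine."""
    for d in reversed(details):
        c = synthesize(c, d)
    return c


if __name__ == "__main__":
    c0 = [4.0, 2.0, 5.0, 5.0, 1.0, -1.0, 0.0, 3.0]
    cJ, ds = decompose(c0, 3)
    print(cJ, ds)
    print(reconstruct(cJ, ds))   # should reproduce c0 up to rounding
```

Because the Haar filters have only two taps, the subsampling is visible directly in the index sets `(2 * k, 2 * k + 1)`: each output entry uses a disjoint pair of inputs.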
5.4 Exercises
Exercise 5.4.1. Work out the details in the discussion of Example 5.1.2.
Exercise 5.4.2. Solve the differential equation discussed in Example 5.1.4.
Exercise 5.4.3. Show that if $\{\varphi_j\}$ is a tight frame with bound A then
\[
f = A^{-1}\sum_j\langle f,\varphi_j\rangle\varphi_j.
\]
Exercise 5.4.4. Prove Lemma 5.2.4.

Exercise 5.4.5. Prove Proposition 5.2.5.
Exercise 5.4.6. Show that for t chosen sufficiently large in the proof of Theorem 5.2.9,
we can arrange
\[
\|Q_Tf\|_{L^2}^2\lesssim B\varepsilon\|f\|_{L^2}^2.
\]
Exercise 5.4.7. Show that the functions {ψ(· − k)} defined in Theorem 5.3.2
are orthonormal. [Hint: Use the same trick appearing in the proof of the
theorem. That is, apply Plancherel and split the integral into a sum over
` ∈ Z. Split the sum into even and odd parts and use the properties of m0
and φ̂ that were already established.]
Chapter 6
Classical harmonic analysis, part I
Many problems in harmonic analysis are related to the boundedness properties
of certain operators on various function spaces. For example, in this
section we will study a special example (the Hardy–Littlewood maximal
function) and its variants, along with other classes of operators for which
we can understand boundedness properties.
An extremely useful tool for proving boundedness properties of operators
is that of interpolation, and so we begin there.
6.1 Interpolation
In this section we will consider sublinear operators, i.e. those satisfying
$|T(cf)|=|c|\,|T(f)|$ and $|T(f+g)|\le|T(f)|+|T(g)|$. This includes not only
linear operators, which include many important examples, but also maximal
operators, as well as square function type operators. The latter two may
have the form
\[
Tf = \sup_n|T_nf|\quad\text{or}\quad Tf = \Bigl(\sum_n|T_nf|^2\Bigr)^{\frac12}
\]
for some collection of linear or sublinear operators $T_n$.

We begin by making some notions of boundedness precise.
Definition 6.1.1. Let $1\le p,q\le\infty$ and let T be sublinear. We say T is of
(strong) type (p, q) if we have an estimate of the form
\[
\|Tf\|_{L^q}\le C\|f\|_{L^p}
\]
for some $C=C(T,p,q)$ and all $f\in L^p$. Here we typically take $L^p(\mathbb{R}^d)$
and $L^q(\mathbb{R}^d)$, but one can of course work in $L^p(X)$ and $L^q(Y)$ for more
general measure spaces. Also, we may initially only require that T be defined
pointwise on a dense subclass of $L^p$; the bound above allows for a unique
extension to all of $L^p$.
Similarly, for $q<\infty$ we say T is of weak type (p, q) if
\[
\|Tf\|_{L^{q,\infty}}\le C\|f\|_{L^p}
\]
(cf. Section A.1). When $q=\infty$, we define weak type (p, q) to mean strong
type (p, q).
Note that if T is of strong type (p, q), then T is of weak type (p, q);
indeed, by Tchebychev’s inequality,
\[
\|Tf\|_{L^{q,\infty}}\le\|Tf\|_{L^q}.
\]
As usual, we can define operator norms
\[
\|T\|_{L^p\to L^q} = \inf\{C:\|Tf\|_{L^q}\le C\|f\|_{L^p}\text{ for all }f\in L^p\},
\]
and so on.
In the following, we will always consider operators with the property
that h|T f |, |g|i is finite whenever f, g are taken to be simple functions with
finite measure support.
We will study the problem of interpolating various types of bounds for
operators T . Let us begin with the following application of Hölder’s in-
equality, which is essentially an interpolation-type result for the identity
operator.
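Lemma 6.1.2 (Hölder’s inequality). We have the following estimate:
\[
\|f\|_{L^p}\lesssim\|f\|_{L^{p_0}}^\theta\|f\|_{L^{p_1}}^{1-\theta}
\]
whenever $1\le p_0,p_1\le\infty$ and
\[
\tfrac1p = \tfrac{\theta}{p_0}+\tfrac{1-\theta}{p_1}\quad\text{for some }\theta\in[0,1].
\]
To prove this, write $|f|=|f|^\theta|f|^{1-\theta}$ and apply Hölder’s inequality with
exponents $\tfrac{p_0}{\theta}$ and $\tfrac{p_1}{1-\theta}$.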
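As a sanity check, the estimate can be tested on finite sequences, i.e. for counting measure on a finite set, where the inequality holds with constant 1. The following sketch uses illustrative names (`lp_norm`, `interpolated_exponent`) and an arbitrary sample sequence.

```python
import math


def lp_norm(values, p):
    """l^p norm of a finite sequence (counting measure); p may be math.inf."""
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def interpolated_exponent(p0, p1, theta):
    """The exponent p with 1/p = theta/p0 + (1 - theta)/p1."""
    inv = theta / p0 + (1.0 - theta) / p1
    return math.inf if inv == 0 else 1.0 / inv


def interpolation_gap(values, p0, p1, theta):
    """||f||_{p0}^theta ||f||_{p1}^{1-theta} - ||f||_p; should be >= 0."""
    p = interpolated_exponent(p0, p1, theta)
    bound = lp_norm(values, p0) ** theta * lp_norm(values, p1) ** (1.0 - theta)
    return bound - lp_norm(values, p)


if __name__ == "__main__":
    f = [3.0, -1.0, 0.5, 2.0, -4.0]
    for theta in (0.0, 0.3, 0.5, 1.0):
        print(theta, interpolation_gap(f, 1.0, 4.0, theta))
    print(interpolation_gap(f, 2.0, math.inf, 0.5))
```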
Let us turn to a slightly more general result, in which we begin with
strong type bounds and vary the exponent in either the domain or target
space alone.
Proposition 6.1.3 (Warmup version of interpolation). The following hold:
(i) Suppose T is type $(p,q_0)$ and type $(p,q_1)$ for some $1\le p,q_0,q_1\le\infty$.
Then T is type (p, q) for all q such that
\[
\tfrac1q = \tfrac{\theta}{q_0}+\tfrac{1-\theta}{q_1}\quad\text{for some }\theta\in[0,1].
\]
In fact,
\[
\|T\|_{L^p\to L^q}\lesssim\|T\|_{L^p\to L^{q_0}}^\theta\|T\|_{L^p\to L^{q_1}}^{1-\theta}.
\]
(ii) Suppose T is type $(p_0,q)$ and type $(p_1,q)$ for some $1\le p_0,p_1,q\le\infty$.
Then T is type (p, q) for all p such that
\[
\tfrac1p = \tfrac{\theta}{p_0}+\tfrac{1-\theta}{p_1}\quad\text{for some }\theta\in[0,1].
\]
In fact,
\[
\|T\|_{L^p\to L^q}\lesssim\|T\|_{L^{p_0}\to L^q}^\theta\|T\|_{L^{p_1}\to L^q}^{1-\theta}.
\]
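Proof. For (i), we apply Hölder’s inequality (in the form of the lemma
above). We find
\[
\|Tf\|_{L^q}\lesssim\|Tf\|_{L^{q_0}}^\theta\|Tf\|_{L^{q_1}}^{1-\theta}\lesssim\|T\|_{L^p\to L^{q_0}}^\theta\|T\|_{L^p\to L^{q_1}}^{1-\theta}\|f\|_{L^p},
\]
which yields the result.
We turn to (ii). Without loss of generality, suppose $p_1\le p\le p_0$. We let
λ > 0 and estimate
\begin{align*}
\|Tf\|_{L^q} &\le \|T(f\chi_{\{|f|\le\lambda\}})\|_{L^q}+\|T(f\chi_{\{|f|>\lambda\}})\|_{L^q}\\
&\lesssim \|T\|_{L^{p_0}\to L^q}\|f\chi_{\{|f|\le\lambda\}}\|_{L^{p_0}}+\|T\|_{L^{p_1}\to L^q}\|f\chi_{\{|f|>\lambda\}}\|_{L^{p_1}}\\
&\lesssim \sum_{j=0}^1\lambda^{1-\frac{p}{p_j}}\|T\|_{L^{p_j}\to L^q}\|f\|_{L^p}^{\frac{p}{p_j}}.
\end{align*}
Optimizing in λ now yields the result.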
In the following we will extend Proposition 6.1.3 to the setting where

we vary both of the parameters p and q. Furthermore, we will see that
even if we begin with weak type bounds, we can obtain strong type bounds
through interpolation. There are two standard results, known as ‘real inter-
polation’ and ‘complex interpolation’ (in reference to the techniques used in
the proofs).
Our first main goal is the following:
Theorem 6.1.4 (Marcinkiewicz interpolation theorem). Let T be a sublinear
operator and
\[
1\le p_0,q_0,p_1,q_1\le\infty,\quad p_0\ne p_1,\quad q_0\ne q_1.
\]
If T is of weak type $(p_0,q_0)$ and weak type $(p_1,q_1)$, then T is of type (p, q)
for all (p, q) such that p ≤ q and
\[
(\tfrac1p,\tfrac1q) = (\tfrac{\theta}{p_0}+\tfrac{1-\theta}{p_1},\ \tfrac{\theta}{q_0}+\tfrac{1-\theta}{q_1})\quad\text{for some }\theta\in(0,1).\tag{6.1}
\]
Remark 6.1.5. We have not pursued optimal hypotheses here. For exam-
ple, it suffices to assume ‘restricted weak type’ bounds on T rather than
weak type bounds, and the conclusion of the theorem can be stated in terms
of more general Lorentz spaces (rather than just Lebesgue spaces). We will
not pursue such generality in these notes; however, we would like to point
out that although we will work in the context of Lebesgue measure, this
result applies to operators mapping Lp (dµ1 ) to Lq (dµ2 ) for more general
measures.
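Remark 6.1.6. We would also like to point out that to obtain strong type bounds, the restriction to $p \le q$ is necessary. To see this, consider for example
\[ Tf = |x|^{-\frac12} f \]
in dimension $d = 1$. Let us check that $T$ maps $L^2$ to $L^{1,\infty}$ and $L^\infty$ to $L^{2,\infty}$. However, $T$ does not map $L^p$ into $L^{\frac{2p}{p+2}}$ for any $2 < p < \infty$. For the first bound, we need to show that
\[ |\{|x|^{-\frac12} |f| > \alpha\}| \lesssim \alpha^{-1} \|f\|_{L^2} \]
uniformly in $\alpha$; this yields the $L^2 \to L^{1,\infty}$ bound. We write
\[ \{|x|^{-\frac12} |f| > \alpha\} \subset \bigcup_{N \in 2^{\mathbb{Z}}} \{|x|^{\frac12} \alpha \sim N \text{ and } |f| > N\}. \]
Thus, by Tchebychev's inequality and volume bounds we can estimate
\[ |\{|x|^{-\frac12} |f| > \alpha\}| \lesssim \sum_{N \in 2^{\mathbb{Z}}} \min\{\alpha^{-2} N^2,\ N^{-2} \|f\|_{L^2}^2\}. \]
Choosing the optimal $N$ and summing leads to the desired estimate.

Finally, to see that $T$ does not map $L^p$ into $L^{\frac{2p}{p+2}}$, consider the function
\[ x \mapsto |x|^{-\frac{1}{p}} \bigl[\log(|x| + |x|^{-1})\bigr]^{-\frac{p+2}{2p}} \]
and observe that
\[ x \mapsto |x|^{-1} \bigl[\log(|x| + |x|^{-1})\bigr]^{-\theta} \]
belongs to $L^1$ if and only if $\theta > 1$. (See the exercises for more details.)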
Before beginning the proof of Marcinkiewicz interpolation, let us collect

a few preliminary lemmas. The first lemma is an improvement of Hölder’s
inequality in a special case.
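Lemma 6.1.7. Let $f \in L^{q,\infty}$ with $q > 1$ and let $E$ be a set of finite measure. Then
\[ |\langle f, \chi_E \rangle| \lesssim \|f\|_{L^{q,\infty}}\, |E|^{\frac{1}{q'}}. \]
Proof. Note that the distribution function of $f\chi_E$ is given by
\[ \alpha \mapsto |\{x \in E : |f| > \alpha\}|, \]
so that
\[ \int |f\chi_E|\,dx = \int_0^\infty |\{x \in E : |f| > \alpha\}|\,d\alpha. \]
Now, for each $\alpha$ we have the bound
\[ |\{x \in E : |f| > \alpha\}| \lesssim \min\{|E|,\ \alpha^{-q} \|f\|_{L^{q,\infty}}^q\}. \]
Setting
\[ \alpha_0 = \|f\|_{L^{q,\infty}}\, |E|^{-\frac{1}{q}}, \]
we therefore have
\[ \int |f\chi_E|\,dx \lesssim \int_0^{\alpha_0} |E|\,d\alpha + \int_{\alpha_0}^\infty \alpha^{-q} \|f\|_{L^{q,\infty}}^q\,d\alpha \lesssim \|f\|_{L^{q,\infty}}\, |E|^{\frac{1}{q'}}, \]
as desired.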
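For the reader who wants the final step spelled out, the two integrals can be evaluated in closed form. This is a sketch assuming the normalization $\|f\|_{L^{q,\infty}} = \sup_{\alpha>0} \alpha\,|\{|f| > \alpha\}|^{1/q}$ (so that the minimum bound above holds with constant $1$); with a different normalization only the constant changes.

```latex
% First integral: a constant integrand on [0, \alpha_0].
\[
\int_0^{\alpha_0} |E|\,d\alpha = \alpha_0 |E|
= \|f\|_{L^{q,\infty}}\,|E|^{1-\frac{1}{q}}
= \|f\|_{L^{q,\infty}}\,|E|^{\frac{1}{q'}} .
\]
% Second integral: convergent at infinity precisely because q > 1.
\[
\int_{\alpha_0}^{\infty} \alpha^{-q}\,\|f\|_{L^{q,\infty}}^{q}\,d\alpha
= \frac{\alpha_0^{1-q}}{q-1}\,\|f\|_{L^{q,\infty}}^{q}
= \frac{1}{q-1}\,\|f\|_{L^{q,\infty}}\,|E|^{\frac{q-1}{q}}
= \frac{1}{q-1}\,\|f\|_{L^{q,\infty}}\,|E|^{\frac{1}{q'}} .
\]
% Adding the two contributions:
\[
\int |f\chi_E|\,dx \le \Bigl(1 + \frac{1}{q-1}\Bigr)\|f\|_{L^{q,\infty}}\,|E|^{\frac{1}{q'}}
= q'\,\|f\|_{L^{q,\infty}}\,|E|^{\frac{1}{q'}} .
\]
```

The factor $q'$ blows up as $q \to 1$, which is consistent with the hypothesis $q > 1$ in the lemma.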
The next lemma is a weakened version of Marcinkiewicz.

Lemma 6.1.8 (Weak Marcinkiewicz). Under the assumptions of the Marcinkiewicz theorem, the operator $T$ satisfies
\[ |\langle T\chi_F, \chi_E \rangle| \lesssim |F|^{\frac{1}{p}}\, |E|^{\frac{1}{q'}} \]
for all $(p, q)$ as in (6.1) and all finite-measure sets $F, E$.
Remark 6.1.9. The estimate appearing above would be an immediate con-

sequence of the refined Hölder inequality and a weak type bound, but only
provided q > 1. In particular, we do not know a priori that this bound
holds for T using the exponents (pj , qj ). The purpose of this lemma will be
to allow us to assume qj > 1 (at the price of replacing weak type bounds
with the restricted weak type bounds appearing above).
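Proof. Let $F$ be a set of finite measure. By assumption,
\[ \alpha\, |\{|T\chi_F| > \alpha\}|^{\frac{1}{q_j}} \lesssim \|\chi_F\|_{L^{p_j}} \lesssim |F|^{\frac{1}{p_j}} \]
uniformly in $\alpha$ for $j = 0, 1$. Recalling the definition of $(p, q)$, it follows that
\[ \alpha\, |\{|T\chi_F| > \alpha\}|^{\frac{1}{q}} \lesssim |F|^{\frac{\theta}{p_0} + \frac{1-\theta}{p_1}} \lesssim |F|^{\frac{1}{p}} \]
uniformly in $\alpha$, and thus
\[ \|T\chi_F\|_{L^{q,\infty}} \lesssim |F|^{\frac{1}{p}} \]
for all sets $F$ of finite measure. As we may assume $q > 1$ (cf. $q_0 \ne q_1$), we may apply the previous lemma to deduce
\[ |\langle T\chi_F, \chi_E \rangle| \lesssim \|T\chi_F\|_{L^{q,\infty}}\, \|\chi_E\|_{L^{q'}} \lesssim |F|^{\frac{1}{p}}\, |E|^{\frac{1}{q'}}, \]
which implies the desired result.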
With the preliminaries in place, we are now ready to prove the Marcinkiewicz
interpolation theorem.
Proof of the Marcinkiewicz interpolation theorem. Our goal is to show that
\[ \|Tf\|_{L^q} \lesssim \|f\|_{L^p} \]
for $(p, q)$ as defined above. By duality and density, it suffices to prove that
\[ |\langle |Tf|, |g| \rangle| \lesssim \|f\|_{L^p}\, \|g\|_{L^{q'}} \tag{6.2} \]
for $f$ and $g$ simple functions with finite measure support, say. Furthermore, we may assume $f, g$ are nonnegative.
Using Lemma 6.1.8, we may assume that
\[ |\langle |T\chi_F|, |\chi_E| \rangle| \lesssim \min_{j=0,1} |F|^{1/p_j}\, |E|^{1/q_j'} \tag{6.3} \]
for any $F, E$ (cf. Remark 6.1.9 above).

We turn to (6.2). In fact, it will be convenient to write down a different decomposition for $f$ and $g$. First, writing
\[ g = \sum_{m \in \mathbb{Z}} g \cdot \chi_{\{2^{m-1} \le g < 2^m\}}, \]
we find (by the triangle inequality) that we may assume
\[ g = \sum_{m \in \mathbb{Z}} 2^m \chi_{E_m}, \]
where $E_m$ are disjoint sets such that
\[ \|g\|_{L^{q'}} \sim \Bigl( \sum_m 2^{mq'} |E_m| \Bigr)^{\frac{1}{q'}}. \]
For $f$, it is not enough to use upper bounds (due to the presence of $T$). Instead, we write
\[ f = \sum_{n \in \mathbb{Z}} f \cdot \chi_{\{2^{n-1} \le f < 2^n\}} = \sum_{n \in \mathbb{Z}} 2^n \sum_{j=1}^{\infty} 2^{-j} \chi_{A_n^j}, \]
where $A_n^j$ is the set of $x$ such that $2^{n-1} \le f(x) < 2^n$ and the coefficient of $2^{-j}$ in the binary expansion of $2^{-n} f(x)$ is $1$. In particular, we may write
\[ f = \sum_{j=1}^{\infty} 2^{-j} \sum_{n \in \mathbb{Z}} 2^n \chi_{F_n^j}, \]
where $F_n^j$ are disjoint sets and for each $j$
\[ \Bigl( \sum_n 2^{np} |F_n^j| \Bigr)^{\frac{1}{p}} \lesssim \|f\|_{L^p}. \tag{6.4} \]
In particular, by applying sublinearity and noting that $\sum_{j \ge 1} 2^{-j}$ is finite, we find that it also suffices to consider $f$ of the form
\[ f = \sum_{n \in \mathbb{Z}} 2^n \chi_{F_n} \]
with the $F_n$ obeying (6.4).

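Applying (6.3) and sublinearity, we find
\begin{align*}
|\langle |Tf|, |g| \rangle| &\lesssim \sum_{m,n} 2^n 2^m \min_{j=0,1} \bigl\{ |F_n|^{1/p_j} |E_m|^{1/q_j'} \bigr\} \\
&\lesssim \sum_{m,n} 2^n |F_n|^{\frac{1}{p}}\, 2^m |E_m|^{\frac{1}{q'}} \min_{j=0,1} \Bigl\{ |F_n|^{\frac{1}{p_j} - \frac{1}{p}} |E_m|^{\frac{1}{q_j'} - \frac{1}{q'}} \Bigr\}.
\end{align*}
Recalling the definition of $p, q$ in (6.1), it follows that we may rewrite the minimum appearing above as
\[ \min_{j=0,1} \Bigl( |F_n|^{\frac{1}{p_1} - \frac{1}{p_0}} |E_m|^{\frac{1}{q_1'} - \frac{1}{q_0'}} \Bigr)^{j-(1-\theta)}. \]
Applying a dyadic decomposition, we are now led to estimate
\[ \sum_{A, B \in 2^{\mathbb{Z}}} \Bigl( \sum_{n : |F_n| \sim A} 2^n A^{\frac{1}{p}} \Bigr) \min_{j=0,1} \Bigl( A^{\frac{1}{p_1} - \frac{1}{p_0}} B^{\frac{1}{q_1'} - \frac{1}{q_0'}} \Bigr)^{j-(1-\theta)} \Bigl( \sum_{m : |E_m| \sim B} 2^m B^{\frac{1}{q'}} \Bigr). \]
Now, because $p_0 \ne p_1$ and $q_0 \ne q_1$, we are in a position to apply Hölder's inequality and Schur's test (Lemma A.3.4) and bound this quantity by
\begin{align*}
&\Bigl\| \sum_{n : |F_n| \sim A} 2^n A^{\frac{1}{p}} \Bigr\|_{\ell^p_A} \Bigl\| \sum_{m : |E_m| \sim B} 2^m B^{\frac{1}{q'}} \Bigr\|_{\ell^{p'}_B} \\
&\quad \lesssim \Bigl\| \sum_{n : |F_n| \sim A} 2^n A^{\frac{1}{p}} \Bigr\|_{\ell^p_A} \Bigl\| \sum_{m : |E_m| \sim B} 2^m B^{\frac{1}{q'}} \Bigr\|_{\ell^{q'}_B},
\end{align*}
where we have used $p \le q$ (equivalently $q' \le p'$) and the nesting of sequence spaces in the final step. It therefore remains to verify that
\[ \sum_A \Bigl( \sum_{n : |F_n| \sim A} 2^n A^{\frac{1}{p}} \Bigr)^p \lesssim \sum_A \sum_{n : |F_n| \sim A} 2^{np} A \sim \sum_n 2^{np} |F_n|, \]
and similarly for the $\ell^{q'}_B$ sum. The desired inequality follows from the fact that
\[ \Bigl( \sum_{n \in S} 2^n \Bigr)^p \le \Bigl| 2 \max_{n \in S} 2^n \Bigr|^p \le 2^p \sum_{n \in S} 2^{np} \]
for any finite set $S \subset \mathbb{Z}$. This completes the proof that $T$ is type $(p, q)$.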
We turn to another interpolation result. This estimate requires strong

type bounds to begin with, but yields strong type bounds with no restriction
on the order of p and q; it also easily gives an estimate on the operator norm.
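Theorem 6.1.10 (Riesz–Thorin complex interpolation). Let $T$ be a linear operator and $1 \le p_0, p_1, q_0, q_1 \le \infty$.
If $T$ is strong type $(p_0, q_0)$ and $(p_1, q_1)$, then $T$ is strong type $(p, q)$ for all $(p, q)$ such that
\[ \bigl(\tfrac{1}{p}, \tfrac{1}{q}\bigr) = \bigl(\tfrac{\theta}{p_0} + \tfrac{1-\theta}{p_1},\ \tfrac{\theta}{q_0} + \tfrac{1-\theta}{q_1}\bigr) \quad \text{for some } \theta \in (0,1). \]
In fact,
\[ \|T\|_{L^p \to L^q} \lesssim \|T\|_{L^{p_0} \to L^{q_0}}^{\theta}\, \|T\|_{L^{p_1} \to L^{q_1}}^{1-\theta}. \]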
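Proof. Clearly we may assume $(p_0, q_0) \ne (p_1, q_1)$; otherwise, there is nothing to prove. Let us prove the estimate
\[ \langle Tf, g \rangle \lesssim A_0^{\theta} A_1^{1-\theta}\, \|f\|_{L^p}\, \|g\|_{L^{q'}} \]
for all $f, g$ simple functions of finite measure support, where
\[ A_j = \|T\|_{L^{p_j} \to L^{q_j}}. \]
Let us first assume that none of the exponents equal infinity. Writing $f = \sum a_k \chi_{F_k}$, we observe that
\[ f = \operatorname{sgn}(f) \Bigl( \sum |a_k|^{\frac{p}{p_0}} \chi_{F_k} \Bigr)^{\theta} \Bigl( \sum |a_k|^{\frac{p}{p_1}} \chi_{F_k} \Bigr)^{1-\theta}, \]
which follows from the fact that
\[ \Bigl( \sum a_k \chi_{F_k} \Bigr)^p = \sum a_k^p \chi_{F_k} \]
for disjoint sets $F_k$. In other words, we may factor
\[ f = c f_0^{\theta} f_1^{1-\theta}, \]
where $f_j$ is a simple function in $L^{p_j}$. Similarly, we may factor
\[ g = c' g_0^{\theta} g_1^{1-\theta}, \qquad g_j \in L^{q_j'}. \]
Note that under this decomposition, we have
\[ \|f_0\|_{L^{p_0}}^{\theta}\, \|f_1\|_{L^{p_1}}^{1-\theta} \sim \|f\|_{L^p} \]
and similarly for $g$.

Now we define
\[ F(z) = \langle T(c f_0^{z} f_1^{1-z}),\ c' g_0^{z} g_1^{1-z} \rangle. \]
One can verify (by linearity of $T$ and the fact that the functions involved are simple functions) that $F$ is analytic and has at most exponential growth in $z$. Furthermore, we claim that by hypothesis we have
\[ |F(z)| \lesssim \begin{cases} A_0\, \|f_0\|_{L^{p_0}}\, \|g_0\|_{L^{q_0'}} & \operatorname{Re} z = 1, \\ A_1\, \|f_1\|_{L^{p_1}}\, \|g_1\|_{L^{q_1'}} & \operatorname{Re} z = 0. \end{cases} \]
Indeed, let us consider the case $\operatorname{Re} z = 1$. Then we can write
\[ f_0^{z} f_1^{1-z} = f_0 \cdot h \quad \text{where } |h| \equiv 1 \]
on the support of $f$, and similarly for the $g$ term. Thus the result follows from Hölder's inequality and the assumption that $T$ is type $(p_0, q_0)$.
Applying the three lines lemma of complex analysis (Lemma A.3.6), we deduce that
\begin{align*}
|F(\theta)| &\lesssim A_0^{\theta} A_1^{1-\theta}\, \|f_0\|_{L^{p_0}}^{\theta} \|f_1\|_{L^{p_1}}^{1-\theta}\, \|g_0\|_{L^{q_0'}}^{\theta} \|g_1\|_{L^{q_1'}}^{1-\theta} \\
&\lesssim A_0^{\theta} A_1^{1-\theta}\, \|f\|_{L^p}\, \|g\|_{L^{q'}}
\end{align*}
for $\theta \in (0, 1)$, which yields the result.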

It remains to consider the case when one or more of the exponents is infinite. In light of Proposition 6.1.3, we may assume $p_0 \ne p_1$ and $q_0 \ne q_1$. Let us give the idea, but leave the details to the interested reader. If, say, $p_1 = \infty$, we use the fact that $\tfrac{1}{p} = \tfrac{\theta}{p_0}$ and write
\[ \sum |a_k| \chi_{F_k} = \Bigl( \sum |a_k|^{\frac{p}{p_0}} \chi_{F_k} \Bigr)^{\theta} \]
in place of the decomposition above. We do a similar decomposition if one of the $q_j$ is infinite. Following the same argument as above then yields the result.
A similar argument extends this to the case where T also varies analyt-
ically in z. We leave the following result due to Stein as an exercise.
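Theorem 6.1.11 (Stein interpolation). Let $T_z$ be a family of linear operators defined on $\{0 \le \operatorname{Re} z \le 1\}$ such that for each pair of simple functions $f, g$ of finite measure support, the map
\[ z \mapsto \langle T_z f, g \rangle \]
defines an analytic function on the strip with at most double-exponential growth. Suppose that for some $1 \le p_0, p_1, q_0, q_1 \le \infty$ we have
\[ \|T_z\|_{L^{p_j} \to L^{q_j}} \le A_j \quad \text{for } \operatorname{Re} z = j. \]
Then
\[ \|T_\theta\|_{L^p \to L^q} \le A_0^{1-\theta} A_1^{\theta} \]
for $\theta \in [0, 1]$ and $(p, q)$ satisfying
\[ \bigl(\tfrac{1}{p}, \tfrac{1}{q}\bigr) = \bigl(\tfrac{1-\theta}{p_0} + \tfrac{\theta}{p_1},\ \tfrac{1-\theta}{q_0} + \tfrac{\theta}{q_1}\bigr). \]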
6.2 Some classical inequalities

In this section we discuss some applications of the interpolation results given
in the previous section.
The first is an estimate for the Fourier transform.
Theorem 6.2.1 (Hausdorff–Young inequality). The Fourier transform $\mathcal{F}$ is strong type $(p', p)$ for all $2 \le p \le \infty$. That is,
\[ \|\hat{f}\|_{L^p} \lesssim \|f\|_{L^{p'}} \quad \text{for all } 2 \le p \le \infty. \]
Here we recall $p'$ is such that $\tfrac{1}{p} + \tfrac{1}{p'} = 1$.
Proof. The Fourier transform $\mathcal{F}$ is strong type $(1, \infty)$ and $(2, 2)$. Therefore, by interpolation, $\mathcal{F}$ is strong type $(p', p)$ for all $2 \le p \le \infty$.
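The same interpolation argument can be tested numerically in a finite setting. The sketch below is an illustration only and is not taken from these notes: it uses the unitary discrete Fourier transform on $\mathbb{C}^N$, whose $\ell^2 \to \ell^2$ norm is $1$ and whose $\ell^1 \to \ell^\infty$ norm is $N^{-1/2}$, so that Riesz–Thorin suggests $\|\mathcal{F}x\|_{\ell^p} \le N^{\frac1p - \frac12} \|x\|_{\ell^{p'}}$ for $2 \le p \le \infty$. The function names are hypothetical.

```python
import cmath


def unitary_dft(x):
    """Naive O(N^2) unitary DFT: X_j = N^{-1/2} * sum_k x_k e^{-2 pi i jk/N}."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)) / n ** 0.5
        for j in range(n)
    ]


def lp_norm(x, p):
    """The l^p norm of a finite sequence, with p = inf giving the sup norm."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def hausdorff_young_ratio(x, p):
    """Return ||Fx||_p / (N^{1/p - 1/2} ||x||_{p'}) for some 2 <= p <= inf.

    The interpolated bound predicts that this ratio is at most 1.
    """
    n = len(x)
    if p == float("inf"):
        inv_p, p_conj = 0.0, 1.0
    else:
        inv_p, p_conj = 1.0 / p, p / (p - 1.0)
    bound = n ** (inv_p - 0.5) * lp_norm(x, p_conj)
    return lp_norm(unitary_dft(x), p) / bound


if __name__ == "__main__":
    sample = [cmath.exp(1j * k * k) * (1 + k % 3) for k in range(16)]
    for exponent in (2.0, 3.0, 4.0, float("inf")):
        print(exponent, hausdorff_young_ratio(sample, exponent))
```

A unit point mass attains the bound in this discrete setting, so the constant $N^{\frac1p - \frac12}$ cannot be improved there.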
Next we have the Hardy–Littlewood–Sobolev inequality. This is a stronger

version of what is commonly known as Young’s convolution inequality.
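Theorem 6.2.2 (Hardy–Littlewood–Sobolev). The following estimate holds for functions $f, g : \mathbb{R}^d \to \mathbb{C}$:
\[ \|f * g\|_{L^r} \lesssim \|f\|_{L^p}\, \|g\|_{L^{q,\infty}} \]
whenever
\[ 1 < p < r < \infty, \quad 1 < q < \infty, \quad \text{and} \quad 1 + \tfrac{1}{r} = \tfrac{1}{p} + \tfrac{1}{q}. \]
In particular,
\[ \bigl\| |x|^{-(d-\alpha)} * f \bigr\|_{L^r} \lesssim \|f\|_{L^p} \]
whenever
\[ 0 < \alpha < d, \quad 1 < p < r < \infty, \quad \text{and} \quad \tfrac{1}{r} + \tfrac{\alpha}{d} = \tfrac{1}{p}. \]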
Before turning to the proof, we recall two basic convolution inequalities.
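Lemma 6.2.3. The following hold:
\[ \|f * g\|_{L^\infty} \lesssim \|f\|_{L^p}\, \|g\|_{L^{p'}}, \qquad \|f * g\|_{L^p} \lesssim \|f\|_{L^p}\, \|g\|_{L^1}, \]
where $1 \le p \le \infty$.
Proof. The first estimate follows from Hölder's inequality, while the second follows from duality: for $h \in L^{p'}$, we use Fubini's theorem to write
\[ |\langle f * g, h \rangle| = |\langle g, \tilde{f} * h \rangle| \lesssim \|g\|_{L^1}\, \|\tilde{f} * h\|_{L^\infty} \lesssim \|g\|_{L^1}\, \|f\|_{L^p}\, \|h\|_{L^{p'}}, \]
where $\tilde{f}(x) = f(-x)$.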
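Proof of Theorem 6.2.2. The first estimate implies the second, cf.
\[ \bigl\| |x|^{-(d-\alpha)} * f \bigr\|_{L^r} \lesssim \|f\|_{L^p}\, \bigl\| |x|^{-(d-\alpha)} \bigr\|_{L^{\frac{d}{d-\alpha},\infty}} \lesssim \|f\|_{L^p} \]
provided we define exponents as above.

We turn to the first estimate. We fix $1 < q < \infty$ and $g \in L^{q,\infty}$ and consider the linear operator
\[ Tf = f * g. \]
We normalize $g$ so that $\|g\|_{L^{q,\infty}} = 1$. Our goal is then to prove that $T$ is strong type $(p, r)$ for all $1 < p < r < \infty$ obeying the scaling relation above. By the Marcinkiewicz interpolation theorem, it suffices to prove that $T$ is weak type $(p, r)$, that is,
\[ |\{|f * g| > \alpha\}| \lesssim \alpha^{-r} \|f\|_{L^p}^r \]
uniformly in $\alpha > 0$. To this end, let us fix $\alpha > 0$.

Let $\lambda > 0$ and split $g = g_1 + g_2$, where $g_1 = g\chi_{\{|g| \le \lambda\}}$. In particular,
\[ |\{|f * g| > \alpha\}| \le |\{|f * g_1| > \tfrac12 \alpha\}| + |\{|f * g_2| > \tfrac12 \alpha\}|. \]
We claim that by choosing $\lambda$ sufficiently small, the first term on the right-hand side vanishes. To see this, we first need the following bound:
\[ \|g \cdot \chi_{\{|g| \le \lambda\}}\|_{L^a} \lesssim \|g\|_{L^{q,\infty}}^{\frac{q}{a}}\, \lambda^{1 - \frac{q}{a}} \quad \text{for any } 1 < q < a < \infty. \tag{6.5} \]
This can be deduced by rewriting the $L^a$-norm in terms of the distribution function. We leave the details as an exercise. Then, using (6.5) with $a = p'$ (which is greater than $q$ provided $r < \infty$), we find
\[ \|f * g_1\|_{L^\infty} \lesssim \|g_1\|_{L^{p'}}\, \|f\|_{L^p} \lesssim \lambda^{1 - \frac{q}{p'}}\, \|f\|_{L^p}, \]
which implies the claim, provided we choose $\lambda^{1 - \frac{q}{p'}} \ll \alpha \|f\|_{L^p}^{-1}$, i.e.
\[ \lambda = c\, \alpha^{\frac{p'}{p'-q}}\, \|f\|_{L^p}^{-\frac{p'}{p'-q}} \]
for $0 < c \ll 1$.
We are left to consider the contribution of $g_2$ above. In this case, we need the following:
\[ \|g \cdot \chi_{\{|g| > \lambda\}}\|_{L^1} \lesssim \|g\|_{L^{q,\infty}}^{q}\, \lambda^{1-q} \quad \text{for any } 1 < q < \infty, \tag{6.6} \]
which we leave as a further exercise (and which can again be deduced by writing the norm in terms of the distribution function). Now, applying Tchebychev and (6.6), we estimate (using the form of $\lambda$ above)
\begin{align*}
|\{|f * g_2| > \tfrac12 \alpha\}| &\lesssim \alpha^{-p} \|f * g_2\|_{L^p}^p \\
&\lesssim \alpha^{-p} \|f\|_{L^p}^p\, \|g_2\|_{L^1}^p \\
&\lesssim \alpha^{-p} \lambda^{p(1-q)} \|f\|_{L^p}^p\, \|g\|_{L^{q,\infty}}^{pq} \\
&\lesssim \alpha^{-p + p(1-q)\frac{p'}{p'-q}}\, \|f\|_{L^p}^{\,p - p(1-q)\frac{p'}{p'-q}}.
\end{align*}
Using the scaling relations, we can simplify this to
\[ |\{|f * g_2| > \tfrac12 \alpha\}| \lesssim \alpha^{-r} \|f\|_{L^p}^r, \]
which finally completes the proof.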
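The simplification via the scaling relation can be checked directly. The following sketch uses the shorthand $a = 1/p$, $b = 1/q$, $c = 1/r$, which is introduced here only for this computation; in this notation the scaling relation $1 + \tfrac1r = \tfrac1p + \tfrac1q$ reads $c = a + b - 1$, and $p' = \tfrac{1}{1-a}$.

```latex
% The two building blocks of the exponent:
\[
\frac{p'}{p'-q} = \frac{\frac{1}{1-a}}{\frac{1}{1-a} - \frac{1}{b}} = \frac{b}{a+b-1} = \frac{b}{c},
\qquad
p(1-q) = \frac{1}{a}\Bigl(1 - \frac{1}{b}\Bigr) = \frac{b-1}{ab}.
\]
% Their product:
\[
p(1-q)\,\frac{p'}{p'-q} = \frac{b-1}{ac}.
\]
% Exponent of \alpha, using -c + b - 1 = -a:
\[
-p + p(1-q)\,\frac{p'}{p'-q} = \frac{-c + b - 1}{ac} = \frac{-a}{ac} = -\frac{1}{c} = -r.
\]
% Exponent of \|f\|_{L^p}, using c - b + 1 = a:
\[
p - p(1-q)\,\frac{p'}{p'-q} = \frac{c - b + 1}{ac} = \frac{a}{ac} = \frac{1}{c} = r.
\]
```

Both exponents therefore agree with the weak type $(p, r)$ bound stated at the end of the proof.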
An important application of the Hardy–Littlewood–Sobolev inequality

yields a class of so-called Sobolev embedding inequalities. To describe
them, we need to introduce a family of operators that extend the usual
notion of derivatives.
Recall that differentiation operators act as multiplication operators under the Fourier transform. More precisely, we found
\[ \mathcal{F}[\partial^\alpha f](\xi) = (i\xi)^\alpha \hat{f}(\xi), \]
which we may write more succinctly as
\[ \partial^\alpha = \mathcal{F}^{-1} (i\xi)^\alpha \mathcal{F} \]
as operators. In fact, an important class of operators (known as Fourier multiplier operators) arises in this fashion: given a function $m(\xi)$, we may define the operator
\[ T_m := \mathcal{F}^{-1} m(\xi) \mathcal{F}. \]
Using Lemma 2.5.11, we find that (as long as $m$ is a reasonably well-behaved function) the operator $T_m$ is given as a convolution operator, namely
\[ T_m f = (2\pi)^{-\frac{d}{2}}\, \check{m} * f, \]
where $\check{m} = \mathcal{F}^{-1} m$. Conversely, a convolution operator (i.e. an operator of the form $Tf = g * f$) may also be viewed as the Fourier multiplier operator with symbol $(2\pi)^{\frac{d}{2}} \hat{g}$.
Remark 6.2.4. We have just observed that convolution operators and Fourier multipliers are one and the same. We remark here that a (bounded) operator $T$ is of this type if and only if it commutes with translations. Indeed, to see that this is sufficient, note that for any test function $\psi$, the fact that $T$ commutes with translation implies
\[ \psi * Tf = T^*\psi * f, \]
where $T^*$ denotes the adjoint of $T$. Thus, by Lemma 2.5.11,
\[ \hat{\psi} \cdot \widehat{Tf} = \widehat{T^*\psi} \cdot \hat{f}. \]
Choosing $\psi$ so that $\hat{\psi}$ is everywhere nonzero (e.g. by choosing $\psi = e^{-|x|^2}$, cf. Lemma 2.5.9), we find that $T$ is a Fourier multiplier operator, as desired. Let us finally note here that all Fourier multipliers commute with one another, as well.
Returning to the above, we can naturally extend the notion of derivative and define the class of operators $|\nabla|^s$ as Fourier multiplier operators, namely,
\[ |\nabla|^s = \mathcal{F}^{-1} |\xi|^s \mathcal{F}, \qquad s \in \mathbb{R}. \]
For $-d < s < 0$ these operators are instead known as 'fractional integration' operators. In this case we can actually compute the inverse Fourier transform of $|\xi|^s$ (in the sense of distributions) and so compute the corresponding convolution kernel. In fact, we already saw a special case of this in Section 2.6, when we used a symmetry argument to deduce that
\[ \mathcal{F}^{-1} |\xi|^{-2} = c|x|^{-1} \quad \text{when } d = 3. \]
Arguing similarly, one finds that
\[ \mathcal{F}^{-1} |\xi|^{-s} = c_{s,d}\, |x|^{s-d} \quad \text{for } 0 < s < d. \]
Exercise 2.9.16 shows how to compute exactly the constants $c_{s,d}$.

Having introduced this notion of fractional derivative (or fractional in-
tegration), we can now state an important class of Sobolev embedding in-
equalities.
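Theorem 6.2.5 (Sobolev embedding). Let
\[ s > 0, \quad 1 < p < q < \infty \quad \text{and} \quad \tfrac{1}{q} = \tfrac{1}{p} - \tfrac{s}{d}. \]
For any $f \in \mathcal{S}$, we have
\[ \|f\|_{L^q} \lesssim \bigl\| |\nabla|^s f \bigr\|_{L^p}. \]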
Remark 6.2.6. Let us briefly explain the name Sobolev embedding. Just as one has 'Lebesgue spaces', there is also a notion of 'Sobolev spaces'. In particular, one defines the spaces
\[ \dot{W}^{s,p} := \{ f \in \mathcal{S}' : \bigl\| |\nabla|^s f \bigr\|_{L^p} < \infty \} \]
for $1 \le p \le \infty$ and $s \in \mathbb{R}$. The theorem above asserts that
\[ \dot{W}^{s,p} \hookrightarrow L^q \quad \text{whenever } s > 0, \ 1 < p < q < \infty \ \text{and} \ \tfrac{1}{q} = \tfrac{1}{p} - \tfrac{s}{d}. \]
That is, it yields an embedding result between Sobolev and Lebesgue spaces. Such results have wide application in the field of PDE.
Proof of Theorem 6.2.5. We argue by duality. For $f, g \in \mathcal{S}$ we first observe (by Plancherel and the definition of $|\nabla|^s$) that
\[ |\langle f, g \rangle| = |\langle |\nabla|^s f, |\nabla|^{-s} g \rangle| \le \bigl\| |\nabla|^s f \bigr\|_{L^p}\, \bigl\| |\nabla|^{-s} g \bigr\|_{L^{p'}}. \]
Now, by Hardy–Littlewood–Sobolev (and the scaling relations)
\[ \bigl\| |\nabla|^{-s} g \bigr\|_{L^{p'}} \lesssim \bigl\| |x|^{s-d} * g \bigr\|_{L^{p'}} \lesssim \|g\|_{L^{q'}}. \]
The result follows.
The result just stated only applies to the fractional derivative operators $|\nabla|^s$. What about when $s \in \mathbb{N}$? For example, do we have
\[ \|f\|_{L^q} \lesssim \|\nabla f\|_{L^p} \quad \text{whenever } 1 < p < q < \infty \ \text{and} \ \tfrac{d}{q} = \tfrac{d}{p} - 1? \]
This would follow if we could establish that
\[ \bigl\| |\nabla| f \bigr\|_{L^p} \lesssim \|\nabla f\|_{L^p} \]
for any $1 < p < \infty$. This question, in turn, boils down to a question about certain Fourier multiplier operators. Namely, are the operators defined with multipliers
\[ m_j(\xi) = \frac{\xi_j}{|\xi|} \]
(known as Riesz transforms) bounded on $L^p$? The answer is quickly seen to be yes for $p = 2$; this is a consequence of Plancherel. For $p \ne 2$ the answer is still yes, but it is not as simple to prove. We will return to this question below.
In Theorem 6.2.5, we have not included the endpoints p = 1 or q = ∞.
While there are some endpoint Sobolev embedding estimates available, one
needs to be careful when considering the endpoints. To illustrate this point,
consider the following example, which demonstrates the possibility of failure
of Sobolev embedding into L∞ .
Example 6.2.1. Let $p > 1$ be an integer and let $s = \tfrac{d}{p}$. We will show there is no continuous embedding $\dot{W}^{s,p} \hookrightarrow L^\infty$.
We let $\phi \in \mathcal{S}$ be a nonnegative function supported inside the set $1 < |x| < 2$. For each $N \in 2^{\mathbb{N}}$, we define
\[ f_N(x) = |\nabla|^{-s} \sum_{M=1}^{N} c_M M^s \phi(Mx) = \sum_{M=1}^{N} c_M\, (|\nabla|^{-s} \phi)(Mx), \]
where the sum is also restricted to $M \in 2^{\mathbb{N}}$ and the coefficients $c_M$ will be chosen below. We will show that $\|f_N\|_{L^\infty} \to \infty$ as $N \to \infty$, while $\| |\nabla|^s f_N \|_{L^p}$ remains uniformly bounded.
First, we observe
\[ |\nabla|^{-s} \phi(0) = c\, |\cdot|^{s-d} * \phi(0) = c \int |y|^{s-d} \phi(y)\,dy > 0. \]
Thus we find
\[ \|f_N\|_{L^\infty} \ge |f_N(0)| \gtrsim \sum_{M=1}^{N} c_M. \]
On the other hand, using the support properties of $\phi$ (the functions $\phi(M\,\cdot)$ have disjoint supports for distinct dyadic $M$), we have
\begin{align*}
\bigl\| |\nabla|^s f_N \bigr\|_{L^p}^p &= \int \Bigl| \sum_{M=1}^{N} c_M M^s \phi(Mx) \Bigr|^p dx \\
&= \sum_{M=1}^{N} c_M^p M^{sp} \int |\phi(Mx)|^p\,dx \sim \sum_{M=1}^{N} c_M^p,
\end{align*}
where we have changed variables and used $s = \tfrac{d}{p}$. Thus, we may arrange that $\|f_N\|_{L^\infty} \to \infty$ while $\| |\nabla|^s f_N \|_{L^p}$ remains bounded by choosing the coefficients to obey $\{c_M\} \in \ell^p \setminus \ell^1$. Recalling $M \in 2^{\mathbb{N}}$, we can take $c_M = [\log M]^{-1}$, say.
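The choice of coefficients can be illustrated numerically. The sketch below assumes the indexing $M = 2^k$ with $k \ge 1$ (so that $\log M \neq 0$), which gives $c_M = 1/(k \log 2)$; the function name is hypothetical. The $\ell^1$ partial sums grow like a harmonic series, while the $\ell^p$ partial sums stabilize for $p > 1$.

```python
import math


def coefficient_sums(num_terms, p):
    """Partial sums of c_M and c_M^p for c_M = 1/log(M), M = 2^k, k = 1..num_terms."""
    l1_sum = 0.0
    lp_sum = 0.0
    for k in range(1, num_terms + 1):
        c = 1.0 / (k * math.log(2.0))  # c_M with M = 2^k, so log M = k log 2
        l1_sum += c
        lp_sum += c ** p
    return l1_sum, lp_sum


if __name__ == "__main__":
    for terms in (10, 100, 1000, 10000):
        l1_sum, l2_sum = coefficient_sums(terms, 2)
        print(terms, round(l1_sum, 4), round(l2_sum, 4))
```

For $p = 2$ the second column is bounded by $\zeta(2)/(\log 2)^2$, whereas the first column increases without bound as more dyadic scales are included.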
Finally, note that we have only stated Theorem 6.2.5 in the setting of
the whole space Rd . Sobolev embedding results for bounded domains are
certainly available (and important), but we do not discuss this topic here.
See e.g. [8] for results of this type.
6.3 Hardy–Littlewood maximal function

We turn to our next topic, namely, the Hardy–Littlewood maximal function. This is a nonlinear operator that is widely studied within harmonic analysis, and we will use it on several occasions in later sections.
For $x \in \mathbb{R}^d$ and $r > 0$, we denote the ball of radius $r$ centered at $x$ by
\[ B(x, r) = \{ y \in \mathbb{R}^d : |x - y| < r \}. \]
Given a locally integrable function $f$ on $\mathbb{R}^d$ and $r > 0$, we consider the average
\[ \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\,dy. \]
This is well-defined and is a continuous function of $x$. The function
\[ Mf(x) := \sup_{r > 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\,dy \]
is called the Hardy–Littlewood maximal function of $f$. The operator $M$ is called the Hardy–Littlewood maximal operator.
As a simple motivating example, let us begin with the following propo-
sition.
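Proposition 6.3.1. Suppose $g : \mathbb{R}^d \to \mathbb{R}$ obeys $|g(x)| \lesssim \langle x \rangle^{-k}$ for some $k > d$. Then
\[ |f * g(x)| \lesssim Mf(x). \]
Proof. We use the definition of the maximal function to estimate
\[ |f * g(x)| \lesssim \sum_{N \in 2^{\mathbb{Z}}} \int_{|y| \sim N} |f(x - y)|\, \langle y \rangle^{-k}\,dy \lesssim \sum_{N \in 2^{\mathbb{Z}}} N^d \langle N \rangle^{-k}\, Mf(x) \lesssim Mf(x). \]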
Our main goal will be to establish the following.
Theorem 6.3.2. The Hardy–Littlewood maximal operator is weak type $(1, 1)$; that is,
\[ \|Mf\|_{L^{1,\infty}} \lesssim \|f\|_{L^1}. \]
As one readily observes that $M$ is strong type $(\infty, \infty)$, we may then apply the Marcinkiewicz interpolation theorem to conclude the following:
Corollary 6.3.3. The Hardy–Littlewood maximal operator is strong type $(p, p)$ for all $1 < p \le \infty$.
Before we begin the proof, let us make a few observations about the operator $M$. First, suppose $f$ is a nontrivial function. By considering $r \sim |x| \gg 1$, we can observe that
\[ |Mf(x)| \gtrsim |x|^{-d}. \]
Thus we should not expect $M$ to be strong type $(1, 1)$; instead, mapping into weak $L^1$ is the natural assertion. Of course, this rate of decay is compatible with mapping into all higher $L^p$ spaces.
We will need the following lemma.
Lemma 6.3.4 (Wiener covering lemma). Let $B_j = B(x_j, r_j)$ be a finite collection of balls in $\mathbb{R}^d$. There exists a subcollection of balls, denoted $S$, such that
• distinct balls in $S$ are disjoint,
• $\bigcup_j B_j \subset \bigcup_{S} B(x_j, 3r_j)$.
Proof. We run the following algorithm. Begin by setting S = ∅.
1. Choose a ball of largest radius from the remaining collection and add
it to S.
2. Discard from the remaining collection all balls that intersect a ball in
S.
3. If no balls remain, stop. Otherwise, return to step 1.
This algorithm terminates in finitely many steps; indeed, we remove at least one ball from the collection in each step. The balls in $S$ are disjoint by construction, so it remains to verify the second point. In fact, if $B_j$ does not belong to $S$, it must intersect a ball in $S$ of radius at least $r_j$; it is then contained in the threefold dilate of that ball. This completes the proof.
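The greedy selection in the proof translates directly into code. The following is a minimal sketch, assuming balls are given as `(center, radius)` pairs with centers as coordinate tuples; the function name `wiener_cover` and this data layout are choices made here for illustration. Processing the balls in order of decreasing radius and keeping a ball only if it misses every ball already kept is equivalent to the three steps of the algorithm.

```python
import math


def wiener_cover(balls):
    """Return indices of a disjoint subcollection whose 3x dilates cover all balls.

    balls: list of (center, radius) with center a tuple of floats.
    Open balls B(x, r) and B(y, s) are disjoint exactly when |x - y| >= r + s.
    """
    # Largest radius first, so every kept ball is at least as large as later candidates.
    order = sorted(range(len(balls)), key=lambda i: balls[i][1], reverse=True)
    selected = []
    for i in order:
        center_i, radius_i = balls[i]
        if all(
            math.dist(center_i, balls[j][0]) >= radius_i + balls[j][1]
            for j in selected
        ):
            selected.append(i)
        # Otherwise ball i meets a kept ball of radius >= radius_i and is discarded.
    return selected


if __name__ == "__main__":
    example = [
        ((0.0, 0.0), 1.0),
        ((1.5, 0.0), 1.0),
        ((5.0, 5.0), 0.5),
        ((0.2, 0.1), 0.3),
        ((5.2, 5.0), 2.0),
    ]
    print(wiener_cover(example))
```

A discarded ball $B(x_i, r_i)$ meets some kept ball $B(x_j, r_j)$ with $r_j \ge r_i$, so $|x_i - x_j| + r_i < 2r_i + r_j \le 3r_j$, which is the containment used in the proof.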
We are now ready to prove Theorem 6.3.2.
Proof of Theorem 6.3.2. Let $\alpha > 0$ and consider an arbitrary compact subset $K$ of the set
\[ \{ x : Mf(x) > \alpha \}. \]
We need to show that $|K| \lesssim \alpha^{-1} \|f\|_{L^1}$ (with implicit constant independent of $K$).
Let $x \in K$. Then there exists a radius $r(x) > 0$ such that
\[ \frac{1}{|B(x, r(x))|} \int_{B(x, r(x))} |f(y)|\,dy \ge \alpha. \]
We therefore have
\[ K \subset \bigcup_{x \in K} B(x, r(x)), \]
and hence by compactness there exists a finite subcollection $B_j = B(x_j, r(x_j))$ such that $K \subset \bigcup_{j=1}^{N} B_j$. By the Wiener covering lemma, we may find a subcollection $S$ of disjoint balls such that $K \subset \bigcup_S B(x_j, 3r(x_j))$.
We now write
\[ |K| \le \sum_S |3B_j| \le 3^d \sum_S |B_j|. \]
Now, by the choice of $r(x_j)$, we have
\[ |B_j| \lesssim \alpha^{-1} \int_{B_j} |f(y)|\,dy \quad \text{for each } j, \]
and hence (since the balls are disjoint) we have
\[ |K| \le 3^d \alpha^{-1} \sum_S \int_{B_j} |f(y)|\,dy \le 3^d \alpha^{-1} \int |f(y)|\,dy. \]
As the implicit constant (namely, $3^d$) is independent of $K$, we may now take the supremum over compact $K \subset \{x : Mf > \alpha\}$ to conclude the desired result.
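The weak type bound can be observed in a discrete model. The sketch below is an analogue on $\mathbb{Z}$ rather than the setting of the theorem: it uses the centered maximal function $Mf(n) = \sup_{r \ge 0} \frac{1}{2r+1} \sum_{|k-n| \le r} |f(k)|$ of a finitely supported sequence, for which the same covering argument (with integer intervals in place of balls) suggests $\#\{Mf > \alpha\} \le 3\alpha^{-1} \|f\|_{\ell^1}$. The function names are hypothetical.

```python
def maximal_function(f):
    """Centered discrete maximal function of f, evaluated at indices 0..len(f)-1.

    f is treated as a sequence on Z that vanishes outside its index range.
    """
    n = len(f)
    prefix = [0.0]
    for value in f:
        prefix.append(prefix[-1] + abs(value))

    def window_sum(lo, hi):
        # Sum of |f| over lo..hi inclusive, clipped to the support of f.
        lo, hi = max(lo, 0), min(hi, n - 1)
        return prefix[hi + 1] - prefix[lo] if lo <= hi else 0.0

    result = []
    for x in range(n):
        best = 0.0
        # Radii up to n - 1 suffice: a larger window contains the whole support,
        # so enlarging it further only lowers the average.
        for r in range(n):
            best = max(best, window_sum(x - r, x + r) / (2 * r + 1))
        result.append(best)
    return result


def level_set_size(values, alpha):
    """Number of indices where the given values exceed alpha."""
    return sum(1 for v in values if v > alpha)


if __name__ == "__main__":
    f = [0.0, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    mf = maximal_function(f)
    total = sum(abs(v) for v in f)
    for alpha in (0.25, 0.5, 1.0, 2.0):
        print(alpha, level_set_size(mf, alpha), 3 * total / alpha)
```

Only indices inside the array are counted here, which can only make the level set smaller than the full level set on $\mathbb{Z}$.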
A standard application of the Hardy–Littlewood maximal inequality is

to prove the Lebesgue differentiation theorem:
Proposition 6.3.5. Let $f$ be locally integrable. Then
\[ \lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\,dy = f(x) \]
for almost every $x$.
The details are left as an exercise, but we sketch the idea as follows. First,
one should show that it suffices to treat functions in L1 (not merely locally
integrable). Next, one should observe that the result follows for smooth,
compactly supported functions. Finally, one should extend this to L1 by
approximation; it is here that the maximal function estimate will come into
play.
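Let us next discuss a generalization of the result above that we will need later. For a locally integrable function $\omega : \mathbb{R}^d \to [0, \infty)$, we define a measure via
\[ \omega(E) = \int_E \omega(x)\,dx. \]
Then we have the following result:

Theorem 6.3.6. We have $M : L^1(M\omega\,dx) \to L^{1,\infty}(\omega\,dx)$ and $M : L^p(M\omega\,dx) \to L^p(\omega\,dx)$ for $1 < p \le \infty$. That is,
\begin{align*}
\omega(\{x : Mf(x) > \alpha\}) &\lesssim \alpha^{-1} \int |f(x)|\, M\omega(x)\,dx, \\
\int |Mf(x)|^p\, \omega(x)\,dx &\lesssim \int |f(x)|^p\, M\omega(x)\,dx \quad \text{for } 1 < p < \infty, \\
\|Mf\|_{L^\infty(\omega\,dx)} &\lesssim \|f\|_{L^\infty(M\omega\,dx)}.
\end{align*}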

Sketch of proof. Again, by Marcinkiewicz interpolation it suffices to establish strong type $(\infty, \infty)$ and weak type $(1, 1)$ bounds. For the $(\infty, \infty)$ bounds, we note
\[ \|Mf\|_{L^\infty(\omega\,dx)} = \inf_{\omega(E) = 0}\ \sup_{x \in E^c} Mf(x) \le \|f\|_{L^\infty(dx)}, \]
and thus it suffices to check that
\[ \|f\|_{L^\infty(dx)} = \inf_{|E| = 0}\ \sup_{x \in E^c} |f(x)| \le \inf_{(M\omega)(E) = 0}\ \sup_{x \in E^c} |f(x)| = \|f\|_{L^\infty(M\omega\,dx)} \]
(cf. Exercise A.4.3). In fact, this inequality is a consequence of the fact that $(M\omega)(E) = 0$ implies $|E| = 0$.
For the weak type $(1, 1)$ bound, the argument is similar to that appearing in the proof of Theorem 6.3.2. One constructs the set $K$, the balls $B_j$, and the subcollection $S$ as in that proof. This time we need to estimate $\omega(3B_j)$ (where $3B_j := B(x_j, 3r_j)$), which we do by writing
\[ \omega(3B_j) = \int_{3B_j} \omega(x)\,dx \le \int_{|x - y| < 4r_j} \omega(x)\,dx \le 4^d |B_j|\, M\omega(y) \]
for any $y \in B_j$, so that
\[ \omega(3B_j) \int_{B_j} |f(y)|\,dy \lesssim |B_j| \int_{B_j} |f(y)|\, M\omega(y)\,dy. \]
This implies (by the choice of $B_j$)
\[ \omega(3B_j) \lesssim \alpha^{-1} \int_{B_j} |f(y)|\, M\omega(y)\,dy, \]
which then gives
\[ \omega(K) \le \sum_S \omega(3B_j) \lesssim \alpha^{-1} \sum_S \int_{B_j} |f(y)|\, M\omega(y)\,dy \lesssim \alpha^{-1} \int |f(y)|\, M\omega(y)\,dy, \]
yielding the desired bound.

Remark 6.3.7. It is also a natural question to ask for which weights $\omega$ we have that $M$ maps $L^p(\omega\,dx)$ to $L^p(\omega\,dx)$ boundedly for some $1 < p < \infty$. The sharp condition for this is known; in particular, one needs that
\[ \sup_B \Bigl( \frac{1}{|B|} \int_B \omega(y)\,dy \Bigr) \cdot \Bigl( \frac{1}{|B|} \int_B \omega(y)^{-\frac{p'}{p}}\,dy \Bigr)^{\frac{p}{p'}} \lesssim 1. \]
In this case we call $\omega$ an $A_p$ weight, and write $\omega \in A_p$. We will not pursue the topic of $A_p$ weights in these notes; we refer the interested reader to [29].
We next turn our attention to so-called vector-valued maximal func-
tions.
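Definition 6.3.8. For $f : \mathbb{R}^d \to \ell^2(\mathbb{C})$ given by $f(x) = \{f_n(x)\}_{n \ge 1}$, we define
\[ \|f\|_{L^p} = \Bigl( \int \|f(x)\|_{\ell^2}^p\,dx \Bigr)^{\frac{1}{p}}. \]
We define the vector maximal function by
\[ \bar{M} f(x) = \Bigl( \sum_{n \ge 1} |Mf_n(x)|^2 \Bigr)^{\frac12} = \|\{Mf_n(x)\}\|_{\ell^2}. \]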
The result we will prove is the following.

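Theorem 6.3.9. The operator $\bar{M}$ is weak type $(1, 1)$ and strong type $(p, p)$ for all $1 < p < \infty$. That is,
\[ |\{\bar{M} f > \alpha\}| \lesssim \alpha^{-1} \|f\|_{L^1} \]
and
\[ \|\bar{M} f\|_{L^p} \lesssim \|f\|_{L^p} \quad \text{for } 1 < p < \infty. \]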
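Remark 6.3.10. In the scalar case, the $L^\infty$ case was trivial. In the vector case, it is false! In particular, one can check that if $f_n(x) = \chi_{[2^{n-1}, 2^n]}(x)$ then $f \in L^\infty$ but $|\bar{M} f(x)|^2 \equiv \infty$. Instead, the trivial estimate is from $L^2$ to $L^2$. Indeed,
\begin{align*}
\|\bar{M} f\|_{L^2} &= \Bigl\| \Bigl( \sum_n |Mf_n|^2 \Bigr)^{\frac12} \Bigr\|_{L^2} \\
&\lesssim \Bigl( \int \sum_n |Mf_n(x)|^2\,dx \Bigr)^{\frac12} \\
&\lesssim \Bigl( \int \sum_n |f_n|^2\,dx \Bigr)^{\frac12} = \|f\|_{L^2},
\end{align*}
where we use that $M : L^2 \to L^2$ boundedly.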
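The failure of the $L^\infty$ bound in the example $f_n = \chi_{[2^{n-1}, 2^n]}$ can be verified by hand. The following is a sketch in dimension $d = 1$; the specific radius $2^{n+1}$ is a choice made here for convenience.

```latex
% The intervals [2^{n-1}, 2^n] overlap only at endpoints, so
\[
\|f(x)\|_{\ell^2}^2 = \sum_{n \ge 1} \chi_{[2^{n-1},2^n]}(x) \le 2
\quad \text{for every } x, \qquad \text{hence } f \in L^\infty .
\]
% Fix x and take any n with 2^n \ge |x|. Up to a null set, the ball
% B(x, 2^{n+1}) = (x - 2^{n+1}, x + 2^{n+1}) contains [2^{n-1}, 2^n], so
\[
M f_n(x) \ge \frac{1}{|B(x,2^{n+1})|} \int_{B(x,2^{n+1})} f_n(y)\,dy
= \frac{2^{n-1}}{2^{n+2}} = \frac{1}{8}.
\]
% Infinitely many n satisfy 2^n \ge |x|, and therefore
\[
|\bar{M} f(x)|^2 = \sum_{n \ge 1} |M f_n(x)|^2 \ge \sum_{n \,:\, 2^n \ge |x|} \frac{1}{64} = \infty .
\]
```

The point is that each component contributes a fixed positive amount to the maximal function at every point, even though the components themselves have essentially disjoint supports.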

We now have enough tools to prove Theorem 6.3.9 when p ∈ (2, ∞).
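Proof of Theorem 6.3.9 for $p \in (2, \infty)$. Recall from Theorem 6.3.6 that we have
\[ \int |Mf_n(x)|^2\, \omega(x)\,dx \lesssim \int |f_n(x)|^2\, M\omega(x)\,dx, \]
so that
\[ \int |\bar{M} f(x)|^2\, \omega(x)\,dx \lesssim \int \|f(x)\|_{\ell^2}^2\, M\omega(x)\,dx \]
for any locally integrable $\omega$. Now let $2 < p < \infty$ and set $q = (\tfrac{p}{2})'$. Then by duality we have
\begin{align*}
\|\bar{M} f\|_{L^p}^2 = \bigl\| |\bar{M} f|^2 \bigr\|_{L^{\frac{p}{2}}} &= \sup_{\|\omega\|_{L^q} = 1} \int |\bar{M} f|^2\, \omega\,dx \\
&\lesssim \sup_{\|\omega\|_{L^q} = 1} \int \|f(x)\|_{\ell^2}^2\, M\omega\,dx \\
&\lesssim \sup_{\|\omega\|_{L^q} = 1} \bigl\| \|f(x)\|_{\ell^2}^2 \bigr\|_{L^{\frac{p}{2}}}\, \|M\omega\|_{L^q} \lesssim \|f\|_{L^p}^2,
\end{align*}
where we have used that $M$ maps $L^q \to L^q$ boundedly. This completes the proof.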
It remains to establish the weak type (1, 1) bound in Theorem 6.3.9, for
then all of the remaining cases follow from Marcinkiewicz interpolation. For
this, we introduce the so-called Calderon–Zygmund decomposition.
Lemma 6.3.11 (Calderón–Zygmund decomposition). Let $f \in L^1(\mathbb{R}^d; \ell^2(\mathbb{C}))$ and set $\alpha > 0$. There exists a decomposition $f = g + b$ such that $g, b$ have the following properties:
• $\|g(x)\|_{\ell^2} \le \alpha$ for a.e. $x \in \mathbb{R}^d$.
• The support of $b$ is a union of nonoverlapping cubes $Q_k$, with
\[ \alpha < \frac{1}{|Q_k|} \int_{Q_k} \|b(x)\|_{\ell^2}\,dx \le 2^d \alpha. \]
• We have $g = f \bigl(1 - \sum_k \chi_{Q_k}\bigr)$.
Proof. We begin by decomposing $\mathbb{R}^d$ into a mesh of equal-sized nonoverlapping cubes whose common diameter is large enough that
\[ \frac{1}{|Q|} \int_Q \|f(x)\|_{\ell^2}\,dx \le \alpha \]
for all cubes in the mesh.

Let $Q$ be one of the cubes in this mesh. Subdivide $Q$ into $2^d$ congruent cubes, and let $Q'$ denote one of the resulting cubes. If
\[ \frac{1}{|Q'|} \int_{Q'} \|f(x)\|_{\ell^2}\,dx > \alpha, \]
stop and select $Q'$ as one of the cubes $Q_k$. Note that in this case we have
\[ \alpha < \frac{1}{|Q'|} \int_{Q'} \|f(x)\|_{\ell^2}\,dx \le \frac{2^d}{|Q|} \int_Q \|f(x)\|_{\ell^2}\,dx \le 2^d \alpha. \]
If instead
\[ \frac{1}{|Q'|} \int_{Q'} \|f(x)\|_{\ell^2}\,dx \le \alpha, \]
then we subdivide further into $2^d$ congruent cubes and repeat the same selection process for each of the resulting cubes. We continue subdividing until (if ever) we are forced into the first case.
We repeat this with all of the cubes in the mesh, leading to the collection of nonoverlapping cubes $Q_k$. We then define $b = f \cdot \sum_k \chi_{Q_k}$, which has the desired bounds by construction.
It remains to verify that $\|g(x)\|_{\ell^2} \le \alpha$ for a.e. $x \in \mathbb{R}^d$, where $g = f - b$. To this end, note that for any $x \notin \bigcup Q_k$, there exists a sequence of cubes $Q \ni x$ with diameter tending to zero and such that
\[ \frac{1}{|Q|} \int_Q \|f(y)\|_{\ell^2}\,dy \le \alpha. \]
Applying the Lebesgue differentiation theorem (to the integrable function $x \mapsto \|f(x)\|_{\ell^2}$), we deduce that
\[ \|f(x)\|_{\ell^2} \le \alpha \quad \text{for almost every } x \notin \bigcup Q_k. \]
As $f = g$ outside of $\bigcup Q_k$, the result follows.
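The stopping-time construction in the proof can be sketched in code in a simplified setting. The sketch assumes $d = 1$, a scalar nonnegative function that is piecewise constant on $2^k$ equal cells of a single starting interval, and that the average over the whole interval is at most $\alpha$; the function name and the index-range representation of intervals are choices made here.

```python
def calderon_zygmund_intervals(values, alpha):
    """Select the dyadic 'bad' intervals for a piecewise-constant function.

    values: nonnegative cell values on 2^k equal cells (k >= 0).
    Returns a sorted list of half-open index ranges (start, stop).
    Each selected range has average in (alpha, 2 * alpha]; every cell
    outside the selected ranges has value at most alpha.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "number of cells must be a power of two"
    assert sum(values) / n <= alpha, "starting interval must have average <= alpha"

    selected = []
    pending = [(0, n)]  # invariant: every pending interval has average <= alpha
    while pending:
        lo, hi = pending.pop()
        if hi - lo == 1:
            continue  # a single cell with value <= alpha belongs to the good part
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid, hi)):
            if sum(values[a:b]) / (b - a) > alpha:
                selected.append((a, b))  # stop: average is in (alpha, 2 * alpha]
            else:
                pending.append((a, b))  # keep subdividing
    return sorted(selected)


if __name__ == "__main__":
    cells = [0.0, 0.0, 0.0, 8.0, 1.0, 0.0, 0.0, 0.0]
    print(calderon_zygmund_intervals(cells, 2.0))
```

In one dimension each interval has two children, so a selected child has average at most twice that of its parent, which matches the factor $2^d$ in the lemma.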
Remark 6.3.12. The same proof and decomposition works for scalar valued $f$. In this case we can extend the result to get a mean zero condition for $b$, namely
\[ \frac{1}{|Q_k|} \int_{Q_k} b\,dx = 0 \]
for each $k$. Indeed, in this case we let
\[ g(x) = \begin{cases} f(x) & x \notin \bigcup Q_k, \\ \dfrac{1}{|Q_k|} \displaystyle\int_{Q_k} f(y)\,dy & x \in Q_k^\circ \end{cases} \]
and again set $b = f - g$. In this case we have $|g(x)| \le 2^d \alpha$, while
\[ \int_{Q_k} b(x)\,dx = \int_{Q_k} [f(x) - g(x)]\,dx = 0. \]
Note that we still have
\[ \frac{1}{|Q_k|} \int_{Q_k} |b(x)|\,dx \lesssim \alpha. \]
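Proof of Theorem 6.3.9. Let $\alpha > 0$ and $f \in L^1(\mathbb{R}^d; \ell^2(\mathbb{C}))$. We use the Calderón–Zygmund decomposition to write $f = g + b$ as above. In particular,
\[ |\{|\bar{M} f| > \alpha\}| \le |\{|\bar{M} g| > \tfrac12 \alpha\}| + |\{|\bar{M} b| > \tfrac12 \alpha\}|. \]
We can first estimate by Tchebychev and the strong type $(2, 2)$ bound for $\bar{M}$:
\[ |\{|\bar{M} g| > \tfrac12 \alpha\}| \lesssim \alpha^{-2} \|\bar{M} g\|_{L^2}^2 \lesssim \alpha^{-2} \|g\|_{L^2}^2. \]
Now, since $\|g(x)\|_{\ell^2} \le \alpha$ a.e., we have
\[ \alpha^{-2} \int \|g(x)\|_{\ell^2}^2\,dx \le \alpha^{-1} \int \|g(x)\|_{\ell^2}\,dx \le \alpha^{-1} \|g\|_{L^1} \le \alpha^{-1} \|f\|_{L^1}, \]
which is acceptable. It remains to treat the contribution of $b$.

Observe that by construction we have
\[ \Bigl| \bigcup Q_k \Bigr| \le \alpha^{-1} \|f\|_{L^1}. \]
In particular, writing $2Q_k$ for the dilate of $Q_k$ with the same center, we have
\[ \Bigl| \bigcup 2Q_k \Bigr| \lesssim 2^d \alpha^{-1} \|f\|_{L^1}, \]
and hence it suffices to show that
\[ \Bigl| \Bigl\{ x \in \bigl( \bigcup 2Q_k \bigr)^c : \bar{M} b > \tfrac12 \alpha \Bigr\} \Bigr| \lesssim \alpha^{-1} \|f\|_{L^1}. \]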
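For this, we introduce an averaged version of $b$ with components
\[ \tilde{b}_n = \sum_k \chi_{Q_k} \frac{1}{|Q_k|} \int_{Q_k} |f_n(y)|\,dy. \]
As we will see, this leads to a function $\tilde{b}$ that obeys a pointwise bound of $\alpha$, an $L^1$ bound of $\|\tilde{b}\|_{L^1} \lesssim \|f\|_{L^1}$, and a lower bound of $\bar{M}\tilde{b} \gtrsim \bar{M} b$ (on the complement of $\bigcup_k 2Q_k$). These ingredients will allow us to complete the estimation of the contribution of $b$ in the same way that we handled $g$.
We first observe that for $x \in Q_k$,
\[ \|\tilde{b}(x)\|_{\ell^2} = \Bigl\| \frac{1}{|Q_k|} \int_{Q_k} |f_n(y)|\,dy \Bigr\|_{\ell^2_n} \le \frac{1}{|Q_k|} \int_{Q_k} \|f(y)\|_{\ell^2}\,dy \lesssim \alpha. \]
This also shows
\[ \|\tilde{b}\|_{L^1} = \sum_k \int_{Q_k} \|\tilde{b}(x)\|_{\ell^2}\,dx \le \sum_k \int_{Q_k} \|f(y)\|_{\ell^2}\,dy \le \|f\|_{L^1}. \]
Finally, note that if $x \notin \bigcup_k 2Q_k$ and $B(x, r) \cap Q_k \ne \emptyset$, then $Q_k \subset B(x, 2r)$. Thus, letting $S = S_{r,x} = \{k : B(x, r) \cap Q_k \ne \emptyset\}$, we can estimate
\begin{align*}
M b_n(x) &= \sup_{r > 0} \frac{1}{|B(x,r)|} \sum_{k \in S} \int_{B(x,r) \cap Q_k} |b_n(y)|\,dy \\
&\lesssim \sup_{r > 0} \frac{1}{|B(x,2r)|} \sum_{k \in S} |Q_k| \cdot \frac{1}{|Q_k|} \int_{Q_k} |b_n(y)|\,dy \\
&\lesssim \sup_{r > 0} \frac{1}{|B(x,2r)|} \int_{B(x,2r)} \sum_k \chi_{Q_k}(z) \frac{1}{|Q_k|} \int_{Q_k} |b_n(y)|\,dy\,dz \\
&\lesssim M \tilde{b}_n(x),
\end{align*}
whence $\bar{M} b \lesssim \bar{M} \tilde{b}$ on $(\bigcup_k 2Q_k)^c$.

Using the above and arguing as we did for $g$,
\begin{align*}
\Bigl| \Bigl\{ x \notin \bigcup_k 2Q_k : \bar{M} b > \tfrac12 \alpha \Bigr\} \Bigr| &\lesssim |\{\bar{M} \tilde{b} \gtrsim \alpha\}| \\
&\lesssim \alpha^{-2} \|\bar{M} \tilde{b}\|_{L^2}^2 \\
&\lesssim \alpha^{-2} \|\tilde{b}\|_{L^2}^2 \\
&\lesssim \alpha^{-1} \|\tilde{b}\|_{L^1} \lesssim \alpha^{-1} \|f\|_{L^1},
\end{align*}
which is acceptable. This completes the proof.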
Remark 6.3.13. The theory of vector maximal functions can be extended

to `q instead of `2 , but we will not pursue this here.
6.4 Calderón–Zygmund theory

Recall that (in the setting of Sobolev embedding) we encountered the question of whether the Fourier multiplier operators defined via the symbol $m_j(\xi) = \frac{\xi_j}{|\xi|}$ (known as Riesz transforms) are bounded on $L^p$ spaces.
To answer this question (as well as to understand some other fundamen-
tal operators in harmonic analysis) we will need to develop what is known
as Calderón–Zygmund theory. This theory addresses the case of ‘singular
integral operators’. The precise definition we need is the following.
Definition 6.4.1. A Calderón–Zygmund convolution kernel is a function $K : \mathbb{R}^d \setminus \{0\} \to \mathbb{C}$ that obeys
(a) $|K(x)| \lesssim |x|^{-d}$ uniformly for $x \ne 0$,
(b) For any $0 < R_1 < R_2 < \infty$,
\[ \int_{R_1 \le |x| \le R_2} K(x)\,dx = 0, \]
(c) The estimate
\[ \int_{|x| \ge 2|y|} |K(x + y) - K(x)|\,dx \lesssim 1 \]
holds uniformly for $y \in \mathbb{R}^d$.

Item (b) is a cancellation condition. Item (c) is a type of regularity condition and is implied by the bound $|\nabla K(x)| \lesssim |x|^{-(d+1)}$, as we will see below.
Given a Calderón–Zygmund kernel K, we will consider the operator
defined via f 7→ K ∗ f .
Example 6.4.1 (Riesz transforms). Consider $m_j(\xi) = \frac{-i\xi_j}{|\xi|}$. The Fourier multiplier operator with symbol $m_j$ is a convolution operator with the kernel
\[ K_j(x) = \mathcal{F}^{-1} m_j(x) = -\partial_{x_j} \mathcal{F}^{-1}(|\xi|^{-1}). \]
Now recall that $\mathcal{F}^{-1}(|\xi|^{-1}) = c|x|^{1-d}$, and so
\[ K_j(x) = c\, x_j |x|^{-(d+1)} \]
for some $c$.
Using this, we readily observe that (a) holds. As for (b), let us consider (without loss of generality) the case $j = 1$ and $d \ge 2$. Define $A = \operatorname{diag}(-1, -1, 1, \dots, 1)$ and consider the change of variables $y = Ax$. As $\det A = 1$, $|y| = |x|$, and $y_1 = -x_1$, this shows
\[ \int_{R_1 \le |x| \le R_2} x_1 |x|^{-(d+1)}\,dx = -\int_{R_1 \le |x| \le R_2} x_1 |x|^{-(d+1)}\,dx, \]
which shows this integral is zero. Finally, for (c) we observe that
\[ |\nabla K_j| = O(|x|^{-(d+1)}), \]
and hence the desired inequality follows from Lemma 6.4.2 below.
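Lemma 6.4.2. A kernel $K$ obeys (c) whenever $|\nabla K(x)| \lesssim |x|^{-(d+1)}$.
Proof. We use the fundamental theorem of calculus to write
\[ K(x + y) - K(x) = \int_0^1 \nabla K(x + \theta y) \cdot y\,d\theta. \]
Thus
\begin{align*}
\int_{|x| \ge 2|y|} |K(x + y) - K(x)|\,dx &\lesssim \int_0^1 \int_{|x| \ge 2|y|} |y|\, |\nabla K(x + \theta y)|\,dx\,d\theta \\
&\lesssim |y| \int_0^1 \int_{|x| \ge 2|y|} |x + \theta y|^{-(d+1)}\,dx\,d\theta \\
&\lesssim |y| \int_{|x| \ge 2|y|} |x|^{-(d+1)}\,dx \lesssim 1.
\end{align*}
In the above, we have used
\[ |x + \theta y| \ge |x| - \theta |y| \ge \tfrac12 |x| \]
for $|x| \ge 2|y|$, which implies
\[ |x + \theta y|^{-(d+1)} \lesssim |x|^{-(d+1)}. \]
We also used
\[ \int_{|x| \ge 2|y|} |x|^{-(d+1)}\,dx \lesssim |y|^{-1}, \]
which can be seen by changing to spherical coordinates:
\[ \int_{|x| \ge 2|y|} |x|^{-(d+1)}\,dx = \int_{S^{d-1}} \int_{2|y|}^{\infty} r^{-(d+1)} r^{d-1}\,dr\,d\omega \lesssim \int_{2|y|}^{\infty} r^{-2}\,dr \lesssim |y|^{-1}. \]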

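Example 6.4.2 (Hilbert transform). Define $K : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ by $K(x) = \frac{1}{\pi x}$. This kernel defines the Hilbert transform
\[ f \mapsto \frac{1}{\pi} \int \frac{f(x-y)}{y}\,dy. \]
This kernel clearly satisfies (a) and (b), while to verify (c) we compute the derivative $K'(x) = -\frac{1}{\pi x^2}$ and apply Lemma 6.4.2 with $d = 1$.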
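As a sanity check on condition (c) for the Hilbert kernel, the integral can also be evaluated numerically. The following is a minimal sketch in plain Python, not part of the original argument; the function names `hilbert_kernel` and `condition_c_integral` and the choice of a midpoint rule are assumptions made for illustration. A direct computation with partial fractions suggests that the integral equals $\log(3)/\pi$ for every $y \neq 0$, which is what the sketch compares against.

```python
import math

def hilbert_kernel(x):
    """K(x) = 1/(pi x), the kernel from Example 6.4.2."""
    return 1.0 / (math.pi * x)

def condition_c_integral(kernel, y, n=20_000):
    """Midpoint-rule approximation (d = 1) of the integral of
    |K(x + y) - K(x)| over |x| >= 2|y|, for y != 0."""
    a = 2.0 * abs(y)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h          # u runs over (0, 1)
        x = a / u                  # x = a/u maps (0, 1) onto (a, infinity)
        jac = a / (u * u)          # |dx/du|
        for sign in (1.0, -1.0):   # the half-lines x >= a and x <= -a
            total += abs(kernel(sign * x + y) - kernel(sign * x)) * jac * h
    return total

# The value should not depend on y (scale invariance of condition (c)).
for y in (0.01, 1.0, 50.0):
    print(y, condition_c_integral(hilbert_kernel, y), math.log(3.0) / math.pi)
```

The independence of $y$ reflects the scaling structure of (c): the bound is required uniformly in $y$, and for a kernel homogeneous of degree $-d$ the integral is exactly scale invariant.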
We turn to our first main result, which shows that Calderón–Zygmund operators define bounded operators on $L^2$ (i.e. are type (2, 2)).
Theorem 6.4.3. Let $K$ be a Calderón–Zygmund convolution kernel. Given $\varepsilon > 0$, define
\[ K_\varepsilon(x) = \chi_{\{\varepsilon < |x| < \varepsilon^{-1}\}}(x) K(x). \]
Then for all $f \in \mathcal{S}$,
\[ \|K_\varepsilon * f\|_{L^2} \lesssim \|f\|_{L^2} \quad \text{uniformly in } \varepsilon > 0. \]
Consequently, the operator
\[ f \mapsto K * f := \lim_{\varepsilon \to 0} K_\varepsilon * f \]
extends from Schwartz space to a bounded operator on $L^2$.

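Proof. Let $f \in \mathcal{S}$. We will show that $\{K_\varepsilon * f\}_{\varepsilon > 0}$ is Cauchy in $L^2$ as $\varepsilon \to 0$. This implies that $K_\varepsilon * f$ converges in $L^2$ to a limit denoted by $K * f$. We will also show that $f \mapsto K_\varepsilon * f$ is bounded on $L^2$ uniformly in $\varepsilon > 0$. In this case, we obtain
\[ \|K * f\|_{L^2} \lesssim \|K_\varepsilon * f\|_{L^2} + o(1) \quad \text{as } \varepsilon \to 0 \]
and hence
\[ \|K * f\|_{L^2} \lesssim \|f\|_{L^2}. \]
Now let $0 < \varepsilon_1 < \varepsilon_2$. Then we may write
\[ K_{\varepsilon_1} * f(x) - K_{\varepsilon_2} * f(x) = \int_{\varepsilon_1 \le |y| \le \varepsilon_2} K(y) f(x-y)\,dy + \int_{\varepsilon_2^{-1} \le |y| \le \varepsilon_1^{-1}} K(y) f(x-y)\,dy. \]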
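Consider first the contribution of $\varepsilon_1 \le |y| \le \varepsilon_2$. Using the cancellation condition (b), we may write this term as
\[ \int_{\varepsilon_1 \le |y| \le \varepsilon_2} K(y)[f(x-y) - f(x)]\,dy, \]
and by the fundamental theorem of calculus and the bound in (a), its absolute value is bounded by
\[ \int_{\varepsilon_1 \le |y| \le \varepsilon_2} |K(y)|\,|y| \int_0^1 |\nabla f(x - \theta y)|\,d\theta\,dy \lesssim \int_{\varepsilon_1 \le |y| \le \varepsilon_2} |y|^{1-d} \|\nabla f\|_{L^\infty(\{|y| \sim |x|\})}\,dy. \]
Because $f$ is a Schwartz function, we may bound this gradient term by $\langle x \rangle^{-100d}$ (where $\langle x \rangle = \sqrt{1 + |x|^2}$). Thus, performing the integral in $y$, we find
\[ \Big\| \int_{\varepsilon_1 \le |y| \le \varepsilon_2} K(y) f(x-y)\,dy \Big\|_{L^2_x} \lesssim \varepsilon_2 - \varepsilon_1 \to 0 \quad \text{as } \varepsilon_2, \varepsilon_1 \to 0. \]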
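It remains to treat the range $\varepsilon_2^{-1} \le |y| \le \varepsilon_1^{-1}$. For this, we use the convolution inequality Lemma 6.2.3 and (a) to bound
\[ \begin{aligned}
\Big\| \int_{\varepsilon_2^{-1} \le |y| \le \varepsilon_1^{-1}} K(y) f(x-y)\,dy \Big\|_{L^2}
&\lesssim \| K \chi_{\{\varepsilon_2^{-1} \le |y| \le \varepsilon_1^{-1}\}} * f \|_{L^2} \\
&\lesssim \|f\|_{L^1} \| K \chi_{\{\varepsilon_2^{-1} \le |y| \le \varepsilon_1^{-1}\}} \|_{L^2} \\
&\lesssim \|f\|_{L^1} \Big( \int_{\varepsilon_2^{-1} \le |x| \le \varepsilon_1^{-1}} |x|^{-2d}\,dx \Big)^{\frac12} \\
&\lesssim \|f\|_{L^1}\, \varepsilon_2^{\frac{d}{2}} \to 0 \quad \text{as } \varepsilon_2 \to 0.
\end{aligned} \]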
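Now we need to establish $L^2 \to L^2$ bounds for $K_\varepsilon$ that are uniform in $\varepsilon > 0$. To do this, we will show that the $K_\varepsilon$ themselves are Calderón–Zygmund convolution kernels. In fact, properties (a) and (b) are straightforward to see, so we will focus on (c). We need to bound
\[ \int_{|x| \ge 2|y|} |K_\varepsilon(x+y) - K_\varepsilon(x)|\,dx. \]
To this end, recall that $K_\varepsilon = K\chi$, where to simplify notation we write $\chi = \chi_{\{\varepsilon \le |\cdot| \le \varepsilon^{-1}\}}$. Then
\[ K_\varepsilon(x+y) - K_\varepsilon(x) = [K(x+y) - K(x)]\chi(x+y) + K(x)[\chi(x+y) - \chi(x)]. \]
The contribution of the first term above is uniformly bounded due to the assumption that $K$ is a Calderón–Zygmund kernel. Now consider the second term. The part containing the characteristic function is nonzero if
\[ \varepsilon \le |x+y| \le \varepsilon^{-1} \quad \text{and} \quad \big( |x| < \varepsilon \ \text{ or } \ |x| > \varepsilon^{-1} \big) \tag{6.7} \]
or if
\[ \big( |x+y| < \varepsilon \ \text{ or } \ |x+y| > \varepsilon^{-1} \big) \quad \text{and} \quad \varepsilon \le |x| \le \varepsilon^{-1}. \tag{6.8} \]
Consider the scenario (6.7). Suppose $|x| < \varepsilon$. Then the bounds $|x+y| \ge \varepsilon$ and $|x| \ge 2|y|$ imply that $|x| \ge \frac23 \varepsilon$. Thus we are led to estimate
\[ \int_{\frac23\varepsilon < |x| < \varepsilon} |K(x)|\,dx \lesssim \int_{\frac23\varepsilon < |x| < \varepsilon} |x|^{-d}\,dx \lesssim 1 \]
uniformly in $\varepsilon > 0$. If instead $|x| > \varepsilon^{-1}$, then we similarly deduce that $|x| < 2\varepsilon^{-1}$, and we instead estimate
\[ \int_{\varepsilon^{-1} < |x| < 2\varepsilon^{-1}} |x|^{-d}\,dx \lesssim 1 \]
uniformly in $\varepsilon > 0$.
The scenario (6.8) is similar. That is, if $|x+y| < \varepsilon$ then we deduce $\varepsilon \le |x| \le 2\varepsilon$, while if $|x+y| > \varepsilon^{-1}$ then we deduce $\frac23 \varepsilon^{-1} \le |x| \le \varepsilon^{-1}$. In particular, we obtain a suitable estimate in either case.
This completes the proof that $K_\varepsilon$ is a Calderón–Zygmund kernel (with implicit bounds independent of $\varepsilon$).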
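To complete the proof, we will show that $\|\hat K_\varepsilon\|_{L^\infty} \lesssim 1$ (uniformly in $\varepsilon > 0$), which implies that $K_\varepsilon : L^2 \to L^2$ boundedly (uniformly in $\varepsilon$). Indeed,
\[ \|K_\varepsilon * f\|_{L^2} \sim \|\hat K_\varepsilon \hat f\|_{L^2} \lesssim \|\hat K_\varepsilon\|_{L^\infty} \|f\|_{L^2} \]
by Plancherel's theorem.
We fix $\xi \in \mathbb{R}^d \setminus \{0\}$. Then (up to constants depending only on $\pi$, $d$),
\[ \hat K_\varepsilon(\xi) = \int e^{-ix\xi} K_\varepsilon(x)\,dx = \int_{|x| \le |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx + \int_{|x| > |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx. \]
Using condition (b) and then (a),
\[ \Big| \int_{|x| \le |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx \Big| = \Big| \int_{|x| \le |\xi|^{-1}} [e^{-ix\xi} - 1] K_\varepsilon(x)\,dx \Big| \lesssim \int_{|x| \le |\xi|^{-1}} |x|^{1-d} |\xi|\,dx \lesssim 1 \]
uniformly in $\xi$. The remaining region will take a bit more effort.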

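We begin by writing
\[ \begin{aligned}
2 \int_{|x| > |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx
&= \int_{|x| > |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx - \int_{|x| > |\xi|^{-1}} e^{-ix\xi} e^{i\pi} K_\varepsilon(x)\,dx \\
&= \int_{|x| > |\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx - \int_{|x + \frac{\pi\xi}{|\xi|^2}| > |\xi|^{-1}} e^{-ix\xi} K_\varepsilon\big(x + \tfrac{\pi\xi}{|\xi|^2}\big)\,dx,
\end{aligned} \]
where we have written
\[ e^{\pi i} = e^{\pi i \xi \cdot \frac{\xi}{|\xi|^2}} \]
and performed a change of variables in the second integral.

Now let us rewrite the integral above as the sum of three pieces, namely,
\[ \int_{|x| \ge 2\pi|\xi|^{-1}} e^{-ix\xi} \big[ K_\varepsilon(x) - K_\varepsilon\big(x + \tfrac{\pi\xi}{|\xi|^2}\big) \big]\,dx, \tag{6.9} \]
\[ \int_{|\xi|^{-1} \le |x| \le 2\pi|\xi|^{-1}} e^{-ix\xi} K_\varepsilon(x)\,dx, \tag{6.10} \]
\[ -\int_{R} e^{-ix\xi} K_\varepsilon\big(x + \tfrac{\pi\xi}{|\xi|^2}\big)\,dx, \tag{6.11} \]
where $R$ is the region
\[ R = \big\{ |x| \le 2\pi|\xi|^{-1},\ \big|x + \tfrac{\pi\xi}{|\xi|^2}\big| > |\xi|^{-1} \big\}. \]
The term (6.9) is bounded uniformly by property (c). The second term (6.10) is bounded uniformly by property (a). For the third term, we note that in the region $R$ we have
\[ |\xi|^{-1} \le \big|x + \tfrac{\pi\xi}{|\xi|^2}\big| \le 3\pi|\xi|^{-1}, \]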
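and hence term (6.11) is again uniformly bounded by property (a). This completes the proof.
Our next main result states that Calderón–Zygmund operators are weak type (1, 1) and strong type (p, p) for all $1 < p < \infty$.
Theorem 6.4.4. Let $K$ be a Calderón–Zygmund convolution kernel and let $K_\varepsilon$ be as in Theorem 6.4.3. Then the following bounds hold (uniformly in $\varepsilon > 0$):
• $|\{|K_\varepsilon * f| > \alpha\}| \lesssim \alpha^{-1} \|f\|_{L^1}$,
• $\|K_\varepsilon * f\|_{L^p} \lesssim \|f\|_{L^p}$ for all $1 < p < \infty$.
Proof. Given $\alpha > 0$ and $f \in L^1$, we perform a Calderón–Zygmund decomposition and write $f = g + b$, where the support of $b$ is a union of nonoverlapping cubes $Q_k$, with
\[ \frac{1}{|Q_k|} \int_{Q_k} b = 0 \quad \text{and} \quad \frac{1}{|Q_k|} \int_{Q_k} |b| \lesssim \alpha \quad \text{for each } k, \]
and $|g| \lesssim \alpha$ a.e.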
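To make the decomposition $f = g + b$ concrete, here is a minimal sketch of one standard dyadic stopping-time construction in dimension $d = 1$, for a step function on the unit cells of $[0, 2^m)$. This is an illustration under assumptions and not necessarily the construction used in the text: the function name `cz_decompose`, the discretization, and the stopping rule "select a dyadic interval as soon as the average of $|f|$ exceeds $\alpha$" are all choices made here. It assumes $\alpha$ is at least the average of $|f|$ over the whole interval.

```python
import random

def cz_decompose(f, alpha):
    """Dyadic Calderon-Zygmund decomposition of a step function on [0, len(f)).

    f[i] is the value on the unit cell [i, i+1); len(f) must be a power of two.
    Returns (g, b, cubes) with f = g + b, where cubes lists the selected
    dyadic intervals as pairs (lo, hi)."""
    n = len(f)
    g = list(f)
    b = [0.0] * n
    cubes = []

    def visit(lo, hi):
        size = hi - lo
        if sum(abs(v) for v in f[lo:hi]) / size > alpha:
            # Stopping rule: the average of |f| exceeds alpha, so select this cube.
            mean = sum(f[lo:hi]) / size
            for i in range(lo, hi):
                g[i] = mean          # g is the average of f on a selected cube
                b[i] = f[i] - mean   # b has mean zero on each selected cube
            cubes.append((lo, hi))
        elif size > 1:
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, n)
    return g, b, cubes

random.seed(0)
f = [random.expovariate(1.0) * random.choice((-1, 1)) for _ in range(256)]
l1 = sum(abs(v) for v in f)
alpha = 2.0 * l1 / len(f)   # above the global average, so [0, 256) is not selected
g, b, cubes = cz_decompose(f, alpha)
print(len(cubes), max(abs(v) for v in g), alpha)
```

In this discrete model the parent of every selected interval has average at most $\alpha$, which is what gives $|g| \le 2\alpha$ and the bound on the total length of the selected intervals by $\alpha^{-1}\|f\|_{L^1}$.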

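We then have
\[ |\{|K_\varepsilon * f| > \alpha\}| \le |\{|K_\varepsilon * g| > \tfrac12 \alpha\}| + |\{|K_\varepsilon * b| > \tfrac12 \alpha\}|. \]
The contribution of $g$ is handled in a straightforward fashion. We have by Tchebychev's inequality,
\[ |\{|K_\varepsilon * g| > \tfrac12 \alpha\}| \lesssim \alpha^{-2} \|K_\varepsilon * g\|_{L^2}^2 \lesssim \alpha^{-2} \|g\|_{L^2}^2 \lesssim \alpha^{-1} \|f\|_{L^1}, \]
where we recall that $|g| \lesssim \alpha$ and the definition of $g$ (cf. Remark 6.3.12).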
Now consider the contribution of $b$. We let $Q_k^*$ denote the cube centered at $x_k$ (the center of $Q_k$) and dilated by $2\sqrt{d}$. Then
\[ \big| \cup Q_k^* \big| \le \sum_k |Q_k^*| = (2\sqrt{d})^d \sum_k |Q_k| \lesssim \alpha^{-1} \|f\|_{L^1} \]
by construction of the $Q_k$, and hence we only need to show
\[ \big| \{ x \in (\cup Q_k^*)^c : |K_\varepsilon * b(x)| > \tfrac12 \alpha \} \big| \lesssim \alpha^{-1} \|f\|_{L^1}. \]
We begin with an application of Tchebychev's inequality to bound this measure by
\[ \alpha^{-1} \int_{\cap (Q_k^*)^c} |K_\varepsilon * b|\,dx. \tag{6.12} \]
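We now use the mean zero condition on $b$ to write
\[ K_\varepsilon * b(x) = \sum_k \int_{Q_k} [K_\varepsilon(x-y) - K_\varepsilon(x - x_k)]\, b(y)\,dy. \]
Thus
\[ \begin{aligned}
(6.12) &\lesssim \alpha^{-1} \int_{\cap (Q_k^*)^c} \sum_k \int_{Q_k} |K_\varepsilon(x-y) - K_\varepsilon(x - x_k)|\,|b(y)|\,dy\,dx \\
&\lesssim \alpha^{-1} \sum_k \int_{Q_k} \int_{(Q_k^*)^c} |K_\varepsilon(x-y) - K_\varepsilon(x - x_k)|\,dx\,|b(y)|\,dy.
\end{aligned} \]
Now write
\[ \int_{(Q_k^*)^c} |K_\varepsilon(x-y) - K_\varepsilon(x - x_k)|\,dx = \int_{-\{x_k\} + (Q_k^*)^c} |K_\varepsilon(x + x_k - y) - K_\varepsilon(x)|\,dx. \]
Now we claim that $|x| \ge 2|x_k - y|$ for $x, y$ in the appropriate sets above. Indeed, writing $\ell(Q_k)$ for the sidelength of $Q_k$, we first have $|x_k - y| \le \frac{\sqrt{d}}{2} \ell(Q_k)$ for $y \in Q_k$, while $|x| \ge \sqrt{d}\, \ell(Q_k)$. Thus condition (c) applies and this integral is bounded uniformly, leading to
\[ (6.12) \lesssim \alpha^{-1} \sum_k \int_{Q_k} |b(y)|\,dy \lesssim \alpha^{-1} \|f\|_{L^1}, \]
as desired.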
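Marcinkiewicz interpolation now yields (p, p) bounds for $1 < p \le 2$ (uniformly in $\varepsilon$). It remains to treat the case $2 < p < \infty$. To this end, we fix $2 < p < \infty$ and write
\[ \begin{aligned}
\|K_\varepsilon * f\|_{L^p} &= \sup_{\|g\|_{L^{p'}} = 1} \iint K_\varepsilon(x-y) f(y) \bar g(x)\,dx\,dy \\
&= \sup \int f(y) \int K_\varepsilon(x-y) \bar g(x)\,dx\,dy \\
&= \sup \langle f, \bar K_\varepsilon(-\cdot) * g \rangle \\
&\lesssim \|f\|_{L^p} \sup \|\bar K_\varepsilon(-\cdot) * g\|_{L^{p'}} \\
&\lesssim \|f\|_{L^p},
\end{aligned} \]
where we have used $L^{p'}$ boundedness for $\bar K_\varepsilon(-\cdot)$, which follows from the fact that $1 < p' \le 2$. This completes the proof.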
Remark 6.4.5. Let us briefly summarize the ideas of the proofs above. Essentially, what we showed is that the three conditions defining a Calderón–Zygmund kernel $K$ guarantee that $\hat K$ is bounded, which yields the $L^2 \to L^2$ bounds by Plancherel. Then, using a Calderón–Zygmund decomposition and item (c), we showed that Calderón–Zygmund kernels have weak type (1, 1) bounds. Interpolation yields (p, p) bounds for $1 < p < 2$, and a duality argument yields (p, p) bounds for $2 < p < \infty$.
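Remark 6.4.6. Boundedness in $L^1$ and $L^\infty$ can fail. To see this, consider again the Hilbert transform
\[ Hf(x) = \frac{1}{\pi} \int \frac{f(x-y)}{y}\,dy. \]
Let $f = \chi_{[a,b]} \in L^\infty \cap L^1$. We claim that
\[ H\chi_{[a,b]}(x) = \frac{1}{\pi} \log \frac{|x-a|}{|x-b|}, \]
which belongs to neither $L^1$ nor $L^\infty$. Indeed, we can write
\[ H\chi_{[a,b]}(x) = \frac{1}{\pi} \lim_{\varepsilon \to 0} \int_{\varepsilon \le |y| \le \varepsilon^{-1}} \frac{\chi_{[a,b]}(x-y)}{y}\,dy. \]
Now note that $\chi_{[a,b]}(x-y) \neq 0$ if and only if $x - b \le y \le x - a$. Thus if $x > b$, then for $\varepsilon$ sufficiently small we get
\[ H\chi_{[a,b]}(x) = \frac{1}{\pi} \int_{x-b}^{x-a} \frac{1}{y}\,dy = \frac{1}{\pi} \log \frac{x-a}{x-b}. \]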
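The claimed formula for $H\chi_{[a,b]}$ can be sanity-checked numerically for points to the left of, inside, and to the right of $[a, b]$. The sketch below is an illustration only: the helper names `truncated_hilbert`, `indicator`, and `claimed_formula`, the choice $[a, b] = [0, 1]$, and the logarithmic change of variables are assumptions made here, not part of the text.

```python
import math

def truncated_hilbert(f, x, eps, n=100_000):
    """(1/pi) * integral over eps <= |y| <= 1/eps of f(x - y)/y dy.

    Pairing y with -y and substituting y = exp(t) (so dy/y = dt) gives the
    integral of f(x - e^t) - f(x + e^t) over log(eps) <= t <= log(1/eps),
    which is approximated here by the midpoint rule."""
    lo, hi = math.log(eps), -math.log(eps)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = math.exp(lo + (i + 0.5) * h)
        total += (f(x - y) - f(x + y)) * h
    return total / math.pi

a, b = 0.0, 1.0

def indicator(t):
    """Characteristic function of [a, b]."""
    return 1.0 if a <= t <= b else 0.0

def claimed_formula(x):
    return math.log(abs(x - a) / abs(x - b)) / math.pi

for x in (-1.0, 0.25, 2.0):
    print(x, truncated_hilbert(indicator, x, 1e-4), claimed_formula(x))
```

The logarithmic blow-up of `claimed_formula` near $x = a$ and $x = b$ is the failure of $L^\infty$ boundedness, while the decay like $1/|x|$ at infinity is the failure of $L^1$ boundedness.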
Similar considerations treat the cases $a < x < b$ and $x < a$.
6.5 Exercises
Exercise 6.5.1. [...] 1. Use this fact to fill in the details in Remark 6.1.6.
Exercise 6.5.2. Prove the estimates (6.5) and (6.6). Hint: Write the norms
in terms of the integral of the distribution function.
Exercise 6.5.3. Use the Hardy–Littlewood maximal inequality (Theorem 6.3.2)
to prove the Lebesgue differentiation theorem (Proposition 6.3.5).
Exercise 6.5.4. Fill in the details of the proof of Theorem 6.3.6.
Exercise 6.5.5. Show that if $f_n(x) = \chi_{[2^{n-1}, 2^n]}(x)$ then $f = \{f_n\} \in L^\infty$ but
|M̄ f (x)|2 ≡ ∞.
Exercise 6.5.6. Fill in the details in Remark 6.4.6.
Chapter 7
Classical harmonic analysis, part II
In this chapter we primarily focus on the topics of Littlewood–Paley theory and the theory of oscillatory integrals.
We define a partition of unity to be used throughout the chapter as follows. We let $\varphi : \mathbb{R}^d \to [0, 1]$ be a smooth ($C^\infty$) function such that
\[ \varphi(x) = \begin{cases} 1 & |x| \le 1.4 \\ 0 & |x| > 1.42. \end{cases} \]
We let $\psi : \mathbb{R}^d \to [0, 1]$ be given by $\psi(x) = \varphi(x) - \varphi(2x)$. For $N \in 2^{\mathbb{Z}}$, we set $\psi_N(x) = \psi(\frac{x}{N})$. It follows that
\[ \sum_{N \in 2^{\mathbb{Z}}} \psi_N(x) = 1 \quad \text{almost everywhere.} \]
Indeed,
\[ \sum_{N_1 \le N \le N_2} \psi_N(x) = \varphi\big(\tfrac{x}{N_2}\big) - \varphi\big(\tfrac{2x}{N_1}\big). \]
In what follows, sums over $N$ will be understood to be indexed by $N \in 2^{\mathbb{Z}}$.
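The telescoping identity behind this partition of unity can be checked numerically once a concrete bump function is fixed. The sketch below works in $d = 1$ and uses one admissible choice of the bump built from $e^{-1/t}$; this specific profile, and the names `smooth_step`, `phi`, `psi`, `psi_N`, and `dyadic_sum`, are assumptions made for illustration, since the text only prescribes where the bump equals 1 and where it vanishes.

```python
import math

def smooth_step(t):
    """A C-infinity step: 0 for t <= 0, 1 for t >= 1."""
    def h(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def phi(x):
    """One admissible bump (d = 1): 1 for |x| <= 1.4, 0 for |x| >= 1.42."""
    return 1.0 - smooth_step((abs(x) - 1.4) / 0.02)

def psi(x):
    return phi(x) - phi(2.0 * x)

def psi_N(N, x):
    return psi(x / N)

def dyadic_sum(x, k_min=-60, k_max=60):
    """Sum of psi_N(x) over N = 2^k with k_min <= k <= k_max."""
    return sum(psi_N(2.0 ** k, x) for k in range(k_min, k_max + 1))

for x in (1e-6, 0.3, 1.0, 1.41, 7.0, -123.0, 1e6):
    print(x, dyadic_sum(x))
```

For a fixed $x \neq 0$ only the one or two values of $N$ with $0.7 < |x|/N < 1.42$ contribute, which is why a finite range of $k$ suffices for $x$ in any fixed compact subset of $\mathbb{R} \setminus \{0\}$.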
7.1 Mihlin multiplier theorem

Recall the notion of a Fourier multiplier operator, i.e. an operator of the
form
Tm = F −1 mF for some m : Rd → C.
It is a simple consequence of Plancherel's theorem that $T_m$ maps $L^2 \to L^2$ boundedly whenever $m \in L^\infty$. The following theorem concerns the more general question of $L^p$ boundedness for Fourier multiplier operators.
Theorem 7.1.1 (Mihlin multiplier theorem). Suppose $m : \mathbb{R}^d \setminus \{0\} \to \mathbb{C}$ satisfies
\[ |\partial_\xi^\alpha m(\xi)| \lesssim |\xi|^{-|\alpha|} \tag{7.1} \]
uniformly in $\xi \in \mathbb{R}^d \setminus \{0\}$ and for all multiindices $\alpha$ of order $0 \le |\alpha| \le \lceil \frac{d+1}{2} \rceil$. Then $T_m = \mathcal{F}^{-1} m \mathcal{F}$ is bounded on $L^p$ for all $1 < p < \infty$.
Proof. Observe that $T_m$ is given by convolution with $\check m = \mathcal{F}^{-1} m$. Now, as just mentioned, the $L^2 \to L^2$ bound for $T_m$ follows from the assumption that $m$ is bounded (which is just (7.1) with $|\alpha| = 0$).
Thus, revisiting the proof of $L^p$ boundedness for Calderón–Zygmund operators (cf. Remark 6.4.5 following Theorem 6.4.4), we only need to verify that condition (c) holds for $\check m$, i.e.
\[ \int_{|x| \ge 2|y|} |\check m(x+y) - \check m(x)|\,dx \lesssim 1 \quad \text{uniformly in } y. \]
Recall that this condition would be implied by the estimate $|\nabla \check m(x)| \lesssim |x|^{-(d+1)}$ uniformly in $x \in \mathbb{R}^d \setminus \{0\}$ (cf. Lemma 6.4.2). Let us first see that this stronger condition holds if we assume (7.1) holds up to $|\alpha| \le d + 2$.
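We write
\[ m(\xi) = \sum_N m_N(\xi), \quad \text{where } m_N = \psi_N m. \]
By the product rule, for a given multiindex $\alpha$, we have
\[ |\partial_\xi^\alpha [\xi m_N]| = \Big| \sum_{\alpha_1 + \alpha_2 = \alpha} c_{\alpha_1, \alpha_2}\, \partial_\xi^{\alpha_1}(\xi m(\xi))\, \partial_\xi^{\alpha_2} \psi_N \Big| \lesssim \sum_{\alpha_1 + \alpha_2 = \alpha} N^{-|\alpha_2|} |\xi|^{1 - |\alpha_1|} \chi_{\{|\xi| \sim N\}}. \]
Thus
\[ \| x^\alpha \nabla \check m_N \|_{L^\infty} \lesssim \| \partial_\xi^\alpha [\xi m_N] \|_{L^1} \lesssim \sum_{\alpha_1 + \alpha_2 = \alpha} \int_{|\xi| \sim N} |\xi|^{1 - |\alpha_1|} N^{-|\alpha_2|}\,d\xi \lesssim N^{d + 1 - |\alpha|}. \]
Applying this with $|\alpha| = 0$ and $|\alpha| = d + 2$ yields
\[ |\nabla \check m_N(x)| \lesssim \min\{ N^{d+1},\ N^{-1} |x|^{-(d+2)} \} \]
uniformly in $x$. Thus
\[ |\nabla \check m(x)| \lesssim \sum_{N \le |x|^{-1}} N^{d+1} + \sum_{N > |x|^{-1}} N^{-1} |x|^{-(d+2)} \lesssim |x|^{-(d+1)}, \]
as needed.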
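The last step is a pair of geometric sums over dyadic $N$, each dominated by its term nearest $N = |x|^{-1}$. The following sketch checks this numerically; the function name `dyadic_gradient_bound`, the truncation of the dyadic range, and the constant 4 used in the comparison are assumptions made for illustration.

```python
def dyadic_gradient_bound(x_norm, d, k_range=200):
    """Sum over N = 2^k of min-type bounds, split at N = 1/|x|:
    N^(d+1) for N <= 1/|x| and N^(-1) |x|^(-(d+2)) for N > 1/|x|."""
    low = 0.0
    high = 0.0
    for k in range(-k_range, k_range + 1):
        N = 2.0 ** k
        if N <= 1.0 / x_norm:
            low += N ** (d + 1)
        else:
            high += x_norm ** (-(d + 2)) / N
    return low + high

# The ratio to |x|^(-(d+1)) stays bounded independently of |x|.
for d in (1, 2, 3):
    for x_norm in (1e-3, 0.37, 1.0, 5.0, 1e3):
        ratio = dyadic_gradient_bound(x_norm, d) / x_norm ** (-(d + 1))
        print(d, x_norm, ratio)
```

The same mechanism (split the dyadic sum at the natural scale and sum two geometric series) reappears in the proof of condition (c) under the weaker hypothesis.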
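Let us now prove condition (c) assuming (7.1) holds only up to $|\alpha| \le \lceil \frac{d+1}{2} \rceil$. By Plancherel and the computation above,
\[ \int |x^\alpha \check m_N(x)|^2\,dx \sim \int |\partial_\xi^\alpha m_N(\xi)|^2\,d\xi \lesssim \sum_{\alpha_1 + \alpha_2 = \alpha} \int_{|\xi| \sim N} |\xi|^{-2|\alpha_1|} N^{-2|\alpha_2|}\,d\xi \lesssim N^{d - 2|\alpha|}. \]
Using Cauchy–Schwarz and applying the above with $|\alpha| = 0$, we find
\[ \int_{|x| \le R} |\check m_N(x)|\,dx \lesssim (NR)^{\frac{d}{2}}. \]
On the other hand, applying the above with $|\alpha| = \lceil \frac{d+1}{2} \rceil$,
\[ \int_{|x| > R} |\check m_N(x)|\,dx \le \Big( \int |x^\alpha \check m_N(x)|^2\,dx \Big)^{\frac12} \Big( \int_{|x| > R} |x|^{-2|\alpha|}\,dx \Big)^{\frac12} \lesssim (NR)^{\frac{d}{2} - \lceil \frac{d+1}{2} \rceil}. \]
Choosing $R \sim \frac{1}{N}$, we find
\[ \int |\check m_N(x)|\,dx \lesssim 1 \quad \text{uniformly in } N. \]
Arguing in the same way, we have
\[ \int |\partial^\beta \check m_N(x)|\,dx \lesssim N^{|\beta|}. \]
In particular, this shows
\[ \int |\check m_N(x+y) - \check m_N(x)|\,dx \lesssim N|y|. \]
Thus we have
\[ \begin{aligned}
\int_{|x| \ge 2|y|} |\check m(x+y) - \check m(x)|\,dx &\le \sum_N \int_{|x| \ge 2|y|} |\check m_N(x+y) - \check m_N(x)|\,dx \\
&\lesssim \sum_{N \le |y|^{-1}} N|y| + \sum_{N > |y|^{-1}} \int_{|x| \ge |y|} |\check m_N(x)|\,dx \\
&\lesssim 1 + \sum_{N > |y|^{-1}} (N|y|)^{\frac{d}{2} - \lceil \frac{d+1}{2} \rceil} \lesssim 1.
\end{aligned} \]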
Remark 7.1.2. This result is sharp in the sense that $L^1$ and $L^\infty$ bounds can fail. To see this, let us re-use the Hilbert transform example, which essentially corresponds to taking $\check m(x) = \frac{1}{x}$ in $d = 1$. By using contour integration (say), one can verify that $m(\xi)$ is a multiple of the signum function. In particular, it satisfies (7.1). However, as we saw before, the Hilbert transform is not bounded on $L^1$ or $L^\infty$.
Remark 7.1.3. One application of the Mihlin multiplier theorem is the following ‘Schauder’ type estimate, which is useful in the setting of elliptic PDE: for any $i, j = 1, \dots, d$ and $1 < p < \infty$,
\[ \Big\| \frac{\partial^2 f}{\partial x_i \partial x_j} \Big\|_{L^p} \lesssim \|\Delta f\|_{L^p} \]
(uniformly for $f \in \mathcal{S}$). Indeed, this is equivalent to the $L^p$ boundedness of the Fourier multiplier operator with symbol
\[ m_{ij}(\xi) := \frac{\xi_i \xi_j}{|\xi|^2}, \]
which is a consequence of the Mihlin multiplier theorem. (This can also be deduced by using boundedness of Riesz transforms twice.)
7.2 Littlewood–Paley theory

Definition 7.2.1. For $N \in 2^{\mathbb{Z}}$, we define the Littlewood–Paley projection operators $P_N$ via
\[ \hat f_N(\xi) = \widehat{P_N f}(\xi) = \psi_N(\xi) \hat f(\xi). \]
In particular,
\[ f_N(x) = N^d \int \check\psi(Ny) f(x - y)\,dy. \]
Note that $P_N$ is not an actual projection, in the sense that $P_N^2 \neq P_N$. We further define $P_{\le N}$ by
\[ \hat f_{\le N}(\xi) = \widehat{P_{\le N} f}(\xi) = \varphi\big(\tfrac{\xi}{N}\big) \hat f(\xi). \]
Finally, we have $P_{>N} = I - P_{\le N}$ and $P_{N \le \cdot \le M} = \sum_{N \le K \le M} P_K$.
Remark 7.2.2. There is an alternate definition of frequency projection that utilizes heat flow. Recall that the solution to the heat equation
\[ u_t = \Delta u, \quad u(0, x) = f(x) \]
is given by $u(t, x)$ satisfying
\[ \hat u(t, \xi) = e^{-t|\xi|^2} \hat f(\xi). \]
We may alternately write this as $u(t) = e^{t\Delta} f$. Suppose $f$ is a nice function (e.g. $f \in \mathcal{S}$). Then at time $t > 0$, $\hat u$ will mostly be concentrated where $|\xi| \le \frac{1}{\sqrt{t}}$. In particular, we can consider
\[ \tilde P_{\le N} f := e^{\Delta / N^2} f \]
to represent a projection of $f$ to frequencies $\le N$. We could then define
\[ \tilde P_N f = e^{\Delta / N^2} f - e^{4\Delta / N^2} f. \]
This viewpoint is useful in settings in which one may not have a nice notion of a Fourier transform, but one can still solve the heat equation (e.g. on manifolds).
We next prove some basic properties of the Littlewood–Paley operators.
Proposition 7.2.3. The following hold:
(a) $P_N$ and $P_{\le N}$ are bounded on $L^p$ for $1 \le p \le \infty$ (uniformly in $N$).
(b) We have the pointwise bound
\[ |f_N(x)| + |f_{\le N}(x)| \lesssim [Mf](x) \]
for all $N$, where $M$ denotes the Hardy–Littlewood maximal function.
(c) For $1 < p < \infty$ and $f \in L^p$, the sum $\sum_N f_N$ converges in $L^p$ to $f$.
(d) For $1 \le p \le q \le \infty$, we have
\[ \|f_N\|_{L^q} \lesssim N^{\frac{d}{p} - \frac{d}{q}} \|f_N\|_{L^p} \]
and
\[ \|f_{\le N}\|_{L^q} \lesssim N^{\frac{d}{p} - \frac{d}{q}} \|f_{\le N}\|_{L^p}. \]
(e) For $1 \le p \le \infty$ and $s \in \mathbb{R}$, we have
\[ \| |\nabla|^s f_N \|_{L^p} \sim N^s \|f_N\|_{L^p}. \]
In particular, for $s > 0$,
\[ \| |\nabla|^s f_{\le N} \|_{L^p} \lesssim N^s \|f\|_{L^p}, \qquad \|f_{>N}\|_{L^p} \lesssim N^{-s} \| |\nabla|^s f \|_{L^p}. \]
Remark 7.2.4. The estimates in (d) and (e) are known as Bernstein estimates. Item (c) may fail in $L^1$ and $L^\infty$. To see this, first observe that for each $N$, we have that $P_{\le N} f \in C^\infty$ (since $P_{\le N}$ is convolution with a smooth function), but $C^\infty$ is not dense in $L^\infty$. Alternately, note that $f \equiv 1$ belongs to $L^\infty$, while
\[ f_N(x) = \int \check\psi(y)\,dy \equiv 0. \]
To see why (c) fails in $L^1$, note that any individual piece $f_N$ (and hence any finite sum of pieces) has mean zero, while mean zero functions are not dense in $L^1$. Indeed,
\[ \int f_N(x)\,dx = \hat f_N(0) = 0. \]
The proof of (c) above, however, will show that $f_{\le N} \to f$ in $L^1$ as $N \to \infty$.

Proof. Item (a) follows from the fact that $P_N$ and $P_{\le N}$ are given by convolution with $L^1$ functions with uniformly bounded $L^1$-norms (in $N$). In particular, by Young's convolution inequality and a change of variables,
\[ \|P_N f\|_{L^p} = \| \mathcal{F}^{-1}[\psi_N] * f \|_{L^p} \le \| \mathcal{F}^{-1}[\psi_N] \|_{L^1} \|f\|_{L^p} \lesssim \|f\|_{L^p}, \]
and similarly for $P_{\le N}$. This argument also proves (d), since
\[ \| \mathcal{F}^{-1}[\psi_N] * f \|_{L^q} \lesssim \| \mathcal{F}^{-1}[\psi_N] \|_{L^r} \|f\|_{L^p} \]
for $\frac{1}{q} + 1 = \frac{1}{r} + \frac{1}{p}$. By a change of variables, one readily checks that
\[ \| \mathcal{F}^{-1}[\psi_N] \|_{L^r} \lesssim N^{\frac{d}{p} - \frac{d}{q}}, \]
which yields (d).

Item (b) follows from the general fact that convolution with a spherically symmetric $L^1$-normalized function is always controlled by the maximal function, which we state as Lemma 7.2.5 below.
Next consider (c). Writing $\sum_{N_1 \le N \le N_2} f_N = f_{\le N_2} - f_{\le N_1/2}$, the problem reduces to showing $f_{\le N} \to f$ as $N \to \infty$ and $f_{\le N} \to 0$ as $N \to 0$; here the convergence is in $L^p$. For the first point, we need only to observe that the family of convolution kernels corresponding to the operators $P_{\le N}$ is of the form $\{N^d \check\varphi(N \cdot)\}$, where $\check\varphi \in \mathcal{S}$. That is, these kernels correspond to convolution with $L^1$-preserving rescalings of a fixed $L^1$ function. Thus they form a family of good kernels and the result follows from the usual approximation to the identity argument (see e.g. Lemma A.3.1). Note that this part of the argument works even when $p = 1$.
For the second point, recalling that $|P_{\le N} f| \lesssim Mf$ pointwise, we see that by dominated convergence (and the maximal function estimate) it would be sufficient to establish $P_{\le N} f \to 0$ pointwise. In fact, by item (d), we have
\[ \|P_{\le N} f\|_{L^\infty} \lesssim N^{\frac{d}{p}} \|f\|_{L^p} \to 0 \quad \text{as } N \to 0. \]
Finally, we turn to (e). It suffices to prove the bound for $f_N$, for the remaining two estimates can then be obtained by summation over $M \le N$ or $M > N$. We consider the Fourier multiplier operator with multiplier
\[ m_N(\xi) = N^{-s} |\xi|^s \tilde\psi_N(\xi), \]
where $\tilde\psi_N$ is a slight fattening of $\psi_N$. As $\tilde\psi_N$ is supported away from $\xi = 0$, we have $m_N \in \mathcal{S}$. Thus $\mathcal{F}^{-1}(m_N) \in \mathcal{S}$ and a change of variables shows
\[ \| \mathcal{F}^{-1}(m_N) \|_{L^1} \lesssim 1 \]
uniformly in $N$. Writing $\tilde P_N$ for the Fourier multiplier with symbol $\tilde\psi_N$, we have that $\tilde P_N P_N = P_N$. Thus we deduce from Young's inequality that
\[ \| |\nabla|^s f_N \|_{L^p} \lesssim N^s \|f_N\|_{L^p}. \]
However, since $s \in \mathbb{R}$ was arbitrary, the argument above also shows the reverse inequality. This completes the proof.
The following lemma was used in the proof above and may be of independent interest.
Lemma 7.2.5. Suppose $K \in \mathcal{S}$ is nonnegative. For any $N > 0$, we have
\[ \int N^d K(N(x-y)) |f(y)|\,dy \lesssim_K Mf(x). \]
Proof. Without loss of generality, assume $f \ge 0$. Then
\[ \begin{aligned}
\int N^d K(N(x-y)) f(y)\,dy
&\lesssim N^d \int_{N|x-y| \le 1} f(y)\,dy + \sum_{R > 1} \int_{R < N|x-y| < 2R} N^d K(N(x-y)) f(y)\,dy \\
&\lesssim Mf(x) + \sum_{R > 1} \frac{1}{R} \Big(\frac{N}{R}\Big)^d \int_{N|x-y| \le 2R} |N(x-y)|^{d+1} K(N(x-y)) f(y)\,dy \\
&\lesssim Mf(x) + \sum_{R > 1} R^{-1} Mf(x) \lesssim Mf(x),
\end{aligned} \]
where we sum over dyadic $R > 1$ and the implicit constants depend only on $\| \langle x \rangle^{d+1} K \|_{L^\infty}$.
Our next result is the Littlewood–Paley square function estimate.
Theorem 7.2.6 (Littlewood–Paley square function estimate). Let
\[ (Sf)(x) := \Big( \sum_{N \in 2^{\mathbb{Z}}} |f_N(x)|^2 \Big)^{\frac12}. \]
Then
\[ \|Sf\|_{L^p} \sim \|f\|_{L^p} \quad \text{for all } 1 < p < \infty. \]
The proof we present will make use of a probabilistic result known as Khinchin's inequality.
Lemma 7.2.7 (Khinchin's inequality). Let $X_n$ be independent identically distributed random variables on a probability space with $X_n = \pm 1$ with equal probability. For any $0 < p < \infty$,
\[ \Big( \mathbb{E} \Big| \sum c_n X_n \Big|^p \Big)^{\frac1p} \sim_p \Big( \sum |c_n|^2 \Big)^{\frac12} \]
for any $\{c_n\} \in \ell^2$.
Here independence of $X$ and $Y$ means that
\[ \mathbb{E}\{f(X) g(Y)\} = \mathbb{E}\{f(X)\}\, \mathbb{E}\{g(Y)\} \]
for any measurable, bounded $f, g$. Note that $\mathbb{E}\{X_n\} = 0$ for the random variables defined above.
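Proof. Without loss of generality, take $c_n \in \mathbb{R}$. By Tchebychev's inequality, for any $\lambda > 0$ and $t > 0$ we can write
\[ \begin{aligned}
\mathbb{P}\Big\{ \sum c_n X_n > \lambda \Big\} &\le e^{-\lambda t}\, \mathbb{E}\big\{ e^{t \sum c_n X_n} \big\} \\
&\le e^{-\lambda t} \prod_n \mathbb{E}\big\{ e^{t c_n X_n} \big\} \\
&\le e^{-\lambda t} \prod_n \tfrac12 \{ e^{t c_n} + e^{-t c_n} \} \\
&\le e^{-\lambda t} \prod_n e^{t^2 c_n^2 / 2} \le e^{-\lambda t} e^{t^2 \sum c_n^2 / 2}.
\end{aligned} \]
Here we have used the inequality
\[ \tfrac12 [e^{\theta} + e^{-\theta}] \le e^{\theta^2 / 2}. \]
A similar argument yields
\[ \mathbb{P}\Big\{ \sum c_n X_n < -\lambda \Big\} \le e^{-\lambda t} e^{t^2 \sum c_n^2 / 2}. \]
Now choose $t = \frac{\lambda}{\sum c_n^2}$ to get the bound
\[ \mathbb{P}\Big\{ \Big| \sum c_n X_n \Big| > \lambda \Big\} \le 2 e^{-\frac{\lambda^2}{2 \sum c_n^2}}. \]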
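For a short coefficient sequence the tail probability can be computed exactly by enumerating all sign patterns, which gives a direct check of the sub-Gaussian bound just derived. The sketch below is illustrative only; the function names `tail_probability` and `subgaussian_bound` and the sample coefficients `c` are assumptions made here.

```python
import itertools
import math

def tail_probability(c, lam):
    """Exact P{|sum c_n X_n| > lam} for independent signs X_n = +-1,
    computed by enumerating all 2^len(c) sign patterns."""
    count = 0
    for signs in itertools.product((-1.0, 1.0), repeat=len(c)):
        s = sum(cn * xn for cn, xn in zip(c, signs))
        if abs(s) > lam:
            count += 1
    return count / 2 ** len(c)

def subgaussian_bound(c, lam):
    """The bound 2 exp(-lam^2 / (2 sum c_n^2))."""
    return 2.0 * math.exp(-lam ** 2 / (2.0 * sum(cn ** 2 for cn in c)))

c = [1.0, 0.5, -2.0, 0.25, 1.5, 0.75, -1.0, 0.3, 0.6, 1.2]
for lam in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(lam, tail_probability(c, lam), subgaussian_bound(c, lam))
```

Enumeration costs $2^n$ evaluations, so this is only practical for small $n$; it is meant as a check of the inequality, not as a way to compute tails in general.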
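Writing the $L^p(d\mathbb{P})$-norm in terms of the distribution function, this implies
\[ \begin{aligned}
\Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^p \Big\} \Big)^{\frac1p}
&= \Big( \int_0^\infty p \lambda^{p-1}\, \mathbb{P}\Big\{ \Big| \sum c_n X_n \Big| > \lambda \Big\}\,d\lambda \Big)^{\frac1p} \\
&\le \Big( \int_0^\infty p \lambda^{p-1}\, 2 e^{-\frac{\lambda^2}{2 \sum c_n^2}}\,d\lambda \Big)^{\frac1p} \\
&\lesssim_p \Big( \sum c_n^2 \Big)^{\frac12},
\end{aligned} \]
where in the final inequality we have made the substitution
\[ \mu = \Big( \sum c_n^2 \Big)^{-\frac12} \lambda \]
and used finiteness of the integral $\int e^{-\mu^2/2} \mu^{p-1}\,d\mu$.
It remains to establish the reverse inequality. We claim that
\[ \sum c_n^2 = \mathbb{E}\Big| \sum c_n X_n \Big|^2. \]
Indeed,
\[ \begin{aligned}
\mathbb{E}\Big| \sum c_n X_n \Big|^2 &= \mathbb{E} \sum_{n,m} c_n c_m X_n X_m \\
&= \sum_n c_n^2\, \mathbb{E}\{X_n^2\} + \sum_{n \neq m} c_n c_m\, \mathbb{E}\{X_n\}\, \mathbb{E}\{X_m\} \\
&= \sum_n c_n^2 + \sum_{n \neq m} c_n c_m\, \mathbb{E}\{X_n\}\, \mathbb{E}\{X_m\} = \sum_n c_n^2,
\end{aligned} \]
where we used $\mathbb{E}\{X_n^2\} = 1$ and independence.

Thus, for $1 < p < \infty$ we may use Hölder's inequality and the inequality proved above to get
\[ \begin{aligned}
\sum c_n^2 = \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^2 \Big\}
&\le \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^p \Big\} \Big)^{\frac1p} \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^{p'} \Big\} \Big)^{\frac{1}{p'}} \\
&\lesssim \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^p \Big\} \Big)^{\frac1p} \Big( \sum c_n^2 \Big)^{\frac12}.
\end{aligned} \]
Finally, for $0 < p \le 1$, we argue similarly to get
\[ \begin{aligned}
\sum c_n^2 &= \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^{p/2} \Big| \sum c_n X_n \Big|^{2 - p/2} \Big\} \\
&\le \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^p \Big\} \Big)^{1/2} \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^{4-p} \Big\} \Big)^{1/2} \\
&\lesssim \Big( \mathbb{E}\Big\{ \Big| \sum c_n X_n \Big|^p \Big\} \Big)^{1/2} \Big( \sum c_n^2 \Big)^{1 - p/4}.
\end{aligned} \]
Rearranging now yields the desired inequality.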
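For small $n$ the moments in Khinchin's inequality can be computed exactly by enumeration, which makes the equivalence of norms concrete. The sketch below checks the identity for $p = 2$ used in the proof and, as an additional illustration not taken from the text, the closed form $\mathbb{E}|\sum c_n X_n|^4 = 3(\sum c_n^2)^2 - 2\sum c_n^4$, which follows by expanding the fourth power and using independence. The function name `moment` and the sample coefficients are assumptions made here.

```python
import itertools

def moment(c, p):
    """Exact E|sum c_n X_n|^p for independent signs X_n = +-1
    (average over all 2^len(c) sign patterns)."""
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(c)):
        total += abs(sum(cn * xn for cn, xn in zip(c, signs))) ** p
    return total / 2 ** len(c)

c = [1.0, 0.5, -2.0, 0.25, 1.5, 0.75, -1.0, 0.3]
sum_sq = sum(cn ** 2 for cn in c)
sum_4th = sum(cn ** 4 for cn in c)

print("p = 2:", moment(c, 2), sum_sq)
print("p = 4:", moment(c, 4), 3.0 * sum_sq ** 2 - 2.0 * sum_4th)
for p in (0.5, 1.0, 3.0, 6.0):
    # The ratio of the L^p norm to the l^2 norm stays within constants depending on p.
    print(p, moment(c, p) ** (1.0 / p) / sum_sq ** 0.5)
```

The $p = 4$ formula shows in particular that $(\mathbb{E}|\sum c_n X_n|^4)^{1/4} \le 3^{1/4} (\sum c_n^2)^{1/2}$, an explicit instance of the upper bound.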
We turn to the proof of Theorem 7.2.6.
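Proof of Theorem 7.2.6. Let $X_N$ be independent identically distributed random variables with $X_N = \pm 1$ with equal probability. By Khinchin's inequality,
\[ (Sf)(x) \sim_p \Big( \mathbb{E} \Big| \sum X_N f_N(x) \Big|^p \Big)^{\frac1p}. \]
Thus
\[ \|Sf\|_{L^p}^p \sim \mathbb{E} \int \Big| \sum X_N f_N(x) \Big|^p dx = \mathbb{E}\Big\{ \Big\| \sum X_N f_N \Big\|_{L^p}^p \Big\} = \mathbb{E}\{ \| \check m_X * f \|_{L^p}^p \}, \]
where
\[ m_X(\xi) := \sum_N X_N \psi_N(\xi). \]
Let us now show that $m_X$ is a Mihlin multiplier (with bounds independent of the value of the $X_N$), which will imply $\|Sf\|_{L^p} \lesssim \|f\|_{L^p}$. We compute
\[ |\partial_\xi^\alpha m_X(\xi)| = \Big| \sum_N X_N N^{-|\alpha|} (\partial_\xi^\alpha \psi)\big(\tfrac{\xi}{N}\big) \Big| \lesssim |\xi|^{-|\alpha|}, \]
where we use the fact that for $\psi \in C_c^\infty(\mathbb{R}^d \setminus \{0\})$ only finitely many of the terms above will contribute to the sum.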
Remark 7.2.8. The proof shows that the bound $\|Sf\|_{L^p} \lesssim \|f\|_{L^p}$ holds for a wider class of ‘square functions’. In particular, instead of using the multiplier $\psi$ (that defines $P_1$), one could use any $C_c^\infty$ function supported in $\mathbb{R}^d \setminus \{0\}$.
It remains to establish $\|f\|_{L^p} \lesssim \|Sf\|_{L^p}$. This will actually depend on the fact that $\psi_N$ is a partition of unity. Define
\[ \tilde P_N = P_{N/2} + P_N + P_{2N}, \quad \text{so that } \tilde P_N P_N = P_N. \]
Then by duality, Proposition 7.2.3(c), and the estimate proved above, we have the following estimate for any $g \in L^{p'}$:
\[ \begin{aligned}
\langle f, g \rangle &= \int \sum_N \tilde P_N P_N f\, \bar g\,dx \\
&= \int \sum_N P_N f\, \overline{\tilde P_N g}\,dx \\
&\lesssim \int \Big( \sum_N |P_N f(x)|^2 \Big)^{\frac12} \Big( \sum_N |\tilde P_N g(x)|^2 \Big)^{\frac12} dx \\
&\lesssim \|Sf\|_{L^p} \|\tilde S g\|_{L^{p'}} \\
&\lesssim \|Sf\|_{L^p} \|g\|_{L^{p'}},
\end{aligned} \]
where $\tilde S$ denotes the square function defined in terms of the multipliers for $\tilde P_N$. This completes the proof.
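In the following, we will establish some ‘fractional calculus’ estimates that are useful in applications to PDE. We begin with the following corollary.
Corollary 7.2.9. The following hold:
\[ \| |\nabla|^s f \|_{L^p} \sim_{s,p} \Big\| \Big( \sum_N N^{2s} |f_N(x)|^2 \Big)^{\frac12} \Big\|_{L^p} \tag{7.2} \]
for all $s \in \mathbb{R}$ and $1 < p < \infty$, and
\[ \| |\nabla|^s f \|_{L^p} \sim_{s,p} \Big\| \Big( \sum_N N^{2s} |f_{>N}(x)|^2 \Big)^{\frac12} \Big\|_{L^p} \tag{7.3} \]
for all $s > 0$ and $1 < p < \infty$.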
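Proof. Using the (proof of) the square function estimate, we first observe
\[ \Big\| \Big( \sum_N N^{2s} |P_N |\nabla|^{-s} g|^2 \Big)^{\frac12} \Big\|_{L^p} \lesssim \|g\|_{L^p}. \]
Taking $g = |\nabla|^s f$ now yields RHS(7.2) $\lesssim$ LHS(7.2). For the reverse inequality, we use Plancherel, Cauchy–Schwarz, Hölder, and the estimate just established to write
\[ \begin{aligned}
\langle g, h \rangle_{L^2} &= \int \sum_N N^s |\nabla|^{-s} P_N g \cdot N^{-s} |\nabla|^s \tilde P_N h\,dx \\
&\lesssim \int \Big( \sum_N N^{2s} |P_N |\nabla|^{-s} g|^2 \Big)^{\frac12} \Big( \sum_N N^{-2s} |\tilde P_N |\nabla|^s h|^2 \Big)^{\frac12} dx \\
&\lesssim \Big\| \Big( \sum_N N^{2s} |P_N |\nabla|^{-s} g|^2 \Big)^{\frac12} \Big\|_{L^p} \Big\| \Big( \sum_N N^{-2s} |\tilde P_N |\nabla|^s h|^2 \Big)^{\frac12} \Big\|_{L^{p'}} \\
&\lesssim \Big\| \Big( \sum_N N^{2s} |P_N |\nabla|^{-s} g|^2 \Big)^{\frac12} \Big\|_{L^p} \|h\|_{L^{p'}}.
\end{aligned} \]
Taking the supremum over $h \in L^{p'}$ and applying this to $g = |\nabla|^s f$ yields LHS(7.2) $\lesssim$ RHS(7.2).
We turn to (7.3). We will show RHS(7.2) $\sim$ RHS(7.3). Using
\[ f_N = f_{\ge N} - f_{\ge 2N} \]
and the triangle inequality, we readily deduce RHS(7.2) $\lesssim$ RHS(7.3). For the reverse, we estimate as follows:
\[ \begin{aligned}
\sum_N N^{2s} |f_{>N}|^2 &\le \sum_N N^{2s} \sum_{N_1, N_2 \ge N} |f_{N_1}|\,|f_{N_2}| \\
&\le 2 \sum_N \sum_{N \le N_1 \le N_2} \frac{N^{2s}}{N_1^s N_2^s}\, N_1^s |f_{N_1}|\, N_2^s |f_{N_2}| \\
&\lesssim \sum_{N_1 \le N_2} \sum_{N \le N_1} \frac{N^{2s}}{N_1^s N_2^s}\, N_1^s |f_{N_1}|\, N_2^s |f_{N_2}| \\
&\lesssim \sum_{N_1 \le N_2} \frac{N_1^s}{N_2^s}\, N_1^s |f_{N_1}|\, N_2^s |f_{N_2}| \lesssim \sum_N N^{2s} |f_N|^2,
\end{aligned} \]
where in the last step we have used Schur's test (Lemma A.3.4). This completes the proof.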
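We turn to the following fractional calculus estimates due to Christ and Weinstein [6].
Theorem 7.2.10 (Fractional product rule). Let $s > 0$ and
\[ \frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2} = \frac{1}{q_1} + \frac{1}{q_2}, \]
with $1 < p, q, p_j, q_j < \infty$. Then
\[ \| |\nabla|^s (fg) \|_{L^p} \lesssim \| |\nabla|^s f \|_{L^{p_1}} \|g\|_{L^{p_2}} + \|f\|_{L^{q_1}} \| |\nabla|^s g \|_{L^{q_2}}. \]
Proof. We use
\[ \| |\nabla|^s (fg) \|_{L^p} \sim \Big\| \Big( \sum_N N^{2s} |P_N(fg)|^2 \Big)^{\frac12} \Big\|_{L^p}. \]
Now write
\[ fg = f_{>N/4}\, g + f_{\le N/4}\, g_{>N/4} + f_{\le N/4}\, g_{\le N/4}, \]
so that
\[ P_N(fg) = P_N(f_{>N/4}\, g) + P_N(f_{\le N/4}\, g_{>N/4}). \]
Thus (using Proposition 7.2.3)
\[ \begin{aligned}
\sum_N N^{2s} |P_N(fg)|^2 &\lesssim \sum_N N^{2s} |P_N(f_{>N/4}\, g)|^2 + \sum_N N^{2s} |P_N(f_{\le N/4}\, g_{>N/4})|^2 \\
&\lesssim \sum_N N^{2s} |M(f_{>N/4}\, g)|^2 + \sum_N N^{2s} |M(Mf\, g_{>N/4})|^2.
\end{aligned} \]
Now, by the vector maximal inequality and the corollary,
\[ \Big\| \Big( \sum_N N^{2s} |M(f_{>N/4}\, g)|^2 \Big)^{\frac12} \Big\|_{L^p} \lesssim \Big\| \Big( \sum_N |N^s f_{>N/4}|^2 \Big)^{1/2} g \Big\|_{L^p} \lesssim \|g\|_{L^{p_2}} \| |\nabla|^s f \|_{L^{p_1}}. \]
Similarly,
\[ \begin{aligned}
\Big\| \Big( \sum_N N^{2s} |M(Mf\, g_{>N/4})|^2 \Big)^{\frac12} \Big\|_{L^p} &\lesssim \Big\| Mf \Big( \sum_N |N^s g_{>N/4}|^2 \Big)^{1/2} \Big\|_{L^p} \\
&\lesssim \|Mf\|_{L^{q_1}} \| |\nabla|^s g \|_{L^{q_2}} \\
&\lesssim \|f\|_{L^{q_1}} \| |\nabla|^s g \|_{L^{q_2}}.
\end{aligned} \]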
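Theorem 7.2.11 (Fractional chain rule). Let $F : \mathbb{C} \to \mathbb{C}$ be such that
\[ |F(u) - F(v)| \le |u - v|\,[G(u) + G(v)] \quad \text{for some } G : \mathbb{C} \to [0, \infty). \]
Then for any $0 < s < 1$, $1 < p, p_1 < \infty$ and $1 < p_2 \le \infty$ with
\[ \frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}, \]
we have
\[ \| |\nabla|^s (F \circ u) \|_{L^p} \lesssim \|G(u)\|_{L^{p_2}} \| |\nabla|^s u \|_{L^{p_1}}. \]
We will need the following lemma, similar in spirit to Lemma 7.2.5.
Lemma 7.2.12. Suppose $|h(x)| \le g(x)$, where $g$ is a radial decreasing function with $\lim_{r \to \infty} r^d g(r) = 0$. Then
\[ |(h * f)(x)| \lesssim \|g\|_{L^1} (Mf)(x). \]
Proof. By the fundamental theorem of calculus, we have
\[ g(y) = \int_{|y|}^\infty \Big( -\frac{\partial g}{\partial r} \Big)(\rho)\,d\rho = \int_0^\infty \chi_{(0,\rho)}(|y|) \Big( -\frac{\partial g}{\partial r} \Big)(\rho)\,d\rho. \]
Thus
\[ \begin{aligned}
|(h * f)(x)| &\le \int \int_0^\infty \chi_{(0,\rho)}(|y|) \Big( -\frac{\partial g}{\partial r} \Big)(\rho)\, |f(x - y)|\,d\rho\,dy \\
&\le \int_0^\infty \Big( \int_{|y| < \rho} |f(x - y)|\,dy \Big) \Big( -\frac{\partial g}{\partial r} \Big)(\rho)\,d\rho \\
&\lesssim Mf(x) \int_0^\infty \rho^d \Big( -\frac{\partial g}{\partial r} \Big)(\rho)\,d\rho \\
&\lesssim Mf(x) \int_0^\infty g(\rho) \rho^{d-1}\,d\rho \lesssim \|g\|_{L^1} Mf(x).
\end{aligned} \]
Note that to integrate by parts we used $r^d g(r) \to 0$ as $r \to \infty$. This completes the proof.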
Proof of the fractional chain rule. We write
\[ \| |\nabla|^s F(u) \|_{L^p} \sim \Big\| \Big( \sum_N N^{2s} |P_N F(u)|^2 \Big)^{\frac12} \Big\|_{L^p}. \]
Now, using that $\int \check\psi = 0$, we write
\[ [P_N(F(u))](x) = N^d \int \check\psi(Ny) [F \circ u(x - y) - F \circ u(x)]\,dy, \]
so that
\[ |[P_N(F(u))](x)| \le N^d \int |\check\psi(Ny)|\,|u(x - y) - u(x)|\,[G \circ u(x - y) + G \circ u(x)]\,dy. \tag{7.4} \]
We decompose
\[ |u(x - y) - u(x)| \le |u_{>N}(x - y)| + |u_{>N}(x)| + \sum_{K \le N} |u_K(x - y) - u_K(x)|. \]
We will now prove
\[ |u_K(x - y) - u_K(x)| \lesssim K|y| \big\{ M u_K(x - y) + M u_K(x) \big\}. \tag{7.5} \]
It suffices to treat the case $K|y| < 1$. We may apply the fattened projection $\tilde P_K$ (abusing notation and writing the corresponding convolution kernel as $\check\psi$) and write
\[ \begin{aligned}
u_K(x - y) - u_K(x) &= K^d \int \check\psi(Kz) [u_K(x - y - z) - u_K(x - z)]\,dz \\
&= K^d \int [\check\psi(K(z - y)) - \check\psi(Kz)]\, u_K(x - z)\,dz \\
&= -K^d \int \int_0^1 Ky \cdot \nabla\check\psi(Kz - \theta K y)\,d\theta\; u_K(x - z)\,dz.
\end{aligned} \]
Thus, using the lemma above,
\[ \begin{aligned}
|u_K(x - y) - u_K(x)| &\lesssim K|y| \int \frac{K^d}{(1 + K|z|)^{100d}} |u_K(x - z)|\,dz \\
&\lesssim K|y|\, M u_K(x)\, \| K^d (1 + K|z|)^{-100d} \|_{L^1} \\
&\lesssim K|y|\, M u_K(x).
\end{aligned} \]
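Continuing from (7.4), we bound $|P_N(F \circ u)(x)|$ by
\[ \begin{aligned}
|P_N(F \circ u)(x)| \lesssim{}& M(u_{>N}\, G \circ u)(x) + M(u_{>N})(x)\, |G \circ u(x)| \\
&+ |u_{>N}(x)|\, M(G \circ u)(x) + |u_{>N}(x) (G \circ u)(x)| \\
&+ \sum_{K \le N} \int N^d K|y|\, |\check\psi(Ny)|\, [M u_K(x - y) + M u_K(x)] \\
&\qquad\qquad \times \{ |G \circ u(x - y)| + |G \circ u(x)| \}\,dy.
\end{aligned} \]
The contribution of the first four terms can be bounded by
\[ M(u_{>N}\, G \circ u)(x) + M(u_{>N})(x)\, M(G \circ u)(x). \]
The contribution of the sum can be bounded by
\[ \sum_{K \le N} \frac{K}{N} \big\{ M(M u_K\, G \circ u)(x) + M[M u_K]\, M[G \circ u](x) \big\}. \]
Now we can estimate
\[ \Big\| \Big( \sum_N N^{2s} |P_N F(u)|^2 \Big)^{\frac12} \Big\|_{L^p} \]
by the sum of the four terms
\[ \Big\| \Big( \sum_N N^{2s} |M(u_{>N}\, G \circ u)|^2 \Big)^{\frac12} \Big\|_{L^p}, \tag{7.6} \]
\[ \Big\| \Big( \sum_N N^{2s} |M(u_{>N})\, M(G \circ u)|^2 \Big)^{\frac12} \Big\|_{L^p}, \tag{7.7} \]
\[ \Big\| \Big( \sum_N N^{2s} \Big| \sum_{K \le N} \frac{K}{N} M(M u_K\, G \circ u) \Big|^2 \Big)^{\frac12} \Big\|_{L^p}, \tag{7.8} \]
\[ \Big\| \Big( \sum_N N^{2s} \Big| \sum_{K \le N} \frac{K}{N} M(M u_K)\, M(G \circ u) \Big|^2 \Big)^{\frac12} \Big\|_{L^p}. \tag{7.9} \]
It remains to bound these four terms.

First, by the vector maximal inequality and the corollary to the square function estimate,
\[ (7.6) \lesssim \Big\| G \circ u \Big( \sum_N N^{2s} |u_{>N}|^2 \Big)^{\frac12} \Big\|_{L^p} \lesssim \|G \circ u\|_{L^{p_2}} \| |\nabla|^s u \|_{L^{p_1}}. \]
Arguing similarly,
\[ (7.7) \lesssim \|M(G \circ u)\|_{L^{p_2}} \Big\| \Big( \sum_N |M(N^s u_{>N})|^2 \Big)^{\frac12} \Big\|_{L^{p_1}} \lesssim \|G \circ u\|_{L^{p_2}} \| |\nabla|^s u \|_{L^{p_1}}. \]
Note that $p_2 = \infty$ is allowed.

For (7.8) and (7.9), we need the following general inequality:
\[ \sum_N N^{2s} \Big| \sum_{K \le N} \frac{K}{N} c_K \Big|^2 \lesssim \sum_N N^{2s} |c_N|^2 \quad \text{provided } s < 1. \tag{7.10} \]
Using this and arguing as above suffices to treat (7.8) and (7.9). The proof of (7.10) (and (7.8) and (7.9)) is left as an exercise.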
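Inequality (7.10) can be tested numerically on finitely supported dyadic sequences. The sketch below is an illustration and not the intended solution of the exercise: the function names `lhs_rhs` and `young_constant` are assumptions, and so is the explicit constant, which comes from viewing $N^s \sum_{K \le N} \frac{K}{N} c_K = \sum_{K \le N} (K/N)^{1-s} K^s c_K$ as a discrete convolution over the exponents and applying Young's inequality with the kernel $2^{-j(1-s)}$, $j \ge 0$.

```python
import random

def lhs_rhs(c, s):
    """c maps the exponent k (so that N = 2^k) to the coefficient c_N.
    Returns the left and right sides of (7.10), without implicit constants."""
    ks = sorted(c)
    lhs = 0.0
    rhs = 0.0
    for k in ks:
        N = 2.0 ** k
        inner = sum((2.0 ** j / N) * c[j] for j in ks if j <= k)
        lhs += N ** (2 * s) * abs(inner) ** 2
        rhs += N ** (2 * s) * abs(c[k]) ** 2
    return lhs, rhs

def young_constant(s):
    """Square of the l^1 norm of the kernel 2^(-j(1-s)), j >= 0 (finite iff s < 1)."""
    return (1.0 / (1.0 - 2.0 ** (-(1.0 - s)))) ** 2

random.seed(1)
c = {k: random.uniform(-1.0, 1.0) for k in range(-10, 11)}
for s in (-1.0, 0.0, 0.5, 0.9):
    lhs, rhs = lhs_rhs(c, s)
    print(s, lhs / rhs, young_constant(s))
```

The constant blows up as $s \to 1$, which is consistent with the restriction $s < 1$ in (7.10) and with the range $0 < s < 1$ in the fractional chain rule.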
7.3 Coifman–Meyer multipliers

In this section we will prove a version of the Coifman–Meyer multiplier theorem. This may be viewed as a generalization of the Mihlin multiplier theorem (cf. Theorem 7.1.1) to the case of bilinear operators. In particular, given $m : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{C}$ we may define the bilinear operator $T_m$ by prescribing its Fourier transform:
\[ \mathcal{F}[T_m(f, g)](\xi) = \int_{\mathbb{R}^d} m(\xi - \eta, \eta) \hat f(\xi - \eta) \hat g(\eta)\,d\eta. \]
Equivalently, we may write
\[ T_m(f, g)(x) = \iint e^{2\pi i x \xi} m(\xi - \eta, \eta) \hat f(\xi - \eta) \hat g(\eta)\,d\eta\,d\xi. \]
Note that if $m \equiv 1$ then $T_m(f, g) = fg$ (cf. Lemma 2.5.11). Similarly, if $m(\xi_1, \xi_2) = a(\xi_1) b(\xi_2)$ then $T_m(f, g) = [T_a f][T_b g]$, where $T_a$ is the Fourier multiplier operator with symbol $a$.
We may also understand $T_m$ by observing that (formally)
\[ T_m(e^{2\pi i x \xi_1}, e^{2\pi i x \xi_2}) = m(\xi_1, \xi_2)\, e^{2\pi i x (\xi_1 + \xi_2)}. \]
Indeed, this follows from $\mathcal{F}[e^{2\pi i x \xi_j}] = \delta(\xi - \xi_j)$. This shows that $T_m$ multiplies plane waves (adding their frequencies) and modulates their amplitude by $m(\xi_1, \xi_2)$.
Our goal will be to prove $L^p \times L^r \to L^q$ mapping properties for bilinear operators of this type. We will consider multipliers/operators of the following type.
Definition 7.3.1. We call $m : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{C}$ a Coifman–Meyer symbol if it obeys
\[ |\partial_{\xi_1}^{\alpha_1} \partial_{\xi_2}^{\alpha_2} m(\xi_1, \xi_2)| \lesssim_{|\alpha_1|, |\alpha_2|, d} (|\xi_1| + |\xi_2|)^{-|\alpha_1| - |\alpha_2|} \]
for all multiindices $\alpha_1, \alpha_2$. We call $T_m$ a Coifman–Meyer multiplier.
Remark 7.3.2. This should be compared with the definition of a Mihlin multiplier (see again Theorem 7.1.1). In practice, only finitely many multiindices (depending on the dimension) are needed, but we will not be concerned with this refinement.
Note that the product of two Coifman–Meyer multipliers is again a
Coifman–Meyer multiplier.
In the setting of Mihlin multipliers, we sought to prove $L^p \to L^p$ bounds. For Coifman–Meyer multipliers, it is more natural to seek Hölder-type estimates, i.e.
\[ \|T_m(f, g)\|_{L^r} \lesssim \|f\|_{L^p} \|g\|_{L^q}, \qquad \frac{1}{p} + \frac{1}{q} = \frac{1}{r}. \]
In fact, we will prove the following.
Theorem 7.3.3. Let $m$ be a Coifman–Meyer symbol. Then the multiplier $T_m$ maps $L^p \times L^q \to L^r$ boundedly for all $1 < p, q, r < \infty$ satisfying
\[ \frac{1}{p} + \frac{1}{q} = \frac{1}{r}. \]
Remark 7.3.4. This theory can be extended to handle the endpoints $p, q, r \in \{1, \infty\}$; however, we will not pursue this extension here. See e.g. [23] for a clear presentation.
Example 7.3.1. Define
\[ m(\xi_1, \xi_2) = \frac{\xi_1 \cdot \xi_2}{|\xi_1|^2 + |\xi_2|^2}, \]
i.e.
\[ T_m(f, g) = (-\Delta)^{-1} [\nabla f \cdot \nabla g]. \]
Then $m$ is a Coifman–Meyer symbol, and hence
\[ \| (-\Delta)^{-1} [\nabla f \cdot \nabla g] \|_{L^r} \lesssim \|f\|_{L^p} \|g\|_{L^q} \]
for $1 < p, q, r < \infty$ satisfying $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$.
The proof of Theorem 7.3.3 will firstly rely on a ‘paraproduct decomposition’ for Coifman–Meyer multipliers. In particular, we need the notion of high-high, high-low, and low-high multipliers:
• We call $T_m$ a high-high paraproduct if $|\xi_1| \sim |\xi_2|$ on the support of $m$.
• We call $T_m$ a low-high paraproduct if $|\xi_1 + \xi_2| \sim |\xi_2|$ on the support of $m$.
• We call $T_m$ a high-low paraproduct if $|\xi_1 + \xi_2| \sim |\xi_1|$ on the support of $m$.
We have the following paraproduct decomposition:
Lemma 7.3.5. Given a Coifman–Meyer paraproduct $T_m$, we may decompose
\[ T_m = \pi_{hh} + \pi_{hl} + \pi_{lh}, \]
where $\pi_{hh}, \pi_{hl}, \pi_{lh}$ are high-high, high-low, and low-high Coifman–Meyer paraproducts.
Proof. We recall the Littlewood–Paley multipliers $\psi_N$, $\varphi_N$ from Section 7.2. In particular, as the $\psi_N$ form a partition of unity (where we take $N \in 2^{\mathbb{Z}}$), we can write for any $(\xi_1, \xi_2)$,
\[ \begin{aligned}
1 &= \sum_{N, M} \psi_N(\xi_1) \psi_M(\xi_2) \\
&= \sum_N \psi_N(\xi_1) \varphi_{\frac{N}{8}}(\xi_2) + \sum_N \sum_{\frac{N}{8} < M < 8N} \psi_N(\xi_1) \psi_M(\xi_2) + \sum_N \varphi_{\frac{N}{8}}(\xi_1) \psi_N(\xi_2).
\end{aligned} \]
These three expressions are high-low, high-high, and low-high multipliers, respectively. If we multiply by $m(\xi_1, \xi_2)$, then we complete the proof of the lemma.
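The three-way splitting of the double sum can be verified numerically for a concrete bump function in $d = 1$. The sketch below is an illustration under assumptions: the specific smooth profile built from $e^{-1/t}$, the convention that the multiplier $\varphi_N(\xi)$ means $\varphi(\xi/N)$, and the function names are choices made here. The middle sum is taken over $N/8 < M < 8N$, so that each pair $(N, M)$ is counted exactly once.

```python
import math

def smooth_step(t):
    """A C-infinity step: 0 for t <= 0, 1 for t >= 1."""
    def h(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def phi(x):
    """One admissible bump (d = 1): 1 for |x| <= 1.4, 0 for |x| >= 1.42."""
    return 1.0 - smooth_step((abs(x) - 1.4) / 0.02)

def phi_N(N, xi):
    return phi(xi / N)

def psi_N(N, xi):
    return phi(xi / N) - phi(2.0 * xi / N)

def paraproduct_pieces(xi1, xi2, k_min=-40, k_max=40):
    """Values at (xi1, xi2) of the high-low, high-high and low-high symbols,
    summing over N = 2^k with k_min <= k <= k_max."""
    ks = range(k_min, k_max + 1)
    high_low = sum(psi_N(2.0 ** k, xi1) * phi_N(2.0 ** (k - 3), xi2) for k in ks)
    high_high = sum(psi_N(2.0 ** k, xi1) * psi_N(2.0 ** m, xi2)
                    for k in ks for m in range(k - 2, k + 3))  # N/8 < M < 8N
    low_high = sum(phi_N(2.0 ** (k - 3), xi1) * psi_N(2.0 ** k, xi2) for k in ks)
    return high_low, high_high, low_high

for xi1, xi2 in ((1.0, 1.0), (1.0, 1000.0), (0.003, 5.0), (-2.5, 0.7)):
    pieces = paraproduct_pieces(xi1, xi2)
    print((xi1, xi2), pieces, sum(pieces))
```

For widely separated frequencies only one piece is nonzero: at $(\xi_1, \xi_2) = (1, 1000)$, for instance, the whole mass sits in the low-high piece.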
Thus, to prove the Coifman–Meyer theorem, it suffices to treat low-high

and high-high multipliers; the high-low case follows by symmetry.
We will need the following technical lemma.
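Lemma 7.3.6. Let $f$ be a Schwartz function and $N \in 2^{\mathbb{Z}}$. For any $x, y$, we have the bound
\[ |P_{\le N} f(y)| \lesssim \langle N(y - x) \rangle^d Mf(x), \]
where $\langle x \rangle = \sqrt{1 + |x|^2}$ and $M$ is the Hardy–Littlewood maximal function.
Proof. We begin with the estimate
\[ N^d \int |\check\varphi(N(y - z))|\,|f(z)|\,dz \lesssim N^d \int \langle N(y - z) \rangle^{-100d} |f(z)|\,dz, \]
where we use the fact that $\check\varphi$ is a Schwartz function.

For the region $\langle z - x \rangle \lesssim \langle y - x \rangle$, we estimate
\[ N^d \int_{\langle z - x \rangle \lesssim \langle y - x \rangle} \langle N(y - z) \rangle^{-100d} |f(z)|\,dz \lesssim N^d \int_{\langle z - x \rangle \lesssim \langle y - x \rangle} |f(z)|\,dz \lesssim N^d \langle y - x \rangle^d Mf(x), \]
which is acceptable. For the remaining region $\langle z - x \rangle \gg \langle y - x \rangle$, we use Lemma 7.2.5 to estimate
\[ N^d \int_{\langle z - x \rangle \gg \langle y - x \rangle} \langle N(y - z) \rangle^{-100d} |f(z)|\,dz \lesssim N^d \int \langle N(z - x) \rangle^{-100d} |f(z)|\,dz \lesssim Mf(x), \]
which is also acceptable.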
Lemma 7.3.7 (High-high paraproducts). Let $\pi_{hh}$ be a high-high paraproduct. Then $\pi_{hh}$ satisfies the bounds appearing in Theorem 7.3.3.
Proof. We write
\[ \pi_{hh}(f,g) = \sum_{N,M}\pi_{hh}(\tilde P_N P_N f, \tilde P_M P_M g), \]
where $\tilde P_N$ denotes the operator corresponding to the fattened Littlewood–Paley multiplier. Now, the operator
\[ \pi_{hh}(\tilde P_N\,\cdot\,, \tilde P_M\,\cdot\,) \]
is zero unless $N\sim M$; in this case, it is a bilinear multiplier with a symbol $m_{NM}$, which is a bump function supported where $|\xi_1|\sim|\xi_2|\sim N$. Writing $T_{NM}$ for $T_{m_{NM}}$, we find
\[ |\pi_{hh}(f,g)| \le \sum_{N\sim M}|T_{NM}(P_N f, P_M g)|. \]
Now consider the symbol $m_{NM}$. We will decompose $m_{NM}(\xi_1,\xi_2)$ using a Fourier series on a torus in $\mathbb R^d\times\mathbb R^d$ of sidelength $CN$ for large enough $C > 0$:
\[ m_{NM}(\xi_1,\xi_2) = \sum_{n_1,n_2\in\mathbb Z^d} c_{n_1,n_2}\, e^{2\pi i(n_1\cdot\xi_1+n_2\cdot\xi_2)/CN} \]
on the support of $\psi_N(\xi_1)\psi_M(\xi_2)$. Now, using the definition of the Fourier coefficients and integration by parts, the Coifman–Meyer condition guarantees that
\[ |c_{n_1,n_2}| \lesssim (1+|n_1|+|n_2|)^{-100d} \tag{7.11} \]
(see Exercise 8.3.2).
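The advantage of this decomposition is that it factors $m$ into a sum of terms of the form $a(\xi_1)b(\xi_2)$. In particular, we compute
\[ T_{NM}(P_N f, P_M g)(x) = \sum_{n_1,n_2} c_{n_1,n_2}\, P_N f\big(x-\tfrac{n_1}{CN}\big)\, P_M g\big(x-\tfrac{n_2}{CN}\big), \]
so that
\[ |\pi_{hh}(f,g)(x)| \lesssim \sum_{N\sim M}\sum_{n_1,n_2}(1+|n_1|+|n_2|)^{-100d}\,\big|P_N f\big(x-\tfrac{n_1}{CN}\big)\big|\,\big|P_M g\big(x-\tfrac{n_2}{CN}\big)\big|. \]
We write $P_N f = \tilde P_N P_N f$ and use Lemma 7.3.6 to estimate
\[ \big|P_N f\big(x-\tfrac{n_1}{CN}\big)\big| \lesssim \langle n_1\rangle^{d}\, M[P_N f](x). \]
Similarly, recalling $N\sim M$,
\[ \big|P_M g\big(x-\tfrac{n_2}{CN}\big)\big| \lesssim \langle n_2\rangle^{d}\, M[P_M g](x). \]
Thus
\[\begin{aligned}
|\pi_{hh}(f,g)(x)| &\lesssim \sum_{N\sim M}\sum_{n_1,n_2}\frac{\langle n_1\rangle^{d}\langle n_2\rangle^{d}}{\langle |n_1|+|n_2|\rangle^{100d}}\,|MP_N f(x)|\,|MP_M g(x)|\\
&\lesssim \sum_{N\sim M}|MP_N f(x)|\,|MP_M g(x)|\\
&\lesssim \Big(\sum_N |MP_N f(x)|^2\Big)^{1/2}\Big(\sum_N |MP_N g(x)|^2\Big)^{1/2},
\end{aligned}\]
and hence, using Hölder's inequality, the vector maximal inequality (Theorem 6.3.9) and the Littlewood–Paley square function estimate (Theorem 7.2.6),
\[\begin{aligned}
\|\pi_{hh}(f,g)\|_{L^r} &\lesssim \Big\|\Big(\sum_N |MP_N f|^2\Big)^{1/2}\Big(\sum_N |MP_N g|^2\Big)^{1/2}\Big\|_{L^r}\\
&\lesssim \Big\|\Big(\sum_N |MP_N f|^2\Big)^{1/2}\Big\|_{L^p}\Big\|\Big(\sum_N |MP_N g|^2\Big)^{1/2}\Big\|_{L^q}\\
&\lesssim \Big\|\Big(\sum_N |P_N f|^2\Big)^{1/2}\Big\|_{L^p}\Big\|\Big(\sum_N |P_N g|^2\Big)^{1/2}\Big\|_{L^q}\\
&\lesssim \|f\|_{L^p}\|g\|_{L^q},
\end{aligned}\]
provided $1 < p, q, r < \infty$ and $\tfrac1p+\tfrac1q = \tfrac1r$.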
It remains to treat the case of low-high paraproducts.

Lemma 7.3.8 (Low-high paraproducts). Let πlh be a low-high paraproduct.
Then πlh satisfies the bounds appearing in Theorem 7.3.3. (In fact, we may
allow p = ∞.)
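Proof. We write
\[ \pi_{lh}(f,g) = \sum_N \pi_{lh}(f, \tilde P_N P_N g) = \sum_N T_{m_N}(P_{\le N/8}f, P_N g), \]
where
\[ m_N(\xi_1,\xi_2) = m(\xi_1,\xi_2)\,\tilde\varphi_{N/8}(\xi_1)\,\tilde\psi_N(\xi_2). \]
Here $m$ denotes the multiplier for $\pi_{lh}$. In particular $m_N$ is a bump function supported where $|\xi_1|\lesssim N$ and $|\xi_2|\sim N$. Thus, we may perform a Fourier series decomposition for $m_N(\xi_1,\xi_2)$ as before and write
\[ \pi_{lh}(f,g)(x) = \sum_N\sum_{n_1,n_2} c_{n_1,n_2}\, P_{\le N/8}f\big(x-\tfrac{n_1}{CN}\big)\, P_N g\big(x-\tfrac{n_2}{CN}\big) \]
with
\[ |c_{n_1,n_2}| \lesssim \langle |n_1|+|n_2|\rangle^{-100d}. \]
We estimate the $L^r$ norm by duality. We fix $h\in L^{r'}$. Noting that
\[ P_{\le N/8}f\, P_N g = \tilde P_N[P_{\le N/8}f\, P_N g], \]
we use Lemma 7.3.6, Proposition 7.2.3(b), the vector maximal inequality (Theorem 6.3.9), and the Littlewood–Paley square function estimate (Theorem 7.2.6), to estimate
\[\begin{aligned}
&\int\sum_N\sum_{n_1,n_2}\big|c_{n_1,n_2}\, P_{\le N/8}f\big(x-\tfrac{n_1}{CN}\big)\, P_N g\big(x-\tfrac{n_2}{CN}\big)\,\tilde P_N h(x)\big|\,dx\\
&\quad\lesssim \int\sum_N\sum_{n_1,n_2}\frac{\langle n_1\rangle^{d}\langle n_2\rangle^{d}}{\langle |n_1|+|n_2|\rangle^{100d}}\,[Mf]\,M[P_N g]\,|\tilde P_N h|\,dx\\
&\quad\lesssim \int |Mf|\,\Big(\sum_N |M[P_N g]|^2\Big)^{1/2}\Big(\sum_N |\tilde P_N h|^2\Big)^{1/2}dx\\
&\quad\lesssim \|Mf\|_{L^p}\,\Big\|\Big(\sum_N |M[P_N g]|^2\Big)^{1/2}\Big\|_{L^q}\,\Big\|\Big(\sum_N |\tilde P_N h|^2\Big)^{1/2}\Big\|_{L^{r'}}\\
&\quad\lesssim \|f\|_{L^p}\|g\|_{L^q}\|h\|_{L^{r'}}.
\end{aligned}\]
Taking the supremum over unit $h\in L^{r'}$ yields the result.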
Proof of Theorem 7.3.3. Combining the high-high and low-high estimates,

we complete the proof of Theorem 7.3.3.
7.4 Oscillatory integrals

In this section we discuss the theory of oscillatory integrals. There are two
types of integrals one often considers.
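Oscillatory integrals of the first kind are written
\[ I(\lambda) = \int e^{i\lambda\phi(x)}\psi(x)\,dx, \]
where $\lambda > 0$, $\phi:\mathbb R^d\to\mathbb R$, and $\psi:\mathbb R^d\to\mathbb C$. In this case we are interested in the asymptotic behavior of $I(\lambda)$ as $\lambda\to\infty$.
Oscillatory integrals of the second kind are written
\[ T_\lambda f(x) = \int e^{i\lambda\phi(x,y)}K(x,y)f(y)\,dy, \]
where $\lambda > 0$, $\phi:\mathbb R^d\times\mathbb R^d\to\mathbb R$, $K:\mathbb R^d\times\mathbb R^d\to\mathbb C$, and $f:\mathbb R^d\to\mathbb C$. In this case, we are interested in estimates on the operator norm of $T_\lambda$ as $\lambda\to\infty$.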
We begin by considering oscillatory integrals of the first kind in dimen-
sion d = 1.
Proposition 7.4.1. Let $\phi:\mathbb R\to\mathbb R$ and $\psi:\mathbb R\to\mathbb C$ be smooth functions. Suppose $\psi$ has compact support inside an interval $(a,b)$ and $\phi'(x)\ne 0$ for all $x\in[a,b]$. Then
\[ I(\lambda) = \int_a^b e^{i\lambda\phi(x)}\psi(x)\,dx \]
satisfies
\[ |I(\lambda)| \lesssim_N \lambda^{-N} \quad\text{for all } N\ge 0. \]
Note that without the assumption of compact support inside $(a,b)$, the best possible decay is $\lambda^{-1}$, which is realized by $\phi(x) = x$ and $\psi(x) = 1$.
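The contrast between the two decay rates can be checked numerically. The following is a minimal sketch, not part of the notes: it assumes the phase $\phi(x) = x$ on $[-1,1]$, uses the standard bump $\exp(-1/(1-x^2))$ as a hypothetical choice of compactly supported amplitude, and approximates the integral with a plain trapezoid rule (the helper names `bump` and `oscillatory_integral` are illustrative only).

```python
import cmath
import math


def bump(x):
    """Smooth bump supported in (-1, 1): exp(-1/(1-x^2)) inside, 0 outside."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def oscillatory_integral(lam, amplitude, a=-1.0, b=1.0, n=20000):
    """Trapezoid-rule approximation of the integral of e^{i*lam*x} * amplitude(x) over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (cmath.exp(1j * lam * a) * amplitude(a)
                   + cmath.exp(1j * lam * b) * amplitude(b))
    for k in range(1, n):
        x = a + k * step
        total += cmath.exp(1j * lam * x) * amplitude(x)
    return total * step


for lam in (10.0, 100.0):
    smooth = abs(oscillatory_integral(lam, bump))              # amplitude vanishing near the endpoints
    sharp = abs(oscillatory_integral(lam, lambda x: 1.0))      # psi = 1: boundary terms survive
    print(f"lambda={lam:6.1f}  bump amplitude: {smooth:.3e}  psi=1: {sharp:.3e}")
```

For $\psi = 1$ the integral equals $2\sin(\lambda)/\lambda$ exactly, so it decays no faster than $\lambda^{-1}$, while the bump amplitude gives a value that is several orders of magnitude smaller at $\lambda = 100$.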
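Proof. First, the bound $|I(\lambda)|\lesssim 1$ is immediate. Next, we write
\[ e^{i\lambda\phi(x)} = \frac{1}{i\lambda\phi'(x)}\frac{d}{dx}e^{i\lambda\phi(x)} \]
and integrate by parts. This yields
\[ I(\lambda) = -\int e^{i\lambda\phi(x)}\frac{d}{dx}\Big[\frac{1}{i\lambda\phi'(x)}\psi(x)\Big]dx, \]
so that
\[ |I(\lambda)| \lesssim \lambda^{-1}. \]
To continue, define the operator $D$ via
\[ Df(x) = \frac{1}{i\lambda\phi'(x)}\frac{d}{dx}f(x). \]
The computation above shows that the adjoint $D^t$ is given by
\[ D^t f(x) = -\frac{d}{dx}\Big[\frac{1}{i\lambda\phi'(x)}f(x)\Big]. \]
For any $N\ge 0$, we may write $e^{i\lambda\phi} = D^N e^{i\lambda\phi}$, and so
\[ I(\lambda) = \int_a^b e^{i\lambda\phi}(D^t)^N\psi\,dx. \]
This yields
\[\begin{aligned}
|I(\lambda)| &\lesssim \int |(D^t)^N\psi|\,dx\\
&\lesssim \lambda^{-N}\sum_{k=0}^{N}\ \sum_{|\beta|+|\alpha_1|+\cdots+|\alpha_k|=N}\Big\|\frac{\partial^\beta\psi\,(\partial^{\alpha_1}\phi)\cdots(\partial^{\alpha_k}\phi)}{(\phi')^{N+k}}\Big\|_{L^1([a,b])}\\
&\lesssim_N \lambda^{-N}.
\end{aligned}\]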
Proposition 7.4.2 (Van der Corput Lemma). Let $\phi$ be real-valued and smooth. Let $k\ge 1$ and suppose
\[ |\phi^{(k)}(x)| \ge 1 \quad\text{for all } x\in[a,b]. \]
If $k = 1$, assume additionally that $\phi'$ is monotone. Then
\[ I(\lambda) = \int_a^b e^{i\lambda\phi(x)}\,dx \]
satisfies
\[ |I(\lambda)| \lesssim_k \lambda^{-1/k}, \]
where the implicit constant is independent of $\lambda$, $\phi$, $a$, $b$.
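As a quick numerical sanity check of the $k = 2$ case, the sketch below (an illustration under stated assumptions, not part of the notes) takes the hypothetical phase $\phi(x) = x^2$ on $[0,1]$, for which $|\phi''| = 2 \ge 1$, and approximates $I(\lambda)$ by a midpoint rule. The rescaled quantity $|I(\lambda)|\lambda^{1/2}$ should remain bounded as $\lambda$ grows.

```python
import cmath
import math


def fresnel_type_integral(lam, n=100000):
    """Midpoint-rule approximation of the integral of exp(i*lam*x^2) over [0, 1]."""
    step = 1.0 / n
    total = 0.0 + 0.0j
    for j in range(n):
        x = (j + 0.5) * step
        total += cmath.exp(1j * lam * x * x)
    return total * step


for lam in (10.0, 100.0, 1000.0):
    value = abs(fresnel_type_integral(lam))
    # Van der Corput with k = 2 predicts |I(lam)| <= C * lam^(-1/2).
    print(f"lambda={lam:7.1f}  |I|={value:.5f}  |I|*lam^(1/2)={value * math.sqrt(lam):.4f}")
```

After the substitution $t = \lambda^{1/2}x$ the integral becomes a truncated Fresnel integral, so the rescaled values approach $\sqrt{\pi}/2$ for large $\lambda$.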

Note that if $k = 1$ then one needs more than just $|\phi'|\ge 1$. This can be seen first by noting that
\[ \Big|\int_a^b e^{i\phi(x)}\,dx\Big| \ge \Big|\int_a^b \cos(\phi(x))\,dx\Big|. \]
Now, if $\phi$ is chosen so that $\phi'$ is very large when $\cos\phi < 0$ and $\phi'\sim 1$ when $\cos\phi > 0$, then one can arrange that
\[ \int_a^b \cos(\phi(x))\,dx \to \infty \quad\text{as } b\to\infty. \]
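Proof of Van der Corput. First consider $k = 1$. Then an integration by parts yields
\[ I(\lambda) = \frac{1}{i\lambda}\Big[\frac{e^{i\lambda\phi(b)}}{\phi'(b)} - \frac{e^{i\lambda\phi(a)}}{\phi'(a)}\Big] - \frac{1}{i\lambda}\int_a^b e^{i\lambda\phi}\frac{d}{dx}\Big[\frac{1}{\phi'(x)}\Big]dx. \]
As the integral is bounded by
\[ \int_a^b\Big|\frac{d}{dx}\Big[\frac{1}{\phi'(x)}\Big]\Big|\,dx \lesssim 1 \]
(since $\phi'$ is assumed to be monotone), the result follows in this case.
For $k\ge 2$, we proceed by induction. Suppose the result holds at level $k\ge 1$. Replacing $\phi$ by $-\phi$ if necessary, we may assume that
\[ \phi^{(k+1)}(x) \ge 1 \quad\text{for all } x\in[a,b]. \]
In particular, $\phi^{(k)}$ is increasing, so there is at most one point $c\in[a,b]$ such that $\phi^{(k)}(c) = 0$.
Case 1. Suppose there exists $c\in[a,b]$ so that $\phi^{(k)}(c) = 0$. Then for any $\delta > 0$ we have
\[ |\phi^{(k)}(x)| \ge \delta \quad\text{for all } x\in[a,b]\setminus(c-\delta,c+\delta). \]
We write
\[ I(\lambda) = \int_a^{c-\delta}e^{i\lambda\phi}\,dx + \int_{c-\delta}^{c+\delta}e^{i\lambda\phi}\,dx + \int_{c+\delta}^{b}e^{i\lambda\phi}\,dx. \]
We estimate by the change of variables $x = \delta^{-1/k}y$:
\[ \int_a^{c-\delta}e^{i\lambda\phi}\,dx = \delta^{-1/k}\int_{\delta^{1/k}a}^{\delta^{1/k}(c-\delta)}e^{i\lambda\phi(\delta^{-1/k}y)}\,dy \lesssim \delta^{-1/k}\lambda^{-1/k}, \]
where we have used the inductive hypothesis and the fact that
\[ |\partial^k[\phi(\delta^{-1/k}y)]| \ge \delta^{-1}\delta \ge 1. \]
The contribution of $(c+\delta,b)$ is treated similarly, while the contribution of $(c-\delta,c+\delta)$ is bounded by the length of the interval, i.e. $2\delta$. In particular,
\[ \Big|\int_a^b e^{i\lambda\phi}\,dx\Big| \lesssim \delta + (\delta\lambda)^{-1/k} \lesssim \lambda^{-\frac{1}{k+1}}, \]
as can be seen by choosing $\delta\sim\lambda^{-\frac{1}{k+1}}$.
Case 2. Suppose $\phi^{(k)}\ne 0$ for $x\in[a,b]$. If $\phi^{(k)}(a) > 0$ then we have $\phi^{(k)}(x)\ge\delta$ for $x\in[a+\delta,b]$. Then we can write
\[\begin{aligned}
\Big|\int_a^b e^{i\lambda\phi}\,dx\Big| &\le \Big|\int_a^{a+\delta}e^{i\lambda\phi}\,dx\Big| + \Big|\int_{a+\delta}^{b}e^{i\lambda\phi}\,dx\Big|\\
&\le \delta + (\delta\lambda)^{-1/k}\\
&\lesssim \lambda^{-\frac{1}{k+1}},
\end{aligned}\]
choosing $\delta$ as above. If $\phi^{(k)}(a) < 0$ then we have $\phi^{(k)}(b) < 0$ and so $\phi^{(k)}(x) < -\delta$ for $x\in(a,b-\delta)$. Then we can argue similarly. This completes the proof.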
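Corollary 7.4.3. Let $\phi:\mathbb R\to\mathbb R$ be smooth. Let $k\ge 1$ and assume that
\[ |\phi^{(k)}(x)| \ge 1 \quad\text{for all } x\in[a,b]. \]
If $k = 1$, assume additionally that $\phi'$ is monotone. Then
\[ \Big|\int_a^b e^{i\lambda\phi(x)}\psi(x)\,dx\Big| \lesssim_k \lambda^{-1/k}\Big[|\psi(b)| + \|\psi'\|_{L^1([a,b])}\Big]. \]
Proof. We write
\[\begin{aligned}
\int_a^b e^{i\lambda\phi(x)}\psi(x)\,dx &= \int_a^b \frac{d}{dx}\Big[\int_a^x e^{i\lambda\phi(y)}\,dy\Big]\psi(x)\,dx\\
&= \psi(b)\int_a^b e^{i\lambda\phi(y)}\,dy - \int_a^b\psi'(x)\int_a^x e^{i\lambda\phi(y)}\,dy\,dx.
\end{aligned}\]
Thus
\[\begin{aligned}
\Big|\int_a^b e^{i\lambda\phi}\psi\,dx\Big| &\le |\psi(b)|\,\Big|\int_a^b e^{i\lambda\phi}\,dy\Big| + \sup_{x\in[a,b]}\Big|\int_a^x e^{i\lambda\phi}\,dy\Big|\,\|\psi'\|_{L^1}\\
&\lesssim |\psi(b)|\lambda^{-1/k} + \lambda^{-1/k}\|\psi'\|_{L^1},
\end{aligned}\]
where in the final step we use the van der Corput lemma.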
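Proposition 7.4.4 (Stationary phase). Let $\phi:\mathbb R\to\mathbb R$ be smooth. Assume $\phi$ has a nondegenerate critical point at $x_0$, that is,
\[ \phi'(x_0) = 0 \quad\text{and}\quad \phi''(x_0)\ne 0. \]
If $\psi:\mathbb R\to\mathbb C$ is smooth and supported in a sufficiently small neighborhood of $x_0$, then
\[ I(\lambda) := \int e^{i\lambda\phi(x)}\psi(x)\,dx \]
satisfies
\[ I(\lambda) = (2\pi i)^{\frac12}\lambda^{-\frac12}[\phi''(x_0)]^{-\frac12}e^{i\lambda\phi(x_0)}\psi(x_0) + O(\lambda^{-\frac32}) \quad\text{as } \lambda\to\infty. \]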
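The size of the error term can be illustrated in a case where everything is explicit. The sketch below is only an illustration under assumptions that go beyond the proposition: it takes $\phi(x) = x^2/2$ (so $x_0 = 0$, $\phi''(x_0) = 1$) and the Gaussian amplitude $\psi(x) = e^{-x^2}$, which is Schwartz rather than compactly supported, because then the integral has the closed form $\sqrt{\pi/(1 - i\lambda/2)}$ with the principal branch of the square root.

```python
import cmath
import math


def exact_integral(lam):
    """Closed form of the integral of exp(i*lam*x^2/2) * exp(-x^2) over the real line."""
    return cmath.sqrt(math.pi / (1.0 - 0.5j * lam))


def leading_term(lam):
    """Stationary phase prediction (2*pi*i)^(1/2) * lam^(-1/2), using phi''(0) = 1, phi(0) = 0, psi(0) = 1."""
    return cmath.sqrt(2j * math.pi / lam)


for lam in (50.0, 200.0, 800.0):
    error = abs(exact_integral(lam) - leading_term(lam))
    # The rescaled error should stay bounded, consistent with an O(lam^(-3/2)) remainder.
    print(f"lambda={lam:6.1f}  error={error:.3e}  error*lam^(3/2)={error * lam ** 1.5:.4f}")
```

In this model case the rescaled error tends to $\sqrt{2\pi}$, which is consistent with the $O(\lambda^{-3/2})$ remainder.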
Proof. Let us first get the decay rate.
Let $a\in C_c^\infty$ satisfy $a(x) = 1$ for $|x|\le 1$ and $a(x) = 0$ for $|x| > 2$. We write
\[ I(\lambda) = I_1(\lambda) + I_2(\lambda), \]
where
\[\begin{aligned}
I_1(\lambda) &= \int e^{i\lambda\phi(x)}\psi(x)\,a(\lambda^{\frac12}(x-x_0))\,dx,\\
I_2(\lambda) &= \int e^{i\lambda\phi(x)}\psi(x)\,[1-a(\lambda^{\frac12}(x-x_0))]\,dx.
\end{aligned}\]
Thus by a change of variables,
\[ |I_1(\lambda)| \lesssim \lambda^{-\frac12}. \]
On the other hand, using integration by parts (and the fact that $\phi'(x)\ne 0$ on the support of $\psi$ away from $x = x_0$), we can get
\[ |I_2(\lambda)| \lesssim_N \lambda^{-N} \]
for any $N$.
To get the exact coefficients, we argue as follows. By Taylor's theorem and $\phi'(x_0) = 0$,
\[ \phi(x) - \phi(x_0) = \tfrac12\phi''(x_0)(x-x_0)^2\,[1+\eta(x)], \]
where $\eta$ is smooth and $\eta(x) = O(|x-x_0|)$.
Now let $U$ be a small neighborhood of $x_0$ so that (i) $|\eta| < 1$ on $U$ and (ii) $\phi'\ne 0$ on $U\setminus\{x_0\}$. We make the change of variables
\[ y = (x-x_0)\{1+\eta(x)\}^{\frac12}, \]
which is a diffeomorphism (i.e. differentiable bijection with differentiable inverse) from $U$ to a small neighborhood of $y = 0$. Assume that $\psi$ is supported in $U$.
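Now we write
\[\begin{aligned}
I(\lambda) &= e^{i\lambda\phi(x_0)}\int_U e^{i\lambda[\phi(x)-\phi(x_0)]}\psi(x)\,dx\\
&= e^{i\lambda\phi(x_0)}\int e^{i\lambda\phi''(x_0)y^2/2}\psi_1(y)\,dy,
\end{aligned}\]
where $\psi_1\in C_c^\infty$ is supported in a neighborhood of $y = 0$. Set $\lambda_1 = \lambda\phi''(x_0)/2$.
Introducing another $C_c^\infty$ function $\psi_2$ (equal to one on the support of $\psi_1$), we can write
\[ \int e^{i\lambda_1y^2}\psi_1(y)\,dy = \int e^{i\lambda_1y^2}e^{-y^2}e^{y^2}\psi_1(y)\psi_2(y)\,dy. \]
We now use Taylor expansion to write
\[ e^{y^2}\psi_1(y) = \sum_{j=0}^{N}a_jy^j + y^NR_N(y) = P(y) + y^NR_N(y) \]
for some $N\ge 2$. Note $a_0 = \psi(x_0)$.
Thus
\[\begin{aligned}
\int e^{i\lambda_1y^2}e^{-y^2}e^{y^2}\psi_1\psi_2\,dy &= \sum_{j=0}^{N}a_j\int e^{i\lambda_1y^2}e^{-y^2}y^j\,dy && (7.12)\\
&\quad + \int e^{i\lambda_1y^2}e^{-y^2}P\,[\psi_2-1]\,dy && (7.13)\\
&\quad + \int e^{i\lambda_1y^2}e^{-y^2}y^NR_N\psi_2\,dy. && (7.14)
\end{aligned}\]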
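First consider (7.12). Then, using the change of variables $z = y(1-i\lambda_1)^{\frac12}$ (technically this should be done more carefully using contour integration) and using $(1+z)^{-a} = 1 - az + O(z^2)$, we have
\[\begin{aligned}
\int e^{i\lambda_1y^2}e^{-y^2}y^j\,dy &= \int e^{-y^2(1-i\lambda_1)}y^j\,dy\\
&= (1-i\lambda_1)^{-\frac12-\frac j2}\int e^{-z^2}z^j\,dz\\
&= (-i\lambda_1)^{-\frac12-\frac j2}(1+i\lambda_1^{-1})^{-\frac12-\frac j2}\int e^{-z^2}z^j\,dz\\
&= (-i\lambda_1)^{-\frac12-\frac j2}(1+O(\lambda_1^{-1}))\int e^{-z^2}z^j\,dz.
\end{aligned}\]
Thus the leading term of (7.12) is
\[ \psi(x_0)\sqrt{\pi}\,\Big[\frac{-i\phi''(x_0)\lambda}{2}\Big]^{-\frac12}, \]
as desired. The next terms are all $O(\lambda_1^{-\frac32}) = O(\lambda^{-\frac32})$ (note that the integral in the $j = 1$ case vanishes).
We would like to show that (7.13) and (7.14) are $O(\lambda^{-\frac32})$.
For term (7.13), we note that
\[ e^{-y^2}P(y)[\psi_2(y)-1] \]
is supported away from zero, so that we may integrate by parts to deduce (7.13) is $\lesssim_m\lambda^{-m}$ for any $m\ge 0$.
We turn to (7.14). We write
\[\begin{aligned}
\int e^{i\lambda_1y^2}y^Ne^{-y^2}R_N\psi_2\,dy &= \int e^{i\lambda_1y^2}y^Ne^{-y^2}R_N\psi_2\,a(\tfrac y\varepsilon)\,dy && (7.15)\\
&\quad + \int e^{i\lambda_1y^2}y^Nb(y)[1-a(\tfrac y\varepsilon)]\,dy, && (7.16)
\end{aligned}\]
where
\[ b(y) := e^{-y^2}R_N(y)\psi_2(y). \]
Now,
\[ |(7.15)| \lesssim \int |y|^N|a(\tfrac y\varepsilon)|\,dy \lesssim \varepsilon^{N+1}. \]
To deal with (7.16), we may use the identity
\[ e^{i\lambda_1y^2} = \frac{1}{2i\lambda_1y}\frac{d}{dy}e^{i\lambda_1y^2}, \]
since the integrand in (7.16) is supported away from $y = 0$. Then
\[ (7.16) = \int e^{i\lambda_1y^2}\Big[-\frac{d}{dy}\frac{1}{2iy\lambda_1}\Big]^m\big[y^Nb(y)(1-a(\tfrac y\varepsilon))\big]\,dy. \]
Thus, choosing $m > N+1$,
\[\begin{aligned}
|(7.16)| &\lesssim \lambda_1^{-m}\sum_{k=0}^{m}\ \sum_{\alpha_1+\alpha_2+\alpha_3=m-k}\int\frac{y^{N-\alpha_1}|\partial^{\alpha_2}b|\,\varepsilon^{-\alpha_3}|\partial^{\alpha_3}[1-a](\tfrac y\varepsilon)|}{|y|^{m+k}}\,dy\\
&\lesssim \lambda_1^{-m}\sum_{k,\alpha}\int_{|y|\ge\varepsilon}|y|^{N-m-(k+\alpha_1)}\varepsilon^{-\alpha_3}\,dy\\
&\lesssim \lambda_1^{-m}\sum_{k,\alpha}\varepsilon^{N+1-m-(k+\alpha_1+\alpha_3)}\\
&\lesssim \lambda_1^{-m}\varepsilon^{N+1-2m}.
\end{aligned}\]
Now choose $\varepsilon\sim\lambda_1^{-\frac12}$, so that
\[ \varepsilon^{N+1} \sim \lambda_1^{-m}\varepsilon^{N+1-2m}. \]
We have
\[ |(7.16)| \lesssim \lambda_1^{-\frac{N+1}{2}} \lesssim \lambda_1^{-\frac32} \]
for $N\ge 2$. This completes the proof.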
Example 7.4.1 (Linear Schrödinger equation). Consider the linear Schrödinger equation
\[ \begin{cases} i\partial_tu = -\tfrac12\Delta u,\\ u(0,x) = u_0(x), \end{cases} \]
where $u:\mathbb R_t\times\mathbb R_x\to\mathbb C$. We can solve this using the Fourier transform. With
\[ \hat f(\xi) = (2\pi)^{-\frac12}\int e^{-ix\xi}f(x)\,dx, \]
we have
\[ i\partial_t\hat u(t,\xi) = \tfrac12|\xi|^2\hat u(t,\xi), \]
so that
\[ u(t,x) = (2\pi)^{-\frac12}\int e^{ix\xi-it\xi^2/2}\hat u_0(\xi)\,d\xi. \]
Stationary phase allows us to describe the long-time behavior of solutions. In particular, we write
\[ e^{ix\xi-it\xi^2/2} = e^{it\Phi(\xi;t,x)}, \qquad \Phi(\xi;t,x) = \tfrac xt\xi - \tfrac12\xi^2. \]
We compute the critical points of $\Phi$:
\[ \partial_\xi\Phi = \tfrac xt - \xi = 0 \quad\text{for } \xi = \xi_0 := \tfrac xt. \]
As $\partial_\xi^2\Phi\equiv -1$, we have that $\xi_0$ is a nondegenerate critical point.
Thus, stationary phase yields
\[ u(t,x) = (2\pi)^{-\frac12}(2\pi i)^{\frac12}t^{-\frac12}(-1)^{-\frac12}e^{it(\frac xt\xi_0-\frac12\xi_0^2)}\hat u_0(\xi_0) + O(t^{-\frac32}) \]
as $t\to\infty$. Simplifying this we get the Fraunhofer approximation
\[ u(t,x) \sim (it)^{-\frac12}e^{ix^2/2t}\,\hat u_0(\tfrac xt). \]
Roughly, this states that the long-time spatial distribution is determined by the initial momentum distribution.
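The Fraunhofer approximation can be tested against an explicit solution. The sketch below is an illustration under an assumed choice of data, namely the Gaussian $u_0(x) = e^{-x^2/2}$, for which $\hat u_0(\xi) = e^{-\xi^2/2}$ in the normalization above and the solution is $u(t,x) = (1+it)^{-1/2}\exp(-x^2/(2(1+it)))$. The function names are hypothetical, and principal branches of the square roots are used throughout.

```python
import cmath


def exact_solution(t, x):
    """Solution of i u_t = -(1/2) u_xx with u(0, x) = exp(-x^2/2)."""
    z = 1.0 + 1j * t
    return z ** -0.5 * cmath.exp(-x * x / (2.0 * z))


def fraunhofer(t, x):
    """Fraunhofer approximation (it)^(-1/2) * exp(i x^2 / (2t)) * u0_hat(x / t)."""
    xi0 = x / t
    u0_hat = cmath.exp(-xi0 * xi0 / 2.0)  # the Gaussian is its own transform in this normalization
    return (1j * t) ** -0.5 * cmath.exp(1j * x * x / (2.0 * t)) * u0_hat


def relative_error(t, v):
    """Relative error of the approximation along the ray x = v * t."""
    x = v * t
    return abs(exact_solution(t, x) - fraunhofer(t, x)) / abs(fraunhofer(t, x))


for t in (10.0, 100.0, 1000.0):
    print(f"t={t:7.1f}  relative error on x = t/2: {relative_error(t, 0.5):.3e}")
```

Along a fixed ray $x = vt$ the relative error decreases roughly like $1/t$, in line with the $O(t^{-3/2})$ remainder against a leading term of size $t^{-1/2}$.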
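Example 7.4.2. Consider the Klein–Gordon equation
\[ u_{tt} - u_{xx} + m^2u = 0. \]
Using the Fourier transform, this becomes
\[ \hat u_{tt} = -(\xi^2+m^2)\hat u, \]
and so
\[ \hat u(t,\xi) = A(\xi)e^{it\sqrt{\xi^2+m^2}} + B(\xi)e^{-it\sqrt{\xi^2+m^2}} \]
for some $A, B$ defined in terms of the initial data. In particular, to understand the asymptotic behavior we need to understand the asymptotic behavior of
\[ \int e^{ix\xi\pm it\sqrt{\xi^2+m^2}}\varphi(\xi)\,d\xi. \]
Consider first
\[ \Phi = \tfrac xt\xi + \sqrt{\xi^2+m^2}, \]
so that
\[ \Phi' = \tfrac xt + \xi(\xi^2+m^2)^{-\frac12}, \qquad \Phi'' = \frac{m^2}{(\xi^2+m^2)^{\frac32}} \ne 0. \]
The stationary point of $\Phi$ occurs when
\[ \tfrac xt = -\xi(\xi^2+m^2)^{-\frac12}. \]
Squaring both sides leads to
\[ \xi^2 = \frac{m^2(\frac xt)^2}{1-(\frac xt)^2}, \]
and thus we get a stationary point if $|\frac xt| < 1$. Considering separately the cases $x < 0$ and $x > 0$, we find that the stationary point is given by
\[ \xi_0 = \frac{-m(\frac xt)}{\sqrt{1-(\frac xt)^2}}. \]
In this case, we get that
\[ \Phi(\xi_0) = m\sqrt{1-(\tfrac xt)^2} \quad\text{and}\quad \Phi''(\xi_0) = m^{\frac32}\big[1-(\tfrac xt)^2\big]^{-\frac32}, \]
and so
\[ \int e^{ix\xi+it\sqrt{\xi^2+m^2}}\varphi(\xi)\,d\xi \sim (2\pi i)^{\frac12}t^{-\frac12}m^{-\frac34}\big[1-(\tfrac xt)^2\big]^{\frac34}e^{imt\sqrt{1-(\frac xt)^2}}\,\varphi\Big(\frac{-m\frac xt}{\sqrt{1-(\frac xt)^2}}\Big) \]
provided $|\frac xt| < 1$.
For the other phase, observe
\[\begin{aligned}
\int e^{ix\xi-it\sqrt{\xi^2+m^2}}\psi(\xi)\,d\xi &= \overline{\int e^{-ix\xi+it\sqrt{\xi^2+m^2}}\,\bar\psi(\xi)\,d\xi}\\
&= \overline{\int e^{ix\xi+it\sqrt{\xi^2+m^2}}\,\bar\psi(-\xi)\,d\xi}\\
&\sim (-2\pi i)^{\frac12}t^{-\frac12}m^{-\frac34}\big[1-(\tfrac xt)^2\big]^{\frac34}e^{-imt\sqrt{1-(\frac xt)^2}}\,\psi\Big(\frac{m\frac xt}{\sqrt{1-(\frac xt)^2}}\Big).
\end{aligned}\]
Now return to the PDE. Suppose $u|_{t=0} = f$ for some $f:\mathbb R\to\mathbb R$ and $\partial_tu|_{t=0} = 0$. (We leave the more general case as an exercise.) Solving for $A$ and $B$ above then yields $A = B = \tfrac12\hat f$. Noting that $\hat f(-\xi) = \overline{\hat f(\xi)}$ (for real-valued $f$), we deduce
\[ u(t,x) \sim t^{-\frac12}m^{-\frac34}\big[1-(\tfrac xt)^2\big]^{\frac34}\,\mathrm{Re}\Big[i^{\frac12}e^{imt\sqrt{1-(\frac xt)^2}}\hat f\Big(\frac{-m\frac xt}{\sqrt{1-(\frac xt)^2}}\Big)\Big] \]
for $|x| < t$. Alternately, writing $\rho = (t^2-|x|^2)^{\frac12}$, we can write
\[ u(t,x) \sim_m \rho^{-\frac12}\,\mathrm{Re}[i^{\frac12}e^{im\rho}\hat f(m\rho)], \qquad |x| < t. \]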
We turn to the higher dimensional case. We begin with the following.

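Proposition 7.4.5. Let $\psi:\mathbb R^d\to\mathbb C$ be smooth and compactly supported. Let $\phi:\mathbb R^d\to\mathbb R$ be smooth, with $\nabla\phi$ nonzero on the support of $\psi$. Then
\[ I(\lambda) := \int e^{i\lambda\phi(x)}\psi(x)\,dx \]
obeys
\[ |I(\lambda)| \lesssim_N \lambda^{-N} \quad\text{for all } N > 0. \]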
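Proof. The case $N = 0$ is immediate. Let us demonstrate the $N = 1$ case; the extension to $N\ge 2$ then follows from iteration.
We use the fact that
\[ \nabla e^{i\lambda\phi} = i\lambda\nabla\phi\,e^{i\lambda\phi}, \]
so that
\[ e^{i\lambda\phi} = \frac{\nabla\phi\cdot\nabla e^{i\lambda\phi}}{i\lambda|\nabla\phi|^2}. \]
We can then write
\[\begin{aligned}
I(\lambda) &= \int\frac{\nabla\phi\cdot\nabla e^{i\lambda\phi}}{i\lambda|\nabla\phi|^2}\,\psi\,dx\\
&= \sum_{j=1}^{d}\int -\partial_j\Big[\frac{\partial_j\phi}{i\lambda|\nabla\phi|^2}\psi\Big]e^{i\lambda\phi}\,dx,
\end{aligned}\]
and so
\[ |I(\lambda)| \lesssim \lambda^{-1}\Big\|\nabla\cdot\Big(\frac{\nabla\phi\,\psi}{|\nabla\phi|^2}\Big)\Big\|_{L^1} \lesssim \lambda^{-1}. \]
As mentioned above, the case $N\ge 2$ follows from iteration.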
We skip the analogue of van der Corput's lemma, which would yield a bound of $\lambda^{-1/|\alpha|}$ whenever one has lower bounds on $\partial^\alpha\phi$.
Instead, we will move to stationary phase in higher dimensions. The
result is the following.
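Proposition 7.4.6. Let $\phi:\mathbb R^d\to\mathbb R$ be smooth. Assume $\phi$ has a nondegenerate critical point at $x_0$. If $\psi:\mathbb R^d\to\mathbb C$ is smooth and supported in a sufficiently small neighborhood of $x_0$, then
\[ I(\lambda) := \int e^{i\lambda\phi(x)}\psi(x)\,dx \]
satisfies
\[ I(\lambda) = \frac{(2\pi i)^{\frac d2}e^{i\lambda\phi(x_0)}\psi(x_0)}{\det\big[\frac{\partial^2\phi}{\partial x_i\partial x_j}(x_0)\big]^{\frac12}}\,\lambda^{-\frac d2} + O(\lambda^{-\frac d2-1}) \]
as $\lambda\to\infty$.
We also write $\big(\frac{\partial^2\phi}{\partial x_i\partial x_j}\big) = D^2\phi$.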
The proof is similar in spirit to the d = 1 case. The key step there was to
use a change of variables to turn the phase into an exactly quadratic phase.
The necessary result in higher dimensions is the following:
Lemma 7.4.7 (Morse lemma). Let $\phi:\mathbb R^d\to\mathbb R$ be smooth with a nondegenerate critical point at $x_0$. Then there exists a smooth local change of variables $y = y(x)$ such that $y(x_0) = 0$, $\frac{\partial y}{\partial x}\big|_{x_0} = \mathrm{Id}$, and
\[ \phi(x) = \phi(x_0) + \tfrac12\sum_{j=1}^{d}\lambda_jy_j^2, \]
where $\lambda_j$ are the eigenvalues of $D^2\phi(x_0)$.

With the Morse lemma in hand, the proof of the stationary phase lemma
is very similar to the proof in one dimension. So, we will conclude our
discussion of oscillatory integrals of the first kind by proving the Morse
lemma.
Proof of the Morse lemma. Noting that $D^2\phi(x_0)$ is symmetric, we may (by a change of variables) assume that
\[ D^2\phi(x_0) = \mathrm{diag}(\lambda_1,\dots,\lambda_d). \]
Now we can write
\[ \phi(x) = \phi(x_0) + \nabla\phi(x_0)\cdot(x-x_0) + \int_0^1(1-t)\frac{d^2}{dt^2}[\phi(x_0+t(x-x_0))]\,dt, \]
as one can check by integrating by parts in the final integral. In particular, since $\nabla\phi(x_0) = 0$,
\[\begin{aligned}
\phi(x)-\phi(x_0) &= \sum_{i,j=1}^{d}(x-x_0)_i(x-x_0)_j\int_0^1(1-t)\frac{\partial^2\phi}{\partial x_i\partial x_j}(x_0+t(x-x_0))\,dt\\
&=: \sum_{i,j=1}^{d}(x-x_0)_i(x-x_0)_j\,m_{ij}(x).
\end{aligned}\]
Observe that $m_{ij}$ is smooth, with $m_{ij} = m_{ji}$ and $m_{ij}(x_0) = \tfrac12\frac{\partial^2\phi}{\partial x_i\partial x_j}(x_0)$.
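We proceed by induction. Suppose we have found a smooth local change of variables $y = y(x)$ so that
\[ \phi(x) = \phi(x_0) + \tfrac12\lambda_1y_1^2 + \cdots + \tfrac12\lambda_{r-1}y_{r-1}^2 + \sum_{i,j\ge r}y_iy_j\tilde m_{ij}(y), \]
where $y(x_0) = 0$, $\frac{\partial y}{\partial x}\big|_{x=x_0} = \mathrm{Id}$, and the $\tilde m_{ij}$ are smooth and symmetric. (The computation above gives the base case $r = 1$ with $y(x) = x - x_0$.)
Now we compute
\[ \partial_{x_i}\partial_{x_j}(\tfrac12\lambda_ky_k^2) = \lambda_k(\partial_{x_i}y_k\,\partial_{x_j}y_k) + \lambda_ky_k(\partial_{x_i}\partial_{x_j}y_k). \]
In particular, at $x = x_0$ we have
\[ \partial_{x_i}\partial_{x_j}(\tfrac12\lambda_ky_k^2)\big|_{x=x_0} = \lambda_k\delta_{ik}\delta_{jk}. \]
Thus
\[ D^2\big(\tfrac12\lambda_1y_1^2 + \cdots + \tfrac12\lambda_{r-1}y_{r-1}^2\big)\big|_{x=x_0} = \mathrm{diag}\{\lambda_1,\dots,\lambda_{r-1},0,\dots,0\}. \]
Using the fact that
\[ D^2(\phi(x)-\phi(x_0))\big|_{x=x_0} = \mathrm{diag}\{\lambda_1,\dots,\lambda_d\}, \]
we deduce
\[ D^2\Big[\sum_{i,j\ge r}y_iy_j\tilde m_{ij}(y)\Big] = \mathrm{diag}\{0,\dots,0,\lambda_r,\dots,\lambda_d\}. \]
In particular, this implies
\[ [\tilde m_{ij}(y(x_0))]_{i,j\ge r} = [\tilde m_{ij}(0)]_{i,j\ge r} = \tfrac12\,\mathrm{diag}\{\lambda_r,\dots,\lambda_d\}, \]
because if any $y_k$ survives undifferentiated then the contribution of the term will be zero.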
We would now like to define a change of variables $y'$ so that $y_j' = y_j$ for any $j\ne r$, while
\[ \sum_{i,j\ge r}y_iy_j\tilde m_{ij}(y) = \tfrac12\lambda_r(y_r')^2 + \sum_{i,j\ge r+1}y_iy_j\tilde m_{ij}'(y) \]
for some smooth symmetric $\tilde m_{ij}'$. This will imply all of the desired properties for the new variable $y'$. To this end, we write
\[ \sum_{i,j\ge r}y_iy_j\tilde m_{ij}(y) = \tilde m_{rr}(y)y_r^2 + y_r\sum_{j\ge r+1}\tilde m_{jr}(y)y_j + \sum_{i,j\ge r+1}y_iy_j\tilde m_{ij}(y). \]
We now ‘complete the square’. That is, we take
\[ y_r' = \sqrt{\frac{\tilde m_{rr}(y)}{\tfrac12\lambda_r}}\,\Big[y_r + \tfrac12\sum_{j\ge r+1}\frac{\tilde m_{jr}(y)}{\tilde m_{rr}(y)}\,y_j\Big] \]
and modify the matrix $\tilde m_{ij}$ accordingly. With this change of variables, one can now verify all of the desired properties and hence complete the induction.
We conclude this section with one sample result regarding oscillatory

integrals of the second kind. We will merely scratch the surface of a very
rich subject.
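Define the family of operators $T_\lambda$ by
\[ (T_\lambda f)(\xi) = \int_{\mathbb R^d}e^{i\lambda\Phi(x,\xi)}\psi(x,\xi)f(x)\,dx, \]
where $\lambda > 0$, $\psi\in C_c^\infty(\mathbb R^d\times\mathbb R^d)$, and $\Phi:\mathbb R^d\times\mathbb R^d\to\mathbb R$ is smooth. Assume that on the support of $\psi$ the mixed Hessian of $\Phi$ is nondegenerate:
\[ \det\Big[\frac{\partial^2\Phi(x,\xi)}{\partial x_i\partial\xi_j}\Big] \ne 0. \]
We will prove:
Proposition 7.4.8. Under the assumptions above, we have
\[ \|T_\lambda\|_{L^2\to L^2} \lesssim \lambda^{-\frac d2}. \]
Remark 7.4.9. Note that for fixed $\lambda$, we can easily obtain $L^2$ boundedness:
\[ \|T_\lambda f\|_{L^2_\xi} \le \int\|\psi(x,\xi)\|_{L^2_\xi}|f(x)|\,dx \le \|\psi\|_{L^2_{x,\xi}}\|f\|_{L^2}. \]
The interesting point is to obtain decay as $\lambda\to\infty$.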

Proof of Proposition 7.4.8. Let $T_\lambda^*$ denote the adjoint of $T_\lambda$. By the method of $TT^*$, it suffices to prove that the $L^2\to L^2$ norm of $T_\lambda T_\lambda^*$ is bounded by $\lambda^{-d}$. (See Exercise 7.5.5.)
We may write
\[ T_\lambda T_\lambda^*f(\xi) = \int K_\lambda(\xi,\eta)f(\eta)\,d\eta, \]
where
\[ K_\lambda(\xi,\eta) = \int e^{i\lambda[\Phi(x,\xi)-\Phi(x,\eta)]}\psi(x,\xi)\bar\psi(x,\eta)\,dx. \]
We will prove bounds on $K_\lambda$.
Now let us denote by $M(x,\xi)$ the matrix $\frac{\partial^2\Phi}{\partial x_i\partial\xi_j}$. Given $a\in\mathbb R^d$, denote by $\nabla_x^a$ differentiation in the $a$ direction.
For fixed $(\xi,\eta)$, denote
\[ \Delta = \Delta(x,\xi,\eta) = \nabla_x^{a(x)}[\Phi(x,\xi)-\Phi(x,\eta)] \]
where $a(x)\in\mathbb R^d$ is to be determined. We have
\[ \Delta = M(x,\xi)a(x)\cdot(\xi-\eta) + O(|\xi-\eta|^2). \]
As $M$ is invertible, we may take
\[ a(x) = a(x,\xi,\eta) = M(x,\xi)^{-1}\frac{\xi-\eta}{|\xi-\eta|}, \]
so that
\[ M(x,\xi)a(x)\cdot(\xi-\eta) = |\xi-\eta|. \]
Now, if the support of $\psi$ is sufficiently small, then we can guarantee
\[ |\Delta(x,\xi,\eta)| \ge c|\xi-\eta| \quad\text{for } (\xi,\eta)\in\mathrm{supp}\,K_\lambda. \]

Now set $D_x = [i\lambda\Delta(x,\xi,\eta)]^{-1}\nabla_x^{a(x)}$. We use the identity
\[ (D_x)^N(e^{i\lambda[\Phi(x,\xi)-\Phi(x,\eta)]}) = e^{i\lambda[\Phi(x,\xi)-\Phi(x,\eta)]} \]
and integrate by parts $N$ times to obtain
\[ K_\lambda(\xi,\eta) = \int_{\mathbb R^d}e^{i\lambda[\Phi(x,\xi)-\Phi(x,\eta)]}(D_x^T)^N[\psi(x,\xi)\bar\psi(x,\eta)]\,dx. \]
This yields
\[ |K_\lambda(\xi,\eta)| \lesssim_N (1+\lambda|\xi-\eta|)^{-N} =: A_\lambda(|\xi-\eta|) \]
for any $N\ge 0$. Thus
\[ |T_\lambda T_\lambda^*f(\xi)| \lesssim [A_\lambda*|f|](\xi), \]
so that (applying Young's inequality and a change of variables), we get
\[ \|T_\lambda T_\lambda^*f\|_{L^2} \lesssim \|A_\lambda\|_{L^1}\|f\|_{L^2} \lesssim \lambda^{-d}\|f\|_{L^2}. \]
This implies the desired result provided the support of $\psi$ is sufficiently small. To deal with the more general case, we employ a partition of unity to split the support of $\psi$ up into a finite number of sufficiently small pieces. This completes the proof.
7.5 Exercises
Exercise 7.5.1. Complete the proof of (c) in Proposition 7.2.3.
Exercise 7.5.2. Prove (7.10) and complete the proof of (7.8) and (7.9). Hint: Expand the inner sum and change the order of integration. Performing the sum in $N$ leads to a bound of
\[ \sum_{K\le L}\big(\tfrac KL\big)^{1-s}K^sc_K\,L^sc_L, \]
which can then be dealt with by Schur's test.

Exercise 7.5.3. Show that
\[ \lim_{N\to\infty}\int_0^\pi N\,t\sin(\tfrac1t)\cos(Nt)\,dt = 0. \]
[Thanks to D. Grow for suggesting this exercise!]

Exercise 7.5.4. Prove the stationary phase lemma in higher dimensions, us-
ing the Morse lemma to find the change of variables that makes the phase
exactly quadratic.
Exercise 7.5.5. Let $T:L^2\to L^2$ and let $T^*$ be its adjoint. Show that
\[ \|T\|^2 = \|T^*\|^2 = \|TT^*\|. \]
Exercise 7.5.6. In this exercise, you will prove the stationary phase lemma in the special case of an exactly quadratic phase: Show that for $u\in C_c^\infty(\mathbb R^d)$ and $N\ge 1$, we have
\[ \int e^{ix\cdot Qx/2h}u(x)\,dx = \sum_{k=0}^{N-1}\frac{(2\pi)^{\frac d2}h^{k+\frac d2}e^{i\frac\pi4\mathrm{sgn}\,Q}}{(2i)^k|\det Q|^{\frac12}k!}\,\big((D_x\cdot Q^{-1}D_x)^ku\big)(0) + S_N(u,h), \]
where $D_x = -i\nabla_x$ and
\[ |S_N(u,h)| \lesssim_{d,Q,u,N} h^{N+\frac d2}. \]
Here $\mathrm{sgn}\,Q$ denotes the signature of $Q$, which is a nondegenerate real symmetric $d\times d$ matrix.
Hint: Use Plancherel and the Fourier transform to prove
\[ \int e^{ix\cdot Qx/2h}u(x)\,dx = h^{\frac d2}e^{i\frac\pi4\mathrm{sgn}\,Q}|\det Q|^{-\frac12}\int e^{-ih\xi\cdot Q^{-1}\xi/2}\hat u(\xi)\,d\xi. \]
Then use Taylor's theorem to write
\[ e^{it} = \sum_{k=0}^{N-1}\frac{(it)^k}{k!} + O\Big(\frac{|t|^N}{N!}\Big). \]
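Exercise 7.5.7. Let $\sigma$ denote the surface measure of the sphere $S\subset\mathbb R^d$ with $d\ge 2$. Show that
\[ |\check\sigma(x)| \lesssim \langle x\rangle^{-\frac{d-1}{2}}, \]
where
\[ \check\sigma(x) = \int_S e^{ix\xi}\,d\sigma(\xi) \quad\text{and}\quad \langle x\rangle := \sqrt{1+|x|^2}. \]
Hint: As $d\sigma$ is invariant under rotations, we can write
\[ \check\sigma(x) = \int_S e^{i|x|\xi_d}\,d\sigma(\xi) \sim \int_0^\pi e^{i|x|\cos\theta}[\sin\theta]^{d-2}\,d\theta \]
where $\theta$ is the angle between $x$ and $e_d$. To estimate this integral, use stationary phase and the van der Corput lemma.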
Chapter 8
Semiclassical and microlocal analysis
In this chapter we discuss some of the basic concepts and results in semiclassical and microlocal analysis, following [20].
8.1 Semiclassical analysis

Semiclassical analysis was developed largely in order to give rigorous mean-
ing to the ‘Bohr correspondence principle’, which informally states that one
recovers classical mechanics from quantum mechanics in the limit h → 0
(where h is Planck’s constant). One of the key tools in semiclassical anal-
ysis is known as pseudodifferential calculus, which in turn has applications
in a wide range of related fields (e.g. partial differential equations and other
areas of mathematical physics).
We begin with some basic definitions.
1. Let $g$ be a nonnegative smooth function on $\mathbb R^n$. If for all multi-indices $\alpha$ we have
\[ \partial^\alpha g = O(g) \]
uniformly over $\mathbb R^n$, then we call $g$ an order function on $\mathbb R^n$. For example,
\[ 1, \quad \langle z\rangle^m = (1+|z|^2)^{\frac m2} \quad\text{and}\quad e^{\langle z\rangle} \]
are order functions. If $g$ is an order function, so is $1/g$ (check!).
2. Given an order function $g$, we define $S_d(g)$ to be the set of all functions $a = a(x;h)$ (defined on $\mathbb R^d\times(0,h_0]$ for some $h_0 > 0$) that are smooth in $x$ and satisfy
\[ \partial_x^\alpha a(x;h) = O(g) \]
uniformly over $(x,h)$ for any multi-index $\alpha$. Frequently, one works with the order function $g = 1$, noting that $a\in S_d(g)$ if and only if $ag^{-1}\in S_d(1)$.
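3. Let $a$ and $\{a_j\}$ belong to $S_d(g)$. We write
\[ a \sim \sum_{j=0}^{\infty}h^ja_j \]
if for any $N, \alpha$ there exists $h_{N,\alpha} > 0$ so that
\[ \Big|\partial_x^\alpha\Big(a - \sum_{j=0}^{N}h^ja_j\Big)\Big| \lesssim_{N,\alpha} h^Ng \]
uniformly on $\mathbb R^d\times(0,h_{N,\alpha}]$. If $a\sim 0$ in $S_d(g)$, then we write $a = O(h^\infty)$ in $S_d(g)$.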
We have the following result.
Proposition 8.1.1. For any sequence $\{a_j\}$ of symbols in $S_d(g)$, there exists $a\in S_d(g)$ such that $a\sim\sum h^ja_j$ in $S_d(g)$. Furthermore, $a$ is unique up to the addition of an $O(h^\infty)$ symbol.
Remark 8.1.2. One calls $a$ a resummation of the formal symbol $\sum h^ja_j$.
Proof. We only sketch the proof; the complete details may be found in [20, Lemma 2.3.3]. Without loss of generality, take $g\equiv 1$. One first constructs a sequence $\varepsilon_j\to 0$ such that if $|\alpha|\le j$,
\[ \big\|\big(1-\chi(\tfrac{\varepsilon_j}{h})\big)\partial^\alpha a_j\big\|_{L^\infty_x} \le h^{-1} \]
for $h$ small enough, where $\chi$ is a bump function. One then defines
\[ a(x;h) = \sum_j h^j\big(1-\chi(\tfrac{\varepsilon_j}{h})\big)a_j(x;h), \]
which for any $h > 0$ is actually only a finite sum. One can then verify that $a\sim\sum h^ja_j$.
Example 8.1.1 (WKB approximation). The resummation of a formal series can be used to construct approximate eigenfunctions for 1d Schrödinger operators.
In particular, let $V$ be a smooth function on $\mathbb R$ and suppose $V(x_0) < E$. For $x$ near $x_0$, we define solutions to the ‘eikonal equation’
\[ (\varphi_\pm')^2 = E - V \]
by setting
\[ \varphi_\pm(x) = \pm\int_{x_0}^{x}\sqrt{E-V(y)}\,dy. \]
Using this, a direct computation shows
\[ (-h^2\partial_x^2+V-E)(ae^{i\varphi/h}) = -2ih\sqrt{\varphi'}\,\Big[(a\sqrt{\varphi'})' - ih\frac{a''}{2\sqrt{\varphi'}}\Big]e^{i\varphi/h} \]
for $\varphi = \varphi_\pm$ and $a$ smooth near $x_0$ (check!). We then recursively define $a_j^\pm$ by solving the following transport equations:
\[ \big(a_0^\pm\sqrt{\varphi_\pm'}\big)' = 0, \qquad \big(a_j^\pm\sqrt{\varphi_\pm'}\big)' - i\frac{(a_{j-1}^\pm)''}{2\sqrt{\varphi_\pm'}} = 0 \]
for $j\ge 1$. We then let $a^\pm(x;h)$ be a resummation of the formal symbol $\sum h^ja_j^\pm$. By construction one can check that
\[ (-h^2\partial_x^2+V-E)u_\pm(x,h) = O(h^\infty), \quad\text{where } u_\pm = a^\pm e^{i\varphi_\pm/h}. \]
Indeed, this follows from the fact that the formal series
\[ \sum_{j\ge 0}h^j\Big[\big(a_j^\pm\sqrt{\varphi_\pm'}\big)' - ih\frac{(a_j^\pm)''}{2\sqrt{\varphi_\pm'}}\Big] \]
is identically zero.
These approximate solutions are called WKB solutions (after Wentzel, Kramers, and Brillouin).
We only considered the case $V(x_0) < E$. In the case $V(x_0) = E$ (called a ‘turning point’), this technique breaks down. In this case one can instead use a power series expansion for $V(x) - E$. Solving the ODE to first order leads to an equation known as the Airy equation, which is solved with special functions (the Airy functions). One then needs to patch together the approximate solutions away from and near the turning points (which will only be possible for special values of $E$).
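The first step of the WKB construction can be checked numerically. The sketch below is an illustration under assumed data that do not appear in the notes: the potential $V(x) = x$, the energy $E = 2$, and $x_0 = 0$. With $a_0 = (\varphi_+')^{-1/2} = (E-V)^{-1/4}$ the first transport equation holds, and the identity above then predicts a residual of size exactly $h^2|a_0''|$. The second derivative is approximated with a five-point finite difference stencil, so the comparison is only accurate up to discretization error.

```python
import cmath

E = 2.0  # assumed energy level (hypothetical choice)


def V(x):
    """Assumed potential V(x) = x, so that V(0) < E."""
    return x


def phase(x):
    """phi_+(x): the integral from 0 to x of sqrt(E - V(y)) dy for V(y) = y."""
    return (2.0 / 3.0) * (E ** 1.5 - (E - x) ** 1.5)


def a0(x):
    """Leading amplitude (phi_+')^(-1/2) = (E - V)^(-1/4), solving the first transport equation."""
    return (E - V(x)) ** -0.25


def a0_second_derivative(x):
    """Exact second derivative of a0 for V(x) = x."""
    return (5.0 / 16.0) * (E - x) ** -2.25


def wkb(x, h):
    """Leading-order WKB function a0 * exp(i * phi_+ / h)."""
    return a0(x) * cmath.exp(1j * phase(x) / h)


def residual(x, h, s=5e-4):
    """(-h^2 d^2/dx^2 + V - E) applied to wkb, via a five-point stencil for the second derivative."""
    second = (-wkb(x + 2 * s, h) + 16 * wkb(x + s, h) - 30 * wkb(x, h)
              + 16 * wkb(x - s, h) - wkb(x - 2 * s, h)) / (12 * s * s)
    return -h * h * second + (V(x) - E) * wkb(x, h)


for h in (0.1, 0.05):
    print(f"h={h:5.2f}  |residual|={abs(residual(0.0, h)):.4e}  "
          f"h^2*|a0''|={h * h * a0_second_derivative(0.0):.4e}")
```

Halving $h$ reduces the residual by a factor of about four, as expected for an $O(h^2)$ error after solving only the first transport equation.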
Pseudodifferential operators.
We next define the semiclassical Fourier transform of a Schwartz function $u$ on $\mathbb R^n$ by
\[ \mathcal F_hu(\xi) = \hat u(\xi) = (2\pi h)^{-\frac n2}\int_{\mathbb R^n}e^{-ix\xi/h}u(x)\,dx. \]
This is an $L^2$-isomorphism with inverse
\[ \mathcal F_h^{-1}v(x) = (2\pi h)^{-\frac n2}\int_{\mathbb R^n}e^{ix\xi/h}v(\xi)\,d\xi. \]
As with the standard Fourier transform, $\mathcal F_h$ extends to an isomorphism on tempered distributions. Writing $D_x = -i\partial_x$, we have
\[ \mathcal F_h(hD_xu) = \xi\mathcal F_hu \quad\text{and}\quad \mathcal F_h(xu) = -hD_\xi\mathcal F_hu, \]
generalizing the familiar identities obtained for the standard Fourier transform.
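The first identity can be checked numerically in one dimension. The sketch below is an illustration only: it assumes $n = 1$, $h = 0.1$, and the Gaussian $u(x) = e^{-x^2/2h}$, for which $\mathcal F_hu(\xi) = e^{-\xi^2/2h}$ and $hD_xu = ixu$. The transform is approximated by a trapezoid rule on a truncated interval, and the helper name `semiclassical_ft` is hypothetical.

```python
import cmath
import math


def semiclassical_ft(u, xi, h, half_width=6.0, n=6000):
    """Trapezoid approximation of (2*pi*h)^(-1/2) * integral of exp(-i*x*xi/h) * u(x) dx."""
    step = 2.0 * half_width / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(-1j * x * xi / h) * u(x)
    return total * step / math.sqrt(2.0 * math.pi * h)


h = 0.1


def u(x):
    """Gaussian test function exp(-x^2 / (2h))."""
    return math.exp(-x * x / (2.0 * h))


def h_Dx_u(x):
    """h * D_x u = -i * h * u'(x) = i * x * u(x) for the Gaussian above."""
    return 1j * x * u(x)


xi = 0.3
lhs = semiclassical_ft(h_Dx_u, xi, h)      # F_h(h D_x u)(xi)
rhs = xi * semiclassical_ft(u, xi, h)      # xi * F_h u(xi)
print(f"F_h(h D_x u) = {lhs:.10f}")
print(f"xi * F_h u   = {rhs:.10f}")
```

Both printed values should agree with $\xi e^{-\xi^2/2h}$ at $\xi = 0.3$ up to quadrature and rounding error.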
Expanding out $\mathcal F_h^{-1}\mathcal F_hu = u$ leads to the identity
\[ u(x) = (2\pi h)^{-n}\iint e^{i(x-y)\xi/h}u(y)\,dy\,d\xi. \]
Our next goal is to make sense of more general operators of the form
\[ u(x) \mapsto (2\pi h)^{-n}\iint e^{i(x-y)\xi/h}a(x,y,\xi)u(y)\,dy\,d\xi \]
for some kernel $a(x,y,\xi)$. This requires that we make sense of the integrals
\[ I(a) = I(a;x,y) = \int e^{i(x-y)\xi/h}a(x,y,\xi)\,d\xi. \tag{8.1} \]
Suppose $a(x,y,\xi)\in S_{3n}(\langle\xi\rangle^m)$. If $m < -n$, the integral $I(a)$ converges and for $u\in C_c^\infty(\mathbb R^n)$ we may define
\[ A_au(x,h) = \iint e^{i(x-y)\xi/h}a(x,y,\xi)u(y)\,dy\,d\xi. \]
Now observe that
\[ Le^{i(x-y)\xi/h} = e^{i(x-y)\xi/h}, \quad\text{where } L := \frac{1}{1+\xi^2}(1-h\xi D_y). \]
Thus we can write
\[ A_au(x,h) = I_ku(x) = \iint e^{i(x-y)\xi/h}\,[{}^tL(\xi,hD_y)]^k(au)\,dy\,d\xi \]
where ${}^t$ denotes transpose and we have
\[ ({}^tL)^k(au) = \Big(\frac{1+h\xi D_y}{1+\xi^2}\Big)^k(au) = O(\langle\xi\rangle^{m-k}) \]
uniformly as $|\xi|\to\infty$. Thus we have
• $I_ku(x)$ converges provided $m < k-n$,
• $I_{k+\ell}u = I_ku$ for all $\ell\ge 0$.
Thus for any $m\in\mathbb R$, $a\in S_{3n}(\langle\xi\rangle^m)$, and $u\in C_c^\infty(\mathbb R^n)$, we may define
\[ A_au(x,h) = \iint e^{i(x-y)\xi/h}\,[{}^tL(\xi,hD_y)]^k(au)\,dy\,d\xi \]
for any $k > m+n$. One can check that $A_a$ defines a continuous linear operator from $C_c^\infty$ to $C^\infty$; in particular, (by the ‘Schwartz kernel theorem’) we may find a distribution $K$ on $\mathbb R^n\times\mathbb R^n$ (called the distribution kernel of $A_a$) such that
\[ \langle A_au, v\rangle = \langle K, v\otimes u\rangle, \]
where $u, v\in C_c^\infty$, $\otimes$ denotes tensor product, and $\langle\cdot,\cdot\rangle$ denotes the pairing of distributions and test functions. We denote the distribution kernel by (8.1).
Example 8.1.2. If $a = 1$, then choosing $k > n$ we may verify that
\[ A_au(x) = (2\pi h)^nu(x), \]
so that
\[ \int e^{i(x-y)\xi/h}\,d\xi = (2\pi h)^n\delta(y-x) \]
in the sense of oscillatory integrals.
In light of the above, we make the following definition.
Definition 8.1.3. Given $a\in S_{3n}(\langle\xi\rangle^m)$ and $u\in C_c^\infty(\mathbb R^n)$, define
\[ \mathrm{Op}_h(a)u(x;h) = (2\pi h)^{-n}\iint e^{i(x-y)\xi/h}a(x,y,\xi)u(y)\,dy\,d\xi. \]
Then $\mathrm{Op}_h(a)u\in C^\infty(\mathbb R^n)$. For any $\nu\in\mathbb R$, the operator
\[ h^{-\nu}\mathrm{Op}_h(a): C_c^\infty(\mathbb R^n)\to C^\infty(\mathbb R^n) \]
is called the semiclassical pseudodifferential operator of symbol $h^{-\nu}a$. We say $h^{-\nu}\mathrm{Op}_h(a)$ is of degree $m$ and order $\nu$.
Proposition 8.1.4. For $a\in S_{3n}(\langle\xi\rangle^m)$, we can extend $\mathrm{Op}_h(a)$ to a map from $\mathcal S\to\mathcal S$ or from $\mathcal S'\to\mathcal S'$.
Proof. Let us show that $\mathrm{Op}_h(a):\mathcal S\to\mathcal S$. We write
\[ x^\beta\partial_x^\alpha I_ku(x) = \iint x^\beta\partial_x^\alpha\big[e^{i(x-y)\xi/h}({}^tL)^k(au)\big]\,dy\,d\xi \]
and split the integral into two regions $I_1$ and $I_2$, where
\[ I_1 = \{|x-y|\le\tfrac12|x|\} \quad\text{and}\quad I_2 = \{|x-y| > \tfrac12|x|\}. \]
Here we recall $L = (1+\xi^2)^{-1}(1-h\xi D_y)$. For $k > m+n+|\alpha|$, the integral over $I_1$ is uniformly bounded since
\[ x^\beta\langle\xi\rangle^{m+|\alpha|-k}\langle y\rangle^{-\gamma} = O(\langle\xi\rangle^{m+|\alpha|-k}\langle y\rangle^{|\beta|-\gamma}) \]
for any $\gamma > 0$ on $\{|x-y|\le\tfrac12|x|\}$. Choosing $\gamma > |\beta|+n$, this contribution is integrable with respect to $y$ and $\xi$.
For the remaining region, we write
\[ \tilde L = \frac{1}{1+|x-y|^2}(1+h(x-y)D_\xi). \]
Integrating by parts $N$ times with respect to $\xi$, the integral over $I_2$ is written as a sum of terms of the form
\[ \iint_{|x-y|\ge\frac12|x|}x^\beta e^{i(x-y)\xi/h}({}^t\tilde L)^N\big[\xi^{\alpha_1}\partial_x^{\alpha_2}({}^tL)^k(au)\big]\,dy\,d\xi \]
(with $\alpha_1+\alpha_2 = \alpha$). Choosing $N\ge|\beta|+n$, the contribution of $I_2$ is uniformly bounded. This shows $\mathrm{Op}_h(a)u\in\mathcal S$, and in fact these estimates suffice to show that the mapping is continuous.
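Example 8.1.3 (Semiclassical differential operators). If
\[ a(x,y,\xi) = \sum_{|\alpha|\le m}b_\alpha(x)\xi^\alpha \]
with $b_\alpha\in S_n(1)$, then
\[ \mathrm{Op}_h(a) = \sum_{|\alpha|\le m}b_\alpha(x)(hD_x)^\alpha. \]
If
\[ a(x,y,\xi) = \sum_{|\alpha|\le m}b_\alpha(y)\xi^\alpha \]
with $b_\alpha\in S_n(1)$, then
\[ \mathrm{Op}_h(a) = \sum_{|\alpha|\le m}(hD_x)^\alpha b_\alpha(x). \]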
Remark 8.1.5. If we replace $e^{i(x-y)\xi/h}$ with $e^{i\varphi(x,y,\xi)/h}$ where $\varphi$ is a phase function, then we are led to ‘Fourier integral operators’, often abbreviated FIOs.
Remark 8.1.6. Given $a(x,y,\xi)$, we define
\[ a^*(x,y,\xi) = \overline{a(y,x,\xi)}. \]
Then the operator
\[ [\mathrm{Op}_h(a)]^* := \mathrm{Op}_h(a^*) \]
is the formal adjoint of $\mathrm{Op}_h(a)$, which satisfies
\[ \langle[\mathrm{Op}_h(a)]^*u, v\rangle = \langle u, \mathrm{Op}_h(a)v\rangle \]
for all $u, v\in\mathcal S$.
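Composition of pseudodifferential operators.
Let $a\in S_{3n}(\langle\xi\rangle^k)$ and $b\in S_{3n}(\langle\xi\rangle^{k'})$, and let $A = \mathrm{Op}_h(a)$ and $B = \mathrm{Op}_h(b)$. The composition of $A$ and $B$ is defined formally by
\[\begin{aligned}
(A\circ B)u(x) &= (2\pi h)^{-n}\iint e^{i(x-y)\xi/h}a(x,y,\xi)\,Bu(y)\,dy\,d\xi\\
&= (2\pi h)^{-n}\iint e^{i(x-z)\eta/h}c_h(x,z,\eta)u(z)\,dz\,d\eta,
\end{aligned}\]
where
\[ c_h(x,z,\eta) := (2\pi h)^{-n}\iint e^{i(x-y)(\xi-\eta)/h}a(x,y,\xi)b(y,z,\eta)\,dy\,d\xi. \]
To show that $A\circ B$ is again a pseudodifferential operator, we need to verify that $c_h\in S_{3n}(\langle\xi\rangle^m)$ for some $m$. We will prove the following:
Theorem 8.1.7 (Composition). Given $a\in S_{3n}(\langle\xi\rangle^m)$ and $b\in S_{3n}(\langle\xi\rangle^{m'})$, there exists $c\in S_{3n}(\langle\xi\rangle^{m+m'})$ such that
\[ \mathrm{Op}_h(a)\circ\mathrm{Op}_h(b) = \mathrm{Op}_h(c). \]
A choice for $c$ is given by
\[ a\#b(x,y,\xi) := (2\pi h)^{-n}\iint e^{i(x-z)(\eta-\xi)/h}a(x,z,\eta)b(z,y,\xi)\,dz\,d\eta, \]
which satisfies
\[ a\#b \sim \sum_{|\alpha|\ge 0}\frac{h^{|\alpha|}}{i^{|\alpha|}\alpha!}\,\partial_z^\alpha\partial_\eta^\alpha\big(a(x,z,\eta)b(z,y,\xi)\big)\Big|_{\eta=\xi,\,z=x} \]
in $S_{3n}(\langle\xi\rangle^{m+m'})$.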
To prove this, we rely on the method of stationary phase, specifically in the form of Exercise 7.5.6. In particular, applying that result with $d = 2n$ and $Q$ the block matrix with $-I$ in the top-right and bottom-left corners, we may deduce
\[ (2\pi h)^{-n}\iint e^{-ixy/h}u(x,y)\,dx\,dy = \sum_{|\alpha|\le N-1}\frac{h^{|\alpha|}}{i^{|\alpha|}\alpha!}\,\partial_x^\alpha\partial_y^\alpha u(0,0) + S_N \tag{8.2} \]
for $u\in C_c^\infty(\mathbb R^{2n})$, where
\[ |S_N| \lesssim h^N\sum_{|\alpha+\beta|\le 2n+1}\|\partial_x^\alpha\partial_y^\beta(\partial_x\partial_y)^Nu\|_{L^1(\mathbb R^{2n})}. \]
Proof of Theorem 8.1.7. Proceeding as above, we may write
\[ \mathrm{Op}_h(a)\circ\mathrm{Op}_h(b)u(x) = \lim_{\varepsilon,\delta\to 0^+}(2\pi h)^{-n}\iint e^{i(x-y)\xi/h-\varepsilon\langle\xi\rangle}c_\delta(x,y,\xi)u(y)\,dy\,d\xi, \]
where
\[ c_\delta(x,y,\xi) := (2\pi h)^{-n}\iint e^{i(x-z)(\eta-\xi)/h-\delta\langle z\rangle-\delta\langle\eta\rangle}a(x,z,\eta)b(z,y,\xi)\,dz\,d\eta. \]
We will show that $c_\delta = O(\langle\xi\rangle^{m+m'})$ uniformly in $\delta$, so that we may pass to a limit $c_0$ as $\delta\to 0$ (by dominated convergence). We will then show $c_0\in S_{3n}(\langle\xi\rangle^{m+m'})$, which allows us to send $\varepsilon\to 0$ (and interpret the resulting integral in the sense of oscillatory integrals).
We turn to the details. We define
\[
L_1 = \Bigl( 1 + \frac{|\eta-\xi|^2}{h^2} + \frac{|x-z|^2}{h^2} \Bigr)^{-1}
\Bigl[ 1 - \frac{\eta-\xi}{h} \cdot D_z + \frac{x-z}{h} \cdot D_\eta \Bigr].
\]
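We next let $\chi_1 \in C_c^\infty(\mathbb{R})$ satisfy
\[
\chi_1(s) = \begin{cases} 1 & |s| \le 1, \\ 0 & |s| \ge 2, \end{cases}
\]
and let $\chi(x, y) = \chi_1(|x - y|)$. Choosing $k \ge |m| + 2n + 1$, we can write
\[
c_\delta(x, y, \xi)
= (2\pi h)^{-n} \int e^{i(x-z)(\eta-\xi)/h}\, [{}^tL_1]^k \bigl( e^{-\delta\langle z\rangle - \delta\langle\eta\rangle} a(x, z, \eta)\, b(z, y, \xi) \bigr)\, dz\, d\eta
=: d_\delta(x, y, \xi) + e_\delta(x, y, \xi) + f_\delta(x, y, \xi),
\]
where $d_\delta$ includes the cutoff $1 - \chi(\xi, \eta)$ before $a$, $e_\delta$ includes
$\chi(\xi, \eta)[1 - \chi(x, z)]$, and $f_\delta$ includes $\chi(\xi, \eta)\chi(x, z)$.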
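We regard $d_\delta$ and $e_\delta$ as perturbative terms. In particular, we can write
\[
(2\pi h)^n d_\delta(x, y, \xi)
= \int_{|\xi-\eta| \ge 1} O\Bigl( \frac{\langle\eta\rangle^m \langle\xi\rangle^{m'}}{(1 + h^{-1}|\xi-\eta| + h^{-1}|x-z|)^k} \Bigr)\, dz\, d\eta
= \int O\Bigl( \frac{\langle\eta\rangle^m \langle\xi\rangle^{m'}}{(1 + (2h)^{-1}|\xi-\eta|)^{k-n-\frac12}} \Bigr)\, d\eta.
\]
Thus, for example, if $m \ge 0$, we can deduce (writing $\langle\eta\rangle^m \lesssim \langle\xi\rangle^m + \langle\xi-\eta\rangle^m$)
that
\[ |(2\pi h)^n d_\delta(x, y, \xi)| \lesssim h^{k-n-\frac12} \langle\xi\rangle^{m+m'}, \]
which is acceptable. We leave the remaining case $m < 0$ as an exercise (hint:
split into regions where $|\eta| \le \tfrac12\langle\xi\rangle$ and $|\eta| > \tfrac12\langle\xi\rangle$ and estimate each piece
separately). Similarly one can deduce that
\[ |(2\pi h)^n e_\delta(x, y, \xi)| \lesssim h^{k-n-\frac12} \langle\xi\rangle^{m+m'} \]
for $k \ge |m| + 2n + 1$. We leave this estimate as an exercise. We also note
that one can get the same estimates for any number of derivatives, i.e.
\[ |\partial^\alpha d_\delta(x, y, \xi)| + |\partial^\alpha e_\delta(x, y, \xi)| = O(h^\infty \langle\xi\rangle^{m+m'}) \]
uniformly over $(x, y, \xi)$ and $\delta > 0$. In fact, taking derivatives just produces
powers of $|x - z|$ or $|\eta - \xi|$, which can always be overcome by choosing $k$
larger.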
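It remains to consider $f_\delta$, which (undoing the integration by parts) has
the form
\[
f_\delta(x, y, \xi) = (2\pi h)^{-n} \int e^{i(x-z)(\eta-\xi)/h}\, \chi(\xi, \eta)\chi(x, z)\, e^{-\delta\langle z\rangle - \delta\langle\eta\rangle}\, a(x, z, \eta)\, b(z, y, \xi)\, dz\, d\eta.
\]
We will understand the behavior of this integral through the stationary
phase theorem (in the form (8.2)). We write $z' = z - x$ and $\eta' = \eta - \xi$, so
that $f_\delta$ has the form
\[
f_\delta(x, y, \xi) = (2\pi h)^{-n} \int e^{-iz'\eta'/h}\, u^\delta_{x,y,\xi}(z', \eta')\, dz'\, d\eta',
\]
with appropriate
\[ u^\delta_{x,y,\xi}(z', \eta') \in C_c^\infty(\mathbb{R}^n_{z'} \times \mathbb{R}^n_{\eta'}). \]
In particular, by (8.2),
\[
f_\delta(x, y, \xi) = \sum_{|\alpha| \le N-1} \frac{h^{|\alpha|}}{i^{|\alpha|}\alpha!}\, \partial_z^\alpha \partial_\eta^\alpha u^\delta_{x,y,\xi}(z, \eta)\Big|_{z=0,\ \eta=0} + S_N,
\]
where
\[
|S_N| \lesssim h^N \sum_{|\alpha+\beta| \le 2n+1} \|\partial_z^\alpha \partial_\eta^\beta (\partial_z \partial_\eta)^N u^\delta_{x,y,\xi}\|_{L^1(\mathbb{R}^n_z \times \mathbb{R}^n_\eta)}
\lesssim h^N \int_{|\eta-\xi| \le 2,\ |x-z| \le 2} \langle\eta\rangle^m \langle\xi\rangle^{m'}\, dz\, d\eta
\lesssim h^N \langle\xi\rangle^{m+m'}.
\]
In fact, we can get the same bound for any derivatives of $f_\delta$.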

Collecting the estimates, we can deduce that $c_\delta(x, y, \xi) \to c_0(x, y, \xi)$,
where
\[
c_0(x, y, \xi) = (2\pi h)^{-n} \int e^{i(x-z)(\eta-\xi)/h}\, ({}^tL_1)^k \bigl( a(x, z, \eta)\, b(z, y, \xi) \bigr)\, dz\, d\eta.
\]
Furthermore, since the estimates above were uniform in $\delta$, we can deduce
that $c_0 \in S_{3n}(\langle\xi\rangle^{m+m'})$. Finally, taking the limit as $\delta \to 0$ in the stationary
phase approximation for $f_\delta$, we can deduce the asymptotic expansion for
$a\#b$. This completes the proof.
Note that when a is a polynomial in ξ, so that Oph (a) is a differential

operator, then the formula giving a#b is an exact formula.
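To illustrate the exactness claim, one can take $n = 1$, $a(x, z, \eta) = \eta$ (so that $\mathrm{Op}_h(a) = hD_x$) and $b(z, y, \xi) = V(z)$ for a smooth function $V$ with bounded derivatives. The function $V$ is introduced here only for illustration. Only the terms with $|\alpha| \le 1$ in the expansion are nonzero:

```latex
\begin{align*}
a\#b(x,y,\xi)
&= \eta V(z)\Big|_{\eta=\xi,\ z=x}
 + \frac{h}{i}\,\partial_z\partial_\eta\bigl(\eta V(z)\bigr)\Big|_{\eta=\xi,\ z=x} \\
&= \xi V(x) + \frac{h}{i}\,V'(x).
\end{align*}
% This matches the Leibniz rule applied directly to the composition:
% hD_x (V u) = V \, hD_x u + \frac{h}{i}\, V' u .
```

The sum terminates after the first-order term because $\eta \mapsto \eta$ has no second derivative, so no remainder appears.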
We now give an application of the composition theorem. We call a
symbol $a \in S_d(g)$ elliptic if
\[ |a| \gtrsim g \]
uniformly on $\mathbb{R}^d \times (0, h_0]$ for some $h_0$. The following result shows how we
may use the composition theorem to invert elliptic symbols up to errors
that are $O(h^\infty)$. (One can compare this to the case of Fourier multiplier
operators; in this case, if $|m(\xi)| \gtrsim 1$ then the inverse operator is simply the
operator with symbol $\frac{1}{m(\xi)}$.)
Proposition 8.1.8. Let $a \in S_{3n}(\langle\xi\rangle^m)$ be elliptic. Then there exists
$b \in S_{3n}(\langle\xi\rangle^{-m})$ such that
\[ \mathrm{Op}_h(a) \circ \mathrm{Op}_h(b) = 1 + \mathrm{Op}_h(r), \quad \text{where } r = O(h^\infty) \text{ in } S_{3n}(1). \]
Similarly, $\mathrm{Op}_h(b) \circ \mathrm{Op}_h(a) = 1 + \mathrm{Op}_h(r')$ for some $O(h^\infty)$ symbol $r'$.
Proof. Let us only sketch the first claim. The idea is to construct $b$ in the
form $b \sim \sum_j h^j b_j$ in such a way as to guarantee that $a\#b \sim 1$, where $a\#b$ is as
in the composition theorem. To do this, we first set $b_0 = \frac{1}{a} \in S_{3n}(\langle\xi\rangle^{-m})$
(cf. the chain rule). We can then define $b_j$ for $j \ge 1$ recursively. For
example, the terms that are linear in $h$ in the sum involve the $|\alpha| = 1$ terms of the
expansion applied to $b_0$, along with the $|\alpha| = 0$ term applied to $b_1$; we use this
to define $b_1$ (so that the total contribution is zero). Proceeding in this way, we
can construct $b$ as desired.
Quantization and symbolic calculus.

Classical observables are given as functions of the position x ∈ Rn and
momentum ξ ∈ Rn . We would therefore like to define pseudodifferential
operators with symbols depending only on 2n variables (x, ξ). There is an
inherent nonuniqueness here—indeed, the symbol xj ξj could be associated
with either xj · hDxj or hDxj · xj .
Now, for $a \in S_{2n}(\langle\xi\rangle^m)$ and $t \in [0, 1]$, we have
\[ a((1 - t)x + ty, \xi) \in S_{3n}(\langle\xi\rangle^m). \]
Thus we may define
\[ \mathrm{Op}^t_h(a) := \mathrm{Op}_h\bigl( a((1 - t)x + ty, \xi) \bigr). \]
In particular, we have:
• If $t = 0$, we get the standard or ‘left’ quantization.
• If $t = \tfrac12$, we get the Weyl quantization (denoted by $\mathrm{Op}^W_h(a)$).
• If $t = 1$, we get the ‘right’ quantization.

The Weyl quantization yields a symmetric operator whenever a is real-

valued; for this reason it is useful in the setting of quantum mechanics.
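For instance, in dimension $n = 1$ the symbol $a(x, \xi) = x\xi$ can be quantized directly from the definition. This symbol is unbounded in $x$, so the computation below is a formal sketch rather than an application of the symbol classes above:

```latex
\begin{align*}
\mathrm{Op}^t_h(x\xi)\,u(x)
&= (2\pi h)^{-1}\int e^{i(x-y)\xi/h}\,\bigl((1-t)x + ty\bigr)\,\xi\,u(y)\,dy\,d\xi \\
&= (1-t)\,x\,hD_x u(x) + t\,hD_x\bigl(x\,u(x)\bigr) \\
&= x\,hD_x u(x) + t\,\frac{h}{i}\,u(x).
\end{align*}
```

Thus $t = 0$ gives $x \cdot hD_x$, $t = 1$ gives $hD_x \cdot x$, and $t = \tfrac12$ gives the symmetric average $\tfrac12(x \cdot hD_x + hD_x \cdot x)$, consistent with the remark that the Weyl quantization of a real-valued symbol is symmetric.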
We will state two results regarding quantization of symbols; for the de-
tails see [20, Section 2.7].
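Proposition 8.1.9. Given $b = b(x, y, \xi) \in S_{3n}(\langle\xi\rangle^m)$ and $t \in [0, 1]$, there
exists a unique $b_t(x, \xi) \in S_{2n}(\langle\xi\rangle^m)$ such that $\mathrm{Op}_h(b) = \mathrm{Op}^t_h(b_t)$. In fact,
\[
b_t(x, \xi) = (2\pi h)^{-n} \int_{\mathbb{R}^{2n}} e^{i(\xi'-\xi)\theta/h}\, b(x + t\theta, x - (1 - t)\theta, \xi')\, d\xi'\, d\theta
\]
in the sense of oscillatory integrals, and
\[
b_t(x, \xi) \sim \sum_{\alpha} \frac{(-1)^{|\alpha|} h^{|\alpha|}}{i^{|\alpha|}\alpha!}\, \partial_\xi^\alpha \partial_\theta^\alpha b(x + t\theta, x - (1 - t)\theta, \xi)\Big|_{\theta=0}
\]
in $S_{2n}(\langle\xi\rangle^m)$.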
In this result, $b_t$ is called the symbol of index $t$ of $B = \mathrm{Op}_h(b)$, and we
denote
\[ b_t = \sigma_t(B). \]
When $t = \tfrac12$, we call $b_{1/2} = b^W$ the Weyl symbol of $\mathrm{Op}_h(b)$.
This proposition is proven by seeking $b_t$ satisfying
\[
\int e^{i(x-y)\xi/h}\, b(x, y, \xi)\, d\xi = \int e^{i(x-y)\xi/h}\, b_t((1 - t)x + ty, \xi)\, d\xi.
\]
This ultimately leads to the oscillatory integral above; the asymptotic
expansion is a consequence of the stationary phase theorem (after isolating the
relevant part of the integral, as in the proof of the composition theorem).
In the case $t = 0$, one can write
\[ \sigma_0(B)(x, \xi; h) = e^{-ix\xi/h} B(e^{i(\cdot)\xi/h}). \]
Example 8.1.4. If $V = V(x) \in S_n(1)$, then for every $t \in [0, 1]$ we get
\[ \sigma_t(-h^2\Delta + V) = \xi^2 + V(x), \]
which is independent of $t$.
The next result we state concerns composition:
Theorem 8.1.10 (Symbolic calculus). Let $a = a(x, \xi) \in S_{2n}(\langle\xi\rangle^m)$ and
$b = b(x, \xi) \in S_{2n}(\langle\xi\rangle^{m'})$. For all $t \in [0, 1]$, there exists a unique
$c_t \in S_{2n}(\langle\xi\rangle^{m+m'})$ such that
\[ \mathrm{Op}^t_h(a) \circ \mathrm{Op}^t_h(b) = \mathrm{Op}^t_h(c_t). \]
In fact, one can write down a formula and asymptotic expansion for
the symbol $c_t$ in the preceding theorem. The proof proceeds by applying
the composition theorem to write the composition in the form $\mathrm{Op}_h(c)$ for
some symbol $c$, which then satisfies $\mathrm{Op}_h(c) = \mathrm{Op}^t_h(c_t)$ for suitable $c_t$ (by
Proposition 8.1.9). The asymptotic expansion is again a consequence of the
stationary phase lemma.
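If $A$ and $B$ are pseudodifferential operators with symbols in $S_{2n}(\langle\xi\rangle^m)$
and $S_{2n}(\langle\xi\rangle^{m'})$, then for all $t \in [0, 1]$ one can show
\[ \sigma_t(A \circ B) = \sigma_t(A)\sigma_t(B) + O(h) \quad \text{in } S_{2n}(\langle\xi\rangle^{m+m'}). \]
In the case of the commutator of two operators, namely $[A, B] := AB - BA$,
we instead get
\[ \sigma_t([A, B]) = \tfrac{h}{i}\{a, b\} + O(h^2) \quad \text{in } S_{2n}(\langle\xi\rangle^{m+m'}), \]
where $a = \sigma_t(A)$, $b = \sigma_t(B)$, and $\{a, b\}$ is the Poisson bracket defined by
\[ \{a, b\} = \frac{\partial a}{\partial \xi}\frac{\partial b}{\partial x} - \frac{\partial a}{\partial x}\frac{\partial b}{\partial \xi}. \]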
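A quick check of the sign convention, in dimension $n = 1$: take $A = hD_x$ (symbol $\xi$) and $B$ the operator of multiplication by a function $V(x)$ with bounded derivatives (symbol $V$). Both choices are illustrative and not taken from the text:

```latex
\begin{align*}
[A, B]\,u &= hD_x(Vu) - V\,hD_x u = \frac{h}{i}\,V'(x)\,u, \\
\{\xi, V\} &= \frac{\partial \xi}{\partial \xi}\,\frac{\partial V}{\partial x}
            - \frac{\partial \xi}{\partial x}\,\frac{\partial V}{\partial \xi}
            = V'(x).
\end{align*}
```

So in this case the symbol of the commutator is exactly $\tfrac{h}{i}$ times the Poisson bracket of the symbols, with no $O(h^2)$ remainder, and the result does not depend on $t$.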
A symbol $a \in S_{2n}(\langle\xi\rangle^m)$ is said to be classical if it admits an expansion
of the form
\[ a(x, \xi; h) \sim \sum_{j \ge 0} h^j a_j(x, \xi), \]
where the $a_j \in S_{2n}(\langle\xi\rangle^m)$ do not depend on $h$, and $a_0$ is not identically zero.
For $\nu \in \mathbb{R}$, we call $h^\nu a_0(x, \xi)$ the principal symbol of the classical
pseudodifferential operator $A = h^\nu \mathrm{Op}^t_h(a)$. In particular, changing the quantization
does not affect the classical character of a pseudodifferential operator, nor
does the principal symbol depend on the choice of quantization. We define
\[ h^\nu a_0 = \sigma_p(A). \]
Then one has
\[ \sigma_p(AB) = \sigma_p(A)\sigma_p(B) \quad \text{and} \quad \sigma_p([A, B]) = \tfrac{h}{i}\{\sigma_p(A), \sigma_p(B)\}. \]
L2 boundedness.
Our final topic will be to consider the L2 boundedness of pseudodiffer-
ential operators. To this point, we have only considered such operators as
acting on S or S 0 . We will prove the following result.
Theorem 8.1.11 (Calderón–Vaillancourt). Let $a \in S_{3n}(1)$. Then there
exists $M = M(n)$ such that
\[ \|\mathrm{Op}_h(a)\|_{L^2 \to L^2} \lesssim_n \sum_{|\alpha| \le M} \|\partial^\alpha a\|_{L^\infty(\mathbb{R}^{3n})}. \]
We will need the following lemma, which may be of independent interest.
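Lemma 8.1.12 (Cotlar–Stein Lemma). Let $H$ be a Hilbert space, $\{A_\mu\}_{\mu \in \mathbb{Z}^d}$
a family of bounded linear operators on $H$, and $\omega : \mathbb{Z}^d \to [0, \infty)$ satisfying
the following:
• For all $\mu, \nu \in \mathbb{Z}^d$,
\[ \|A_\mu A_\nu^*\| + \|A_\mu^* A_\nu\| \le \omega(\mu - \nu). \]
• $C_0 := \sum_\mu \sqrt{\omega(\mu)} < \infty$.
Then for any $M \ge 0$, we have
\[ \Bigl\| \sum_{|\mu| \le M} A_\mu \Bigr\| \le C_0. \]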
Proof. First set $S = \sum_{|\mu| \le M} A_\mu$. Using Exercise 7.5.5, we have
\[ \|S\|^{2m} = \|S^*S\|^m. \]
Next, observe that
\[ \|S^*S\|^m = \|(S^*S)^m\| \quad \text{for } m \ge 1. \]
Indeed the $\ge$ direction is clear. For the reverse, we argue essentially as in
(A.3), using the fact that $S^*S$ is a bounded, positive, self-adjoint operator.
In particular, we can write $S^*S = [(S^*S)^m]^{1/m}$ and use the general bound
$\|T^\theta\| \le \|T\|^\theta$; this yields the desired estimate.
Now observe that
\[
(S^*S)^m = \Bigl( \sum_{|\mu|,|\nu| \le M} A_\mu^* A_\nu \Bigr)^m
= \sum_{|\mu_\ell|,|\nu_\ell| \le M} A_{\mu_1}^* A_{\nu_1} \cdots A_{\mu_m}^* A_{\nu_m}.
\]
By assumption, each summand obeys the bound
\[ \|\cdot\| \le \|A_{\mu_1}^* A_{\nu_1}\| \cdots \|A_{\mu_m}^* A_{\nu_m}\| \le \omega(\mu_1 - \nu_1) \cdots \omega(\mu_m - \nu_m). \]
Using instead the bound
\[ \|A_\mu\|^2 = \|A_\mu^* A_\mu\| \le \omega(0), \]
we can also bound each summand by
\[ \|\cdot\| \le \sqrt{\omega(0)}\, \omega(\nu_1 - \mu_2) \cdots \omega(\nu_{m-1} - \mu_m)\, \sqrt{\omega(0)}. \]
Taking the geometric mean of the previous two estimates yields the bound
\[ \|\cdot\| \le [\omega(0)\, \omega(\mu_1 - \nu_1)\, \omega(\nu_1 - \mu_2) \cdots \omega(\mu_m - \nu_m)]^{\frac12}. \]
Thus, continuing from above, we perform one sum at a time (starting from
the $\nu_m$ sum, then $\mu_m$, then $\nu_{m-1}$, . . . ) to get
\[
\|(S^*S)^m\| \le \sum_{|\mu_\ell|,|\nu_\ell| \le M} [\omega(0)\, \omega(\mu_1 - \nu_1)\, \omega(\nu_1 - \mu_2) \cdots \omega(\mu_m - \nu_m)]^{\frac12}
\le \sum_{|\mu_1| \le M} \sqrt{\omega(0)}\, C_0^{2m-1} \le (2M + 1)^d \sqrt{\omega(0)}\, C_0^{2m-1}.
\]
Hence
\[ \|S\|^{2m} = \|(S^*S)^m\| \le (2M + 1)^d \sqrt{\omega(0)}\, C_0^{2m-1}. \]
Taking the $2m$-th root of both sides and sending $m \to \infty$ yields the desired
result.
Proof of Theorem 8.1.11. Using Proposition 8.1.9, we may find $b \in S_{2n}(1)$
so that $A = \mathrm{Op}^W_h(b)$. Moreover, by integrating by parts in the integral
expression for $b$, we can control derivatives of $b$ in terms of derivatives of
$a$. Thus we may take $A = \mathrm{Op}^W_h(a)$ for some $a \in S_{2n}(1)$. Furthermore, by
rescaling $\xi \mapsto h\xi$, we can further reduce to proving the theorem for the case
$h = 1$. That is, we may take $A = \mathrm{Op}^W_1(a)$.
Now let $\chi_0 \in C_c^\infty(\mathbb{R}^{2n})$ yield a partition of unity through the translations
$\chi_\mu(z) = \chi_0(z - \mu)$ for $\mu \in \mathbb{Z}^d$, where $d = 2n$. In particular, $\sum_\mu \chi_\mu \equiv 1$.
We set $a_\mu = a\chi_\mu$ and observe that
\[ |\partial^\alpha a_\mu| \lesssim \sup_{|\beta| \le |\alpha|} \|\partial^\beta a\|_{L^\infty} \quad \text{uniformly in } \mu. \]
Define $A_\mu = \mathrm{Op}^W_1(a_\mu)$, so that
\[ Au = \sum_\mu A_\mu u \quad \text{for all } u \in C_c^\infty(\mathbb{R}^n). \]
This series can be summed in $L^2$, for example. To proceed, we will prove
estimates for the operators $A_\mu A_\nu^*$ and then apply the Cotlar–Stein lemma.
To this end, we write
\[ A_\mu A_\nu^* u(x) = \int K_{\mu,\nu}(x, y)\, u(y)\, dy, \]
where
\[
K_{\mu,\nu}(x, y) = (2\pi)^{-2n} \int e^{i(x\xi - y\eta - z\xi + z\eta)}\, a_\mu(\tfrac{x+z}{2}, \xi)\, \bar a_\nu(\tfrac{y+z}{2}, \eta)\, dz\, d\eta\, d\xi.
\]
Now, $a_\mu$ and $a_\nu$ are smooth and compactly supported, so that $K_{\mu,\nu}$ is smooth
on $\mathbb{R}^{2n}$. We now use the operator
\[
L = [1 + |x - z|^2 + |y - z|^2 + |\xi - \eta|^2]^{-1} \bigl( 1 + (x - z)D_\xi - (y - z)D_\eta - (\xi - \eta)D_z \bigr),
\]
which obeys
\[ L[e^{i(x\xi - y\eta - z\xi + z\eta)}] = e^{i(x\xi - y\eta - z\xi + z\eta)}. \]
Thus for any $N \ge 0$, we may integrate by parts $N$ times to get
\[
K_{\mu,\nu}(x, y) = (2\pi)^{-2n} \int e^{i(x\xi - y\eta - z\xi + z\eta)}\, ({}^tL)^N \bigl[ a_\mu(\tfrac{x+z}{2}, \xi)\, \bar a_\nu(\tfrac{y+z}{2}, \eta) \bigr]\, dz\, d\eta\, d\xi.
\]
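Now, if $|\mu - \nu|$ is large enough, then in order for $a_\mu(t, \tau)\bar a_\nu(s, \sigma)$ to be
nonzero we must have
\[ |\mu - \nu| \sim |t - s| + |\tau - \sigma|. \]
Thus, if we write $\mu = (\mu_1, \mu_2)$ and $\nu = (\nu_1, \nu_2)$ and define the following set
of $(y, z, \eta, \xi)$,
\[ D_{\mu,\nu} = \{ |\mu - \nu| \sim |x - y| + |\xi - \eta|,\ |\xi - \mu_2| \lesssim 1,\ |\eta - \nu_2| \lesssim 1 \}, \]
then we find
\[
\int |K_{\mu,\nu}(x, y)|\, dy \lesssim \int_{D_{\mu,\nu}} [1 + |x - z| + |y - z| + |\xi - \eta|]^{-N}\, dy\, dz\, d\eta\, d\xi,
\]
where the implicit constant depends on
\[ \sup_{|\alpha| \le N} \|\partial^\alpha a\|^2_{L^\infty}. \]
Now we use the fact that
\[ |x - z| + |y - z| \ge |x - y| \]
and the definition of $D_{\mu,\nu}$ to get the bound
\[
\int |K_{\mu,\nu}(x, y)|\, dy \lesssim [1 + |\mu - \nu|]^{2n+2-N} \iint \frac{dy\, dz}{(1 + |x - z|)^{n+1}(1 + |x - y|)^{n+1}}.
\]
In particular,
\[
\sup_x \int |K_{\mu,\nu}(x, y)|\, dy \lesssim \sup_{|\alpha| \le N} \|\partial^\alpha a\|^2_{L^\infty} \cdot [1 + |\mu - \nu|]^{2n+2-N}.
\]
A similar argument yields
\[
\sup_y \int |K_{\mu,\nu}(x, y)|\, dx \lesssim \sup_{|\alpha| \le N} \|\partial^\alpha a\|^2_{L^\infty} \cdot [1 + |\mu - \nu|]^{2n+2-N}.
\]
Thus by Schur’s test (cf. Remark A.3.5), we deduce
\[
\|A_\mu A_\nu^*\|_{L^2 \to L^2} \lesssim \sup_{|\alpha| \le N} \|\partial^\alpha a\|^2_{L^\infty} \cdot [1 + |\mu - \nu|]^{2n+2-N}
\]
uniformly in $\mu, \nu$. The same argument handles $A_\mu^* A_\nu$.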

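We now choose $N = 6n + 3$ and apply the Cotlar–Stein lemma with
$d = 2n$ and
\[ \omega(\mu) \sim (1 + |\mu|)^{-4n-1} \sup_{|\alpha| \le N} \|\partial^\alpha a\|^2_{L^\infty}, \]
where the implicit constants depend only on the dimension. With this choice,
$\sqrt{\omega(\mu)} \sim (1 + |\mu|)^{-2n-\frac12} \sup_{|\alpha| \le N} \|\partial^\alpha a\|_{L^\infty}$ is summable over $\mathbb{Z}^{2n}$,
so that $C_0 < \infty$. In particular, we can get
\[ \Bigl\| \sum_{|\mu| \le M} A_\mu u \Bigr\|_{L^2} \le C_0 \|u\|_{L^2} \quad \text{uniformly in } M \ge 0 \text{ and } u \in L^2, \]
which then implies $\|Au\|_{L^2} \le C_0 \|u\|_{L^2}$ for any $u \in L^2$. This completes the
proof.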
We close this section with a few applications.

First, combining Proposition 8.1.8 with the $L^2$ boundedness result, we
have that if $a \in S_{3n}(\langle\xi\rangle^m)$ is elliptic then we may find $b \in S_{3n}(\langle\xi\rangle^{-m})$ such
that
\[ \mathrm{Op}_h(a) \circ \mathrm{Op}_h(b) = 1 + R_1 \quad \text{and} \quad \mathrm{Op}_h(b) \circ \mathrm{Op}_h(a) = 1 + R_2, \]
where
\[ \|R_1\|_{L^2 \to L^2} + \|R_2\|_{L^2 \to L^2} = O(h^\infty). \]
In particular, if $m = 0$ and $h$ is sufficiently small, then $\mathrm{Op}_h(a)$ is invertible
on $L^2$, with inverse satisfying
\[ \mathrm{Op}_h(a)^{-1} = \mathrm{Op}_h(b) + O(h^\infty). \]
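Finally, we prove the following estimate:
Proposition 8.1.13 (Gårding inequality). Suppose $a \in S_{2n}(1)$ is real-valued
and satisfies $a \ge \frac{1}{C}$. Then for any $C_1 > C$, we have
\[ \mathrm{Op}^W_h(a) \ge \tfrac{1}{C_1} \quad \text{on } L^2(\mathbb{R}^n) \]
for $h$ sufficiently small, i.e.
\[ \langle \mathrm{Op}^W_h(a)u, u \rangle \ge \tfrac{1}{C_1} \|u\|^2_{L^2} \quad \text{for all } u \in L^2. \]
Proof. Let $C < C_2 < C_1$ so that
\[ \sqrt{a - \tfrac{1}{C_2}} \in S_{2n}(1). \]
Write
\[ B = \mathrm{Op}^W_h\Bigl( \sqrt{a - \tfrac{1}{C_2}} \Bigr), \]
so that $B$ is bounded and self-adjoint on $L^2(\mathbb{R}^n)$. By the symbolic calculus,
we may write
\[ \mathrm{Op}^W_h(a - \tfrac{1}{C_2}) = B^2 + hR, \quad \text{where } \|R\|_{L^2 \to L^2} \lesssim 1. \]
Since $B^2 \ge 0$, we may find $C'$ so that
\[ \mathrm{Op}^W_h(a - \tfrac{1}{C_2}) \ge -C'h, \]
and hence
\[ \mathrm{Op}^W_h(a) \ge \tfrac{1}{C_2} - C'h \ge \tfrac{1}{C_1} \]
for $h$ small enough.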
8.2 Microlocal analysis

We begin this section by introducing the global FBI transform (here FBI
stands for Fourier–Bros–Iagolnitzer).
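Definition 8.2.1. We define the operator $T : \mathcal{S}' \to C^\infty$ by
\[
Tu(x, \xi; h) = 2^{-n/2} (\pi h)^{-3n/4} \int_{\mathbb{R}^n} e^{i(x-y)\xi/h - (x-y)^2/2h}\, u(y)\, dy
= 2^{-n/2} (\pi h)^{-3n/4} \langle u(y), e^{i(x-y)\xi/h - (x-y)^2/2h} \rangle_{\mathcal{S}', \mathcal{S}}.
\]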
The motivation for this definition is ‘microlocalization’, which refers to

the simultaneous localization in physical and Fourier space.
We record the basic properties of the FBI transform in the following
proposition.
Proposition 8.2.2.
(i) For $u \in \mathcal{S}'$, $e^{\xi^2/2h}\, Tu(x, \xi; h)$ is holomorphic in $z = x - i\xi$.
(ii) We have $\|Tu\|_{L^2(\mathbb{R}^{2n})} = \|u\|_{L^2(\mathbb{R}^n)}$.
(iii) We have $hD_x Tu = (\xi + ihD_\xi)Tu$.

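Proof. Let us show (ii). We fix $u \in C_c^\infty(\mathbb{R}^n)$ and write
\[
\|Tu\|^2_{L^2(S_{M,N})} = 2^{-n} (\pi h)^{-3n/2} \int_{S_{M,N}} \iint e^{i(y'-y)\xi/h - (x-y)^2/2h - (x-y')^2/2h}\, u(y)\, \bar u(y')\, dy\, dy'\, d\xi\, dx,
\]
where
\[ S_{M,N} = \{ |x| \le M,\ |\xi| \le N \}. \]
Using that
\[ (2\pi h)^{-n} \int_{|\xi| \le N} e^{i(y'-y)\xi/h}\, d\xi \to \delta(y' - y) \quad \text{as } N \to \infty \]
in the sense of distributions, we obtain
\[
\|Tu\|^2_{L^2(S_{M,N})} \to (\pi h)^{-n/2} \int_{|x| \le M} \int e^{-(x-y)^2/h}\, |u(y)|^2\, dy\, dx \quad \text{as } N \to \infty,
\]
and the right-hand side converges to $\|u\|^2_{L^2}$ as $M \to \infty$,
where we have used the dominated convergence theorem and the fact that
\[ \int e^{-(x-y)^2/h}\, dx = (\pi h)^{n/2}. \]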
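The identity in (ii) can also be checked numerically in a simple case. The sketch below takes $n = 1$, $h = 1$ and the test function $u(y) = e^{-y^2/2}$ (all three choices are assumptions made for illustration), approximates $Tu$ by a Riemann sum on a truncated grid, and compares $\|Tu\|^2_{L^2(\mathbb{R}^2)}$ with $\|u\|^2_{L^2(\mathbb{R})} = \sqrt{\pi}$. The function names, grid size and truncation are hypothetical.

```python
import cmath
import math

def fbi_transform(u, x, xi, h, ys, dy):
    """Riemann-sum approximation of T u(x, xi; h) in dimension n = 1."""
    prefactor = 2 ** (-0.5) * (math.pi * h) ** (-0.75)
    total = 0j
    for y in ys:
        phase = 1j * (x - y) * xi / h - (x - y) ** 2 / (2 * h)
        total += cmath.exp(phase) * u(y)
    return prefactor * total * dy

def grid(limit, step):
    """Uniform grid on [-limit, limit] with the given step."""
    count = int(round(2 * limit / step)) + 1
    return [-limit + k * step for k in range(count)]

def squared_norms(h=1.0, limit=8.0, step=0.25):
    """Return (||T u||^2 on the truncated grid, ||u||^2) for u(y) = exp(-y^2/2)."""
    u = lambda y: math.exp(-y * y / 2)
    pts = grid(limit, step)
    norm_u_sq = sum(u(y) ** 2 for y in pts) * step
    norm_Tu_sq = 0.0
    for x in pts:
        for xi in pts:
            norm_Tu_sq += abs(fbi_transform(u, x, xi, h, pts, step)) ** 2
    return norm_Tu_sq * step * step, norm_u_sq
```

Because all the integrands are Gaussians, the equally spaced sums converge very quickly, so even this coarse grid reproduces the isometry to high accuracy; the triple loop is slow in pure Python but adequate for a one-off check.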
Corollary 8.2.3. We have T ∗ T = Id on L2 .

This identity may be extended to the setting of tempered distributions,
and we can identify the defining integral in the sense of oscillatory integrals.
By definition, we call properties of T u the ‘microlocal’ properties of u.
Since u = T ∗ T u, we may hope that understanding the microlocal properties
of u may tell us information about u.
Definition 8.2.4. For h-dependent u ∈ S 0 (Rn ) and (x0 , ξ0 ) ∈ R2n , we say
that u is microlocally exponentially small near (x0 , ξ0 ) if there exists δ > 0
such that
T u(x, ξ; h) = O(e−δ/h )
uniformly in a neighborhood of (x0 , ξ0 ) and h > 0 small. The complementary
closed set is called the microsupport of u, denoted MS(u).
Our eventual goal is to study MS(u) for solutions to partial differential
equations of the form P (x, hDx )u = 0. We will utilize estimates for T u.
Lemma 8.2.5. Let p ∈ S2n (1). For any t ∈ [0, 1], we have
T ◦ Opth (p) = Opth (p̃) ◦ T,
where p̃ ∈ S4n (1) is defined by
p̃(x, ξ, x∗ , ξ ∗ ) = p(x − ξ ∗ , x∗ ).
Here x∗ , ξ ∗ are such that Op(x∗ ) = hDx and Op(ξ ∗ ) = hDξ .

The proof follows from direct computation, and so we will omit it.
We next suppose that $Q = \mathrm{Op}^t_h(q(x, \xi, x^*, \xi^*))$ is a bounded pseudodifferential
operator on $\mathbb{R}^{2n}$ with $q \in S_{4n}(1)$ and $t \in [0, 1]$, and let $\psi : \mathbb{R}^{2n} \to \mathbb{R}$
be smooth. We then have the following theorem.
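Theorem 8.2.6. There exist $\tilde q(x, \xi; h) \in S_{2n}(1)$ and a bounded operator
$R(h) : L^2(\mathbb{R}^{2n}) \to L^2(\mathbb{R}^{2n})$ such that for all $u, v \in L^2$,
\[
\langle Q e^{\psi/h} Tu, e^{\psi/h} Tv \rangle_{L^2(\mathbb{R}^{2n})} = \langle (\tilde q + R(h)) e^{\psi/h} Tu, e^{\psi/h} Tv \rangle_{L^2(\mathbb{R}^{2n})},
\]
where
\[
\tilde q(x, \xi; h) \sim \sum_{j=0}^{\infty} h^j \tilde q_j(x, \xi) \ \text{in } S_{2n}(1), \qquad
\tilde q_0(x, \xi) = q(x, \xi, \xi - \partial_\xi\psi, \partial_x\psi), \qquad
\|R(h)\|_{L^2 \to L^2} = O(h^\infty) \ \text{as } h \to 0^+.
\]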
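Sketch of the proof. Applying Taylor’s theorem, we may write
\[
r_1(x, \xi, x^*, \xi^*) = q(x, \xi, x^*, \xi^*) - q(x, \xi, \xi - \partial_\xi\psi, \partial_x\psi)
= (x^* - \xi + \partial_\xi\psi)\, q_1 + (\xi^* - \partial_x\psi)\, q_2,
\]
where $q_1, q_2 \in S_{4n}(1)$. We now set
\[ F = hD_x - \xi + \partial_\xi\psi, \qquad G = hD_\xi - \partial_x\psi, \qquad Q_j = \mathrm{Op}^t_h(q_j), \ j = 1, 2. \]
Using the symbolic calculus, we may find $r_2 \in S_{4n}(1)$ so that
\[ \mathrm{Op}^t_h(r_1) = \tfrac12 (Q_1 F + F Q_1) + \tfrac12 (Q_2 G + G Q_2) + h\, \mathrm{Op}^t_h(r_2). \]
Using
\[ (hD_x - \xi)T = ihD_\xi T \]
(which is Proposition 8.2.2 (iii)) and setting $T_\psi(u) = e^{\psi/h} Tu$, we may also compute
\[ (hD_x - \xi + i\partial_x\psi) T_\psi = (ihD_\xi - \partial_\xi\psi) T_\psi, \]
which yields $F T_\psi = iG T_\psi$. Using this and some algebraic manipulation, we
may obtain
\[
\langle \mathrm{Op}^t_h(q) T_\psi u, T_\psi v \rangle = \langle q(x, \xi, \xi - \partial_\xi\psi, \partial_x\psi) T_\psi u, T_\psi v \rangle + h \langle \mathrm{Op}^t_h(q') T_\psi u, T_\psi v \rangle.
\]
Repeating this argument for $q'$ and iterating, we may obtain the stated expansion.
Next, we let $\tilde q \in S_{2n}(1)$ be a resummation of the series just obtained,
and we set
\[ R = \Pi_\psi [\mathrm{Op}^t_h(q) - \tilde q] \Pi_\psi, \]
where $\Pi_\psi$ is the orthogonal projection onto the range of $T_\psi$ (which is a closed
subspace of $L^2(\mathbb{R}^{2n})$). Using Calderón–Vaillancourt and the expansion above,
we can obtain
\[ \langle R T_\psi u, T_\psi v \rangle = O(h^N) \|T_\psi u\|_{L^2} \|T_\psi v\|_{L^2} \]
for all $N$, so that
\[ \|R\|_{L^2 \to L^2} = O(h^\infty). \]
(To obtain this, split $u \in L^2$ into $u_1 + u_2$, where $u_1$ is in the range of $\Pi_\psi$
and $u_2$ is in the range of $\Pi_\psi^\perp$.)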
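Corollary 8.2.7. Suppose $p \in S_{2n}(1)$ extends holomorphically to the strip
\[ \{ (x, \xi) \in \mathbb{C}^{2n} : |\operatorname{Im} x| < a,\ |\operatorname{Im} \xi| < b \} \]
and $\partial^\alpha p = O(1)$ uniformly on this strip. Suppose also that $\psi$ satisfies
\[ \|\nabla_x \psi\|_{L^\infty} < b \quad \text{and} \quad \|\nabla_\xi \psi\|_{L^\infty} < a. \]
Fix $t \in [0, 1]$ and let $P = \mathrm{Op}^t_h(p)$. Finally, let $f \in S_{2n}(1)$. Then there exist
$\tilde p(x, \xi; h) \in S_{2n}(1)$ and $R(h) : L^2 \to L^2$ such that
\[
\langle f e^{\psi/h} T P u, e^{\psi/h} T v \rangle_{L^2(\mathbb{R}^{2n})} = \langle (\tilde p + R(h)) e^{\psi/h} T u, e^{\psi/h} T v \rangle_{L^2(\mathbb{R}^{2n})}
\]
for any $u, v \in L^2$, with
\[
\tilde p(x, \xi; h) \sim \sum_{j \ge 0} h^j \tilde p_j(x, \xi) \ \text{in } S_{2n}(1), \qquad
\tilde p_0(x, \xi) = f(x, \xi)\, p(x - 2\partial_z\psi, \xi + 2i\partial_z\psi), \qquad
\|R(h)\|_{L^2 \to L^2} = O(h^\infty),
\]
where $\partial_z = \tfrac12 (\nabla_x + i\nabla_\xi)$.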
Sketch of the proof. Since
f eψ/h T P u = Qeψ/h T u,
where
Q = f eψ/h Opth (p(x − ξ ∗ , x∗ ))e−ψ/h ,
the proof boils down to showing that Q is a semiclassical pseudodifferential
operator that admits the correct expansion in powers of h. See [20] for the
details.
We also have:
Corollary 8.2.8. Under the assumptions above,
\[
\|f e^{\psi/h} T P u\|^2_{L^2} = \|f \cdot p(x - 2\partial_z\psi, \xi + 2i\partial_z\psi)\, e^{\psi/h} T u\|^2_{L^2} + O(h) \|e^{\psi/h} T u\|^2_{L^2}.
\]
Sketch of the proof. With $Q$ as above, we have
\[ \|f e^{\psi/h} T P u\|^2_{L^2} = \langle Q^* Q e^{\psi/h} T u, e^{\psi/h} T u \rangle. \]
We now observe that $Q^* Q$ is a semiclassical pseudodifferential operator, with
the first term in its asymptotic expansion equal to
\[ |f(x, \xi)\, p(x - \xi^* - i\partial_\xi\psi, x^* + i\partial_x\psi)|^2. \]
The result follows.
Finally, recalling u = T ∗ T u and choosing ψ ≡ 0, we obtain:
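Corollary 8.2.9. Let $p \in S_{2n}(1)$ and $P = \mathrm{Op}^t_h(p)$. Then there exist
$\tilde p(x, \xi; h) \in S_{2n}(1)$ and $R(h) : L^2 \to L^2$ such that for all $u, v \in L^2$,
\[ \langle P u, v \rangle_{L^2(\mathbb{R}^n)} = \langle (\tilde p + R(h)) T u, T v \rangle_{L^2(\mathbb{R}^{2n})}, \]
with
\[
\tilde p(x, \xi; h) \sim \sum_{j \ge 0} h^j \tilde p_j(x, \xi) \ \text{in } S_{2n}(1), \qquad
\tilde p_0(x, \xi) = p(x, \xi), \qquad
\|R(h)\|_{L^2 \to L^2} = O(h^\infty).
\]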
As an application of the above, we establish the following:
Theorem 8.2.10 (Sharp Gårding Inequality). Let $p(x, \xi) \in S_{2n}(1)$ be
nonnegative. Then there exists $C > 0$ such that for all $u \in L^2$ and for all $h > 0$
small enough,
\[ \langle \mathrm{Op}^W_h(p) u, u \rangle \ge -Ch \|u\|^2_{L^2}. \]
Proof. The left-hand side may be written
\[ \langle (\tilde p + R(h)) T u, T u \rangle. \]
Now,
\[ |\langle R(h) T u, T u \rangle| = O(h^\infty) \|T u\|^2_{L^2} = O(h^\infty) \|u\|^2_{L^2}. \]
On the other hand,
\[ \langle \tilde p\, T u, T u \rangle = \langle p\, T u, T u \rangle + O(h) \|T u\|^2_{L^2} \ge -Ch \|u\|^2_{L^2}. \]
We turn to some applications to PDE of the type $P(x, hD_x)u = 0$, where
\[ P = \sum_{|\alpha| \le m} a_\alpha(x) (hD_x)^\alpha. \]
We begin with a few definitions.

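Definition 8.2.11. Let $m \in \mathbb{R}$. Denote by $S^{hol}_{2n}(\langle\xi\rangle^m)$ the set of
$p \in S_{2n}(\langle\xi\rangle^m)$ that can be extended holomorphically in a band
\[ \Sigma_a = \{ (x, \xi) \in \mathbb{C}^{2n} : |\operatorname{Im} x| + |\operatorname{Im} \xi| < a \} \]
for some $a > 0$ (independent of $h$).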
Definition 8.2.12. Suppose p = p(x, ξ, h) ∈ S2n (hξim ) is of the form
p(x, ξ; h) = p0 (x, ξ) + ε(h)r(x, ξ; h),
where p0 , r ∈ S2n (hξim ) and limh→0 ε(h) = 0. For t ∈ [0, 1] given, let
P = Opth (p). The set
Char(P ) = {(x, ξ) ∈ R2n : p0 (x, ξ) = 0}
is called the characteristic set of P .
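Our first result is the following.
Theorem 8.2.13. Let $p \in S^{hol}_{2n}(\langle\xi\rangle^m)$ have the same form as above. Suppose
\[ Pu = 0 \quad \text{and} \quad \|u\|_{L^2} \le 1, \]
where $P = \mathrm{Op}^t_h(p)$ for some $t \in [0, 1]$. Then
\[ \mathrm{MS}(u) \subset \mathrm{Char}(P). \]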
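Proof. By precomposing with $\mathrm{Op}(\langle\xi\rangle^{-m})$, we can assume $m = 0$.
We let $(x_0, \xi_0) \notin \mathrm{Char}(P)$, and we want to show that $(x_0, \xi_0) \notin \mathrm{MS}(u)$.
That is, we want to show $Tu$ is exponentially small near $(x_0, \xi_0)$.
We next let $U$ be a complex neighborhood of $(x_0, \xi_0)$ and $C_0 > 0$ such
that $|p_0| > \frac{1}{C_0}$ on $U$. In particular, $|p| > \frac{1}{2C_0}$ on $U$ for $h$ small enough.
We now take $\psi \in S_{2n}(1)$ to be real-valued and satisfy:
(i) $\psi = 0$ on $U^c$;
(ii) $|\nabla_{x,\xi}\psi|$ is small enough that
\[ (x - 2\partial_z\psi, \xi + 2i\partial_z\psi) \in U \quad \text{for } (x, \xi) \in U \cap \mathbb{R}^{2n}, \]
where $\partial_z = \tfrac12 (\partial_x + i\partial_\xi)$;
(iii) $\psi = \delta > 0$ in a neighborhood of $(x_0, \xi_0)$.
This implies
\[ |p(x - 2\partial_z\psi, \xi + 2i\partial_z\psi)| \ge \tfrac{1}{2C_0} \quad \text{for } (x, \xi) \in U \cap \mathbb{R}^{2n}. \]
Applying Corollary 8.2.8 (with $f \equiv 1$), we obtain
\[
0 = \|e^{\psi/h} T P u\|^2_{L^2} = \|p(x - 2\partial_z\psi, \xi + 2i\partial_z\psi)\, e^{\psi/h} T u\|^2_{L^2} + O(h) \|e^{\psi/h} T u\|^2_{L^2}.
\]
This yields
\[ \|p(x - 2\partial_z\psi, \xi + 2i\partial_z\psi)\, e^{\psi/h} T u\|^2_{L^2(U \cap \mathbb{R}^{2n})} = O(h) \|e^{\psi/h} T u\|^2_{L^2}, \]
which in turn (using the properties of $\psi$) implies
\[ \|e^{\psi/h} T u\|^2_{L^2(U \cap \mathbb{R}^{2n})} = O(h) \bigl( \|T u\|^2_{L^2} + \|e^{\psi/h} T u\|^2_{L^2(U \cap \mathbb{R}^{2n})} \bigr), \]
and hence
\[ \|e^{\psi/h} T u\|^2_{L^2(U \cap \mathbb{R}^{2n})} = O(h) \|T u\|^2_{L^2} = O(h). \]
If we now set
\[ V = \{ (x, \xi) \in \mathbb{R}^{2n} : \psi(x, \xi) = \delta \} \subset U, \]
then we obtain
\[ \|T u\|_{L^2(V)} = O(e^{-\delta/h}), \]
as desired.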
As an application, let us consider the following example.

Example 8.2.1. Let $H = -h^2\Delta + V$, where $V \in S^{hol}_n(1)$ is real-valued.
Suppose $u$ satisfies
\[ Hu = Eu \quad \text{and} \quad \|u\|_{L^2} = 1, \]
where $E = E(h)$. Suppose that
\[ \lim_{h \to 0} E(h) = E_0. \]
Then we may apply the theorem above with $P = H - E$ to obtain
\[ \mathrm{MS}(u) \subset \{ (x, \xi) : \xi^2 + V(x) = E_0 \}, \]
where the set on the right-hand side is the ‘energy surface’ associated with
$E_0$.
Propagation of the Microsupport.
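For $p = p(x, \xi) \in C^\infty(\mathbb{R}^{2n}; \mathbb{R})$, we set
\[
H_p = \sum_{j=1}^{n} \Bigl( \frac{\partial p}{\partial \xi_j} \frac{\partial}{\partial x_j} - \frac{\partial p}{\partial x_j} \frac{\partial}{\partial \xi_j} \Bigr) =: \Bigl( \frac{\partial p}{\partial \xi}, -\frac{\partial p}{\partial x} \Bigr)
\]
as a vector field on $\mathbb{R}^{2n}$ (the Hamilton field). Given $(x_0, \xi_0) \in \mathbb{R}^{2n}$, we may
solve (locally) the Hamilton-Jacobi system
\[
\begin{cases}
\partial_t (x(t), \xi(t)) = H_p(x(t), \xi(t)), \\
(x(0), \xi(0)) = (x_0, \xi_0),
\end{cases}
\]
i.e. $\dot x = \frac{\partial p}{\partial \xi}$ and $\dot\xi = -\frac{\partial p}{\partial x}$. We denote this solution by
\[ (x(t), \xi(t)) = (\exp tH_p)(x_0, \xi_0) \]
and call
\[ (t, x, \xi) \mapsto \exp(tH_p)(x, \xi) \]
the Hamilton flow associated to $p$. We observe that
\[ \frac{\partial}{\partial t} \exp(tH_p)(x, \xi) = H_p\bigl( \exp(tH_p)(x, \xi) \bigr). \]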
Example 8.2.2 (Harmonic Oscillator). Let p(x, ξ) = ξ 2 + x2 . Then, using
ẋ = 2ξ and ξ˙ = −2x,
we obtain ẍ + 4x = 0. Thus
x(t) = λ cos(2t) + µ sin(2t), ξ(t) = −λ sin(2t) + µ cos(2t)
for some µ, λ ∈ Rn . In particular,
exp(tHp )(x0 , ξ0 ) = (x0 cos 2t + ξ0 sin 2t, −x0 sin 2t + ξ0 cos 2t).
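The closed form in Example 8.2.2 can be compared with a direct numerical integration of the Hamilton system. The following Python sketch works in dimension $n = 1$ and uses a classical fourth-order Runge–Kutta scheme; the scheme, the step count and the function names are choices made here for illustration and are not part of the text.

```python
import math

def hamilton_field(x, xi):
    """Hamilton field (dp/dxi, -dp/dx) of p(x, xi) = xi^2 + x^2 for n = 1."""
    return 2.0 * xi, -2.0 * x

def rk4_flow(x0, xi0, t, steps=1000):
    """Approximate exp(t H_p)(x0, xi0) with the classical Runge-Kutta scheme."""
    dt = t / steps
    x, xi = x0, xi0
    for _ in range(steps):
        k1 = hamilton_field(x, xi)
        k2 = hamilton_field(x + 0.5 * dt * k1[0], xi + 0.5 * dt * k1[1])
        k3 = hamilton_field(x + 0.5 * dt * k2[0], xi + 0.5 * dt * k2[1])
        k4 = hamilton_field(x + dt * k3[0], xi + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        xi += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, xi

def exact_flow(x0, xi0, t):
    """Closed-form flow from Example 8.2.2."""
    return (x0 * math.cos(2 * t) + xi0 * math.sin(2 * t),
            -x0 * math.sin(2 * t) + xi0 * math.cos(2 * t))

def symbol(x, xi):
    """The symbol p(x, xi) = xi^2 + x^2."""
    return xi * xi + x * x
```

Calling `rk4_flow(1.0, 0.5, 1.0)` and `exact_flow(1.0, 0.5, 1.0)` should give nearly identical points, and `symbol` evaluated along the numerical trajectory stays close to its initial value, which is the conservation property recorded in the next proposition.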
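Proposition 8.2.14. For any $(x_0, \xi_0) \in \mathbb{R}^{2n}$ and for $t, s$ small enough, we
have
\[ \exp((t + s)H_p) = \exp(tH_p) \exp(sH_p) \]
and
\[ p\bigl( \exp(tH_p)(x_0, \xi_0) \bigr) = p(x_0, \xi_0). \]
Proof. For the second identity, we apply $\frac{\partial}{\partial t}$ to the left-hand side and use
$\dot x = \frac{\partial p}{\partial \xi}$, $\dot\xi = -\frac{\partial p}{\partial x}$ to see that the derivative vanishes. Thus the left-hand
side is constant in $t$.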
Remark 8.2.15. If p(x0 , ξ0 ) = 0 then p(x(t), ξ(t)) ≡ 0. We call a curve
{exp tHp (x0 , ξ0 ) : t ∈ (T0 , T1 )} with p(x0 , ξ0 ) = 0
a null bicharacteristic of p.
We turn to our last main result.
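Theorem 8.2.16. Let
\[ p = p(x, \xi, h) = p_0(x, \xi) + \varepsilon(h) r(x, \xi, h) \in S^{hol}_{2n}(\langle\xi\rangle^m) \]
be as above, and suppose $u$ satisfies
\[ Pu = 0 \quad \text{and} \quad \|u\|_{L^2} \le 1, \quad \text{where } P = \mathrm{Op}^t_h(p). \]
Suppose $(x_0, \xi_0) \in \mathbb{R}^{2n}$ and that $\exp(tH_{p_0})(x_0, \xi_0)$ exists for $t \in (T_0, T_1)$.
Then
\[
(x_0, \xi_0) \in \mathrm{MS}(u) \iff \forall t \in (T_0, T_1) \ \exp tH_{p_0}(x_0, \xi_0) \in \mathrm{MS}(u)
\iff \exists t \in (T_0, T_1) \ \exp tH_{p_0}(x_0, \xi_0) \in \mathrm{MS}(u).
\]
That is, $\mathrm{MS}(u)$ is invariant under the flow of $H_{p_0}$.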

Proof. By the previous proposition and the fact that $\mathrm{MS}(u) \subset \mathrm{Char}(P)$,
we may assume that $p_0(x_0, \xi_0) = 0$. We may also assume without loss of
generality that $m = 0$ (see [20, Lemma 4.38]). With $p_0, r \in S^{hol}_{2n}(1)$, we will
show that
\[ \forall (x_0, \xi_0) \in \mathrm{MS}(u) \ \exists\, \delta > 0 : \forall t \in [-\delta, \delta] \ \exp tH_{p_0}(x_0, \xi_0) \in \mathrm{MS}(u). \]
With this, a standard ‘clopen’ argument finishes the proof.
We assume that $\nabla_{x,\xi} p_0(x_0, \xi_0) \ne 0$ (for otherwise $H_{p_0}(x_0, \xi_0) = 0$ and so
$\exp tH_{p_0}(x_0, \xi_0) = (x_0, \xi_0)$ for all $t$).
We suppose towards a contradiction that for any $\delta > 0$, there exists
$t_\delta > 0$ such that
\[ (x_\delta, \xi_\delta) := \exp t_\delta H_{p_0}(x_0, \xi_0) \notin \mathrm{MS}(u). \]
In particular, there exist $\alpha_\delta > 0$ and a neighborhood $W_\delta \ni (x_\delta, \xi_\delta)$ so that
\[ Tu(x, \xi; h) = O(e^{-\alpha_\delta/h}) \quad \text{for all } (x, \xi) \in W_\delta. \]
Our task is then to show that this implies $(x_0, \xi_0) \notin \mathrm{MS}(u)$.
Lemma 8.2.17. For $g \in S_{2n}(1)$ (real-valued), there exists $c > 0$ such that
for all $\theta > 0$ and $v \in L^2$,
\[
\|e^{\theta g/h} T P v\|^2_{L^2} \ge \theta^2 \|(H_{p_0} g) e^{\theta g/h} T v\|^2_{L^2} - c(h + |\varepsilon(h)| + \theta^3) \|e^{\theta g/h} T v\|^2_{L^2}
\]
uniformly for sufficiently small $\theta, h$.
Sketch of the proof of the lemma. Applying Corollary 8.2.8 and
Calderón–Vaillancourt, we first obtain
\[
\|e^{\theta g/h} T P v\|^2_{L^2} = \|p_0(x - 2\theta\partial_z g, \xi + 2i\theta\partial_z g)\, e^{\theta g/h} T v\|^2_{L^2} + O(h + |\varepsilon(h)|) \|e^{\theta g/h} T v\|^2_{L^2}.
\]
One can now apply Taylor expansion to deduce
\[ \operatorname{Im}[p_0(x - 2\theta\partial_z g, \xi + 2i\theta\partial_z g)] = \theta H_{p_0} g + O(\theta^2) \quad \text{in } S_{2n}(1). \]
Feeding this into the bound above leads to the result.
The key to completing the proof is to apply the preceding lemma with
a good choice of function g.
First, recalling the nonvanishing of $\nabla_{x,\xi} p_0$, we may use a local change
of coordinates on a neighborhood $V \supset \bar W_\delta$ to coordinates $(y_1, \dots, y_{2n})$ for
which
\[ H_{p_0} = \frac{\partial}{\partial y_1} \quad \text{and} \quad (x_\delta, \xi_\delta) \mapsto (t_\delta, 0). \]
We now choose $a, b \in \mathbb{R}$ and $c > 0$ so that
\[ \{ (y_1, y') : a \le y_1 \le b,\ |y'| \le c \} \subset W_\delta, \]
where $(y_1, y') \in \mathbb{R} \times \mathbb{R}^{2n-1}$. We then let $f = f(y_1)$ be a smooth function
such that:
• $f(0) = \tfrac12 \alpha_\delta$, with $0 \le f \le \alpha_\delta$;
• $f = 0$ outside $[-2d, b]$;
• $f' \ge 0$ on $[-2d, d]$ and $|f'| \le \beta$ on $[-d, a]$,
where $d, \beta$ are sufficiently small. We also define $\chi_0 : (-c, c) \to [0, 1]$ so that
• $\chi_0 = 1$ on $[-c/4, c/4]$;
• $\chi_0 \ne 0$ on $[-c/2, c/2]$;
• $\chi_0 \le \tfrac14$ outside $[-c/2, c/2]$.
Finally, let $g(y) = \chi_0(|y'|) f(y_1)$ (extended to zero outside of $V$), which then
obeys the following:
• $H_{p_0} g = 0$ on $V^c$,
• $H_{p_0} g < 0$ on $V_\delta := \{ (y_1, y') : y_1 \in [-d, a],\ |y'| \le c/2 \}$,
• $g(x_0, \xi_0) = \tfrac12 \alpha_\delta$, with $g \le \alpha_\delta$ everywhere,
• $g \le \tfrac{\alpha_\delta}{4}$ for $|y'| + |y_1| > c/2$.
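Now we apply the lemma above to deduce
\[
\|(H_{p_0} g) e^{\theta g/h} T u\|^2_{L^2(V_\delta)} \le C \Bigl( \frac{h + |\varepsilon(h)|}{\theta^2} + \theta \Bigr) \|e^{\theta g/h} T u\|^2_{L^2}.
\]
Noting that $|H_{p_0} g| \gtrsim 1$ on $V_\delta$, this implies
\[
\|e^{\theta g/h} T u\|^2_{L^2(V_\delta)} \lesssim \Bigl( \frac{h + |\varepsilon(h)|}{\theta^2} + \theta \Bigr) \|e^{\theta g/h} T u\|^2_{L^2}.
\]
For $\theta$ small (and $h$ small), this implies
\[ \|e^{\theta g/h} T u\|^2_{L^2(V_\delta)} \lesssim \|e^{\theta g/h} T u\|^2_{L^2(\mathbb{R}^{2n} \setminus V_\delta)}. \]
Now we claim that
\[ \|e^{\theta g/h} T u\|_{L^2(\mathbb{R}^{2n} \setminus V_\delta)} = O\bigl( 1 + e^{\theta(\frac{\alpha_\delta}{2} - \beta d)/h} \bigr). \]
Indeed, on $V^c$ we have $g = 0$, and on $V \setminus V_\delta$ we may obtain $g \le \tfrac12 \alpha_\delta - \beta d$
(which is $\ge \tfrac14 \alpha_\delta$ for $\beta$ small enough). In particular, restricting to a possibly
smaller neighborhood $V_\delta'$ on which $g \ge \tfrac12 \alpha_\delta - \tfrac12 \beta d$, we have
\[ \|T u\|_{L^2(V_\delta')} \lesssim e^{-\beta d\theta/2h} + e^{-\theta(\alpha_\delta - \beta d)/2h} \lesssim e^{-\tilde c/h}. \]
This implies that $Tu$ is microlocally exponentially small near $(x_0, \xi_0)$. (In
fact, the definition used the $L^\infty$-norm, but it is also acceptable to use the $L^2$
norm; see [20, Remark 3.2.4].) In particular, $(x_0, \xi_0) \notin \mathrm{MS}(u)$, yielding
the desired contradiction.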
We close with one last application.
Example 8.2.3. We return to the eigenvalue problem
Hu = Eu, kukL2 = 1,
with $H = -h^2\Delta + V$, $V \in S^{hol}_n(1)$, and $E = E(h) \to E_0$ as $h \to 0$. The null
bicharacteristics of $P = H - E$ are given by
\[ \dot x(t) = 2\xi(t), \qquad \dot\xi(t) = -\nabla V(x(t)), \qquad \xi(0)^2 + V(x(0)) = E_0. \]
These are the classical trajectories of energy $E_0$. In particular, $\mathrm{MS}(u)$ is a
union of maximal classical trajectories of energy $E_0$.
8.3 Exercises
Chapter 9
Sharp inequalities
9.1 Rearrangements and the sharp Gagliardo–Nirenberg inequality
In this section and the next we will consider the problem of existence of
optimizers for some functional inequalities. We will consider some particular
cases of the Gagliardo–Nirenberg inequality and the Sobolev embedding
inequality, namely
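\[ \|f\|_{L^4} \le C_{GN} \|f\|_{L^2}^{1/4} \|\nabla f\|_{L^2}^{3/4} \quad \text{for } f \in H^1(\mathbb{R}^3) \]
and
\[ \|f\|_{L^6} \le C_{Sob} \|\nabla f\|_{L^2} \quad \text{for } f \in \dot H^1(\mathbb{R}^3). \]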
Here we use CGN and CSob to denote the best possible constant in these
inequalities. Our goal will be to prove that there exist functions that attain
the best constant. The basic idea is to take an optimizing sequence and
try to prove the existence of a limit, which one then proves is an optimizer.
However, one must contend with a lack of compactness due to the presence
of symmetries that leave the inequalities invariant, namely, translation and
scaling invariance. For example, suppose one already knew that there existed
an optimizer f ∗ to one of these estimates. Then fn := f ∗ (bn x + xn ) would
be an optimizing sequence for any choice of parameters bn ∈ (0, ∞) and
xn ∈ R3 . However, one can readily choose these parameters so that fn
converges weakly to zero (i.e. |xn | → ∞, bn → 0, or bn → ∞; see the
exercises).
Thus, to prove the existence of a limit, we need to restore the loss of com-
pactness. For the case of Gagliardo–Nirenberg, we first perform a rescaling
to suitably normalize the sequence. We will restore the loss of compactness

due to translations by taking radial decreasing rearrangements and exploiting
the compactness of the embedding $H^1_{rad} \hookrightarrow L^4$. For the case of Sobolev
embedding, a different approach is needed, as the embedding $\dot H^1_{rad} \hookrightarrow L^6$ is
not compact. We will use the technique of concentration compactness and

profile decompositions, which allows us to understand precisely the ways in
which a bounded sequence in Ḣ 1 could fail to be compact. See [19] for an
alternate approach.
We begin by proving the following:
Lemma 9.1.1 (Gagliardo–Nirenberg inequality). There exists $C > 0$ such
that for all $f \in H^1(\mathbb{R}^3)$,
\[ \|f\|_{L^4} \le C \|f\|_{L^2}^{1/4} \|\nabla f\|_{L^2}^{3/4}. \tag{9.1} \]
Remark 9.1.2. This is a special case of a more general range of inequalities
of the form
\[ \|f\|_{L^p} \lesssim \|f\|_{L^2}^{\theta}\, \| |\nabla|^s f \|_{L^2}^{1-\theta}. \tag{9.2} \]
We leave the investigation of the general case as an exercise.
Proof of (9.1). Using the triangle inequality and Bernstein estimates, we
have for any $N_0 \in 2^{\mathbb{Z}}$ the estimate
\[
\|f\|_{L^4} \le \sum_N \|f_N\|_{L^4}
\lesssim \sum_{N \le N_0} N^{3/4} \|f\|_{L^2} + \sum_{N > N_0} N^{-1/4} \|\nabla f\|_{L^2}
\lesssim N_0^{3/4} \|f\|_{L^2} + N_0^{-1/4} \|\nabla f\|_{L^2}.
\]
Optimizing in $N_0$ yields the result.
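The optimization step can be made explicit. Ignoring the restriction $N_0 \in 2^{\mathbb{Z}}$ (rounding to the nearest dyadic number changes the bound only by a constant factor), one may balance the two terms as follows:

```latex
\[
N_0 = \frac{\|\nabla f\|_{L^2}}{\|f\|_{L^2}}
\quad\Longrightarrow\quad
N_0^{3/4}\,\|f\|_{L^2} \;=\; N_0^{-1/4}\,\|\nabla f\|_{L^2}
\;=\; \|f\|_{L^2}^{1/4}\,\|\nabla f\|_{L^2}^{3/4}.
\]
```

With this choice both terms on the right-hand side of the previous estimate are equal to the product appearing in (9.1).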
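We define the optimal constant $C_{GN}$ by
\[
C_{GN}^{-1} = \inf \bigl\{ \|f\|_{L^2}^{1/4} \|\nabla f\|_{L^2}^{3/4} \div \|f\|_{L^4} : f \in H^1(\mathbb{R}^3) \setminus \{0\} \bigr\}.
\]
We will prove the following:
Theorem 9.1.3. There exists $f \in H^1(\mathbb{R}^3) \setminus \{0\}$ such that
\[ \|f\|_{L^4} = C_{GN} \|f\|_{L^2}^{1/4} \|\nabla f\|_{L^2}^{3/4}. \]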
Remark 9.1.4. Using the Euler–Lagrange equation associated to the opti-

mization of Gagliardo–Nirenberg, one can deduce the existence of solutions
to the nonlinear elliptic partial differential equation
−∆Q + Q − Q3 = 0, Q : R3 → R.
To prove Theorem 9.1.3, we will need to develop a few tools, specifically,

the notion of a radial decreasing rearrangement, and several compactness
tools.
We first introduce radial rearrangements. We give here an abbreviated
introduction; for more details and further results, see [19, Chapter 3].
For a measurable set $S \subset \mathbb{R}^d$, we define the radial rearrangement of
$S$ (denoted $S^*$) to be the ball centered at the origin such that $|S^*| = |S|$.
The radial rearrangement of a function $f$ is then defined by
\[ f^*(x) = \int_0^\infty \chi_{\{|f| > \lambda\}^*}(x)\, d\lambda, \]
where $\chi_S$ denotes the characteristic function of $S$. This can be compared
with the level set (or “layer cake”) decomposition
\[ |f(x)| = \int_0^{|f(x)|} d\lambda = \int_0^\infty \chi_{\{|f| > \lambda\}}(x)\, d\lambda. \tag{9.3} \]
To make sense of this, we only consider functions such that $|\{|f| > \lambda\}|$ is
finite for all $\lambda > 0$.
This definition guarantees that $\chi_S^* = \chi_{S^*}$. Indeed, noting that
\[
\{\chi_S > \lambda\} = \begin{cases} S & \lambda \in (0, 1), \\ \emptyset & \lambda \ge 1, \end{cases}
\]
we find
\[ \chi_S^*(x) = \int_0^1 \chi_{S^*}(x)\, d\lambda = \chi_{S^*}(x). \]
By construction, the rearrangement $f^*$ of a function $f$ is a nonnegative,
radial (i.e. spherically symmetric), decreasing function. Furthermore, the
level sets of $f^*$ are the rearrangements of the level sets of $|f|$, that is,
\[ \{f^* > \lambda\} = \{|f| > \lambda\}^*. \]
This implies that rearrangements preserve all $L^p$ norms (cf. (A.1)).
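A discrete analogue may help fix ideas. The Python sketch below replaces $f$ by its samples on a one-dimensional grid with an odd number of cells and builds a rearranged sequence by placing the sorted values of $|f|$ outward from the central cell. This is only a grid model of the construction above, under the assumption that each cell has the same measure; the function names are illustrative.

```python
def symmetric_decreasing_rearrangement(values):
    """Discrete analogue of f* on a grid with an odd number of cells.

    The sorted values of |f| are placed outward from the centre cell,
    alternating right and left, so the result is nonincreasing away from
    the centre and symmetric up to the ordering within each pair.
    """
    if len(values) % 2 == 0:
        raise ValueError("expected an odd number of grid points")
    ordered = sorted((abs(v) for v in values), reverse=True)
    centre = len(values) // 2
    result = [0.0] * len(values)
    result[centre] = ordered[0]
    for rank in range(1, len(ordered)):
        offset = (rank + 1) // 2
        index = centre + offset if rank % 2 == 1 else centre - offset
        result[index] = ordered[rank]
    return result

def lp_norm(values, p, dx=1.0):
    """Discrete L^p norm with cell size dx."""
    return (sum(abs(v) ** p for v in values) * dx) ** (1.0 / p)

def level_set_size(values, lam):
    """Number of cells on which |f| > lam (the discrete measure of a level set)."""
    return sum(1 for v in values if abs(v) > lam)
```

For the sample `[3, -1, 4, 1, -5, 9, 2]` the rearranged sequence is `[1, 2, 4, 9, 5, 3, 1]`. The two sequences have level sets of equal size for every threshold and hence equal discrete $L^p$ norms, mirroring the identity $\{f^* > \lambda\} = \{|f| > \lambda\}^*$.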

We need the following estimate. We only sketch the proof; complete
details may be found in [19].
Proposition 9.1.5 (Riesz rearrangement inequality). For non-negative $f, g, h$,
\[
\langle f, g * h\rangle \le \langle f^*, g^* * h^*\rangle.
\]
Proof. We first consider the one-dimensional case. Using the decomposition

(9.3), we may first reduce to the case when f, g, h are characteristic functions
of sets of finite measure. Using approximation by open sets, we further
reduce to the case of open sets. Writing open sets as countable unions
of open intervals and using monotone convergence, we further reduce the
problem to considering finite disjoint unions of open intervals, say
\[
f(x) = \sum_{j=1}^{J_1} f_j(x - a_j), \qquad
g(x) = \sum_{j=1}^{J_2} g_j(x - b_j), \qquad
h(x) = \sum_{j=1}^{J_3} h_j(x - c_j),
\]
where $f_j, g_j, h_j$ are characteristic functions of intervals centered at the
origin. Now for $t \in [0,1]$ let
\[
\begin{aligned}
I_{jk\ell}(t) &= \iint f_j(x - t a_j)\, g_k(x - y - t b_k)\, h_\ell(y - t c_\ell)\,dx\,dy \\
&= \iint f_j(x)\, g_k(x - y)\, h_\ell\bigl(y + t(a_j - b_k - c_\ell)\bigr)\,dx\,dy \\
&=: \int u_{jk}(y)\, h_\ell\bigl(y + t(a_j - b_k - c_\ell)\bigr)\,dy.
\end{aligned}
\]
We have that $u_{jk}$ is a symmetric decreasing function of $y$, and so $I_{jk\ell}$ is a
decreasing function of $t$.
We now start sending $t \downarrow 0$. As soon as two intervals corresponding to
one of the functions intersect, we stop the process and redefine the function
so that it contains an interval that is the union of these two intervals. We
now repeat this process finitely many times until we are left with just three
intervals centered at the origin with the same measures as the intervals
comprising $f, g, h$ (i.e. until we have constructed $f^*, g^*, h^*$). As this process has
only increased the $I_{jk\ell}$, this implies the result.
To prove the higher dimensional case, we introduce the Steiner
symmetrization. Given a direction $e$, we define the Steiner symmetrization of
a set $A$, denoted $A^{*e}$, to be the 1d symmetrization of $A$ along lines that are
parallel to $e$. Given a rotation $\rho$ of $\mathbb{R}^d$ with $\rho e = e_1$, we define $\rho f(x) = f(\rho^{-1}x)$
and let $[\rho f]^{*1}$ be the 1d symmetrization of $\rho f$ with respect to $x_1$ (keeping
$x_1^\perp$ fixed). We then set
\[
f^{*e} = \rho^{-1}\bigl[(\rho f)^{*1}\bigr].
\]
This notion of symmetrization preserves measurability of sets and functions
(exercise).
We will focus on the case d = 2. Again, we may reduce to considering
f, g, h to be characteristic functions of finite measure sets, say A, B, C. By
the 1d rearrangement inequality, we have
\[
I(A^{*e}, B^{*e}, C^{*e}) := \langle f^{*e}, g^{*e} * h^{*e}\rangle \ge \langle f, g * h\rangle = I(A, B, C)
\]
for all directions $e \in S^1$.

Now let α > 0 be an irrational multiple of 2π and let Rα denote rotation
by angle α. Let X, Y denote Steiner symmetrization along the x, y axes.
We set
\[
A_k = (Y X R_\alpha)^k A
\]
and similarly for $B_k$, $C_k$. Note $|A_k| \equiv |A|$ and each $A_k$ has reflection
symmetry about both the $x$ and $y$ axes. Furthermore,
\[
I(A, B, C) \le I(A_k, B_k, C_k) \le I(A_{k+1}, B_{k+1}, C_{k+1})
\]
for all $k$. The key is to prove that $\chi_{A_k} \to \chi_{A^*}$ in $L^2$ along a subsequence, and
similarly for $B_k$, $C_k$. This suffices to complete the proof, since we estimate
along this subsequence
\[
\begin{aligned}
|I(A^*, B^*, C^*) - I(A_k, B_k, C_k)|
&\le \|\chi_{A^*} - \chi_{A_k}\|_{L^2}\|\chi_{B^*}\|_{L^2}\|\chi_{C^*}\|_{L^1}
 + \|\chi_{A^*}\|_{L^2}\|\chi_{B^*} - \chi_{B_k}\|_{L^2}\|\chi_{C^*}\|_{L^1} \\
&\quad + \|\chi_{A_k}\|_{L^2}\|\chi_{B_k}\|_{L^1}\|\chi_{C^*} - \chi_{C_k}\|_{L^2} \\
&\to 0 \quad\text{as } k \to \infty.
\end{aligned}
\]
We now sketch how to prove the desired convergence. We first note that
each Ak is of the form
\[
A_k = \{(x, y) : |y| < \omega_k(|x|)\}
\]
where ωk is a symmetric decreasing function. We can further reduce to

the case that A, B, C are all contained in a ball. Then we can use uniform
boundedness of the ωk and a diagonal argument to find a subsequence on
which ωk converges at every rational point. Using monotonicity, we can get
convergence at all but countably many points (where jumps may occur). We
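can then get $L^2$ convergence of the $\chi_{A_k}$ (and $\chi_{B_k}$, $\chi_{C_k}$) along a subsequence,
say to some $\chi_{\tilde A}$. To complete the proof, one needs to show that $\tilde A$ is a ball,
for then it must necessarily be $A^*$.
To see this, one can introduce $\gamma = e^{-|x|^2}$ and consider the sequence
$a_k = \|\gamma - \chi_{A_k}\|_{L^2}$, which converges to $a = \|\gamma - \chi_{\tilde A}\|_{L^2}$. Using symmetries of $\gamma$
and the double reflection symmetry of $\tilde A$, we can deduce
\[
\int \gamma\, R_\alpha \chi_{\tilde A} \le \int \gamma\, X R_\alpha \chi_{\tilde A} \le \int \gamma\, Y X R_\alpha \chi_{\tilde A} = \int \gamma\, R_\alpha \chi_{\tilde A},
\]
which implies that $R_\alpha \chi_{\tilde A} = XY R_\alpha \chi_{\tilde A}$. This relies on Exercise 9.3.4. Using
this, we find $\chi_{\tilde A} = R_{2\alpha}\chi_{\tilde A}$. To complete the argument, one relies on the fact
that, since $\alpha$ is an irrational multiple of $2\pi$, any angle $\theta$ may be approximated
by integer multiples of $2\alpha$ modulo $2\pi$.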
We will use the Riesz rearrangement inequality to establish the following

estimate.
Lemma 9.1.6. We have
\[
\|\nabla f^*\|_{L^2} \le \|\nabla f\|_{L^2}.
\]
This estimate is known as the Pólya–Szegő inequality, and it actually
holds in $L^p$ for all $1 \le p < \infty$.
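Proof. Recall the heat kernel $e^{t\Delta}$ introduced in Section 2.6. We compute
\[
\begin{aligned}
\|\nabla f\|_{L^2}^2 &= -\langle f, \Delta f\rangle \\
&= \bigl\langle f, -\tfrac{d}{dt}[e^{t\Delta} f]\big|_{t=0}\bigr\rangle \\
&= \lim_{t \to 0} \frac{1}{t}\Bigl[ \|f\|_{L^2}^2 - (4\pi t)^{-\frac d2} \iint e^{-|x-y|^2/4t} f(x)\bar f(y)\,dx\,dy \Bigr].
\end{aligned}
\]
Applying the Riesz rearrangement inequality and reversing the steps above,
we find
\[
\begin{aligned}
\|\nabla f\|_{L^2}^2 &\ge \lim_{t \to 0} \frac{1}{t}\Bigl[ \|f^*\|_{L^2}^2 - (4\pi t)^{-\frac d2} \iint e^{-|x-y|^2/4t} f^*(x)\bar f^*(y)\,dx\,dy \Bigr] \\
&= \|\nabla f^*\|_{L^2}^2.
\end{aligned}
\]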
The next result we need is a compactness result due to Riesz.
Theorem 9.1.7 (Compactness in $L^p$). Suppose $\{f_n\} \subset L^p$ satisfy the
following:
• Boundedness: there exists $M > 0$ such that $\|f_n\|_{L^p} \le M$ for all $n$.
• Tightness: for any $\varepsilon > 0$, there exists $R > 0$ such that
\[
\int_{|x| > R} |f_n(x)|^p\,dx < \varepsilon
\]
for all $n$.
• Equicontinuity: for any $\varepsilon > 0$, there exists $\delta > 0$ such that
\[
\int |f_n(x + y) - f_n(x)|^p\,dx < \varepsilon
\]
for all $n$ and all $|y| < \delta$.
Then $f_n$ converges in $L^p$ along a subsequence.
Proof. Let $\phi$ be a smooth bump function supported on the unit ball with
$\int \phi = 1$. For $R > 0$, we define
\[
f_n^R(x) = \phi(\tfrac{x}{R})\,[R^d \phi(R\,\cdot) * f_n](x)
\]
for each $n$. For any $R$, we have that $\{f_n^R\}$ is a sequence of continuous
functions on the compact set $\{|x| \le R\}$. Note that for any $R > 0$, the $\{f_n^R\}$
are uniformly bounded:
\[
\|f_n^R\|_{L^\infty} \le \|\phi\|_{L^\infty} \|R^d \phi(R\,\cdot) * f_n\|_{L^\infty} \lesssim \|R^d \phi(R\,\cdot)\|_{L^{p'}} \|f_n\|_{L^p} \lesssim R^{\frac dp} M
\]
for all $n$. We next show that $\{f_n^R\}$ are equicontinuous for fixed $R > 0$. As $\phi$
is smooth, it suffices to show that for any $\varepsilon > 0$, there exists $\delta > 0$ so that
\[
\|R^d \phi(R\,\cdot) * f_n(x + y) - R^d \phi(R\,\cdot) * f_n(x)\|_{L^\infty} < \varepsilon
\]
for $|y| < \delta$. As convolution is linear, this follows from the $L^p$-equicontinuity
of the $f_n$ and the estimates above.
By the Arzelà–Ascoli theorem, it follows that for any $R > 0$, every
subsequence of $\{f_n^R\}$ has a subsequence that converges in $L^\infty(\{|x| \le R\})$
and hence in $L^p(\{|x| \le R\})$.
Now let $\varepsilon > 0$. We claim that there exists $R > 0$ large enough that
\[
\|f_n - f_n^R\|_{L^p} < \varepsilon \quad\text{for all } n. \tag{9.4}
\]
To see this, write
\[
f_n - f_n^R = [1 - \phi(\tfrac{x}{R})] f_n + \phi(\tfrac{x}{R})\,[f_n - R^d \phi(R\,\cdot) * f_n].
\]
The first term can be made small in $L^p$ (uniformly in $n$) by employing
tightness. For the second term, we safely ignore $\phi(\tfrac{x}{R})$ and argue as in the
proof of approximation to the identity (Lemma A.3.2). Recalling that proof,
we see that we may make this term small uniformly in $n$ due to the fact that
we have uniform boundedness and uniform equicontinuity for the functions
$f_n$.
Finally, let us show that $\{f_n\}$ has a convergent subsequence in $L^p$. To see
this, we will show that for any $\varepsilon > 0$, there exists $J$ such that
\[
\{f_n\} \subset \bigcup_{j=1}^{J} B_\varepsilon(f_j). \tag{9.5}
\]
Let us first see that this does the job. We apply this with the sequence
$\varepsilon = 2^{-k}$ to find a subsequence $f_{n,k}$ satisfying $f_{n,k+1} \in B_{2^{-k}}(f_{n,k})$. Such a
subsequence is necessarily Cauchy and hence convergent in $L^p$.
In fact, by (9.4), it is enough to establish the total boundedness property
(9.5) for any $\{f_n^R\}$. However, this follows from sequential compactness. To
see this, suppose towards a contradiction that for some $\varepsilon_0$, we have
\[
\{f_n^R\} \not\subset \bigcup_{j=1}^{J} B_{\varepsilon_0}(f_j^R) \quad\text{for any } J.
\]
Then we can inductively build a subsequence $f_{n,k}^R$ such that
\[
f_{n,k+1}^R \notin B_{\varepsilon_0}(f_{n,k}^R).
\]
By construction, this sequence can have no convergent subsequence, which
yields a contradiction. This completes the proof.
Using this, we can establish the following result.
Lemma 9.1.8 (Compact embedding). Let $H^1_{\mathrm{rad}}(\mathbb{R}^3)$ denote the set of radial
functions in $H^1$. Then $H^1_{\mathrm{rad}}(\mathbb{R}^3)$ is compactly embedded in $L^4(\mathbb{R}^3)$.
Remark 9.1.9. This is a special case of a more general result. In particular,
$H^1_{\mathrm{rad}}(\mathbb{R}^d)$ is compactly embedded in $L^p(\mathbb{R}^d)$ for $2 < p < \frac{2d}{d-2}$ (where the
exponent $\frac{2d}{d-2}$ should be taken to be $\infty$ in dimensions $d = 1, 2$).
Proof. The fact that $H^1 \subset L^4$ is a consequence of Gagliardo–Nirenberg.
We need to show that any bounded sequence $\{f_n\}$ in $H^1_{\mathrm{rad}}$ has a
subsequence that converges in $L^4$. We will use Theorem 9.1.7. In particular we
need to check boundedness, equicontinuity, and tightness.
Boundedness in $L^4$ is a consequence of Gagliardo–Nirenberg.
For equicontinuity, we argue as follows. We first apply Gagliardo–Nirenberg
to estimate
\[
\|f_n(\cdot) - f_n(\cdot + y)\|_{L^4} \lesssim \|f_n(\cdot) - f_n(\cdot + y)\|_{L^2}^{1/4}\,\|\nabla f_n\|_{L^2}^{3/4},
\]
so that it suffices to establish equicontinuity in $L^2$. In fact, this also follows
from $H^1$ boundedness. Indeed, using Lemma 2.5.8,
\[
\begin{aligned}
\|f_n(\cdot + y) - f_n(\cdot)\|_{L^2}^2 &\sim \int |\hat f_n(\xi)|^2\, |e^{iy\cdot\xi} - 1|^2\,d\xi \\
&\lesssim |y|^2 \int |\xi \hat f_n(\xi)|^2\,d\xi \lesssim |y|^2 \|\nabla f_n\|_{L^2}^2,
\end{aligned}
\]
which yields equicontinuity.

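Finally, we need to prove tightness. Here we rely on the radial symmetry.
We write $f_n = f_n(r)$ where $r = |x|$ and denote the radial derivative by $\partial_r$.
By the fundamental theorem of calculus, we have for any radial function $f$
and any $r > 0$,
\[
\begin{aligned}
r^2 |f(r)|^2 &= 2 r^2 \Bigl| \operatorname{Re} \int_r^\infty \overline{f(\rho)}\,\partial_\rho f(\rho)\,d\rho \Bigr| \\
&\lesssim \int_0^\infty \rho |f(\rho)|\, \rho |\partial_\rho f(\rho)|\,d\rho \\
&\lesssim \Bigl( \int \rho^2 |f(\rho)|^2\,d\rho \Bigr)^{1/2} \Bigl( \int \rho^2 |\partial_\rho f(\rho)|^2\,d\rho \Bigr)^{1/2} \lesssim \|f\|_{H^1}^2,
\end{aligned}
\]
where in the last line we compute the integral using spherical coordinates.
In particular, we have
\[
\begin{aligned}
\int_{|x| > R} |f_n(x)|^4\,dx &\lesssim R^{-2} \int_{|x| > R} |x|^2 |f_n(x)|^4\,dx \lesssim R^{-2} \|f_n\|_{L^2}^2\, \||x| f_n\|_{L^\infty}^2 \\
&\lesssim R^{-2} \|f_n\|_{H^1}^4,
\end{aligned}
\]
which implies tightness. This completes the proof.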
Remark 9.1.10. The estimate used to establish tightness is often called a

radial Sobolev embedding inequality.
With the requisite compactness tools in place, we can now prove exis-
tence of optimizers for Gagliardo–Nirenberg.
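Proof of Theorem 9.1.3. Let
\[
J(f) = \frac{\|f\|_{L^2}\,\|\nabla f\|_{L^2}^3}{\|f\|_{L^4}^4}
\]
for $f \in H^1(\mathbb{R}^3)\setminus\{0\}$, so that $C_{GN}^{-4} = \inf J(f)$. We take a sequence
$f_n \in H^1(\mathbb{R}^3)\setminus\{0\}$ satisfying
\[
\lim_{n \to \infty} J(f_n) = C_{GN}^{-4}.
\]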
We now wish to pass to a limit; however, we need to make some modifications

to restore the compactness that is lost to scaling and translation symmetries.
We will first define
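\[
g_n(x) = a_n f_n(b_n x)
\]
for suitable $a_n, b_n$. In particular, if we define
\[
b_n = \frac{\|f_n\|_{L^2}}{\|\nabla f_n\|_{L^2}} \quad\text{and}\quad a_n = \frac{\|f_n\|_{L^2}^{1/2}}{\|\nabla f_n\|_{L^2}^{3/2}},
\]
then
\[
\|g_n\|_{L^2} = \|\nabla g_n\|_{L^2} = 1.
\]
Note that $g_n$ remains an optimizing sequence, i.e. $J(g_n) \to C_{GN}^{-4}$.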
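The normalization can be checked directly from how each norm scales in three dimensions. The following computation is a sketch filling in this routine step; the generic symbols $a, b > 0$ are introduced here only for illustration.

```latex
% Scaling of the three norms under f \mapsto a f(b\,\cdot) on \mathbb{R}^3:
\[
\|a f(b\,\cdot)\|_{L^2} = a\,b^{-3/2}\|f\|_{L^2},\qquad
\|\nabla[a f(b\,\cdot)]\|_{L^2} = a\,b^{-1/2}\|\nabla f\|_{L^2},\qquad
\|a f(b\,\cdot)\|_{L^4} = a\,b^{-3/4}\|f\|_{L^4}.
\]
% With a = a_n and b = b_n as defined in the text:
\[
a_n b_n^{-3/2}\|f_n\|_{L^2}
= \frac{\|f_n\|_{L^2}^{1/2}}{\|\nabla f_n\|_{L^2}^{3/2}}
  \cdot \frac{\|\nabla f_n\|_{L^2}^{3/2}}{\|f_n\|_{L^2}^{3/2}}
  \cdot \|f_n\|_{L^2} = 1,
\qquad
a_n b_n^{-1/2}\|\nabla f_n\|_{L^2}
= \frac{\|f_n\|_{L^2}^{1/2}}{\|\nabla f_n\|_{L^2}^{3/2}}
  \cdot \frac{\|\nabla f_n\|_{L^2}^{1/2}}{\|f_n\|_{L^2}^{1/2}}
  \cdot \|\nabla f_n\|_{L^2} = 1.
\]
% Invariance of J under the same rescaling:
\[
J\bigl(a f(b\,\cdot)\bigr)
= \frac{(a b^{-3/2})\,(a b^{-1/2})^3}{(a b^{-3/4})^4}\,J(f)
= \frac{a^4 b^{-3}}{a^4 b^{-3}}\,J(f) = J(f).
\]
```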
Next, we take radial rearrangements and define $h_n = g_n^*$. By Lemma 9.1.6
and the fact that rearrangements preserve $L^p$-norms, we have that $h_n$ is a
bounded sequence in $H^1_{\mathrm{rad}}$ (in fact $\|h_n\|_{L^2} \equiv 1$ and $\|\nabla h_n\|_{L^2} \le 1$) satisfying
\[
J(h_n) \le J(g_n).
\]
In particular, $h_n$ is also an optimizing sequence.
By boundedness in $H^1_{\mathrm{rad}}$, we have that $h_n$ converges along a
subsequence to a limit $h$, weakly in $H^1$ and strongly in $L^4$ (cf. compact embedding
and Lemma A.2.3). Using Lemma A.2.3, we deduce
\[
J(h) \le \lim_{n \to \infty} J(h_n) = \inf J(f),
\]
thus giving the existence of an optimizer, as desired.

9.2 Concentration compactness and sharp Sobolev

embedding
In this section we will introduce techniques related to ‘concentration com-
pactness’, specifically the notion of a profile decomposition. We will apply
these techniques to prove the existence of optimizers for a Sobolev embed-
ding inequality proved in Section 6.2. These techniques also play an impor-
tant role in the setting of current research in nonlinear partial differential
equations.
Recall from Section 6.2 that we have the following general inequality:
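there exists $C > 0$ such that for all $f \in \dot H^1(\mathbb{R}^3)$,
\[
\|f\|_{L^6(\mathbb{R}^3)} \le C \|\nabla f\|_{L^2(\mathbb{R}^3)}. \tag{9.6}
\]
We define the optimal constant $C_{Sob}$ by
\[
C_{Sob} = \sup\bigl\{ \|f\|_{L^6(\mathbb{R}^3)} \div \|\nabla f\|_{L^2(\mathbb{R}^3)} : f \in \dot H^1(\mathbb{R}^3)\setminus\{0\} \bigr\}. \tag{9.7}
\]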
Our goal in this section is the following theorem.
Theorem 9.2.1 (Existence of optimizers for Sobolev embedding). There
exists $f \in \dot H^1(\mathbb{R}^3)\setminus\{0\}$ so that
\[
\|f\|_{L^6(\mathbb{R}^3)} = C_{Sob} \|\nabla f\|_{L^2(\mathbb{R}^3)},
\]
where $C_{Sob}$ is as in (9.7).
Remark 9.2.2. Similar to the case of Gagliardo–Nirenberg, we can use
optimizers of Sobolev embedding to find solutions to the nonlinear elliptic
PDE $-\Delta W = W^5$.
The basic idea is to take an optimizing sequence and to extract a suitable
limit, which can then be shown to be an optimizer. The key to proving con-
vergence is to use a ‘profile decomposition’, which decomposes an arbitrary
bounded sequence in Ḣ 1 into a sum of bubbles (with well-defined spatial
positions and scales), plus a remainder that vanishes in the L6 norm. We
then show that an optimizing sequence necessarily contains only one bubble.
The starting point is a refinement of Sobolev embedding that allows us
to identify a scale for concentration.
Lemma 9.2.3 (Refined Sobolev embedding). For $f \in \dot H^1(\mathbb{R}^3)$,
\[
\|f\|_{L^6} \lesssim \sup_N \|f_N\|_{L^6}^{2/3}\, \|\nabla f\|_{L^2}^{1/3},
\]
where $f_N = P_N f$ denotes Littlewood–Paley projection.

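Proof. We use the Littlewood–Paley square function estimate, Theorem 7.2.6.
Then
\[
\begin{aligned}
\|f\|_{L^6}^6 &\sim \int \Bigl( \sum_N |f_N|^2 \Bigr)^3\,dx \\
&\sim \sum_{N_1, N_2, N_3} \int |f_{N_1}|^2 |f_{N_2}|^2 |f_{N_3}|^2\,dx \\
&\lesssim \sum_{N_1 \le N_2 \le N_3} \int |f_{N_1}|^2 |f_{N_2}|^2 |f_{N_3}|^2\,dx.
\end{aligned}
\]
Continuing from above and applying Hölder's inequality and Bernstein's
inequality (Proposition 7.2.3), we find
\[
\begin{aligned}
\|f\|_{L^6}^6 &\lesssim \sum_{N_1 \le N_2 \le N_3} \|f_{N_1}\|_{L^\infty} \|f_{N_1}\|_{L^6} \|f_{N_2}\|_{L^6}^2 \|f_{N_3}\|_{L^6} \|f_{N_3}\|_{L^3} \\
&\lesssim \bigl[ \sup_N \|f_N\|_{L^6} \bigr]^4 \sum_{N_1 \le N_2 \le N_3} \bigl( \tfrac{N_1}{N_3} \bigr)^{1/2} \|\nabla f_{N_1}\|_{L^2} \|\nabla f_{N_3}\|_{L^2} \\
&\lesssim \bigl[ \sup_N \|f_N\|_{L^6} \bigr]^4 \sum_{N_1 \le N_3} \bigl( \tfrac{N_1}{N_3} \bigr)^{1/2} \log\bigl( \tfrac{N_3}{N_1} \bigr) \|\nabla f_{N_1}\|_{L^2} \|\nabla f_{N_3}\|_{L^2}.
\end{aligned}
\]
Applying Schur's test, we deduce
\[
\|f\|_{L^6}^6 \lesssim \bigl[ \sup_N \|f_N\|_{L^6}^4 \bigr] \sum_N \|\nabla f_N\|_{L^2}^2,
\]
which implies the desired result.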
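The Schur's test step in the proof above can be unpacked as follows. This is a sketch under the assumption that dyadic frequencies are parametrized as $N_1 = 2^j$ and $N_3 = 2^{j+k}$ with $k \ge 0$; the shorthand $c_j := \|\nabla f_{2^j}\|_{L^2}$ and the count $1 + k$ of intermediate dyadic $N_2$ (in place of the logarithm) are notation introduced here.

```latex
\[
\sum_{N_1 \le N_3} \Bigl(\frac{N_1}{N_3}\Bigr)^{1/2} \Bigl(1 + \log_2\frac{N_3}{N_1}\Bigr)\,
\|\nabla f_{N_1}\|_{L^2}\|\nabla f_{N_3}\|_{L^2}
= \sum_{k \ge 0} 2^{-k/2}(1 + k) \sum_{j \in \mathbb{Z}} c_j\, c_{j+k}.
\]
% Cauchy--Schwarz in j for each fixed k, then summation in k:
\[
\sum_{j} c_j\, c_{j+k} \le \sum_j c_j^2
\quad\Longrightarrow\quad
\sum_{k \ge 0} 2^{-k/2}(1 + k) \sum_{j} c_j\, c_{j+k}
\le \Bigl(\sum_{k \ge 0} 2^{-k/2}(1 + k)\Bigr) \sum_j c_j^2
\lesssim \sum_N \|\nabla f_N\|_{L^2}^2.
\]
```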
Next, we combine this result with Hölder’s inequality to demonstrate

how to extract a bubble of concentration from a bounded sequence in Ḣ 1 .
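Proposition 9.2.4 (Inverse Sobolev). Suppose $\{f_n\} \subset \dot H^1(\mathbb{R}^3)$ is a sequence
satisfying
\[
\limsup_{n \to \infty} \|\nabla f_n\|_{L^2} \le A \quad\text{and}\quad \liminf_{n \to \infty} \|f_n\|_{L^6} \ge \varepsilon \tag{9.8}
\]
for some $A > 0$ and $\varepsilon > 0$. Passing to a subsequence, there exist $\phi \in \dot H^1$,
$\lambda_n \in 2^{\mathbb{Z}}$, and $x_n \in \mathbb{R}^3$ such that
\[
\lambda_n^{1/2} f_n(\lambda_n x + x_n) \rightharpoonup \phi(x)
\]
weakly in $\dot H^1$, with
\[
\|\nabla \phi\|_{L^2} \gtrsim \varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{5/4}. \tag{9.9}
\]
We have the decouplings
\[
\lim_{n \to \infty} \bigl[ \|\nabla f_n\|_{L^2}^2 - \|\nabla [f_n - \phi_n]\|_{L^2}^2 - \|\nabla \phi_n\|_{L^2}^2 \bigr] = 0
\]
and
\[
\lim_{n \to \infty} \bigl[ \|f_n\|_{L^6}^6 - \|f_n - \phi_n\|_{L^6}^6 - \|\phi_n\|_{L^6}^6 \bigr] = 0,
\]
where
\[
\phi_n(x) := \lambda_n^{-1/2} \phi\bigl( \tfrac{x - x_n}{\lambda_n} \bigr).
\]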
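Proof. Passing to a subsequence, we may assume the bounds in (9.8) hold
for each $n$. Then using Lemma 9.2.3, we may find a sequence $N_n$ such that
\[
\|P_{N_n} f_n\|_{L^6} \gtrsim \varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{1/2}
\]
for all $n$. By Hölder's inequality and Bernstein's inequality,
\[
\begin{aligned}
\varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{1/2} \lesssim \|P_{N_n} f_n\|_{L^6} &\lesssim \|P_{N_n} f_n\|_{L^2}^{1/3} \|P_{N_n} f_n\|_{L^\infty}^{2/3} \\
&\lesssim N_n^{-1/3} \|\nabla f_n\|_{L^2}^{1/3} \|P_{N_n} f_n\|_{L^\infty}^{2/3} \\
&\lesssim N_n^{-1/3} A^{1/3} \|P_{N_n} f_n\|_{L^\infty}^{2/3}.
\end{aligned}
\]
Thus
\[
\|P_{N_n} f_n\|_{L^\infty} \gtrsim \varepsilon N_n^{1/2} \bigl( \tfrac{\varepsilon}{A} \bigr)^{5/4},
\]
and hence there exists $x_n \in \mathbb{R}^3$ such that
\[
N_n^{-1/2} |P_{N_n} f_n(x_n)| \gtrsim \varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{5/4}.
\]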
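The exponent bookkeeping in the chain of inequalities above is easy to get wrong, so here is a small Python sketch that tracks the powers of $\varepsilon$, $A$, and $N_n$ using exact rational arithmetic. The helper names and the representation of a monomial $\varepsilon^a A^b N^c$ as a triple `(a, b, c)` are illustrative choices, not part of the notes.

```python
from fractions import Fraction as F

def mul(u, v):
    """Multiply two monomials eps^a A^b N^c given as exponent triples."""
    return tuple(x + y for x, y in zip(u, v))

def power(u, s):
    """Raise a monomial to the power s."""
    return tuple(x * s for x in u)

EPS = (F(1), F(0), F(0))
A = (F(0), F(1), F(0))
N = (F(0), F(0), F(1))

# Refined Sobolev: eps <~ (sup_N ||f_N||_{L^6})^{2/3} A^{1/3},
# hence ||P_N f||_{L^6} >~ (eps * A^{-1/3})^{3/2}.
lower_L6 = power(mul(EPS, power(A, F(-1, 3))), F(3, 2))

# Hoelder + Bernstein: ||P_N f||_{L^6} <~ N^{-1/3} A^{1/3} ||P_N f||_{L^infty}^{2/3},
# hence ||P_N f||_{L^infty} >~ (lower_L6 * N^{1/3} * A^{-1/3})^{3/2}.
lower_Linf = power(mul(mul(lower_L6, power(N, F(1, 3))), power(A, F(-1, 3))), F(3, 2))

# Claimed bound: eps * N^{1/2} * (eps / A)^{5/4}.
claimed = mul(mul(EPS, power(N, F(1, 2))), power(mul(EPS, power(A, F(-1))), F(5, 4)))

print("L^6 lower bound exponents     :", lower_L6)
print("L^infty lower bound exponents :", lower_Linf)
print("claimed exponents             :", claimed)
```

Both computations produce the monomial $\varepsilon^{9/4} A^{-5/4} N_n^{1/2}$, which is the right-hand side of the displayed $L^\infty$ bound.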
We now set $\lambda_n = N_n^{-1}$ and define
\[
h_n(x) = \lambda_n^{1/2} f_n(\lambda_n x + x_n).
\]
As $h_n$ forms a bounded sequence in $\dot H^1(\mathbb{R}^3)$, Alaoglu's theorem implies that
$h_n$ converges weakly along a subsequence to some $\phi$ (cf. Lemma A.2.3). We
now claim that $\phi$ is nonzero; in particular, we have exhibited concentration
for $f_n$ at the physical scale $\lambda_n$ and spatial position $x_n$.
To prove the claim, we let $K$ denote the convolution kernel of the
Littlewood–Paley projection $P_1$. Then
\[
\begin{aligned}
\int K(x) h_n(x)\,dx &= \int K(x) \lambda_n^{1/2} f_n(\lambda_n x + x_n)\,dx \\
&= N_n^{-1/2} \int N_n^3 K(N_n(y - x_n)) f_n(y)\,dy = N_n^{-1/2} P_{N_n} f_n(x_n).
\end{aligned}
\]
In particular, by construction, we have
\[
|\langle K, \phi \rangle| = \lim_{n \to \infty} |\langle K, h_n \rangle| \gtrsim \varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{5/4},
\]
which by Hölder's inequality and Sobolev embedding implies
\[
\varepsilon \bigl( \tfrac{\varepsilon}{A} \bigr)^{5/4} \lesssim \|\phi\|_{L^6} \lesssim \|\nabla \phi\|_{L^2}.
\]
The $\dot H^1$ decoupling follows from weak convergence and the fact that
\[
h_n \rightharpoonup \phi \implies \|\nabla h_n\|_{L^2}^2 - \|\nabla [\phi - h_n]\|_{L^2}^2 \to \|\nabla \phi\|_{L^2}^2
\]
(check!).
For the $L^6$ decoupling, we will need the following refined version of
Fatou's lemma due to Brezis and Lieb [1]: if $a_n$ is a sequence of $L^p$ functions
with $\limsup_{n \to \infty} \|a_n\|_{L^p} < \infty$ and $a_n \to a$ almost everywhere, then
\[
\|a_n\|_{L^p}^p - \|a_n - a\|_{L^p}^p \to \|a\|_{L^p}^p. \tag{9.10}
\]
Assuming (9.10) for the moment (we prove it below), we now claim that
\[
\lambda_n^{1/2} f_n(\lambda_n x + x_n) \to \phi
\]
almost everywhere along a subsequence. To see this, we first observe that
$H^1(K) \hookrightarrow L^2(K)$ is a compact embedding for any compact $K \subset \mathbb{R}^3$ (cf.
Theorem 9.1.7 and the proof of Lemma 9.1.8). Thus, the weak convergence
implies strong $L^2$ convergence along a subsequence for any compact $K \subset \mathbb{R}^3$.
Using a diagonal argument, we can then deduce almost everywhere
convergence along a subsequence. (See Exercise 9.3.7.)
Thus, by (9.10), we have
\[
\|\lambda_n^{1/2} f_n(\lambda_n x + x_n)\|_{L^6}^6 - \|\lambda_n^{1/2} f_n(\lambda_n x + x_n) - \phi\|_{L^6}^6 \to \|\phi\|_{L^6}^6,
\]
which after a change of variables yields the desired $L^6$ decoupling.

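It remains to prove the refined Fatou lemma due to Brezis and Lieb [1].
Proof of (9.10). Let $\varepsilon > 0$ be arbitrary and define
\[
W_{\varepsilon,n} = \Bigl[ \bigl| |a_n|^p - |a|^p - |a_n - a|^p \bigr| - \varepsilon |a_n - a|^p \Bigr]_+,
\]
where $+$ denotes the positive part. By assumption, $W_{\varepsilon,n} \to 0$ almost
everywhere as $n \to \infty$.
On the other hand,
\[
\begin{aligned}
\bigl| |a_n|^p - |a|^p - |a_n - a|^p \bigr| &\le \bigl| |a_n|^p - |a_n - a|^p \bigr| + |a|^p \\
&\le \varepsilon |a_n - a|^p + C_\varepsilon |a|^p + |a|^p,
\end{aligned}
\]
where we have used the fact that for any $\varepsilon > 0$, there exists $C_\varepsilon > 0$ such
that
\[
\bigl| |x + y|^p - |x|^p \bigr| \le \varepsilon |x|^p + C_\varepsilon |y|^p.
\]
To see that this holds, consider the cases $|y| \ll |x|$ and $|y| \gtrsim |x|$ separately;
the first case leads to the first term on the right-hand side, while the second
case leads to the second term.
Continuing from above, we deduce
\[
0 \le W_{\varepsilon,n} \le (1 + C_\varepsilon) |a|^p \in L^1.
\]
Thus the dominated convergence theorem implies that $\int W_{\varepsilon,n} \to 0$ as
$n \to \infty$. Now we observe that
\[
\bigl| |a_n|^p - |a|^p - |a_n - a|^p \bigr| \le W_{\varepsilon,n} + \varepsilon |a_n - a|^p,
\]
and hence we deduce
\[
\limsup_{n \to \infty} \int \bigl| |a_n|^p - |a|^p - |a_n - a|^p \bigr|\,dx \lesssim \varepsilon.
\]
As $\varepsilon > 0$ was arbitrary, this completes the proof.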
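To see (9.10) in action, here is a small numerical sketch in Python. It is an illustration under stated assumptions rather than part of the notes: we take $a(x) = e^{-x^2}$ on the real line and $a_n(x) = a(x) + a(x - n)$, so that $a_n \to a$ pointwise but not in $L^p$, and we approximate the integrals by a Riemann sum on a truncated grid (the grid parameters and the function name are hypothetical).

```python
import math

def brezis_lieb_defect(shift, p=3.0, lo=-8.0, hi=30.0, dx=0.01):
    """Riemann-sum approximation of
         | ||a_n||_p^p - ||a_n - a||_p^p - ||a||_p^p |
    for a(x) = exp(-x^2) and a_n(x) = a(x) + a(x - shift)."""
    steps = int(round((hi - lo) / dx))
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        a = math.exp(-x * x)
        bump = math.exp(-(x - shift) ** 2)   # this is a_n - a; it tends to 0 pointwise
        total += ((a + bump) ** p - bump ** p - a ** p) * dx
    return abs(total)

for shift in (2.0, 5.0, 10.0, 20.0):
    print(f"shift = {shift:5.1f}   defect = {brezis_lieb_defect(shift):.3e}")
```

As the shift grows, the translated bump separates from $a$ and the defect tends to zero, even though $\|a_n - a\|_{L^p}$ stays constant. This is exactly the situation (9.10) is designed to handle.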
We can now turn to the main technical tool needed for the proof of
Theorem 9.2.1, namely, a profile decomposition adapted to the Sobolev em-
bedding inequality.
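Proposition 9.2.5 (Profile decomposition). Let $\{f_n\}$ be a bounded sequence
in $\dot H^1(\mathbb{R}^3)$. There exist $J^* \in \{0, 1, 2, \dots\} \cup \{\infty\}$, profiles $\{\phi^j\}_{j=1}^{J^*}$, positions
$\{x_n^j\}_{j=1}^{J^*}$, and scales $\{\lambda_n^j\}_{j=1}^{J^*}$ such that along a subsequence in $n$ we have
\[
f_n(x) = \sum_{j=1}^{J} [\lambda_n^j]^{-1/2} \phi^j\bigl( \tfrac{x - x_n^j}{\lambda_n^j} \bigr) + r_n^J \quad\text{for } 0 \le J \le J^*.
\]
Furthermore, the following properties hold:
• We have $\dot H^1$ decoupling:
\[
\sup_J \limsup_{n \to \infty} \Bigl| \|\nabla f_n\|_{L^2}^2 - \sum_{j=1}^{J} \|\nabla \phi^j\|_{L^2}^2 - \|\nabla r_n^J\|_{L^2}^2 \Bigr| = 0.
\]
• The remainder vanishes in $L^6$:
\[
\limsup_{J \to J^*} \limsup_{n \to \infty} \|r_n^J\|_{L^6}^6 = 0,
\]
and we have the $L^6$ decoupling
\[
\limsup_{J \to J^*} \limsup_{n \to \infty} \Bigl| \|f_n\|_{L^6}^6 - \sum_{j=1}^{J} \|\phi^j\|_{L^6}^6 \Bigr| = 0.
\]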
Proof. We define $r_n^0 \equiv f_n$. If $r_n^0 \to 0$ in $L^6$, then we stop. Otherwise, we
apply inverse Sobolev to identify the first profile $\phi^1$ (and the parameters
$(x_n^1, \lambda_n^1)$) and define
\[
r_n^1 = r_n^0 - [\lambda_n^1]^{-1/2} \phi^1\bigl( \tfrac{x - x_n^1}{\lambda_n^1} \bigr).
\]
If $r_n^1 \to 0$ in $L^6$, then we stop. Otherwise, we again apply inverse Sobolev to
identify the next profile and parameters, defining $r_n^2$ analogously to above.
Proceeding in this way, we construct the profiles, parameters, and remainder
terms. We may need to apply this (countably) many times, passing to a
subsequence in $n$ each time. This determines whether $J^*$ is finite or infinite.
We now need to verify the properties claimed in the proposition. The
Ḣ 1 decoupling follows by induction and by construction.
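Let us verify the vanishing of the remainder in $L^6$. We define
\[
\varepsilon_J = \lim_{n \to \infty} \|r_n^J\|_{L^6} \quad\text{and}\quad A_J = \lim_{n \to \infty} \|\nabla r_n^J\|_{L^2}.
\]
By construction $A_J \le A_0$. Thus, by (9.9) and construction, we have
\[
\|\nabla \phi^j\|_{L^2}^2 \gtrsim \varepsilon_{j-1}^2 \bigl( \tfrac{\varepsilon_{j-1}}{A_{j-1}} \bigr)^{5/2} \gtrsim \varepsilon_{j-1}^2 \bigl( \tfrac{\varepsilon_{j-1}}{A_0} \bigr)^{5/2}.
\]
Hence, by $\dot H^1$ decoupling,
\[
\sum_{j=1}^{J} \varepsilon_{j-1}^2 \bigl( \tfrac{\varepsilon_{j-1}}{A_0} \bigr)^{5/2} \lesssim \sum_{j=1}^{J} \|\nabla \phi^j\|_{L^2}^2 \lesssim A_0^2,
\]
which implies $\varepsilon_j \to 0$ as $j \to \infty$. Finally, the $L^6$ decoupling follows from
induction and the vanishing of the remainder in $L^6$.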
With the profile decomposition in place, we turn to the proof of Theo-

rem 9.2.1.
Proof of Theorem 9.2.1. We let $\{f_n\}$ be a sequence in $\dot H^1\setminus\{0\}$ such that
\[
\lim_{n \to \infty} J(f_n) = C_{Sob}^6,
\]
where
\[
J(f) := \|f\|_{L^6}^6 \div \|\nabla f\|_{L^2}^6.
\]
Let us first normalize the sequence by replacing
\[
f_n \quad\text{with}\quad \frac{f_n}{\|\nabla f_n\|_{L^2}},
\]
so that $\|\nabla f_n\|_{L^2} \equiv 1$. We now apply the profile decomposition to $f_n$ to write
\[
f_n = \sum_{j=1}^{J} [\lambda_n^j]^{-1/2} \phi^j\bigl( \tfrac{x - x_n^j}{\lambda_n^j} \bigr) + r_n^J
\]
along a suitable subsequence. Now observe by the $L^6$ decoupling and Sobolev
embedding,
\[
C_{Sob}^6 = \lim_{n \to \infty} J(f_n) = \sum_{j=1}^{J^*} \|\phi^j\|_{L^6}^6 \le C_{Sob}^6 \sum_{j=1}^{J^*} \|\nabla \phi^j\|_{L^2}^6.
\]
On the other hand, by the $\dot H^1$ decoupling,
\[
\sum_{j=1}^{J^*} \|\nabla \phi^j\|_{L^2}^2 \le 1.
\]
Using the nesting $\ell^2 \subset \ell^6$, this implies that all of the inequalities above are
equalities. In particular,
\[
\sum_{j=1}^{J^*} \|\nabla \phi^j\|_{L^2}^6 = \sum_{j=1}^{J^*} \|\nabla \phi^j\|_{L^2}^2 = 1.
\]
Since the $\phi^j$ are all non-trivial, this implies that there must be only one
profile, say $\phi$, which satisfies $\|\nabla \phi\|_{L^2} = 1$. We therefore have
\[
f_n = \lambda_n^{-1/2} \phi\bigl( \tfrac{x - x_n}{\lambda_n} \bigr) + r_n,
\]
with $\lambda_n^{1/2} f_n(\lambda_n x + x_n) \rightharpoonup \phi$ and $\|\nabla f_n\|_{L^2} = \|\nabla \phi\|_{L^2} = 1$. In particular,
$\lambda_n^{1/2} f_n(\lambda_n x + x_n)$ converges strongly to $\phi$ in $\dot H^1$ and hence in $L^6$. In particular,
$J(\phi) = C_{Sob}^6$, i.e. $\phi$ optimizes Sobolev embedding.
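The step in the proof above that passes from the two sums to a single profile can be spelled out as follows. This is a sketch; the shorthand $x_j := \|\nabla \phi^j\|_{L^2}^2 \in (0, 1]$ is notation introduced here.

```latex
% From the two displays in the proof: \sum_j x_j^3 \ge 1 and \sum_j x_j \le 1.
\[
1 \le \sum_{j} x_j^3 \le \bigl(\sup_j x_j\bigr)^2 \sum_j x_j \le \bigl(\sup_j x_j\bigr)^2 \le 1.
\]
% Hence every inequality is an equality, so \sup_j x_j = 1 and \sum_j x_j = 1.
% Since each x_j > 0, this forces exactly one profile, with x_1 = \|\nabla\phi^1\|_{L^2}^2 = 1.
```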
9.3 Exercises
Exercise 9.3.1. Let $f \in L^2(\mathbb{R}^d)$. Show that $g_n := f(x + x_n)$ converges weakly
to zero if $|x_n| \to \infty$. Show that $h_n = \lambda_n^{d/2} f(\lambda_n x)$ converges weakly to zero
if $\lambda_n \to 0$ or $\lambda_n \to \infty$.
Exercise 9.3.2. Prove that the embedding $\dot H^1_{\mathrm{rad}}(\mathbb{R}^3) \hookrightarrow L^6(\mathbb{R}^3)$ is not
compact. Here the subscript ‘rad’ denotes the restriction to radial functions, i.e.
$f$ satisfying $f(x) = f(|x|)$.
Exercise 9.3.3. Investigate the allowed range of exponents in (9.2).
Exercise 9.3.4. Let $f, g$ be nonnegative functions such that $|\{f > \lambda\}|$ and
$|\{g > \lambda\}|$ are finite for all $\lambda > 0$. Then
\[
\int_{\mathbb{R}^d} f(x) g(x)\,dx \le \int_{\mathbb{R}^d} f^*(x) g^*(x)\,dx.
\]
If $f$ is strictly radially decreasing, then equality holds if and only if $g = g^*$
a.e.
Exercise 9.3.5. Show that Steiner symmetrization preserves measurability
of sets and functions.
Exercise 9.3.6. Prove the existence of optimizers for Gagliardo–Nirenberg
by developing an appropriate profile decomposition.
Exercise 9.3.7. Let $K \subset \mathbb{R}^d$ be a compact set. Show that $H^1(K) \hookrightarrow L^2(K)$
is a compact embedding. As a result, show that if $g_n \rightharpoonup g$ weakly in $H^1(K)$,
then $g_n \to g$ strongly in $L^2(K)$ along a subsequence.
Chapter 10
Restriction theory and

related topics
Our goal in this chapter is to provide an introduction to restriction theory

and some related topics, including ‘Strichartz estimates’ in the setting of
Schrödinger equations.
10.1 Restriction theory and Strichartz estimates

In this section, we give a brief introduction to restriction theory, focusing
on an early result due to Strichartz [32].
Restriction theory concerns the basic question of when it makes sense to
restrict a function’s Fourier transform. In particular, given S ⊂ Rn (with
n ≥ 2) and a positive measure dµ supported on S, we can consider the
following two problems:
A. For which $p \in [1, 2)$ do we have the following restriction estimate:
\[
\|\hat f\|_{L^2(d\mu)} \lesssim \|f\|_{L^p(\mathbb{R}^n)}?
\]
B. For which $q \in (2, \infty]$ do we have the estimate
\[
\|(F\,d\mu)^\wedge\|_{L^q(\mathbb{R}^n)} \lesssim \|F\|_{L^2(d\mu)}?
\]
By duality, these problems are equivalent provided $q = p'$ (i.e. $\frac1p + \frac1q = 1$).
Note that if $f \in L^1$, then the Riemann–Lebesgue lemma tells us that $\hat f$
is continuous with $\hat f \to 0$ at infinity. In particular, it makes sense to restrict
$\hat f$ to sets of measure zero. On the other hand, if $f$ is merely in $L^2$ then
$\hat f \in L^2$ and it does not necessarily make sense to restrict $\hat f$ in this way.
Example 10.1.1. There exists a function belonging to $L^p$ for all $p > 1$ whose
Fourier transform is infinite on an entire hyperplane. Let $\psi : \mathbb{R}^{d-1} \to \mathbb{C}$ be
a bump function and set
\[
f(x) = (1 + |x_1|)^{-1} \psi(x_2, \dots, x_d).
\]
Then $f \in L^p$ for all $p > 1$. However, if we now set $S = \{\xi \in \mathbb{R}^d : \xi_1 = 0\}$,
then we observe that
\[
\hat f(\xi) = (2\pi)^{-\frac d2}\, \hat\psi(\xi_2, \dots, \xi_d) \int \frac{e^{-i x_1 \xi_1}}{1 + |x_1|}\,dx_1 \equiv \infty
\]
for $\xi \in S$.
The issue in the previous example is that the hyperplane $S$ has no
curvature.
Following [32], we will first focus on the case that $S$ is a quadratic surface;
we return to some additional cases in Section 10.4. For simplicity, we
will restrict our attention to the case of the paraboloid
\[
S = \{x : x_n = -[x_1^2 + \cdots + x_{n-1}^2]\},
\]
although more general quadratic surfaces may be treated by similar methods
(see [32]). In our case, the relevant measure $d\mu$ is simple to describe. In
particular, for a function $F$ on $\mathbb{R}^n$, we have
\[
\int_{\mathbb{R}^n} F(x)\,d\mu(x) = \int_{\mathbb{R}^{n-1}} F(y, -|y|^2)\,dy. \tag{10.1}
\]
We begin by observing that by Plancherel and Hölder, we may write
\[
\|\hat f\|_{L^2(d\mu)}^2 = \langle \hat f, \hat f\,d\mu \rangle = \langle f, \check{d\mu} * f \rangle \lesssim \|f\|_{L^p} \|\check{d\mu} * f\|_{L^{p'}}.
\]
Thus, the restriction estimate in A. would hold for a choice of $p \in [1, 2)$
provided we could prove the convolution estimate
\[
\|\check{d\mu} * f\|_{L^{p'}} \lesssim \|f\|_{L^p} \tag{10.2}
\]
for the same choice of $p$. We will look for $p$ such that (10.2) holds. We will
utilize the Stein interpolation theorem, Theorem 6.1.11.
In what follows, we will use the notation
\[
x = (y, x_n) \in \mathbb{R}^{n-1} \times \mathbb{R}.
\]
We introduce the following analytic family of distributions:
\[
K_z(x) = \gamma(z) \bigl[ x_n - |y|^2 \bigr]_+^z, \tag{10.3}
\]
where $+$ denotes the positive part. Here $z \in \mathbb{C}$ and $\gamma(z)$ is an analytic
function to be determined below with a simple zero at $z = -1$. To connect
these distributions to the problem above, let us prove the following:
Lemma 10.1.1. Define $K_z$ as above and suppose $\gamma$ has a simple zero at
$z = -1$. Then there exists a constant $c$ such that
\[
K_z \to c\,d\mu \quad\text{as } z \to -1
\]
in the sense of distributions.

Proof. Let $F$ be a test function on $\mathbb{R}^n$. Then by a change of variables we
have
\[
\begin{aligned}
\int K_z(x) F(x)\,dx &= \gamma(z) \iint F(y, x_n + |y|^2)\, [x_n]_+^z\,dx_n\,dy \\
&= \gamma(z) \int \bigl( F(y, \cdot) * [\,\cdot\,]_+^z \bigr)(-|y|^2)\,dy.
\end{aligned}
\]
Recalling (10.1), the problem therefore reduces to establishing
\[
\gamma(z) [\,\cdot\,]_+^z \to c\,\delta_0 \quad\text{as } z \to -1, \tag{10.4}
\]
where $\delta_0$ is the Dirac delta distribution. Indeed, with (10.4) in place we get
\[
\int K_z(x) F(x)\,dx \to c \int F(y, -|y|^2)\,dy = c \int F(x)\,d\mu(x),
\]
as desired.
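As $\gamma$ is assumed to have a simple zero at $z = -1$, to prove (10.4) it is
enough to establish
\[
(z + 1) [\,\cdot\,]_+^z \to \delta_0 \quad\text{as } z \to -1,
\]
which we now prove. We let $H$ denote the Heaviside distribution
\[
H(\tau) =
\begin{cases}
1 & \tau > 0 \\
0 & \tau < 0,
\end{cases}
\]
which satisfies $\partial_\tau H = \delta_0$. We now observe that
\[
(z + 1) \tau_+^z = \partial_\tau \tau_+^{z+1}.
\]
Thus, for any test function $\varphi$, we get
\[
\langle (z + 1) \tau_+^z, \varphi \rangle = -\langle \tau_+^{z+1}, \varphi' \rangle = -\langle \tau_+^{z+1} H, \varphi' \rangle \to -\langle H, \varphi' \rangle = \langle \delta_0, \varphi \rangle
\]
as $z \to -1$. This completes the proof.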
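The convergence $(z+1)\tau_+^z \to \delta_0$ can be sanity-checked numerically against one explicit function. The following Python sketch uses $\varphi(\tau) = e^{-\tau}$, which is not compactly supported and is used here purely as an illustrative assumption, because for real $z > -1$ the pairing has the closed form $(z+1)\int_0^\infty \tau^z e^{-\tau}\,d\tau = (z+1)\Gamma(z+1) = \Gamma(z+2)$. The function name is hypothetical.

```python
import math

def pairing(z):
    """<(z+1) tau_+^z, exp(-tau)> = (z+1) * Gamma(z+1), valid for real z > -1."""
    return (z + 1.0) * math.gamma(z + 1.0)

# As z -> -1 from above, the pairing should approach phi(0) = exp(0) = 1.
for s in (1e-1, 1e-2, 1e-3, 1e-4):
    z = -1.0 + s
    print(f"z = {z:+.4f}   pairing = {pairing(z):.6f}")
```

The values approach 1, in agreement with $\langle \delta_0, \varphi \rangle = \varphi(0)$; the simple pole of $\Gamma$ at the origin is exactly cancelled by the factor $z + 1$.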
In light of the lemma above and (10.2), we have now reduced the
restriction estimate A. to establishing the $L^p \to L^{p'}$ boundedness for the operator
\[
f \mapsto T_{-1} f := \check K_{-1} * f,
\]
with $K_z$ as in (10.3). Writing $T_z f = \check K_z * f$, applying Plancherel and Hölder
yields the estimates
\[
\begin{aligned}
\|T_z f\|_{L^2} &\le \|K_z\|_{L^\infty} \|f\|_{L^2}, &&\text{i.e. } \|T_z\|_{L^2 \to L^2} \le \|K_z\|_{L^\infty}, \\
\|T_z f\|_{L^\infty} &\le \|\check K_z\|_{L^\infty} \|f\|_{L^1}, &&\text{i.e. } \|T_z\|_{L^1 \to L^\infty} \le \|\check K_z\|_{L^\infty}.
\end{aligned}
\]
Then, to apply Stein's interpolation result (Theorem 6.1.11) we will look for
$x_0 > 1$ and an analytic function $\gamma$ with a simple zero at $z = -1$ satisfying the
following on the strip $-x_0 \le \operatorname{Re} z \le 0$:
(i) $|\gamma(x + iy)|$ has at most exponential growth as $|y| \to \infty$,
(ii) $K_z(x)$ is bounded when $\operatorname{Re} z = 0$.
(iii) $\check K_z(x)$ is bounded when $\operatorname{Re} z = -x_0$.
Then, by Theorem 6.1.11, we can interpolate between the estimates at
$\operatorname{Re} z = 0$ and $\operatorname{Re} z = -x_0$ to deduce
\[
\|T_{-1}\|_{L^p \to L^{p'}} \lesssim 1, \quad\text{where}\quad p = \frac{2 x_0}{x_0 + 1}.
\]
Lemma 10.1.2. Let $\Gamma$ denote the Gamma function
\[
\Gamma(z) = \int_0^\infty e^{-t} t^{z-1}\,dt.
\]
Define
\[
\gamma(z) = \Gamma(z + 1)^{-1}
\]
and let $K_z$ be as in (10.3). Then $\gamma$ has a simple zero at $z = -1$ and satisfies
items (i) and (ii) above. Furthermore, writing $x = (y, x_n)$ as above, we have
\[
\check K_z(x) = c \exp\bigl\{ -\tfrac{i |y|^2}{4 x_n} \bigr\}\, i^z\, x_n^{-[z + \frac{n+1}{2}]} \tag{10.5}
\]
for some constant $c$. In particular, item (iii) holds provided
\[
\operatorname{Re} z = -x_0 := -\tfrac{n+1}{2}.
\]
This is the key lemma. With Lemma 10.1.2 in place, we conclude:
Theorem 10.1.3 (Restriction estimate for the paraboloid). The restriction
estimates A. and B. on the paraboloid $S \subset \mathbb{R}^n$ are valid for
\[
p = \frac{2(n+1)}{n+3} \quad\text{and}\quad q = p' = \frac{2(n+1)}{n-1},
\]
respectively.
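The exponents in Theorem 10.1.3 come from interpolating between the $L^2 \to L^2$ bound at $\operatorname{Re} z = 0$ and the $L^1 \to L^\infty$ bound at $\operatorname{Re} z = -x_0$ with $x_0 = \frac{n+1}{2}$. The short Python sketch below redoes this arithmetic with exact fractions; the function name is a hypothetical helper, and the interpolation parameter $\theta$ is determined by $-1 = (1-\theta)\cdot 0 + \theta \cdot (-x_0)$.

```python
from fractions import Fraction

def restriction_exponents(n):
    """Return (p, q) obtained by interpolating at z = -1 on the strip -x0 <= Re z <= 0."""
    x0 = Fraction(n + 1, 2)
    theta = 1 / x0                                   # position of z = -1 in the strip
    inv_p = (1 - theta) * Fraction(1, 2) + theta     # 1/p between 1/2 (L^2) and 1 (L^1)
    p = 1 / inv_p
    q = 1 / (1 - inv_p)                              # Hoelder dual exponent p'
    return p, q

for n in (2, 3, 4):
    p, q = restriction_exponents(n)
    print(n, p, q, Fraction(2 * (n + 1), n + 3), Fraction(2 * (n + 1), n - 1))
```

For every $n \ge 2$ the interpolated pair agrees with the closed forms $p = \frac{2(n+1)}{n+3}$ and $q = \frac{2(n+1)}{n-1}$ stated in the theorem, and with $p = \frac{2x_0}{x_0+1}$.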
We turn to the proof of Lemma 10.1.2.
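Proof of Lemma 10.1.2. For basic properties of the Gamma function, we
refer the reader to [31] (say). In particular, since $\Gamma$ is nowhere zero, we get
that $\gamma(z)$ is analytic. As $\Gamma$ has a simple pole at $z = 0$, $\gamma$ has a simple zero
at $z = -1$. One can also check (i) and (ii); for example, for (ii) we observe
that for $z = i\sigma$,
\[
|K_{i\sigma}(x)| = |\gamma(i\sigma)| \cdot \bigl| [x_n - |y|^2]_+^{i\sigma} \bigr| \lesssim |\gamma(i\sigma)|.
\]
One can then check (using Stirling's formula, or the identity
$|\Gamma(1 + i\sigma)|^2 = \pi\sigma / \sinh(\pi\sigma)$) that $|\gamma(i\sigma)| = |\Gamma(1 + i\sigma)|^{-1}$ grows at most
exponentially as $|\sigma| \to \infty$. Thus $K_{i\sigma}$ is bounded in $x$ for each $\sigma$, with a bound
of the admissible growth described in (i).
Thus, the proof boils down to establishing (10.5). This computation will
also explain the origin of the mysterious choice of $\gamma$.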
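We begin with a change of variables and contour integration. We write
$x = (y, x_n)$ and the dual variable as $\xi = (\eta, \xi_n)$. Then
\[
\begin{aligned}
\int e^{i[y\eta + x_n \xi_n]} \bigl[ x_n - |y|^2 \bigr]_+^z\,dx &= \int e^{i[y\eta + \xi_n |y|^2]} \int_0^\infty e^{i x_n \xi_n} x_n^z\,dx_n\,dy \\
&= i^{z+1} \xi_n^{-(z+1)} \Gamma(z + 1) \int e^{i[y\eta + \xi_n |y|^2]}\,dy.
\end{aligned}
\]
Next, a Gaussian integral computation (i.e. completing the square) implies
\[
\int_{\mathbb{R}^{n-1}} e^{i[y\eta + \xi_n |y|^2]}\,dy = c\, \xi_n^{-\frac{n-1}{2}} \exp\bigl\{ -i \tfrac{|\eta|^2}{4 \xi_n} \bigr\}. \tag{10.6}
\]
In particular, continuing from above leads to
\[
\int e^{i[y\eta + x_n \xi_n]} \bigl[ x_n - |y|^2 \bigr]_+^z\,dx = c\, \Gamma(z + 1)\, i^z\, \xi_n^{-[z + \frac{n+1}{2}]} \exp\bigl\{ -\tfrac{i |\eta|^2}{4 \xi_n} \bigr\}.
\]
Multiplying both sides by $\gamma(z) = [\Gamma(z + 1)]^{-1}$ yields (10.5), as desired.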

Restriction theory remains an active area of research within harmonic

analysis. We have truly just scratched the surface by demonstrating a single
result. Instead of delving deeper into restriction theory at this point, we
will now return to one of the themes appearing frequently in these notes,
namely, applications of harmonic analysis to partial differential equations.
We will first show that the restriction estimate for the paraboloid is
equivalent to a space-time decay estimate for solutions to the Schrödinger
equation. In fact, today these estimates go by the name of Strichartz
estimates, in honor of Strichartz and his seminal work [32].
Recall that in Example 7.4.1, we used the Fourier transform to solve
the initial-value problem for the linear Schrödinger equation on $\mathbb{R}^d_y \times \mathbb{R}_t$. In
particular, the solution to
\[
\begin{cases}
(i\partial_t + \Delta_y) u = 0 \\
u|_{t=0} = \phi \in L^2(\mathbb{R}^d),
\end{cases} \tag{10.7}
\]
is given by
\[
u(y, t) = (2\pi)^{-\frac d2} \int_{\mathbb{R}^d} e^{i(y\eta - t|\eta|^2)} \tilde\phi(\eta)\,d\eta, \tag{10.8}
\]
where we temporarily use $\tilde\phi$ to denote the Fourier transform on $\mathbb{R}^d$.

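Proposition 10.1.4 (Strichartz estimate). Let $u : \mathbb{R}^{d+1} \to \mathbb{C}$ be the solution
to (10.7). Then
\[
\|u\|_{L^{\frac{2(d+2)}{d}}(\mathbb{R}^{d+1})} \lesssim \|\phi\|_{L^2(\mathbb{R}^d)}.
\]
Proof. Let $n = d + 1$. As above, we set
\[
q = \frac{2(d+2)}{d} = \frac{2(n+1)}{n-1} \quad\text{and}\quad p = q'.
\]
We estimate the $L^q(\mathbb{R}^n)$ norm of $u$ by duality. Employing (10.8) and
denoting elements of $\mathbb{R}^n$ by $(y, t) \in \mathbb{R}^{d+1}$, we are led to estimate
\[
\iiint e^{i(y\eta - t|\eta|^2)} \tilde\phi(\eta) f(y, t)\,d\eta\,dy\,dt = \int \tilde\phi(\eta) \hat f(\eta, -|\eta|^2)\,d\eta
\]
for $f \in L^p(\mathbb{R}^n)$, where $\hat f$ denotes the Fourier transform on $\mathbb{R}^n$. Thus, by
Plancherel, (10.1), and Theorem 10.1.3,
\[
\begin{aligned}
|\langle u, f \rangle| &\lesssim \|\phi\|_{L^2} \|\hat f(\eta, -|\eta|^2)\|_{L^2_\eta(\mathbb{R}^d)} \\
&\lesssim \|\phi\|_{L^2(\mathbb{R}^d)} \|\hat f\|_{L^2(d\mu)} \\
&\lesssim \|\phi\|_{L^2(\mathbb{R}^d)} \|f\|_{L^p(\mathbb{R}^n)}.
\end{aligned}
\]
The result follows.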
10.2 Strichartz estimates

In the previous section we established a space-time estimate for solutions to
the linear Schrödinger equation by means of a restriction estimate. In this
section we will extend the range of estimates by different methods.
Let us shift notation slightly and denote the solution $u = u(t, x)$ to
\[
(i\partial_t + \Delta) u = 0, \qquad u(0) = \phi
\]
by
\[
u(t) = e^{it\Delta} \phi.
\]
Here $e^{it\Delta}$ is the Fourier multiplier operator with symbol $e^{-it|\xi|^2}$, i.e.
\[
e^{it\Delta} := \mathcal{F}^{-1} e^{-it|\xi|^2} \mathcal{F}.
\]
Our goal is to prove space-time estimates for $e^{it\Delta}\phi$. We begin by
establishing estimates for fixed time $t$. By Plancherel, the representation above
immediately yields the $L^2$ bound
\[
\|e^{it\Delta} \phi\|_{L^2(\mathbb{R}^d)} \equiv \|\phi\|_{L^2(\mathbb{R}^d)}.
\]
On the other hand, using Lemma 2.5.11, we can also write
\[
e^{it\Delta} \phi = \mathcal{F}^{-1}[e^{-it|\xi|^2} \mathcal{F}\phi] = (2\pi)^{-\frac d2}\, \mathcal{F}^{-1}[e^{-it|\xi|^2}] * \phi.
\]
However, we have essentially already computed $\mathcal{F}^{-1}[e^{-it|\xi|^2}]$ in (10.6). In
particular, we find
\[
e^{it\Delta} \phi(x) = (4\pi i t)^{-\frac d2} \int_{\mathbb{R}^d} e^{i|x-y|^2/4t} \phi(y)\,dy,
\]
which readily implies the ‘dispersive estimate’
\[
\|e^{it\Delta} \phi\|_{L^\infty(\mathbb{R}^d)} \lesssim |t|^{-\frac d2} \|\phi\|_{L^1(\mathbb{R}^d)}.
\]
Then, by interpolation we deduce
\[
\|e^{it\Delta} \phi\|_{L^p(\mathbb{R}^d)} \lesssim |t|^{-(\frac d2 - \frac dp)} \|\phi\|_{L^{p'}(\mathbb{R}^d)} \quad\text{for } 2 \le p \le \infty, \tag{10.9}
\]
where as usual $p'$ denotes the Hölder dual to $p$, i.e. the solution to $\frac1p + \frac1{p'} = 1$.
This will be an essential ingredient in proving the following Strichartz
estimates.
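The interpolation leading to (10.9) can be made explicit. The following is a sketch assuming the Riesz–Thorin theorem is applied between the $L^2 \to L^2$ bound (constant $1$) and the $L^1 \to L^\infty$ bound (constant $\lesssim |t|^{-d/2}$); the parameter $\theta$ is notation introduced here.

```latex
% Choose \theta \in [0,1] so that the target exponent p interpolates between 2 and \infty:
\[
\frac1p = \frac{1-\theta}{2} + \frac{\theta}{\infty}
\quad\Longleftrightarrow\quad
\theta = 1 - \frac2p,
\qquad
\frac{1}{p'} = \frac{1-\theta}{2} + \frac{\theta}{1}.
\]
% The operator norm interpolates multiplicatively:
\[
\|e^{it\Delta}\|_{L^{p'} \to L^p}
\lesssim 1^{\,1-\theta}\,\bigl(|t|^{-\frac d2}\bigr)^{\theta}
= |t|^{-\frac d2\left(1 - \frac2p\right)}
= |t|^{-\left(\frac d2 - \frac dp\right)}.
\]
```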
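Proposition 10.2.1 (Strichartz estimates). Let $d \ge 1$ and suppose
$2 < q, r \le \infty$ satisfy the scaling condition
\[
\frac2q + \frac dr = \frac d2. \tag{10.10}
\]
Then
\[
\|e^{it\Delta} \phi\|_{L^q_t L^r_x(\mathbb{R} \times \mathbb{R}^d)} \lesssim \|\phi\|_{L^2(\mathbb{R}^d)}. \tag{10.11}
\]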
Proof. We seek to prove
\[
T := e^{it\Delta} \quad\text{maps}\quad L^2 \to L^q_t L^r_x.
\]
We will employ the method of $TT^*$. In particular, we need to compute the
adjoint $T^*$ and prove that
\[
TT^* : L^{q'}_t L^{r'}_x \to L^q_t L^r_x,
\]
where $q'$ and $r'$ denote the dual exponents to $q$ and $r$.
To compute the adjoint, we rely on the fact that $e^{it\Delta}$ is unitary on $L^2$
for each $t$. Thus,
\[
\begin{aligned}
\langle T f, G \rangle &= \iint e^{it\Delta} f(x)\, \overline{G(t, x)}\,dx\,dt \\
&= \int f(x)\, \overline{\int e^{-it\Delta} G(t, x)\,dt}\,dx,
\end{aligned}
\]
which shows
\[
T^* G(x) = \int_{\mathbb{R}} e^{-is\Delta} G(s, x)\,ds.
\]
In particular,
\[
TT^* F(t, x) = \int_{\mathbb{R}} e^{i(t-s)\Delta} F(s, x)\,ds. \tag{10.12}
\]
We now use the dispersive estimate (10.9), the Hardy–Littlewood–Sobolev inequality (Theorem 6.2.2), and the scaling relation (10.10) to estimate
\begin{align*}
\|TT^*F\|_{L^q_t L^r_x} &\lesssim \Big\| \int |t-s|^{-(\frac{d}{2}-\frac{d}{r})}\,\|F(s)\|_{L^{r'}_x}\,ds \Big\|_{L^q_t} \\
&\lesssim \Big\| |t|^{-\frac{2}{q}} * \|F(t)\|_{L^{r'}_x} \Big\|_{L^q_t} \\
&\lesssim \big\| |t|^{-\frac{2}{q}} \big\|_{L^{\frac{q}{2},\infty}_t}\,\|F\|_{L^{q'}_t L^{r'}_x} \\
&\lesssim \|F\|_{L^{q'}_t L^{r'}_x},
\end{align*}
yielding the desired bounds for $TT^*$. Note that the application of Hardy–Littlewood–Sobolev requires $q > 2$.
By the method of $TT^*$ (see e.g. Exercise 3.5.6), we deduce the desired $L^2 \to L^q_t L^r_x$ boundedness of $e^{it\Delta}$.
Remark 10.2.2. Note that this result covers the special case $q = r = \frac{2(d+2)}{d}$ covered in the previous section. The endpoint case $q = 2$ (which is compatible with (10.10) only for $d \ge 2$) is also allowed provided $d \ge 3$; it is valid in $d = 2$ in the radial setting. However, proving endpoint Strichartz estimates is a much more challenging problem (see [17]).
Apart from the missing endpoint $q = 2$, one may ask about the optimality of the condition (10.10); i.e. are there other exponent pairs for which we may expect an estimate of the form (10.11)? If we insist on putting the function $\phi$ in $L^2$, the answer is no. One can check that the scaling relation is necessary by considering $\phi_\lambda(x) := \phi(\lambda x)$. Then we firstly have
\[ \|\phi_\lambda\|_{L^2(\mathbb{R}^d)} = \lambda^{-\frac{d}{2}}\,\|\phi\|_{L^2}. \]
On the other hand, the solution to the linear Schrödinger equation with data $\phi_\lambda$ is
\[ e^{it\Delta}[\phi_\lambda(\cdot)](x) = [e^{it\lambda^2\Delta}\phi](\lambda x), \]
and so
\[ \|e^{it\Delta}[\phi_\lambda(\cdot)]\|_{L^q_t L^r_x(\mathbb{R}\times\mathbb{R}^d)} = \lambda^{-\frac{2}{q}-\frac{d}{r}}\,\|e^{it\Delta}\phi\|_{L^q_t L^r_x(\mathbb{R}\times\mathbb{R}^d)}. \]
In particular, the estimate is only possible if the powers of $\lambda$ match, which is exactly (10.10).
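The bookkeeping in the scaling argument can be sketched in a few lines. The helper names `scaling_gap` and `is_admissible` are hypothetical (introduced only for this illustration), and an infinite exponent is encoded as `None`; the gap $\frac{2}{q}+\frac{d}{r}-\frac{d}{2}$ is exactly the mismatch between the two powers of $\lambda$ computed above.

```python
from fractions import Fraction

def scaling_gap(d, q, r):
    """Return 2/q + d/r - d/2 exactly; pass None for an infinite exponent."""
    def inv(p):
        return Fraction(0) if p is None else 1 / Fraction(p)
    return 2 * inv(q) + d * inv(r) - Fraction(d, 2)

def is_admissible(d, q, r):
    """Scaling condition (10.10) together with the non-endpoint requirement q > 2."""
    return scaling_gap(d, q, r) == 0 and (q is None or q > 2)

# The pair (q, r) = (4, infinity) in d = 1 is the one used for the cubic NLS later on.
print(is_admissible(1, 4, None))
# The diagonal pair q = r = 2(d+2)/d, here with d = 3.
print(is_admissible(3, Fraction(10, 3), Fraction(10, 3)))
# The endpoint (q, r) = (2, 6) in d = 3 satisfies the scaling relation but has q = 2.
print(scaling_gap(3, 2, 6), is_admissible(3, 2, 6))
```

Exact rational arithmetic is used so that the equality test in (10.10) is not affected by rounding.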
One may also ask whether (10.11) could hold with $(q, r)$ satisfying (10.10) but $1 \le q < 2$. The answer is also no. To see this, let us give the following heuristic argument. Fix a nice function $\phi$ and consider the linear solution $u(t) = e^{it\Delta}\phi$. Supposing the Strichartz estimate holds in $L^q_t L^r_x$, we can find a sufficiently long time interval around the origin (of length $T$, say) so that ‘most’ of the norm is contained in this interval. We now consider time-translates of $u$ by $\{t_j\}_{j=1}^N$, which are separated by a length $T$. Then, using time-translation symmetry of the equation, we should have
\[ \Big\| \sum_{j=1}^N u(t - t_j) \Big\|_{L^q_t L^r_x(\mathbb{R}\times\mathbb{R}^d)} \gtrsim N^{\frac{1}{q}} \]
(by splitting into the disjoint time intervals where each $u(\cdot - t_j)$ is nontrivial). On the other hand, by linearity, we recognize $\sum_j u(t - t_j)$ as the solution to the Schrödinger equation with data $\sum_j e^{-it_j\Delta}\phi$. Thus, applying Strichartz, the fact that $e^{-it_j\Delta}\phi$ and $e^{-it_k\Delta}\phi$ are almost orthogonal (provided $|t_j - t_k|$ is large enough), and Cauchy–Schwarz, we deduce
\[ \Big\| \sum_{j=1}^N u(t - t_j) \Big\|_{L^q_t L^r_x} \lesssim \Big\| \sum_{j=1}^N e^{-it_j\Delta}\phi \Big\|_{L^2} \lesssim N^{\frac{1}{2}}. \]
In particular, if $1 \le q < 2$, then we can reach a contradiction by choosing $N$ large enough.
One can also prove that the endpoint estimate (d, q, r) = (2, 2, ∞) fails
in general, but is recovered in the case of spherically-symmetric solutions
(see the papers [21, 24]).
Strichartz estimates play an important role in the setting of nonlinear Schrödinger equations. For example, Strichartz estimates are the essential ingredient in the well-posedness theory for equations of the form
\[ (i\partial_t + \Delta)u = F(u), \tag{10.13} \]
where $F$ is a nonlinear function of $u$; common examples include power-type or Hartree nonlinearities, i.e.
\[ F(u) = \lambda|u|^p u \quad\text{or}\quad F(u) = \lambda(|x|^{-\gamma} * |u|^2)u. \]
In particular, to connect Strichartz estimates to the well-posedness theory for (10.13), observe that a variation of parameters argument (i.e. looking for a solution in the form $u(t) = e^{it\Delta}v(t)$) leads to an equivalent integral formulation, known as the Duhamel formula:
\[ u(t) = e^{it\Delta}u_0 - i\int_0^t e^{i(t-s)\Delta}F(u(s))\,ds, \]
where $u_0$ denotes the initial condition $u|_{t=0}$. In particular, solving the PDE is equivalent to finding a fixed point of the operator
\[ \Phi u(t) = e^{it\Delta}u_0 - i\int_0^t e^{i(t-s)\Delta}F(u(s))\,ds. \]
The usual strategy is to utilize the Banach fixed point theorem, which requires proving that $\Phi$ is a contraction on a suitable complete metric space. In particular, this requires mapping properties (i.e. estimates) for the operators appearing above.
We may apply Strichartz estimates directly to $e^{it\Delta}$. The remaining operator
\[ F(t,x) \mapsto \int_0^t e^{i(t-s)\Delta}F(s,x)\,ds \]
is similar to the operator $TT^*$ appearing in (10.12) (for which we proved estimates); however, it is not identical due to the truncation of the integral.
In the remainder of this section, we will discuss a result due to Christ
and Kiselev [5] that allows us to deduce bounds for the truncated operator
appearing in the Duhamel formula. The general result is the following:
Theorem 10.2.3 (Christ–Kiselev lemma, [5]). Let $X$ and $Y$ be Banach spaces, and let
\[ T : L^p(\mathbb{R}; X) \to L^q(\mathbb{R}; Y), \quad 1 \le p < q < \infty, \]
be given by an integral transform
\[ Tf(t) = \int_{\mathbb{R}} K(t,s)f(s)\,ds, \quad K : \mathbb{R}\times\mathbb{R} \to \mathcal{L}(X, Y). \]
Defining
\[ \tilde{T}g(t) = \int_{-\infty}^t K(t,s)g(s)\,ds, \]
we have that $\tilde{T}$ is bounded from $L^p(\mathbb{R}; X)$ to $L^q(\mathbb{R}; Y)$.
Remark 10.2.4. In the generality stated above, the assumption p < q is

necessary (e.g. one can truncate the Hilbert transform to find a counterex-
ample with p = q = 2). We may allow p = 1 or q = ∞, in which cases p = q
is allowed; we leave the investigation of these details and extensions to the
reader.
Remark 10.2.5. To apply this in the setting of Strichartz estimates for the Schrödinger equation, we use
\[ Tf(t) = \int_{\mathbb{R}} e^{i(t-s)\Delta}f(s)\,ds, \quad p = q', \quad X = L^{r'}, \quad Y = L^r. \]
Thus we have $p < 2 < q$, and $e^{i(t-s)\Delta} \in \mathcal{L}(X, Y)$ by the dispersive estimate.
Sketch of the proof of Theorem 10.2.3. Fixing $f$ satisfying
\[ \|f\|^p_{L^p(\mathbb{R};X)} = 1, \]
we aim to prove
\[ \|\tilde{T}f\|_{L^q(\mathbb{R};Y)} \lesssim 1. \]
We will use a ‘Whitney decomposition’ of the region $S = \{x < y\} \subset \mathbb{R}^2$ in order to decompose $K(t,s)f(s)$. In particular, we need the following:
Whitney decomposition (see [29]). We may decompose $S$ into a countable disjoint union of dyadic squares $\{Q\}_{Q\in\mathcal{D}}$, i.e. squares of the form
\[ Q = I_{j,k}\times I_{j,l}, \quad I_{j,k} = (k2^{-j}, (k+1)2^{-j}], \quad j, k \in \mathbb{Z}, \]
such that $l(Q) \sim d(Q, \partial S)$, where $l(Q)$ denotes the side-length of $Q$.
To prove this, one takes $Q_x$ to be the largest dyadic square containing $x$ with $l(Q) < d(Q, \partial S)$. Then verify that this leads to the desired decomposition.
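The construction just described can be sketched concretely. The snippet below is an illustration under stated assumptions: it restricts attention to the part of $S$ inside the unit square (which is all that is needed when $F$ takes values in $[0,1]$), truncates the subdivision at a finite depth, and uses a hypothetical helper name `whitney_squares`. A dyadic square of level $j$ with integer corner $(a, b)$ has distance $(b-a-1)2^{-j}/\sqrt{2}$ to the diagonal when $b > a$, so the acceptance rule $l(Q) < d(Q,\partial S)$ becomes $(b-a-1)^2 > 2$.

```python
from collections import Counter

def whitney_squares(max_level):
    """Dyadic Whitney squares of S = {x < y} inside (0,1]^2, down to max_level.

    A triple (j, a, b) stands for (a 2^-j, (a+1) 2^-j] x (b 2^-j, (b+1) 2^-j].
    """
    accepted, stack = [], [(0, 0, 0)]
    while stack:
        j, a, b = stack.pop()
        if b < a:
            continue  # the square lies entirely in {x > y}
        gap = b - a - 1  # distance to the diagonal is gap * 2^-j / sqrt(2)
        if gap > 0 and gap * gap > 2:
            accepted.append((j, a, b))  # side length < distance to the boundary
        elif j < max_level:
            # Otherwise subdivide into the four dyadic children.
            stack.extend((j + 1, 2 * a + da, 2 * b + db)
                         for da in (0, 1) for db in (0, 1))
    return accepted

squares = whitney_squares(10)
area = sum(4.0 ** -j for j, _, _ in squares)
per_row = Counter((j, b) for j, _, b in squares)
print(len(squares), area, max(per_row.values()))
```

The last printed number counts, for fixed $(j, k)$, how many accepted squares have second projection $I_{j,k}$; its boundedness is the property of the decomposition that the proof of Theorem 10.2.3 relies on.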
Now, with the nondecreasing function $F : \mathbb{R} \to [0,1]$ defined by
\[ F(t) = \int_{-\infty}^t \|f(s)\|_X^p\,ds, \]
we claim that for any $t \in \mathbb{R}$, we may decompose
\[ K(t,s)f(s) = \sum_{Q\in\mathcal{D}} \chi_{\pi_2 Q}(F(t))\,K(t,s)[\chi_{\pi_1 Q}(F(s))f(s)] \tag{10.14} \]
for almost every $s < t$, where $\pi_1(A\times B) = A$ and $\pi_2(A\times B) = B$. Indeed, if $F(s) < F(t)$, then by properties of the Whitney decomposition there exists a unique $Q(s,t) \in \mathcal{D}$ such that $(F(s), F(t)) \in Q(s,t)$. Then by linearity,
\begin{align*}
K(t,s)f(s) &= \chi_{Q(s,t)}(F(s), F(t)) \cdot K(t,s)f(s) \\
&= \chi_{\pi_2 Q(s,t)}(F(t))\,K(t,s)[\chi_{\pi_1 Q(s,t)}(F(s))f(s)] \\
&= \text{RHS}(10.14).
\end{align*}
On the other hand, for any $t$ we have by construction
\[ \int_{F^{-1}(F(t))} \|f(\tau)\|_X^p\,d\tau = 0, \]
which implies that $f(s) = 0$ for almost every $s < t$ with $F(s) = F(t)$. Thus, in this case both sides of (10.14) are zero (for almost every $s$).
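Continuing from (10.14), we write
\begin{align*}
\tilde{T}f(t) &= \sum_{Q\in\mathcal{D}} \chi_{\pi_2 Q}(F(t)) \int_{\mathbb{R}} K(t,s)[\chi_{\pi_1 Q}(F(s))f(s)]\,ds \\
&= \sum_{Q\in\mathcal{D}} \chi_{\pi_2 Q}(F(t))\,T[(\chi_{\pi_1 Q}\circ F)f](t) \\
&= \sum_{j=0}^\infty \sum_{k=0}^{2^j-1} \sum_{Q\in\mathcal{D}:\,\pi_2 Q = I_{j,k}} [\chi_{I_{j,k}}\circ F]\,T[(\chi_{\pi_1 Q}\circ F)f],
\end{align*}
where we have omitted the terms in which $\pi_2 Q = I_{j,-1}$ or $I_{j,2^j}$ (cf. $F \in [0,1]$). Now observe that for fixed $j$, we have (using disjointness of supports and the linearity and the boundedness of $T$)
\begin{align*}
\|\tilde{T}f\|^q_{L^q(\mathbb{R};Y)} &\le \Big\| \sum_{j=0}^\infty \sum_{k=0}^{2^j-1} \sum_{Q\in\mathcal{D}:\,\pi_2 Q = I_{j,k}} (\chi_{I_{j,k}}\circ F)\cdot T[(\chi_{\pi_1 Q}\circ F)f] \Big\|^q_{L^q(\mathbb{R};Y)} \\
&\le \sum_{j,k} \Big\| (\chi_{I_{j,k}}\circ F)\,T\Big[ \sum_{Q\in\mathcal{D}:\,\pi_2 Q = I_{j,k}} (\chi_{\pi_1 Q}\circ F)f \Big] \Big\|^q_{L^q(\mathbb{R};Y)} \\
&\le \sum_{j,k} \Big\| \sum_{Q\in\mathcal{D}:\,\pi_2 Q = I_{j,k}} (\chi_{\pi_1 Q}\circ F)f \Big\|^q_{L^p(\mathbb{R};X)} \\
&\le \sum_{j=0}^\infty \sum_{k=0}^{2^j-1} \sum_{Q\in\mathcal{D}:\,\pi_2 Q = I_{j,k}} \|(\chi_{\pi_1 Q}\circ F)f\|^q_{L^p(\mathbb{R};X)}.
\end{align*}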
Now, by construction, if $l(Q) = 2^{-j}$ then we have
\[ \|(\chi_{\pi_1 Q}\circ F)f\|^q_{L^p(\mathbb{R};X)} \sim 2^{-\frac{qj}{p}}. \]
Furthermore, using properties of the Whitney decomposition, for fixed $j, k$ there are a bounded number of $Q$ such that $\pi_2(Q) = I_{j,k}$. Therefore
\[ \|\tilde{T}f\|^q_{L^q(\mathbb{R};Y)} \lesssim \sum_{j=0}^\infty \sum_{k=0}^{2^j-1} 2^{-\frac{qj}{p}} \lesssim \sum_{j=0}^\infty 2^{-j[\frac{q}{p}-1]} \lesssim 1 \]
provided $1 \le p < q < \infty$. This completes the proof.
We have succeeded in proving a range of Strichartz estimates for the

linear Schrödinger equation. Of course, this is not the end of the story; e.g.
we have left out the important endpoint estimate of [17], along with many
other extensions and possible refinements. To conclude our discussion, we
will show how to use these estimates to establish a local well-posedness result
for a nonlinear Schrödinger equation. (We also refer the reader to [3] for a
thorough textbook treatment, as well as [18] for an expository introduction
into some more advanced techniques.)
10.3 Application: local well-posedness for NLS

Let us focus our attention on the following one-dimensional cubic nonlinear Schrödinger equation
\[ i\partial_t u + \partial_x^2 u + |u|^2 u = 0, \tag{10.15} \]
which plays an important role in the setting of nonlinear optics. As described above, this PDE is reformulated as the following integral equation:
\[ u(t) = e^{it\Delta}u_0 + i\int_0^t e^{i(t-s)\Delta}|u|^2 u(s)\,ds, \]
where $u_0 \in L^2(\mathbb{R})$ is the initial condition. The existence of a solution is then reduced to establishing the existence of a fixed point for a suitable operator.
For this, we rely on the following general result:
Lemma 10.3.1 (Banach fixed point theorem). Let $(X, d)$ be a complete metric space. Suppose $\Phi : X \to X$ and
\[ d(\Phi(u), \Phi(v)) \le \tfrac{1}{2}\,d(u, v) \quad\text{for all } u, v \in X. \tag{10.16} \]
Then there exists a unique $u \in X$ such that $\Phi(u) = u$.
Proof. Let $u_0 \in X$ and define the sequence $\{u_n\} \subset X$ inductively via $u_{n+1} = \Phi(u_n)$. Observe that by iterating (10.16), we have
\[ d(u_{k+1}, u_k) \le 2^{-k}\,d(u_1, u_0). \]
Then for $n > m$, we have by the triangle inequality that
\[ d(u_n, u_m) \le \sum_{k=m}^{n-1} d(u_{k+1}, u_k) \le \sum_{k=m}^{n-1} 2^{-k}\,d(u_1, u_0) \le 2\cdot 2^{-m}\,d(u_1, u_0) \to 0 \quad\text{as } m \to \infty. \]
Thus there exists $u$ such that $u_n \to u$ in $X$. Now observe that by the triangle inequality we have
\begin{align*}
d(\Phi(u), u) &\le d(\Phi(u), \Phi(u_n)) + d(\Phi(u_n), u) \\
&\le \tfrac{1}{2}\,d(u, u_n) + d(u_{n+1}, u) \to 0
\end{align*}
as $n \to \infty$. Hence $\Phi(u) = u$.
To show uniqueness, we suppose that $\Phi(u) = u$ and $\Phi(v) = v$. Then
\[ d(u, v) = d(\Phi(u), \Phi(v)) \le \tfrac{1}{2}\,d(u, v), \]
whence $u = v$.
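The iteration in the proof can be run directly. The map below, $\Phi(u) = \frac{1}{2}\cos u$ on $\mathbb{R}$, is a hypothetical example chosen only because $|\Phi'| \le \frac{1}{2}$, so that (10.16) holds; the sketch records the successive gaps $d(u_{k+1}, u_k)$, which should obey the geometric bound $2^{-k}d(u_1, u_0)$ from the proof.

```python
import math

def iterate(phi, u0, steps):
    """Return the Picard iterates u_0, u_1, ..., u_steps with u_{n+1} = phi(u_n)."""
    us = [u0]
    for _ in range(steps):
        us.append(phi(us[-1]))
    return us

def phi(u):
    """Hypothetical contraction on R: |phi'(u)| = |sin(u)| / 2 <= 1/2."""
    return 0.5 * math.cos(u)

us = iterate(phi, 3.0, 40)
gaps = [abs(us[k + 1] - us[k]) for k in range(40)]

for k in (0, 1, 5, 10):
    print(k, gaps[k], 2.0 ** -k * gaps[0])
print("approximate fixed point:", us[-1], "residual:", abs(phi(us[-1]) - us[-1]))
```

In practice the gaps shrink faster than $2^{-k}$ here, since the contraction constant near the fixed point is smaller than $\frac{1}{2}$; the lemma only guarantees the stated rate.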
Proposition 10.3.2 (Local well-posedness for the 1d cubic NLS). Let $u_0 \in L^2$ with $M = \|u_0\|_{L^2}$. Then there exists $T = cM^{-4}$ and a unique solution
\[ u \in L^\infty_t L^2_x \cap L^4_t L^\infty_x([-T, T]\times\mathbb{R}) \]
to
\[ u(t) = e^{it\Delta}u_0 + i\int_0^t e^{i(t-s)\Delta}|u|^2 u(s)\,ds. \]
Proof. Let $T > 0$ and $C \ge 1$ to be determined below and define
\[ X = \{ u \in L^\infty_t L^2_x \cap L^4_t L^\infty_x([-T, T]\times\mathbb{R}) : \|u\|_{L^\infty_t L^2_x} \le 2CM \ \text{and}\ \|u\|_{L^4_t L^\infty_x} \le 2CM \}, \]
with
\[ d(u, v) = \|u - v\|_{L^\infty_t L^2_x([-T,T]\times\mathbb{R})}. \]
Throughout the proof, space-time norms will be taken over $[-T, T]\times\mathbb{R}$.
As $(X, d)$ is a closed subset of the complete metric space $L^\infty_t L^2_x([-T, T]\times\mathbb{R})$, it follows that $(X, d)$ is a complete metric space.
We let
\[ \Phi(u(t)) = e^{it\Delta}u_0 + i\int_0^t e^{i(t-s)\Delta}|u|^2 u(s)\,ds. \]
We will show that (i) $\Phi : X \to X$ and that (ii) $d(\Phi(u), \Phi(v)) \le \frac{1}{2}d(u, v)$ for $u, v \in X$.
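(i) Let $u \in X$. Then applying Strichartz estimates and Hölder’s inequality, we have
\begin{align*}
\|\Phi(u)\|_{L^\infty_t L^2_x} &\le CM + C\||u|^2 u\|_{L^1_t L^2_x} \\
&\le CM + C(2T)^{\frac{1}{2}}\|u\|^2_{L^4_t L^\infty_x}\|u\|_{L^\infty_t L^2_x} \\
&\le CM + C(2T)^{\frac{1}{2}}(2CM)^3.
\end{align*}
Thus if
\[ T \le \frac{1}{128\,C^6 M^4}, \]
where $C \ge 1$ is the constant in the Strichartz estimate, we have $\|\Phi(u)\|_{L^\infty_t L^2_x} \le 2CM$. Similarly,
\begin{align*}
\|\Phi(u)\|_{L^4_t L^\infty_x} &\le CM + C\||u|^2 u\|_{L^{4/3}_t L^1_x} \\
&\le CM + C(2T)^{\frac{1}{2}}\|u\|^2_{L^\infty_t L^2_x}\|u\|_{L^4_t L^\infty_x} \\
&\le CM + C(2T)^{\frac{1}{2}}(2CM)^3 \le 2CM
\end{align*}
under the same constraint on $T$. Thus, with this choice of $T$ we conclude that $\Phi : X \to X$.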
(ii) In what follows, we use the estimate
\[ \big||u|^2 u - |v|^2 v\big| \le 4|u - v|\{|u|^2 + |v|^2\}. \]
Then for $u, v \in X$, we obtain
\begin{align*}
\|\Phi(u) - \Phi(v)\|_{L^\infty_t L^2_x} &\le C\||u|^2 u - |v|^2 v\|_{L^1_t L^2_x} \\
&\le 4C(2T)^{\frac{1}{2}}\{\|u\|^2_{L^4_t L^\infty_x} + \|v\|^2_{L^4_t L^\infty_x}\}\|u - v\|_{L^\infty_t L^2_x} \\
&\le 8C(2T)^{\frac{1}{2}}(2CM)^2\|u - v\|_{L^\infty_t L^2_x}.
\end{align*}
Thus for
\[ T \le \frac{1}{8192\,C^6 M^4} \]
we obtain that
\[ d(\Phi(u), \Phi(v)) \le \tfrac{1}{2}\,d(u, v). \]
With (i) and (ii) in place, we conclude that $\Phi$ has a unique fixed point, yielding our desired solution.
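The arithmetic behind the two constraints on $T$ can be checked mechanically. In the sketch below, the function names are hypothetical and the values of $C$ and $M$ are arbitrary test inputs (the notes do not specify the Strichartz constant); the functions simply evaluate the right-hand sides appearing in steps (i) and (ii) at the smaller threshold $T = (8192\,C^6M^4)^{-1}$.

```python
import math

def self_map_bound(C, M, T):
    """Right-hand side CM + C (2T)^{1/2} (2CM)^3 from step (i)."""
    return C * M + C * math.sqrt(2 * T) * (2 * C * M) ** 3

def lipschitz_factor(C, M, T):
    """Factor 8 C (2T)^{1/2} (2CM)^2 multiplying d(u, v) in step (ii)."""
    return 8 * C * math.sqrt(2 * T) * (2 * C * M) ** 2

def lifespan(C, M):
    """The smaller of the two thresholds on T, from step (ii)."""
    return 1.0 / (8192 * C**6 * M**4)

for C, M in [(1.0, 1.0), (2.0, 0.5), (3.0, 10.0)]:  # hypothetical values
    T = lifespan(C, M)
    print(C, M, T, self_map_bound(C, M, T) / (2 * C * M), lipschitz_factor(C, M, T))
```

At this value of $T$ the Lipschitz factor equals $\frac{1}{2}$ (up to rounding) and the self-map bound is $\frac{9}{8}CM$, comfortably below $2CM$, which also displays the scaling $T \sim M^{-4}$ asserted in Proposition 10.3.2.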
10.4 Return to restriction theory; Tomas–Stein lemma

In this section, we return to the restriction problem introduced in Section 10.1. Along with the paraboloid (discussed in Section 10.1), this problem has been widely studied in the setting of the sphere and the cone. Just as the restriction theory for the paraboloid is connected to the Schrödinger equation, the cone problem is connected to the linear wave equation.
We will focus on the case of the sphere. Similar to the paraboloid case, we will be interested in estimates of the form
\[ \|\hat{f}|_S\|_{L^q(S, d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^d)}, \tag{10.17} \]
where $S = \{\xi \in \mathbb{R}^d : |\xi| = 1\}$ and $d\sigma$ denotes surface measure on the sphere.
We begin with a few remarks. First, if $p = 1$ then (10.17) holds for all $q \in [1, \infty]$ (cf. Hölder’s inequality and Hausdorff–Young). Next, if $p = 2$ then (10.17) fails for all $q \in [1, \infty]$, as $\hat{f}$ may be an arbitrary $L^2$ function (and hence could be $\equiv \infty$ on $S$).
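Finally, note that if (10.17) holds with some pair $(p, q)$, then it also holds with any $(\tilde{p}, \tilde{q})$ with $\tilde{p} \le p$ and $\tilde{q} \le q$. To see this, let $\varphi$ be a bump function with $\hat{\varphi} \equiv 1$ supported on $B(0, 2)$. Then
\[ \hat{f}|_S = \widehat{f * \varphi}|_S, \]
so, by Hölder, Hausdorff–Young, and Young’s inequality, we have
\[ \|\hat{f}|_S\|_{L^{\tilde{q}}(S, d\sigma)} \lesssim \|\widehat{f * \varphi}\|_{L^q} \lesssim \|f * \varphi\|_{L^p} \lesssim_\varphi \|f\|_{L^{\tilde{p}}}. \]
Thus the goal is to take $p, q$ as large as possible in (10.17).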
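As in Section 10.1, we can formulate a dual version of (10.17). Define $\tilde{R} : L^p(\mathbb{R}^d) \to L^q(S, d\sigma)$ by
\[ \tilde{R}f = \hat{f}|_S. \]
The adjoint $\tilde{R}^* : L^{q'}(S, d\sigma) \to L^{p'}(\mathbb{R}^d)$ (where primes denote Hölder duals, as usual) is computed by
\begin{align*}
\langle \tilde{R}f, g\rangle_{S, d\sigma} &= \int_S \tilde{R}f(\xi)\,\bar{g}(\xi)\,d\sigma(\xi) \\
&= (2\pi)^{-\frac{d}{2}} \int_{\mathbb{R}^d} f(x)\,\overline{\int_S e^{ix\xi}g(\xi)\,d\sigma(\xi)}\,dx = \langle f, (g\,d\sigma)^\vee\rangle_{\mathbb{R}^d}.
\end{align*}
In particular, $\tilde{R}^* g = (g\,d\sigma)^\vee$, which is bounded if and only if
\[ \|(g\,d\sigma)^\vee\|_{L^{p'}(\mathbb{R}^d)} \lesssim \|g\|_{L^{q'}(S, d\sigma)}. \tag{10.18} \]
We begin by finding necessary conditions for $\tilde{R}$, $\tilde{R}^*$ to be bounded.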

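Example 10.4.1. Let $g \equiv 1$. Then (10.18) becomes
\[ \|\check{\sigma}\|_{L^{p'}(\mathbb{R}^d)} \lesssim 1. \]
As $|\check{\sigma}(x)| = O(\langle x\rangle^{-\frac{d-1}{2}})$ (cf. Exercise 7.5.7), we find that we must have
\[ \tfrac{d-1}{2}\,p' > d, \quad\text{i.e.}\quad p < \tfrac{2d}{d+1}. \tag{10.19} \]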
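Example 10.4.2 (The Knapp example). Let $R \gg 1$ and $g = \chi_K$, where $K$ is a spherical cap centered at the south pole $\xi_0$ of radius $R^{-1}$. Then $K \subset D$, where $D$ is a disk of radius $R^{-1}$ and thickness $R^{-2}$; indeed, with $\xi = (\bar{\xi}, \xi_d)$,
\[ \xi_d = -\sqrt{1 - |\bar{\xi}|^2} = -1 + \tfrac{1}{2}|\bar{\xi}|^2 + O(|\bar{\xi}|^4) = -1 + O(R^{-2}). \]
Then
\[ (g\,d\sigma)^\vee(x) = e^{ix\xi_0} \int_K e^{i(\xi - \xi_0)x}\,d\sigma(\xi). \]
If $|(\xi - \xi_0)x| \lesssim 1$, then $(g\,d\sigma)^\vee(x)$ is of size $\sigma(K) \sim R^{-(d-1)}$.
Now, observe that for $\xi \in K$, we have
\[ |x_d(\xi - \xi_0)_d| \lesssim 1 \quad\text{if } |x_d| \lesssim R^2, \]
while
\[ |\bar{x}\cdot\overline{(\xi - \xi_0)}| \lesssim 1 \quad\text{if } |\bar{x}| \lesssim R. \]
In particular, $(g\,d\sigma)^\vee(x) \sim R^{-(d-1)}$ throughout the ‘dual tube’ $T$ to $D$, centered at the origin, with height $R^2$ and radius $R$. So
\[ \|(g\,d\sigma)^\vee\|_{L^{p'}} \gtrsim R^{-(d-1)}|T|^{\frac{1}{p'}} \gtrsim R^{-(d-1)}R^{\frac{d+1}{p'}}. \]
On the other hand,
\[ \|g\|_{L^{q'}(S, d\sigma)} \lesssim \sigma(K)^{\frac{1}{q'}} \lesssim R^{-\frac{d-1}{q'}}. \]
Thus for (10.18) to hold, we must have
\[ R^{-(d-1)}R^{\frac{d+1}{p'}} \lesssim R^{-\frac{d-1}{q'}}, \quad\text{i.e.}\quad \tfrac{d+1}{p'} \le \tfrac{d-1}{q}. \tag{10.20} \]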
The restriction conjecture for the sphere states that the necessary conditions (10.19) and (10.20) are in fact sufficient.
Conjecture 10.4.1 (Restriction conjecture for the sphere). We have
\[ \|\hat{f}|_S\|_{L^q(d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^d)} \quad\text{if and only if}\quad p < \tfrac{2d}{d+1}, \quad \tfrac{d+1}{p'} \le \tfrac{d-1}{q}. \]
This has been resolved completely in $d = 2$ [33], while the full result remains open in higher dimensions.
In the rest of this section, we will discuss a positive result due to Tomas and Stein.
Theorem 10.4.2 (Tomas–Stein). We have
\[ \|\hat{f}|_S\|_{L^2(S, d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^d)} \quad\text{whenever}\quad 1 \le p \le \tfrac{2(d+1)}{d+3}. \]
Note that when $q = 2$, the necessary conditions in Conjecture 10.4.1 reduce to
\[ p \le \tfrac{2(d+1)}{d+3}, \]
so this result is sharp for the choice $q = 2$.
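Proof of Theorem 10.4.2. We will prove the result up to (but not including) the endpoint $p = \frac{2(d+1)}{d+3}$. We begin by obtaining the smaller range
\[ 1 < p \le \tfrac{4d}{3d+1} \]
by simply relying on the decay of $(d\sigma)^\vee$; recall the case $p = 1$ always holds.
Denoting
\[ Rf = \hat{f}|_S \quad\text{and}\quad R^* g = (g\,d\sigma)^\vee, \]
we will employ the method of $TT^*$ and endeavor to prove $R^*R : L^p(\mathbb{R}^d) \to L^{p'}(\mathbb{R}^d)$. Note that
\[ R^*Rf = (\hat{f}|_S\,d\sigma)^\vee = f * \check{\sigma}. \]
By Hardy–Littlewood–Sobolev (Theorem 6.2.2), we have
\[ \|f * \check{\sigma}\|_{L^{p'}} \lesssim \|f\|_{L^p}\,\|\check{\sigma}\|_{L^{\frac{p}{2(p-1)},\infty}}. \]
Now observe that since $|\check{\sigma}| \lesssim |x|^{-\gamma}$ for $\gamma \in [0, \frac{d-1}{2}]$ (cf. Exercise 7.5.7), we have
\[ \check{\sigma} \in L^{\frac{p}{2(p-1)},\infty} \quad\text{provided}\quad \tfrac{2d(p-1)}{p} \in (0, \tfrac{d-1}{2}], \quad\text{i.e.}\quad 1 < p \le \tfrac{4d}{3d+1}. \]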
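To go beyond this range, we employ the Littlewood–Paley partition of unity and write
\[ f * \check{\sigma} = f * (\varphi\check{\sigma}) + \sum_{N > 1} f * (\psi_N\check{\sigma}). \]
Because $\varphi \in \mathcal{S}$ and $\check{\sigma} \in L^\infty$, we may use Young’s inequality to write
\[ \|f * (\varphi\check{\sigma})\|_{L^{p'}} \lesssim \|f\|_{L^p}\,\|\varphi\check{\sigma}\|_{L^{\frac{p'}{2}}} \lesssim \|f\|_{L^p}. \]
To estimate the sum, we wish to prove
\[ \|f * (\psi_N\check{\sigma})\|_{L^{p'}} \lesssim N^{-\varepsilon}\|f\|_{L^p} \quad\text{for some } \varepsilon = \varepsilon(p) > 0, \tag{10.21} \]
for then we may sum to deduce the desired estimate.
To prove (10.21), we will interpolate between an estimate for $p = 1$ and an estimate for $p = 2$.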
First, for $p = 1$, we have by Hölder’s inequality and the decay of $\check{\sigma}$
\[ \|f * (\psi_N\check{\sigma})\|_{L^\infty} \lesssim \|f\|_{L^1}\,\|\check{\sigma}\|_{L^\infty(|x|\sim N)} \lesssim N^{-\frac{d-1}{2}}\|f\|_{L^1}. \]
On the other hand, we have by Plancherel and Lemma 2.5.11
\[ \|f * (\psi_N\check{\sigma})\|_{L^2} \lesssim \|f\|_{L^2}\,\|\hat{\psi}_N * d\sigma\|_{L^\infty}. \]
Now, fixing $x \in \mathbb{R}^d$, we may write
\[ |\hat{\psi}_N * d\sigma(x)| \le N^d \int_S |\hat{\psi}(N(x - y))|\,d\sigma(y) \lesssim \int_S \frac{N^d}{\langle N(x - y)\rangle^m}\,d\sigma(y) \]
for any $m \ge 0$.
To estimate this integral, we split into two regimes: (i) $y \in S$ with $|x - y| \lesssim \frac{1}{N}$, which has a volume of $\sim N^{-(d-1)}$. In this regime the integrand is bounded by $N^d$. (ii) $y \in S$ with $|x - y| \sim \frac{1}{M}$ for some $M \ll N$, which has a volume bound of $M^{-(d-1)}$ (which could of course be replaced by $\lesssim 1$ if $M \le 1$). In this case $\langle N(x - y)\rangle \sim \frac{N}{M}$, and if we choose $m = d$ in the estimate above we get that the integrand is bounded by $M^d$.
Thus
\[ |\hat{\psi}_N * d\sigma(x)| \lesssim N^d \cdot N^{-(d-1)} + \sum_{M \le N} M^d M^{-(d-1)} \lesssim N. \]
We now interpolate between the $(1, \infty)$ and $(2, 2)$ estimates and check the range of $p$ for which we end up with a negative power of $N$. As $\frac{1}{p} = \theta + \frac{1-\theta}{2}$ when $\theta = \frac{2-p}{p}$, we get
\[ \|f * (\psi_N\check{\sigma})\|_{L^{p'}} \lesssim N^{-\frac{d-1}{2}\cdot\frac{2-p}{p}}\,N^{2[\frac{p-1}{p}]}\,\|f\|_{L^p}. \]
One can check that this power is negative provided $p < \frac{2(d+1)}{d+3}$. This completes the proof.
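The final claim about the sign of the exponent is a short computation, which the following sketch carries out in exact arithmetic. The function names are hypothetical; the exponent is $-\frac{d-1}{2}\theta + (1-\theta)$ with $\theta = \frac{2-p}{p}$, where the second term comes from the factor $N^{1}$ in the $(2,2)$ estimate.

```python
from fractions import Fraction

def interpolated_power(d, p):
    """Exponent of N after interpolating the (1, infinity) and (2, 2) bounds."""
    p = Fraction(p)
    theta = (2 - p) / p  # solves 1/p = theta + (1 - theta)/2
    return -Fraction(d - 1, 2) * theta + (1 - theta)

def tomas_stein_endpoint(d):
    """The exponent 2(d+1)/(d+3) from Theorem 10.4.2."""
    return Fraction(2 * (d + 1), d + 3)

for d in range(2, 7):
    p0 = tomas_stein_endpoint(d)
    eps = Fraction(1, 100)
    print(d, p0, interpolated_power(d, p0 - eps), interpolated_power(d, p0), interpolated_power(d, p0 + eps))
```

The middle value vanishes exactly at $p = \frac{2(d+1)}{d+3}$, is negative just below it, and positive just above it, which is why the dyadic sum in (10.21) converges only below the endpoint with this argument.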
10.5 Exercises
Exercise 10.5.2. (Challenge problem.) Develop a profile decomposition adapted
to the Strichartz estimate in order to prove existence of optimizers to this
inequality.
Chapter 11
Additional topics
11.1 Linear Scattering Theory

In this section we consider a problem in ‘linear scattering theory’. We follow
the presentation of [26] rather closely. Our presentation will be rather brief.
We are interested in the long-time behavior of solutions to a linear
Schrödinger equation in the presence of an external potential. Restricting
attention to the three-dimensional case, we consider the equation
\[ i\partial_t u = Hu, \quad u|_{t=0} = f, \tag{11.1} \]
where
\[ H = \Delta + V \quad\text{for some } V : \mathbb{R}^3 \to \mathbb{R}. \]
To keep things relatively simple, we assume that V is smooth and compactly
supported. We denote the solution to (11.1) by e−itH f ; we will describe this
operator more precisely below.
In this context, ‘scattering’ refers to the statement that as $t \to +\infty$ (say), the solution to (11.1) behaves like a solution to the ‘free’ Schrödinger equation $i\partial_t u = \Delta u$. That is, for any $f \in L^2$, there exists $W_+ f \in L^2$ so that
\[ e^{-itH}f - e^{-it\Delta}W_+ f \to 0 \quad\text{as } t \to \infty \tag{11.2} \]
in a suitable topology (e.g. weakly in $L^2$ or strongly in $L^2$). We may equivalently aim to establish the existence of the limit
\[ W_+ f = \lim_{t\to\infty} e^{it\Delta}e^{-itH}f \tag{11.3} \]
for $f \in L^2$ (using the same topology as above). One can consider the same problem backward in time and construct an analogous operator $W_-$. The composition $W_+ W_-^{-1}$ is then known as the scattering operator and is relevant for applications.
We will see below that H can have at most finitely many eigenvalues,
all of which must be negative. The existence of such eigenvalues, how-
ever, would rule out the possibility of (11.2). Indeed, if Hf = λf , then
e−itH f = eitλ f , which does not behave like a solution to the free equation.
Thus, we must either restrict to potentials $V$ so that $H$ has no eigenvalues or understand that the results below apply only to $f$ belonging to the orthogonal complement in $L^2$ of the space spanned by eigenfunctions of $H$.
Our approach will be to present the main ideas, taking several facts for
granted along the way. Then we will fill in some of the details below.
We start with the following.
Proposition 11.1.1. The limit (11.3) exists (as a weak limit in $L^2$).
Proof. We need to show that for $f, g \in L^2$, the limit
\[ \lim_{t\to\infty} \langle e^{it\Delta}e^{-itH}f, g\rangle \]
exists. Taking for granted the following fact:
• $e^{itH}$ is unitary on $L^2$,
we can first observe that
\[ |\langle e^{it\Delta}e^{-itH}f, g\rangle| \le \|f\|_{L^2}\|g\|_{L^2} \]
uniformly in $t$. Using this, we see that we may assume that $f, g \in C_c^\infty(\mathbb{R}^3)$. Now, by the fundamental theorem of calculus, we find that the existence of the limit would be implied by the following estimate:
\[ \int_1^\infty \Big| \frac{d}{dt}\langle e^{it\Delta}e^{-itH}f, g\rangle \Big|\,dt < \infty. \]
To prove this, we begin by computing
\[ \frac{d}{dt}\,e^{it\Delta}e^{-itH}f = e^{it\Delta}[i\Delta - iH]e^{-itH}f = -ie^{it\Delta}Ve^{-itH}f. \]
Thus, using unitarity of $e^{it\Delta}$, the dispersive estimate for $e^{it\Delta}$, and Cauchy–Schwarz, we may estimate
\begin{align*}
\int_1^\infty \Big| \frac{d}{dt}\langle e^{it\Delta}e^{-itH}f, g\rangle \Big|\,dt &\le \int_1^\infty |\langle Ve^{-itH}f, e^{-it\Delta}g\rangle|\,dt \\
&\le \int_1^\infty \|Ve^{-itH}f\|_{L^1}\,\|e^{-it\Delta}g\|_{L^\infty}\,dt \\
&\lesssim \int_1^\infty |t|^{-\frac{3}{2}}\,\|V\|_{L^2}\,\|e^{-itH}f\|_{L^2}\,\|g\|_{L^1}\,dt \\
&\lesssim \|V\|_{L^2}\|f\|_{L^2}\|g\|_{L^1}.
\end{align*}
While it was fairly straightforward to establish the existence of W+ f

(albeit only as a weak limit in L2 ), the argument provided us with no insight
into the structure of W+ . Our second main result will be the following.
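Proposition 11.1.2. The following formula holds:
\[ W_+ = \mathcal{F}^{-1}\mathcal{F}_H \]
in the weak Cesàro sense, i.e.
\[ \lim_{\varepsilon\to 0} \Big\langle \varepsilon\int_0^\infty e^{-\varepsilon t}e^{it\Delta}e^{-itH}f\,dt,\ g \Big\rangle = \langle \mathcal{F}_H f, \mathcal{F}g\rangle, \]
where $\mathcal{F}$ is the usual Fourier transform and $\mathcal{F}_H$ is the ‘distorted Fourier transform’ adapted to $H$.
To make sense of this proposition, we should first describe the distorted Fourier transform $\mathcal{F}_H$ in some more detail.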
Lemma 11.1.3. There exists a unitary operator $\mathcal{F}_H : L^2 \to L^2$ of the form
\[ \mathcal{F}_H f(\xi) = c\int_{\mathbb{R}^3} f(y)\Phi_\xi(y)\,dy \]
such that
\[ e^{-itH} = \mathcal{F}_H^*\,e^{-4\pi^2 it|\xi|^2}\,\mathcal{F}_H. \]
Here $\Phi_\xi$ is a ‘distorted plane wave’ of frequency $\xi$ that satisfies the generalized eigenvalue equation
\[ (H + 4\pi^2|\xi|^2)\Phi_\xi = 0. \]
Note that with this lemma, we already obtain unitarity of e−itH (one of
the facts needed above). Taking this lemma for granted for the moment, let
us prove the second proposition.
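Proof of Proposition 11.1.2. We integrate by parts to write
\[ \Big\langle \varepsilon\int_0^\infty e^{-\varepsilon t}e^{it\Delta}e^{-itH}f\,dt,\ g \Big\rangle = \langle f, g\rangle + \int_0^\infty e^{-\varepsilon t}\Big\langle \frac{d}{dt}\,e^{it\Delta}e^{-itH}f, g \Big\rangle\,dt. \]
Computing as in the proof of the first proposition, we find
\[ \langle W_+ f, g\rangle = \langle f, g\rangle - \lim_{\varepsilon\to 0}\int_0^\infty e^{-\varepsilon t}\langle ie^{it\Delta}Ve^{-itH}f, g\rangle\,dt. \]
Noting that $\langle f, g\rangle = \langle \mathcal{F}_H f, \mathcal{F}_H g\rangle$, we find that the problem reduces to proving
\[ \lim_{\varepsilon\to 0}\int_0^\infty e^{-\varepsilon t}\langle ie^{it\Delta}Ve^{-itH}f, g\rangle\,dt = \langle \mathcal{F}_H f, \mathcal{F}_H g - \mathcal{F}g\rangle. \]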
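The left-hand side may be expanded as
\begin{align*}
&\lim_{\varepsilon\to 0} ci\int_0^\infty \iint V(x)\Phi_\xi(x)\,e^{-4\pi^2 it|\xi|^2}\,\mathcal{F}_H f(\xi)\,e^{it\Delta}\bar{g}(x)\,dx\,d\xi\,dt \\
&\quad = \lim_{\varepsilon\to 0} ci\iint \Phi_\xi(x)\,\mathcal{F}_H f(\xi)\,V(x)\int_0^\infty e^{it[\Delta + 4\pi^2|\xi|^2 + i\varepsilon]}\,dt\ \bar{g}(x)\,d\xi\,dx \\
&\quad = \lim_{\varepsilon\to 0} ci\iint \Phi_\xi(x)\,\mathcal{F}_H f(\xi)\,V(x)\,(\Delta + 4\pi^2|\xi|^2 + i\varepsilon)^{-1}\bar{g}(x)\,d\xi\,dx.
\end{align*}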
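We now need the fact that
• the solution to $(\Delta + 4\pi^2(k + i\varepsilon)^2)w = h$ is given by
\[ w_\varepsilon = h * G_k^\varepsilon, \quad\text{where}\quad \hat{G}_k^\varepsilon(\xi) = \frac{1}{4\pi^2[(k + i\varepsilon)^2 - |\xi|^2]}. \]
The functions $G_k^\varepsilon$ have a limit $G_k$ as $\varepsilon \to 0$.
Thus, the limit above reduces to
\[ -\iint \Phi_\xi(x)\,\mathcal{F}_H f(\xi)\,V(x)\,\bar{g} * G_{|\xi|}(x)\,dx\,d\xi = \int \mathcal{F}_H f(\xi)\,\langle -V\Phi_\xi * G_{|\xi|}, g\rangle\,d\xi. \]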
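We now write
\[ \Phi_\xi(x) = e^{2\pi ix\xi} + w_\xi(x), \]
so that $w_\xi$ solves
\[ (\Delta + 4\pi^2|\xi|^2 + V)w_\xi(x) = -Ve^{2\pi ix\xi}. \]
(In fact, this is how we will construct $\Phi_\xi$ below!) We then rely on another fact:
• For any $h \in L^2$ with compact support, the equation
\[ (\Delta + 4\pi^2 k^2 + V)w = h \]
has a unique solution obeying a certain ‘radiation’ condition; it satisfies
\[ w = (h - Vw) * G_k. \]
Thus we may write
\[ w_\xi = [-Ve^{2\pi ix\xi} - Vw_\xi] * G_{|\xi|} = -V\Phi_\xi * G_{|\xi|} \implies -V\Phi_\xi * G_{|\xi|} = \Phi_\xi - e^{2\pi ix\xi}. \]
Continuing from above, we obtain
\begin{align*}
\int \mathcal{F}_H f(\xi)\,\langle -V\Phi_\xi * G_{|\xi|}, g\rangle\,d\xi &= \int \mathcal{F}_H f(\xi)\big[\langle \Phi_\xi, g\rangle - \langle e^{2\pi ix\xi}, g\rangle\big]\,d\xi \\
&= \langle \mathcal{F}_H f, \mathcal{F}_H g - \mathcal{F}g\rangle,
\end{align*}
as desired.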
In the rest of this section, we therefore need to prove Lemma 11.1.3,

along with the facts used in the previous proof (see the bullet points above).
We need to address the following general problems:
• Solve the free Helmholtz equation $(\Delta + 4\pi^2 k^2)w = f$.
• Establish existence (and uniqueness) for the Helmholtz equation $(H + 4\pi^2 k^2)w = f$ with $f \in L^2$ of compact support.
• Define the distorted Fourier transform and demonstrate it has the

properties needed above.
We discuss each of these in the following subsections.
11.1.1 Solving the free Helmholtz equation

Given $k > 0$ and $f \in L^2$ of compact support, we seek solutions of
\[ (\Delta + 4\pi^2 k^2)w = f \]
as limits of solutions to the problem
\[ (\Delta + 4\pi^2(k + i\varepsilon)^2)w_\varepsilon = f, \quad \varepsilon > 0. \]
The operator on the left-hand side is of the form $T + is$ (with $T$ closed and self-adjoint and $s \ne 0$) and hence is guaranteed to be invertible; indeed, one can verify that
\[ \|(T + is)f\|^2_{L^2} \ge s^2\|f\|^2_{L^2} \quad\text{and}\quad \|(T + is)^* f\|^2_{L^2} \ge s^2\|f\|^2_{L^2}. \]
In fact, using the Fourier transform we may obtain the formula
\[ w_\varepsilon = f * G_k^\varepsilon, \quad\text{where}\quad G_k^\varepsilon(x) = \int \frac{e^{2\pi ix\xi}}{4\pi^2[(k + i\varepsilon)^2 - |\xi|^2]}\,d\xi. \]
By a rescaling argument, we may obtain $G_k^\varepsilon(\cdot) = kG_1^{\varepsilon/k}(k\,\cdot)$, and so it will be enough to consider the case $k = 1$. For this, we have the following:
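Lemma 11.1.4. For all $x \ne 0$, we have a limit $G_1^\varepsilon(x) \to G_1(x)$ as $\varepsilon \to 0$. The function $G_1$ obeys the bound
\[ |G_1(x)| \lesssim |x|^{-1} \quad\text{for } |x| \le C \]
along with the asymptotic formula
\[ G_1 = c\,\frac{e^{2\pi i|x|}}{|x|} + O(|x|^{-3/2}) \quad\text{for } |x| \ge C, \]
and the ‘radiation condition’
\[ \partial_r G_1(x) - 2\pi iG_1(x) = o(|x|^{-1}) \quad\text{as } |x| \to \infty, \]
where $\partial_r = \frac{x}{|x|}\cdot\nabla$ is the radial derivative.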
Sketch of the proof. By radiality, it suffices to work with $x = x_1 e_1$ for some $x_1 > 1$ and $\varepsilon \le \langle x\rangle^{-100}$. Let us just describe how to obtain bounds independent of $\varepsilon$.
We use a partition of unity to split the integral defining $G_1^\varepsilon$, that is,
\[ \int \frac{e^{2\pi ix\xi}}{4\pi^2[(1 + i\varepsilon)^2 - |\xi|^2]}\,d\xi, \]
into the regions $\{|\xi| < \frac{1}{2}\}$, $\{\frac{1}{4} < |\xi| < 4\}$, and $\{N < |\xi| < 2N\}$ for dyadic $N \ge 4$.
On the region $|\xi| < \frac{1}{2}$, we recognize the inverse Fourier transform of a Schwartz function, yielding a term that decays fast as $|x| \to \infty$. On a region $|\xi| \sim N$ for $N > 4$, we can either estimate the term directly as
\[ \int \frac{e^{2\pi ix\xi}\varphi(\xi/N)}{(1 + i\varepsilon)^2 - |\xi|^2}\,d\xi = O(N) \]
or use the identity
\[ e^{2\pi ix\xi} = \frac{1}{2\pi ix_1}\,\partial_{\xi_1}e^{2\pi ix\xi} \]
and integrate by parts repeatedly to obtain an estimate
\[ \int \frac{e^{2\pi ix\xi}\varphi(\xi/N)}{(1 + i\varepsilon)^2 - |\xi|^2}\,d\xi = O(N\cdot(N|x|)^{-5}). \]
We then obtain the acceptable bound
\[ \sum_{N\ge 4} N\min\{1, (N|x|)^{-5}\} \lesssim \begin{cases} |x|^{-1} & |x| \le 1, \\ |x|^{-4} & |x| \gtrsim 1. \end{cases} \]
The main contribution is therefore coming from the region $|\xi| \sim 1$, which requires the most delicate analysis. We sketch the treatment of this term:
Using spherical coordinates $\xi = \rho\omega$, we write this term as
\[ \iint \frac{e^{2\pi i\rho\omega_1 x_1}\psi(\rho)}{1 + i\varepsilon - \rho}\,d\omega\,d\rho \]
for a suitable cutoff function $\psi$. We then use a smooth cutoff to split into three regimes: (i) $\omega_1 > \frac{1}{4}$, (ii) $|\omega_1| < \frac{1}{2}$, and (iii) $\omega_1 < -\frac{1}{4}$.
The contribution of region (ii) can be seen to be rapidly decaying, which we see as follows: We first split into regions where $|\rho - 1| > |x|^{-50}$ and $|\rho - 1| \le |x|^{-50}$, say.
For the first term, we see that the $\omega$ integral is an oscillatory integral with non-stationary phase, and hence is $O(\langle x\rangle^{-100})$. The $\rho$ integral can then be seen to be $O(\log\langle x\rangle)$, and hence this part is acceptable. For the second term, we write
\[ e^{2\pi i\rho\omega_1 x_1}\psi(\rho) = e^{2\pi i\rho\omega_1 x_1}\psi(1) + O(|\rho - 1|). \]
The $O(|\rho - 1|)$ term is readily estimated to obtain an acceptable contribution. For the main term, the $d\rho$ integral works out to be $i\pi + O(|x|^{-50})$. This leaves us to estimate
\[ \int_{|\omega_1| < \frac{1}{2}} e^{2\pi i\omega_1 x_1}\,d\omega, \]
which we may do by the principle of non-stationary phase.
Regions (i) and (iii) may be treated in a similar fashion, so let us only consider region (i). We begin by writing
\[ \psi(\rho) = \psi(1)\chi(\rho) + (\rho - 1)\tilde{\psi}(\rho), \]
where $\chi, \tilde{\psi}$ are bump functions and $\chi = 1$ near $\rho = 1$. Then the $\tilde{\psi}(\rho)$ integral leads to an oscillatory integral with non-stationary phase and hence leads to rapid decay in $x$. The $\chi$ integral can be dealt with by extending $\chi$ into the complex plane and using contour integration. In the end, by shifting the contour upward, we end up with a main contribution of
\[ -2\pi i\psi(1)e^{2\pi i\omega_1 x_1} \]
by the residue theorem. In particular, the main term is of the form
\[ C\int_{\omega_1 > \frac{1}{4}} e^{2\pi i\omega_1 x_1}\,d\omega, \]
which (by the method of stationary phase, with the stationary phase point $\omega = e_1$) can be seen to be of the form
\[ ce^{2\pi ix_1}|x|^{-1} + O(|x|^{-3/2}) \]
for $|x| > C$.
To prove the radiation condition, we again consider $x = x_1 e_1$ and need to prove
\[ \big(\tfrac{\partial}{\partial x_1} - 2\pi i\big)G_1(x_1 e_1) = o(|x|^{-1}). \]
In this case, the left-hand side can be expressed as a Fourier transform (like $G_1$ itself), with an additional factor of $2\pi i(\xi_1 - 1)$. As this factor vanishes at the stationary point $\xi = e_1$, we may obtain the improved estimate.
We note also that the function G1 in the above lemma obeys
(∆ + 4π 2 )G1 = δ0
in the sense of distributions.
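The properties in Lemma 11.1.4 can be sanity-checked against the standard explicit outgoing fundamental solution in three dimensions, $G(x) = -e^{2\pi i|x|}/(4\pi|x|)$. The notes do not state this formula, so treating it as $G_1$ is an assumption of the sketch below (as are the function names and the finite-difference step). For radial functions one has $\Delta g = (rg)''/r$, which makes both the equation away from the origin and the radiation condition easy to test numerically.

```python
import numpy as np

def G1(r):
    """Assumed explicit form of the outgoing fundamental solution of Delta + 4 pi^2 in R^3."""
    return -np.exp(2j * np.pi * r) / (4 * np.pi * r)

def radial_helmholtz_residual(r, h=1e-4):
    """(Delta + 4 pi^2) G1 at radius r > 0, using Delta g = (r g)'' / r for radial g."""
    rg = lambda s: s * G1(s)
    second = (rg(r + h) - 2 * rg(r) + rg(r - h)) / h**2  # central second difference
    return second / r + 4 * np.pi**2 * G1(r)

def radiation_defect(r, h=1e-4):
    """r * (d/dr G1 - 2 pi i G1); the radiation condition says this tends to 0."""
    dG = (G1(r + h) - G1(r - h)) / (2 * h)  # central first difference
    return r * (dG - 2j * np.pi * G1(r))

for r in (1.0, 10.0, 100.0):
    print(r, abs(radial_helmholtz_residual(r)), abs(radiation_defect(r)))
```

For this explicit function the defect equals $e^{2\pi ir}/(4\pi r)$ exactly, so the radiation condition holds with room to spare; the lemma itself only asserts the $o(|x|^{-1})$ rate.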
11.1.2 Existence and uniqueness for the Helmholtz equation

We turn to the equation
\[ (H + 4\pi^2 k^2)w = f, \]
where $f$ is a compactly supported $L^2$ function.

It is convenient to first address the question of uniqueness, which (by
linearity) is implied by the following.
Lemma 11.1.5 (Uniqueness). If $w \in L^2_{\mathrm{loc}}$ satisfies
\[ (H + 4\pi^2 k^2)w = 0 \]
along with the radiation condition
\[ w = O(|x|^{-1}) \quad\text{and}\quad \partial_r w - 2\pi ikw = o(|x|^{-1}) \quad\text{as } |x| \to \infty, \]
then $w \equiv 0$.
Brief sketch of proof. Using the radiation condition and Green’s theorem, one can firstly obtain the identity
\[ w = (-Vw) * G_k. \]
Further application of Green’s theorem and the radiation condition shows
\[ \lim_{R\to\infty}\int_{|x|=R} |w|^2\,dS = 0. \]
Combining this with the identity for $w$ above and using
\[ G_k(R\omega - x) = CR^{-1}e^{2\pi ikR}e^{-2\pi ikx\omega} + O(R^{-3/2}), \]
this leads to
\[ \int |\widehat{Vw}(k\omega)|^2\,d\omega = 0 \]
(more details may be found in [26]).
Now, by some standard ‘elliptic estimates’, we have that $w$ is smooth, and so $Vw \in C_c^\infty$. This implies rapid decay for $\widehat{Vw}$, and since $\widehat{Vw}$ vanishes on the sphere we get that
\[ \hat{w}(\xi) = \frac{\widehat{Vw}(\xi)}{4\pi^2(k^2 - |\xi|^2)} \]
is also analytic and rapidly decreasing. Using this and integrating by parts repeatedly in the Fourier inversion formula, we can deduce that $w$ is compactly supported.
The final ingredient is the following ‘Carleman estimate’:
\[ \|e^{2\pi Nx_1}w\|_{L^2} \lesssim N^{-2}\|e^{2\pi Nx_1}\Delta w\|_{L^2} \]
for compactly supported $w$ and $N \gg 1$, which follows from the identities
\[ \mathcal{F}[e^{Nx}w](\xi) = \hat{w}(\xi - iN), \quad \mathcal{F}[e^{Nx}\Delta w](\xi) = -4\pi^2|\xi - iN|^2\,\hat{w}(\xi - iN) \]
and the inequality
\[ 1 \le N^{-2}|\xi - iN|^2. \]
Using this and the equation for $w$, we finally obtain
\begin{align*}
\|e^{2\pi Nx_1}w\|_{L^2} &\lesssim N^{-2}\|e^{2\pi Nx_1}(V + 4\pi^2 k^2)w\|_{L^2} \\
&\lesssim_k N^{-2}\|e^{2\pi Nx_1}w\|_{L^2},
\end{align*}
which (choosing $N = N(k)$ sufficiently large) yields $e^{2\pi Nx_1}w \equiv 0$.
Given $\varepsilon > 0$, we define $w_\varepsilon$ to be the ($L^2$) solution to
\[ (H + 4\pi^2(k + i\varepsilon)^2)w_\varepsilon = f, \]
where (as in the previous section) we have that the operator on the left is guaranteed to be invertible. We will show that $w_\varepsilon$ is bounded in $H^1(B_R(0))$
for any R > 0. Then, using the compactness of the embedding H 1 (BR (0)) ,→
L2 (BR (0)), we may obtain a subsequential L2 limit of wε as ε → 0. (We can
then verify that the limit obeys w = (f − V w) ∗ Gk as well as the radiation
condition, and hence is the unique solution we are looking for.)
Lemma 11.1.6. The solutions wε are bounded in L2 as ε → 0 and hence
(by elliptic estimates) in H 1 (BR (0)) for any R > 0.
Proof. Suppose not. Then we may find a sequence of solutions wε with
kwε kL2 → ∞ and ε → 0. Normalizing the L2 -norm, we may obtain a
sequence w̃ε with kw̃ε kL2 ≡ 1 but
(H + 4π 2 (k + iε)2 )w̃ε → 0 in L2 .
Using elliptic estimates, we can deduce that w̃ε is compact in L2 (BR (0)) for
any R > 0 and hence we may extract an L2 limit w̃. But then we may derive
(H + 4π 2 k 2 )w̃ = 0 (and the radiation condition) and hence (by uniqueness)
w̃ ≡ 0. However, this contradicts the fact that kw̃ε kL2 ≡ 1.
11.1.3 The distorted Fourier transform

The previous section shows that we may find an $L^2$ function $w_\xi$ satisfying the radiation conditions and the equation
\[ (H + 4\pi^2|\xi|^2)w_\xi = -Ve^{2\pi ix\xi}. \]
The ‘distorted plane wave’
\[ \Phi_\xi(x) = e^{2\pi ix\xi} + w_\xi(x) \]
then obeys the generalized eigenfunction equation
\[ (H + 4\pi^2|\xi|^2)\Phi_\xi = 0. \]
More generally, we can construct a Green’s function $G_H$ for $H + 4\pi^2 k^2$ (i.e. the integral kernel for $(H + 4\pi^2 k^2)^{-1}$), which is connected to $\Phi$ via
\[ G_H(x, r\omega, k) = Cr^{-1}e^{2\pi ikr}\Phi_{k\omega}(x) + O(r^{-3/2}) \]
and
\[ \partial_r G_H(x, r\omega, k) = 2\pi ikCr^{-1}e^{2\pi ikr}\Phi_{k\omega}(x) + O(r^{-3/2}). \]
In fact, an argument using Green’s theorem yields
\[ \mathrm{Im}\,G_H(x, y, k) = Ck\,\mathrm{Im}\int_{S^2} \Phi_{k\omega}(x)\Phi_{k\omega}(y)\,d\omega \]
(see [26]).
Given the facts above, let us conclude this section by defining the dis-
torted Fourier transform adapted to H and showing how it provides us with
a ‘spectral representation’ of H.
Let us begin by verifying a claim made above, namely:
Lemma 11.1.7. For any $\lambda < 0$, the spectral projection $\chi_{(-\infty,\lambda)}(H)$ has
finite rank.
Proof. Writing $X$ for the range of $\chi_{(-\infty,\lambda)}(H)$, we have
\[ \langle Hf, f\rangle \le \lambda \|f\|_{L^2}^2 \quad \text{for } f \in X. \]
Integrating by parts, this yields
\[ \|\nabla f\|_{L^2} \lesssim_{\lambda,V} \|f\|_{L^2}. \]
This implies that the unit ball in $X$ with the $L^2$ topology is a bounded
subset of $H^1$, and hence compact. This in turn yields that $X$ is finite-dimensional.
In what follows (as we have done above), we therefore assume that H has
no negative eigenvalues (or that we restrict to the orthogonal complement
of the space of eigenfunctions of H).
To define functions of H, we will use the following Plemelj formula (which
follows from contour integration):
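\[ \phi(x) = \frac{1}{2\pi i} \lim_{\varepsilon\to 0} \int_0^\infty \phi(\lambda)\,\bigl[(x + \lambda - i\varepsilon)^{-1} - (x + \lambda + i\varepsilon)^{-1}\bigr]\, d\lambda \quad \text{for } x > 0, \]
or equivalently
\[ \phi(x) = \frac{1}{\pi} \lim_{\varepsilon\to 0} \int_0^\infty \phi(\lambda)\, \operatorname{Im}\bigl[(x + \lambda - i\varepsilon)^{-1}\bigr]\, d\lambda. \]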
Then, by functional calculus, we obtain

\[ \phi(H) = \frac{1}{\pi} \lim_{\varepsilon\to 0} \int_0^\infty \phi(\lambda)\, \operatorname{Im}(H + \lambda - i\varepsilon)^{-1}\, d\lambda. \]
By the Green’s function representation (and taking λ = 4π 2 k 2 , then setting

ξ = kω), we therefore obtain
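\[
\begin{aligned}
\phi(H)(x, y) &= C \int_0^\infty \int_{\mathbb{R}^3} \phi(4\pi^2 k^2)\, \operatorname{Im} G_H(x, y, k)\, dy\, k\, dk \\
&= C \int_0^\infty \int_{\mathbb{R}^3} \int_{S^2} \phi(4\pi^2 k^2)\, \Phi_{k\omega}(x)\, \Phi_{k\omega}(y)\, d\omega\, dy\, k^2\, dk \\
&= C \iint \phi(4\pi^2 |\xi|^2)\, \Phi_\xi(x)\, \Phi_\xi(y)\, dy\, d\xi \\
&= F_H^*\, \phi(4\pi^2 |\xi|^2)\, F_H,
\end{aligned}
\]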
where the distorted Fourier transform FH is defined by

\[ F_H f(\xi) = C \int_{\mathbb{R}^3} f(y)\, \Phi_\xi(y)\, dy. \]
Now one can show that FH is a unitary operator on L2 , and setting

$\phi(x) = e^{-itx}$ we obtain the desired representation
\[ e^{-itH} = F_H^*\, e^{-4it\pi^2 |\xi|^2}\, F_H, \]
which was the final ingredient needed above.

Appendix A
Prerequisite material
The purpose of this chapter is to collect prerequisite material that is used

throughout the main body of these notes.
Let us first recall some standard notation to be used throughout these
notes.
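We use the following multi-index notation. Given $d \ge 1$, a multi-index
$\alpha$ is an element of $\mathbb{N}^d$, where $\mathbb{N} = \{0, 1, 2, \dots\}$. We let
\[ |\alpha| = \sum_{i=1}^d \alpha_i, \qquad x^\alpha = x_1^{\alpha_1} \cdots x_d^{\alpha_d}, \qquad \partial^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}. \]
We write $A \lesssim B$ to denote $A \le CB$ for some $C > 0$. We write $A \ll B$
to indicate $A \le cB$ for some suitably small $c > 0$.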
We write χE for the characteristic function of the set E, that is,
\[ \chi_E(x) = \begin{cases} 1 & x \in E, \\ 0 & x \notin E. \end{cases} \]
A.1 Lebesgue spaces

Given a Lebesgue measurable subset E ⊂ Rd of positive measure and 1 ≤
p < ∞, we define Lp (E) to be the space of measurable functions f such that
\[ \|f\|_{L^p(E)} = \Bigl( \int_E |f(x)|^p\, dx \Bigr)^{1/p} < \infty. \]
The functions $f$ may be either real-valued or complex-valued. The quantity
$\|\cdot\|_{L^p(E)}$ defines a norm. When $p = \infty$, we define
\[ \|f\|_{L^\infty(E)} = \inf\{M : |\{x \in E : |f(x)| > M\}| = 0\}, \]
where $|S|$ denotes the Lebesgue measure of $S$. The quantity $\|\cdot\|_{L^\infty(E)}$ also
defines a norm. We will often drop the underlying set $E$ and simply write
$L^p$.
To be precise, elements of Lp should be regarded as equivalence classes
of functions that are equal almost everywhere; however, we will typically
ignore this distinction.
The spaces Lp are vector spaces. Furthermore they are complete with
respect to the metric defined by the Lp -norm (namely, d(f, g) = kf − gkLp ).
In particular, they are Banach spaces. Furthermore, for 1 ≤ p < ∞, we
have that the space Lp is separable (i.e. admits a countable dense subset).
On the other hand, L∞ is not separable.
For spaces of sequences c = {ck } we use
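\[ \|c\|_{\ell^p} = \Bigl( \sum_k |c_k|^p \Bigr)^{1/p}, \qquad \|c\|_{\ell^\infty} = \sup_k |c_k|. \]
The $\ell^p$ spaces are nested; that is, for $p_1 \le p_2$ we have
\[ \ell^{p_1} \subset \ell^{p_2}, \quad \text{with } \|c\|_{\ell^{p_2}} \le \|c\|_{\ell^{p_1}}. \]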
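The nesting of the sequence spaces can be checked numerically on a finite sequence. The following is a minimal sketch; the helper `lp_norm` and the sample sequence `c` are illustrative choices and do not appear in the notes.

```python
def lp_norm(c, p):
    """l^p norm of a finite sequence c; p = float('inf') gives the sup norm."""
    if p == float("inf"):
        return max(abs(x) for x in c)
    return sum(abs(x) ** p for x in c) ** (1.0 / p)

# Arbitrary illustrative sequence (an assumption, not taken from the notes).
c = [3.0, -4.0, 1.0, 0.5]
exponents = [1, 1.5, 2, 4, float("inf")]
norms = [lp_norm(c, p) for p in exponents]

# By the nesting property, the norms should not increase as p increases.
for p, value in zip(exponents, norms):
    print(p, value)
```

A single numerical example proves nothing, of course; the general statement is Exercise A.4.1.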
The space L2 admits an inner product, denoted by

\[ \langle f, g\rangle = \int f(x)\, \bar g(x)\, dx, \]
where $\bar{\cdot}$ denotes the complex conjugate. Finiteness of $\langle f, g\rangle$ follows from the
Cauchy–Schwarz inequality (or Hölder’s inequality):
\[ |\langle f, g\rangle| \le \|f\|_{L^2} \|g\|_{L^2}. \]
The space L2 is therefore an example of a (separable) Hilbert space, i.e.

an inner product space that is complete with respect to the metric induced
by the inner product.
Given $1 \le p \le \infty$, we define $1 \le p' \le \infty$ via
\[ \frac{1}{p} + \frac{1}{p'} = 1. \]
We call $p'$ the dual exponent to $p$. It is often useful to compute $L^p$ norms
‘by duality’, i.e. using the fact that
\[ \|f\|_{L^p} = \sup |\langle f, g\rangle|, \]
where the supremum is taken over all $g \in L^{p'}$ with $\|g\|_{L^{p'}} = 1$.
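For finite real sequences (where the pairing is just a dot product) the supremum in the duality formula is attained at an explicit $g$, namely $g = \operatorname{sign}(f)\,|f|^{p-1} / \|f\|_p^{p-1}$. The sketch below checks this for one vector; the names `lp_norm`, `f`, `g` and the choice $p = 3$ are assumptions made only for illustration.

```python
def lp_norm(v, p):
    """l^p norm of a finite real sequence, for finite p."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

p = 3.0
p_dual = p / (p - 1.0)            # dual exponent: 1/p + 1/p' = 1
f = [2.0, -1.0, 0.5, 3.0]         # arbitrary illustrative vector

# Candidate extremizer: g = sign(f) |f|^(p-1) / ||f||_p^(p-1).
norm_f = lp_norm(f, p)
g = [(1.0 if x >= 0 else -1.0) * abs(x) ** (p - 1.0) / norm_f ** (p - 1.0)
     for x in f]

pairing = sum(x * y for x, y in zip(f, g))
print(lp_norm(g, p_dual))         # unit norm in the dual space, up to rounding
print(pairing, norm_f)            # the pairing recovers the l^p norm of f
```

Any other unit vector in the dual space gives a pairing of at most $\|f\|_p$ by Hölder’s inequality, which is why this particular $g$ realizes the supremum.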
For a measurable function $f$, we define the distribution function of $f$
by
\[ \alpha \mapsto |\{x : |f(x)| > \alpha\}|. \]
We can compute the $L^p$-norm of a function in terms of its distribution
function as follows:
\[ \int |f|^p\, dx = p \int_0^\infty \alpha^{p-1}\, |\{|f| > \alpha\}|\, d\alpha. \tag{A.1} \]
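Formula (A.1) can be sanity-checked on a step function, for which the distribution function is computable by counting. The sketch below is only illustrative: the function `layer_cake_check`, the sample values, and the midpoint quadrature in $\alpha$ are assumptions of this example and not part of the notes.

```python
def layer_cake_check(values, p, steps_per_unit=1000):
    """Compare both sides of (A.1) for a step function on [0, 1).

    The function takes the value values[i] on the i-th of n equal intervals.
    """
    n = len(values)
    width = 1.0 / n
    # Left-hand side: integral of |f|^p.
    lhs = sum(abs(v) ** p * width for v in values)

    def dist(alpha):
        # Measure of the set where |f| > alpha.
        return width * sum(1 for v in values if abs(v) > alpha)

    # Right-hand side: midpoint rule in alpha on [0, max |f|].
    top = max(abs(v) for v in values)
    steps = int(round(top * steps_per_unit))
    h = top / steps
    rhs = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        rhs += p * alpha ** (p - 1) * dist(alpha) * h
    return lhs, rhs

lhs, rhs = layer_cake_check([0.5, 2.0, 1.0, 3.0], p=2.0)
print(lhs, rhs)   # the two sides should agree up to quadrature error
```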
For $1 \le p < \infty$, we define the weak $L^p$ space by
\[ L^{p,\infty}(\mathbb{R}^d) = \{f : \mathbb{R}^d \to \mathbb{C} : \|f\|_{L^{p,\infty}} < \infty\}, \]
where the $L^{p,\infty}$ quasi-norm is defined by
\[ \|f\|_{L^{p,\infty}} = \sup_{\alpha > 0} \alpha\, |\{|f| > \alpha\}|^{1/p}. \]
A quasi-norm refers to a quantity that satisfies all of the hypotheses of a
norm except the triangle inequality, which must be replaced by $\|x + y\| \le C(\|x\| + \|y\|)$
for some $C > 0$. See the exercises.
When $p > 1$, the $L^{p,\infty}$ quasi-norm is equivalent to an actual norm, while for
$p = 1$ there is no equivalent norm, but there is a metric that generates the
same topology. In either case, one obtains a complete metric space.
Note that $L^p \subset L^{p,\infty}$; indeed, by Tchebyshev’s inequality we have
\[ \alpha^p\, |\{|f| > \alpha\}| \lesssim \|f\|_{L^p}^p \]
uniformly in $\alpha$.
We have Minkowski’s integral inequality:
\[ \|f(x, y)\|_{L^p_x L^1_y} \le \|f(x, y)\|_{L^1_y L^p_x} \]
for $1 \le p \le \infty$.
Let us briefly discuss one other theory of integration, namely, the Riemann–
Stieltjes integral. First recall the definition of functions of bounded varia-
tion.
Definition A.1.1. Let $f : [a, b] \to \mathbb{R}$, and let
\[ \Gamma = \{x_0, \dots, x_m\} \]
be a partition of $[a, b]$. Define
\[ S_\Gamma = S_\Gamma[f; a, b] = \sum_{i=1}^m |f(x_i) - f(x_{i-1})|. \]
The variation of $f$ over $[a, b]$ is defined by
\[ \|f\|_{BV([a,b])} = \sup_\Gamma S_\Gamma[f; a, b]. \]
As $0 \le S_\Gamma < \infty$, we have $\|f\|_{BV} \in [0, \infty]$. If $\|f\|_{BV} < \infty$, we say $f$ is of
bounded variation. We may write $f \in BV([a, b])$. Otherwise, we say $f$ is
of unbounded variation.
Examples of bounded variation functions are those that are continuously
differentiable on an interval. For the reader unfamiliar with the notion of
bounded variation, feel free to replace ‘function of bounded variation’ with
‘continuously differentiable function’ and ‘BV norm’ with ‘L∞ norm of the
derivative’ throughout these notes.
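To make Definition A.1.1 concrete, one can evaluate the sums $S_\Gamma$ for $f = \sin$ on $[0, 2\pi]$, whose variation equals 4 (the function rises by 1, falls by 2, then rises by 1). The following is a small sketch; the helper names and the use of uniform partitions are assumptions of this example.

```python
import math

def variation_sum(f, partition):
    """S_Gamma[f]: sum of |f(x_i) - f(x_{i-1})| over consecutive partition points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(partition, partition[1:]))

def uniform_partition(a, b, m):
    """Partition of [a, b] into m equal subintervals."""
    return [a + (b - a) * i / m for i in range(m + 1)]

for m in (2, 4, 64, 1024):
    s = variation_sum(math.sin, uniform_partition(0.0, 2.0 * math.pi, m))
    print(m, s)
```

The coarse partition with $m = 2$ sees almost none of the variation, while any partition containing the turning points $\pi/2$ and $3\pi/2$ already attains the supremum; this is why the definition takes a supremum over all partitions.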
Let us also recall the definition of Riemann–Stieltjes integration.
Definition A.1.2. Let $f, \phi : [a, b] \to \mathbb{R}$. Let $\Gamma = \{x_i\}_{i=0}^m$ be a partition of
$[a, b]$ and let $\{\xi_i\}_{i=1}^m$ satisfy
\[ x_{i-1} \le \xi_i \le x_i \quad \text{for each } i. \]
The quantity
\[ R_\Gamma := \sum_{i=1}^m f(\xi_i)\,[\phi(x_i) - \phi(x_{i-1})] \]
is called a Riemann–Stieltjes sum for $\Gamma$.
If
\[ I = \lim_{|\Gamma| \to 0} R_\Gamma \]
exists and is finite, then $I$ is called the Riemann–Stieltjes integral of $f$
with respect to $\phi$ on $[a, b]$, denoted
\[ I = \int_a^b f(x)\, d\phi(x) = \int_a^b f\, d\phi. \]
We recall the following results concerning Riemann–Stieltjes integrals.
Proposition A.1.3. Suppose $f \in C([a, b])$ and $\phi \in BV([a, b])$. Then
$\int_a^b f\, d\phi$ exists, and
\[ \Bigl| \int_a^b f\, d\phi \Bigr| \le \|f\|_{L^\infty} \|\phi\|_{BV}. \]
Proposition A.1.4 (Integration by parts formula). If $\int_a^b f\, d\phi$ exists, then
so does $\int_a^b \phi\, df$, and
\[ \int_a^b f\, d\phi = [f(b)\phi(b) - f(a)\phi(a)] - \int_a^b \phi\, df. \]
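The integration by parts formula can be tested numerically with Riemann–Stieltjes sums. The sketch below uses $f(x) = x$ and $\phi(x) = x^2$ on $[0, 1]$, for which $\int f\, d\phi = 2/3$ and $\int \phi\, df = 1/3$; the function `rs_sum`, the uniform partition, and the midpoint tags are choices made for this illustration only.

```python
def rs_sum(f, phi, a, b, m):
    """Riemann-Stieltjes sum of f with respect to phi on a uniform partition.

    The tags xi_i are taken to be the midpoints of the subintervals.
    """
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    total = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        xi = 0.5 * (x0 + x1)
        total += f(xi) * (phi(x1) - phi(x0))
    return total

f = lambda x: x
phi = lambda x: x * x
a, b, m = 0.0, 1.0, 2000

lhs = rs_sum(f, phi, a, b, m)                                   # approximates int f dphi
rhs = f(b) * phi(b) - f(a) * phi(a) - rs_sum(phi, f, a, b, m)   # boundary term minus int phi df
print(lhs, rhs)
```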
A.2 Hilbert spaces

We record here a few basic results concerning Hilbert spaces. Recall that
a Hilbert space is a vector space equipped with an inner product that is
complete with respect to the induced norm. In these notes, we restrict our
attention to the setting of separable Hilbert spaces (i.e. spaces that admit
a countable dense subset).
Lemma A.2.1. Let H1 and H2 be Hilbert spaces. A linear operator T :
H1 → H2 is bounded if and only if it is continuous.
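Proof. Suppose first that $T$ is bounded, say
\[ \|T(f)\|_{H_2} \le M \|f\|_{H_1} \quad \text{for all } f \in H_1. \]
Then linearity gives
\[ \|T(f) - T(g)\|_{H_2} \le M \|f - g\|_{H_1}, \]
so that $T$ is continuous. Conversely, suppose $T$ is continuous. As $T$ is linear, it suffices
to use continuity at 0: there exists $\delta > 0$ such that $\|T(f)\|_{H_2} \le 1$ whenever
$\|f\|_{H_1} \le \delta$. By linearity, this yields $\|T(f)\|_{H_2} \le \delta^{-1} \|f\|_{H_1}$ for all $f$, so that $T$ is bounded.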
Lemma A.2.2 (Riesz representation theorem). Suppose $\ell : H \to \mathbb{R}$ is a
continuous linear functional on a real Hilbert space. Then there exists a
unique $g \in H$ such that
\[ \ell(f) = \langle f, g\rangle, \]
where $\langle \cdot, \cdot\rangle$ denotes the Hilbert space inner product.
Proof. Let {ϕn } be an orthonormal basis for H. To obtain such a basis, start
with a countable dense subset of H and apply the Gram–Schmidt algorithm.
Given a function $f \in H$, let $f_n = \langle f, \varphi_n\rangle$ denote the ‘Fourier coefficients’
of $f$ relative to $\{\varphi_n\}$. (See Section 2.2 for more details.) In particular,
we can uniquely specify a function by prescribing its Fourier coefficients.
We define $g$ by prescribing $g_n = \ell(\varphi_n)$. Then by Plancherel’s theorem
\[ \ell(f) = \sum_n f_n g_n = \langle f, g\rangle. \]
It remains to verify that this procedure defines $g \in H$ and that the result
is unique. To this end, define $P_N g = \sum_{n=1}^N \ell(\varphi_n)\varphi_n$ and let us prove that
$\|P_N g\|_H$ is uniformly bounded in $N$. Indeed, on the one hand, by linearity
we have
\[ \ell(P_N g) = \sum_{n=1}^N [\ell(\varphi_n)]^2. \]
On the other hand, by boundedness of $\ell$, Cauchy–Schwarz, and orthonormality, we have
\[ |\ell(P_N g)| \le M \Bigl\| \sum_{n=1}^N \ell(\varphi_n)\varphi_n \Bigr\|_H \le M \Bigl( \sum_{n=1}^N [\ell(\varphi_n)]^2 \Bigr)^{1/2}. \]
Combining the last two displays yields
\[ \|P_N g\|_H = \Bigl( \sum_{n=1}^N [\ell(\varphi_n)]^2 \Bigr)^{1/2} \le M, \]
which yields the desired bound. (This in turn may be used to show that
$P_N g$ is Cauchy in $H$ as $N \to \infty$ and hence converges to a limit $g$.)
As for uniqueness, we use the fact that if $\langle f, g\rangle = \langle f, h\rangle$ for all $f \in H$
then $g = h$.
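In finite dimensions the construction in the proof can be carried out by hand. The sketch below works in $\mathbb{R}^2$ with a rotated orthonormal basis; the functional `ell`, the basis, and the test vector are hypothetical choices made only to illustrate the recipe $g_n = \ell(\varphi_n)$.

```python
import math

def ell(f):
    """A hypothetical linear functional on R^2, chosen for illustration."""
    return 3.0 * f[0] - f[1]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

s = 1.0 / math.sqrt(2.0)
basis = [[s, s], [s, -s]]          # an orthonormal basis of R^2

# Prescribe the coefficients g_n = ell(phi_n), then assemble g = sum g_n phi_n.
coeffs = [ell(phi) for phi in basis]
g = [sum(c * phi[i] for c, phi in zip(coeffs, basis)) for i in range(2)]

f = [1.5, -2.0]                    # arbitrary test vector
print(g)                           # representing vector of ell
print(ell(f), inner(f, g))         # the two values should coincide
```

The sum of the squared coefficients equals $\|g\|^2$, which is the finite-dimensional counterpart of the bound on $\|P_N g\|_H$ in the proof.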
Using the Riesz representation theorem, we can identify H with its dual
space via the pairing $\langle f, g\rangle$. We say that a sequence $f_n$ converges weakly
to $f$ if
\[ \langle f_n, g\rangle \to \langle f, g\rangle \quad \text{for all } g \in H. \]
We write $f_n \rightharpoonup f$.
Lemma A.2.3. The following properties hold concerning weak convergence:
(a) If $f_n \rightharpoonup f$, then $\|f\| \le \limsup_{n\to\infty} \|f_n\|$.
(b) If $\{f_n\}$ is bounded, then $f_n$ converges weakly along a subsequence.

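Proof. For (a), let $\varepsilon > 0$ and choose a unit vector $g \in H$ so that
\[ \langle f, g\rangle > \|f\| - \varepsilon. \]
Thus by weak convergence and Cauchy–Schwarz,
\[ \|f\| < \langle f, g\rangle + \varepsilon = \lim_{n\to\infty} \langle f_n, g\rangle + \varepsilon \le \limsup_{n\to\infty} \|f_n\| + \varepsilon. \]
As $\varepsilon > 0$ was arbitrary, this implies the result.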

We turn to (b). We let $S = \{g_k\}$ be a countable dense subset of $H$. For
each $k$, the sequence $\{\langle f_n, g_k\rangle\}$
is bounded and hence converges along a subsequence. In particular, by a
diagonal argument we may find a subsequence such that $\langle f_n, g_k\rangle$ converges
to a limit (say $c_k$) for all $k$. We claim that $f_n$ converges weakly along this
subsequence.
To see this, we need to define the limit $f$. By duality, it will suffice to
specify the values $\langle f, g\rangle$ for all $g \in H$. To this end, we fix $g \in H$ and take
a sequence of $g_k \in S$ converging to $g$. We will show that $\{c_k\}$ is Cauchy,
so that it has a limit $c$; we will then define $\langle f, g\rangle = c$ and check that this
defines an element of $H$ to which $f_n$ converge weakly. Let $\varepsilon > 0$ and choose
$K$ large enough that $\|g - g_k\| < \varepsilon$ for $k > K$. Now fix $k, \ell > K$. By choosing
$n$ large enough, we may guarantee that
\[ |\langle f_n, g_k\rangle - \langle f_n, g_\ell\rangle - (c_k - c_\ell)| < \varepsilon. \]
On the other hand, by Cauchy–Schwarz,
\[ |\langle f_n, g_k\rangle - \langle f_n, g_\ell\rangle| = |\langle f_n, g_k - g\rangle - \langle f_n, g_\ell - g\rangle| \le 2M\varepsilon, \]
where $M$ is the uniform bound for the $\{f_n\}$. Combining the two estimates gives
$|c_k - c_\ell| < (2M + 1)\varepsilon$. Thus $\{c_k\}$ is Cauchy and hence
converges to $c$. An intertwining argument shows that we may uniquely define
$f$ as a (linear) functional on $H$ via $\langle f, g\rangle = c$. To see that $f \in H$, we observe
that
\[ |\langle f, g\rangle| = \bigl| \lim_{k\to\infty} \lim_{n\to\infty} \langle f_n, g_k\rangle \bigr| \le M \|g\|, \]
where $g_k \in S$ converges to $g$. In particular $\|f\| \le M$. Finally, for weak
convergence we again let $g \in H$ and choose a sequence $S \ni g_k \to g$. We
then write
\[ \langle f - f_n, g\rangle = \langle f - f_n, g - g_k\rangle + \langle f - f_n, g_k\rangle. \]
The first term is $M \cdot o(1)$ as $k \to \infty$, while the second term converges to
zero by construction. This completes the proof.
Remark A.2.4. Item (b) in the previous lemma is a special case of Alaoglu’s
theorem.
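A standard example shows that weak convergence is genuinely weaker than norm convergence and that the inequality in part (a) can be strict. The sequence space $\ell^2$ and the standard basis vectors $e_n$ are the (assumed) setting of this worked example.

```latex
% Worked example in H = \ell^2, with e_n the n-th standard basis vector.
\[
  \langle e_n, g\rangle = \overline{g_n} \longrightarrow 0
  \quad\text{for every } g = (g_1, g_2, \dots) \in \ell^2,
  \qquad\text{since } \sum_n |g_n|^2 < \infty .
\]
% Hence e_n converges weakly to 0, while
\[
  \|e_n\| = 1 \quad\text{for all } n,
  \qquad
  \|e_n - e_m\| = \sqrt{2} \quad\text{for } n \neq m .
\]
```

Thus $\|0\| = 0 < 1 = \limsup_n \|e_n\|$, and no subsequence of $\{e_n\}$ converges in norm, even though the whole sequence converges weakly.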
Let $T : H_1 \to H_2$ be a (bounded) linear operator. The adjoint of $T$,
denoted $T^*$, is the linear operator from $H_2$ to $H_1$ defined via
\[ \langle Tf, g\rangle_{H_2} = \langle f, T^* g\rangle_{H_1}. \]
We call $T : H \to H$ self-adjoint or symmetric if $T = T^*$.

An operator T : H1 → H2 is called compact if it maps bounded sets to
pre-compact sets. Equivalently, T is compact if whenever {fn } is a bounded
sequence in H1 , {T fn } has a convergent subsequence in H2 . We also have
the following:
Lemma A.2.5. Suppose there exists a sequence of finite-rank operators $T_n$
so that $T_n \to T$. Then $T$ is compact.
Here finite rank means the range of $T_n$ is finite-dimensional, while the
convergence refers to the operator norm
\[ \|T\| = \sup_{\|f\| \le 1} \|Tf\|. \]
Proof. Suppose $T$ has finite-rank approximations $T_n$ and $\{f_m\}$ is a bounded
sequence. Then $f_m$ has a subsequence $f_m^1$ so that $T_1 f_m^1$ converges. Similarly,
we can take a subsequence $f_m^2$ of $f_m^1$ so that $T_2 f_m^2$ converges. For $n > m > K$,
we write
\[ T f_n^n - T f_m^m = (T - T_K) f_n^n + T_K (f_n^n - f_m^m) + (T_K - T) f_m^m. \]
Choosing $K$ large enough, the first and third terms may be made arbitrarily
small due to the fact that $\{f_n^n\}$, $\{f_m^m\}$ are bounded sequences. For
fixed $K$, the middle term tends to zero as $n, m \to \infty$ because $\{f_n^n\}_{n \ge K}$ is a
subsequence of $f_n^K$. Thus $T f_n^n$ is Cauchy and hence convergent. This implies
$T$ is compact.
If X ⊂ Y are two Banach spaces, then we say X is continuously em-

bedded in Y if the inclusion map is continuous (i.e. bounded). We say X
is compactly embedded in Y if the inclusion map is compact.
Lemma A.2.6. Suppose $X$ and $Y$ are Hilbert spaces and $X$ is compactly
embedded in $Y$. If $x_n$ converges weakly in $X$, then $x_n$ converges strongly in
$Y$ (to the same limit).
Proof. It suffices to show that weakly convergent sequences are bounded.
Indeed, if $x_n$ is bounded, then the compactness of the embedding implies
that any subsequence of $x_n$ converges strongly in $Y$ along a further subsequence
(to the weak limit of $x_n$, by uniqueness of limits). This implies that
$x_n$ converges strongly in $Y$.
Let us now show that $x_n$ is bounded. In fact, this will follow from the uniform
boundedness principle (Lemma A.3.3). We view each $x_n$ as a bounded
linear map from $X \to \mathbb{R}$ via $x_n(y) = \langle x_n, y\rangle$. Then weak convergence implies
\[ \sup_n |\langle x_n, y\rangle| < \infty \quad \text{for all } y \in X. \]
By the principle of uniform boundedness, this implies
\[ \sup_n \|x_n\|_{X \to \mathbb{R}} = \sup_n \|x_n\|_X < \infty, \]
as desired. In the final step we have used the dual formulation to compute
the norm of $x_n$, i.e. $\|x_n\| = \sup\{|\langle x_n, y\rangle|\}$ where the supremum is taken
over all $y \in X$ with $\|y\| = 1$.
Let us next sketch a proof of the following fundamental fact (a basic

version of the spectral theorem).
Theorem A.2.7. If $T : H \to H$ is compact and symmetric, then it is diagonalizable.
That is, there exists an orthonormal set $\{u_n\}$ of eigenvectors
for $T$ and a sequence $\{\lambda_n\} \subset \mathbb{R}$ so that $T u_n = \lambda_n u_n$ for all $n$, with $\lambda_n \to 0$.
Furthermore,
\[ T(x) = \sum_n \lambda_n \langle x, u_n\rangle u_n \]
for $x \in H$.
The fact that the λn are real is a consequence of the symmetry of T (see
Exercise A.4.2). Similarly, one can verify that eigenspaces corresponding to
distinct eigenvalues are orthogonal. Without loss of generality, we take H
to be infinite dimensional.
Sketch of proof. Throughout the proof we write $A$ for the operator $T$. We first show
that $A$ has an eigenvalue $\alpha_0$ with $|\alpha_0| = \|A\|$.
We let $\alpha = \|A\| \ge 0$. Using symmetry, one observes
\[ \|A\|^2 = \sup_{\|f\|=1} \langle f, A^2 f\rangle, \]
so that we may take a sequence $u_n$ with $\|u_n\| \equiv 1$ and $\langle u_n, A^2 u_n\rangle \to \alpha^2$.
By compactness of $A$ (and hence of $A^2$; exercise), we have $A^2 u_n \to \alpha^2 u$ for
some $u$ (along a subsequence). But then
\[ \|(A^2 - \alpha^2) u_n\|^2 = \|A^2 u_n\|^2 - 2\alpha^2 \langle u_n, A^2 u_n\rangle + \alpha^4 \le 2\alpha^2 (\alpha^2 - \langle u_n, A^2 u_n\rangle) \to 0. \]
In particular, $A^2 u_n - \alpha^2 u_n \to 0$ and $A^2 u_n \to \alpha^2 u$, so that $u_n \to u$. Thus
$\|u\| = 1$ and $A^2 u = \alpha^2 u$, so that
\[ (A + \alpha)(A - \alpha) u = (A - \alpha)(A + \alpha) u = 0. \]
This shows that either $\alpha$ or $-\alpha$ is an eigenvalue of $A$. In fact, no eigenvalue
of $A$ can exceed $\|A\|$ in absolute value (exercise).
Now let $u_0$ be a normalized eigenvector corresponding to the eigenvalue $\alpha_0$
of largest absolute value. Then set
\[ H^{(1)} = \{f \in H : \langle u_0, f\rangle = 0\}. \]
This is a Hilbert space with $A : H^{(1)} \to H^{(1)}$; indeed
\[ \langle Af, u_0\rangle = \langle f, A u_0\rangle = \alpha_0 \langle f, u_0\rangle = 0 \quad \text{for all } f \in H^{(1)}. \]
Let $A_1 = A|_{H^{(1)}}$. Then $A_1$ is symmetric and compact, and hence we can find
an eigenvalue $\alpha_1$ of largest absolute value with normalized eigenvector $u_1$. We now continue
in this way to construct a sequence of normalized eigenvectors $\{u_j\}$ that are
mutually orthogonal by construction with eigenvalues $\alpha_j$.
We claim that $\alpha_j \to 0$. If not, then (passing to a subsequence) we get
$|\alpha_j| \ge c > 0$ for all $j$. Now consider the bounded sequence $\alpha_j^{-1} u_j$. Then $\{A \alpha_j^{-1} u_j\}$
has no convergent subsequence, since for $j \ne \ell$
\[ \|A \alpha_j^{-1} u_j - A \alpha_\ell^{-1} u_\ell\|^2 = \|u_j - u_\ell\|^2 = 2. \]
This contradicts compactness. The representation of $T$ in terms of eigenvectors
is left as an exercise. Hint: Show that
\[ \Bigl\| T(x) - \sum_{n=1}^N \lambda_n \langle x, u_n\rangle u_n \Bigr\| \le |\lambda_N| \|x\|. \]
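The first step of the sketch (maximizing $\langle u, A^2 u\rangle$ over unit vectors to find an eigenvalue of size $\|A\|$) has a simple finite-dimensional analogue: power iteration with $A^2$. The code below is only an illustration under that analogy; the $2 \times 2$ matrix `A`, the starting vector, and the iteration count are assumptions, and power iteration is a stand-in for the compactness argument rather than the method of the notes.

```python
import math

# A symmetric 2x2 matrix with eigenvalues 3 and 1 (illustrative choice).
A = [[2.0, 1.0], [1.0, 2.0]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(inner(v, v))
    return [x / n for x in v]

# Iterate u -> A^2 u / |A^2 u|, which pushes <u, A^2 u> toward its maximum.
u = normalize([1.0, 0.0])
for _ in range(60):
    u = normalize(matvec(A, matvec(A, u)))

alpha = math.sqrt(inner(u, matvec(A, matvec(A, u))))   # approximates ||A||
Au = matvec(A, u)
residual = math.sqrt(sum((y - alpha * x) ** 2 for x, y in zip(u, Au)))
print(alpha, u, residual)   # residual near 0 means alpha is an eigenvalue of A
```

Here the limit vector satisfies $(A - \alpha)u = 0$; for a matrix whose eigenvalue of largest absolute value is negative, one would instead find $(A + \alpha)u = 0$, matching the two cases in the sketch of proof.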
An operator $T$ is called positive (written $T \ge 0$) if $\langle Tf, f\rangle \ge 0$ for all
$f \in H$. A bounded operator $T : H \to H$ is called trace class if there exists
an orthonormal basis $\{\varphi_n\}$ such that
\[ \operatorname{tr}(T) := \sum_n \langle T\varphi_n, \varphi_n\rangle < \infty. \]
In fact, in this case the trace $\operatorname{tr}(T)$ is independent of the basis chosen. Trace
class operators are compact. We can prove this provided we take a few
Fourier analysis type results for granted (see Section 2.2 for details).
We first claim that if $T$ is trace class, then in fact
\[ \sum_n \|T\varphi_n\|^2 < \infty. \tag{A.2} \]
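Using this, we can complete the proof by exhibiting finite-rank approximations
to $T$. In particular, we set
\[ T_N f = \sum_{n \le N} \langle f, \varphi_n\rangle\, T\varphi_n. \]
Then, using orthogonality and Cauchy–Schwarz, we can write
\[
\begin{aligned}
\|(T - T_N) f\|^2 &= \sum_m \Bigl| \sum_{n > N} \langle f, \varphi_n\rangle \langle T\varphi_n, \varphi_m\rangle \Bigr|^2 \\
&\le \sum_m \Bigl( \sum_{n > N} |\langle f, \varphi_n\rangle|^2 \Bigr) \Bigl( \sum_{n > N} |\langle T\varphi_n, \varphi_m\rangle|^2 \Bigr) \\
&\le \|f\|^2 \sum_{n > N} \sum_m |\langle T\varphi_n, \varphi_m\rangle|^2 \\
&\le \|f\|^2 \sum_{n > N} \|T\varphi_n\|^2 = \|f\|^2 \cdot o(1)
\end{aligned}
\]
as $N \to \infty$, which implies the result.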

It remains to show that trace class operators satisfy (A.2) (which is
actually the definition of ‘Hilbert–Schmidt’ operators). In fact, (A.2) will
follow from the more general bound
\[ \|Tf\|^2 = \langle T^* T f, f\rangle \le \|T\| \langle Tf, f\rangle \]
for positive operators $T$, as we now prove. As positive operators are automatically
self-adjoint, we may replace $T^* T$ with $T^2$. Multiplying both sides
by an arbitrary $\alpha > 0$, we can reduce to the case that $\|T\|$ is as small as we
wish, say $\|T\| < \tfrac12$. In this case, we have by Cauchy–Schwarz
\[ \langle f, f\rangle \ge \|Tf\| \|f\| \ge \langle Tf, f\rangle \ge 0, \]
and so
\[ 0 \le \langle f - Tf, f\rangle \le \langle f, f\rangle. \]
This shows $I - T$ is positive and has norm bounded by 1. Then writing
\[ T = I - (I - T) \]
and using the power series expansion for $\sqrt{1 - x}$ (which converges whenever
$\|x\| \le 1$), we can define a (unique, positive) square root of $T$. Then the
desired bound follows from
\[ \|Tf\|^2 \le \|T^{1/2}\|^2 \|T^{1/2} f\|^2 \le \|T\| \langle Tf, f\rangle. \tag{A.3} \]
A.3 Analysis tools

The convolution of $f$ and $g$ is defined by
\[ f * g(x) = \int f(x - y)\, g(y)\, dy. \]
We use this both when the functions are defined on all of Rd , as well as
when the functions are periodic on some torus. In this section we will focus
on the case of Rd .
Definition A.3.1. We call a family of functions $\{K_n\}$ on $\mathbb{R}^d$ a family of
good kernels if the following three conditions hold:
(i) $\int_{\mathbb{R}^d} K_n(y)\, dy = 1$ for all $n$,
(ii) there exists $M$ such that for all $n$ we have $\int_{\mathbb{R}^d} |K_n(y)|\, dy \le M$,
(iii) for each $\delta > 0$, we have $\int_{|y| > \delta} |K_n(y)|\, dy \to 0$ as $n \to \infty$.
Example A.3.1. Let $K : \mathbb{R}^d \to \mathbb{R}$ satisfy
\[ \int |K(x)|\, dx \lesssim 1 \quad \text{and} \quad \int K(x)\, dx = 1. \]
Then the functions $K_n(x) := n^d K(nx)$ form a family of good kernels.
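The rescaling in Example A.3.1 can be explored numerically in dimension $d = 1$. The sketch below takes $K(x) = e^{-\pi x^2}$ (which has integral 1) and checks conditions (i) and (iii) by quadrature; the Gaussian, the midpoint rule, and the truncation of the real line at $|y| = 10$ are all assumptions of this illustration.

```python
import math

def K(x):
    """Gaussian with total integral 1 (illustrative choice of kernel)."""
    return math.exp(-math.pi * x * x)

def K_n(x, n):
    """Rescaled kernel n^d K(n x) in dimension d = 1."""
    return n * K(n * x)

def integrate(g, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def total_mass(n, cutoff=10.0):
    # Condition (i): the integral of K_n, truncated to [-cutoff, cutoff].
    return integrate(lambda y: K_n(y, n), -cutoff, cutoff)

def tail_mass(n, delta, cutoff=10.0):
    # Condition (iii): the integral of |K_n| over delta < |y| < cutoff.
    return 2.0 * integrate(lambda y: abs(K_n(y, n)), delta, cutoff)

for n in (1, 2, 4, 8):
    print(n, total_mass(n), tail_mass(n, 0.5))
```

The total mass stays at 1 while the mass outside $|y| > 1/2$ shrinks rapidly as $n$ grows, which is the concentration behavior required in condition (iii). The general case is Exercise A.4.5.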

Lemma A.3.2 (Approximation to the identity). Let $f \in L^p(\mathbb{R}^d)$ for some
$1 \le p < \infty$. Suppose $\{K_n\}$ is a family of good kernels. Then
\[ \lim_{n\to\infty} \|f - f * K_n\|_{L^p} = 0. \]
If $f$ is bounded and continuous, the same result holds pointwise. If $f$ is
bounded and uniformly continuous (or if $f$ is continuous and tends to zero
as $|x| \to \infty$), then the convergence is uniform.
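Proof. Let us prove the $L^p$ result and leave the remaining part as an exercise.
Using $\int K_n(y)\, dy \equiv 1$, we see that our task is to prove
\[ \lim_{n\to\infty} \int \Bigl| \int K_n(y)[f(x) - f(x - y)]\, dy \Bigr|^p dx = 0. \]
We let $\varepsilon > 0$ and observe that because translations are continuous in $L^p$,
there exists $\delta > 0$ such that
\[ \sup_{|y| < \delta} \|f(x) - f(x - y)\|_{L^p_x} < \varepsilon. \]
Thus (writing $M$ as the uniform upper bound for $\|K_n\|_{L^1}$ and applying
Minkowski’s integral inequality)
\[
\begin{aligned}
\int \Bigl| \int_{|y| < \delta} K_n(y)[f(x) - f(x - y)]\, dy \Bigr|^p dx
&\lesssim \sup_{|y| < \delta} \int |f(x) - f(x - y)|^p\, dx \cdot \Bigl( \int |K_n(y)|\, dy \Bigr)^p \\
&\lesssim \varepsilon^p M^p
\end{aligned}
\]
uniformly in $n$. On the other hand, applying Minkowski’s integral inequality
once again,
\[
\begin{aligned}
\int \Bigl| \int_{|y| > \delta} K_n(y)[f(x) - f(x - y)]\, dy \Bigr|^p dx
&\le \Bigl( \int_{|y| > \delta} |K_n(y)| \Bigl( \int |f(x) - f(x - y)|^p\, dx \Bigr)^{1/p} dy \Bigr)^p \\
&\lesssim \Bigl( \int_{|y| > \delta} |K_n(y)|\, dy \Bigr)^p \sup_y \int |f(x)|^p + |f(x - y)|^p\, dx \\
&\lesssim o(1) \cdot \int |f(x)|^p\, dx
\end{aligned}
\]
as $n \to \infty$. This completes the proof.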
We next record a result known as the principle of uniform boundedness.

Lemma A.3.3 (Uniform boundedness principle). Let $X$ and $Y$ be Banach
spaces and let $\mathcal{F}$ be a collection of continuous linear operators from $X$ to $Y$.
If
\[ \sup_{T \in \mathcal{F}} \|T(x)\|_Y < \infty \quad \text{for all } x \in X, \tag{A.4} \]
then
\[ \sup_{T \in \mathcal{F}} \|T\|_{X \to Y} < \infty. \tag{A.5} \]
Proof. The standard proof relies on the Baire category theorem. Here we
present a completely elementary proof appearing in [27]. Let us omit the
subscripts from the norms below; the meaning should be clear from context.
First, for any linear operator $T$ and any $x, y$,
\[
\begin{aligned}
\|Ty\| &= \|T(\tfrac12 (x + y) - \tfrac12 (x - y))\| \\
&\le \tfrac12 \bigl[ \|T(x + y)\| + \|T(x - y)\| \bigr] \\
&\le \max \|T(x \pm y)\|.
\end{aligned}
\]
Thus (using linearity once again), we have for any $r > 0$
\[ \|T\| = \tfrac1r \sup_{\|y\| = r} \|Ty\| \le \tfrac1r \sup_{\|y\| \le r} \max \|T(x \pm y)\| \le \tfrac1r \sup_{z \in B(x, r)} \|Tz\|, \]
i.e. for any linear operator we have
\[ \sup_{z \in B(x, r)} \|Tz\| \ge r \|T\| \quad \text{for all } x \in X,\ r > 0. \]
Now suppose (A.5) fails; we will show (A.4) fails as well. We choose
a sequence $T_n \in \mathcal{F}$ such that $\|T_n\| \ge 4^n$. Define $x_0 = 0$. Proceeding
inductively, we may find a sequence $x_n$ such that $x_n \in B(x_{n-1}, 3^{-n})$ and
\[ \|T_n x_n\| \ge \tfrac{9}{10}\, 3^{-n} \|T_n\| \ge \tfrac{9}{10} (\tfrac43)^n. \]
The sequence $x_n$ is Cauchy and hence converges to some $x$. In fact, noting
that
\[ \|x_m - x_n\| \le \sum_{j=n+1}^m \|x_j - x_{j-1}\| \le \sum_{j=n+1}^m 3^{-j} < \tfrac12\, 3^{-n}, \]
one finds $\|x - x_n\| \le \tfrac12 \cdot 3^{-n}$. Now
\[ \|T_n (x - x_n)\| \le \|T_n\|\, \tfrac12 \cdot 3^{-n}, \quad \text{while} \quad \|T_n x_n\| \ge \tfrac{9}{10}\, 3^{-n} \|T_n\|, \]
so that
\[ \|T_n x\| \ge \tfrac25\, 3^{-n} \|T_n\| \ge \tfrac25 (\tfrac43)^n \to \infty, \]
showing the failure of (A.4).
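Lemma A.3.4 (Schur’s test). Let $\{T_{jk}\}$ be a matrix satisfying
\[ \sup_j \sum_k |T_{jk}| \le C < \infty \quad \text{and} \quad \sup_k \sum_j |T_{jk}| \le C < \infty. \]
Then $\|T\|_{\ell^p \to \ell^p} \lesssim C$ for all $1 \le p \le \infty$.
Proof. Let us show the proof in the simplest case $p = 2$ and leave the
remaining cases as an exercise. Using Cauchy–Schwarz and exchanging the
order of summation,
\[
\begin{aligned}
\|Tf\|_{\ell^2}^2 &= \sum_j \Bigl| \sum_k T_{jk} f_k \Bigr|^2 \\
&\lesssim \sum_j \Bigl( \sum_k |T_{jk}|\, |f_k|^2 \Bigr) \Bigl( \sum_\ell |T_{j\ell}| \Bigr) \\
&\lesssim \sup_j \sum_\ell |T_{j\ell}| \cdot \sum_j \sum_k |T_{jk}|\, |f_k|^2 \\
&\lesssim C \cdot \sum_k |f_k|^2 \cdot \sup_k \sum_j |T_{jk}| \lesssim C^2 \|f\|_{\ell^2}^2,
\end{aligned}
\]
which completes the proof in the case $p = 2$.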
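Schur’s test is easy to try out on a small matrix. In the sketch below the matrix entries $T_{jk} = (1 + |j - k|)^{-2}$, the test vector, and the helper names are all illustrative assumptions; the code computes the row and column sums and compares $\|Tf\|_{\ell^2}$ with $C \|f\|_{\ell^2}$.

```python
import math

def schur_constant(T):
    """Larger of the maximal absolute row sum and maximal absolute column sum."""
    row = max(sum(abs(x) for x in r) for r in T)
    col = max(sum(abs(T[j][k]) for j in range(len(T))) for k in range(len(T[0])))
    return max(row, col)

def apply(T, f):
    """Matrix-vector product (T f)_j = sum_k T_jk f_k."""
    return [sum(t * x for t, x in zip(r, f)) for r in T]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

N = 6
# Entries decay away from the diagonal, so row and column sums stay bounded.
T = [[1.0 / (1 + abs(j - k)) ** 2 for k in range(N)] for j in range(N)]
C = schur_constant(T)

f = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]      # arbitrary test vector
ratio = l2(apply(T, f)) / l2(f)
print(C, ratio)                            # Schur's test predicts ratio <= C
```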

Remark A.3.5. There is a completely analogous result concerning operators
of the form $Tf(x) = \int K(x, y) f(y)\, dy$ for some integral kernel $K(x, y)$.
The following is a result from complex analysis. It relies on the max-
imum principle (see e.g. [31]): an analytic (also called holomorphic or
complex differentiable) function on a bounded domain in the complex plane
attains its maximum on the boundary.
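Lemma A.3.6 (The three lines lemma). Let $f$ be analytic on $\{0 \le \operatorname{Re} z \le 1\}$.
Suppose $f$ satisfies
\[ |f(z)| \le e^{C|z|^{2-\delta}} \]
for some $C > 0$ and $\delta > 0$. Suppose that $|f(z)| \le M_0$ when $\operatorname{Re} z = 0$ and
$|f(z)| \le M_1$ when $\operatorname{Re} z = 1$. Then we have
\[ |f(z)| \le M_0^{1 - \operatorname{Re} z} M_1^{\operatorname{Re} z} \]
for all $z$ in the strip.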
Proof. First suppose that M0 ≤ 1 and M1 ≤ 1; we will show that |f (z)| ≤ 1
on the strip.
To this end, we let $\varepsilon > 0$ and set
\[ g(z) = e^{\varepsilon z^2} f(z). \]
This is an analytic function, and because of the hypotheses on $f$ we have
that $|g(x + iy)| \to 0$ as $y \to \pm\infty$ for any $0 \le x \le 1$.
Therefore, applying the maximum principle on a sufficiently large rectangle
$[0, 1] \times [-R, R]$, we may deduce that $|g(z)| \le e^{\varepsilon}$ for all $z$ in the strip.
As this holds for arbitrary $\varepsilon > 0$, we may send $\varepsilon \to 0$ to deduce $|f(z)| \le 1$
on the strip.
For the general case we let $h(z) = f(z) M_0^{z-1} M_1^{-z}$. Then $h$ has exponential
bounds similar to $f$ and is bounded by 1 on the boundary of the strip.
Therefore the previous analysis shows that $|h(z)| \le 1$ everywhere, which
implies the result.
Remark A.3.7. One must impose some restrictions on the growth of the
function $f$ above. Indeed, consider the analytic function $f(z) = \exp\{-i e^{\pi i z}\}$.
Then $|f(x + iy)| = \exp\{e^{-\pi y} \sin \pi x\}$. In particular, $|f(x + iy)| = 1$ for
$x \in \{0, 1\}$ but $f$ is unbounded for $x \in (0, 1)$.
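For completeness, the modulus in Remark A.3.7 can be computed directly; the following short derivation uses only Euler’s formula and $|e^w| = e^{\operatorname{Re} w}$.

```latex
\[
\begin{aligned}
  -i\, e^{\pi i (x + iy)}
  &= -i\, e^{-\pi y} (\cos \pi x + i \sin \pi x)
   = e^{-\pi y} \sin \pi x \;-\; i\, e^{-\pi y} \cos \pi x, \\
  |f(x + iy)|
  &= \exp\bigl\{ \operatorname{Re}\bigl( -i\, e^{\pi i (x + iy)} \bigr) \bigr\}
   = \exp\{ e^{-\pi y} \sin \pi x \}.
\end{aligned}
\]
% On the boundary lines x = 0 and x = 1 we have sin(pi x) = 0, so |f| = 1.
% On the middle line x = 1/2:
\[
  |f(\tfrac12 + iy)| = \exp\{ e^{-\pi y} \} \longrightarrow \infty
  \quad\text{as } y \to -\infty .
\]
```

The growth along $x = \tfrac12$ is doubly exponential in $|y|$, so it violates the hypothesis $|f(z)| \le e^{C|z|^{2-\delta}}$ of Lemma A.3.6, which is consistent with the failure of the conclusion.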
Finally, we recall the Arzelà–Ascoli theorem.
Theorem A.3.8 (Arzelà–Ascoli). Let $K \subset \mathbb{R}^d$ be compact and let $\{f_n\}$ be
a bounded, equicontinuous sequence of functions in $C(K)$. Then $\{f_n\}$ has
a uniformly convergent subsequence.
A.4 Exercises
Exercise A.4.1. Let $1 \le p_1 \le p_2 \le \infty$. Show that
\[ \|a\|_{\ell^{p_2}} \le \|a\|_{\ell^{p_1}} \]
for all $a \in \ell^{p_1}$.

Exercise A.4.2. Show that if T is a symmetric linear operator on a Hilbert
space, then its eigenvalues are necessarily real.
Exercise A.4.3. Show that
\[ \|f\|_{L^\infty} = \inf_{|E| = 0}\, \sup_{x \in E^c} |f(x)|. \]
Exercise A.4.4. Prove Schur’s test (Lemma A.3.4) for general 1 ≤ p ≤ ∞.

Exercise A.4.5. Show that the functions in Example A.3.1 form a family of
good kernels.
Exercise A.4.6. Let $1 \le p < \infty$. Find the best possible $C$ such that
\[ \|f + g\|_{L^{p,\infty}} \le C \bigl( \|f\|_{L^{p,\infty}} + \|g\|_{L^{p,\infty}} \bigr) \]
for all $f, g$.
Exercise A.4.7. Show that if $f_n \rightharpoonup f$ weakly in a Hilbert space $H$ and
$\|f_n\| \to \|f\|$, then $f_n \to f$ strongly in $H$.
Bibliography
[1] H. Brézis and E. Lieb, A relation between pointwise convergence of

functions and convergence of functionals. Proc. Amer. Math. Soc. 88
(1983), 486–490.
[2] L. Carleson, On convergence and growth of partial sums of Fourier

series, Acta Math. 116 (1966), 135–157.
[3] T. Cazenave, Semilinear Schrödinger equations. Courant Lecture

Notes in Mathematics, 10. New York University, Courant Institute
of Mathematical Sciences, New York; American Mathematical Soci-
ety, Providence, RI, 2003. xiv+323 pp.
[4] E. Candès, J. Romberg, and T. Tao, Robust Uncertainty Principles:

Exact Signal Reconstruction From Highly Incomplete Frequency Infor-
mation. IEEE Transactions on Information Theory, Vol. 52, No. 2,
February 2006.
[5] M. Christ and A. Kiselev, Maximal functions associated to filtrations.

J. Funct. Anal. 179 (2001), no. 2, 409–425
[6] M. Christ and M. Weinstein, Dispersion of small amplitude solutions

of the generalized Korteweg-de Vries equation. J. Funct. Anal. 100
(1991), no. 1, 87–109.
[7] I. Daubechies, Ten Lectures on Wavelets. CBMS-NSF Regional Con-

ference Series in Applied Mathematics, 61. Society for Industrial and
Applied Mathematics (SIAM), Philadelphia, PA, 1992. xx+357 pp
[8] L. C. Evans, Partial Differential Equations.
[9] C. Fefferman, Pointwise convergence of Fourier series, Ann. of Math.

98 (1973), 551–571.
[10] G. Folland, A course in abstract harmonic analysis. Second edi-

tion. Textbooks in Mathematics. CRC Press, Boca Raton, FL, 2016.
xiii+305 pp.+loose errata.
[11] S. Foucart and H. Rauhut, A Mathematical Introduction to

Compressive Sensing. Applied and Numerical Harmonic Analysis.
Birkhäuser/Springer, New York, 2013. xviii+625 pp.
[12] D. Griffiths, Introduction to Quantum Mechanics. Pearson 2014.
[13] M. Lacey, Carleson’s theorem: proof, complements, variations. Publ.

Mat. 48 (2004), no. 2, 251–307.
[14] M. Lacey and C. Thiele, A proof of boundedness of the Carleson op-

erator. Math. Res. Lett. 7 (2000), 361–370.
[15] Wheeden and Zygmund, Measure and Integral. An introduction to real

analysis. Pure and Applied Mathematics, Vol. 43. Marcel Dekker, Inc.,
New York-Basel, 1977. x+274 pp.
[16] Katznelson, An Introduction to Harmonic Analysis. Second corrected

edition. Dover Publications, Inc., New York, 1976. xiv+264 pp.
[17] M. Keel and T. Tao, Endpoint Strichartz estimates. Amer. J. Math.

120 (1998), no. 5, 955–980.
[18] R. Killip and M. Visan, Nonlinear Schrödinger equations at critical

regularity. Evolution equations, 325–437, Clay Math. Proc., 17, Amer.
Math. Soc., Providence, RI, 2013.
[19] E. Lieb and M. Loss, Analysis. Graduate Studies in Mathematics, 14.

American Mathematical Society, Providence, RI, 1997. xviii+278 pp.
[20] A. Martinez, An Introduction to Semiclassical and Microlocal Analysis.

Universitext. Springer-Verlag, New York, 2002. viii+190 pp.
[21] S. J. Montgomery-Smith, Time decay for the bounded mean oscillation

of solutions of the Schrödinger and wave equations. Duke Math. J. 91
(1998), no. 2, 393–408.
[22] B. Osgood, Lecture Notes for EE251: The Fourier Transform and its
Applications.
[23] T. Tao, Lecture notes for Math 247A and Math 247B. Available at
http://www.ucla.edu/~tao.
[24] T. Tao, Spherically averaged endpoint Strichartz estimates for the two-
dimensional Schrödinger equation, Comm. Partial Differential Equa-
tions 25 (2000), 1471–1485.
[25] T. Tao, An uncertainty principle for cyclic groups of prime order.

Math. Res. Lett. 12 (2005), no. 1, 121–127.
[26] T. Tao, Scattering for the three-dimensional Schrödinger equation with

compactly supported potential.
[27] A. Sokal, A really simple elementary proof of the uniform boundedness

principle. The American Mathematical Monthly 118, no. 5 (2011).
[28] E. Stein, Singular integrals and differentiability properties of functions.

Princeton Mathematical Series, No. 30 Princeton University Press,
Princeton, N.J. 1970 xiv+290 pp.
[29] E. Stein, Harmonic analysis: real-variable methods, orthogonality,

and oscillatory integrals. With the assistance of Timothy S. Murphy.
Princeton Mathematical Series, 43. Monographs in Harmonic Anal-
ysis, III. Princeton University Press, Princeton, NJ, 1993. xiv+695
pp.
[30] E. Stein and R. Shakarchi, Real analysis. Measure theory, integration,

and Hilbert spaces. Princeton Lectures in Analysis, 3. Princeton Uni-
versity Press, Princeton, NJ, 2005. xx+402 pp.
[31] E. Stein and R. Shakarchi, Complex Analysis. Princeton Lectures

in Analysis, 2. Princeton University Press, Princeton, NJ, 2003.
xviii+379
[32] R. Strichartz, Restrictions of Fourier transforms to quadratic surfaces

and decay of solutions of wave equations. Duke Math. J. 44 (1977),
no. 3 705–714.
[33] A. Zygmund, On Fourier coefficients and transforms of functions of

two variables. Studia Math. 50 (1974), 189–201.
