Analysis2011 PDF
Sergiu Klainerman
Department of Mathematics, Princeton University, Princeton NJ 08544
E-mail address:
seri@math.princeton.edu
Part 1
INTRODUCTION TO PDE
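$$\Delta u = 0, \tag{1}$$
where $\Delta u = \partial_x^2 u + \partial_y^2 u + \partial_z^2 u$. The other two examples described in the section
of fundamental mathematical definitions are
(Heat equation)
$$-\partial_t u + k\Delta u = 0, \tag{2}$$
(Wave equation)
$$-\partial_t^2 u + c^2\Delta u = 0. \tag{3}$$
A fourth basic example is
(Schrödinger equation)
$$i\partial_t u + k\Delta u = 0. \tag{4}$$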
It is a feature of PDEs that small changes in the form of an equation can lead to very different
properties of solutions.
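(Klein-Gordon equation)
$$-\partial_t^2 u + c^2\Delta u - \left(\frac{mc^2}{\hbar}\right)^2 u = 0 \tag{5}$$
$$\partial_x\left(\frac{\partial_x u}{(1 + |\partial_x u|^2 + |\partial_y u|^2)^{1/2}}\right) + \partial_y\left(\frac{\partial_y u}{(1 + |\partial_x u|^2 + |\partial_y u|^2)^{1/2}}\right) = 0. \tag{6}$$
Here $\partial_x$ and $\partial_y$ are shorthand notations for the partial derivatives $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$.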
The equations we have encountered so far can be written in the form $P[u] = 0$,
where $P$ is a differential operator applied to $u$. A differential operator is simply a
rule which takes functions $u$, defined in $\mathbb{R}^n$ or an open subset of it, into functions
$P[u]$ by performing the following operations:
We can take partial derivatives $\partial_i u = \frac{\partial u}{\partial x^i}$ relative to the variables $x = (x^1, x^2, \ldots, x^n)$ of $\mathbb{R}^n$. One allows also higher partial derivatives of $u$, such as the mixed second partials $\partial_i\partial_j u = \frac{\partial^2 u}{\partial x^i \partial x^j}$ or $\partial_i^2 u = \frac{\partial^2 u}{\partial (x^i)^2}$.
We can add and multiply $u$ and its partial derivatives between themselves as
well as with given functions of the variables $x$. Composition with given
functions may also appear.
$$\sum_{i,j=1}^{3} e_{ij} X_i X_j. \tag{7}$$
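The associated differential operators for (2), (3) and (4) are, resp., $P = -\partial_t + k\Delta$,
$P = -\partial_t^2 + c^2\Delta$ and $P = i\partial_t + k\Delta$, with variables $t, x^1, x^2, x^3 \in \mathbb{R}^{1+3}$. In the
particular case of the wave equation (3) it pays to denote the variable $t$ by $x^0$. The
wave operator can then be written in the form,
$$\Box = -\partial_0^2 + \partial_1^2 + \partial_2^2 + \partial_3^2 = \sum_{\alpha,\beta=0}^{3} m^{\alpha\beta}\partial_\alpha\partial_\beta \tag{8}$$
$$\sum_{\alpha,\beta=0}^{3} m_{\alpha\beta} X^\alpha Y^\beta = -X^0 Y^0 + X^1 Y^1 + X^2 Y^2 + X^3 Y^3 \tag{9}$$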
The differential operator $\Box$ is called the D'Alembertian, after the French
mathematician who first introduced it in connection with the equation of a vibrating string.
Observe that the differential operators associated to the equations (1)-(4) are all
linear, i.e.
$$P[\lambda u + \mu v] = \lambda P[u] + \mu P[v],$$
for any functions $u, v$ and real, or complex, numbers $\lambda, \mu$. The following is another
simple example of a linear differential operator
$$P[u] = a_1(x)\partial_1 u + a_2(x)\partial_2 u \tag{10}$$
where $x = (x_1, x_2)$ and $a_1, a_2$ are given functions of $x$. They are called the coefficients of the linear operator. An equation of the form
$$P[u] = f, \tag{11}$$
In the case of the equation (6) the differential operator $P$ can be written, relative
to the variables $x_1$ and $x_2$, in the form,
$$P[u] = \sum_{i=1}^{2} \partial_i\left(\frac{1}{(1 + |\partial u|^2)^{1/2}}\,\partial_i u\right),$$
where $|\partial u|^2 = (\partial_1 u)^2 + (\partial_2 u)^2$. Clearly $P[u]$ is not linear in this case. We call
it a nonlinear operator; the corresponding equation (6) is said to be a nonlinear
equation. An important property of both linear and nonlinear differential operators
is locality. This means that whenever we apply $P$ to a function $u$ which vanishes
in some open set $D$, the resulting function $P[u]$ also vanishes in $D$.
Observe also that our equations (1)-(4) are translation invariant. This means,
in the case (1) for example, that whenever the function $u = u(x)$ is a solution so
is the translated function $x \mapsto u(x + c)$, for any fixed vector $c$.
(13)
One can impose the same boundary condition for solutions of (6), with $D$ a bounded
open domain of $\mathbb{R}^2$. A solution $u = u(x, y)$ of (6) in $D$, verifying the boundary
condition (13), solves the Plateau problem of finding minimal surfaces in $\mathbb{R}^3$ which
pass through a given curve. One can show that the surface given by the graph
$\Gamma_u = \{(x, y, u(x, y)) : (x, y) \in D \subset \mathbb{R}^2\}$ has minimum area among all other graph
surfaces $\Gamma_v$ verifying the same boundary condition, $v|_{\partial D} = u_0$.
Natural boundary conditions can also be imposed for the evolution equations (2)-(4).
The simplest one is to prescribe the values of $u$ on the hyperplane $t = 0$. In
the case of the heat and Schrödinger equations we set,
$$u|_{t=0} = u_0,$$
while in the case of the wave equation, which involves a second derivative in $t$, we
impose two conditions
$$u|_{t=0} = u_0 \quad\text{and}\quad \partial_t u|_{t=0} = u_1 \tag{14}$$
where $u_0, u_1$ are functions of the coordinates $(x, y, z)$, called initial conditions. To
solve the initial value problem in both cases means to find solutions of the equations
for $t > 0$ which verify the corresponding initial conditions at $t = 0$. In addition
one may restrict the variables $(x, y, z)$ to an open domain $D \subset \mathbb{R}^3$. More to the
point one may try to solve a boundary value problem in a domain $[0, \infty) \times D$ with a
boundary condition, such as (13), on $[0, \infty) \times \partial D$ and an initial condition at $t = 0$.
The choice of boundary and initial conditions, for a given PDE, is very
important. Finding which are the good boundary and initial conditions is an important aspect of the general theory of PDE which we shall address in section 2.
For equations of physical interest these appear naturally from the context in which
they are derived. For example, in the case of a vibrating string, which is described
by solutions of the one dimensional wave equation $\partial_t^2 u - \partial_x^2 u = 0$ in the domain
$(a, b) \times \mathbb{R}$, the initial conditions $u = u_0$, $\partial_t u = u_1$ at $t = t_0$ amount to specifying
the original position and velocity of the string. On the other hand the boundary
condition $u(a) = u(b) = 0$ simply means that the two ends of the string are
fixed.
1 The transformations are often linear maps.
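The vibrating string problem just described can be explored numerically. The following is a minimal sketch, not part of the original notes: it assumes the standard second-order leapfrog finite-difference scheme, and the function name `simulate_string`, the grid size `n` and the step ratio `cfl` are illustrative choices.

```python
import math

def simulate_string(u0, u1, a=0.0, b=1.0, n=200, t_end=1.0, cfl=0.5):
    """Leapfrog scheme for u_tt - u_xx = 0 on (a, b) with u(a) = u(b) = 0.

    u0, u1: initial position and velocity (functions of x).
    Returns the grid points and the approximate solution at time t_end.
    """
    dx = (b - a) / n
    steps = math.ceil(t_end / (cfl * dx))   # dt/dx <= cfl < 1 keeps the scheme stable
    dt = t_end / steps
    r2 = (dt / dx) ** 2
    xs = [a + i * dx for i in range(n + 1)]

    prev = [u0(x) for x in xs]
    prev[0] = prev[-1] = 0.0                # fixed ends
    # First step from the Taylor expansion u(dt) = u0 + dt*u1 + (dt^2/2)*u_xx.
    curr = [0.0] * (n + 1)
    for i in range(1, n):
        lap = prev[i + 1] - 2 * prev[i] + prev[i - 1]
        curr[i] = prev[i] + dt * u1(xs[i]) + 0.5 * r2 * lap

    for _ in range(steps - 1):
        nxt = [0.0] * (n + 1)               # end values stay zero
        for i in range(1, n):
            lap = curr[i + 1] - 2 * curr[i] + curr[i - 1]
            nxt[i] = 2 * curr[i] - prev[i] + r2 * lap
        prev, curr = curr, nxt
    return xs, curr
```

For the initial data $u_0(x) = \sin(\pi x)$, $u_1 = 0$ on $(0, 1)$ the exact solution is the standing wave $\sin(\pi x)\cos(\pi t)$, which gives a convenient check of the scheme.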
So far we have only considered equations in one unknown. In reality many of
the equations of interest appear as systems of partial differential equations. The
following important example contains two unknown functions $u_1 = u_1(x_1, x_2)$, $u_2 = u_2(x_1, x_2)$ which verify,
(Cauchy-Riemann)
$$\partial_1 u_2 - \partial_2 u_1 = 0, \qquad \partial_1 u_1 + \partial_2 u_2 = 0. \tag{15}$$
Equation (15) can also be written in the form $P[u] = 0$ by introducing $u = (u_1, u_2)$
as a column vector and $P[u]$ the differential operator,
$$P[u] = \begin{pmatrix} -\partial_2 & \partial_1 \\ \partial_1 & \partial_2 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}. \tag{16}$$
The system of equations (15) contains two equations and two unknowns. This is
the standard situation of a determined system. A system is called overdetermined
if it contains more equations than unknowns and underdetermined if it contains
fewer equations than unknowns. For example the system of two equations and
one unknown $\partial_x u(x, y) = f$, $\partial_y u(x, y) = g$ is clearly overdetermined. A necessary
condition for a solution to exist is $\partial_y f = \partial_x g$, a condition which can be interpreted
as requiring that the one-form $\omega = f(x, y)dx + g(x, y)dy$ is closed, i.e. its exterior derivative $d\omega$ is identically zero. Overdetermined systems, such as De Rham
complexes, play a very important role in geometry.
Observe that (15) is a linear system. Observe also that the operator $P$ has the
following remarkable property.
$$P^2[u] = P[P[u]] = \begin{pmatrix} \Delta u_1 \\ \Delta u_2 \end{pmatrix}.$$
In other words $P^2 = \Delta \cdot I$, with $I$ the identity operator $I[u] = u$, and therefore
$P$ can be viewed as a square root of $\Delta$. One can define a similar type of square
root for the D'Alembertian $\Box$. To achieve this we need $4 \times 4$ complex matrices
$\gamma^0, \gamma^1, \gamma^2, \gamma^3$ which satisfy the property
$$\gamma^\alpha\gamma^\beta + \gamma^\beta\gamma^\alpha = -2m^{\alpha\beta} I \tag{17}$$
(18)
Using (17) we easily check that $D^2 u = \Box u$. Thus the Dirac operator $D$ can be
viewed as a square root of the D'Alembertian $\Box$. It leads to the following fundamental equation introduced by Dirac as the equation of a free, massive, relativistic
particle such as the electron:
(Dirac Equation)
$$Du = ku \tag{19}$$
(20)
where $\Delta_S$, called the Laplace-Beltrami operator of $S$, is a straightforward adaptation of the Laplace operator, see (1), to the surface $S$. Thus the proof of the
uniformization theorem reduces to solving equation (20), i.e. for a given surface $S$
with Gauss curvature $K$, find a real valued function $u$ which verifies (20).
We give below a precise definition of the operator $\Delta_S$ relative to a system of local coordinates $x = (x^1, x^2)$ on an open coordinate chart $D \subset S$. Denote by
$G(x) = (g_{ab}(x))_{a,b=1,2}$ the $2 \times 2$ matrix whose entries are the components of our
Riemannian metric on $D$. Let $G^{-1}(x)$ denote the matrix inverse to $G(x)$ and denote
its components by $(g^{ab}(x))_{a,b=1,2}$. Thus, for all $x \in D$,
$$\sum_c g_{ac}(x)\,g^{cb}(x) = \delta_{ab}$$
with $\delta_{ab}$ the usual Kronecker symbol. We also set, as before, $|g(x)| = \det(G(x))$
and define,
$$\Delta_S u(x) = \frac{1}{\sqrt{|g(x)|}} \sum_{a,b=1,2} \partial_b\left(\sqrt{|g(x)|}\; g^{ab}(x)\, \partial_a u(x)\right)$$
Typically we suppress the explicit dependence on $x$ in the above formula. It is also
very convenient to use Einstein's summation convention over repeated indices, and
thus write,
$$\Delta_S u = \frac{1}{\sqrt{|g|}}\, \partial_b\left(\sqrt{|g|}\; g^{ab}\, \partial_a u\right) \tag{21}$$
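Formula (21) can be tested numerically on a familiar surface. The sketch below is an illustration only: it assumes the round unit sphere in coordinates $x = (\theta, \varphi)$ with metric $\mathrm{diag}(1, \sin^2\theta)$, and approximates every derivative in (21) by central differences. The helper names `partial`, `sphere_metric` and `laplace_beltrami` are hypothetical.

```python
import math

def partial(f, x, a, h=1e-4):
    """Central-difference approximation of the a-th partial derivative of f at x."""
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    return (f(xp) - f(xm)) / (2 * h)

def sphere_metric(x):
    """Assumed example metric: round unit sphere, x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def laplace_beltrami(u, metric, x, h=1e-4):
    """Evaluate (1/sqrt|g|) d_b( sqrt|g| g^{ab} d_a u ) at the point x."""
    def det(G):
        return G[0][0] * G[1][1] - G[0][1] * G[1][0]

    def flux(y, b):
        # b-th component of sqrt|g| g^{ab} d_a u at the point y
        G = metric(y)
        d = det(G)
        Ginv = [[G[1][1] / d, -G[0][1] / d], [-G[1][0] / d, G[0][0] / d]]
        return math.sqrt(abs(d)) * sum(Ginv[a][b] * partial(u, y, a, h) for a in range(2))

    div = 0.0
    for b in range(2):
        div += partial(lambda y, b=b: flux(y, b), x, b, h)
    return div / math.sqrt(abs(det(metric(x))))
```

On the unit sphere the function $u = \cos\theta$ satisfies $\Delta_S u = -2u$, so evaluating `laplace_beltrami(lambda x: math.cos(x[0]), sphere_metric, [1.0, 0.3])` should return a value close to $-2\cos(1)$.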
As a third example we consider the Ricci flow equation on a compact $n$ dimensional manifold $M$, which is described in one of the articles of the Compendium. In
the particular case of three dimensions the equation has been recently used, decisively, to provide the first proof of Thurston's geometrization conjecture, including
the well known Poincaré conjecture. The geometrization conjecture, described in
the topology section of the Compendium, is the precise analogue, in three space
dimensions, of the 2-dimensional uniformization theorem mentioned above. The
Ricci flow is defined, in arbitrary local coordinates $x = (x^1, x^2, x^3)$ on $M$, by the
equation:
(Ricci Flow)
$$\partial_t g_{ij} = R_{ij}(g) \tag{22}$$
Here $g_{ij} = g_{ij}(t)$ is a family of Riemannian metrics depending smoothly on the
parameter $t$ and $R_{ij}(g)$ denotes the Ricci curvature of the metric $g_{ij}$. This is simply
a three dimensional generalization of the Gauss curvature we have encountered
in the uniformization theorem. In a given system of coordinates $R_{ij}(g)$ can be
calculated in terms of the metric coefficients $g_{ij}$ and their first and second partial
derivatives. Since both $g_{ij}$ and $R_{ij}$ are symmetric relative to $i, j = 1, 2, 3$ we can
interpret (22) as a non-linear system of six equations with six unknowns. On a
closer look it turns out that (22) is related to the heat equation (2). Indeed, by
a straightforward calculation relative to a particular system of coordinates $x = (x^1, x^2, x^3)$ called harmonic, it can be shown that the Ricci flow (22) takes the form
$$\partial_t g_{ij} - \Delta_g g_{ij} = N_{ij}(g, \partial g) \tag{23}$$
where each $N_{ij}$, $i, j = 1, 2, 3$, is a function of the components $g_{ij}$ and their first
partial derivatives with respect to the coordinates $x$, and $\Delta_g$ is, again, a differential
operator very similar to the Laplacian $\Delta$ in $\mathbb{R}^3$, see (??). More precisely, if $G^{-1} = (g^{ab})_{a,b=1,2,3}$ denotes the matrix inverse to $G = (g_{ab})_{a,b=1,2,3}$ we can write, using
the summation convention,
$$\Delta_g = g^{ab}\partial_a\partial_b = \sum_{a,b=1}^{3} g^{ab}\partial_a\partial_b.$$
and short discussion of its importance in General Relativity can be found (see
compendium article). Solutions to the Einstein vacuum equations are given by
Ricci flat spacetimes, that is Lorentzian manifolds (M, g) with M a four dimensional
manifold and g a Lorentz metric on it, for which the corresponding Ricci curvature
vanishes identically.
(Einstein-vacuum)
Ric(g) = 0.
(24)
The Ricci curvature of a Lorentz metric, Ric(g), can be defined in exactly the same
way as in the Riemannian case. Thus relative to a coordinate system $x^\alpha$, with
$\alpha = 0, 1, 2, 3$, the Ricci curvature, denoted by $R_{\alpha\beta}$, can be expressed in terms of
the first and second partial derivatives of the metric coefficients $g_{\alpha\beta}$. As before,
we denote by $g^{\alpha\beta}$ the components of the inverse metric. Moreover, by picking
a specified system of coordinates, called wave coordinates⁴, we can express the
Einstein-vacuum equations (24) in the form of a system of equations related to the
wave equation (3), in the same way the Ricci flow system (23) was related to the
heat equation (2). More precisely,
$$\Box_g g_{\alpha\beta} = N_{\alpha\beta}(g, \partial g) \tag{25}$$
where, as in the case of the Ricci flow, the terms $N_{\alpha\beta}(g, \partial g)$ are expressions, which
can be calculated explicitly, depending on the metric $g_{\alpha\beta}$, its inverse $g^{\alpha\beta}$ and the
first derivatives of $g_{\alpha\beta}$ relative to the coordinates $x^\alpha$. This is a system of 10 equations with respect to the ten unknown components of the metric $(g_{\alpha\beta})_{\alpha,\beta=0,1,2,3}$.
The differential operator,
$$\Box_g = \sum_{\mu,\nu} g^{\mu\nu}\partial_\mu\partial_\nu$$
appearing on the left hand side is very similar to the wave operator $\Box = m^{\mu\nu}\partial_\mu\partial_\nu = -\partial_0^2 + \Delta$ which we have encountered before in (8). Indeed, in a neighborhood of a
point $p \in M$ we can pick our wave coordinates $x^\alpha$ in such a way that $g^{\mu\nu}(p) = m^{\mu\nu}$.
Thus, locally, $\Box_g$ looks like $\Box = \Box_m$ and we can thus interpret (25) as a nonlinear
system of wave equations.
The last two examples illustrate the importance of choosing good coordinates for
equations which are defined in terms of geometric quantities, such as the Ricci
curvature. To solve such equations and find interesting properties of the solutions,
it is often very important to pick a well adapted system of coordinates. In the
case of gauge field theories, such as Yang-Mills equations, the role of coordinates is
replaced by gauge transformations.
Finally we need to note that PDE arise not only in Physics and Geometry but also
in many fields of applied science. In engineering, for example, one often wants to
impose auxiliary conditions on solutions of a PDE, corresponding to a part of a
physical system which we can directly influence, such as the portion of the string
of a violin in direct contact with the bow, in order to control their behavior, i.e.
obtain a beautiful sound. The mathematical theory dealing with this issue is called
Control Theory.
4 They are the exact analogue of the harmonic coordinates discussed above.
Often, when dealing with complex physical systems, when we cannot possibly have
complete information about the state of the system at any given time, one makes
various randomness assumptions about various factors which influence it. This
leads to a very important class of equations called stochastic differential equations.
To give a simple example consider the $N \times N$ system of ordinary differential
equations,
$$\frac{dx}{dt} = f(x(t)) \tag{26}$$
Here $f$ is a given function $f : \mathbb{R}^N \to \mathbb{R}^N$. A solution $x(t)$ is a vector valued function
$x : [0, \infty) \to \mathbb{R}^N$. Given an initial data $x(0) = x_0$ we can precisely determine
the position $x(t)$ and velocity $\frac{dx}{dt}$ of the solution at any given time $t$. In applied
situations, because of various factors which are hard to take into account, the state
of the solution may not be so neatly determined. It is thus reasonable to modify
the equation to take into account random effects which influence the system. One
then looks at an equation of the form,
$$\frac{dx}{dt} = f(x(t)) + B(x(t))\,\frac{dW}{dt}(t) \tag{27}$$
where $B(x)$ is an $N \times M$ dimensional matrix and $W(t)$ denotes the Brownian motion
in $\mathbb{R}^M$. Similar modifications, which take randomness into account, can be made for
partial differential equations. A particularly interesting example of a PDE, which
is derived from a stochastic process, related to the price of stock options in finance,
is the well known Black-Scholes equation. The real price of a stock option $u(s, t)$
at time $t$ and value $s$, verifies the PDE,
$$\partial_t u + rs\,\partial_s u + \frac{\sigma^2}{2}s^2\,\partial_s^2 u - ru = 0, \qquad s > 0,\ t \in [0, T], \tag{28}$$
subject to the terminal condition at expiration time $T$, $u(s, T) = \max(0, s - p)$, and
boundary condition $u(0, t) = 0$, $t \in [0, T]$. Here $p$ is the strike price of the option.
Observe that this equation is in fact a (time-reversed) variant of the heat equation (2), thus illustrating the point made above that a single class of mathematical
equations can arise in several completely different applications (in this case, thermodynamics and mathematical finance).
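A stochastic equation of the form (27) is usually simulated by replacing $\frac{dW}{dt}\,dt$ with independent Gaussian increments. The sketch below assumes the standard Euler-Maruyama discretization, which the notes do not discuss; the function name `euler_maruyama` and its arguments are illustrative.

```python
import math
import random

def euler_maruyama(f, B, x0, t_end, steps, rng):
    """One sample path of dx = f(x) dt + B(x) dW, returned at time t_end.

    f(x) returns a list of length N, B(x) an N x M matrix (list of rows),
    and each Brownian increment dW_j is drawn from N(0, dt).
    """
    dt = t_end / steps
    x = list(x0)
    for _ in range(steps):
        drift = f(x)
        Bx = B(x)
        m = len(Bx[0])
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(m)]
        x = [x[i] + drift[i] * dt + sum(Bx[i][j] * dW[j] for j in range(m))
             for i in range(len(x))]
    return x
```

With $B = 0$ the scheme reduces to the explicit Euler method for (26); for instance `euler_maruyama(lambda x: [-x[0]], lambda x: [[0.0]], [1.0], 1.0, 1000, random.Random(0))` approximates $e^{-1}$.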
CHAPTER 1
1. Basic Notions
In this section we will discuss some basic examples of linear and nonlinear equations
which arise variationally from a relativistic Lagrangian. The fundamental objects
of a relativistic field theory are:
Space-time $(M, g)$ which consists of an $n + 1$ dimensional manifold $M$
and a Lorentz metric $g$; i.e. a nondegenerate quadratic form with signature $(-1, 1, \ldots, 1)$ defined on the tangent space at each point of $M$. We
denote the coordinates of a point in $M$ by $x^\alpha$, $\alpha = 0, 1, \ldots, n$.
Throughout most of this chapter the space-time will in fact be the
simplest possible example - namely, the Minkowski space-time in which
the manifold is $\mathbb{R}^{n+1}$ and the metric is given by
$$ds^2 = m_{\alpha\beta}\,dx^\alpha dx^\beta = -dt^2 + (dx^1)^2 + \cdots + (dx^n)^2 \tag{29}$$
with $t = x^0$, $m_{\alpha\beta} = \mathrm{diag}(-1, 1, \ldots, 1)$. Recall that any system of coordinates for which the metric has the form (29) is called inertial. Any two
inertial coordinate systems are related by Lorentz transformations.
Collection of fields $\psi = \{\psi^{(1)}, \psi^{(2)}, \ldots, \psi^{(p)}\}$ which can be scalars, tensors, or some other geometric objects¹ such as spinors, defined on $M$.
Lagrangian density $L$ which is a scalar function on $M$ depending only
on the tensorfields $\psi$ and the metric² $g$.
We then define the corresponding action $S$ to be the integral,
$$S = S[\psi, g : U] = \int_U L[\psi]\,dv_g$$
where $U$ is any relatively compact set of $M$. Here $dv_g$ denotes the volume element
generated by the metric $g$. More precisely, relative to a local system of coordinates
$x^\alpha$, we have $dv_g = \sqrt{-g}\,dx^0 dx^1 \ldots dx^n$, with $g$ the determinant of $g_{\alpha\beta}$.
(1) At $s = 0$, $\psi_{(0)} = \psi$.
(2) At all points $p \in M \setminus U$ we have $\psi_{(s)} = \psi$.
Given such a variation we denote $\delta\psi := \dot\psi := \frac{d\psi_{(s)}}{ds}\Big|_{s=0}$, so that
$$\psi_{(s)} = \psi + s\dot\psi + O(s^2)$$
A field $\psi$ is said to be stationary with respect to $S$ if, for any compact variation
$(\psi_{(s)}, U)$ of $\psi$, we have
$$\frac{d}{ds}S(s)\Big|_{s=0} = 0$$
where,
$$S(s) = S[\psi_{(s)}, g; U]$$
We write this in shorthand notation as
$$\frac{\delta S}{\delta\psi} = 0$$
The Action Principle, also called the Variational Principle, states that an acceptable
solution of a physical system must be stationary with respect to a given Lagrangian
density called the Lagrangian of the system. The action principle allows us to derive
partial differential equations for the fields $\psi$ called the Euler-Lagrange equations.
Here are some simple examples:
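1. Scalar Field Equations:
One starts with the Lagrangian density
$$L[\phi] = -\frac{1}{2}g^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi - V(\phi)$$
where $\phi$ is a complex scalar function defined on $(M, g)$ and $V(\phi)$ a given real
function of $\phi$.
Given a compact variation $(\phi_{(s)}, U)$ of $\phi$, we set $S(s) = S[\phi_{(s)}, g; U]$. Integration
by parts gives,
$$\frac{d}{ds}S(s)\Big|_{s=0} = \int_U \left[-g^{\mu\nu}\partial_\mu\dot\phi\,\partial_\nu\phi - V'(\phi)\dot\phi\right]\sqrt{-g}\,dx = \int_U \dot\phi\left[\Box_g\phi - V'(\phi)\right]dv_g$$
where $\Box_g$ is the D'Alembertian,
$$\Box_g\phi = \frac{1}{\sqrt{-g}}\,\partial_\mu\left(g^{\mu\nu}\sqrt{-g}\,\partial_\nu\phi\right).$$
In view of the action principle and the arbitrariness of $\dot\phi$ we infer that $\phi$ must satisfy
the following Euler-Lagrange equation
$$\Box_g\phi - V'(\phi) = 0, \tag{30}$$
Equation (30) is called the scalar wave equation with potential $V(\phi)$.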
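As a worked instance of the scalar field equation (30), read as $\Box_g\phi - V'(\phi) = 0$, one can specialize to the Minkowski space-time. The choices below are assumptions made for illustration and are not taken from the notes: a real field $\phi$, the metric $m = \mathrm{diag}(-1, 1, \ldots, 1)$ of (29), and the quadratic potential $V(\phi) = \frac{1}{2}m_0^2\phi^2$ with a hypothetical constant $m_0$.

```latex
% Assumptions: real field \phi, Minkowski metric m = diag(-1,1,...,1),
% quadratic potential V(\phi) = (1/2) m_0^2 \phi^2 (m_0 is an illustrative constant).
\begin{align*}
L[\phi] &= -\tfrac12\, m^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi - \tfrac12 m_0^2\phi^2
         = \tfrac12(\partial_t\phi)^2 - \tfrac12|\nabla\phi|^2 - \tfrac12 m_0^2\phi^2, \\
V'(\phi) &= m_0^2\,\phi, \\
\Box\phi - V'(\phi) &= -\partial_t^2\phi + \Delta\phi - m_0^2\,\phi = 0 .
\end{align*}
```

Under these assumptions the Euler-Lagrange equation has the form of the Klein-Gordon equation (5) with $c = 1$ and $m_0$ in place of $mc^2/\hbar$.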
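CONFORMAL PROPERTIES
2. Wave Maps:
The wave map equations will be defined in the context of a space-time $(M, g)$, a
Riemannian manifold $N$ with metric $h$, and a mapping
$$\phi : M \to N.$$
We recall that if $X$ is a vectorfield on $M$ then $\phi_* X$ is the vectorfield on $N$ defined
by $\phi_* X(f) = X(f \circ \phi)$. If $\omega$ is a 1-form on $N$ its pull-back $\phi^*\omega$ is the 1-form on $M$
defined by $\phi^*\omega(X) = \omega(\phi_* X)$, where $X$ is an arbitrary vectorfield on $M$. Similarly
the pull-back of the metric $h$ is the symmetric 2-covariant tensor on $M$ defined by
the formula $(\phi^* h)(X, Y) = h(\phi_* X, \phi_* Y)$. In local coordinates $x^\alpha$ on $M$ and $y^a$ on
$N$, if $\phi^a$ denotes the components of $\phi$ relative to $y^a$, we have,
$$(\phi^* h)_{\alpha\beta}(p) = \frac{\partial\phi^a}{\partial x^\alpha}\frac{\partial\phi^b}{\partial x^\beta}\,h_{ab}(\phi(p)) = \left\langle\frac{\partial\phi}{\partial x^\alpha}, \frac{\partial\phi}{\partial x^\beta}\right\rangle$$
where $\langle\,,\rangle$ denotes the Riemannian scalar product on $N$.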
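$$I_1 = -\frac{1}{2}\int_U g^{\mu\nu}\,\frac{\partial h_{ab}(\phi)}{\partial\phi^c}\,\dot\phi^c\,\partial_\mu\phi^a\,\partial_\nu\phi^b\,\sqrt{-g}\,dx$$
$$I_2 = -\int_U g^{\mu\nu}\,h_{ab}(\phi)\,\partial_\mu\dot\phi^a\,\partial_\nu\phi^b\,\sqrt{-g}\,dx$$
After integrating by parts, relabelling and using the symmetry in $b, c$, we can rewrite
$I_2$ in the form,
$$I_2 = \int_U \dot\phi^a\left(h_{ab}(\phi)\,\Box_g\phi^b + g^{\mu\nu}\,\frac{\partial h_{ab}}{\partial\phi^c}\,\partial_\mu\phi^c\,\partial_\nu\phi^b\right)dv_g \tag{32}$$
$$= \int_U \dot\phi^a\left(h_{ab}(\phi)\,\Box_g\phi^b + \frac{1}{2}\,g^{\mu\nu}\,\partial_\mu\phi^b\,\partial_\nu\phi^c\left(\frac{\partial h_{ab}}{\partial\phi^c} + \frac{\partial h_{ac}}{\partial\phi^b}\right)\right)dv_g$$
Also, relabelling indices
$$I_1 = -\frac{1}{2}\int_U \dot\phi^a\,\frac{\partial h_{bc}}{\partial\phi^a}\,g^{\mu\nu}\,\partial_\mu\phi^b\,\partial_\nu\phi^c\,dv_g.$$
Therefore,
$$\begin{aligned}
0 = I_1 + I_2 &= \int_U \dot\phi^a\left(h_{ab}\,\Box_g\phi^b + g^{\mu\nu}\,\partial_\mu\phi^b\,\partial_\nu\phi^c\;\frac{1}{2}\left(\frac{\partial h_{ab}}{\partial\phi^c} + \frac{\partial h_{ac}}{\partial\phi^b} - \frac{\partial h_{bc}}{\partial\phi^a}\right)\right)dv_g \\
&= \int_U \dot\phi^a\,h_{ad}\left(\Box_g\phi^d + g^{\mu\nu}\,\partial_\mu\phi^b\,\partial_\nu\phi^c\;\frac{1}{2}\,h^{ds}\left(\frac{\partial h_{sb}}{\partial\phi^c} + \frac{\partial h_{sc}}{\partial\phi^b} - \frac{\partial h_{bc}}{\partial\phi^s}\right)\right)dv_g \\
&= \int_U \dot\phi^a\,h_{ad}\left(\Box_g\phi^d + \partial_\mu\phi^b\,\partial_\nu\phi^c\,g^{\mu\nu}\,\Gamma^d_{bc}\right)dv_g
\end{aligned}$$
where
$$\Gamma^d_{bc} = \frac{1}{2}\,h^{ds}\left(\frac{\partial h_{sb}}{\partial\phi^c} + \frac{\partial h_{sc}}{\partial\phi^b} - \frac{\partial h_{bc}}{\partial\phi^s}\right)$$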
Example:
(33)
= f 0 (r)f (r)g 2 2
f 0 (r)
g 1 2
=
f (r)
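The equations of wave maps can be given a simpler formulation when $N$ is a submanifold of the Euclidean space $\mathbb{R}^m$. In this case, the metric $h$ is the Euclidean
metric³ so the first term in (31) vanishes.
$$\frac{d}{ds}S(s)\Big|_{s=0} = -\int_U g^{\mu\nu}\left\langle\frac{\partial\dot\phi}{\partial x^\mu}, \frac{\partial\phi}{\partial x^\nu}\right\rangle dv_g = \int_U \langle\dot\phi, \Box_g\phi\rangle\,dv_g$$
Since $\dot\phi$ is an arbitrary vector tangent to $N$, we infer
$$(\Box_g\phi)^T = 0 \tag{34}$$
where $T$ here means the projection onto the tangent space of $N$ at $\phi(p)$.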
In the special case when $N \subset \mathbb{R}^m$ is a hypersurface, we can rewrite (34) in a more
concrete form. Let $\nu$ be the unit normal on $N$ and $k$ the second fundamental
form $k(X, Y) = \langle D_X\nu, Y\rangle$, with $D_X$ the standard covariant derivative of Euclidean
space. The hypersurface $N$ is defined (locally) as the level set of some real valued
3 Use the standard coordinates of the ambient Euclidean space.
where $\phi_*(E_\mu) = \frac{\partial\phi^i}{\partial x^\mu}\frac{\partial}{\partial y^i}$ is the pushforward of $E_\mu = \frac{\partial}{\partial x^\mu}$. In particular, $\phi_*(E_\mu)$ is
tangent to $N$. Therefore,
< , >= k( (E ), (E ))
(35)
,
>
x x
3. Maxwell equations:
An electromagnetic field $F$ is an exact two form on a four dimensional manifold
$M$. That is, $F$ is an antisymmetric tensor of rank two such that
$$F = dA \tag{36}$$
(37)
yields another gauge potential $\tilde A$ for $F$. This degree of arbitrariness is called gauge
freedom, and the transformations (37) are called gauge transformations.
The Lagrangian density for electromagnetic fields is
$$L[F] = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu}.$$
$$\dot A = \frac{d}{ds}A_{(s)}\Big|_{s=0}$$
gF
dvg
=
A F dvg =
A
g
U
U
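Note that the second factor in the integrand is just $D_\mu F^{\mu\nu}$ where $D$ is the covariant
derivative on $M$ corresponding to $g$. Hence the Euler-Lagrange equations take the
form
$$D_\mu F^{\mu\nu} = 0. \tag{38}$$
Recall the divergence formula for a vectorfield $X$,
$$D_\mu X^\mu = \frac{1}{\sqrt{-g}}\,\partial_\mu\left(\sqrt{-g}\,X^\mu\right)$$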
We can write the Maxwell equations in a more symmetric form by using the Hodge
dual of $F$,
$${}^\star F_{\mu\nu} = \frac{1}{2}\,\epsilon_{\mu\nu\alpha\beta}F^{\alpha\beta}$$
and by noticing that (38) is equivalent to $d\,{}^\star F = 0$. The Maxwell equations then
take the form
$$dF = 0, \qquad d\,{}^\star F = 0 \tag{39}$$
or, equivalently,
$$D^\mu F_{\mu\nu} = 0, \qquad D^\mu\,{}^\star F_{\mu\nu} = 0 \tag{40}$$
Note that since Lorentz transformations commute with both the Hodge dual and
exterior differentiation, the Lorentz invariance of the Maxwell equations is explicit
in (39).
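Definition. Given a vectorfield $X$, the one-forms
$$E_\mu = (i_X F)_\mu = X^\nu F_{\nu\mu}, \qquad H_\mu = (i_X\,{}^\star F)_\mu = X^\nu\,{}^\star F_{\nu\mu}$$
are called, respectively, the electric and magnetic components of $F$. Note that both
these one-forms are perpendicular to $X$.
We specialize to the case when $M$ is the Minkowski space and $X = \frac{d}{dx^0} = \frac{d}{dt}$. As
remarked, $E$, $H$ are perpendicular to $\frac{d}{dt}$, so $E_0 = H_0 = 0$. The spatial components
are by definition
$$E_i = F_{0i}, \qquad H_i = {}^\star F_{0i} = \frac{1}{2}\,\epsilon_{0ijk}F^{jk} = \frac{1}{2}\,\epsilon_{ijk}F^{jk}$$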
We now use (39) to derive equations for E and H from above, which imply
D ?F = 0
(41)
0,
i Hi = 0
k
(42)
0 Ei + j Fij
= 0 Ei + ijk j H k = t Ei + ( H)i
t Hi ijk j Ek = t Hi ( E)i
Therefore,
t E + H
(43)
t H E
(44)
(0)
Ei ,
(0)
Hi (0, x) = Hi
Exercise. Show that the equations (42) are preserved by the time evolution of
the system (43)-(44). In other words if $E^{(0)}$, $H^{(0)}$ satisfy (42) then they are satisfied
by $E$, $H$ for all times $t \in \mathbb{R}$.
4. Yang-Mills equations:
The Lagrangians of all classical field theories exhibit the symmetries of the space-time. In addition to these space-time symmetries a Lagrangian can have symmetries
called internal symmetries of the field. A simple example is the complex scalar
Lagrangian,
$$L = -\frac{1}{2}\,m^{\mu\nu}\,\partial_\mu\phi\,\overline{\partial_\nu\phi} - V(|\phi|)$$
where $\phi$ is a complex valued scalar defined on the Minkowski space-time $\mathbb{R}^{n+1}$, $\bar\phi$
its complex conjugate. We note that $L$ is invariant under the transformations
$\phi \to e^{i\theta}\phi$ with $\theta$ a fixed real number. It is natural to ask whether the Lagrangian
can be modified to allow more general, local phase transformations of the form
$\phi(x) \to e^{i\theta(x)}\phi(x)$. It is easy to see that under such transformations, the Lagrangian fails to be invariant, due to the term $m^{\mu\nu}\,\partial_\mu\phi\,\overline{\partial_\nu\phi}$. To obtain an invariant Lagrangian one replaces the derivatives $\partial_\mu\phi$ by the covariant derivatives
$D^{(A)}_\mu\phi = \partial_\mu\phi + iA_\mu\phi$ depending on a gauge potential $A_\mu$. We can now easily check
that the new Lagrangian
$$L = -\frac{1}{2}\,m^{\mu\nu}\,D^{(A)}_\mu\phi\,\overline{D^{(A)}_\nu\phi} - V(|\phi|)$$
is invariant relative to the local transformations,
$$\phi(x) \to e^{i\theta(x)}\phi(x), \qquad A_\mu \to A_\mu - \partial_\mu\theta,$$
called gauge transformations.
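The gauge covariance behind this invariance can be checked numerically in a one-dimensional model, where the covariant derivative reads $D^{(A)}\phi = \phi' + iA\phi$. The sketch below is only an illustration: the sample functions `phi`, `A` and `theta` are arbitrary hypothetical choices, and derivatives are approximated by central differences.

```python
import cmath
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x) for real or complex valued f."""
    return (f(x + h) - f(x - h)) / (2 * h)

def covariant_derivative(phi, A, x):
    """One-dimensional model of D^(A) phi = phi' + i A phi."""
    return derivative(phi, x) + 1j * A(x) * phi(x)

# Hypothetical sample data, chosen only for the check.
phi = lambda x: cmath.exp(-x * x) * (1 + 0.5j * x)
A = lambda x: math.sin(x)
theta = lambda x: x * x

# Gauge transformed field and potential: phi -> e^{i theta} phi, A -> A - theta'.
phi_t = lambda x: cmath.exp(1j * theta(x)) * phi(x)
A_t = lambda x: A(x) - derivative(theta, x)

def gauge_defect(x):
    """|D^(A~) phi~ - e^{i theta} D^(A) phi|, which should vanish up to rounding."""
    lhs = covariant_derivative(phi_t, A_t, x)
    rhs = cmath.exp(1j * theta(x)) * covariant_derivative(phi, A, x)
    return abs(lhs - rhs)

def plain_defect(x):
    """Same comparison with ordinary derivatives; it equals |theta' phi|, not zero."""
    return abs(derivative(phi_t, x) - cmath.exp(1j * theta(x)) * derivative(phi, x))
```

Since $D^{(\tilde A)}\tilde\phi = e^{i\theta}D^{(A)}\phi$, the modulus $|D^{(A)}\phi|$, and hence the Lagrangian, is unchanged, whereas `plain_defect` shows the failure of the ordinary derivative to transform in this way.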
Remark that the gauge transformations introduced above fit well with the definition
of the electromagnetic field $F$. Indeed setting $F = dA$ we notice that $F$ is invariant.
This allows us to consider a more general Lagrangian which includes $F$,
$$L = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} - \frac{1}{2}\,m^{\mu\nu}\,D^{(A)}_\mu\phi\,\overline{D^{(A)}_\nu\phi} - V(|\phi|)$$
called the Maxwell-Klein-Gordon Lagrangian.
The Yang-Mills Lagrangian is a natural generalization of the Maxwell-Klein-Gordon
Lagrangian to the case when the group $U(1)$, corresponding to the phase transformations of the complex scalar $\phi$, is replaced by a more general Lie group $G$. In
this case the role of the gauge potential or connection 1-form is taken by a $\mathcal{G}$-valued
one form $A = A_\mu dx^\mu$ defined on $M$. Here $\mathcal{G}$ is the Lie algebra of the Lie group $G$.
Let $[\,,]$ be its Lie bracket and $\langle\,,\rangle$ its Killing scalar product. Typically the Lie
group $G$ is one of the classical groups of matrices, i.e. a subgroup of either $\mathrm{Mat}(n, \mathbb{R})$
or $\mathrm{Mat}(n, \mathbb{C})$. We pause briefly to recall some facts about the relevant Lie groups
and their Lie algebras.
(1) The orthogonal groups $O(p, q)$. These are the groups of linear transformations of $\mathbb{R}^n$ which preserve a given nondegenerate symmetric bilinear
form of signature $p, q$, $p + q = n$. We denote by $\mathbb{R}^n_{p,q}$ the corresponding
space. The case $p = 0$ is the Euclidean case; the group is then
simply denoted by $O(n)$. The case $p = 1$, $q = n$ is that of the Minkowski
space-time $\mathbb{R}^{n+1}$; the group $O(1, n)$ is the Lorentz group. In general let
$Q$ be the diagonal matrix whose first $p$ diagonal elements are $-1$ and the
remaining ones are $+1$. Then,
$$O(p, q) = \{L \in \mathrm{Mat}(n, \mathbb{R}) \mid L^T Q L = Q\} = \{L \in \mathrm{Mat}(n, \mathbb{R}) \mid L M L^T = M\}$$
(45)
and its Killing scalar product $\langle A, B\rangle = \mathrm{Tr}(AB^T)$ (where Tr is the
usual trace for matrices) enjoys the compatibility condition
$$\langle A, [B, C]\rangle = \langle [A, B], C\rangle \tag{46}$$
4 Recall that the Lie algebra of a Lie group $G$ is simply the tangent space to $G$ at the origin.
(2) The unitary groups $U(p, q)$. These are the complex analogues of the
orthogonal groups. They are the groups of all linear transformations of
$\mathbb{C}^n$ which preserve a given nondegenerate hermitian bilinear form. Denote
by $\mathbb{C}^n_{p,q}$ the corresponding space. Then, with the matrix $Q$ as above,
$$U(p, q) = \{U \in \mathrm{Mat}(n, \mathbb{C}) \mid U^* Q U = Q\}$$
and,
$$SU(p, q) = \{U \in U(p, q) \mid \det U = 1\},$$
The corresponding Lie algebras are,
$$u(p, q) = \{A \in \mathrm{Mat}(n, \mathbb{C}) \mid A^* Q + QA = 0\}, \qquad su(p, q) = \{A \in u(p, q) \mid \mathrm{tr}_Q A = 0\},$$
where the trace $\mathrm{tr}_Q A = Q^{ij}A_{ij}$. The Lie bracket is again the usual one for
matrices. The Killing scalar product is given by $\langle A, B\rangle = \mathrm{Tr}(AB^*)$.
Remark also that $\dim_{\mathbb{R}} U(p, q) = n^2$, $\dim_{\mathbb{R}} SU(p, q) = n^2 - 1$.
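In the Yang-Mills theory one is interested in compact Lie groups with a positive
definite Killing form. This is the case for the groups $O(n)$, $SO(n)$, $U(n)$, $SU(n)$.
In a given system of coordinates the connection 1-form $A$ has the form $A_\mu dx^\mu$ and
we define the (gauge) covariant derivative of a $\mathcal{G}$-valued tensor $\psi$ by
$$D^{(A)}_\mu\psi = D_\mu\psi + [A_\mu, \psi] \tag{47}$$
where $D$ is the covariant derivative on $M$. Observe that (47) is invariant under the
following gauge transformations, for a given $\mathcal{G}$-valued gauge potential $A$ and a $\mathcal{G}$-valued tensor $\psi$,
$$\tilde\psi = U^{-1}\psi U, \qquad \tilde A_\mu = U^{-1}A_\mu U - (D_\mu U^{-1})\,U \tag{48}$$
with $U \in G$.
Proposition 1.1.
$$\widetilde{D^{(A)}_\mu\psi} = U^{-1}\left(D^{(A)}_\mu\psi\right)U = D^{(\tilde A)}_\mu\tilde\psi$$
Proof: Indeed
$$\begin{aligned}
D_\mu\left(U^{-1}\psi U\right) &= (D_\mu U^{-1})\,\psi\,U + U^{-1}(D_\mu\psi)\,U + U^{-1}\psi\,(D_\mu U) \\
&= U^{-1}\left(-(D_\mu U)\,U^{-1}\psi + D_\mu\psi + \psi\,(D_\mu U)\,U^{-1}\right)U \\
&= U^{-1}\left(D_\mu\psi + [\psi, (D_\mu U)\,U^{-1}]\right)U
\end{aligned}$$
as desired. Hence, using $U\,D_\mu U^{-1} = -(D_\mu U)\,U^{-1}$,
$$\begin{aligned}
D^{(\tilde A)}_\mu\tilde\psi &= D_\mu\tilde\psi + [\tilde A_\mu, \tilde\psi] \\
&= U^{-1}\left(D_\mu\psi + [\psi, (D_\mu U)\,U^{-1}]\right)U + \left[U^{-1}A_\mu U - (D_\mu U^{-1})\,U,\; U^{-1}\psi U\right] \\
&= U^{-1}\left(D_\mu\psi + [\psi, (D_\mu U)\,U^{-1}] + [A_\mu, \psi] - [U\,D_\mu U^{-1}, \psi]\right)U \\
&= U^{-1}\left(D_\mu\psi + [A_\mu, \psi]\right)U = \widetilde{D^{(A)}_\mu\psi}
\end{aligned}$$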
(49)
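$$\begin{aligned}
D^{(A)}_\mu D^{(A)}_\nu\psi &= D_\mu\left(D^{(A)}_\nu\psi\right) + [A_\mu, D^{(A)}_\nu\psi] \\
&= D_\mu\left(D_\nu\psi + [A_\nu, \psi]\right) + [A_\mu, D_\nu\psi + [A_\nu, \psi]] \\
&= D_\mu D_\nu\psi + [D_\mu A_\nu, \psi] + [A_\nu, D_\mu\psi] + [A_\mu, D_\nu\psi] + [A_\mu, [A_\nu, \psi]]
\end{aligned}$$
So that
$$D^{(A)}_\mu D^{(A)}_\nu\psi - D^{(A)}_\nu D^{(A)}_\mu\psi = (D_\mu D_\nu - D_\nu D_\mu)\psi + [D_\mu A_\nu - D_\nu A_\mu, \psi] + \underbrace{[A_\mu, [A_\nu, \psi]] - [A_\nu, [A_\mu, \psi]]}_{[[A_\mu, A_\nu], \psi]}$$
Therefore,
$$F_{\mu\nu} = D_\mu A_\nu - D_\nu A_\mu + [A_\mu, A_\nu] \tag{50}$$
We leave it to the reader to show that the curvature tensor $F$ is invariant under
gauge transformations. That is,
$$\widetilde{F^{(A)}} = U^{-1}F^{(A)}U = F^{(\tilde A)} \tag{51}$$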
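The step that collapses the two double brackets into $[[A_\mu, A_\nu], \psi]$ is the Jacobi identity. It can be sanity-checked on random matrices; the sketch below uses plain Python lists for $3 \times 3$ real matrices, and all helper names are illustrative.

```python
import random

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def bracket(X, Y):
    """Matrix commutator [X, Y] = XY - YX."""
    return sub(matmul(X, Y), matmul(Y, X))

def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def jacobi_defect(A, B, C):
    """Largest entry of [A,[B,C]] - [B,[A,C]] - [[A,B],C]; zero by the Jacobi identity."""
    lhs = sub(bracket(A, bracket(B, C)), bracket(B, bracket(A, C)))
    rhs = bracket(bracket(A, B), C)
    return max(abs(v) for row in sub(lhs, rhs) for v in row)
```

Here `A`, `B`, `C` play the roles of $A_\mu$, $A_\nu$ and $\psi$ at a fixed point; the defect is zero up to floating point rounding.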
We are finally ready to present the generalization of the Maxwell theory provided
by the Yang-Mills Lagrangian:
$$L[A] = -\frac{1}{4}\left\langle F^{(A)}_{\mu\nu}, F^{(A)\,\mu\nu}\right\rangle_{\mathcal{G}} \tag{52}$$
which implies
D F = 0
(53)
(54)
(55)
(56)
t i A0 2 [Aj , j Ai ] + [Aj , i Aj ] + [t Ai , Aj ]
+2 [A0 , t Ai ] [A0 , i A0 ] [Aj , [Aj , Ai ]] + [A0 , [A0 , Ai ]]
(57)
(58)
= D ( A A + [A , A ])
= 2A + [A , A ] + [A , A ] [A , A ] + A , [A , A ]
+ 3
Again, it is not at all clear that one can transform an arbitrary solution
into the Lorentz gauge. In addition, we will have a hard time finding good
estimates for this purely hyperbolic system of nonlinear wave equations.
Temporal Gauge is specified by the condition A0 = 0.
Ldvg .
U
LG dvg
U
Z
SM
LM dvg
U
denoting, respectively, the actions for the gravitational field and matter. The matter Lagrangian LM depends only on the matterfields , assumed to be covariant
tensorfields, and the inverse of the space-time metric g which appears in the
contraction of the tensorfields in order to produce the scalar LM . It may also
depend on additional positive definite metrics which are not to be varied 6.
5In fact we only require that the corrsponding Euler-Lagrange equations should involve no
more than two derivatives of the metric.
6This is the case of the metric h in the case of wave maps or the Killing scalar product in
the case of the Yang-Mills equations.
1. BASIC NOTIONS
27
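Now the only possible candidate for the gravitational Lagrangian $L_G$, which should
be a scalar invariant of the metric with the property that the corresponding Euler-Lagrange equations involve at most two derivatives of the metric, is given⁷ by the
scalar curvature $R$. Therefore we set,
$$L_G = R.$$
Consider now a compact variation $(g_{(s)}, U)$ of the metric $g$. Let $\dot g_{\mu\nu} = \frac{d}{ds}g_{\mu\nu}\big|_{s=0}$. Then
$$\frac{d}{ds}S_G(s)\Big|_{s=0} = \int_U \dot R\,dv_g + \int_U R\,\dot{dv_g}$$
Now,
$$\dot{dv_g} = \frac{1}{2}\,g^{\mu\nu}\dot g_{\mu\nu}\,dv_g$$
Indeed, relative to a coordinate system, $dv_g = \sqrt{-g}\,dx^0 dx^1 \ldots dx^n$. Thus, the above
equality follows from,
$$\dot g = g\,g^{\mu\nu}\dot g_{\mu\nu},$$
with $g$ the determinant of $g_{\mu\nu}$. On the other hand, writing $R = g^{\mu\nu}R_{\mu\nu}$ and using
the formula $\frac{d}{ds}g^{\mu\nu}_{(s)}\big|_{s=0} = -\dot g^{\mu\nu}$, where $\dot g^{\mu\nu} = g^{\mu\alpha}g^{\nu\beta}\dot g_{\alpha\beta}$, we calculate $\dot R = -\dot g^{\mu\nu}R_{\mu\nu} + g^{\mu\nu}\dot R_{\mu\nu}$. Therefore,
$$\frac{d}{ds}S_G(s)\Big|_{s=0} = -\int_U \left(R^{\mu\nu} - \frac{1}{2}\,g^{\mu\nu}R\right)\dot g_{\mu\nu}\,dv_g + \int_U g^{\mu\nu}\dot R_{\mu\nu}\,dv_g \tag{59}$$
To calculate $\dot R_{\mu\nu}$ we make use of the following Lemma,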
Lemma 1.2. Let g (s) be a family of space-time metrics with g(0) = g and
d
d
the form R =
.
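Returning to (59) we find that since $g^{\mu\nu}\dot R_{\mu\nu}$ can be written as a space-time divergence of a tensor compactly supported in $U$ the corresponding integral vanishes
identically. We therefore infer that,
$$\frac{d}{ds}S_G(s)\Big|_{s=0} = -\int_U E^{\mu\nu}\dot g_{\mu\nu}\,dv_g \tag{60}$$
with $E^{\mu\nu} = R^{\mu\nu} - \frac{1}{2}\,g^{\mu\nu}R$.
7 up to an additive constant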
d
SM (s)
ds
s=0
Definition.
Z
LM
g
dv
+
LM dv g
g
U g
U
Z
1
LM
= ( g LM )g dvg
2
U g
(61)
$$D^\nu T_{\mu\nu} = 0 \tag{62}$$
which is the concise, space-time expression for the law of conservation of energy-momentum of the matter-fields.
(s ) g
(s ) .
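$$\frac{d}{ds}(g_s)_{\mu\nu}\Big|_{s=0} = \mathcal{L}_X g_{\mu\nu} = D_\mu X_\nu + D_\nu X_\mu$$
Therefore
$$0 = \int_M T^{\mu\nu}\,\mathcal{L}_X g_{\mu\nu}\,dv_g = 2\int_M T^{\mu\nu}D_\mu X_\nu\,dv_g = -2\int_M D_\mu T^{\mu\nu}\,X_\nu\,dv_g \tag{64}$$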
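$$T_{\alpha\beta} = \frac{1}{2}\left(\phi_{,\alpha}\phi_{,\beta} - \frac{1}{2}\,g_{\alpha\beta}\left(g^{\mu\nu}\phi_{,\mu}\phi_{,\nu} + 2V(\phi)\right)\right)$$
(2) The energy-momentum for wave maps is given by,
$$T_{\alpha\beta} = \frac{1}{2}\left(\langle\phi_{,\alpha}, \phi_{,\beta}\rangle - \frac{1}{2}\,g_{\alpha\beta}\left(g^{\mu\nu}\langle\phi_{,\mu}, \phi_{,\nu}\rangle\right)\right)$$
where $\langle\,,\rangle$ denotes the Riemannian inner product on the target manifold.
(3) The energy-momentum tensor for the Maxwell equations is,
$$T_{\alpha\beta} = F_\alpha{}^{\mu}F_{\beta\mu} - \frac{1}{4}\,g_{\alpha\beta}\left(F_{\mu\nu}F^{\mu\nu}\right)$$
(4) The energy-momentum tensor for the Yang-Mills equations is,
$$T_{\alpha\beta} = \langle F_\alpha{}^{\mu}, F_{\beta\mu}\rangle - \frac{1}{4}\,g_{\alpha\beta}\left(\langle F_{\mu\nu}, F^{\mu\nu}\rangle\right)$$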
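The Maxwell energy-momentum tensor can be evaluated directly in Minkowski space $\mathbb{R}^{1+3}$. The sketch below assumes the convention $F_{0i} = E_i$, $F_{ij} = \epsilon_{ijk}B_k$ (one common choice, stated here as an assumption), and the helper names `field_tensor`, `energy_momentum` and `trace` are hypothetical.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of the Minkowski metric m = diag(-1, 1, 1, 1)

def field_tensor(E, B):
    """Antisymmetric F with lower indices: F_{0i} = E_i, F_{ij} = eps_{ijk} B_k (assumed convention)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def energy_momentum(F):
    """T_{ab} = F_{a mu} F_b^mu - (1/4) m_{ab} F_{mu nu} F^{mu nu}."""
    invariant = sum(F[m][n] * F[m][n] * ETA[m] * ETA[n] for m in range(4) for n in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            contraction = sum(F[a][m] * F[b][m] * ETA[m] for m in range(4))
            metric_ab = ETA[a] if a == b else 0.0
            T[a][b] = contraction - 0.25 * metric_ab * invariant
    return T

def trace(T):
    """m^{ab} T_{ab} for the diagonal Minkowski metric."""
    return sum(ETA[a] * T[a][a] for a in range(4))
```

With this convention $T_{00} = \frac{1}{2}(|E|^2 + |B|^2) \ge 0$, and the trace $m^{\alpha\beta}T_{\alpha\beta}$ vanishes in $3 + 1$ dimensions, in line with the positivity and trace-free properties discussed in this section.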
An acceptable notion of the energy-momentum tensor $T$ must satisfy the following
properties in addition to the conservation law (64),
(1) $T$ is symmetric
(2) $T$ satisfies the positive energy condition, that is, $T(X, Y) \ge 0$ for any
future directed time-like vectors $X, Y$.
The symmetry property is automatic in our construction. The following proposition
asserts that the energy-momentum tensors of the field theories described above
satisfy the positive energy condition.
Proposition 2.1. The energy-momentum tensor of the scalar wave equation satisfies the positive energy condition if $V$ is positive. The energy-momentum tensors
for the wave maps, Maxwell equations and Yang-Mills satisfy the positive energy
condition.
gii = 1 i = 1, . . . , n 1
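Therefore,
$$g^{\mu\nu}\phi_{,\mu}\phi_{,\nu} = -L(\phi)\,\underline{L}(\phi) + |\nabla\!\!\!\!/\,\phi|^2$$
where
$$|\nabla\!\!\!\!/\,\phi|^2 = (E_{(1)}(\phi))^2 + (E_{(2)}(\phi))^2 + \ldots + (E_{(n-1)}(\phi))^2.$$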
Therefore,
1
|
/ |2 + V ().
2
according to the same calculation.
1
< E(), E() >
2
1
< E(), E() >
2
n1
1X
< E(i) (), E(i) () > .
2 i=1
T(L, L) =
(2) For wave maps we have,
T (E, E)
T (E, E)
T (E, E)
FA4 = ?A =
?
F34 = 2
FA3 = ?A
??
F34 = 2
2
1X
(A A + ?A ?A )
2
A=1
2
X
A=1
A A = ||2 0.
Similarly,
T (E(3) , E(3) ) =
2
X
A A = ||2 0
A=1
Another important property which the energy momentum tensor of a field theory
may satisfy is the trace free condition, that is
g T = 0.
It turns out that this condition is satisfied by all field theories which are conformally
invariant.
Definition. A field theory is said to be conformally invariant if the corresponding
action integral is invariant under conformal transformations of the metric
= g
g g
a positive smooth function on the space-time.
Proposition 2.2. The energy momentum tensor T of a conformally invariant field
theory is traceless.
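Proof : Consider an arbitrary smooth function f compactly supported in $U \subset M$.
Consider the following variation of a given metric g,
\[ g_{\alpha\beta}(s) = e^{sf} g_{\alpha\beta}. \]
Let $S(s) = S_U[\psi, g(s)]$. In view of the conformal invariance of S we have S(s) = S(0). Hence,
\[ 0 = \frac{d}{ds} S(s)\Big|_{s=0} = \int_U T^{\alpha\beta}\, \dot g_{\alpha\beta}\, dv_g \]
where
\[ \dot g_{\alpha\beta} = \frac{d}{ds}\, g_{\alpha\beta}(s)\Big|_{s=0} = f g_{\alpha\beta}. \]
Hence $\int_U f\, T^{\alpha\beta} g_{\alpha\beta}\, dv_g = 0$ and, since f is arbitrary, the trace $g^{\alpha\beta} T_{\alpha\beta}$ must vanish.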
3. CONSERVATION LAWS
We can easily check that the Maxwell and the Yang-Mills equations are conformally
invariant in 3 + 1 dimensions. The wave maps field theory is conformally invariant
in dimension 1 + 1, i.e. if the space-time M is two-dimensional9.
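Remark: The action integral of the Maxwell equations, $S = \int_U F_{\alpha\beta} F^{\alpha\beta}\, dv_g$, is
conformally invariant in any dimension provided that we also scale the electromagnetic field F. Indeed if $\tilde g_{\alpha\beta} = \Omega^2 g_{\alpha\beta}$ then $dv_{\tilde g} = \Omega^{n+1} dv_g$, and if we also set
$\tilde F_{\alpha\beta} = \Omega^{-\frac{n-3}{2}} F_{\alpha\beta}$ we get
\[ S[\tilde F, \tilde g] = \int_U \tilde F_{\alpha\beta}\, \tilde F_{\gamma\delta}\, \tilde g^{\alpha\gamma}\, \tilde g^{\beta\delta}\, dv_{\tilde g} = \int_U F_{\alpha\beta}\, F_{\gamma\delta}\, g^{\alpha\gamma}\, g^{\beta\delta}\, dv_g = S[F, g]. \]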
We finish this section with a simple observation concerning conformal field theories
in 1+1 dimensions. We specialize in fact to the Minkowski space $\mathbb{R}^{1+1}$ and consider
the local conservation law, $\partial^\beta T_{\alpha\beta} = 0$. Setting $\alpha = 0, 1$ we derive
\[ -\partial_0 T_{00} + \partial_1 T_{01} = 0, \qquad -\partial_0 T_{01} + \partial_1 T_{11} = 0 \]
(66)
= 0 =
2B.
(67)
Using this observation it is easy to prove that smooth initial data remain smooth
for all time.
For example, wave maps are conformally invariant in dimension 1 + 1. In this case
\[ A = T_{00} = \frac{1}{2}\left( \langle \partial_t\phi, \partial_t\phi \rangle + \langle \partial_x\phi, \partial_x\phi \rangle \right). \]
Given data in $C_0^\infty(\mathbb{R})$, (67) implies that the derivatives of $\phi$ remain smooth for all
positive times. This proves global existence.
3. Conservation Laws
The energy-momentum tensor of a field theory is intimately connected with conservation laws. This connection is seen through Noether's principle,
\[ S[(\Phi_t)^*\psi, (\Phi_t)^* g] = S[\psi, g]. \]
Thus the action is preserved under $(\Phi_t)^*$. In view of Noether's Principle we
ought to find a conservation law for the corresponding Euler-Lagrange equations10.
We derive these laws using the Killing vectorfield X which generates $\Phi_t$.
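We begin with a general calculation involving the energy-momentum tensor T of
$\psi$ and an arbitrary vectorfield X. Let P be the one-form obtained by contracting T with X,
\[ P_\alpha = T_{\alpha\beta} X^\beta . \]
Since T is symmetric and divergence-free,
\[ D^\alpha P_\alpha = (D^\alpha T_{\alpha\beta}) X^\beta + T_{\alpha\beta} D^\alpha X^\beta = \frac{1}{2}\, T^{\alpha\beta}\, {}^{(X)}\pi_{\alpha\beta}, \]
where ${}^{(X)}\pi$ is the deformation tensor of X,
\[ {}^{(X)}\pi_{\alpha\beta} = (\mathcal{L}_X g)_{\alpha\beta} = D_\alpha X_\beta + D_\beta X_\alpha . \]
Notation.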
The restriction of this set to some time interval [t1 , t2 ], t1 t2 t, will be written
N[t1 ,t2 ] (t, x
). These null hypersurfaces are null boundaries of,
| t t}
J 1 (t, x
) = {(t, x) 0 t t; |x x
J
(t, x
) = {(t, x) t2 t t1 ; |x x
| t t}
[t2 ,t1 ]
t with N , respectively J .
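At each point q = (t, x) along N(p), we define the null pair $(E_+, E_-)$ of future
oriented null vectors
\[ L = E_+ = \partial_t + \frac{x^i - \bar x^i}{|x - \bar x|}\,\partial_i, \qquad \underline{L} = E_- = \partial_t - \frac{x^i - \bar x^i}{|x - \bar x|}\,\partial_i \]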
[Figure; labels: $N_{[t_1,t_2]}(p)$, $J_{[t_1,t_2]}(p)$, $B_{t_1}$, $B_{t_2}$.]
where,
Z
t2
hP, E i =
N[t
1 ,t2 ]
(p)
Z
hP, E i dat .
dt
t1
St
Z
T(t , X) +
Bt2
N[t
1 ,t2 ]
T(E , X)
(p)
T(t , X)
(69)
B t1
J[t
(p)
1 ,t2 ]
T (X) dtdx
In the particular case when X is Killing, its deformation tensor vanishes identically. Thus,
Corollary 3.3. If X is a Killing vectorfield,
\[ \int_{B_{t_2}} T(\partial_t, X) + \int_{N_{[t_1,t_2]}(p)} T(L, X) = \int_{B_{t_1}} T(\partial_t, X) \qquad (70) \]
N[t
1 ,t2 ]
(p)
Bt1
In the case of a conformal field theory we can pick X to be the future timelike,
conformal Killing vectorfield $X = K_0 = (t^2 + |x|^2)\partial_t + 2t x^i \partial_i$. Thus,
\[ \int_{B_{t_2}} T(\partial_t, K_0) + \int_{N_{[t_1,t_2]}(p)} T(L, K_0) = \int_{B_{t_1}} T(\partial_t, K_0) \qquad (72) \]
In (71) the term $T(\partial_t, \partial_t)$ is called energy density while $T(E, \partial_t)$ is called energy
flux density. The corresponding integrals are called energy contained in $B_{t_1}$ and
$B_{t_2}$ and, respectively, flux of energy through N. The corresponding terms in (72)
are called conformal energy densities, fluxes etc.
11 The brackets $\langle\,,\rangle$ in (68) denote the inner product with respect to the Minkowski metric.
Equation (71) can be used to derive the following fundamental properties of relativistic field theories.
(1) Finite propagation speed
(2) Uniqueness of the Cauchy problem
Proof : The first property follows from the fact that, if $\int_{B_{t_1}} T(\partial_t, \partial_t)$ is zero at
time $t = t_1$ then both integrals $\int_{B_{t_2}} T(\partial_t, \partial_t)$ and $\int_{N_{[t_1,t_2]}} T(E, \partial_t)$ must vanish
also. In view of the positivity properties of T it follows that the corresponding
integrands must also vanish. Taking into account the specific form of T, in a
particular theory, one can then show that the fields also vanish in the domain
of influence of the ball $B_{t_1}$. Conversely, if the initial data for the fields vanish in
the complement of $B_{t_1}$, then the fields are identically zero in the complement of the
domain of influence of $B_{t_1}$.
The proof of the second property follows immediately from the first for a linear
field theory. For a nonlinear theory one has to work a little more.
Exercise 1. Formulate an initial value problem for each of the field theories we
have encountered so far: scalar wave equation (SWE), Wave Maps (WM), Maxwell
equations (ME) and Yang-Mills (YM). Prove uniqueness of solutions to the initial
value problem, for smooth solutions.
The following is another important consequence of (71) and (72). To state the
results we introduce the following quantities,
\[ E(t) = \int_{\mathbb{R}^n} T(\partial_t, \partial_t)(t, x)\, dx \qquad (73) \]
\[ E_c(t) = \int_{\mathbb{R}^n} T(K_0, \partial_t)(t, x)\, dx \qquad (74) \]
Theorem 3.4 (Global Energy). For an arbitrary field theory, if $E(0) < \infty$, then
\[ E(t) = E(0) \]
(75)
(76)
Proof : Follows easily by applying (71) and (72) to past causal domains $J^-(p)$
with $p = (\bar t, 0)$ between $t_1 = 0$ and $t_2 = t$ and letting $\bar t \to +\infty$.
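The conservation of E(t) can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: it takes the free scalar wave equation in 1 + 1 dimensions (V = 0), for which the energy density is assumed to be $\frac{1}{2}(\phi_t^2 + \phi_x^2)$, and uses the explicit solution $\phi(t,x) = g(x-t) + g(x+t)$ with the hypothetical profile $g(s) = e^{-s^2}$. The function names and the quadrature parameters are choices made here, not part of the notes.

```python
import math

def g_prime(s):
    # Derivative of the (assumed) profile g(s) = exp(-s^2).
    return -2.0 * s * math.exp(-s * s)

def energy(t, half_width=20.0, n=4000):
    """Trapezoid approximation of E(t) = int 1/2 (phi_t^2 + phi_x^2) dx
    for phi(t, x) = g(x - t) + g(x + t)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        phi_t = -g_prime(x - t) + g_prime(x + t)
        phi_x = g_prime(x - t) + g_prime(x + t)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * 0.5 * (phi_t ** 2 + phi_x ** 2)
    return total * h

E0 = energy(0.0)
E3 = energy(3.0)
print(E0, E3, math.sqrt(2.0 * math.pi))
```

For this particular profile the energy can also be computed by hand, $E = 2\int g'(s)^2\, ds = \sqrt{2\pi}$, which gives an independent check of the quadrature.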
Exercise 2. Consider the Lagrangian,
1
L = m V (||)
2
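\[ L = E_+ = \partial_t + \partial_r, \qquad \underline{L} = E_- = \partial_t - \partial_r, \]
\[ K_0 = \frac{1}{2}\left[ (t + r)^2 E_+ + (t - r)^2 E_- \right]. \]
Thus,
\[ E_c(t) = \int_{\mathbb{R}^n} \Big( \frac{1}{4}(t + r)^2\, T_{++} + \frac{1}{4}(t - r)^2\, T_{--} + \frac{1}{4}\underbrace{\big[ (t + r)^2 + (t - r)^2 \big]}_{2(t^2 + r^2)}\, T_{+-} \Big)\, dx \]
\[ \phantom{E_c(t)} = \int_{\mathbb{R}^n} \Big( \frac{1}{4}(t + r)^2\, T_{++} + \frac{1}{2}(t^2 + r^2)\, T_{+-} + \frac{1}{4}(t - r)^2\, T_{--} \Big)\, dx \qquad (77) \]
\[ E_c(0) = \int_{\mathbb{R}^n} T(\partial_t, K_0)(0, x)\, dx = \int_{\mathbb{R}^n} |x|^2\, T(\partial_t, \partial_t)\, dx \]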
R
Rn
The remaining term in (77) contains the factor $(t - r)^2$ which is constant along
outgoing null directions r = t + c. Hence for any $0 < \epsilon < 1$
\[ \int_{|x| > (1 + \epsilon)t} T_{--} = O(t^{-2}), \qquad \int_{|x| < (1 - \epsilon)t} T_{--} = O(t^{-2}). \]
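\[ Q(L, L) = L(\phi)^2 \qquad (78) \]
\[ Q(\underline{L}, \underline{L}) = \underline{L}(\phi)^2 \qquad (79) \]
\[ Q(L, \underline{L}) = |\not\!\nabla\phi|^2 \qquad (80) \]
where $|\not\!\nabla\phi|^2 = \sum_A |e_A(\phi)|^2$ with $(e_A)_{A=1,\dots,n-1}$ an orthonormal frame spanning the
orthogonal complement of $L, \underline{L}$.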
CHAPTER 2
General Equations
It is tempting to define PDE as the subject which is concerned with all partial
differential equations, just as Algebraic Geometry, say, deals with all polynomial
equations. According to this view, the goal of the subject is to find a general theory
of all, or very general classes of PDEs. Though this point of view is quite out of
fashion, it has nevertheless important merits which I hope to illustrate below. To
see the full power of the general theory we need to, at least, write down general
equations, yet we will make sure to explain the main ideas in simplified cases. We
consider equations, or systems of equations, in Rd with respect to the variables
coordinates $x^1, x^2$. For the multi-index $\alpha = (2, 0)$ we have $\partial^\alpha u = \frac{\partial}{\partial x^1}\frac{\partial}{\partial x^1} u = \partial_1^2 u$
sections.
Consider first the one dimensional situation d = 1 in which case (81) becomes an
ordinary differential equation (ODE), or system of ODE. To simplify further take
k = 1 and N = 1, that is the case of an ordinary differential equation of order k = 1.
Then (81) is simply $F(x, u(x), \partial_x u(x)) = 0$, where F is a given function of the three
variables x, u and $p = \partial_x u$ such as, for example, $F(x, u, p) = x \cdot p + u^3 - \sin x$. To
solve the equation (81) in this case is to find a $C^1$ function u(x) such
that
\[ x\,\partial_x u(x) + u^3 = \sin x. \qquad (82) \]
Now consider the case of a second order ODE, i.e. d = N = 1 and k = 2. Then
(81) becomes $F(x, u(x), \partial_x u(x), \partial_x^2 u(x)) = 0$, where F now depends on the four
variables x, u, $p = \partial_x u$, $q = \partial_x^2 u$. As an example take $F = q + V'(u)$, for some given
function V = V(u), in which case (81) becomes the nonlinear harmonic oscillator
equation,
\[ \partial_x^2 u(x) + V'(u(x)) = 0 \qquad (83) \]
(84)
(85)
Remark 2. All higher order scalar equations or systems can in fact be re-expressed
as first order systems, i.e. k = 1, by simply introducing all higher order derivatives
of u as unknowns together with the obvious compatibility relations between partial
derivatives. As an example consider equation (83) and set $v = \partial_x u$. We can then
rewrite the equation as a first order system with N = 2, namely $\partial_x v + V'(u) = 0$, $\partial_x u - v = 0$.
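The reduction to a first order system is also what makes such equations easy to integrate numerically. The following is a minimal sketch, not taken from the notes: it assumes the hypothetical potential $V(u) = u^2/2 + u^4/4$, integrates the system $\partial_x u = v$, $\partial_x v = -V'(u)$ with a classical fourth order Runge-Kutta step, and monitors the quantity $\frac{1}{2}v^2 + V(u)$, which is constant along exact solutions of (83).

```python
def V(u):
    # Hypothetical potential chosen for illustration.
    return 0.5 * u * u + 0.25 * u ** 4

def dV(u):
    return u + u ** 3

def rhs(u, v):
    # First order system: d_x u = v, d_x v = -V'(u).
    return v, -dV(u)

def rk4_step(u, v, h):
    k1u, k1v = rhs(u, v)
    k2u, k2v = rhs(u + 0.5 * h * k1u, v + 0.5 * h * k1v)
    k3u, k3v = rhs(u + 0.5 * h * k2u, v + 0.5 * h * k2v)
    k4u, k4v = rhs(u + h * k3u, v + h * k3v)
    return (u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6,
            v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6)

def integrate(u, v, h=1e-3, steps=10000):
    for _ in range(steps):
        u, v = rk4_step(u, v, h)
    return u, v

def energy(u, v):
    # Conserved along exact solutions: d/dx (v^2/2 + V(u)) = v (d_x v + V'(u)) = 0.
    return 0.5 * v * v + V(u)

u_start, v_start = 1.0, 0.0
u_end, v_end = integrate(u_start, v_start)
energy_drift = abs(energy(u_end, v_end) - energy(u_start, v_start))
print(energy_drift)
```

The drift printed at the end measures only the discretization error of the scheme; it should shrink as the step h is reduced.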
An equation, or system, is called quasi-linear if it is linear with respect to the
highest order derivatives. A quasilinear system of order one (k = 1) in Rd can be
written in the form,
d
X
(86)
i=1
with $h^{11}(\partial u) = 1 + (\partial_2 u)^2$, $h^{22}(\partial u) = 1 + (\partial_1 u)^2$, $h^{12}(\partial u) = h^{21}(\partial u) = -\partial_1 u\, \partial_2 u$,
which is manifestly a second order quasi-linear equation.
In the particular case when the top order coefficients of a quasilinear equation,
i.e. those corresponding to the highest order derivatives, depend only on the space
variables $x \in \mathbb{R}^d$, the equation, or system, is called semi-linear. For example,
equation (20), derived in connection to the uniformization theorem, is semi-linear.
A linear equation, or system, of order k can be written in the form,
\[ \sum_{|\alpha| \le k} A_\alpha(x)\, \partial^\alpha u(x) = F(x). \qquad (88) \]
Observe that the differential operator on the left hand side is indeed linear in the
sense discussed in our introduction. If in addition the coefficients $A_\alpha$ are constant in
x, the system is called linear with constant coefficients. The five basic equations (1)-(5)
discussed in the introduction are all linear with constant coefficients. Typically,
these are the only equations which can be solved explicitly.
We thus have our first useful, indeed very useful, classification of PDEs into fully
nonlinear, quasi-linear, semi-linear and linear. A fully nonlinear equation is nonlinear relative to the highest derivatives. The typical example is the Monge-Ampère
equation. For simplicity consider the case of functions of 2 variables $u(x^1, x^2)$
in $\mathbb{R}^2$ with Hessian $\partial^2 u = (\partial_i \partial_j u)_{i,j=1,2}$. Clearly the determinant $\det(\partial^2 u) = (\partial_1^2 u)(\partial_2^2 u) - (\partial_1 \partial_2 u)^2$ is quadratic with respect to the second derivatives of
u. Thus the Monge-Ampère equation,
\[ \det(\partial^2 u) = f(x, u, \partial u), \qquad (89) \]
with f a given function defined on $\mathbb{R}^2 \times \mathbb{R} \times \mathbb{R}^2$, is fully nonlinear. This equation plays
an important role in Geometry, in relation to the isometric embedding problem as
well as to the problem of finding surfaces in $\mathbb{R}^3$ with prescribed Gauss curvature. A
variant of the Monge-Ampère equation, for complex valued functions, plays a central
role in complex geometry in connection to Calabi-Yau manifolds. Calabi-Yau
manifolds, on the other hand, are central mathematical objects in String Theory.
Remark. Most of the basic equations of Physics, such as the Einstein equations,
are quasilinear. Fully nonlinear equations appear however in connection to the
theory of characteristics of linear PDE, which we discuss at length below, or in
geometry.
1. First order scalar equations
It does not make sense to give a systematic treatment of this classical topic since
there are many PDE books which do an excellent job, such as [E] or [J]. In what
follows I will only attempt to give the main ideas behind the theory. It turns out
that scalar (N = 1) first order (k = 1) PDE in d space dimensions can be reduced
to systems of first order ODE.
As a simple illustration of this important fact consider the following equation in
two space dimensions,
\[ a^1(x^1, x^2)\,\partial_1 u(x^1, x^2) + a^2(x^1, x^2)\,\partial_2 u(x^1, x^2) = f(x^1, x^2) \qquad (90) \]
(92)
\[ \frac{dx^2}{ds} = a^2\big(x(s), u(s, x(s))\big) \qquad (93) \]
\[ \frac{d}{ds}\, u(x^1(s), x^2(s)) = f\big(x^1(s), x^2(s), u(x^1(s), x^2(s))\big) \qquad (94) \]
Unlike the previous case however (93) is underdetermined; we need now to consider
the enlarged ODE system (93)-(94), where the unknowns are $x^1(s)$, $x^2(s)$, $u(s) = u(x^1(s), x^2(s))$. As a special example of (92) consider the scalar equation in two
space dimensions,
\[ \partial_t u + u\,\partial_x u = 0, \qquad u(0, x) = u_0(x) \qquad (95) \]
called the Burgers equation. Since $a^1 = 1$, $a^2 = u$ we can set $x^1(s) = s$, $x^2(s) = x(s)$
in (93) and thus derive its characteristic equation in the form,
\[ \frac{dx}{ds}(s) = u(s, x(s)). \qquad (96) \]
Observe that, for any given solution u of (95) and any characteristic curve (s, x(s))
we have $\frac{d}{ds} u(s, x(s)) = 0$. Thus, in principle, the knowledge of solutions to (96)
would allow us to determine the solutions to (95). This, however, seems circular
since u itself appears in (96). To see how this difficulty can be circumvented consider
the initial value problem for (95), i.e. look for solutions u which verify $u(0, x) = u_0(x)$. Consider an associated characteristic curve x(s) such that, initially, $x(0) = x_0$. Then, since u is constant along the curve, we must have $u(s, x(s)) = u_0(x_0)$.
Hence, going back to (96), we infer that $\frac{dx}{ds} = u_0(x_0)$ and thus $x(s) = x_0 + s u_0(x_0)$.
We thus deduce that,
\[ u(s, x_0 + s u_0(x_0)) = u_0(x_0) \qquad (97) \]
which gives us, implicitly, the form of the solution u. We see once more, from (97),
that if the initial data is smooth (or real analytic) everywhere except at a point x0 ,
of the line t = 0, then the corresponding solution is also smooth (or real analytic)
everywhere, in a small neighborhood V of x0 , except along the characteristic curve
which initiates at x0 . The smallness of V is necessary here because new singularities
can form in the large. Observe indeed that u has to be constant along the lines
x + su0 (x) whose slopes depend on u0 (x). At a point when these lines cross, we
would obtain different values of u which is impossible unless u becomes singular
at that point. In fact one can show that the first derivative $\partial_x u$ becomes infinite at
the first singular point, i.e. the singular point with the smallest value of |t|. This
blow-up phenomenon occurs for any smooth, non-constant, initial data $u_0$.
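The crossing of characteristics can be observed directly. The sketch below is an illustration under assumptions not made in the notes: it takes the hypothetical datum $u_0(x) = e^{-x^2}$ and uses the standard fact that the map $x_0 \mapsto x_0 + t u_0(x_0)$ stops being monotone, for $t > 0$, at the time $t^* = -1 / \min u_0'$. The grid sizes and function names are choices made here.

```python
import math

def u0(x):
    # Hypothetical smooth, non-constant initial datum.
    return math.exp(-x * x)

def du0(x):
    return -2.0 * x * math.exp(-x * x)

def breakdown_time(lo=-5.0, hi=5.0, n=100001):
    """First positive time at which two characteristics x0 + t*u0(x0) meet,
    estimated as -1 / min u0' over a grid."""
    h = (hi - lo) / (n - 1)
    m = min(du0(lo + i * h) for i in range(n))
    return -1.0 / m if m < 0 else math.inf

def is_monotone(t, lo=-5.0, hi=5.0, n=20001):
    """Check on a grid whether x0 -> x0 + t*u0(x0) is strictly increasing,
    i.e. whether the characteristics have not yet crossed at time t."""
    h = (hi - lo) / (n - 1)
    previous = lo + t * u0(lo)
    for i in range(1, n):
        x0 = lo + i * h
        current = x0 + t * u0(x0)
        if current <= previous:
            return False
        previous = current
    return True

t_star = breakdown_time()
print(t_star, is_monotone(0.9 * t_star), is_monotone(1.1 * t_star))
```

For this datum the minimum of $u_0'$ is attained at $x = 1/\sqrt{2}$, so the estimate can be compared with the closed form $t^* = \sqrt{e/2}$.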
Remark. There is an important difference between the linear equation (90) and
quasi-linear equation (92). The characteristics of the first depend only on the
coefficients $a^1(x), a^2(x)$ while the characteristics of the second depend, explicitly,
on a particular solution u of the equation. In both cases, singularities can only
propagate along the characteristic curves of the equation. For nonlinear equations,
however, new singularities can form in the large, independent of the smoothness of
the data.
The above procedure extends to fully nonlinear scalar equations in Rd of the form,
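\[ \partial_t u + H(x, \partial u) = 0, \qquad u(0, x) = u_0(x) \qquad (98) \]
\[ \frac{dx^i}{dt} = \frac{\partial}{\partial p_i} H(x(t), p(t)), \qquad \frac{dp_i}{dt} = -\frac{\partial}{\partial x^i} H(x(t), p(t)). \qquad (99) \]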
(100)
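\[ \frac{dx}{dt} = \nabla_p H(x(t), p(t)), \qquad \frac{dp}{dt} = -\nabla_x H(x(t), p(t)). \]
On the other hand, $\frac{d}{dt} u(t, x(t)) = \partial_t u(t, x(t)) + \nabla_x u(t, x(t)) \cdot \nabla_p H(x(t), p(t))$, and,
using equation (98), $\partial_t u(t, x(t)) = -H(x(t), \nabla_x u(t, x(t))) = -H(x(t), p(t))$. Thus,
\[ \frac{d}{dt} u(t, x(t)) = -H(x(t), p(t)) + p(t) \cdot \nabla_p H(x(t), p(t)), \]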
from which we see, in principle, how to construct u based only on the knowledge
of the solutions x(t), p(t), called the bicharacteristic curves of the nonlinear PDE.
Once more singularities can only propagate along bicharacteristics. As in the case
of the Burgers equation singularities will occur for essentially all smooth data;
thus a classical, i.e. continuously differentiable, solution can only be constructed
locally in time. Both Hamilton-Jacobi equations and Hamiltonian systems play a
fundamental role in Classical Mechanics as well as in the theory of propagation
of singularities in linear PDE. The deep connection between Hamiltonian systems
and first order Hamilton-Jacobi equations has played an important role in the
introduction of the Schrödinger equation in quantum mechanics.
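The construction of u along bicharacteristics can be tried out on a case where everything is explicit. The sketch below is an illustration under assumptions: it takes the hypothetical Hamiltonian $H(x, p) = p^2/2$ in one space dimension and the datum $u_0(x) = x^2/2$, starts each curve with $p(0) = \partial_x u_0(x_0)$, and integrates $dx/dt = \partial_p H$, $dp/dt = -\partial_x H$, $du/dt = -H + p\,\partial_p H$. For this choice one can check by hand that $u(t, x) = x^2 / (2(1 + t))$ solves $\partial_t u + \frac{1}{2}(\partial_x u)^2 = 0$, which gives a reference value.

```python
def H(x, p):
    # Hypothetical Hamiltonian used for illustration.
    return 0.5 * p * p

def dH_dp(x, p):
    return p

def dH_dx(x, p):
    return 0.0

def u0(x):
    return 0.5 * x * x

def du0(x):
    return x

def rhs(state):
    # state = (x, p, u) along one bicharacteristic.
    x, p, u = state
    return (dH_dp(x, p), -dH_dx(x, p), -H(x, p) + p * dH_dp(x, p))

def rk4_step(state, h):
    def shifted(s, k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * h))
    k3 = rhs(shifted(state, k2, 0.5 * h))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def solve_along(x0, t_end, steps=200):
    # Initial conditions: x(0) = x0, p(0) = u0'(x0), u(0) = u0(x0).
    state = (x0, du0(x0), u0(x0))
    h = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

t_end = 2.0
x_t, p_t, u_t = solve_along(0.7, t_end)
exact_u = x_t ** 2 / (2.0 * (1.0 + t_end))
print(x_t, p_t, u_t, exact_u)
```

Swapping in another Hamiltonian only requires changing `H`, `dH_dp` and `dH_dx`; the comparison value `exact_u` is specific to the example chosen here.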
2.
(101)
(102)
The reader may assume, for simplicity, that (101) is a scalar equation and that f
is a nice function of x and u, such as $f(x, u) = u^3 - u + 1 + \sin x$. Observe that the
knowledge of the initial data $u_0$ allows us to determine $\partial_x u(x_0)$. Differentiating the
equation (101) with respect to x and applying the chain rule, we derive,
\[ \partial_x^2 u(x) = \partial_x f(x, u(x)) + \partial_u f(x, u(x))\,\partial_x u(x) = \cos x + 3u^2(x)\,\partial_x u(x) - \partial_x u(x). \]
Hence, $\partial_x^2 u(x_0) = \partial_x f(x_0, u_0) + \partial_u f(x_0, u_0)\,\partial_x u(x_0)$ and since $\partial_x u(x_0)$ has already
been determined we infer that $\partial_x^2 u(x_0)$ can be explicitly calculated from the initial data $u_0$. The calculation also involves the function f as well as its first
partial derivatives. Taking higher derivatives of the equation (101) we can recursively determine $\partial_x^3 u(x_0)$, as well as all other higher derivatives of u at $x_0$.
One can then, in principle, determine u(x) with the help of the Taylor series
\[ u(x) = \sum_{k \ge 0} \frac{1}{k!}\,\partial_x^k u(x_0)(x - x_0)^k = u(x_0) + \partial_x u(x_0)(x - x_0) + \frac{1}{2!}\,\partial_x^2 u(x_0)(x - x_0)^2 + \dots . \]
We say in principle because there is no guarantee that the series converges. There is
however a very important theorem, called the Cauchy-Kowalewski theorem, which
asserts that, if the function f is real analytic, which is certainly the case for our
$f(x, u) = u^3 - u + 1 + \sin x$, then there exists a neighborhood J of $x_0$ where the
Taylor series converges to a real analytic solution u of the equation. One can then
easily show that the solution thus obtained is the unique solution to (101) subject
to the initial condition (102).
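The recursive determination of the Taylor coefficients is easy to carry out by machine. The sketch below assumes that (101) is the equation $\partial_x u = f(x, u)$ with $f(x, u) = u^3 - u + 1 + \sin x$, and takes the hypothetical initial condition $x_0 = 0$, $u_0 = 0$. It matches the coefficient of $x^k$ on both sides, which is equivalent to the repeated differentiation described above, and compares the truncated series with an independent Runge-Kutta integration near $x_0$.

```python
import math

def f(x, u):
    return u ** 3 - u + 1.0 + math.sin(x)

def taylor_coefficients(u0, order):
    """Coefficients c[k] of u(x) = sum c[k] x^k solving u' = f(x, u), u(0) = u0.
    Matching the coefficient of x^k gives (k + 1) c[k+1] = [x^k] f(x, u(x))."""
    c = [u0]
    for k in range(order):
        # Coefficients of u^2 up to order k (Cauchy product).
        sq = [sum(c[i] * c[j - i] for i in range(j + 1)) for j in range(k + 1)]
        # Coefficient of x^k in u^3.
        cube_k = sum(sq[i] * c[k - i] for i in range(k + 1))
        # Coefficient of x^k in sin x.
        sin_k = 0.0 if k % 2 == 0 else (-1.0) ** ((k - 1) // 2) / math.factorial(k)
        f_k = cube_k - c[k] + (1.0 if k == 0 else 0.0) + sin_k
        c.append(f_k / (k + 1))
    return c

def taylor_eval(c, x):
    return sum(ck * x ** k for k, ck in enumerate(c))

def rk4(x_end, u0, steps=2000):
    # Independent reference: classical Runge-Kutta from x = 0 to x_end.
    h = x_end / steps
    x, u = 0.0, u0
    for _ in range(steps):
        k1 = f(x, u)
        k2 = f(x + 0.5 * h, u + 0.5 * h * k1)
        k3 = f(x + 0.5 * h, u + 0.5 * h * k2)
        k4 = f(x + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return u

coeffs = taylor_coefficients(u0=0.0, order=15)
gap = abs(taylor_eval(coeffs, 0.2) - rk4(0.2, 0.0))
print(coeffs[:5], gap)
```

The agreement is only tested close to $x_0$; the sketch says nothing about the actual radius of convergence, which is the content of the theorem.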
The same result may not hold true if we consider a more general equation of the
form,
\[ a(x, u(x))\,\partial_x u = f(x, u(x)), \qquad u(x_0) = u_0 \qquad (103) \]
Indeed the recursive argument outlined above breaks down in the case of the scalar
equation $(x - x_0)\,\partial_x u = f(x, u)$ for the simple reason that we cannot even determine
$\partial_x u(x_0)$ from the initial condition $u(x_0) = u_0$. A similar problem occurs for the
equation $(u - u_0)\,\partial_x u = f(x, u)$. An obvious condition which allows us to extend
our previous recursive argument to (103) is that $a(x_0, u_0) \neq 0$. Otherwise we say
that the initial value problem (103) is characteristic. If both a and f are also
real analytic the Cauchy-Kowalewski theorem applies and we obtain a unique, real
analytic, solution of (103) in a small neighborhood of $x_0$. In the case of an $N \times N$
system,
\[ A(x, u(x))\,\partial_x u = F(x, u(x)), \qquad u(x_0) = u_0 \qquad (104) \]
(105)
It turns out, and this is extremely important, that while the non-degeneracy condition (105) is essential to obtain a unique solution of the equation, the analyticity
condition is not at all important, in the case of ODE. It can be replaced by a simple
local Lipschitz condition for A and F , i.e. it suffices to assume, for example, that
only their first partial derivatives exist and that they are merely locally bounded.
This is always the case if the first derivatives of A, F are continuous.
The following local existence and uniqueness (LEU) theorem is called the fundamental theorem of ODE.
Theorem[Fundamental theorem for ODE] If the matrix $A(x_0, u_0)$ is invertible
and if A, F are continuous and have locally bounded first derivatives then there
exists a time interval $x_0 \in J \subset \mathbb{R}$ and a unique solution3 u defined on J verifying
the initial conditions $u(x_0) = u_0$.
Proof The proof of the theorem is based on the Picard iteration method. The idea
is to construct a sequence of approximate solutions $u^{(n)}(x)$ which converge to the
desired solution. Without loss of generality we can assume A to be the identity
matrix4. One starts by setting $u^{(0)}(x) = u_0$ and then defines recursively,
\[ \partial_x u^{(n)}(x) = F(x, u^{(n-1)}(x)), \qquad u^{(n)}(x_0) = u_0 \qquad (106) \]
Observe that at every stage we only need to solve a very simple linear problem,
which makes Picard iteration easy to implement numerically. As we shall see below,
variations of this method are also used for solving nonlinear PDE.
...... To fill in the proof.....
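A minimal numerical sketch of the Picard iteration follows. It is an illustration only: the test problem $\partial_x u = u$, $u(0) = 1$ on $[0, 1]$, the grid, the number of iterations and the use of the trapezoid rule for the integral are all assumptions made here. Each step replaces $u^{(n-1)}$ by $u_0 + \int_{x_0}^{x} F(y, u^{(n-1)}(y))\,dy$, which is the integrated form of (106).

```python
import math

def picard(F, x0, u0, x_end, n_grid=2000, n_iter=25):
    """Picard iterates u^(n)(x) = u0 + int_{x0}^{x} F(y, u^(n-1)(y)) dy,
    with the integral approximated by the cumulative trapezoid rule."""
    h = (x_end - x0) / n_grid
    xs = [x0 + i * h for i in range(n_grid + 1)]
    u = [u0] * (n_grid + 1)          # u^(0)(x) = u0
    for _ in range(n_iter):
        g = [F(x, v) for x, v in zip(xs, u)]
        new_u = [u0]
        for i in range(n_grid):
            new_u.append(new_u[-1] + 0.5 * h * (g[i] + g[i + 1]))
        u = new_u
    return xs, u

xs, u = picard(lambda x, v: v, 0.0, 1.0, 1.0)
max_error = max(abs(ui - math.exp(xi)) for xi, ui in zip(xs, u))
print(max_error)
```

For this test problem the exact iterates are the Taylor partial sums of $e^x$, so the remaining error after enough iterations comes from the quadrature, not from the iteration itself.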
Remark. The local existence theorem is sharp, in general. Indeed we have seen
that the invertibility condition for $A(x_0, u_0)$ is necessary. Also, in general, the
interval of existence J may not be extended to the whole real line. As an example
consider the nonlinear equation $\partial_x u = u^2$ with initial data $u = u_0$ at x = 0, for
which the solution $u = \frac{u_0}{1 - x u_0}$ becomes infinite in finite time (at $x = 1/u_0$ when $u_0 > 0$), i.e. it blows up.
Once the LEU result is established one can define the main goals of the mathematical theory of ODE to be:
(1) Find criteria for global existence. In case of blow-up describe the limiting
behavior.
(2) In case of global existence describe the asymptotic behavior of solutions
and family of solutions.
Though it is impossible to develop a general theory, answering both goals (in practice
one is forced to restrict to special classes of equations motivated by applications),
the general LEU theorem mentioned above gives a powerful unifying theme. It
would be very helpful, really wonderful, if a similar situation were to hold for
general PDE.
3 Since we are not assuming analyticity for A, F the solution may not be analytic, but it has
continuous first derivatives.
4 Since A is invertible we can multiply both sides of the equation by the inverse matrix $A^{-1}$.
3.
u|H = u0
(107)
i=1
(108)
N (s) N (s) = 1
To fully determine U(s) it remains to determine its projection on the normal vector
N(s), i.e. $U(s) \cdot N(s)$. Indeed, since V(x) and N(x) span $\mathbb{R}^2$ at all points $x = (x^1(s), x^2(s))$ along our curve, we have
\[ U(s) = (U \cdot V)(s)\,\frac{V(s)}{|V(s)|^2} + (U \cdot N)(s)\,N(s) \qquad (110) \]
A(s) V (s)
+ (U (s) N (s)) A(s) N (s)
|V (s)|2
(111)
If, on the other hand, $A(s) \cdot N(s) = 0$ then, since $V(s) \cdot N(s) = 0$, we infer that the
vectors A(s) and $V(s) = \frac{dx}{ds}$ must be proportional, i.e. $\frac{dx}{ds} = \lambda(s) A(s)$. One can
then reparametrize the curve H, i.e. introduce another parameter $s' = s'(s)$ with
$\frac{ds'}{ds} = \lambda(s)$, such that relative to the new parameter we have $\lambda = 1$. This leads to
the equation,
\[ \frac{dx^1}{ds} = a^1\big(x(s), u(x(s))\big), \qquad \frac{dx^2}{ds} = a^2\big(x(s), u(x(s))\big) \]
i.e.
(112)
(113)
With a little more work we can extend our discussion to general higher order quasilinear equations, or systems and get a simple, sufficient condition, for a Cauchy
problem to be non-characteristic. Particularly important for us are second order
(k = 2) scalar equations (N = 1). To keep things simple consider the case of a
second order, semi-linear equation in Rd ,
d
X
(114)
i,j=1
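gradient $\nabla\psi$. Define the unit normal at a point $x_0 \in H$ to be $N = \frac{\nabla\psi}{|\nabla\psi|}$, or in
components $n_i = \frac{\partial_i \psi}{|\nabla\psi|}$. As initial conditions for (114) we prescribe u and its normal
derivative $N u(x) = n_1(x)\partial_1 u(x) + n_2(x)\partial_2 u(x) + \dots + n_d(x)\partial_d u(x)$ on H,
$x \in H$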
(115)
We need to find a condition on H such that we can determine all higher derivatives
of a solution u, at $x_0 \in H$, from the initial data $u_0, u_1$. We can proceed exactly in
the same manner as before, and find that all second order derivatives of u can be
determined at a point $x_0 \in H$, provided that,
\[ \sum_{i,j=1}^{d} a^{ij}(x_0)\, n_i(x_0)\, n_j(x_0) \neq 0. \qquad (116) \]
It is indeed easy to see that the only second order derivative of u, which is not
automatically determined from u0 , u1 , is of the form N 2 u(x0 ) = N (N (u))(x0 ).
This latter can be determined from the equation (114), provided that (116) is
verified. One does this by decomposing all partial derivatives of u into tangential
and normal components, as we have done in (110). One can then show, recursively,
that all higher derivatives of u can also be determined. Thus, (116) is exactly the
non-characteristic condition we were looking for.
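If, on the other hand, $\sum_{i,j=1}^{d} a^{ij}(x)\, n_i(x)\, n_j(x) = 0$ at all points we call H a characteristic hypersurface for the equation (114). Since $n_i = \frac{\partial_i \psi}{|\nabla\psi|}$ we find that H is
characteristic if and only if,
\[ \sum_{i,j=1}^{d} a^{ij}(x)\, \partial_i \psi(x)\, \partial_j \psi(x) = 0. \qquad (117) \]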
Remark Observe that only the left hand side6 of (114), called 7, is relevant in
determining the characteristic surfaces of the equation.
Example 1.
Rd ,
x Rd
(118)
i,j=1
form (87). It is easy to check that the quadratic form associated to the symmetric
matrix $h^{ij}(\partial u)$ is positive definite independent of u. Indeed,
\[ h^{ij}(\partial u)\,\xi_i \xi_j = (1 + |\partial u|^2)^{-1/2} \left( |\xi|^2 - (1 + |\partial u|^2)^{-1} (\xi \cdot \partial u)^2 \right) > 0. \]
Thus, even though (87) is not linear, we see that all surfaces in $\mathbb{R}^2$ are non-characteristic.
Example 2. Consider the wave equation $\Box u = f$ in $\mathbb{R}^{1+d}$. All hypersurfaces of
the form $\psi(t, x) = 0$ for which,
\[ (\partial_t \psi)^2 = \sum_{i=1}^{d} (\partial_i \psi)^2, \qquad (119) \]
are characteristic. This is the same eikonal equation which has appeared before in
(149). Observe that it splits into two Hamilton-Jacobi equations, see (98),
\[ \partial_t \psi = \pm \Big( \sum_{i=1}^{d} (\partial_i \psi)^2 \Big)^{1/2} \qquad (120) \]
The bicharacteristic curves of the associated Hamiltonians are called bicharacteristic curves of the wave equation. As particular solutions of (429) we find
$\psi_+(t, x) = (t - t_0) + |x - x_0|$ and $\psi_-(t, x) = (t - t_0) - |x - x_0|$, whose level surfaces
$\psi_\pm = 0$ correspond to forward and backward light cones with vertex at $p = (t_0, x_0)$.
These represent, physically, the union of all light rays emanating from a point source
at p. The light rays are given by the equation $(t - t_0)\,\omega = (x - x_0)$, for $\omega \in \mathbb{R}^3$
with $|\omega| = 1$, and are precisely the (t, x) components of the bicharacteristic curves
of the Hamilton-Jacobi equations (120).
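That the two cone functions satisfy the eikonal equation can be confirmed with finite differences. The sketch below is a sanity check under assumptions made here: space dimension 3, vertex at the origin, a sample point away from the vertex, and central differences with a hand-picked step. It evaluates $(\partial_t \psi)^2 - \sum_i (\partial_i \psi)^2$ for $\psi_\pm(t, x) = (t - t_0) \pm |x - x_0|$.

```python
import math

def psi(sign, t, x, t0=0.0, x0=(0.0, 0.0, 0.0)):
    # psi_+ for sign = +1, psi_- for sign = -1.
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
    return (t - t0) + sign * r

def eikonal_residual(sign, t, x, h=1e-5):
    """Central-difference approximation of (d_t psi)^2 - sum_i (d_i psi)^2."""
    dt = (psi(sign, t + h, x) - psi(sign, t - h, x)) / (2 * h)
    grad_sq = 0.0
    for i in range(3):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        di = (psi(sign, t, xp) - psi(sign, t, xm)) / (2 * h)
        grad_sq += di * di
    return dt * dt - grad_sq

residuals = [eikonal_residual(s, 0.3, (1.0, -2.0, 0.5)) for s in (+1, -1)]
print(residuals)
```

The check is only meaningful away from the vertex, where $|x - x_0|$ is smooth.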
More generally, consider the linear wave equation,
g = 0.
(121)
(122)
Thus, through any point $p \in \mathbb{R}^{1+n}$ pass two distinct characteristic surfaces. The
same is true for the general case.
The bicharacteristics of the corresponding hamiltonian systems are called bicharacteristic curves of (123).
Remark. In the case of the first order scalar equations (90) we have seen how the
knowledge of characteristics can be used to find, implicitly, the general solutions.
We have shown, in particular, that singularities propagate only along characteristics. In the case of second order equations the situation is more complicated. The
characteristics are typically9 not sufficient to solve the equations, but they continue to provide important information, such as propagation of singularities. For
example, in the case of the wave equation $\Box u = 0$ with initial data $u_0, u_1$ smooth
everywhere except at a point $p = (t_0, x_0)$, the solution u has singularities present
at all points of the light cone $-(t - t_0)^2 + |x - x_0|^2 = 0$ with vertex at p. A more
refined version of this fact shows that the singularities propagate along bicharacteristics. The general principle here is that singularities propagate along characteristic
hypersurfaces of a PDE. Since this is a very important principle it pays to give it
a more precise formulation which extends to general boundary conditions, such as
the Dirichlet condition for (??).
Propagation of singularities10. If the boundary conditions, or the coefficients
of a linear PDE with smooth (or real analytic) coefficients are singular at some
point p, and smooth ( or real analytic) away from p in some small neighborhood
V , then a solution of the equation may only be singular in V along a characteristic
hypersurface passing through p. If there are no such characteristic hypersurfaces,
any solution of the equation must be smooth ( or real analytic) in V \ {p}.
Remark 1. The principle as stated is far too general, it can be proved only if
specific assumptions are made on the symbol of the operator. It should be viewed
however as something one might expect for a reasonable equation.
Remark 2. The principle can be extended, under specific minimum regularity
assumptions on solutions, to the nonlinear case. It is however invalid in the large.
Indeed, as we have shown in the case of the Burgers equation, solutions to nonlinear evolution equations can develop new singularities independent of the smoothness of the initial conditions. Global versions of the principle can be formulated
for linear equations, based on the bicharacteristics of the equation, see remark 3
below.
Remark 3. According to the principle it follows that any solution of the equation
$\Delta u = f$, verifying the boundary condition $u|_{\partial D} = u_0$, with a boundary value $u_0$
which is merely continuous, has to be smooth everywhere in the interior of D
provided that f itself is smooth there. Moreover the solution is real analytic, if f
is real analytic.
9 Characteristics enter however in the explicit form of the fundamental solution for the
standard wave equation. This was made particularly obvious in the derivation starting with the
ansatz (148). They also play a major role in constructing approximate solutions for wave equations
with variable coefficients, such as (123).
10 A more precise version of the principle relates propagation of singularities to bicharacteristic curves.
Remark 4. More precise versions of this principle, which plays a fundamental role
in the general theory, can be given for linear equations. In the case of the general
wave equation (123), for example, one can show that singularities propagate along
bicharacteristics. These are the bicharacteristic curves associated to the Hamilton-Jacobi equation (124).
4. Cauchy-Kowalevsky Theorem
In the case of ODE we have seen that a non-characteristic initial value problem
always admits local in time solutions. Is there also a higher dimensional analogue
of this fact? The answer is yes provided that we restrict ourselves to an extension
of the Cauchy-Kowalevsky theorem. More precisely one can consider general
quasilinear equations, or systems, with real analytic coefficients, real analytic hypersurfaces H, and real analytic initial data on H.
Theorem[Cauchy-Kowalevsky (CK)] If all the real analyticity conditions made
above are satisfied and if H is non-characteristic at x0 11, there exists locally, in a
neighborhood of x0 , a unique real analytic solution u(x) verifying the system and
the corresponding initial conditions.
The CK theorem validates the most straightforward attempts to find solutions
Remark. The remarkable thing about Holmgren's theorem is that it proves uniqueness even in cases where existence of solutions cannot be guaranteed. Thus, as we
shall see below, the Cauchy problem for the wave equation with data on the hyperplane $x^1 = 0$ does not, in general, have solutions, yet Holmgren's theorem asserts
that if a solution exists it must be unique.
At first glance it may seem that the CK theorem is a perfect analogue of the fundamental theorem for ODEs. It turns out, however, that the analyticity conditions
required by the CK theorem are much too restrictive and thus the apparent generality of the result is misleading. A first limitation becomes immediately obvious when
we consider the wave equation $\Box u = 0$, whose fundamental feature of finite speed
of propagation12 is impossible to make sense of in the class of real analytic solutions.
A related problem, first pointed out by Hadamard, concerns the impossibility of
solving the Cauchy problem, in many important cases, for arbitrary smooth, non
analytic, data. Consider, for example, the Laplace equation $\Delta u = 0$ in $\mathbb{R}^d$. As we
have established above, any hypersurface H is non-characteristic, yet the Cauchy
problem $u|_H = u_0$, $N(u)|_H = u_1$, for arbitrary smooth initial conditions $u_0, u_1$ may
admit no local solutions, in a neighborhood of any point of H. Indeed take H to
be the hyperplane $x^1 = 0$ and assume that the Cauchy problem can be solved,
for given, non analytic, smooth data in a domain which includes a closed ball
B centered at the origin. The corresponding solution can also be interpreted as
the solution to the Dirichlet problem in B, with the values of u prescribed on the
boundary $\partial B$. But this, according to our heuristic principle13, must be real analytic
everywhere in the interior of B, contradicting our initial data assumptions.
On the other hand the Cauchy problem, for the wave equation $\Box u = 0$ in $\mathbb{R}^{d+1}$,
has a unique solution for any smooth initial data $u_0, u_1$, prescribed on a space-like
hypersurface, that is a hypersurface $\psi(t, x) = 0$ whose normal vector, at every
point $p = (t_0, x_0)$, is directed inside the interior of the future or past directed light
cone passing through that point. Formally this means,
\[ |\partial_t \psi(p)| > \Big( \sum_{i=1}^{d} |\partial_i \psi(p)|^2 \Big)^{1/2} \qquad (125) \]
d
X
|i (p)|2
1/2
(126)
i=1
In this case we cannot, for general non real analytic initial conditions, find a solution
of the IVP. An example of a time-like hypersurface is given by the hyperplane
x1 = 0.
12Roughly this means that if a solution u is compactly supported at some value of t it must
be compactly supported at all later times. Analytic functions cannot be compactly supported
without vanishing identically.
13which can be easily made rigorous in this case
Definition. A given problem for a PDE is said to be well posed if both existence
and uniqueness of solutions can be established for arbitrary data which belong to a
specified large space of functions, which includes the class of smooth functions14.
Moreover the solutions must depend continuously on the data.
The continuous dependence on the data is very important. Indeed solutions to the
IVP for a PDE would be of little use if very small changes of the initial conditions
were to result, instantaneously, in very large changes in the corresponding solutions.
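A classical example attributed to Hadamard, not spelled out in the text above, makes the failure of continuous dependence concrete for the Laplace equation in two variables. The functions $u_n(x, y) = n^{-2} \sin(nx) \sinh(ny)$ are harmonic, with Cauchy data $u_n(x, 0) = 0$ and $\partial_y u_n(x, 0) = n^{-1} \sin(nx)$ on the line y = 0. The sketch below, with names and sample values chosen here for illustration, compares the size of the data with the size of the solution at y = 1 and checks harmonicity with a five-point finite difference.

```python
import math

def u_n(n, x, y):
    # Harmonic function with u = 0 and d_y u = sin(n x) / n on the line y = 0.
    return math.sin(n * x) * math.sinh(n * y) / n ** 2

def data_size(n):
    # Supremum over x of |d_y u_n(x, 0)| = |sin(n x)| / n.
    return 1.0 / n

def solution_size(n, y=1.0):
    # Supremum over x of |u_n(x, y)|.
    return math.sinh(n * y) / n ** 2

def laplacian(n, x, y, h=1e-3):
    # Five-point finite-difference Laplacian, used only as a harmonicity check.
    return (u_n(n, x + h, y) + u_n(n, x - h, y)
            + u_n(n, x, y + h) + u_n(n, x, y - h)
            - 4.0 * u_n(n, x, y)) / h ** 2

for n in (1, 5, 10, 20):
    print(n, data_size(n), solution_size(n))
```

As n grows the data tend to zero uniformly while the solution at y = 1 grows exponentially, which is exactly the kind of instantaneous amplification the definition of well-posedness rules out.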
It is only in the class of smooth solutions that the theory of PDE becomes really
interesting, relevant and challenging. It means that we have to give up hope for
an all-encompassing result and look instead for special classes of equations which
have common features, or really just for special important equations. It is in that
sense that the generality of the CK theorem is really an illusion. The true study
of partial differential equations only begins when we give up on analyticity. In the
next chapter we will analyze in detail the main analytic properties of the simplest
equations such as the Cauchy-Riemann, Laplace, Heat and Wave equations using their
corresponding fundamental solutions. To do this we need first to recall the theory
of distributions.
14 Here we are necessarily vague. A precise space can be specified in each specific case.
CHAPTER 3
Distribution Theory
This is a very short summary of distribution theory. For more exposure to the subject I suggest F.G. Friedlander and M. Joshi's excellent book Introduction to the Theory of Distributions, [Fr-Io]. Hörmander's first volume of The Analysis of Linear Partial Differential Operators, [Ho], published by Springer, can also be useful.
Notation. Throughout these notes we use the notation $A \lesssim B$ to mean $A \le cB$, where $c$ is a numerical constant independent of $A$, $B$. When $\Omega \subset \mathbb{R}^n$ is a set, we may write $\chi(x \in \Omega)$ to denote the indicator function of the set $\Omega$. For instance, $\chi(5 \le |x| < 7)$ is a function equal to 1 for $5 \le |x| < 7$ and 0 otherwise.
$$\Delta V = \rho, \tag{127}$$
where $\Delta = \sum_{i=1}^3 \partial_i^2$ denotes the Laplacian, and on physical grounds we may require $V$ (or at least its derivative) to vanish at infinity so that distant interactions are weak.
As with any other field theory, the physical theory cannot be valid and complete unless there exists a unique solution to the equation (127) (for reasonable data $\rho$) which depends continuously, in some sense, on the data. In addition to resolving these issues, we seek at least a qualitative understanding of the behavior of the solution. In the present case, thanks to a huge amount of symmetry, we will even be able to derive an explicit formula, but for the heuristic analysis involved, it will only be important that the operator $\Delta = \sum_{i=1}^3 \partial_i^2$ is linear and commutes with
$$V(x) = \int V_y(x)\,\rho(y)\,dy \tag{128}$$
Formally, we can even manage to solve the equation $\Delta V_y(x) = \delta_y(x)$ for any fixed $y \in \mathbb{R}^3$. In view of the translation invariance of $\Delta$, we may assume that $y = 0$. Since $\Delta$ is rotationally invariant (see Exercise 1) and so is $\delta_0$, any solution $V_0(x) = V_0(|x|)$ should also be rotationally invariant if solutions are to be unique. We call $V_0(x)$ a fundamental solution for the Laplace operator. Then, postulating the existence and spherical symmetry of $V_0(x)$, we obtain (using the divergence theorem)
$$1 = \int_{|x| \le R} \delta_0(x)\,dx = \int_{|x| \le R} \Delta V_0(x)\,dx = \int_{|x| = R} \frac{dV_0}{dr}(|x|)\,d\sigma(x) = 4\pi R^2\,\frac{dV_0}{dr}(R).$$
We choose the only fundamental solution decaying at infinity, namely $V_0(x) = -\frac{1}{4\pi R}$ with $R = |x|$. Therefore, translating back to $y$, we find $V_y(x) = -\frac{1}{4\pi |x - y|}$. One can see by direct computation that $\Delta V_y(x) = 0$ away from $y$, and one can even prove that (128) does indeed solve (127) for (say) smooth, compactly supported densities $\rho$. Furthermore, by taking the gradient of (128), one obtains the experimentally refutable conclusion
1.
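The normalization of the fundamental solution can be checked numerically. The sketch below, in Python with names of our own choosing, takes the radial profile $V_0(r) = -1/(4\pi r)$, approximates $dV_0/dr$ by a central difference, and multiplies by the area $4\pi R^2$ of the sphere; by the divergence theorem computation above the result should be 1 for every radius $R$.

```python
import math

def V0(r):
    """Radial profile of the fundamental solution of the Laplacian in R^3."""
    return -1.0 / (4.0 * math.pi * r)

def flux_through_sphere(R, h=1e-6):
    """4 pi R^2 * dV0/dr(R), with the derivative taken by central differences."""
    dV_dr = (V0(R + h) - V0(R - h)) / (2.0 * h)
    return 4.0 * math.pi * R**2 * dV_dr

for R in (0.5, 1.0, 3.0):
    print(R, flux_through_sphere(R))
```

That the flux does not depend on $R$ reflects the fact that $\Delta V_0 = 0$ away from the origin.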
Given a fundamental solution for $L$ we can find solutions to the equation $Lu = f$, for any smooth, compactly supported $f$, by setting, formally (never mind, for the moment, that the integration may make no sense),
$$u(x) = \int V_y(x)\,f(y)\,dy.$$
Exercise 1. Show, informally, that if $L$ commutes with translations, in the sense that $(Lf)(\cdot + y) = L(f(\cdot + y))$ for all translations $x \mapsto x + y$, then the fundamental solution also commutes with translations, in the sense that $V_y(x) = V(x - y)$ with $V$ verifying $L(V) = \delta_0$.
Once a fundamental solution Vy of an operator L has been found, we need to make
sense of it as a generalized function as well as of the formal integration above. This is
precisely what the theory of distributions accomplishes. Distribution theory allows
us to make heuristic calculations rigorous and, even more importantly, enables us
to deal with singular objects as if they were regular functions. There are, of course,
limits to this new freedom which a good theory should spell out.
Exercise 2. It is not difficult to show that, for $\rho \in C_0^\infty(\mathbb{R}^3)$, the potential
$$V(x) = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{\rho(y)}{|x - y|}\,dy$$
behaves near infinity like $-\frac{1}{4\pi}\frac{\int_{\mathbb{R}^3} \rho(y)\,dy}{|x|} + O(|x|^{-2})$ away from the support of $\rho$. One way to prove this asymptotic and understand the error is to Taylor expand
$$\frac{1}{|x - y|} = \frac{1}{|x|} + \int_0^1 \frac{d}{ds}\frac{1}{|x - sy|}\,ds$$
(the idea being that the parameter $y$ is relatively small).
When the charge distribution is centered at the origin (that is, the vector-valued integral $\int_{\mathbb{R}^3} y\,\rho(y)\,dy = 0$), show the more precise result (with explicit remainder) that, as $|x| \to \infty$,
$$V(x) = -\frac{1}{4\pi}\frac{\int_{\mathbb{R}^3} \rho(y)\,dy}{|x|} + O(|x|^{-3}).$$
It may help keep computations simple to apply the precise, first order Taylor expansion $\varphi(1) = \varphi(0) + \varphi'(0) + \int_0^1 (1 - s)\,\varphi''(s)\,ds$ to the auxiliary function $\varphi(s) = \frac{1}{|x - sy|}$. Also, a convenient way to differentiate the absolute value function is to observe that $|x - sy|^2 = \langle x - sy, x - sy\rangle$, where $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product.
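The improved decay for a centered charge distribution can be observed numerically. The sketch below is only an illustration under a simplifying assumption: instead of a smooth density $\rho$ it uses two unit point charges at $(\pm 1, 0, 0)$, whose first moment vanishes. It compares the potential with the leading term $-\frac{1}{4\pi}\frac{Q}{|x|}$ and checks that doubling $|x|$ divides the error by about $2^3 = 8$. All names are ours.

```python
import math

ORIGIN = (0.0, 0.0, 0.0)
# Two unit charges placed symmetrically, so that sum(q * y) = 0 (assumption).
CHARGES = [(1.0, (1.0, 0.0, 0.0)), (1.0, (-1.0, 0.0, 0.0))]

def potential(x, charges=CHARGES):
    """V(x) = -(1/(4 pi)) * sum of q / |x - y| over point charges (q, y)."""
    return -sum(q / math.dist(x, y) for q, y in charges) / (4.0 * math.pi)

def leading_term(x, charges=CHARGES):
    """-(1/(4 pi)) * (total charge) / |x|."""
    total = sum(q for q, _ in charges)
    return -total / (4.0 * math.pi * math.dist(x, ORIGIN))

def error(R):
    """Size of V minus its leading term at the point (R, 0, 0)."""
    x = (R, 0.0, 0.0)
    return abs(potential(x) - leading_term(x))

ratio = error(100.0) / error(200.0)
print(ratio)  # close to 8 if the error decays like |x|^(-3)
```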
Remark: If the total charge $\int \rho(y)\,dy$ is not 0, then one can find a center of charge $y_c = \int y\,\rho\,dy \,/ \int \rho\,dy$ so that $\int (y - y_c)\,\rho(y)\,dy = 0$. In this situation, we could Taylor expand about $y = y_c$ to see that the associated potential behaves asymptotically as though it were centered at $y_c$:
$$V(x) \approx -\frac{1}{4\pi}\frac{\int_{\mathbb{R}^3} \rho(y)\,dy}{|x - y_c|}.$$
Notice, however, that when the charge cancels out in the sense that $\int \rho(y)\,dy = 0$, the associated potential function $V$ decays more rapidly at infinity, as $\frac{C}{|x|^2}$. This phenomenon of increased decay for localized, oscillatory data is not only physically important for explaining why electric forces are weak over distances when charge cancels, but it is also important in analysis, where a similar cancellation arises in many other naturally occurring situations. We will see this sort of cancellation being used in a critical way later in the notes.
Exercise 3. The reasoning in the previous section can be extended to solve for the potential inside of a bounded region whose boundary is grounded. That is, consider the problem $\Delta V(x) = \rho(x)$ for $x$ in a bounded domain $\Omega$, with $V = 0$ on the boundary $\partial\Omega$.
In principle, how could you construct a general solution of the form $V(x) = \int K(x, y)\,\rho(y)\,dy$? Where does linearity come in?
Exercise 4. Suppose that a unit of negative charge has been distributed uniformly over the sphere of radius $R_1$ in $\mathbb{R}^3$, and that a unit of positive charge has been distributed uniformly on the sphere of radius $R_2$. Find the electrostatic potential function $V$ associated to this charge configuration.
Exercise 5. a. Use the informal argument from the introduction to find the fundamental solution $K_n(x)$ of $\Delta$ in $\mathbb{R}^n$ for every $n \ge 2$; i.e. solve $\Delta K_n(x) = \delta_0(x)$ with an explicit formula for $K_n$.
b. Discuss the behavior as $|x| \to \infty$ of the corresponding solution
$$V(x) = \int K_n(x - y)\,\rho(y)\,dy$$
for $\rho$ compactly supported. Namely, as $|x| \to \infty$, what is the main term and how large is the error?
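Let $\Omega \subset \mathbb{R}^n$ and $f \in C^\infty(\Omega)$. We denote by $\partial_i f$ the partial derivative $\frac{\partial f}{\partial x^i}$, $i = 1, \ldots, n$. For derivatives of higher order we use the standard multi-index notation. A multi-index $\alpha$ is an $n$-tuple $\alpha = (\alpha_1, \ldots, \alpha_n)$ of nonnegative integers with length $|\alpha| = \alpha_1 + \cdots + \alpha_n$. Set $\alpha + \beta = (\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n)$. We denote by $\alpha!$ the product of factorials $\alpha_1! \cdots \alpha_n!$. Now set $\partial^\alpha f = \partial_1^{\alpha_1} \cdots \partial_n^{\alpha_n} f$. Clearly $\partial^{\alpha + \beta} f = \partial^\alpha \partial^\beta f$.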
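Given two smooth functions $u$, $v$ we have the Leibniz formula,
$$\partial^\alpha (u \cdot v) = \sum_{\beta + \gamma = \alpha} \frac{\alpha!}{\beta!\,\gamma!}\,\partial^\beta u\,\partial^\gamma v.$$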
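$$f(x) = \sum_{|\alpha| \le k} \frac{1}{\alpha!}\,\partial^\alpha f(0)\,x^\alpha + O(|x|^{k+1}) \quad \text{as } x \to 0.$$
Here $x^\alpha$ denotes the monomial $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$.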
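The multi-index bookkeeping is easy to get wrong, so a small sanity check may help. Applying the Leibniz formula to $u = v = e^{x_1 + \cdots + x_n}$, for which every derivative reproduces the function, shows that the coefficients $\frac{\alpha!}{\beta!\,\gamma!}$ over all splittings $\beta + \gamma = \alpha$ must add up to $2^{|\alpha|}$. The Python sketch below (helper names are ours) enumerates the splittings and verifies this.

```python
import itertools
import math

def multi_factorial(alpha):
    """alpha! = alpha_1! * ... * alpha_n! for a multi-index alpha."""
    return math.prod(math.factorial(a) for a in alpha)

def leibniz_terms(alpha):
    """Yield (beta, gamma, coefficient) for all beta + gamma = alpha."""
    for beta in itertools.product(*(range(a + 1) for a in alpha)):
        gamma = tuple(a - b for a, b in zip(alpha, beta))
        coeff = multi_factorial(alpha) // (multi_factorial(beta) * multi_factorial(gamma))
        yield beta, gamma, coeff

alpha = (2, 1, 3)
terms = list(leibniz_terms(alpha))
total = sum(coeff for _, _, coeff in terms)
print(len(terms), total, 2 ** sum(alpha))
```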
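$$f_\varepsilon(x) = \varepsilon^{-n}\int f(y)\,\rho\Big(\frac{x - y}{\varepsilon}\Big)\,dy = \int f(x - \varepsilon z)\,\rho(z)\,dz.$$
We have:
(1) The functions $f_\varepsilon$ are in $C_0^\infty(\mathbb{R}^n)$ and $\mathrm{supp}(f_\varepsilon) \subset \mathrm{supp}(f) + B(0, \varepsilon)$.
(2) We have $f_\varepsilon \to f$ uniformly as $\varepsilon \to 0$.
Proof: The first part of the proposition follows immediately from the definition, since the statement about supports is immediate and, by integration by parts, we can transfer all derivatives of $f_\varepsilon$ onto the smooth part of the integrand, $\rho$. To prove the second statement we simply write,
$$f_\varepsilon(x) - f(x) = \int \big(f(x - \varepsilon z) - f(x)\big)\,\rho(z)\,dz.$$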
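Therefore, for $|\alpha| \le k$,
$$|\partial^\alpha f_\varepsilon(x) - \partial^\alpha f(x)| \le \int |\partial^\alpha f(x - \varepsilon z) - \partial^\alpha f(x)|\,|\rho(z)|\,dz \le \int |\rho(z)|\,dz\,\sup_{|z| \le 1} |\partial^\alpha f(x - \varepsilon z) - \partial^\alpha f(x)|.$$
The proof now follows easily in view of the uniform continuity of the functions $\partial^\alpha f$.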
As a corollary of the Proposition, one can easily check that the space of test functions $C_0^\infty(\Omega)$ is dense in the spaces $C^k(\Omega)$ as well as $L^p(\Omega)$, $1 \le p < \infty$. Of course, one must first exhibit at least one such $\rho \in C_0^\infty(\mathbb{R}^n)$ with $\int \rho\,dx = 1$. Some multiple of the bump function $\rho(x) = e^{-\frac{1}{1 - |x|^2}}\chi(|x| < 1)$ will do. Another way to construct an example is by starting with any $C^1$ bump function and taking advantage of the smoothing effects of random translations (as in the above proposition), but keeping the support under control, to obtain a smooth bump function as a limit of an iterative process.
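The uniform convergence of mollifications can be watched in a one-dimensional experiment. The sketch below is an illustration under our own choices: it uses the bump $e^{-1/(1 - z^2)}$ on $(-1, 1)$, normalizes it numerically with a midpoint rule, and mollifies the Lipschitz function $f(x) = |x|$. Since the discrete weights are nonnegative and sum to one, the computed $f_\varepsilon(x)$ is an average of values $f(x - \varepsilon z)$ with $|z| < 1$, so the error cannot exceed $\varepsilon$.

```python
import math

def bump(z):
    """Smooth bump supported in |z| < 1 (not yet normalized)."""
    return math.exp(-1.0 / (1.0 - z * z)) if abs(z) < 1.0 else 0.0

# Midpoint quadrature nodes on (-1, 1) and normalized weights (sum equals 1).
N_NODES = 2000
NODES = [-1.0 + (k + 0.5) * (2.0 / N_NODES) for k in range(N_NODES)]
_RAW = [bump(z) for z in NODES]
WEIGHTS = [w / sum(_RAW) for w in _RAW]

def mollify(f, x, eps):
    """Discrete version of f_eps(x) = integral of f(x - eps z) rho(z) dz."""
    return sum(w * f(x - eps * z) for w, z in zip(WEIGHTS, NODES))

def sup_error(f, eps, grid):
    """max over the grid of |f_eps(x) - f(x)|."""
    return max(abs(mollify(f, x, eps) - f(x)) for x in grid)

grid = [k * 0.05 - 1.0 for k in range(41)]
for eps in (0.5, 0.1, 0.01):
    print(eps, sup_error(abs, eps, grid))
```

The largest error occurs at the kink $x = 0$, where $f_\varepsilon(0) = \varepsilon \int |z|\,\rho(z)\,dz$.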
Definition 2.2. A distribution $u \in \mathcal{D}'(\Omega)$ is a linear functional $u : C_0^\infty(\Omega) \to \mathbb{C}$ verifying the following property:
For any compact set $K \subset \Omega$ there exists an integer $N$ and a constant $C = C_{K,N}$ such that for all $\phi \in C_0^\infty(\Omega)$ with $\mathrm{supp}(\phi) \subset K$ we have
$$|\langle u, \phi\rangle| \le C \sum_{|\alpha| \le N} \sup |\partial^\alpha \phi|.$$
If the same integer $N$ can be used in the above definition for every $K$, then the smallest such $N$ is called the order of the distribution. For example, the Riesz Representation theorem (characterizing the dual of $C(X)$ for compact Hausdorff spaces) guarantees that distributions of order 0 are Borel measures.
Equivalently, a distribution $u$ is a linear functional $u : C_0^\infty(\Omega) \to \mathbb{C}$ which is continuous with respect to some topology defined on $C_0^\infty(\Omega)$. This topology turns out to be a rather unorthodox one (non-metrizable$^1$, locally convex) but never mind all this; we can go quite far without worrying in the least about the precise definition. All we need to know is that in this topology a sequence $\phi_j$ converges to 0 in $C_0^\infty(\Omega)$ if all the supports of $\phi_j$ are included in a compact subset of $\Omega$ and, for each multi-index $\alpha$, $\partial^\alpha \phi_j \to 0$ in the uniform norm. With this definition in mind we have the following very useful characterization of distributions:
1 This topology can be constructed as an inductive limit topology of Fréchet spaces $C_K^\infty$, where $K \subset \Omega$ is compact and $C_K^\infty$ is the space of all smooth functions supported in $K$, endowed with a Fréchet space structure by the seminorms $\phi \mapsto \sup_K |\partial^\alpha \phi|$ for all multi-indices $\alpha$. We do not, however, need the precise definition.
Example 1:
We can thus identify $L^1_{loc}(\Omega)$ as a subspace of $\mathcal{D}'(\Omega)$. This is true in particular for the space $C^\infty(\Omega) \subset L^1_{loc}(\Omega)$.
One often uses the formal notation $\langle u, \phi\rangle = \int u(x)\,\phi(x)\,dx$ even when $u \in \mathcal{D}'(\Omega)$ is not a locally integrable function, and even when $\phi$ is not technically a test function. This notation can be conceptually simpler, but keep in mind that this is in no way a genuine Lebesgue integral. One can, however, typically interpret this formal integration as a limit of classical integrals.
Example 2:
by
Now, since $x \in K$ is restricted to some compact set $K \subset \mathbb{R}^n$, for every sequence $h_i \to 0$ the associated sequence of functions $y \mapsto \int_0^1 \partial_k \phi(x + t h_i e_k - y)\,dt$, together with all its derivatives, converges uniformly toward $\partial_k \phi(x - y)$ and its corresponding derivatives. Moreover they are all compactly supported, with supports contained in some compact set $K'$. Therefore,
$$\lim_{h \to 0} \frac{u * \phi(x + h e_k) - u * \phi(x)}{h} = u * \partial_k \phi(x),$$
and thus $u * \phi$ has continuous partial derivatives. We can continue in this manner and conclude that in fact $u * \phi \in C^\infty(\mathbb{R}^n)$.
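3. Differentiation of distributions: For every distribution $u \in \mathcal{D}'(\Omega)$ we define
$$\langle \partial^\alpha u, \phi\rangle = (-1)^{|\alpha|}\,\langle u, \partial^\alpha \phi\rangle.$$
We make this definition to be consistent with the integration by parts formula for functions,
$$\int \partial_i u(x)\,\phi(x)\,dx = -\int u(x)\,\partial_i \phi(x)\,dx, \qquad \phi \in C_0^\infty(\mathbb{R}^n),$$
which may be proven, for example, by considering difference quotients.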
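As a quick numerical illustration of this definition, take $u$ to be the step function $H$ equal to 1 for $x \ge 0$ and 0 otherwise. The definition gives $\langle H', \phi\rangle = -\int_0^\infty \phi'(x)\,dx$, which should equal $\phi(0)$. The sketch below checks this for one test function of our choosing, $\phi(x) = e^{-1/(1 - x^2)}$ on $(-1, 1)$, using a midpoint rule; the function names are ours.

```python
import math

def phi(x):
    """Test function supported in |x| < 1."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def dphi(x):
    """Exact derivative of phi: phi(x) * (-2x) / (1 - x^2)^2."""
    if abs(x) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def pair_H_prime_with_phi(n=20000):
    """<H', phi> = -<H, phi'> = -(integral of phi' over [0, 1]), midpoint rule."""
    h = 1.0 / n
    return -sum(dphi((k + 0.5) * h) for k in range(n)) * h

print(pair_H_prime_with_phi(), phi(0.0))
```

Both printed numbers should agree with $e^{-1}$, consistent with the statement that the distributional derivative of the step function is the point mass at the origin.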
1 = lim
has a density function 1 (x) and the limit can be taken in the topology of C0 .
In this dual sense, we have $\partial_1 u = \lim_{h \to 0} \frac{u(x + h e_1) - u(x)}{h}$ in the weak topology, which often enables us to differentiate under the integral sign provided we interpret all integrals in the distribution-theoretic sense.
We can now define the action of a general linear partial differential operator on distributions. Indeed let,
$$P(x, \partial) = \sum_{|\alpha| \le m} a_\alpha(x)\,\partial^\alpha, \qquad a_\alpha \in C^\infty(\Omega),$$
1.) The simplest nontrivial distribution is the Dirac delta function $\delta_0 = \delta_0(x)$, defined by $\langle \delta_0(x), \phi\rangle = \phi(0)$. We will sometimes write $\delta(x)$ without a subscript to indicate the point mass at the origin on $\mathbb{R}$.
2.) Another simple example is the Heaviside function $H(x)$, equal to 1 for $x \ge 0$ and zero for $x < 0$. Or, using the standard identification between locally integrable functions and distributions,
$$\langle H(x), \phi\rangle = \int_0^\infty \phi(x)\,dx.$$
0
Rx
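3.) A more elaborate example is $\mathrm{pv}(\frac{1}{x})$, or simply $\frac{1}{x}$, called the principal value distribution,
$$\Big\langle \frac{1}{x}, \phi\Big\rangle = \lim_{\varepsilon \to 0}\Big(\int_{-\infty}^{-\varepsilon} \frac{1}{x}\,\phi(x)\,dx + \int_{\varepsilon}^{\infty} \frac{1}{x}\,\phi(x)\,dx\Big).$$
Observe that $\log|x|$ is locally integrable and thus a distribution by the standard identification. One can show easily that $\frac{d}{dx}\log|x| = \mathrm{pv}(\frac{1}{x})$. Note that $\mathrm{pv}(\frac{1}{x})$ is an odd distribution (it is orthogonal to even test functions), and is of order 1 even though it is of order 0 away from the origin. In fact, decomposing $\phi = \phi_{ev} + \phi_{odd}$ into even and odd parts, we have
$$\Big\langle \mathrm{pv}\Big(\frac{1}{x}\Big), \phi\Big\rangle = \int \frac{\phi_{odd}}{x}\,dx = \int\!\!\int_0^1 \phi_{odd}'(tx)\,dt\,dx.$$
We also remark that the function $\frac{1}{|x|}$ ($x \ne 0$), in contrast, does not admit an extension as a distribution to the whole line.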
Exercise 1. Show that the distribution $t\,\frac{d}{dt}\delta(t)$ on the line is equal to $-\delta(t)$, which is a nonzero distribution. This may seem counterintuitive since either $t$ or $\delta'(t)$ seems to vanish at every point.
Exercise 2. Let, for $z \in \mathbb{C}$ with $0 < \arg(z) < \pi$, $\log z = \log|z| + i\arg(z)$. We can regard $x \mapsto \log z = \log(x + iy)$ as a family of distributions depending on $y \in \mathbb{R}^+$. For $x \ne 0$ we have $\lim_{y \to 0^+} \log z = \log|x| + i\pi\big(1 - H(x)\big)$. Show that as $y \to 0$ in $\mathbb{R}^+$, $\partial_x \log z$ converges weakly to a distribution $\frac{1}{x + i0}$ and,
$$\frac{1}{x + i0} = x^{-1} - i\pi\,\delta_0(x). \tag{129}$$
(130)
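$$B(a, b) = \int_0^1 s^{a-1}(1 - s)^{b-1}\,ds \tag{131}$$
$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)} \tag{132}$$
$$\Gamma(a)\,\Gamma(1 - a) = \frac{\pi}{\sin(\pi a)} \tag{133}$$
Prove formulas (132) and (133). For help see Hörmander, [?] section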
Definition 3.2. For $\mathrm{Re}(a) > 0$, we denote by $j_a(\lambda)$ the locally integrable function which is identically zero for $\lambda < 0$ and
$$j_a(\lambda) = \frac{1}{\Gamma(a)}\,\lambda^{a-1}, \qquad \lambda > 0. \tag{134}$$
The following proposition is well known,
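We have,
$$\begin{aligned}
j_a * j_b(\lambda) &= \frac{1}{\Gamma(a)\,\Gamma(b)}\int_0^\lambda \mu^{a-1}(\lambda - \mu)^{b-1}\,d\mu \\
&= \frac{1}{\Gamma(a)\,\Gamma(b)}\,\lambda^{a+b-1}\int_0^1 s^{a-1}(1 - s)^{b-1}\,ds \\
&= \frac{B(a, b)}{\Gamma(a)\,\Gamma(b)}\,\lambda^{a+b-1} = \frac{1}{\Gamma(a + b)}\,\lambda^{a+b-1} = j_{a+b}(\lambda).
\end{aligned}$$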
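The semigroup identity $j_a * j_b = j_{a+b}$ can be tested numerically for real exponents. The sketch below, a Python illustration with names of our own, evaluates the convolution integral over $[0, \lambda]$ by a midpoint rule and compares it with $j_{a+b}(\lambda)$; the sample values $a = 2.5$, $b = 3$, $\lambda = 1.7$ are arbitrary choices with $a, b \ge 1$ so that the integrand is bounded.

```python
import math

def j(a, lam):
    """j_a(lam) = lam^(a-1) / Gamma(a) for lam > 0, and 0 otherwise (real a > 0)."""
    return lam ** (a - 1.0) / math.gamma(a) if lam > 0.0 else 0.0

def convolve_j(a, b, lam, n=20000):
    """Midpoint rule for (j_a * j_b)(lam) = integral_0^lam j_a(mu) j_b(lam - mu) dmu."""
    h = lam / n
    return sum(j(a, (k + 0.5) * h) * j(b, lam - (k + 0.5) * h) for k in range(n)) * h

a, b, lam = 2.5, 3.0, 1.7
print(convolve_j(a, b, lam), j(a + b, lam))
```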
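Based on this observation we define, for every $a \in \mathbb{C}$ such that $\mathrm{Re}(a) + m > 0$, the distribution
$$\langle j_a, \phi\rangle = (-1)^m \int_0^\infty j_{a+m}(\lambda)\,\phi^{(m)}(\lambda)\,d\lambda.$$
In particular,
$$\langle j_0, \phi\rangle = -\int_0^\infty j_1(\lambda)\,\phi'(\lambda)\,d\lambda = -\int_0^\infty \phi'(\lambda)\,d\lambda = \phi(0).$$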
||N
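By Taylor expansion, one reduces to the case where all derivatives $\partial^\alpha \phi(0) = 0$ for $|\alpha| \le N$. More precisely, we can write
$$\begin{aligned}
\phi(x) &= \phi(0) + \int_0^1 \frac{d}{dt}\phi(tx)\,dt \\
&= \phi(0) + \sum_i \int_0^1 x^i\,\partial_i\phi(tx)\,dt \\
&= \phi(0) + \sum_i \partial_i\phi(0)\,x^i + \int_0^1 (1 - t)\,\frac{d^2}{dt^2}\phi(tx)\,dt
\end{aligned}$$
and continues integrating by parts until one has written $\phi$ as a multinomial with coefficients corresponding to derivatives of $\phi$ at 0 plus a sum of terms of the form $x^\gamma \phi_\gamma$, where $\gamma$ is a multi-index with $|\gamma| > N$ and the functions $\phi_\gamma$ are smooth (but obviously not all of compact support).
Expanding $\phi$ in this way, we need to show that $\langle u, x^\gamma \phi_\gamma\rangle = 0$. Here we will use the estimate $|\langle u, \phi\rangle| \le C\|\phi\|_{C^N}$, so we will have to estimate derivatives of the type $\partial^\alpha\big(\chi(\frac{x}{\varepsilon})\,x^\gamma \phi_\gamma\big)$ with order $|\alpha| \le N$. Observe that
$$\partial^\beta\Big(\chi\Big(\frac{x}{\varepsilon}\Big)\Big) = \varepsilon^{-|\beta|}\,(\partial^\beta \chi)\Big(\frac{x}{\varepsilon}\Big).$$
The product rule generates a tremendous number of terms, but to find the exact formula for these products may not be useful, even though it might be worthwhile to go through the details at least once. In practice, however, one avoids details (such as the exact values of constants) which are not at the heart of the matter by understanding what kinds of terms will occur, and in particular one isolates the worst terms. In this case, the worst terms occur when a derivative falls upon the cutoff $\chi_\varepsilon = \chi(\frac{x}{\varepsilon})$, which is becoming increasingly sharp as $\varepsilon \to 0$. For such terms, each derivative generates a factor of $\varepsilon^{-1}$. However, at most $N$ derivatives can hit this cutoff, and so we have, for some number $C'$ independent of $\varepsilon$ (although potentially dependent on $\gamma$ and $\phi_\gamma$),
$$|\langle u, \chi_\varepsilon\,x^\gamma \phi_\gamma\rangle| \le C'\,\varepsilon^{|\gamma| - N},$$
which tends to 0 as $\varepsilon \to 0$ since $|\gamma| > N$.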
Now that we have introduced the notion of support, it is important to observe that the convolution of two distributions cannot be defined in general, but only when certain conditions on the support of the distributions are satisfied. We note in particular the fact that if $u_1, u_2 \in \mathcal{D}'(\mathbb{R}^n)$, one of which is compactly supported, then the convolution $u_1 * u_2$ can be defined. Indeed, assuming $u_2$ to be compactly supported, we simply define
$$(u_1 * u_2) * \phi = u_1 * (u_2 * \phi), \qquad \forall \phi \in C_0^\infty(\mathbb{R}^n).$$
C0 ()
In this case we see that the change of variables formula is equivalent to the definition
of pullback.
Example 2: If $f : \Omega \to \mathbb{R}$ has a nonvanishing gradient, then we can explicitly obtain the pullback of the delta function $\delta_t$, namely $f^*(\delta_t) = \frac{1}{|\nabla f|}\,d\sigma$. Here, $d\sigma$ denotes the canonical surface measure on the embedded hypersurface $f^{-1}(t) = \{f(x) = t\} \subset \mathbb{R}^n$ and $\nabla f$ denotes the gradient of $f$.
In other words, we can compute the value at $t$ of the pushforward measure's density function
$$f_\#\phi(t) = \int_{f^{-1}(t)} \phi(x)\,\frac{d\sigma(x)}{|\nabla f|},$$
and therefore compute $\langle f^*u, \phi\rangle = \langle u, f_\#\phi\rangle$ not only for a $\delta$-function, but also for arbitrary distributions $u \in \mathcal{D}'(\mathbb{R})$. In this way, the pullback formula may be written informally as a sort of decomposition
$$u(f(x)) = \int u(t)\,\delta(f(x) - t)\,dt,$$
which can be formally derived from the identity $u(y) = \int_{\mathbb{R}} u(t)\,\delta(y - t)\,dt$.
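In one dimension the surface measure on a level set is just a sum over its points, so the density formula reads $f_\#\phi(t) = \sum_{f(x) = t} \phi(x)/|f'(x)|$. The sketch below compares this with the average of $\phi$ over the thin set $\{t < f(x) \le t + \varepsilon\}$ divided by $\varepsilon$, which approximates the pairing with $\delta(f(x) - t)$. The choices $f(x) = x^2$ (restricted to $x \ne 0$, where the gradient does not vanish), the Gaussian weight $\phi$ (rapidly decaying rather than compactly supported), and all names are our assumptions for illustration.

```python
import math

def phi(x):
    """Weight function; a shifted Gaussian so the two preimages contribute differently."""
    return math.exp(-(x - 0.5) ** 2)

def f(x):
    return x * x

def thickened_average(t, eps, a=-3.0, b=3.0, n=600000):
    """(1/eps) * integral of phi over {t < f(x) <= t + eps}, by a midpoint sum."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        if t < f(x) <= t + eps:
            total += phi(x)
    return total * h / eps

def pushforward_density(t):
    """Sum of phi(x) / |f'(x)| over the two preimages x = +sqrt(t), -sqrt(t)."""
    r = math.sqrt(t)
    return (phi(r) + phi(-r)) / (2.0 * r)

print(thickened_average(1.0, 1e-2), pushforward_density(1.0))
```

The two numbers agree up to an error of order $\varepsilon$, and the factor $1/|f'|$ is visible as the width $\varepsilon/(2\sqrt{t})$ of each of the two thin intervals.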
As a sample application of this formula, one can see that the derivative of the volume of the ball of radius $R$ is the surface area of the sphere of radius $R$, from the fact that the gradient $\nabla|x| = \frac{x}{|x|}$ has norm 1 and from the differentiation
$$\frac{d}{dr}\int H(r - |x|)\,dx = \int \delta(r - |x|)\,dx.$$
This formula is clear geometrically: when one compares the volume of a ball of radius $r$ to a slightly larger ball of radius $r + \varepsilon$, the change in volume is essentially $\varepsilon$ times the surface area.
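This relation between volume and surface area is easy to confirm with the standard closed formulas, which are quoted here rather than derived in the notes: the ball of radius $r$ in $\mathbb{R}^n$ has volume $\frac{\pi^{n/2}}{\Gamma(n/2 + 1)}\,r^n$ and the sphere of radius $r$ has area $\frac{2\pi^{n/2}}{\Gamma(n/2)}\,r^{n-1}$. The Python sketch below (names are ours) differentiates the volume numerically and compares.

```python
import math

def ball_volume(n, r):
    """Volume of the ball of radius r in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

def sphere_area(n, r):
    """Surface area of the sphere of radius r in R^n."""
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2) * r ** (n - 1)

def volume_derivative(n, r, h=1e-5):
    """Central difference approximation of d/dr of the ball volume."""
    return (ball_volume(n, r + h) - ball_volume(n, r - h)) / (2.0 * h)

for n in range(1, 7):
    print(n, volume_derivative(n, 1.3), sphere_area(n, 1.3))
```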
Since the pullback of a delta-function will be very important for us, let us give a proof of this formula. Once we have proven this formula, we have built up the theory enough to carry out the details of the previous calculation in full. One would take difference quotients in the variable $r$ of the distribution $H(r - |x|)$, and these difference quotients are essentially supported on a thickened sphere. We then use the trivial observation that the pullback of a distribution, $\langle f^*u, \phi\rangle = \langle u, f_\#\phi\rangle$, is continuous in $u$ with respect to weak limits.
Now let us prove the pullback formula for a $\delta$-function. The geometric picture in the proof is basically a generalization of the special case $f(x) = |x|$ considered above.
Proof of the Pullback Formula for m = 1
By taking a partition of unity to decompose $\phi$ if necessary, we may assume that $f$ may be completed to a coordinate system on the support of $\phi$, since this is always possible on a small neighborhood of any point in the support of $\phi$ by the nonvanishing of $|\nabla f|$, and since finitely many such neighborhoods suffice. We consider the measure $\mu = \phi(y)\,dy$ and let $\Phi(t) : \mathbb{R} \to \mathbb{C}$ be the distribution function defined by
$$\Phi(t) = f_\#\mu(-\infty, t] = \int \chi(f(y) \le t)\,\phi(y)\,dy.$$
We show that the limit
$$\lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \chi(t < f(y) \le t + \varepsilon)\,\phi(y)\,dy$$
exists at every point $t \in \mathbb{R}$, from which it will follow$^2$ that $\Phi$ is absolutely continuous and that $\Phi'(t) = f_\#\phi(t)$ given by the formula is in fact the correct density function. For simplicity of notation, let us suppose $\varepsilon > 0$.
We now verify by change of coordinates that, very close to a point $y_0 \in f^{-1}(t)$, the thickened hypersurface $\{y \mid t < f(y) \le t + \varepsilon\}$ can be parameterized to have height $\frac{\varepsilon}{|\nabla f(y_0)|} + o(\varepsilon)$ and width $d\sigma_{f^{-1}(t)}(y_0)$, which is at least intuitively plausible from a picture of the generic situation (for example, in the case of the preceding example $f(x) = |x|$).
We may assume (without loss of generality) that $\frac{\partial f}{\partial y^1}(y_0) \ne 0$, so that there is a smooth function $h(x^1, \ldots, x^n)$ satisfying
$$f(h(x), x') = x^1, \qquad x' = (x^2, \ldots, x^n). \tag{135}$$
2 The Fundamental Theorem of Calculus applies when $\Phi$ is continuous and classically differentiable at every point $t$.
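$$\begin{aligned}
&\lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \chi(t < f(y) \le t + \varepsilon)\,\phi(y)\,|dy^1 \wedge dy^2 \wedge \cdots \wedge dy^n| \\
&\quad= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \chi(t < x^1 \le t + \varepsilon)\,\phi(h(x), x')\,|dh \wedge dx^2 \wedge \cdots \wedge dx^n| \\
&\quad= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \chi(t < x^1 \le t + \varepsilon)\,\phi(h(x), x')\,\Big|\frac{\partial h}{\partial x^1}(x)\Big|\,|dx^1 \wedge dx^2 \wedge \cdots \wedge dx^n| \\
&\quad= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \chi(t < x^1 \le t + \varepsilon)\,\Big(\phi \Big/ \Big|\frac{\partial f}{\partial y^1}\Big|\Big)\big(h(x^1, x'), x'\big)\,dx^1 \cdots dx^n \\
&\quad= \int \Big(\phi \Big/ \Big|\frac{\partial f}{\partial y^1}\Big|\Big)\big(h(t, x'), x'\big)\,dx'
\end{aligned}$$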
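To compute the Jacobian of the transformation in the second line, we have used the shorthand of differential forms, which quickly encapsulates the fact that the volume of an $n$-dimensional parallelepiped remains unchanged when one vertex is translated within the span of others, through the identity
$$\begin{aligned}
dh \wedge dx^2 \wedge \cdots \wedge dx^n &= \Big(\frac{\partial h}{\partial x^1}\,dx^1\Big) \wedge dx^2 \wedge \cdots \wedge dx^n + \Big(\sum_{i=2}^n \frac{\partial h}{\partial x^i}\,dx^i\Big) \wedge dx^2 \wedge \cdots \wedge dx^n \\
&= \frac{\partial h}{\partial x^1}\,dx^1 \wedge \cdots \wedge dx^n,
\end{aligned}$$
and $\frac{\partial h}{\partial x^1}$ is computed through implicit differentiation of equation (135), which defines $h$ implicitly. It is now clear that $\Phi' = f_\#\phi$ is a smooth function of $t$.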
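The equation (135) also shows that, for $t$ fixed, the function $\psi_t(x') = h(t, x')$ parameterizes the hypersurface $f^{-1}(t)$ as the graph of the function $x^1 = \psi_t(x')$ when $x'$ varies. We now wish to interpret the integral over $f^{-1}(t)$ in terms of the surface measure
$$d\sigma_{f^{-1}(t)}(x') = \Big(1 + \sum_{j=2}^n \Big(\frac{\partial\psi_t}{\partial x^j}\Big)^2\Big)^{1/2} dx',$$
so we compute the surface density by implicitly differentiating $f(\psi_t(x'), x') = t$ to obtain
$$\frac{\partial f}{\partial y^1}\,\frac{\partial\psi_t}{\partial x^j} + \frac{\partial f}{\partial y^j} = 0, \qquad j = 2, \ldots, n.$$
Hence, we see that
$$|\nabla f| = \sqrt{\sum_i \Big(\frac{\partial f}{\partial y^i}\Big)^2} = \Big|\frac{\partial f}{\partial y^1}\Big|\,\Big(1 + \sum_{j=2}^n \Big(\frac{\partial\psi_t}{\partial x^j}\Big)^2\Big)^{1/2}.$$
Substituting into $\int \big(\phi / |\frac{\partial f}{\partial y^1}|\big)(\psi_t(x'), x')\,dx'$ gives the formula in Example 2 above.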
The proof above could have been simplified by employing the same change of variables $(y^1, \ldots, y^n) = (h(x), x^2, \ldots, x^n)$ to show directly that
$$\int u(f(y))\,\phi(y)\,dy = \int u(t)\,f_\#\phi(t)\,dt$$
for all $u \in C^\infty(\mathbb{R})$, or equivalently by using a smooth approximate delta-function in place of the sharp approximate delta-function $\frac{1}{\varepsilon}\chi(0 < t \le \varepsilon)$. We have alternatively chosen the above, lengthier proof for its intuitive, geometric appeal and also to demonstrate the use of distribution functions for computing $f_\#\phi$. A briefer and more general proof is given in the Appendix along with some computational tools.
Exercise 1. Let $S_\lambda : \mathbb{R}^n \to \mathbb{R}^n$ be the dilation map $S_\lambda(x) = \lambda x$. We say that a distribution $u \in \mathcal{D}'(\mathbb{R}^n)$ is homogeneous of degree $a$ if $S_\lambda^* u = \lambda^a u$. Show that the definition coincides with the usual one if $u$ is a function. Show that, in $\mathbb{R}^n$, $\delta_0$ is homogeneous of degree $-n$.
Exercise 2. Show that any distribution in $\mathbb{R}^n$ which is both homogeneous of degree $-n$ and also supported at the origin is a constant multiple of $\delta_0$.
The examples above are special cases of a more general formula. We can compute $f_\#\phi$ when $f : \Omega \to \mathbb{R}^m$ is a smooth map whose derivative is everywhere surjective by the following explicit formula:
$$(f_\#\phi)(y) = \int_{f^{-1}(y)} \phi(x)\,\frac{d\sigma(x)}{\|\nabla f\|(x)} \tag{136}$$
1
(f f )
||f ||2
(x)dx = (x)(x)dx =
(x)~nd (x),
f
(x)d (x) = ~nd (x)
|fa |
The extension of distribution theory to the setting of manifolds is mostly straightforward and is outlined in Hörmander.
The Appendix on integration over submanifolds included at the end of the notes may help for some of the following Exercises.
Exercise 3. Show that if $f$, $g$ are two smooth functions on $\mathbb{R}^n$ with non-vanishing differential everywhere, then for all $(a, b) \in \mathbb{R}^n \times \mathbb{R}^n$:
$$\int \delta_0(f(a) - x)\,\delta_0(g(b) - x)\,dx = \delta_0(f(a) - g(b)).$$
Hint: Both sides are to be interpreted as distributions on $\mathbb{R}^n \times \mathbb{R}^n$. One could re-write the definition of pull-back in the form $u(g(b)) = \int u(x)\,\delta_0(g(b) - x)\,dx$. Approximating with approximate $\delta$-functions, we can extend to the case $u(x) = \delta_0(f(a) - x)$. Alternatively, use the obvious special case where $f(a) = a$ and $g(b) = b$ are both the identity map and pull back for general $f$ and $g$.
Exercise 4.
(, , )d ( > 2 )
|d d|
2
CHAPTER 4
often have nothing to do with the various symmetries, and clearly remain true if
one were to perturb the operator slightly, or even substantially. Therefore keep
in mind that, though explicit formulas are very useful, we will ultimately need to
develop more robust techniques to understand properties of PDEs.
1. Cauchy-Riemann equations
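The operator$^1$ $\frac{\partial}{\partial\bar z} = \frac{1}{2}\big(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\big)$ is fundamental to complex analysis, which studies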
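$\frac{\partial}{\partial\bar z}\big[u(e^{it}z)\big] = e^{-it}\,\frac{\partial u}{\partial\bar z}(e^{it}z)$ from the calculation
$$\begin{aligned}
d[u(e^{it}z)] &= \frac{\partial u}{\partial z}(e^{it}z)\,d(e^{it}z) + \frac{\partial u}{\partial\bar z}(e^{it}z)\,d(\overline{e^{it}z}) \\
&= e^{it}\,\frac{\partial u}{\partial z}(e^{it}z)\,dz + e^{-it}\,\frac{\partial u}{\partial\bar z}(e^{it}z)\,d\bar z
\end{aligned}$$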
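1 An easy way to remember this definition is to write $f(x, y) = f\big(\frac{z + \bar z}{2}, \frac{z - \bar z}{2i}\big)$. Note also that $\frac{\partial}{\partial z} = \frac{1}{2}\big(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\big)$ and $df = \frac{\partial f}{\partial z}\,dz + \frac{\partial f}{\partial\bar z}\,d\bar z$.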
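We may determine the constant by applying the distribution to any test function we wish, and we will choose our test function to be the characteristic function of the unit disk $D = \{|z| \le 1\}$. Technically, doing so leaves the realm of distribution theory that we have covered, but we will have no problem justifying our computations: the point is that one factor is at least continuous wherever the other is a bit singular, which allows one to pass from smooth approximations. In the notation of real variables, it is possible to evaluate
$$\int \Big(\frac{\partial}{\partial\bar z}\frac{1}{z}\Big)\,H(1 - |z|)\,dx\,dy = -\int \frac{1}{x + iy}\,\frac{\partial}{\partial\bar z}H(1 - |x + iy|)\,dx\,dy$$
by integrating by parts. By applying the product rule to
$$\frac{\partial}{\partial\bar z}|z|^2 = \frac{\partial}{\partial\bar z}(z\bar z)$$
we obtain that $\frac{\partial}{\partial\bar z}|z| = \frac{z}{2|z|}$, and hence
$$-\int \frac{1}{x + iy}\,\frac{\partial}{\partial\bar z}H(1 - |x + iy|)\,dx\,dy = \int \frac{\delta(1 - |z|)}{2|z|}\,dx\,dy = \pi.$$
The same value is obtained in the notation of differential forms:
$$\int_D \frac{\partial}{\partial\bar z}\Big(\frac{1}{z}\Big)\,\frac{d\bar z \wedge dz}{2i} = \int_D d\Big(\frac{1}{z}\,\frac{dz}{2i}\Big) = \int_{\partial D} \frac{1}{z}\,\frac{dz}{2i} = \int_0^{2\pi} \frac{i e^{i\theta}\,d\theta}{e^{i\theta}\,2i} = \pi.$$
Thus, if $K = \frac{1}{\pi}\frac{1}{z}$, then $\frac{\partial K}{\partial\bar z} = \delta_0$.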
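$$\int_\Omega \frac{\partial\phi}{\partial\bar z}(z)\,\frac{d\bar z \wedge dz}{2i} = \int_{\partial\Omega} \phi(z)\,\frac{dz}{2i}, \qquad \phi \in C_0^\infty(\mathbb{C}).$$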
2 One must take care that the parameterization $\theta \mapsto e^{i\theta}$ gives the correct orientation of the circle. A naive replication of the following calculation with the clockwise parameterization $\theta \mapsto e^{-i\theta}$ would have led to a sign error.
3 We proved the divergence theorem assuming the boundary was smooth; however, a very slight variation of the proof works for Lipschitz boundary. See Hörmander, for instance.
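$$d(\chi_\Omega\,\phi\,dz) = \Big(\chi_\Omega\,\frac{\partial\phi}{\partial\bar z} + \phi\,\frac{\partial\chi_\Omega}{\partial\bar z}\Big)\,d\bar z \wedge dz$$
One of the easiest ways to check that such integral identities involving distributions are valid is by allowing the singular distribution to be approximated by smooth functions. In this case, $\frac{\partial\chi_\Omega}{\partial\bar z}$ is a measure since $\partial\Omega$ is Lipschitz and $\phi$ is a continuous function, so one can already see that the computation is valid.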
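In the special case when $\frac{\partial f}{\partial\bar z} = 0$ in $\Omega$ and $f$ is (say) $C^1$ in a neighborhood of the closure of $\Omega$, we can pass from smooth approximations $\phi_n \to f$ to conclude that $\int_{\partial\Omega} f(z)\,dz = 0$ $^4$. Now, applying the same formula to the product $\frac{1}{z - z_0}\,\phi(z)$, for which
$$\frac{\partial}{\partial\bar z}\Big(\frac{\phi(z)}{z - z_0}\Big) = \pi\,\delta_{z_0}\,\phi + \frac{1}{z - z_0}\,\frac{\partial\phi}{\partial\bar z},$$
we obtain
$$\frac{1}{2\pi i}\int_{\partial\Omega} \frac{\phi(z)}{z - z_0}\,dz = \phi(z_0) + \frac{1}{2\pi i}\int_\Omega \frac{\partial\phi}{\partial\bar z}\,\frac{1}{z - z_0}\,d\bar z \wedge dz.$$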
We have only stated the theorem for smooth functions, but the theorem holds much more generally by approximation. In particular, we can pass from smooth $\phi$ to $f$ when $f$ is holomorphic in $\Omega$ and $C^1$ in a neighborhood of the closure of $\Omega$, and by doing so we obtain as a corollary
Corollary 1.3 (Cauchy Integral Formula). Let $f$, $\Omega$ and $z_0$ be as above. Then
$$\frac{1}{2\pi i}\int_{\partial\Omega} \frac{f(z)}{z - z_0}\,dz = f(z_0).$$
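The Cauchy Integral Formula lends itself to a direct numerical check. The sketch below, a Python illustration with names of our own, takes the domain to be the unit disk, parameterizes its boundary counterclockwise by $\theta \mapsto e^{i\theta}$, and applies the trapezoidal rule to $\frac{1}{2\pi i}\int \frac{f(z)}{z - z_0}\,dz$. The choices $f = \exp$ and $z_0 = 0.3 + 0.2i$ are arbitrary.

```python
import cmath
import math

def cauchy_integral(f, z0, n=400):
    """(1 / (2 pi i)) * integral over |z| = 1 of f(z) / (z - z0) dz, trapezoidal rule."""
    total = 0j
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        z = cmath.exp(1j * theta)            # point on the unit circle
        dz = 1j * z * (2.0 * math.pi / n)    # dz = i e^{i theta} d theta (counterclockwise)
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j
print(cauchy_integral(cmath.exp, z0), cmath.exp(z0))
```

Replacing the parameterization by the clockwise one, $\theta \mapsto e^{-i\theta}$, flips the sign of the result, which is the orientation issue mentioned in the footnote on the parameterization of the circle.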
Remark: Some care must be taken when applying the Cauchy Integral Formula and calculating the integral over the boundary. For one thing, the assumption that $f$ remains well-behaved at the boundary is essential for passing from smooth approximations, as the example of $\frac{1}{z}$ on the unit disk with the origin removed illustrates. In this case, the Cauchy Integral Formula cannot apply to (say) points $z_0$ very close to 0: the boundary integral $\int_{\partial D} \frac{1}{z}\,\frac{1}{z - z_0}\,dz$ clearly has size not much larger than the arclength $\int_{|z| = 1} 1\,|dz| = 2\pi$. The other important issue which our use of Stokes' theorem subsumes is that the orientation of the boundary must be taken into account.
4 More directly, one can calculate that $d(f\,dz) = df \wedge dz = \frac{\partial f}{\partial z}\,dz \wedge dz + \frac{\partial f}{\partial\bar z}\,d\bar z \wedge dz = 0$ and integrate over $\Omega$.
By analyzing the Cauchy Integral Formula one can show that holomorphic functions (under the above conditions) possess a convergent power series expansion about any interior point of $\Omega$, and in particular are smooth. Even more usefully, one can make this smoothness quantitative by deducing estimates of the form $\|\partial^\alpha f\|_{L^\infty(K)} \lesssim \|f\|_{L^\infty(\partial\Omega)}$ for compact sets $K$ contained in $\Omega$. The same estimates also indicate how the solution to $\frac{\partial f}{\partial\bar z} = 0$ varies continuously upon its boundary values (when the solution exists). This analyticity is one example of a more general phenomenon: the regularity of a fundamental solution away from the origin corresponds to regularity of solutions to the PDE. It would be nice in general, however, to achieve a regularity result such as this one (perhaps not as strong) without relying upon the explicit formulas. We will revisit holomorphic functions shortly.
Exercise 1. We say that $u \in \mathcal{D}'(\Omega)$ is a weak solution to the Cauchy-Riemann equations if $\frac{\partial u}{\partial\bar z} = 0$ in the distribution theoretic sense. Prove that a continuous function which is a weak solution is in fact a classical holomorphic function (and hence analytic). (Hint: the class of holomorphic functions is closed under translation and linear combination, so it may be useful to consider a mollification of $u$. Then use the a-priori estimates.)
Exercise 2.
Also prove:
2. Laplace Operator
As we have seen in the introduction, the Laplace operator (or Laplacian) $\Delta = \sum_{i=1}^n \partial_i^2$ on $\mathbb{R}^n$ is one of the simplest and most important linear differential operators. Solutions to $\Delta u = 0$ are called harmonic functions. In two dimensions, $\Delta$ is related to the study of holomorphic functions (for example, from the identity $\Delta = 4\,\frac{\partial}{\partial z}\frac{\partial}{\partial\bar z}$ and our preceding regularity results, we see that real and imaginary parts of holomorphic functions are harmonic). The operator is also often denoted
5In fact, all such continuous ring homomorphisms on the algebra C() are point masses on
the boundary itself.
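$$\begin{aligned}
\int \Delta K_n(x)\,H(1 - |x|)\,dx &= -\int \nabla K_n(x) \cdot \nabla H(1 - |x|)\,dx \\
&= \int \delta(1 - |x|)\,\nabla K_n(|x|) \cdot \frac{x}{|x|}\,dx \\
&= \int \delta(1 - |x|)\,\frac{dK_n}{dr}(|x|)\,\frac{x}{|x|} \cdot \frac{x}{|x|}\,dx \\
&= \int_{|x| = 1} \frac{dK_n}{dr}(1)\,d\sigma \\
&= 1
\end{aligned}$$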
With the fundamental solution $K_n(x)$ in hand, we can solve the inhomogeneous equation
$$\Delta V = \rho, \qquad \rho \in C_0^\infty(\mathbb{R}^n),$$
with the formula $V(x) = K_n * \rho(x) = \int K_n(x - y)\,\rho(y)\,dy$. This solution is also often denoted by $\Delta^{-1}\rho$.
6This latter definition generalizes to the Laplace-Beltrami operator on Riemannian manifolds,
where the gradient, dot product, and volume form must be taken with respect to the metric. This
geometric point of view gives us another way of seeing the rotational invariance of the standard
Laplacian.
With some basic knowledge of differential geometry, we can give another proof. In polar coordinates $x = r\omega$, $r > 0$, $|\omega| = 1$, $\Delta$ takes the form
$$\Delta = \partial_r^2 + \frac{n - 1}{r}\,\partial_r + r^{-2}\,\Delta_{S^{n-1}},$$
where $\Delta_{S^{n-1}}$ is the Laplace-Beltrami operator on the unit sphere $S^{n-1}$.
Exercise 1. Show that the Laplace-Beltrami operator on a Riemannian manifold with metric $g$ $^7$ is given, in local coordinates $x^i$, by
$$\Delta_g = \frac{1}{\sqrt{|g|}}\,\partial_i\big(g^{ij}\sqrt{|g|}\,\partial_j\big).$$
Here $g^{ij}$ are the components of the inverse metric $g^{-1}$ relative to the coordinates $x^i$. The volume element $dS_g$ on $M$ is given, in local coordinates, by $dS_g = \sqrt{|g|}\,dx^1 dx^2 \ldots dx^n$. Observe that, on a compact manifold $M$,
$$\int_M \Delta_g u\,v\,dS_g = \int_M u\,\Delta_g v\,dS_g.$$
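Exercise 2. Calculate the Laplace-Beltrami operator for the unit sphere $S^{n-1}$ and check the polar decomposition formula for $\Delta$. For the particular case $n = 3$, relative to the coordinates $x^1 = r\cos\theta^1$, $x^2 = r\sin\theta^1\cos\theta^2$, $x^3 = r\sin\theta^1\sin\theta^2$, $\theta^1 \in [0, \pi)$, $\theta^2 \in [0, 2\pi)$, show that
$$\Delta_{S^2} = \partial_{\theta^1}^2 + \cot\theta^1\,\partial_{\theta^1} + \frac{1}{\sin^2\theta^1}\,\partial_{\theta^2}^2.$$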
r2 +
= < Kn , >
Z
Z
Z
n1
=
Kn (r)r r
r drdS +
||=1
||=1
(2 n)n
1
Z
||=1
Z
=
0
Kn (r)Sn1 drdS
rn+2 r rn1 r drdS + 0
rn+1
rn1 r dr =
r (r)dr
= (0)
(x) =
Z
=
Kn (x y)dy
Our strategy is to integrate by parts, allowing at most one derivative to hit the characteristic function. We recall from our discussion of pullbacks of distributions that $\nabla\chi_\Omega = \vec n\,d\sigma$, where $\vec n$ is the interior unit normal and $d\sigma$ is the surface measure on the boundary. In contrast to a classical integration by parts, one proceeds as though there are no boundary terms, since the product $\chi_\Omega\,K_n(x - y)$ has compact support. For a function $f$ with a continuous first derivative at the boundary $\partial\Omega$, we let $\frac{\partial f}{\partial\nu}$ denote the outward unit normal derivative.
Z
(x) = ( ) Kn (x y)dy
Z
Z
= Kn (x y)dy Kn (x y)dy
Z
Z
=
Kn (x y)d(y) ((( Kn ) Kn )
Z
Z
Z
=
Kn (x y) d(y) +
Kn (x y)dy
Kn (x y)d(y)
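We thus derive the representation formula,
$$\chi_\Omega(x)\,\phi(x) = \int_\Omega K_n(x - y)\,\Delta\phi(y)\,dy + \int_{\partial\Omega} \phi(y)\,\frac{\partial}{\partial\nu_y}K_n(x - y)\,d\sigma(y) - \int_{\partial\Omega} K_n(x - y)\,\frac{\partial\phi}{\partial\nu}(y)\,d\sigma(y). \tag{137}$$
In particular, for $u$ harmonic in $\Omega$ and $y \in \Omega$,
$$u(y) = \int_{\partial\Omega} \Big(u(x)\,\frac{\partial}{\partial\nu_x}K_n(x - y) - \frac{\partial u}{\partial\nu}(x)\,K_n(x - y)\Big)\,d\sigma(x). \tag{138}$$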
With our representation formula in hand, we can repeat much of the same analysis that had been remarked for the Cauchy-Riemann equations. We find that, thanks to the real analyticity of the fundamental solution, harmonic functions as above are in fact real analytic, with quantitative a priori estimates on derivatives in terms of the boundary values of $u$ and $\frac{\partial u}{\partial\nu}$. We can also use these estimates to show that continuous functions satisfying $\Delta u = 0$ are actually classical solutions. But to proceed with the analysis of harmonic functions from this formula may be misleading because the interior values of a harmonic function are uniquely determined by its boundary values alone, and therefore the normal derivative cannot be prescribed arbitrarily.
Indeed, the Maximum Principle for harmonic functions which we now state implies that harmonic functions in the interior of a domain are determined by their boundary values alone.
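Theorem 2.3 (Maximum Principle). If $u : \Omega \to \mathbb{R}$ is $C^2$ on a connected, open set $\Omega$ and $\Delta u \ge 0$ in $\Omega$, then $u$ cannot attain an interior maximum unless $u$ is a constant. In particular, when $\Omega$ is bounded,
\[ \sup_{x\in\overline{\Omega}} u(x) = \sup_{x\in\partial\Omega} u(x). \tag{139} \]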
By the same reasoning that followed the discussion of the maximum modulus principle for holomorphic functions, there must be a representation formula for harmonic functions of the form $u(y) = \int_{\partial\Omega} u(x)\,d\mu_y(x)$ for some finite measure $\mu_y$ depending on $y \in \Omega$. We can obtain such a representation formula as follows: if a harmonic function $\Phi_y(x)$ can be found which coincides with the fundamental solution $K_n(x-y)$ on the boundary of $\Omega$, then the function $G(x,y) = K_n(x-y) - \Phi_y(x)$ satisfies
\[ \Delta_x G(x,y) = \delta_y(x) \ \text{ in } \Omega, \qquad G(x,y) = 0 \ \text{ on } \partial\Omega. \]
There can be only one such function by the maximum principle. This function $G(x,y)$ is called the Green's function for $\Omega$, and was introduced formally in the exercises in the Introduction. By computing $u(y) = \int \chi_\Omega(x)\,u(x)\,\Delta_x G(x,y)\,dx$ as in our previous representation formula (this time the boundary condition for $G$ cancels a boundary term: $\chi_\Omega\nabla G(x,y) = \nabla(\chi_\Omega G(x,y))$) we obtain our desired representation formula:
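Proposition 2.4. If $u$ is harmonic in $\Omega$ and $C^2$ in a neighborhood of $\overline{\Omega}$, and $G(x,y)$ is as above, then
\[ u(y) = \int_{\partial\Omega} u(x)\,\frac{\partial G}{\partial\nu}(x,y)\,d\sigma(x). \tag{140} \]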
Note: we have not proven that the function defined by the right hand side of the
formula is defined for arbitrary domains, nor that it defines a harmonic function,
nor even that it realizes the boundary values in the integrand as y tends towards
the boundary. When the boundary is sufficiently nice (say, Lipschitz), all of these
things can be proven and arbitrary continuous boundary values can be achieved by
harmonic functions (in contrast to the Cauchy-Riemann equations).
The probability measure $\frac{\partial G}{\partial\nu}(x,y)\,d\sigma(x)$ appearing in (140) describes the probability
distribution of the first contact with the boundary of a random walk beginning at
the point y. Thus, the value of a harmonic function at the point y may be considered
the expected value which the boundary data obtains at the first contact point of a
random walk beginning at y. From this interpretation some features of harmonic
functions (the maximum principle and mean value property below, for example)
are obvious, but we will not explore this interpretation here.
Example: For the half-space $x^n > 0$ in $\mathbb{R}^n$, one can obtain an explicit formula for the Green's function, $\Delta G(x,y) = \delta_y$, $G(x^1,\ldots,x^{n-1},0) = 0$, by placing a negative point source at the reflected point $y^* = (y^1,\ldots,y^{n-1},-y^n)$ and defining $G(x,y) = K(x-y) - K(x-y^*)$. Then $G(x,y) = 0$ on $x^n = 0$, since such points are equidistant from both $y$ and $y^*$ and the fundamental solution depends only on Euclidean distance. The same method can also be used to construct a Green's function for the ball $|y| \le 1$. In this case, one uses the conformal reflection $y \mapsto y^* = \frac{y}{|y|^2}$, which fixes the sphere $|y| = 1$. The Green's function then takes the form $G(x,y) = K(x-y) - K(|y|(x-y^*))$; indeed, for $|x| = 1$ one has $|y|\,|x-y^*| = |x-y|$. Many more examples can be obtained in two dimensions using holomorphic functions.
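The two reflection constructions can be checked numerically. The following sketch assumes $n = 3$, where $K_3(x) = -\frac{1}{4\pi|x|}$, and evaluates $G(x,y) = K(x-y) - K(x-y^*)$ on the plane $x^3 = 0$ and $G(x,y) = K(x-y) - K(|y|(x-y^*))$ on the unit sphere at randomly chosen boundary points. The pole $y$, the sampling and all function names are illustrative choices of ours.

```python
import math
import random

def K3(v):
    """Fundamental solution of the Laplacian in R^3: K_3(x) = -1 / (4 pi |x|)."""
    return -1.0 / (4.0 * math.pi * math.sqrt(sum(c * c for c in v)))

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def green_half_space(x, y):
    """K(x - y) - K(x - y*), with y* the mirror image of y across {x^3 = 0}."""
    y_star = [y[0], y[1], -y[2]]
    return K3(sub(x, y)) - K3(sub(x, y_star))

def green_ball(x, y):
    """K(x - y) - K(|y| (x - y*)), with y* = y / |y|^2 the reflection in the unit sphere."""
    ny2 = sum(c * c for c in y)
    y_star = [c / ny2 for c in y]
    ny = math.sqrt(ny2)
    return K3(sub(x, y)) - K3([ny * c for c in sub(x, y_star)])

random.seed(0)
y = [0.3, -0.2, 0.4]          # pole lying in the half-space and in the unit ball
max_half, max_ball = 0.0, 0.0
for _ in range(100):
    # random point on the boundary of the half-space
    xb = [random.uniform(-5, 5), random.uniform(-5, 5), 0.0]
    max_half = max(max_half, abs(green_half_space(xb, y)))
    # random point on the unit sphere
    g = [random.gauss(0, 1) for _ in range(3)]
    ng = math.sqrt(sum(c * c for c in g))
    xs = [c / ng for c in g]
    max_ball = max(max_ball, abs(green_ball(xs, y)))
print(max_half, max_ball)     # both vanish up to rounding error
```

The factor $|y|$ in the second argument is what makes the two distances agree on the sphere; without it the difference does not vanish on the boundary.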
Among the main results of our analysis up to this point (the maximum principle, some of the various a priori estimates which can be deduced from Green's formula, and the existence of solutions on $\mathbb{R}^n$ for compactly supported data), many hold for other operators closely analogous to the Laplacian. For example, by changing variables, we see that when $u$ is a harmonic function and $v(\Phi(x)) = u(x)$ for a diffeomorphism $\Phi$ of $\mathbb{R}^n$, then $v$ will satisfy an equation of the form
\[ L[v] = \sum_{i,j=1}^n a^{ij}(y)\,\partial_i\partial_j v + \sum_{i=1}^n b^i(y)\,\partial_i v = 0, \]
where the smooth functions $a^{ij}(y)$ are the coefficients of a symmetric, positive definite matrix, $a^{ij}(\Phi(x)) = D\Phi(x)(D\Phi)^t(x)$, and the first order terms depend on second derivatives of $\Phi$.
More general operators of the form $L = \sum_{i,j=1}^n a^{ij}(x)\,\partial_i\partial_j + \sum_{i=1}^n b^i\,\partial_i$, where the matrices $(a^{ij}(x))$ are symmetric and positive definite, are called elliptic. It is no surprise that they share many properties with the Laplacian, but they generally do not possess the same amount of symmetry (footnote 9) as the Laplace operator, and therefore they require more robust methods to analyze successfully. However, there are also methods for extending and transporting results and estimates for the Laplace operator to more general elliptic operators.
The following theorem embodies the rotational and translational symmetry of the Laplace operator, and in fact characterizes harmonic functions as well as the Laplace operator itself. It can therefore be used to prove results for the Laplace operator and harmonic functions which are beyond the reach of other methods; for the same reason, its applications are limited to this setting. The theorem shows how the Laplacian controls the change in spherical averages of varying radius.
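Theorem 2.5. [Mean Value Property] When $u$ is harmonic in the ball of radius $R_* > R$ about $x$, $u(x)$ is equal to its average over the sphere of radius $R$ centered at the point $x$,
\[ u(x) = \frac{1}{|S^{n-1}|}\,\frac{1}{R^{(n-1)}}\int_{|y-x|=R} u(y)\,d\sigma(y), \]
with the $=$ replaced by $\le$ when $\Delta u \ge 0$; here $|S^{n-1}|$ is the area of the unit sphere. In fact, for all $u \in C^\infty(\mathbb{R}^n)$ and $0 < R_1 < R_2 < R_*$,
\[ \frac{1}{R_2^{(n-1)}}\int_{|x|=R_2} u(x)\,d\sigma(x) - \frac{1}{R_1^{(n-1)}}\int_{|x|=R_1} u(x)\,d\sigma(x) = \int_{R_1}^{R_2}\Big[\int_{|y|\le\rho}\Delta u(y)\,dy\Big]\rho^{-(n-1)}\,d\rho. \]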
Proof. We prove the last formula, since the first identity of the theorem is an immediate consequence (by letting the inner radius tend to 0). In fact, the latter formula shows that spherical averages increase with the radius when $\Delta u \ge 0$.
9 When one refers to the "symmetries" of a partial differential operator, one often has in mind a collection of vector fields which commute with the operator, or their flows which leave the operator invariant. The symmetries of $\Delta$ on $\mathbb{R}^n$ are the symmetries of the underlying Euclidean geometry: the infinitesimal translations $\partial_{x^1},\ldots,\partial_{x^n}$ together with the infinitesimal rotations. The corresponding flows generate the group of rigid motions of Euclidean space. In this diffeomorphism-invariant sense, a coordinate change of the Laplacian as above has the same amount of symmetry. We will see later that such symmetries can be very helpful when analyzing a differential operator. In any case, it is obvious that the condition of ellipticity alone does not imply the existence of such operator-preserving flows.
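Writing $r = |x|$, we compute
\[
\begin{aligned}
\frac{d}{d\rho}\Big[\frac{1}{\rho^{(n-1)}}\int\delta(\rho-r)\,u(x)\,dx\Big]
&= \frac{1}{\rho^{(n-1)}}\int\Big(\partial_\rho - \frac{(n-1)}{\rho}\Big)\delta(\rho-r)\,u(x)\,dx\\
&= \frac{1}{\rho^{(n-1)}}\int\Big(-\partial_r - \frac{(n-1)}{r}\Big)\delta(\rho-r)\,u(x)\,dx\\
&= \frac{1}{\rho^{(n-1)}}\int\Big(\partial_r + \frac{(n-1)}{r}\Big)\partial_r H(\rho-r)\,u(x)\,dx\\
&= \frac{1}{\rho^{(n-1)}}\int H(\rho-r)\,\Delta u(x)\,dx\\
&= \frac{1}{\rho^{(n-1)}}\int_{|x|\le\rho}\Delta u(x)\,dx.
\end{aligned}
\]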
In the second line, we used the fact that any distribution pulled back by the map $(\rho, x) \mapsto \rho - r$ is annihilated by any vector field in the null space of $d\rho - dr$ (hence, $(\partial_\rho + \partial_r)\delta(\rho - r) = 0$). In the fourth line, we recognized that the operator $\partial_r^2 + \frac{(n-1)}{r}\partial_r$ coincides with the Laplacian when applied to spherically symmetric functions.
Integrating in $\rho$ from $R_1$ to $R_2$ gives the desired formula.
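Both statements of the theorem are easy to test numerically in the plane. The sketch below, which is only an illustration with functions and sample points of our own choosing, averages the harmonic function $e^{x}\cos y$ and the subharmonic function $x^2+y^2$ (whose Laplacian is $4$) over circles of increasing radius, and then compares the two sides of the general formula for the second function, where with $n = 2$ the right hand side equals $\int_{R_1}^{R_2} 4\pi\rho^2\,\rho^{-1}\,d\rho = 2\pi(R_2^2 - R_1^2)$.

```python
import math

def circle_average(u, cx, cy, R, m=2000):
    """Average of u over the circle of radius R centred at (cx, cy), by the trapezoid rule."""
    total = 0.0
    for k in range(m):
        th = 2.0 * math.pi * k / m
        total += u(cx + R * math.cos(th), cy + R * math.sin(th))
    return total / m

def harmonic(x, y):
    """Real part of exp(z), harmonic in the whole plane."""
    return math.exp(x) * math.cos(y)

def subharmonic(x, y):
    """x^2 + y^2, whose Laplacian is the constant 4."""
    return x * x + y * y

cx, cy = 0.4, -0.7
radii = (0.5, 1.0, 2.0)
avg_h = [circle_average(harmonic, cx, cy, R) for R in radii]
avg_s = [circle_average(subharmonic, cx, cy, R) for R in radii]

# General formula with n = 2: R^{-1} * (integral over |x| = R) equals 2 pi * average.
lhs = 2.0 * math.pi * (avg_s[2] - avg_s[0])
rhs = 2.0 * math.pi * (radii[2] ** 2 - radii[0] ** 2)
print(harmonic(cx, cy), avg_h)
print(subharmonic(cx, cy), avg_s)
print(lhs, rhs)
```

The averages of the harmonic function stay equal to its value at the center, while those of the subharmonic function start above the central value and increase with the radius.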
Remark 2.6. What we have essentially computed is that
Z
Z
u(x)d(x) = u1 dx
where is the measure in Exercise 5 of the Introduction.
A special case of the above formula has important applications to complex analysis. When $f$ is a nonzero holomorphic function in a disk $D$, we have the identity
\[ \Delta\,\frac{1}{2\pi}\log|f(z)| = \sum_k \delta_{\zeta_k}(z), \]
where $\zeta_k$ runs over the finite collection of zeros of $f$ counted with multiplicity; the measure on the right hand side is known in algebraic geometry as the zero divisor of $f$. One can see this identity locally near a zero $\zeta_j$ by writing $f(z) = e^{g(z)}(z-\zeta_j)^n$ for some function $g$ holomorphic in a neighborhood of $\zeta_j$, and by recalling that real parts of holomorphic functions are harmonic and that the fundamental solution for the Laplacian in two dimensions is given by $\frac{1}{2\pi}\log|z|$. This calculation implies in particular that $\log|f(z)|$ is a subharmonic function (a fact which can be used in combination with the maximum principle to give strong estimates on holomorphic functions). Applying the general formula in Theorem 2.5 to $u = \frac{1}{2\pi}\log|f(z)|$, we obtain Jensen's formula: if $f(0) \ne 0$, then
\[ \frac{1}{2\pi}\int_0^{2\pi}\log|f(Re^{i\theta})|\,d\theta - \log|f(0)| = \sum_{|\zeta_j|<R}\log\frac{R}{|\zeta_j|}. \]
Proof. Beginning with the general formula in Theorem 2.5, we let $u$ approximate $\frac{1}{2\pi}\log|f(z)|$ by the standard mollifier construction, and notice that our assumptions on $f(z)$ are enough to guarantee that the integral formula in the theorem is still valid for $R_2 = R$ and $0 < R_1 < \min|\zeta_j|$. Letting $R_1$ tend to 0 gives the left hand side of Jensen's formula.
We now calculate the right hand side of the general formula explicitly with Fubini's theorem (or integration by parts):
\[
\begin{aligned}
\int_0^R\Big[\int_{|y|\le\rho}\Delta\,\frac{1}{2\pi}\log|f(z)|\,dy\Big]\rho^{-1}\,d\rho
&= \int_0^R\int_0^\rho\sum_j\delta_{|\zeta_j|}(t)\,dt\;\rho^{-1}\,d\rho\\
&= \int_0^R\sum_j\delta_{|\zeta_j|}(t)\Big[\int_t^R\frac{1}{\rho}\,d\rho\Big]dt\\
&= \sum_{|\zeta_j|<R}\log\frac{R}{|\zeta_j|}.
\end{aligned}
\]
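The resulting identity, $\frac{1}{2\pi}\int_0^{2\pi}\log|f(Re^{i\theta})|\,d\theta - \log|f(0)| = \sum_{|\zeta_j|<R}\log\frac{R}{|\zeta_j|}$, can be tested numerically. The sketch below uses a hypothetical test function of our own, a polynomial with prescribed zeros multiplied by the zero-free factor $e^z$, and evaluates the circle average with the midpoint rule; none of the zeros lies on the circle $|z| = R$.

```python
import cmath
import math

# Hypothetical test data: three zeros inside |z| < 2 and one outside.
zeros = [0.5 + 0.2j, -0.3 + 0.6j, 1.5 - 0.4j, -2.5 + 0.0j]

def f(z):
    """Polynomial with the prescribed zeros, times the zero-free factor exp(z)."""
    value = cmath.exp(z)
    for zeta in zeros:
        value *= (z - zeta)
    return value

def mean_log_modulus(R, m=20000):
    """(1 / 2 pi) times the integral of log|f(R e^{i theta})| over [0, 2 pi], midpoint rule."""
    total = 0.0
    for k in range(m):
        th = 2.0 * math.pi * (k + 0.5) / m
        total += math.log(abs(f(R * cmath.exp(1j * th))))
    return total / m

R = 2.0
lhs = mean_log_modulus(R) - math.log(abs(f(0)))
rhs = sum(math.log(R / abs(zeta)) for zeta in zeros if abs(zeta) < R)
print(lhs, rhs)
```

Only the zeros inside the disk contribute to the right hand side; the zero at $-2.5$ and the factor $e^z$ drop out of the difference on the left.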
The theorems in Exercises 1 - 5 are basic theorems in the study of the Laplace
equation (along with their generalizations to other elliptic PDE), and their proofs
may be found in some form in either Evans or Gilbarg and Trudinger should the
reader wish to consult a reference.
Exercise 1. [Hopf Lemma] The Hopf Lemma states that a harmonic function $u : \Omega \to \mathbb{R}$ on a bounded, open set must satisfy $\frac{\partial u}{\partial\nu}(x_0) > 0$ at a boundary point $x_0$ where the boundary is smooth and $u(x_0) > u(x)$ for $x \in \overline{\Omega}\setminus\{x_0\}$. Prove this fact. One approach is to design an appropriate superharmonic perturbation of $u$ close to $x_0$ and use the weak maximum principle to bound $u$ below the superharmonic perturbation.
Exercise 2. Prove the strong maximum principle for subharmonic functions. (By now you may be able to see more than one proof.)
Try to obtain this result also for $C^2$ subsolutions to an elliptic equation; in other words, supposing
\[ Lu = \sum_{i,j=1}^n a^{ij}(x)\,\partial_i\partial_j u + \sum_{i=1}^n b^i(x)\,\partial_i u \ge 0, \]
where the matrices $(a^{ij}(x))$ are uniformly positive definite, $a^{ij}(x)\,\xi_i\xi_j \ge \lambda|\xi|^2$ for some $\lambda > 0$.
Exercise 5.
u (x) = (n1)
"Z
#
u(x y)dy t
(t)dt
|y|t
3. D'Alembertian operator
Recall that the D'Alembertian $\Box = -\partial_t^2 + \Delta$ is the simplest differential operator in $\mathbb{R}^{1+n}$ invariant under translations and Lorentz transformations, i.e. the Poincaré group. The easiest way to see this is to write $\Box = m^{\alpha\beta}\partial_\alpha\partial_\beta$, with $m^{\alpha\beta}$ the Minkowski metric. Since $m$ is invariant under the Poincaré group, so is $\Box$. Thus it makes sense to look for a fundamental solution of the form $\Phi(t,x) = f(\rho)$, where (footnote 10) $\rho = t^2 - |x|^2 = -m_{\alpha\beta}x^\alpha x^\beta$ is invariant under Lorentz transformations. Also, because the distribution $\delta_0$ on $\mathbb{R}^{n+1}$ is homogeneous of degree $-(n+1)$ and applying $\Box$ lowers the degree of homogeneity by 2, we conclude that $f$ must be homogeneous of degree $-\frac{n-1}{2}$. Therefore, a good candidate for a fundamental solution must have the form $E = c_n(t^2-|x|^2)^{-\frac{n-1}{2}}$, for some constant $c_n$, in the region $t > |x|$. We are therefore led to look for a distribution $E_+$, homogeneous of degree $-n+1$, which coincides with $c_n(t^2-|x|^2)^{-\frac{n-1}{2}}$ in the region $t > |x|$. This may seem difficult at first, due to the high degree of the singularity of $(t^2-|x|^2)^{-\frac{n-1}{2}}$ along $|x| = |t|$, until we realize that we can make use of the homogeneous family of distributions $j_a$ defined by proposition ??. We need to choose in fact $a = -\frac{n-1}{2}+1$ and take $E_+$ to be proportional to $j_{-\frac{n-1}{2}+1}(t^2-x^2)$, understood as the pull back $f^*(j_{-\frac{n-1}{2}+1})$, with $f = t^2-|x|^2$. It is more convenient in this context to change notation a little bit and write
\[ \chi_+^a := j_{a+1}. \]
Thus,
\[ E = \chi_+^{-\frac{n-1}{2}}(t^2-|x|^2). \]
The differential of $t^2-|x|^2$ vanishes at the origin, and hence $\chi_+^{-\frac{n-1}{2}}(t^2-|x|^2)$ defines a distribution only on $\mathbb{R}^{n+1}\setminus\{0\}$. A rigorous formulation requires a bit more care, but the particular degree of homogeneity of the distribution basically allows for a unique extension to the whole space; see Exercise 3 of section (4) for the $n = 3$ case. Now the distribution we have produced has the right properties except for the fact that it is supported in the entire region $|x| \le |t|$. For deterministic physical reasons we prefer a distribution supported only in the future region $|x| \le t$. This defines our forward fundamental solution.
Theorem 3.1. The distribution $E_+^{(n+1)}$ defined by
\[ E_+^{(n+1)}(t,x) = -\frac{\pi^{\frac{1-n}{2}}}{2}\,H(t)\,\chi_+^{-\frac{n-1}{2}}(t^2-|x|^2) \tag{141} \]
10 Recall that we denote $t = x^0$ and we use the summation convention w.r.t. the indices $\alpha, \beta = 0, 1, \ldots, n$.
is the unique fundamental solution of the wave equation supported in the forward region $|x| \le t$.
We shall prove this theorem later; for the moment a few remarks are in order. First, observe a fundamental difference between the cases when $n > 1$ is odd and the cases when $n$ is even. Indeed in the former case $E_+^{(n+1)}$ is supported only on the boundary of the region $|x| < t$, i.e. the future light cone $|x| = t$, while in the latter case $E_+^{(n+1)}$ is supported in the entire forward region $|x| \le t$. More precisely, using the chain rule and the fact that $\frac{d}{d\rho}\chi_+^s(\rho) = \chi_+^{s-1}(\rho)$, we can write the fundamental solution a bit more explicitly away from the origin: in dimensions $n = 3+2k$, the fundamental solution looks like a derivative of a measure supported on the forward light cone,
\[ c_n H(t)\Big(-\frac{1}{2r}\partial_r\Big)^k\delta(t^2-|x|^2) = c_n H(t)\Big(\frac{1}{2t}\partial_t\Big)^k\delta(t^2-|x|^2), \]
while in $n = 2+2k$ dimensions, it is of the form
\[ c_n H(t)\Big(-\frac{1}{2r}\partial_r\Big)^k\frac{1}{\sqrt{t^2-|x|^2}}\,\chi_{\{|x|\le t\}} = c_n H(t)\Big(\frac{1}{2t}\partial_t\Big)^k\frac{1}{\sqrt{t^2-|x|^2}}\,\chi_{\{|x|\le t\}} \]
(the above distributions being equal since $\frac{1}{t}\partial_t + \frac{1}{r}\partial_r$ is in the null space of $d(t^2-r^2)$).
In the most important particular case, when $n = 3$, we have
\[ E_+^{(1+3)} = -\frac{1}{2\pi}\,H(t)\,\delta(t^2-|x|^2) = -\frac{1}{4\pi}\,|x|^{-1}\,\delta(t-|x|). \tag{142} \]
1
H(t)(t2 |x|2 )1/2 =
2 1/2
(143)
Also, for n = 2,
(1+2)
E+
(144)
To see how to do this consider a point $p = (t_0, x_0)$ with $t_0 > 0$ and observe that, for any test function $\phi$, we have in the upper half space $D_+ = \{(t,x) : t \ge 0\}$,
Z
Z
(t0 , x0 ) =
(t, x)p (t, x)dtdx =
+ (m )Ep (t, x)dtdx
R1+n
D+
=
m Ep +
m Ep
R1+n
R1+n
Z
+
m b Ep
R1+n
i.e.,
Z
Z
Ep
Ep +
Ep
1+n
1+n
R
R
D
Z
Z
Z +
+
t t Ep
t t Ep
R1+n
R1+n
D
Z +
Z
Z
+
(t) t Ep
(t)t Ep
D
R1+n
R1+n
Z +
Z
Z
Ep
(t) t0 Ep
(t)t Ep
D
R1+n
R1+n
Z +
Z
Z
Ep t0
f (x)Ep (0, x)dx
g(x)Ep (0, x)dx
Z
(t0 , x0 )
=
=
=
=
=
Rn
D+
Rn
t0
=
0
ZR
t0
Rn
Leaving aside the issue of uniqueness, which we shall treat separately later on, we
deduce the following.
Theorem 3.2. [Kirchhoff-Hadamard] The initial value problem $\Box\phi = F$, $\phi(0,x) = f(x)$, $\partial_t\phi(0,x) = g(x)$ has a unique solution for arbitrary smooth functions $f, g, F$, given by formula (145).
Exercise 2. Compare formula (145) with (137) for the Laplacian. Explain what
may go wrong if we try to prove a result for the Laplace equation similar to that
of theorem 3.2 above.
t (4t)1
f (y)da(y) + (4t)1
|xy|=t
Z
+
0
1
ds
ts
Z
g(y)da(y)
|xy|=t
Z
(s, y)da(y)
(146)
|xy|=ts
The traditional way to derive the Kirchhoff formula (146) is to first prove it in the homogeneous case, i.e. $\Box\phi = 0$. In fact it suffices to prove it for the case $f = 0$ and arbitrary $g$ using the beautiful method of spherical means; see [J] for a clean derivation. Once the homogeneous case is treated one can derive the general formula using the Duhamel principle. This goes as follows: let $W(t)g$ denote the solution $\phi(t,\cdot)$ of the homogeneous problem with data $f = 0$ and arbitrary $g$. Think of it as a family of operators, parametrized by $t$, which take smooth functions in $\mathbb{R}^n$ to smooth functions in $\mathbb{R}^n$. We then have to verify that the solution of the equation $\Box\phi = F$ is given by the formula
Z
(t, x) =
(147)
\[ E_+(t,x) = -\frac{1}{2}\,\pi^{-1}H(t)\,\delta_0(t^2-|x|^2) = -\frac{1}{4\pi}\,r^{-1}\,\delta(t-r) \]
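\[ A\,\delta(u) \tag{148} \]
for given real functions $A$ and $u$ to be determined. Here $\delta(u)$ is simply the pull back of $\delta_0$ by $u$ as discussed in subsection 4, Example 2. A simple calculation leads to
\[ m^{\alpha\beta}\partial_\alpha\partial_\beta\big(A\,\delta(u)\big) = \big(m^{\alpha\beta}\partial_\alpha\partial_\beta A\big)\,\delta(u) + \big(2m^{\alpha\beta}\partial_\alpha A\,\partial_\beta u + A\,\Box u\big)\,\delta'(u) + A\,m^{\alpha\beta}\partial_\alpha u\,\partial_\beta u\;\delta''(u). \]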
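To cancel the coefficient of $\delta''(u)$ we need to choose $u$ such that
\[ m^{\alpha\beta}\partial_\alpha u\,\partial_\beta u = 0. \tag{149} \]
This is the famous Eikonal equation in Minkowski space. A simple family of solutions is given by $u(t,x) = t - t_0 \pm |x - x_0|$ for a given point $(t_0, x_0)$, whose level hypersurfaces are light cones (forward for the minus sign, backward for the plus sign); the zero level set has its vertex at $(t_0, x_0)$. For our purposes we choose $u = t - |x|$. Next, to cancel the coefficient of $\delta'(u)$, we need to choose $A$ such that (footnote 12)
\[ 2m^{\alpha\beta}\partial_\alpha A\,\partial_\beta u + A\,\Box u = 0. \]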
One can easily check that the choice $A = |x|^{-1}$ will do. Finally it only remains to calculate the term containing $\delta(u)$, i.e.,
\[ (\Box A)\,\delta(u) = (\Delta|x|^{-1})\,\delta(u) = -4\pi\,\delta_0(x)\,\delta(u) = -4\pi\,\delta_0(t,x), \]
where the first $\delta_0(x)$ is the delta function in $\mathbb{R}^3$ while the final $\delta_0$ is the desired delta function in $\mathbb{R}^{1+3}$. Hence $E_+^{1+3} = -\frac{1}{4\pi}\frac{1}{|x|}\,\delta(t-|x|)$, as desired.
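The three facts used in this construction, namely that $u = t - |x|$ solves the Eikonal equation, that $A = |x|^{-1}$ solves the transport equation $2m^{\alpha\beta}\partial_\alpha A\,\partial_\beta u + A\,\Box u = 0$, and that $\Box A = \Delta|x|^{-1}$ vanishes away from $x = 0$, can be checked at a sample point by finite differences. The sketch below assumes $n = 3$ and the convention $\Box = -\partial_t^2 + \Delta$, $m = \mathrm{diag}(-1,1,1,1)$; the helper names, the sample point and the step sizes are our own.

```python
import math

def u(t, x):
    """Phase function u = t - |x|."""
    return t - math.sqrt(sum(c * c for c in x))

def A(t, x):
    """Amplitude A = 1 / |x| (independent of t)."""
    return 1.0 / math.sqrt(sum(c * c for c in x))

def grad(fn, t, x, h=1e-5):
    """Central-difference space-time gradient (d_t, d_1, d_2, d_3) of fn at (t, x)."""
    p = [t] + list(x)
    out = []
    for i in range(4):
        hi = p[:]
        lo = p[:]
        hi[i] += h
        lo[i] -= h
        out.append((fn(hi[0], hi[1:]) - fn(lo[0], lo[1:])) / (2 * h))
    return out

def box(fn, t, x, h=1e-3):
    """Central-difference approximation of box = -d_t^2 + Laplacian."""
    p = [t] + list(x)
    total = 0.0
    for i in range(4):
        hi = p[:]
        lo = p[:]
        hi[i] += h
        lo[i] -= h
        second = (fn(hi[0], hi[1:]) - 2 * fn(t, x) + fn(lo[0], lo[1:])) / h**2
        total += -second if i == 0 else second
    return total

def minkowski(a, b):
    """m^{alpha beta} a_alpha b_beta with m = diag(-1, 1, 1, 1)."""
    return -a[0] * b[0] + sum(p * q for p, q in zip(a[1:], b[1:]))

t0, x0 = 0.7, [0.8, -0.5, 1.1]          # sample point away from x = 0
du, dA = grad(u, t0, x0), grad(A, t0, x0)
eikonal = minkowski(du, du)
transport = 2 * minkowski(dA, du) + A(t0, x0) * box(u, t0, x0)
box_A = box(A, t0, x0)
print(eikonal, transport, box_A)         # all three are close to zero
```

In the transport equation the two terms are $2/|x|^2$ and $-2/|x|^2$ respectively, so the cancellation is exact and the printed residual only reflects discretization error.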
Exercise 7. Justify the last step, which involves products of distributions.
Uniqueness of the fundamental solution $E_+$. It suffices to prove uniqueness of solutions to the general Cauchy problem in Theorem 3.2.
Exercise 8.
We start with a simple calculation involving the energy momentum tensor. To calculate efficiently it helps to remember that we are using the summation convention with respect to the space-time indices $\alpha, \beta = 0, 1, \ldots, n$. We will also be using
12 It turns out that the equation below can be interpreted as a transport equation along the generators of the forward null cone $t - |x| = 0$.
(150)
(151)
N (t1 ,t2 )
Q00
(152)
D(t1 )
where D(t1 ), D(t2 ) are t-sections through the solid light cone |x x0 | t t0 ,
N (t1 , t2 ) represents the portion of the light cone |x x0 | = t t0 between the
sections t = t1 and t = t2 and
L = u = m u.
It is easy to check that
\[ Q_{00} = \frac{1}{2}\Big(|\partial_t\phi|^2 + \sum_{i=1}^n|\partial_i\phi|^2\Big) = \frac{1}{2}\big(|\partial_t\phi|^2 + |\nabla\phi|^2\big). \]
Theorem 3.4 (Energy inequality). For every solution $\phi$ of the wave equation $\Box\phi = 0$,
in a neighborhood of the solid region bounded by the surfaces t t0 |x x0 | = 0,
t = t1 and t = t2 , we have,
\[ \int_{D(t_2)}\frac{1}{2}\big(|\partial_t\phi|^2 + |\nabla\phi|^2\big) \le \int_{D(t_1)}\frac{1}{2}\big(|\partial_t\phi|^2 + |\nabla\phi|^2\big). \]
In particular any smooth solution which vanishes at $D(t_1)$ must also vanish at $D(t_2)$.
Exercise 10. Deduce from the energy inequality the finite propagation speed for the Cauchy problem, as we have deduced earlier from the explicit solution. Note that the strong Huygens principle in odd spatial dimensions $n = 3+2k$ discussed earlier is a very special phenomenon related to the precise form of the wave operator $\Box$. However the phenomenon of finite speed of propagation exhibited by the wave equation is a more robust feature shared by many related equations for which an energy inequality as above holds true.
Finally, we check below the validity of our forward fundamental solution in all
dimensions.
Proof [Theorem 3.1 all n] We prove the formula (modulo the absolute value of the
constant cn . )
Z Z
+
(n1)
2
=
t2 + 2 (t2 |x|2 ) H(1 |t|)dxdt
Z Z
(n1)
=
t + 2 (t2 r2 )(1 t)dxdt
Z Z
(n1)
=2
t + 2 (t2 r2 )(1 t)dxdt
Z Z
(n1)
t
=2
( )r + 2 (t2 r2 )(1 t)dxdt
r
Taking the definition of $\delta(1-t)$, the above becomes an integral over the hypersurface $t = 1$, which we put in polar coordinates (writing the volume form $r^{n-1}\,dr\,d\sigma_{n-1}$ with $d\sigma_{n-1}$ the surface measure on the unit sphere in $\mathbb{R}^n$). We remark here that, as we will see later on, a distribution does not quite have to be a continuous function in order to have a meaningful restriction to a lower dimensional submanifold.
=
r + 2 (1 r2 )rn1 drdn1
r
Z
(n1)
= |S n1 |
rn2 r + 2 (1 r2 )dr
0
rn2 r + 2 (1 r2 )dr
0
is positive. Despite not having defined this number14, we prove its positivity by
induction, separating into cases based on the parity n. For n = 1 + 2k, we integrate
by parts and notice the boundary term vanishes to find
rn2 r +
(n1)
2
(1 r2 )dr =
r2k (
r k+1
)
H(1 r2 )dr
2r
Ik
r k
r r2k1 H(r) (
) H(1 r2 )dr
2r
Z
r k
) H(1 r2 )dr
= 2(k+1) (2k 1)
r2(k1) (
2r
0
(k+1)
=2
4. Heat Operator H.
We consider the heat operator $H = \partial_t - \Delta$ acting on functions defined on $\mathbb{R}\times\mathbb{R}^n = \mathbb{R}^{1+n}$. It makes sense to look for spherically symmetric solutions to $Hu = 0$: that is to say, functions $u(t,x) = u(t,|x|) = u(t,r)$. It is possible to find in this way a class of locally integrable solutions $E_c(t,x) = c\,H(t)\,t^{-\frac{n}{2}}e^{-|x|^2/4t}$, with $H(t)$ the Heaviside function (although it is easier to proceed via the Fourier transform).
n+1
14Observe, however, that (by the product rule) r n2 H(r) has a decent amount of regularity,
Indeed $H(E_c) = 0$ for all $(t,x) \ne (0,0)$. We show below that, in the whole space, $H(E_c)$ is proportional to $\delta_0$ and that we can determine the constant $c = c_n = 2^{-n}\pi^{-\frac{n}{2}}$ such that the corresponding $E = E_c$ is a fundamental solution of $H$, i.e. $H(E) = \delta_0$.
We could very easily reason, by considering the parabolic scaling $(t,x) \mapsto (\lambda^2 t, \lambda x)$, that $HE_c = C\delta_0$ for some constant $C$ (possibly 0). To determine the constant, we could use any test function, and it would be simple to take $H(1-t)$. More or less, this is exactly how we will proceed.
Let C0 (Rn+1 ),
< H(E), >
=
=
=
=
0
Rn
Rn
Rn
e|y| dy
Rn
= (0, 0)
Modulo the fact that $\int_{\mathbb{R}^n}e^{-|y|^2}dy = \pi^{n/2}$ (which will be shown later), we have proven that
\[ E(t,x) = 2^{-n}(\pi t)^{-\frac{n}{2}}\,H(t)\,e^{-|x|^2/4t} \tag{153} \]
is a fundamental solution for $H$. Notice that, for any fixed $t > 0$, $E(t,x)$ has support on all of $\mathbb{R}^n$, implying that the heat equation (in contrast to the wave equation) exhibits infinite speed of propagation. This phenomenon is related to the parabolic scaling of the heat operator, which, in contrast to $-\partial_t^2 + \sum_i\partial_i^2$, endows time and space with different units. Also notice that $E(t,x)$ is smooth for $t > 0$; this fact will lead to instantaneous smoothing for the initial value problem $H\phi = 0$ on $(0,\infty)\times\mathbb{R}^n$, $\phi(0,x) = \phi_0(x)$.
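These properties of the heat kernel can be observed numerically in one space dimension, where $E(t,x) = (4\pi t)^{-1/2}e^{-x^2/4t}$ for $t > 0$. The sketch below, an illustration with sample values and function names of our own, checks by finite differences that $(\partial_t - \partial_x^2)E \approx 0$ away from the origin, that the total mass $\int E(t,x)\,dx$ equals $1$ for several times $t$ (consistent with $H(E) = \delta_0$), and that $E$ is strictly positive far from the origin even for small $t$.

```python
import math

def E(t, x):
    """One-dimensional heat kernel (4 pi t)^(-1/2) exp(-x^2 / 4t), for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def heat_residual(t, x, h=1e-3):
    """Central-difference approximation of (d_t - d_x^2) E at the point (t, x)."""
    dt = (E(t + h, x) - E(t - h, x)) / (2 * h)
    dxx = (E(t, x + h) - 2 * E(t, x) + E(t, x - h)) / h**2
    return dt - dxx

def total_mass(t, L=40.0, m=40000):
    """Midpoint-rule approximation of the integral of E(t, .) over [-L, L]."""
    dx = 2.0 * L / m
    return sum(E(t, -L + (k + 0.5) * dx) for k in range(m)) * dx

residual = heat_residual(0.5, 0.3)
masses = [total_mass(t) for t in (0.01, 0.1, 1.0)]
tail = E(0.01, 3.0)      # extremely small, but strictly positive
print(residual, masses, tail)
```

The positive tail value at distance $3$ after time $0.01$ is the numerical face of infinite propagation speed: no cone $|x| \le ct$ contains the support of $E(t,\cdot)$.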
Exercise 1. Derive a representation formula for the initial value problem. Why
is it impossible to solve the heat equation backwards in time for arbitrary initial
data?
Exercise 2. Show that the above representation formula for the Cauchy problem
does indeed give a classical solution for sufficiently smooth data. Check that the
correct boundary value is obtained. (This is in contrast to the situation with the
Cauchy-Riemann equations).
Exercise 3. Write down a maximum principle for $C^2$ solutions to $H\phi = 0$ in the interior of $(0,T]\times\Omega$, for $\Omega$ open and bounded.
/4t
(154)
2
/4t
makes a world of
Exercise 1.
Show that (for the appropriately chosen branch cut of log) the
locally integrable function E is indeed a fundamental solution for S.
Exercise 2. Similarly to Exercise 5 for the heat equation, one denotes the time-evolution operator for the Schrödinger equation by $e^{it\Delta}$. Show that $e^{it\Delta}$ is a unitary operator in the sense that the quantity
Z
|(t, x)|2 dx
Rn
Part 2
CHAPTER 5
Fourier transform
1. Basic properties.
Recall that if $f \in L^1(\mathbb{R}^n)$, then the Fourier transform $\mathcal{F}(f) = \hat{f}$ is defined as the continuous function
\[ \hat{f}(\xi) = \int f(x)\,e^{-ix\cdot\xi}\,dx. \tag{155} \]
In case that $\hat{f} \in L^1(\hat{\mathbb{R}}^n)$, we have the inversion formula
\[ f(x) = (2\pi)^{-n}\int\hat{f}(\xi)\,e^{ix\cdot\xi}\,d\xi, \tag{156} \]
whose proof we shall indicate later. To distinguish between the two conceptually, we refer to the $\mathbb{R}^n$ on which $f$ lives as the physical space and the set of points on which $\hat{f}$ lives as frequency space. We denote the frequency space by $\hat{\mathbb{R}}^n$ and endow it with the normalized measure $\frac{d\xi}{(2\pi)^n}$.
The inversion formula supplies us with a valuable heuristic understanding of what the Fourier transform does. We see that $f(x)$ can be written as some kind of linear combination of plane waves ($x \mapsto e^{ix\cdot\xi}$), and the measure $\hat{f}(\xi)\frac{d\xi}{(2\pi)^n}$ describes the distribution of $f$ over the space of frequencies. If we view the plane waves $e^{ix\cdot\xi}$ as eigenvectors of the translation operators on $\mathbb{R}^n$, we can consider the Fourier transform an attempt to simultaneously diagonalize translations. Similarly, if we view the plane waves $e^{ix\cdot\xi}$ as the eigenvectors of the operators $\{-i\frac{\partial}{\partial x^j} : j = 1\ldots n\}$, which are self-adjoint with respect to the $L^2$ inner product (when restricted to the appropriate domain), then we see that differentiation has also been diagonalized by the Fourier transform.
With these heuristics in mind, we can begin to see how the Fourier transform might be useful for analysis. For example, if $\hat{f}$ is concentrated near a frequency $\xi_0 \in \hat{\mathbb{R}}^n$, we expect $f$ to behave in some ways like the nearby plane wave $e^{ix\cdot\xi_0}$. For instance, $f$ may admit a bounded, complex-analytic extension into part of $\mathbb{C}^n$. We also expect that $\partial_{x^j}f(x) \approx i\xi_0^j f(x)$, so that differentiation becomes a much easier operation to study. Indeed, when we encounter Littlewood-Paley theory later on, the main idea will be to decompose general functions into frequency localized components, analyze these components separately, and then reassemble.
Another important principle regarding the Fourier transform is the duality between smoothness and decay in physical and frequency space. Intuitively, a function $f$ whose graph has sudden jumps or spikes in physical space must be composed of arbitrarily large frequencies, whereas when $f$ is compactly supported, $\hat{f}$ must be globally tame. Similarly, when $f$ is very smooth, $\hat{f}$ can decay at infinity thanks to interference (cancelation) among nearby plane waves in the inversion formula. There are not one but many formal manifestations of these basic principles all over Fourier analysis, so let us keep them in mind as we proceed to develop the theory.
The inversion formula takes a particularly concrete form in the case of the Gaussian function $G(x) = e^{-|x|^2/2}$.
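Lemma 1.1. The following calculation holds true for functions of one variable and $a, b \in \mathbb{R}$, $b > 0$,
\[ \int e^{-iax}e^{-bx^2}\,dx = \Big(\frac{\pi}{b}\Big)^{1/2}e^{-a^2/4b}. \tag{157} \]
Consequently, for $x, y \in \mathbb{R}^n$ and $t > 0$,
\[ \int e^{-ix\cdot y}e^{-t|y|^2}\,dy = \Big(\frac{\pi}{t}\Big)^{n/2}e^{-|x|^2/4t}. \tag{158} \]
In particular
\[ \mathcal{F}(G)(\xi) = (2\pi)^{n/2}G(\xi). \tag{159} \]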
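Proof. Completing the square, $-bx^2 - iax = -b\big(x + \frac{ia}{2b}\big)^2 - \frac{a^2}{4b}$, so that after rescaling $x$ by $b^{1/2}$ the integral in (157) becomes $b^{-1/2}e^{-a^2/4b}\int e^{-(x + \frac{a}{2b^{1/2}}i)^2}dx$. The imaginary shift $\frac{a}{2b^{1/2}}\,i$ can be removed by a standard contour deformation argument. Now, to calculate the integral $J = \int_{\mathbb{R}}e^{-x^2}dx = \pi^{1/2}$, we observe that $J^2 = \int_{\mathbb{R}^2}e^{-|x|^2}dx = \pi$ by passing to polar coordinates, and from this follows (157). Formula (158) now follows immediately.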
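Formula (157) is easy to confirm numerically. The following sketch compares a midpoint-rule approximation of $\int e^{-iax}e^{-bx^2}dx$ with the closed form $(\pi/b)^{1/2}e^{-a^2/4b}$ for a few sample pairs $(a,b)$ of our own choosing; the truncation length and the number of nodes are assumptions that are ample for these values. The case $(a,b) = (0,1)$ recovers $J = \pi^{1/2}$.

```python
import cmath
import math

def gaussian_fourier_integral(a, b, L=12.0, m=24000):
    """Midpoint-rule approximation of the integral of exp(-i a x) exp(-b x^2) over [-L, L]."""
    dx = 2.0 * L / m
    total = 0.0 + 0.0j
    for k in range(m):
        x = -L + (k + 0.5) * dx
        total += cmath.exp(-1j * a * x - b * x * x)
    return total * dx

def closed_form(a, b):
    """Right hand side of (157): (pi / b)^(1/2) exp(-a^2 / 4b)."""
    return math.sqrt(math.pi / b) * math.exp(-a * a / (4.0 * b))

cases = [(0.0, 1.0), (1.5, 0.5), (-3.0, 2.0)]
errors = [abs(gaussian_fourier_integral(a, b) - closed_form(a, b)) for a, b in cases]
print(errors)
```

With $b = \frac12$ the closed form reads $(2\pi)^{1/2}e^{-a^2/2}$, which is the one-dimensional case of (159).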
We can give another proof of the above identity after reviewing some of the fundamental properties of the Fourier transform.
Proposition 1.2. The Fourier transform is linear and verifies the following simple properties.
• Fourier transform takes translations in physical space, $T_{x_0}f(x) = f(x-x_0)$, into modulations in frequency space, $\mathcal{F}(T_{x_0}f)(\xi) = e^{-ix_0\cdot\xi}\hat{f}(\xi)$.
• Fourier transform takes modulations in physical space, $M_{\xi_0}f(x) = e^{ix\cdot\xi_0}f(x)$, into translations in frequency space, $\mathcal{F}(M_{\xi_0}f)(\xi) = \hat{f}(\xi-\xi_0)$.
• Fourier transform takes conjugation in physical space into conjugation and reflection in frequency, i.e. $\mathcal{F}(\bar{f})(\xi) = \overline{\hat{f}(-\xi)}$.
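The three rules can be tested numerically in one dimension with the convention $\hat{f}(\xi) = \int f(x)e^{-ix\xi}dx$. The sketch below approximates the Fourier integral by the midpoint rule for a rapidly decaying test function; the test function, the sample values $x_0$, $\xi_0$, $\xi$ and the helper name `fourier` are illustrative assumptions of ours.

```python
import cmath
import math

def fourier(f, xi, L=20.0, m=40000):
    """Midpoint-rule approximation of the integral of f(x) exp(-i x xi) over [-L, L]."""
    dx = 2.0 * L / m
    total = 0.0 + 0.0j
    for k in range(m):
        x = -L + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * x * xi)
    return total * dx

def f(x):
    """A non-symmetric, rapidly decaying real test function."""
    return (1.0 + x) * math.exp(-x * x / 2.0)

def g(x):
    """A complex-valued test function, used for the conjugation rule."""
    return f(x) * cmath.exp(1j * x)

x0, xi0, xi = 0.7, 1.3, 0.9      # arbitrary sample values

# Translation in physical space becomes modulation in frequency space.
translated = fourier(lambda x: f(x - x0), xi)
err_translation = abs(translated - cmath.exp(-1j * x0 * xi) * fourier(f, xi))

# Modulation in physical space becomes translation in frequency space.
modulated = fourier(lambda x: cmath.exp(1j * x * xi0) * f(x), xi)
err_modulation = abs(modulated - fourier(f, xi - xi0))

# Conjugation becomes conjugation combined with reflection.
conjugated = fourier(lambda x: g(x).conjugate(), xi)
err_conjugation = abs(conjugated - fourier(g, -xi).conjugate())

print(err_translation, err_modulation, err_conjugation)
```

All three discrepancies are at the level of rounding error, since the midpoint rule is extremely accurate for smooth, rapidly decaying integrands.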
Let $G_{\lambda,x_0,\xi_0}(x) = e^{ix\cdot\xi_0}\,G((x-x_0)/\sqrt{\lambda})$ be a translated, modulated, rescaled Gaussian. Then,
Z
= ()n/2 G ( 0 )
We can interpret this result as saying that $G_{\lambda,x_0,\xi_0}$ is localized at spatial position $x_0$, with spatial spread $\Delta x \sim \sqrt{\lambda}$, and at frequency position $\xi_0$ with frequency spread $\Delta\xi \sim 1/\sqrt{\lambda}$. Observe that $\Delta x\cdot\Delta\xi \sim 1$, so our ability to localize simultaneously in both physical and frequency space in this way seems to be limited. Surprisingly, this construction is in some sense the best we can do, and it is our first encounter with the uncertainty principle of Fourier analysis, which, in its various manifestations, states that there is a bound on how well one can simultaneously localize in both frequency and physical space.
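We now prove our first important manifestation of the duality between smoothness and decay (footnote 2).
Proposition 1.3 (Riemann-Lebesgue). Given an arbitrary $f \in L^1(\mathbb{R}^n)$ we have $\|\hat{f}\|_{L^\infty} \lesssim \|f\|_{L^1}$. Moreover, $\hat{f}(\xi) \to 0$ as $|\xi| \to \infty$.
Proof: Only the last statement requires an argument. Observe that if $f \in C_0^\infty(\mathbb{R}^n)$, then we can use integration by parts to conclude that $\hat{f}$ decays rapidly. Indeed for any multi-index $\alpha$, $|\alpha| = k \in \mathbb{N}$,
\[ \xi^\alpha\hat{f}(\xi) = i^k\int\partial_x^\alpha\big(e^{-ix\cdot\xi}\big)f(x)\,dx = (-i)^k\int e^{-ix\cdot\xi}\,\partial_x^\alpha f(x)\,dx, \]
\[ |\xi^\alpha\hat{f}(\xi)| \lesssim \int|\partial_x^\alpha f(x)|\,dx \le C_\alpha \]
for some constant $C_\alpha$. Thus, $|\hat{f}(\xi)| \lesssim (1+|\xi|)^{-k}$, which proves the statement in this case. For general $f \in L^1(\mathbb{R}^n)$, given $\epsilon > 0$, we can choose $g \in C_0^\infty$ such that $\|f-g\|_{L^1} \le \frac{\epsilon}{2}$. From the preceding, we know that $|\hat{g}(\xi)| \le \frac{\epsilon}{2}$ if $|\xi| > M = M_\epsilon$ sufficiently large, and therefore,
\[ \sup_{|\xi|>M}|\hat{f}(\xi)| \le \|f-g\|_{L^1(\mathbb{R}^n)} + \sup_{|\xi|>M}|\hat{g}(\xi)| \le \epsilon. \]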
In particular we have the Parseval identity $\|f\|_{L^2(\mathbb{R}^n)} = \|\mathcal{F}(f)\|_{L^2(\hat{\mathbb{R}}^n)}$.
| and that (x) decays faster than
\[ T(\partial_j f) = \partial_j(Tf), \qquad T(x_j f) = x_j(Tf) \tag{161} \]
with S(R). Applying T to this identity, we see that T (x0 ) = 0 as well, and
we may therefore write T = f for some function f possibly depending on .
But f = f does not depend on . If is any other Schwartz function, the linear
combination (x0 ) (x0 ) vanishes at the point x0 , and applying T we conclude
by the same property that f (x0 ) = f (x0 ) at any point x0 at which and are
simultaneously nonzero. It is clear that the function f must be smooth for T to
map S(Rn ) into itself, and in order for T to commute with differentiation, f must
be a constant.
To determine the constant we only have to remark that, in view of Lemma 1.1, we have $T(G) = (2\pi)^{n/2}\hat{G} = (2\pi)^n G$. Hence the constant is $c = (2\pi)^n$, which ends the proof of the inversion formula, and the proposition, for Schwartz functions.
The Plancherel and Parseval identities are immediate consequences of the inversion
formula.
Corollary 2.4. The following properties hold for all functions in S(Rn ):.
Z
Z
dx
=
d
Z
Z
dx = (2)
dx
d
(2)n
)()d
(
=
bn
The last convolution being taken with respect to the measure on R
We only completely worked out the proof of the inversion formula for one dimension, although the same proof requires only a minuscule generalization of the Taylor expansion to work for general $n$. The general case can also be deduced from the case $n = 1$ as follows: the inversion formula is true for tensor products $f_1(x^1)\cdots f_n(x^n)$ and linear combinations thereof, the delta-function is a tensor product $\delta_0(x) = \delta(x^1)\cdots\delta(x^n)$, and an arbitrary function may be written as a linear combination of delta functions $f(x) = \int f(t)\,\delta(x-t)\,dt$.
Exercise 2.5. Make the above argument into a rigorous, self-contained proof of
the inversion formula for Rn by using approximate delta-functions.
It is worthwhile to explore the relationship of the above proof of the inversion formula via Lemma 2.3 with other proofs of the formula. Just as a linear operator between vector spaces of finite dimension can be studied via a matrix representation, we can study the operator $T$ in terms of its kernel $K$, the distribution on $\mathbb{R}^n\times\mathbb{R}^n$ such that
\[ T\phi(x) = \int K(x,x_0)\,\phi(x_0)\,dx_0. \]
In asserting that $T\phi(x_0)$ depends only on $\phi(x_0)$, we had proven that applying $T$ was the same as multiplying by some function; in terms of the kernel, we had established that
\[ T\phi(x) = \int\phi(x_0)\,f(x)\,\delta(x-x_0)\,dx_0. \]
In order for $T$ to commute with differentiation, which is not so different from commuting with translation, we concluded that $f(x)$ was a constant.
But we can see directly that an equivalent formulation of the inversion formula is the distribution-theoretic identity
\[ \int_{\hat{\mathbb{R}}^n}e^{i(x-x')\cdot\xi}\,\frac{d\xi}{(2\pi)^n} = \delta(x-x'), \tag{162} \]
which is really the special case of the inversion formula for a $\delta$-function. Viewing the integral on the left hand side as an inner product, the above identity can be regarded as a statement that the plane waves $\xi \mapsto e^{ix\cdot\xi}$ are in some sense orthonormal as $x$ varies. Thus, in writing
\[ \int f(t)\,\delta(x-t)\,dt = f(x) = \int\hat{f}(\xi)\,e^{ix\xi}\,\frac{d\xi}{2\pi} \]
we might regard the Fourier transform as analogous to a change of orthonormal basis from $\delta$-functions in physical space to plane waves, so that the Plancherel and Parseval identities should follow immediately. For example, in one dimension, distinct plane waves are eigenfunctions for the self-adjoint operator $-i\frac{d}{d\xi}$ with distinct eigenvalues, and therefore should be orthogonal as a matter of general principle; this argument can be made rigorous to show the above distribution vanishes away from $x - x' = 0$, as was essentially done in the previous proof through Taylor expansion and linearity over the polynomial ring. Let us mention several other ways to establish this identity and hence prove the inversion formula (footnote 3).
It suffices to show that
\[ \frac{1}{(2\pi)^n}\int e^{ix\cdot\xi}\,d\xi = \delta_0(x) \]
as a distribution in the variable $x$ on $\mathbb{R}^n$; this translation invariance corresponds to $T$ commuting with $\frac{d}{dx}$ in the previous proof. By viewing the above distribution as a tensor product, it would suffice to consider the case $n = 1$, but let us refrain from doing so. Recall that every distribution supported at the origin is a finite linear combination of derivatives of $\delta(x)$, and hence the $\delta$ function itself is, up to a constant, the only distribution homogeneous of degree $-n$ supported at 0; these facts are easily established by Taylor expansion. As the integral on the left hand side is clearly homogeneous of degree $-n$ in $x$, we will have proven the identity up to a constant if we can show that
\[ \int e^{ix\cdot\xi}\,d\xi \tag{163} \]
is supported at the origin; in this precise sense, a plane wave $\xi \mapsto e^{ix\cdot\xi}$ is zero on average.
Heuristically, let us outline a few ways to perform this calculation. Pretend that the integral (163) is a classical integral and that $x \neq 0$ is fixed. If we view the plane waves as eigenfunctions of differential operators, we may integrate in $\xi$ by parts using the identity
\[ \int e^{ix\cdot\xi}\,d\xi = \frac{1}{|x|^2}\int (-\Delta_\xi)\,e^{ix\cdot\xi}\,d\xi, \]
or alternatively we can rotate to the case $x = |x|(1, 0, \dots, 0)$ and integrate by parts using the identity $e^{ix\cdot\xi} = \frac{1}{i|x|}\frac{\partial e^{ix\cdot\xi}}{\partial \xi_1}$. If we would rather view the plane waves as eigenfunctions of translation operators, we may show the integral is zero for $x \neq 0$ by translating in the variable $\xi$:
\[ \int e^{ix\cdot\xi}\,d\xi = \int e^{i(\xi-\xi')\cdot x}\,d\xi = e^{-i\xi'\cdot x}\int e^{ix\cdot\xi}\,d\xi. \]
³ As remarked, the plane waves may be viewed as eigenfunctions of the commuting family of self-adjoint operators $f \mapsto -i\frac{\partial}{\partial x_i} f$. Dually, the delta functions, which are similarly orthonormal in the sense that $\int \delta_0(y-x)\,\delta_0(y'-x)\,dx = \delta(y-y')$, can be viewed as eigenfunctions for the commuting family of operators $f \mapsto x_i f$.
5. FOURIER TRANSFORM
This limit is zero by the dominated convergence theorem (the d integral is a rapidly
decreasing function of x and (x) 0).
Without assuming anything about the support of the test function, the above proof would have established the Inversion Formula directly with the constant had we chosen a test function (such as a Gaussian) whose Fourier transform was understood. Indeed, if we know the Inversion Formula for a Gaussian, the inversion formula is true for rescalings and translates of Gaussians. As a limiting case, the Inversion Formula holds for any $\delta$ function, and hence for an arbitrary function by the decomposition $f(x) = \int f(t)\,\delta(x-t)\,dt$.
Exercise 2.6. Create a self-contained, direct proof of the Inversion Formula from
the case of a Gaussian.
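In the case $n = 1$, there is also a more complex-analytic way to evaluate the integral:
\begin{align*}
\int_{-\infty}^{\infty} e^{ix\xi}\,d\xi &= \int_{-\infty}^{0} e^{ix\xi}\,d\xi + \int_{0}^{\infty} e^{ix\xi}\,d\xi \\
&= \lim_{y\to 0^+}\left[\int_{-\infty}^{0} e^{i(x-iy)\xi}\,d\xi + \int_{0}^{\infty} e^{i(x+iy)\xi}\,d\xi\right] \\
&= i\left[-\frac{1}{x-i0} + \frac{1}{x+i0}\right] \\
&= 2\pi\,\delta(x).
\end{align*}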
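As a rough numerical illustration of the limit above (a sketch, not part of the notes): for $y > 0$ the two damped integrals add up to the kernel $2y/(x^2+y^2)$, and pairing this kernel with a test function should approach $2\pi$ times the value of the test function at $0$. The Gaussian test function, the value of `y`, and the quadrature parameters below are arbitrary choices made for this check.

```python
import math

def regularized_kernel(x, y):
    """Sum of the two damped integrals: i/(x+iy) - i/(x-iy) = 2y/(x^2+y^2)."""
    return 2.0 * y / (x * x + y * y)

def pair_with_gaussian(y, half_width=10.0, steps=200_000):
    """Midpoint-rule approximation of the integral of K_y(x) * exp(-x^2) dx."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        total += regularized_kernel(x, y) * math.exp(-x * x)
    return total * h

# For small y the pairing should be close to 2*pi * exp(0) = 2*pi.
approx = pair_with_gaussian(y=0.01)
print(approx, 2.0 * math.pi)
```

The printed value stays slightly below $2\pi$ for any fixed $y > 0$ and moves towards it as `y` is decreased (with the grid refined accordingly).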
\[ \mathcal{F}(1) = (2\pi)^n\,\delta_0 \tag{164} \]
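Observe also that if we denote by $\operatorname{sign}(x)$ the one dimensional tempered distribution given by the locally integrable function $\frac{x}{|x|}$ we have,
\[ \widehat{\operatorname{sign}}(\xi) = -2i\,\mathrm{pv}\Big(\frac{1}{\xi}\Big) \tag{165} \]
Indeed $\operatorname{sign}'(x) = 2\delta_0$. Hence, $i\xi\,\widehat{\operatorname{sign}}(\xi) = 2$. Therefore, for any rapidly decreasing $\phi$, we have
\[ i\int \operatorname{sign}(x)\,\widehat{x\phi}(x)\,dx = 2\hat\phi(0) = 2\int \phi(x)\,dx. \]
Also, observe that $\widehat{\operatorname{sign}}$ is an odd distribution so that whenever $\phi(x) = \phi(-x)$ is an even test function, then $\langle \widehat{\operatorname{sign}}, \phi\rangle = 0$. Now given a general test function $\phi$, write $\phi = \frac12(\phi(x) + \phi(-x)) + \frac12(\phi(x) - \phi(-x)) = \phi_{ev} + \phi_{odd}$. Hence, from the preceding, we infer that
\[ \langle \widehat{\operatorname{sign}}, \phi\rangle = \Big\langle \widehat{\operatorname{sign}},\, x\,\frac{1}{x}\,\phi_{odd}\Big\rangle = -2i\,\Big\langle \mathrm{pv}\Big(\frac1x\Big), \phi\Big\rangle \]
as desired.
This fact may also be observed more directly by evaluating the distribution-theoretic integral
\[ \int \operatorname{sign}(x)\,e^{-ix\xi}\,dx \]
along the same lines as the complex-analytic proof of the Fourier Inversion Formula outlined in the previous section.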
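One hedged way to carry out this direct evaluation is to insert a damping factor $e^{-\epsilon|x|}$ and let $\epsilon \to 0^+$. The damping factor is an assumption made for this sketch (the notes do not fix a regularization), and the convention $\hat f(\xi) = \int e^{-ix\xi} f(x)\,dx$ is assumed:

```latex
\begin{align*}
\int_{\mathbb{R}} \operatorname{sign}(x)\,e^{-\epsilon|x|}\,e^{-ix\xi}\,dx
  &= \int_0^\infty e^{-(\epsilon+i\xi)x}\,dx - \int_{-\infty}^0 e^{(\epsilon-i\xi)x}\,dx \\
  &= \frac{1}{\epsilon+i\xi} - \frac{1}{\epsilon-i\xi}
   = \frac{-2i\,\xi}{\xi^2+\epsilon^2}.
\end{align*}
```

As $\epsilon \to 0^+$ the function $\xi/(\xi^2+\epsilon^2)$ converges, in the sense of distributions, to $\mathrm{pv}(1/\xi)$, which is consistent with (165).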
Exercise 1. Show that the only harmonic functions which are tempered distributions are polynomials.
Exercise 2. Let $f(x) = e^{-|x|} \in L^1(\mathbb{R})$. Compute $\hat f(\xi)$ (and hence $\hat f(0) = \int f(x)\,dx = 2$) using the fact that $f$ satisfies a simple, second order differential equation. Comment on the precise amounts of regularity and decay of $f$ and $\hat f$ and how they can be anticipated from the physical space representation. Note that $\hat f$ continues meromorphically into the complex plane; by considering correlations against complex plane waves $x \mapsto e^{izx}$, $z \in \mathbb{C}$, you can anticipate the location of the poles from the form of $f$ in physical space.
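A small numerical sanity check that may help when working Exercise 2 (a sketch, not a solution): it approximates $\hat f(\xi) = \int e^{-ix\xi} e^{-|x|}\,dx$ by quadrature, assuming that sign convention, and compares with the standard closed form $2/(1+\xi^2)$. The function name, cutoff and step count are illustrative choices.

```python
import math

def fourier_transform_exp_abs(xi, cutoff=40.0, steps=200_000):
    """Approximate the Fourier transform of exp(-|x|) at frequency xi.

    By evenness the transform equals 2 * integral_0^infty cos(x*xi) exp(-x) dx;
    the integral is truncated at `cutoff` and evaluated by the midpoint rule.
    """
    h = cutoff / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.cos(x * xi) * math.exp(-x)
    return 2.0 * total * h

for xi in (0.0, 1.0, 3.0):
    print(xi, fourier_transform_exp_abs(xi), 2.0 / (1.0 + xi * xi))
```

The slow $|\xi|^{-2}$ decay visible in the output reflects the corner of $f$ at the origin.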
Exercise 3. Suppose that $u$ is a tempered distribution which is invariant under translation by a subgroup $S$ of $\mathbb{R}^n$; for instance $u$ could be periodic or a function of less than $n$ of the variables. Why can we assume $S$ is closed? Show that the Fourier transform $\hat u$ is supported on the annihilator subgroup $S^\perp$ of plane waves which are invariant under $S$,
\[ S^\perp = \{\xi \mid e^{ix\cdot\xi} = 1 \text{ for all } x \in S\}. \]
\[ Df(t) = -i f'(t) \]
Observe that,
\[ [D, X]f = DXf - XDf = -if. \]
This lack of commutation is responsible for the following:
Proposition 4.1 (Heisenberg uncertainty principle). The following inequality holds,
\[ \|Xf\|_{L^2}\,\|Df\|_{L^2} \ge \frac12\,\|f\|_{L^2}^2. \]
Proof:
Now, minimize the right hand side by choosing $a = \|Df\|_{L^2}$ and $b = \|Xf\|_{L^2}$.
The uncertainty principle, which can informally be described as⁵ $\Delta x \cdot \Delta\xi \ge 1/2$, places a limit on how accurately we can localize a function, or any other relevant object, simultaneously in both space and frequency. Let us investigate these localizations in more detail.
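The inequality of Proposition 4.1 can be probed numerically. The sketch below assumes $Xf(t) = t f(t)$ and $Df = -if'$ for real-valued $f$ (so that $\|Df\|_{L^2} = \|f'\|_{L^2}$), and compares both sides for a Gaussian, where equality is expected, and for $e^{-t^4}$, where the inequality should be strict. Grid sizes and the two sample functions are arbitrary choices.

```python
import math

def l2_norm(values, h):
    """Discrete L^2 norm of sampled values with grid spacing h."""
    return math.sqrt(sum(v * v for v in values) * h)

def uncertainty_terms(f, half_width=12.0, steps=24_000):
    """Return (||Xf|| * ||Df||, 0.5 * ||f||^2) for a real-valued f.

    Xf(t) = t f(t); |Df| = |f'| is approximated by central differences.
    """
    h = 2.0 * half_width / steps
    ts = [-half_width + j * h for j in range(steps + 1)]
    fs = [f(t) for t in ts]
    xf = [t * v for t, v in zip(ts, fs)]
    df = [(fs[j + 1] - fs[j - 1]) / (2.0 * h) for j in range(1, steps)]
    return l2_norm(xf, h) * l2_norm(df, h), 0.5 * l2_norm(fs, h) ** 2

lhs_gauss, rhs_gauss = uncertainty_terms(lambda t: math.exp(-t * t / 2.0))
lhs_quart, rhs_quart = uncertainty_terms(lambda t: math.exp(-t ** 4))
print(lhs_gauss, rhs_gauss)   # nearly equal for the Gaussian
print(lhs_quart, rhs_quart)   # left side strictly larger
```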
(166)
terprets both |f (x)|2 and 2 as probability densities over states in position and momentum
space respectively. In this setting the Heisenberg uncertainty principle gives a lower bound for
the product of the standard deviations of position and momentum.
\[ \widehat{P_D f}(\xi) = \chi_D(\xi)\,\hat f(\xi). \]
$P_D$ is an example of a Fourier multiplier operator, that is an operator of the type:
\[ \widehat{T_m f}(\xi) = m(\xi)\,\hat f(\xi) \tag{167} \]
with $m = m(\xi)$ a given function called the symbol of the operator. Clearly,
\[ T_m f(x) = f * K(x) = \int f(x-y)\,K(y)\,dy \tag{168} \]
where $K$, the kernel of $T$, is the inverse Fourier transform of $m$,
\[ K(x) = (2\pi)^{-n}\int e^{ix\cdot\xi}\,m(\xi)\,d\xi. \]
Clearly any linear differential operator $P(\partial)$ is a multiplier with symbol $P(i\xi)$.
To compare the action, in physical space, between rough and smooth cut-off operators it suffices to look at the corresponding kernels $K$. Let $I = [-1, 1] \subset \mathbb{R}$ and $\chi_I$ the rough cut-off (while ignoring the $2\pi$ constants). The corresponding kernel
\[ K(x) = \int_{-1}^{1} e^{ix\xi}\,d\xi = 2\,\frac{\sin x}{x} \]
decays very slowly as $|x| \to \infty$. Because of this the operator
\[ \mathcal{F}^{-1}(\chi_I \hat f)(x) = 2\int \frac{\sin(x-y)}{x-y}\,f(y)\,dy \]
has very poor localization properties. Indeed, the operator spreads around to the whole $\mathbb{R}$ any function supported in some set $J \subset \mathbb{R}$. This situation corresponds to a perfect localization in frequency space and a very bad one in physical space. The exact opposite situation occurs when we do the rough cut-off localization $\chi_I f$ in physical space. On the other hand, when we use a smooth cut-off $\tilde\chi_I$ in frequency space, then the frequency cut-off operator $P_I f = \mathcal{F}^{-1}(\tilde\chi_I \hat f)$ is of the form $f \mapsto K * f$ where the kernel
\[ K(x) = \int_{\mathbb{R}} e^{ix\xi}\,\tilde\chi_I(\xi)\,d\xi \]
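To make the slow decay of the rough cut-off kernel concrete, the following sketch evaluates $\int_{-1}^{1} e^{ix\xi}\,d\xi$ by quadrature and compares it with $2\sin x / x$. The function name and step count are illustrative assumptions; only the real part is computed because the imaginary part cancels by symmetry.

```python
import math

def rough_cutoff_kernel(x, steps=20_000):
    """Midpoint-rule value of the integral of cos(x*xi) over xi in [-1, 1].

    This is the real part of the integral of exp(i*x*xi); the imaginary
    part vanishes because sin(x*xi) is odd in xi.
    """
    h = 2.0 / steps
    return sum(math.cos(x * (-1.0 + (j + 0.5) * h)) for j in range(steps)) * h

for x in (0.5, 5.0, 50.0):
    print(x, rough_cutoff_kernel(x), 2.0 * math.sin(x) / x)
```

The envelope of the kernel is $2/|x|$, which is not integrable at infinity; this is the quantitative content of "very poor localization".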
Exercise. Show that there exists no non-trivial function $\phi$ such that both $\phi$ and $\mathcal{F}(\phi)$ are compactly supported.
The above discussion can be easily extended to higher dimensions. In particular we can get a qualitative description of functions in $\mathbb{R}^n$ whose Fourier support is restricted to a ball $B_R = B(0, R)$ centered at the origin. Let $\chi_R$ be a smooth cut-off for $B_R$. More precisely we take it of the form
\[ \chi_R(\xi) = \chi(\xi/R) \]
with $\chi$ a smooth cut-off for $B_1$, i.e. $\chi$ is smooth, identically equal to 1 on $B_1$ and supported, say, in $B_2$. It is easy to check the estimate for any multi-index $\alpha$,
\[ \sup_\xi |\partial_\xi^\alpha \chi_R(\xi)| \le c_\alpha R^{-|\alpha|}, \]
where KR (x) = F
(R ).
(170)
for all R > 0, any N N and multi-index N , with a constant CN, which
depends only on N , , dimension n and choice of the fixed test function .
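Proof: Indeed, integrating by parts,
\begin{align*}
K_R(x) &= \int_{\mathbb{R}^n} e^{ix\cdot\xi}\,\chi_R(\xi)\,d\xi = \int_{\mathbb{R}^n} \frac{1}{(ix)^\alpha}\,\partial_\xi^\alpha\big(e^{ix\cdot\xi}\big)\,\chi_R(\xi)\,d\xi \\
&= (-1)^{|\alpha|}\int_{\mathbb{R}^n} \frac{1}{(ix)^\alpha}\,e^{ix\cdot\xi}\,\partial_\xi^\alpha\chi_R(\xi)\,d\xi.
\end{align*}
Thus, for any $\alpha$, $|\alpha| = N$, denoting by $|B_R| = c_n R^n$ the volume of $B_R$,
\[ |x^\alpha K_R(x)| \le \int_{\mathbb{R}^n} |\partial_\xi^\alpha \chi_R(\xi)|\,d\xi \lesssim c_\alpha R^{-N}|B_R| \lesssim c_n c_\alpha R^{-N+n}. \]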
n
Rn
N
R||+n
Rn
R||+n kf kL1
Also, by Hölder's inequality with $\frac1p + \frac1{p'} = 1$,
5. Applications to PDE
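Consider the initial value problems for our basic PDEs in $\mathbb{R} \times \mathbb{R}^n$, written in the form
\[ \partial_t\phi = \Delta\phi, \qquad \phi(0,x) = f(x) \tag{171} \]
\[ \partial_t\phi = i\Delta\phi, \qquad \phi(0,x) = f(x) \tag{172} \]
\[ \partial_t^2\phi = \Delta\phi, \qquad \phi(0,x) = f(x), \quad \partial_t\phi(0,x) = g(x) \tag{173} \]
\[ \partial_t^2\phi = -\Delta\phi, \qquad \phi(0,x) = f(x), \quad \partial_t\phi(0,x) = g(x) \tag{174} \]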
In each of these cases we can write down solutions using the Fourier transform
method. More precisely we can take the Fourier transform of each equation, set
Z
(175)
(176)
Exercise 1. Show how to relate the formulas (175) and (176) to the physical space
formulas (153) and (154).
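In the particular case of the wave equation (173) we derive,
\[ \phi(t,x) = \int_{\mathbb{R}^n} e^{ix\cdot\xi}\Big(\cos(t|\xi|)\,\hat f(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\,\hat g(\xi)\Big)\frac{d\xi}{(2\pi)^n} \tag{177} \]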
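A brief sketch of how (177) arises, assuming the convention $\hat\phi(t,\xi) = \int e^{-ix\cdot\xi}\phi(t,x)\,dx$ (so that the Laplacian becomes multiplication by $-|\xi|^2$) and writing $\phi$ for the unknown. For each fixed $\xi$ the wave equation reduces to a second order ODE in $t$:

```latex
\begin{align*}
\partial_t^2\hat\phi(t,\xi) &= -|\xi|^2\,\hat\phi(t,\xi),
  \qquad \hat\phi(0,\xi) = \hat f(\xi), \quad \partial_t\hat\phi(0,\xi) = \hat g(\xi), \\
\hat\phi(t,\xi) &= \cos(t|\xi|)\,\hat f(\xi) + \frac{\sin(t|\xi|)}{|\xi|}\,\hat g(\xi).
\end{align*}
```

Applying the inversion formula in the $x$ variable then gives (177).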
Exercise 2. Derive a formula similar to (177) for the Laplace equation (174). Show, using these formulas, that (173) has solutions for all $f, g \in \mathcal{S}(\mathbb{R}^n)$ while (174) does not. Show however that if we only prescribe $\phi(0, x) = f$ (this is the Dirichlet problem for the Laplacian $\partial_t^2 + \Delta$ in $\mathbb{R}^{n+1}$), then the problem has a unique solution $\phi$, which decays to zero as $|t| + |x| \to \infty$, for all functions $f \in \mathcal{S}(\mathbb{R}^n)$.
Exercise 3. Show, in the special case of dimension 1 + 3, how to pass from formula (177) to the Kirchhoff formula (146)
\[ \phi(t,x) = \partial_t\Big((4\pi t)^{-1}\int_{|x-y|=t} f(y)\,da(y)\Big) + (4\pi t)^{-1}\int_{|x-y|=t} g(y)\,da(y) \tag{178} \]
which is consistent with the formulas derived in the previous chapter, based on the explicit calculation of the fundamental solution.
It is interesting to make a comparison between the Fourier based formula (177) and the Kirchhoff formula (178). Observe that it is quite easy, using Parseval, to derive the global energy identity from (177),
\[ \int_{\mathbb{R}^n} \big(|\partial_t\phi|^2 + |\nabla\phi|^2\big)\,dx = \int_{\mathbb{R}^n} \big(|\nabla f|^2 + |g|^2\big)\,dx \]
while obtaining such an identity from (178) seems not at all obvious, in fact quite implausible. On the other hand (178) is perfect for giving us domain of influence information. Indeed we read immediately from the formula that if the data $f, g$ is supported in the ball $B_a = \{|x - x_0| \le a\}$ then $\phi(t, x)$ is supported in the ball $B_{a+|t|}$ for any time $t$. This fact, on the other hand, does not at all seem transparent in the Fourier based formula (177). The fact that different representations of solutions have different, even opposite, strengths and weaknesses has important consequences for constructing parametrices, i.e. approximate solutions, for more complicated, linear variable coefficient or nonlinear wave equations. There are two types of possible constructions, those in physical space, which mimic the physical space formula (178), or those in Fourier space, which mimic formula (177). The first are called Kirchhoff-Sobolev, or Hadamard parametrices while the second are called Lax parametrices, or, more generally, Fourier integral operators.
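The energy identity can be seen mode by mode on the Fourier side: for each frequency, $|\partial_t\hat\phi|^2 + |\xi|^2|\hat\phi|^2$ is independent of $t$, and Parseval then gives the physical space identity. The sketch below checks this for a single mode; the sample values of $|\xi|$, $\hat f(\xi)$ and $\hat g(\xi)$ are arbitrary, and the function names are hypothetical.

```python
import math

def wave_mode(t, k, f_hat, g_hat):
    """Fourier coefficient of the wave solution and its time derivative.

    k stands for |xi| > 0; f_hat, g_hat are the (complex) data at that frequency.
    """
    phi = math.cos(t * k) * f_hat + math.sin(t * k) / k * g_hat
    dphi = -k * math.sin(t * k) * f_hat + math.cos(t * k) * g_hat
    return phi, dphi

def mode_energy(t, k, f_hat, g_hat):
    """Energy density |d_t phi_hat|^2 + |xi|^2 |phi_hat|^2 of one Fourier mode."""
    phi, dphi = wave_mode(t, k, f_hat, g_hat)
    return abs(dphi) ** 2 + k * k * abs(phi) ** 2

k, f_hat, g_hat = 3.0, 1.0 + 2.0j, -0.5 + 0.25j
energies = [mode_energy(t, k, f_hat, g_hat) for t in (0.0, 0.4, 2.7, 10.0)]
print(energies)  # all entries agree up to rounding
```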
CHAPTER 6
\[ \widehat{T_m f}(\xi) = m(\xi)\,\hat f(\xi) \]
with bounded multiplier $m$. In view of Parseval's identity it is very easy to check the $L^2 \to L^2$ estimate, $\|T_m f\|_{L^2} \lesssim \|f\|_{L^2}$. To obtain additional estimates we typically use the integral representation (168), $T_m f(x) = f * K(x) = \int f(x-y)K(y)\,dy$, where $K$ is the inverse Fourier transform of $m$. If, for example, we can establish that $K \in L^1$ then we easily deduce that $\|T_m f\|_{L^1} \lesssim \|f\|_{L^1}$, since $\|f * K\|_{L^1} \le \|f\|_{L^1}\|K\|_{L^1}$. We thus have both $L^1 \to L^1$ and $L^2 \to L^2$ estimates for $T_m$, and it is tempting to conclude we might have an $L^p \to L^p$ estimate for all $1 \le p \le 2$. Such an estimate is indeed true and follows by interpolation. On the other hand, if we can establish that $K \in L^\infty$ then $\|f * K\|_{L^\infty} \lesssim \|f\|_{L^1}$ and thus we can prove, by interpolation, the same $L^p \to L^q$ estimate as in the Hausdorff-Young inequality.
1.2. Review of $L^p$ spaces. Given a measurable subset $\Omega \subset \mathbb{R}^n$ the space $L^p(\Omega)$, $1 \le p < \infty$, consists of all measurable functions $f : \Omega \to \mathbb{C}$ with finite $L^p$ norm,
\[ \|f\|_{L^p} = \Big(\int_\Omega |f(x)|^p\,dx\Big)^{1/p} < \infty. \]
For all values of $1 \le p \le \infty$ the spaces $L^p(\Omega)$ are Banach spaces. The theory of $L^p$ spaces generalizes when we replace the Lebesgue measure $dx$ with a general, positive measure
(179)
whenever $1/p = 1/q + 1/r$. The relationship between the exponents is necessary so that both sides are homogeneous of degree $\frac1p$ in the measure. In particular, for $p = 1$,
\[ \|fg\|_{L^1} \le \|f\|_{L^q}\,\|g\|_{L^{q'}} \]
where $q'$ verifies $\frac{1}{q'} = 1 - \frac1q$. It follows that we can identify each element $g \in L^{q'}$ with the bounded, linear functional on the Banach space $L^q$ given by $f \mapsto \int f(x)g(x)\,dx$. For all $1 \le q < \infty$ the space $L^{q'}(\Omega)$ is dual to $L^q(\Omega)$ in the sense that the above identification is an isometry (in particular, every bounded linear functional on $L^q$ arises this way for a unique $g$), while the dual of $L^\infty(\Omega)$ includes $L^1(\Omega)$, but is vastly larger. Often taking the role of $L^\infty$ is the space $C_0(\mathbb{R}^n)$ of continuous functions vanishing at infinity (since they constitute the closure of $C_0^\infty$ in the $L^\infty$ norm), whose dual space is the set of finite, Borel measures on $\mathbb{R}^n$.
Exercise.
(180)
p
which quantitatively expresses the fact that the upper contour sets of an L function
have finite measure. It is helpful (at least as a mnemonic) to note that both sides
have the same units since f and have the same units.
Proof
Z
Z
(f, ) =
|f (x)|p
p
1There are some complications, however, when the whole space cannot be written as a countable union of finite -measure subsets. This will not concern us in these notes.
We can write the $L^p$ norm of $f$ in terms of its distribution function. Indeed, the integral $\int |f|^p$ is the measure of the region bounded by the graph $\{(\beta, x) : 0 < \beta < |f(x)|^p\}$, hence
\[ \int |f(x)|^p\,dx = \int\!\!\int_0^\infty \chi(|f(x)|^p > \beta)\,d\beta\,dx = p\int_0^\infty \lambda^{p-1}\,\mu(f,\lambda)\,d\lambda \tag{181} \]
where the last integral is obtained from the substitution $\beta = \lambda^p$.
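The identity expressing $\int |f|^p$ through the distribution function can be tested numerically. The sketch below does this for the sample function $f(x) = x$ on $[0,1]$ with $p = 3$, where both sides should be close to $1/4$. The domain, the sample function and the helper names are assumptions made for the illustration; the distribution function is approximated by counting grid points.

```python
def distribution_function(f, lam, steps=2000):
    """Approximate Lebesgue measure of {x in [0,1] : |f(x)| > lam}."""
    h = 1.0 / steps
    return h * sum(1 for j in range(steps) if abs(f((j + 0.5) * h)) > lam)

def lp_integral_direct(f, p, steps=2000):
    """Midpoint-rule value of the integral of |f|^p over [0,1]."""
    h = 1.0 / steps
    return h * sum(abs(f((j + 0.5) * h)) ** p for j in range(steps))

def lp_integral_layer_cake(f, p, lam_max, lam_steps=400):
    """p * integral_0^lam_max lam^(p-1) * (distribution function at lam) dlam."""
    dl = lam_max / lam_steps
    total = 0.0
    for i in range(lam_steps):
        lam = (i + 0.5) * dl
        total += lam ** (p - 1) * distribution_function(f, lam)
    return p * total * dl

identity = lambda x: x
direct = lp_integral_direct(identity, 3)
layered = lp_integral_layer_cake(identity, 3, lam_max=1.0)
print(direct, layered)
```

Here `lam_max` only needs to be at least the supremum of $|f|$, since the distribution function vanishes above it.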
A measurable function $f : \Omega \to \mathbb{C}$ is said to be simple if its range consists of a finite number of points in $\mathbb{C}$, that is $f = \sum_{i=1}^N a_i \chi_{A_i}$ for $a_i \in \mathbb{C}$ and $A_i \subset \Omega$ measurable. In this section we denote by $S(\Omega)$ the set of all simple functions in $\Omega$. Recall that $S(\Omega)$ is dense in $L^p(\Omega)$ for all $1 \le p \le \infty$. The proof typically involves approximating a fixed $f(x)$ with linear combinations of characteristic functions $\chi(f(x) \in E_\alpha)$, and letting the collection $\{E_\alpha\}$ tend towards a fine and complete partition of $\mathbb{C}$.
Exercise. Let $f(x, y)$ be a measurable function on $\Omega_1 \times \Omega_2 \subset \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$. Prove the following version of Minkowski's inequality,
\[ \Big\| \int_{\Omega_2} f(x,y)\,dy \Big\|_{L^p_x(\Omega_1)} \le \int_{\Omega_2} \|f(x,y)\|_{L^p_x(\Omega_1)}\,dy \]
|f (1 + ib)| M1 ,
f (z)
M01z M1z
as $|\Im z| \to \infty$, then one can simply apply the usual maximum modulus principle to a sufficiently large subset of $D$ to conclude $|f(z)| \le 1$ throughout. If this is not the case, then (because we have assumed already that $|f(z)|$ does not grow substantially as $|\Im z| \to \infty$) we can apply the same argument to the approximation $F_\epsilon(z) = e^{-\epsilon(1-z)z} f(z)$ (which does decay for large $|\Im z|$) and conclude
\[ |f(z)| = \lim_{\epsilon \to 0} |F_\epsilon(z)| \le 1 \]
throughout $D$.
\[ \frac1p = \frac{1-a}{p_0} + \frac{a}{p_1}, \qquad \frac1q = \frac{1-a}{q_0} + \frac{a}{q_1}. \]
q0
g(x),
|f1+ib | |f |p/p1 ,
1
1z
z
=
+ 0.
q 0 (z)
q00
q1
0
\[ 1 + \frac1q = \frac1r + \frac1p, \tag{184} \]
we have
\[ \|Tf\|_{L^q} \lesssim \|f\|_{L^p}. \tag{185} \]
Proof: By Hölder's inequality,
\[ \|Tf\|_{L^\infty} \lesssim \|f\|_{L^{r'}}. \tag{186} \]
On the other hand the dual operator $T^*$ has the same form as $T$,
\[ T^*g(y) = \int k(x,y)\,g(x)\,dx, \]
and hence,
\[ \|T^*g\|_{L^\infty} \lesssim \|g\|_{L^{r'}}, \]
which by duality gives the other endpoint
\[ \|Tf\|_{L^r} \lesssim \|f\|_{L^1}. \tag{187} \]
Now, we can use Theorem 1.8, with $T_z = T$, to interpolate between (186) and (187) and obtain (185).
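As a check on the exponents (a sketch, with the interpolation parameter $\theta \in (0,1)$ introduced here only for illustration), interpolating between the endpoint $L^{r'} \to L^\infty$ and the endpoint $L^1 \to L^r$ produces exactly the relation (184):

```latex
\begin{align*}
\frac1q &= \frac{1-\theta}{\infty} + \frac{\theta}{r} = \frac{\theta}{r}, \\
\frac1p &= \frac{1-\theta}{r'} + \frac{\theta}{1}
         = (1-\theta)\Big(1-\frac1r\Big) + \theta
         = 1 - \frac1r + \frac{\theta}{r}
         = 1 - \frac1r + \frac1q .
\end{align*}
```

Rearranging the last line gives $1 + \frac1q = \frac1r + \frac1p$.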
² In fact, the Schwartz Kernel theorem states that every continuous linear map from $C_0^\infty(\Omega_1)$ to $\mathcal{D}'(\Omega_2)$ is of the form (183) for some distribution $k(x,y) \in \mathcal{D}'(\Omega_1 \times \Omega_2)$.
(188)
for all
q 2,
1/q + 1/p = 1.
(189)
0<<|f (x)|
R |f (x)|
But 0
pp1 1 d ' |f (x)|pp1 , since p p1 1 > 1, and
|f (x)|pp2 , since p p2 1 < 1, and the conclusion follows.
R
|f (x)|
pp2 1 d '
In the case of p2 = the proof is actually simpler. We only have to observe that
|T f (x)| implies |T f (x)| , since |T f (x)| . kf kL . Hence we can
replace (189) by
(T f, C) . (T f , ) . p1 kf kpL1p1 ,
where C is some positive constant, and the proof proceeds as before.
(191)
The averaging process may improve local regularity, but, because of the supremum, it is not clear whether $Mf$ preserves the integrability properties of $f$. If $f$ is essentially bounded, then $Mf$ is bounded and
\[ \|Mf\|_{L^\infty} \le \|f\|_{L^\infty}. \tag{192} \]
(194)
Proof: The second part of the statement follows from the first and the $L^\infty$ boundedness of the maximal operator by Marcinkiewicz interpolation, Theorem 1.13. Hence, we only need to prove (193).
Let $f \in L^1$ and fix $\lambda > 0$. By the discussion in Remark 2.3 we can find a family of balls $\mathcal{B} = \{B\}$, such that $E_\lambda = \bigcup_{B \in \mathcal{B}} B$ and each ball $B$ satisfies (190). If these balls were all disjoint then it would be easy to conclude, since in that case
\[ |E_\lambda| \le \sum_{B \in \mathcal{B}} |B| < \frac1\lambda \sum_{B \in \mathcal{B}} \int_B |f(y)|\,dy \le \frac1\lambda \int_{\mathbb{R}^n} |f(y)|\,dy. \]
In general these balls are not disjoint and we have to be more careful.
Let $K$ be a compact subset of $E_\lambda$, then it is possible to select a finite subfamily $\mathcal{B}'$ of balls in $\mathcal{B}$ that cover $K$. Using the covering lemma proved below³, Lemma 2.5, we can select among the balls in $\mathcal{B}'$ another finite subfamily $\mathcal{B}''$ made of disjoint balls (which may no longer cover $K$) such that
\[ \Big|\bigcup_{B' \in \mathcal{B}'} B'\Big| \lesssim \sum_{B'' \in \mathcal{B}''} |B''|. \]
Arguing as in the disjoint case for the balls of $\mathcal{B}''$, we get
\[ |K| \lesssim \frac1\lambda \|f\|_{L^1}, \]
and taking the supremum over all possible compact sets $K$ we finally obtain (193).
Proof: We can assume that the balls $B_j = B(x_j, r_j)$ are labeled so that the radii are in nonincreasing order, $r_1 \ge r_2 \ge \dots \ge r_N$.
³ This is sometimes known as the Vitali Covering Lemma.
Take $j_1 = 1$, so that $B_{j_1}$ is the ball with largest radius. Then by induction, define $j_{k+1}$ to be the minimum index among those of the balls $B_j$ which don't intersect the previously chosen balls $B_{j_1}, \dots, B_{j_k}$; if there are no such balls then stop at step $k$.
With this construction we have that each ball $B_j$ intersects one of the chosen balls $B_{j_k}$ with $r_j \le r_{j_k}$, hence $B_j \subset B(x_{j_k}, 3r_{j_k})$. This implies that
\[ \Big|\bigcup_{j=1}^N B_j\Big| \le \Big|\bigcup_{k=1}^M B(x_{j_k}, 3r_{j_k})\Big| \le 3^n \sum_{k=1}^M |B_{j_k}|. \]
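The greedy selection in this proof can be sketched in code. The example below works in dimension one, representing an open ball as a `(center, radius)` pair; the function names and the sample family of balls are illustrative assumptions, not taken from the notes.

```python
def select_disjoint(balls):
    """Greedy selection: scan by nonincreasing radius and keep a ball
    whenever it is disjoint from every ball already kept."""
    ordered = sorted(balls, key=lambda ball: -ball[1])
    chosen = []
    for center, radius in ordered:
        # Two open balls are disjoint iff their centers are at distance >= sum of radii.
        if all(abs(center - c) >= radius + r for c, r in chosen):
            chosen.append((center, radius))
    return chosen

def covered_by_tripled(ball, chosen):
    """True if the ball lies inside B(c, 3r) for some chosen ball (c, r)."""
    center, radius = ball
    return any(abs(center - c) + radius <= 3.0 * r for c, r in chosen)

balls = [(0.0, 1.0), (0.5, 0.6), (1.8, 0.5), (3.0, 1.0), (3.5, 0.2), (-1.2, 0.3)]
chosen = select_disjoint(balls)
print(chosen)
print(all(covered_by_tripled(ball, chosen) for ball in balls))
```

Scanning the sorted list and keeping the first ball disjoint from all previous choices is the same rule as picking the minimum admissible index at each inductive step.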
Let $A_r$ be the averaging operator defined by $A_r f(x) = |B(x, r)|^{-1}\int_{B(x,r)} f(y)\,dy$. The proof consists of two steps. First we prove that $A_r f \to f$ in $L^1$ as $r \to 0$, and then it will be enough to show that $\lim_{r\to0} A_r f(x)$ exists almost everywhere.
For the first step, given $\epsilon > 0$, using the density of $C_0$ in $L^1$, we can always find a compactly supported continuous function $g$ which approximates $f$ in $L^1$ and have $\|A_r f - A_r g\|_{L^1} \le \|f - g\|_{L^1} < \epsilon$ uniformly in $r$. Then by the uniform continuity of $g$, we know that $A_r g \to g$ in $L^1$ as $r \to 0$, hence there exists an $r_\epsilon$ such that
\[ \|A_r f - f\|_{L^1} \le \|A_r f - A_r g\|_{L^1} + \|A_r g - g\|_{L^1} + \|f - g\|_{L^1} \le 3\epsilon, \]
for $r < r_\epsilon$.
For the second step, we define the oscillation of an $L^1$ function $f$ by
\[ \operatorname{osc} f(x) = \limsup_{r\to0} A_r f(x) - \liminf_{r\to0} A_r f(x). \]
We can apply now the weak-$L^1$ property of the maximal function, and for any positive $\lambda$ we find that
\[ |\{x : \operatorname{osc} f(x) > \lambda\}| \le |\{x : M(f-g)(x) > \lambda/2\}| \lesssim \frac1\lambda \|f - g\|_{L^1}. \]
Since $\|f - g\|_{L^1}$ can be arbitrarily small, we infer that the set of points where the oscillation of $f$ is positive is of measure zero.
2.8. Fractional integration. Let $T$ be an integral operator acting on functions defined over $\mathbb{R}^n$ with kernel $k$ as in (183). If the only information that we have on $k(x, y)$ is a decay estimate of the type
\[ |k(x,y)| \lesssim |x-y|^{-\alpha}, \]
for some $\alpha > 0$, then Young's inequality, Theorem 1.10, does not allow us to recover a good control on $Tf$, since the function $|x|^{-\alpha}$ fails, barely, to be in $L^{n/\alpha}$. However, the convolution has smoothing properties that imply some positive results which are contained in the following important theorem, originally proved by Hardy and Littlewood for $n = 1$ and then extended by Sobolev to $n > 1$.
Theorem 2.9 (Hardy-Littlewood-Sobolev inequality). Let $0 < \alpha < n$ and $1 < p < q < \infty$ such that
\[ 1 - \frac{\alpha}{n} = \frac1p - \frac1q, \tag{196} \]
then
\[ \big\| |\cdot|^{-\alpha} * f \big\|_{L^q(\mathbb{R}^n)} \lesssim \|f\|_{L^p(\mathbb{R}^n)}. \tag{197} \]
Proof:
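We can split the convolution with the singular kernel into two parts:
\[ I_\alpha f(x) = |\cdot|^{-\alpha} * f(x) = \int_{|y|<R} \frac{f(x-y)}{|y|^\alpha}\,dy + \int_{|y|\ge R} \frac{f(x-y)}{|y|^\alpha}\,dy, \]
where the radius $R$ is a positive constant to be chosen later. We estimate the term over $|y| \ge R$ simply by Hölder's inequality,
\[ \Big|\int_{|y|\ge R} \frac{f(x-y)}{|y|^\alpha}\,dy\Big| \le \|f\|_{L^p}\Big(\int_{|y|\ge R} |y|^{-\alpha p'}\,dy\Big)^{1/p'} \lesssim R^{\frac{n}{p'}-\alpha}\,\|f\|_{L^p}, \]
where we need the integrability condition $\alpha p' > n$, which by (196) is equivalent to $q < \infty$.
For the term over $|y| < R$ we perform a dyadic decomposition around the singularity and get an estimate in terms of the maximal function,
\begin{align*}
\Big|\int_{|y|<R} \frac{f(x-y)}{|y|^\alpha}\,dy\Big| &\le \sum_{k=0}^\infty \int_{2^{-k-1}R \le |y| < 2^{-k}R} \frac{|f(x-y)|}{|y|^\alpha}\,dy \\
&\lesssim \sum_{k=0}^\infty \frac{1}{(2^{-k}R)^\alpha}\int_{|y|\le 2^{-k}R} |f(x-y)|\,dy \\
&\lesssim \sum_{k=0}^\infty (2^{-k}R)^{n-\alpha}\,Mf(x) \lesssim R^{n-\alpha}\,Mf(x),
\end{align*}
where we need $\alpha < n$ for the convergence of the last geometric series.
At this point we have found that for every $x \in \mathbb{R}^n$ and every $R > 0$,
\[ \big| |\cdot|^{-\alpha} * f(x) \big| \lesssim R^{\frac{n}{p'}-\alpha}\,\|f\|_{L^p} + R^{n-\alpha}\,Mf(x), \]
with constants independent of $R$ and $x$. We optimize this inequality choosing, for each $x$, a radius $R = R(x)$ such that the two terms on the right hand side are equal,
\[ R^{\frac{n}{p'}-\alpha}\,\|f\|_{L^p} = R^{n-\alpha}\,Mf(x), \]
i.e.,
\[ R(x) = \Big(\frac{\|f\|_{L^p}}{Mf(x)}\Big)^{p/n}. \]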
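A hedged sketch of how the argument can be completed from this choice of $R(x)$, using (196) and the $L^p$ boundedness of the maximal function for $p > 1$. The corresponding lines of the notes did not survive extraction, so this is a reconstruction of the standard computation rather than the author's text:

```latex
\begin{align*}
\big| |\cdot|^{-\alpha} * f(x) \big|
  &\lesssim R(x)^{\,n-\alpha}\, Mf(x)
   = \|f\|_{L^p}^{\,p(1-\alpha/n)}\, Mf(x)^{\,1-p(1-\alpha/n)}
   = \|f\|_{L^p}^{\,1-p/q}\, Mf(x)^{\,p/q}, \\
\big\| |\cdot|^{-\alpha} * f \big\|_{L^q}
  &\lesssim \|f\|_{L^p}^{\,1-p/q}\, \|Mf\|_{L^p}^{\,p/q}
   \lesssim \|f\|_{L^p}.
\end{align*}
```

The middle equality in the first line uses $p(1 - \alpha/n) = p(1/p - 1/q) = 1 - p/q$, which is (196).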
+ 0 = .
p01
p2
n
It is important to understand that the relation among the exponents can be quickly derived from scaling arguments. If we assign a length scale $L$ to the variable $x$, the expression $\| |\cdot|^{-\alpha} * f \|_{L^q}$ has the units $L^{-\alpha} L^n L^{n/q}$, whereas $\|f\|_{L^p}$ has the units $L^{n/p}$. The exponents $\alpha, q, p$ must relate in such a way that the exponents of both quantities match up. Indeed, if they did not, then one could deduce the failure of the estimate (197) by considering an arbitrary, nontrivial $f$ and rescaling it to derive a contradiction.
Remark. In our proof of (197) we have not fully used the power of the Maximal function (for example, by only considering balls centered at $x$). In fact, the same estimate holds upon replacing the kernel $|x-y|^{-\alpha}$ with any kernel $k(x, y)$ sharing the same distribution function. A proof along these lines requires one to build up the machinery of Lorentz spaces along with a more general form of the Marcinkiewicz interpolation theorem. For this we refer to (***)
Proof.
To consider all possible directions in which $f$ could grow, we integrate over the whole sphere, and recalling that the volume element in $\mathbb{R}^n$ in polar coordinates is $dy = r^{n-1}\,dr\,d\sigma$, we find that
\[ |f(x)| \lesssim \int \frac{|\nabla f(y)|}{|x-y|^{n-1}}\,dy = |\cdot|^{1-n} * |\nabla f|\,(x). \]
We take the $L^q$ norm and use (197) to get
\[ \|f\|_{L^q} \lesssim \big\| |\cdot|^{1-n} * |\nabla f| \big\|_{L^q} \lesssim \|\nabla f\|_{L^p}, \]
whenever $p > 1$ and
\[ 1 - \frac{n-1}{n} = \frac1p - \frac1q. \]
(198)
(1 p < q < ).
(199)
f C0 (Rn ),
(200)
m N,
m
X
k k f kLp (Rn ) ,
k=0
when
n
p
m < 0.
Remark. We don't need to remember the precise condition (199); it can be deduced by a simple dimensional analysis. Since the estimate is homogeneous, it has to be invariant under dilations, and (199) simply says that both sides in (198) have the same scaling. Also the condition $\frac{n}{p} - m < 0$ is a comparison of the scalings of the two sides of (200) which excludes a very localized and spiky counterexample.
Remark. The following non-sharp version of estimate (198) also holds for all $1 \le p < q < \infty$ and $1/p - m/n < 1/q$,
\[ \|f\|_{L^q(\mathbb{R}^n)} \lesssim \sum_{|\alpha| \le m} \|\partial^\alpha f\|_{L^p} \qquad \forall f \in C_0^\infty(\mathbb{R}^n). \tag{201} \]
Exercise. Show by an example that the inequality (200) can fail to be true for $p = n/m$. Prove (201) for $m = 1$, using the results of theorem 2.11.
Exercise. Show by a scaling argument that if the inequality (201) holds true for
1/p = 1/q + m/n < 0 then the homogeneous inequality (198) is also true.
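Proof [Proof of (198)]: We obtain the cases with $m > 1$ by repeated iterations of the case $m = 1$. Hence, we can assume $m = 1$ and, by (199),
\[ 1 \le p < n, \qquad \frac{n}{n-1} \le q = \frac{np}{n-p} < \infty. \]
Once we have the estimate for $p = 1$ and $q = n/(n-1)$, then we get the cases with $p > 1$ and $q > n/(n-1)$ by simply applying Hölder's inequality. Indeed, let $q = \gamma n/(n-1)$, for some $\gamma > 1$, then
\[ \|f\|_{L^q}^\gamma = \big\| |f|^\gamma \big\|_{L^{\frac{n}{n-1}}} \lesssim \big\| |f|^{\gamma-1}\nabla f \big\|_{L^1} \le \big\| |f|^{\gamma-1} \big\|_{L^{p'}}\,\|\nabla f\|_{L^p}, \]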
n1
n q
1 n1
1
= q.
1q
But this essentially needs no verification; by the scaling of the inequality, the exponents must work out.
It only remains to prove the special case $m = 1$, $p = 1$, $q = n/(n-1)$. Following Nirenberg, [?], one can show the stronger result that for $f \in C_0^\infty(\mathbb{R}^n)$ we have
\[ \|f\|_{L^{\frac{n}{n-1}}(\mathbb{R}^n)} \lesssim \prod_{j=1}^n \|\partial_j f\|_{L^1(\mathbb{R}^n)}^{1/n}. \tag{202} \]
When $n = 2$, we do the same with respect to each variable and then multiply and integrate:
\begin{align*}
\iint |f(x_1,x_2)|^2\,dx_1dx_2 &\le \iint \Big(\int |\partial_1 f(y_1,x_2)|\,dy_1\Big)\Big(\int |\partial_2 f(x_1,y_2)|\,dy_2\Big)\,dx_1dx_2 \\
&= \|\partial_1 f\|_{L^1}\,\|\partial_2 f\|_{L^1}.
\end{align*}
When $n \ge 3$ things become more tricky and, to separate the variables, we have to make a repeated use of Hölder's inequality. Let us just look at the case $n = 3$. To ease
Z
|f (x)|
(x)dxj =
(
xj ). We start with
12 Z
21 Z
21
|f1 (, x2 , x3 )|
|f2 (x1 , , x3 )|
|f3 (x1 , x2 , )|
.
Then integrate with respect to x1 . The first factor on the right hand side doesnt
depend on x1 , while we use Holder to separate the second from the third,
Z
12 Z
12 Z
12
Z
3
2
|f (, x2 , x3 )|
|f1 (, x2 , x3 )|
|f2 (, , x3 )|
|f3 (, x2 , )|
.
1
1,2
1,3
1,2
1,2
1,2,3
21
|f3 ()|
,
1,2,3
21 Z
|f2 ()|
1,2,3
12
|f3 ()|
1,2,3
(204)
where the integrability condition needed here is $(n-1)p' < n$, which is precisely $p > n$.
In general, fix a cutoff function $\chi \in C_0^\infty$ with support in $B$ and $\chi(0) = 1$, then in view of the above, $|f(0)| = |\chi(0)f(0)| \lesssim \|\nabla(\chi f)\|_{L^p} \lesssim \|f\|_{L^p} + \|\nabla f\|_{L^p}$.
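the weak⁴ derivatives $\partial^\alpha u$ belong to $L^p(\Omega)$. These spaces come equipped with the norms,
\[ \|u\|_{W^{s,p}(\Omega)} = \Big(\sum_{|\alpha| \le s} \|\partial^\alpha u\|_{L^p(\Omega)}^p\Big)^{1/p}, \qquad \text{for } 1 \le p < \infty, \]
\[ \|u\|_{W^{s,\infty}(\Omega)} = \sum_{|\alpha| \le s} \|\partial^\alpha u\|_{L^\infty(\Omega)}. \]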
if
1 m
1
.
p
n
q
Moreover, for $q = \infty$, $W^{m,p}(\mathbb{R}^n)$ embeds into the space of bounded continuous functions on $\mathbb{R}^n$ provided that $m > n/p$.
Proof: Follows from theorem 2.11 and the density of $C_0^\infty(\mathbb{R}^n)$ in $W^{m,p}(\mathbb{R}^n)$.
2.15. Hölder spaces. Together with Sobolev spaces, Hölder spaces play a very important role in Analysis, especially in connection to elliptic equations. Before introducing these spaces we recall the definitions of the spaces $C^m(\Omega)$ of $m$ times
⁴ That is, derivatives in the sense of distributions.
\[ [u]_{C^{0,\gamma}(\Omega)} = \sup_{x \ne y \in \Omega} \frac{|u(x) - u(y)|}{|x-y|^\gamma} \tag{205} \]
The Hölder space $C^{k,\gamma}(\Omega)$ consists of all functions $u \in C^k(\Omega)$ for which the norm,
\[ \|u\|_{C^{k,\gamma}(\Omega)} = \|u\|_{C^k(\Omega)} + \sum_{|\alpha| = k} [\partial^\alpha u]_{C^{0,\gamma}(\Omega)} \tag{206} \]
is finite.
Exercise 2.17.
Exercise 2.18. Show that $C^{0,1}((a, b))$, the space of Lipschitz functions on an interval, consists exactly of those distributions whose derivative belongs to $L^\infty$.
Exercise 2.19. Let $f(x) = \chi(a \le x \le b)$ be the characteristic function of an interval. Show that the seven-fold convolution $f * \dots * f$ is in the Hölder class $C^{5,1}$.
The following stronger version of the Sobolev embedding in $L^\infty$ is important in elliptic theory. As usual, the relationship between the exponents involved can be deduced from dimensional analysis.
Theorem 2.20 (Morrey's inequality). Assume $n < p \le \infty$. Then, for all $u \in C_0^\infty(\mathbb{R}^n)$,
\[ \|u\|_{C^{0,\gamma}(\mathbb{R}^n)} \lesssim \|u\|_{W^{1,p}(\mathbb{R}^n)} \tag{207} \]
|| s}.
Proposition 2.22. The Sobolev space $H^s(\mathbb{R}^n)$ coincides with the set of all tempered distributions $u \in \mathcal{S}'(\mathbb{R}^n)$ for which $\hat u$ is locally integrable and,
\[ \|u\|_{H^s}^2 = \int_{\mathbb{R}^n} (1 + |\xi|^2)^s\,|\hat u(\xi)|^2\,d\xi < \infty \tag{208} \]
Proof: Follows easily from the Parseval identity, the density of $C_0^\infty$ in each space,
Question. Why does $\|u\|_{H^s}$ have units $L^{\frac n2 - s}$ if we consider the physical space variable to have the unit $L$?
Exercise. For $s \in (0, 1)$ the space $H^s(\mathbb{R}^n)$ coincides with the space of locally integrable functions such that,
\[ \Big(\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|u(x) - u(x+y)|^2}{|y|^{n+2s}}\,dx\,dy + \|u\|_{L^2(\mathbb{R}^n)}^2\Big)^{1/2} < \infty \tag{210} \]
Exercise. Prove that, for $s > n/2$ the Sobolev space $H^s(\mathbb{R}^n)$ embeds in the space of bounded continuous functions.
2.23. A Trace Theorem. In order to make sense of boundary values of generalized functions for partial differential equations, it is important to prove that the operation of restriction, which obviously makes sense for continuous functions, continues to make sense even when the function is not continuous. Such theorems are called trace theorems. Consider, for simplicity, the case of the hyperplane $x_n = 0$ in $\mathbb{R}^n$ and define the trace operator,
\[ Tf(x_1, \dots, x_{n-1}) = f(x_1, \dots, x_{n-1}, 0). \tag{211} \]
Clearly the operator makes sense for any continuous function $f$, in particular for any test function, in $\mathbb{R}^n$.
Theorem 2.24. The following estimate holds true, uniformly for any test function $f \in C_0^\infty(\mathbb{R}^n)$, $n \ge 2$ and any $s > 1/2$.
\[ \|Tf\|_{H^{s-\frac12}(\mathbb{R}^{n-1})} \lesssim \|f\|_{H^s(\mathbb{R}^n)} \tag{212} \]
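I.e., writing $g = Tf$ and $\xi = (\xi', \xi_n)$,
\begin{align*}
\|g\|_{H^{s-1/2}(\mathbb{R}^{n-1})}^2 &\lesssim \int_{\mathbb{R}^{n-1}} \Big|\int \hat f(\xi)\,d\xi_n\Big|^2 (1 + |\xi'|^2)^{s-1/2}\,d\xi' \\
&\lesssim \int_{\mathbb{R}^{n-1}} (1 + |\xi'|^2)^{s-1/2}\Big(\int |\hat f(\xi)|^2 (1 + |\xi|^2)^s\,d\xi_n\Big) J(\xi')\,d\xi'
\end{align*}
with,
\[ J(\xi') = \int (1 + |\xi|^2)^{-s}\,d\xi_n = (1 + |\xi'|^2)^{-s+1/2}\int (1 + y^2)^{-s}\,dy. \]
Plugging this into our above estimate for $\|g\|_{H^{s-1/2}}$ proves the result.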
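Similar results hold for traces to higher co-dimension hypersurfaces. Here is such a result, which can be proved by elementary means.
Proposition 2.25. Consider the trace operator $T$ in $\mathbb{R}^3$ which takes continuous functions $f(t, x_1, x_2)$ to $Tf(t) = f(t, 0, 0)$. We have, for any test function $f$,
\[ \|\partial_t(Tf)\|_{L^2(\mathbb{R})} \lesssim \|\nabla^2 f\|_{L^2(\mathbb{R}^3)} \tag{213} \]
\begin{align*}
\int_{\mathbb{R}} |\partial_t f(t,0,0)|^2\,dt &= \int_0^\infty dx_1\int_0^\infty dx_2\;\partial_1\partial_2 \int_{\mathbb{R}} \partial_t f(t,x)\,\partial_t f(t,x)\,dt \\
&= 2\int_0^\infty dx_1\int_0^\infty dx_2\int_{\mathbb{R}} \partial_1\partial_2\partial_t f(t,x)\,\partial_t f(t,x)\,dt \\
&\quad + 2\int_0^\infty dx_1\int_0^\infty dx_2\int_{\mathbb{R}} \partial_1\partial_t f(t,x)\,\partial_2\partial_t f(t,x)\,dt
\end{align*}
Clearly,
\[ \int dx_1\int dx_2\int \partial_1\partial_t f(t,x)\,\partial_2\partial_t f(t,x)\,dt \lesssim \|\nabla^2 f\|_{L^2(\mathbb{R}^3)}^2 \]
Hence, integrating by parts in $t$,
\[ \int dx_1\int dx_2\int \partial_1\partial_2\partial_t f(t,x)\,\partial_t f(t,x)\,dt \lesssim \|\nabla^2 f\|_{L^2(\mathbb{R}^3)}^2 \]
as desired.
Exercise. Prove the same result using Fourier transform and extend it to all dimensions and general $H^s$ spaces.
Exercise. Extend the result to bounded intervals in $t$.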
2.26. Extensions. To extend results which hold true for functions in $\mathbb{R}^n$ to domains in $\mathbb{R}^n$ we need to extend the functions in a controlled manner. I will restrict the discussion to the case of the half space $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x_n \ge 0\}$. Consider the Sobolev space $W^{1,p}(\mathbb{R}^n_+)$. We want to prove the following.
Proposition 2.27. There exists a bounded linear operator $E : W^{1,p}(\mathbb{R}^n_+) \to W^{1,p}(\mathbb{R}^n)$, such that for any continuous $f$,
\[ Ef|_{\mathbb{R}^n_+} = f \]
and,
\[ \|Ef\|_{W^{1,p}(\mathbb{R}^n)} \lesssim \|f\|_{W^{1,p}(\mathbb{R}^n_+)} \]
Proof: It suffices to prove the result for functions $f \in C^1(\mathbb{R}^n_+)$. Given such a function we define, using its higher order reflection, its extension $\bar f$ which coincides with $f$ in $\mathbb{R}^n_+$ and, for all $x_n < 0$,
\[ \bar f(x', x_n) = -3f(x', -x_n) + 4f(x', -\tfrac12 x_n). \]
Observe first that $\bar f$ is also $C^1$. Indeed $\bar f$ is continuous across $x_n = 0$ and so are its derivatives with respect to the variables $x' = (x_1, \dots, x_{n-1})$. On the other hand, for $x_n < 0$,
\[ \partial_n \bar f(x', x_n) = 3\,\partial_n f(x', -x_n) - 2\,\partial_n f(x', -\tfrac12 x_n). \]
Hence, letting $x_n$ tend to zero with $x_n < 0$,
\[ (\partial_n \bar f)^-(x', 0) = \partial_n f(x', 0). \]
Using these calculations we immediately derive the desired estimate.
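The matching of values and normal derivatives across the boundary can be checked on an example. The sketch below works in one variable (only the normal variable $x_n$ matters for the reflection), with a sample function chosen arbitrarily; the helper names are hypothetical.

```python
import math

def sample_f(x):
    """A sample C^1 function on the half line x >= 0 (arbitrary choice)."""
    return math.exp(x) + x * x

def sample_df(x):
    """Derivative of sample_f."""
    return math.exp(x) + 2.0 * x

def extend(f, x):
    """Higher order reflection: f(x) for x >= 0, -3 f(-x) + 4 f(-x/2) for x < 0."""
    return f(x) if x >= 0 else -3.0 * f(-x) + 4.0 * f(-x / 2.0)

def extend_derivative(df, x):
    """Derivative of the extension: 3 f'(-x) - 2 f'(-x/2) for x < 0."""
    return df(x) if x >= 0 else 3.0 * df(-x) - 2.0 * df(-x / 2.0)

eps = 1e-9
print(extend(sample_f, -eps), sample_f(0.0))                 # values match at 0
print(extend_derivative(sample_df, -eps), sample_df(0.0))    # slopes match at 0
```

The coefficients $-3$ and $4$ sum to $1$ (continuity), while the chain rule turns them into $3$ and $-2$, which again sum to $1$ (continuity of the normal derivative).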
Exercise. Extend the result to the $W^{s,p}$ spaces, with $s \in \mathbb{N}$. What about fractional $H^s$ spaces?
3. Littlewood-Paley theory
In its simplest manifestation Littlewood-Paley theory is a systematic and very useful method to understand various properties of functions $f$, defined on $\mathbb{R}^n$, by decomposing them in infinite dyadic sums $f = \sum_{k \in \mathbb{Z}} f_k$, with frequency localized components $f_k$, i.e. $\hat f_k(\xi) = 0$ for all values of $\xi$ outside the dyadic annulus $2^{k-1} \le |\xi| \le 2^{k+1}$. Such a decomposition can be easily achieved by choosing a test function $\chi(\xi)$ in Fourier space, supported in $\frac12 \le |\xi| \le 2$, and such that, for all $\xi \ne 0$,
\[ \sum_{k \in \mathbb{Z}} \chi(2^{-k}\xi) = 1. \tag{214} \]
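One common way to produce such a partition of unity, sketched here in the radial variable $r = |\xi|$, is to start from a smooth function `phi` equal to 1 for $r \le 1$ and vanishing for $r \ge 2$, and set $\chi(r) = \phi(r) - \phi(2r)$, so that the dyadic sum telescopes. The particular bump `phi` and the truncation range of the sum are assumptions made for this illustration.

```python
import math

def smooth_step(s):
    """C-infinity transition: 0 for s <= 0, 1 for s >= 1."""
    def g(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return g(s) / (g(s) + g(1.0 - s))

def phi(r):
    """Smooth radial cut-off: 1 for |r| <= 1, 0 for |r| >= 2."""
    return 1.0 - smooth_step(abs(r) - 1.0)

def chi(r):
    """Annular bump supported in 1/2 <= |r| <= 2."""
    return phi(r) - phi(2.0 * r)

def dyadic_sum(r, k_min=-60, k_max=60):
    """Truncated version of the sum over k of chi(2^{-k} r)."""
    return sum(chi(r / 2.0 ** k) for k in range(k_min, k_max + 1))

for r in (0.3, 1.0, 7.5, 123.0):
    print(r, dyadic_sum(r))
```

At most three consecutive terms of the sum are non-zero at any given $r \ne 0$, which is the source of the almost orthogonality of the resulting frequency pieces.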
(215)
Pk f = fk = mk f
(216)
(218)
k, k 0 Z,
|k k 0 | > 2.
Therefore,
\[ P_k f = \sum_{k' \in \mathbb{Z}} P_{k'}(P_k f) = \sum_{|k-k'| \le 1} P_{k'} P_k f. \]
Thus, since $P_{k-1}, P_k, P_{k+1}$ do not differ much between themselves we can write $P_k = \sum_{|k-k'| \le 1} P_{k'} P_k \approx P_k^2$. It is for this reason that the cut-off operators $P_k$ are called, improperly, LP projections.
Denote $P_J = \sum_{k \in J} P_k$ for all intervals $J \subset \mathbb{Z}$. We write, in particular, $P_{\le k} = P_{(-\infty, k]}$ and $P_{<k} = P_{\le k-1}$. Clearly, $P_k = P_{\le k} - P_{<k}$.
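The following properties of these LP projections lie at the heart of the classical LP theory:
Theorem 3.1. The LP projections verify the following properties:
LP 1. Almost Orthogonality. The operators $P_k$ are selfadjoint and verify $P_{k_1} P_{k_2} = 0$ for all pairs of integers such that $|k_1 - k_2| \ge 2$. In particular,
\[ \|f\|_{L^2}^2 \approx \sum_k \|P_k f\|_{L^2}^2 \tag{219} \]
LP 2. $L^p$-boundedness: For any $1 \le p \le \infty$, and any interval $J \subset \mathbb{Z}$,
\[ \|P_J f\|_{L^p} \lesssim \|f\|_{L^p} \tag{220} \]
LP 3. Finite band property. We can write any partial derivative $\nabla P_k f$ in the form $\nabla P_k f = 2^k \tilde P_k f$ where $\tilde P_k$ is a cut-off operator⁵ which verifies property LP2. In particular, for any $1 \le p \le \infty$
\[ \|\nabla P_k f\|_{L^p} \lesssim 2^k \|f\|_{L^p} \tag{221} \]
\[ 2^k \|P_k f\|_{L^p} \lesssim \|\nabla f\|_{L^p} \tag{222} \]
LP 4. Bernstein inequalities. For any $1 \le p \le q \le \infty$ we have the Bernstein inequalities,
\[ \|P_k f\|_{L^q} \lesssim 2^{kn(1/p - 1/q)} \|f\|_{L^p}, \qquad \forall k \in \mathbb{Z} \tag{223} \]
\[ \|P_{\le 0} f\|_{L^q} \lesssim \|f\|_{L^p}. \tag{224} \]
In particular,
\[ \|P_k f\|_{L^\infty} \lesssim 2^{kn/p} \|f\|_{L^p}. \]
LP 5. Commutator estimates. Consider the commutator
\[ [P_k, f] \cdot g = P_k(f \cdot g) - f \cdot P_k g \]
with $f, g \in C_0^\infty(\mathbb{R}^n)$. We have,
\[ \|[P_k, f] \cdot g\|_{L^p} \lesssim 2^{-k} \|\nabla f\|_{L^\infty} \|g\|_{L^p}. \]
LP 6. Littlewood-Paley inequality. The quantity
\[ Sf(x) = \Big(\sum_{k \in \mathbb{Z}} |P_k f(x)|^2\Big)^{1/2} \tag{225} \]
is known as the Littlewood-Paley square function. For every $1 < p < \infty$ there exist constants, depending on $p$, such that for all $f \in C_0^\infty$
\[ \|f\|_{L^p} \lesssim \|Sf\|_{L^p} \lesssim \|f\|_{L^p} \tag{226} \]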
Proof: Only the proof of LP6 is not straightforward and we postpone it until next section. The proof of LP1 is immediate. Indeed we only have to check (219). Clearly,
\[ \|f\|_{L^2}^2 = \Big\|\sum_k P_k f\Big\|_{L^2}^2 = \sum_{|k-k'| \le 1} \langle P_k f, P_{k'} f\rangle_{L^2} \lesssim \sum_k \|P_k f\|_{L^2}^2. \]
To show that $\sum_k \|P_k f\|_{L^2}^2 \lesssim \|f\|_{L^2}^2$ we only need to use Parseval's identity together with the definition of the projections $P_k$.
It suffices to prove LP2 for intervals of the form $J = (-\infty, k] \subset \mathbb{Z}$, that is to prove $L^p$ boundedness for $P_{\le k}$. If $\chi(\xi) = \phi(\xi) - \phi(2\xi)$ then $\widehat{P_{\le k} f}(\xi) = \phi(\xi/2^k)\hat f(\xi)$. Thus
\[ P_{\le k} f = \bar m_k * f, \]
where $\bar m_k(x) = 2^{nk}\,\bar m(2^k x)$ and $\bar m(x)$
X
Hence,
2k
j=1
where j () =
j [
i||2 xj f ().
j
i||2 ().
X
j
(/2k )[
2k j (/2k )[
xj f () =
xj f ()
2
i||
j=1
n
X
(j m)k j f
j=1
with (j m)k (x) = 2nk j m(2k x) and j m the inverse Fourier transform of j . Thus,
as before,
2k kPk f kLp .
n
X
kj f kLp = kf kLp
j=1
as desired.
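Property LP4 is an immediate consequence of the physical space representation (216) and the convolution inequality (188),
\[ \|P_k f\|_{L^q} \lesssim \|m_k\|_{L^r}\,\|f\|_{L^p} \]
where $1 + q^{-1} = r^{-1} + p^{-1}$. Now,
\[ \|m_k\|_{L^r} = 2^{nk}\Big(\int_{\mathbb{R}^n} |m(2^k x)|^r\,dx\Big)^{1/r} = 2^{nk}\,2^{-nk/r}\,\|m\|_{L^r} \lesssim 2^{nk(1-1/r)} = 2^{nk(1/p - 1/q)}. \]
For LP5, note that
\[ |f(y) - f(x)| \lesssim \Big|\int_0^1 \frac{d}{ds} f(x + s(y-x))\,ds\Big| \lesssim |x-y|\,\|\nabla f\|_{L^\infty}. \]
Hence,
\[ |P_k(fg)(x) - f(x)P_k g(x)| \lesssim 2^{-k}\,\|\nabla f\|_{L^\infty}\int_{\mathbb{R}^n} |\bar m_k(x-y)|\,|g(y)|\,dy \]
where $\bar m_k(x) = 2^{nk}\,\bar m(2^k x)$ and $\bar m(x) = |x|\,m(x)$. Thus,
\[ \|P_k(fg) - fP_k g\|_{L^p} \lesssim 2^{-k}\,\|\nabla f\|_{L^\infty}\,\|g\|_{L^p}. \]
We leave the proof of property LP6 for the next section.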
Remark. It could have simplified matters in the preceding proof to prove properties LP2-4 only in the case $k = 0$, and deduce the more general estimates from
the scaling identity (218). In particular, note that the Bernstein inequality is simply the statement that lower $L^p$ norms control higher $L^p$ norms when $f$ is localized
in frequency space (as opposed to the other way around, which occurs when $f$ is
localized in physical space). This accords with our intuition for $L^p$ norms: while
a frequency localized function may be too large at infinity in physical space to be integrable, one need not worry about sudden jumps or spikes where the function blows
up locally, and hence only the former phenomenon needs to be controlled.
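The statement that lower norms control higher norms for frequency-localized functions can be checked numerically in a simpler periodic setting. The sketch below is an illustration only and is not taken from the text: it assumes the circle $\mathbb{R}/\mathbb{Z}$ instead of $\mathbb{R}^n$, and trigonometric polynomials with frequencies $|m| \le N$ instead of the projections $P_k$. In that setting Cauchy-Schwarz and Parseval give $\|f\|_{L^\infty} \le (2N+1)^{1/2} \|f\|_{L^2}$, the analogue of (223) with $p = 2$, $q = \infty$. The names `trig_poly` and `sup_and_l2` are hypothetical helpers.

```python
import cmath
import math
import random

def trig_poly(coeffs, x):
    """Evaluate f(x) = sum_m c_m exp(2*pi*i*m*x); coeffs maps integer m to c_m."""
    return sum(c * cmath.exp(2j * math.pi * m * x) for m, c in coeffs.items())

def sup_and_l2(coeffs, grid=2048):
    """Return (max of |f| on a uniform grid, L^2 norm of f on [0,1) via Parseval)."""
    sup = max(abs(trig_poly(coeffs, j / grid)) for j in range(grid))
    l2 = math.sqrt(sum(abs(c) ** 2 for c in coeffs.values()))
    return sup, l2

N = 8                 # frequencies |m| <= N
band = 2 * N + 1      # number of modes in the band

random.seed(0)
random_coeffs = {m: complex(random.gauss(0, 1), random.gauss(0, 1))
                 for m in range(-N, N + 1)}
sup_rand, l2_rand = sup_and_l2(random_coeffs)

# The Dirichlet kernel (all coefficients equal to 1) attains equality at x = 0.
dirichlet = {m: 1.0 for m in range(-N, N + 1)}
sup_dir, l2_dir = sup_and_l2(dirichlet)

print(sup_rand, math.sqrt(band) * l2_rand)   # sup norm vs. Bernstein-type bound
print(sup_dir, math.sqrt(band) * l2_dir)     # the two values agree up to rounding
```

The factor $(2N+1)^{1/2}$ plays the role of $2^{kn/2}$ in (223): it is the square root of the volume of the frequency band.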
Definition. We say that a Fourier multiplier operator $\tilde P_k$ is similar to a standard
LP projection $P_k$ if its symbol $\tilde\chi_k$ is a bump function adapted to the dyadic region
$|\xi| \sim 2^k$. More precisely we can write $\tilde\chi_k(\xi) = \tilde\chi(\xi / 2^k)$ for some bump function $\tilde\chi$
supported in the region $c^{-1} 2^k \lesssim |\xi| \le c\, 2^k$ for some fixed $c > 0$.
Remark. Observe that the inequality $\|P_k f\|_{L^p} \lesssim \|f\|_{L^p}$ holds for every other
operator $\tilde P_k$ similar to $P_k$. The same holds true for the properties LP3, LP4 and
LP5.
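Remark: We have the following pointwise relation of the operator $P_k$ with the
maximal function:
\[ |P_k f(x)| \lesssim Mf(x). \tag{227} \]
Indeed we have, as before, $P_k f = m_k * f$, where $m_k(x) = 2^{nk} m(2^k x)$ and $m \in \mathcal{S}(\mathbb{R}^n)$. Therefore,
\[
\begin{aligned}
|P_k f(x)| &\lesssim 2^{nk} \int |f(y)| \, |m(2^k (x - y))| \, dy \lesssim 2^{nk} \int |f(y)| (1 + 2^k |x - y|)^{-n-1} \, dy \\
&\lesssim 2^{nk} \int_{B(x, 2^{-k})} |f(y)| (1 + 2^k |x - y|)^{-n-1} \, dy + 2^{nk} \sum_{j \ge 0} \int_{2^j \le 2^k |x - y| \le 2^{j+1}} |f(y)| (1 + 2^k |x - y|)^{-n-1} \, dy \\
&\lesssim 2^{nk} \int_{B(x, 2^{-k})} |f(y)| \, dy + \sum_{j \ge 0} 2^{nk}\, 2^{-(n+1) j} \int_{|x - y| \le 2^{j+1-k}} |f(y)| \, dy \\
&\lesssim Mf(x) + 2^n \sum_{j \ge 0} 2^{-j} \frac{1}{|B(x, 2^{-k+j+1})|} \int_{B(x, 2^{-k+j+1})} |f(y)| \, dy \\
&\lesssim Mf(x) + 2^n \sum_{j \ge 0} 2^{-j} Mf(x) \lesssim Mf(x)
\end{aligned}
\]
as desired.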
Properties LP3-LP4 go a long way to explain why LP theory is such a useful tool
for partial differential equations. The finite band property allows us to replace
derivatives of the dyadic components $f_k$ by multiplication with $2^k$. The $L^p$-$L^\infty$
Bernstein inequality is a dyadic remedy for the failure of the embedding of the
Sobolev space $W^{n/p, p}(\mathbb{R}^n)$ into $L^\infty(\mathbb{R}^n)$. Indeed, in view of the finite band property,
the Bernstein inequality does actually imply the desired Sobolev inequality for each
LP component $f_k$; the failure of the Sobolev inequality for $f$ is due to the summation
$f = \sum_k f_k$.
In what follows we give a few applications of LP-calculus.
3.2. Interpolation inequalities. The following inequality holds true for
arbitrary functions in $C_0^\infty(\mathbb{R}^n)$ and any integers $0 \le i \le m$:
\[ \|\nabla^i f\|_{L^p} \lesssim \|f\|_{L^p}^{1 - i/m} \, \|\nabla^m f\|_{L^p}^{i/m}. \tag{228} \]
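Thus,
\[ \|\nabla^i f\|_{L^p} \le \lambda^i \|f\|_{L^p} + \lambda^{i - m} \|\nabla^m f\|_{L^p} \]
for any $\lambda \in 2^{\mathbb{Z}}$. To finish the proof we would like to choose $\lambda$ such that the two
terms on the right hand side are equal to each other, i.e.,
\[ \lambda_0 = \Big( \frac{\|\nabla^m f\|_{L^p}}{\|f\|_{L^p}} \Big)^{1/m}. \]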
. kf kLp +
. kf k
Lp
k>0
kn(m/n)
k>0
m
kf kLp . kf kLp +
2kn k m f kLp
k>0
+ k f k
Lp
kPk f k2L2
kZ
. kf kL2
kZ
X
kf k2H s
22ks kk f k2L2
(230)
k=0
The Littlewood-Paley decompositions can be used to define new spaces of functions
such as Besov spaces.
Definition:
norm:
s
The Besov space Bp,q
(Rn ) is the closure of C0 (Rn ) relative to the
s
kf kBp,q
=(
2ksq kk f kqLp ) q
(231)
k=0
(232)
p,q
kZ
One similarly defines the Triebel space $F^s_{p,q}$ by reversing the $L^p$ norm and $l^q$ norm in
(231). Thus, for example, the $H^s$ norm is equivalent to the Besov norm $B^s_{2,2}$.
Observe that $B^s_{2,1} \subset H^s$. One reason why the smaller space $B^s_{2,1}$ is useful is
the following inequality,
\[ \|f\|_{L^\infty} \lesssim \|f\|_{B^{n/2}_{2,1}}, \tag{233} \]
which follows from the Bernstein inequality LP4. The estimate (233) will play a key role in
the following section. Another reason to use the Besov norms $B^s_{2,1}$ will become
transparent in the next section where we discuss product estimates.
\[
\begin{aligned}
LL_k(f, g) &= P_k \big( f_{[k-5, k+5]} \, g_{[k-5, k+5]} \big), \\
LH_k(f, g) &= P_k \big( f_{\le k-5} \, g_{[k-3, k+3]} \big), \\
HL_k(f, g) &= P_k \big( f_{[k-3, k+3]} \, g_{\le k-5} \big).
\end{aligned}
\]
The term $HH_k(f, g)$ corresponds to high-high interactions. More precisely, each
term in the sum defining $HH_k(f, g)$ has frequency $2^m$ for some $2^m \gg 2^k$. We
shall write schematically,
\[ HH_k(f, g) = P_k \sum_{m > k} f_m \, g_m. \tag{236} \]
The term $LL_k(f, g)$ consists of a finite number of terms which can typically be
ignored. Indeed they can be treated, in any estimates, like either a finite number
of HH terms or a finite number of LH and HL terms. We write, schematically,
\[ LL_k(f, g) = 0. \tag{237} \]
Finally the $LH_k$ and $HL_k$ terms consist of low-high, respectively high-low, interactions. We shall write schematically,
\[ LH_k(f, g) = P_k \big( f_{<k} \, g_k \big), \tag{238} \]
\[ HL_k(f, g) = P_k \big( f_k \, g_{<k} \big). \tag{239} \]
Remark. In the correct expression of $LH_k$ given by (235) the terms of the form
$f_{\le k-5} \, g_{k''}$, $k'' \in [k-3, k+3]$, have Fourier supports in the dyadic region $\sim 2^k$.
Thus $P_k$ can be safely ignored and we can write,
\[ LH_k(f, g) \approx f_{<k} \, g_k. \]
We have thus established the famous trichotomy formula,
\[ P_k(f g) = LH_k(f, g) + HL_k(f, g) + HH_k(f, g). \tag{240} \]
(241)
\[ \|f g\|_{H^s} \lesssim \|f\|_{H^s} \|g\|_{H^s} \tag{242} \]
In what follows we give a somewhat simple proof of Theorem 3.7 which is very
instructive. The proof$^6$ shows that it is sometimes better not to rely on the full
decomposition (235) but rather to use decompositions sparingly whenever needed.
Indeed, we write,
\[ \|f g\|_{H^s}^2 \lesssim \sum_k 2^{2ks} \|P_k(f g)\|_{L^2}^2 \lesssim \sum_k 2^{2ks} \|P_k(f_{<k} \, g)\|_{L^2}^2 + \sum_k 2^{2ks} \|P_k(f_{\ge k} \, g)\|_{L^2}^2. \]
Now,
X
kgk2L
kgk2L
XX
kgk2L
X X
kgk2L kf k2H s
k
0
k k0 k
k0
0
0
22(kk )s k2k s fk0 k2L2
kk0
P
To estimate k 22ks kPk (fk g)k2L2 we shall decompose further, proceeding as in the
P
decomposition (235). But first observe that the term k 22ks kPk (f[k3,k] g)k2L2 can
P
be treated precisely as k 22ks kPk (f>k g)k2L2 . Indeed we might as well estimated
P 2ks
kPk (f>k3 g)k2L2 instead. Now,
k2
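\[ P_k(f_{\le k-3} \, g) = \sum_{k'} P_k(f_{\le k-3} \, g_{k'}) = \sum_{k' < k-2} P_k(f_{\le k-3} \, g_{k'}) + \sum_{k-2 \le k' \le k+2} P_k(f_{\le k-3} \, g_{k'}) + \sum_{k' > k+2} P_k(f_{\le k-3} \, g_{k'}). \]
Observe that the first and last term are zero, therefore,
\[ P_k(f_{\le k-3} \, g) = \sum_{k-2 \le k' \le k+2} P_k(f_{\le k-3} \, g_{k'}) \approx P_k(f_{\le k-3} \, g_k). \tag{243} \]
Of course this formula is not quite right, but is morally right. Now,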
X
X
22ks kPk (f<k g)k2L2 =
22ks kf<k gk k2L2
k
. kf k2L
as desired.
Remark.
(244)
Exercise.
(245)
4. Wente's Inequality
In this section we prove Wente's inequality as an application of Littlewood-Paley
theory. In what follows, given two functions $f, g$ in $\mathbb{R}^2$ we consider the bilinear
expression $*(df \wedge dg) = \partial_x f \, \partial_y g - \partial_y f \, \partial_x g$, where $*$ denotes the trivial Hodge duality
in $\mathbb{R}^2$. By abuse of language we drop the dual sign below and write simply $df \wedge dg$.
Theorem 4.1. On $\mathbb{R}^2$, assume $f, g \in H^1(\mathbb{R}^2)$, $\Delta u = *(df \wedge dg)$. Then $u \in L^\infty$ and is
in fact continuous.
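\[
\begin{aligned}
LH_k &= dP_{<k} f \wedge dP_k g, \\
HL_k &= dP_k f \wedge dP_{<k} g, \\
HH_k &= P_k \Big( \sum_{m \ge k} dP_m f \wedge dP_m g \Big).
\end{aligned}
\]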
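By symmetry we only need to deal with LH and HH. The LH term is trivial to
estimate, without using the special structure of the wedge product. Using the
Bernstein inequality (in dimension $n = 2$) we write,
\[ 2^{-k} \|LH_k\|_{L^2} \lesssim 2^{-k} \sum_{l < k} \|dP_l f\|_{L^\infty} \|dP_k g\|_{L^2} \lesssim \sum_{l < k} 2^{l - k} \|dP_l f\|_{L^2} \|dP_k g\|_{L^2}. \]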
The proof now follows with the following discrete version of the Young inequality.
Lemma 4.2. Let $f(k) \in l^1(\mathbb{Z})$ and $g(k), h(k) \in l^2(\mathbb{Z})$. Then,
\[ \sum_{k, l} f(k - l) \, g(l) \, h(k) \le \|f\|_{l^1} \|g\|_{l^2} \|h\|_{l^2}. \]
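As a sanity check of Lemma 4.2, the following minimal sketch evaluates both sides of the discrete Young inequality for finitely supported sequences. It is an illustration only: the sequences are stored as dictionaries, and the helper names `trilinear_sum`, `l1_norm` and `l2_norm` are hypothetical, not notation from the text. The weights $f(j) = 2^{-|j|}$ mimic the factor $2^{l-k}$ that appears when the lemma is applied to sums over dyadic frequencies.

```python
import math
import random

def trilinear_sum(f, g, h):
    """Sum over (k, l) of f(k - l) * g(l) * h(k) for finitely supported sequences.
    Each sequence is a dict mapping an integer index to a real value."""
    return sum(f.get(k - l, 0.0) * g_l * h_k
               for l, g_l in g.items()
               for k, h_k in h.items())

def l1_norm(f):
    return sum(abs(v) for v in f.values())

def l2_norm(f):
    return math.sqrt(sum(v * v for v in f.values()))

random.seed(1)
f = {j: 2.0 ** (-abs(j)) for j in range(-20, 21)}       # summable decaying weights
g = {l: random.uniform(-1, 1) for l in range(-15, 16)}
h = {k: random.uniform(-1, 1) for k in range(-15, 16)}

lhs = trilinear_sum(f, g, h)
rhs = l1_norm(f) * l2_norm(g) * l2_norm(h)
print(lhs, rhs)   # |lhs| never exceeds rhs
```

Taking $f$ to be the unit mass at $0$ and $h = g$ gives equality, so the constant $1$ in the lemma cannot be improved.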
kDPl f k2L2
1/2 X
(
kDPk f k2L2 )1/2
mk
Therefore,
2k kHHk kL2
Thus, again, using the discrete Young inequality of the lemma above,
\[ \sum_k 2^{-k} \|LH_k\|_{L^2} \lesssim \|Df\|_{L^2} \|Dg\|_{L^2} \]
as desired.
\[ \|f\|_{L^p_x L^q_t} = \Big( \int_{\mathbb{R}^2} \|f(\cdot, x)\|_{L^q_t(I)}^p \, dx \Big)^{1/p} \]
(246)
We observe that
1
0
kgkB2,1
. kgkB2,1
+ kgkL2
Thus, (246) follows from the sharp bilinear trace theorem below.
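Theorem 5.1. For any smooth, scalar functions $g, h$ on $I \times \mathbb{R}^2$, we have
\[ \Big\| \int_I \partial_t g \cdot h \, dt \Big\|_{B^0_{2,1}} \lesssim \|g\|_{H^1(I \times \mathbb{R}^2)} \|h\|_{H^1(I \times \mathbb{R}^2)}. \]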
X
k0
Z
kPk
0
Z
t g hdtkL2x + kP<0
k0
t g hdtkL2x
t g hdtkL2x
(247)
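We will then decompose $g$ and $h$ with respect to $x$; $g = \sum_k P_k g = \sum_k g_k$, $h = \sum_k P_k h = \sum_k h_k$. Then we can decompose $P_k \int_0^1 (\partial_t g \cdot h) = A_k + B_k + C_k + D_k$,
where
\[
\begin{aligned}
A_k &= P_k \int_0^1 (\partial_t g)_{<k} \cdot h_{\ge k}, \\
B_k &= P_k \int_0^1 (\partial_t g)_{\ge k} \cdot h_{<k}, \\
C_k &= P_k \int_0^1 (\partial_t g)_{<k} \cdot h_{<k}, \\
D_k &= P_k \int_0^1 (\partial_t g)_{\ge k} \cdot h_{\ge k}.
\end{aligned}
\]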
We can then use Bernstein inequality LP4 and property LP3 on h to pull out the
0
00
0
00
0
00
power 2k k . Writing 2k k . 2(k k)/2+(kk )/2 , using LP1, and summing over
k, we can then get:
X
2 . kt gkL L2 khkL L2
kAk kL
x
x
t Lx
t
t
k0
R1
(t g)k hk , write
Z 1
X
1
2
Dk = Dk + Dk =
Pk
(t g)k0 hk00 +
To estimate Dk = Pk
kk0 k00
X
kk0 k00
(t g)k00 hk0
Pk
0
Dk1 can be estimated straightforwardly, without integration by parts. Use LP4 and
LP3 to write
0
kDk1 kL2x . 2kk kt gkL2t L2x khkL2t L2x
Then sum over k and use LP1 to get:
X
kDk1 kL2x . kt gkL2t L2x khkL2t L2x
k0
To estimate Dk2 we use integration by parts to transfer the t from the highfrequency gk00 to the low-frequency hk0 . After integrating by parts we treat the
result exactly as Dk1 . Thus, we need only estimate the boundary terms: kIk (1)
2 , where
Ik (0)kL2x . kIk kL
t Lx
X
Ik =
Pk (gk00 hk0 )
kk0 <k00
00
k|)
kgk0 kkhk00 k
Using this lemma, we integrate by parts and bound Dk2 just as Dk1 plus the boundary
term, and eventually get:
X
kDk2 kL2x . kgkH 1 khkH 1
k
R1
P
Now we estimate Bk by similarly decomposing to Bk = k0 <kk00 Pk 0 (t g)k00 hk0 .
As above,
P we integrate by parts and use the lemma to estimate the boundary terms
Jk = k0 <kk00 Pk (gk00 ) hk0 ). It is then not hard to manipulate and sum over k to
get
X
kBk kL2x . kgkH 1 khkH 1
k
Combining all the estimates for Ak , Bk , and Dk completes the proof of the theorem.
It only remains to prove the above Lemma which helped us estimate the boundary
terms. Without going into all the details, this is done by considering the three
cases:
k 0 k 00 k, k 0 k > k 00 , k > k 0 k 00
We note that the third (low-low) case is impossible. The other two cases are
bounded using LP3 and the the following (simple) calculus inequality:
1
2
2
2 . kt f k 2 2 kf k 2 2 + kf kL2 L2
kf kL
L L
L L
t Lx
t x
t
(248)
Exercise.
6. Calderon-Zygmund theory
The following $L^2$ identity
\[ \sum_{i,j=1}^n \|\partial_i \partial_j u\|_{L^2}^2 = \|\Delta u\|_{L^2}^2 \]
for any $u \in C_0^\infty(\mathbb{R}^n)$ can be easily established by integration by parts, see below in
(252). Thus,
\[ \|\nabla^2 u\|_{L^2} \lesssim \|\Delta u\|_{L^2}. \tag{249} \]
It is natural to ask whether such an estimate still holds true for other $L^p$ norms. It
turns out that the problem can be reduced to the study of the $L^p$ boundedness
properties of a very important class of linear operators called Calderon-Zygmund operators.
Definition 6.1. A linear operator T acting on L2 (Rn ) is called a Calderon-Zygmund
operator if:
(1) T is bounded from L2 to L2 .
(2) There exists a measurable kernel $k$ such that for every $f \in L^2$ with compact support and for $x \notin \operatorname{supp} f$, we have
\[ Tf(x) = \int_{\mathbb{R}^n} k(x - y) f(y) \, dy, \]
\[ |\nabla k(x)| \lesssim |x|^{-n-1}. \tag{251} \]
Example 1. Hilbert transform $Hf(x) = \int e^{i x \xi} \, \mathrm{sign}(\xi) \, \hat f(\xi) \, d\xi$. By Plancherel it is
easy to check that $H$ is a bounded linear operator on $L^2$. On the other hand we
know that the inverse Fourier transform of $\mathrm{sign}(\xi)$ is proportional to the principal
value distribution $\mathrm{pv}(1/x)$. Hence, if $x \notin \operatorname{supp} f$,
\[ Hf(x) = c \int_{-\infty}^{+\infty} \frac{1}{x - y} f(y) \, dy. \]
1
x
Example 2. Consider the equation $\Delta u = f$ in $\mathbb{R}^n$, $n \ge 3$, for $f$ smooth and compactly supported. Recall, see (??), that any solution $u$, vanishing at $\infty$ $^7$, can be
represented in the form $u = K_n * f$ where $K_n(x) = c_n |x|^{2-n}$. Thus, if $x \notin \operatorname{supp} f$,
it makes sense to differentiate under the integral sign and derive,
\[ \partial_i \partial_j u = \partial_i \partial_j K_n * f = \int_{\mathbb{R}^n} \partial_i \partial_j K_n(x - y) f(y) \, dy. \]
7 In the case of $n = 2$, any solution whose first derivatives vanish at $\infty$.
Rn
n
X
i,j=1
Rn
i,j=1
|Rij f (x)|2 dx
(252)
Rn
Rn
Rn
Rn
where
T g(y) =
Z
k(y + x)g(x)dx,
y 6 suppg.
Rn
sup
kgkLp0 1
Rn
kgkLp0 1
Rn
We shall prove the main theorem 6.3 in the next two subsections.
(253)
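Proof : For each $x \in \Omega$ denote by $Q_x$ the largest dyadic cube containing $x$ with
the property: $\mathrm{dist}(Q_x, \Omega^c) > \mathrm{size}(Q_x)$. If $\tilde Q_x$ denotes the parent of $Q_x$ then
$\mathrm{dist}(\tilde Q_x, \Omega^c) \le \mathrm{size}(\tilde Q_x)$. By the triangle inequality it follows that
\[ \mathrm{dist}(Q_x, \Omega^c) \le \sqrt{n} \, \mathrm{size}(Q_x) + \mathrm{dist}(\tilde Q_x, \Omega^c) \le (\sqrt{n} + 2) \, \mathrm{size}(Q_x). \]
Hence, $Q_x$ verifies (253). If $y \in Q_x$ then, by the maximality property of $Q_x$ and
$Q_y$, we necessarily have $Q_y = Q_x$. Hence, the family $\mathcal{Q} = \{Q_x\}_{x \in \Omega}$ is formed of
disjoint cubes and covers $\Omega$.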
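The construction in this proof can be carried out explicitly in one dimension. The sketch below is an illustration under stated assumptions, not part of the text: $\Omega = (a, b)$ is an open interval inside $[0, 1]$, dyadic intervals are produced top-down from $[0, 1)$, and the recursion is cut off at a hypothetical depth `max_level`, so a thin layer near the endpoints stays uncovered. The function name `whitney_intervals` is likewise hypothetical. Exact rational arithmetic is used so that the inequalities can be tested without rounding.

```python
from fractions import Fraction

def whitney_intervals(a, b, max_level=12):
    """Maximal dyadic intervals Q = [lo, hi) inside Omega = (a, b), a subset of [0, 1],
    with dist(Q, complement of Omega) > size(Q), down to size 2**-max_level."""
    a, b = Fraction(a), Fraction(b)
    out = []

    def visit(lo, size, level):
        hi = lo + size
        if hi <= a or lo >= b:          # Q does not meet Omega
            return
        dist = min(lo - a, b - hi)      # negative when Q sticks out of Omega
        if dist > size:
            out.append((lo, hi))        # first accepted ancestor, hence maximal
        elif level < max_level:
            half = size / 2
            visit(lo, half, level + 1)
            visit(lo + half, half, level + 1)

    visit(Fraction(0), Fraction(1), 0)
    return out

a, b = Fraction(1, 10), Fraction(9, 10)
cubes = whitney_intervals(a, b)
covered = sum(hi - lo for lo, hi in cubes)
print(len(cubes), float(covered))
```

Each interval returned satisfies $\mathrm{size}(Q) < \mathrm{dist}(Q, \Omega^c) \le 3\,\mathrm{size}(Q)$, which is the bound of the proof with $\sqrt{n} + 2 = 3$ for $n = 1$.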
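Proposition 6.8 (Calderon-Zygmund decomposition). Let $f \in L^1(\mathbb{R}^n)$ and $\lambda > 0$.
Then it is possible to find a countable family of disjoint dyadic cubes $\mathcal{Q} = \{Q\}$ and
a decomposition $f = g + \sum_{Q \in \mathcal{Q}} b_Q$, such that:
\[ \|g\|_{L^\infty} \lesssim \lambda, \tag{254a} \]
\[ \operatorname{supp} b_Q \subset Q, \tag{254b} \]
\[ \int b_Q(x) \, dx = 0, \tag{254c} \]
\[ \|b_Q\|_{L^1} \lesssim \lambda |Q|, \tag{254d} \]
\[ \sum_{Q \in \mathcal{Q}} |Q| \lesssim \lambda^{-1} \|f\|_{L^1}. \tag{254e} \]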
Remark. Note that in the above $\lambda$, $g$, $b_Q$ and $f$ all have the same units, so that
these estimates on the sizes and supports of $g$ and $b_Q$ are the only ones possible
that are still dimensionally correct.
Proof : Let $\mathcal{Q}$ be the Whitney decomposition of the open set
\[ \Omega = \{ Mf(x) > \lambda \} \]
as indicated in Lemma (6.7). For each $Q$, define $f_Q = |Q|^{-1} \int_Q f(x) \, dx$. Let
\[ g(x) = \begin{cases} f(x), & \text{if } x \notin \Omega, \\ f_Q, & \text{if } x \in Q, \end{cases} \]
and $b_Q(x) = \chi_Q(x) (f(x) - f_Q)$ with $\chi_Q$ the characteristic function of the cube
$Q$. Of course we have $f = g + \sum_Q b_Q$. The important property, which follows
from (253), is that each cube $Q$ is contained inside a ball $B$ which is not entirely
contained in $\Omega$ and with $|Q| \approx |B|$. Let $x \in B \setminus \Omega$, we have
\[ |f_Q| \le \frac{1}{|Q|} \int_Q |f(y)| \, dy \lesssim \frac{1}{|B|} \int_B |f(y)| \, dy \le Mf(x) \le \lambda. \tag{255} \]
We check now that this decomposition has the desired properties. For almost every
$x$ outside $\Omega$, by Lebesgue's differentiation theorem, Corollary 2.7, we have $|g(x)| \le Mf(x) \le \lambda$. When $x \in \Omega$ it follows from (255) that $|g(x)| \lesssim \lambda$. Hence (254a) is
satisfied. Properties (254b) and (254c) are immediate consequences of the definition
of $b_Q$. Property (254d) is implied by (255). Finally, (254e) is nothing but the weak
$L^1$ property for $Mf$ proved in Theorem 2.4.
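To see properties (254a)-(254e) on a concrete example, here is a minimal one-dimensional sketch. It deliberately swaps in a different, simpler selection rule than the one used in the proof: instead of the Whitney decomposition of $\{Mf > \lambda\}$, it uses the classical dyadic stopping-time argument (select the maximal dyadic intervals on which the average of $|f|$ exceeds $\lambda$). It also assumes that $f$ lives on $[0, 1)$, is constant on $2^N$ equal cells, and has average of $|f|$ at most $\lambda$. The function name `cz_decompose` and the sample data are hypothetical.

```python
from fractions import Fraction

def cz_decompose(values, lam):
    """Dyadic Calderon-Zygmund decomposition of a step function on [0, 1).
    values: 2**N cell values; lam: height. Returns (f, g, b, cubes), where cubes
    are index ranges (lo, hi) of the selected dyadic intervals."""
    f = [Fraction(v) for v in values]
    lam = Fraction(lam)
    n = len(f)
    if sum(abs(v) for v in f) > lam * n:
        raise ValueError("the average of |f| over [0, 1) must be at most lam")
    cubes = []

    def visit(lo, hi):
        avg_abs = sum(abs(v) for v in f[lo:hi]) / (hi - lo)
        if avg_abs > lam:               # stop: maximal interval with large average
            cubes.append((lo, hi))
        elif hi - lo > 1:               # otherwise bisect until single cells
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, n)
    g = list(f)
    for lo, hi in cubes:
        mean = sum(f[lo:hi]) / (hi - lo)
        g[lo:hi] = [mean] * (hi - lo)   # g is the average f_Q on each selected Q
    b = [fi - gi for fi, gi in zip(f, g)]
    return f, g, b, cubes

lam = 1
values = [0] * 16
values[5], values[6], values[12] = 8, -4, 1
f, g, b, cubes = cz_decompose(values, lam)
print(cubes, max(abs(v) for v in g))
```

With this selection rule the parent of each selected interval has average at most $\lambda$, so $|g| \le 2\lambda$ on the selected intervals (the one-dimensional constant $2^n = 2$), while $|g| = |f| \le \lambda$ on the remaining cells.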
6.9. Proof of Theorem 6.3. Consider $f \in L^1$ and $\lambda > 0$. Let $f = g + \sum_Q b_Q = g + b$ be the Calderon-Zygmund decomposition of $f$ according to Proposition 6.8. Since
\[ \{ |Tf(x)| > \lambda \} \subset \{ |Tg(x)| > \lambda/2 \} \cup \{ |Tb(x)| > \lambda/2 \} \]
and in view of (254e) it is enough to prove separately that
\[ |\{ |Tg(x)| > \lambda/2 \}| \lesssim \lambda^{-1} \|f\|_{L^1}, \tag{256} \]
\[ |\{ |Tb(x)| > \lambda/2 \}| \lesssim \lambda^{-1} \|f\|_{L^1}. \tag{257} \]
X
X
1
1
1
kf kL1 +
kbQ kL1 . kf kL1 +
|Q| . kf kL1 .
It remains to derive (257). Since the family $\mathcal{Q}$ is countable we denote its members
by $Q_j$, $j \in \mathbb{N}$. For each $Q_j$ let $y^{(j)}$ be its center and take $\tilde Q_j$ to be the cube with
the same center but with the sides expanded by $2 n^{1/2}$, such that for all $x$ in the
complement of $\tilde Q_j$,
\[ |x - y^{(j)}| \ge 2 \max_{y \in Q_j} |y - y^{(j)}|. \]
bj dy = 0 we
Qj
XZ
j
xRn \Q
j y(j) }
xRn \{Q
Z
|b(y)|
yQj
XZ
j
|b(y)|
XZ
yQj
|bj (y)|
XZ
j
j
xRn \Q
yQj
|b(y)| . kf kL1
yQj
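Therefore,
\[ |\{ x \in F : |Tb(x)| > \lambda/2 \}| \lesssim \lambda^{-1} \|f\|_{L^1}. \]
On the other hand, the measure of the complement of $F$, i.e. $\tilde\Omega = \cup_j \tilde Q_j$, is also
controlled by,
\[ |\tilde\Omega| \le \sum_j |\tilde Q_j| \lesssim \sum_j |Q_j| \lesssim \lambda^{-1} \|f\|_{L^1}. \]
Hence,
\[ |\{ x \in \mathbb{R}^n : |Tb(x)| > \lambda/2 \}| \lesssim \lambda^{-1} \|f\|_{L^1} \]
as desired.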
6.10. Michlin-Hörmander theorem. An important class of CZ operators
can be defined by means of Fourier multiplier operators. Recall that these are
defined by Fourier transform,
\[ \widehat{Tf}(\xi) = m(\xi) \hat f(\xi), \tag{258} \]
where $m$ is a bounded function, called the multiplier. We can view these operators
as convolution operators, $Tf = k * f$, where $\hat k = m$. It is natural to ask when a
Fourier multiplier operator gives rise to a CZ operator. Since we know that a CZO
will grant extra decay to a localized function of mean zero, we would expect that
the multiplier $m$ should be fairly smooth away from the origin. This is precisely the content
of the following theorem.
(259) we obtain
m ()2 d . n2j .
||=j
(260)
|x|R
Z
|k (x)| dx .
|x|2l |k (x)| dx
1/2
|x|R
Z
|x|>R
dx
|x|2l
!1/2
. (R)n/2l .
(261)
(263)
n/2l
|k (x)| dx . (|y|)
|x||y|
(264)
We sum over using (262) when |y| 1 and (264) when |y| > 1, and obtain9
Z
X
X
|k(x y) k(x)| dx . |y|
+ |y|n/2l
n/2l . 1.
|x||y|
|y|1
>|y|1
as desired.
Exercise. Let $\phi \in C_0^\infty(\mathbb{C})$ and let $f$ be the solution to the inhomogeneous
Cauchy-Riemann equations $f_{\bar z} = \phi$ which decays at infinity. Show that for $1 < p < \infty$
we have the estimate
\[ \|\nabla f\|_{L^p} \lesssim \|\phi\|_{L^p}. \]
6.12. Square function estimates. We recall property LP6 for the square
function, $Sf = \big( \sum_k |P_k f|^2 \big)^{1/2}$.
Theorem 6.13 (Littlewood-Paley). We have,
\[ \|f\|_{L^p} \lesssim \|Sf\|_{L^p} \lesssim \|f\|_{L^p}. \tag{265} \]
Z
.
!1/2
X
|Pk f (x)|
!1/2
X
|Pk0 g(x)|
dx
k0
The left inequality in (265) now follows by taking the sup over all $g$ with $\|g\|_{L^{p'}} = 1$.
9 Here we used the following summation properties, in dyadic notation, for geometric series,
' L and
To prove the right inequality in (265) we need to introduce the Rademacher functions $r_k(t)$ defined on $\mathbb{R}$ as follows: for every $k \ge 0$, $k \in \mathbb{Z}$ and $t \in \mathbb{R}$ set
$r_k(t) = r_0(2^k t)$, where $r_0(t)$ is the periodic function, $r_0(t + 1) = r_0(t)$, such that
$r_0(t) = 1$ for $0 \le t < 1/2$ and $r_0(t) = -1$ for $1/2 \le t < 1$. These Rademacher
functions form an orthonormal sequence in $L^2[0, 1]$ and they form a sequence of
independent identically distributed random variables. The basic property that we
need is that the $L^p$ norm of a linear combination of Rademacher functions is equivalent to the $l^2$ norm of its coefficients.
Lemma 6.14. Given a sequence of real numbers $\{a_k\}$ satisfying $\sum_{k=0}^\infty a_k^2 < \infty$,
define
\[ F(t) = \sum_{k=0}^\infty a_k r_k(t). \]
Then $F \in L^2([0, 1])$ with $\|F\|_{L^2} = \big( \sum_{k=0}^\infty a_k^2 \big)^{1/2}$. In addition, $F \in L^p([0, 1])$ for
$1 < p < \infty$, and there exist constants $A_p$ so that
\[ A_p^{-1} \|F\|_{L^p} \le \|F\|_{L^2} \le A_p \|F\|_{L^p}. \]
For a proof of this lemma see Stein, [?, Appendix D].
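The two claims of Lemma 6.14 that can be verified by direct computation, orthonormality and $\|F\|_{L^2} = (\sum a_k^2)^{1/2}$, are checked in the following minimal sketch for a finite sum. It is an illustration only: the helper names and the coefficient list `a` are hypothetical. Since $r_k$ is constant on dyadic intervals of length $2^{-(k+1)}$, a midpoint sum over cells of length $2^{-\text{levels}}$ with `levels` larger than the largest index integrates these step functions exactly. For $p = 4$ the sketch also prints $\|F\|_{L^4}$, which lies between $\|F\|_{L^2}$ and $3^{1/4} \|F\|_{L^2}$ because $\int_0^1 F^4 = 3 (\sum a_k^2)^2 - 2 \sum a_k^4$.

```python
import math

def rademacher(k, t):
    """r_k(t) = r_0(2^k t), with r_0 1-periodic, +1 on [0, 1/2) and -1 on [1/2, 1)."""
    frac = (2 ** k * t) % 1.0
    return 1.0 if frac < 0.5 else -1.0

def inner_product(j, k, levels):
    """Integral over [0, 1] of r_j * r_k, exact for levels > max(j, k)."""
    n = 2 ** levels
    return sum(rademacher(j, (i + 0.5) / n) * rademacher(k, (i + 0.5) / n)
               for i in range(n)) / n

def lp_norm_of_sum(a, p, levels):
    """L^p([0, 1]) norm of F = sum_k a[k] r_k, exact for levels > len(a) - 1."""
    n = 2 ** levels
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += abs(sum(a_k * rademacher(k, t) for k, a_k in enumerate(a))) ** p
    return (total / n) ** (1.0 / p)

a = [1.0, 0.5, -2.0, 0.25, 3.0, -1.0]
levels = len(a) + 1
l2 = lp_norm_of_sum(a, 2, levels)
l4 = lp_norm_of_sum(a, 4, levels)
print(l2, math.sqrt(sum(x * x for x in a)))   # the two values agree
print(l4 / l2)                                # between 1 and 3 ** 0.25
```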
Define the operator $T_t$ so that
\[ T_t f = \sum_{k=0}^\infty r_k(t) P_k f. \]
Clearly $T_t$ is the Fourier multiplier operator with symbol $m_t(\xi) = \sum_k r_k(t) \chi(2^{-k} \xi)$,
where $\chi$ is the smooth cut-off function used to define the LP projections. For $\xi \ne 0$,
at most three of the terms in the sum defining $m_t(\xi)$ can be non-zero. We can then
easily verify that $m_t$ verifies the condition of Thm. 6.11. That is, that
\[ |\partial^\alpha m_t(\xi)| \le C_\alpha |\xi|^{-|\alpha|}, \]
with constants $C_\alpha$ independent of $t$. Thus, by Calderon-Zygmund theory (specifically Corollary 6.4), we have:
\[ \|T_t f\|_{L^p} \lesssim \|f\|_{L^p}. \]
And so,
\[ \Big( \int_0^1 \|T_t f\|_{L^p}^p \, dt \Big)^{1/p} \lesssim \|f\|_{L^p}. \]
Z
&
R
!p/2
X
|(Pk f )(x)|
dx
(Note that this argument proves the theorem only in the one-dimensional case,
n = 1. It can, however, be extended to Rn as in Stein, Singular Integrals, Ch. IV,
Section 5.)
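Clearly, if $f \in \mathcal{S}(\mathbb{R}^n)$, for every $x \in \mathbb{R}^n$, $\mathbf{S}f(x) \in l^2$ and $Sf(x) = |\mathbf{S}f(x)|$ denotes
the $l^2$ norm of $\mathbf{S}f(x)$. We claim that
\[ \mathbf{S}f(x) = \int \mathbf{K}(x - y) f(y) \, dy \]
is an $l^2$-valued Calderon-Zygmund operator with the $l^2$-valued kernel defined by,
\[ \mathbf{K}(x) = \big( K_k(x) \big)_{k \in \mathbb{Z}}, \qquad K_k(x) = 2^{nk} m(2^k x). \]
Denote $|\mathbf{K}(x)| = \big( \sum_k |K_k(x)|^2 \big)^{1/2}$, $|\nabla \mathbf{K}(x)| = \big( \sum_k |\nabla K_k(x)|^2 \big)^{1/2}$. We easily
check that the $l^2$-valued version of the condition (251) is verified,
\[ |\mathbf{K}(x)| \lesssim |x|^{-n}, \qquad |\nabla \mathbf{K}(x)| \lesssim |x|^{-(n+1)}, \qquad \text{for } x \ne 0. \tag{266} \]
Thus $\mathbf{S}$ is indeed an $l^2$-valued C-Z operator and therefore, in view of a straightforward extension of Theorem 6.3 and its corollary, we infer that,
\[ \|\mathbf{S}f\|_{L^p} := \| |\mathbf{S}f| \|_{L^p} = \|Sf\|_{L^p} \lesssim \|f\|_{L^p}. \]
In view of the beginning of the first proof of our theorem we infer that also,
\[ \|f\|_{L^p} \lesssim \|Sf\|_{L^p}. \]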
Remark that, according to theorem 6.13, $| \sum_k P_k f | \approx \big( \sum_k |P_k f|^2 \big)^{1/2}$. A more
general principle asserts that if a sequence of functions $f_1, f_2, \ldots, f_k, \ldots$ oscillate at
different rates, that is any two phases are different, then $| \sum_k f_k | \approx \big( \sum_k |f_k|^2 \big)^{1/2}$.
k |fk |
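The following version of the property LP6, and theorem 6.13, also holds true for
LP projections $\tilde P_k$ similar to $P_k$. More precisely,
\[ \Big\| \Big( \sum_k |\tilde P_k f|^2 \Big)^{1/2} \Big\|_{L^p} \lesssim \|f\|_{L^p}, \qquad 1 < p < \infty. \tag{267} \]
This can be proved in the same manner as the inequality $\|Sf\|_{L^p} \lesssim \|f\|_{L^p}$ by
introducing the $l^2$-valued operator $\tilde{\mathbf{S}} f = (\tilde P_k f)_{k \in \mathbb{Z}}$, and proceeding exactly as in
the second proof of theorem 6.13. Given an $l^2$-valued vector function $\mathbf{g} = (g_k)_{k \in \mathbb{Z}}$
observe that
\[ \langle \tilde{\mathbf{S}} f, \mathbf{g} \rangle = \int_{\mathbb{R}^n} \tilde{\mathbf{S}} f(x) \cdot \mathbf{g}(x) \, dx = \sum_k \int_{\mathbb{R}^n} \tilde P_k f(x) \, g_k(x) \, dx = \int_{\mathbb{R}^n} f(x) \sum_k \tilde P_k g_k(x) \, dx. \]
Thus,
\[ \tilde{\mathbf{S}}^* \mathbf{g} = \sum_k \tilde P_k g_k \tag{268} \]
and therefore the estimate dual to (267) has the form $\|\tilde{\mathbf{S}}^* \mathbf{g}\|_{L^{p'}} \lesssim \|\mathbf{g}\|_{L^{p'}}$, for
$1/p + 1/p' = 1$. In other words,
\[ \Big\| \sum_k \tilde P_k g_k \Big\|_{L^p} \lesssim \Big\| \Big( \sum_k |g_k|^2 \Big)^{1/2} \Big\|_{L^p}, \qquad 1 < p < \infty. \tag{269} \]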
(270)
kZ
(271)
kZ
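Proof : Recall that $Sf(x)^2 = \sum_{k \in \mathbb{Z}} |P_k f|^2$. If $p/2 \ge 1$, in view of LP6 and
Minkowski inequality, we have
\[ \|f\|_{L^p}^2 \lesssim \|Sf\|_{L^p}^2 = \Big\| \sum_k |P_k f|^2 \Big\|_{L^{p/2}} \le \sum_k \big\| |P_k f|^2 \big\|_{L^{p/2}} = \sum_k \|P_k f\|_{L^p}^2. \]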
The reverse Minkowski inequality we have used here states that for $0 < q \le 1$ and
a sequence of positive functions $(f_k)_{k \in \mathbb{Z}}$
\[ \Big\| \sum_k |f_k| \Big\|_{L^q} \ge \sum_k \|f_k\|_{L^q}. \tag{272} \]
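The direction of the inequality in (272) is easy to get backwards, so a quick numerical check may help. The sketch below is an illustration under a simplifying assumption: the measure space is a finite set with counting measure, so that $\|f\|_{L^q} = (\sum_i |f_i|^q)^{1/q}$. The helper name `lq` and the random data are hypothetical. For $0 < q \le 1$ the left side of (272) dominates, while for $q = 2$ the usual Minkowski inequality holds in the opposite direction.

```python
import random

def lq(values, q):
    """(sum_i |v_i|^q)^(1/q): the L^q quantity for counting measure on a finite set."""
    return sum(abs(v) ** q for v in values) ** (1.0 / q)

random.seed(2)
f = [random.uniform(0.0, 5.0) for _ in range(50)]
g = [random.uniform(0.0, 5.0) for _ in range(50)]
total = [x + y for x, y in zip(f, g)]

for q in (0.25, 0.5, 0.75, 1.0):
    # reverse Minkowski: the first number is at least the second
    print(q, lq(total, q), lq(f, q) + lq(g, q))

# usual Minkowski for q >= 1: the first number is at most the second
print(2.0, lq(total, 2.0), lq(f, 2.0) + lq(g, 2.0))
```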
We briefly sketch a proof of (272); it can be found in many books (e.g. Garling, Inequalities or DiBenedetto, Real Analysis, from which we take this particular
proof).
One way is to first prove a reverse Hölder inequality: for $0 < p < 1$, $q < 0$,
$\frac{1}{p} + \frac{1}{q} = 1$, $f \in L^p$, $g \in L^q$, we have $\int |f g| \ge \|f\|_{L^p} \|g\|_{L^q}$. This can be easily shown
by writing $\|f\|_{L^p} = \big( \int \frac{|f g|^p}{|g|^p} \big)^{1/p}$ and applying the usual Hölder inequality with the
exponents $\tilde p = 1/p > 1$ and $\tilde q = 1/(1 - p) > 1$.
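With this in hand, the reverse Minkowski inequality in two terms ($\| |f| + |g| \|_{L^q} \ge \|f\|_{L^q} + \|g\|_{L^q}$ for $0 < q \le 1$) follows (writing $\frac{1}{q'} = 1 - \frac{1}{q}$):
\[
\begin{aligned}
\| |f| + |g| \|_{L^q}^q &= \int (|f| + |g|)^{q-1} (|f| + |g|) \\
&\ge \Big( \int (|f| + |g|)^{(q-1) q'} \Big)^{1/q'} \big( \|f\|_{L^q} + \|g\|_{L^q} \big) \\
&= \| |f| + |g| \|_{L^q}^{q-1} \big( \|f\|_{L^q} + \|g\|_{L^q} \big).
\end{aligned}
\]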
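6.16. $W^{s,p}$ - Sobolev spaces. We recall that we have defined the $W^{s,p}$ norm
of a function by,
\[ \|f\|_{W^{s,p}} = \sum_{j=0}^s \|\nabla^j f\|_{L^p}. \]
Proof : We first write,
\[ \|\nabla^j f\|_{L^p} \lesssim \Big\| \sum_k \nabla^j P_k f \Big\|_{L^p}. \]
As in the proof of the property LP3, we can express $\nabla^j P_k f = 2^{jk} \tilde P_k P_k f$ for some
$\tilde P_k$ similar to $P_k$. Hence, using the estimate (269),
\[ \|\nabla^j f\|_{L^p} \lesssim \Big\| \sum_k 2^{jk} \tilde P_k P_k f \Big\|_{L^p} \lesssim \Big\| \Big( \sum_k |2^{jk} P_k f|^2 \Big)^{1/2} \Big\|_{L^p}. \]
On the other hand, we can also write $2^{jk} P_k f = \tilde P_k \nabla^j f$ for some other similar LP
projection. Then, in view of (267),
\[ \Big\| \Big( \sum_k |2^{jk} P_k f|^2 \Big)^{1/2} \Big\|_{L^p} \lesssim \Big\| \Big( \sum_k |\tilde P_k \nabla^j f|^2 \Big)^{1/2} \Big\|_{L^p} \lesssim \|\nabla^j f\|_{L^p}. \]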
Using the lemma we can now find an equivalent definition using LP projections:
Proposition 6.18. For any 1 < p < and any s N we have,
X
kf kW s,p k
(1 + 2k )s Pk f kLp .
(273)
(274)
Observe that the expressions on the right hand side of (273) and (274) make sense
for every value $s \in \mathbb{R}$. We can thus extend the definitions of $W^{s,p}$ and $\dot W^{s,p}$ spaces
to all real values $s$.
Additional characterizations of the homogeneous Sobolev norms $\|\cdot\|_{\dot W^{s,p}}$ can be
given using the following,
Proposition 6.19. For $2 \le p < \infty$ and any $s$ we have,
\[ \Big( \sum_k 2^{kps} \|P_k f\|_{L^p}^p \Big)^{1/p} \lesssim \|f\|_{\dot W^{s,p}} \lesssim \Big( \sum_k 2^{2ks} \|P_k f\|_{L^p}^2 \Big)^{1/2}. \tag{275} \]
Proof :
!1/p
X
kps
kPk f kpLp
(276)
7. Problems
Problem 1.[Distributions in R]
Let f (z) be a an analytic function in the domain D+ = {z C : 0 < =(z) < }
such that |f (z)| . |=(z)|N for all z D. Show that there exists a distribution
f+ = f ( + i0) such that for every C0 (Rn ),
Z
lim
f (x + iy)(x)dx = < f+ , >,
y0,y>0
1
z
1
x+iy ,
(x + i0)1 (x i0)1
= 2i0 (x).
where
1
x
\[ \partial_t u = \Delta u, \qquad u(0, x) = f(x) \tag{277} \]
\[ \partial_t u = i \Delta u, \qquad u(0, x) = f(x) \tag{278} \]
\[ \partial_t^2 u = \Delta u, \qquad u(0, x) = f(x), \quad \partial_t u(0, x) = g(x) \tag{279} \]
\[ \partial_t^2 u = -\Delta u, \qquad u(0, x) = f(x), \quad \partial_t u(0, x) = g(x) \tag{280} \]
In each of these cases write down solutions using the Fourier transform method. In
other words take the Fourier transform of each equation, set
\[ \hat u(t, \xi) = \int e^{-i x \cdot \xi} u(t, x) \, dx \]
and solve the resulting differential equation in $t$. Compare the results for the last
two equations. Show that (279) has solutions for all $f, g \in \mathcal{S}(\mathbb{R}^n)$ while (280)
does not. Show however that if we only prescribe $u(0, x) = f$ (this is the Dirichlet
problem for the Laplacian $\partial_t^2 + \Delta$ in $\mathbb{R}^{n+1}$), then the problem has a unique solution
$u$, which decays to zero as $|t| + |x| \to \infty$, for all functions $f \in \mathcal{S}(\mathbb{R}^n)$. In all cases
express$^{10}$ the resulting solutions as integral operators applied to the initial data (in
physical space).
Problem 4. [Extension operator] Let $H$ be the half space $x_n > 0$ in $\mathbb{R}^n$ and
$1 \le p \le \infty$. Show that there exists an extension operator, that is a bounded linear
operator $E : W^{1,p}(H) \to W^{1,p}(\mathbb{R}^n)$ such that for all $u \in W^{1,p}(H)$ we have $Eu = u$
a.e. in $H$ and
\[ \|Eu\|_{W^{1,p}(\mathbb{R}^n)} \lesssim \|u\|_{W^{1,p}(H)}. \]
Extend the result to any $s \in \mathbb{N}$. Can you extend the result to arbitrary domains
$U \subset \mathbb{R}^n$? What about domains with smooth boundaries?
10 You will have to perform the inverse Fourier transform, $u(t, x) = \mathcal{F}^{-1} \hat u(t, \xi)$. For the wave
equation this is more difficult, in general, but you can do it for dimension $n = 3$.
Problem 5. [Distributions and Fourier Analysis on the Circle] A smooth function
on the circle $\mathbb{R}/\mathbb{Z}$ is a smooth function on $\mathbb{R}$ which is 1-periodic
\[ f(x + k) = f(x), \qquad k \in \mathbb{Z}. \]
The circle has a discrete space of frequencies $m \in \widehat{(\mathbb{R}/\mathbb{Z})} = \mathbb{Z}$ corresponding to the
functions $x \mapsto e^{2\pi i m x}$. The discreteness of the frequency space is intimately related
to the compactness of the circle. A Schwartz function on the circle is just a smooth
function; a Schwartz function on $\mathbb{Z}$ is one which decays faster than any polynomial
at infinity.
a. We define the Fourier transform of a periodic function $\hat f(m) = \int_0^1 f(x) e^{-2\pi i m x} \, dx$.
Prove the Fourier inversion formula
\[ f(x) = \sum_{m \in \mathbb{Z}} \hat f(m) e^{2\pi i m x} \]
for smooth functions on the circle. Deduce the Plancherel formula $\langle f, g \rangle = \langle \hat f, \hat g \rangle$.
b. We define a distribution $u$ on the circle to be an element of the dual of $C^k(\mathbb{R}/\mathbb{Z})$
for some $k$, i.e. $|\langle u, \phi \rangle| \le C \|\phi\|_{C^k}$ for some $k$, $C$ and all $\phi \in C^\infty(\mathbb{R}/\mathbb{Z})$. The circle
has a smooth structure, so it is possible to formulate the notion of a fundamental
solution for a differential operator (the group structure on the circle allows convolution to make sense as well); however, it is not always possible to find such a
solution. Show that there is no fundamental solution $u$ to the operator $\frac{d}{dx}$. In other
words, there is no distribution $u$ for which $\frac{du}{dx} = \delta(x)$ in the sense that
\[ \Big\langle \frac{du}{dx}, \phi \Big\rangle := -\Big\langle u, \frac{d\phi}{dx} \Big\rangle = \phi(0), \qquad \forall \phi \in C^\infty(\mathbb{R}/\mathbb{Z}). \]
There are many ways to prove this. Can you see this in both physical and frequency
space? What if we replace the vector field $\frac{d}{dx}$
by another nonvanishing vector field
d
(281)
ii. Show that the result is not true for $s \le 1/2$. Show however that the following
sharp trace theorem holds for all $s > 0$,
\[ \|Rf\|_{H^s(\mathbb{R}^{n-1})} \lesssim \|f\|_{H^{s+1/2}(\mathbb{R}^n)}. \tag{282} \]
iii. Show that if $f$ is a function with Fourier support in the ball $|\xi| \lesssim 2^k$ for some
integer $k$ then, for all $1 \le p \le \infty$ and $s > 1/p$,
\[ \|f\|_{L^p(\mathbb{R}^{n-1})} \lesssim 2^{k/p} \|f\|_{W^{s,p}(\mathbb{R}^n)}. \]
Can you deduce from here a trace result, in $L^p$ norms, generalizing that of (281)?
What about (282)?
iv. Let $H$ be the half space $x_n > 0$. According to the above considerations we
can talk about the trace of a function in $W^{1,p}(H)$ to the hyperplane $x_n = 0$ (prove
this!). Show that a function $f \in W^{1,p}(H)$ belongs$^{11}$ to $W_0^{1,p}(H)$ if and only if its
trace to $x_n = 0$ is zero.
11 Recall that $W_0^{1,p}(H)$ is the closure of $C_0^\infty(H)$ in $W^{1,p}(H)$.
x6=yRn
i.
|u(x) u(y)|
|x y|
ii.
sup
xRn , 0h1
|f (x + h) + f (x h) 2f (x)|
h
Show that
kf k kP0 f kL + sup 2k kPk kLp .
k>0
iii.
Problem 8. Read on your own the section on Calderon-Zygmund operators. Indicate how the theory can be extended to operators valued in a given Hilbert space,
such as l2 .
8. Restriction Theorems
It is well known that when $f \in L^1(\mathbb{R}^n)$ then its Fourier transform $\hat f$ is a bounded
and continuous function, thus the restriction of $\hat f$ to any hypersurface is perfectly
well defined. On the other hand, if $f \in L^2(\mathbb{R}^n)$ then $\hat f$ may be any function in $L^2$,
hence defined only almost everywhere and completely arbitrary on sets of measure
zero like hypersurfaces.
Can one make sense of the restriction of $\hat f$ to a smooth hypersurface $S$ when $f$
belongs to some $L^p$ with $1 < p < 2$? This is a basic question in modern Fourier
analysis, which, as we shall see, turns out to be intimately tied to regularity properties of solutions to wave equations.
If we take $S$ to be a hyperplane, we immediately see that the answer is negative.
Indeed, let $f(x_1, x') = u(x_1) v(x')$, $\hat f(\xi_1, \xi') = \hat u(\xi_1) \hat v(\xi')$, with $x_1, \xi_1 \in \mathbb{R}$ and
$x', \xi' \in \mathbb{R}^{n-1}$. The restriction of $\hat f$ to the hyperplane $\xi_1 = 0$ is well defined only
when $\hat u(0) = \int u(x) \, dx$ is well defined. For any $p > 1$ it is always possible to find
$u \in L^p(\mathbb{R})$ such that $\int u \, dx$ doesn't make sense. We deduce that the restriction of
the Fourier transform on hyperplanes cannot be defined when $p > 1$.
The answer is different if we consider hypersurfaces which have non-vanishing curvature. For simplicity we consider the model case of the sphere.
8.1. The Stein-Tomas theorem. The following type of result was first proved
by Stein [], then extended by Tomas [] and given its final form again by Stein [].
Theorem 8.2 (Stein-Tomas). Let $S = \mathbb{S}^{n-1}$ be the standard unit sphere in $\mathbb{R}^n$ and
$d\sigma$ its standard volume element. Let $f \in L^p(\mathbb{R}^n)$ with
\[ 1 \le p \le \frac{2(n + 1)}{n + 3}. \]
Then $Rf = \hat f|_S \in L^2(S)$ and
Sg(x) = R g(x) =
(283)
8.6. Knapp counterexample. The result of theorem 8.3 is false for any
p < p in virtue of the following counterexample ([?]).
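Define, for some small $\delta > 0$, the region in phase space
\[ D = \{ \xi \in \mathbb{R}^n : |\xi_1 - 1| < \delta^2, \ |\xi'| < \delta \}. \]
Let now $f = \chi_{S \cap D}$ be the characteristic function of the cap $S \cap D$, then
\[ \|f\|_{L^2(S)} = |S \cap D|^{1/2} \approx \delta^{(n-1)/2}. \]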
We can write
Sf (x) = eix1
ei(x,) d ,
SD
n1
2
n+1
p 0,
This example suggests that there is some sort of parabolic scaling property in the
structure of the operator S which comes from the nonvanishing curvature of the
sphere.
8.7. The importance of curvature. The restriction theorem and its dual
counterpart remain true if we replace the standard sphere $\mathbb{S}^{n-1}$ by a compact
hypersurface $H \subset \mathbb{R}^n$ with non-vanishing Gauss curvature. The importance of
non-vanishing Gauss curvature is illustrated by the following result.
Lemma 8.8. Let $H \subset \mathbb{R}^n$ be a compact hypersurface with non-vanishing Gauss
curvature (i.e. with all its principal curvatures different from zero) and volume
element $d\sigma$. Then, for any smooth function $\psi$, we have,
\[ |\widehat{(\psi \, d\sigma)}(x)| \lesssim (1 + |x|)^{-\frac{n-1}{2}}. \tag{284} \]
n2
2
Proof : The general proof is based on the method of stationary phase, see Stein's
Harmonic Analysis book. For the particular case of the standard sphere $H = \mathbb{S}^{n-1}$
and odd $n$ the proof can be done by a direct computation in polar coordinates.
Exercise
$T : H \to B'$ is bounded and $\|T\| = M$;
$T^* : B \to H$ is bounded and $\|T^*\| = M$;
$T T^* : B \to B'$ is bounded and $\|T T^*\| = M^2$;
the bilinear form $(x, y) \mapsto \langle T^* x, T^* y \rangle$ is bounded on $B \times B$ with norm $M^2$.
ix
SS f (x) = SRf (x) = e f ()d =
ei(xy) d f (y)dy = d f (x).
S
Rn
We are thus led to the following equivalent form of the restriction theorem,
kd f kLp (Rn ) . kf kLp0 (Rn ) ,
(285)
for p p .
One can give three distinct proofs of Theorem 8.3. We shall sketch the first proof
based on analytic interpolation. This is essentially the original proof of Stein and
Tomas. The second proof, based on introducing a time parameter and treating
$Sf$ as an evolution operator, allows us to regard the restriction theorem as part
of a more general framework which includes Strichartz estimates for various linear
PDE with constant coefficients. Finally the third approach, which only applies for
specific exponents, will allow us to connect with bilinear estimates.
8.13. First proof: analytic interpolation. According to Remark 8.12 and
Remark 8.4 it suffices to prove that U f = d f verifies
kU f kLp (Rn ) . kf kLp0 (Rn ) ,
(286)
(287)
(288)
To obtain (286) from the analytic interpolation of (287) and (288), we would like
the latter to happen on the line Re(z) = a, where a is chosen so that
1 = a + (1 )0,
1
=
+
,
p
1
1
= +
,
0
p
1
2
z () = ez z+ (1 ||)(||),
(289)
For more information about z+ and distribution theory one can consult the books
by Gelfand and Shilov [Ge-S] or Hormander [?].
2
|
z (x)| . (1 + |x|)
(291)
(292)
p
d 0
0
it 1| 0 |2 ix0 0
0 |2 , 0 ) p
Sf (t, x ) =
e
e
f
(
1
1 | 0 |2
| 0 |< 3/2
Z
0 0
0 2
= eit 1| | eix (| 0 |)g( 0 )d 0 .
p
p
with C0 supported in | 0 | < 1 and g( 0 ) = f ( 1 | 0 |2 , 0 )/ 1 | 0 |2 . Observe that
Z
Z
|f ()|2
0 2
0
|g( )| d =
d ' kf k2L2 (S)
2
S |1 |
by the assumption on the support of f .
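Theorem 8.17. Let $\psi \in C_0^\infty(\mathbb{R}^{n-1})$ be supported in the unit ball $\{ \xi \in \mathbb{R}^{n-1} : |\xi| < 1 \}$ and consider the operator
\[ Tg(t, x) = \int_{\mathbb{R}^{n-1}} e^{i t \sqrt{1 - |\xi|^2}} \, e^{i x \cdot \xi} \, \psi(\xi) g(\xi) \, d\xi, \qquad t \in \mathbb{R}, \ x \in \mathbb{R}^{n-1}. \]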
(293)
(294)
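where $\gamma(r) = (n - 1)(1/2 - 1/r)$. Then the following estimate holds true for all
$g \in C_0^\infty(\mathbb{R}^{n-1})$,
\[ \|Tg\|_{L^q_t L^r_x(\mathbb{R} \times \mathbb{R}^{n-1})} \lesssim \|g\|_{L^2(\mathbb{R}^{n-1})}. \tag{295} \]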
Remark 8.18. We can run again the Knapp example to prove the necessity of
condition (293), when q 2. Indeed let D Rn1 be the disk defined by || ,
for sufficiently small > 0, and take g = D to be the characteristic function of D.
We write,
Z
2
T g(t, x) = eit
eit( 1|| 1) eix ()d
D
c q
n1
r
n1
2
2
=
F (t, x) eit 1|| eix ()g()ddtdx =
Z Z
Z
it 1||2 ix
= g()()
e
e
F (t, x)dtdx d.
Hence
T F () = ()
and
2
eit 1|| eix F (t, x)dtdx,
2
eit 1|| eix ()T F ()d
ZZ
2
=
ei(ts) 1|| eix |()|2 F (s, )dds,
T T F (t, x) =
where F (s, ) =
ZZ
2
U (t)f (x) = eit 1|| eix |()|2 f()d,
(296)
By Proposition 8.11, to show that $T^*$ is a bounded operator from $L^{q'}_tL^{r'}_x(\mathbb{R}^n)$ to
$L^2(\mathbb{R}^{n-1})$ it suffices to prove that $TT^*$ is a bounded operator from $L^{q'}_tL^{r'}_x(\mathbb{R}^n)$ to
$L^q_tL^r_x(\mathbb{R}^n)$.
(297)
(298)
(299)
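then the estimate follows from the standard Riesz interpolation theorem.
We obtain (298) immediately using Plancherel formula, since
$$(U(t)f)^\wedge(\xi) \simeq e^{it\sqrt{1-|\xi|^2}}\,|\psi(\xi)|^2\,\hat f(\xi).$$
To prove (299) we write
$$U(t)f(x) = \int K_t(x-y)f(y)\,dy,$$
where
$$K_t(x) = \int e^{ix\cdot\xi}\, e^{it\sqrt{1-|\xi|^2}}\,|\psi(\xi)|^2\,d\xi \simeq \iint e^{ix\cdot\xi}\, e^{it\tau}\,\delta(1-\tau^2-|\xi|^2)\,\sqrt{1-|\xi|^2}\,|\psi(\xi)|^2\,d\xi\,d\tau \simeq \iint e^{i(t,x)\cdot(\tau,\xi)}\,\delta(1-|(\tau,\xi)|)\,\psi_1(\tau,\xi)\,d\xi\,d\tau,$$
with $\psi_1(\tau,\xi) = \tau|\psi(\xi)|^2$ for $\tau > 0$ (and zero otherwise).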
The Hardy-Littlewood-Sobolev inequality in the time variable then applies when $-\gamma(r) + 1 + 1/q = 1/q'$, hence $\gamma(r) = 2/q$. Therefore we proved Theorem
8.17 in the case $0 < \gamma(r) = 2/q < 1$.
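The exponent bookkeeping behind the condition $\gamma(r) = 2/q$ can be sketched as follows. The block assumes the one-dimensional Hardy-Littlewood-Sobolev inequality for the kernel $|t|^{-\gamma}$, $0 < \gamma < 1$, applied in the time variable; the symbols $h$, $a$, $b$ are introduced here only for illustration.

```latex
% One-dimensional Hardy-Littlewood-Sobolev, 0 < \gamma < 1, 1 < a < b < \infty:
\[
\Big\| \int_{\mathbb{R}} \frac{h(s)}{|t-s|^{\gamma}}\,ds \Big\|_{L^{b}_t}
\lesssim \|h\|_{L^{a}_t},
\qquad 1 + \frac1b = \gamma + \frac1a .
\]
% Take b = q and a = q' (so that 1/a = 1 - 1/q):
\[
1 + \frac1q = \gamma + 1 - \frac1q
\quad\Longleftrightarrow\quad
\gamma = \frac2q .
\]
% The requirement 0 < \gamma < 1 is exactly 2 < q < \infty.
```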
On the other hand if $q = 2$ and $\gamma(r) > 1$ we have from (300),
$$\|TT^*F\|_{L^2_tL^r_x} \lesssim \|F\|_{L^2_tL^{r'}_x},$$
by an application of the standard Hausdorff-Young inequality.
Finally, if $2/q < 1$ and $\gamma(r) > 2/q$ the result follows from the case $\gamma(r) = 2/q$ using
Sobolev inequalities.
8.21. Third proof: bilinear forms (n = 2 and n = 3). We present now
another method to prove the restriction theorem for the sphere that works for the
special cases $n = 2$, $p = 6$ or $n = 3$, $p = 4$. The idea is that when $p$ is an even integer,
the restriction theorem can be viewed as an $L^2$ estimate for a multilinear form,
which, through the Fourier transform, has a convolution structure that provides
some smoothing effects. The proofs given below are at the root of the so-called
bilinear and trilinear estimates, which play a fundamental role in the modern theory of
nonlinear wave and dispersive equations.
Let us see the case $n = 3$ first. We consider the Stein operator $Sf = (f\,d\sigma)^\wedge$, and
use the fact that $(Sf\cdot Sf)^\wedge \simeq (f\,d\sigma) * (f\,d\sigma)$. Let $B(f,g) = Sf\cdot Sg$, then an $L^4$
estimate for $Sf$ corresponds to an $L^2$ estimate for $B(f,f)$. We have
Z
(301)
with
$$A = \sup_\xi |\widehat{B(1,1)}(\xi)| = \sup_\xi \int \delta(1-|\xi-\eta|)\,\delta(1-|\eta|)\,d\eta. \tag{302}$$
Thus, to prove the theorem in this case it suffices to check that $A$ is finite. It is
useful to carry out the explicit calculation of $A(\xi) = \widehat{B(1,1)}(\xi)$. For any dimension
$n \ge 2$ we have:
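Lemma 8.22.
$$A(\xi) = \int_{\mathbb{R}^n} \delta(1-|\xi-\eta|)\,\delta(1-|\eta|)\,d\eta \simeq \frac{1}{|\xi|}\big(4-|\xi|^2\big)_+^{\frac{n-3}{2}}. \tag{303}$$
Proof
$$A(\xi) = \int \delta(1-|\xi-\eta|)\,\delta(1-|\eta|)\,d\eta \simeq \int_{|\eta|=1} \delta(1-|\xi-\eta|^2)\,d\sigma_\eta = \int_{|\eta|=1} \delta(|\xi|^2 - 2\xi\cdot\eta)\,d\sigma_\eta \simeq \frac{1}{|\xi|}\int_{|\eta|=1} \delta\Big(\frac{|\xi|}{2} - \frac{\xi}{|\xi|}\cdot\eta\Big)\,d\sigma_\eta.$$
Because of the rotational symmetry, we may assume that $\xi = (|\xi|,0,\dots,0)$, so that
$$A(\xi) \simeq \frac{1}{|\xi|}\int_{-1}^{1} \delta\Big(\frac{|\xi|}{2} - u\Big)(1-u^2)^{\frac{n-3}{2}}\,du = \frac{1}{|\xi|}\Big(1-\frac{|\xi|^2}{4}\Big)_+^{\frac{n-3}{2}},$$
which is (303).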
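As a sanity check on Lemma 8.22, the constant can be pinned down in the case $n = 3$ by comparing total masses. The sketch below assumes that $d\sigma$ is the usual surface measure on the unit sphere of $\mathbb{R}^3$ (total mass $4\pi$); the constant $c$ is introduced only for this illustration.

```latex
% n = 3: Lemma 8.22 gives (d\sigma * d\sigma)(\xi) = c\,|\xi|^{-1} on |\xi| < 2.
% The total mass of a convolution is the product of the total masses:
\[
\int_{\mathbb{R}^3} (d\sigma * d\sigma)(\xi)\,d\xi = (4\pi)^2 .
\]
% On the other hand, in polar coordinates,
\[
\int_{|\xi|<2} \frac{c}{|\xi|}\,d\xi
= c \cdot 4\pi \int_0^2 r\,dr = 8\pi c ,
\]
% hence c = 2\pi, i.e. (d\sigma * d\sigma)(\xi) = 2\pi/|\xi| for |\xi| < 2.
```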
(304)
with
$$A = \sup_\xi |\widehat{T(1,1,1)}(\xi)| = \sup_\xi \iint \delta(1-|\xi-\eta-\zeta|)\,\delta(1-|\eta|)\,\delta(1-|\zeta|)\,d\eta\,d\zeta. \tag{305}$$
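The convolution structure allows us to restrict to the set $\operatorname{supp} f + \operatorname{supp} g + \operatorname{supp} h$,
and, if we make the hypothesis of $f, g, h$ supported in a small cap of the sphere, we
can assume $1 \le |\xi| \le 3$. Using Lemma 8.22 we can evaluate $\widehat{T(1,1,1)}$ and show
that $A$ is bounded,
$$\widehat{T(1,1,1)}(\xi) = \int \widehat{B(1,1)}(\xi-\eta)\,\delta(1-|\eta|)\,d\eta = \int_{|\xi-\eta|<2} \frac{\delta(1-|\eta|)}{|\xi-\eta|\,(4-|\xi-\eta|^2)^{1/2}}\,d\eta \simeq \int_{\eta\in S^1,\ |\xi-\eta|<2} \frac{d\sigma_\eta}{(3-|\xi|^2+2\xi\cdot\eta)^{1/2}}$$
$$\simeq \int_{a(\xi)}^{1} \frac{da}{(3-|\xi|^2+2|\xi|a)^{1/2}(1-a^2)^{1/2}} \simeq \int_{a(\xi)}^{1} \frac{da}{(a-a(\xi))^{1/2}(1-a)^{1/2}} \simeq 1,$$
where $a(\xi) = \frac{|\xi|^2-3}{2|\xi|}$; the last integral equals $\pi$, since the substitution $a = a(\xi) + (1-a(\xi))s$ reduces it to $\int_0^1 s^{-1/2}(1-s)^{-1/2}\,ds$. From the $L^2$ estimate (304) of the trilinear form $T(f,g,h)$,
it follows the $L^6$ estimate for the Stein operator $Sf$: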
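We can also try to repeat the bilinear argument for $n = 2$. As before, for $B(f,g) = Sf\cdot Sg$ we have
$$|\widehat{B(f,g)}(\xi)|^2 \le \widehat{B(1,1)}(\xi)\;\widehat{B(|f|^2,|g|^2)}(\xi).$$
Integrate with respect to $\xi$, and use Lemma 8.22 to evaluate $\widehat{B(1,1)}$,
$$\|B(f,g)\|^2_{L^2(\mathbb{R}^2)} \lesssim \iint \frac{\delta(1-|\xi-\eta|)\,\delta(1-|\eta|)}{|\xi|(4-|\xi|^2)^{1/2}}\,|f(\xi-\eta)|^2|g(\eta)|^2\,d\eta\,d\xi.$$
Change variable, $\omega = \xi-\eta$, and observe that when $|\omega| = |\eta| = 1$ we have
$$|\xi| = |\omega+\eta| \simeq (1+\omega\cdot\eta)^{1/2}, \qquad (4-|\xi|^2)^{1/2} = (4-|\omega+\eta|^2)^{1/2} \simeq (1-\omega\cdot\eta)^{1/2},$$
hence
$$\|B(f,g)\|^2_{L^2(\mathbb{R}^2)} \lesssim \iint_{S^1\times S^1} \frac{|f(\omega)|^2|g(\eta)|^2}{(1-(\omega\cdot\eta)^2)^{1/2}}\,d\sigma_\omega\,d\sigma_\eta. \tag{306}$$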
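The pointwise inequality for $\widehat{B(f,g)}$ used in this argument is an instance of Cauchy-Schwarz. A minimal sketch, assuming the convolution representation of $\widehat{B(f,g)}$ (up to constants and reflections) suggested by the formula for $A(\xi)$; the measure $\mu_\xi$ is notation introduced here:

```latex
% Convolution representation (up to harmless constants):
\[
\widehat{B(f,g)}(\xi) \simeq \int f(\xi-\eta)\,g(\eta)\,
\delta(1-|\xi-\eta|)\,\delta(1-|\eta|)\,d\eta .
\]
% Cauchy-Schwarz with respect to the positive measure
% d\mu_\xi(\eta) = \delta(1-|\xi-\eta|)\,\delta(1-|\eta|)\,d\eta :
\[
|\widehat{B(f,g)}(\xi)|^2
\lesssim \Big(\int d\mu_\xi\Big)
\Big(\int |f(\xi-\eta)|^2\,|g(\eta)|^2\,d\mu_\xi(\eta)\Big)
= \widehat{B(1,1)}(\xi)\;\widehat{B(|f|^2,|g|^2)}(\xi).
\]
```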
Part 3
CHAPTER 7
Decay estimates
Consider the standard wave equation in Minkowski space $\mathbb{R}^{n+1}$
$$\Box\phi = 0. \tag{307}$$
The canonical, inertial, coordinates in $\mathbb{R}^{n+1}$ are denoted by $x^\mu$, $\mu = 0, 1, \dots, n$ relative to which the Minkowski metric takes the diagonal form $m_{\mu\nu} = \mathrm{diag}(-1, 1, \dots, 1)$.
We have $x^0 = t$ and $x = (x^1, \dots, x^n)$ denote the spatial coordinates. We make use
of the standard summation convention over repeated indices and those concerning
raising and lowering the indices of vectors and tensors. In particular, if $x_\mu = m_{\mu\nu}x^\nu$,
we have $x_0 = -t$ and $x_i = x^i$, $i = 1, \dots, n$. We denote by $\Sigma_{t_0}$ the spacelike
hyperplanes $t = t_0$. The wave operator is defined by $\Box = m^{\mu\nu}\partial_\mu\partial_\nu = -\partial_t^2 + \sum_i \partial_i^2$. We
study the initial value problem,
$$\phi(0,x) = f(x), \qquad \partial_t\phi(0,x) = g(x). \tag{308}$$
For convenience we denote $\phi[0] = (f, D^{-1}g)$ with $D^{-1}$ the pseudodifferential operator with symbol $|\xi|^{-1}$. Let,
$$E[\phi](t) = \int_{\Sigma_t} \Big(|\partial_t\phi|^2 + \sum_i |\partial_i\phi|^2\Big)\,dx \tag{309}$$
be the total energy of $\phi$ at time $t$. The conservation law for the energy is,
$$E[\phi](t) = E[\phi](0) \tag{310}$$
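For the reader's convenience, here is a sketch of why (310) holds. It assumes that $\phi$ is a smooth solution of (307) which decays fast enough at spatial infinity to justify the integration by parts.

```latex
\[
\frac{d}{dt}E[\phi](t)
= 2\int_{\Sigma_t}\Big(\partial_t\phi\,\partial_t^2\phi
  + \sum_i \partial_i\phi\,\partial_i\partial_t\phi\Big)\,dx
= 2\int_{\Sigma_t}\partial_t\phi\,\Big(\partial_t^2\phi
  - \sum_i\partial_i^2\phi\Big)\,dx
= -2\int_{\Sigma_t}\partial_t\phi\;\Box\phi\,dx = 0 .
\]
% Second equality: integration by parts in x_i (no boundary terms by decay).
% Third equality: \Box = -\partial_t^2 + \sum_i \partial_i^2.
```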
$$\|\phi(t)\|_{L^\infty} \le c\,|t|^{-\frac{n-1}{2}}\,\|\phi[0]\|_{\dot B^{\frac{n+1}{2}}_{1,1}} \tag{312}$$
n+1
kZ
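Exercise. Show that the inequality (312) follows from its frequency localized
version. In other words show that it suffices to prove the following inequality,
$$\|\phi(t)\|_{L^\infty} \le c\,|t|^{-\frac{n-1}{2}}\,\|\phi[0]\|_{L^1} \tag{313}$$
for data whose Fourier transform is supported in the shell $\frac12 \le |\xi| \le 2$.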
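One way to see where the $\frac{n+1}{2}$ derivatives in (312) come from is the following scaling computation. It is only a sketch, under the simplifying assumption $g = 0$ and with data $f$ whose Fourier transform is supported in $\lambda/2 \le |\xi| \le 2\lambda$; the rescaled function $\phi_\lambda$ is notation introduced here.

```latex
% Rescale to unit frequency:
\[
\phi_\lambda(t,x) := \phi(t/\lambda,\,x/\lambda), \qquad
\Box\phi_\lambda = 0, \qquad
\phi_\lambda(0,x) = f(x/\lambda).
\]
% The Fourier transform of f(\cdot/\lambda) is \lambda^n \hat f(\lambda\xi),
% supported in 1/2 \le |\xi| \le 2, and \|f(\cdot/\lambda)\|_{L^1} = \lambda^n \|f\|_{L^1}.
% Applying (313) to \phi_\lambda at time \lambda t:
\[
\|\phi(t)\|_{L^\infty} = \|\phi_\lambda(\lambda t)\|_{L^\infty}
\le c\,(\lambda|t|)^{-\frac{n-1}{2}}\,\lambda^{n}\,\|f\|_{L^1}
= c\,|t|^{-\frac{n-1}{2}}\,\lambda^{\frac{n+1}{2}}\,\|f\|_{L^1}.
\]
% Summing over dyadic \lambda produces the Besov norm with (n+1)/2 derivatives.
```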
Proof The standard proof of (312) is based on the method of stationary phase
applied to the representation (311). In odd dimensions one can prove a related form
of the dispersive estimate using the spherical means representation of solutions.
This is particularly easy to do for n = 3. We shall later discuss a derivation of
(312) which avoids any representation formulas.
Remark 0.25. The dispersive inequality provides two types of information. The
first concerns the precise decay rate of $\|\phi(t)\|_{L^\infty}$ as $t \to \infty$ while the second provides information about the regularity properties of $\|\phi(t)\|_{L^\infty}$ for $t > 0$. As far as
improved regularity is concerned the estimate (312) gains, for $t > 0$, $\frac{n-1}{2}$ derivatives
when compared to the Sobolev embedding $L^\infty(\mathbb{R}^n) \supset W^{n,1}(\mathbb{R}^n)$, which requires $n$ derivatives in $L^1$.
In many applications, especially to nonlinear equations, (312) is not very useful.
A more effective procedure to derive the asymptotic properties of solutions of the
wave equation is based on generalized energy estimates, obtained by the commuting
vectorfields method, together with global Sobolev inequalities. In what follows
we review the commuting vectorfields method for deriving the above decay rate
estimate. The idea is to use the energy identity (310) together with the vectorfields
which commute with the wave operator $\Box$ and a global version of the classical
Sobolev inequalities. We refer the reader to [?] and [?] for details.
The Minkowski space-time $\mathbb{R}^{n+1}$ is equipped, see appendix 4.2, with a family of
Killing and conformal Killing vector fields, the translations $T_\mu = \partial_\mu$, Lorentz rotations $L_{\mu\nu} = x_\mu\partial_\nu - x_\nu\partial_\mu$, scaling $S = t\partial_t + x^i\partial_i$ and the inverted translations
$K_\mu = -2x_\mu S + \langle x, x\rangle\partial_\mu$. Recall that $x^\mu$ denote the standard variables $x^0 = t$,
$x^1, \dots, x^n$, and $x_\mu = m_{\mu\nu}x^\nu$. The Killing vector fields $T_\mu$ and $L_{\mu\nu}$ commute with $\Box$
while $S$ preserves the space of solutions in the sense that $\Box\phi = 0$ implies $\Box S\phi = 0$
as $[\Box, S] = 2\Box$. One can split the operators $L_{\mu\nu}$ into the angular rotation operators
${}^{(ij)}O = x_i\partial_j - x_j\partial_i$ and the boosts ${}^{(i)}L = x_i\partial_t + t\partial_i$, for $i, j, k = 1, \dots, n$. Recall
the energy expression in (309). Based on the commutation properties described
above we define the following generalized energies
$$E_k[\phi] = \sum_{X_{i_1},\dots,X_{i_j}} E[X_{i_1}X_{i_2}\cdots X_{i_j}\phi] \tag{314}$$
with the sum taken over $0 \le j \le k$ and over all Killing vector fields $T, L$ as well
as the scaling vector field $S$. The crucial point of the commuting vectorfield method
is that the quantities $E_k$, $k \ge 1$ are conserved by solutions to (307). Therefore, if,
$$\sum_{0\le k\le s}\int (1+|x|)^{2k}\Big(|\nabla^{k+1}f(x)|^2 + |\nabla^k g(x)|^2\Big)\,dx \le C_s < \infty \tag{315}$$
then for all $t$, $E_s[\phi](t) \le C_s$. The desired decay estimates of solutions to (307) can
now be derived from the following global version of the Sobolev inequalities (see
[?], [?]):
Theorem 0.26 (Global Sobolev). Let $\phi$ be an arbitrary function in $\mathbb{R}^{n+1}$ such that
$E_s[\phi]$ is finite for some integer $s > \frac n2$. Then,
$$|\phi(t,x)| \lesssim (1+t+|x|)^{-\frac{n-1}{2}}\,(1+|t-|x||)^{-\frac12}\,E_s[\phi](t) \tag{316}$$
for all $t > 0$. Therefore if the data $f, g$ in (307) satisfy (315), with $s > \frac n2$, then for
all $t \ge 0$,
$$|\phi(t,x)| \lesssim \frac{1}{(1+t+|x|)^{\frac{n-1}{2}}\,(1+|t-|x||)^{\frac12}}. \tag{317}$$
Remark 0.27. Clearly this estimate, whose proof is purely geometric, implies the
decay properties given by the dispersive inequality (312). In fact it provides more
information outside the wave zone $|x| \sim t$, which fits very well with the expected
propagation properties of the linear equation $\Box\phi = 0$. On the other hand, as (316)
is really a global version of the Sobolev inequality, it seems that the estimates of
Theorem 0.26 have no bearing on the improved regularity features of (312).
This is however not quite true, as we shall see later.
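Proof We only sketch the main ideas of the proof below. Consider the canonical
null pair $L_\pm = \partial_t \pm \partial_r$, an associated null frame $e_1, \dots, e_{n-1}$, $e_n = L_-$, $e_{n+1} = L_+$
as well as the angular vectorfields, $A_i = \partial_i - \frac{x_i}{r}\partial_r$. Clearly,
$$\sum_i |A_i\phi| \lesssim |\not\nabla\phi| \lesssim \sum_i |A_i\phi|,$$
where $|\not\nabla\phi|^2 = \sum_{i=1}^{n-1}|e_i(\phi)|^2$. Also,
$$|\partial_r\phi| + \sum_i |A_i\phi| \lesssim |\nabla\phi| \lesssim |\partial_r\phi| + \sum_i |A_i\phi|$$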
= xi Lj xj Li
1
|(t, x)|
t
1
|(t, x)|
|L (t, x)| .
t |x|
|L+ (t, x)| .
(318)
1
|(t, x)|.
t
(319)
1
N
| N (t, x)| .
| (t, x)|
t |x|N
where |N | =
(320)
Combining the above inequalities with the definition of our norms we derive
tkE+ (t)kL2
. k(t)kL2 (Rn )
tk
/ (t)kL2
. k(t)kL2 (Rn )
kuE (t)k
. k(t)kL2 (Rn )
L2
k(t)kL .
k(1 + |u|)k k (t)kL
Also,
1 n+1
2
ks+1 (t)kL2 (Rn )
t
1 n+1
2
.
ks+1 (t)kL2 (Rn )
t
1 n1
2
.
ks+1 (t)kL2 (Rn )
t
kE+ (t)kL .
k
/ (t)kL
k(1 + |u|)E (t)kL
(321)
(322)
Proposition 0.30 (see [?]). The commuting vectorfields method implies the dispersive inequality (312).
Proof Without loss of generality we may assume that $\partial_t\phi(0) = g = 0$ and that the
Fourier transform of $f = \phi(0)$ is supported in the shell $\lambda/2 \le |\xi| \le 2\lambda$ for some
$\lambda \in 2^{\mathbb{N}}$. By a simple scaling argument we may in fact assume $\lambda = 1$. For such
initial conditions, with Fourier supports restricted to $1/2 \le |\xi| \le 2$, it suffices to
prove,
$$\|\phi(t)\|_{L^\infty} \lesssim (1+|t|)^{-\frac{n-1}{2}}\,\|f\|_{L^1(\mathbb{R}^n)}$$
|k I (x)| Ck,n
(324)
IZn
n1
2
n+k+1
X
kDj fI kL1
(325)
j=0
X
I
kk I (t)kL . (1 + t)
n1
2
n+k+1
X X
j=0
kDj fI kL1
0ij
X Z
X
Rn
0ij
Rn
|Di I (x)| )Dji f (x)|dx
ci,n kD
ji
0ij
Hence,
kk (t)kL
(1 + t)
n1
2
kf kL1 (Rn )
as desired.
It therefore remains to check (325). Without loss of generality, by performing a
space translation, we may assume that $I = 0$. Applying Theorem 0.26 to
$\phi = \phi_0$ we derive, for $s$ the first integer strictly larger than $\frac n2$,
$$\|\phi_0(t)\|_{L^\infty} \le c(1+t)^{-\frac{n-1}{2}}E_s[\phi_0](t) \le c(1+t)^{-\frac{n-1}{2}}E_s[\phi_0](0).$$
Since the support of $\phi_0(0) = f_0$ is included in the ball of radius 1 centered at the origin
we have,
$$E_s[\phi_0](0) \le C_n\sum_{j=0}^{s+1}\|D^jf_0\|_{L^2}.$$
n
n1
2
n+2
X
j=0
as desired.
kDj f0 kL1
CHAPTER 8
Strichartz Inequalities
Strichartz inequalities are an important tool in the study of linear and nonlinear
wave equations. They are intimately tied to restriction theorems. In this chapter
we shall only consider the case of the standard linear wave equation. Similar inequalities hold true however for linear dispersive equations such as the Schrödinger,
linear KdV etc.
0.30.1. Homogeneous wave equation. Consider solutions $u = u(t,x)$, $t \in \mathbb{R}$, $x \in \mathbb{R}^n$ to the equation
$$\Box u = F, \tag{326}$$
$$u(0,x) = f(x), \qquad \partial_tu(0,x) = g(x), \tag{327}$$
(328)
verifying the initial condition (327) at time t = 0, and a solution to the purely
inhomogeneous wave equation
$$\Box u = F,$$
(329)
t u(x, 0) = 0.
t u(0, x) = h(x)
Before stating the main result of this section we make the following definition.
Definition 0.31. We say that the pair of real numbers $(q, r)$ is an admissible wave
pair if they satisfy the conditions
$$q \ge 2, \qquad \frac2q \le (n-1)\Big(\frac12 - \frac1r\Big), \qquad (q, r, n) \ne (2, \infty, 3).$$
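To make the definition concrete, here is the admissibility condition worked out in dimension $n = 3$; the sample pairs are chosen here purely for illustration.

```latex
% n = 3: the condition 2/q \le (n-1)(1/2 - 1/r) reads
\[
\frac2q \le 1 - \frac2r
\quad\Longleftrightarrow\quad
\frac1q + \frac1r \le \frac12 , \qquad q \ge 2 .
\]
% (q,r) = (4,4):        1/4 + 1/4 = 1/2          admissible (equality case)
% (q,r) = (\infty,2):   0 + 1/2   = 1/2          admissible (energy estimate)
% (q,r) = (2,\infty):   1/2 + 0   = 1/2          excluded endpoint (2,\infty,3)
% (q,r) = (3,3):        1/3 + 1/3 = 2/3 > 1/2    not admissible
```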
with
. kf kH + kgkH 1
(331)
(3) We also have the following more general version of (332) for admissible
pairs $(q_1, r_1)$, $(q_2, r_2)$ with $r_1, r_2 < \infty$ verifying the dimensional condition,
$$\frac{1}{q_1} + \frac{n}{r_1} = \frac n2 - \sigma = \frac{1}{q_2'} + \frac{n}{r_2'} - 2.$$
Then,
$$\|u\|_{L^{q_1}([0,T];L^{r_1})} + \|u\|_{C([0,T];\dot H^\sigma)} + \|\partial_tu\|_{C([0,T];\dot H^{\sigma-1})}$$
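assume by contradiction that in fact, $J := \int_0^\infty \phi(t, t, 0, 0)\chi(t)\,dt < C$ for all $f \in C_0^\infty(\mathbb{R}^3)$ with $\|f\|_{L^2} = 1$ and some $\chi \in \mathcal{S}(\mathbb{R})$, $\chi \not\equiv 0$. In view of the formula (see
section on the fundamental solution of $\Box$ in $\mathbb{R}^{3+1}$),
$$\phi(t,x) = (4\pi)^{-1}\,t\int_{|\xi|=1} f(x+t\xi)\,d\sigma_\xi$$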
we find that,
1
J = (4)
R3
Since f is an arbitrary
C0 (R3 )
g()
||
Thus the fundamental solution $W(t)h$, defined above, takes the form,
$$W(t)h(x) = \int_{\mathbb{R}^n} e^{ix\cdot\xi}\,\frac{\sin(t|\xi|)}{|\xi|}\,\hat h(\xi)\,d\xi.$$
(336)
(337)
By Duhamel principle, see (330), the general solution of the inhomogeneous equation
$\Box u = F$ can be expressed in the form,
$$u(t) = \partial_tW(t)f + W(t)g + \int_0^t W(t-s)F(s)\,ds. \tag{338}$$
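A quick check of (338) on the Fourier side, offered as a sketch. It assumes the symbols $\cos(t|\xi|)$ and $\sin(t|\xi|)/|\xi|$ of $\partial_tW(t)$ and $W(t)$ read off from (336), and it shows that the formula as written corresponds to the sign convention $\partial_t^2u - \Delta u = F$.

```latex
\[
\hat u(t,\xi) = \cos(t|\xi|)\,\hat f(\xi)
 + \frac{\sin(t|\xi|)}{|\xi|}\,\hat g(\xi)
 + \int_0^t \frac{\sin((t-s)|\xi|)}{|\xi|}\,\hat F(s,\xi)\,ds .
\]
% Differentiate twice in t. The boundary term of the first derivative vanishes
% because \sin 0 = 0; that of the second equals \cos(0)\,\hat F(t,\xi) = \hat F(t,\xi):
\[
\partial_t^2\hat u(t,\xi) = -|\xi|^2\,\hat u(t,\xi) + \hat F(t,\xi),
\qquad \hat u(0,\xi) = \hat f(\xi), \quad \partial_t\hat u(0,\xi) = \hat g(\xi).
\]
```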
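Let $D = (-\Delta)^{1/2}$. Observe that,
$$(W(t)Df)(x) = \int_{\mathbb{R}^n} e^{ix\cdot\xi}\,\sin(t|\xi|)\,\hat f(\xi)\,d\xi.$$
Since $\sin t|\xi|$ and $\cos t|\xi|$ are bounded the operators $\partial_tW(t)$ and $DW(t)$ map $H^s(\mathbb{R}^n)$
into itself. In particular, solutions $u$ of (328), (327) preserve the (Sobolev) regularity of the data,
$$\|u(t)\|_{\dot H^\sigma} + \|\partial_tu(t)\|_{\dot H^{\sigma-1}} \lesssim \|f\|_{\dot H^\sigma} + \|g\|_{\dot H^{\sigma-1}}$$
which provides the easy part of estimate (331). Therefore to prove (331) it suffices
to prove,
$$\|u\|_{L^q_tL^r_x} \lesssim \|f\|_{\dot H^\sigma} + \|g\|_{\dot H^{\sigma-1}} \tag{339}$$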
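$$\partial_tW(t)h(x) = \int_{\mathbb{R}^n} e^{ix\cdot\xi}\cos(t|\xi|)\,\hat h(\xi)\,d\xi$$
and,
$$D^{-1}\partial_tW(t)h(x) = \int_{\mathbb{R}^n} e^{ix\cdot\xi}\,\frac{\cos(t|\xi|)}{|\xi|}\,\hat h(\xi)\,d\xi$$
$$\hat u(t,\xi) = e^{it|\xi|}\,\widehat{f^+}(\xi) + e^{-it|\xi|}\,\widehat{f^-}(\xi),$$
where $f^\pm = \frac12\big(f \mp iD^{-1}g\big)$. It follows that $u = u^+ + u^-$ where
$$u^\pm = \int_{\mathbb{R}^n} e^{i(x\cdot\xi \pm t|\xi|)}\,\widehat{f^\pm}(\xi)\,d\xi$$
and it suffices to prove
$$\|u^+\|_{L^q_tL^r_x} \lesssim \|f^+\|_{\dot H^\sigma} \tag{340}$$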
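The splitting into $u^+$ and $u^-$ can be verified directly from Euler's formula. This is a sketch; it takes the representation $\hat u(t,\xi) = \cos(t|\xi|)\hat f + |\xi|^{-1}\sin(t|\xi|)\hat g$ of free waves as given.

```latex
\[
\cos(t|\xi|) = \tfrac12\big(e^{it|\xi|} + e^{-it|\xi|}\big), \qquad
\sin(t|\xi|) = \tfrac{1}{2i}\big(e^{it|\xi|} - e^{-it|\xi|}\big),
\]
\[
\hat u(t,\xi)
= e^{it|\xi|}\,\tfrac12\Big(\hat f(\xi) - i\,\frac{\hat g(\xi)}{|\xi|}\Big)
+ e^{-it|\xi|}\,\tfrac12\Big(\hat f(\xi) + i\,\frac{\hat g(\xi)}{|\xi|}\Big),
\]
% i.e. f^{\pm} = \tfrac12\,(f \mp i\,D^{-1}g), using 1/i = -i.
```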
Thus integrating in $t$,
$$\|\partial_tu(t)\|^2_{L^2} + \|\nabla u(t)\|^2_{L^2} = \|\partial_tu(0)\|^2_{L^2} + \|\nabla u(0)\|^2_{L^2} + 2\int_0^t\!\!\int_{\mathbb{R}^n} \partial_tu\,F\,dx\,ds$$
equation
t u F dxds.
(342)
Rn
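In particular, applying Hölder, and writing $\partial u = (\partial_tu, \nabla u)$,
$$\|\partial u(t)\|^2_{L^2} \le \|\partial u(0)\|^2_{L^2} + 2\int_0^t \|\partial_tu(s)\|_{L^2}\,\|F(s)\|_{L^2}\,ds \tag{343}$$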
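Now let $D^s$ be the operator $D^s = (-\Delta)^{s/2}$ whose symbol in Fourier space is given
by $|\xi|^s$. Since $D^s$ commutes with $\Box$ we easily derive,
$$\|D^s\partial u(t)\|^2_{L^2} = \|D^s\partial u(0)\|^2_{L^2} + 2\int_0^t\!\!\int_{\mathbb{R}^n} \partial_tD^su\;D^sF\,dx\,ds$$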
Therefore, by Hölder, in the slab $D_T = [0,T]\times\mathbb{R}^n$,
$$\sup_{t\in[0,T]}\|D^s\partial u(t)\|^2_{L^2} \le \|D^s\partial u(0)\|^2_{L^2} + 2\,\|D^{2s}\partial_tu\|_{L^q_tL^r_x(D_T)}\,\|F\|_{L^{q'}_tL^{r'}_x(D_T)}$$
T)
t[0,T ]
T)
T)
(344)
Then,
sup kD1/2 u(t)kL2 . kF kLq0 Lr0 (D
t
t[0,T ]
T)
ku(t)kH ) + kt ukH 1 )
thus proving half of estimate (332). Therefore the inhomogeneous estimate (332)
reduces to proving,
kukLq ([0,T ];Lr ) + kD1 t ukLq ([0,T ];Lr )
(345)
0.38. Homogeneous Case. In this section we prove estimate (340) and thus
complete the proof for the homogeneous Strichartz estimate of theorem 0.32. Using
the space-time Fourier transform, i.e. Fourier transform with respect to both $t$ and
$x$, we have
$$\tilde u^+(\tau,\xi) = \delta(\tau-|\xi|)\,\widehat{f^+}(\xi), \qquad \tilde u^-(\tau,\xi) = \delta(\tau+|\xi|)\,\widehat{f^-}(\xi). \tag{346}$$
These are the components of $\tilde u$ living on the forward null cone $C_+ = \{\tau = |\xi|\}$ and
on the backward null cone $C_- = \{\tau = -|\xi|\}$, respectively. Thus we can interpret
(340) from the point of view of a restriction theorem for the half light cones $C_+$ or
$C_-$. We next show that it suffices to prove (340) for the case when the Fourier support of $f^+$ is included
in a fixed dyadic piece. More precisely, it suffices to show that,
$$\|u_k^+\|_{L^q_tL^r_x} \lesssim 2^{k\sigma}\,\|f_k^+\|_{L^2} \tag{347}$$
where $u^+ = \sum_{k\in\mathbb{Z}} u_k^+$, $u_k^+ = P_ku^+$, $f_k^+ = P_kf^+$ and $P_k$ the standard LP projections with respect to the spatial variables $x$.
To show that (348) implies (340) is highly nontrivial, as we need to rely on corollary
6.15 adapted to the mixed norms $L^q_tL^r_x$ with both $q$ and $r$ larger than 2. Thus,
$$\|u^+\|^2_{L^q_tL^r_x} \lesssim \sum_{k\in\mathbb{Z}}\|u_k^+\|^2_{L^q_tL^r_x} \lesssim \sum_{k\in\mathbb{Z}} 2^{2k\sigma}\,\|f_k^+\|^2_{L^2} \lesssim \|f^+\|^2_{\dot H^\sigma}.$$
Finally we observe, using a simple scaling argument, that (348) follows from,
$$\|u_0^+\|_{L^q_tL^r_x} \lesssim \|f_0^+\|_{L^2} \tag{348}$$
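The scaling argument can be sketched as follows, assuming the unit-frequency estimate displayed above and writing $f_k^+(x) = h(2^kx)$ with $\hat h$ supported in the unit dyadic ring; the names $h$ and $v$ are introduced only for this illustration.

```latex
% If v denotes the half-wave u^+ with data h (unit frequency), then
\[
u_k^+(t,x) = v(2^k t,\,2^k x), \qquad
\|u_k^+\|_{L^q_tL^r_x} = 2^{-k\left(\frac1q+\frac nr\right)}\|v\|_{L^q_tL^r_x},
\qquad
\|f_k^+\|_{L^2} = 2^{-\frac{kn}{2}}\|h\|_{L^2}.
\]
% Applying the unit-frequency estimate \|v\|_{L^q_tL^r_x} \lesssim \|h\|_{L^2}:
\[
\|u_k^+\|_{L^q_tL^r_x}
\lesssim 2^{-k\left(\frac1q+\frac nr\right)}\,2^{\frac{kn}{2}}\,\|f_k^+\|_{L^2}
= 2^{k\sigma}\|f_k^+\|_{L^2},
\qquad \sigma = \frac n2 - \frac nr - \frac1q .
\]
```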
(349)
(350)
and also
$$\|CC^*F\|_{L^{q_1}_tL^{r_1}_x} \lesssim \|F\|_{L^{q_2'}_tL^{r_2'}_x} \tag{351}$$
B 2,1
replacing H norm on the right.
(352)
(353)
(354)
which is also equivalent to the polarized form (351). Thus, to prove the theorem it
suffices to prove (354). As in the second proof of the restriction theorem presented
in the previous section, to prove (354) we need to prove the following properties for
the evolution operators $U(t)$.
Proposition 0.40. Let $\chi(\xi)$ be a fixed $C_0^\infty(\mathbb{R}^n)$ function supported in $1/2 \le |\xi| \le 2$
and,
$$U(t)f(x) = \int e^{i(t|\xi|+x\cdot\xi)}\,\chi(\xi)\,\hat f(\xi)\,d\xi. \tag{355}$$
Then,
$$\|U(t)f\|_{L^2} \le C\|f\|_{L^2} \tag{356}$$
$$\|U(t)f\|_{L^\infty} \lesssim (1+|t|)^{-\frac{n-1}{2}}\,\|f\|_{L^1} \tag{357}$$
$$\|U(t)f\|_{L^r} \lesssim (1+|t|)^{-\frac{n-1}{2}\left(1-\frac2r\right)}\,\|f\|_{L^{r'}} \tag{358}$$
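Estimate (358) follows from (356) and (357) by Riesz-Thorin interpolation. The exponent arithmetic, written out as a sketch with the interpolation parameter $\theta$ introduced here:

```latex
% Interpolate (356): L^2 -> L^2 with bound C, and (357): L^1 -> L^\infty with
% bound C(1+|t|)^{-(n-1)/2}, with parameter \theta \in [0,1]:
\[
\frac1r = \frac{1-\theta}{2} + \frac{\theta}{\infty}, \qquad
\frac1{r'} = \frac{1-\theta}{2} + \frac{\theta}{1}
\quad\Longrightarrow\quad \theta = 1 - \frac2r .
\]
\[
\|U(t)f\|_{L^r} \lesssim
\big((1+|t|)^{-\frac{n-1}{2}}\big)^{\theta}\,\|f\|_{L^{r'}}
= (1+|t|)^{-\frac{n-1}{2}\left(1-\frac2r\right)}\,\|f\|_{L^{r'}} ,
\qquad 2 \le r \le \infty .
\]
```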
(1 + |t|)
n1
2
kf kL1
(359)
Proof We prove directly the stronger version (359). We only need to check (??).
We write,
$$U(t)f = K_t * f, \qquad K_t(x) = \int e^{i(x\cdot\xi+t|\xi|)}\,\chi(\xi)\,d\xi.$$
It suffices to show that,
$$|K_t(x)| \lesssim \frac{1}{(1+|t|+|x|)^{\frac{n-1}{2}}}.$$
In the regions $|x| < |t|/2$ and $|x| \ge 2|t|$ we integrate by parts $k$ times with respect to
the operator $L = -i\sum_j \frac{x_j + t\,\xi_j/|\xi|}{\big|x + t\,\xi/|\xi|\big|^2}\,\partial_{\xi_j}$, such that $L(e^{i(x\cdot\xi+t|\xi|)}) = e^{i(x\cdot\xi+t|\xi|)}$. We also
n+1
2
On the other hand, in the region |t| |x|, we write, with (||) vanishing on the
support of h ,
Z 1+2
Z
Kt (x) =
eit ()
eix h ()d()
||=
12
(360)
||=
which follows easily from the decay of the Fourier transform of measures supported
on $S^{n-1}$ discussed in the previous section, see lemma 8.8. Therefore, for $|t| \sim |x|$,
$$|K_t(x)| \lesssim (1+|x|)^{-\frac{n-1}{2}} \lesssim (1+|t|)^{-\frac{n-1}{2}}$$
as desired.
We are now ready to prove (354) by following the same argument as in the second
proof of the restriction theorem. Indeed, in view of (352) and (358) we derive,
$$\|CC^*F(t)\|_{L^r_x} \lesssim \int_{-\infty}^{+\infty} (1+|t-s|)^{-\gamma(r)}\,\|F(s)\|_{L^{r'}_x}\,ds \tag{361}$$
where $\gamma(r) = \frac{n-1}{2}\big(1-\frac2r\big)$. The Hardy-Littlewood-Sobolev inequality in the time variable then gives (354)
when $-\gamma(r) + 1 + 1/q = 1/q'$, hence $\gamma(r) = 2/q$. This proves (348), and thus
theorem 0.39, in the case $0 < \gamma(r) = 2/q < 1$. If $q = 2$ and $\gamma(r) > 1$ we have from
(361),
$$\|CC^*F\|_{L^2_tL^r_x} \lesssim \|F\|_{L^2_tL^{r'}_x},$$
by an application of the standard Hausdorff-Young inequality.
Finally, if $2/q < 1$ and $\gamma(r) > 2/q$ the result follows from the case $\gamma(r) = 2/q$ using
Sobolev inequalities. Due to the fact that one of the principal curvatures of the
light cone vanishes, the Strichartz estimates for the wave equation are not as strong
as they could be. Using the improved dispersive estimate (359) we can however derive
a stronger statement, which is very useful in applications.
Proposition 0.41. Let $0 < \mu < 1$. Let $f$ be an $L^2$ function with Fourier transform
supported in a cube of size $\mu$ at a distance 1 from the origin. Let $(q, r)$ be an
admissible pair of exponents for the Strichartz estimates. Then
$$\|Cf\|_{L^q_tL^r_x} \lesssim \mu^{\left(\frac12-\frac1r\right)}\,\|f\|_{L^2}. \tag{362}$$
The proof is based on the improved dispersive estimate (359). Interpolating it with
(356) we derive,
$$\|U(t)f\|_{L^r} \lesssim \mu^{1-\frac2r}\,(1+|t|)^{-\frac{n-1}{2}\left(1-\frac2r\right)}\,\|f\|_{L^{r'}}$$
and therefore, by the $TT^*$ argument, $\|Cf\|_{L^q_tL^r_x} \lesssim \mu^{\left(\frac12-\frac1r\right)}\|f\|_{L^2}$, as desired. As a
straightforward corollary to the proposition we derive:
Finally we state below another result, which follows easily from the decay estimate
(357).
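Theorem 0.43. Let $u$ be a free wave, i.e. solution of the homogeneous equation
$\Box u = 0$, with initial data $(f, g)$. Then,
$$\|u(t)\|_{L^\infty} \lesssim |t|^{-\frac{n-1}{2}}\sum_{\lambda\in2^{\mathbb{Z}}}\Big(\lambda^{\frac{n+1}{2}}\|f_\lambda\|_{L^1} + \lambda^{\frac{n-1}{2}}\|g_\lambda\|_{L^1}\Big) \lesssim |t|^{-\frac{n-1}{2}}\Big(\|f\|_{\dot B^{\frac{n+1}{2}}_{1,1}} + \|g\|_{\dot B^{\frac{n-1}{2}}_{1,1}}\Big).$$
The uniform decay rate $|t|^{-\frac{n-1}{2}}$, for large $t$, plays a very important role in the study
of nonlinear perturbations of the standard wave equation.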
0.44. Inhomogeneous Strichartz estimates. We have already reduced the
inhomogeneous Strichartz estimate (332) of theorem 0.32 to estimate (345). Proceeding as in the case of the homogeneous estimates we can now reduce (345) to
the case when the spatial Fourier transform of $F$ is supported in the unit dyadic
ring $1/2 \le |\xi| \le 2$. Moreover, decomposing $u$ as before into the parts $u^+$ and $u^-$, it suffices to
prove the estimates separately for $u^+$ and $u^-$. Therefore we need to prove,
ku+ |Lq ([0,T ];Lr ) + kD1 t u+ kLq ([0,T ];Lr )
(364)
We have,
Z
u+ (t, )
=
0
D1 t u+ (t, )
Z
=
(CC )R F (t, ) =
U (t s)F (s, )ds
0
. kF k
q0
r0
(365)
Proof The proof is straightforward in the case $(q_1, r_1) = (q_2, r_2) = (q, r)$. Indeed
in this case we can simply repeat the proof of estimate (354) and just take into
account the limits of integration. We have also treated the case when $q_1 = \infty$,
$r_1 = 2$, see the subsection on energy estimates. The other non-diagonal cases
are a little more difficult and will be treated in the more general abstract setting
discussed later in this section. The proof we have given covers however the most
interesting case of estimate (332). We have thus given complete proofs for the first
two parts of theorem 0.32.
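$$|\xi| - \xi_1 = \frac{|\xi'|^2}{|\xi| + \xi_1} \lesssim \delta^2.$$
We can then choose a region of space-time $R$ defined by
$$|t| \lesssim \delta^{-2}, \qquad |t + x_1| \lesssim 1, \qquad |x'| \lesssim \delta^{-1},$$
such that, when $(t,x) \in R$ and $\xi \in D$, then the oscillatory factor inside the last
integral can be treated as a constant. Hence, $|Cf(t,x)| \gtrsim |D|$ for $(t,x) \in R$ and we
have
$$\frac{\|Cf\|_{L^q_tL^r_x}}{\|f\|_{L^2}} \gtrsim \frac{|D|\,\|\chi_R\|_{L^q_tL^r_x}}{|D|^{1/2}} \simeq \delta^{\frac{n-1}{2}-\frac2q-\frac{n-1}{r}}.$$
In the limit $\delta \to 0$, an estimate of the form (354) will necessarily imply that $q$ and
$r$ satisfy the condition
$$\frac2q \le (n-1)\Big(\frac12 - \frac1r\Big). \tag{366}$$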
The other restriction on the range for $q$, i.e. $q \ge 2$, is a consequence of the invariance of the operator $CC^*$ under time translations. Indeed for translation invariant
operators we have the following general result due to Hörmander, [?].
Proposition 0.47. Let $T : L^p(\mathbb{R}^n) \to L^q(\mathbb{R}^n)$ be a (non trivial) linear operator
which commutes with translations, in the sense that $(Tf)\circ\tau_y = T(f\circ\tau_y)$, where
$\tau_y(x) = x + y$, for $x, y \in \mathbb{R}^n$. If $T$ is bounded from $L^p$ to $L^q$ then we necessarily
have $q \ge p$.
|y|
|y|
|y|
hence
$$\lim_{|y|\to\infty}\|f + f\circ\tau_y\|_{L^p} = \lim_{|y|\to\infty} 2^{1/p}\,\|g_R\|_{L^p} = 2^{1/p}\,\|f\|_{L^p}.$$
Proof of Proposition 0.47. Let $C > 0$ be the optimal constant for the
estimate
$$\|Tf\|_{L^q} \le C\|f\|_{L^p}, \qquad \forall f \in L^p.$$
Then by linearity and the translation invariance,
$$\|Tf + (Tf)\circ\tau_y\|_{L^q} \le C\|f + f\circ\tau_y\|_{L^p}.$$
When $|y| \to \infty$, applying the lemma we obtain
$$2^{1/q}\,\|Tf\|_{L^q} \le C\,2^{1/p}\,\|f\|_{L^p}, \qquad \forall f \in L^p.$$
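The conclusion $q \ge p$ then follows from the optimality of $C$; spelled out as a short sketch:

```latex
% From 2^{1/q}\|Tf\|_{L^q} \le C\,2^{1/p}\|f\|_{L^p} for all f:
\[
\|Tf\|_{L^q} \le 2^{\frac1p-\frac1q}\,C\,\|f\|_{L^p}.
\]
% C is the optimal constant, and C > 0 since T is non trivial, hence
\[
C \le 2^{\frac1p-\frac1q}\,C
\quad\Longrightarrow\quad
2^{\frac1p-\frac1q} \ge 1
\quad\Longleftrightarrow\quad q \ge p .
\]
```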
(367)
(368)
(369)
for $r \ge 2$, where
$$\gamma(r) = \gamma_0\Big(1 - \frac2r\Big).$$
Theorem 0.50. If the evolution operator $U(t)$ satisfies (367) and (368), then the
estimates
$$\|U(t)f\|_{L^q_tL^r_X} \lesssim \|f\|_H, \tag{370}$$
hold for all $q, r \ge 2$ verifying:
$$\frac2q = \gamma(r). \tag{371}$$
Remark 0.51. This form of the Strichartz inequalities applies to linear dispersive
equations such as Schrödinger.
Proof If we consider the operator $T : H \to L^q_tL^r_X$ defined by $Tf(t,x) = (U(t)f)(x)$
then it is easy to verify that the dual of $T$ is the operator $T^* : L^{q'}_tL^{r'}_X \to H$ given
by $T^*F = \int U^*(s)F(s,\cdot)\,ds$. By the $TT^*$ method, (370) is then equivalent to the
estimate
$$\Big\|\int U(t)U^*(s)F(s)\,ds\Big\|_{L^q_tL^r_X} \lesssim \|F\|_{L^{q'}_tL^{r'}_X}. \tag{372}$$
(373)
(374)
s<t
(375)
(376)
$$\iint \frac{\|F(t)\|_{L^{r'}}\,\|G(s)\|_{L^{r'}}}{(1+|t-s|)^{\gamma(r)}}\,ds\,dt. \tag{377}$$
Now we can obtain (373) from Young's inequality when $2/q = 1/p$ and $(1+|t|)^{-\gamma(r)} \in L^p(\mathbb{R})$, i.e. $\gamma(r)p > 1$. Hence, (376) allows us to extend the Strichartz
estimates (370) in Theorem 0.50 to the range
$$\frac2q \le \gamma(r), \qquad (q, r, \gamma_0) \ne (2, \infty, 1). \tag{378}$$
This case applies to the linear wave equations.
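The exponent relation $2/q = 1/p$ comes from Young's convolution inequality in the time variable. A short sketch, with the kernel $k(t) = (1+|t|)^{-\gamma(r)}$ and the function $h(s) = \|F(s)\|_{L^{r'}}$ as notation introduced here:

```latex
\[
\|k * h\|_{L^{q}(\mathbb{R})} \le \|k\|_{L^{p}(\mathbb{R})}\,\|h\|_{L^{q'}(\mathbb{R})},
\qquad 1 + \frac1q = \frac1p + \frac1{q'} = \frac1p + 1 - \frac1q
\;\Longleftrightarrow\; \frac1p = \frac2q .
\]
\[
\|k\|_{L^p}^p = \int_{\mathbb{R}} (1+|t|)^{-\gamma(r)p}\,dt < \infty
\;\Longleftrightarrow\; \gamma(r)\,p > 1
\;\Longleftrightarrow\; \gamma(r) > \frac2q .
\]
```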
Remark 0.53. We observe that there is a natural scaling associated to the objects
in this abstract formulation. More precisely, the estimates (370) in Theorem 0.50
are invariant under the change of scale defined by
U (t) U (t/),
U (s) U (s/),
d 0 d,
r=
Observe that the operator $R$ looks very similar to the $TT^*$ operator, which is given by
Z
Proof First of all observe that in the proof of theorem 0.50 we have actually
proved the diagonal case $(q, r) = (\tilde q, \tilde r)$. Indeed, the bilinear form defined in (374)
can be written as $B(F, G) = \iint R(F)\cdot G\,dx\,dt$ and (373) is the dual formulation of
the mapping property for $R$.
The non diagonal cases with $\frac1q + \frac1{\tilde q} < 1$ follow from the mapping properties of $TT^*$
by using a general argument about integral operators due to Christ and Kiselev
(see [] and []) which we summarize in Proposition 0.57 below.
It remains to consider the cases with $q = \tilde q = 2$ and $r \ne \tilde r$, under the assumption
that the evolution $U(t)$ satisfies the stronger dispersive inequality (376) with $\gamma_0 > 1$.
Since we have already proved the case $r = \tilde r$, by interpolation it is enough to
consider the extreme case: $r = \frac{2\gamma_0}{\gamma_0 - 1}$, $\tilde r = \infty$, and show that
$$|B(F, G)| \lesssim \|F\|_{L^2_tL^{r'}_X}\,\|G\|_{L^2_tL^1_X}.$$
2Z
B ,
(380)
/2|ts|2
Proof We may assume that $F$ and $G$ are supported on disjoint time intervals of
length $O(\lambda)$ separated by a distance $O(\lambda)$. Then $B_\lambda(F, G) = \langle T^*F, T^*G\rangle_H$. We
use the energy estimate to bound $\|T^*F\|_H$ and the Strichartz estimate with $q = 2$
and $r = \infty$ to bound $\|T^*G\|_H$, so that
$$|B_\lambda(F, G)| \lesssim \|F\|_{L^1_tL^2_X}\,\|G\|_{L^2_tL^1_X}.$$
We then apply Hölder inequality and use the assumption on the support of $F$ to
obtain
$$|B_\lambda(F, G)| \lesssim \lambda^{1/2}\,\|F\|_{L^2_tL^2_X}\,\|G\|_{L^2_tL^1_X}.$$
We can also write B (F, G) =
dispersive inequality,
RRR
Again, we apply Hölder inequality and use the assumption on the support of $F$ and
$G$ to obtain
p
with constant (1+)
spaces we obtain that B
0 . By standard interpolation of L
r0
,
= +
C = /2
,
(1 + )0
r0
2
1
r =
20
.
0 1
20
. min , 1 ,
C =
1+
with
= min
0 + 1
0 + 1
,1
20
20
=
0 1
1
=
> 0.
20
r
0.56.1. Integral operators with restricted kernel. In this subsection we give a self
contained exposition of the results of Christ-Kiselev mentioned above. Consider
an integral operator with a measurable kernel $K(s,t)$,
$$Tf(t) = \int_{\mathbb{R}} K(s,t)f(s)\,ds,$$
and its restricted version associated with the kernel $K(s,t)\chi(s<t)$,
$$Rf(t) = \int_{s<t} K(s,t)f(s)\,ds.$$
If $T$ maps $L^p$ into $L^q$ and $1 \le p < q \le \infty$ then we have that $R$ also maps $L^p$ into
$L^q$. An equivalent formulation of this fact is given in the following proposition.
Proposition 0.57. Let $K(s,t)$ be a measurable function on $\mathbb{R}\times\mathbb{R}$. Let $B(f,g)$ be
the bilinear form with kernel $K$,
$$B(f,g) = \iint K(s,t)f(s)g(t)\,ds\,dt,$$
and $\tilde B(f,g)$ the bilinear form with kernel restricted to the region $s < t$,
$$\tilde B(f,g) = \iint_{s<t} K(s,t)f(s)g(t)\,ds\,dt.$$
If $B$ is bounded on $L^p\times L^q$ with
$$\frac1p + \frac1q > 1, \tag{381}$$
then $\tilde B$ is also bounded on $L^p\times L^q$,
$$|\tilde B(f,g)| \lesssim \|f\|_{L^p}\,\|g\|_{L^q}.$$
Remark 0.58. There are cases for which equality in condition (381) is not allowed.
Consider for example the case of the Hilbert transform, which corresponds to
the kernel $K(s,t) = \frac{1}{s-t}$, with $p = q = 2$.
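The link between the operator statement ($T$ maps $L^p$ into $L^q$ with $p < q$) and the bilinear statement of Proposition 0.57 is plain duality. In the sketch below the target exponent of the operator is written $\tilde q$, a renaming introduced here to avoid a clash with the exponent $q$ of the bilinear form.

```latex
% T : L^p -> L^{\tilde q} bounded  <=>  B(f,g) = \int (Tf)\,g\,dt bounded on
% L^p x L^{\tilde q'}, where 1/\tilde q + 1/\tilde q' = 1.  With q := \tilde q':
\[
\frac1p + \frac1q = \frac1p + 1 - \frac1{\tilde q} > 1
\quad\Longleftrightarrow\quad
\frac1p > \frac1{\tilde q}
\quad\Longleftrightarrow\quad
p < \tilde q .
\]
% Hilbert transform: p = \tilde q = 2, so 1/p + 1/q = 1 (the excluded equality case).
```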
Proof Let $f \in L^p$ and $g \in L^q$ with $\|f\|_{L^p} = \|g\|_{L^q} = 1$.
Define $F(t) = \int_{s<t}|f(s)|^p\,ds$. $F$ is a continuous non-decreasing function which
maps $[-\infty, +\infty]$ onto $[0, 1]$. In particular, the inverse image of an interval of the
type $I = [a, b] \subset [0, 1]$ will be an interval of the same type, $F^{-1}(I) = [A, B]$, with
$F(A) = a$, $F(B) = b$, and $\int_A^B |f(s)|^p\,ds = F(B) - F(A) = b - a$. Hence,
$$\|f\|_{L^p(F^{-1}(I))} = |I|^{1/p}. \tag{382}$$
Consider now a Whitney decomposition of the set $\Omega = \{(x,y) \in \mathbb{R}^2 : x < y\}$ into
disjoint dyadic squares, as in Lemma 6.7, $\Omega = \bigcup_Q Q$, where each square $Q = I\times J$
has the property
$$\mathrm{dist}(I, J) \simeq |I| = |J| = \lambda, \tag{383}$$
for some dyadic value of $\lambda$. If we look only at those squares needed to cover the
triangle $\Omega\cap[0,1]^2$, then $\lambda \le 1/2$.
Observe that $s < t$ implies that either $F(s) < F(t)$ or $f \equiv 0$ almost everywhere on
the interval $[s, t]$. Hence, we can write
$$\tilde B(f,g) = \iint_{F(s)<F(t)} K(s,t)f(s)g(t)\,ds\,dt = \sum_Q B\big(\chi_{F^{-1}(I)}f,\ \chi_{F^{-1}(J)}g\big).$$
Now we use (382), (383) and the fact that, for each given dyadic interval $J$, the
number of intervals $I$ for which $I\times J$ is one of the squares in the decomposition of
$\Omega$ is bounded by a universal constant. Hence,
$$|\tilde B(f,g)| \lesssim \sum_{\lambda\le1/2}\lambda^{\frac1p}\sum_{|J|=\lambda}\|g\|_{L^q(F^{-1}(J))}.$$
Next, we apply Hölder's inequality to the summation over the dyadic intervals $J$ of
length $\lambda$ and since there are $\lambda^{-1}$ of them in $[0, 1]$ we have
$$|\tilde B(f,g)| \lesssim \sum_{\lambda\le1/2}\lambda^{\frac1p}\,\lambda^{-\frac1{q'}}\,\|g\|_{L^q} = \sum_{\lambda\le1/2}\lambda^{\frac1p+\frac1q-1} \lesssim 1.$$
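The final sum is a convergent geometric series precisely because of the strict inequality in (381). Writing $\varepsilon = \frac1p + \frac1q - 1 > 0$ (a symbol introduced here) and $\lambda = 2^{-j}$:

```latex
\[
\sum_{\lambda \le 1/2} \lambda^{\frac1p+\frac1q-1}
= \sum_{j=1}^{\infty} 2^{-j\varepsilon}
= \frac{1}{2^{\varepsilon}-1} < \infty ,
\qquad \varepsilon = \frac1p+\frac1q-1 > 0 .
\]
% For \varepsilon = 0 (e.g. p = q = 2) every term equals 1 and the sum diverges,
% consistent with Remark 0.58.
```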
CHAPTER 9
Bilinear Estimates
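Writing $u = u^+ + u^-$
it suffices to prove,
$$\|u^+\|_{L^4(\mathbb{R}^{1+3})} \lesssim \|f^+\|_{\dot H^{1/2}} \tag{384}$$
where
$$u^+(t,x) = \int e^{i(x\cdot\xi+t|\xi|)}\,\widehat{f^+}(\xi)\,d\xi.$$
Clearly,
$$\|u^+\|^2_{L^4(\mathbb{R}^{1+3})} = \|u^+u^+\|_{L^2} = \|\widetilde{u^+u^+}\|_{L^2} \tag{385}$$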
Therefore the bilinear estimate is an immediate consequence of the uniform boundedness of $J$. This follows from the more general lemma below.
Lemma 1.2. Let F be an arbitrary function of two variables and JF the integral
Z
JF (, ) =
( || | |)F (||, | |)
Rn
Then,
JF (, )
JF+ (, )
( 2 ||2 )
n3
2
F
1
( 2 ||2 )
n3
2
F
1
n3
+ s|| + s|| 2
,
( x2 ||2 )(1 |x|2 ) 2 dx,
2
2
(386)
n3
+ s|| + s|| 2
,
( x2 ||2 )(1 |x|2 ) 2 dx
2
2
(387)
( ||)2 | |2
2( ||)
2( ||) ( ||)2 | |2
with a the cosine of the angle between and . Thus, for fixed and we must
have, on the support of the measure,
a
2 ||2 2
2||
(388)
Observe that in the ellipsoidal case a can take any values in the interval [1, 1] and
2 ||2
||
thus, since = 2(
+||
a||) , we have
2
2 . On the other hand, in the
2
2
hyperboloidal case when || > , we must also have the restriction,
a.
||
and thus, =
2 +||2
2( +a||)
+|
2 .
2.
n3
2
F (, )( )n2 1
d
JF =
|| ||
2||
2
n3 Z +||
2
+ ||
( 2 ||2 ) 2
|| n3
2
=
F (, )( )
n2
||
||
2
2
2
At last we perform the change of variables x = 2
|| to derive the desired formula
(386). The proof for (387) follows in the same manner.
kf kH a kgkH a
with $(q, r)$ an acceptable pair. By dimensional analysis and recalling the exponent
$\sigma = n\big(\frac12 - \frac1r\big) - \frac1q$ in (331), we must have,
$$2a = b + 2\sigma = b + 2\Big(n\Big(\frac12 - \frac1r\Big) - \frac1q\Big) \tag{389}$$
<
X b
kf kH a kg kH a
. kf kH a kgkH a
By symmetry,
$$\|D^b(uv)_{LH}\|_{L^{q/2}_tL^{r/2}_x} \lesssim \|f\|_{H^a}\,\|g\|_{H^a}.$$
It thus only remains to estimate the high-high term $\|(u\cdot v)_{HH}\|_{L^{q/2}_tL^{r/2}_x}$. This
requires a more subtle argument based on theorem ??. We write,
X
kDb (u v)HH kLq/2 Lr/2 .
b kP (u v )kLq/2 Lr/2
x
(390)
we would derive,
kDb (u v)HH kLq/2 Lr/2 .
t
b b kf kH a kgkH a
which diverges. We need to replace (390) by a stronger estimate which takes into
account the presence of $P_\mu$ in front of $u_\lambda v_\lambda$. To achieve this, we need first to exploit
some orthogonality properties. We decompose the data $f_\lambda, g_\lambda$, in Fourier space,
into pieces supported on cubes of size $\mu$, $f_\lambda = \sum_Q f_Q$, $g_\lambda = \sum_Q g_Q$ and denote by
$u_Q, v_Q$ the corresponding solutions.
Clearly the decomposition commutes with the
wave operator $\Box$. Thus, $u_\lambda = \sum_Q u_Q$, $v_\lambda = \sum_Q v_Q$ and
$$P_\mu(u_\lambda v_\lambda) = \sum_{Q_1,Q_2} P_\mu\big(u_{Q_1}v_{Q_2}\big)$$
Hence,
kP (u v )kLq/2 Lr/2 .
t
.
.
.
1 r2
kf kH kg kH
2
1 r 22a
kf kH a kg kH a
2
1 r b
kf kH a kg kH a
and, consequently,
kDb (u v)HH kLq/2 Lr/2
t
X 1 2 b
r
kf kH a kg kH a
<
. kf kH a kgkH a
provided that b < 1
[?].
2
r.
. ku[0]kH a kv[0]kH a
(391)
1 1
1
= n( )
2 r
q
(392)
n+1
2
(Rn )
(393)
Remark 3.3. Without loss of generality, it suffices to consider the reduced initial
value problems
$$u(0,x) = f(x), \qquad v(0,x) = g(x), \qquad \partial_tu(0,x) = \partial_tv(0,x) = 0 \tag{394}$$
In what follows we show how to deduce the estimate (3.2) from a more general form
of bilinear estimates presented in the next section.
|| , (| | + ||) , | | || .
1
1
k(uv)kL2 (Rn+1 ) = (2)n k( 2 ||2 )f
uvkL2 (Rn+1 )
2
2
. kD+ D (uv)kL2 (Rn+1 )
Therefore,
kQ0 (u, v)kL2 (Rn+1 )
(395)
Thus, in the case of the null form Q0 , theorem 3.2 reduces to,
kD+ D (uv)kL2 (Rn+1 ) . ku[0]kH 1 (Rn ) kv[0]k
n+1
2
(Rn )
(396)
1/2
1/2 1/2
kD+ D (D1/2 u0
1/2 0
v )kL2 (Rn+1 )
(397)
(398)
4(|||| )(|||| + )
((|| + || | + |)((|| + || + | + |)
(| + | || ||)(| + | + || ||)
(399)
(400)
According to proposition 3.6, theorem 3.2 reduces, for Q = Qij , resp. Q = Q0i , to
the statements,
1/2
1/2 1/2
kD+ D (u
v)kL2 (Rn+1 )
APPENDIX A
In what follows we give a short overview of the basic notions in Riemannian and
Lorentzian geometry. For a more detailed review we refer to [Pet], for Riemannian
geometry, and [Car], [Ha-E], [Wa] for Lorentzian geometry.
1. Introduction
A pseudo-riemannian manifold, or simply a spacetime, consists of a pair $(M, g)$
where $M$ is an orientable $p + q = n$-dimensional manifold and $g$ is a pseudo-riemannian metric defined on it, that is a smooth, non degenerate, 2-covariant
symmetric tensor field of signature $(p, q)$. This means that at each point $p \in M$ one
can choose a basis of $p + q$ vectors, $\{e_{(\alpha)}\}$, belonging to the tangent space $TM_p$,
such that
$$g(e_{(\alpha)}, e_{(\beta)}) = \eta_{\alpha\beta} \tag{401}$$
for all $\alpha, \beta = 0, 1, \dots, n$, where $\eta$ is the diagonal matrix with $-1$ in the first $p$ entries
and $+1$ in the last $q$ entries. If $X$ is an arbitrary vector at $p$ expressed, in terms of
the basis $\{e_{(\alpha)}\}$, as $X = X^\alpha e_{(\alpha)}$, we have
$$g(X, X) = -(X^1)^2 - \dots - (X^p)^2 + (X^{p+1})^2 + \dots + (X^{p+q})^2 \tag{402}$$
(403)
is, respectively, negative, zero or positive. The set of null vectors $N_p$ forms a double
cone, called the null cone of the corresponding point $p$. The set of timelike vectors
$I_p$ forms the interior of this cone. The vectors in the union of $I_p$ and $N_p$ are called
causal. The set $S_p$ of spacelike vectors is the complement of $I_p \cup N_p$.
A frame $e_{(\alpha)}$ verifying (401) is said to be orthonormal. In the case of Lorentzian
manifolds it makes sense to consider, in addition to orthonormal frames, null frames.
These are collections of vectorfields $e_\alpha$ consisting of two null vectors $e_{n+1}, e_n$ and
orthonormal spacelike vectors $(e_a)_{a=1,\dots,n-1}$ which verify,
we use greek letters $\alpha, \beta, \gamma, \dots$ to denote indices $0, 1, \dots, n$. For a general pseudo-riemannian metric of signature $s$ we shall also use greek indices.
We will review the following topics below:
1.)
2.)
3.)
4.)
5.)
6.)
7.)
8.)
Curvature tensor of a pseudo-riemannian manifold. Symmetries. First and
second Bianchi identities.
9.)
Isometries and conformal isometries. Killing and conformal Killing vectorfields.
iii)
$\mathcal{L}_X$ linearly maps $(p, q)$-tensor fields into tensor fields of the same type.
\[ \frac{\partial x^i(\phi_t(q))}{\partial x^j(q)}\; Y^j \Big|_q , \]
where $q = \phi_t(p)$. (See Hawking and Ellis, [Ha-E], section 2.4 for details.)
If $A$ is a $k$-form we have, as a consequence of the commutation formula of the
exterior derivative with the pull-back $\phi^*$,
\[ d(\mathcal{L}_X A) = \mathcal{L}_X (dA) . \]
For a given $k$-covariant tensorfield $T$ we have,
\[ \mathcal{L}_X T(Y_1, \ldots, Y_k) = X\,T(Y_1, \ldots, Y_k) - \sum_{i=1}^{k} T(Y_1, \ldots, \mathcal{L}_X Y_i, \ldots, Y_k) \]
We remark that the Lie bracket of two coordinate vector fields vanishes,
\[ \Big[ \frac{\partial}{\partial x^\alpha}, \frac{\partial}{\partial x^\beta} \Big] = 0. \]
The converse is also true, namely, [Sp], Vol.I, Chapter 5,
Proposition 2.3. If $X_{(0)}, \ldots, X_{(k)}$ are linearly independent vector fields in a neighbourhood of a point $p$ and the Lie bracket of any two of them is zero then there exists
a coordinate system $x^\alpha$, around $p$ such that $X_{(\alpha)} = \frac{\partial}{\partial x^\alpha}$ for each $\alpha = 0, \ldots, k$.
Proof When $k = 2$ the proposition is equivalent to the fact that if two linearly
independent vector fields $X, Y$ commute, i.e. $[X, Y] = 0$, then their flows also
commute in the following sense. Starting with any point $p$, if we go a parameter
distance $t$ along the integral curve of $X$ initiating at $p$ and then a parameter distance
$s$ along the integral curve of $Y$, we arrive at the same point as if we first go a distance
$s$ along the integral curve of $Y$ and then a distance $t$ along the integral curve of $X$.
To prove the assertion we first consider coordinates $t, x$ relative to which $X = \partial_t$.
Since $[X, Y] = 0$, the coefficients of $Y$, relative to the same coordinates, are $t$-independent. We then
perform another change of coordinates $x = x(y)$ such that $Y = A(y)\partial_t + \partial_{y^1}$.
Finally we can make another change of coordinates of the form $s = -a(y) + t$, with
$a$ the primitive of $A$ relative to $y^1$. In the new coordinates $s, y$ we must have
$X = \partial_s$, $Y = \partial_{y^1}$ as desired. The general case can be proved in the same manner.
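The last change of coordinates in the proof can be checked with the chain rule. The following is only a sketch: it takes $s = t - a(y)$ with $\partial a / \partial y^1 = A$, and the notation $\partial_{y^1}|_t$, $\partial_{y^1}|_s$ (derivative at fixed $t$, respectively fixed $s$) is introduced here for the computation.

```latex
% Sketch: new coordinates (s, y) with s = t - a(y), \partial a/\partial y^1 = A(y).
\begin{align*}
  \partial_t \big|_{y}
    &= \frac{\partial s}{\partial t}\, \partial_s = \partial_s , \\
  \partial_{y^1} \big|_{t}
    &= \partial_{y^1} \big|_{s} + \frac{\partial s}{\partial y^1}\, \partial_s
     = \partial_{y^1} \big|_{s} - A(y)\, \partial_s .
\end{align*}
% Substituting into X = \partial_t and Y = A(y)\partial_t + \partial_{y^1}|_t:
\begin{align*}
  X &= \partial_s , \\
  Y &= A(y)\, \partial_s + \partial_{y^1} \big|_{s} - A(y)\, \partial_s
     = \partial_{y^1} \big|_{s} .
\end{align*}
```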
The above proposition is the main step in the proof of Frobenius Theorem. To state
the theorem we recall the definition of a $k$-distribution in $M$. This is an arbitrary
smooth assignment of a $k$-dimensional plane $\Pi_p$ at every point in a domain $U$ of
$M$. The distribution is said to be involute if, for any vector fields $X, Y$ on $U$ with
$X|_p, Y|_p \in \Pi_p$, for any $p \in U$, we have $[X, Y]|_p \in \Pi_p$. This is clearly the case for
integrable distributions$^{3}$. Indeed if $X|_p, Y|_p \in TN_p$ for all $p \in N$, then $X, Y$ are
tangent to $N$ and so is also their commutator $[X, Y]$. The Frobenius Theorem
establishes that the converse is also true$^{4}$, that is being in involution is also a
sufficient condition for the distribution to be integrable,
Theorem 2.4. (Frobenius Theorem) A necessary and sufficient condition for a
distribution $(\Pi_p)_{p \in U}$ to be integrable is that it is involute.
Proof If $\Pi$ is involute we can find linearly independent vector fields $X_1, \ldots, X_k$
spanning $\Pi$ at every point and such that the Lie bracket of any two of them is a
linear combination of $X_1, \ldots, X_k$. One can then redefine $X_1'$ as a linear combination
of the other vector fields such that it commutes with $X_2, \ldots, X_k$. One can then
proceed by induction to redefine the other vectorfields such that they all commute.
The proof then follows from the previous proposition.
3 Recall that a distribution $\Pi$ on $U$ is said to be integrable if through every point $p \in U$ there
passes a unique submanifold $N$, of dimension $k$, such that $\Pi_p = TN_p$.
4 For a proof see Spivak, [Sp], Vol.I, Chapter 6.
(404)
c) $D_X (fY) = X(f)\,Y + f\,D_X Y$
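Therefore, at a point $p$,
\[ DY = Y^\alpha{}_{;\beta}\; \theta^{(\beta)} \otimes e_{(\alpha)} \tag{405} \]
where the $\theta^{(\beta)}$ are the one-forms of the dual basis with respect to the orthonormal frame
$e_{(\beta)}$. Observe that $Y^\alpha{}_{;\beta} = \theta^{(\alpha)}(D_{e_{(\beta)}} Y)$. On the other side, from c),
\[ D(fY) = df \otimes Y + f\,DY \]
so that
\[ DY = D(Y^\alpha e_{(\alpha)}) = dY^\alpha \otimes e_{(\alpha)} + Y^\alpha\, De_{(\alpha)} \]
and finally, using $df(\cdot) = e_{(\alpha)}(f)\,\theta^{(\alpha)}(\cdot)$,
\[ DY = \Big( e_{(\beta)}(Y^\alpha) + Y^\gamma\, \theta^{(\alpha)}(D_{e_{(\beta)}} e_{(\gamma)}) \Big)\, \theta^{(\beta)} \otimes e_{(\alpha)} \tag{406} \]
Therefore
\[ Y^\alpha{}_{;\beta} = e_{(\beta)}(Y^\alpha) + \Gamma^\alpha_{\beta\gamma} Y^\gamma, \qquad \Gamma^\alpha_{\beta\gamma} = \theta^{(\alpha)}(D_{e_{(\beta)}} e_{(\gamma)}) \tag{407} \]
which, in a coordinate basis, are the usual Christoffel symbols and have the expression
\[ \Gamma^\alpha_{\beta\gamma} = dx^\alpha \Big( D_{\frac{\partial}{\partial x^\beta}} \frac{\partial}{\partial x^\gamma} \Big) \]
Finally
\[ D_X Y = \Big( X(Y^\alpha) + \Gamma^\alpha_{\beta\gamma} X^\beta Y^\gamma \Big)\, e_{(\alpha)} \tag{408} \]
In the particular case of a coordinate frame we have
\[ D_X Y = \Big( X^\beta \frac{\partial Y^\alpha}{\partial x^\beta} + \Gamma^\alpha_{\beta\gamma} X^\beta Y^\gamma \Big) \frac{\partial}{\partial x^\alpha} \]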
A connection is said to be a Levi-Civita connection if $Dg = 0$. That is, for any
three vector fields $X, Y, Z$,
\[ Z(g(X, Y)) = g(D_Z X, Y) + g(X, D_Z Y) \tag{409} \]
A very simple and basic result of differential geometry asserts that for any given
(pseudo-riemannian) metric there exists a unique affine connection associated to it.
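\[ \Gamma^\alpha_{\beta\gamma} = \frac{1}{2}\, g^{\alpha\delta} \big( \partial_\beta g_{\gamma\delta} + \partial_\gamma g_{\beta\delta} - \partial_\delta g_{\beta\gamma} \big) . \]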
So far we have only defined the covariant derivative of a vector field. We can
easily extend the definition to one forms $A = A_\alpha dx^\alpha$ by the requirement that,
\[ X(A(Y)) = D_X A(Y) + A(D_X Y), \]
for all vectorfields $X, Y$. Given a $k$-covariant tensor field $T$ we define its covariant
derivative $D_X T$ by the rule,
\[ D_X T(Y_1, \ldots, Y_k) = X\,T(Y_1, \ldots, Y_k) - \sum_{i=1}^{k} T(Y_1, \ldots, D_X Y_i, \ldots, Y_k) \]
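The defining requirement for one forms can be unpacked in a coordinate frame. The sketch below assumes the index convention $D_{\partial_\beta} \partial_\gamma = \Gamma^\alpha_{\beta\gamma} \partial_\alpha$ for the connection coefficients; with the opposite ordering of the lower indices the formula changes accordingly.

```latex
% Sketch: take X = \partial_\beta and Y = \partial_\gamma in
%   X(A(Y)) = (D_X A)(Y) + A(D_X Y),
% assuming D_{\partial_\beta}\partial_\gamma = \Gamma^\alpha_{\beta\gamma}\partial_\alpha.
\begin{align*}
  \partial_\beta A_\gamma
    &= (D_{\partial_\beta} A)(\partial_\gamma)
       + A\big( \Gamma^\alpha_{\beta\gamma}\, \partial_\alpha \big)
     = (D_\beta A)_\gamma + \Gamma^\alpha_{\beta\gamma} A_\alpha , \\
  (D_\beta A)_\gamma
    &= \partial_\beta A_\gamma - \Gamma^\alpha_{\beta\gamma} A_\alpha , \\
  (D_X A)_\gamma
    &= X^\beta \big( \partial_\beta A_\gamma - \Gamma^\alpha_{\beta\gamma} A_\alpha \big) .
\end{align*}
```

The sign in front of the connection term is opposite to the one for vector fields, which is what makes $X(A(Y))$ obey the Leibniz rule.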
Given a smooth curve $x : [0, 1] \to M$, parametrized by $t$, let $T = \big( \frac{\partial}{\partial t} \big)_x$ be the
corresponding tangent vector field along the curve. A vector field $X$, defined on
the curve, is said to be parallelly transported along it if $D_T X = 0$. If the curve
has the parametric equations $x^\alpha = x^\alpha(t)$, relative to a system of coordinates, then
$T^\alpha = \frac{dx^\alpha}{dt}$ and the components $X^\alpha = X^\alpha(x(t))$ satisfy the ordinary differential
system of equations
\[ \frac{D}{dt} X^\alpha = \frac{dX^\alpha}{dt} + \Gamma^\alpha_{\beta\gamma}(x(t))\, \frac{dx^\beta}{dt}\, X^\gamma = 0 . \]
5 $[\alpha_1 \ldots \alpha_k]$ indicates the antisymmetrization with respect to all indices (i.e. $\frac{1}{k!}$ (alternating
sum of the tensor over all permutations of the indices)) and $,\alpha$ indicates the ordinary derivative
with respect to $x^\alpha$.
The curve is said to be geodesic if, at every point of the curve, $D_T T$ is tangent
to the curve, $D_T T = \lambda T$. In this case one can reparametrize the curve such that,
relative to the new parameter $s$, the tangent vector $S = \big( \frac{\partial}{\partial s} \big)_x$ satisfies $D_S S = 0$.
Such a parameter is called an affine parameter. The affine parameter is defined
up to a transformation $s = a s' + b$ for $a, b$ constants. Relative to an affine parameter
$s$ and arbitrary coordinates $x^\alpha$ the geodesic curves satisfy the equations
\[ \frac{d^2 x^\alpha}{ds^2} + \Gamma^\alpha_{\beta\gamma}\, \frac{dx^\beta}{ds}\, \frac{dx^\gamma}{ds} = 0 . \]
A geodesic curve parametrized by an affine parameter is simply called a geodesic.
In Lorentzian geometry timelike geodesics correspond to world lines of particles
freely falling in the gravitational field represented by the connection coefficients. In
this case the affine parameter s is called the proper time of the particle.
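The existence of an affine parameter can be made explicit. The sketch below writes the proportionality factor in $D_T T = \lambda T$ as a function $\lambda(t)$ along the curve and sets $f = dt/ds$; both names are introduced here only for the computation.

```latex
% Sketch: reparametrize t -> s(t), so that S = f T with f = dt/ds.
\begin{align*}
  D_S S &= f\, D_T (f T)
         = f \Big( \frac{df}{dt}\, T + f\, D_T T \Big)
         = f \Big( \frac{df}{dt} + \lambda f \Big) T .
\end{align*}
% Hence D_S S = 0 if and only if df/dt = -\lambda f, that is
\[
  f(t) = C \exp\Big( -\int_0^t \lambda(\tau)\, d\tau \Big),
  \qquad
  \frac{ds}{dt} = \frac{1}{C} \exp\Big( \int_0^t \lambda(\tau)\, d\tau \Big),
  \qquad C \neq 0 .
\]
% The constant C and the constant of integration for s account for the
% freedom s = a s' + b.
```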
Given a point $p \in M$ and a vector $X$ in the tangent space $T_pM$, let $x(t)$ be the
unique geodesic starting at $p$ with velocity $X$. We define the exponential map:
\[ \exp_p : T_pM \to M, \qquad \exp_p(X) = x(1) . \]
This map may not be defined for all $X \in T_pM$. The theorem of existence and
uniqueness for systems of ordinary differential equations implies that the exponential map is defined in a neighbourhood of the origin in $T_pM$. If the exponential
map is defined for all of $T_pM$, for every point $p$, the manifold $M$ is said to be geodesically
complete. In general if the connection is a $C^r$ connection$^{6}$ there exists an open
neighbourhood $U_0$ of the origin in $T_pM$ and an open neighbourhood of the point
$p$ in $M$, $V_p$, such that the map $\exp_p$ is a $C^r$ diffeomorphism of $U_0$ onto $V_p$. The
neighbourhood $V_p$ is called a normal neighbourhood of $p$.
(410)
(411)
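\[ R^\alpha{}_{\beta\gamma\delta} = \frac{\partial \Gamma^\alpha_{\delta\beta}}{\partial x^\gamma} - \frac{\partial \Gamma^\alpha_{\gamma\beta}}{\partial x^\delta} + \Gamma^\alpha_{\gamma\mu} \Gamma^\mu_{\delta\beta} - \Gamma^\alpha_{\delta\mu} \Gamma^\mu_{\gamma\beta} \tag{412} \]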
The fundamental property of the curvature tensor, first proved by Riemann, states
that if $R$ vanishes identically in a neighbourhood of a point $p$ one can find families
of local coordinates such that, in a neighbourhood of $p$, $g_{\mu\nu} = \eta_{\mu\nu}$$^{7}$.
The trace of the curvature tensor, relative to the metric $g$, is a symmetric tensor
called the Ricci tensor,
\[ R_{\alpha\beta} = g^{\mu\nu} R_{\alpha\mu\beta\nu} \]
The scalar curvature is the trace of the Ricci tensor
\[ R = g^{\alpha\beta} R_{\alpha\beta} . \]
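The Riemann curvature tensor of an arbitrary spacetime $(M, g)$ has the following
symmetry properties,
\[ R_{\alpha\beta\gamma\delta} = -R_{\beta\alpha\gamma\delta} = -R_{\alpha\beta\delta\gamma} = R_{\gamma\delta\alpha\beta} \tag{413} \]
\[ R_{\alpha\beta\gamma\delta} + R_{\alpha\gamma\delta\beta} + R_{\alpha\delta\beta\gamma} = 0 \tag{414} \]
The traceless part of the curvature tensor, $C$, is called the Weyl tensor, and has the
following expression in an arbitrary frame,
\begin{align*}
C_{\alpha\beta\gamma\delta} = R_{\alpha\beta\gamma\delta} &- \frac{1}{n-1} \big( g_{\alpha\gamma} R_{\beta\delta} + g_{\beta\delta} R_{\alpha\gamma} - g_{\beta\gamma} R_{\alpha\delta} - g_{\alpha\delta} R_{\beta\gamma} \big) \\
&+ \frac{1}{n(n-1)} \big( g_{\alpha\gamma} g_{\beta\delta} - g_{\alpha\delta} g_{\beta\gamma} \big) R \tag{415}
\end{align*}
Observe that $C$ verifies all the symmetry properties of the Riemann tensor:
\[ C_{\alpha\beta\gamma\delta} = -C_{\beta\alpha\gamma\delta} = -C_{\alpha\beta\delta\gamma} = C_{\gamma\delta\alpha\beta} \]
\[ C_{\alpha\beta\gamma\delta} + C_{\alpha\gamma\delta\beta} + C_{\alpha\delta\beta\gamma} = 0 \]
and, in addition,
\[ g^{\alpha\gamma} C_{\alpha\beta\gamma\delta} = 0 . \tag{416} \]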
We say that two metrics $g$ and $\hat g$ are conformal if $\hat g = \Lambda^2 g$ for some non zero
differentiable function $\Lambda$. Then the following theorem holds (see Hawking-Ellis,
[Ha-E], chapter 2, section 2.6):
Theorem 3.1. Let $\hat g = \Lambda^2 g$, $\hat C$ the Weyl tensor relative to $\hat g$ and $C$ the Weyl tensor
relative to $g$. Then
\[ \hat C^\alpha{}_{\beta\gamma\delta} = C^\alpha{}_{\beta\gamma\delta} . \]
${}^{(X)}\pi$ the deformation tensor,
\[ {}^{(X)}\pi_{\alpha\beta} = (\mathcal{L}_X g)_{\alpha\beta} = D_\alpha X_\beta + D_\beta X_\alpha . \]
The tensor ${}^{(X)}\pi$ measures, in a precise sense, how much the diffeomorphism generated by $X$ differs from an isometry or a conformal isometry. The following Proposition holds, (see Hawking-Ellis, [Ha-E], chapter 2, section 2.6):
Proposition 3.3. The vector field $X$ is Killing if and only if ${}^{(X)}\pi = 0$. It is
conformal Killing if and only if ${}^{(X)}\pi$ is proportional to $g$.
Remark: One can choose local coordinates such that $X = \frac{\partial}{\partial x^\mu}$. It then immediately follows that, relative to these coordinates the metric $g$ is independent of the
component $x^\mu$.
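The remark can be checked directly from the coordinate expression of the Lie derivative of the metric. The sketch below assumes that $X$ is a Killing field and writes the adapted coordinate as $x^{\bar\mu}$, with $\bar\mu$ a fixed index (no summation); this notation is introduced here only for the computation.

```latex
% Standard coordinate formula for the Lie derivative of a 2-covariant tensor:
\[
  (\mathcal{L}_X g)_{\alpha\beta}
    = X^\gamma\, \partial_\gamma g_{\alpha\beta}
      + g_{\gamma\beta}\, \partial_\alpha X^\gamma
      + g_{\alpha\gamma}\, \partial_\beta X^\gamma .
\]
% If X = \partial / \partial x^{\bar\mu}, its components X^\gamma = \delta^\gamma_{\bar\mu}
% are constant, so the last two terms vanish:
\[
  (\mathcal{L}_X g)_{\alpha\beta} = \frac{\partial g_{\alpha\beta}}{\partial x^{\bar\mu}} .
\]
% Hence X is Killing exactly when every component g_{\alpha\beta}
% is independent of x^{\bar\mu}.
```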
Proposition 3.4. On any pseudo-riemannian spacetime $M$, of dimension $n = p + q$, there can be no more than $\frac{1}{2}(p + q)(p + q + 1)$ linearly independent Killing
vector fields.
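Proof: Proposition 3.4 is an easy consequence of the following relation, valid for
an arbitrary vector field $X$, obtained by a straightforward computation and the use
of the symmetries of $R$.
\[ D_\beta D_\alpha X_\lambda = R_{\lambda\alpha\beta\delta} X^\delta + {}^{(X)}\Gamma_{\alpha\beta\lambda} \tag{417} \]
where
\[ {}^{(X)}\Gamma_{\alpha\beta\lambda} = \frac{1}{2} \big( D_\beta\, {}^{(X)}\pi_{\alpha\lambda} + D_\alpha\, {}^{(X)}\pi_{\beta\lambda} - D_\lambda\, {}^{(X)}\pi_{\alpha\beta} \big) \tag{418} \]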
(419)
and this implies, in view of the theorem of existence and uniqueness for ordinary
differential equations, that any Killing vector field is completely determined by the
$\frac{1}{2}(n + 1)(n + 2)$ values of $X$ and $DX$ at a given point. Indeed let $p, q$ be two points
connected by a curve $x(t)$ with tangent vector $T$. Let $L_{\alpha\beta} := D_\alpha X_\beta$. Observe that
along $x(t)$, $X, L$ verify the system of differential equations
\[ \frac{D}{dt} X = T \cdot L, \qquad \frac{D}{dt} L = R(\cdot, \cdot, X, T) \]
therefore the values of $X, L$ along the curve are uniquely determined by their values
at $p$.
The $n$-dimensional Riemannian manifold which possesses the maximum number of
Killing vector fields is the Euclidean space $\mathbb{R}^n$. Similarly the Minkowski spacetime $\mathbb{R}^{n+1}$ is the Lorentzian manifold with the maximum number of Killing vectorfields.
3.5. The volume form, Hodge duality and divergence theorem. Let
$(M, g)$ be an orientable pseudo-riemannian manifold $M$ of dimension $n$ and signature $s$. Let $e_{(\alpha)}$ be an arbitrary, positively oriented, frame on $M$ and $e^{(\alpha)}$ the corresponding dual frame. We can associate it with the $n$-form,
\[ e^{(1)} \wedge e^{(2)} \wedge \ldots \wedge e^{(n)} \]
If $e'_{(\alpha)} = M_\alpha^\beta\, e_{(\beta)}$ denotes another basis and $e'^{(\alpha)} = (M^{-1})^\alpha_\beta\, e^{(\beta)}$ the corresponding
dual basis of 1-forms, then,
(420)
(421)
\[ \epsilon_{\alpha_1 \ldots \alpha_n}\, \epsilon^{\alpha_1 \ldots \alpha_n} = n!\,(-1)^s \]
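Remark that the covariant derivative of $\epsilon$ vanishes. Indeed this follows easily by
differentiating the formula $\epsilon_{\alpha_1 \ldots \alpha_n} \epsilon^{\alpha_1 \ldots \alpha_n} = n!\,(-1)^s$ from which it follows that,
$\epsilon^{\alpha_1 \ldots \alpha_n} \epsilon_{\alpha_1 \ldots \alpha_n ; \beta} = 0$ and therefore, using (421), $D_\beta\, \epsilon_{\alpha_1 \ldots \alpha_n} = \epsilon_{\alpha_1 \ldots \alpha_n ; \beta} = 0$.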
k1
(422)
(M) by the formula,
A = d A
(423)
(424)
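\[ g(A, B) = \frac{1}{k!}\, A^{\alpha_1 \ldots \alpha_k} B_{\alpha_1 \ldots \alpha_k} \]
and,
\[ \langle A, B \rangle := \int g(A, B) \tag{425} \]
With these definitions we can easily check the following formula for $A \in \Lambda^k(M)$, $B \in \Lambda^{n-k}(M)$,
\[ \langle \star A, B \rangle = \int_M A \wedge B \]
Observe also that the divergence operator $\delta$ is dual to $d$ in the following sense (for
$A \in \Lambda^k(M)$, $B \in \Lambda^{k-1}(M)$),
\[ \langle \delta A, B \rangle = \langle A, dB \rangle \]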
8 In particular, if $(M, g)$ is a Lorentz manifold of dimension 4 we have $\star(\star A) = (-1)^{k+1} A$.
Proof Consider first the 0 part of the boundary and define, locally around every
point p 0 , a system of coordinates x0 , x1 , . . . xn such that,
(1) x0 = 0 in 0 and x0 < 0 in U.
(2) x1 , x2 , . . . , xn is positively oriented in 0 .
(3) g00 = 1, g0i = 0 on 0 near p.
Relative to these coordinates we have $dv_g = \epsilon_{12 \ldots n}\, dx^1 \ldots dx^n$ where $\epsilon_{12 \ldots n} = \epsilon_{01 \ldots n}$.
Therefore, on 0 ,
( ?X)i1 ...in
(426)
(427)
4. MINKOWSKI SPACE
(429)
Equation (429) can also be written in the form $D_N N = 0$. We call $N$ a null geodesic
generator of the level hypersurfaces of $u$.
A causal curve can be either timelike or null at any of its points. The canonical
time orientation of $\mathbb{R}^{n+1}$ is given by the vectorfield $T_0 = \partial_0$. A timelike vector $X$ is
said to be future oriented if $m(X, T_0) < 0$ and past oriented if $m(X, T_0) > 0$. The
causal future $J^+(S)$ of a set $S$ consists of all points in $\mathbb{R}^{n+1}$ which can be connected
to $S$ by a future directed causal curve. The causal past $J^-(S)$ is defined in the
same way. Thus, for a point $p = (t, x)$, $J^+(p) = \{ (t', x') \,/\, |x' - x| \le t' - t \}$.
Given a smooth domain $D$, its future set $J^+(D)$ may, in general, have a nonsmooth
boundary, due to caustics.
We consider conservative domains $J^+(D_1) \cap J^-(D_2)$ with $D_1 \subset \Sigma_1$, $D_2 \subset \Sigma_2$,
spacelike hypersurfaces. The domain is regular if both $D_1, D_2$ are regular and
its non-spacelike boundaries $N_1 \subset \partial(J^+(D_1)) \setminus D_1$ and $N_2 \subset \partial(J^-(D_2)) \setminus D_2$
are smooth. In the particular case, when $D_1 = \Sigma_1$ and $D = D_2 \subset \Sigma_2$, we obtain
$J^+(\Sigma_1) \cap J^-(D)$, called domain of dependence of $D$ relative to $\Sigma_1$, consisting of all
points in the causal past of $D \subset \Sigma_2$, to the future of $\Sigma_1$. Similarly $J^+(D) \cap J^-(\Sigma_2)$,
with $D \subset \Sigma_1$, is called the domain of influence of $D$ relative to
$\Sigma_2$. Particularly useful examples are given in terms of a time function $t$ with
$\Sigma_1 = \{ (t, x) \,/\, t(t, x) = t_1 \}$, $\Sigma_2 = \{ (t, x) \,/\, t(t, x) = t_2 \}$ two, nonintersecting, level
hypersurfaces, $\Sigma_2$ lying in the future of $\Sigma_1$.
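A pair of null vectorfields $L, \underline{L}$ form a null pair if $m(L, \underline{L}) = -2$. A null pair
$e_n = L$, $e_{n+1} = \underline{L}$ together with vectorfields $e_1, \ldots, e_{n-1}$ such that $m(L, e_a) = m(\underline{L}, e_a) = 0$ and $m(e_a, e_b) = \delta_{ab}$, for all $a, b = 1, \ldots, n-1$, is called a null frame.
The null pair,
\[ L = \partial_t + \partial_r, \qquad \underline{L} = \partial_t - \partial_r, \tag{430} \]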
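That (430) is indeed a null pair follows from a short computation. The sketch below uses only the Minkowski values $m(\partial_t, \partial_t) = -1$, $m(\partial_r, \partial_r) = 1$ and $m(\partial_t, \partial_r) = 0$.

```latex
% Check that L = \partial_t + \partial_r and \underline{L} = \partial_t - \partial_r
% form a null pair.
\begin{align*}
  m(L, L)
    &= m(\partial_t, \partial_t) + 2\, m(\partial_t, \partial_r) + m(\partial_r, \partial_r)
     = -1 + 0 + 1 = 0 , \\
  m(\underline{L}, \underline{L})
    &= m(\partial_t, \partial_t) - 2\, m(\partial_t, \partial_r) + m(\partial_r, \partial_r)
     = -1 - 0 + 1 = 0 , \\
  m(L, \underline{L})
    &= m(\partial_t, \partial_t) - m(\partial_r, \partial_r)
     = -1 - 1 = -2 .
\end{align*}
```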
4.2. Conformal Killing vectorfields. Let $x^\mu$ be an inertial coordinate system of Minkowski space $\mathbb{R}^{n+1}$. The following are all the isometries and conformal
isometries of $\mathbb{R}^{n+1}$.
x
(x, x)
10
(x x )
x
x
10 Observe that the vector fields $K_\mu$ can be obtained by applying $I_*$ to the vector fields $T_\mu$.
We also list below the commutator relations between these vector fields,
[L , L ] = L L + L L
[L , T ] = T T
[T , T ] = 0
[T , S] = T
[T , K ] = 2( S + L )
[L , S] = [K , K ] = 0
[L , K ] = K K
(431)
Denoting $P(1, n)$ the Lie algebra generated by the vector fields $T_\mu, L_{\mu\nu}$ and $K(1, n)$
the Lie algebra generated by all the vector fields $T_\mu, L_{\mu\nu}, S, K_\mu$ we state the following version of the Liouville theorem,
Theorem 4.3. The following statements hold true.
1) $P(1, n)$ is the Lie algebra of all Killing vector fields in $\mathbb{R}^{n+1}$.
2) If $n > 1$, $K(1, n)$ is the Lie algebra of all conformal Killing vector fields in $\mathbb{R}^{n+1}$.
3) If $n = 1$, the set of all conformal Killing vector fields in $\mathbb{R}^{1+1}$ is given by the
following expression
\[ f(x^0 + x^1)(\partial_0 + \partial_1) + g(x^0 - x^1)(\partial_0 - \partial_1) \]
where $f, g$ are arbitrary smooth functions of one variable.
Proof: The proof for part 1 of the theorem follows immediately, as a particular
case, from Proposition (3.4). From (417) as $R = 0$ and $X$ is Killing we have
\[ D_\mu D_\nu X_\lambda = 0 . \]
Therefore, there exist constants $a_{\mu\nu}, b_\mu$ such that $X_\mu = a_{\mu\nu} x^\nu + b_\mu$. Since $X$ is
Killing $D_\mu X_\nu = -D_\nu X_\mu$ which implies $a_{\mu\nu} = -a_{\nu\mu}$. Consequently $X$ can be
written as a linear combination, with real coefficients, of the vector fields $T_\alpha, L_{\alpha\beta}$.
Let now X be a conformal Killing vector field. There exists a function such that
(X)
(432)
n1
2
n+1
D X =
(433)
X =
(434)
and applying D to the first equation, to the second one and subtracting we
obtain
= 0
(435)
1
+
(
+
)
+
(0 1 )
X=
0
1
2 x0
x1
2 x0
x1
which proves the result.
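Remark.
\[ T_0 = \tfrac{1}{2}(L + \underline{L}), \qquad S = \tfrac{1}{2}(\underline{u}\, L + u\, \underline{L}), \qquad K_0 = \tfrac{1}{2}(\underline{u}^2 L + u^2 \underline{L}). \tag{437} \]
Both $T_0 = \partial_t$ and $K_0 = (t^2 + |x|^2)\partial_t + 2t x^i \partial_i$ are causal. This makes them important
in deriving energy estimates. Observe that $S$ is causal only in $J^+(0) \cup J^-(0)$.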
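These formulas and causality statements can be verified by hand. The sketch below assumes the optical functions $u = t - r$, $\underline{u} = t + r$ with $r = |x|$ (a convention assumed here) together with the null pair $L = \partial_t + \partial_r$, $\underline{L} = \partial_t - \partial_r$, and uses $x^i \partial_i = r \partial_r$.

```latex
% Assumed conventions: u = t - r, \underline{u} = t + r, r = |x|.
\begin{align*}
  \tfrac{1}{2}(\underline{u}\, L + u\, \underline{L})
    &= \tfrac{1}{2}\big( (t+r)(\partial_t + \partial_r) + (t-r)(\partial_t - \partial_r) \big)
     = t\, \partial_t + r\, \partial_r = S , \\
  \tfrac{1}{2}(\underline{u}^2 L + u^2 \underline{L})
    &= \tfrac{1}{2}\big( (t+r)^2 (\partial_t + \partial_r) + (t-r)^2 (\partial_t - \partial_r) \big)
     = (t^2 + r^2)\, \partial_t + 2 t r\, \partial_r = K_0 .
\end{align*}
% Causality, using m(\partial_t,\partial_t) = -1, m(\partial_r,\partial_r) = 1:
\begin{align*}
  m(T_0, T_0) &= -1 < 0 , \\
  m(K_0, K_0) &= -(t^2 + r^2)^2 + 4 t^2 r^2 = -(t^2 - r^2)^2 \le 0 , \\
  m(S, S)     &= -t^2 + r^2 \le 0 \iff |x| \le |t| .
\end{align*}
% The last condition describes exactly the region J^+(0) \cup J^-(0).
```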
4.4. Null hypersurfaces. Null hypersurfaces are particularly important as
they correspond to the propagation fronts of solutions to the wave or Maxwell
equation in Minkowski space$^{11}$. The simplest way to describe the geometry of a
null hypersurface is to start with a codimension one hypersurface $S_0 \subset \Sigma_0$, where
$\Sigma_0$ is a fixed spacelike hypersurface of $M^{n+1}$. At every point $p \in S_0$ there are
precisely two null directions orthogonal to the tangent space $T_p(S_0)$. Let $L$ denote
a smooth null vectorfield orthogonal to $S_0$ and consider the congruence of null
geodesics$^{12}$ generated by the integral curves of $L$. As long as these null geodesics
do not intersect, the congruence forms a smooth null hypersurface $N$. We can also
extend $L$, by parallel transport, to all points of $N$. Clearly $D_L L = 0$, $m(L, L) = 0$,
moreover $m(L, X) = 0$ for every vector $X$ tangent to $N$. Observe also that $L$ is
uniquely defined up to multiplication by a conformal factor depending only on $S_0$.
Define, for all vectorfields $X, Y$ tangent to $N$,
\[ \gamma(X, Y) = m(X, Y), \qquad \chi(X, Y) = m(D_X L, Y) \tag{438} \]
They are both symmetric tensors, called, respectively, the first and second null fundamental forms of $N$. Observe that $\chi$ is uniquely defined up to the same conformal
11Or more generally on a Lorentz spacetime.
12 These are in fact straight lines in Minkowski space.
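by $\frac{dx^\mu}{ds} = L^\mu$ with $x^\mu(0, \omega)$ the point on $S_0$ of coordinates $\omega$. Let,
\[ \gamma_{ab} = \gamma\Big( \frac{\partial}{\partial \omega^a}, \frac{\partial}{\partial \omega^b} \Big), \qquad \chi_{ab} = \chi\Big( \frac{\partial}{\partial \omega^a}, \frac{\partial}{\partial \omega^b} \Big) \]
denote the components of $\gamma$ and $\chi$ relative to these coordinates. One can easily
check that $\frac{d}{ds} \gamma_{ab} = 2\chi_{ab}$. The volume element of $S_s$ is given by
\[ da_{S_s} = \sqrt{|\gamma|}\; d\omega^1 \ldots d\omega^{n-1} \]
with $|\gamma|$ the determinant of the metric $\gamma$. Observe that $\frac{d}{ds} \log|\gamma| = \gamma^{ab} \frac{d}{ds} \gamma_{ab} = 2\,\mathrm{tr}\chi$,
with $\mathrm{tr}\chi = \gamma^{ab} \chi_{ab}$ the expansion coefficient of the null hypersurface. Thus,
\[ \frac{d}{ds} \sqrt{|\gamma|} = \mathrm{tr}\chi\, \sqrt{|\gamma|} . \]
The rate of change of the total volume $|S_s|$ is given by the following formula,
\[ \frac{d}{ds} |S_s| = \int_{S_s} \mathrm{tr}\chi \; da_{S_s} . \tag{439} \]
We also remark that $\chi$ verifies the following Riccati type equation,
\[ \frac{d}{ds} \chi + \chi^2 = 0 \tag{440} \]
which can be explicitly integrated. Thus one can verify that $\mathrm{tr}\chi(s, \omega_0)$ may become
$-\infty$ at a finite value of $s > 0$ if $\mathrm{tr}\chi(0, \omega_0) < 0$ at some point of $S_0$. This occurrence
corresponds to the formation of a caustic.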
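The explicit integration can be sketched as follows. The computation reads (440) as an equation for the mixed components $\chi^a{}_b$, viewed as a matrix along a fixed null geodesic; this index placement, the initial value $\chi_0 = \chi(0, \omega_0)$ and its eigenvalues $\lambda_i$ are notation assumed here.

```latex
% Sketch: solve d\chi/ds = -\chi^2 for the matrix \chi = (\chi^a{}_b).
\[
  \chi(s) = \chi_0\, (I + s\, \chi_0)^{-1} ,
\]
% since \frac{d}{ds}(I + s\chi_0)^{-1} = -(I + s\chi_0)^{-1} \chi_0 (I + s\chi_0)^{-1}
% and \chi_0 commutes with (I + s\chi_0)^{-1}.
% In terms of the eigenvalues \lambda_i of \chi_0:
\[
  \lambda_i(s) = \frac{\lambda_i}{1 + s\, \lambda_i} ,
  \qquad
  \mathrm{tr}\chi(s) = \sum_{i=1}^{n-1} \frac{\lambda_i}{1 + s\, \lambda_i} .
\]
% If tr\chi(0) < 0 then some \lambda_i < 0, and
% tr\chi(s) \to -\infty as s increases to s_* = -1/\min_i \lambda_i > 0.
```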
An arbitrary foliation $S_v$ on $N$ can be parametrized by $v(s, \omega)$ with $(s, \omega)$ the
geodesic coordinates defined above. We call $\Omega = \frac{dv}{ds}$ the null lapse function of the
foliation and denote by $\gamma'$ and $\chi'$ the restriction of $\gamma, \chi$ to $S_v$. If $X$ is a vectorfield
tangent to the geodesic foliation $S_s$ then $X' = X - \Omega^{-1} X(v) L$ is tangent to $S_v$.
Thus, if $X, Y$ are tangent to $S_s$ then $\gamma(X, Y) = \gamma(X', Y')$ and $\chi(X', Y') = \chi(X, Y)$.
Relative to the coordinates $(v, \omega)$ we have
\[ \gamma'_{ab} = \gamma_{ab}, \qquad \chi'_{ab} = \chi_{ab} . \tag{441} \]
where $da_{S_v}$ denotes the area element of $S_v$ induced by $\gamma$. The definition does not
depend on the particular foliation.
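The claims about the projected vectorfield can be checked in a few lines. The sketch below writes the null lapse as $\Omega = dv/ds = L(v)$ and the projection as $X' = X - \Omega^{-1} X(v) L$ (symbols assumed here), and uses $D_L L = 0$, $m(L, L) = 0$ and $m(L, X) = 0$ for $X$ tangent to $N$.

```latex
% Sketch, with \Omega = L(v) = dv/ds and X' = X - \Omega^{-1} X(v) L.
\begin{align*}
  X'(v) &= X(v) - \Omega^{-1} X(v)\, L(v) = X(v) - X(v) = 0 , \\
  \gamma(X', Y') &= m(X, Y) - \Omega^{-1} Y(v)\, m(X, L) - \Omega^{-1} X(v)\, m(L, Y)
                    + \Omega^{-2} X(v) Y(v)\, m(L, L) = m(X, Y) , \\
  \chi(X', Y') &= m(D_{X'} L, Y')
                = m(D_X L, Y) - \Omega^{-1} Y(v)\, m(D_X L, L)
                = \chi(X, Y) .
\end{align*}
% In the last line D_{X'} L = D_X L because D_L L = 0, and
% m(D_X L, L) = \tfrac{1}{2} X\big( m(L, L) \big) = 0.
```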
(442)
1
Q
2
(X)
(443)
N1
D2
D1
The integrals are taken with respect to the area elements $da_N$ along the null hypersurfaces $N_1, N_2$ and the area elements of the Riemannian metrics induced by $m$ on
$\Sigma_1, \Sigma_2$. Observe that all integrands are positive if $X$ is causal. The identity (444)
remains valid if $X$ is conformal Killing and $Q$ is traceless.
Proof :
D2
J (D)1
(445)
Similarly, if $D \subset \Sigma_1$,
\[ \int_{N} Q(X, L) + \int_{D_1} Q(X, T) = \int_{J^+(D) \cap \Sigma_2} Q(X, T) \tag{446} \]
Bibliography