Lecture Notes On General Relativity
Matthias Blau
Albert Einstein Center for Fundamental Physics
Institut für Theoretische Physik
Universität Bern
CH-3012 Bern, Switzerland
http://www.blau.itp.unibe.ch/Lecturenotes.html
Contents
0 Introduction
0.1 Prerequisites
0.2 Overview
0.3 Literature
0.4 References and Footnotes
0.5 Exercises
3 Tensor Algebra
3.1 Principle of General Covariance
3.2 Tensors
3.3 Tensor Algebra
3.4 Generally Covariant Integration and Volume Elements
3.5 Tensor Densities and Volume Elements
3.6 Towards a Coordinate-Independent Interpretation of Tensors
3.7 Multilinear Algebra and Tensors
3.8 Vielbeins and Orthonormal Frames
3.9 Epilogue: Indices? Indices!
4.2 Extension of the Covariant Derivative to Other Tensor Fields
4.3 Main Properties of the Covariant Derivative
4.4 Uniqueness of the Levi-Civita Connection (Christoffel symbols)
4.5 Tensor Analysis: Some Special Cases
4.6 Appendix: A Formula for the Variation of the Determinant
4.7 Covariant Differentiation Along a Curve
4.8 Parallel Transport and Geodesics
4.9 Example: Parallel Transport on the 2-Sphere
4.10 Fermi-Walker Parallel Transport
4.11 Epilogue: Manifolds? Think Globally, Act Locally!
8.1 Symmetries of a Metric (Isometries): Preliminary Remarks
8.2 Lie Derivative for Scalars
8.3 Lie Derivative for Vector Fields
8.4 Lie Derivative for other Tensor Fields
8.5 Lie Derivative of the Metric and Killing Vectors
8.6 Lie Derivative for Tensor Densities
14.3 Embeddings and Pull-Backs
14.4 Embedded Hypersurfaces and Normal Vectors
14.5 Hypersurface Orthogonality and Frobenius Integrability
20.6 Back to Gravity: Conjugate Momenta and Primary Constraints
20.7 Legendre Transform and ADM Hamiltonian
20.8 Secondary Constraints: the Hamiltonian and Momentum Constraints
20.9 Properties and Significance of the Constraints
20.10 Boundary Terms in the ADM Action and Hamiltonian
20.11 Alternative Derivation of the Hamiltonian Boundary Terms
20.12 Significance of the Hamiltonian Boundary Terms: ADM Energy
24.6 Bending of Light by a Star: 3 Derivations
24.7 A Unified Description in terms of the Runge-Lenz Vector
29 Black Holes IV: Other Black Hole Solutions (a brief overview)
29.1 Kerr-Newman Family of 4-dimensional Black Holes
29.2 Other 4-dimensional Solutions
29.3 Higher-dimensional Solutions
F: Cosmology
33.10 Comments on Cosmic Expansion as Expansion of Space
G: Varia
38.2.8 Interlude on (A)dS Schwarzschild
38.2.9 Painlevé-Gullstrand-like Coordinates
38.3 Some Coordinate Systems for anti-de Sitter space
38.3.1 Global (and Static) Coordinates
38.3.2 Conformal Coordinates, Conformal Boundary and Penrose Diagrams
38.3.3 Isotropic (Spatially Conformally Flat) Coordinates
38.3.4 Cosmological (Hyperbolic Slicing) Coordinates
38.3.5 de Sitter Slicing Coordinates
38.3.6 anti-de Sitter Slicing Coordinates
38.3.7 Poincaré Coordinates
38.3.8 Plane Wave AdS Coordinates
38.3.9 Codimension-2 Hyperbolic Slicing Coordinates
38.3.10 Painlevé-Gullstrand-like Coordinates?
38.4 Warped Products, Cones, and Maximal Symmetry
42.8 Plane Waves with more Isometries
0 Introduction
1905 was Einstein's magical year. In that year, he published three articles, on light
quanta, on the foundations of the theory of Special Relativity, and on Brownian motion,
each one separately worthy of a Nobel prize. Immediately after his work on Special
Relativity, Einstein started thinking about gravity and how to give it a relativistically
invariant formulation. He kept on working on this problem during the next ten years,
doing little else. This work, after many trials and errors, culminated in his masterpiece,
the General Theory of Relativity, presented in 1915/1916. It is widely considered to
be one of the greatest scientific and intellectual achievements of all time, a beautiful
theory derived from pure thought and physical intuition, capable of explaining, or at
least describing, still today, more than 100 years later, every aspect of gravitational
physics ever observed.
Einstein's key insight was what is now known as the Einstein Equivalence Principle, the
(local) equivalence of gravitation and inertia. This ultimately led him to the realisation
that gravity is best described and understood not as a physical external force like the
other forces of nature but rather as a manifestation of the geometry and curvature of
space-time itself. This realisation, in its simplicity and beauty, has had a profound
impact on theoretical physics as a whole, and Einstein's vision of a geometrisation of
all of physics is still with us today.
These lecture notes for an introductory course on General Relativity are based on a
course that I originally gave in the years 1998-2003 in the framework of the Diploma
Course of the ICTP (Trieste, Italy). Currently these notes form the basis of a course
that I teach as part of the Master in Theoretical Physics curriculum at the University
of Bern.
In the intervening years, I have made (and keep making) various additions to the lecture
notes, and they now include much more material than is needed for (or can realistically
be covered in) an introductory 1- or even 2-semester course, say, but I hope to have
nevertheless preserved (at least in parts) the introductory character and accessible style
of the original notes.
Invariably, any set of (introductory) lecture notes has its shortcomings, due to lack of
space and time, the requirements of the audience and the expertise (or lack thereof)
and interests of the lecturer. These lecture notes are, of course, no exception. In
particular, the emphasis in these notes is on developing the theory (I am a theoretical
physicist), not on experiments or connecting the theory with observation, but stops
short of doing real mathematical general relativity (i.e. proving theorems), as this would
require significantly more mathematical sophistication and machinery than I want to
assume (or can develop) in these notes. I hope that these lecture notes nevertheless
provide the necessary background for studying these or other more advanced topics not
covered in these notes.
I should also stress that I have written these notes primarily for myself, and for my
students. I am making them publicly available just in case somebody else happens to
find them useful, and because I know that previous versions of these notes have enjoyed
some popularity. However, if you do not like these notes or my way of explaining things,
or do not find what you are looking for, please do not complain to me (yes, this has
happened in the past). There will occasionally be further additions and updates to these
notes, reflecting however my personal preferences and taste rather than any (futile) aim
for completeness.
Lecture notes of this length unavoidably contain some minor mistakes somewhere. However,
I hope that these notes are free of major conceptual errors and blunders. I am
of course grateful for any constructive criticism and corrections. If you have such comments,
or also if you just happen to find these notes useful, please let me know (blau
at itp.unibe.ch).
0.1 Prerequisites
Special Relativity,
Lagrangian mechanics,
I will thus attempt to explain every single other thing that is required to understand
the basics of Einstein's theory of gravity. However, this also means that I will not be
able to discuss some mathematically more advanced and yet equally important aspects
of General Relativity.
0.2 Overview
E: Black Holes
F: Cosmology
G: Varia
I refer to the Table of Contents for rather detailed information about the contents of
the individual parts and sections of these notes and want to just provide some remarks
here for a first orientation.
Part A of the lecture notes is dedicated to explaining and exploring the consequences
of Einstein's insights into the relation between gravity and space-time geometry, and
to developing the machinery (of tensor calculus and Riemannian geometry) required to
describe physics in a curved space-time, i.e. in a gravitational field.
From about section 3 onwards, Part A can be read in parallel with other parts of
these notes which deal with various applications of General Relativity. In particular,
at this point in the course I find it useful to develop in parallel (and suggest reading in
parallel) the more formal material on tensor analysis in Part A, and Part D (dealing
with solar system tests of general relativity); cf. the more detailed suggestions at the
end of section 2. Not only does this provide an interesting and physically relevant
application and illustration of the machinery developed so far, it also serves to provide
an appropriate balance between physics and formalism in the lectures.
The topics covered in Parts A and D, together with the first section 18 of Part C
dealing with the Einstein field equations, probably form the core of most introductory
courses on general relativity. This provides (or is meant to provide) the basis for other
applications or investigations of general relativity, and other sections of Part C and
Parts E-G provide a reasonably large variety of topics to choose from.
In Part B of the lecture notes I have collected a number of different more mathematical
topics that develop the formalism of tensor calculus and differential geometry in one
way or another. Strictly speaking, none of these topics are essential for understanding
some of the more elementary aspects of general relativity to be treated later on (so Part
B can also be regarded as a mathematical appendix to the notes). However, some of
them are required at a later stage to understand, or even formulate, certain somewhat
more advanced aspects of general relativity (and it is perhaps best to then go back to
this section if and when needed), and others are included simply because they are fun
or beautiful (or, usually, both).
0.3 Literature
Most of the material covered in these notes, in particular in the introductory parts, is
completely standard and can be found in many places. While my way of explaining
things is my own, and numerous gratuitous Remarks throughout the notes as well
as the selection of more advanced topics reflect my own interests, I make no claim to
major originality in these notes and have not attempted to reinvent the wheel.
In particular, in earlier versions of these notes the presentation of much of the introductory
material followed quite closely the treatment in Weinberg's classic
S. Weinberg, Gravitation and Cosmology
and readers familiar with this book may still recognise the similarities in some places.
Even though my own way of thinking about general relativity is much more geometric
(and this has definitely influenced later versions of and additions to these notes), I have
found that the pragmatic approach adopted by Weinberg is ideally suited to introduce
general relativity to students with little mathematical background.
As far as more recent and modern books are concerned, here is a short personal selection
of my favourites:
2. At an intermediate level (i.e. more or less at the level of these notes), my favourite
modern book is
E. Poisson, A Relativist's Toolkit: the Mathematics of Black Hole Mechanics
R. Wald, General Relativity
and I will frequently refer to these books in the body of the notes for discussions
of more advanced and/or more mathematical topics.
A. Pais, Subtle is the Lord: the science and life of Albert Einstein
0.4 References and Footnotes
As mentioned before, much of the material covered in these notes is quite standard,
and can be found in many places, and I have not attempted to provide references or
attributions for this.
Nevertheless, these lecture notes contain a large number of footnotes, with significantly
higher density in the sections of the notes dealing with more advanced and, specifically,
more recent developments. For the most part, these are meant as pointers to the
literature for further reading and more information.
When referring to textbooks, I usually just refer to them in the form "Author,
Title" (as above), without indicating publisher, year, ... If you actually need this
information, it will be easy for you to find it.
When referring to articles, if they are available from the preprint server at
http://arXiv.org/
I usually just refer to the arXiv number, regardless of whether or not that article
has been published elsewhere (this just reflects the by now standard practice that
people are more likely to first go there rather than to the library to look for or at
an article).
References to pre-arXiv articles are given in the traditional complete "Author(s),
Title, Journal, ..." form.
0.5 Exercises
This simply reflects my own style of teaching, where exercises are very much integrated
into the course and mainly serve the purpose of getting students to look at what was done
in the course and to perhaps fill in some details that I skipped in class. In particular,
I am no fan of exercises that go significantly beyond what is covered in class or in the
notes (if it is relevant, I should explain or include it, if it is not then we may as well not
bother).
However, most (sub-)sections contain numerous Remarks, and many of them contain
supplementary and/or more advanced information and material, and these may be
regarded as (annotated) exercises or used as a basis for exercises.
A: Physics in a Gravitational Field
and General Covariance
1 From the Einstein Equivalence Principle to Geodesics
(but we will come back in some detail below to the question if/why the same mass
parameter m appears on both sides of this equation, so as to incorporate the observation,
going back to Galileo, that all bodies fall at the same rate in a gravitational field).
The latter is the Poisson equation
$$\Delta\phi = 4\pi G_N \mu \qquad (1.2)$$
with $G_N$ denoting, here and throughout, Newton's constant, i.e. the gravitational coupling
constant, and where $\mu$ is the mass density, and $\rho = \mu c^2$ the associated rest mass
energy density - I will set $c = 1$ in the following and use $\rho$.
Let us start with the field equation. It is immediately evident that this cannot be the
final story. Not only is this equation not Lorentz invariant; because of the absence
of time-derivatives in (1.2), it actually describes an action at a distance and an instantaneous
propagation of the gravitational field to every point in space (if you wiggle
your mass distribution here now, this will immediately affect the gravitational potential
arbitrarily far away). This is something that Einstein had just successfully exorcised
from other aspects of physics, and clearly Newtonian gravity had to be revised as well.
It is then also immediately clear that what would have to replace Newton's theory is
something rather more complicated. The reason for this is that, according to Special
Relativity, mass is just another form of energy. Then, since gravity couples to masses, in
a relativistically invariant theory gravity will also have to couple to energy. In particular,
therefore, gravity would have to couple to gravitational energy, i.e. to itself. As a
consequence, the new gravitational field equations will, unlike Newton's, have to be
non-linear: the field of the sum of two masses cannot equal the sum of the gravitational
fields of the two masses because it should also take into account the gravitational energy
of the two-body system.
Now, having realised that Newton's theory cannot be the final word on the issue, how
does one go about finding a better theory?
I will first very briefly discuss (and then dismiss) what at first sight may appear to
be the most natural and naive approach to formulating a relativistic theory of gravity,
namely the simple replacement of Newton's field equation (1.2) by its relativistically
covariant version
$$\Box\phi = 4\pi G_N \mu = 4\pi G_N \rho , \qquad (1.3)$$
where $\Box$ is the Lorentz invariant d'Alembert or wave operator. While this looks promising,
something can't be quite right about this equation. We already know (from Special
Relativity) that $\rho$ is not a scalar but rather the 00-component of a tensor, the energy-momentum
tensor, so if $\rho$ actually appears on the right-hand side, $\phi$ cannot be a scalar,
while if $\phi$ is a scalar something needs to be done to fix the right-hand side.
Turning first to the latter possibility, one option that suggests itself is to replace $\rho$ by
the trace $T = T^{\alpha}{}_{\alpha}$ of the energy-momentum tensor. This is by definition / construction
a scalar, and it will agree with $\rho$ in the non-relativistic limit (where rest mass dominates
over other contributions). Thus a first attempt at fixing the above equation might look
like
$$\Box\phi = 4\pi G_N T . \qquad (1.4)$$
This is certainly an attractive equation, but it definitely has the drawback that it is
too linear. Recall from the discussion above that the universality of gravity (coupling
to all forms of matter) and the equivalence of mass and energy lead to the conclusion
that gravity should couple to gravitational energy, invariably predicting non-linear (self-interacting)
equations for the gravitational field. However, the left hand side could be
such that it only reduces to $\Box$ or $\Delta$ of the Newtonian potential in the Newtonian limit
of weak time-independent fields. Thus a second attempt at fixing the above equation
might look like
$$\Box\Phi(\phi) = 4\pi G_N T , \qquad (1.5)$$
for some function $\Phi(\phi)$ that reduces to $\phi$ for weak fields.
Such a scalar relativistic theory of gravity and variants thereof were proposed and
studied among others by Abraham, Mie, and Nordström. As it stands, this field equation
appears to be perfectly consistent (and it may be interesting to discuss if/how the
Einstein equivalence principle, which will put us on our route towards metrics and
space-time curvature, is realised in such a theory). However, regardless of this, this
theory is incorrect simply because it is ruled out experimentally. The easiest way to see
this (with hindsight) is to note that the energy-momentum tensor of Maxwell theory
(6.47) is traceless (6.121), and thus the above equation would predict no coupling of
gravity to the electro-magnetic field, in particular to light, hence in such a theory there
would be no deflection of light by the sun etc.¹
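The tracelessness invoked in this argument can be checked numerically. The following is a minimal sketch, assuming the standard form $T_{ab} = F_{ac}F_{b}{}^{c} - \frac{1}{4}\eta_{ab}F_{cd}F^{cd}$ of the Maxwell energy-momentum tensor (up to an overall normalisation, which does not affect the trace) and the mostly-plus Minkowski metric. The function names and the sample field values are purely illustrative and not taken from these notes.

```python
# Mostly-plus Minkowski metric eta_ab = diag(-1, +1, +1, +1); it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def field_strength(E, B):
    """Antisymmetric F_ab (lower indices) built from sample E and B vectors (c = 1).

    Sign conventions for F differ between textbooks (an assumption here);
    the trace computed below does not depend on them.
    """
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]

def maxwell_trace(F):
    """Return eta^{ab} T_ab for T_ab = F_ac F_b^c - (1/4) eta_ab F_cd F^cd."""
    R = range(4)
    # Raise both indices with the diagonal metric: F^{ab} = eta^{aa} eta^{bb} F_ab.
    F_up = [[ETA[a][a] * ETA[b][b] * F[a][b] for b in R] for a in R]
    F2 = sum(F[c][d] * F_up[c][d] for c in R for d in R)
    # T_ab = F_ac eta^{cc} F_bc - (1/4) eta_ab F2
    T = [[sum(F[a][c] * ETA[c][c] * F[b][c] for c in R) - 0.25 * ETA[a][b] * F2
          for b in R] for a in R]
    return sum(ETA[a][a] * T[a][a] for a in R)

trace = maxwell_trace(field_strength(E=(0.3, -1.2, 0.7), B=(2.0, 0.1, -0.5)))
print(abs(trace) < 1e-12)  # the trace vanishes up to rounding errors
```

The cancellation relies on the metric having four diagonal entries of unit modulus, i.e. it is special to four space-time dimensions.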
¹ For more on the history and properties of scalar theories of gravity see the review by D. Giulini,
What is (not) wrong with scalar gravity?, arXiv:gr-qc/0611100.
The other possibility to render (1.3) consistent is the, a priori perhaps much less compelling,
option to think of $\rho$ and $\phi$ or $\Delta\phi$ not as scalars but as (00)-components of
some tensor, in which case one could try to salvage (1.3) by promoting it to a tensorial
equation
$$\{\text{Some tensor generalising } \Delta\phi\}_{\alpha\beta} \sim 4\pi G_N T_{\alpha\beta} . \qquad (1.6)$$
This is indeed the form of the field equations for gravity (the Einstein equations) we
will ultimately be led to (see section 18.4), but Einstein arrived at this in a completely
different, and much more insightful, way.
Let us now, very briefly and in a streamlined way, try to retrace (one aspect of) Einstein's
thoughts, namely on the relation between inertial and gravitational mass, which, as we
will see, will lead us rather quickly to the geometric picture of gravity sketched in the
Introduction.
To that end we return to the Newtonian equation of motion (1.1). Recall that in this
Newtonian theory, there are two a priori completely independent concepts of mass:
inertial mass $m_i$ (or acceleration mass), which accounts for the resistance of a body
or particle against acceleration and appears universally on the left-hand-side of
the Newtonian equation of motion
$$m_i \vec{a} = \vec{F} \qquad (1.7)$$
gravitational mass $m_g$ which is the mass the gravitational field couples to, i.e. it
is the gravitational charge of a particle,
$$\vec{F}_g = -m_g \vec{\nabla}\phi . \qquad (1.8)$$
Now it is an important empirical fact that the inertial mass of a body is equal to its
gravitational mass. This realisation, at least with this clarity, is usually attributed
to Newton, although it goes back to experiments and observations by Galileo usually
paraphrased as "all bodies fall at the same rate in a gravitational field". (It is not true,
though, that Galileo dropped objects from the leaning tower of Pisa to test this - he
used an inclined plane, a water clock and a pendulum).
Figure 1: Experimenter and his two stones freely floating somewhere in outer space, i.e.
in the absence of forces.
is perfectly acceptable for any ratio $q_e/m_i$, and Einstein was very impressed with the
observed equality of $m_i$ and $m_g$. This should, he reasoned, not be a mere coincidence
but is probably trying to tell us something rather deep about the nature of gravity.
With his unequalled talent for discovering profound truths in simple observations, he
concluded (calling this "der glücklichste Gedanke meines Lebens" (the happiest thought
of my life)) that the equality of inertial and gravitational mass suggests a close relation
between inertia and gravity itself, suggests, in fact, that locally effects of gravity and
acceleration are indistinguishable,
2. Now assume (Figure 2) that somebody on the outside suddenly pulls the box up
with a constant acceleration. Then of course, our friend will be pressed to the
bottom of the elevator with a constant force and he will also see his stones drop
to the floor.
Figure 2: Constant acceleration upwards mimics the effect of a gravitational field: experimenter
and stones drop to the bottom of the box.
3. Now consider (Figure 3) this same box brought into a constant gravitational field.
Then again, he will be pressed to the bottom of the elevator with a constant force
and he will see his stones drop to the floor. With no experiment inside the elevator
can he decide if this is actually due to a gravitational field or due to the fact that
somebody is pulling the elevator upwards.
Thus our first lesson is that, indeed, locally the effects of acceleration and gravity
are indistinguishable.
4. Now consider somebody cutting the cable of the elevator (Figure 4). Then the
elevator will fall freely downwards but, as in Figure 1, our experimenter and his
stones will float as in the absence of gravity.
Thus lesson number two is that, locally the effect of gravity can be eliminated
by going to a freely falling reference frame (or coordinate system). This should
not come as a surprise. In the Newtonian theory, if the free fall in a constant
gravitational field is described by the equation
$$\ddot{x} = g \;\; (+ \text{ other forces}) , \qquad (1.10)$$
then in the accelerated coordinate system $\xi(x,t) = x - g t^2/2$ the same physics
is described by $\ddot{\xi} = 0$ (+ other forces),
and the effect of gravity has been eliminated by going to the freely falling coordinate
system $\xi$. The crucial point here is that in such a reference frame not only our
observer will float freely, but because of the equality of inertial and gravitational
mass he will also observe all other objects obeying the usual laws of motion in the
absence of gravity.
Figure 3: Effect of a constant gravitational field: indistinguishable for our experimenter
from that of a constant acceleration in Figure 2.
Figure 4: Free fall in a gravitational field has the same effect as no gravitational field
(Figure 1): experimenter and stones float.
Figure 5: Experimenter and his stones in a non-uniform gravitational field: the stones
will approach each other slightly as they fall to the bottom of the elevator.
5. In the above discussion, I have put the emphasis on "constant" accelerations and
on "locally". To see the significance of this, consider our experimenter with his
elevator in the gravitational field of the earth (Figure 5). This gravitational field
is not constant but spherically symmetric, pointing towards the center of the
earth. Therefore the stones will slightly approach each other as they fall towards
the bottom of the elevator, in the direction of the center of the gravitational field.
6. Thus, if somebody cuts the cable now and the elevator is again in free fall (Figure
6), our experimenter will float again, so will the stones, but our experimenter will
also notice that the stones move closer together for some reason. He will have to
conclude that there is some force responsible for this.
This is lesson number three: in a non-uniform gravitational field the effects of
gravity cannot be eliminated by going to a freely falling coordinate system. This
is only possible locally, on such scales on which the gravitational field is essentially
constant.
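Lessons two and three can be illustrated with a small numerical experiment. The following Python sketch is only an illustration under stated assumptions: Newtonian gravity, the earth treated as a point mass, and hypothetical function names such as `relative_acceleration`. It computes the acceleration of a stone as seen from the freely falling elevator. In a uniform field this relative acceleration vanishes identically, while in the field of the earth a small residual (tidal) acceleration remains, with two stones a horizontal distance $d$ apart approaching each other at a rate of roughly $G_N M d / r^3$.

```python
import math

GM_EARTH = 3.986004418e14  # m^3/s^2, gravitational parameter of the earth

def uniform_field(x, y, g=9.81):
    """Constant gravitational acceleration pointing in the -y direction."""
    return (0.0, -g)

def central_field(x, y, gm=GM_EARTH):
    """Newtonian field of a point mass at the origin: a = -GM r / |r|^3."""
    r3 = math.hypot(x, y) ** 3
    return (-gm * x / r3, -gm * y / r3)

def relative_acceleration(field, stone, elevator):
    """Acceleration of a stone relative to the freely falling elevator."""
    ax, ay = field(*stone)
    bx, by = field(*elevator)
    return (ax - bx, ay - by)

r = 6.371e6          # distance of the elevator from the centre of the earth (m)
d = 2.0              # horizontal separation of the two stones (m)
elevator = (0.0, r)
left, right = (-d / 2, r), (d / 2, r)

# Uniform field: gravity is eliminated completely in the falling frame.
uniform = relative_acceleration(uniform_field, right, elevator)

# Central field: the stones are pulled towards each other.
tidal_left = relative_acceleration(central_field, left, elevator)
tidal_right = relative_acceleration(central_field, right, elevator)
closing = tidal_left[0] - tidal_right[0]   # positive: the stones approach
estimate = GM_EARTH * d / r ** 3           # leading-order tidal estimate

print(uniform, closing, estimate)
```

For these sample values the closing acceleration is of the order of $10^{-6}\,\mathrm{m/s^2}$, which gives a sense of how small the scales must be for the gravitational field to count as essentially constant.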
Einstein formalised the outcome of these thought experiments in what is now known as
the Einstein Equivalence Principle which roughly states that physics in a freely falling
Figure 6: Experimenter and stones freely falling in a non-uniform gravitational field.
The experimenter floats, so do the stones, but they move closer together, indicating the
presence of some external force.
and
² S. Weinberg, Gravitation and Cosmology.
³ J. Hartle, Gravity. An Introduction to Einstein's General Relativity.
There are different versions of this principle depending on what precisely one means by
"the laws of nature". If one just means the laws of Newtonian (or relativistic) mechanics,
then this principle essentially reduces to the statement that inertial and gravitational
mass are equal. Usually, however, this statement is taken to imply also Maxwell's
theory, quantum mechanics etc.⁴ What it pragmatically asserts in one of its stronger
forms is that
The power of the above principle, which we will regard as a heuristic guideline, rather
than trying to (prematurely) give it a mathematically precise formulation, lies in the
fact that we can combine it with our understanding of physics in accelerated reference
systems to gain insight into the physics in a gravitational field. Two immediate consequences
of this (which cannot be derived on the basis of Newtonian physics or Special
Relativity alone) are
To see the inevitability of the first assertion, imagine a light ray entering the rocket /
elevator in Figure 1 horizontally through a window on the left hand side and exiting
again at the same height through a window on the right. Now imagine, as in Figure
2, accelerating the elevator upwards. Then clearly the light ray that enters on the left
will exit at a lower point of the elevator on the right because the elevator is accelerating
upwards. By the equivalence principle one should observe exactly the same thing in a
constant gravitational field (Figure 3). It follows that in a gravitational field the light
ray is bent downwards, i.e. it experiences a downward acceleration with the (locally
constant) gravitational acceleration g.
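To get a feeling for the size of this effect, here is a rough numerical estimate. It is a sketch under simplifying assumptions that are not part of the argument above: the elevator is at rest when the ray enters, velocities stay non-relativistic, and the floor simply rises by $g t^2/2$ during the crossing time $t = w/c$ for an elevator of width $w$. The function name `light_drop` is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def light_drop(width, g=9.81):
    """Apparent drop (in metres) of a horizontal light ray crossing a box of
    the given width while the box accelerates upwards with acceleration g."""
    crossing_time = width / C           # time the ray needs to cross the box
    return 0.5 * g * crossing_time ** 2  # distance the box has risen meanwhile

# Sample value: an elevator 3 m wide under terrestrial acceleration.
print(light_drop(3.0))
```

For an elevator a few metres wide the drop comes out far below atomic scales (of the order of $10^{-16}$ m), so the bending is a matter of principle rather than something one would notice in an actual elevator.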
To understand the second assertion, one can e.g. simply appeal to the so-called twin-paradox
of Special Relativity: the accelerated twin is younger than his unaccelerated
inertial sibling. Hence accelerated clocks run slower than inertial clocks. Hence, by
the equivalence principle, clocks in a gravitational field run slower than clocks in the
absence of gravity.
⁴ For a discussion of different formulations of the equivalence principle and the logical relations
among them, see E. di Casola, S. Liberati, S. Sonego, Nonequivalence of equivalence principles,
arXiv:1310.7426 [gr-qc].
Alternatively, one can imagine two observers at the top and bottom of the elevator,
having identical clocks and sending light signals to each other at regular intervals as
determined by their clocks. Once the elevator accelerates upwards, the observer at
the bottom will receive the signals at a higher rate than he emits them (because he
is accelerating towards the signals he receives), and he will interpret this as his clock
running more slowly than that of the observer at the top. By the equivalence principle,
the same conclusion now applies to two observers at different heights in a gravitational
field. This can also be interpreted in terms of a gravitational redshift or blueshift
(photons losing or gaining energy by climbing or falling in a gravitational field), and we
will return to a more quantitative discussion of this effect in section 2.9.
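The signal-exchange argument can be turned into a first-order estimate. During the light travel time $h/c$ between two observers a height $h$ apart, the lower observer gains a velocity $v = g h / c$ towards the source, so that the first-order Doppler shift is $v/c = g h / c^2$. The sketch below evaluates this fractional shift; it assumes a constant acceleration $g$, works only to first order in $g h / c^2$, and uses the hypothetical function names `fractional_shift` and `rate_ratio`.

```python
C = 299_792_458.0  # speed of light in m/s

def fractional_shift(height, g=9.81):
    """First-order fractional frequency shift g h / c^2 between two observers
    separated by the given height (in metres)."""
    return g * height / C ** 2

def rate_ratio(height, g=9.81):
    """Rate at which the lower observer receives signals, divided by the rate
    at which the upper observer emits them (first order in g h / c^2)."""
    return 1.0 + fractional_shift(height, g)

# Sample value: two clocks separated by 22.5 m in the field of the earth.
print(fractional_shift(22.5))
```

For a height difference of a few tens of metres the shift is of the order of $10^{-15}$, which is why very precise frequency standards are needed to observe it.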
What the equivalence principle tells us is that we can expect to learn something about
the effects of gravitation by transforming the laws of nature (equations of motion) from
an inertial Cartesian coordinate system to other (accelerated, curvilinear) coordinates.
As a first step, we will, in section 1.3 below, discuss the above example of an observer
undergoing constant acceleration in the context of special relativity.
As a preparation for this, and the remainder of the course, this section will provide a
lightning review of the Lorentz-covariant formulation of special relativity, mainly to set
the notation and conventions that will be used throughout, and only to the extent that
it will be used in the following.
1. Minkowski space(-time)
$$(\xi^a) = (\xi^0 = ct, \xi^k = x^k) , \qquad (1.13)$$
where $c$ is the speed of light. Typically in these notes $\xi^a$ will indicate such
a (locally) inertial coordinate system, whereas generic coordinates will be
called $x^\mu$ etc. We will almost always work in units in which $c = 1$.
(b) Minkowski space is equipped with a prescription for measuring distances,
encoded in a line-element which, in these coordinates, takes the form
$$ds^2 = -(d\xi^0)^2 + \sum_k (d\xi^k)^2 . \qquad (1.14)$$
This can be written as
$$ds^2 = \eta_{ab}\, d\xi^a d\xi^b \qquad (1.15)$$
with metric $(\eta_{ab}) = \mathrm{diag}(-1, +1, +1, +1)$ or, more explicitly,
$$(\eta_{ab}) = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & +1 & 0 & 0 \\ 0 & 0 & +1 & 0 \\ 0 & 0 & 0 & +1 \end{pmatrix} \qquad (1.16)$$
(thus we are using the "mostly plus" convention).
2. Lorentz Transformations
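$$\xi^a \mapsto \bar{\xi}^a = L^a{}_b\, \xi^b \qquad (1.17)$$
$$d\bar{s}^2 \equiv \eta_{ab}\, d\bar{\xi}^a d\bar{\xi}^b = \eta_{ab}\, d\xi^a d\xi^b = ds^2 \quad\Leftrightarrow\quad \eta_{ab} L^a{}_c L^b{}_d = \eta_{cd} . \qquad (1.18)$$
$$\bar{\xi} = L\xi , \qquad L^t \eta L = \eta \qquad (1.19)$$
$$(1 + \omega)^t\, \eta\, (1 + \omega) = \eta \quad\Rightarrow\quad (\eta\omega) + (\eta\omega)^t = 0 . \qquad (1.21)$$
$$\delta\xi^a = \omega^a{}_b\, \xi^b \quad\text{with}\quad \omega_{ab} \equiv \eta_{ac}\, \omega^c{}_b = -\omega_{ba} . \qquad (1.22)$$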
(c) Poincaré transformations are those affine transformations that leave the
Minkowski line-element invariant. They are composed of Lorentz transformations
and arbitrary constant translations and thus have the form
$\xi^a \mapsto \bar\xi^a = L^a{}_b \xi^b + \zeta^a$ , (1.23)
infinitesimally
$\delta\xi^a = \omega^a{}_b \xi^b + \epsilon^a$ . (1.24)
Any two inertial systems in the sense of the equivalence principle of special
relativity are related by a Poincaré transformation.
(a) The Minkowski metric defines the Lorentz (and Poincaré) invariant distance
$(\Delta\xi)^2 = \eta_{ab}(\xi_P^a - \xi_Q^a)(\xi_P^b - \xi_Q^b)$ (1.25)
(b) Depending on the sign of $(\Delta\xi)^2$, the two events P, Q are called spacelike,
lightlike (null) or timelike separated,
$(\Delta\xi)^2 \; \begin{cases} > 0 & \text{spacelike separated} \\ = 0 & \text{lightlike separated} \\ < 0 & \text{timelike separated} \end{cases}$ (1.26)
(c) The set of events that are lightlike separated from P defines the lightcone
at P. It consists of two components (joined at P), the future and the past
lightcone, distinguished by the sign of $\xi_Q^0 - \xi_P^0$ (positive for Q on the future
lightcone, $\xi_Q^0 - \xi_P^0 > 0$, negative for Q on the past lightcone).
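$\xi'^a(\lambda_0) = \frac{d}{d\lambda}\xi^a(\lambda)|_{\lambda=\lambda_0}$ . (1.27)
It is called spacelike, lightlike (null) or timelike, depending on the sign of
$\eta_{ab}\xi'^a\xi'^b$,
$\eta_{ab}\xi'^a\xi'^b \; \begin{cases} > 0 & \text{spacelike} \\ = 0 & \text{lightlike} \\ < 0 & \text{timelike} \end{cases}$ (1.28)
This sign (and hence this classification) depends only on the image of the
curve, not its parametrisation.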
(b) A curve whose tangent vector is everywhere timelike is called a timelike curve
(and likewise for lightlike and spacelike curves). A curve whose tangent
vector is everywhere timelike or null (i.e. non-spacelike) is called a causal
curve. Worldlines of massive particles are timelike curves, those of massless
particles (light) are null curves.
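(c) A natural Lorentz-invariant parametrisation of timelike curves is provided by
the Lorentz-invariant proper time $\tau$ along the curves,
$\xi^a = \xi^a(\tau)$ , (1.29)
with
$c\,d\tau = \sqrt{-ds^2} = \sqrt{-\eta_{ab}d\xi^a d\xi^b} = \sqrt{-\eta_{ab}\xi'^a\xi'^b}\,d\lambda \;\Rightarrow\; \eta_{ab}\frac{d\xi^a(\tau)}{d\tau}\frac{d\xi^b(\tau)}{d\tau} = -c^2$ . (1.30)
Likewise spacelike curves are naturally parametrised by proper distance ds.
The derivative with respect to proper time will be denoted by an overdot,
$\dot\xi^a(\tau) = \frac{d}{d\tau}\xi^a(\tau)$ . (1.31)
$\dot{\bar\xi}^a(\tau) = \frac{d}{d\tau}\bar\xi^a(\tau) = \frac{\partial\bar\xi^a}{\partial\xi^b}\frac{d}{d\tau}\xi^b(\tau) = L^a{}_b\dot\xi^b(\tau)$ . (1.32)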
These are the prototypes of what are called Lorentz vectors or, more gener-
ally, Lorentz tensors.
5. Lorentz Vectors
(a) Lorentz vectors (or 4-vectors) are objects with components $v^a$ which
transform under Lorentz transformations with the matrix $L^a{}_b$ (to be thought of as
the Jacobian of the transformation relating $\bar\xi^a$ and $\xi^a$),
$\bar v^a = L^a{}_b v^b$ . (1.33)
(a) Lorentz scalars are objects that are invariant under Lorentz transformations.
Examples are scalar products and norms of Lorentz vectors.
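(b) Lorentz covectors are objects $u_a$ that transform under Lorentz
transformations with the (contragredient = transpose inverse) representation
$\Lambda = (L^t)^{-1} = \eta L \eta^{-1}$ , (1.35)
i.e.
$\bar u_a = \Lambda_a{}^b u_b$ , $\Lambda_a{}^b = \eta_{ac}L^c{}_d\eta^{db}$ , (1.36)
$L^t\eta L = \eta \;\Rightarrow\; \Lambda\eta\Lambda^t = \eta \;\Leftrightarrow\; \Lambda_a{}^c\Lambda_b{}^d\eta_{cd} = \eta_{ab}$ . (1.37)
$u : v \in V \mapsto u(v) = u_a v^a \in \mathbb{R}$ . (1.38)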
Linear combinations of (p, q)-tensors are again (p, q)-tensors. Arbitrary prod-
ucts and contractions of Lorentz tensors are again Lorentz tensors (and the
tensor type can be read off from the number and position of the free in-
dices).
7. Tensor Fields
(a) Lorentz tensor fields are assignments of Lorentz tensors to each point of
Minkowski space,
$T : \xi \mapsto T^{a_1 \ldots a_p}_{c_1 \ldots c_q}(\xi)$ . (1.41)
(b) Given a vector field $V^a(\xi)$, $\eta_{ab}V^a(\xi)V^b(\xi)$ is an example of a scalar field, and
given a scalar field $f(\xi)$, its partial derivatives give a covector field
$U_a(\xi) = \frac{\partial}{\partial\xi^a}f(\xi) \equiv \partial_a f(\xi)$ (1.42)
$\Box = \eta^{ab}\partial_a\partial_b$ (1.44)
are Lorentz invariant in the sense that they are satisfied in one inertial system
iff they are satisfied in all inertial systems.
$u^a \equiv \dot\xi^a(\tau)$ (1.46)
$u^a u_a \equiv \eta_{ab}u^a u^b = -c^2$ . (1.47)
$a^c u_c \equiv \eta_{cb}a^c u^b = 0$ , (1.50)
(c) The action for a free massive particle with worldline a ( ) is essentially the
total proper time along the path,
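$S[\xi] = -mc^2 \int d\tau = -mc \int \sqrt{-\eta_{ab}d\xi^a d\xi^b}$ , (1.52)
9. Energy-Momentum 4-Vector
$p_a = \frac{\partial L}{\partial(d\xi^a/d\lambda)} = m\eta_{ab}u^b \;\Leftrightarrow\; p^a = mu^a = m(d\xi^a/d\tau)$ (1.55)
(c)
$p_k = \frac{\partial L_t}{\partial v^k} = m\gamma(v)v_k$ , (1.57)
$\eta_{ab}p^a p^b = -m^2c^2 \;\Leftrightarrow\; E^2 = m^2c^4 + \vec p^{\,2}c^2$ . (1.59)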
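As a quick numerical illustration of (1.55) and (1.59), the following Python sketch builds the 4-momentum $p^a = mu^a$ of a particle moving with velocity v along the $\xi^1$-direction and checks the mass-shell relation. The function names and the sample values are illustrative assumptions, and the identification $p^0 = E/c$ is the standard one, assumed here rather than taken from the lines above.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_momentum(m, v, c=C):
    """Return p^a = m * gamma(v) * (c, v, 0, 0) for motion along xi^1."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (m * gamma * c, m * gamma * v, 0.0, 0.0)

def minkowski_square(p):
    """eta_ab p^a p^b with the mostly-plus metric diag(-1, +1, +1, +1)."""
    return -p[0] ** 2 + p[1] ** 2 + p[2] ** 2 + p[3] ** 2

m, v = 2.0, 0.6 * C          # hypothetical sample values
p = four_momentum(m, v)
E = p[0] * C                 # assumes the standard identification p^0 = E/c
p_vec_sq = p[1] ** 2 + p[2] ** 2 + p[3] ** 2

# Both ratios are dimensionless, so they are easy to compare with -1 and +1.
mass_shell = minkowski_square(p) / (m * C) ** 2                   # expected: -1
energy_ratio = E ** 2 / (m ** 2 * C ** 4 + p_vec_sq * C ** 2)     # expected: +1
print(mass_shell, energy_ratio)
```

Working with dimensionless ratios avoids comparing very large floating point numbers directly.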
We return to the issue discussed in the context of the Einstein equivalence principle in
section 1.1, namely physics as experienced by an observer undergoing constant accel-
eration (as a precursor to studying this observer in a genuine gravitational field), now
specifically within the framework of special relativity.
Specialising (1.50) to an observer accelerating in the $\xi^1$-direction (so that in the
momentary restframe of this observer one has $u^a = (1, 0, 0, 0)$, $a^a = (0, a, 0, 0)$), we will say
that the observer undergoes constant acceleration if a is time-independent. To determine
the worldline of such an observer, we note that the general solution to (1.47) with
$u^2 = u^3 = 0$,
$\eta_{ab}u^a u^b = -(u^0)^2 + (u^1)^2 = -1$ , (1.60)
is
$u^0 = \cosh F(\tau)$ , $u^1 = \sinh F(\tau)$ (1.61)
for some function $F(\tau)$. Thus the acceleration is
$a^a = \dot u^a = \dot F(\tau)\,(\sinh F(\tau), \cosh F(\tau), 0, 0)$ , (1.62)
with norm
$a^2 = \dot F^2$ , (1.63)
and an observer with constant acceleration is characterised by $F(\tau) = a\tau$,
$\eta_{ab}\xi^a\xi^b = -(\xi^0)^2 + (\xi^1)^2 = a^{-2}$ (1.66)
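The characterisation $F(\tau) = a\tau$ can be checked numerically. The sketch below assumes the worldline $\xi^0 = a^{-1}\sinh a\tau$, $\xi^1 = a^{-1}\cosh a\tau$, i.e. (1.61) integrated once with the integration constants chosen such that the hyperbola is centred at the origin (this choice, the helper names and the sample value of a are assumptions made for illustration), in units with c = 1.

```python
import math

def worldline(tau, a):
    """Assumed worldline (xi^0, xi^1) of a constantly accelerated observer."""
    return (math.sinh(a * tau) / a, math.cosh(a * tau) / a)

def velocity(tau, a):
    """4-velocity (u^0, u^1) from (1.61) with F(tau) = a * tau."""
    return (math.cosh(a * tau), math.sinh(a * tau))

def acceleration(tau, a):
    """4-acceleration, the tau-derivative of the 4-velocity."""
    return (a * math.sinh(a * tau), a * math.cosh(a * tau))

def mink(v, w):
    """Minkowski product -v^0 w^0 + v^1 w^1 in the (xi^0, xi^1)-plane."""
    return -v[0] * w[0] + v[1] * w[1]

a = 0.5  # hypothetical value of the constant acceleration
checks = []
for tau in (-2.0, 0.0, 1.0, 3.0):
    xi, u, acc = worldline(tau, a), velocity(tau, a), acceleration(tau, a)
    checks.append((mink(u, u),       # -1:   normalisation (1.60)
                   mink(acc, u),     #  0:   orthogonality (1.50)
                   mink(acc, acc),   #  a^2: norm (1.63)
                   mink(xi, xi)))    #  a^-2: hyperbola (1.66)
print(checks)
```

All four quantities are independent of $\tau$, which is the content of "constant acceleration" in this setting.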
We can now ask the question what the Minkowski metric or line-element looks like in
the restframe of such an observer. Note that one cannot expect this to be again the
constant Minkowski metric $\eta_{ab}$: the transformation to an accelerated reference system,
while certainly allowed in special relativity, is not a Lorentz transformation, while $\eta_{ab}$
is, by definition, invariant under Lorentz transformations.
We are thus looking for coordinates that are adapted to these accelerated observers
in the same way that the inertial coordinates are adapted to static observers ($\xi^0$ is
proper time, and the spatial components $\xi^i$ remain constant). In other words, we seek a
coordinate transformation $(\xi^0, \xi^1) \to (\eta, \rho)$ such that the worldlines of these accelerated
observers are characterised by $\rho$ = constant (this is what we mean by restframe, the
observer stays at a fixed value of $\rho$) and ideally such that then $\eta$ is proportional to the
proper time of the observer.
[Figure 7 labels: worldline of a stationary observer; $\eta$ constant; $\rho$ constant]
Figure 7: Rindler metric: Rindler coordinates $(\eta, \rho)$ cover the first quadrant $\xi^1 > |\xi^0|$.
Indicated are lines of constant $\rho$ (hyperbolas, worldlines of constantly accelerating
observers) and lines of constant $\eta$ (straight lines through the origin). The quadrant is
bounded by the lightlike lines $\xi^0 = \pm\xi^1 \Leftrightarrow \eta = \pm\infty$. An inertial observer reaches and
crosses the line $\eta = \infty$ in finite proper time $\tau = \xi^0$.
It is now easy to see that in terms of these new coordinates the 2-dimensional Minkowski
metric $ds^2 = -(d\xi^0)^2 + (d\xi^1)^2$ (we are now suppressing, here and in the remainder of
this subsection, the transverse spectator dimensions 2 and 3) takes the form
$ds^2 = -\rho^2 d\eta^2 + d\rho^2$ . (1.68)
This is the so-called Rindler metric.
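The form (1.68) can be confirmed by pulling back the Minkowski metric with the Jacobi matrix of the coordinate transformation. The sketch below assumes the transformation $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$ (which is consistent with the relations $(\xi^1)^2 - (\xi^0)^2 = \rho^2$ and $\xi^0/\xi^1 = \tanh\eta$ used in this subsection); the function names and the sample point are illustrative assumptions.

```python
import math

def rindler_jacobian(eta, rho):
    """J[a][mu] = d xi^a / d x^mu for xi^0 = rho sinh(eta), xi^1 = rho cosh(eta),
    with x^mu = (eta, rho)."""
    return [[rho * math.cosh(eta), math.sinh(eta)],
            [rho * math.sinh(eta), math.cosh(eta)]]

def pulled_back_metric(J):
    """g[mu][nu] = eta_ab J^a_mu J^b_nu with eta_ab = diag(-1, +1)."""
    minkowski = [[-1.0, 0.0], [0.0, 1.0]]
    return [[sum(minkowski[a][b] * J[a][mu] * J[b][nu]
                 for a in range(2) for b in range(2))
             for nu in range(2)] for mu in range(2)]

eta, rho = 0.7, 2.0  # hypothetical sample point in the Rindler wedge
g = pulled_back_metric(rindler_jacobian(eta, rho))
print(g)  # approximately [[-rho**2, 0], [0, 1]]
```

The $\eta$-dependence drops out of all components because $\cosh^2\eta - \sinh^2\eta = 1$.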
The Rindler coordinates $\rho$ and $\eta$ are obviously in some sense hyperbolic (Lorentzian)
analogues of polar coordinates ($x = r\cos\phi$, $y = r\sin\phi$,
$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d\phi^2$). In particular, since
$(\xi^1)^2 - (\xi^0)^2 = \rho^2$ , $\frac{\xi^0}{\xi^1} = \tanh\eta$ , (1.69)
by construction the lines of constant $\rho$, $\rho = \rho_0$, are hyperbolas, $(\xi^1)^2 - (\xi^0)^2 = \rho_0^2$,
while the lines of constant $\eta = \eta_0$ are straight lines through the origin,
$\xi^0 = (\tanh\eta_0)\,\xi^1$.
The metric in these new coordinates is time-independent, where time means $\eta$,
and time-independent means that the coefficients of the metric or line-element in
(1.68) do not depend on $\eta$. This is due to the fact that the generator $\partial_\eta$ of
$\eta$-time evolution is actually the generator of a Lorentz boost in the $(\xi^0, \xi^1)$-plane
in Minkowski space,
$\partial_\eta = (\partial_\eta\xi^0)\partial_{\xi^0} + (\partial_\eta\xi^1)\partial_{\xi^1} = \xi^1\partial_{\xi^0} + \xi^0\partial_{\xi^1}$ . (1.70)
Since a Lorentz boost leaves the Minkowski metric invariant, the latter has to be
invariant under translations in $\eta$, i.e. it has to be $\eta$-independent, as is indeed the
case.
Along the worldline of an observer with constant $\rho = \rho_0$ one has $d\tau = \rho_0 d\eta$, so that his
proper time parametrised path is
$\xi^0(\tau) = \rho_0\sinh(\tau/\rho_0)$ , $\xi^1(\tau) = \rho_0\cosh(\tau/\rho_0)$ .
Even though (1.68) is just the metric of Minkowski space-time, written in accelerated
coordinates, this metric exhibits a number of interesting features that are prototypical
of more general metrics that one encounters in general relativity:
1. First of all, we notice that the coefficients of the line element (metric) in (1.68)
are no longer constant (space-time independent). Since in the case of constant
acceleration we are just describing a fake gravitational field, this dependence
on the coordinates is such that it can be completely and globally eliminated by
passing to appropriate new coordinates (namely inertial Minkowski coordinates).
Since, by the equivalence principle, locally an observer cannot distinguish between
a fake and a true gravitational field, this now suggests that a true gravitational
field can be described in terms of a space-time coordinate dependent line-element
where the coordinate dependence on the $x^\mu$ is now such that it cannot be
eliminated globally by a suitable choice of coordinates.
coordinate singularity at the origin of standard polar coordinates in the Cartesian
plane). More generally, whenever a metric written in some coordinate system
appears to exhibit some singular behaviour, one needs to investigate whether this
is just a coordinate singularity or a true singularity of the gravitational field itself.
3. The above coordinates do not just fail at $\rho = 0$, they actually fail to cover large
parts of Minkowski space. Thus the next lesson is that, given a metric in some
coordinate system, one has to investigate if the space-time described in this way
needs to be extended beyond the range of the original coordinates. One way to
analyse this question (which we will make extensive use of in sections 25 and 26
when trying to understand and come to terms with black holes) is to study light
rays or the worldlines of freely falling (inertial) observers.
In the present case, an example of an inertial observer is a static observer in
Minkowski space, i.e. an observer at a fixed value of $\xi^1$, say, with $\xi^0 = \tau$ his proper
time. In Rindler coordinates this is described by the condition that $\xi^1 = \rho\cosh\eta$
is a constant, so this is most certainly not a straight line in an $(\eta, \rho)$-diagram.
Such an observer will of course discover that $\eta = +\infty$ is not the end of the world
(indeed, he crosses this line at finite proper time $\tau = \xi^1$) and that Minkowski
space continues (at the very least) into the quadrant $\xi^0 > |\xi^1|$ (see Figure 7 for
an illustration of this).
$\rho^2 d\eta^2 = d\rho^2 \;\Rightarrow\; d\eta = \pm\rho^{-1}d\rho$ . (1.74)
$|\partial_\eta|^2 = (\xi^0)^2 - (\xi^1)^2$ . (1.75)
6. Finally we note that there is a large region of Minkowski space that is invisible
to the constantly accelerated observers. While a static observer will eventually
receive information from any event anywhere in space-time (his past lightcone
will eventually cover all of Minkowski space . . . ), the past lightcone of one of
the Rindler accelerated observers (whose worldlines asymptote to the lightcone
direction $\xi^0 = \xi^1$) will asymptotically only cover one half of Minkowski space,
namely the region $\xi^0 < \xi^1$. Thus any event above the line $\xi^0 = \xi^1$ will forever be
invisible to this class of observers. Such an observer-dependent horizon has some
similarities with the event horizon characterising a black hole (see section 26.4 for
a first encounter with such an object, and section 31 for a detailed discussion).
In order to move away from constant accelerations (as models of observers in constant
gravitational fields only), we now consider the effect of arbitrary (general) coordinate
transformations on the laws of special relativity and the geometry of Minkowski space.
This may look like a somewhat exaggerated move at this point (should we perhaps
not just look at coordinate transformations to coordinates that somehow correspond to
adapted coordinates for some arbitrary accelerated observer?), but
there are many useful things that one can learn from doing this;
and we will see later (when discussing the relation between the Einstein Equiva-
lence Principle and the Principle of General Covariance in section 3.1), that the
relation between the description of physics in an arbitrary gravitational field and
the behaviour of this description under arbitrary coordinate transformations is
much closer and more far-reaching than we perhaps have the right to expect at
the moment.
Let us see what the equation of motion (1.49) of a free massive particle looks like when
written in some other (non-inertial, accelerating) coordinate system. It is extremely
useful for bookkeeping purposes and for avoiding algebraic errors to use different kinds
of indices for different coordinate systems. Thus we will call the new coordinates $x^\mu(\xi^b)$
and not, say, $x^a(\xi^b)$.
First of all, proper time should not depend on which coordinates we use to describe the
motion of the particle (the particle couldn't care less what coordinates we experimenters
or observers use). [By the way: this is the best way to resolve the so-called
twin-paradox: It doesn't matter which reference system you use - the accelerating twin in
the rocket will always be younger than her brother when they meet again.] Thus
$d\tau^2 = -\eta_{ab}d\xi^a d\xi^b = -\eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu}dx^\mu dx^\nu$ . (1.76)
Here
$J^a_\mu(x) = \frac{\partial\xi^a}{\partial x^\mu}$ (1.77)
is the Jacobi matrix associated to the coordinate transformation $\xi^a = \xi^a(x^\mu)$, and we
will make the assumption that (locally) this matrix is non-degenerate, thus has an
inverse $J^\mu_a(x)$ or $J^\mu_a(\xi)$ which is the Jacobi matrix associated to the inverse coordinate
transformation $x^\mu = x^\mu(\xi^a)$,
$J^a_\mu J^\mu_b = \delta^a_b$ , $J^\mu_a J^a_\nu = \delta^\mu_\nu$ . (1.78)
We see that in the new coordinates, proper time and distance are no longer measured
by the Minkowski metric in its standard form (the constant matrix $\eta_{ab}$), but by
$d\tau^2 = -g_{\mu\nu}(x)dx^\mu dx^\nu$ , (1.79)
$g_{\mu\nu}(x) = \eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu}$ . (1.80)
The fact that the Minkowski metric written in the coordinates $x^\mu$ in general depends
on x should not come as a surprise - after all, this also happens when one writes the
Euclidean metric in spherical coordinates etc.
It is easy to check, using (1.78), that the inverse metric, which we will denote by $g^{\mu\nu}$,
is given by
$g^{\mu\nu}(x) = \eta^{ab}\frac{\partial x^\mu}{\partial\xi^a}\frac{\partial x^\nu}{\partial\xi^b}$ . (1.82)
We will have much more to say about the metric below and, indeed, throughout this
course.
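To see (1.80) and (1.82) at work in a case where the metric is not diagonal, here is a small Python sketch. The coordinate transformation $\xi^0 = t$, $\xi^1 = x + \frac12 at^2$ is a hypothetical example chosen purely for illustration (it is not taken from these notes), as are the function names and sample values. The code builds the metric and the inverse metric from the Jacobi matrix and its inverse and checks that their product is the identity.

```python
def jacobian(t, x, a):
    """J[a][mu] = d xi^a / d x^mu for the illustrative map
    xi^0 = t, xi^1 = x + a t^2 / 2, with x^mu = (t, x)."""
    return [[1.0, 0.0],
            [a * t, 1.0]]

def inverse_jacobian(t, x, a):
    """Jinv[mu][a] = d x^mu / d xi^a for the inverse map
    t = xi^0, x = xi^1 - a (xi^0)^2 / 2."""
    return [[1.0, 0.0],
            [-a * t, 1.0]]

ETA = [[-1.0, 0.0], [0.0, 1.0]]  # eta_ab and eta^ab agree numerically

def metric(J):
    """(1.80): g_{mu nu} = eta_ab J^a_mu J^b_nu."""
    return [[sum(ETA[a][b] * J[a][m] * J[b][n]
                 for a in range(2) for b in range(2))
             for n in range(2)] for m in range(2)]

def inverse_metric(Jinv):
    """(1.82): g^{mu nu} = eta^ab (d x^mu / d xi^a)(d x^nu / d xi^b)."""
    return [[sum(ETA[a][b] * Jinv[m][a] * Jinv[n][b]
                 for a in range(2) for b in range(2))
             for n in range(2)] for m in range(2)]

t, x, a = 1.5, 0.3, 0.4  # hypothetical sample point and parameter
g = metric(jacobian(t, x, a))
g_inv = inverse_metric(inverse_jacobian(t, x, a))
product = [[sum(g[m][k] * g_inv[k][n] for k in range(2)) for n in range(2)]
           for m in range(2)]
print(g, g_inv, product)
```

In this example the mixed component $g_{tx} = at$ is non-zero, so the metric depends on the coordinates and mixes time and space directions even though the underlying space-time is still Minkowski space.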
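Turning now to the equation of motion, the usual rules for a change of variables give
$\frac{d\xi^a}{d\tau} = \frac{\partial\xi^a}{\partial x^\mu}\frac{dx^\mu}{d\tau}$ , (1.83)
where $\frac{\partial\xi^a}{\partial x^\mu}$ is an invertible matrix at every point. Differentiating once more, one finds
$\frac{d^2\xi^a}{d\tau^2} = \frac{\partial\xi^a}{\partial x^\mu}\frac{d^2x^\mu}{d\tau^2} + \frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}$
$= \frac{\partial\xi^a}{\partial x^\mu}\frac{d^2x^\mu}{d\tau^2} + \delta^a_b\frac{\partial^2\xi^b}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}$
$= \frac{\partial\xi^a}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^b}\frac{\partial^2\xi^b}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right]$ . (1.84)
Thus, since the matrix appearing outside the square bracket is invertible, in terms of the
coordinates $x^\mu$ the equation of motion, or the equation for a straight line in Minkowski
space, becomes
$\frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0$ . (1.85)
The second term in this equation, which we will write as
$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0$ , (1.86)
where
$\Gamma^\mu_{\nu\lambda} = \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}$ , (1.87)
or, more compactly,
$\Gamma^\mu_{\nu\lambda} = J^\mu_a\partial_\nu J^a_\lambda = J^\mu_a\partial_\lambda J^a_\nu \equiv J^\mu_a J^a_{\nu\lambda}$ , (1.88)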
While (1.86) looks a bit complicated and unattractive, it is simply the general variant
of a calculation that you have probably done numerous times before in various specific
contexts. Moreover, there are at least two very useful things that we can extract or
anticipate from this equation, namely
in any theory of gravity satisfying the Einstein equivalence principle. Let us now discuss
these features in turn (relegating some uninspiring calculational details to the end of
this subsection):
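1. The Metric as a Candidate for the Gravitational Potential
It turns out that the above (pseudo-)force term can be expressed in terms of the
partial derivatives of the metric (1.80) as
$\Gamma^\mu_{\nu\lambda} = g^{\mu\rho}\Gamma_{\rho\nu\lambda}$ ,
$\Gamma_{\rho\nu\lambda} = \frac12(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho})$ . (1.89)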
It is an elementary but nevertheless useful exercise to check this (see below - but
do try this yourself as well).
This shows that the components of the metric appear to play the role of
potentials for the gravitational pseudo-force. In particular, since in principle all
components of the metric can contribute to $\Gamma^\mu_{\nu\lambda}$, we learn the interesting fact
that in order to achieve this a single scalar potential, as in the Newtonian theory,
is completely insufficient.
If the metric indeed plays the role of the gravitational potential, as suggested by
these considerations, then it will play the role of the fundamental dynamical vari-
able of gravity. Since the metric encodes what one usually refers to as the geometry
of a space(-time), namely the information required to determine distances, areas,
volumes etc., this means that we are being led to the conclusion that any theory
of gravity based on the equivalence principle is a theory of dynamical geometry.
Wow . . .
Thus the geodesic equation transforms in the simplest possible non-trivial way
under coordinate transformations $x \to y$, namely with the Jacobi matrix
$J^{\mu'}_\mu = \frac{\partial y^{\mu'}}{\partial x^\mu}$ . (1.91)
We will see later that this transformation behaviour characterises/defines tensors,
in this particular case a vector (or contravariant tensor of rank 1).
In particular, since this matrix is assumed to be invertible, we reach the conclusion
that the left hand side of (1.90) is zero if and only if the term in square brackets
on the right hand side is zero,
$\frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = 0 \;\Leftrightarrow\; \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0$ . (1.92)
This is what is meant by the statement that the equation takes the same form
in any coordinate system, and is therefore satisfied in one coordinate system if
and only if it is satisfied in all coordinate systems. We see that in this case this
is achieved by having the equation transform in a particularly simple way under
coordinate transformations, namely as a tensor.
One might then, on the basis of the equivalence principle, also want to postulate that
the motion of particles in a general gravitational field, described by a metric, is then
still governed by (1.86) and (1.89). In this more general context the $\Gamma^\mu_{\nu\lambda}$ are referred
to as the Christoffel symbols of the metric.
Happily, as we will see below, in section 1.7, these equations need not be postulated
at all - they are simply the geodesic equations satisfied by paths that extremise proper
time (or proper distance), and are thus the Euler-Lagrange equations for the obvious
generalisation of the special relativistic action for a free particle, $S \sim \int d\tau$, to an
arbitrary metric.
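1. Proof of (1.89):
From
$g_{\mu\nu} = \eta_{ab}J^a_\mu J^b_\nu$ (1.93)
one deduces
$g_{\mu\nu,\lambda} = \eta_{ab}(J^a_{\mu\lambda}J^b_\nu + J^a_\mu J^b_{\nu\lambda})$ (1.94)
where
$J^a_{\mu\lambda} = \partial_\lambda J^a_\mu = \frac{\partial^2\xi^a}{\partial x^\mu\partial x^\lambda} = J^a_{\lambda\mu}$ . (1.95)
Therefore, now adopting (1.89) as the definition of the $\Gamma$-symbols, one has
$\Gamma_{\mu\nu\lambda} = \frac12(g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu})$
$= \frac12\eta_{ab}(J^a_{\mu\lambda}J^b_\nu + J^a_\mu J^b_{\nu\lambda} + J^a_{\mu\nu}J^b_\lambda + J^a_\mu J^b_{\lambda\nu} - J^a_{\nu\mu}J^b_\lambda - J^a_\nu J^b_{\lambda\mu})$ (1.96)
$= \eta_{ab}J^a_\mu J^b_{\nu\lambda}$ ,
where the cancellations in passing to the last line arise from the symmetries
$\eta_{ab} = \eta_{ba}$, $J^b_{\nu\lambda} = J^b_{\lambda\nu}$ etc.
Thus, finally (and writing out everything in detail for once),
$\Gamma^\mu_{\nu\lambda} = g^{\mu\rho}\Gamma_{\rho\nu\lambda} = \eta^{cd}J^\mu_c J^\rho_d\,\eta_{ab}J^a_\rho J^b_{\nu\lambda}$
$= \eta^{cd}J^\mu_c\delta^a_d\,\eta_{ab}J^b_{\nu\lambda}$
$= \eta^{ca}J^\mu_c\eta_{ab}J^b_{\nu\lambda}$ (1.97)
$= \delta^c_b J^\mu_c J^b_{\nu\lambda}$
$= J^\mu_b J^b_{\nu\lambda}$ ,
as was to be shown.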
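2. Proof of (1.90):
Equating this result to (1.84) and using the chain rule for partial derivatives
$\frac{\partial y^{\mu'}}{\partial x^\mu} = \frac{\partial y^{\mu'}}{\partial\xi^a}\frac{\partial\xi^a}{\partial x^\mu}$ , (1.99)
one finds
$\frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right]$ (1.100)
as claimed.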
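The equality of the two expressions (1.87) and (1.89) for the pseudo-force term can also be checked numerically. The sketch below does this for Rindler coordinates $x^\mu = (\eta, \rho)$, assuming the transformation $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$; derivatives are approximated by central finite differences, so agreement is only expected up to a small numerical error. All function names, the step size and the sample point are illustrative assumptions.

```python
import math

def jacobian(x):
    """J[a][mu] = d xi^a / d x^mu for xi^0 = rho sinh(eta), xi^1 = rho cosh(eta)."""
    eta, rho = x
    return [[rho * math.cosh(eta), math.sinh(eta)],
            [rho * math.sinh(eta), math.cosh(eta)]]

def metric(x):
    """(1.80) with eta_ab = diag(-1, +1)."""
    J = jacobian(x)
    return [[-J[0][m] * J[0][n] + J[1][m] * J[1][n] for n in range(2)]
            for m in range(2)]

def d(f, x, k, h=1e-5):
    """Central finite difference of the matrix-valued function f along x^k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    fp, fm = f(xp), f(xm)
    return [[(fp[i][j] - fm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def inverse2(M):
    """Inverse of a non-degenerate 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def gamma_from_jacobian(x):
    """(1.87)/(1.88): Gamma^mu_{nu lam} = J^mu_a d_nu J^a_lam."""
    Jinv = inverse2(jacobian(x))                 # Jinv[mu][a] = d x^mu / d xi^a
    dJ = [d(jacobian, x, k) for k in range(2)]   # dJ[nu][a][lam] = d_nu J^a_lam
    return [[[sum(Jinv[m][a] * dJ[n][a][l] for a in range(2))
              for l in range(2)] for n in range(2)] for m in range(2)]

def gamma_from_metric(x):
    """(1.89): (1/2) g^{mu rho} (g_{rho nu,lam} + g_{rho lam,nu} - g_{nu lam,rho})."""
    ginv = inverse2(metric(x))
    dg = [d(metric, x, k) for k in range(2)]     # dg[k][m][n] = g_{mn,k}
    return [[[0.5 * sum(ginv[m][r] * (dg[l][r][n] + dg[n][r][l] - dg[r][n][l])
                        for r in range(2))
              for l in range(2)] for n in range(2)] for m in range(2)]

x = [0.3, 2.0]  # hypothetical sample point (eta, rho)
G1, G2 = gamma_from_jacobian(x), gamma_from_metric(x)
max_diff = max(abs(G1[m][n][l] - G2[m][n][l])
               for m in range(2) for n in range(2) for l in range(2))
print(max_diff)
```

For this metric the only non-vanishing components are $\Gamma^\rho_{\eta\eta} = \rho$ and $\Gamma^\eta_{\eta\rho} = \Gamma^\eta_{\rho\eta} = 1/\rho$, which the two routines reproduce at the sample point.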
Above we saw that the motion of free particles in Minkowski space in curvilinear
coordinates is described in terms of a modified metric, $g_{\mu\nu}$, and a force term $\Gamma^\mu_{\nu\lambda}$ representing
the pseudo-force on the particle. Thus the Einstein Equivalence Principle suggests
that an appropriate description of true gravitational fields is in terms of a metric tensor
$g_{\mu\nu}(x)$ (and its associated Christoffel symbols) which can only locally be related to the
Minkowski metric via a suitable coordinate transformation (to locally inertial
coordinates). Thus our starting point will now be a space-time equipped with some metric
$g_{\mu\nu}(x)$, which (by analogy with the Euclidean and Minkowski metrics) we will assume
to be symmetric and non-degenerate, i.e.
$g_{\mu\nu}(x) = g_{\nu\mu}(x)$ , $\det(g_{\mu\nu}(x)) \neq 0$ .
The metric encodes the information how to measure (spatial and temporal) distances,
as well as areas, volumes etc., via the associated line element
$ds^2 = g_{\mu\nu}(x)dx^\mu dx^\nu$ .
Thus a metric determines a geometry (in the literal sense of a prescription for
measuring distances etc.), but different metrics may well determine the same geometry,
namely those metrics which are just related by coordinate transformations. In
particular, distances should not depend on which coordinate system is used. Hence, changing
coordinates from the $\{x^\mu\}$ to new coordinates $\{y^{\mu'}(x^\mu)\}$ and demanding that
Remarks:
1. Here I have denoted the components of the metric in the new coordinates y simply
by $g_{\mu'\nu'}$. Occasionally it is more convenient to use a more elaborate notation, such
as
$g'_{\mu'\nu'} = \frac{\partial x^\mu}{\partial y^{\mu'}}\frac{\partial x^\nu}{\partial y^{\nu'}}g_{\mu\nu} = J^\mu_{\mu'}J^\nu_{\nu'}g_{\mu\nu}$ , (1.105)
which allows one to distinguish notationally specific components of the metric in
different coordinate systems, such as $g'_{11}$ (the (11)-component of the metric in the
y-coordinates) from $g_{11}$ (the (11)-component of the metric in the x-coordinates).
As mentioned before, indices and other decorations are primarily bookkeeping
devices; therefore I will usually not be overly pedantic about these things in the
following and will use whatever notation is more convenient in the case at hand.
Clearly, the inverse metric then transforms inversely, i.e. with the inverse Jacobi
matrices $J^{\mu'}_\mu$, and this is now nicely compatible with the convention to denote the
inverse metric by upper indices,
$g^{\mu'\nu'} = J^{\mu'}_\mu J^{\nu'}_\nu g^{\mu\nu}$ . (1.107)
This is also the rationale for writing the inverse metric with upper indices: the
positioning of indices is used to indicate how an object transforms under coordinate
transformations (and we will formalise this in the discussion of section 3 on tensor
algebra).
3. A space-time equipped with a metric tensor $g_{\mu\nu}(x)$ is called a metric space-time
or (pseudo-)Riemannian space-time. Here Riemannian usually refers to a space
equipped with a positive-definite metric (all eigenvalues positive), while
pseudo-Riemannian (or Lorentzian) refers to a space-time with a metric with one negative
and 3 (or 27, or whatever) positive eigenvalues.
4. One point to note about the tensorial transformation behaviour is that pointwise
it is a congruence transformation (rather than a similarity transformation $g \mapsto J^{-1}gJ$)
in the sense of linear algebra, in matrix notation
$g \mapsto J^t g J$ . (1.108)
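As an illustration of (1.108), the following Python sketch transforms the Euclidean metric of $\mathbb{R}^3$ from Cartesian to spherical coordinates. It assumes the standard relations $x = r\sin\theta\cos\phi$, $y = r\sin\theta\sin\phi$, $z = r\cos\theta$; the function names and the sample point are illustrative assumptions.

```python
import math

def jacobian(r, theta, phi):
    """J[mu][mu'] = d x^mu / d y^mu' for x^mu = (x, y, z), y^mu' = (r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp,  r * st * cp],
            [ct,      -r * st,      0.0]]

def transform_metric(g, J):
    """(1.108): g' = J^t g J, i.e. g'_{mu' nu'} = J^mu_mu' J^nu_nu' g_{mu nu}."""
    n = len(g)
    return [[sum(J[m][a] * g[m][k] * J[k][b] for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]

euclid = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r, theta, phi = 2.0, 0.9, 0.4  # hypothetical sample point
g_sph = transform_metric(euclid, jacobian(r, theta, phi))
expected = [1.0, r ** 2, (r * math.sin(theta)) ** 2]
print([g_sph[i][i] for i in range(3)], expected)
```

The eigenvalues of the metric change under this transformation (from (1, 1, 1) to $(1, r^2, r^2\sin^2\theta)$), while their signs do not, which is what one expects from a transformation of the form $J^t g J$.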
Here are some examples of Riemannian metrics that you may already be familiar with.
Examples:
and plugging this into the Euclidean line-element dx2 + dy 2 + dz 2 , one finds the
above result.
Denoting the Cartesian coordinates by $x^\mu$ and the spherical coordinates by $y^{\mu'}$,
with ($y^1 = r$, $y^2 = \theta$, $y^3 = \phi$), the non-vanishing components of the metric in the
two coordinate systems are thus (using the prime notation (1.105))
$g_{11} = g_{22} = g_{33} = 1$ , $g'_{11} = 1$ , $g'_{22} = r^2$ , $g'_{33} = r^2\sin^2\theta$ . (1.111)
Alternatively, it is often more informative (and very common) to use the
coordinates themselves, rather than indices, as the labels of the components of the
metric tensor. In this case one can dispense with the prime notation and simply
write the components of the metric in spherical coordinates as
2. Restricting the first example above to constant radius r = R, this gives us the
line-element on the circle $S^1_R$ of radius R,
$ds^2(S^1_R) = R^2 d\phi^2$ . (1.113)
Restricting the second to the 2-sphere $S^2_R$ of radius R,
$x^2 + y^2 + z^2 = r^2 = R^2$ or r = R , (1.114)
$ds^2(S^2_R) = R^2(d\theta^2 + \sin^2\theta\,d\phi^2) \equiv R^2 d\Omega^2$ . (1.115)
Here
$d\Omega^2 = d\theta^2 + \sin^2\theta\,d\phi^2$ (1.116)
is usually called the solid angle, and we can now interpret it as the line element
on the unit 2-sphere. We will use the notation / abbreviation $d\Omega^2$ for this line
element throughout the notes.
This example provides a nice illustration of the fact that by drawing the coordinate
grid / infinitesimal parallelograms determined by the metric tensor, one can get a
feeling for the geometry and can in particular convince oneself that in general a
metric space or space-time need not or cannot be flat, i.e. is not the flat Euclidean
space of Euclidean geometry.
Indeed, the coordinate grid of the metric $d\theta^2 + \sin^2\theta\,d\phi^2$ cannot be drawn in
flat space because the infinitesimal parallelograms described by $ds^2$ degenerate
to triangles not just at $\theta = 0$ (as would also be the case for the flat metric
$ds^2 = dr^2 + r^2 d\phi^2$ in polar coordinates at r = 0), but also at $\theta = \pi$. This
coordinate grid can, on the other hand, of course be drawn on the 2-sphere.
and (if required) this can be continued iteratively to yet higher-dimensional spheres.
4. If instead of the unit 2-sphere one considers the unit hyperboloid $H^2$, defined
by
$x^2 + y^2 + z^2 = +1 \;\to\; x^2 + y^2 - z^2 = -1$ , (1.118)
then this is naturally thought of as being embedded not in $\mathbb{R}^3$ but in $\mathbb{R}^{1,2}$, i.e. into
the 3-dimensional vector space with line-element
The hyperbolic analogues (r, , ) of the spherical coordinates, defined by
$x^2 + y^2 - z^2 = -r^2$ (1.121)
so that the unit hyperboloid is evidently just the surface r = 1. In these coordi-
nates, the metric (1.119) takes the form
Examples:
(i.e. the components depend only on the spatial coordinates $x^i$, not on t).
3. Somewhat more generally, the spatial components of the metric can depend
non-trivially on time. For example, a space-time metric describing a spatially spherical
universe with a time-dependent radius (expansion of the universe!) might be
described by the line element
$ds^2 = -dt^2 + a(t)^2\left(d\psi^2 + \sin^2\psi\,(d\theta^2 + \sin^2\theta\,d\phi^2)\right)$ , (1.127)
and more generally one can consider the corresponding generalisation of (1.126),
namely metrics of the form
This describes a space-time with spatial metric $g_{ij}(x)dx^i dx^j$ and a time-dependent
radius a(t); in particular, such a space-time metric can describe an expanding
universe in cosmology. We will discuss such metrics in detail later on in the
context of cosmology, sections 32-37.
The characteristic feature of metrics with Lorentzian signature is of course the presence
of timelike and null (lightlike) directions, and thus in a pseudo-Riemannian space-time
one has the same distinction between spacelike, timelike and lightlike separations as in
Minkowski space(-time). Infinitesimal
and null or lightlike if $g_{\mu\nu}(x)V^\mu(x)V^\nu(x) = 0$,
and a curve $x^\mu(\lambda)$ is called spacelike if its tangent vector is everywhere spacelike etc.
Using the definition of a vector in general relativity (to be introduced in section 3),
namely an object that transforms in the obvious way, with the Jacobi matrix, under
coordinate transformations, one sees that $g_{\mu\nu}(x)V^\mu(x)V^\nu(x)$ is a scalar, i.e. invariant
under coordinate transformations, and hence the statement that a vector is, say,
spacelike is a coordinate-independent statement, as it should be.
When the metric (1.124) is (time-space) block-diagonal, i.e. when the mixed components
$g_{0k} = 0$ (as in all of the above examples), then the timelike and spacelike directions are
easy to distinguish by inspection. Typically then the spatial metric $g_{ik}$ is positive
definite, and thus necessarily $g_{00} < 0$.
When some of the $g_{0k}$ are non-zero, on the other hand, one has a more intricate mixing
of time- and space-directions. This can also be seen from the components of the inverse
metric. Indeed, from (1.106), one finds
In particular, this shows that in general (i.e. unless the off-diagonal components $g_{0k}$ are
all zero), the spatial components $g^{ik}$ of the inverse metric are not the inverse of the
spatial components $g_{ij}$ of the metric. Rather, using (1.131) one has
$\left(g_{ij} - \frac{1}{g_{00}}g_{i0}g_{j0}\right)g^{jk} = \delta_i^k$ . (1.133)
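The relation (1.133) is easily tested numerically. The sketch below uses a hypothetical 3-dimensional metric with non-zero mixed components $g_{0k}$ (the numerical values are made up for illustration), inverts it with a small Gauss-Jordan routine, and compares the combination appearing in (1.133) with the naive product of the spatial blocks.

```python
def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (M must be non-degenerate)."""
    n = len(M)
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

g = [[-1.0, 0.3, 0.1],
     [0.3, 2.0, 0.4],
     [0.1, 0.4, 1.5]]   # hypothetical metric with g_0k != 0
g_inv = inverse(g)
space = range(1, 3)

# h_ij = g_ij - g_i0 g_j0 / g_00, the combination appearing in (1.133)
h = [[g[i][j] - g[i][0] * g[j][0] / g[0][0] for j in space] for i in space]

# left-hand side of (1.133): h_ij g^{jk}, expected to be the identity
lhs = [[sum(h[i][j] * g_inv[j + 1][k] for j in range(2)) for k in space]
       for i in range(2)]

# naive product g_ij g^{jk} of the spatial blocks, in general NOT the identity
naive = [[sum(g[i + 1][j + 1] * g_inv[j + 1][k] for j in range(2)) for k in space]
         for i in range(2)]
print(lhs, naive)
```

The difference between the two products is precisely the term $g_{i0}g^{0k}$, which vanishes only when the mixed components are zero.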
At this point the question naturally arises how one can tell whether a given (perhaps
complicated looking) metric is just the flat (Euclidean or Minkowski) metric written
in other coordinates or whether it describes a genuinely curved space-time. We will see
later that there is an object, the Riemann curvature tensor, constructed from the second
derivatives of the metric, which has the property that all of its components vanish if and
only if the metric is a coordinate transform of the flat space Minkowski metric. Thus,
given a metric, by calculating its curvature tensor one can decide if the metric is just
the flat metric in disguise or not. The curvature tensor will be introduced in section 7,
and the above statement will be established in section 10.2.
1.7 Geodesic Equation from the Extremisation of Proper Time
We have seen that the equation for a straight line in Minkowski space, written in
arbitrary coordinates, is
$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0$ , (1.134)
where the pseudo-force term $\Gamma^\mu_{\nu\lambda}$ is given by (1.87). We have also seen in (1.89)
(provided you checked this) that $\Gamma^\mu_{\nu\lambda}$ can be expressed in terms of the metric (1.80) as
$\Gamma^\mu_{\nu\lambda} = \frac12 g^{\mu\rho}(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho})$ . (1.135)
This gravitational force term is fictitious since it can globally be transformed away by
going to the global inertial coordinates $\xi^a$. The equivalence principle suggests, however,
that in general the equation for the worldline of a massive particle, i.e. a path that
extremises proper time, in a true gravitational field is also of the above form.
We will now confirm this by deriving the equations for a timelike path that extremises
proper time from a variational principle. These paths will be referred to as (timelike)
geodesics. We will briefly return below to the (delicate) issue to which extent these can
be regarded as world lines of actual massive particles.
Recall first of all from special relativity that the Lorentz-covariant description of the
dynamics of a massive particle is based on describing the timelike worldline of the
particle in the parametric form
$\xi^a = \xi^a(\tau)$ (1.136)
$d\tau^2 = -\eta_{ab}d\xi^a d\xi^b$ . (1.137)
We can adopt the same set-up and action in the present setting. Thus we parametrise
the worldlines by
$x^\mu = x^\mu(\tau)$ , (1.141)
invariant under general coordinate transformations (provided that one transforms the
metric appropriately). The corresponding 4-velocity
$u^\mu = \frac{dx^\mu}{d\tau}$ (1.143)
is again normalised as
$g_{\mu\nu}u^\mu u^\nu = -1$ , (1.144)
Of course m drops out of the variational equations (as it should by the equivalence
principle) and we will therefore ignore m in the following.
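and to write
$\int d\tau = \int (d\tau/d\lambda)\,d\lambda = \int \left(-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)^{1/2} d\lambda$ . (1.147)
keeping the end-points fixed, and will denote the $\tau$-derivatives by $\dot x^\mu(\tau)$. By the standard
variational procedure one then finds
$\delta\int d\tau = \frac12\int \left(-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)^{-1/2} d\lambda \left[-\delta g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} - 2g_{\mu\nu}\frac{d\delta x^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right]$
$= \frac12\int d\tau \left[-g_{\mu\nu,\lambda}\dot x^\mu\dot x^\nu\delta x^\lambda + 2g_{\mu\nu}\ddot x^\nu\delta x^\mu + 2g_{\mu\nu,\lambda}\dot x^\lambda\dot x^\nu\delta x^\mu\right]$
$= \int d\tau \left[g_{\mu\nu}\ddot x^\nu + \frac12(g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu})\dot x^\nu\dot x^\lambda\right]\delta x^\mu$ (1.149)
Here the factor of 2 in the first equality is a consequence of the symmetry of the metric,
the second equality follows from an integration by parts, the third from relabelling the
indices in one term and using the symmetry in the indices of $\dot x^\lambda\dot x^\nu$ in the other.
$\Gamma^\mu_{\nu\lambda} = \frac12 g^{\mu\rho}(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho})$ , (1.150)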
Thus we see that indeed the equations for a timelike geodesic in an arbitrary
gravitational field are
$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0$ . (1.152)
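To get a feeling for the geodesic equation (1.152), one can integrate it numerically in a simple example. The sketch below does this for the Rindler metric (1.68), $ds^2 = -\rho^2 d\eta^2 + d\rho^2$, for which (1.150) gives $\Gamma^\rho_{\eta\eta} = \rho$ and $\Gamma^\eta_{\eta\rho} = \Gamma^\eta_{\rho\eta} = 1/\rho$ as the only non-vanishing components. Since this is just Minkowski space in disguise, a timelike geodesic should be a straight line in the inertial coordinates, here assumed to be related to $(\eta, \rho)$ by $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$. The Runge-Kutta integrator, the step size and the initial data are illustrative choices.

```python
import math

def rhs(state):
    """Geodesic equation (1.152) for ds^2 = -rho^2 d eta^2 + d rho^2.
    state = (eta, rho, d eta/d tau, d rho/d tau)."""
    eta, rho, deta, drho = state
    return (deta,
            drho,
            -2.0 * deta * drho / rho,   # -2 Gamma^eta_{eta rho} eta' rho'
            -rho * deta * deta)         # -Gamma^rho_{eta eta} (eta')^2

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Initial data of a static inertial observer at xi^1 = 1: eta, rho, eta', rho'
state = (0.0, 1.0, 1.0, 0.0)
h, steps = 0.001, 500          # integrate up to proper time tau = 0.5
for _ in range(steps):
    state = rk4_step(state, h)

eta, rho, deta, drho = state
xi0, xi1 = rho * math.sinh(eta), rho * math.cosh(eta)
norm = -rho ** 2 * deta ** 2 + drho ** 2   # g_{mu nu} u^mu u^nu, cf. (1.144)
print(xi0, xi1, norm)  # close to (0.5, 1.0, -1.0)
```

In Rindler coordinates the path is curved ($\rho$ decreases while $\eta$ grows), but mapped back to inertial coordinates it is the straight line $\xi^1 = 1$, $\xi^0 = \tau$.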
Remarks:
1. By definition, massive test particles are those particles that satisfy the above
geodesic equation, i.e. that follow timelike geodesics in space-time. However, it
needs to be borne in mind that this notion of a test particle is a fiction, in particular
as it neglects the backreaction, i.e. the change in the background gravitational field
due to the mass of the particle. Moreover, real particles either have a finite extent
(in which case this finite size should play a role in their equations of motion) or
are considered to be point-like. However, the notion of a point-like particle is
extremely dangerous and delicate in general relativity: as we will see later, if a
given total mass is concentrated in a sufficiently small region of space-time (and
point-like certainly qualifies as sufficiently small), then one will end up with a
black hole rather than with the description of a particle. The correct description
of point particles in general relativity is a complicated issue and an active area of
research.5
2. One can also consider spacelike paths that extremise (minimise) proper distance,
by using the action
$S_0 \sim \int ds$ (1.153)
where
$ds^2 = g_{\mu\nu}(x)dx^\mu dx^\nu$ (1.154)
is the proper distance (or arc-length in the traditional terminology of the
differential geometry of curves).
3. One should also consider massless particles, whose worldlines will be null (or
lightlike) paths. However, in that case one can evidently not use proper time or
proper distance, since these are by definition zero along a null path, $ds^2 = 0$. We
will come back to this special case, and a unified description of the massive and
massless case, below (section 2.1). In all cases, we will refer to the resulting paths
as geodesics. If required, we add the qualifier "timelike", "spacelike" or "null",
and this is meaningful and unambiguous since, as we will see below, a geodesic
that is initially timelike will always remain timelike etc.
We will have much more to say about geodesics and variational principles in section 2.
5 See e.g. E. Poisson, A. Pound, I. Vega, The Motion of Point Particles in Curved Spacetime,
arXiv:1102.0529 [gr-qc], for a detailed discussion and many references (but you will need to acquire
a solid understanding of tensor analysis first).
1.8 Christoffel Symbols and Coordinate Transformations
The Christoffel symbols play the role of the gravitational force term, and thus in this
sense the components of the metric play the role of the gravitational potential. These
Christoffel symbols play an important role not just in the geodesic equation but, as
we will see later on, more generally in the definition of a covariant derivative operator
and the construction of the curvature tensor, and thus ultimately also in the generally
covariant description of the dynamics of the gravitational field itself.
Two elementary but important properties of the Christoffel symbols are that they are
symmetric in the second and third indices,
$$\Gamma_{\mu\nu\lambda} = \Gamma_{\mu\lambda\nu} , \qquad \Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\lambda\nu} \tag{1.155}$$
(this follows simply from the definition), and that symmetrising over the first pair
of indices one finds
$$\Gamma_{\mu\nu\lambda} + \Gamma_{\nu\mu\lambda} = g_{\mu\nu,\lambda} \tag{1.156}$$
(this follows from noting that 4 of the 6 partial derivative terms of the metric cancel
in this linear combination while 2 add up).
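Knowing how the metric transforms under coordinate transformations, we can now also
determine how the Christoffel symbols (1.135) and the geodesic equation transform. A
straightforward but not particularly inspiring calculation (which you should nevertheless
do) shows that under $x^\mu \to y^{\mu'}$ the Christoffel symbols are related by
$$\Gamma^{\mu'}_{\nu'\lambda'} = \Gamma^\mu_{\nu\lambda}\frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\lambda'}} + \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\lambda'}} , \tag{1.157}$$
or, in terms of the Jacobi matrices $J^{\mu'}_\mu = \partial y^{\mu'}/\partial x^\mu$, $J^\mu_{\mu'} = \partial x^\mu/\partial y^{\mu'}$ and
their derivatives $J^\mu_{\nu'\lambda'} = \partial^2 x^\mu/\partial y^{\nu'}\partial y^{\lambda'}$,
$$\Gamma^{\mu'}_{\nu'\lambda'} = J^{\mu'}_\mu J^\nu_{\nu'} J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu J^\mu_{\nu'\lambda'} . \tag{1.158}$$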
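Namely, after another not terribly inspiring calculation (which you should nevertheless
also do at least once in your life), one finds
$$\frac{d^2 y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left[\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] . \tag{1.159}$$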
This is analogous to the result (1.90) that we had obtained before in Minkowski space,
and the same remarks about covariance and tensors etc. apply. An explicit proof of
(1.158) and (1.159) is given at the end of this subsection. A more general result along
these lines will be established in section 4.1 below, when we introduce the covariant
derivative of a vector field.
Remarks:
1. That the geodesic equation transforms in this simple way (namely as a vector)
should not come as a surprise. We obtained this equation as a variational equation.
The Lagrangian itself is a scalar (invariant under coordinate transformations), and
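the variation $\delta x^\mu$ is (i.e. transforms like) a vector,
$$\delta y^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\,\delta x^\mu = J^{\mu'}_\mu\,\delta x^\mu . \tag{1.160}$$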
Putting these pieces together, one finds the desired result.
3. There is of course a very good physical reason for why the force term in the
geodesic equation (quadratic in the 4-velocities) is not tensorial. This simply
reflects the equivalence principle that locally, at a point (or in a sufficiently small
neighbourhood of a point) you can eliminate the gravitational force by going to
a freely falling (inertial) coordinate system. This would not be possible if the
gravitational force term in the equation of motion for a particle were tensorial.
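1. Proof of (1.158)
For partial derivatives one has the chain rule $\partial_{\mu'} = J^\mu_{\mu'}\partial_\mu$ ($\partial_\mu$ is a covector).
Therefore for the partial derivatives of the metric one has
$$g_{\mu'\nu',\lambda'} = (J^\mu_{\mu'} J^\nu_{\nu'} g_{\mu\nu})_{,\lambda'} = g_{\mu\nu,\lambda} J^\mu_{\mu'} J^\nu_{\nu'} J^\lambda_{\lambda'} + (J^\mu_{\mu'\lambda'} J^\nu_{\nu'} + J^\mu_{\mu'} J^\nu_{\nu'\lambda'})\, g_{\mu\nu} . \tag{1.161}$$
Adding up the 3 terms comprising the Christoffel symbol $\Gamma_{\mu'\nu'\lambda'}$, one obtains
$$\begin{aligned}
2\Gamma_{\mu'\nu'\lambda'} &= g_{\mu'\nu',\lambda'} + g_{\mu'\lambda',\nu'} - g_{\nu'\lambda',\mu'} \\
&= 2 J^\mu_{\mu'} J^\nu_{\nu'} J^\lambda_{\lambda'}\Gamma_{\mu\nu\lambda} \\
&\quad + (J^\mu_{\mu'\lambda'} J^\nu_{\nu'} + J^\mu_{\mu'} J^\nu_{\nu'\lambda'} + J^\mu_{\mu'\nu'} J^\nu_{\lambda'} + J^\mu_{\mu'} J^\nu_{\lambda'\nu'} - J^\mu_{\nu'\mu'} J^\nu_{\lambda'} - J^\mu_{\nu'} J^\nu_{\lambda'\mu'})\, g_{\mu\nu} .
\end{aligned} \tag{1.162}$$
In the last line, the 3rd term cancels against the 5th (because $J^\mu_{\mu'\nu'}$ is symmetric in its
lower indices), the 1st term cancels against the 6th (because $J^\mu_{\mu'\nu'}$ and $g_{\mu\nu}$ are symmetric), while
the 2nd and 4th term add up, so that one finds
$$\Gamma_{\mu'\nu'\lambda'} = J^\mu_{\mu'} J^\nu_{\nu'} J^\lambda_{\lambda'}\Gamma_{\mu\nu\lambda} + J^\mu_{\mu'} J^\nu_{\nu'\lambda'}\, g_{\mu\nu} . \tag{1.163}$$
Now the hard work has been done. Raising the 1st index of the Christoffel symbol,
using the inverse metric
$$g^{\mu'\rho'} = g^{\mu\rho} J^{\mu'}_\mu J^{\rho'}_\rho , \tag{1.164}$$
it is now simple to see that one obtains the claimed result (1.158),
$$\Gamma^{\mu'}_{\nu'\lambda'} = g^{\mu'\rho'}\Gamma_{\rho'\nu'\lambda'} = J^{\mu'}_\mu J^\nu_{\nu'} J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu J^\mu_{\nu'\lambda'} . \tag{1.165}$$
For example, for the 2nd term one has (just using properties of inverse Jacobi
matrices and metrics)
$$\begin{aligned}
g^{\mu'\rho'} J^\mu_{\rho'} J^\nu_{\nu'\lambda'}\, g_{\mu\nu} &= g^{\rho\sigma} J^{\mu'}_\rho J^{\rho'}_\sigma J^\mu_{\rho'} J^\nu_{\nu'\lambda'}\, g_{\mu\nu} = g^{\rho\sigma} J^{\mu'}_\rho \delta^\mu_\sigma J^\nu_{\nu'\lambda'}\, g_{\mu\nu} \\
&= g^{\rho\mu} J^{\mu'}_\rho J^\nu_{\nu'\lambda'}\, g_{\mu\nu} = J^{\mu'}_\rho \delta^\rho_\nu J^\nu_{\nu'\lambda'} = J^{\mu'}_\nu J^\nu_{\nu'\lambda'} .
\end{aligned} \tag{1.166}$$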
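2. Proof of (1.159)
The 4-velocities transform as vectors (the chain rule again), $\dot y^{\mu'} = J^{\mu'}_\mu \dot x^\mu$. Therefore
for the acceleration one has (with $J^{\mu'}_{\mu\nu} = \partial_\nu J^{\mu'}_\mu$)
$$\ddot y^{\mu'} = J^{\mu'}_\mu \ddot x^\mu + J^{\mu'}_{\mu\nu}\dot x^\mu \dot x^\nu . \tag{1.167}$$
Therefore
$$\begin{aligned}
\ddot y^{\mu'} + \Gamma^{\mu'}_{\nu'\lambda'}\dot y^{\nu'}\dot y^{\lambda'} &= J^{\mu'}_\mu(\ddot x^\mu + J^\nu_{\nu'} J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda}\dot y^{\nu'}\dot y^{\lambda'}) + J^{\mu'}_\mu J^\mu_{\nu'\lambda'}\dot y^{\nu'}\dot y^{\lambda'} + J^{\mu'}_{\mu\nu}\dot x^\mu\dot x^\nu \\
&= J^{\mu'}_\mu(\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda) + (J^{\mu'}_\mu J^\mu_{\nu'\lambda'} + J^{\mu'}_{\mu\nu} J^\mu_{\nu'} J^\nu_{\lambda'})\,\dot y^{\nu'}\dot y^{\lambda'} .
\end{aligned} \tag{1.168}$$
The 1st term will give us the desired result, and cooperatively the 2nd term is
identically zero because (use $\partial_{\lambda'} = J^\nu_{\lambda'}\partial_\nu$ again)
$$0 = (\delta^{\mu'}_{\nu'})_{,\lambda'} = (J^{\mu'}_\mu J^\mu_{\nu'})_{,\lambda'} = J^{\mu'}_{\mu\nu} J^\nu_{\lambda'} J^\mu_{\nu'} + J^{\mu'}_\mu J^\mu_{\nu'\lambda'} . \tag{1.169}$$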
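As a concrete check of the transformation law (1.158), one can go from Cartesian coordinates on the Euclidean plane (where all Christoffel symbols vanish) to polar coordinates, so that only the inhomogeneous term survives. The following is a minimal sketch under that assumption; the function name and the sample point are illustrative choices, not part of these notes.

```python
import math

def polar_christoffels_from_transformation(r, phi):
    """Gamma^{mu'}_{nu' lambda'} in polar coordinates y = (r, phi), computed from the
    inhomogeneous term J^{mu'}_mu J^mu_{nu' lambda'} of (1.158) alone, because all
    Christoffel symbols vanish in Cartesian coordinates x = (x, y)."""
    c, s = math.cos(phi), math.sin(phi)
    # J^{mu'}_mu = d y^{mu'} / d x^mu  (rows: r, phi; columns: x, y)
    jac_y_x = [[c, s],
               [-s / r, c / r]]
    # J^mu_{nu' lambda'} = d^2 x^mu / d y^{nu'} d y^{lambda'}
    # first index: mu in (x, y); then nu', lambda' in (r, phi)
    d2x = [
        [[0.0, -s], [-s, -r * c]],  # second derivatives of x = r cos(phi)
        [[0.0, c], [c, -r * s]],    # second derivatives of y = r sin(phi)
    ]
    return [[[sum(jac_y_x[a][m] * d2x[m][b][k] for m in range(2))
              for k in range(2)] for b in range(2)] for a in range(2)]

# Sample point (arbitrary illustrative values).
R0, PHI0 = 2.0, 0.7
gamma = polar_christoffels_from_transformation(R0, PHI0)
gamma_r_phiphi = gamma[0][1][1]  # should agree with -r
gamma_phi_rphi = gamma[1][0][1]  # should agree with 1/r
```

All other components come out as zero (up to rounding), which is the statement that a non-tensorial object can vanish in one coordinate system and not in another.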
1.9 Apology and Outlook
You may feel that, after a promising start in sections 1.1 and 1.3, the things that we
have done subsequently, in particular in sections 1.4 and 1.8, look terribly messy. I
agree, indeed they are! However, I can assure you that things will improve dramatically
rather quickly and that this section 1 is by far the messiest part of the entire lecture
notes.
Indeed, the main purpose and benefit of developing tensor calculus in the next couple
of sections is to develop a formalism in the framework of which (among other things)
1. one can avoid having to deal explicitly with objects that transform in complicated
ways under coordinate transformations
2. the transformation behaviour of any object is manifest (and does not have to be
checked)
This tensor calculus formalism is simple, elegant and efficient and will then allow us
to make rapid progress towards describing the dynamics in a (and subsequently of the)
gravitational field in a way compatible with the Einstein equivalence principle.
2 Physics and Geometry of Geodesics
Let us first verify that $S_1$ really leads to the same equations of motion as $S_0$. Either by
direct variation of the action, or by using the Euler-Lagrange equations
$$\frac{d}{d\lambda}\frac{\partial L}{\partial(dx^\mu/d\lambda)} - \frac{\partial L}{\partial x^\mu} = 0 , \tag{2.4}$$
one finds that the action is indeed extremised by the solutions to the equation
$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\rho}\frac{dx^\nu}{d\lambda}\frac{dx^\rho}{d\lambda} = 0 . \tag{2.5}$$
This is identical to the geodesic equation derived from $S_0$ (with $\lambda \to \tau$, the proper
time). This is essentially all we will need, and we will make extensive use of this simpler
Lagrangian for geodesics throughout these notes.
Even though not strictly required in the following, it is nevertheless quite instructive in
its own right to try to understand and establish the precise relation between these two
actions S0 and S1 , and this is the subject of the remainder of this subsection.
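Thus, what is the relation (if any) between the two actions? In order to explain this, it
will be useful to introduce an additional field $e(\lambda)$ (i.e. in addition to the $x^\mu(\lambda)$), and a
"master action" (or parent action) $S$ which we can relate to both $S_0$ and $S_1$. Consider
the action
$$S[x,e] = \frac{1}{2}\int d\lambda \left( e(\lambda)^{-1} g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} - m^2 e(\lambda) \right) = \int d\lambda \left( e(\lambda)^{-1} L - \tfrac{1}{2} m^2 e(\lambda) \right) . \tag{2.6}$$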
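The crucial property of this action is that it is parametrisation invariant provided that
one declares $e(\lambda)$ to transform appropriately. It is easy to see that under a transformation
$\bar\lambda = f(\lambda)$, with
$$\bar x^\mu(\bar\lambda) = x^\mu(\lambda) , \qquad d\bar\lambda = f'(\lambda)\, d\lambda , \tag{2.7}$$
the action $S[x,e]$ is invariant provided that $e(\lambda)$ transforms such that $e(\lambda)d\lambda$ is invariant,
i.e.
$$\bar e(\bar\lambda)\, d\bar\lambda \overset{!}{=} e(\lambda)\, d\lambda \quad\Rightarrow\quad \bar e(\bar\lambda) = e(\lambda)/f'(\lambda) . \tag{2.8}$$
Indeed, this is evident when one writes the action (2.6) in the form
$$S[x,e] = \frac{1}{2}\int e(\lambda)\, d\lambda \left( e(\lambda)^{-2} g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} - m^2 \right) \tag{2.9}$$
and notes that $d\lambda$ and $e(\lambda)$ only appear in the combinations $e(\lambda)d\lambda$ and $e(\lambda)^{-1}(d/d\lambda)$.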
Now what is the relation between the action $S[x,e]$ and the two standard actions
$S_0[x]$ and $S_1[x]$?
The first thing to note now is that, courtesy of this parametrisation invariance, we
can always choose a "gauge" in which $e(\lambda) = 1$. With this choice, the action $S[x,e]$
manifestly reduces to the action $S_1[x]$ modulo an irrelevant field-independent constant,
$$S[x, e=1] = \int d\lambda\, L - \tfrac{1}{2} m^2 \int d\lambda = S_1[x] + \text{const.} \tag{2.10}$$
Alternatively, instead of fixing the gauge, we can try to eliminate $e(\lambda)$ (which
appears purely algebraically, i.e. without derivatives, in the action) by its equation
of motion. Varying $S[x,e]$ with respect to $e(\lambda)$, one finds the constraint
$$g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} + m^2 e(\lambda)^2 = 0 . \tag{2.11}$$
This is just the usual mass-shell condition in disguise. It suggests that a better
gauge fixing than $e(\lambda) = 1$ would have been $e(\lambda) = m^{-1}$. However, the sole effect
of this would have been to replace $L$ in (2.10) by $mL$.
In any case, for a massive particle, $m^2 \neq 0$, one can alternatively solve (2.11) for
$e(\lambda)$,
$$e(\lambda) = m^{-1}\sqrt{-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}} . \tag{2.13}$$
Using this to eliminate $e(\lambda)$ from the action, one finds
$$S[x, e = m^{-1}\sqrt{\ldots}\,] = -m\int d\lambda\, \sqrt{-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}} = -m\int d\tau = S_0[x] . \tag{2.14}$$
Thus for $m^2 \neq 0$ we find exactly the original action (integral of the proper time)
$S_0[x]$ (and since we have not touched or fixed the parametrisation invariance, no
wonder that $S_0$ is parametrisation invariant).
Thus we have elucidated the common origin of the actions S0 and S1 for a massive
particle.
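The algebra behind this elimination of $e$ can be checked pointwise on the integrand. The sketch below treats $q = g_{\mu\nu}(dx^\mu/d\lambda)(dx^\nu/d\lambda)$ as a plain negative number (a timelike path); the function names and the numerical values are illustrative assumptions, not part of these notes.

```python
import math

def parent_integrand(q, e, m):
    """Integrand of S[x, e] in (2.6): (1/2) * (q / e - m**2 * e),
    where q stands for g_{mu nu} (dx^mu/dlambda)(dx^nu/dlambda)."""
    return 0.5 * (q / e - m * m * e)

def e_on_shell(q, m):
    """Solution (2.13) of the constraint (2.11), valid for q < 0 and m != 0."""
    return math.sqrt(-q) / m

def proper_time_integrand(q, m):
    """Integrand of S_0 = -m * integral of sqrt(-q) dlambda, as in (2.14)."""
    return -m * math.sqrt(-q)

Q, M = -2.5, 1.7  # arbitrary illustrative values with Q < 0 (timelike)
E_STAR = e_on_shell(Q, M)

# Value of the parent integrand at the on-shell e, to be compared with S_0.
on_shell_value = parent_integrand(Q, E_STAR, M)

# Central-difference derivative with respect to e at E_STAR: it should vanish,
# since (2.11) is precisely the stationarity condition in e.
H = 1e-6
slope = (parent_integrand(Q, E_STAR + H, M) - parent_integrand(Q, E_STAR - H, M)) / (2 * H)

# Gauge choice e = 1: the integrand is L - m^2/2 with L = q/2, as in (2.10).
gauge_fixed_value = parent_integrand(Q, 1.0, M)
```

Here `on_shell_value` reproduces the proper-time integrand, while `gauge_fixed_value` differs from $L = q/2$ only by the constant $-m^2/2$.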
The perspective provided by the parent action $S[x,e]$ also gives some further insights.
For example, an added benefit of the parent action $S[x,e]$ is that it also makes perfect
sense for a massless particle. For $m^2 = 0$, the mass shell condition
$$g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} = 0 \tag{2.15}$$
says that these particles move along null lines, and the action reduces to
$$S[x,e] = \frac{1}{2}\int d\lambda\, e(\lambda)^{-1} g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} , \tag{2.16}$$
which is parametrisation invariant, and the gauge can (as in the massive case) be fixed to $e(\lambda) = 1$,
upon which the action reduces to $S_1[x]$. Thus we see that $S_1[x]$ indeed provides a simple
and unified action for both massive and massless particles, and in both cases the resulting
equation of motion is the (affinely parametrised) geodesic equation (2.5),
$$\frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\rho}\frac{dx^\nu}{d\lambda}\frac{dx^\rho}{d\lambda} = 0 . \tag{2.17}$$
Remarks:
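1. The infinitesimal form of the invariance of the action $S[x,e]$ under (2.7) and (2.8) is
obtained by considering the infinitesimal transformation of $x^\mu(\lambda)$ and $e(\lambda)$ induced
by an infinitesimal transformation $\bar\lambda = \lambda + \epsilon(\lambda)$,
$$\begin{aligned}
\delta\lambda = \epsilon(\lambda) \quad\Rightarrow\quad \delta x^\mu(\lambda) &= \epsilon(\lambda)\frac{dx^\mu(\lambda)}{d\lambda} \\
\delta e(\lambda) &= e(\lambda)\frac{d\epsilon(\lambda)}{d\lambda} + \epsilon(\lambda)\frac{de(\lambda)}{d\lambda} .
\end{aligned} \tag{2.18}$$
Here the (at first perhaps somewhat peculiar looking) transformation behaviour of
the auxiliary field $e(\lambda)$ arises from the transformation behaviour (2.8) by setting
$f(\lambda) = \lambda + \epsilon(\lambda)$
and calculating (keeping at most linear terms in $\epsilon(\lambda)$)
$$\begin{aligned}
e(\lambda) = f'(\lambda)\,\bar e(\bar\lambda) &= (1 + \epsilon'(\lambda))\,\bar e(\lambda + \epsilon(\lambda)) \\
&= (1 + \epsilon'(\lambda))\,(\bar e(\lambda) + \epsilon(\lambda)\bar e'(\lambda)) \\
&= \bar e(\lambda) + \epsilon'(\lambda)\bar e(\lambda) + \epsilon(\lambda)\bar e'(\lambda) .
\end{aligned} \tag{2.20}$$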
Under this infinitesimal transformation, the Lagrangian
3. This is as it should be: something that starts off as a massless particle will remain
a massless particle etc. If one imposes the initial condition
$$\left. g_{\mu\nu}\frac{dx^\mu}{d\tau}\frac{dx^\nu}{d\tau}\right|_{\tau=0} = \epsilon , \tag{2.27}$$
then this condition will be satisfied for all $\tau$. In particular, therefore, one can
choose $\epsilon = \mp 1$ for timelike (spacelike) geodesics, and $\tau$ can then be identified with
proper time (proper distance), while the choice $\epsilon = 0$ sets the initial conditions
appropriate to massless particles (for which $\tau$ is then not related to proper time
or proper distance).
$$p_\mu p^\mu + m^2 = 0 \tag{2.30}$$
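To understand the significance of how one parametrises the geodesic, observe that the
geodesic equation itself,
$$\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 , \tag{2.32}$$
$$\frac{d^2 x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = -\frac{\ddot f}{\dot f^2}\frac{dx^\mu}{d\sigma} . \tag{2.34}$$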
Thus the geodesic equation retains its form only under affine changes of the proper time
parameter $\tau$, $f(\tau) = a\tau + b$, and parameters $\sigma = f(\tau)$ related to $\tau$ by such an affine
transformation are known as affine parameters.
From the first variational principle, based on $S_0$, the term on the right hand side arises
in the calculation of (1.149) from the integration by parts if one does not switch back
from $\lambda$ to the affine parameter $\tau$. The second variational principle, based on $S_1$ and the
Lagrangian $L$, on the other hand, always and automatically yields the geodesic equation
in affine form.
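$$\frac{d^2 x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = \kappa(\sigma)\frac{dx^\mu}{d\sigma} , \tag{2.35}$$
for some function $\kappa(\sigma)$ (the inaffinity), we can deduce that this curve is the trajectory
of a geodesic, but that it is simply not parametrised by an affine parameter (like proper
time in the case of a timelike curve). Comparison of (2.34) and (2.35) shows that, given
$\kappa(\sigma)$, an affine parameter $\tau$ is determined by
$$\kappa(f(\tau)) = -\frac{\ddot f}{\dot f^2} \quad\Leftrightarrow\quad \kappa(\sigma) = \frac{d}{d\sigma}\ln\frac{d\tau}{d\sigma} \tag{2.36}$$
or
$$\frac{d\tau}{d\sigma} = e^{\int^\sigma ds\, \kappa(s)} . \tag{2.37}$$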
In the following, whenever we talk about geodesics we will practically always have in
mind the variational principle based on S1 leading to the geodesic equation (2.17) in
affinely parametrised form.
However, it should be kept in mind that sometimes non-affine parameters appear naturally.
For instance, it is occasionally convenient to parametrise timelike geodesics in a
geometry with coordinates $x^\mu = (x^0 = t, x^k)$ not by $x^\mu = x^\mu(\tau)$, where $\tau$ is the proper
time along the geodesic, but rather as $x^k = x^k(t)$. This is the same curve, but described
with respect to coordinate time (which could for instance agree with the proper time of
some other, perhaps static, observer). The curve $t \mapsto (t, x^k(t))$ will not be an affinely
parametrised curve unless $t$ itself satisfies the "geodesic equation"
$$\ddot t = 0 \quad\Leftrightarrow\quad t = a\tau + b . \tag{2.38}$$
One occasion where this will play a role (and from where I have borrowed the symbol
$\kappa$ for the inaffinity) is in our discussion, much later, of the horizon of a black hole,
where the failure of a certain coordinate to be an affine parameter is directly related to
the physical properties of black holes (see section 26.9). In this context $\kappa$ is known as
the surface gravity of a black hole.
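The relations (2.34) to (2.37) between a non-affine parameter and the inaffinity can be illustrated with a straight line in flat space, $x(\tau) = v\tau$ with all Christoffel symbols zero, reparametrised by $\sigma = f(\tau) = e^\tau$. This choice of $f$, the velocity value and the function names below are illustrative assumptions. For this $f$ one has $-\ddot f/\dot f^2 = -1/\sigma$.

```python
import math

V = 0.3  # constant velocity dx/dtau of the straight line (illustrative value)

def x_of_sigma(sigma):
    # x(tau) = V * tau with tau = ln(sigma), i.e. sigma = f(tau) = exp(tau)
    return V * math.log(sigma)

def kappa(sigma):
    # -f''/f'^2 evaluated at tau = ln(sigma); for f = exp this is -1/sigma
    return -1.0 / sigma

def inaffinity_residual(sigma, h=1e-4):
    """x'' - kappa * x' with respect to sigma, by central differences
    (flat space, so the Christoffel term in (2.35) is absent)."""
    d1 = (x_of_sigma(sigma + h) - x_of_sigma(sigma - h)) / (2 * h)
    d2 = (x_of_sigma(sigma + h) - 2 * x_of_sigma(sigma) + x_of_sigma(sigma - h)) / h ** 2
    return d2 - kappa(sigma) * d1

def dtau_dsigma_from_kappa(sigma, n=2000):
    """exp of the integral of kappa from 1 to sigma (composite Simpson rule, n even).
    The lower limit 1 fixes the free normalisation of the affine parameter in (2.37)."""
    a, b = 1.0, sigma
    step = (b - a) / n
    total = kappa(a) + kappa(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * kappa(a + i * step)
    return math.exp(total * step / 3.0)

SIGMA0 = 2.0
residual = inaffinity_residual(SIGMA0)
recovered = dtau_dsigma_from_kappa(SIGMA0)  # to be compared with d(ln sigma)/dsigma = 1/sigma
```

The residual vanishes up to discretisation error, and `recovered` agrees with $d\tau/d\sigma = 1/\sigma$, i.e. with $\tau = \ln\sigma$ being the affine parameter.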
2.3 Example: Geodesics in $\mathbb{R}^2$ in Polar Coordinates
It is high time to consider an example. We will consider the simplest non-trivial metric,
namely the standard Euclidean metric on $\mathbb{R}^2$ in polar coordinates. Thus the line element
is
$$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d\phi^2 \tag{2.39}$$
and the non-zero components of the metric are
$$g_{xx} = g_{yy} = 1 \tag{2.40}$$
and
$$g_{rr} = 1 , \qquad g_{\phi\phi} = r^2 \tag{2.41}$$
respectively. Since this metric is diagonal, the non-zero components of the inverse metric
$g^{\mu\nu}$ are
$$g^{xx} = g^{yy} = 1 \tag{2.42}$$
and
$$g^{rr} = 1 , \qquad g^{\phi\phi} = r^{-2} \tag{2.43}$$
respectively.
A reminder on notation (cf. the discussion leading to (1.112)): since $\mu, \nu$ in $g_{\mu\nu}$ are
coordinate indices, we should really have called $x^1 = r$, $x^2 = \phi$, say, and written
$g_{11} = 1$, $g_{22} = r^2$, etc. However, writing $g_{rr}$ etc. is more informative and useful since one
then knows that this is the $(rr)$-component of the metric without having to remember if
one called $r = x^1$ or $r = x^2$. In the following we will frequently use this kind of notation
when dealing with a specific coordinate system, while we retain the index notation $g_{\mu\nu}$
etc. for general purposes.
Let us now look at the geodesic equations for this metric, first in the Cartesian coordinates
$(x, y)$ and then in the polar coordinates $(r, \phi)$.
1. Cartesian coordinates
Since the metric in Cartesian coordinates is the constant Euclidean metric $g_{\mu\nu} = \delta_{\mu\nu}$,
all the partial derivatives of the metric are zero, and therefore also all the
Christoffel symbols are zero. The geodesic equations thus take the form
$$\ddot x = \ddot y = 0 . \tag{2.44}$$
These equations could also have been obtained as the Euler-Lagrange equations
of the Lagrangian
$$L = \tfrac{1}{2}(\dot x^2 + \dot y^2) . \tag{2.45}$$
The general solution is
Combining these two, one finds the standard representation
$$y = kx + e \tag{2.47}$$
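2. Polar coordinates
Now let us consider the same problem in polar coordinates. The crucial point here
is that in these coordinates the geodesic equations will not simply be $\ddot r = \ddot\phi = 0$,
but that there are additional terms arising
Taking the latter point of view, the Christoffel symbols of this metric are to be
calculated from
$$\Gamma_{\mu\nu\lambda} = \tfrac{1}{2}(g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu}) . \tag{2.48}$$
Since the only non-trivial derivative of the metric is $g_{\phi\phi,r} = 2r$, only Christoffel
symbols with exactly two $\phi$'s and one $r$ are non-zero,
$$\Gamma_{r\phi\phi} = -\tfrac{1}{2} g_{\phi\phi,r} = -r , \qquad \Gamma_{\phi r\phi} = \Gamma_{\phi\phi r} = \tfrac{1}{2} g_{\phi\phi,r} = r ,$$
so that
$$\begin{aligned}
\Gamma^r_{\phi\phi} &= g^{r\mu}\Gamma_{\mu\phi\phi} = g^{rr}\Gamma_{r\phi\phi} = -r \\
\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} &= g^{\phi\mu}\Gamma_{\mu r\phi} = g^{\phi\phi}\Gamma_{\phi r\phi} = \frac{1}{r} .
\end{aligned} \tag{2.50}$$
Note that here it was even convenient to use a hybrid notation, as in $g^{r\mu}$, where
$r$ is a coordinate and $\mu$ is a coordinate index. Once again, it is very convenient to
permit oneself to use such a mixed notation.
In any case, having assembled all the Christoffel symbols, we can now write down
the geodesic equations (once again in the convenient hybrid notation). For $r$ one
has
$$\ddot r + \Gamma^r_{\mu\nu}\dot x^\mu\dot x^\nu = 0 , \tag{2.51}$$
which, since $\Gamma^r_{\phi\phi} = -r$ is the only non-vanishing component of this type, reduces to
$$\ddot r - r\dot\phi^2 = 0 . \tag{2.52}$$
Here the factor of 2 arises because both $\Gamma^\phi_{r\phi}$ and $\Gamma^\phi_{\phi r} = \Gamma^\phi_{r\phi}$ contribute.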
Remarks:
(a) This equation is supposed to describe geodesics in $\mathbb{R}^2$, i.e. straight lines. This
can be verified in general (but, in general, polar coordinates are of course not
particularly well suited to describe straight lines). However, it is easy to
find a special class of solutions to the above equations, namely curves with
$\dot\phi = \ddot r = 0$. These correspond to paths of the form
$$r(\tau) = a\tau + b , \qquad \phi(\tau) = \phi_0 ,$$
which are a special case of straight lines, namely straight lines through the
origin.
(b) The geodesic equations can of course also be derived as the Euler-Lagrange
equations of the Lagrangian
$$L = \tfrac{1}{2}(\dot r^2 + r^2\dot\phi^2) . \tag{2.55}$$
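That the polar geodesic equations really describe straight lines can also be checked numerically. The sketch below integrates the Euler-Lagrange equations of (2.55), namely $\ddot r = r\dot\phi^2$ and $\ddot\phi = -(2/r)\dot r\dot\phi$, with a hand-written fourth-order Runge-Kutta step. The initial data (the point $(x,y) = (1,0)$ with velocity $(0,1)$), the step size and all names are illustrative assumptions.

```python
import math

def rhs(state):
    """First-order form of the polar geodesic equations:
    state = (r, phi, rdot, phidot)."""
    r, phi, rdot, phidot = state
    return (rdot, phidot, r * phidot ** 2, -2.0 * rdot * phidot / r)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, h, steps):
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, h)
        path.append(state)
    return path

# Start at the Cartesian point (1, 0) with Cartesian velocity (0, 1):
# r = 1, phi = 0, rdot = 0, phidot = 1. The expected curve is x = 1, y = tau.
START = (1.0, 0.0, 0.0, 1.0)
path = integrate(START, 1e-3, 2000)

# Deviation of x = r cos(phi) from the straight line x = 1 along the path.
max_x_deviation = max(abs(r * math.cos(phi) - 1.0) for r, phi, _, _ in path)
# Deviation of the angular momentum r^2 phidot from its initial value 1.
max_l_deviation = max(abs(r * r * phidot - 1.0) for r, _, _, phidot in path)
r_end, phi_end = path[-1][0], path[-1][1]
y_end = r_end * math.sin(phi_end)  # should be close to the elapsed parameter 2.0
```

Both deviations stay at the level of the integration error, so the curve is the vertical straight line $x = 1$ traversed at unit speed, even though neither $r$ nor $\phi$ is linear in $\tau$.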
The next simplest example to discuss would be the two-sphere with its standard metric
$d\theta^2 + \sin^2\theta\, d\phi^2$. It will appear, in bits and pieces, in section 2.5 to illustrate the general
remarks.
As another example, let us consider the ultrastatic metrics introduced in (1.126) with
coordinates $x^\mu = (t, x^k)$ and line-element
$$ds^2 = -dt^2 + \tilde g_{ik}(x)\, dx^i dx^k .$$
Because $g_{00} = -1$, $g_{0k} = 0$, and the $g_{ik} = \tilde g_{ik}$ are time-independent, all Christoffel
symbols with at least one $x^0$- or $t$-index are zero,
$$\Gamma^0_{\mu\nu} = \Gamma^\mu_{0\nu} = \Gamma^\mu_{\nu 0} = 0 , \tag{2.60}$$
and the purely spatial components of the Christoffel symbols agree with those of the
spatial metric,
$$\Gamma^i_{jk} = \tilde\Gamma^i_{jk} . \tag{2.61}$$
$$\ddot t = 0 , \tag{2.62}$$
$$\ddot x^i + \tilde\Gamma^i_{jk}\dot x^j\dot x^k = 0 \tag{2.63}$$
where the dot denotes a derivative with respect to the affine parameter $\tau$. The first
equation tells us that
$$\ddot t = 0 \quad\Rightarrow\quad t(\tau) = a\tau + t_0 . \tag{2.64}$$
Thus provided that $a \neq 0$ we can use $t$ instead of $\tau$ to parametrise the paths (and in
the present case $t$ is then also an affine parameter, cf. the discussion in section 2.2 in
connection with (2.38)), and then one can rewrite the spatial equations as equations for
$x^i = x^i(t)$,
$$\frac{d^2 x^i}{dt^2} + \tilde\Gamma^i_{jk}\frac{dx^j}{dt}\frac{dx^k}{dt} = 0 . \tag{2.65}$$
Therefore the solutions to the space-time geodesic equations have the form
$$x^\mu(t) = (t, x^i(t))$$
where $x^i(t)$ is an affinely parametrised geodesic for the metric $\tilde g_{ij}$. When $a = 0$, one
cannot change variables from $\tau$ to $t$ because $t = t_0$ is fixed. One is then necessarily
dealing with spacelike geodesics in space-time and the solutions have the form
$$x^\mu(\tau) = (t_0, x^i(\tau)) \tag{2.67}$$
where $x^i(\tau)$ is again an affinely parametrised geodesic for the metric $\tilde g_{ij}$.
These sorts of considerations evidently generalise to more general metrics of this direct
product form,
$$ds^2 = g_{ab}(y)\, dy^a dy^b + g_{ik}(x)\, dx^i dx^k , \tag{2.68}$$
with the conclusion that geodesics in such space-times have the form $(y^a(\tau), x^i(\tau))$ with
$y^a(\tau)$ and $x^i(\tau)$ individually solutions of the geodesic equations for the metric $g_{ab}(y)$
respectively $g_{ik}(x)$.
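Recall from above that the geodesic equation for a metric $g_{\mu\nu}$ can be derived from the
Lagrangian $L = (1/2)\, g_{\mu\nu}\dot x^\mu\dot x^\nu$,
$$\frac{d}{d\tau}\frac{\partial L}{\partial\dot x^\mu} - \frac{\partial L}{\partial x^\mu} = 0 . \tag{2.69}$$
This has several immediate consequences which are useful for the determination of
Christoffel symbols and geodesics in practice.
$$p_1 = \partial L/\partial\dot x^1 = g_{1\nu}\dot x^\nu \tag{2.70}$$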
Remarks:
(a) One might perhaps have wanted to argue that the definition (and interpretation)
of conserved momenta should be based on the "physical" Lagrangian (2.1)
$$L_0 = -m\sqrt{-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}} \tag{2.71}$$
with action $S = -m\int d\tau$, but this makes no difference since the two momenta
are essentially equal: one has
$$\frac{\partial L_0}{\partial(dx^1/d\lambda)} = m p_1 \tag{2.72}$$
with $p_1$ as defined in (2.70), so that this just supplies us with the additional
information that the momenta obtained from the Lagrangian $L$ should (for a
massive particle) be interpreted as momenta per unit mass. This discrepancy
could have been avoided by working with the Lagrangian $mL$ (alternatively:
fixing the gauge $e(\lambda) = m^{-1}$ in section 2.1, see (2.12)), but unless or until
one starts coupling the particle to fields other than the gravitational field it
is unnecessary (and a nuisance) to carry $m$ around all the time.
(b) For example, on the two-sphere the Lagrangian reads
$$L = \tfrac{1}{2}(\dot\theta^2 + \sin^2\theta\,\dot\phi^2) . \tag{2.73}$$
The angle $\phi$ is a cyclic variable and the angular momentum (actually angular
momentum per unit mass for a massive particle)
$$p_\phi = \frac{\partial L}{\partial\dot\phi} = \sin^2\theta\,\dot\phi \tag{2.74}$$
is a conserved quantity. This generalises to conservation of angular momentum
for a particle moving in an arbitrary spherically symmetric gravitational
field.
(c) Likewise, if the metric is independent of the time coordinate $x^0 = t$, the
corresponding conserved quantity
$$p_0 = g_{0\nu}\dot x^\nu \equiv -E \tag{2.75}$$
has the interpretation as minus the energy (per unit mass) of the particle,
"minus" because, with our sign conventions, $p_0 = -E$ in special relativity.
We will discuss the relation between this notion of energy and the notion
of energy familiar from special relativity (this requires an asymptotically
Minkowski-like metric) in more detail in section 24.1.
(d) We will discuss in more detail in section 2.6 (and then again in sections
8 and 9) how to detect and describe symmetries and conserved charges in
coordinate systems in which the symmetries are not as manifest (via cyclic
variables) as above.
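The conservation law for a cyclic coordinate can be watched at work numerically on the two-sphere. The sketch below integrates the Euler-Lagrange equations of the Lagrangian (2.73), i.e. $\ddot\theta = \sin\theta\cos\theta\,\dot\phi^2$ and $\ddot\phi = -2\cot\theta\,\dot\theta\dot\phi$, and monitors $p_\phi$ from (2.74) together with $2L$. The initial data, step size and function names are illustrative assumptions.

```python
import math

def rhs(state):
    """Geodesic equations on the unit two-sphere in first-order form:
    state = (theta, phi, thetadot, phidot)."""
    th, ph, thd, phd = state
    return (thd, phd,
            math.sin(th) * math.cos(th) * phd ** 2,
            -2.0 * math.cos(th) / math.sin(th) * thd * phd)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def p_phi(state):
    """Conserved momentum (2.74) conjugate to the cyclic coordinate phi."""
    th, _, _, phd = state
    return math.sin(th) ** 2 * phd

def two_l(state):
    """Twice the Lagrangian (2.73), i.e. the squared speed along the curve."""
    th, _, thd, phd = state
    return thd ** 2 + math.sin(th) ** 2 * phd ** 2

# Initial data chosen so that the curve stays away from the poles.
state = (1.0, 0.0, 0.2, 0.5)
p0, l0 = p_phi(state), two_l(state)
max_p_drift = 0.0
max_l_drift = 0.0
for _ in range(5000):  # integrate up to parameter value 5 with step 1e-3
    state = rk4_step(state, 1e-3)
    max_p_drift = max(max_p_drift, abs(p_phi(state) - p0))
    max_l_drift = max(max_l_drift, abs(two_l(state) - l0))
```

Both drifts remain at the level of the integration error, even though $\theta$, $\dot\theta$ and $\dot\phi$ individually vary along the curve.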
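$$L = \tfrac{1}{2}(\dot y^2 + g_{\mu\nu}\dot x^\mu\dot x^\nu) , \tag{2.76}$$
$$\begin{aligned}
\ddot y - \tfrac{1}{2} g_{\mu\nu,y}\dot x^\mu\dot x^\nu &= 0 \\
\ddot x^\mu + \text{terms proportional to } \dot x &= 0 .
\end{aligned} \tag{2.77}$$
Therefore $\dot x^\mu = 0$, $\ddot y = 0$ is a solution of the geodesic equation, and it describes
motion along the coordinate lines of $y$.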
Remarks:
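(a) In the case of the two-sphere, with its metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$, this
translates into the familiar statement that the great circles, the coordinate
lines of $y = \theta$, are geodesics.
(b) The result is also valid when $y$ is a timelike coordinate. For example, consider
a space-time with coordinates $(t, x^i)$ and metric (1.128)
$$\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 \tag{2.80}$$
Remarks:
$$\Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = \Gamma^\mu_{11}(\dot x^1)^2 + 2\Gamma^\mu_{12}\dot x^1\dot x^2 + \ldots \tag{2.81}$$
(b) For example, once again in the case of the two-sphere, for the $\theta$-equation one
has
$$\frac{d}{d\tau}\frac{\partial L}{\partial\dot\theta} = \ddot\theta , \qquad \frac{\partial L}{\partial\theta} = \sin\theta\cos\theta\,\dot\phi^2 . \tag{2.82}$$
Comparing the resulting Euler-Lagrange equation
$$\ddot\theta - \sin\theta\cos\theta\,\dot\phi^2 = 0$$
with the geodesic equation
$$\ddot\theta + \Gamma^\theta_{\theta\theta}\dot\theta^2 + 2\Gamma^\theta_{\theta\phi}\dot\theta\dot\phi + \Gamma^\theta_{\phi\phi}\dot\phi^2 = 0 , \tag{2.84}$$
one reads off $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\theta_{\theta\theta} = \Gamma^\theta_{\theta\phi} = 0$.
Likewise, from
$$\frac{d}{d\tau}\frac{\partial L}{\partial\dot\phi} - \frac{\partial L}{\partial\phi} = 0 \quad\Leftrightarrow\quad \sin^2\theta\,(\ddot\phi + 2\cot\theta\,\dot\theta\dot\phi) = 0 \tag{2.86}$$
one reads off
$$\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta , \qquad \Gamma^\phi_{\theta\theta} = \Gamma^\phi_{\phi\phi} = 0 . \tag{2.87}$$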
As will be discussed in section 23.2, this is the general form of a static spherically
symmetric metric, and as such will provide us with the starting point
for describing the gravitational field of a star. The corresponding Lagrangian
is
$$L = \tfrac{1}{2}\left(-A(r)\dot t^2 + B(r)\dot r^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\phi^2)\right) , \tag{2.89}$$
so that the equation for $t$ is $\ddot t + (A'/A)\,\dot t\dot r = 0$
(a prime denoting an $r$-derivative), from which one can immediately read off
$$\Gamma^t_{rt} = \Gamma^t_{tr} = \frac{A'}{2A} , \qquad \Gamma^t_{\mu\nu} = 0 \text{ otherwise} . \tag{2.91}$$
Likewise, the equation for $r$ takes the form
$$\ddot r + \frac{B'}{2B}\dot r^2 + \frac{A'}{2B}\dot t^2 + \ldots = 0 , \tag{2.92}$$
and from this one can read off that
$$\Gamma^r_{rr} = \frac{B'}{2B} , \qquad \Gamma^r_{tt} = \frac{A'}{2B} , \quad \ldots \tag{2.93}$$
As we will need them anyway in section 23.3, it is a good exercise to determine
all the Christoffel symbols in this way.
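When doing this exercise it can be reassuring to have an independent numerical cross-check. The sketch below evaluates the defining formula for the Christoffel symbols by central finite differences, restricted to diagonal metrics so that the inverse metric is trivial. The specific choice $A = 1 - 2M/r$, $B = 1/A$, the sample point and all function names are illustrative assumptions, not part of these notes.

```python
import math

def diag_metric(x, M=1.0):
    """Diagonal components (g_tt, g_rr, g_thth, g_phph) of the static spherically
    symmetric metric -A dt^2 + B dr^2 + r^2 dOmega^2.
    Illustrative assumption: A = 1 - 2M/r and B = 1/A."""
    _, r, th, _ = x
    A = 1.0 - 2.0 * M / r
    return [-A, 1.0 / A, r * r, (r * math.sin(th)) ** 2]

def christoffel(metric, x, h=1e-5):
    """Gamma^mu_{nu lam} = (1/2) g^{mu mu} (d_lam g_{mu nu} + d_nu g_{mu lam} - d_mu g_{nu lam})
    for a diagonal metric, with derivatives taken by central differences."""
    n = len(x)
    g = metric(x)
    dg = []  # dg[k][m] = d g_mm / d x^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([(gp[m] - gm[m]) / (2 * h) for m in range(n)])

    def d(m, nu, lam):
        # d_lam g_{m nu}; off-diagonal components vanish identically
        return dg[lam][m] if m == nu else 0.0

    return [[[0.5 / g[mu] * (d(mu, nu, lam) + d(mu, lam, nu) - d(nu, lam, mu))
              for lam in range(n)] for nu in range(n)] for mu in range(n)]

# Sample point (t, r, theta, phi); index order 0 = t, 1 = r, 2 = theta, 3 = phi.
X0 = [0.0, 10.0, 1.0, 0.3]
gamma = christoffel(diag_metric, X0)

# Closed-form values from (2.91) and (2.93) for the assumed A and B at r = 10, M = 1.
A0 = 1.0 - 2.0 / 10.0
A0_PRIME = 2.0 / 100.0
B0 = 1.0 / A0
B0_PRIME = -A0_PRIME / A0 ** 2
```

Comparing `gamma[0][1][0]`, `gamma[1][1][1]` and `gamma[1][0][0]` with $A'/2A$, $B'/2B$ and $A'/2B$ respectively reproduces (2.91) and (2.93) to within the finite-difference error.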
2.6 Conserved Charges and (a first encounter with) Killing Vectors
In the previous section we have seen that cyclic coordinates, i.e. coordinates the metric
does not depend on, lead to conserved charges, as in (2.70). As nice and useful as this
may be (and it is nice and useful), it is obviously somewhat unsatisfactory because it
is an explicitly coordinate-dependent statement: the metric may well be independent
of one coordinate in some coordinate system, but if one now performs a coordinate
transformation which depends on that coordinate, then in the new coordinate system
the metric will typically depend on all the new coordinates. Nevertheless,
thus there should be a corresponding first integral of the geodesic equation in any
coordinate system.
To see how this works, let us reconsider the situation discussed in the previous section,
namely a metric which in some coordinate system, we will now call it $\{y^{\mu'}\}$, has components
$g_{\mu'\nu'}$ which are independent of $y^1$, say. Translation invariance of the geodesic
Lagrangian is the statement that the Lagrangian is invariant under the infinitesimal
variation $\delta y^1 = \epsilon$, $\delta y^{\mu'} = 0$ otherwise, and via Noether's theorem this leads to a conserved
charge $g_{1\mu'}\dot y^{\mu'}$, as in (2.70).
Now we ask ourselves what this statement corresponds to in another coordinate system.
Note that in the $y$-coordinates, invariance is the statement that the metric is invariant
under the (infinitesimal) coordinate transformation $y^1 \to y^1 + \epsilon$ or $\delta y^1 = \epsilon$, $\delta y^{\mu'} = 0$
otherwise,
$$\delta g_{\mu'\nu'} \sim \partial_{y^1} g_{\mu'\nu'} = 0 . \tag{2.94}$$
It is then clear that in another coordinate system, infinitesimal $y^1$-translations must also
correspond to some infinitesimal coordinate transformation (but not necessarily just a
translation),
$$\delta x^\mu = \epsilon V^\mu(x) . \tag{2.95}$$
In particular, if (as in the above example) in $y$-coordinates $V$ has the components
$V^1 = 1$, $V^{\mu'} = 0$ otherwise, then in any other coordinate system one has
$$\delta x^\mu = (\partial x^\mu/\partial y^{\mu'})\,\delta y^{\mu'} = \epsilon\,(\partial x^\mu/\partial y^1) \tag{2.96}$$
so that
$$V^\mu = J^\mu_1 \tag{2.97}$$
is just the corresponding column of the Jacobi matrix.
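1. We can investigate directly, under which conditions on the $V^\mu$ the transformation
(2.95) leads to an invariance of the Lagrangian (2.2). Using
$$\delta\dot x^\mu = \epsilon\,\dot x^\nu\partial_\nu V^\mu \tag{2.98}$$
one finds $\delta L = \tfrac{1}{2}\epsilon\,(\delta_V g_{\mu\nu})\,\dot x^\mu\dot x^\nu$,
where
$$\delta_V g_{\mu\nu} = V^\lambda\partial_\lambda g_{\mu\nu} + (\partial_\mu V^\lambda)\, g_{\lambda\nu} + (\partial_\nu V^\lambda)\, g_{\mu\lambda} \tag{2.100}$$
Thus the condition for the infinitesimal transformation (2.95) to leave the Lagrangian
invariant is
$$\delta_V g_{\mu\nu} = 0 . \tag{2.101}$$
$$Q_V = p_\mu V^\mu = g_{\mu\nu} V^\mu\dot x^\nu . \tag{2.102}$$
Note that for constant components $V^\mu$, (2.101) is simply the statement that the
metric is constant in the direction $V$, $V^\lambda\partial_\lambda g_{\mu\nu} = 0$.
$$g_{\mu'\nu'} = J^\mu_{\mu'} J^\nu_{\nu'} g_{\mu\nu} , \qquad \partial_{y^1} = (\partial_{y^1} x^\mu)\,\partial_\mu \equiv J^\mu_1\partial_\mu \equiv V^\mu\partial_\mu \tag{2.103}$$
$$J^\lambda_1\partial_\lambda J^\mu_{\mu'} = \partial_1 J^\mu_{\mu'} = \partial_{\mu'} J^\mu_1 = \partial_{\mu'} V^\mu = J^\nu_{\mu'}\partial_\nu V^\mu \tag{2.105}$$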
All of this may seem a bit ham-handed at this point, and indeed it is. However, we will
see later how these results can be written and understood in a much more pleasing and
covariant way. In particular, we will see in section 4.5 how to write (2.100) in a way
that makes it completely manifest that it transforms like the metric under coordinate
transformations. Moreover, we will discover in section 8 that (2.100) is a special case of
the Lie derivative of a tensor field along a vector field $V$, denoted by $L_V$. Continuous
symmetries of a metric correspond to vector fields along which the Lie derivative of the
metric vanishes. Such vectors are known as Killing vectors, and are thus vectors $V$
satisfying the Killing equation (2.101),
$$L_V g_{\mu\nu} \equiv \delta_V g_{\mu\nu} = 0 . \tag{2.106}$$
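The Killing condition can be tested numerically for a given metric and vector field by evaluating the right-hand side of (2.100) with finite differences. The sketch below does this for the Euclidean plane, where the rotation generator has components $(0, 1)$ in polar coordinates $(r, \phi)$ and $(-y, x)$ in Cartesian coordinates, and contrasts it with the radial field $(1, 0)$, which is not a symmetry. The function names and sample points are illustrative assumptions.

```python
def delta_v_metric(metric, vector, x, h=1e-5):
    """V^l d_l g_{mn} + (d_m V^l) g_{ln} + (d_n V^l) g_{ml}, cf. (2.100),
    with all partial derivatives taken by central differences."""
    n = len(x)
    g = metric(x)
    v = vector(x)
    dg, dv = [], []  # dg[k][a][b] = d_k g_ab, dv[k][a] = d_k V^a
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        vp, vm = vector(xp), vector(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)])
        dv.append([(vp[a] - vm[a]) / (2 * h) for a in range(n)])
    return [[sum(v[l] * dg[l][m][nu] + dv[m][l] * g[l][nu] + dv[nu][l] * g[m][l]
                 for l in range(n))
             for nu in range(n)] for m in range(n)]

def polar_metric(x):
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def cartesian_metric(x):
    return [[1.0, 0.0], [0.0, 1.0]]

def rotation_polar(x):
    return [0.0, 1.0]        # V = d/dphi

def radial_polar(x):
    return [1.0, 0.0]        # V = d/dr, not a symmetry of the plane

def rotation_cartesian(x):
    return [-x[1], x[0]]     # the same rotation generator in Cartesian components

rot_polar = delta_v_metric(polar_metric, rotation_polar, [2.0, 0.4])
rot_cart = delta_v_metric(cartesian_metric, rotation_cartesian, [1.3, -0.8])
radial = delta_v_metric(polar_metric, radial_polar, [2.0, 0.4])
```

For the rotation all components vanish in both coordinate systems, while for the radial field the $\phi\phi$-component equals $\partial_r g_{\phi\phi} = 2r$, so it fails the Killing equation.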
We saw that the 10 components of the metric $g_{\mu\nu}$ appear to play the role of potentials for
the gravitational force. In order to substantiate this, and to show that in an appropriate
limit this setting is able to reproduce the Newtonian results, we now want to find the
relation of these potentials to the Newtonian potential, and the relation between the
geodesic equation and the Newtonian equation of motion for a particle moving in a
gravitational field.
First let us determine the conditions under which we might expect the general relativistic
equation of motion (namely the coupled set of non-linear ordinary differential equations
that make up the geodesic equation) to reduce to the linear equation of motion
$$\frac{d^2}{dt^2}\vec x = -\vec\nabla\phi \tag{2.107}$$
of Newtonian mechanics, with $\phi$ the gravitational potential, e.g.
$$\phi = -\frac{G_N M}{r} . \tag{2.108}$$
Thus we are trying to characterise the circumstances in which we know and can trust
the validity of Newton's equations, such as those provided e.g. by the gravitational field
of the earth or the sun, the gravitational fields in which Newton's laws were discovered
and tested. Two of these are fairly obvious:
1. Weak Fields: our first plausible assumption is that the gravitational field is in a
suitable sense sufficiently weak. We will need to make more precise what we
mean by this, and we will come back to this below.
2. Slow Motion: our second, equally reasonable and plausible, assumption is that the
test particle moves at speeds at which we can neglect special relativistic effects, so
"slow" should be taken to mean that its velocity is small compared to the velocity
of light.
Interestingly, it turns out that one more condition is required. Note that the gravitational
fields we have access to are not only quite weak but also only very slowly varying
in time, and we will add this condition,
3. Stationary Fields: we will assume that the gravitational field does not vary significantly
in time (over the time scale probed by our test particle).
The very fact that we have to add this condition in order to find Newton's equations
(as will be borne out by the calculations below) is interesting in its own right, because
it also shows that general relativity predicts phenomena deviating from the Newtonian
picture even for weak fields, provided that they vary sufficiently rapidly (e.g. quickly
oscillating fields), and one such phenomenon is that of gravitational waves (see section
22).
Now, having formulated in words the conditions that we wish to impose, we need to
translate these conditions into equations that we can then use in conjunction with the
geodesic equation.
1. In order to define a notion of weak fields, we need to keep in mind that this is
not a coordinate-independent statement since we can simulate arbitrarily strong
gravitational fields even in Minkowski space by going to suitably accelerated coordinates,
and therefore a weak field condition will be a condition not only on
the metric but also on the choice of coordinates. Thus we assume that we can
choose coordinates $\{x^\mu\} = \{t, x^i\}$ in such a way that in these coordinates the
metric differs from the standard constant Minkowski metric $\eta_{\mu\nu}$ only by a small
amount,
$$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} \tag{2.109}$$
2. The second condition is obviously (with the coordinates chosen above) $dx^i/dt \ll 1$
or, expressed in terms of proper time,
$$\frac{dx^i}{d\tau} \ll \frac{dt}{d\tau} . \tag{2.110}$$
(for a discussion and explanation of the difference between the term stationary
used here and the term static used e.g. to describe the metric (2.88), see section
15.4 - it is not crucial here).
$$\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 . \tag{2.112}$$
From the decomposition $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ we see that $\Gamma^\mu_{\nu\lambda}$ is at least linear in $h_{\mu\nu}$, and by
the weak field condition (condition 1) we will only retain the terms linear in $h_{\mu\nu}$. Then
the condition of slow motion (condition 2) implies that among the quadratic terms $\dot x^\nu\dot x^\lambda$
we need to only retain the leading term, namely $\dot t^2$. Thus the geodesic equation can be
approximated by
$$\ddot x^\mu + \Gamma^\mu_{00}\,\dot t^2 = 0 . \tag{2.113}$$
From the weak field condition (condition 1), which allows us to write
$$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} \quad\Rightarrow\quad g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} , \tag{2.115}$$
where
$$h^{\mu\nu} = \eta^{\mu\lambda}\eta^{\nu\rho} h_{\lambda\rho} , \tag{2.116}$$
we learn that
$$\Gamma^\mu_{00} = -\tfrac{1}{2}\delta^\mu_i\,\partial^i h_{00} , \tag{2.117}$$
$$\begin{aligned}
\ddot t &= 0 \\
\ddot x^i &= \tfrac{1}{2}\partial^i h_{00}\,\dot t^2 .
\end{aligned} \tag{2.119}$$
The first of these just says that $\dot t$ is constant, or that $t$ is also an affine parameter,
$$t(\tau) = a\tau + b . \tag{2.120}$$
In other words, in the Newtonian limit there is essentially (up to a choice of scale/units)
no difference between coordinate time and proper time. We can use this in the second
equation to convert the $\tau$-derivatives into derivatives with respect to the coordinate
time $t$,
$$\ddot t = 0 \quad\Rightarrow\quad \frac{1}{\dot t^2}\frac{d^2}{d\tau^2} = \frac{1}{\dot t}\frac{d}{d\tau}\frac{1}{\dot t}\frac{d}{d\tau} = \frac{d^2}{dt^2} . \tag{2.121}$$
Hence we obtain
$$\frac{d^2 x^i}{dt^2} = \tfrac{1}{2} h_{00,i} \tag{2.122}$$
(the spatial index $i$ in this expression is raised or lowered with the Kronecker symbol,
$\eta_{ik} = \delta_{ik}$). Comparing this with the Newtonian equation (2.107),
$$\frac{d^2 x^i}{dt^2} = -\phi_{,i} \tag{2.123}$$
leads us (with the constant of integration absorbed into an arbitrary constant term in
the gravitational potential) to the key identification
$$h_{00} = -2\phi \tag{2.124}$$
between the Newtonian gravitational potential and the (00)-component of the deviation
of the space-time metric from the Minkowski metric. By relating this back to $g_{\mu\nu}$,
$$g_{00} = -(1 + 2\phi) , \tag{2.125}$$
we find the sought-for relation between the Newtonian potential and the space-time
metric. Thus Newtonian gravity can be captured or described by a space-time metric
of the form
$$ds^2 = -(1 + 2\phi(\vec x))\, dt^2 + d\vec x^2 . \tag{2.126}$$
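As a sanity check of this identification one can integrate the exact geodesic equations of the metric (2.126) for a weak field and slow motion and compare with Newtonian free fall. The sketch below uses one spatial dimension and a uniform field $\phi(x) = g x$ in units with $c = 1$. The geodesic equations used are the Euler-Lagrange equations of $L = \tfrac{1}{2}(-(1+2\phi)\dot t^2 + \dot x^2)$, i.e. $\ddot t = -2\phi'\dot x\dot t/(1+2\phi)$ and $\ddot x = -\phi'\dot t^2$. The field strength, step size and names are illustrative assumptions.

```python
G_FIELD = 1e-3  # uniform field strength g in units with c = 1 (illustrative value)

def rhs(state):
    """Exact geodesic equations for ds^2 = -(1 + 2 g x) dt^2 + dx^2,
    with state = (t, x, tdot, xdot) and dots denoting d/dtau."""
    t, x, td, xd = state
    return (td, xd,
            -2.0 * G_FIELD * xd * td / (1.0 + 2.0 * G_FIELD * x),
            -G_FIELD * td ** 2)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Particle released at rest at x = 0; tdot = 1 makes tau the proper time there.
state = (0.0, 0.0, 1.0, 0.0)
for _ in range(1000):  # integrate up to tau = 1 with step 1e-3
    state = rk4_step(state, 1e-3)

t_end, x_end = state[0], state[1]
x_newton = -0.5 * G_FIELD * t_end ** 2  # Newtonian free fall from rest
relative_difference = abs(x_end - x_newton) / abs(x_newton)
```

With these values the relative difference between the geodesic and the Newtonian trajectory is of the order of $\phi$ and $v^2$, i.e. far below the accuracy at which the two descriptions are expected to agree.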
For a radial gravitational field, with $\phi = \phi(r)$, it is also natural to write this in terms
of spatial spherical coordinates as
$$ds^2 = -(1 + 2\phi(r))\, dt^2 + dr^2 + r^2 d\Omega^2 .$$
Remarks:
1. With the speed of light not set equal to $c = 1$, the dimensionally correct form
of this identification is (recall that kinetic and potential energy have the same
dimension so that the dimension of $\phi$, the energy per unit mass, is that of a
velocity-squared; thus $\phi/c^2$ is dimensionless)
$$h_{00} = -2\phi/c^2 .$$
2. For the gravitational field of isolated systems, it makes sense to choose the integration
constant in such a way that the potential goes to zero at infinity, and
this choice also ensures that the metric approaches the flat Minkowski metric at
infinity.
3. Restoring the appropriate units, in particular the above factor of $c^2$, one finds
that the dimensionless factor $\phi/c^2 \sim 10^{-9}$ on the surface of the earth, $10^{-6}$ on
the surface of the sun (see section 23.4 for some more details), so that the distortion
in the space-time geometry produced by gravitation is in general quite small
(justifying our approximations).
5. Likewise, in this approximation it does not make sense to inquire about the other
subleading components of the metric. As we have seen, a slowly moving particle
in a weak static gravitational field is not sensitive to them, and hence can also not
be used to probe or determine these components.
with
$$c^2(x) = \left(1 + 2\phi(x)/c^2\right)c^2 \;. \qquad (2.130)$$
Einstein realised fairly early on (1911) in his search for a relativistic theory of
gravity that this would have to be part of the story. However, this interpretation
is neither useful nor tenable when considering gravitational fields beyond the static
Newtonian approximation (which requires one to go beyond a theory with a single
scalar potential).
7. Later on, we will determine the exact solution of the Einstein equations (the field
equations for the gravitational field, i.e. for the metric) for the gravitational field
outside a spherically symmetric mass distribution with mass M (the Schwarzschild
metric). The metric turns out to have the simple form (23.28)
$$ds^2 = -\left(1 - \frac{2G_N M}{c^2 r}\right)c^2 dt^2 + \left(1 - \frac{2G_N M}{c^2 r}\right)^{-1} dr^2 + r^2 d\Omega^2 \;. \qquad (2.131)$$
From this expression one can read off that the leading correction to the flat metric
indeed arises from the 00-component of the metric,
$$\begin{aligned}
ds^2 &\approx -c^2 dt^2 + dr^2 + r^2 d\Omega^2 + \frac{2G_N M}{r}\,dt^2 + \ldots \\
&= \eta_{\mu\nu}dx^\mu dx^\nu + \frac{2G_N M}{rc^2}\,(dx^0)^2 + \ldots \;.
\end{aligned} \qquad (2.132)$$
This is indeed precisely of the above Newtonian form, with the standard Newtonian
potential
$$\phi(r) = -\frac{G_N M}{r} \;. \qquad (2.133)$$
One can then also determine the subleading (known as post-Newtonian) corrections
to the general relativistic gravitational field, which are evidently suppressed
by additional inverse powers of $c^2$.
8. The key relation (2.124) can also be obtained at the level of the action. Starting
with the action $S_0$ (the integral of the proper time), and using the time-coordinate
$t$ as the parameter, using the same approximations as above one finds that the
action can be written as (keeping $c$ explicit for a change, so that $x^0 = ct$)
$$\begin{aligned}
S_0[x] &= -mc \int dt\, \sqrt{-g_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt)} \\
&= -mc \int dt\, \sqrt{-\eta_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt) - h_{\mu\nu}(dx^\mu/dt)(dx^\nu/dt)} \\
&= -mc \int dt\, \sqrt{c^2 - \delta_{ik}(dx^i/dt)(dx^k/dt) - h_{00}c^2} \\
&= -mc^2 \int dt\, \sqrt{1 - \vec{v}^2/c^2 - h_{00}} \;.
\end{aligned} \qquad (2.134)$$
Expanding the square root and dropping the first (irrelevant) term, one finds that
in this limit the action reduces to
$$S_0[x] \approx \int dt \left( \frac{m}{2}\vec{v}^2 + \frac{mc^2}{2} h_{00} \right) \qquad (2.135)$$
In this compact (but slightly dubious) derivation of this relation, the significance
of the stationarity condition is not manifest: it enters through the condition of
the equivalence of the 4-dimensional and 3-dimensional variational principles (with
respect to the fields $x^\mu(\tau)$ and $x^k(t)$ respectively), guaranteed by the affine relation
between $t$ and $\tau$ implied by requiring in addition stationarity.
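As a quick numerical sanity check of the expansion leading from (2.134) to (2.135), the following sketch compares the exact integrand $-mc^2\sqrt{1 - \vec{v}^2/c^2 - h_{00}}$ with its first-order expansion. The function names and the choice of units ($m = c = 1$) are illustrative assumptions, not part of the notes; the only claim checked is that the neglected terms are of second order in the small quantities.

```python
import math

def exact_integrand(m, c, v, h00):
    """Integrand of (2.134): -m c^2 sqrt(1 - v^2/c^2 - h00)."""
    return -m * c**2 * math.sqrt(1.0 - v**2 / c**2 - h00)

def expanded_integrand(m, c, v, h00):
    """First-order expansion: the rest-energy term -m c^2 plus the integrand of (2.135)."""
    return -m * c**2 + 0.5 * m * v**2 + 0.5 * m * c**2 * h00

def truncation_error(eps, m=1.0, c=1.0):
    """Error of the expansion when v^2/c^2 and h00 are both set equal to eps (hypothetical test case)."""
    v = c * math.sqrt(eps)
    return abs(exact_integrand(m, c, v, eps) - expanded_integrand(m, c, v, eps))

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        # The error should shrink like eps^2, i.e. by a factor of about 100 per decade.
        print(f"eps = {eps:.0e}   error = {truncation_error(eps):.3e}")
```

The error scaling like the square of the small parameter is what justifies keeping only the terms displayed in (2.135) in the weak-field, slow-motion regime.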
In section 1.3 we had discussed the Minkowski metric in Rindler coordinates, i.e. in
coordinates adapted to a constantly accelerating observer. For an observer accelerating
in the $x^1$-direction, the metric took the form (1.68),
with $\vec{y} = (x^2, x^3)$ denoting the transverse spectator coordinates (which will again be
suppressed in the following).
What is the relation, if any, between this metric and the metric describing a weak
gravitational field, as derived above (after all, small accelerations should mimic weak
gravitational fields)? At first sight, the only thing they appear to have in common
is that the departure from what would be the Minkowski metric in these coordinates
is encoded in the time-time component of the metric, $\rho^2$ in one case, $(1 + 2\phi)$ in the
other, but apart from that $\rho^2$ and $(1 + 2\phi)$ look quite different. This difference is,
however, again a coordinate artefact and the Rindler metric can be made to look like
the weak-field metric with the help of a suitable further redefinition of the coordinates.
For starters, it will be convenient, for this purpose and for a generalisation which we
will discuss below, to introduce the acceleration $a$ explicitly into the coordinates by
redefining the coordinate transformation (1.67) to (I will now also call the Minkowski
coordinates $\xi^0 = t$ and $\xi^1 = x$)
(so this differs by $\rho \to \rho/a$, $\eta \to a\eta$ from the transformation given in (1.67)). Thus, now
it is the observer at $\rho = 1$ who has acceleration $a$ and whose proper time is $\tau = \eta$. The
Rindler metric now has the form
Now the transformation $\rho = 1 + a\tilde{x}$ (reminding us that we are talking about acceleration
in the $x = x^1$ direction), leads to
$$ds^2 = -(1 + a\tilde{x})^2 d\eta^2 + d\tilde{x}^2 \qquad (2.141)$$
$$(1 + a\tilde{x})^2 \approx 1 + 2a\tilde{x} = 1 + 2\phi(\tilde{x}) \;, \qquad (2.142)$$
Remarks:
1. Remarkably, this same form of the metric remains valid for an arbitrary time-dependent
acceleration $a = a(\tau)$, and thus is capable of reproducing the weak
field form of the metric for general potentials. To see this, consider the worldline
$(t(\tau), x(\tau))$ with general 4-velocity (actually 2-velocity in this case)
$$u^0 = \dot{t}(\tau) = \cosh v(\tau) \;, \qquad u^1 = \dot{x}(\tau) = \sinh v(\tau) \;, \qquad (2.143)$$
which satisfies
$$(u^0)^2 - (u^1)^2 = 1 \;, \qquad (2.144)$$
as it should, and has the time-dependent acceleration $a(\tau) = \dot{v}(\tau)$,
$$(\dot{u}^1)^2 - (\dot{u}^0)^2 = \ddot{x}^2 - \ddot{t}^2 = \dot{v}(\tau)^2 \equiv a(\tau)^2 \;. \qquad (2.145)$$
We can pass to adapted coordinates $(\eta, \tilde{x})$, as above, by setting
$$e^{2a\xi} \approx 1 + 2a\xi \;. \qquad (2.149)$$
For the record, and for later use, we note that the complete coordinate transformation
between the Minkowski coordinates $(t, x)$ and the conformally flat Rindler
coordinates $(\eta, \xi)$ is
$$\partial_\eta = a(t\partial_x + x\partial_t) \;. \qquad (2.151)$$
This is a boost in the $(t, x)$-plane, but the limit $a \to 0$ appears to be singular.
3. A simple and useful way to rectify this is to introduce a further constant shift of $x$,
$x \to x - 1/a$, into the 1-parameter family (2.150) of coordinate transformations,
$$a \to 0 \quad\Rightarrow\quad \eta \to t \;, \quad \xi \to x \;. \qquad (2.153)$$
$$\partial_\eta = \partial_t + a(t\partial_x + x\partial_t) \qquad (2.154)$$
4. In terms of the Minkowski null (or advanced and retarded time) coordinates
or, compactly,
Note that the range of the coordinates is $-\infty < \eta, \xi < +\infty$ or $-\infty < u_R, v_R < +\infty$,
and that the coordinates $(\eta, \xi)$ or $(u_R, v_R)$ cover (and can be used in) the
right-hand quadrant $x > |t|$ of Minkowski space-time, corresponding to $-\infty < u_M = t - x < 0$
and $0 < v_M = t + x < +\infty$, the so-called (right) Rindler wedge.
As we will see in section 6.8, these null Rindler coordinates are particularly useful
for studying the solutions of the scalar wave equation in the Rindler wedge.
Let me close this section with some comments on other versions of (3 + 1)-dimensional
Rindler space. First of all, instead of looking at acceleration in the $x^1$-direction, say,
one can consider radial accelerations. To that end one first writes the metric in spatial
spherical coordinates,
$$ds^2 = -dt^2 + dr^2 + r^2 d\Omega_2^2 \;, \qquad (2.159)$$
$$r^2 - t^2 = \rho^2 \qquad (2.162)$$
$r^2 - t^2 < 0$ and covering precisely the interior of the lightcone) is the so-called Milne
metric to be discussed in section 36.1.
(an analogous shift $x \to x - x_0$ for acceleration in the $x$-direction would have had no
effect on the metric since such a translation is a symmetry of the Minkowski metric,
whereas a translation in the radial direction is not). This form of the metric is adapted
to the hyperboloids
$$(r - r_0)^2 - t^2 = \rho^2 \qquad (2.165)$$
and now describes radially accelerating observers, each one asymptotically approaching
the radial lightray emanating from a distance r0 from the origin (and correspondingly
the region of space-time covered by these coordinates is the complement of the past and
future of the 2-sphere of radius r0 at the origin, a hole in spacetime).
Following Einstein, the gravitational redshift (i.e. the fact that photons lose or gain
energy when rising or falling in a gravitational field) is usually presented as a direct
consequence of the Einstein Equivalence Principle (and is therefore also said to provide
an experimental test of the Einstein Equivalence Principle itself). It can indeed be
derived in this way (see Remark 2 at the end of this section for one such argument, albeit
not the original one). However, here we will derive this effect within the framework that
we have already adopted, inspired by the equivalence principle, namely in terms of the
description of the gravitational field by a metric.
This has several advantages. It allows us to further familiarise ourselves with the formalism
and to illustrate how to extract physical effects from our description of lightrays as
null geodesics (much as we employed timelike geodesics above to study the Newtonian
limit). Moreover, it allows us to derive formulae for this effect in quite some generality
and I will actually give 3 different derivations in increasing order of generality. In con-
junction with the Newtonian approximation to the gravitational field these then reduce
to the result in the form in which it is usually presented, e.g. as in (2.191) or (2.192)
(and as then rederived on the basis of the equivalence principle in (2.196)).
6 V. Balasubramanian, B. Czech, B. Chowdhury, J. de Boer, The entropy of a hole in spacetime,
arXiv:1305.0856 [hep-th]; V. Balasubramanian, B. Chowdhury, B. Czech, J. de Boer, M. Heller, A
hole-ographic spacetime, arXiv:1310.4204 [hep-th].
To set the stage, note that it is manifest from the expression
$$d\tau^2 = -g_{\mu\nu}(x)dx^\mu dx^\nu \qquad (2.166)$$
for the proper time that e.g. the rate of clocks is affected by where one is in a gravitational
field. However, as by the universality of gravity everything is (and in particular
all ideal clocks are) affected in the same way by gravity, it is impossible to measure this
effect locally, at a fixed point in a gravitational field. In order to find an observable
effect, one needs to compare data from two different points in a gravitational potential.
The situation we could consider is that of two observers A and B moving on worldlines
(paths) $\gamma_A$ and $\gamma_B$, A sending light signals to B. In general the frequency, measured
in the observer's rest-frame at A (or in a locally inertial coordinate system there) will
differ from the frequency measured by B upon receiving the signal.
In order to separate out Doppler-like effects due to relative velocities, we consider two
observers A and B at rest radially to each other, at radii rA and rB , in a static spherically
symmetric gravitational field. This means that the metric depends only on a radial
coordinate r and we can choose it to be of the form
where $d\Omega^2$ is the standard line element on the two-sphere (see section 23 for a more
detailed justification of this ansatz for the metric).
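Observer A sends out light of a given frequency $\nu_A$, say $n$ pulses per proper time unit
$\Delta\tau_A$. Observer B receives these $n$ pulses in his proper time $\Delta\tau_B$ and interprets this
as a frequency $\nu_B$. Thus the relation between the frequency $\nu_A$ emitted at A and the
frequency $\nu_B$ observed at B is
$$\frac{\nu_A}{\nu_B} = \frac{\Delta\tau_B}{\Delta\tau_A} \;. \qquad (2.168)$$
I will now give two arguments to show that this ratio depends on the metric (i.e. the
gravitational field) at $r_A$ and $r_B$ through
$$\frac{\nu_A}{\nu_B} = \left(\frac{g_{00}(r_B)}{g_{00}(r_A)}\right)^{1/2} \;. \qquad (2.169)$$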
1. The first argument is essentially one based on geometric optics (and is best accompanied
by drawing a (1+1)-dimensional space-time diagram of the light rays
and worldlines of the observers).
The geometry of the situation dictates that the coordinate time intervals recorded
at A and B are equal, $\Delta t_A = \Delta t_B$, as nothing in the metric actually depends on
$t$. In equations, this can be seen as follows. First of all, the equation for a radial
light ray is
$$-g_{00}(r)dt^2 = g_{rr}(r)dr^2 \;, \qquad (2.170)$$
or
$$\frac{dt}{dr} = \left(-\frac{g_{rr}(r)}{g_{00}(r)}\right)^{1/2} \;. \qquad (2.171)$$
From this we can calculate the coordinate time for the light ray to go from A to
B. Say that the first light pulse is emitted at point A at time $t(A)_1$ and received
at B at coordinate time $t(B)_1$. Then
$$t(B)_1 - t(A)_1 = \int_{r_A}^{r_B} dr\, \left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} \qquad (2.172)$$
The right hand side obviously does not depend on $t$, so we also have
$$t(B)_2 - t(A)_2 = \int_{r_A}^{r_B} dr\, \left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} \qquad (2.173)$$
where $t_2$ denotes the coordinate time for the arrival of the $n$-th pulse. Therefore,
or
$$t(A)_2 - t(A)_1 = t(B)_2 - t(B)_1 \;, \qquad (2.175)$$
as claimed. Thus the coordinate time intervals recorded at A and B between the
first and last pulse are equal. However, to convert this to proper time, we have to
multiply the coordinate time intervals by an $r$-dependent function,
$$\Delta\tau_{A,B} = \left(-g_{\mu\nu}(r_{A,B})\frac{dx^\mu}{dt}\frac{dx^\nu}{dt}\right)^{1/2} \Delta t_{A,B} \;, \qquad (2.176)$$
and therefore the proper time intervals will not be equal. For observers at rest,
$dx^i/dt = 0$, one has
$$\Delta\tau_{A,B} = \left(-g_{00}(r_{A,B})\right)^{1/2} \Delta t_{A,B} \;. \qquad (2.177)$$
Since $\Delta t_A = \Delta t_B$, (2.169) now follows from (2.168).
2. The second argument uses the null geodesic equation, in particular the conserved
quantity associated to time-translations (recall that we have assumed that the
metric (2.167) is time-independent), as well as a somewhat more covariant looking,
but equivalent, notion of frequency.
First of all, let the light ray be described by the wave vector $k^\mu$. In special relativity,
we would parametrise this as $k^\mu = (\omega, \vec{k})$ with $\omega = 2\pi\nu$ the frequency. This is the
frequency observed by an inertial observer at rest, with 4-velocity $u^\mu = (1, 0, 0, 0)$.
A Lorentz-invariant, and thus in our context now coordinate-independent, notion
of the frequency as measured by an observer with velocity $u^\mu$ is thus
$$\omega = -u_\mu k^\mu \;. \qquad (2.178)$$
This includes as special cases the relativistic Doppler effect (where one compares
$\omega$ with $\bar{\omega} = -\bar{u}_\mu k^\mu$, $\bar{u}^\mu$ the tangent to the world line of a boosted observer), as
well as the gravitational redshift we want to discuss here.
A static observer in the spherically-symmetric and static gravitational field (2.167)
is described by the 4-velocity
(and likewise for the observer at $r = r_B$). The wave vector $k^\mu$ is a null tangent
vector, $k^\mu k_\mu = 0$, to a null geodesic corresponding to the Lagrangian
Since the metric is time-independent, there is (cf. the discussion in section 2.5)
the corresponding conserved quantity
$$E = -\frac{\partial L}{\partial \dot{t}} = -g_{00}(r)\dot{t} \qquad (2.182)$$
(the minus sign serving only to make this quantity positive for $\dot{t} > 0$). Then one
finds that the frequency measured by the static observer at $r = r_A$ is
3. The above derivation is not completely general, and still not completely covariant,
because we used the explicit form of the metric (which is the general form
of a metric with a time-translation invariance in spherical symmetry, but not in
general). We can improve this somewhat by using the more general characterisation
of time-translation invariance in terms of Killing vectors (section 2.6) and
the associated conserved charge (2.102).
Thus assume that we have a timelike Killing vector $V^\mu$. Then by definition a
static observer is one whose 4-velocity $u^\mu$ is proportional to $V^\mu$,
$$u^\mu \propto V^\mu \;. \qquad (2.184)$$
For $V = \partial_t$ this evidently reduces to the statement that only $t$ changes along the
worldline, i.e. that the observer remains at fixed values of the spatial coordinates,
and this is the sense in which we have informally used the term static observer
so far. Denoting the norm of $V^\mu$ by
$$V = (-V^\mu V_\mu)^{1/2} \;, \qquad (2.185)$$
Given the null wave vector $k^\mu$, we have the conserved energy (2.102),
$$E = -k_\mu V^\mu \;. \qquad (2.187)$$
Since $E$ is constant along the lightray, frequencies observed by two different static
observers are related by
$$\frac{\nu_A}{\nu_B} = \frac{V_B}{V_A} \;. \qquad (2.189)$$
For this reason, the norm $V$ is also known as the redshift factor associated with a
timelike Killing vector.
Note that this result reduces to (2.169) if the metric has the form (2.167) and
$V = \partial_t$ since then
Having derived (2.169) in 3 different ways, let us now look at what the result tells us in
specific situations of interest. Since on earth and in the solar system we only have access
to gravitational fields that are to a reasonably high degree of precision well described
by Newtonian gravity, we can use the Newtonian approximation (2.125). Then (2.169)
becomes
$$g_{00} = -(1 + 2\phi) \quad\Rightarrow\quad \frac{\nu_A}{\nu_B} \approx 1 + \phi(r_B) - \phi(r_A) \;, \qquad (2.191)$$
or, with $\phi(r) = -G_N M/r$,
$$\frac{\nu_A - \nu_B}{\nu_B} = \frac{G_N M (r_B - r_A)}{r_A r_B} \qquad (2.192)$$
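To see how good the Newtonian approximation is in practice, the following sketch compares the exact ratio $\nu_A/\nu_B = (g_{00}(r_B)/g_{00}(r_A))^{1/2}$, evaluated with the Schwarzschild $g_{00}$ of (2.131), with the weak-field expression $1 + (\phi(r_B) - \phi(r_A))/c^2$. The numerical constants (SI values of $G_N$, $c$, the solar mass and radius, the earth-sun distance) are standard reference values inserted here as assumptions; they are not taken from the notes, and the function names are purely illustrative.

```python
import math

# Assumed SI reference values (not from the notes).
G_N = 6.674e-11        # m^3 kg^-1 s^-2
C_LIGHT = 2.998e8      # m / s
M_SUN = 1.989e30       # kg
R_SUN = 6.96e8         # m
R_EARTH_ORBIT = 1.496e11  # m

def g00_schwarzschild(r, mass):
    """Coefficient of c^2 dt^2 in (2.131): -(1 - 2 G_N M / (c^2 r))."""
    return -(1.0 - 2.0 * G_N * mass / (C_LIGHT**2 * r))

def ratio_exact(r_a, r_b, mass):
    """nu_A / nu_B = sqrt(g00(r_B) / g00(r_A)) for static observers."""
    return math.sqrt(g00_schwarzschild(r_b, mass) / g00_schwarzschild(r_a, mass))

def ratio_newtonian(r_a, r_b, mass):
    """Weak-field form 1 + (phi(r_B) - phi(r_A)) / c^2 with phi = -G_N M / r."""
    phi_a = -G_N * mass / r_a
    phi_b = -G_N * mass / r_b
    return 1.0 + (phi_b - phi_a) / C_LIGHT**2

if __name__ == "__main__":
    exact = ratio_exact(R_SUN, R_EARTH_ORBIT, M_SUN)
    newton = ratio_newtonian(R_SUN, R_EARTH_ORBIT, M_SUN)
    print(f"exact     nu_A/nu_B - 1 = {exact - 1.0:.6e}")
    print(f"Newtonian nu_A/nu_B - 1 = {newton - 1.0:.6e}")
    print(f"difference              = {abs(exact - newton):.3e}")
```

For the sun the two expressions differ only at second order in $G_N M/(c^2 r)$, which is why the linearised formula is entirely adequate for solar system estimates.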
Thus for $r_B > r_A$ one has
so that, as expected, a photon loses energy when rising in (and against the pull of) a
gravitational field, and conversely one has the gravitational blueshift effect
Remarks:
1. Note that the general result (2.169) depends only on the value of the gravitational
field at the points $r_A$ and $r_B$, not on the gravitational field in between. This
reinforces the interpretation that the gravitational redshift is only due to the
different rate of clocks / proper time at the positions $r_A$ and $r_B$, and not due to the
fact that something happens to the lightray as it travels through a gravitational
field (which should lead to a cumulative effect depending also on the intermediate
gravitational field).
2. The result (2.169) can also be deduced from energy conservation. A local inertial
observer at the emitter A will see a change in the internal mass of the emitter
$\Delta m_A = -h\nu_A$ when a photon of frequency $\nu_A$ is emitted. Likewise, the absorber
at point B will experience an increase in inertial mass by $\Delta m_B = h\nu_B$, but the
total internal plus gravitational potential energy must be conserved. Thus
leading to
$$\frac{\nu_A}{\nu_B} = \frac{1 + \phi(r_B)}{1 + \phi(r_A)} \approx 1 + \phi(r_B) - \phi(r_A) \;, \qquad (2.196)$$
as before. This "derivation" (in quotes, because we are wildly mixing Newtonian
gravity, special relativity and quantum mechanics - do take this derivation
with an appropriately sized grain of salt, please) shows that gravitational redshift
experiments test the Einstein Equivalence Principle in its strong form, in which
the term "laws of nature" is not restricted to mechanics (inertial = gravitational
mass), but also includes quantum mechanics in the sense that it tests if in an
inertial frame the relation between photon energy and frequency is unaffected by
the presence of a gravitational field.
3. While difficult to observe directly (by looking at light from the sun), this prediction
has been verified in the laboratory, first by Pound and Rebka (1960), and
subsequently, with one percent accuracy, by Pound and Snider in 1964 (using the
Mössbauer effect).
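Let us make some rough estimates of the expected effect. We first consider light
reaching us (B) from the sun (A). In this case, we have $r_B \gg r_A$, where $r_A$ is the
radius of the sun, and (also inserting a so far suppressed factor of $c^2$) we obtain
$$\frac{\nu_A - \nu_B}{\nu_B} = \frac{G_N M (r_B - r_A)}{c^2 r_A r_B} \approx \frac{G_N M}{c^2 r_A} \;. \qquad (2.197)$$
Using the approximate values
$$\begin{aligned}
r_A &\approx 0.7 \times 10^{6}\ \mathrm{km} \\
M_{\mathrm{sun}} &\approx 2 \times 10^{33}\ \mathrm{g} \\
G_N &\approx 7 \times 10^{-8}\ \mathrm{g^{-1}\,cm^3\,s^{-2}} \\
G_N c^{-2} &\approx 7 \times 10^{-29}\ \mathrm{g^{-1}\,cm} = 7 \times 10^{-34}\ \mathrm{g^{-1}\,km} \;,
\end{aligned} \qquad (2.198)$$
one finds
$$\frac{\Delta\nu}{\nu} \approx 2 \times 10^{-6} \;. \qquad (2.199)$$
In principle, such a frequency shift should be observable. In practice, however, the
spectral lines of light emitted by the sun are strongly affected e.g. by convection in
the atmosphere of the sun (Doppler effect), and this makes it difficult to measure
this effect with the required precision.
In the Pound-Snider experiment, the actual value of $\Delta\nu/\nu$ is much smaller. In
the original set-up one has $r_B - r_A \approx 20\,\mathrm{m}$ (the distance from floor to ceiling of
the laboratory), and $r_A = r_{\mathrm{earth}} \approx 6.4 \times 10^{6}\,\mathrm{m}$, leading to
$$\frac{\Delta\nu}{\nu} \approx 2.5 \times 10^{-15} \;. \qquad (2.200)$$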
However, here the experiment is much better controlled, and the gravitational
redshift was verified with 1% accuracy.
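The orders of magnitude quoted in these remarks are easy to reproduce. The sketch below evaluates the dimensionless surface potential $G_N M/(c^2 r)$ for the earth and the sun, and the fractional shift $G_N M (r_B - r_A)/(c^2 r_A r_B)$ of (2.197) for the sun-earth and the laboratory set-up. All constants are standard SI reference values inserted as assumptions (in particular the earth's mass, which is not quoted in the notes), and the function names are illustrative only.

```python
# Assumed SI reference values (not from the notes).
G_N = 6.674e-11          # m^3 kg^-1 s^-2
C_LIGHT = 2.998e8        # m / s
M_EARTH, R_EARTH = 5.972e24, 6.371e6   # kg, m
M_SUN, R_SUN = 1.989e30, 6.96e8        # kg, m
R_EARTH_ORBIT = 1.496e11               # m

def surface_potential_over_c2(mass, radius):
    """Magnitude of phi / c^2 = G_N M / (c^2 r) at the surface."""
    return G_N * mass / (C_LIGHT**2 * radius)

def fractional_shift(mass, r_a, r_b):
    """(nu_A - nu_B) / nu_B as in (2.197), with the factor c^2 restored."""
    return G_N * mass * (r_b - r_a) / (C_LIGHT**2 * r_a * r_b)

if __name__ == "__main__":
    print(f"|phi|/c^2 earth surface : {surface_potential_over_c2(M_EARTH, R_EARTH):.2e}")
    print(f"|phi|/c^2 sun surface   : {surface_potential_over_c2(M_SUN, R_SUN):.2e}")
    print(f"shift, sun to earth     : {fractional_shift(M_SUN, R_SUN, R_EARTH_ORBIT):.2e}")
    # 20 m height difference, as in the rough estimate of the text.
    print(f"shift, 20 m laboratory  : {fractional_shift(M_EARTH, R_EARTH, R_EARTH + 20.0):.2e}")
```

With a 20 m height difference the laboratory value comes out at roughly $2 \times 10^{-15}$, which agrees with (2.200) at the level of such a rough estimate; the precise number depends on the height actually used in the experiment.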
Central to our initial discussion of gravity was the Einstein Equivalence Principle which
postulates the existence of locally inertial (or freely falling) coordinate systems in which
locally at (or around) a point the effects of gravity are absent. Now that we have decided
that the arena of gravity is a general metric space-time, we should establish that such
coordinate systems indeed exist. Looking at the geodesic equation, it is clear that
at least in this context absence of gravitational effects is tantamount to the existence
of a coordinate system $\{\xi^a\}$ in which at a given point $p$ the metric is the Minkowski
metric, $g_{ab}(p) = \eta_{ab}$ and the Christoffel symbols are zero, $\Gamma^a_{\;bc}(p) = 0$;
the latter condition is equivalent to $g_{ab,c}(p) = 0$. I will sketch three arguments establishing
the existence of such coordinate systems, each one having its own virtues and
providing its own insights into the issue.
Actually it is physically plausible (and fortuitously moreover true) that one can always
find coordinates which embody the equivalence principle in the stronger sense that the
metric is the flat metric $\eta_{ab}$ and the Christoffel symbols are zero not just at a point but
along the entire worldline of an inertial (freely falling) observer, i.e. along a geodesic $\gamma$,
Such coordinates, based on a geodesic rather than on a point, are known as Fermi
normal coordinates. The construction is similar to that of Riemann normal coordinates
(based at a point) to be discussed below.7
1. Direct Construction
We know that given a coordinate system $\{\xi^a\}$ that is inertial at a point $p$, the
metric and Christoffel symbols at $p$ in a new coordinate system $\{x^\mu\}$ are determined
by (1.80,1.87). Conversely, we will now see that knowledge of the metric
and Christoffel symbols at a point $p$ is sufficient to construct a locally inertial
coordinate system at $p$.
We will construct this coordinate system $\xi^a = \xi^a(x)$ locally around the point $p$
(with coordinates $x_0^\mu$, say, in the original coordinate system) by a Taylor series
expansion,
$$\xi^a(x) = d^a + (x - x_0)^\mu e^a_{\;\mu} + \tfrac{1}{2}(x - x_0)^\mu (x - x_0)^\nu f^a_{\;\mu\nu} + \ldots \;. \qquad (2.204)$$
Here
$$d^a = \xi^a(x_0) \equiv \xi^a_0 \qquad (2.205)$$
are the (arbitrary) coordinate values of the point $p$ in the new coordinates $\xi^a$,
$$e^a_{\;\mu} = \frac{\partial \xi^a}{\partial x^\mu}(x_0) \qquad (2.206)$$
is the Jacobi matrix of the coordinate transformation at $x = x_0$, and
$$f^a_{\;\mu\nu} = \frac{\partial^2 \xi^a}{\partial x^\mu \partial x^\nu}(x_0) \qquad (2.207)$$
is its 1st derivative at $x_0$.
7 Most discussions of Fermi coordinates in the literature follow the presentation given in F. Manasse,
C. Misner, Fermi normal coordinates and some basic concepts in differential geometry, J. Math. Phys.
4 (1963) 735-745; for a geometrically transparent treatment see also section 1.11 of E. Poisson, A
Relativist's Toolkit; Fermi coordinates for null geodesics are constructed in M. Blau, D. Frank, S.
Weiss, Fermi Coordinates and Penrose Limits, arXiv:hep-th/0603109.
From the tensorial transformation behaviour of the metric we know that
$$e^a_{\;\mu} e^\nu_{\;a} = \delta^\nu_{\;\mu} \;, \qquad e^a_{\;\mu} e^\mu_{\;b} = \delta^a_{\;b} \;, \qquad (2.210)$$
we see that the inverse matrix diagonalises (and scales) the metric at the point $p$
in such a way that
$$g_{\mu\nu}(x_0)\, e^\mu_{\;a} e^\nu_{\;b} = \eta_{ab} \;. \qquad (2.211)$$
Since $g_{\mu\nu}(x_0)$ is a symmetric non-degenerate matrix, such matrices always exist
(and are unique up to similarity transformations that leave $\eta_{ab}$ invariant, i.e. up
to Lorentz transformations). The notation $e^a_{\;\mu}$ and $e^\mu_{\;a}$ reflects the fact that these
matrices are the components of an orthonormal vierbein (or vielbein) at the point
$p$, which are traditionally denoted this way (cf. the discussion in section 3.8 below).
Taking stock, we see that the condition $g_{ab}(p) = \eta_{ab}$ determines the coordinate
system to 1st order in a Taylor series expansion, up to translations (the choice of
$d^a$) and Lorentz transformations, i.e. up to Poincaré transformations.
We now turn to the 2nd condition characterising a locally inertial coordinate
system, namely $\Gamma^a_{\;bc}(p) = 0$. We can write the inhomogeneous transformation
behaviour of the Christoffel symbols as
$$\frac{\partial \xi^a}{\partial x^\mu}\,\Gamma^\mu_{\;\nu\lambda} = \Gamma^a_{\;bc}\,\frac{\partial \xi^b}{\partial x^\nu}\frac{\partial \xi^c}{\partial x^\lambda} + \frac{\partial^2 \xi^a}{\partial x^\nu \partial x^\lambda} \;. \qquad (2.212)$$
Thus at the point $p$ we have
Requiring $\Gamma^a_{\;bc}(p) = 0$ now uniquely determines the 2nd order Taylor coefficients,
$$\Gamma^a_{\;bc}(\xi_0) = 0 \quad\Rightarrow\quad f^a_{\;\mu\nu} = e^a_{\;\lambda}\,\Gamma^\lambda_{\;\mu\nu}(x_0) \;. \qquad (2.214)$$
Thus to 2nd order in a Taylor series expansion, the transformation from arbitrary
coordinates $x^\mu$ to inertial coordinates $\xi^a$ at the point $p$ is given by
We have therefore established that for an arbitrary point $p$ in an arbitrary gravitational
field one can always introduce local coordinates which are inertial at that
point, and that up to 2nd order in a Taylor series expansion such a coordinate
system is unique up to Poincaré transformations.
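To make the construction concrete, here is a small numerical sketch for the unit two-sphere $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ (a Riemannian example, so that $\delta_{ab}$ plays the role of $\eta_{ab}$). It builds the Jacobian $\partial\xi^a/\partial x^\mu = e^a_{\;\mu} + f^a_{\;\mu\nu}(x - x_0)^\nu$ implied by (2.204), with $e^a_{\;\mu}$ chosen to satisfy (2.211) and $f^a_{\;\mu\nu}$ fixed by (2.214), and checks that the metric in the new coordinates deviates from $\delta_{ab}$ only at second order in the distance from $p$. The base point $\theta_0 = 1$ and all function names are arbitrary illustrative choices.

```python
import math

def sphere_metric(theta):
    """Components g_{mu nu} of ds^2 = dtheta^2 + sin^2(theta) dphi^2."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def jacobian(theta0, dx):
    """d xi^a / d x^mu = e^a_mu + f^a_{mu nu} dx^nu, from the expansion (2.204)."""
    s, c = math.sin(theta0), math.cos(theta0)
    e = [[1.0, 0.0], [0.0, s]]                 # satisfies g = e^T e at the point p
    gamma = [
        [[0.0, 0.0], [0.0, -s * c]],           # Gamma^theta_{mu nu} at theta0
        [[0.0, c / s], [c / s, 0.0]],          # Gamma^phi_{mu nu} at theta0
    ]
    # f^a_{mu nu} = e^a_lambda Gamma^lambda_{mu nu}(x_0), cf. (2.214)
    f = [[[sum(e[a][l] * gamma[l][m][n] for l in range(2))
           for n in range(2)] for m in range(2)] for a in range(2)]
    return [[e[a][m] + sum(f[a][m][n] * dx[n] for n in range(2))
             for m in range(2)] for a in range(2)]

def metric_in_inertial_coords(theta0, dx):
    """g_ab = g_{mu nu} (dx^mu/dxi^a)(dx^nu/dxi^b) at the point x_0 + dx."""
    j = jacobian(theta0, dx)
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    jinv = [[j[1][1] / det, -j[0][1] / det],   # jinv[mu][a] = dx^mu / dxi^a
            [-j[1][0] / det, j[0][0] / det]]
    g = sphere_metric(theta0 + dx[0])
    return [[sum(g[m][n] * jinv[m][a] * jinv[n][b]
                 for m in range(2) for n in range(2))
             for b in range(2)] for a in range(2)]

def max_deviation(theta0, dx):
    """Largest component of g_ab - delta_ab at x_0 + dx."""
    gab = metric_in_inertial_coords(theta0, dx)
    return max(abs(gab[a][b] - (1.0 if a == b else 0.0))
               for a in range(2) for b in range(2))

if __name__ == "__main__":
    for h in (1e-2, 5e-3, 2.5e-3):
        # Halving the displacement should reduce the deviation by about a factor of 4.
        print(f"h = {h:.1e}   max |g_ab - delta_ab| = {max_deviation(1.0, (h, h)):.3e}")
```

The absence of a term linear in the displacement is the numerical counterpart of $g_{ab,c}(p) = 0$.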
Since this leaves the infinite number of higher-order terms of the Taylor expansion
undetermined, this shows that inertial coordinate systems are highly non-unique,
and raises the following questions:
Can one continue in this vein and choose the (so far undetermined) higher-
order terms in the Taylor expansion such that also e.g. the 2nd derivatives
of the metric at p are equal to zero,
abc b c = 0 . (2.218)
way that $g_{ab}(p) = \eta_{ab}$ (by choosing the four directions at $p$ to be orthonormal unit
vectors).
Before turning to the more detailed construction, let us look at an example. Consider
the standard metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ on the two-sphere. Any point
is as good as any other point, and one can construct an inertial coordinate system
at the north pole $\theta = 0$ in terms of geodesics shot off from the north pole
into the $\phi = 0$ ($\xi^1$) and $\phi = \pi/2$ ($\xi^2$) directions. The affine parameter along
a great circle (geodesic) connecting the north pole to a point $(\theta, \phi)$ is $\theta$, and
thus $\theta$ is also the geodesic distance, and the coordinates of the point $(\theta, \phi)$ are
$(\xi^1 = \theta\cos\phi, \xi^2 = \theta\sin\phi)$. In particular, the north pole is the origin $\xi^1 = \xi^2 = 0$.
Note that one could have guessed these coordinates from the fact that near $\theta = 0$
the metric is $d\theta^2 + \theta^2 d\phi^2$, which is the Euclidean metric in polar coordinates
$(\theta\cos\phi, \theta\sin\phi)$.
Calculating the metric in these new coordinates, using
and thus
$$d\theta = \frac{\xi^1 d\xi^1 + \xi^2 d\xi^2}{\sqrt{(\xi^1)^2 + (\xi^2)^2}} \;, \qquad d\phi = \frac{\xi^1 d\xi^2 - \xi^2 d\xi^1}{(\xi^1)^2 + (\xi^2)^2} \;, \qquad (2.220)$$
one finds
$$d\theta^2 + \sin^2\theta\, d\phi^2 = (d\xi^1)^2 + (d\xi^2)^2 + O(\xi^2 d\xi^2) \;, \qquad (2.221)$$
i.e.
$$g_{ab}(\xi) = \delta_{ab} + O(\xi^2) \;. \qquad (2.222)$$
Therefore
$$g_{ab}(\xi = 0) = \delta_{ab} \;, \qquad g_{ab,c}(\xi = 0) = 0 \;, \qquad (2.223)$$
as required.
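This example is simple enough to check numerically. The sketch below evaluates the components of $d\theta^2 + \sin^2\theta\, d\phi^2$ in the coordinates $(\xi^1, \xi^2) = (\theta\cos\phi, \theta\sin\phi)$, using the differentials (2.220), and confirms that the deviation from $\delta_{ab}$ scales quadratically with the distance from the north pole. The function names are illustrative, and the origin (where the polar coordinates degenerate) is handled separately by hand.

```python
import math

def metric_in_xi(xi1, xi2):
    """Components g_ab of the two-sphere metric in the coordinates (xi^1, xi^2)."""
    theta = math.hypot(xi1, xi2)
    if theta == 0.0:
        # At the north pole the metric is exactly delta_ab.
        return [[1.0, 0.0], [0.0, 1.0]]
    # Coefficients of d xi^1 and d xi^2 in d theta and d phi, cf. (2.220).
    dtheta = (xi1 / theta, xi2 / theta)
    dphi = (-xi2 / theta**2, xi1 / theta**2)
    s2 = math.sin(theta) ** 2
    return [[dtheta[a] * dtheta[b] + s2 * dphi[a] * dphi[b]
             for b in range(2)] for a in range(2)]

def max_deviation_from_flat(xi1, xi2):
    """Largest component of g_ab(xi) - delta_ab."""
    g = metric_in_xi(xi1, xi2)
    return max(abs(g[a][b] - (1.0 if a == b else 0.0))
               for a in range(2) for b in range(2))

if __name__ == "__main__":
    for scale in (0.04, 0.02, 0.01):
        dev = max_deviation_from_flat(scale, 2.0 * scale)
        # Halving xi should reduce the deviation by about a factor of 4.
        print(f"xi = ({scale}, {2 * scale})   max |g_ab - delta_ab| = {dev:.3e}")
```

The factor of 4 between successive rows is the statement $g_{ab}(\xi) = \delta_{ab} + O(\xi^2)$ of (2.222).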
We now (re)turn to the general construction of such coordinates, starting with the
geodesic equation
$$\ddot{x}^\mu + \Gamma^\mu_{\;\nu\lambda}\,\dot{x}^\nu \dot{x}^\lambda = 0 \;. \qquad (2.224)$$
We consider geodesics passing through (or emanating from) the point $p$ with coordinates
$x_0^\mu$ at $\tau = 0$, and with initial 4-velocity $u_0^\mu$,
$$x^\mu(\tau = 0) = x_0^\mu \;, \qquad \dot{x}^\mu(\tau = 0) = u_0^\mu \;. \qquad (2.225)$$
$$\ddot{x}^\mu(\tau = 0) = -\Gamma^\mu_{\;\nu\lambda}(x_0)\, u_0^\nu u_0^\lambda \;. \qquad (2.226)$$
Hence in a Taylor expansion around $\tau = 0$ we can write the solution to the geodesic
equation as
$$x^\mu(\tau) = x_0^\mu + \tau u_0^\mu - \tfrac{1}{2}\tau^2\, \Gamma^\mu_{\;\nu\lambda}(x_0)\, u_0^\nu u_0^\lambda + \ldots \;. \qquad (2.227)$$
We can expand the (arbitrary) initial 4-velocity $u_0^\mu$ in terms of 4 linearly independent
(and orthonormal, say) vectors at $p$ as
We can then think of the Taylor expansion (2.227) as defining a coordinate transformation
(c) From the present point of view, the 2nd condition arises from the fact (mentioned
above) that in these coordinates the geodesic equation for the above
geodesics reduces to
as claimed.
(d) In contrast to the previous construction leading to (2.216), here the higher-order
terms in the Taylor expansion of the coordinate transformation are now
determined by the higher-order terms in the Taylor expansion of the solution
(2.227) of the geodesic equation. These higher-order terms will depend on
2nd and higher derivatives of the metric $g_{\mu\nu}(x)$ at $x_0$, and these in turn will
also determine the quadratic and higher terms of the Taylor expansion of the
metric in these coordinates,
$$\begin{aligned}
g_{ab}(\xi) &= g_{ab}(\xi_0) + (\xi - \xi_0)^c g_{ab,c}(\xi_0) + \tfrac{1}{2}(\xi - \xi_0)^c (\xi - \xi_0)^d g_{ab,cd}(\xi_0) + \ldots \\
&= \eta_{ab} + \tfrac{1}{2}(\xi - \xi_0)^c (\xi - \xi_0)^d g_{ab,cd}(\xi_0) + \ldots \;.
\end{aligned} \qquad (2.235)$$
We will determine the quadratic term in this expansion (expressed in terms
of the Riemann curvature tensor) in section 7.9.
3. A Numerological Argument
This is my favourite argument because it requires no calculations and at the same
time provides additional insight into the nature of curved space-times.
Assuming that the local existence of solutions to differential equations is guaran-
teed by some mathematical theorems, it is frequently sufficient to check that one
has enough degrees of freedom to satisfy the desired initial conditions (one may
also need to check integrability conditions). In the present context, this argument
is useful because it also reveals some information about the true curvature hidden
in the second derivatives of the metric. It works as follows:
Again this turns out to agree with the number of independent components (7.25)
of the curvature tensor in D dimensions.
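The counting behind this kind of argument can be spelled out in a few lines. The sketch below assumes the standard version of the counting (which I take to be the one intended here): at each order one compares the number of Taylor coefficients of the coordinate transformation $\xi^a(x)$ with the number of metric data to be fixed at $p$. The helper names are illustrative; the closed formula $D^2(D^2 - 1)/12$ for the number of independent components of the Riemann tensor is the standard one.

```python
import math

def symmetric_components(dim, rank):
    """Independent components of a totally symmetric rank-k object in D dimensions."""
    return math.comb(dim + rank - 1, rank)

def counting(dim):
    """Compare Taylor coefficients of xi^a(x) with metric data at p, order by order."""
    metric = symmetric_components(dim, 2)
    return {
        # 1st order: D^2 coefficients e^a_mu versus the D(D+1)/2 components g_ab(p);
        # the surplus is the dimension of the Lorentz group.
        "unused_first_order_coefficients": dim * dim - metric,
        # 2nd order: D * D(D+1)/2 coefficients f^a_{mu nu} versus the first derivatives g_ab,c(p).
        "unfixed_first_derivatives": metric * dim - dim * symmetric_components(dim, 2),
        # 3rd order: D * D(D+1)(D+2)/6 coefficients versus the second derivatives g_ab,cd(p).
        "unfixed_second_derivatives": metric * metric - dim * symmetric_components(dim, 3),
    }

def riemann_components(dim):
    """Standard count D^2 (D^2 - 1) / 12 of independent curvature components."""
    return dim * dim * (dim * dim - 1) // 12

if __name__ == "__main__":
    for dim in (2, 3, 4):
        print(dim, counting(dim), "Riemann:", riemann_components(dim))
```

In $D = 4$ this leaves 6 unused first-order coefficients (the Lorentz transformations), no unfixed first derivatives, and 20 second derivatives of the metric that cannot be transformed away.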
Note: At this point in the course I find it useful to develop in parallel (and suggest to
read in parallel)
the more formal material on tensor analysis in sections 3, 4, 5, 6, 7 and 10, say
(and then moving on to the Einstein equations themselves)
and a detailed discussion of the basic properties of the Schwarzschild metric (sec-
tions 23.4 - 26),
since much of the latter (in particular geodesics, solar system tests of general relativity,
even the issues that arise in connection with the Schwarzschild radius) can be understood
just on the basis of what has been done so far (if, for the time being, one accepts
on faith that the Schwarzschild metric is the unique spherically symmetric vacuum
solution of the Einstein field equations). Not only is this an interesting and physically
relevant application of the machinery developed so far, it also provides an appropriate
balance between physics and formalism in the lectures. More advanced material in the
intervening sections can then be covered and dealt with if and when needed or desired
(or, ideally, both).
3 Tensor Algebra
The Einstein Equivalence Principle tells us that the laws of nature (including the effects
of gravity) should be such that in an inertial frame they reduce to the laws of Special
Relativity. As we have seen in the case of a free particle, this can be implemented by
transforming the laws of Special Relativity to arbitrary coordinate systems and declaring
that these be valid for arbitrary coordinates and metrics.
However, it may not yet be completely clear at this stage what is the precise relation
between this procedure and the incorporation of a gravitational field via the equivalence
principle. Moreover, this is a somewhat tedious procedure in general (e.g. to obtain the
correct form of the Maxwell equations in the presence of gravity) and not particularly
enlightening.
In order to fill this gap (and overcome these shortcomings), we will now introduce the
Principle of General Covariance and show that it provides us with a concrete way of
implementing the Einstein Equivalence Principle.
where the term in brackets is some invertible matrix or operator. Then clearly the
presence of the junk-terms means that the equation $T = 0$ is not equivalent to the
equation $T' = 0$. An example of an object that transforms in this way is, as we have seen,
the Christoffel symbols. On the other hand, if these junk terms are absent, so that we
have
$$T' = (\ldots)T \qquad (3.2)$$
then clearly $T' = 0$ if and only if $T = 0$, i.e. the equation is satisfied in one coordinate
system if and only if it is satisfied in any other (or all) coordinate systems. This is the
kind of equation that embodies general covariance, and again we have already seen an
example of such an equation, namely the geodesic equation, where the term $(\ldots)$ in
brackets is just the Jacobi matrix. Thus, to be more concrete, we can replace the 2nd
condition above by
Let us now establish the above statement, namely that the Einstein equivalence principle
implies that an equation that satisfies the conditions 1 and 2 (or 2') is valid in an
arbitrary gravitational field:
consider some equation that satisfies these conditions, and assume that we are in
an arbitrary gravitational field;
condition 2 implies that this equation is true (or satisfied) in all coordinate sys-
tems if it is satisfied just in one coordinate system;
now we know that we can always (locally) construct a freely falling coordinate
system in which the effects of gravity are absent;
the Einstein Equivalence Principle now posits that in such a reference system the
physics is that of Minkowski space-time;
Remarks:
1. Note that general covariance alone is an empty statement since any equation
(whether correct or not) can be made generally covariant simply by writing it in
an arbitrary coordinate system (cf. also the discussion in section 5.4). It develops
its power only when used in conjunction with the Einstein Equivalence Principle
as a statement about physics in a gravitational field, namely that by virtue of its
general covariance an equation will be true in a gravitational field if it is true in
the absence of gravitation.
2. The principle of general covariance does not fix the equations uniquely because
there are generally covariant objects that one can construct e.g. from the (second)
derivatives of the metric (via the Riemann curvature tensor to be introduced
in section 7) that can therefore be added to an equation and which vanish for
Minkowski space, i.e. in the absence of gravitation.
3.2 Tensors
If you are already familiar with Lorentz tensors from special relativity (as briefly recalled
in section 1.2, these are objects which transform in a particularly simple multi-linear way
under Lorentz transformations), then hardly anything in this or the subsequent section
3.3 should be new or unexpected (but interesting new features will arise in particular
when we move on from tensor algebra to tensor analysis in section 4).
1. Scalars
The simplest example of a tensor is a function (or scalar) $f$ which under a coordinate
transformation $x^\mu \to y^{\mu'}(x^\mu)$ simply transforms as
$$f'(y(x)) = f(x)$$
or $f'(y) = f(x(y))$. One frequently suppresses the argument, and thus writes
simply, $f' = f$, expressing the fact that, up to the obvious change of argument,
functions are invariant under coordinate transformations.
2. Vectors
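The next simplest case are vectors $V^\mu(x)$ transforming as
$$V'^{\mu'}(y(x)) = \frac{\partial y^{\mu'}}{\partial x^\mu} V^\mu(x) . \tag{3.4}$$
A prime example is the tangent vector $\dot{x}^\mu$ to a curve, for which this transformation
behaviour
$$\dot{x}^\mu \to \dot{y}^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu} \dot{x}^\mu \tag{3.5}$$
is just the familiar one.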
Remarks:
and likewise for scalars and scalar fields, and more general tensors and tensor
fields.
(b) One way of thinking about vector fields is as tangent vector fields to families of
curves on a space or space-time which arise as the solutions to the differential
equation
$$\frac{d}{d\lambda} x^\mu(\lambda) = V^\mu(x(\lambda)) \tag{3.7}$$
(and we take local existence and uniqueness of these solutions under suitable
regularity and differentiability conditions for granted). These curves $x^\mu(\lambda)$
are the integral curves (or orbits) of the vector field $V^\mu$, and by construction
they are characterised by the fact that at any point $x$ the tangent
vector to the curve passing through that point is the vector $V^\mu(x)$ at that
point. Thus vector fields also generate a flow on the space(-time), namely
the motion of points along these integral curves, $x^\mu(\lambda) \mapsto x^\mu(\lambda + s)$ for $s \in \mathbb{R}$.
(c) An extremely useful related way of thinking about vectors (vector fields) is
as first order differential operators, via the correspondence
$$V^\mu \to V := V^\mu \partial_\mu . \tag{3.8}$$
One of the advantages of this point of view is that the object $V$ is completely
invariant under coordinate transformations as the components $V^\mu$ of
$V$ transform inversely to the basis vectors $\partial_\mu$. For more on this see sections
3.6 and 3.8 on the coordinate-independent interpretation of tensors below.
3. Covectors
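A covector (field) is an object $U_\mu(x)$ which under a coordinate transformation
transforms inversely to a vector, i.e. as
$$U'_{\mu'}(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}} U_\mu(x) . \tag{3.9}$$
A familiar example of a covector is the derivative $U_\mu = \partial_\mu f$ of a function which
of course transforms as
$$\partial_{\mu'} f'(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}} \partial_\mu f(x) . \tag{3.10}$$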
Remarks:
(a) As in the case of covectors of special relativity (1.38), one should think of
covectors pointwise as elements of the dual vector space $V^*$ to the space of
vectors $V$, i.e. as linear functionals on the space of vectors, given by
$$U = U_\mu dx^\mu \tag{3.12}$$
of a scalar.
(c) Combining the two points of view in the remarks above, one can thus think
of df as the linear functional on vector fields that assigns to a vector field V
the scalar which is the derivative of f along V ,
4. Covariant 2-Tensors
Clearly, given the above objects, we can construct more general objects which
transform in a nice way under coordinate transformations by taking products of
them. Tensors in general are objects which transform like (but need not be equal
to) products of vectors and covectors.
In particular, a covariant 2-tensor, or (0,2)-tensor, is an object $A_{\mu\nu}$ that transforms
under coordinate transformations like the product of two covectors, i.e.
$$A'_{\mu'\nu'}(y(x)) = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}} A_{\mu\nu}(x) . \tag{3.15}$$
I will from now on use a shorthand notation in which I drop the prime on the transformed
object and also omit the argument. In this notation, the above equation
would then become
$$A_{\mu'\nu'} = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}} A_{\mu\nu} . \tag{3.16}$$
We already know one example of such a tensor, namely the metric tensor $g_{\mu\nu}$
(which happens to be a symmetric tensor).
5. Contravariant 2-Tensors
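Likewise we define a contravariant 2-tensor (or a (2,0)-tensor) to be an object $B^{\mu\nu}$
that transforms like the product of two vectors,
$$B^{\mu'\nu'} = \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial y^{\nu'}}{\partial x^\nu} B^{\mu\nu} . \tag{3.17}$$
An example is the inverse metric tensor $g^{\mu\nu}$.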
6. (p, q)-Tensors
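It should now be clear how to define a general $(p, q)$-tensor - namely as an object
$T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ with $p$ contravariant and $q$ covariant indices which under a coordinate
transformation transforms like a product of $p$ vectors and $q$ covectors,
$$T^{\mu'_1 \ldots \mu'_p}{}_{\nu'_1 \ldots \nu'_q} = \frac{\partial y^{\mu'_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\mu'_p}}{\partial x^{\mu_p}} \frac{\partial x^{\nu_1}}{\partial y^{\nu'_1}} \cdots \frac{\partial x^{\nu_q}}{\partial y^{\nu'_q}} \, T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} . \tag{3.18}$$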
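The transformation laws (3.4), (3.9) and (3.16) can be checked symbolically in a simple case. The following is a minimal sketch, assuming the change from Cartesian coordinates $x = (x^1, x^2)$ to polar coordinates $y = (r, \phi)$ in the Euclidean plane and using SymPy; the variable names are illustrative choices and not part of these notes. It verifies that the contraction $U_\mu V^\mu$ is a scalar and that the Euclidean metric acquires the familiar polar form.

```python
import sympy as sp

# Old coordinates x = (x1, x2) (Cartesian), new coordinates y = (r, phi) (polar).
r, phi = sp.symbols("r phi", positive=True)
x_of_y = sp.Matrix([r * sp.cos(phi), r * sp.sin(phi)])  # x^mu as functions of y

# dx_dy[mu, mu'] = dx^mu/dy^mu'; its matrix inverse is dy^mu'/dx^mu.
dx_dy = x_of_y.jacobian([r, phi])
dy_dx = sp.simplify(dx_dy.inv())

# Vector components transform with dy/dx, cf. (3.4).
V_x = sp.Matrix(sp.symbols("V1 V2"))
V_y = sp.simplify(dy_dx * V_x)

# Covector components transform with dx/dy, cf. (3.9).
U_x = sp.Matrix(sp.symbols("U1 U2"))
U_y = sp.simplify(dx_dy.T * U_x)

# The contraction U_mu V^mu takes the same value in both coordinate systems.
contraction_x = (U_x.T * V_x)[0]
contraction_y = sp.simplify((U_y.T * V_y)[0])

# A (0,2)-tensor transforms with two factors of dx/dy, cf. (3.16).
# Starting from the Euclidean metric delta_{mu nu} one obtains diag(1, r^2).
g_x = sp.eye(2)
g_y = sp.simplify(dx_dy.T * g_x * dx_dy)

print(contraction_y)
print(g_y)
```

The matrix products are just the index contractions of the text written out: `dx_dy.T * U_x` sums over the unprimed index of $\partial x^\mu / \partial y^{\mu'}$.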
Remarks:
1. Note that, in particular, a tensor is zero (at a point) in one coordinate system if
and only if the tensor is zero (at the same point) in another coordinate system.
Thus, any law of nature (field equation, equation of motion) expressed in terms of
tensors, say in the form $T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} = 0$, preserves its form under coordinate
transformations and is therefore automatically generally covariant,
$$T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} = 0 \quad \Leftrightarrow \quad T^{\mu'_1 \ldots \mu'_p}{}_{\nu'_1 \ldots \nu'_q} = 0 \tag{3.19}$$
3. A covariant 2-tensor $T_{\mu\nu}$, say, is said to be symmetric if $T_{\mu\nu} = T_{\nu\mu}$ and
anti-symmetric if $T_{\mu\nu} = -T_{\nu\mu}$. This is well-defined because it is a generally covariant
notion: a tensor is symmetric in all coordinate systems iff it is symmetric in one
coordinate system, etc.
This definition can be extended to any or all pairs of covariant indices or pairs of
contravariant indices. Thus e.g. a tensor $T^{\mu_1 \ldots \mu_p}$ is called totally symmetric (or
totally anti-symmetric) if it is symmetric (anti-symmetric) under the exchange of
any pair of indices.
On the other hand, it is not meaningful to talk of the symmetry of a (1,1)-tensor,
say, as an equation like $T^\mu{}_\nu = T_\nu{}^\mu$ does not make any sense.
Symmetrisation and anti-symmetrisation of tensors will be discussed in section
3.3 below.
3.3 Tensor Algebra
Tensors can be added, multiplied and contracted in certain obvious ways. The basic
algebraic operations are the following:
1. Linear Combinations
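Given two $(p, q)$-tensors $A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ and $B^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$, their sum
$$C^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} = A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} + B^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} \tag{3.20}$$
is again a $(p, q)$-tensor.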
2. Direct Products
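Given a $(p, q)$-tensor $A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ and a $(p', q')$-tensor $B^{\lambda_1 \ldots \lambda_{p'}}{}_{\rho_1 \ldots \rho_{q'}}$, their direct product
$$A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} \, B^{\lambda_1 \ldots \lambda_{p'}}{}_{\rho_1 \ldots \rho_{q'}} \tag{3.21}$$
is a $(p + p', q + q')$-tensor,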
3. Contractions
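Given a $(p, q)$-tensor with $p$ and $q$ non-zero, one can associate to it a $(p-1, q-1)$-tensor
via contraction of one covariant and one contravariant index,
$$A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} \to B^{\mu_1 \ldots \mu_{p-1}}{}_{\nu_1 \ldots \nu_{q-1}} = A^{\mu_1 \ldots \mu_{p-1} \lambda}{}_{\nu_1 \ldots \nu_{q-1} \lambda} . \tag{3.22}$$
This is indeed a $(p-1, q-1)$-tensor, i.e. transforms like one. Consider, for example,
a (1,2)-tensor $A^\mu{}_{\nu\lambda}$ and its contraction $B_\nu = A^\mu{}_{\mu\nu}$. Under a coordinate
transformation $B_\nu$ transforms as a covector:
$$\begin{aligned}
B_{\nu'} &= A^{\mu'}{}_{\mu'\nu'} \\
&= \frac{\partial y^{\mu'}}{\partial x^\mu} \frac{\partial x^\lambda}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}} A^\mu{}_{\lambda\nu} \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}} \delta^\lambda{}_\mu A^\mu{}_{\lambda\nu} \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}} A^\mu{}_{\mu\nu} = \frac{\partial x^\nu}{\partial y^{\nu'}} B_\nu .
\end{aligned} \tag{3.23}$$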
Remarks:
(a) Note that there are $p$ different ways of lowering the indices, and they will in
general give rise to different tensors. It is therefore important to keep track of
this in the notation. Thus, in the above, had we contracted over the second
index instead of the first, we should write
$$g_{\mu_2 \nu} A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} \equiv A^{\mu_1}{}_{\nu}{}^{\mu_3 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} . \tag{3.25}$$
(b) In particular, given a vector field $V^\mu(x)$, we can associate to it the dual
(with respect to the metric) covector field $V_\mu(x)$ with covariant components
$$V_\mu = g_{\mu\nu} V^\nu , \tag{3.26}$$
$$A^\mu = g^{\mu\nu} A_\nu . \tag{3.27}$$
$$g^{\mu\nu} = g^{\mu\lambda} g^{\nu\sigma} g_{\lambda\sigma} . \tag{3.29}$$
and raising one index of the metric gives the Kronecker tensor,
$$g^{\mu\lambda} g_{\lambda\nu} \equiv g^\mu{}_\nu = \delta^\mu{}_\nu . \tag{3.30}$$
The factor $\frac{1}{2}$ is chosen such that the symmetrisation of a symmetric tensor is the
same as the original tensor,
$$T_{(\mu\nu)} = \tfrac{1}{2} (T_{\mu\nu} + T_{\nu\mu}) \tag{3.33}$$
is totally symmetric, i.e. symmetric under the exchange of any pair of indices, and
$$T_{[\mu\nu\lambda]} \equiv \frac{1}{3!} (T_{\mu\nu\lambda} - T_{\mu\lambda\nu} - T_{\nu\mu\lambda} + T_{\nu\lambda\mu} - T_{\lambda\nu\mu} + T_{\lambda\mu\nu}) \tag{3.35}$$
is totally anti-symmetric. The prefactor $\frac{1}{6}$ is again there to ensure that the total
symmetrisation of a totally symmetric tensor is the original tensor (and likewise for
the total anti-symmetrisation of totally anti-symmetric tensors). This generalises
in an evident way to higher rank $p$ tensors, with the combinatorial prefactor $1/p!$.
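The algebraic operations above lend themselves to a quick numerical check. The following is a minimal sketch with NumPy, assuming a linear change of coordinates $y = Jx$ (so that the Jacobi matrix is constant) and random tensor components; the helper names `permutation_sign` and `symmetrise` are hypothetical. It verifies that the contraction of a (1,2)-tensor transforms as a covector, as in (3.23), and that the prefactor $1/p!$ makes (anti-)symmetrisation a projection.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
D = 4

# Linear change of coordinates y = J x:
# J[a, m] = dy^a/dx^m and Jinv[m, a] = dx^m/dy^a.
J = rng.normal(size=(D, D))
Jinv = np.linalg.inv(J)

# Random (1,2)-tensor A^mu_{nu lambda}, stored with index order (mu, nu, lambda).
A = rng.normal(size=(D, D, D))
# Transform it like one vector and two covectors, cf. (3.18).
A_new = np.einsum("am,nb,lc,mnl->abc", J, Jinv, Jinv, A)

# Contract the upper index with the first lower index: B_nu = A^mu_{mu nu}.
B = np.einsum("mmn->n", A)
B_new = np.einsum("aab->b", A_new)
# Covector law for B, cf. (3.23).
B_expected = np.einsum("nb,n->b", Jinv, B)
contraction_is_covector = np.allclose(B_new, B_expected)


def permutation_sign(perm):
    """Sign of a permutation of 0..n-1, obtained by counting inversions."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def symmetrise(T, anti=False):
    """Total (anti-)symmetrisation over all indices of T with prefactor 1/p!."""
    perms = list(itertools.permutations(range(T.ndim)))
    total = sum(
        (permutation_sign(s) if anti else 1) * np.transpose(T, s) for s in perms
    )
    return total / len(perms)


T = rng.normal(size=(D, D, D))
T_sym = symmetrise(T)
T_anti = symmetrise(T, anti=True)

print(contraction_is_covector)
print(np.allclose(symmetrise(T_sym), T_sym))
```

Because the prefactor is $1/p!$, applying `symmetrise` twice gives the same result as applying it once, which is the statement made in the text for symmetric and anti-symmetric tensors.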
An observation we will frequently make use of to recognise when some object is a tensor
is the following (occasionally known as the quotient theorem or quotient lemma):
Assume that you are given some object $A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$. Then if for every covector $U_\mu$ the
contracted object $U_{\mu_1} A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ transforms like a $(p-1, q)$-tensor, $A^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ is a $(p, q)$-tensor.
Likewise for contractions with vectors or other tensors so that if e.g. in an
equation of the form
$$A_{\mu\nu} = B_{\mu\nu\lambda\rho} C^{\lambda\rho} \tag{3.36}$$
you know that $A_{\mu\nu}$ transforms as a tensor for every tensor $C^{\lambda\rho}$, then $B_{\mu\nu\lambda\rho}$ itself has to be a
tensor.
If "junk" $\neq 0$, then there will be some $C$ such that "junk" contributes to the contraction
$BC$. That means that "junk" contributes to $A'$, the transformed $A$, contradicting the
premise that $A$ is a tensor.
3.4 Generally Covariant Integration and Volume Elements
While tensors are the objects which, in a sense, transform in the nicest and simplest
possible way under coordinate transformations, they are not the only relevant objects.
An important class of non-tensors (but almost tensors) are so-called tensor densities.
They will play a crucial role for us in order to have a generally-covariant notion of
integration at our disposal, and thus ultimately also a way of writing down generally
covariant action principles for fields etc.
In this section we will address the issue of generally covariant integration in a space-time
equipped with a metric. This will be accomplished with the help of a particular tensor
density constructed from the metric. Having thus established that tensor densities are
objects of legitimate interest in their own right, we will then discuss their properties in
more generality in section 3.5 below.
To set the stage, consider once again first the situation in special relativity. In that
case, the integral of a Lorentz scalar $f(\xi)$ with respect to the volume element $d^4\xi$ (or
$d^{27}\xi$ . . . ) is itself a Lorentz scalar, i.e. independent of the inertial reference frame in
which the integral is evaluated,
$$\int d^4\xi \, f(\xi) = \int d^4\bar{\xi} \, \bar{f}(\bar{\xi}) \qquad (\bar{\xi}^a = L^a{}_b \, \xi^b) \tag{3.38}$$
1. $f$ is a scalar by assumption,
$$\bar{f}(\bar{\xi}) = f(\xi) , \tag{3.39}$$
$$d^4\bar{\xi} = d^4\xi . \tag{3.42}$$
because of the non-trivial Jacobian,
$$d^4 y = \left| \det \left( \frac{\partial y}{\partial x} \right) \right| d^4 x . \tag{3.44}$$
One way out would be to abandon the idea that one should integrate scalars and to
require that the integrand $f(x)$ should transform in such a way that it cancels the
Jacobian arising from the measure, namely as
$$f'(y) = \left| \det \left( \frac{\partial y}{\partial x} \right) \right|^{-1} f(x) . \tag{3.45}$$
This is indeed an option, and we will return to this below (see remark 1 in section
3.5), but at this stage this is rather unintuitive and not particularly useful, in particular
because it is not clear how one should go about finding or constructing such objects in
the first place.
Therefore let us approach this question in a different way. Integrals are used to calculate
or measure volumes (or areas, or lengths, or . . . ). Such integrals should have a
coordinate-independent meaning, but they should depend on the prescription one uses
for measuring volumes, areas, lengths, . . . These prescriptions are concisely encoded in
the metric. Thus it is plausible that in order to define a generally covariant notion of
integration one may need to specify the metric, but that this is all that one should need
to know (while the Jacobian between two coordinate systems should fundamentally be
irrelevant and be considered to be a red herring).
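With this in mind, let us recall the standard tensorial transformation behaviour of the
metric under coordinate transformations,
$$g_{\mu'\nu'}(y) = \frac{\partial x^\mu}{\partial y^{\mu'}} \frac{\partial x^\nu}{\partial y^{\nu'}} \, g_{\mu\nu}(x) . \tag{3.46}$$
It follows from this that the absolute value of the determinant of the metric, $g := |\det g_{\mu\nu}|$,
does not transform like a scalar or some other tensor at all, but instead transforms as
$$g' = \left( \det \frac{\partial x}{\partial y} \right)^{2} g = \left( \det \frac{\partial y}{\partial x} \right)^{-2} g . \tag{3.48}$$
In particular, its square-root $\sqrt{g}$ transforms as
$$\sqrt{g'} = \left| \det \frac{\partial y}{\partial x} \right|^{-1} \sqrt{g} . \tag{3.49}$$
Therefore the combined expression $\sqrt{g} \, d^4 x$ is invariant under general coordinate
transformations,
$$\sqrt{g'} \, d^4 y = \sqrt{g} \, d^4 x , \tag{3.50}$$
and can therefore be used to define integrals of scalars in a generally covariant (but
metric-dependent) way,
$$\int \sqrt{g'} \, d^4 y \, f'(y) = \int \sqrt{g} \, d^4 x \, f(x) . \tag{3.51}$$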
This will of course be important in order to formulate action principles etc. in a space-
time equipped with a metric in a generally covariant way.
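The invariance (3.51) can be illustrated numerically in two Euclidean dimensions. The following is a rough sketch, assuming the scalar $f = e^{-(x_1^2 + x_2^2)}$ integrated over the unit disc (both are arbitrary illustrative choices) and a simple midpoint rule; in Cartesian coordinates $\sqrt{g} = 1$, while in polar coordinates $\sqrt{g} = r$.

```python
import numpy as np


def f_cartesian(x1, x2):
    """Illustrative scalar field, expressed in Cartesian coordinates."""
    return np.exp(-(x1**2 + x2**2))


# Cartesian coordinates: sqrt(g) = 1. Midpoint rule on a square grid,
# keeping only the cells whose centre lies inside the unit disc.
n = 1000
h = 2.0 / n
centres = -1.0 + h * (np.arange(n) + 0.5)
X1, X2 = np.meshgrid(centres, centres, indexing="ij")
inside = X1**2 + X2**2 <= 1.0
integral_cartesian = np.sum(f_cartesian(X1, X2)[inside]) * h * h

# Polar coordinates: sqrt(g) = r, and the scalar is evaluated at the same
# points, f'(r, phi) = f(x(r, phi)).
nr, nphi = 400, 400
dr, dphi = 1.0 / nr, 2.0 * np.pi / nphi
r = dr * (np.arange(nr) + 0.5)
phi = dphi * (np.arange(nphi) + 0.5)
R, PHI = np.meshgrid(r, phi, indexing="ij")
integrand_polar = R * f_cartesian(R * np.cos(PHI), R * np.sin(PHI))
integral_polar = np.sum(integrand_polar) * dr * dphi

# Closed form for comparison: 2 pi * int_0^1 r exp(-r^2) dr = pi (1 - 1/e).
exact = np.pi * (1.0 - np.exp(-1.0))

print(integral_cartesian, integral_polar, exact)
```

The two numerical values agree only up to discretisation error; the Cartesian grid is the cruder of the two because it resolves the circular boundary poorly.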
This is also frequently the quickest way to determine the volume element in non-Cartesian
coordinates in Euclidean space. Thus, to determine what is the volume
element in spherical coordinates $\{y^k\} = (r, \theta, \phi)$, say, instead of laboriously determining
the Jacobi matrix for the coordinate transformation, and then (equally laboriously)
calculating its determinant (which would be the standard uninspiring and uninspired
procedure), all one needs to know is the metric in these coordinates,
$ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2)$, to deduce $\sqrt{g} = r^2 \sin\theta$
and therefore
$$d^3 x = \sqrt{g} \, d^3 y = r^2 \sin\theta \, dr \, d\theta \, d\phi . \tag{3.53}$$
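Both routes to the spherical volume element can be compared symbolically. The following is a minimal sketch with SymPy, assuming the standard embedding of spherical coordinates in Euclidean space; it computes the metric from the Jacobi matrix and checks that $\det g$ equals the square of the Jacobian determinant, so that $\sqrt{g} = r^2 \sin\theta$ for $0 < \theta < \pi$.

```python
import sympy as sp

r, theta, phi = sp.symbols("r theta phi", positive=True)

# Cartesian coordinates x^i as functions of the spherical coordinates y^k.
x = sp.Matrix([
    r * sp.sin(theta) * sp.cos(phi),
    r * sp.sin(theta) * sp.sin(phi),
    r * sp.cos(theta),
])

# Jacobi matrix J[i, k] = dx^i/dy^k.
J = x.jacobian([r, theta, phi])

# Metric in spherical coordinates, obtained from delta_{ij} via (3.46).
g = sp.simplify(J.T * J)
det_g = sp.simplify(g.det())

# The laborious route: determinant of the Jacobi matrix itself.
jacobian_det = sp.simplify(J.det())

print(g)
print(det_g, jacobian_det)
```

Reading off $\sqrt{g}$ from the diagonal metric is immediate, whereas the determinant of the full Jacobi matrix requires expanding a $3 \times 3$ determinant of trigonometric entries.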
3.5 Tensor Densities and Volume Elements
In the previous section we have encountered certain not strictly tensorial objects which
nevertheless turned out to be useful. Having thus established the basic credentials of
such objects, we will now formalise this somewhat.
Thus the prime example of what we will call a tensor density is the (absolute value of
the) determinant $g := |\det g_{\mu\nu}|$ of the metric tensor, which, as we have seen, transforms
as
$$g' = \left( \det \frac{\partial y}{\partial x} \right)^{-2} g . \tag{3.54}$$
An object which transforms in such a way under coordinate transformations is called
a scalar tensor density of weight $w = +2$, and the square root of the determinant $\sqrt{g}$
transforms as, and hence is, a tensor density of weight $w = +1$.
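$$g'^{-w/2} \, T^{\mu'_1 \ldots \mu'_p}{}_{\nu'_1 \ldots \nu'_q} = \frac{\partial y^{\mu'_1}}{\partial x^{\mu_1}} \cdots \frac{\partial y^{\mu'_p}}{\partial x^{\mu_p}} \frac{\partial x^{\nu_1}}{\partial y^{\nu'_1}} \cdots \frac{\partial x^{\nu_q}}{\partial y^{\nu'_q}} \, g^{-w/2} \, T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q} . \tag{3.56}$$
Conversely, therefore, any tensor density of weight $w$ can be written as a tensor times
$g^{+w/2}$,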
The algebraic rules for tensor densities are strictly analogous to those for tensors. Thus,
for example, the sum of two (p, q) tensor densities of weight w (let us call this a (p, q; w)
tensor) is again a (p, q; w) tensor, and the direct product of a (p1 , q1 ; w1 ) and a (p2 , q2 ; w2 )
tensor is a (p1 + p2 , q1 + q2 ; w1 + w2 ) tensor. Contractions and the raising and lowering
of indices of tensor densities can also be defined just as for ordinary tensors.
Remarks:
1. Generalising the argument in section 3.4, we now learn that if $f$ is any scalar
density of weight $w = +1$, then its integral is well-defined and coordinate independent,
$$\int d^4 y \, f' = \int d^4 x \, f . \tag{3.58}$$
See remark 4 below for one way of constructing such objects without taking re-
course to a metric.
2. There is one more important tensor density which - like the Kronecker tensor - has
the same components in all coordinate systems. This is the totally anti-symmetric
Levi-Civita symbol $\epsilon_{\mu\nu\rho\sigma}$ (taking the values $0, \pm 1$) which is a tensor density of
weight $w = -1$. Then $\sqrt{g} \, \epsilon_{\mu\nu\rho\sigma}$ is a tensor (strictly speaking it is a pseudo-tensor
because of its behaviour under reversal of orientation - see below).
To see this, recall first of all the definition of the Levi-Civita symbol: it is totally
anti-symmetric,
$$\epsilon_{\mu\nu\rho\sigma} = \epsilon_{[\mu\nu\rho\sigma]} , \tag{3.59}$$
and has therefore only got one independent component which we will normalise
to be
$$\epsilon_{0123} = +1 . \tag{3.60}$$
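Next, recall one possible definition of the determinant $\det M$ of a $(D \times D)$-matrix
$M$, namely as the coefficient (proportionality factor) on the right-hand side of
$$\epsilon_{\mu_1 \ldots \mu_D} M^{\mu_1}{}_{\nu_1} \cdots M^{\mu_D}{}_{\nu_D} = (\det M) \, \epsilon_{\nu_1 \ldots \nu_D} . \tag{3.62}$$
Now choose $M$ to be the Jacobi matrix $(\partial y / \partial x)$. Then the above equation shows
that
$$\epsilon_{\mu'_1 \ldots \mu'_D} = \det \left( \frac{\partial y}{\partial x} \right) \frac{\partial x^{\mu_1}}{\partial y^{\mu'_1}} \cdots \frac{\partial x^{\mu_D}}{\partial y^{\mu'_D}} \, \epsilon_{\mu_1 \ldots \mu_D} , \tag{3.63}$$
i.e. that $\epsilon_{\mu_1 \ldots \mu_D}$ transforms as a tensor density of weight $w = -1$, provided that
$\det(\partial y / \partial x) > 0$. The latter condition means that the coordinate transformation
preserves the orientation. Thus, $\epsilon_{\mu_1 \ldots \mu_D}$ transforms as a tensor density under
orientation-preserving coordinate transformations but picks up a sign when the
orientation is reversed. Thus strictly speaking $\epsilon_{\mu_1 \ldots \mu_D}$ is not a tensor density but
a pseudo-tensor density.
Going back to 4 dimensions, it follows that
$$\sqrt{g} \, \epsilon_{\mu\nu\rho\sigma} \tag{3.64}$$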
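We could have chosen to not absorb the minus sign into the definition of $\epsilon^{\mu\nu\rho\sigma}$,
at the expense of an explicit minus sign on the right-hand side of (3.65). The
convention we have adopted is more convenient, however, in particular since it
is compatible with the standard practice in special relativity to (tacitly) identify
$\epsilon^{\mu\nu\rho\sigma} = \eta^{\mu\alpha} \eta^{\nu\beta} \eta^{\rho\gamma} \eta^{\sigma\delta} \epsilon_{\alpha\beta\gamma\delta}$, the minus sign arising from raising the indices on $\epsilon_{\alpha\beta\gamma\delta}$ with the
Minkowski metric with $\eta^{00} = -1$, so that $\epsilon^{0123} = -\epsilon_{0123}$.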
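This is not a tensor but transforms like a scalar density. On the other hand, if
one works instead with the tensor $\sqrt{g} \, \epsilon_{\mu\nu\rho\sigma}$ one obtains a scalar, and this scalar is
precisely the invariant volume element (3.50),
$$\frac{1}{4!} \sqrt{g} \, \epsilon_{\mu\nu\rho\sigma} \, dx^\mu \wedge dx^\nu \wedge dx^\rho \wedge dx^\sigma = \sqrt{g} \, d^4 x . \tag{3.68}$$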
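The determinant identity (3.62), on which the transformation behaviour of the Levi-Civita symbol rests, is easy to confirm numerically. The following is a minimal sketch with NumPy for $D = 4$, assuming a random matrix in the role of the Jacobi matrix; the helper `levi_civita` is a hypothetical name. It also checks the consequence (3.63) that the symbol has the same components in the new coordinates.

```python
import itertools
import numpy as np


def levi_civita(D):
    """Totally anti-symmetric symbol with D indices and eps[0, 1, ..., D-1] = +1."""
    eps = np.zeros((D,) * D)
    for perm in itertools.permutations(range(D)):
        inversions = sum(
            1 for i in range(D) for j in range(i + 1, D) if perm[i] > perm[j]
        )
        eps[perm] = -1.0 if inversions % 2 else 1.0
    return eps


D = 4
eps = levi_civita(D)
rng = np.random.default_rng(1)

# M[mu, nu] = M^mu_nu, here playing the role of the Jacobi matrix dy/dx.
M = rng.normal(size=(D, D))
M_inv = np.linalg.inv(M)  # M_inv[mu, mu'] = dx^mu/dy^mu'

# Left- and right-hand side of (3.62).
lhs = np.einsum("abcd,ae,bf,cg,dh->efgh", eps, M, M, M, M)
rhs = np.linalg.det(M) * eps

# (3.63): the factor det(dy/dx) compensates the four factors of dx/dy,
# so the transformed symbol has the same components as the original one.
eps_new = np.linalg.det(M) * np.einsum(
    "abcd,ae,bf,cg,dh->efgh", eps, M_inv, M_inv, M_inv, M_inv
)

print(np.allclose(lhs, rhs), np.allclose(eps_new, eps))
```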
3.6 Towards a Coordinate-Independent Interpretation of Tensors
Consider first of all the derivative $df$ of a function (scalar field) $f = f(x)$. This is
clearly a coordinate-independent object, not only because we didn't have to specify a
coordinate system to write $df$ but also because
$$df = \frac{\partial f(x)}{\partial x^\mu} dx^\mu = \frac{\partial f(y(x))}{\partial y^{\mu'}} dy^{\mu'} , \tag{3.70}$$
which follows from the fact that $\partial_\mu f$ (a covector) and $dx^\mu$ (the coordinate differentials)
transform inversely to each other under coordinate transformations. This suggests that
it is useful to regard the quantities $\partial_\mu f$ as the coefficients of the coordinate independent
object $df$ in a particular coordinate system, namely when $df$ is expanded in the basis
$\{dx^\mu\}$.
We can do the same thing for any covector $A_\mu$. If $A_\mu$ is a covector (i.e. transforms like
one under coordinate transformations), then $A := A_\mu(x) dx^\mu$ is coordinate-independent,
and it is useful to think of the $A_\mu$ as the coefficients of the covector $A$ when expanded
in a coordinate basis, $A = A_\mu dx^\mu$. Linear combinations of $dx^\mu$ built in this way from
covectors are known as 1-forms.
From this point of view, we interpret the $\{A_\mu\}$ simply as the (coordinate dependent)
components of the (coordinate independent) 1-form $A$ when expressed with respect to
the (coordinate dependent) differentials $\{dx^\mu\}$, considered as a basis of the space of
covectors.
Something similar can be done for vector fields. Just as covectors transform inversely to
coordinate differentials, vectors $V^\mu$ transform inversely to partial derivatives $\partial_\mu$. Thus
$$V := V^\mu(x) \frac{\partial}{\partial x^\mu} \tag{3.71}$$
is coordinate-independent - a coordinate-independent linear first-order differential
operator. One can thus always think of a vector field as a 1st order differential operator
and this is a very fruitful point of view.
$$V f = V^\mu \partial_\mu f . \tag{3.72}$$
This is also a coordinate independent object, a scalar, arising from the contraction of a
vector and a covector. And this is as it should be because, after all, both a function and
a vector field can be specified on a space-time without having to introduce coordinates
(e.g. by simply drawing the vector field and the profile of the function). Therefore also
the change of the function along a vector field should be coordinate independent and,
as we have seen, it is.
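The coordinate independence of $Vf = V^\mu \partial_\mu f$ can be made explicit in an example. The following is a minimal sketch with SymPy, assuming the function $f = x_1^2 x_2$ and the rotation vector field $V = -x_2 \partial_1 + x_1 \partial_2$ in the plane (both arbitrary illustrative choices); it evaluates $Vf$ in Cartesian and in polar coordinates and compares the results at the same point.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
r, phi = sp.symbols("r phi", positive=True)

# A scalar field and a vector field, given in Cartesian coordinates.
f_x = x1**2 * x2
V_x = sp.Matrix([-x2, x1])  # generator of rotations about the origin

# V f = V^mu d_mu f, evaluated in Cartesian coordinates.
Vf_x = sum(V_x[i] * sp.diff(f_x, c) for i, c in enumerate((x1, x2)))

# Change to polar coordinates y = (r, phi).
to_polar = {x1: r * sp.cos(phi), x2: r * sp.sin(phi)}
f_y = f_x.subs(to_polar)  # scalar: same value at the same point
dx_dy = sp.Matrix([to_polar[x1], to_polar[x2]]).jacobian([r, phi])
# Vector components transform with dy/dx, cf. (3.4).
V_y = sp.simplify(dx_dy.inv() * V_x.subs(to_polar))

# V f evaluated in polar coordinates.
Vf_y = sum(V_y[i] * sp.diff(f_y, c) for i, c in enumerate((r, phi)))

difference = sp.simplify(Vf_y - Vf_x.subs(to_polar))
print(V_y.T, difference)
```

In polar coordinates the components of this vector field are simply $(0, 1)$, i.e. $V = \partial_\phi$, even though its Cartesian components look quite different.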
So far we have only discussed vectors and covectors. All this can, in principle, be
extended to higher rank tensors, but at this point it would be very useful to introduce
the notion (or at least the notation) of tensor products. I will briefly describe this in
section 3.7 below.
For those who do not want to delve into this (and it is not required for the following):
the fact of the matter is that any $(p, q)$-tensor $T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ can be thought of as the collection
of components of a coordinate independent object $T$ when expanded in a particular
coordinate basis in terms of the $dx^\mu$ and $(\partial / \partial x^\mu)$.
Any choice of coordinate system $\{x^\mu\}$ gives rise to such a basis $\{dx^\mu\}$, and such bases
are known as coordinate bases or natural bases. This is not the only possible choice of
basis, however, and we will return to this issue in section 3.8.
3.7 Multilinear Algebra and Tensors
In (multi-)linear algebra, the tensor product is used to describe multilinear maps. Let
$V$ be a vector space, and $V^*$ its dual, consisting of the linear maps $V \to \mathbb{R}$, and denote
the action of $a \in V^*$ on $v \in V$ by
$$a \in V^* , \; v \in V \;\to\; a(v) \in \mathbb{R} . \tag{3.73}$$
$$e^i(E_k) = \delta^i{}_k , \tag{3.74}$$
$$a \otimes b \neq b \otimes a . \tag{3.77}$$
$$(a \otimes b)(v, w) = a_i b_k v^i w^k , \tag{3.78}$$
acting as
$$a(v, w) = a_{ik} v^i w^k . \tag{3.80}$$
From these definitions it follows that the tensor product is evidently linear,
$$a \otimes (b + c) = a \otimes b + a \otimes c \tag{3.81}$$
(and likewise for the first factor), and $\mathbb{R}$-linear, i.e. for $r \in \mathbb{R}$ one has
Using the canonical isomorphism $(V^*)^* \cong V$ for finite-dimensional vector spaces,
one can also in the same way define the tensor product $V \otimes V$ as the space of
bilinear functions on $V^* \times V^*$,
By the same token, the tensor product $V \otimes W$ is the space of bilinear maps on
$V^* \times W^*$.
$$\otimes^p V^* = \underbrace{V^* \otimes \cdots \otimes V^*}_{p \text{ times}} \tag{3.87}$$
These multilinear maps can be added and multiplied and thus form an algebra,
the tensor algebra of $V^*$, denoted by $T(V^*)$. As a vector space, it consists of the
sums of all the $p$-linear maps,
$$T(V^*) = \oplus_{p=0}^{\infty} \otimes^p V^* . \tag{3.88}$$
The tensor product can also be used to describe multilinear maps between vector spaces:
Likewise a linear map from $V$ to some other vector space $W$ can be regarded as
an element of $V^* \otimes W$.
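Clearly, in general, given a basis of $V$ and a dual basis of $V^*$, the tensor product can
be used to construct a basis
in the space
$$T^{p,q} = \underbrace{(V \otimes \cdots \otimes V)}_{p \text{ times}} \otimes \underbrace{(V^* \otimes \cdots \otimes V^*)}_{q \text{ times}} \tag{3.92}$$
of $(p, q)$-tensors,
$$T \in T^{p,q} : \quad T = T^{i_1 \ldots i_p}{}_{k_1 \ldots k_q} \, (E_{i_1} \otimes \cdots \otimes E_{i_p}) \otimes (e^{k_1} \otimes \cdots \otimes e^{k_q}) . \tag{3.93}$$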
This is the way we will use the tensor product notation below, as a multilinear operation
providing us with a basis for higher rank tensor fields.
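In components, the tensor product of two covectors is just the outer product of their component arrays. The following is a minimal sketch with NumPy, assuming a 3-dimensional vector space and random components (illustrative choices only); it checks the defining property (3.78), the non-commutativity (3.77) and the linearity (3.81).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

# Components a_i, b_k, c_k of covectors and v^i, w^k of vectors
# with respect to a pair of dual bases.
a, b, c = (rng.normal(size=n) for _ in range(3))
v, w = rng.normal(size=n), rng.normal(size=n)

# (a ⊗ b)_{ik} = a_i b_k
a_tensor_b = np.einsum("i,k->ik", a, b)
b_tensor_a = np.einsum("i,k->ik", b, a)

# (3.78): (a ⊗ b)(v, w) = a_i b_k v^i w^k = a(v) b(w)
value = np.einsum("ik,i,k->", a_tensor_b, v, w)
product_of_values = (a @ v) * (b @ w)

# (3.81): linearity in the second factor.
lhs_linear = np.einsum("i,k->ik", a, b + c)
rhs_linear = a_tensor_b + np.einsum("i,k->ik", a, c)

print(np.isclose(value, product_of_values))
print(np.allclose(a_tensor_b, b_tensor_a))
```

A general bilinear form $a_{ik}$ as in (3.80) is a sum of such products; a single product $a_i b_k$ always has matrix rank one.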
The reason for introducing and working with tensors, defined in this way, is that tensorial
equations have the virtue that they are generally covariant, i.e. that they are satisfied
in all coordinate systems if and only if they are satisfied in one coordinate system. The
emphasis in this formulation is thus not on tensors as multilinear maps but on how they
transform under coordinate transformations. This seems to be somewhat at odds with
the definition of tensors in multilinear algebra, but as we will see below this is simply
due to the choice of a particular class of bases (coordinate bases), with respect to which
multilinear maps indeed transform in this way under changes of the coordinate basis,
i.e. under changes of coordinates.
We had already noted above that there is a more coordinate independent way of looking
at covector fields and vector fields, by associating to them the objects
$A = A_\mu dx^\mu$ and $V = V^\mu \partial_\mu$,
which are completely invariant under coordinate transformations, with the $dx^\mu$ and the
$\partial_\mu$ providing a basis for the space of covector and vector fields respectively.
This perspective can now be extended to higher-rank and mixed tensors. In particular,
associated with the metric $g_{\mu\nu}(x)$ we have the coordinate independent line element
$$ds^2 = g_{\mu\nu} dx^\mu dx^\nu , \tag{3.96}$$
which we can now also think of as the tensor
$$g = g_{\mu\nu} dx^\mu \otimes dx^\nu . \tag{3.97}$$
Since we are now dealing with tensor fields rather than just with tensors (multilinear
maps at a given point), the tensor product in this context is required to be multilinear
not just over R, but over functions (scalars) so that e.g.
Now let us return to (3.97). If one wants to emphasise that the metric is a symmetric
(0,2)-tensor, one can also expand it with respect to the symmetrised basis as
but for the metric the tensor-product is often omitted and one simply writes it as the
line element (3.96).
If one has a non-symmetric $(0, 2)$-tensor $T_{\mu\nu}$, say, then one can also group these coefficients
into the components of a coordinate-invariant object, but now the tensor product
notation
$$T_{\mu\nu} \to T = T_{\mu\nu} dx^\mu \otimes dx^\nu \tag{3.100}$$
is more useful than just writing $T_{\mu\nu} dx^\mu dx^\nu$, simply to emphasise the fact that all components
of $T_{\mu\nu}$, not just the symmetric part of $T_{\mu\nu}$, contribute to $T$ because $dx^\mu \otimes dx^\nu$
is not symmetric,
$$dx^\mu \otimes dx^\nu \neq dx^\nu \otimes dx^\mu , \tag{3.101}$$
(whereas just writing $dx^\mu dx^\nu$ might lead one to believe that $dx^\mu$ and $dx^\nu$ commute).
More generally, to a $(0, p)$-tensor we can associate the object
The tensor product notation is also useful for higher-rank contravariant or mixed tensors.
Given a $(2, 0)$-tensor with components $T^{\mu\nu}$, say, one really does not want to write
the corresponding coordinate-invariant object as $T^{\mu\nu} \partial_\mu \partial_\nu$, say, because this may be
interpreted as a second order differential operator whereas what one really means is a
bilinear first order differential operator, which one writes as
$$T = T^{\mu\nu} \partial_\mu \otimes \partial_\nu , \tag{3.104}$$
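In general, we can thus think of a $(p, q)$-tensor field, as given in (3.94), as the components
of a coordinate-independent object
$$T = T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}(x) \, (\partial_{\mu_1} \otimes \cdots \otimes \partial_{\mu_p}) \otimes (dx^{\nu_1} \otimes \cdots \otimes dx^{\nu_q}) , \tag{3.105}$$
when expanded with respect to the coordinate basis in the space of tensor fields generated
by $dx^\mu$ and $\partial_\mu = \partial / \partial x^\mu$.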
3.8 Vielbeins and Orthonormal Frames
As we saw in section 3.6, a choice of coordinates provides one with a choice of basis for
vectors, covectors and other tensors, and a quantity like $V^\mu$ is then interpreted as the
collection of components of an object $V = V^\mu \partial_\mu$ with respect to the coordinate basis $\partial_\mu$.
In classical tensor calculus one always works in such a basis, and with the components
of tensors with respect to such a basis. This is very convenient and natural, but this is
now clearly not the only choice.
Indeed, the above point of view suggests a reformulation and generalisation that is
extremely natural and useful (but that I will nevertheless hardly ever make use of in
these notes).
Namely, let $\{e^m_\mu(x)\}$ be such that it is an invertible matrix for every point $x$. Then
another possible choice of basis for the space of covectors are the linear combinations
$$e^m := e^m_\mu dx^\mu . \tag{3.106}$$
A general such basis is called a vielbein, which is German for multileg, quite appropriate
actually, as one should visualise this as a bunch of linearly independent (co-)vectors at
every point of space-time.
In two, three, and four dimensions these are also known more specifically as zweibeins,
dreibeins and vierbeins respectively. In four dimensions, the Greek word tetrads is
also commonly used. The $e^m$ are sometimes also referred to as frame fields, mostly in
the context of orthonormal frames (see below).
In general, this new basis is not a coordinate basis, i.e. there does not exist a coordinate
system $\{y^m\}$ such that $e^m = dy^m$. If such a coordinate system does exist, then one has
$$e^m = dy^m \quad \Leftrightarrow \quad e^m_\mu = \frac{\partial y^m}{\partial x^\mu} \quad \Rightarrow \quad \partial_\mu e^m_\nu = \partial_\nu e^m_\mu , \tag{3.107}$$
and locally also the converse is true. In particular, if
For many purposes, bases other than coordinate bases can also be extremely useful and
natural, in particular the orthonormal bases we will introduce below.
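$$dx^\mu = e^\mu_m e^m , \tag{3.109}$$
$$e^m_\mu e^\mu_n = \delta^m{}_n \quad \Leftrightarrow \quad e^\mu_m e^m_\nu = \delta^\mu{}_\nu . \tag{3.110}$$
$$A = A_\mu dx^\mu = A_\mu e^\mu_m e^m \equiv A_m e^m , \tag{3.111}$$
so that the components of $A$ with respect to the new basis $\{e^m\}$ are
$$A_m = A_\mu e^\mu_m . \tag{3.112}$$
Likewise, the vielbeins allow us to pass from a natural (or coordinate) basis for vector
fields, the $\{\partial_\mu\}$, to another basis
$$E_m = e^\mu_m \partial_\mu , \tag{3.113}$$
$$V = V^\mu(x) \partial_\mu = V^m E_m \tag{3.114}$$
with
$$V^m = e^m_\mu V^\mu . \tag{3.115}$$
Note that, unlike the $\partial_\mu$, the $E_m$ do not commute in general, i.e.
$$[E_m, E_n] \neq 0 . \tag{3.116}$$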
We can apply the same reasoning to any other tensor field, e.g. to the metric tensor
itself. We can write the invariant line element as
so that the components of the metric with respect to the new basis are
$$g_{mn} = g_{\mu\nu} e^\mu_m e^\nu_n . \tag{3.118}$$
Given a metric, there is a preferred class of bases $\{e^a\}$ which are such that the corresponding
matrices $e^a_\mu(x)$ diagonalise (and normalise) the metric at every point $x$, i.e.
which are such that $g_{ab} = \eta_{ab}$ or
Such a basis $e^a$, with respect to which the components of the metric are the Minkowski
metric $\eta_{ab}$, is known as an orthonormal basis or orthonormal frame.
In the more mathematical literature, the $e^a$ are also referred to as soldering forms
because they identify (solder, glue) an abstract space of (co-)vectors at each point $x$,
labelled by $a, b, \ldots$ with the concrete space of (co-)vectors tangent to the space-time at
the point $x$, labelled e.g. by the indices $\mu, \nu, \ldots$.
For a general metric, a basis which achieves this cannot be a coordinate basis (because
this would mean that the metric is equivalent to the Minkowski metric by a coordinate
transformation). However, clearly there is no obstacle to finding a more general basis
which will do this: for every point $x$ we can find a matrix $e^a_\mu(x)$ which achieves (3.119).
As the metric varies smoothly with $x$, we can also choose the matrices $e^a_\mu(x)$ to vary
smoothly with $x$, and hence we can put them together to define the smooth matrix-valued
function $e^a_\mu(x)$ for all $x$. [I am ignoring some global (topological) issues here.
We will not need to worry about them here.]
The reason why I referred to a class of bases above is that, clearly, such an orthonormal
basis is not unique. At every point x it is determined up to a Lorentz transformation
Thus a given metric does not determine a unique orthonormal basis, but only an or-
thonormal basis up to Lorentz transformations
If one wants the components of the metric in a given coordinate system $\{x^\mu\}$, one
expands the orthonormal basis $e^a$ in terms of the natural basis $dx^\mu$ as above as
$e^a = e^a_\mu(x) \, dx^\mu$
to find, as above,
$$g_{\mu\nu}(x) = e^a_\mu(x) e^b_\nu(x) \eta_{ab} . \tag{3.124}$$
Thus instead of the metric one can choose orthonormal vielbeins as the basic variables
of General Relativity. In that case one has to demand not only general covariance but
also invariance under local Lorentz transformations (acting on the orthonormal indices
a, b, . . .). [One could also allow for general vielbeins, in which case one would have to
replace Lorentz transformations by the larger group of general linear transformations.]
Examples:
Here are a few examples to illustrate that orthonormal frames are not something mys-
terious but can usually be read off very easily from the metric in a coordinate basis.
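Now define
$$e^1 = R \, d\theta , \quad e^2 = R \sin\theta \, d\phi , \tag{3.126}$$
i.e.
$$e^a = e^a_\mu dx^\mu \tag{3.127}$$
with
$$e^1_\theta = R , \quad e^1_\phi = 0 , \quad e^2_\theta = 0 , \quad e^2_\phi = R \sin\theta . \tag{3.128}$$
$$ds^2 = e^1 e^1 + e^2 e^2 = \delta_{ab} e^a e^b , \tag{3.129}$$
so the $e^a$ are an orthonormal basis. They are obviously not a coordinate basis
because (3.108)
$$\partial_\theta e^2_\phi = R \cos\theta \neq \partial_\phi e^2_\theta = 0 . \tag{3.130}$$
$$E_a = E_a^\mu \partial_\mu \tag{3.131}$$
$$E_1 = R^{-1} \partial_\theta , \quad E_2 = (R \sin\theta)^{-1} \partial_\phi , \tag{3.132}$$
which satisfies
$$g_{\mu\nu} E_a^\mu E_b^\nu = \delta_{ab} . \tag{3.133}$$
That this is not a coordinate basis is reflected in the fact that the commutator
$[E_1, E_2] \neq 0$,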
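2. The Schwarzschild Metric (1.129)
The metric is
$$ds^2 = -\left( 1 - \frac{2m}{r} \right) dt^2 + \left( 1 - \frac{2m}{r} \right)^{-1} dr^2 + r^2 (d\theta^2 + \sin^2\theta \, d\phi^2) .$$
With
$$(e^0, e^1, e^2, e^3) = \left( \left( 1 - \frac{2m}{r} \right)^{1/2} dt , \; \left( 1 - \frac{2m}{r} \right)^{-1/2} dr , \; r \, d\theta , \; r \sin\theta \, d\phi \right) \tag{3.136}$$
$$ds^2 = \eta_{ab} e^a e^b , \tag{3.137}$$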
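Both examples can be verified symbolically. The following is a minimal sketch with SymPy, assuming the round metric $ds^2 = R^2 (d\theta^2 + \sin^2\theta \, d\phi^2)$ on the 2-sphere and the signature convention $\eta_{ab} = \mathrm{diag}(-1, +1, +1, +1)$; the helper `act` is a hypothetical name. It reconstructs the metric from the frame, checks the non-integrability condition (3.130), and evaluates the commutator $[E_1, E_2]$ on a test function.

```python
import sympy as sp

R, theta, phi, r, m = sp.symbols("R theta phi r m", positive=True)

# Example 1: 2-sphere. Rows label a = 1, 2; columns label mu = (theta, phi).
e_sphere = sp.Matrix([[R, 0], [0, R * sp.sin(theta)]])          # e^a_mu, (3.128)
g_sphere = sp.simplify(e_sphere.T * sp.eye(2) * e_sphere)       # delta_ab e^a_mu e^b_nu

# (3.130): d_theta e^2_phi - d_phi e^2_theta does not vanish.
curl = sp.diff(e_sphere[1, 1], theta) - sp.diff(e_sphere[1, 0], phi)

# Dual frame E_a = E_a^mu d_mu, with E_sphere[mu, a] the inverse matrix, (3.132).
E_sphere = e_sphere.inv()
coords = (theta, phi)
f = sp.Function("f")(theta, phi)


def act(a, func):
    """Apply the frame vector field E_a to a function."""
    return sum(E_sphere[mu, a] * sp.diff(func, coords[mu]) for mu in range(2))


commutator = sp.simplify(act(0, act(1, f)) - act(1, act(0, f)))

# Example 2: Schwarzschild, coordinates (t, r, theta, phi), frame (3.136).
F = 1 - 2 * m / r
e_schw = sp.diag(sp.sqrt(F), 1 / sp.sqrt(F), r, r * sp.sin(theta))  # e^a_mu
eta = sp.diag(-1, 1, 1, 1)
g_schw = sp.simplify(e_schw.T * eta * e_schw)                       # (3.137)

print(g_sphere, curl)
print(commutator)
print(g_schw)
```

The commutator comes out proportional to $\partial_\phi f$, i.e. $[E_1, E_2] = -(\cot\theta / R) \, E_2$ in this example, which is non-zero away from the equator.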
Remarks:
$$g_{\mu\nu} u^\mu u^\nu = -1 . \tag{3.140}$$
we see that the 4-velocity $u^\mu$ can be interpreted as the timelike component $e^\mu_{a=0}$
of an orthonormal frame along the worldline,
$$u^\mu = e^\mu_{a=0} , \tag{3.142}$$
condition of these vectors along the worldline, such as the Fermi-Walker parallel
transport to be discussed in section 4.10.
In any case, however the laboratory system is defined, the frame components
$$V^a = e^a_\mu V^\mu \tag{3.143}$$
= u k = ea=0 k = ea=0
k k
a=0
. (3.144)
2. The $e^a_\mu$ can in some sense be regarded as the square-root of the metric. In particular
denoting the determinant of the matrix $e^a_\mu$ by $e := \det(e^a_\mu)$,
(3.119) implies
$$g(x) := |\det(g_{\mu\nu}(x))| = e(x)^2 \quad \Rightarrow \quad |e(x)| = \sqrt{g(x)} . \tag{3.146}$$
3. Coordinate indices can, as usual, be raised and lowered with the space-time metric
$g_{\mu\nu}$ and its inverse, and Minkowski (tangent space) indices with the Minkowski
metric $\eta_{ab}$ and its inverse.
Note that this is consistent with the notation for $e^a_\mu$ and its inverse $e^\mu_a$ because
$$e^\mu_a = g^{\mu\nu} \eta_{ab} e^b_\nu . \tag{3.147}$$
$$g^{\mu\nu} = \eta^{ab} e^\mu_a e^\nu_b , \tag{3.148}$$
etc. The reason why I have called the basis of vector fields in a general frame $E_m$
rather than $e_m$ is that $e^m$ and $E_m$ are of course not related just by lowering or
raising the indices of the metric, $E_m \neq g_{mn} e^n$. The former are linear combinations
of the $dx^\mu$, the latter linear combinations of the $\partial_\mu$, so they are very different
objects.
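The relations in remarks 2 and 3 are purely algebraic and hold at each point, so they can be tested on a single numerical matrix. The following is a minimal sketch with NumPy, assuming a randomly perturbed identity matrix as the vielbein $e^a_\mu$ at one point and $\eta_{ab} = \mathrm{diag}(-1, +1, +1, +1)$; it checks $\sqrt{g} = |e|$ as in (3.146) and the index gymnastics (3.147) and (3.148).

```python
import numpy as np

rng = np.random.default_rng(3)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Vielbein at one point: e[a, mu] = e^a_mu. A small perturbation of the
# identity is used so that the matrix is safely invertible.
e = np.eye(4) + 0.1 * rng.normal(size=(4, 4))

# Metric built from the orthonormal frame, cf. (3.124).
g = e.T @ eta @ e
g_inv = np.linalg.inv(g)

# Inverse vielbein: E[mu, a] = e^mu_a.
E = np.linalg.inv(e)

# (3.146): sqrt(g) = |det e|.
sqrt_g = np.sqrt(abs(np.linalg.det(g)))
abs_e = abs(np.linalg.det(e))

# (3.147): e^mu_a = g^{mu nu} eta_{ab} e^b_nu.
E_from_metrics = g_inv @ e.T @ eta

# (3.148): g^{mu nu} = eta^{ab} e^mu_a e^nu_b.
g_inv_from_frame = E @ np.linalg.inv(eta) @ E.T

print(np.isclose(sqrt_g, abs_e))
print(np.allclose(E_from_metrics, E), np.allclose(g_inv_from_frame, g_inv))
```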
One could now go ahead and develop the entire machinery of tensor calculus (covariant
derivatives, curvature, . . . ) that we are about to develop in the following sections in
terms of vielbeins as the basic variables instead of the metric. This is rather straight-
forward. For example, given the expression for the Christoffel symbols in terms of the
metric, and for the metric in terms of the vielbeins, one can express the Christoffel
symbols (and hence covariant derivatives and curvatures) in terms of vielbeins, but the
resulting expressions are rather unenlightening and not of much use in practice.
The real power of the vielbein formalism emerges when one combines it with the for-
malism of differential forms. And in practice the most useful and efficient alternative
to working in components in a coordinate basis is working with differential forms in an
orthonormal basis.
I do most of my (curvature) calculations in the latter framework (and e.g. only then
translate them into coordinate components for the purposes of inserting them into these
notes), but this is (for the time being) not something I will develop further here.8
3.9 Epilogue: Indices? Indices!
Having reached this point, you may have the impression that the notation we have
introduced for tensors, $T^{\mu_1 \ldots \mu_p}{}_{\nu_1 \ldots \nu_q}$ say, and which, as you might have noticed by looking
ahead, we will continue to use in these notes, with its morass of indices, is somewhat
cumbersome and inelegant. And perhaps you might prefer to at the very least see
everything written in terms of the index-free coordinate-invariant objects like $V = V^\mu \partial_\mu$
or $A = A_\mu dx^\mu$ introduced in section 3.6.
I cannot disagree with the sentiment that using all these indices does not appear to be
particularly elegant. Mathematicians abhor it. Physicists, however, are pragmatists by
nature - they will use whatever turns out to be useful or efficient for what they want
to achieve, regardless of whether or not it is considered or perceived to be beautiful or
elegant according to some external criteria.
In particular, in the case at hand, the index-laden notation would not be that commonly
used and widespread if it did not have some distinct advantages over other options.
Indeed, this notation is an extremely useful and informative bookkeeping device that
conveys a lot of information in a very compact way. In particular, as we have seen, the
index notation allows one to reliably read off what kind of tensor one is dealing with,
along the lines of "if it has p upper and q lower indices, it transforms like, hence is, a
(p, q)-tensor". Moreover, as we will see below, it provides one with a much more concise
and informative way of describing and performing algebraic manipulations of tensors
than some index-free notation is capable of.
Let me first make clear what the issue is and what it is not when one writes something
like $V^\mu$ or $V^\mu(x)$, as this can be interpreted in (at least) 2 distinct ways:
8
See e.g. W. Thirring, Classical Mathematical Physics for a presentation of general relativity entirely
in the coordinate-independent formalism of differential forms, and N. Straumann, General Relativity,
where differential forms are used whenever it is convenient or useful.
1. On the one hand, $V^\mu$ may refer to the numerical values of the components of a
specific vector $V$ in a specific coordinate system.
2. On the other hand, the notation $V^\mu$ may be used to indicate that the object $V$
transforms like a vector.
The first use of $V^\mu$ is completely uncontentious: if one wants to write down the
components of some object with respect to some basis, one has to write down the components
of that object with respect to that basis; there is no way around that.
It is mainly the second use and interpretation of the notation that is at stake, and it is
also mainly in this sense that the index notation is used for tensor algebra and tensor
calculus in general and in these notes in particular.
To a somewhat lesser extent the fact that the notation itself does not indicate whether
one has in mind the first or the second interpretation is also an issue (even though this
is usually clear from the context). It is actually not so much an issue (if desired this
is something that can easily be remedied - I will come back to this at the end of this
section) as possibly the source of a major misunderstanding between mathematicians
and physicists - namely that a dislike of the index notation arises from the (false!) belief
that it means that one is always writing down objects with respect to a particular basis.
If this were the case, this would indeed be clumsy and silly, and quite contrary to the
spirit of general covariance. However, as interpretation 2 indicates, this is absolutely
not what is meant.
Returning to the use of indices as a way to indicate tensorial type and tensorial oper-
ations (like contractions), let us consider the alternatives. If one wants to indicate in
symbols that some object $V$ is a vector field, then as a mathematician one might write
something like $V \in \Gamma(TM)$, stating that $V$ is a section of the tangent bundle of the
space or space-time (manifold) $M$. This is fine, but if the space $M$ is clear from the
context, why not declare once and for all that writing $V^\mu$ means the same thing? And
perhaps use different kinds of indices to refer to tensors on different spaces?
If this were all then this would hardly be an issue and even physicists could be convinced
to write $V \in \Gamma(TM)$, at least when talking to mathematicians. Where the index
notation really pays off, however, is when it comes to algebraic manipulations such as
those discussed in section 3.3 (and even more so when it comes to tensor analysis, which
is the subject of section 4, but tensor algebra will be enough to illustrate this).
As examples consider the contractions of a (1, 2) tensor $T$, say, with itself and with
a vector $V$. With indices one would write $T^\mu_{\ \nu\lambda}$ and $V^\mu$ and the possible contractions
would be written as
$$\begin{aligned}
T &\to T^\mu_{\ \mu\lambda}\ ,\ T^\mu_{\ \nu\mu} \\
(T, V) &\to T^\mu_{\ \nu\lambda}V^\nu\ ,\ T^\mu_{\ \nu\lambda}V^\lambda\ ,
\end{aligned} \tag{3.149}$$
the first line indicating the two distinct covectors one obtains as contractions of $T$ itself,
and the second the two distinct possibilities of contracting $T$ and $V$ to obtain a
(1, 1)-tensor. In an index-free notation one would have to invent some operation like $C^m_n$ to
indicate a contraction over the $m$th upper and $n$th lower index.$^9$ In this notation, the
four objects above would then be written as
$$\begin{aligned}
T &\to C^1_1(T)\ ,\ C^1_2(T) \\
(T, V) &\to C^2_1(T \otimes V)\ ,\ C^2_2(T \otimes V)\ .
\end{aligned} \tag{3.150}$$
Is this superior? It does not even allow one to read off the tensor type of the resulting
objects unless one remembers what the tensor types of T and V were to begin with,
whereas this is completely manifest in (3.149).
Moreover, imagine how untransparent this would become were one to perform even the
simplest sequence of such elementary operations: compare
If you prefer the right-hand side, or some variant of it, feel free to use it. However, you
should be aware of the fact that the left-hand side contains an equivalent amount of
information, simply packaged in a more digestible way that is both more informative
(it's a scalar!) and easier to manipulate. For most intents and purposes the index
notation is really extremely convenient and it is for this reason that we will continue to
make use of it in these notes.
One other reason for concern may be that by exclusively working with local coordinates
and coordinate bases one may be missing some global aspects of a space or space-
time. This is certainly true to a certain extent but is not primarily a notational issue.
Rather, it means that in addition one needs to make use of more advanced notions from
topology, global analysis etc. This is not something I will attempt here (cf. the book by
Hawking and Ellis in the previous footnote for a description of the groundbreaking early
applications of global analysis to general relativity). One related, but more elementary,
issue is the introduction and use of the term manifold when referring to spaces or space-
times of the kind we are dealing with in these notes. This is something I will very briefly
come back to in section 4.11 below.
Let me, to conclude this rant section, come back to the issue of the notational ambiguity
when one writes something like $V^\mu$, which can occasionally be a source of confusion.
Even though, as mentioned above, usually it is clear from the context what one means,
one might imagine wanting to write down a couple of equations with indices which
are only valid in spherical coordinates, say, and are therefore not to be understood as
tensorial equations. Then it might be helpful to have a notation which reveals that
information as well.
9
I am not making this up - see e.g. section 2.2 of The large scale structure of space-time by S. Hawking
and G. Ellis, in all other respects a wonderful book.
This can for instance be accomplished by inventing a new notation like = (or whatever)
to indicate an equality only in a special or specified coordinate system, but while this
may add clarity it does not address the fundamental issue that just writing $V^\mu$ does
not unambiguously specify what one has in mind.
Alternatively, and more elegantly and attractively, this can e.g. be accomplished with
very little effort with the help of what is known as the Penrose abstract index notation.
The idea is to still indicate the tensor type of an object by a certain kind of indices, but
with these indices only serving that purpose and not simultaneously referring to any
particular kind of basis. Thus for example, one would indicate a vector by an object
$V^a$, where the fact that one has a single upper index $a$ just means that this is a
(1, 0)-tensor, and nothing else (exactly as in interpretation 2 above). For the components of
this vector with respect to some basis (coordinates $x^\mu$) one could then continue to use
the traditional $V^\mu$.
The advantage of this abstract index notation is that for tensorial operations one never
needs to specify a basis anyway, so they can all be performed at the level of the abstract
indices and tensorial equations look identical when written with these abstract indices
or when written with concrete component indices. Thus $V^a W_a$ is used to indicate the
scalar one obtains by contraction of a vector $V^a$ with a covector $W_a$. Likewise, instead
of $T^\mu_{\ \mu\lambda}$ (which may look basis dependent) one would write $T^a_{\ ab}$, and this is completely
equivalent to writing something like $C^1_1(T)$,
but much more informative and user-friendly, and all the usual rules of tensor algebra
apply to these abstract indices.
Whenever one wants or needs to specify a basis or coordinate system, this can be
accomplished by using other kinds of indices. Thus $g_{ab}$ could e.g. be used to refer to
the metric tensor in general, while $g_{\mu\nu}$ could then be used to refer to its components in
the coordinate basis associated with the $x^\mu$. From this we see that
[...] the distinction between the index notation and the component notation
is much more one of spirit (i.e., how one thinks of the quantities appearing)
than of substance (i.e., the physical form the equations take).10
While I will not make use of the abstract index notation in these notes (with the hope
that this will not cause any confusion), the use of abstract indices appears to be an
ideal (eat the cake and have it too) compromise combining the best of both worlds
10
R. Wald, General Relativity. See section 2.4 of this book for a more detailed explanation of the
abstract index notation, which is systematically used throughout the book. For a detailed treatment of
the abstract index notation and a discussion of some minor subtleties with this notation see R. Penrose,
W. Rindler, Spinors and Space-Time, Vol. 1: Two-Spinor Calculus and Relativistic Fields.
and should actually keep both camps happy. It does not yet appear to have found
widespread acceptance among mathematicians, however.
An alternative compromise solution is the already mentioned use of differential forms (in
an orthonormal basis, say), which is manifestly covariant and minimises clutter, display-
ing only the (essential and informative) Lorentz Lie algebra indices while suppressing
the component indices of forms (anti-symmetric tensors).
4 Tensor Analysis (Generally Covariant Differentiation)
Tensors transform in a nice and simple way under general coordinate transformations.
Thus these appear to be the right objects to construct equations from that satisfy the
Principle of General Covariance.
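However, the laws of physics are differential equations, so we need to know how to
differentiate tensors. The problem is that the ordinary partial derivative does not map
tensors to tensors: the partial derivative of a (p, q)-tensor is not a tensor unless p = q = 0.
This is easy to see: take for example a vector $V^\mu$. Under a coordinate transformation
$x^\mu \to y^{\mu'}$, its partial derivative transforms as
$$\begin{aligned}
\partial_{\nu'}V^{\mu'} &= \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial}{\partial x^\nu}\left(\frac{\partial y^{\mu'}}{\partial x^\mu}V^\mu\right) \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial y^{\mu'}}{\partial x^\mu}\,\partial_\nu V^\mu + \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial^2 y^{\mu'}}{\partial x^\mu\partial x^\nu}\,V^\mu .
\end{aligned} \tag{4.1}$$
The appearance of the second term shows that the partial derivative of a vector is not
a tensor.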
As the second term is zero for linear transformations, you see that partial derivatives
transform in a tensorial way e.g. under Lorentz transformations, so that partial
derivatives are all one usually needs in special relativity.
We also see that the lack of covariance of the partial derivative is very similar to the
lack of covariance of the equation $\ddot{x}^\mu = 0$, and this suggests that the problem can be
cured in the same way - by introducing Christoffel symbols. This is indeed the case: we
define the covariant derivative of a vector field $V^\mu$ by
$$\nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda}V^\lambda . \tag{4.2}$$
It follows from the non-tensorial behaviour (1.157), (1.158) of the Christoffel symbols
under coordinate transformations $x^\mu \to y^{\mu'}$ that $\nabla_\nu V^\mu$, as defined above, is indeed a
(1, 1) tensor.
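Explicitly, in terms of the Jacobi matrices $J^{\mu'}_{\mu} = \partial y^{\mu'}/\partial x^\mu$ and
$J^{\mu}_{\mu'} = \partial x^\mu/\partial y^{\mu'}$, one has
$$\nabla_{\nu'}V^{\mu'} = \partial_{\nu'}V^{\mu'} + \Gamma^{\mu'}_{\nu'\lambda'}V^{\lambda'} \tag{4.3}$$
$$\nabla_{\nu'}V^{\mu'} = J^\nu_{\nu'}\partial_\nu(J^{\mu'}_\mu V^\mu) + \left(J^{\mu'}_\mu J^\nu_{\nu'}J^\lambda_{\lambda'}\Gamma^\mu_{\nu\lambda} + J^{\mu'}_\mu\partial_{\nu'}J^\mu_{\lambda'}\right)J^{\lambda'}_\rho V^\rho . \tag{4.4}$$
The obstructions to tensoriality are the 2 terms involving the derivatives of the Jacobi
matrix, but these cooperatively combine to give
$$\begin{aligned}
J^\nu_{\nu'}\partial_\nu J^{\mu'}_\rho + J^{\mu'}_\mu(\partial_{\nu'}J^\mu_{\lambda'})J^{\lambda'}_\rho
&= J^\nu_{\nu'}\partial_\nu J^{\mu'}_\rho - (\partial_{\nu'}J^{\mu'}_\mu)J^\mu_{\lambda'}J^{\lambda'}_\rho \\
&= J^\nu_{\nu'}\partial_\nu J^{\mu'}_\rho - \partial_{\nu'}J^{\mu'}_\rho = 0 ,
\end{aligned} \tag{4.5}$$
where the first step uses $\partial_{\nu'}(J^{\mu'}_\mu J^\mu_{\lambda'}) = \partial_{\nu'}\delta^{\mu'}_{\lambda'} = 0$ and the last
$\partial_{\nu'} = J^\nu_{\nu'}\partial_\nu$.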
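As a quick numerical illustration of (4.2), one can take the flat plane in polar coordinates and the constant Cartesian vector field $\partial_x$. This is only a sketch under stated assumptions: the Christoffel symbols $\Gamma^r_{\phi\phi} = -r$, $\Gamma^\phi_{r\phi} = 1/r$ are the standard ones for $ds^2 = dr^2 + r^2 d\phi^2$ and are not taken from these notes, and all function names are hypothetical. The partial derivatives of the polar components do not vanish, while the covariant derivative does.

```python
import math

def christoffel(r, phi):
    """Gamma[mu][nu][lam] = Gamma^mu_{nu lam} for ds^2 = dr^2 + r^2 dphi^2 (assumed)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                      # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G

def vector_field(r, phi):
    """Polar components (V^r, V^phi) of the constant Cartesian field d/dx."""
    return [math.cos(phi), -math.sin(phi) / r]

def partial_derivative(x, nu, h=1e-6):
    """Central-difference estimate of partial_nu V^mu, returned as a list over mu."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    vp, vm = vector_field(*xp), vector_field(*xm)
    return [(a - b) / (2 * h) for a, b in zip(vp, vm)]

def covariant_derivative(x):
    """nabla[nu][mu] = partial_nu V^mu + Gamma^mu_{nu lam} V^lam, as in (4.2)."""
    G, V = christoffel(*x), vector_field(*x)
    return [[partial_derivative(x, nu)[mu]
             + sum(G[mu][nu][lam] * V[lam] for lam in range(2))
             for mu in range(2)] for nu in range(2)]

point = (1.7, 0.6)
print(partial_derivative(point, 1))   # non-zero: the components vary with phi
print(covariant_derivative(point))    # all entries vanish up to rounding
```

The vanishing of all four entries reflects the fact that $\partial_x$ is a constant vector field: the Christoffel terms compensate exactly for the position dependence of the polar coordinate basis.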
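Remarks:
1. Analysing the above argument for the tensoriality of the covariant derivative, we
see that it relies exclusively on the specific non-tensorial form of the transformation
behaviour of the Christoffel symbols, not on the explicit form of the Christoffel
symbols themselves.
Thus any other object $\tilde\Gamma^\mu_{\nu\lambda}$ could also be used to define a covariant derivative
(generalising the partial derivative and mapping tensors to tensors) provided that
it transforms in the same way as the Christoffel symbols, i.e. provided that one
has
$$\tilde\Gamma^{\mu'}_{\nu'\lambda'} = \tilde\Gamma^\mu_{\nu\lambda}\frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\lambda'}} + \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\lambda'}} . \tag{4.8}$$
This implies (and is equivalent to the fact) that the difference
$$C^\mu_{\nu\lambda} = \tilde\Gamma^\mu_{\nu\lambda} - \Gamma^\mu_{\nu\lambda} \tag{4.9}$$
transforms as a tensor. Thus, any such $\tilde\Gamma$ is of the form
$$\tilde\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\nu\lambda} + C^\mu_{\nu\lambda} \tag{4.10}$$
with $C^\mu_{\nu\lambda}$ a (1, 2)-tensor.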
2. We could have arrived at the above definition of the covariant derivative (using
the Christoffel symbols) in a somewhat more systematic way by appealing to the
equivalence principle and/or general covariance. Namely, let $\{\xi^a\}$ be an inertial
coordinate system. In an inertial coordinate system we can just use the ordinary
partial derivative $\partial_b V^a$. We now define the new (improved, covariant) derivative
$\nabla_\nu V^\mu$ in any other coordinate system $\{x^\mu\}$ by demanding that it transforms as a
(1, 1)-tensor, i.e. we define
$$\nabla_\nu V^\mu := \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial\xi^b}{\partial x^\nu}\,\partial_b V^a . \tag{4.11}$$
By a straightforward calculation one finds that
$$\nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda}V^\lambda , \tag{4.12}$$
V = V ; ; . (4.17)
One can also define the covariant directional derivative of a vector field $V$ along
another vector field $X$ by
$$\nabla_X V^\mu \equiv X^\nu\nabla_\nu V^\mu . \tag{4.18}$$
4. The appearance of the Christoffel-term in the definition of the covariant derivative
may at first sight appear a bit unusual (even though it also appears when one
just transforms Cartesian partial derivatives to polar coordinates etc.). There
is a more invariant way of explaining the appearance of this term, related to
the more coordinate-independent way of looking at tensors explained in section
3.6. Namely, since the $V^\mu(x)$ are really just the coefficients of the vector field
$V(x) = V^\mu(x)\partial_\mu$ when expanded in the basis $\partial_\mu$, a meaningful definition of the
derivative of a vector field must take into account not only the change in the
coefficients but must also include a prescription how bases at (infinitesimally)
neighbouring points are related (or connected). Such a prescription is provided by
the Levi-Civita connection $\Gamma^\mu_{\nu\lambda}$ (or a general connection $\tilde\Gamma^\mu_{\nu\lambda}$).
Indeed, writing
$$\begin{aligned}
\nabla_\nu V &= \nabla_\nu(V^\mu\partial_\mu) \\
&= (\partial_\nu V^\mu)\partial_\mu + V^\lambda(\Gamma^\mu_{\nu\lambda}\partial_\mu) ,
\end{aligned} \tag{4.19}$$
we see that the covariant derivative of the coordinate basis vector $\partial_\lambda$ (i.e. $V^\lambda = 1$,
$V^\mu = 0$ otherwise), is the linear transformation (a prescription for a change of
basis)
$$\nabla_\nu\partial_\lambda = \Gamma^\mu_{\nu\lambda}\partial_\mu . \tag{4.20}$$
4.2 Extension of the Covariant Derivative to Other Tensor Fields
So far we have defined the covariant derivative for vector fields, and we now want to
extend the definition of the covariant derivative to other tensor fields. In order to achieve
this, we now adopt a more systematic and axiomatic approach.
Our basic postulates for the covariant derivative are the following:
$$\nabla_\mu\phi = \partial_\mu\phi . \tag{4.21}$$
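We will now see that, demanding the above properties, in particular the Leibniz rule,
there is a unique extension of the covariant derivative on vector fields to a differential
operator on general tensor fields, mapping (p, q)- to (p, q + 1)-tensors.
To define e.g. the covariant derivative for covectors $U_\mu$, we note that $U_\mu V^\mu$ is a scalar
for any vector $V^\mu$ so that
$$\nabla_\mu(U_\nu V^\nu) = \partial_\mu(U_\nu V^\nu) = (\partial_\mu U_\nu)V^\nu + U_\nu(\partial_\mu V^\nu) \tag{4.23}$$
(since the partial derivative satisfies the Leibniz rule), and we demand
$$\nabla_\mu(U_\nu V^\nu) = (\nabla_\mu U_\nu)V^\nu + U_\nu\nabla_\mu V^\nu . \tag{4.24}$$
Since $\nabla_\mu V^\nu$ is known and $V^\nu$ is arbitrary, comparing these two equations determines
$$\nabla_\mu U_\nu = \partial_\mu U_\nu - \Gamma^\lambda_{\mu\nu}U_\lambda . \tag{4.25}$$
That this is indeed a (0, 2)-tensor can either be checked directly or, alternatively, is a
consequence of the quotient theorem.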
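The extension to other (p, q)-tensors is now immediate. If the (p, q)-tensor is the direct
product of p vectors and q covectors, then we already know its covariant derivative (using
the Leibniz rule again). We simply adopt the same resulting formula for an arbitrary
(p, q)-tensor. The result is that the covariant derivative of a general (p, q)-tensor is the
sum of the partial derivative, a Christoffel symbol with a positive sign for each of the p
upper indices, and a Christoffel symbol with a negative sign for each of the q lower indices. In
equations
$$\begin{aligned}
\nabla_\mu T^{\nu_1\cdots\nu_p}_{\lambda_1\cdots\lambda_q} &= \partial_\mu T^{\nu_1\cdots\nu_p}_{\lambda_1\cdots\lambda_q} \\
&\quad + \underbrace{\Gamma^{\nu_1}_{\mu\rho}T^{\rho\nu_2\cdots\nu_p}_{\lambda_1\cdots\lambda_q} + \ldots + \Gamma^{\nu_p}_{\mu\rho}T^{\nu_1\cdots\nu_{p-1}\rho}_{\lambda_1\cdots\lambda_q}}_{p\ \text{terms}} \\
&\quad - \underbrace{\Gamma^{\rho}_{\mu\lambda_1}T^{\nu_1\cdots\nu_p}_{\rho\lambda_2\cdots\lambda_q} - \ldots - \Gamma^{\rho}_{\mu\lambda_q}T^{\nu_1\cdots\nu_p}_{\lambda_1\cdots\lambda_{q-1}\rho}}_{q\ \text{terms}}
\end{aligned} \tag{4.26}$$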
Having defined the covariant derivative for arbitrary tensors, we are also ready to define
it for tensor densities. For this we recall that if $T$ is a (p, q; w) tensor density, then
$g^{-w/2}T$ is a (p, q)-tensor. Thus $\nabla_\mu(g^{-w/2}T)$ is a (p, q + 1)-tensor. To map this back to
a tensor density of weight w, we multiply this by $g^{+w/2}$, arriving at the definition
$$\nabla_\mu T := g^{w/2}\,\nabla^{\rm tensor}_\mu(g^{-w/2}T) ,$$
where $\nabla^{\rm tensor}_\mu$
just means the usual covariant derivative for (p, q)-tensors defined above.
For example, for a scalar density $\phi$ one has
$$\nabla_\mu\phi = \partial_\mu\phi - \frac{w}{2g}(\partial_\mu g)\phi . \tag{4.29}$$
In particular, since the determinant $g$ is a scalar density of weight +2, it follows that
$$\nabla_\mu g = 0 , \tag{4.30}$$
which obviously simplifies integrations by parts in integrals defined with the measure
$\sqrt{g}\,d^4x$.
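4.3 Main Properties of the Covariant Derivative
The main properties of the covariant derivative, in addition to those that were part of
our postulates (like linearity and the Leibniz rule) are the following:
For the contraction of a (1, 1)-tensor $A^\nu_{\ \lambda}$, for example, one has
$$\nabla_\mu A^\nu_{\ \nu} = \partial_\mu A^\nu_{\ \nu} + \Gamma^\nu_{\mu\lambda}A^\lambda_{\ \nu} - \Gamma^\lambda_{\mu\nu}A^\nu_{\ \lambda} = \partial_\mu A^\nu_{\ \nu} . \tag{4.32}$$
The most transparent way of stating this property is that the Kronecker delta is
covariantly constant, i.e. that
$$\nabla_\mu\delta^\nu_{\ \lambda} = 0 . \tag{4.33}$$
Indeed, using the Leibniz rule one then finds
$$\begin{aligned}
\nabla_\mu A^{\ldots\nu\ldots}_{\ldots\nu\ldots} &= \nabla_\mu(A^{\ldots\nu\ldots}_{\ldots\lambda\ldots}\delta^\lambda_{\ \nu}) \\
&= (\nabla_\mu A^{\ldots\nu\ldots}_{\ldots\lambda\ldots})\delta^\lambda_{\ \nu} + A^{\ldots\nu\ldots}_{\ldots\lambda\ldots}\nabla_\mu\delta^\lambda_{\ \nu} \\
&= (\nabla_\mu A^{\ldots\nu\ldots}_{\ldots\lambda\ldots})\delta^\lambda_{\ \nu}
\end{aligned} \tag{4.34}$$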
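which is precisely the statement that covariant differentiation and contraction
commute. To establish that the Kronecker delta is covariantly constant, we follow
the rules to find
$$\begin{aligned}
\nabla_\mu\delta^\nu_{\ \lambda} &= \partial_\mu\delta^\nu_{\ \lambda} + \Gamma^\nu_{\mu\rho}\delta^\rho_{\ \lambda} - \Gamma^\rho_{\mu\lambda}\delta^\nu_{\ \rho} \\
&= \Gamma^\nu_{\mu\lambda} - \Gamma^\nu_{\mu\lambda} = 0 .
\end{aligned} \tag{4.35}$$
This property does not rely on the specific form of the $\Gamma^\mu_{\nu\lambda}$, and is thus true for
any covariant derivative defined by some choice of connection $\tilde\Gamma^\mu_{\nu\lambda}$.
Using
$$\partial_\mu g_{\nu\lambda} = \Gamma_{\nu\lambda\mu} + \Gamma_{\lambda\nu\mu} , \tag{4.36}$$
we calculate
$$\begin{aligned}
\nabla_\mu g_{\nu\lambda} &= \partial_\mu g_{\nu\lambda} - \Gamma^\rho_{\mu\nu}g_{\rho\lambda} - \Gamma^\rho_{\mu\lambda}g_{\nu\rho} \\
&= \Gamma_{\nu\lambda\mu} + \Gamma_{\lambda\nu\mu} - \Gamma_{\lambda\mu\nu} - \Gamma_{\nu\mu\lambda} \\
&= 0 .
\end{aligned} \tag{4.37}$$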
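The covariant constancy of the metric can also be checked numerically. The following is a minimal sketch, assuming as an example the unit 2-sphere metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ and the standard formula $\Gamma^\mu_{\nu\lambda} = \frac{1}{2}g^{\mu\rho}(\partial_\nu g_{\rho\lambda} + \partial_\lambda g_{\rho\nu} - \partial_\rho g_{\nu\lambda})$ for the Christoffel symbols. The function names are hypothetical and derivatives are approximated by central differences.

```python
import math

def metric(x):
    """Assumed example: unit 2-sphere, coordinates x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def d_metric(x, h=1e-5):
    """dg[lam][mu][nu] = partial_lam g_{mu nu}, by central differences."""
    out = []
    for lam in range(2):
        xp, xm = list(x), list(x)
        xp[lam] += h
        xm[lam] -= h
        gp, gm = metric(xp), metric(xm)
        out.append([[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(2)]
                    for m in range(2)])
    return out

def christoffel(x):
    """G[mu][nu][lam] = Gamma^mu_{nu lam} built from the metric and its derivatives."""
    ginv, dg = inverse_2x2(metric(x)), d_metric(x)
    return [[[0.5 * sum(ginv[mu][rho] * (dg[nu][rho][lam] + dg[lam][rho][nu]
                                         - dg[rho][nu][lam]) for rho in range(2))
              for lam in range(2)] for nu in range(2)] for mu in range(2)]

def nabla_g(x):
    """nabla_mu g_{nu lam} = d_mu g_{nu lam} - Gamma^rho_{mu nu} g_{rho lam}
    - Gamma^rho_{mu lam} g_{nu rho}, returned as out[mu][nu][lam]."""
    g, dg, G = metric(x), d_metric(x), christoffel(x)
    return [[[dg[mu][nu][lam]
              - sum(G[rho][mu][nu] * g[rho][lam] for rho in range(2))
              - sum(G[rho][mu][lam] * g[nu][rho] for rho in range(2))
              for lam in range(2)] for nu in range(2)] for mu in range(2)]

x0 = (1.0, 0.3)
print(christoffel(x0)[0][1][1])   # close to -sin(theta) cos(theta)
print(nabla_g(x0))                # all eight entries vanish up to rounding
```

Because the same finite-difference derivatives enter both the Christoffel symbols and $\partial_\mu g_{\nu\lambda}$, the cancellation in `nabla_g` is algebraic, mirroring the index manipulation in (4.37).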
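also known as the no torsion property of the covariant derivative. Namely, we
have
$$\begin{aligned}
\nabla_\mu\nabla_\nu\phi - \nabla_\nu\nabla_\mu\phi &= \partial_\mu\partial_\nu\phi - \Gamma^\lambda_{\mu\nu}\partial_\lambda\phi - \partial_\nu\partial_\mu\phi + \Gamma^\lambda_{\nu\mu}\partial_\lambda\phi \\
&= (-\Gamma^\lambda_{\mu\nu} + \Gamma^\lambda_{\nu\mu})\partial_\lambda\phi = 0 .
\end{aligned} \tag{4.39}$$
Note that the second covariant derivatives on higher rank tensors do not commute
- we will come back to this in our discussion of the curvature tensor later on.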
4.4 Uniqueness of the Levi-Civita Connection (Christoffel symbols)
We noted before that the postulates for a covariant derivative (a linear tensorial operator
reducing to the partial derivative on scalars and satisfying the Leibniz rule) do not
determine it uniquely but only up to the addition of a tensor to the connection,
$$\tilde\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\nu\lambda} + C^\mu_{\nu\lambda} . \tag{4.40}$$
Not unrelated to this is the fact that it is the unique connection that can be built
from only the metric and its 1st derivatives (and which thus vanishes in an inertial
coordinate system in Minkowski space or at the origin of an inertial coordinate system
in an arbitrary gravitational field).
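Moreover, as we have seen, this covariant derivative has two important properties,
namely that
1. the metric is covariantly constant, $\nabla_\mu g_{\nu\lambda} = 0$, and
2. the torsion is zero, i.e. the second covariant derivatives of a scalar commute.
In fact, it turns out that these two conditions uniquely determine the $\tilde\Gamma^\mu_{\nu\lambda}$ to be the
Christoffel symbols. The second condition implies that the $\tilde\Gamma^\lambda_{\mu\nu}$ are symmetric in the
two lower indices,
$$[\tilde\nabla_\mu, \tilde\nabla_\nu]\phi = 0 \quad\Leftrightarrow\quad \tilde\Gamma^\lambda_{\mu\nu} = \tilde\Gamma^\lambda_{\nu\mu} . \tag{4.41}$$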
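The first condition now allows one to express the $\tilde\Gamma^\lambda_{\mu\nu}$ in terms of the derivatives of
the metric, leading uniquely to the familiar expression for the Christoffel symbols $\Gamma^\lambda_{\mu\nu}$:
First of all, by definition / construction one has (e.g. from demanding the Leibniz rule
for $\tilde\nabla_\mu$)
$$\tilde\nabla_\lambda g_{\mu\nu} = \partial_\lambda g_{\mu\nu} - \tilde\Gamma^\rho_{\lambda\mu}g_{\rho\nu} - \tilde\Gamma^\rho_{\lambda\nu}g_{\mu\rho} \equiv \partial_\lambda g_{\mu\nu} - \tilde\Gamma_{\nu\lambda\mu} - \tilde\Gamma_{\mu\lambda\nu} . \tag{4.42}$$
Requiring that this be zero implies in particular that
$$\begin{aligned}
0 &= \tilde\nabla_\mu g_{\nu\lambda} + \tilde\nabla_\nu g_{\mu\lambda} - \tilde\nabla_\lambda g_{\mu\nu} \\
&= \partial_\mu g_{\nu\lambda} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu}
 - \tilde\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\nu\mu\lambda} - \tilde\Gamma_{\lambda\nu\mu} - \tilde\Gamma_{\mu\nu\lambda} + \tilde\Gamma_{\nu\lambda\mu} + \tilde\Gamma_{\mu\lambda\nu} \\
&= 2(\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\lambda\mu\nu})
\end{aligned} \tag{4.43}$$
(where the cancellations are entirely due to the assumed symmetry of the coefficients
$\tilde\Gamma_{\lambda\mu\nu}$ in the last two indices). Thus $\tilde\Gamma_{\lambda\mu\nu} = \Gamma_{\lambda\mu\nu}$. This unique metric-compatible and
torsion-free connection is also known as the Levi-Civita connection. It is the connection
canonically associated to a space-time (manifold) equipped with a metric tensor, and it is the
connection used in general relativity.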
It is possible to relax either of the conditions (1) or (2), or both of them, and this will
be discussed in section 10.5, and subsequently also in section 19.7.
4.5 Tensor Analysis: Some Special Cases
In this section we will look at some common and useful special cases of the Levi-Civita
covariant derivative (simply "the covariant derivative" in the following), such as the
covariant curl and divergence etc.
$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \tag{4.45}$$
$$\nabla_{[\mu}A_{\nu\lambda\cdots]} = \partial_{[\mu}A_{\nu\lambda\cdots]} . \tag{4.46}$$
3. The Covariant Divergence of a Vector
By the covariant divergence of a vector field one means the scalar
$$\nabla_\mu V^\mu = \partial_\mu V^\mu + \Gamma^\mu_{\mu\lambda}V^\lambda . \tag{4.47}$$
Now a useful identity for the contracted Christoffel symbol is
$$\Gamma^\mu_{\mu\lambda} = g^{-1/2}\partial_\lambda(g^{+1/2}) . \tag{4.48}$$
I will give a proof of this identity in an appendix to this section (subsection 4.6).
Thus the covariant divergence can be written compactly as
$$\nabla_\mu V^\mu = \frac{1}{\sqrt{g}}\,\partial_\mu(\sqrt{g}\,V^\mu) , \tag{4.49}$$
and one only needs to calculate $g$ and its derivative, not the Christoffel symbols
themselves, to calculate the covariant divergence of a vector field.
This formula is also useful (and provides the quickest way of arriving at the result)
if one just wants to write the ordinary flat space divergence of vector calculus on
$\mathbb{R}^3$ in, say, polar or cylindrical coordinates.
In Cartesian coordinates $(x^1, x^2, x^3)$, the divergence of a 3-vector $\vec{V}$ is of course
given by the familiar expression
$$\mathrm{div}\,\vec{V} = \partial_1 V^1 + \partial_2 V^2 + \partial_3 V^3 . \tag{4.50}$$
However, as you also know, e.g. in spherical coordinates $(r, \theta, \phi)$ the divergence is
not simply of this form,
$$\mathrm{div}\,\vec{V} \neq \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi . \tag{4.51}$$
Rather, going through the coordinate transformation and Jacobians etc., one finds
that calculating the divergence in spherical coordinates one picks up additional
terms, the result taking the somewhat unintuitive form
$$\mathrm{div}\,\vec{V} = \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi + \frac{2}{r}V^r + \cot\theta\,V^\theta . \tag{4.52}$$
The easy and quick way to obtain this, which provides a rationale for and
explanation of the origin of these additional terms, is from the result (4.49). Using
$\sqrt{g} = r^2\sin\theta$, one has
$$\begin{aligned}
\mathrm{div}\,\vec{V} &= \frac{1}{r^2\sin\theta}\left[\partial_r(r^2\sin\theta\,V^r) + \partial_\theta(r^2\sin\theta\,V^\theta) + \partial_\phi(r^2\sin\theta\,V^\phi)\right] \\
&= \partial_r V^r + \partial_\theta V^\theta + \partial_\phi V^\phi + \frac{2}{r}V^r + \cot\theta\,V^\theta .
\end{aligned} \tag{4.53}$$
This thus produces the correct result on the nose and with very little effort.
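The compact formula (4.49) is easy to test numerically. The sketch below is an illustration under stated assumptions: the Cartesian test field $(x^2, yz, zx)$ with divergence $3x + z$ is an arbitrary choice, the function names are hypothetical, and the components $V^r, V^\theta, V^\phi$ are taken with respect to the coordinate basis (not the orthonormal basis common in vector calculus texts).

```python
import math

def cartesian_field(x, y, z):
    """Arbitrary test field; its Cartesian divergence is 3x + z."""
    return (x * x, y * z, z * x)

def spherical_components(r, th, ph):
    """Coordinate-basis components V^r, V^theta, V^phi of the test field,
    obtained from V^mu = (d x^mu / d x^i) V^i."""
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    vx, vy, vz = cartesian_field(r * st * cp, r * st * sp, r * ct)
    v_r = st * cp * vx + st * sp * vy + ct * vz
    v_th = (ct * cp * vx + ct * sp * vy - st * vz) / r
    v_ph = (-sp * vx + cp * vy) / (r * st)
    return (v_r, v_th, v_ph)

def sqrt_g(r, th, ph):
    return r * r * math.sin(th)

def covariant_divergence(q, h=1e-5):
    """(1/sqrt g) d_mu (sqrt g V^mu), with central differences for d_mu."""
    total = 0.0
    for mu in range(3):
        qp, qm = list(q), list(q)
        qp[mu] += h
        qm[mu] -= h
        fp = sqrt_g(*qp) * spherical_components(*qp)[mu]
        fm = sqrt_g(*qm) * spherical_components(*qm)[mu]
        total += (fp - fm) / (2 * h)
    return total / sqrt_g(*q)

q = (2.0, 0.7, 1.1)
x = q[0] * math.sin(q[1]) * math.cos(q[2])
z = q[0] * math.cos(q[1])
print(covariant_divergence(q), 3 * x + z)   # the two numbers agree closely
```

Only $\sqrt{g}$ enters the computation; no Christoffel symbol is ever evaluated, which is the practical point of (4.49).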
4. The Covariant Laplacian of a Scalar
How should the Laplacian be defined? Well, the obvious guess (something that
is covariant and reduces to the ordinary Laplacian for the Minkowski metric) is
$\Box = g^{\mu\nu}\nabla_\mu\nabla_\nu$, which can alternatively be written as
$$\Box = g^{\mu\nu}\nabla_\mu\nabla_\nu = \nabla^\mu\nabla_\mu = \nabla_\mu\nabla^\mu = \nabla_\mu g^{\mu\nu}\nabla_\nu \tag{4.54}$$
etc. Note that, even though the covariant derivative on scalars reduces to the
ordinary partial derivative, so that one can write
$$\Box\phi = \nabla_\mu g^{\mu\nu}\partial_\nu\phi , \tag{4.55}$$
it makes no sense to write this as $\partial_\mu\partial^\mu\phi$: since $\partial_\mu$ does not commute with the
metric in general, the notation is at best ambiguous as it is not clear whether
this should represent $g^{\mu\nu}\partial_\mu\partial_\nu\phi$ or $\partial_\mu g^{\mu\nu}\partial_\nu\phi$ or something altogether different.
This ambiguity does not arise for the Minkowski metric, but of course it is present in
general.
A compact yet explicit expression for the Laplacian follows from the expression
for the covariant divergence of a vector:
$$\begin{aligned}
\Box\phi &:= g^{\mu\nu}\nabla_\mu\nabla_\nu\phi \\
&= \nabla_\mu(g^{\mu\nu}\partial_\nu\phi) \\
&= g^{-1/2}\partial_\mu(g^{1/2}g^{\mu\nu}\partial_\nu\phi) .
\end{aligned} \tag{4.56}$$
Again, this formula is also useful (and provides the quickest way of arriving at the
result) if one just wants to write the ordinary flat space Laplacian on $\mathbb{R}^3$ in, say,
polar or cylindrical coordinates.
To illustrate this, let us calculate the Laplacian for the standard metric on $\mathbb{R}^{n+1}$
in polar coordinates. The standard procedure would be to first determine the
coordinate transformation $x^i = x^i(r, \text{angles})$, then calculate $\partial/\partial x^i$, and finally
assemble all the bits and pieces to calculate $\Delta = \sum_i (\partial/\partial x^i)^2$. This is a pain.
To calculate the Laplacian, we do not need to know the coordinate transformation,
all we need is the metric. In polar coordinates, this metric takes the form
$$ds^2 = dr^2 + r^2 d\Omega_n^2 ,$$
where $d\Omega_n^2$ is the standard line-element on the unit n-sphere $S^n$. The determinant
of this metric is $g \propto r^{2n}$ (times a function of the coordinates (angles) on the
sphere). Thus, for n = 1 one has $ds^2 = dr^2 + r^2 d\phi^2$ and therefore
$$\Delta = r^{-1}\partial_r(r\partial_r) + r^{-2}\partial_\phi^2 = \partial_r^2 + r^{-1}\partial_r + r^{-2}\partial_\phi^2 .$$
In general, denoting the angular part of the Laplacian, i.e. the Laplacian of $S^n$,
by $\Delta_{S^n}$, one finds analogously
$$\Delta = \partial_r^2 + n r^{-1}\partial_r + r^{-2}\Delta_{S^n} . \tag{4.59}$$
I hope you agree that this method is superior to the standard procedure.
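A small numerical experiment makes the point concrete for n = 1. This is only a sketch: the test function $f = x^3 + xy^2$ (with Cartesian Laplacian $8x$) is an arbitrary choice, the function names are hypothetical, and the expression $g^{-1/2}\partial_\mu(g^{1/2}g^{\mu\nu}\partial_\nu f)$ is evaluated with nested central differences using $\sqrt{g} = r$, $g^{rr} = 1$, $g^{\phi\phi} = r^{-2}$.

```python
import math

def f_polar(r, phi):
    """Arbitrary test function f = x^3 + x y^2 written in polar coordinates."""
    x, y = r * math.cos(phi), r * math.sin(phi)
    return x ** 3 + x * y * y

def flux(r, phi, mu, h):
    """sqrt(g) g^{mu mu} d_mu f for the diagonal metric ds^2 = dr^2 + r^2 dphi^2."""
    if mu == 0:
        return r * (f_polar(r + h, phi) - f_polar(r - h, phi)) / (2 * h)
    return (1.0 / r) * (f_polar(r, phi + h) - f_polar(r, phi - h)) / (2 * h)

def laplacian(r, phi, h=1e-4):
    """(1/sqrt g) d_mu (sqrt g g^{mu nu} d_nu f), nested central differences."""
    d_r = (flux(r + h, phi, 0, h) - flux(r - h, phi, 0, h)) / (2 * h)
    d_phi = (flux(r, phi + h, 1, h) - flux(r, phi - h, 1, h)) / (2 * h)
    return (d_r + d_phi) / r

r0, phi0 = 1.3, 0.4
print(laplacian(r0, phi0), 8 * r0 * math.cos(phi0))   # should agree closely
```

Note that the coordinate transformation is used here only to write down a test function with a known answer; the Laplacian itself is computed from the metric alone.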
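Now the second term is an ordinary total derivative and thus, if $V^\mu$ vanishes
sufficiently rapidly at infinity, one has
$$\int \sqrt{g}\,d^4x\ \nabla_\mu V^\mu = 0 . \tag{4.61}$$
A somewhat more precise statement of this theorem, including the boundary
contributions to the integral, will be given in section 15.3.
$$\begin{aligned}
\nabla_\mu T^{\mu\nu\cdots} &= \partial_\mu T^{\mu\nu\cdots} + \Gamma^\mu_{\mu\lambda}T^{\lambda\nu\cdots} + \Gamma^\nu_{\mu\lambda}T^{\mu\lambda\cdots} + \ldots \\
&= g^{-1/2}\partial_\mu(g^{1/2}T^{\mu\nu\cdots}) + \Gamma^\nu_{\mu\lambda}T^{\mu\lambda\cdots} + \ldots .
\end{aligned} \tag{4.62}$$
$$\delta_V g_{\mu\nu} = V^\lambda\partial_\lambda g_{\mu\nu} + (\partial_\mu V^\lambda)g_{\lambda\nu} + (\partial_\nu V^\lambda)g_{\mu\lambda} . \tag{4.64}$$
While we saw that this expression could be understood and deduced from the
requirement that the variation of the metric is itself a tensorial object that
transforms like the metric, the tensorial nature of the above expression is far from
manifest. However, it has a very nice and simple expression in terms of covariant
derivatives of $V$, namely
$$\delta_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu \tag{4.65}$$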
We can also obtain this condition as the covariantisation of the statement that in
a particular coordinate system the coefficients of the metric do not depend on one
of these coordinates, say $y$,
$$\partial_y g_{\mu\nu} = 0 , \tag{4.67}$$
so that the metric is then manifestly invariant under translations in $y$. In such
a coordinate system adapted to the symmetry at hand, these translations are
generated by $K = \partial_y$, and for a vector of this form (in particular, thus, with
constant coefficients) one has
$$\begin{aligned}
K = \partial_y \quad&\Rightarrow\quad K^\mu = \delta^\mu_{\ y} \\
&\Rightarrow\quad \nabla_\mu K_\nu = \Gamma_{\nu\mu y} \\
&\Rightarrow\quad \nabla_\mu K_\nu + \nabla_\nu K_\mu = \partial_y g_{\mu\nu}
\end{aligned} \tag{4.68}$$
(where in the last step the basic relation (1.156) was used). Thus we find that the
fact that the metric is $y$-translation invariant can be characterised covariantly as
the statement that $K = \partial_y$ satisfies
$$\partial_y g_{\mu\nu} = 0 \quad\Leftrightarrow\quad \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 . \tag{4.69}$$
This is again the Killing equation (4.66). As this equation is now tensorial it is
valid in any coordinate system, in particular independently of whether or not the
coordinate system is adapted to $K$ in the way described above.
The expressions (4.65) and (4.69) will be rederived (and placed into the general
context of Lie derivatives and Killing vectors) in section 8 - see in particular section
8.5.
You will have noticed that many equations simplify considerably for completely
anti-symmetric tensors. In particular, their curl can be defined in a tensorial way without
reference to any metric. This observation is at the heart of the coordinate independent
calculus of differential forms. In this context, the curl is known as the exterior
derivative.
Indeed, it is also straightforward to show directly, i.e. without going through the
illogical loop of introducing the covariant derivative in order to obtain something manifestly
tensorial only to find it disappear again from the final expression, that $\partial_{[\mu}A_{\mu_1\ldots\mu_p]}$ is
a tensor, i.e. transforms as a tensor under coordinate transformations: what happens
is that the possible obstructions to the tensorial behaviour, namely derivatives of
Jacobians, drop out after anti-symmetrisation because they are really 2nd partial
derivatives of the coordinates, which are symmetric and thus do not survive the
anti-symmetrisation.
To see this completely explicitly, consider a covector $A_\mu(x)$ and a coordinate
transformation $x^\mu = x^\mu(y^{\mu'})$, with Jacobi matrix
$$J^\mu_{\mu'} = \frac{\partial x^\mu}{\partial y^{\mu'}} . \tag{4.70}$$
As a covector, $A_\mu$ transforms as $A_{\mu'} = J^\mu_{\mu'}A_\mu$, and therefore its derivative transforms as
(using $\partial_{\nu'} = J^\nu_{\nu'}\partial_\nu$)
$$A_{\mu'} = J^\mu_{\mu'}A_\mu \quad\Rightarrow\quad \partial_{\nu'}A_{\mu'} = J^\nu_{\nu'}J^\mu_{\mu'}\partial_\nu A_\mu + (\partial_{\nu'}J^\mu_{\mu'})A_\mu . \tag{4.71}$$
Because of
$$\partial_{\nu'}J^\mu_{\mu'} = \frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\mu'}} = \partial_{\mu'}J^\mu_{\nu'} , \tag{4.72}$$
for the anti-symmetrised derivative one finds the tensorial transformation behaviour
$$\partial_{\nu'}A_{\mu'} - \partial_{\mu'}A_{\nu'} = J^\nu_{\nu'}J^\mu_{\mu'}(\partial_\nu A_\mu - \partial_\mu A_\nu) . \tag{4.73}$$
Likewise, Lie derivatives of tensors in general (section 8) are, like the special case of
the Lie derivative of the metric mentioned above - see (4.65), automatically tensorial
objects (and one can, but need not, make their tensorial nature manifest by writing
these derivatives in terms of covariant derivatives).
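Here is an elementary proof of the identity (4.48), and a useful more general formula
for the variation of the determinant of the metric, namely
$$\delta g = g g^{\mu\nu}\delta g_{\mu\nu} \quad\text{or}\quad g^{-1}\delta g = g^{\mu\nu}\delta g_{\mu\nu} . \tag{4.74}$$
This proof is based on the standard cofactor or minor expansion of the determinant of
a matrix (an alternative standard proof can, as also outlined below, be based on the
identity det G = exp tr log G and its derivative or variation). The cofactor expansion
formula for the determinant is
$$g = \sum_\nu (-1)^{\mu+\nu}g_{\mu\nu}|m_{\mu\nu}| , \tag{4.75}$$
where $|m_{\mu\nu}|$ is the determinant of the minor of $g_{\mu\nu}$, i.e. of the matrix one obtains by
removing the $\mu$th row and $\nu$th column from $g_{\mu\nu}$.
If one replaces the entries $g_{\mu\nu}$ in this expansion by those of a different row, $g_{\lambda\nu}$ with
$\lambda \neq \mu$, the result vanishes,
$$\sum_\nu (-1)^{\mu+\nu}g_{\lambda\nu}|m_{\mu\nu}| = 0 \qquad (\lambda \neq \mu) ,$$
since this is, in particular, the determinant of a matrix with $g_{\mu\nu} = g_{\lambda\nu}$, i.e. of a matrix
with two equal rows. Together, these two results can be written as
$$\sum_\nu (-1)^{\mu+\nu}g_{\lambda\nu}|m_{\mu\nu}| = \delta_{\lambda\mu}\,g . \tag{4.77}$$
This shows that the coefficients of the inverse metric $g^{\nu\mu}$ are given by
$$g^{\nu\mu} = (-1)^{\mu+\nu}\frac{|m_{\mu\nu}|}{g} , \tag{4.78}$$
a formula that should also be familiar from linear algebra. Now varying $g$ in (4.75) with
respect to $g_{\mu\nu}$ and noting that, by construction, $m_{\mu\nu}$ does not depend on $g_{\mu\nu}$, one finds
$$\delta g = \sum_{\mu,\nu}(-1)^{\mu+\nu}\delta g_{\mu\nu}|m_{\mu\nu}| = g g^{\nu\mu}\delta g_{\mu\nu} . \tag{4.79}$$
For a symmetric matrix, in particular for the metric, this reduces to the formula (4.74)
we set out to establish. It also implies
$$\delta\sqrt{g} = \tfrac{1}{2}\sqrt{g}\,g^{\mu\nu}\delta g_{\mu\nu} , \tag{4.80}$$
a particularly useful result that we will repeatedly make use of. An equally useful
variant of this equation is an expression for the variation of $\sqrt{g}$ expressed in terms of
the variations $\delta g^{\mu\nu}$ of the components of the inverse metric. As a consequence of
$$g^{\mu\lambda}g_{\lambda\nu} = \delta^\mu_{\ \nu} \quad\Rightarrow\quad \delta g^{\mu\nu} = -g^{\mu\lambda}g^{\nu\rho}\delta g_{\lambda\rho} \tag{4.81}$$
or
$$g^{\mu\nu}\delta g_{\mu\nu} = \delta(4) - (\delta g^{\mu\nu})g_{\mu\nu} = -g_{\mu\nu}\delta g^{\mu\nu} \tag{4.82}$$
one has
$$\delta\sqrt{g} = -\tfrac{1}{2}\sqrt{g}\,g_{\mu\nu}\delta g^{\mu\nu} .$$
It follows from (4.79) that if the variation is the partial derivative one has
$$\partial_\lambda g = \frac{\partial g}{\partial g_{\mu\nu}}\,\partial_\lambda g_{\mu\nu} = g g^{\mu\nu}\partial_\lambda g_{\mu\nu} \tag{4.84}$$
or
$$g^{-1}\partial_\lambda g = g^{\mu\nu}\partial_\lambda g_{\mu\nu} , \tag{4.85}$$
so that $\Gamma^\mu_{\mu\lambda} = \frac{1}{2}g^{\mu\nu}\partial_\lambda g_{\mu\nu} = g^{-1/2}\partial_\lambda(g^{+1/2})$, which is the identity (4.48).
The result (4.79) can also be written in matrix form, with $G$ denoting the matrix with
components $(G)_{\mu\nu} = g_{\mu\nu}$, as
$$\delta\det G = \det G\ \mathrm{tr}\,(G^{-1}\delta G) .$$
In this form, the result can also be derived from variation of the remarkably useful
identity
$$\det G = e^{\,\mathrm{tr}\log G} . \tag{4.90}$$
This identity, in turn, can be derived in an elementary way for diagonalisable $G$ by
noting that it holds trivially for diagonal matrices, and therefore, by the conjugation
invariance of det and tr, also for diagonalisable matrices (like the metric). [And if
desired, this can in turn be extended to all matrices by topological arguments involving
extensions of continuous functionals from the dense set of diagonalisable matrices to the
space of all matrices . . . ]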
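The cofactor formula for the inverse and the first-order variation of the determinant can be checked on a small example. The following sketch assumes an arbitrary symmetric 3x3 matrix standing in for the metric; the helper names are hypothetical and nothing here is specific to these notes beyond the two formulas being tested.

```python
def minor_det(G, mu, nu):
    """Determinant of the 2x2 minor obtained by deleting row mu and column nu."""
    rows = [row for i, row in enumerate(G) if i != mu]
    m = [[v for j, v in enumerate(row) if j != nu] for row in rows]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(G):
    """Cofactor expansion along the first row, as in (4.75) with mu fixed."""
    return sum((-1) ** nu * G[0][nu] * minor_det(G, 0, nu) for nu in range(3))

def inverse3(G):
    """Inverse as the transposed cofactor matrix divided by the determinant."""
    g = det3(G)
    return [[(-1) ** (mu + nu) * minor_det(G, nu, mu) / g for nu in range(3)]
            for mu in range(3)]

def determinant_variation(G, dG):
    """Return (exact change of det, first-order estimate g * g^{mu nu} dg_{mu nu})."""
    shifted = [[G[i][j] + dG[i][j] for j in range(3)] for i in range(3)]
    exact = det3(shifted) - det3(G)
    Ginv = inverse3(G)
    linear = det3(G) * sum(Ginv[i][j] * dG[i][j] for i in range(3) for j in range(3))
    return exact, linear

G = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.2]]
eps = 1e-7
dG = [[eps * v for v in row] for row in [[1.0, 2.0, 3.0], [2.0, -1.0, 0.5], [3.0, 0.5, 2.0]]]
print(determinant_variation(G, dG))   # the two numbers agree to first order in eps
```

The residual difference between the two numbers is of second order in the perturbation, as expected for a first-order variation formula.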
So far, we have defined covariant differentiation for tensors defined everywhere in space
time. Frequently, however, one encounters tensors that are only defined on curves - like
the momentum of a particle which is only defined along its world line. In this section we
will see how to define covariant differentiation along a curve. Thus consider a curve $x^\mu(\tau)$
(where $\tau$ could be, but need not be, proper time) and the tangent vector field $X^\mu(x(\tau)) =
\dot{x}^\mu(\tau)$. Now define the covariant derivative $D_\tau$ along the curve, covariantising $d/d\tau$, by
$$\frac{d}{d\tau} = \dot{x}^\mu\partial_\mu \quad\to\quad D_\tau = X^\mu\nabla_\mu = \dot{x}^\mu\nabla_\mu . \tag{4.91}$$
Frequently one also uses the (suggestive, but ugly) notation $D/D\tau$ for $D_\tau$.
For a vector $V^\mu$, for example, one has
$$\begin{aligned}
D_\tau V^\mu &= \dot{x}^\nu\partial_\nu V^\mu + \dot{x}^\nu\Gamma^\mu_{\nu\lambda}V^\lambda \\
&= \frac{d}{d\tau}V^\mu(x(\tau)) + \Gamma^\mu_{\nu\lambda}(x(\tau))\,\dot{x}^\nu(\tau)V^\lambda(x(\tau)) .
\end{aligned} \tag{4.93}$$
For this to make sense, $V^\mu$ needs to be defined only along the curve and not necessarily
everywhere in space time.
This notion of covariant derivative along a curve permits us, in particular, to define the
(covariant) acceleration $a^\mu$ of a curve $x^\mu(\tau)$ as the covariant derivative of the velocity
$u^\mu = \dot{x}^\mu$,
$$a^\mu = D_\tau\dot{x}^\mu = \ddot{x}^\mu + \Gamma^\mu_{\nu\lambda}\dot{x}^\nu\dot{x}^\lambda = u^\nu\nabla_\nu u^\mu . \tag{4.94}$$
Thus we can characterise (affinely parametrised) geodesics as those curves whose
covariant acceleration is zero,
$$\text{Geodesics:}\quad a^\mu = u^\nu\nabla_\nu u^\mu = 0 , \tag{4.95}$$
a reasonable and natural statement regarding the movement of freely falling particles.
If they are not affinely parametrised, as in (2.35), then instead of $u^\nu\nabla_\nu u^\mu = 0$ one has
$$u^\nu\nabla_\nu u^\mu = \kappa\,u^\mu . \tag{4.96}$$
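As a simple illustration of the covariant acceleration (4.94), consider uniform circular motion in the flat plane, described in polar coordinates. This is a sketch under the assumption that the non-zero Christoffel symbols of $ds^2 = dr^2 + r^2 d\phi^2$ are $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$; the constants $R$ and $\omega$ are introduced only for this example.

```latex
x^\mu(\tau) = (r(\tau), \phi(\tau)) = (R, \omega\tau) : \qquad
\begin{aligned}
a^r &= \ddot{r} + \Gamma^r_{\phi\phi}\,\dot\phi^2 = -R\,\omega^2 , \\
a^\phi &= \ddot\phi + 2\,\Gamma^\phi_{r\phi}\,\dot r\,\dot\phi = 0 .
\end{aligned}
```

Even though $\ddot{r} = \ddot\phi = 0$, the covariant acceleration is the familiar centripetal acceleration pointing towards the origin, so that the circle is (of course) not a geodesic of the flat plane.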
4.8 Parallel Transport and Geodesics
We now come to the important notion of parallel transport of a tensor along a curve.
Note that, in a general (curved) metric space time, it does not make sense to ask if two
vectors defined at points x and y are parallel to each other or not. However, given a
metric and a curve connecting these two points, one can compare the two by dragging
one along the curve to the other using the covariant derivative.
We say that a tensor T
is parallel transported along the curve x ( ) if
D T
= 0 . (4.97)
1. In a locally inertial coordinate system along the curve, this condition reduces to
$dT/d\tau = 0$, i.e. to the statement that the tensor does not change along the curve.
Thus the above is indeed an appropriate tensorial generalisation of the intuitive
notion of parallel transport to a general metric space-time.
2. The parallel transport condition is a first order differential equation along the
curve and thus defines $T^{\cdots}_{\cdots}(\tau)$ given an initial value $T^{\cdots}_{\cdots}(\tau_0)$.
3. Taking $T$ to be the tangent vector $u^\mu = \dot{x}^\mu$ to the curve itself, the condition for
parallel transport becomes
$$D_\tau u^\mu = 0 \;\Leftrightarrow\; \ddot{x}^\mu + \Gamma^\mu_{\;\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = 0 , \tag{4.98}$$
i.e. precisely the geodesic equation. We have already seen that geodesics are
precisely the curves with zero acceleration. We can now equivalently characterise
them by the property that their tangent vectors are parallel transported (do not
change) along the curve. For this reason geodesics are also known as autoparallels.
4. Since the metric is covariantly constant, it is parallel along any curve. Thus, in
particular, if $V^\mu$ is parallel transported, also its length remains constant along the
curve,
$$D_\tau V^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau}(g_{\mu\nu} V^\mu V^\nu) = D_\tau (g_{\mu\nu} V^\mu V^\nu) = 0 . \tag{4.99}$$
In particular, we rediscover the fact claimed in (2.23) that the quantity
$g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$ is constant along a geodesic,
$$D_\tau \dot{x}^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau}(g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu) = 0 . \tag{4.100}$$
5. Now let $x^\mu(\tau)$ be a geodesic and $V^\mu$ parallel along this geodesic. Then, as one
might intuitively expect, also the angle between $V^\mu$ and the tangent vector to the
curve $u^\mu$ remains constant. This is a consequence of the fact that both the norm
of $V$ and the norm of $u$ are constant along the curve and that
$$\frac{d}{d\tau}(g_{\mu\nu} u^\mu V^\nu) = D_\tau (g_{\mu\nu} u^\mu V^\nu) = g_{\mu\nu} (D_\tau u^\mu) V^\nu + g_{\mu\nu} u^\mu D_\tau V^\nu = 0 \tag{4.101}$$
4.9 Example: Parallel Transport on the 2-Sphere
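As usual, the simplest non-trivial example is provided by the 2-sphere with its standard
line element
$$ds^2 = d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2 , \tag{4.102}$$
with the non-zero Christoffel symbols (determined e.g. from the geodesic equation, as
in (2.82) - (2.87))
$$\Gamma^\theta_{\;\phi\phi} = -\sin\theta\cos\theta , \qquad \Gamma^\phi_{\;\theta\phi} = \Gamma^\phi_{\;\phi\theta} = \cot\theta .$$
Consider parallel transport of a vector $V^\alpha$ along a circle of constant latitude
$\theta = \theta_0$, parametrised by the angle $\phi$. The tangent vector to this curve has norm
$$g_{\alpha\beta} \dot{x}^\alpha \dot{x}^\beta = \sin^2\theta_0 , \tag{4.106}$$
and the parallel transport condition reads
$$0 = \partial_\phi V^\alpha + \Gamma^\alpha_{\;\beta\gamma} \dot{x}^\beta V^\gamma = \partial_\phi V^\alpha + \Gamma^\alpha_{\;\phi\gamma} V^\gamma
= \partial_\phi V^\alpha + \Gamma^\alpha_{\;\phi\theta} V^\theta + \Gamma^\alpha_{\;\phi\phi} V^\phi . \tag{4.107}$$
Using the explicit form of the Christoffel symbols, the parallel transport equations are
thus
$$0 = \partial_\phi V^\theta - \sin\theta_0 \cos\theta_0\, V^\phi , \qquad 0 = \partial_\phi V^\phi + \cot\theta_0\, V^\theta . \tag{4.108}$$
Differentiating once more, these equations can be decoupled and take the form of
harmonic oscillator equations with frequency $\cos\theta_0$,
$$(\partial_\phi^2 + \cos^2\theta_0) V^\alpha = 0 . \tag{4.109}$$
Plugging the general solution of (4.109) into the 1st order equations to reduce the
spurious 4 to 2 integration constants, and relating them to the initial values at
$\phi = 0$, say,
$$V^\alpha(\theta_0, \phi = 0) = v^\alpha , \tag{4.111}$$
one finally finds the result
$$V^\theta(\theta_0, \phi) = v^\theta \cos(\phi\cos\theta_0) + v^\phi \sin\theta_0 \sin(\phi\cos\theta_0) , \qquad
V^\phi(\theta_0, \phi) = v^\phi \cos(\phi\cos\theta_0) - \frac{v^\theta}{\sin\theta_0} \sin(\phi\cos\theta_0) . \tag{4.112}$$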
Remarks:
1. In the special case of parallel transport along the equator $\theta_0 = \pi/2$, one has
$\cos\theta_0 = 0$, and therefore
$$\theta_0 = \pi/2 \;\Rightarrow\; V^\alpha(\pi/2, \phi) = v^\alpha . \tag{4.113}$$
In other words, the components are constant under parallel transport along the
equator. This is intuitively obvious on the basis of spherical symmetry. Since
among the family of constant $\theta = \theta_0$ curves only the equator is a geodesic (great
circle), this is also in agreement with the general results obtained above, which
imply that upon parallel transport along the equator the angle between the vector
and the equator remains constant. In 2 dimensions, this condition, together with
the fact that the length of a vector remains invariant under parallel transport
in general, is sufficient to imply that the parallel transported components are
constant along the path.
2. While the above is not unexpected, perhaps the most interesting consequence
of the above result (4.112) is that, in general, not only are the components not
constant but that actually, after having completed the $2\pi$-circuit along the path to
return to the starting point, the parallel transported vector will not agree with the
initial vector. Indeed, the components at $\phi = 2\pi$ are obtained from the components
$v^\alpha$ at $\phi = 0$ by setting $\phi = 2\pi$ in (4.112), which for generic $\theta_0$ does not
reproduce $v^\alpha$.
3. As we will see in section 10.1, this fact that parallel transport along closed paths is
non-trivial (equivalently that parallel transport from one point to another depends
on the path) can be directly attributed to (and is the smoking gun of) the presence
of curvature.
4. If desired, the result can be written in terms of proper distance $s$ along the circle,
rather than the angle $\phi$, by the substitution $\phi = s/\sin\theta_0$ (cf. (4.106)).
5. The result (4.112) takes on a more transparent form when written in terms of
the components of $V$ and $v$ with respect to an orthonormal basis (section 3.8) $E_a$
rather than the coordinate basis $\partial_\alpha$. Such an orthonormal basis is provided by
$$E_1 = \partial_\theta , \qquad E_2 = (\sin\theta)^{-1} \partial_\phi , \tag{4.116}$$
$$g_{\alpha\beta} E_1^\alpha E_1^\beta = g_{\alpha\beta} E_2^\alpha E_2^\beta = 1 , \qquad g_{\alpha\beta} E_1^\alpha E_2^\beta = 0 . \tag{4.117}$$
The components with respect to this orthonormal basis are related to the
coordinate components by
$$V = V^\alpha \partial_\alpha = V^a E_a \;\Rightarrow\; V^1 = V^\theta , \quad V^2 = \sin\theta\, V^\phi \tag{4.118}$$
is known as the deficit angle or holonomy of the parallel transport along the given
loop. With this terminology we can say that the holonomy along the equator is
trivial.
7. At the other extreme, we see that there is a non-trivial holonomy as $\theta_0 \to 0$, i.e. for
parallel transport along an infinitesimal loop around the north pole, along which
the parallel transported vector performs a complete $2\pi$-rotation (the rotation angle
after one circuit tends to $2\pi$). As
shown in section 10.1, parallel transport along infinitesimal loops at or around a
point provides a precise measure of the curvature at that point.
8. Curiously, as shown by Rothman, Ellis and Murugan, the holonomy along circular
equatorial orbits in the Schwarzschild geometry (such orbits are geodesics at the
critical points of the effective potential for geodesic motion, to be discussed in sec-
tion 24.3), is non-trivial, even though again intuitive reasoning based on spherical
symmetry might have led one to expect a trivial result (and would thus have led
one astray).11
11
T. Rothman, G. Ellis, J. Murugan, Holonomy in the Schwarzschild-Droste Geometry,
arXiv:gr-qc/0008070.
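The parallel transport equations (4.108) are easy to integrate numerically, which gives an independent check of the statements about holonomy made in the remarks above. The following is a minimal Python sketch under stated assumptions: the curve is the circle $\theta = \theta_0$ parametrised by $\phi$, the integrator is a textbook 4th-order Runge-Kutta scheme (my choice, not something prescribed by the text), and the expected rotation angle $2\pi\cos\theta_0$ in the orthonormal frame (4.116) follows from the oscillator frequency in (4.109). All function names are hypothetical.

```python
import math

def transport_rhs(theta0, V):
    """dV/dphi from (4.108) along the circle theta = theta0; V = (V^theta, V^phi)."""
    V_theta, V_phi = V
    return (math.sin(theta0) * math.cos(theta0) * V_phi,
            -math.cos(theta0) / math.sin(theta0) * V_theta)

def parallel_transport(theta0, v, phi_end=2 * math.pi, steps=2000):
    """Integrate (4.108) from phi = 0 to phi_end with classical RK4."""
    h = phi_end / steps
    V = tuple(v)
    for _ in range(steps):
        k1 = transport_rhs(theta0, V)
        k2 = transport_rhs(theta0, tuple(V[i] + 0.5 * h * k1[i] for i in range(2)))
        k3 = transport_rhs(theta0, tuple(V[i] + 0.5 * h * k2[i] for i in range(2)))
        k4 = transport_rhs(theta0, tuple(V[i] + h * k3[i] for i in range(2)))
        V = tuple(V[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(2))
    return V

def orthonormal_components(theta0, V):
    """Components with respect to the orthonormal basis (4.116), as in (4.118)."""
    return (V[0], math.sin(theta0) * V[1])

def norm_squared(theta0, V):
    """g_{ab} V^a V^b for the metric (4.102) at theta = theta0."""
    return V[0] ** 2 + math.sin(theta0) ** 2 * V[1] ** 2

theta0 = math.pi / 3
v = (1.0, 0.0)
V_final = parallel_transport(theta0, v)
V1, V2 = orthonormal_components(theta0, V_final)
alpha = 2 * math.pi * math.cos(theta0)          # expected rotation angle after one circuit
V_equator = parallel_transport(math.pi / 2, (0.3, 0.7))
```

For $\theta_0 = \pi/3$ the vector comes back rotated by the angle $\pi$ with its norm unchanged, while along the equator the components stay constant, in agreement with remark 1.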
4.10 Fermi-Walker Parallel Transport
The properties of parallel transport established in section 4.8 show that this is a natural
prescription for transporting tensorial objects along a geodesic. However, it is important
to keep in mind that this is just one possible prescription, obtained by imposing the
differential equation (4.97), e.g. for a vector
$$D_\tau V^\mu = 0 . \tag{4.122}$$
For a curve that is not a geodesic, i.e. a curve with non-vanishing acceleration
$$a^\mu = D_\tau u^\mu = \dot{x}^\nu \nabla_\nu \dot{x}^\mu \neq 0 , \tag{4.123}$$
however, this prescription has some shortcomings. For example, parallel transport of a
tangent vector to the curve at a point to another point on the curve will not give rise to
the tangent vector at the second point, simply because $D_\tau V^\mu = 0$ with initial condition
$V^\mu(\tau_0) = u^\mu(\tau_0)$, say (parallel transport) is evidently not the same as
$D_\tau u^\mu = a^\mu$ (the equation satisfied by the tangent vector). Likewise, the scalar
product between the tangent vector to the (non-geodesic) curve and some
parallel-transported vector along it will not remain constant in general,
$$D_\tau u^\mu = a^\mu , \quad D_\tau V^\mu = 0 \;\Rightarrow\; \frac{d}{d\tau}(g_{\mu\nu} u^\mu V^\nu) = a_\mu V^\mu . \tag{4.124}$$
A vivid illustration of this is provided by the example of the previous section:
The latter procedure appears to be much more natural in this case than rotating one's
basis as one goes around the sphere. Analogously, for an observer along a timelike curve
it would be desirable to be able to set up once and for all a local reference system on
the worldline, consisting of the (unit) tangent vector $E_0 = u$ in the time-direction,
and three mutually orthogonal vectors $E_k$, orthogonal to $E_0$, in the spatial directions
(the laboratory system of the observer), regardless of whether the observer is in free
fall or not (indeed, most laboratories are not . . . ).
This procedure can be formalised by replacing the parallel transport condition (4.122)
along a timelike curve by the Fermi-Walker Transport prescription
$$D^F_\tau V^\mu \equiv D_\tau V^\mu + F^\mu_{\;\nu} V^\nu = 0 , \tag{4.125}$$
with
$$F^{\mu\nu} = a^\mu u^\nu - u^\mu a^\nu . \tag{4.126}$$
Indeed, parallel transport according to this prescription has the following desirable
features:
1. Along a geodesic it reduces to ordinary parallel transport, since
$$a^\mu = u^\nu \nabla_\nu u^\mu = 0 \;\Rightarrow\; F^{\mu\nu} = 0 . \tag{4.127}$$
2. The tangent vector is Fermi-Walker transported,
$$D^F_\tau u^\mu = 0 . \tag{4.128}$$
Proof:
$$D^F_\tau u^\mu = D_\tau u^\mu + F^\mu_{\;\nu} u^\nu = a^\mu + (a^\mu u_\nu - u^\mu a_\nu) u^\nu = a^\mu - a^\mu = 0 \tag{4.129}$$
because $u_\nu u^\nu = -1$ and $a_\nu u^\nu = 0$. Thus the solution to the Fermi-Walker
transport prescription for $V^\mu(\tau_0) = u^\mu(\tau_0)$ is just the tangent vector $u^\mu$,
$$D^F_\tau V^\mu = 0 , \quad V^\mu(\tau_0) = u^\mu(\tau_0) \;\Rightarrow\; V^\mu(\tau) = u^\mu(\tau) . \tag{4.130}$$
Remarks:
1. The signs chosen here are appropriate for timelike curves with $u^\mu u_\mu = -1$. As the
proofs of the above statements show, in the spacelike case one needs to replace
$F^{\mu\nu} \to -F^{\mu\nu}$.
3. Note that the properties 2-4 in the above list rely on the 3 properties
$$F^\mu_{\;\nu} u^\nu = -a^\mu , \quad u_\mu F^{\mu\nu} = a^\nu , \quad F^{\mu\nu} + F^{\nu\mu} = 0 . \tag{4.139}$$
These continue to hold under the replacement
$$F^{\mu\nu} \to F^{\mu\nu} + \Omega^{\mu\nu} \tag{4.140}$$
with
$$\Omega^{\mu\nu} u_\nu = u_\mu \Omega^{\mu\nu} = 0 , \quad \Omega^{\mu\nu} + \Omega^{\nu\mu} = 0 . \tag{4.141}$$
Since there is no such rotation term in the prescription for Fermi-Walker transport,
and no natural candidate for it either with only $u^\mu$ and $a^\mu$ at one's disposal, it
is natural to think of Fermi-Walker transport as a prescription for transporting
objects in a non-rotating way.
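As a simple check of the Fermi-Walker prescription (4.125) with $F^{\mu\nu} = a^\mu u^\nu - u^\mu a^\nu$, one can integrate it along a uniformly accelerated worldline in 1+1 dimensional Minkowski space. The sketch below is written under my own assumptions: inertial coordinates (so that $D_\tau = d/d\tau$), signature $(-,+)$, the worldline $u^\mu = (\cosh g\tau, \sinh g\tau)$, an arbitrary value for the proper acceleration $g$, and a textbook RK4 integrator. All names are hypothetical.

```python
import math

G = 0.7  # proper acceleration g of the worldline (arbitrary illustrative value)

def dot(A, B):
    """Minkowski scalar product in 1+1 dimensions, signature (-, +)."""
    return -A[0] * B[0] + A[1] * B[1]

def velocity(tau):
    """u^mu of the uniformly accelerated worldline; satisfies u.u = -1."""
    return (math.cosh(G * tau), math.sinh(G * tau))

def acceleration(tau):
    """a^mu = du^mu/dtau (inertial coordinates, so D_tau = d/dtau)."""
    return (G * math.sinh(G * tau), G * math.cosh(G * tau))

def fermi_walker_rhs(tau, V):
    """dV^mu/dtau = -F^mu_nu V^nu with F^{mu nu} = a^mu u^nu - u^mu a^nu."""
    u, a = velocity(tau), acceleration(tau)
    u_dot_V, a_dot_V = dot(u, V), dot(a, V)
    return tuple(-(a[i] * u_dot_V - u[i] * a_dot_V) for i in range(2))

def integrate(rhs, V0, tau_end, steps=2000):
    """Classical 4th-order Runge-Kutta from tau = 0 to tau_end."""
    h = tau_end / steps
    tau, V = 0.0, tuple(V0)
    for _ in range(steps):
        k1 = rhs(tau, V)
        k2 = rhs(tau + 0.5 * h, tuple(V[i] + 0.5 * h * k1[i] for i in range(2)))
        k3 = rhs(tau + 0.5 * h, tuple(V[i] + 0.5 * h * k2[i] for i in range(2)))
        k4 = rhs(tau + h, tuple(V[i] + h * k3[i] for i in range(2)))
        V = tuple(V[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(2))
        tau += h
    return V

TAU_END = 2.0
V_tangent = integrate(fermi_walker_rhs, velocity(0.0), TAU_END)   # starts as u(0)
V_spatial = integrate(fermi_walker_rhs, (0.0, 1.0), TAU_END)      # starts orthogonal to u(0)
```

In this example the transported tangent vector stays equal to $u^\mu(\tau)$, and the initially spatial unit vector stays orthogonal to $u^\mu$ with unit norm, whereas ordinary parallel transport in these coordinates would simply keep the components $(1, 0)$ and $(0, 1)$ fixed.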
4.11 Epilogue: Manifolds? Think Globally, Act Locally!
In section 3.9 I had already briefly discussed some issues regarding the use of indices (and
thus in some sense of local coordinates), and had advocated them as a useful bookkeeping
device that also provides a transparent way of performing algebraic operations (tensor
algebra). In the meantime we have seen that this extends to tensor analysis, and I can
only reiterate that for most purposes and in most cases it is much more convenient to
perform calculations in this notation than in some supposedly more elegant index-free
notation.
There is one issue, however, that is worth commenting upon, and that in the end actually
provides further justification for being allowed to adopt this procedure. Namely, in using
local (Cartesian, say) coordinates $x^\mu$ to describe a space or space-time (I will use "space"
in the following) one is implicitly assuming the following 3 things:
1. first of all, that one can always locally introduce Cartesian coordinates on that
space (so as to then be able to perform tensor algebra, tensor analysis etc.);
2. secondly, that different choices of local coordinates will give compatible descrip-
tions of that space;
3. and finally, that in principle one can obtain complete information about the space
by covering it with such local coordinate systems.
When these assumptions are satisfied, then one is justified in using local coordinates to
describe such a space. The point of this brief section is just to point out that (modulo
some topological fine-points) these conditions amount precisely to the definition of a
(differentiable or smooth) manifold in mathematics.
Thus while I could have started off these notes with an introduction to and definition
of smooth manifolds (and numerous textbooks do), for all local intents and purposes
this is then really equivalent to (consistently) working in local coordinates, as we have
done and will continue to do. It is true that the notion of manifolds, of vector bundles
on them etc. becomes indispensable for certain more advanced questions dealing with
the global structure of a space-time, or theorems about the existence and uniqueness of
solutions to differential equations on some manifold, say, but these are not topics that
will be addressed in these notes.
The usual textbook definition of a manifold consists essentially of the following steps:13
13
This presentation is adapted from the concise and clear description in S. Mukhi, N. Mukunda,
1. Topological Spaces
A topological space is a set $S$ together with a collection of subsets $U$ of $S$ (called
open sets) which includes $S$ and the empty set, and which is closed under union
and finite intersection. This set of open sets defines the topology of the space
and a corresponding notion of continuous maps (the inverse image of any open
set is open) and homeomorphisms (bijective maps $\phi$ such that both $\phi$ and $\phi^{-1}$
are continuous) between topological spaces. In particular there is a notion of
continuity for (real-valued, say) functions
$$f: S \to \mathbb{R} \tag{4.142}$$
2. Charts
However, in this context there is no notion of differentiability or differentiation.
In order to have such things at one's disposal one needs topological spaces that
locally look like $\mathbb{R}^n$. The essential building blocks of such a topological space
are charts:
A chart $C$ on a topological space $S$ is the pair $C = (U, \phi)$ where $U \subset S$ is an open
set of $S$ and $\phi$ is a homeomorphism
$$\phi: U \subset S \to \phi(U) \subset \mathbb{R}^n . \tag{4.143}$$
3. Topological Manifolds
A topological manifold is a topological space $M$ that is locally homeomorphic to
$\mathbb{R}^n$ in the sense that for each point $p$ there is a chart $C = (U, \phi)$ with $p \in U$
(and that satisfies some further topological regularity conditions we are not
interested in, such as Hausdorff and usually either second countable or paracompact).
Equivalently, a topological space has the structure of a topological manifold when
it possesses a covering by open sets $U_a$ with charts $C_a = (U_a, \phi_a)$.
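If one has two charts on $M$, $C_1 = (U_1, \phi_1)$ and $C_2 = (U_2, \phi_2)$, and
$U_1 \cap U_2 \neq \emptyset$, then the transition functions
$$\phi_1 \circ \phi_2^{-1}: \phi_2(U_1 \cap U_2) \to \phi_1(U_1 \cap U_2) , \qquad
\phi_2 \circ \phi_1^{-1}: \phi_1(U_1 \cap U_2) \to \phi_2(U_1 \cap U_2) \tag{4.144}$$
are homeomorphisms between open subsets of $\mathbb{R}^n$.
In a chart $(U, \phi)$, with local coordinates $\vec{x}_p = \phi(p)$, a function
$$f: M \to \mathbb{R} \tag{4.146}$$
is represented by the function
$$f_U = f \circ \phi^{-1}: \phi(U) \subset \mathbb{R}^n \to \mathbb{R} , \tag{4.147}$$
i.e.
$$p \in U \;\Rightarrow\; f(p) = f_U(\vec{x}_p) . \tag{4.148}$$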
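For such functions on $\mathbb{R}^n$ we now not only have a notion of continuity at our
disposal, but also the notions of differentiability, smoothness, differentiation etc.
On the intersection of 2 charts we can represent the function $f$ in 2 different ways
in terms of local coordinates, namely by the functions $f_{U_a} \equiv f_a$ for $a = 1, 2$,
$$\text{on } U_1 \cap U_2: \quad f = f_1 \circ \phi_1 = f_2 \circ \phi_2 \;\Rightarrow\;
f_2 = f_1 \circ (\phi_1 \circ \phi_2^{-1}) , \quad f_1 = f_2 \circ (\phi_2 \circ \phi_1^{-1}) \tag{4.149}$$
This is just the change of variables formula for a function (scalar), namely
$f_2(\vec{y}) = f_1(\vec{x}(\vec{y}))$ in terms of the local coordinates $\vec{x}$ and $\vec{y}$ of the 2 charts.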
6. Compatibility of Charts
In order to be able to extend the notion of smoothness ($C^\infty$-differentiability),
say, of a function from a local chart consistently to all of $M$, we need to impose
compatibility conditions on intersecting charts.
It is evident from (4.149) that the notion of smoothness of a function around a
point $p$ will only be independent of the chart if the transition functions
$\phi_1 \circ \phi_2^{-1}$ and $\phi_2 \circ \phi_1^{-1}$ (i.e. the coordinate transformations) are also
smooth. Thus we define 2 charts to be smoothly compatible if either $U_1 \cap U_2$ is empty
or, otherwise, if these maps are smooth.
Note that for topological manifolds and the condition of continuity any 2 charts
are automatically compatible since the transition functions are continuous.
7. Smooth Atlas and Compatibility and Equivalence of Atlases
A smooth atlas $A(M)$ of $M$ is now naturally a family of charts $C_a = (U_a, \phi_a)$
which cover $M$ and such that all charts are mutually smoothly compatible.
2 smooth atlases $A_1(M)$ and $A_2(M)$ for the same topological manifold $M$ are said
to be compatible with each other if all the charts of $A_1$ are compatible with all
the charts of $A_2$. This defines an equivalence relation on atlases.
ab = b 1
a : a (Ua ) Rm b (Vb ) Rn (4.155)
Analogously one can define $C^k$-differentiable manifolds (transition functions are required
to be of class $C^k$), real analytic manifolds (transition functions are required to be real
analytic), complex manifolds (modelled on open subsets of $\mathbb{C}^n$, with holomorphic
transition functions), etc., as well as submanifolds (modelled on subspaces of $\mathbb{R}^n$),
manifolds with boundary (modelled on the half-space $\mathbb{R}^n_+$) etc.
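The notions of chart, transition function and local representative of a function can be illustrated on the simplest compact example, the circle $S^1$. The following Python sketch is my own illustration rather than part of the standard definition: it uses the two stereographic charts on the unit circle (from the north pole $(0,1)$ and from the south pole $(0,-1)$), checks that the transition function is $t \mapsto 1/t$ (smooth away from $t = 0$), and verifies the change of variables formula (4.149) for a sample function. All names are hypothetical.

```python
import math

def chart_north(p):
    """Stereographic chart on U_N = S^1 minus the north pole (0, 1)."""
    x, y = p
    return x / (1.0 - y)

def chart_south(p):
    """Stereographic chart on U_S = S^1 minus the south pole (0, -1)."""
    x, y = p
    return x / (1.0 + y)

def chart_north_inverse(t):
    """Inverse of chart_north: coordinate t -> point on the unit circle."""
    return (2.0 * t / (t * t + 1.0), (t * t - 1.0) / (t * t + 1.0))

def chart_south_inverse(s):
    """Inverse of chart_south: coordinate s -> point on the unit circle."""
    return (2.0 * s / (s * s + 1.0), (1.0 - s * s) / (s * s + 1.0))

def transition_north_to_south(t):
    """phi_S o phi_N^{-1}, defined for t != 0 (the image of the overlap of U_N and U_S)."""
    return chart_south(chart_north_inverse(t))

def transition_south_to_north(s):
    """phi_N o phi_S^{-1}, defined for s != 0."""
    return chart_north(chart_south_inverse(s))

def f(p):
    """A sample function on the circle: the height y of the point."""
    return p[1]

def f_north(t):
    """Local representative f o phi_N^{-1}, as in (4.147)."""
    return f(chart_north_inverse(t))

def f_south(s):
    """Local representative f o phi_S^{-1}."""
    return f(chart_south_inverse(s))

samples = [0.3, -2.0, 5.0]
transition_errors = [abs(transition_north_to_south(t) - 1.0 / t) for t in samples]
# (4.149): f_S = f_N o (phi_N o phi_S^{-1})
compatibility_errors = [abs(f_south(s) - f_north(transition_south_to_north(s)))
                        for s in samples]
```

Since $t \mapsto 1/t$ is smooth on the overlap, these two charts are smoothly compatible and together form a smooth atlas of the circle.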
5 Physics in a Gravitational Field and Minimal Coupling
Recall that the Principle of General Covariance (section 3.1) says that, by virtue of
the Einstein Equivalence Principle, a generally covariant equation will be valid in an
arbitrary gravitational field provided that it is valid in Minkowski space in inertial
coordinates (i.e. in the absence of gravity and/or acceleration).
We now have all the tools at our disposal to construct such equations. In particular, the
fact that the covariant derivative maps tensors to tensors and reduces to the ordinary
partial derivative in a locally inertial coordinate system suggests the following procedure
or algorithm for obtaining equations that satisfy the Principle of General Covariance:
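$$\xi^a \mapsto x^\mu . \tag{5.1}$$
6. In particular, for the proper-time derivative along a curve this entails replacing
$d/d\tau$ by $D_\tau$,
$$\frac{d}{d\tau} \mapsto D_\tau = \dot{x}^\mu \nabla_\mu . \tag{5.5}$$
7. Wherever an integral $\int d^4\xi$ appears, replace it by $\int \sqrt{g}\, d^4x$,
$$\int d^4\xi \mapsto \int \sqrt{g}\, d^4x . \tag{5.6}$$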
By construction, the resulting equations or expressions are tensorial (generally covari-
ant) and true in the absence of gravity and hence satisfy the conditions for the Principle
of General Covariance to apply. As a consequence they will be true in the presence of
gravitational fields, at least on scales small compared to those of the gravitational fields.
This procedure can thus be regarded as providing us with a prescription for how to couple
matter (particles, fields) to the gravitational field.
Remarks:
2. The reason for the "at least on small scales" caveat in the paragraph above is
that if one considers higher derivatives of the metric tensor then there are other
equations that one can write down, involving e.g. the curvature tensor, that are
tensorial but reduce to the same equations in the absence of gravity.
We can see the power of the formalism we have developed so far by rederiving the laws
of particle mechanics in a general gravitational field. In Special Relativity (SR), the
motion of a free particle with mass m is governed by the equation
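$$\text{SR:} \quad a^a = \frac{du^a}{d\tau} = 0 , \tag{5.7}$$
where $u^a = d\xi^a/d\tau$ is the 4-velocity and $a^a$ the 4-acceleration. Thus, using the
principle of minimal coupling, the equation of motion of a free particle in a general
gravitational field is
$$\text{GR:} \quad a^\mu = D_\tau u^\mu = 0 \;\Leftrightarrow\; \ddot{x}^\mu + \Gamma^\mu_{\;\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = 0 . \tag{5.8}$$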
We could also have arrived at this equation for a free particle in a gravitational field by
applying the minimal coupling prescription not at the level of the equations of motion
but rather (and perhaps conceptually more satisfactorily) at the level of the action, i.e.
by replacing
$$S = -m \int d\tau = -m \int \sqrt{-\eta_{ab}\, d\xi^a d\xi^b} \;\to\; -m \int d\tau = -m \int \sqrt{-g_{\mu\nu}\, dx^\mu dx^\nu} , \tag{5.9}$$
and this is exactly what we already did back in section 1.7 where we showed that this
also leads to the geodesic equation (5.8).
Here is where the formalism we have developed really pays off. We will see once again
that, using the minimal coupling rule, we can immediately rewrite the equations for a
scalar field (here) and the Maxwell equations (in section 5.6 below) in a form in which
they are valid in an arbitrary gravitational field.
1. The action for a (real) free massive scalar field $\phi$ in Special Relativity is
$$\text{SR:} \quad S[\phi] = \int d^4\xi \left[ -\tfrac{1}{2} \eta^{ab} \partial_a \phi\, \partial_b \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \tag{5.10}$$
To covariantise this, we replace $d^4\xi \to \sqrt{g}\, d^4x$, $\eta^{ab} \to g^{\alpha\beta}$, and we can
replace $\partial_a$ by $\partial_\alpha$ or $\nabla_\alpha$ (since this makes no difference on scalars).
Therefore, the covariant action in a general gravitational field is
$$\text{GR:} \quad S[\phi, g_{\alpha\beta}] = \int \sqrt{g}\, d^4x \left[ -\tfrac{1}{2} g^{\alpha\beta} \partial_\alpha \phi\, \partial_\beta \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \tag{5.11}$$
Here I have also indicated the dependence of the action on the metric $g_{\alpha\beta}$. This
is not (yet) a dynamical field, though, just the gravitational background field.
Remarks:
(a) A comment on how to derive this: if one thinks of the $\partial_\alpha$ in the action as
covariant derivatives, $\partial_\alpha \to \nabla_\alpha$, then the calculation is identical to that in
Minkowski space provided that one remembers that $\nabla_\alpha g_{\beta\gamma} = 0$. If one sticks
with the ordinary partial derivatives, then upon the usual integration by
parts one picks up a term $\partial_\alpha(\sqrt{g}\, g^{\alpha\beta} \partial_\beta \phi)$ which then evidently leads to
the Laplacian in the form (4.56).
(b) If the relative sign of $\Box$ (or $g^{\alpha\beta} \nabla_\alpha \nabla_\beta$) and $m^2$ in the Klein-Gordon
equation looks unfamiliar to you, then this is probably due to the fact that in a
course where you first encountered the Klein-Gordon equation the opposite (particle
physicists') sign convention for the Minkowski metric was used, with its negative
definite spatial metric.
(c) All of this generalises in a straightforward way to (self-)interacting scalar
fields, described by a potential $V(\phi)$. In particular, the action is
$$S[\phi, g_{\alpha\beta}] = \int \sqrt{g}\, d^4x \left[ -\tfrac{1}{2} g^{\alpha\beta} \partial_\alpha \phi\, \partial_\beta \phi - V(\phi) \right] . \tag{5.13}$$
Logically the next thing to discuss would be the energy-momentum tensor, e.g. the
minimally coupled counterpart of the special relativistic (Noether) energy-momentum
tensor
$$\text{SR:} \quad T_{ab} = \partial_a \phi\, \partial_b \phi + \eta_{ab} L \tag{5.14}$$
and its properties. However, it turns out that there is more to say about this than meets
the eye, and we will therefore return to this issue in more detail in section 6.
Before turning to our next example, I want to briefly comment on the issue of general
covariance in Minkowski space, as this tends to generate quite a bit of confusion and
unnecessary debates. I will discuss this issue in the context of the above example of a
scalar field, but the discussion is valid more generally.
On the one hand, the action (5.10) is generally considered to be invariant (only) under
Lorentz or Poincaré transformations, while by construction the action (5.11) is invariant
under arbitrary coordinate transformations. Does this really mean that the theory of a
scalar field in a non-trivial gravitational background has more invariances than that in
a Minkowski background?
On the other hand, certainly nothing prevents one from using e.g. spherical (and thus in
particular non-inertial) coordinates in Minkowski space to write down the Klein-Gordon
equation or action. But does this mean that the action (5.10) is actually (secretly)
invariant also under such non-Lorentz transformations?
Well, that depends . . . While this sounds like (and generally is correctly considered to
be) a somewhat unsatisfactory answer, I can be more specific:
it depends on what one means by invariance (or covariance)
From the current point of view, the natural answer is that the action (5.11) is generally
covariant in any gravitational field, in particular therefore also in the absence of a
true gravitational field, i.e. in a purely fictitious gravitational field or, equivalently, in
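Minkowski space. If we specialise the action (5.11) to such a gravitational field, i.e. to
the Minkowski metric written in some perhaps non-inertial coordinates, we get
$$S[\phi, \eta_{\alpha\beta}] = \int \sqrt{\eta}\, d^4x \left[ -\tfrac{1}{2} \eta^{\alpha\beta} \partial_\alpha \phi\, \partial_\beta \phi - \tfrac{1}{2} m^2 \phi^2 \right] . \tag{5.15}$$
Here it is important to keep in mind that $\eta_{\alpha\beta}$ refers to the components of the
Minkowski metric in the not-necessarily inertial coordinates $x^\alpha$, as in
$$\eta_{\alpha\beta} = \frac{\partial \xi^a}{\partial x^\alpha} \frac{\partial \xi^b}{\partial x^\beta}\, \eta_{ab} . \tag{5.16}$$
As a consequence, also $\sqrt{\eta}$ is not necessarily equal to 1. This action is invariant under
arbitrary coordinate transformations, provided that one transforms the fields and the
metric appropriately.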
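To see the transformation rule (5.16) at work, one can compute the components of the Minkowski metric in spherical coordinates $x^\alpha = (t, r, \theta, \phi)$ from the Jacobian of the map to inertial coordinates $\xi^a = (t, x, y, z)$. The Python sketch below makes these assumptions explicit (units with $c = 1$, signature $(-+++)$, and $\sqrt{\eta}$ understood as the square root of the absolute value of the determinant); the function names are hypothetical.

```python
import math

ETA_INERTIAL = [[-1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]

def jacobian(r, theta, phi):
    """d xi^a / d x^alpha for xi = (t, x, y, z), x^alpha = (t, r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, st * cp, r * ct * cp, -r * st * sp],
        [0.0, st * sp, r * ct * sp, r * st * cp],
        [0.0, ct, -r * st, 0.0],
    ]

def minkowski_metric_spherical(r, theta, phi):
    """eta_{alpha beta} = (d xi^a/d x^alpha)(d xi^b/d x^beta) eta_{ab}, as in (5.16)."""
    J = jacobian(r, theta, phi)
    return [[sum(J[a][mu] * J[b][nu] * ETA_INERTIAL[a][b]
                 for a in range(4) for b in range(4))
             for nu in range(4)] for mu in range(4)]

def determinant(m):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * determinant(minor)
    return total

r, theta, phi = 2.0, 0.7, 1.3
eta = minkowski_metric_spherical(r, theta, phi)
sqrt_eta = math.sqrt(abs(determinant(eta)))   # equals r^2 sin(theta), not 1
```

The result is the familiar diagonal form $\mathrm{diag}(-1, 1, r^2, r^2\sin^2\theta)$ with $\sqrt{\eta} = r^2 \sin\theta$, which is the volume factor appearing in (5.15) in these coordinates.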
1. If one looks for the transformations of the coordinates $\xi^a$ and the fields $\phi$ that
leave the action invariant (with fixed metric components $\eta_{ab}$) then none too
surprisingly one finds that the action is invariant under Poincaré transformations of
the coordinates provided that the scalar fields transform as scalars, but not under
more general transformations.
2. If one looks for the transformations of the coordinates $\xi^a$ and the fields $\phi$ and
the metric $\eta_{ab}$ that leave the action invariant, then one finds that the action is
invariant under arbitrary coordinate transformations.
Sometimes option (1) is taken to define the invariance group (Poincaré transformations)
while option (2) refers to the covariance group. In this sense, special relativity is
invariant under Poincaré transformations but is at the same time generally covariant. In
philosophy of science or epistemological terms whether one has option (1) or option (2)
is related to the question whether or not the Minkowski metric is regarded as an absolute
element of the theory. With $\eta_{ab}$ promoted to an absolute element, general covariance
is reduced to Poincaré invariance (those transformations that, from the generally
covariant point of view, leave $\eta_{ab}$ invariant).14 Unfruitful discussions
ensue when tacitly conflicting assumptions are made about what are considered to be
the absolute elements of a theory.
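$$\vec{\nabla} \cdot \vec{B} = 0 , \qquad \vec{\nabla} \times \vec{E} + \partial_t \vec{B} = 0 \tag{5.17}$$
$$\vec{\nabla} \cdot \vec{E} = \rho/\epsilon_0 , \qquad \vec{\nabla} \times \vec{B} - \frac{1}{c^2} \partial_t \vec{E} = \mu_0 \vec{J} \tag{5.18}$$
$$\partial_t \rho + \vec{\nabla} \cdot \vec{J} = 0 . \tag{5.19}$$
4. the vector and scalar potentials $\vec{A}$ and $\phi$,
$$\vec{B} = \vec{\nabla} \times \vec{A} , \qquad \vec{E} = -\vec{\nabla} \phi - \partial_t \vec{A} \tag{5.20}$$
5. and the corresponding gauge transformations leaving $\vec{E}$ and $\vec{B}$ invariant,
$$\vec{A} \to \vec{A} + \vec{\nabla} \psi , \quad \phi \to \phi - \partial_t \psi \quad\Rightarrow\quad \vec{E} \to \vec{E} , \quad \vec{B} \to \vec{B} . \tag{5.21}$$
The charge density and current can be packaged into a Lorentz vector
(note that in signature $(-+++)$ one has to choose whether to identify $J^0$ or $J_0 = -J^0$
with the charge density, here we choose the former), and the continuity equation can
be written in the manifestly Lorentz-invariant form
$$\partial_a J^a = 0 . \tag{5.23}$$
Likewise, the scalar and vector potential can be packaged into a Lorentz covector
$$A_a = (-\phi/c, \vec{A}) , \tag{5.24}$$
on which the gauge transformations act as
$$A_a \to A_a + \partial_a \psi . \tag{5.25}$$
The gauge invariant field strength tensor is
$$F_{ab} = \partial_a A_b - \partial_b A_a \tag{5.26}$$
and
$$(F^{ab}) = \begin{pmatrix}
0 & +E_1/c & +E_2/c & +E_3/c \\
-E_1/c & 0 & +B_3 & -B_2 \\
-E_2/c & -B_3 & 0 & +B_1 \\
-E_3/c & +B_2 & -B_1 & 0
\end{pmatrix} \tag{5.29}$$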
In terms of these Lorentz tensors, the homogeneous Maxwell equations can be written
as
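$$\partial_{[a} F_{bc]} = 0 \;\Leftrightarrow\; \partial_a F_{bc} + \partial_c F_{ab} + \partial_b F_{ca} = 0 , \tag{5.30}$$
and these equations are identically satisfied if $F_{ab}$ derives from a potential,
with the Maxwell Lagrangian
$$-\tfrac{1}{4} F_{ab} F^{ab} = -\tfrac{1}{2} F_{0k} F^{0k} - \tfrac{1}{4} F_{ik} F^{ik} = \tfrac{1}{2} (\vec{E}^2/c^2 - \vec{B}^2) . \tag{5.34}$$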
This is essentially all we will need (some facts regarding the Noether versus covariant
energy-momentum tensor of Maxwell theory will be recalled below).
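The identity (5.34) can be checked numerically from the component matrix (5.29). The short Python sketch below builds $F^{ab}$ from given $\vec{E}$ and $\vec{B}$, lowers both indices with the Minkowski metric in signature $(-+++)$, and compares $-\frac{1}{4} F_{ab} F^{ab}$ with $\frac{1}{2}(\vec{E}^2/c^2 - \vec{B}^2)$. The numerical field values and the function names are arbitrary choices made for this illustration.

```python
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]   # Minkowski metric, signature (-+++)

def field_strength_upper(E, B, c=1.0):
    """F^{ab} as in (5.29), built from the electric and magnetic field components."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [
        [0.0, E1 / c, E2 / c, E3 / c],
        [-E1 / c, 0.0, B3, -B2],
        [-E2 / c, -B3, 0.0, B1],
        [-E3 / c, B2, -B1, 0.0],
    ]

def lower_indices(F_upper):
    """F_{ab} = eta_{ac} eta_{bd} F^{cd} for a diagonal metric."""
    return [[ETA_DIAG[a] * ETA_DIAG[b] * F_upper[a][b] for b in range(4)]
            for a in range(4)]

def maxwell_lagrangian(E, B, c=1.0):
    """-1/4 F_{ab} F^{ab}, the left-hand side of (5.34)."""
    F_upper = field_strength_upper(E, B, c)
    F_lower = lower_indices(F_upper)
    return -0.25 * sum(F_lower[a][b] * F_upper[a][b]
                       for a in range(4) for b in range(4))

E, B, c = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0), 2.0
lagrangian = maxwell_lagrangian(E, B, c)
expected = 0.5 * (sum(x * x for x in E) / c ** 2 - sum(x * x for x in B))
```

For these sample values both sides evaluate to $-0.875$.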
Mutatis mutandis we can now proceed in the same way as for a scalar field.
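1. The basic dynamical field is the vector potential $A_a$. Given the vector potential
$A_a$, the Maxwell field strength tensor in Special Relativity is
$F_{ab} = \partial_a A_b - \partial_b A_a$, as in (5.26).
Therefore in a general metric space-time (gravitational field) one is led to (or
tempted to) define the field strength tensor as
$$\text{GR:} \quad F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu . \tag{5.36}$$
The Maxwell equations in Special Relativity are
$$\text{SR:} \quad \partial_a F^{ab} = -J^b , \qquad \partial_{[a} F_{bc]} = 0 . \tag{5.37}$$
Thus in a general gravitational field (curved space-time) these equations become
$$\text{GR:} \quad \nabla_\mu F^{\mu\nu} = -J^\nu , \qquad \nabla_{[\mu} F_{\nu\lambda]} = 0 , \tag{5.38}$$
where now of course all indices are raised and lowered with the metric $g_{\mu\nu}$,
$$F^{\mu\nu} = g^{\mu\lambda} g^{\nu\rho} F_{\lambda\rho} . \tag{5.39}$$
Remarks: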
Remarks:
(a) Regarding the use of the covariant derivative in the second equation, the
same caveat as above applies.
(b) In particular, using the results derived in section 4.5, we can rewrite these
two equations as
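$$\text{GR:} \quad \partial_\mu (\sqrt{g}\, F^{\mu\nu}) = -\sqrt{g}\, J^\nu , \qquad \partial_{[\mu} F_{\nu\lambda]} = 0 . \tag{5.40}$$
(c) It is clear from the first of these equations that the Maxwell equations imply
that the current is covariantly conserved: since
$$\partial_\mu \partial_\nu (\sqrt{g}\, F^{\mu\nu}) = 0 \tag{5.41}$$
by the antisymmetry of $F^{\mu\nu}$, one has $\partial_\nu (\sqrt{g}\, J^\nu) = 0$, i.e. $\nabla_\nu J^\nu = 0$.
In Special Relativity, in the Lorenz gauge the Maxwell equations decouple,
$$\partial_a A^a = 0 \;\Rightarrow\; \partial_a F^{ab} = -J^b \;\Leftrightarrow\; \Box A^b = -J^b . \tag{5.43}$$
This gauge condition has the virtue of preserving Lorentz invariance. Similarly,
its covariantised version
$$\nabla_\mu A^\mu = 0 \tag{5.44}$$
preserves general covariance. In this gauge,
the covariant divergence of the Maxwell field strength tensor can be written
as
$$\nabla_\mu A^\mu = 0 \;\Rightarrow\; \nabla_\mu F^{\mu\nu} = \nabla_\mu (\nabla^\mu A^\nu - \nabla^\nu A^\mu) = \Box A^\nu - [\nabla_\mu, \nabla^\nu] A^\mu , \tag{5.45}$$
where $\Box A^\nu = \nabla_\mu \nabla^\mu A^\nu$ is the naive Laplacian on scalars. The second
term would of course be zero in Minkowski space, but here it is not. Indeed,
as we will see in section 7, the quintessence of a non-trivial geometry is
that covariant derivatives do not commute on tensors other than scalars.
In particular, here one finds that as a consequence of (7.38) the Maxwell
equations in the covariant Lorenz gauge can be written as
$$\Box A^\nu - R^\nu_{\;\mu} A^\mu = -J^\nu , \tag{5.46}$$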
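$$\text{SR:} \quad f^a = e F^a_{\;b}\, \dot{\xi}^b . \tag{5.47}$$
$$\text{GR:} \quad f^\mu = e\, g^{\mu\lambda} F_{\lambda\nu}\, \dot{x}^\nu . \tag{5.48}$$
in General Relativity.
As for the scalar field, depending on whether one writes the field strength tensor
as $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ or as $F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu$, by varying this
action with respect to the $A_\mu$ one obtains the vacuum Maxwell equations
$\nabla_\mu F^{\mu\nu} = 0$ in either of the 2 forms
$$\frac{\delta}{\delta A_\nu} S[A_\alpha, g_{\alpha\beta}] = 0 \;\Rightarrow\;
\begin{cases} \partial_\mu (\sqrt{g}\, F^{\mu\nu}) = 0 \\ \nabla_\mu F^{\mu\nu} = 0 \end{cases} \tag{5.51}$$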
Remarks:
(a) Writing out explicitly the Lagrangian in terms of its components (with re-
spect to some coordinate system x = (t, xk )) one finds
While the 1st and 2nd lines look just like gravitationally dressed standard
terms $\sim \vec{E}^2$ and $\sim \vec{B}^2$, the last line appears to suggest a gravitationally
induced coupling between the electric and magnetic fields. This, however,
is misleading and simply not a meaningful way of expressing things. After
all, even in Minkowski space the decomposition of the electro-magnetic field
into electric and magnetic fields depends on the choice of inertial reference
system.
(b) In order to add sources, one can add $\int \sqrt{g}\, d^4x\, A_\mu J^\mu$ to the Maxwell action,
thus coupling the matter current to the Maxwell gauge field. Instead of just
adding such a (phenomenological) source-term by hand, a more coherent
microscopic approach (which also provides the sources with their own dynamics)
is to consider a matter action (minimally) coupled to the Maxwell field,
$$S_M[\phi] \to S_M[\phi, A_\mu] . \tag{5.53}$$
The combined Maxwell + matter action will then give rise to the Maxwell
equations with a source provided that one defines the current $J^\mu$ as the
variation of the matter action with respect to the gauge field,
$$J^\mu \equiv \frac{\delta S_M[\phi, A_\mu]}{\delta A_\mu} . \tag{5.54}$$
In anticipation of this I just want to point out that, by the same rationale as that
leading to (5.54), perhaps we should define the source term for the gravitational field
by the variation of the gravitationally minimally coupled matter action with respect to
the metric. If we now call this source term the energy-momentum tensor, then we have
a candidate definition of the energy-momentum tensor which is natural and appropriate
from the gravitational point of view. We will pursue this point of view in section 6.6.
5.7 Minimal Coupling and (quasi-)Topological Couplings
In all the cases considered so far, the minimal coupling prescription resulted in a mini-
mally coupled matter action that depends explicitly on the metric - this is as it should
be and is not a surprise. What would be more of a surprise would be to find minimally
coupled and hence generally covariant contributions to an action that do not depend
on the metric, but such examples do indeed exist (and play an important role in many
branches of physics and even mathematics, ranging from the strong-CP problem in QCD
to high-$T_c$ superconductors to topology). Such terms in the action are usually referred
to as topological terms in the physics literature but as they need not be (and usually
are not) purely topological in the mathematics sense, for lack of a better name I refer
to them as quasi-topological.
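with
$$S_s[\phi] = \int d^4x\, L_s(\phi, \partial_\alpha \phi) \tag{5.56}$$
some arbitrary standard scalar field action (of the type already discussed), $S_m[A]$
the usual Maxwell action,
$$S_m[A] = \int d^4x\, L_m(\partial_\alpha A_\beta) = -\tfrac{1}{4} \int d^4x\, F_{\alpha\beta} F^{\alpha\beta} \tag{5.57}$$
where
$$\tilde{F}^{\alpha\beta} = \tfrac{1}{2} \epsilon^{\alpha\beta\gamma\delta} F_{\gamma\delta} \tag{5.59}$$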
Minimal coupling for the first two (standard) terms proceeds as already discussed
above. For the third, axionic term, we make the usual replacement $d^4x \to \sqrt{g}\, d^4x$
and recall from (3.65) that
$$\frac{1}{\sqrt{g}}\, \epsilon^{\alpha\beta\gamma\delta} \tag{5.60}$$
is a (4,0) tensor, so that the generally covariant generalisation of the axionic action
is
$$S_a[\phi, A, g_{\alpha\beta}] = \tfrac{1}{8} \int \sqrt{g}\, d^4x\, f(\phi)\, \frac{1}{\sqrt{g}}\, \epsilon^{\alpha\beta\gamma\delta} F_{\alpha\beta} F_{\gamma\delta}
= \tfrac{1}{8} \int d^4x\, f(\phi)\, \epsilon^{\alpha\beta\gamma\delta} F_{\alpha\beta} F_{\gamma\delta} = S_a[\phi, A] \tag{5.61}$$
We see that, as announced, the metric dependence drops out of the minimally coupled
generally covariant action. The reason for this is that the axionic Lagrangian
is already all by itself a scalar density of weight $w = +1$, and that therefore
its integral (3.58) is well-defined and generally covariant without having to take
recourse to a metric to construct an auxiliary weight-one object like $\sqrt{g}$.
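$$L = L_m + k L_{cs} = -\tfrac{1}{4} F_{\alpha\beta} F^{\alpha\beta} + \tfrac{1}{2} k\, \epsilon^{\alpha\beta\gamma} A_\alpha F_{\beta\gamma} . \tag{5.62}$$
Minimal coupling for the first term is standard and for the 2nd term one finds,
as above, that the generally covariant minimally coupled Chern-Simons action is
actually metric independent (since the Chern-Simons Lagrangian is a density of
weight $w = 1$),
$$S_{cs}[A, g_{\alpha\beta}] = \tfrac{1}{2} k \int \sqrt{g}\, d^3x\, \frac{1}{\sqrt{g}}\, \epsilon^{\alpha\beta\gamma} A_\alpha F_{\beta\gamma}
= \tfrac{1}{2} k \int d^3x\, \epsilon^{\alpha\beta\gamma} A_\alpha F_{\beta\gamma} = S_{cs}[A] \tag{5.63}$$
As an aside note that the above theory is also known as topologically massive
Maxwell theory, since the CS term provides a gauge-invariant mass term for the
photon. One quick way to see this is to note that the equations of motion are
$$\partial_\alpha F^{\alpha\beta} + k\, \epsilon^{\beta\gamma\delta} F_{\gamma\delta} = 0 . \tag{5.64}$$
In terms of the dual field strength vector
$$G^\alpha = \tfrac{1}{2} \epsilon^{\alpha\beta\gamma} F_{\beta\gamma} \tag{5.65}$$
the equations of motion and the Bianchi identity take the form
$$\partial_\alpha G_\beta - \partial_\beta G_\alpha = 2k\, \epsilon_{\alpha\beta\gamma} G^\gamma , \qquad \partial_\alpha G^\alpha = 0 \tag{5.66}$$
respectively. Acting with $\partial^\alpha$ on the equation of motion and using the Bianchi
identity and again the equation of motion one finds
$$\Box G_\beta = 2k\, \epsilon_{\alpha\beta\gamma} \partial^\alpha G^\gamma = k\, \epsilon_{\alpha\beta\gamma} (\partial^\alpha G^\gamma - \partial^\gamma G^\alpha)
= 2k^2\, \epsilon_{\alpha\beta\gamma} \epsilon^{\alpha\gamma\delta} G_\delta = 4k^2 G_\beta \tag{5.67}$$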
These quasi-topological terms modify the equations of motion. Moreover, since they
depend on the derivatives of the fields, they will contribute to the canonical Noether
energy-momentum tensor. On the other hand, since they do not depend on the metric,
they do not contribute to the covariant energy-momentum tensor, defined in section 6
in terms of the variation of the matter action with respect to the metric (and as such
playing the role of the source term for the Einstein gravitational field equations).
How it nevertheless conspires that this tensor is conserved on-shell even though the equa-
tions of motion have been modified and how the improved canonical energy-momentum
tensor nevertheless ends up agreeing with the covariant energy-momentum tensor on-
shell will be explored and explained in section 21.5.
where V is the four-volume R3 [t0 , t1 ]. This holds provided that J vanishes at spatial
infinity.
Now in General Relativity, the conservation law will be replaced by the covariant
conservation law $\nabla_\alpha J^\alpha = 0$, and one may wonder if this also leads to some conserved charges
in the ordinary sense. The answer is yes because, recalling the formula for the covariant
divergence of a vector,
$$ \nabla_\alpha J^\alpha = g^{-1/2}\,\partial_\alpha(g^{1/2} J^\alpha) , \qquad (5.70) $$
we see that
$$ \nabla_\alpha J^\alpha = 0 \quad\Leftrightarrow\quad \partial_\alpha(g^{1/2} J^\alpha) = 0 , \qquad (5.71) $$
so that $g^{1/2} J^\alpha$ is a conserved current in the ordinary sense. We then obtain conserved
quantities in the ordinary sense by integrating $J^\alpha$ over a spacelike hypersurface $\Sigma$. We
will develop a more precise formula for this, an appropriate version of the Gauss theorem
for hypersurfaces in curved space-times, in section 15.3.
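In adapted coordinates the statement can be sketched as follows. This is a hedged illustration only: we assume that the hypersurface is a surface of constant coordinate time $t$, that the current falls off at spatial infinity, and we call the resulting charge $Q$ (a name not used above); the precise covariant formula is the one deferred to section 15.3.

```latex
Q(t) = \int d^3x\; g^{1/2} J^t
\quad\Rightarrow\quad
\frac{dQ}{dt} = \int d^3x\; \partial_t\big(g^{1/2} J^t\big)
             = -\int d^3x\; \partial_k\big(g^{1/2} J^k\big) = 0 .
```

The last step is the ordinary Gauss theorem applied to the total spatial divergence.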
The factor $g^{1/2}$ appearing in the current conservation law can be understood physically.
To see what it means, split $J^\alpha$ into its space-time direction $u^\alpha$, with $u^\alpha u_\alpha = -1$, and
its magnitude $\rho$ as
$$ J^\alpha = \rho\, u^\alpha . \qquad (5.72) $$
This defines the average four-velocity $u^\alpha$ of the conserved quantity represented by $J^\alpha$ and
its density $\rho$ measured by an observer moving at that average velocity (rest mass density,
charge density, number density, . . . ). Since $u^\alpha$ is a vector, in order for $J^\alpha$ to be a vector,
$\rho$ has to be a scalar. Therefore this density is defined as per unit proper volume. The
factor of $g^{1/2}$ transforms this into density per coordinate volume and this quantity is
conserved (in a comoving coordinate system where $J^0 = \rho$, $J^i = 0$).
We will come back to this in the context of cosmology later on in this course, but
for now just think of the following picture (Figure 44 in section 33): take a balloon,
draw lots of dots on it at random, representing particles or galaxies. Next choose some
coordinate system on the balloon and draw the coordinate grid on it. Now inflate
or deflate the balloon. This represents a time dependent metric, roughly of the form
$ds^2 = r^2(t)(d\theta^2 + \sin^2\theta\, d\phi^2)$. You see that the number of dots per coordinate volume
element (area element in this case) does not change, whereas the number of dots per
unit proper volume (area) will.
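The balloon picture can be turned into a small numerical check. The sketch below is only an illustration under stated assumptions: the dots are taken to be uniformly distributed, the total number `n_dots` and the sample radii are arbitrary, and the function names are our own.

```python
import math

def proper_density(n_dots, r):
    """Dots per unit proper area on a sphere of radius r (the scalar rho)."""
    return n_dots / (4.0 * math.pi * r**2)

def coordinate_density(n_dots, r, theta):
    """Dots per unit coordinate area d(theta) d(phi), i.e. sqrt(g) * rho."""
    # sqrt(g) for ds^2 = r^2 (dtheta^2 + sin^2(theta) dphi^2)
    sqrt_g = r**2 * math.sin(theta)
    return sqrt_g * proper_density(n_dots, r)

n_dots = 1000          # assumed total number of dots
theta = math.pi / 3    # arbitrary point on the balloon
for r in (1.0, 2.0, 5.0):
    # The proper density drops like 1/r^2, the coordinate density stays fixed.
    print(r, proper_density(n_dots, r), coordinate_density(n_dots, r, theta))
```

Inflating the balloon changes the first printed density but not the second, which is the content of the conservation law for $g^{1/2} J^0$ in comoving coordinates.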
6 Energy-Momentum Tensor I: Basics
6.1 Introduction
Newton's gravitational field equation for the gravitational potential $\phi$ is the Poisson
equation $\Delta\phi = 4\pi G_N \mu$, with $\mu$ the mass density. Thus in Newton's theory, mass is the
source of gravity. We can also more usefully, and thinking relativistically, write this in
terms of the energy density $\rho = \mu c^2$ as
$$ \Delta\phi = \frac{4\pi G_N}{c^2}\,\rho \;\overset{(c=1)}{=}\; 4\pi G_N\, \rho . \qquad (6.1) $$
Now we already noted in section 1.1 that in Special Relativity $\rho$ is not a scalar but
rather just one component of a tensor, the energy-momentum tensor $T_{ab}$,
with the components $T_{ab}$ transforming into each other under Lorentz transformations
according to the transformation rules for Lorentz tensors.
Within the framework of special relativity and relativistic field theories there are (at
least) two common approaches to constructing or defining an energy-momentum tensor,
namely a macroscopic phenomenological description and a microscopic Lagrangian
description.
A macroscopic phenomenological description is useful when one does not know (or does
not care about) the microscopic description of the matter one is dealing with but rather
tries to characterise its properties in terms of the specification of some macroscopic
(thermodynamic, hydrodynamic) parameters such as energy (density), pressure, viscos-
ity etc. For many purposes this is the appropriate language for describing e.g. gases or
fluids.
In this case, one constructs the energy-momentum tensor in such a way that it encodes
the physics one is trying to describe (primarily conservation laws and dynamics). As
a simple example of this (not by coincidence the one which is of most relevance for
gravitational physics and thus also later on in these notes), we consider a perfect fluid.
By definition, a perfect fluid is one in which a comoving observer (i.e. an observer in a local
rest-frame of the fluid) sees the fluid around him as isotropic (rotation-invariant). This
means that in this reference system the components of the energy-momentum tensor
have the form (any non-zero $T_{0k}$ would break rotation invariance, and $\delta_{ik}$ is the unique
rotation-invariant symmetric (0, 2)-tensor)
$$ T_{00} = \rho , \qquad T_{0k} = 0 , \qquad T_{ik} = p\,\delta_{ik} . \qquad (6.3) $$
Here $\rho$ and $p$ are any functions of the coordinates, interpreted as the energy density and
the pressure of the fluid.
To specify the kind of fluid one is working with, one should supplement this by an
equation of state which provides a relation between $\rho$ and $p$. Typically this amounts to
specifying $p$ as a function of $\rho$,
$$ p = p(\rho) . \qquad (6.4) $$
In terms of the 4-velocity $u^a$ of the fluid, which in the local rest frame has the
components
$$ u^a = (1, 0, 0, 0) , \qquad (6.5) $$
one can combine the components of the energy-momentum tensor into the expression
$$ T_{ab} = (\rho + p)\,u_a u_b + p\,\eta_{ab} \qquad (6.6) $$
(note that energy density and pressure = force per unit area have the same dimensions).
As this is now a tensorial equation it is valid in any inertial system. It defines the
energy-momentum tensor of a perfect fluid. The conditions
$$ \partial^a T_{ab} = 0 \qquad (6.7) $$
imply a continuity equation and (as we will see below) a relativistic generalisation of
the Euler equations for a perfect fluid. These are usually supplemented by a further
continuity equation for the fluid density current
$$ j^a = n u^a , \qquad (6.8) $$
$$ \partial_a j^a = 0 . \qquad (6.9) $$
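As a numerical illustration of the perfect-fluid energy-momentum tensor, the following sketch builds $T_{ab} = (\rho + p)u_a u_b + p\,\eta_{ab}$ for a fluid at rest and for a fluid boosted along $x$. The function names, the sample values of $\rho$, $p$ and the velocity are our own assumptions, with $c = 1$ and signature $(-,+,+,+)$.

```python
import math

# Minkowski metric eta_ab = diag(-1, 1, 1, 1); it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def four_velocity_lower(v):
    """Covariant components u_a for a 3-velocity v = (vx, vy, vz), |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v))
    u_up = [gamma] + [gamma * x for x in v]
    return [sum(ETA[a][b] * u_up[b] for b in range(4)) for a in range(4)]

def perfect_fluid_T(rho, p, v):
    """T_ab = (rho + p) u_a u_b + p eta_ab."""
    u = four_velocity_lower(v)
    return [[(rho + p) * u[a] * u[b] + p * ETA[a][b] for b in range(4)]
            for a in range(4)]

def trace(T):
    """eta^{ab} T_ab for the diagonal Minkowski metric."""
    return sum(ETA[a][a] * T[a][a] for a in range(4))

T_rest = perfect_fluid_T(1.0, 0.25, (0.0, 0.0, 0.0))    # rest frame
T_moving = perfect_fluid_T(1.0, 0.25, (0.6, 0.0, 0.0))  # boosted along x

print(T_rest[0][0], T_rest[1][1])       # rho and p in the rest frame
print(trace(T_rest), trace(T_moving))   # the trace 3p - rho is frame independent
```

In the rest frame one recovers $T_{00} = \rho$, $T_{0k} = 0$, $T_{ik} = p\,\delta_{ik}$, while in the moving frame $T_{00} = \gamma^2(\rho + p) - p$; the trace is the same in both frames, as it must be for a tensorial expression.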
Now let us look at the consequences of these equations. Since
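With the help of the current conservation equation, this equation can be recast into
the form
$$ \begin{aligned}
0 &= u^a\partial_a\rho + (\rho + p)\,\partial_a(j^a/n) \\
  &= u^a\partial_a\rho + (\rho + p)\,j^a\partial_a(1/n) \\
  &= u^a\left[\partial_a\rho + (\rho + p)\,n\,\partial_a(1/n)\right] \\
  &= n u^a\left[p\,\partial_a(1/n) + \partial_a(\rho/n)\right] .
\end{aligned} \qquad (6.12) $$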
The point of rewriting the equation in this way is that (assuming a situation of
thermodynamic equilibrium) the 2nd law of thermodynamics says that pressure $p$, energy
density $\rho$ and the volume per particle $(1/n)$ are related by
$$ T\,ds = p\,d(1/n) + d(\rho/n) , \qquad (6.13) $$
where $T$ is the temperature and $s$ the specific entropy, i.e. the entropy per particle (see footnote 15).
Thus the above equation says that the specific entropy $s$ is constant along the flow,
$$ u^a\partial_a s = 0 . \qquad (6.14) $$
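The step from (6.12) to (6.14) can be spelled out in one line. This is a sketch which assumes the thermodynamic relation in its standard form $T\,ds = p\,d(1/n) + d(\rho/n)$ and a non-vanishing product $nT$:

```latex
0 = n u^a\left[\,p\,\partial_a(1/n) + \partial_a(\rho/n)\,\right]
  = n T\, u^a\partial_a s
\quad\Rightarrow\quad
u^a\partial_a s = 0 \qquad (nT \neq 0) .
```

In other words, the differential relation is applied along the flow lines, with $d \to u^a\partial_a$.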
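In terms of the velocity $\vec v$ of the fluid one has $u^a = \gamma(v)(1, \vec v)$, so that
$$ u^a\partial_a = \gamma(v)(\partial_t + \vec v\cdot\vec\nabla) \qquad (6.16) $$
is ($\gamma(v)$ times) the usual convective derivative or comoving time-derivative, and the
above equation for the conservation of the specific entropy can be written as
$$ (\partial_t + \vec v\cdot\vec\nabla)\,s = 0 . \qquad (6.17) $$
Similarly, the current conservation equation (6.9) becomes
$$ \partial_t(\gamma(v)n) + \vec\nabla\cdot(\gamma(v)n\,\vec v) = 0 , \qquad (6.19) $$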
15
See e.g. J. van Holten, Relativistic Fluid Dynamics, http://www.nikhef.nl/~t32/relhyd.pdf for
a derivation of this and further discussion.
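and the time-component of (6.7) can be written as
$$ \partial_t\big(p - \gamma(v)^2(\rho + p)\big) - \vec\nabla\cdot\big[\gamma(v)^2(\rho + p)\,\vec v\,\big] = 0 . \qquad (6.20) $$
Using this equation the spacelike components of (6.7) can then be written as
$$ \gamma(v)^2(\rho + p)(\partial_t\vec v + \vec v\cdot\vec\nabla\,\vec v) + \vec v\,\partial_t p + \vec\nabla p = 0 . \qquad (6.21) $$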
For a covariant rendition and elementary covariant derivation of the ensuing equations
of motion in a general gravitational field from the conservation of the energy-momentum
tensor, see e.g. the derivation of (34.73) and (34.74) in section 34.3.
A microscopic Lagrangian description is the method of choice when one has a
Poincaré-invariant Lagrangian field theory description of the matter one is trying to describe.
In particular, this applies to the scalar and Maxwell field theories we have already
discussed and, more generally, to the modern microscopic and action-based description
of the fundamental interactions of particle physics.
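For a Lagrangian $L = L(\phi, \partial_a\phi)$ depending on some fields $\phi$ and their 1st derivatives
(these could be scalar, vector, . . . fields), this tensor is defined by
$$ \Theta_{ab} = -\frac{\partial L}{\partial(\partial^a\phi)}\,\partial_b\phi + \eta_{ab}L \qquad (6.23) $$
(sign conventions are such that $\Theta_{00}$ rather than $-\Theta_{00}$ is the energy density). It is built
from the 4 Noether currents
$$ \Theta^a{}_b \equiv J^a_{(b)} \qquad (6.24) $$
associated to translation invariance in the $x^b$-direction, $\delta_{(b)}\phi = \partial_b\phi$. By calculating its
divergence, one finds
$$ \partial^a\Theta_{ab} = \frac{\delta L}{\delta\phi}\,\partial_b\phi , \qquad (6.25) $$
where $\delta L/\delta\phi$ is the Euler-Lagrange variational derivative,
$$ \frac{\delta L}{\delta\phi} = \frac{\partial L}{\partial\phi} - \partial_a\frac{\partial L}{\partial(\partial_a\phi)} . \qquad (6.26) $$
Thus this tensor is conserved for solutions of the equations of motion,
$$ \partial^a\Theta_{ab} = 0 \quad\text{on-shell} , \qquad (6.27) $$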
This procedure and prescription is perfectly adequate and sufficient for scalar (spin 0)
fields, but it turns out to be far from satisfactory and far from the end of the story for
other fields (e.g. for Maxwell theory, for which $\Theta_{ab}$ turns out to be neither symmetric
nor gauge invariant). In this more general situation one is then required to improve
this prescription in order to obtain an energy-momentum tensor $T_{ab}$ with the desired
properties.
As a first example where everything works out nicely, consider the energy-momentum
tensor of a Klein-Gordon scalar field in Minkowski space. In this case,
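$$ \Theta_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L = \partial_a\phi\,\partial_b\phi - \tfrac12\eta_{ab}\left(\eta^{cd}\partial_c\phi\,\partial_d\phi + m^2\phi^2\right) \qquad (6.29) $$
with
$$ \Theta_{00} = \tfrac12\left(\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right) . \qquad (6.30) $$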
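For completeness, here is the short calculation behind the expression for the energy density. It is a sketch assuming the mostly-plus signature, $\eta_{00} = -1$, and the Klein-Gordon Lagrangian $L = -\tfrac12\eta^{cd}\partial_c\phi\,\partial_d\phi - \tfrac12 m^2\phi^2$:

```latex
L = \tfrac12\dot\phi^2 - \tfrac12(\vec\nabla\phi)^2 - \tfrac12 m^2\phi^2 ,
\qquad
\Theta_{00} = \dot\phi^2 + \eta_{00}L
            = \dot\phi^2 - \tfrac12\dot\phi^2 + \tfrac12(\vec\nabla\phi)^2 + \tfrac12 m^2\phi^2
            = \tfrac12\left(\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right) .
```

The result is manifestly non-negative, as an energy density should be.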
$$ \Theta_{ba} = \Theta_{ab} . \qquad (6.32) $$
In particular, this implies that the angular momentum current associated to an
infinitesimal Lorentz transformation (1.22) with parameters $\omega_{bc} = -\omega_{cb}$, namely
$$ L^a = \tfrac12\,\omega_{bc}L^{abc} \qquad (6.33) $$
with
$$ L^{abc} = x^b\Theta^{ac} - x^c\Theta^{ab} , \qquad (6.34) $$
is on-shell conserved,
$$ \partial_a L^{abc} = \Theta^{bc} - \Theta^{cb} = 0 . \qquad (6.35) $$
Since $\Theta_{ab}$ is symmetric (and gauge invariance is not an issue), in this example there is
no need to improve the Noether energy-momentum tensor, and we thus denote it by
$T_{ab}$,
$$ T_{ab} = \Theta_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L \qquad (6.36) $$
As we will see below, it is also straightforward to promote this tensor by minimal
coupling to a (covariantly conserved) energy-momentum tensor of a scalar field in a
gravitational field,
Now let us take a look at Maxwell theory in Minkowski space. In this case the canonical
Noether energy-momentum tensor is
$$ \Theta_{ab} = -\frac{\partial L}{\partial(\partial^a A_c)}\,\partial_b A_c + \eta_{ab}L = F_{ac}\,\partial_b A^c - \tfrac14\eta_{ab}F_{cd}F^{cd} . \qquad (6.37) $$
It is of course on-shell conserved by construction,
$$ \partial^a\Theta_{ab} = 0 \quad\text{on-shell} \qquad (6.38) $$
(note that both sets of Maxwell equations are required to derive this), but it is neither
symmetric nor gauge-invariant. In particular, therefore, the angular momentum current
(6.34) is not conserved (even though Maxwell theory is Lorentz invariant), and the
expression for the energy-density is not gauge-invariant and does not agree with the
standard expression
$$ \Theta_{00} \neq \tfrac12(\vec E^2 + \vec B^2) . \qquad (6.39) $$
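This can be rectified by manipulating $\Theta_{ab}$ as
$$ \Theta_{ab} = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd} + F_{ac}\,\partial^c A_b \qquad (6.40) $$
and noting that the last term can be written as a sum of two terms,
$$ F_{ac}\,\partial^c A_b = \partial^c(F_{ac}A_b) - (\partial^c F_{ac})A_b . \qquad (6.41) $$
The first of these is identically conserved,
$$ \partial^a\partial^c(F_{ac}A_b) = 0 , \qquad (6.42) $$
so that the tensor
$$ \tilde\Theta_{ab} = \Theta_{ab} - \partial^c(F_{ac}A_b) \qquad (6.44) $$
is also on-shell conserved,
$$ \partial^a\tilde\Theta_{ab} = 0 \quad\text{on-shell} , \qquad (6.45) $$
as well as on-shell gauge invariant,
$$ \tilde\Theta_{ab} = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd} - (\partial^c F_{ac})A_b = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd} \quad\text{on-shell} . \qquad (6.46) $$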
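The gauge (non-)invariance statements can be checked explicitly. The following sketch assumes the canonical tensor in the form $\Theta_{ab} = F_{ac}\,\partial_b A^c - \tfrac14\eta_{ab}F_{cd}F^{cd}$, the improved tensor $\tilde\Theta_{ab} = \Theta_{ab} - \partial^c(F_{ac}A_b)$, and a gauge transformation $A_a \to A_a + \partial_a\lambda$, where the gauge parameter $\lambda$ is a name introduced here for illustration:

```latex
\delta_\lambda F_{ab} = 0 ,
\qquad
\delta_\lambda\Theta_{ab} = F_{ac}\,\partial_b\partial^c\lambda \neq 0 ,
\qquad
\delta_\lambda\tilde\Theta_{ab} = -(\partial^c F_{ac})\,\partial_b\lambda = 0 \quad\text{on-shell} .
```

The canonical tensor thus changes under a gauge transformation even on-shell, whereas the only gauge-dependent term of the improved tensor is proportional to the vacuum Maxwell equations $\partial^c F_{ac} = 0$.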
Therefore one can define the improved energy-momentum tensor
(again both sets of Maxwell equations are required to establish this; with an
external source,
$$ \partial_{[a}F_{bc]} = 0 , \qquad \partial_a F^{ab} = -J^b \qquad (6.49) $$
one has the non-conservation law
instead, which becomes a conservation law when one adds to Tab the energy-
momentum tensor of the source fields + interaction terms);
Moreover, the components $T_{0k}$ are the components of the Poynting vector and the
spatial components $T_{ik}$ are the components of the Maxwell stress tensor. Thus $T_{ab}$ is
the correct energy-momentum tensor of Maxwell theory.
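These properties can be checked numerically for sample field values. The sketch below assumes the improved tensor in the standard form $T_{ab} = F_{ac}F_b{}^c - \tfrac14\eta_{ab}F_{cd}F^{cd}$, signature $(-,+,+,+)$, and the convention $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$; the convention, the function names and the numbers are our own assumptions.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_ab (equal to that of eta^ab)

def field_strength(E, B):
    """F_ab with F_0i = -E_i and F_ij = eps_ijk B_k (assumed convention)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]

def maxwell_T(F):
    """T_ab = F_ac F_b^c - (1/4) eta_ab F_cd F^cd."""
    F2 = sum(ETA[c] * ETA[d] * F[c][d] ** 2 for c in range(4) for d in range(4))
    return [[sum(F[a][c] * F[b][c] * ETA[c] for c in range(4))
             - 0.25 * (ETA[a] if a == b else 0.0) * F2
             for b in range(4)] for a in range(4)]

E = (1.0, 2.0, 3.0)
B = (0.5, -1.0, 2.0)
T = maxwell_T(field_strength(E, B))

energy = 0.5 * (sum(x * x for x in E) + sum(x * x for x in B))
trace = sum(ETA[a] * T[a][a] for a in range(4))
print(math.isclose(T[0][0], energy))   # T_00 = (E^2 + B^2) / 2
print(abs(trace) < 1e-12)              # traceless in four dimensions
# In this convention T_0k = -(E x B)_k, i.e. the Poynting vector up to the
# sign that comes from lowering the time index.
print(T[0][1], -(E[1] * B[2] - E[2] * B[1]))
```

The tensor is symmetric by construction, its 00-component reproduces the standard energy density, and its mixed components reproduce the Poynting vector up to the index-position sign noted in the comment.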
This procedure to obtain $T_{ab}$ from $\Theta_{ab}$ can be understood in a more general and
systematic way, via the so-called Belinfante improvement (or symmetrisation) procedure.
A brief synopsis of this construction will be provided in section 6.4 below.
One of the many useful properties of a symmetric, conserved energy-momentum tensor,
and one that is occasionally used in general relativity, e.g. in the discussion of the energy
and energy flux of gravitational waves, is the Laue Theorem (or tensor virial theorem).
It states that for such an energy-momentum tensor and a localised source (so that one
can integrate by parts with impunity) one has the relation
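$$ \left.\begin{array}{l} \partial^a T_{ab} = 0 , \;\; T_{ab} = T_{ba} \\ \text{localised source} \end{array}\right\} \quad\Rightarrow\quad \int d^3x\; T^{ik} = \tfrac12(\partial_0)^2\int d^3x\; T_{00}\,x^i x^k \qquad (6.53) $$
between the integrated spatial components $T^{ik}$ and the quadrupole moments
$$ Q^{ik}(t) = \int d^3x\; T_{00}\,x^i x^k \qquad (6.54) $$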
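$$ \begin{aligned}
\tfrac12(\partial_0)^2\int d^3x\; T_{00}\,x^i x^k
 &= +\tfrac12\,\partial_0\int d^3x\; (\partial_j T^j{}_0)\,x^i x^k \\
 &= -\tfrac12\,\partial_0\int d^3x\; (T^i{}_0\,x^k + T^k{}_0\,x^i) \\
 &= -\tfrac12\int d^3x\; (\partial_0 T^i{}_0\,x^k + \partial_0 T^k{}_0\,x^i) \\
 &= -\tfrac12\int d^3x\; \big((\partial_j T^{ij})\,x^k + (\partial_j T^{kj})\,x^i\big) \\
 &= +\int d^3x\; T^{ik} .
\end{aligned} \qquad (6.57) $$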
The procedure to obtain a symmetric and conserved $T_{ab}$ from the canonical Noether
energy-momentum tensor $\Theta_{ab}$ of a Poincaré-invariant field theory, illustrated above in
the case of Maxwell theory, can be understood in a more general and systematic, but also
somewhat round-about way by appealing to the Lorentz-invariance of the action and
taking into account the non-trivial transformation behaviour of the fields with spin $\neq 0$
under Lorentz transformations. This recipe is known as the Belinfante improvement
procedure (see footnote 16). Here is, just for reference purposes, a brief description of the general
features of this construction:
16
This is explained in many places, with varying degree of comprehensibility or comprehension. For
a detailed explanation, geared also towards applications to general relativity, see section 2 of T. Ortin,
Gravity and Strings; for a succinct description, and an extension of the usual procedure to Lagrangians
depending also on second derivatives of the fields, see section II of D. Bak, D. Cangemi, R. Jackiw,
Energy-Momentum Conservation in General Relativity, arXiv:hep-th/9310025.
In general (with the exception of spin zero scalar fields), $\Theta^{ab} = \Theta^a{}_c\,\eta^{cb}$ is not
symmetric,
$$ \Theta^{ab} \neq \Theta^{ba} . \qquad (6.58) $$
By Lorentz invariance of the action and Noether's theorem, the total (orbital +
spin) angular momentum should be conserved, and the above (purely orbital)
angular momentum current fails to be conserved because it does not take into
account the spin, i.e. the fact that the $\phi$ are possibly non-trivial Lorentz tensors
(an irrelevant fact as far as the translational symmetries and hence the Noether
energy-momentum tensor are concerned).
This can be rectified by constructing the conserved total angular momentum
current $J^{abc}$ directly from Noether's theorem applied to Lorentz transformations
$\delta\phi = \delta_L\phi$ of the fields and coordinates. This gives rise to an additional (spin)
contribution to the current, schematically of the form
$$ J^a = J^a_{\text{orbit}} + \frac{\partial L}{\partial(\partial_a\phi)}\,\delta_L\phi , \qquad J^{a[bc]}_{\text{orbit}} = L^{abc} . \qquad (6.60) $$
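From the conservation of this current one can then via some gymnastics deduce
and extract a candidate energy-momentum tensor $\tilde\Theta^{ab}$ which is such that the total
angular momentum current $J^{abc}$ takes the form
$$ J^{abc} = x^b\tilde\Theta^{ac} - x^c\tilde\Theta^{ab} . \qquad (6.61) $$
Note that the spin-contribution to the total angular momentum has in this way
been transformed into an orbital contribution with respect to the new
energy-momentum tensor $\tilde\Theta^{ab}$.
This tensor has the form
$$ \tilde\Theta^{ab} = \Theta^{ab} + \partial_c\Psi^{cab} , \qquad (6.62) $$
with
$$ \Psi^{cab} = -\Psi^{acb} \quad\Rightarrow\quad \partial_a\partial_c\Psi^{cab} \equiv 0 \qquad (6.63) $$
so that
$$ \partial_a\Theta^{ab} = 0 \;\;\text{on-shell} \quad\Rightarrow\quad \partial_a\tilde\Theta^{ab} = 0 \;\;\text{on-shell} . \qquad (6.64) $$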
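Addition of such a term to the energy-momentum tensor is always possible as it
does not violate the conservation law. While this changes the definition of the
local energy and momentum densities, with suitable fall-off conditions on the $\Psi^{abc}$
this has no effect on the total energy-momentum $P^b$ (6.28),
$$ P^b \to P^b + \int d^3x\; \partial_c\Psi^{c0b} = P^b + \int d^3x\; \partial_k\Psi^{k0b} = P^b + \oint dS_k\; \Psi^{k0b} . \qquad (6.65) $$
By construction, conservation of the current (6.61) implies that $\tilde\Theta^{ab}$ is on-shell symmetric,
$$ \partial_a J^{abc} = 0 \;\;\text{on-shell} \quad\Rightarrow\quad \tilde\Theta^{ab} = \tilde\Theta^{ba} \;\;\text{on-shell} . \qquad (6.66) $$
Thus on-shell $\tilde\Theta^{ab}$ agrees with a tensor $T^{ab}$, which can be chosen to be symmetric
(off-shell) and on-shell conserved,
$$ \tilde\Theta^{ab} \to T^{ab} : \qquad T^{ab} = T^{ba} \;\;\text{off-shell} , \qquad \partial_a T^{ab} = 0 \;\;\text{on-shell} . \qquad (6.67) $$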
Given the success of the minimal coupling prescription, it is natural to try to define the
matter energy-momentum tensor in a gravitational field in the same way. While this is
certainly possible to a certain extent (as the examples will show), this procedure also
leaves something to be desired (as the examples will also show).
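Following the minimal coupling rules, we promote this to the energy-momentum tensor
$$ T_{\alpha\beta} = (\rho + p)\,u_\alpha u_\beta + p\,g_{\alpha\beta} , \qquad (6.70) $$
where $u^\alpha$ denotes the proper-time normalised velocity field of the fluid, $g_{\alpha\beta}u^\alpha u^\beta = -1$.
The covariantisation of the conservation law (6.7) evidently reads
$$ \partial^a T_{ab} = 0 \quad\to\quad \nabla^\alpha T_{\alpha\beta} = 0 . \qquad (6.71) $$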
This generalises the continuity equation and the relativistic Euler equations to a fluid
moving in a gravitational field and reduces to the special relativistic laws at the origin
of a freely falling coordinate system, as it should.
There are neither conceptual nor technical complications in this example, and we will
adopt this perfect fluid energy-momentum tensor, supplemented by an appropriate equa-
tion of state, to model the interior of a star (section 23.7) and the matter content of
the universe (in our discussion of cosmology). In both of these examples, such a phe-
nomenological description is quite appropriate and sufficient (although for more detailed
investigations one may need to go beyond the perfect fluid approximation). For a de-
tailed analysis of the conservation equations in the context of cosmology, see sections
34.3 and 34.4.
Let us now turn to energy-momentum tensors for Lagrangian field theories, starting
with the example of the Klein-Gordon scalar field. As we saw above, in Minkowski
space its (Noether = improved) energy-momentum tensor is given by
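$$ T_{ab} = \partial_a\phi\,\partial_b\phi + \eta_{ab}L = \partial_a\phi\,\partial_b\phi - \tfrac12\eta_{ab}\left(\eta^{cd}\partial_c\phi\,\partial_d\phi + m^2\phi^2\right) . \qquad (6.72) $$
Minimal coupling promotes this to
$$ T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\left(g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi + m^2\phi^2\right) , \qquad (6.73) $$
and it is easy to check that it is covariantly conserved for a solution to the equations
of motion in a gravitational background,
$$ (\Box_g - m^2)\phi = 0 \quad\Rightarrow\quad \nabla^\alpha T_{\alpha\beta} = 0 . \qquad (6.74) $$
For the action (5.13) with a potential $V(\phi)$, the energy-momentum tensor of course also
has the form (6.73) with $m^2\phi^2/2$ unsurprisingly replaced by $V(\phi)$,
$$ T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\,g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi - g_{\alpha\beta}V(\phi) , \qquad (6.75) $$
with
$$ \Box_g\phi = V'(\phi) \quad\Rightarrow\quad \nabla^\alpha T_{\alpha\beta} = 0 . \qquad (6.76) $$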
So far so good. However, the significance of this energy-momentum tensor outside the
realm of special relativity is not clear. In special relativity, it encodes the conserved
quantities associated to translation invariance, but in a general gravitational field there
is no translation invariance (or other symmetry). In particular, in a general gravitational
field
one cannot even derive the energy-momentum tensor (6.73) from Noether's theorem
applied to translations
and, related to this is the fact that one does not obtain an ordinary conservation
law but the covariant conservation law $\nabla^\alpha T_{\alpha\beta} = 0$.
Regarding the second point, we will see in sections 6.9 and 9.1 below that to any
continuous symmetry of a gravitational field (metric) and the covariantly conserved
energy-momentum tensor one can associate a covariantly conserved current and thus also (as
discussed in section 5.8) a conserved charge.
Now let us turn to Maxwell theory. Here the situation is a priori a bit murkier, because
in principle we have both the canonical Noether energy-momentum tensor $\Theta_{ab}$ (6.37)
and the improved energy-momentum tensor $T_{ab}$
at our disposal. Let us start with the latter, not only because it is the nicer object but
also because it turns out to give the correct result. Applying the rules of minimal
coupling, one finds the tensor
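$$ T_{\alpha\beta} = F_{\alpha\gamma}F_\beta{}^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} , \qquad (6.79) $$
where indices of the (metric independent) field strength tensor $F_{\alpha\beta}$ are of course raised
with the aid of the inverse metric $g^{\alpha\beta}$. This object turns out to have all the right
properties to qualify as a candidate energy-momentum tensor of Maxwell theory in a
gravitational field. In particular, it is off-shell symmetric and moreover on-shell
covariantly conserved,
$$ \nabla_\alpha T^{\alpha\beta} = 0 \quad\text{on-shell} , \qquad (6.80) $$
or, in the presence of a source $J^\alpha$,
$$ \nabla_\alpha T^{\alpha\beta} = J_\alpha F^{\alpha\beta} \quad\text{on-shell} . \qquad (6.81) $$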
While one may have anticipated these last two equations on the basis of the minimal
coupling recipe, it is important (and a useful exercise) to verify by direct calculation that
they indeed hold. The point of this verification is to make sure that no commutators
of covariant derivatives, i.e. curvature terms, arise in and mess up this equation, as
they will in the calculation below involving the Noether energy-momentum tensor.
So let us take a brief look at the covariantised or minimally coupled Noether energy-
momentum tensor, namely
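$$ \Theta_{\alpha\beta} = F_{\alpha\gamma}\nabla_\beta A^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} . \qquad (6.82) $$
While the canonical energy-momentum tensor in Minkowski space had some undesirable
properties, its one redeeming feature was that it was on-shell conserved. In contrast
to this, $\Theta_{\alpha\beta}$ is neither on-shell conserved nor on-shell covariantly conserved. In order
to establish $\partial^a\Theta_{ab} = 0$ in Minkowski space, one uses the fact that partial derivatives
commute. Thus, analogously, in calculating $\nabla^\alpha\Theta_{\alpha\beta}$ one encounters the commutator of
covariant derivatives. Explicitly on-shell one finds
$$ \nabla^\alpha\Theta_{\alpha\beta} = \tfrac12 F^{\alpha\gamma}[\nabla_\alpha, \nabla_\gamma]A_\beta , \qquad (6.83) $$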
However, as we will discuss at length in section 7, the characteristic and defining feature
of a non-trivial curved space-time is that these covariant derivatives do not commute
when acting on tensors other than scalars (their commutator defining the curvature
tensor of the space-time).
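$$ \tilde\Theta_{ab} = \Theta_{ab} - \partial^c(F_{ac}A_b) , \qquad (6.84) $$
In a curved space-time the covariantised improvement term is no longer identically conserved,
$$ \nabla^\alpha\nabla^\gamma(F_{\alpha\gamma}A_\beta) = \tfrac12 F^{\alpha\gamma}[\nabla_\alpha, \nabla_\gamma]A_\beta , \qquad (6.85) $$
so that it would not qualify as an improvement term in the standard sense. Nevertheless,
subtracting this term from the (non-conserved) Noether energy-momentum tensor,
one finds that this indeed cancels the commutator term arising from (6.83), thus giving
rise to an on-shell covariantly conserved $\tilde\Theta_{\alpha\beta}$ or $T_{\alpha\beta}$. From the present perspective,
however, this must be considered to be somewhat of a miracle or fluke. For some more
comments on this, see section 21.2.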
6.6 Covariant Energy-Momentum Tensor: the Source of Gravity
As we have seen, there are some irritating conceptual and technical issues associated
with the Noether + minimal coupling procedure in general. These irritants turn
out to be a good thing, though, because they motivate us to rethink this issue from
scratch, and this will now lead us to a much more compelling and both conceptually
and technically perfectly satisfactory general definition of the energy-momentum tensor
of any Lagrangian field theory in a gravitational field.
Thus let us think about this issue from a Lagrangian, action-based, perspective. So
far we have discussed what is the appropriate form of the action for matter fields in a
gravitational field, namely a generally covariant action
$$ S_{\text{matter}} = S_M[\phi; g_{\alpha\beta}] = \int \sqrt{g}\,d^4x\; L_M(\phi, \partial_\alpha\phi, \ldots, g_{\alpha\beta}, \ldots) \qquad (6.87) $$
for the matter fields $\phi$ in a gravitational background $g_{\alpha\beta}$, obtained e.g. by the minimal
coupling prescription and thus describing the dynamics of the fields in a gravitational
background and encoding the coupling of the matter fields to gravity. Ultimately, this
action should then be one part of the total gravitational + matter action describing the
dynamics of the matter fields and of the gravitational field,
Since the gravitational field is described by the (now dynamical) variables $g_{\alpha\beta}(x)$, we
can write this marginally more explicitly as
$$ S[g_{\alpha\beta}, \phi] = S_g[g_{\alpha\beta}] + S_M[\phi; g_{\alpha\beta}] . \qquad (6.89) $$
The precise form of the gravitational action Sg will not be relevant here - this is some-
thing that we will discuss at length in section 19. All we need to keep in mind is that
this action is to provide us with the gravitational part of the gravitational field equa-
tions, i.e. with the appropriate tensorial generalisation of the left-hand side of the
Newtonian field equation $\Delta\phi = 4\pi G_N\rho$.
Variation of this total action with respect to the matter fields is equivalent to the
variation of the matter action SM alone with respect to the matter fields,
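$$ \frac{\delta S[g_{\alpha\beta}, \phi]}{\delta\phi} = 0 \quad\Leftrightarrow\quad \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta\phi} = 0 , \qquad (6.90) $$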
and will thus simply give rise to the equations of motion of the matter fields in a
gravitational field, as required.
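Now let us consider the variation of the total action with respect to the gravitational
dynamical variables $g_{\alpha\beta}$,
$$ \frac{\delta S[g_{\alpha\beta}, \phi]}{\delta g_{\alpha\beta}} = \frac{\delta S_g[g_{\alpha\beta}]}{\delta g_{\alpha\beta}} + \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} \overset{!}{=} 0 . \qquad (6.91) $$
Variation of the gravitational action with respect to the gravitational field $g_{\alpha\beta}$ will give
us the gravitational part of the field equations. Thus variation of the matter action with
respect to the gravitational field will give us the source term for the gravitational field
equations provided by the matter fields,
$$ \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} = \text{Source of Gravity} . \qquad (6.92) $$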
On the other hand, as recalled in the introduction to this section (section 6.1), we expect
the energy-momentum tensor to act as the source of gravity. Therefore we should simply
define the energy-momentum tensor by this relation,
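$$ T^{\alpha\beta} := \text{Source of Gravity} \quad\Rightarrow\quad T^{\alpha\beta} \sim \frac{\delta S_M[\phi; g_{\alpha\beta}]}{\delta g_{\alpha\beta}} . \qquad (6.93) $$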
We will fix the proportionality factor momentarily.
Note that this is precisely analogous to the way a source term for the Maxwell equations,
a current $J^\alpha$, arises from the variation of the coupled matter-Maxwell action with respect
to the gauge field $A_\alpha$ (5.54),
$$ J^\alpha \sim \frac{\delta S_M[\phi, A_\alpha]}{\delta A_\alpha} . \qquad (6.94) $$
In order to test this suggestion, let us take a look at our two standard examples, a scalar
field and Maxwell theory. For a scalar field, the minimally coupled action is (5.13)
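$$ S[\phi, g_{\alpha\beta}] = \int \sqrt{g}\,d^4x \left(-\tfrac12 g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - V(\phi)\right) . \qquad (6.95) $$
Since the action depends explicitly on the inverse metric, it is more convenient to
determine the variation of the action under variations
$$ g^{\alpha\beta} \to g^{\alpha\beta} + \delta g^{\alpha\beta} \qquad (6.96) $$
of the inverse metric. Under such a variation, the volume factor $\sqrt{g}$ varies as (4.83)
$$ \delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\alpha\beta}\,\delta g^{\alpha\beta} . \qquad (6.97) $$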
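The variation formula for the volume factor, $\delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\alpha\beta}\,\delta g^{\alpha\beta}$, is easy to test by finite differences. The sketch below uses an arbitrary two-dimensional positive-definite metric purely for convenience (the formula itself does not depend on the signature, since $g$ denotes the absolute value of the determinant); all names and numbers are our own assumptions.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def sqrt_g_from_inverse(ginv):
    """sqrt(g), g = |det g_ab|, computed from the inverse metric g^ab."""
    return math.sqrt(abs(1.0 / det2(ginv)))

g = [[2.0, 0.3], [0.3, 1.5]]        # sample metric g_ab
ginv = inv2(g)                       # inverse metric g^ab
eps = 1e-6
dginv = [[0.7 * eps, -0.2 * eps],    # small symmetric variation of g^ab
         [-0.2 * eps, 0.4 * eps]]

ginv_new = [[ginv[a][b] + dginv[a][b] for b in range(2)] for a in range(2)]
exact_change = sqrt_g_from_inverse(ginv_new) - sqrt_g_from_inverse(ginv)
first_order = -0.5 * sqrt_g_from_inverse(ginv) * sum(
    g[a][b] * dginv[a][b] for a in range(2) for b in range(2))

print(exact_change, first_order)     # agree up to terms of order eps^2
```

The two numbers differ only at second order in the variation, confirming both the factor $\tfrac12$ and the minus sign that comes from varying the inverse metric rather than the metric.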
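Now let us look at Maxwell theory, our litmus test. In this case, the action is (5.50)
$$ S[A_\alpha, g_{\alpha\beta}] = -\tfrac14\int \sqrt{g}\,d^4x\; g^{\alpha\beta}g^{\gamma\delta}F_{\alpha\gamma}F_{\beta\delta} . \qquad (6.101) $$
The variation of $\sqrt{g}$ is as before, and as regards the variation of the inverse metric, there
is now an additional relative factor of two compared with the calculation for the scalar
fields because the action depends quadratically on the inverse metric. Thus one has
$$ \delta S[A_\alpha, g_{\alpha\beta}] = -\tfrac12\int \sqrt{g}\,d^4x \left(g^{\gamma\delta}F_{\alpha\gamma}F_{\beta\delta} - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta}\right)\delta g^{\alpha\beta} . \qquad (6.102) $$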
Thus the metric variation of the matter action has given us on the nose the symmetric,
gauge invariant, on-shell conserved energy-momentum tensor of Maxwell theory, without
any need to appeal to any improvement procedures!
Thus, when it comes to defining the energy-momentum tensor for Maxwell theory, the
above approach based on the variation of the matter action with respect to the metric
wins hands down over the painful canonical definition based on Noether's theorem for
translations and the Belinfante improvement procedure combined with minimal cou-
pling.
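Encouraged by this, we now define the energy-momentum tensor $T_{\alpha\beta}$ in general by
$$ \delta_{\text{metric}}S_M[\phi, g_{\alpha\beta}] = -\tfrac12\int \sqrt{g}\,d^4x\; T_{\alpha\beta}\,\delta g^{\alpha\beta} , \qquad (6.105) $$
or, equivalently,
$$ T_{\alpha\beta} := -\frac{2}{\sqrt{g}}\frac{\delta}{\delta g^{\alpha\beta}}S_M[\phi, g_{\alpha\beta}] . \qquad (6.106) $$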
Even though, as we have seen, there are other definitions of the energy-momentum ten-
sor, this is the modern, and by far the most useful, definition of the energy-momentum
tensor, namely as the response of the matter action to a variation of the metric (equiv-
alently, as the source of gravity).
Moreover, crucially for the present context, whatever the virtues of other definitions
may be, from the variational principle for general relativity it is this energy-momentum
tensor that plays the role of the source term for the Einstein equations.
Remarks:
1. The energy-momentum tensor as defined by (6.105) or (6.106) is frequently called
the metric energy-momentum (or stress-energy) tensor, or also the Hilbert or
Rosenfeld energy-momentum tensor. It is sometimes also referred to as the gravi-
tational energy-momentum tensor, but that is confusing as it does not describe the
energy-momentum of the gravitational field itself, a more mysterious and elusive
quantity we will briefly look at and for in section 21.6.
I prefer the attribute covariant, to distinguish it from what is usually called the
canonical Noether energy-momentum tensor. Thus, even though this terminology
is not standard, I will henceforth refer to $T_{\alpha\beta}$ as defined by (6.105) or (6.106), as
the Covariant Energy-Momentum Tensor.
4. When the minimally coupled matter Lagrangian depends only on the metric and
not on the first derivatives of the metric (i.e. not on the Christoffel symbols),
as in the case of scalar or Maxwell gauge fields, then more explicitly the covariant
energy-momentum tensor can be written as (and calculated from)
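$$ T_{\alpha\beta}(x) = -\frac{2}{\sqrt{g}}\frac{\partial(\sqrt{g}L_M(x))}{\partial g^{\alpha\beta}(x)} = -2\frac{\partial L_M(x)}{\partial g^{\alpha\beta}(x)} + g_{\alpha\beta}(x)L_M(x) \qquad (6.109) $$
or
$$ T^{\alpha\beta}(x) = +2\frac{\partial L_M(x)}{\partial g_{\alpha\beta}(x)} + g^{\alpha\beta}(x)L_M(x) . \qquad (6.110) $$
Here the sign change is due to the fact that $\delta g^{\alpha\beta}$ denotes the variation of the
inverse metric, not the contravariant components of $\delta g_{\alpha\beta}$. Thus it is not the same
as $g^{\alpha\gamma}g^{\beta\delta}\delta g_{\gamma\delta}$, but rather minus this expression,
$$ \delta g^{\alpha\beta} = -g^{\alpha\gamma}g^{\beta\delta}\,\delta g_{\gamma\delta} , \qquad (6.111) $$
as follows from
$$ 0 = \delta(g^{\alpha\gamma}g_{\gamma\beta}) = (\delta g^{\alpha\gamma})g_{\gamma\beta} + g^{\alpha\gamma}\delta g_{\gamma\beta} \quad\Rightarrow\quad \delta g^{\alpha\beta} = -g^{\alpha\gamma}g^{\beta\delta}\,\delta g_{\gamma\delta} . \qquad (6.112) $$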
5. The definition (6.105) or the explicit expression (6.109) also provides an efficient
strategy to determine the energy-momentum tensor even if one is just interested
in Poincaré-invariant field theories in Minkowski space:
In order to determine a symmetric, gauge invariant, and on-shell conserved
energy-momentum tensor $T_{ab}$ for such a theory, one
It can be shown that for fields of any spin this energy-momentum tensor agrees
on-shell with what one could have also obtained by invoking the Belinfante im-
provement procedure of the Noether energy-momentum tensor,
$$ \tilde\Theta_{ab} = T_{ab} \quad\text{on-shell} \qquad (6.114) $$
6. When the minimally coupled matter action depends also on the first derivatives
of the metric, through the covariant derivative $\nabla_\alpha\phi$ of some (non-scalar) field $\phi$,
say, by the usual rules of variational calculus there will be additional contributions
to the energy-momentum tensor, arising from an integration by parts of
Z
4 LM (x)
gd x 2 ( )(x)
(x)
We consider the situation where the minimally coupled matter action happens to be
invariant under Weyl rescalings, i.e. under rescalings of the metric
In particular, thus, we consider the (admittedly very special) situation where one has
such a symmetry without any accompanying transformation of the matter fields. The
discussion can be extended to the case where also a transformation of the matter fields
is required, but for present purposes this special case is good enough (see the end of
this section for a comment on the general case).
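Examples of such actions are e.g. the action of a massless scalar field (5.11) in D = 2
(space-time) dimensions
$$ S[\phi, g_{\alpha\beta}] = -\tfrac12\int d^2x\; \sqrt{g}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi \qquad (6.118) $$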
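Indeed, in that case the metric dependence of the action is precisely such that the
combination of the determinant $g$ and the inverse metric that appears is invariant
under Weyl rescalings,
$$ g_{\alpha\beta} \to e^{2\omega}g_{\alpha\beta} \quad\Rightarrow\quad \begin{cases} D = 2: & \sqrt{g}\,g^{\alpha\beta} \to \sqrt{g}\,g^{\alpha\beta} \\ D = 4: & \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \to \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \end{cases} \qquad (6.120) $$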
This is reflected in the fact that the corresponding energy-momentum tensor is traceless
precisely in these dimensions: from (6.73) and (6.79) one finds
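$$ \begin{aligned}
T_{\alpha\beta} &= \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\,(g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi) &&\Rightarrow\quad T^\alpha{}_\alpha = -\tfrac12(D - 2)\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi \\
T_{\alpha\beta} &= F_{\alpha\gamma}F_\beta{}^\gamma - \tfrac14 g_{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta} &&\Rightarrow\quad T^\alpha{}_\alpha = -\tfrac14(D - 4)\,F_{\alpha\beta}F^{\alpha\beta} .
\end{aligned} \qquad (6.121) $$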
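The dimension dependence of the trace is a purely algebraic statement and can be checked pointwise. The sketch below does this for the massless scalar field, using the flat metric $\eta = \mathrm{diag}(-1, 1, \ldots, 1)$ as a stand-in for the metric at a point (an assumption made for simplicity) and arbitrary sample values for the gradient $\partial_\alpha\phi$; the function names are our own.

```python
import math

def minkowski_diag(D):
    """Diagonal of eta = diag(-1, 1, ..., 1) in D space-time dimensions."""
    return [-1.0] + [1.0] * (D - 1)

def scalar_T(dphi):
    """T_ab = d_a(phi) d_b(phi) - (1/2) eta_ab (d phi)^2 for a massless scalar."""
    D = len(dphi)
    eta = minkowski_diag(D)
    grad2 = sum(eta[a] * dphi[a] ** 2 for a in range(D))   # eta^{ab} d_a d_b
    T = [[dphi[a] * dphi[b] - 0.5 * (eta[a] if a == b else 0.0) * grad2
          for b in range(D)] for a in range(D)]
    return T, grad2

def trace(T):
    eta = minkowski_diag(len(T))
    return sum(eta[a] * T[a][a] for a in range(len(T)))

for dphi in ([1.0, 0.5], [1.0, 0.5, 0.2], [1.0, 0.5, 0.2, -0.3]):
    D = len(dphi)
    T, grad2 = scalar_T(dphi)
    # Compare the trace with -(D - 2)/2 * (d phi)^2; it vanishes only for D = 2.
    print(D, trace(T), -0.5 * (D - 2) * grad2)
```

The two printed columns agree in every dimension, and the trace vanishes identically only for $D = 2$.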
The relation between these two observations / assertions is provided by noting that if
the matter action is invariant under Weyl rescalings one has
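$$ \begin{aligned}
0 = \delta_\omega S_{\text{matter}} &= -\tfrac12\int \sqrt{g}\,d^Dx\; T_{\alpha\beta}(x)\,\delta_\omega g^{\alpha\beta}(x) \\
&= \int \sqrt{g}\,d^Dx\; T_{\alpha\beta}(x)\,g^{\alpha\beta}(x)\,\omega(x) = \int \sqrt{g}\,d^Dx\; T^\alpha{}_\alpha(x)\,\omega(x) .
\end{aligned} \qquad (6.122) $$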
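The step from the first to the second line uses the infinitesimal form of the Weyl rescaling. As a sketch (assuming the rescaling is written as $g_{\alpha\beta} \to e^{2\omega(x)}g_{\alpha\beta}$ with $\omega$ infinitesimal, which is our reading of the notation here):

```latex
\delta_\omega g_{\alpha\beta} = 2\omega\, g_{\alpha\beta}
\quad\Rightarrow\quad
\delta_\omega g^{\alpha\beta} = -2\omega\, g^{\alpha\beta}
\quad\Rightarrow\quad
-\tfrac12\, T_{\alpha\beta}\,\delta_\omega g^{\alpha\beta}
 = \omega\, T_{\alpha\beta}\,g^{\alpha\beta} = \omega\, T^\alpha{}_\alpha .
```

Since $\omega(x)$ is an arbitrary function, the integral can vanish for all $\omega$ only if the trace $T^\alpha{}_\alpha$ vanishes pointwise.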
In the special case that we have considered here (invariance under scalings of the metric
alone, without transforming the matter fields), this is true off-shell, i.e. without using
the equations of motion for the matter fields. In the more general case of an invariance
under joint Weyl rescalings of the metric and accompanying scalings of the matter fields,
in the above chain of arguments one would need to also vary the matter action with
respect to the matter fields to establish the invariance of the action. The term arising
from the variation of the matter fields is evidently proportional to the Euler-Lagrange
equations of the matter fields, and therefore in that case one could only conclude that
$T^\alpha{}_\alpha = 0$ on-shell,
$$ \left.\begin{array}{l} \text{invariance under joint Weyl rescalings} \\ \text{of the metric and the matter fields} \end{array}\right\} \quad\Rightarrow\quad T^\alpha{}_\alpha = 0 \;\;\text{on-shell.} \qquad (6.124) $$
An example of this is provided by the so-called conformally coupled scalar field. This
conformal coupling involves a space-time dependent mass term that represents a non-
minimal coupling of the scalar field to the scalar curvature (a contraction of the Riemann
curvature tensor to be introduced in section 7), and understanding the Weyl invariance
of this model requires a formula for the variation of the scalar curvature with respect
to the metric which we will derive in section 19.2. Therefore we will need to postpone
a discussion of this model to section 21.3.
As an aside, but as a concrete, and the simplest non-trivial, example, and an illustration
of the above remarks regarding Weyl invariance, let us consider a massless scalar field
in (1+1)-dimensions, in either the usual Minkowski coordinates, or in the Rindler coor-
dinates discussed in sections 1.3 and 2.8 (we will in particular make use of the results
in section 2.8).
Thus a natural basis of solutions to this equation is provided by the plane waves
$f_k \sim \exp(-i\omega t + ikx)$, with $k^2 = \omega^2$, i.e. $k = \pm\omega$, $\omega > 0$, and their complex conjugates. For
a given $\omega$ there are thus two linearly-independent positive frequency solutions,
$$ f_\omega(t, x) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega(t - x)} , \qquad g_\omega(t, x) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega(t + x)} \qquad (6.126) $$
(the normalisation factors are inserted for QFT-pedantry reasons only and are irrelevant
for the following). Thus the basis of solutions splits into right-movers or right-moving
modes $f_\omega$ and left-movers $g_\omega$. It is thus convenient to introduce the corresponding null
coordinates $u_M = t - x$, $v_M = t + x$ as in (2.155), in terms of which the solutions can
be written as
$$ f_\omega = f_\omega(u_M) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega u_M} , \qquad g_\omega = g_\omega(v_M) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega v_M} . \qquad (6.127) $$
That the solutions split in this way could have also been deduced from the form of the
wave operator in these lightcone (null) coordinates, namely $\Box = -4\partial_{u_M}\partial_{v_M}$, and the
ensuing solutions to the equation of motion,
$$ \Box\phi = 0 \quad\Rightarrow\quad \phi(u_M, v_M) = f(u_M) + g(v_M) . \tag{6.128} $$
Here $f$ and $g$ can now be arbitrary wave packets constructed from the solutions $f_\omega$ and
$g_\omega$ respectively.
The energy-density $\epsilon_M = T_{tt}$ of the scalar field with respect to Minkowski time is
$$ \epsilon_M = \tfrac{1}{2}\left((\partial_t\phi)^2 + (\partial_x\phi)^2\right) \tag{6.129} $$
and in terms of lightcone coordinates this splits into a sum of left-moving and right-
moving contributions,
$$ \epsilon_M = (\partial_{u_M}\phi)^2 + (\partial_{v_M}\phi)^2 , \tag{6.130} $$
with $f(u_M)$ evidently only contributing to the former and $g(v_M)$ to the latter.
Now let us consider the same issue in Rindler coordinates. In terms of the coordinates
$(\eta, \xi)$ (2.150), the metric takes the form (2.148)
$$ ds^2 = e^{2a\xi}(-d\eta^2 + d\xi^2) . \tag{6.131} $$
Note that, as mentioned in section 2.8, the metric in these coordinates is conformally
flat. Thus, by the reasoning above, in section 6.7, in particular the discussion around
equation (6.120), we know that the action and equation of motion for a scalar field
in Rindler coordinates will look just like those in Minkowski coordinates, with the
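replacement $(t, x) \to (\eta, \xi)$,
$$ S_R[\phi] = -\frac{1}{2}\int \sqrt{g}\, d\eta\, d\xi\; g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi = -\frac{1}{2}\int d\eta\, d\xi\, \left(-(\partial_\eta\phi)^2 + (\partial_\xi\phi)^2\right) , \tag{6.132} $$
and
$$ \Box_g\phi = 0 \quad\Leftrightarrow\quad (-\partial_\eta^2 + \partial_\xi^2)\phi = 0 . \tag{6.133} $$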
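Thus by the same reasoning as above, the solutions can be split into left- and right-
movers and are conveniently written in terms of the Rindler lightcone coordinates (2.156)
$$ (u_R, v_R) = \eta \mp \xi . \tag{6.134} $$
The energy-density $\epsilon_R = T_{\eta\eta}$ with respect to Rindler time $\eta$ likewise takes the same form
as in Minkowski coordinates, i.e. one has
$$ \epsilon_R = \tfrac{1}{2}\left((\partial_\eta\phi)^2 + (\partial_\xi\phi)^2\right) \tag{6.136} $$
and in terms of lightcone coordinates this splits into a sum of left-moving and right-
moving contributions,
$$ \epsilon_R = (\partial_{u_R}\phi)^2 + (\partial_{v_R}\phi)^2 , \tag{6.137} $$
with $f(u_R)$ evidently only contributing to the former and $g(v_R)$ to the latter.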
The interest in these (fairly trivial) considerations lies in the fact that the exponential
relation (2.158) between the Minkowski and Rindler null coordinates,
$$ u_M = -a^{-1} e^{-a u_R} , \qquad v_M = a^{-1} e^{a v_R} , \tag{6.138} $$
reflecting the exponential redshift of a Rindler relative to an inertial observer (and vice-
versa), has a number of non-trivial and remarkable implications. I will just mention 2
of them here:
1. The exponential redshift expressed by (6.138) implies that the right-moving energy
   densities in Minkowski and Rindler coordinates are related by
   $$ \frac{\partial u_M}{\partial u_R} = -a u_M \quad\Rightarrow\quad (\partial_{u_M}\phi)^2 = \frac{1}{a^2 u_M^2}(\partial_{u_R}\phi)^2 \tag{6.139} $$
   (and likewise for the left-movers). Thus essentially any classical solution that is
   regarded as regular by the Rindler observer (finite and non-zero $\epsilon_R$) corresponds
   to a divergent Minkowski energy-density as $u_M \to 0$, i.e. on the future boundary
   (horizon) $t = x$ of the Rindler wedge.
2. The exponential redshift expressed by (6.138) also implies that the notions of
   positive frequency with respect to Minkowski and Rindler time are inequivalent,
   e.g. in the sense that $f_\omega(u_M)$, restricted to the right Rindler-wedge $u_M < 0$, say,
   cannot be written as a superposition of Rindler right-moving positive frequency
   waves alone,
   $$ f_\omega(u_M) \neq \int_0^\infty d\omega'\, \alpha(\omega, \omega') f_{\omega'}(u_R) . \tag{6.140} $$
   Of course, the $f_{\omega'}(u_R)$ and their complex conjugates $f^*_{\omega'}(u_R)$ provide a basis of
   solutions for the right-moving modes (in the right Rindler-wedge), so that one can
   certainly expand the Minkowski plane waves as
   $$ f_\omega(u_M) = \int_0^\infty d\omega' \left( \alpha(\omega, \omega') f_{\omega'}(u_R) + \beta(\omega, \omega') f^*_{\omega'}(u_R) \right) , \tag{6.141} $$
   but necessarily with some of the $\beta(\omega, \omega') \neq 0$.
   If you know a little bit of quantum field theory, you will be able to anticipate that
   this means that the notions of creation and annihilation operators are inequivalent,
   and that therefore what is the vacuum, say, for an inertial observer, will not be
   seen as the vacuum by the accelerating observer (and vice-versa).
   Combining the two facts, one also arrives at the conclusion that the Rindler
   vacuum is singular both at the future horizon (from right-movers) and at the
   past horizon (from left-movers).
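The first of these two points is easy to check numerically. The following is only a minimal sketch: it assumes the explicit relation $u_M = -a^{-1}e^{-a u_R}$ in the right Rindler wedge (which is consistent with $\partial u_M/\partial u_R = -a u_M$), it uses an arbitrarily chosen right-moving profile $\phi(u_R) = 2u_R + \sin u_R$ whose Rindler energy-density is finite and non-zero everywhere, and all function names are hypothetical.

```python
import math

A = 1.0  # acceleration parameter a (illustrative value, an assumption)

def u_rindler(u_m, a=A):
    """Invert the assumed relation u_M = -(1/a) exp(-a u_R), valid for u_M < 0."""
    return -math.log(-a * u_m) / a

def phi(u_r):
    """Hypothetical right-moving profile, regular in Rindler coordinates."""
    return 2.0 * u_r + math.sin(u_r)

def dphi_du_rindler(u_r):
    """Exact derivative of the profile with respect to u_R (never zero)."""
    return 2.0 + math.cos(u_r)

def dphi_du_minkowski(u_m, a=A, rel_step=1e-7):
    """Central finite difference of phi(u_R(u_M)) with respect to u_M."""
    step = rel_step * abs(u_m)
    upper = phi(u_rindler(u_m + step, a))
    lower = phi(u_rindler(u_m - step, a))
    return (upper - lower) / (2.0 * step)

def energy_ratio(u_m, a=A):
    """(d phi/d u_M)^2 / (d phi/d u_R)^2, to be compared with 1/(a u_M)^2."""
    u_r = u_rindler(u_m, a)
    return dphi_du_minkowski(u_m, a) ** 2 / dphi_du_rindler(u_r) ** 2

# The ratio grows like 1/(a u_M)^2 as the horizon u_M -> 0 is approached.
for u_m in (-1.0, -1e-2, -1e-4):
    print(u_m, energy_ratio(u_m), 1.0 / (A * u_m) ** 2)
```

The two printed columns should agree up to finite-difference errors, and they grow without bound as $u_M \to 0$, which is the divergence of the Minkowski energy-density at the future horizon described in point 1.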
In the spirit of the equivalence principle (before studying gravity, let us study accelera-
tions in flat space), this Unruh Effect is a fascinating and rewarding first step towards
understanding (or appreciating the difficulties encountered by) quantum field theory
in curved space-times, i.e. in non-trivial gravitational fields. For more on this see the
references given in section 26.6.
In section 5.8 we had discussed how to obtain conserved charges from covariantly
conserved currents. Now in special relativity one can construct conserved currents
(corresponding to the generators of Poincaré transformations) from the conserved
energy-momentum tensor, and hence from there the corresponding conserved charges like
energy, momentum and angular momentum. In this section we will take a first look at
the question if or to which extent we can also obtain such conserved currents from the
covariantly conserved energy-momentum tensor in a gravitational field.
To set the stage, recall that in Special Relativity, if $T^{ab}$ is the energy-momentum tensor
of a physical system, it generally satisfies an equation of the form
$$ \partial_a T^{ab} = G^b , \tag{6.142} $$
where $G^b$ represents the density of the external forces acting on the system. In
particular, if there are no external forces, the divergence of the energy-momentum tensor
is zero. For example, in the case of Maxwell theory and a current corresponding to a
charged particle we have
$$ G^b = J_a F^{ab} = -F^{b}_{\ a} J^a \sim -F^{b}_{\ a}\, \dot{\xi}^a , \tag{6.143} $$
which is indeed the relevant external (Lorentz) force density (in writing this I have
suppressed the $\delta$-function that localises the current to the worldline $\xi^a = \xi^a(\tau)$ of the
particle).
When there are no external forces, i.e. when one has taken into account the complete
matter action, the total energy-momentum tensor is conserved. In that case, $T^{ab} = J^{(b)a}$
defines four conserved currents, more or less (modulo Belinfante improvement terms,
see e.g. the discussion in sections 6.4 and 21.2 and the references given there) the
currents associated to translation invariance of the action via Noether's theorem. One
is thus in the setting of conserved currents of the previous section, and one can define
conserved quantities like total energy and momentum, $P^a$, and angular momentum $J^{ab}$,
by integrals of $T^{0a}$ or $\xi^a T^{0b} - \xi^b T^{0a}$ (the latter being conserved if $T^{ab}$ is symmetric) over
spacelike hypersurfaces.
We see that, due to the second term, this does not define four conserved currents in the
ordinary or covariant sense (and we will return to the interpretation of this equation,
and the related issue of energy and energy density of the gravitational field, in section
21.6).
Nevertheless, in analogy with special relativity, one might like to attempt to define
conserved quantities like total energy and momentum, $P^\mu$, and angular momentum
$J^{\mu\nu}$, by integrals of $T^{\mu 0}$ or $x^\mu T^{\nu 0} - x^\nu T^{\mu 0}$ over spacelike hypersurfaces. However, these
quantities are rather obviously not covariant, nor are they conserved.
This should perhaps not be too surprising because, after all, for a Poincaré-invariant field
theory in Minkowski space these quantities are preserved as a consequence of Poincaré
invariance, i.e. because of the symmetries (isometries) of the Minkowski metric (as well
as of the action).
A generic metric has no isometries whatsoever (the explicit examples of metrics in these
notes notwithstanding, all of which exhibit at least some symmetries). As it has no
symmetries, we have no reason to expect to find associated conserved quantities in
general.
However, if there are symmetries then one should indeed be able to define conserved
quantities (think of Noether's theorem again), one for each symmetry generator. In
order to implement this we need to understand how to define and detect isometries of
the metric. For this we need the concepts of Lie derivatives and Killing vectors. These
already made occasional brief appearances in previous sections and will be discussed
more systematically in section 8, the corresponding conserved charges then being the
subject of section 9.
Alternatively, one might try to just go ahead optimistically and attempt to construct
a covariant current-like object (with a corresponding conservation law and the ensuing
possibility to define conserved charges) by contracting the energy-momentum tensor not
with the coordinates but with a vector field $V^\mu$, along the lines of
$$ J^\mu_V = T^{\mu\nu} V_\nu . \tag{6.145} $$
At least this now has the merit of clearly being a vector field, but is it conserved?
Calculating its covariant divergence, and using the fact that $T^{\mu\nu}$ is symmetric and
conserved, one finds
$$ \nabla_\mu J^\mu_V = \tfrac{1}{2} T^{\mu\nu}(\nabla_\mu V_\nu + \nabla_\nu V_\mu) . \tag{6.146} $$
Thus we would have a conserved current (and associated conserved charge by the
previous section) for any conserved energy-momentum tensor if the vector field $V^\mu$ were
such that it satisfies
$$ \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \quad\Rightarrow\quad \nabla_\mu(T^{\mu\nu} V_\nu) = 0 . \tag{6.147} $$
The link between this observation and the one in the preceding paragraph regarding
symmetries is that this is precisely the condition characterising (infinitesimal)
symmetries of the metric:
• First of all, this is the condition we already found and encountered in (2.101), as
  reformulated in (4.65), for the infinitesimal coordinate transformation $\delta x^\mu = \epsilon V^\mu$
  to generate a symmetry of the metric, thus leading to a conserved charge for
  geodesics.
• More generally, as we will discuss in detail in section 8 below, vector fields
  satisfying the equation $\nabla_\mu V_\nu + \nabla_\nu V_\mu = 0$ are indeed in one-to-one correspondence with
  infinitesimal generators of continuous symmetries of a metric (isometries).
Thus this gives a satisfactory and coherent overall picture of symmetries and conserva-
tion laws in a gravitational field.
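As a small illustration of the condition $\nabla_\mu V_\nu + \nabla_\nu V_\mu = 0$ in the simplest possible setting, the sketch below restricts (as an assumption made here for brevity) to flat space in Cartesian or inertial coordinates, where $\nabla_\mu = \partial_\mu$, and to vector fields that are linear in the coordinates, $V^\mu = M^\mu_{\ \nu} x^\nu$. In that case $\partial_\mu V_\nu = (\eta M)_{\nu\mu}$ is constant, and the condition reduces to the anti-symmetry of the matrix $\eta M$. The function names are hypothetical.

```python
def lower_index(eta, m):
    """Return the matrix product eta @ m, i.e. A[n][r] = sum_l eta[n][l] * m[l][r]."""
    dim = len(eta)
    return [[sum(eta[n][l] * m[l][r] for l in range(dim)) for r in range(dim)]
            for n in range(dim)]

def killing_defect(eta, m):
    """Components of d_mu V_nu + d_nu V_mu for V^mu = m^mu_nu x^nu (constant in x)."""
    a = lower_index(eta, m)  # V_nu = a[nu][rho] x^rho, hence d_mu V_nu = a[nu][mu]
    dim = len(eta)
    return [[a[nu][mu] + a[mu][nu] for nu in range(dim)] for mu in range(dim)]

def is_killing(eta, m, tol=1e-12):
    """True if every component of the symmetrised derivative vanishes."""
    return all(abs(c) < tol for row in killing_defect(eta, m) for c in row)

minkowski = [[-1, 0], [0, 1]]  # coordinates (t, x)
euclid = [[1, 0], [0, 1]]      # coordinates (x, y)

boost = [[0, 1], [1, 0]]       # V = x d_t + t d_x
rotation = [[0, -1], [1, 0]]   # V = -y d_x + x d_y
dilation = [[1, 0], [0, 1]]    # V = x^mu d_mu

print(is_killing(minkowski, boost))   # True
print(is_killing(euclid, rotation))   # True
print(is_killing(euclid, dilation))   # False
```

Boosts and rotations pass the test, in line with their role as generators of Poincaré transformations, whereas the dilation does not: rescalings are not isometries of the flat metric.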
7 Curvature I: The Riemann Curvature Tensor
We now come to one of the most important concepts of General Relativity and Rie-
mannian Geometry, that of curvature and how to describe it in tensorial terms. Among
other things, this will finally allow us to decide unambiguously if a given metric is just
the (flat) Minkowski metric in disguise or the metric of a genuinely curved space (but
a proof of this statement is postponed to section 10). More importantly (for present
purposes) it will allow us to construct tensors that depend on the 2nd derivatives of
the metric and will thus allow us to construct tensorial (generally covariant) differen-
tial equations for the metric. In particular, this will then lead us fairly directly to the
Einstein equations (section 18), i.e. to the field equations for the gravitational field.
Recall that the equations that describe the behaviour of particles and fields in a gravi-
tational field involve the metric and the Christoffel symbols determined by the metric.
Thus the equations for the gravitational field should be generally covariant (tensorial)
differential equations for the metric.
At first, here we seem to face a dilemma. How can we write down covariant differential
equations for the metric when the covariant derivative of the metric is identically zero?
Having come to this point, Einstein himself reached an impasse and required the help
of his mathematician friend Marcel Grossmann ("Grossmann, you have to help me, or
else I'll go crazy!") whom he had asked to investigate if there were any tensors that
could be built from the second derivatives of the metric.
Grossmann soon found that this problem had indeed been addressed and solved in the
mathematics literature, in particular by Riemann (generalising work of Gauss on curved
surfaces), Ricci-Curbastro and Levi-Civita. It was shown by them that there are indeed
non-trivial tensors that can be constructed from (ordinary) derivatives of the metric.
These can then be used to write down covariant differential equations for the metric.$^{17}$
The most important among these are the Riemann curvature tensor and its various
contractions. In fact, it is known that these are the only tensors that can be constructed
from the metric and its first and second derivatives, and they will therefore play a central
role in all that follows.
Technically the most straightforward way of introducing the Riemann curvature tensor is
via the commutator of covariant derivatives. In this section we will adopt this pragmatic
(and relatively streamlined) approach, as it is sufficient to
• determine the most important algebraic and differential properties of the curvature
  tensor (symmetries and Bianchi identities)
• assess its physical significance (gravitational tidal forces) via the influence of the
  curvature tensor on the motion of (families of) freely falling particles
• and to thus provide us with all the information and ingredients we need to then
  discuss the Einstein equations (section 18) and their formulation in terms of an
  action principle (section 19).
$^{17}$ Of course, the story is not as simple and straightforward as that. For an account of Marcel
Grossmann's (often overlooked) contributions to tensor calculus and the development of general
relativity, see T. Sauer, Marcel Grossmann and his contribution to the general theory of relativity,
arXiv:1312.4068 [physics.hist-ph].
However, this is not geometrically the most intuitive way to introduce the concept
of curvature, and it downplays the extent to which the curvature tensor reflects and
encodes the geometric properties of space-time and, more generally, does not do justice
to the fundamental differential geometric notion and significance of curvature. Some of
these aspects are discussed in Part B of these notes, in particular in sections 10, 11, 12
and 13.
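$$ [\nabla_\mu, \nabla_\nu](\phi V^\lambda) = \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda \tag{7.1} $$
for any scalar field $\phi$. This implies that $[\nabla_\mu, \nabla_\nu] V^\lambda$ cannot depend on derivatives of $V$
because if it did it would also have to depend on derivatives of $\phi$. Hence the commutator
can be written as
$$ [\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda_{\ \sigma\mu\nu} V^\sigma . \tag{7.2} $$
This can of course also be verified by a direct calculation, and we will come back to
this below. For now let us just note that, since the left hand side of this equation is
clearly a tensor for any $V$, the quotient theorem implies that the quantities $R^\lambda_{\ \sigma\mu\nu}$ are
the components of a tensor.
To establish (7.1), one calculates
$$ \nabla_\mu\nabla_\nu(\phi V^\lambda) = (\nabla_\mu\nabla_\nu\phi) V^\lambda + (\nabla_\nu\phi)(\nabla_\mu V^\lambda) + (\nabla_\mu\phi)(\nabla_\nu V^\lambda) + \phi \nabla_\mu\nabla_\nu V^\lambda . \tag{7.3} $$
Thus, upon taking the commutator the 2nd and 3rd terms drop out (because their sum
is symmetric in $\mu$ and $\nu$), and we are left with
$$ [\nabla_\mu, \nabla_\nu](\phi V^\lambda) = ([\nabla_\mu, \nabla_\nu]\phi) V^\lambda + \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda = \phi\, [\nabla_\mu, \nabla_\nu] V^\lambda , \tag{7.4} $$
where the last equality follows from the fact that 2nd covariant derivatives do commute on
scalars. Thus we have established (7.1).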
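By explicitly calculating the commutator, one can confirm the structure displayed in
(7.2). This explicit calculation shows that the Riemann-Christoffel Curvature Tensor
(or Riemann tensor for short) is given by
$$ R^\lambda_{\ \sigma\mu\nu} = \partial_\mu\Gamma^\lambda_{\ \sigma\nu} - \partial_\nu\Gamma^\lambda_{\ \sigma\mu} + \Gamma^\lambda_{\ \mu\rho}\Gamma^\rho_{\ \nu\sigma} - \Gamma^\lambda_{\ \nu\rho}\Gamma^\rho_{\ \mu\sigma} \tag{7.5} $$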
Remarks:
1. Note how useful the quotient theorem is in this case. It would be quite unpleasant
to have to verify the tensorial nature of this expression by explicitly checking its
behaviour under coordinate transformations.
2. Note also that this tensor is clearly zero for the Minkowski metric written in
Cartesian coordinates. Hence it is also zero for the Minkowski metric written in
any other coordinate system. We will prove the converse, that vanishing of the
Riemann curvature tensor implies that the metric is (locally) equivalent to the
Minkowski metric, in section 10.2.
3. In the above we have defined the Riemann tensor by the relation (7.2) and then
deduced the explicit expression (7.5). While this is, pragmatically speaking, a
useful way of proceeding, it may be more logical to initially define the Riemann
tensor in a different way, e.g. directly by (7.5) (for instance because by painful
calculations one has discovered that this particular combination of non-tensorial
objects miraculously happens to transform as a tensor). In that case, (7.2) is a
result rather than a definition, known as the Ricci identity.
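We will see later that the Riemann tensor is anti-symmetric in its first two indices.
Hence we can also write
$$ [\nabla_\mu, \nabla_\nu] V_\rho = R_{\rho\sigma\mu\nu} V^\sigma . \tag{7.7} $$
The extension to arbitrary (p, q)-tensors now follows the usual pattern, with one
Riemann curvature tensor, contracted as for vectors, appearing for each of the p upper
indices, and one Riemann curvature tensor, contracted as for covectors, for each of the
q lower indices. Thus, e.g. for a (2, 0)-tensor $T^{\lambda\rho}$ one has
$$ [\nabla_\mu, \nabla_\nu] T^{\lambda\rho} = R^\lambda_{\ \sigma\mu\nu} T^{\sigma\rho} + R^\rho_{\ \sigma\mu\nu} T^{\lambda\sigma} , \tag{7.8} $$
while for a (1, 1)-tensor $A^\lambda_{\ \rho}$ one has
$$ [\nabla_\mu, \nabla_\nu] A^\lambda_{\ \rho} = R^\lambda_{\ \sigma\mu\nu} A^\sigma_{\ \rho} - R^\sigma_{\ \rho\mu\nu} A^\lambda_{\ \sigma} . \tag{7.9} $$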
I will give two other versions of the fundamental formula (7.2) which are occasionally
useful and used.
2. Secondly, one can consider a net of curves $x^\mu(s_1, s_2)$ parametrising, say, a
   two-dimensional surface, and look at the commutators of the covariant derivatives
   along the $s_1$- and $s_2$-curves. The formula one obtains in this case (it can be
   obtained from (7.10) by noting that $X$ and $Y$ commute in this case) is
   $$ (D_{s_1} D_{s_2} - D_{s_2} D_{s_1}) V^\lambda = R^\lambda_{\ \sigma\mu\nu} \frac{dx^\mu}{ds_1} \frac{dx^\nu}{ds_2} V^\sigma , \tag{7.11} $$
   where $D_{s_k}$ denotes the covariant derivative along the curve parametrised by $s_k$,
   i.e. (section 4.7)
   $$ D_{s_k} = \frac{\partial x^\mu(s_1, s_2)}{\partial s_k} \nabla_\mu . \tag{7.12} $$
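In general, to read off all the symmetries from the formula (7.5) is difficult. One way
to simplify things is to look at the Riemann curvature tensor at the origin $x_0$ of a
Riemann normal coordinate system (or some other inertial coordinate system). In that
case, all the first derivatives of the metric disappear and only the first two terms of (7.5)
contribute. One finds
$$ R_{\mu\nu\lambda\sigma}(x_0) = g_{\mu\alpha}(\partial_\lambda\Gamma^\alpha_{\ \nu\sigma} - \partial_\sigma\Gamma^\alpha_{\ \nu\lambda})(x_0) = (\partial_\lambda\Gamma_{\mu\nu\sigma} - \partial_\sigma\Gamma_{\mu\nu\lambda})(x_0) = \tfrac{1}{2}(g_{\mu\sigma,\nu\lambda} + g_{\nu\lambda,\mu\sigma} - g_{\mu\lambda,\nu\sigma} - g_{\nu\sigma,\mu\lambda})(x_0) . \tag{7.13} $$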
In principle, this expression is sufficiently simple to allow one to read off all the symme-
tries of the Riemann tensor. However, it is more insightful to derive these symmetries
in a different way, one which will also make clear why the Riemann tensor has these
symmetries.
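1. Anti-symmetry in the second pair of indices
   $$ R_{\mu\nu\lambda\sigma} = -R_{\mu\nu\sigma\lambda} \tag{7.14} $$
2. Anti-symmetry in the first pair of indices
   $$ R_{\mu\nu\lambda\sigma} = -R_{\nu\mu\lambda\sigma} \tag{7.15} $$
   This is a consequence of the fact that the metric is covariantly constant. In fact,
   we can calculate
   $$ 0 = [\nabla_\lambda, \nabla_\sigma] g_{\mu\nu} = -\left(R^\rho_{\ \mu\lambda\sigma} g_{\rho\nu} + R^\rho_{\ \nu\lambda\sigma} g_{\mu\rho}\right) = -(R_{\nu\mu\lambda\sigma} + R_{\mu\nu\lambda\sigma}) . \tag{7.16} $$
3. Cyclic permutation symmetry
   $$ R_{\mu[\nu\lambda\sigma]} = 0 \quad\Leftrightarrow\quad R_{\mu\nu\lambda\sigma} + R_{\mu\lambda\sigma\nu} + R_{\mu\sigma\nu\lambda} = 0 \tag{7.17} $$
   This Bianchi identity is a consequence of the fact that there is no torsion. In fact,
   applying $[\nabla_\mu, \nabla_\nu]$ to the covector $\nabla_\lambda\phi$, $\phi$ a scalar, one has
   $$ \nabla_{[\mu}\nabla_\nu\nabla_{\lambda]}\phi = 0 \quad\Rightarrow\quad R^\sigma_{\ [\lambda\mu\nu]}\nabla_\sigma\phi = 0 . \tag{7.18} $$
   As this has to be true for all scalars $\phi$, this implies $R^\sigma_{\ [\lambda\mu\nu]} = 0$ (to see this
   you could e.g. choose the (locally defined) coordinate functions $\phi(x) = x^\rho$ with
   $\nabla_\sigma\phi = \delta^\rho_{\ \sigma}$).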
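4. Symmetry under exchange of the two pairs of indices
   $$ R_{\mu\nu\lambda\sigma} = R_{\lambda\sigma\mu\nu} \tag{7.19} $$
   This identity, stating that the Riemann tensor is symmetric in its two pairs of
   indices, is not an independent symmetry but can be deduced from the three other
   symmetries by some not particularly interesting algebraic manipulations. One
   (quite possibly not optimal or minimal) possibility is
   $$ \begin{aligned}
   R_{\mu\nu\lambda\sigma} &\overset{(3)}{=} -R_{\mu\lambda\sigma\nu} - R_{\mu\sigma\nu\lambda} \\
   &\overset{(2)}{=} R_{\lambda\mu\sigma\nu} + R_{\sigma\mu\nu\lambda} \\
   &\overset{(3)}{=} -R_{\lambda\sigma\nu\mu} - R_{\lambda\nu\mu\sigma} - R_{\sigma\nu\lambda\mu} - R_{\sigma\lambda\mu\nu} \\
   &\overset{(1,2)}{=} 2R_{\lambda\sigma\mu\nu} + R_{\nu\lambda\mu\sigma} - R_{\nu\sigma\mu\lambda} \\
   &\overset{(3)}{=} 2R_{\lambda\sigma\mu\nu} - R_{\nu\mu\sigma\lambda} \\
   &\overset{(1,2)}{=} 2R_{\lambda\sigma\mu\nu} - R_{\mu\nu\lambda\sigma} ,
   \end{aligned} \tag{7.20} $$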
while
(1,2)
1 + 2 3 4 = 2R 2R . (7.23)
We can now count how many independent components the Riemann tensor really has.
(1) implies that the second pair of indices can only take $N = (4 \times 3)/2 = 6$ independent
values. (2) implies the same for the first pair of indices. (4) thus says that the Riemann
curvature tensor behaves like a symmetric $(6 \times 6)$ matrix and therefore has $(6 \times 7)/2 = 21$
components. We now come to the remaining condition (3): if two of the indices in (3)
are equal, (3) is equivalent to (4) and (4) we have already taken into account. With
all indices unequal, (3) then provides one and only one more additional constraint. We
conclude that the total number of independent components is 20.
$^{18}$ See e.g. D. Bleecker, Gauge Theory and Variational Principles.
Remarks:
1. Note that this agrees precisely with our previous counting in section 2.10 of how
   many of the second derivatives of the metric cannot be set to zero by a coordinate
   transformation: the second derivative of the metric has 100 independent
   components, to be compared with the $4 \times (4 \times 5 \times 6)/(2 \times 3) = 80$ components of the
   third derivatives of the coordinates. This also leaves 20 components. We thus see
   very explicitly that the Riemann curvature tensor contains all the
   coordinate-independent information about the geometry up to second derivatives of the metric.
In fact, it can be shown that in a Riemann normal coordinate system one has
2. Just for the record, I note here that in general dimension $D = d + 1$ the Riemann
   tensor has $D^2(D^2 - 1)/12$ independent components. This number arises as
   $$ \frac{D^2(D^2 - 1)}{12} = \frac{N(N + 1)}{2} - \binom{D}{4} , \qquad N = \frac{D(D - 1)}{2} \tag{7.25} $$
   and describes (as above) the number of independent components of a symmetric
   $(N \times N)$-matrix, now subject to $\binom{D}{4}$ conditions which arise from all the possibilities
   of choosing 4 out of $D$ possible distinct values for the indices in (3). Just as for
   $D = 4$, this number of components of the Riemann tensor coincides with the
   number of second derivatives of the metric minus the number of independent
   components of the third derivatives of the coordinates determined in (2.236),
   $$ \frac{D(D + 1)}{2} \times \frac{D(D + 1)}{2} - D \times \frac{D(D + 1)(D + 2)}{2 \times 3} = \frac{D^2(D^2 - 1)}{12} . \tag{7.26} $$
   For $D = 2$ this formula predicts one independent component, and this is as it
   should be. Rather obviously the only independent non-vanishing component of
   the Riemann tensor in this case is $R_{1212}$. We will discuss curvature in 2 dimensions
   in more detail in sections 7.6 and 10.3 below.
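The counting in (7.25) and (7.26) is simple enough to be checked by brute arithmetic. The following short sketch (function names are hypothetical, and only the Python standard library is used) evaluates the three expressions for the first few dimensions and confirms that they agree.

```python
from math import comb

def riemann_components(dim):
    """Symmetric N x N matrix on index pairs minus the cyclic constraints, cf. (7.25)."""
    n_pairs = dim * (dim - 1) // 2
    return n_pairs * (n_pairs + 1) // 2 - comb(dim, 4)

def closed_form(dim):
    """The closed expression D^2 (D^2 - 1) / 12."""
    return dim**2 * (dim**2 - 1) // 12

def derivative_count(dim):
    """Second derivatives of the metric minus third derivatives of the coordinates, cf. (7.26)."""
    symmetric_pairs = dim * (dim + 1) // 2
    third_derivatives = dim * (dim * (dim + 1) * (dim + 2) // 6)
    return symmetric_pairs * symmetric_pairs - third_derivatives

for dim in range(1, 8):
    print(dim, riemann_components(dim), closed_form(dim), derivative_count(dim))
```

For $D = 2, 3, 4$ all three columns give 1, 6 and 20 respectively, in agreement with the counting in the text.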
Finally, a word of warning: there are a large number of sign conventions involved
in the definition of the Riemann tensor (and its contractions we will discuss below),
so whenever reading a book or article, in particular when you want to use results or
equations presented there, make sure what conventions are being used and either adopt
those or translate the results into some other convention. As a check: the conventions
used here are such that $R_{\theta\phi\theta\phi}$ as well as the curvature scalar (to be introduced below)
are positive for the standard metric on the two-sphere.
7.4 Influence of Curvature on Particle Trajectories
In a certain sense the main effect of curvature (or gravity) is that initially parallel
trajectories of freely falling non-interacting particles (dust, pebbles,. . . ) do not remain
parallel, i.e. that gravity is an attractive force that has the tendency to focus matter.
This statement finds its mathematically precise formulation in equations describing the
influence of space-time curvature on the behaviour of (families of) geodesics.
Let us, as we will need this later anyway, recall first the situation in the Newtonian
theory. One particle moving under the influence of a gravitational field is governed by
the equation
$$ \frac{d^2}{dt^2} x^i = -\partial^i\phi(x) , \tag{7.27} $$
where $\phi$ is the potential. Now consider a family of particles, or just two nearby particles,
one at $x^i(t)$ and the other at $x^i(t) + \delta x^i(t)$. The other particle will of course obey the
equation
$$ \frac{d^2}{dt^2}(x^i + \delta x^i) = -\partial^i\phi(x + \delta x) . \tag{7.28} $$
From these two equations one can deduce an equation for $\delta x$ itself, namely
$$ \frac{d^2}{dt^2}\delta x^i = -\partial^i\partial_j\phi(x)\, \delta x^j . \tag{7.29} $$
It describes the effect of gravitational tidal forces (the gradient of the gravitational force)
on a family of particles moving in a gravitational field.
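To see the tidal equation at work, here is a minimal numerical sketch. It assumes, purely as an example not taken from the text, the point-mass potential $\phi = -GM/r$, evaluates the tidal matrix $\partial_i\partial_j\phi$ by central finite differences, and compares the resulting relative acceleration with the one obtained directly from the difference of the two equations of motion. All names and numerical values are illustrative.

```python
import math

GM = 1.0  # G*M in illustrative units (an assumption)

def potential(x):
    """Point-mass potential phi = -GM/r (assumed example)."""
    return -GM / math.sqrt(sum(c * c for c in x))

def gradient(x, h=1e-5):
    """Central-difference gradient d_i phi."""
    grad = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((potential(xp) - potential(xm)) / (2 * h))
    return grad

def tidal_matrix(x, h=1e-4):
    """Central-difference Hessian; hess[j][i] = d_j d_i phi (symmetric)."""
    hess = []
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = gradient(xp), gradient(xm)
        hess.append([(gp[i] - gm[i]) / (2 * h) for i in range(3)])
    return hess

def relative_acceleration_exact(x, dx):
    """Difference of the two Newtonian equations of motion, no linearisation."""
    g0 = gradient(x)
    g1 = gradient([a + b for a, b in zip(x, dx)])
    return [-(b - a) for a, b in zip(g0, g1)]

def relative_acceleration_tidal(x, dx):
    """Linearised form: minus the tidal matrix applied to the separation."""
    hess = tidal_matrix(x)
    return [-sum(hess[i][j] * dx[j] for j in range(3)) for i in range(3)]

x = [2.0, 0.0, 0.0]
dx = [1e-3, 1e-3, 0.0]
print(relative_acceleration_exact(x, dx))
print(relative_acceleration_tidal(x, dx))
```

The two results agree up to terms of second order in the separation. In this example the radial component of the separation is stretched while the transverse component is squeezed, which is the familiar pattern of tidal forces around a point mass.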
In particular, when there is no gravitational force, and the trajectories are straight lines,
one has
$$ \frac{d^2}{dt^2}\delta x^i = 0 \quad\Rightarrow\quad \delta x^i = (\delta x^i)_0 + (\delta v^i)\, t . \tag{7.30} $$
Thus one recovers Euclid's parallel axiom, that two straight lines intersect at most once
(when $\delta v^i \neq 0$) and that they never intersect when they are initially parallel ($\delta v^i = 0$).
Any departure from this equation or its Minkowskian counterpart
$$ \frac{d^2}{d\tau^2}\delta\xi^a = 0 , \tag{7.31} $$
therefore indicates the presence of a gravitational field.
It is the counterpart of (7.29) that we will be seeking in the context of General Rela-
tivity. One derivation of this can be modelled on the Newtonian derivation above. It
is elementary but looks non-covariant (and therefore somewhat messy) at intermediate
stages of the calculation (see section 11.1 for a manifestly covariant derivation).
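The starting point is of course the geodesic equation for $x^\mu$ and for its nearby partner
$x^\mu + \delta x^\mu$,
$$ \frac{d^2}{d\tau^2} x^\mu + \Gamma^\mu_{\ \nu\lambda}(x) \frac{d}{d\tau}x^\nu \frac{d}{d\tau}x^\lambda = 0 , \tag{7.32} $$
and
$$ \frac{d^2}{d\tau^2}(x^\mu + \delta x^\mu) + \Gamma^\mu_{\ \nu\lambda}(x + \delta x) \frac{d}{d\tau}(x^\nu + \delta x^\nu) \frac{d}{d\tau}(x^\lambda + \delta x^\lambda) = 0 . \tag{7.33} $$
As above, from these one can deduce an equation for $\delta x$, namely
$$ \frac{d^2}{d\tau^2}\delta x^\mu + 2\Gamma^\mu_{\ \nu\lambda}(x) \frac{d}{d\tau}x^\nu \frac{d}{d\tau}\delta x^\lambda + \partial_\rho\Gamma^\mu_{\ \nu\lambda}(x)\, \delta x^\rho \frac{d}{d\tau}x^\nu \frac{d}{d\tau}x^\lambda = 0 . \tag{7.34} $$
Now this does not look particularly covariant. Thus instead of in terms of $d/d\tau$ we
would like to rewrite this in terms of the covariant operator $D_\tau$, with
$$ D_\tau \delta x^\mu = \frac{d}{d\tau}\delta x^\mu + \Gamma^\mu_{\ \nu\lambda} \frac{dx^\nu}{d\tau} \delta x^\lambda . \tag{7.35} $$
Calculating $(D_\tau)^2\delta x^\mu$, replacing $\ddot{x}^\mu$ appearing in that expression by $-\Gamma^\mu_{\ \nu\lambda}\dot{x}^\nu\dot{x}^\lambda$
(because $x^\mu$ satisfies the geodesic equation) and using (7.34), one eventually finds the nice
covariant geodesic deviation equation
$$ (D_\tau)^2\delta x^\mu = R^\mu_{\ \nu\lambda\rho}\, \dot{x}^\nu \dot{x}^\lambda \delta x^\rho \tag{7.36} $$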
Remarks:
1. This shows very clearly that curvature, as captured by the Riemann curvature
tensor, leads to non-Euclidean geometry in which e.g. the parallel axiom is not
necessarily satisfied.
2. In general, solutions to the geodesic deviation equation are called Jacobi fields.
They describe the difference between the given geodesic and a (hypothetical) in-
finitely close neighbouring geodesic.
7.5 Contractions of the Riemann Tensor: Ricci Tensor and Ricci Scalar
The Riemann tensor, as we have seen, is a four-index tensor. For many purposes this
is not the most useful object, but we can create new tensors by contractions of the
Riemann tensor. Due to the symmetries of the Riemann tensor, there is essentially only
one possibility, namely the Ricci tensor
$$ R_{\mu\nu} := R^\lambda_{\ \mu\lambda\nu} = g^{\lambda\sigma} R_{\sigma\mu\lambda\nu} . \tag{7.37} $$
It arises naturally from the definition (7.2) of the Riemann tensor in terms of
commutators of covariant derivatives, when one considers a contracted commutator,
$$ [\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda_{\ \sigma\mu\nu} V^\sigma \quad\Rightarrow\quad [\nabla_\mu, \nabla_\nu] V^\mu = R^\mu_{\ \sigma\mu\nu} V^\sigma = R_{\sigma\nu} V^\sigma . \tag{7.38} $$
In particular, this identity explains why the Maxwell equations in the covariant Lorenz
gauge (5.45) take the non-minimally coupled form (5.46).
It follows from the symmetries of the Riemann tensor that $R_{\mu\nu}$ is symmetric. Indeed
$$ R_{\mu\nu} = g^{\lambda\sigma} R_{\lambda\mu\sigma\nu} = g^{\lambda\sigma} R_{\sigma\nu\lambda\mu} = R^\lambda_{\ \nu\lambda\mu} = R_{\nu\mu} . \tag{7.39} $$
Thus, for $D = 4$, the Ricci tensor has 10 independent components, for $D = 3$ it has 6,
while for $D = 2$ there is only 1 because there is only one independent component of the
Riemann curvature tensor to start off with.
There is one more contraction of the Riemann tensor we can perform, namely on the
Ricci tensor itself, to obtain what is called the Ricci scalar or curvature scalar
$$ R := g^{\mu\nu} R_{\mu\nu} . \tag{7.40} $$
Remarks:
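1. One might have thought that at least in four dimensions there is another way
   of constructing a (pseudo-)scalar, by contracting the Riemann tensor with the
   Levi-Civita tensor, but
   $$ \epsilon^{\mu\nu\rho\sigma} R_{\mu\nu\rho\sigma} = 0 \tag{7.41} $$
   by the cyclic symmetry (7.17).
2. Note that for $D = 2$ the Riemann curvature tensor has as many independent
   components as the Ricci scalar, namely one, and that for $D = 3$ the Ricci tensor
   has as many components as the Riemann tensor, namely 6. Thus in $D = 2$ one
   can express the entire Riemann tensor in terms of the Ricci scalar (and the metric)
   alone, and one has
   $$ D = 2 : \quad R_{\mu\nu\lambda\sigma} = \tfrac{1}{2}(g_{\mu\lambda} g_{\nu\sigma} - g_{\mu\sigma} g_{\nu\lambda}) R \tag{7.42} $$
   (we will establish this relation in section 10.3, see (10.29)), while in $D = 3$ one
   has
   $$ D = 3 : \quad R_{\mu\nu\lambda\sigma} = (g_{\mu\lambda} R_{\nu\sigma} + R_{\mu\lambda} g_{\nu\sigma} - g_{\mu\sigma} R_{\nu\lambda} - R_{\mu\sigma} g_{\nu\lambda}) + \tfrac{1}{2}(g_{\mu\sigma} g_{\nu\lambda} - g_{\mu\lambda} g_{\nu\sigma}) R \tag{7.43} $$
   (and we will prove this in section 10.4).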
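3. It is thus only in four (and more) dimensions that there are strictly fewer components
   of the Ricci tensor than of the Riemann tensor. This has profound implications
   for the dynamics of gravity in these dimensions. In fact, we will see that it is only
   in dimensions $D > 3$ that gravity becomes truly dynamical, where empty space
   can be curved, where gravitational waves can exist etc.
4. Contracting (7.8), one consequence of the symmetry of the Ricci tensor is the
   useful general result
   $$ [\nabla_\mu, \nabla_\nu] T^{\mu\nu} = R_{\mu\nu}(T^{\mu\nu} - T^{\nu\mu}) = 0 . \tag{7.44} $$
   In particular, for an anti-symmetric tensor one has
   $$ F^{\mu\nu} = -F^{\nu\mu} \quad\Rightarrow\quad \nabla_\mu(\nabla_\nu F^{\mu\nu}) = \tfrac{1}{2}[\nabla_\mu, \nabla_\nu] F^{\mu\nu} = 0 . \tag{7.45} $$
   Note that this can also be deduced (without knowing anything about curvature
   in general or the Ricci tensor in particular) from the general expression (4.63) for
   the divergence of an anti-symmetric tensor. An example is the statement that the
   Maxwell equations imply covariant current conservation,
   $$ \nabla_\mu F^{\mu\nu} = -J^\nu \quad\Rightarrow\quad \nabla_\nu J^\nu = 0 . \tag{7.47} $$
   Now we see that we can alternatively directly use the identity (7.45) to arrive at
   this result.
5. Contracting (7.38) with $V^\nu$, one finds
   $$ V^\nu\nabla_\mu\nabla_\nu V^\mu - V^\nu\nabla_\nu\nabla_\mu V^\mu = R_{\mu\nu} V^\mu V^\nu . \tag{7.48} $$
   Writing the first term as
   $$ V^\nu\nabla_\mu\nabla_\nu V^\mu = \nabla_\mu(V^\nu\nabla_\nu V^\mu) - (\nabla_\mu V^\nu)(\nabla_\nu V^\mu) \tag{7.49} $$
   this becomes
   $$ V^\nu\nabla_\nu(\nabla_\mu V^\mu) + (\nabla_\mu V^\nu)(\nabla_\nu V^\mu) - \nabla_\mu(V^\nu\nabla_\nu V^\mu) + R_{\mu\nu} V^\mu V^\nu = 0 . \tag{7.50} $$
   This is a very useful and versatile master equation which provides valuable
   information about the relation between vector fields and curvature when specialised e.g.
   to geodesic vector fields, $V^\nu\nabla_\nu V^\mu = 0$, or Killing vector fields, $\nabla_\mu V_\nu = -\nabla_\nu V_\mu$
   and $\nabla_\mu V^\mu = 0$. Various specialisations of this equation will therefore appear later
   on in these notes, and even though we will then usually rederive them from scratch
   in the case at hand, it is good to keep in mind that e.g. (11.22) (our starting point
   for the discussion of the Raychaudhuri equation in section 11.2) and (12.12) (a
   useful identity relating Killing vectors and curvature) are special cases of (7.50).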
6. There are other scalars that can be built from the curvature tensor, but these
   are necessarily of higher order in the curvature tensor, such as (trivially) $R^2$ or
   (somewhat less trivially) $R_{\mu\nu} R^{\mu\nu}$ or the square of the Riemann tensor, the
   so-called Kretschmann scalar
   $$ K = R_{\mu\nu\lambda\sigma} R^{\mu\nu\lambda\sigma} . \tag{7.51} $$
   Analogously, scalars can be built from higher powers of the Riemann tensor and/or
   from powers of covariant derivatives of the Riemann tensor ($\Box R$ being the simplest
   example).
7. Such scalars are useful in analysing a given metric because, since they are scalars,
   they are invariant under coordinate transformations. Thus they directly provide
   coordinate-invariant information about a metric. For instance if $K$ is singular at
   some point in some coordinate system then it will be singular at that point in all
   coordinate systems, and thus such a singularity is not an artefact of a bad choice
   of coordinate system but a property of the space(-time) itself described by that
   metric. A prominent example is the singularity at the origin $r = 0$ of the
   Schwarzschild metric, unambiguously unveiled by the singularity of its Kretschmann scalar
   (26.145).
   $$ (\nabla_\mu V_\nu)(\nabla^\mu V^\nu) + R_{\mu\nu} V^\mu V^\nu = \nabla_\mu(V^\nu\nabla_\nu V^\mu) . \tag{7.52} $$
   The simplest (albeit perhaps not of most direct relevance for physics) situation
   where one can deduce something of substance from this equation is when one
   has a Riemannian (i.e. positive-definite) metric and the space one is dealing with
   is compact, without boundary. Then (a) the first term is non-negative, and (b)
   upon integration over the space the total derivative term on the right-hand side
   gives zero upon use of the Gauss theorem (4.61) (discussed in some more detail in
   section 15.3).
   This implies that for a harmonic $V^\mu$ to exist on such a space, the integral of
   $R_{\mu\nu} V^\mu V^\nu$ must be non-positive. In particular,
   • if the Ricci tensor is positive (as a quadratic form), there are no harmonic
     vector fields at all,
   • and if $R_{\mu\nu} V^\mu V^\nu = 0$, then a harmonic vector field is necessarily covariantly
     constant, $\nabla_\mu V_\nu = 0$.
   In more mathematical terms this means that the first Betti number of a compact
   manifold admitting a metric with positive Ricci curvature is equal to zero. A
   variant of this kind of argument for Killing vectors will be given in section 12.3.$^{19}$
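To see how calculations of the curvature tensor can be done in practice, let us work out
the example of the two-sphere of unit radius, i.e. with line element
$$ ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2 . \tag{7.53} $$
We already know that the non-zero Christoffel symbols necessarily have two $\phi$-indices
and one $\theta$-index (from $g_{\phi\phi} = \sin^2\theta$), and are given by
$$ \Gamma^\phi_{\ \phi\theta} = \cot\theta , \qquad \Gamma^\theta_{\ \phi\phi} = -\sin\theta\cos\theta . \tag{7.54} $$
We also know that the Riemann curvature tensor has only one independent component.
Let us therefore work out $R^\theta_{\ \phi\theta\phi}$. From the definition we find
$$ R^\theta_{\ \phi\theta\phi} = \partial_\theta\Gamma^\theta_{\ \phi\phi} - \partial_\phi\Gamma^\theta_{\ \phi\theta} + \Gamma^\theta_{\ \theta c}\Gamma^c_{\ \phi\phi} - \Gamma^\theta_{\ \phi c}\Gamma^c_{\ \theta\phi} . \tag{7.55} $$
The second and third terms are manifestly zero, and we are left with
$$ R^\theta_{\ \phi\theta\phi} = \partial_\theta(-\sin\theta\cos\theta) + \sin\theta\cos\theta\cot\theta = \sin^2\theta . \tag{7.56} $$
Thus we have
$$ R^\theta_{\ \phi\theta\phi} = R_{\theta\phi\theta\phi} = \sin^2\theta , \qquad R^\phi_{\ \theta\phi\theta} = 1 . \tag{7.57} $$
Therefore the Ricci tensor $R_{ab}$ has the components
$$ R_{\theta\theta} = 1 , \qquad R_{\theta\phi} = 0 , \qquad R_{\phi\phi} = \sin^2\theta . \tag{7.58} $$
These equations can be summarised as
$$ R_{ab} = g_{ab} , \tag{7.59} $$
showing that the standard metric on the two-sphere is what we will later call an Einstein
metric. The Ricci scalar $R$ is
$$ R = g^{\theta\theta} R_{\theta\theta} + g^{\phi\phi} R_{\phi\phi} = 1 + \frac{1}{\sin^2\theta}\sin^2\theta = 2 . \tag{7.60} $$
In particular, we have here our first concrete example of a space with non-trivial, in fact
positive, curvature.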
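The calculation above can be cross-checked numerically. The following is only a rough sketch, not a general-purpose tool: it assumes a diagonal metric, approximates all derivatives by central finite differences, and implements the Riemann tensor with the index conventions of (7.5) and the Ricci tensor as in (7.37). The function names and the chosen evaluation point are hypothetical.

```python
import math

def sphere_metric(x):
    """Diagonal components (g_theta_theta, g_phi_phi) of the unit two-sphere."""
    theta, _phi = x
    return [1.0, math.sin(theta) ** 2]

def metric_derivative(metric, x, c, h=1e-4):
    """Central difference d_c g_aa of a diagonal metric; returns a list over a."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    return [(p - m) / (2 * h) for p, m in zip(metric(xp), metric(xm))]

def christoffel(metric, x):
    """gam[a][b][c] = Gamma^a_{bc} for a diagonal metric."""
    dim = len(x)
    g = metric(x)
    dg = [metric_derivative(metric, x, c) for c in range(dim)]  # dg[c][a] = d_c g_aa

    def d(c, a, b):  # d_c g_ab, zero off the diagonal
        return dg[c][a] if a == b else 0.0

    return [[[0.5 / g[a] * (d(b, a, c) + d(c, a, b) - d(a, b, c))
              for c in range(dim)] for b in range(dim)] for a in range(dim)]

def riemann(metric, x, h=1e-4):
    """riem[l][s][m][n] = R^l_{s m n}, following the index pattern of (7.5)."""
    dim = len(x)
    rng = range(dim)
    gam = christoffel(metric, x)
    dgam = []  # dgam[m][l][s][n] = d_m Gamma^l_{sn}
    for m in rng:
        xp, xm = list(x), list(x)
        xp[m] += h
        xm[m] -= h
        gp, gm = christoffel(metric, xp), christoffel(metric, xm)
        dgam.append([[[(gp[l][s][n] - gm[l][s][n]) / (2 * h) for n in rng]
                      for s in rng] for l in rng])
    return [[[[dgam[m][l][s][n] - dgam[n][l][s][m]
               + sum(gam[l][m][r] * gam[r][n][s] - gam[l][n][r] * gam[r][m][s]
                     for r in rng)
               for n in rng] for m in rng] for s in rng] for l in rng]

def ricci(metric, x):
    """ric[m][n] = R_{mn} = R^l_{m l n}, cf. (7.37)."""
    riem = riemann(metric, x)
    rng = range(len(x))
    return [[sum(riem[l][m][l][n] for l in rng) for n in rng] for m in rng]

def ricci_scalar(metric, x):
    """R = g^{ab} R_{ab} for a diagonal metric."""
    g = metric(x)
    ric = ricci(metric, x)
    return sum(ric[a][a] / g[a] for a in range(len(x)))

point = [1.0, 0.3]  # (theta, phi), chosen away from the poles
print(riemann(sphere_metric, point)[0][1][0][1], math.sin(1.0) ** 2)
print(ricci(sphere_metric, point))
print(ricci_scalar(sphere_metric, point))
```

Up to finite-difference errors, the first line reproduces $R^\theta_{\ \phi\theta\phi} = \sin^2\theta$, the Ricci tensor agrees with the metric, and the Ricci scalar is 2. Any other diagonal metric can be passed in place of `sphere_metric`.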
We will see later on, in section 13, that this form of the curvature tensor, or its equivalent,
We now turn to some variations of the above theme (and some other generalisations are
discussed in section 10.3 below).
1. First of all, let us address the question what is the curvature (scalar) of a sphere
   of radius $\rho$, i.e. of the space with line element $ds^2 = \rho^2(d\theta^2 + \sin^2\theta\, d\phi^2)$.
   There are two ways to go about this:
   • The first is to simply and blindly redo the above calculations in this case and
     to see what one gets.
   • Alternatively, and somewhat more insightfully, rather than redoing the
     calculation in that case one can argue as follows. Let us observe first of all that
     the Christoffel symbols are invariant under constant rescalings of the metric
     because they are schematically of the form $g^{-1}\partial g$. Therefore the Riemann
     curvature tensor, which only involves derivatives and products of Christoffel
     symbols, is also invariant. Hence the Ricci tensor, which is just a contraction
     of the Riemann tensor, is also invariant.
     However, to construct the Ricci scalar, one needs the inverse metric. This
     introduces an explicit $\rho$-dependence and the result is that the curvature scalar
     of a sphere of radius $\rho$ is $R = 2/\rho^2$.
2. Now let us consider, instead of the unit 2-sphere, the unit hyperboloid $H^2$ with
metric (1.123)
$$ ds^2(H^2) = d\sigma^2 + \sinh^2\sigma\, d\phi^2 . \tag{7.67} $$
It is clear that, apart from a few sign changes here and there, the calculation
of the Riemann curvature tensor is identical to that for $S^2$. These sign changes
ultimately lead to the conclusion that the curvature scalar of $H^2$ is $-2$. While
the sphere is the prototypical example of a space with positive curvature, the
hyperboloid is the prototypical example of a space with negative curvature.
3. Now let us promote the constant radius of $S^2$ to a new radial coordinate $r$ and
ask the question what is the curvature tensor of the 3-dimensional space with
coordinates $(r, x^a) = (r, \theta, \phi)$ and line element
$$ ds^2 = dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{7.68} $$
On the one hand, because one seems to have just added a trivial $r$-direction to the
2-sphere, one might be tempted to suspect that this 3-dimensional space also has
non-trivial curvature. On the other hand, we recognise the above metric as the
Euclidean metric on $\mathbb{R}^3$, written in spherical coordinates, and as such we expect
its curvature (in fact, all components of the Riemann tensor) to be zero.
The latter expectation is of course borne out, but it is instructive to see explicitly
how this cancellation occurs. In fact, it will be even more instructive to consider
an apparently harmless and innocuous modification of the above metric which
consists in replacing $dr^2$ by some constant multiple of $dr^2$,
$$ ds^2 = p\, dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{7.69} $$
Equivalently, up to a truly harmless overall constant factor, we can think of this
as the Euclidean metric, but with the metric on the unit sphere replaced by that
of a sphere of radius $1/\sqrt{p}$ ($\neq 1$),
$$ ds^2 = p \left( dr^2 + (r^2/p)(d\theta^2 + \sin^2\theta\, d\phi^2) \right) . \tag{7.70} $$
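In terms of components, the above metric is
$$ g_{rr} = p , \qquad g_{ab} = r^2 \gamma_{ab} , $$
with $\gamma_{ab}$ in this example denoting the components of the metric on the unit sphere
(and with $\gamma^a_{\;bc}$ and $r^a_{\;bcd}$ its associated Christoffel symbols and components of the
Riemann curvature tensor determined in the previous section). From these we can
deduce that for $r > 0$ the non-trivial Christoffel symbols are
$$ \Gamma^r_{\;ab} = -\frac{r}{p}\,\gamma_{ab} , \qquad \Gamma^a_{\;rb} = \frac{1}{r}\,\delta^a_{\;b} , \qquad \Gamma^a_{\;bc} = \gamma^a_{\;bc} . \tag{7.72} $$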
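From this, in turn, one finds that all the components of the Riemann tensor
involving at least one $r$-index are zero, whereas for the purely angular components
one finds
$$ R^a_{\;bcd} = r^a_{\;bcd} + \Gamma^a_{\;cr}\Gamma^r_{\;bd} - \Gamma^a_{\;dr}\Gamma^r_{\;bc} . \tag{7.73} $$
Using (7.61) and (7.72), one sees that (with $\gamma_{ab}$ the metric on the unit sphere)
$$ R^a_{\;bcd} = \left(1 - \frac{1}{p}\right)\left(\delta^a_{\;c}\gamma_{bd} - \delta^a_{\;d}\gamma_{bc}\right) = \left(1 - \frac{1}{p}\right) r^a_{\;bcd} . \tag{7.74} $$
Therefore precisely for $p = 1$ the two contributions to the curvature tensor indeed
cancel and the curvature tensor is identically zero, as expected.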
Equally interesting is the fact that for $p \neq 1$ the curvature is non-zero even away
from $r = 0$ (in addition, there is a conical deficit angle singularity at $r = 0$, as in
the next example below, but this shall not be our concern here). In particular it
follows from the above result that the only non-vanishing components of the Ricci
tensor of this 3-dimensional space are
$$ R_{\theta\theta} = 1 - \frac{1}{p} , \qquad R_{\phi\phi} = \left(1 - \frac{1}{p}\right)\sin^2\theta , \tag{7.75} $$
so that the Ricci scalar is
$$ R = \frac{2}{r^2}\left(1 - \frac{1}{p}\right) . \tag{7.76} $$
We also see from this that this space actually has a curvature singularity as $r \to 0$.
Since the Ricci scalar is a scalar (under coordinate transformations), this divergence
cannot be an artefact of a bad choice of coordinates, and indicates that
there is a genuine geometric singularity for $r \to 0$.
Extended to a four-dimensional space-time metric via
$$ ds^2 = -dt^2 + p\, dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) , \tag{7.77} $$
this describes the gravitational field outside a monopole.20
4. As a final variation of this theme, we consider the above example in one dimension
less, i.e. we look at the metric one obtains if one replaces the Euclidean metric on
$\mathbb{R}^2$ written in polar coordinates by
$$ dr^2 + r^2 d\phi^2 \;\to\; p\, dr^2 + r^2 d\phi^2 . \tag{7.78} $$
This would be the standard Euclidean metric on $\mathbb{R}^2$ either for $p = 1$ or if the angle
$\phi$ had periodicity $2\pi\sqrt{p}$, but since $\phi$ has period $2\pi$, this results in a misidentification
of the points in a plane, like when one rolls up a flat piece of paper into a cone.
Away from $r = 0$, this space is intrinsically flat (all the components of the Riemann
curvature tensor are zero, as one can easily calculate - see section 10.1 for an
explanation of this use of the word "intrinsic"). There is, however, a conical
singularity at the tip of the cone $r = 0$, which can be thought of as a $\delta$-function
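contribution to the curvature localised at $r = 0$. Extended to a four-dimensional
space-time metric by adding a time direction and one flat spatial direction, this
describes the gravitational field of a straight cosmic string.21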
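The variations 1, 2 and 4 above all concern two-dimensional metrics of the form $d\sigma^2 + f(\sigma)^2 d\phi^2$, for which the Ricci scalar is $R = -2 f''/f$. That closed formula is a standard result which is not derived in these notes, so the sketch below should be read as a plausibility check under that assumption, not as part of the text's argument. The function names are hypothetical, and the second derivative is approximated by a finite difference.

```python
import math

def ricci_scalar_2d(f, sigma, h=1e-4):
    """Ricci scalar R = -2 f''/f of d(sigma)^2 + f(sigma)^2 d(phi)^2 (assumed formula)."""
    # Second derivative of f by a central second difference.
    second = (f(sigma + h) - 2.0 * f(sigma) + f(sigma - h)) / h ** 2
    return -2.0 * second / f(sigma)

def sphere(rho):
    """Sphere of radius rho: proper distance sigma = rho * theta, f = rho sin(sigma / rho)."""
    return lambda sigma: rho * math.sin(sigma / rho)

def hyperboloid(sigma):
    """Unit hyperboloid: f = sinh(sigma)."""
    return math.sinh(sigma)

def cone(p):
    """Cone p dr^2 + r^2 dphi^2: proper distance sigma = sqrt(p) r, so f = sigma / sqrt(p)."""
    return lambda sigma: sigma / math.sqrt(p)

if __name__ == "__main__":
    print("unit sphere      :", ricci_scalar_2d(sphere(1.0), 0.8))
    print("sphere, radius 3 :", ricci_scalar_2d(sphere(3.0), 1.7), "vs", 2.0 / 9.0)
    print("unit hyperboloid :", ricci_scalar_2d(hyperboloid, 0.6))
    print("cone, p = 2      :", ricci_scalar_2d(cone(2.0), 1.3))
```

The four cases should reproduce, up to numerical error, the values $2$, $2/\rho^2$, $-2$ and $0$ (away from the tip of the cone) discussed above.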
So far, we have discussed algebraic properties of the Riemann tensor. The Riemann
tensor also satisfies some differential identities which, in particular in their contracted
form, will be of fundamental importance in the following.
20
M. Barriola, A. Vilenkin, Gravitational Field of a Global Monopole, Phys. Rev. Lett. 63 (1989)
341-343.
21
It is far from straightforward, however, to find a formalism which allows one to calculate and derive
the distributional Riemann tensor of this space-time - see R. Geroch, J. Traschen, Strings and other
distributional sources in general relativity, Phys. Rev. D36 (1987) 1017-1031 for a general analysis of
the problem and issues arising in this and related contexts, C. Clarke, J. Vickers, J. Wilson, Generalized
functions and distributional curvature of cosmic strings, Class. Quantum Grav. 13 (1996) 2485-2498
for one approach (based on the Colombeau algebra of distributions), and D. Garfinkle, Metrics with
distributional curvature, arXiv:gr-qc/9906053 for a different approach. We will (mostly) stay away
from distributional curvatures in these notes.
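The first identity is easy to derive. As a (differential) operator the covariant derivative
clearly satisfies the Jacobi identity
$$ [\nabla_{[\mu}, [\nabla_\nu, \nabla_{\lambda]}]] = 0 , \tag{7.81} $$
i.e.
$$ [\nabla_{[\mu}, [\nabla_\nu, \nabla_{\lambda]}]] = 0 \;\Leftrightarrow\; [\nabla_\mu, [\nabla_\nu, \nabla_\lambda]] + \text{cyclic permutations of } (\mu,\nu,\lambda) = 0 . \tag{7.82} $$
If you do not believe this identity (valid for any three linear operators, since their
composition is associative), you can just write out the twelve relevant terms explicitly
to see that there is indeed a complete cancellation:
$$ \begin{aligned}
[\nabla_{[\mu}, [\nabla_\nu, \nabla_{\lambda]}]] \;\sim\; & \nabla_\mu\nabla_\nu\nabla_\lambda - \nabla_\mu\nabla_\lambda\nabla_\nu - \nabla_\nu\nabla_\lambda\nabla_\mu + \nabla_\lambda\nabla_\nu\nabla_\mu \\
& + \nabla_\nu\nabla_\lambda\nabla_\mu - \nabla_\nu\nabla_\mu\nabla_\lambda - \nabla_\lambda\nabla_\mu\nabla_\nu + \nabla_\mu\nabla_\lambda\nabla_\nu \\
& + \nabla_\lambda\nabla_\mu\nabla_\nu - \nabla_\lambda\nabla_\nu\nabla_\mu - \nabla_\mu\nabla_\nu\nabla_\lambda + \nabla_\nu\nabla_\mu\nabla_\lambda \\
=\; & 0 .
\end{aligned} \tag{7.83} $$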
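Since the Jacobi identity holds for any linear operators with an associative product, it can be illustrated with ordinary matrices standing in for the covariant derivatives. The following sketch (an illustration only, with hypothetical helper names) forms the cyclic sum of nested commutators for three random integer matrices and confirms that all twelve terms cancel exactly.

```python
import random

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))

def jacobi_sum(a, b, c):
    """[a,[b,c]] + [b,[c,a]] + [c,[a,b]], i.e. the sum over cyclic permutations."""
    return add(add(comm(a, comm(b, c)), comm(b, comm(c, a))), comm(c, comm(a, b)))

def random_matrix(n, rng):
    """Random n x n integer matrix, so that the cancellation is exact."""
    return [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    a, b, c = (random_matrix(3, rng) for _ in range(3))
    print(jacobi_sum(a, b, c))  # every entry is 0
```

Integer entries are used so that the result is exactly zero and no floating-point tolerance is needed.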
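To determine the implications of this identity for the Riemann tensor, we apply it to a
vector field $V^\alpha$, say. The first term in (7.82) is
$$ \begin{aligned}
[\nabla_\mu, [\nabla_\nu, \nabla_\lambda]] V^\alpha &= \nabla_\mu (R^\alpha_{\;\beta\nu\lambda} V^\beta) - [\nabla_\nu, \nabla_\lambda](\nabla_\mu V^\alpha) \\
&= (\nabla_\mu R^\alpha_{\;\beta\nu\lambda}) V^\beta + R^\alpha_{\;\beta\nu\lambda}\nabla_\mu V^\beta - R^\alpha_{\;\beta\nu\lambda}\nabla_\mu V^\beta + R^\beta_{\;\mu\nu\lambda}\nabla_\beta V^\alpha \\
&= (\nabla_\mu R^\alpha_{\;\beta\nu\lambda}) V^\beta + R^\beta_{\;\mu\nu\lambda}\nabla_\beta V^\alpha .
\end{aligned} \tag{7.84} $$
Upon taking the cyclic permutations, the sum of the second terms vanishes by the cyclic
symmetry $R^\beta_{\;[\mu\nu\lambda]} = 0$ of the Riemann tensor, and therefore one finds
$$ (\nabla_\mu R^\alpha_{\;\beta\nu\lambda}) V^\beta + \text{cyclic permutations of } (\mu,\nu,\lambda) = 0 . \tag{7.85} $$
Since this holds for any $V^\beta$, one deduces the Bianchi identity
$$ \nabla_\mu R^\alpha_{\;\beta\nu\lambda} + \text{cyclic permutations of } (\mu,\nu,\lambda) = 0 \;\Leftrightarrow\; \nabla_{[\mu} R^\alpha_{\;|\beta|\nu\lambda]} = 0 . \tag{7.86} $$
Lowering the index and using the symmetry of the Riemann tensor under the exchange of
the first and second pair of indices, this can also be written as
$$ \nabla_{[\mu} R_{\nu\lambda]\alpha\beta} = 0 , \tag{7.87} $$
or, explicitly,
$$ \nabla_\lambda R_{\alpha\beta\mu\nu} + \nabla_\nu R_{\alpha\beta\lambda\mu} + \nabla_\mu R_{\alpha\beta\nu\lambda} = 0 . \tag{7.88} $$
By contracting this with $g^{\alpha\mu}$ we obtain
$$ \nabla_\lambda R_{\beta\nu} - \nabla_\nu R_{\beta\lambda} + \nabla_\mu R^\mu_{\;\beta\nu\lambda} = 0 . \tag{7.89} $$
This is not yet particularly useful. To also turn the last term into a Ricci tensor we
contract once more, with $g^{\beta\lambda}$, to obtain the contracted Bianchi identity
$$ \nabla_\lambda R^\lambda_{\;\nu} - \nabla_\nu R + \nabla_\mu R^\mu_{\;\nu} = 0 , \tag{7.90} $$
or
$$ \nabla^\mu (R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R) = 0 . \tag{7.91} $$
The tensor appearing in this equation is the Einstein tensor
$$ G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R . \tag{7.92} $$
It is the unique divergence-free tensor that can be built from the metric and its first
and second derivatives (apart from $g_{\mu\nu}$ itself, of course),
$$ \nabla^\mu G_{\mu\nu} = 0 , \tag{7.93} $$
and this is why it will play the central role in the Einstein equations for the gravitational
field.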
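As a quick consistency check of the contracted Bianchi identity, one can evaluate the Einstein tensor for the unit two-sphere treated earlier in this section. The lines below are only a sketch based on the results (7.58) and (7.60); the general two-dimensional statement in the last comment is a standard fact that is not derived in the text above.

```latex
% Unit two-sphere: R_{ab} = g_{ab} and R = 2, from (7.58) and (7.60)
G_{ab} = R_{ab} - \tfrac{1}{2}\, g_{ab} R = g_{ab} - \tfrac{1}{2}\, g_{ab} \cdot 2 = 0 ,
\qquad
\nabla^a G_{ab} = 0 \quad \text{(trivially)} .

% Term by term: \nabla^a R_{ab} = \nabla^a g_{ab} = 0 and \tfrac{1}{2}\,\partial_b R = 0 .
% More generally, in two dimensions R_{ab} = \tfrac{1}{2} R\, g_{ab}, so that G_{ab} vanishes identically.
```

In this example the identity is satisfied in the least interesting way possible, which is one reason why two-dimensional examples say little about the Einstein equations themselves.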
A minor caveat regarding the above statement about the uniqueness of the Einstein
tensor is that, as it stands, it is only true in D = 4 space-time dimensions. In D > 4,
there are other tensors with this property, but they are non-linear in 2nd derivatives of
the metric. The uniqueness statement continues to be true for D > 4 if one adds the
requirement that the tensor is linear in 2nd derivatives of the metric. I will briefly come
back to this in the discussion of the action principle for general relativity in section 19.1.
In particular,
2. this led us to consider the coordinate transformation (2.229)
a ( ) = 0a + a (7.96)
implying at = 0
(and we will look at the implications of the next term in the Taylor expansion of
(7.97) below).
Therefore the Taylor expansion of the metric around = 0 has the form
and we will now determine the quadratic term in this expansion (and be able to express
it in terms of the components of the Riemann tensor Rabcd (0 ) at the point p in these
coordinates). To that end we look at the next term in the Taylor expansion of (7.97).
Thus we differentiate (7.97) along the geodesic, i.e. with respect to , and evaluate the
results at = 0 to deduce
d abc (0 )d b c = 0 b (7.100)
or, equivalently
arising from the higher-order terms in the Taylor expansion of (7.97) impose constraints
on the Christoffel symbols and their derivatives that are satisfied in Riemann normal
coordinates (but not in general inertial coordinate systems).
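A useful way of reexpressing the condition (7.101) is the following (a certain amount of
hindsight or trial-and-error is required for this): because $\Gamma^a_{\;bc}(\xi_0) = 0$, from the definition
of the Riemann tensor we have
$$ \begin{aligned}
R^a_{\;bcd}(\xi_0) + R^a_{\;cbd}(\xi_0) &= \partial_c \Gamma^a_{\;bd}(\xi_0) - \partial_d \Gamma^a_{\;bc}(\xi_0) + \partial_b \Gamma^a_{\;cd}(\xi_0) - \partial_d \Gamma^a_{\;cb}(\xi_0) \\
&= \partial_c \Gamma^a_{\;bd}(\xi_0) + \partial_b \Gamma^a_{\;cd}(\xi_0) - 2\,\partial_d \Gamma^a_{\;bc}(\xi_0) ,
\end{aligned} \tag{7.103} $$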
and using (7.101) this can be written as
we have
$$ g_{ab,cd}(\xi) = \partial_d \Gamma_{abc}(\xi) + \partial_d \Gamma_{bac}(\xi) \tag{7.106} $$
and at $\xi_0$ we can use (7.104) and the symmetries of the Riemann tensor to deduce
We have thus found that, to quadratic order in a Taylor expansion of the metric around
the origin of a Riemann normal coordinate system, the metric can be written as
If required, higher order terms can be determined analogously with the help of the
higher order terms in the Taylor expansion of (7.97), and (with a steady hand) can be
expressed in terms of the covariant derivatives of the Riemann tensor at $\xi_0$.
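For orientation, the quadratic term can be cross-checked against the unit two-sphere. The sketch below assumes the standard form of the expansion, written in notation chosen here for illustration (normal coordinates $\xi^a$ centred at the point $p$, flat metric $\delta_{ab}$ at $p$, Riemann tensor evaluated at $p$). It is meant to be compared with the result derived in the text, not to replace it.

```latex
% Assumed standard form of the expansion (notation chosen for this sketch):
g_{ab}(\xi) = \delta_{ab} - \tfrac{1}{3}\, R_{acbd}\, \xi^c \xi^d + O(\xi^3) .

% Unit two-sphere at the north pole: R_{acbd} = \delta_{ab}\delta_{cd} - \delta_{ad}\delta_{cb}
g_{ab}(\xi)\, d\xi^a d\xi^b
  = d\xi^a d\xi^a - \tfrac{1}{3} \left( \xi^c \xi^c \, d\xi^a d\xi^a - (\xi^a d\xi^a)^2 \right) + \ldots
  = d\xi^a d\xi^a - \tfrac{1}{3} \left( \xi^1 d\xi^2 - \xi^2 d\xi^1 \right)^2 + \ldots

% Compare with geodesic polar coordinates \xi^1 = \theta\cos\phi, \ \xi^2 = \theta\sin\phi :
ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2
     = d\theta^2 + \theta^2 d\phi^2 - \tfrac{1}{3}\,\theta^4 d\phi^2 + \ldots ,
\qquad
\xi^1 d\xi^2 - \xi^2 d\xi^1 = \theta^2 d\phi .
```

Both expressions agree to the order shown, which fixes the sign and the factor $1/3$ of the curvature term for the conventions in which the sphere has positive curvature.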
In sections 3.1 and 5.1 on the principles of general covariance and minimal coupling
respectively, I mentioned that these do not necessarily fix the equations uniquely. In
other words, there could be more than one generally covariant equation which reduces
to a given equation in Minkowski space. Having the curvature tensor at our disposal
now, we can construct examples of this kind.
Given some tensorial equation, obtained by the minimal coupling prescription, say,
one can always contemplate the possibility of adding further terms to it involving the
curvature tensor. Since such terms take the form of higher derivative corrections to the
original equation, multiplied by appropriate dimensionful constants, one can usually
get away with ignoring such terms when dealing with weak fields and other low-energy
phenomena, and under such conditions the minimal coupling rule can usually be trusted.
However, such terms are not negligible under extreme conditions involving e.g. very
strong or strongly fluctuating gravitational fields.
An example which shows very clearly that the minimal coupling prescription, at least
the way we have formulated it, is itself ambiguous is, as already briefly pointed out in
section 5.6, provided by Maxwell theory. In that case, we saw that in the covariant
Lorenz gauge one has (5.45)
$$ \nabla_\mu A^\mu = 0 \;\Rightarrow\; \nabla_\mu F^{\mu\nu} = \nabla_\mu (\nabla^\mu A^\nu - \nabla^\nu A^\mu) = \Box A^\nu - [\nabla_\mu, \nabla^\nu] A^\mu , \tag{7.109} $$
where $\Box A^\nu = \nabla_\mu \nabla^\mu A^\nu$. It thus follows from (7.38) that the Maxwell equations in the
covariant Lorenz gauge can be written as (5.46)
$$ \nabla_\mu A^\mu = 0 \;\Rightarrow\; \left( \nabla_\mu F^{\mu\nu} = -J^\nu \;\Leftrightarrow\; \Box A^\nu - R^\nu_{\;\mu} A^\mu = -J^\nu \right) . \tag{7.110} $$
What this shows is that minimal coupling all by itself is not a unique prescription, as
we would have obtained (7.110) without the curvature terms by applying the minimal
coupling prescription to the special relativity Maxwell equation in the Lorenz gauge,
namely just $\Box A^\nu = -J^\nu$.
In the present situation, (5.46) is superior to the equation without the curvature term
because
and (related to this) because (7.110) implies that the current is covariantly con-
served (as we had verified in section 5.6 in an arbitrary gauge), while for the
equation without the curvature term covariant current conservation would then
be violated by a curvature term (as can easily be verified).
Thus occasionally some such additional criteria can be used to eliminate (or reduce) the
ambiguity in the minimal coupling prescription, but this need not always be the case.
As another example, consider the wave equation for a (massless, say) scalar field $\phi$. In
Minkowski space, this is the Klein-Gordon equation which has the obvious curved space
analogue (4.55)
$$ \Box \phi = 0 \tag{7.111} $$
obtained by the minimal coupling prescription. However, one could equally well postulate
the equation
$$ (\Box + \xi R)\,\phi = 0 , \tag{7.112} $$
can be imposed to select a particular non-zero value for $\xi$ (e.g. for a 4-dimensional
space-time this turns out to be the value $\xi = -1/6$). This will be discussed and explained
in section 21.3.
Thus in general such ambiguities are present and are something one has to live with.
B: General Relativity and Geometry
In this second part of the lecture notes I have collected a number of different topics that
develop the formalism of tensor calculus in one way or another. This does not mean,
however, that one necessarily needs to digest all these topics before continuing with the
physical applications of general relativity, and I do not even recommend this.
Strictly speaking none of these topics are essential for understanding some of the more
elementary aspects of general relativity to be treated later on, e.g. the discussion of
the Einstein equations, the field equations for gravity, in section 18, the discussion
of gravitational waves in section 22, or the analysis of geodesics in the Schwarzschild
geometry and the corresponding solar system tests of general relativity in section 24.
Some of the topics treated below will reappear frequently in subsequent sections, e.g.
Killing vectors (section 8) and their associated conserved quantities (section 9), or the
Gauss integral formula derived in section 15.3, and it will be useful to develop at least
some nodding acquaintance with these things.
either to illustrate the relation between the Riemann curvature tensor, a central
object of interest in general relativity and defined in a somewhat pragmatic and
perhaps unintuitive fashion in section 7, and more intuitive and/or geometric
concepts of curvature;
or simply because they are fun or beautiful (or both), and provide an invitation
to the wonderful world of differential geometry;
8 Lie Derivative, Symmetries and Killing Vectors
Symmetries and their consequences play a fundamental role in physics. In the present
context, these are symmetries of the gravitational field or of the space-time metric.
Before trying to figure out how to detect symmetries of a metric, or so-called isometries,
let us decide what we mean by symmetries of a metric.
For example, we would say that the Minkowski metric has the Poincaré group as a group
of symmetries, because the corresponding coordinate transformations leave the metric
invariant.
Likewise, we would say that the standard metrics on the two- or three-sphere have
rotational symmetries because they are invariant under rotations of the sphere. We can
look at this in one of two ways: either as an active transformation, in which we rotate
the sphere and note that nothing changes, or as a passive transformation, in which we
do not move the sphere, all the points remain fixed, and we just rotate the coordinate
system. So this is tantamount to a relabelling of the points. From the latter (passive)
point of view, the symmetry is again understood as an invariance of the metric under a
particular family of coordinate transformations.
Thinking actively, in order to detect symmetries, we should e.g. compare the geometry,
given by the line-element $ds^2 = g_{\mu\nu} dx^\mu dx^\nu$, at two different points $x$ and $y$ related by
$y^\mu(x)$. Thus we are led to consider the difference
$$ g_{\mu\nu}(y)\, dy^\mu dy^\nu - g_{\mu\nu}(x)\, dx^\mu dx^\nu . $$
Using the invariance of the line-element under coordinate transformations, i.e. the usual
tensorial transformation behaviour of the components of the metric,
$g_{\mu\nu}(x)\, dx^\mu dx^\nu = g'_{\mu\nu}(y)\, dy^\mu dy^\nu$, we see that we can
also write this as the difference
$$ (g_{\mu\nu}(y) - g'_{\mu\nu}(y))\, dy^\mu dy^\nu . \tag{8.3} $$
Thus we deduce that what we mean by a symmetry, i.e. invariance of the metric under
a coordinate transformation, is the statement
$$ g_{\mu\nu}(y) = g'_{\mu\nu}(y) . \tag{8.4} $$
From the passive point of view, in which a coordinate transformation represents a
relabelling of the points of the space, this equation compares the new metric at a point $P'$
(with coordinates $y^\mu$) with the old metric at the point $P$ which has the same values of
the old coordinates as the point $P'$ has in the new coordinate system, $y^\mu(P') = x^\mu(P)$.
The above equality then states that the new metric at the point $P'$ has the same
functional dependence on the new coordinates as the old metric on the old coordinates
at the point $P$. Thus a neighbourhood of $P'$ in the new coordinates looks identical to
a neighbourhood of $P$ in the old coordinates, and they can be mapped into each other
isometrically, i.e. such that all the metric properties, like distances, are preserved. Thus
either actively or passively one is led to the above condition.
Note that to detect a continuous symmetry in this way, we only need to consider infinites-
imal coordinate transformations. In that case, the above amounts to the statement that
metrically the space time looks the same when one moves infinitesimally in the direction
given by the coordinate transformation.
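We now want to translate the above discussion into a condition for an infinitesimal
coordinate transformation
$$ x^\mu \to y^\mu(x) = x^\mu + \epsilon V^\mu(x) $$
to generate a symmetry of the metric. Here you can and should think of $V^\mu$ as a
vector field because, even though coordinates themselves of course do not transform like
vectors, their infinitesimal variations $\delta x^\mu$ do,
$$ z^\mu = z^\mu(x) \;\Rightarrow\; \delta z^\mu = \frac{\partial z^\mu}{\partial x^\nu}\, \delta x^\nu , \tag{8.6} $$
and we think of $\delta x^\mu$ as $\epsilon V^\mu$.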
In fact, we will do something slightly more general than just trying to detect symmetries
of the metric. After all, we can also speak of functions or vector fields with symmetries,
and this can be extended to arbitrary tensor fields (although that may be harder to
visualise). So, for a general tensor field $T$ we will want to compare $T(y(x))$ with
$T'(y(x))$ - this is of course equivalent to, and only technically slightly more convenient
in the following than, comparing $T(x)$ with $T'(x)$.
As usual, we start the discussion with scalars. In that case, we want to compare $\phi(y(x))$
with $\phi'(y(x)) = \phi(x)$. We find
$$ \phi(y(x)) - \phi'(y(x)) = \phi(x + \epsilon V) - \phi(x) = \epsilon V^\mu \partial_\mu \phi(x) + O(\epsilon^2) . $$
We now define the Lie derivative of $\phi$ along the vector field $V^\mu$ to be
$$ L_V \phi := \lim_{\epsilon \to 0} \frac{\phi(y(x)) - \phi'(y(x))}{\epsilon} . \tag{8.8} $$
Evaluating this, we find
$$ L_V \phi = V^\mu \partial_\mu \phi . \tag{8.9} $$
Thus for a scalar, the Lie derivative is just the ordinary directional derivative, and this
is as it should be since saying that a function has a certain symmetry amounts to the
assertion that its derivative in a particular direction vanishes.
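A one-line example may make this concrete. It is an illustration chosen here, not taken from the notes: the function $\phi = x^2 + y^2$ on the Euclidean plane, the generator of rotations, and the generator of translations in the $x$-direction.

```latex
% Illustrative example: Euclidean plane, \phi = x^2 + y^2
V = x\,\partial_y - y\,\partial_x : \qquad
L_V \phi = V^\mu \partial_\mu \phi = x\,(2y) - y\,(2x) = 0 ,

W = \partial_x : \qquad
L_W \phi = W^\mu \partial_\mu \phi = 2x \neq 0 .
```

So $\phi$ is rotationally symmetric but not translation invariant, exactly as its level sets (circles around the origin) suggest.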
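We now follow the same procedure for a vector field $W^\mu$. We will need the matrix
$(\partial y^\mu / \partial x^\nu)$ and its inverse for the above infinitesimal coordinate transformation. We
have
$$ \frac{\partial y^\mu}{\partial x^\nu} = \delta^\mu_{\;\nu} + \epsilon\, \partial_\nu V^\mu , \tag{8.10} $$
and
$$ \frac{\partial x^\mu}{\partial y^\nu} = \delta^\mu_{\;\nu} - \epsilon\, \partial_\nu V^\mu + O(\epsilon^2) . \tag{8.11} $$
Thus we have
$$ \begin{aligned}
W'^\mu(y(x)) &= \frac{\partial y^\mu}{\partial x^\nu}\, W^\nu(x) \\
&= W^\mu(x) + \epsilon\, W^\nu(x)\, \partial_\nu V^\mu(x) ,
\end{aligned} \tag{8.12} $$
and
$$ W^\mu(y(x)) = W^\mu(x) + \epsilon\, V^\nu \partial_\nu W^\mu(x) + O(\epsilon^2) . $$
Hence, defining the Lie derivative of $W^\mu$ along $V^\mu$ by
$$ L_V W^\mu := \lim_{\epsilon \to 0} \frac{W^\mu(y(x)) - W'^\mu(y(x))}{\epsilon} , \tag{8.14} $$
we find
$$ L_V W^\mu = V^\nu \partial_\nu W^\mu - W^\nu \partial_\nu V^\mu . \tag{8.15} $$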
1. The result looks non-covariant, i.e. non-tensorial, but as a difference of two vectors
at the same point (recall the limit $\epsilon \to 0$) the result should again be a vector. This
is indeed the case. One way to make this manifest is to rewrite (8.15) in terms of
covariant derivatives, as
$$ \begin{aligned}
L_V W^\mu &= V^\nu \partial_\nu W^\mu - W^\nu \partial_\nu V^\mu \\
&= V^\nu \nabla_\nu W^\mu - W^\nu \nabla_\nu V^\mu .
\end{aligned} \tag{8.16} $$
This shows that $L_V W^\mu$ is again a vector field. Note, however, that the Lie
derivative, in contrast to the covariant derivative, is defined without reference to any
metric.
2. There is an alternative, and perhaps more intuitive, derivation of the above ex-
pression (8.15) for the Lie derivative of a vector field along a vector field, which
makes both its tensorial character and its interpretation manifest (and which also
generalises to other tensor fields; in fact we had already applied it to the metric
in section 2.6 to deduce (2.100)).
Namely, let us assume that we are initially in a coordinate system $\{y^\mu\}$ adapted
to $V$ in the sense that $V = \partial/\partial y^a$ for some particular $a$, i.e. $V^\mu = \delta^\mu_{\;a}$ (so that
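$$ [V, W]^\mu := L_V W^\mu = -L_W V^\mu . \tag{8.20} $$
This is actually a Lie bracket, i.e. it satisfies the Jacobi identity
$$ [V, [W, X]] + [X, [V, W]] + [W, [X, V]] = 0 . $$
This can also be rephrased as the statement that the Lie derivative is also a
derivation of the Lie bracket, i.e. that one has
$$ L_V [W, X] = [L_V W, X] + [W, L_V X] . $$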
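4. I want to reiterate at this point that it is extremely useful to think of vector fields
as first order linear differential operators, via $V^\mu \to V = V^\mu \partial_\mu$. In this case, the
Lie bracket $[V, W]$ is simply the ordinary commutator of differential operators,
$$ \begin{aligned}
[V, W] &= [V^\mu \partial_\mu, W^\nu \partial_\nu] \\
&= V^\mu (\partial_\mu W^\nu)\partial_\nu + V^\mu W^\nu \partial_\mu \partial_\nu - W^\nu (\partial_\nu V^\mu)\partial_\mu - W^\nu V^\mu \partial_\nu \partial_\mu \\
&= (V^\nu \partial_\nu W^\mu - W^\nu \partial_\nu V^\mu)\,\partial_\mu \\
&= (L_V W)^\mu \partial_\mu = [V, W]^\mu \partial_\mu .
\end{aligned} \tag{8.23} $$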
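5. From the above it is evident that if one has two vector fields of the form $V_{(k)} = \partial_{y^k}$,
they commute as differential operators, i.e. their Lie bracket is zero,
$$ V_{(k)} = \partial_{y^k} \;\Rightarrow\; [V_{(k)}, V_{(l)}] = 0 . $$
Conversely it is also true that locally this is a sufficient condition for the existence
of such coordinates,
$$ [V_{(k)}, V_{(l)}] = 0 \;\Rightarrow\; \exists\; y^k : \; V_{(k)} = \partial_{y^k} . $$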
6. For example, if one has a 2-parameter surface $x^\mu = x^\mu(\tau, \sigma)$, which one can think
of as a 1-parameter family of curves $x^\mu(\tau)$ labelled by $\sigma$, then the tangent vector
field $\partial_\tau = \dot{x}^\mu \partial_\mu$ to the family of curves and the connecting vector field (or deviation
vector field) $\partial_\sigma = x'^\mu \partial_\mu$ have vanishing Lie bracket.
Conversely this also provides a good visualisation of what it means for two vector
fields to Lie commute, namely that locally they span a 2-dimensional surface and
generate a coordinate grid on that surface. We will make use of this in section
11.1 when discussing the so-called geodesic deviation equation.
7. Having equipped the space of vector fields with a Lie algebra structure, in fact
with the structure of an infinite-dimensional Lie algebra, it is fair to ask "the
Lie algebra of what group?". Well, we have seen above that we can think of
vector fields as infinitesimal generators of coordinate transformations. Hence,
formally at least, the Lie algebra of vector fields is the Lie algebra of the group
of coordinate transformations (passive point of view) or diffeomorphisms (active
point of view).22 We will briefly come back to this below, in remark 1 of section
8.4.
for the relation between the commutator of directional covariant derivatives and
the Riemann curvature tensor. There we had used the abbreviation $[X, Y]$ for the
vector field $\nabla_X Y - \nabla_Y X$. Comparing with (8.16), we see that this is indeed
just the Lie bracket $[X, Y]$. Thus one way of interpreting the Riemann tensor
is that the curvature measures the failure of the covariant derivative to provide a
representation of the Lie algebra of vector fields.
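The component formula (8.15) for the Lie bracket is easy to evaluate numerically. The sketch below is an illustration, not part of the notes: it approximates the partial derivatives by central differences and applies the formula to the generator of rotations and the two translations of the Euclidean plane. The function names and the step size are choices made here.

```python
def lie_bracket(V, W, x, h=1e-5):
    """Components [V, W]^mu = V^nu d_nu W^mu - W^nu d_nu V^mu at the point x."""
    n = len(x)

    def partial(F, nu):
        # Central difference for d_nu F^mu, returned as a list over mu.
        xp, xm = list(x), list(x)
        xp[nu] += h
        xm[nu] -= h
        Fp, Fm = F(xp), F(xm)
        return [(Fp[mu] - Fm[mu]) / (2.0 * h) for mu in range(n)]

    v, w = V(x), W(x)
    dV = [partial(V, nu) for nu in range(n)]  # dV[nu][mu] = d_nu V^mu
    dW = [partial(W, nu) for nu in range(n)]  # dW[nu][mu] = d_nu W^mu
    return [sum(v[nu] * dW[nu][mu] - w[nu] * dV[nu][mu] for nu in range(n))
            for mu in range(n)]

def rotation(x):
    """Generator of rotations of the plane, V = x d_y - y d_x."""
    return [-x[1], x[0]]

def translation_x(x):
    return [1.0, 0.0]

def translation_y(x):
    return [0.0, 1.0]

if __name__ == "__main__":
    point = [0.3, -1.2]
    print(lie_bracket(rotation, translation_x, point))       # close to [0, -1]
    print(lie_bracket(translation_x, translation_y, point))  # [0, 0]
```

The first result says that the bracket of the rotation generator with $\partial_x$ is $-\partial_y$, while the two translations commute, in line with remark 5 on coordinate vector fields.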
To extend the definition of the Lie derivative to other tensors, we can proceed in one of
two ways. We can either extend the above procedure to other tensor fields by defining
$$ L_V T^{\cdots}_{\;\cdots} := \lim_{\epsilon \to 0} \frac{T^{\cdots}_{\;\cdots}(y(x)) - T'^{\cdots}_{\;\cdots}(y(x))}{\epsilon} . \tag{8.27} $$
Or we can extend it to other tensors by proceeding as in the case of the covariant
derivative, i.e. by demanding the Leibniz rule. The Lie derivative on an arbitrary tensor
is then uniquely determined by its action on scalars and vectors.
In either case, the result can be rewritten in manifestly tensorial form in terms of
covariant derivatives. For example, for a covector one finds
$$ L_V A_\mu = V^\nu \partial_\nu A_\mu + (\partial_\mu V^\nu) A_\nu = V^\nu \nabla_\nu A_\mu + (\nabla_\mu V^\nu) A_\nu . \tag{8.28} $$
The general result is that the Lie derivative of a $(p, q)$-tensor $T$ is, like the covariant
derivative, the sum of three kinds of terms: the directional covariant derivative of $T$
along $V$, $p$ terms with a minus sign, involving the covariant derivative of $V$ contracted
with each of the upper indices, and $q$ terms with a plus sign, involving the covariant
derivative of $V$ contracted with each of the lower indices (note that the plus and minus
signs are interchanged with respect to the covariant derivative). Thus, e.g., the Lie
derivatives of a (0,2) and a (1,2)-tensor are
$$ \begin{aligned}
L_V T_{\mu\nu} &= V^\lambda \nabla_\lambda T_{\mu\nu} + T_{\lambda\nu} \nabla_\mu V^\lambda + T_{\mu\lambda} \nabla_\nu V^\lambda \\
L_V T^\mu_{\;\nu\lambda} &= V^\rho \nabla_\rho T^\mu_{\;\nu\lambda} - T^\rho_{\;\nu\lambda} \nabla_\rho V^\mu + T^\mu_{\;\rho\lambda} \nabla_\nu V^\rho + T^\mu_{\;\nu\rho} \nabla_\lambda V^\rho .
\end{aligned} \tag{8.29} $$
22
See e.g. H. Gl ockner, Fundamental problems in the theory of infinite-dimensional Lie groups,
arXiv:math/0602078 [math.GR] for an introduction and a survey of the problems that arise when
dealing with or trying to define infinite-dimensional Lie groups.
Remarks:
1. While it is not obvious from the somewhat pedestrian definition of the Lie deriva-
tive that we have given here, the Lie derivative is an extremely natural operation
on tensors. In differential geometry textbooks (and mathematically more sophis-
ticated accounts of general relativity) it is defined as follows:
(c) Define the Lie derivative to be the infinitesimal generator of this action,
$$ L_V T := \frac{d}{dt} (\Phi_V^t)^* T \,\Big|_{t=0} . \tag{8.31} $$
While this definition can be shown to be equivalent to the definition of the Lie
derivative given above in terms of coordinates, Taylor expansions etc., this defi-
nition is evidently more compact, more illuminating and somewhat more to the
point. In particular, it makes the tensorial nature of the Lie derivative manifest.
However, in order to arrive at explicit expressions for the Lie derivative of the
components of a tensor, one then still needs to perform a calculation equivalent
to (8.27).
2. The fact that the Lie derivative provides a representation of the Lie algebra of
vector fields by first-order differential operators on the space of (p, q)-tensors is
expressed by the identity
$$ [L_V, L_W] = L_{[V,W]} . \tag{8.32} $$
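On scalars the identity (8.32) can be verified in two lines. The following is a sketch of that special case only, using the component expression for the Lie derivative of a scalar $f$ and for the Lie bracket given earlier in this section; the general statement for $(p, q)$-tensors then follows from the Leibniz rule but is not spelled out here.

```latex
% Check of (8.32) on a scalar f, using L_V f = V^\mu \partial_\mu f :
[L_V, L_W] f
  = V^\mu \partial_\mu ( W^\nu \partial_\nu f ) - W^\mu \partial_\mu ( V^\nu \partial_\nu f )
  = ( V^\mu \partial_\mu W^\nu - W^\mu \partial_\mu V^\nu )\, \partial_\nu f
  = [V, W]^\nu \partial_\nu f
  = L_{[V,W]} f .

% The second-derivative terms V^\mu W^\nu \partial_\mu \partial_\nu f cancel by symmetry in (\mu, \nu).
```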
8.5 Lie Derivative of the Metric and Killing Vectors
The above general formula (8.29) for the Lie derivative of a tensor becomes particularly
simple for the metric tensor $g_{\mu\nu}$. The first term is not there (because the metric is
covariantly constant), so the Lie derivative is the sum of two terms (with plus signs)
involving the covariant derivative of $V$,
$$ L_V g_{\mu\nu} = g_{\lambda\nu} \nabla_\mu V^\lambda + g_{\mu\lambda} \nabla_\nu V^\lambda . \tag{8.35} $$
Lowering the index of $V$ with the metric, this can be written more succinctly as
$$ L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu . \tag{8.36} $$
The not manifestly covariant avatar of this equation (recall that fundamentally the Lie
derivative requires no notion of a covariant differentiation) is
$$ L_V g_{\mu\nu} = V^\lambda \partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda g_{\lambda\nu} + \partial_\nu V^\lambda g_{\mu\lambda} . \tag{8.37} $$
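A quick alternative way to arrive at this result is to look directly at the infinitesimal
version of the difference
$$ g_{\mu\nu}(y)\, dy^\mu dy^\nu - g_{\mu\nu}(x)\, dx^\mu dx^\nu \tag{8.38} $$
which was the starting point of our discussion in section 8.1 above. Namely, we consider
the infinitesimal coordinate transformation
$$ \begin{aligned}
\delta_V x^\mu = V^\mu \;&\Rightarrow\; \delta_V dx^\mu = dV^\mu = (\partial_\nu V^\mu)\, dx^\nu \\
&\phantom{\Rightarrow}\; \delta_V g_{\mu\nu}(x) = V^\lambda \partial_\lambda g_{\mu\nu}(x) ,
\end{aligned} \tag{8.39} $$
and define the Lie derivative of the metric by the change this operation $\delta_V$ induces in
the line element,
$$ \delta_V (g_{\mu\nu}\, dx^\mu dx^\nu) \equiv (L_V g_{\mu\nu})\, dx^\mu dx^\nu . \tag{8.40} $$
This leads directly to (8.37) and thus to (8.36).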
We are now ready to return to our discussion of isometries (symmetries of the metric).
Evidently, an infinitesimal coordinate transformation is a symmetry of the metric if
$L_V g_{\mu\nu} = 0$. By (8.36) this can be written as (see also (4.66))
$$ V^\mu \text{ generates an isometry} \;\Leftrightarrow\; L_V g_{\mu\nu} = 0 \;\Leftrightarrow\; \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 . \tag{8.41} $$
Vector fields $V^\mu$ satisfying this equation are called Killing vectors - not because they
kill the metric but after the 19th century mathematician W. Killing.
The alternative non-covariant way (8.37) of writing the Killing equation makes it manifest
that only components and derivatives of the metric in the $V$-direction enter in this
condition,
$$ \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \;\Leftrightarrow\; V^\lambda \partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda g_{\lambda\nu} + \partial_\nu V^\lambda g_{\mu\lambda} = 0 . \tag{8.42} $$
This is precisely the condition (2.101) we had encountered first in our discussion of first
integrals of motion for the geodesic equation, and which we had already rewritten in
terms of covariant derivatives, as in (8.36) above, in (4.65).
Since they are associated with symmetries of space time, and since symmetries are
always of fundamental importance in physics, Killing vectors will play an important
role in the following. Our most immediate concern (in section 9, in particular section
9.1) will be with the conserved quantities associated with Killing vectors. Other aspects
of Killing vectors and their interplay with the geometry of a space-time will be discussed
in sections 12 and 13. For now we just note the following simple facts and examples:
1. Note that by virtue of (8.32) Killing vectors form a Lie algebra, i.e. if $V$ and $W$
are Killing vectors, then also $[V, W]$ is a Killing vector,
$$ L_V g_{\mu\nu} = L_W g_{\mu\nu} = 0 \;\Rightarrow\; L_{[V,W]}\, g_{\mu\nu} = 0 . \tag{8.43} $$
Indeed one has
$$ L_{[V,W]}\, g_{\mu\nu} = L_V L_W g_{\mu\nu} - L_W L_V g_{\mu\nu} = 0 . \tag{8.44} $$
An explicit proof of this fact will be given later on in section 12.2.
2. The resulting algebra of Killing vectors is the Lie algebra of the isometry group
of the metric. For example, the collection of all Killing vectors of the Minkowski
metric generates the Lie algebra of the Poincaré group. Indeed, for the Minkowski
space-time in inertial (Cartesian) coordinates $\xi^a$, i.e. with the constant standard
metric $\eta_{ab}$, the Killing condition simply becomes
$$ \partial_a V_b + \partial_b V_a = 0 , \tag{8.45} $$
which is solved by
$$ V^a = \omega^a_{\;b}\, \xi^b + \epsilon^a \tag{8.46} $$
where the $\epsilon^a$ are constant parameters and the constant matrices $\omega_{ab}$ satisfy
$\omega_{ab} = -\omega_{ba}$. These are precisely the infinitesimal Lorentz transformations and
translations of the Poincaré algebra, as given e.g. in (1.24).
Choosing as a basis for the Killing vectors of Minkowski space the vectors
$$ P_a = \partial_a , \qquad M_{ab} = \xi_a \partial_b - \xi_b \partial_a , \tag{8.47} $$
so that the general Killing vector $V^a$ (8.46) can be expanded as
$$ V = V^a \partial_a = -\tfrac{1}{2}\, \omega^{ab} M_{ab} + \epsilon^a P_a , \tag{8.48} $$
the Lie algebra (algebra of Lie brackets) is given by
$$ \begin{aligned}
[P_a, P_b] &= 0 \\
[M_{ab}, P_c] &= -\eta_{ac} P_b + \eta_{bc} P_a \\
[M_{ab}, M_{cd}] &= \eta_{ad} M_{bc} + \eta_{bc} M_{ad} - \eta_{ac} M_{bd} - \eta_{bd} M_{ac} .
\end{aligned} \tag{8.49} $$
This is of course the Lie algebra of the Poincaré group.
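3. Another simple example is provided by the two-sphere: as mentioned before, in
some obvious sense the standard metric on the two-sphere is rotationally invariant.
In particular, with our new terminology we would expect the vector field $\partial_\phi$, i.e.
the vector field with components $V^\phi = 1$, $V^\theta = 0$ to be Killing. Let us check
this. With the metric $d\theta^2 + \sin^2\theta\, d\phi^2$, the components of the corresponding
covector $V_\mu$, obtained by lowering the index of the vector field $V^\mu$, are
$$ V_\theta = 0 , \qquad V_\phi = \sin^2\theta . \tag{8.50} $$
The three independent components of the Killing equation are then indeed satisfied,
$$ \begin{aligned}
\nabla_\theta V_\theta &= \partial_\theta V_\theta - \Gamma^\mu_{\;\theta\theta} V_\mu \\
&= -\Gamma^\phi_{\;\theta\theta} \sin^2\theta = 0 \\
\nabla_\theta V_\phi + \nabla_\phi V_\theta &= \partial_\theta V_\phi - \Gamma^\mu_{\;\theta\phi} V_\mu + \partial_\phi V_\theta - \Gamma^\mu_{\;\phi\theta} V_\mu \\
&= 2 \sin\theta \cos\theta - 2 \cot\theta \sin^2\theta = 0 \\
\nabla_\phi V_\phi &= \partial_\phi V_\phi - \Gamma^\mu_{\;\phi\phi} V_\mu = 0 .
\end{aligned} \tag{8.51} $$
Alternatively, using the non-covariant form (8.42) of the Killing equation, one
finds, since $V^\phi = 1$, $V^\theta = 0$ are constant, that the Killing equation reduces to
$$ \partial_\phi g_{\mu\nu} = 0 , \tag{8.52} $$
which is obviously satisfied. This is clearly a simpler and more efficient argument.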
By solving the Killing equations on $S^2$, in addition to $V_{(3)} = \partial_\phi$ one finds two
other linearly independent Killing vectors $V_{(1)}$ and $V_{(2)}$, namely
Note that $V_{(3)}$ evidently relates these two other Killing vectors by
This is the Lie algebra of infinitesimal rotations, i.e. of the rotation group $SO(3)$,
which is the isometry group of the standard metric on $S^2$.
4. In general, if the components of the metric are all independent of a particular
coordinate, say $y$, then by the above argument $V = \partial_y$ is a Killing vector,
$$ \partial_y g_{\mu\nu} = 0 \;\Rightarrow\; V = \partial_y \text{ is a Killing vector} . $$
Such a coordinate system, in which one of the coordinate lines agrees with the
integral curves of the Killing vector, is said to be adapted to the Killing vector (or
isometry) in question. For any given Killing vector $V$ one can always introduce
local coordinates such that $V$ takes the form $V = \partial_y$. It suffices to choose as $y$
the parameter along the integral curves of $V$, using the remaining coordinates to
label the individual integral curves.
5. If one has two Killing vector fields $V_{(1)}$ and $V_{(2)}$, then the necessary and sufficient
condition that one can introduce local coordinates $(y^1, y^2, \ldots)$ that are adapted
to both of them, i.e. such that $V_{(k)} = \partial_{y^k}$, is that they commute as differential
operators, i.e. that they have vanishing Lie bracket,
$$ [V_{(1)}, V_{(2)}] = 0 . $$
6. As we did in section 2.6, one can also take the above equations (8.57) as the
starting point for what one means by a symmetry of the metric (isometry) and
then simply transform it to an arbitrary coordinate system by requiring that it
transforms as a (0, 2)-tensor. Then one arrives at the Killing condition in the form
(8.42).
7. Because by definition the geometry of a space-time does not change along the
orbits of a Killing vector, it is intuitively obvious that in particular the norm of
a Killing vector V should be constant along (the orbits of) V , and this is indeed
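easy to prove. Here are two simple proofs of this statement, one using covariant
derivatives and the other using Lie derivatives:
(a) Using covariant derivatives, one calculates
$$ V^\mu \partial_\mu (V_\nu V^\nu) = V^\mu \nabla_\mu (V_\nu V^\nu) = 2 V^\mu V^\nu \nabla_\mu V_\nu = 0 \tag{8.59} $$
by anti-symmetry of $\nabla_\mu V_\nu$.
(b) Using Lie derivatives, one calculates
$$ \begin{aligned}
V^\mu \partial_\mu (V_\nu V^\nu) &= L_V (g_{\mu\nu} V^\mu V^\nu) \\
&= (L_V g_{\mu\nu}) V^\mu V^\nu + 2 g_{\mu\nu} (L_V V^\mu) V^\nu = 0
\end{aligned} \tag{8.60} $$
since $L_V g_{\mu\nu} = 0$ for a Killing vector and $L_V V = [V, V] = 0$.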
8. An occasionally useful result that provides an interesting relation between geodesics
and Killing vectors (different from the one to be discussed below in section 9.1)
and that is straightforward to establish, is the fact that a Killing vector field is
geodesic if and only if it is of constant length. This follows by contracting the
Killing equation with $V^\nu$ and writing
$$ 0 = V^\nu (\nabla_\mu V_\nu + \nabla_\nu V_\mu) = V^\nu \nabla_\nu V_\mu + \tfrac{1}{2} \nabla_\mu (V_\nu V^\nu) . $$ (8.61)
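9. As an aside: a minimal variation of this proof establishes the same result for
gradient vector fields $V_\mu = \partial_\mu S$ instead of Killing vector fields, namely that a
gradient vector field is geodesic if and only if it is of constant length. Since a
gradient vector field satisfies
$$ V_\mu = \partial_\mu S \quad\Rightarrow\quad \nabla_\mu V_\nu - \nabla_\nu V_\mu = 0 $$ (8.62)
one has
$$ 0 = V^\nu (\nabla_\nu V_\mu - \nabla_\mu V_\nu) = V^\nu \nabla_\nu V_\mu - \tfrac{1}{2} \nabla_\mu (V_\nu V^\nu) . $$ (8.63)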
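The identity (8.61) can be spot-checked numerically in the simplest possible setting. The sketch below assumes the flat Euclidean plane in Cartesian coordinates (so that covariant derivatives reduce to partial derivatives) and the rotational Killing vector $V = -y\partial_x + x\partial_y$; the function names are illustrative choices, not notation from these notes. Since this V is not of constant length, its integral curves (circles) are not geodesics, and the "acceleration" $V^\nu \nabla_\nu V^\mu$ is exactly compensated by the gradient of the norm.

```python
def V(x, y):
    """Rotational Killing vector V = -y d/dx + x d/dy of the flat plane."""
    return (-y, x)


def norm2(x, y):
    """Squared length V_mu V^mu in Cartesian coordinates."""
    vx, vy = V(x, y)
    return vx * vx + vy * vy


def partial(f, x, y, axis, h=1e-4):
    """Central-difference partial derivative of a scalar function f(x, y)."""
    if axis == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def acceleration(x, y):
    """V^nu d_nu V^mu (flat Cartesian coordinates, so nabla = partial)."""
    vx, vy = V(x, y)
    acc = []
    for mu in range(2):
        comp = lambda a, b, mu=mu: V(a, b)[mu]
        acc.append(vx * partial(comp, x, y, 0) + vy * partial(comp, x, y, 1))
    return tuple(acc)


def residual_8_61(x, y):
    """V^nu nabla_nu V_mu + (1/2) nabla_mu (V.V), which (8.61) says vanishes."""
    acc = acceleration(x, y)
    grad = (partial(norm2, x, y, 0), partial(norm2, x, y, 1))
    return tuple(a + 0.5 * g for a, g in zip(acc, grad))
```

At a generic point the residual vanishes up to rounding, while the acceleration itself is non-zero (it points towards the origin), in line with the statement that a Killing vector field is geodesic only if its length is constant.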
It is straightforward to extend the Lie derivative to tensor densities. Given the fact
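expressed in (3.57) that any tensor density can be written as a tensor times a suitable
power of the determinant g of the metric, all we need to know is the Lie derivative
acting on g. For this we can use the general variational formula (4.74) to deduce
$$ L_V g = g g^{\mu\nu} L_V g_{\mu\nu} . $$ (8.64)
Since $L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu$, this can be written as
$$ L_V g = g g^{\mu\nu} (\nabla_\mu V_\nu + \nabla_\nu V_\mu) = 2 g \nabla_\mu V^\mu , $$ (8.65)
and for the ubiquitous volume element $\sqrt{g}$ one finds
$$ L_V \sqrt{g} = \sqrt{g} \nabla_\mu V^\mu . $$ (8.66)
It follows for example that for a scalar density $\sqrt{g} F$ of weight 1, with F a scalar, one has
$$ L_V (\sqrt{g} F) = \sqrt{g} (V^\mu \partial_\mu F + F \nabla_\mu V^\mu) = \sqrt{g} \nabla_\mu (V^\mu F) . $$ (8.67)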
Using (4.49), this can also be written as a total derivative
$$ L_V (\sqrt{g} F) = \partial_\mu (\sqrt{g} V^\mu F) . $$ (8.68)
This identity lies at the heart of the general covariance of actions built from scalars or
scalar densities, and we will discuss this aspect in more detail in sections 19.6 and 21.2.
Analogously, the Lie derivative can be extended to tensor densities of any rank and
weight.
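As a simple illustration of (8.66) and (8.68), consider (as an assumed example, not one taken from the text above) flat D-dimensional space in Cartesian coordinates, where $\sqrt{g} = 1$, together with the dilatation vector field $V^\mu = x^\mu$:

```latex
% Flat space, Cartesian coordinates: \sqrt{g} = 1, \nabla_\mu = \partial_\mu, V^\mu = x^\mu
\nabla_\mu V^\mu = \partial_\mu x^\mu = D
\quad\Rightarrow\quad
L_V \sqrt{g} = D \sqrt{g} ,
\\
L_V(\sqrt{g} F) = \partial_\mu (x^\mu F) = x^\mu \partial_\mu F + D\, F .
```

The first line expresses the familiar fact that a D-dimensional volume element has scaling weight D under dilatations, and the second line shows the total derivative form (8.68) at work.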
9 Killing Vectors, Symmetries and Conserved Charges
We are used to the fact that symmetries lead to conserved quantities (Noether's
theorem). For example, in classical mechanics, the angular momentum of a particle moving
in a rotationally symmetric gravitational field is conserved. In the present context, the
concept of symmetries of a gravitational field is replaced by symmetries of the met-
ric, and we therefore expect conserved charges associated with the presence of Killing
vectors. Here are the two most important classes of examples of this phenomenon:
$$ Q_K = K_\mu \dot{x}^\mu $$ (9.1)
Note that this is precisely the conserved quantity $Q_V$ (2.102) with $V \to K$ deduced
from Noether's theorem and the variational principle for geodesics in section 2.6.
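The conservation of $Q_K$ along geodesics can be illustrated numerically. The following is a minimal sketch under stated assumptions: the unit 2-sphere with metric $d\theta^2 + \sin^2\theta\, d\phi^2$, its Killing vector $K = \partial_\phi$ (so that $Q_K = \sin^2\theta\, \dot\phi$), and a hand-written fourth-order Runge-Kutta integrator. The function names and the initial data are illustrative choices.

```python
import math


def rhs(state):
    """Geodesic equations on the unit 2-sphere, written as a first-order system."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(a + f * b for a, b in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def charge(state):
    """Q_K = K_mu xdot^mu for K = d/dphi, i.e. sin^2(theta) * phidot."""
    theta, _, _, dphi = state
    return math.sin(theta) ** 2 * dphi


def integrate(state, h=1e-3, steps=5000):
    """Integrate the geodesic and record the charge after every step."""
    charges = [charge(state)]
    for _ in range(steps):
        state = rk4_step(state, h)
        charges.append(charge(state))
    return state, charges
```

For generic initial data such as `(1.0, 0.0, 0.3, 0.7)` the recorded values of the charge agree to within the integration error, even though $\theta$ and $\dot\phi$ individually vary along the geodesic.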
9.2 Conformal Killing Vectors and Conserved Charges
Another situation of interest occurs when one has a theory invariant under Weyl rescal-
ings and thus a traceless energy-momentum tensor (section 6.7). In that case one can
associate conserved currents not only to Killing vector fields but also to conformal
Killing vectors $C^\mu$, satisfying
$$ \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2\omega(x) g_{\mu\nu} $$ (9.5)
for some function $\omega(x)$. Such conformal Killing vectors generate coordinate transformations
that leave the metric invariant up to an overall (Weyl) rescaling.
If the theory is invariant under such Weyl rescalings, then the energy-momentum tensor
is traceless and there should also be a corresponding conserved current. Indeed, we have
$$ J_C^\mu = T^{\mu\nu} C_\nu $$ (9.6)
and
$$ \nabla_\mu J_C^\mu = (\nabla_\mu T^{\mu\nu}) C_\nu + T^{\mu\nu} \nabla_\mu C_\nu = 0 + \tfrac{1}{2} T^{\mu\nu} (\nabla_\mu C_\nu + \nabla_\nu C_\mu) = \omega(x) T^{\mu\nu} g_{\mu\nu} = 0 . $$ (9.7)
We will look at the example of the conformal Killing vectors of Minkowski space in more
detail in section 9.3 below.
There is also a counterpart of statement 1 (conserved charges for geodesics) in the case
of conformal Killing vectors, namely for null geodesics (this condition replacing the
assumption in statement 2 that the energy-momentum tensor is traceless):
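1'. Let $C^\mu$ be a conformal Killing vector field, and let $x^\mu(\tau)$ be a null geodesic. Then
the quantity
$$ Q_C = C_\mu \dot{x}^\mu $$ (9.8)
is constant along the geodesic. Indeed, repeating the calculation leading to statement 1,
for a null geodesic one has
$$ \frac{d}{d\tau} Q_C = \frac{d}{d\tau} (C_\mu \dot{x}^\mu) = \tfrac{1}{2} (\nabla_\mu C_\nu + \nabla_\nu C_\mu) \dot{x}^\mu \dot{x}^\nu = \omega(x) g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0 . $$ (9.9)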
We will make use of (9.9) in the discussion of the cosmological redshift in section
33.7.
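As an aside, note that if K is a true Killing vector for a metric $\tilde{g}_{\alpha\beta}$, say, then it is at
least a conformal Killing vector for any conformally rescaled metric
$$ g_{\alpha\beta} = e^{2\omega(x)} \tilde{g}_{\alpha\beta} . $$ (9.10)
Indeed, writing the Killing equation in the non-covariant form (8.42) (in order to avoid
having to determine the covariant derivatives or Christoffel symbols of conformally
rescaled metrics)
$$ K^\gamma \partial_\gamma \tilde{g}_{\alpha\beta} + \partial_\alpha K^\gamma \tilde{g}_{\gamma\beta} + \partial_\beta K^\gamma \tilde{g}_{\alpha\gamma} = 0 $$ (9.11)
and expressing this in terms of the metric $g_{\alpha\beta}$, one finds
$$ K^\gamma \partial_\gamma g_{\alpha\beta} + \partial_\alpha K^\gamma g_{\gamma\beta} + \partial_\beta K^\gamma g_{\alpha\gamma} = 2 (K^\gamma \partial_\gamma \omega) g_{\alpha\beta} . $$ (9.12)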
Thus K will be a true Killing vector field for the rescaled metric if the conformal
factor $\omega(x)$ is constant along the orbits (integral curves) of K, and will otherwise be
a conformal Killing vector field. Conformal Killing vector fields that do not arise from
true Killing vector fields in this way are called essential. In the Riemannian case it is
known that (under some technical assumptions) metrics admitting essential conformal
vector fields are conformal to the standard metric on the sphere or the Euclidean space.
In the pseudo-Riemannian (Lorentzian signature) case the situation turns out to be
quite different (with an interesting connection with the plane wave metrics that are the
subject of section 42).23
As an example, let us consider 4-dimensional Minkowski space. In that case there are
5 conformal Killing vectors (in addition to the 10 true Killing vectors (8.46) generating
Poincaré transformations).
$$ D = x^a \partial_a : \quad \partial_a D_b + \partial_b D_a = 2 \eta_{ab} $$ (9.14)
of dilatations,
In this case $\omega(x) = 1$ is constant, and such a conformal symmetry is called a homothety
(see also section 9.4 below). Provided that one has a symmetric traceless
conserved energy-momentum tensor, one has a corresponding conserved current
$$ J_D^a = T^{ab} D_b = T^{ab} x_b . $$ (9.16)
The dilatation and the special conformal transformation enlarge the Poincaré algebra
(8.49) of translations and Lorentz transformations to the conformal algebra. Adding the
generators D and $C_b \equiv C^{(b)}$ to the generators $P_a$ and $M_{ab}$ of the Poincaré algebra, one
finds the extended algebra
$$ \begin{aligned}
[P_a, P_b] &= 0 \\
[M_{ab}, P_c] &= -\eta_{ac} P_b + \eta_{bc} P_a \\
[M_{ab}, M_{cd}] &= \eta_{ad} M_{bc} + \eta_{bc} M_{ad} - \eta_{ac} M_{bd} - \eta_{bd} M_{ac} \\
[D, P_a] &= -P_a \\
[M_{ab}, D] &= 0 \\
[P_a, C_b] &= 2 (\eta_{ab} D - M_{ab}) \\
[M_{ab}, C_c] &= -\eta_{ac} C_b + \eta_{bc} C_a \\
[D, C_a] &= C_a \\
[C_a, C_b] &= 0 .
\end{aligned} $$ (9.22)
Here
the fourth expresses the obvious fact that $P_a = \partial_a$ is homogeneous of degree (-1)
under the dilatation generated by D;
the eighth says that Ca is homogeneous of degree (+1) under the dilatation gen-
erated by D.
the last relation says that special conformal transformations generate an Abelian
algebra (corresponding to the fact that they generate inverted translations).
Thus the only relation that is not a priori obvious is the sixth, $[P_a, C_b] = 2(\eta_{ab} D - M_{ab})$,
but this follows simply from
It is perhaps also not obvious at first sight that this conformal Lie algebra is isomorphic
to the Lie algebra of SO(2, 4), or SO(2, D) in D space-time dimensions. This is the
group of rotations in the (D+2)-dimensional pseudo-Euclidean space $\mathbb{R}^{2,D}$ preserving the
metric $\eta_{AB}$ with signature $(-+\ldots+-)$, i.e. the indices have the range $A = 0, 1, \ldots, D+1$,
and $\eta_{DD} = -\eta_{(D+1)(D+1)} = +1$. Its Lie algebra is just the obvious counterpart of the
D-dimensional Lorentz Lie algebra (8.49), namely
Concretely, with $z^A$ Cartesian coordinates on $\mathbb{R}^{2,D}$, this Lie algebra can be realised as
the algebra of rotational Killing vectors of the metric $\eta_{AB}$, given by
Returning to the conformal algebra, it is now easy to see that with the identification
the Lie algebra relations (9.22) and (9.24) are mapped precisely into each other.
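As a sanity check on some of the relations in (9.22), the commutators can be evaluated numerically as Lie brackets of vector fields. The sketch below makes several assumptions that are not taken from the text above: it works in 3-dimensional Minkowski space with $\eta = \mathrm{diag}(-1,+1,+1)$, uses $M_{ab} = x_a \partial_b - x_b \partial_a$, and takes the special conformal generators in the common form $C_b = 2 x_b x^c \partial_c - x^2 \partial_b$. All function names are illustrative. Derivatives are taken by central differences, which are exact up to rounding for the at most quadratic components involved.

```python
ETA = (-1.0, 1.0, 1.0)          # assumed diagonal Minkowski metric in 3 dimensions
DIM = len(ETA)
IDX = range(DIM)


def eta(a, b):
    return ETA[a] if a == b else 0.0


def lower(x):
    """x_a = eta_ab x^b."""
    return [ETA[a] * x[a] for a in IDX]


def P(a):
    """Translation generator P_a = d_a."""
    return lambda x: [1.0 if m == a else 0.0 for m in IDX]


def M(a, b):
    """Lorentz generator M_ab = x_a d_b - x_b d_a."""
    def field(x):
        xl = lower(x)
        return [xl[a] * (m == b) - xl[b] * (m == a) for m in IDX]
    return field


def D(x):
    """Dilatation generator D = x^a d_a."""
    return list(x)


def C(b):
    """Special conformal generator, assumed form C_b = 2 x_b x^c d_c - x^2 d_b."""
    def field(x):
        xl = lower(x)
        x2 = sum(xl[m] * x[m] for m in IDX)
        return [2.0 * xl[b] * x[m] - x2 * (m == b) for m in IDX]
    return field


def jacobian(V, x, h=1e-3):
    """J[m][n] = d V^m / d x^n by central differences."""
    cols = []
    for n in IDX:
        xp, xm = list(x), list(x)
        xp[n] += h
        xm[n] -= h
        vp, vm = V(xp), V(xm)
        cols.append([(vp[m] - vm[m]) / (2 * h) for m in IDX])
    return [[cols[n][m] for n in IDX] for m in IDX]


def bracket(V, W):
    """Lie bracket [V, W]^m = V^n d_n W^m - W^n d_n V^m."""
    def field(x):
        v, w = V(x), W(x)
        jv, jw = jacobian(V, x), jacobian(W, x)
        return [sum(v[n] * jw[m][n] - w[n] * jv[m][n] for n in IDX) for m in IDX]
    return field


def combo(*terms):
    """Linear combination of vector fields, given as (coefficient, field) pairs."""
    return lambda x: [sum(c * V(x)[m] for c, V in terms) for m in IDX]


def max_diff(V, W, x):
    """Largest componentwise difference between two vector fields at the point x."""
    return max(abs(a - b) for a, b in zip(V(x), W(x)))
```

For instance, `max_diff(bracket(P(a), C(b)), combo((2 * eta(a, b), D), (-2.0, M(a, b))), x)` is zero up to rounding for all index pairs and a generic point `x`, which is the sixth relation of (9.22) under the assumptions stated above.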
Thus, when one has a conserved, symmetric, traceless energy-momentum tensor, one
can construct conserved currents for the entire conformal group and thus has a (at least
classically) conformally invariant field theory (or conformal field theory for short).
As we have seen in section 6.7, when the matter action is invariant under Weyl rescalings
of the metric alone, the covariant energy-momentum tensor is conserved, symmetric and
traceless, and thus the specialisation of the theory to Minkowski space should define a
conformal field theory.
There is an interesting twist to this story when one also needs to transform the matter
fields (and modify the action by non-minimal couplings to the gravitational field) which
will be discussed in section 21.3.
Finally, let us consider the special case that the conformal factor $\omega(x)$ in (9.5) is constant,
$\omega(x) = \omega_0$,
$$ \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2 \omega_0 g_{\mu\nu} $$ (9.27)
In that case, the transformation generated by the conformal Killing vector is called a
homothety.
$$ D = x^a \partial_a $$ (9.28)
with $A_{ab}(u)$ an arbitrary function of u. These metrics have the homothety
for any choice of plane wave profile $A_{ab}(u)$, and this homothety is generated by
$$ C = 2 v \partial_v + x^a \partial_{x^a} . $$ (9.31)
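Whenever one has such a homothety, there is an explicitly $\tau$-dependent conserved quantity
even for non-null geodesics:
$$ Q_C = C_\mu \dot{x}^\mu - \omega_0 \tau g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu $$ (9.32)
is constant along the geodesic. Indeed, repeating the calculation leading to statement 1,
and using the fact that $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$ is constant, one finds
$$ \frac{d}{d\tau} Q_C = \omega_0 g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu - \omega_0 g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0 . $$ (9.33)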
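The simplest instance of (9.32) is the dilatation $D = x^a \partial_a$ of Minkowski space, for which $\omega_0 = 1$. The sketch below assumes 4-dimensional Minkowski space with $\eta = \mathrm{diag}(-1,+1,+1,+1)$ and straight-line geodesics $x(\tau) = x_0 + p\tau$; the function names are illustrative. It contrasts the $\tau$-dependent charge with the naive quantity $\eta_{ab} x^a \dot{x}^b$, which is only conserved for null geodesics.

```python
ETA = (-1.0, 1.0, 1.0, 1.0)   # assumed signature (-+++)


def dot(u, v):
    """Minkowski inner product eta_ab u^a v^b."""
    return sum(e * a * b for e, a, b in zip(ETA, u, v))


def position(x0, p, tau):
    """Straight-line geodesic x(tau) = x0 + p * tau of Minkowski space."""
    return [a + b * tau for a, b in zip(x0, p)]


def naive_charge(x0, p, tau):
    """eta_ab x^a xdot^b, i.e. C_mu xdot^mu for C = D."""
    return dot(position(x0, p, tau), p)


def homothety_charge(x0, p, tau):
    """eta_ab x^a xdot^b - tau * eta_ab xdot^a xdot^b, cf. (9.32) with omega_0 = 1."""
    return naive_charge(x0, p, tau) - tau * dot(p, p)
```

For a timelike momentum the naive quantity drifts linearly in $\tau$ with slope $\eta_{ab}\dot{x}^a\dot{x}^b$, while the corrected charge stays fixed; for a null momentum the two coincide.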
Remarks:
1. Note that for a null geodesic (9.32) reduces to the conserved charge $C_\mu \dot{x}^\mu$ (9.8) in
1' above (which does not explicitly depend on $\tau$).
2. The existence of this constant of motion can also be understood from the Noether
theorem (applied now to transformations of the fields $x^\mu(\tau)$ and the coordinate
$\tau$). Indeed, when one has a homothety, one has
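$$ g_{\mu\nu} dx^\mu dx^\nu \to \lambda^2 g_{\mu\nu} dx^\mu dx^\nu , $$ (9.34)
$$ \tau \to \lambda^2 \tau \quad\Rightarrow\quad d\tau\, g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu \to d\tau\, g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu , $$ (9.35)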
$$ Q_D = \eta_{ab} x^a \dot{x}^b - \tau \eta_{ab} \dot{x}^a \dot{x}^b . $$ (9.36)
$$ p^a p_a = -m^2 , $$ (9.39)
9.5 Conserved Charges from Killing Tensors and Killing-Yano Tensors
When a metric possesses sufficiently many symmetries (Killing vectors), the geodesic
equations (or the associated Hamilton-Jacobi equation) or, say, the Klein-Gordon equa-
tion or some other field equation in that background are separable and can hence be
reduced to quadratures of ordinary differential equations. It is not uncommon, how-
ever, in particular in the context of black hole physics, to encounter space-times in which
these equations can be separated even though there appear not to be enough isometries
(symmetries of the metric) to explain this. In many cases, this phenomenon can be
explained via (or deduced from) the existence of additional (hidden) symmetries of the
problem, associated not to Killing vectors but to certain higher-rank generalisations
thereof. Most prominent among them are (totally symmetric) Killing tensors (occasionally
also called Killing-Stäckel tensors), and (totally anti-symmetric) Killing-Yano
tensors.
To set the stage, recall from above that a Killing vector satisfies
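$$ \nabla_{(\mu} K_{\nu)} = 0 \quad\Leftrightarrow\quad \nabla_\mu K_\nu = \nabla_{[\mu} K_{\nu]} $$ (9.41)
and that using the geodesic equation $\dot{x}^\mu \nabla_\mu \dot{x}^\nu = 0$ this leads to a first integral
$Q_K = K_\mu \dot{x}^\mu$ of the geodesic equations of motion via the simple chain of manipulations
$$ \frac{d}{d\tau} (K_\nu \dot{x}^\nu) = \dot{x}^\mu \nabla_\mu (K_\nu \dot{x}^\nu) = \dot{x}^\mu \dot{x}^\nu \nabla_\mu K_\nu = 0 $$ (9.42)
by symmetry of $\dot{x}^\mu \dot{x}^\nu$ and anti-symmetry of $\nabla_\mu K_\nu$.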
This has the following two immediate (and, as it turns out, actually useful in practice)
generalisations:
This is evidently one possible generalisation of the Killing vector equation (9.41)
to higher rank tensors (generalising the first formulation in (9.41)). Then
$$ Q_K = K_{\mu_1 \ldots \mu_n} \dot{x}^{\mu_1} \ldots \dot{x}^{\mu_n} $$ (9.44)
2. Killing-Yano Tensors
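Let $Y_{\mu_1 \ldots \mu_n}$ be a totally anti-symmetric rank-n tensor satisfying the Killing-Yano
equation
$$ \nabla_{(\mu} Y_{\mu_1) \mu_2 \ldots \mu_n} = 0 \quad\Leftrightarrow\quad \nabla_\mu Y_{\mu_1 \ldots \mu_n} = \nabla_{[\mu} Y_{\mu_1 \ldots \mu_n]} $$ (9.46)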
This is evidently another possible generalisation of the Killing vector equation
(9.41) to higher rank tensors. Then the tensorial charges
Remarks:
1. Trivial examples of Killing tensors are the metric $g_{\mu\nu}$ (whose associated conserved
quantity $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$ we already know), and products of Killing vectors $K_{\mu_1} \ldots K_{\mu_n}$
which do not yield any new independent constants of motion beyond those provided
by the Killing vectors. New constants of motion are associated with Killing
tensors that cannot be constructed from the metric and the Killing vectors alone.
Trivial Killing-Yano tensors are Killing vectors $K_\mu$ and the Levi-Civita tensor (in
four dimensions $\epsilon_{\mu\nu\rho\sigma}$).
2. There are interesting relations between Killing-Yano tensors and Killing tensors.
For example, it is not difficult to check that if $Y_{\mu\nu}$ is a rank-2 Killing-Yano tensor,
then its square
$$ K_{\mu\nu} = Y_{\mu\lambda} Y_\nu{}^\lambda $$ (9.49)
(which is symmetric) is a rank-2 Killing tensor (and squares of trivial Killing-Yano
tensors give rise to trivial Killing tensors, as in $K_\mu \to K_\mu K_\nu$). Indeed, the totally
symmetrised covariant derivative of this $K_{\mu\nu}$ can be expressed in terms of partially
symmetrised covariant derivatives of $Y_{\mu\nu}$, but by definition of a Killing-Yano tensor
its covariant derivatives are totally anti-symmetric, and hence
$$ \nabla_\mu Y_{\nu\lambda} = \nabla_{[\mu} Y_{\nu\lambda]} \quad\Rightarrow\quad \nabla_{(\mu} K_{\nu\lambda)} = 0 . $$ (9.50)
$$ \nabla_{(\mu} C_{\nu)} = \omega(x) g_{\mu\nu} , $$ (9.51)
and just as the latter these turn out to be useful for massless particles or fields.
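For example, a rank 2 conformal Killing tensor satisfies an equation of the form
$$ \nabla_{(\mu} C_{\nu\lambda)} = g_{(\mu\nu} V_{\lambda)} $$ (9.52)
for some (co-)vector field $V_\mu$. Repeating the calculation (9.45) in the case at hand
for the quantity $Q_C = C_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$, one finds
$$ \frac{d}{d\tau} Q_C = (\nabla_{(\mu} C_{\nu\lambda)}) \dot{x}^\mu \dot{x}^\nu \dot{x}^\lambda = g_{\mu\nu} V_\lambda \dot{x}^\mu \dot{x}^\nu \dot{x}^\lambda $$ (9.53)
which evidently vanishes for null geodesics ($g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0$).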
4. Historically, the discovery of (conformal) Killing and Killing-Yano tensors for the
Kerr metric, the metric describing a rotating black hole (see section 29.1) and
their relation to the separability of the geodesic and field equations in the Kerr
background played a decisive role in the development of the subject.24
24 For more information about and examples and applications of Killing(-Yano) tensors, see e.g. section
35.3 of H. Stephani, D. Kramer, M. MacCallum, C. Hoenselaers, E. Herlt, Exact Solutions of Einstein's
Field Equations - Second Edition or the articles O. Santillan, Killing-Yano tensors and some applications,
arXiv:1108.0149 [hep-th], F. Larsen, C. Keeler, Separability of Black Holes in String Theory,
arXiv:1207.5928 [hep-th] and the references therein.
10 Curvature II: Geometry and Curvature
In this section, we will first discuss two properties of the Riemann curvature tensor
that illustrate its geometric significance and thus, a posteriori, justify equating the
commutator of covariant derivatives with the intuitive concept of curvature. These
properties are
the fact that the space-time metric is equivalent to the (in an obvious sense flat)
Minkowski metric if and only if the Riemann curvature tensor vanishes.
We then briefly discuss some other general aspects of the relation between geometry
and curvature (while the interplay between geodesics and curvature and Killing vectors
and curvature will be discussed in sections 11 and 12 respectively).
The Riemann curvature tensor and its relatives, introduced above, measure the intrinsic
geometry and curvature of a space or space-time. This means that they can be calculated
by making experiments and measurements in the space itself. Such experiments might
involve things like checking if the interior angles of a triangle add up to $\pi$ or not.
This intrinsic geometry and curvature described above should be contrasted with the
extrinsic geometry which depends on how the space may be embedded in some larger
space. As we have no intention of embedding space-time into something higher dimen-
sional, we will mainly be concerned with intrinsic geometry in the following. However,
if you would for example be interested in the properties of spacelike hypersurfaces in
space-time, then aspects of both intrinsic and extrinsic geometry of that hypersurface
would be relevant. See section 17 for some further comments on this.
Let us return to intrinsic geometry. An even better method, the subject of this section,
to determine the curvature is to check the properties of parallel transport. The tell-tale
sign (or smoking gun) of the presence of curvature is the fact that parallel transport is
path dependent, i.e. that parallel transporting a vector V from a point A to a point B
along two different paths will in general produce two different vectors at B. Another
way of saying this is that parallel transporting a vector around a closed loop at A will
in general produce a new vector at A which differs from the initial vector.
This is easy to see in the case of the two-sphere, for which we also worked out explicitly
the parallel transport in section 4.9 (see Figure 8). Since all the great circles on a
two-sphere are geodesics, in particular the segments N-C, N-E, and E-C in the figure,
we know that in order to parallel transport a vector along such a line we just need to
make sure that its length and the angle between the vector and the geodesic line are
constant. Thus imagine a vector 1 at the north pole N, pointing downwards along the
line N-C-S. First parallel transport this along N-C to the point C. There we will obtain
the vector 2, pointing downwards along C-S. Alternatively imagine parallel transporting
the vector 1 first to the point E. Since the vector has to remain at a constant (right)
angle to the line N-E, at the point E parallel transport will produce the vector 3 pointing
westwards along E-C. Now parallel transporting this vector along E-C to C will produce
the vector 4 at C. This vector clearly differs from the vector 2 that was obtained by
parallel transporting along N-C instead of N-E-C.
To illustrate the claim about closed loops above, imagine parallel transporting vector 1
along the closed loop N-E-C-N from N to N. In order to complete this loop, we still have
to parallel transport vector 4 back up to N. Clearly this will give a vector, not indicated
in the figure, different from (and pointing roughly at a right angle to) the vector 1 we
started off with.
The precise statement regarding the relation between the path dependence of parallel
transport and the presence of curvature is the following. If one parallel transports a
covector $V_\mu$ (I use a covector instead of a vector only to save myself a few minus signs
here and there) along a closed infinitesimal loop $x^\mu(\tau)$ with, say, $x(\tau_0) = x(\tau_1) = x_0$,
then one has
$$ V_\mu(\tau_1) - V_\mu(\tau_0) = \tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\sigma{}_{\mu\rho\nu}(x_0) V_\sigma(\tau_0) . $$ (10.1)
Thus an arbitrary vector $V_\mu$ will not change under parallel transport around an arbitrarily
small loop at $x_0$ only if the curvature tensor at $x_0$ is zero. This can of course be extended
to finite loops, but the important point is that in order to detect curvature at a given
point one only requires parallel transport along infinitesimal loops.
Before turning to a proof of this result, I just want to note that intuitively it can be
understood directly from the definition of the curvature tensor (7.2). Imagine that the
infinitesimal loop is actually a tiny parallelogram made up of the coordinate lines $x^1$
and $x^2$. Parallel transport along $x^1$ is governed by the equation $\nabla_1 V_\mu = 0$, that along
$x^2$ by $\nabla_2 V_\mu = 0$. The fact that parallel transporting first along $x^1$ and then along $x^2$
can be different from doing it the other way around is precisely the statement that $\nabla_1$
and $\nabla_2$ do not commute, i.e. that some of the components $R^\lambda{}_{\mu 12}$ of the curvature tensor
are non-zero.
For sufficiently small (infinitesimal) loops, we can expand the Christoffel symbols as
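The linear term in the expansion of $V_\mu(\tau)$ arises from the zeroth order contribution
$\Gamma^\lambda_{\mu\nu}(x_0)$ in the first order (single integral) term in (10.4),
$$ [V_\mu(\tau_1) - V_\mu(\tau_0)]_{(1)} = \Gamma^\lambda_{\mu\nu}(x_0) V_\lambda(\tau_0) \left( \int_{\tau_0}^{\tau_1} d\tau\, \dot{x}^\nu(\tau) \right) . $$ (10.6)
Now the important observation is that, for a closed loop, the integral in brackets is zero,
$$ \int_{\tau_0}^{\tau_1} d\tau\, \dot{x}^\nu(\tau) = x^\nu(\tau_1) - x^\nu(\tau_0) = 0 . $$ (10.7)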
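Thus the change in $V_\mu(\tau)$, when transported along a small loop, is at least of second
order. Such second order terms arise in two different ways, from the first order term
in the expansion of $\Gamma^\lambda_{\mu\nu}(x)$ in the first order term in (10.4), and from the zeroth order
terms $\Gamma^\lambda_{\mu\nu}(x_0)$ in the quadratic (double integral) term in (10.4),
$$ \begin{aligned}
[V_\mu(\tau_1) - V_\mu(\tau_0)]_{(2)} &= (\partial_\rho \Gamma^\lambda_{\mu\nu})(x_0) V_\lambda(\tau_0) \left( \int_{\tau_0}^{\tau_1} d\tau\, (x(\tau) - x_0)^\rho \dot{x}^\nu(\tau) \right) \\
&\quad + (\Gamma^\lambda_{\mu\nu} \Gamma^\sigma_{\lambda\rho})(x_0) V_\sigma(\tau_0) \int_{\tau_0}^{\tau_1} d\tau \int_{\tau_0}^{\tau} d\tau'\, \dot{x}^\rho(\tau') \dot{x}^\nu(\tau)
\end{aligned} $$ (10.8)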
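The final observation we need is that the remaining integral is anti-symmetric in the
indices $\rho, \nu$, which follows immediately from
$$ \int_{\tau_0}^{\tau_1} d\tau\, \left( x^\rho(\tau) \dot{x}^\nu(\tau) + x^\nu(\tau) \dot{x}^\rho(\tau) \right) = \int_{\tau_0}^{\tau_1} d\tau\, \frac{d}{d\tau} \left( x^\rho(\tau) x^\nu(\tau) \right) = 0 . $$ (10.11)
It now follows from (10.10) and the definition of the Riemann tensor that
$$ V_\mu(\tau_1) - V_\mu(\tau_0) = \tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\sigma{}_{\mu\rho\nu}(x_0) V_\sigma(\tau_0) . $$ (10.12)
Simply by raising and lowering of the indices, and using the symmetry properties of
the Riemann tensor, we can deduce that the corresponding equation for the parallel
transport of vectors is
$$ V^\mu(\tau_1) - V^\mu(\tau_0) = -\tfrac{1}{2} \left( \oint x^\rho dx^\nu \right) R^\mu{}_{\sigma\rho\nu}(x_0) V^\sigma(\tau_0) . $$ (10.13)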
As an example, recall that in section 7.6 we already determined explicitly the parallel
transport of vectors on the 2-sphere along the circles with fixed $\theta = \theta_0$. Choosing $\theta_0$
infinitesimal corresponds to an infinitesimal loop around the north pole. Expanding the
result (4.114) for small $\theta_0$, in particular using
one finds complete agreement between (10.13) and the components of the Riemann
tensor of the 2-sphere, determined in (7.57),
$$ R^\theta{}_{\phi\theta\phi} = \sin^2\theta , \quad R^\phi{}_{\theta\phi\theta} = 1 , $$ (10.15)
evaluated for $\theta_0 \to 0$. In verifying this, some care should be taken with the fact that $\theta = 0$
is a coordinate singularity so that one should never strictly set $\theta_0 = 0$. Alternatively,
and to be on the safe side, one can rewrite (10.13) as an equation for orthonormal frame
components and use the result (4.119) for the parallel transport of the frame components
(which is not sensitive to coordinate singularities).
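The relation between holonomy and curvature can also be checked numerically. The following sketch assumes the unit 2-sphere with $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$, and parallel transports a vector once around the circle $\theta = \theta_0$ using a hand-written Runge-Kutta integrator (all names are illustrative). In the orthonormal frame the transported vector comes back rotated by an angle which, modulo $2\pi$, equals the area $2\pi(1 - \cos\theta_0)$ of the enclosed polar cap, i.e. curvature (K = 1) times area, and this reduces to $\pi\theta_0^2$ for small loops.

```python
import math


def transport_around_latitude(theta0, v_theta, v_phi, steps=20000):
    """Parallel transport (V^theta, V^phi) once around theta = theta0 (RK4 in phi)."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        # dV^theta/dphi = sin(theta) cos(theta) V^phi, dV^phi/dphi = -cot(theta) V^theta
        return (s * c * v[1], -(c / s) * v[0])

    h = 2.0 * math.pi / steps
    v = (v_theta, v_phi)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]))
        k3 = rhs((v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v


def rotation_angle(theta0):
    """Rotation of the transported vector in the orthonormal frame, reduced to [0, 2 pi)."""
    v_theta, v_phi = transport_around_latitude(theta0, 1.0, 0.0)
    a, b = v_theta, math.sin(theta0) * v_phi   # orthonormal frame components
    return math.atan2(b, a) % (2.0 * math.pi)


def cap_area(theta0):
    """Area of the polar cap bounded by the circle theta = theta0 on the unit sphere."""
    return 2.0 * math.pi * (1.0 - math.cos(theta0))
```

For `theta0 = 0.3`, say, `rotation_angle` and `cap_area` agree to integration accuracy, and both are within about one percent of the small-loop value $\pi\theta_0^2$.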
We are now finally in a position to prove the converse to the statement that the
Minkowski metric has vanishing Riemann tensor. Namely, we will see that when the
Riemann tensor of a metric vanishes, locally there are coordinates in which the metric
is the standard Minkowski metric. Since the opposite of curved is flat, this then allows
one to unambiguously refer to the Minkowski metric as the flat metric (locally at least),
and to Minkowski space as flat space(-time).
So let us assume that we are given a metric with vanishing Riemann tensor. Then, by
the above, parallel transport is path independent and we can, in particular, extend a
vector $V^\mu(x_0)$ to a vector field everywhere in space-time: to define $V^\mu(x_1)$ we choose any
path from $x_0$ to $x_1$ and use parallel transport along that path. In particular, the vector
field $V^\mu$, defined in this way, will be covariantly constant or parallel, $\nabla_\mu V^\nu = 0$. We can
also do this for four linearly independent vectors $V^\mu_a$ at $x_0$ and obtain four covariantly
constant (parallel) vector fields which are linearly independent at every point.
An alternative way of saying or seeing this is the following: The integrability condition
for the equation $\nabla_\mu V^\lambda = 0$ is
$$ \nabla_\mu V^\lambda = 0 \quad\Rightarrow\quad [\nabla_\mu, \nabla_\nu] V^\lambda = R^\lambda{}_{\sigma\mu\nu} V^\sigma = 0 . $$ (10.16)
We will now use this result in the proof, but for covectors instead of vectors. Clearly
this makes no difference: if $V^\mu$ is a parallel vector field, then $g_{\mu\nu} V^\nu$ is a parallel covector
field.
Now we solve the equations
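$$ \nabla_\mu E^a_\nu = 0 \quad\Leftrightarrow\quad \partial_\mu E^a_\nu = \Gamma^\lambda_{\mu\nu} E^a_\lambda $$ (10.18)
with the initial condition $E^a_\mu(x_0) = e^a_\mu$. This gives rise to four linearly independent
parallel covectors $E^a_\mu$.
$$ \partial_\mu E^a_\nu = \partial_\nu E^a_\mu . $$ (10.19)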
Summing this up, we have seen that, starting from the assumption that the Riemann
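curvature tensor of a metric $g_{\mu\nu}$ is zero, we have proven the existence of coordinates $\xi^a$
in which the metric takes the Minkowski form,
$$ g_{\mu\nu} = \frac{\partial \xi^a}{\partial x^\mu} \frac{\partial \xi^b}{\partial x^\nu} \eta_{ab} . $$ (10.23)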
The argument given above is local in the sense that the existence of these coordinates
$\xi^a$ is only guaranteed locally, i.e. in the neighbourhood of some point. Whether or
not these coordinates can be used to cover the space-time globally depends on global
(topological) properties of the space-time which are not captured by the intrinsic local
and locally determined Riemann tensor.
For example, imagine starting with Minkowski space $\mathbb{R}^{1,3}$ with inertial coordinates $\xi^a$,
and then making a periodic identification of $\xi^1$, say,
in the new space-time the coordinate $\xi^1$, which is now an angular variable, is not
globally well defined,
and the space-time looks like Minkowski space only locally, not globally.
10.3 Curvature of Surfaces: Euler, Gauss(-Bonnet) and Liouville
We can generalise the example of the curvature of the 2-sphere, discussed in section
7.6, somewhat, in this way connecting our considerations with the classical realm of
the differential geometry of surfaces, in particular with the Gauss Curvature, the Euler
characteristic, the Gauss-Bonnet theorem and the Liouville Equation.
For any 2-dimensional metric $g_{ab}$ it is a simple exercise to derive the relation between the
one independent component, say $R_{1212}$, of the Riemann tensor, and the scalar curvature.
First of all, the Ricci tensor is $R_{ab} = R^c{}_{acb}$, so that the Ricci scalar is
$$ R(g_{ab}) = g^{ab} R_{ab} = g^{11} R^2{}_{121} + g^{12} R^1{}_{112} + g^{21} R^2{}_{221} + g^{22} R^1{}_{212} . $$ (10.26)
Using the fact that in 2 dimensions the components of the inverse metric are explicitly
given by
$$ (g^{ab}) = \frac{1}{g_{11} g_{22} - g_{12} g_{21}} \begin{pmatrix} g_{22} & -g_{12} \\ -g_{21} & g_{11} \end{pmatrix} $$ (10.27)
and the (anti-)symmetry properties (1) and (2) of the Riemann tensor, one finds
$$ R(g_{ab}) = \frac{2}{g_{11} g_{22} - g_{12} g_{21}} R_{1212} . $$ (10.28)
This is precisely the relation (7.42) between the Riemann tensor and Ricci scalar. The
factor of 2 in this equation is a consequence of our (and the conventional) definition of
the Riemann curvature tensor, and is responsible for the fact that the scalar curvature
of the unit 2-sphere is R = +2. We can also write this result as
$$ R_{abcd} = \tfrac{1}{2} (g_{ac} g_{bd} - g_{ad} g_{bc}) R \quad\Leftrightarrow\quad R^a{}_{bcd} = \tfrac{1}{2} (\delta^a_c g_{bd} - \delta^a_d g_{bc}) R . $$ (10.29)
In two dimensions, it is often convenient and natural to absorb this ubiquitous factor
of 2 into the definition of the (scalar) curvature, and what one then gets is the classical
Gauss Curvature
$$ K := \tfrac{1}{2} R(g_{ab}) $$ (10.30)
of a two-dimensional surface.
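As a quick illustration of (10.28) and (10.30), here is the computation for the unit 2-sphere. It is a sketch that assumes the coordinates $x^1 = \theta$, $x^2 = \phi$ and the standard component $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$ of the curvature tensor of the sphere:

```latex
% Unit 2-sphere: ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2
g_{11} g_{22} - g_{12} g_{21} = \sin^2\theta ,
\qquad
R_{1212} = g_{11} R^{1}{}_{212} = \sin^2\theta ,
\\
R(g_{ab}) = \frac{2}{\sin^2\theta}\, \sin^2\theta = 2 ,
\qquad
K = \tfrac{1}{2} R(g_{ab}) = 1 .
```

This reproduces the value R = +2 for the scalar curvature of the unit 2-sphere quoted above, and shows that the Gauss curvature of the unit sphere is normalised to 1.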
It follows from (10.29) that the Ricci tensor is related to the Ricci scalar by
This generalises the result for the standard metric on the 2-sphere found by explicit
calculation in section 7.6. It shows that in complete generality the Ricci tensor of a
two-dimensional space or space-time, thought of as the linear map
has only one (double) eigenvalue, namely the Gauss curvature K. It can also be inter-
preted as saying that in 2 dimensions the Einstein tensor (7.92) is identically zero,
We will now briefly look at two important and interesting consequences of the above formulae,
one related to the Euler characteristic of a surface and its integral representation
(the Gauss-Bonnet theorem), and the other to the Liouville equation describing metrics
with constant Gauss curvature $K = k = \pm 1$.
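Clearly, this area depends on a choice of metric, and under a variation $\delta g_{ab}$ of the
metric it transforms as
$$ \delta_g A(S_h) = \delta_g \int_{S_h} \sqrt{g}\, d^2x = \tfrac{1}{2} \int_{S_h} \sqrt{g}\, d^2x\, g^{ab} \delta g_{ab} . $$ (10.35)
$$ \delta_g \chi(S_h) = 0 . $$ (10.37)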
Here are two rather explicit ways of establishing this remarkable result:
Then one finds
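$$ \begin{aligned}
\delta_g (\sqrt{g} R) &= (\delta_g \sqrt{g}) g^{ab} R_{ab} + \sqrt{g} (\delta_g g^{ab}) R_{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} \\
&= \sqrt{g} (-\tfrac{1}{2} R g_{ab} + R_{ab}) \delta g^{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} \\
&= \sqrt{g} G_{ab} \delta g^{ab} + \sqrt{g} g^{ab} \delta_g R_{ab} .
\end{aligned} $$ (10.40)
for some well-defined $B^a$ built from the covariant derivatives of the variations
of the metric, as in (19.19). Taken together, these two facts imply that for a
closed surface $S_h$ (without boundary) one has
$$ \delta_g \chi(S_h) = \frac{1}{4\pi} \int \sqrt{g}\, d^2x\, (G_{ab} \delta g^{ab} + \nabla_a B^a) = 0 , $$ (10.42)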
as was to be shown.
(b) Alternatively, somewhat less covariantly but very explicitly, one can show
that the integrand $\sqrt{g} K$ or $\sqrt{g} R$ can itself locally be written as a total derivative.
Indeed, using (10.29) to write
$$ R^2{}_{121} = \tfrac{1}{2} g_{11} R \quad\Rightarrow\quad K = \frac{1}{g_{11}} R^2{}_{121} $$ (10.43)
and simply writing out explicitly this Riemann curvature tensor component
in terms of the Christoffel symbols,
Either way we have seen that the real number $\chi(S_h)$ is independent of the metric
one uses to calculate it. For example, for h = 0 and for the standard metric on
the sphere $S^2$ one finds
$$ \chi(S_{h=0}) = \chi(S^2) = \frac{1}{4\pi} \int \sqrt{g} R = \frac{1}{2\pi} \int \sqrt{g} = 2 , $$ (10.48)
and this will therefore be the result for any metric on $S^2$. Likewise, for h = 1, i.e.
a torus, by choosing the flat metric on $T^2$ (see e.g. the discussion and construction
in section 17.1), one finds
$$ \chi(S_{h=1}) = \chi(T^2) = 0 , $$ (10.49)
and this will therefore be the result for any metric on $T^2$ (and it is instructive
to check this explicitly for the non-trivial, non-flat metric on $T^2$ induced by its
embedding into $\mathbb{R}^3$ constructed in section 17.1). I am not aware of an equally
elementary calculation to determine $\chi(S_h)$ for h > 1 in this way but the fact of the
matter is that
$$ \chi(S_h) = 2 - 2h $$ (10.50)
is the Euler characteristic of $S_h$, which can also be defined purely combinatorially
as the number
$$ \chi(S) = n_F - n_E + n_V $$ (10.51)
of faces minus edges plus vertices of any cubist rendition of a surface S (and
$\chi(S)$ is independent of such a cubist realisation or triangulation). The remarkable
fact that this topological invariant of a surface S can be calculated in terms of
differential geometric quantities, namely as the integral of the curvature scalar, is
known as the Gauss-Bonnet theorem.
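The combinatorial definition (10.51) is easy to try out. The sketch below (with illustrative function names and two hand-built "cubist renditions", a cube for the sphere and a periodic quadrilateral grid for the torus) counts faces, edges and vertices from a list of faces and compares the result with $2 - 2h$.

```python
def euler_characteristic(faces):
    """n_F - n_E + n_V for a closed surface given as a list of faces.

    Each face is a tuple of vertex labels in cyclic order; edges are the
    unordered pairs of consecutive vertices of the faces.
    """
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        for i, v in enumerate(face):
            w = face[(i + 1) % len(face)]
            edges.add(frozenset((v, w)))
    return len(faces) - len(edges) + len(vertices)


def cube_faces():
    """The six faces of a cube (a cubist rendition of the sphere, h = 0)."""
    return [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
            (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]


def torus_faces(n=4, m=5):
    """An n x m grid of quadrilaterals with periodic identifications (h = 1).

    n and m should be at least 3 so that all edges are distinct.
    """
    def vid(i, j):
        return (i % n) * m + (j % m)
    return [(vid(i, j), vid(i + 1, j), vid(i + 1, j + 1), vid(i, j + 1))
            for i in range(n) for j in range(m)]
```

The cube has 6 faces, 12 edges and 8 vertices, and the periodic grid has nm faces, 2nm edges and nm vertices, so the two results are 2 and 0 respectively, independently of n and m.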
the calculation of the Riemann tensor is particularly simple and one finds the
(easy to memorise) results
$$ R^x{}_{yxy} = -\Delta h $$ (10.53)
and
$$ K = -e^{-2h} \Delta h $$ (10.54)
where $\Delta = \partial_x^2 + \partial_y^2$ is the 2-dimensional Laplacian with respect to the flat Euclidean
metric $dx^2 + dy^2$. Thus a surface with constant curvature K = k is given by a
solution to the non-linear differential equation
$$ \Delta h + k e^{2h} = 0 . $$ (10.55)
This is the (in-)famous Liouville equation, which plays a fundamental role in many
branches of mathematics (and mathematical physics).
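In terms of the intrinsic Laplacian $\Delta_g$ associated to the metric $g_{ab}$, the Gaussian
curvature and the Liouville equation can also simply be written as
$$ K = -\Delta_g h , \quad \Delta_g h + k = 0 , $$ (10.56)
since, due to the peculiarities of 2 dimensions, $\sqrt{g} g^{ab}$ is independent of h, i.e. is
conformally invariant (as we already observed in a different context in section 6.7,
cf. (6.120)),
$$ \begin{aligned}
\sqrt{g} g^{ab} &= e^{2h} e^{-2h} \delta^{ab} = \delta^{ab} \\
\Delta_g &= \frac{1}{\sqrt{g}} \partial_a (\sqrt{g} g^{ab} \partial_b) = \frac{1}{\sqrt{g}} \partial_a (\delta^{ab} \partial_b) = e^{-2h} \Delta .
\end{aligned} $$ (10.57)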
g g
I will not attempt to say anything about the general (local) solution of this equa-
tion (which roughly speaking depends on an arbitrary meromorphic function of
the complex coordinate z = x + iy), but close this section with some special (and
particularly prominent) solutions of this equation.
$$ ds^2 = \frac{dx^2 + dy^2}{y^2} \quad ( \{(x, y) \in \mathbb{R}^2 , y > 0\} ) . $$ (10.59)
By the coordinate transformation $y = e^z$ this is mapped to the equivalent
metric
$$ ds^2 = dz^2 + e^{-2z} dx^2 $$ (10.60)
$$ ds^2 = 4 \frac{dx^2 + dy^2}{(1 + x^2 + y^2)^2} . $$ (10.62)
This is the constant positive curvature metric on the Riemann sphere one
gets by stereographic projection of the standard metric on the two-sphere
S 2 to the (x, y)-plane.
In terms of polar coordinates $(r, \phi)$ on the Euclidean plane, this metric
takes the form
$$ ds^2 = 4 \frac{dr^2 + r^2 d\phi^2}{(1 + r^2)^2} , $$ (10.63)
and the further change of variables $r = \tan\theta/2$ shows that this is indeed
the standard line element $d\Omega^2$ on the 2-sphere,
Read backwards, this can also be read as the statement that via the
above change of variables the Euclidean metric on R2 can be written as
$$ dr^2 + r^2 d\phi^2 = \frac{(1 + r(\theta)^2)^2}{4} (d\theta^2 + \sin^2\theta\, d\phi^2) = \frac{1}{4 \cos^4 \theta/2} d\Omega^2 . $$ (10.65)
$$ ds^2 = 4 \frac{dx^2 + dy^2}{(1 - (x^2 + y^2))^2} \quad ( \{x, y\} \in \mathbb{R}^2 , x^2 + y^2 < 1 ) . $$ (10.66)
This is the Poincaré disc model of the hyperbolic geometry, defined in
the interior of the unit disc in $\mathbb{R}^2$. In terms of polar coordinates, it can
also be written as
$$ ds^2 = 4 \frac{dr^2 + r^2 d\phi^2}{(1 - r^2)^2} \quad (0 \leq r < 1) $$ (10.67)
The two metrics (10.59) and (10.66) are isometric, i.e. related by a (albeit
not completely evident) coordinate transformation.
(c) As our final example, one other solution (given here only for k = -1) is
$$ e^{2h(x, y)} = e^{2h(x)} = 4 \frac{e^{2x}}{(1 - e^{2x})^2} . $$ (10.68)
where
$$ x = \log \tanh(\rho/2) \quad\Rightarrow\quad dx = d\rho / \sinh\rho . $$ (10.71)
$$ \sinh^{-2}(x) = 4 \frac{e^{2x}}{(1 - e^{2x})^2} , $$ (10.72)
It is worth remarking that the Poincaré upper-half plane model of a space with constant
negative curvature readily generalises to arbitrary dimensions and signature. Thus
$$ ds^2 = \frac{d\vec{x}^2 + dy^2}{y^2} , \quad d\vec{x}^2 = \delta_{ab} dx^a dx^b \;\;\text{or}\;\; d\vec{x}^2 = \eta_{ab} dx^a dx^b $$ (10.73)
The Lorentzian metric will reappear later as a solution to the Einstein equations with a
negative cosmological constant, and is in this context known as the anti-de Sitter metric
(in Poincaré coordinates, which cover only a part of the complete space-time), and we
will discuss this solution in some detail in section 38.
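Returning briefly to the Liouville equation (10.55): the special solutions listed above can be spot-checked numerically. The sketch below assumes that the metric is written as $e^{2h}(dx^2 + dy^2)$, so that the conformal factors in (10.59), (10.62) and (10.66) correspond to $h = -\log y$, $h = \log(2/(1 + x^2 + y^2))$ and $h = \log(2/(1 - x^2 - y^2))$ respectively. It uses a five-point finite-difference Laplacian; the function names are illustrative.

```python
import math


def laplacian(h, x, y, eps=1e-3):
    """Five-point finite-difference approximation to (d_x^2 + d_y^2) h."""
    return (h(x + eps, y) + h(x - eps, y) + h(x, y + eps) + h(x, y - eps)
            - 4.0 * h(x, y)) / eps ** 2


def liouville_residual(h, k, x, y):
    """Left-hand side of the Liouville equation, Delta h + k exp(2h)."""
    return laplacian(h, x, y) + k * math.exp(2.0 * h(x, y))


def h_half_plane(x, y):
    """Upper half plane: e^{2h} = 1 / y^2 (k = -1)."""
    return -math.log(y)


def h_sphere(x, y):
    """Stereographic sphere: e^{2h} = 4 / (1 + x^2 + y^2)^2 (k = +1)."""
    return math.log(2.0 / (1.0 + x * x + y * y))


def h_disc(x, y):
    """Unit disc: e^{2h} = 4 / (1 - x^2 - y^2)^2 (k = -1)."""
    return math.log(2.0 / (1.0 - x * x - y * y))
```

At generic points inside the respective coordinate ranges the residuals vanish up to the discretisation error of the finite-difference Laplacian, whereas choosing the wrong sign of k gives a residual of order one.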
In section 7 from the Riemann tensor we have extracted its traces, the Ricci tensor and
the Ricci scalar, as well as a particular linear combination of them, the Einstein tensor.
We can therefore also explicitly decompose the Riemann tensor into these trace parts
and the remaining traceless part.
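We noted in section 7.5 that for $D = 2$ and $D = 3$ the Riemann tensor would be pure trace, i.e. could be written entirely in terms of the Ricci tensor and Ricci scalar. For $D = 2$ we have already established this explicitly by proving the relation (10.29),
\[ D = 2: \quad R_{\mu\nu\lambda\rho} = \tfrac{1}{2} (g_{\mu\lambda} g_{\nu\rho} - g_{\mu\rho} g_{\nu\lambda}) R \tag{10.74} \]
in section 10.3.
We now look at this issue for $D \ge 3$. Simply by linear algebra one finds, for $D \ge 3$, the decomposition
\[ \begin{aligned}
R_{\mu\nu\lambda\rho} = C_{\mu\nu\lambda\rho}
 &+ \frac{1}{D-2} (g_{\mu\lambda} R_{\nu\rho} + R_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} R_{\mu\rho} - R_{\nu\lambda} g_{\mu\rho}) \\
 &- \frac{1}{(D-1)(D-2)} R (g_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} g_{\mu\rho}) .
\end{aligned} \tag{10.75} \]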
This definition is such that $C_{\mu\nu\lambda\rho}$ has all the symmetries of the Riemann tensor (this is manifest) and such that all of its traces are zero, i.e.
\[ C^{\mu}{}_{\nu\mu\rho} = 0 , \tag{10.76} \]
as is easily verified. This traceless part $C_{\mu\nu\lambda\rho}$ of the Riemann tensor is called the Weyl tensor.
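As a sketch of this verification, one can contract (10.75) with $g^{\mu\lambda}$. The index labels and the convention $R_{\nu\rho} = g^{\mu\lambda} R_{\mu\nu\lambda\rho}$ for the Ricci tensor are assumptions made for this check:

```latex
\begin{align*}
g^{\mu\lambda} R_{\mu\nu\lambda\rho}
 &= g^{\mu\lambda} C_{\mu\nu\lambda\rho}
  + \frac{1}{D-2}\left( D\, R_{\nu\rho} + R\, g_{\nu\rho} - R_{\nu\rho} - R_{\nu\rho} \right)
  - \frac{1}{(D-1)(D-2)}\, R\, (D - 1)\, g_{\nu\rho} \\
 &= g^{\mu\lambda} C_{\mu\nu\lambda\rho} + R_{\nu\rho}
  + \frac{R\, g_{\nu\rho}}{D-2} - \frac{R\, g_{\nu\rho}}{D-2}
  = g^{\mu\lambda} C_{\mu\nu\lambda\rho} + R_{\nu\rho} .
\end{align*}
```

Since the left-hand side is $R_{\nu\rho}$, this forces $g^{\mu\lambda} C_{\mu\nu\lambda\rho} = 0$. The remaining traces then vanish by the symmetries of $C_{\mu\nu\lambda\rho}$.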
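Occasionally it is more convenient and transparent to decompose the Riemann tensor not into the Weyl tensor, the Ricci tensor and the Ricci scalar, but to perform an orthogonal decomposition (with respect to the metric) into the Weyl tensor, the traceless part $S_{\mu\nu}$ of the Ricci tensor,
\[ S_{\mu\nu} = R_{\mu\nu} - \frac{1}{D}\, g_{\mu\nu} R , \tag{10.77} \]
and the trace $R$. Then the decomposition becomes
\[ \begin{aligned}
R_{\mu\nu\lambda\rho} = C_{\mu\nu\lambda\rho}
 &+ \frac{1}{D-2} (g_{\mu\lambda} S_{\nu\rho} + S_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} S_{\mu\rho} - S_{\nu\lambda} g_{\mu\rho}) \\
 &+ \frac{1}{D(D-1)} R (g_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} g_{\mu\rho}) .
\end{aligned} \tag{10.78} \]
One other common and convenient decomposition is in terms of a tensor $P_{\mu\nu}$ such that (10.75) takes the form
\[ R_{\mu\nu\lambda\rho} = C_{\mu\nu\lambda\rho} + (g_{\mu\lambda} P_{\nu\rho} + P_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} P_{\mu\rho} - P_{\nu\lambda} g_{\mu\rho}) . \tag{10.79} \]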
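The tensor $P_{\mu\nu}$ is fixed by comparing (10.79) with (10.75). A sketch of this matching, using the ansatz $P_{\mu\nu} = a R_{\mu\nu} + b\, g_{\mu\nu} R$ (the coefficients $a$, $b$ are introduced here for illustration only and do not appear in the notes):

```latex
\begin{align*}
g_{\mu\lambda} P_{\nu\rho} + P_{\mu\lambda} g_{\nu\rho}
 - g_{\nu\lambda} P_{\mu\rho} - P_{\nu\lambda} g_{\mu\rho}
 &= a \left( g_{\mu\lambda} R_{\nu\rho} + R_{\mu\lambda} g_{\nu\rho}
     - g_{\nu\lambda} R_{\mu\rho} - R_{\nu\lambda} g_{\mu\rho} \right)
  + 2 b\, R \left( g_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} g_{\mu\rho} \right) , \\
a = \frac{1}{D-2} , \quad 2b = -\frac{1}{(D-1)(D-2)}
 \quad&\Rightarrow\quad
 P_{\mu\nu} = \frac{1}{D-2}\left( R_{\mu\nu} - \frac{1}{2(D-1)}\, g_{\mu\nu} R \right) .
\end{align*}
```

A tensor of this form is commonly called the Schouten tensor.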
Regardless of how we write the trace part of the Riemann tensor, it turns out that for $D = 3$ the Weyl tensor vanishes identically,
\[ D = 3: \quad C_{\mu\nu\lambda\rho} \equiv 0 \tag{10.81} \]
(I will give an elementary proof of this momentarily). Therefore, for $D = 3$ one has the decomposition
\[ \begin{aligned}
D = 3: \quad R_{\mu\nu\lambda\rho} = {} & (g_{\mu\lambda} R_{\nu\rho} + R_{\mu\lambda} g_{\nu\rho} - g_{\nu\lambda} R_{\mu\rho} - R_{\nu\lambda} g_{\mu\rho}) \\
 & + \tfrac{1}{2} (g_{\nu\lambda} g_{\mu\rho} - g_{\mu\lambda} g_{\nu\rho}) R
\end{aligned} \tag{10.82} \]
This is precisely the result claimed previously in (7.43).
To establish (10.81), in order to trivialise the algebra let us fix a point $x_0$ and choose coordinates there such that $g_{\mu\nu}(x_0) = \delta_{\mu\nu}$ (or $\eta_{\mu\nu}$, depending on the signature of the metric, but let us assume that we are in the case of Euclidean signature - the same argument works in the Lorentzian case). Now the proof consists of the following elementary steps:
Since we are in $D = 3$, at least two of the indices in $C_{\mu\nu\lambda\rho}$ must be equal. Since the Weyl tensor has all the symmetries of the Riemann tensor, if more than two indices are equal, the Weyl tensor component is zero. Thus we only need to consider the components where 2 indices are equal and we can without loss of generality choose these to be $C_{1\mu 1\nu}$, say, with $\mu, \nu \neq 1$.
Thus all in all the Weyl tensor can have only 3 independent non-zero components, namely $C_{1212} = C_{2121}$, $C_{1313} = C_{3131}$, $C_{2323} = C_{3232}$, and they are all required to be pairwise negatives of each other. This is impossible for non-trivial $C_{\mu\nu\lambda\rho}$, and implies that all of the components of the Weyl tensor are identically zero in $D = 3$.
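The intermediate steps can be sketched as follows, under the same assumptions (Euclidean signature and $g_{\mu\nu}(x_0) = \delta_{\mu\nu}$, so that index positions do not matter and the trace condition reads $\sum_\mu C_{\mu\nu\mu\rho} = 0$):

```latex
\begin{align*}
\nu \neq \rho,\ \text{e.g. } (\nu,\rho) = (2,3):\quad
 & \sum_\mu C_{\mu 2 \mu 3} = C_{1213} + C_{2223} + C_{3233} = C_{1213} = 0 , \\
\nu = \rho:\quad
 & C_{2121} + C_{3131} = 0 , \quad
   C_{1212} + C_{3232} = 0 , \quad
   C_{1313} + C_{2323} = 0 , \\
\Rightarrow\quad
 & C_{1212} = -C_{1313} = C_{2323} = -C_{1212}
   \quad\Rightarrow\quad
   C_{1212} = C_{1313} = C_{2323} = 0 .
\end{align*}
```

The first line shows that the components with two equal and two distinct remaining indices vanish. The other two lines are the three "pairwise negative" conditions quoted above.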
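Thus the Weyl tensor is only non-trivial for $D \ge 4$. Using the Bianchi identities discussed in section 7.8, in particular also (7.89),
\[ \nabla_\mu R^{\mu}{}_{\nu\lambda\rho} = \nabla_\lambda R_{\nu\rho} - \nabla_\rho R_{\nu\lambda} \tag{10.86} \]
from (10.75) one finds a simple expression for the divergence, namely
\[ \nabla_\mu C^{\mu}{}_{\nu\lambda\rho} = (D - 3) (\nabla_\lambda P_{\nu\rho} - \nabla_\rho P_{\nu\lambda}) . \tag{10.87} \]
The tensor appearing on the right-hand side also has its own name. It is called the Cotton tensor $C_{\nu\lambda\rho}$,
\[ C_{\nu\lambda\rho} = \nabla_\lambda P_{\nu\rho} - \nabla_\rho P_{\nu\lambda} . \tag{10.88} \]
The content of (10.87) is evidently trivial in $D = 3$, but the Cotton tensor itself is not (and I will briefly come back to this below).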
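The divergence formula (10.87) can be sketched as follows. The sketch assumes that $P_{\mu\nu}$ is the tensor $P_{\mu\nu} = \frac{1}{D-2}\big(R_{\mu\nu} - \frac{1}{2(D-1)} g_{\mu\nu} R\big)$ obtained by matching (10.79) with (10.75), and it uses the contracted Bianchi identity $\nabla^\mu R_{\mu\lambda} = \frac{1}{2}\nabla_\lambda R$:

```latex
\begin{align*}
R_{\nu\rho} &= (D-2)\, P_{\nu\rho} + g_{\nu\rho}\, P , \qquad
 P \equiv g^{\mu\nu} P_{\mu\nu} = \frac{R}{2(D-1)} , \qquad
 \nabla^\mu P_{\mu\lambda} = \nabla_\lambda P , \\
\nabla^\mu R_{\mu\nu\lambda\rho}
 &= \nabla^\mu C_{\mu\nu\lambda\rho}
  + \nabla_\lambda P_{\nu\rho} - \nabla_\rho P_{\nu\lambda}
  + g_{\nu\rho} \nabla_\lambda P - g_{\nu\lambda} \nabla_\rho P , \\
\nabla_\lambda R_{\nu\rho} - \nabla_\rho R_{\nu\lambda}
 &= (D-2) \left( \nabla_\lambda P_{\nu\rho} - \nabla_\rho P_{\nu\lambda} \right)
  + g_{\nu\rho} \nabla_\lambda P - g_{\nu\lambda} \nabla_\rho P , \\
\Rightarrow\quad \nabla^\mu C_{\mu\nu\lambda\rho}
 &= (D-3) \left( \nabla_\lambda P_{\nu\rho} - \nabla_\rho P_{\nu\lambda} \right) .
\end{align*}
```

The second line is the divergence of (10.79), the third line is the right-hand side of (10.86) rewritten in terms of $P_{\mu\nu}$, and equating the two gives the last line.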
The Weyl tensor plays an important role in many aspects of gravitational physics:
1. For example, the Weyl tensor has traditionally been one of the central objects of interest in the invariant algebraic classification of gravitational fields and in the characterisation of what are known as algebraically special solutions to the Einstein equations (the so-called Petrov classification and related procedures).
Originally, this was (of course) developed for $D = 4$, and this case has a number of special features. It is based on the classification of the properties of the eigenvalues of the Weyl tensor (at a point $x_0$), thought of as a map on the space of anti-symmetric (2, 0)-tensors (bivectors),
\[ \tfrac{1}{2} C^{\mu\nu}{}_{\rho\sigma} X^{\rho\sigma} = \lambda X^{\mu\nu} \tag{10.89} \]
or
\[ C^{A}{}_{B} X^{B} = \lambda X^{A} \tag{10.90} \]
with $C_{\mu\nu\rho\sigma} \to C_{AB}$ thought of as a symmetric $(6 \times 6)$ matrix.25
An equivalent (as it turns out) classification arises from determining the number and multiplicity of linearly independent null vectors $\ell^\mu$ satisfying the condition
\[ \ell_{[\alpha} C_{\mu]\nu\rho[\sigma} \ell_{\beta]} \ell^\nu \ell^\rho = 0 . \tag{10.91} \]
Such $\ell^\mu$ are called the principal null directions of the Weyl tensor. More recently, this classification scheme (based on the latter approach) has been (partially) extended to higher dimensions.26
2. As we will see in section 18.6, the Einstein equations imply that the Weyl tensor
describes the gravitational field in vacuum. Specifically, when (or where) the
energy-momentum tensor is zero, the Riemann curvature tensor is equal to the
Weyl tensor,
\[ T_{\mu\nu}(x) = 0 \quad\Rightarrow\quad R_{\mu\nu\lambda\rho}(x) = C_{\mu\nu\lambda\rho}(x) . \tag{10.92} \]
The Weyl tensor thus encodes the information about things like gravitational
waves and the asymptotic behaviour of a gravitational field and has been studied
extensively from this point of view.
3. In the presence of matter, on the other hand, (10.87), in conjunction with the
Einstein equations, becomes an evolution equation for these vacuum components
of the gravitational field in terms of the sources - see equations (18.51) and (18.52).
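The Weyl tensor also plays an important role in geometry, as it is conformally invariant, i.e. $C^{\mu}{}_{\nu\lambda\rho}$ is invariant under conformal (Weyl) rescalings of the metric,
\[ g_{\mu\nu}(x) \to e^{2f(x)} g_{\mu\nu}(x) \quad\Rightarrow\quad C^{\mu}{}_{\nu\lambda\rho}(x) \to C^{\mu}{}_{\nu\lambda\rho}(x) , \]
equivalently
\[ C_{\mu\nu\lambda\rho}(x) \to e^{2f(x)} C_{\mu\nu\lambda\rho}(x) . \]
In particular, the Weyl tensor is zero if the metric is conformally flat, i.e. related by a conformal transformation to the flat metric (of any signature),
\[ g_{\mu\nu}(x) = e^{2f(x)} \eta_{\mu\nu} \quad\Rightarrow\quad C_{\mu\nu\lambda\rho}(x) = 0 . \]
This can be established by brute force calculation and is not per se particularly enlightening.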
Conversely, for $D \ge 4$ vanishing of the Weyl tensor is also a sufficient condition for a metric to be (locally) conformal to the flat metric. This is a non-trivial result because at face value one seems to obtain a completely overdetermined system of equations for the single function $f$, of the form
However, it turns out that the integrability conditions for this system of equations are
equivalent to the vanishing of the Weyl tensor, and then a variant of the Frobenius
integrability theorem (mentioned in a different context in section 14.5) can be used to
establish the local existence of a solution f .
For $D = 3$, the situation is slightly (but not fundamentally) different. We see from (10.88) that for any $D \ge 4$ conformal flatness implies vanishing of the Cotton tensor. It turns out that for $D = 3$ the Cotton tensor takes over the role of the Weyl tensor (which, as proven above, is itself trivial for $D = 3$), i.e. one has the statement that for $D = 3$ a metric is (locally) conformally flat if and only if the Cotton tensor vanishes.
In section 4.4 we had seen that the Levi-Civita connection (defined by the Christoffel symbols) is characterised by the fact that
1. the connection is compatible with the metric, i.e. the metric is covariantly constant;
2. the torsion is zero, i.e. the second covariant derivatives of a scalar commute.
It is of course possible to relax either of the conditions (1) or (2), or both of them and,
in particular, connections with torsion (relaxation of condition 2) are popular in certain
circles and/or arise naturally in certain generalised (gauge) theories of gravity and in
string theory.
with the canonical Levi-Civita connection, and C a (1, 2)-tensor. We will also
use the corresponding (0, 3)-tensor
C = g C . (10.98)
V = V +
V (10.99)
etc. The reason for this choice is that one should think of the collection of objects
(and ) as the coefficients of a matrix-valued 1-form (cf. section 3.6) = dx ,
the matrices acting by rotation on vectors (and more general tensors), as in (4.20).
,
[ ] = T , T = g T , (10.100)
g = Q .
(10.101)
T = C C = 2C[]
(10.102)
Q = C + C = 2C() .
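Thus the torsion is zero iff $C^{\lambda}{}_{\mu\nu}$ (and hence $\tilde\Gamma^{\lambda}{}_{\mu\nu}$) is symmetric in its lower indices, and the connection is compatible with the metric iff $C_{\lambda\mu\nu}$ is anti-symmetric in its first two indices. In particular, if the torsion is zero and the connection is metric-compatible, one has
\[ C_{\lambda\mu\nu} = C_{\lambda\nu\mu} \quad\text{and}\quad C_{\lambda\mu\nu} = -C_{\mu\lambda\nu} \quad\Rightarrow\quad C_{\lambda\mu\nu} = 0 , \tag{10.103} \]
\[ C_{\lambda\mu\nu} = C_{\lambda\nu\mu} = -C_{\nu\lambda\mu} = -C_{\nu\mu\lambda} = C_{\mu\nu\lambda} = C_{\mu\lambda\nu} = -C_{\lambda\mu\nu} . \tag{10.104} \]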
Conversely, since the absence of torsion and non-metricity characterises the Levi-Civita
connection, it should be possible to express the deviation C from the Levi-Civita
connection entirely in terms of torsion and non-metricity. This is indeed the case. By
repeating the calculation (4.43) in this more general context, one finds
2C() = Q + Q Q T T . (10.105)
Combining this with 2C[] = T , one obtains
C = 12 (T T T ) + 12 (Q + Q Q )
(10.106)
T + Q ,
with
T = 12 (T T T ) = T (10.107)
and
Q .
= 1 (Q + Q Q ) = Q (10.108)
2
Thus we can now split a general connection more informatively into the 3 pieces
= + T + Q
. (10.109)
Remarks:
Q = 0 = + T .
(10.110)
but it cannot be symmetric (if the contorsion were symmetric, the torsion, and
hence the contorsion, would be zero). If its symmetric part vanishes, then T is
completely anti-symmetric,
= 12 g, + 12 (g, g, ) , (10.113)
one might be tempted to think that that part can be cancelled (or absorbed)
by a metric-compatible C = C[] , so that a very simple metric-compatible
connection would be
? 1
= = ? 1
2 g, , 2 g g, . (10.114)
However, the term that one has cancelled (or absorbed) is not a tensor. Therefore, this candidate connection does not transform as (and therefore does not qualify as) a connection and cannot be used to define a covariant derivative.
3. In general, for a connection $\tilde\Gamma$, the notion of autoparallels (section 4.8),
\[ \tilde\nabla_X X = 0 \quad\Leftrightarrow\quad \ddot{x}^\mu + \tilde\Gamma^{\mu}{}_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = 0 , \tag{10.115} \]
(i.e. curves characterised by the fact that their tangent vectors are parallel transported along the curve - this depends on a choice of connection) no longer coincides with the notion of geodesics (which are obtained by extremising proper time or distance, and which always lead to the Levi-Civita connection). However, this difference disappears if $C^{\mu}{}_{\nu\lambda}$ happens to be anti-symmetric in its lower indices (e.g. for a metric-compatible connection with totally anti-symmetric contorsion tensor), as one then has
\[ \ddot{x}^\mu + \tilde\Gamma^{\mu}{}_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda = \ddot{x}^\mu + \Gamma^{\mu}{}_{\nu\lambda} \dot{x}^\nu \dot{x}^\lambda . \tag{10.116} \]
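We have defined the Riemann tensor via the commutator of covariant derivatives (7.2)
\[ [\nabla_\mu , \nabla_\nu] V^\lambda = R^{\lambda}{}_{\sigma\mu\nu} V^\sigma \tag{10.117} \]
\[ R^{\lambda}{}_{\sigma\mu\nu} = \partial_\mu \Gamma^{\lambda}{}_{\nu\sigma} - \partial_\nu \Gamma^{\lambda}{}_{\mu\sigma} + \Gamma^{\lambda}{}_{\mu\rho} \Gamma^{\rho}{}_{\nu\sigma} - \Gamma^{\lambda}{}_{\nu\rho} \Gamma^{\rho}{}_{\mu\sigma} . \tag{10.118} \]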
In order to show explicitly (rather than by appealing to (10.117)) that this transforms as
a tensor, all that one needs is the characteristic non-tensorial transformation behaviour
of the Christoffel symbols . As discussed in section 4.4 and above, an arbitrary
connection that can be used to define a tensorial covariant derivative has the same
non-tensorial transformation behaviour. Therefore
R ()
R =
+
(10.119)
.
defines a tensor for any connection, namely the curvature tensor of the connection
It is related to the commutator of covariant derivatives by
,
[ ]V = R
V + (
) V = R
V + T V , (10.120)
where T is the torsion tensor. As before, one can also define the Ricci tensor and
Ricci scalar by
R ()
R =R
, R()
R = g R
. (10.121)
However, it is crucial to keep in mind that the symmetry properties and Bianchi iden-
tities satisfied by these generalised curvature tensors will in general differ from those of
the Riemann-Christoffel tensor. This should be clear from the way we derived the sym-
metries of the Riemann tensor in section 7.3, where we related the symmetries to the
properties (metricity, no torsion) that characterise the canonical Levi-Civita connection
(Christoffel symbols). For example, in general the Ricci tensor will not be symmetric, the Bianchi identity $R^{\lambda}{}_{[\sigma\mu\nu]} = 0$ will be replaced by an identity relating $\tilde{R}^{\lambda}{}_{[\sigma\mu\nu]}$ to the torsion (and its covariant derivative), etc.27
For some further discussion of connections with non-metricity or torsion and their cur-
vature tensors see section 19.7.
27
For more on this and related topics, see e.g. section 1 of T. Ortin, Gravity and Strings.
11 Curvature III: Curvature and Geodesic Congruences
In section 7.4 we had already encountered the so-called geodesic deviation equation (7.36),
\[ (D_\tau)^2 \delta x^\mu = R^{\mu}{}_{\nu\lambda\rho} \dot{x}^\nu \dot{x}^\lambda \delta x^\rho , \tag{11.1} \]
describing the evolution of a separation (or deviation) vector along a given geodesic. In this section we will rederive this result in a more satisfactory and covariant manner and also use the same covariant framework to discuss the extension of these results to the so-called Raychaudhuri equation, which describes the focussing properties of congruences of geodesics.
\[ u^\nu \nabla_\nu u^\mu = 0 , \tag{11.2} \]
\[ [u, \xi]^\mu = u^\nu \nabla_\nu \xi^\mu - \xi^\nu \nabla_\nu u^\mu = 0 \quad\Leftrightarrow\quad D_\tau \xi^\mu = \xi^\nu \nabla_\nu u^\mu . \tag{11.3} \]
so the matrix $B^{\mu}{}_{\nu}$ describes the evolution and deformation of the deviation vector $\xi^\mu$ along the geodesic. Because $u^\mu$ is affinely geodesic, it satisfies
\[ B_{\mu\nu} u^\nu = u^\nu \nabla_\nu u_\mu = 0 \tag{11.8} \]
and
\[ u^\mu B_{\mu\nu} = \tfrac{1}{2} \nabla_\nu (u^\mu u_\mu) = 0 , \tag{11.9} \]
and is thus transverse to $u^\mu$. This is a crucial property we will come back to in the discussion of the Raychaudhuri equation below. As a consequence one has
\[ u_\mu D_\tau \xi^\mu = 0 \tag{11.10} \]
\[ \frac{d}{d\tau} (u_\mu \xi^\mu) = D_\tau (u_\mu \xi^\mu) = u_\mu D_\tau \xi^\mu = 0 . \tag{11.11} \]
This means that the $u$-component of a geodesic deviation vector in the sense of $u_\mu \xi^\mu$ is simply constant and contains no interesting information about the geodesic itself.
In the timelike case this means that a vector of the form $\xi^\mu = \xi u^\mu$ is a deviation vector only if $\xi$ is constant, and then $\xi^\mu$ is simply a translation along the geodesic and therefore not a deviation vector of interest (and certainly anyhow not a vector of the kind one has in mind when thinking about a deviation vector, which should point away from the geodesic). In the null case, the interpretation is slightly different (and we will return to this in section 11.4), but the fact that $u_\mu \xi^\mu$ is simply constant for a deviation vector remains, and we can without loss of information choose the deviation vector to satisfy the condition $u_\mu \xi^\mu = 0$.
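\[ \begin{aligned}
(D_\tau)^2 \xi^\mu &= (D_\tau B^{\mu}{}_{\nu}) \xi^\nu + B^{\mu}{}_{\nu} D_\tau \xi^\nu \\
 &= (D_\tau B^{\mu}{}_{\nu} + B^{\mu}{}_{\lambda} B^{\lambda}{}_{\nu}) \xi^\nu .
\end{aligned} \tag{11.12} \]
For the term in brackets we find, using the geodesic equation for $u^\mu$,
\[ \begin{aligned}
D_\tau B^{\mu}{}_{\nu} + B^{\mu}{}_{\lambda} B^{\lambda}{}_{\nu}
 &= u^\lambda \nabla_\lambda \nabla_\nu u^\mu + (\nabla_\nu u^\lambda) \nabla_\lambda u^\mu \\
 &= u^\lambda \nabla_\lambda \nabla_\nu u^\mu + \nabla_\nu (u^\lambda \nabla_\lambda u^\mu) - u^\lambda \nabla_\nu \nabla_\lambda u^\mu \\
 &= u^\lambda (\nabla_\lambda \nabla_\nu - \nabla_\nu \nabla_\lambda) u^\mu = R^{\mu}{}_{\rho\lambda\nu} u^\rho u^\lambda ,
\end{aligned} \tag{11.13} \]
and plugging this back into (11.12), we obtain straightaway the covariant version (7.36) of the geodesic deviation equation in the form
\[ (D_\tau)^2 \xi^\mu = R^{\mu}{}_{\rho\lambda\nu} u^\rho u^\lambda \xi^\nu . \tag{11.14} \]
\[ u_\mu (D_\tau)^2 \xi^\mu = R_{\mu\rho\lambda\nu} u^\mu u^\rho u^\lambda \xi^\nu = 0 . \tag{11.15} \]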
Remarks:
1. I hope you agree that this derivation is somewhat more satisfactory than the one
given in section 7.4.
2. The object we have called $B^{\mu}{}_{\nu}$ in (11.6) and its evolution equation (11.13) will play a central role in our derivation of the Raychaudhuri equation below.
4. If the curve is not a geodesic (but still parametrised by proper time, so that $u^\mu u_\mu = -1$), then the above derivation shows that in addition to the force exerted by the space-time curvature the deviation vector feels a force proportional to the change of the acceleration $a^\mu = u^\nu \nabla_\nu u^\mu$ along the curve,
\[ (D_\tau)^2 \xi^\mu = R^{\mu}{}_{\rho\lambda\nu} u^\rho u^\lambda \xi^\nu + D_\xi a^\mu . \tag{11.16} \]
In flat space, only the last term is present and describes the (tidal) forces arising from the possible non-uniformity of the external force acting on the particle (or, better: on the extended object described by a family of worldlines) to produce the acceleration $a^\mu$. Thus, in precise analogy with the Newtonian situation, the gravitational (i.e. here now Riemann curvature tensor) contribution to the geodesic deviation equation should be interpreted as the gravitational tidal force.
Manipulations similar to those leading to (11.14) allow one to derive an equation for the rate of change of the divergence $\nabla_\mu u^\mu$ of a family of geodesics along the geodesics. This simple result, known as the Raychaudhuri equation, has important implications and ramifications in general relativity, in particular in the context of the so-called singularity theorems of Penrose, Hawking and others, none of which will, however, be explored here (see footnote 93 of section 28.3 for some references).
Thus $u^\mu$ now denotes a tangent vector field to an affinely parametrised geodesic congruence, $u^\nu \nabla_\nu u^\mu = 0$ (and $u^\mu u_\mu = -1$ or $u^\mu u_\mu = 0$ everywhere for a timelike or null congruence). As in section 11.1, we introduce the tensor field (11.6)
\[ B_{\mu\nu} = \nabla_\nu u_\mu . \tag{11.17} \]
Recall from section 11.1 that $B_{\mu\nu}$ satisfies (11.8), (11.9),
\[ B_{\mu\nu} u^\nu = u^\mu B_{\mu\nu} = 0 \tag{11.18} \]
and therefore only has components in the directions transverse to $u^\mu$. Its trace
\[ \theta = B^{\mu}{}_{\mu} = g^{\mu\nu} B_{\mu\nu} = \nabla_\mu u^\mu \tag{11.19} \]
is the divergence of $u^\mu$, known as the expansion of the congruence.
The key equation governing the evolution of $B_{\mu\nu}$ along the integral curves of the geodesic vector field is (11.13)
\[ D_\tau B^{\mu}{}_{\nu} + B^{\mu}{}_{\lambda} B^{\lambda}{}_{\nu} = R^{\mu}{}_{\rho\lambda\nu} u^\rho u^\lambda . \tag{11.20} \]
By taking the trace of this equation, we evidently obtain an evolution equation for the expansion $\theta$, namely
\[ \frac{d}{d\tau} \theta = -(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) - R_{\mu\nu} u^\mu u^\nu . \tag{11.21} \]
Note that this equation, written in the form
\[ u^\mu \nabla_\mu (\nabla_\nu u^\nu) + (\nabla_\mu u^\nu)(\nabla_\nu u^\mu) + R_{\mu\nu} u^\mu u^\nu = 0 . \tag{11.22} \]
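The trace step leading from (11.20) to (11.21) can be sketched term by term. The index labels, the definition $B_{\mu\nu} = \nabla_\nu u_\mu$ and the convention $R_{\lambda\rho} = R^{\mu}{}_{\lambda\mu\rho}$ for the Ricci tensor are assumptions of this sketch:

```latex
\begin{align*}
\delta^{\nu}{}_{\mu} \left( D_\tau B^{\mu}{}_{\nu} \right)
 &= D_\tau \theta = \frac{d}{d\tau}\theta , \\
B^{\mu}{}_{\lambda} B^{\lambda}{}_{\mu}
 &= (\nabla_\lambda u^\mu)(\nabla_\mu u^\lambda) , \\
R^{\mu}{}_{\rho\lambda\mu} u^\rho u^\lambda
 &= - R^{\mu}{}_{\rho\mu\lambda} u^\rho u^\lambda
  = - R_{\rho\lambda} u^\rho u^\lambda , \\
\Rightarrow\quad \frac{d}{d\tau}\theta
 &= - (\nabla_\mu u^\nu)(\nabla_\nu u^\mu) - R_{\mu\nu} u^\mu u^\nu .
\end{align*}
```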
To gain some more insight into the geometric significance of this equation, we now consider the case that the geodesic congruence $u^\mu$ is timelike and normalised in the standard way as $u^\mu u_\mu = -1$ (so that $\tau$ is proper time).
\[ h_{\mu\nu} = g_{\mu\nu} + u_\mu u_\nu . \tag{11.23} \]
The properties of this tensor are closely related to those of the (induced metric) tensor $h_{\mu\nu} = g_{\mu\nu} - \epsilon N_\mu N_\nu$ (15.1) studied in section 15.1 in the context of hypersurfaces. The main difference in the present context is that $u^\mu$ is not necessarily hypersurface-orthogonal (section 14.5) and therefore, in particular, not necessarily a normal vector field to a family of spacelike hypersurfaces. Therefore $h_{\mu\nu}$ does not necessarily have an interpretation as the induced metric on some hypersurface. Nevertheless, pointwise it can be interpreted as a metric on the space of vectors transverse to the geodesic and its purely algebraic properties are identical to those of the induced metric.
In particular,
\[ u^\mu h_{\mu\nu} = h_{\mu\nu} u^\nu = 0 . \tag{11.24} \]
It can therefore be interpreted as the spatial projection of the metric in the directions orthogonal to the timelike vector field $u^\mu$. This can be seen more explicitly in terms of the projectors
\[ h^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + u^\mu u_\nu , \qquad h^{\mu}{}_{\nu} h^{\nu}{}_{\lambda} = h^{\mu}{}_{\lambda} . \tag{11.25} \]
\[ h^{\mu}{}_{\nu} u^\nu = 0 , \tag{11.26} \]
\[ h^{\mu}{}_{\nu} \xi^\nu = \xi^\mu . \tag{11.27} \]
satisfies
\[ u^{\mu_1} t_{\mu_1 \ldots \mu_p} = \ldots = u^{\mu_p} t_{\mu_1 \ldots \mu_p} = 0 . \tag{11.29} \]
\[ g_{\mu\nu} \to g_{\lambda\rho} h^{\lambda}{}_{\mu} h^{\rho}{}_{\nu} = g_{\mu\nu} + u_\mu u_\nu = h_{\mu\nu} , \tag{11.30} \]
as anticipated above. Whereas for the space-time metric one obviously has $g^{\mu\nu} g_{\mu\nu} = 4$, the trace of $h_{\mu\nu}$ is (in the 4-dimensional case)
\[ g^{\mu\nu} h_{\mu\nu} = g^{\mu\nu} g_{\mu\nu} + g^{\mu\nu} u_\mu u_\nu = 4 - 1 = 3 = h^{\mu\nu} h_{\mu\nu} . \tag{11.31} \]
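The projector properties quoted above follow from $u^\mu u_\mu = -1$ alone. A short sketch of the algebra, in 4 dimensions and with index labels chosen for this check:

```latex
\begin{align*}
h^{\mu}{}_{\nu} h^{\nu}{}_{\lambda}
 &= (\delta^{\mu}{}_{\nu} + u^\mu u_\nu)(\delta^{\nu}{}_{\lambda} + u^\nu u_\lambda) \\
 &= \delta^{\mu}{}_{\lambda} + 2\, u^\mu u_\lambda + (u_\nu u^\nu)\, u^\mu u_\lambda
  = \delta^{\mu}{}_{\lambda} + u^\mu u_\lambda
  = h^{\mu}{}_{\lambda} , \\
h^{\mu}{}_{\nu} u^\nu &= u^\mu + u^\mu (u_\nu u^\nu) = 0 , \qquad
h^{\mu}{}_{\mu} = 4 - 1 = 3 .
\end{align*}
```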
Thus for an affinely parametrised congruence the properties (11.8) and (11.9) show that $B_{\mu\nu}$ is automatically a spatial or transverse tensor in the sense above,
\[ b_{\mu\nu} \equiv h_{\mu}{}^{\lambda} h_{\nu}{}^{\rho} B_{\lambda\rho} = B_{\mu\nu} . \tag{11.32} \]
Note that the affine parametrisation of the timelike geodesic congruence, expressed by the normalisation condition $u^\mu u_\mu = -1$, is crucial for this entire set-up, since the projection operator requires a unit vector field. This is to be contrasted with the situation for null geodesic congruences $\ell^\mu$, to be discussed below, where the property $\ell^\mu \ell_\mu = 0$ is independent of the parametrisation and one can (and we will) also consider the case of non-affine parametrisations.
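In the spirit of elasticity theory, we now decompose $b_{\mu\nu}$ into its anti-symmetric, symmetric traceless and trace part,
\[ b_{\mu\nu} = \omega_{\mu\nu} + \sigma_{\mu\nu} + \tfrac{1}{3} \theta h_{\mu\nu} , \tag{11.33} \]
with
\[ \begin{aligned}
\omega_{\mu\nu} &= \tfrac{1}{2} (b_{\mu\nu} - b_{\nu\mu}) \\
\sigma_{\mu\nu} &= \tfrac{1}{2} (b_{\mu\nu} + b_{\nu\mu}) - \tfrac{1}{3} \theta h_{\mu\nu} \\
\theta &= h^{\mu\nu} b_{\mu\nu} = g^{\mu\nu} B_{\mu\nu} = \nabla_\mu u^\mu .
\end{aligned} \tag{11.34} \]
The quantities $\omega_{\mu\nu}$, $\sigma_{\mu\nu}$ and $\theta$ are known as the rotation tensor, shear tensor, and expansion of the congruence (family) of geodesics defined by $u^\mu$.
In terms of these quantities we can write the evolution equation (11.7) for deviation vectors as
\[ D_\tau \xi^\mu = \omega^{\mu}{}_{\nu} \xi^\nu + \sigma^{\mu}{}_{\nu} \xi^\nu + \tfrac{1}{3} \theta \xi^\mu , \tag{11.35} \]
and the evolution equation (11.21) for the expansion as
\[ \frac{d}{d\tau} \theta = -\tfrac{1}{3} \theta^2 - \sigma^{\mu\nu} \sigma_{\mu\nu} + \omega^{\mu\nu} \omega_{\mu\nu} - R_{\mu\nu} u^\mu u^\nu . \tag{11.36} \]
This is the Raychaudhuri equation for timelike geodesic congruences.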
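The step from (11.21) to (11.36) uses only the algebraic decomposition (11.33). A sketch, assuming $(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) = b^{\mu\nu} b_{\nu\mu}$ for an affinely parametrised timelike congruence and $h^{\mu\nu} h_{\mu\nu} = 3$ in 4 dimensions:

```latex
\begin{align*}
b^{\mu\nu} b_{\nu\mu}
 &= \left( \omega^{\mu\nu} + \sigma^{\mu\nu} + \tfrac{1}{3} \theta h^{\mu\nu} \right)
    \left( -\omega_{\mu\nu} + \sigma_{\mu\nu} + \tfrac{1}{3} \theta h_{\mu\nu} \right) \\
 &= -\omega^{\mu\nu}\omega_{\mu\nu} + \sigma^{\mu\nu}\sigma_{\mu\nu}
    + \tfrac{1}{9} \theta^2 h^{\mu\nu} h_{\mu\nu}
  = \tfrac{1}{3} \theta^2 + \sigma^{\mu\nu}\sigma_{\mu\nu} - \omega^{\mu\nu}\omega_{\mu\nu} .
\end{align*}
```

All cross terms drop out because $\omega_{\mu\nu}$ is anti-symmetric while $\sigma_{\mu\nu}$ and $h_{\mu\nu}$ are symmetric, and because $\sigma_{\mu\nu}$ is traceless with respect to $h^{\mu\nu}$.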
Remarks:
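\[ \begin{aligned}
\theta &= h^{\mu\nu} b_{\mu\nu} = h^{\mu\nu} B_{\mu\nu} = h^{\mu\nu} \nabla_\nu u_\mu \\
 &= \tfrac{1}{2} h^{\mu\nu} (\nabla_\mu u_\nu + \nabla_\nu u_\mu) = \tfrac{1}{2} h^{\mu\nu} L_u g_{\mu\nu}
\end{aligned} \tag{11.37} \]
where $L_u$ denotes the Lie derivative along the vector field $u$. Substituting $g_{\mu\nu} = h_{\mu\nu} - u_\mu u_\nu$, one finds
\[ \theta = \tfrac{1}{2} h^{\mu\nu} L_u (h_{\mu\nu} - u_\mu u_\nu) = \tfrac{1}{2} h^{\mu\nu} L_u h_{\mu\nu} . \tag{11.38} \]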
a bit of care. When the congruence is hypersurface orthogonal, with the induced metric (15.6)
\[ h_{ab} = E^{\mu}_{a} E^{\nu}_{b} h_{\mu\nu} , \tag{11.41} \]
then (11.40) with $h = \det(h_{ab})$ follows from (11.38), because
Here we have made use of the fact that $u$ and $E_a$ have vanishing Lie bracket, because (introducing $\tau$ and $y^a$ as coordinates, instead of the $x^\mu$)
\[ u^\mu = \frac{\partial x^\mu}{\partial \tau} , \qquad E^{\mu}_{a} = \frac{\partial x^\mu}{\partial y^a} \tag{11.43} \]
and the Lie bracket gives the commutator of the second partial derivatives of $x^\mu$.
When the congruence is not hypersurface orthogonal, one can still construct a transverse cross-sectional volume, but one can only choose it to be orthogonal at a given geodesic. Introducing in a neighbourhood of a point on this geodesic coordinates $y^a$ labelling the geodesics, as well as the parameter $\tau$ along the geodesic, the above calculation will then still go through.28
2. If required and desired, from (11.20) similar (but somewhat less transparent) equations can be derived for the evolution of the shear and rotation tensors along the geodesic congruence, i.e. for $(d/d\tau)\sigma_{\mu\nu}$ and $(d/d\tau)\omega_{\mu\nu}$.
3. Since $\sigma_{\mu\nu}$ and $\omega_{\mu\nu}$ are purely spatial tensors, their squares are non-negative,
\[ \sigma^{\mu\nu} \sigma_{\mu\nu} \ge 0 , \qquad \omega^{\mu\nu} \omega_{\mu\nu} \ge 0 , \tag{11.44} \]
with $\sigma^{\mu\nu} \sigma_{\mu\nu} = 0$ only for $\sigma_{\mu\nu} = 0$ (and likewise for the rotation). They thus enter the Raychaudhuri equation with opposite signs.
4. In the presence of both these terms it is difficult to say something general about the evolution of $\theta$. Since the first term ($-\theta^2/3$) is non-positive, an important special case of the Raychaudhuri equation arises when the rotation is zero, $\omega_{\mu\nu} = 0$. This happens for example when $u_\mu = \partial_\mu S$ is the gradient co-vector of some function $S$. In this case $u^\mu$ is orthogonal to the level-surfaces of $S$. In fact, more generally we have the statement that
\[ u_{[\lambda} \nabla_\mu u_{\nu]} = 0 \quad\Leftrightarrow\quad u_\lambda \omega_{\mu\nu} + u_\mu \omega_{\nu\lambda} + u_\nu \omega_{\lambda\mu} = 0 . \tag{11.46} \]
28
For a more careful proof of this statement see the discussion in section 2.4.8 of E. Poisson, A Relativist's Toolkit.
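Contracting this with $u^\lambda$ and using $u^\lambda u_\lambda = -1$ and $u^\lambda \omega_{\lambda\mu} = 0$, only the first term survives and one finds on the nose that $\omega_{\mu\nu} = 0$,
\[ u_{[\lambda} \nabla_\mu u_{\nu]} = 0 \quad\Rightarrow\quad \omega_{\mu\nu} = 0 , \tag{11.47} \]
and the Frobenius theorem provides one with the converse statement. Alternatively, $\omega_{\mu\nu} = 0$ follows from assuming that $u_\mu$ has the explicit hypersurface-orthogonal form $u_\mu = f \partial_\mu S$. Then one has (14.52)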
f = (u f )u u = 0 . (11.49)
5. Either way, for a hypersurface orthogonal congruence of timelike geodesics one has
\[ \frac{d}{d\tau} \theta = -\tfrac{1}{3} \theta^2 - \sigma^{\mu\nu} \sigma_{\mu\nu} - R_{\mu\nu} u^\mu u^\nu . \tag{11.50} \]
The first two terms on the right hand side are manifestly non-positive (recall that $\sigma_{\mu\nu}$ is a spatial tensor and hence $\sigma^{\mu\nu} \sigma_{\mu\nu} \ge 0$). Thus, if one assumes that the geometry is such that
\[ R_{\mu\nu} u^\mu u^\nu \ge 0 \tag{11.51} \]
(by the Einstein equations to be discussed in section 18, this translates into a positivity condition on the energy-momentum tensor known as the strong energy condition, cf. section 21.1), one finds
\[ \frac{d}{d\tau} \theta = -\tfrac{1}{3} \theta^2 - \sigma^{\mu\nu} \sigma_{\mu\nu} - R_{\mu\nu} u^\mu u^\nu \le 0 . \tag{11.52} \]
This means that the divergence (convergence) of geodesics will decrease (increase) in time. The interpretation of this result is that gravity is an attractive force (for matter satisfying the strong energy condition) whose effect is to focus geodesics.
6. According to (11.52), $d\theta/d\tau$ is not only negative but actually bounded from above by
\[ \frac{d}{d\tau} \theta \le -\tfrac{1}{3} \theta^2 . \tag{11.53} \]
Rewriting this equation as
\[ \frac{d}{d\tau} \frac{1}{\theta} \ge \frac{1}{3} , \tag{11.54} \]
one deduces immediately that
\[ \frac{1}{\theta(\tau)} \ge \frac{1}{\theta(0)} + \frac{\tau}{3} . \tag{11.55} \]
This has the rather dramatic implication that, if $\theta(0) < 0$ (i.e. the geodesics are initially converging), then $\theta(\tau) \to -\infty$ within finite proper time $\tau \le 3/|\theta(0)|$,
7. If one thinks of the geodesics as trajectories of physical particles, this is obviously a rather catastrophic situation in which these particles will be infinitely squashed. In general, however, the divergence of $\theta$ only indicates that the family of geodesics develops what is known as a caustic where different geodesics meet.
8. Simple non-catastrophic examples of such caustics are e.g. the poles of a sphere where great circles meet, or even just the origin in Euclidean space $\mathbb{R}^n$ when considering the family of radial geodesics passing through the origin. E.g. in the latter case the tangent vector field is simply $\partial_r$, and its divergence is
\[ \nabla_\mu (\partial_r)^\mu = \frac{1}{\sqrt{g}} \partial_\mu \left( \sqrt{g}\, (\partial_r)^\mu \right) \sim r^{-1} , \tag{11.57} \]
9. Nevertheless, the above result plays a crucial role in establishing the occurrence of true singularities in general relativity if supplemented e.g. by conditions which ensure that such harmless caustics cannot appear, as this means that the geodesic cannot be extended to where one would find $\theta \to -\infty$. This kind of argument (leading to the conclusion of geodesic incompleteness of a space-time) is one of the typical ingredients of the singularity theorems of general relativity (see footnote 93 of section 28.3 for some references).
10. The adaptation of this formalism in general and the Raychaudhuri equation in particular to congruences of null geodesics requires some more care (and is ultimately expressed in terms of 2-dimensional rather than 3-dimensional spatial tensors), and we will discuss this in section 11.4 below.
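The focusing estimate of remark 6 and the radial example of remark 8 can be spelled out in a few lines. This is a sketch: the first two lines assume that $\theta$ stays non-zero on the interval considered, and the last line uses polar coordinates on $\mathbb{R}^n$, where $\sqrt{g} \propto r^{n-1}$ up to angular factors:

```latex
\begin{align*}
\frac{d}{d\tau}\frac{1}{\theta}
 &= -\frac{1}{\theta^2}\frac{d\theta}{d\tau} \ge \frac{1}{3}
 \quad\Rightarrow\quad
 \frac{1}{\theta(\tau)} \ge \frac{1}{\theta(0)} + \frac{\tau}{3} , \\
\theta(0) < 0 :\quad
 & \frac{1}{\theta(0)} + \frac{\tau}{3} = 0
 \quad\text{at}\quad \tau = \frac{3}{|\theta(0)|} , \\
\mathbb{R}^n :\quad
 & \nabla_\mu (\partial_r)^\mu
 = \frac{1}{r^{n-1}}\, \partial_r \left( r^{n-1} \right)
 = \frac{n-1}{r} .
\end{align*}
```

Since $1/\theta$ starts out negative and is bounded below by a function that reaches zero at $\tau = 3/|\theta(0)|$, it must itself reach zero from below no later than that, which is the statement $\theta \to -\infty$. The radial example shows the same $1/r$ blow-up at the (perfectly regular) origin of Euclidean space.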
In section 11.4 we will derive the null counterpart of the Raychaudhuri equation for
timelike geodesic congruences discussed in section 11.2 above. The set-up we will use
is a suitable combination of that for timelike geodesics and the formalism of projectors
adapted to null directions. As a preparation for this, and a useful by-product, in this
section we will first derive a variant of the geodesic deviation equation for null geodesics,
the transverse null geodesic deviation equation.
Thus we consider a null geodesic (or congruence of null geodesics), with tangent vector field $\ell^\mu$, and we will initially choose these null geodesics to be affinely parametrised so that one has
\[ \ell^\mu \ell_\mu = 0 , \qquad \ell^\nu \nabla_\nu \ell^\mu = 0 . \tag{11.58} \]
The affine parameter along the null geodesics of this congruence will (for lack of imagination) be called $\tau$.
Now recall from the discussion of the geodesic deviation equation in section 11.1 that for any geodesic deviation vector $\xi^\mu$, i.e. a vector satisfying the condition
\[ D_\tau \xi^\mu = \xi^\nu \nabla_\nu u^\mu , \tag{11.59} \]
2. In the null case, however, the condition $\ell_\mu \xi^\mu = 0$ does not accomplish this, i.e. does not remove the component of $\xi^\mu$ pointing in the direction of $\ell^\mu$ because it imposes no condition precisely on that component. Thus we expect the deviation vector to have two uninteresting components in the null case, $\ell_\mu \xi^\mu$ and the component of $\xi^\mu$ in the direction of $\ell^\mu$:
= (11.62)
Therefore it is natural to project out both these components from $\xi^\mu$. In order to construct a suitable projection operator, one can proceed as in section 16.4 and introduce a complementary null vector (field) $n^\mu$ with
\[ n^\mu n_\mu = 0 , \qquad \ell^\mu n_\mu = -1 . \tag{11.64} \]
Then
\[ \xi^\mu = \xi\, \ell^\mu + \ldots \quad\Rightarrow\quad \xi = -n_\mu \xi^\mu \tag{11.65} \]
and we can eliminate both boring components by imposing the transversality conditions
\[ \ell_\mu \xi^\mu = n_\mu \xi^\mu = 0 \tag{11.66} \]
= . (11.67)
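As in (11.6) we introduce
\[ B_{\mu\nu} = \nabla_\nu \ell_\mu , \tag{11.68} \]
\[ D_\tau \xi^\mu = B^{\mu}{}_{\nu} \xi^\nu . \tag{11.69} \]
\[ B_{\mu\nu} \ell^\nu = \ell^\mu B_{\mu\nu} = 0 , \tag{11.70} \]
but $B_{\mu\nu}$ is not automatically orthogonal to $n^\mu$ (and we will come back to and rectify this below). Exactly the same calculation as (11.13) in section 11.1 now shows that
\[ D_\tau B^{\mu}{}_{\nu} + B^{\mu}{}_{\lambda} B^{\lambda}{}_{\nu} = R^{\mu}{}_{\rho\lambda\nu} \ell^\rho \ell^\lambda \tag{11.71} \]
\[ (D_\tau)^2 \xi^\mu = R^{\mu}{}_{\rho\lambda\nu} \ell^\rho \ell^\lambda \xi^\nu . \tag{11.72} \]
\[ \ell_\mu (D_\tau)^2 \xi^\mu = R_{\mu\rho\lambda\nu} \ell^\mu \ell^\rho \ell^\lambda \xi^\nu = 0 , \tag{11.73} \]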
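Associated with a choice of $n^\mu$ we have a decomposition of the metric into a longitudinal and a transverse spatial part,
\[ g_{\mu\nu} = s_{\mu\nu} - (\ell_\mu n_\nu + n_\mu \ell_\nu) , \tag{11.74} \]
and
\[ g^{\mu\nu} s_{\mu\nu} = s^{\mu\nu} s_{\mu\nu} = s^{\mu}{}_{\mu} = 2 . \tag{11.76} \]
\[ \begin{aligned}
s^{\mu}{}_{\nu} = \delta^{\mu}{}_{\nu} + (\ell^\mu n_\nu + n^\mu \ell_\nu) : \qquad & s^{\mu}{}_{\nu} s^{\nu}{}_{\lambda} = s^{\mu}{}_{\lambda} \\
 & s^{\mu}{}_{\nu} \ell^\nu = s^{\mu}{}_{\nu} n^\nu = 0 ,
\end{aligned} \tag{11.77} \]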
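These projector properties follow from the null and normalisation conditions $\ell^\mu \ell_\mu = n^\mu n_\mu = 0$, $\ell^\mu n_\mu = -1$ alone. A sketch of the algebra, written for general dimension $D$ (the value 2 quoted in (11.76) corresponds to $D = 4$):

```latex
\begin{align*}
s^{\mu}{}_{\nu} \ell^\nu
 &= \ell^\mu + \ell^\mu (n_\nu \ell^\nu) + n^\mu (\ell_\nu \ell^\nu)
  = \ell^\mu - \ell^\mu = 0 , \\
s^{\mu}{}_{\nu} n^\nu
 &= n^\mu + \ell^\mu (n_\nu n^\nu) + n^\mu (\ell_\nu n^\nu)
  = n^\mu - n^\mu = 0 , \\
s^{\mu}{}_{\nu} s^{\nu}{}_{\lambda}
 &= s^{\mu}{}_{\lambda} + (s^{\mu}{}_{\nu} \ell^\nu)\, n_\lambda + (s^{\mu}{}_{\nu} n^\nu)\, \ell_\lambda
  = s^{\mu}{}_{\lambda} , \\
s^{\mu}{}_{\mu}
 &= D + 2\, \ell^\mu n_\mu = D - 2 .
\end{align*}
```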
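With the aid of these projectors, we can now write the fully projected version of (11.69) as
\[ s^{\mu}{}_{\nu} D_\tau (s^{\nu}{}_{\lambda} \xi^\lambda) = b^{\mu}{}_{\nu} \xi^\nu \tag{11.78} \]
\[ b^{\mu}{}_{\nu} = s^{\mu}{}_{\lambda} s^{\rho}{}_{\nu} B^{\lambda}{}_{\rho} . \tag{11.79} \]
Likewise the purely transverse (to $\ell$ and $n$) variant of the null geodesic deviation equation (11.72) can be written as
\[ s^{\mu}{}_{\nu} (D_\tau)^2 (s^{\nu}{}_{\lambda} \xi^\lambda) = s^{\mu}{}_{\nu} s^{\sigma}{}_{\lambda} R^{\nu}{}_{\rho\kappa\sigma} \ell^\rho \ell^\kappa \xi^\lambda . \tag{11.80} \]
While this is essentially the final result, it is not particularly transparent yet. We will put this equation into a somewhat more attractive form below, in which manifestly only the transverse components of the deviation vector and $R_{\mu\nu\lambda\rho}$ appear.
First of all, note that the auxiliary normal vector $n^\mu$ is not unique. For a fixed choice of $\ell^\mu$, at a point on the geodesic, that is for a given value of $\tau$, it is uniquely determined up to null rotations around $\ell^\mu$,
\[ \ell^\mu \to \ell^\mu , \qquad n^\mu \to n^\mu + \alpha^a E^{\mu}_{a} + \tfrac{1}{2} \alpha^2 \ell^\mu , \qquad E^{\mu}_{a} \to E^{\mu}_{a} + \alpha_a \ell^\mu , \tag{11.81} \]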
Then the properties of parallel transport obtained in section 4.8 imply that the conditions (11.64) on $n^\mu$ hold everywhere along the null geodesic (or congruence of null geodesics) if they are satisfied initially. This reduces the ambiguity in (11.81) to $\tau$-independent null rotations.
In fact, one can do even better than that and choose (see also the discussion at the end of section 16.4, in particular around (16.57)) an entire pseudo-orthonormal frame
\[ \{E_A\} = \{E_+ = \ell, E_- = n, E_a\} : \qquad g_{\mu\nu} E^{\mu}_{A} E^{\nu}_{B} = \eta_{AB} \tag{11.83} \]
where
\[ \eta_{++} = \eta_{--} = 0 , \quad \eta_{+-} = -1 , \quad \eta_{a+} = \eta_{a-} = 0 , \quad \eta_{ab} = \delta_{ab} . \tag{11.84} \]
If one selects such a frame at one point along the geodesic and then parallel transports the frame along the geodesic, the orthogonality relations (11.83) will hold everywhere along the geodesic. Thus we can always choose a basis $E_A$ such that
\[ D_\tau E^{\mu}_{A} = 0 , \qquad g_{\mu\nu} E^{\mu}_{A} E^{\nu}_{B} = \eta_{AB} . \tag{11.85} \]
With this choice, a transverse geodesic deviation vector is simply one which has components only in the $E_a$-directions,
\[ \ell_\mu \xi^\mu = n_\mu \xi^\mu = 0 \quad\Leftrightarrow\quad \xi^\mu = \xi^a E^{\mu}_{a} , \tag{11.86} \]
or simply
\[ \xi = \xi^a E_a . \tag{11.87} \]
\[ \frac{d^2}{d\tau^2} \xi^a = R_{a++b}\, \xi^b = -R_{a+b+}\, \xi^b , \tag{11.89} \]
where the $R_{a+b+}$ are the frame components of the Riemann tensor,
\[ R_{a+b+} = E^{\mu}_{a} E^{\nu}_{+} E^{\lambda}_{b} E^{\rho}_{+} R_{\mu\nu\lambda\rho} = E^{\mu}_{a} E^{\lambda}_{b} R_{\mu\nu\lambda\rho} \ell^\nu \ell^\rho . \tag{11.90} \]
Thus the transverse null geodesic deviation equation has the form of a $(D-2)$-dimensional (transverse) harmonic oscillator equation,
\[ \frac{d^2}{d\tau^2} \xi^a = -(\omega^2)_{ab}\, \xi^b , \tag{11.91} \]
with the time-dependent symmetric frequency matrix
The notation used here is perhaps suggestive but it is not meant to imply that $\omega^2$ is necessarily positive - the frequencies can be real or imaginary. Using the decomposition (10.75) of the Riemann tensor into its traceless and trace parts, we can (with $D = 4$, $\eta_{a+} = \eta_{++} = 0$, $\eta_{ab} = \delta_{ab}$) decompose $R_{a+b+}$ as
In particular, if the Ricci tensor is zero (as we will see this means that the metric solves the vacuum Einstein equations), the frequency matrix $\omega^2$ is symmetric traceless and thus necessarily has positive and negative eigenvalues (corresponding to real and imaginary frequencies).
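The decomposition referred to here can be sketched by evaluating (10.75) in the frame (11.83) for $D = 4$. The index assignment $R_{\mu\nu\lambda\rho} \to R_{a+b+}$ and the abbreviation $R_{++} = R_{\mu\nu} \ell^\mu \ell^\nu$ are assumptions of this sketch:

```latex
\begin{align*}
R_{a+b+}
 &= C_{a+b+}
  + \tfrac{1}{2} \left( \eta_{ab} R_{++} + R_{ab}\, \eta_{++}
      - \eta_{+b} R_{a+} - R_{+b}\, \eta_{a+} \right)
  - \tfrac{1}{6} R \left( \eta_{ab} \eta_{++} - \eta_{+b} \eta_{a+} \right) \\
 &= C_{a+b+} + \tfrac{1}{2}\, \delta_{ab}\, R_{++} , \qquad
 R_{++} = R_{\mu\nu} \ell^\mu \ell^\nu , \qquad
 \delta^{ab} C_{a+b+} = 0 .
\end{align*}
```

The last relation follows from the tracelessness of the Weyl tensor, $\eta^{AB} C_{A+B+} = 0$, since the components $C_{++-+}$ and $C_{-+++}$ vanish by anti-symmetry. For vanishing Ricci tensor only the traceless Weyl part survives, which is the statement made above.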
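We now consider a null geodesic congruence, with tangent vector field again denoted by $\ell^\mu$, and we will initially choose these null geodesics to be affinely parametrised so that one has
\[ \ell^\mu \ell_\mu = 0 , \qquad \ell^\nu \nabla_\nu \ell^\mu = 0 . \tag{11.94} \]
We use the same framework as in the previous section, with an auxiliary null vector field $n^\mu$ with $\ell^\mu n_\mu = -1$, the associated projectors etc.
\[ b_{\mu\nu} = s_{\mu}{}^{\lambda} s_{\nu}{}^{\rho} B_{\lambda\rho} . \tag{11.95} \]
Performing this projection explicitly, one sees that this spatial projection $b_{\mu\nu}$ is equal to
\[ b_{\mu\nu} = s_{\mu}{}^{\lambda} s_{\nu}{}^{\rho} B_{\lambda\rho} = B_{\mu\nu} + \ell_\mu n^\lambda B_{\lambda\nu} + \ell_\nu n^\rho B_{\mu\rho} + \ell_\mu \ell_\nu n^\lambda n^\rho B_{\lambda\rho} . \tag{11.96} \]
This has two useful immediate consequences that we will make use of in the following, namely
that the spatial trace of $b_{\mu\nu}$ with respect to $s^{\mu\nu}$ is equal to the space-time trace of $B_{\mu\nu}$ (with respect to $g^{\mu\nu}$),
\[ g^{\mu\nu} B_{\mu\nu} = g^{\mu\nu} b_{\mu\nu} = s^{\mu\nu} b_{\mu\nu} , \tag{11.97} \]
\[ B^{\mu\nu} B_{\nu\mu} = b^{\mu\nu} b_{\nu\mu} . \tag{11.98} \]
We can now, as in the timelike case, decompose $b_{\mu\nu}$ orthogonally into its irreducible (trace, symmetric traceless, anti-symmetric) parts,
\[ \begin{aligned}
b_{\mu\nu} &= \tfrac{1}{2} \theta s_{\mu\nu} + \tfrac{1}{2} (b_{\mu\nu} + b_{\nu\mu} - \theta s_{\mu\nu}) + \tfrac{1}{2} (b_{\mu\nu} - b_{\nu\mu}) \\
 &= \tfrac{1}{2} \theta s_{\mu\nu} + \sigma_{\mu\nu} + \omega_{\mu\nu} .
\end{aligned} \tag{11.99} \]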
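Here $\theta$ is the expansion
\[ \theta = s^{\mu\nu} b_{\mu\nu} = s^{\mu\nu} \nabla_\mu \ell_\nu = g^{\mu\nu} \nabla_\mu \ell_\nu = \nabla_\mu \ell^\mu , \tag{11.100} \]
Remarks:
\[ \theta = \tfrac{1}{2} s^{\mu\nu} L_\ell s_{\mu\nu} . \tag{11.101} \]
2. The equivalence between the spatial and space-time traces of $\nabla_\mu \ell_\nu$ in the above equation is due to the fact that we have chosen $\ell^\mu$ to be affinely parametrised. We will always define $\theta$ to be the spatial trace (divergence) of $\nabla_\mu \ell_\nu$, even when $\ell^\mu$ is not affinely parametrised, but in that case $\theta$ and $\nabla_\mu \ell^\mu$ are no longer equal (see (11.126)). We will return to this issue below.
3. As regards the other terms, $\sigma_{\mu\nu}$ and $\omega_{\mu\nu}$ are again known as the shear tensor and rotation tensor respectively.
\[ B^{\mu\nu} B_{\nu\mu} = b^{\mu\nu} b_{\nu\mu} = \sigma^{\mu\nu} \sigma_{\mu\nu} + \tfrac{1}{2} \theta^2 + \omega^{\mu\nu} \omega_{\nu\mu} . \tag{11.103} \]
6. Because the tensors appearing on the right-hand side of this equation are spatial tensors, their squares are non-negative,
\[ \sigma^{\mu\nu} \sigma_{\mu\nu} \ge 0 , \qquad \omega^{\mu\nu} \omega_{\mu\nu} \ge 0 . \tag{11.104} \]
\[ \qquad = -R_{\mu\nu} \ell^\mu \ell^\nu - (\nabla_\mu \ell^\nu) \nabla_\nu \ell^\mu + \nabla_\mu (\ell^\nu \nabla_\nu \ell^\mu) . \]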
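The 2nd term is just $-B^{\mu\nu} B_{\nu\mu}$ and the 3rd term is zero because $\ell^\mu$ is geodesic. Thus one finds the Raychaudhuri equation for null congruences
\[ \frac{d}{d\tau} \theta = -R_{\mu\nu} \ell^\mu \ell^\nu - \tfrac{1}{2} \theta^2 - \sigma^{\mu\nu} \sigma_{\mu\nu} + \omega^{\mu\nu} \omega_{\mu\nu} . \tag{11.107} \]
Using (11.102) in the form
\[ \frac{d}{d\tau} \sqrt{s} = \theta \sqrt{s} , \tag{11.108} \]
we can also write this as an equation for the change in the expansion rate of the cross-sectional area $\sqrt{s}$ of the congruence. This leads to an additional $+\theta^2$ in the evolution equation, and thus flips the sign of the 2nd term of (11.107), resulting in
\[ \frac{d^2}{d\tau^2} \sqrt{s} = \left( -R_{\mu\nu} \ell^\mu \ell^\nu + \tfrac{1}{2} \theta^2 - \sigma^{\mu\nu} \sigma_{\mu\nu} + \omega^{\mu\nu} \omega_{\mu\nu} \right) \sqrt{s} . \tag{11.109} \]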
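The origin of the additional $+\theta^2$ can be seen in one line. A sketch, assuming that the cross-sectional area element is $\sqrt{s}$, that the affine parameter is called $\tau$, and that (11.108) and (11.107) hold in the form $d\sqrt{s}/d\tau = \theta \sqrt{s}$ and $d\theta/d\tau = -R_{\mu\nu}\ell^\mu\ell^\nu - \frac{1}{2}\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu}$:

```latex
\begin{align*}
\frac{d^2}{d\tau^2} \sqrt{s}
 &= \frac{d}{d\tau} \left( \theta \sqrt{s} \right)
  = \left( \frac{d\theta}{d\tau} + \theta^2 \right) \sqrt{s} \\
 &= \left( -R_{\mu\nu} \ell^\mu \ell^\nu + \tfrac{1}{2} \theta^2
     - \sigma^{\mu\nu} \sigma_{\mu\nu} + \omega^{\mu\nu} \omega_{\mu\nu} \right) \sqrt{s} .
\end{align*}
```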
Remarks:
3. Analogously to the timelike case, (11.112) has the consequence that if one has an initially converging null congruence, $\theta(\tau_0) < 0$, then because of
\[ \frac{d}{d\tau} \theta \le -\tfrac{1}{2} \theta^2 \quad\Rightarrow\quad \frac{1}{\theta(\tau)} \ge \frac{1}{\theta(\tau_0)} + \frac{\tau - \tau_0}{2} \tag{11.113} \]
$1/\theta(\tau) \to 0$ or $\theta(\tau) \to -\infty$ at the latest at
(if the geodesics can be extended that far). As in the timelike case, this usually indicates the formation of a (harmless) caustic where these null geodesics cross.
5. An argument similar to that in the timelike case shows that the rotation vanishes if (and locally only if, by Frobenius) $\ell^\mu$ is hypersurface orthogonal
\[ \omega_{\mu\nu} = 0 \quad\Leftrightarrow\quad \ell^\mu \ \text{hypersurface-orthogonal} . \tag{11.119} \]
6. The expansion properties of families of null geodesics play a crucial role both in
the singularity theorems of general relativity (where for example so-called trapped
surfaces are characterised by negative expansions for both ingoing and outgoing
families of lightrays), and in the study of black holes and the laws governing the
evolution of their event horizons (where the interest is in the null geodesic congru-
ences generating the horizon). In particular, in the latter case the Raychaudhuri
equation is one crucial ingredient in the proof of the statement (Hawking's theorem)
that under reasonable conditions the cross-sectional area of the event horizon
of a black hole cannot decrease.
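The focusing statement of remark 3 can be illustrated numerically. The following is only a minimal sketch and not part of the original derivation: it integrates $d\theta/d\lambda = -\tfrac12\theta^2 - S$ for vanishing rotation, with a constant non-negative number `source` standing in for $R_{\alpha\beta}\ell^\alpha\ell^\beta + \sigma^{\alpha\beta}\sigma_{\alpha\beta}$. The function names, the fixed-step Runge-Kutta integrator and the numerical cutoff are assumptions made purely for illustration.

```python
def rhs(theta, source):
    """Right-hand side of the null Raychaudhuri equation for vanishing rotation."""
    return -0.5 * theta**2 - source


def focusing_parameter(theta0, source=0.0, step=1e-4, cutoff=-1e6):
    """Integrate from lambda = 0 (RK4, fixed step) until theta drops below `cutoff`.

    Returns the value of the affine parameter at which this happens, which
    serves as a numerical proxy for the point where theta -> -infinity.
    Assumes theta0 < 0, so that the loop terminates.
    """
    lam, theta = 0.0, theta0
    while theta > cutoff:
        k1 = rhs(theta, source)
        k2 = rhs(theta + 0.5 * step * k1, source)
        k3 = rhs(theta + 0.5 * step * k2, source)
        k4 = rhs(theta + step * k3, source)
        theta += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        lam += step
    return lam


theta0 = -0.5                       # initially converging congruence
bound = 2.0 / abs(theta0)           # latest possible focusing point
lam_free = focusing_parameter(theta0)                 # source = 0 saturates the bound
lam_matter = focusing_parameter(theta0, source=0.3)   # extra focusing from the source
```

With `source = 0` the inequality in (11.113) becomes an equality, so `lam_free` should agree with `bound` up to discretisation error, while a positive `source` can only make the congruence focus earlier.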
Let us now look at the case when the null geodesic congruence is not affinely parametrised,
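i.e. when, instead of (11.94), the starting point is a null vector field $\ell^\alpha$ satisfying
$$ \ell_\alpha\ell^\alpha = 0 , \qquad \ell^\beta\nabla_\beta\ell^\alpha = \kappa\,\ell^\alpha , \tag{11.123} $$
with $\kappa$ the inaffinity. Then a couple of things change in the derivation, but the end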
result (11.129) turns out to differ from (11.107) by only one term (and I will give an
alternative and much quicker derivation of the result below).
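As before, we can choose an auxiliary null vector field $n^\alpha$, construct the projectors $s^\alpha_{\ \beta}$
etc. Defining again $B_{\alpha\beta} = \nabla_\beta\ell_\alpha$, one still has $\ell^\alpha B_{\alpha\beta} = 0$ (because this is implied by
$\ell_\alpha\ell^\alpha = 0$), but instead of $B_{\alpha\beta}\ell^\beta = 0$ one now has
$$ B_{\alpha\beta}\ell^\beta = \kappa\,\ell_\alpha . \tag{11.124} $$
While the projection (11.96) remains unchanged, i.e. the relation between $b_{\alpha\beta}$ and $B_{\alpha\beta}$
has the same form as in (11.96), the equations (11.97) and (11.98) for the trace and
square of $B_{\alpha\beta}$ differ. Instead of (11.97) one has
$$ s^{\alpha\beta}b_{\alpha\beta} = (g^{\alpha\beta} + \ell^\alpha n^\beta + n^\alpha\ell^\beta)B_{\alpha\beta} = \nabla_\alpha\ell^\alpha + \kappa\,n^\alpha\ell_\alpha = \nabla_\alpha\ell^\alpha - \kappa . \tag{11.125} $$
$$ B^{\alpha\beta}B_{\beta\alpha} = b^{\alpha\beta}b_{\beta\alpha} + \kappa^2 . \tag{11.127} $$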
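Putting everything together and calculating $(d/d\lambda)\theta$ as in (11.106), one then finds
$$ \begin{aligned}
\frac{d}{d\lambda}(\theta + \kappa) &= -R_{\alpha\beta}\ell^\alpha\ell^\beta - (\nabla_\alpha\ell^\beta)(\nabla_\beta\ell^\alpha) + \nabla_\alpha(\ell^\beta\nabla_\beta\ell^\alpha) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - B^{\alpha\beta}B_{\beta\alpha} + \nabla_\alpha(\kappa\,\ell^\alpha) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - b^{\alpha\beta}b_{\beta\alpha} - \kappa^2 + \frac{d}{d\lambda}\kappa + \kappa(\theta + \kappa) \\
&= -R_{\alpha\beta}\ell^\alpha\ell^\beta - b^{\alpha\beta}b_{\beta\alpha} + \frac{d}{d\lambda}\kappa + \kappa\theta .
\end{aligned} \tag{11.128} $$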
Thus the net effect of dealing with a non-affinely parametrised null congruence is that
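one just picks up one additional term on the right-hand side of the Raychaudhuri equation,
$$ \frac{d}{d\lambda}\theta = \kappa\,\theta - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \tag{11.129} $$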
A quick(er) way to derive (11.129) is from the result (11.107) for affinely parametrised
null geodesics, by determining how the quantities appearing in (11.107) change under a
reparametrisation
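$$ \bar\ell^\alpha = f\,\ell^\alpha \tag{11.130} $$
On the other hand for the expansion parameter $\theta$ etc. one deduces from
$$ \bar B_{\alpha\beta} = \nabla_\beta\bar\ell_\alpha = f B_{\alpha\beta} + \ell_\alpha\partial_\beta f \tag{11.133} $$
$$ \bar b_{\alpha\beta} = f\,b_{\alpha\beta} \tag{11.134} $$
which implies
$$ \bar\theta = f\theta , \qquad \bar b_{\alpha\beta} = f\,b_{\alpha\beta} , \qquad \bar\sigma_{\alpha\beta} = f\sigma_{\alpha\beta} , \qquad \bar\omega_{\alpha\beta} = f\omega_{\alpha\beta} . \tag{11.135} $$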
Plugging these results into (11.107) one obtains on the nose (11.129) (with $\ell^\alpha \to \bar\ell^\alpha$, $\kappa \to \bar\kappa$
etc.).
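To make this last substitution explicit, here is a short worked version. It is a sketch under the assumption that the non-affinely parametrised field is $\bar\ell^\alpha = f\ell^\alpha$, with $\ell^\alpha$ affinely parametrised by $\lambda$; the bar notation and the symbol $\bar\lambda$ for the new parameter are labels chosen here.

```latex
\begin{align*}
\bar\ell^\beta\nabla_\beta\bar\ell^\alpha
  &= f\,\ell^\beta\nabla_\beta(f\ell^\alpha)
   = (\ell^\beta\partial_\beta f)\,\bar\ell^\alpha
  \quad\Rightarrow\quad \bar\kappa = \ell^\beta\partial_\beta f \\
\frac{d}{d\bar\lambda}\bar\theta
  &= f\,\ell^\beta\partial_\beta(f\theta)
   = f^2\,\frac{d}{d\lambda}\theta + (\ell^\beta\partial_\beta f)\,f\theta \\
  &= f^2\left(-R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta^2
        - \sigma^{\alpha\beta}\sigma_{\alpha\beta}
        + \omega^{\alpha\beta}\omega_{\alpha\beta}\right) + \bar\kappa\,\bar\theta \\
  &= \bar\kappa\,\bar\theta - R_{\alpha\beta}\bar\ell^\alpha\bar\ell^\beta
     - \tfrac12\bar\theta^2 - \bar\sigma^{\alpha\beta}\bar\sigma_{\alpha\beta}
     + \bar\omega^{\alpha\beta}\bar\omega_{\alpha\beta}
\end{align*}
```

The only input beyond (11.107) is that $\theta$, $\sigma_{\alpha\beta}$ and $\omega_{\alpha\beta}$ all scale with one power of $f$, so every quadratic term picks up $f^2$ and the single leftover term is $\bar\kappa\bar\theta$.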
11.6 Expansions and Inaffinities of Radial Null Congruences
In this section, we look at some general properties of radial null congruences in a spher-
ically symmetric space-time. All of the results of the previous sections 11.4 and 11.5
are of course also valid in this case, but the spherically symmetric case also has some
special and simplifying features.
Thus we consider a spherically symmetric metric. Such a metric could always be written
in the form
$$ ds^2 = -A(t,r)\,dt^2 + B(t,r)\,dr^2 + r^2 d\Omega^2 \tag{11.136} $$
by a suitable choice of coordinates. However, we will not need to commit ourselves to
this particular choice of coordinates. By making an arbitrary coordinate transformation
preserving the manifest spherical symmetry, this metric can be written in the form
$$ ds^2 = g_{ab}(z)\,dz^a dz^b + r(z)^2 d\Omega^2 $$
for some 2-dimensional Lorentzian metric $g_{ab}(z)$, and with $r = r(z^a)$ now a function of
the new coordinates.
We now consider two linearly independent radial and spherically symmetric null vector
fields $\ell^\alpha$ and $n^\alpha$, which we choose to be cross normalised such that
$$ \ell_\alpha\ell^\alpha = n_\alpha n^\alpha = 0 , \qquad \ell_\alpha n^\alpha = -1 . \tag{11.139} $$
Remarks:
1. Here "radial" means that it has components only in the $z^a$-directions transverse
to the sphere, and "spherically symmetric" that the coefficients only depend on
the $z^a$ and not on the coordinates of the sphere (this can of course also, if desired,
be phrased in a more coordinate-independent way, e.g. as the statement that the
Lie derivatives of $\ell^\alpha$ and $n^\alpha$ along the Killing vectors generating the rotational
symmetry vanish, but for present purposes not much is gained by this).
2. In concrete applications we will choose $n^\alpha$ to be ingoing (in the sense that future
directed null rays tangent to $n^\alpha$ will move towards smaller values of $r$) and $\ell^\alpha$ to
be (asymptotically) outgoing.
3. The minus sign in the cross normalisation is such that both vector fields are either
future or past oriented (and we will of course choose the former).
4. Note that the individual normalisation of $\ell^\alpha$ and $n^\alpha$ is not fixed by the above
conditions, i.e. one can still perform the boost
This can e.g. be used to select a preferred normalisation for one of them. If $\ell^\alpha$
has been fixed, then, in spherical symmetry and with the assumption that $n^\alpha$ is
also purely radial (longitudinal), $n^\alpha$ is uniquely determined by the 2 conditions
$n_\alpha n^\alpha = 0$ and $\ell_\alpha n^\alpha = -1$. This should be contrasted with the situation without
spherical symmetry where, as discussed in section 11.3, there is still the additional
freedom to perform null rotations on $n^\alpha$.
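Spherical symmetry (and the choice of spherically symmetric null vector fields) also has
other implications. For instance, it follows from spherical symmetry that $\ell^\beta\nabla_\beta\ell^\alpha$ will
be some linear combination of $\ell^\alpha$ and $n^\alpha$ (i.e. no component tangent to the transverse
sphere),
$$ \ell^\beta\nabla_\beta\ell^\alpha = A\,\ell^\alpha + B\,n^\alpha \tag{11.141} $$
(and likewise for $n^\alpha$). Taking the scalar product with $\ell_\alpha$ and using
$$ \ell_\alpha(\ell^\beta\nabla_\beta\ell^\alpha) = \tfrac12\ell^\beta\nabla_\beta(\ell_\alpha\ell^\alpha) = 0 , \tag{11.142} $$
one finds $B = 0$ (and likewise for $n^\alpha$), so that both vector fields are geodesic,
$$ \ell^\beta\nabla_\beta\ell^\alpha = \kappa_\ell\,\ell^\alpha , \qquad n^\beta\nabla_\beta n^\alpha = \kappa_n\,n^\alpha . \tag{11.143} $$
The boost freedom can then e.g. be used to choose either $\ell^\alpha$ or $n^\alpha$ to be affinely
parametrised (but usually not both of them simultaneously).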
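$$ \ell^\beta\nabla_\beta n^\alpha = -\kappa_\ell\,n^\alpha , \qquad n^\beta\nabla_\beta\ell^\alpha = -\kappa_n\,\ell^\alpha . \tag{11.144} $$
$$ \begin{aligned}
\kappa_\ell &= -n_\alpha\ell^\beta\nabla_\beta\ell^\alpha = \ell^\alpha\ell^\beta\nabla_\beta n_\alpha \\
\kappa_n &= -\ell_\alpha n^\beta\nabla_\beta n^\alpha = n^\alpha n^\beta\nabla_\beta\ell_\alpha ,
\end{aligned} \tag{11.145} $$
or
$$ \begin{aligned}
\kappa_\ell &= \tfrac12\ell^\alpha\ell^\beta(\nabla_\alpha n_\beta + \nabla_\beta n_\alpha) = \tfrac12\ell^\alpha\ell^\beta L_n g_{\alpha\beta} \\
\kappa_n &= \tfrac12 n^\alpha n^\beta(\nabla_\alpha\ell_\beta + \nabla_\beta\ell_\alpha) = \tfrac12 n^\alpha n^\beta L_\ell g_{\alpha\beta}
\end{aligned} \tag{11.146} $$
Here $L_\ell$ and $L_n$ are the Lie derivatives. Thus the inaffinities encode the information
about the longitudinal projections of the derivatives $\nabla_\alpha\ell_\beta$ and $\nabla_\alpha n_\beta$, or of the Lie
derivatives $L_\ell g_{\alpha\beta}$ and $L_n g_{\alpha\beta}$.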
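Other useful information is contained in the transverse (i.e. parallel to the sphere)
projections of these objects. To define them, note that, as in section 11.3, associated
with a choice of $\ell^\alpha$ and $n^\alpha$ we have the decomposition of the metric
$$ g_{\alpha\beta} = s_{\alpha\beta} - (\ell_\alpha n_\beta + n_\alpha\ell_\beta) \tag{11.147} $$
with $s_{\alpha\beta}$ the transverse spatial metric (on the sphere),
$$ s_{\alpha\beta}\,dx^\alpha dx^\beta = r(z)^2 d\Omega^2 , \tag{11.148} $$
but that in the current context this decomposition and the corresponding projectors $s^\alpha_{\ \beta}$
are now unique as the combination $\ell_\alpha n_\beta$ is boost-invariant.
The expansions of $\ell^\alpha$ and $n^\alpha$ are defined as the transverse spatial projections of the
divergence of $\ell^\alpha$ respectively $n^\alpha$, i.e.
$$ \theta_\ell = s^{\alpha\beta}\nabla_\alpha\ell_\beta , \qquad \theta_n = s^{\alpha\beta}\nabla_\alpha n_\beta . \tag{11.149} $$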
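As in (11.101) and (11.102) of section 11.4, these can be written as
$$ \begin{aligned}
\theta_\ell &= \tfrac12 s^{\alpha\beta}L_\ell s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} \\
\theta_n &= \tfrac12 s^{\alpha\beta}L_n s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_n\sqrt{s}
\end{aligned} \tag{11.150} $$
With $\sqrt{s} = r(z)^2\sin\theta$, one finds more explicitly
$$ \theta_\ell = \frac{2}{r}\,\ell^a\partial_a r , \qquad \theta_n = \frac{2}{r}\,n^a\partial_a r . \tag{11.151} $$
If one works with $r$ as one of the coordinates, then this can also succinctly be written
as
$$ \theta_\ell = \frac{2}{r}\,\ell^r , \qquad \theta_n = \frac{2}{r}\,n^r . \tag{11.152} $$
As in (11.126) of section 11.5, we also have the relations
$$ \nabla_\alpha\ell^\alpha = \theta_\ell + \kappa_\ell , \qquad \nabla_\alpha n^\alpha = \theta_n + \kappa_n . \tag{11.153} $$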
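Turning now to the Raychaudhuri equation for a spherically symmetric radial null congruence
$\ell^\alpha$, the general result (11.129) (for $\kappa_\ell \neq 0$), i.e.
$$ \frac{d}{d\lambda}\theta_\ell = \kappa_\ell\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} \tag{11.154} $$
simplifies considerably. Spherical symmetry implies that the spatial shear and rotation
tensors are zero (a spatial rotationally invariant 2-tensor is proportional to $\delta_{ik}$ which
has neither a traceless nor an anti-symmetric part),
$$ \sigma_{\alpha\beta} = \omega_{\alpha\beta} = 0 . \tag{11.155} $$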
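The vanishing of the rotation can also be deduced from the fact that $\ell^\alpha$ is hypersurface
orthogonal (specifically orthogonal to the family of null hypersurfaces generated by $\ell^\alpha$).
Thus the Raychaudhuri equation reduces to
$$ \frac{d}{d\lambda}\theta_\ell = \kappa_\ell\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 . \tag{11.156} $$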
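As a quick sanity check of (11.152) and (11.156), one can look at flat Minkowski space in spherical coordinates. This example is chosen here for illustration and is not taken from the text above; the specific null vector fields below are one convenient choice satisfying the cross normalisation (11.139), and both are affinely parametrised.

```latex
\begin{align*}
ds^2 &= -dt^2 + dr^2 + r^2 d\Omega^2 , \qquad
\ell = \partial_t + \partial_r , \quad n = \tfrac12(\partial_t - \partial_r) ,
\quad \ell_\alpha n^\alpha = -1 \\
\theta_\ell &= \frac{2}{r}\,\ell^r = \frac{2}{r} , \qquad
\theta_n = \frac{2}{r}\,n^r = -\frac{1}{r} \\
\frac{d}{d\lambda}\theta_\ell &= \ell^a\partial_a\theta_\ell = -\frac{2}{r^2}
  = -\tfrac12\theta_\ell^2 \qquad (R_{\alpha\beta} = 0,\ \kappa_\ell = 0)
\end{align*}
```

The outgoing congruence expands and the ingoing one contracts, as expected, and the reduced Raychaudhuri equation is satisfied identically.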
12 Curvature IV: Curvature and Killing Vectors
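Using
$$ (\nabla_\mu\nabla_\nu - \nabla_\nu\nabla_\mu)V_\lambda = -R^\rho_{\ \lambda\mu\nu}V_\rho \tag{12.1} $$
and its cyclic symmetry (7.17), it is possible to deduce that for a Killing vector $K^\mu$,
$$ \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 , \tag{12.2} $$
one has the following basic identity relating Killing vectors and the curvature tensor,
$$ \nabla_\lambda\nabla_\mu K_\nu = R^\rho_{\ \lambda\mu\nu}K_\rho . \tag{12.3} $$
Indeed, proceeding as in the proof of the cyclic permutation identity (7.17), we deduce
that
$$ \nabla_{[\lambda}\nabla_\mu K_{\nu]} \sim R^\rho_{\ [\lambda\mu\nu]}K_\rho = 0 . \tag{12.4} $$
$$ \nabla_\lambda\nabla_\mu K_\nu + \nabla_\nu\nabla_\lambda K_\mu + \nabla_\mu\nabla_\nu K_\lambda = 0 . \tag{12.5} $$
Using the Killing property in the second term, we can write this as
$$ \nabla_\lambda\nabla_\mu K_\nu = -[\nabla_\mu,\nabla_\nu]K_\lambda = R^\rho_{\ \lambda\mu\nu}K_\rho \tag{12.6} $$
which is (12.3).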
This identity can be interpreted as the statement (and can alternatively be derived from
the fact) that the Lie derivative of the Christoffel symbols of a metric along a Killing
vector of the metric is zero.
Indeed, first of all it is easy to see that under a general variation of the metric, the
induced variation of the Christoffel symbol can be written as (19.14)
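$$ \delta\Gamma^\lambda_{\ \mu\nu} = \tfrac12 g^{\lambda\rho}(\nabla_\mu\delta g_{\rho\nu} + \nabla_\nu\delta g_{\mu\rho} - \nabla_\rho\delta g_{\mu\nu}) . \tag{12.7} $$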
(this is easy to derive and also easy to remember as it takes exactly the same form as
the definition of the Christoffel symbol, only with the metric replaced by the metric
variation and the partial derivatives by covariant derivatives - see section 19.2 for a
derivation and discussion of this identity). In particular, this exhibits the fact that the
metric variation of the Christoffel symbols is a tensor (as could have been anticipated
from the fact that the non-tensorial term in the transformation of the Christoffel symbols
is independent of the metric), and additionally provides us with an explicit expression
for this tensor.
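Next, if the variation $\delta g_{\mu\nu} = L_\xi g_{\mu\nu}$ is the Lie derivative, i.e. the variation in the metric
induced by an infinitesimal coordinate transformation $\delta x^\mu = \xi^\mu$, one can write this as
$$ L_\xi\Gamma^\lambda_{\ \mu\nu} = \tfrac12 g^{\lambda\rho}(\nabla_\mu L_\xi g_{\rho\nu} + \nabla_\nu L_\xi g_{\mu\rho} - \nabla_\rho L_\xi g_{\mu\nu}) . \tag{12.8} $$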
Note that in general the Lie derivative of a non-tensorial quantity is not well defined
(or at least its definition requires a bit more thought). Here, however, it is natural to
use the general formula (12.7) for the variation of the Christoffel symbols under metric
variations to in particular define their Lie derivative (as the change in the Christoffel
symbols induced by the Lie derivative of the metric).
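Thus, adopting this definition and using $L_\xi g_{\mu\nu} = \nabla_\mu\xi_\nu + \nabla_\nu\xi_\mu$, the right-hand side can
(upon using the definition and cyclic symmetry of the Riemann tensor) be written as
$$ \begin{aligned}
L_\xi\Gamma^\lambda_{\ \mu\nu} &= \nabla_\mu\nabla_\nu\xi^\lambda - R^\lambda_{\ \nu\mu\rho}\xi^\rho \\
&= \nabla_\mu\nabla_\nu\xi^\lambda + R^\lambda_{\ \nu\rho\mu}\xi^\rho .
\end{aligned} \tag{12.9} $$
$$ L_K g_{\mu\nu} = 0 \quad\Rightarrow\quad L_K\Gamma^\lambda_{\ \mu\nu} = 0 \quad\Leftrightarrow\quad \nabla_\lambda\nabla_\mu K_\nu = R^\rho_{\ \lambda\mu\nu}K_\rho , \tag{12.10} $$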
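Contracting (12.3) over $\lambda$ and $\nu$, one obtains the next useful and frequently used identity
$$ \nabla_\nu\nabla_\mu K^\nu = K^\nu R_{\nu\mu} . \tag{12.11} $$
$$ R_{\mu\nu}K^\mu K^\nu = (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) + \nabla_\mu(K^\nu\nabla_\nu K^\mu) . \tag{12.12} $$
Note that this can also be deduced directly from (7.50) for $V \to K$ a Killing vector.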
We will now look at various consequences of the identities (12.3), (12.11) and (12.12)
which are useful and interesting in their own right. The implications of these identities
for maximal symmetry and maximally symmetric spaces will be discussed separately in
section 13 below.
12.2 Killing Vectors form a Lie algebra
As the first application, we will explicitly prove the assertion (8.43) of section 8.5 that
the Lie bracket of two Killing vectors is again a Killing vector. While this follows from
the general property (8.32) of the Lie derivative, which itself can (with some work)
be deduced from the general definition of the Lie derivative (as the generator of the
action of coordinate transformations on tensors), it is instructive and reassuring to
verify this by an explicit calculation, also because similar manipulations are required
when extending the analysis from Killing vectors to Killing tensors or Killing-Yano
tensors briefly mentioned in section 9.5.[29]
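Thus consider two Killing vectors $A^\mu$ and $B^\mu$, say, i.e. vector fields satisfying
$$ \nabla_\mu A_\nu + \nabla_\nu A_\mu = \nabla_\mu B_\nu + \nabla_\nu B_\mu = 0 \tag{12.13} $$
or, equivalently,
$$ \nabla_\mu A_\nu = \nabla_{[\mu}A_{\nu]} , \qquad \nabla_\mu B_\nu = \nabla_{[\mu}B_{\nu]} . \tag{12.14} $$
$$ C^\mu = [A,B]^\mu = A^\nu\nabla_\nu B^\mu - B^\nu\nabla_\nu A^\mu , \tag{12.15} $$
$$ C = [A,B] \quad\Rightarrow\quad \nabla_\mu C_\nu = \nabla_{[\mu}C_{\nu]} . \tag{12.16} $$
$$ \begin{aligned}
\nabla_\mu C_\nu &= (\nabla_\mu A^\lambda)\nabla_\lambda B_\nu - (\nabla_\mu B^\lambda)\nabla_\lambda A_\nu + A^\lambda\nabla_\mu\nabla_\lambda B_\nu - B^\lambda\nabla_\mu\nabla_\lambda A_\nu \\
&= (\nabla_\mu A^\lambda)\nabla_\lambda B_\nu + (\nabla_\mu B^\lambda)\nabla_\nu A_\lambda + R^\rho_{\ \mu\lambda\nu}(A^\lambda B_\rho - B^\lambda A_\rho) .
\end{aligned} \tag{12.17} $$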
The first two terms are already manifestly anti-symmetric (the second being the anti-
symmetrisation of the first), and by the cyclic identity and other symmetries of the
Riemann tensor, so is the last term,
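$$ R_{\rho\mu\lambda\nu} - R_{\lambda\mu\rho\nu} = -R_{\mu\rho\lambda\nu} - R_{\mu\lambda\nu\rho} = R_{\mu\nu\rho\lambda} = -R_{\nu\mu\rho\lambda} . \tag{12.18} $$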
Thus the Lie bracket of two Killing vectors is indeed again a Killing vector, as claimed.
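A low-tech way to see this closure at work is flat Euclidean space in Cartesian coordinates, where the curvature term in (12.17) is absent and every Killing vector has the form $K^i(x) = M^i_{\ j}x^j + a^i$ with $M$ anti-symmetric. The following sketch, in which all names are chosen here for illustration, evaluates the Lie bracket (12.15) of two such fields in closed form and checks that the result is again of this form.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(p, v):
    """Matrix acting on a vector."""
    return [sum(p[i][k] * v[k] for k in range(len(v))) for i in range(len(p))]


def bracket(field_a, field_b):
    """Lie bracket [A, B] of two affine vector fields K(x) = M x + a.

    Each field is a pair (M, a). From [A,B]^i = A^j d_j B^i - B^j d_j A^i one
    finds the matrix part M_B M_A - M_A M_B and the constant part M_B a_A - M_A a_B.
    """
    (ma, va), (mb, vb) = field_a, field_b
    ba, ab = matmul(mb, ma), matmul(ma, mb)
    n = len(va)
    m = [[ba[i][j] - ab[i][j] for j in range(n)] for i in range(n)]
    wa, wb = matvec(mb, va), matvec(ma, vb)
    return m, [wa[i] - wb[i] for i in range(n)]


def is_killing(field):
    """Flat space, Cartesian coordinates: Killing iff the matrix part is anti-symmetric."""
    m, _ = field
    n = len(m)
    return all(m[i][j] == -m[j][i] for i in range(n) for j in range(n))


# rotation about the z-axis, rotation about the x-axis, translation along x
rot_z = ([[0, -1, 0], [1, 0, 0], [0, 0, 0]], [0, 0, 0])
rot_x = ([[0, 0, 0], [0, 0, -1], [0, 1, 0]], [0, 0, 0])
trans_x = ([[0, 0, 0], [0, 0, 0], [0, 0, 0]], [1, 0, 0])

rot_bracket = bracket(rot_z, rot_x)      # again a rotation (about the y-axis)
mixed_bracket = bracket(rot_z, trans_x)  # a translation along y
```

The bracket of two rotations is a rotation and the bracket of a rotation with a translation is a translation, which is the familiar structure of the Euclidean algebra.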
[29] The interesting question if or when Killing-Yano tensors form a Lie algebra, extending and generalising
the Lie algebra of the isometry group generated by the Killing vectors, is analysed in D. Kastor,
S. Ray, J. Traschen, Do Killing-Yano tensors form a Lie Algebra?, arXiv:0705.0535 [hep-th].
12.3 On the Isometry Algebra of a Compact Riemannian Space
In this section we will look at one immediate application of the identity (12.12),
$$ R_{\mu\nu}K^\mu K^\nu = (\nabla_\mu K_\nu)(\nabla^\mu K^\nu) + \nabla_\mu(K^\nu\nabla_\nu K^\mu) , \tag{12.19} $$
namely an analogue of the Bochner-Yano type argument (given in remark 8 of section
7.5) for Killing vectors. Again, in order to be able to say something of substance we as-
sume that the space we are dealing with is compact without boundary, and Riemannian,
i.e. equipped with a positive-definite metric. In spite of this, the result we will derive
is relevant also for physics, at least as long as one is willing to entertain the possibility
that some higher-dimensional generalisations of general relativity (such as Kaluza-Klein
theories discussed in section 43) play a role in some more fundamental description of
nature.
With the above assumptions, the first term on the right-hand side of (12.19) is non-
negative and the second is a total derivative term that vanishes upon integration.
Therefore for a Killing vector to exist on a compact Riemannian space, the integral
of $R_{\mu\nu}K^\mu K^\nu$ must be non-negative as well.
Since the Lie bracket of two covariantly constant vector fields is zero,
$$ \nabla_\mu V^\nu = \nabla_\mu W^\nu = 0 \quad\Rightarrow\quad [V,W]^\mu = V^\nu\nabla_\nu W^\mu - W^\nu\nabla_\nu V^\mu = 0 , \tag{12.20} $$
this means that continuous isometries of a space with vanishing Ricci tensor can at
most be Abelian. An example is provided by the torus $T^n$ equipped with the flat metric
it inherits from regarding $T^n$ as the periodic identification of $\mathbb{R}^n$. This metric has
vanishing Ricci tensor (because evidently even the Riemann tensor is zero), but there
are $n$ linearly-independent (covariantly) constant translational Killing vectors (inherited
from $\mathbb{R}^n$) that generate the Abelian isometry group $U(1)^n$.
In Kaluza-Klein theory, one of the basic ideas is that gauge symmetries arise from
isometries of the internal space living in the extra dimensions. This internal space
is usually assumed to be compact (so as to be sufficiently small to have escaped our
attention). Thus, if one wants to generate non-Abelian gauge theories in this way
the above results provide one of the most basic constraints on the internal geometry,
namely that the Ricci tensor should not be non-positive (but it does not have to be
strictly positive everywhere).
12.4 Invariance of the Curvature along Killing Directions
It should be obvious and obviously true that for any Killing vector of a metric the scalar
curvature of the metric inherits the corresponding symmetries of the metric, i.e. that it
does not change along the orbits of that Killing vector,
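$$ \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 \quad\Rightarrow\quad K^\mu\partial_\mu R = 0 , \tag{12.21} $$
or
$$ L_K g_{\mu\nu} = 0 \quad\Rightarrow\quad L_K R = 0 \tag{12.22} $$
One can start with the contracted Bianchi identity $\nabla^\mu G_{\mu\nu} = 0$, and contract it
with $K^\nu$ to find
$$ 0 = (\nabla^\mu G_{\mu\nu})K^\nu = (\nabla^\mu R_{\mu\nu})K^\nu - \tfrac12 K^\mu\partial_\mu R . \tag{12.23} $$
Using the Killing equations, i.e. the anti-symmetry of $\nabla_\mu K_\nu$, and the symmetry
of the Ricci tensor, one can write this as
$$ K^\mu\partial_\mu R = 2\nabla^\mu(K^\nu R_{\mu\nu}) . \tag{12.24} $$
$$ K^\mu\partial_\mu R = 2\nabla^\mu(\nabla^\nu\nabla_\mu K_\nu) = [\nabla^\mu,\nabla^\nu]\nabla_\mu K_\nu = 0 \tag{12.25} $$
by (7.44).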
Alternatively (and more quickly but somewhat less covariantly) one could have simply
locally introduced an adapted coordinate system (8.57) in which $K = \partial_y$ and $\partial_y g_{\mu\nu} = 0$,
to immediately deduce that then necessarily also $\partial_y R = 0$. However, on general grounds,
and with an eye towards possible generalisations, it is always useful to have different
arguments at one's disposal, in particular among them one which is covariant.
$$ L_K g_{\mu\nu} = 0 \quad\Rightarrow\quad L_K R^\lambda_{\ \sigma\mu\nu} = 0 , \quad L_K R_{\mu\nu} = 0 . \tag{12.26} $$
12.5 Calculating Killing Components of the Ricci Tensor
Since $\nabla^\mu K^\nu$ is anti-symmetric, one can write (12.11) more explicitly with the help of
the formula (4.63) for the covariant divergence of an anti-symmetric tensor as
$$ R^\mu_{\ \nu}K^\nu = \frac{1}{\sqrt{g}}\,\partial_\nu(\sqrt{g}\,\nabla^\mu K^\nu) . \tag{12.27} $$
This can be a quite efficient way to calculate certain components of the Ricci tensor
of a metric, namely those which are of the form $R^\mu_{\ \nu}K^\nu$ for some Killing vector (the
components referred to glibly as the Killing components of the Ricci tensor in the
heading). In spite of this, this shortcut does not appear to be widely known or commonly
used.
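As an illustration of how this works, consider again the general static spherically symmetric
metric (2.88),
$$ ds^2 = -A(r)\,dt^2 + B(r)\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) . $$
Among the Killing vectors of this metric is the vector field $\xi = \partial_t$ generating time-translations,
and thus we can use (12.27) to determine the components $R^t_{\ \mu}$ of the Ricci
tensor.
$$ \nabla_\mu\xi_\nu = \partial_\mu\xi_\nu - \Gamma^\lambda_{\ \mu\nu}\xi_\lambda = \partial_\mu\xi_\nu + A(r)\Gamma^t_{\ \mu\nu} , \tag{12.29} $$
$$ \Gamma^t_{\ rt} = \Gamma^t_{\ tr} = \frac{A'}{2A} , \qquad \Gamma^t_{\ \mu\nu} = 0 \ \text{otherwise} , \tag{12.30} $$
the only non-trivial components of $\nabla_\mu\xi_\nu$ are
$$ \nabla_t\xi_r = -\nabla_r\xi_t = A'/2 , \qquad \nabla_\mu\xi_\nu = 0 \ \text{otherwise} . \tag{12.31} $$
$$ \nabla^r\xi^t = -\nabla^t\xi^r = A'/2AB . \tag{12.32} $$
With
$$ \sqrt{g} = \sqrt{AB}\,r^2\sin\theta \tag{12.33} $$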
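while
$$ R^t_{\ t} = -\frac{1}{2\sqrt{AB}\,r^2}\,\partial_r\!\left(r^2A'/\sqrt{AB}\right) . \tag{12.36} $$
Explicitly one can write this as
$$ R_{tt} = -AR^t_{\ t} = \frac{A''}{2B} - \frac{A'}{4B}\left(\frac{A'}{A} + \frac{B'}{B}\right) + \frac{A'}{rB} . \tag{12.37} $$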
Remarks:
1. As you can check for yourself, this way of determining $R_{tt}$ is much quicker than
working it out from the general formula for the Ricci tensor involving the Christoffel
symbols squared as well as their derivatives. In fact it is the quickest and slickest
way to obtain $R_{tt}$ by a calculation in coordinate components that I am aware of.
2. In the same way, one can also determine the angular components $R^\phi_{\ \phi}$, say, using
the Killing vector $\xi = \partial_\phi$.
3. The only not obviously vanishing component of $R_{\mu\nu}$ (see the discussion in section
23.3) that cannot be obtained in this way is $R_{rr}$.
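As a spot check of (12.36) and (12.37), the following sketch evaluates both expressions for profiles $A(r)$, $B(r)$ whose derivatives are supplied by hand. The two test profiles (Schwarzschild with mass parameter `m`, and a de Sitter-like $A = 1 - r^2/L^2$, both with $B = 1/A$), the finite-difference step and all function names are assumptions introduced here for illustration only.

```python
import math


def ricci_tt(r, a, da, dda, b, db):
    """R_tt from the explicit formula (12.37); all arguments after r are callables."""
    A, A1, A2, B, B1 = a(r), da(r), dda(r), b(r), db(r)
    return A2 / (2 * B) - A1 / (4 * B) * (A1 / A + B1 / B) + A1 / (r * B)


def ricci_t_t_mixed(r, a, da, b, h=1e-5):
    """R^t_t from the Killing-vector formula (12.36), r-derivative by central difference."""
    def flux(x):
        return x**2 * da(x) / math.sqrt(a(x) * b(x))
    dflux = (flux(r + h) - flux(r - h)) / (2 * h)
    return -dflux / (2 * math.sqrt(a(r) * b(r)) * r**2)


def static_profile(a, da, dda):
    """Return (a, da, dda, b, db) for the special case B = 1/A."""
    def b(r):
        return 1.0 / a(r)

    def db(r):
        return -da(r) / a(r) ** 2

    return a, da, dda, b, db


def both_forms(r, profile):
    """Evaluate (12.37) and (12.36) at radius r for a given profile."""
    a, da, dda, b, db = profile
    return ricci_tt(r, a, da, dda, b, db), ricci_t_t_mixed(r, a, da, b)


m, L = 1.0, 2.0
schwarzschild = static_profile(lambda r: 1 - 2 * m / r,
                               lambda r: 2 * m / r**2,
                               lambda r: -4 * m / r**3)
de_sitter = static_profile(lambda r: 1 - r**2 / L**2,
                           lambda r: -2 * r / L**2,
                           lambda r: -2 / L**2)

rtt_schw, mixed_schw = both_forms(5.0, schwarzschild)   # outside r = 2m
rtt_ds, mixed_ds = both_forms(1.0, de_sitter)           # inside r = L
```

For the Schwarzschild profile both expressions should vanish (up to rounding), as they must for a vacuum solution, while for the de Sitter-like profile the two forms should be related by $R_{tt} = -AR^t_{\ t}$ with a constant $R^t_{\ t}$.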
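A cute application of the identity (12.11) is the following. Recall that in the covariant
Lorenz gauge
$$ \nabla_\mu A^\mu = 0 \tag{12.38} $$
$$ \nabla_\mu F^{\mu\nu} = \nabla_\mu(\nabla^\mu A^\nu - \nabla^\nu A^\mu) = 0 \tag{12.39} $$
$$ \nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 \quad\Rightarrow\quad \nabla_\mu K^\mu = 0 \tag{12.41} $$
$$ \nabla_\mu\nabla_\nu K^\mu = R_{\nu\mu}K^\mu \quad\Rightarrow\quad \nabla^\mu\nabla_\mu K_\nu = -R_{\nu\mu}K^\mu . \tag{12.42} $$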
Thus the sign of the Ricci tensor in (12.40) and (12.42) is different, but evidently this
difference disappears for a metric with vanishing Ricci tensor. This does not imply at
all that the Riemann tensor is zero. Indeed, we will learn in section 18 that the vacuum
Einstein equations (i.e. the gravitational field equations without or outside of matter
sources) are simply the Ricci flatness conditions $R_{\mu\nu} = 0$ (18.30).
$$ A_\mu = K_\mu , \quad F_{\mu\nu} = \nabla_\mu K_\nu - \nabla_\nu K_\mu \quad\Rightarrow\quad \nabla_\mu F^{\mu\nu} = 0 , \tag{12.44} $$
is identical to (12.39).
This means that any Killing vector of a solution to the vacuum Einstein equations auto-
matically gives rise to a solution of the vacuum Maxwell equations in that gravitational
background. Depending on the Killing vector this may or may not be a non-trivial
($F_{\mu\nu} \neq 0$) solution to the Maxwell equations, rotational Killing vectors typically giving
rise to non-trivial solutions while for translational Killing vectors $A_\mu$ is pure gauge
(see section 13.1 for a more precise characterisation of what is meant by rotational
and translational Killing vectors at a given point).
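For example, taking the general Killing vector (8.46) of Minkowski space (which certainly
has vanishing Ricci tensor),
$$ K^\mu = \omega^\mu_{\ \nu}x^\nu + \epsilon^\mu \quad\Rightarrow\quad A_\mu = K_\mu = \omega_{\mu\nu}x^\nu + \epsilon_\mu , \tag{12.45} $$
$$ F_{\mu\nu} = \partial_\mu K_\nu - \partial_\nu K_\mu = -2\omega_{\mu\nu} . \tag{12.46} $$
Thus it vanishes for a purely translational Killing vector while a boost is associated
with a constant electric field ($E_k \sim \omega_{0k}$) and a spatial rotation gives rise to a constant
magnetic field ($B_k \sim \epsilon_{kij}\omega_{ij}$).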
Because the Einstein tensor $G_{\mu\nu}$ (7.92) is symmetric and conserved (the contracted
Bianchi identity (7.91)), to any Killing vector one can associate (cf. the discussion in
section 9.1) the conserved current
$$ J_1^\mu = G^{\mu\nu}K_\nu = R^{\mu\nu}K_\nu - \tfrac12 g^{\mu\nu}RK_\nu = R^{\mu\nu}K_\nu - \tfrac12 RK^\mu . \tag{12.47} $$
$$ J_a^\mu = R^{\mu\nu}K_\nu - \tfrac12 aRK^\mu , \qquad \nabla_\mu J_a^\mu = 0 , \tag{12.48} $$
for any value of the real parameter $a$. Among this 1-parameter family of conserved
currents, the choice
$$ J_{a=0}^\mu \equiv J^\mu(K) = R^{\mu\nu}K_\nu \tag{12.49} $$
(the Komar current) is singled out by the fact that, by (12.11), it is not only conserved
but can actually be written as the divergence of an anti-symmetric tensor,
$$ J^\mu(K) = \nabla_\nu A^{\mu\nu} , \qquad A^{\mu\nu} = -A^{\nu\mu} = \nabla^\mu K^\nu \tag{12.50} $$
Thus the corresponding conserved charge, written as a hypersurface integral, can actually
be written as a surface integral of components of $A^{\mu\nu}$. These define the so-called
Komar charges associated to symmetries of the metric. They will make a brief appearance
in section 22.4.
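As an aside, note that while in the above we started off with Killing vectors, a similar
story is actually true for any vector field. Namely, for any vector field $\xi^\mu$ define the
current
$$ J^\mu(\xi) = \nabla_\nu(\nabla^{[\mu}\xi^{\nu]}) = \tfrac12\nabla_\nu(\nabla^\mu\xi^\nu - \nabla^\nu\xi^\mu) . \tag{12.51} $$
Note that this reduces to (12.50) for $\xi = K$ a Killing vector. Moreover, by (7.45) this
current is conserved,
$$ \nabla_\mu J^\mu(\xi) = 0 . \tag{12.52} $$
$$ \begin{aligned}
J^\mu(\xi) &= \tfrac12\nabla_\nu(\nabla^\mu\xi^\nu - \nabla^\nu\xi^\mu) = \nabla_\nu\nabla^\mu\xi^\nu - \tfrac12\nabla_\nu(\nabla^\mu\xi^\nu + \nabla^\nu\xi^\mu) \\
&= [\nabla_\nu,\nabla^\mu]\xi^\nu + \nabla^\mu(\nabla_\nu\xi^\nu) - \tfrac12\nabla_\nu(\nabla^\mu\xi^\nu + \nabla^\nu\xi^\mu) \\
&= R^\mu_{\ \nu}\xi^\nu + \tfrac12(g^{\mu\nu}g^{\lambda\rho} - g^{\mu\lambda}g^{\nu\rho})\nabla_\nu(\nabla_\lambda\xi_\rho + \nabla_\rho\xi_\lambda) ,
\end{aligned} \tag{12.53} $$
where we made use of (7.38). Note that this indeed reduces to (12.49) for a Killing
vector, for which the second term on the right-hand side is absent.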
The existence of these identically conserved currents and the corresponding surface
charge densities $\nabla^{[\mu}\xi^{\nu]}$ reflects the fact that in general relativity (more generally, in
any generally covariant theory) all vector fields can be considered as the generators of
symmetries (in the sense of coordinate transformations). Indeed, the currents $J^\mu(\xi)$ can
be shown to be precisely the corresponding Noether currents arising from the Lagrangian
formulation of general relativity to be discussed in section 19. We will establish this
result in section 19.6. Nevertheless, the currents and charges associated with Killing
vectors turn out to play a privileged role, and we will in particular relate the Komar
charge for a timelike Killing vector to the ADM mass of an isolated (asymptotically)
static system in section 22.4.
13 Curvature V: Maximal Symmetry and Constant Curvature
As we will discuss later on, in the context of the Cosmological Principle, such spaces,
which are simultaneously homogeneous (the same at every point) and isotropic (the
same in every direction) provide an (admittedly highly idealised) description of space
in a cosmological space-time.
If you already know (or are willing to believe) that in any spatial dimension $n$ there
are essentially only 3 such spaces, namely the Euclidean space $\mathbb{R}^n$, the sphere $S^n$, and
its negative curvature counterpart, the hyperbolic space $H^n$ (all equipped with their
standard metrics), you can skip this section, and may just want to refer to section 13.3
where it is shown that these 3 standard metrics can be written in a unified way as
$$ ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega^2_{n-1} \tag{13.1} $$
for $k = 0, \pm1$ respectively.
Our starting point is, as in the previous section, the identity (12.3), reproduced here
with the explicit x-dependence included for present purposes,
$$ \nabla_\lambda\nabla_\mu K_\nu(x) = R^\rho_{\ \lambda\mu\nu}(x)K_\rho(x) . \tag{13.2} $$
In particular, this shows that the second derivatives of the Killing vector at a point $x_0$
are again expressed in terms of the value of the Killing vector itself at that point. This
means (think of Taylor expansions) that, remarkably, a Killing vector field $K_\mu(x)$ is
completely and uniquely determined everywhere by the values of $K_\mu(x_0)$ and $\nabla_\mu K_\nu(x_0)$
at a single point $x_0$.
A set of Killing vectors $\{K^{(i)}_\mu(x)\}$ is said to be linearly independent if any linear relation
of the form
$$ \sum_i c_i K^{(i)}_\mu(x) = 0 , \tag{13.3} $$
with constant coefficients $c_i$ implies $c_i = 0$ (the reason for insisting on constant coefficients
rather than functions $c_i(x)$ in this definition is of course that if $K_\mu$ is a Killing
vector, then so is $cK_\mu$ iff $c$ is constant).
An example of a metric with the maximal number of Killing vectors is, none too sur-
prisingly, n-dimensional Minkowski space, where n(n + 1)/2 agrees with the dimension
of the Poincaré group, the group of transformations that leave the Minkowski metric
invariant.
Other examples of spaces that are maximally symmetric spaces are provided by spheres
with their standard metric (e.g. we already know that the 2-sphere has 3 = 2(2 + 1)/2
linearly independent Killing vectors, given explicitly in (8.53)). We will show below that
spheres and their negative curvature hyperbolic counterparts are the unique non-trivial
maximally symmetric spaces (with a corresponding statement for maximally symmetric
space-times, which we will study in detail in section 38).
We will now see how the data $K_\mu(x_0)$ and $\nabla_\mu K_\nu(x_0)$ are related to translations and
rotations:
For example, as mentioned before, the 2-sphere is maximally symmetric, with 3 linearly
independent Killing vectors, given explicitly e.g. in (8.53). The decomposition of these
3 Killing vectors into 1 rotational and 2 translational Killing vectors depends on the
point on the 2-sphere, the rotational Killing vector always being associated with the
rotations around the axis through that point, and the translational Killing vectors being
formed by the remaining 2 linearly independent combinations of Killing vectors. The
decomposition given in (8.53) is adapted to rotations around the north (or south) pole,
with $V_{(3)} = \partial_\phi$ the corresponding rotational Killing vector. Note that this Killing vector
acts as a rotation at / around the poles but that it acts as a translation away from
the poles (where some other linear combination of the 3 Killing vectors would be the
rotational Killing vector). We will come back to this in slightly more general terms
below.
Some simple and fairly obvious consequences of these definitions are the following:
3. (1) and (2) now imply that a space which is isotropic around every point is max-
imally symmetric.
4. Finally one also has the converse, namely that a maximally symmetric space is
homogeneous and isotropic.
Property (2) is a consequence of the fact that constant linear combinations of Killing
vectors are again Killing vectors and that, as mentioned above in the context of the
2-sphere, away from the origin of the rotation a rotation acts just like a translation.
Technically, the difference between two rotational Killing vectors at x and x + dx can
be shown to be a translational Killing vector. To see this (roughly), consider 2 Killing
vectors $K$ and $L$ describing rotations about a point $x_0$ and a point $x_0 + dx$ respectively,
i.e.
$$ K^\mu(x_0) = 0 , \qquad L^\mu(x_0 + dx) = 0 . \tag{13.5} $$
and, in particular,
$$ (\nabla_\mu L_\nu)(x_0 + dx) \neq 0 . \tag{13.6} $$
Now expanding $L^\mu(x)$ around the point $x_0 + dx$, one has (in an inertial coordinate system
at $x_0$, say)
M (x0 ) = dx L (x0 + dx) 6= 0 (13.9)
while its matrix of covariant derivatives $\nabla_\mu M_\nu$ vanishes there due to the crucial identity
$\nabla\nabla L \sim L$ (13.2). Thus $M$ defines a translational Killing vector at $x_0$.
On the basis of these simple considerations we can already determine the form of the
Riemann curvature tensor of a maximally symmetric space. We will see that maximally
symmetric spaces are spaces of constant curvature in the sense that the Riemann
curvature tensor is simply and purely algebraically related to the metric by
$$ R_{ijkl} = k(g_{ik}g_{jl} - g_{il}g_{jk}) \tag{13.10} $$
with $k$ a constant.
This result could be obtained by making systematic use of the higher order integrability
conditions for the existence of a maximal number of Killing vectors. The argument
given below is less covariant but more elementary.
Assume for starters that the space is isotropic at x0 and choose a Riemann normal
coordinate system centered at $x_0$. Thus the metric at $x_0$ is $g_{ij}(x_0) = \eta_{ij}$ where we may
just as well be completely general and assume that
where the last term is only possible for $D = 4$. The symmetries of the Riemann tensor
imply that $a = d = b + c = 0$, and hence we are left with
$$ R_{ijkl}(x_0) = b(\eta_{ik}\eta_{jl} - \eta_{il}\eta_{jk}) . $$
Thus in an arbitrary coordinate system we will have
$$ R_{ijkl}(x_0) = b(g_{ik}(x_0)g_{jl}(x_0) - g_{il}(x_0)g_{jk}(x_0)) , \tag{13.14} $$
If we now assume that the space is isotropic around every point, then we can deduce
that
$$ R_{ijkl}(x) = b(x)(g_{ik}(x)g_{jl}(x) - g_{il}(x)g_{jk}(x)) \tag{13.15} $$
for some function $b(x)$. Therefore the Ricci tensor and the Ricci scalar are
$$ R_{ij}(x) = (n-1)\,b(x)\,g_{ij}(x) , \qquad R(x) = n(n-1)\,b(x) . $$
For $n > 2$ the contracted Bianchi identity $\nabla^i G_{ij} = 0$ now implies that $b(x)$ has to be a
constant, and we have thus established (13.10). Note that we also have
We are interested not just in the curvature tensor of a maximally symmetric space but in
the metric itself. I will give you two derivations of the metric of a maximally symmetric
space, one by directly solving the differential equation
$$ R_{ij} = k(n-1)g_{ij} $$
for the metric $g_{ij}$, the other by a direct geometrical construction of the metric which
makes the isometries of the metric manifest.
where $d\Omega^2_{(n-1)} = d\theta^2 + \ldots$ is the line element for the $(n-1)$-dimensional sphere or
its counterpart in other signatures. For concreteness, we now fix on $n = 3$, but the
argument given below goes through in general.
It is straightforward to calculate the components of the Ricci tensor of this metric. This
can be viewed as a special case of the calculations leading to the Schwarzschild metric
in section 23, setting the function called $A(r)$ there to zero (of course before having
divided by it anywhere . . . ).
$$ \begin{aligned}
R_{rr} &= \frac{1}{rB}B' \\
R_{\theta\theta} &= -\frac{1}{B} + 1 + \frac{rB'}{2B^2} .
\end{aligned} \tag{13.22} $$
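We now want to solve the equations
$$ B' = 2krB^2 , \tag{13.24} $$
$$ \begin{aligned}
2kr^2 &= -\frac{1}{B} + 1 + \frac{rB'}{2B^2} \\
&= -\frac{1}{B} + 1 + \frac{2kr^2B^2}{2B^2} \\
&= -\frac{1}{B} + 1 + kr^2 .
\end{aligned} \tag{13.25} $$
This is an algebraic equation for $B$ solved by
$$ B = \frac{1}{1 - kr^2} \tag{13.26} $$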
(and this also solves the first equation). Therefore we have determined the metric of a
maximally symmetric space to be
$$ ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega^2_{(n-1)} . \tag{13.27} $$
Clearly, for $k = 0$ this is just the flat metric on $\mathbb{R}^n$. For $k = 1$, this should also look
familiar as the standard metric on the sphere. If not, don't worry, we will be more
explicit about this below.
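The claim that (13.26) solves both (13.24) and (13.25) is easy to confirm numerically. The sketch below is only an illustration with names chosen here: it differentiates $B(r)$ by a central difference, so that the check does not presuppose the differential equation it is meant to test, and evaluates the residuals of both equations for $k = -1, 0, +1$ at one sample radius.

```python
def b_solution(r, k):
    """Candidate solution B(r) = 1 / (1 - k r^2) from (13.26)."""
    return 1.0 / (1.0 - k * r**2)


def b_prime(r, k, h=1e-6):
    """Numerical derivative of B with respect to r (central difference)."""
    return (b_solution(r + h, k) - b_solution(r - h, k)) / (2.0 * h)


def residuals(r, k):
    """Residuals of the rr equation (13.24) and of the angular equation (13.25)."""
    b, db = b_solution(r, k), b_prime(r, k)
    res_rr = db - 2.0 * k * r * b**2
    res_angular = (-1.0 / b + 1.0 + r * db / (2.0 * b**2)) - 2.0 * k * r**2
    return res_rr, res_angular


# sample radius r = 0.4 lies inside the coordinate range for all three values of k
checks = {k: residuals(0.4, k) for k in (-1, 0, 1)}
```

Both residuals should vanish up to the finite-difference error for each of the three values of $k$.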
We will also rederive these metrics in the next section in a way that makes the isometries
of the metric manifest (and which thus also excludes the possibility, not logically ruled
out by the arguments given so far, that the metrics we have found here for $k \neq 0$ are
spherically symmetric and have constant Ricci curvature but are not actually maximally
symmetric).
Remarks:
1. First of all let us note that for $k \neq 0$ essentially only the sign of $k$ matters as
$|k|$ only affects the overall size of the space and nothing else (and can therefore
be absorbed in the scale factor $a(t)$ of the metric (32.1) that will be the starting
point for our investigations of cosmology). To see this note that a metric of the
form (13.27), but with $k$ replaced by $k/L^2$,
$$ ds^2 = \frac{dr^2}{1 - kr^2/L^2} + r^2 d\Omega^2 , \tag{13.28} $$
can, by introducing $\bar r = r/L$, be put into the form
$$ ds^2 = \frac{dr^2}{1 - kr^2/L^2} + r^2 d\Omega^2 = L^2\left(\frac{d\bar r^2}{1 - k\bar r^2} + \bar r^2 d\Omega^2\right) . \tag{13.29} $$
We now see explicitly that a rescaling of $k$ by a constant factor is equivalent to
an overall rescaling of the metric, and thus we will just need to consider the cases
$k = 0, \pm1$. However, occasionally it will also be convenient to think of $k$ as a
continuous parameter, the 3 geometries then being distinguished by $k < 0$, $k = 0$
and $k > 0$ respectively.
4. Thus, collectively we can write the three metrics as
$$ ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega^2_{(n-1)} = d\psi^2 + g_k(\psi)^2 d\Omega^2_{n-1} \tag{13.34} $$
where
$$ g_k(\psi) = \begin{cases} \psi & k = 0 \\ \sin\psi & k = +1 \\ \sinh\psi & k = -1 \end{cases} \tag{13.35} $$
$$ r = \bar r(1 + k\bar r^2/4)^{-1} , \tag{13.36} $$
$$ ds^2 = (1 + k\bar r^2/4)^{-2}(d\bar r^2 + \bar r^2 d\Omega^2_{(n-1)}) = (1 + k\vec x^2/4)^{-2}d\vec x^2 . \tag{13.37} $$
Note that this differs by the conformal factor $(1 + k\bar r^2/4)^{-2} > 0$ from the flat
metric. One says that such a metric is conformally flat. Thus what we have
shown is that every maximally symmetric space is conformally flat. Conformally
flat, on the other hand, does not by any means imply maximally symmetric (the
conformal factor could be any function of the radial and angular variables).
Note also that the metric in this form is just the 3- (or n-) dimensional generalisation
of the 2-dimensional constant curvature metric on the 2-sphere in stereographic
coordinates (10.62) (for $k = +1$) or of the Poincaré disc metric of $H^2$
(10.66) (for $k = -1$).
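The equivalence of the two forms of the metric under the coordinate change (13.36) can be checked numerically. The following sketch is an illustration only, with function names chosen here: it pulls the metric (13.27) back to the new radial coordinate, using a central difference for $dr/d\bar r$, and compares the resulting radial and angular coefficients with those of the conformally flat form (13.37).

```python
def r_of_rbar(rbar, k):
    """Coordinate transformation (13.36): r = rbar / (1 + k rbar^2 / 4)."""
    return rbar / (1.0 + k * rbar**2 / 4.0)


def pulled_back_components(rbar, k, h=1e-6):
    """Coefficients of drbar^2 and dOmega^2 obtained from dr^2/(1 - k r^2) + r^2 dOmega^2.

    The derivative dr/drbar is approximated by a central difference.
    """
    r = r_of_rbar(rbar, k)
    dr = (r_of_rbar(rbar + h, k) - r_of_rbar(rbar - h, k)) / (2.0 * h)
    return dr**2 / (1.0 - k * r**2), r**2


def conformally_flat_components(rbar, k):
    """Coefficients of drbar^2 and dOmega^2 read off from (13.37)."""
    conf = (1.0 + k * rbar**2 / 4.0) ** -2
    return conf, conf * rbar**2


# sample point rbar = 0.7, away from rbar = 2 where 1 - k r^2 vanishes for k = +1
comparison = {k: (pulled_back_components(0.7, k), conformally_flat_components(0.7, k))
              for k in (-1, 0, 1)}
```

For each value of $k$ the two pairs of coefficients should agree up to the finite-difference error.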
Recall that the standard metric on the n-sphere can be obtained by restricting the flat
metric on an ambient $\mathbb{R}^{n+1}$ to the sphere. We will generalise this construction a bit to
allow for $k < 0$ and other signatures as well.
$$ k\vec x^2 + z^2 = 1 . \tag{13.39} $$
This equation breaks all the translational isometries, but by the very definition of the
group $G$ it leaves this equation, and therefore the hypersurface $\Sigma$, invariant. It follows
that $G$ will act by isometries on $\Sigma$ with its induced metric. Since $\dim G = n(n+1)/2$, the
n-dimensional space $\Sigma$ has $n(n+1)/2$ Killing vectors and is therefore maximally symmetric.
Remarks:
2. The Killing vectors of the induced metric are simply the restriction to $\Sigma$ of the
standard generators of $G$ on the vector space $V$.
3. For Euclidean signature, these spaces are spheres for $k > 0$ and hyperboloids for
$k < 0$, and in other signatures they are the corresponding generalisations. In
particular, for $(p, q) = (1, n-1)$ we obtain de Sitter space-time for $k = 1$ and anti-de
Sitter space-time for $k = -1$. We will discuss their embeddings, and coordinate
systems for them, in much more detail in section 38.
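It just remains to determine explicitly this induced metric. For this we start with the
defining relation of $\Sigma$ and differentiate it to find that on $\Sigma$ one has
$$ dz = -\frac{k\,\vec x\cdot d\vec x}{z} , \tag{13.41} $$
so that
$$ dz^2 = \frac{k^2(\vec x\cdot d\vec x)^2}{1 - k\vec x^2} . \tag{13.42} $$
Thus the metric (13.38) restricted to $\Sigma$ is
$$ \begin{aligned}
ds^2|_\Sigma &= d\vec x^2 + \frac{1}{k}dz^2|_\Sigma \\
&= d\vec x^2 + \frac{k(\vec x\cdot d\vec x)^2}{1 - k\vec x^2} .
\end{aligned} \tag{13.43} $$
Passing from Cartesian coordinates $\vec x$ to spherical coordinates $(r, \theta, \phi)$, with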
14 Hypersurfaces I: Basics
In this section I will describe some of the basic aspects of what is known as the intrinsic
geometry of such hypersurfaces. The geometry of surfaces is of course a classical subject
of geometry, the study by Gauss of 2-dimensional surfaces embedded in R3 and his
Theorema Egregium regarding the intrinsic nature of the curvature of a surface marking
the birth of differential geometry, and as such is described in many places. We will just
barely scratch the surface of this subject and concentrate on those aspects that are of
evident (rather than just potential) relevance for general relativity.30
Strictly speaking very little of this is needed or used in the elementary applications of
general relativity in the later parts of these notes, and therefore this section could also
be skipped at first. However, this is a subject which is interesting in its own right and
which also leads to an improved understanding of the things that we have done so far
regarding tensors and tensor calculus.
Moreover, some results of this section, and its accompanying sections 15 and 16, come in
handy e.g. when one needs to integrate some quantity (like a component of a conserved
current) over a hypersurface, say. Moreover, some basic familiarity with this subject
is required to better understand certain slightly (but not terribly) advanced aspects of
general relativity like the Hamiltonian formulation of general relativity (section 20, this
also requires a knowledge of the extrinsic geometry of hypersurfaces to be discussed in
section 17) or the event horizon of the Schwarzschild black hole geometry (which turns
out to be a null hypersurface with certain special features to be discussed in more detail
in section 31).
We start by defining what we mean (at least roughly speaking) by a hypersurface and
an embedding or an embedded hypersurface.
There are two distinct ways of describing and thinking about hypersurfaces.
1. Embeddings
On the one hand, one can describe a hypersurface in terms of an embedding
of $\Sigma$ into M, specified by the map $\Phi$ (which will need to satisfy some appropriate
regularity conditions - we will come back to this below).
2. Embedded Hypersurfaces
On the other hand one can think of a hypersurface concretely as a subspace of M,
i.e. as an (already) embedded hypersurface
specified e.g. by
$$ \Sigma = \{x \in M : S(x) = 0\} \,. \tag{14.3} $$
The 1st description may look a bit abstract, in particular since it seems to grant some
autonomy and independent existence to $\Sigma$ outside the space-time. However, if one
equips $\Sigma$ with coordinates $y^a$, say, and M is described by coordinates $x^\alpha$, then such an
embedding is given very concretely by specifying the point in M with coordinates $x^\alpha$
that corresponds to a point in $\Sigma$ with coordinates $y^a$. Thus an embedding is given by
the functions or parametric equations
$$ \Phi: \quad x^\alpha = x^\alpha(y^a) \,. \tag{14.4} $$
Typically in general relativity, at least as far as its more elementary aspects are con-
cerned, hypersurfaces naturally arise as concretely embedded subspaces of space-time
(without an independent existence outside of the space-time), for example in the guise of
hypersurfaces of constant time t = t0 for some time coordinate t, or as slices of constant
r for some radial coordinate r etc.
Nevertheless, for certain purposes it is useful even then to also have the 1st description
at one's disposal, in particular when it comes to questions of relating tensors on M
to tensors on $\Sigma$, determining induced metrics and volume elements on $\Sigma$ etc. All of
this is more transparent when expressed in terms of local coordinates on $\Sigma$ and M
and the relations among them. These are precisely the data $x^\alpha(y^a)$ locally defining an
embedding.
Examples:
1. As the first and most basic example, let us consider a spacelike hypersurface of
constant time in Minkowski space M, the latter equipped with standard inertial
coordinates $x^\alpha = (t, x^k)$.
In the 1st description one has in mind that one is given the space $\Sigma = \mathbb{R}^3$
with Cartesian coordinates $y^k$, and that one embeds it into Minkowski space
e.g. via the relations $x^\alpha = x^\alpha(y)$ given explicitly by
In the 2nd description, one defines the same spacelike hypersurface by the
equation
$$ S(t, x^k) = t - t_0 = 0 \,. \tag{14.6} $$
$$ x^\alpha(y^a): \quad x^1(\theta,\phi) = r_0\sin\theta\cos\phi \,,\quad x^2(\theta,\phi) = r_0\sin\theta\sin\phi \,,\quad x^3(\theta,\phi) = r_0\cos\theta \tag{14.7} $$
or by the equation
or
$$ r = r_0 \,,\quad x^\theta = \theta \,,\quad x^\phi = \phi \,, \tag{14.10} $$
This shows that it is probably a good idea to try to introduce and use coordinates
on the ambient space-time that are somehow adapted to the hypersurface one is
interested in.
3. As the third example consider the future lightcone of a point in Minkowski space
M. Without loss of generality we can choose that point to be the origin of the
coordinate system. Using spherical coordinates $(x^0 = t, x^1 = r, x^2 = \theta, x^3 = \phi)$ on
M, we can describe the future lightcone
$$ S(x^\alpha) = x^0 - x^1 = t - r = 0 \,. \tag{14.13} $$
Remarks:
1. A simple and simple-minded way of seeing the relation between the two descriptions
and passing from one to the other, generalising the above embedding of the sphere
in terms of spherical coordinates on the ambient space, is to use $S(x^\alpha)$ as a new
coordinate, at least in a neighbourhood of the hypersurface $\Sigma$, i.e. to trade any
one of the coordinates S(x) depends on for S. Calling the new coordinates $(S, x^a)$,
where the $x^a$ are arbitrary independent coordinates, one may as well use the $x^a$ as
coordinates on the surface $\Sigma$ defined by S = 0. Then the parametric description
$x^\alpha(y)$ of the surface S = 0 can be chosen to be
2. One important and recurrent theme is the relation between tensors on a hypersurface
$\Sigma$ and tensors on the ambient space(-time) M, i.e. the relation provided
by the embedding of $\Sigma$ into M between
$\Sigma$-tensors: objects which transform like tensors under transformations of the
coordinates $y^a$ on $\Sigma$ and are scalars (invariant) under transformations of the
coordinates $x^\alpha$ on M, and
M-tensors: objects which transform (as usual) like tensors under transformations
of the coordinates $x^\alpha$ on M and are scalars (invariant) under transformations
of the coordinates $y^a$ on $\Sigma$.
3. The $\Sigma$-tensor of principal interest is, as in the case of the ambient space M,
the metric tensor $h_{ab}(y)$ of $\Sigma$. In general the metric tensor (and its associated
curvature tensor, to be discussed at length in sections 7 and 10) provide a complete
local characterisation of the intrinsic geometry of a space (or space-time), i.e. the
properties that can be deduced by measuring lengths, areas, volumes, angles,
performing parallel transport etc. in that space.
4. In the case at hand, when we do not equip $\Sigma$ with any independent a priori metric
but we embed $\Sigma$ into M, the latter equipped with a metric $g_{\alpha\beta}(x)$, the metric on
$\Sigma$ will be the induced metric, i.e. the metric induced on $\Sigma$ by the ambient metric
$g_{\alpha\beta}(x)$ (in a way to be described below), and it is this metric that describes the
intrinsic geometry of the hypersurface $\Sigma$.
5. The reason for insisting on the word intrinsic in this context is that when it
comes to embedded hypersurfaces there is another aspect of the geometry of $\Sigma$
that goes beyond its purely intrinsic geometry, namely how it is embedded into the
ambient space M, i.e. how it bends inside M. A brief discussion of some aspects
of this so-called extrinsic geometry of $\Sigma$ appears in section 17. In this section we
will focus on the intrinsic geometry of a hypersurface.
The study of the relation between M-tensors and $\Sigma$-tensors has a somewhat different
flavour for embeddings and embedded hypersurfaces {S(x) = 0}, and we will consider
both points of view in turn in the following.
14.2 Embeddings: Tangent and Normal Vectors and the Induced Metric
In this section we will look at some aspects of the geometry of hypersurfaces from the
point of view of embeddings $\Phi$, i.e. in terms of the parametric description $x^\alpha(y^a)$ of a
hypersurface $\Sigma$.
First of all, let me start by giving a slightly more precise characterisation of what
is meant by (or deserves to be called) an embedding. Clearly we want to impose some
regularity conditions on $\Phi$, as for example the map which sends all of $\Sigma$ to a single point
$x \in M$ might be entertaining to contemplate but does not quite capture what one has
in mind when one thinks of hypersurfaces.
that is injective (or one-to-one), i.e. that distinct points in $\Sigma$ are mapped to
distinct points in M,
$$ E^\alpha_a = \frac{\partial x^\alpha}{\partial y^a} \,, \tag{14.15} $$
has maximal rank n.
Remarks:
2. For our purposes the most important consequence of this definition is that it
implies that the images in M of the tangent vector fields $\partial_{y^a}$ to $\Sigma$, the vector
fields
$$ \partial_{y^a} \mapsto E_a = E^\alpha_a\,\partial_\alpha = \frac{\partial x^\alpha}{\partial y^a}\,\partial_\alpha \tag{14.16} $$
(which are tangent to the image $\Phi(\Sigma)$ of $\Sigma$ in M) are linearly independent. Here
tangent means that they are tangent to some curve in $\Phi(\Sigma)$, which is evidently
the case since one can take the required curve to be the image under $\Phi$ of a suitable
curve in $\Sigma$.
3. We have thus been able to push forward the $\partial_{y^a}$ from $\Sigma$ to M. Such a push-forward
operation induced by a map $\Phi$ is usually denoted by $\Phi_*$, so that we can also write
the above as
$$ \Phi_*(\partial_{y^a}) = E_a = E^\alpha_a\,\partial_\alpha \,. \tag{14.17} $$
4. Since we have not equipped $\Sigma$ with any other structure than the coordinates $y^a$,
the $\partial_{y^a}$ are the only objects on $\Sigma$ that we will be pushing forward. In fact, as we will
see below, it is not even meaningful to try to push forward the differentials $dy^a$.
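Since the $E_a$ are linearly independent tangent vectors to (the image of) $\Sigma$ in M, normal
vectors $\xi^\alpha$ to $\Sigma$, i.e. vectors orthogonal to $\Sigma$, are characterised by
$$ g_{\alpha\beta}\,\xi^\alpha E^\beta_a = \xi_\alpha E^\alpha_a = 0 \,. \tag{14.18} $$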
Remarks:
3. The normal vector is only defined somewhat implicitly through (14.18). We will
see below, in section 14.4, that it has a much more concrete description in the
case of embedded hypersurfaces.
5. I will mostly use the term null surface for a surface with a null or lightlike normal
vector. The null case is somewhat special and peculiar, and we will occasionally
have to treat it separately from the timelike and spacelike case in the following.
6. When $\Sigma$ is not null, the freedom in the choice of f can be used to normalise
the normal vector to unit length $\pm 1$. This normalisation condition determines the
normalised normal vector $N^\alpha$ uniquely up to a choice of sign, one possibility being
$$ N^\alpha = \frac{\xi^\alpha}{|\xi^\beta\xi_\beta|^{1/2}} \quad\Rightarrow\quad N^\alpha N_\alpha = \epsilon = \begin{cases} -1 & \text{if $\Sigma$ is spacelike} \\ +1 & \text{if $\Sigma$ is timelike} \end{cases} \tag{14.20} $$
One common convention for fixing the sign ambiguity in the case of an embedded
hypersurface $\Sigma = \{S(x) = 0\}$ will be mentioned in section 14.4.
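metric on $\Sigma$ induced by a metric $g_{\alpha\beta}(x)$ on M. This is simply obtained by restricting
the metric to $\Sigma$ (better, to its image $\Phi(\Sigma)$ in M), and also restricting the displacements
$dx^\alpha$ to displacements in (the image of) $\Sigma$,
$$ ds^2|_\Sigma = g_{\alpha\beta}(x)\,dx^\alpha dx^\beta|_\Sigma = g_{\alpha\beta}\,\frac{\partial x^\alpha}{\partial y^a}\frac{\partial x^\beta}{\partial y^b}\,dy^a dy^b \equiv h_{ab}(y)\,dy^a dy^b \,. \tag{14.21} $$
Thus the induced metric is
$$ h_{ab}(y) = g_{\alpha\beta}(x(y))\,\frac{\partial x^\alpha}{\partial y^a}(y)\,\frac{\partial x^\beta}{\partial y^b}(y) \,. \tag{14.22} $$
In terms of the tangent vectors $E_a$ (14.16) the induced metric can be written as (and
determined from)
$$ h_{ab} = g_{\alpha\beta}\,E^\alpha_a E^\beta_b \,. \tag{14.23} $$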
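The recipe (14.22), (14.23) is easy to try out numerically. The following Python sketch approximates the Jacobian $E^\alpha_a$ by central finite differences and contracts it with the ambient metric. The helper names `jacobian` and `induced_metric`, the radius `R0` and the sample point are hypothetical choices made for this illustration, with the sphere of radius $r_0$ in flat $\mathbb{R}^3$ as the test case.

```python
import math

def jacobian(embedding, y, eps=1e-6):
    """E[alpha][a] = d x^alpha / d y^a, approximated by central differences."""
    x0 = embedding(y)
    E = [[0.0] * len(y) for _ in x0]
    for a in range(len(y)):
        yp, ym = list(y), list(y)
        yp[a] += eps
        ym[a] -= eps
        xp, xm = embedding(yp), embedding(ym)
        for alpha in range(len(x0)):
            E[alpha][a] = (xp[alpha] - xm[alpha]) / (2.0 * eps)
    return E

def induced_metric(embedding, g, y):
    """h_ab = g_{alpha beta}(x(y)) E^alpha_a E^beta_b, cf. (14.23)."""
    E = jacobian(embedding, y)
    G = g(embedding(y))              # ambient metric evaluated on the image
    D, n = len(E), len(y)
    return [[sum(G[al][be] * E[al][a] * E[be][b]
                 for al in range(D) for be in range(D))
             for b in range(n)] for a in range(n)]

R0 = 2.0                             # assumed radius for the example

def sphere(y):
    """Parametric description x^alpha(theta, phi) of the sphere r = R0."""
    theta, phi = y
    return [R0 * math.sin(theta) * math.cos(phi),
            R0 * math.sin(theta) * math.sin(phi),
            R0 * math.cos(theta)]

def euclidean(x):
    """Flat metric on R^3 in Cartesian coordinates."""
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    theta, phi = 0.7, 1.1
    h = induced_metric(sphere, euclidean, [theta, phi])
    # Compare with the familiar result R0^2 (d theta^2 + sin^2 theta d phi^2).
    print(h, R0**2, (R0 * math.sin(theta))**2)
```

Passing the ambient metric as a function of the point keeps the sketch usable for curved ambient spaces as well, at the price of finite-difference accuracy only.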
Remarks:
2. We see that here we have been able to pull back (restrict) a tensor on M to a tensor
on $\Sigma$, an operation usually denoted by $\Phi^*$, so that one also frequently writes this
as
$$ h_{ab} = (\Phi^* g)_{ab} = g_{\alpha\beta}\,E^\alpha_a E^\beta_b \,. \tag{14.24} $$
This is really all we need and will make use of in the following, while for the restriction
of other tensor fields from M to $\Sigma$ we will principally use the formulation of embedded
hypersurfaces rather than that of embeddings of hypersurfaces.
However, in order to better understand why e.g. the operation of pulling back a metric,
described above, works so simply, and if or how this can be extended to other tensor
fields, it is useful, even though not strictly necessary, and certainly not indispensable
for the following, to look at this from a slightly more general perspective (and we will
do this in section 14.3 below).
$$ f: M \to \mathbb{R} \,, \tag{14.25} $$
$$ \Phi^* f: \Sigma \to \mathbb{R} \,, \qquad (\Phi^* f)(y) = f(\Phi(y)) \,. \tag{14.27} $$
Now let us move on from scalars to vectors and covectors. Thinking of covectors as
linear functions on vectors, it is clear that upon restriction of a covector field on M to
$\Sigma$ one obtains a covector field on $\Sigma$, since its action on any vector at $x \in M$ is
well-defined, therefore in particular its action on vectors tangent to $\Sigma$ (which is all that
is required to make it a well-defined covector on $\Sigma$). In equations this amounts to the
statement that if $U_\alpha$ is a covector field on M, then it can be pulled back to a covector
field $u_a$ on $\Sigma$ via
$$ u_a = (\Phi^* U)_a = \frac{\partial x^\alpha}{\partial y^a}\,U_\alpha = E^\alpha_a U_\alpha \,. \tag{14.28} $$
This is indeed (rather evidently now) a covector field on $\Sigma$, i.e. transforms as such (while
it has become a scalar under coordinate transformations in M). This construction can
also be understood in terms of the differentials $dx^\alpha$ and the restriction of the generally
covariant object $U_\alpha dx^\alpha$. Just as in our discussion of the induced metric, one can simply
restrict the $dx^\alpha$ to $\Sigma$ to obtain
$$ U_\alpha dx^\alpha|_\Sigma = U_\alpha E^\alpha_a\,dy^a \equiv u_a\,dy^a \,. \tag{14.29} $$
In the same way one can pull back higher rank covariant tensor fields $U_{\alpha\beta\ldots}$ on M to $\Sigma$,
A special case of this is the pull-back of the (covariant components of the) metric (14.24).
Characteristic features of hypersurfaces, and what one can and cannot do on them, arise
from the fact that the Jacobian $E^\alpha_a$ of $\Phi$ is not a square matrix and is therefore not
invertible even when it has maximal rank (as we assumed). We had used this before
to push forward vectors from $\Sigma$ to M (the map $\partial_{y^a} \mapsto E_a$ in (14.17)) and we have
now been able to use it to pull back covectors from M to $\Sigma$. However, because of the
non-invertibility of the Jacobian, neither can we use it on the nose to push forward
covectors on $\Sigma$, or their basis $dy^a$ (I will not dwell on this, though), nor can we use it
(all by itself) to restrict (pull back) vectors on M to vectors on $\Sigma$.
Indeed, given a vector field $V^\alpha(x)$ on M its restriction to $\Sigma$ or $\Phi(\Sigma)$ is not all by itself
a vector field there because it need not be tangent to $\Sigma$ (or $\Phi(\Sigma)$). This can be rectified
by projecting out the components normal to $\Sigma$ but this requires a metric, whereas the
pull-back of covariant tensors did not require this. I consider this projection procedure
to be somewhat simpler and more transparent from the embedded hypersurface point
of view, and we will discuss this in section 15.1.
I want to conclude this section with some (even less indispensable) remarks on the gener-
ality of the pull-back procedure and the difference between covariant and contravariant
tensor fields with respect to this operation:
$$ F: N \to M \tag{14.32} $$
for $m \in M$, $n \in N$.
3. More generally, covariant tensors can always be pulled back under arbitrary (differentiable)
maps by precisely the same procedure and formulae (14.30) as in the
case of embeddings. On the other hand, we already saw above that even with
an embedding one cannot simply pull back vector fields (or other contravariant
tensor fields).
The ability to pull back covariant tensors endows these tensors with a
crucial operation that is not available to the contravariant ones. It is
difficult to overemphasize the importance of this advantage.31
5. This is also one aspect of the naturality of the calculus of differential forms, based
on totally anti-symmetric covariant tensors, briefly mentioned in sections 3.6, 3.8
and 4.5.
6. This crucial distinction between covariant and contravariant tensors did not appear
in our general discussion of tensors in section 3.3, because we were dealing
with coordinate transformations $x^\alpha(y^\beta)$ on M. These can be thought of as (local)
diffeomorphisms
$$ \Phi: M \to M \tag{14.34} $$
or
$$ \Phi: U \subset M \to \Phi(U) \subset M \,, \tag{14.35} $$
i.e. suitably differentiable (smooth) and (locally) invertible maps. In that case,
the push-forward is as well-defined as the pull-back since one can set $\Phi_* = (\Phi^{-1})^*$,
and therefore both covariant and contravariant tensors could be transformed back
and forth between the coordinate systems $x^\alpha$ and $y^\beta$.
What you can do with $E^\alpha_a$ by contracting indices you are allowed to do.
If what you want to do would require the inverse of that matrix, or at least
something with the opposite index structure, you cannot do it (or at least
not without using some additional structure like a metric).
Nevertheless, for embeddings into spaces equipped with a metric the crucial dis-
tinction between pull-backs and push-forwards and between covariant and con-
travariant tensors is blurred by two facts, namely
and crucially by the fact that with the additional structure of a metric on
M we can in any case freely convert contravariant into covariant tensors and
vice-versa.
Thus in the following we can and will proceed without worrying too much about these
matters, and perhaps pragmatically speaking the only benefit of having suffered through
this section is that you may have gained a better understanding of why we can get away
with this in the case at hand, i.e. for embeddings into space-times equipped with a
metric.
In the following, in order not to have to introduce separate coordinates on $\Sigma$ from the
outset, for the most part we will use the 2nd description of a hypersurface, i.e. we will
work with embedded hypersurfaces defined by (14.3)
$$ \Sigma = \{x \in M : S(x) = 0\} \tag{14.36} $$
for some function S(x) on M, rather than with embeddings (and we will see later how
e.g. the induced metric can be described and recovered from this point of view).
Implicitly the characterisation (14.36) of $\Sigma$ implies not only that S(x) = 0 on $\Sigma$ but
that {S(x) = 0} actually defines a codimension 1 hypersurface, i.e. that S(x) is not zero
when one moves off $\Sigma$. We will furthermore choose the defining function S(x) in such a
way that it has a 1st order zero on $\Sigma$. This is not necessary in order to define $\Sigma$, but it
avoids unnecessary complications (why would one want to define a horizontal plane in
$\mathbb{R}^3$ (with coordinates (x, y, z), say) by $z^2 = 0$ rather than by z = 0?).
$$ (\partial_\alpha S)|_{S=0} \neq 0 \,. \tag{14.37} $$
One advantage of this description of $\Sigma$ and choice of S is that one can now at once, and
very concretely, describe the normal vectors to the hypersurface $\Sigma$.
This means that any such tangent vector is orthogonal to the gradient vector $g^{\alpha\beta}\partial_\beta S$
which is non-zero on $\Sigma$ by assumption. Thus on $\Sigma$ this gradient vector field is normal
to $\Sigma$ (and actually normal to the family of hypersurfaces defined by S(x) = const).
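As in section 14.2,
$$ \Sigma \text{ is called } \begin{cases} \text{spacelike} & \text{if } g^{\alpha\beta}\partial_\alpha S\,\partial_\beta S < 0 \\ \text{timelike} & \text{if } g^{\alpha\beta}\partial_\alpha S\,\partial_\beta S > 0 \\ \text{null} & \text{if } g^{\alpha\beta}\partial_\alpha S\,\partial_\beta S = 0 \end{cases} \tag{14.39} $$
Evidently, with $g^{\alpha\beta}\partial_\beta S$ also any vector field of the form $\xi^\alpha = f g^{\alpha\beta}\partial_\beta S$ for some scalar
f(x) (non-zero on $\Sigma$) is normal to $\Sigma$,
$$ \xi^\alpha = f g^{\alpha\beta}\partial_\beta S \quad\Rightarrow\quad g_{\alpha\beta}\,\xi^\alpha V^\beta = 0 \quad \forall\; V^\alpha \text{ tangent to } \Sigma \,, \tag{14.42} $$
and in the case that $\Sigma$ is spacelike or timelike this freedom in the rescaling of the normal
vector can be used to normalise it in such a way that $N_\alpha = f\partial_\alpha S$ has unit length $N^\alpha N_\alpha = \pm 1$.
This determines $N_\alpha$ uniquely up to a choice of sign. Explicitly, the choice
$$ N_\alpha = \epsilon\,\frac{\partial_\alpha S}{|g^{\beta\gamma}\partial_\beta S\,\partial_\gamma S|^{1/2}} \tag{14.43} $$
is such that
$$ N^\alpha N_\alpha = \epsilon = \begin{cases} -1 & \text{if $\Sigma$ is spacelike} \\ +1 & \text{if $\Sigma$ is timelike} \end{cases} \tag{14.44} $$
and such that $N^\alpha$ points in the direction of increasing S, i.e. $N^\alpha\partial_\alpha S > 0$.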
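The normalisation (14.43) can be illustrated by a small Python sketch. It takes the components of the gradient $\partial_\alpha S$ and of the inverse metric at a point and returns $N_\alpha$ together with $\epsilon$, using the sign convention of (14.39) and (14.44). The helper names `unit_normal` and `raise_index`, and the two Minkowski space examples, are assumptions made for this illustration.

```python
import math

def unit_normal(dS, g_inv):
    """Return (N_lower, eps) with N_alpha = eps dS_alpha / |g^{bc} dS_b dS_c|^{1/2}.

    eps = -1 for a spacelike hypersurface (timelike normal) and eps = +1 for a
    timelike one; the null case cannot be normalised and raises an error.
    """
    D = len(dS)
    norm2 = sum(g_inv[a][b] * dS[a] * dS[b] for a in range(D) for b in range(D))
    if norm2 == 0.0:
        raise ValueError("null hypersurface: the normal cannot be normalised")
    eps = 1.0 if norm2 > 0.0 else -1.0
    scale = math.sqrt(abs(norm2))
    return [eps * c / scale for c in dS], eps

def raise_index(v_lower, g_inv):
    """N^alpha = g^{alpha beta} N_beta."""
    D = len(v_lower)
    return [sum(g_inv[a][b] * v_lower[b] for b in range(D)) for a in range(D)]

# Inverse Minkowski metric in inertial coordinates (t, x, y, z), signature (-,+,+,+).
ETA_INV = [[-1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

if __name__ == "__main__":
    for label, dS in (("S = t - t0", [1.0, 0.0, 0.0, 0.0]),
                      ("S = x - x0", [0.0, 1.0, 0.0, 0.0])):
        N_low, eps = unit_normal(dS, ETA_INV)
        N_up = raise_index(N_low, ETA_INV)
        norm = sum(u * l for u, l in zip(N_up, N_low))      # equals eps
        growth = sum(u * d for u, d in zip(N_up, dS))       # N^alpha d_alpha S
        print(label, eps, norm, growth)
```

For the constant time surface the factor $\epsilon = -1$ is what makes $N^\alpha$ future-pointing, i.e. pointing towards increasing t.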
Remarks:
2. In the null case there is in general no such preferred choice of normal vector,
because any normal vector satisfies $\xi^\alpha\xi_\alpha = 0$. We will return to that case in
section 16.1.
3. As noted in section 14.2, if the hypersurface is given in parametric form $x^\alpha = x^\alpha(y^a)$,
normal vectors $\xi^\alpha$ are characterised by the condition
$$ \xi_\alpha\,\frac{\partial x^\alpha}{\partial y^a} = 0 \quad\Leftrightarrow\quad \xi_\alpha E^\alpha_a = 0 \,. \tag{14.46} $$
Thus the defining function S is related to the parametric description by the condition
that its gradient covector field $\partial_\alpha S$ is in the kernel of the Jacobian,
$$ E^\alpha_a\,\partial_\alpha S = 0 \,. \tag{14.47} $$
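$$ \nabla_\alpha\partial_\beta S - \nabla_\beta\partial_\alpha S = \partial_\alpha\partial_\beta S - \partial_\beta\partial_\alpha S = 0 \,. \tag{14.48} $$
$$ \nabla_\alpha\xi_\beta - \nabla_\beta\xi_\alpha = 0 \quad\Leftrightarrow\quad \partial_\alpha\xi_\beta - \partial_\beta\xi_\alpha = 0 \,, \tag{14.49} $$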
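then this is the necessary integrability condition for $\xi_\alpha$ to be of the form $\partial_\alpha S$ for some
function S. This condition is metric-independent, as it should be. It is well known from
standard (vector) calculus that locally this condition is also sufficient (if the curl of a
vector field is zero then locally it can be written as a gradient vector field etc.), i.e. one
has
$$ \partial_\alpha\xi_\beta - \partial_\beta\xi_\alpha = 0 \quad\Rightarrow\quad \text{(locally)}\;\; \exists\, S: \;\xi_\alpha = \partial_\alpha S \,. \tag{14.50} $$
$$ \xi_\alpha = f\,\partial_\alpha S \quad\Rightarrow\quad \nabla_\alpha\xi_\beta - \nabla_\beta\xi_\alpha = \partial_\alpha f\,\partial_\beta S - \partial_\beta f\,\partial_\alpha S \tag{14.51} $$
that satisfies
$$ \nabla_{[\alpha}\xi_{\beta]} = (\partial_{[\alpha}\log f)\,\xi_{\beta]} \tag{14.52} $$
While this is true, in this form it is not a particularly useful characterisation of
hypersurface orthogonality because given $\xi_\alpha$ it may not be straightforward to see if such a
function f exists or not. A more useful condition is the integrability condition implied
by this, namely
$$ \nabla_{[\alpha}\xi_{\beta]} = (\partial_{[\alpha}\log f)\,\xi_{\beta]} \quad\Rightarrow\quad \xi_{[\alpha}\nabla_\beta\xi_{\gamma]} = 0 \,, \tag{14.53} $$
In strict analogy with the above story for gradient vectors, this condition is also sufficient
to establish that locally $\xi_\alpha$ can be written as $\xi_\alpha = f\,\partial_\alpha S$ for some functions f and S,
$$ \xi_{[\alpha}\nabla_\beta\xi_{\gamma]} = 0 \quad\Rightarrow\quad \text{(locally)}\;\; \exists\, S, f: \;\xi_\alpha = f\,\partial_\alpha S \,. \tag{14.54} $$
$$ \xi_{[\alpha}\nabla_\beta\xi_{\gamma]} = 0 \quad\Leftrightarrow\quad \xi_{[\alpha}\partial_\beta\xi_{\gamma]} = 0 \tag{14.55} $$
is known as the hypersurface orthogonality condition and a vector field that satisfies
(14.55) is called hypersurface orthogonal.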
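A short worked example may help to see the difference between a vanishing curl and hypersurface orthogonality. It is written for flat Euclidean $\mathbb{R}^3$ with Cartesian coordinates (x, y, z), where the totally anti-symmetric expression $\xi_{[\alpha}\partial_\beta\xi_{\gamma]}$ has a single independent component, proportional to $\vec\xi\cdot(\vec\nabla\times\vec\xi)$. The two vector fields below are illustrative choices and are not taken from the notes.

```latex
% Hypersurface orthogonality in R^3:
%   xi_{[a} d_b xi_{c]} = 0   <=>   xi . (curl xi) = 0
\begin{align*}
\vec\xi_1 &= (-y,\, x,\, 0): &
\vec\nabla\times\vec\xi_1 &= (0,\,0,\,2)\,, &
\vec\xi_1\cdot(\vec\nabla\times\vec\xi_1) &= 0\,, \\
\vec\xi_2 &= (-y,\, x,\, 1): &
\vec\nabla\times\vec\xi_2 &= (0,\,0,\,2)\,, &
\vec\xi_2\cdot(\vec\nabla\times\vec\xi_2) &= 2 \neq 0\,.
\end{align*}
```

Thus $\vec\xi_1$ is hypersurface orthogonal even though its curl does not vanish: away from the z-axis one has $\vec\xi_1 = f\,\vec\nabla S$ with $f = x^2 + y^2$ and $S = \arctan(y/x)$, the surfaces S = const being half-planes ending on the z-axis. By contrast $\vec\xi_2$, whose integral curves are helices, is not orthogonal to any family of surfaces.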
Remarks:
1. The assertion (14.54) is known as Frobenius' theorem, and the hypersurface orthogonality
condition is therefore also known as the Frobenius integrability condition.
2. If we do not just have $\xi_{[\alpha}\nabla_\beta\xi_{\gamma]} = 0$ but actually $\nabla_{[\alpha}\xi_{\beta]} = 0$, then, as we saw
before, we can draw the stronger conclusion
$$ \nabla_{[\alpha}\xi_{\beta]} = 0 \quad\Rightarrow\quad \text{(locally)}\;\; \exists\, S: \;\xi_\alpha = \partial_\alpha S \,, \tag{14.56} $$
15 Hypersurfaces II: Intrinsic Geometry of non-Null Hypersur-
faces
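In the case of spacelike or timelike hypersurfaces $\Sigma$, with the normalised normal vectors
$N^\alpha$ at our disposal we can now construct the induced metric from the metric $g_{\alpha\beta}$ on the
ambient space M. More generally, we will construct projectors that allow us to project
tensors on M restricted to $\Sigma$ onto directions (co-)tangent to $\Sigma$.
1. It is orthogonal to $N^\alpha$,
$$ N^\alpha h_{\alpha\beta} = 0 \,, \qquad h_{\alpha\beta}N^\beta = 0 \,. \tag{15.2} $$
Indeed,
$$ h_{\alpha\beta}N^\beta = g_{\alpha\beta}N^\beta - \epsilon N_\alpha N_\beta N^\beta = N_\alpha - \epsilon^2 N_\alpha = 0 \,; \tag{15.3} $$
2. For vectors $V^\alpha$ orthogonal to $N^\alpha$, i.e. tangent to $\Sigma$, the scalar product with respect
to $h_{\alpha\beta}$ is identical to that with respect to $g_{\alpha\beta}$,
$$ V^\alpha N_\alpha = 0 \quad\Rightarrow\quad h_{\alpha\beta}V^\beta = g_{\alpha\beta}V^\beta \,. \tag{15.4} $$
These two properties together imply that essentially $h_{\alpha\beta}$ (restricted to $\Sigma$) is the metric
induced on $\Sigma$ by $g_{\alpha\beta}$. The precise relation to the induced metric (14.23)
$$ h_{ab} = g_{\alpha\beta}\,E^\alpha_a E^\beta_b \,. \tag{15.5} $$
$$ h_{ab} = g_{\alpha\beta}\,E^\alpha_a E^\beta_b = (h_{\alpha\beta} + \epsilon N_\alpha N_\beta)E^\alpha_a E^\beta_b = h_{\alpha\beta}\,E^\alpha_a E^\beta_b \,, \tag{15.6} $$
where the 2nd equality follows from the fact that $N_\alpha$ is orthogonal to the $E^\alpha_a$.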
Remarks:
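$h_{\alpha\beta}(x)$ as a matrix is degenerate (as it has the null vector $N^\alpha$), while $h_{ab}$ is
non-degenerate;
$h_{\alpha\beta}$ is an M-tensor of type (0, 2) and a $\Sigma$-scalar while $h_{ab}$ is an M-scalar and
a $\Sigma$-tensor of type (0, 2).
$$ h^{\alpha\beta} = g^{\alpha\gamma}g^{\beta\delta}h_{\gamma\delta} \,, \tag{15.7} $$
this $h^{\alpha\beta}$ is not the inverse of the induced metric $h_{\alpha\beta}$ (indeed, as we just noted,
$h_{\alpha\beta}$ does not even have an inverse). Rather, one finds
$$ h^{\alpha\beta}h_{\beta\gamma} = \delta^\alpha_\gamma - \epsilon N^\alpha N_\gamma \,, \tag{15.8} $$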
3. We see in this example (and we will see and make use of this more generally
below) that on covariant tensors that are orthogonal to $N^\alpha$ in the sense that any
contraction with $N^\alpha$ is zero, we can convert space-time indices to hypersurface
indices using the $E^\alpha_a$, i.e. we can convert such tensors into tensors on $\Sigma$. For such
tangential space-time tensors this conversion does not lose any information (i.e.
one is not throwing away any components).
$$ h^{\alpha\beta} = h^{ab}\,E^\alpha_a E^\beta_b \tag{15.10} $$
5. Using this expression for $h^{\alpha\beta}$, we can write (and interpret) the defining relation
$h^{\alpha\beta} = g^{\alpha\beta} - \epsilon N^\alpha N^\beta$ for $h^{\alpha\beta}$ as a completeness relation for the linearly independent
vectors $N^\alpha$ and $E^\alpha_a$, namely
$$ g^{\alpha\beta} = h^{ab}\,E^\alpha_a E^\beta_b + \epsilon N^\alpha N^\beta \,. \tag{15.11} $$
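The tensor $h_{\alpha\beta}$ also provides us with the projectors allowing us to project a general
tensor onto its tangential components to $\Sigma$. Indeed, first of all we can reinterpret the
result (15.8) as the statement that the tensors
$$ h^\alpha{}_\beta = g^{\alpha\gamma}h_{\gamma\beta} = \delta^\alpha_\beta - \epsilon N^\alpha N_\beta \tag{15.12} $$
$$ \begin{aligned} V^\alpha N_\alpha = 0 \quad&\Rightarrow\quad \epsilon N^\alpha N_\beta V^\beta = 0 \\ V^\alpha = f N^\alpha \quad&\Rightarrow\quad \epsilon N^\alpha N_\beta V^\beta = f\epsilon N^\alpha (N_\beta N^\beta) = V^\alpha \,. \end{aligned} \tag{15.15} $$
These projectors now allow one to map / project an arbitrary covariant or contravariant
space-time tensor field onto its components (co-)tangent to $\Sigma$:
$$ V^\alpha \mapsto v^\alpha = h^\alpha{}_\beta V^\beta \,, \qquad v^\alpha N_\alpha = 0 \,. \tag{15.16} $$
$$ v^\alpha = E^\alpha_a v^a \,. \tag{15.17} $$
$$ B_{\alpha\beta} \mapsto b_{\alpha\beta} \equiv h^\gamma{}_\alpha h^\delta{}_\beta B_{\gamma\delta} \,, \tag{15.18} $$
where $b_{\alpha\beta}$ satisfies
$$ \begin{aligned} V^\alpha N_\alpha = W^\alpha N_\alpha = 0 \quad&\Rightarrow\quad b_{\alpha\beta}V^\alpha W^\beta = B_{\alpha\beta}V^\alpha W^\beta \\ V^\alpha \propto N^\alpha \quad&\Rightarrow\quad b_{\alpha\beta}V^\beta = 0 \,. \end{aligned} \tag{15.19} $$
$$ b_{ab} = E^\alpha_a E^\beta_b\, b_{\alpha\beta} \,. \tag{15.20} $$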
We see that the projection procedure is quite straightforward and simple in terms of the
normal vector provided by the defining function S of an embedded hypersurface. Using
(15.10), we can also write and interpret this projection in terms of the embedding data
$x^\alpha(y^a)$, in particular the $E^\alpha_a$ and the induced metric $h_{ab}$. Let us take a look at that in
the case of a vector field. Then we can write the projection (15.16) as
$$ v^\alpha = h^\alpha{}_\beta V^\beta = h^{\alpha\beta}g_{\beta\gamma}V^\gamma = E^\alpha_a h^{ab} E^\beta_b\, g_{\beta\gamma}V^\gamma \,. \tag{15.21} $$
Taking this apart, we see that from the embedding point of view the projection procedure
(which is a single step procedure when expressed in terms of $h^\alpha{}_\beta$) consists of the following
sequence of steps:
use the space-time metric $g_{\alpha\beta}$ to convert the vector field $V^\alpha$ into the covector field
$V_\alpha$,
$$ V_\alpha = g_{\alpha\beta}V^\beta \tag{15.22} $$
use $E^\beta_b$ to pull this back to a covector field $v_b$ on $\Sigma$,
$$ v_b = E^\beta_b V_\beta \tag{15.23} $$
use the inverse $h^{ab}$ of the induced metric to turn this into a vector field $v^a$ on $\Sigma$,
$$ v^a = h^{ab}v_b \tag{15.24} $$
Finally use $E^\alpha_a$ to push this forward to a tangent vector field $v^\alpha$ on the image
$\Phi(\Sigma) \subset M$,
$$ v^\alpha = E^\alpha_a v^a \,. \tag{15.25} $$
This is a perfectly logical sequence of operations, but you may now understand why I
said in section 14.3 that I consider this projection procedure to be somewhat simpler
and more transparent from the embedded hypersurface point of view.
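The equivalence of the two routes in (15.21) can be checked numerically in a simple case. The Python sketch below does this for the sphere $r = r_0$ in flat $\mathbb{R}^3$ with Cartesian ambient coordinates, where $\epsilon = +1$, $N^\alpha = x^\alpha/r$ and index positions on ambient vectors do not matter. The function names and the sample values are hypothetical and chosen only for this illustration.

```python
import math

def project_with_normal(V, N, eps=1.0):
    """Single-step projection v^a = (delta^a_b - eps N^a N_b) V^b, cf. (15.16)."""
    NV = sum(n * v for n, v in zip(N, V))
    return [v - eps * n * NV for v, n in zip(V, N)]

def project_with_embedding(V, theta, phi, r0):
    """Four-step projection (15.22)-(15.25) for the sphere r = r0 in flat R^3."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    # E[alpha][a] = d x^alpha / d y^a for y = (theta, phi)
    E = [[r0 * ct * cp, -r0 * st * sp],
         [r0 * ct * sp,  r0 * st * cp],
         [-r0 * st,      0.0]]
    V_low = list(V)                                   # (15.22): flat metric, trivial
    v_low = [sum(E[al][a] * V_low[al] for al in range(3))
             for a in range(2)]                       # (15.23): pull back
    h_inv = [1.0 / r0**2, 1.0 / (r0 * st)**2]         # inverse of diag(r0^2, r0^2 sin^2)
    v_up = [h_inv[a] * v_low[a] for a in range(2)]    # (15.24): raise with h^{ab}
    return [sum(E[al][a] * v_up[a] for a in range(2))
            for al in range(3)]                       # (15.25): push forward

if __name__ == "__main__":
    theta, phi, r0 = 0.7, 1.1, 2.0
    N = [math.sin(theta) * math.cos(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(theta)]                             # unit normal at that point
    V = [1.0, -2.0, 0.5]                              # arbitrary ambient vector
    print(project_with_normal(V, N))
    print(project_with_embedding(V, theta, phi, r0))
```

Both functions return the same tangential vector, but the second needs the Jacobian and the inverse induced metric, which is the extra work that the remark about the two points of view refers to.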
Given the induced metric $h_{ab}$ on $\Sigma$, one has the associated canonical Levi-Civita
covariant derivative (i.e. the unique torsion-free metric-compatible connection) at one's
disposal to define covariant derivatives of $\Sigma$-tensor fields. Let us temporarily denote
this intrinsic covariant derivative by $\nabla^{(h)}_a$, so that e.g.
where
$$ \Gamma^{(h)\,b}{}_{ac} = \tfrac{1}{2} h^{bd}(\partial_c h_{ad} + \partial_a h_{cd} - \partial_d h_{ac}) \,. \tag{15.27} $$
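On the other hand, given a space-time vector field $V^\alpha$ that is tangent to $\Sigma$ (on $\Sigma$), i.e.
$$ v^\alpha \equiv h^\alpha{}_\beta V^\beta = V^\alpha \quad\Leftrightarrow\quad V^\alpha = E^\alpha_a v^a \,, \tag{15.28} $$
we can define its projected covariant derivative along $\Sigma$ by taking its covariant derivative
and then projecting it to $\Sigma$. Let us denote this covariant derivative by $\bar\nabla_\alpha$, so that e.g.
$$ \bar\nabla_\alpha v_\beta = h^\gamma{}_\alpha h^\delta{}_\beta \nabla_\gamma v_\delta \,. \tag{15.29} $$
Since this is now a projected tensor, it can be pulled back without loss of information
to $\Sigma$, i.e. pragmatically speaking we can convert its indices using $E^\alpha_a$,
$$ (\bar\nabla v)_{ab} = E^\alpha_a E^\beta_b\, \bar\nabla_\alpha v_\beta \,. \tag{15.30} $$
Given that we now appear to have two natural notions of differentiation of $\Sigma$-tensor
fields, the obvious question that arises at this point is what is the relation between the
two, and the (reassuring, and perhaps not too surprising) answer is that they are equal,
$$ E^\alpha_a E^\beta_b\, \bar\nabla_\alpha v_\beta = \nabla^{(h)}_a v_b \,. \tag{15.31} $$
The quickest way to see this is to prove that the projected covariant derivative is symmetric
(torsion-free, covariant derivatives commute on scalars) and compatible with the
induced metric. The first property is obvious since
$$ [\bar\nabla_\alpha, \bar\nabla_\beta] f = h^\gamma{}_\alpha h^\delta{}_\beta [\nabla_\gamma, \nabla_\delta] f = 0 \,, \tag{15.32} $$
$$ h_{\alpha\beta} = g_{\alpha\beta} - \epsilon N_\alpha N_\beta \quad\Rightarrow\quad \nabla_\gamma h_{\alpha\beta} = -\epsilon\left((\nabla_\gamma N_\alpha)N_\beta + N_\alpha(\nabla_\gamma N_\beta)\right) \tag{15.33} $$
since this expression vanishes after projection into the directions orthogonal to $N^\alpha$.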
Thus for projected tensors the projected covariant derivative is equal to the intrinsic
covariant derivative (up to pull-back), and the obvious next questions are e.g. what
are the normal components of the covariant derivative of a projected tensor? or what
are the projections of the covariant derivative of a non-tangential tensor, i.e. a tensor
with a normal component? These are legitimate and interesting questions. However,
they go beyond the intrinsic geometry of hypersurfaces and bring us into the realm of
extrinsic geometry, a subject that will be addressed (briefly) in section 17.
Let $\Sigma$ be a non-null hypersurface, with local coordinates $y^a$, and $h_{ab}$ a metric on $\Sigma$, e.g.
the metric induced by a metric $g_{\alpha\beta}$ on the ambient space-time M. Then $\sqrt{h}\,d^n y$, with
is an invariant volume element on $\Sigma$ and integration of $\Sigma$-scalars f can be defined by
$$ \int_\Sigma f := \int \sqrt{h}\,d^n y\; f(y) \,. \tag{15.35} $$
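As a minimal illustration of (15.35), the following Python sketch integrates a $\Sigma$-scalar over a two-dimensional coordinate patch with a simple midpoint rule. Applied to the sphere of radius $r_0$, with $\sqrt{h} = r_0^2\sin\theta$ and f = 1, it should reproduce the area $4\pi r_0^2$ up to discretisation error. The function name `integrate_over_surface`, the step count and the radius are assumptions made for this example.

```python
import math

def integrate_over_surface(f, sqrt_h, ranges, steps=200):
    """Midpoint-rule approximation of int sqrt(h) d^2y f(y) over a rectangular patch.

    f and sqrt_h are functions of the two hypersurface coordinates (y^1, y^2);
    ranges = ((a0, a1), (b0, b1)) are the coordinate ranges.
    """
    (a0, a1), (b0, b1) = ranges
    da = (a1 - a0) / steps
    db = (b1 - b0) / steps
    total = 0.0
    for i in range(steps):
        ya = a0 + (i + 0.5) * da
        for j in range(steps):
            yb = b0 + (j + 0.5) * db
            total += sqrt_h(ya, yb) * f(ya, yb)
    return total * da * db

if __name__ == "__main__":
    r0 = 2.0
    area = integrate_over_surface(
        lambda theta, phi: 1.0,                       # the scalar f = 1
        lambda theta, phi: r0**2 * math.sin(theta),   # sqrt(det h_ab) on the sphere
        ((0.0, math.pi), (0.0, 2.0 * math.pi)))
    print(area, 4.0 * math.pi * r0**2)
```

The value of the integral does not depend on the coordinates chosen on the sphere, which is the content of the statement that $\sqrt{h}\,d^n y$ is an invariant volume element.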
Integrals over hypersurfaces arise in particular from applications of the Gauss theorem
(or Gauss-Stokes theorem) which allows one to express the volume integral over
some space-time region V of a covariant divergence as an integral over the boundary
hypersurface
$$ \Sigma = \partial V \tag{15.36} $$
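Moreover, by the usual cofactor / minor formula for the components of the inverse
metric (4.78), one has
$$ g^{SS} = \frac{\det h_{ik}}{\det g_{\alpha\beta}} = \det(h_{ik})/g \,, \tag{15.42} $$
where $h_{ik} = g_{ik}$ refers to the (ik)-components of the induced metric $h_{\alpha\beta}$ in the adapted
coordinates $(S, x^i)$,
$$ g_{\alpha\beta} = h_{\alpha\beta} + \epsilon N_\alpha N_\beta \quad\Rightarrow\quad g_{ik} = h_{ik} \,. \tag{15.43} $$
Therefore we can write the factor $\sqrt{g}\,g^{\alpha S}$ appearing in (15.39) as
$$ \sqrt{g}\,g^{\alpha S} = \epsilon\sqrt{g}\,N^\alpha\sqrt{|g^{SS}|} = \epsilon\sqrt{|\det(h_{ik})|}\,N^\alpha \,. \tag{15.44} $$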
so that we can also write the Gauss theorem in the convenient and transparent form
$$ \int_V \sqrt{g}\,d^D x\; \nabla_\alpha J^\alpha = \epsilon\int_{\Sigma=\partial V} d^n y\,\sqrt{h}\; N_\alpha J^\alpha \equiv \int_\Sigma d\Sigma_\alpha\, J^\alpha \,. \tag{15.47} $$
$$ \partial V = \Sigma_1 - \Sigma_0 \tag{15.48} $$
(the minus sign indicating that we equip $\Sigma_0$ with the opposite orientation to that induced
by V so that e.g. both surfaces $\Sigma_k$ have future-pointing normal vectors). Then one finds
that
$$ Q_1 - Q_0 = \int_{\Sigma_1} d\Sigma_\alpha\, J^\alpha - \int_{\Sigma_0} d\Sigma_\alpha\, J^\alpha = \int_V \sqrt{g}\,d^D x\; \nabla_\alpha J^\alpha = 0 \,, \tag{15.49} $$
so that (under suitable asymptotic conditions) covariantly conserved currents will lead
to conserved charges. Analogously, and somewhat more generally, this shows that if
one has a family $\Sigma_c$ of hypersurfaces, sweeping out a space-time volume $V = \cup_c \Sigma_c$, the
integral
$$ Q_c = \int_{\Sigma_c} d\Sigma_\alpha\, J^\alpha \tag{15.50} $$
is independent of c, i.e. the charge is invariant under deformations of the hypersurface.
One common instance where the issue of hypersurface orthogonality discussed in section
14.5 plays a crucial role is in the distinction between what are known as stationary
metrics (or space-times, or gravitational fields) versus static metrics (or space-times, or
gravitational fields). Both terms refer to gravitational fields that are in a suitable sense
time-independent, but static is a stronger condition than stationary.
I used the word static in connection with the metric (2.88),
while in the discussion of the Newtonian limit of the geodesic equation in section 2.7 I
used the term stationary to refer to the condition (2.111) that the coefficients of the
metric be time-independent. In general, we will define a metric to be stationary if it has
a time-translation invariance, in the sense that one can find coordinates $x^\alpha = (t, x^k)$,
say, such that $\xi = \partial_t$ is timelike and that none of the coefficients $g_{\alpha\beta}$ of the metric
depend on t,
$$ \text{Stationary Metric:} \quad \partial_t g_{\alpha\beta} = 0 \,. \tag{15.52} $$
Thus the general form of a stationary metric, without assuming the existence of any
further symmetries, is just
This can be stated in a geometrically more invariant way as the condition that the metric
admits a timelike Killing vector (cf. the discussion in sections 2.6 and 8.5). Locally,
one can then always introduce coordinates such that the Killing vector has the form
$\xi = \partial_t$ (see the discussion after (8.57)), so that in these coordinates the symmetry is that
of t-translation invariance, as in (15.52). For present purposes this locally equivalent
characterisation of the existence of a time-translation symmetry is good enough.
It will nevertheless be useful (even for present purposes) to be able to write the condition
(15.52) in a somewhat more covariant form. To that end, note that for a vector field of
the form $\xi = \partial_t$ one has (repeating the calculation leading to (4.68))
$$ \xi^\alpha = \delta^\alpha_t \quad\Rightarrow\quad \nabla_\alpha\xi_\beta = \Gamma_{\beta\alpha t} \quad\Rightarrow\quad \nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha = \partial_t g_{\alpha\beta} \tag{15.54} $$
Thus we find that the fact that the metric is t-translation invariant can be characterised
covariantly as the statement that $\xi = \partial_t$ satisfies
$$ \partial_t g_{\alpha\beta} = 0 \quad\Leftrightarrow\quad \nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha = 0 \,. \tag{15.55} $$
The metric (15.51) of course has the property that all the metric coefficients are
t-independent so it is certainly stationary, but it also has the further property that $\xi = \partial_t$
is hypersurface orthogonal. Indeed, in this case $\xi = \partial_t$ is evidently normal to the
constant time hypersurfaces $t - t_0 = 0$.
This need not be the case, however. To set the stage, consider an arbitrary metric
written in coordinates $x^\alpha = (t, x^k)$ and let $\xi$ be the vector $\xi = \partial_t$, i.e. $\xi^\alpha = \delta^\alpha_t$. Then
its metric dual covector has the covariant components
$$ \xi^\alpha = \delta^\alpha_t \quad\Rightarrow\quad \xi_\alpha = g_{\alpha\beta}\xi^\beta = g_{\alpha t} \,. \tag{15.56} $$
In particular, $\xi_t$ is also the norm-squared of $\xi$,
$$ \xi_t = g_{tt} = g_{\alpha\beta}\xi^\alpha\xi^\beta = \xi^\alpha\xi_\alpha \,, \tag{15.57} $$
gtt t f S , (15.61)
In general, a static metric is by definition a metric which is stationary and which is such
that the vector field $\xi = \partial_t$ generating the time-translation symmetry is hypersurface
orthogonal,
The converse to this is also true, i.e. given a stationary metric such that $\xi = \partial_t$ is
hypersurface orthogonal, one can find a coordinate transformation (really just $t \to T$),
such that $\partial_t = \partial_T$, $\partial_T g_{\alpha\beta} = 0$, and such that $g_{Tk} = 0$ so that $\partial_t$ is manifestly orthogonal
to the surfaces of constant T.
I will give two proofs of this, one using the integrated version $\xi_\alpha = f\,\partial_\alpha S$ of the
hypersurface orthogonality condition, and the other using the integrability condition
(14.55) for hypersurface orthogonality, in conjunction with the covariant characterisation
(15.55) of a stationary metric.
1. The first proof is somewhat pedestrian and not particularly elegant but has the
virtue that it is clear from the beginning where one wants to go and what one is
doing to get there. We begin with the hypersurface orthogonality condition in the
form
!
= gt = f S . (15.64)
329
In particular,
= t = gtt = f t S 6= 0 . (15.65)
Now, since $\partial_t S \neq 0$, $S$ has to depend on $t$, but this dependence needs to drop out
of the ratio $\partial_k S/\partial_t S$. This implies that, as far as its $t$-dependence is concerned, $S$
is a linear function of $t$ with constant coefficients,
$\partial_t (f\,\partial_t S) = 0 \;\Rightarrow\; \partial_t f = 0$ . (15.68)
Without loss of generality we can assume that $b = 0$ (either because $\xi_\alpha$ only depends
on $\partial_\alpha S$, or by absorbing it into $s(x^k)$), and that $a = 1$ (by absorbing the constant
$a$ into $f$, say). Thus we can assume that $S(x^\alpha)$ has the form
and
$g_{tk}/g_{tt} = \partial_k S$ . (15.71)
This again strongly suggests that the right thing to do is to introduce a new
time-coordinate $T$ through
with
$\partial_t S = 1 \;\Rightarrow\; \partial_T = \partial_t$ . (15.74)
Then the metric is
$ds^2 = f(x^k)\,dT^2 + (g_{ik}(x^k) - f(x^k)\,\partial_i s(x^k)\,\partial_k s(x^k))\,dx^i dx^k$ (15.75)
with
$\partial_T g_{\alpha\beta} = \partial_t g_{\alpha\beta} = 0$ , $g_{kT} = 0$ . (15.76)
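2. For the second proof, we start with the integrability condition (14.55). Using
the fact that $\nabla_\alpha\xi_\beta$ is anti-symmetric, we can write it in a way which makes its
anti-symmetry in the indices $(\alpha, \beta)$ manifest,
$\xi_\alpha\nabla_\beta\xi_\gamma - \xi_\beta\nabla_\alpha\xi_\gamma + \tfrac{1}{2}\xi_\gamma(\nabla_\alpha\xi_\beta - \nabla_\beta\xi_\alpha) = 0$ . (15.77)
Contracting this expression with $\xi^\gamma$, and using the abbreviation $\xi^2 = \xi^\gamma\xi_\gamma$ for the
norm of $\xi$, one deduces
$\xi_\alpha\partial_\beta(\xi^2) - \xi_\beta\partial_\alpha(\xi^2) + \xi^2(\partial_\alpha\xi_\beta - \partial_\beta\xi_\alpha) = 0$ , (15.78)
$\partial_\alpha(\xi_\beta/\xi^2) - \partial_\beta(\xi_\alpha/\xi^2) = 0$ . (15.79)
$\xi_\alpha = \xi^2\,\partial_\alpha S$ (15.80)
$\xi_t = \xi^2 \;\Rightarrow\; \partial_t S = 1$ , (15.81)
a condition which thus arises here seemingly in a somewhat different way than
before. We can then deduce that $S$ is of the form
so that
$\xi_k = g_{kt} = \xi^2\,\partial_k S = g_{tt}\,\partial_k s$ . (15.83)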
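Then we can deduce that $\partial_T = \partial_t$ and that in the new coordinates the off-diagonal
component of the metric $g_{KT}$ is
$g_{KT} = g_{\alpha\beta}\,\frac{\partial x^\alpha}{\partial x^K}\frac{\partial x^\beta}{\partial T} = g_{\alpha t}\,\frac{\partial x^\alpha}{\partial x^K} = g_{kt} - g_{tt}\,\partial_k s = 0$ . (15.85)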
Either way we have shown that the general form of a static metric, without assuming
the existence of any further symmetries, can be chosen to be of the block-diagonal form
Static Metric: $ds^2 = g_{tt}(x^k)\,dt^2 + g_{ik}(x^k)\,dx^i dx^k$ ($x^\alpha = (t, x^k)$) . (15.86)
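The coordinate change used in both proofs can be illustrated on a toy example. The following is a minimal sketch (Python, standard library only) for a hypothetical 1+1-dimensional stationary metric with $g_{tt} = f(x)$ and $g_{tx} = f(x)\,s'(x)$, as in (15.83); the particular functions `f`, `ds` and `g_xx` are illustrative assumptions and are not taken from the text. Transforming to $T = t + s(x)$ should remove the off-diagonal component and leave $g_{XX} = g_{xx} - f\,(s')^2$, in line with (15.75).

```python
import math

# Hypothetical sample functions (assumptions for illustration only).
def f(x):            # g_tt = f(x) < 0
    return -(1.0 + x * x)

def ds(x):           # derivative s'(x) of the shift s(x) = sin(x)
    return math.cos(x)

def g_xx(x):         # spatial component in the original coordinates
    return 2.0 + x ** 4

def metric_tx(x):
    """Components [[g_tt, g_tx], [g_xt, g_xx]] of a stationary metric whose
    Killing vector d/dt is hypersurface orthogonal: g_tx = g_tt * s'(x)."""
    return [[f(x), f(x) * ds(x)],
            [f(x) * ds(x), g_xx(x)]]

def metric_TX(x):
    """The same metric in coordinates (T, X), where t = T - s(X) and x = X."""
    g = metric_tx(x)
    # Jacobian J[old][new] = d(old coordinate) / d(new coordinate)
    J = [[1.0, -ds(x)],   # dt/dT, dt/dX
         [0.0, 1.0]]      # dx/dT, dx/dX
    return [[sum(g[a][b] * J[a][A] * J[b][B]
                 for a in range(2) for b in range(2))
             for B in range(2)] for A in range(2)]

if __name__ == "__main__":
    print(metric_TX(0.7))   # off-diagonal entries should vanish
```

The transformed metric is block-diagonal for any choice of the sample functions, because only the relation $g_{tx} = g_{tt}\,\partial_x s$ is used.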
Remarks:
1. We will see in section 23.2 that a stationary and spherically symmetric metric is
automatically static. This follows easily from the fact that for a stationary metric,
and in spherical symmetry, in coordinates $(t, r, \theta, \phi)$ suitable for expressing both
these facts, the only allowed off-diagonal $g_{tk}$-term of the metric is $C(r) = g_{tr}(r)$,
so that the $(t, r)$-part of the metric takes the form
whose geodesics were discussed in section 2.4, are the special case of static metrics
for which the norm of $\xi = \partial_t$ is constant.
3. We see from a comparison of (15.53) and (15.86) that an equivalent way of
characterising static metrics is that they are invariant under time translations
(stationary) and invariant under time reflections $t \to -t$ or $t \to c - t$
5. The prime example of a stationary but not static metric is precisely of this kind.
This is the Kerr metric describing the gravitational field outside a rotating star
(or black hole), briefly mentioned in section 29.1. This solution is stationary and
axially symmetric (around the axis of rotation), but it turns out that the
space-time is distorted in the direction of the rotation. In suitable coordinates $(t, r, \theta, \phi)$
this manifests itself in the fact that the metric coefficients are independent of $t$
(stationarity) and $\phi$ (axial symmetry), but do depend not only on $r$ but also on $\theta$.
Thus, in agreement with the remark above, the metric is not spherically symmetric.
Under $t \to -t$, the sense of rotation is changed and the corresponding metric
cannot be invariant under this operation because the gravitational field is now
distorted in the opposite angular direction. In fact, in these coordinates the metric
turns out to have a non-vanishing $g_{t\phi}(r, \theta)$, and $g_{t\phi} \to -g_{t\phi}$ under $t \to -t$.
16 Hypersurfaces III: Intrinsic Geometry of Null Hypersurfaces
The scalar product between a null and a timelike vector is always non-zero, because it
picks out the time-component of the null vector. Thus we also learn that a tangent
vector to a null hypersurface cannot be timelike and is thus either null or spacelike.
One consequence of this is also a converse to the above statement, namely that a null
tangent vector to a null hypersurface N is also normal to N ,
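or
$\ell^\alpha$ tangent to $N$ and $\ell^\alpha\ell_\alpha = 0$ $\;\Rightarrow\;$ $\ell^\alpha \propto \xi^\alpha$ . (16.3)
Intuitively, this is clear, because if $\ell^\alpha$ were null, tangent and not proportional to the
normal $\xi^\alpha$, there would be 2 linearly independent null vectors tangent to $N$, but for
that to be possible $N$ would need to be timelike.
$\ell^\alpha$ tangent to $N$ $\;\Rightarrow\;$ $\ell^\alpha = f\xi^\alpha + s^\alpha$ (16.4)
for some function $f$ and spacelike vector $s^\alpha$, and noting that the 2 requirements
tangent $\;\Rightarrow\;$ $\xi_\alpha s^\alpha \stackrel{!}{=} 0$ (16.5)
and
null $\;\Rightarrow\;$ $2f\,\xi_\alpha s^\alpha + s_\alpha s^\alpha \stackrel{!}{=} 0$ (16.6)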
Since in general for a null hypersurface one has $\xi^\alpha\xi_\alpha = 0$ for any normal vector $\xi^\alpha$,
we cannot normalise it as in the spacelike or timelike case. However, given the defining
function $S$, a convenient and natural choice for a normal vector is
$\ell^\alpha = -\nabla^\alpha S$ , (16.7)
where the sign has been chosen in such a way that $\ell^\alpha$ is future-oriented for a function
$S$ that increases towards the future (for an illustration of this see the examples below
where $S = t - x$ or $S = t - r$ have this property). All other normal vectors are then of
the form
$\xi^\alpha = f\ell^\alpha$ (16.8)
Examples:
One could have also made a more symmetric choice for $(u, v)$, of course; this is
irrelevant for what follows and the present asymmetric choice just serves to avoid
some other factors of 2 further down. The signs of the lightcone coordinates have
been chosen in such a way that the null vector fields $\partial_u$ and $\partial_v$ are future-oriented
(i.e. $u$ and $v$ grow with increasing $t$).
Now consider the family of hypersurfaces (straight lines) defined by
On the one hand, the complementary null coordinate $v$ provides a good
coordinate on each null line $u = \mathrm{const}$. On the other hand, a normal vector to this
hypersurface is
$\ell^\alpha = -g^{\alpha\beta}\partial_\beta S = -g^{\alpha\beta}\partial_\beta u = -g^{\alpha u}$ (16.11)
i.e.
$\ell = \ell^\alpha\partial_\alpha = -g^{\alpha u}\partial_\alpha = +\partial_v$ . (16.12)
We see that
2. If we add further spatial directions $(y, z)$, then the null hypersurface (hyperplane
in this case) $u = \mathrm{const}$. would be parametrised by the null coordinate $v$ and the
spatial coordinates $(y, z)$. A slightly more interesting higher-dimensional example
of a null surface is provided by introducing spherical coordinates for 4-dimensional
Minkowski space,
$ds^2 = -dt^2 + dr^2 + r^2 d\Omega^2$ , (16.13)
and replacing $S(t, x) = t - x$ by its 4-dimensional radial counterpart
$S(t, r, \theta, \phi) = t - r$ . (16.14)
Then $S = 0$ defines the future lightcone of the origin (see also example 3 in section
14.1), and it is clearly a null surface. Indeed, the normal vector $\ell^\alpha = -\nabla^\alpha S$ has
components
$(\ell^\alpha) = (\ell^t = 1, \ell^r = +1, \ell^\theta = 0, \ell^\phi = 0)$ $\;\Rightarrow\;$ $\ell^\alpha\ell_\alpha = 0$ . (16.16)
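The nullness of the lightcone normal can also be checked numerically. The sketch below (Python, standard library only) works in Cartesian coordinates on Minkowski space with signature $(-,+,+,+)$ and approximates $\ell^\alpha = -\eta^{\alpha\beta}\partial_\beta S$ for $S = t - r$ by central finite differences; the function names and the step size are illustrative choices, not part of the text.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]   # diagonal Minkowski metric, signature (-,+,+,+)

def S(p):
    """Defining function S = t - r in Cartesian coordinates p = (t, x, y, z)."""
    t, x, y, z = p
    return t - math.sqrt(x * x + y * y + z * z)

def gradient(func, p, h=1e-6):
    """Central finite-difference approximation of the partial derivatives of func at p."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        grad.append((func(up) - func(dn)) / (2.0 * h))
    return grad

def normal(p):
    """Components l^alpha = -eta^{alpha beta} d_beta S (diag ETA is its own inverse)."""
    dS = gradient(S, p)
    return [-ETA[i] * dS[i] for i in range(4)]

def norm_squared(v):
    """Minkowski norm-squared eta_{alpha beta} v^alpha v^beta."""
    return sum(ETA[i] * v[i] * v[i] for i in range(4))

if __name__ == "__main__":
    point = [3.0, 1.0, 2.0, 2.0]          # r = 3, so this point lies on S = 0
    print(normal(point), norm_squared(normal(point)))
```

Up to discretisation error the time component is $+1$, the spatial part is the outward radial unit vector, and the norm-squared vanishes, both on $S = 0$ and on the neighbouring surfaces $S = c$.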
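$\frac{dx^\alpha(\lambda)}{d\lambda} = \ell^\alpha(x(\lambda))$ , $x^\alpha(0) \in N$ , (16.17)
lie entirely in the null hypersurface $N$. These curves turn out to be null geodesics,
although not necessarily affinely parametrised,
$\ell^\alpha\nabla_\alpha\ell^\beta \propto \ell^\beta$ , (16.18)
and the same thing is true for any other choice of normal vector $\xi^\alpha$,
$\xi^\alpha\nabla_\alpha\xi^\beta \propto \xi^\beta$ . (16.19)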
3. and then for any null vector field satisfying the hypersurface orthogonality
condition (14.55).
Proof:
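1. Let $\ell_\alpha = -\partial_\alpha S$. Then
$\ell^\alpha\nabla_\alpha\ell_\beta = \ell^\alpha\nabla_\beta\ell_\alpha = \tfrac{1}{2}\partial_\beta(\ell^\alpha\ell_\alpha)$ . (16.20)
$\partial_\beta(\ell^\alpha\ell_\alpha) = 0$ on $N$
In this case clearly $\ell^\alpha$ is not only geodesic but even affinely parametrised,
$\ell^\alpha\nabla_\alpha\ell^\beta = 0$ . (16.21)
$\partial_\beta(\ell^\alpha\ell_\alpha) \neq 0$ on $N$
In that case $\partial_\beta(\ell^\alpha\ell_\alpha)$ is normal to $N$. Since it is normal to $N$, it is necessarily
proportional to the normal vector $\ell_\beta$,
$\partial_\beta(\ell^\alpha\ell_\alpha) \propto \ell_\beta$ , (16.22)
so that, by (16.20), $\ell^\alpha\nabla_\alpha\ell^\beta = \kappa\,\ell^\beta$
for some scalar function $\kappa(x)$ measuring the inaffinity (lack of affinity) of the
family of geodesics (geodesic congruence) defined by the normal vector field $\ell^\alpha$ (as
in (2.35) for a single geodesic curve).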
Remarks:
(a) The situation in the 1st case arises if S(x) = c defines a family of null
hypersurfaces, i.e. not just the surface N defined by S(x) = 0 is null but also
the surfaces $S(x) = c$ for $c$ in some interval around 0, because then $\ell^\alpha\ell_\alpha = 0$
not just on the surface S(x) = 0 but in a neighbourhood of N .
(b) In the 2nd case one could have chosen
S = (16.24)
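2. Now let $\xi^\alpha = f\ell^\alpha$ be any normal vector to $N$. To establish (16.19), it is sufficient
to note that
$\xi^\alpha\nabla_\alpha\xi^\beta = f\ell^\alpha\nabla_\alpha(f\ell^\beta)$
$= f(\ell^\alpha\partial_\alpha f)\ell^\beta + f^2\ell^\alpha\nabla_\alpha\ell^\beta$
$= (f\ell^\alpha\partial_\alpha f + f^2\kappa)\ell^\beta$
$= (\ell^\alpha\partial_\alpha f + f\kappa)\xi^\beta$ . (16.25)
Thus we have shown that $\xi^\alpha$ is geodesic iff $\ell^\alpha$ is geodesic and that the inaffinities
of $\ell^\alpha$ and $\xi^\alpha = f\ell^\alpha$ are related by
$\kappa_{\xi = f\ell} = f\kappa_\ell + \ell^\alpha\partial_\alpha f$ . (16.26)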
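3. Let us assume that we are given a hypersurface orthogonal vector field $\xi_\alpha$.
Explicitly, we can write the condition (14.55) as
$\xi_\alpha(\nabla_\beta\xi_\gamma - \nabla_\gamma\xi_\beta) + \xi_\gamma(\nabla_\alpha\xi_\beta - \nabla_\beta\xi_\alpha) + \xi_\beta(\nabla_\gamma\xi_\alpha - \nabla_\alpha\xi_\gamma) = 0$ . (16.28)
containing two kinds of terms. The 1st term is of the familiar type already dealt
with above. Either $\xi^\alpha\xi_\alpha = 0$ also off the surface, or $\partial_\beta(\xi^\alpha\xi_\alpha) \propto \xi_\beta$. Either way,
the 1st term is zero. We are thus left with the condition
$\xi_\alpha\,\xi^\gamma\nabla_\gamma\xi_\beta = \xi_\beta\,\xi^\gamma\nabla_\gamma\xi_\alpha$ . (16.30)
$V_\alpha W_\beta = V_\beta W_\alpha \;\Rightarrow\; W_\alpha = fV_\alpha$ (16.31)
In the case at hand, this implies
(16.30) $\;\Rightarrow\;$ $\xi^\gamma\nabla_\gamma\xi_\alpha \propto \xi_\alpha$ , (16.32)
which is precisely the statement we set out to prove, namely that on $N$ the null
normal vector field $\xi^\alpha$ is (possibly non-affinely) geodesic.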
Since any point on N lies on one of these null geodesics, one says that the null surface
is generated by these null geodesics. The null geodesics, in turn, are known as the null
generators of N .
Remarks:
1. Returning to the examples discussed at the beginning of this section, in both cases
the normal and geodesic null vector field $\ell = \partial_v$ is actually affinely parametrised,
$\ell = \partial_v \;\Rightarrow\; \ell^\alpha\nabla_\alpha\ell^\beta = \Gamma^\beta_{vv} = 0$ , (16.33)
i.e. the null coordinate $v$ is an affine parameter along these (right-moving, respectively
radial outgoing) null geodesics, and is thus e.g. a natural coordinate to use
on N . The reason one finds affinely parametrised geodesics in this case is that
S(u) = u = c defines a family of null hypersurfaces.
2. In this general context of null surfaces, the inaffinity associated with a particular
choice $\xi = f\ell$ of normal vector field has no particular significance since, as we
have seen, it can be changed at will by changing f .
3. However, these geodesics and their associated inaffinity acquire a particular impor-
tance when the normal vector field in question cannot be rescaled in an arbitrary
way by a scalar f .
This arises for example when one has a Killing vector $\xi$ that becomes normal
to some null hypersurface, and is thus in particular null there (this hypersurface
is then called a Killing horizon). Since $f(x)\xi$ will then not be a Killing vector
unless $f(x) = a$ is constant, the ambiguity in the inaffinity is greatly reduced, to
multiplication of $\kappa$ by a constant,
$\xi \to a\xi \;\Rightarrow\; \kappa \to a\kappa$ . (16.34)
16.3 Adapted Coordinates and Induced Metric for Null Hypersurfaces
Since in general the null geodesics which are the generators of a null surface are naturally
associated with the null surface, it is also convenient to adapt the coordinates $y^a$ on $N$
to $\ell$ by choosing the coordinates to be
$y^a = (v = \lambda, y^k)$ (16.35)
where $\lambda$ is the (not necessarily affine) parameter along the null geodesics and $y^k$ are
spatial coordinates labelling the individual null geodesics. In particular, therefore, the
y k are constant along the null geodesics and can be constructed e.g. from the constants
of motion or the constants of integration of the null geodesic equation.
In these coordinates, the tangent vectors $E_a^\alpha$ (14.16) to the null surface are
$E_v^\alpha = \frac{\partial x^\alpha}{\partial\lambda} = \ell^\alpha$ , $E_k^\alpha = \frac{\partial x^\alpha}{\partial y^k}$ (16.36)
$h_{vv} = g_{\alpha\beta}\ell^\alpha\ell^\beta = 0$ , $h_{vk} = g_{\alpha\beta}\ell^\alpha E_k^\beta = 0$ ,
$h_{km} \equiv s_{km} = g_{\alpha\beta}E_k^\alpha E_m^\beta$ , (16.37)
where $h_{vk} = 0$ follows because by construction the $E_k$ are tangent to the surface while by
definition $\ell$ is normal to the surface and therefore in particular normal to the tangent
vectors $E_k$.
Thus the metric is clearly degenerate (a characteristic feature of null surfaces) and the
line element takes the form
$ds^2|_N = s_{km}\,dy^k dy^m = g_{\alpha\beta}E_k^\alpha E_m^\beta\,dy^k dy^m$ . (16.38)
Note that this form of the metric is independent of whether one chooses $\lambda$ to be the
original (perhaps non-affine) parameter or the affine parameter, as this just amounts to
changing $\ell \to \xi = f\ell$ for a suitable choice of $f$, so that one still has $h_{vv} = h_{vk} = 0$.
as could also have been deduced directly by restricting the metric (16.15) to $u = t - r = 0$,
$(-2\,du\,dv + (v - u/2)^2 d\Omega^2)|_{u=0} = v^2 d\Omega^2$ . (16.40)
16.4 Projectors for Null Hypersurfaces
As in the case of non-null hypersurfaces, one can also study the induced metric from
the point of view of embedded hypersurfaces and projection operators. However, the
construction is somewhat different in this case because the tangent directions $E_a$ and
the normal direction $\ell$ are not independent. It is clear that, in order to e.g. have a
completeness relation akin to (15.11), we should adjoin to the spatial tangent directions
$E_k$ and the null tangent direction $\ell$ to the surface another linearly independent vector
which can conveniently be chosen to be a null vector $n$ on $N$, but of course not tangent
to $N$, such that
$n^\alpha n_\alpha = 0$ , $n_\alpha E_k^\alpha = 0$ , $n_\alpha\ell^\alpha \neq 0$ . (16.41)
We can always rescale $n$ in such a way that $n_\alpha\ell^\alpha = -1$, and this is a convenient choice
we will adopt in the following (the minus sign having been chosen such that $n$ is future
directed iff $\ell$ is future directed). Thus, given a choice of spatial basis vectors $E_k$, the
set-up is the set of vectors $\{\ell, n, E_k\}$ satisfying the relations
$n^\alpha n_\alpha = \ell^\alpha\ell_\alpha = 0$ , $n_\alpha E_k^\alpha = \ell_\alpha E_k^\alpha = 0$ , $n_\alpha\ell^\alpha = -1$ . (16.42)
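$g_{\alpha\beta} = s_{\alpha\beta} - (\ell_\alpha n_\beta + n_\alpha\ell_\beta)$ , (16.44)
or
$s_{\alpha\beta} = g_{\alpha\beta} + (\ell_\alpha n_\beta + n_\alpha\ell_\beta)$ . (16.45)
Note that $s_{\alpha\beta}$ is invariant under the boost (16.43). This tensor has the properties
$s_{\alpha\beta}\ell^\beta = s_{\alpha\beta}n^\beta = 0$ (16.46)
and
$V^\alpha\ell_\alpha = V^\alpha n_\alpha = 0 \;\Rightarrow\; g_{\alpha\beta}V^\beta = s_{\alpha\beta}V^\beta$ . (16.47)
It thus defines the induced metric in the directions orthogonal to $\ell$ and $n$, and is thus
the induced (degenerate, spatial) metric on the surface $N$, the properties
$s^{\alpha\beta} = E_k^\alpha E_m^\beta s^{km}$ , $s_{km} = E_k^\alpha E_m^\beta s_{\alpha\beta}$ (16.48)
being the analogues of (15.10) and (15.6) respectively. One also has
$g^{\alpha\beta} = s^{km}E_k^\alpha E_m^\beta - (\ell^\alpha n^\beta + n^\alpha\ell^\beta)$ , (16.49)
which is the null analogue of the completeness relation (15.11). The properties (16.46)
and (16.47) also imply
$g^{\alpha\beta}s_{\alpha\beta} = s^{\alpha\beta}s_{\alpha\beta} = s^\alpha_{\ \alpha} = n - 1$ . (16.50)
$s^\alpha_{\ \beta} = \delta^\alpha_{\ \beta} + (\ell^\alpha n_\beta + n^\alpha\ell_\beta)$ , $s^\alpha_{\ \beta}s^\beta_{\ \gamma} = s^\alpha_{\ \gamma}$ (16.51)
$s^\alpha_{\ \beta}\ell^\beta = s^\alpha_{\ \beta}n^\beta = 0$ , (16.52)
$v^\alpha = s^\alpha_{\ \beta}V^\beta \;\Rightarrow\; v^\alpha\ell_\alpha = v^\alpha n_\alpha = 0$ . (16.53)
$v^\alpha \neq 0 \;\Rightarrow\; v^\alpha v_\alpha > 0$ (16.54)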
A variant of the set-up in this section (in particular the auxiliary complementary null
vector n and the corresponding projectors) appeared in the discussion of the Ray-
chaudhuri equation for null geodesic congruences in section 11.4.
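The algebra of the transverse metric can be made concrete in flat space. The following sketch (Python, standard library only) picks, purely for illustration, the null vectors $\ell = (1, 1, 0, 0)$ and $n = \tfrac{1}{2}(1, -1, 0, 0)$ in 4-dimensional Minkowski space, which satisfy $n_\alpha\ell^\alpha = -1$, and builds $s_{\alpha\beta} = g_{\alpha\beta} + \ell_\alpha n_\beta + n_\alpha\ell_\beta$; the specific vectors and helper names are assumptions, not taken from the text.

```python
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]   # Minkowski metric, equal to its own inverse

def lower(v):
    """Lower the index of a vector with the Minkowski metric."""
    return [sum(ETA[a][b] * v[b] for b in range(4)) for a in range(4)]

def dot(u, v):
    """Scalar product eta_{ab} u^a v^b."""
    return sum(ETA[a][b] * u[a] * v[b] for a in range(4) for b in range(4))

ell = [1.0, 1.0, 0.0, 0.0]    # null vector l (illustrative choice)
n = [0.5, -0.5, 0.0, 0.0]     # auxiliary null vector with n.l = -1

def transverse_metric(l_vec, n_vec):
    """s_{ab} = g_{ab} + l_a n_b + n_a l_b for the given pair of null vectors."""
    l_low, n_low = lower(l_vec), lower(n_vec)
    return [[ETA[a][b] + l_low[a] * n_low[b] + n_low[a] * l_low[b]
             for b in range(4)] for a in range(4)]

def contract(s, v):
    """Components s_{ab} v^b."""
    return [sum(s[a][b] * v[b] for b in range(4)) for a in range(4)]

def trace(s):
    """Trace g^{ab} s_{ab}."""
    return sum(ETA[a][b] * s[a][b] for a in range(4) for b in range(4))

s = transverse_metric(ell, n)
```

For this choice $s_{\alpha\beta}$ is simply the flat metric on the $(y, z)$-plane: it annihilates both $\ell$ and $n$ and has trace 2, the number of transverse spatial directions.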
As an aside (a useful aside, though), note that this entire set-up can be phrased in
a somewhat more satisfactory manner in terms of an orthonormal basis or frame Ea
(section 3.8) rather than in terms of the basis Ek associated to the choice of coordinates
y k (similar remarks apply to the timelike case). Namely, by introducing suitable linear
combinations
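$E_a = E_a^{\ k}E_k$ (16.55)
$\{E_A\} = \{E_+ = \ell, E_- = n, E_a\}$ : $g_{\alpha\beta}E_A^\alpha E_B^\beta = \eta_{AB}$ (16.57)
with
$\eta_{++} = \eta_{--} = 0$ , $\eta_{+-} = -1$ , $\eta_{a+} = \eta_{a-} = 0$ , $\eta_{ab} = \delta_{ab}$ . (16.58)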
Then the boost (16.43) really is a Lorentz boost (in the tangent space), and the ambiguity
in the identification of a choice of null vector $n$ and complementary spatial directions
can be identified as the possibility to perform a null Lorentz rotation (11.81) around $\ell$,
$\ell \to \ell$ , $n \to n + \omega^a E_a + \tfrac{1}{2}\omega^2\ell$ , $E_a \to E_a + \omega_a\ell$ , (16.59)
where $\omega^2 = \delta_{ab}\omega^a\omega^b$. Note that this transformation leaves invariant $\ell$ and the
orthonormal frame counterpart
$n^\alpha n_\alpha = \ell^\alpha\ell_\alpha = 0$ , $n_\alpha E_a^\alpha = \ell_\alpha E_a^\alpha = 0$ , $n_\alpha\ell^\alpha = -1$ (16.60)
$(u', v', z'^a) = (u, v + \omega_a z^a + \tfrac{1}{2}\omega^2 u, z^a + \omega^a u)$ , (16.62)
17 Hypersurfaces IV: Extrinsic Geometry of non-Null Hypersurfaces
In this section we will briefly touch upon some aspects of extrinsic geometry, more
specifically of the extrinsic geometry of non-null hypersurfaces. One can also develop
the extrinsic geometry of null hypersurfaces and of surfaces of higher codimension, but
we will not do this here.
As mentioned before, the (local) intrinsic geometry of a space, i.e. the properties that
can be deduced by measuring lengths, areas, volumes, angles, performing parallel trans-
port etc in that space, is completely described by the metric and objects that can be
derived from it, like the Riemann curvature tensor. In particular, the intrinsic geometry
of a hypersurface $\Sigma$ is completely described by its metric, e.g. by the metric induced
on it by a metric on the ambient embedding space $M$.
However, for hypersurfaces there is another aspect of the geometry of $\Sigma$ beyond its
purely intrinsic geometry, namely how it is embedded into the ambient space $M$, i.e.
how it bends inside $M$. As one needs to be able to move off $\Sigma$ to even detect that there
is such an embedding, this aspect of the geometry is something that cannot be captured
by intrinsic measurements on $\Sigma$ alone, and is therefore known as the extrinsic geometry
of $\Sigma$.
Before developing this, let us look at some simple examples of embedded hypersurfaces:
1. Cylinder $C \subset \mathbb{R}^3$
For example, a cylinder
$C = \mathbb{R} \times S^1$ (17.1)
In this way it clearly inherits the flat metric from $\mathbb{R}^2$. This flat metric is the
induced metric on the cylinder when one embeds it in the standard way into $\mathbb{R}^3$.
Indeed, introducing cylindrical coordinates $(r, \phi, z)$ in $\mathbb{R}^3$, the metric on $\mathbb{R}^3$ takes
the form
$ds^2 = dr^2 + r^2 d\phi^2 + dz^2$ . (17.3)
and restricting to constant $r = L$, one obtains a cylinder with circumference $2\pi L$
and induced metric
$ds^2|_{r=L} = L^2 d\phi^2 + dz^2$ . (17.5)
Since the components of this metric are constant, the Christoffel symbols and the
curvature tensor are zero. Thus, the intrinsic curvature of the cylinder is zero, it is
flat (and locally looks just like Euclidean space). In particular, parallel transport
is rather obviously path independent. The fact that it looks curved to an outside
observer is therefore not something that can be detected by somebody performing
local measurements on the cylinder.
2. Torus $T^2 \subset \mathbb{R}^3$ and $T^2 \subset \mathbb{R}^4$
Let us now consider the 2-torus $T^2$. If one visualises it in the standard way as (the
surface of a doughnut) embedded in $\mathbb{R}^3$, then it inherits a non-flat metric from the
ambient flat metric on $\mathbb{R}^3$. To see this explicitly, place the torus around the origin
of the above cylindrical coordinates, i.e. such that its center is at $r = z = 0$
and such that it is invariant under rotations in $\phi$ around the $z$-axis. Fixing $\phi$,
the cross-section of the torus is a circle of radius $L_2$, say, centered at a distance
$r = L_1 > L_2$ from the origin at the point $(r = L_1, z = 0)$. Thus the points on this
circle, and therefore, by including $\phi$, the points on the $T^2$ are described by the
equation
$S(r, z, \phi) = z^2 + (r - L_1)^2 - L_2^2 = 0$ . (17.6)
It is thus intrinsically flat, but at the moment we have not embedded this flat
torus into any higher-dimensional space. It is not possible to embed T 2 into R3
in such a way that the induced metric is this flat metric, but it is easy to see that
it is possible to achieve this via an embedding into R4 .
Indeed let us introduce polar coordinates $(r_1, \phi_1)$ in the (12)-plane, and $(r_2, \phi_2)$
in the (34)-plane, so that the Euclidean metric on $\mathbb{R}^4$ has the form
Then all the lines of constant $(r_1, r_2, \phi_2)$ and of constant $(r_1, r_2, \phi_1)$ are circles in
orthogonal (12)- and (34)-planes in $\mathbb{R}^4$. Thus the surfaces of constant $r_1$ and $r_2$
are tori, and choosing $r_1 = L_1$ and $r_2 = L_2$, one finds that the metric induced
on this torus by the ambient flat metric on $\mathbb{R}^4$ is precisely the above flat metric
(17.8),
$ds^2|_{r_k=L_k} = (L_1)^2(d\phi_1)^2 + (L_2)^2(d\phi_2)^2$ . (17.11)
In a parametric description $x^\alpha(\phi_1, \phi_2)$, with respect to the Cartesian coordinates
$x^\alpha$ on $\mathbb{R}^4$, this Clifford embedding of $T^2$ into $\mathbb{R}^4$ is given by
with the standard non-flat induced metric on $S^3$ in turn inducing the flat metric
on the embedded $T^2$.
3. Circle $S^1 \subset \mathbb{R}^2$
In order to understand how to quantify that both for the flat cylinder and the flat
torus the extrinsic geometry is non-trivial it is sufficient to look at the simplest
possible lower-dimensional counterpart of this example, namely a 1-dimensional
closed space (a loop $S^1$) embedded into $\mathbb{R}^2$ as a circle of constant radius $L$.
This one-dimensional space is evidently intrinsically flat (because the Riemann
tensor vanishes identically in 1 dimension), but equally evidently the circle seems
to bend / curve around in 2 dimensions (in order to be able to form a circle in
the first place).
In order to quantify this somewhat, one possible strategy is to determine how the
(unit) normal vector $N = \partial_r$ to the circle changes as one moves along the circle.
The change in $\partial_r$ in the ambient space is given (in polar coordinates) by
$\nabla_\phi\partial_r = \Gamma^\phi_{\phi r}\,\partial_\phi = r^{-1}\,\partial_\phi$ . (17.14)
This vector is already tangent to the circle (had it not been, we could have now
projected it back), and the result can be written as
Alternatively, in order to explore how the embedded circle sits inside the ambient
geometry, one can study how the induced metric changes in the normal direction,
$\partial_r g_{\phi\phi}|_{r=L} = 2L$ . (17.16)
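The two torus examples above can be compared numerically. The sketch below (Python, standard library only) computes the induced metric $h_{ab} = \partial_a x \cdot \partial_b x$ by finite differences for two embeddings: the doughnut in $\mathbb{R}^3$, using the standard parametrisation $r = L_1 + L_2\cos\theta$, $z = L_2\sin\theta$ of the surface (17.6), and the Clifford torus in $\mathbb{R}^4$. The angle name `theta`, the sample radii and the helper names are assumptions made for the illustration.

```python
import math

def torus_r3(theta, phi, L1=2.0, L2=1.0):
    """Doughnut in R^3: a circle of radius L2 centred at r = L1, rotated around the z-axis."""
    rho = L1 + L2 * math.cos(theta)
    return [rho * math.cos(phi), rho * math.sin(phi), L2 * math.sin(theta)]

def clifford_r4(phi1, phi2, L1=2.0, L2=1.0):
    """Clifford torus in R^4: circles of radii L1 and L2 in the (12)- and (34)-planes."""
    return [L1 * math.cos(phi1), L1 * math.sin(phi1),
            L2 * math.cos(phi2), L2 * math.sin(phi2)]

def induced_metric(embed, u, v, h=1e-5):
    """2x2 induced metric h_ab = (d_a x).(d_b x), with central finite differences."""
    def tangent(i):
        plus, minus = [u, v], [u, v]
        plus[i] += h
        minus[i] -= h
        xp, xm = embed(*plus), embed(*minus)
        return [(a - b) / (2.0 * h) for a, b in zip(xp, xm)]
    e = [tangent(0), tangent(1)]
    return [[sum(a * b for a, b in zip(e[i], e[j])) for j in range(2)]
            for i in range(2)]

if __name__ == "__main__":
    print(induced_metric(torus_r3, 0.3, 1.1))     # depends on theta: not flat
    print(induced_metric(clifford_r4, 0.3, 1.1))  # constant diag(L1^2, L2^2): flat
```

For the doughnut the component $h_{\phi\phi} = (L_1 + L_2\cos\theta)^2$ varies over the surface, while for the Clifford torus both diagonal components are constant, as in (17.11).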
In order to capture this extrinsic aspect of the geometry in general, we are thus led to
define the extrinsic curvature of $\Sigma$ in $M$ either by
$K^{(1)}_{\alpha\beta} = h_\alpha^{\ \gamma}h_\beta^{\ \delta}\nabla_\gamma N_\delta$ (17.17)
Cooperatively and conveniently, the two tensors defined in (17.17) and (17.18) turn out
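to be identical in general. To see this, we first make use of the formula (8.36)
$L_N g_{\alpha\beta} = \nabla_\alpha N_\beta + \nabla_\beta N_\alpha$ (17.20)
for the Lie derivative of the metric to write $K^{(2)}_{\alpha\beta}$ as
$K^{(2)}_{\alpha\beta} = \tfrac{1}{2}h_\alpha^{\ \gamma}h_\beta^{\ \delta}(\nabla_\gamma N_\delta + \nabla_\delta N_\gamma)$ . (17.21)
This already resembles (17.17), apart from the explicit symmetrisation in (17.21). This
symmetrisation, however, is not necessary: since $N_\alpha$ is by definition hypersurface
orthogonal, its anti-symmetrised derivative satisfies (14.52)
$\nabla_{[\alpha}N_{\beta]} = V_{[\alpha}N_{\beta]}$ (17.22)
for some (gradient) vector $V_\alpha$, and therefore the tangential projection of the
anti-symmetric part of $\nabla_\alpha N_\beta$ is equal to zero, and we can simplify (17.21) to
$K^{(2)}_{\alpha\beta} = h_\alpha^{\ \gamma}h_\beta^{\ \delta}\nabla_\gamma N_\delta = K^{(1)}_{\alpha\beta}$ . (17.23)
We can therefore drop the labels on $K$ and define the (symmetric) extrinsic curvature
tensor $K_{\alpha\beta}$ by
$K_{\alpha\beta} = \tfrac{1}{2}h_\alpha^{\ \gamma}h_\beta^{\ \delta}L_N g_{\gamma\delta} = h_\alpha^{\ \gamma}h_\beta^{\ \delta}\nabla_\gamma N_\delta = K_{\beta\alpha}$ . (17.24)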
Remarks:
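1. Due to the tangential projections, this definition is independent of how the normal
vector $N^\alpha$ is extended off the hypersurface $\Sigma$. If it is extended in such a way
that $N^\alpha N_\alpha = \epsilon$ also off $\Sigma$ (e.g. if $N^\alpha$ is the normal vector field to a family of
hypersurfaces), then $(\nabla_\alpha N_\beta)N^\beta = 0$ and the 2nd projection in the above definition
is unnecessary. In that case one finds
$K_{\alpha\beta} = h_\alpha^{\ \gamma}\nabla_\gamma N_\beta = \nabla_\alpha N_\beta - \epsilon N_\alpha a_\beta$ (17.25)
where
$a_\beta = N^\gamma\nabla_\gamma N_\beta$ (17.26)
is the acceleration of $N^\alpha$, and the 2nd term is simply there to subtract this
normal component of the 1st term. In particular if, as suggested in section 14.4,
$N^\alpha$ is extended off the hypersurface as an affinely parametrised geodesic vector
field, one simply has $K_{\alpha\beta} = \nabla_\alpha N_\beta$.
$E_a^\alpha E_b^\beta\nabla_\alpha N_\beta = -E_a^\alpha(\nabla_\alpha E_b^\beta)N_\beta$
$= -(\partial_a E_b^\beta + E_a^\alpha\Gamma^\beta_{\alpha\gamma}E_b^\gamma)N_\beta$ (17.29)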
3. The induced metric $h_{ab}$ and the extrinsic curvature tensor $K_{ab}$ (or equivalently
$h_{\alpha\beta}$ and $K_{\alpha\beta}$) are also known as the 1st fundamental form and 2nd fundamental
form of $\Sigma$ respectively.
4. Writing the hypersurface orthogonal $N_\alpha$ as
$N_\alpha = f\,\partial_\alpha S$ (17.31)
$\nabla_\alpha N_\beta = (\partial_\alpha f/f)N_\beta + f(\partial_\alpha\partial_\beta S - \Gamma^\gamma_{\alpha\beta}\partial_\gamma S)$ . (17.32)
The first term is killed by the tangential projection, and we see that the remaining
second term is manifestly symmetric. In adapted coordinates, i.e. choosing $S$ to
be one of the coordinates, one evidently has $\partial_\alpha\partial_\beta S = 0$, and therefore
5. The extrinsic curvature also depends on a choice of orientation convention for the
normal vector (such as inward pointing versus outward pointing in situations
where this makes sense). When one has several boundary components, some of
them timelike and some of them spacelike, say, each one with its own extrinsic
curvature tensor, sorting out one's signs in extrinsic geometry provides one with
a practically unlimited source of entertainment and/or frustration.
6. The trace of the extrinsic curvature tensor is identical to the space-time divergence
of the vector field $N^\alpha$,
$K := g^{\alpha\beta}K_{\alpha\beta} = h^{\alpha\beta}K_{\alpha\beta} = \nabla_\alpha N^\alpha$ . (17.34)
7. In the case of the circle $S^1 \subset \mathbb{R}^2$ of radius $L$ discussed above, the only non-zero
component of $K_{\alpha\beta}$ is
$K_{\phi\phi} = L$ (17.35)
and the trace of the extrinsic curvature of a circle of radius $L$ is
8. More generally, the trace of the extrinsic curvature of the sphere $S^n_R \subset \mathbb{R}^{n+1}$ of
radius $R$, with its standard metric
is
$K = \tfrac{1}{2}h^{ab}(\partial_r h_{ab})|_{r=R} = \frac{n}{R}$ . (17.38)
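9. An elementary property of the extrinsic curvature is that $K_{\alpha\beta} = 0$ if the normal
vector happens to also be a Killing vector,
$\nabla_\alpha N_\beta = -\nabla_\beta N_\alpha \;\Rightarrow\; K_{\alpha\beta} = 0$ , (17.39)
$N_\alpha = fK_\alpha$ , $\nabla_\alpha K_\beta + \nabla_\beta K_\alpha = 0 \;\Rightarrow\; K_{\alpha\beta} = 0$ , (17.40)
$\nabla_\alpha N_\beta = f\nabla_\alpha K_\beta + (\partial_\alpha f/f)N_\beta$ (17.41)
$u^\alpha\nabla_\alpha u^\beta = u^\alpha N_\alpha = 0 \;\Rightarrow\; K_{\alpha\beta}u^\alpha u^\beta = 0$ . (17.42)
$K_{\alpha\beta}u^\alpha u^\beta = u^\alpha u^\beta\nabla_\alpha N_\beta = u^\alpha\nabla_\alpha(u^\beta N_\beta) - (u^\alpha\nabla_\alpha u^\beta)N_\beta = 0$ . (17.43)
One also sees that the geodesics need not be affinely parametrised for this to hold;
if one has $u^\alpha\nabla_\alpha u^\beta \propto u^\beta$, one still has $K_{\alpha\beta}u^\alpha u^\beta = 0$.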
Roughly speaking this says that geodesics do not bend in the ambient space, and
this can be made more precise in the context of the extrinsic geometry of surfaces
of higher codimension (like curves). In that case (which we will not develop here),
one can define extrinsic curvatures associated with all of the normal directions to
the surface (K takes values in the normal bundle), and for a curve the geodesic
equation turns out to be the condition that this extrinsic curvature tensor vanishes.
We will briefly return to this issue in the next section.
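Remarks 6 to 8 above lend themselves to a quick numerical check. In flat $\mathbb{R}^{n+1}$ with Cartesian coordinates the covariant divergence reduces to the ordinary one, so the trace $K = \nabla_\alpha N^\alpha$ of the extrinsic curvature of a sphere can be estimated as the divergence of the radial unit vector field. The sketch below (Python, standard library only; function names and step size are illustrative assumptions) does this by central finite differences.

```python
import math

def unit_normal(p):
    """Outward unit normal N = x / |x| to the spheres centred at the origin."""
    r = math.sqrt(sum(c * c for c in p))
    return [c / r for c in p]

def divergence(field, p, h=1e-5):
    """Cartesian divergence sum_i d_i field^i at the point p (central differences)."""
    total = 0.0
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        total += (field(up)[i] - field(dn)[i]) / (2.0 * h)
    return total

if __name__ == "__main__":
    print(divergence(unit_normal, [3.0, 4.0]))        # circle of radius 5: n = 1
    print(divergence(unit_normal, [1.0, 2.0, 2.0]))   # 2-sphere of radius 3: n = 2
```

The two sample points lie on a circle of radius 5 and on a 2-sphere of radius 3, so the results should approximate $1/5$ and $2/3$ respectively, in agreement with $K = n/R$.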
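In section 15.2 we had already seen that the tangential projection of the covariant
derivative of a tangential vector field $V^\alpha$, i.e. $V^\alpha N_\alpha = 0$ or $V^\alpha = E_a^\alpha v^a$, agrees with
the intrinsic covariant derivative defined by the induced metric $h_{ab}$, i.e.
$\nabla^{(h)}_a v_b = (\bar\nabla_\alpha V_\beta)E_a^\alpha E_b^\beta \equiv \bar\nabla_a v_b$ . (17.44)
where
$\bar\nabla_\alpha V_\beta = h_\alpha^{\ \gamma}h_\beta^{\ \delta}\nabla_\gamma V_\delta$ . (17.45)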
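1. For example, if the vector $V^\alpha$ is not tangent to $\Sigma$, then one can decompose $V^\alpha$
into a tangent and normal part according to
$V^\alpha = E_a^\alpha v^a + \epsilon(N_\beta V^\beta)N^\alpha$ . (17.46)
$E_a^\alpha E_b^\beta\nabla_\alpha V_\beta = \bar\nabla_a v_b + \epsilon E_a^\alpha E_b^\beta\nabla_\alpha((N_\gamma V^\gamma)N_\beta)$
$= \bar\nabla_a v_b + \epsilon(N_\gamma V^\gamma)E_a^\alpha E_b^\beta\nabla_\alpha N_\beta$
$= \bar\nabla_a v_b + \epsilon(N_\gamma V^\gamma)K_{ab}$ . (17.47)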
2. The extrinsic curvature tensor also enters when one inquires about the normal
component of the simply-projected quantity $E_a^\alpha\nabla_\alpha V^\beta$ (with $V^\alpha$ again assumed to
be tangential, say, $V^\alpha N_\alpha = 0$). This normal component is given by the scalar
product with $N_\beta$, and can be written as
$E_a^\alpha\nabla_\alpha V^\beta = (\bar\nabla_a v^b)E_b^\beta - \epsilon K_{ab}v^b N^\beta$ . (17.49)
Both (17.47) and (17.49) illustrate that the extrinsic curvature tensor is essentially
the same as a $\Sigma$-tensorial repackaging of the normal components of the connection
(Christoffel symbols $\Gamma$), as already anticipated in (17.33).
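3. Note that (17.48) implies that for a vector field $V^\alpha$ tangent to $\Sigma$, $V^\alpha N_\alpha = 0$ one
has
$K_{\alpha\beta}V^\alpha V^\beta$ (or $K_{ab}v^a v^b$) $= -(V^\alpha\nabla_\alpha V^\beta)N_\beta$ , (17.50)
$V^\alpha\nabla_\alpha V^\beta = 0 \;\Rightarrow\; K_{\alpha\beta}V^\alpha V^\beta = 0$ , (17.51)
4. From (17.49) one can deduce a stronger statement. Namely, contracting with $v^a$
one has
$V^\alpha\nabla_\alpha V^\beta = (v^a\bar\nabla_a v^b)E_b^\beta - \epsilon K_{ab}v^a v^b N^\beta$ . (17.52)
Thus we see that all geodesics on $\Sigma$ (with respect to the induced metric $h_{ab}$) are
also geodesics of the embedding space if and only if $K_{\alpha\beta} = 0$. Such hypersurfaces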
are called totally geodesic. In particular, by the comment in the previous section,
made in connection with (17.40), constant time surfaces in static space-times are
examples of such totally geodesic hypersurfaces.
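Similar manipulations to those performed above allow one to obtain the so-called
Gauss-Codazzi equations, which express certain components of the space-time curvature tensor
(restricted to $\Sigma$) in terms of the intrinsic and extrinsic curvatures $\bar R_{\alpha\beta\gamma\delta}$ and $K_{\alpha\beta}$ (or
$\bar R_{abcd}$ and $K_{ab}$) of the hypersurface $\Sigma$.
We first consider the space-time Riemann tensor with purely spatial components and
its relation to the intrinsic Riemann curvature tensor $\bar R_{\alpha\beta\gamma\delta}$ (or $\bar R_{abcd}$) of the metric $h_{\alpha\beta}$
(or $h_{ab}$) on $\Sigma$. For example, if $V^\alpha$ is tangent to $\Sigma$, then one can define the Riemann
curvature tensor of $h_{\alpha\beta}$ by
$[\bar\nabla_\alpha, \bar\nabla_\beta]V^\gamma = \bar R^\gamma_{\ \delta\alpha\beta}V^\delta$ , (17.53)
where the term $\bar\nabla_\alpha\bar\nabla_\beta V^\gamma$ on the left-hand side is the fully projected expression
$\bar\nabla_\alpha\bar\nabla_\beta V^\gamma = h_\alpha^{\ \mu}h_\beta^{\ \nu}h^\gamma_{\ \rho}\nabla_\mu(h_\nu^{\ \sigma}h^\rho_{\ \lambda}\nabla_\sigma V^\lambda)$ . (17.54)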
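Analysing this bit by bit, for example the term $h\,h\,\nabla h$ evaluates to
$h_\alpha^{\ \mu}h_\beta^{\ \nu}\nabla_\mu h_\nu^{\ \sigma} = -\epsilon\,h_\alpha^{\ \mu}h_\beta^{\ \nu}\nabla_\mu(N_\nu N^\sigma)$
$= -\epsilon\,h_\alpha^{\ \mu}h_\beta^{\ \nu}(\nabla_\mu N_\nu)N^\sigma$ (17.55)
$= -\epsilon K_{\alpha\beta}N^\sigma$
and this vanishes after anti-symmetrisation in $\alpha$ and $\beta$ because $K_{\alpha\beta}$ is symmetric.
Another contribution is (by the same calculation as above, and using $N_\alpha V^\alpha = 0$)
$h_\alpha^{\ \mu}h^\gamma_{\ \rho}(\nabla_\mu h^\rho_{\ \lambda})h_\beta^{\ \nu}h_\nu^{\ \sigma}\nabla_\sigma V^\lambda = -\epsilon K_\alpha^{\ \gamma}N_\lambda h_\beta^{\ \sigma}\nabla_\sigma V^\lambda$
$= +\epsilon K_\alpha^{\ \gamma}V^\lambda h_\beta^{\ \sigma}\nabla_\sigma N_\lambda$ (17.56)
$= +\epsilon K_\alpha^{\ \gamma}K_{\beta\lambda}V^\lambda$ .
Therefore
$\bar R_{\alpha\beta\gamma\delta} = h_\alpha^{\ \mu}h_\beta^{\ \nu}h_\gamma^{\ \rho}h_\delta^{\ \sigma}R_{\mu\nu\rho\sigma} + \epsilon(K_{\alpha\gamma}K_{\beta\delta} - K_{\alpha\delta}K_{\beta\gamma})$ . (17.57)
It thus expresses the purely tangential components of the space-time curvature tensor
in terms of the intrinsic and extrinsic curvature tensors of $\Sigma$.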
It requires significantly less effort to express the component of the space-time Riemann
tensor with 1 normal component and 3 tangential components in terms of the extrinsic
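curvature. Indeed, simply calculating $\bar\nabla_{[\alpha}K_{\beta]\gamma}$ one finds on the nose
$\bar\nabla_\alpha K_{\beta\gamma} - \bar\nabla_\beta K_{\alpha\gamma} = h_\alpha^{\ \mu}h_\beta^{\ \nu}h_\gamma^{\ \rho}(\nabla_\mu K_{\nu\rho} - \nabla_\nu K_{\mu\rho})$
$= h_\alpha^{\ \mu}h_\beta^{\ \nu}h_\gamma^{\ \rho}(\nabla_\mu\nabla_\nu N_\rho - \nabla_\nu\nabla_\mu N_\rho)$ (17.59)
$= h_\alpha^{\ \mu}h_\beta^{\ \nu}h_\gamma^{\ \rho}(R_{\rho\sigma\mu\nu}N^\sigma)$
or
$R_{\alpha\beta\gamma\delta}N^\alpha E_a^\beta E_b^\gamma E_c^\delta = \bar\nabla_c K_{ab} - \bar\nabla_b K_{ac}$ . (17.60)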
Remarks:
1. We could have set up the calculations in such a way that we obtain directly
the $\Sigma$-tensorial form (17.58) or (17.60) of the results, by starting with $\bar\nabla_a\bar\nabla_b v_c$,
say, but then we would have had to deal with covariant derivatives of the $E_a^\alpha$ at
intermediate stages of the calculation - the derivation given above appears to be
somewhat simpler in that respect (but this may be a matter of taste).
between the intrinsic and extrinsic curvature tensors. This can be directly verified
e.g. for the sphere $S^n \subset \mathbb{R}^{n+1}$ of radius $L$, for which one has $\epsilon = +1$ and
We also see that if the induced metric on such a hypersurface is flat, then
necessarily the extrinsic curvature tensor is also zero. This also substantiates the
claim, made in the introduction to this section, section 17.1, that there can be no
embedding of the flat torus $T^2$ into $\mathbb{R}^3$, because such an embedding would have
to be both intrinsically and extrinsically flat.
3. Note that the above (purely tangential, or 3 tangential and 1 normal) components
of the Riemann tensor could be expressed in terms of $\bar R_{abcd}$, $K_{ab}$ and $\bar\nabla_c K_{ab}$, i.e.
in terms of the tangential and 1st normal derivatives of the metric. In general the
Riemann tensor depends on all the second derivatives of the metric, in particular
also on the second normal derivatives of the metric. Thus the remaining compo-
nents of the Riemann tensor (with two normal and two tangential directions) are
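more complicated and cannot be expressed solely in terms of the intrinsic and
extrinsic curvatures of $\Sigma$ and their tangential derivatives, and we will not determine
their explicit form here.
$R_{\beta\delta} = g^{\alpha\gamma}R_{\alpha\beta\gamma\delta} = h^{\alpha\gamma}R_{\alpha\beta\gamma\delta} + \epsilon N^\alpha N^\gamma R_{\alpha\beta\gamma\delta}$ (17.63)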
depends explicitly on the components of the Riemann tensor with 2 normal com-
ponents.
This previous remark notwithstanding, certain components of the Ricci tensor and cer-
tain components of the Einstein tensor can be expressed entirely in terms of the intrinsic
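and extrinsic curvature tensors of $\Sigma$.
For example, contracting (17.59) with $h^{\alpha\gamma}$ or $g^{\alpha\gamma}$ (this has the same effect on tangential
tensors), one finds
$\bar\nabla_\alpha K^\alpha_{\ \beta} - \bar\nabla_\beta K = h_\beta^{\ \nu}R_{\mu\nu}N^\mu$ $\;\Leftrightarrow\;$ $R_{\alpha\beta}E_a^\alpha N^\beta = \bar\nabla_b K^b_{\ a} - \bar\nabla_a K$ , (17.64)
which expresses the mixed normal / tangential components of the space-time Ricci
tensor in terms of the (tangential derivatives of the) extrinsic curvature tensor of $\Sigma$.
Because $N^\alpha$ and $E_a^\alpha$ are orthogonal with respect to the space-time metric, one has the
same expression for the mixed components of the Einstein tensor
$G_{\alpha\beta} = R_{\alpha\beta} - \tfrac{1}{2}g_{\alpha\beta}R$ , (17.65)
namely
$G_{\alpha\beta}E_a^\alpha N^\beta = \bar\nabla_b K^b_{\ a} - \bar\nabla_a K$ . (17.66)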
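Moreover, from (17.63) one finds, using the symmetries of the Riemann tensor, that the
Ricci scalar and the normal-normal component of the Ricci tensor can be written as
$R = g^{\beta\delta}R_{\beta\delta} = h^{\alpha\gamma}h^{\beta\delta}R_{\alpha\beta\gamma\delta} + 2\epsilon\,h^{\alpha\gamma}N^\beta N^\delta R_{\alpha\beta\gamma\delta}$
$R_{\beta\delta}N^\beta N^\delta = h^{\alpha\gamma}N^\beta N^\delta R_{\alpha\beta\gamma\delta}$ . (17.67)
It follows that the normal-normal component of the Einstein tensor has the simple form
$G_{\alpha\beta}N^\alpha N^\beta = R_{\alpha\beta}N^\alpha N^\beta - \tfrac{1}{2}g_{\alpha\beta}N^\alpha N^\beta R$
$= R_{\alpha\beta}N^\alpha N^\beta - \tfrac{1}{2}\epsilon R$ (17.68)
$= -\tfrac{1}{2}\epsilon\,h^{\alpha\gamma}h^{\beta\delta}R_{\alpha\beta\gamma\delta}$ .
$-2\epsilon\,G_{\alpha\beta}N^\alpha N^\beta = h^{\alpha\gamma}h^{\beta\delta}(\bar R_{\alpha\beta\gamma\delta} - \epsilon(K_{\alpha\gamma}K_{\beta\delta} - K_{\alpha\delta}K_{\beta\gamma}))$
$= \bar R + \epsilon(K_{\alpha\beta}K^{\alpha\beta} - K^2)$ . (17.69)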
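As a consistency check of the signs, consider the sphere $S^n \subset \mathbb{R}^{n+1}$ of radius $L$ with $\epsilon = +1$. The ambient space is flat, so the tangential double trace $h^{\alpha\gamma}h^{\beta\delta}R_{\alpha\beta\gamma\delta}$ vanishes, and the combination $\bar R + \epsilon(K_{\alpha\beta}K^{\alpha\beta} - K^2)$ must then vanish as well. The sketch below (Python, standard library only) verifies this with exact rational arithmetic, assuming the standard values $\bar R = n(n-1)/L^2$ and $K_{ab} = h_{ab}/L$ for the round sphere; the function names are illustrative.

```python
from fractions import Fraction

def sphere_data(n, L):
    """Curvature invariants of S^n of radius L embedded in flat R^(n+1) (eps = +1)."""
    L = Fraction(L)
    R_bar = Fraction(n * (n - 1)) / L ** 2   # intrinsic scalar curvature of S^n
    K = Fraction(n) / L                      # trace of K_ab = h_ab / L
    KK = Fraction(n) / L ** 2                # K_ab K^ab
    return R_bar, K, KK

def gauss_scalar(n, L, eps=1):
    """R_bar + eps * (K_ab K^ab - K^2), the double trace of the projected ambient curvature."""
    R_bar, K, KK = sphere_data(n, L)
    return R_bar + eps * (KK - K ** 2)

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, gauss_scalar(n, 3))   # expected to vanish for every n
```

The cancellation $n(n-1) + n - n^2 = 0$ holds for every dimension $n$ and every radius, which is a useful guard against sign slips in the Gauss equation.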
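Finally, we will also derive a useful expression for the Ricci scalar. First of all, from
(17.67) and (17.69) we have
$R = \bar R + \epsilon(K_{\alpha\beta}K^{\alpha\beta} - K^2) + 2\epsilon\,h^{\alpha\gamma}N^\beta N^\delta R_{\alpha\beta\gamma\delta}$
$= \bar R + \epsilon(K_{\alpha\beta}K^{\alpha\beta} - K^2) + 2\epsilon\,R_{\alpha\beta}N^\alpha N^\beta$ . (17.70)
The first two terms in this expression are already of the desired form, depending only
on the intrinsic and extrinsic curvature of $\Sigma$, while the third is not. However, it turns
out that, up to a total derivative, we can trade $R_{\alpha\beta}N^\alpha N^\beta$ for a term depending only
on $K_{\alpha\beta}$. Indeed, it is straightforward to establish the identity
$\nabla_\alpha(N^\beta\nabla_\beta N^\alpha - N^\alpha\nabla_\beta N^\beta) = (\nabla_\alpha N^\beta)(\nabla_\beta N^\alpha) + N^\beta[\nabla_\alpha, \nabla_\beta]N^\alpha - (\nabla_\alpha N^\alpha)(\nabla_\beta N^\beta)$
$= R_{\alpha\beta}N^\alpha N^\beta + K_{\alpha\beta}K^{\alpha\beta} - K^2$ . (17.71)
The only minor subtlety is to verify that no normal components of $\nabla_\alpha N_\beta$ contribute to
$(\nabla_\alpha N^\beta)(\nabla_\beta N^\alpha)$, so that one indeed has
$(\nabla_\alpha N^\beta)(\nabla_\beta N^\alpha) = K_{\alpha\beta}K^{\alpha\beta}$ , (17.72)
and this in turn follows from $N^\beta\nabla_\alpha N_\beta = 0$ etc. With the help of this identity we can
eliminate $R_{\alpha\beta}N^\alpha N^\beta$ from (17.70) and write the scalar curvature (now with the opposite
sign for the $K^2$-term) as
$R = \bar R + \epsilon(K^2 - K_{\alpha\beta}K^{\alpha\beta}) + 2\epsilon\,\nabla_\alpha(N^\beta\nabla_\beta N^\alpha - N^\alpha\nabla_\beta N^\beta)$ . (17.73)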
These relations play an important role in particular in the Hamiltonian and initial value
formulations of the Einstein equations, where the first step is the choice of an initial
spacelike hypersurface and an accompanying $4 \to 3+1$ decomposition of the curvature
tensor and the Einstein equations. This will be discussed in section 20.
C: Dynamics of the Gravitational Field
18 The Einstein Equations
18.1 Heuristics
We expect the gravitational field equations to be non-linear second order partial dif-
ferential equations for the metric. If we knew more about the weak field equations of
gravity (which should thus be valid near the origin of an inertial coordinate system) we
could use the Einstein equivalence principle (or the principle of general covariance) to
deduce the equations for strong fields.
However, we do not know a lot about gravity beyond the Newtonian limit of weak time-
independent fields and low velocities, simply because gravity is so weak. Hence, we
cannot find the gravitational field equations in a completely systematic way and some
guesswork will be required.
Nevertheless we will see that with some very natural assumptions (and the benefit of
hindsight) we will arrive at an essentially unique set of equations. Further theoretical
(and aesthetical) confirmation for these equations will then come from the fact that they
turn out to be the Euler-Lagrange equations of the absolutely simplest action principle
for the metric imaginable.
Recall that, way back, in section 1.1, we had briefly discussed the possibility of a scalar
relativistic theory of gravity described by an equation of the form (1.3)
We had noted there that one way to render this equation (tensorially) consistent is to
think of both the left and the right hand side as (00)-components of some tensor, which
we expressed in (1.6) as
While this appeared to be an exotic proposal back in section 1.1, we now understand
that this is exactly what is required, and we have a fairly precise idea of what this tensor
on the left-hand side should be.
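Indeed, recall from our discussion of the Newtonian limit of the geodesic equation that
the weak static field produced by a non-relativistic mass density is
$$g_{00} = -(1 + 2\phi) , \qquad (18.3)$$
where $\phi$ is the Newtonian potential, determined by the Poisson equation
$\Delta\phi = 4\pi G_N \rho$. Since $T_{00} = \rho$ for non-relativistic matter, this can be written as
$\Delta g_{00} = -8\pi G_N T_{00}$.
This suggests that the weak-field equations for a general energy-momentum tensor take
the form
$$E_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} ,$$
where $E_{\alpha\beta}$ is constructed from the metric and its first and second derivatives.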
By the Einstein equivalence principle, if this equation is valid for weak fields (i.e. near
the origin of an inertial coordinate system) then also the equations which govern gravitational
fields of arbitrary strength must be of this form, with $E_{\alpha\beta}$ a tensor constructed
from the metric and its first and second derivatives.
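Another way of anticipating what form the field equations for gravity may take is via
an analogy, a comparison of the geodesic deviation equations in Newton's theory and
in General Relativity. Recall that in Newton's theory we have
$$\frac{d^2}{dt^2}\,\delta x^i = -K^i{}_j\, \delta x^j , \qquad K^i{}_j = \partial^i \partial_j \phi , \qquad (18.7)$$
whereas in General Relativity we have
$$(D_\tau)^2\, \delta x^\alpha = -K^\alpha{}_\beta\, \delta x^\beta , \qquad K^\alpha{}_\beta = R^\alpha{}_{\gamma\beta\delta}\, \dot{x}^\gamma \dot{x}^\delta . \qquad (18.8)$$
Now Newton's field equation is
$$\mathrm{Tr}\, K \equiv \Delta\phi = 4\pi G_N \rho , \qquad (18.9)$$
while in General Relativity we have
$$\mathrm{Tr}\, K = R_{\alpha\beta}\, \dot{x}^\alpha \dot{x}^\beta . \qquad (18.10)$$
This suggests that somehow in the gravitational field equations of General Relativity,
$\Delta\phi$ should be replaced by the Ricci tensor $R_{\alpha\beta}$. Note that, at least roughly, the tensorial
structure of this identification is compatible with the relation between $\phi$ and $g_{00}$ in
the Newtonian limit, the relation between $\rho$ and the 0-0 component $T_{00}$ of the energy
momentum tensor, and the fact that for small velocities $R_{\alpha\beta}\dot{x}^\alpha \dot{x}^\beta \sim R_{00}$.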
We will now turn to a somewhat more precise argument along these lines which will
enable us to determine $E_{\alpha\beta}$.
1. $E_{\alpha\beta}$ is a tensor.
2. $E_{\alpha\beta}$ has the dimensions of a second derivative. If we assume that no new dimensionful
constants enter in $E_{\alpha\beta}$ then it has to be a linear combination of terms
which are either linear in second derivatives of the metric or quadratic in the first
derivatives of the metric. (Later on, we will see that there is the possibility of
a zero derivative term, but this requires a new dimensionful constant, the cosmological
constant $\Lambda$. Higher derivative terms or higher non-linearities could in
principle appear but would only be relevant at very high energies.)
3. $E_{\alpha\beta}$ is symmetric, since $T_{\alpha\beta}$ is symmetric.
4. Since $T^{\alpha\beta}$ is covariantly conserved, the same has to be true for $E^{\alpha\beta}$,
$$\nabla_\alpha T^{\alpha\beta} = 0 \;\Rightarrow\; \nabla_\alpha E^{\alpha\beta} = 0 . \qquad (18.11)$$
5. Finally, for a weak static gravitational field and non-relativistic matter we should
find
$$E_{00} = -\Delta g_{00} . \qquad (18.12)$$
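Now it turns out that these conditions (1)-(5) determine $E_{\alpha\beta}$ uniquely! First of all, (1)
and (2) tell us that $E_{\alpha\beta}$ has to be a linear combination
$$E_{\alpha\beta} = a R_{\alpha\beta} + b\, g_{\alpha\beta} R , \qquad (18.13)$$
where $R_{\alpha\beta}$ is the Ricci tensor and $R$ the Ricci scalar. Then condition (3) is automatically
satisfied.
To implement (4), we rewrite the above as a linear combination of the Einstein tensor
(7.92) and $g_{\alpha\beta} R$,
$$E_{\alpha\beta} = a G_{\alpha\beta} + c\, g_{\alpha\beta} R \equiv a (R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R) + c\, g_{\alpha\beta} R , \qquad (18.14)$$
and recall the contracted Bianchi identity
$$\nabla^\alpha G_{\alpha\beta} = 0 . \qquad (18.15)$$
Condition (4) therefore requires $c\, \partial_\beta R = 0$, so that either $c = 0$ or $R$ is constant.
On the other hand, taking the trace of the field equations shows that $(4c - a) R$ is
proportional to $T^\alpha{}_\alpha$.
Thus, $R$ is proportional to $T^\alpha{}_\alpha$ and since this quantity need certainly not be constant
for a general matter configuration, we are led to the conclusion that $c = 0$. Thus we
find
$$E_{\alpha\beta} = a G_{\alpha\beta} . \qquad (18.17)$$
We can now use condition (5) to determine the constant $a$.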
18.3 Newtonian Weak-Field Limit
By the above considerations we have determined the field equations to be of the form
$$a G_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} , \qquad (18.18)$$
with $a$ some, as yet undetermined, constant. We will now consider the Newtonian
weak-field limit of this equation. We need to find that $G_{00}$ is proportional to $\Delta g_{00}$ and
we can then use the condition (5) to fix the value of $a$. The following manipulations
are somewhat analogous to those we performed when considering the Newtonian limit
of the geodesic equation. The main difference is that now we are dealing with second
derivatives of the metric rather than with just its first derivatives entering in the geodesic
equation.
First of all, for a non-relativistic system we have $|T_{ij}| \ll T_{00}$ and hence $|G_{ij}| \ll |G_{00}|$.
Therefore we conclude
$$|T_{ij}| \ll T_{00} \;\Rightarrow\; R_{ij} \simeq \tfrac{1}{2} g_{ij} R . \qquad (18.19)$$
Next, for a weak field we have $g_{\alpha\beta} = \eta_{\alpha\beta} + h_{\alpha\beta}$ with $h_{\alpha\beta}$ small (in a suitable sense) and,
in particular,
$$R \simeq \eta^{\alpha\beta} R_{\alpha\beta} = R_{kk} - R_{00} , \qquad (18.20)$$
which, together with (18.19), translates into
$$R \simeq \tfrac{3}{2} R - R_{00} \qquad (18.21)$$
or
$$R \simeq 2 R_{00} . \qquad (18.22)$$
In the weak field limit, $R_{00}$ in turn is given by
$$R_{00} = R^k{}_{0k0} = \eta^{ik} R_{i0k0} . \qquad (18.23)$$
Moreover, in this limit only the linear (second derivative) part of $R_{\alpha\beta\gamma\delta}$ will contribute,
not the terms quadratic in first derivatives. Thus we can use the expression (7.13) for
the curvature tensor. Additionally, in the static case we can ignore all time derivatives.
Then only one term (the third) of (7.13) contributes and we find
$$R_{i0k0} = -\tfrac{1}{2}\, g_{00,ik} , \qquad (18.24)$$
and therefore
$$R_{00} = -\tfrac{1}{2}\, \Delta g_{00} . \qquad (18.25)$$
Putting everything together, we get
$$G_{00} = R_{00} - \tfrac{1}{2} g_{00} R \simeq R_{00} - \tfrac{1}{2} \eta_{00} R
        = R_{00} + \tfrac{1}{2} R \simeq R_{00} + R_{00} = -\Delta g_{00} \qquad (18.26)$$
(see also section 22.3 for a somewhat more streamlined and covariant derivation of
this result). Thus we obtain the correct functional form of $E_{00}$ and comparison with
condition (5) determines $a = +1$ and therefore $E_{\alpha\beta} = G_{\alpha\beta}$.
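The weak-field relation between $R_{00}$ and $\Delta g_{00}$ can be checked numerically. The following is only a rough sketch under stated assumptions: the static toy potential `phi`, its strength `EPS` and the finite-difference step `H` are hypothetical choices made for illustration, the metric is taken to be diag(-(1+2 phi), 1, 1, 1), and the Ricci component is assembled from Christoffel symbols by central differences rather than from the linearised formula used in the text.

```python
from itertools import product

EPS = 1e-3  # assumed strength of the toy potential
H = 1e-3    # assumed finite-difference step


def phi(x):
    """Toy static potential (hypothetical choice) with Laplacian 6 * EPS."""
    _, X, Y, Z = x
    return 0.5 * EPS * (X**2 + 2 * Y**2 + 3 * Z**2)


def metric_diag(x):
    """Diagonal entries of g = diag(-(1 + 2 phi), 1, 1, 1)."""
    return [-(1.0 + 2.0 * phi(x)), 1.0, 1.0, 1.0]


def shifted(x, mu, d):
    y = list(x)
    y[mu] += d
    return y


def d_metric(x, mu):
    """Central difference d_mu g_aa for the four diagonal entries."""
    gp = metric_diag(shifted(x, mu, H))
    gm = metric_diag(shifted(x, mu, -H))
    return [(p - m) / (2 * H) for p, m in zip(gp, gm)]


def christoffel(x):
    """gamma[a][b][c] = Gamma^a_{bc}, valid for a diagonal metric only."""
    g = metric_diag(x)
    dg = [d_metric(x, mu) for mu in range(4)]  # dg[mu][a] = d_mu g_aa
    gamma = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for a, b, c in product(range(4), repeat=3):
        term = 0.0
        if a == c:
            term += dg[b][a]   # d_b g_{ac}
        if a == b:
            term += dg[c][a]   # d_c g_{ab}
        if b == c:
            term -= dg[a][b]   # d_a g_{bc}
        gamma[a][b][c] = 0.5 * term / g[a]
    return gamma


def ricci_00(x):
    """R_00 = d_c Gamma^c_00 - d_0 Gamma^c_0c + (Gamma Gamma) terms."""
    gam = christoffel(x)
    plus = [christoffel(shifted(x, mu, H)) for mu in range(4)]
    minus = [christoffel(shifted(x, mu, -H)) for mu in range(4)]

    def d_gamma(mu, a, b, c):
        return (plus[mu][a][b][c] - minus[mu][a][b][c]) / (2 * H)

    total = 0.0
    for c in range(4):
        total += d_gamma(c, c, 0, 0) - d_gamma(0, c, 0, c)
        for d in range(4):
            total += gam[c][c][d] * gam[d][0][0] - gam[c][0][d] * gam[d][0][c]
    return total


laplacian_g00 = -2.0 * 6.0 * EPS       # Delta g_00 = -2 Delta phi for the toy potential
r00 = ricci_00([0.0, 0.0, 0.0, 0.0])   # evaluated at the origin
print(r00, -0.5 * laplacian_g00)
```

The two printed numbers should agree up to rounding, which is the content of (18.25) for this particular toy field. The origin is chosen because the first derivatives of the toy metric vanish there, so the terms quadratic in the Christoffel symbols drop out, as in the weak-field argument.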
18.4 Einstein Equations
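We have finally arrived at the Einstein equations for the gravitational field (metric) of
a matter-energy configuration described by the energy-momentum tensor $T_{\alpha\beta}$. They are
$$R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R = 8\pi G_N T_{\alpha\beta} \qquad (18.27)$$
These are the equations that replace the Newtonian (Poisson) equation for the gravitational
potential.
Another common way of writing the Einstein equations is obtained by taking the trace
of (18.27), which yields
$$R = -8\pi G_N T^\alpha{}_\alpha , \qquad (18.28)$$
and substituting this back into (18.27) to obtain
$$R_{\alpha\beta} = 8\pi G_N (T_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} T^\gamma{}_\gamma) . \qquad (18.29)$$
In particular, in vacuum the Einstein equations reduce to
$$T_{\alpha\beta} = 0 \;\Rightarrow\; R_{\alpha\beta} = 0 , \qquad (18.30)$$
so that
$$G_{\alpha\beta} = 0 \;\Leftrightarrow\; R_{\alpha\beta} = 0 . \qquad (18.31)$$
A space-time metric satisfying this equation is, for obvious reasons, said to be Ricci flat.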
This is a tremendously complicated set of equations, and trying to learn and say some-
thing about general properties of solutions to these equations is very challenging.32
32 The current state of knowledge and understanding of the mathematical structure of the Einstein
equations, in particular regarding the properties of the Cauchy (initial value) problem for the Einstein
equations, is described in detail in the awe-inspiring œuvre General Relativity and the Einstein Equations
by Y. Choquet-Bruhat (warning: not for the faint of heart). A readable historical introduction to the
Cauchy problem for the Einstein equations is given by her in Y. Choquet-Bruhat, Beginnings of the
Cauchy problem, arXiv:1410.3490 [gr-qc].
Even the vacuum Einstein equations still constitute a complicated set of non-linear
coupled partial differential equations whose general solution is not, and probably will
never be, known. Usually one makes some assumptions, in particular regarding the
symmetries of the metric, that reduce the number of independent variables from 10
functions $g_{\alpha\beta}(x)$ of 4 variables to a smaller number of functions depending on a smaller
number of variables, and which then simplify the equations to the extent that they can
be analysed explicitly, either analytically, or at least qualitatively or numerically. How
to do this in practice (in the simplest non-trivial situations), will be explained in detail
later on in these notes.
Remarks:
1. With $c$ not set equal to one, and with the convention that $T_{00}$ is normalised such
that it gives the energy-density rather than the mass-density, one finds that the
factor $8\pi G_N$ on the right hand side should be replaced by
$$8\pi G_N \;\to\; \frac{8\pi G_N}{c^4} . \qquad (18.32)$$
A note on dimensions: Newton's constant has dimensions (M mass, L length, T
time) $[G_N] = M^{-1} L^3 T^{-2}$ so that $[G_N/c^4] = M^{-1} L^{-1} T^{2}$, while an energy
density has dimensions $[T_{00}] = M L^{-1} T^{-2}$.
Thus
$$[G_N T_{\alpha\beta}/c^4] = L^{-2} = [R_{\alpha\beta}] , \qquad (18.35)$$
as it should be. Frequently, an alternative (and equally reasonable) convention
is used in which $T_{00}$ is a mass density, so that then $T_{tt} = c^2 T_{00}$ is the energy
density. In that case, the factor on the right-hand side of the Einstein equations
is $8\pi G_N/c^2$.
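The dimensional bookkeeping in this remark can be mechanised. The sketch below is purely illustrative: dimensions are represented as exponent triples (mass, length, time), and the helper names `mul` and `power` are hypothetical, not taken from the notes.

```python
# Dimensions as exponent triples (mass, length, time).
def mul(a, b):
    """Dimension of a product: exponents add."""
    return tuple(x + y for x, y in zip(a, b))


def power(a, n):
    """Dimension of a power: exponents scale."""
    return tuple(n * x for x in a)


G_N = (-1, 3, -2)             # [G_N] = M^-1 L^3 T^-2
C = (0, 1, -1)                # [c] = L T^-1
ENERGY_DENSITY = (1, -1, -2)  # [energy / volume] = M L^-1 T^-2
MASS_DENSITY = (1, -3, 0)     # [mass / volume] = M L^-3
CURVATURE = (0, -2, 0)        # [R_ab] = L^-2

coupling_energy = mul(G_N, power(C, -4))  # G_N / c^4
coupling_mass = mul(G_N, power(C, -2))    # G_N / c^2

# Both conventions for T_00 give a right-hand side with dimension L^-2.
print(mul(coupling_energy, ENERGY_DENSITY))
print(mul(coupling_mass, MASS_DENSITY))
```

Both products come out as a pure inverse length squared, matching the dimension of the curvature in either convention for $T_{00}$.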
2. The streamlined derivation of the Einstein equations given here may give the
misleading impression that also for Einstein this was a straightforward affair. Nothing
could be further from the truth. Not only do we have the benefit of hindsight.
We also have a much more systematic and advanced understanding of Riemannian
geometry and tensor calculus than was available to Einstein at the time.
This concerns in particular things like the contracted Bianchi identities and their
importance for energy-momentum (non-)conservation and for general covariance
(to be briefly discussed in section 18.7 below).33
33 For an illuminating brief account of the torturous and convoluted route and crucial final stages
that led Einstein (and Grossmann) to the correct field equations, see N. Straumann, Einstein's Zürich
Notebook and his Journey to General Relativity, arXiv:1106.0900v1 [physics.hist-ph].
3. As an aside, note that the trace (18.28) of the Einstein equations
$$R = -8\pi G_N T^\alpha{}_\alpha , \qquad (18.36)$$
is a scalar generally covariant differential equation for the metric (but it is of course
far from sufficient to determine 10 independent components of the metric up to
coordinate transformations). If one assumes, however, that the space-time metric
can be parametrised by a single scalar $\phi$, say (somewhat like in the Newtonian
limit), e.g. by stipulating that it only differs from the Minkowski metric by a
conformal factor, as in
$$g_{\alpha\beta} = \phi^2 \eta_{\alpha\beta} , \qquad (18.37)$$
then a scalar equation like (18.36) (the numerical constant needs to be adjusted
appropriately in order to obtain the correct Newtonian limit) provides a differential
equation for $\phi$ and thus a generally covariant scalar theory of gravity. A theory
of this kind, a geometrisation and covariantisation of previous scalar theories of
gravity, was proposed by Einstein and Fokker in 1913/14, some two years before
Einstein arrived at the final (tensorial) form of the field equations. In this theory,
there is no coupling of gravity and Maxwell theory (which has a traceless energy-momentum
tensor), and null lines are identical to null lines in Minkowski space
(because of conformal flatness), so for either of these reasons there is no bending
of light rays by the gravitational field in such a theory.
4. As we saw before, in two and three dimensions, vanishing of the Ricci tensor
implies the vanishing of the Riemann tensor. Thus in these cases, space-times are
necessarily flat away from where there is matter, i.e. at points at which $T_{\alpha\beta}(x) = 0$.
Thus there are no true gravitational fields and no gravitational waves.
In four dimensions, however, the situation is completely different. As we saw,
the Ricci tensor has 10 independent components whereas the Riemann tensor has
20. Thus there are 10 components of the Riemann tensor which can curve the
vacuum, as e.g. in the field around the sun, and a lot of interesting physics is
already contained in the vacuum Einstein equations.
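The component counting behind this remark can be made explicit. The sketch below uses the standard counting formulas for the number of independent components of the Riemann tensor, $D^2(D^2-1)/12$, and of a symmetric rank-2 tensor, $D(D+1)/2$; the function names are hypothetical, and the Weyl count is taken as the difference, which is meaningful for $D \geq 3$ only.

```python
def riemann_components(d):
    """Independent components of the Riemann tensor in d dimensions."""
    return d * d * (d * d - 1) // 12


def ricci_components(d):
    """Independent components of a symmetric rank-2 tensor in d dimensions."""
    return d * (d + 1) // 2


def weyl_components(d):
    """Riemann minus Ricci; intended for d >= 3 only."""
    if d < 3:
        raise ValueError("counting by subtraction applies for d >= 3 only")
    return riemann_components(d) - ricci_components(d)


for d in (3, 4, 5):
    print(d, riemann_components(d), ricci_components(d), weyl_components(d))
```

For $D = 3$ the difference vanishes, in line with the statement that Ricci flatness implies flatness there, while for $D = 4$ it gives the 10 components that can curve the vacuum.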
The Einstein tensor, i.e. the (unique) rank-2 tensor that can be constructed
from the Riemann curvature tensor which has vanishing covariant divergence,
has the same form in any dimension, $G_{\alpha\beta} = R_{\alpha\beta} - (1/2) g_{\alpha\beta} R$.
Likewise, what will appear on the right-hand side of the equations is the
appropriate generally covariant energy-momentum tensor.
However, in the higher-dimensional analogue of the Einstein equations the
constant of proportionality between the Einstein and energy-momentum tensors
should not be called $8\pi G_N$. After all, this factor was determined from
the Newtonian limit of the (3 + 1)-dimensional Einstein equations where e.g.
a factor $4\pi$ has, via the Poisson equation for a point mass, its origin in the
fact that the area of a unit 2-sphere is $4\pi$. Thus we will just call it $\kappa$ (which
is then related in a dimension-dependent way to however one wants to normalise
the D-dimensional gravitational coupling constant). Thus precisely
as in 4 dimensions one can write the Einstein equations as
$$G_{\alpha\beta} \equiv R_{\alpha\beta} - \frac{1}{2} g_{\alpha\beta} R = \kappa T_{\alpha\beta} . \qquad (18.38)$$
If one wants to use the analogue of (18.29), one should pay attention to the
fact that it is less symmetric with respect to (18.38) than its 4-dimensional
counterpart since in $D = n + 1$ dimensions it takes the form
$$R_{\alpha\beta} = \kappa \left( T_{\alpha\beta} - \frac{1}{D-2}\, g_{\alpha\beta} T^\gamma{}_\gamma \right) . \qquad (18.39)$$
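The dimension dependence of the coefficient in (18.39) is easy to test with exact arithmetic. In the sketch below the flat metric diag(-1, 1, ..., 1) and the entries of the symmetric test tensor `T` are arbitrary, hypothetical choices; the function simply builds the Ricci tensor from the trace-reversed equation with a given coefficient and checks whether the Einstein tensor then equals $\kappa T_{\alpha\beta}$.

```python
from fractions import Fraction


def check_trace_reversal(D, coeff=None, kappa=Fraction(1)):
    """Return True if R_ab = kappa (T_ab - coeff g_ab T) implies G_ab = kappa T_ab."""
    if coeff is None:
        coeff = Fraction(1, D - 2)          # the coefficient appearing in (18.39)
    eta = [Fraction(-1)] + [Fraction(1)] * (D - 1)   # diagonal metric, equal to its inverse
    # Arbitrary symmetric test tensor (hypothetical values).
    T = [[Fraction(1 + a + b + a * b, 3) for b in range(D)] for a in range(D)]
    trace_T = sum(eta[a] * T[a][a] for a in range(D))

    def g(a, b):
        return eta[a] if a == b else Fraction(0)

    ricci = [[kappa * (T[a][b] - coeff * g(a, b) * trace_T) for b in range(D)]
             for a in range(D)]
    scalar = sum(eta[a] * ricci[a][a] for a in range(D))
    einstein = [[ricci[a][b] - g(a, b) * scalar / 2 for b in range(D)]
                for a in range(D)]
    return all(einstein[a][b] == kappa * T[a][b]
               for a in range(D) for b in range(D))


print([check_trace_reversal(D) for D in range(3, 8)])
print(check_trace_reversal(4, Fraction(1, 2)), check_trace_reversal(5, Fraction(1, 2)))
```

With the coefficient $1/(D-2)$ the check passes in every dimension tried, whereas the 4-dimensional value $1/2$ only works for $D = 4$.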
As mentioned before, there is one more term that can be added to the Einstein equations
provided that one relaxes the condition (2) that only terms quadratic in derivatives
should appear. This term takes the form $\Lambda g_{\alpha\beta}$. This is compatible with the condition
(4) (the conservation law) provided that $\Lambda$ is a constant, the cosmological constant. It
is a dimensionful parameter with dimension $[\Lambda] = L^{-2}$, one over length squared.
The Einstein equations with a cosmological constant then read
$$R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R + \Lambda g_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} . \qquad (18.40)$$
To be compatible with condition (5) ((1), (3) and (4) are obviously satisfied), $\Lambda$ has to
be quite small (and observationally it is very small indeed).
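Remarks:
1. In vacuum, the Einstein equations with a cosmological constant read
$$R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R = -\Lambda g_{\alpha\beta} . \qquad (18.41)$$
Taking the trace, one finds $R = 4\Lambda$ in four dimensions, so that this is equivalent to
$$R_{\alpha\beta} = \Lambda g_{\alpha\beta} . \qquad (18.42)$$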
2. In general, solutions to the equation $R_{\alpha\beta} = c\, g_{\alpha\beta}$ for some constant $c$ (and either
Riemannian or Lorentzian signature) are known as Einstein manifolds in the
mathematics literature.
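4. Comparing with the energy-momentum tensor of, say, a perfect fluid (see (6.70)
in section 6.5 or section 34.2),
$$T_{\alpha\beta} = (\rho + p) u_\alpha u_\beta + p\, g_{\alpha\beta} , \qquad (18.43)$$
we see that the cosmological constant term, moved to the right-hand side of (18.40),
acts like a perfect fluid with $\rho + p = 0$, namely with energy density
$\rho_\Lambda = \Lambda / 8\pi G_N$ and pressure $p_\Lambda = -\Lambda / 8\pi G_N$.
Thus, depending on the sign of $\Lambda$ either the energy density or the pressure is
negative,
$$\Lambda < 0 \;\Rightarrow\; \rho_\Lambda < 0 , \qquad \Lambda > 0 \;\Rightarrow\; p_\Lambda < 0 . \qquad (18.46)$$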
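The reading of the cosmological term as a perfect fluid can be verified in a few lines. This is a minimal sketch under explicit assumptions: units with $8\pi G_N = 1$ by default, signature (-,+,+,+), a flat background metric and a fluid at rest; the function names are hypothetical.

```python
from fractions import Fraction

ETA = [-1, 1, 1, 1]   # diagonal of the flat metric, signature (-,+,+,+)
U = [-1, 0, 0, 0]     # covariant 4-velocity u_a of a fluid at rest (u.u = -1)


def lambda_fluid(lam, eight_pi_G=Fraction(1)):
    """Energy density and pressure that mimic a cosmological constant lam."""
    rho = lam / eight_pi_G
    p = -lam / eight_pi_G
    return rho, p


def perfect_fluid_T(rho, p):
    """T_ab = (rho + p) u_a u_b + p g_ab for the rest-frame fluid above."""
    return [[(rho + p) * U[a] * U[b] + (p * ETA[a] if a == b else 0)
             for b in range(4)] for a in range(4)]


lam = Fraction(3, 7)                 # arbitrary positive test value
rho, p = lambda_fluid(lam)
T = perfect_fluid_T(rho, p)
# With 8 pi G_N = 1, the term -lam * g_ab on the right-hand side should be reproduced.
print(all(T[a][b] == (-lam * ETA[a] if a == b else 0)
          for a in range(4) for b in range(4)))
```

For a positive test value of $\Lambda$ the energy density is positive and the pressure negative, and flipping the sign of `lam` exchanges the two, as in (18.46).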
6. However, things are not as simple as that. Just because it is not required does not
mean that it is not there. In fact, one of the biggest puzzles in theoretical physics
today is why the cosmological constant is so small. According to standard quan-
tum field theory lore, the vacuum energy density should be many many orders of
magnitude larger than astrophysical observations allow. Now usually in quantum
field theory one does not worry too much about the vacuum energy as one can
normal-order it away. However, as we know, gravity is unlike any other theory in
that not only energy-differences but absolute energies matter (and cannot just be
dropped).
The question why the observed cosmological constant is so small (and recent
astrophysical observations appear to favour a tiny non-zero value) is one aspect of
what is known as the Cosmological Constant Problem. See section 37.4 for a brief
discussion of this profound issue and some references.
7. We will consider the possibility that $\Lambda \neq 0$ only in the sections on cosmology (in
all other applications, $\Lambda$ can indeed be neglected).
can, taken at face value, be regarded as ten algebraic equations for certain traces of
the Riemann tensor $R_{\alpha\beta\gamma\delta}$. $R_{\alpha\beta\gamma\delta}$ has, as we know, twenty independent components,
so how are the other ten determined? The obvious answer, already given above, is of
course that we solve the Einstein equations for the metric $g_{\alpha\beta}$ and then calculate the
Riemann curvature tensor of that metric.
However, this answer leaves something to be desired because it does not really provide
an explanation of how the information about these other components is encoded in the
Einstein equations. It is interesting to understand this because it is precisely these
components of the Riemann tensor which represent the effects of gravity in vacuum, i.e.
where $T_{\alpha\beta} = 0$, like tidal forces and gravitational waves.
The more insightful answer is that the information is encoded in the Bianchi identities
which serve as propagation equations for the trace-free parts of the Riemann tensor
away from the regions where $T_{\alpha\beta} \neq 0$.
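Let us see how this works. Recall from section 10.4 the decomposition of the Riemann
tensor into the traceless Weyl tensor and the trace parts, the Ricci tensor and Ricci
scalar,
$$C_{\alpha\beta\gamma\delta} = R_{\alpha\beta\gamma\delta}
   - \frac{1}{D-2} \left( g_{\alpha\gamma} R_{\beta\delta} + R_{\alpha\gamma} g_{\beta\delta} - g_{\alpha\delta} R_{\beta\gamma} - R_{\alpha\delta} g_{\beta\gamma} \right)
   + \frac{1}{(D-1)(D-2)}\, R \left( g_{\alpha\gamma} g_{\beta\delta} - g_{\alpha\delta} g_{\beta\gamma} \right) .$$
In vacuum, where $R_{\alpha\beta} = 0$, the Weyl tensor therefore coincides with the Riemann tensor.
As anticipated, the Weyl tensor thus encodes the information about the gravitational
field in vacuum.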
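That the combination above is indeed traceless can be checked on a test tensor with exact arithmetic. The following sketch is only an algebraic check, not a computation for an actual metric: the "Riemann tensor" is built (as an assumption of the example) as a Kulkarni-Nomizu product of two arbitrary symmetric matrices `A` and `B`, which has the algebraic symmetries of a curvature tensor, and indices are contracted with the flat metric diag(-1, 1, 1, 1).

```python
from fractions import Fraction

D = 4
ETA = [-1, 1, 1, 1]   # diagonal of the metric and of its inverse


def g(a, b):
    return ETA[a] if a == b else 0


# Two arbitrary symmetric matrices (hypothetical test data).
A = [[Fraction(a + b + 1) for b in range(D)] for a in range(D)]
B = [[Fraction((a + 1) * (b + 1), 2) for b in range(D)] for a in range(D)]


def riem(a, b, c, d):
    """Kulkarni-Nomizu product of A and B: has Riemann-type index symmetries."""
    return (A[a][c] * B[b][d] + A[b][d] * B[a][c]
            - A[a][d] * B[b][c] - A[b][c] * B[a][d])


def ricci(b, d):
    """R_bd = g^{ac} R_abcd for the diagonal metric."""
    return sum(ETA[a] * riem(a, b, a, d) for a in range(D))


R = sum(ETA[a] * ricci(a, a) for a in range(D))


def weyl(a, b, c, d):
    """Weyl tensor from the decomposition quoted in the text."""
    ricci_part = (g(a, c) * ricci(b, d) + ricci(a, c) * g(b, d)
                  - g(a, d) * ricci(b, c) - ricci(a, d) * g(b, c))
    scalar_part = R * (g(a, c) * g(b, d) - g(a, d) * g(b, c))
    return (riem(a, b, c, d)
            - Fraction(1, D - 2) * ricci_part
            + Fraction(1, (D - 1) * (D - 2)) * scalar_part)


weyl_trace = [[sum(ETA[a] * weyl(a, b, a, d) for a in range(D))
               for d in range(D)] for b in range(D)]
print(all(x == 0 for row in weyl_trace for x in row))
```

The trace vanishes identically while individual components such as `weyl(0, 1, 2, 3)` do not, which illustrates that the Weyl tensor carries exactly the trace-free part of the curvature.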
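that question we make use of the relation (10.87) derived in section 10.4,
$$\nabla^\alpha C_{\alpha\beta\gamma\delta} = (D-3) \left( \nabla_\gamma P_{\beta\delta} - \nabla_\delta P_{\beta\gamma} \right) . \qquad (18.49)$$
Using the D-dimensional Einstein equations (18.38), (18.39) to replace the Ricci tensor
and Ricci scalar by the energy-momentum tensor, one now obtains a propagation
equation for the Weyl tensor of the form
$$\nabla^\alpha C_{\alpha\beta\gamma\delta} = J_{\beta\gamma\delta} , \qquad (18.51)$$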
F = J , (18.53)
and provides an intuitive (as well as, if required, detailed analytical) understanding of
the propagation properties of the gravitational field.
Let us try to understand in a bit more detail, but necessarily at a very superficial and
unsophisticated level, the structure of the Einstein equations.
As a first step, let us do something that we should have perhaps done right away, namely
count the number of dynamical variables and the number of equations we have:
- the dynamical variables are the components $g_{\alpha\beta}(x)$ of the metric, i.e. 10 functions
of 4 variables.
- the Ricci or Einstein tensor is symmetric; therefore the Einstein equations constitute
a set of ten algebraically independent second order differential equations for
the metric $g_{\alpha\beta}$.
At first, these ten 2nd order equations for ten unknowns look exactly right: specifying
the values of the metric and its first time-derivative as initial values on some (constant
time) hypersurface, say, this should then uniquely determine the ten components of
the metric in some region to the future of that hypersurface.
At second sight, however, this cannot possibly be right and the end of the story and, if
true, would actually be a major disaster. After all, the Einstein equations are generally
covariant. Thus, given one metric that is a solution to the Einstein equations, one should
be able to perform an arbitrary coordinate transformation and still have a (physically
equivalent) solution to the Einstein equations. That means that the (ten?) Einstein
equations should not determine the ten components of the metric uniquely but only
up to arbitrary coordinate transformations, i.e. up to four arbitrary functions of four
variables.
Phrased in terms of initial values, one should be able to perform arbitrary time-dependent
coordinate transformations on a solution, but if these coordinate transformations hap-
pen to be the identity transformation on the initial hypersurface, then these solutions
related by (future) coordinate transformations should arise from the same initial data.
Either way we should expect only six independent generally covariant equations for the
metric, determining the 10 components of the metric up to 4 arbitrary functions. How
does that happen? Here we should recall the contracted Bianchi identities. They tell
us that
$$\nabla_\alpha G^{\alpha\beta} = 0 . \qquad (18.54)$$
We see that, even though the ten Einstein equations are algebraically independent, there
are actually four differential relations among them, so this is just right.
It is no coincidence, by the way, that the Bianchi identities come to the rescue of
general covariance. We will see in section 19.6 that the Bianchi identities can in fact be
understood as a consequence of the general covariance of the Einstein equations (and
of the corresponding action principle).
The general covariance of the Einstein equations is reflected in the fact that only six
of the ten equations are truly dynamical 2nd-order differential equations while four of
them constrain the initial values of the fields on some spacelike hypersurface. Indeed, in
terms of some choice $(x^\alpha) = (t, x^k)$ of time and space coordinates, the Bianchi identities
$$\nabla_\alpha G^{\alpha\beta} = \partial_\alpha G^{\alpha\beta} + \Gamma^\alpha_{\alpha\gamma} G^{\gamma\beta} + \Gamma^\beta_{\alpha\gamma} G^{\alpha\gamma} = 0 \qquad (18.55)$$
can be written as
$$\partial_t G^{t\beta} = -\partial_k G^{k\beta} - \Gamma^\alpha_{\alpha\gamma} G^{\gamma\beta} - \Gamma^\beta_{\alpha\gamma} G^{\alpha\gamma} . \qquad (18.56)$$
Since the 3 terms on the right-hand side contain at most 2nd time derivatives of the
metric, the 4 components $G^{t\beta}$ of the Einstein tensor can contain at most 1st time
derivatives of the metric. Thinking of initial data as being given by the metric and
its 1st time-derivative on some initial hypersurface, this means that the components
$G^{t\beta} = 0$ of the Einstein equations (or their counterpart in the presence of matter)
impose constraints on these initial data and do not provide evolution equations for
these initial data.
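The counting of equations described above can be summarised in a tiny helper. This is merely bookkeeping under the assumptions stated in the text (a symmetric Einstein tensor and one constraint per component $G^{t\beta}$); the function name is hypothetical and the generalisation to $D$ dimensions is added here for illustration.

```python
def einstein_equation_count(D=4):
    """Split the Einstein equations into constraints and evolution equations.

    Returns (components, constraints, evolution) for a D-dimensional space-time:
    the symmetric tensor G^{ab} has D(D+1)/2 components, the D components
    G^{t b} are constraints on the initial data, the rest are dynamical.
    """
    components = D * (D + 1) // 2
    constraints = D
    evolution = components - constraints
    return components, constraints, evolution


print(einstein_equation_count(4))
```

In four dimensions this reproduces the split of the ten equations into four constraints and six evolution equations; the number of constraints equals the number of arbitrary functions in a coordinate transformation.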
The perhaps more familiar counterpart of these constraints in the case of Maxwell theory
is the Gauss Law constraint $\vec{\nabla} \cdot \vec{E} = 0$, which arises as the 0-component of the Maxwell
equations $\partial_a F^{ab} = 0$,
$$\partial_a F^{a0} = \partial_k F^{k0} = \vec{\nabla} \cdot \vec{E} = 0 , \qquad (18.57)$$
and which also involves at most 1st time-derivatives of the dynamical field (the gauge
field), and thus constitutes a constraint on the initial conditions rather than a true
evolution equation.
Moreover, the identity
$$\partial_a \partial_b F^{ab} = 0 \;\Rightarrow\; \partial_0 (\partial_i F^{i0}) = -\partial_k (\partial_a F^{ak}) \qquad (18.58)$$
implies
1. that the 4 Maxwell equations are not independent (as required by gauge invariance
as they should only determine the 4 components $A_a$ of the gauge field up to gauge
transformations) and
2. that the Gauss Law constraint equation is propagated, i.e. that by virtue of the
true equations of motion it will hold at all times if it holds initially:
$$\partial_a F^{ak} = 0 \;\text{ and }\; \partial_i F^{i0} = 0 \text{ at } t = t_0 \;\Rightarrow\; \partial_i F^{i0} = 0 \text{ for all } t .$$
Analogously, for the Einstein equations the contracted Bianchi identity in the form
(18.56) implies not only 4 relations among the 10 field equations (as required by general
covariance) but also that the constraints of general relativity are again propagated
in this sense. One simple way to see this (or that this is plausible, at least - in order
to prove a theorem one would need to be more precise about the initial value formulation
and make sure that it leads to a well-defined time evolution etc.) is to note that
by (18.56) $G^{\alpha\beta} = 0$ at $t = t_0$ (thus also $\partial_k G^{\alpha\beta} = 0$ at $t = t_0$) implies
$$\partial_t G^{t\beta} = 0 \text{ at } t = t_0 .$$
We will discuss this and related issues in some more detail from a slightly different
(Hamiltonian) perspective in section 20.
19 Einstein Equations from an Action Principle
To increase our confidence that the Einstein equations we have derived above are in fact
reasonable and almost certainly correct, we can adopt a more modern point of view.
We can ask if the Einstein equations follow from an action principle or, alternatively,
what would be a natural action principle for the metric.
After all, for example in the construction of the Standard Model, one also does not start
with the equations of motion but one writes down the simplest possible Lagrangian with
the desired field content and symmetries.
We will start with the gravitational part, i.e. the Einstein tensor $G_{\alpha\beta}$ of the Einstein
equations, and deal with the matter part, the energy-momentum tensor $T_{\alpha\beta}$, later.
By general covariance, an action for the metric $g_{\alpha\beta}$ will have to take the form
$$S = \int \sqrt{g}\, d^4x \; \Phi(g_{\alpha\beta}) , \qquad (19.1)$$
where $\Phi$ is a scalar constructed from the metric. So what is $\Phi$ going to be? Clearly,
the simplest choice is the Ricci scalar $R$, and this is also the unique choice if one is
looking for a scalar constructed from not higher than second derivatives of the metric.
Therefore we postulate the beautifully simple and elegant action
$$S_{EH}[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x \; R \qquad (19.2)$$
Hilbert was not the first to apply this principle to gravitation. Lorentz had
done it before him. So had Einstein, a few weeks earlier. Hilbert was the
first, however, to state this principle correctly.34
34 A. Pais, Subtle is the Lord (chapter 14.d, which also contains a detailed account of the interaction
between Einstein and Hilbert in the crucial November 1915 period).
We will now prove that the Euler-Lagrange equations following from the Einstein-Hilbert
Lagrangian indeed give rise to the Einstein tensor and the vacuum Einstein equations.
It is truly remarkable, that such a simple Lagrangian is capable of explaining practically
all known gravitational, astrophysical and cosmological phenomena.
Before turning to a proof of this statement, I need to make one preliminary remark:
In this discussion we will at first ignore total derivative (or boundary) terms that one
picks up from partial integrations of the variations and concentrate on the bulk Euler-Lagrange
equations of motion. In standard variational problems one usually justifies
this by appealing to the fact that one can e.g. choose the variations of the fields to
vanish on the boundary and that therefore such boundary terms are zero. In the case
at hand, things are a bit more complicated since the boundary terms that one picks up
in the process of performing the variations turn out to depend both on the variation
of the field (i.e. the metric) on the boundary and on its normal derivative, and it is
not consistent to require both to be zero (i.e. to impose both Dirichlet and Neumann
boundary conditions). This whole issue is interesting in its own right and warrants a
separate discussion, and therefore we will deal with it afterwards, in sections 19.4 and
19.5.
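Returning to the Einstein-Hilbert action, we now need to determine its behaviour under
a variation of the metric. Since the Ricci scalar is $R = g^{\alpha\beta} R_{\alpha\beta}$, it turns out to be more
convenient to consider variations $\delta g^{\alpha\beta}$ of the inverse metric instead of $\delta g_{\alpha\beta}$. This is of
course equivalent, the two variations being related by
$$\delta (g^{\alpha\beta} g_{\beta\gamma}) = \delta (\delta^\alpha{}_\gamma) = 0 \;\Rightarrow\; \delta g_{\alpha\beta} = -g_{\alpha\gamma} (\delta g^{\gamma\delta}) g_{\delta\beta} . \qquad (19.3)$$
We can then use the variation of the volume element, $\delta \sqrt{g} = -\tfrac{1}{2} \sqrt{g}\, g_{\alpha\beta}\, \delta g^{\alpha\beta}$,
and $\delta R = R_{\alpha\beta}\, \delta g^{\alpha\beta} + g^{\alpha\beta}\, \delta R_{\alpha\beta}$
to deduce
$$\delta S_{EH} = \int \sqrt{g}\, d^4x \left[ (-\tfrac{1}{2} g_{\alpha\beta} R + R_{\alpha\beta})\, \delta g^{\alpha\beta} + g^{\alpha\beta}\, \delta R_{\alpha\beta} \right]
   = \int \sqrt{g}\, d^4x \, (R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R)\, \delta g^{\alpha\beta} + \int \sqrt{g}\, d^4x \; g^{\alpha\beta}\, \delta R_{\alpha\beta} . \qquad (19.6)$$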
The first term all by itself would already give the Einstein tensor. Thus we need to show
that the second term is a total derivative. I do not know of any particularly elegant
argument to establish this (in a coordinate basis - written in terms of differential forms
this would be completely obvious), so this will require a little bit of work, but it is not
difficult.
Postponing the proof of this statement to the next section 19.2, we have established
that (ignoring boundary terms) the variation of the Einstein-Hilbert action gives the
gravitational part (left hand side) of the Einstein equations,
$$\delta S_{EH}[g_{\alpha\beta}] = \delta \int \sqrt{g}\, d^4x \; R = \int \sqrt{g}\, d^4x \; G_{\alpha\beta}\, \delta g^{\alpha\beta} + \oint \ldots . \qquad (19.7)$$
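Remarks:
1. If one wants to include the cosmological constant $\Lambda$, then the action gets modified
to
$$S_{EH,\Lambda} = \int \sqrt{g}\, d^4x \, (R - 2\Lambda) . \qquad (19.8)$$
Since $\Lambda$ enters only via the replacement $R \to R - 2\Lambda$ in the term arising from
the variation of $\sqrt{g}$, the variation now gives
$$G_{\alpha\beta} = R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R \;\to\; R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} (R - 2\Lambda) = G_{\alpha\beta} + \Lambda g_{\alpha\beta} , \qquad (19.9)$$
2. Of course, once one is working at the level of the action, it is easy to come up
with covariant generalisations of the Einstein-Hilbert action, such as
$$S = \int \sqrt{g}\, d^4x \, (R + c_1 R^2 + c_2 R_{\alpha\beta} R^{\alpha\beta} + c_3 R_{\alpha\beta\gamma\delta} R^{\alpha\beta\gamma\delta} + c_4 R\, \Box R + \ldots) , \qquad (19.10)$$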
priori totally non-obviously) leads to equations of motion that are no higher than
2nd order in derivatives.35
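The purpose of this technical appendix to the previous section is to derive a formula
for the metric variation of the Ricci tensor which shows that indeed $g^{\alpha\beta}\, \delta R_{\alpha\beta}$ is a total
derivative.
First of all, we need the explicit expression for the Ricci tensor in terms of the Christoffel
symbols, which can be obtained by contraction of (7.5),
$$R_{\alpha\beta} = \partial_\gamma \Gamma^\gamma_{\alpha\beta} - \partial_\beta \Gamma^\gamma_{\alpha\gamma} + \Gamma^\gamma_{\gamma\delta} \Gamma^\delta_{\alpha\beta} - \Gamma^\gamma_{\beta\delta} \Gamma^\delta_{\alpha\gamma} . \qquad (19.12)$$
Now we need to calculate the variation of $R_{\alpha\beta}$. We will not require the explicit expression
in terms of the variations of the metric, but only in terms of the variations $\delta\Gamma^\alpha_{\beta\gamma}$ induced
by the variations of the metric. This simplifies things considerably. One finds
$$\delta R_{\alpha\beta} = \partial_\gamma \delta\Gamma^\gamma_{\alpha\beta} - \partial_\beta \delta\Gamma^\gamma_{\alpha\gamma} + \delta\Gamma^\gamma_{\gamma\delta}\, \Gamma^\delta_{\alpha\beta} + \Gamma^\gamma_{\gamma\delta}\, \delta\Gamma^\delta_{\alpha\beta} - \delta\Gamma^\gamma_{\beta\delta}\, \Gamma^\delta_{\alpha\gamma} - \Gamma^\gamma_{\beta\delta}\, \delta\Gamma^\delta_{\alpha\gamma} . \qquad (19.13)$$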
Now the crucial observation is that $\delta\Gamma^\alpha_{\beta\gamma}$ is a tensor. This follows from the arguments
given in section 4.4, but I will repeat it here in the present context. Of course, we know
that the Christoffel symbols themselves are not tensors, because of the inhomogeneous
(second derivative) term appearing in the transformation rule under coordinate transformations,
but this term is independent of the metric. Thus the metric variation of the
Christoffel symbols indeed transforms as a tensor.
This can also be confirmed by explicit calculation. Just for the record, I will give an
expression for $\delta\Gamma^\alpha_{\beta\gamma}$ which is easy to remember as it takes exactly the same form as
the definition of the Christoffel symbol, only with the metric replaced by the metric
variation and the partial derivatives by covariant derivatives, i.e.
$$\delta\Gamma^\alpha_{\beta\gamma} = \tfrac{1}{2}\, g^{\alpha\delta} \left( \nabla_\beta \delta g_{\delta\gamma} + \nabla_\gamma \delta g_{\delta\beta} - \nabla_\delta \delta g_{\beta\gamma} \right) . \qquad (19.14)$$
It turns out, none too surprisingly, that $\delta R_{\alpha\beta}$ can be written rather compactly in terms
of covariant derivatives of $\delta\Gamma^\alpha_{\beta\gamma}$, namely as
$$\delta R_{\alpha\beta} = \nabla_\gamma \delta\Gamma^\gamma_{\alpha\beta} - \nabla_\beta \delta\Gamma^\gamma_{\alpha\gamma} . \qquad (19.15)$$
Thus one simply needs to replace the partial derivatives in (19.13) by covariant derivatives
and drop the other terms that involve the undifferentiated Christoffel symbols.
35 For a review of these so-called Lanczos-Lovelock models, see e.g. T. Padmanabhan, D. Kothawala,
Lanczos-Lovelock models of gravity, arXiv:1302.2151 [gr-qc].
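In fact, this could not have been otherwise as (19.13) depends on the partial derivatives
of the $\delta\Gamma^\alpha_{\beta\gamma}$ but must at the same time be tensorial. The expression (19.15) is the unique
possibility that fulfills these requirements. If you don't trust this argument (which
essentially amounts to working at the origin of an inertial coordinate system where
partial = covariant derivatives), you can also check this in detail (and thus perhaps in
this way learn to trust and appreciate the quick argument):
As a first check on (19.15), note that the first term on the right hand side is manifestly
symmetric and that the second term is also symmetric because of (4.48) and (5.70). To
establish (19.15), one simply has to use the definition of the covariant derivative. The
first term is
$$\nabla_\gamma \delta\Gamma^\gamma_{\alpha\beta} = \partial_\gamma \delta\Gamma^\gamma_{\alpha\beta} + \Gamma^\gamma_{\gamma\delta}\, \delta\Gamma^\delta_{\alpha\beta} - \Gamma^\delta_{\gamma\alpha}\, \delta\Gamma^\gamma_{\delta\beta} - \Gamma^\delta_{\gamma\beta}\, \delta\Gamma^\gamma_{\alpha\delta} , \qquad (19.16)$$
which takes care of the first, fourth, fifth and sixth terms of (19.13). The remaining
terms are
$$-\partial_\beta \delta\Gamma^\gamma_{\alpha\gamma} + \delta\Gamma^\gamma_{\gamma\delta}\, \Gamma^\delta_{\alpha\beta} = -\nabla_\beta \delta\Gamma^\gamma_{\alpha\gamma} , \qquad (19.17)$$
which establishes (19.15).
What we really need is the contraction $g^{\alpha\beta}\, \delta R_{\alpha\beta}$, which we can now write as
$$g^{\alpha\beta}\, \delta R_{\alpha\beta} = \nabla_\gamma (g^{\alpha\beta}\, \delta\Gamma^\gamma_{\alpha\beta}) - \nabla_\beta (g^{\alpha\beta}\, \delta\Gamma^\gamma_{\alpha\gamma})
   = \nabla_\gamma \left( g^{\alpha\beta}\, \delta\Gamma^\gamma_{\alpha\beta} - g^{\alpha\gamma}\, \delta\Gamma^\beta_{\alpha\beta} \right) . \qquad (19.18)$$
This establishes the claim that this term is a total derivative and hence gives rise to
a boundary term in the variation of the Einstein-Hilbert action, a boundary term that
does, however, require further discussion - see sections 19.4 and 19.5 below.
Using the explicit expression for $\delta\Gamma^\alpha_{\beta\gamma}$ given above, we see that we can also write (19.18)
rather neatly and explicitly as
$$g^{\alpha\beta}\, \delta R_{\alpha\beta} = (\nabla^\alpha \nabla^\beta - g^{\alpha\beta}\, \Box)\, \delta g_{\alpha\beta}
   = (g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\, \nabla_\gamma \nabla_\delta\, \delta g_{\alpha\beta}
   = -(g_{\alpha\gamma} g_{\beta\delta} - g_{\alpha\beta} g_{\gamma\delta})\, \nabla^\gamma \nabla^\delta\, \delta g^{\alpha\beta} . \qquad (19.19)$$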
This result will turn out to be useful on a couple of occasions later on in these notes,
e.g. for the discussion of Noether currents associated to general covariance in section
19.6 and for the derivation of the energy-momentum tensor of a non-minimally coupled
scalar field in section 21.3.
One can also use the identity (19.19) to rather painlessly determine the metric vari-
ation of some more complicated Lagrangians of the type (19.10). Consider e.g. the
class of Lagrangians known as F (R) Lagrangians where the Lagrangian is (none too
surprisingly) some function F (R) of the scalar curvature R,
$$S = \int \sqrt{g}\, d^4x \; F(R) \qquad (19.20)$$
(for no particularly compelling reason, at least as far as I can see (it can be done is
not a compelling reason . . . ), a lot of work has been dedicated to such Lagrangians in
the last ten years, as a quick look at the arXiv will reveal).
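Using
$$\delta R = R_{\alpha\beta}\, \delta g^{\alpha\beta} - (\nabla_\alpha \nabla_\beta - g_{\alpha\beta}\, \Box)\, \delta g^{\alpha\beta} , \qquad (19.22)$$
and assuming there are no (or ignoring) boundary terms, so that we can integrate by
parts the differential operator acting on $\delta g^{\alpha\beta}$ and let it act on $F'(R)$ instead, one finds
$$\delta S = \int \sqrt{g}\, d^4x \left[ -\frac{1}{2}\, g_{\alpha\beta} F(R) + F'(R) R_{\alpha\beta} - (\nabla_\alpha \nabla_\beta - g_{\alpha\beta}\, \Box) F'(R) \right] \delta g^{\alpha\beta} . \qquad (19.23)$$
From this one can immediately read off the vacuum field equations.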
One evident consequence of this is that for non-pathological choices of F (R), a solution
of the vacuum Einstein equations ($R_{\alpha\beta} = 0$, $R = 0$, such as the Schwarzschild solution)
will continue to be a solution of this F (R)-gravity theory, so that such proposed modi-
fications of the Einstein-Hilbert action do not immediately run afoul of precision solar
system tests of general relativity.
In order to obtain the non-vacuum Einstein equations, we need to decide what the matter
Lagrangian should be. Now there is an obvious choice for this. If we have matter, then
in addition to the Einstein equations we also want the equations of motion for the matter
fields. Therefore we should add to the Einstein-Hilbert action the standard minimally
coupled matter action
Z
4
SM [, g ] = gd x LM ((x), (x), . . . ; g (x), g (x), . . .) , (19.24)
representing any kind of (scalar, vector, tensor, . . . ) matter field, obtained by suitable
covariantisation of the corresponding matter action in Minkowski space via the principle
of minimal coupling (section 5). Thus e.g. the matter action for a Klein-Gordon field
would be (5.11),
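\[ S_M[\phi, g_{\alpha\beta}] = \int \sqrt{g}\, d^4x \left[ -\tfrac{1}{2}\, g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - \tfrac{1}{2}\, m^2\phi^2 \right] \;, \tag{19.25} \]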
Of course, the variation of the matter action with respect to the matter fields will
give rise to the covariant equations of motion of the matter fields. If we now want
to derive the coupled gravity-matter equations from a variational principle, then the
matter contribution to the gravitational field equations (i.e. the source terms for the
gravitational field) will necessarily be given by the metric variation of the matter action.
As already discussed in detail in section 6.6, we may as well simply define the covariant
energy-momentum tensor to be the source of the gravitational field equations (6.105),
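\[ \delta_{\text{metric}} S_M[\phi, g_{\alpha\beta}] = -\tfrac{1}{2} \int \sqrt{g}\, d^4x\; T_{\alpha\beta}\,\delta g^{\alpha\beta} \quad\Leftrightarrow\quad T_{\alpha\beta} := -\frac{2}{\sqrt{g}}\, \frac{\delta}{\delta g^{\alpha\beta}}\, S_M[\phi, g_{\alpha\beta}] \;. \tag{19.27} \]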
In particular, we had already seen in section 6.6 that this definition reproduces the
known results in the case of a scalar field or Maxwell theory, and that in general it
automatically gives a symmetric and gauge invariant tensor without the need for some
improvement procedure. It is also automatically covariantly conserved on-shell as a
consequence of general covariance of the matter action (cf. section 19.6).
Remarks:
1. If one were to try to deduce the gravitational field equations by starting from a
variational principle, i.e. by constructing the simplest generally covariant action
for the metric and the matter fields (and this would be the modern approach to
the problem, had Einstein not already solved it for us 100 years ago), then one
would also invariably be led to the above action.
The relative numerical factor $16\pi G_N$ between the two terms would of course then
not be fixed a priori, because this approach will not (and cannot possibly be
expected to) determine Newton's constant. The prefactor could once again be
determined by looking at the Newtonian limit of the resulting equations of motion.
2. Typically, the above action principle will lead to a very complicated coupled system
of equations for the metric and the matter fields because the metric also appears
in the energy-momentum tensor and in the equations of motion for the matter
fields.
constant term to the matter Lagrangian instead (and this clearly reveals its inter-
pretation as a constant shift of the energy, e.g. by a vacuum energy contribution,
of the matter fields).
As we have seen above, the variation of the Einstein-Hilbert action leads to a boundary
term that depends not just on the metric but also on the derivatives of the metric. This
is related to the fact that the action itself depends also on the second derivatives of the
metric. Indeed, it follows from the explicit expression for the scalar curvature in terms
of the Christoffel symbols and the metric, obtained by contracting (7.5),
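\[ R = g^{\mu\nu}\left( \partial_\lambda\Gamma^\lambda_{\mu\nu} - \partial_\nu\Gamma^\lambda_{\mu\lambda} + \Gamma^\lambda_{\lambda\rho}\Gamma^\rho_{\nu\mu} - \Gamma^\lambda_{\nu\rho}\Gamma^\rho_{\lambda\mu} \right) \;, \tag{19.30} \]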
that the Einstein-Hilbert action contains terms that are quadratic in first derivatives of
the metric, as well as terms that depend linearly on the second derivatives.
In ordinary Lagrangian field theory, such linear second derivative terms can usually be
introduced or eliminated (depending on what one wants to achieve) by the addition of
suitable boundary terms to the action. As an example, consider the action of a free
scalar field (in Minkowski space, say):
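When one uses this action, the boundary term arising from the variation of the
action will depend on $\delta\phi$ but not on its derivatives,
\[ \delta S[\phi] = \int (\Box\phi)\,\delta\phi - \int \partial_\alpha(\partial^\alpha\phi\;\delta\phi) = \int (\Box\phi)\,\delta\phi - \oint_\Sigma d\sigma_\alpha\,(\partial^\alpha\phi\;\delta\phi) \;, \tag{19.32} \]
where the boundary of the integration region is denoted by $\Sigma$. This is thus the
appropriate action for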
It will therefore give rise to the same Euler-Lagrange bulk equations of motion.
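In this case, however, the boundary term arising from the variation of the action
will depend both on $\delta\phi$ and its derivatives,
\[ \delta S_1[\phi] = \int (\Box\phi)\,\delta\phi + \tfrac{1}{2} \oint_\Sigma d\sigma_\alpha\,(\phi\,\partial^\alpha\delta\phi - \delta\phi\,\partial^\alpha\phi) \;. \tag{19.36} \]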
There are no obvious (or at least no obviously useful) boundary conditions com-
patible with this form of the action.
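Its variation is
\[ \delta S_2[\phi] = \int (\Box\phi)\,\delta\phi + \oint_\Sigma d\sigma_\alpha\,(\phi\,\partial^\alpha\delta\phi) \;. \tag{19.38} \]
In this case, the boundary term only depends on the normal derivative of the
variation,
\[ d\sigma_\alpha\,(\phi\,\partial^\alpha\delta\phi) \sim \phi\,\partial_N\delta\phi \;, \tag{19.39} \]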
Let us now return to the case at hand, the action for gravity. In this case second
derivative terms are required by general covariance of the action (since there is no
scalar that can be constructed solely from the metric and its first derivatives). However,
the fact that the second derivatives appear linearly and that $g^{\alpha\beta}\delta R_{\alpha\beta}$ is a total
derivative reflects the fact that these second derivatives are spurious in the sense that
they can be eliminated by an integration by parts or by adding a suitable boundary or
total derivative term, albeit at the expense of general covariance.
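\[ \partial_\gamma g^{\alpha\beta} = -g^{\alpha\rho} g^{\beta\sigma}\,\partial_\gamma g_{\rho\sigma} = -(\Gamma^\alpha_{\gamma\rho}\, g^{\rho\beta} + \Gamma^\beta_{\gamma\rho}\, g^{\rho\alpha}) \tag{19.41} \]
and $\partial_\alpha(\sqrt{g}) = \sqrt{g}\,\Gamma^\beta_{\beta\alpha}$, one finds that the Lagrangian density $\sqrt{g}R$ can be written as
\[ \sqrt{g}R = \sqrt{g}\, g^{\beta\gamma}\left( \Gamma^\alpha_{\beta\delta}\Gamma^\delta_{\gamma\alpha} - \Gamma^\alpha_{\beta\gamma}\Gamma^\delta_{\delta\alpha} \right) + \partial_\alpha(\sqrt{g} B^\alpha) \tag{19.42} \]
where
\[ B^\alpha = g^{\beta\gamma}\Gamma^\alpha_{\beta\gamma} - g^{\alpha\beta}\Gamma^\gamma_{\gamma\beta} \;. \tag{19.43} \]
With due care, one can also write the total derivative term as
\[ \partial_\alpha(\sqrt{g} B^\alpha) = \sqrt{g}\,\nabla_\alpha B^\alpha \;, \tag{19.44} \]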
as long as one remembers that B is not a tensor. Either way, we see that instead of the
generally covariant Einstein-Hilbert action one can use the non-covariant but quadratic
action
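\[ S_{EH}[g_{\alpha\beta}] \;\to\; S_E[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x\; g^{\beta\gamma}\left( \Gamma^\alpha_{\beta\delta}\Gamma^\delta_{\gamma\alpha} - \Gamma^\alpha_{\beta\gamma}\Gamma^\delta_{\delta\alpha} \right) \;. \tag{19.45} \]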
This action was originally considered by Einstein himself and is therefore also known as
the Einstein action.
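The origin of the quadratic form of this Lagrangian can be traced in three steps. This is a sketch, assuming the expression (19.30) for $R$ with the symmetric Levi-Civita connection; the index labels are chosen here for illustration. Integrating the two second-derivative terms by parts gives
\begin{align*}
\sqrt{g}\, g^{\mu\nu}\partial_\lambda\Gamma^\lambda_{\mu\nu} &= \partial_\lambda(\sqrt{g}\, g^{\mu\nu}\Gamma^\lambda_{\mu\nu}) - \Gamma^\lambda_{\mu\nu}\,\partial_\lambda(\sqrt{g}\, g^{\mu\nu}) \;, \\
-\sqrt{g}\, g^{\mu\nu}\partial_\nu\Gamma^\lambda_{\mu\lambda} &= -\partial_\nu(\sqrt{g}\, g^{\mu\nu}\Gamma^\lambda_{\mu\lambda}) + \Gamma^\lambda_{\mu\lambda}\,\partial_\nu(\sqrt{g}\, g^{\mu\nu}) \;.
\end{align*}
Using $\partial_\lambda(\sqrt{g}\, g^{\mu\nu}) = \sqrt{g}\,(\Gamma^\rho_{\rho\lambda} g^{\mu\nu} - \Gamma^\mu_{\lambda\rho} g^{\rho\nu} - \Gamma^\nu_{\lambda\rho} g^{\rho\mu})$, and hence $\partial_\nu(\sqrt{g}\, g^{\mu\nu}) = -\sqrt{g}\, g^{\rho\nu}\Gamma^\mu_{\rho\nu}$, the non-derivative remainders add up to
\[ 2\sqrt{g}\, g^{\mu\nu}\left( \Gamma^\lambda_{\mu\rho}\Gamma^\rho_{\nu\lambda} - \Gamma^\lambda_{\mu\nu}\Gamma^\rho_{\rho\lambda} \right) \;. \]
The terms of (19.30) that are already quadratic in the Christoffel symbols contribute minus one times the same expression. The net coefficient of the quadratic term is therefore one, and the two total derivatives combine into $\partial_\alpha(\sqrt{g} B^\alpha)$ with $B^\alpha = g^{\beta\gamma}\Gamma^\alpha_{\beta\gamma} - g^{\alpha\beta}\Gamma^\gamma_{\gamma\beta}$.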
in which the non-covariant terms are now manifestly confined to the boundary of the
region of integration. This is now a reasonably respectable action, but there is a more
attractive variant of this construction which we will discuss in the next section.
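In section 19.1 we have seen that the metric variation of the Einstein-Hilbert action has
the form
\[ \delta S_{EH}[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\delta g^{\alpha\beta} + \int \sqrt{g}\, d^4x\; \nabla_\alpha(\delta B)^\alpha \;, \tag{19.47} \]
where
\[ (\delta B)^\alpha = g^{\beta\gamma}\,\delta\Gamma^\alpha_{\beta\gamma} - g^{\alpha\beta}\,\delta\Gamma^\gamma_{\gamma\beta} = (g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta\,\delta g_{\gamma\delta} \;. \tag{19.48} \]
The reason for this notation, and the relation between this object $(\delta B)^\alpha$ and the quantity
$B^\alpha$ introduced in (19.43) will be explained below. The first (bulk) term gives us as
the Euler-Lagrange equations the vacuum Einstein equations $G_{\alpha\beta} = 0$, while the second
term is a total derivative.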
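Thus, when one performs the integral over a space-time region $V$ bounded by the
hypersurface $\partial V = \Sigma$ (which we shall assume to be spacelike or timelike), upon use of the
Gauss integral formula (15.47) one finds that the second (total derivative) term can be
written as
\[ \int_V \sqrt{g}\, d^4x\; \nabla_\alpha(\delta B)^\alpha = \oint_\Sigma d\sigma_\alpha\,(\delta B)^\alpha = \oint_\Sigma d^3y\,\sqrt{h}\; N_\alpha(\delta B)^\alpha \;, \tag{19.49} \]
where $N^\alpha$ is the normal vector to the boundary $\Sigma$ of $V$ and $h_{ab}$ is the induced metric
on the boundary. Explicitly the boundary integrand is
\[ N_\alpha(\delta B)^\alpha = (N^\gamma g^{\beta\delta} - N^\beta g^{\gamma\delta})\,\nabla_\beta\,\delta g_{\gamma\delta} \;. \tag{19.50} \]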
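Using the decomposition
\[ g^{\alpha\beta} = h^{\alpha\beta} + \epsilon N^\alpha N^\beta \tag{19.51} \]
of the metric on $\Sigma$ (with $N_\alpha N^\alpha = \epsilon$), one sees that the terms with three $N$s cancel, and
one is left with
\[ N_\alpha(\delta B)^\alpha = N^\gamma h^{\beta\delta}\,\nabla_\beta\,\delta g_{\gamma\delta} - N^\beta h^{\gamma\delta}\,\nabla_\beta\,\delta g_{\gamma\delta} \;. \tag{19.52} \]
The first term only depends on the variations $\delta g_{\gamma\delta}$ on the boundary, and its tangential
derivatives $h^{\beta\delta}\nabla_\beta\,\delta g_{\gamma\delta}$. Therefore that term is zero if one imposes standard Dirichlet
boundary conditions
\[ \delta g_{\alpha\beta}|_\Sigma = 0 \tag{19.53} \]
on the metric at $\Sigma$. The second term, on the other hand, depends on $\delta g_{\gamma\delta}$ and its normal
derivative $N^\beta\nabla_\beta\,\delta g_{\gamma\delta}$, and is therefore non-zero,
\[ \delta g_{\alpha\beta}|_\Sigma = 0 \quad\Rightarrow\quad N_\alpha(\delta B)^\alpha|_\Sigma = -N^\beta h^{\gamma\delta}\,\nabla_\beta\,\delta g_{\gamma\delta}|_\Sigma = -h^{\gamma\delta} N^\beta\partial_\beta\,\delta g_{\gamma\delta}|_\Sigma \;. \tag{19.54} \]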
Therefore with Dirichlet boundary conditions the variation of the Einstein-Hilbert action
gives rise to a non-zero boundary term,
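\[ \delta S_{EH}[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\delta g^{\alpha\beta} + \oint_\Sigma \sqrt{h}\, d^3y\; (-h^{\gamma\delta} N^\beta\partial_\beta\,\delta g_{\gamma\delta}) \;. \tag{19.55} \]
It is therefore not true that the variation of the Einstein-Hilbert action is the Einstein
tensor,
\[ \frac{\delta S_{EH}}{\delta g^{\alpha\beta}} \neq \sqrt{g}\, G_{\alpha\beta} \;. \tag{19.56} \]
In fact, the presence of this boundary term means that the functional $S_{EH}[g_{\alpha\beta}]$ is not
even differentiable (in the sense of variational calculus).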
The way to resolve this issue is, as for the scalar field discussed above, to add a suitable
boundary term to the action itself. This will not change the bulk variation, and it turns
out that e.g. for Dirichlet boundary conditions the boundary term can be chosen in such
a way that its variation cancels the boundary term above.
Actually, we already have one candidate for the boundary term, namely the one relating
the Einstein and Einstein-Hilbert actions in (19.46). The variation of the Einstein action
is
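\begin{align*}
\delta S_E[g_{\alpha\beta}] &= \delta S_{EH}[g_{\alpha\beta}] - \delta \oint_\Sigma d\sigma_\alpha\, B^\alpha \\
&= \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\delta g^{\alpha\beta} + \oint_\Sigma d^3y\,\sqrt{h}\; N_\alpha(\delta B)^\alpha - \delta\Big( \oint_\Sigma d^3y\,\sqrt{h}\; N_\alpha B^\alpha \Big) \;. \tag{19.57}
\end{align*}
Calculating the variation in the second term for Dirichlet boundary conditions on $\Sigma$,
one finds
\[ \delta\Big( \oint_\Sigma d^3y\,\sqrt{h}\; N_\alpha B^\alpha \Big) = \oint_\Sigma d^3y\,\sqrt{h}\; N_\alpha\,\delta B^\alpha \;, \tag{19.58} \]
with
\[ \delta B^\alpha = \delta(g^{\beta\gamma}\Gamma^\alpha_{\beta\gamma} - g^{\alpha\beta}\Gamma^\gamma_{\gamma\beta}) = g^{\beta\gamma}\,\delta\Gamma^\alpha_{\beta\gamma} - g^{\alpha\beta}\,\delta\Gamma^\gamma_{\gamma\beta} = (\delta B)^\alpha \;. \tag{19.59} \]
Thus for Dirichlet boundary conditions, the variation of $B^\alpha$ on the boundary equals the
quantity $(\delta B)^\alpha$ (19.48) arising from the variation of the Einstein-Hilbert action. As
a consequence, there is no boundary term in the variation of the Einstein action for
Dirichlet boundary conditions,
\[ \delta S_E[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\delta g^{\alpha\beta} \tag{19.60} \]
and the Einstein action leads to a well-defined variational principle (with a differentiable
action).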
Although this is progress, the boundary term that one adds to the Einstein-Hilbert
action to obtain the Einstein action is not particularly attractive, in particular as it is
non-covariant not only with respect to bulk coordinate transformations but also with
respect to boundary coordinate transformations (i.e. the integrand is not a $\Sigma$-scalar in
the terminology of section 14.1).
This is something one would have to live with if one could not do better. However,
such a boundary term achieving this is not unique as it is evidently only defined up
to terms whose variations vanish for Dirichlet boundary conditions, in particular up to
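terms that only depend on the intrinsic geometry of $\Sigma$. Among these candidates there
is a preferred, geometrically transparent, boundary term, the Gibbons-Hawking-York
boundary term,
\[ S_{GHY}[g_{\alpha\beta}] = 2 \oint_\Sigma \sqrt{h}\, d^3y\; K \;. \tag{19.61} \]
Here
\[ K = h^{ab} K_{ab} = h^{\alpha\beta}\nabla_\alpha N_\beta = (g^{\alpha\beta} - \epsilon N^\alpha N^\beta)\,\nabla_\alpha N_\beta = g^{\alpha\beta}\nabla_\alpha N_\beta = \nabla_\alpha N^\alpha \tag{19.62} \]
is the trace of the extrinsic curvature $K_{ab}$ of $\Sigma$ (discussed in section 17).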
One way to prove that this is a good boundary term is to determine its variation and
to show that it cancels against the variation arising from the Einstein-Hilbert action for
Dirichlet boundary conditions. Alternatively one should show that the above boundary
term differs from the boundary term for the Einstein action only by expressions that
depend on the metric $g_{\alpha\beta}$ and its tangential derivatives.
Since usually one does the former, let us do the latter. The difference between the two
boundary integrands is
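\[ 2K + N_\alpha B^\alpha = 2\nabla_\alpha N^\alpha + N_\alpha B^\alpha \;. \tag{19.63} \]
A calculation identical to the one leading to $N_\alpha(\delta B)^\alpha$ in (19.52) shows that
\[ N_\alpha B^\alpha = N^\gamma h^{\beta\delta}\,\partial_\beta g_{\gamma\delta} - N^\beta h^{\gamma\delta}\,\partial_\beta g_{\gamma\delta} \;. \tag{19.64} \]
Here the first term only depends on the metric and its tangential derivatives, while the
second term involves normal derivatives of the metric. On the other hand, for $K$ we
have
\[ 2K = h^{\alpha\beta} L_N g_{\alpha\beta} = h^{\alpha\beta} N^\gamma\partial_\gamma g_{\alpha\beta} + 2 h^{\alpha}{}_{\gamma}\,\partial_\alpha N^\gamma \;, \tag{19.65} \]
where we have used the non-covariant way (8.37) of writing the Lie derivative,
\[ L_N g_{\alpha\beta} = N^\gamma\partial_\gamma g_{\alpha\beta} + (\partial_\alpha N^\gamma)\, g_{\gamma\beta} + (\partial_\beta N^\gamma)\, g_{\alpha\gamma} \;. \tag{19.66} \]
Thus the first term cancels the normal derivative term in (19.64) and the remaining
terms in (19.64) and (19.65) only involve fields $g_{\alpha\beta}$, i.e. $h_{\alpha\beta}$ and $N^\alpha$, and their tangential
derivatives that are fixed on the boundary.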
Remarks:
1. In addition, one can add terms to the action that do not depend on the dynamical
fields (as this will certainly change neither the variation with respect to the
dynamical fields nor the equations of motion). A common choice is a kind of
background subtraction, designed to associate the numerical value S = 0 to a
particular background metric $g^0_{\alpha\beta}$.36 Thus one could define the physical action
to be
\[ S[g_{\alpha\beta}] = S_g[g_{\alpha\beta}] - S_g[g^0_{\alpha\beta}] \;. \tag{19.69} \]
In particular, if one is interested in asymptotically flat space-times, say, then the
appropriate reference background metric is just the flat Minkowski metric, and
then (19.69) takes the simple form
\[ S[g_{\alpha\beta}] = \int \sqrt{g}\, d^4x\; R + 2 \oint_\Sigma \sqrt{h}\, d^3y\; (K - K_0) \;, \tag{19.70} \]
where $K_0$ is the trace of the extrinsic curvature of the boundary (isometrically)
embedded into Minkowski space. In section 20.12 we will be led to a similar
subtraction prescription at the level of the Hamiltonian.
36
See e.g. S. Hawking, G. Horowitz, The Gravitational Hamiltonian, Action, Entropy, and Surface
Terms, arXiv:gr-qc/9501014, for a brief discussion.
2. Another way to motivate (or arrive at) the Gibbons-Hawking-York boundary term
is to start from the decomposition (17.73)
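\[ R = \tilde{R} + \epsilon\,(K^2 - K_{\alpha\beta} K^{\alpha\beta}) + 2\epsilon\,\nabla_\alpha(N^\beta\nabla_\beta N^\alpha - N^\alpha\nabla_\beta N^\beta) \tag{19.71} \]
of the Ricci scalar provided by the Gauss-Codazzi equations (with $\tilde{R}$ denoting the
intrinsic scalar curvature of $\Sigma$). This turns out to be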
a convenient starting point for the canonical (Hamiltonian) formulation of general
relativity, and we will therefore discuss this in that context in section 20.2.
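The background subtraction of remark 1 can be illustrated numerically. The following is a minimal sketch, not part of the original notes: it assumes the Schwarzschild metric with $f(r) = 1 - 2m/r$ (units $G_N = c = 1$), a timelike boundary at fixed $r$ with outward unit normal $N^r = \sqrt{f}$, and the reference value $K_0 = 2/r$ of the same surface embedded in Minkowski space. The function name `ghy_integrand` is hypothetical.

```python
import math


def ghy_integrand(r, m):
    """Return sqrt(h) * (K - K0) / sin(theta) on the surface r = const.

    Assumptions (illustrative, not from the notes): Schwarzschild metric with
    f(r) = 1 - 2m/r and outward unit normal N^r = sqrt(f).
    """
    f = 1.0 - 2.0 * m / r
    sqrt_f = math.sqrt(f)
    # K = nabla_alpha N^alpha = r^-2 d/dr (r^2 sqrt(f)), since sqrt(g) = r^2 sin(theta)
    K = 2.0 * sqrt_f / r + m / (r * r * sqrt_f)
    # Reference term: the same surface in Minkowski space is a cylinder over
    # a round sphere of radius r, with trace of extrinsic curvature 2/r
    K0 = 2.0 / r
    # sqrt(h) / sin(theta) for the induced metric h = -f dt^2 + r^2 dOmega^2
    sqrt_h = sqrt_f * r * r
    return sqrt_h * (K - K0)


if __name__ == "__main__":
    m = 1.0
    for r in (1e2, 1e4, 1e6):
        print(r, ghy_integrand(r, m))
```

Without the subtraction, $\sqrt{h}K$ grows linearly with $r$ in this example; with it, the integrand per unit time and solid angle approaches $-m$ at large radius.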
The variational (i.e. action or Lagrangian based) formulation of general relativity has a
number of significant conceptual and technical advantages, and we will explore some of
them in this section.
I mentioned before, in section 18.7, that it is no accident that the Bianchi identities
come to the rescue of the general covariance of the Einstein equations in the sense
that they reduce the number of independent equations from ten to six. We will now
see that indeed the Bianchi identities are a consequence of the general covariance
of the Einstein-Hilbert action.
Virtually the same calculation will show that the covariant (metric, Hilbert)
energy-momentum tensor, as defined above, is automatically conserved (on shell)
by virtue of the general covariance of the matter action.
This argument can also be turned around to show that (generically, at least) the
Einstein equations imply the matter equations of motion, a very characteristic
feature of generally covariant gravitational field equations.
Simple variants of these arguments will also provide us with the Noether currents
associated with the general covariance of the Einstein-Hilbert action.
Analogous considerations, but now applied to the minimally coupled generally co-
variant matter action, will provide us with some insight into the relation between
the (Belinfante improved) canonical and covariant energy-momentum tensors in-
troduced in section 6.
To set the stage, we need to discuss how to express general covariance of an action,
either of the Einstein-Hilbert gravitational action $S_{EH}[g_{\alpha\beta}]$ or of some (minimally)
gravitationally coupled general covariant matter action $S_M[\phi, g_{\alpha\beta}]$, in a form that allows
us to explore its consequences in a Lagrangian formalism.
At first, the statement that an action is generally covariant means that it is invariant
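under transformations $x^\alpha \to x'^\alpha$ of the coordinates (for present purposes it will for once
be more convenient to use the same indices on the old and new coordinates and to
distinguish transformed objects by primes), and the accompanying tensorial transformation
of the fields, given e.g. by
\begin{align*}
x^\alpha \to x'^\alpha(x): \qquad \phi(x) &\to \phi'(x') = \phi(x) \\
g_{\alpha\beta}(x) &\to g'_{\alpha\beta}(x') = \frac{\partial x^\gamma}{\partial x'^\alpha}\, \frac{\partial x^\delta}{\partial x'^\beta}\; g_{\gamma\delta}(x) \;. \tag{19.72}
\end{align*}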
So far, this is utterly familiar, but since in the action one is integrating over the
coordinates $x^\alpha$, say, we would like to express this invariance not as a statement between
transformed fields at $x'$ and the old fields at $x$ but in terms of transformations of the
fields at a point $x$ (which we can then plug into the Lagrangian or action). This means
that we do not want to consider the transformation $\phi(x) \to \phi'(x')$ (and its counterpart
for the metric) but rather the transformations
\[ \phi(x) \to \phi'(x) \;, \qquad g_{\alpha\beta}(x) \to g'_{\alpha\beta}(x) \;. \tag{19.73} \]
\[ x'^\alpha = x^\alpha + \epsilon\,\xi^\alpha(x) \tag{19.74} \]
or (suppressing the $\epsilon$)
\[ \delta x^\alpha = \xi^\alpha(x) \;, \tag{19.75} \]
the infinitesimal variations of the fields are then precisely the Lie derivatives of the fields
discussed in section 8,
\[ \delta_\xi\phi(x) = L_\xi\phi(x) = \xi^\alpha\partial_\alpha\phi(x) \tag{19.76} \]
As a reminder, a quick way to derive this transformation of the metric is to start with
the tensorial transformation behaviour in the form
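\[ (g_{\alpha\beta}(x') - g'_{\alpha\beta}(x'))\, dx'^\alpha dx'^\beta = g_{\alpha\beta}(x')\, dx'^\alpha dx'^\beta - g_{\alpha\beta}(x)\, dx^\alpha dx^\beta \;, \tag{19.78} \]
and to then apply this to the infinitesimal transformation (19.74). Expanding the
differentials
\[ dx'^\alpha = dx^\alpha + \epsilon\,(\partial_\beta\xi^\alpha)\, dx^\beta \tag{19.79} \]
and the components $g_{\alpha\beta}(x')$ of the metric to first order in $\epsilon$,
one finds (19.77) (in its not manifestly covariant form (8.37)).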
To see that this indeed leads to a symmetry for any generally covariant action, i.e. any
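action of the form
\[ S = \int \sqrt{g}\, d^4x\; L(x) \tag{19.81} \]
where $L$ is a scalar, note that for any density $\sqrt{g}F$, $F$ a scalar, one has, by the by now
familiar identity for the variation of $\sqrt{g}$,
\begin{align*}
\delta_\xi(\sqrt{g}F) &= (\delta_\xi\sqrt{g})\,F + \sqrt{g}\,\delta_\xi F = \tfrac{1}{2}\sqrt{g}\, g^{\alpha\beta}(L_\xi g_{\alpha\beta})\,F + \sqrt{g}\, L_\xi F \\
&= \sqrt{g}\,(\nabla_\alpha\xi^\alpha)\,F + \sqrt{g}\,\xi^\alpha\partial_\alpha F = \partial_\alpha(\sqrt{g}\,\xi^\alpha F) \;, \tag{19.82}
\end{align*}
i.e.
\[ \delta_\xi(\sqrt{g}F) = \partial_\alpha(\sqrt{g}\,\xi^\alpha F) = \sqrt{g}\,\nabla_\alpha(\xi^\alpha F) \;, \tag{19.83} \]
a result previously obtained in (8.67). Thus the variation of a generally covariant action,
\[ \delta_\xi S = \int \sqrt{g}\, d^4x\; \nabla_\alpha(\xi^\alpha L) \;, \tag{19.84} \]
\[ \delta_\xi S_{EH} = 0 \;. \tag{19.85} \]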
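The identity for the variation of $\sqrt{g}$ used in (19.82), $\delta\sqrt{g} = \tfrac{1}{2}\sqrt{g}\, g^{\alpha\beta}\delta g_{\alpha\beta}$, can be checked by finite differences. The sketch below is an illustration under stated assumptions: a randomly perturbed Minkowski metric at a single point, an arbitrary symmetric variation, and hypothetical helper names (`det`, `inverse`, `check_variation`).

```python
import math
import random


def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def inverse(m):
    """Inverse matrix from cofactors: inv[j][i] = cofactor(i, j) / det."""
    n, d = len(m), det(m)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
            inv[j][i] = (-1) ** (i + j) * det(minor) / d
    return inv


def sqrt_g(metric):
    """sqrt(|det g|), the quantity written as sqrt(g) in the text."""
    return math.sqrt(abs(det(metric)))


def random_symmetric(scale, rng):
    a = [[rng.uniform(-scale, scale) for _ in range(4)] for _ in range(4)]
    return [[0.5 * (a[i][j] + a[j][i]) for j in range(4)] for i in range(4)]


def check_variation(seed=0, eps=1e-6):
    """Return (finite-difference variation, 1/2 sqrt(g) g^{ab} dg_{ab})."""
    rng = random.Random(seed)
    eta = [[-1.0 if i == j == 0 else float(i == j) for j in range(4)] for i in range(4)]
    pert = random_symmetric(0.1, rng)
    g = [[eta[i][j] + pert[i][j] for j in range(4)] for i in range(4)]
    dg = random_symmetric(1.0, rng)

    def shifted(s):
        return [[g[i][j] + s * dg[i][j] for j in range(4)] for i in range(4)]

    # Central difference approximation of the variation of sqrt(g) along dg
    lhs = (sqrt_g(shifted(eps)) - sqrt_g(shifted(-eps))) / (2 * eps)
    g_inv = inverse(g)
    rhs = 0.5 * sqrt_g(g) * sum(g_inv[i][j] * dg[i][j] for i in range(4) for j in range(4))
    return lhs, rhs


if __name__ == "__main__":
    print(check_variation())
```

The two returned numbers should agree up to the accuracy of the central difference; the choice of a pointwise random metric is only a convenience, since the identity is algebraic.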
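for any metric variation. Combining these two facts we arrive at the conclusion
that
\begin{align*}
0 = \delta_\xi S_{EH} &= \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\delta_\xi g^{\alpha\beta} \\
&= -2 \int \sqrt{g}\, d^4x\; G_{\alpha\beta}\,\nabla^\alpha\xi^\beta \tag{19.87} \\
&= +2 \int \sqrt{g}\, d^4x\; (\nabla^\alpha G_{\alpha\beta})\,\xi^\beta \;.
\end{align*}
Since this has to hold for all $\xi^\beta$ (vanishing on the boundary), we deduce
\[ \delta_\xi S_{EH} = 0 \quad\Rightarrow\quad \nabla^\alpha G_{\alpha\beta} = 0 \;. \tag{19.88} \]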
Aa = a , (19.90)
Thus we now consider again the Einstein-Hilbert action, but now with $\xi$'s that are
allowed to be non-zero on the boundary,
\[ \delta_\xi S_{EH} = \int \sqrt{g}\, d^4x\; \nabla_\alpha(\xi^\alpha R) \;. \tag{19.92} \]
By explicitly performing this variation, as above, the bulk (Einstein tensor) term
is identically zero by the contracted Bianchi identity, but we obtain one total
derivative term from (19.19),
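\begin{align*}
g^{\alpha\beta}\delta_\xi R_{\alpha\beta} &= \nabla_\alpha\left[ (g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta L_\xi g_{\gamma\delta} \right] \\
&= \nabla_\alpha\left[ (g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta(\nabla_\gamma\xi_\delta + \nabla_\delta\xi_\gamma) \right] \;, \tag{19.93}
\end{align*}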
and a second total derivative term
from the integration by parts performed in the course of the calculation in (19.87).
The term involving the scalar curvature is identical to, and cancels against, the
scalar curvature term arising from (19.92). One is thus left with the statement
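that for any vector field $\xi^\alpha$ and any integration domain one has
\[ \int \sqrt{g}\, d^4x\; \nabla_\alpha\left[ -2 R^\alpha{}_\beta\,\xi^\beta + (g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta(\nabla_\gamma\xi_\delta + \nabla_\delta\xi_\gamma) \right] = 0 \;. \tag{19.95} \]
\[ J^\alpha(\xi) = -R^\alpha{}_\beta\,\xi^\beta + \tfrac{1}{2}\,(g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta(\nabla_\gamma\xi_\delta + \nabla_\delta\xi_\gamma) \;. \tag{19.96} \]
\[ J^\alpha(\xi) = \nabla_\beta(\nabla^{[\beta}\xi^{\alpha]}) \quad\Rightarrow\quad \nabla_\alpha J^\alpha(\xi) = 0 \tag{19.97} \]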
already mentioned in section 12.7. We thus learn that the generalised Komar
currents of that section are indeed, as anticipated there, precisely the identically
conserved Noether currents associated to the general covariance of the Einstein-
Hilbert action.
Note that, as mentioned above, it is a general feature of Noether currents as-
sociated to local (gauge) symmetries that they are in fact identically conserved:
the current $J^\alpha(\xi)$ (or its counterpart for some local symmetry of another theory)
cannot possibly be conserved for all possible $\xi^\alpha(x)$ unless it is actually identically
conserved.37 In particular, this implies that the conserved charges associated with
these currents can always be expressed as surface integrals.
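The identity behind (19.97) can be checked in two lines. This is a sketch, assuming the commutator convention $[\nabla_\beta, \nabla_\alpha]\xi^\beta = R_{\alpha\beta}\xi^\beta$ and a current of the form $J^\alpha(\xi) = -R^\alpha{}_\beta\xi^\beta + \tfrac{1}{2}(g^{\alpha\gamma}g^{\beta\delta} - g^{\alpha\beta}g^{\gamma\delta})\nabla_\beta(\nabla_\gamma\xi_\delta + \nabla_\delta\xi_\gamma)$; overall signs depend on these choices.
\begin{align*}
(g^{\alpha\gamma} g^{\beta\delta} - g^{\alpha\beta} g^{\gamma\delta})\,\nabla_\beta(\nabla_\gamma\xi_\delta + \nabla_\delta\xi_\gamma)
&= \nabla_\beta\nabla^\alpha\xi^\beta + \Box\xi^\alpha - 2\nabla^\alpha\nabla_\beta\xi^\beta \\
&= 2 R^\alpha{}_\beta\,\xi^\beta + \nabla_\beta(\nabla^\beta\xi^\alpha - \nabla^\alpha\xi^\beta) \;,
\end{align*}
where the second line uses $\nabla^\alpha\nabla_\beta\xi^\beta = \nabla_\beta\nabla^\alpha\xi^\beta - R^\alpha{}_\beta\xi^\beta$. The Ricci terms then cancel in $J^\alpha(\xi)$, leaving
\[ J^\alpha(\xi) = \tfrac{1}{2}\,\nabla_\beta(\nabla^\beta\xi^\alpha - \nabla^\alpha\xi^\beta) = \nabla_\beta(\nabla^{[\beta}\xi^{\alpha]}) \;. \]
Its divergence vanishes identically because $\nabla_\alpha\nabla_\beta A^{\beta\alpha} = 0$ for any anti-symmetric tensor $A^{\beta\alpha}$ when the Ricci tensor is symmetric.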
37
See e.g. B. Julia, S. Silva, Currents and Superpotentials in classical gauge invariant theories I:
Local results with applications to Perfect Fluids and General Relativity, arXiv:gr-qc/9804029 for a
rather explicit elementary argument, and R. Wald, On identically closed forms locally constructed
from a field, J. Math. Phys. 31 (1990) 2378, R. Wald, Black Hole Entropy is Noether Charge, arXiv:
gr-qc/9307038, V. Iyer and R. Wald, Some properties of Noether charge and a proposal for dynamical
black hole entropy, arXiv:gr-qc/9403028 and references therein for related considerations and further
developments.
3. On-Shell Covariant Conservation of the Energy-Momentum Tensor
Now let us play the same game with the matter action SM (19.24). Once again,
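the variation $\delta_\xi S_M$, expressed in terms of the Lie derivatives $L_\xi g_{\alpha\beta}$ and $\delta_\xi\phi = L_\xi\phi$
of the matter fields should be identically zero, by general covariance of the matter
action (for the time being we again at first only consider $\xi^\alpha$ which are such that any
boundary terms vanish). Proceeding as before, and using the definition (6.105) of
the energy-momentum tensor, we find
\begin{align*}
0 = \delta_\xi S_M &= \int \sqrt{g}\, d^4x \left( \tfrac{1}{2}\, T^{\alpha\beta}\,\delta_\xi g_{\alpha\beta} + \frac{\delta L_M}{\delta\phi}\,\delta_\xi\phi \right) \\
&= -\int \sqrt{g}\, d^4x\; (\nabla_\alpha T^{\alpha\beta})\,\xi_\beta + \int \sqrt{g}\, d^4x\; \frac{\delta L_M}{\delta\phi}\,\delta_\xi\phi \;. \tag{19.98}
\end{align*}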
Now once again this has to hold for all $\xi^\alpha$, and as the second term is identically
zero on-shell, i.e. for $\phi$ satisfying the matter Euler-Lagrange equations of motion
$\delta L_M/\delta\phi = 0$, we deduce that
\[ \delta_\xi S_M = 0 \quad\Rightarrow\quad \nabla_\alpha T^{\alpha\beta} = 0 \quad \text{on-shell} \;. \tag{19.99} \]
This should be contrasted with the contracted Bianchi identities which are valid
off-shell. The more general situation with the $\xi^\alpha$ not restricted to vanish on the
boundary will be analysed in detail in section 21.2.
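For the Klein-Gordon field the on-shell character of this conservation law can be made explicit. This is a sketch, assuming the action $S_M = \int\sqrt{g}\,d^4x\,[-\tfrac{1}{2}g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - \tfrac{1}{2}m^2\phi^2]$ and the definition $T_{\alpha\beta} = -(2/\sqrt{g})\,\delta S_M/\delta g^{\alpha\beta}$, which give
\[ T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac{1}{2}\, g_{\alpha\beta}\left( g^{\gamma\delta}\partial_\gamma\phi\,\partial_\delta\phi + m^2\phi^2 \right) \;. \]
Taking the divergence and using the symmetry of $\nabla_\alpha\partial_\beta\phi$, one finds
\begin{align*}
\nabla^\alpha T_{\alpha\beta} &= (\Box\phi)\,\partial_\beta\phi + \partial^\alpha\phi\,\nabla_\alpha\partial_\beta\phi - \partial^\gamma\phi\,\nabla_\beta\partial_\gamma\phi - m^2\phi\,\partial_\beta\phi \\
&= (\Box\phi - m^2\phi)\,\partial_\beta\phi \;,
\end{align*}
which vanishes when the Klein-Gordon equation $\Box\phi = m^2\phi$ holds, but not for an arbitrary field configuration.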
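\[ G_{\alpha\beta} = 8\pi G_N\, T_{\alpha\beta} \quad\Rightarrow\quad \nabla^\alpha T_{\alpha\beta} = 0 \;. \tag{19.100} \]
\[ \nabla_\alpha F^{\alpha\beta} = -j^\beta \quad\Rightarrow\quad \nabla_\beta j^\beta = 0 \;, \tag{19.103} \]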
but not the complete equations of motion of the matter fields.
To explain what is special about the generally covariant gravitational field equa-
tions in this respect, I will conclude this section with a quote by Misner, Thorne
and Wheeler, since I could not possibly state this more eloquently:
19.7 First Order Form of the Action, Torsion and the Palatini Principle
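depends quadratically on the 1st derivatives of the gauge field $A_a$, and leads to the 2nd
order equations of motion
\[ \partial_a F^{ab} = 0 \;. \tag{19.105} \]
To put this into 1st order form, one treats $A_a$ and $F_{ab} \to \mathcal{F}_{ab}$ as a priori independent
fields and considers the action
\begin{align*}
S[A, \mathcal{F}] &= \int d^4\xi \left[ -(\partial_a A_b)\,\mathcal{F}^{ab} + \tfrac{1}{4}\,\mathcal{F}_{ab}\mathcal{F}^{ab} \right] \\
&= \int d^4\xi \left[ -\tfrac{1}{2}\,(\partial_a A_b - \partial_b A_a)\,\mathcal{F}^{ab} + \tfrac{1}{4}\,\mathcal{F}_{ab}\mathcal{F}^{ab} \right] \;, \tag{19.106}
\end{align*}
which depends purely algebraically on $\mathcal{F}_{ab}$ and only linearly on the 1st derivatives of $A_a$.
The equations of motion arising from the variation of $A_a$ are the 1st order equations
\[ \delta A: \qquad \partial_a \mathcal{F}^{ab} = 0 \;. \tag{19.107} \]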
38
C. Misner, K. Thorne, J. Wheeler, Gravitation, section 20.6.
However these are not (yet) the vacuum Maxwell equations because $\mathcal{F}_{ab}$ is not (yet) to
be identified with the Maxwell field strength tensor of $A_a$. This identification now results
from the (purely algebraic) equation of motion associated with variations of $\mathcal{F}_{ab}$,
Plugging this result into the previous equations then gives rise to the standard Maxwell
equations (and plugging it into the 1st order action S[A, F] reduces it to the standard
Maxwell action S[A]).
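The elimination just described can be written out in two steps. This is a sketch, assuming a first-order Lagrangian of the form below with an anti-symmetric auxiliary field (written here as $\mathcal{F}_{ab}$) and sign conventions in which the Maxwell Lagrangian is $-\tfrac{1}{4}F_{ab}F^{ab}$:
\[ L[A, \mathcal{F}] = -\tfrac{1}{2}\,(\partial_a A_b - \partial_b A_a)\,\mathcal{F}^{ab} + \tfrac{1}{4}\,\mathcal{F}_{ab}\mathcal{F}^{ab} \;. \]
Varying with respect to $\mathcal{F}^{ab}$ gives the algebraic equation
\[ -\tfrac{1}{2}\,(\partial_a A_b - \partial_b A_a) + \tfrac{1}{2}\,\mathcal{F}_{ab} = 0 \quad\Rightarrow\quad \mathcal{F}_{ab} = \partial_a A_b - \partial_b A_a \equiv F_{ab} \;, \]
and substituting this back yields
\[ L[A, \mathcal{F} = F] = -\tfrac{1}{2}\, F_{ab}F^{ab} + \tfrac{1}{4}\, F_{ab}F^{ab} = -\tfrac{1}{4}\, F_{ab}F^{ab} \;. \]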
Something similar (but more interesting and somewhat more subtle) can be done in the
case of general relativity.
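Recall from sections 4.4 and 10.5 that a priori a metric $g_{\mu\nu}$ and a connection $\bar\Gamma^\lambda_{\mu\nu}$ are
independent concepts, and that the notion of curvature (curvature and Ricci tensors)
can be defined for an arbitrary connection,
\begin{align*}
\bar R^\lambda{}_{\sigma\mu\nu}(\bar\Gamma) &= \partial_\mu\bar\Gamma^\lambda_{\nu\sigma} - \partial_\nu\bar\Gamma^\lambda_{\mu\sigma} + \bar\Gamma^\lambda_{\mu\rho}\bar\Gamma^\rho_{\nu\sigma} - \bar\Gamma^\lambda_{\nu\rho}\bar\Gamma^\rho_{\mu\sigma} \;, \\
\bar R_{\mu\nu}(\bar\Gamma) &= \bar R^\lambda{}_{\mu\lambda\nu}(\bar\Gamma) \;. \tag{19.109}
\end{align*}
General relativity employs and is formulated in terms of the canonical Levi-Civita
connection described by the Christoffel symbols $\bar\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\mu\nu}$, characterised by the fact that
the connection is compatible with the metric and has no torsion. It is thus easy to
come up with various generalisations of general relativity in which these requirements
are relaxed. We will not get into these matters here.39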
, with $g_{\mu\nu}$ and $\bar\Gamma^\lambda_{\mu\nu}$, for a (yet to be specified) class of connections, to be treated
as independent variables. Since $\bar R_{\mu\nu}(\bar\Gamma)$ does then not depend on the metric, the
action depends purely algebraically on the metric, and on at most 1st derivatives of the
connection (linearly!).
One key simplification of this kind of action is that the variation with respect to the
metric is elementary (and identical to the variation of the $\sqrt{g}\, g^{\mu\nu}$ terms of the
Einstein-Hilbert Lagrangian density $\sqrt{g}\, g^{\mu\nu} R_{\mu\nu}$), namely
\[ \delta_g S[g, \bar\Gamma] = \int \sqrt{g}\, d^4x \left( \bar R_{\mu\nu}(\bar\Gamma) - \tfrac{1}{2}\, g_{\mu\nu}\,\bar R(\bar\Gamma) \right) \delta g^{\mu\nu} \tag{19.111} \]
These are, however, not yet the vacuum Einstein equations because the independent
connection $\bar\Gamma^\lambda_{\mu\nu}$ is not the Levi-Civita connection.
39
Many of these generalisations (including theories with non-symmetric metrics) were originally
explored by Einstein and his collaborators in their futile and (at least by the 1930s) ill-motivated
attempts to find a unified field theory of gravity and Maxwell theory. For details, see e.g. the
review H. Goenner, On the History of Unified Field Theories, Living Rev. Relativity 7 (2004) 2,
http://www.livingreviews.org/lrr-2004-2.
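It remains to look at the equations of motion imposed by stationarity of the action with
respect to variations of $\bar\Gamma^\lambda_{\mu\nu}$. It turns out (the Palatini principle) that
1. if one chooses the connections to be torsion-free and imposes the $\bar\Gamma$-equations of
motion, then the connections are forced to also be compatible with the metric and
thus $\bar\Gamma$ is uniquely determined to be the Levi-Civita connection
2. if one chooses the connections to be compatible with the metric and imposes the
$\bar\Gamma$-equations of motion, then the connections are forced to also be torsion-free and
thus $\bar\Gamma$ is uniquely determined to be the Levi-Civita connection
In terms of the notation introduced in section 10.5, this amounts to the assertions
\begin{align*}
T^\lambda{}_{\mu\nu}(\bar\Gamma) = 0 \;\text{ and }\; \frac{\delta S[g, \bar\Gamma]}{\delta\bar\Gamma} = 0 \quad&\Rightarrow\quad Q_{\lambda\mu\nu}(\bar\Gamma) = 0 \quad\Rightarrow\quad \bar\Gamma = \Gamma \\
Q_{\lambda\mu\nu}(\bar\Gamma) = 0 \;\text{ and }\; \frac{\delta S[g, \bar\Gamma]}{\delta\bar\Gamma} = 0 \quad&\Rightarrow\quad T^\lambda{}_{\mu\nu}(\bar\Gamma) = 0 \quad\Rightarrow\quad \bar\Gamma = \Gamma \tag{19.113}
\end{align*}
In either case, the metric equations of motion (19.112) then reduce to the vacuum
Einstein equations.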
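In order to establish the assertions (19.113), we need two preparatory results. The first
is that the generalisation of the formula (19.15),
\[ \delta R_{\mu\nu}(\Gamma) = \nabla_\lambda\,\delta\Gamma^\lambda_{\mu\nu} - \nabla_\nu\,\delta\Gamma^\lambda_{\lambda\mu} \tag{19.114} \]
for the variation of the Ricci tensor in terms of the variation of the connection is
\[ \delta\bar R_{\mu\nu}(\bar\Gamma) = \bar\nabla_\lambda\,\delta\bar\Gamma^\lambda_{\nu\mu} - \bar\nabla_\nu\,\delta\bar\Gamma^\lambda_{\lambda\mu} + T^\rho{}_{\lambda\nu}\,\delta\bar\Gamma^\lambda_{\rho\mu} \;. \tag{19.115} \]
The second is that when the connection is not the Levi-Civita connection, an expression
like $\bar\nabla_\lambda J^\lambda$ is not a total derivative in the integral, this being only true for the Levi-Civita
connection thanks to the identity
\[ \int \sqrt{g}\, d^4x\; \nabla_\lambda J^\lambda = \int d^4x\; \partial_\lambda(\sqrt{g}\, J^\lambda) \;. \tag{19.116} \]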
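Writing
\[ \bar\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\mu\nu} + C^\lambda{}_{\mu\nu} \;, \tag{19.117} \]
we have
\[ \bar\nabla_\mu V^\lambda = \nabla_\mu V^\lambda + C^\lambda{}_{\mu\nu} V^\nu \tag{19.118} \]
and, in particular,
\[ \bar\nabla_\lambda J^\lambda = \nabla_\lambda J^\lambda + C^\lambda{}_{\lambda\nu} J^\nu \;, \tag{19.119} \]
only the first term on the right-hand side giving rise to a total derivative.
What we are interested in is $g^{\mu\nu}\delta\bar R_{\mu\nu}(\bar\Gamma)$, and with the above results and notation we
can write this as
\begin{align*}
g^{\mu\nu}\delta\bar R_{\mu\nu}(\bar\Gamma) &= g^{\mu\nu}\left( C^\lambda{}_{\lambda\rho}\,\delta C^\rho{}_{\mu\nu} - C^\rho{}_{\lambda\nu}\,\delta C^\lambda{}_{\mu\rho} - C^\rho{}_{\mu\lambda}\,\delta C^\lambda{}_{\rho\nu} + C^\rho{}_{\mu\nu}\,\delta C^\lambda{}_{\lambda\rho} \right) \\
&\quad + \text{total derivative terms} \tag{19.120}
\end{align*}
\[ \frac{\delta S[g, \bar\Gamma]}{\delta\bar\Gamma} = 0 \quad\Rightarrow\quad g_{\mu\nu} C^\rho{}_{\rho\lambda} + g_{\mu\lambda} C_\nu{}^\rho{}_\rho - C_{\nu\lambda\mu} - C_{\mu\nu\lambda} = 0 \;. \tag{19.122} \]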
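However, these equations do not determine the $C^\lambda{}_{\mu\nu}$ uniquely (we will explicitly parametrise
this non-uniqueness below), and hence in this case the Einstein-Hilbert-like action
(19.110) alone does not give rise to acceptable equations of motion for the fields.
The situation changes if one imposes some a priori constraints on the allowed $\bar\Gamma^\lambda_{\mu\nu}$, and
hence on their variations $\delta C^\lambda{}_{\mu\nu}$. We now consider separately the two cases mentioned
above:
\[ C^\lambda{}_{\mu\nu} = C^\lambda{}_{\nu\mu} \tag{19.123} \]
\[ \delta C^\lambda{}_{\mu\nu} = \delta C^\lambda{}_{\nu\mu} \tag{19.124} \]
\[ 2 g_{\mu\nu} C^\rho{}_{\rho\lambda} + g_{\mu\lambda} C_\nu{}^\rho{}_\rho + g_{\nu\lambda} C_\mu{}^\rho{}_\rho - 2 C_{\mu\nu\lambda} - 2 C_{\nu\mu\lambda} = 0 \;. \tag{19.125} \]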
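Taking traces, once by contraction with $g^{\mu\nu}$ and once by contraction with $g^{\mu\lambda}$ (or,
equivalently, with $g^{\nu\lambda}$), one obtains two linearly independent conditions on the
traces $C^\rho{}_{\rho\lambda}$ and $C_\lambda{}^\rho{}_\rho$ requiring both traces to vanish,
\[ C^\rho{}_{\rho\lambda} = 0 \;, \qquad C_\lambda{}^\rho{}_\rho = 0 \;. \tag{19.126} \]
\[ C_{\mu\nu\lambda} + C_{\nu\mu\lambda} = 0 \quad\Leftrightarrow\quad C_{\mu\lambda\nu} + C_{\nu\lambda\mu} = 0 \tag{19.127} \]
As we have seen in (10.102) of section 10.5, this is precisely the condition that the
non-metricity tensor is zero,
\[ Q_{\lambda\mu\nu} = C_{\mu\lambda\nu} + C_{\nu\lambda\mu} = 0 \;, \tag{19.128} \]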
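\[ C_{\lambda\mu\nu} = -C_{\nu\mu\lambda} \tag{19.129} \]
\[ \delta C_{\lambda\mu\nu} = -\delta C_{\nu\mu\lambda} \tag{19.130} \]
Thus
\[ C_{\lambda\mu\nu} = C_{\lambda(\mu\nu)} + C_{\lambda[\mu\nu]} = -T_{(\mu\nu)\lambda} + \tfrac{1}{2}\, T_{\lambda\mu\nu} \tag{19.131} \]
is the contorsion tensor. In this case, anti-symmetrisation of (19.121) leads to
\[ g_{\mu\nu} C^\rho{}_{\rho\lambda} - g_{\mu\lambda} C^\rho{}_{\rho\nu} + C_{\mu\lambda\nu} - C_{\mu\nu\lambda} = 0 \;. \tag{19.132} \]
Taking traces, one finds $C^\rho{}_{\rho\lambda} = 0$ (the other trace $C^\rho{}_{\lambda\rho}$ is identically zero because
of anti-symmetry), and thus (19.132) reduces to
\[ C_{\mu\lambda\nu} - C_{\mu\nu\lambda} = 0 \quad\Leftrightarrow\quad C_{\mu\lambda\nu} = C_{\mu\nu\lambda} \;, \tag{19.133} \]
\[ T_{\mu\nu\lambda} = C_{\mu\nu\lambda} - C_{\mu\lambda\nu} = 0 \;. \tag{19.134} \]
Since we started off with a metric-compatible connection, this means that the
equations of motion fix the connection to be the Levi-Civita connection.
Alternatively, (19.129) and (19.133) imply that $C_{\lambda\mu\nu} = 0$. This concludes the proof of
the second assertion in (19.113).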
Alternatively, and perhaps somewhat more insightfully, one can first determine the
general solution to the (under-determined) equation (19.122),
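\[ g_{\mu\nu} C^\rho{}_{\rho\lambda} + g_{\mu\lambda} C_\nu{}^\rho{}_\rho - C_{\nu\lambda\mu} - C_{\mu\nu\lambda} = 0 \;, \tag{19.135} \]
and then analyse the properties of the solution and the consequences of imposing some
conditions on $C^\lambda{}_{\mu\nu}$.40 To disentangle this equation, we proceed as in the proof of the
uniqueness of the Levi-Civita connection in section 4.4 and take sums and differences
of cyclic permutations of the above equation. Then one ends up with the equation
\[ 2 C_{\mu\nu\lambda} = g_{\mu\lambda}(A_\nu + B_\nu) + g_{\mu\nu}(A_\lambda - B_\lambda) + g_{\nu\lambda}(B_\mu - A_\mu) \;, \tag{19.136} \]
where
\[ A_\lambda = C^\rho{}_{\rho\lambda} \;, \qquad B_\lambda = C_\lambda{}^\rho{}_\rho \tag{19.137} \]
are two of the (a priori independent) traces of $C^\lambda{}_{\mu\nu}$. Performing either of these
contractions in (19.136), one finds the condition
\[ A_\lambda = B_\lambda \;, \tag{19.138} \]
and therefore
\[ C_{\mu\nu\lambda} = g_{\mu\lambda} A_\nu \quad\Leftrightarrow\quad C^\mu{}_{\nu\lambda} = \delta^\mu{}_\lambda A_\nu \;. \tag{19.139} \]
\[ \bar\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\nu\lambda} + \delta^\mu{}_\lambda A_\nu \;, \tag{19.140} \]
\[ \bar\nabla_\mu g_{\nu\lambda} = -2 A_\mu\, g_{\nu\lambda} \tag{19.141} \]
and
\[ T^\mu{}_{\nu\lambda} = \delta^\mu{}_\lambda A_\nu - \delta^\mu{}_\nu A_\lambda \;. \tag{19.142} \]
It is now obvious that requiring either metric compatibility or the symmetry of the
connection enforces $A_\mu = 0$ and thus $\bar\Gamma^\lambda_{\mu\nu} = \Gamma^\lambda_{\mu\nu}$.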
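The algebra of this general solution can be checked numerically at a single point. The sketch below rests on assumptions that are labelled as such: an orthonormal frame (so the metric is the Minkowski matrix, equal to its own inverse), the index convention $C_{\mu\nu\lambda} = g_{\mu\rho}C^\rho{}_{\nu\lambda}$ with the middle index the derivative index, and the equation of motion in the form $g_{\mu\nu}A_\lambda + g_{\mu\lambda}B_\nu - C_{\nu\lambda\mu} - C_{\mu\nu\lambda} = 0$. All function names are hypothetical.

```python
import random

R4 = range(4)
# Metric in an orthonormal frame at a point; numerically equal to its own inverse.
ETA = [[(-1.0 if i == 0 else 1.0) if i == j else 0.0 for j in R4] for i in R4]


def residual(C):
    """Largest |g_mn A_l + g_ml B_n - C_nlm - C_mnl| over all index values.

    C[m][n][l] stands for C_{mnl} = g_{mr} C^r_{nl} (assumed convention).
    """
    # The two traces A_l = g^{mn} C_{mnl} and B_n = g^{lm} C_{nlm}
    A = [sum(ETA[m][n] * C[m][n][l] for m in R4 for n in R4) for l in R4]
    B = [sum(ETA[l][m] * C[n][l][m] for l in R4 for m in R4) for n in R4]
    return max(
        abs(ETA[m][n] * A[l] + ETA[m][l] * B[n] - C[n][l][m] - C[m][n][l])
        for m in R4 for n in R4 for l in R4
    )


def projective(a):
    """C_{mnl} = g_{ml} a_n, the family of solutions parametrised by a vector a."""
    return [[[ETA[m][l] * a[n] for l in R4] for n in R4] for m in R4]


def nonmetricity_trace(C):
    """g^{nl} (C_{nml} + C_{lmn}) for each m; equals 8 a_m on the family."""
    return [sum(ETA[n][l] * (C[n][m][l] + C[l][m][n]) for n in R4 for l in R4) for m in R4]


def torsion_trace(C):
    """g^{ml} (C_{mnl} - C_{mln}) for each n; equals 3 a_n on the family."""
    return [sum(ETA[m][l] * (C[m][n][l] - C[m][l][n]) for m in R4 for l in R4) for n in R4]


if __name__ == "__main__":
    a = [0.3, -1.2, 0.7, 2.0]
    C = projective(a)
    print("residual on the family:", residual(C))
    print("non-metricity trace:", nonmetricity_trace(C))
    print("torsion trace:", torsion_trace(C))
    rng = random.Random(0)
    generic = [[[rng.uniform(-1, 1) for _ in R4] for _ in R4] for _ in R4]
    print("residual for a generic C:", residual(generic))
```

Since both traces are proportional to the vector $a$, demanding either vanishing non-metricity or vanishing torsion sets it to zero, in line with the statement above.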
Remarks:
1. When one couples either of these theories to matter, one will find the standard
Einstein equations with source the usual matter energy-momentum tensor, pro-
vided that the minimally coupled matter action depends only on the metric and
not on the connection. As we have seen, this is satisfied in the case of scalar or
Maxwell gauge fields (for which the minimally coupled action in the usual setting
could be written in such a way that it depends only on the metric but not on the
derivatives of the metric). However, typically the connection appears explicitly in
the action for spinors, and in this case variation of the matter action will produce
a non-zero contribution to the torsion, say. Thus in that case the Einstein-Hilbert
approach (no torsion as a kinematical constraint) and the Palatini approach
(torsion determined dynamically) are inequivalent.
40
See e.g. A. Bernal et al., On the (non-)uniqueness of the Levi-Civita solution in the Einstein-Hilbert-
Palatini formalism, arXiv:1606.08756 [gr-qc] and references therein.
2. Within the present framework, it is not possible to relax both the no-torsion
and the metricity constraints simultaneously and to simultaneously regain them
dynamically, but one can attempt to achieve this either with the aid of additional
auxiliary (Lagrange multiplier) fields, or spontaneously by adding potentials
that force the connection in the ground state to either zero torsion or zero non-
metricity.41
3. Once one contemplates and permits the presence of non-metricity and/or tor-
sion, there are many more terms that one can in principle use to build an action
(by using scalars built from the torsion and non-metricity tensors $T^\lambda{}_{\mu\nu}$ and $Q_{\lambda\mu\nu}$
and their covariant derivatives). Thus, unless one imposes additional symmetry
requirements, say, there is no good reason to focus attention exclusively on the
Einstein-Hilbert-like action (19.110), and many generalisations of general relativ-
ity suggested and discussed in the literature can and should be either rejected or
amended simply on these grounds.42
41
See e.g. R. Percacci, Geometry of Nonlinear Field Theories for an exploration of some of these ideas.
42
In his review of gauge theories of gravity, F. Hehl, Gauge Theories of Gravity and Spacetime,
arXiv:1204.3672 [gr-qc], also emphasises this: "Numerous pages of printed pages could be saved if
our colleagues would [. . . ] just motivate their choice of the unknown constants." In that review it is also
pointed out that what is generally referred to as (and I also called and will continue to call) the Palatini
formalism should properly also be attributed to Einstein (1925).
20 Hamiltonian Formulation of General Relativity
No other body of work on classical general relativity has single-handedly had such an
impact and influence on research in this field: it awoke general relativity from its (to
a large extent rather uninspiring and uninspired) "finding exact solutions" phase; it
brought it to the renewed attention of a wider theoretical physics community since it
provided a field theorist's analysis, perspective and understanding of the basic structures
of general relativity; it presented a clean and clear 1st-order (canonical) formulation of
the theory (the ADM formalism), which is crucial in understanding the Cauchy (initial
value) problem in general relativity and which also provided the basis for (in)numerous
subsequent attempts at a canonical quantisation of gravity; it provided groundbreaking
work and insights on questions related to the notions of energy and radiation in general
relativity, etc. etc.
In this section, I will sketch some aspects of the Hamiltonian or canonical formulation of
general relativity, without however attempting to develop this in a completely systematic
way and without being able to do justice to the depth and importance of this subject
and body of work.44
For a concrete illustration of some of the facts and statements encountered in this
section, see the discussion of the simple cosmological toy model (or minisuperspace
model in more fancy terminology) in sections 34.6 and 34.7.
The canonical formalism has been developed in particular with an eye towards canonical
quantisation of gravity and in recent years a variant of the ADM canonical variables
(the Ashtekar variables) has become very popular and forms the basis of the so-called
loop quantum gravity approach to quantum gravity (but I will have nothing more to
say about this in these notes).45
In this section, we will (have to) freely make use of the results on the geometry of
hypersurfaces obtained in sections 14 - 17, in particular the extrinsic geometry and
Gauss-Codazzi equations described in section 17.
43
This body of research is summarised in the 1962 article R. Arnowitt, S. Deser, C. Misner, The
Dynamics of General Relativity, kindly made available on the arXiv as arXiv:gr-qc/0405109.
44
See e.g. section 10 and Appendix E of R. Wald, General Relativity, or sections 3.6 and 4 of E.
Poisson, A Relativist's Toolkit, or sections 3 and 4 of C. Kiefer, Quantum Gravity (2nd edition) for
modern textbook treatments of this subject.
45
See e.g. A. Ashtekar, Lectures on non-perturbative canonical gravity, or C. Rovelli, Quantum Gravity
for textbook accounts, as well as numerous review articles by these and other authors on the arXiv.
20.1 General Covariance and Constraints
We had previously seen in sections 18.7 and 19.6 that general covariance of the Einstein
equations is related to the Bianchi identities, i.e. to the existence of 4 differential rela-
tions among the 10 components of the Einstein equations. We had also seen in section
18.7 that this is reflected in the fact that the Bianchi identities imply that only six of the
ten equations are truly dynamical 2nd-order differential equations while four of them
constrain the initial values of the fields on some spacelike hypersurface.
We can also see this directly (i.e. without going via the Bianchi identities) from the
Gauss-Codazzi equations we derived in section 17.4. We choose a foliation of the space-time
by spacelike hypersurfaces and choose one of them as the surface $\Sigma$ on which
we will specify initial data, with the time direction pointing off (but not necessarily
normal to) the hypersurface $\Sigma$. These initial data will be the spatial metric $h_{ab}$ on
$\Sigma$ and something like the time-derivative of $h_{ab}$, i.e. something like the extrinsic curvature
tensor $K_{ab}$. The Einstein equations should then evolve these initial data forward from
$\Sigma$, i.e. they should determine the space-time metric $g_{\alpha\beta}$ in such a way that $h_{ab}$ is the
induced metric on $\Sigma$ and $K_{ab}$ its extrinsic curvature.
It turns out, however, that these initial data cannot be specified freely but are subject
to some constraints. This can be immediately seen from the expressions (17.66) and
(17.69) for the time-time and time-space components of the Einstein tensor we had
obtained in section 17.4, namely
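\[
\begin{aligned}
G_{NN} \equiv G_{\alpha\beta} N^\alpha N^\beta &= \tfrac12 \tilde R + \tfrac12 (K^2 - K_{ab}K^{ab}) \\
G_{aN} \equiv G_{\alpha\beta} E^\alpha_a N^\beta &= \tilde\nabla_b K^b_{\ a} - \tilde\nabla_a K^b_{\ b}
\end{aligned}
\tag{20.1}
\]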
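These just depend on the values of $h_{ab}$ and $K_{ab}$ on $\Sigma$, and therefore these components of
the Einstein equations are not evolution equations at all but rather provide 4 constraints
among the initial data. These constraints are
\[
\begin{aligned}
\tilde R + K^2 - K_{ab}K^{ab} &= 16\pi G_N\, T_{\alpha\beta} N^\alpha N^\beta \\
\tilde\nabla_b K^b_{\ a} - \tilde\nabla_a K^b_{\ b} &= 8\pi G_N\, T_{\alpha\beta} E^\alpha_a N^\beta
\end{aligned}
\tag{20.2}
\]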
The remaining (space-space) components of the Einstein tensor depend on the 2nd time
derivatives of the metric, i.e. on the time-derivatives of $K_{ab}$, and therefore the remaining
6 space-space components
\[ G_{ab} = 8\pi G_N\, T_{ab} \tag{20.3} \]
of the Einstein equations are true evolution equations for $h_{ab}$. Due to their non-linearity,
and due to the presence of the constraints, these equations are highly non-trivial and
mathematically extremely challenging.
In a canonical (Hamiltonian, first-order) formulation of the problem, the first step is
a 3+1 dimensional decomposition of the space-time into space (a hypersurface $\Sigma$ or
family of spacelike hypersurfaces $\Sigma_t$) and time, and a corresponding decomposition
of the dynamical variables. Among the dynamical variables one would then
have the spatial metric $h_{ab}$ on $\Sigma$, and phase space variables and initial data would
then include the configuration variable $h_{ab}$ and its canonically conjugate momentum
$\pi^{ab}$.
In the ADM formalism, a more detailed analysis, starting from the Lagrangian formulation
of the theory and then using the Gauss-Codazzi expression (17.73) for the Ricci
scalar (Einstein-Hilbert Lagrangian) $R$ shows that, more specifically, the canonically
conjugate variables are $h_{ab}$ and
\[ \pi^{ab} = \sqrt{h}\,(K^{ab} - K h^{ab}) \tag{20.4} \]
(see (20.70) in section 20.6). Since $\pi^{ab}$ can be expressed in terms of $h_{ab}$ and $K_{ab}$ (and
conversely $K_{ab}$ can be expressed in terms of $h_{ab}$ and $\pi^{ab}$), initial data can also be
specified by specifying $h_{ab}$ and $\pi^{ab}$ on $\Sigma$ (so these variables span the phase space of the
theory).
Of course, these variables need to satisfy the constraints. In a Hamiltonian formulation
these constraints are known as the Hamiltonian constraint and the Momentum con-
straints respectively. Presence of such constraints in the Hamiltonian formulation is a
characteristic feature of gauge theories and/or generally covariant theories, and we will
see below how precisely they arise from a canonical formulation of the theory, and what
their significance is from this perspective. Roughly speaking, they turn out to generate
the time evolution and the action of spatial coordinate transformations on the fields
via Poisson brackets. This is the way 4-dimensional general covariance is implemented
in a foliation-dependent way in the (foliation-dependent) 3+1 dimensional Hamiltonian
formulation of the theory.
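\[ R = \tilde R + \epsilon (K^2 - K_{\alpha\beta}K^{\alpha\beta}) + 2\epsilon \nabla_\alpha (N^\beta \nabla_\beta N^\alpha - N^\alpha \nabla_\beta N^\beta) , \tag{20.5} \]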
In order to be able to use this, let us assume that we do not just have a single hypersurface
$\Sigma$ but a foliation of the space-time into such hypersurfaces. We thus assume that
the space-time is of the form $\Sigma \times \mathbb{R}$, with $\mathbb{R}$ representing the time-direction, and $\Sigma$ are
the constant time slices equipped with some fixed spatial coordinates $y^a$.
On each of these slices $\Sigma$ of constant time the scalar curvature takes the above form
(20.5). The first term $\tilde R$ only depends on the intrinsic geometry of $\Sigma$ (and thus contains
no normal derivatives), while the extrinsic curvature term contains squares of terms
with first normal derivatives but no second normal derivatives. These second normal
derivatives can then only appear in the third term, which is a total derivative. Thus this
decomposition is reminiscent of, and serves the same purpose as, say the addition of the
Gibbons-Hawking-York boundary term (19.61) to the Einstein-Hilbert action discussed
in section 19.5.
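Indeed, if the boundary consists of one (or two, initial and final, say) of these spacelike
hypersurfaces $\Sigma$, this already leads to an appropriate decomposition of the Einstein-Hilbert
action, namely the Gauss-Codazzi form of the action
\[
\begin{aligned}
S_{GC}[g_{\alpha\beta}] &= \int \sqrt{g}\, d^4x\, \big(\tilde R + \epsilon (K^2 - K_{\alpha\beta}K^{\alpha\beta})\big) \\
&= S_{EH}[g_{\alpha\beta}] - 2\epsilon \oint d\Sigma_\alpha\, (N^\beta \nabla_\beta N^\alpha - N^\alpha \nabla_\beta N^\beta) \\
&= S_{EH}[g_{\alpha\beta}] - 2 \oint \sqrt{h}\, d^3y\, N_\alpha (N^\beta \nabla_\beta N^\alpha - N^\alpha \nabla_\beta N^\beta) .
\end{aligned}
\tag{20.6}
\]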
As we will now see, for spacelike boundaries $\Sigma$ addition of this total derivative term
is equivalent to the addition of the Gibbons-Hawking-York term. Indeed, looking
at the boundary term more closely, we see that, as a consequence of
$N_\alpha \nabla_\beta N^\alpha = \nabla_\beta (N_\alpha N^\alpha)/2 = 0$, it reduces to
\[ -2 \oint \sqrt{h}\, d^3y\, N_\alpha (N^\beta \nabla_\beta N^\alpha - N^\alpha \nabla_\beta N^\beta) = 2\epsilon \oint \sqrt{h}\, d^3y\, \nabla_\alpha N^\alpha , \tag{20.7} \]
With respect to such a foliation and boundary, the Gauss-Codazzi form of the action is
therefore identical to the standard gravitational action (19.67),
Remarks:
1. Thus another way to motivate (or arrive at) the Gibbons-Hawking-York boundary
term is to start from the decomposition (17.73) of the Ricci scalar provided by
the Gauss-Codazzi equations.
2. Once expressed in terms of the so-called ADM variables - see section 20.4 below -
I will refer to the form (20.6) of the action as the ADM action.
3. If in addition there are timelike (asymptotic) boundaries $B$, then additional boundary
terms are required, because the contribution of such a boundary to the boundary
term in the action, schematically something like
\[ 2 \oint_B \sqrt{h_B}\, d^3y\, r_\alpha (N^\beta \nabla_\beta N^\alpha - N^\alpha \nabla_\beta N^\beta) \]
(with $r^\alpha$ the normal to $B$ and $h_B$ the absolute value of the determinant of the
(Lorentzian signature) metric induced on $B$), will not equal the standard Gibbons-Hawking-York
boundary term for this boundary (the integral over $B$ of the trace
of the extrinsic curvature of $B$). In the following we will, until further notice,
assume that there is no such boundary component $B$, and will then return to this
issue in sections 20.10 and 20.11.
The next step is to find a parametrisation of the space-time metric adapted to a given
choice of foliation of the space-time by (constant time) hypersurfaces. In order to achieve
this, we first assume that the spatial hypersurfaces of this foliation of the space-time
are hypersurfaces of constant time, i.e. they are the level sets of some time function
$t(x^\alpha)$,
\[ \Sigma_{t_0} = \{ x^\alpha : t(x^\alpha) = t_0 \} , \tag{20.9} \]
We can now introduce coordinates $(t, y^a)$ on the space-time via a coordinate transformation
\[ x^\alpha = x^\alpha(t, y^a) \tag{20.10} \]
gives us the embedding (cf. (14.4) and the discussion in sections 14.1 and 14.2) of
a hypersurface $\Sigma$ (with coordinates $y^a$) as the hypersurface $\Sigma_{t_0}$ in space-time,
\[ x_{t_0} : \Sigma \to \Sigma_{t_0} \subset M . \tag{20.12} \]
The curves
\[ x^\alpha_{y_0}(t) := x^\alpha(t, y_0^a) \tag{20.13} \]
then connect points on different hypersurfaces with the same values of the spatial
coordinates $y^a = y_0^a$, and thus provide us with a notion (or encode a choice) of
time evolution from one hypersurface to the next.
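Given $x^\alpha = x^\alpha(t, y^a)$ for some choice of foliation and time-evolution,
\[ E^\alpha_a = \frac{\partial x^\alpha}{\partial y^a}\Big|_t \tag{20.14} \]
are the vectors tangent to the hypersurfaces, while the tangent $\partial x^\alpha / \partial t|_{y^a}$ to the curves (20.13)
gives us the components of the time-evolution vector field $\partial_t$. The curves (20.13) are not
required to be normal to the hypersurface. In general, therefore, $\partial_t$ can be decomposed
into a normal and tangential part as
\[ (\partial_t)^\alpha = N N^\alpha + E^\alpha_a N^a . \tag{20.16} \]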
The function $N$ and spatial vector field $N^a$ appearing in this expression are known as
the lapse function and shift vector field respectively. They parametrise the freedom in
the choice of the time-evolution vector.
We thus have
\[
\begin{aligned}
dx^\alpha &= (N N^\alpha + E^\alpha_a N^a)\, dt + E^\alpha_a\, dy^a \\
&= N N^\alpha\, dt + E^\alpha_a (dy^a + N^a dt) .
\end{aligned}
\tag{20.17}
\]
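Plugging this into the line element for the space-time metric and using $g_{\alpha\beta} N^\alpha N^\beta = -1$
(as well as the orthogonality $g_{\alpha\beta} N^\alpha E^\beta_a = 0$ of the normal and the tangent vectors),
one finds
\[ ds^2 = -N^2 dt^2 + h_{ab} (dy^a + N^a dt)(dy^b + N^b dt) \tag{20.18} \]
where
\[ h_{ab} = g_{\alpha\beta} E^\alpha_a E^\beta_b \tag{20.19} \]
is the induced metric. This is the so-called ADM decomposition of the metric, and
is the usual point of departure for developing the Hamiltonian formulation of general
relativity (and of field theories in a gravitational background).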
and
2. The normalised timelike normal vector field to the surfaces of constant $t$, thus
$N_\alpha \sim \partial_\alpha t$, is given by
\[ N_\alpha = -N \partial_\alpha t . \tag{20.22} \]
3. Thus in the ADM coordinates $(t, y^a)$ one has
\[ N_t = -N , \qquad N_a = 0 \tag{20.23} \]
as well as
\[ E^t_a = 0 , \qquad E^b_a = \delta^b_a . \tag{20.24} \]
4. In terms of these variables, the 4-dimensional volume element $\sqrt{g}$ takes the simple
factorised form
\[ \sqrt{g} = N \sqrt{h} . \tag{20.25} \]
5. Moreover, in terms of these variables the extrinsic curvature tensor of the surfaces
of constant $t$ can be written as
\[ K_{ab} = \frac{1}{2N} (\dot h_{ab} - L_N h_{ab}) \tag{20.26} \]
where
\[ \dot h_{ab} = L_t h_{ab} = \partial_t h_{ab} \tag{20.27} \]
is the time (Lie) derivative of $h_{ab}$, and in terms of the intrinsic = induced covariant
derivative $\tilde\nabla$ the 2nd term can be written as
\[ L_N h_{ab} = \tilde\nabla_a N_b + \tilde\nabla_b N_a . \tag{20.28} \]
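It is perhaps only the derivation of the last result (20.26) that requires some comment.
Here is a sketch of 2 derivations:
\[ K_{ab} = E^\alpha_a E^\beta_b \nabla_\alpha N_\beta \tag{20.29} \]
\[ K_{ab} = -N E^\alpha_a E^\beta_b \nabla_\alpha \partial_\beta t = N E^\alpha_a E^\beta_b \Gamma^t_{\alpha\beta} . \tag{20.30} \]
Now use the explicit expressions for the components of the metric and its inverse
to write this as
\[ K_{ab} = -N^{-1} \Gamma_{tab} + N^{-1} N^c \Gamma_{cab} . \tag{20.31} \]
Noting that
\[ 2\Gamma_{tab} = \partial_a N_b + \partial_b N_a - \partial_t h_{ab} , \qquad \Gamma_{cab} = \tilde\Gamma_{cab} \tag{20.32} \]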
5b Alternatively, start with
\[ \dot h_{ab} = L_t h_{ab} = L_t (g_{\alpha\beta} E^\alpha_a E^\beta_b) \tag{20.33} \]
and use
\[ L_t E_a = [\partial_t, \partial_{y^a}] = 0 \tag{20.34} \]
to write this as
(t ) = N N + N (20.36)
with
N = g N = g Ea N a (20.37)
\[ \dot h_{ab} = 2N K_{ab} + \tilde\nabla_a N_b + \tilde\nabla_b N_a , \tag{20.38} \]
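The statements (20.23) and (20.25) about the ADM parametrisation are easy to check numerically at a single point. The following is only a minimal sketch, assuming the standard ADM components $g_{tt} = -N^2 + N_a N^a$, $g_{ta} = N_a$, $g_{ab} = h_{ab}$ in the coordinates $(t, y^a)$ and the normal $N^\alpha = N^{-1}(1, -N^a)$; the function name `adm_metric` and the random test data are purely illustrative.

```python
import numpy as np

def adm_metric(lapse, shift_up, h):
    """Assemble g_{alpha beta} in coordinates (t, y^a) from lapse N, shift N^a and h_ab.

    Assumed (standard) ADM form: g_tt = -N^2 + N_a N^a, g_ta = N_a, g_ab = h_ab.
    """
    shift_low = h @ shift_up                      # N_a = h_ab N^b
    g = np.empty((4, 4))
    g[0, 0] = -lapse**2 + shift_up @ shift_low
    g[0, 1:] = shift_low
    g[1:, 0] = shift_low
    g[1:, 1:] = h
    return g

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
h = A @ A.T + 3.0 * np.eye(3)                     # a positive definite spatial metric
lapse = 1.7
shift_up = rng.normal(size=3)
g = adm_metric(lapse, shift_up, h)

# (20.25): sqrt|det g| = N sqrt(det h)
volume_ok = np.isclose(np.sqrt(-np.linalg.det(g)), lapse * np.sqrt(np.linalg.det(h)))

# Unit normal N^alpha = (1/N)(1, -N^a); lowering the index should give (20.23)
normal_up = np.concatenate(([1.0], -shift_up)) / lapse
normal_low = g @ normal_up
norm_ok = np.isclose(normal_up @ normal_low, -1.0)
covector_ok = np.allclose(normal_low, [-lapse, 0.0, 0.0, 0.0])

print(volume_ok, norm_ok, covector_ok)
```

A pointwise check like this says nothing about derivatives (and hence nothing about (20.26)), but it is a quick guard against sign and index-placement errors in the lapse and shift.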
With these preliminaries out of the way, let us now turn our attention to the gravitational
action. The starting point is the Gauss-Codazzi action (20.6), but now of course viewed
as a functional of the ADM variables $(h_{ab}, N, N^a)$. Since the extrinsic curvature tensor
$K_{\alpha\beta}$ is a spatial tensor, one has
and
\[ K_{\alpha\beta} K^{\alpha\beta} = K^{ab} K_{ab} \tag{20.40} \]
where
\[ K_{ab} = E^\alpha_a E^\beta_b K_{\alpha\beta} . \tag{20.41} \]
This is what I will refer to as the (2nd order form of the) ADM action. We can write
this as an integral of a Lagrangian $L_{ADM}$ or a Lagrangian density $\mathcal{L}_{ADM}$ as
\[ S_{ADM}[h_{ab}, N, N_k] = \int dt\, L_{ADM} = \int dt \int d^3x\, \mathcal{L}_{ADM} . \tag{20.43} \]
Note that in terms of the so-called DeWitt metric
\[ K_{\alpha\beta} K^{\alpha\beta} - K^2 = K_{ab} K^{ab} - K^2 \tag{20.45} \]
can be written as
\[ K_{ab} K^{ab} - K^2 = G^{abcd} K_{ab} K_{cd} . \tag{20.46} \]
Remarks:
1. As this DeWitt metric determines the form of the kinetic term, it also plays the role
of a natural metric on the space of spatial metrics or, better, metric deformations
$\delta h_{ab}$, in the sense that one can define
\[ \langle \delta_1 h, \delta_2 h \rangle = \int \sqrt{h}\, d^3x\, G^{abcd} (\delta_1 h_{ab})(\delta_2 h_{cd}) . \tag{20.48} \]
This metric is not positive definite. The negative direction in the space of
deformations of a spatial metric $h_{ab}$ turns out to be associated with overall volume
deformations.
2. This can be seen very explicitly in the case of simple cosmological models, where
this overall scaling of the spatial metric is the only degree of freedom (the cosmic
scale factor) and thus the gravitational kinetic contribution to the action is strictly
non-positive (see the discussion in section 34.6 for an explicit illustration of this
fact).
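The indefiniteness of the DeWitt metric can be illustrated pointwise. The sketch below assumes the form $G^{abcd} = \frac12 (h^{ac}h^{bd} + h^{ad}h^{bc}) - h^{ab}h^{cd}$, which is the form compatible with (20.46); the helper names `dewitt_metric` and `dewitt_norm` are illustrative only. It evaluates the integrand of (20.48) once for a pure rescaling $\delta h_{ab} = h_{ab}$ and once for a traceless deformation.

```python
import numpy as np

def dewitt_metric(h):
    """G^{abcd} = (h^{ac} h^{bd} + h^{ad} h^{bc})/2 - h^{ab} h^{cd} (assumed form)."""
    hi = np.linalg.inv(h)
    return (0.5 * (np.einsum('ac,bd->abcd', hi, hi)
                   + np.einsum('ad,bc->abcd', hi, hi))
            - np.einsum('ab,cd->abcd', hi, hi))

def dewitt_norm(h, dh):
    """Pointwise integrand G^{abcd} dh_ab dh_cd of the inner product (20.48)."""
    return np.einsum('abcd,ab,cd->', dewitt_metric(h), dh, dh)

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
h = A @ A.T + 3.0 * np.eye(3)            # positive definite spatial metric at one point

# Overall volume deformation: dh_ab = h_ab gives 3 - 3^2 = -6 in 3 dimensions
volume_norm = dewitt_norm(h, h)

# Traceless (shape) deformation: subtract the trace part of a random symmetric tensor
X = rng.normal(size=(3, 3))
X = 0.5 * (X + X.T)
trace = np.einsum('ab,ab->', np.linalg.inv(h), X)
shear = X - trace * h / 3.0
shear_norm = dewitt_norm(h, shear)

print(volume_norm, shear_norm)
```

The rescaling direction gives a negative value, while traceless deformations give a positive one, in line with Remark 1.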
At this point, for comparison purposes it will be useful to have some at least very
superficial familiarity with the canonical formulation of Maxwell theory. I will therefore
briefly summarise this here (more sophisticated treatments of this standard subject can
be found in many places).
We start with the Lorentz-invariant Lagrangian (density) of Maxwell theory,
\[ \mathcal{L} = -\tfrac14 F_{\alpha\beta} F^{\alpha\beta} \tag{20.49} \]
but break manifest Lorentz invariance by choosing a particular inertial frame with coordinates
$(x^0 = t, x^a)$, and a corresponding slicing of space-time by constant time hypersurfaces.
Then the Lagrangian takes the form
\[ \mathcal{L} = \tfrac12 (\vec{E}^2 - \vec{B}^2) \tag{20.50} \]
where
\[ F_{0a} = \partial_0 A_a - \partial_a A_0 = -E_a , \qquad F_{ab} = \epsilon_{abc} B_c . \tag{20.51} \]
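Now we proceed as follows:
\[ \dot A_a = \pi_a + \partial_a A_0 , \tag{20.53} \]
The Legendre transform of the Lagrangian density is thus the Hamiltonian density
\[
\begin{aligned}
\mathcal{H} = \pi^a \dot A_a - \mathcal{L} &= \pi^a \pi_a + \pi^a \partial_a A_0 - \mathcal{L} \\
&= \tfrac12 (\vec{\pi}^2 + \vec{B}^2) - A_0 (\partial_a \pi^a) + \partial_a (A_0 \pi^a) .
\end{aligned}
\tag{20.54}
\]
with suitable boundary conditions we can ignore the total derivative term. Thus
we can work instead with the Hamiltonian density
\[
\begin{aligned}
\mathcal{H} &= \tfrac12 (\vec{\pi}^2 + \vec{B}^2) - A_0 (\partial_a \pi^a) \\
&= \tfrac12 (\vec{E}^2 + \vec{B}^2) + A_0 (\partial_a E^a) .
\end{aligned}
\tag{20.56}
\]
reproduce the relation (20.53) between the velocities and momenta, and the spatial
components of the Maxwell equations $\partial_\alpha F^{\alpha\beta} = 0$,
\[ \dot\pi^a = \{\pi^a, H\} \quad\Leftrightarrow\quad \partial_\alpha F^{\alpha a} = 0 . \tag{20.59} \]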
$A_0$ acts as a Lagrange multiplier, imposing the time-component of the Maxwell
equations,
\[ \partial_a E^a = 0 \quad\Leftrightarrow\quad \partial_\alpha F^{\alpha 0} = 0 . \tag{20.60} \]
Alternatively, this equation arises from requiring that the primary constraint $\pi^0 = 0$
be preserved under time-evolution,
\[ \dot\pi^0 = \{\pi^0, H\} \overset{!}{=} 0 \quad\Rightarrow\quad G = \partial_a E^a = 0 . \tag{20.61} \]
This condition does not contain time-derivatives of the canonical variables, and
therefore it is not an evolution equation for the phase space variables but rather
a secondary constraint on the initial data on a fixed time hypersurface, the Gauss
Law Constraint.
The name derives from the fact that in the presence of matter one would instead
have
\[ \partial_a E^a = \rho , \tag{20.62} \]
with $\rho$ the charge density, and this relation allows one to express the total charge
contained in a spatial volume as a surface integral (a statement usually known as
the Gauss Law).
This Gauss law constraint reflects the underlying gauge invariance of Maxwell
theory. In particular, via Poisson brackets it generates the action of the gauge
transformations on $A_a$ (and $E^a$),
\[
\begin{aligned}
\delta_\lambda A_a(\vec{x}) &= \{ A_a(\vec{x}), \int d^3y\, \lambda(\vec{y})\, \partial_b E^b(\vec{y}) \} = \partial_a \lambda(\vec{x}) \\
\delta_\lambda E^a(\vec{x}) &= \{ E^a(\vec{x}), \int d^3y\, \lambda(\vec{y})\, \partial_b E^b(\vec{y}) \} = 0 .
\end{aligned}
\tag{20.63}
\]
Thus the physical (reduced) phase space of the system consists of the pairs $(A_a, E^a)$
satisfying the Gauss law constraint, modulo gauge transformations. Gauge invariant
observables are those functions on phase space that have vanishing Poisson
brackets with the Gauss law constraint.
depend only on the electric field, not on the vector potential, they satisfy the
constraint algebra
\[ \{ G[\lambda_1], G[\lambda_2] \} = 0 \tag{20.65} \]
which reflects the Abelian $U(1)$ gauge invariance of Maxwell theory (whereas the
corresponding Gauss law generators in a non-Abelian gauge theory would have
formed a Poisson bracket realisation of the gauge algebra).
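Finally we note that the on-shell value of the Hamiltonian gives the energy (density)
of a solution,
\[
\begin{aligned}
G = 0 \quad\Rightarrow\quad \mathcal{H} &= \tfrac12 (\vec{E}^2 + \vec{B}^2) = T_{00} \\
H &= \int d^3x\, T_{00} = E .
\end{aligned}
\tag{20.66}
\]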
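As a worked illustration of how the Gauss law constraint generates gauge transformations, here is a short sketch of the computation behind (20.63). It assumes the canonical bracket $\{A_a(\vec{x}), \pi^b(\vec{y})\} = \delta_a^b\, \delta^{(3)}(\vec{x} - \vec{y})$, the identification $E^a = -\pi^a$ that can be read off from (20.56), and a gauge parameter $\lambda$ that falls off fast enough for the boundary term in the integration by parts to vanish; the smeared generator is written $G[\lambda] = \int d^3y\, \lambda\, \partial_b E^b$.

```latex
\begin{align*}
\{A_a(\vec{x}), G[\lambda]\}
  &= -\int d^3y\, \lambda(\vec{y})\, \frac{\partial}{\partial y^b}
     \{A_a(\vec{x}), \pi^b(\vec{y})\} \\
  &= +\int d^3y\, \big(\partial_b \lambda\big)(\vec{y})\, \delta_a^b\,
     \delta^{(3)}(\vec{x} - \vec{y})
   = \partial_a \lambda(\vec{x}) , \\
\{E^a(\vec{x}), G[\lambda]\}
  &= \int d^3y\, \lambda(\vec{y})\, \frac{\partial}{\partial y^b}
     \{\pi^a(\vec{x}), \pi^b(\vec{y})\} = 0 .
\end{align*}
```

The second bracket vanishes simply because the momenta Poisson-commute among themselves, which is also the reason why the smeared Gauss law generators commute with each other.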
In the following, you should see that there is a close analogy between the Gauss Law
constraint $G$ (and its associated Lagrange multiplier $A_0$) of Maxwell theory, and the
so-called Momentum Constraint $\mathcal{H}_a$ (and its associated Lagrange multiplier, the shift
vector $N^a$) on the gravity side (but I will refrain from constantly pointing out these
analogies in the following, as that can become rather obnoxious).
On the other hand, there is no good Maxwell analogue of the so-called Hamiltonian
Constraint, whose presence is instead a characteristic feature of general relativity (and
other parametrisation invariant theories).
The (genuine, unconstrained) conjugate momenta to $h_{ab}$ are (the definition adopted
here is that the canonical momenta are tensor densities)
\[ \pi^{ab} = \frac{\partial \mathcal{L}_{ADM}}{\partial \dot h_{ab}} . \tag{20.68} \]
as anticipated in (20.4).
This can also be written as
\[ \sqrt{h}\, K_{ab} = G_{abcd}\, \pi^{cd} , \tag{20.73} \]
where
\[ G_{abcd} = \tfrac12 (h_{ac} h_{bd} + h_{ad} h_{bc} - h_{ab} h_{cd}) \tag{20.74} \]
is indeed the inverse of the DeWitt metric (20.44) in the sense of a metric on symmetric
2-tensors,
\[ G^{abcd} G_{cdef} = \tfrac12 (\delta^a_e \delta^b_f + \delta^a_f \delta^b_e) . \tag{20.75} \]
Using (20.26) in the form
\[ \dot h_{ab} = 2N K_{ab} + L_N h_{ab} , \tag{20.76} \]
one sees that the velocities $\dot h_{ab}$ can be written in terms of the coordinates and momenta
as
\[ \dot h_{ab} = \frac{2N}{\sqrt{h}}\, G_{abcd}\, \pi^{cd} + L_N h_{ab} . \tag{20.77} \]
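The inverse relation (20.75) and the inversion (20.73) of the momenta can be verified numerically at a point. This is a sketch under stated assumptions: the DeWitt metric is taken to be $G^{abcd} = \frac12 (h^{ac}h^{bd} + h^{ad}h^{bc}) - h^{ab}h^{cd}$, the momentum is $\pi^{ab} = \sqrt{h}(K^{ab} - K h^{ab})$ as in (20.4), and all variable names and the random test data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
h = A @ A.T + 3.0 * np.eye(3)                 # h_ab, positive definite
hi = np.linalg.inv(h)                         # h^ab
sqrt_h = np.sqrt(np.linalg.det(h))
eye = np.eye(3)

# DeWitt metric (assumed form) and the candidate inverse (20.74)
G_up = (0.5 * (np.einsum('ac,bd->abcd', hi, hi) + np.einsum('ad,bc->abcd', hi, hi))
        - np.einsum('ab,cd->abcd', hi, hi))
G_low = 0.5 * (np.einsum('ac,bd->abcd', h, h) + np.einsum('ad,bc->abcd', h, h)
               - np.einsum('ab,cd->abcd', h, h))

# (20.75): G^{abcd} G_{cdef} is the identity on symmetric 2-tensors
identity_sym = 0.5 * (np.einsum('ae,bf->abef', eye, eye)
                      + np.einsum('af,be->abef', eye, eye))
inverse_ok = np.allclose(np.einsum('abcd,cdef->abef', G_up, G_low), identity_sym)

# (20.4) and (20.73): build pi^ab from a random K_ab and recover K_ab again
K = rng.normal(size=(3, 3))
K = 0.5 * (K + K.T)                           # K_ab
K_up = hi @ K @ hi                            # K^ab
trK = np.einsum('ab,ab->', hi, K)             # K = h^ab K_ab
pi = sqrt_h * (K_up - trK * hi)               # pi^ab
K_back = np.einsum('abcd,cd->ab', G_low, pi) / sqrt_h
roundtrip_ok = np.allclose(K_back, K)

print(inverse_ok, roundtrip_ok)
```

The round trip works for any symmetric $K_{ab}$, which is just the statement that the Legendre map between velocities and momenta is invertible for $h_{ab}$, in contrast to the situation for $N$ and $N^a$.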
Turning now to the other variables $N$ and $N^a$, note that the action does not depend
on their time-derivatives at all since the intrinsic scalar curvature $\tilde R$ is completely
independent of these variables while the extrinsic curvature involves only $N$ and the spatial
covariant derivatives of $N^a$. Thus the conjugate momenta to these variables are zero,
\[ p_N = \frac{\partial \mathcal{L}_{ADM}}{\partial \dot N} = 0 , \qquad p_{N^a} = \frac{\partial \mathcal{L}_{ADM}}{\partial \dot N^a} = 0 . \tag{20.78} \]
Since the action does not depend on the time-derivatives of these variables, they act
as Lagrange multipliers and variation of the action with respect to the lapse function
and shift vector gives rise to the Hamiltonian constraint $\mathcal{H} = 0$ and the Momentum
Constraints $\mathcal{H}_a = 0$ already mentioned in section 20.1.
(it is convenient to define the constraints as tensor densities; this accounts for the
factor of $\sqrt{h}$). Comparison with (20.1) shows that
\[ \mathcal{H} = -2 \sqrt{h}\, G_{NN} \tag{20.80} \]
Thus variation of the action with respect to $N$ (i.e. differentiation of the Lagrangian
density with respect to $N$ in the case at hand) simply changes the relative
sign of the 2 terms, giving rise to the Hamiltonian constraint $\mathcal{H} = 0$. This
Hamiltonian constraint will indeed turn out to be part of the Hamiltonian of
the theory. In this sense, variation with respect to $N$ implements the Legendre
transformation.
so that the Momentum constraint imposes these components of the Einstein equations.
Written in terms of the canonical momenta (20.70), the Momentum constraint
is simply
\[ \mathcal{H}^b = -2 \tilde\nabla_a \pi^{ab} . \tag{20.85} \]
Note that this is a tensor density because of the $\sqrt{h}$ in the definition (20.70) of
$\pi^{ab}$.
(because of (20.78), whether or not we also formally include $p_N \dot N$ etc. in this expression
makes no difference). Now from (20.26) $\pi^{ab} \dot h_{ab}$ consists of 2 kinds of terms, namely
\[ \pi^{ab} \dot h_{ab} = \pi^{ab} (2N K_{ab} + 2 \tilde\nabla_a N_b) \tag{20.87} \]
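The first of these, combined with the kinetic term, gives
\[ 2N \pi^{ab} K_{ab} - N \sqrt{h}\, G^{abcd} K_{ab} K_{cd} = N \sqrt{h}\, G^{abcd} K_{ab} K_{cd} . \tag{20.88} \]
so that
\[ \sqrt{h}\, (K^{ab} K_{ab} - K^2) = (\pi^{ab} \pi_{ab} - \tfrac12 \pi^2) / \sqrt{h} . \tag{20.90} \]
We can also write this in terms of the inverse DeWitt metric (20.74) as
\[ \sqrt{h}\, G^{abcd} K_{ab} K_{cd} = G_{abcd}\, \pi^{ab} \pi^{cd} / \sqrt{h} . \tag{20.91} \]
Taking into account the scalar curvature term in the Lagrangian, we thus arrive at
\[ 2N \pi^{ab} K_{ab} - \mathcal{L}_{ADM} = N \sqrt{h}\, (G_{abcd}\, \pi^{ab} \pi^{cd} / h - \tilde R) \tag{20.92} \]
This is precisely ($N$ times) the Hamiltonian constraint (20.79), now expressed in
terms of the canonical variables $(h_{ab}, \pi^{ab})$,
with
\[
\begin{aligned}
\mathcal{H} &= \sqrt{h}\, (G_{abcd}\, \pi^{ab} \pi^{cd} / h - \tilde R) \\
&= (\pi^{ab} \pi_{ab} - \tfrac12 \pi^2) / \sqrt{h} - \sqrt{h}\, \tilde R .
\end{aligned}
\tag{20.94}
\]
\[ 2 \pi^{ab} \tilde\nabla_a N_b \;\to\; -2 (\tilde\nabla_a \pi^{ab}) N_b = N_b \mathcal{H}^b \tag{20.95} \]
\[ \mathcal{H}_{ADM} = N \mathcal{H} + N^a \mathcal{H}_a \tag{20.96} \]
We can also use these results to write the ADM action in 1st-order form as
\[ S_{ADM} = \int dt\, d^3x\, (\pi^{ab} \dot h_{ab} - N \mathcal{H} - N^a \mathcal{H}_a) . \tag{20.98} \]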
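The algebra of the Legendre transformation, i.e. the identities (20.90) and (20.92), can be spot-checked numerically at a point. The sketch below sets the shift to zero (so that only the lapse part of the transformation is tested), takes $\mathcal{L}_{ADM} = N \sqrt{h} (K^{ab}K_{ab} - K^2 + \tilde R)$ and $\pi^{ab} = \sqrt{h}(K^{ab} - K h^{ab})$, and treats the spatial curvature $\tilde R$ as an arbitrary number `R3`; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
h = A @ A.T + 3.0 * np.eye(3)                   # h_ab
hi = np.linalg.inv(h)                           # h^ab
sqrt_h = np.sqrt(np.linalg.det(h))
lapse = 1.3
R3 = 0.7                                        # placeholder value for the spatial curvature

K = rng.normal(size=(3, 3))
K = 0.5 * (K + K.T)                             # K_ab
K_up = hi @ K @ hi                              # K^ab
trK = np.einsum('ab,ab->', hi, K)

pi = sqrt_h * (K_up - trK * hi)                 # pi^ab, cf. (20.4)
pi_low = h @ pi @ h                             # pi_ab
tr_pi = np.einsum('ab,ab->', h, pi)

kinetic_K = np.einsum('ab,ab->', K_up, K) - trK**2                          # K^ab K_ab - K^2
kinetic_pi = (np.einsum('ab,ab->', pi, pi_low) - 0.5 * tr_pi**2) / sqrt_h   # rhs of (20.90)

lagrangian = lapse * sqrt_h * (kinetic_K + R3)
hamiltonian_constraint = kinetic_pi - sqrt_h * R3                           # (20.94)
legendre = 2.0 * lapse * np.einsum('ab,ab->', pi, K) - lagrangian           # lhs of (20.92)

kinetic_ok = np.isclose(kinetic_pi, sqrt_h * kinetic_K)
legendre_ok = np.isclose(legendre, lapse * hamiltonian_constraint)
print(kinetic_ok, legendre_ok)
```

Note how the kinetic term keeps its sign while the curvature ("potential") term flips sign between Lagrangian and Hamiltonian constraint, as for an ordinary mechanical system.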
Remarks:
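1. As anticipated in the previous section, the Hamiltonian constraint (20.94) looks
exactly like a standard Legendre transform of the Lagrangian (20.47)
\[
\begin{aligned}
\mathcal{L}_{ADM} &= N \sqrt{h}\, (G^{abcd} K_{ab} K_{cd} + \tilde R) \\
N \mathcal{H} &= N \sqrt{h}\, (G_{abcd}\, \pi^{ab} \pi^{cd} / h - \tilde R)
\end{aligned}
\tag{20.99}
\]
of the ADM action, it is now manifest that variations of the action with respect to the
lapse and shift give rise to the Hamiltonian and Momentum constraints respectively,
\[
\begin{aligned}
\frac{\delta S_{ADM}}{\delta N} = 0 \quad&\Rightarrow\quad \mathcal{H} = 0 \\
\frac{\delta S_{ADM}}{\delta N^a} = 0 \quad&\Rightarrow\quad \mathcal{H}_a = 0
\end{aligned}
\tag{20.106}
\]
In the Hamiltonian picture, these constraints arise and are implemented by demanding
that the so-called primary constraints (20.78)
\[ p_N = 0 , \qquad p_{N^a} = 0 , \tag{20.107} \]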
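are preserved under the Hamiltonian time-evolution. This indeed gives rise precisely to
the Momentum and Hamiltonian constraints as secondary constraints,
\[
\begin{aligned}
\dot p_N = \{ p_N, H_{ADM} \} = 0 \quad&\Rightarrow\quad \mathcal{H} = 0 \quad\Leftrightarrow\quad G_{NN} = 0 \\
\dot p_{N^a} = \{ p_{N^a}, H_{ADM} \} = 0 \quad&\Rightarrow\quad \mathcal{H}_a = 0 \quad\Leftrightarrow\quad G_{Na} = 0 .
\end{aligned}
\tag{20.108}
\]
The remaining (true evolution) equations $G_{ab} = 0$ are then the Hamilton equations for
the spatial metric (configuration variable) $h_{ab}$ and its conjugate momentum $\pi^{ab}$. These
can either be written in the form
\[ \dot h_{ab} = \frac{\delta H_{ADM}}{\delta \pi^{ab}} , \qquad \dot\pi^{ab} = -\frac{\delta H_{ADM}}{\delta h_{ab}} \tag{20.109} \]
or in terms of Poisson brackets as
\[ \dot h_{ab} = \{ h_{ab}, H_{ADM} \} , \qquad \dot\pi^{ab} = \{ \pi^{ab}, H_{ADM} \} , \tag{20.110} \]
where the non-vanishing Poisson brackets between the canonical variables $h_{ab}$ and $\pi^{ab}$
are
\[ \{ h_{ab}(x), \pi^{cd}(y) \} = \tfrac12 (\delta_a^c \delta_b^d + \delta_a^d \delta_b^c)\, \delta(x, y) . \tag{20.111} \]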
Inserting the explicit expression for the Hamiltonian, one finds (unsurprisingly) that
the equation for $\dot h_{ab}$ simply reproduces the definition of $\pi^{ab}$, i.e. the relation (20.77).
Indeed, from the kinetic term of the Hamiltonian constraint one finds
\[ \{ h_{ab}, \int d^3x\, N\, G_{cdef}\, \pi^{cd} \pi^{ef} / \sqrt{h} \} = \frac{2N}{\sqrt{h}}\, G_{abcd}\, \pi^{cd} , \tag{20.112} \]
and the Poisson bracket with the potential term is zero
\[ \{ h_{ab}, \int d^3x\, N \sqrt{h}\, \tilde R \} = 0 \tag{20.113} \]
(because $\tilde R = \tilde R(h)$ is only a function of $h_{ab}$ and its spatial derivatives). Finally, the
Poisson bracket with the momentum constraint part of the ADM Hamiltonian gives
\[
\begin{aligned}
\{ h_{ab}, \int d^3x\, (-2 N_c \tilde\nabla_d \pi^{cd}) \} &= \{ h_{ab}, \int d^3x\, (+2 (\tilde\nabla_d N_c)\, \pi^{cd}) \} \\
&= \tilde\nabla_a N_b + \tilde\nabla_b N_a .
\end{aligned}
\tag{20.114}
\]
Putting everything together, one sees that this indeed reproduces (20.77) in the form
\[ \dot h_{ab} = \frac{2N}{\sqrt{h}}\, G_{abcd}\, \pi^{cd} + \tilde\nabla_a N_b + \tilde\nabla_b N_a . \tag{20.115} \]
The equation for $\pi^{ab}$,
\[ \dot\pi^{ab} = -\frac{\delta H_{ADM}}{\delta h_{ab}} = \{ \pi^{ab}, H_{ADM} \} , \tag{20.116} \]
is now equivalent to $G_{ab} = 0$. The explicit expression can of course be worked out in
analogy with the above and from the results obtained so far, but it is rather complicated
(these are, after all, the non-linear coupled Einstein equations) and not particularly
enlightening, at least not upon first sight, and will not be given here. A partial result,
however, namely the Poisson bracket of $\pi^{ab}$ with the momentum constraint part of the
Hamiltonian, $\{ \pi^{ab}, \int N^c \mathcal{H}_c \}$, will be given below as it illustrates the significance of the
momentum constraint.
20.9 Properties and Significance of the Constraints
\[ p_N = 0 , \qquad p_{N^a} = 0 , \tag{20.117} \]
i.e. the condition that they are preserved under the time-evolution generated by the
ADM Hamiltonian, leads to the secondary Hamiltonian and Momentum constraints
(20.108),
\[ \dot p_N = 0 , \quad \dot p_{N^a} = 0 \quad\Rightarrow\quad \mathcal{H} = 0 , \quad \mathcal{H}_a = 0 . \tag{20.118} \]
One now needs to inquire whether further (tertiary, . . . ) constraints are generated by
the requirement that these secondary constraints are preserved under time-evolution.
It turns out that the story ends here and that no further constraints are required. Thus
the Hamiltonian and Momentum constraints will be satisfied at all times provided that
they are satisfied on the initial value surface. This is the Hamiltonian counterpart of the
statement about propagation of the constraints discussed in section 18.7 in connection
with the Bianchi identities and their implications.
and
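\[
\begin{aligned}
\frac{d}{dt} \mathcal{H}(x) &= \int d^3y\, N(y) \{ \mathcal{H}(x), \mathcal{H}(y) \} + \int d^3y\, N^b(y) \{ \mathcal{H}(x), \mathcal{H}_b(y) \} \\
\frac{d}{dt} \mathcal{H}_a(x) &= \int d^3y\, N(y) \{ \mathcal{H}_a(x), \mathcal{H}(y) \} + \int d^3y\, N^b(y) \{ \mathcal{H}_a(x), \mathcal{H}_b(y) \} ,
\end{aligned}
\tag{20.120}
\]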
it is clear that checking this amounts to calculating the Poisson brackets among the
constraints and verifying that these are zero when the constraints themselves are satis-
fied, i.e. that the Poisson bracket algebra of the secondary constraints actually closes
on the secondary constraints.
Even though this fact, and the resulting surface deformation algebra, are in some
sense one of the most interesting aspects of this entire story, we will skip the direct
calculation of the Poisson bracket algebra of the constraints here as it is somewhat
painful. However, it is useful to at least display the algebra of constraints (and we will
then at least partially verify it afterwards).
The Poisson brackets among the naked constraints $\mathcal{H}(x)$, $\mathcal{H}(y)$, $\mathcal{H}_a(x)$, $\mathcal{H}_b(y)$, as they
appear in (20.120), will involve delta-functions $\delta(x, y)$ and their derivatives and are a
bit unattractive (see e.g. (20.129) and (20.137) below). In order to exhibit the Poisson
bracket algebra and clarify its structure, it is more instructive and convenient to
explicitly introduce the smeared constraints
\[ H[N] = \int d^3x\, N \mathcal{H} , \qquad P[\vec{N}] = \int d^3x\, N^a \mathcal{H}_a , \tag{20.121} \]
in terms of which the ADM Hamiltonian takes the form
Here the new lapse function and shift vectors appearing on the right-hand side are
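3. and finally a shift vector constructed from the two lapse functions $N_1$ and $N_2$ (and
the metric!), denoted by $N_1 \tilde\nabla N_2 - N_2 \tilde\nabla N_1$, which has the components
\[
\begin{aligned}
(N_1 \tilde\nabla N_2 - N_2 \tilde\nabla N_1)^a &= N_1 \tilde\nabla^a N_2 - N_2 \tilde\nabla^a N_1 \\
&= h^{ab} (N_1 \partial_b N_2 - N_2 \partial_b N_1)
\end{aligned}
\tag{20.127}
\]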
There are many things that can and should be said about this algebra, its properties,
its interpretation, its deeper meaning, and its consequences, but in the following I will
just make some rather elementary and simplistic comments.
First of all, recall that we expect the Hamiltonian and Momentum constraints to reflect
the general covariance of general relativity. This general covariance is manifest in
the covariant 4-dimensional Lagrangian formulation, but the Hamiltonian formulation
requires a split of the 4-dimensional space-time into space and time via the choice of a
foliation of space-time by spacelike hypersurfaces $\Sigma_t$, encoded in the choice of a time
evolution vector field $\partial_t$ with (20.16)
\[ (\partial_t)^\alpha = N N^\alpha + E^\alpha_a N^a . \tag{20.128} \]
In this Hamiltonian formulation, spatial general covariance is still manifest, and this is
reflected in the fact that the part of the constraint algebra that is easiest to understand
is the algebra among the momentum constraints, or its naked counterpart
Indeed, the Momentum constraint $P[\xi]$ associated to some vector field $\xi$ on $\Sigma$ implements
the action of (infinitesimal) spatial diffeomorphisms
\[ \delta x^a = \xi^a \tag{20.130} \]
on the phase space variables, and thus on functions on phase space, namely the Lie
derivative $L_\xi$, via Poisson brackets,
(a proof of this is postponed to the very end of this section). As discussed in various
ways in section 8, the Lie derivative provides a representation of the Lie algebra of
vector fields (with respect to the Lie bracket) on tensors,
and the momentum constraint algebra shows that on phase space variables this representation
is lifted to a representation at the level of Poisson brackets,
\[ \{ P[\xi_1], P[\xi_2] \} = P[[\xi_1, \xi_2]] . \tag{20.133} \]
simply expresses the fact that $\mathcal{H}$ is (and hence transforms as) a scalar density (8.68),
\[ \delta_\xi \mathcal{H} = L_\xi \mathcal{H} = \partial_a (\xi^a \mathcal{H}) . \tag{20.136} \]
\[
\begin{aligned}
\{ H[N_1], H[N_2] \} &= P[N_1 \tilde\nabla N_2 - N_2 \tilde\nabla N_1] \\
\{ \mathcal{H}(x), \mathcal{H}(y) \} &= (h^{ab}(x) \mathcal{H}_b(x) + h^{ab}(y) \mathcal{H}_b(y))\, \partial_{x^a} \delta(x, y)
\end{aligned}
\tag{20.137}
\]
Rather, the Poisson bracket algebra of the constraints represents what is known as the
surface deformation algebra, subtly different from the algebra of space-time diffeomor-
phisms, as it acts not on the space-time but on the space of embeddings of spatial
hypersurfaces.
Thinking of the surfaces in terms of an embedding $x^\alpha(y^a)$, with
\[ E^\alpha_a(y) = \partial_{y^a} x^\alpha \tag{20.138} \]
\[ C(y) = N^\alpha \frac{\delta}{\delta x^\alpha(y)} \tag{20.140} \]
with $N^\alpha$ the unit normal vector field to the hypersurfaces (generating normal deformations
of the hypersurfaces).
Remarks:
1. From the surface deformation algebra and the requirement that evolution from
a hypersurface $\Sigma_i$ to a hypersurface $\Sigma_f$ should be independent of how one slices
/ foliates the space-time between the two hypersurfaces, one can derive that the
vanishing of the generators $\mathcal{H}$ and $\mathcal{H}_a$ of this algebra must be imposed as constraints.46
4. Taken at face value, this suggests that in a generally covariant theory there is no
dynamics, or that the dynamics is frozen, and that the only allowed observables
are functions on the phase space that Poisson-commute with the Hamiltonian, i.e.
that are in some sense constants of motion. This cannot be strictly correct, of
46
S. Hojman, K. Kuchar, C. Teitelboim, Geometrodynamics regained, Ann. Phys. (NY) 96 (1976)
88-135. For a detailed discussion of this and further references, see e.g. chapters 3 and 4.1 of C. Kiefer,
Quantum Gravity (2nd edition).
course, and the problem appears in a different light once one fixes a gauge, i.e.
makes a choice of coordinates. Nevertheless, this does not solve all the problems
and there are endless debates in the literature about these issues. In particular, the
debate over what are acceptable observables in a generally covariant (quantum)
theory continues to this day.47
We now turn to the proof of the relation (20.131). In order to establish (20.131), it is
sufficient to show that the canonical Poisson brackets (20.111),
imply that
\[
\begin{aligned}
\delta_\xi h_{ab} \equiv \{ h_{ab}, P[\xi] \} &= L_\xi h_{ab} \\
\delta_\xi \pi^{ab} \equiv \{ \pi^{ab}, P[\xi] \} &= L_\xi \pi^{ab} .
\end{aligned}
\tag{20.142}
\]
The first relation is equivalent to the identity (20.114) already derived above, since
\[ L_\xi h_{ab} = \tilde\nabla_a \xi_b + \tilde\nabla_b \xi_a . \tag{20.143} \]
The proof of the 2nd relation is a bit more complicated as it also involves the metric variation
of the Christoffel symbols appearing in the covariant derivative $\tilde\nabla_d \pi^{cd}$. Recalling
that $\pi^{cd}$ is a tensor density,
\[ \pi^{cd} = \sqrt{h}\, (K^{cd} - h^{cd} K) \equiv \sqrt{h}\, p^{cd} , \tag{20.144} \]
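Likewise, the Lie derivative of this tensor density is (cf. sections 8.4 and 8.6)
\[
\begin{aligned}
L_\xi \pi^{ab} &= (L_\xi \sqrt{h})\, p^{ab} + \sqrt{h}\, L_\xi p^{ab} \\
&= \tfrac12 \sqrt{h}\, (h^{cd} L_\xi h_{cd})\, p^{ab} + \sqrt{h}\, L_\xi p^{ab} \\
&= \sqrt{h}\, (\tilde\nabla_c \xi^c)\, p^{ab} + \sqrt{h}\, (\xi^c \tilde\nabla_c p^{ab} - p^{cb} \tilde\nabla_c \xi^a - p^{ac} \tilde\nabla_c \xi^b) \\
&= \tilde\nabla_c (\xi^c \pi^{ab}) - \pi^{cb} \tilde\nabla_c \xi^a - \pi^{ac} \tilde\nabla_c \xi^b .
\end{aligned}
\tag{20.146}
\]
Now, to calculate
\[ \{ \pi^{ab}(x), P[\xi] \} = -2 \int d^3y\, \{ \pi^{ab}(x), \xi_c(y) \tilde\nabla_d \pi^{cd}(y) \} \tag{20.147} \]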
47
See e.g. C. Rovelli, Quantum Gravity or S. Giddings, D. Marolf, J. Hartle, Observables in effective
gravity, arXiv:hep-th/0512200 for different points of view and discussions of these issues.
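we make the $h_{ab}$-dependence more explicit (but suppress the $y$-dependence in this equation),
\[
\begin{aligned}
\{ \pi^{ab}(x), \xi_c \tilde\nabla_d \pi^{cd} \} &= \{ \pi^{ab}(x), h_{ce} \xi^e \tilde\nabla_d \pi^{cd} \} \\
&= \{ \pi^{ab}(x), h_{ce} \}\, \xi^e \tilde\nabla_d \pi^{cd} + h_{ce} \xi^e \{ \pi^{ab}, \tilde\Gamma^c_{df} \}\, \pi^{df} .
\end{aligned}
\tag{20.148}
\]
From the 1st term, one immediately obtains (from the canonical Poisson brackets, and
with the factor of (-2) and the integration over $y$ from (20.147))
\[ \{ \pi^{ab}(x), h_{ce} \}\, \xi^e \tilde\nabla_d \pi^{cd} \;\to\; \xi^b \tilde\nabla_d \pi^{ad} + \xi^a \tilde\nabla_d \pi^{bd} . \tag{20.149} \]
For the calculation of the 2nd term, we observe that taking the Poisson bracket with $\pi^{ab}$
is equivalent to taking (minus) the variation with respect to $h_{ab}$. We can therefore use
the formula (19.14) for the variation of the Christoffel symbols under metric variations,
\[ h_{ce}\, \delta \tilde\Gamma^c_{df} = \tfrac12 (\tilde\nabla_d \delta h_{ef} + \tilde\nabla_f \delta h_{ed} - \tilde\nabla_e \delta h_{df}) . \tag{20.150} \]
Now an integration by parts (moving the derivatives off the delta-functions) shows that
the 2nd term contributes
\[ h_{ce} \xi^e \{ \pi^{ab}, \tilde\Gamma^c_{df} \}\, \pi^{df} \;\to\; \tilde\nabla_c (\xi^c \pi^{ab}) - \tilde\nabla_d (\xi^a \pi^{bd}) - \tilde\nabla_d (\xi^b \pi^{ad}) . \tag{20.151} \]
Putting everything together, one finds precisely the Lie derivative (20.146),
\[ \delta_\xi \pi^{ab} = \{ \pi^{ab}, P[\xi] \} = \tilde\nabla_c (\xi^c \pi^{ab}) - \pi^{cb} \tilde\nabla_c \xi^a - \pi^{ac} \tilde\nabla_c \xi^b = L_\xi \pi^{ab} \tag{20.152} \]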
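For readers who want to see the "putting everything together" step spelled out, here is a short sketch of the cancellation. It reads the contribution (20.149) as $\xi^a \tilde\nabla_d \pi^{bd} + \xi^b \tilde\nabla_d \pi^{ad}$ and the contribution (20.151) as $\tilde\nabla_c(\xi^c \pi^{ab}) - \tilde\nabla_d(\xi^a \pi^{bd}) - \tilde\nabla_d(\xi^b \pi^{ad})$, and uses nothing but the Leibniz rule for $\tilde\nabla$.

```latex
\begin{align*}
\{\pi^{ab}, P[\xi]\}
 &= \underbrace{\xi^a \tilde\nabla_d \pi^{bd} + \xi^b \tilde\nabla_d \pi^{ad}}_{(20.149)}
  + \underbrace{\tilde\nabla_c(\xi^c \pi^{ab})
      - \tilde\nabla_d(\xi^a \pi^{bd}) - \tilde\nabla_d(\xi^b \pi^{ad})}_{(20.151)} \\
 &= \tilde\nabla_c(\xi^c \pi^{ab})
  + \big(\xi^a \tilde\nabla_d \pi^{bd} - \tilde\nabla_d(\xi^a \pi^{bd})\big)
  + \big(\xi^b \tilde\nabla_d \pi^{ad} - \tilde\nabla_d(\xi^b \pi^{ad})\big) \\
 &= \tilde\nabla_c(\xi^c \pi^{ab})
  - \pi^{bd} \tilde\nabla_d \xi^a - \pi^{ad} \tilde\nabla_d \xi^b .
\end{align*}
```

Relabelling the dummy index $d \to c$ and using the symmetry of $\pi^{ab}$, the last line is the final expression in (20.146), i.e. the Lie derivative of the tensor density $\pi^{ab}$.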
So far in this section we have assumed that the spatial slices $\Sigma$ have no boundary,
$\partial\Sigma = \emptyset$, and we have therefore also ignored possible boundary terms that are required
or generated by the presence of such a boundary. In the remainder of this section, i.e.
here and in subsections 20.11 and 20.12 below, we will look at some of the issues and
features that arise when one takes these into account.
To set the stage, recall that we saw in section 19.5 that differentiability of the gravitational
action in the sense of variational calculus, i.e. (19.68)
\[ \delta S_g[g_{\alpha\beta}] = \int \sqrt{g}\, G_{\alpha\beta}\, \delta g^{\alpha\beta} \tag{20.153} \]
without boundary terms on the right-hand side for Dirichlet boundary conditions, can
be achieved e.g. by adding the Gibbons-Hawking-York boundary term to the Einstein-Hilbert
action (19.67),
\[
\begin{aligned}
S_g[g_{\alpha\beta}] &= S_{EH}[g_{\alpha\beta}] + S_{GHY}[g_{\alpha\beta}] \\
&= \int \sqrt{g}\, R + 2\epsilon \oint \sqrt{h}\, K .
\end{aligned}
\tag{20.154}
\]
Moreover, we saw in section 20.2 that the Gauss-Codazzi decomposition of the Ricci
scalar (20.5),
\[ R = \tilde R + \epsilon (K^2 - K_{\alpha\beta} K^{\alpha\beta}) - 2\epsilon \nabla_\alpha (N^\alpha \nabla_\beta N^\beta - N^\beta \nabla_\beta N^\alpha) , \tag{20.155} \]
automatically takes care of the Gibbons-Hawking-York boundary term for initial and
final spacelike hypersurfaces $\Sigma_i$ and $\Sigma_f$ which are part of the foliation of the space-time
$M$ into spacelike hypersurfaces $\Sigma_t$, with normal vector $N^\alpha$, since
\[ N_\alpha (N^\alpha \nabla_\beta N^\beta - N^\beta \nabla_\beta N^\alpha) = N_\alpha N^\alpha\, \nabla_\beta N^\beta = \epsilon K . \tag{20.156} \]
We also noted in section 20.2 that, if in addition to initial and final spacelike boundaries
there is a timelike boundary $B$,
\[ \partial M = \{\Sigma_f\} \cup \{\Sigma_i\} \cup B , \tag{20.157} \]
then additional boundary terms are required in the action (and also in the Hamiltonian).
We will assume in the following that the boundary $B$ is orthogonal to the spatial slices
in the sense that the normal $N^\alpha$ to $\Sigma$ is orthogonal to the normal $r^\alpha$ to $B$,
\[ N^\alpha r_\alpha = 0 . \tag{20.158} \]
Tracing back through the various derivations in this section, one finds that there are 2
contributions to this boundary term for the action on B (in addition, later on we will
identify a boundary term contribution to the Hamiltonian, and thus to the 1st order
Hamiltonian form of the ADM action):
$$ h^B_{\alpha\beta} = ( g_{\alpha\beta} - r_\alpha r_\beta )|_B , \quad h^B_{\alpha\beta} r^\beta = 0 \qquad (20.160) $$
[48] For a discussion of boundary terms for non-orthogonal boundaries see e.g. S. Hawking, C. Hunter,
The Gravitational Hamiltonian in the Presence of Non-Orthogonal Boundaries, arXiv:gr-qc/9603050,
and I. Booth, R. Mann, Moving Observers, Non-orthogonal Boundaries, and Quasilocal Energies,
arXiv:gr-qc/9810009, and references thereto.
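induced on $B$. The projection term in the expression for the trace of the extrinsic
curvature is not strictly speaking necessary, since $r_\alpha r^\alpha = 1$ implies
$r^\beta \nabla_\alpha r_\beta = 0$ and hence
$$ ( g^{\alpha\beta} - r^\alpha r^\beta ) \nabla_\alpha r_\beta = g^{\alpha\beta} \nabla_\alpha r_\beta \qquad (20.161) $$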
This is not equal to the standard Gibbons-Hawking-York boundary term for this
boundary component (which would be expressed solely in terms of $r^\alpha$, not also
$N^\alpha$). Using the assumption of orthogonality $r_\alpha N^\alpha = 0$, and $N_\alpha N^\alpha = -1$, this
can by an integration by parts be written as
$$ S_2 = +2 \oint_B \sqrt{h_B}\, N^\alpha N^\beta \nabla_\alpha r_\beta . \qquad (20.163) $$
where
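$$ s_{\alpha\beta} = g_{\alpha\beta} + N_\alpha N_\beta - r_\alpha r_\beta \qquad (20.165) $$
with
$$ s_{\alpha\beta} N^\beta = s_{\alpha\beta} r^\beta = 0 . \qquad (20.166) $$
$$ S_t = \partial\Sigma_t = \Sigma_t \cap B . \qquad (20.167) $$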
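As a consequence, we have
$$ \sqrt{h_B} = N \sqrt{s} \qquad (20.168) $$
and
$$ k_S = s^{\alpha\beta} \nabla_\alpha r_\beta \qquad (20.169) $$
is the extrinsic curvature of $S_t$ in $\Sigma_t$. Thus this new boundary term modifies the 2nd
order ADM form (20.42) of the complete gravitational action to
$$ S_{ADM} = \int dt \left[ \int_{\Sigma_t} d^3x\, \sqrt{h}\, N ( \bar R + K^{ab} K_{ab} - K^2 ) + 2 \oint_{S_t} d^2x\, \sqrt{s}\, N k_S \right] . \qquad (20.170) $$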
The Legendre transform of the ADM Lagrangian to the Hamiltonian will thus, in
particular, also lead to a boundary term in the ADM Hamiltonian (20.97), namely (reinserting
the coupling constant)
$$ L_{ADM} \to L_{ADM} + \frac{1}{8\pi G_N} \oint_{S_t} \sqrt{s}\, d^2x\, N k_S \quad \Rightarrow \quad H_{ADM} \to H_{ADM} - \frac{1}{8\pi G_N} \oint_{S_t} \sqrt{s}\, d^2x\, N k_S . \qquad (20.171) $$
$$ 2 \pi^{ab} \bar\nabla_a N_b = 2 \bar\nabla_a ( \pi^{ab} N_b ) + N^a H_a . \qquad (20.172) $$
Therefore the total Hamiltonian in the presence of timelike boundaries (or: when the
spatial slices have boundaries) has the form
$$ H_{ADM} = \int_\Sigma d^3x\, ( N H + N^a H_a ) - \frac{1}{8\pi G_N} \oint_{S_t} \sqrt{s}\, d^2x\, N k_S + \frac{1}{8\pi G_N} \oint_{S_t} d^2x\, N_a \pi^{ab} r_b . \qquad (20.173) $$
Remarks:
1. The necessity of these boundary terms in the Hamiltonian can also be understood
from the requirement of having a differentiable Hamiltonian in the sense of
variational calculus and, as we will see in section 20.11 below, this provides an
alternative route to determining these boundary terms.
2. The other significance of these boundary terms lies in the fact that they give the
on-shell value of the Hamiltonian, i.e. the value of the Hamiltonian on a solution
satisfying (in particular) the Hamiltonian and Momentum constraints, namely
$$ H = H_a = 0 \quad \Rightarrow \quad H_{ADM} = - \frac{1}{8\pi G_N} \oint_S d^2x \left( N \sqrt{s}\, k_S - N_a \pi^{ab} r_b \right) . \qquad (20.174) $$
Turning to the 1st issue, recall that the Hamiltonian equations of motion are assumed
to be (20.109)
$$ \dot h_{ab} = \frac{\delta H_{ADM}}{\delta \pi^{ab}} , \qquad \dot\pi^{ab} = - \frac{\delta H_{ADM}}{\delta h_{ab}} \qquad (20.175) $$
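However, validity of these equations (differentiability of the Hamiltonian in the sense of
variational calculus) requires that the variation of the Hamiltonian with respect to the
canonical variables $h_{ab}$ and $\pi^{ab}$ has the form
$$ \delta H_{ADM}[h_{ab}, \pi^{ab}] = \int d^3x \left[ (\ldots)^{ab}\, \delta h_{ab} + (\ldots)_{ab}\, \delta\pi^{ab} \right] \qquad (20.176) $$
with
$$ H = \frac{(16\pi G_N)}{\sqrt{h}} G_{abcd} \pi^{ab} \pi^{cd} - \frac{\sqrt{h}}{(16\pi G_N)} \bar R \qquad (20.178) $$
and
$$ H_a = -2 \bar\nabla_b \pi^b{}_a \qquad (20.179) $$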
we see that
- no boundary term arises from the variation of the 1st (kinetic) term in the Hamiltonian
  constraint, as it does not depend on the derivatives of the canonical variables;
- a boundary term will arise from the variation of the 2nd (potential) term $\bar R$, as it
  depends on the 2nd derivatives of $h_{ab}$;
- a boundary term will arise from the integration by parts required to express the
  variation of the Momentum constraint as $(\ldots)\, \delta\pi^{ab}$.
The latter issue is obviously taken care of by simply reinstating the total derivative term
in (20.172) and adding it to $H_{ADM}$. This immediately leads to the 2nd boundary term
in (20.173),
$$ H_{ADM} \to H_{ADM} + \frac{1}{8\pi G_N} \oint_S d^2x\, N_a \pi^{ab} r_b . \qquad (20.180) $$
In order to resolve the issue arising from the variation of $\bar R$, we can observe that this is
simply the 3-dimensional counterpart of the issue that arises when varying the Einstein-
Hilbert action with Lagrangian $R$. Thus we can appeal to the discussion of the Gibbons-
Hawking-York boundary term in section 19.5 to conclude that the required boundary
term to be added involves the trace $k_S$ of the extrinsic curvature of the boundary
$S_t = \partial\Sigma_t$ in $\Sigma_t$. Noting that (a) the normal vector is spacelike ($\epsilon = +1$), and (b) that
the Hamiltonian involves $(-N \bar R)$ rather than $(+R)$, we deduce from (20.154), say, that
this requires modifying the bulk Hamiltonian $H_{ADM}$ according to
$$ H_{ADM} \to H_{ADM} - \frac{1}{8\pi G_N} \oint_S \sqrt{s}\, d^2x\, N k_S . \qquad (20.181) $$
We thus conclude that validity of the Hamiltonian equations of motion in the presence
of a spatial boundary $S = \partial\Sigma$ requires adding boundary terms to the bulk Hamiltonian
according to
$$ H_{ADM} = \int_\Sigma d^3x\, ( N H + N^a H_a ) - \frac{1}{8\pi G_N} \oint_S \sqrt{s}\, d^2x\, N k_S + \frac{1}{8\pi G_N} \oint_S d^2x\, N_a \pi^{ab} r_b . \qquad (20.182) $$
As mentioned at the end of section 20.10, the value of the Hamiltonian on a configuration
$(h_{ab}, \pi^{ab})$ satisfying the Hamiltonian and Momentum constraints is
$$ H_{ADM}[N, N^a] = - \frac{1}{8\pi G_N} \oint_S d^2x \left( N \sqrt{s}\, k_S - N_a \pi^{ab} r_b \right) . \qquad (20.183) $$
While in the spatially closed case, $\partial\Sigma = \emptyset$, this on-shell value of the Hamiltonian is zero,
it has a significance e.g. for asymptotically flat space-times (the prototypical example
being the Schwarzschild metric). While in this case there is no spatial boundary $\partial\Sigma = S$
in the strict sense, in the asymptotically flat context variations of $h_{ab}$ should be restricted
to preserve this asymptotic flatness. The boundary terms in the Hamiltonian derived
above are also appropriate in this setting (as one is essentially imposing the Dirichlet
condition $h_{ab} = \delta_{ab}$ at infinity).[49]
$$ (\partial_t)^\alpha = N N^\alpha + E^\alpha_a N^a , \qquad (20.184) $$
and noting that asymptotically the time-evolution of static observers in the Minkowskian
geometry at infinity is orthogonal to the spatial directions, the choice $N = 1$ and
$N^a = 0$ (asymptotically) gives the value of the Hamiltonian associated to asymptotic
time-translations. As such, it provides a candidate definition of the gravitational energy
of a configuration $(h_{ab}, \pi^{ab})$, the ADM energy
$$ E \stackrel{?}{=} - \frac{1}{8\pi G_N} \lim \oint_S d^2x\, \sqrt{s}\, k_S . \qquad (20.185) $$
For a boosted observer at infinity, his proper time would correspond to a non-trivial
linear combination of the 2 terms in (20.184), hence to a non-trivial shift vector $N^a$.
[49] Defining and implementing the conditions for asymptotic flatness requires and merits more care.
See e.g. R. Wald, General Relativity, chapter 11, for a careful discussion of all the issues we are glossing
over in the following.
The second term in (20.183), depending on $N^a$, is therefore naturally associated with a
linear momentum (and other choices of lapse and shift can be used analogously to define
candidate notions of angular momentum etc.), but we will not explore this further here.
The above candidate expression (20.185) for the energy still requires some improvements.
First of all, the limit here refers to taking the boundary 2-sphere S to infinity. This can
be implemented more concretely by introducing asymptotically a Cartesian coordinate
system on $\Sigma$, with an associated notion of radial distance $r$, and considering the limit of
the coordinate spheres $S_R$ of radius $r = R$ as $R \to \infty$. Thus we can write a somewhat
improved version of (20.185) as
$$ E \stackrel{?}{=} - \frac{1}{8\pi G_N} \lim_{R \to \infty} \oint_{S_R} d^2x\, \sqrt{s}\, k_S . \qquad (20.186) $$
The problem with this expression is that unfortunately it diverges even for the flat
metric $h^0_{ab} = \delta_{ab}$ on $\Sigma$,
$$ h^0_{ab}\, dy^a dy^b = \delta_{ab}\, dy^a dy^b = dr^2 + r^2 d\Omega^2 . \qquad (20.187) $$
It is natural to assign the energy E = 0 to Minkowski space (and its flat slices), and
it is therefore also reasonably natural to subtract this divergent contribution from E in
(20.186). We thus finally arrive at the definition of the ADM Energy
$$ E_{ADM} = - \frac{1}{8\pi G_N} \lim_{R \to \infty} \oint_{S_R} d^2x\, \sqrt{s}\, ( k_S - k_S^0 ) . \qquad (20.190) $$
Here $k_S^0$ is defined to be the extrinsic curvature of $S$ embedded in flat space $\mathbb{R}^3$ in such
a way that the induced metric on $S$ is the same as that induced on $S$ by the metric $h_{ab}$
on $\Sigma$ (in particular, then, $\sqrt{s}$ is the same for both terms and therefore only appears as
an overall factor in the integrand).
briefly mentioned in section 19.5, and which would have also led us to (20.190).
To see that (20.190) gives a finite and meaningful result in cases of interest, we
consider the prime example of an asymptotically flat solution to the Einstein equations,
namely the Schwarzschild solution describing the exterior of a spherically symmetric
star (see section 23 and subsequent sections for a detailed derivation and discussion of
this metric).
In the standard Schwarzschild coordinates, this metric has the form (23.30)
$$ ds^2 = - f(r)\, dt^2 + f(r)^{-1} dr^2 + r^2 d\Omega^2 , \quad f(r) = 1 - \frac{2m}{r} , \qquad (20.192) $$
where the parameter $m$ is related to the mass $M$ of the star by $m = G_N M$ (in units
with $c = 1$). We can directly work with the (sufficiently simple) exact expression for
the metric, but it will be sufficient to look at the asymptotic (large $r$) behaviour of
the spatial metric on the slices $\Sigma_t$ of constant time $t$. As a consequence, the following
analysis applies not just to the Schwarzschild metric but to any metric of the above
general form, with
$$ f(r) = 1 - \frac{2m}{r} + O(1/r^2) . \qquad (20.193) $$
To first order in an expansion in $m/r$, the metric on a hypersurface $\Sigma_t$ is given by
$$ ds^2|_{\Sigma_t} = \left( 1 + \frac{2m}{r} \right) dr^2 + r^2 d\Omega^2 . $$
Alternatively, in so-called isotropic coordinates, the metric takes the form given in
(23.39), and the asymptotic form of the spatial metric on the slices $\Sigma_t$ of constant time
$t$ is (calling the radial coordinate $r$ again)
$$ ds^2|_{\Sigma_t} = \left( 1 + \frac{2m}{r} \right) ( dr^2 + r^2 d\Omega^2 ) . $$
Note that even though the $(rr)$-component of the metric (and hence the radial normal
vector to the spheres) is the same in both coordinate systems,
the induced metric on the spheres is different. Thus even though in both cases the flat
reference metric is simply
(ds0 )2 = dr 2 + r 2 d2 , r0 = r , (20.197)
also the required isometric embedding into flat space will have to be different in the
two cases, and we will now see how these things conspire to give the same (and highly
reasonable) result
EADM = M . (20.198)
We will use that the trace of the extrinsic curvature can be written as
1. Schwarzschild coordinates
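The induced metric on $S_R$ is $R^2 d\Omega^2$ and thus the extrinsic curvature is
$$ k_S|_{r=R} \approx \frac{2}{R} \left( 1 - \frac{m}{R} \right) . \qquad (20.201) $$
In the flat reference metric, one obtains the same induced metric if one also chooses
the radius $r = R$, and (as above)
$$ k_S^0 = \frac{2}{R} , \qquad (20.202) $$
so that
$$ k_S - k_S^0 \approx - \frac{2m}{R^2} . \qquad (20.203) $$
Thus the ADM energy is
$$ E_{ADM} = - \frac{1}{8\pi G_N} \lim_{R \to \infty} \oint_{S_R} d\Omega\, R^2 ( -2m/R^2 ) = m/G_N = M . \qquad (20.204) $$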
2. Isotropic Coordinates
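In isotropic coordinates, the induced metric on a sphere of radius $r = R$ is
$(1 + 2m/R) R^2 d\Omega^2 \equiv R_0^2 d\Omega^2$, so that $R_0 \approx R (1 + m/R)$, and the extrinsic
curvature of this sphere in $\Sigma_t$ is $k_S|_{r=R} \approx \frac{2}{R} ( 1 - \frac{2m}{R} )$.
Thus
$$ k_S^0 = \frac{2}{R_0} \approx \frac{2}{R} \left( 1 - \frac{m}{R} \right) , \qquad (20.208) $$
and therefore
$$ k_S - k_S^0 \approx - \frac{2m}{R^2} , \qquad (20.209) $$
as in Schwarzschild coordinates. In the $R \to \infty$ limit, only the leading term of
the induced volume element will contribute, and therefore the result is indeed
identical to that obtained in Schwarzschild coordinates,
$$ E_{ADM} = - \frac{1}{8\pi G_N} \lim_{R \to \infty} \oint_{S_R} d\Omega\, ( R^2 + 2mR ) ( -2m/R^2 ) = - \frac{1}{8\pi G_N} \lim_{R \to \infty} \oint_{S_R} d\Omega\, ( R^2 ) ( -2m/R^2 ) = m/G_N = M . \qquad (20.210) $$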
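The two evaluations above can be cross-checked symbolically. The following is a minimal sketch, assuming SymPy is available; the helper name `adm_energy` and its arguments are illustrative and not taken from these notes. For the isotropic case it uses the exact conformally flat form of the spatial Schwarzschild metric, $\psi^4 (dr^2 + r^2 d\Omega^2)$ with $\psi = 1 + m/2r$, rather than its first-order expansion, which is an assumption that goes slightly beyond the asymptotic form used in the text.

```python
import sympy as sp

r, m, G_N = sp.symbols("r m G_N", positive=True)

def adm_energy(normal_r, areal_radius):
    """Evaluate (20.190) on coordinate spheres of a spherically symmetric slice.

    normal_r     : radial component n^r of the outward unit normal to the spheres
    areal_radius : rho(r) such that the induced metric on the sphere is rho^2 dOmega^2
    (both argument names are hypothetical, chosen for this sketch)
    """
    # trace of the extrinsic curvature of the sphere: k_S = n^r d/dr log(rho^2)
    k_S = normal_r * 2 * sp.diff(areal_radius, r) / areal_radius
    # a round sphere of radius rho in flat R^3 has the same induced metric and k_S^0 = 2/rho
    k_S0 = 2 / areal_radius
    # sqrt(s) d^2x = rho^2 dOmega, and the integrand does not depend on the angles
    surface_integral = 4 * sp.pi * areal_radius**2 * (k_S - k_S0)
    return sp.limit(-surface_integral / (8 * sp.pi * G_N), r, sp.oo)

# Schwarzschild coordinates: dr^2 / f + r^2 dOmega^2 with f = 1 - 2m/r
f = 1 - 2 * m / r
E_schwarzschild = adm_energy(sp.sqrt(f), r)

# isotropic coordinates (exact form, see the assumption stated above)
psi = 1 + m / (2 * r)
E_isotropic = adm_energy(psi**-2, psi**2 * r)

print(E_schwarzschild, E_isotropic)
```

Both limits should reduce to $m/G_N = M$, in line with (20.204) and (20.210); subtracting the flat reference term $k_S^0$ is what makes the limit finite.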
In section 22.4 we will encounter a seemingly different expression for the ADM energy,
deduced and extrapolated there not from a canonical analysis but rather from the
linearised Einstein equations, namely (22.33)
$$ E_{ADM} = \frac{1}{16\pi G_N} \oint_{S^2_\infty} dS_i\, ( \partial_k h_{ik} - \partial_i h ) . \qquad (20.211) $$
As mentioned there, it can be shown that this agrees with the canonical expression
when the induced metrics on S agree. Moreover, in section 23.8 we will evaluate this
expression for the Schwarzschild metric, again both in Schwarzschild and in isotropic
coordinates, and reassuringly also find EADM = M in this way.
21 Energy-Momentum Tensor II: Selected Topics
$$ G_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} , \qquad (21.1) $$
one can either try to find exact solutions in certain specific situations, or one can try to
learn or prove something in general about solutions to the Einstein equations.
For the former, one usually starts by specifying the matter content and the energy-
momentum tensor (either phenomenologically or microscopically), and then furthermore
imposes some symmetry conditions, and this is how we will usually proceed in other
parts of these notes, when discussing e.g. solar system physics, black holes or cosmology.
In this case, one thus in particular chooses (or is at least well-advised to choose) an
energy-momentum tensor with reasonable and well-motivated physical properties from
the outset.
For the latter, it is clear that in order to be able to say anything of substance at all, one
needs to impose some conditions on the energy-momentum tensor $T_{\alpha\beta}$. After all, any
metric whatsoever can be considered to be a solution of the Einstein equations, with
energy-momentum tensor defined by
$$ T_{\alpha\beta} = \frac{1}{8\pi G_N} G_{\alpha\beta} . \qquad (21.2) $$
The problem with this approach (this is sometimes referred to as the poor man's way of
solving the Einstein equations, but this is too charitable a characterisation and maligning
poor men) is that generically this candidate energy-momentum tensor will not have
any of the very general properties one would usually associate with reasonable forms of
matter.
...
It turns out that (various combinations, or variants, of) such conditions can be imple-
mented by imposing some simple general constraints on the energy-momentum tensor
known as Energy Conditions. The simplest and most common among these take the
form of pointwise conditions on the contraction of an energy-momentum tensor with
causal (i.e. timelike or null, or non-spacelike) vectors. One can also consider weaker
averaged versions of these conditions, averaged either along geodesics or over regions
of space(-time), say, but we will only consider the pointwise conditions here.
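By continuity, this inequality is then also valid for null vectors $\ell^\alpha$, i.e. for all causal
vectors $v^\alpha$,
$$ T_{\alpha\beta} v^\alpha v^\beta \geq 0 \quad \forall\; v^\alpha : v^\alpha v_\alpha \leq 0 . \qquad (21.4) $$
The rationale for this condition is that the null Raychaudhuri equation describing
the focussing of null geodesic congruences is (11.107)
$$ \frac{d}{d\lambda} \theta = - R_{\alpha\beta} \ell^\alpha \ell^\beta + \ldots \qquad (21.6) $$
so that the geometry will have a focussing (attractive) effect on null geodesics if
$$ R_{\alpha\beta} \ell^\alpha \ell^\beta \geq 0 . \qquad (21.7) $$
and because $\ell^\alpha$ is null, this translates into the null energy condition (21.5).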
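$$ R_{\alpha\beta} t^\alpha t^\beta \geq 0 . \qquad (21.10) $$
By the Einstein equations, this can be rewritten as the condition
$$ ( T_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} T ) t^\alpha t^\beta \equiv \tilde T_{\alpha\beta} t^\alpha t^\beta \geq 0 \quad \forall\; t^\alpha : t^\alpha t_\alpha < 0 . \qquad (21.11) $$
Again by continuity this is then also true (or required to be satisfied) for all causal
vectors $v^\alpha$,
$$ \tilde T_{\alpha\beta} v^\alpha v^\beta \geq 0 \quad \forall\; v^\alpha : v^\alpha v_\alpha \leq 0 . \qquad (21.12) $$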
This is known as the strong energy condition (this terminology is standard but
confusing because the strong energy condition does not imply the weak energy
condition - more on the relations among the various energy conditions below).
$$ P^\alpha = - T^\alpha{}_\beta t^\beta \qquad (21.13) $$
represents the energy-momentum current density seen by that observer. The
physically eminently reasonable dominant energy condition is then the statement that
the speed of the flow of energy should not exceed the speed of light, i.e. that $P^\alpha$
should be causal (and future-directed),
$$ P^\alpha = - T^\alpha{}_\beta t^\beta \quad \text{causal and future directed for all timelike and future directed } t^\alpha \qquad (21.14) $$
Since $t^\alpha$ is itself timelike and future directed, this is equivalent to the 2 conditions
$$ P_\alpha t^\alpha \leq 0 \quad \text{and} \quad P_\alpha P^\alpha \leq 0 . \qquad (21.15) $$
$$ P_\alpha t^\alpha \leq 0 \quad \Leftrightarrow \quad T_{\alpha\beta} t^\alpha t^\beta \geq 0 \qquad (21.16) $$
so that the dominant energy condition implies the weak energy condition, but
requires additionally $P_\alpha P^\alpha \leq 0$.
It is clear from the above definitions that one has the implications
$$ (DEC) \Rightarrow (WEC) \Rightarrow (NEC) \qquad (21.17) $$
and
$$ (SEC) \Rightarrow (NEC) , \qquad (21.18) $$
and thus the NEC is the weakest of these energy conditions. However, neither does the
SEC imply the WEC nor is there a simple relation between the SEC and the DEC.
It is good to keep in mind that none of these energy conditions are sacrosanct. While
they are all satisfied in simple models (like that of a massless scalar field below), it is easy
to construct reasonable physical models that violate any one of these energy conditions,
either classically or at the quantum level (where e.g. negative Casimir vacuum energy
densities can arise). Thus any result that is obtained on the basis of one of these energy
conditions comes with a built-in caveat that it only applies to matter satisfying that
energy condition. How plausible the assumption of a particular energy condition is
depends on the specific context.
To see what these energy conditions require concretely, it is useful to look at some
examples:
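$$ T_{\alpha\beta} = \partial_\alpha\phi\, \partial_\beta\phi - \tfrac{1}{2} g_{\alpha\beta} (\partial\phi)^2 , \qquad (21.19) $$
with
$$ (\partial\phi)^2 \equiv g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi \qquad (21.20) $$
and
$$ \tilde T_{\alpha\beta} = \partial_\alpha\phi\, \partial_\beta\phi . \qquad (21.21) $$
$$ T_{\alpha\beta} \ell^\alpha \ell^\beta = ( \ell^\alpha \partial_\alpha\phi )^2 \equiv ( \ell.\partial\phi )^2 \geq 0 . \qquad (21.22) $$
Here the 1st term is manifestly non-negative but the 2nd term is not. In
order to disentangle this, consider the covector
$$ s_\alpha = \partial_\alpha\phi + t_\alpha ( t.\partial\phi ) . \qquad (21.24) $$
$$ s_\alpha s^\alpha > 0 \qquad (21.26) $$
and therefore we can write $T_{\alpha\beta} t^\alpha t^\beta$ as a sum of non-negative terms,
$$ T_{\alpha\beta} t^\alpha t^\beta = \tfrac{1}{2} ( t.\partial\phi )^2 + \tfrac{1}{2} s_\alpha s^\alpha \geq 0 . \qquad (21.28) $$
$$ \tilde T_{\alpha\beta} v^\alpha v^\beta = ( v.\partial\phi )^2 \geq 0 . \qquad (21.29) $$
$$ P_\alpha = - \partial_\alpha\phi\, ( t.\partial\phi ) + \tfrac{1}{2} t_\alpha (\partial\phi)^2 . \qquad (21.30) $$
$$ P_\alpha P^\alpha = - \tfrac{1}{4} ( (\partial\phi)^2 )^2 \leq 0 . \qquad (21.32) $$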
Thus reassuringly a free massless scalar field satisfies all the 4 energy conditions.
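As a numerical sanity check of the scalar field statements above, the following sketch samples random field gradients and random boosted observers in Minkowski space. It assumes NumPy, the signature $(-,+,+,+)$ and unit-normalised $t^\alpha$; the helper names `scalar_T`, `boosted_observer` and `wec_dec_data` are illustrative only and not part of these notes.

```python
import numpy as np

# Minkowski metric, signature (-,+,+,+); its inverse has the same entries
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def scalar_T(dphi):
    """Lower-index T_{ab} = d_a(phi) d_b(phi) - 1/2 eta_{ab} (d phi)^2 for a massless scalar."""
    dphi = np.asarray(dphi, dtype=float)
    dphi_sq = dphi @ eta @ dphi
    return np.outer(dphi, dphi) - 0.5 * eta * dphi_sq

def boosted_observer(rapidity, direction):
    """Unit timelike future-directed t^a = (cosh, sinh * n) with n a unit spatial vector."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    return np.concatenate(([np.cosh(rapidity)], np.sinh(rapidity) * n))

def wec_dec_data(dphi, t):
    """Return (T_{ab} t^a t^b, P_a P^a) with P_a = -T_{ab} t^b."""
    T = scalar_T(dphi)
    energy_density = t @ T @ t
    P_lower = -T @ t
    return energy_density, P_lower @ eta @ P_lower

rng = np.random.default_rng(0)
dphi = rng.normal(size=4)
t = boosted_observer(rng.uniform(-2, 2), rng.normal(size=3))
print(wec_dec_data(dphi, t))
```

For every sample the first entry should be non-negative (WEC) and the second should equal $-\frac{1}{4} ((\partial\phi)^2)^2$, so that it is non-positive (the second DEC condition).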
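$$ T_{\alpha\beta} = \partial_\alpha\phi\, \partial_\beta\phi - \tfrac{1}{2} g_{\alpha\beta} (\partial\phi)^2 - g_{\alpha\beta} V(\phi) , \qquad (21.33) $$
and
$$ \tilde T_{\alpha\beta} = \partial_\alpha\phi\, \partial_\beta\phi + g_{\alpha\beta} V(\phi) . \qquad (21.34) $$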
The NEC is clearly unaffected by the addition of this potential term to the energy-
momentum tensor. For the remaining energy conditions, there is no need to go
through the detailed analysis again. It is clear that a potential that is too negative
can lead to a negative energy density and can thus threaten or violate the WEC
and the DEC, while a potential that is too positive can threaten the SEC. To see
e.g. the latter, note that
$$ \tilde T_{\alpha\beta} t^\alpha t^\beta = ( t.\partial\phi )^2 - V(\phi) , \qquad (21.35) $$
so that any static field configuration (in the sense of $t.\partial\phi = 0$) with a positive
potential will violate the SEC. Thus even though the SEC is occasionally used, it
needs to be kept in mind that even quite ordinary matter can violate this particular
energy condition.
3. Cosmological Constant
The effect of a positive versus a negative potential can be seen very explicitly by
looking at such a term in isolation, e.g. in the form of a cosmological constant
contribution to the energy-momentum tensor (18.45),
$$ T_{\alpha\beta} = - \rho_\Lambda\, g_{\alpha\beta} . \qquad (21.36) $$
with (18.44)
$$ \rho_\Lambda = - p_\Lambda = \frac{\Lambda}{8\pi G_N} \qquad (21.37) $$
and
$$ \tilde T_{\alpha\beta} = + \rho_\Lambda\, g_{\alpha\beta} . \qquad (21.38) $$
For either sign of the cosmological constant this will (marginally) satisfy the
NEC.
For the WEC we need to look at
$$ T_{\alpha\beta} t^\alpha t^\beta = + \rho_\Lambda . \qquad (21.39) $$
Thus a positive cosmological constant will satisfy the WEC, while a negative
cosmological constant will violate it.
For the DEC, we see that
$$ P^\alpha = - T^\alpha{}_\beta t^\beta = + \rho_\Lambda t^\alpha \qquad (21.40) $$
is manifestly timelike, and will be future oriented iff $\rho_\Lambda > 0$, i.e. for a positive
cosmological constant. A negative cosmological constant, on the other hand,
violates both DEC conditions.
Finally, for the SEC the signs are reversed with respect to the WEC,
$$ \tilde T_{\alpha\beta} t^\alpha t^\beta = - \rho_\Lambda , \qquad (21.41) $$
and therefore a positive cosmological constant violates the SEC while a
negative cosmological constant satisfies it.
4. Perfect Fluid
Another useful and instructive example to look at is the energy-momentum tensor
of a perfect fluid (6.70),
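$$ T_{\alpha\beta} = ( \rho + p ) u_\alpha u_\beta + p\, g_{\alpha\beta} , \qquad (21.42) $$
$$ T^\alpha{}_\alpha = - \rho + 3p , \qquad (21.43) $$
so that
$$ \tilde T_{\alpha\beta} = ( \rho + p ) u_\alpha u_\beta + \tfrac{1}{2} ( \rho - p ) g_{\alpha\beta} . \qquad (21.44) $$
$$ T_{\alpha\beta} u^\alpha u^\beta = \rho . \qquad (21.45) $$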
Rather, one needs to look at $T_{\alpha\beta} t^\alpha t^\beta$ for all timelike future directed $t^\alpha$. In order
to implement this concretely, and nevertheless make use of the presence of $u^\alpha$ in
the energy-momentum tensor, let us assume without loss of generality that at the
space-time point of interest (or by choosing a comoving coordinate system) $u^\alpha$ has
the form
$$ u^\alpha = (1, 0, 0, 0) . \qquad (21.46) $$
We can then boost this vector with rapidity $\beta$ to another timelike vector
or more generally to
$$ t^\alpha = \cosh\beta\; u^\alpha + \sinh\beta\; n^\alpha \qquad (21.48) $$
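The NEC (for which none of the above gymnastics are required) requires
$$ T_{\alpha\beta} \ell^\alpha \ell^\beta = ( \rho + p ) ( u.\ell )^2 \geq 0 , \qquad (21.51) $$
and therefore
$$ \rho + p \geq 0 . \qquad (21.52) $$
For the WEC we need to look at
$$ T_{\alpha\beta} t^\alpha t^\beta = ( \rho + p ) ( u.t )^2 - p . \qquad (21.53) $$
This expression is linear in $(u.t)^2$. Thus to check (or impose) positivity (or
non-negativity, to be precise), we just need to check (or impose) it at the 2
endpoints of the $(u.t)^2$-interval $[1, \infty)$. For $(u.t)^2 = 1$ and $(u.t)^2 \to \infty$ we
obtain respectively
$$ \rho \geq 0 , \quad \rho + p \geq 0 . \qquad (21.54) $$
When these conditions are satisfied, the WEC is also satisfied for any other
choice of $t$. Therefore the WEC is equivalent to the NEC with the additional
requirement that $\rho \geq 0$.
For the DEC, in addition to the WEC we need to impose the condition that
$$ P_\alpha = - T_{\alpha\beta} t^\beta = - ( \rho + p ) ( u.t ) u_\alpha - p\, t_\alpha \qquad (21.55) $$
Together with the conditions arising from the WEC, this can simply be
written as the single requirement
$$ \rho \geq |p| . \qquad (21.58) $$
$$ \tilde T_{\alpha\beta} t^\alpha t^\beta = ( \rho + p ) ( u.t )^2 - \tfrac{1}{2} ( \rho - p ) . \qquad (21.59) $$
$$ \rho + p \geq 0 , \quad \rho + p - \tfrac{1}{2} ( \rho - p ) \geq 0 \qquad (21.60) $$
or
$$ \rho + p \geq 0 , \quad \rho + 3p \geq 0 . \qquad (21.61) $$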
+p0 , + 3p 0 . (21.61)
We will make use of these results in the discussion of cosmology later on, in
particular in sections 34 and 35.
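The perfect fluid conditions (21.52), (21.54), (21.58) and (21.61) are simple enough to collect in a few lines of code. The following is an illustrative sketch only; the function name `energy_conditions` and the dictionary layout are choices made here, not notation from these notes.

```python
def energy_conditions(rho, p):
    """Pointwise energy conditions for a perfect fluid with energy density rho and pressure p."""
    nec = rho + p >= 0                  # (21.52)
    wec = nec and rho >= 0              # NEC plus rho >= 0, cf. (21.54)
    dec = rho >= abs(p)                 # (21.58)
    sec = nec and rho + 3 * p >= 0      # (21.61)
    return {"NEC": nec, "WEC": wec, "DEC": dec, "SEC": sec}

# a cosmological constant behaves like a perfect fluid with p = -rho, cf. (21.37)
positive_cc = energy_conditions(1.0, -1.0)
negative_cc = energy_conditions(-1.0, 1.0)
dust = energy_conditions(1.0, 0.0)
print(positive_cc, negative_cc, dust)
```

With these inputs the positive cosmological constant fails only the SEC, while the negative one satisfies the NEC and the SEC but fails the WEC and the DEC, in agreement with the discussion of the cosmological constant above.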
21.2 Canonical vs Covariant Energy-Momentum Tensor
We will generalise the analysis of section 19.6 and look at some of the implications
of general covariance of the matter action when taking into account boundary terms.
This turns out to be surprisingly rewarding - surprising because of the lack of obvious
significance (not to be confused with obvious lack of significance, their significance is
explained and illustrated e.g. in the references given in footnote 37 of section 19.6) of the
identically conserved Noether currents obtained from the gravitational Einstein-Hilbert
action in this case.
As we will see, this will lead us to a relation between the covariant energy-momentum
tensor (as defined here, via the metric-variation of the action) and the canonical
energy-momentum tensor (deduced from the translation-invariance of Poincaré-invariant field
theories in Minkowski space via Noether's theorem).
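On general grounds we know that, under the transformation $\delta_\xi = L_\xi$ the action
transforms as (8.67)
$$ \delta_\xi \int \sqrt{g}\, d^4x\, L_M = \int \sqrt{g}\, d^4x\, \nabla_\alpha ( \xi^\alpha L_M ) . \qquad (21.63) $$
Using the formula (8.28) for the Lie derivative of a covector, one sees that
$$ \delta_\xi \partial_\alpha\phi = L_\xi \partial_\alpha\phi = \xi^\beta \nabla_\beta \partial_\alpha\phi + ( \nabla_\alpha \xi^\beta ) \partial_\beta\phi = \nabla_\alpha ( \xi^\beta \partial_\beta\phi ) = \partial_\alpha L_\xi \phi = \partial_\alpha \delta_\xi \phi \qquad (21.65) $$
(this is one of the simplifications brought about by considering only scalars). Then,
performing the usual integration by parts, keeping track of the boundary term, and
combining (21.63) and (21.64), one finds
$$ \int \sqrt{g}\, d^4x\, T^{\alpha\beta} \nabla_\alpha \xi_\beta + \int \sqrt{g}\, d^4x\, \frac{\delta L_M}{\delta\phi}\, \delta_\xi\phi = \int \sqrt{g}\, d^4x\, \nabla_\alpha ( \Theta^\alpha{}_\beta \xi^\beta ) , \qquad (21.66) $$
where
$$ \Theta^\alpha{}_\beta = \delta^\alpha{}_\beta L_M - \frac{\partial L_M}{\partial ( \partial_\alpha\phi )} \partial_\beta\phi \qquad (21.67) $$
is the usual canonical Noether energy-momentum tensor, and
$$ \frac{\delta L_M}{\delta\phi} = \frac{\partial L_M}{\partial\phi} - \nabla_\alpha \frac{\partial L_M}{\partial ( \partial_\alpha\phi )} \qquad (21.68) $$
is the Euler-Lagrange variational derivative. Since this has to hold for all $\xi^\alpha$, we deduce
$$ T^{\alpha\beta} \nabla_\alpha \xi_\beta + \frac{\delta L_M}{\delta\phi} ( \xi^\beta \partial_\beta\phi ) = \nabla_\alpha ( \Theta^\alpha{}_\beta \xi^\beta ) , \qquad (21.69) $$
or
$$ ( T^{\alpha\beta} - \Theta^{\alpha\beta} ) \nabla_\alpha \xi_\beta = \left( \nabla_\alpha \Theta^\alpha{}_\beta - \frac{\delta L_M}{\delta\phi} \partial_\beta\phi \right) \xi^\beta . \qquad (21.70) $$
If this is to be valid for arbitrary $\xi^\alpha$, the coefficients of $\xi^\beta$ and $\nabla_\alpha \xi_\beta$ have to vanish
separately (at the origin of an inertial coordinate system this would be the coefficients of
the obviously functionally independent functions $\xi^\beta$ and $\partial_\alpha \xi_\beta$). This has the following
implications: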
1. First of all, we learn that (in the case of scalar fields) the canonical and covariant
energy-momentum tensors agree (off-shell),
$$ T_{\alpha\beta} = \Theta_{\alpha\beta} . \qquad (21.71) $$
This is of course something that we already checked explicitly in section 6.6, but
it is reassuring to see this drop out of the general formalism as well.
2. We also learn that the latter (and hence the former) is conserved if the equations
of motion of the matter fields are satisfied,
$$ \frac{\delta L_M}{\delta\phi} = 0 \quad \Rightarrow \quad \nabla_\alpha \Theta^{\alpha\beta} = \nabla_\alpha T^{\alpha\beta} = 0 . \qquad (21.72) $$
This could have also alternatively been deduced from integrating (21.69) over a
domain (with the $\xi^\alpha$ chosen to vanish on the boundary), and integrating by parts
- this is just a repetition of the argument that previously led us to (19.99).
3. We can also see directly from (21.69) that for $\xi^\alpha$ a Killing vector, the Noether
current
$$ J_N^\alpha(\xi) = \Theta^\alpha{}_\beta \xi^\beta \qquad (21.73) $$
is on-shell conserved,
$$ \nabla_{(\alpha} \xi_{\beta)} = 0 \;\; \text{and} \;\; \frac{\delta L_M}{\delta\phi} = 0 \quad \Rightarrow \quad \nabla_\alpha J_N^\alpha(\xi) = 0 . \qquad (21.74) $$
These are the conserved currents previously discussed in section 9.1.
As mentioned above, the case of scalar fields has some simplifying and non-generic
features (exemplified e.g. already by the fact that no Belinfante improvement is required
in this case). In general, i.e. when one does not restrict to scalar fields, the above chain of
reasoning leads to the (on-shell) identification of the covariant energy-momentum tensor
T with a suitable covariantisation of the Belinfante symmetric improvement ab
of the canonical Noether energy-momentum tensor ab .
That the above argument naturally gives rise to the improved energy-momentum tensor,
and hence to the on-shell identification of the covariant and improved tensors, can be
seen quite explicitly in the case of Maxwell theory (and this can easily be extended to
other theories).
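In general, for any matter action $L_M$ we have (we now remove all the, really quite
unnecessary, integrals from the previous argument)
$$ L_\xi ( \sqrt{g} L_M ) = \sqrt{g}\, \nabla_\alpha ( \xi^\alpha L_M ) = \sqrt{g}\, \nabla_\alpha [ ( \delta^\alpha{}_\beta L_M ) \xi^\beta ] , \qquad (21.75) $$
because $\sqrt{g} L_M$ is a scalar density. On the other hand, by explicitly acting with the Lie
derivative on the metric and the other fields the Lagrangian density depends on, and
splitting
$$ L_\xi = ( L_\xi )_g + ( L_\xi )_M \qquad (21.76) $$
into its action on the metric and the matter fields, we can write this as
$$ L_\xi ( \sqrt{g} L_M ) = ( L_\xi )_g ( \sqrt{g} L_M ) + \sqrt{g} ( L_\xi )_M L_M = \tfrac{1}{2} \sqrt{g}\, T^{\alpha\beta} L_\xi g_{\alpha\beta} + \sqrt{g} ( L_\xi )_M L_M = \sqrt{g}\, T^{\alpha\beta} \nabla_\alpha \xi_\beta + \sqrt{g} ( L_\xi )_M L_M . \qquad (21.77) $$
Now let us specialise to Maxwell theory. In that case, we have
$$ L_M = - \tfrac{1}{4} F_{\alpha\beta} F^{\alpha\beta} \quad \Rightarrow \quad ( L_\xi )_M L_M = - \tfrac{1}{2} F^{\alpha\beta} L_\xi F_{\alpha\beta} . \qquad (21.78) $$
Using the explicit covariant expression (8.29) for the Lie derivative on a (0, 2)-tensor,
we can write this as
$$ ( L_\xi )_M L_M = - \tfrac{1}{2} F^{\alpha\beta} [ \xi^\gamma \nabla_\gamma F_{\alpha\beta} + 2 ( \nabla_\alpha \xi^\gamma ) F_{\gamma\beta} ] . \qquad (21.79) $$
$$ T^{\alpha\beta} \nabla_\alpha \xi_\beta + \tfrac{1}{2} F^{\alpha\beta} [ \nabla_\gamma F_{\beta\alpha} + \nabla_\alpha F_{\gamma\beta} + \nabla_\beta F_{\alpha\gamma} ] \xi^\gamma + ( \nabla_\alpha F^{\alpha\beta} ) F_{\gamma\beta} \xi^\gamma = \nabla_\alpha [ ( F^{\alpha\beta} F_{\gamma\beta} - \tfrac{1}{4} \delta^\alpha{}_\gamma F^2 ) \xi^\gamma ] . \qquad (21.81) $$
This is the Maxwell counterpart of (21.69) and we see that where we had the
covariantised canonical energy-momentum tensor $\Theta^\alpha{}_\beta$ we now have automatically obtained
the improved gauge-invariant and symmetric expression
$$ T_{\alpha\beta} = F_{\alpha\gamma} F_\beta{}^\gamma - \tfrac{1}{4} g_{\alpha\beta} F^2 . \qquad (21.82) $$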
We also see very explicitly from this result that conservation of $T_{\alpha\beta}$ requires both sets
of Maxwell equations (in particular also the Bianchi identity, something one would not
have anticipated from (21.69)).
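The algebraic properties of the improved tensor (21.82), namely its symmetry and (in 4 dimensions) its tracelessness, are easy to confirm numerically. The following is a minimal sketch in Minkowski space, assuming NumPy; the helper names and the sign convention $F_{i0} = E_i$, $F_{ij} = \epsilon_{ijk} B_k$ are assumptions made here for illustration and are not fixed by the text above.

```python
import numpy as np

# Minkowski metric, signature (-,+,+,+); its inverse has the same entries
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def field_strength(E, B):
    """Lower-index F_{ab} with the assumed convention F_{i0} = E_i, F_{ij} = eps_{ijk} B_k."""
    E = np.asarray(E, dtype=float)
    B = np.asarray(B, dtype=float)
    F = np.zeros((4, 4))
    F[1:, 0] = E
    F[0, 1:] = -E
    F[1, 2], F[2, 3], F[3, 1] = B[2], B[0], B[1]
    F[2, 1], F[3, 2], F[1, 3] = -B[2], -B[0], -B[1]
    return F

def maxwell_T(F):
    """T_{ab} = F_{ac} F_b^c - 1/4 eta_{ab} F^2, cf. (21.82)."""
    F_sq = np.sum(F * (eta @ F @ eta))      # F_{ab} F^{ab}
    return F @ eta @ F.T - 0.25 * eta * F_sq

E = np.array([0.3, -1.2, 0.7])
B = np.array([2.0, 0.5, -0.4])
T = maxwell_T(field_strength(E, B))
print(T)
```

In this convention the component `T[0, 0]` should reproduce the familiar energy density $\frac{1}{2} (E^2 + B^2)$, and contracting `T` with `eta` should give a vanishing trace.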
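$$ \nabla_\alpha ( F^{\alpha\beta} F_{\gamma\beta} \xi^\gamma ) = \nabla_\alpha ( F^{\alpha\beta} ( \nabla_\gamma A_\beta ) \xi^\gamma ) - \nabla_\alpha ( F^{\alpha\beta} ( \nabla_\beta A_\gamma ) \xi^\gamma ) \qquad (21.83) $$
and had just moved the first of these over to the other side, then that term would
have combined with $L_M$ to give the covariantised (but not gauge-invariant) canonical
energy-momentum tensor
$$ \Theta^\alpha{}_\gamma = F^{\alpha\beta} \nabla_\gamma A_\beta - \tfrac{1}{4} \delta^\alpha{}_\gamma F^2 , \qquad (21.84) $$
which, as already discussed in section 6.5, is not covariantly conserved. All the other
terms, including some coming from the Bianchi identities, would then usually be
attributed as covariantisations of Belinfante improvement terms (and other cosmetic
correction terms arising e.g. from the fact that covariant and Lie derivatives do not
commute, $[ L_\xi , \nabla_\alpha ] A_\beta \neq 0$ etc.).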
However, it is clear from the above derivation that this is an unnatural way of splitting
up a perfectly reasonable and gauge-invariant equation like (21.80).
$$ \nabla_\alpha \Theta^{\alpha\beta} \neq 0 \quad \text{on-shell} \qquad (21.86) $$
because of curvature terms, i.e. because one would have to be able to commute
covariant derivatives on the fields in order to establish that the tensor is covariantly
conserved, and this is not possible for fields of spin higher than zero. Indeed, by
explicitly calculating the covariant divergence, one finds
LM LM
= ( ) + [ , ] . (21.87)
( )
Likewise the covariantisation of the improvement term, in particular with
with = , (21.88)
= 12 [ , ] = 12 R 6= 0 . (21.89)
Thus there seems to be no good reason in the first place to add this term to
the non-conserved . Nevertheless, it turns out that the sum of these two
contributions,
= + ,
(21.90)
=
= 0 .
and (21.91)
This can be demystified somewhat by deriving this result from Noether's theorem
applied to coordinate transformations (generalising the calculation done for scalar
fields at the beginning of this section by taking into account the non-trivial trans-
formation behaviour of higher-rank tensor fields under coordinate transformations
and/or extending the usual Belinfante argument from Lorentz transformations to
general coordinate transformations).[50]
Ultimately, however, what this proof shows is that these properties hold for
because this tensor is equal to the covariant energy-momentum tensor on-shell,
= T
on-shell , (21.92)
the latter being (on-shell) covariantly conserved and (off-shell) symmetric by
construction, and requiring absolutely no Belinfante-like gymnastics and improvement
terms for its construction.
agrees with the Belinfante-improved canonical Noether energy-momentum tensor
ab (6.114),
ab = Tab on-shell . (21.94)
For me, the upshot of this long discussion is (but please draw your own conclusions)
that the covariant definition of the energy-momentum tensor appears to be superior,
both conceptually and calculationally, to other definitions, because it is more general,
more concise and to the point, and easier to work with. It should therefore also be the
definition of choice even if one is just interested in Poincaré-invariant field theories in
Minkowski space, in particular as it completely side-steps the issue of having to construct
the Belinfante improvement terms to the canonical energy-momentum tensor.
In section 6.7 we had discussed matter actions that are invariant under Weyl transfor-
mations
$$ g_{\alpha\beta}(x) \to \tilde g_{\alpha\beta}(x) = \Omega(x)^2 g_{\alpha\beta}(x) \qquad (21.95) $$
of the metric alone (i.e. without also transforming the matter fields) and had shown that
such actions lead to an (off-shell) traceless covariant energy-momentum tensor. It was
also evident from that discussion that one would obtain an on-shell traceless energy-
momentum tensor in theories that are invariant under a joint non-trivial Weyl rescaling
of the metric and the matter fields. We have now accumulated everything that we need
to discuss an example of this kind, namely the so-called conformally coupled scalar field.
We start off by considering the minimally coupled action of a free massless scalar field
in $D$ space-time dimensions,
$$ S_0[\phi, g_{\alpha\beta}] = - \tfrac{1}{2} \int \sqrt{g}\, d^Dx\; g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi \qquad (21.96) $$
$$ g_{\alpha\beta} \to \tilde g_{\alpha\beta} = e^{2\omega} g_{\alpha\beta} , \quad \phi \to \tilde\phi = e^{w\omega} \phi \qquad (21.97) $$
of the metric and the scalar field, provided that one chooses the scaling weight or Weyl
weight of the scalar field to be
$$ w = \frac{2 - D}{2} . \qquad (21.98) $$
To see this, note that under this rescaling one has, for the gravitational part,
$$ \sqrt{g}\, g^{\alpha\beta} \to e^{(D-2)\omega} \sqrt{g}\, g^{\alpha\beta} , \qquad (21.99) $$
$$ e^{2w\omega} = e^{(2-D)\omega} , \qquad (21.100) $$
which establishes the invariance of the massless free action under this scaling for the
choice of weight (21.98). We will return to this scale invariance, and its relation to the,
perhaps more familiar, dilatation invariance of relativistic field theories, below (and
just note for now, with apologies, that for a scalar field the scaling weight $w$ is related
to what is called the scale dimension $d$ of a field in relativistic field theories by
$w = -d$).
While the action (21.96) is invariant under constant rescalings with the above weights,
as it stands the action is clearly not invariant under Weyl rescalings, i.e. space-time
dependent rescalings of the form
because one will invariably pick up derivatives of $\Omega(x)$ that are not cancelled by anything
else (unless w = 0, i.e. D = 2, which brings us back to the case already discussed in
section 6.7).
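It is a remarkable fact, however, that for $D > 2$ invariance under (21.101) can be
achieved by adding a non-minimally coupled mass term of the form $\xi R \phi^2$ to the
Lagrangian. Indeed, the action
$$ S[\phi, g_{\alpha\beta}] = - \tfrac{1}{2} \int \sqrt{g}\, d^Dx \left( g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi + \xi R \phi^2 \right) \qquad (21.102) $$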
At first sight this invariance seems to be not only unlikely but also somewhat unpleasant
to try to prove (or disprove), since it appears that one would have to work out how the
curvature scalar of the Weyl-rescaled metric $\Omega^2 g_{\alpha\beta}$ is related to that of $g_{\alpha\beta}$. While
this is something that can be worked out with a steady hand, by first determining
how the Christoffel symbols are related (cf. (27.7)), and then working out the relation
between the curvature tensors etc. from there, this is no fun. Since I am not aware of
a particularly efficient way to short-cut this calculation, and the final result is not even
particularly illuminating, we will forego this here.[51]
However, since there are no global subtleties hiding in Weyl transformations, it is com-
pletely sufficient to check invariance of the action under infinitesimal Weyl transforma-
tions, i.e. transformations with of the form
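$$ \delta R = ( \delta g^{\alpha\beta} ) R_{\alpha\beta} + g^{\alpha\beta} \delta R_{\alpha\beta} = ( \delta g^{\alpha\beta} ) R_{\alpha\beta} + ( \nabla^\alpha \nabla^\beta - g^{\alpha\beta} \Box ) \delta g_{\alpha\beta} , \qquad (21.107) $$
$$ \delta_\omega R = - 2 \omega R + ( \nabla^\alpha \nabla^\beta - g^{\alpha\beta} \Box ) 2 \omega g_{\alpha\beta} = - 2 \omega R + ( 2 - 2D ) \Box \omega . \qquad (21.108) $$
Therefore, using
$$ \delta_\omega ( \sqrt{g}\, g^{\alpha\beta} ) = ( D - 2 ) \omega \sqrt{g}\, g^{\alpha\beta} \qquad (21.109) $$
the variation of the $R \phi^2$ term in the action (21.102) is (suppressing the $d^Dx$ here and
in the subsequent equations)
$$ \delta_\omega \int \sqrt{g}\, R \phi^2 = \int \sqrt{g}\, [ 2 \omega \phi^2 R + \phi^2 \delta_\omega R ] = ( 2 - 2D ) \int \sqrt{g}\, \phi^2 \Box \omega . \qquad (21.110) $$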
It is even more straightforward to work out how the first (standard kinetic) term of the
action transforms under (21.106). Ignoring boundary terms arising from integrations
[51] If, for whatever reason, you need these formulae and you do not want to work them out yourself,
you can look them up e.g. in Appendix D of R. Wald, General Relativity or Appendix G of S. Carroll,
Spacetime and Geometry, or Appendix E of T. Ortin, Gravity and Strings (see how nobody wants to
clutter the body of the text with these formulae?), which, I am happy to report, all agree once differences
in conventions have been accounted for.
443
by part one finds
Z Z
gg = gg (D 2)[ ()]
Z
= gg (2 D)( )( ) (21.111)
Z
D2 2
= g
2
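It is now evident that for a judicious choice of coupling constant $\xi$ one can get (21.110) to cancel against (21.111). Specifically, one finds that
$$ \delta_\omega \int \sqrt{g}\, ( g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi + \xi R \phi^2 ) = \left[ \frac{D-2}{2} + \xi (2 - 2D) \right] \int \sqrt{g}\, \phi^2\, \Box\omega , \qquad (21.112) $$
so that invariance of the action $S_\xi$ (21.102) is achieved for
$$ \frac{D-2}{2} + \xi (2 - 2D) = 0 \quad \Leftrightarrow \quad \xi = \frac{D-2}{4(D-1)} , \qquad (21.113) $$
as claimed.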
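The cancellation condition (21.113) is simple enough to check mechanically. The following minimal Python sketch (the function names `conformal_xi` and `weyl_variation_coefficient` are illustrative choices, not notation used in these notes) evaluates the coefficient of $\int \sqrt{g}\, \phi^2 \Box\omega$ in (21.112) with exact rational arithmetic and confirms that it vanishes for $\xi = (D-2)/4(D-1)$.

```python
from fractions import Fraction


def conformal_xi(D):
    """Candidate conformal coupling xi = (D-2)/(4(D-1)) from (21.113)."""
    return Fraction(D - 2, 4 * (D - 1))


def weyl_variation_coefficient(D, xi):
    """Coefficient of int sqrt(g) phi^2 Box(omega) in (21.112).

    (D-2)/2 is the contribution of the kinetic term (21.111),
    xi*(2-2D) that of the xi R phi^2 term (21.110).
    """
    return Fraction(D - 2, 2) + xi * (2 - 2 * D)


if __name__ == "__main__":
    for D in range(2, 8):
        xi = conformal_xi(D)
        # the last column should be 0 in every dimension
        print(D, xi, weyl_variation_coefficient(D, xi))
```

For D = 4 this gives $\xi = 1/6$, and for D = 2 it gives $\xi = 0$, in agreement with the observation that no non-minimal coupling is needed in two dimensions.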
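Remarks:
1. We can also write $\xi$ in a somewhat more informative manner in terms of the Weyl (or scaling) weight $w$ of the scalar field (given in (21.106) or (21.98)) as
$$ \xi = -\frac{w}{2(D-1)} . \qquad (21.114) $$
2. Under the Weyl transformation (21.101) of the scalar field $\phi$ and the metric $g_{\alpha\beta}$, a monomial of the scalar field $\phi^p$ transforms as
$$ \phi^p \to \Omega^{p(2-D)/2} \phi^p , \qquad \sqrt{g}\, \phi^p \to \Omega^{D + p(2-D)/2} \sqrt{g}\, \phi^p . \qquad (21.115) $$
Thus $\sqrt{g}\, \phi^p$ is invariant precisely for $p = 2D/(D-2)$, and the action (21.102) supplemented by a potential term of this form is Weyl invariant. In particular, for D = 4 one can add a quartic interaction term,
$$ D = 4 \quad \Rightarrow \quad V(\phi) \sim \phi^4 , \qquad (21.118) $$
and the two other integer solutions for p are p = 6 for D = 3 and p = 3 for D = 6 (cf. also the discussion around (21.150) below).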
3. If one considers gravity coupled to a single scalar field in this conformally coupled Weyl-invariant manner, then the local gauge symmetry (21.101) is essentially enough to gauge fix the scalar field $\phi(x)$ to a constant (away from zeros of $\phi(x)$). Then the action (21.117) essentially reduces to the Einstein-Hilbert gravitational action (albeit with the wrong sign, the constant value of the scalar field being related to Newton's constant or the Planck length/scale) plus a cosmological constant term. One can therefore also attempt to interpret the Einstein-Hilbert + cosmological constant action as arising from the spontaneous breaking of the Weyl invariance of the original Weyl-invariant action (21.117) through the appearance of this gravitational scale.52
$$ (\Box - \xi R)\, \phi = 0 \qquad (21.119) $$
following from the action (21.102). Not only is this a useful exercise and good to check;
as we will see below, the explicit expression for the energy-momentum tensor is also
quite interesting in its own right.
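Thus we need to determine the variation of the action with respect to the metric, for which we can again use (21.107). Integrating the term $\phi^2 (\nabla^\alpha \nabla^\beta - g^{\alpha\beta} \Box)\, \delta g_{\alpha\beta}$ by parts (twice) and expressing this in terms of $\delta g^{\alpha\beta}$ rather than $\delta g_{\alpha\beta}$ (which just leads to a change of sign), one finds that the energy-momentum tensor associated with the action $S_\xi$ is
$$ \begin{aligned} T_{\alpha\beta} &= \partial_\alpha\phi\, \partial_\beta\phi - \frac{1}{2} g_{\alpha\beta} ( g^{\gamma\delta} \partial_\gamma\phi\, \partial_\delta\phi ) + \xi ( G_{\alpha\beta} + g_{\alpha\beta} \Box - \nabla_\alpha \nabla_\beta )\, \phi^2 \\ &= T^{\xi=0}_{\alpha\beta} + \xi ( G_{\alpha\beta} + g_{\alpha\beta} \Box - \nabla_\alpha \nabla_\beta )\, \phi^2 , \end{aligned} \qquad (21.120) $$
where the first ($\xi$-independent) part is just the standard (Noether = covariant $T_{\alpha\beta}$) energy-momentum tensor of a massless scalar field and $G_{\alpha\beta}$ in the second part is the Einstein tensor (evidently one of the terms arising from the metric variation of $\sqrt{g} R \phi^2$). This expression is valid for any $\xi$, not just the conformally invariant value. When referring specifically to the conformally invariant theory with $\xi$ given by (21.113), I will use the superscript (c) on $T_{\alpha\beta}$, i.e.
$$ T^{(c)}_{\alpha\beta} = T^{\xi = (D-2)/4(D-1)}_{\alpha\beta} \qquad (21.121) $$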
52
See e.g. R. Kallosh, A. Linde, Hidden Superconformal Symmetry of the Cosmological Evolution,
arXiv:1311.3326 [hep-th] for an exploration of these ideas in the context of cosmology and inflation.
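With
$$ \Box \phi^2 = 2 g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi + 2 \phi\, \Box\phi \qquad (21.122) $$
one finds that the trace of the energy-momentum tensor can be written as
$$ T^\alpha{}_\alpha = 2(D-1) \left[ \left( \xi - \frac{D-2}{4(D-1)} \right) g^{\alpha\beta} \partial_\alpha\phi\, \partial_\beta\phi + \xi\, \phi \left( \Box - \frac{D-2}{4(D-1)} R \right) \phi \right] . \qquad (21.123) $$
Note that we only have a single parameter $\xi$ to play with but that cooperatively both terms vanish (the first off-shell, the second on-shell) precisely when $\xi$ is chosen to have the value (21.113),
$$ \xi = \frac{D-2}{4(D-1)} \quad \Rightarrow \quad T^{(c)\alpha}{}_\alpha = 0 \quad \text{on-shell.} \qquad (21.124) $$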
While without higher knowledge this may appear to be a minor miracle at this point,
we know on general grounds that this had to work out.
Thus we have been able to construct a symmetric, on-shell conserved and on-shell traceless energy-momentum tensor for the action of a massless scalar field in any dimension $D \geq 2$. As a consequence of this and the considerations in section 9.2, associated to any (conformal) Killing vector
In particular, the conformal invariance of this model in Minkowski space (in this sense)
is implied by the Weyl invariance of the action in curved space.
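To conclude this section, I just want to point out that the Einstein equations
$$ G_{\alpha\beta} = \kappa T_{\alpha\beta} \qquad (\kappa = 8\pi G_N) \qquad (21.127) $$
for such a scalar field non-minimally coupled to the scalar curvature can be (and occasionally are) written in a different way, involving a field-dependent effective coupling constant $\kappa_{eff} = \kappa_{eff}(\phi, \xi)$, and a corresponding effective energy-momentum tensor $T^{(eff)}_{\alpha\beta}$ which is different from, and should not be confused with, the energy-momentum tensor $T_{\alpha\beta}$ given in (21.120). This comes about as follows:
1. Equations of Motion
At the level of the equations of motion, one sees from the explicit expression (21.120) that the Einstein tensor appears both on the left- and on the right-hand sides of the Einstein equation (21.127),
$$ \begin{aligned} G_{\alpha\beta} &= \kappa T^{\xi=0}_{\alpha\beta} + \kappa \xi ( G_{\alpha\beta} + g_{\alpha\beta} \Box - \nabla_\alpha \nabla_\beta )\, \phi^2 \\ &\equiv \kappa \xi \phi^2 G_{\alpha\beta} + \kappa T^{eff}_{\alpha\beta} . \end{aligned} \qquad (21.128) $$
Combining the $G_{\alpha\beta}$-terms, this equation can be written equivalently as
$$ G_{\alpha\beta} = \kappa_{eff}\, T^{eff}_{\alpha\beta} \qquad (21.129) $$
with $\kappa_{eff} = \kappa / (1 - \kappa \xi \phi^2)$, as follows from $(1 - \kappa \xi \phi^2)\, G_{\alpha\beta} = \kappa T^{eff}_{\alpha\beta}$.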
2. Action
At the level of the action, this can be understood (or could have been anticipated) by noting that the non-minimal coupling term $\xi R \phi^2$ in the scalar field action (21.102) can also be regarded as modifying the gravitational coupling constant $\kappa$ appearing in front of the Einstein-Hilbert action,
$$ \frac{1}{2\kappa} \int \sqrt{g}\, R - \frac{1}{2} \int \sqrt{g}\, \xi \phi^2 R = \frac{1}{2} \int \sqrt{g}\, \kappa_{eff}^{-1} R . \qquad (21.131) $$
However, variation of this action with respect to the metric does not give rise to $G_{\alpha\beta} / \kappa_{eff}$, and variation of the rest of the matter action (21.102), which is just the standard action for a scalar field, does not give rise to $T^{eff}_{\alpha\beta}$ (which depends on $\xi$), so care is required when using these effective quantities. In particular, $T^{eff}_{\alpha\beta}$ is not traceless for a conformally coupled scalar field and can therefore not be used in the construction of additional conserved currents.
the second term being the remnant of the (linearised) Ricci scalar. We note that by construction this energy-momentum tensor is symmetric, and on-shell traceless and conserved,
$$ \partial^a T^{(c)}_{ab} = T^{(c)a}{}_a = 0 \quad \text{on-shell} . \qquad (21.133) $$
Now, what is the significance of this energy-momentum tensor and the fact that it differs
from the canonical energy-momentum tensor of a scalar field?
First of all, we observe that the term proportional to $\xi$ is an improvement term in the sense of (6.62) and (6.63), i.e. the energy-momentum tensor can be written as
$$ T^{(c)}_{ab} = \Theta_{ab} + \partial^c \Psi_{cab} , \qquad (21.134) $$
where the second term is identically conserved,
As I mentioned before, what I referred to as scale invariance above (the special case of Weyl invariance for a constant rescaling of the metric) is closely related to what is known as dilatation invariance (or also as scale invariance) for relativistic field theories in Minkowski space. There one considers scalings not of the (fixed) Minkowski metric but instead of the coordinates (this is the dilatation), accompanied by an appropriate rescaling of the fields. Thus one considers transformations of the form ($x^a$ denote inertial Minkowski coordinates usually denoted $\xi^a$ elsewhere in these notes)
$$ x^a \to \bar{x}^a = e^{-\lambda} x^a , \qquad \phi(x) \to \bar\phi(\bar{x}) = e^{\lambda d} \phi(x) = e^{\lambda d} \phi(e^{\lambda} \bar{x}) , \qquad (21.136) $$
where $d = d(\phi)$ is the scale dimension (or dilatation weight) of the field $\phi$, while the constant Weyl rescaling operation (21.97) is of the form
$$ g_{\alpha\beta} \to \bar{g}_{\alpha\beta} = e^{2\lambda} g_{\alpha\beta} , \qquad \phi \to \bar\phi = e^{\lambda w} \phi . \qquad (21.137) $$
Remarks:
1. In accordance with the standard practice in field theory, the dilatation action on the coordinates is written as $x \to \exp(-\lambda)\, x$, so that the coordinates have dimension $(-1)$ and the corresponding scale dimensions $d$ count mass dimensions. By contrast, what I have called the scaling (or Weyl) weight $w = w(\phi)$ of a field $\phi$ in (21.137) counts length dimensions.
3. For a scalar field one finds that a necessary condition for dilatation invariance is that $d = -w$, with $w$ given in (21.98), but the relation between the Weyl weight and the scale dimension is different for derivatives of fields and/or higher rank tensors. We will see examples of this below.
4. Since the operation (21.136) is not particularly meaningful in a general curved
space (as it depends on a choice of coordinates), in that context it is preferable
to consider scalings of the metric rather than of the coordinates. In this sense
(21.137) is the appropriate extension of (21.136) to curved spaces.
When one has such a dilatation invariant action, by the standard procedure the Noether theorem provides one with an on-shell conserved dilatation Noether current $j_N^a$,
$$ \partial_a j_N^a = 0 \quad \text{on-shell} . \qquad (21.140) $$
On the other hand, we know on general grounds from the discussion in section 9 that if we have a traceless symmetric conserved energy-momentum tensor $T_{ab}$, like our $T^{(c)}_{ab}$ in (21.132), we can construct a conserved current $J_D^a$ (9.16) associated to the conformal Killing vector $D$ of dilatations,
$$ J_D^a = T^{ab} D_b = T^{ab} x_b \quad \Rightarrow \quad \partial_a J_D^a = T^a{}_a = 0 \quad \text{on-shell} . \qquad (21.141) $$
In passing we note that symmetry of $T_{ab}$ is not strictly necessary for this conclusion since $\partial_b D_a = \eta_{ab}$ is symmetric (or $D_a = \partial_a (x^2)/2$ is a gradient vector). However, if we have a symmetric, on-shell tracefree and on-shell conserved energy-momentum tensor we can construct not only a conserved dilatation current but in fact conserved currents for the entire menagerie of generators of the conformal group (see section 9.3), in particular therefore also for the generators $C^{(m)}$ of special conformal transformations.
Thus the question if the dilatation current can be written in the form (21.141) is strictly
related to the question (of interest in field theory) if dilatation invariance extends to
full-fledged conformal invariance.
Now in the example at hand, of a free scalar field, it turns out that the Noether current is not of the form (21.141) for D > 2 with respect to the canonical Noether energy-momentum tensor $\Theta_{ab}$,
$$ D > 2 : \quad j_N^a \neq \Theta^{ab} x_b \qquad (21.142) $$
(the precise form of $j_N^a$ is derived below - see (21.167)). Indeed, this could hardly
We can now (finally) return to the issue raised at the beginning of this section regarding the significance of the improvement term in the energy-momentum tensor (21.132)
$$ T^{(c)}_{ab} = \Theta_{ab} + \frac{d}{2(D-1)} ( \eta_{ab} \Box - \partial_a \partial_b )\, \phi^2 \qquad (21.143) $$
for a free massless scalar field in D dimensions. Namely, it turns out that using this improved energy-momentum tensor the Noether current can be written in the form (21.141), modulo an identically conserved total derivative term (that does not contribute to charge integrals and may as well be neglected). Concretely one finds that (off-shell)
$$ J_D^a - j_N^a = T^{(c)ab} x_b - j_N^a = \partial_b \Sigma^{ab} , \qquad (21.144) $$
where
$$ \Sigma^{ab} = -\Sigma^{ba} = \frac{d}{2(D-1)} ( x^a \partial^b - x^b \partial^a )\, \phi^2 . \qquad (21.145) $$
A proof of this assertion will be provided at the end of this section.
Thus what we have seen is that this improved energy-momentum tensor and its as-
sociated dilatation and conformal currents arise directly from the covariant energy-
momentum tensor of a conformally coupled (i.e. Weyl invariant) scalar field in curved
space-time. It is pleasing to see that also this improvement can be understood (and
derived) directly from the gravitationally coupled matter action.
an understanding of why the improvement term has the precise form it has (in
the present case it can be traced back to the structure of the Ricci scalar)
and a different perspective on the relation between scale and conformal invariance.
More generally, the question under which conditions one can construct an improvement
term for the energy-momentum tensor such that the dilatation Noether current has the
form (21.141) has been analysed in detail first by Callan, Coleman and Jackiw, and the
improved energy-momentum tensor (when it exists) is known as the Callan - Coleman
- Jackiw or CCJ tensor.53
53
C. Callan, S. Coleman, R. Jackiw, A new improved energy-momentum tensor, Ann. Phys. 59 (1970)
42. See e.g. S. Coleman, R. Jackiw, Why dilatation generators do not generate dilatations, Ann. Phys.
67 (1971) 552 for a detailed analysis, and section 2.4 of T. Ortin, Gravity and Strings for a modern and
gravitational perspective (but note that his improved energy-momentum tensor (2.74) is in general not
symmetric, so that the construction of conserved conformal generators fails).
In the remainder of this section, for the sake of completeness I will give a self-contained
proof of the assertion (21.144), which really boils down to determining the Noether
current for dilatations. It is useful, however, to start with a slightly more detailed
discussion of two prototypical examples of non-trivially scale-invariant theories.
Examples:
In order to achieve dilatation invariance of the scalar field action, we need the Lagrangian to have dimension D. Thus we need to associate dimension D/2 to $\partial_a \phi$. Evidently, if $\phi$ has dimension $d(\phi)$, then its derivative $\partial_a \phi$ has dimension $d(\phi) + 1$.
By contrast, the Weyl weight of a field under constant rescalings of the metric has nothing to do with the coordinates, and therefore
$$ w(\partial_\alpha \phi) = w(\phi) . \qquad (21.148) $$
Thus, by postulating that the scalar field has this dimension, one arrives at a
dilatation-invariant action.
Remarks:
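$$ V(\phi) \sim \phi^{2D/(D-2)} . \qquad (21.150) $$
Only for D = 3, 4, 6 is this an integer power of $\phi$, namely
$$ \begin{aligned} D = 3 \quad &\Rightarrow \quad V(\phi) \sim \phi^6 \\ D = 4 \quad &\Rightarrow \quad V(\phi) \sim \phi^4 \\ D = 6 \quad &\Rightarrow \quad V(\phi) \sim \phi^3 . \end{aligned} \qquad (21.151) $$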
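The statement that D = 3, 4, 6 are the only dimensions with an integer power can be confirmed by a brute-force scan. The sketch below is only an illustration (the function name `integer_potential_powers` is a hypothetical helper, not something defined in these notes); it lists all dimensions D > 2 in a given range for which $p = 2D/(D-2)$ is an integer.

```python
def integer_potential_powers(dimensions):
    """Return {D: p} for those D > 2 where p = 2D/(D-2) is an integer."""
    result = {}
    for D in dimensions:
        if D > 2 and (2 * D) % (D - 2) == 0:
            result[D] = (2 * D) // (D - 2)
    return result


if __name__ == "__main__":
    # scan a generous range of dimensions
    print(integer_potential_powers(range(3, 1000)))
```

The scan only finds D = 3, 4, 6, as it must: since $2D/(D-2) = 2 + 4/(D-2)$, the power is an integer exactly when $D-2$ divides 4.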
In this case for dilatation invariance we need $d(F_{ab}) = D/2$, and thus $d(A_a) = (D-2)/2$.
With this dimension-assignment for the gauge field, the D-dimensional Maxwell theory is dilatation invariant.
Remarks:
(a) The result $d = (D-2)/2$ evidently only relies on the fact that one has a kinetic term which is quadratic in first derivatives, and therefore is valid for the standard action of a bosonic field of any spin.
(b) By the same reasoning, for actions of the fermionic type that are first-order in derivatives dilatation-invariance requires that the dimension of the spinorial field is
$$ d(\psi) = (D-1)/2 , \qquad (21.154) $$
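Since the Weyl weight of $\sqrt{g}\, g^{\alpha\beta} g^{\gamma\delta}$ is
$$ w(\sqrt{g}\, g^{\alpha\beta} g^{\gamma\delta}) = D - 4 , \qquad (21.156) $$
$D$ arising from $\sqrt{g}$ and $(-4)$ from the two inverse metrics, scaling invariance requires that the Weyl weight of $A_\alpha$ and $F_{\alpha\beta}$ should be $w(A_\alpha) = w(F_{\alpha\beta}) = (4-D)/2$.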
We see that in this case this is not the same as minus the dimension, but it
should be clear from these two examples how to translate in general Weyl
weights into scale dimensions and vice-versa.
(d) For D = 4 we have $w(A_\alpha) = 0$, and this is precisely the example already
discussed in section 6.7. Since $A_\alpha$ does not transform, the action is then
actually invariant under arbitrary Weyl transformations of the metric, not
just constant rescalings.
(e) When $D \neq 4$, Maxwell theory has invariance under constant scalings but not
under Weyl transformations. Since the Weyl transformation of the Maxwell
Lagrangian is not gauge invariant, there is no obvious way to fix this in
analogy with what we did in the scalar field case in section 21.3. In par-
ticular it seems that there is no way to construct a symmetric and traceless
energy-momentum tensor for this theory, strongly suggesting (from this grav-
itational perspective) that the restriction of the theory to Minkowski space
is an example of a theory that is scale- but not conformally invariant.54
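The translation between Weyl weights and scale dimensions mentioned in the remarks above can be made concrete for the two examples. The following sketch is an illustration only: the helper names are hypothetical, the value $w = (4-D)/2$ for $A_\alpha$ and $F_{\alpha\beta}$ is the one implied by (21.156), and the rule $d = n - w$ for a covariant tensor of rank $n$ is an extrapolation from the scalar and Maxwell cases rather than a statement taken from these notes.

```python
from fractions import Fraction


def scalar_weights(D):
    """(scale dimension, Weyl weight, rank) of a scalar field."""
    d = Fraction(D - 2, 2)   # from the kinetic term, quadratic in derivatives
    return d, -d, 0          # d = -w for a scalar


def maxwell_weights(D):
    """(scale dimension, Weyl weight, rank) for A_a and F_ab."""
    w = Fraction(4 - D, 2)   # assumed common Weyl weight of A and F
    d_F = Fraction(D, 2)     # required for dilatation invariance
    d_A = d_F - 1            # one derivative fewer than F
    return {"A": (d_A, w, 1), "F": (d_F, w, 2)}


def dimension_from_weyl_weight(w, rank):
    """Conjectured rule: each lower index shifts the dimension by +1."""
    return rank - w


if __name__ == "__main__":
    for D in range(3, 8):
        d, w, n = scalar_weights(D)
        print(D, "scalar", d, dimension_from_weyl_weight(w, n))
        for name, (d, w, n) in maxwell_weights(D).items():
            print(D, name, d, dimension_from_weyl_weight(w, n))
```

In every dimension the two columns agree, which is all this check is meant to show; it says nothing about fields with upper indices or about spinors.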
Now let us return to the dilatations (21.136) for theories involving only scalar fields,
$$ x^a \to \bar{x}^a = e^{-\lambda} x^a , \qquad \phi(x) \to \bar\phi(\bar{x}) = e^{\lambda d} \phi(x) = e^{\lambda d} \phi(e^{\lambda} \bar{x}) \qquad (21.158) $$
(for higher spin fields one would need to also invoke the Belinfante improvement procedure in the discussion below, but since this is a tangential issue for our purposes, the principal aim here being to prove (21.144), we will not consider this).
For Noether's theorem we need the corresponding infinitesimal transformations (at the same point x), namely
$$ \delta\phi = (d + x^b \partial_b)\, \phi \qquad (21.159) $$
$$ \delta(\partial_a \phi) = (d + 1 + x^b \partial_b)\, \partial_a \phi = \partial_a \delta\phi \qquad (21.160) $$
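Now consider a Lagrangian $L = L(\phi, \partial\phi)$ depending on the scalar field (or fields) $\phi$ and its derivatives. $L$ has a well-defined dimension $d_L = d(L)$ if all the terms (summands) in $L$ have the same dimension. In this case $L$ satisfies
$$ d\, \phi\, \frac{\partial L}{\partial \phi} + (d + 1)\, \partial_a \phi\, \frac{\partial L}{\partial (\partial_a \phi)} = d_L L . \qquad (21.161) $$
Then the infinitesimal variation of $L$ under dilatations is
$$ \delta L = (d_L + x^b \partial_b)\, L . \qquad (21.162) $$
In particular,
$$ d_L = D \quad \Rightarrow \quad \delta L = \partial_a (x^a L) , \qquad (21.164) $$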
Since we have a continuous symmetry, by the usual Noether argument there will be a corresponding on-shell conserved current
$$ j_N^a = x^a L - \Pi^a \delta\phi , \qquad (21.165) $$
where
$$ \Pi^a = \frac{\partial L}{\partial (\partial_a \phi)} \qquad (21.166) $$
is the field momentum conjugate to $\phi$. Using (21.159) for $\delta\phi$, one finds explicitly that the current is
$$ j_N^a = \Theta^{ab} x_b - d\, \Pi^a \phi \qquad (21.167) $$
where
$$ \Theta^{ab} = \eta^{ab} L - \Pi^a \partial^b \phi \qquad (21.168) $$
is the canonical conserved Noether energy-momentum tensor (which is automatically symmetric for a scalar field). This already proves the claim in (21.142) that $j_N^a \neq \Theta^a{}_b x^b$ for $D \neq 2$. In this context the second term in the Noether current is known as the virial current.
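$$ L = -\frac{1}{2} \eta^{ab} \partial_a \phi\, \partial_b \phi \quad \Rightarrow \quad \Pi^a = -\partial^a \phi , \qquad (21.169) $$
and the explicit form of $T^{(c)}_{ab}$ (21.143), one finds that
$$ \begin{aligned} J_D^a - j_N^a &= \frac{d}{2(D-1)}\, x_b ( \eta^{ab} \Box - \partial^a \partial^b )\, \phi^2 - d\, \phi\, \partial^a \phi \\ &= \frac{d}{2(D-1)} \left[ x^a \Box \phi^2 - x^b \partial_b \partial^a \phi^2 - (D-1)\, \partial^a \phi^2 \right] \\ &= \partial_b \left[ \frac{d}{2(D-1)} ( x^a \partial^b - x^b \partial^a )\, \phi^2 \right] , \end{aligned} \qquad (21.170) $$
which establishes (21.144), with $\Sigma^{ab}$ as given in (21.145).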
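All the above relations are valid off-shell. Calculating the divergence of (21.167) and using that on-shell one has
$$ \partial_a \Theta^{ab} = 0 , \qquad \partial_a \Pi^a = \frac{\partial L}{\partial \phi} , \qquad (21.171) $$
one finds
$$ \partial_a j_N^a = D L - \left( d\, \phi\, \frac{\partial L}{\partial \phi} + (d + 1)\, \partial_a \phi\, \frac{\partial L}{\partial (\partial_a \phi)} \right) = (D - d_L)\, L , \qquad (21.172) $$
thus confirming that dilatation invariance requires (and is equivalent to) $d_L = D$.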
The key result and insight of section 21.2, expressed in equations (21.94) and (21.92), is that the improved canonical Noether energy-momentum tensor is actually equal on-shell to the (on-shell) covariantly conserved and (off-shell) symmetric covariant energy-momentum tensor $T^{\alpha\beta}$.
We look at this issue in turn for the two examples of section 5.7, working in Minkowski space. Thus the relevant energy-momentum tensor is the restriction of the covariant energy-momentum tensor $T^{\alpha\beta}$ to Minkowski space, as in (6.113) or (21.93).
as it should be.
The canonical energy-momentum tensor of this model has the form
$$ \Theta^{\alpha\beta} = \Theta_s^{\alpha\beta} + \Theta_m^{\alpha\beta} + \Theta_a^{\alpha\beta} \qquad (21.175) $$
(the sum of the contributions from the scalar, Maxwell and axion action), while the covariant energy-momentum tensor has the form
$$ T^{\alpha\beta} = T_s^{\alpha\beta} + T_m^{\alpha\beta} \qquad (21.176) $$
because the axionic term does not couple to the metric. The scalar energy-momentum tensors are equal on the nose (i.e. off-shell, without improvement terms),
$$ \Theta_s^{\alpha\beta} = T_s^{\alpha\beta} , \qquad (21.177) $$
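As we know, for pure Maxwell theory one would proceed and argue as follows: one writes $\Theta_m^{\alpha\beta}$ as
$$ \begin{aligned} \Theta_m^{\alpha\beta} &= F^{\alpha\gamma} F^\beta{}_\gamma - \frac{1}{4} \eta^{\alpha\beta} F_{\gamma\delta} F^{\gamma\delta} + F^{\alpha\gamma} \partial_\gamma A^\beta \\ &= T_m^{\alpha\beta} + \partial_\gamma ( F^{\alpha\gamma} A^\beta ) - ( \partial_\gamma F^{\alpha\gamma} ) A^\beta . \end{aligned} \qquad (21.180) $$
The 1st term on the right-hand side is the covariant (symmetric, gauge invariant, metric) energy-momentum tensor, the 2nd is identically conserved, and the 3rd is zero on-shell, so that on-shell the canonical and covariant energy-momentum tensors agree modulo identically conserved terms that do not contribute to surface integrals etc. In the present (non-vacuum) case, one needs to take into account the non-trivial equation of motion
$$ \partial_\alpha ( F^{\alpha\beta} + f(\phi) \tilde{F}^{\alpha\beta} ) = 0 , \qquad (21.181) $$
and then one finds
$$ \begin{aligned} \Theta_m^{\alpha\beta} + \Theta_a^{\alpha\beta} = {} & T_m^{\alpha\beta} && \text{(the term one expects)} \\ & + \partial_\gamma [ ( F^{\alpha\gamma} + f(\phi) \tilde{F}^{\alpha\gamma} ) A^\beta ] && \text{(identically conserved)} \\ & + [ \partial_\gamma ( F^{\gamma\alpha} + f(\phi) \tilde{F}^{\gamma\alpha} ) ] A^\beta && \text{(on-shell zero)} \\ & + f(\phi) \left[ \tilde{F}^{\alpha\gamma} F^\beta{}_\gamma - \frac{1}{4} \eta^{\alpha\beta} \tilde{F}_{\gamma\delta} F^{\gamma\delta} \right] && \text{(???)} \end{aligned} \qquad (21.182) $$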
At first sight the last term appears to spoil the claimed relation between the canonical and covariant energy-momentum tensors. However, either by explicit calculation in components or from a more covariant argument given below one finds that the term in square brackets in the last line,
$$ \tilde{T}_m^{\alpha\beta} := \tilde{F}^{\alpha\gamma} F^\beta{}_\gamma - \frac{1}{4} \eta^{\alpha\beta} \tilde{F}_{\gamma\delta} F^{\gamma\delta} \qquad (21.183) $$
is actually cooperatively identically zero (for any anti-symmetric $F_{\alpha\beta}$ and its dual),
$$ \tilde{T}_m^{\alpha\beta} \equiv 0 . \qquad (21.184) $$
Therefore one concludes that, as it should be, the canonical and covariant energy-momentum tensors are indeed equal on-shell modulo identically conserved terms.
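Before turning to a general proof, the identity (21.184) can be spot-checked by the "explicit calculation in components" mentioned above. The sketch below is a minimal illustration under stated assumptions: it works with the metric-free mixed form $\tilde{F}^{\mu\lambda} F_{\nu\lambda} = \frac{1}{4} \delta^\mu{}_\nu \tilde{F}^{\alpha\beta} F_{\alpha\beta}$ (which differs from (21.183) only by raising an index with the metric), it takes $\tilde{F}^{\mu\lambda} = \frac{1}{2} \epsilon^{\mu\lambda\alpha\beta} F_{\alpha\beta}$ with $\epsilon^{0123} = +1$ as the (assumed) convention for the dual, and all function names are hypothetical. Integer entries are used so that the check is exact.

```python
from itertools import permutations
import random

N = 4  # space-time dimension


def permutation_sign(p):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


# Levi-Civita symbol with epsilon^{0123} = +1 (entries with repeated
# indices vanish and are simply not stored).
EPSILON = {p: permutation_sign(p) for p in permutations(range(N))}


def random_antisymmetric(rng, bound=9):
    """Random antisymmetric integer matrix F_{ab}."""
    F = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            F[i][j] = rng.randint(-bound, bound)
            F[j][i] = -F[i][j]
    return F


def dual(F):
    """Ftilde^{m l} = (1/2) epsilon^{m l a b} F_{a b}."""
    Fd = [[0] * N for _ in range(N)]
    for (m, l, a, b), sign in EPSILON.items():
        Fd[m][l] += sign * F[a][b]
    # (a, b) and (b, a) contribute equally, so every entry is even
    return [[x // 2 for x in row] for row in Fd]


def mixed_product(Fd, F):
    """M^m_n = sum_l Ftilde^{m l} F_{n l}."""
    return [[sum(Fd[m][l] * F[n][l] for l in range(N)) for n in range(N)]
            for m in range(N)]


def identity_defect(F):
    """4 * M^m_n - delta^m_n * (Ftilde^{ab} F_{ab}); zero if the identity holds."""
    Fd = dual(F)
    M = mixed_product(Fd, F)
    scalar = sum(Fd[a][b] * F[a][b] for a in range(N) for b in range(N))
    return [[4 * M[m][n] - (scalar if m == n else 0) for n in range(N)]
            for m in range(N)]


if __name__ == "__main__":
    rng = random.Random(0)
    F = random_antisymmetric(rng)
    print(identity_defect(F))  # a 4x4 matrix of zeros
```

A numerical check of this kind is of course no substitute for the proof, but it is a quick way to catch sign or index errors in one's conventions.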
Here is a general proof that $\tilde{T}_m^{\alpha\beta} = 0$ identically:
F F = 1
2 F F = 12 F F
(21.185)
= 12 F[ F] = 21 F[ F]
F[ F] = F[ F] . (21.186)
F[ F] = F[ F] = g , (21.187)
with g determined by
1
= 24 g = 12 F F . (21.188)
F F = 21 F[ F] = 24
1
F F
(21.189)
= 1 6 F F = 1 F F
24 4
with
m = Tm + (F
A ) ( F )A , (21.191)
as usual, and
cs = k Lcs + k A A . (21.192)
The covariant energy-momentum tensor, on the other hand, is just the Maxwell
,
Tm
T = Tm , (21.193)
because the CS term is metric independent.
After some rearrangement the CS contribution to the canonical energy-momentum
tensor can be written as
cs = k Lcs + 21 k (A F + A F + A F )
(21.194)
+ k A F k ( A A )
Thus for the sum of the Maxwell and CS contributions one has the result
m + cs = Tm (the term one expects)
+ [(F k A )A ] (identically conserved)
(21.195)
+ [ F +k F ]A (on-shell zero)
+ T ,
with
T = k Lcs + 21 k (A F + A F + A F ) . (21.196)
By an argument analogous to that in the previous example, one can show that
T 0 (21.197)
identically:
A F + A F + A F = 3A[ F] ; (21.198)
one has
A[ F] = 61 ( A F ) = 13 Lcs ; (21.199)
thus
T = k Lcs + 32 k A[ F]
(21.200)
= k Lcs 21 k Lcs = 0 .
These two examples evidently show a common and general pattern, and it is now easy to
generalise this to topological or quasi-topological interactions in higher dimensions, but
this is not necessary - after all, one can prove in complete generality that the canonical
and covariant energy-momentum tensors are equal on-shell modulo identically conserved
terms, and the present examples just serve to illustrate how this works in a non-trivial
situation.
21.6 Comments on Gravitational Energy
It may not have escaped your attention that in the entire discussion of energy and
energy-momentum tensors of fields in a gravitational field the notion of the energy of
the gravitational field itself has not appeared so far, even though clearly there can be an
exchange of energy between matter and gravitational fields and one should not expect
one to be conserved without taking into account the other. This is evidently a major
omission, and I will try to rectify this now, but as you will perhaps understand from
the discussion below there is a good reason why I have so far tried to avoid this issue.
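To get us started, let us return to the covariant conservation law $\nabla_\alpha T^{\alpha\beta} = 0$ for the matter energy-momentum tensor which played such a key role above, and which we now write more explicitly as the non-conservation law (cf. (6.144))
$$ \nabla_\alpha T^{\alpha\beta} = 0 \quad \Leftrightarrow \quad \partial_\alpha ( \sqrt{g}\, T^{\alpha\beta} ) = -\sqrt{g}\, \Gamma^\beta{}_{\alpha\gamma} T^{\alpha\gamma} . \qquad (21.201) $$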
This suggests that also in the present gravitational case it should be possible to define
a conserved total energy-momentum tensor by taking into account not only the matter
energy-momentum tensor but also the energy-momentum of the gravitational field itself.
However, this is easier said than done, and attempts to make sense of this and make this well-defined are an important and controversial part of research in general relativity from the earliest days of general relativity to today. For example, Noether's fundamental and famous work Invariante Variationsprobleme on symmetries and variational problems was prompted by questions of Hilbert regarding the apparent failure of what he referred to as the energy theorem in general relativity.55
I will just make some short, scattered and introductory remarks on this subject, but
must precede these with a caveat:
This is treacherous territory and I cannot guarantee that even these elementary remarks
are widely considered to be uncontroversial (in fact, the number of uncontroversial
statements one can make about this subject is probably quite small - but is likely to
include this parenthetical remark . . . ).
55
See e.g. N. Byers, E. Noether's Discovery of the Deep Connection Between Symmetries and Conservation Laws, arXiv:physics/9807044 [physics.hist-ph] for an account of the historical circumstances of these discoveries.
1. Covariant Gravitational Energy-Momentum Tensor?
One's first thought may be that, in precise analogy with the definition of the covariant matter energy-momentum tensor, the gravitational energy-momentum tensor should be defined in terms of the variation of the gravitational (Einstein-Hilbert) action with respect to the metric, or, equivalently, that the total energy-momentum tensor should be defined as the variation of the total (Einstein-Hilbert + matter) action with respect to the metric. While this seems to be the logical thing to do,
$$ \begin{aligned} T^{tot}_{\alpha\beta} &\stackrel{?}{=} -\frac{2}{\sqrt{g}} \frac{\delta}{\delta g^{\alpha\beta}} \left( \frac{1}{16\pi G_N} S_{EH}[g_{\alpha\beta}] + S_M[\phi, g_{\alpha\beta}] \right) \\ &= -\frac{1}{8\pi G_N} G_{\alpha\beta} + T_{\alpha\beta} , \end{aligned} \qquad (21.202) $$
it is evidently not useful because by the variational principle for general relativ-
ity this total energy-momentum tensor is identically zero for any solution to the
Einstein equations. This reinterpretation of the Einstein tensor as the energy-
momentum tensor of the gravitational field was first suggested by Lorentz (in
1916) and Levi-Civita (in 1917), but immediately dismissed by Einstein (1918).56
Whatever thus the poetic or philosophical virtues of such a definition might be
(everything from nothing, the total energy of the universe is zero), it is clearly
deficient in many other respects and does not provide a whole lot of insight. In
particular, with this definition one would assign zero gravitational energy to any
source-free region of space-time, i.e. to vacuum gravitational fields (like gravi-
tational waves, the exterior of a star etc.). Evidently, therefore, this definition
fails to capture some essential aspects of and contributions to what one would
commonly refer to as gravitational potential energy.
if the gravitational energy-momentum tensor were a true tensor, it would then have to be zero in all reference frames
thus a non-trivial gravitational energy-momentum tensor cannot be tensorial
57
For a lucid recent discussion of this issue and its ramifications in general relativity, see J. Frauendiener, L. Szabados, A note on the post-Newtonian limit of quasi-local energy expressions, arXiv:1102.1867 [gr-qc].
$$ \vec\nabla \Phi \to \vec\nabla \Phi + \vec{a} = \vec\nabla ( \Phi + \vec{a}.\vec{x} ) \qquad (21.203) $$
for any constant acceleration vector $\vec{a}$. One can thus always make the force and the potential energy density vanish at a point $\vec{x}_0$ by choosing $\vec{a} = -\vec\nabla \Phi(\vec{x}_0)$ (i.e. by going to a freely falling reference frame, as in our first discussion of the equivalence principle in section 1.1). This local ambiguity of the gravitational potential energy density can be (and is usually implicitly) fixed by invoking (non-locally) the required or expected asymptotic behaviour of the gravitational field (e.g. that it goes to zero asymptotically).
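The shift (21.203) is easy to illustrate numerically. The sketch below assumes, purely for illustration, the potential of a point mass, $\Phi = -GM/|\vec{x}|$ with $GM = 1$, and uses hypothetical helper names; it picks $\vec{a} = -\vec\nabla\Phi(\vec{x}_0)$ and shows that the shifted gradient vanishes at $\vec{x}_0$ but not elsewhere.

```python
import math


def grad_phi(x, GM=1.0):
    """Gradient of the point-mass potential Phi = -GM/|x|."""
    r = math.sqrt(sum(c * c for c in x))
    return [GM * c / r**3 for c in x]


def free_fall_acceleration(x0, GM=1.0):
    """The choice a = -grad(Phi)(x0) that removes the force at x0."""
    return [-g for g in grad_phi(x0, GM)]


def shifted_gradient(x, a, GM=1.0):
    """Gradient of Phi + a.x, i.e. grad(Phi) + a, as in (21.203)."""
    return [g + ai for g, ai in zip(grad_phi(x, GM), a)]


def norm_squared(v):
    """|v|^2, proportional to the Newtonian potential energy density."""
    return sum(c * c for c in v)


if __name__ == "__main__":
    x0 = [1.0, 2.0, 2.0]
    a = free_fall_acceleration(x0)
    print(norm_squared(shifted_gradient(x0, a)))               # vanishes at x0
    print(norm_squared(shifted_gradient([2.0, 0.0, 0.0], a)))  # non-zero elsewhere
```

The second number is non-zero: the shift only removes the force at the chosen point, which is precisely why this is a local ambiguity and why the asymptotic behaviour is needed to fix it.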
It is this local ambiguity that sets the gravitational potential energy density apart
from (formally similar) quantities like the electrostatic potential energy density.
This suggests that perhaps this non-localisability or non-tensoriality of the grav-
itational potential energy density is an intrinsic property of this quantity, thus
simply an inevitable feature and not a bug.
Again this argument may sound compelling but is not without loopholes. For example, an integration by parts would put the Newtonian gravitational potential energy density into the form $\sim \Phi \Delta \Phi$, and following the argument in section 18.3 (extended to quadratic order in the fluctuations of the metric) it is certainly conceivable that one could come up with some tensorial generalisations of this expression (involving second derivatives of the metric, i.e. contractions of the Riemann tensor). However, then one needs to investigate whether such a candidate expression
has other desired or desirable properties, beyond having the right Newtonian
limit, in order to qualify as a candidate for the gravitational energy density or
energy-momentum tensor.
In some way, this situation reflects the tension of having to choose between working
with
either a covariant action, but involving second derivatives of the metric (the
Einstein-Hilbert action (19.2)),
or a non-covariant action, but nicely quadratic in the first derivatives of the
metric (the Einstein action (19.45)).
4. Gravitational Pseudotensors?
If among the desirable features for a potential gravitational energy-momentum tensor one wants to include the statement that, whatever the energy-momentum tensor $t_G^{\alpha\beta}$ of the gravitational field might be, it should be such that the total energy-momentum tensor consisting of the sum of the matter and gravitational energy-momentum tensors is conserved (in the ordinary sense of providing conserved currents), then this conflicts directly with the requirement of tensoriality. Indeed, then the total energy-momentum tensor would have to satisfy some ordinary local conservation law of the type
$$ \partial_\alpha [ \sqrt{g}\, ( T^{\alpha\beta} + t_G^{\alpha\beta} ) ] = 0 \qquad (21.204) $$
(or perhaps with some other power of $\sqrt{g}$). However, since $\partial_\alpha ( \sqrt{g}\, T^{\alpha\beta} )$ is not tensorial by itself (it is only the first part of the covariant derivative), it is clear that $t_G^{\alpha\beta}$ is then also invariably not tensorial. And if the sum $T^{\alpha\beta} + t_G^{\alpha\beta}$ were covariantly conserved instead, then we would again face the same problem as in (21.201).
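One simple way to construct (non-tensorial) objects $t_G^{\alpha\beta}$ satisfying an equation like (21.204), known as energy-momentum pseudotensors, and illustrating the high degree of arbitrariness in such a construction, uses the method of superpotentials:
Construct an object
$$ U^{\alpha\beta\gamma} = U^{[\alpha\beta]\gamma} \qquad (21.205) $$
out of the metric and its first derivatives (say), so that in particular its divergence $\partial_\alpha ( \sqrt{g}\, U^{\alpha\beta\gamma} )$ is identically conserved,
$$ \partial_\beta \partial_\alpha ( \sqrt{g}\, U^{\alpha\beta\gamma} ) \equiv 0 . \qquad (21.206) $$
Use this to split off a total derivative (divergence) term from the Einstein tensor to define a corresponding candidate gravitational pseudo-tensor $t_U^{\beta\gamma}$ by
$$ 16\pi G_N \sqrt{g}\, t_U^{\beta\gamma} := -2 \sqrt{g}\, G^{\beta\gamma} + \partial_\alpha ( \sqrt{g}\, U^{\alpha\beta\gamma} ) . \qquad (21.207) $$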
as desired.
A multitude of such (and related) objects have been constructed, e.g. by Einstein himself, or by Landau and Lifshitz, the pseudotensor of the latter at least having the attractive feature that it is symmetric, by Weinberg, by Møller, etc. These are typically indeed quadratic in the first derivatives of the metric (and hence, as we already discussed, necessarily non-tensorial but with a potentially viable Newtonian limit). They suffer from at least 2 ambiguities, however, one being the choice of the superpotential, and the second (since we are dealing with non-tensorial quantities) the choice of coordinates / reference frame. Their overall usefulness for providing a local expression for the energy-density of the gravitational field is therefore somewhat (or severely) limited.58
58
For modern accounts and further references see e.g. the brief discussions in the textbooks N. Straumann, General Relativity or T. Ortin, Gravity and Strings, the review article by L. Szabados (footnote 61) and C. Chen, J. Nester, R. Tung, Gravitational Energy for GR and Poincaré gauge theories: a covariant Hamiltonian approach, arXiv:1507.07300.
5. Looking for the Right Answer to the Wrong Question?
Such observations and considerations (and a dislike of pseudo-tensorial objects) have led to the realisation that the notion of local energy density of the gravitational field at a point may not be particularly useful or meaningful in general relativity, as already summarised so eloquently by Misner, Thorne and Wheeler in the early 1970s:
Anybody who looks for a magic formula for local gravitational energy-momentum is looking for the right answer to the wrong question. Unhappily, enormous time and effort were devoted in the past to trying to answer this question before investigators realised the futility of the enterprise.59
The final outcome may be that eventually a single best and universally accepted
and agreed upon definition of quasi-local gravitational energy emerges from these
investigations. However, at present it seems at least equally (if not far more)
likely that no such universal best definition exists and that the right answer
depends on the specific context and question one is asking.
http://www.livingreviews.org/lrr-2009-4.
22 Linearised Gravity and Gravitational Waves
While it is evidently of interest to apply general relativity to situations where the grav-
itational field is so strong that the Newtonian approximation fails (and we will consider
such situations later on), in most ordinary situations, the gravitational field is weak, very
weak, and then it is legitimate to work with a linearisation of the Einstein equations.
When we first derived the Einstein equations we checked that we were doing the right
thing by deriving the Newtonian theory in the limit where
3. matter is non-relativistic
The fact that the 2nd condition had to be imposed in order to recover the Newtonian
theory is interesting in its own right, as it suggests that there are novel features in general
relativity, even for weak fields, when the 2nd condition is not imposed. Indeed, as we
will see, in this more general setting one discovers that general relativity also predicts
the existence of gravitational waves, i.e. linearised perturbations of the gravitational
field propagating like ordinary waves on the (Minkowski) background geometry. Our
principal aim in this section will be to derive these linearised equations, show that they
are wave equations, and study some of their more elementary consequences.
An important next step would be to study and understand how or under which cir-
cumstances gravitational waves are created and how they can be detected. These,
unfortunately, are rather complicated questions in general and I will not enter into this.
The things I will cover in the following are much more elementary, both technically and
conceptually, than other applications of general relativity discussed elsewhere in these
notes.
We express the weakness of the gravitational field by the condition that the metric be
close to that of Minkowski space, i.e. that
$$g_{\mu\nu} = g^{(1)}_{\mu\nu} \equiv \eta_{\mu\nu} + h_{\mu\nu} \qquad (22.1)$$
with $|h_{\mu\nu}| \ll 1$. This means that we will drop terms which are quadratic or of higher
power in $h_{\mu\nu}$. Here and in the following the superscript (1) indicates that we keep only
up to linear (first order) terms in $h_{\mu\nu}$. In particular, the inverse metric is
$$g^{(1)\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} \qquad (22.2)$$
where indices are raised with $\eta^{\mu\nu}$.
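The statement that (22.2) inverts (22.1) up to terms of order $h^2$ can be checked numerically. The following is a minimal sketch in pure Python; the sample perturbation and the function names are hypothetical and chosen only for illustration.

```python
# Check that eta^{mu nu} - h^{mu nu} inverts eta_{mu nu} + h_{mu nu}
# up to terms quadratic in h (pure Python, no external libraries).

# Minkowski metric with signature (-,+,+,+); it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def linearised_inverse(h):
    """Return eta^{mu nu} - h^{mu nu}, with the indices of h raised by eta."""
    h_up = matmul(matmul(ETA, h), ETA)
    return [[ETA[i][j] - h_up[i][j] for j in range(4)] for i in range(4)]


def inverse_defect(eps):
    """Largest entry of |g . g_inv - 1| for a sample perturbation of size eps."""
    # hypothetical symmetric sample perturbation, for illustration only
    h = [[eps * (1 + i + j) / 7.0 for j in range(4)] for i in range(4)]
    g = [[ETA[i][j] + h[i][j] for j in range(4)] for i in range(4)]
    prod = matmul(g, linearised_inverse(h))
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(4) for j in range(4))
```

Reducing the size of the perturbation by a factor of 10 should reduce the defect by a factor of 100, which is the signature of a purely quadratic error.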
As one has thus essentially chosen a background metric, the Minkowski metric, one can
think of the linearised version of the Einstein equations (which are field equations for
$h_{\mu\nu}$) as a Lorentz-invariant theory of a symmetric tensor field propagating in Minkowski
space-time. I won't dwell on this but it is good to keep this in mind. It gives rise to
the field theorist's picture of gravity as the theory of an interacting spin-2 field (which
is useful for many purposes but which I do not subscribe to unconditionally because it
is an inherently perturbative and background dependent picture).
It is straightforward to work out the Christoffel symbols and curvature tensors in this
approximation. The terms quadratic in the Christoffel symbols do not contribute to the
Riemann curvature tensor and one finds
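$$\Gamma^{(1)\mu}_{\ \ \ \ \nu\lambda} = \tfrac12\eta^{\mu\rho}(\partial_\lambda h_{\rho\nu} + \partial_\nu h_{\rho\lambda} - \partial_\rho h_{\nu\lambda})$$
$$R^{(1)}_{\mu\nu\rho\sigma} = \tfrac12(\partial_\rho\partial_\nu h_{\mu\sigma} + \partial_\mu\partial_\sigma h_{\rho\nu} - \partial_\rho\partial_\mu h_{\nu\sigma} - \partial_\nu\partial_\sigma h_{\rho\mu}) \qquad (22.3)$$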
(this result for the Riemann tensor can also be inferred directly from the expression
(7.13) of the Riemann tensor at the origin of an inertial coordinate system). Hence
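the linearised Ricci tensor is
$$R^{(1)}_{\mu\nu} = \tfrac12(\partial_\sigma\partial_\mu h^\sigma_{\ \nu} + \partial_\sigma\partial_\nu h^\sigma_{\ \mu} - \Box h_{\mu\nu} - \partial_\mu\partial_\nu h) \qquad (22.4)$$
where
$$h \equiv h^\mu_{\ \mu} = \eta^{\mu\nu}h_{\mu\nu} = -h_{00} + \delta^{ik}h_{ik} \qquad (22.5)$$
is the trace of $h_{\mu\nu}$ and
$$\Box = \partial^\mu\partial_\mu = -(\partial_0)^2 + \Delta \qquad (22.6)$$
is the Minkowski wave operator;
the linearised Ricci scalar is
$$R^{(1)} = \eta^{\mu\nu}R^{(1)}_{\mu\nu} = \partial_\mu\partial_\nu h^{\mu\nu} - \Box h ; \qquad (22.7)$$
the linearised Einstein tensor is
$$G^{(1)}_{\mu\nu} = R^{(1)}_{\mu\nu} - \tfrac12\eta_{\mu\nu}R^{(1)} = \tfrac12(\partial_\sigma\partial_\mu h^\sigma_{\ \nu} + \partial_\sigma\partial_\nu h^\sigma_{\ \mu} - \Box h_{\mu\nu} - \partial_\mu\partial_\nu h - \eta_{\mu\nu}\partial_\rho\partial_\sigma h^{\rho\sigma} + \eta_{\mu\nu}\Box h) ; \qquad (22.8)$$
and the linearised vacuum Einstein equations are thus
$$R^{(1)}_{\mu\nu} = 0 \Leftrightarrow \partial_\sigma\partial_\mu h^\sigma_{\ \nu} + \partial_\sigma\partial_\nu h^\sigma_{\ \mu} - \Box h_{\mu\nu} - \partial_\mu\partial_\nu h = 0 , \qquad (22.10)$$
or, equivalently (in the form $G^{(1)}_{\mu\nu} = 0$)
$$\partial_\sigma\partial_\mu h^\sigma_{\ \nu} + \partial_\sigma\partial_\nu h^\sigma_{\ \mu} - \Box h_{\mu\nu} - \partial_\mu\partial_\nu h - \eta_{\mu\nu}\partial_\rho\partial_\sigma h^{\rho\sigma} + \eta_{\mu\nu}\Box h = 0 . \qquad (22.11)$$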
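The latter can be derived from the quadratic Minkowski space (Poincaré invariant) field
theory action (the Fierz-Pauli action (1939))
$$S[h_{\mu\nu}] = \int d^4x\, L(h_{\mu\nu}, \partial_\lambda h_{\mu\nu})$$
$$L(h_{\mu\nu}, \partial_\lambda h_{\mu\nu}) = -\tfrac14 h_{\mu\nu,\lambda}h^{\mu\nu,\lambda} + \tfrac12 h_{\mu\nu,\lambda}h^{\lambda\nu,\mu} + \tfrac14 h_{,\mu}h^{,\mu} - \tfrac12 h_{,\mu}h^{\mu\nu}_{\ \ \ ,\nu} \qquad (22.12)$$
for a free massless spin-2 field described by the Lorentz tensor $h_{\mu\nu}$. Indeed, variation of
the action with respect to the $h_{\mu\nu}$ (and the usual integration by parts) leads to
$$h_{\mu\nu} \to h_{\mu\nu} + \delta h_{\mu\nu} \;\Rightarrow\; \delta S[h_{\mu\nu}] = -\int d^4x\, G^{(1)}_{\mu\nu}\,\delta h^{\mu\nu} . \qquad (22.13)$$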
Remarks:
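1. Signs 1: by (22.2), the variation of the inverse metric is
$$\delta g^{(1)\mu\nu} = -\delta h^{\mu\nu} , \qquad (22.14)$$
so that the sign in (22.13) is the standard one if expressed in terms of the variation
of $g^{\mu\nu}$,
$$\delta S[h_{\mu\nu}] = +\int d^4x\, G^{(1)}_{\mu\nu}\,\delta g^{(1)\mu\nu} . \qquad (22.15)$$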
2. Signs 2: the sign and overall normalisation of the Lagrangian are also such that
one obtains the canonically normalised kinetic term
$$L = +\tfrac14(\partial_t h_{ik})^2 + \ldots \qquad (22.16)$$
3. Note that the Lagrangian is not the linearised Einstein-Hilbert Lagrangian, i.e.
the linearised Ricci scalar (22.7) (as the latter is, by construction, linear in $h_{\mu\nu}$).
Rather,
$$L = -\tfrac12 h^{\mu\nu}G^{(1)}_{\mu\nu} + \text{total derivative} , \qquad (22.17)$$
where the total derivative contributions serve to turn the terms of the form $h\,\partial^2 h$
arising from $h^{\mu\nu}G^{(1)}_{\mu\nu}$ into the standard $(\partial h)^2$-form, so that $L$ is of a standard
(quadratic) form in the derivatives of $h_{\mu\nu}$, as it should be.
4. Note that we can indeed take the integration measure to be the flat Minkowski
measure $d^4x$, as $L$ is already quadratic in $h_{\mu\nu}$ and its derivatives, so that any
contribution of $h_{\mu\nu}$ to the measure would give a subleading contribution.
5. This theory, just like its spin-1 Maxwell counterpart, has a local gauge invariance
which we will return to and make use of below.
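In the presence of matter, the linearised Einstein equations are
$$G^{(1)}_{\mu\nu} = 8\pi G_N T^{(0)}_{\mu\nu} . \qquad (22.18)$$
Note that only the zeroth order term in the $h$-expansion appears on the right
hand side of this equation. This is due to the fact that $T_{\mu\nu}$ must itself already
be small in order for the linearised approximation to be valid, i.e. $T^{(0)}_{\mu\nu}$ should be
of order $h_{\mu\nu}$. Therefore, any terms in $T_{\mu\nu}$ depending on $h_{\mu\nu}$ would already be of
order $(h_{\mu\nu})^2$ and can be dropped.
The conservation law for the energy-momentum tensor is just
$$\partial_\mu T^{(0)\mu\nu} = 0 , \qquad (22.19)$$
and this is compatible with the linearised Bianchi identity
$$\partial_\mu G^{(1)\mu\nu} = 0 , \qquad (22.20)$$
which can easily be verified, and which reflects the invariance of the theory under
the linearised coordinate transformations to be discussed below.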
In section 18.3 we verified that the Newtonian (weak field, static, non-relativistic matter)
limit of the Einstein equations reduces to the Newtonian (Poisson) equation $\Delta\phi = 4\pi G_N\rho$.
This is a special case of the above general weak-field limit equations (22.18),
with $G^{(1)}_{\mu\nu}$ given in (22.8), and it will be instructive to redo the calculation of section
18.3 from this more general perspective.
We thus assume an energy-momentum tensor whose only non-negligible component is
the energy density $T_{00} = \rho$, with $\rho$ static and $G_N\rho \ll 1$. Then we can assume that
the deviations of the space-time geometry from the Minkowski metric are small and
time-independent,
$$g_{\mu\nu} = g^{(1)}_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} , \quad \partial_0 h_{\mu\nu} = 0 . \qquad (22.21)$$
Then the (00)-component of the Einstein tensor reduces (after some cancellations) to
$$G^{(1)}_{00} = -\tfrac12\Delta(\delta^{ik}h_{ik}) + \tfrac12\partial_i\partial_k h_{ik} . \qquad (22.22)$$
In particular, $h_{00}$ and its derivatives have dropped out of this expression, which appears
to be at odds with the desired $G_{00} = -\Delta g_{00}$ for Newtonian fields (section 18.3). However,
we have not yet used at all the condition that $T_{ik} = 0 \Rightarrow G_{ik} = 0$. In particular,
for static perturbations we find from (22.8) that the trace of the spatial components of
the Einstein tensor is
$$\delta^{ik}G^{(1)}_{ik} = \tfrac12\Delta(\delta^{ik}h_{ik}) - \tfrac12\partial_i\partial_k h_{ik} - \Delta h_{00} \qquad (22.23)$$
so that
$$G^{(1)}_{ik} = 0 \;\Rightarrow\; G^{(1)}_{00} = -\Delta h_{00} . \qquad (22.24)$$
This is precisely the relation required (and verified) in the analysis of section 18, in
order to have the correct Newtonian limit.
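As we are in the realm of a Poincaré-invariant classical field theory, we can define the
total energy of the system as the integral of the energy-density $T_{00} = \rho$ over a spatial
constant time $x^0 = $ const. slice $\Sigma$,
$$E = \int_\Sigma d^3x\, T_{00} \qquad (22.25)$$
(another way to see that we can take the integration measure to be the flat Euclidean
measure $d^3x$ is to recall that $T_{00}$ is small by assumption). If the linearised Einstein
equations are satisfied, we can express the total energy in terms of the Einstein tensor,
$$E = \frac{1}{8\pi G_N}\int_\Sigma d^3x\, G^{(1)}_{00} . \qquad (22.26)$$
Now we have two distinct expressions for $G^{(1)}_{00}$ in terms of the derivatives of the metric
and, interestingly, both of them are spatial total derivatives, namely
$$G^{(1)}_{00} = -\tfrac12\Delta(\delta^{ik}h_{ik}) + \tfrac12\partial_i\partial_k h_{ik} = \partial_i(-\tfrac12\partial_i h_{kk} + \tfrac12\partial_k h_{ik}) \qquad (22.27)$$
and
$$G^{(1)}_{00} = -\Delta h_{00} = \partial_i(-\partial_i h_{00}) . \qquad (22.28)$$
This allows us to write the total energy $E$ of the system as a boundary integral over
the boundary
$$\partial\Sigma = S^2_\infty \qquad (22.29)$$
of $\Sigma$, either (using (22.27)) as
$$E = \frac{1}{16\pi G_N}\oint_{S^2_\infty} dS_i\,(\partial_k h_{ik} - \partial_i h_{kk}) \qquad (22.30)$$
or as
$$E = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_i\,\partial_i h_{00} . \qquad (22.31)$$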
Note that these final expressions only depend on the asymptotics of the gravitational
field at spatial infinity (the assumption that the gravitational field is weak there trans-
lating into the statement that the metric is asymptotically flat). It is thus tempting
to adopt E as the definition of the total energy of an isolated (i.e. asymptotically flat)
system in general, even when the field in the interior is not weak:
and
$$h = g^{ik}h_{ik} \qquad (22.35)$$
is the trace of $h_{ik}$ with respect to the background metric. If the background
metric is the Minkowski metric, and one works in Cartesian coordinates, then
(22.33) reduces to (22.32).
The justification for the identification of this quantity with the energy is sub-
stantially strengthened by the canonical (Hamiltonian) ADM analysis of general
relativity discussed in section 20, in particular in section 20.12. Indeed, in section
20.12 we deduced another candidate expression for the energy from the boundary
term in the ADM Hamiltonian, namely (20.190)
$$E_{ADM} = -\frac{1}{8\pi G_N}\oint_{S} d^2x\,\sqrt{s}\,(k_S - k_{S_0}) \qquad (22.36)$$
At first sight, beyond the fact that both are expressed as integrals over a 2-sphere at
infinity and that both depend in some way on a choice of reference (background)
metric, these two expressions appear to have little in common. However, it is
possible to show that (22.33) is in fact equal to (22.36) provided that the metrics on
$S$ induced by the metrics $g_{ik}$ (appearing in (22.33)) and $g^0_{ik}$ (implicitly appearing
in (22.36)) are the same. The proof of this statement I am aware of relies on a
convenient choice of coordinate system and is therefore not particularly insightful
per se and will not be given here.62
62 For this proof, see e.g. S. Hawking, G. Horowitz, The Gravitational Hamiltonian, Action, Entropy,
and Surface Terms, arXiv:gr-qc/9501014, or Exercise 4.5.7 of E. Poisson, A Relativist's Toolkit
(modelled on the cited article).
2. The second expression is actually a special case of the Komar charges briefly
mentioned in section 12.7, here applied to the (asymptotic) timelike Killing vector
$\partial_t$ (the generator of time-translations, and thus naturally associated with the
energy or Hamiltonian). Indeed, recall that for any Killing vector $K^\mu$ we had
the conserved current $J^\mu = R^\mu_{\ \nu}K^\nu$ (12.49), which was itself a divergence of an
anti-symmetric tensor, $J^\mu = \nabla_\nu A^{\mu\nu}$, with $A^{\mu\nu} = -A^{\nu\mu} = \nabla^\mu K^\nu$. Thus the
corresponding conserved charge $Q_K(V)$ contained in a volume $V$ can be written
as a surface integral of $\nabla^\mu K^\nu$ over the boundary $\partial V$ of the volume $V$,
$$Q_K(V) \sim \oint_{\partial V} dS_{\mu\nu}\,\nabla^\mu K^\nu . \qquad (22.37)$$
This fixes the normalisation of the Komar charge $Q_K(V)$ (22.37) in this case,
$$E_{Komar}(V) \equiv Q_{K=\partial_t}(V) = -\frac{1}{8\pi G_N}\oint_{\partial V} dS_{\mu\nu}\,\nabla^\mu K^\nu \qquad (22.40)$$
and we can identify (22.31) with the total Komar energy ($V = \Sigma$) associated to
the Killing vector $K = \partial_t$,
$$E_{Komar}(\Sigma) = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_i\,\partial_i h_{00} = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_{\mu\nu}\,\nabla^\mu K^\nu . \qquad (22.41)$$
As our primary litmus test, in section 23.8 we will apply these expressions to the
Schwarzschild metric, the unique spherically symmetric asymptotically flat solution of
the vacuum Einstein equations. In particular, it therefore describes the gravitational
field in the exterior of a star, and it depends on a single parameter $m$, related to the
mass of the star by $m = G_N M/c^2$, and it is then of interest to see what the ADM
and Komar energies have to say about this. Suffice it to say here that reassuringly one
indeed finds $E_{ADM} = E_{Komar} = M$.
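In the weak-field Newtonian setting this result can be illustrated directly from the surface integral (22.31). The following is a minimal sketch, assuming the Newtonian perturbation $h_{00} = -2\phi = 2G_N M/r$ and units with $G_N = c = 1$; the function names and the sample mass are hypothetical.

```python
import math

G_N = 1.0  # units with G_N = c = 1 (an assumption of this sketch)


def h00_newtonian(r, mass):
    """h_00 = -2 phi for the Newtonian potential phi = -G_N M / r."""
    return 2.0 * G_N * mass / r


def surface_energy(h00, radius, step=1e-3):
    """Evaluate E = -(1/(8 pi G_N)) * (surface integral of d_i h_00) on a sphere.

    For a spherically symmetric h_00(r) the surface integral is 4 pi R^2
    times the radial derivative, taken here by a central difference.
    """
    dh_dr = (h00(radius + step) - h00(radius - step)) / (2.0 * step)
    return -(4.0 * math.pi * radius ** 2) * dh_dr / (8.0 * math.pi * G_N)


mass = 3.0  # hypothetical mass of the source
energy = surface_energy(lambda r: h00_newtonian(r, mass), radius=1.0e3)
print(energy)  # should agree with `mass` up to discretisation error
```

Since $h_{00}$ falls off like $1/r$, the result does not depend on the radius of the sphere, in agreement with the statement that only the asymptotics of the field matter.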
In the same way one can also introduce various notions of momentum or angular mo-
mentum of an isolated system (the latter for instance being non-zero for the Kerr metric
(29.3) describing a rotating star or black hole), but we will not pursue this here.
22.5 Wave Equations and Gauge Conditions in Maxwell Theory
We will now abandon the assumption of static non-relativistic fields and return to the
general weak-field linearised Einstein equations (22.18). In order to understand how to
proceed from there, in this section we will briefly recall in a condensed way the analogous
steps in the case of Maxwell theory.
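1. The Maxwell equations (in a Minkowski background) are already linear (no need
to linearise) and read
$$\partial_\mu F^{\mu\nu} = -J^\nu . \qquad (22.42)$$
In terms of the vector potential $A_\mu$, with $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$, they become
$$\Box A_\nu - \partial_\nu(\partial_\mu A^\mu) = -J_\nu . \qquad (22.43)$$
2. These equations are invariant under the gauge transformation
$$A_\mu \to A_\mu + \partial_\mu V \qquad (22.44)$$
(in particular, the field strengths $F_{\mu\nu}$ are invariant) which allows one to choose
for instance the Loren(t)z gauge condition63
$$\partial_\mu A^\mu = 0 . \qquad (22.45)$$
Indeed, if $A_\mu$ does not satisfy this condition, one can choose $V$ to be a solution of
$$\Box V = -\partial_\mu A^\mu , \qquad (22.46)$$
then
$$\partial_\mu(A^\mu + \partial^\mu V) = 0 . \qquad (22.47)$$
With this choice of gauge the Maxwell equations become decoupled wave equations,
$$\Box A_\mu = -J_\mu , \qquad (22.48)$$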
63 Credit for this should really go to the Danish physicist Ludvig V. Lorenz (who used this condition in
its Lorent(!)z non-invariant form already in 1867, just 3 years after the publication of Maxwell's treatise)
rather than the more famous Dutch physicist Hendrik A. Lorentz. See J. Jackson's Electrodynamics
(note at the end of chapter 6).
and can now be straightforwardly solved in terms of Green functions etc. One
particular solution to the inhomogeneous equation is the retarded potential
$$A_\mu(t, \vec x) = \frac{1}{4\pi}\int d^3x'\,\frac{J_\mu(t - |\vec x - \vec x'|, \vec x')}{|\vec x - \vec x'|} . \qquad (22.49)$$
3. The homogeneous equation, i.e. the vacuum Maxwell equations, can now be solved
in terms of plane waves,
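$$A_\mu = \epsilon_\mu\, e^{ik_\alpha x^\alpha} \quad\text{with}\quad k^\alpha k_\alpha = 0 \qquad (22.50)$$
(or wave packets constructed from them), and the Lorenz gauge constrains the
polarisation vector by
$$k^\mu\epsilon_\mu = 0 . \qquad (22.51)$$
For a wave travelling in the $x^3$-direction, this implies for the polarisation vector
$$k^\mu = (\omega, 0, 0, k^3 = \omega) \;\Rightarrow\; \epsilon_3 = -\epsilon_0 . \qquad (22.52)$$
4. The Lorenz gauge condition is preserved by the residual gauge transformations
$$A'_\mu = A_\mu + \partial_\mu\Lambda , \quad \Box\Lambda = 0 , \qquad (22.53)$$
and a suitable choice of $\Lambda = \Lambda_0\, e^{ik_\alpha x^\alpha}$
now completely fixes this residual gauge invariance. Under this residual gauge
transformation the polarisation vector transforms as
$$\epsilon'_\mu = \epsilon_\mu + i\Lambda_0 k_\mu , \qquad (22.55)$$
in particular
$$\epsilon'_0 = \epsilon_0 - i\omega\Lambda_0 . \qquad (22.56)$$
Thus choosing $\Lambda_0 = \epsilon_0/i\omega$, one has $\epsilon'_0 = 0$, implying $\epsilon'_3 = 0$, and one is left with
the polarisation vector
$$\epsilon'_\mu = (0, \epsilon_1, \epsilon_2, 0) \qquad (22.57)$$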
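The residual gauge fixing just described is simple enough to be spelled out explicitly. The following is a minimal sketch, assuming a wave vector $k^\mu = (\omega, 0, 0, \omega)$, the signature $(-,+,+,+)$, and the symbol $\Lambda_0$ for the amplitude of the residual gauge parameter; the function name is hypothetical.

```python
def fix_residual_gauge(eps, omega, tol=1e-12):
    """Remove the unphysical components of a photon polarisation vector.

    eps   : (eps_0, eps_1, eps_2, eps_3), lower indices, in Lorenz gauge
    omega : frequency, with k^mu = (omega, 0, 0, omega)
    Returns eps'_mu = eps_mu + i Lambda_0 k_mu with Lambda_0 = eps_0 / (i omega).
    """
    # Lorenz gauge for this wave vector: k^mu eps_mu = omega (eps_0 + eps_3) = 0
    if abs(eps[0] + eps[3]) > tol:
        raise ValueError("polarisation vector is not in Lorenz gauge")
    k_lower = (-omega, 0.0, 0.0, omega)  # k_mu = eta_{mu nu} k^nu
    lambda0 = eps[0] / (1j * omega)
    return tuple(e + 1j * lambda0 * k for e, k in zip(eps, k_lower))
```

Applied to any polarisation vector satisfying (22.52), this should return a vector with vanishing 0- and 3-components and unchanged transverse components.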
22.6 Linearised Gravity: Gauge Invariance and Coordinate Choices
We now proceed analogously in the case of linearised gravity. First of all we need to
understand the gauge invariance (or the counterpart of gauge invariance) in the present
case.
The original non-linear Einstein equations have a local invariance consisting of gen-
eral coordinate transformations. What remains of general coordinate invariance in the
linearised approximation are, naturally, linearised general coordinate transformations.
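Indeed, $h_{\mu\nu}$ and
$$h'_{\mu\nu} = h_{\mu\nu} + L_V\eta_{\mu\nu} \qquad (22.58)$$
represent the same physical perturbation, where $L_V\eta_{\mu\nu} = \partial_\mu V_\nu + \partial_\nu V_\mu$ is the Lie
derivative of the Minkowski metric along the vector field $V^\mu$. For example, under
$$\delta_V h_{\mu\nu} = L_V\eta_{\mu\nu} \qquad (22.60)$$
the linearised Riemann tensor transforms into the Lie derivative of the Riemann tensor
for the Minkowski metric,
$$\delta_V R^{(1)}_{\mu\nu\rho\sigma} = L_V R_{\mu\nu\rho\sigma}(\eta) = 0 . \qquad (22.61)$$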
However, in contrast to the Lagrangian of Maxwell theory, say, the Lagrangian (22.12)
for linearised gravity is not strictly gauge invariant, but only invariant up to a total
derivative.
Given this gauge invariance of the linearised theory, for explicit calculations it is useful
to make a particular gauge choice and, as in Maxwell theory, a good choice of gauge can
simplify things considerably (and a bad choice of gauge can have the opposite effect).
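A convenient choice is the condition
$$g^{\mu\nu}\Gamma^\lambda_{\ \mu\nu} = 0 . \qquad (22.62)$$
It is called the harmonic gauge condition, or Fock, or de Donder gauge condition (even
though harmonic coordinates were used extensively by Einstein himself until just before
the discovery of the final, truly generally covariant, formulation of general relativity).
The name harmonic derives from the fact that in this gauge the coordinate functions $x^\mu$
are harmonic:
$$\Box x^\mu \equiv g^{\nu\lambda}\nabla_\nu\partial_\lambda x^\mu = -g^{\nu\lambda}\Gamma^\mu_{\ \nu\lambda} , \qquad (22.63)$$
and thus
$$\Box x^\mu = 0 \;\Leftrightarrow\; g^{\nu\lambda}\Gamma^\mu_{\ \nu\lambda} = 0 . \qquad (22.64)$$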
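In the linearised theory, this gauge condition becomes
$$\partial_\mu h^\mu_{\ \lambda} - \tfrac12\partial_\lambda h = 0 . \qquad (22.65)$$
The gauge parameter $V_\mu$ which will achieve this is the solution to the equation
$$\Box V_\lambda = -(\partial_\mu h^\mu_{\ \lambda} - \tfrac12\partial_\lambda h) . \qquad (22.66)$$
Indeed, with this choice one has
$$\partial_\mu(h^\mu_{\ \lambda} + \partial^\mu V_\lambda + \partial_\lambda V^\mu) - \tfrac12\partial_\lambda(h + 2\partial_\mu V^\mu) = 0 . \qquad (22.67)$$
Note for later that, as in Maxwell theory, this gauge choice does not necessarily fix
the gauge completely. Any transformation $x^\mu \to x^\mu + \xi^\mu$ with $\Box\xi^\mu = 0$ will leave the
harmonic gauge condition invariant.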
Now let us use this gauge condition in the linearised Einstein equations. In this gauge
they simplify somewhat to
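$$\Box h_{\mu\nu} - \tfrac12\eta_{\mu\nu}\Box h = -16\pi G_N T^{(0)}_{\mu\nu} . \qquad (22.68)$$
In particular, the vacuum equations (or the equations in a source-free region of
space-time) are just
$$T^{(0)}_{\mu\nu} = 0 \;\Rightarrow\; \Box h_{\mu\nu} = 0 \qquad (22.69)$$
(as the trace of (22.68) then implies $\Box h = 0$), so that the linearised vacuum equations
together with the gauge condition read
$$\Box h_{\mu\nu} = 0 , \quad \partial_\mu h^\mu_{\ \lambda} - \tfrac12\partial_\lambda h = 0 . \qquad (22.70)$$
It is convenient to introduce the combination
$$\bar h_{\mu\nu} = h_{\mu\nu} - \tfrac12\eta_{\mu\nu}h . \qquad (22.71)$$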
This is also commonly known as the trace reversed perturbation, because in 4 spacetime
dimensions (but only there) one has
$$\bar h^\mu_{\ \mu} = -h^\mu_{\ \mu} . \qquad (22.72)$$
Note, as an aside, that with this notation and terminology the Einstein tensor (again
in 4 spacetime dimensions only) is the trace reversed Ricci tensor,
$$\bar R_{\mu\nu} = R_{\mu\nu} - \tfrac12 g_{\mu\nu}R = G_{\mu\nu} . \qquad (22.73)$$
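The two defining properties of the trace reversal in 4 dimensions (the trace changes sign, and reversing twice gives back the original perturbation) are easily checked. The following is a minimal sketch in pure Python for a perturbation given by its components $h_{\mu\nu}$; the function names are hypothetical.

```python
ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta, signature (-,+,+,+)


def trace(h):
    """h = eta^{mu nu} h_{mu nu} for a 4x4 nested list h_{mu nu}."""
    return sum(ETA_DIAG[m] * h[m][m] for m in range(4))


def trace_reverse(h):
    """Return h_bar_{mu nu} = h_{mu nu} - (1/2) eta_{mu nu} h."""
    t = trace(h)
    return [[h[m][n] - (0.5 * ETA_DIAG[m] * t if m == n else 0.0)
             for n in range(4)] for m in range(4)]
```

Because the trace of $\eta_{\mu\nu}$ is 4, the trace of the output is $h - 2h = -h$; in $D$ dimensions the same operation would give $(1 - D/2)\,h$ instead.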
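In terms of $\bar h_{\mu\nu}$, the linearised Einstein equations and the harmonic gauge condition (in
any dimension) are just
$$\Box\bar h_{\mu\nu} = -16\pi G_N T^{(0)}_{\mu\nu} , \quad \partial_\mu\bar h^\mu_{\ \nu} = 0 . \qquad (22.74)$$
This way of writing the linearised Einstein equations sharpens the analogy with the
Maxwell equations in the Lorenz gauge:
the Lorenz gauge $\partial_\mu A^\mu = 0$ decouples the Maxwell equations for $A_\mu$, which in this
gauge read $\Box A_\mu = -J_\mu$;
the gauge condition $\partial_\mu\bar h^\mu_{\ \nu} = 0$ decouples the linearised Einstein equations for the
variables $\bar h_{\mu\nu}$, which in this gauge read $\Box\bar h_{\mu\nu} = -16\pi G_N T^{(0)}_{\mu\nu}$.
The homogeneous equation is the linearised vacuum Einstein equation in the harmonic
gauge,
$$\Box\bar h_{\mu\nu} = 0 . \qquad (22.76)$$
Clearly this wave equation has plane wave solutions of the form
$$\bar h_{\mu\nu} = \epsilon_{\mu\nu}\, e^{ik_\alpha x^\alpha} , \qquad (22.77)$$
where $\epsilon_{\mu\nu}$ is a constant, symmetric polarisation tensor and $k^\alpha$ is a constant wave vector
satisfying $k_\alpha k^\alpha = 0$.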
Thus plane waves are solutions to the linearised equations of motion and the Einstein
equations predict the existence of gravitational waves travelling along null geodesics (at
the speed of light). The timelike component of the wave vector is often referred to as
the frequency $\omega$ of the wave, and we can write $k^\mu = (\omega, k^i)$. Plane waves are of course
not the most general solutions to the wave equations but any solution can be written
as a superposition of plane wave solutions (wave packets).
So far, we have ten parameters $\epsilon_{\mu\nu}$ and four parameters $k^\mu$ to specify the wave, but
many of these are spurious, i.e. can be eliminated by using the freedom to perform
linearised coordinate transformations and Lorentz rotations.
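First of all, the harmonic gauge condition implies that
$$\partial_\mu\bar h^{\mu\nu} = 0 \;\Rightarrow\; k^\mu\epsilon_{\mu\nu} = 0 . \qquad (22.78)$$
Moreover, under a linearised coordinate transformation one has
$$\bar h_{\mu\nu} \to \bar h_{\mu\nu} + \partial_\mu V_\nu + \partial_\nu V_\mu - \eta_{\mu\nu}\partial_\lambda V^\lambda , \quad \partial^\mu\bar h_{\mu\nu} \to \partial^\mu\bar h_{\mu\nu} + \Box V_\nu \qquad (22.79)$$
so that the gauge condition is invariant precisely under linearised coordinate transformations
with $\Box V_\nu = 0$. Taking the solution of this equation to be of the form
$$V_\mu = v_\mu\, e^{ik_\alpha x^\alpha} , \qquad (22.80)$$
the polarisation tensor transforms as
$$\epsilon_{\mu\nu} \to \epsilon_{\mu\nu} + i(k_\mu v_\nu + k_\nu v_\mu) - i\eta_{\mu\nu}k^\lambda v_\lambda . \qquad (22.81)$$
One can now choose the $v_\mu$ in such a way (see the example below) that the new polarisation
tensor satisfies $k^\mu\epsilon_{\mu\nu} = 0$ (as before) as well as
$$\epsilon_{\mu 0} = 0 , \quad \epsilon^\mu_{\ \mu} = 0 . \qquad (22.82)$$
All in all, we appear to have nine conditions on the polarisation tensor $\epsilon_{\mu\nu}$ but as both
(22.78) and the first of (22.82) imply $k^\mu\epsilon_{\mu 0} = 0$, only eight of these are independent.
Therefore, there are two independent polarisations for a gravitational wave.
For the perturbation itself, these conditions read
$$\partial^\mu\bar h_{\mu\nu} = 0 , \quad \bar h_{\mu 0} = 0 , \quad \bar h^\mu_{\ \mu} = 0 . \qquad (22.83)$$
This is known as the transverse traceless gauge, and a field satisfying this gauge is
frequently denoted by $h^{TT}_{\mu\nu}$.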
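The choice of $v_\mu$ that achieves this can be written down in closed form for a wave travelling in the $x^3$-direction. The following is a minimal sketch, assuming $k^\mu = (\omega, 0, 0, \omega)$, lower-index components $\epsilon_{\mu\nu}$ and $v_\mu$, and the signature $(-,+,+,+)$; the function names are hypothetical, and the explicit solution for $v_\mu$ is my own working from (22.81), not a formula quoted from the text.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def harmonic_polarisation(e00, e01, e02, e11, e12, e22):
    """Most general symmetric eps_{mu nu} with k^mu eps_{mu nu} = 0
    for k^mu = (omega, 0, 0, omega), i.e. eps_{3 nu} = -eps_{0 nu}."""
    entries = {(0, 0): e00, (0, 1): e01, (0, 2): e02, (0, 3): -e00,
               (1, 1): e11, (1, 2): e12, (1, 3): -e01,
               (2, 2): e22, (2, 3): -e02, (3, 3): e00}
    eps = [[0j] * 4 for _ in range(4)]
    for (m, n), value in entries.items():
        eps[m][n] = eps[n][m] = complex(value)
    return eps


def to_tt_gauge(eps, omega):
    """Apply eps -> eps + i(k v + v k) - i eta (k.v) with v chosen such that
    eps_{mu 0} = 0 and the trace vanishes."""
    k_low = [-omega, 0.0, 0.0, omega]                 # k_mu
    tr = -eps[0][0] + eps[1][1] + eps[2][2] + eps[3][3]
    diff = eps[0][0] / (1j * omega)                   # v_0 - v_3, from eps'_00 = 0
    total = tr / (2j * omega)                         # v_0 + v_3, from zero trace
    v = [0.5 * (total + diff),
         eps[0][1] / (1j * omega),                    # from eps'_01 = 0
         eps[0][2] / (1j * omega),                    # from eps'_02 = 0
         0.5 * (total - diff)]
    k_dot_v = omega * (v[0] + v[3])                   # k^lambda v_lambda
    return [[eps[m][n] + 1j * (k_low[m] * v[n] + k_low[n] * v[m])
             - 1j * ETA[m][n] * k_dot_v
             for n in range(4)] for m in range(4)]
```

Starting from the six free parameters of a polarisation tensor in harmonic gauge, the output has only the components $\epsilon_{11} = -\epsilon_{22}$ and $\epsilon_{12} = \epsilon_{21}$ left, which is the counting of two independent polarisations given above.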
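For example, let us consider a wave travelling in the $x^3$-direction,
$$k^\mu = (\omega, 0, 0, k^3) = (\omega, 0, 0, \omega) . \qquad (22.84)$$
Then $\epsilon_{\mu 0} = 0$ and $k^\mu\epsilon_{\mu\nu} = 0$ imply $\epsilon_{3\nu} = 0$, so that the only non-zero components are
the $\epsilon_{ab}$ with $a, b = 1, 2$, subject to $\epsilon_{11} + \epsilon_{22} = 0$. Since
$$\bar h^\mu_{\ \mu} = 0 \;\Rightarrow\; \bar h_{\mu\nu} = h_{\mu\nu} , \qquad (22.85)$$
we have deduced that the metric describing a gravitational wave travelling in the
$x^3$-direction can always be put into the form
$$ds^2 = -dt^2 + (\delta_{ab} + h_{ab})\,dx^a dx^b + (dx^3)^2 \qquad (22.86)$$
with $h_{ab} = h_{ab}(t - x^3)$. This neatly encodes and describes the distortion of the
space-time geometry in the directions transverse to the gravitational wave.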
Remarks:
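1. In terms of the lightcone coordinates
$$u = t - x^3 , \quad v = t + x^3 , \qquad (22.87)$$
the metric (22.86) takes the form
$$ds^2 = -du\,dv + (\delta_{ab} + h_{ab}(u))\,dx^a dx^b . \qquad (22.88)$$
By abandoning the assumption that $h_{ab}(u)$ be small, one can look for (and easily
find) solutions of the full non-linear Einstein equations of the form
$$ds^2 = -du\,dv + g_{ab}(u)\,dx^a dx^b . \qquad (22.89)$$
These are known as exact gravitational plane waves, and are discussed in some
detail in section 42.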
2. This analysis was rather evidently independent of the dimension. In D dimensions
the polarisation states of a graviton are described by a symmetric, transverse, and
traceless tensor $h_{ab}$ where $a, b = 1, \ldots, D-2$. Thus the number of physical
polarisation states of a graviton in $D$ dimensions is
$$\#[h_{ab}] = \frac{(D-2)(D-1)}{2} - 1 = \frac{D(D-3)}{2} . \qquad (22.90)$$
Note that this gives zero for D = 3, in agreement with the fact, noted before,
that in 3 space-time dimensions there is no gravitational vacuum dynamics since
vanishing of the Ricci tensor is equivalent to vanishing of the Riemann tensor.
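The counting (22.90) can be made concrete with a one-line function. This is a minimal sketch; the function name is hypothetical.

```python
def graviton_polarisations(dim):
    """Number of independent components of a symmetric traceless tensor h_ab
    with a, b = 1, ..., dim - 2 (spacetime dimension dim >= 3)."""
    transverse = dim - 2
    # symmetric (dim-2) x (dim-2) matrix, minus one trace condition
    return transverse * (transverse + 1) // 2 - 1
```

For instance, this gives 0 in 3 dimensions and 2 in 4 dimensions, as stated in the text, and agrees with $D(D-3)/2$ in general.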
To determine the physical effect of a gravitational wave racing by, we cannot just look
at the gravitational field (22.86) at a point (by the equivalence principle), i.e. we cannot
detect the presence of such a wave (ultra-)locally. However, we can consider its influence
on the relative motion of nearby particles. In other words, we look at the geodesic
deviation equation (7.36).
Consider a family of nearby particles described by the velocity field $u^\mu(x)$ and separation
(deviation) vector $S^\mu(x)$. Then the change of the deviation vector along the flow lines
of the velocity field is determined by
$$(D_\tau)^2 S^\mu = R^\mu_{\ \nu\rho\sigma}u^\nu u^\rho S^\sigma . \qquad (22.91)$$
We consider the situation where the test particles are initially, in the absence of the
gravitational wave, at rest, $u^\mu = (1, 0, 0, 0)$. Then the gravitational wave will lead, to
lowest order in the perturbation $h_{\mu\nu}$, to a 4-velocity
$$u^\mu = (1, 0, 0, 0) + O(h) . \qquad (22.92)$$
However, because the Riemann tensor is already of order $h$, to lowest order the right
hand side of the geodesic deviation equation reduces to
$$R^{(1)\mu}_{\ \ \ \ \ 00\sigma} = \tfrac12\partial_0\partial_0 h^\mu_{\ \sigma} \qquad (22.93)$$
(because $h_{\mu 0} = 0$). On the other hand, to lowest order the left hand side is just the
ordinary time derivative. Thus the geodesic deviation equation becomes (an overdot
denoting a $t$-derivative)
$$\ddot S^\mu = \tfrac12\ddot h^\mu_{\ \sigma}S^\sigma . \qquad (22.94)$$
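[Figure: axes $x^1$, $x^2$]
In particular, since $h_{3\sigma} = 0$ for a wave travelling in the $x^3$-direction,
the particles are only disturbed in directions perpendicular to the wave. The movement
of the particles in the 1-2 plane is then governed (for a plane wave of frequency $\omega$) by
$$\ddot S^a = \tfrac12\ddot h^a_{\ b}S^b = -\tfrac{\omega^2}{2}h^a_{\ b}S^b , \qquad (22.95)$$
and we can consider separately the two cases (1) $\epsilon_{12} = 0$ and (2) $\epsilon_{11} = -\epsilon_{22} = 0$.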
Figure 10: Effect of a gravitational wave with polarisation $\epsilon_{12}$ moving in the $x^3$-direction,
on a ring of test particles in the $x^1$-$x^2$-plane.
2. If, on the other hand, $\epsilon_{11} = 0$ but $\epsilon_{12} = \epsilon_{21} \neq 0$, then the lowest order solution is
$$S^1(t) = S^1(0) + \tfrac12\epsilon_{12}\, e^{-i\omega(t - x^3)} S^2(0)$$
$$S^2(t) = S^2(0) + \tfrac12\epsilon_{12}\, e^{-i\omega(t - x^3)} S^1(0) . \qquad (22.100)$$
This time the displacement in the $x^1$-direction is governed by the original displacement
in the $x^2$-direction and vice-versa, and the ring of particles will bounce in
the shape of a $\times$ ($\epsilon_{12} = \epsilon_\times$) - see Figure 10.
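The behaviour of the ring of test particles can be tabulated from the lowest order solution. The following is a minimal sketch, assuming the real part of the plane wave at $x^3 = 0$, i.e. $h_{ab}(t) = \epsilon_{ab}\cos\omega t$, and assuming for the first polarisation the analogous solution $S^a(t) = S^a(0) + \tfrac12 h^a_{\ b}(t)S^b(0)$ with $\epsilon_{22} = -\epsilon_{11}$; the function and parameter names are hypothetical.

```python
import math


def displaced_ring(eps_plus, eps_cross, omega, t, n_particles=8, radius=1.0):
    """Positions at time t of particles initially on a ring in the x1-x2-plane.

    eps_plus  plays the role of eps_11 = -eps_22,
    eps_cross plays the role of eps_12 = eps_21.
    """
    phase = math.cos(omega * t)
    h11, h12 = eps_plus * phase, eps_cross * phase  # h22 = -h11, h21 = h12
    positions = []
    for j in range(n_particles):
        angle = 2.0 * math.pi * j / n_particles
        s1, s2 = radius * math.cos(angle), radius * math.sin(angle)
        positions.append((s1 + 0.5 * (h11 * s1 + h12 * s2),
                          s2 + 0.5 * (h12 * s1 - h11 * s2)))
    return positions
```

With only `eps_plus` switched on the ring is stretched along the $x^1$-axis while it is squeezed along the $x^2$-axis (and vice versa half a period later); with only `eps_cross` switched on the same happens along the diagonals, which is the $\times$ pattern of Figure 10.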
These solutions display the characteristic behaviour of quadrupole radiation, and this
is something that we might have anticipated on general grounds. First of all, we know
from Birkhoff's theorem that there can be no monopole (s-wave) radiation. Moreover,
dipole radiation is due to oscillations of the center of charge. While this is certainly
possible for electric charges, an oscillation of the center of mass would violate momentum
conservation and is therefore ruled out. Thus the lowest possible mode of gravitational
radiation is quadrupole radiation, just as we have found.
Now that we have found the solutions to the vacuum equations, we should include
sources and study the production of gravitational waves, characterise the type of radia-
tion that is emitted, estimate the radiated energy etc. However, this is quite a delicate
and both technically and conceptually quite challenging subject, and I will just develop
this to the extent that the quadrupole property of the radiation becomes plausible.64
64 See e.g. S. Weinberg, Gravitation and Cosmology, B. Schutz, A first course in general relativity, or
S. Carroll, Spacetime and Geometry, and of course C. Misner, K. Thorne, J. Wheeler, Gravitation for
detailed discussions.
In order to study the production of gravitational waves, we need to include sources, i.e.
we need to go back to the retarded solution (22.75)
$$\bar h_{\mu\nu}(t, \vec x) = 4G_N\int d^3y\,\frac{T^{(0)}_{\mu\nu}(t - |\vec x - \vec y|, \vec y)}{|\vec x - \vec y|} . \qquad (22.102)$$
At large distances, and if the source does not oscillate too rapidly (the wavelength
should be much larger than the size of the source), one can approximate this by
$$\bar h_{\mu\nu}(t, \vec x) \approx \frac{4G_N}{r}\int d^3y\, T^{ret}_{\mu\nu}(t, \vec y) , \qquad (22.103)$$
where $r = |\vec x|$ and the retarded source is
$$T^{ret}_{\mu\nu}(t, \vec y) = T^{(0)}_{\mu\nu}(t - r, \vec y) . \qquad (22.104)$$
This is the gravitational analogue of the dipole approximation to the multipole expan-
sion in electrodynamics (and, as we will see, here this turns out to be a quadrupole
approximation).
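Next, since $T^{(0)}_{\mu\nu}$ is conserved, also
$$\bar h_{0\nu} \sim \int d^3y\, T^{(ret)}_{0\nu} \qquad (22.105)$$
is conserved, i.e. time-independent (it is the total energy and momentum of the source),
and therefore does not describe radiation. For the spatial components one can use the
conservation law $\partial_\mu T^{(0)\mu\nu} = 0$ and two integrations by parts to obtain the identity
$$\int d^3y\, T^{ik} = \tfrac12\partial_0^2\int d^3y\, T^{00}\,y^i y^k ,$$
and we thus have
$$\bar h_{ik}(t, \vec x) \approx \frac{2G_N}{r}\,\ddot Q^{ret}_{ik} , \qquad (22.108)$$
where (cf. (6.54))
$$Q^{ret}_{ik}(t) = \int d^3x\, \rho^{ret}\,x^i x^k \qquad (22.109)$$
is the quadrupole moment tensor of the retarded energy density $T^{ret}_{00} = \rho^{ret}$. Thus, if
the source has a time-dependence
$$\rho(t) \sim e^{-i\omega t} , \qquad (22.110)$$
say (of course, one should in the end take real superpositions of such modes), then
$$\bar h_{ik}(t, r) \approx -2G_N\,\omega^2 Q_{ik}\,\frac{e^{-i\omega(t - r)}}{r} \qquad (22.111)$$
(with $Q_{ik}$ the constant amplitude of the quadrupole moment)
clearly describes an outgoing spherical wave.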
As noted before, the retarded solution is automatically in the harmonic gauge, but it
is not yet in the transverse traceless gauge. Transforming the above solution to the
transverse traceless gauge, one finds that the (transverse, traceless) components
$$\bar h^{TT}_{ab} = h^{TT}_{ab} \qquad (22.112)$$
are naturally expressed not in terms of the quadrupole moments $Q_{ik}$ but in terms of
the so-called reduced (traceless) quadrupole moments
$$\mathcal{Q}^{ret}_{ik} = \int d^3x\, \rho^{ret}\,(x^i x^k - \tfrac13\delta^{ik}r^2) = Q^{ret}_{ik} - \tfrac13\delta_{ik}(Q^{ret})^j_{\ j} . \qquad (22.113)$$
These formulae can now in principle be applied to various specific situations of interest
by specifying the source term appropriately.
Finally, one quantity of particular interest is of course the energy radiated away by the
source. However, as discussed in section 21.6, the notion of gravitational energy or
energy of the gravitational field is not in general well defined and raises numerous
conceptual and technical issues. One might perhaps have hoped that these issues can
be completely side-stepped in the linearised theory we are dealing with here, which is
after all much more like a standard field theory in Minkowski space. And indeed, several
strategies are available, and they all lead to expressions for the energy-density which
are of the standard form quadratic in the derivatives of the fields. For example one
can
expand the Einstein equations not only to linear but to quadratic order in the
fluctuations $h_{\mu\nu}$ and interpret the quadratic terms as the gravitational contribution
to the energy-momentum tensor,
...
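E.g. in the transverse traceless gauge the Fierz-Pauli action reduces to a standard
quadratic action
$$L = -\tfrac12(\partial_\alpha h_{11}\partial^\alpha h_{11} + \partial_\alpha h_{12}\partial^\alpha h_{12}) . \qquad (22.114)$$
For $h_{ab} = h_{ab}(t - x^3)$, say, this gives rise to an energy density and energy flux
(with $\mathcal{Q}$ the reduced quadrupole moment (22.113))
$$\Theta_{00} = -\Theta_{03} = (\dot h_{11})^2 + (\dot h_{12})^2 \sim r^{-2}\,(\dddot{\mathcal{Q}}^{ret})^2 . \qquad (22.115)$$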
On the basis of such considerations one might expect or anticipate the total radiated
energy to be proportional to something like
$$dE/dt \sim (\dddot{\mathcal{Q}}^{ret})^2 . \qquad (22.116)$$
However, dealing with quadratic terms in a linearised theory is somewhat dodgy and not
strictly speaking internally consistent. As a consequence, equally plausible strategies
may not necessarily lead to equivalent results. Nevertheless, there appears to be some
consensus that a formula like this is indeed correct, and more specifically that (with
certain approximations and averaging) one has the remarkable formula (in terms of the
reduced quadrupole moments (22.113))
$$\frac{dE}{dt} = \frac{G_N}{5}\,(\dddot{\mathcal{Q}}^{ret})_{ik}(\dddot{\mathcal{Q}}^{ret})^{ik} . \qquad (22.117)$$
While this formula (with its 3rd derivative squared) may look unfamiliar, it is precisely
analogous to the corresponding formula for the radiated power of an electric quadrupole
in Maxwell theory (also proportional to $\dddot Q^2$). The main difference between gravitational
and electro-magnetic radiation lies in the fact that in the Maxwell case the leading
(lowest multipole) contribution arises from dipole radiation, while in the gravitational
case the leading contribution is quadrupole radiation.
I will conclude this section with some very general comments on the detection of gravi-
tational waves.
Alternatively, the particles need not be free but could be connected by a solid piece
of material. Then gravitational tidal forces will stress the material. If the resonant
frequency of this antenna equals the frequency of the gravitational wave, this should
lead to a detectable oscillation. This is the principle of the so-called Weber detectors
or Weber bars (1966-. . . ). While fine in principle, in practice gravitational waves are
extremely weak. To the best of my knowledge, such detectors have not produced con-
clusive results so far, and other detection techniques are favoured in modern generations
of detectors.
More sensitive modern experiments are not fine-tuned to a particular resonant frequency
but can in principle detect a continuous range of frequencies. These use detectors based
on huge laser Michelson interferometers (arms several kilometers long), e.g. LIGO (Laser
Interferometer Gravitational Wave Observatory) and VIRGO. These or their upgrades,
or the space-based LISA (Laser Interferometer Space Antenna), are widely expected to
have reached sufficient sensitivity to finally directly detect gravitational waves in the
next couple of years.65
However, in spite of the absence of direct evidence for gravitational waves, reassuringly
there is indirect (and very compelling) evidence for gravitational waves. A binary system
of stars rotating around its common center of mass should radiate gravitational waves
(much like electro-magnetic synchrotron radiation). For two stars of equal mass $M$ at
distance $2r$ from each other, the prediction of General Relativity is that the power
radiated by the binary system according to the general formula (22.117) is
$$P = dE/dt = \frac{2}{5}\,\frac{G_N^4 M^5}{r^5} . \qquad (22.118)$$
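It is instructive to see how (22.118) follows from (22.117). The following is a minimal sketch, assuming a Newtonian circular orbit (two point masses $M$, each on a circle of radius $r$, so that Kepler's law gives $\Omega^2 = G_N M/4r^3$) and units with $G_N = c = 1$; the function names are hypothetical. The trace of $Q_{ik}$ is constant for this orbit, so the third derivatives of the reduced and unreduced quadrupole moments agree.

```python
import math

G_N = 1.0  # units with G_N = c = 1, as in the text


def orbital_frequency(mass, r):
    """Kepler frequency for two equal masses, each on a circle of radius r."""
    return math.sqrt(G_N * mass / (4.0 * r ** 3))


def quadrupole_third_derivative(mass, r, t):
    """Third time derivative of Q_ik = sum over both masses of M x_i x_k,
    for positions +-r (cos(Omega t), sin(Omega t), 0)."""
    omega = orbital_frequency(mass, r)
    amp = mass * r ** 2 * (2.0 * omega) ** 3
    s, c = math.sin(2.0 * omega * t), math.cos(2.0 * omega * t)
    return [[amp * s, -amp * c, 0.0],
            [-amp * c, -amp * s, 0.0],
            [0.0, 0.0, 0.0]]


def radiated_power(mass, r, t=0.0):
    """Quadrupole formula (22.117): (G_N / 5) times the sum of the squares."""
    q3 = quadrupole_third_derivative(mass, r, t)
    return G_N / 5.0 * sum(q3[i][k] ** 2 for i in range(3) for k in range(3))


def closed_form_power(mass, r):
    """Closed-form expression (22.118)."""
    return 0.4 * G_N ** 4 * mass ** 5 / r ** 5
```

For a circular orbit the instantaneous power is time-independent, so no averaging over a period is needed here, and the two functions should agree for any `t`.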
This energy loss has actually been observed. In 1974, Hulse and Taylor discovered a
binary system, affectionately known as PSR1913+16, in which both stars are small neu-
tron stars, both roughly of solar mass, one of them being a pulsar, a rapidly spinning
neutron star. The period of the orbit is only eight hours, and the fact that one of
the stars is a pulsar provides a highly accurate clock with respect to which a change
in the period as the binary loses energy can be measured. The observed value is in
good agreement with the theoretical prediction for loss of energy by gravitational ra-
diation and Hulse and Taylor were rewarded for these discoveries with the 1993 Nobel
Prize. These observations have been confirmed and refined by the discovery and precise
measurements and observations of other (even more extreme) binary systems.
Other situations in which gravitational waves might be either detected directly or in-
ferred indirectly are extreme situations like gravitational collapse (supernovae) or matter
orbiting black holes.
65 If you want (your PC) to contribute to the search for (continuous) gravitational wave sources such
as pulsars, take a look at the Einstein@Home project at http://einstein.phys.uwm.edu/.
D: General Relativity and the Solar System
23 Einstein Equations and Spherical Symmetry
23.1 Introduction
3. the anomalous precession of the perihelion of the orbits of Mercury and Venus,
and calculated the theoretical predictions for these effects. In the meantime, other tests
have also been suggested and performed, for example the time delay of radar echos
passing the sun (the Shapiro effect).66
All these tests have in common that they are carried out in empty space, with
gravitational fields that are to an excellent approximation stationary (time independent)
and isotropic (spherically symmetric). Thus our first aim will have to be to solve the
vacuum Einstein equations under the simplifying assumptions of isotropy and time-
independence. This, as we will see, is
Even though we have decided that we are interested in stationary spherically symmetric
metrics, we still have to determine what we actually mean by this statement. After all, a
metric which looks time-independent in one coordinate system may not do so in another
coordinate system. There are two ways of approaching this issue:
1. One can try to look for a covariant characterisation of such metrics, in terms of
Killing vectors etc. In the present context, this would amount to considering met-
rics which admit four Killing vectors, one of which is timelike, with the remaining
three representing the Lie algebra of the rotation group SO(3).
2. Or one works with preferred coordinates from the outset, in which these symme-
tries are manifest.
66 For detailed discussions of experimental tests of general relativity see e.g. (1) the popular account
C. Will, Was Einstein right?, (2) the detailed monograph C. Will, Theory and Experiment in Gravitational
Physics, (3) the recent review article S. Turyshev, Experimental tests of general relativity,
arXiv:0806.1731 [gr-qc], and the useful resource letter C. Will, Resource Letter PTG-1: Precision
Tests of Gravity, arXiv:1008.0296 [gr-qc].
While the former approach may be conceptually more satisfactory, the latter is much
easier to work with and is hence the one we will adopt.
It is important to recall and realise once again that, precisely because the theory is in-
variant under coordinate transformations, one is allowed to choose whatever coordinate
system is most convenient to perform a calculation (much like Lorentz invariance in
special relativity allows one to prove Lorentz-invariant statements by proving them in
any suitably chosen inertial frame).
This ansatz, depending on the four functions A(r), B(r), C(r), D(r), can still be simpli-
fied a lot by choosing appropriate new time and radial coordinates.
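Then, for a new time coordinate of the form $T = t + \psi(r)$, one has
$$dT^2 = dt^2 + \psi'^2 dr^2 + 2\psi'\, dr\, dt . \qquad (23.3)$$
Thus we can eliminate the off-diagonal term in the metric by choosing $\psi$ to satisfy the
differential equation
$$\frac{d\psi(r)}{dr} = -\frac{C(r)}{A(r)} . \qquad (23.4)$$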
This is tantamount to making the coordinate choice C(r) = 0, so that the metric can
be chosen to have the diagonal form
In the terminology of section 15.4 the metric is then not only stationary but actually
static and thus what we have shown is that a stationary spherically symmetric metric
is automatically static (as already mentioned in the discussion around (15.87)). Thus
in the context of spherical symmetry the two notions coincide and we will not be overly
pedantic about the use of the word stationary versus that of the word static in the
following.
We can also eliminate D(r) by introducing a new radial coordinate R(r) by $R^2 = D(r)r^2$.
Thus we can assume that the line element of a static isotropic metric is of the form
$$ds^2 = -A(r)\,dt^2 + B(r)\,dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) . \qquad (23.6)$$
This is tantamount to making the coordinate choice D(r) = 1, and leads to what is
known as the standard form of a static isotropic metric.
Remarks:
1. Of course this choice is only valid if, or in regions where, D(r) > 0. More generally,
any coordinate choice is a local choice of coordinates, and one needs to be aware
of the possibility that such a choice will not provide a global picture of the space-
time one wishes to describe. This will be amply illustrated by our discussion in
sections 25 and 26.
with $d\vec{x}^2$ the standard Euclidean line element. This is the static spherically
symmetric metric in what is known as isotropic form. The advantage of this
isotropic form of the metric is that one can, as already indicated in (23.7), replace
$dr^2 + r^2 d\Omega^2$ by e.g. the standard metric on $\mathbb{R}^3$ in Cartesian coordinates, or any
other metric on $\mathbb{R}^3$. This is useful when (like many astronomers) one likes to think
of the solar system as being essentially described by flat space, with some choice
of coordinates.
3. As we will see in the course of section 26, other useful coordinate choices that we
might have made (at least with the benefit of hindsight), are (23.1) with D(r) = 1
and either the condition B(r) = 1 (this is what will give rise to the so-called
Painlevé-Gullstrand coordinates, section 26.2), or the condition B(r) = 0 (which
will give rise to Eddington-Finkelstein coordinates, section 26.4). In the former
case the metric takes the form
and has the characteristic property that it is non-diagonal while the metric on the
slices of constant t is the flat Euclidean metric.
For the time being, however, we will mostly be using the metric in the standard form
(23.6), as this coordinate system is well adapted to the description of the exterior of a
normal star. Let us note some immediate properties of this metric:
2. The surfaces of constant t and r have the metric
3. Because $B(r) \neq 1$, we cannot identify r with the proper radial distance. However,
even though r is not a measure of proper radial distance, it has the clear
geometrical significance that a 2-sphere of coordinate radius r has the area
$$A(S^2_r) = 4\pi r^2 . \qquad (23.10)$$
For this reason, the coordinate r is also known as the area radius or areal radius.
4. Also, even though the coordinate time t is not directly measurable, up to an affine
transformation
$$t \to at + b \qquad (23.11)$$
it can be invariantly characterised by the fact that $\partial/\partial t$ is a timelike Killing vector.
5. The functions A(r) and B(r) are now to be found by solving the Einstein field
equations.
7. We will come back to other aspects of measurements of space and time in such a
geometry after we have solved the Einstein equations.
In conclusion to this section I want to stress that in the present discussion we have
assumed from the outset that the metric is stationary. However, it can be shown with
little effort (see section 23.6) that the vacuum Einstein equations actually imply all by
themselves that a spherically symmetric metric is necessarily static!
This result is known as Birkhoff's theorem. It is the General Relativity analogue of
the Newtonian result that a spherically symmetric body behaves as if all the mass were
concentrated in its center.
In the present context it means that the gravitational field not only of a static spherically
symmetric body is static and spherically symmetric (as we have assumed), but that the
same is true for a radially oscillating/pulsating object. This is a bit surprising because
one would expect such a body to emit gravitational radiation. What Birkhoff's theorem
shows is that this radiation cannot escape into empty space (because otherwise it would
destroy the time-independence of the metric). Translated into the language of waves,
this means that there is no s-wave (monopole) gravitational radiation.
23.3 Solving the Einstein Equations: the Schwarzschild Metric
We will now solve the vacuum Einstein equations for the static isotropic metric in
standard form, i.e. we look for solutions of $R_{\mu\nu} = 0$ for metrics of the type (23.6). You
should have already (as an exercise) calculated all the Christoffel symbols of this metric,
using the Euler-Lagrange equations for the geodesic equation, as described in section
2.5.
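As a reminder, here is how this method works. To calculate all the Christoffel symbols
$\Gamma^r_{\mu\nu}$, say, in one go, you look at the Euler-Lagrange equation for $r = r(\tau)$ resulting from
the Lagrangian $L = g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu/2$. This is easily seen to be
$$\ddot{r} + \frac{B'}{2B}\,\dot{r}^2 + \frac{A'}{2B}\,\dot{t}^2 + \ldots = 0 \qquad (23.13)$$
(a prime denotes an r-derivative), from which one reads off that $\Gamma^r_{rr} = B'/2B$ etc.
Proceeding in this way, you should find (or have found) that the non-zero Christoffel
symbols are given by
$$\begin{aligned}
\Gamma^r_{rr} &= \frac{B'}{2B} & \Gamma^r_{tt} &= \frac{A'}{2B} \\
\Gamma^r_{\theta\theta} &= -\frac{r}{B} & \Gamma^r_{\phi\phi} &= -\frac{r\sin^2\theta}{B} \\
\Gamma^\theta_{\theta r} &= \Gamma^\phi_{\phi r} = \frac{1}{r} & \Gamma^t_{tr} &= \frac{A'}{2A} \\
\Gamma^\theta_{\phi\phi} &= -\sin\theta\cos\theta & \Gamma^\phi_{\phi\theta} &= \cot\theta
\end{aligned} \qquad (23.14)$$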
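The Christoffel symbols in (23.14) can be cross-checked numerically. The following is a minimal sketch and not part of the original notes: it assumes a diagonal metric of the standard form with $g_{tt} = -A(r)$ and $g_{rr} = B(r)$, evaluates $\Gamma^\lambda_{\mu\nu} = \frac12 g^{\lambda\lambda}(\partial_\mu g_{\lambda\nu} + \partial_\nu g_{\lambda\mu} - \partial_\lambda g_{\mu\nu})$ by central finite differences, and compares the result with the closed-form expressions. The sample functions A and B, the evaluation point and all function names are arbitrary choices made for illustration.

```python
import math

# Arbitrary smooth sample functions (illustrative assumptions, not solutions
# of the field equations), together with their exact r-derivatives.
def A(r):
    return 1.0 + 0.3 * r * r

def dA(r):
    return 0.6 * r

def B(r):
    return 2.0 + math.sin(r)

def dB(r):
    return math.cos(r)

T, R, TH, PH = 0, 1, 2, 3  # coordinate labels (t, r, theta, phi)

def metric_diag(x):
    """Diagonal entries (g_tt, g_rr, g_thth, g_phph) at x = (t, r, theta, phi)."""
    r, th = x[R], x[TH]
    return [-A(r), B(r), r * r, (r * math.sin(th)) ** 2]

def d_metric_diag(x, k, eps=1e-5):
    """Central-difference derivative of the diagonal entries w.r.t. x^k."""
    xp, xm = list(x), list(x)
    xp[k] += eps
    xm[k] -= eps
    return [(a - b) / (2 * eps) for a, b in zip(metric_diag(xp), metric_diag(xm))]

def christoffel(x, lam, mu, nu):
    """Gamma^lam_{mu nu} for a diagonal metric."""
    g = metric_diag(x)
    dg = [d_metric_diag(x, k) for k in range(4)]  # dg[k][i] = d_k g_ii

    def d(k, i, j):
        # d_k g_ij, which vanishes off the diagonal
        return dg[k][i] if i == j else 0.0

    return (d(mu, lam, nu) + d(nu, lam, mu) - d(lam, mu, nu)) / (2.0 * g[lam])

def max_deviation(x):
    """Largest |numerical - closed form| over the symbols listed in (23.14)."""
    r, th = x[R], x[TH]
    expected = {
        (R, R, R): dB(r) / (2 * B(r)),
        (R, T, T): dA(r) / (2 * B(r)),
        (R, TH, TH): -r / B(r),
        (R, PH, PH): -r * math.sin(th) ** 2 / B(r),
        (TH, TH, R): 1 / r,
        (PH, PH, R): 1 / r,
        (T, T, R): dA(r) / (2 * A(r)),
        (TH, PH, PH): -math.sin(th) * math.cos(th),
        (PH, PH, TH): math.cos(th) / math.sin(th),
    }
    return max(abs(christoffel(x, *idx) - val) for idx, val in expected.items())

if __name__ == "__main__":
    print(max_deviation([0.0, 2.0, 1.0, 0.5]))
```

The deviation should be limited by the finite-difference step only, so the check is independent of the particular choice of A and B.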
Now we need to calculate the Ricci tensor of this metric. A silly way of doing this
would be to blindly calculate all the components of the Riemann tensor and to then
perform all the relevant contractions to obtain the Ricci tensor. A more intelligent and
less time-consuming strategy is the following:
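1. Instead of using the explicit formula for the Riemann tensor in terms of Christoffel
symbols, one should use directly its contracted version
$$R_{\mu\nu} = R^\lambda_{\ \mu\lambda\nu} = \partial_\lambda\Gamma^\lambda_{\mu\nu} - \partial_\nu\Gamma^\lambda_{\mu\lambda} + \Gamma^\lambda_{\lambda\rho}\Gamma^\rho_{\nu\mu} - \Gamma^\lambda_{\nu\rho}\Gamma^\rho_{\lambda\mu} \qquad (23.15)$$
Since the metric is invariant under $t \to -t$, the Ricci tensor should also be
invariant.
Under the coordinate transformation $t \to -t$, $R_{rt}$ transforms as $R_{rt} \to -R_{rt}$.
Hence, invariance requires $R_{rt} = 0$, and no further calculations for this component
of the Ricci tensor are required.
$$R_{r\theta} = R_{r\phi} = R_{t\theta} = R_{t\phi} = R_{\theta\phi} = 0 . \qquad (23.16)$$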
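4. Since the Schwarzschild metric is spherically symmetric, its Ricci tensor is also
spherically symmetric. It is easy to prove, by considering the effect of a coordinate
transformation that is a rotation of the two-sphere defined by $\theta$ and $\phi$ (leaving
the metric invariant), that this implies that
$$R_{\phi\phi} = \sin^2\theta\,R_{\theta\theta} . \qquad (23.17)$$
Here is one possible proof (I will give a shorter argument below): Consider a
coordinate transformation $(\theta, \phi) \to (\theta', \phi')$. Then
$$d\theta^2 + \sin^2\theta\,d\phi^2 = \left[\left(\frac{\partial\theta}{\partial\theta'}\right)^2 + \sin^2\theta\left(\frac{\partial\phi}{\partial\theta'}\right)^2\right] d\theta'^2 + \ldots \qquad (23.18)$$
For a rotation, which leaves the metric invariant, the term in square brackets must be
equal to 1. Using $R_{\theta\phi} = 0$, the corresponding component of the Ricci tensor transforms as
$$R_{\theta'\theta'} = \left(\frac{\partial\theta}{\partial\theta'}\right)^2 R_{\theta\theta} + \left(\frac{\partial\phi}{\partial\theta'}\right)^2 R_{\phi\phi} . \qquad (23.20)$$
Demanding that this be equal to $R_{\theta\theta}$ (because we are considering a coordinate
transformation which does not change the metric) and using the condition derived
above, one obtains
$$R_{\theta\theta} = R_{\theta\theta}\left(1 - \sin^2\theta\left(\frac{\partial\phi}{\partial\theta'}\right)^2\right) + \left(\frac{\partial\phi}{\partial\theta'}\right)^2 R_{\phi\phi} , \qquad (23.21)$$
which implies (23.17).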
5. Alternatively, look at the mixed spherical components $R^a_{\ b}$ of the Ricci tensor,
$x^a = (\theta, \phi)$. Rotational invariance implies that $R^a_{\ b} \sim \delta^a_{\ b}$. Since $g_{\theta\phi} = 0$ and
$g_{\phi\phi} = \sin^2\theta\,g_{\theta\theta}$, this implies for the covariant components $R_{ab}$ that $R_{\theta\phi} = 0$ and
$R_{\phi\phi} = \sin^2\theta\,R_{\theta\theta}$.
6. Thus the only components of the Ricci tensor that we need to compute are $R_{rr}$,
$R_{tt}$ and $R_{\theta\theta}$.
7. $R_{tt}$ was already determined in section 12.5 (see (12.37)) using a shortcut procedure
based on the Killing vector $\xi = \partial_t$ and some identities relating Killing vectors and
the curvature tensor, and $R_{\theta\theta}$ and $R_{\phi\phi}$ can be calculated by the same procedure
(or directly).
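Putting everything together, the final result for the Ricci tensor of the general static
spherically symmetric metric is
$$\begin{aligned}
R_{tt} &= \frac{A''}{2B} - \frac{A'}{4B}\left(\frac{A'}{A} + \frac{B'}{B}\right) + \frac{A'}{rB} \\
R_{rr} &= -\frac{A''}{2A} + \frac{A'}{4A}\left(\frac{A'}{A} + \frac{B'}{B}\right) + \frac{B'}{rB} \\
R_{\theta\theta} &= 1 - \frac{1}{B} - \frac{r}{2B}\left(\frac{A'}{A} - \frac{B'}{B}\right) .
\end{aligned} \qquad (23.22)$$
Inspection of these formulae reveals that there is a linear combination which is particularly
simple, namely $BR_{tt} + AR_{rr}$, which can be written as
$$BR_{tt} + AR_{rr} = \frac{1}{rB}(A'B + B'A) . \qquad (23.23)$$
The vacuum equations $R_{tt} = R_{rr} = 0$ thus imply $(AB)' = 0$, i.e. that $AB$ is constant, and
asymptotic flatness ($A, B \to 1$ for $r \to \infty$) then fixes $B = 1/A$. With this, the remaining
equation becomes
$$R_{\theta\theta} = 0 \quad\Rightarrow\quad A - 1 + rA' = 0 \quad\Leftrightarrow\quad (Ar)' = 1 , \qquad (23.26)$$
which has the solution $A(r) = 1 + C/r$ for some integration constant $C$.
To fix C, we compare with the Newtonian limit which tells us that asymptotically
$A(r) = -g_{00}$ should approach (temporarily reinserting c) $(1 + 2\phi/c^2)$, where $\phi = -G_N M/r$
is the Newtonian potential for a static spherically symmetric star of mass M.
Thus $C = -2MG_N/c^2$, and the final form of the metric is
$$ds^2 = -\left(1 - \frac{2MG_N}{c^2 r}\right)c^2dt^2 + \left(1 - \frac{2MG_N}{c^2 r}\right)^{-1}dr^2 + r^2d\Omega^2 \qquad (23.28)$$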
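As a quick sanity check of the result, one can substitute $A(r) = 1 - 2m/r$, $B(r) = 1/A(r)$ into the three Ricci components of (23.22) and confirm that they vanish. The sketch below does this numerically; it is an illustration added here, with function names chosen freely, and it uses units in which $m = G_N M/c^2$ is a length.

```python
def ricci_static(A, dA, d2A, B, dB, r):
    """Return (R_tt, R_rr, R_thth) from (23.22), given A, A', A'', B, B' at r."""
    s = dA / A + dB / B
    R_tt = d2A / (2 * B) - dA / (4 * B) * s + dA / (r * B)
    R_rr = -d2A / (2 * A) + dA / (4 * A) * s + dB / (r * B)
    R_thth = 1 - 1 / B - r / (2 * B) * (dA / A - dB / B)
    return R_tt, R_rr, R_thth

def schwarzschild_inputs(r, m):
    """A = 1 - 2m/r, B = 1/A and the derivatives needed by ricci_static."""
    A = 1 - 2 * m / r
    dA = 2 * m / r**2
    d2A = -4 * m / r**3
    B = 1 / A
    dB = -dA / A**2
    return A, dA, d2A, B, dB

def max_ricci(m, radii):
    """Largest absolute Ricci component over the sample radii (all > 2m)."""
    return max(
        abs(component)
        for r in radii
        for component in ricci_static(*schwarzschild_inputs(r, m), r)
    )

if __name__ == "__main__":
    print(max_ricci(1.0, [2.5, 3.0, 10.0, 100.0]))
```

Keeping $A$ but replacing $B$ by 1 leaves $R_{\theta\theta} \neq 0$, which illustrates that the relation $B = 1/A$ is essential.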
This is the famous Schwarzschild metric, obtained by the astronomer Karl Schwarzschild
in 1916, the very same year that Einstein published his field equations, while in hospital
as a soldier in World War I.67 It was apparently discovered independently a few months
later by Johannes Droste, a student of Lorentz (cf. the reference in footnote 11 in section
4.9).
We will usually not write the constant GN explicitly (and set c = 1), and thus we
introduce the abbreviation
$$m = \frac{G_N M}{c^2} , \qquad (23.29)$$
in terms of which the Schwarzschild metric takes the form
$$ds^2 = -f(r)dt^2 + f(r)^{-1}dr^2 + r^2d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} . \qquad (23.30)$$
The interpretation of m is that of the gravitational mass radius associated to the mass
M. To see that all this is dimensionally correct, note that Newton's constant has
dimensions (M mass, L length, T time) $[G_N] = M^{-1}L^3T^{-2}$ so that
$$[m] = M^{-1}L^3T^{-2} \cdot M \cdot L^{-2}T^2 = L . \qquad (23.31)$$
For examples of the value of m for various objects see section 23.4.
We have seen that, by imposing appropriate symmetry conditions on the metric, and
making judicious use of them in the course of the calculation, the complicated Einstein
equations become rather simple and manageable.
We begin our investigation of the Schwarzschild metric by taking a look at the coordi-
nates and their range (always a useful first step).
67
Due to some of the idiosyncrasies of Einstein's earlier versions of his field equations, "Einstein's
obsession with coordinate systems in which the determinant of the metric was precisely -1" in the
words of P. Fromholz, E. Poisson, C.M. Will, The Schwarzschild metric: It's the coordinates, stupid!,
arXiv:1308.0394 [gr-qc], Schwarzschild originally found this solution in different (and much less
convenient) coordinates. See this article for more details.
1. The polar coordinates $\theta$ and $\phi$ have their standard interpretation and range.
2. The time coordinate t can be interpreted as the proper time of a static observer
infinitely far away from the star, at $r \to \infty$. Thus, given the asymptotic flatness
of the solution, we can think of t as measuring Minkowski time. Clearly, the range
of t is unrestricted, $-\infty < t < +\infty$.
3. As regards the radial coordinate r:
(a) We had already discussed above that its geometrical interpretation is that
of an area radius, i.e. it is characterised by the fact that, even though r is
not proper radial distance, the surface area of a sphere of constant radius r
is $4\pi r^2$.
(b) Moreover, the metric is, by construction, a vacuum metric. Thus, if the star
has radius $r_0$, then the solution is only valid for $r > r_0$, and the range of r is
restricted appropriately, $r_0 < r < \infty$.
(c) However, (23.30) also shows that the metric appears to have a singularity at
the Schwarzschild radius $r_s$, given by
$$r_s = \frac{2G_N M}{c^2} = 2m . \qquad (23.32)$$
Thus, for the time being we will also require r > rs . For most practical
purposes, this is not a further constraint on the range of r, since the radius
of a physical object is almost always much larger than its Schwarzschild
radius. For example, for a proton, for the earth and for the sun one has
approximately
However, for more compact objects, their radius can approach that of their
Schwarzschild radius. For example, for neutron stars one can have $r_s \sim 0.1\,r_0$,
and it is an interesting question (we will take up again later on, in sections
25 and 26) what happens to an object whose size is equal to or smaller than
its Schwarzschild radius.
(d) One thing that does not occur at rs , however, in spite of what (23.30) may
suggest, is a true physical singularity. The singularity in (23.30) turns out
to be a pure coordinate singularity, i.e. an artefact of having chosen a poor
coordinate system, and later on we will construct coordinates in which the
metric is completely regular at rs . Nevertheless, it turns out that something
interesting does happen at r = rs , even though there is no singularity and
e.g. geodesics are perfectly well behaved there: rs is an event-horizon, in a
sense a point of no return. Once one has passed the Schwarzschild radius of
an object with r0 < rs , there is no turning back, not on geodesics, but also
not with any amount of acceleration.
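To get a feeling for the orders of magnitude involved in (23.32), the Schwarzschild radius can be evaluated for the three objects mentioned above. The following sketch is an illustration only: the masses and radii are approximate reference values supplied here as assumptions (they are not taken from these notes), and the function names are hypothetical.

```python
G_N = 6.67430e-11         # Newton's constant in m^3 kg^-1 s^-2 (CODATA 2018)
C_LIGHT = 299_792_458.0   # speed of light in m/s (exact by definition)

# Approximate (mass in kg, radius in m); illustrative reference values only.
BODIES = {
    "proton": (1.6726e-27, 0.84e-15),
    "earth": (5.972e24, 6.371e6),
    "sun": (1.988e30, 6.957e8),
}

def schwarzschild_radius(mass_kg):
    """r_s = 2 G_N M / c^2 in metres, eq. (23.32)."""
    return 2.0 * G_N * mass_kg / C_LIGHT**2

def compactness(name):
    """Ratio r_s / r_0 of Schwarzschild radius to actual radius."""
    mass, radius = BODIES[name]
    return schwarzschild_radius(mass) / radius

if __name__ == "__main__":
    for name, (mass, radius) in BODIES.items():
        print(f"{name:7s} r_s = {schwarzschild_radius(mass):.3e} m, "
              f"r_s/r_0 = {compactness(name):.1e}")
```

With these inputs the Schwarzschild radius comes out at roughly 3 km for the sun and somewhat less than 1 cm for the earth, so that in both cases $r_s/r_0$ is tiny.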
In order to learn how to visualise the Schwarzschild metric (for r > r0 > rs ), we will
now discuss some further elementary properties of length and time in the Schwarzschild
geometry.
Let us first consider proper time for a static observer, i.e. an observer at rest at fixed
values of $(r, \theta, \phi)$. Proper time is related to coordinate time by
$$d\tau = (1 - 2m/r)^{1/2}\,dt < dt . \qquad (23.34)$$
Thus, first of all, up to a constant factor for such observers their proper time agrees with
their coordinate time, and we can simply and conveniently label events as described by
a static observer by the coordinate time t instead of the proper time of the observer.
Secondly, we can interpret the above relationship as the statement that static clocks
(measuring the proper time $\tau$) run slower in a gravitational field - something we already
saw in the discussion of the gravitational redshift in section 2.9, and also in the discussion
of the so-called twin-paradox and the equivalence principle in section 1.1. This formula
again suggests that something interesting is happening at the Schwarzschild radius r =
2m - we will come back to this below.
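The rate of a static clock relative to coordinate time follows directly from the $(tt)$-component of (23.30), $d\tau/dt = (1 - r_s/r)^{1/2}$. A small sketch, with hypothetical function names, that evaluates this factor and the fractional slowdown $1 - d\tau/dt$ in a numerically stable way for $r \gg r_s$:

```python
import math

def clock_rate(r_over_rs):
    """d(tau)/dt = (1 - r_s/r)^(1/2) for a static clock at areal radius r > r_s."""
    if r_over_rs <= 1.0:
        raise ValueError("static observers exist only for r > r_s")
    return math.sqrt(1.0 - 1.0 / r_over_rs)

def fractional_slowdown(r_over_rs):
    """1 - d(tau)/dt, computed via log1p/expm1 to avoid cancellation for r >> r_s."""
    if r_over_rs <= 1.0:
        raise ValueError("static observers exist only for r > r_s")
    return -math.expm1(0.5 * math.log1p(-1.0 / r_over_rs))

if __name__ == "__main__":
    for ratio in (1.01, 2.0, 10.0, 7.0e8):
        print(ratio, clock_rate(ratio), fractional_slowdown(ratio))
```

For $r \gg r_s$ the slowdown is approximately $r_s/2r = m/r$. On the surface of the earth, where $r/r_s$ is of the order of $7 \times 10^8$ (an approximate figure), this is an effect of less than one part in $10^9$, while the clock rate tends to zero as $r \to r_s$.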
As regards spatial length measurements, thus dt = 0, we have already seen above that
the slices r = const. have the standard two-sphere geometry. However, as r varies, these
two-spheres vary in a way different to the way concentric two-spheres vary in $\mathbb{R}^3$. To see
this, note that the proper radius R, obtained from the spatial line element by setting
$\theta$ = const., $\phi$ = const., is
$$dR = (1 - 2m/r)^{-1/2}\,dr > dr . \qquad (23.35)$$
In other words, the proper radial distance between concentric spheres of area $4\pi r^2$ and
area $4\pi(r + dr)^2$ is dR > dr and hence larger than in flat space. Note that $dR \to dr$
for $r \to \infty$ so that, as expected, far away from the origin the space approximately looks
like $\mathbb{R}^3$. One way to visualise this geometry is as a sort of throat or sink, as in Figure
11.
To get some more quantitative feeling for the distortion of the geometry produced by
the gravitational field of a star, consider a long stick lying radially in this gravitational
field, with its endpoints at the coordinate values $r_1 > r_2$. To compute its length L, we
have to evaluate
$$L = \int_{r_2}^{r_1} dr\,(1 - 2m/r)^{-1/2} . \qquad (23.36)$$
[Figure 11: two concentric spheres, of radius r and of radius r + dr, with coordinate separation dr and proper separation dR > dr.]
Figure 11: Figure illustrating the geometry of the Schwarzschild metric. In $\mathbb{R}^3$, concentric
spheres of radii r and r + dr are a distance dr apart. In the Schwarzschild geometry,
such spheres are a distance dR > dr apart. This departure from Euclidean geometry
becomes more and more pronounced for smaller values of r, i.e. as one travels down the
throat towards the Schwarzschild radius r = 2m.
It is possible to evaluate this integral in closed form (by changing variables from r to
u = 1/r), but for the present purposes it will be enough to treat 2m/r as a small
perturbation and to only retain the term linear in m in the Taylor expansion. Then we
find
$$L \approx \int_{r_2}^{r_1} dr\,(1 + m/r) = (r_1 - r_2) + m\log\frac{r_1}{r_2} > (r_1 - r_2) . \qquad (23.37)$$
We see that the corrections to the Euclidean result are suppressed by powers of the
Schwarzschild radius rs = 2m so that for most astronomical purposes one can simply
work with coordinate distances.
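The quality of the linear approximation (23.37) can be checked against the exact value of (23.36). The sketch below is an illustration, not part of the original notes: it uses one antiderivative of $(1 - 2m/r)^{-1/2}$, namely $\sqrt{r(r-2m)} + 2m\log(\sqrt{r} + \sqrt{r - 2m})$ (which can be verified by differentiation), together with a Simpson-rule integration as an independent check. All function names are hypothetical.

```python
import math

def proper_length_exact(r1, r2, m):
    """Integral of (1 - 2m/r)^(-1/2) from r2 to r1, for r1 > r2 > 2m."""
    def antiderivative(r):
        root = math.sqrt(r * (r - 2.0 * m))
        return root + 2.0 * m * math.log(math.sqrt(r) + math.sqrt(r - 2.0 * m))
    return antiderivative(r1) - antiderivative(r2)

def proper_length_linear(r1, r2, m):
    """First-order approximation (23.37)."""
    return (r1 - r2) + m * math.log(r1 / r2)

def proper_length_simpson(r1, r2, m, n=2000):
    """Composite Simpson rule with n (even) subintervals, as an independent check."""
    h = (r1 - r2) / n

    def integrand(r):
        return 1.0 / math.sqrt(1.0 - 2.0 * m / r)

    total = integrand(r2) + integrand(r1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(r2 + k * h)
    return total * h / 3.0

if __name__ == "__main__":
    m, r2, r1 = 1.0, 10.0, 20.0
    print(r1 - r2, proper_length_linear(r1, r2, m), proper_length_exact(r1, r2, m))
```

Since all terms in the expansion of $(1 - 2m/r)^{-1/2}$ in powers of $m/r$ are positive, the exact length is larger than the linear approximation, which in turn is larger than the coordinate distance.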
This is even more evident when one puts the Schwarzschild metric into the isotropic
form of the metric (23.7). This is accomplished by the coordinate transformation
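$$r = \rho\left(1 + \frac{m}{2\rho}\right)^2 , \qquad (23.38)$$
leading to
$$\begin{aligned}
ds^2 &= -\frac{(1 - \frac{m}{2\rho})^2}{(1 + \frac{m}{2\rho})^2}\,dt^2 + \left(1 + \frac{m}{2\rho}\right)^4(d\rho^2 + \rho^2d\Omega^2) \\
&= -\frac{(1 - \frac{m}{2\rho})^2}{(1 + \frac{m}{2\rho})^2}\,dt^2 + \left(1 + \frac{m}{2\rho}\right)^4 d\vec{x}^2 \qquad (\rho^2 = \vec{x}^2) ,
\end{aligned} \qquad (23.39)$$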
as can easily be verified. To actually find the appropriate coordinate transformation in
the first place, one needs to solve the equation
i.e.
$$\frac{dr}{r f^{1/2}(r)} = \frac{d\rho}{\rho} \quad\Rightarrow\quad \ln(\rho/\rho_0) = \cosh^{-1}(r/m - 1) . \qquad (23.41)$$
This leads to (23.38) for the choice $\rho_0 = m/2$ (but any other choice would have been
just as good). From this one can now read off that the relation between proper and
coordinate distance (the latter now referring to the spatial Cartesian coordinates $\vec{x}$ in
(23.39)) is
$$(\Delta x)_{\rm proper} = \left(1 + \frac{m}{2\rho}\right)^2 (\Delta x) . \qquad (23.42)$$
However, in interpreting this or using this form of the metric for other purposes, one
should pay attention to the fact that the region $2m < r < \infty$ of the Schwarzschild
metric is covered twice by the isotropic coordinates. In particular, $r \to \infty$ both for
$\rho \to 0$ and for $\rho \to \infty$, while $r(\rho)$ reaches its minimal value r = 2m for $\rho = m/2$.
Thus the metric in isotropic coordinates appears to describe not just one but two identical
(isometric) asymptotically flat regions, joined together at the 2-sphere $\rho = m/2 \Leftrightarrow r = 2m$.
This is the first indication that with the Schwarzschild metric we seem to have
obtained more than we bargained for, in particular when considering objects whose
radius is smaller than the Schwarzschild radius rs = 2m. Later on, we will encounter
numerous other coordinate systems for the Schwarzschild metric, providing us with dif-
ferent insights into its physics and geometry. In particular, we will then (re-)discover
this second asymptotically flat region in section 26.7 (as the mirror region III).
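The statements about the isotropic radial coordinate are easy to verify numerically. The following sketch (an illustration with hypothetical function names) checks that the transformation (23.38) reproduces $f(r) = 1 - 2m/r$ from the $(tt)$-component of (23.39), that $r(\rho)$ attains the value $2m$ at $\rho = m/2$, and that $r(\rho)$ is invariant under the inversion $\rho \to m^2/4\rho$, which is one way of seeing the double cover. The inversion is an observation added here; it follows directly from (23.38).

```python
def areal_radius(rho, m):
    """Areal radius r as a function of the isotropic radius rho, eq. (23.38)."""
    return rho * (1.0 + m / (2.0 * rho)) ** 2

def f_schwarzschild(r, m):
    """f(r) = 1 - 2m/r."""
    return 1.0 - 2.0 * m / r

def minus_gtt_isotropic(rho, m):
    """-g_tt in isotropic coordinates, read off from (23.39)."""
    a = m / (2.0 * rho)
    return ((1.0 - a) / (1.0 + a)) ** 2

def mirror(rho, m):
    """The inversion rho -> m^2 / (4 rho), which maps rho < m/2 to rho > m/2."""
    return m * m / (4.0 * rho)

if __name__ == "__main__":
    m = 1.0
    for rho in (0.05, 0.1, 0.5, 2.5, 5.0):
        print(rho, areal_radius(rho, m), areal_radius(mirror(rho, m), m))
```

Each value of $r > 2m$ is thus reached for exactly two values of $\rho$, one on either side of $\rho = m/2$.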
A slightly more general calculation than we have performed in section 23.3 to find the
Schwarzschild solution for the exterior of a spherically symmetric static star provides
us with
2. some more insight into the interpretation of the parameter M as the mass of the
solution;
3. as an added benefit, the basic set of equations governing the solutions of the
Einstein equations for the interior of the star (an issue we will briefly consider in
section 23.7).
Thus, let us start with a general spherically symmetric (but not necessarily time-
independent) metric. Generalizing (23.1), we can at first parametrise such a metric
as
$$ds^2 = -A(t,r)dt^2 + B(t,r)dr^2 + 2C(t,r)dr\,dt + R(t,r)^2(d\theta^2 + \sin^2\theta\,d\phi^2) . \qquad (23.43)$$
This form of the metric is still invariant under transformations
This caveat is the following.68 Clearly, the metric (23.45) has the property that the
gradient of the radius of the 2-sphere is spacelike,
Simple counterexamples to the form (23.45) of the metric are thus provided
R(t, r) = R , (23.48)
R(t, r) = t , (23.49)
for which one could choose R as a new time coordinate, but not as a radial
coordinate.
These cases (as well as that where the gradient is null) would in principle require a
separate analysis, but I will forego this here. The case where the gradient is timelike
roughly speaking corresponds to an exchange of the roles of t and r, and since this will
be of some interest in our discussion of black holes later on, I will briefly come back to
this below.
68
Cf. e.g. the discussion in section 14.1 of J. Plebanski, A. Krasinski, An Introduction to General
Relativity and Cosmology.
For now we continue with the understanding that we are considering (regions of) space-times
for which the form (23.45) of the metric is valid. However, as we have already
seen in the derivation of the Schwarzschild metric, this parametrisation of the metric
in terms of the two functions A(r) and B(r) is not ideal. To see what might be more
convenient, we first reanalyse the Einstein equations in the time-independent case, but
this time with an energy-momentum tensor. Thanks to the relation (23.23) we have
$$R^r_{\ r} - R^t_{\ t} = (rB)^{-1}\frac{(AB)'}{AB} . \qquad (23.50)$$
This suggests that it is useful to introduce a new function h(r) through
$$A(r)B(r) = e^{2h(r)} , \qquad (23.51)$$
i.e. through
$$A(r) = e^{2h(r)}f(r) , \qquad B(r) = f(r)^{-1} \qquad (23.52)$$
for some arbitrary new function f(r). Using the Einstein equations (18.29) in the form
$$R^\mu_{\ \nu} = 8\pi G_N(T^\mu_{\ \nu} - \tfrac12\delta^\mu_{\ \nu}T) , \qquad (23.53)$$
one sees that one particular linear combination of the Einstein equations now takes the
form
$$h'(r) = 4\pi G_N\,r f(r)^{-1}(T^r_{\ r} - T^t_{\ t}) . \qquad (23.54)$$
The remaining independent component (in the time-independent case) can be chosen
to be
$$R^t_{\ t} - \tfrac12 R = 8\pi G_N T^t_{\ t} \qquad (23.55)$$
which, after a bit of algebra with the formulae (23.22), and upon writing $f(r) = 1 - 2m(r)/r$,
works out to be
$$m'(r) = -4\pi G_N\,r^2\,T^t_{\ t} . \qquad (23.58)$$
Let us therefore now, in the time-dependent case, and with the benefit of the above
hindsight, parametrise the two arbitrary functions A(t, r) and B(t, r) in (23.45) in terms
of two other functions h(t, r) and either f(t, r) or m(t, r) by the substitutions
$$A(t,r) = e^{2h(t,r)}f(t,r) , \qquad B(t,r) = f(t,r)^{-1} \qquad (23.59)$$
and
$$f(t,r) = 1 - \frac{2m(t,r)}{r} . \qquad (23.60)$$
Thus, explicitly, the modified ansatz for a general spherically symmetric metric is
$$ds^2 = -e^{2h(t,r)}f(t,r)dt^2 + f(t,r)^{-1}dr^2 + r^2d\Omega^2 . \qquad (23.61)$$
In this gauge, the full (non-vacuum) Einstein equations turn out to take a particularly
simple and useful form. The previously obtained equations (23.54) and (23.58) continue
to be valid also in the time-dependent case, and there is now one more independent
equation, arising from, say, the (rt)-component of the Einstein equation.
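$$\begin{aligned}
G^t_{\ t} &= -\frac{2m'(t,r)}{r^2} \\
G^r_{\ t} &= +\frac{2\dot{m}(t,r)}{r^2} \\
G^r_{\ r} &= \frac{2h'(t,r)f(t,r)}{r} - \frac{2m'(t,r)}{r^2} ,
\end{aligned} \qquad (23.62)$$
with a somewhat more complicated and unenlightening expression for the angular components,
depending on various first and second derivatives of m and h, which we will fortunately
not need. In particular, among the 3 above components only $G^r_{\ t}$ contains a
time-derivative $\dot{m}(t,r) = \partial_t m(t,r)$. Moreover one can replace $G^r_{\ r}$ by the simpler linear combination
$$G^r_{\ r} - G^t_{\ t} = \frac{2h'(t,r)f(t,r)}{r} . \qquad (23.63)$$
Thus the Einstein equations $G^\mu_{\ \nu} = 8\pi G_N T^\mu_{\ \nu}$ reduce to
$$\begin{aligned}
m'(t,r) &= 4\pi G_N\,r^2\,(-T^t_{\ t}) \\
\dot{m}(t,r) &= 4\pi G_N\,r^2\,(+T^r_{\ t}) \\
h'(t,r) &= 4\pi G_N\,r f(t,r)^{-1}(-T^t_{\ t} + T^r_{\ r}) .
\end{aligned} \qquad (23.64)$$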
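In vacuum, these equations immediately imply
$$T_{\mu\nu} = 0 \quad\Rightarrow\quad m'(t,r) = \dot{m}(t,r) = 0 \quad\Rightarrow\quad m(t,r) = m \ \text{constant} , \qquad (23.65)$$
and that $h'(t,r) = 0$ so that h = h(t) is only a function of t,
$$ds^2 = -e^{2h(t)}f(r)dt^2 + f(r)^{-1}dr^2 + r^2d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} .$$
Thus h(t), which only appears in the (tt)-component of the metric, can simply be
absorbed into a redefinition of t, $dt_{\rm new} = e^{h(t)}dt$, after which the metric takes the form
$$ds^2 = -f(r)(dt_{\rm new})^2 + f(r)^{-1}dr^2 + r^2d\Omega^2 .$$
Thus we uniquely recover the Schwarzschild solution, even without having to assume
from the outset that the metric is time-independent. This is Birkhoff's theorem.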
Remarks:
1. A caveat related to that at the beginning of this section should be added here: if
one applies the above reasoning to a region of space-time where f (r) < 0 (we will
study this region of the Schwarzschild metric in great detail in section 26), so that
the roles of t and r are interchanged, then the above argument still shows that an
additional Killing vector emerges from the joint requirement of spherical symmetry
and the vacuum Einstein equations, but now this Killing vector (misleadingly
called $\partial_t$) is spacelike and not timelike.69
2. The above set (23.64) of Einstein equations for spherical symmetry (which should
still be supplemented by, say, the conservation law for the energy-momentum
tensor), also allows one to read off some fairly simple generalisations of Birkhoff's
theorem, such as: spherical symmetry and static sources (i.e. $T^\mu_{\ \nu} = T^\mu_{\ \nu}(r)$) $\Rightarrow$
metric is time-independent.
4. Realistic astrophysical systems are neither exactly spherically symmetric nor ex-
actly vacuum (even outside the star), and typically the sources are not static
either. It is therefore of interest to investigate more generally if or to which extent
69
For a more careful statement and proof of Birkhoff's theorem along these lines see e.g. K. Schleich,
D. Witt, A simple proof of Birkhoff's theorem for cosmological constant, arXiv:0908.4110v2 [gr-qc].
70
H.-J. Schmidt, The tetralogy of Birkhoff theorems, arXiv:1208.5237 [gr-qc].
Birkhoff's theorem remains approximately true when the system under consideration
is only approximately spherically symmetric or vacuum. This question has
been analysed by Goswami and Ellis.71
of the Schwarzschild solution is implied not just by the vacuum Einstein equations
but, more generally, by the Einstein equations with $T^t_{\ t} = T^r_{\ r}$. This situation
is not as uncommon as one may think. For example, solutions of the
Einstein-Maxwell equations for spherically symmetric electrically charged stars
(the Reissner-Nordström solution, see section 30), even with the inclusion of a
cosmological constant, turn out to also be of this form. Some other examples of
solutions of this type (in more than 4 dimensions) are presented in section 29.3.72
We conclude this section with some remarks about the mass function m(t, r) appearing
in the ansatz (23.60) and its interpretation:
1. First of all, since $T^t_{\ t}$ is (minus) the energy density and $T^r_{\ t}$ represents the radial
energy flux, the above equations show that m(t, r) can indeed be interpreted as the
mass or energy of the solution.
Indeed, let $\rho(r) = -T^t_{\ t}$ denote the energy density inside a static spherically symmetric
star, say, and let $m(r) = G_N M(r)$. Then (23.58) implies
$$M'(r) = 4\pi r^2\rho(r) \quad\Rightarrow\quad M(r) = 4\pi\int_0^r dr'\,(r')^2\rho(r') , \qquad (23.69)$$
which looks exactly like the ordinary mass inside a sphere of radius r (in flat space).
In particular, if $\rho(r) = 0$ for $r > r_0$ (with $r_0$ the radius of the star), then one can
interpret
$$M \equiv M(r_0) = 4\pi\int_0^{r_0} dr'\,(r')^2\rho(r') \qquad (23.70)$$
as the total mass-energy of the star. One can (try to) attribute the difference
between this integral and that of $\rho(r)$ weighted by the proper spatial volume
element,
$$\begin{aligned}
M_{\rm proper} &= 4\pi\int_0^{r_0} dr'\,(r')^2\sqrt{g_{rr}(r')}\,\rho(r') \\
&= 4\pi\int_0^{r_0} dr'\,(r')^2\rho(r')\left(1 - 2m(r')/r'\right)^{-1/2} > M
\end{aligned} \qquad (23.71)$$
to the binding energy of the star.
71
R. Goswami, G. Ellis, Almost Birkhoff Theorem in General Relativity, arXiv:1101.4520 [gr-qc],
R. Goswami, G. Ellis, Birkhoff Theorem and Matter, arXiv:1202.0240 [gr-qc].
72
For some further reflections on the ubiquity of such solutions, see T. Jacobson, When is $g_{tt}g_{rr} = -1$?,
arXiv:0707.3222v3 [gr-qc].
2. It is also worth noting that the Misner-Sharp mass function $M_{MS}(t,r) = m(t,r)$
in (23.60) has a coordinate invariant meaning (in spherical symmetry). First of
all, for the metric (23.61) one has
$$g^{rr} = f = 1 - \frac{2m}{r} . \qquad (23.72)$$
Now consider, as in the argument leading to (11.138), an arbitrary coordinate
transformation $(t,r) \to z^a(t,r)$, thus preserving the manifest spherical symmetry,
but e.g. abandoning the areal radius r as one of the coordinates. In particular,
$r = r(z^a)$ is now a function of the new coordinates. Then the metric will take the
general spherically symmetric form
$$ds^2 = g_{ab}(z)dz^a dz^b + r(z)^2d\Omega^2 , \qquad (23.73)$$
and by the usual tensorial transformation rule for the metric one has
$$g^{rr} = g^{ab}(z)\,\partial_a r(z)\,\partial_b r(z) . \qquad (23.74)$$
Thus in a general spherically symmetric coordinate system the mass function can
be expressed in terms of (or defined via) the gradient-squared of the radius function
of the transverse sphere,
$$M_{MS}(z) \equiv m(z) = \frac{r(z)}{2}\left(1 - g^{ab}(z)\,\partial_a r(z)\,\partial_b r(z)\right) , \qquad (23.75)$$
and is thus a scalar under these coordinate transformations.
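The difference between the mass M of (23.70) and the proper mass of (23.71) can be made concrete in a simple example. The sketch below assumes, purely for illustration, a star of constant energy density $\rho_0$ and units with $G_N = c = 1$, so that $m(r) = 4\pi\rho_0 r^3/3$; the integral (23.71) is evaluated by the Simpson rule and compared with the closed form obtained from the substitution $r = R\sin\chi$ with $R^2 = 3/(8\pi\rho_0)$. The function names and the parametrisation by the compactness $2M/r_0$ are choices made here.

```python
import math

def masses_constant_density(compactness, r0=1.0, n=2000):
    """Return (M, M_proper) in units G_N = c = 1 for constant density.

    compactness = 2M/r0 must lie strictly between 0 and 1; n must be even.
    """
    M = 0.5 * compactness * r0
    rho0 = 3.0 * M / (4.0 * math.pi * r0**3)

    def integrand(r):
        # f = 1 - 2 m(r) / r with m(r) = (4 pi / 3) rho0 r^3
        f = 1.0 - (8.0 * math.pi / 3.0) * rho0 * r * r
        return 4.0 * math.pi * r * r * rho0 / math.sqrt(f)

    # Composite Simpson rule on [0, r0].
    h = r0 / n
    total = integrand(0.0) + integrand(r0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return M, total * h / 3.0

def proper_mass_closed_form(compactness, r0=1.0):
    """Closed form of (23.71) for constant density, via r = R sin(chi)."""
    M = 0.5 * compactness * r0
    rho0 = 3.0 * M / (4.0 * math.pi * r0**3)
    R = r0 / math.sqrt(compactness)      # R^2 = 3 / (8 pi rho0) = r0^3 / (2M)
    chi0 = math.asin(r0 / R)
    return 2.0 * math.pi * rho0 * R**3 * (chi0 - math.sin(chi0) * math.cos(chi0))

if __name__ == "__main__":
    for c in (0.01, 0.1, 0.5):
        M, M_proper = masses_constant_density(c)
        print(c, M, M_proper, M_proper / M)
```

The ratio $M_{\rm proper}/M$ is always larger than 1 and approaches 1 in the weak-field limit $2M/r_0 \to 0$, in line with the interpretation of the difference as a binding energy.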
23.7 Interior Solution for a Static Star and the TOV Equation
The Schwarzschild solution is a solution of the vacuum Einstein equations for the exterior
of a spherically symmetric (static) star. The Einstein equations also govern and describe
the gravitational field = space-time geometry in the interior of the star. In this case one
needs to specify the energy-momentum tensor for the matter content in the interior of the
star, and in general this is a complicated astrophysics problem which we will not address
here. However, a useful idealised model of the energy-momentum tensor, compatible
with the symmetry requirements arising from the fact that the star is assumed to be
static and spherically symmetric, which we will take to mean that the metric has the
form
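$$ds^2 = -e^{2h(r)}f(r)dt^2 + f(r)^{-1}dr^2 + r^2d\Omega^2 , \qquad f(r) = 1 - \frac{2m(r)}{r} , \qquad (23.76)$$
is provided by the ansatz
$$T^t_{\ t} = -\rho(r) , \qquad T^r_{\ r} = T^\theta_{\ \theta} = T^\phi_{\ \phi} = p(r) , \qquad (23.77)$$
with all other components vanishing.
Here we interpret $\rho = \rho(r)$ as the energy density of the star, and p = p(r) as its pressure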
density.
Remarks:
1. This ansatz amounts to neglecting anisotropic stresses in the interior of the star
(the spatial off-diagonal components) as well as energy-flow in the form of heat-
conduction, say (the off-diagonal time-space components). Depending on the type
of star one wishes to describe this may or may not be a justified approximation
(but is considered to be an excellent approximation for very compact stars like
white dwarves / dwarfs and neutron stars).
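$$T^{\mu\nu} = (\rho + p)u^\mu u^\nu + pg^{\mu\nu} , \qquad (23.78)$$
where
$$u^\mu = \left((-g_{tt}(r))^{-1/2}, 0, 0, 0\right) \quad\Rightarrow\quad u^\mu u_\mu = g_{tt}(r)(-g_{tt}(r))^{-1} = -1 \qquad (23.79)$$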
3. Specifying the energy-momentum content requires specifying not only the energy-
momentum tensor (23.77) but also an equation of state, which in this simplified
context amounts to postulating a relation $p = p(\rho)$. Again see section 34.2 for
a discussion and examples of this. We will sidestep this issue in the discussion
below since the only case we will consider explicitly is that of constant energy
density $\rho(r) = \rho_0$, in which case the Einstein equations determine p = p(r) via
the Tolman-Oppenheimer-Volkoff equation to be derived (in general) below.
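With this set-up, the Einstein equations (23.54) and (23.58) for a metric of the form
(23.76) and an energy-momentum tensor of the form (23.77) read
$$m'(r) = 4\pi G_N\,r^2\rho(r) , \qquad h'(r) = 4\pi G_N\,r f(r)^{-1}(\rho(r) + p(r)) . \qquad (23.80)$$
The first of these is solved by $m(r) = G_N M(r)$ with
$$M(r) = 4\pi\int_0^r dr'\,(r')^2\rho(r') , \qquad (23.81)$$
where the regularity condition M(0) = 0 has been imposed.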
The equations (23.80) need to be supplemented either by the conservation-law
$$\nabla_\mu T^{\mu\nu} = 0 \qquad (23.82)$$
Remarks:
1. Note that the right-hand side is manifestly negative so that (reassuringly) the
pressure inside the star decreases as one moves to larger values of r.
3. Given such an equation of state, in principle one can then integrate the TOV
equation and (23.81)
$$\frac{dM(r)}{dr} = 4\pi r^2\rho(r) , \qquad (23.87)$$
for M(r) and p(r). In practice, except for some very special simple equations of
state, this needs to be done numerically.
4. The TOV equation can be interpreted as the condition for hydrostatic equilibrium
of the star, as it generalises the Newtonian hydrostatic equation
$$p'(r) = -G_N\frac{\rho(r)M(r)}{r^2} . \qquad (23.88)$$
The differences between (23.86) and (23.88) can be attributed to (and provide
useful insight into) the key differences between Newtonian gravity and general
relativity:
- In general relativity, not only the mass M(r) acts as a source of the gravitational
field, but anything that appears in the energy-momentum tensor, in
particular in the present situation the pressure p(r). This accounts for the
substitution $M(r) \to M(r) + 4\pi r^3p(r)$.
- In general relativity, gravity acts not only on $\rho(r)$ but also on p(r). This
accounts for the substitution $\rho(r) \to \rho(r) + p(r)$.
- In general relativity, the gravitational force differs from the Newtonian gravitational
force. In the present case this accounts for the additional factor of
$f(r) = 1 - 2m(r)/r$ in the denominator.
All these new terms are suppressed by a factor of $c^{-2}$ relative to the leading
Newtonian terms.
We will now consider the solution of this set of equations in the case where the energy
density is constant inside the star,
$$\rho(r) = \rho_0 \quad \text{for } r \leq r_0 . \qquad (23.89)$$
This ansatz replaces an assumption about an explicit equation of state relating $\rho(r)$ and
p(r), as p(r) can now be determined from the TOV equation.
For $r \leq r_0$ the mass function (23.81) is then $M(r) = 4\pi\rho_0r^3/3$, i.e. $m(r) = 4\pi G_N\rho_0r^3/3$.
This already completely determines the spatial part of the interior metric, and matches
perfectly with the radial part of the exterior Schwarzschild solution, which has $g_{rr} = f(r)^{-1}$
with $f(r) = 1 - 2G_NM/r$ for all $r > r_0$.
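Knowing m(r) (and $\rho(r) = \rho_0$), we can determine p(r) from the TOV equation (23.86),
which now reads
$$p'(r) = -\frac{4\pi G_N\,r}{3f(r)}(\rho_0 + p(r))(\rho_0 + 3p(r)) . \qquad (23.92)$$
Writing
$$\frac{p'(r)}{(\rho_0 + p(r))(\rho_0 + 3p(r))} = \frac{1}{2\rho_0}\frac{d}{dr}\ln\frac{\rho_0 + 3p}{\rho_0 + p} \qquad (23.93)$$
and
$$-\frac{4\pi G_N\,r}{3f(r)} = \frac{1}{4\rho_0}\frac{d}{dr}\ln f(r) , \qquad (23.94)$$
it follows immediately that the solution satisfying the boundary condition $p(r_0) = 0$ is
$$p(r) = \rho_0\,\frac{f(r)^{1/2} - f(r_0)^{1/2}}{3f(r_0)^{1/2} - f(r)^{1/2}} . \qquad (23.95)$$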
Before turning to that, let us complete the solution of the Einstein equations by deter-
mining h(r). This can be found either directly by integration of the second equation in
(23.80),
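$$h'(r) = 4\pi G_N\,r f(r)^{-1}(\rho_0 + p(r)) , \qquad (23.96)$$
or, more efficiently, from (23.84), which we rewrite as
$$\frac{d}{dr}\left(h(r) + \tfrac12\ln f(r)\right) = -\frac{p'(r)}{\rho_0 + p(r)} \quad\Leftrightarrow\quad \frac{d}{dp}\left(h + \tfrac12\ln f\right) = -(\rho_0 + p)^{-1} . \qquad (23.97)$$
The solution to this equation is evidently
$$h(r) + \tfrac12\ln f(r) = -\ln(\rho_0 + p(r)) + \text{const.} ,$$
and fixing this integration constant by the requirement $h(r_0) = 0$, so that also the (tt)-
component of the metric matches onto that of the exterior Schwarzschild solution at
$r = r_0$, one finds
$$e^{h(r)} = \tfrac12\left(3f(r_0)^{1/2}/f(r)^{1/2} - 1\right) , \qquad (23.100)$$
or
$$g_{tt}(r) = -\tfrac14\left(3f(r_0)^{1/2} - f(r)^{1/2}\right)^2 . \qquad (23.101)$$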
This completes the derivation of the solution of the Einstein equations for the interior
of a perfect-fluid star with constant energy density.
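The closed-form pressure profile can be compared with a direct numerical integration of (23.92). The following sketch is an illustration under stated assumptions: units with $G_N = c = 1$, a classical fourth-order Runge-Kutta step, integration inwards from the surface with $p(r_0) = 0$, and the closed form $p(r) = \rho_0\,(f(r)^{1/2} - f(r_0)^{1/2})/(3f(r_0)^{1/2} - f(r)^{1/2})$ with $f(r) = 1 - 8\pi\rho_0r^2/3$. Function names and the parametrisation by the compactness $r_s/r_0$ are hypothetical.

```python
import math

def f_interior(r, rho0):
    """f(r) = 1 - (8 pi / 3) rho0 r^2 in units G_N = c = 1."""
    return 1.0 - (8.0 * math.pi / 3.0) * rho0 * r * r

def tov_rhs(r, p, rho0):
    """Right-hand side of the constant-density TOV equation (23.92)."""
    prefactor = -4.0 * math.pi * r / (3.0 * f_interior(r, rho0))
    return prefactor * (rho0 + p) * (rho0 + 3.0 * p)

def pressure_profile_rk4(r0, rho0, steps=2000):
    """Integrate (23.92) from p(r0) = 0 inwards to r = 0; returns (radii, pressures)."""
    h = -r0 / steps
    r, p = r0, 0.0
    radii, pressures = [r], [p]
    for i in range(steps):
        k1 = tov_rhs(r, p, rho0)
        k2 = tov_rhs(r + h / 2, p + h * k1 / 2, rho0)
        k3 = tov_rhs(r + h / 2, p + h * k2 / 2, rho0)
        k4 = tov_rhs(r + h, p + h * k3, rho0)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        r = r0 + (i + 1) * h
        radii.append(r)
        pressures.append(p)
    return radii, pressures

def pressure_closed_form(r, r0, rho0):
    """Closed-form solution of (23.92) with p(r0) = 0."""
    s = math.sqrt(f_interior(r, rho0))
    s0 = math.sqrt(f_interior(r0, rho0))
    return rho0 * (s - s0) / (3.0 * s0 - s)

def density_for_compactness(compactness, r0):
    """rho0 such that r_s / r0 = 2M / r0 equals the given compactness (< 8/9)."""
    M = 0.5 * compactness * r0
    return 3.0 * M / (4.0 * math.pi * r0**3)

if __name__ == "__main__":
    r0 = 1.0
    for c in (0.1, 0.5, 0.8, 0.88):
        rho0 = density_for_compactness(c, r0)
        _, pressures = pressure_profile_rk4(r0, rho0)
        print(c, pressures[-1] / rho0, pressure_closed_form(0.0, r0, rho0) / rho0)
```

As the compactness $r_s/r_0$ approaches $8/9$ from below, the central pressure in units of $\rho_0$ grows without bound, because the denominator $3f(r_0)^{1/2} - 1$ of the closed form at $r = 0$ tends to zero.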
with
\[ f(r) = 1 - \frac{r^2}{R^2} = 1 - \frac{8\pi G_N}{3} \rho_0 r^2 , \tag{23.103} \]
supported by perfect fluid matter with $\rho(r) = \rho_0$ and $p(r)$ given in (23.95). This matches continuously (and, in fact, once-differentiably) onto the exterior Schwarzschild solution with $g_{tt}(r) = -f(r)$, $g_{rr} = f(r)^{-1}$, and $f(r) = 1 - 2G_N M/r$, where $M$ is the total mass of the star, $M = 4\pi\rho_0 r_0^3/3$.
Remarks:
\[ (ds^2)_{space} = f(r)^{-1} dr^2 + r^2 d\Omega^2 = \frac{dr^2}{1 - r^2/R^2} + r^2 d\Omega^2 , \tag{23.104} \]
with
\[ R^2 = \frac{3}{8\pi G_N \rho_0} \tag{23.105} \]
and $r \leq r_0$ (we assume that the energy density $\rho_0$ is positive, $\rho_0 > 0$). By standard manipulations we can put this metric into the usual form (1.117),
\[ R^2 = \frac{3}{8\pi G_N \rho_0} = \frac{r_0^3}{2 G_N M} = \frac{r_0^3}{r_s} , \tag{23.107} \]
we see that
\[ \left(\frac{R}{r_0}\right)^2 = \frac{r_0}{r_s} . \tag{23.108} \]
Thus $R > r_0$ iff the radius of the star is larger than its Schwarzschild radius,
This is indeed a necessary condition for a star: an object that is smaller than its Schwarzschild radius will turn out to be not a star but a black hole.
2. A stronger constraint on the relative size of $r_0$ and $r_s$ arises from an analysis of the solution (23.95) of the TOV equation. Recall that we have $f(r) = 1 - r^2/R^2$. Since $r < r_0$ in the interior, one has $f(r_0) < f(r)$. Thus the pressure can potentially become infinite at values of $r$ where the denominator of (23.95) vanishes.
The condition for the existence of a stable star, i.e. the requirement that the pressure be non-singular everywhere in the interior of the star, in particular at the origin $r = 0$, is
\[ f(r_0)^{1/2} > 1/3 \quad\Leftrightarrow\quad r_0 > \frac{9}{8} r_s \quad\Rightarrow\quad M_{max} = \frac{4}{9 G_N} r_0 . \tag{23.110} \]
Thus we learn that a star consisting of matter with constant energy density must be larger than 9/8 times its Schwarzschild radius. In other words, the maximal amount of mass that can be contained in a star with radius $r_0$ is bounded from above by $4r_0/9G_N$, or
\[ \frac{2m}{r_0} < \frac{8}{9} . \tag{23.111} \]
This result is actually valid for far more general equations of state and is known as the Buchdahl limit or Buchdahl's Theorem (1959).$^{73}$
3. Here we have considered the simplest model of a static star, with constant energy density. The basic set-up, however, can also be used to analyse in detail the stellar structure and evolution of other compact stars like neutron stars.$^{74}$
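To get a feeling for the bound in remark 2, one can tabulate the central pressure as the star is made more compact. The sketch below assumes that (23.95) has the standard closed form, so that $p(0)/\rho_0 = (1 - f(r_0)^{1/2})/(3 f(r_0)^{1/2} - 1)$ with $f(r_0) = 1 - r_s/r_0$. The function name and the sample radii are purely illustrative.

```python
import math

def central_pressure_ratio(r0_over_rs):
    """p(0)/rho_0 for the constant-density star (assumed closed form of (23.95))."""
    s0 = math.sqrt(1.0 - 1.0 / r0_over_rs)   # f(r0)^{1/2} with f(r0) = 1 - rs/r0
    denom = 3.0 * s0 - 1.0                    # vanishes at r0 = (9/8) rs
    if denom <= 0.0:
        raise ValueError("no regular static solution for r0 <= (9/8) rs")
    return (1.0 - s0) / denom

# The central pressure grows without bound as r0 approaches (9/8) rs = 1.125 rs.
ratios = [central_pressure_ratio(x) for x in (10.0, 2.0, 1.2, 1.13)]
```

For a dilute star ($r_0 \gg r_s$) the ratio is small, while close to $r_0 = 9r_s/8$ it becomes arbitrarily large, which is the content of (23.110).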
In section 22.4 we had derived two (tentative) expressions for the total energy of an isolated system, namely the ADM energy (22.32)
\[ E_{ADM} = \frac{1}{16\pi G_N} \oint_{S_\infty^2} dS_i\, (\partial_k h_{ik} - \partial_i h_{kk}) , \tag{23.112} \]
$^{73}$ See e.g. J. Guven, N. Ó Murchadha, Bounds on 2m/R for static spherical objects, arXiv:gr-qc/9903067, H. Andréasson, Sharp bounds on 2m/r of general spherically symmetric static objects, arXiv:gr-qc/0702137, J. Mark Heinzle, Bounds on 2m/r for static perfect fluids, arXiv:0708.3352 [gr-qc] for a survey of known results and more recent work along these lines.
$^{74}$ See e.g. N. Straumann, General Relativity: with applications to astrophysics.
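and the Komar energy (22.41)
\[ E_{Komar} = -\frac{1}{8\pi G_N} \oint_{S_\infty^2} dS_i\, \partial_i h_{00} = -\frac{1}{8\pi G_N} \oint_{S_\infty^2} dS_{\mu\nu}\, \nabla^\mu K^\nu . \tag{23.113} \]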
We can now apply these to the Schwarzschild metric (for which we had already calculated
the canonical ADM energy in section 20.12).
In standard coordinates the asymptotic behaviour of the spatial part of the metric is
Since the ADM expression for the energy appeals to an asymptotically Cartesian coordinate system, this is one instance where it is perhaps more natural (and safer) to use the Schwarzschild metric in isotropic coordinates (23.39). The asymptotic ($\rho \to \infty$, with $\rho^2 = \vec{x}^2$) behaviour of the spatial part of the metric is
\[ (1 + m/2\rho)^4 d\vec{x}^2 \to (1 + 2m/\rho)\, d\vec{x}^2 \quad\Rightarrow\quad h_{ik} = \frac{2m}{\rho} \delta_{ik} . \tag{23.119} \]
Note that this is not the same as (23.116). Nevertheless, calculating $\partial_k h_{ik} - \partial_i h_{kk}$ in this case, one finds the same expression as in Schwarzschild coordinates (with $r \to \rho$),
\[ \partial_k h_{ik} - \partial_i h_{kk} = \frac{4m}{\rho^3} x^i , \tag{23.120} \]
and the remainder of the calculation is then identical, leading again to $E_{ADM} = M$.
This agrees with the result of the calculations of the canonical ADM energy of the
Schwarzschild metric in section 20.12.
From the alternative Komar expression (22.41) for the energy, this (eminently reasonable and respectable) result arises not from the asymptotic behaviour of the spatial components of the metric but instead from that of the (00)-component of the metric (the relation between the two being provided by the Einstein equations). We have
\[ g_{00} = -1 + \frac{2m}{r} \quad\Rightarrow\quad h_{00} = \frac{2m}{r} , \tag{23.121} \]
leading to
\[ \oint_{S_r^2} dS_i\, \partial_i h_{00} = 4\pi r^2\, \partial_r (2m/r) = -8\pi m . \tag{23.122} \]
Note that for the special case of the Schwarzschild metric, for which (23.121) is exact (and not just true asymptotically), this is independent of $r$, and thus
\[ E_{Komar} = -\lim_{r\to\infty} \frac{1}{8\pi G_N} \oint_{S_r^2} dS_i\, \partial_i h_{00} = -\frac{1}{8\pi G_N} \oint_{S_r^2} dS_i\, \partial_i h_{00} = M . \tag{23.123} \]
However, more generally, for any metric with the asymptotic behaviour
\[ g_{00} = -1 + \frac{a}{r} + O(r^{-2}) \tag{23.124} \]
this calculation shows that the mass / energy of the solution is determined by the $1/r$ term of the (00)-component of the metric,
\[ E_{Komar} = -\lim_{r\to\infty} \frac{1}{8\pi G_N} \oint_{S_r^2} dS_i\, \partial_i h_{00} = a/2G_N . \tag{23.125} \]
In particular, this applies e.g. to the Reissner-Nordström metric of section 30 and shows that the parameter $m$ appearing in the solution
\[ ds^2 = -f(r) dt^2 + f(r)^{-1} dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} \tag{23.126} \]
has the interpretation of the total energy of the system, $E = m/G_N$, even in the presence of charge and electrostatic fields.
Remarks:
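\[ \nabla_\nu (\nabla^\mu K^\nu) = R^\mu{}_\nu K^\nu . \tag{23.127} \]
In particular, therefore, in source-free regions of space-time ($T_{\mu\nu} = 0 \Rightarrow R_{\mu\nu} = 0$) one has $\nabla_\nu(\nabla^\mu K^\nu) = 0$ and by the usual arguments the surface integral of $\nabla^\mu K^\nu$ is then independent of the choice of surface.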
The Reissner-Nordström metric, on the other hand, is a solution of the combined Einstein-Maxwell equations with a non-trivial energy-momentum tensor throughout space-time. In this case $\nabla_\nu(\nabla^\mu K^\nu) \neq 0$, and the Komar integral will depend on the radius of the sphere, say, as different surface-integrals will enclose different amounts of electrostatic energy.
2. One may be a bit concerned by the fact that we obtained a non-zero result for the ADM or Komar energy for the Schwarzschild metric, allegedly a solution of the vacuum Einstein equations, even though in order to arrive at the expressions for the energy, we started off with a non-trivial energy-momentum tensor, either in the linearised theory, with $T_{00} = \rho \neq 0$, or via the Komar charge associated to the current
\[ J^\mu = R^\mu{}_\nu K^\nu = 8\pi G_N (T^\mu{}_\nu - \tfrac12 \delta^\mu{}_\nu T) K^\nu \tag{23.128} \]
(which is identically zero for a vacuum solution, so it appears that one doesn't have a leg to stand on in that case).
This seeming conflict can be resolved in a number of ways. Let us begin with the ADM energy.
relevant statement.
Now let us turn to the Komar energy. The point to realise (recall) is that it is not the conserved Komar current $J^\mu$ (which would indeed be vanishing identically for a vacuum solution) which is the crucial quantity but rather the object $A^{\mu\nu} = \nabla^\mu K^\nu$ satisfying (23.127), and its surface integral. If one chooses the 3-volume $V$ to remain outside the star (or black hole), bounded by 2-spheres at radii $r_1 > r_0$ (radius of the star) or $r_1 > r_s$, and $r_2 > r_1$, then the fact that $J^\mu = 0$ in that region just reproduces the statement that the 2-surface integral is independent of the radius but does not preclude a non-zero value of the integral over either one of those surfaces.
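The radius (in)dependence of the Komar surface integral discussed in these remarks can be made concrete with a few lines of code. The sketch below evaluates $-(8\pi G_N)^{-1}\, 4\pi r^2\, \partial_r h_{00}$ on a sphere of finite radius $r$, for the Schwarzschild form $h_{00} = 2m/r$ and for the Reissner-Nordström form $h_{00} = 2m/r - q^2/r^2$ read off from (23.126). The overall sign of the prefactor, the units $G_N = 1$ and the sample values of $m$ and $q$ are assumptions made for illustration.

```python
import math

G_N = 1.0  # units with G_N = 1 (assumption)

def komar_energy(h00, r, rel_step=1e-4):
    """-(1/(8 pi G_N)) * 4 pi r^2 * d(h00)/dr on a sphere of radius r.

    Spherical symmetry is assumed, so the surface integral reduces to
    the area of the sphere times the radial derivative of h00.
    """
    dr = rel_step * r
    dh = (h00(r + dr) - h00(r - dr)) / (2.0 * dr)   # central difference
    return -(4.0 * math.pi * r**2 * dh) / (8.0 * math.pi * G_N)

m, q = 1.0, 0.5   # sample parameters (illustrative)

def schwarzschild(r):
    return 2.0 * m / r

def reissner_nordstrom(r):
    return 2.0 * m / r - q**2 / r**2

# (radius, Schwarzschild value, Reissner-Nordstrom value)
table = [(r, komar_energy(schwarzschild, r), komar_energy(reissner_nordstrom, r))
         for r in (5.0, 50.0, 500.0)]
```

In the Schwarzschild case every radius gives $m/G_N$, whereas in the charged case one finds $(m - q^2/r)/G_N$, which approaches $m/G_N$ only as $r \to \infty$.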
24 Particle and Photon Orbits in the Schwarzschild Geometry
We now come to the heart of the matter, the study of planetary orbits and light rays
in the gravitational field of the sun, i.e. the properties of timelike and null geodesics of
the Schwarzschild geometry. We shall see that, by once again making good use of the
symmetries of the problem, we can reduce the geodesic equations to a single first order
differential equation in one variable, analogous to that for a one-dimensional particle
moving in a particular potential. Solutions to this equation can then readily be discussed
qualitatively and also quantitatively (analytically).
A convenient starting point in general for discussing geodesics is, as I stressed before, the Lagrangian $\mathcal{L} = g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu$. For the Schwarzschild metric this is
where $2m = 2MG_N/c^2$. Rather than writing down and solving the (second order) geodesic equations, we will make use of the conserved quantities $K_\mu \dot{x}^\mu$ associated with Killing vectors. After all, conserved quantities correspond to first integrals of the equations of motion and if there are a sufficient number of them (there are) we can directly reduce the second order differential equations to first order equations.
So, how many Killing vectors does the Schwarzschild metric have? Well, since the
metric is static, there is one timelike Killing vector, namely $\partial/\partial t$, and since the metric
is spherically symmetric, there are spatial Killing vectors generating the Lie algebra of
SO(3), hence there are three of those, given explicitly in (8.53), and therefore all in all
four Killing vectors.
Now, since the gravitational field is isotropic (and hence there is conservation of angular momentum), the orbits of the particles or planets are planar. Without loss of generality, we can choose our coordinates in such a way that this plane is the equatorial plane $\theta = \pi/2$, so in particular $\dot\theta = 0$ at all times. In case this is not obvious, here are two ways to establish this:
One can certainly choose one's coordinates in such a way that at some initial time $\tau = \tau_0$ one has $\theta(\tau_0) = \pi/2$ and that the angular velocity $\dot\theta(\tau_0) = 0$. It then follows from the Euler-Lagrange equations of motion for $\theta(\tau)$ that they are solved at all times by $\theta(\tau) = \pi/2$, $\dot\theta(\tau) = 0$.
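associated to the Killing vectors $V_{(a)}$ (8.53),
\[ \begin{aligned} L_{(1)} &= r^2 \left(-\cos\phi\, \dot\theta + \cot\theta \sin\phi\, (\sin^2\theta)\, \dot\phi\right) \\ L_{(2)} &= r^2 \left(+\sin\phi\, \dot\theta + \cot\theta \cos\phi\, (\sin^2\theta)\, \dot\phi\right) \\ L_{(3)} &= r^2 \sin^2\theta\, \dot\phi , \end{aligned} \tag{24.3} \]
and to use spherical symmetry to rotate the $L_{(a)}$ into the form
Using the explicit expressions (24.3), it is straightforward to see that $L_{(1)} = L_{(2)} = 0$ implies $\theta = \pi/2$, $\dot\theta = 0$.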
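This leaves two conserved quantities, the energy (per unit rest mass) $E$ and the magnitude $L$ (per unit rest mass) of the angular momentum, corresponding to the cyclic variables $t$ and $\phi$ (or: corresponding to the Killing vectors $\partial/\partial t$ and $V_{(3)} = \partial/\partial\phi$),
\[ \begin{aligned} \frac{\partial\mathcal{L}}{\partial t} = 0 \quad&\Rightarrow\quad \frac{d}{d\tau} \frac{\partial\mathcal{L}}{\partial\dot{t}} = 0 \\ \frac{\partial\mathcal{L}}{\partial\phi} = 0 \quad&\Rightarrow\quad \frac{d}{d\tau} \frac{\partial\mathcal{L}}{\partial\dot\phi} = 0 , \end{aligned} \tag{24.6} \]
namely
\[ E = (1 - 2m/r)\, \dot{t} \tag{24.7} \]
\[ L = r^2 \sin^2\theta\, \dot\phi = r^2 \dot\phi . \tag{24.8} \]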
Calling L the angular momentum (per unit rest mass) requires no further justification,
but let me pause to explain in what sense E is an energy (per unit rest mass). On the
one hand, it is the conserved quantity (2.75) associated to time-translation invariance.
As such, it certainly deserves to be called the energy.
It is moreover true that for a particle at infinity ($r \to \infty$) $E$ is just the special relativistic energy $E = \gamma(v_\infty) c^2$, with $\gamma(v) = (1 - v^2/c^2)^{-1/2}$ the usual relativistic $\gamma$-factor, and $v_\infty$ the coordinate velocity $dr/dt$ at infinity. This can be seen in two ways. First of all, for a particle that reaches $r = \infty$, the constant $E$ can be determined by evaluating it at $r = \infty$. It thus follows from the definition of $E$ that
\[ E = \dot{t}_\infty . \tag{24.9} \]
In Special Relativity, the relation between proper and coordinate time is given by (setting $c = 1$ again)
\[ d\tau = \sqrt{1 - v^2}\, dt \quad\Rightarrow\quad \dot{t} = \gamma(v) , \tag{24.10} \]
suggesting the identification
\[ E = \gamma(v_\infty) \qquad (E = \gamma(v_\infty) c^2 \text{ if } c \neq 1) \tag{24.11} \]
Another argument for this identification will be given below, once we have introduced
the effective potential.
As we have seen in section 2.1 (and again in section 4.8), there is also always one more integral of the geodesic equation, namely $\mathcal{L}$ itself,
\[ \frac{d}{d\tau} \mathcal{L} = 2 g_{\mu\nu} \dot{x}^\mu D_\tau \dot{x}^\nu = 0 \tag{24.12} \]
(this can be interpreted as the conserved Hamiltonian associated with the invariance of $\mathcal{L}$ under translations of the affine parameter). Thus we set
\[ \mathcal{L} = \epsilon , \tag{24.13} \]
where $\epsilon = -1$ for timelike geodesics and $\epsilon = 0$ for null geodesics. Thus we have
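and we can now express $\dot{t}$ and $\dot\phi$ in terms of the conserved quantities $E$ and $L$ to obtain a first order differential equation for $r$ alone, namely
\[ -(1 - 2m/r)^{-1} E^2 + (1 - 2m/r)^{-1} \dot{r}^2 + \frac{L^2}{r^2} = \epsilon . \tag{24.15} \]
Multiplying by $(1 - 2m/r)/2$ and rearranging the terms, one obtains
\[ \frac{E^2 + \epsilon}{2} = \frac{\dot{r}^2}{2} + \epsilon \frac{m}{r} + \frac{L^2}{2r^2} - \frac{mL^2}{r^3} . \tag{24.16} \]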
Now this equation is of the familiar Newtonian form
\[ E_{eff} = \frac{\dot{r}^2}{2} + V_{eff}(r) , \tag{24.17} \]
with
\[ \begin{aligned} E_{eff} &= \frac{E^2 + \epsilon}{2} \\ V_{eff}(r) &= \epsilon \frac{m}{r} + \frac{L^2}{2r^2} - \frac{mL^2}{r^3} , \end{aligned} \tag{24.18} \]
describing the energy conservation in an effective potential. Except for $t \to \tau$, this is exactly the same as the Newtonian equation of motion in a potential
\[ V(r) = \epsilon \frac{m}{r} - \frac{mL^2}{r^3} , \tag{24.19} \]
the effective angular momentum term $L^2/r^2 = r^2 \dot\phi^2$ arising, as usual, from the change to polar coordinates. As in the corresponding Newtonian one-dimensional (radial) problem, the qualitative behaviour and characteristics of the orbits in the Schwarzschild geometry can thus essentially be determined by inspection.
Given that in principle we started off with the four coupled non-linear geodesic differen-
tial equations, this is an enormous and enormously useful simplification, and the main
result of this section.
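The reduction from the constraint (24.15) to the effective potential form (24.17) is a short piece of algebra that is easy to check numerically. In the sketch below the function names and the sample values of $m$, $L$ and $E$ are illustrative assumptions. Note that the identity holds for any $r$, also at radii where the resulting $\dot{r}^2$ is negative and the motion is therefore not allowed.

```python
def v_eff(r, m, L, eps):
    """Effective potential (24.18); eps = -1 (timelike) or 0 (null)."""
    return eps * m / r + L**2 / (2.0 * r**2) - m * L**2 / r**3

def rdot_squared(r, m, L, E, eps):
    """Solve the constraint (24.15) for rdot^2."""
    f = 1.0 - 2.0 * m / r
    return E**2 + f * (eps - L**2 / r**2)

m, L, E = 1.0, 4.5, 0.98   # sample values (illustrative)

# E_eff - (rdot^2/2 + V_eff) should vanish identically, cf. (24.17).
residuals = [
    0.5 * (E**2 + eps) - (0.5 * rdot_squared(r, m, L, E, eps) + v_eff(r, m, L, eps))
    for eps in (-1, 0)
    for r in (3.5, 8.0, 40.0)
]
```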
Remarks:
and noting that $V_{eff}(r) \to 0$ for $r \to \infty$, we can read off that for a particle that reaches $r = \infty$ we have the relation
\[ \dot{r}_\infty^2 = E^2 - 1 . \tag{24.22} \]
This implies, in particular, that for such (scattering) trajectories one necessarily has $E \geq 1$, with $E = 1$ corresponding to a particle initially or finally at rest at infinity. For $E > 1$ the coordinate velocity at infinity can be computed from
\[ v_\infty^2 = \frac{\dot{r}_\infty^2}{\dot{t}_\infty^2} . \tag{24.23} \]
Using (24.9) and (24.22), one finds
\[ v_\infty^2 = \frac{E^2 - 1}{E^2} \quad\Rightarrow\quad E = (1 - v_\infty^2)^{-1/2} , \tag{24.24} \]
thus confirming the result claimed in (24.11).
3. For null geodesics, $\epsilon = 0$, on the other hand, the Newtonian part of the potential is zero, as one might expect for massless particles, but in General Relativity a photon with $L \neq 0$ feels a non-trivial potential
\[ \epsilon = 0 \quad\Rightarrow\quad V(r) = -\frac{mL^2}{r^3} , \qquad V_{eff}(r) = f(r) \frac{L^2}{2r^2} . \tag{24.25} \]
4. Finally, we note that the success of the above analysis relied only on the symmetries of the metric, not on the fact that the particular metric we were looking at satisfies the vacuum Einstein equations. It is indeed straightforward to generalise the preceding analysis to arbitrary static spherically symmetric metrics. As mentioned before, among these general static spherically symmetric metrics the class of metrics (23.68)
is of particular importance and interest. For these metrics one finds that the geodesic equation can still be reduced to the effective potential form (24.17), with $E_{eff}$ precisely as in (24.18), and $V_{eff}$ now given by
\[ V_{eff}(r) = -\epsilon\, \phi(r) + \frac{L^2}{2r^2} + \frac{L^2}{r^2} \phi(r) , \tag{24.27} \]
where we have written $f(r)$ in terms of the corresponding Newtonian potential $\phi(r)$ as
\[ f(r) = 1 + 2\phi(r) . \tag{24.28} \]
For the Schwarzschild metric one has $\phi(r) = -m/r$ and (24.27) reduces to (24.18).
Namely, starting from the Lagrangian $\mathcal{L}$ (24.1), considerations about spherical symmetry led us to choose $\theta = \pi/2$, and imposing this condition in the Lagrangian we were led to the reduced Lagrangian in (24.5). This is legitimate, since such holonomic constraints can always be inserted into the Lagrangian itself.
Subsequently we used the conserved quantities $E$ and $L$ associated to the cyclic variables $t$ and $\phi$ in order to eliminate from the Lagrangian the quantities $\dot{t}$ and $\dot\phi$. Here one needs to be more careful. In general such non-holonomic constraints like $r^2 \dot\phi = L$ cannot be inserted into the Lagrangian to obtain a (reduced) Lagrangian from which the equations of motion for the remaining variables (here $r$) can be obtained as the Euler-Lagrange equations (one correct way to do this is to pass to what is known as the Routhian, a partial Legendre transform of the Lagrangian on the cyclic variables).
As a simple example, consider a (unit mass) free particle in 2 dimensions, expressed in polar coordinates. Its Lagrangian is
\[ \mathcal{L} = \tfrac12 (\dot{r}^2 + r^2 \dot\phi^2) . \tag{24.29} \]
\[ p_\phi = r^2 \dot\phi = L , \tag{24.30} \]
and the equation of motion for $r$ is the Euler-Lagrange equation
\[ \ddot{r} = r \dot\phi^2 . \tag{24.31} \]
\[ \ddot{r} = L^2/r^3 . \tag{24.32} \]
Had one eliminated $\dot\phi$ in the Lagrangian instead, one would have found the reduced Lagrangian and resulting Euler-Lagrange equations
\[ \mathcal{L} \to \tfrac12 (\dot{r}^2 + L^2/r^2) \quad\Rightarrow\quad \ddot{r} = -L^2/r^3 \quad \text{(wrong!)} . \tag{24.33} \]
However, even though you might have gained that impression, this is of course not
what we were doing in our derivation of the radial effective potential equation. There
we were using this elimination in conjunction with the condition that L is constant on
solutions, to obtain a first-order equation for r (the effective potential equation), and
this is legitimate. This is a first integral of the second-order radial equation; the second-order equation for $r$ follows from it simply by differentiating with respect to time $t$, and not by treating $\dot{r}^2/2 + V_{eff}(r)$ as a Lagrangian and looking at its Euler-Lagrange equations.
And finally, just to illustrate the claim about the Routhian in the above example, the partial Legendre transform of $\mathcal{L}$ with respect to the cyclic variable $\phi$ is (expressing $\dot\phi$ in terms of $p_\phi$)
\[ R = \mathcal{L} - p_\phi \dot\phi = \tfrac12 (\dot{r}^2 - p_\phi^2/r^2) \to \tfrac12 (\dot{r}^2 - L^2/r^2) \tag{24.34} \]
(note the sign flip compared with the wrong Lagrangian above). This is the correct reduced Lagrangian, whose Euler-Lagrange equations give rise to the correct radial equation of motion.
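The sign issue in this example can be checked numerically. Both reduced Lagrangians have the form $\dot{r}^2/2 - U(r)$, so their Euler-Lagrange equations read $\ddot{r} = -dU/dr$. The sketch below compares the two choices of $U$ with the correct radial equation (24.32). The helper name, the sample value of $L$ and the radius are illustrative assumptions.

```python
def radial_acceleration(potential, r, h=1e-6):
    """rddot = -dU/dr for a reduced Lagrangian of the form rdot^2/2 - U(r)."""
    return -(potential(r + h) - potential(r - h)) / (2.0 * h)

L = 1.3   # conserved angular momentum (sample value)

def routhian_potential(r):
    # Routhian (24.34): R = rdot^2/2 - L^2/(2 r^2), i.e. U = +L^2/(2 r^2)
    return L**2 / (2.0 * r**2)

def naive_potential(r):
    # Naive substitution (24.33): rdot^2/2 + L^2/(2 r^2), i.e. U = -L^2/(2 r^2)
    return -L**2 / (2.0 * r**2)

r = 2.0
correct = L**2 / r**3                                   # right-hand side of (24.32)
from_routhian = radial_acceleration(routhian_potential, r)
from_naive = radial_acceleration(naive_potential, r)
```

Here `from_routhian` reproduces the repulsive centrifugal term, while `from_naive` has the opposite sign.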
Typically, one is primarily interested in the shape of an orbit, that is in the radius $r$ as a function of $\phi$, $r = r(\phi)$, rather than in the dependence of, say, $r$ on some extraterrestrial's proper time $\tau$. In this case, the above mentioned difference between $t$ (in the Newtonian theory) and $\tau$ (here) is irrelevant: In the Newtonian theory one uses $L = r^2 d\phi/dt$ to express $t$ as a function of $\phi$, $t = t(\phi)$, to obtain $r(\phi)$ from $r(t)$. In General Relativity, one uses the analogous equation $L = r^2 d\phi/d\tau$ to express $\tau$ as a function of $\phi$, $\tau = \tau(\phi)$. Hence the shapes of the General Relativity orbits are precisely the shapes of the Newtonian orbits in the potential (24.19). Thus we can use the standard methods of Classical Mechanics to discuss these general relativistic orbits and of course this simplifies matters considerably.
to combine (24.17),
\[ \dot{r}^2 = 2E_{eff} - 2V_{eff}(r) , \tag{24.36} \]
and (24.8),
\[ \dot\phi^2 = \frac{L^2}{r^4} \tag{24.37} \]
into
\[ \frac{r'^2}{r^4} L^2 = 2E_{eff} - 2V_{eff}(r) \tag{24.38} \]
where a prime denotes a $\phi$-derivative.
In the examples to be discussed below, we will be interested in the angle $\Delta\phi$ swept out by the object in question (a planet or a photon) as it travels along its trajectory between the farthest distance $r_2$ from the star (sun) ($r_2 = \infty$ for scattering trajectories) and the position of closest approach to the star $r_1$ (the perihelion or, more generally, if we are not talking about our own solar system, periastron), and back again,
\[ \Delta\phi = 2 \int_{r_1}^{r_2} \frac{d\phi}{dr}\, dr . \tag{24.39} \]
In the Newtonian case, these integrals can be evaluated in closed form. With the
general relativistic correction term, however, these are elliptic integrals which cannot be
expressed in closed form. A perturbative evaluation of these integrals (treating the exact
general relativistic correction as a small perturbation) also turns out to be somewhat
delicate since e.g. the limits of integration depend on the perturbation.
It is somewhat simpler to deal with this correction term not at the level of the solution
(integral) but at the level of the corresponding differential equation. As in the Kepler
problem, it is convenient to make the change of variables
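\[ u = \frac{1}{r} \quad\Rightarrow\quad u' = -\frac{r'}{r^2} . \tag{24.40} \]
Then (24.38) becomes
\[ u'^2 = L^{-2} (2E_{eff} - 2V_{eff}(r)) . \tag{24.41} \]
Upon inserting the explicit expression for the effective potential, this becomes
\[ u'^2 + u^2 = \frac{E^2 + \epsilon}{L^2} - \frac{2\epsilon m}{L^2} u + 2mu^3 . \tag{24.42} \]
This can be used to obtain an equation for $d\phi(u)/du = u'^{-1}$, leading to
\[ \Delta\phi = 2 \int_{u_2}^{u_1} \frac{d\phi}{du}\, du . \tag{24.43} \]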
Thus either $u' = 0$, which corresponds to a circular orbit of constant radius (i.e. the solution is $u(\phi) = u_0$ or $r(\phi) = r_0$), and this is not only a trivial but also an irrelevant solution since neither the planets nor the photons of interest to us travel on circular orbits, or
\[ u'' + u = -\epsilon \frac{m}{L^2} + 3mu^2 . \tag{24.45} \]
The unperturbed (Newtonian) equation is the linear equation obtained by dropping the last term, and its solutions $u_0(\phi)$ give the familiar conic sections for $\epsilon = -1$ (cf. section 24.4) and straight lines for $\epsilon = 0$ (section 24.6). Treating the last term as a small perturbation, one can expand the solution $u$ as
\[ u = u_0 + u_1 \tag{24.46} \]
\[ u_1'' + u_1 = 3mu_0^2 \tag{24.47} \]
for the perturbation $u_1$, with $u_0$ providing the source term (or external force on the harmonic oscillator).
Equations (24.45) and (24.47) are the equations that we will study below to determine
the perihelion shift and the bending of light by a star. In the latter case, which is a
bit simpler, I will also sketch two other derivations of the result, based on different
perturbative evaluations of the elliptic integral.
2. At the Schwarzschild radius $r_s = 2m$, nothing special happens and the potential is completely regular there,
\[ V_{eff}(r = 2m) = -\frac{1}{2} . \tag{24.50} \]
For the discussion of planetary orbits in the solar system we can safely assume that the radius of the sun is much larger than its Schwarzschild radius, $r_0 \gg r_s$, but the above shows that even for these highly compact objects with $r_0 < r_s$ geodesics are perfectly regular as one approaches $r_s$. Of course the particular numerical value of $V_{eff}(r = 2m)$ has no special significance because $V(r)$ can always be shifted by a constant.
Figure 12: Effective potential for a massive particle with $L/m < \sqrt{12}$. The extrapolation to values of $r < 2m$ has been indicated by a dashed line.
3. The extrema of the potential, i.e. the points at which $dV_{eff}/dr = 0$, are at
\[ mr^2 - L^2 r + 3mL^2 = 0 \quad\Rightarrow\quad r_\pm = (L^2/2m) \left[1 \pm \sqrt{1 - 12(m/L)^2}\right] , \tag{24.51} \]
and the potential has a maximum at $r_-$ and a local minimum at $r_+$. Thus there are qualitative differences in the shapes of the orbits between $L/m < \sqrt{12}$ and $L/m > \sqrt{12}$.
Let us discuss these two cases in turn. When $L/m < \sqrt{12}$, then there are no critical points and the potential looks approximately like that in Figure 12. Note that we should be careful with extrapolating to values of $r$ with $r < 2m$ because we know that the Schwarzschild metric has a coordinate singularity there. However, qualitatively the picture is also correct for $r < 2m$.
From this picture we can read off that there are no bounded orbits for these values of the parameters. Any inward bound particle with $L < \sqrt{12}\, m$ will continue to fall inwards (provided that it moves on a geodesic). This should be contrasted with the Newtonian situation in which for any $L \neq 0$ there is always the centrifugal barrier reflecting incoming particles since the repulsive term $L^2/2r^2$ will dominate over the attractive $-m/r$ for small values of $r$. In General Relativity, on the other hand, it is the attractive term $-mL^2/r^3$ that dominates for small $r$.
Figure 13: Effective potential for a massive particle with $L/m > \sqrt{12}$. Shown are the maximum of the potential at $r_-$ (an unstable circular orbit), the minimum at $r_+$ (a stable circular orbit), and the orbit of a particle with $E_{eff} < 0$ with turning points $r_1$ and $r_2$.
Fortunately for the stability of the solar system, the situation is qualitatively quite different for sufficiently large values of the angular momentum, namely $L > \sqrt{12}\, m$ (see Figure 13).
In that case, there is a minimum and a maximum of the potential. The critical radii correspond to exactly circular orbits, unstable at $r_-$ (on top of the potential) and stable at $r_+$ (the minimum of the potential). From (24.8) one finds that the angular coordinate velocity on such a circular orbit is
From the effective potential equation and $\dot{r} = 0$ one finds that the corresponding energy is
\[ E^2 = 1 + 2V_{eff}(r_\pm) = (1 - 2m/r_\pm)(1 + L^2/r_\pm^2) \tag{24.53} \]
\[ L^2 = m r_\pm^2/(r_\pm - 3m) , \tag{24.54} \]
Since $\Omega$ is inversely proportional to the period of the motion, this equation looks very much like Kepler's law which says that the period-squared is proportional to the radius-cubed of an orbit. However, this is a bit of a fluke because this relation is not coordinate-independent but relies on having expressed time and radius in terms of the Schwarzschild coordinates (rather than in terms of proper time along the orbit, say).
For $L \to \sqrt{12}\, m$ these two circular orbits approach each other, the critical radius tending to $r_\pm \to 6m$. Thus the innermost stable circular orbit (known affectionately as the ISCO in astrophysics) is located at
\[ r_{ISCO} = 6m . \tag{24.56} \]
This is of course only a relevant quantity when $r_{ISCO}$ lies outside the star, i.e. for $r_0 < 3r_s$ (and thus mainly for black holes or other quite extreme objects). It has the characteristic orbital frequency
On the other hand, for very large values of $L$ the critical radii are (expand the square root to first order) to be found at
\[ (r_+, r_-) \stackrel{L \to \infty}{\longrightarrow} (L^2/m, 3m) . \tag{24.58} \]
For given $L$, for sufficiently large values of $E_{eff}$ a particle will fall all the way down the potential. For $E_{eff} < 0$, there are bound orbits which are not circular and which range between the radii $r_1$ and $r_2$, the turning points at which $\dot{r} = 0$ and therefore $E_{eff} = V_{eff}(r_{1,2})$. We will take a closer look at these bound (but not closed) orbits below.
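The statements about the critical radii are easily explored numerically. The following sketch solves the quadratic equation in (24.51) and looks at the three regimes discussed above: no circular orbits for $L^2 < 12m^2$, the marginal case $L^2 = 12m^2$ with $r_\pm = 6m$, and the large-$L$ limit (24.58). The function takes $L^2$ rather than $L$ as its argument (an implementation choice made here to avoid rounding at the marginal value), and all names and sample values are illustrative.

```python
import math

def critical_radii(m, L_squared):
    """Roots (r_minus, r_plus) of m r^2 - L^2 r + 3 m L^2 = 0, cf. (24.51).

    Returns None if L^2 < 12 m^2, in which case there are no circular orbits.
    """
    disc = 1.0 - 12.0 * m**2 / L_squared
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    pref = L_squared / (2.0 * m)
    return pref * (1.0 - root), pref * (1.0 + root)

m = 1.0
generic = critical_radii(m, 16.0)     # L/m = 4 > sqrt(12)
marginal = critical_radii(m, 12.0)    # L/m = sqrt(12): the ISCO
large_L = critical_radii(m, 1.0e4)    # L/m = 100
nothing = critical_radii(m, 9.0)      # L/m = 3 < sqrt(12)

# Consistency with (24.54): L^2 = m r^2 / (r - 3m) at both critical radii.
check_24_54 = [m * r**2 / (r - 3.0 * m) for r in generic]
```

For $L^2 = 16m^2$ one finds the unstable orbit at $r_- = 4m$ and the stable one at $r_+ = 12m$, both of which satisfy (24.54).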
Because of the general relativistic correction $\sim 1/r^3$, the bound orbits will not be closed (elliptical). In particular, the position of the perihelion, the point of closest approach of the planet to the sun where the planet has distance $r_1$, will not remain constant. However, because $r_1$ is constant, and the planetary orbit is planar, this point will move on a circle of radius $r_1$ around the sun.
As described in section 24.2, in order to calculate this perihelion shift one needs to calculate the total angle $\Delta\phi$ swept out by the planet during one revolution by integrating this from $r_1$ to $r_2$ and back again to $r_1$, or
\[ \Delta\phi = 2 \int_{r_1}^{r_2} \frac{d\phi}{dr}\, dr . \tag{24.59} \]
Rather than trying to evaluate the above integral via some sorcery, we will determine $\Delta\phi$ by analysing the orbit equation (24.45) for $\epsilon = -1$,
\[ u'' + u = \frac{m}{L^2} + 3mu^2 . \tag{24.60} \]
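The orbit equation (24.60) can also simply be integrated numerically, which provides an independent check on any perturbative treatment. The sketch below uses a classical fourth-order Runge-Kutta step (a choice made here, not part of the notes) to follow $u(\phi)$ from one perihelion to the next and returns $\Delta\phi - 2\pi$. For comparison it uses the standard first-order result $6\pi (m/L)^2$ for the perihelion shift, which is assumed here since the corresponding formula is not reproduced in this part of the text. The parameter values are illustrative.

```python
import math

def perihelion_advance(m, L, e, h=1e-3):
    """Integrate u'' + u = m/L^2 + 3 m u^2 from one perihelion to the next.

    Starts at a perihelion (u maximal, u' = 0) with u = (m/L^2)(1 + e) and
    returns Delta_phi - 2 pi, the angle by which the perihelion has advanced.
    """
    def acc(u):
        return m / L**2 + 3.0 * m * u**2 - u

    phi, u, v = 0.0, (m / L**2) * (1.0 + e), 0.0      # v = u'
    for _ in range(int(4.0 * math.pi / h)):
        # one RK4 step for the first-order system u' = v, v' = acc(u)
        k1u, k1v = v, acc(u)
        k2u, k2v = v + 0.5 * h * k1v, acc(u + 0.5 * h * k1u)
        k3u, k3v = v + 0.5 * h * k2v, acc(u + 0.5 * h * k2u)
        k4u, k4v = v + h * k3v, acc(u + h * k3u)
        u_new = u + h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v_new = v + h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if v > 0.0 and v_new <= 0.0:
            # u' crosses zero from above: next perihelion (linear interpolation)
            return phi + h * v / (v - v_new) - 2.0 * math.pi
        phi, u, v = phi + h, u_new, v_new
    raise RuntimeError("no second perihelion found (orbit not bound?)")

m, L, e = 1.0, 20.0, 0.2                     # sample values (illustrative)
numerical = perihelion_advance(m, L, e)
first_order = 6.0 * math.pi * (m / L)**2     # assumed first-order result
```

For these (rather relativistic) sample values the numerical shift is positive and agrees with the first-order expression to within a few percent, the difference being due to higher-order terms in $m/L$.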
In the Newtonian approximation, this equation reduces to that of a displaced harmonic oscillator,
\[ u_0'' + u_0 = \frac{m}{L^2} \quad\Leftrightarrow\quad (u_0 - m/L^2)'' + (u_0 - m/L^2) = 0 , \tag{24.61} \]
and the solution is a Kepler ellipse described parametrically by
\[ u_0(\phi) = \frac{m}{L^2} (1 + e \cos\phi) \tag{24.62} \]
where $e$ is the eccentricity ($e = 0$ means constant radius and hence a circular orbit). Plugging this back into the Newtonian non-linear 1st-order equation (cf. (24.42))
\[ u_0'^2 + u_0^2 = \frac{E^2 - 1}{L^2} + \frac{2m}{L^2} u_0 , \tag{24.63} \]
one finds that the integration constant $e$ is related to the energy by
\[ e^2 = 1 + \frac{L^2}{m^2} (E^2 - 1) = 1 + \frac{2L^2}{m^2} E_{eff} . \tag{24.64} \]
In particular, $e^2 < 1$ for bound states (bounded orbits) with $E_{eff} < 0$, and we will concentrate on these orbits. The perihelion (aphelion) is then at $\phi = 0$ ($\phi = \pi$), with
\[ r_{1,2} = \frac{L^2}{m} \frac{1}{1 \pm e} . \tag{24.65} \]
Thus the semi-major axis $a$ of the ellipse,
\[ 2a = r_1 + r_2 , \tag{24.66} \]
is
\[ a = \frac{L^2}{m} \frac{1}{1 - e^2} . \tag{24.67} \]
In particular, in the Newtonian theory, one has
\[ (\Delta\phi)_0 = 2 \int_0^\pi d\phi = 2\pi . \tag{24.68} \]
The anomalous perihelion shift due to the effects of General Relativity is thus
\[ \delta\phi = \Delta\phi - 2\pi . \tag{24.69} \]
\[ u = u_0 + u_1 \tag{24.70} \]
\[ u_1'' + u_1 = 3mu_0^2 . \tag{24.71} \]
The general solution of this inhomogeneous differential equation is the general solution of the homogeneous equation (we are not interested in) plus a special solution of the inhomogeneous equation. Writing
and noting that
1. this effect is cumulative, i.e. after $N$ revolutions one has an anomalous perihelion shift $N\delta\phi$;
2. Mercury has a very short solar year, with about 415 revolutions per century;
3. and accurate observations of the orbit of Mercury go back over more than 200 years.
Thus the cumulative effect is approximately $10^3\, \delta\phi$ and this is sufficiently large to be observable in principle. From (24.80) the prediction of General Relativity for this precession of the perihelion can be calculated to be
And indeed such an effect is observed (and had for a long time presented a puzzle, an
anomaly, for astronomers).
In actual fact, the perihelion of Mercury's orbit shows a precession rate of $5601''$ per century (which, admittedly, does not yet look like a brilliant confirmation of general relativity). However, of this effect about $5025''$ are due to the fact that one is using a non-inertial geocentric coordinate system (precession of the equinoxes). $532''$ are due to perturbations of Mercury's orbit caused by the (Newtonian) gravitational attraction of the other planets of the solar system (chiefly Venus, Earth and Jupiter). This much was known prior to General Relativity and left an unexplained anomalous perihelion shift of
\[ \delta\phi_{anomalous} = 43.11'' \pm 0.45'' / \text{century} . \tag{24.82} \]
Thus general relativity, with its general relativistic correction to the (unperturbed) Kepler orbit of Mercury, appears to account precisely for the observed anomaly, and this is quite a striking and impressive confirmation of a prediction of general relativity. Other observations, involving e.g. the mini-planet Icarus, discovered in 1949, with a huge eccentricity $e \approx 0.827$, binary pulsar systems, and more recently observations of highly eccentric stars close to the galactic (black hole) center have provided further confirmation of the agreement between General Relativity and observations.
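For orientation, here is how the number for Mercury comes about. The sketch below assumes the standard first-order expression for the shift per revolution, $\delta\phi = 6\pi (m/L)^2 = 6\pi m / (a(1 - e^2))$, where the second form follows from (24.67) and where $m = G_N M/c^2$. This formula is assumed to be what (24.80) states, since it is not reproduced above. The orbital elements of Mercury and the solar value of $G_N M$ are approximate reference values that I have inserted by hand.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, heliocentric gravitational constant
C = 299792458.0             # m/s, speed of light
A_MERCURY = 5.7909e10       # m, semi-major axis (approximate)
E_MERCURY = 0.2056          # eccentricity (approximate)
PERIOD_DAYS = 87.969        # orbital period in days (approximate)

m = GM_SUN / C**2                                    # m = G_N M / c^2, in metres
shift_per_orbit = 6.0 * math.pi * m / (A_MERCURY * (1.0 - E_MERCURY**2))  # radians
orbits_per_century = 36525.0 / PERIOD_DAYS           # Julian century of 36525 days
arcsec_per_century = math.degrees(shift_per_orbit) * 3600.0 * orbits_per_century
```

With these inputs the shift per revolution is about a tenth of an arc second, there are roughly 415 revolutions per century, and the accumulated shift per century comes out close to the anomalous value quoted in (24.82).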
\[ V_{eff}(r) = \frac{L^2}{2r^2} - \frac{mL^2}{r^3} = \frac{L^2}{2r^2} \left(1 - \frac{2m}{r}\right) . \tag{24.83} \]
The following properties are immediate:
Figure 14: Effective potential for a massless particle. Displayed is the location of the unstable circular orbit at $r = 3m$. A photon with an energy $E^2 < L^2/27m^2$ will be deflected (lower arrow), photons with $E^2 > L^2/27m^2$ will be captured by the star.
3. $V_{eff}(r = 2m) = 0$.
5. There is one critical point of the potential, at $r = 3m$, with $V_{eff}(r = 3m) = L^2/54m^2$.
Thus the potential has the form sketched in Figure 14, with the following consequences:
1. For energies $E^2 > L^2/27m^2$, photons are captured by the star and will spiral into it. For energies $E^2 < L^2/27m^2$, on the other hand, there will be a turning point, and light rays will be deflected by the star.
As this may sound a bit counterintuitive (shouldn't a photon with higher energy be more likely to zoom by the star without being forced to spiral into it?), think about this in the following way. $L = 0$ corresponds to a photon falling radially towards the star, $L$ small corresponds to a slight deviation from radial motion, while $L$ large (thus $\dot\phi$ large) means that the photon is travelling along a trajectory that will not bring it very close to the star at all (see the next subsection for the precise relation between the angular momentum $L$ and the impact parameter $b$ of the photon). It is then not surprising that photons with small $L$ are more likely to be captured by the star (this happens for $L^2 < 27m^2E^2$) than photons with large $L$ which will only be deflected in their path. We will study this in more detail below.
2. Now let us also consider the opposite situation, that of light from or near the star (and we are of course assuming that $r_0 > r_s$). Then for $r_0 < 3m$ and $E^2 < L^2/27m^2$, the light cannot escape to infinity but falls back to the star, whereas for $E^2 > L^2/27m^2$ light will escape. Thus for a path sufficiently close to radial ($L$ small, because $\dot\phi$ is then small) light can always escape as long as $r_0 > 2m$.
3. Finally, let us consider the critical point of the potential at $r = 3m$. From the equation of motion
\[ \ddot{r} = \frac{L^2}{r^4} (r - 3m) \tag{24.84} \]
we see that the radial acceleration of a photon changes sign as it falls past $r = 3m$, thus decelerating / slowing down for $r < 3m$. However, this is accompanied by an increase in the angular velocity of the photon, leading to an inspiralling motion. The existence of one unstable circular orbit for photons at $r = 3m$ (the photon sphere), while not relevant for the applications to the solar system in this section, turns out to be of some interest in black hole astrophysics (as a possibly observable signature of black holes).
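The capture criterion in these remarks can be phrased in terms of the ratio $L/E$ alone: since $E_{eff} = E^2/2$ for $\epsilon = 0$, an incoming photon passes over the barrier precisely when $E^2/2$ exceeds the value of the potential (24.83) at $r = 3m$, i.e. when $L/E < \sqrt{27}\, m$. The following sketch checks this numerically. The function names and sample values are illustrative assumptions.

```python
import math

def v_eff_null(r, m, L):
    """Photon effective potential (24.83)."""
    return L**2 / (2.0 * r**2) * (1.0 - 2.0 * m / r)

def is_captured(E, L, m):
    """True if an incoming photon passes over the barrier at r = 3m."""
    return 0.5 * E**2 > v_eff_null(3.0 * m, m, L)

m, E = 1.0, 1.0
critical_ratio = math.sqrt(27.0) * m    # critical value of L/E

barrier_height = v_eff_null(3.0 * m, m, 2.0)           # equals L^2 / (54 m^2)
slightly_below = is_captured(E, 0.99 * critical_ratio * E, m)
slightly_above = is_captured(E, 1.01 * critical_ratio * E, m)
```

A photon with $L/E$ just below the critical value is captured, one just above it is deflected, in agreement with Figure 14.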
To study the bending of light by a star, we consider an incoming photon (or light ray) with impact parameter $b$ (see Figure 15) and we need to calculate $\phi(r)$ for a trajectory with turning point at $r = r_1$. At that point we have $\dot{r} = 0$. Here the dot can, as usual, be taken to be the derivative with respect to some affine parameter $\lambda$. However, noting that the condition $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = 0$ is reparametrisation-invariant (unlike its massive cousin $g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = -1$), we can equally well choose to parametrise the light rays by the coordinate time $t$ even though this is not an affine parameter (this matters at the level of the 2nd order geodesic differential equations but not at the level of the 1st integrals and the effective potential).
\[ E_{eff} = V_{eff}(r_1) \quad\Leftrightarrow\quad r_1^2 = \frac{L^2}{E^2} \left(1 - \frac{2m}{r_1}\right) . \tag{24.85} \]
The first thing we need to establish is the relation between $b$ and the other parameters
$E$ and $L$. Consider the ratio
$$\frac{L}{E} = \frac{r^2\dot\phi}{(1 - 2m/r)\,\dot t} . \tag{24.86}$$
For large values of $r$, $r \gg 2m$, this reduces to
$$\frac{L}{E} = r^2\frac{d\phi}{dt} . \tag{24.87}$$
On the other hand, for large $r$ we can approximate $b/r = \sin\phi$ by $\phi$. Since we also have
$dr/dt = -1$ (for an incoming light ray), we deduce
$$\frac{L}{E} = r^2\frac{d}{dt}\frac{b}{r} = b . \tag{24.88}$$
In terms of the variable $u = 1/r$ the equation for the shape of the orbit (24.45) is
$$u'' + u = 3mu^2 \tag{24.89}$$
In the absence of the general relativistic correction (calling this Newtonian is perhaps
not really appropriate since we are dealing with photons/light rays) one has $b^{-1} = u_1$
or $b = r_1$ (no deflection). The orbit equation
$$u_0'' + u_0 = 0 \tag{24.92}$$
= . (24.96)
by perturbatively solving the orbit equation (24.89);
In order to solve the orbit equation (24.89), we proceed as in section 24.4. Thus the
equation for the (small) deviation $u_1(\phi)$ is
$$u_1'' + u_1 = 3mu_0^2 = \frac{3m}{b^2}(1 - \cos^2\phi) = \frac{3m}{2b^2}(1 - \cos 2\phi) \tag{24.97}$$
which has the particular solution (cf. (24.73))
$$u_1(\phi) = \frac{3m}{2b^2}\left(1 + \tfrac{1}{3}\cos 2\phi\right) . \tag{24.98}$$
Therefore
$$u(\phi) = \frac{1}{b}\sin\phi + \frac{3m}{2b^2}\left(1 + \tfrac{1}{3}\cos 2\phi\right) . \tag{24.99}$$
By considering the behaviour of this equation as $r \to \infty$ or $u \to 0$, one finds an equation
for (minus) half the deflection angle, namely
$$\frac{1}{b}(-\delta\phi/2) + \frac{3m}{2b^2}\,\frac{4}{3} = 0 , \tag{24.100}$$
leading to the result
$$\delta\phi = \frac{4m}{b} = \frac{4MG_N}{bc^2} . \tag{24.101}$$
is therefore
$$(\Delta\phi)_1 = \frac{\partial\,\Delta\phi}{\partial m}\Big|_{m=0} = 2\int_0^{b^{-1}} du\, \frac{b^{-3} - u^3}{(b^{-2} - u^2)^{3/2}} . \tag{24.104}$$
Figure 15: Bending of light by a star. Indicated are the definitions of the impact
parameter $b$, the perihelion $r_1$, and of the angles $\Delta\phi$ and $\delta\phi$.
First of all, redoing the analysis of sections 24.1 and 24.2 for a general spherically
symmetric static metric (23.6),
it is easy to see that the orbit equation can be written as
$$B(r)\frac{r'^2}{r^4} + \frac{1}{r^2} = \frac{\epsilon}{L^2} + \frac{E^2}{L^2 A(r)} \tag{24.110}$$
or, in terms of $u = 1/r$, as (abusing notation by writing $A(r = 1/u)$ as $A(u)$ etc.,
apologies if this causes allergic reactions)
$$B(u)\,u'^2 + u^2 = \frac{\epsilon}{L^2} + \frac{E^2}{L^2 A(u)} . \tag{24.111}$$
We will concentrate on the lightlike case $\epsilon = 0$,
$$B(u)\,u'^2 + u^2 = \frac{E^2}{L^2 A(u)} , \tag{24.112}$$
and express the impact parameter $b = L/E$ in terms of the turning point $r_1 = 1/u_1$ of
the trajectory. At this turning point, $u' = 0$, and thus
$$\frac{E^2}{L^2} = A(u_1)u_1^2 , \tag{24.113}$$
leading to
$$B(u)\,u'^2 = \frac{A(u_1)}{A(u)}u_1^2 - u^2 . \tag{24.114}$$
We thus find
$$\frac{d\phi}{du} = B(u)^{1/2}\left[\frac{A(u_1)}{A(u)}u_1^2 - u^2\right]^{-1/2} . \tag{24.115}$$
For the (linearised) Schwarzschild metric the term in square brackets is
$$\frac{A(u_1)}{A(u)}u_1^2 - u^2 = u_1^2\left(1 + 2m(u - u_1)\right) - u^2 = (u_1^2 - u^2)\left(1 - 2m\frac{u_1^2}{u_1 + u}\right) . \tag{24.116}$$
Using this and the approximate (linearised) value for $B(u)$,
$$= (u_1^2 - u^2)^{-1/2} + m\,\frac{u_1^3 - u^3}{(u_1^2 - u^2)^{3/2}} . \tag{24.118}$$
The first term now gives us the Newtonian result and, comparing with Derivation II,
we see that the second term agrees precisely with the integrand of (24.104) with $b \to r_1$
(which, in a term that is already of order $m$, makes no difference). We thus conclude
that the deflection angle is, as before,
$$\delta\phi = 2\int_0^{u_1} du\; m\,\frac{u_1^3 - u^3}{(u_1^2 - u^2)^{3/2}} = 4mu_1 \approx \frac{4m}{b} . \tag{24.119}$$
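The value of the integral in (24.119) can be confirmed numerically. The sketch below is not part of the original derivation: it assumes the substitution $u = u_1\sin\theta$, under which the integrand becomes $u_1(\sin\theta + 1/(1+\sin\theta))$ and is regular at the turning point, and it uses a plain midpoint rule. The function name is illustrative.

```python
import math

def deflection_integral(m, u1, n=20000):
    """Midpoint-rule value of 2 m * int_0^{u1} (u1^3 - u^3)/(u1^2 - u^2)^(3/2) du.

    With u = u1*sin(theta) the integrand reduces to
    u1 * (1 - sin^3)/cos^2 = u1 * (sin + 1/(1 + sin)), which is finite everywhere.
    """
    h = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        s = math.sin((k + 0.5) * h)
        total += s + 1.0 / (1.0 + s)
    return 2.0 * m * u1 * total * h

m, b = 1.0, 1000.0
u1 = 1.0 / b          # to this order the turning point is at r1 = b
delta_phi = deflection_integral(m, u1)
```

The two pieces of the transformed integrand each integrate to 1 over $[0, \pi/2]$, which is where the factor $4mu_1$ comes from.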
Remarks:
1. This effect is physically measurable and was one of the first true tests of Einstein's
new theory of gravity. For light just passing the sun the predicted value is
$$\delta\phi \approx 1.75'' . \tag{24.120}$$
Quick sanity check on this estimate: $1'' = \pi/(180 \times 3600) \approx 0.5 \times 10^{-5}$ radians,
while using the (rough) values for $r_s = 2m$ and $r_0 \approx b$ for the sun given in (23.33),
one has
$$\delta\phi \approx \frac{4m}{b} \approx (6/7) \times 10^{-5} \approx (12/7)'' \approx 1.7'' . \tag{24.121}$$
Experimentally this is a bit tricky to observe because one needs to look at light
from distant stars passing close to the sun. Under ordinary circumstances this
would not be observable, but in 1919 a test of this was performed during a total
solar eclipse, by observing the effect of the sun on the apparent position of stars
in the direction of the sun. The observed value was rather imprecise, yielding
$1.5'' < \delta\phi < 2.2''$ which is, if not a confirmation of, at least consistent with
General Relativity.
2. More recently, it has also been possible to measure the deflection of radio waves
by the gravitational field of the sun. These measurements rely on the fact that a
particular quasar, known as 3C279, is obscured annually by the sun on October
8th, and the observed result (after correcting for diffraction effects by the corona
of the sun) in this case is $\delta\phi = 1.76'' \pm 0.02''$.
3. The value predicted by General Relativity is, interestingly enough, exactly twice
the value that would have been predicted by the Newtonian approximation of the
geodesic equation alone (but the Newtonian approximation is not valid anyway
because it applies to slowly moving objects, and light certainly fails to satisfy this
condition). A calculation leading to this wrong value had first been performed by
Soldner in 1801 (!) (by cancelling the mass m out of the Newtonian equations
of motion before setting m = 0) and also Einstein predicted this wrong result in
1908 (his equivalence principle days, long before he came close to discovering the
field equations of General Relativity now carrying his name).
This result can be obtained from the above calculation by setting B(u) = 1 instead
of (24.117), as in the Newtonian approximation only g00 is non-trivial.
4. More generally, one can calculate the deflection angle for a metric with the
approximate behaviour
$$B(u) \approx 1 + 2\gamma mu , \tag{24.122}$$
for $\gamma$ a real parameter, with the result
$$\delta\phi \approx \frac{1 + \gamma}{2}\,\frac{4m}{b} . \tag{24.123}$$
This reproduces the previous result for $\gamma = 1$, half its value for $\gamma = 0$, and checking
to which extent measured deflection angles agree with the theoretical prediction
of general relativity ($\gamma = 1$) constitutes an experimental test of general relativity.
In this context $\gamma$ is known as one of the PPN parameters (PPN for parametrised
post-Newtonian approximation).
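The numbers quoted in remarks 1 and 4 can be reproduced with a few lines of arithmetic. This is a rough sketch only: the values of $G_N$, $c$, the solar mass and the solar radius below are standard rounded constants inserted here as assumptions (the notes use the rough values of (23.33) instead), and the impact parameter is taken to be the solar radius for a grazing ray.

```python
import math

G = 6.674e-11        # Newton's constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.998e8          # speed of light in m/s (assumed value)
M_SUN = 1.989e30     # solar mass in kg (assumed value)
R_SUN = 6.957e8      # solar radius in m, used as impact parameter b
ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond in radians

def deflection_arcsec(gamma=1.0, mass=M_SUN, b=R_SUN):
    """Deflection (1 + gamma)/2 * 4m/b of (24.123), in arcseconds, with m = G M / c^2."""
    m = G * mass / C**2
    return 0.5 * (1.0 + gamma) * 4.0 * m / b / ARCSEC

gr_value = deflection_arcsec(1.0)         # general relativity, gamma = 1
newtonian_value = deflection_arcsec(0.0)  # B(u) = 1, i.e. gamma = 0
```

With these inputs `gr_value` comes out close to 1.75 arcseconds and `newtonian_value` is exactly half of it, in line with remark 3.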
The perhaps slickest way to obtain the orbits of the Kepler problem is to make use of the
so-called Runge-Lenz vector. Recall that, due to conservation of angular momentum $\vec L$,
the orbits in any spherically symmetric potential are planar. The bound orbits of the
Kepler problem, however, have the additional property that they are closed, i.e. that
the perihelion is constant. This suggests that there is a further hidden symmetry in the
Kepler problem, with the position of the perihelion the corresponding conserved charge.
This is indeed the case. Consider a vector of the form
$$\vec A = \dot{\vec x} \times \vec L + W(r)\,\vec x \tag{24.124}$$
or, in components,
$$A_i = \epsilon_{ijk}\,\dot x_j L_k + W(r)\,x_i . \tag{24.125}$$
A straightforward calculation, using the Newtonian equations of motion in the potential
$W(r)$, shows that
$$\frac{d}{dt}A_i = \left(r\partial_r W(r) + W(r)\right)\dot x_i . \tag{24.126}$$
Thus $\vec A$ is conserved if and only if $W(r)$ is homogeneous of degree $(-1)$,
$$\frac{d}{dt}\vec A = 0 \quad\Leftrightarrow\quad W(r) = \frac{c}{r} . \tag{24.127}$$
In our notation, $c = \epsilon m$, and we will henceforth refer to the vector
$$\vec A = \dot{\vec x} \times \vec L + \frac{\epsilon m}{r}\,\vec x \tag{24.128}$$
as the Runge-Lenz vector.
It is well known, and can be shown e.g. by determining the Poisson brackets among the
$L_i$ and $A_j$ (only the calculation of $\{A_i, A_j\}$ is a bit messy), that $\vec A$ extends the manifest
symmetry group of rotations $SO(3)$ of the Kepler problem to the (hidden, phase space)
symmetry group $SO(4)$ for bound orbits and $SO(3, 1)$ for scattering orbits.
It is straightforward to determine the Keplerian orbits from $\vec A$. While $\vec A$ has 3 components,
the only new information is contained in the direction of $\vec A$, as the norm $A$ of $\vec A$
can be expressed in terms of the other conserved quantities and parameters (energy $E$,
angular momentum $L$, mass $m$) of the problem. In the notation of section 24.1 one has
$$A^2 = E^2 L^2 + \epsilon(L^2 + \epsilon m^2) . \tag{24.129}$$
$$\frac{1}{r(\phi)} = \frac{m}{L^2}\left(1 + \frac{A}{m}\cos\phi\right) . \tag{24.131}$$
Comparing with (24.62), we recognise this as the equation for an ellipse with eccentricity
$e$ and semi-major axis $a$ (24.67) given by
$$e = \frac{A}{m} \quad , \quad \frac{m}{L^2} = \frac{1}{a(1 - e^2)} . \tag{24.132}$$
Moreover, we see that the perihelion is at $\phi = 0$ which establishes that the Runge-Lenz
vector points from the center of attraction to the (constant) position of the perihelion.
During one revolution the angle $\phi$ changes from $0$ to $2\pi$.
$$\frac{1}{r(\phi)} = \frac{A}{L^2}\cos\phi \tag{24.133}$$
$$b = \frac{L^2}{A} = \frac{L}{E} . \tag{24.134}$$
In this case, $\phi$ runs from $-\pi/2$ to $\pi/2$ and the point of closest approach is again at
$\phi = 0$ (distance $b$).
We see that the Runge-Lenz vector captures precisely the information that in the New-
tonian theory bound orbits are closed and lightrays are not deflected. The Runge-Lenz
vector will no longer be conserved in the presence of the general relativistic correction
to the Newtonian motion, and this non-constancy is a precise measure of the deviation
of the general relativistic orbits from their Newtonian counterparts. As shown e.g. in an
article by Brill and Goel$^{76}$ this provides a very elegant and quick way of (re-)deriving
the results about perihelion precession and deflection of light in the solar system.
Calculating the time-derivative of the Newtonian Runge-Lenz vector $\vec A$ (24.128), but now for
a particle moving in the general relativistic potential (24.19)
$$V(r) = \epsilon\frac{m}{r} - \frac{mL^2}{r^3} , \tag{24.135}$$
one finds one additional term arising from substituting the equation of motion into $\ddot x_j$,
leading to (of course we now switch from $t$ to $\tau$)
$$\frac{d}{d\tau}\vec A = \frac{3mL^2}{r^2}\,\frac{d}{d\tau}\vec n , \tag{24.136}$$
where $\vec n = \vec x/r = (\cos\phi, \sin\phi, 0)$ is the unit vector in the equatorial plane $\theta = \pi/2$ of
the orbit. Thus $\vec A$ rotates in the $\phi$-direction in the equatorial plane. If $\vec A$ is originally
pointing in the $x^1$-direction $\phi = 0$, then its initial angular velocity in the $x^2$-direction is
$$\omega = \frac{3mL^2\cos\phi}{Ar^2}\,\dot\phi . \tag{24.137}$$
In principle, here $A$ refers to the norm of the Newtonian Runge-Lenz vector (24.128)
calculated for a trajectory $\vec x(\tau)$ in the general relativistic potential (24.135). This norm
is now no longer constant,
$$A^2 = E^2 L^2 + \epsilon(L^2 + \epsilon m^2) + \frac{2mL^4}{r^3} . \tag{24.138}$$
$$r_0(\phi) = \frac{L^2}{A\cos\phi - \epsilon m} , \tag{24.139}$$
For $\epsilon = -1$, and $(\phi_1, \phi_2) = (0, 2\pi)$, this results in (only the $\cos^2\phi$-term gives a non-zero
contribution)
$$\delta\phi = 2\pi\,\frac{3m^2}{L^2} = \frac{6\pi m^2}{L^2} , \tag{24.142}$$
in precise agreement with (24.77,24.79).
Using
$$\int \cos^3\phi = \sin\phi - \tfrac{1}{3}\sin^3\phi , \tag{24.144}$$
one finds
$$\delta\phi = \frac{4mA}{L^2} = \frac{4m}{b} , \tag{24.145}$$
which agrees precisely with the results of section 24.6.
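Both results can be checked with a single numerical quadrature. The sketch below makes an assumption that should be stated clearly: it takes the accumulated rotation angle of $\vec A$ to be the integral of $3mL^2\cos\phi/(A\,r_0(\phi)^2)$ over $\phi$, with the Newtonian orbit $r_0(\phi)$ of (24.139) inserted and $A$ treated as constant to leading order in $m$. The function name and sample parameter values are illustrative.

```python
import math

def runge_lenz_rotation(m, L, A, eps, phi1, phi2, n=100000):
    """Midpoint-rule integral of 3 m L^2 cos(phi) / (A r0(phi)^2) over [phi1, phi2],
    with 1/r0 = (A cos(phi) - eps*m) / L^2 as in (24.139)."""
    h = (phi2 - phi1) / n
    total = 0.0
    for k in range(n):
        phi = phi1 + (k + 0.5) * h
        inv_r0 = (A * math.cos(phi) - eps * m) / L**2
        total += 3.0 * m * L**2 * math.cos(phi) * inv_r0**2 / A
    return total * h

m, L = 1.0, 50.0
# Bound orbit (eps = -1): the result does not depend on the eccentricity A/m.
precession = runge_lenz_rotation(m, L, A=0.3, eps=-1.0,
                                 phi1=0.0, phi2=2.0 * math.pi)
# Light ray (eps = 0): impact parameter b = L^2 / A.
A_light = 2.0
bending = runge_lenz_rotation(m, L, A=A_light, eps=0.0,
                              phi1=-math.pi / 2.0, phi2=math.pi / 2.0)
```

For these sample values `precession` reproduces $6\pi m^2/L^2$ and `bending` reproduces $4mA/L^2 = 4m/b$ with $b = 1250$.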
E: Black Holes
25 Black Holes I: Approaching the Schwarzschild Radius rs
In our previous discussions of this solution and its properties, in sections 23 and 24, we
had considered objects of a size larger (in practice much larger) than their Schwarzschild
radius, $r_0 > r_s = 2m$. We also noted that the effective potential $V_{eff}(r)$ is perfectly
well behaved at $r_s$. We now consider objects with $r_0 < r_s$ and try to unravel some of
the bizarre physics that nevertheless occurs when one approaches or crosses $r_s = 2m$.
We will do this in several steps:
First we will consider observers that don't quite dare to cross $r_s$ and which try to
remain static at a fixed value of $r$ close to $r_s$.
Then we will consider observers that fall freely (and radially) in this geometry,
and describe their voyage both from their point of view and from that of a distant
static observer (which will turn out to be quite different).
Then we will study the geometry of the Schwarzschild metric near r = rs , and
show that the geometry is completely non-singular (and is in fact closely related to
the geometry of the Rindler metric for Minkowski space-time discussed in section
1.3).
These considerations will indicate that (and explain why) the usual Schwarzschild
coordinates (specifically the time-coordinate t) are inadequate for describing the
physics across the radius rs .
Encouraged by this, in section 26 we then begin to explore the region near and
beyond r = rs . To that end we first introduce coordinates that are adapted to
infalling observers. This is an illuminating exercise which is interesting in its own
right and which will already tell us something about the hidden geometry behind
rs .
In order to learn more about this region, we next study the behaviour of lightcones
and light rays in this geometry. These considerations will then lead us to the
introduction of corresponding adapted coordinates in which the Schwarzschild
metric is non-singular for all $0 < r < \infty$, and then the fun can begin and we can
try to understand what actually happens (and what characterises) $r = r_s$.
25.1 Static Observers
Some insight into the Schwarzschild geometry, and the difference between Newtonian
gravity and general relativistic gravity, is provided by looking at static observers, i.e.
observers hovering at fixed values of $(r, \theta, \phi)$. Thus their 4-velocity $u^\alpha = \dot x^\alpha$ has the
form $u^\alpha = (u^0, 0, 0, 0)$ with $u^0 > 0$. The normalisation $u^\alpha u_\alpha = -1$ then implies
$$u^\alpha = \left(f(r)^{-1/2}, 0, 0, 0\right) = \left(\frac{1}{\sqrt{1 - 2m/r}}, 0, 0, 0\right) . \tag{25.2}$$
More covariantly, this can be written as
$$u^\alpha = V^\alpha / V \tag{25.3}$$
where $V = \partial_t$ is the timelike Killing vector and $V$ its norm (2.185), but for present
(pragmatic and calculational) purposes the explicit coordinate expression is more useful.
The worldline of a static observer is clearly not a geodesic (that would be the worldline
of an observer freely falling in the gravitational field), and we can calculate its covariant
acceleration (4.94)
$$a^\alpha = D_\tau u^\alpha = u^\beta\nabla_\beta u^\alpha = (d/d\tau)u^\alpha + \Gamma^\alpha_{\ \beta\gamma}u^\beta u^\gamma . \tag{25.4}$$
This looks nicely Newtonian, with a force in the radial direction designed to precisely
cancel the gravitational attraction. However, this is a bit misleading since this is a
coordinate dependent statement. A coordinate-invariant quantity is the norm of the
acceleration,
$$a(r) \equiv \left(g_{\alpha\beta}a^\alpha a^\beta\right)^{1/2} = \frac{m}{r^2}\left(1 - \frac{2m}{r}\right)^{-1/2} . \tag{25.7}$$
While this approaches the Newtonian value as $r \to \infty$, it diverges as $r \to 2m$, indicating
that static observers will find it harder and harder, and need to travel nearly at the speed
of light, to remain static close to $r = 2m$.
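To get a feeling for the two limits of (25.7), one can simply evaluate it. The following minimal sketch works in geometric units with $G_N = c = 1$; the function name and sample radii are illustrative.

```python
import math

def proper_acceleration(r, m):
    """Norm a(r) = (m / r^2) * (1 - 2m/r)^(-1/2) of a static observer's acceleration, cf. (25.7)."""
    if r <= 2.0 * m:
        raise ValueError("static observers exist only for r > 2m")
    return m / r**2 / math.sqrt(1.0 - 2.0 * m / r)

m = 1.0
far = proper_acceleration(1.0e6, m)           # close to the Newtonian value m / r^2
near = proper_acceleration(2.0 + 1.0e-8, m)   # grows without bound as r -> 2m
# The rescaled quantity f(r)^(1/2) a(r) = m / r^2 stays finite at the horizon:
rescaled = math.sqrt(1.0 - 2.0 * m / (2.0 + 1.0e-8)) * near
```

Here `rescaled` is close to $1/4m$, the finite quantity that appears in the first remark below.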
Remarks:
1. While $a(r)$ diverges as $r \to 2m$, we will see later (section 26.9) that the finite
quantity $\lim_{r \to 2m} f(r)^{1/2} a(r) = 1/4m$ can be regarded as a measure of the strength
of the gravitational field at $r = 2m$ (surface gravity).
2. One can also think of a(r) = ar (r) as the radial component of the acceleration with
respect to an orthonormal frame at that point, whose radial component would be
so that here (and in the static spherically symmetric case in general) (r) can be
regarded as the general relativistic analogue of the Newtonian potential (r), to
which it reduces in the Newtonian limit,
1 1
(r) = 2 ln f (r) 2 ln(1 + 2(r)) = (r) + . . . (25.10)
because $V$ is constant along $V$ (8.59). Using the Killing condition this can be
written as
$$a_\alpha = (V^\beta / V^2)\,\nabla_\beta V_\alpha = \tfrac{1}{2}\,\partial_\alpha \log(-V^\beta V_\beta) , \tag{25.12}$$
We will now consider an object with $r_0 < r_s$ and an observer who is freely falling
vertically (radially) towards such an object. Vertical means that $\dot\phi = 0$, and therefore
there is no angular momentum, $L = 0$. Hence the effective potential equation (24.16)
becomes
$$E^2 - 1 = \dot r^2 - \frac{2m}{r} , \tag{25.14}$$
where the conserved energy $E$ (per unit mass) is given by
$$E = \left(1 - \frac{2m}{r}\right)\dot t . \tag{25.15}$$
In particular, if $r_i$ is the point at which the particle (observer) A was initially at rest,
$$\frac{dr}{d\tau}\Big|_{r = r_i} = 0 , \tag{25.16}$$
we have the relation
$$E^2 = 1 - \frac{2m}{r_i} = f(r_i) \tag{25.17}$$
between the constant of motion $E$ and the initial condition $r_i$. In particular, $E = 1$ for
an observer following a trajectory of an object that would have initially been at rest at
infinity. In that case, (25.14) is readily integrated to give
$$r(\tau) \propto (\tau_0 - \tau)^{2/3} \tag{25.18}$$
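The finiteness of the proper time needed to reach $r = 2m$ can be made explicit for $E = 1$. The sketch below assumes $\dot r = -\sqrt{2m/r}$ (from (25.14) with $E = 1$) and compares the closed form $\tau = \frac{2}{3}(r_i^{3/2} - r_f^{3/2})/\sqrt{2m}$, which is equivalent to (25.18), with a direct quadrature; the function names are illustrative.

```python
import math

def proper_time_closed_form(r_i, r_f, m):
    """Proper time to fall from r_i to r_f for E = 1."""
    return (2.0 / 3.0) * (r_i**1.5 - r_f**1.5) / math.sqrt(2.0 * m)

def proper_time_numeric(r_i, r_f, m, n=100000):
    """Midpoint rule for int_{r_f}^{r_i} dr / sqrt(2m/r)."""
    h = (r_i - r_f) / n
    return sum(math.sqrt((r_f + (k + 0.5) * h) / (2.0 * m)) for k in range(n)) * h

m = 1.0
tau_to_horizon = proper_time_closed_form(10.0 * m, 2.0 * m, m)
tau_check = proper_time_numeric(10.0 * m, 2.0 * m, m)
```

In units of $m$ the fall from $r = 10m$ to $r = 2m$ takes a proper time of roughly $13.6\,m$, and nothing special happens to the integrand at $r = 2m$.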
Coordinate time, on the other hand, becomes infinite at $r_f = 2m$. This can roughly
(and very easily) be seen by noting that
$$\Delta\tau = \left(1 - \frac{2m}{r}\right)^{1/2}\Delta t . \tag{25.24}$$
As $\Delta\tau$ is finite (as we have seen) and $(1 - \frac{2m}{r})^{1/2} \to 0$ as $r \to 2m$, clearly we need
$\Delta t \to \infty$. One can also calculate explicitly $t = t(\tau)$ or $t = t(r)$, say, and check that
this indeed diverges as $r \to 2m$, but this is unenlightening. What we will do below is
check the divergence by concentrating on the region near $r = 2m$, and interpret this
divergence in terms of observations made by a static observer.
25.3 Vertical Free Fall as seen by a Distant Observer
We will now investigate how the above situation presents itself to a distant observer
hovering at a fixed radial distance $r_\infty$. He will observe the trajectory of the freely falling
observer as a function of his proper time $\tau_\infty$. Up to a constant factor $(1 - 2m/r_\infty)^{1/2}$,
this is the same as coordinate time $t$, and we will lose nothing by expressing $r$ as a
function of $t$ rather than as a function of $\tau_\infty$.
From (25.14),
$$\dot r^2 + \left(1 - \frac{2m}{r}\right) = E^2 , \tag{25.25}$$
which expresses $r$ as a function of the freely falling observer's proper time $\tau$, and the
definition of $E$,
$$E = \dot t\left(1 - \frac{2m}{r}\right) , \tag{25.26}$$
which relates $\tau$ to the coordinate time $t$, one finds an equation for $r$ as a function of $t$,
$$\frac{dr}{dt} = -E^{-1}\left(1 - \frac{2m}{r}\right)\left(E^2 - \left(1 - \frac{2m}{r}\right)\right)^{1/2} \tag{25.27}$$
(the minus sign has been chosen because $r$ decreases as $t$ increases). We want to analyse
the behaviour of the solution of this equation as the freely falling observer approaches
the Schwarzschild radius, $r \to 2m$,
$$\frac{dr}{dt} = -E^{-1}\left(\frac{r - 2m}{r}\right)\left(E^2 - \frac{r - 2m}{r}\right)^{1/2} \approx -E^{-1}\left(\frac{r - 2m}{2m}\right)(E^2)^{1/2} = -\left(\frac{r - 2m}{2m}\right) . \tag{25.28}$$
We can write this equation as
$$\frac{d}{dt}(r - 2m) = -\frac{1}{2m}(r - 2m) , \tag{25.29}$$
which obviously has the solution
$$(r - 2m)(t) \propto e^{-t/2m} . \tag{25.30}$$
This shows that, from the point of view of the observer at infinity, the freely falling
observer reaches $r = 2m$ only as $t \to \infty$. In particular, the distant observer will never
actually see the infalling observer cross the Schwarzschild radius.
This is clearly an indication that there is something wrong with the time coordinate $t$
which runs too fast as one approaches the Schwarzschild radius. We can also see this by
looking at the coordinate velocity $v = dr/dt$ as a function of $r$. Let us choose $r_i = \infty$
for simplicity - other choices will not change our conclusions as we are interested in the
behaviour of $v(r)$ as $r \to r_s$. Then $E^2 = 1$ and from (25.27) we find (now dropping the
minus sign)
$$v(r) = (2m)^{1/2}\,\frac{r - 2m}{r^{3/2}} \tag{25.31}$$
As a function of $r$, $v(r)$ reaches a maximum at the critical radius $r_c = 6m = 3r_s$,
$$\frac{d}{dr}v(r) = 0 \quad\Rightarrow\quad r = r_c = 6m , \tag{25.32}$$
where the velocity is (restoring the speed of light $c$)
$$v(r_c) = \frac{2c}{3\sqrt{3}} . \tag{25.33}$$
The fact that this radius agrees with the ISCO (innermost stable circular orbit) (24.56)
mentioned in section 24.3 should (presumably) be regarded as a coincidence.
Beyond that point, i.e. for r < rc , v(r) decreases again and clearly goes to zero as
r 2m. The fact that the coordinate velocity goes to zero is another manifestation of
the fact that coordinate time goes to infinity. Somehow, the Schwarzschild coordinates
are not suitable for describing the physics at or beyond the Schwarzschild radius because
the time coordinate one has chosen is running too fast. This is the crucial insight that
will allow us to construct better coordinates, which are also valid for r < rs , later on
in this section.
As an aside, if one repeats the above calculation for arbitrary $E$, from (25.27) one
finds that the critical radius is
$$r_c = \frac{2m}{1 - 2E^2/3} , \tag{25.34}$$
with the $E$-dependent maximal velocity
$$v(r_c) = \frac{2c}{3\sqrt{3}}\,E^2 . \tag{25.35}$$
This reproduces the above result for $E = 1$, shows that $r_c \to 2m$ for $E \to 0$, and
moreover curiously shows that there is a maximal $E$, $E_{max} = \sqrt{3/2}$, for which such a
critical point of the coordinate velocity will occur, at $r_c \to \infty$, with maximal velocity
$$E \to E_{max} = \sqrt{3/2} \quad\Rightarrow\quad r_c \to \infty \ \text{ and } \ v(r_c) \to v_{max} = c/\sqrt{3} . \tag{25.36}$$
Thus for larger values of $E$, the coordinate velocity will monotonically decrease from its
initial value
$$v(r = \infty) = E^{-1}(E^2 - 1)^{1/2} \tag{25.37}$$
to $v(r) = 0$ as $r \to 2m$.
[And if anybody has an intuitive explanation for these facts, please let me know . . . ]
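The location and height of the maximum of the coordinate velocity can be confirmed by brute force. This is only a sketch in units with $c = 1$: it evaluates $|dr/dt|$ from (25.27) on a grid and compares with (25.32)-(25.35); the function names and the grid are illustrative choices.

```python
import math

def coordinate_velocity(r, m, E=1.0):
    """|dr/dt| from (25.27), in units with c = 1."""
    f = 1.0 - 2.0 * m / r
    return f * math.sqrt(E**2 - f) / E

def critical_radius(m, E=1.0):
    """r_c from (25.34); only meaningful for E^2 < 3/2."""
    return 2.0 * m / (1.0 - 2.0 * E**2 / 3.0)

m = 1.0
grid = [2.0 + 0.001 * k for k in range(1, 20001)]   # 2m < r <= 22m
r_max = max(grid, key=lambda r: coordinate_velocity(r, m))
v_max = coordinate_velocity(critical_radius(m), m)
v_max_E = coordinate_velocity(critical_radius(m, 1.1), m, 1.1)
```

The grid maximum sits at $r = 6m$ for $E = 1$, and the maximal velocity scales with $E^2$ as stated in (25.35).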
One dramatic aspect of what is happening at (or, better, near) the Schwarzschild radius
for very (very!) compact objects with $r_s > r_0$ is the following. Recall the formula
(2.169) for the gravitational redshift, which gave us the ratio between the frequency of
light $\nu_e$ emitted at the radius $r_e$ and the frequency $\nu_\infty$ received at the radius $r_\infty > r_e$
in a static spherically symmetric gravitational field. The result, which is in particular
also valid for the Schwarzschild metric, was
$$\frac{\nu_\infty}{\nu_e} = \frac{(1 - 2m/r_e)^{1/2}}{(1 - 2m/r_\infty)^{1/2}} . \tag{25.39}$$
We now choose the emitter to be the freely falling observer whose position is described
by $r_e = r(\tau)$ or $r(t)$, and the receiver to be the fixed observer at $r_\infty \gg r_s$. As $r_e \to r_s$,
one clearly finds
$$\frac{\nu_\infty}{\nu_e} \to 0 . \tag{25.40}$$
Expressed in terms of the gravitational redshift factor $z$,
$$1 + z = \frac{\nu_e}{\nu_\infty} \tag{25.41}$$
this means that there is an infinite gravitational redshift as $r_e \to r_s$,
$$r_e \to r_s \quad\Rightarrow\quad z \to \infty . \tag{25.42}$$
More explicitly, using (25.30) one finds the late-time behaviour of the redshift factor $z$,
in terms of coordinate time (or the distant observer's proper time), to be
$$z \propto e^{\,t/4m} . \tag{25.43}$$
Thus for the distant observer at late times there is an exponentially growing redshift
and the distant observer will never actually see the unfortunate emitter crossing the
Schwarzschild radius: he will see the freely falling observer's signals becoming dimmer
and dimmer and arriving at greater and greater intervals, and the freely falling observer
will completely disappear from the distant observer's sight as $r_e \to r_s$. Note that the
time-scale $t_z$ for this exponential redshift at late times is set by $t_z \sim 4m/c$, which is of
the order of
$$t_z \sim 10^{-5}\,\mathrm{s}\;\frac{M}{M_{sun}} , \tag{25.44}$$
so that this is pretty much instantaneous for an object the mass of an ordinary star.
We will come back to these estimates later on when (briefly) talking about gravitational
collapse in section 28.3.
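The order of magnitude in (25.44) follows from $t_z = 4G_N M/c^3$. The sketch below evaluates this for one solar mass and also tabulates the redshift factor of (25.39) for a distant receiver; the numerical constants are standard rounded values inserted here as assumptions, and the function names are illustrative.

```python
import math

G = 6.674e-11      # Newton's constant in m^3 kg^-1 s^-2 (assumed value)
C = 2.998e8        # speed of light in m/s (assumed value)
M_SUN = 1.989e30   # solar mass in kg (assumed value)

def redshift_timescale(mass_kg):
    """e-folding time t_z = 4 G M / c^3 of the late-time redshift."""
    return 4.0 * G * mass_kg / C**3

def one_plus_z(r_e, m):
    """1 + z = (1 - 2m/r_e)^(-1/2) from (25.39), for a receiver at r -> infinity."""
    return 1.0 / math.sqrt(1.0 - 2.0 * m / r_e)

t_z_sun = redshift_timescale(M_SUN)
# Redshift factor for emitters ever closer to r = 2m (units with m = 1):
factors = [one_plus_z(2.0 + eps, 1.0) for eps in (1.0, 1.0e-2, 1.0e-4)]
```

For one solar mass `t_z_sun` is about $2 \times 10^{-5}$ seconds, and the list `factors` grows without bound as the emitter approaches $r = 2m$.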
This was for a static observer. As we have seen, the situation presents itself rather dif-
ferently for the freely falling observer himself who will not immediately notice anything
particularly dramatic happening as he approaches or crosses rs .
25.5 Geometry near rs and Minkowski Space in Rindler Coordinates
We have now seen in two different ways why the Schwarzschild coordinates are not
suitable for exploring the physics in the region $r \leq 2m$: in these coordinates the metric
becomes singular at r = 2m and the coordinate time becomes infinite. On the other
hand, we have seen no indication that the local physics, expressed in terms of covariant
quantities like proper time or the geodesic equation, becomes singular as well. So we
have good reasons to suspect that the singular behaviour we have found is really just
an artefact of a bad choice of coordinates.
In fact, the situation regarding the Schwarzschild coordinates is quite similar to that of
the Rindler coordinates for Minkowski space we discussed (way back) in section 1.3.
in the coordinates
$$ds^2 = -\rho^2 d\eta^2 + d\rho^2 ; \tag{25.46}$$
that the lines of constant $\rho$ are hyperbolas and that these are the worldlines of
observers with constant acceleration $1/\rho$;
that these coordinates are adapted to observers with constant acceleration in the
same way as inertial coordinates are adapted to static observers: they stay at fixed
values of their spatial coordinate, and the coordinate time is a direct measure of
their proper time;
inertial (geodesic, freely falling) observers could exit this region in finite proper
time.
All this is of course quite reminiscent of the things that we have discovered so far about
the Schwarzschild geometry, and this is in fact more than just a loose analogy: as we
will see now, remarkably the Rindler metric (25.46) gives an accurate description of the
geometry of the Schwarzschild metric close to the Schwarzschild radius.
To confirm this, let us temporarily introduce the variable $\tilde r = r - 2m$ measuring the
coordinate distance from the critical radius $r = 2m$. In terms of $\tilde r$ the $(t, r)$-part of the
Schwarzschild metric reads
$$ds^2 = -\frac{\tilde r}{\tilde r + 2m}\,dt^2 + \frac{\tilde r + 2m}{\tilde r}\,d\tilde r^2 . \tag{25.47}$$
$$ds^2 = -\rho^2 d\eta^2 + d\rho^2 , \tag{25.52}$$
which, remarkably, is identical to the Rindler metric (1.68). Keeping track of the transverse
2-sphere and using $r^2 \approx (2m)^2$ in the near-$r_s$ approximation, the complete metric
in this limit and in these coordinates reads
If we further restrict to just a small angular region on the sphere, we can approximate
In any case, we see that, up to the harmless redefinition $(r, t) \to (\rho, \eta)$ that we just
performed, which is just a reparametrisation
$$\rho = \rho(r) = \sqrt{8m(r - 2m)} , \qquad \eta = \eta(t) = t/4m \tag{25.55}$$
Schwarzschild coordinates for the Schwarzschild geometry near $r = 2m$ are just like
Rindler coordinates for Minkowski space. This leads to a much improved understanding
of the Schwarzschild geometry in general and the observations that we made regarding
static and freely falling observers in particular:
1. The first, and most crucial, thing we learn from this is that the singularity of
the Schwarzschild metric at $r = 2m$ in the Schwarzschild coordinates $(t, r)$ is,
as anticipated, a mere coordinate singularity. Indeed, $r = 2m$ corresponds to
$\tilde r = 0 \Leftrightarrow \rho = 0$, and we already know that the singularity of the Rindler metric at
$\rho = 0$ is just a coordinate singularity (which can be eliminated e.g. by passing to
standard inertial Minkowski coordinates via (1.67)).
3. The situation is quite different for the freely falling observers. Their worldlines
look like the vertical line labelled "worldline of a static observer" in Figure 7,
they cross the horizon in finite proper time, experiencing no strong acceleration
or gravitational fields. As already noted in section 1.3, they evidently become
invisible to observers in the Rindler quadrant (now Schwarzschild patch) of the
geometry, static outside observers noting an infinite gravitational redshift affecting
the signals sent out by the freely falling observer.
4. Introducing coordinates adapted to freely falling observers (so that e.g. time is
their proper time) would be tantamount to passing from Rindler coordinates to
ordinary Minkowski coordinates, and we will consider that option in section 26.2
below (Painlevé-Gullstrand coordinates). These will provide us, as we will see,
with a coordinate system that extends in a non-singular way across the Schwarzschild
radius.
just one new region (quadrant) of space-time (the one lying to the future of
r = 2m), but also counterparts of the other two quadrants of Minkowski space.
This expectation will indeed be borne out.
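The identification $\rho = \sqrt{8m(r - 2m)}$ in (25.55) can be tested against the exact geometry. The sketch below rests on an assumption not spelled out in the surviving text: that $\rho$ plays the role of proper radial distance from $r = 2m$, as the Rindler form $-\rho^2 d\eta^2 + d\rho^2$ suggests. It uses the closed form of $\int_{2m}^{r} dr'/\sqrt{1 - 2m/r'}$; the function names are illustrative.

```python
import math

def proper_distance_to_horizon(r, m):
    """Exact proper radial distance int_{2m}^{r} dr' / sqrt(1 - 2m/r')."""
    x = r / (2.0 * m)
    return (math.sqrt(r * (r - 2.0 * m))
            + 2.0 * m * math.log(math.sqrt(x) + math.sqrt(x - 1.0)))

def rindler_rho(r, m):
    """Near-horizon approximation rho = sqrt(8m (r - 2m)) from (25.55)."""
    return math.sqrt(8.0 * m * (r - 2.0 * m))

m = 1.0
close = 2.0002 * m
ratio_close = proper_distance_to_horizon(close, m) / rindler_rho(close, m)
ratio_far = proper_distance_to_horizon(10.0 * m, m) / rindler_rho(10.0 * m, m)
```

Close to the horizon the ratio differs from 1 only at order $(r - 2m)/m$, while far away the Rindler approximation is no longer accurate, as expected.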
In section 25.2 we had seen that observers following timelike radial ingoing geodesics
reach $r = 2m$ at finite values of the affine parameter (proper time), and in section 25.3
that they do so at infinite values of the coordinate time $t$. The same is mutatis mutandis
true for ingoing lightrays travelling along null geodesics, and the argument in this case
is even simpler.
Indeed, since for radial lightrays the effective potential (24.25) is zero, radial lightrays
are governed by the equation
$$\tfrac{1}{2}\dot r^2 = E_{eff} = \tfrac{1}{2}E^2 \quad\Rightarrow\quad \dot r = \pm E , \tag{25.56}$$
the lower sign corresponding to ingoing lightrays. The solution for ingoing lightrays is
thus evidently
$$r(\lambda) = r(0) - E\lambda \tag{25.57}$$
so that from any finite initial position $r(0) > 2m$, the Schwarzschild radius $r = 2m$ is
reached for the finite value
$$\lambda = \frac{r(0) - 2m}{E} \tag{25.58}$$
of the affine parameter.
Note, by the way, that (25.57) or $\ddot r = 0$ shows that $r$ and $\lambda$ are related by an affine
transformation so that one can equally well use $r$ as the affine parameter along ingoing
null geodesics, making it even more manifest that $r = 2m$ arises at a finite value of that
affine parameter. To reduce clutter, let us choose $r(0) = 2m$ so that $r = 2m$ is reached
at $\lambda = 0$. It then follows from $f(r)\dot t = E$, i.e.
that
$$\dot t(\lambda) = E + 2mE/(-E\lambda) = E - 2m/\lambda \quad\Rightarrow\quad t(\lambda) = E\lambda - 2m\log(-\lambda) . \tag{25.60}$$
$$r \to 2m \ \Leftrightarrow\ \lambda \to 0 \quad\Rightarrow\quad t \to +\infty \tag{25.61}$$
also for null geodesics. Thus $t$ is evidently not a good coordinate to describe physics at
(or beyond) $r = 2m$.
Figure 16: Causal structure of the Schwarzschild geometry in the Schwarzschild coordinates
$(r, t)$. As one approaches $r = 2m$, the lightcones become narrower and narrower
and eventually fold up completely.
Thus
$$\frac{dt}{dr} = \pm(1 - 2m/r)^{-1} . \tag{25.63}$$
Recall that, as noted above, this equation for $t = t(r)$ can be equivalently regarded as
an equation for $t = t(\lambda)$ as a function of the affine parameter.
In the $(t, r)$-diagram of Figure 16, $dt/dr$ represents the slope of the lightcones at a given
value of $r$. Now, as $r \to 2m$, one has
$$\frac{dt}{dr} \ \xrightarrow{\ r \to 2m\ }\ \pm\infty , \tag{25.64}$$
so the light cones close up as one approaches the Schwarzschild radius. This is the
same statement as before regarding the fact that the coordinate velocity goes to zero at
$r = 2m$, but this time for null rather than timelike geodesics.
As our first step towards introducing coordinates that are more suitable for describing
the region around rs , let us write the Schwarzschild metric in the form
We see that it is convenient to introduce a new radial coordinate $r^*$ via
$$dr^* = (1 - 2m/r)^{-1}dr = \frac{r}{r - 2m}\,dr = \left(1 + \frac{2m}{r - 2m}\right)dr . \tag{25.66}$$
The solution to this equation is (up to an arbitrary finite constant)
$$r^* = r + 2m\log(r/2m - 1) . \tag{25.67}$$
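That (25.67) indeed solves (25.66), and that it pushes $r = 2m$ off to $r^* = -\infty$, is easy to see numerically. The following is a small sketch using a central finite difference; the function names and the sample radii are illustrative.

```python
import math

def tortoise(r, m):
    """r* = r + 2m log(r/2m - 1), defined for r > 2m, cf. (25.67)."""
    return r + 2.0 * m * math.log(r / (2.0 * m) - 1.0)

def d_tortoise_dr(r, m, h=1.0e-6):
    """Central finite difference of r*(r)."""
    return (tortoise(r + h, m) - tortoise(r - h, m)) / (2.0 * h)

m = 1.0
slope = d_tortoise_dr(3.0, m)                # compare with 1/(1 - 2m/r) = 3 at r = 3m
near_horizon = tortoise(2.0 + 1.0e-12, m)    # large and negative
far_away = tortoise(1.0e6, m)                # r* ~ r up to a logarithm
```

The divergence towards the horizon is only logarithmic, which is why even $r - 2m = 10^{-12}\,m$ corresponds to a modest $r^*$ of about $-55\,m$.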
This new radial coordinate $r^*$, known as the Regge-Wheeler radial coordinate or tortoise
coordinate, also provides us with the solution
$$t = \pm r^* + C \tag{25.68}$$
do not seem to fold up as the lightcones have the constant slope $dt/dr^* = \pm 1$ (see
Figure 17), and there is no singularity at $r = 2m$. However, $r^*$ is still only defined for
$r > 2m$ and the surface $r = 2m$ has been pushed infinitely far away ($r = 2m$ is now at
$r^* = -\infty$). Moreover, even though non-singular, the metric components $g_{tt}$ and $g_{r^*r^*}$
(as well as $\sqrt{g}$) vanish at $r = 2m$.
Thus the tortoise coordinate has so far not really allowed us to make dramatic progress
in our exploration of the region near or behind $r_s$, but we will substantially improve
this situation in section 26.
As an aside, and to conclude this section, I just want to point out that the tortoise
coordinate $r^*$ and the corresponding retarded and advanced time-coordinates $u, v =
t \mp r^*$ (which we will reencounter in section 26.4 as part of the Eddington-Finkelstein
coordinates) are not only useful for clarifying the causal structure of the Schwarzschild
geometry, but also for the analysis of the propagation of scalar (and other) fields. This
is principally due to the fact that in these coordinates the $(t, r)$-part of the metric is
conformally flat - see (25.69) or (26.124) below. Combined with the observation of
section 6.7 that the Klein-Gordon action is conformally invariant in $(1 + 1)$-dimensions,
this leads to a canonical form for the $(3 + 1)$ wave operator in the $(t, r)$-sector (and the
$(\theta, \phi)$-sector is standard anyway).

Figure 17: Causal structure of the Schwarzschild geometry in the tortoise coordinates
$(r^*, t)$. The lightcones look like the lightcones in Minkowski space and no longer fold
up as $r \to 2m$ (which now sits at $r^* = -\infty$).
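Specifically, we start with the action for a (massless, say) scalar field $\Phi$ in the metric
(25.69),
$$S[\Phi] \sim \int \sqrt{g}\,d^4x\; g^{\alpha\beta}\partial_\alpha\Phi\,\partial_\beta\Phi$$
$$= \int dt\,dr^*\,d\Omega\; r^2 f(r)\left[f(r)^{-1}\left(-(\partial_t\Phi)^2 + (\partial_{r^*}\Phi)^2\right) - r^{-2}\,\Phi\,\Delta_{S^2}\Phi\right] \tag{25.71}$$
$$= \int dt\,dr^*\,d\Omega\;\left[-(r\partial_t\Phi)^2 + (r\partial_{r^*}\Phi)^2 - f(r)\,\Phi\,\Delta_{S^2}\Phi\right] ,$$
where $f(r) = 1 - 2m/r$, $d\Omega = \sin\theta\,d\theta\,d\phi$ denotes the solid angle on the 2-sphere, and
$\Delta_{S^2}$ is the Laplace operator on the 2-sphere. Separating variables according to
$$\Phi(x) = \sum_{\ell, m} r^{-1}\,\psi_{\ell m}(t, r^*)\,Y_{\ell m}(\theta, \phi) , \tag{25.72}$$
using
$$\Delta_{S^2} Y_{\ell m} = -\ell(\ell + 1)\,Y_{\ell m} , \tag{25.73}$$
and using
$$r\partial_{r^*}\Phi = f(r)\,r\partial_r(\psi_{\ell m}/r) = \partial_{r^*}\psi_{\ell m} - r^{-1}f(r)\,\psi_{\ell m} , \tag{25.74}$$
$$(\partial_t^2 - \partial_{r^*}^2)\,\psi_{\ell m} + V_\ell(r^*)\,\psi_{\ell m} = 0 , \tag{25.75}$$
where
$$V_\ell(r^*) = f(r)\left(\frac{\ell(\ell + 1)}{r^2} + \frac{2m}{r^3}\right) . \tag{25.76}$$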
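The shape of the potential (25.76) is easy to explore numerically. The sketch below locates its maximum in two ways: from the quadratic $\lambda r^2 + 3m(1 - \lambda)r - 8m^2 = 0$ with $\lambda = \ell(\ell + 1)$, which follows from setting $dV_\ell/dr = 0$ (a short calculation added here, not taken from the notes), and by a grid search. The function names are illustrative.

```python
import math

def regge_wheeler_potential(r, m, ell):
    """V_ell = f(r) * (ell(ell+1)/r^2 + 2m/r^3) for a massless scalar, cf. (25.76)."""
    f = 1.0 - 2.0 * m / r
    return f * (ell * (ell + 1) / r**2 + 2.0 * m / r**3)

def peak_radius(m, ell):
    """Positive root of lam r^2 + 3m(1 - lam) r - 8 m^2 = 0, lam = ell(ell+1)."""
    lam = ell * (ell + 1)
    if lam == 0:
        return 8.0 * m / 3.0
    disc = (3.0 * m * (1.0 - lam))**2 + 32.0 * lam * m**2
    return (-3.0 * m * (1.0 - lam) + math.sqrt(disc)) / (2.0 * lam)

m = 1.0
grid = [2.0 + 0.0001 * k for k in range(1, 30001)]   # 2m < r <= 5m
r_peak_grid = max(grid, key=lambda r: regge_wheeler_potential(r, m, 2))
r_peak_exact = peak_radius(m, 2)
r_peak_large_ell = peak_radius(m, 50)
```

For large $\ell$ the peak approaches $r = 3m$, the position of the maximum of the null effective potential, in line with the geometric optics limit described in the first remark below.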
Remarks:
1. Note that this potential is quite similar to the effective potential (24.25)
L2
Vef f (r) = f (r) (25.77)
2r 2
for massless particles in the Schwarzschild geometry derived in section 24.1. In-
deed, in the large limit the last term in (25.76) can be neglected and then V
reduces to Vef f with the identification ( + 1) L2 . This is as it should be and
as one expects (and can show) on general grounds: in a suitable geometric optics
(large , high frequency) limit the massless Klein-Gordon equation reduces to the
Hamilton-Jacobi equation for null geodesics.
2. This potential is non-negative for all 2m < r < and goes to zero quite rapidly
as r or r 2m, i.e. as r ,
r + r : V (r) (r )2
(25.78)
r r 2m : V (r) e r /2m .
This means that at infinity (in r) and near the horizon, the solutions of this
equation can be chosen to have the standard right-moving (outgoing) / left-moving
(ingoing) form
(t, r ) e i(t r ) = e iu or (t, r ) e i(t + r ) = e iv .
(25.79)
3. However, this does not mean that a mode having the above form near infinity,
say, evolved from a mode that had also had such a form near the horizon. Rather,
the almost infinite exponential gravitational redshift between the near-horizon
and asymptotic regions discussed in section 25.4 leads to an exponential relation
between the parameter u, say, labelling an outgoing wave at infinity, and the cor-
responding parameter near the horizon. This exponential relation is analogous
to that encountered for a scalar field in Rindler versus inertial Minkowski coor-
dinates (section 6.8). For the Schwarzschild metric it is encoded in the precisely
556
analogous relation (26.138) between the coordinates u, v and the Kruskal coordi-
nates uK , vK to be introduced below, the latter being the analogues of Minkowski
inertial coordinates, and the former the analogues of Rindler coordinates. These
observations are at the heart of the so-called Hawking Effect, i.e. the quantum
radiation of black holes. See the references given in section 26.6 for introductions
to these topics.
$$ \psi(t, r^*) = e^{-i\omega t}\,\psi_\omega(r^*) , \tag{25.80} $$
It plays an important role in numerous aspects of Black Hole physics, e.g. in the
analysis of the stability of the Schwarzschild solution. In this context, the above
equation and its counterparts for vectors and symmetric tensors are known as the
Regge-Wheeler(-Zerilli) equations.
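The asymptotic behaviour of the potential described in the remarks above can be checked numerically. The following is only a minimal sketch, assuming units with m = 1 by default; the function names (`f`, `tortoise`, `potential`, `near_horizon_ratio`) are illustrative and not part of the notes. It evaluates (25.76) and compares it with the two limits in (25.78).

```python
import math

def f(r, m=1.0):
    """Schwarzschild function f(r) = 1 - 2m/r."""
    return 1.0 - 2.0 * m / r

def tortoise(r, m=1.0):
    """Tortoise coordinate r* = r + 2m log|r/2m - 1|."""
    return r + 2.0 * m * math.log(abs(r / (2.0 * m) - 1.0))

def potential(r, ell, m=1.0):
    """Potential (25.76): V = f(r) * (l(l+1)/r^2 + 2m/r^3)."""
    return f(r, m) * (ell * (ell + 1) / r**2 + 2.0 * m / r**3)

def near_horizon_ratio(r, ell, m=1.0):
    """V / exp(r*/2m); this should approach a constant as r -> 2m."""
    return potential(r, ell, m) / math.exp(tortoise(r, m) / (2.0 * m))

if __name__ == "__main__":
    m, ell = 1.0, 2
    for r in (2.000001, 2.01, 3.0, 10.0, 1.0e6):
        V = potential(r, ell, m)
        # V * r^2 tends to l(l+1) at large r; the ratio tends to a constant at r -> 2m
        print(r, tortoise(r, m), V, V * r**2, near_horizon_ratio(r, ell, m))
```

At large r the product V r^2 tends to l(l+1), while near r = 2m the ratio tends to (l(l+1) + 1)/(4 m^2 e), which is the exponential fall-off in r* stated in (25.78).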
26 Black Holes II: the Schwarzschild Black Hole
The primary purpose of this section is to understand the significance and physics of
the Schwarzschild radius and the region r < 2m. We will accomplish this principally
via a construction of appropriate, physically motivated coordinate systems and, indeed,
a secondary purpose of this section is to illustrate how to go about constructing such
coordinate systems in a systematic way (instead of just introducing them without further
explanations).
Even though the details will differ, the general principles of how to construct coordinates
to explore and understand a given space-time (by constructing and using coordinates
adapted to preferred classes of observers or geodesics) can be applied to other metrics,
e.g. some of those listed in section 29.
Let me also make clear from the outset what the issue is not. Namely, the issue is not
one of constructing appropriate coordinates solely in the region 0 < r < 2m. Indeed, we
have known such coordinates all along, simply the original Schwarzschild coordinates
(t, r). The Schwarzschild metric is a vacuum solution of the Einstein equations also
in that region, and the coordinates (t, r) give a valid non-singular description of the
metric there. Since $f(r) = 1 - 2m/r < 0$, their interpretation differs, i.e. r is a timelike
coordinate, and t plays the role of a radial coordinate, but this notational issue can
easily be rectified by renaming r = T, t = R, and writing the Schwarzschild metric in
the region 0 < r = T < 2m as
$$ \begin{aligned} f(r) &= 1 - \frac{2m}{r} = -\left(\frac{2m}{T} - 1\right) \\ ds^2 &= -\left(\frac{2m}{T} - 1\right)^{-1} dT^2 + \left(\frac{2m}{T} - 1\right) dR^2 + T^2 d\Omega_2^2 . \end{aligned} \tag{26.1} $$
While this provides some minimal insight (e.g. that for r < 2m surfaces of constant r are
spacelike and that the metric looks time-dependent), what we are looking for is a way of
describing the physics of the Schwarzschild solution that encompasses both the region
r > 2m and the region r < 2m (and that therefore e.g. provides a valid continuous map
for the freely falling observer as he crosses r = 2m).
The way we will go about this is to use either the worldlines of ingoing timelike geodesic
observers or those of ingoing lightrays, as well as the affine parameter along them, to
provide us with coordinates in the region r < 2m (recall that in both cases r = 2m lies
at a finite value of that affine parameter so that this affine parameter is also a good
coordinate beyond r = 2m). We will explore the 1st option in sections 26.2 and 26.3
and the 2nd option in section 26.4 and subsequent sections. The latter, based on null
geodesics, is technically somewhat simpler than the former (and is therefore also the
approach commonly adopted in the literature), and will lead us rather painlessly and
quickly to the maximal analytic extension of the Schwarzschild geometry in section 26.7.
I have also included the former option here, based on timelike geodesics, precisely
because the resulting coordinate systems, which are quite interesting in their own right
and are useful for certain more advanced applications, are usually not dealt with in any
detail in the standard textbooks (and I therefore had to work out many of these details
myself at some point). However, it is possible to skip sections 26.2 and 26.3 and go
directly to section 26.4 and continue from there.
We had seen that the Schwarzschild time coordinate t is adapted to static observers
(whose proper time is proportional to t), and is therefore not useful for describing the
region r < 2m. We had also seen that freely falling observers cross r = 2m in finite
proper time $\tau$. This suggests choosing some family of freely falling observers (geodesics)
and using their proper time $\tau = T(t, r, \theta, \phi)$ as the new time coordinate, i.e. performing
the coordinate transformation
$$ (t, r, \theta, \phi) \to (T, r, \theta, \phi) . $$
A natural (and the simplest) choice is to consider the family of freely falling observers
which fall radially (angular momentum L = 0) and start off at rest from infinity (energy
E = 1). In this case, the geodesic equations (25.14) and (25.15) take the form
We are thus led to introduce the new time coordinate T(t, r) by
$$ T(t, r) = t + \psi(r) = t + 2(2mr)^{1/2} + 2m \ln\left|\frac{(r/2m)^{1/2} - 1}{(r/2m)^{1/2} + 1}\right| . \tag{26.9} $$
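Since the metric does not depend explicitly on t, to determine the metric in the coordinates
$(T, r, \theta, \phi)$ we only need to substitute dt by $dt = dT - \psi'(r)\,dr$, or
$$ dt = dT - \frac{\sqrt{r/2m}}{(r/2m) - 1}\, dr . \tag{26.11} $$
Then one immediately finds the simple result
$$ ds^2 = -f(r)\,dT^2 + 2\sqrt{2m/r}\,dT\,dr + dr^2 + r^2 d\Omega^2 . \tag{26.12} $$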
Most importantly, we see that due to the non-singular off-diagonal term the metric in
these coordinates is well-defined and non-degenerate for all $0 < r < \infty$, in particular at
r = 2m. This is the definitive proof that the singularity at r = 2m in Schwarzschild
coordinates is really just a coordinate singularity.
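The two claims just made can be checked numerically: that the function added to t in (26.9) has the derivative appearing in (26.11), and that the (T, r) block of the PG metric is non-degenerate at r = 2m. This is only a sketch, assuming the PG components $g_{TT} = -f$, $g_{Tr} = \sqrt{2m/r}$, $g_{rr} = 1$ read off from the radial part of the line element (cf. (26.30)); all function names are illustrative.

```python
import math

def psi(r, m=1.0):
    """Function added to t in (26.9): 2 sqrt(2mr) + 2m log|(x - 1)/(x + 1)|, x = sqrt(r/2m)."""
    x = math.sqrt(r / (2.0 * m))
    return 2.0 * math.sqrt(2.0 * m * r) + 2.0 * m * math.log(abs((x - 1.0) / (x + 1.0)))

def psi_prime_expected(r, m=1.0):
    """Coefficient of dr in (26.11): sqrt(2m/r) / (1 - 2m/r) = sqrt(r/2m) / (r/2m - 1)."""
    return math.sqrt(2.0 * m / r) / (1.0 - 2.0 * m / r)

def psi_prime_numeric(r, m=1.0, h=1e-6):
    """Central finite-difference derivative of psi."""
    return (psi(r + h, m) - psi(r - h, m)) / (2.0 * h)

def pg_block_determinant(r, m=1.0):
    """Determinant of the (T, r) block of the PG metric (assumed components, see lead-in)."""
    g_TT = -(1.0 - 2.0 * m / r)
    g_Tr = math.sqrt(2.0 * m / r)
    g_rr = 1.0
    return g_TT * g_rr - g_Tr**2

if __name__ == "__main__":
    for r in (1.0, 5.0):
        print(r, psi_prime_numeric(r), psi_prime_expected(r))
    print(pg_block_determinant(2.0))  # evaluated exactly at the Schwarzschild radius
```

The determinant equals -1 for every r > 0, including r = 2m, which is the non-degeneracy statement in numerical form.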
Remarks:
1. The metric has the characteristic property that the metric induced on the slices
of constant T is just the flat Euclidean metric for any T,
$$ ds^2|_{T=const.} = dr^2 + r^2 d\Omega^2 . $$
This makes this form of the metric (and choice of time coordinate) particularly convenient,
e.g. for the canonical quantisation of fields in the Schwarzschild space-time,
and it is also for this reason that this coordinate system has become increasingly
popular in recent years.77
77
See e.g. P. Kraus, F. Wilczek, A Simple Stationary Line Element for the Schwarzschild Geom-
etry, and Some Applications, arXiv:gr-qc/9406042 and citations thereto; A. Nielsen, M. Visser,
Production and decay of evolving horizons, arXiv:gr-qc/0510083 for PG-like metrics for general
time-dependent spherically symmetric metrics; C. Barcelo, S. Liberati, M. Visser, Analogue Gravity,
arXiv:gr-qc/0505065 for a detailed discussion of PG-like metrics in the context of so-called analogue
models of gravity.
2. It is easy to verify directly in the PG coordinates that the new time
coordinate T really has the interpretation of measuring the proper time of
radially freely falling observers starting off at rest at infinity. To that end note
that in PG coordinates a radial timelike geodesic satisfies
$$ -\dot{T}^2 + \left(\dot{r} + \sqrt{2m/r}\,\dot{T}\right)^2 = -1 , \tag{26.14} $$
while the conserved energy associated with the T-translation invariance of the
metric has the form
$$ E = f\,\dot{T} - \sqrt{2m/r}\,\dot{r} . \tag{26.15} $$
For E = 1 these two equations are solved by $\dot{T} = 1$ and $\dot{r} = -\sqrt{2m/r}$.
These are precisely the geodesic paths with which we began the construction.
The coordinate transformation (26.9) is of the general form $T(t, r) = t + \psi(r)$ (23.2)
discussed previously, and preserves the t-independence and manifest spherical symmetry
of the metric. It leads to the metric in the form anticipated in (23.8).
Turning this around, once one has found the Schwarzschild metric in Schwarzschild
coordinates, say, one can of course discover an infinite number of new coordinate
systems for the Schwarzschild metric by performing such a (or a more general) coordinate
transformation. Only in special cases, however, will one find a coordinate system that
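is actually useful. Let us see how to recover the PG coordinate system in this way,
and how to discover another coordinate system that we will discuss in more detail in
section 26.4.
Substituting $dt = dT - \psi'(r)\,dr$ into the Schwarzschild metric, one finds
$$ ds^2 = -f(r)\,dT^2 + 2C(r)\,dT\,dr + f(r)^{-1}\left(1 - C(r)^2\right) dr^2 + r^2 d\Omega^2 , \tag{26.17} $$
where $C(r) = f(r)\psi'(r)$ is an essentially arbitrary function of r. This metric represents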
the Schwarzschild metric for any choice of C(r). In particular, the vacuum Einstein
equations do not impose any constraints on C(r) - this can be checked explicitly, but it
would be silly to do so since we have just seen explicitly that C(r) is not determined and
just corresponds to the freedom of performing a particular class of coordinate transfor-
mations. One can therefore now choose C(r) at will.
1. One natural choice is
$$ g_{Tr} = 0 \quad\Leftrightarrow\quad C(r) = 0 . \tag{26.18} $$
This makes the metric diagonal, corresponds to $\psi(r)$ constant, and evidently returns
one to Schwarzschild coordinates. Any other choice of C(r) will lead to a
non-diagonal metric in the coordinates (T, r).
We see that with the upper sign this is precisely the choice giving rise to the
Painlevé-Gullstrand coordinates introduced at the beginning of this section, confirmed by
the fact that for $\psi(r)$ this implies the differential equation
$$ C(r) = +\sqrt{2m/r} \quad\Rightarrow\quad \psi'(r) = \sqrt{2m/r}\, f(r)^{-1} , \tag{26.20} $$
which is solved by the function $\psi(r)$ appearing in (26.9).
The same argument shows that the lower sign gives a corresponding set of PG
coordinates based on outgoing rather than on ingoing geodesics.
so that the metric has the particularly simple (and non-singular at r = 2m) form
Referring back to the discussion of section 25.6, in particular (25.66), we see that
the solution to this equation is
These are just the advanced and retarded time coordinates (u, v) of section 25.7,
and we will encounter them in section 26.4 as Eddington-Finkelstein coordinates.
This procedure can in principle be applied to any static, spherically symmetric metric,
and we will also make use of it later, e.g. in section 38.2 when constructing coordinates
for de Sitter space.
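The freedom in choosing C(r) can be illustrated with a short numerical sketch. It assumes the (T, r) block obtained by substituting $dt = dT - \psi'(r)dr$ with $C = f\psi'$ into the Schwarzschild metric, i.e. $g_{TT} = -f$, $g_{Tr} = C$, $g_{rr} = (1 - C^2)/f$; the function names are hypothetical. The determinant of this block is -1 for any C, and the three choices discussed above give a diagonal metric, $g_{rr} = 1$ and $g_{rr} = 0$ respectively.

```python
import math

def tr_block(r, C, m=1.0):
    """(g_TT, g_Tr, g_rr) for T = t + psi(r) with C = f * psi' (valid for r != 2m)."""
    f = 1.0 - 2.0 * m / r
    return -f, C, (1.0 - C * C) / f

def determinant(block):
    """Determinant of the symmetric 2x2 block."""
    g_TT, g_Tr, g_rr = block
    return g_TT * g_rr - g_Tr * g_Tr

def c_schwarzschild(r, m=1.0):
    return 0.0                      # diagonal metric, choice (26.18)

def c_painleve_gullstrand(r, m=1.0):
    return math.sqrt(2.0 * m / r)   # upper sign, gives g_rr = 1

def c_eddington_finkelstein(r, m=1.0):
    return 1.0                      # upper sign, gives g_rr = 0

if __name__ == "__main__":
    for r in (1.5, 3.0, 10.0):
        for choice in (c_schwarzschild, c_painleve_gullstrand, c_eddington_finkelstein):
            block = tr_block(r, choice(r))
            print(r, choice.__name__, block, determinant(block))
```

The value r = 2m itself is excluded here only because the formula for $g_{rr}$ is evaluated as a quotient; for the PG and Eddington-Finkelstein choices the quotient has a finite limit there.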
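$$ u^\beta \nabla_\beta u_\alpha = u^\beta \nabla_\alpha u_\beta = \tfrac{1}{2}\,\partial_\alpha (u^\beta u_\beta) = 0 , \tag{26.24} $$
(this is a special case of the general result established in (8.63) that a gradient vector
field is geodesic iff it is of constant length), and that the metric component $g^{TT}$ with
respect to this new coordinate function T is
$$ g^{TT} = g^{\alpha\beta}\,\partial_\alpha T\,\partial_\beta T = g^{\alpha\beta} u_\alpha u_\beta = -1 , \tag{26.25} $$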
Conversely, this kind of reasoning may allow one to discover the geometric interpretation
of a coordinate system that one has selected through other criteria, e.g. via a convenient
choice of the function C(r) in (26.17). In particular, applied to PG coordinates, one
can argue as follows:
1. Given the metric in PG coordinates, one seeks the interpretation of the gradient
covector
$$ u_\alpha = -\partial_\alpha T , \tag{26.26} $$
and comparison with (26.3) shows that this is precisely the family of tangent
vectors
u = x = (t,
r, )
, (26.29)
Even though we now have a coordinate system that extends in a non-singular way
across r = 2m, so that r = 2m is not a true singularity, this does not mean that nothing
interesting at all happens at that locus. Indeed, the (legitimate) static observers at
large radii $r \gg 2m$ are still waiting for an explanation for their observations, described
in sections 25.3 and 25.4. Simply telling them that their coordinates are no good near
r = 2m will hardly be considered by them to be a satisfactory explanation of what they
observe.
78
See K. Martel, E. Poisson, Regular coordinate systems for Schwarzschild and other spherical space-
times, arXiv:gr-qc/0001069, which also introduces generalised PG coordinates adapted to geodesic
observers with E > 1, i.e. with non-zero velocity at infinity. Corresponding coordinates for E < 1, i.e.
adapted to freely falling observers with maximal radius $r_i < \infty$, have been constructed by Gautreau
and Hoffmann (see the references in this article).
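In order to address this issue, we now look at the behaviour of (radial) lightrays in PG
coordinates, characterised by
$$ -f(r)\,dT^2 + 2\sqrt{2m/r}\,dT\,dr + dr^2 = 0 . \tag{26.30} $$
Solving this quadratic equation for dr/dT gives the two directions
$\dot{r}_\pm \equiv dr/dT = \pm 1 - \sqrt{2m/r}$, so that for r > 2m one has $\dot{r}_+ > 0$ and $\dot{r}_- < 0$.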
This is the usual situation. There is one outgoing direction along which r = r(T ) grows,
and one ingoing direction along which r(T ) decreases. In particular, while remaining
within the future lightcone one can choose to go to either larger or smaller values of r.
$$ r < 2m : \quad \dot{r}_+ < 0 \quad\text{and}\quad \dot{r}_- < 0 . \tag{26.34} $$
Thus along both directions lightrays (and therefore massive particles as well) must move
to smaller values of r. In particular, no observer, no lightray and no information can
escape from the region r < 2m. No wonder that the asymptotic static observers never
see the infalling observer cross r = 2m. This new region is a future extension of the
Schwarzschild patch r > 2m of the space-time, uncovered by ingoing radial geodesics.
With the opposite choice of sign in (26.19), corresponding to outgoing rather than
ingoing radial geodesics, one would instead have discovered a region r < 2m in which
$\dot{r}_\pm > 0$. This can therefore not possibly be the same region r < 2m, and indeed is a
past extension of the Schwarzschild patch, uncovered by the back-tracking of outgoing
radial geodesics.
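The lightcone structure in ingoing PG coordinates can be made concrete with a few numbers. The sketch below solves the null condition (26.30) for dr/dT, giving the two slopes $\pm 1 - \sqrt{2m/r}$, and checks them against the quadratic form; the function names and the default m = 1 are illustrative assumptions.

```python
import math

def pg_null_slopes(r, m=1.0):
    """Return the two values of dr/dT for radial lightrays in ingoing PG coordinates."""
    a = math.sqrt(2.0 * m / r)
    return 1.0 - a, -1.0 - a

def pg_null_form(slope, r, m=1.0):
    """Left-hand side of (26.30) divided by dT^2, evaluated for dr = slope * dT."""
    f = 1.0 - 2.0 * m / r
    return -f + 2.0 * math.sqrt(2.0 * m / r) * slope + slope**2

if __name__ == "__main__":
    for r in (8.0, 2.0, 0.5):
        plus, minus = pg_null_slopes(r)
        # outside the horizon the slopes have opposite signs, inside both are negative
        print(r, plus, minus, pg_null_form(plus, r), pg_null_form(minus, r))
```

For m = 1 the "outgoing" slope is positive at r = 8, zero at r = 2 and negative at r = 0.5, which is the statement that inside r = 2m both families of lightrays move to smaller r.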
While it is possible to further explore the consequences of all this in terms of the present
PG coordinates, since we have been led to consider the structure of the lightcones, i.e.
null geodesics, it turns out to be more convenient, also for the following, to discuss
this in terms of coordinates that are adapted to lightrays rather than to the timelike
geodesics. These are the Eddington-Finkelstein coordinates to be discussed in section
26.4 below.
26.3 Lemaître and Novikov Coordinates
This will be accomplished by constructing comoving coordinates for the geodesic ob-
servers underlying the (generalised) PG coordinate system, where comoving means that
these observers remain at fixed values of all the spatial coordinates and only evolve in
time = proper time. Since radial geodesics already remain at fixed values of the angular
coordinates $(\theta, \phi)$, what this amounts to is to trade the coordinate r for another coordinate
that simply labels the individual geodesics (and that is therefore, tautologically,
guaranteed to also remain constant along such a geodesic). Such a label is provided by
an integration constant appearing in the solution to the radial geodesic equation since
(tauto-)logically such an integration constant is constant along the geodesic.
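Let us first consider the case E = 1. In this case, the effective potential for radial motion
gives us the radial equation
$$ \dot{r}^2 = \frac{2m}{r} , \tag{26.35} $$
with the ingoing solution (26.5),
$$ \dot{r} = -\sqrt{2m/r} \quad\Rightarrow\quad r(\tau) = \left[3\sqrt{2m}\,(\tau_0 - \tau)/2\right]^{2/3} . \tag{26.36} $$
Using the integration constant $\tau_0$ to label the geodesics, i.e. promoting it to a new
coordinate $\rho$, one has
$$ r(\tau, \rho) = \left[3\sqrt{2m}\,(\rho - \tau)/2\right]^{2/3} . \tag{26.38} $$
This satisfies
$$ dr = \dot{r}\,d\tau + (\partial r/\partial\rho)\,d\rho = \dot{r}\,(d\tau - d\rho) = -\sqrt{2m/r}\,(d\tau - d\rho) , \tag{26.39} $$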
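so that (denoting the PG proper time coordinate T now by $\tau$) the Schwarzschild line
element in PG form (26.12) turns into
$$ \begin{aligned} ds^2 &= -d\tau^2 + \left(dr + \sqrt{2m/r}\,d\tau\right)^2 + r^2 d\Omega^2 \\ &= -d\tau^2 + \left(\sqrt{2m/r}\,d\rho\right)^2 + r^2 d\Omega^2 \\ &= -d\tau^2 + \frac{2m}{r(\tau,\rho)}\,d\rho^2 + r(\tau,\rho)^2 d\Omega^2 , \end{aligned} \tag{26.40} $$
with $r(\tau, \rho)$ given explicitly by (26.38). This is the Schwarzschild metric in Lemaître
coordinates (Lemaître (1938)).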
Remarks:
1. These coordinates have a clear physical interpretation and also manifestly extend
in a non-singular way across r = 2m, the Schwarzschild radius being located at
the innocuous value
$$ r = 2m \quad\Leftrightarrow\quad \rho - \tau = 4m/3 \tag{26.41} $$
of the Lemaître coordinates, to all r > 0.
2. This is the first (but will not be the last) time that we see that it can be useful to
work with a radial coordinate, here $\rho = \rho(r, \tau)$, that is not equal to the standard
(areal radius) radial coordinate r.
This also shows that the form of the metric does not depend on the precise choice
of integration constant used to label the geodesics, since it is manifestly invariant
under transformations
4. Since
$$ dr = 0 \quad\Rightarrow\quad d\rho = d\tau , \tag{26.45} $$
the metric induced on surfaces of constant $r = r_0$ is given by
$$ ds^2|_{r=r_0} = -\left(1 - \frac{2m}{r_0}\right) d\tau^2 + r_0^2\, d\Omega^2 . \tag{26.46} $$
Thus, as could have (partially) been anticipated from the Schwarzschild form of
the metric, a hypersurface of constant $r = r_0$ is timelike for $r_0 > 2m$, null for
$r_0 = 2m$, and spacelike for $r_0 < 2m$.
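5. The surfaces of constant time $\tau = \tau_0$, on the other hand, are manifestly spacelike
everywhere,
$$ ds^2|_{\tau=\tau_0} = \frac{2m}{r(\tau_0,\rho)}\,d\rho^2 + r(\tau_0,\rho)^2 d\Omega^2 . \tag{26.47} $$
They run into the spacelike singularity at r = 0 at $\rho = \tau_0$.
6. In these coordinates, the volume element $\sqrt{g}$ has the simple form
$$ \sqrt{g} = \sqrt{2m}\, r^{3/2} \sin\theta = 3m(\rho - \tau)\sin\theta . \tag{26.48} $$
$$ (\tau, \rho) \to (\tau + c, \rho + c) \tag{26.49} $$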
We will discuss comoving coordinates, and metrics employing them, in much more
detail in the context of cosmology in sections 32-37, cf. in particular section 33.2
for geodesics of comoving observers in comoving coordinates,
(unlike the Novikov coordinates, briefly discussed below, or the Kruskal-Szekeres
coordinates to be described in detail later on) again makes them an attractive co-
ordinate system to use e.g. when studying quantum field theory in a Schwarzschild
background.79
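Several of the statements in the remarks above lend themselves to a quick numerical check. The sketch below assumes the explicit form $r(\tau,\rho) = [3\sqrt{2m}(\rho - \tau)/2]^{2/3}$, obtained from (26.36) by using the integration constant as the label $\rho$; function names are illustrative. It verifies the location of the Schwarzschild radius (26.41), the radial equation (26.35), the relation $\partial_\rho r = -\partial_\tau r$ behind (26.39), and the volume element (26.48) with the factor $\sin\theta$ stripped off.

```python
import math

def r_lemaitre(tau, rho, m=1.0):
    """Areal radius r(tau, rho) = [3 sqrt(2m) (rho - tau) / 2]^(2/3), for rho > tau."""
    return (1.5 * math.sqrt(2.0 * m) * (rho - tau)) ** (2.0 / 3.0)

def dr_dtau(tau, rho, m=1.0, h=1e-6):
    """Central finite difference of r with respect to tau at fixed rho."""
    return (r_lemaitre(tau + h, rho, m) - r_lemaitre(tau - h, rho, m)) / (2.0 * h)

def dr_drho(tau, rho, m=1.0, h=1e-6):
    """Central finite difference of r with respect to rho at fixed tau."""
    return (r_lemaitre(tau, rho + h, m) - r_lemaitre(tau, rho - h, m)) / (2.0 * h)

if __name__ == "__main__":
    m, tau, rho = 1.0, 0.0, 5.0
    r = r_lemaitre(tau, rho, m)
    print("horizon radius:", r_lemaitre(0.0, 4.0 * m / 3.0, m))           # compare with 2m
    print("rdot^2, 2m/r:", dr_dtau(tau, rho, m) ** 2, 2.0 * m / r)        # (26.35)
    print("d_rho r + d_tau r:", dr_drho(tau, rho, m) + dr_dtau(tau, rho, m))
    print("sqrt(g)/sin(theta):", math.sqrt(2.0 * m / r) * r * r, 3.0 * m * (rho - tau))
```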
Closely related to Lemaître coordinates are the so-called Novikov coordinates, comoving
coordinates based on radial geodesics with a finite maximal radius, $r_i < \infty$ (i.e. E < 1)
in the notation of section 25.2. In this case, the equation to solve is
$$ \dot{r}^2 = \frac{2m}{r} + 2E_{eff} = \frac{2m}{r} + E^2 - 1 , \tag{26.53} $$
where $E_{eff}$ or E depends on the choice of geodesic via a choice of integration constant
(one could even choose E itself, say, as a comoving coordinate). Using $r_i$ instead, related
to the energy by
$$ E(r_i)^2 = 1 - \frac{2m}{r_i} = f(r_i) = 1 + 2E_{eff}(r_i) , \tag{26.54} $$
the trajectories are implicitly defined by (25.22) and (25.23), i.e.
$$ r(\eta) = \tfrac{1}{2} r_i (1 + \cos\eta) , \qquad \tau(\eta) = \left(\frac{r_i^3}{8m}\right)^{1/2} (\eta + \sin\eta) . \tag{26.55} $$
We now think of these relations, together with the usual relation $E = f\dot{t}$ (25.15), as
defining a change of variables
Note that $r_i$ is clearly a comoving coordinate, as it can be used to label the geodesic.
Note also that, if required/desired, from (26.55) one can solve for $\tau = \tau(r, r_i)$,
$$ \eta = \arccos(2r/r_i - 1) , \qquad \tau(r, r_i) = \left(\frac{r_i^3}{8m}\right)^{1/2} \left[\arccos(2r/r_i - 1) + 2\sqrt{(r/r_i) - (r/r_i)^2}\right] . \tag{26.57} $$
However, even without having to solve or invert these implicit equations, it is easy to see
that the resulting metric will have the form
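Writing
$$ dt = \dot{t}\,d\tau + t'\,dr_i , \qquad dr = \dot{r}\,d\tau + r'\,dr_i , \tag{26.59} $$
one finds
$$ ds^2 = -(f\dot{t}^2 - f^{-1}\dot{r}^2)\,d\tau^2 + 2(f^{-1}\dot{r}r' - f\dot{t}t')\,d\tau\,dr_i + (f^{-1}(r')^2 - f(t')^2)\,dr_i^2 . \tag{26.60} $$
Moreover, for comoving coordinates the second term in brackets, i.e. the off-diagonal
term, is zero, $g_{\tau r_i} = 0$ (26.52). Thus we can deduce
$$ g_{\tau r_i} = 0 \quad\Rightarrow\quad f^{-1}\dot{r}r' - f\dot{t}t' = 0 . \tag{26.62} $$
Using
$$ \dot{r} = \pm(f_i - f)^{1/2} , \qquad f\dot{t} = E = f_i^{1/2} \qquad (f_i \equiv f(r_i)) , \tag{26.63} $$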
Remarks:
1. Note that, just as the Lemaître metric in (26.43), the form of the Novikov metric
does not depend on the precise choice of integration constants used to label the
geodesics, as the term $(\partial_{r_i} r)^2 (dr_i)^2$ is invariant under transformations
$$ \tilde{r}_i = F(r_i) . \tag{26.66} $$
In general, the metric can be written as
$$ ds^2 = -d\tau^2 + \frac{(\partial_\rho r)^2}{1 + 2E_{eff}(\rho)}\,d\rho^2 + r(\tau,\rho)^2 d\Omega^2 , \tag{26.67} $$
where $r(\tau, \rho)$ is the solution to the radial equation of motion
$$ \left(\frac{\partial r}{\partial \tau}\right)^2 = \frac{2m}{r} + 2E_{eff}(\rho) . \tag{26.68} $$
For $E_{eff}(\rho) = 0$ this reduces to the metric in Lemaître coordinates, for $E_{eff} < 0$
and $\rho = r_i$ this reduces to the above metric in Novikov coordinates, and one can
likewise consider Novikov coordinates based on radial geodesics with $E_{eff} > 0$,
i.e. with a non-zero velocity at infinity.
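2. Usually, the metric is expressed not in terms of the variable $r_i$ but in terms of
$$ R = (r_i/2m - 1)^{1/2} \quad\Leftrightarrow\quad r_i = 2m(R^2 + 1) \quad\Rightarrow\quad E^2 = f_i = \frac{R^2}{R^2 + 1} \tag{26.69} $$
so that the metric takes the form
$$ ds^2 = -d\tau^2 + \frac{R^2 + 1}{R^2} \left(\frac{\partial r}{\partial R}\right)^2 dR^2 + r(\tau, R)^2 d\Omega^2 . \tag{26.70} $$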
be written as
$$ \frac{\tau(r, R)}{2m} = (R^2 + 1)\left[\frac{r}{2m} - \frac{(r/2m)^2}{R^2 + 1}\right]^{1/2} + (R^2 + 1)^{3/2} \arccos\left[\left(\frac{r/2m}{R^2 + 1}\right)^{1/2}\right] . \tag{26.73} $$
This expression is occasionally found and used in the literature.
3. The fact that $r(\tau, r_i)$ or $r(\tau, R)$ is only determined implicitly makes Novikov
coordinates somewhat more awkward to use in practice than Lemaître coordinates.80
Nevertheless, this is compensated by their clear physical interpretation. In particular,
they are useful e.g. in numerical simulations and other investigations of
gravitational collapse, and the Novikov metric will indeed naturally arise in this
context in our brief discussion of gravitational collapse in section 28, in particular
first in section 28.3.
4. Since the timelike geodesics that Novikov coordinates are based on oscillate in
finite proper time from r = 0 in the past through the maximal radius $r_i$ to r = 0
in the future, Novikov coordinates cover both the past and future extensions of the
metric in the Schwarzschild patch discovered in terms of PG coordinates in section
26.2, and to be discussed again below in section 26.4. The reflection symmetry
$R \to -R$ of the metric also exchanges the Schwarzschild patch and its mirror
region (first encountered in the context of isotropic coordinates at the end of
section 23.5, and to be discussed in detail in section 26.7), and thus Novikov
coordinates actually turn out to provide a complete covering of the fully extended
Kruskal-Schwarzschild space-time.81
80
See e.g. the appendix of J. Mäkelä, A. Peltola, Thermodynamical Properties of Horizons,
arXiv:gr-qc/0205128 for somewhat more explicit expressions, including a Taylor expansion of r.
Instead of singling out some class of timelike observers in order to construct coordinates,
it is at least equally (if not more) natural to introduce coordinates that are adapted to
null geodesics. We can easily accomplish this by promoting the integration constant C
in (25.68) labelling the lightray to a new coordinate, namely the retarded and advanced
time coordinates
$$ u = t - r^* , \qquad v = t + r^* , \tag{26.74} $$
where
$$ r^* = r + 2m \log|r/2m - 1| \tag{26.75} $$
is the solution of
$$ \frac{dr^*}{dr} = f(r)^{-1} . \tag{26.76} $$
Then ingoing radial null geodesics ($dr^*/dt = -1$) are characterised by v = const.
and outgoing radial null geodesics by u = const. (and u and v can be thought of as
comoving coordinates for outgoing resp. ingoing lightrays).
Then we can label space-time points (in addition to by their angular coordinates) e.g.
in terms of ingoing lightrays by specifying the lightray (i.e. v) and the affine parameter
indicating how far one has to travel along that lightray to reach the point. As we
had seen explicitly e.g. in (25.57), this affine parameter can conveniently be chosen to
be the radial coordinate r itself, and evidently extends across r = 2m.
(and likewise for u). It follows that in terms of these coordinates the Schwarzschild
metric reads
$$ ds^2 = -(1 - 2m/r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2 \tag{26.78} $$
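A small numerical sketch can confirm that the (v, r) form of the line element agrees with the Schwarzschild form for arbitrary displacements. It assumes $dv = dt + f(r)^{-1}dr$, which follows from $v = t + r^*$ and (26.76), and it only compares the (t, r) parts of the two metrics, since the angular part is unchanged; the function names are illustrative.

```python
def f(r, m=1.0):
    """Schwarzschild function f(r) = 1 - 2m/r."""
    return 1.0 - 2.0 * m / r

def schwarzschild_interval(dt, dr, r, m=1.0):
    """(t, r) part of ds^2 in Schwarzschild coordinates: -f dt^2 + dr^2 / f."""
    return -f(r, m) * dt**2 + dr**2 / f(r, m)

def ef_interval(dv, dr, r, m=1.0):
    """(v, r) part of ds^2 in ingoing Eddington-Finkelstein form: -f dv^2 + 2 dv dr."""
    return -f(r, m) * dv**2 + 2.0 * dv * dr

def dv_from(dt, dr, r, m=1.0):
    """dv = dt + dr / f(r), from v = t + r* with dr*/dr = 1/f."""
    return dt + dr / f(r, m)

if __name__ == "__main__":
    dt, dr = 0.3, -0.2
    for r in (3.0, 1.2):  # one point outside and one inside r = 2m (m = 1)
        print(r, schwarzschild_interval(dt, dr, r), ef_interval(dv_from(dt, dr, r), dr, r))
    # At r = 2m the Schwarzschild form is singular, the (v, r) form is not:
    print(ef_interval(0.1, -0.2, 2.0))
```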
Remarks:
1. Even though the metric coefficient $g_{uu}$ or $g_{vv}$ vanishes at r = 2m, as for the
PG coordinates of section 26.2 there is no real degeneracy, the two-dimensional
metric in the (v, r)- or (u, r)-directions having the completely non-singular and
non-degenerate form
$$ g = \begin{pmatrix} -(1 - 2m/r) & \pm 1 \\ \pm 1 & 0 \end{pmatrix} \tag{26.80} $$
In particular, the determinant of the metric is
$$ g = \det(g_{\alpha\beta}) = -r^4 \sin^2\theta , $$
which is completely regular at r = 2m. Therefore we can now extend the range of
r to the region r < 2m with impunity. Thus this provides another explicit proof
that the singularity of the Schwarzschild metric in the Schwarzschild coordinates
at r = 2m is a removable (pure coordinate) singularity.
or
$$ \xi = \partial_t = (\partial_t r)\,\partial_r + (\partial_t u)\,\partial_u = \partial_u . \tag{26.83} $$
That this is indeed a Killing vector is obvious from the fact that in Eddington-Finkelstein
coordinates the components of the metric do not depend on v or u.
In particular, $\xi$ now extends smoothly across r = 2m, with norm
$$ g_{\alpha\beta}\,\xi^\alpha \xi^\beta = -(1 - 2m/r) . \tag{26.84} $$
Thus $\xi$ is timelike for r > 2m, null on r = 2m and spacelike for 0 < r < 2m. This
is a crucial and characteristic feature we will come back to on various occasions
in subsequent sections.
To determine the lightcones in ingoing Eddington-Finkelstein coordinates we again look
at radial null geodesics which this time are solutions to
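$$ x^\alpha(\lambda) = (r = r_0 - \lambda,\ v = v_0,\ \theta = \theta_0,\ \phi = \phi_0) \tag{26.86} $$
are geodesics. The sign has been chosen so that the tangent vector $\dot{x}^\alpha = dx^\alpha/d\lambda$
is future-oriented, i.e. such that its scalar product with $\partial_t = \partial_v$ is negative,
$$ g_{\alpha\beta}\,\dot{x}^\alpha (\partial_t)^\beta = g_{\alpha v}\,\dot{x}^\alpha = \dot{r} \overset{!}{<} 0 . \tag{26.87} $$
Thus the radius indeed decreases along future-oriented ingoing null geodesics (as
the name was meant to suggest) and the radial coordinate is an affine parameter
along these geodesics (as we also already saw in (25.57)).
That these curves are geodesics can be seen explicitly by calculating the acceleration
of $u^\alpha = (0, -1, 0, 0)$ (in the coordinates $(v, r, \theta, \phi)$),
$$ u^\beta \nabla_\beta u^\alpha = -\nabla_r u^\alpha = -\Gamma^\alpha_{r\beta} u^\beta = \Gamma^\alpha_{rr} , \tag{26.88} $$
or
$$ \nabla_{\partial_r} \partial_r = \Gamma^\alpha_{rr}\,\partial_\alpha . \tag{26.89} $$
Either way this shows that $\partial_r$ is geodesic, as $\Gamma^\alpha_{rr} = 0$ (since $g_{rr} = 0$, the only
possible contribution to $\Gamma^\alpha_{rr}$ could have arisen from $\partial_r g_{vr}$, but $g_{vr} = 1$ is constant).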
$$ v(r) = 2r^* + C \quad\Leftrightarrow\quad u = t - r^* = C . \tag{26.91} $$
We thus reassuringly recover the fact that outgoing lightrays are described by lines
of constant u.
Thus the metric and the lightcones remain well-behaved (do not fold up) at r = 2m,
the surface r = 2m is at a finite coordinate distance, namely (to reiterate the obvious)
at r = 2m, and there is no problem with following geodesics beyond r = 2m.
In particular, this means that we now encounter no difficulties when entering the region
r < 2m, e.g. along lines of constant v and this region should be included as part of
the physical space-time. Note that because $v = t + r^*$ and $r^* \to -\infty$ for $r \to 2m$,
we see that decreasing r along lines of constant v amounts to $t \to \infty$. Thus the new
region at $r \le 2m$ we have discovered is in some sense a future extension of the original
Schwarzschild space-time.
To understand the nature of this new region, we now take a more detailed look at the
behaviour of the lightcones. Even though the lightcones do not fold up at r = 2m,
something interesting is certainly happening there. Whereas, in a (v, r)-diagram (see
Figure 18), one side of the lightcone always remains horizontal (at v = const.), the
other side becomes vertical at r = 2m ($dv/dr = \infty$) and then tilts over to the other
side. In particular, beyond r = 2m all future-directed paths, those within the forward
lightcone, now have to move in the direction of decreasing r: clearly the ingoing null
geodesics move towards smaller values of r, but so do those that for r > 2m were
outgoing,
$$ (26.90) \quad\Rightarrow\quad dr/dv < 0 \quad\text{for}\quad r < 2m . \tag{26.92} $$
There is thus no way to turn back to larger values of r, not on a geodesic but also not
on any other path (i.e. not even with a powerful rocket) once one has gone past r = 2m.
Thus, even though locally the physics at r = 2m is well behaved, globally the surface
r = 2m is very significant as it is a point of no return:
Since r = 2m is a null surface, once one has reached the event horizon one has to
travel at the speed of light to stay there and not be forced further towards smaller
values of r.
Once one has passed the Schwarzschild radius, there is no turning back to larger
values of r.
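The "no turning back" statement can be illustrated numerically. From the (v, r) part of the ingoing Eddington-Finkelstein line element, $-f\,dv^2 + 2\,dv\,dr$, the would-be outgoing lightrays satisfy $dr/dv = f(r)/2$, and any causal displacement with dv > 0 satisfies $dr/dv \le f(r)/2$. The sketch below (illustrative function names, m = 1 by default, purely radial displacements) evaluates this bound and tests individual displacements.

```python
def f(r, m=1.0):
    """Schwarzschild function f(r) = 1 - 2m/r."""
    return 1.0 - 2.0 * m / r

def outgoing_dr_dv(r, m=1.0):
    """Slope dr/dv of the 'outgoing' radial lightrays; also the upper bound on dr/dv
    for causal radial displacements with dv > 0."""
    return 0.5 * f(r, m)

def is_causal(dv, dr, r, m=1.0):
    """True if the radial displacement (dv, dr) is timelike or null."""
    return -f(r, m) * dv**2 + 2.0 * dv * dr <= 0.0

if __name__ == "__main__":
    for r in (4.0, 2.0, 1.0):
        print(r, outgoing_dr_dv(r))
    # Inside r = 2m (here r = 1): moving inwards fast enough is allowed, moving outwards is not.
    print(is_causal(1.0, -0.6, 1.0), is_causal(1.0, 0.1, 1.0))
```

For r < 2m the bound is negative, so every future-directed causal path, geodesic or not, has decreasing r.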
Even though the Eddington-Finkelstein coordinates (more precisely the closely related
Eddington time coordinate - cf. section 26.5 below) were already introduced by Eddington
back in 1924, this full significance of the Schwarzschild radius and its interpretation
as a one-way membrane were only understood much later (Finkelstein, 1958).
[Figure 18: (v, r)-diagram with axes v and r, showing a line v = const. and the lines r = 0 and r = 2m.]
Remarks:
$$ n = -\partial_r , \qquad \ell = \partial_v + \tfrac{1}{2} f(r)\,\partial_r . \tag{26.94} $$
It is easy to check that these are 2 linearly independent future-pointing null vector
fields, n corresponding to ingoing lightrays and $\ell$ to (would-be) outgoing lightrays,
cross-normalised to $n \cdot \ell = -1$,
$$ n \cdot n = \ell \cdot \ell = 0 , \qquad n \cdot \ell = -1 . \tag{26.95} $$
With $\sqrt{s} = r^2 \sin\theta$, one finds from (26.93) that
$$ \theta_n = -\frac{2}{r} , \qquad \theta_\ell = \frac{r f(r)}{r^2} = \frac{r - 2m}{r^2} . \tag{26.96} $$
In particular, in Minkowski space one has the standard behaviour that the ingoing
radial null congruence always has negative expansion (it is contracting) while the
outgoing radial null congruence has positive expansion (it is expanding),
$$ f(r) = 1 \quad\Rightarrow\quad \theta_n = -\frac{2}{r} < 0 , \qquad \theta_\ell = +\frac{1}{r} > 0 . \tag{26.97} $$
While one still has $\theta_n < 0$ in the Schwarzschild geometry (ingoing lightrays are
contracting),
$$ f(r) = 1 - \frac{2m}{r} \quad\Rightarrow\quad \theta_n < 0 \quad \forall\ r , \tag{26.98} $$
for the congruence $\ell$ one has
$$ f(r) = 1 - \frac{2m}{r} \quad\Rightarrow\quad \theta_\ell \begin{cases} > 0 & r > r_s \\ = 0 & r = r_s \\ < 0 & r < r_s \end{cases} \tag{26.99} $$
Thus $r = r_s$ is characterised by the fact that the outgoing lightrays have zero
expansion. We will come back to this, and the related notion of trapped surfaces,
briefly in the context of the discussion of horizons of black holes in section 31.
2. In the above (v, r) coordinate system we can cross the event horizon only on future
directed paths, not on past directed ones, and only in the direction of decreasing
r. However, clearly this cannot be the whole story: the Schwarzschild metric in
the Schwarzschild coordinates is invariant under time-reflections $t \to -t$. Hence
there must also exist a time-reversed version of the future extension and its event
horizon. Noting that $t \to -t$ implies $v \to -u$ (and $u \to -v$),
it is clear that one will have access to this new region when working with the outgoing
Eddington-Finkelstein coordinates (u, r) instead, and backtracking outgoing
lightrays beyond $t = -\infty$.
Indeed, when one uses the coordinates (u, r) instead of (v, r), the lightcones in
Figure 18 are flipped (either up-down or left-right), and one can now pass through
the horizon along future directed paths only in the outgoing direction of increasing
r.
The new region of space-time covered by the coordinates (u, r) is thus definitely
different from the new region we uncovered using (v, r) even though both of them
lie behind r = 2m. In fact, this one is a past extension (beyond $t = -\infty$) of
the original Schwarzschild patch of space-time. In this patch, the region behind
r = 2m acts like the opposite (time-reversal) of a black hole (a white hole) which
cannot be entered on any future-directed path.
As we will discuss later, in section 28.3, this white hole region is unphysical, i.e.
an artefact of the idealisation of an eternal black hole metric. The future black
hole region, however, is definitely of relevance as it can be created by gravitational
collapse.
3. One can of course check directly that the Eddington-Finkelstein metric (26.78)
is a solution of the vacuum Einstein equations. One can also obtain the metric
directly from integrating the Einstein vacuum equations if one starts not with the
standard form of a static isotropic metric (23.6) but makes an ansatz of the form
in terms of two unknown functions f (v, x) and r(v, x) of two variables. The same
arguments as in the discussion of ingoing null geodesics above show that this
general form of the metric is characterised by the fact that the integral curves
of the null vector $\partial_x$ (i.e. the curves with constant $(v, \theta, \phi)$) are null geodesics, with
affine parameter x,
$$ g_{xx} = 0 \quad\Rightarrow\quad \nabla_{\partial_x} \partial_x = 0 . \tag{26.102} $$
As one can always choose to build a coordinate system in such a way, this is a valid
a priori ansatz for the metric, and from this perspective one never encounters the
question if the singularity at r = 2m is real or not, since it is not even a coordinate
singularity in these coordinates.
Alternatively, one can start with the metric in the Bondi gauge (39.30),
the defining feature of a black hole, namely the existence of an event horizon which
causally seals off the interior (black hole) region from the outside (we will discuss
this in more general terms in section 31.4),
and the prime example of a black hole gravitational field, namely that described
by the future extension of the Schwarzschild metric,
let us briefly, to conclude this section, discuss (and equally briefly dispose of) several
popular misconceptions about black holes that are unfortunately quite common in the
pop-sci and sci-fi literature:
Schwarzschild metric is (26.145)
$$ R_{\alpha\beta\gamma\delta} R^{\alpha\beta\gamma\delta} \sim \frac{m^2}{r^6} . \tag{26.104} $$
This shows that the strength of the gravitational field in the Schwarzschild geometry,
as measured by tidal forces, is $\sim m/r^3$ (as in the Newtonian theory). Near
the horizon at r = 2m this is
$$ m/r^3 \sim 1/m^2 \tag{26.105} $$
and can thus be arbitrarily weak for sufficiently massive (and large) black holes.
Related to this, we will estimate in section 28.3 the average density of an object
with the size of its Schwarzschild radius and will see that for sufficiently massive
objects (e.g. galaxies) this density can be as small as one likes.
The crucial point, as we will discuss in more detail in section 31, is that by
definition a black hole is characterised in terms of global properties of space-time
(in the sense of a region of space-time that is invisible to an asymptotic observer),
and these are not necessarily unambiguously detectable by local observers. In
particular, as we will see, event horizons can exist in (and emerge from) flat
Minkowskian regions of space-time, in which there is definitely no trace of a strong
gravitational field (at that time!).
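To get a feeling for the scaling $m/r^3 \sim 1/m^2$ at the horizon, one can restore units and evaluate the Newtonian-type tidal scale $GM/r_s^3$ at the Schwarzschild radius $r_s = 2GM/c^2$ for different masses. This is only an order-of-magnitude sketch; the constants are approximate SI values supplied here for illustration, and the function names are not from the notes.

```python
G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (approximate)
C = 2.998e8        # speed of light in m/s (approximate)
M_SUN = 1.989e30   # solar mass in kg (approximate)

def schwarzschild_radius(mass_kg):
    """r_s = 2GM/c^2 in metres."""
    return 2.0 * G * mass_kg / C**2

def horizon_tidal_scale(mass_kg):
    """GM / r_s^3 in s^-2: tidal acceleration per unit length at r = r_s (order of magnitude)."""
    return G * mass_kg / schwarzschild_radius(mass_kg) ** 3

if __name__ == "__main__":
    for solar_masses in (1.0, 1.0e6, 1.0e9):
        mass = solar_masses * M_SUN
        print(solar_masses, schwarzschild_radius(mass), horizon_tidal_scale(mass))
```

Since $GM/r_s^3 = c^6/(8G^2M^2)$, increasing the mass by a factor of 10 reduces the tidal scale at the horizon by a factor of 100.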
Ordinary space-time diagrams are more familiar (and therefore more intuitive) than
space-null diagrams such as the above (r, v)-diagram (in which, for example, in the
asymptotically flat regime $r \to \infty$ the lightcone has slopes 0 and 2 rather than the
usual 45° slopes $\pm 1$). In the present case this can easily be rectified by introducing a
new time-coordinate $\tilde{t}$ instead of v by the relation
$$ v = t + r^* = \tilde{t} + r , \tag{26.106} $$
i.e.
$$ \tilde{t} = t + 2m \log(r/2m - 1) . \tag{26.107} $$
This time coordinate is also known as the Eddington time coordinate (it was discovered
and used first by Eddington, and then rediscovered by Finkelstein, but neither of them
wrote down explicitly the null form of the metric that is now known as the Eddington-
Finkelstein form of the metric and which we discussed above).
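The relation between the Eddington time and the null coordinate v is easy to check numerically. The sketch below assumes the usual tortoise coordinate r* = r + 2m log(r/2m - 1) and the ingoing Eddington-Finkelstein form ds^2 = -f dv^2 + 2 dv dr + ... of the metric. The slope function is my own rewriting of the null condition after substituting v = t~ + r, not a formula quoted from the text, and all names are illustrative.

```python
import math


def tortoise(r, m):
    """Tortoise coordinate r* = r + 2m log(r/2m - 1), valid for r > 2m."""
    return r + 2.0 * m * math.log(r / (2.0 * m) - 1.0)


def eddington_time(t, r, m):
    """Eddington time (26.107) from the Schwarzschild time t, for r > 2m."""
    return t + 2.0 * m * math.log(r / (2.0 * m) - 1.0)


def outgoing_slope(r, m):
    """Slope d(t~)/dr of the would-be outgoing side of the lightcone (r != 2m)."""
    return (1.0 + 2.0 * m / r) / (1.0 - 2.0 * m / r)


m, t, r = 1.0, 0.7, 5.0
v_from_t = t + tortoise(r, m)                      # v = t + r*
v_from_eddington = eddington_time(t, r, m) + r     # v = t~ + r
print(v_from_t, v_from_eddington)

# The outgoing slope tends to 1 far away and blows up (vertical) at the horizon.
for radius in (2.001, 3.0, 10.0, 1000.0):
    print(radius, outgoing_slope(radius, m))
```

The ingoing side of the lightcone has slope -1 everywhere in these coordinates, so only the outgoing side needs to be tabulated.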
Initially the metric looks perhaps somewhat less illuminating in the $(\tilde t, r)$-coordinates,
but the lightcones and the horizon now have the following simple and easy to visualise
description, which follows from factorising the line element as
namely:
in particular, the horizon is (again) vertical in such a diagram, with the (would-
be) outgoing side of the lightcone vertical and tangent to the horizon, while the
lightcone degenerates as $r \to 0$.
Nice diagrams that you can find in many places depicting the collapse of a spheri-
cally symmetric star to a black hole and the formation of the horizon typically (either
implicitly or explicitly) use these $(\tilde t, r)$-coordinates.
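$$\eta^{\alpha\beta}\partial_\alpha v\, \partial_\beta v = g^{\alpha\beta}\partial_\alpha v\, \partial_\beta v = 0 \qquad (26.114)$$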
because $v = \tilde t + r$ is a null coordinate for the Minkowski metric $-d\tilde t^2 + dr^2 + (\ldots)$. This
is the Schwarzschild metric in what is known as Kerr-Schild form. I will therefore also
occasionally refer to $\tilde t$ as the Kerr-Schild time coordinate (but, as mentioned above, $\tilde t$ is
commonly also known as the Eddington time coordinate).
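By introducing standard inertial (Cartesian) coordinates $x^\alpha = (\tilde t, \vec x)$ for Minkowski
space, the metric can now also be written in the form
$$ds^2 = -d\tilde t^2 + d\vec x^{\,2} + \frac{2m}{r}\left(d\tilde t + \frac{\vec x \cdot d\vec x}{r}\right)^2 \qquad (26.115)$$
where $r^2 = \vec x^{\,2}$, and has the components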
$$u = t - r^* = \tilde t - r \quad\Leftrightarrow\quad \tilde t = t - 2m \log(r/2m - 1) \ , \qquad (26.117)$$
are known as Kerr-Schild metrics. This ansatz for the metric played an important role
in the search for other exact solutions of the Einstein equations.
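$$g^{\alpha\beta} = \eta^{\alpha\beta} - f(x) N^\alpha N^\beta \qquad (26.121)$$
$$g^{\alpha\beta} g_{\beta\gamma} = (\eta^{\alpha\beta} - f(x) N^\alpha N^\beta)(\eta_{\beta\gamma} + f(x) N_\beta N_\gamma) = \delta^\alpha_{\ \gamma} - f(x) N^\alpha N_\gamma + f(x) N^\alpha N_\gamma = \delta^\alpha_{\ \gamma} \ . \qquad (26.122)$$
$$g^{\alpha\beta} N_\beta = N^\alpha - f(x) N^\alpha N^\beta N_\beta = N^\alpha \ , \qquad (26.123)$$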
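The inversion formula for Kerr-Schild metrics can be spot-checked numerically for the Schwarzschild case. The sketch below assumes f = 2m/r and the null covector N with components (1, x/r, y/r, z/r) in the Cartesian coordinates introduced above (so that N_a dx^a = dt~ + dr), and raises its index with the Minkowski metric. The function name is hypothetical.

```python
import math


def kerr_schild_inverse_defect(m, x, y, z):
    """Largest entry of |g^{ab} g_{bc} - delta^a_c| for Schwarzschild in Kerr-Schild form."""
    r = math.sqrt(x * x + y * y + z * z)
    f = 2.0 * m / r
    # Minkowski metric diag(-1, 1, 1, 1); it is its own inverse.
    eta = [[-1.0 if a == b == 0 else (1.0 if a == b else 0.0) for b in range(4)]
           for a in range(4)]
    n_lower = [1.0, x / r, y / r, z / r]                   # N_a, null with respect to eta
    n_upper = [eta[a][a] * n_lower[a] for a in range(4)]   # N^a = eta^{ab} N_b
    g_lower = [[eta[a][b] + f * n_lower[a] * n_lower[b] for b in range(4)]
               for a in range(4)]
    g_upper = [[eta[a][b] - f * n_upper[a] * n_upper[b] for b in range(4)]
               for a in range(4)]
    defect = 0.0
    for a in range(4):
        for c in range(4):
            entry = sum(g_upper[a][b] * g_lower[b][c] for b in range(4))
            defect = max(defect, abs(entry - (1.0 if a == c else 0.0)))
    return defect


print(kerr_schild_inverse_defect(1.0, 0.3, -1.2, 2.5))
```

The printed defect should be at the level of floating-point rounding. The check works for any f, since only the nullity of N enters.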
26.6 Kruskal-Szekeres Coordinates
The first guess might be to use the coordinates u and v simultaneously, instead of r and
t. In these coordinates, the metric takes the form
with r = r(u, v). While this is a good idea, the problem is that in these coordinates the
horizon is once again infinitely far away, at $u = +\infty$ or $v = -\infty$ (i.e. at $2r^* = v - u = -\infty$).
We can rectify this by introducing coordinates U and V with
$$ds^2 = -\frac{32m^3}{r}\, e^{-r/2m}\, dU\, dV + r(U, V)^2 d\Omega^2 \ , \qquad (26.129)$$
with r = r(U, V) given implicitly by
$$UV = -e^{(v - u)/4m} = -e^{r^*/2m} = -(r/2m - 1)\, e^{r/2m} \qquad (26.130)$$
and t = t(U, V ) explicitly by
Finally, we pass from the null coordinates (U, V) (meaning that $\partial_U$ and $\partial_V$ are null
vectors) to more familiar timelike and spacelike coordinates (T, X) defined, in analogy
with $(u, v) = t \mp r^*$, by
$$U = T - X \ , \quad V = T + X \ , \qquad (26.132)$$
one finds that the coordinate transformation $(t, r) \to (T, X)$ is explicitly, and in its full
glory, given by
Remarks:
1. In these coordinates, the original Schwarzschild patch r > 2m, the region of
validity of the Schwarzschild coordinates, corresponds to the region $-\infty < U < 0$,
$0 < V < \infty$, or, in terms of X and T, to the region X > 0 and $X^2 - T^2 > 0$, or
|T| < X. As Figure 19 shows, this Schwarzschild patch is mapped to the first
quadrant of the Kruskal-Szekeres metric, bounded by the lines $X = \pm T$.
$$r = 2m \quad\Leftrightarrow\quad X = \pm T \ , \qquad (26.137)$$
Figure 19: Schwarzschild patch in the Kruskal-Szekeres metric: the half-plane r > 2m
is mapped to the quadrant between the lines $X = \pm T$ in the Kruskal-Szekeres metric.
7. Writing the implicit relation (26.130) between r and (U, V ) as
defined by
$$x = W(x)\, e^{W(x)} \qquad (26.142)$$
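In practice the implicit relation (26.130) is easily inverted numerically. The sketch below does not use the Lambert function at all but a plain bisection, relying on the fact that the right-hand side of (26.130) decreases monotonically from 1 at r = 0. If I am not mistaken the result should agree with r = 2m(1 + W(-UV/e)) on the principal branch, but the code does not depend on this. The function names are hypothetical.

```python
import math


def kruskal_product(r, m):
    """Right-hand side of (26.130): UV as a function of r."""
    return -(r / (2.0 * m) - 1.0) * math.exp(r / (2.0 * m))


def r_from_UV(UV, m, iterations=200):
    """Invert (26.130) for r >= 0 by bisection; UV decreases monotonically with r."""
    if UV > 1.0:
        raise ValueError("UV > 1 lies beyond the singularity r = 0")
    lo, hi = 0.0, 2.0 * m
    while kruskal_product(hi, m) > UV:   # enlarge the bracket for exterior points
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kruskal_product(mid, m) > UV:
            lo = mid                     # mid is still too close to r = 0
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip r -> UV -> r for points inside, on and outside the horizon (m = 1).
for r_true in (0.5, 2.0, 3.0, 10.0):
    print(r_true, r_from_UV(kruskal_product(r_true, 1.0), 1.0))
```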
Now that we have the coordinates X and T , we can let them range over all the values
for which the metric is non-singular. The only remaining singularity is at r = 0, which
corresponds to the two sheets of the hyperboloid
$$r = 0 \quad\Leftrightarrow\quad T^2 - X^2 = 1 \quad\Leftrightarrow\quad T = \pm\sqrt{1 + X^2} \ . \qquad (26.143)$$
82
For some applications of the Lambert function in this context see e.g. K. Lake, Some notes on the
Kruskal - Szekeres completion, arXiv:1002.3600 [gr-qc], Some further notes on the Kruskal - Szekeres
completion, arXiv:1202.0860 [gr-qc].
83
For an introduction to QFT in curved space-time and a discussion of these effects, see e.g. the
monographs N. Birrell, P. Davies, Quantum Fields in Curved Space, V. Mukhanov, S. Winitzki, Quantum
Effects in Gravity, or the on-line lecture notes J. Traschen, An Introduction to Black Hole Evaporation,
arXiv:gr-qc/0010055, T. Jacobson, Introduction to Quantum Fields in Curved Spacetime and the
Hawking Effect, arXiv:gr-qc/0308048, C. Krishnan, Quantum Field Theory, Black Holes and Holography,
arXiv:1011.5875.
That r = 0 is indeed a real singularity that cannot be removed by a coordinate trans-
formation can be shown by calculating some invariant of the curvature tensor, like the
Kretschmann scalar $K = R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma}$ (7.51). On purely dimensional grounds, from
$$(\partial_r^2 (1 - 2m/r))^2 \sim m^2/r^6 \ , \qquad (26.144)$$
say, one would expect K to be proportional to $m^2/r^6$, the crucial feature being that
the constant of proportionality is not zero, explicit calculations (this is a doable but
thoroughly unenlightening exercise) showing that
$$R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} = 48\, \frac{m^2}{r^6} \ . \qquad (26.145)$$
Thus the geometry is genuinely singular at r = 0. Nevertheless, since the metric is
non-singular for all values of (X, T) subject to the constraint r > 0 or $T^2 - X^2 < 1$,
there is no physical reason to exclude the regions in the other quadrants also satisfying
this condition.
In addition to the Schwarzschild patch, quadrant I, we have three other regions, living
in the quadrants II, III, and IV, each of them having its own peculiarities. Note that
obviously the conversion formulae from $(r, t) \to (X, T)$ in the quadrants II, III and IV
differ from those given above for quadrant I. E.g. in region II one can use
Schwarzschild(-like) coordinates in which the metric reads
$$ds^2 = \left(\frac{2m}{r} - 1\right) dt^2 - \left(\frac{2m}{r} - 1\right)^{-1} dr^2 + r^2 d\Omega^2 \qquad (26.146)$$
(these are not the same coordinates as those in patch I, as we have seen we cannot
continue the Schwarzschild coordinates across the horizon), and in this quadrant (where
r is a time coordinate etc.) the relation between Schwarzschild and Kruskal coordinates
is
[Figure 20: Kruskal diagram with axes T and X, showing the regions I, II, III and IV, the
horizons r = 2m, the singularities r = 0, and the hyperbolas r = m and r = 3m.]
Figure 20: Complete Kruskal-Szekeres universe. Diagonal lines are null, lines of constant
r are hyperbolas. Region I is the Schwarzschild patch, separated by the horizon from
regions II and IV. The Eddington-Finkelstein coordinates (v, r) cover regions I and II,
(u, r) cover regions I and IV. Regions I and III are filled with lines of constant r > 2m.
They are causally disconnected. Observers in regions I and III can receive signals from
region IV and send signals to region II. An observer in region IV can send signals into
both regions I and III (and therefore also to region II) and must have emerged from the
singularity at r = 0 at a finite proper time in the past. Any observer entering region II
will be able to receive signals from regions I and III (and therefore also from IV) and
will reach the singularity at r = 0 in finite time. Events occurring in region II cannot be
observed in any of the other regions.
To get acquainted with the Kruskal diagram, let us note the following basic facts (some
of which we had already noted for region I in the previous section).
1. Null lines are diagonals $X = \pm T + \text{const.}$, just as in Minkowski space. This greatly
facilitates the exploration of the causal structure of the Kruskal-Szekeres metric.
3. Lines of constant r are hyperbolas. For r > 2m they fill the quadrants I and III,
for r < 2m the other regions II and IV.
5. Notice in particular also that in regions II and IV worldlines with r = const. are
no longer timelike but spacelike. Including the transverse 2-sphere, this should be
rephrased as either "lines of constant $(r, \theta, \phi)$ are spacelike for r < 2m" or "the
surfaces of constant r are spacelike for r < 2m".
6. Lines of constant Schwarzschild time t are straight lines through the origin. E.g.
in region I one has $X = (\coth(t/4m))\, T$, with the future horizon X = T
corresponding, as expected, to $t \to \infty$.
7. The Eddington-Finkelstein coordinates (v, r) cover the regions I and II, the coor-
dinates (u, r) the regions I and IV.
8. Quadrant III is completely new and is separated from region I by a spacelike dis-
tance. That is, regions I and III are causally disconnected. This is the region
that already prematurely made a brief appearance when we analysed the coordi-
nate transformation between Schwarzschild and isotropic coordinates at the end
of section 23.5. Thus isotropic coordinates actually cover the regions I and III.
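Several of the facts listed above can be checked numerically for region I. The sketch below assumes the definitions U = -e^{-u/4m}, V = e^{v/4m} with u = t - r*, v = t + r* (these are consistent with (26.130), but the defining equation itself is not reproduced in this part of the notes), together with (26.132). The function name is illustrative.

```python
import math


def kruskal_from_schwarzschild(t, r, m):
    """Map Schwarzschild (t, r) with r > 2m (region I only) to (U, V, T, X)."""
    r_star = r + 2.0 * m * math.log(r / (2.0 * m) - 1.0)   # tortoise coordinate
    u, v = t - r_star, t + r_star
    U = -math.exp(-u / (4.0 * m))
    V = math.exp(v / (4.0 * m))
    T, X = 0.5 * (U + V), 0.5 * (V - U)                    # U = T - X, V = T + X
    return U, V, T, X


m, t, r = 1.0, 1.5, 3.0
U, V, T, X = kruskal_from_schwarzschild(t, r, m)
print("UV          =", U * V)
print("(26.130)    =", -(r / (2.0 * m) - 1.0) * math.exp(r / (2.0 * m)))
print("T / X       =", T / X)
print("tanh(t/4m)  =", math.tanh(t / (4.0 * m)))

# A time translation t -> t + c moves the point along the hyperbola r = const.
U2, V2, _, _ = kruskal_from_schwarzschild(t + 2.0, r, m)
print("UV after time translation =", U2 * V2)
```

The last line illustrates that UV, and hence r, does not change under time translations, while the ratio T/X, which labels the straight lines of constant t, does.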
Now let us see what all this tells us about the physics of the Kruskal-Szekeres metric.
An observer in region I (the familiar patch) can send signals into region II and
receive signals from region IV.
The same is true for an observer in the causally disconnected region III.
Once an observer enters region II from, say, region I, he cannot escape from it
anymore and he will run into the catastrophic region r = 0 in finite proper time.
As a reward for his or her foolishness, between having crossed the horizon and
being crushed to death, our observer will for the first time be able to receive
signals and meet observers emerging from the mirror world in region III.
Events occurring in region II cannot be observed anywhere outside that region
(black hole).
Finally, an observer in region IV must have emerged from the (past) singularity
at r = 0 a finite proper time in the past and can send signals and enter into either
of the regions I or III.
An even better visualisation of the causal and global structure of the maximally ex-
tended Schwarzschild solution is provided by its Carter-Penrose Conformal Diagram
(or Penrose diagram for short), given in Figure 21. Its construction (and the notation
used here) will be explained in section 27.
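[Figure 21: Penrose diagram of the maximally extended Schwarzschild solution, with the
singularities r = 0, the horizons r = 2m, and the infinities $i^+$, $\mathcal{I}^+$, $i^0$ and $i^-$ marked.]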
All in all, the picture that we have uncovered of the complete maximally extended
Schwarzschild space-time is quite intricate and rich, but also somewhat peculiar (to say
the least). In particular, the mirror region III and the white hole region IV appear to
be quite unphysical. As we will see in section 28.3, reassuringly the existence of these
regions is an artefact of an eternal black hole solution and these regions do not actually
exist for astrophysical black hole solutions that are formed by gravitational collapse.
Killing vector field $\xi = \partial_t$ of region I, when expressed in terms of Kruskal coordinates,
becomes null on the horizon and spacelike in region II.
Indeed it is easy to check from (26.130) and (26.131) that the time-translation symmetry
$(t, r) \to (t + c, r)$ of the Schwarzschild patch corresponds to the transformation
This is a boost in the (U, V ) or (T, X)-plane which leaves the entire Kruskal metric
invariant since
$$dU\, dV = dT^2 - dX^2 \qquad (26.149)$$
is invariant and r = r(U, V ) depends on U and V only via the boost-invariant (time-
independent) quantity U V (26.130),
timelike
is timelike in the original region I (and in the mirror region III), with (26.134)
$$\text{I}: \quad ||K||^2 = -\frac{2m}{r}\, e^{-r/2m} \left(\frac{r}{2m} - 1\right) e^{r/2m} = -\left(1 - \frac{2m}{r}\right) \ , \qquad (26.153)$$
confirming that $\xi = \partial_t$ in region I. We thus recover the statement that the
Schwarzschild metric in the Schwarzschild patch is static.
spacelike
is spacelike in region II. Thus region II has no timelike Killing vector field,
therefore cannot possibly be static, but has instead an additional spacelike Killing
vector field - cf. the Remark 1 in section 23.6 in connection with Birkhoff's theorem.
Related to this is the fact, already mentioned above, that in regions II and IV the
slices of constant r are no longer timelike but spacelike surfaces. Thus they are
analogous to, say, constant t or T slices for r > 2m. Just as it does not make
sense to ask "where is the slice t = 1?" (say), only "when is t = 1?", or "where
is r = 3m?", in these regions it makes no sense to ask "where is r = m?", only
"when is r = m?".
null
is null on the horizon. This turns out to be the most interesting case, and turns
out to be one of the characteristic features of static black holes in general, and
therefore I will elaborate on this a bit in the following.
In this section, we will perform a rather pedestrian analysis of some properties of the
horizon which are related to the fact that the Killing vector becomes null on this null
surface. These properties can be derived and understood without a general knowledge
of the geometry of null hypersurfaces, as discussed in section 16, but it is useful to keep
in mind that they are just special cases of the properties of null hypersurfaces, and of
more general Killing horizons (to be discussed later on in section 31.5).
In particular, from section 16.2 one knows on general grounds that the integral curves
of the normal vector are (possibly non-affinely parametrised) null geodesics, but it is
instructive to rederive this here from scratch, and from a slightly different perspective.
Therefore $\xi$ generates translations along the horizon, $v \to v + c$, and is called the null
generator of (this branch of) the horizon. Thus v can naturally be used as a coordinate
there.
On the other hand the line U = 0 is itself an outgoing (radial) null geodesic, but
in light of the above v cannot possibly be an affine parameter along that geodesic
(the affine parameter should not reach infinite values half-way along the geodesic). The
failure of v to be an affine parameter on the horizon can be quantified by calculating the
acceleration $\nabla_\xi \xi = \nabla_t \partial_t$ (4.94) of the (integral curves of the) Killing vector and then
taking the limit $r \to 2m$. This calculation is quite painless in Eddington-Finkelstein
coordinates (v, r) in which $\xi = \partial_v$ everywhere,
($\xi = \partial_v$ is evidently a Killing vector because the components of the metric do not depend
on v in Eddington-Finkelstein coordinates). Then the acceleration is
$$(\nabla_\xi \xi) = \nabla_v \partial_v = (f'/2)(f \partial_r + \partial_v) \qquad (26.155)$$
Remarks:
1. Since we have $\nabla_\xi \xi \propto \xi$ on the horizon, this shows first of all that there it generates
a non-affinely parametrised geodesic (see (2.35) and (4.96)). Moreover, we see that
one interpretation of the ubiquitous factor
$$\kappa = \frac{1}{4m} \qquad (26.157)$$
is that it measures the inaffinity, i.e. the failure of v to be an affine parameter on
the horizon,
$$\lim_{r \to 2m} \nabla_\xi \xi = \kappa\, \xi \ . \qquad (26.158)$$
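As a small sanity check of the acceleration formula (26.155) and of the value of the inaffinity, one can evaluate the components numerically. The sketch assumes f(r) = 1 - 2m/r and reads off the coefficients of the coordinate vector fields in (26.155). The function names are hypothetical.

```python
def f(r, m):
    """Schwarzschild function f(r) = 1 - 2m/r."""
    return 1.0 - 2.0 * m / r


def f_prime(r, m):
    """Derivative f'(r) = 2m/r^2."""
    return 2.0 * m / r**2


def killing_acceleration(r, m):
    """Components (a^v, a^r) of (f'/2)(d_v + f d_r), cf. (26.155)."""
    half = 0.5 * f_prime(r, m)
    return half, half * f(r, m)


def surface_gravity(m):
    """Coefficient of d_v in the acceleration at r = 2m, where the d_r part vanishes."""
    a_v, _ = killing_acceleration(2.0 * m, m)
    return a_v


for m in (0.5, 1.0, 3.0):
    print(m, surface_gravity(m), 1.0 / (4.0 * m))
```

Away from the horizon the radial component does not vanish, so there the integral curves of the Killing vector are not geodesics at all.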
Anyway, to return to the beginning of this story, we have seen that v is not an affine
parameter along the horizon. It turns out, however, that the Kruskal coordinate V is an
affine parameter there (and this is one way of understanding why Kruskal coordinates
are so natural for exploring the causal structure of the metric), meaning that the null
curve
$$x^\alpha(\lambda) = (U(\lambda), V(\lambda), \theta(\lambda), \phi(\lambda)) = (0, \lambda, \theta_0, \phi_0) \qquad (26.160)$$
is an affinely parametrised null geodesic. This follows on the nose from (26.156) and
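the result (2.37) of section 2.2. Noting that $\kappa$ is constant (actually not just along the
geodesic but on the entire horizon, but the former is all we need), (2.37) with constant
inaffinity $\kappa$ and parameter v becomes (dropping integration constants)
$$\frac{d\lambda}{dv} \sim e^{\kappa v} \quad\Rightarrow\quad \lambda(v) \sim \kappa^{-1} e^{\kappa v} \sim V \ . \qquad (26.161)$$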
Another way to see this, which provides some more insight, is to analyse the geodesic
Lagrangian and the conserved quantity associated to $\xi$ in ingoing Eddington-Finkelstein
coordinates (one could also work in Kruskal coordinates, but nothing is gained by that).
The elementary steps in the calculation are the following:
From the metric (26.78) we deduce that for radial null geodesics one has
$$-f(r)\dot v^2 + 2\dot v \dot r = 0 \qquad (26.162)$$
where a dot denotes a derivative with respect to the affine parameter $\lambda$. The
geodesics with v = const describe ingoing null geodesics and we are not interested
in these, so we have
$$-f(r)\dot v + 2\dot r = 0 \ , \qquad (26.163)$$
$$f(r)\dot v - \dot r = E \ . \qquad (26.164)$$
$$\dot r = E \quad\Rightarrow\quad r(\lambda) = E(\lambda - \lambda_0) + r_s \ . \qquad (26.165)$$
For the null geodesic along the horizon we are ultimately interested in, we have
$r(\lambda) = r_s$, i.e. E = 0, but we need to approach this with some care, so we keep
the general solution for now. Analogously, for v we find the equation
$$f(r)\dot v = 2E \quad\Rightarrow\quad \dot v = \frac{2E\, r(\lambda)}{r(\lambda) - r_s} = 2E + \frac{2r_s}{\lambda - \lambda_0} \ . \qquad (26.166)$$
We can now take the limit $E \to 0$ with impunity, and are left with
$$\dot v = \frac{2r_s}{\lambda - \lambda_0} \quad\Rightarrow\quad v(\lambda) = 2r_s \ln(\lambda - \lambda_0) + \text{const.} \ . \qquad (26.167)$$
The prefactor $2r_s = 4m$ is now precisely such that it cancels the factor 1/4m in
the definition of the Kruskal coordinate V (26.125), so that
$$V(\lambda) = e^{v(\lambda)/4m} = a(\lambda - \lambda_0) \ , \qquad (26.168)$$
We also see explicitly that for other null geodesic integral curves of $\partial_v$ or $\partial_V$, i.e.
those with $E \neq 0$, and with solution
Remarks:
1. We now have two natural coordinates on the future horizon U = 0, V > 0 which we
can for instance use to measure the frequency of incoming waves. $\xi = \partial_t = \partial_v$
measures what is commonly called Killing frequency (this requires no further explanation
since it is associated to the Killing vector which generates time-translations
in the Schwarzschild patch), and is the natural notion of frequency to be used by
static observers with 4-velocity $u^\alpha \propto \xi^\alpha$.
$\partial_V$, on the other hand, measures the so-called free fall frequency since, more or
less by construction, a freely falling observer in the Schwarzschild geometry near
r = 2m will see approximately the Minkowski space-time metric $ds^2 \sim -dU\, dV$
(recall that the transformation (2.158) between Rindler and Minkowski coordinates
is strictly analogous to the transformation (26.138) between Schwarzschild
and Kruskal coordinates). The exponential relation (26.125) between them reflects
the exponential blue- or redshift we first encountered in section 25.4.
2. As a concluding remark to this section, I cannot resist mentioning that the notion
of surface gravity plays a crucial role in the analysis of the classical dynamics
of general black holes, and even more so in the semi-classical context, since it is
directly proportional to the temperature of the famous Hawking radiation of an
evaporating black hole,
$$T_H = \frac{\kappa}{2\pi} = \frac{1}{8\pi G_N M} \ . \qquad (26.170)$$
For more details on this and other related advanced topics I am not able to cover
or do justice to here, I refer you to the references given in footnote 83 of section
26.6, as well as to the superb Cambridge lecture notes on Black Holes.85
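To see how small this temperature is for astrophysical black holes, one can restore the factors of hbar, c and k_B and evaluate it in Kelvin. This is only an order-of-magnitude illustration under the assumption that (26.170) is written in units in which these constants are set to one. The rounded solar mass and the function name are illustrative choices.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # Newton's constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_SUN = 1.989e30         # rounded solar mass in kg (illustrative value)


def hawking_temperature(mass_kg):
    """T_H = hbar c^3 / (8 pi G M k_B) in Kelvin."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)


for factor in (1.0, 10.0, 1.0e6):
    print(f"M = {factor:.0e} M_sun: T_H = {hawking_temperature(factor * M_SUN):.3e} K")
```

The temperature scales like 1/M, so heavier black holes are colder.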
There is one more coordinate system for the Schwarzschild geometry that I want to
mention because it is quite remarkable and, equally remarkably, apparently not widely
known or used. It was discovered by W. Israel in 1966, and rediscovered several times
since, most recently by T. Klösch and T. Strobl in a different particularly insightful
85
P. Townsend, Black Holes, arXiv:gr-qc/9707012v1.
way.86 I will introduce these coordinates in a way that is complementary to those in
these articles (and that is also motivated by certain generalisations, that I will however
not discuss here).
Recall that the Kruskal coordinates were based on suitably combining the outgoing
and ingoing Eddington-Finkelstein coordinates. Now, more generally one frequently
encounters the situation that one knows a solution either in coordinates adapted to
ingoing null geodesics, or in coordinates adapted to outgoing null geodesics, but usually
not both (and given one constructing the other is usually a hard problem that may have
no simple analytical solution).
Thus, let us assume that we are given (or have found) the Schwarzschild metric in
outgoing Eddington-Finkelstein coordinates,
We are happy and proud of this, but we quickly realise that this cannot be the end of the
story because the above coordinates provide an incomplete covering of the space-time.
The simplest way to discover this is to study radial lightrays or null geodesics, governed
by the two equations
$$f(r)\dot u^2 + 2\dot u \dot r = 0 \qquad (26.172)$$
(the null condition), and
$$f(r)\dot u + \dot r = E \qquad (26.173)$$
(due to the u-translation invariance). One set of null geodesics is simply given by $\dot u = 0$,
i.e. $u = u_0$ constant, and for these one has
$$u(\tau) = u_0 \ , \quad \dot r = E \quad\Rightarrow\quad r(\tau) = E\tau + r_0 \ . \qquad (26.174)$$
For future-oriented null geodesics one needs E > 0, and therefore one has $\dot r > 0$. These
are the outgoing null geodesics to which the outgoing Eddington-Finkelstein coordinate
system is adapted. Here $r_0$ is an integration constant which, by an affine transformation
(actually just a shift) of $\tau$, we could e.g. without loss of generality choose to be $r_0 = 2m$.
Then the solution describes the outgoing null geodesics that emerge from the past event
horizon at r = 2m for $\tau = 0$.
The other set of (thus ingoing) null geodesics has $\dot u \neq 0$ and is therefore governed by
the equations
$$f(r)\dot u + 2\dot r = 0 \quad\text{and}\quad f(r)\dot u + \dot r = E \ . \qquad (26.175)$$
86
W. Israel, New Interpretation of the extended Schwarzschild Manifold, Phys. Rev. 143 (1966)
1016-1021; T. Klösch, T. Strobl, Explicit Global Coordinates for Schwarzschild and Reissner-Nordstroem,
arXiv:gr-qc/9507011. It is also mentioned in section 15.4.4 of the directory of exact solutions, H.
Stephani, D. Kramer, M. MacCallum, C. Hoenselaers, E. Herlt, Exact Solutions to Einstein's Field
Equations (2nd Edition). See also K. Lake, An explicit global covering of the Schwarzschild-Tangherlini black
holes, arXiv:gr-qc/0306073, and Maximally extended, explicit and regular coverings of the
Schwarzschild - de Sitter vacua in arbitrary dimension, arXiv:gr-qc/0507031 for some generalisations.
Subtracting the two one finds $\dot r = -E$, and therefore (again with a convenient choice of
integration constant or origin of $\tau$)
$$\tau \to 0 \quad\Rightarrow\quad r \to 2m \ , \quad u \to +\infty \ . \qquad (26.177)$$
Thus we discover that we reach $u = +\infty$ in finite (affine) time, we run out of coordinate
space as the ingoing null geodesics approach r = 2m (and the coordinate u is evidently
not suitable for describing what happens beyond r = 2m, $u = +\infty$).
Note that this locus is most certainly not the past event horizon (r = 2m, u = u0
finite), as we know that we can only cross that horizon along future-directed paths from
smaller to larger values of r (the white hole). Thus we have discovered a new barrier
(which also turns out to be a horizon, and which, with hindsight, we know is the future
event horizon), and we realise that the outgoing Eddington-Finkelstein coordinates do
not cover the whole space-time.
Now let us assume that we do not have the luxury of being able to appeal to ingoing
Eddington-Finkelstein coordinates to construct a future extension (and subsequently
the maximal Kruskal-Szekeres extension) of this space-time. How could we proceed?
so that the complete range of u, $u \in (-\infty, +\infty)$, is covered as x runs over the interval
$x \in (-\infty, 0)$. This clearly now permits us to continue the ingoing null geodesics beyond
$u = +\infty$. However, if one just replaces $u \to x$, the metric appears to be singular at
x = 0. This can be rectified by introducing a new coordinate y through the relation
$$r - 2m = xy \ , \qquad (26.179)$$
its range subject to the condition r > 0. To better understand the rationale for this
change of variables, note that as $u \to \infty$ one has $r - 2m \sim x$ so that $(r - 2m)/x$
remains finite in the limit - and can therefore be used as a new coordinate, the one that
we have called y.
It is straightforward to see that in these coordinates the metric takes the form
$$ds^2 = \frac{8m y^2}{xy + 2m}\, dx^2 + 8m\, dx\, dy + (xy + 2m)^2 d\Omega^2 \ . \qquad (26.180)$$
This is the Schwarzschild metric in Israel coordinates. Before discussing some of its most
important properties, note that by the simple scaling $u_I = 4mx$, so that
is just the canonically normalised Kruskal coordinate introduced in (26.138), and the
relabelling $y \to v_I$, one can put the metric into the somewhat more common canonically
normalised form
$$ds^2 = \frac{v_I^2}{2m((u_I/4m)v_I + 2m)}\, du_I^2 + 2\, du_I\, dv_I + ((u_I/4m)v_I + 2m)^2 d\Omega^2 \ . \qquad (26.182)$$
Here are some of the key-properties of this metric (to describe these I will continue to
use the coordinates (x, y), i.e. the form (26.180) of the metric):
1. First of all, one sees that the metric is completely non-singular as long as $xy > -2m$,
and one can therefore let x and y run over all the values for which this
condition is satisfied.
$$r = 2m \quad\Leftrightarrow\quad xy = 0: \quad \{x = 0\} \cup \{y = 0\} \qquad (26.183)$$
3. Then one sees immediately that this space-time covers four distinct patches:
For x < 0 and y < 0 one has r > 2m: this is the original Schwarzschild patch.
There are two regions for which x and y have opposite sign, subject to the
condition $xy > -2m$: these are the white hole region behind the past horizon
(x < 0, y > 0), covered by the original outgoing Eddington-Finkelstein
coordinates, and the new region beyond the future event horizon with x >
0, y < 0.
Finally, for x > 0 and y > 0 one also has r > 2m: this is a distinct region
isometric to the Schwarzschild patch, the mirror region.
4. Thus the Israel coordinates provide not only a future extension of the Schwarz-
schild metric in Schwarzschild or outgoing Eddington-Finkelstein coordinates but
actually a complete covering of the maximal Kruskal-Szekeres extension of the
Schwarzschild space-time.
5. The usual Kruskal-Szekeres coordinates (U, V ) are related to the Israel coordinates
by
$$U \sim x \ , \quad V \sim y\, e^{xy/2m + 1} \ . \qquad (26.184)$$
6. The main difference to (and advantage compared with) Kruskal-Szekeres coordi-
nates is that for the Israel coordinates the coordinate transformation to Schwarz-
schild coordinates and its inverse are completely explicit (whereas the radial co-
ordinate r is only given implicitly in terms of the Kruskal-Szekeres coordinates).
E.g. for x < 0 one has
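$$\left.\begin{array}{l} u = -4m \log(-x) \\ r = xy + 2m \end{array}\right\} \quad\Leftrightarrow\quad \left\{\begin{array}{l} x = -e^{-u/4m} \\ y = -(r - 2m)\, e^{u/4m} \end{array}\right. \qquad (26.185)$$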
8. By inspection, the metric depends only on products like xy, dxdy etc. Thus it has
the isometry
$$(x, y) \to (\lambda x, \lambda^{-1} y) \qquad (26.187)$$
$$\frac{y^2}{xy + 2m}\, \dot x^2 + \dot x \dot y = 0 \qquad (26.189)$$
$$-\frac{2y^2}{xy + 2m}\, x\dot x - x\dot y + y\dot x = c \ . \qquad (26.190)$$
One has the curves with x (as well as, of course, $(\theta, \phi)$) constant, so that these
are straight lines in an (x, y)-diagram. For these curves, (26.190) implies that
$\dot y$ is constant, so that y is an affine parameter along these null geodesics.
The other null geodesics are given by the relation
$$\frac{y^2}{xy + 2m}\, \dot x + \dot y = 0 \ . \qquad (26.191)$$
$$\dot x y + x \dot y = c \quad\Rightarrow\quad x(\tau) y(\tau) = c\tau + d \ . \qquad (26.192)$$
Using this to eliminate x and $\dot x$ from (26.191), the equation of motion for y
reduces to
$$xy = -2m \log(Cy) \qquad (26.194)$$
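The explicit nature of the Israel coordinates makes them easy to experiment with. The sketch below implements the transformation (26.185) for x < 0 and compares the dx^2-coefficient of (26.180) with the one obtained by substituting u(x) and r(x, y) into the outgoing Eddington-Finkelstein metric. For the latter I assume the form ds^2 = -f du^2 - 2 du dr + r^2 dΩ^2, which is the one implied by the null condition (26.172). All function names are hypothetical.

```python
import math


def to_israel(u, r, m):
    """(u, r) -> (x, y) via (26.185); gives x < 0 for finite u."""
    x = -math.exp(-u / (4.0 * m))
    y = (r - 2.0 * m) / x          # same as -(r - 2m) e^{u/4m}
    return x, y


def from_israel(x, y, m):
    """(x, y) -> (u, r) via (26.185); only defined for x < 0."""
    if x >= 0.0:
        raise ValueError("u is only defined for x < 0")
    return -4.0 * m * math.log(-x), x * y + 2.0 * m


def g_xx_israel(x, y, m):
    """Coefficient of dx^2 in (26.180)."""
    return 8.0 * m * y * y / (x * y + 2.0 * m)


def g_xx_pulled_back(x, y, m):
    """Coefficient of dx^2 from -f du^2 - 2 du dr with u = u(x), r = xy + 2m."""
    r = x * y + 2.0 * m
    f = 1.0 - 2.0 * m / r
    du_dx = -4.0 * m / x
    dr_dx = y
    return -f * du_dx**2 - 2.0 * du_dx * dr_dx


m = 1.0
x, y = to_israel(0.8, 3.5, m)
print(from_israel(x, y, m))                        # round trip back to (u, r)
for point in ((-0.5, -1.2), (-0.5, 0.7), (-2.0, 0.9)):
    print(point, g_xx_israel(*point, m), g_xx_pulled_back(*point, m))
```

The sample points include two in the white hole region (x < 0, y > 0, r < 2m), where the outgoing Eddington-Finkelstein coordinates are still valid.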
26.11 Appendix: Summary of Schwarzschild Coordinate Systems
Here is, to wrap up this section, a list of the coordinate systems that we have used to
gain insight into the properties of the Schwarzschild metric. [As region III is isometric
to region I, this doubling of possibilities (e.g. Schwarzschild coordinates cover I or III,
etc.) has not been indicated in the last column. The perfectly valid but somewhat
un-insightful option to use Schwarzschild coordinates only in region II, say, and variants
thereof, have also not been indicated in the last column.]
Abbreviations:
27 Interlude: Carter-Penrose Conformal Diagrams
27.1 Introduction
Quite generally, the ability to visualise or depict complex situations plays an impor-
tant role in developing physical intuition in such a setting. However, clearly curved
4-dimensional space-times provide a challenge for every-day visualisation techniques,
and even relatively simple and highly symmetric space-times are often difficult to visu-
alise in a reliable way (just think of the rich structure that we uncovered when analysing
the Schwarzschild geometry and its Kruskal-Szekeres maximal extension). This is true
in particular for asymptotic or global aspects of a space-time (after all, in all the pictures
we have drawn so far this asymptotic region is infinitely far away).
An extremely useful (and widely and commonly used) method to visualise both the
causal and the global structure of a (sufficiently symmetric) space-time is that of Carter-
Penrose Conformal Diagrams (or Penrose Diagrams for short in the following, with
apologies to B. Carter), already briefly alluded to at the end of section 26 (see Figure
21).
In this section I will introduce and explain these Penrose diagrams by way of some ele-
mentary examples.87 I will not, however, enter into the underlying (and highly technical)
issues regarding the proper definition of (weakly) asymptotically simple or asymptoti-
cally flat space-times, say.88 Other examples will appear later on here and there in these
notes.
In order to capture both the global and the causal structure of such a space-time in
a (1+1)-dimensional diagram of finite extent, the basic idea is to find a coordinate
transformation
and such that (radial) light rays are always at 45° (as e.g. in the Kruskal diagram)
87
For introductions to Penrose diagrams, see e.g. Appendix H of S. Carroll, Spacetime and
Geometry: An Introduction to General Relativity, or section 2.4 of P. Townsend, Black Holes,
arXiv:gr-qc/9707012v1, or section 2 of A. Strominger, Les Houches Lectures on Black Holes,
arXiv:hep-th/9501071.
88
The mathematical aspects are discussed in detail in S. Hawking, G. Ellis, The large scale structure
of space-time and in section 11 of R. Wald, General Relativity.
How to implement the 1st requirement is already nicely illustrated by the conformal
compactification of the Euclidean plane mentioned in section 10.3 (see the discussion
around (10.65)):
$$r = \tan \theta/2 \ , \qquad (27.2)$$
With respect to the new metric with line element $d\tilde s^2$, the point $\theta = \pi$ is now not only
at finite coordinate distance but also at finite (affine) geodesic distance, and adding it
conformally compactifies $\mathbb{R}^2$ to $S^2$.
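The conformal relation between the plane and the sphere can be verified pointwise. The sketch assumes the flat metric dr^2 + r^2 dφ^2, the substitution (27.2), and the conformal factor 1/(4 cos^4(θ/2)) that one obtains from it by direct computation. The intermediate equation is not reproduced in this part of the notes, so this factor should be regarded as my reconstruction. Function names are illustrative.

```python
import math


def flat_plane_components(theta):
    """(g_theta_theta, g_phi_phi) of dr^2 + r^2 dphi^2 after r = tan(theta/2)."""
    r = math.tan(0.5 * theta)
    dr_dtheta = 0.5 / math.cos(0.5 * theta) ** 2
    return dr_dtheta**2, r**2


def conformal_factor(theta):
    """Factor multiplying the round sphere metric; it diverges at theta = pi."""
    return 1.0 / (4.0 * math.cos(0.5 * theta) ** 4)


def sphere_components(theta):
    """(g_theta_theta, g_phi_phi) of the unit sphere d theta^2 + sin^2(theta) dphi^2."""
    return 1.0, math.sin(theta) ** 2


for theta in (0.3, 1.0, 2.0, 3.0):
    flat = flat_plane_components(theta)
    scaled = tuple(conformal_factor(theta) * g for g in sphere_components(theta))
    print(theta, flat, scaled)
```

Dropping the divergent factor is what brings the point at infinity to a finite distance.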
Analogous coordinate transformations (involving the tan function and related objects)
and conformal rescalings of the metric are commonly used to compactify the range
of non-compact coordinates of some space-time metric and to then construct Penrose
diagrams. In this case, however, we also need to pay attention to the 2nd requirement,
related to the causal structure.
Regarding the relation between the causal structure and conformal (or Weyl) rescalings
of the metric,
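$$g_{\alpha\beta}(x) \to \tilde g_{\alpha\beta}(x) = \Omega(x)^2 g_{\alpha\beta}(x) \qquad (27.5)$$
or
$$ds^2 \to d\tilde s^2 = \Omega(x)^2 ds^2 \ , \qquad (27.6)$$
we just note the following facts: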
1. The causal nature of a vector field or curve is invariant under conformal rescalings,
i.e. a vector field is spacelike with respect to $g_{\alpha\beta}$ iff it is spacelike with respect to
$\tilde g_{\alpha\beta}$, a curve is everywhere timelike with respect to $g_{\alpha\beta}$ iff it is everywhere timelike
with respect to $\tilde g_{\alpha\beta}$, etc.
3. In general, even though timelike or spacelike paths are mapped into timelike or
spacelike paths, timelike or spacelike geodesics are not mapped into each other.
However, the paths that are traced out by null geodesics are mapped into each
other under conformal rescalings (albeit with respect to different, and therefore in
general non-affine, parametrisations).
The first 2 assertions are obvious. Thus the only one that may require an explanation
is the 3rd. This follows directly from the relation between the Christoffel symbols of
the 2 metrics which is easily seen to be
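$$\tilde\Gamma^\alpha_{\ \beta\gamma} = \Gamma^\alpha_{\ \beta\gamma} + \Omega^{-1}(\delta^\alpha_{\ \beta}\partial_\gamma\Omega + \delta^\alpha_{\ \gamma}\partial_\beta\Omega - g_{\beta\gamma}\partial^\alpha\Omega) \ . \qquad (27.7)$$
where
$$\dot\Omega = \dot x^\alpha \partial_\alpha\Omega = d\Omega/d\tau \ . \qquad (27.9)$$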
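In particular, if $x^\alpha(\tau)$ is an affinely parametrised geodesic for the metric $g_{\alpha\beta}(x)$, then
it also satisfies the equation
$$\ddot x^\alpha + \tilde\Gamma^\alpha_{\ \beta\gamma}\dot x^\beta \dot x^\gamma = 2\Omega^{-1}\dot\Omega\, \dot x^\alpha - \Omega^{-1}(g_{\beta\gamma}\dot x^\beta \dot x^\gamma)\partial^\alpha\Omega \ . \qquad (27.10)$$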
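If $g_{\beta\gamma}\dot x^\beta \dot x^\gamma \neq 0$ (i.e. for timelike or spacelike geodesics), this is not the geodesic equation
for the metric $\tilde g_{\alpha\beta}(x)$. For null geodesics, on the other hand, one has
$$\ddot x^\alpha + \tilde\Gamma^\alpha_{\ \beta\gamma}\dot x^\beta \dot x^\gamma = 2\Omega^{-1}\dot\Omega\, \dot x^\alpha \ , \qquad (27.11)$$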
and this is the geodesic equation, but with respect to a non-affine parametrisation (cf.
(2.35) and the discussion in section 2.2), with inaffinity
$$\kappa(\tau) = 2\Omega^{-1}\dot\Omega = (d/d\tau)(\log \Omega^2) \ . \qquad (27.12)$$
This null geodesic will then be affinely parametrised with respect to the parameter $\tilde\tau$
determined along the null geodesic by the relation (2.37)
$$\frac{d\tilde\tau}{d\tau} = e^{\int^\tau dt\, \kappa(t)} = \Omega(x(\tau))^2 \qquad (27.13)$$
Note that this relation $d\tilde\tau = \Omega^2 d\tau$ between the null affine parameters is not what one might
have naively (but incorrectly) expected or extrapolated from the relation $d\tilde s = \Omega\, ds$
between the proper spatial distance or proper time intervals in the 2 metrics.
27.3 Penrose Diagram for (3+1) Minkowski Space
We will now see how to accomplish the desiderata laid out at the beginning of this section
in the simplest example, namely (3+1)-dimensional Minkowski space-time. At first
sight the (1+1)-dimensional case may appear to be an even simpler example. However,
because of the absence of an honest spatial radial direction in that case, it turns out to
be somewhat atypical, and therefore does not constitute the optimal starting point. I
will therefore discuss the (1+1)-dimensional case separately in section 27.4 below.
One simple-minded way to map the infinite coordinate ranges to a finite range would
be to introduce, in analogy with the Euclidean case above, a new radial coordinate R
related to r via a tan function, and likewise for t, along the lines of
$$ t = \tan T , \qquad r = \tan R . $$
However, while this accomplishes the 1st desideratum (finite range of coordinates), it
fails to satisfy the 2nd requirement (lightcones at $45^\circ$). Indeed, using $dt = dT/\cos^2 T$
etc., one finds that the (t, r)-part of the metric takes the form
$$ -dt^2 + dr^2 = -\frac{dT^2}{\cos^4 T} + \frac{dR^2}{\cos^4 R} , $$
so that radial lightrays satisfy $dT/dR = \pm \cos^2 T / \cos^2 R$
and evidently have a slope that depends on (T, R) (whereas we would like $dT/dR = \pm 1$).
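The failure of the naive tan transformation can be made concrete with a few numbers. The following is a minimal sketch, assuming the transformation $t = \tan T$, $r = \tan R$ suggested by the text; the function name `lightray_slope` and the sample points are chosen here purely for illustration.

```python
import math

def lightray_slope(T, R):
    """|dT/dR| along a radial light ray (dt = +/- dr) when t = tan T, r = tan R."""
    # dt = dT / cos^2 T and dr = dR / cos^2 R, so dt = dr gives dT/dR = cos^2 T / cos^2 R
    return math.cos(T) ** 2 / math.cos(R) ** 2

# Hypothetical sample points inside the coordinate range |T| < pi/2, 0 <= R < pi/2
for T, R in [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (1.4, 0.2)]:
    print(f"T = {T:.1f}, R = {R:.1f}: |dT/dR| = {lightray_slope(T, R):.3f}")
```

The slope equals 1 only where $\cos^2 T = \cos^2 R$, so light cones drawn in the (T, R)-plane are tilted differently at different points.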
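In order to rectify this, we will first introduce coordinates that are adapted to radial
in- and outgoing lightrays, namely the coordinates
$$ u = t - r , \qquad v = t + r . \qquad (27.18) $$
Radial light rays are described by
$$ du\, dv = 0 \quad \Leftrightarrow \quad u = u_0 \;\; \text{or} \;\; v = v_0 . \qquad (27.21) $$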
Lines of constant u describe outgoing radial lightrays while lines of constant v describe
ingoing radial lightrays.
In terms of these and the original coordinates we can now identify different infinities
(asymptotic regions) of Minkowski space-time, namely (in standard notation)
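$i^+$ (future timelike infinity): where one asymptotes to when one takes $t \to +\infty$ at fixed
r
$i^-$ (past timelike infinity): where one asymptotes to when one takes $t \to -\infty$ at fixed
r
$i^0$ (spatial infinity): where one asymptotes to when one takes $r \to \infty$ at fixed t
$\mathcal{I}^+$ (future null infinity): where outgoing radial lightrays asymptote to in the future,
i.e. one takes $v \to +\infty$ at fixed u
$\mathcal{I}^-$ (past null infinity): where ingoing radial lightrays asymptote to in the past, i.e.
one takes $u \to -\infty$ at fixed v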
of the lightcone coordinates. One possible choice which maps the asymptotic regions to
finite values of the new coordinates is the (by now unsurprising) tan transformation
$$ u = \tan U , \qquad v = \tan V , $$
with (since $v - u = 2r \geq 0$)
$$ -\pi/2 < U \leq V < \pi/2 , \qquad (27.24) $$
so this definitely describes a finite region in the (U, V )-plane, namely a triangle. Infinity
corresponds to the locus $|U| \to \pi/2$ and/or $|V| \to \pi/2$.
$$ \tilde U = \tanh u , \qquad \tilde V = \tanh v , \qquad (27.25) $$
say, with
$$ -1 < \tilde U \leq \tilde V < +1 , \qquad (27.26) $$
but let us continue to work with the coordinates (U, V ) defined in (27.3). In terms of
these the metric (after some elementary trigono-gymnastics) takes the form
$$ ds^2 = \frac{1}{4 \cos^2 U \cos^2 V} \left( -4\, dU\, dV + \sin^2(V - U)\, d\Omega^2 \right) . \qquad (27.27) $$
Note in particular that the prefactor diverges as one approaches infinity, in agreement
with the evident fact that with respect to this metric infinity is at infinite proper distance
even though it is at finite coordinate distance.
However, if our interest is in the global and causal structure of the metric, while disre-
garding the proper distance structure also encoded in the metric, we can just remove
this prefactor. This will not change the fact that in/out radial light rays are described
by lines of constant V or U , but allows us to extend the metric to include the boundary
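points at which the prefactor diverges. Thus we consider the metric
$$ d\tilde s^2 = -4\, dU\, dV + \sin^2(V - U)\, d\Omega^2 . \qquad (27.28) $$
This can be put into a more familiar form by replacing the lightcone coordinates U and
V by new time and radial coordinates T and R via the analogue of $u = t - r$, $v = t + r$
(27.18), namely
$$ T = U + V , \qquad R = V - U \geq 0 . \qquad (27.29) $$
In terms of these the metric takes the form
$$ d\tilde s^2 = -dT^2 + dR^2 + \sin^2 R\, d\Omega^2 , \qquad (27.30) $$
and the combined transformation from the original coordinates (t, r) to these coordinates
(T, R) is
$$ t \pm r = \tan \tfrac{1}{2} (T \pm R) . \qquad (27.31) $$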
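The chain of transformations leading to (27.31) can be checked numerically. The following is a minimal sketch, assuming the lightcone coordinates $u = t - r$, $v = t + r$ and the tan transformation $u = \tan U$, $v = \tan V$ described above; the function names are hypothetical and chosen only for this illustration. It also compares $r^2$ with the angular prefactor $\sin^2(V - U)/(4\cos^2 U \cos^2 V)$ appearing in (27.27).

```python
import math

def compactify(t, r):
    """Map Minkowski coordinates (t, r >= 0) to the Penrose coordinates (T, R)."""
    u, v = t - r, t + r                  # lightcone coordinates
    U, V = math.atan(u), math.atan(v)    # u = tan U, v = tan V
    return U + V, V - U                  # T = U + V, R = V - U

def decompactify(T, R):
    """Inverse map, using t +/- r = tan((T +/- R)/2), cf. (27.31)."""
    v = math.tan(0.5 * (T + R))
    u = math.tan(0.5 * (T - R))
    return 0.5 * (v + u), 0.5 * (v - u)

def area_radius_check(t, r):
    """Return r^2 and sin^2(V - U) / (4 cos^2 U cos^2 V), which should agree."""
    U, V = math.atan(t - r), math.atan(t + r)
    return r ** 2, math.sin(V - U) ** 2 / (4.0 * math.cos(U) ** 2 * math.cos(V) ** 2)

# Hypothetical sample events
for t, r in [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5), (10.0, 10.0)]:
    T, R = compactify(t, r)
    print(f"(t, r) = ({t}, {r}) -> (T, R) = ({T:.4f}, {R:.4f}), |T| + R = {abs(T) + R:.4f}")
```

Every finite event lands strictly inside the triangle $|T| + R < \pi$, $R \geq 0$, and points with large $|t|$ or $r$ crowd towards its boundary.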
Before proceeding to draw the appropriate (1+1)-dimensional diagram for this (by sup-
pressing the spherical / angular directions), let me make some comments on this (3+1)-
dimensional metric.
Remarks:
1. If T had the range $-\infty < T < +\infty$ and R were a standard polar angular coordinate
$\psi$, then this would be the standard metric on $\mathbb{R} \times S^3$, a space-time given by the
direct product of the time direction and a spatial 3-sphere of constant unit radius,
$$ d\tilde s^2 = -dT^2 + d\Omega_3^2 , \qquad (27.32) $$
with (1.117)
$$ d\Omega_3^2 = d\psi^2 + \sin^2\psi\, d\Omega_2^2 . \qquad (27.33) $$
2. This (in the present context unphysical) metric, regarded as a solution of the
Einstein equations, happens to have a name, namely the Einstein Static Universe
(ESU), and happens to be of some historical interest (because finding such a static
cosmological solution motivated Einstein to introduce the infamous cosmological
constant into his equations in the first place). For this reason, we will briefly
discuss this solution in the context of cosmology in section 36.2. However, for
present purposes this is just an unnecessary distraction.
3. In the current case of interest this is in any case not the range of the coordinates.
Rather, the triangular bound (27.24) on the coordinates U, V translates into the
conditions
$$ |T| + R < \pi , \qquad 0 \leq R < \pi \qquad (27.34) $$
on the range of the coordinates T, R. Thus, if one likes one can think of Minkowski
space as being conformally equivalent to the subspace of $\mathbb{R} \times S^3$ defined by these
conditions. Combined with the previous comment one can thus think of Minkowski
space as conformally equivalent to a subspace of the ESU, and pictorial representations
of this (with the ESU represented by the cylinder $\mathbb{R} \times S^1$) can be found in
many places (including all but one of the references in footnote 87). I have never
found this particularly illuminating, however.
For this reason we will now just focus on the (1+1)-dimensional metric
$$ d\tilde s^2 = -4\, dU\, dV = -dT^2 + dR^2 \qquad (27.35) $$
with the coordinate ranges given in (27.24) and (27.34) respectively. In a (U, V )-diagram
(with the U -axis vertical, say, and the V -axis horizontal), this is evidently just the lower
right triangular half of a square of length $\pi$ centered at the origin. In terms of (T, R),
the apex of this triangle at $U = -\pi/2$, $V = \pi/2$ is mapped to $R = V - U = \pi$ and
$T = V + U = 0$. Thus this corresponds to a counter-clockwise rotation of the triangle
by $\pi/4$, and we therefore obtain Figure 22.
[Figure 22 labels: vertices $(T = +\pi, R = 0)$, $(T = 0, R = \pi)$ and $(T = -\pi, R = 0)$; vertical edge $R = 0$.]
Figure 22: Towards the Penrose Diagram for Minkowski space: Minkowski space cor-
responds to the interior of the triangle, including the line R = 0 but excluding the
diagonal boundary lines and their endpoints.
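$$ (T \to 0, R \to \pi) \;\Leftrightarrow\; (u \to -\infty, v \to +\infty) \;\Leftrightarrow\; (t \text{ finite}, r \to \infty) \qquad (27.36) $$
$$ i^0 : \quad (T = 0, R = \pi) . \qquad (27.37) $$
$$ (T \to \pi, R \to 0) \;\Leftrightarrow\; (u \to +\infty, v \to +\infty) \;\Leftrightarrow\; (t \to +\infty, r \text{ finite}) \qquad (27.38) $$
$$ i^+ : \quad (T = \pi, R = 0) , \qquad (27.39) $$
$$ i^- : \quad (T = -\pi, R = 0) , \qquad (27.40) $$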
which is the upper diagonal line in Figure 22, and likewise for past null infinity,
By simply adding these regions and this information to the diagram in Figure 22, we
obtain our final version of the Penrose diagram of Minkowski space, Figure 23.
To get acquainted with this diagram (and with Penrose diagrams in general), let us note
the following facts:
[Figure 23: Penrose diagram of Minkowski space, with labels $i^+$, $\mathcal{I}^+$, $r = 0$, $i^0$.]
3. $i^\pm$, $i^0$, on the other hand, are really points because the radius $\sin R$ of the
2-sphere vanishes at the poles $R = 0, \pi$;
6. all (infinitely extended) spacelike geodesics begin at $i^0$, pass through (are
reflected at) r = 0 and end again at $i^0$;
7. all (infinitely extended) null geodesics begin on $\mathcal{I}^-$, are reflected at r = 0 and end
at $\mathcal{I}^+$.
While we cannot expect to learn too much about Minkowski space from this diagram
that we did not already know, understanding the asymptotic structure of Minkowski
space-time will be useful in the following, because any reasonable definition of an asymp-
totically flat space-time representing the gravitational field of an isolated object should
be such that asymptotically it looks like Minkowski space, i.e. its Penrose diagram
should asymptotically resemble that of Figure 23.
Moreover, this pictorial representation is also interesting in its own right since it makes
causal information easily accessible and, in particular, makes two features of Minkowski
space manifest that are not shared by all space-times:
1. Any timelike geodesic observer will eventually be able to see all of Minkowski space
(since eventually, at i+ , the past lightcone of the observer covers all of Minkowski
space).
2. Past and future lightcones of any 2 events intersect. In particular, any 2 events
in Minkowski space were causally connected at some time in the past.
characteristically, black hole space-times are such that observers at infinity do not
have access to all of space-time since they cannot see behind the event horizon (cf.
the discussion in section 27.5 below, and the more general discussion in section
31),
cosmological space-times typically also fail to have at least one of these properties,
and this is usually characterised in terms of so-called cosmological particle and
event horizons (cf. the discussion in section 35.7).
27.4 Penrose Diagram for (1+1) Minkowski Space and Rindler Observers
and its Penrose diagram. The only (but crucial) difference to the radial part of (3+1)-
dimensional Minkowski space,
$$ x^\pm = t \pm x , \qquad (27.45) $$
their interpretation is now different: instead of describing in- and outgoing lightrays,
lines of constant $x^-$ describe right-moving lightrays while lines of constant $x^+$ describe
left-moving lightrays. Thus there are corresponding left and right future and past null
infinities $\mathcal{I}^\pm_L$ and $\mathcal{I}^\pm_R$. Common sense and/or the analogue
of the coordinate transformation (27.31) now shows that this space-time can be repre-
sented in a Penrose diagram by doubling the triangle of Figure 23 to a diamond with
left and right asymptotic regions (Figure 24).
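[Figure 24: Penrose diagram of (1+1)-dimensional Minkowski space, with labels $i^+$, $\mathcal{I}^+_L$, $\mathcal{I}^+_R$, $i^0_L$, $i^0_R$, $\mathcal{I}^-_L$, $\mathcal{I}^-_R$.]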
Rindler coordinates adapted to such an observer which cover the right (and/or left)
Rindler wedge of Minkowski space. The right Rindler wedge is the grey shaded area in
Figure 25. Also indicated there is the worldline of a Rindler observer.
This diagram also illustrates that timelike worldlines that are not geodesics
do not necessarily end up at future timelike infinity $i^+$ but can end anywhere on future
null infinity $\mathcal{I}^+$ (provided that there is enough acceleration).
[Figure 25 labels: $i^+$, $\mathcal{I}^+_L$, $\mathcal{I}^+_R$, $i^0_L$, $i^0_R$, $\mathcal{I}^-_L$, $\mathcal{I}^-_R$.]
Figure 25: Penrose Diagram for (1+1)-dimensional Minkowski space, showing the right
Rindler wedge (the grey shaded area) and the worldline of a Rindler observer.
We now come to the Schwarzschild metric. We already know, from our detailed in-
vestigations in section 26, that a very convenient global picture of the Schwarzschild
space-time is provided by the maximal Kruskal-Szekeres extension of the Schwarzschild
metric and the resulting Kruskal diagram (sections 26.6 and 26.7). And indeed a conve-
nient starting point for constructing the Penrose version of the Kruskal diagram is the
double-null form (26.129)
$$ ds^2 = -\frac{32 m^3}{r}\, e^{-r/2m}\, dU\, dV + r(U, V)^2\, d\Omega^2 , \qquad (27.47) $$
of the Schwarzschild metric in Kruskal coordinates.
We will come back to this below. However, it is instructive to go back a step and
first start with the more modest aim of constructing a Penrose representation of the
Schwarzschild patch (region I) of the space-time. As in the case of Minkowski space,
we start by introducing coordinates that are adapted to radial lightrays. These are
the advanced and retarded Eddington-Finkelstein coordinates u and v, related to the
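Schwarzschild time coordinate t and the tortoise coordinate $r^*$ (section 25.6) by (26.74)
$$ u = t - r^* , \qquad v = t + r^* . \qquad (27.48) $$
In terms of these the (t, r)-part of the Schwarzschild metric can be written as (26.124)
$$ -f(r)\, dt^2 + f(r)^{-1} dr^2 = -f(r)\, du\, dv , \qquad f(r) = 1 - 2m/r . $$
$$ \begin{aligned} \mathcal{I}^+ &: \quad (u \text{ finite}, v \to +\infty) \\ \mathcal{I}^- &: \quad (u \to -\infty, v \text{ finite}) . \end{aligned} \qquad (27.50) $$
as r ranges over $2m < r < +\infty$, there are also two other asymptotic regions, much
as in the case of (1+1)-dimensional Minkowski space discussed above. Here, however,
crucially and characteristically, their interpretation is quite different. Namely, as we
have seen in sections 26.4 and 26.6, these are the future and past horizons of the
Schwarzschild black hole at r = 2m, now denoted by $\mathcal{H}^\pm$,
$$ \begin{aligned} \mathcal{H}^+ &: \quad (u \to +\infty, v \text{ finite}) \\ \mathcal{H}^- &: \quad (u \text{ finite}, v \to -\infty) . \end{aligned} \qquad (27.52) $$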
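We can now map the entire (u, v)-plane to a finite (diamond) region by introducing the
coordinates $\tilde U, \tilde V$ (not to be confused with the Kruskal coordinates U, V ) via
$$ u = \tan \tilde U , \qquad v = \tan \tilde V \qquad (27.53) $$
with $|\tilde U| < \pi/2$ and $|\tilde V| < \pi/2$, and we can then depict the Schwarzschild patch as in
Figure 26.
[Figure 26: Penrose diagram of the Schwarzschild patch, with labels $i^+$, $\mathcal{H}^+$, $\mathcal{I}^+$, $r = 2m$, $i^0$, $\mathcal{H}^-$, $\mathcal{I}^-$, $i^-$.]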
This already teaches us what the Schwarzschild patch of the Kruskal diagram (Figure
20) will look like in a Penrose diagram of the maximal Kruskal-Szekeres extension of the
Schwarzschild geometry. In order to extend this description beyond the horizons $\mathcal{H}^\pm$, one
can of course now switch to Kruskal coordinates (U, V ) and introduce new coordinates
$\tilde U, \tilde V$ from the Kruskal coordinates U, V via the (by now familiar) transformation
$$ U = \tan \tilde U , \qquad V = \tan \tilde V , \qquad (27.54) $$
and one can follow this up by the (by now equally familiar, cf. (27.29)) transformation
to new time and radial coordinates,
$$ \tilde T = \tilde U + \tilde V , \qquad \tilde R = \tilde V - \tilde U . \qquad (27.55) $$
One can of course also work out the metric in these coordinates explicitly. This is
straightforward but not really necessary and I will spare you (and me) the details of
this, and will just add some remarks and explanations below. The upshot of this is that
then the Penrose $(\tilde T, \tilde R)$-diagram takes the form displayed in Figure 27.
Remarks:
1. In this diagram I have only labelled the various boundaries and horizons on the
right half of the diagram. Evidently the same labels can be pasted onto the mirror
left half.
2. The asymptotic structure (in the Schwarzschild patch and its mirror) is precisely
that of Minkowski space, in agreement with our intuition that the Schwarzschild
metric is asymptotically flat.
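[Figure 27 labels: $r = 0$ (top), $i^+$, $\mathcal{I}^+$, $\mathcal{H}^+$, $r = 2m$, $i^0$, $\mathcal{H}^-$, $\mathcal{I}^-$, $r = 0$ (bottom), $i^-$.]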
Figure 27: Penrose Diagram of the maximal Kruskal-Szekeres extension of the Schwarz-
schild space-time.
4. The future / past horizons $\mathcal{H}^\pm$ are at $\tilde U = 0$ and $\tilde V = 0$ respectively.
5. The future horizon $\mathcal{H}^+$ is (now manifestly) the boundary of the region from which
signals can escape to future null infinity $\mathcal{I}^+$.
6. Another way of phrasing this is in terms of the causal past of future null infinity
$\mathcal{I}^+$, the union of all the space-time points that lie in the past lightcone of some
point on $\mathcal{I}^+$: from this perspective, the horizon $\mathcal{H}^+$ is the boundary of (the closure
of) the past of future null infinity. We will come back to this in section 31.
This accounts for the fact that in the Penrose diagram the singularities are now
represented by straight horizontal lines.
It is instructive to compare the Penrose diagram for the Schwarzschild metric with that
for the negative mass $m = -|m|$ Schwarzschild metric, which we write as
$$ ds^2 = -\left( 1 + \frac{2|m|}{r} \right) dt^2 + \left( 1 + \frac{2|m|}{r} \right)^{-1} dr^2 + r^2 d\Omega^2 . \qquad (27.58) $$
In this case the singularity at r = 0 is timelike, not hidden behind an event horizon,
and therefore naked. The coordinates are valid all the way to r = 0, and thus the
Penrose diagram (Figure 28) looks deceptively like that of Minkowski space (Figure
23), the crucial difference of course being that the vertical line r = 0 now represents a
singularity, visible all the way to $\mathcal{I}^+$.
[Figure 28 labels: $i^+$, $\mathcal{I}^+$, $r = 0$, $i^0$.]
Figure 28: Penrose Diagram for the negative mass Schwarzschild metric. The singularity
at r = 0 is timelike and not hidden behind an event horizon.
Here is a list of other examples of Penrose diagrams that appear elsewhere in these
notes:
Figure 37 in section 30.9: regime of validity of Kruskal-Szekeres coordinates for
the Reissner-Nordström metric
Figure 40 in section 31.10: event horizon vs apparent horizon for the Vaidya metric
Figure 41 in section 31.11: event horizon vs apparent horizon for the collapsing
null shell
Figure 42 in section 31.12: event horizon vs apparent horizon for the Oppenheimer-
Snyder gravitational collapse
Figure 43 in section 31.13: location of (outer) trapped surfaces for the collapsing
null shell
Figure 47 in section 35.6: spatially flat cosmological solution with constant ex-
pansion velocity.
Figures 59 - 61 in section 41.3: Penrose diagrams for the linear mass Vaidya metric.
28 Black Holes III: Simple Models of Gravitational Collapse
Now you may well wonder if all this talk about white holes and mirror regions in the
previous sections is for real or just science fiction. Clearly, if an object with r0 < 2m
(figuratively speaking) exists and is described by the Schwarzschild solution, then we
will have to accept the conclusions of the previous section.
However, this requires the existence of an eternal black hole (in particular, eternal in the
past) in an asymptotically flat space-time, and this is not very realistic. If this were the
only way to obtain black holes, then one might be justified in simply regarding them as a
mathematical oddity, an unphysical feature permitted by the Einstein equations (much
like general relativity does not rule out closed timelike curves and other peculiarities)
but having nothing to do with the real world.
Non-eternal black holes are believed to exist, however, because they are believed to form
as a consequence of e.g. the gravitational collapse of a star whose nuclear fuel has been
exhausted (and which is so massive that it cannot settle into a less singular final state
like a White Dwarf or Neutron Star).
Before trying to understand how we could model such a gravitational collapse of a star
(without having to worry about astrophysical issues), we briefly consider a very simple
toy model of gravitational collapse and black hole formation.
The arguably simplest (and very instructive, but highly idealised and unrealistic) model
of gravitational collapse to a black hole is provided by a collapsing thin (very thin!)
spherical shell of null matter (radiation) in Minkowski space. Other simple toy-models
of gravitational collapse can be based on considering collapsing shells of non-null mat-
ter.89
Thus we consider the situation where we have an implosion of an infinitely thin spher-
ical shell of radiation in an otherwise empty space-time. This requires somewhat of a
conspiracy, of course, but let us assume that we have been nasty enough to arrange this.
The assumption that the shell is infinitely thin (delta-function localised) is of course a
mathematical idealisation and should be regarded as an approximation to a shell with
finite thickness.
In order to describe such incoming radiation (along ingoing null geodesics), it is natural
to work in ingoing coordinates that are adapted to such null geodesics. Thus Minkowski
space in ingoing coordinates (v = t + r, r) (and the usual coordinates on the sphere)
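takes the form
$$ ds^2 = -dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad (28.1) $$
while the Schwarzschild metric in ingoing Eddington-Finkelstein coordinates takes the form
$$ ds^2 = -(1 - 2m/r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 . \qquad (28.2) $$
The relevance of these two metrics for the problem at hand arises from the fact that
in the two vacuum regions inside and outside the shell one will have (essentially by
Birkhoff's theorem) Minkowski space inside the shell and the Schwarzschild geometry
outside the shell.
89
See e.g. E. Poisson, A Relativist's Toolkit or R. Adler, D. Bjorken, P. Chen, J. Liu, Simple Analytic
Models of Gravitational Collapse, arXiv:gr-qc/0502040 for pedagogical discussions.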
In these adapted ingoing coordinates, we can assume that the shell moves along the
ingoing null trajectory v = v0 , as viewed from both the internal Minkowski geometry
and the external Schwarzschild geometry.
Naively, one can then simply attempt to describe the metric in ingoing Minkowski /
Eddington-Finkelstein coordinates by
$$ ds^2 = -f(v, r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad f(v, r) = 1 - \frac{2 m_f}{r}\, \Theta(v - v_0) , \qquad (28.3) $$
where $m_f$ (or $m_f/G_N$) is the final / total mass and $\Theta(v)$ is the step function. See
section 28.2 for a slightly more detailed justification for this ansatz.
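This is a special case of the ingoing Vaidya metric
$$ ds^2 = -f(v, r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad f(v, r) = 1 - \frac{2 m(v)}{r} , \qquad (28.4) $$
in this particular case with the distributional mass function
$$ m(v) = m_f\, \Theta(v - v_0) . \qquad (28.5) $$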
Vaidya metrics will be briefly mentioned in section 29.2 in an overview of black hole
solutions, see (29.17), and then again in sections 31.8 and 31.9 in the context of the
discussion of black hole horizons, and will be be discussed in some detail in sections 39
- 41.
Calculating the Einstein tensor of this metric (or using e.g. (39.9)), one finds that this
is a solution of the Einstein equations with an energy-momentum tensor whose only
non-vanishing component is
$$ T_{vv} = \frac{1}{4\pi G_N} \frac{m_f}{r^2}\, \delta(v - v_0) . \qquad (28.6) $$
This describes purely ingoing radiation, localised along the null world volume of the
shell, with constant total mass $M = m_f/G_N$, as desired and expected.
It is clear that at some point the radius of the shell (moving along the line v = v0 in the
direction of decreasing r) will reach and then cross its Schwarzschild radius. Once that
has happened, the exterior Schwarzschild geometry (covering the Schwarzschild patch
as well as the region outside the shell but inside the Schwarzschild radius) describes a
black hole with a future event horizon. However, there is no trace here of either the
mirror region III or the white hole region IV.
This is best understood by looking at the Penrose diagram for this solution. Let us
first just draw the null worldline of the shell in the Penrose diagram of Minkowski space
(Figure 23). This is given in Figure 29.
[Figure 29 labels: $i^+$, $\mathcal{I}^+$, shell, $r = 0$, $i^0$, $v = v_0$.]
Figure 29: Penrose Diagram indicating the worldline of a null shell in Minkowski space.
The worldline of the shell is given by the line v = v0 . Only the interior part (below the
line v = v0 ) is displayed correctly in this diagram.
However, this does not yet describe correctly the gravitational field / geometry of this
situation. Inside the shell (i.e. below the line v = v0 ), the geometry is indeed that of
Minkowski space, so this part of the diagram is correct. However, outside of the shell
(above the line v = v0 ) the geometry is the Schwarzschild geometry, and there is a
singularity when the shell collapses to zero size (reaches r = 0).
To see what this amounts to we can also add the worldline of the shell to the Penrose
diagram version of the Kruskal diagram (Figure 27). In this way we arrive at Figure 30.
In this diagram, only the geometry outside the shell (i.e. above the line v = v0 ) is
displayed correctly (as that of the Schwarzschild space-time), while the inside should
be replaced by Minkowski space. The correct diagram is thus obtained by gluing the
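[Figure 30 labels: $r = 0$ (top), $i^+$, $\mathcal{I}^+$, $\mathcal{H}^+$, shell, $r = 2m$, $i^0$, $v = v_0$, $\mathcal{H}^-$, $\mathcal{I}^-$, $r = 0$ (bottom), $i^-$.]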
Figure 30: Penrose Diagram indicating the worldline of a null shell in the maximally
extended Schwarzschild geometry. The worldline of the shell is given by the line v = v0 .
Only the exterior part (above the line v = v0 ) is displayed correctly in this diagram.
two Penrose diagrams in Figures 29 and 30 together along the worldline of the shell. As
a consequence, the mirror region III and the white hole region IV get exorcised from
the diagram (as well as part of the black hole region II). In this way one arrives at the
diagram in Figure 31.
[Figure 31 labels: $r = 0$ (singularity), $i^+$, $\mathcal{I}^+$, $\mathcal{H}^+$, $i^0$, $r = 0$ (origin), $v = v_0$.]
Figure 31: Penrose Diagram for the collapse of a thin null shell to a black hole. The
worldline of the shell is given by the line v = v0 . In the region v < v0 inside the shell the
geometry is that of Minkowski space; the geometry outside the shell is Schwarzschild.
Formation of the black hole occurs when the shell crosses the event horizon H+ .
Remarks:
1. Notice that, as indicated in the figure, the event horizon H+ starts growing /
expanding from r = 0 a long time before the shell arrives or crosses its Schwarz-
schild radius. This apparently acausal / prescient behaviour is a peculiar, but
very characteristic feature of the event horizon. This will be discussed further in
section 31.
2. This example is also easily generalised to the description of the collapse of a shell
onto a pre-existing Schwarzschild black hole with mass $m_i$ by choosing the mass
function to be
$$ m(v) = m_i + (m_f - m_i)\, \Theta(v - v_0) . \qquad (28.7) $$
3. As an aside: in this case, the (apparently somewhat ham-handed and cavalier)
procedure with distributional curvatures leading to (28.6) gives the correct
(Barrabes-Israel$^{90}$) surface energy density
$$ \mu(r) = \frac{1}{4\pi G_N} \frac{m_f}{r^2} \qquad (28.8) $$
of the collapsing null shell (null world volume $\Sigma$), with constant total mass $M =
m_f/G_N$. This is due to the fact that we have worked from the outset in what
are known as adapted coordinates, in this case in particular with the ingoing
coordinate v. I will very briefly come back to this in section 28.2 below. In general,
however, much more care is required to identify correctly the (distributional)
components of the stress tensor of a thin shell.$^{91}$
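The early growth of the event horizon mentioned in the first remark can be illustrated with a short calculation. The following is a minimal sketch in units $G_N = c = 1$, assuming the metric (28.3); it uses the fact (a step added here, obtained by setting $ds^2 = 0$ at fixed angles) that outgoing radial null rays satisfy $dr/dv = f/2$. Inside the shell $f = 1$, so the horizon generator that reaches $r = 2m_f$ on the shell is the ray $r = 2m_f + (v - v_0)/2$. The function names and sample values are hypothetical.

```python
def f(v, r, m_f, v0=0.0):
    """Metric function of (28.3): 1 inside the shell, 1 - 2 m_f / r outside.

    The step function is taken to be 1 at v = v0 (a convention assumed here).
    """
    return 1.0 - 2.0 * m_f / r if v >= v0 else 1.0

def horizon_radius(v, m_f, v0=0.0):
    """Radius of the event horizon at advanced time v (units G_N = c = 1)."""
    if v >= v0:
        return 2.0 * m_f  # Schwarzschild region: horizon sits at r = 2 m_f
    # Minkowski region: outgoing null ray with dr/dv = 1/2 reaching r = 2 m_f at v = v0
    return max(0.0, 2.0 * m_f + 0.5 * (v - v0))

m_f, v0 = 1.0, 0.0  # hypothetical values
for v in (-6.0, -4.0, -2.0, 0.0, 3.0):
    print(f"v = {v:+.1f}: r_H = {horizon_radius(v, m_f, v0):.2f}")
```

In this sketch the horizon forms at r = 0 at $v = v_0 - 4 m_f$, well before the shell arrives, and stops growing once the shell has crossed it.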
Here is a quick and rough (and by no means indispensable) explanation of what I meant
by adapted coordinates in the last remark of the previous section.
To that end, let us describe a bit more carefully the situation we are trying to model.
Thus let $\Sigma$ be the null hypersurface describing the worldvolume of the shell. This divides
the space-time into two parts, the inside (past) $V_-$ and the outside (future) $V_+$. Let
us at first try to model this situation by using the standard Schwarzschild coordinates
outside the shell (and correspondingly standard radial Minkowski coordinates inside the
shell). This will of course not allow us to describe the geometry inside the horizon, so it
is clear that this is not the optimal choice of coordinates, but this is not the issue here
(which, as we will see, also manifests itself outside the horizon).
$$ \begin{aligned} V_- &: \quad ds_-^2 = -dt_-^2 + dr^2 + r^2 d\Omega^2 \\ V_+ &: \quad ds_+^2 = -f(r)\, dt_+^2 + f(r)^{-1} dr^2 + r^2 d\Omega^2 , \end{aligned} \qquad (28.9) $$
90
C. Barrabes, W. Israel, Thin shells in general relativity and cosmology: The lightlike limit, Phys.
Rev. D43 (1991) 1129-1143.
91
A nice and characteristically lucid discussion of the general formalism for null and non-null shells
can be found in E. Poisson, A Relativist's Toolkit: the Mathematics of Black Hole Mechanics.
where $f(r) = 1 - 2m/r$ (I now write m instead of $m_f$). Here I have already identified the
radial and angular coordinates across the shell (this is possible), but have been careful
to introduce two different time-coordinates $t_\pm$. The reason for this is that these two
time-coordinates cannot be identified.
Indeed, in terms of the internal Minkowski coordinates, the trajectory of the shell, i.e.
the ingoing light ray, is described by
$$ \Sigma_- : \quad t_- + r = 0 , \qquad (28.10) $$
while in terms of the external Schwarzschild coordinates it is described by
$$ \Sigma_+ : \quad t_+ + r^* = 0 , \qquad (28.11) $$
with
$$ f(r) \stackrel{???}{=} 1 - \frac{2m}{r}\, \Theta(r - r_{shell}(t)) \qquad (28.13) $$
say, with $r_{shell}(t)$ (supposedly) describing the location of the shell, does not even make
sense. In such a situation one has to appeal to the general Barrabes-Israel formalism
(footnotes 90 and 91) to determine the surface energy-momentum tensor (and the correct
junction conditions).
Let us now look at the same problem in ingoing and outgoing coordinates. In ingoing
coordinates
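$$ v_- = t_- + r , \qquad v_+ = t_+ + r^* \qquad (28.14) $$
the metric on the two sides of the shell takes the form
$$ \begin{aligned} V_- &: \quad ds_-^2 = -dv_-^2 + 2\, dv_-\, dr + r^2 d\Omega^2 \\ V_+ &: \quad ds_+^2 = -f(r)\, dv_+^2 + 2\, dv_+\, dr + r^2 d\Omega^2 , \end{aligned} \qquad (28.15) $$
with the location of the shell described by $v_+ = C_+$ outside the shell, and $v_- = C_-$ inside
the shell, for some constants $C_\pm$. Just by shifting $v_\pm$ appropriately, we can arrange that
$C_\pm = 0$, so that from both sides the shell is described by
$$ \Sigma : \quad v = 0 , \qquad (28.16) $$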
it makes sense to write the metric collectively as in (28.3),
$$ ds^2 = -f(v, r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad f(v, r) = 1 - \frac{2m}{r}\, \Theta(v) , \qquad (28.17) $$
and one can calculate the surface energy-momentum tensor by determining the
bulk Einstein tensor in these adapted coordinates (as done e.g. in the appendix of
the Barrabes-Israel article cited in footnote 90).
As a final variation of this theme, let us look at what happens when one attempts to
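describe the ingoing shell in outgoing coordinates, i.e. in coordinates $(u_\pm, r, \theta, \phi)$ with
$$ u_- = t_- - r , \qquad u_+ = t_+ - r^* . \qquad (28.18) $$
Now the metric on the two sides of the shell takes the form
$$ \begin{aligned} V_- &: \quad ds_-^2 = -du_-^2 - 2\, du_-\, dr + r^2 d\Omega^2 \\ V_+ &: \quad ds_+^2 = -f(r)\, du_+^2 - 2\, du_+\, dr + r^2 d\Omega^2 , \end{aligned} \qquad (28.19) $$
with the location of the shell described by
$$ \Sigma_- : \quad u_- + 2r = C_- \qquad\qquad \Sigma_+ : \quad u_+ + 2r^* = C_+ . \qquad (28.20) $$
Thus we are again in a situation where $u_\pm$ satisfy different equations and can hence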
not be identified across the shell (however, this would be the right choice of adapted
coordinates for describing outgoing (exploding) shells). Therefore also in this case one
would then need to appeal to the general Barrabes-Israel formalism to determine the
surface energy-momentum tensor.
In the case at hand, even if for some reason one is interested in the final result in outgoing
coordinates (e.g. if one wants to superimpose on this black hole geometry the effect of
outgoing Hawking radiation), it is much simpler to first do the calculation in adapted
(ingoing) coordinates and to then transform the result back to outgoing coordinates (and
then this needs to be done separately for the interior and exterior regions). This option
may not always be available, however (e.g. it is an analytically non-trivial problem to
transform a general ingoing Vaidya metric to outgoing coordinates, say).
To see how we could picture the situation of gravitational collapse (without having to
address astrophysical questions and thus without trying to understand why this collapse
occurs in the first place), let us estimate the average density of a star whose radius r0
is equal to its Schwarzschild radius. For a star with mass M we have
$$ r_s = \frac{2 M G_N}{c^2} \qquad \text{and} \qquad M \approx \frac{4\pi r_0^3}{3}\, \rho . \qquad (28.21) $$
Therefore, setting $r_0 = r_s$, we find that
$$ \rho = \frac{3 c^6}{32\pi G_N^3 M^2} \approx 2 \times 10^{16}\ \text{g/cm}^3 \left( \frac{M_{sun}}{M} \right)^2 . \qquad (28.22) $$
For stars of a few solar masses, this density is huge, roughly that of nuclear matter.
In that case, there will be strong non-gravitational forces and hydrodynamic processes,
significantly complicating the description of the situation. The situation is quite simple,
however, when an object of the mass and size of a galaxy ($M \sim 10^{10} M_{sun}$) collapses.
Then the critical density (28.22) is approximately that of air, $\rho \sim 10^{-3}$ g/cm$^3$,
non-gravitational forces can be neglected completely, and the collapse of the object can be
approximated by a free fall. The Schwarzschild radius of such an object is of the order
of light-days ($\sim 10^5$ s).
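The estimates above are easy to reproduce. The following is a minimal sketch in CGS units, assuming rounded values for the constants $G_N$, $c$ and $M_{sun}$ (these numbers and the function names are inserted here for illustration and are not taken from the notes).

```python
import math

# Rounded CGS constants (assumed values for illustration)
G_N = 6.674e-8     # cm^3 g^-1 s^-2
C = 2.998e10       # cm / s
M_SUN = 1.989e33   # g

def schwarzschild_radius(M):
    """r_s = 2 M G_N / c^2, cf. (28.21), in cm."""
    return 2.0 * G_N * M / C ** 2

def critical_density(M):
    """Average density of a body filling its Schwarzschild radius, cf. (28.22), in g/cm^3."""
    return 3.0 * C ** 6 / (32.0 * math.pi * G_N ** 3 * M ** 2)

for n in (1.0, 10.0, 1e10):
    M = n * M_SUN
    r_s = schwarzschild_radius(M)
    print(f"M = {n:g} M_sun: rho = {critical_density(M):.2e} g/cm^3, "
          f"r_s = {r_s:.2e} cm, r_s / c = {r_s / C:.2e} s")
```

With these inputs one solar mass gives a density of roughly $2 \times 10^{16}$ g/cm$^3$, while $10^{10}$ solar masses give a density within an order of magnitude of that of air and a light-crossing time $r_s/c$ of about $10^5$ s.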
Neglecting radiation-effects, the mass M of the star (galaxy) remains constant so that
the exterior of the star, $r > R_0(\tau)$, is described by the corresponding subset of region
I, and subsequently (once $R_0(\tau) < 2m$) also region II, of the Kruskal-Szekeres metric.
Note that regions III and IV no longer exist because the region $r < R_0(\tau)$ is simply not at
all described by the Schwarzschild solution, but should be described by a solution of the
Einstein equations appropriate for the interior of the collapsing star (in particular, this
better be a solution of the non-vacuum Einstein equations, and we will describe such
solutions later on in this section).
Schematic Kruskal and Penrose diagrams for this process are given in Figures 32 and
33. As the Penrose diagram shows, much like in the case of collapsing null shells the
event horizon starts growing before the star has crossed its Schwarzschild radius. This
is not our main concern here, but we will analyse this in some detail in section 31.12.
To model this, we start with the Schwarzschild metric in the region outside the star.
For $t \leq 0$ this is the region $r > r_0$, while for t > 0 this is the region $r > R_0(t)$ where
$R_0(t)$ is the radius of the star at time t, and $R_0(t)$ describes the radial free fall (geodesic
motion) of the points on the surface towards the center discussed in section 25.2. By
continuity of the metric, the space-time metric induced by the exterior metric on the
[Figure 32 labels: axis T; $r = 0$; $r = 2m$, $t = \infty$; surface of the star; interior of the star; $r = $ const.]
Figure 32: Kruskal diagram of a gravitational collapse. The surface of the star is
represented by a timelike geodesic, modelling a star (or galaxy) in free fall under its own
gravitational force. The surface will reach the singularity at r = 0 in finite proper time
whereas an outside observer will never even see the star collapse beyond its Schwarzschild
radius. However, as discussed in the text, even for an outside observer the resulting
object is practically black.
Figure 33: Penrose Diagram for the collapse of a star to a black hole (schematic). The
shaded region indicates the interior of the star.
(the subscript ext on $ds^2_{ext}$ is used to indicate that this is the metric induced on the
surface of the star by the exterior metric, i.e. the metric outside the star). Expressed
in terms of proper time $\tau$, this becomes (writing now $R_0 = R_0(\tau)$, and $\dot t = dt/d\tau$,
$\dot R_0 = dR_0/d\tau$, as usual)
Because $(t(\tau), R_0(\tau))$ parametrise radial timelike geodesics (for each value of the angular
coordinates $(\theta, \phi)$), one has
$$ \begin{aligned} R_0(\eta) &= \tfrac{1}{2} r_0 (1 + \cos\eta) \\ \tau(\eta) &= \left( \frac{r_0^3}{8m} \right)^{1/2} (\eta + \sin\eta) . \end{aligned} \qquad (28.27) $$
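The cycloid parametrisation (28.27) can be evaluated directly to obtain collapse times. The following is a minimal sketch in geometric units ($G_N = c = 1$, so that m and $\tau$ are lengths), assuming the parameter names $\eta$ and $\tau$ as in (28.27); the function names, the sample value $r_0 = 10m$ and the rounded solar values are chosen here for illustration only.

```python
import math

def surface_radius(eta, r0):
    """R_0(eta) = r0 (1 + cos eta) / 2, cf. (28.27)."""
    return 0.5 * r0 * (1.0 + math.cos(eta))

def proper_time(eta, r0, m):
    """tau(eta) = sqrt(r0^3 / 8m) (eta + sin eta), cf. (28.27)."""
    return math.sqrt(r0 ** 3 / (8.0 * m)) * (eta + math.sin(eta))

def eta_at_radius(r, r0):
    """Invert R_0(eta) = r on the branch 0 <= eta <= pi."""
    return math.acos(2.0 * r / r0 - 1.0)

# Hypothetical example: surface released from rest at r0 = 10 m, with m = 1
m, r0 = 1.0, 10.0
tau_horizon = proper_time(eta_at_radius(2.0 * m, r0), r0, m)  # reaches r = 2m
tau_total = proper_time(math.pi, r0, m)                       # reaches r = 0
print(f"tau(r = 2m) = {tau_horizon:.2f} m, tau(r = 0) = {tau_total:.2f} m")

# Rough solar values (rounded, assumed for illustration): lengths in cm, speed in cm/s
M_SUN_CM = 1.477e5    # G_N M_sun / c^2
R_SUN_CM = 6.957e10   # solar radius
C_CM_S = 2.998e10
tau_sun_s = proper_time(math.pi, R_SUN_CM, M_SUN_CM) / C_CM_S
print(f"free-fall collapse time for solar values: {tau_sun_s:.0f} s")
```

The surface crosses $r = 2m$ and reaches $r = 0$ after finite proper times, and for solar values the total comes out at roughly half an hour.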
Remarks:
1. This simple form of the metric is due to the fact that the radially falling particles
remain at fixed values of the angular coordinates (so these are again comoving
coordinates), and that $\tau$ is the corresponding proper time, so that one has $ds^2 =
-d\tau^2$.
Indeed, we can think of the induced metric $ds^2_{ext}$ as the restriction of the Novikov
metric (26.58)
$$ \begin{aligned} ds^2_{ext} &= -d\tau^2 + R_0(\tau)^2 d\Omega^2 \equiv -d\tau^2 + r(\tau, r_0)^2 d\Omega^2 \\ &= \left( -d\tau^2 + f(r_i)^{-1} r'(\tau, r_i)^2 dr_i^2 + r(\tau, r_i)^2 d\Omega^2 \right) \Big|_{r_i = r_0} . \end{aligned} \qquad (28.29) $$
2. From (28.27) one sees that $R_0$ takes its initial (maximal) value $R_0 = r_0$ at $\eta = 0$
or $\tau = 0$, will inevitably cross the Schwarzschild radius of the star at some finite
value of $\tau$, and will reach r = 0 at $\eta = \pi$ after the finite proper time
$$ \tau_{r_0 \to 0} = \left( \frac{r_0^3}{8m} \right)^{1/2} (\eta + \sin\eta) \Big|_{\eta = \pi} = \pi \left( \frac{r_0^3}{8m} \right)^{1/2} . \qquad (28.30) $$
For an object the size of the sun (for which our free-fall approximation is, however,
not really adequate) this would be of the order of one hour, and correspondingly
somewhat larger for larger, more massive and less dense, objects.
3. As an aside, note also that this implies that when freely falling radially into a
black hole, the proper time it takes to reach the singularity at r = 0 once one has
crossed the Schwarzschild radius is
$$\tau_{r_s \to 0} = \pi \left(\frac{r_s^3}{8m}\right)^{1/2} = \pi m , \tag{28.31}$$
or, restoring $c$,
$$\tau_{r_s \to 0} = \pi G_N M/c^3 . \tag{28.32}$$
4. For an observer remaining outside the collapsing star, say at the constant value
$r = r_\infty$, in principle the situation (not unexpectedly by now) presents itself in
a rather different way. Up to a constant factor $(1 - 2m/r_\infty)^{1/2}$, his proper time
equals the coordinate time $t$. As the surface of the collapsing galaxy crosses the
horizon at $t = \infty$, strictly speaking the outside observer will never see the black
hole form.
However, we had also seen that this period is accompanied by an infinite and
exponentially growing gravitational redshift (25.43), $z \sim \exp(t/4m)$ for radially
emitted photons. Therefore the luminosity $L$ of the star decreases exponentially,
as a consequence of this gravitational redshift and the fact that photons emitted
at equal time intervals from the surface of the star reach the observer at greater
and greater time intervals. It can be shown that
$$L \sim e^{-t/3\sqrt{3}m} , \tag{28.33}$$
so that the star becomes very dark very quickly, the characteristic time being of
the order of
$$3\sqrt{3}\, m \approx 2.5 \times 10^{-5}\, {\rm s}\, \left(\frac{M}{M_{\rm sun}}\right) . \tag{28.34}$$
Thus, even though for an outside observer the collapsing star never disappears
completely, for all practical intents and purposes the star is black and the name
black hole is justified.
5. Since only regions I and II of the Kruskal diagram are relevant for gravitational
collapse, and for black holes arising from gravitational collapse, for most practical
purposes Kruskal-Szekeres coordinates are not required and it is sufficient
to consider coordinates that cover these two regions, such as Painlevé-Gullstrand
(section 26.2) or Eddington-Finkelstein coordinates (section 26.4).
6. Note that, even if the free fall (geodesic) approximation is no longer justified at
some point, once the surface of the star has crossed the Schwarzschild horizon,
nothing, no amount of pressure, can stop the catastrophic collapse to r = 0 be-
cause, whatever happens, points on the surface of the star will have to move
within their forward lightcone and will therefore inevitably end up at r = 0 in
finite proper time.
In order to substantiate this claim, note that since timelike geodesics maximise
proper time, any non-geodesic radial attempt to avoid hitting r = 0 will only get
you there even quicker. Also, trying to somehow pick up some angular momentum
will not help, because for r < 2m the attractive general relativistic correction term
in the effective potential (24.18) dominates over the repulsive angular momentum
barrier term,
$$r < 2m \quad\Rightarrow\quad \frac{L^2}{2r^2} < \frac{mL^2}{r^3} . \tag{28.35}$$
7. In interpreting the collapse to $r = 0$, it should be kept in mind that the Schwarzschild
metric was never meant to be valid at $r = 0$ anyway (as it is supposed to
describe the exterior of a gravitating body). Nevertheless, just being close enough
to $r = 0$, without actually reaching that point, is more than sufficient to crush any
kind of matter. Indeed, (26.145) and the geodesic deviation equation (section 8.3)
show that the force needed to keep neighbouring particles apart is proportional to
$r^{-3}$. Thus the tidal forces within arbitrary objects (be they solids or elementary
particles) eventually become infinitely big so that these objects will be crushed
or torn apart completely. In that sense, the physics becomes hopelessly singular
even before one reaches r = 0 and there seems to be nothing to prevent a collapse
of such an object to r = 0 and infinite density.
8. In sections 28.4 - 28.8 below we will construct a matching interior solution to the
Einstein equations describing a collapsing star, the Oppenheimer-Snyder collapse
solution. It shows that this singularity is akin to a cosmological (Big Bang, or
rather Big Crunch in the present context) singularity, and confirms that in the
interior there is a genuine singularity in the form of a diverging matter density.
Certainly classical general relativity (and even current-day quantum field theory)
are inadequate to describe this situation (and if or how a theory of quantum
gravity can deal with these matters remains to be seen).
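As a rough numerical illustration of the time scales quoted in remarks 2, 3 and 4, the following Python sketch evaluates (28.30), (28.32) and (28.34) in SI units. The numerical values of the constants and all function names are assumptions made for this illustration; they are not taken from the text.

```python
import math

# Physical constants in SI units (commonly quoted values, assumed for this sketch).
G = 6.674e-11      # Newton's constant [m^3 kg^-1 s^-2]
C = 2.998e8        # speed of light [m/s]
M_SUN = 1.989e30   # solar mass [kg]
R_SUN = 6.957e8    # solar radius [m]


def geometric_mass(mass_kg):
    """Return m = G_N M / c^2 in metres."""
    return G * mass_kg / C**2


def collapse_time(r0, mass_kg):
    """Proper time (28.30) for free fall from rest at radius r0 [m] to r = 0, in seconds."""
    m = geometric_mass(mass_kg)
    return math.pi * math.sqrt(r0**3 / (8.0 * m)) / C


def horizon_to_singularity_time(mass_kg):
    """Proper time (28.32), pi G_N M / c^3, in seconds."""
    return math.pi * G * mass_kg / C**3


def darkening_time(mass_kg):
    """Characteristic time 3 sqrt(3) m of (28.34), in seconds."""
    return 3.0 * math.sqrt(3.0) * G * mass_kg / C**3


if __name__ == "__main__":
    print("free-fall collapse time of a solar-size object [s]:", collapse_time(R_SUN, M_SUN))
    print("horizon to singularity, one solar mass [s]:", horizon_to_singularity_time(M_SUN))
    print("luminosity e-folding time, one solar mass [s]:", darkening_time(M_SUN))
```

With these inputs the first number comes out at roughly half an hour, consistent with the "order of one hour" estimate of remark 2, and setting r0 equal to the Schwarzschild radius in `collapse_time` reproduces (28.32).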
It is fair to wonder at this point if the above conclusions regarding the collapse to r = 0
are only a consequence of the fact that we assumed exact spherical symmetry. Would the
singularity be avoided under more general conditions? The answer to this is, somewhat
surprisingly and shockingly, a clear no.
It has e.g. been shown that the gravitational field of a static vacuum black hole, even
without further symmetry assumptions, is necessarily given by the spherically symmetric
Schwarzschild metric and is thus characterised by the single parameter M (Israel, 1967).
This was the first of a series of remarkable black hole uniqueness (or no hair) theorems
which I will briefly come back to in section 29.1 below. Curiously, initially the result by
Israel was interpreted by many as confirming that such singularities could only occur
in exactly spherically symmetric situations.92 It turned out, however, that what this
theorem actually implies is that higher multipole moments will have to be radiated away
during gravitational collapse.
Moreover, there are very general singularity theorems, due to Penrose, Hawking and
others, which all state in one way or another that if Einstein's equations hold, the
energy-momentum tensor satisfies some kind of positivity condition, and there is a
regular event horizon, then some kind of singularity will appear (typically in some form
of geodesic incompleteness, i.e. in the existence of geodesics that cannot be extended
to arbitrary values of their affine parameter). These theorems do not rely on any
symmetry assumptions.93
In section 23.7 we had described the general set-up (as well as a special solution) for
the interior solution of a static spherically symmetric star, and in section 28.3 above
we have described the exterior (Schwarzschild) geometry of a collapsing star. We will
now attempt to find an idealised description of the interior of a star during the time-
dependent phase of gravitational collapse. This interior of a star will be modelled on a
(bounded subset of a) gravitationally collapsing cosmological model, in particular that
of a collapsing dust-filled universe. The exact solutions to the Einstein
(Friedmann-Lemaître) equations for this case are derived in section 36.3.
We will also make sure that the exterior and interior descriptions of this gravitational
collapse match at the surface of the star.
If we assume that matter inside the spherically symmetric star can be modelled by
a perfect fluid with spatially uniform energy density $\rho = \rho(t)$ and pressure $p = p(t)$,
then the spatial geometry is locally both isotropic and homogeneous. Thus (see section
13.1) the spatial geometry of the star is that of a (bounded subspace of a) maximally
symmetric space, and solutions are then governed by the Friedmann equations (section
34.5), i.e. by the Einstein equations specialised to this situation. Some familiarity with
sections 33.4, 34.5 and 36.3 will therefore be necessary (and assumed) in the following.
92
I owe this remark to A. Ashtekar, The Last 50 Years of General Relativity and Gravitation: From
GR3 to GR20 Warsaw Conference, arXiv:1312.6425 [gr-qc].
93
See e.g. S. Hawking and G. Ellis, The large scale structure of space-time and R. Wald, Chapter
9 of General Relativity for textbook accounts, chapter 1 of S. Hawking, The Nature of Space and
Time, arXiv:hep-th/9409195 for an introduction, and J. Senovilla, Singularity Theorems in General
Relativity: Achievements and Open Questions, arXiv:physics/0605007v1 [physics.hist-ph] for an
overview.
Let us first address the geometry of this problem. The spherically symmetric star is a
3-ball $B^3$, i.e. a 3-dimensional space with boundary a 2-sphere $S^2$. Its 2-dimensional
counterpart would usually be called a disc (or 2-disc) $D^2$, a surface with boundary a
circle $S^1$. A priori, one could model the geometry of this disc e.g. as the subset of the
Euclidean plane (with its induced maximally symmetric flat metric),
$$ds^2(D^2) = dr^2 + r^2 d\phi^2 \qquad (r \leq r_0) \tag{28.36}$$
or as the cap of a sphere (with its induced maximally symmetric positive curvature
metric),
$$ds^2(D^2) = d\theta^2 + \sin^2\theta\, d\phi^2 \qquad (\theta \leq \theta_0) \tag{28.37}$$
or even as its negative curvature counterpart, say the Poincaré disc model of the
hyperbolic plane, given in polar coordinates in (10.67), i.e.
$$ds^2(D^2) = 4\, \frac{dr^2 + r^2 d\phi^2}{(1 - r^2)^2} \qquad (0 \leq r \leq r_0) . \tag{28.38}$$
Likewise, one can model the 3-ball geometry of a spherically symmetric star in terms of
bounded subspaces of any of the $k = 0, \pm 1$ 3-geometries that we have been considering,
e.g. the spatially flat 3-ball or 3-disc for $k = 0$ or the cap of a 3-sphere for $k = +1$,
$$\begin{aligned} k = 0 : \quad & ds^2(B^3) = dr^2 + r^2 d\Omega_2^2 \qquad (r \leq r_0) \\ k = +1 : \quad & ds^2(B^3) = d\psi^2 + \sin^2\psi\, d\Omega_2^2 \qquad (\psi \leq \psi_0) \end{aligned} \tag{28.39}$$
(we had already encountered the $k = +1$ cap/disc/3-ball as the spatial geometry
underlying a static spherically symmetric star in section 23.7).
As far as the matter content is concerned, a spatially constant non-zero pressure would
in particular lead to a non-zero pressure at the surface of the star. This would need
to be compensated by a non-zero surface tension, a further contribution to the energy-
momentum tensor, $\delta$-function localised on the surface of the star. In order not to have
to deal with this situation, we will only consider the simplest possibility, namely that
of pressureless dust, p = 0 (Oppenheimer and Snyder, 1939).
In this case, the interior solution is provided by the cosmological solutions of the matter
dominated era derived in section 36.3, with the radial coordinate of the star restricted
to run over a finite range, as in (28.39). The exterior solution would then, as in our
discussion of section 28.3, be given by the Schwarzschild metric, and one thing we need
to do is make sure that the exterior and interior descriptions of the surface of the star
agree.
In order to understand the role of the spatial curvature k and how to glue the interior and
exterior solutions together, recall that we had already seen in section 34.3, in (34.70),
that pressureless dust necessarily moves along geodesics of the space-time geometry. In
the present case these are the geodesics of comoving observers (dust particles) and the
cosmological time t is their proper time. In particular, if we think of a(t) as (proportional
to) the time-dependent radius of the collapsing star, then a(t) describes the geodesic
trajectories of particles at the surface of the star.
However, these surface particles should follow geodesics as if the total mass of the star
were concentrated at the center of the star, i.e. they should also move along geodesics
of the outside Schwarzschild geometry with that mass. Thus we are led to the, a priori
perhaps somewhat surprising, statement that the Friedmann equations for dust must
agree with the geodesic equation for radially freely falling particles in the Schwarzschild
geometry. This is indeed the case:
On the one hand, the evolution of the cosmic scale factor is governed by the
Friedmann equation (36.23),
$$\dot a^2 + k = \frac{C_m}{a} . \tag{28.40}$$
On the other hand, according to the results of section 25.2, radial free fall is
governed by the equation (25.19),
$$\dot r^2 + \frac{2m}{r_i} = \frac{2m}{r} , \tag{28.41}$$
where $r_i$ is the radius where the particle is initially at rest, $\dot r(r = r_i) = 0$.
We see that these really do have the same form (and we will match them more precisely
below). We also see that the choice of pressureless matter, i.e. the equation of state
parameter w = 0, for the interior of the star is essential for this (other values of w
leading to other powers of a on the right-hand side of (28.40)).
This similarity of the equations in the case $w = 0$ is also reflected in the explicit solutions
of the geodesic and Friedmann equations, as for example in the solution (28.27) of the
radial geodesic equation in the Schwarzschild geometry,
$$R_0(\eta) = \tfrac{1}{2} r_0 (1 + \cos\eta) , \qquad \tau(\eta) = \left(\frac{r_0^3}{8m}\right)^{1/2} (\eta + \sin\eta) , \tag{28.42}$$
and the recollapsing solution (36.34) of the Friedmann equations for a spatially closed
dust-filled universe,
$$a(\eta) = \frac{a_{\rm max}}{2} (1 - \cos\eta) , \qquad t(\eta) = \frac{a_{\rm max}}{2} (\eta - \sin\eta) . \tag{28.43}$$
From this we see that a finite $r_i$ or $r_0$ corresponds to a $k = +1$ interior solution. The
spatially flat $k = 0$ 3-disc geometry, on the other hand, corresponds to $r_i = \infty$, i.e. the
case where the surface of the star behaves as if it had been released from rest at infinity,
as can be seen by comparing the explicit solutions for radial geodesics (25.18) and the
cosmic scale factor (36.25) in this case,
Thus a matching with the exterior geometry of the collapsing star discussed in section
28.3 (where we assumed free fall from rest from a finite radius r0 ) requires k = +1.
However, the k = 0 solution is also instructive in its own right, and we will analyse both
possibilities below.
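The claim that the two cycloids (28.42) and (28.43) solve equations of the same form can be checked numerically. The sketch below evaluates the residuals of (28.41) and of (28.40) with $k = +1$ along the parametric solutions, using the chain rule for the derivative with respect to proper (or cosmological) time. The function names are hypothetical, and the identification $C_m = a_{\rm max}$ is an assumption that follows from setting $\dot a = 0$ at maximal expansion in (28.40).

```python
import math


def geodesic_residual(eta, r0, m):
    """Residual Rdot^2 + 2m/r0 - 2m/R of (28.41), evaluated on the cycloid (28.42)."""
    radius = 0.5 * r0 * (1.0 + math.cos(eta))
    dR_deta = -0.5 * r0 * math.sin(eta)
    dtau_deta = math.sqrt(r0**3 / (8.0 * m)) * (1.0 + math.cos(eta))
    r_dot = dR_deta / dtau_deta          # chain rule: dR/dtau
    return r_dot**2 + 2.0 * m / r0 - 2.0 * m / radius


def friedmann_residual(eta, a_max):
    """Residual adot^2 + 1 - C_m/a of (28.40) with k = +1, evaluated on (28.43).

    Assumes C_m = a_max (adot = 0 at the maximal value of a).
    """
    a = 0.5 * a_max * (1.0 - math.cos(eta))
    da_deta = 0.5 * a_max * math.sin(eta)
    dt_deta = 0.5 * a_max * (1.0 - math.cos(eta))
    a_dot = da_deta / dt_deta            # chain rule: da/dt
    return a_dot**2 + 1.0 - a_max / a


if __name__ == "__main__":
    for eta in (0.3, 1.0, 2.0, 2.8):
        print(eta, geodesic_residual(eta, r0=10.0, m=1.0), friedmann_residual(eta, a_max=4.0))
```

Both residuals vanish up to rounding errors away from the endpoints of the parameter range, where the parametrisation degenerates ($\eta = \pi$ for the geodesic, $\eta = 0$ for the scale factor).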
In order to match the exterior and interior geometries, we should start by matching
the coordinates used for the two solutions. In both cases, due to spherical symmetry
(and due to having chosen coordinates that make this symmetry manifest), there is a
transverse 2-sphere with line-element $d\Omega_2^2 = d\theta^2 + \sin^2\theta\, d\phi^2$, and we will simply identify
the coordinates $(\theta, \phi)$ of the two solutions.
This leaves us with the temporal and radial directions. The time coordinate of the
cosmological (interior) metrics is the proper time of comoving observers (in particular
those on the surface of the star), and this is a natural choice which we will maintain. It
follows that also for the exterior Schwarzschild geometry we should choose coordinates
such that the time-coordinate is the proper time of these comoving = freely falling
observers, and we have already constructed various such coordinate systems in section
26. In particular, we could use
In order to illustrate the procedure, we will pursue both options and discuss the case
k = 0 in terms of PG-like coordinates and the case k = +1 in terms of comoving
coordinates.
This is adapted to radially infalling observers with $dr = -\sqrt{2m/r}\, dT$ (so that $dT = d\tau$
is proper time). These are the radial geodesics with $E = 1 \Leftrightarrow r_i = \infty$. We assume that
from the exterior point of view particles on the surface of the star follow such geodesics,
so that the surface of the star is described by $r = R_0(\tau)$, with
$$\dot R_0(\tau) = -\sqrt{2m}\, (R_0(\tau))^{-1/2} \quad\Rightarrow\quad R_0(\tau) = (9m/2)^{1/3} (\tau_0 - \tau)^{2/3} . \tag{28.46}$$
$$ds^2 = -d\tau^2 + (d\tilde r - \tilde r H(\tau) d\tau)^2 + \tilde r^2 d\Omega^2 . \tag{28.47}$$
Here we have already identified the cosmological time $t = \tau$ as the proper time of comoving
observers, $H(\tau) = \dot a(\tau)/a(\tau)$ is the Hubble parameter, and the radial coordinate $\tilde r$
is related to the usual comoving radial coordinate of the Robertson-Walker metric (now
denoted $r_c$, to avoid confusion with the radial coordinate of the Schwarzschild or PG
metric)
$$ds^2 = -d\tau^2 + a(\tau)^2 (dr_c^2 + r_c^2 d\Omega^2) \tag{28.48}$$
by
$$\tilde r = a(\tau) r_c . \tag{28.49}$$
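This form of the metric is adapted to comoving observers (fixed $r_c$), which obey the
Hubble relation $d\tilde r/d\tau = \tilde r H(\tau)$. We assume that the surface of the star has fixed
(comoving) radial coordinate $r_{c,0}$, and is thus described by $\tilde r = \tilde R_0(\tau)$,
$$\tilde R_0(\tau) = a(\tau) r_{c,0} \quad\Rightarrow\quad \dot{\tilde R}_0 = \dot a\, r_{c,0} = H \tilde R_0 . \tag{28.50}$$
Its time-dependence (i.e. the collapse of the star) is governed by the negative square-root
of the $k = 0$ Friedmann equation for pressureless matter,
$$\begin{aligned} \dot a(\tau) = -\sqrt{C_m}\, a(\tau)^{-1/2} \quad &\Rightarrow\quad \dot{\tilde R}_0(\tau) = -\sqrt{C_m}\, (r_{c,0})^{3/2}\, \tilde R_0(\tau)^{-1/2} \\ &\Rightarrow\quad \tilde R_0(\tau) = r_{c,0}\, (9C_m/4)^{1/3} (\tau_0 - \tau)^{2/3} , \end{aligned} \tag{28.51}$$
with
$$H(\tau) = -\frac{2}{3(\tau_0 - \tau)} . \tag{28.52}$$
Thus more explicitly the interior metric can now be written as
where
$$\tilde m(\tau, \tilde r) = \frac{2\tilde r^3}{9(\tau_0 - \tau)^2} . \tag{28.54}$$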
Comparison of the two metrics, the exterior Schwarzschild metric in Painlevé-Gullstrand
form (28.45) and the interior metric in Painlevé-Gullstrand-like form (28.53), makes it
manifest that we should identify not only PG time $T$ with the cosmological time $t = \tau$
(as already anticipated above), but also the PG-Schwarzschild radial coordinate $r$ with
the cosmological radial coordinate $\tilde r = a(\tau) r_c$.
A seamless matching of the two metrics then requires the identification of the location
of the surface of the star from the two sides, through
$$R_0(\tau) = \tilde R_0(\tau) , \tag{28.55}$$
$$\tilde m(\tau, \tilde R_0(\tau)) = m \quad\Leftrightarrow\quad \tilde R_0(\tau) = (9m/2)^{1/3} (\tau_0 - \tau)^{2/3} = R_0(\tau) . \tag{28.56}$$
Note that this necessarily leads to the requirement (that we had already imposed) that
the interior of the star is described by pressureless dust, in order to reproduce the
characteristic $\tau^{2/3}$-behaviour of the geodesic.
Either from the explicit expression for the two solutions, or from comparing the geodesic
equation in (28.46) with the Friedmann equation in (28.51), one finds that
$R_0(\tau) = \tilde R_0(\tau)$ is equivalent to
$$R_0(\tau) = \tilde R_0(\tau) \quad\Leftrightarrow\quad (9m/2)^{1/3} = r_{c,0}\, (9C_m/4)^{1/3} \quad\Leftrightarrow\quad 2m = C_m r_{c,0}^3 . \tag{28.57}$$
This resulting condition relating the parameters of the exterior and interior solutions
can be demystified by recalling the definition (35.27) of $C_m$ as the constant
$$C_m = \frac{8\pi G_N}{3}\, \rho(\tau) a(\tau)^3 . \tag{28.58}$$
Then (28.57) becomes
$$2m = C_m r_{c,0}^3 \quad\Leftrightarrow\quad M \equiv \frac{m}{G_N} = \frac{4\pi}{3} (a(\tau) r_{c,0})^3 \rho(\tau) = \frac{4\pi}{3} \tilde R_0(\tau)^3 \rho(\tau) , \tag{28.59}$$
which is simply the statement that at all times the Schwarzschild gravitational mass-
energy $M$ of the star felt by the freely-falling particles on the star's surface is precisely
the total mass-energy (density times volume) of the star. This encapsulates the essence
of the Oppenheimer-Snyder construction.
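The matching condition (28.57) and its interpretation (28.59) can be illustrated numerically. The following sketch, in units with $G_N = c = 1$, fixes $m$ and a comoving surface radius, determines $C_m$ from $2m = C_m r_{c,0}^3$, and then compares the exterior radius (28.46) with the interior radius (28.51), as well as the mass parameter with density times volume using (28.58). The function names and the sample values are assumptions chosen for illustration only.

```python
import math


def exterior_radius(tau, m, tau0=0.0):
    """Surface radius R_0(tau) of (28.46), seen from the exterior PG geometry."""
    return (9.0 * m / 2.0) ** (1.0 / 3.0) * (tau0 - tau) ** (2.0 / 3.0)


def scale_factor(tau, c_m, tau0=0.0):
    """Collapsing k = 0 dust solution a(tau) underlying (28.51)."""
    return (9.0 * c_m / 4.0) ** (1.0 / 3.0) * (tau0 - tau) ** (2.0 / 3.0)


def interior_radius(tau, c_m, r_c0, tau0=0.0):
    """Surface radius a(tau) r_c0 of (28.51), seen from the interior."""
    return r_c0 * scale_factor(tau, c_m, tau0)


def density_times_volume(tau, c_m, r_c0, tau0=0.0):
    """(4 pi / 3) R^3 rho in units G_N = 1, with rho taken from (28.58)."""
    a = scale_factor(tau, c_m, tau0)
    rho = 3.0 * c_m / (8.0 * math.pi * a**3)
    return 4.0 * math.pi / 3.0 * (r_c0 * a) ** 3 * rho


if __name__ == "__main__":
    m, r_c0 = 1.0, 0.5
    c_m = 2.0 * m / r_c0**3              # matching condition (28.57)
    for tau in (-5.0, -1.0, -0.1):
        print(tau, exterior_radius(tau, m), interior_radius(tau, c_m, r_c0),
              density_times_volume(tau, c_m, r_c0))
```

For every $\tau < \tau_0$ the two radii agree and the last column stays equal to $m$, which is the content of (28.59).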
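With this identification, the induced metric on the surface of the star satisfies
$$\begin{aligned} ds^2_{\rm ext} &\equiv \left[ -dT^2 + (dr + \sqrt{2m/r}\, dT)^2 + r^2 d\Omega^2 \right]_{T = \tau,\, r = R_0(\tau)} \\ &= -d\tau^2 + R_0(\tau)^2 d\Omega^2 \\ &= -d\tau^2 + \tilde R_0(\tau)^2 d\Omega^2 \\ &= \left[ -d\tau^2 + (d\tilde r - \tilde r H(\tau) d\tau)^2 + \tilde r^2 d\Omega^2 \right]_{\tilde r = \tilde R_0(\tau)} \equiv ds^2_{\rm int} . \end{aligned} \tag{28.60}$$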
28.6 Synopsis of the Oppenheimer-Snyder Construction
Dropping all tildes and other now (in retrospect) irrelevant decorations, and choosing
without loss of generality the instant of total collapse of the star to be at $\tau_0 = 0$, the
set-up and results can be summarised as follows:
Here C is given in terms of the total mass m (the parameter characterising the
exterior Schwarzschild geometry) by
$$C = (9m/2)^{1/3} , \tag{28.64}$$
3. Jointly the exterior and interior metrics can be written compactly as
$$ds^2 = -d\tau^2 + \left( dr + \sqrt{\frac{2m(\tau, r)}{r}}\, d\tau \right)^2 + r^2 d\Omega^2 \tag{28.65}$$
with
$$m(\tau, r) = \begin{cases} m & r > R(\tau) = C(-\tau)^{2/3} \\ 2r^3/9(-\tau)^2 & r < R(\tau) = C(-\tau)^{2/3} \end{cases} \tag{28.66}$$
This solution describes a collapsing dust star for $\tau < \tau_0 = 0$, collapsing to zero
radius at time $\tau = \tau_0 = 0$.
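The piecewise mass function (28.66) is straightforward to encode, which makes it easy to check that it is continuous at the surface of the star and that the surface $R(\tau) = C(-\tau)^{2/3}$ moves with the infall velocity $\sqrt{2m/R}$ read off from (28.65). The sketch below uses units with $G_N = c = 1$; the function names are hypothetical.

```python
import math


def surface_radius(tau, m):
    """R(tau) = C (-tau)^(2/3) with C = (9m/2)^(1/3), valid for tau < 0, cf. (28.64)."""
    return (9.0 * m / 2.0) ** (1.0 / 3.0) * (-tau) ** (2.0 / 3.0)


def mass_function(tau, r, m):
    """m(tau, r) of (28.66): constant outside the star, 2 r^3 / (9 tau^2) inside."""
    if r > surface_radius(tau, m):
        return m
    return 2.0 * r**3 / (9.0 * tau**2)


def infall_velocity(tau, r, m):
    """sqrt(2 m(tau, r) / r), the coefficient of d tau in (28.65)."""
    return math.sqrt(2.0 * mass_function(tau, r, m) / r)


if __name__ == "__main__":
    m, tau = 1.0, -2.0
    R = surface_radius(tau, m)
    print("surface radius:", R)
    print("m just inside / outside:", mass_function(tau, 0.999999 * R, m), mass_function(tau, 1.000001 * R, m))
    h = 1e-6
    dR_dtau = (surface_radius(tau + h, m) - surface_radius(tau - h, m)) / (2.0 * h)
    print("dR/dtau and -sqrt(2m/R):", dR_dtau, -math.sqrt(2.0 * m / R))
```

Inside the star the velocity reduces to $2r/3(-\tau)$, i.e. to $-rH(\tau)$ with the Hubble parameter (28.52) at $\tau_0 = 0$.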
While the metric is now certainly continuous across the surface $\Sigma$ of the star, in order
to complete the story one should also check that the first derivatives of the metric
match on the two sides as well. Indeed, by the Einstein equations in order for the
energy-momentum tensor to only exhibit a finite jump as one crosses $\Sigma$, rather than
a $\delta$-function localised surface energy-momentum tensor on $\Sigma$, also the 1st derivative of
the metric induced on $\Sigma$ should be continuous. This is automatic for derivatives tangent
to the surface, and thus this continuity requirement boils down to the requirement that
the normal derivatives of the metric (i.e. derivatives in the direction orthogonal to $\Sigma$)
agree on $\Sigma$, and we will come back to this issue below.
Second derivatives, however, will not and cannot be continuous across the surface, be-
cause the energy momentum tensor has spatially constant density inside the star and
is identically zero outside the star, so that by the Einstein equations also the Einstein
tensor necessarily has a discontinuity across the surface of the star.
In order to address the issue of continuity of the normal derivatives of the induced
metric, first of all we need to determine the (normalised) normal vectors on the two
sides. To that end we note first that the tangent directions to the surface of the star are
the two spacelike directions tangent to the 2-sphere $S^2$, as well as the timelike direction
$u^\alpha$ spanned by the geodesics describing the free-fall motion of the surface. Thus the
normal vector $N^\alpha$ is a spacelike radial vector orthogonal to $u^\alpha$, determined (up to a
choice of sign) by the conditions
$$u^\alpha N_\alpha = 0 , \qquad N^\alpha N_\alpha = +1 . \tag{28.71}$$
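$$u_\alpha = -\partial_\alpha \tau = (-1, 0, 0, 0) \quad\Leftrightarrow\quad u^\alpha = (1, -v, 0, 0) = (\dot\tau, \dot r, \dot\theta, \dot\phi) , \tag{28.72}$$
$$N^\alpha = (0, 1, 0, 0) \quad\Leftrightarrow\quad N = N^\alpha \partial_\alpha = \partial_r , \tag{28.73}$$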
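Returning now to the issue at hand, namely the continuity of the normal derivative of
the metric, recall first of all that we had already checked the continuity of the metric
on $\Sigma$, a condition that we can express in terms of the induced metric
$$h_{\alpha\beta} = g_{\alpha\beta} - N_\alpha N_\beta \tag{28.74}$$
$$h^+_{\alpha\beta} = h^-_{\alpha\beta} \quad\Leftrightarrow\quad h^+_{ab} = h^-_{ab} , \tag{28.75}$$
where $h^\pm_{ab}$ denotes the metric on $\Sigma$ induced from the exterior / interior geometry
respectively. Now recall that in section 17.2 we introduced the extrinsic curvature $K_{\alpha\beta}$
precisely as the tangential projection of the normal derivative of the induced metric
(17.24),
$$K_{\alpha\beta} = \tfrac{1}{2} h_\alpha^{\;\gamma} h_\beta^{\;\delta} L_N g_{\gamma\delta} = h_\alpha^{\;\gamma} h_\beta^{\;\delta} \nabla_\gamma N_\delta , \qquad K_{ab} = E^\alpha_a E^\beta_b \nabla_\alpha N_\beta . \tag{28.76}$$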
In view of this it is entirely plausible that we can formulate the condition for the
continuity of the normal derivative of the metric across the surface of the star, and
the absence of distributionally localised energy-momentum at the surface of the star, as
the condition that the interior and exterior extrinsic curvatures be equal,
$$K^+_{\alpha\beta} = K^-_{\alpha\beta} \quad\Leftrightarrow\quad K^+_{ab} = K^-_{ab} . \tag{28.77}$$
This is indeed the correct condition and together the conditions (28.75) and (28.77) are
known as the Israel(-Darmois) junction condition.95
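We will now verify (28.77). Since $N^\alpha$ has the simple form $\partial_r$ (and the same form both
outside and inside the star), while for its associated covector one has the (marginally
more complicated) expression
$$N_\alpha = (v, 1, 0, 0) \tag{28.78}$$
$$K_{\alpha\beta} = h_\alpha^{\;\gamma} h_\beta^{\;\delta} \nabla_\gamma N_\delta |_\Sigma = h_\alpha^{\;\gamma} h_{\beta\delta} \nabla_\gamma N^\delta |_\Sigma = h_\alpha^{\;\gamma} h_{\beta\delta} \Gamma^\delta_{\;\gamma r} |_\Sigma . \tag{28.79}$$
$$h_k^{\;\alpha} = \delta_k^{\;\alpha} \tag{28.80}$$
because $N_\alpha$ has no angular components. Thus for the angular components one
simply needs
and therefore
$$K^+_{ik} = K^-_{ik} = R(\tau) (d\theta_i\, d\theta_k + \sin^2\theta\, d\phi_i\, d\phi_k) . \tag{28.82}$$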
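$$u^\alpha \Gamma_{i\alpha r} = \tfrac{1}{2} (u^\alpha \partial_r g_{i\alpha} + u^\alpha \partial_\alpha g_{ir} - u^\alpha \partial_i g_{\alpha r}) . \tag{28.83}$$
and all three terms are individually zero: (i) $(\partial_r g_{i\alpha}) u^\alpha = 0$ because $u^\alpha$ has no
component in an angular direction, (ii) $g_{ir} = 0$, (iii) the $g_{\alpha r}$ components are independent
of $y^i$. Therefore one has
$$K^+_{i\alpha} u^\alpha = K^-_{i\alpha} u^\alpha = 0 . \tag{28.84}$$
3. It remains to show that the $u^\alpha u^\beta$ components $K_{\alpha\beta} u^\alpha u^\beta$ are equal. This can be
shown by explicit calculation, but it is more enlightening to note that this follows
in general from (17.43), because $u^\alpha$ is geodesic. Thus
$$K^+_{\alpha\beta} u^\alpha u^\beta = K^-_{\alpha\beta} u^\alpha u^\beta = 0 . \tag{28.85}$$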
95
For a detailed discussion of these junction conditions see E. Poisson, A Relativist's Toolkit: the
Mathematics of Black Hole Mechanics, sections 3.7 - 3.11.
Putting (1), (2) and (3) together, we have thus established
$$K^+_{\alpha\beta} = K^-_{\alpha\beta} \tag{28.86}$$
and therefore the continuity of the first derivatives of the metric across the surface of
the star. We also learn that this is essentially due to the fact that we have matched the
two space-times along geodesics (something that would not have been necessary if we
had just wanted to have a continuous metric).
This is also easy to understand intuitively: one could have of course tried to force our
dust star to collapse at a different rate, i.e. not in free fall (or e.g. tried to force a star
with a different interior to collapse as if it were in free fall). In either case, however,
this would require introducing some pressure / surface tension localised at the surface
of the star to make the star do this. Within the Israel junction condition formalism
such a surface energy-momentum tensor is precisely equivalent to a discontinuity of the
extrinsic curvature.
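The algebra behind the tangent and normal vectors used in this check is easy to verify numerically. The sketch below restricts to the $(\tau, r)$ block of a metric of the Painlevé-Gullstrand-like form $ds^2 = -d\tau^2 + (dr + v\, d\tau)^2 + r^2 d\Omega^2$ and confirms that $u^\alpha = (1, -v)$ is a unit timelike vector, that the covector $N_\alpha = (v, 1)$ of (28.78) is orthogonal to it and normalised as in (28.71), and that raising its index gives $N^\alpha = (0, 1)$, i.e. $\partial_r$. The function names, and the use of plain Python lists for $2 \times 2$ matrices, are choices made for this illustration.

```python
def pg_metric(v):
    """(tau, r) block of g_ab for ds^2 = -dtau^2 + (dr + v dtau)^2 + ..."""
    return [[-1.0 + v * v, v], [v, 1.0]]


def pg_inverse_metric(v):
    """(tau, r) block of the inverse metric g^ab."""
    return [[-1.0, v], [v, 1.0 - v * v]]


def contract(matrix, x, y):
    """Return matrix_ab x^a y^b (or the analogous contraction with lower indices)."""
    return sum(matrix[a][b] * x[a] * y[b] for a in range(2) for b in range(2))


def raise_index(inverse, covector):
    """Return N^a = g^ab N_b."""
    return [sum(inverse[a][b] * covector[b] for b in range(2)) for a in range(2)]


if __name__ == "__main__":
    v = 0.7                       # sample value of the infall velocity
    u_up = [1.0, -v]              # tangent to the infalling geodesics
    n_down = [v, 1.0]             # normal covector (28.78)
    print("u.u   =", contract(pg_metric(v), u_up, u_up))
    print("u.N   =", u_up[0] * n_down[0] + u_up[1] * n_down[1])
    print("N.N   =", contract(pg_inverse_metric(v), n_down, n_down))
    print("N^a   =", raise_index(pg_inverse_metric(v), n_down))
```

Because the same expressions hold with $v = \sqrt{2m/r}$ outside and $v = -rH(\tau)$ inside, the normal vector has the same form on both sides of the surface.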
We now consider the collapse of a star whose surface is initially at rest at some finite ra-
dius. We model the exterior geometry by the Schwarzschild metric in comoving Novikov
coordinates (26.58),
$r_i$ is a comoving coordinate, and we assume that particles on the surface of the star
move along the geodesics with $r_i = r_0$, $r_0$ labelling the maximal radius of the star. As
already described in section 28.3, the restriction of the Novikov metric (26.58) to the
surface of the star, i.e. to the comoving radial coordinate $r_i = r_0$, is
$$ds^2_{\rm ext} = \left[ -d\tau^2 + f(r_i)^{-1} r'(\tau, r_i)^2 dr_i^2 + r(\tau, r_i)^2 d\Omega^2 \right]_{r_i = r_0} = -d\tau^2 + R_0(\tau)^2 d\Omega^2 \tag{28.88}$$
$$R_0(\eta) = \tfrac{1}{2} r_0 (1 + \cos\eta) , \qquad \tau(\eta) = \left(\frac{r_0^3}{8m}\right)^{1/2} (\eta + \sin\eta) , \tag{28.90}$$
We model the interior geometry by the spatially closed $k = +1$ Robertson-Walker metric
in the standard comoving coordinates (where we now, in analogy with the notation of
the previous section, write $r_c$ for the comoving radial coordinate)
$$ds^2 = -d\tau^2 + a(\tau)^2 \left( \frac{dr_c^2}{1 - r_c^2} + r_c^2 d\Omega^2 \right) . \tag{28.91}$$
We assume that from the interior point of view the surface of the star is at the fixed
comoving radius $r_c = r_{c,0}$, so that the induced metric on the surface of the star is
$$ds^2_{\rm int} = \left[ -d\tau^2 + a(\tau)^2 \left( \frac{dr_c^2}{1 - r_c^2} + r_c^2 d\Omega^2 \right) \right]_{r_c = r_{c,0}} = -d\tau^2 + a(\tau)^2 r_{c,0}^2 d\Omega^2 \equiv -d\tau^2 + \tilde R_0(\tau)^2 d\Omega^2 . \tag{28.92}$$
$$\dot{\tilde R}_0(\tau)^2 + r_{c,0}^2 = \frac{C_m r_{c,0}^3}{\tilde R_0} . \tag{28.95}$$
Continuity of the metric across the surface of the star requires $R_0(\tau) = \tilde R_0(\tau)$, and
comparison of (28.89) and (28.95), say, gives us two conditions. The first,
$$\frac{2m}{r_0} = r_{c,0}^2 \quad\Leftrightarrow\quad r_{c,0} = \sqrt{2m/r_0} < 1 \tag{28.96}$$
just provides us with the relation between the comoving Novikov coordinate $r_0$ and the
comoving Robertson-Walker coordinate $r_{c,0}$. The second,
$$2m = C_m r_{c,0}^3 , \tag{28.97}$$
is identical to the condition (28.57) found in the previous section, in the context of the
$k = 0$ PG collapse, with the same consequence
$$M \equiv \frac{m}{G_N} = \frac{4\pi}{3}\, \rho(\tau) \tilde R_0(\tau)^3 . \tag{28.98}$$
The physical content of this equation is again that the (constant) Schwarzschild mass
M , i.e. the gravitational mass of the collapsing star as seen from the outside, is at any
time given by the product of the density and the (coordinate) volume of the star (cf.
also the comment on coordinate versus proper volume in this context in section 23.6).
The two conditions (28.96) and (28.97) can also equivalently be written as
and with these identifications it is now manifest that the solution for $R_0(\tau)$ given in
(28.90) is identical to the solution for $\tilde R_0(\tau) = r_{c,0}\, a(\tau)$ obtained from (28.94). Thus
the metric is now manifestly continuous across the surface of the star (with analogous
comments regarding its 1st and 2nd derivatives as in the case k = 0).
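The agreement of the exterior and interior descriptions of the surface for $k = +1$ can also be checked numerically. The sketch below compares the Novikov cycloid (28.90) with $r_{c,0}\, a$ built from the closed dust solution in the form (28.43), using the two matching conditions (28.96) and (28.97). It assumes $a_{\rm max} = C_m$, and it assumes that the two development angles are related by a shift of $\pi$, since (28.43) starts at $a = 0$ whereas (28.90) starts at maximal radius; the cosmological time then differs from $\tau$ by a constant. The function names are hypothetical.

```python
import math


def novikov_surface(eta, r0, m):
    """(tau, R_0) on the exterior cycloid (28.90); eta = 0 is the moment of maximal radius."""
    tau = math.sqrt(r0**3 / (8.0 * m)) * (eta + math.sin(eta))
    radius = 0.5 * r0 * (1.0 + math.cos(eta))
    return tau, radius


def closed_dust_surface(eta_c, r0, m):
    """(t, r_c0 * a) for the k = +1 dust solution (28.43) with matched parameters."""
    r_c0 = math.sqrt(2.0 * m / r0)       # first matching condition (28.96)
    c_m = 2.0 * m / r_c0**3              # second matching condition (28.97)
    a_max = c_m                          # assumption: adot = 0 at a = a_max in (28.40)
    a = 0.5 * a_max * (1.0 - math.cos(eta_c))
    t = 0.5 * a_max * (eta_c - math.sin(eta_c))
    return t, r_c0 * a


if __name__ == "__main__":
    r0, m = 10.0, 1.0
    for eta in (0.0, 0.5, 1.5, 3.0):
        tau, radius_ext = novikov_surface(eta, r0, m)
        t, radius_int = closed_dust_surface(eta + math.pi, r0, m)
        print(eta, radius_ext, radius_int, t - tau)
```

The two radii coincide for all values of the parameter, and the time difference is the constant $\pi (r_0^3/8m)^{1/2}$, i.e. the time the closed dust universe needs to expand from $a = 0$ to its maximal size.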
29 Black Holes IV: Other Black Hole Solutions (a brief overview)
While the Schwarzschild solution that we have discussed at length above is not the only
black hole solution of the Einstein equations, in 4 space-time dimensions the possibilities
are remarkably restricted. In particular, (with some technical assumptions) it can be
shown that the most general stationary and asymptotically flat black hole solution of
the 4-dimensional vacuum Einstein or Einstein-Maxwell equations (with a regular event
horizon) is characterised by just three parameters, namely its mass M , charge Q and
angular momentum J.
These black hole uniqueness theorems constitute a significant generalisation of the re-
markable Israel theorem (1967) on the Schwarzschild solution (briefly already referred
to at the end of section 28.3), which states that (under certain technical conditions)
the unique regular static black hole solution of the Einstein vacuum equations is the
Schwarzschild solution. In particular, under these circumstances staticity implies spherical
symmetry (this is not to be confused with the content of Birkhoff's theorem which
asserts that spherical symmetry and the vacuum Einstein equations imply staticity, a
much more elementary result).
The generalisations of this theorem constituting the black hole uniqueness theorems
are colloquially referred to as the fact that black holes have no hair or also as the
no-hair theorems.96 In terms of gravitational collapse, these theorems roughly amount
96
For an historical review and account of these developments, see e.g. D.C.
Robinson, Four decades of black hole uniqueness theorems, available from
http://www.mth.kcl.ac.uk/staff/dc robinson/papers.html. For a detailed recent critical re-
view of the status (and limitations) of the assertions of the uniqueness theorems, see P. Chrusciel, J.
to the statement that the only characteristics of a black hole which are not somehow
radiated away during the phase of collapse via multipole moments of the gravitational,
electro-magnetic, . . . fields are those which are protected by some conservation laws.
The two most important examples of black hole solutions generalising the Schwarzschild
metric, 2-parameter subfamilies of the complete 3-parameter family of black hole
metrics, are the Reissner-Nordström and Kerr metrics.
$$m \to m - q^2/2r , \tag{29.2}$$
simplest, relatively speaking), the metric takes the form
where
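$$\begin{aligned} \Delta(r) &= r^2 - 2mr + a^2 \\ \rho(r, \theta)^2 &= r^2 + a^2 \cos^2\theta \\ \Sigma(r, \theta) &= (r^2 + a^2)^2 - \Delta(r)\, a^2 \sin^2\theta . \end{aligned} \tag{29.4}$$
Another useful way of grouping the terms in the metric is as (suppressing now the
arguments $(r, \theta)$ of the functions $\Delta, \rho, \Sigma$)
$$ds^2 = -\frac{\rho^2 \Delta}{\Sigma}\, dt^2 + \frac{\Sigma}{\rho^2} \sin^2\theta\, (d\phi - \omega\, dt)^2 + \frac{\rho^2}{\Delta}\, dr^2 + \rho^2 d\theta^2 , \tag{29.5}$$
where
$$\omega = 2mar/\Sigma \tag{29.6}$$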
Remarks:
1. This metric is time-independent and axially symmetric, with the two commuting
Killing vectors $\partial_t$ and $\partial_\phi$.
2. However, it is only stationary, not static (cf. the discussion in section 15.4).
In the present adapted coordinates, in which the metric components are
independent of $t$, this amounts to the statement that the metric is invariant
under constant time-translations but not under time-reflection $t \to -t$
(because of the rotation term $g_{t\phi}$). More invariantly, this is the statement that
the Killing vector $\partial_t$ is not hypersurface-orthogonal, neither to the surfaces
of constant $t$ nor to any other hypersurface.
3. The Kerr metric fairly manifestly reduces to the Schwarzschild metric for
a = 0. It also reduces to the Minkowski metric for m = 0, as one might
expect (rotating Minkowski space is still Minkowski space), but this is
somewhat less manifest as one obtains the Minkowski metric in some rather
obscure coordinates (known as oblate spheroidal coordinates).
4. Regardless of how one writes the metric, its singularity and horizon structure
and the behaviour of geodesics are much more intricate and intriguing than
for the Schwarzschild and Reissner-Nordström solutions. We will take a brief
look at the horizon structure in section 31.3, but for more details (and an
analysis of the more intricate singularity structure) I have to refer you to any
of the modern standard textbooks on general relativity.
5. Electric charge can be added to this solution by the same replacement
$m \to m - q^2/2r$ (29.2) as in the relation between the Schwarzschild and
Reissner-Nordström metrics. The resulting (charged Kerr or rotating Reissner-Nordström)
metric is known as the Kerr-Newman metric.98
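The grouping (29.5) and the reduction to the Schwarzschild metric mentioned in remark 3 can be checked with a few lines of Python. The sketch below evaluates the functions (29.4) and (29.6), reads off the metric components implied by (29.5), and allows one to confirm that for $a = 0$ they reduce to $-(1 - 2m/r)$, $(1 - 2m/r)^{-1}$ and $r^2 \sin^2\theta$, and that in general $g_{tt} = -(1 - 2mr/\rho^2)$. The function names and the dictionary layout are assumptions made for this illustration.

```python
import math


def kerr_functions(r, theta, m, a):
    """Return (Delta, rho^2, Sigma, omega) as defined in (29.4) and (29.6)."""
    sin2 = math.sin(theta) ** 2
    delta = r * r - 2.0 * m * r + a * a
    rho2 = r * r + a * a * math.cos(theta) ** 2
    sigma = (r * r + a * a) ** 2 - delta * a * a * sin2
    omega = 2.0 * m * a * r / sigma
    return delta, rho2, sigma, omega


def kerr_metric_components(r, theta, m, a):
    """Metric components obtained by expanding the grouped form (29.5)."""
    delta, rho2, sigma, omega = kerr_functions(r, theta, m, a)
    g_phiphi = sigma * math.sin(theta) ** 2 / rho2
    return {
        "tt": -rho2 * delta / sigma + g_phiphi * omega**2,
        "tphi": -g_phiphi * omega,
        "phiphi": g_phiphi,
        "rr": rho2 / delta,
        "thth": rho2,
    }


if __name__ == "__main__":
    r, theta, m = 5.0, 1.0, 1.0
    print("a = 0  :", kerr_metric_components(r, theta, m, 0.0))
    print("a = 0.9:", kerr_metric_components(r, theta, m, 0.9))
```

The off-diagonal component comes out as $g_{t\phi} = -2mar \sin^2\theta/\rho^2$, which is the rotation term responsible for the metric being stationary but not static.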
1. Kottler Metric
The Kottler metric
$$ds^2 = -f(r) dt^2 + f(r)^{-1} dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} - \frac{\Lambda r^2}{3} \tag{29.7}$$
is the unique spherically symmetric solution of the Einstein vacuum equations
with a cosmological constant $\Lambda$,
$$G_{\alpha\beta} + \Lambda g_{\alpha\beta} = 0 \quad\Leftrightarrow\quad R_{\alpha\beta} = \Lambda g_{\alpha\beta} . \tag{29.8}$$
It is also known as the Schwarzschild - de Sitter metric for $\Lambda > 0$ and the
Schwarzschild - anti-de Sitter metric for $\Lambda < 0$. This solution is not asymptotically flat
but asymptotically (A)dS, i.e. asymptotic to pure de Sitter or anti-de Sitter space,
which is the maximally symmetric solution of the Einstein equation with a
positive (negative) cosmological constant - see section 38 for a detailed discussion of
these space-times. In particular, we will derive the solution (29.7) in section 38.2
(see equation (38.59)).
Unsurprisingly, one can also add charge to this solution,
$$f(r) = 1 - \frac{2m}{r} - \frac{\Lambda r^2}{3} + \frac{q^2}{r^2} , \tag{29.9}$$
to find an exact charged black hole solution of the Einstein-Maxwell equations
with a cosmological constant.
98
See e.g. T. Adamo, E. Newman, The Kerr-Newman metric: A Review, arXiv:1410.6626 [gr-qc]
for the (pre-)history of this metric (and the Kerr metric) and a discussion of its basic properties.
2. Topological Black Holes
Remarkably, and more surprisingly, for $\Lambda < 0$ one can replace the 1 in $f(r)$ by
a constant $k = 0, \pm 1$,
$$f_k(r) = k - \frac{2m}{r} - \frac{\Lambda r^2}{3} \tag{29.10}$$
(formally this also works for $\Lambda > 0$, but since $f_k(r)$ is strictly negative in that
case, this requires some reinterpretation . . . ), provided that one also replaces the
2-sphere by $R^2$ or $T^2$ for $k = 0$, and the 2-dimensional hyperboloid $H^2$ for $k = -1$,
$$ds^2 = -f_k(r) dt^2 + f_k(r)^{-1} dr^2 + r^2 d\Omega^2_{(k)} , \qquad d\Omega^2_{(k)} = \begin{cases} d\Omega_2^2 & \text{for } k = +1 \\ d\vec{x}^2 & \text{for } k = 0 \\ d\tilde\Omega_2^2 & \text{for } k = -1 \end{cases} \tag{29.11}$$
($d\tilde\Omega_2^2$ denotes the line element of the standard metric on $H^2$). These solutions
describe black holes immersed into AdS space, with horizons with a non-spherical
topology. Therefore such solutions are also, somewhat confusingly, known as topological
black holes.99 A special case of (29.11) are the metrics (38.137) one obtains
for $m = 0$, which describe pure AdS space (no black hole) in different coordinate
systems.
of general relativity, mentioned at the end of section 28.3, which are typically
of the form "under some reasonable assumptions, if there is something like an
event horizon, there must be something like a singularity", such solutions need to
walk a fine line between avoiding the singularity theorems and not being outright
unphysical. Usually this is achieved by some weak violation of the (occasionally
unreasonably strong) positive energy conditions (cf. section 21.1) entering the
singularity theorems, in particular the strong energy condition (SEC).101
One of the earliest and simplest solutions of this kind, which is also of the standard
simple $f$ - $f^{-1}$ form, is the Bardeen solution, a solution of the Einstein equations
coupled to (some non-linear version of) Maxwell theory, with metric function
$$f(r) = 1 - \frac{2mr^2}{(r^2 + e^2)^{3/2}} . \tag{29.12}$$
For suitable choice of the mass and charge parameters $m$ and $e$, $f(r)$ possesses
simple zeros (the largest zero corresponding to an event horizon). It approaches
the Schwarzschild metric for large $r$,
$$r \to \infty : \quad f(r) \to 1 - \frac{2m}{r} \tag{29.13}$$
but for small $r$ it approaches the de Sitter metric (in the form of the metric (29.7)
with $m = 0$),
5. Vaidya Metrics
Moving away from time-independent solutions, there is a simple class of time-
dependent generalisations of the Schwarzschild metric known as Vaidya metrics.
They can be obtained from the Schwarzschild metric written in ingoing or outgoing
Eddington-Finkelstein coordinates by replacing the constant mass m by a mass
function m(v) or m(u) depending on an advanced or retarded time coordinate.
Thus these metrics have the form
$$\begin{aligned} ds^2 &= -f(v, r)dv^2 + 2dv\,dr + r^2 d\Omega^2 , & f(v, r) &= 1 - \frac{2m(v)}{r} \\ ds^2 &= -f(u, r)du^2 - 2du\,dr + r^2 d\Omega^2 , & f(u, r) &= 1 - \frac{2m(u)}{r} . \end{aligned} \tag{29.17}$$
These turn out, none too surprisingly, to give rise to solutions to the Einstein
equations that describe null dust (or radiation) either entering (falling into) the
black hole or star (for m = m(v) a function of the ingoing Eddington-Finkelstein
coordinate) or exiting from (or being radiated away by) the black hole or star
(for m = m(u) a function of the outgoing Eddington-Finkelstein coordinate). In
particular, Vaidya metrics provide one with toy models that allow one to study
the formation and evolution of a black hole.
There are a lot of interesting things that one can do with, say about and learn
from Vaidya metrics. I will discuss some of them in sections 31 and 39 - 41.
The (standard) black hole solutions are also easily generalised to higher dimensions,
but in addition to that higher dimensions surprisingly offer many more possibilities
that have no 4-dimensional counterpart:
5. Schwarzschild-Tangherlini Solution
The D = d+1 dimensional generalisation of the Schwarzschild metric is sometimes
also known as the Schwarzschild-Tangherlini solution. It has the standard form
6. Topological Black Holes in Higher Dimensions
There is a corresponding, but much richer, generalisation of the topological black
hole solutions,
r2
fk (r) = k d2 2 (29.19)
r
where the transverse $(d-1)$-dimensional space can now be $S^{d-1}$, $\mathbb{R}^{d-1}$ or
$H^{d-1}$, or any other Einstein manifold with a metric $h_{ij}$ with the same curvature,
$R_{ij}(h) = (d-2)k h_{ij}$.104
9. Black Rings and other Exotic Black Objects
What is perhaps as remarkable as the uniqueness theorems for rotating black holes
in D = 4 is the fact that the situation is completely different for D > 4, a far cry
from the completely orderly and manageable situation in D = 4. In particular,
in D = 5 there are asymptotically flat black ring solutions with horizon-topology
$S^2 \times S^1$, and even more exotic objects in D > 5. The general construction and
classification of black solutions in higher dimensions is an active area of research
and many open questions remain.107
107
See e.g. R. Emparan, H. Reall, Black Holes in Higher Dimensions, arXiv:0801.3471 [hep-th]
for a general review, as well as S. Hollands, A. Ishibashi, Black hole uniqueness theorems in higher
dimensional spacetimes, arXiv:1206.1164 [gr-qc], and G. Galloway, Constraints on the topology of
higher dimensional black holes, arXiv:1111.5356 [gr-qc].
30 Black Holes V: the Reissner-Nordström Solution
$$ds^2 = -f(r)dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} . \tag{30.1}$$
In this section we will analyse various aspects of this metric. Some of these depend on
the specific form of $f(r)$, e.g. the analysis of the motion of (charged) particles in section
30.5. Others, such as the construction of Eddington-Finkelstein and Kruskal-Szekeres
coordinates in sections 30.7 and 30.9, are valid more generally for static black holes of
the ubiquitous (see e.g. the examples in section 29) $-f(r)dt^2 + f(r)^{-1}dr^2$ form, and
these will initially be discussed in this more general context before specialising to the
Reissner-Nordström metric.
In order to obtain the solution we again start with the standard form (23.6)
of a spherically symmetric metric, and for the gauge field make the electrostatic ansatz
that
$$A_t = A_t(r) \equiv -\phi(r) , \qquad A_r = A_\theta = A_\phi = 0 , \tag{30.3}$$
with $\phi(r)$ the usual scalar potential. Thus the only non-vanishing component of the
field strength tensor is
Note that this is traceless, as it should be. Moreover, the combination
vanishes identically so that as in the Schwarzschild case (23.23) - (23.25) one concludes
that
$$B R_{tt} + A R_{rr} = 0 \quad\Rightarrow\quad A(r) = B(r)^{-1} \equiv f(r) . \tag{30.10}$$
This means that the determinant of the metric is the same as the determinant of the
flat metric. Recalling that
$$\nabla_\mu F^{\mu\nu} = \frac{1}{\sqrt{g}}\,\partial_\mu\left(\sqrt{g}\,F^{\mu\nu}\right) \tag{30.11}$$
this in turn implies that the usual Minkowski space solution of the Maxwell equations
$$\partial_\mu F^{\mu\nu} = -4\pi j^\nu , \qquad j^\nu = (Q\delta(\vec{r}), 0, 0, 0) \tag{30.12}$$
describing the electric field of an electric point charge with charge $Q$ at $r = 0$, namely
is also a solution of the Maxwell equations in the gravitational background we are trying
to determine. Thus we can take the matter source of the Einstein equations to be given
by the standard electrostatic field $E_r = Q/r^2$ of a point charge. The energy momentum
tensor then reduces to
$$(T_{tt}, T_{rr}, T_{\theta\theta}, T_{\phi\phi}) = \frac{Q^2}{8\pi r^4}\left(f(r), -f(r)^{-1}, r^2, r^2\sin^2\theta\right) \tag{30.14}$$
which can also be written as
$$T^\mu_{\;\nu} = \frac{Q^2}{8\pi r^4}\,\mathrm{diag}(-1, -1, +1, +1) . \tag{30.15}$$
Note the characteristic negative radial pressure! Note also that it is a fortuitous coinci-
dence (or hindsight, if you will) in this case that, by doing things in the right order, we
have been able to more or less decouple the matter and gravitational equations. Usually
one cannot just plug one's favourite Minkowski solution to the equations of motion into
the energy-momentum tensor and then solve the Einstein equations because there is
no guarantee that the initial solution will also be a solution of the matter equations of
motion in the resulting curved space-time.
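As for Schwarzschild, the fact that $A(r) = B(r)^{-1} = f(r)$ implies (cf. (23.22) and
(23.26)) that
$$R_{\theta\theta} = 1 - f - rf' \tag{30.16}$$
and plugging this into the $(\theta\theta)$-component of the Einstein equation one finds
$$1 - f - rf' = 8\pi G_N T_{\theta\theta} = \frac{G_N Q^2}{r^2} \quad\Rightarrow\quad f = 1 + \frac{C}{r} + \frac{G_N Q^2}{r^2} . \tag{30.17}$$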
One can now verify that this is a solution of the complete set of Einstein(-Maxwell)
equations.
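As a quick numerical sanity check of (30.17), the following sketch evaluates the residual of $1 - f - rf' = G_N Q^2/r^2$ for the candidate $f = 1 + C/r + G_N Q^2/r^2$, approximating $f'$ by central differences. The function names, the step size and the sample values of $C$ and $G_N Q^2$ are illustrative assumptions, not part of the notes.

```python
def f_rn(r, C, GQ2):
    """Candidate solution f(r) = 1 + C/r + G_N Q^2 / r^2 from (30.17)."""
    return 1.0 + C / r + GQ2 / r**2


def ode_residual(r, C, GQ2, h=1e-5):
    """Residual of 1 - f - r f' - G_N Q^2 / r^2, with f' from central differences.

    The step size h is an arbitrary illustrative choice.
    """
    df = (f_rn(r + h, C, GQ2) - f_rn(r - h, C, GQ2)) / (2.0 * h)
    return 1.0 - f_rn(r, C, GQ2) - r * df - GQ2 / r**2


if __name__ == "__main__":
    # Sample values C = -2 (i.e. m = 1) and G_N Q^2 = 0.64 are assumptions.
    for r in (0.5, 1.0, 3.0, 10.0):
        print(r, ode_residual(r, C=-2.0, GQ2=0.64))
```

The residual should vanish up to discretisation and rounding error for any value of the integration constant $C$, which is why $C$ has to be fixed separately by comparison with the Schwarzschild solution.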
Comparison with the Schwarzschild solution, and introducing, in analogy with the
gravitational mass radius $m = G_N M$, the gravitational charge radius $q$ via
$$q^2 = G_N Q^2 \tag{30.18}$$
then gives
$$f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} \tag{30.19}$$
and finally the Reissner-Nordström solution
$$ds^2 = -\left(1 - \frac{2m}{r} + \frac{q^2}{r^2}\right)dt^2 + \left(1 - \frac{2m}{r} + \frac{q^2}{r^2}\right)^{-1}dr^2 + r^2 d\Omega^2 \tag{30.20}$$
The main new features of the Reissner-Nordström metric, compared with the
Schwarzschild metric, are all due to the fact that the function $f(r)$ now potentially has two
roots, at
$$r_\pm = m \pm \sqrt{m^2 - q^2} . \tag{30.21}$$
Thus one has to distinguish the three cases $m^2 - q^2 < 0$, $= 0$, $> 0$, corresponding to the
three possibilities for the relative size of the gravitational mass radius $m = G_N M$ and
the gravitational charge radius $q = \sqrt{G_N}\,Q$, and we will discuss these three possibilities
in turn below. In the following, we will always assume that $m > 0$.
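The case distinction can be made concrete with a small helper that evaluates (30.21). This is only a sketch: the function names and the string labels for the three cases are choices made here for illustration.

```python
import math


def rn_horizons(m, q):
    """Return (r_plus, r_minus) from (30.21), or None if f(r) has no real roots."""
    disc = m * m - q * q
    if disc < 0:
        return None
    s = math.sqrt(disc)
    return m + s, m - s


def rn_case(m, q):
    """Label the three cases m^2 - q^2 < 0, = 0, > 0 (labels are illustrative)."""
    disc = m * m - q * q
    if disc < 0:
        return "naked singularity"
    if disc == 0:
        return "extremal"
    return "non-extremal"


if __name__ == "__main__":
    for q in (0.0, 0.6, 1.0, 1.5):
        print(q, rn_case(1.0, q), rn_horizons(1.0, q))
```

For $q = 0$ this returns the Schwarzschild values $r_+ = 2m$, $r_- = 0$, and in general the two roots satisfy $r_+ + r_- = 2m$ and $r_+ r_- = q^2$.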
Remarks:
1. Instead of with (30.2) one can also start with the ansatz (23.61)
for which the Einstein equations take the form (23.64). In particular, one concludes
$$\dot{m}(t, r) = 4\pi G_N r^2 (T^r_{\;t}) = 0 \tag{30.23}$$
and
$$h'(t, r) = 4\pi G_N r f(t, r)^{-1}(-T^t_{\;t} + T^r_{\;r}) = 0 , \tag{30.24}$$
so that, without loss of generality, we can choose $h(t, r) = 0$. The remaining
equation for $m(r)$ is
$$m'(r) = 4\pi G_N r^2 (-T^t_{\;t}) = \frac{G_N Q^2}{2r^2} , \tag{30.25}$$
leading to
$$m(r) = m - \frac{G_N Q^2}{2r} = m - \frac{q^2}{2r} \tag{30.26}$$
and thus to (30.19),
$$f(r) = 1 - \frac{2m(r)}{r} = 1 - \frac{2m}{r} + \frac{q^2}{r^2} . \tag{30.27}$$
2. As a check on the dimensions note that Newton's constant has dimensions (M
mass, L length, T time) $[G_N] = M^{-1}L^3T^{-2}$ so that (23.31)
while in Gauss units the Coulomb force has no dimensionful factors apart from
$Q_1Q_2/r^2$, and therefore (force $F = ma$ having units $[F] = MLT^{-2}$)
In this case of an overcharged star (this is not a very realistic situation to put it
mildly), f (r) has no real roots, and the coordinate system is valid all the way to r = 0
(where there is a curvature singularity). In particular, the coordinate t is always time-
like and the coordinate r is always space-like. While this may sound quite pleasing,
much less insane than what happens for the Schwarzschild metric, this is actually a
disaster.
Note that $m^2 - q^2 < 0$ includes as a special case the solution with $m = 0$, supposedly
describing the gravitational field of a massless charged object. As shown in section
22.4, m measures the total energy / mass of the system (including, therefore, the pos-
itive electrostatic energy of the solution). There is thus clearly something disturbingly
unphysical about this solution.
Since, beyond exhibiting a naked timelike singularity, the causal structure of these
space-times is not particularly interesting, and they are considered to be unphysical (and,
ideally, excluded by cosmic censorship), we shall not discuss this case any further in the
following and just note the following curious fact: if one wanted to model elementary
particles such as the electron classically as point particles with a given mass and charge
(but one should, in any case, resist that temptation), they would satisfy $q^2 > m^2$ by a
wide margin (since the gravitational interaction is completely negligible compared with
the Coulomb repulsion).
108
See e.g. R. Wald, Gravitational Collapse and Cosmic Censorship, arXiv:gr-qc/9710068.
However, one should not conclude from this that elementary particles should hence be
modelled by naked singularities - electrons are essentially quantum mechanical objects
and outside the regime of validity / applicability of classical general relativity, and thus
outside the regime of validity of the present considerations.
$f(r)$ has a double zero at $r_+ = r_- = m$. Thus the metric takes the simple form
$$ds^2 = -\left(1 - \frac{m}{r}\right)^2 dt^2 + \left(1 - \frac{m}{r}\right)^{-2} dr^2 + r^2 d\Omega^2 . \tag{30.31}$$
Since $f(r) \geq 0$ everywhere, the singularity is timelike as in the overcharged case
discussed above. Thus crossing the horizon one can avoid running into the singularity
and turn back. However, since one cannot cross the same horizon in two directions
(one should and can substantiate this by constructing ingoing Eddington-Finkelstein
coordinates in this case which will exhibit this one-way behaviour via the tilting of the
lightcones at the horizon, and we will do this below) this means that on the way out one
is really crossing a white hole horizon (use outgoing Eddington-Finkelstein coordinates
there) into a new asymptotically flat extremal Reissner-Nordström universe, and this
story repeats itself ad infinitum.
the Schwarzschild metric in isotropic coordinates). Choosing $q$ positive so that $m = q$,
we can write the metric in the suggestive form
$$ds^2 = -\left(1 + \frac{q}{|\vec{x}|}\right)^{-2} dt^2 + \left(1 + \frac{q}{|\vec{x}|}\right)^{2} d\vec{x}^2 . \tag{30.35}$$
Remarks:
(with all $q_k > 0$, say). This is a special case of the Majumdar-Papapetrou class
of solutions (characterised in general by an equality $\sqrt{G_N}\,\rho_m = \rho_e$ between the
matter and charge densities), and describes a multi-centered extremal black hole
solution of the Einstein-Maxwell equations, with the mutual gravitational attraction
precisely balanced by the electrostatic repulsion, and with mass $m = \sum_k q_k$.
Such extremal black holes (typically characterised by the saturation of an in-
equality between mass and charges) arise naturally as supersymmetric solutions
of supergravity theories. As such they enjoy particular stability properties and
provide useful toy-models for all kinds of considerations.
As for the Schwarzschild metric (section 25.5), it is instructive to look at the geometry
of the solution in the near-horizon region $\tilde{r} \to 0$. In the Schwarzschild case (with
$\tilde{r} = r - 2m$ of course) this gave us
$$ds^2 = -\frac{\tilde{r}}{2m}dt^2 + \frac{2m}{\tilde{r}}d\tilde{r}^2 + (2m)^2 d\Omega^2 \tag{30.39}$$
and then the Rindler-like metric (25.53)
Here, because of the double pole / zero of the extremal metric at $r = m$, one finds
instead that for $\tilde{r} \to 0$ one has
$$ds^2 = -\frac{\tilde{r}^2}{m^2}dt^2 + \frac{m^2}{\tilde{r}^2}d\tilde{r}^2 + m^2 d\Omega^2 . \tag{30.41}$$
In particular, this metric factorises, i.e. has a product structure, the second factor just
being the standard metric on the 2-sphere with constant radius $m$. To identify the first
factor, introduce the coordinate $y = m^2/\tilde{r}$. Then one has
$$-\frac{\tilde{r}^2}{m^2}dt^2 + \frac{m^2}{\tilde{r}^2}d\tilde{r}^2 = m^2\,\frac{-dt^2 + dy^2}{y^2} \tag{30.42}$$
which is nothing other than the Lorentzian counterpart (10.73) of the constant negative
curvature metric on the Poincaré upper-half plane, (10.59), also known as the
two-dimensional anti-de Sitter metric AdS$_2$ (with (curvature) radius $m$). See section
38.3 for more information on AdS metrics and coordinate systems for them.
d = m y 1 dy = m r1 d
r (30.43)
rm r 0 y log r , (30.46)
is at an infinite proper distance from any point outside the horizon. This should then
be (and is) a fortiori true in the original extremal Reissner-Nordström metric (30.31).
Indeed, proper radial distance in that metric is determined by
$$d\rho = \left(1 - \frac{m}{r}\right)^{-1} dr . \tag{30.47}$$
Up to the replacement $2m \to m$ this is exactly the relation (25.66) that determined the
tortoise coordinate $r^*$ (25.67) for the Schwarzschild geometry, so that the solution is
(up to a choice of integration constant)
$$\rho = r + m\log(r - m) . \tag{30.48}$$
The crucial difference is that here $\rho$ is a measure of proper distance whereas in the
Schwarzschild case $r^*$ was just a coordinate. In any case this result exhibits the
logarithmic divergence $\sim \log(r - m)$ as $r \to m$. Nevertheless, the horizon can of course be
reached in finite proper time along e.g. timelike geodesics, or in finite affine parameter
along null geodesics. In particular, it is easy to see (cf. section 30.5) that radial ingoing
lightrays are simply described by $r(\lambda) = r_0 - E\lambda$ (30.85) for a constant (energy) $E$, so
that the affine time required to reach the horizon is proportional to the coordinate
distance to the horizon, not the proper distance.
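To see the logarithmic divergence numerically, one can evaluate the difference of the primitive (30.48), $r + m\log(r - m)$, between a fixed outer radius and an inner radius approaching the horizon. The following is a minimal sketch; the function name and the sample radii are assumptions made for illustration.

```python
import math


def proper_distance_extremal(r_inner, r_outer, m):
    """Proper radial distance between r_inner and r_outer (m < r_inner <= r_outer)
    in the extremal metric (30.31), using the primitive r + m*log(r - m) of (30.48)."""
    if not (m < r_inner <= r_outer):
        raise ValueError("need m < r_inner <= r_outer")

    def primitive(r):
        return r + m * math.log(r - m)

    return primitive(r_outer) - primitive(r_inner)


if __name__ == "__main__":
    m, r_outer = 1.0, 3.0
    # Each factor of 100 in coordinate distance to the horizon adds about
    # m*log(100) to the proper distance, while the coordinate distance stays finite.
    for eps in (1e-2, 1e-4, 1e-6, 1e-8):
        print(eps, proper_distance_extremal(m + eps, r_outer, m))
```

The printed distances grow without bound as the inner radius approaches $r = m$, whereas the coordinate distance $r_0 - m$, which controls the affine time of radial lightrays, remains fixed.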
Remark:
This, by the way, provides a nice and drastic illustration of the fact that one should not
attempt to define something like an average velocity of a particle between two points by
dividing proper spatial distance by proper time (which would come out to be not only
superluminal in this case, but actually infinite). Proper time and proper distance
measure distance along completely different paths in space-time, and dividing them is
akin to dividing apples by oranges. Don't yield to the temptation to do this (unless
you just want a way of quantifying small deviations from Minkowskian physics in a
weak gravitational field, say, without insisting on interpreting (apples)/(oranges) as a
velocity). Unfortunately, many people do not heed this advice . . .
This is in some sense the most interesting case, not because we actually expect to find
stars carrying a significant amount of electric charge, but rather because the
double-horizon structure this solution exhibits is not untypical and also appears in
astrophysically more relevant cases like those of rotating black holes such as the Kerr solution
(however, since the Kerr solution lacks spherical symmetry and is only axially
symmetric, its actual horizon and singularity structure is more intricate
than that of the Reissner-Nordström solution).
at which $f(r)$ vanishes, and $f(r)$ is positive for $0 < r < r_-$ and $r > r_+$ and negative in
the intermediate region $r_- < r < r_+$. In this case it is more informative to write the
function $f(r)$ and the metric as
$$f(r) = \frac{(r - r_+)(r - r_-)}{r^2} \tag{30.50}$$
and
$$ds^2 = -\frac{(r - r_+)(r - r_-)}{r^2}dt^2 + \frac{r^2}{(r - r_+)(r - r_-)}dr^2 + r^2 d\Omega^2 \tag{30.51}$$
The coordinate system in which we have written the metric is valid within each of the
three separate regions $r > r_+$, $r_- < r < r_+$ and $r < r_-$ (but $(r, t)$ in one region are of
course not the same coordinates as $(r, t)$ in another region).
The outer radius $r_+ > m$ is and behaves just like the event horizon of the Schwarzschild
metric, to which it tends for $q \to 0$,
$$q \to 0 \quad\Rightarrow\quad r_+ \to 2m . \tag{30.52}$$
In the intermediate region $r_- < r < r_+$ any timelike (or null) curve will then have to
move from larger to smaller values of $r$ (and will in fact reach $r_-$ in finite proper time).
At the inner radius $r_- < m$, which is absent for the Schwarzschild metric since
$$q \to 0 \quad\Rightarrow\quad r_- \to 0 , \tag{30.53}$$
there is also just a coordinate singularity and a horizon that reverses the role of radius
and time once more so that the singularity is time-like and can be avoided by returning
to larger values of $r$.
Remarks:
1. Again (i.e. as in the extremal case) we can anticipate the appearance of a new
white hole region beyond the original inner horizon, through which the particle
can pass back across $r = r_-$ to larger values of $r$, and on across $r = r_+$ to a new
asymptotically flat region etc. For a more detailed discussion of this see sections
30.5 - 30.9 below.
and always lies inside the inner horizon, $r_c < r_-$, because $f(r_c) = 1 > 0$.
3. In the extremal case we saw that the horizon $r = m$ is at an infinite proper distance
from any point $r_0 > m$. Let us see what the situation is in the current non-extremal
case. Thus we want to calculate the distance between $r_+$ and $r_0 > r_+$,
$$\rho = \int_{r_+}^{r_0} dr\,\frac{r}{\sqrt{(r - r_+)(r - r_-)}} = \int_0^{\tilde{r}_0} d\tilde{r}\,\frac{\tilde{r} + r_+}{\sqrt{\tilde{r}(\tilde{r} + \delta)}} \tag{30.56}$$
where $\tilde{r} = r - r_+$ and $\delta = r_+ - r_-$. For $\delta = 0$ we evidently recover the logarithmic
divergence of the extremal case. For $\delta > 0$, on the other hand, the integral is
finite (it can be expressed in closed form in terms of some unenlightening
arccosh-expression, but we won't need this), the potentially dangerous piece coming from
$$\tilde{r} \to 0 : \quad \frac{r_+}{\sqrt{\tilde{r}(\tilde{r} + \delta)}} \to \frac{r_+}{\sqrt{\delta}}\,\tilde{r}^{-1/2} \tag{30.57}$$
4. While we are at it, let us perform another similar calculation which will have
an amusing and perhaps unexpected consequence. Namely, let us calculate the
proper (timelike) distance between the two horizons $r_+$ and $r_-$, i.e. the proper
time it takes a radially freely falling observer to get from $r_+$ to $r_-$. We can use
the standard $(t, r)$ coordinates in this region between the horizons (remembering
that $r$ is timelike there) and thus, according to the usual rules, the proper time
is
$$\tau = -\int_{r_+}^{r_-} dr\,\frac{r}{\sqrt{-(r - r_+)(r - r_-)}} . \tag{30.58}$$
Introducing the coordinate $\eta$ via
$$r = m + \sqrt{m^2 - q^2}\,\cos\eta , \qquad 0 \leq \eta \leq \pi \tag{30.59}$$
one finds
$$\tau = \int_0^\pi d\eta\,\left(m + \sqrt{m^2 - q^2}\,\cos\eta\right) . \tag{30.60}$$
So far so straightforward. The curious thing, however, is that the second term of
the integrand $\sim \cos\eta$ integrates to zero over the interval $[0, \pi]$ and therefore does
not contribute to the integral at all, leading to the universal result
$$\tau = \pi m \tag{30.61}$$
independent of $q$. Thus the proper time is $\pi m$ for any $q < m$, and also in the
extremal limit $q \to m \Leftrightarrow r_- \to r_+$. In the extremal black hole, on the other
hand, with $r_+ = r_-$, the coordinate distance between $r_+$ and $r_- = r_+$ is clearly
zero. What this shows is that the extremal black hole ($q = m$) is perhaps not for
all intents and purposes the same thing as the extremal limit ($q \to m$) of a
non-extremal black hole, and there are also other contexts in which this distinction
appears to play a role.109
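The $q$-independence of this proper time is easy to confirm numerically. The sketch below applies a midpoint rule to the integral in the angular variable of (30.59) - (30.60), i.e. to the integrand $m + \sqrt{m^2 - q^2}\cos\eta$ on $[0, \pi]$; the function name and the number of sample points are illustrative assumptions.

```python
import math


def proper_time_between_horizons(m, q, n=2000):
    """Midpoint-rule approximation of the integral over [0, pi] of
    m + sqrt(m^2 - q^2) * cos(eta), cf. (30.60). Requires m > 0 and |q| <= m."""
    if not (m > 0 and abs(q) <= m):
        raise ValueError("need m > 0 and |q| <= m")
    amp = math.sqrt(m * m - q * q)
    h = math.pi / n
    total = 0.0
    for i in range(n):
        eta = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += (m + amp * math.cos(eta)) * h
    return total


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.9, 0.999):
        print(q, proper_time_between_horizons(1.0, q), math.pi)
```

For every admissible $q$ the result agrees with $\pi m$ to rounding accuracy, since the cosine contributions of the sample points cancel pairwise about $\eta = \pi/2$.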
In the extremal case, it turned out to be useful and instructive to write the metric in
isotropic coordinates (30.35). Equivalently, it was useful to introduce the coordinate
distance $\tilde{r} = r - m$ to the extremal horizon, and the resulting metric turned out to be
in isotropic form.
Extending this construction to the non-extremal case, these two strategies result in two
different coordinate systems that are occasionally useful, true isotropic coordinates and
another coordinate system which I will refer to as brane coordinates because it is the
prototype coordinate system in which one usually writes solutions to higher-dimensional
(super-)gravity theories describing black spatially extended objects. These are known
as black p-branes, with brane extracted from membrane, so that a 2-brane is a
membrane and in the case at hand we are dealing with a 0-brane.110
109
For further discussion, see e.g. S. Carroll, M. Johnson, L. Randall, Extremal limits and black hole
entropy, arXiv:0901.0931 [hep-th].
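1. Isotropic Coordinates
Recall that for the Schwarzschild metric, the isotropic radial coordinate is (23.38)
$$r(\rho) = \rho\left(1 + \frac{m}{2\rho}\right)^2 = \rho + m + \frac{m^2}{4\rho} , \tag{30.62}$$
and that for the extremal Reissner-Nordström metric we found (30.33) (with $\tilde{r} \to \rho$)
$$r(\rho) = \rho + m . \tag{30.63}$$
In the general case one can take
$$r(\rho) = \rho + m + \frac{m^2 - q^2}{4\rho} \tag{30.64}$$
which interpolates nicely between the two previous expressions for $q = 0$ and
$q^2 = m^2$. Then it is straightforward to check that
$$r^2 - 2mr + q^2 = \left(\rho - \frac{m^2 - q^2}{4\rho}\right)^2 \equiv \Delta(\rho) \tag{30.65}$$
and that
$$dr = \frac{\Delta(\rho)^{1/2}}{\rho}\,d\rho . \tag{30.66}$$
As a consequence, the Reissner-Nordström metric can be written as
$$ds^2 = -\frac{\Delta(\rho)}{r(\rho)^2}dt^2 + \frac{r(\rho)^2}{\rho^2}\left(d\rho^2 + \rho^2 d\Omega^2\right) . \tag{30.67}$$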
2. Brane Coordinates
It is also possible to write the metric in a form which interpolates nicely between
the standard Schwarzschild form of the Schwarzschild metric for $q \to 0$ and the
nice isotropic form (30.34) of the extremal metric in the extremal limit $q \to m$. To
that end, all we need to do is replace $r$ by $\rho = r - r_-$, which reduces to the isotropic
coordinate $\tilde{r} = r - m$ in the extremal limit and, moreover, to the standard
radial coordinate $r$ in the Schwarzschild limit $r_- \to 0$ (unlike the $\tilde{r} = r - r_+$
introduced in remark 3 above, which has the Schwarzschild limit $\tilde{r} \to r - 2m$).
In terms of $\rho = r - r_-$, and with
$$\delta = r_+ - r_- = 2\sqrt{m^2 - q^2} \tag{30.68}$$
measuring the deviation from extremality, the metric takes the form
$$ds^2 = -H(\rho)^{-2}F(\rho)dt^2 + H(\rho)^2\left(F(\rho)^{-1}d\rho^2 + \rho^2 d\Omega^2\right) \tag{30.69}$$
with
$$H(\rho) = 1 + \frac{r_-}{\rho} , \qquad F(\rho) = 1 - \frac{\delta}{\rho} . \tag{30.70}$$
Observe that this manifestly reduces to (30.34) in the extremal limit,
$$\delta \to 0 \quad\Rightarrow\quad r_- \to m \quad\Rightarrow\quad F(\rho) \to 1 , \quad H(\rho) \to 1 + m/\rho . \tag{30.71}$$
110
See e.g. T. Ortin, Gravity and Strings for a detailed account of these objects and their uses and
applications.
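Both coordinate changes can be checked numerically against the original metric function $f(r) = 1 - 2m/r + q^2/r^2$. The sketch below assumes the isotropic relation $r = \rho + m + (m^2 - q^2)/4\rho$ and the brane relation $r = \rho + r_-$ with $H = 1 + r_-/\rho$ and $F = 1 - (r_+ - r_-)/\rho$; all function and variable names (including `delta`) are illustrative choices, not notation fixed by the notes.

```python
import math


def f_rn(r, m, q):
    """Reissner-Nordstrom metric function f(r) = 1 - 2m/r + q^2/r^2."""
    return 1.0 - 2.0 * m / r + q * q / (r * r)


def isotropic_r(rho, m, q):
    """Areal radius r as a function of the isotropic radial coordinate rho."""
    return rho + m + (m * m - q * q) / (4.0 * rho)


def brane_r(rho, m, q):
    """Areal radius in brane coordinates: r = rho + r_minus (requires |q| <= m)."""
    return rho + (m - math.sqrt(m * m - q * q))


def brane_minus_gtt(rho, m, q):
    """The combination F / H^2 multiplying dt^2 in brane coordinates."""
    s = math.sqrt(m * m - q * q)
    r_minus, delta = m - s, 2.0 * s  # delta = r_plus - r_minus
    H = 1.0 + r_minus / rho
    F = 1.0 - delta / rho
    return F / (H * H)


if __name__ == "__main__":
    m, q = 1.0, 0.6
    for rho in (0.5, 2.0, 5.0):
        r_iso = isotropic_r(rho, m, q)
        lhs = r_iso**2 - 2.0 * m * r_iso + q * q
        rhs = (rho - (m * m - q * q) / (4.0 * rho)) ** 2
        print(rho, lhs, rhs, brane_minus_gtt(rho, m, q), f_rn(brane_r(rho, m, q), m, q))
```

In each row the first pair of numbers should agree (the isotropic identity for $r^2 - 2mr + q^2$), as should the second pair ($F/H^2$ versus $f$ evaluated at $r = \rho + r_-$).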
We now consider the motion of a test particle with mass $\mu$ and charge $e$ in the
Reissner-Nordström space-time with $m^2 > q^2$. This is evidently described by the geodesic
equation modified by the Lorentz-force term,
$$\ddot{x}^\alpha + \Gamma^\alpha_{\;\beta\gamma}\dot{x}^\beta\dot{x}^\gamma = (e/\mu)F^\alpha_{\;\beta}\dot{x}^\beta . \tag{30.73}$$
We will set $\mu = 1$ in the following (the case $\mu \neq 1$ can obviously be recovered from this
by scaling $e \to e/\mu$), so that quantities associated to the particle like charge, energy
and angular momentum that appear below are, as usual (cf. the discussion in section
2.5) to be thought of as quantities per unit particle mass.
Proceeding in exact analogy with the derivation of the effective potential for geodesics in
the Schwarzschild geometry in section 24.1, in order to exploit the symmetries and
conserved charges of the system it will be convenient to work at the level of the Lagrangian
which we can choose to be
$$L = \tfrac{1}{2}g_{\alpha\beta}\dot{x}^\alpha\dot{x}^\beta + eA_\alpha\dot{x}^\alpha \tag{30.74}$$
Plugging in the metric and gauge field (and choosing right away equatorial paths at
$\theta = \pi/2$ because of spherical symmetry), this Lagrangian becomes more explicitly
$$L = \tfrac{1}{2}\left(-f\dot{t}^2 + f^{-1}\dot{r}^2 + r^2\dot{\phi}^2\right) - eQ\dot{t}/r . \tag{30.75}$$
$$\dot{r}^2 = (E - eQ/r)^2 - f , \tag{30.79}$$
From this one can then readily deduce the qualitative features of the worldlines of
charged and uncharged particles in the Reissner-Nordström geometry.
Remarks:
which includes the usual angular momentum barrier term $\sim L^2/r^2$ as well as the
familiar attractive general relativistic correction term $\sim -mL^2r^{-3}$ and a novel
repulsive correction term $\sim q^2L^2r^{-4}$. However, we will focus on radial motion in
the following.
2. For massless particles ($\mu = 0$), which we will of course also consider to be uncharged
($e = 0$), the effective potential consists of just the above angular momentum term,
In particular, for radial lightrays the effective potential is zero and $\dot{r} = \pm E$ for
outgoing (respectively ingoing) null geodesics, so that
$$r(\lambda) = r_0 \pm E\lambda , \tag{30.85}$$
with $\lambda$ the affine parameter. Thus massless particles can reach the horizon (and
$r = 0$) in finite affine time.
3. For certain purposes it is also useful to write the effective potential in terms of
the horizons $r_\pm$ instead of the parameters $m$ and $q$ using the relations
$$r_+ + r_- = 2m , \qquad r_+ r_- = q^2 . \tag{30.86}$$
Then one has, with $\hat{e} = e/\sqrt{G_N}$,
$$V_{eff}(r) = \frac{2\hat{e}E\sqrt{r_+ r_-} - (r_+ + r_-)}{2r} + \left(1 - \hat{e}^2\right)\frac{r_+ r_-}{2r^2} . \tag{30.87}$$
4. The interpretation of the first term in (30.81) is pretty clear: it describes the
competition between the leading Coulomb electrostatic and Newton gravitational
$1/r$ interactions between the charged massive star and the charged massive test
particle. The only thing that may require some explanation is the factor of $E$ in
the Coulomb interaction. As we know from the discussion of Schwarzschild, $E$
is essentially the special-relativistic $\gamma$-factor and thus the substitution $Q \to EQ$
accounts for the Lorentz contraction of the electric field lines as seen by a particle
with velocity $\dot{r}^2 = E^2 - 1$ at $r = \infty$.
5. The second term is more mysterious and interesting in several respects. First of
all, for a neutral test particle freely falling (following a geodesic) in the
Reissner-Nordström geometry, it provides a repulsive potential at short distances,
$$e = 0 \quad\Rightarrow\quad V_{eff}(r) = -\frac{m}{r} + \frac{q^2}{2r^2} , \tag{30.88}$$
mimicking the angular momentum barrier term $L^2/2r^2$. This inevitably leads to
a turning point of the trajectory, and we will see below that this turning point
lies inside the inner horizon, i.e. in the region $r < r_-$. A heuristic but not entirely
satisfactory explanation for the occurrence of this phenomenon is that this is due
to a mass renormalisation
$$m \to m(r) = m - \frac{q^2}{2r} \tag{30.89}$$
required to compensate the infinite electrostatic energy density $\sim q^2/r^4$ of the
star. Alternatively one may take this as an indication that the interior of the
Reissner-Nordström solution is not particularly physical (we will come back to
this below).
6. For a charged particle, the coefficient $q^2 = G_N Q^2$ in the $1/r^2$ term of the potential
is replaced by $Q^2(G_N - e^2)$. Recalling that $e$ is really the charge per unit mass and
replacing $e \to e/\mu$, one sees that the sign of this term is determined by the sign
of $G_N^2\mu^2 - G_N e^2$, i.e. the relative size of the gravitational mass and charge radii
of the test particle. Reverting to $\mu = 1$, we will call ordinarily charged particles
those for which $e^2 < G_N$, extremal those with $e^2 = G_N$ and overcharged those
with $e^2 > G_N$. Thus the $1/r^2$ term in the radial effective potential provides a
repulsive potential at short distances for all ordinarily charged particles, but this
term becomes attractive for overcharged particles.
Note that this is not somehow an electrostatic effect (in particular since it is
independent of the sign of the charge $e$ of the particle) but a purely gravitational
effect. I do not have a good heuristic explanation for why overcharged particles
all of a sudden experience an attractive $1/r^2$ potential (and would be glad to learn
of one . . . ).
7. If one gives the particle some angular momentum, no matter how tiny, i.e. if there
is just the slightest deviation from radial motion, then the term $\sim q^2L^2/r^4$ will kick
in at short distances to yet again provide a repulsive potential as a joint effect of
the charge of the black hole and the angular momentum of the particle.
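The radial effective potential discussed in these remarks can be sketched in code. The block below assumes the normalisation $\frac{1}{2}\dot{r}^2 + V_{eff}(r) = \frac{1}{2}(E^2 - 1)$, so that (30.79) gives $V_{eff} = (\hat{e}Eq - m)/r + (1 - \hat{e}^2)q^2/2r^2$ with $\hat{e} = e/\sqrt{G_N}$; the function names and the string labels are illustrative choices.

```python
def v_eff_radial(r, m, q, e_hat=0.0, E=1.0):
    """Radial effective potential per unit particle mass, assuming
    (1/2) r_dot^2 + V_eff = (1/2)(E^2 - 1) and e_hat = e / sqrt(G_N)."""
    return (e_hat * E * q - m) / r + (1.0 - e_hat**2) * q * q / (2.0 * r * r)


def classify_test_charge(e_hat):
    """Ordinarily charged (e^2 < G_N), extremal (=) or overcharged (>) test particle."""
    a = abs(e_hat)
    if a < 1.0:
        return "ordinarily charged"
    if a == 1.0:
        return "extremal"
    return "overcharged"


if __name__ == "__main__":
    m, q, r_small = 1.0, 0.5, 1e-3
    for e_hat in (0.0, 0.5, -0.5, 2.0, -2.0):
        # The sign of V_eff at small r shows whether the 1/r^2 term is
        # repulsive (> 0) or attractive (< 0); it does not depend on the sign of e_hat.
        print(e_hat, classify_test_charge(e_hat), v_eff_radial(r_small, m, q, e_hat))
```

At sufficiently small $r$ the $1/r^2$ term dominates, so the potential is positive (repulsive) for ordinarily charged particles and negative (attractive) for overcharged ones, irrespective of the sign of the charge.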
Let us now take a closer look at some of the possible trajectories. First of all, for a
given choice of parameters the allowed values of r are constrained by the condition
and at a turning point $r_m$ (for maximal or minimal radius) of the trajectory one has
$$-\frac{2m}{r_m} + \frac{q^2}{r_m^2} = E^2 - 1 . \tag{30.92}$$
For a particle initially at rest at infinity, $E = 1$, one immediately reads off that the
minimal radius is equal to the core radius $r_c$ (30.55),
$$r_m(E = 1) = r_c = \frac{q^2}{2m} = \frac{r_+ r_-}{r_+ + r_-} = \frac{r_-}{1 + (r_-/r_+)} < r_- . \tag{30.93}$$
It is also plausible (and moreover true) that the particle will penetrate slightly deeper
into the Reissner-Nordström core if it initially has a non-zero inward directed velocity,
i.e.
$$E > 1 \quad\Rightarrow\quad r_m < r_c , \tag{30.94}$$
but clearly no finite energy particle can overcome the charge barrier to reach $r = 0$.
These particles will turn around at $r_m(E)$ and then escape again to infinity in a new
branch of the universe (since they clearly can't cross the same inner horizon $r_-$ in both
directions).
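The turning points of a neutral radially moving particle follow from (30.92), which is the quadratic equation $(E^2 - 1)r^2 + 2mr - q^2 = 0$. The following sketch solves it for given $m$, $q$ and $E$; the function name is an illustrative choice, and the code also covers the bound case $E < 1$, where the same quadratic has two positive roots.

```python
import math


def turning_radii(m, q, E):
    """Radii at which a neutral radial geodesic of energy E has r_dot = 0,
    i.e. the positive roots of (E^2 - 1) r^2 + 2 m r - q^2 = 0.
    Requires m > 0, 0 < |q| < m and E > 0."""
    if not (m > 0 and 0 < abs(q) < m and E > 0):
        raise ValueError("need m > 0, 0 < |q| < m and E > 0")
    a = E * E - 1.0
    if a == 0.0:
        return [q * q / (2.0 * m)]  # the core radius r_c of (30.93)
    disc = math.sqrt(m * m + a * q * q)
    if a > 0:
        return [(disc - m) / a]  # single positive root for E > 1
    # E < 1: both roots are positive, the inner one first
    return sorted([(disc - m) / a, (-disc - m) / a])


if __name__ == "__main__":
    m, q = 1.0, 0.6
    r_plus = m + math.sqrt(m * m - q * q)
    r_minus = m - math.sqrt(m * m - q * q)
    print("horizons:", r_minus, r_plus)
    for E in (0.5, 1.0, 2.0):
        print(E, turning_radii(m, q, E))
```

For $E \geq 1$ the single turning radius lies at or below $r_c = q^2/2m < r_-$, while for $E < 1$ the two radii bracket the region between the horizons from inside $r_-$ and outside $r_+$.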
Particles with $E < 1$ have both a minimum and a maximum radius $r_{m\pm}$, located in the
regions $r_{m-} < r_-$ and $r_{m+} > r_+$ respectively. Thus these particles appear to oscillate
in and out of the black hole region $r_- < r < r_+$ but this is clearly not possible (if
$r_+$ is a black hole horizon, you cannot just dance around and oscillate in and out of
it to your heart's content). What is happening is that, after having reached its inner
turning point at $r_{m-} < r_-$, the particle turns around to larger values of $r$, crosses a new
$r = r_-$ horizon into a new (time-reversed) version of the region $r_- < r < r_+$ in which it
can only move to larger values of $r$, crosses a white hole horizon at $r = r_+$ into a new
asymptotically flat Reissner-Nordström patch, up to the maximal radius $r_{m+}$ allowed
by its energy, and then turns around again to enter another new region etc. etc.
This suggests that somehow the maximally extended Reissner-Nordström solution
consists of an infinite sequence of such universes patched together along the horizons, and
this indeed turns out to be the case. Moreover, none too surprisingly the analysis
reveals that, as in the Kruskal diagram for the Schwarzschild metric, there is in addition
a mirror region, and therefore also an infinite sequence of such mirror regions. The
resulting Penrose diagram, in its full glory (well, almost full glory, I had to truncate it
somewhere) is shown in Figure 34. While this looks quite crazy, we will substantiate
this picture somewhat below by constructing coordinates that allow us to (patchwise)
cover all these regions of the extended space-time.
[Figure: diagram with labels $r = 0$, $r_-$, $r_+$, $\mathcal{I}^+$, $\mathcal{I}^-$]
Figure 34: Penrose Diagram of the maximal analytic extension of the non-extremal
Reissner-Nordström black hole.
We can also consider charged particles. Their behaviour depends strongly on whether
they are ordinarily charged or overcharged particles (the latter having a regretful suicidal
tendency to end up in the singularity at $r = 0$ regardless of the sign of the charge), but
also on the energy and on whether the Coulomb or gravitational $1/r$ interaction is
dominant. Thus this requires a bit of a case by case analysis which we will not pursue
here. Suffice it to say here that for all ordinarily charged particles one finds that the
first turning point of a radially infalling particle is located inside the inner horizon (and
not outside the outer horizon, which would, in principle, have been the other option).
In fact, the basic construction works for any metric of the form
$$u = t - r^* , \qquad v = t + r^* . \tag{30.97}$$
Infalling radial null geodesics ($dr^*/dt = -1$) are characterised by $v = $ const. and
outgoing radial null geodesics by $u = $ const.
(with an analogous expression for the metric written in terms of $(u, r)$). This
metric is now regular at any zero $r_h$ of $f$, $f(r_h) = 0$.
6. If $f(r)$ changes sign from $f(r) > 0$ to $f(r) < 0$ as one moves from $r > r_h$ to $r < r_h$,
then the situation is identical to that for the Schwarzschild black hole:
the Killing vector $\partial_v$ becomes null on the horizon and spacelike for $r < r_h$;
$r = r_h$ is an event horizon and $r$ will decrease along any future-directed causal
(timelike or lightlike) path.
Let us now see what happens to the lightcones, and what are therefore the allowed
paths for massless or massive particles, as one enters the Reissner-Nordström black hole
through the future outer horizon.
1. The first thing that will happen is, as already discussed above, and exactly as
in the Schwarzschild case, that the lightcones tilt over at $r = r_+$. Subsequently
both ingoing and (misleadingly still called) outgoing light rays will converge to
smaller values of $r$,
In particular, once inside one must continue to smaller values of $r$ until one reaches
either a singularity (as for Schwarzschild) or another horizon.
v (and the angular coordinates) still label the ingoing lightrays crossing the inner
horizon. Indeed, the coordinates continue to be valid all the way up to r = 0.
Note that for this we do not need to know if the tortoise coordinate $r^*$ is
well-behaved at $r = r_-$. As a matter of fact, it is not, but this does not affect the
Eddington-Finkelstein coordinates. It will, however, affect the Kruskal-Szekeres
coordinates which will break down at $r = r_-$ (but have a larger region of validity
across the past white hole horizon of the original Reissner-Nordström patch).
3. At $r = r_-$, the function $f(r)$ again changes sign and outgoing lightrays are indeed
outgoing, i.e. moving to larger values of $r$. Once these outgoing light rays in the
region $r < r_-$ reach the new (white hole) inner horizon at $r = r_-$, the original
set of Eddington-Finkelstein coordinates $(v, r)$ finally breaks down since $v$ can only
label the ingoing lightrays and the new $r_-$ sits at advanced time $v = +\infty$.
4. However, in this patch $r < r_-$ (whose metric is identical to that in the outside
patch $r > r_+$), one can also construct and use outgoing Finkelstein coordinates
$(u, r)$, with $u$ labelling the outgoing light rays. These coordinates will not only
cover the region $0 < r < r_-$, but they will extend across the new white hole inner
and outer horizons $r_\pm$ into a new asymptotically flat Reissner-Nordström patch.
5. From that region one can in principle continue into a new black hole region across
another r+ , but now it is evidently the outgoing Eddington-Finkelstein coordinates
that break down and one returns to step 1 and again constructs ingoing Eddington-
Finkelstein coordinates to describe this.
6. It is now evident that, proceeding in this way, one can pave / tessellate the entire
infinitely periodic fully extended non-extremal Reissner-Nordström solution with
ingoing and outgoing Eddington-Finkelstein coordinates (whose domains of validity
overlap in the regions $r < r_-$ and $r > r_+$ where $f(r) > 0$). This is indicated
in the Penrose diagram in Figure 35.
In the extremal case, when there is a double zero of $f(r)$, the story is similar, the only
difference being that the region between $r_-$ and $r_+$ is absent. The metric has the form
$$ ds^2 = -\left(1 - \frac{m}{r}\right)^2 dv^2 + 2 dv\, dr + r^2 d\Omega^2 . \tag{30.103} $$
In particular, $r = m$ is a null surface and a horizon. In the patch covered by the above
ingoing Eddington-Finkelstein coordinates one can only cross it along future-directed
curves. The difference is that now $f(r)$ is positive on both sides of the horizon, so that
outgoing light rays are really outgoing on both sides of the horizon. Ingoing
Eddington-Finkelstein coordinates can still also cover the patch $r < m$, but they cannot describe
the new outgoing region beyond the new white hole inner horizon $r = m$. For this one
needs to introduce outgoing Eddington-Finkelstein coordinates etc. Again one can pave
Figure 35: Penrose Diagram of the Reissner-Nordström metric: regimes of validity of one
set of ingoing (left panel) and outgoing (right panel) Eddington-Finkelstein coordinates
are indicated by the shaded areas. These overlap in the regions $r < r_-$ or $r > r_+$ and
thus one can cover the entire maximal extension with such coordinates and their mirror
counterparts.
One can also introduce Kruskal-Szekeres coordinates via the same chain of transformations
$$ (t, r) \to (u, v) \to (U, V) \to (T, X) \tag{30.104} $$
as in the Schwarzschild case. We again start by setting up the problem in the general
context of metrics of the form (30.96),
and we will now be more specific and assume that $f(r)$ has a simple zero at $r = r_h$.
Since the issue is the elimination of the coordinate singularity at $r_h$, we can focus on
the behaviour of $f(r)$ near $r = r_h$,
$$ f(r) = f'(r_h)(r - r_h) + \ldots \tag{30.106} $$
(here is where the treatment for a double zero, say, would of course be different). Thus
the tortoise coordinate can be approximated by
$$ dr^* = f(r)^{-1} dr \approx \frac{dr}{(r - r_h) f'(r_h)} \quad\Rightarrow\quad r^* \approx \frac{1}{f'(r_h)} \log|r - r_h| . \tag{30.107} $$
Here $f'(r_h)$ has the same dual interpretation (26.159) as in section 26.9, namely on the
one hand as the inaffinity, measuring the failure of the coordinate $v$ to provide an affine
parametrisation of the horizon generators $\xi = \partial_v$, and on the other hand as the surface
gravity, providing a measure of the strength of the gravitational field at the horizon,
$$ \kappa_h = \tfrac12 f'(r)|_{r=r_h} = \begin{cases} \text{inaffinity:} & \lim_{r\to r_h} \nabla_\xi \xi = \kappa_h\, \xi \\ \text{surface gravity:} & \kappa_h := \lim_{r\to r_h} f(r)^{1/2} a(r) \end{cases} \tag{30.108} $$
To see this note that the derivation of the inaffinity given in (26.155) and (26.156) goes
through verbatim in general,
$$ \nabla_\xi \xi = \Gamma^\alpha_{vv} \partial_\alpha = (f'/2)(f \partial_r + \partial_v) \quad\Rightarrow\quad \lim_{r\to r_h} \nabla_\xi \xi = \tfrac12 f'(r)|_{r=r_h}\, \xi = \kappa_h\, \xi . \tag{30.109} $$
and that the generalisation of the calculation of the acceleration for a static observer in
section 25.1 gives
and (if one doesn't like null coordinates) one can also introduce new time- and
space-coordinates $(t_K, x_K)$ via
$$ u_K = t_K - x_K , \qquad v_K = t_K + x_K , \tag{30.115} $$
and we have just seen that in terms of these Kruskal coordinates the metric near the
horizon at $r = r_h$ takes the manifestly non-singular form
The precise value of the coefficient $C > 0$ will depend on the non-singular terms in $r^*$
(which we have suppressed here), evaluated at $r = r_h$, and on the choice of integration
constants, and is therefore arbitrary, and also irrelevant.
Thus these Kruskal coordinates provide us with a good system of coordinates not just
in the original patch where the coordinates (t, r) were valid, but also across the future
horizon at uK = 0 and the past horizon at vK = 0. In the Schwarzschild case, more than
that was true, namely the Kruskal coordinates provided us with a coordinate system
for the complete maximal extension of the Schwarzschild geometry. This is, however,
not guaranteed by the above general construction and need not, and will not, be true
in general.
Indeed, already for the Reissner-Nordström metric we will be able to see this explicitly.
To that end we will need the explicit expressions for the tortoise coordinate. In the
non-extremal case we have
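$$ f(r) = \frac{(r - r_+)(r - r_-)}{r^2} . \tag{30.117} $$
The surface gravities at the two horizons $r_\pm$ are
$$ \kappa_\pm = \tfrac12 f'(r_\pm) = \frac{r_\pm - r_\mp}{2 r_\pm^2} . \tag{30.118} $$
$$ dr^* = \frac{r^2}{(r - r_+)(r - r_-)}\, dr \quad\Rightarrow\quad r^* = r + \frac{1}{2\kappa_+} \log|r - r_+| + \frac{1}{2\kappa_-} \log|r - r_-| . \tag{30.120} $$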
to cover the region around $r_+$. This coordinate system does not just cover the future
black hole region (until $r = r_-$, see below) but (as for Schwarzschild) also a past white
hole region (until $r = r_-$) and a mirror asymptotically flat region of the
Reissner-Nordström patch.
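As a quick numerical sanity check of (30.117), (30.118) and (30.120), the following sketch (plain Python, standard library only) evaluates the horizon radii, the surface gravities and the tortoise coordinate, and compares a finite-difference derivative of $r^*$ with $1/f(r)$. The function names, the choice of zero integration constant and the sample values m = 1, q = 0.6 are illustrative assumptions, not part of the notes.

```python
import math

def rn_horizons(m, q):
    """Horizon radii r_+ and r_- of the non-extremal metric (m > |q|)."""
    root = math.sqrt(m * m - q * q)
    return m + root, m - root

def f(r, m, q):
    """f(r) = 1 - 2m/r + q^2/r^2 = (r - r_+)(r - r_-)/r^2, cf. (30.117)."""
    return 1.0 - 2.0 * m / r + q * q / (r * r)

def surface_gravities(m, q):
    """kappa_+ and kappa_- as in (30.118); note that kappa_- is negative."""
    rp, rm = rn_horizons(m, q)
    return (rp - rm) / (2.0 * rp * rp), (rm - rp) / (2.0 * rm * rm)

def tortoise(r, m, q):
    """Tortoise coordinate r^*(r) of (30.120), integration constant set to zero."""
    rp, rm = rn_horizons(m, q)
    kp, km = surface_gravities(m, q)
    return (r + math.log(abs(r - rp)) / (2.0 * kp)
              + math.log(abs(r - rm)) / (2.0 * km))

def d_tortoise(r, m, q, h=1e-6):
    """Central finite-difference approximation to dr^*/dr."""
    return (tortoise(r + h, m, q) - tortoise(r - h, m, q)) / (2.0 * h)

if __name__ == "__main__":
    m, q = 1.0, 0.6                       # sample values (assumption)
    rp, rm = rn_horizons(m, q)
    print("r_+ =", rp, " r_- =", rm)
    print("kappa_+, kappa_- =", surface_gravities(m, q))
    for r in (3.0, 1.0, 0.1):             # outside, between and inside the horizons
        print(r, d_tortoise(r, m, q), 1.0 / f(r, m, q))
    # r^* keeps growing (logarithmically) as r approaches r_- from above
    for eps in (1e-3, 1e-6, 1e-9):
        print(eps, tortoise(rm + eps, m, q))
```

Because $\kappa_- < 0$, the last loop shows $r^*$ increasing as $r \to r_-$, which is the behaviour that spoils the outer Kruskal coordinates at the inner horizon.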
However, these coordinates break down as $r \to r_-$. This can be seen from the explicit
expression of the metric in these coordinates, which displays a (coordinate) singularity
at $r = r_-$ (but we will forego this here). It can also be seen, more directly, and more to
the point, from the fact that, as noted above, $r^* \to +\infty$ for $r \to r_-$, so that
$$ r \to r_- \;\Rightarrow\; r^* \to +\infty \;\Rightarrow\; v - u \to +\infty \;\Rightarrow\; u_{K+} v_{K+} \to \infty , \tag{30.122} $$
so that the outer Kruskal coordinates $u_{K+}, v_{K+}$ are singular at, and can therefore not
be extended beyond, the inner horizon, as shown in Figure 37.
In the region between the horizons one can then introduce the inner Kruskal coordinates
$(u_{K-}, v_{K-})$ which extend beyond $r_-$ (but become singular at $r_+$ instead). These two
types of Kruskal coordinate systems can then be used alternatingly, each an infinite
number of times, to pave the entire space-time.
Since thus the space-time cannot be covered by a single Kruskal coordinate patch,
typically not a whole lot is gained by using Kruskal coordinates and for most purposes
the simpler Eddington-Finkelstein coordinates are actually more convenient. However,
as a matter of principle it is useful to know (and a remarkable fact in its own right)
that there is a generalisation (due to Klösch and Strobl) of the Israel coordinates for
the Schwarzschild metric discussed in section 26.10 that provides a global covering of
the complete (infinitely periodically extended) Reissner-Nordström space-time - see the
reference in footnote 86 of that section.
Nevertheless, all in all, as in the case of the Kruskal diagram for the eternal fully
extended Schwarzschild space-time, one should perhaps be somewhat skeptical of this
intriguing and entertaining white hole - black hole structure and narrative for the
extended Reissner-Nordström space-time and take it with a substantial grain of salt:
1. First of all, for a collapsing star settling down to the non-extremal
Reissner-Nordström solution, the exotic regions beyond (i.e. before) the past white hole
horizon $r_+$ are eliminated (as for Schwarzschild), as is the mirror region - but at
first the infinite chain of white and black holes in the future remains intact.
3. One can repeat the story for the extremal case. In this case, the surface gravity
is zero, but one can still construct Eddington-Finkelstein coordinates (as we have
seen above), so that one can describe the region behind the horizon in this case.
The instability of the inner horizon of a non-extremal black hole may however
limit the validity of this picture and, in fact, seems to suggest that in the extremal
case the outer = inner horizon may become singular.112
111 E. Poisson, W. Israel, Internal Structure of Black Holes, Phys. Rev. D41 (1990) 1796. See also
section 5.7.3 of E. Poisson, A Relativist's Toolkit.
112 See D. Marolf, The dangers of extremes, arXiv:1005.2999 [gr-qc] and D. Garfinkle, How extreme
are extreme black holes?, arXiv:1105.2574 for a discussion of this.
31 Black Holes VI: Horizons
31.1 Introduction
So far, we have studied concretely 2 classes of exact solutions of the Einstein equations
that can describe what we have called black holes, namely the Schwarzschild metric and
the Reissner-Nordström metric. However, this 2-parameter family of solutions to the
Einstein(-Maxwell) equations is obviously very special, as the solutions are both static
and spherically symmetric.
The aim of this section is to study properties of black holes in more generality, and
therefore the first issue to address is what one actually means by a black hole. From
the examples that we have studied, we know that the characteristic features arise from
what is happening at the Schwarzschild horizon of the Schwarzschild black hole and at
the outer horizon of the Reissner-Nordström black hole. These examples suggest that
black holes in general should be characterised and defined not in terms of what happens
inside a black hole, but in terms of the properties of its boundary or horizon.
While the examples that we are already familiar with give us some idea, as we will recall
in section 31.2, the Schwarzschild and Reissner-Nordström horizons share a number of
different properties and can thus also be characterised in many different ways. Therefore
this does not automatically provide us with a unique candidate definition of a black hole
boundary or horizon in a more general context.
1. Event Horizon
This is the traditional global notion of a black hole that is meant to capture the
idea that the black hole is a region of space-time that is invisible to an outside or
asymptotic observer. Informally speaking, an event horizon is then the boundary
of this black hole region.
Until further notice, we will use the term event horizon in this way. A slightly
more formal, global and causal, definition of the event horizon will then be given in
section 31.4 (without, however, attempting to make this mathematically rigorous).
2. Killing Horizon
This notion of a horizon relies on the existence of an asymptotically timelike Killing
vector. It turns out to be a very convenient (local, geometric) characterisation of
the (global, causal) event horizon of a stationary black hole.
Various aspects of the Killing horizon, and its relation with the event horizon, will
be briefly discussed in sections 31.5 (rigidity theorems), 31.6 (surface gravity) and
31.7 (properties of the generating null congruences).
These trapped surfaces also play a central role in the singularity theorems of general
relativity and are the prime indicators that a singularity will develop. From this per-
spective, it is perhaps the trapped surfaces that are fundamental, and the event horizon
is only a considerate afterthought woven by a benign cosmic censor to hide the re-
sulting singularity from the outside. It is therefore of interest to investigate the relation
between the event horizon and various notions of black hole boundaries based on trapped
surfaces, and this will be the subject of the last part of this section.
This section is unavoidably technically somewhat more advanced than other sections in
this part of the notes. In particular, we will make extensive use of
the properties of null geodesic congruences studied in sections 11.4 and 11.5;
Let us start by reconsidering the various features of the future horizon of the
Schwarzschild metric, which we write either in the standard Schwarzschild coordinates or in
advanced Eddington-Finkelstein coordinates (which extend across the future horizon)
as
$$ ds^2 = -f(r) dt^2 + f(r)^{-1} dr^2 + r^2 d\Omega^2 = -f(r) dv^2 + 2 dv\, dr + r^2 d\Omega^2 , \tag{31.1} $$
with
$$ f(r) = 1 - \frac{2m}{r} = 1 - \frac{r_s}{r} . \tag{31.2} $$
In this metric, there are several apparently quite different things happening at r = rs ,
and therefore different ways of characterising r = rs . Everything that is said below is
also valid for the outer horizon $r_+$ of a Reissner-Nordström black hole with
$$ f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} = \frac{(r - r_+)(r - r_-)}{r^2} . \tag{31.3} $$
We begin with the description in terms of the coordinate components of the metric:
1. At r = rs we have
$$ g_{tt}(r_s) = 0 \quad\Leftrightarrow\quad g_{vv}(r_s) = 0 . \tag{31.4} $$
$$ g^{rr}(r_s) = 0 . \tag{31.5} $$
In order to make the 1st characterisation somewhat less coordinate dependent, we can
assign some additional physical significance to the coordinate $t$ or the vector field $\xi = \partial_t$.
For example, we can focus on the fact that static observers, i.e. those that stay at fixed
spatial Schwarzschild coordinates $(r, \theta, \phi)$, have worldlines with 4-velocity $u \propto \xi$.
3. We can then interpret the fact that $g_{tt}(r_s) = 0$ as the statement that such static
observers can only exist for $r > r_s$. In this sense
$$ r = r_s \quad\Leftrightarrow\quad g_{\alpha\beta} \xi^\alpha \xi^\beta = 0 , \tag{31.8} $$
6. Moreover, we have that this locus is itself actually a null surface,
Because the length of a Killing vector does not change in the direction of a Killing
vector (see (8.59) or (8.60)), $\xi$ is tangent to $K$ (and therefore also normal to $K$,
cf. the discussion of null hypersurfaces in section 16.1).
7. We can reformulate this somewhat more invariantly as the statement that the normal
vector to the hypersurfaces of constant $r$, $N_\alpha \propto \partial_\alpha r$, which is asymptotically
spacelike, becomes null at $r = r_s$,
$$ r = r_s \quad\Leftrightarrow\quad g^{\alpha\beta} \partial_\alpha r\, \partial_\beta r = 0 \tag{31.11} $$
8. In particular, the hypersurface r = rs is null, and for r < rs one can only move
through the spacelike hypersurfaces of constant r in the direction of decreasing r,
As we have seen, a crucial role in our analysis of the Schwarzschild black hole was played
by analysing and understanding the behaviour of radial lightrays and lightcones, i.e. the
causal structure of the space-time. Let us reconsider r = rs from this point of view:
and therefore for $r < r_s$ these would-be outgoing lightrays actually also move to
smaller values of r,
Thus r = rs is where the lightcones tilt over, and can only be crossed in the
direction from r > rs to r < rs .
We had already noted in section 26.4 that this can be phrased in a somewhat more
geometric and invariant way in terms of the expansions $\theta_n$ and $\theta_\ell$ of the in- and outgoing
radial null congruences defined by the null vector fields (26.94)
$$ n = -\partial_r , \qquad \ell = \partial_v + \tfrac12 f(r) \partial_r . \tag{31.17} $$
For these expansions we found
$$ \theta_n = -\frac{2}{r} < 0 \quad \forall\, r , \tag{31.18} $$
and
$$ \theta_\ell = \frac{r - 2m}{r^2} \;\begin{cases} > 0 & r > r_s \\ = 0 & r = r_s \\ < 0 & r < r_s \end{cases} \tag{31.19} $$
One says that for $r < r_s$ (resp. $r = r_s$), the spheres $S_{r,v}$ of constant $r$ and $v$, with $\theta_n < 0$
and $\theta_\ell < 0$ (resp. $\theta_\ell = 0$) are trapped (resp. marginally trapped).
11. We can also rephrase this as the statement that the null surface $r = r_s$ is foliated
by such marginally trapped spheres,
$$ T = \bigcup_v S_{r_s,v} = \{r = r_s\} . \tag{31.21} $$
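The classification of the spheres of constant $r$ and $v$ by the signs of the two expansions in (31.18) and (31.19) can be spelled out in a few lines of Python. This is only a minimal sketch: the function names, the tolerance used to detect a vanishing outgoing expansion, and the label "untrapped" for the region $r > r_s$ are illustrative choices and do not appear in the notes.

```python
def expansions(r, m):
    """Return (theta_n, theta_l) for the sphere of radius r, cf. (31.18)-(31.19)."""
    theta_n = -2.0 / r                    # ingoing congruence: always converging
    theta_l = (r - 2.0 * m) / (r * r)     # outgoing congruence: sign flips at r = 2m
    return theta_n, theta_l

def classify_sphere(r, m, tol=1e-12):
    """Label a sphere of constant r and v by the signs of its null expansions."""
    theta_n, theta_l = expansions(r, m)
    if abs(theta_l) <= tol:
        return "marginally trapped"       # theta_l = 0, i.e. r = r_s
    if theta_n < 0 and theta_l < 0:
        return "trapped"                  # both expansions negative, r < r_s
    return "untrapped"                    # outgoing rays still expand, r > r_s

if __name__ == "__main__":
    m = 1.0                               # sample mass, so that r_s = 2
    for r in (3.0, 2.0, 1.0):
        print(r, expansions(r, m), classify_sphere(r, m))
```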
Finally we can also turn to the global, causal characterisation of a black hole in terms
of a (future) event horizon, defined here (for the time being) informally as the boundary
of the region from which signals (lightrays) can be sent to an asymptotic observer (the
more formal definition of the event horizon will be discussed in section 31.4):
12. Due to the time-independence of the Schwarzschild metric, this global property
follows from the local behaviour of the lightcones established above, and therefore
we have
$$ \{r = r_s\} = H^+ \text{ is a (future) event horizon} . \tag{31.22} $$
For the static and spherically symmetric Schwarzschild metric, all these 12
characterisations of $r = r_s$ (and perhaps some others I have overlooked or deliberately ignored)
are equivalent. Some of these appear to be more closely related than others to what
one might mean by a "horizon" or a "black hole", but clearly the Schwarzschild black
hole alone is not enough to decide which of these criteria are pertinent or equivalent in
more generality. As soon as one moves away either from the static situation or from
the spherically symmetric situation, one finds that these different characterisations
no longer necessarily coincide (or are not even applicable) and even when applicable may
capture different phenomena. Thus in order to decide which of these properties are the
most useful or appropriate to capture at least some aspect of the "black hole-ness" of
an object, we will now look at another example.
31.3 Kerr Metric: Ergosphere vs Killing Horizon and Event Horizon
This example is the Kerr metric, already briefly mentioned in section 29.1. In contrast
to the Schwarzschild metric, it is neither static nor spherically symmetric, but it is still
stationary and axially symmetric.
The Kerr metric is perhaps the single most important exact solution of the Einstein
equations for astrophysical purposes, and there are a lot of things that should be said
about the Kerr metric (and this is done in most respectable textbooks of general rela-
tivity), but here I will focus on those that are relevant for the (horizon) issue at hand.
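In Boyer-Lindquist coordinates $(t, r, \theta, \phi)$, the Kerr metric is (29.3),(29.5)
$$ \begin{aligned} ds^2 &= -\left(1 - \frac{2mr}{\rho^2}\right) dt^2 - \frac{4mra \sin^2\theta}{\rho^2}\, dt\, d\phi + \frac{\Sigma}{\rho^2} \sin^2\theta\, d\phi^2 + \frac{\rho^2}{\Delta} dr^2 + \rho^2 d\theta^2 \\ &= -\frac{\rho^2 \Delta}{\Sigma}\, dt^2 + \frac{\Sigma}{\rho^2} \sin^2\theta\, (d\phi - \omega\, dt)^2 + \frac{\rho^2}{\Delta} dr^2 + \rho^2 d\theta^2 , \end{aligned} \tag{31.23} $$
where $\Delta, \rho, \Sigma, \omega$ are the (unfortunately somewhat complicated) functions
$$ \begin{aligned} \Delta(r) &= r^2 - 2mr + a^2 \\ \rho(r, \theta)^2 &= r^2 + a^2 \cos^2\theta \\ \Sigma(r, \theta) &= (r^2 + a^2)^2 - \Delta(r) a^2 \sin^2\theta \\ \omega(r, \theta) &= -g_{t\phi}/g_{\phi\phi} = 2mar/\Sigma(r, \theta) . \end{aligned} \tag{31.24} $$
For later use we note that the coefficients of the metric satisfy the simple (but rather
unobvious) relation
$$ g_{t\phi}^2 - g_{tt} g_{\phi\phi} = \Delta(r) \sin^2\theta . \tag{31.25} $$
This also implies that the volume element $\sqrt{g}$ has the surprisingly simple
Schwarzschild-like form
$$ \sqrt{g} = \rho^2 \sin\theta \tag{31.26} $$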
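Since the relation (31.25) and the equivalence of the two ways of writing the metric in (31.23) are not obvious at a glance, a numerical spot check may be reassuring. The sketch below (standard-library Python) assumes the conventions $\Sigma = (r^2 + a^2)^2 - \Delta a^2 \sin^2\theta$ and $\omega = 2mar/\Sigma$ of (31.24); the function names and the sample point are my own choices.

```python
import math

def kerr_functions(r, theta, m, a):
    """Delta, rho^2, Sigma and omega as in (31.24)."""
    delta = r * r - 2.0 * m * r + a * a
    rho2 = r * r + (a * math.cos(theta)) ** 2
    sigma = (r * r + a * a) ** 2 - delta * (a * math.sin(theta)) ** 2
    omega = 2.0 * m * a * r / sigma
    return delta, rho2, sigma, omega

def kerr_t_phi_block(r, theta, m, a):
    """(g_tt, g_tphi, g_phiphi), read off from the first line of (31.23)."""
    delta, rho2, sigma, omega = kerr_functions(r, theta, m, a)
    s2 = math.sin(theta) ** 2
    g_tt = -(1.0 - 2.0 * m * r / rho2)
    g_tphi = -2.0 * m * r * a * s2 / rho2   # half of the dt dphi coefficient
    g_phiphi = sigma * s2 / rho2
    return g_tt, g_tphi, g_phiphi

if __name__ == "__main__":
    m, a, r, theta = 1.0, 0.5, 3.0, 1.0     # sample point (assumption)
    delta, rho2, sigma, omega = kerr_functions(r, theta, m, a)
    g_tt, g_tphi, g_pp = kerr_t_phi_block(r, theta, m, a)
    # identity (31.25): both numbers should agree
    print(g_tphi ** 2 - g_tt * g_pp, delta * math.sin(theta) ** 2)
    # second line of (31.23): g_tt = -rho^2 Delta / Sigma + g_phiphi omega^2
    print(g_tt, -rho2 * delta / sigma + g_pp * omega ** 2)
```

The check also works for points with $\Delta < 0$, since both relations are algebraic identities in $r$ and $\theta$.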
As mentioned in section 29.1, this metric describes the gravitational field outside a
rotating star or that of a rotating black hole, with mass parameter $m$ and angular
momentum parameter $a$ (and with the condition $|a| \leq m$, analogous to the condition
$|q| \leq m$ for the Reissner-Nordström metric excluding naked singularities - we will only
look at the non-extremal case $m^2 > a^2$ in the following).
This metric is stationary, with time-translation Killing vector $\xi = \partial_t$, and axially
symmetric, with the rotational Killing vector $\psi = \partial_\phi$, and the Boyer-Lindquist coordinates
are evidently adapted to these two commuting symmetries. The most general Killing
vector of the Kerr metric is thus of the form
$$ K = a\, \xi + b\, \psi \tag{31.27} $$
with constant $a, b$. Because asymptotically the norm of $\psi$ is proportional to $r^2$, while the
norm of $\xi$ is asymptotic to $-1$, the unique asymptotically timelike Killing vector of the
Kerr metric is (up to a constant rescaling) the time-translation Killing vector $\xi = \partial_t$.
Since this metric is stationary, with Killing vector $\xi$, it makes sense to ask if or where this
(asymptotically timelike) Killing vector becomes null. Likewise, because of the existence
of a privileged (adapted) time-coordinate, there is a preferred class of observers, static
observers, which remain at fixed values of the spatial coordinates $(r, \theta, \phi)$, with 4-velocity
$$ u \propto \xi \tag{31.28} $$
and it is legitimate to ask if there is a static limit or infinite redshift surface for such
observers. Both of these questions amount to determining the zeros of
$$ g_{\alpha\beta} \xi^\alpha \xi^\beta = g_{tt} , \tag{31.29} $$
Since the metric also has the axial Killing vector $\psi = \partial_\phi$, there is also a more general
class of privileged observers, called stationary observers, who remain at fixed values of
$(r, \theta)$ but rotate in the $\phi$-direction with constant angular velocity $\Omega$, so that
$$ u \propto \xi_\Omega = \xi + \Omega\, \psi , \tag{31.30} $$
and one can (and we will) also enquire about the existence of a corresponding
stationary limit surface. Note that constant angular velocity here means constant for an
observer at constant $(r, \theta)$, i.e. $\Omega = \Omega(r, \theta)$.
We will first consider $\xi = \partial_t$ and static observers. The question is thus if or where $g_{tt}$ is
zero. From the explicit expression for the metric given above one finds that
$$ g_{tt} = 0 \quad\Leftrightarrow\quad r^2 - 2mr + a^2 \cos^2\theta = 0 , \tag{31.31} $$
with solution (the rationale for the notation $r_{sl}(\theta)$ will become apparent below)
$$ r = r_{sl}(\theta) = m + \sqrt{m^2 - a^2 \cos^2\theta} . \tag{31.32} $$
Thus at the poles $\theta = 0, \pi$ (on the axis of rotation) one has $r_{sl} = m + \sqrt{m^2 - a^2}$, and
on the equatorial plane $\theta = \pi/2$ one has $r_{sl} = 2m$.
This surface
$$ S = \{r = r_{sl}(\theta)\} \tag{31.33} $$
• $r = r_{sl}(\theta)$ defines the static limit surface for static observers (hence the notation
$r_{sl}$), i.e. no static observers can exist for $r < r_{sl}(\theta)$.
• $r = r_{sl}$ also defines a surface of infinite redshift for static observers.
• $r_{sl}(\theta)$ also defines what is commonly also called the ergosphere, and the region
between the ergosphere and the event horizon, which we will pinpoint below, is
then known as the ergoregion. This name arises from the fact (known as the
Penrose process), that (some of) the rotational energy of a rotating black hole can
be extracted from the ergoregion of a black hole (and ergon = work in ancient
Greek).
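To make the shape of the static limit surface concrete, the following minimal Python sketch evaluates $r_{sl}(\theta)$ from (31.32) and confirms that $g_{tt}$ changes sign there. The function names and the sample values m = 1, a = 0.9 are illustrative assumptions; $g_{tt}$ is taken from the first line of (31.23).

```python
import math

def r_static_limit(theta, m, a):
    """r_sl(theta) = m + sqrt(m^2 - a^2 cos^2(theta)), cf. (31.32)."""
    return m + math.sqrt(m * m - (a * math.cos(theta)) ** 2)

def g_tt(r, theta, m, a):
    """g_tt = -(1 - 2 m r / rho^2) with rho^2 = r^2 + a^2 cos^2(theta)."""
    rho2 = r * r + (a * math.cos(theta)) ** 2
    return -(1.0 - 2.0 * m * r / rho2)

if __name__ == "__main__":
    m, a = 1.0, 0.9                          # sample values (assumption)
    for theta in (0.0, 0.5, 1.0, math.pi / 2):
        rsl = r_static_limit(theta, m, a)
        # g_tt < 0 just outside, ~ 0 on, and > 0 just inside the static limit
        print(theta, rsl,
              g_tt(rsl + 0.1, theta, m, a),
              g_tt(rsl, theta, m, a),
              g_tt(rsl - 0.1, theta, m, a))
```

The printed radii interpolate between $m + \sqrt{m^2 - a^2}$ on the axis and $2m$ on the equatorial plane, so the surface is flattened at the poles.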
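1. For example, even though no static observers can exist for $r < r_{sl}(\theta)$, this does
not by itself imply that one cannot escape from that region, and it is also not
true. Indeed, while static observers cannot exist inside the ergosphere (static limit
surface) $S$, stationary observers with $u \propto \xi + \Omega \psi$ can (for some range of
$r < r_{sl}(\theta)$) provided that they are willing to rotate with, i.e. in the direction of,
the black hole.
More precisely, requiring that $u$ or $\xi_\Omega = \xi + \Omega \psi$ be timelike,
$$ g_{tt} + 2 \Omega\, g_{t\phi} + \Omega^2 g_{\phi\phi} < 0 , \tag{31.34} $$
one finds that $\Omega$ is restricted to the range
$$ \Omega_- < \Omega < \Omega_+ , \tag{31.35} $$
where
$$ \Omega_\pm = \frac{-g_{t\phi} \pm \sqrt{g_{t\phi}^2 - g_{tt} g_{\phi\phi}}}{g_{\phi\phi}} = \omega \pm \frac{\sqrt{\Delta}\, \rho^2}{\Sigma \sin\theta} \tag{31.36} $$
are the two roots of the polynomial in (31.34). In the 2nd step I used the definition
of $\omega$ in (31.24) and the identity (31.25).
Now, on the ergosphere (static limit surface) $S$ one has, by definition, $g_{tt} = 0$,
so that $\Omega_- = 0$ there (note that $g_{t\phi}$ is negative), while $\Omega_-$ is negative (positive)
outside (inside) the ergosphere,
$$ \Omega_- \;\begin{cases} < 0 & \text{for } r > r_{sl} \\ = 0 & \text{for } r = r_{sl} \\ > 0 & \text{for } r < r_{sl} \end{cases} \tag{31.37} $$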
This is to be interpreted as the statement that outside the ergosphere stationary
observers can exist that can rotate either with or against the sense of rotation
of the black hole, while on and inside the ergosphere a stationary observer has
no choice but to rotate with (i.e. to be dragged along by) the black hole. This
condition $\Omega > 0$ continues to hold inside the ergosphere even when one adds
momentum in the $r$ (and/or $\theta$) direction, of either sign, and such observers can
then leave the ergosphere again. Thus the ergosphere is not like a horizon or 1-way
membrane.
From the above explicit expression for the $\Omega_\pm$ we see that something special
happens not only when $g_{tt} = 0$ (this we just discussed) but also when or where
$\Delta(r) = 0$. We will come back to this below.
2. Another way of stating that the ergosphere (static limit surface) $S$ is not very
horizon-like is as the fact that $S$ is a timelike surface, i.e. it has a spacelike normal
(away from the axis of rotation). This can be seen from the fact that a
(non-normalised) vector normal to
$$ S(r, \theta) := r - r_{sl}(\theta) = 0 \tag{31.38} $$
will be
$$ N_\alpha = \partial_\alpha S : \quad N_\alpha = (0, 1, -dr_{sl}/d\theta, 0) , \tag{31.39} $$
with norm
$$ N_\alpha N^\alpha = g^{rr} + g^{\theta\theta} (dr_{sl}(\theta)/d\theta)^2 . \tag{31.40} $$
With
$$ g^{rr} = \frac{\Delta}{\rho^2} , \qquad g^{\theta\theta} = \frac{1}{\rho^2} \tag{31.41} $$
this evaluates on $r = r_{sl}(\theta)$ to
$$ N_\alpha N^\alpha = \frac{1}{2 m r_{sl}} \frac{m^2 a^2 \sin^2\theta}{m^2 - a^2 \cos^2\theta} \geq 0 , \tag{31.42} $$
with $N_\alpha N^\alpha = 0$ only at the poles. Such a timelike surface can never act as a
horizon or 1-way membrane, since one can cross a timelike surface or timelike
worldline multiple times in both directions (otherwise it would be really hard to
meet people more than once!).
Thus, even though the Killing vector $\xi$ becomes null on the ergosphere, this does
not imply all by itself that the surface on which $\xi$ becomes null is itself a null
surface, even though this is what happened in the Schwarzschild case (we will see
in section 31.5 that in general in the static case the former implies the latter).
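The two arguments above (the allowed range of angular velocities and the spacelike normal of the static limit surface) can both be checked numerically. The sketch below is standard-library Python with hypothetical function names and sample values m = 1, a = 0.9; it takes the metric components from (31.23) with $\Sigma = (r^2 + a^2)^2 - \Delta a^2 \sin^2\theta$, computes the roots $\Omega_\pm$ of the quadratic $g_{tt} + 2\Omega g_{t\phi} + \Omega^2 g_{\phi\phi}$, and compares (31.40) with the closed form (31.42).

```python
import math

def t_phi_block(r, theta, m, a):
    """(g_tt, g_tphi, g_phiphi) of the Kerr metric in Boyer-Lindquist coordinates."""
    s2 = math.sin(theta) ** 2
    rho2 = r * r + (a * math.cos(theta)) ** 2
    delta = r * r - 2.0 * m * r + a * a
    sigma = (r * r + a * a) ** 2 - delta * a * a * s2
    return -(1.0 - 2.0 * m * r / rho2), -2.0 * m * r * a * s2 / rho2, sigma * s2 / rho2

def omega_pm(r, theta, m, a):
    """Roots Omega_- <= Omega_+ of the quadratic, cf. (31.36); needs sin(theta) != 0, Delta >= 0."""
    g_tt, g_tphi, g_pp = t_phi_block(r, theta, m, a)
    disc = math.sqrt(g_tphi ** 2 - g_tt * g_pp)
    return (-g_tphi - disc) / g_pp, (-g_tphi + disc) / g_pp

def r_static_limit(theta, m, a):
    """r_sl(theta) of (31.32)."""
    return m + math.sqrt(m * m - (a * math.cos(theta)) ** 2)

def normal_norm(theta, m, a):
    """g^rr + g^thetatheta (dr_sl/dtheta)^2 evaluated on r = r_sl(theta), cf. (31.40)."""
    root = math.sqrt(m * m - (a * math.cos(theta)) ** 2)
    r = m + root
    drsl = a * a * math.sin(theta) * math.cos(theta) / root
    rho2 = r * r + (a * math.cos(theta)) ** 2
    delta = r * r - 2.0 * m * r + a * a
    return (delta + drsl ** 2) / rho2

def normal_norm_closed_form(theta, m, a):
    """Right-hand side of (31.42)."""
    r = r_static_limit(theta, m, a)
    return (m * a * math.sin(theta)) ** 2 / (2.0 * m * r * (m * m - (a * math.cos(theta)) ** 2))

if __name__ == "__main__":
    m, a, theta = 1.0, 0.9, math.pi / 3      # sample values (assumption)
    rsl = r_static_limit(theta, m, a)
    for r in (3.0, rsl, 1.6):                # outside, on and inside the ergosphere
        print(r, omega_pm(r, theta, m, a))
    print(normal_norm(theta, m, a), normal_norm_closed_form(theta, m, a))
```

Inside the ergosphere both roots are positive, which is the numerical counterpart of the statement that a stationary observer there has to co-rotate with the black hole.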
Looking back at the list in section 31.2, we see that the (more or less equivalent)
properties (1) and (3)-(5), as applied to $\xi = \partial_t$, describe the ergosphere but not a
horizon (but we keep an open mind regarding condition (6) because, as mentioned
above, $\xi$ does not satisfy this condition since the ergosphere is not a null hypersurface).
Moving down in the list, we next have the (again more or less equivalent) conditions
(2) and (7)-(9). For the Kerr metric, one has
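$$ g^{rr} = \frac{\Delta}{\rho^2} \tag{31.43} $$
and therefore
$$ g^{rr}(r) = 0 \quad\Leftrightarrow\quad \Delta(r) = r^2 - 2mr + a^2 = 0 . \tag{31.44} $$
This has the 2 roots
$$ r_\pm = m \pm \sqrt{m^2 - a^2} , \tag{31.45} $$
and we focus on $r_+$, as this is the one one encounters first. Note that
$$ r_+ = m + \sqrt{m^2 - a^2} \leq m + \sqrt{m^2 - a^2 \cos^2\theta} = r_{sl}(\theta) \tag{31.46} $$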
Thus the surface r = r+ is null, at this point one (radial) leg of the lightcone is aligned
with this surface, and therefore this surface can (locally) only be crossed in one direction,
in the case at hand from r > r+ to r < r+ . Thus r+ is our candidate for a black hole
horizon.
It turns out that this also agrees with the event horizon. Instead of attempting to confirm
this head-on, we first make the following observations regarding additional properties
of the null surface r = r+ :
$$ \xi_h = \xi + \Omega_h\, \psi \tag{31.48} $$
becomes null,
$$ g_{\alpha\beta}\, \xi_h^\alpha \xi_h^\beta |_{r=r_+} = 0 . \tag{31.49} $$
As noted at the beginning of this section, $\xi_h$ is not asymptotically timelike.
Nevertheless, a preferred normalisation for $\xi$ (such as $\xi^\alpha \xi_\alpha \to -1$ asymptotically) also
leads to a preferred normalisation for $\xi_h$.
3. By the same argument as for the Schwarzschild metric in section 31.2, since the
length of a Killing vector does not change in the direction of the Killing vector,
$\xi_h$ is tangent to the null hypersurface $r = r_+$. Therefore this particular linear
combination of the two Killing vectors actually satisfies property (6) of section
31.2,
We will formalise this property (a null hypersurface with a normal Killing vector)
in terms of a Killing horizon below (section 31.5).
4. Because of the lack of spherical symmetry of the Kerr metric, the determination
of the expansion of outgoing null congruences orthogonal to some 2-surface (of
constant t and r, say) is somewhat more involved than for the Schwarzschild
metric. Therefore, this is not the ideal way to check if our horizon candidate
r = r+ can also be described in terms of marginally trapped surfaces, as in the
characterisations (10) and (11) of section 31.2.
A better way to do this is to make use of the fact we just established that the
Killing vector $\xi_h$ is tangent to the null surface $r = r_+$. Since $\xi_h$ is a Killing vector,
the geometry cannot change along $\xi_h$. Moreover, because $\xi_h$ is normal to $r = r_+$,
it provides the null generators of $K$ (cf. the discussion in section 16.2). Together,
these two statements imply that the null geodesic congruence generated by $\xi_h$ on
$r = r_+$ must have zero expansion,
$$ \theta_{\xi_h} = 0 , \tag{31.51} $$
because if it had non-zero expansion, something would change along the congruence,
e.g. the cross-sectional area. The formal argument for this will be given in
section 31.5 below. This also shows that
$$ T = \{r = r_+\} \tag{31.52} $$
is foliated by marginally trapped surfaces, and we see that for the Kerr metric
$r = r_+$ also satisfies the properties (10) and (11) of section 31.2.
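The statements that the surface $r = r_+$ lies inside the static limit surface, cf. (31.46), and that a suitable combination of the Killing vectors $\partial_t$ and $\partial_\phi$ becomes null there, cf. (31.49), can be spot-checked numerically. In the sketch below the horizon angular velocity is taken to be $\Omega_h = a/(r_+^2 + a^2)$, the standard Kerr value; this formula is supplied here as an assumption, since it is not written out in the passage above. Function names and sample values are likewise illustrative.

```python
import math

def r_plus(m, a):
    """Outer root r_+ = m + sqrt(m^2 - a^2) of Delta(r) = 0, cf. (31.45)."""
    return m + math.sqrt(m * m - a * a)

def r_static_limit(theta, m, a):
    """Static limit r_sl(theta) of (31.32)."""
    return m + math.sqrt(m * m - (a * math.cos(theta)) ** 2)

def killing_norm(r, theta, m, a, Omega):
    """Norm of d_t + Omega d_phi: g_tt + 2 Omega g_tphi + Omega^2 g_phiphi."""
    s2 = math.sin(theta) ** 2
    rho2 = r * r + (a * math.cos(theta)) ** 2
    delta = r * r - 2.0 * m * r + a * a
    sigma = (r * r + a * a) ** 2 - delta * a * a * s2
    g_tt = -(1.0 - 2.0 * m * r / rho2)
    g_tphi = -2.0 * m * r * a * s2 / rho2
    g_phiphi = sigma * s2 / rho2
    return g_tt + 2.0 * Omega * g_tphi + Omega * Omega * g_phiphi

if __name__ == "__main__":
    m, a = 1.0, 0.9                          # sample values (assumption)
    rp = r_plus(m, a)
    omega_h = a / (rp * rp + a * a)          # standard Kerr value, assumed here
    for theta in (0.3, 1.0, math.pi / 2):
        print(theta,
              rp <= r_static_limit(theta, m, a),         # (31.46)
              killing_norm(rp, theta, m, a, omega_h),    # ~ 0: null on r = r_+
              killing_norm(rp, theta, m, a, 0.0))        # > 0: d_t alone is spacelike there
```

The last column illustrates why $\partial_t$ itself cannot serve as the horizon-generating Killing vector: away from the axis it is already spacelike on $r = r_+$.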
Finally we can turn to property (12), i.e. we return to the question if $r = r_+$ is actually
the event horizon, as informally defined so far. We have seen that outgoing lightrays
can only be truly outgoing for $r > r_+$. In general, the future behaviour of such
(momentarily outgoing) lightrays depends on the future evolution of the geometry. In the case
at hand, however, because the metric is stationary, we can extrapolate this statement
all the way to the future to conclude that indeed $r = r_+$ is also the (future, outer) event
horizon of the Kerr metric. (In addition, as for the Reissner-Nordström metric, there
are inner and/or past horizons at $r = r_-$, but we are not interested in these here).
Thus, from the Kerr metric we learn that there are essentially 3 a priori logically
distinct ways of characterising the event horizon of a stationary black hole, namely in
terms of
1. an event horizon
2. a Killing horizon
(cf. the introduction to this section), and we will now formalise these notions in turn,
starting with the event horizon.
Since this refers to asymptotic observers and the causal structure, it is useful to
recall the Penrose diagrams for the Schwarzschild and Reissner-Nordström solutions. In
particular, as we already know that past horizons and white holes are an artefact of
considering eternal black holes, we will focus on the external asymptotically flat region
and the region around the future horizon, as represented e.g. by the Penrose diagrams
for the collapse of a null shell (Figure 31) or of a star (Figure 33). The essential features
of these diagrams are reproduced in Figure 38 below.
From these diagrams we can read off that what characterises the black hole in an
asymptotically flat space-time is that it is the region of space-time from which one
cannot send signals to future null infinity $\mathscr{I}^+$. Equivalently, the event horizon is the
boundary of the region that can send signals to $\mathscr{I}^+$.
This is now also precisely captured by, and made precise in, the official definition of
a future event horizon $H^+$, as the boundary of the past of future null infinity $\mathscr{I}^+$,
This becomes a rigorous definition once all the terms appearing in it have been properly
defined, but we will not attempt this here.113 This definition is illustrated in Figure 39
which, none too surprisingly, does not differ significantly from the diagrams in Figure
38 (it does, however, deliberately remain agnostic about what happens inside the black
hole, e.g. whether or not there is a singularity inside; or an inner horizon; or dragons;
the definition does not address this).
113 For a detailed treatment, see e.g. S. Hawking, G. Ellis, The large scale structure of space-time and
sections 11 and 12 of R. Wald, General Relativity.
Figure 38: Penrose Diagram of the essential part of a (Schwarzschild) black hole:
collapse of a null-shell on the left, collapse of a star on the right.
Figure 39: Definition of the event horizon and the black hole region: the future event
horizon $H^+$ is the (future) boundary of the past of future null infinity $\mathscr{I}^+$. The
complement of the past of $\mathscr{I}^+$ is the black hole region $B$, the region from which no signals can
be sent to $\mathscr{I}^+$.
2. The definition relies on the existence of conformal infinity, in particular future null
infinity $\mathscr{I}^+$. As such, this definition can be used in (suitably defined)
asymptotically flat space-times, as well as for certain other asymptotics (e.g. asymptotically
anti-de Sitter space-times). However, it cannot be used in spatially compact
space-times.
3. This definition is what is usually called teleological in the literature, i.e. given
that the event horizon is defined as the future boundary of the past of future
null infinity, in order to define a black hole (or even in order to decide if there
is a black hole at all somewhere right now) one needs to know the entire future
evolution of the space-time (and then trace back lightrays from the infinite future
to today).
This definition of a black hole in terms of an event horizon has been tremendously
useful, and has led to numerous and valuable insights into the nature of black holes.
However, this time-honoured definition of a black hole is not completely unproblematic
(the following, and other, points have all been made repeatedly in the past, in particular
recently in the literature developing and advocating alternative quasi-local definitions
of black hole boundaries; see the references in footnote 124 in section 31.9 below):
1. This definition of a black hole is so non-local in space that it rules out black holes
in spatially compact universes.
2. It is also so non-local in time that it does not even allow astrophysicists to speak
now about a supermassive black hole at the center of our galaxy.
Perhaps black holes are indeed intrinsically such non-local objects that one cannot do
better. However, in many ways black holes appear to behave like reasonably local
objects. It should also be kept in mind that, strictly speaking, the definition of
asymptotically flat space-times and the associated construction of $\mathscr{I}^+$ and conformal infinity
were always meant to be idealisations of sufficiently distant observers in realistic
space-times, say. Such idealisations are of course very common in physics (spherical cows), but
they are only useful if they actually simplify the analysis. If such idealisations give rise
to their own technical problems (and there are indeed such problems114), then perhaps
other idealised descriptions should be sought.
114 See e.g. sections 1.4 and 1.5 of P. Chrusciel, Black Holes, arXiv:gr-qc/0201053 for an incisive
mathematical critique, and an analysis of the deficiencies of the corresponding $\mathscr{I}^+$-based definition of
a black hole.
This suggests that the definition of a black hole in terms of an event horizon is perhaps
not for all intents and purposes the best definition. For all these reasons, but also
motivated by considerations involving the mechanics and thermodynamics of black hole
horizons, in recent years a lot of work has gone into finding suitable definitions and
quasi-local geometric characterisations of horizons and studying their properties. Most
(if not all) of these rely in one way or another on marginally trapped surfaces and related
concepts, and we will briefly return to this later on, after having discussed the Vaidya
metric (sections 31.8 and 31.9).
Looking back at the list in section 31.2, one of the characterisations of the Schwarzschild
radius that a priori appears to have little to do with the most familiar properties or
intuitive notions of a black hole, or with the event horizon, is property (6), that the
horizon is a null surface with normal vector a Killing vector ($\xi = \partial_t$ in that case).
Nevertheless, we saw that the event horizon of the Kerr black hole also has this property,
albeit with respect to a different Killing vector $\xi_h = \partial_t + \Omega_h\partial_\phi$.
The fact that in both these examples the event horizon turned out to have this property
is no coincidence. Indeed, there are so-called rigidity theorems which relate the global
causal notion of an event horizon to the (a priori unrelated and independent) local,
purely geometrical notion of a Killing horizon, which we can define equivalently as follows:
a Killing horizon is a null hypersurface $\mathcal{K}$ whose null generators are the integral
curves of the restriction of a Killing vector field to $\mathcal{K}$.
These above-mentioned rigidity theorems then state that under rather general condi-
tions, and in a variety of circumstances, the event horizon of a stationary black hole
must be a Killing horizon.115
One particular result along these lines is that in the static case the event horizon is
a Killing horizon for the asymptotically timelike (and hypersurface-orthogonal) Killing
vector $\xi$. In particular, validity of this statement requires that the hypersurface on
which $\xi$ becomes null, i.e. the static limit surface or infinite redshift surface, is itself a
null surface, and thus a Killing horizon (something that, as we have seen, is not true
for the Kerr metric).
115 See R. Wald, The Thermodynamics of Black Holes, Living Rev. Relativity 4, (2001),
6; http://www.livingreviews.org/lrr-2001-6, arXiv:gr-qc/9912119 for an overview and general
discussion, S. Hawking and G. Ellis, The large scale structure of space-time for a detailed
account of the classical results, and section 3.3.1 of P. Chrusciel, J. Lopes Costa, M.
Heusler, Stationary Black Holes: Uniqueness and Beyond, Living Rev. Relativity 15 (2012) 7,
http://www.livingreviews.org/lrr-2012-7, arXiv:1205.6112 [gr-qc] for a critical assessment of
the current state of the art.
Subject to one simplifying technical assumption, this latter assertion is easy to prove.
This assumption is that the static limit surface $\mathcal{S}$ is indeed a hypersurface that can be
defined by $\xi^2 = 0$ (or at least as a connected component of this set). In other words, we
assume that $\xi^2$ does not also vanish in some neighbourhood of a hypersurface. In that
case, as in our general discussion of hypersurfaces in sections 14 - 16, we can characterise
$\mathcal{S}$ in terms of its defining function $S(x) = \xi_\mu(x)\xi^\mu(x)$ as
$$\mathcal{S} = \{x : \xi_\mu(x)\xi^\mu(x) = 0\} = \{x : S(x) = 0\} . \qquad (31.55)$$
We can then also choose $\partial_\mu S$ as a non-vanishing normal to the surface.
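Because the norm of a Killing vector does not change along the orbits of a Killing vector
(see (8.59) or (8.60)), $\xi$ is necessarily tangent to $\mathcal{S}$, and since $\xi$ is null on $\mathcal{S}$, $\mathcal{S}$ cannot
be a spacelike surface and therefore can be either a timelike surface (as for the Kerr
metric) or a null surface (as for the Schwarzschild metric). What we want to show is
that for hypersurface-orthogonal $\xi$ the hypersurface $\mathcal{S}$ is null,
$$\xi \text{ hypersurface-orthogonal} \quad\Rightarrow\quad \mathcal{S} \text{ is null} . \qquad (31.56)$$
Then one has a null surface with a Killing normal and therefore a Killing horizon.
Hypersurface-orthogonality of $\xi$ is expressed by the Frobenius integrability condition
$$\xi_{[\mu}\nabla_\nu\xi_{\lambda]} = 0 . \qquad (31.58)$$
Because $\nabla_\mu\xi_\nu$ is anti-symmetric for a Killing vector, this can be written as
$$\xi_\mu\nabla_\nu\xi_\lambda + \xi_\nu\nabla_\lambda\xi_\mu = \xi_\lambda\nabla_\nu\xi_\mu . \qquad (31.59)$$
Contracting this with $\xi^\lambda$, and using $\xi^\lambda\nabla_\nu\xi_\lambda = \frac12\partial_\nu S$ together with the Killing
equation, we see that on the static limit surface $\mathcal{S}$ we have
$$\xi_\mu\partial_\nu S = \xi_\nu\partial_\mu S \quad (\text{on } \mathcal{S}) . \qquad (31.60)$$
Since
$$V_\mu W_\nu = V_\nu W_\mu \quad\Rightarrow\quad W_\mu \propto V_\mu \qquad (31.61)$$
(provided that neither $V$ nor $W$ is identically zero), we can conclude that, since by our
assumption $\partial_\mu S \neq 0$ on $\mathcal{S}$, we have
$$\partial_\mu S \neq 0 \quad\Rightarrow\quad \xi_\mu \propto \partial_\mu S \quad (\text{on } \mathcal{S}) . \qquad (31.62)$$
Since $\xi$ is null on $\mathcal{S}$, this shows that the normal vector to the surface is a null vector,
and therefore the static limit surface $\mathcal{S}$ is a null surface with a normal Killing vector
and is therefore a Killing horizon, as claimed.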
As the geometrically defined Killing horizon is much easier to work with than the globally
defined event horizon, even in the stationary non-static case, it is common practice to
base investigations of stationary black holes on the Killing horizon. In the following we
will explore some of the more elementary properties of such Killing horizons.
Thus we assume that we are given a Killing vector $\xi$ with a Killing horizon $\mathcal{K}$. Since $\mathcal{K}$ is
a null surface, $\mathcal{K}$ will have all the properties of a general null hypersurface $\mathcal{N}$ described
in section 16. In particular, the integral curves of $\xi$ are the null geodesic generators of
the surface, and there is a function $\kappa(x)$ on $\mathcal{K}$ (the inaffinity) such that
$$\xi^\mu\nabla_\mu\xi^\nu = \kappa\,\xi^\nu . \qquad (31.64)$$
Special features of Killing horizons arise from the fact that these null geodesic generators
are Killing vectors or orbits of the isometry group. Some properties of the inaffinity
$\kappa$ (or surface gravity) will be discussed in section 31.6 below, while the properties of
the generating null congruence of a Killing horizon (and the comparison with those of
a general event horizon) will be the subject of section 31.7.
All in all, Killing horizons turn out to provide a fairly satisfactory characterisation and
description of stationary black holes. In particular they provide the basis of the laws of
black hole mechanics and black hole thermodynamics.116 Nevertheless, this definition
has some shortcomings:
1. First of all, Killing horizons are not necessarily associated with black holes. For
example, the horizon $x = t$ of a Rindler observer ($\xi^1 = \xi^0$ in the notation of section
1.3, but here we use $\xi$ to denote the Killing vector, not inertial coordinates) is a
Killing horizon of the boost Killing vector (1.70)
$$\xi = x\partial_t + t\partial_x . \qquad (31.65)$$
Indeed, the norm
$$\xi_\mu\xi^\mu = t^2 - x^2 \qquad (31.66)$$
vanishes on $x = t$, and the normal to this null surface is
$$N_a = \partial_a(x - t) \quad\Rightarrow\quad N = \partial_x + \partial_t , \qquad (31.67)$$
so that $\xi = tN$ on $x = t$.
116 See e.g. section 12.5 of R. Wald, General Relativity, or section 6 of P. Townsend, Black
Holes, arXiv:gr-qc/9707012v1, G. Compère, An introduction to the mechanics of black holes,
arXiv:gr-qc/0611129, or section 5.5 of E. Poisson, A Relativist's Toolkit for introductions to this
subject.
3. Also, the definition of a Killing horizon requires the existence of a global asymp-
totically timelike Killing vector, and is thus not applicable in situations where
either a Schwarzschild black hole forms from gravitational collapse, say, or where
locally a black hole can be considered to be in equilibrium with its immediate
surroundings but where there is some dynamics far away from the black hole.
4. Nevertheless, if black holes are not intrinsically and unavoidably very non-local
objects, one would expect some version of the laws of black hole mechanics to apply
also to (the stationary portions of) the horizons of such objects. This was one of the
motivations for developing the framework of Isolated Horizons.117 These isolated
horizons can be considered to be a special (null) case of definitions of horizons
based on marginally trapped surfaces which I will briefly discuss later on.
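The Rindler example of item 1 lends itself to a quick numerical check. The following is a minimal sketch, not part of the original notes: it assumes the flat metric $ds^2 = -dt^2 + dx^2$ and the boost Killing vector (31.65), and it evaluates the inaffinity from $\xi^\mu\partial_\mu\xi^\nu = \kappa\,\xi^\nu$ (in inertial coordinates the Christoffel symbols vanish, so partial derivatives suffice). The helper names `xi`, `norm_sq`, `transport` and `inaffinity_on_horizon` are hypothetical.

```python
def xi(t, x):
    """Components (xi^t, xi^x) of the boost Killing vector xi = x d_t + t d_x."""
    return (x, t)


def norm_sq(t, x):
    """xi_mu xi^mu for the flat metric ds^2 = -dt^2 + dx^2."""
    xi_t, xi_x = xi(t, x)
    return -xi_t * xi_t + xi_x * xi_x


def transport(t, x, h=1e-6):
    """xi^mu d_mu xi^nu by central differences (no Christoffel symbols are
    needed in inertial coordinates)."""
    xi_t, xi_x = xi(t, x)
    d_t = [(a - b) / (2.0 * h) for a, b in zip(xi(t + h, x), xi(t - h, x))]
    d_x = [(a - b) / (2.0 * h) for a, b in zip(xi(t, x + h), xi(t, x - h))]
    return tuple(xi_t * p + xi_x * q for p, q in zip(d_t, d_x))


def inaffinity_on_horizon(t):
    """kappa from xi^mu d_mu xi^nu = kappa xi^nu at the horizon point x = t
    (requires t != 0)."""
    return transport(t, t)[0] / xi(t, t)[0]
```

With the normalisation (31.65) this sketch gives an inaffinity of 1 on $x = t$; rescaling $\xi$ by a constant would rescale this value by the same constant.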
As shown in section 16.2, and recalled in the previous section, the null normal vector
field $\ell$ of any null hypersurface $\mathcal{N}$ generates a null geodesic congruence; in particular
one has
$$\ell^\mu\nabla_\mu\ell^\nu = \kappa\,\ell^\nu \qquad (31.70)$$
for some function $\kappa(x)$ called the inaffinity. However, as also discussed in section 16.2,
for a general null hypersurface $\mathcal{N}$ the function $\kappa(x)$ has no particular significance, since
it can be changed (and even made to vanish) by replacing the normal vector $\ell$ by $f\ell$
for some non-vanishing function $f$ on $\mathcal{N}$. In particular, if one chooses $f$ such that
$f\ell$ is affinely parametrised, the corresponding inaffinity vanishes.
117 See e.g. A. Ashtekar, B. Krishnan, Isolated and dynamical horizons and their applications, Living
Rev. Relativity 7, (2004) 10. http://www.livingreviews.org/lrr-2004-10, arXiv:gr-qc/0407042
and J. Engle, T. Liko, Isolated horizons in classical and quantum gravity, arXiv:1112.4412 [gr-qc].
For a Killing vector field $\xi$, however, i.e. for $\mathcal{N} = \mathcal{K}$ a Killing horizon, we only have the
freedom to rescale $\xi$ by a constant, and if we have a preferred normalisation for $\xi$ (as for
the Schwarzschild and Kerr metrics), then $\kappa$ is uniquely determined and is also known
as the surface gravity of the Killing horizon or of the corresponding black hole. In this
section, we will look at some elementary properties of the surface gravity of a black
hole.
While we defined $\kappa$ as the inaffinity (31.70) of the null geodesics generated by the
Killing vector $\xi$ on its Killing horizon $\mathcal{K}$, there are two commonly used alternative ways
of defining and/or determining $\kappa$, and we start by introducing these.
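1. For the 1st alternative definition, let us again assume, as in the proof of the
statement (31.56) in the previous section, that the condition $\xi_\mu\xi^\mu = 0$ actually
defines $\mathcal{K}$, i.e. that $\xi$ is null locally only on $\mathcal{K}$ and not also in some neighbourhood
of $\mathcal{K}$. Thus we can characterise $\mathcal{K}$ in terms of its defining function $S(x) = \xi_\mu(x)\xi^\mu(x)$
as
$$\mathcal{K} = \{x : S(x) = 0\} , \qquad (31.72)$$
and $\partial_\mu S$ is normal to $\mathcal{K}$ and thus necessarily proportional to $\xi_\mu$,
$$\partial_\mu S \propto \xi_\mu . \qquad (31.73)$$
The factor of proportionality follows from the Killing equation and (31.70) for $\xi$,
$$\partial_\mu S = \nabla_\mu(\xi_\nu\xi^\nu) = 2\xi^\nu\nabla_\mu\xi_\nu = -2\xi^\nu\nabla_\nu\xi_\mu = -2\kappa\,\xi_\mu , \qquad (31.74)$$
so that on $\mathcal{K}$
$$\partial_\mu(\xi_\nu\xi^\nu) = -2\kappa\,\xi_\mu . \qquad (31.75)$$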
with $g_{vv} = -f$ and $g_{vr} = 1$, and therefore
$$\xi_\mu|_{\mathcal{K}} = \partial_\mu r . \qquad (31.78)$$
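2. For the 2nd alternative definition, we make use of the fact that, as the normal
vector to $\mathcal{K}$, $\xi$ is hypersurface-orthogonal and therefore satisfies the Frobenius
integrability condition (14.55)
$$\xi_{[\lambda}\nabla_\mu\xi_{\nu]} = 0 . \qquad (31.80)$$
Using the Killing equation, this can be written as
$$\xi_\lambda\nabla_\mu\xi_\nu = \xi_\mu\nabla_\lambda\xi_\nu + \xi_\nu\nabla_\mu\xi_\lambda . \qquad (31.81)$$
Contracting this with $\nabla^\mu\xi^\nu$ and using (31.70) for $\xi$, one finds
$$\xi_\lambda(\nabla^\mu\xi^\nu)(\nabla_\mu\xi_\nu) = (\nabla^\mu\xi^\nu)(\xi_\mu\nabla_\lambda\xi_\nu + \xi_\nu\nabla_\mu\xi_\lambda) = -2\kappa^2\xi_\lambda . \qquad (31.82)$$
Thus at points at which $\xi(x) \neq 0$, one can extract from this that $\kappa$ can alternatively
be defined as (or computed from)
$$\kappa^2 = -\tfrac12(\nabla^\mu\xi^\nu)(\nabla_\mu\xi_\nu) . \qquad (31.83)$$
By continuity this equation can then also be shown to hold at points at which
$\xi(x) = 0$ (and at which then necessarily $(\nabla\xi)(x) \neq 0$, as otherwise $\xi$ would vanish
identically; cf. the argument in section 13.1).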
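The two alternative definitions above can be compared in the simplest example. The sketch below is an illustration under stated assumptions rather than the notes' own computation: it takes the Schwarzschild metric in ingoing Eddington-Finkelstein coordinates, $ds^2 = -f\,dv^2 + 2dvdr + r^2d\Omega^2$ with $f = 1 - 2m/r$ and $G_N = c = 1$, the Killing vector $\xi = \partial_v$ with covariant components $\xi_\mu = (-f, 1, 0, 0)$, and the fact that for a Killing vector $\nabla_\mu\xi_\nu = \frac12(\partial_\mu\xi_\nu - \partial_\nu\xi_\mu)$. All function names are hypothetical.

```python
def f(r, m):
    """Schwarzschild function f(r) = 1 - 2m/r (units G_N = c = 1)."""
    return 1.0 - 2.0 * m / r


def df_dr(r, m, h=1e-6):
    """Central-difference approximation to f'(r)."""
    return (f(r + h, m) - f(r - h, m)) / (2.0 * h)


def kappa_from_norm(m):
    """r-component of d_mu(xi^2) = -2 kappa xi_mu at r = 2m,
    with xi^2 = g_vv = -f and xi_r = g_vr = 1."""
    return 0.5 * df_dr(2.0 * m, m)


def kappa_from_gradient(m):
    """kappa^2 = -(1/2) (nabla^mu xi^nu)(nabla_mu xi_nu) at r = 2m."""
    r_h = 2.0 * m
    a = -0.5 * df_dr(r_h, m)                 # nabla_r xi_v = -f'/2
    low = [[0.0, -a], [a, 0.0]]              # nabla_mu xi_nu, index order (v, r)
    g_inv = [[0.0, 1.0], [1.0, f(r_h, m)]]   # inverse metric in the (v, r) block
    total = 0.0
    for mu in range(2):
        for nu in range(2):
            for al in range(2):
                for be in range(2):
                    total += g_inv[mu][al] * g_inv[nu][be] * low[al][be] * low[mu][nu]
    return (-0.5 * total) ** 0.5
```

Both routes should reproduce the familiar Schwarzschild value $\kappa = 1/(4m)$, up to the finite-difference error of `df_dr`.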
Because $\kappa$ is defined purely geometrically, one can (and should) expect $\kappa(x)$ to be
constant along the isometry directions, i.e. along the null geodesic generators of $\mathcal{K}$,
$$\xi^\mu\partial_\mu\kappa = 0 . \qquad (31.84)$$
This is indeed true and not too difficult to prove, and we will do this below. Interestingly,
in the situations where one has the rigidity theorems mentioned in section 31.5 at
one's disposal one can prove a much stronger statement, namely that $\kappa(x)$ is not only
constant along (the integral curves of) $\xi$, but actually constant all over the Killing
horizon $\mathcal{K}$, but this requires more work. I will briefly come back to this at the end of
this section.
1. using the characterisation (31.70) and the Lie derivative along $\xi$
1. The first proof is essentially a 1-line argument, and uses Lie derivatives. It relies
on the fact that for a Killing vector $\xi$ and any two vector fields $X$, $Y$ one has
$$L_\xi(\nabla_X Y) = \nabla_{L_\xi X}Y + \nabla_X(L_\xi Y) \qquad (31.85)$$
(while for a non-Killing vector field there would be another term arising from the
Lie derivative of the Christoffel symbols, which one could write symbolically as
$(L_\xi\Gamma)_X Y$). Since $L_\xi\xi = [\xi, \xi] = 0$, one has
$$0 = L_\xi(\nabla_\xi\xi) = L_\xi(\kappa\xi) = (L_\xi\kappa)\xi . \qquad (31.86)$$
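2. An alternative argument uses covariant instead of Lie derivatives, and the identity
(12.3) of section 12.1 for the 2nd covariant derivative of Killing vectors, namely
$$\nabla_\lambda\nabla_\mu\xi_\nu = R^\rho_{\;\lambda\mu\nu}\xi_\rho . \qquad (31.87)$$
Armed with this, we act with $\xi^\lambda\nabla_\lambda$ on the defining relation (31.70) for $\xi$. Acting on
the left-hand side we find
$$\begin{aligned} \xi^\lambda\nabla_\lambda(\xi^\mu\nabla_\mu\xi_\nu) &= (\xi^\lambda\nabla_\lambda\xi^\mu)\nabla_\mu\xi_\nu + \xi^\lambda\xi^\mu\nabla_\lambda\nabla_\mu\xi_\nu \\ &= \kappa\,\xi^\mu\nabla_\mu\xi_\nu + R_{\rho\lambda\mu\nu}\xi^\rho\xi^\lambda\xi^\mu \\ &= \kappa^2\xi_\nu , \end{aligned} \qquad (31.88)$$
since the curvature term vanishes because of the anti-symmetry of the Riemann
tensor. Acting on the right-hand side, we have
$$\begin{aligned} \xi^\lambda\nabla_\lambda(\kappa\,\xi_\nu) &= (\xi^\lambda\partial_\lambda\kappa)\xi_\nu + \kappa\,\xi^\lambda\nabla_\lambda\xi_\nu \\ &= (\xi^\lambda\partial_\lambda\kappa)\xi_\nu + \kappa^2\xi_\nu . \end{aligned} \qquad (31.89)$$
Comparing the two results, one finds $\xi^\lambda\partial_\lambda\kappa = 0$.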
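3. The expression (31.83) for $\kappa$ also provides one with a quick alternative proof
along these lines of the constancy of $\kappa$ along the orbits of $\xi$. As a consequence of
(31.87), one has
$$\xi^\lambda\partial_\lambda\kappa^2 = -(\nabla^\mu\xi^\nu)\,\xi^\lambda\nabla_\lambda\nabla_\mu\xi_\nu = -(\nabla^\mu\xi^\nu)R_{\rho\lambda\mu\nu}\xi^\rho\xi^\lambda = 0 . \qquad (31.90)$$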
As mentioned before, it is also possible to show, with some additional hypotheses (most
importantly the so-called dominant energy condition, cf. section 21.1), that $\kappa$ is also
constant along the other (spatial) directions of the horizon. In terms of the adapted
coordinates of section 16.3 this is the statement that
$$E_k^\lambda\partial_\lambda\kappa^2 = -(\nabla^\mu\xi^\nu)R_{\rho\lambda\mu\nu}\xi^\rho E_k^\lambda = 0 . \qquad (31.91)$$
However, the standard proofs of this fact all require some non-trivial or at least
non-obvious gymnastics.118
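$$b_{\alpha\beta} = \tfrac12\theta s_{\alpha\beta} + \tfrac12(b_{\alpha\beta} + b_{\beta\alpha} - \theta s_{\alpha\beta}) + \tfrac12(b_{\alpha\beta} - b_{\beta\alpha}) = \tfrac12\theta s_{\alpha\beta} + \sigma_{\alpha\beta} + \omega_{\alpha\beta} . \qquad (31.92)$$
$$\frac{d}{d\lambda}\theta = -R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} , \qquad (31.93)$$
describing the evolution of the expansion $\theta$ along the congruence generated by $\ell$. In
section 11.5, we had then subsequently extended this to non-affinely parametrised null
congruences, with the result that there is just one additional term involving the inaffinity
$\kappa$ of the congruence (11.129),
$$L_\ell\theta = \frac{d}{d\lambda}\theta = \kappa\theta - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \qquad (31.94)$$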
We will now see that these results simplify drastically when restricted and specialised
to a Killing horizon and the null congruence generating that Killing horizon:
$$\xi \text{ hypersurface-orthogonal} \quad\Rightarrow\quad \omega_{\alpha\beta} = 0 \quad \text{on } \mathcal{K} . \qquad (31.95)$$
118 See e.g. section 12.5 of R. Wald, General Relativity, or section 6 of P. Townsend, Black Holes,
arXiv:gr-qc/9707012v1; for a discussion and a different argument see also I. Racz, R. Wald, Global
Extensions of Spacetimes Describing Asymptotic Final States of Black Holes, arXiv:gr-qc/9507055.
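Because $\xi$ is a Killing vector,
$$\nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha = 0 , \qquad (31.96)$$
the symmetric part of $B_{\alpha\beta} = \nabla_\beta\xi_\alpha$ vanishes, and therefore also its spatial
projection is zero, implying that the shear and expansion of this null congruence are
zero,
$$\sigma_{\alpha\beta} = 0 , \qquad \theta = 0 \quad \text{on } \mathcal{K} . \qquad (31.97)$$
With vanishing rotation, shear and expansion, the Raychaudhuri equation (31.94) reduces to
$$L_\xi\theta = -R_{\alpha\beta}\xi^\alpha\xi^\beta . \qquad (31.98)$$
Since the expansion is zero on $\mathcal{K}$, $\theta = 0$, it does not vary along $\mathcal{K}$, and therefore
also
$$L_\xi\theta = 0 \quad \text{on } \mathcal{K} . \qquad (31.99)$$
Therefore we have
$$R_{\alpha\beta}\xi^\alpha\xi^\beta = 0 \quad \text{on } \mathcal{K} . \qquad (31.100)$$
If the Einstein equations are satisfied, this can be rephrased as the statement that
$$T_{\alpha\beta}\xi^\alpha\xi^\beta = 0 \quad \text{on } \mathcal{K} . \qquad (31.101)$$
This can be interpreted as the statement that there is no flow of matter across
the Killing = event horizon, evidently a necessary condition for a stationary black
hole.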
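It is useful to contrast this with the corresponding equation for a general event horizon
$\mathcal{H}^+$. This is still a null surface, and therefore has a null normal $\ell$ and the corresponding
hypersurface-orthogonal generators. As a consequence, the generating null congruence
of $\mathcal{H}^+$ satisfies
$$\frac{d}{d\lambda}\theta = -\tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} - R_{\alpha\beta}\ell^\alpha\ell^\beta . \qquad (31.102)$$
Here we have chosen $\ell$ to generate affinely parametrised geodesics, as we are free to
in this more general context where $\ell$ is not restricted by the Killing vector condition.
Using the Einstein equations, we can also write this as
$$\frac{d}{d\lambda}\theta = -\tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} - 8\pi G_N T_{\alpha\beta}\ell^\alpha\ell^\beta . \qquad (31.103)$$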
Here the first 2 terms on the right-hand side are manifestly non-positive, and the last
term will also be non-positive provided that the so-called null energy condition (cf.
section 21.1)
$$k_\alpha k^\alpha = 0 \quad\Rightarrow\quad T_{\alpha\beta}k^\alpha k^\beta \geq 0 \qquad (31.104)$$
is satisfied. Thus in that case one has
$$T_{\alpha\beta}\ell^\alpha\ell^\beta \geq 0 \quad\Rightarrow\quad \frac{d}{d\lambda}\theta \leq 0 , \qquad (31.105)$$
i.e. $\theta$ cannot increase. As shown in section 11.4, this implies that if $\theta(\lambda_0) < 0$ for some
value $\lambda_0$ of $\lambda$, then $\theta \to -\infty$ within a finite $\lambda$-interval. This is about as far as possible
from the value $\theta = 0$ of the event horizon of a stationary black hole, and therefore for
any event horizon that asymptotically becomes stationary one must have $\theta \geq 0$. This
is a special case of a much more general result due to Penrose that the generators of an
event horizon (whose definition requires the existence of a well-defined $\mathscr{I}^+$ etc.) have no
future endpoints (and can therefore in particular not develop caustics with $\theta \to -\infty$),
$$\mathcal{H}^+ : \quad \theta \geq 0 . \qquad (31.106)$$
Because $\theta$ measures the change in the cross-sectional area of the null congruence,
$$\frac{d}{d\lambda}\sqrt{s} = \theta\sqrt{s} , \qquad (31.107)$$
we deduce that the cross-sectional area of the generating null congruence of an
asymptotically stationary event horizon cannot decrease,
$$\frac{d}{d\lambda}\sqrt{s} \geq 0 . \qquad (31.108)$$
This is one of the key ingredients in Hawking's celebrated more general Area Theorem
stating that the area of a black hole cannot decrease if the null energy condition is
satisfied.
As shown in section 11.4, we can also write (31.103) as an equation (11.109) for the
change in the expansion rate of the cross-sectional area $\sqrt{s}$ of the congruence, i.e. of
the horizon in the case at hand, namely (using the Einstein equations and setting the
rotation to zero)
$$\frac{d^2}{d\lambda^2}\sqrt{s} = \left(+\tfrac12\theta^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} - 8\pi G_N T_{\alpha\beta}\ell^\alpha\ell^\beta\right)\sqrt{s} , \qquad (31.109)$$
and this equation provides some insight into the behaviour of the event horizon.119 In
particular, one sees that even though (as shown above) $\theta$ cannot increase, the rate of
expansion of the horizon can increase, and will actually increase whenever the 1st term
dominates over the other terms.
One seemingly counterintuitive consequence of this is that the growth rate of the horizon
is largest when there is no matter and that it actually decreases when matter arrives to
cross the event horizon into the black hole. In some sense this reflects, and is commonly
attributed to, the global definition of an event horizon which requires one to know the
entire evolution of the black hole in the future in order to determine the location of the
event horizon at some earlier time.
119 See I. Booth, Black Hole Boundaries, arXiv:gr-qc/0508107, for an illuminating discussion and
more details.
However, the above conclusion is true not just for an event horizon but more generally for
the generating (and thus hypersurface-orthogonal) congruence of any null hypersurface
(perhaps subject to the condition $\theta \geq 0$). It also becomes somewhat less counterintuitive
when one compares it with the behaviour of radial null congruences in Minkowski
space. This example was discussed in Remark 4 of section 11.4, where we observed that
$$\frac{1}{\sqrt{s}}\frac{d^2}{d\lambda^2}\sqrt{s} = \frac{1}{r^2}(\partial_r)^2 r^2 = \frac{2}{r^2} = +\tfrac12\theta^2 . \qquad (31.113)$$
A deviation from this behaviour thus signals the presence of a non-trivial curved space-time
and matter, and matter obeying the null energy condition will have an attractive
focussing effect on lightrays and will therefore decrease the expansion rate of the
congruence, in the case at hand that of the horizon, just as we saw above.
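The Minkowski statement above is easy to confirm numerically. The following minimal sketch (hypothetical function names, not from the notes) assumes an outgoing radial null congruence in flat space with affine parameter $\lambda = r$ and area element $\sqrt{s} \propto r^2$, and compares the relative second derivative of the area element with $\frac12\theta^2$ using finite differences.

```python
def area_element(r):
    """sqrt(s) of a sphere of radius r, up to the angular factor sin(theta)."""
    return r * r


def expansion(r, h=1e-4):
    """theta = (1/sqrt(s)) d sqrt(s)/d lambda, with lambda = r."""
    return (area_element(r + h) - area_element(r - h)) / (2.0 * h) / area_element(r)


def relative_acceleration(r, h=1e-4):
    """(1/sqrt(s)) d^2 sqrt(s)/d lambda^2 by a second central difference."""
    second = (area_element(r + h) - 2.0 * area_element(r) + area_element(r - h)) / (h * h)
    return second / area_element(r)
```

In flat space and without shear the two quantities `relative_acceleration(r)` and `0.5 * expansion(r) ** 2` should agree; any matter obeying the null energy condition can only lower the former relative to the latter.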
Thus, once one has an event horizon, its evolution behaves in a causal and predictable
way (namely according to the Raychaudhuri equation). Nevertheless, the very fact that
an event horizon can start forming in empty space (as e.g. in the collapse of a null shell),
long before any matter has arrived, does reflect the teleological character of the event
horizon.
So far, we have only explicitly considered stationary black holes. New features arise
when one considers truly time-dependent dynamical black hole solutions. In general
this is complicated, of course, but a tractable class of examples is provided by the
so-called Vaidya metrics, already briefly mentioned in section 29.2. We will consider
the ingoing Vaidya metrics, generalisations of the Schwarzschild metric in ingoing
(advanced) Eddington-Finkelstein coordinates with a mass parameter $m = m(v)$ that is
now allowed to depend on the advanced time coordinate $v$,
$$ds^2 = -f(v, r)dv^2 + 2dvdr + r^2 d\Omega^2 , \qquad f(v, r) = 1 - \frac{2m(v)}{r} . \qquad (31.114)$$
We will make use of the following properties of these metrics (for a more detailed
discussion of Vaidya metrics see sections 39 - 41):
1. These metrics are spherically symmetric, and they are written in coordinates that
are adapted to this spherical symmetry and to ingoing radial lightrays, i.e. the
lines of constant v (and constant angular coordinate) are ingoing lightrays, with
r an affine parameter along these lightrays.
2. These metrics are solutions to the Einstein equations with a null energy-momentum
tensor of the form (39.9)
$$T_{\alpha\beta} = \frac{m'(v)}{4\pi G_N r^2}\,\delta^v_\alpha\delta^v_\beta . \qquad (31.115)$$
The null energy condition requires
$$m'(v) \geq 0 , \qquad (31.116)$$
so that the mass $m(v)$ cannot decrease. These solutions can describe null dust (or
incoherent radiation) either entering or forming a black hole.
3. A particular (but singular) example of this was the collapsing null shell of section
28.1 with mass function (28.5),
$$m(v) = m_f\,\Theta(v - v_0) . \qquad (31.117)$$
(a) A concrete and analytically tractable example is provided by the linear mass
function (choosing $v_0 = 0$ for notational convenience)
$$m(v) = \begin{cases} 0 & v \leq v_0 = 0 \\ (m_f/v_1)\,v & 0 \leq v \leq v_1 \\ m_f & v \geq v_1 \end{cases} \qquad (31.119)$$
In this case, the energy-momentum tensor $\sim m'(v)$ has jumps (discontinuities)
at $v = v_0$ and $v = v_1$. We will look at this class of models (and variants
thereof) in detail in section 41.
(b) One can of course also choose mass functions such that the metric and its 1st
derivative with respect to $v$ are continuous. A simple (and common) choice
is
$$m(v) = m_f\,\frac{v^2}{v^2 + T^2} \quad \text{for } v \geq 0 , \qquad (31.120)$$
with
In the case of the Schwarzschild metric, the null hypersurface $r = 2m$ described the
characteristic event horizon of a static black hole. It is evident that also for the general
ingoing Vaidya metric something special happens at those points of space-time where
$f(v, r) = 0$. We can also write this condition equivalently in the equally familiar form
$$r = 2m(v) \quad\Leftrightarrow\quad g^{rr} = 0 . \qquad (31.123)$$
However, as we will discuss now, this is not the event horizon of the Vaidya black hole.
Rather, depending on who one talks to this hypersurface is known
(these terms will be explained in section 31.9 below), and it is distinct from the event
horizon unless m(v) = m0 is constant.
In the present context it is first of all again the locus where the lightcones tilt over, i.e.
the boundary between the region where the so-called outgoing future-oriented
lightrays are really locally outgoing in the sense that they move to larger values of
$r$, $dr/d\lambda > 0$, and the region where also the supposedly outgoing future-oriented
lightrays move to smaller values of $r$, $dr/d\lambda < 0$. This can be seen directly e.g. from
the condition
$$-f(v, r)dv + 2dr = 0 \quad\Leftrightarrow\quad 2\frac{dr}{dv} = f(v, r) \qquad (31.124)$$
for outgoing ($v$ not constant) lightrays in ingoing coordinates. Then one sees that
$$\frac{dr}{dv} = \tfrac12 f(v, r) \quad \begin{cases} > 0 & \text{for } r > 2m(v) : \text{ truly outgoing} \\ < 0 & \text{for } r < 2m(v) : \text{ actually ingoing} \end{cases} \qquad (31.125)$$
As in our discussion of the analogous phenomenon for the Schwarzschild metric in
Eddington-Finkelstein coordinates in section 26.4, we can rephrase this more geometrically
and invariantly in terms of the expansion of null congruences. To that end we
introduce the radial null vector fields
$$\ell = \partial_v + \tfrac12 f(v, r)\partial_r , \qquad n = -\partial_r \qquad (31.126)$$
with
$$n^2 = \ell^2 = 0 , \qquad n.\ell = -1 , \qquad (31.127)$$
which are the obvious Vaidya counterparts of the null vector fields introduced in (26.94)
for the Schwarzschild metric. Also the expansions turn out to be exactly like their
Schwarzschild counterparts. In order to determine the expansions, we can consider the
2-spheres $S = S_{v,r}$ of constant $r$ and $v$. The intrinsic geometry is characterised by the
induced metric, in particular by the induced volume element
$$\sqrt{s} = r^2\sin\theta . \qquad (31.128)$$
Because of the spherical symmetry the extrinsic geometry of the 2-sphere can be completely
characterised by the fractional change of the area element along $\ell$ and $n$, i.e. by
the expansions
$$\theta_\ell = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} , \qquad \theta_n = \frac{1}{\sqrt{s}}L_n\sqrt{s} . \qquad (31.129)$$
Concretely, using (31.128), one finds for the expansions (cf. (11.151) and (11.152))
$$\begin{aligned} \theta_\ell &= \frac{\ell^\alpha\partial_\alpha\sqrt{s}}{\sqrt{s}} = \frac{2}{r}\,\ell^\alpha\partial_\alpha r = \frac{2}{r}\,\ell^r \\ \theta_n &= \frac{n^\alpha\partial_\alpha\sqrt{s}}{\sqrt{s}} = \frac{2}{r}\,n^\alpha\partial_\alpha r = \frac{2}{r}\,n^r . \end{aligned} \qquad (31.130)$$
These expansions are therefore a measure of the change of $r$ (and hence the induced
area) along the null directions $n$ and $\ell$. Since $n$ is ingoing, one expects $\theta_n < 0$, and this
expectation is indeed borne out in the ingoing Vaidya metric, for which one has, from
(31.126), $n^r = -1 < 0$, and thus
$$\theta_n = -\frac{2}{r} < 0 . \qquad (31.131)$$
This is independent of the mass function $m(v)$ and therefore, in particular, identical to
the inward expansion (perhaps better: contraction) of a sphere of constant $t$ and $r$ in
Minkowski space along an ingoing radial congruence of light rays.
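$$\ell = \partial_v + \tfrac12 f(v, r)\partial_r \quad\Rightarrow\quad \theta_\ell = \frac{r - 2m(v)}{r^2} . \qquad (31.132)$$
Thus
$$\theta_\ell \quad \begin{cases} > 0 & \text{for } r > 2m(v) \\ = 0 & \text{for } r = 2m(v) \\ < 0 & \text{for } r < 2m(v) \end{cases} \qquad (31.133)$$
Now in general for a 2-surface $S$ with $\theta_n < 0$
$$S \text{ is called} \quad \begin{cases} \text{untrapped} & \text{if } \theta_\ell > 0 \\ \text{marginally trapped} & \text{if } \theta_\ell = 0 \\ \text{trapped} & \text{if } \theta_\ell < 0 \end{cases} \qquad (31.134)$$
and thus we can rephrase the above result as the statement that for the Vaidya metric
$$S_{v,r} \text{ is} \quad \begin{cases} \text{untrapped} & \text{for } r > 2m(v) \\ \text{marginally trapped} & \text{for } r = 2m(v) \\ \text{trapped} & \text{for } r < 2m(v) \end{cases} \qquad (31.135)$$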
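The classification above translates directly into a few lines of code. This is a minimal sketch with hypothetical function names; it assumes the ingoing Vaidya expansions $\theta_\ell = (r - 2m(v))/r^2$ and $\theta_n = -2/r$ quoted in the text, and takes the value of the mass function at the given $v$ as an input.

```python
def expansion_out(r, m_v):
    """theta_l = (r - 2 m(v)) / r^2 for the outgoing radial null congruence."""
    return (r - 2.0 * m_v) / (r * r)


def expansion_in(r):
    """theta_n = -2 / r for the ingoing radial null congruence."""
    return -2.0 / r


def classify_sphere(r, m_v):
    """Classify the sphere S_{v,r}, given r and the mass m(v) at that v."""
    theta_l = expansion_out(r, m_v)
    if theta_l > 0.0:
        return "untrapped"
    if theta_l < 0.0:
        return "trapped"
    return "marginally trapped"
```

For example, `classify_sphere(3.0, 1.0)` gives `"untrapped"` and `classify_sphere(1.0, 1.0)` gives `"trapped"`, while `expansion_in` is negative for every $r > 0$ regardless of the mass function.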
Remarks:
1. The null vector field $n = -\partial_r$ is future oriented and ingoing, and in terms of $n$
the energy-momentum tensor of the Vaidya metric takes the characteristic ingoing
form (see (39.57) and the general discussion in section 39.4)
$$T_{\alpha\beta} = \rho_{in}\,n_\alpha n_\beta . \qquad (31.136)$$
$$T_{\alpha\beta}\ell^\alpha\ell^\beta = \rho_{in} \geq 0 . \qquad (31.137)$$
$$n^\alpha\nabla_\alpha n^\beta = 0 \quad\Rightarrow\quad \kappa_n = 0 . \qquad (31.138)$$
For $\ell$, on the other hand, one finds (see (40.8) in section 40.1, which contains a
general discussion of Vaidya null geodesics)
$$\kappa_\ell = \frac{m(v)}{r^2} , \qquad (31.139)$$
which is again the obvious Vaidya counterpart of the Schwarzschild expression.
However, as we will see below, if (or where) $m(v)$ is not locally constant, $\ell$ is not
tangent to $r = 2m(v)$. Thus it is not immediately obvious if this expression can
have a useful interpretation as the surface gravity of the Vaidya metric.120
$$\tilde{t} = v - r , \qquad (31.141)$$
which is modelled on (and reduces to) the Kerr-Schild (or Eddington) time-coordinate
$\tilde{t}$ defined by
$$v = t + r^* = \tilde{t} + r \qquad (31.142)$$
for the Schwarzschild metric and introduced in (26.106).
It is good to keep in mind, however, that, while there is evidently a unique solution
of $r = 2m(v)$ for a given $v$ (i.e. on a slice of constant $v$), the solution need
not be unique on a slice of constant $\tilde{t}$. For example, if $m(v) \sim v^k$, say, then
substituting $v = \tilde{t} + r$ in the condition $r = 2m(v)$, for a fixed $\tilde{t}$ one obtains a
polynomial equation of degree $k$ for $r$. Moreover, the number of real and positive
solutions to this equation may also jump as one varies $\tilde{t}$, leading to a perhaps
unexpected behaviour and evolution of (marginally) trapped surfaces when viewed
in a foliation by spacelike hypersurfaces.
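The jump in the number of solutions can be made explicit in the simplest non-trivial case. The sketch below is only an illustration: it assumes a quadratic mass function $m(v) = c\,v^2$ for $v \geq 0$ (so $k = 2$, with $c$ an arbitrary constant), writes the time coordinate $v - r$ of the slicing as `t`, and solves the resulting quadratic $2c\,r^2 + (4ct - 1)r + 2ct^2 = 0$ for the radii of the marginally trapped spheres on a slice of constant `t`. The function name is hypothetical.

```python
import math


def marginal_radii(t, c=1.0):
    """Positive solutions r of r = 2 m(t + r) with m(v) = c v^2 (v >= 0),
    on the slice where v - r = t."""
    a = 2.0 * c
    b = 4.0 * c * t - 1.0
    q = 2.0 * c * t * t
    disc = b * b - 4.0 * a * q          # equals 1 - 8 c t
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    candidates = {(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)}
    # keep only radii with r > 0 that lie in the region v = t + r >= 0
    return sorted(r for r in candidates if r > 0.0 and t + r >= 0.0)
```

With $c = 1$ there are two marginally trapped spheres on the slice `t = 0.1`, none on the slice `t = 0.2`, and the two merge and disappear at `t = 1/8`, where the discriminant vanishes.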
with
$$f(v, r) = 1 - \frac{2m(v, r)}{r} , \qquad (31.144)$$
where the mass function $m(v, r)$ can be invariantly characterised as the
Misner-Sharp mass (23.75)
$$M_{MS}(z) \equiv m(z) = \frac{r(z)}{2}\left(1 - g^{ab}(z)\partial_a r(z)\partial_b r(z)\right) . \qquad (31.145)$$
120 For some attempts to define surface gravity for non-Killing horizons, see e.g. A. Nielsen, M. Visser,
Production and decay of evolving horizons, arXiv:gr-qc/0510083, A. Nielsen, J. Yoon, Dynamical
surface gravity, arXiv:0711.1445 [gr-qc], M. Pielahn, G. Kunstatter, A. Nielsen, Dynamical Surface
Gravity in Spherically Symmetric Black Hole Formation, arXiv:1103.0750 [gr-qc], B. Cropp, S.
Liberati, M. Visser, Surface gravities for non-Killing horizons, arXiv:1302.2383 [gr-qc].
Also in this case the spheres $S_{v,r}$ with
In the present case we are thus also led to consider the union of all the marginally
trapped spheres (as $v$ varies),
First of all, let us obtain some more information about $\mathcal{T}$ and, in passing, introduce
some (actually quite a bit of) terminology:121
In the present case, $\mathcal{T}$ has the additional property that as one moves inwards, i.e.
along $n$, the expansion $\theta_\ell$ decreases, i.e. $\theta_\ell$ becomes negative inside of the MTT
$\mathcal{T}$. Specifically, we have
$$(L_n\theta_\ell)|_{r=2m(v)} = -\partial_r\left(\frac{r - 2m(v)}{r^2}\right)\Big|_{r=2m(v)} = -\frac{1}{r^2} < 0 . \qquad (31.149)$$
This means that just inside $\mathcal{T}$ there are genuinely trapped surfaces with $\theta_\ell < 0$.
An MTT with this property is called a future outer trapping horizon (FOTH)
in the terminology introduced by Hayward in influential early work on trapped
surfaces and associated notions of horizons.122
It is also useful to explicitly construct the tangent vector field to $\mathcal{T}$ that connects
the different MTSs (specifically, that connects the points with the same values of
$\theta$ and $\phi$ as $v$ varies), i.e. the evolution vector field of the MTSs.123 This is the
purely radial linear combination
$$E = A\ell + Bn \qquad (31.151)$$
so that $B = -2m'(v) < 0$ and $E_\alpha E^\alpha = 4m'(v)$,
which confirms that $\mathcal{T}$ is spacelike (null) where $m'(v) > 0$ ($m'(v) = 0$).
$$E_\alpha E^\alpha = -2B \qquad (31.156)$$
and therefore $\mathcal{T}$ is spacelike for $B < 0$, null for $B = 0$ and timelike for $B > 0$. In
the null case, $E = \ell$ is akin to the usual null tangent and normal of an event or
Killing horizon.
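$$\theta_E = B\theta_n , \qquad (31.159)$$
and thus with $\theta_n < 0$ the sign of the expansion of $\mathcal{T}$ is correlated with the signature
of $\mathcal{T}$,
$$\begin{array}{lcl} B < 0 & \Leftrightarrow & \mathcal{T} \text{ spacelike and expanding} \\ B = 0 & \Leftrightarrow & \mathcal{T} \text{ null and constant area} \\ B > 0 & \Leftrightarrow & \mathcal{T} \text{ timelike and contracting} . \end{array} \qquad (31.160)$$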
quasi-local geometric notions of horizons have been intensely studied in recent
years and are still an active area of research.124
Finally, it is or was quite common to use the term apparent horizon in this context,
as a notion of a horizon associated with trapped surfaces (and a choice of foliation
of the space-time by spacelike hypersurfaces). Because of various technical
complications125 and because of the difficulty in locating the apparent horizon
even in situations where it is well-defined, the precise definition of an apparent
horizon has been pretty much abandoned in favour of those given above (and will
therefore not be given here).
In practice, nowadays the term apparent (3-)horizon appears to be used as synonymous
with, say, the outermost surface with $\theta_\ell = 0$ or the MTT consisting of
such surfaces.
3. this MTT $\mathcal{T}$ is
(a) null and of constant area and consists of isolated horizon sections where
$m'(v) = 0$,
(b) spacelike and expanding and consists of dynamical horizon sections where
$m'(v) > 0$;
4. the MTT $\mathcal{T}$ is also a future outer trapping horizon (FOTH) in the sense of Hayward.
124 See e.g. the following review articles (and references therein and thereto): A. Ashtekar, B. Krishnan,
Dynamical Horizons and their Properties, arXiv:gr-qc/0308033; A. Ashtekar, B. Krishnan,
Isolated and dynamical horizons and their applications, Living Rev. Relativity 7, (2004) 10.
http://www.livingreviews.org/lrr-2004-10, arXiv:gr-qc/0407042; I. Booth, Black Hole Boundaries,
arXiv:gr-qc/0508107; A. Nielsen, Black holes and black hole thermodynamics without event horizons,
arXiv:0809.3850 [hep-th]; J. Engle, T. Liko, Isolated horizons in classical and quantum gravity,
arXiv:1112.4412 [gr-qc]; B. Krishnan, Quasi-local black hole horizons, arXiv:1303.4635 [gr-qc].
125 See e.g. section 1.6 of P. Chrusciel, Black Holes, arXiv:gr-qc/0201053.
In this summary I have emphasised spherical symmetry. Indeed, the MTSs and the
MTT T identified above are not unique:
3. and one can also study them from the point of view of non-spherically symmetric
slicings of space-time into spacelike hypersurfaces.
We will return to the 1st and 2nd items in the discussion in section 31.13 below. For
the 3rd, note that one simple (axially but) not spherically symmetric choice of slicing
is provided by modifying (31.141) to
$$\tilde{t} = v - r - \epsilon\,r\cos\theta , \qquad (31.161)$$
where $\epsilon$ is a constant indicating how far from spherical symmetry the constant $\tilde{t}$
surfaces are.126
In the previous subsection we have tentatively identified and defined various (quasi-)local
geometric notions of black hole horizons or black hole boundaries based on trapped
surfaces. These local geometric notions of a black hole horizon need to be distinguished
from the global causal notion of a true event horizon, the boundary of the past of future
null infinity $\mathscr{I}^+$ (see section 31.4), whose existence is usually taken to be the defining
characteristic of a black hole. In the remainder of this section, we will look at various
aspects of the relation between these two concepts of horizons, by way of examples and
some general remarks.
Since, by its definition as a causal boundary, the event horizon is a null surface, it is
already evident from the above examples (with T spacelike when/where $m'(v) > 0$),
126
This slicing and corresponding MTSs and MTTs have been investigated in A. Nielsen, M. Jasiulek,
B. Krishnan, E. Schnetter, The slicing dependence of non-spherically symmetric quasi-local horizons in
Vaidya Spacetimes, arXiv:1007.2990 [gr-qc].
that in general a (future, outer) trapping horizon (FOTH) or a marginally trapped tube
(MTT) will not coincide with the event horizon (and by definition a dynamical horizon
cannot coincide with the event horizon).
This is also easy to understand intuitively. The event horizon is much more of a global
and subtle (teleological) object than, say, the apparent horizon, the spherically sym-
metric MTT T , which we have been able to determine without any effort. In order to
determine the event horizon, it is not enough to know if a lightray locally or instantaneously
moves to larger values of r (this local information is completely captured by
the expansions $\theta_n$ and $\theta_\ell$). In order to be able to assert that this lightray will reach
infinity, i.e. $\mathcal{I}^+$, one needs to make sure that it continues to move to larger values of r
in the future. As the future behaviour of the lightray depends on the future evolution
of the geometry (e.g., in the present context, on the form of the mass function m(v)),
it is clear that the location of the event horizon at a given time cannot be determined
without knowing the (entire!) future evolution of that space-time.
More specifically, in the present context of the Vaidya metric, if one has an initially
really outgoing lightray at some time $v_i$, i.e. at some $r_i > 2m(v_i)$, it is not guaranteed
that this lightray will remain at r > 2m(v) for all v. If it crosses the apparent horizon
r = 2m(v) at some later time v, it reaches a local maximum of r there,
and then (at least at first) returns to smaller values of r. In particular, what may have
appeared initially to be a safe radial distance (where one can send lightrays outwards
locally) can become unsafe in the future if the mass increases ($m'(v) > 0$, as we are
assuming).
Depending on m(v), this can be done either analytically (for a linear mass function see
section 41) or numerically.
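As a rough numerical illustration of the second option, the following sketch integrates the outgoing radial null rays of the Vaidya metric, $dr/dv = \frac{1}{2}(1 - 2m(v)/r)$, backwards in v from the final Schwarzschild horizon. The linear ramp for m(v) and the values of V0, V1 and M_F are assumptions chosen purely for illustration (with slope 1/4, so that 16 times the slope exceeds 1); they are not taken from the notes.

```python
# Sketch: event horizon of a Vaidya space-time by backward integration.
# V0, V1, M_F and the linear ramp are illustrative assumptions.
V0, V1, M_F = 0.0, 4.0, 1.0  # ramp slope 0.25, so 16 * slope > 1


def mass(v):
    """Mass function m(v): 0 for v <= V0, linear ramp, M_F for v >= V1."""
    if v <= V0:
        return 0.0
    if v >= V1:
        return M_F
    return M_F * (v - V0) / (V1 - V0)


def drdv(v, r):
    """Outgoing radial null rays: dr/dv = (1 - 2 m(v)/r) / 2."""
    return 0.5 * (1.0 - 2.0 * mass(v) / r)


def rk4_step(v, r, h):
    """One classical Runge-Kutta step (h < 0 integrates backwards in v)."""
    k1 = drdv(v, r)
    k2 = drdv(v + 0.5 * h, r + 0.5 * h * k1)
    k3 = drdv(v + 0.5 * h, r + 0.5 * h * k2)
    k4 = drdv(v + h, r + h * k3)
    return r + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def event_horizon(n_steps=4000):
    """Follow the null ray ending at r = 2 M_F at v = V1 back to v = V0."""
    h = -(V1 - V0) / n_steps
    r = 2.0 * M_F
    points = [(V1, r)]
    for i in range(n_steps):
        v = V1 + i * h
        r = rk4_step(v, r, h)
        points.append((v + h, r))
    return points


horizon = event_horizon()
r_at_v0 = horizon[-1][1]      # horizon radius when the matter starts falling in
v_birth = V0 - 2.0 * r_at_v0  # flat region: r = (v - v_birth)/2 vanishes at v_birth
print(f"r_eh(V0) = {r_at_v0:.4f}, horizon starts at v = {v_birth:.4f}")
```

In this example the event horizon already has a non-zero radius at v = V0, i.e. it has started growing in the flat region, and for V0 < v < V1 it lies outside the apparent horizon r = 2m(v).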
A typical Penrose diagram for the Vaidya space-time, here for the (prototypical) class
of examples (31.118),
$m(v) = \begin{cases} 0 & v \le v_0 \\ m_f & v \ge v_1 \end{cases}$ (31.164)
is given in Figure 40.
As an aside: to really end up with a black hole space-time as displayed in the Figure,
i.e. in order to avoid the formation of a naked singularity in this collapse, one has to
impose the peculiar condition
See section 41 for a derivation of this for the linear mass dependence with $m(v) = \mu v$,
leading to the requirement $16\mu > 1$ in this case, and compare with the Penrose diagrams
in Figures 59 - 61.
Returning to Figure 40, note that, in particular, and as we already saw in the Penrose
diagrams describing the thin null shell or Oppenheimer-Snyder collapse, an event horizon
can exist even in flat regions of space-time, and starts growing from r = 0 in anticipation
of matter falling in to form a black hole at a later time. Spherically symmetric trapped
surfaces and the spherically symmetric MTT T , on the other hand, exist only in the
region v > v0 .
Because T is described by the equation r = 2m(v), the above choice of mass function
implies that T starts off at r = 0 at $v = v_0$, grows to $r = 2m_f$ at $v = v_1$ and agrees
with the Schwarzschild event horizon at $r = 2m_f$ in the Schwarzschild region $v > v_1$.
It is particularly easy to describe this evolution and growth of the event horizon in the
case of the collapsing spherical shell of null matter in Minkowski space discussed in
section 28.1, with metric (28.3)
$ds^2 = -f(v,r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2$ , $f(v,r) = 1 - \frac{2m_f}{r}\Theta(v)$ . (31.166)
1. For v > 0, i.e. outside the shell, the metric is the Schwarzschild metric and the
event horizon is simply the Schwarzschild event horizon at r = rs = 2mf .
2. Moreover, since outside the shell the geometry is the static Schwarzschild geom-
etry, trapped surfaces exist everywhere in the region r < 2mf , the hypersurface
r = 2mf is foliated by spherically symmetric MTSs and thus outside the shell
Figure 40: Event Horizon vs Apparent Horizon (Marginally Trapped Tube) T for the
Vaidya metric, with infalling null matter in the interval $[v_0, v_1]$. For $v < v_0$, the geometry
is that of Minkowski space, for $v \in [v_0, v_1]$ the geometry is described by the Vaidya
metric, and for $v > v_1$ one has the Schwarzschild geometry with final mass $m = m_f$.
The Event Horizon starts growing from r = 0 in the flat region and is described by
$r = 2m_f$ in the Schwarzschild region. The Apparent Horizon (MTT) T is described by
r = 2m(v). Thus it starts off at r = 0 at $v = v_0$ and reaches $r = 2m_f$ at $v = v_1$, after
which it agrees with the Event Horizon. In the interval $[v_0, v_1]$, T is spacelike whenever
$m'(v) > 0$.
3. To determine the event horizon in the interior of the shell, i.e. for v < 0, one needs
to determine the $S^2$-family of outgoing radial lightrays in Minkowski space which
reaches $r_s = 2m_f$ at v = 0, and thus connects to the exterior event horizon at the
locus v = 0 of the shell.
Outgoing lightrays in ingoing Minkowski coordinates (v, r) are described by $dv = 2\,dr$,
i.e. by
$r(v) = v/2 + c/2$ . (31.168)
At v = 0 one has
$r(v = 0) = c/2 \overset{!}{=} r_s = 2m_f \;\Rightarrow\; c = 2r_s$ . (31.169)
Therefore the event horizon is described parametrically by
$r(v) = v/2 + r_s$ , (31.170)
which starts growing from r = 0 at the time $v = -2r_s$, before the shell has arrived
or crossed its Schwarzschild radius.
4. By contrast, there are no spherically symmetric (we will come back to this qualifier
below) MTTs for $v \le 0$, and thus the corresponding marginally trapped tube
(apparent horizon) T is absent for v < 0.
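The four items above can be condensed into a small sketch. The function names are hypothetical and only serve the illustration; the shell is placed at v = 0, as in (31.166).

```python
def shell_event_horizon(v, m_f):
    """Event horizon radius at advanced time v, or None before it forms."""
    r_s = 2.0 * m_f
    if v >= 0.0:
        return r_s              # Schwarzschild region outside the shell
    if v < -2.0 * r_s:
        return None             # the horizon has not started growing yet
    return 0.5 * v + r_s        # flat interior of the shell, cf. (31.170)


def shell_apparent_horizon(v, m_f):
    """Spherically symmetric MTT: present only outside the shell (v > 0)."""
    return 2.0 * m_f if v > 0.0 else None


# For m_f = 1 the horizon starts at v = -4, grows with slope 1/2,
# and joins r = 2 at the shell.
for v in (-5.0, -4.0, -2.0, 0.0, 1.0):
    print(v, shell_event_horizon(v, 1.0), shell_apparent_horizon(v, 1.0))
```

The event horizon is present for all $v \ge -2r_s$, whereas the spherically symmetric apparent horizon only appears once the shell has passed.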
Figure 41: Event Horizon vs Apparent Horizon (Marginally Trapped Tube) T in the
collapse of a thin null shell to a black hole. The worldline of the shell is given by the line
v = v0 . In the region v < v0 inside the shell the geometry is that of Minkowski space;
the geometry outside the shell is Schwarzschild. Formation of the black hole occurs
when the shell crosses the event horizon H+ . The event horizon starts growing from
r = 0 in the flat Minkowski region and is situated at r = 2mf outside the shell; the
Apparent Horizon exists only outside the shell, and agrees with the Event Horizon there.
The point indicated by a bullet represents a spherically-symmetric trapped sphere, and
there are such trapped spheres for all points in the region v > v0 , 0 < r < 2mf .
globally the boundary of the region of space-time that is causally connected to infinity)
in a time-dependent geometry.
The exterior geometry is given by the Schwarzschild metric and the interior geometry
by a solution of the Friedmann equations describing a collapsing sphere of dust. In
Painlevé-Gullstrand(-like) coordinates, the metric can be written as (see section 28.6)
$ds^2 = \begin{cases} -d\tau^2 + (dr + \sqrt{2m/r}\,d\tau)^2 + r^2 d\Omega^2 & r > R(\tau) \\ -d\tau^2 + (dr - rH(\tau)\,d\tau)^2 + r^2 d\Omega^2 & r < R(\tau) \end{cases}$ (31.171)
$H(\tau) = \frac{2}{3}\tau^{-1} < 0$ , (31.172)
This solution describes a collapsing dust star for $\tau < 0$, collapsing to zero radius at time
$\tau = 0$.
As the exterior geometry is just the Schwarzschild geometry, in the exterior region the
event = apparent horizon is the null surface r = 2m, coming into existence at the time
$\tau = \tau_f$ when the star crosses its Schwarzschild radius, i.e. at the time $\tau_f$ given by
The interest is therefore in the formation and evolution of horizons in the interior of the
star. In order to explore the causal structure of this (spherically symmetric) solution,
we look at radial null rays, characterised by
Since H < 0, the former describe ingoing radial null geodesics because
$\frac{dr}{d\tau} = (-1 + rH) < 0$ (ingoing) , (31.177)
while the latter, satisfying
$\frac{dr}{d\tau} = (+1 + rH)$ (outgoing) , (31.178)
describe truly outgoing radial null geodesics only for $r < -1/H$, while these geodesics
are also ingoing for $r > -1/H$. Thus there are marginally trapped spheres inside the
star, centered at r = 0 and with radius $r = -1/H$. These define an apparent horizon
or a marginally trapped tube T inside the star, at (recall that $\tau < 0$)
which shows that the apparent horizon is a timelike hypersurface in this case. It is
not clear if such an object deserves to be called a horizon at all, and the terminology
timelike membrane has been proposed for a timelike hypersurface foliated by marginally
trapped surfaces.
In order to determine the event horizon, we need to determine the interior outgoing
lightrays that reach the surface of the star just as the surface of the star passes through
its Schwarzschild radius, i.e. at the time $\tau = \tau_f$ determined in (31.174).
$\dot r = 1 + rH = 1 + 2r/(3\tau)$ . (31.181)
$r(\tau) = 3\tau \;\Rightarrow\; \dot r = 1 + 2r/(3\tau)$ . (31.183)
$r(\tau) = 3\tau + c_0 (-\tau)^{2/3}$ . (31.184)
The integration constant is determined by selecting the outgoing lightray with $r(\tau_f) = 2m$,
$r(\tau_f) = 2m \;\Rightarrow\; c_0 = 3C$ , (31.185)
with C defined in (31.173), leading to the parametric equation
$r_{eh}(\tau) = 3[\tau + R(\tau)]$ (31.186)
Collecting our intermediate results, we see that the surface of the star, the apparent
horizon and the event horizon are described by
Figure 42: Event Horizon vs Apparent Horizon (Marginally Trapped Tube) T for the
Oppenheimer-Snyder collapse geometry. The shaded region is the interior of the star,
the surface of the star follows $r = R(\tau)$. Outside the star, one has the Schwarzschild
geometry with event horizon = apparent horizon the null hypersurface at r = 2m and
the region r < 2m contains spherically symmetric trapped surfaces. The Event Horizon
starts growing from r = 0 in the interior of the star. Inside the star, there is also a
timelike marginally trapped tube, a Timelike Membrane, at $r = r_{ah}(\tau)$ that starts at
r = 2m as the star crosses its Schwarzschild radius, and then shrinks to r = 0 at the
time $\tau = 0$ of complete collapse. At time $\tau$, spheres inside the star centered at r = 0
and with radius $r > r_{ah}(\tau)$ are trapped.
This agrees with the results reported in the reference in footnote 94 in section 28.6.
$r_{eh}(\tau = \tau_f) = 2m$ (31.188)
as it should be.
2. The event horizon starts growing from the non-singular center of the star r = 0
at the time $\tau = \tau_i < 0$ determined by
with
$\dot r_{eh}(\tau_i) = 1$ , $\dot r_{eh}(\tau_f) = 0$ (31.192)
so the apparent horizon starts forming as the star crosses its Schwarzschild radius
at $\tau = \tau_f$. Indeed for $\tau < \tau_f$ one would have $r_{ah}(\tau) > R(\tau)$, but this would be
outside the star (and we determined the apparent horizon by studying the interior
lightrays). It then shrinks from r = 2m at $\tau = \tau_f$ to r = 0 at $\tau = 0$.
4. Thus for $\tau_f < \tau < 0$ the apparent horizon has two branches, one inside the star
and one outside. Inside the star, trapped spheres only occur outside the apparent
horizon, i.e. in the region between the apparent horizon and the surface of the
star. Outside the star, they of course occur in the Schwarzschild black hole region
r < reh = 2m.
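The statements in this list can be checked numerically. The sketch below assumes the standard marginally bound dust-collapse surface $R(\tau) = C(-\tau)^{2/3}$ with $C = (9m/2)^{1/3}$ (an assumption made here because (31.173) is not reproduced in this passage), together with $r_{ah} = -1/H = -3\tau/2$ and $r_{eh} = 3[\tau + R(\tau)]$ from (31.186). All function names are hypothetical.

```python
M = 1.0                              # mass of the star (illustrative value)
C = (9.0 * M / 2.0) ** (1.0 / 3.0)   # assumed dust-collapse constant


def R(tau):
    """Surface of the star, R = C (-tau)^(2/3), for tau < 0."""
    return C * (-tau) ** (2.0 / 3.0)


def r_ah(tau):
    """Interior apparent horizon r = -1/H with H = 2/(3 tau)."""
    return -1.5 * tau


def r_eh(tau):
    """Interior event horizon, cf. (31.186)."""
    return 3.0 * (tau + R(tau))


def r_eh_dot(tau):
    """tau-derivative of r_eh: 3 + 3 dR/dtau = 3 - 2 C (-tau)^(-1/3)."""
    return 3.0 - 2.0 * C * (-tau) ** (-1.0 / 3.0)


tau_f = -(2.0 * M / C) ** 1.5   # star crosses r = 2M: R(tau_f) = 2M
tau_i = -C ** 3                 # event horizon starts: r_eh(tau_i) = 0

print("tau_f =", tau_f, " tau_i =", tau_i)
print("R, r_ah, r_eh at tau_f:", R(tau_f), r_ah(tau_f), r_eh(tau_f))
print("slopes of r_eh at tau_i, tau_f:", r_eh_dot(tau_i), r_eh_dot(tau_f))
```

Under these assumptions all three curves meet at r = 2m at $\tau = \tau_f$, and for $\tau_f < \tau < 0$ the apparent horizon lies inside the star.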
In the above examples, we have seen that in general dynamical situations marginally
trapped tubes or trapping horizons will typically not coincide with the event horizon.
Moreover, from what we have seen so far, locally nothing particularly untoward or
dangerous seems to be happening in the region between the two, the danger apparently
revealing itself only through the future evolution of the space-time.
However, in the above we have restricted attention to spherically symmetric MTSs and
MTTs (the apparent horizon), and I had mentioned that even in spherical symmetry
there can and will be non-spherically symmetric MTSs and MTTs. This non-uniqueness
of MTTs, and the question what happens in the region between the apparent horizon and
event horizon provide motivations to consider the entire region of space-time containing
trapped surfaces, i.e. the Trapped Region T defined to be the set of space-time points
which lie on at least one trapped surface, spherically symmetric or not. The boundary
$B = \partial T$ (31.194)
of this trapped region, the Trapping Boundary, is then a natural candidate for the
black hole boundary, independent of any choices, and automatically inheriting all the
symmetries of the space-time. It turns out to be surprisingly difficult and delicate,
however, to determine this region precisely, even in simple examples such as the ones
we have discussed here, e.g. the Vaidya metrics, and I will close this section with some
comments on this subject.
1. First of all, instead of considering trapped surfaces one can also consider outer
trapped surfaces ($\theta_\ell < 0$, no condition on $\theta_n$), and thus the region $T_o$ covered by
outer trapped surfaces, and its boundary
$B_o = \partial T_o$ . (31.195)
It was conjectured by Eardley that this 3-surface actually coincides with the event
horizon, and this conjecture was established by Ben-Dov in the case of Vaidya
space-times with $m'(v) \ge 0$ and with finite total mass, i.e. with m(v) bounded
from above.127
2. Thus, if or when Eardley's conjecture holds, this seems to provide the desired
almost local characterisation of the event horizon. However, this is somewhat
misleading because the outer trapped spacelike surfaces that are required extend
far into the future and in this way manage to feed back the information about
the future evolution of the space-time into the location of the boundary 3-surface
at an earlier time. This is also referred to as the clairvoyant property of (outer)
trapped surfaces.128 Thus in spite of these results there appears to be no good
local in time characterisation of the event horizon.
3. Ben-Dov also showed that the restriction to genuinely trapped surfaces with $\theta_n < 0$
as well is not enough to fill out the space between the apparent and event horizons.
This issue has been further analysed by Bengtsson and Senovilla who showed that
genuinely trapped surfaces can in principle extend into parts of the flat region
and investigated how far such genuinely trapped surfaces can extend into the
127
D. Eardley, Black Hole Boundary Conditions and Coordinate Conditions, arXiv:gr-qc/9703027; I.
Ben-Dov, Outer Trapped Surfaces in Vaidya Spacetimes, arXiv:gr-qc/0611057.
128
See e.g. J. Senovilla, Trapped Surfaces, arXiv:1107.1344 [gr-qc].
intermediate region for Vaidya space-times.129 However, it appears that at present
the location of B has been significantly constrained but has not yet been pinned
down precisely.
(a) there exist trapped surfaces ($\theta_\ell < 0$, $\theta_n < 0$) that can extend into the flat
region, and B represents a point that lies on such a (necessarily not spherically
symmetric) trapped surface,
(b) any point in the region inside the event horizon can be shown to lie on some
outer trapped surface ($\theta_\ell < 0$, no condition on $\theta_n$),
(c) for points sufficiently close to the event horizon and sufficiently far from the
shell (point C) there are no truly trapped surfaces ($\theta_\ell < 0$, $\theta_n < 0$) through
that point.
Figure 43: Trapped Surfaces in the collapse of a thin null shell to a black hole. See the
body of the text for the description.
6. Apart from some very general properties, very little seems to be known at present
about T and $B = \partial T$ in situations without spherical symmetry.
It seems appropriate to close this section with a quotation from the article just men-
tioned:
We find it puzzling, and indeed intriguing, that the very simple questions
we ask are so difficult to answer.130
F: Cosmology
32 Cosmology I: Basics
We now turn away from considering isolated systems (stars) to some (admittedly very
idealised) description of the universe as a whole. This subject is known as Cosmology.
It is certainly one of the most fascinating subjects of theoretical physics, dealing with
such issues as the origin and ultimate fate and the large-scale structure of the universe.
Due to the difficulty of performing cosmological experiments and making precise mea-
surements at large distances, many of the most basic questions about the universe are
still unanswered today:
3. What actually happened at (or even before) what is usually called the Big Bang?
4. Why is the Cosmic Microwave Background radiation so isotropic and what can
the minuscule anisotropies tell us about the very early universe?
While recent precision data, e.g. from supernovae surveys and detailed analysis of the
cosmic microwave background radiation, suggest answers to at least some of these ques-
tions, these answers leave less wiggle-room for philosophical prejudices or esthetic pref-
erences and actually just make the universe more mysterious than ever.
Fortunately, however, many of the important features any realistic cosmological model
should display are already present in some very simple models, the so-called Friedmann-
Lemaître-Robertson-Walker Models (FLRW models) already studied in the 20s and
30s of the last century. They are based on the simplest possible ansatz for the metric
compatible with the assumption that on large scales the universe is roughly homogeneous
and isotropic (cf. the next section for a more detailed discussion of this Cosmological
Principle) and have become the standard model of cosmology.
We will see that they already display all the essential features such as
1. a Big Bang
Our first aim will be to make maximal use of the symmetries that simple cosmological
models should have to find a simple ansatz for the metric. Our guiding principle will
be the Cosmological Principle.
At first, it may sound impossibly difficult to find solutions of the Einstein equations
describing the universe as a whole. However: if one looks at the universe at large (very
large) scales, in that process averaging over galaxies and even clusters of galaxies, then
the situation simplifies a lot in several respects:
2. Furthermore we assume that the earth, and our solar system, or even our galaxy,
have no privileged position in the universe (this is occasionally referred to as the
Copernican Principle). This means that at large scales the universe should look
the same from any point in the universe. Mathematically this means that there
should be translational symmetries from any point of space to any other, in other
words, space should be homogeneous.
3. Also, we assume that, at large scales, the universe looks the same in all directions.
Thus there should be rotational symmetries and hence space should be isotropic.
Together, the second and third assumptions form the Cosmological Principle, which is
the starting point for our discussion of cosmology and on which much of the work in
cosmology is based. It is plausible (and true) that the assumption of isotropy (around
us) can be tested experimentally / observationally, while testing the assumption of
homogeneity is evidently going to be more tricky.131
131
The extent to which these assumptions, in particular homogeneity, can be observationally tested is
discussed e.g. in C. Clarkson, R. Maartens, Inhomogeneity and the foundations of concordance cosmol-
ogy, arXiv:1005.2165 [astro-ph.CO], G. Ellis, Inhomogeneity effects in Cosmology, arXiv:1103.2335
[astro-ph.CO], R. Maartens, Is the Universe homogeneous?, arXiv:1104.1300 [astro-ph.CO].
Making the above assumptions, it follows from our discussion in section 13 that the n-
dimensional space (of course n = 3 for us) has n translational and $n(n-1)/2$ rotational
Killing vectors, i.e. that the spatial metric is maximally symmetric. For n = 3, we
will thus have six Killing vectors, two more than for the Schwarzschild metric, and the
ansatz for the metric will simplify accordingly.
Note that, since we know from observation that the universe expands, we do not require
a priori a maximally symmetric space-time as this would imply that there is also a
timelike Killing vector.
What simplifies life considerably is the fact that, as we have seen, there are only three
species of maximally symmetric spaces (for any n), namely flat space $\mathbb{R}^n$ (with its
standard Euclidean metric), the sphere $S^n$ (with its standard round metric), and its
negatively curved counterpart, the n-dimensional pseudosphere or hyperboloid we will
call $H^n$.
Thus, for a space-time metric with maximally symmetric spacelike slices, the only
unknown is the time-dependence of the overall size of the metric. More concretely, the
metric can (now fixing the number of spatial dimensions to be n = 3) be chosen to be
$ds^2 = -dt^2 + a^2(t)\left(\frac{dr^2}{1 - kr^2} + r^2 d\Omega^2\right)$ , (32.1)
where $k = 0, \pm 1$ corresponds to the three possibilities mentioned above. Thus the
metric contains only one unknown function, the radius or cosmic scale factor a(t).
This function will be determined by the Einstein equations via the matter content
of the universe (we will of course be dealing with a non-vanishing energy-momentum
tensor), modelled by a perfect fluid.
One paradox, popularised by Olbers (1826) but noticed before by others, is the following.
He asked the seemingly innocuous question "Why is the sky dark at night?". According
to his calculation, reproduced below, the sky should instead be infinitely bright.
The simplest assumption one could make in cosmology (prior to the discovery of the
Hubble expansion) is that the universe is static, infinite and homogeneously filled with
stars. In fact, this is probably the naive picture one has in mind when looking at
the stars at night, and certainly for a long time astronomers had no reason to believe
otherwise.
However, these simple assumptions immediately lead to a paradox, namely the conclu-
sion that the night-sky should be infinitely bright (or at least very bright) whereas, as
we know, the sky is actually quite dark at night. This is a nice example of how very
simple observations can actually tell us something deep about nature (in this case, the
nature of the universe). The argument runs as follows.
1. Assume that there is a star of brightness (luminosity) L at distance r. Then, since
the star sends out light into all directions, the apparent luminosity A (neglecting
absorption) will be
$A(r) = L/4\pi r^2$ . (32.2)
2. If the number density $\rho$ of stars is constant, then the number of stars at distances
between r and r + dr is
$dN(r) = 4\pi\rho r^2 dr$ . (32.3)
Hence the total energy density due to the radiation of all the stars is
$E = \int_0^\infty A(r)\,dN(r) = L\rho\int_0^\infty dr = \infty$ . (32.4)
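The divergence can be made concrete with a short numerical sketch: each shell of thickness dr contributes the same amount $L\rho\,dr$, so the total grows linearly with the cutoff radius. The function names and the values of L and rho below are hypothetical, chosen only for illustration.

```python
import math


def shell_energy(L, rho, r, dr):
    """Contribution A(r) dN(r) of the shell between r and r + dr."""
    apparent = L / (4.0 * math.pi * r ** 2)        # (32.2)
    n_stars = 4.0 * math.pi * rho * r ** 2 * dr    # (32.3)
    return apparent * n_stars


def total_energy(L, rho, r_max, n_shells=1000):
    """Sum the shell contributions out to a cutoff radius r_max."""
    dr = r_max / n_shells
    # evaluate each shell at its midpoint to avoid r = 0
    return sum(shell_energy(L, rho, (i + 0.5) * dr, dr) for i in range(n_shells))


for r_max in (10.0, 100.0, 1000.0):
    print(r_max, total_energy(1.0, 1.0, r_max))
```

Every shell contributes equally, regardless of its distance, so removing the cutoff gives an infinite result.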
Now what is one to make of this? Clearly some of the assumptions in the above are
much too naive. The way out suggested by Olbers is to take into account absorption
effects and to postulate some absorbing interstellar medium, but this is also too naive
because in an eternal universe we should now be in a stage of thermal equilibrium.
Hence the postulated interstellar medium should emit as much energy as it absorbs, so
this will not reduce the radiant energy density either.
Of course, the stars themselves are not transparent, so they could block out light com-
pletely from distant sources, but if this is to rescue the situation, one would need to
postulate so many stars that every line of sight ends on a star, but then the night sky
would be bright (though not infinitely bright) and not dark.
Modern cosmological models can resolve this problem in a variety of ways. For instance,
the universe could be static but finite (there are such solutions, but this is nevertheless
an unlikely scenario) or the universe is not eternal since there was a Big Bang (and
this is a more likely scenario).
We have already discussed one of the fundamental inputs of simple cosmological models,
namely the cosmological principle. This led us to consider space-times with maximally-
symmetric spacelike slices. One of the few other things that is definitely known about
the universe, and that tells us something about the time-dependence of the universe, is
that it expands or, at least that it appears to be expanding.
In fact, in the 1920s and 1930s, the astronomer Edwin Hubble made a remarkable
discovery regarding the motion of galaxies. He found that light from distant galaxies is
systematically redshifted (increased in wavelength $\lambda$), the increase being proportional
to the distance d of the galaxy,
$z := \Delta\lambda/\lambda \propto d$ . (32.5)
Hubble interpreted this redshift as due to a Doppler effect and therefore ascribed a
recessional velocity v = cz to the galaxy. While, as we will see, this pure Doppler shift
explanation is not tenable or at least not always the most useful way of phrasing things,
the terminology has stuck, and Hubble's law can be written in the form
v = Hd , (32.6)
where H is Hubble's constant. To set the historical record straight: credit for this
fundamental discovery should perhaps (also) go to G. Lemaître.132
We will see later that in most cosmological models H is actually a function of time, so
the H in the above equation should then be interpreted as the value H0 of H today. It is
one of the main goals of observational cosmology to determine H0 and H as precisely as
possible, and the main problem here is naturally a precise determination of the distances
of distant galaxies. This is a complex and fascinating issue in its own right, but one
that we will not go into here (save for a brief mention of the luminosity distance at the
end of section 33.8).133 I will just conclude this section with one comment on the units
usually employed to express galactic distances and the Hubble constant H0 .
$H_0 = 100\,h$ km/s/Mpc ,
$h = 0.71 \pm 0.06$ . (32.7)
We will usually prefer to express it just in terms of inverse units of time. The above
132
See e.g. M. Way, H. Nussbaumer, The linear redshift-distance relationship: Lemaître beats Hubble
by two years, arXiv:1104.3031v1 [physics.hist-ph]; J.-P. Luminet, Editorial note to The begin-
ning of the world from the point of view of quantum theory, arXiv:1105.6271v1 [physics.hist-ph],
and in particular also M. Livio, The Expanding Universe: Lost (in Translation) and Found,
http://hubblesite.org/pubinfo/pdf/2011/36/pdf.pdf for an important addition to this debate.
133
For a beautiful introduction to this subject of the Cosmic Distance Ladder, see the pdf-slides of
a public talk by Fields Medalist (the mathematics counterpart of a Nobel Laureate!) Terry Tao at
http://terrytao.files.wordpress.com/2010/10/cosmic-distance-ladder.pdf (you can also find a
video of the talk on youtube).
result leads to an order of magnitude range of
(whereas Hubble's original estimate was more in the $10^9$ year range).
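The conversion of $H_0$ from km/s/Mpc to an inverse time can be sketched as follows. The conversion factors (kilometres per megaparsec, seconds per Julian year) are standard values inserted here as assumptions; they do not appear in the notes.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec (assumed value)
SECONDS_PER_YEAR = 3.15576e7  # Julian year in seconds (assumed value)


def hubble_time_years(h):
    """Return 1/H0 in years for H0 = 100 h km/s/Mpc."""
    h0_per_second = 100.0 * h / KM_PER_MPC   # H0 in units of 1/s
    return 1.0 / h0_per_second / SECONDS_PER_YEAR


# central value and the two ends of the quoted range h = 0.71 +/- 0.06
for h in (0.65, 0.71, 0.77):
    print(f"h = {h:.2f}: 1/H0 = {hubble_time_years(h):.3e} years")
```

For the quoted range of h this gives values of $1/H_0$ of the order of $10^{10}$ years.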
33 Cosmology II:
Geometry and Physics of Robertson-Walker Metrics
Having determined that the metric of a maximally symmetric space is of the simple
form (13.27), we can now deduce that a space-time metric satisfying the Cosmological
Principle can be chosen to be of the form (32.1),
$ds^2 = -dt^2 + a^2(t)\left(\frac{dr^2}{1 - kr^2} + r^2 d\Omega^2\right)$ . (33.1)
Here we have used the fact that (as in the ansatz for a spherically symmetric metric) non-
trivial $g_{tt}$ and $g_{tr}$ can be removed by a coordinate transformation. This metric is known
as the Friedmann-Robertson-Walker metric or just the Robertson-Walker metric, and
spatial coordinates in which the metric takes this form are called comoving coordinates,
for reasons that will become apparent below. The function a(t), the radius of the
universe, is known as the cosmic (or cosmological) scale factor.
Remarks:
In the latter case, the 3 different possibilities for the spatial geometry, are distin-
guished by k < 0, k = 0, k > 0. For the most part we work with the first option,
but for certain questions (like why is space so close to being flat today?) it
is convenient to rephrase and express this in terms of k (why is k so close to
zero?), which is evidently only meaningful if one does not restrict k to the 3
discrete values $k = 0, \pm 1$.
2. Another convenient way of representing this metric, that we will occasionally make
use of below, is as
where (cf. (13.34) and (13.35))
$g_k(\psi) = \begin{cases} \psi & k = 0 \\ \sin\psi & k = +1 \\ \sinh\psi & k = -1 \end{cases}$ (33.4)
$g_{ij} = a^2(t)\,\tilde g_{ij}$ , (33.5)
where $\tilde g_{ij}$ is the maximally symmetric spatial metric. Thus for k = +1, a(t)
directly gives the size (radius) of the universe. For $k = -1$, space is infinite, so
no such interpretation is possible, but nevertheless a(t) still sets the scale for the
geometry of the universe, e.g. in the sense that the curvature scalar $R^{(3)}$ of the
metric $g_{ij}$ is related to the curvature scalar $\tilde R^{(3)}$ of $\tilde g_{ij}$ by
$R^{(3)}(t) = \frac{1}{a^2(t)}\tilde R^{(3)}$ . (33.6)
Finally, for k = 0, three-space is flat and also infinite, but one could replace $\mathbb{R}^3$
by a three-torus $T^3$ (still flat but now compact) and then a(t) would once again
be related directly to the size of the universe at constant t.
4. Through the dependence of a(t) on t, proper length scales and distances in the
constant time surfaces depend on time. Thus a(t) changes or sets the scale, i.e.
a(t) plays the role of a cosmological scale factor.
5. Note that the case k = +1 opened up for the very first time the possibility of
considering, even conceiving, an unbounded but finite universe! These and other
generalisations made possible by a general relativistic approach to cosmology are
important as more naive (Newtonian) models of the universe immediately lead
to paradoxes or contradictions (as we have seen e.g. in the discussion of Olbers'
paradox in section 32.3).
Let us quickly rederive this result here. Note that, since $g_{tt} = -1$ is a constant and the
off-diagonal time-space components of the metric are zero, $g_{tk} = 0$, one has
$\Gamma^\mu_{tt} = 0$ . (33.7)
Figure 44: Illustration of a comoving coordinate system: Even though the sphere (uni-
verse) expands, the Xs (galaxies) remain at the same spatial coordinates. These tra-
jectories are geodesics and hence the Xs (galaxies) can be considered to be in free fall.
The figure also shows (cf. the discussion in section 5.8) that it is the number density
per unit coordinate volume that is conserved, not the density per unit proper volume.
Therefore the vector field $\partial_t$ is geodesic, which can be expressed as the statement that
$\nabla_t \partial_t := \Gamma^\mu_{tt}\,\partial_\mu = 0$ . (33.8)
In simpler terms this means that the curves $\vec x$ = const. ($\vec x$ referring to the spatial
coordinates),
$(t(\tau), \vec x(\tau)) = (\tau, \vec x_0)$ (33.9)
are geodesics.
Hence, in this coordinate system, observers remaining at fixed values of the spatial
coordinates are in free fall. In other words, the coordinate system is falling with them
or comoving, and the proper time along such geodesics coincides with the coordinate
time or cosmic time t, $d\tau = dt$. It is these observers of constant $\vec x$ or constant $(r, \theta, \phi)$
who all see the same isotropic universe at a given value of t.
Remarks:
1. This may sound a bit strange but a good way to visualise such a coordinate system
is, as in Figure 44, as a mesh of coordinate lines drawn on a balloon that is being
inflated or deflated (according to the behaviour of a(t)). Draw some dots on that
balloon (that will eventually represent galaxies or clusters of galaxies). As the
balloon is being inflated or deflated, the dots will move but the coordinate lines
will move with them and the dots remain at fixed spatial coordinate values. Thus,
as we now know, regardless of the behaviour of a(t), these dots follow a geodesic,
and we will thus think of galaxies in this description as being in free fall.
2. Recall that we had already encountered analogous comoving coordinates in our
discussion of Lemaître coordinates for the Schwarzschild metric in section 26.3 and
subsequently in equation (28.26) of section 28.3, when we had introduced proper
time of the freely falling particles on the surface of the star to describe the metric
induced on the surface of the star.
The worldlines of comoving observers discussed above are special timelike geodesics. To
discuss the general case, it will be convenient to write the metric in the form (33.3)
By spatial maximal symmetry and the associated conserved (angular) momenta, we can
without loss of generality consider motion in the $(t, \psi)$-direction, so that $\dot{t}$ and $\dot{\psi}$ are
related by
$$\dot{t}^2 - a(t)^2 \dot{\psi}^2 = 1 . \tag{33.11}$$
Even though we do not have a timelike Killing vector (and its associated conserved
energy) to further simplify this, in the case at hand we have plenty of spacelike Killing
vectors $V$ with $V^t = 0$. Among them there will be $\psi$-translational Killing vectors
which have the form
$$V = f(\theta, \phi)\,\partial_\psi + \ldots \tag{33.12}$$
(the 3-dimensional counterparts of the Killing vectors $V_{(1)}, V_{(2)}$ (8.53) of the 2-sphere,
say). Associated to any such Killing vector and the timelike geodesic there is the
conserved momentum
$$P = \dot{x}^\mu V_\mu = a(t)^2 f(\theta, \phi)\,\dot{\psi} . \tag{33.13}$$
Since $\theta$ and $\phi$ are constant along the geodesic, we can absorb $f(\theta, \phi)$ into the definition
of $P$, and thus we have
$$\dot{\psi} = P/a^2 . \tag{33.14}$$
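In particular, we see that comoving observers are characterised by $P = 0$,
$$P = 0 \;\Leftrightarrow\; \dot{\psi} = 0 \quad \text{(comoving)} , \tag{33.16}$$
and that precisely for these observers the cosmic time $t$ coincides with their proper time,
$$P = 0 \;\Rightarrow\; \dot{t} = 1 \;\Leftrightarrow\; dt = d\tau . \tag{33.17}$$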
Nevertheless, even in the general case it is useful to combine the two previous equations
to obtain an equation for $\psi$ as a function of $t$, namely (assuming $P \neq 0$)
$$\frac{d\psi(t)}{dt} = \frac{\dot{\psi}}{\dot{t}} = \frac{P}{\sqrt{a(t)^4 + P^2 a(t)^2}} = \frac{1}{a(t)\sqrt{1 + a(t)^2/P^2}} . \tag{33.18}$$
In general, even for simple power-law behaviours for $a(t)$ (which we will typically find as
solutions to the Einstein equations in the spatially flat case $k = 0$), this equation cannot
be solved in closed form (but can be approximated by a tractable, even elementary,
integral when $a(t) \ll |P|$ or when $a(t) \gg |P|$).
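As a rough numerical illustration of (33.18), the following Python sketch integrates $d\psi/dt$ with a plain midpoint rule and compares the result with the elementary integral one obtains from the approximation $d\psi/dt \approx P/a(t)^2$, valid for $a(t) \gg |P|$. The scale factor $a(t) = t^{2/3}$, the value of $P$, the integration range and all function names are illustrative assumptions, not taken from the text (units with $c = 1$).

```python
import math

def a(t, p=2.0 / 3.0):
    """Assumed power-law scale factor a(t) = t**p (illustrative choice)."""
    return t ** p

def dpsi_dt(t, P):
    """Right-hand side of (33.18): P / sqrt(a^4 + P^2 a^2)."""
    at = a(t)
    return P / math.sqrt(at ** 4 + P ** 2 * at ** 2)

def comoving_displacement(t_start, t_end, P, steps=20000):
    """Midpoint-rule approximation of psi(t_end) - psi(t_start)."""
    h = (t_end - t_start) / steps
    return h * sum(dpsi_dt(t_start + (i + 0.5) * h, P) for i in range(steps))

def large_a_approximation(t_start, t_end, P):
    """Elementary integral of P / a^2 = P * t**(-4/3), valid for a >> |P|."""
    return 3.0 * P * (t_start ** (-1.0 / 3.0) - t_end ** (-1.0 / 3.0))

def peculiar_velocity(t, P):
    """a * dpsi/dt = 1 / sqrt(1 + a^2 / P^2) for P > 0; always below 1."""
    return a(t) * dpsi_dt(t, P)

if __name__ == "__main__":
    P = 0.01  # assumed small momentum, so that a(t) >> |P| on [1, 8]
    print("numerical  :", comoving_displacement(1.0, 8.0, P))
    print("approximate:", large_a_approximation(1.0, 8.0, P))
    print("V_pec(t=1) :", peculiar_velocity(1.0, P))
```

The two numbers should agree closely here because $a(t) \geq 1 \gg P$ on the whole range; for $P$ comparable to $a(t)$ the approximation is no longer reliable and the numerical integral is the one to trust.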
$$dR = a(t)\frac{dr}{(1 - kr^2)^{1/2}} \quad\Rightarrow\quad R_p(t) = a(t) f_k(r_1) , \tag{33.19}$$
where
$$f_k(r) = \begin{cases} r & k = 0 \\ \arcsin r & k = +1 \\ \sinh^{-1} r & k = -1 \end{cases} \tag{33.20}$$
Note that the $f_k(r)$ are the inverses of the functions $g_k(\psi)$ defined in (33.4) (the precise
form of $f_k(r)$ will however be irrelevant for this argument). If we use the coordinates
$(t, \psi)$ instead of $(t, r)$, the instantaneous proper distance to a point with coordinate $\psi_1$
simply has the form
$$R_p(t) = a(t)\psi_1 . \tag{33.21}$$
Here we have introduced the Hubble parameter
$$H(t) = \frac{\dot{a}(t)}{a(t)} , \tag{33.23}$$
which plays a pivotal role in cosmology.
The relation (33.22) clearly expresses something like Hubble's law $v = Hd$ (32.6): all
objects run away from each other with velocities proportional to their distance. We
will have much more to say about H(t), and about the relation between distance and
redshift z, below.
$$\frac{d^2}{dt^2}R_p(t) = \Big(\frac{d}{dt}H(t)\Big)R_p(t) + H(t)\frac{d}{dt}R_p(t) = \frac{\ddot{a}(t)}{a(t)}R_p(t) \tag{33.24}$$
since
$$\frac{d}{dt}H(t) = \frac{\ddot{a}(t)}{a(t)} - \frac{\dot{a}(t)^2}{a(t)^2} = \frac{\ddot{a}(t)}{a(t)} - H(t)^2 . \tag{33.25}$$
Thus the cosmological expansion or contraction can be visualised as acting like a linear
harmonic oscillator force on the separation of comoving objects, with (in general
time-dependent) real or imaginary frequency $\omega(t)$,
$$\frac{d^2}{dt^2}R_p(t) + \omega(t)^2 R_p(t) = 0 , \qquad \omega(t)^2 = -\frac{\ddot{a}(t)}{a(t)} . \tag{33.26}$$
Universes with an accelerating expansion thus lead to imaginary frequencies, and hence
to an exponential-like rather than harmonic motion (over periods of time during which
the time-dependence of the frequency can be neglected).
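To make the sign of $\omega(t)^2 = -\ddot{a}/a$ concrete, here is a minimal Python sketch that evaluates it by a central finite difference. The power-law scale factors $a(t) = t^p$ used below are an assumption made for illustration; for them one expects $\omega^2 = p(1-p)/t^2$, i.e. a real frequency for decelerating expansion ($0 < p < 1$) and an imaginary one for accelerating expansion ($p > 1$). The function names are hypothetical.

```python
def omega_squared(a, t, h=1e-4):
    """omega(t)^2 = -a''(t)/a(t), with a'' from a central finite difference."""
    a_ddot = (a(t + h) - 2.0 * a(t) + a(t - h)) / h ** 2
    return -a_ddot / a(t)

def power_law(p):
    """Return the assumed scale factor a(t) = t**p as a function of t."""
    return lambda t: t ** p

if __name__ == "__main__":
    for p in (0.5, 2.0 / 3.0, 2.0):
        w2 = omega_squared(power_law(p), t=1.0)
        kind = "real omega" if w2 > 0 else "imaginary omega"
        print(f"p = {p:.3f}: omega^2 = {w2:+.4f} ({kind})")
```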
If the object is not comoving, i.e. is not sitting at a fixed value of $\psi$, say, one has
Alternatively this follows from (33.18), which allows us to express the peculiar velocity
as
$$V_{pec}(t) = a(t)\frac{d\psi(t)}{dt} = \frac{1}{\sqrt{1 + a(t)^2/P^2}} . \tag{33.31}$$
Either way we see that the peculiar velocity is always subluminal.
However, the recessional velocity is not restricted in this way and in most cosmological
models superluminal recessional velocities will occur for objects which are sufficiently
far away (in the sense of having a sufficiently large proper distance Rp (t)).
Remarks:
1. There is absolutely nothing illegal or pathological about this because Vrec measures
the rate of an increase in distance between two objects, not any locally measurable
velocity. For example, even in Special Relativity, if in your rest-frame you send off
two objects in opposite directions at speeds > c/2 each, then the distance between
them grows at a rate (measured with respect to your proper time) larger than c,
but clearly you have not violated or disproven Special Relativity by doing this.
At any given time t, objects further away than t appear to have superluminal
recession velocities. On the other hand, as shown in section 36.1, the space-time
with $a(t) = t$ and $k = -1$ is just a part of Minkowski space. Thus even Minkowski
space can be foliated in such a way (by hyperboloids in the future lightcone) that
events appear to have superluminal recession velocities, but evidently there is
nothing here that violates any of the postulates of Special Relativity.
4. Cosmologists frequently refer to the sphere beyond which the recessional velocity
exceeds the speed of light as the Hubble sphere. Its radius RH (t) at time t, the
Hubble radius, is (restoring for once and temporarily the speed of light c)
Misleadingly, this surface is also often referred to as the Hubble horizon, the reason
for this apparently being the idea or belief that we can never observe objects
outside the Hubble sphere, but this is in general not correct. In particular, it is
not correct to say (perhaps based on a mistaken analogy with special relativistic
reasoning) that objects with recessional velocities Vrec > c are infinitely redshifted
and therefore invisible to us. There is indeed a cosmological redshift, worked out
in section 33.7, and there is also an ensuing Hubble-like redshift - distance relation
in Robertson-Walker geometries, derived in section 33.8. However, this cannot be
written as a standard special relativistic recessional velocity - redshift relation
involving the recessional velocity.135
5. It is true that there are limits to how much of the universe one can observe at
any given time, and it is also true that in certain situations the Hubble radius
RH (t) provides one with an order of magnitude estimate of the size of the visible
universe.
However, it is certainly misleading (even though some people appear to be obsessed
with this) to think of the visible universe (or the inside of the Hubble sphere) as
somehow being like the inside of a Schwarzschild black hole or some such nonsense.
In fact, with any standard definition of a black hole going beyond pop-sci culture
wisdom this statement is so obviously wrong or misleading in so many respects
that I don't even know where to start (so don't get me started, but you may enjoy
poking holes into this statement yourself . . . ).
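For orientation, the Hubble radius of remark 4 can be put into numbers. Since the recessional velocity of comoving objects is $V_{rec} = H R_p$ (33.22), the proper distance at which it equals $c$ is $R_H = c/H$. The sketch below assumes an illustrative present-day value $H_0 = 70$ km/s/Mpc, which is not a number taken from these notes, and the function names are hypothetical.

```python
C_KM_S = 299792.458  # speed of light in km/s

def recession_velocity(H, R_p):
    """V_rec = H * R_p for comoving objects (km/s if H is in km/s/Mpc, R_p in Mpc)."""
    return H * R_p

def hubble_radius(H):
    """Proper distance (in Mpc) at which V_rec equals c: R_H = c / H."""
    return C_KM_S / H

if __name__ == "__main__":
    H0 = 70.0  # assumed illustrative value in km/s/Mpc
    R_H = hubble_radius(H0)
    print(f"Hubble radius: {R_H:.1f} Mpc")
    # An object at twice the Hubble radius recedes at twice the speed of light.
    print("V_rec/c at 2 R_H:", recession_velocity(H0, 2.0 * R_H) / C_KM_S)
```

In line with the remarks above, a ratio $V_{rec}/c > 1$ printed by this sketch says nothing about any locally measurable velocity.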
We will return to some of these issues in a (slightly) more quantitative way later on, in
sections 35.7-35.9, with the Friedmann equations (i.e. the Einstein equations for the
standard model of cosmology we are in the process of developing) at our disposal.
33.4 Painlevé-Gullstrand-like Coordinates for Comoving Observers
In section 26.2 we introduced coordinates for the Schwarzschild metric that are adapted
to radial geodesic observers, known as Painlevé-Gullstrand coordinates. We can do
something analogous for the comoving observers of the Robertson-Walker metrics.
For k = 0 the Robertson-Walker metric is
To put this into PG-like form, we keep $t$ (which is, after all, already the proper time
of comoving observers) but introduce, instead of the comoving coordinate $r$, the area
radius
$$\tilde{r}(t, r) = a(t)\,r . \tag{33.35}$$
135 See also T. Davis, C. Lineweaver, Superluminal Recession Velocities, arXiv:astro-ph/0011070, for
a nice discussion of this.
In terms of this the Robertson-Walker metric takes the PG-like form
Remarks:
3. Performing the same coordinate transformation to the area radius $\tilde{r} = a(t) r$ for
$k \neq 0$, one finds the metric
$$ds^2 = -dt^2 + \frac{1}{1 - k\tilde{r}^2/a^2}\,\big(d\tilde{r} - \tilde{r} H(t)\,dt\big)^2 + \tilde{r}^2 d\Omega^2 . \tag{33.39}$$
4. While these PG-like coordinates are not widely used in the cosmological context,
we make use of them in section 28.5, in the description of the interior geometry
of a collapsing star (because this PG-like form of the metric makes it particularly
easy to match the interior metric to the exterior Schwarzschild metric).
where a tilde refers to the maximally symmetric spatial metric, we see that it is natural
to introduce a new time-coordinate $\eta$ through
$$d\eta = dt/a(t) , \tag{33.41}$$
in terms of which the Robertson-Walker metric takes the simple form
($a(\eta)$ is short (and sloppy) for $a(t(\eta))$). In terms of polar coordinates (33.3), this
becomes
$$ds^2 = a^2(\eta)\big(-d\eta^2 + d\psi^2 + g_k(\psi)^2 d\Omega^2\big) . \tag{33.43}$$
Remarks:
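2. Since the $\eta$-dependence resides exclusively in the overall conformal factor of the
metric, the vector $\partial_\eta$ is a conformal Killing vector of the Robertson-Walker metric
in the sense of (9.5),
$$C = \partial_\eta : \quad \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2\frac{a'(\eta)}{a(\eta)}\,g_{\mu\nu} = 2\dot{a}(t)\,g_{\mu\nu} . \tag{33.44}$$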
3. We will use conformal time in section 36.3 to solve the cosmological Einstein
equations in a particular case, and we will use the above conformal Killing vector
and the associated conserved charge for null geodesics (9.9) in the discussion of
the cosmological redshift in section 33.7.
In the spatially flat case $k = 0$, (33.43) shows that Robertson-Walker metrics are
conformally flat, here written in radial polar coordinates,
This is actually true for all Robertson-Walker metrics, i.e. also for $k \neq 0$, but the
coordinate transformation required to exhibit this as explicitly as in the $k = 0$ case is
somewhat more involved.
$$u = \eta - \psi , \quad v = \eta + \psi , \tag{33.46}$$
This metric is conformally flat if the line element in brackets, which does not depend
on the cosmic scale factor $a(t)$ or $a(\eta)$, is conformally flat, so let us focus on
where, recall, $g_k(\psi) = \sin(h)\,\psi$ for $k = \pm 1$ respectively.
Given a metric of this form, it is natural to consider transformations of the form
because under such transformations $du\,dv$ and $dU\,dV$ are just related by an overall
conformal factor. It is now straightforward to check, using some basic trigonometric
identities (or their hyperbolic counterparts) that the specific coordinate transformation
is such that it transforms $d\tilde{s}^2$ into
$$d\tilde{s}^2 = \cos(h)^2(u/2)\,\cos(h)^2(v/2)\,\big[-dU\,dV + \tfrac{1}{4}(V - U)^2 d\Omega^2\big] . \tag{33.51}$$
Now the metric in brackets is just the Minkowski metric, written in radial null
coordinates, as can be seen by undoing the transformation to null coordinates through
$$U = T - R , \quad V = T + R , \tag{33.52}$$
which results in
$$-dU\,dV + \tfrac{1}{4}(V - U)^2 d\Omega^2 = -dT^2 + dR^2 + R^2 d\Omega^2 , \tag{33.53}$$
The aim of this and the subsequent sections is to learn as much as possible about the
general properties of Robertson-Walker geometries (without using the Einstein
equations) with the aim of looking for observational means of distinguishing e.g. among the
models with $k = 0, \pm 1$.
To get a feeling for the geometry of the Schwarzschild metric, we studied the properties
of areas and lengths in the Schwarzschild geometry. Spatial length measurements are
rather obvious in the Robertson-Walker geometry, so here we focus on the properties of
areas.
We write the spatial part of the Robertson-Walker metric in polar coordinates as (33.3)
where $g_k(\psi) = \psi, \sin\psi, \sinh\psi$ for $k = 0, +1, -1$ (see (13.35)). Now the radius of a
surface $\psi = \psi_0$ around the point $\psi = 0$ (or any other point, our space is isotropic and
homogeneous) is given by
$$\rho = a\int_0^{\psi_0} d\psi = a\psi_0 . \tag{33.55}$$
On the other hand, the area of this surface is determined by the induced metric
$a^2 g_k^2(\psi_0)\,d\Omega^2$ and is
$$A(\rho) = a^2 g_k^2(\psi_0)\int_0^{2\pi} d\phi\int_0^{\pi} d\theta\,\sin\theta = 4\pi a^2 g_k^2(\rho/a) . \tag{33.56}$$
For $k = 0$, this is just the standard behaviour
$$A(\rho) = 4\pi\rho^2 , \tag{33.57}$$
but for $k = \pm 1$ the geometry looks quite different. For $k = +1$, we have
$$A(\rho) = 4\pi a^2\sin^2(\rho/a) . \tag{33.58}$$
Thus the area reaches a maximum for $\rho = \pi a/2$ (or $\psi = \pi/2$), then decreases again for
larger values of $\rho$ and goes to zero as $\rho \to \pi a$. Already the maximal area, $A_{max} = 4\pi a^2$,
is much smaller than the area of a sphere of the same radius in Euclidean space, which
would be $4\pi\rho^2 = \pi^3 a^2$.
This behaviour is best visualised by replacing the three-sphere by the two-sphere and
looking at the circumference of circles as a function of their distance from the origin
(see Figure 45).
For $k = -1$, we have
$$A(\rho) = 4\pi a^2\sinh^2(\rho/a) , \tag{33.59}$$
so in this case the area grows much more rapidly with the radius than in flat space.
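The three area laws can be compared directly. The following Python sketch evaluates $A(\rho) = 4\pi a^2 g_k^2(\rho/a)$ from (33.56) for the three values of $k$ at the radius $\rho = \pi a/2$ where the $k = +1$ area is maximal; the choice $a = 1$ and the function names are illustrative assumptions.

```python
import math

def g_k(psi, k):
    """g_k(psi) = psi, sin(psi), sinh(psi) for k = 0, +1, -1."""
    if k == 0:
        return psi
    if k == 1:
        return math.sin(psi)
    if k == -1:
        return math.sinh(psi)
    raise ValueError("k must be 0, +1 or -1")

def sphere_area(rho, a, k):
    """Area 4 pi a^2 g_k(rho/a)^2 of a sphere of proper radius rho, cf. (33.56)."""
    return 4.0 * math.pi * a ** 2 * g_k(rho / a, k) ** 2

if __name__ == "__main__":
    a = 1.0                    # assumed value of the scale factor
    rho = math.pi * a / 2.0    # radius at which the k = +1 area is maximal
    for k in (+1, 0, -1):
        print(f"k = {k:+d}: A = {sphere_area(rho, a, k):.4f}")
```

At this radius the closed case gives $4\pi a^2$, the flat case $\pi^3 a^2$, and the open case a still larger value, in line with the discussion above.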
In principle, this distinct behaviour of areas in the models with $k = 0, \pm 1$ might allow
for an empirical determination of k. For instance, one might make the assumption that
there is a homogeneous distribution of the number and brightness of galaxies, and one
could try to determine observationally the number of galaxies as a function of their
apparent luminosity. As in the discussion of Olbers' paradox, the radiation flux would
behave as $F \propto 1/\rho^2$. In Euclidean space ($k = 0$), one would expect the number
$N(F)$ of galaxies with flux greater than $F$, i.e. distances less than $\rho$, to behave like $\rho^3$,
so that the expected Euclidean behaviour would be
$$N(F) \propto F^{-3/2} . \tag{33.60}$$
Any empirical departure from this behaviour could thus be an indication of a universe
with $k \neq 0$, but clearly, to decide this, many other factors (redshift, evolution of stars,
etc.) would have to be taken into account. This illustrates as a matter of principle
how the geometry of the spatial slices influences, and can be encoded in, observable
quantities. In practice, however,
[Figure 45: circles of radius psi1, psi2, psi3 around the origin of a two-sphere (labels: psi = psi1, psi = psi2, psi = psi3)]
The most important information about the cosmic scale factor a(t) comes from the
observation of shifts in the frequency of light emitted by distant sources.
To calculate the expected shift in a Robertson-Walker geometry, let us again place
ourselves at the origin r = 0. We consider a radially travelling electro-magnetic wave
(a lightray) and consider the equation $d\tau^2 = 0$ or
$$dt^2 = a^2(t)\frac{dr^2}{1 - kr^2} . \tag{33.61}$$
Since the cosmological scale factor $a(t)$ sets the length scale, one may expect that
wavelengths at different times are related by
$$\frac{\lambda(t_1)}{\lambda(t_0)} = \frac{a(t_1)}{a(t_0)} , \tag{33.62}$$
leading to the relation
$$\frac{\omega(t_0)}{\omega(t_1)} = \frac{a(t_1)}{a(t_0)} , \tag{33.63}$$
among the frequencies. This is indeed the correct result.
136 G. Ellis, R. Maartens, M. MacCallum, Relativistic Cosmology, section 7.6.
As in our discussion of the gravitational redshift in section 2.9, I will analyse this
situation in two ways, in a geometric optics approach, where we trace the lightrays
in the above geometry, and in a slightly more covariant language using the geodesic
equation and the symmetries and associated conserved charges. I will also give a third,
essentially one-line (but perhaps at first somewhat obscure looking), derivation based
on the conformal Killing vector (33.44) and its associated conserved charge.
1. Let us assume that the wave leaves a galaxy located at $r = r_1$ at the time $t_1$.
Then it will reach us at $r = 0$ at a time $t_0$ given by
$$f_k(r_1) = -\int_{r_1}^{0}\frac{dr}{\sqrt{1 - kr^2}} = \int_{t_1}^{t_0}\frac{dt}{a(t)} . \tag{33.64}$$
Note that there will only be a solution to this equation if the light from the galaxy
at r = r1 actually reaches us at a time t0 . In this sense galaxies whose light has not
yet reached us (or may perhaps never reach us) are implicitly (and now, having
said this, explicitly) excluded from the analysis - after all, such galaxies are not
particularly useful for analysing redshifts.
As typical galaxies will be comoving, i.e. have constant spatial coordinates,
$f_k(r_1)$ (33.20) is time-independent. If the next wave crest leaves the galaxy at $r_1$
at time $t_1 + \delta t_1$, it will arrive at a time $t_0 + \delta t_0$ determined by
$$f_k(r_1) = \int_{t_1 + \delta t_1}^{t_0 + \delta t_0}\frac{dt}{a(t)} . \tag{33.65}$$
Subtracting these two equations and making the (eminently reasonable) assumption
that the cosmic scale factor $a(t)$ does not vary significantly over the period
$\delta t$ given by the frequency of light, we obtain
$$\frac{\delta t_0}{a(t_0)} = \frac{\delta t_1}{a(t_1)} . \tag{33.66}$$
Indeed, say that b(t) is the integral of 1/a(t). Then we have
2. As in derivation 2 of section 2.9, we describe a light ray by the null wave vector
$k^\mu = (\omega, \vec{k})$. The frequency measured by an observer with velocity $u^\mu$ is then
$\omega = -u_\mu k^\mu$ (2.178).
Adapting the discussion of timelike geodesics in the Robertson-Walker geometry
in section 33.2 to null rays, we can choose the wave vector to be of the form
$$k^\mu = (\dot{t}, \dot{\psi}, 0, 0) , \tag{33.70}$$
with
$$-\dot{t}^2 + a(t)^2\dot{\psi}^2 = 0 . \tag{33.71}$$
In section 2.9 we used the timelike Killing vector of the static spherically symmetric
metric, and its associated conserved energy, to relate the measured frequencies for
static observers at different radial positions and to determine the gravitational
redshift. Here we use one of the spatial Killing vectors to deduce, as in section
33.2, that
$$\dot{\psi} = P/a^2 \tag{33.72}$$
The observers we are interested in are the comoving observers at fixed values of the
spatial coordinates, i.e. with $u^\mu = (1, 0, 0, 0)$. Thus these measure the frequency
$$\frac{\omega(t_0)}{\omega(t_1)} = \frac{a(t_1)}{a(t_0)} , \tag{33.75}$$
3. Alternatively, and even more quickly, one can use the conformal Killing vector
(33.44)
$$C = \partial_\eta = a(t)\,\partial_t \quad\text{or}\quad C^\mu = a(t)\,u^\mu \tag{33.76}$$
Astronomers like to express this result in terms of the redshift parameter (see the
discussion of Hubble's law above)
$$z = \frac{\lambda_0 - \lambda_1}{\lambda_1} , \tag{33.78}$$
which in view of the above result we can write as
$$z = \frac{a(t_0)}{a(t_1)} - 1 . \tag{33.79}$$
Thus if the universe expands one has $z > 0$ and there is a redshift, while in a contracting
universe with $a(t_0) < a(t_1)$ the light of distant galaxies would be blueshifted.
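The wave-crest argument of derivation 1 can be checked numerically in a case where the integral of $1/a(t)$ is known in closed form. The sketch below assumes $a(t) = t^{2/3}$, so that $b(t) = 3t^{1/3}$ is an integral of $1/a(t)$; the emission time, the value of $f_k(r_1)$ and the crest separation are arbitrary illustrative numbers, and all function names are hypothetical.

```python
def a(t):
    """Assumed scale factor a(t) = t**(2/3) (illustrative choice)."""
    return t ** (2.0 / 3.0)

def b(t):
    """An integral of 1/a(t): b(t) = 3 t**(1/3)."""
    return 3.0 * t ** (1.0 / 3.0)

def b_inverse(y):
    """Inverse function of b."""
    return (y / 3.0) ** 3

def arrival_time(t_emit, f):
    """Solve b(t_arrive) - b(t_emit) = f for t_arrive, cf. (33.64)."""
    return b_inverse(b(t_emit) + f)

if __name__ == "__main__":
    t1, f, dt1 = 1.0, 3.0, 1e-6  # emission time, f_k(r_1), crest separation (assumed)
    t0 = arrival_time(t1, f)
    dt0 = arrival_time(t1 + dt1, f) - t0
    print("dt0/dt1     =", dt0 / dt1)
    print("a(t0)/a(t1) =", a(t0) / a(t1))
    print("z           =", a(t0) / a(t1) - 1.0)
```

The first two printed numbers should agree up to corrections that vanish with the crest separation, which is the content of (33.66).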
Remarks:
1. This cosmological redshift has nothing to do with the star's own gravitational field
- that contribution to the redshift is completely negligible compared to the effect
of the cosmological redshift.
3. However, like the gravitational redshift, the final result depends only on the posi-
tion (time) of emission and arrival of the lightray, not on the intermediate gravita-
tional field (cosmic scale factor). This illustrates that fundamentally the redshift
is due to the different reference frames used by emitter and observer, not due to
the fact that something happens to the lightray along the way. We will briefly
return to this matter from a slightly different perspective in section 33.10.
4. While the previous remark suggests a purely Doppler-like explanation of the cos-
mological redshift it is best to think of the redshift as a combined effect of gravi-
tational and Doppler redshifts. Without additional choices (like preferred families
of intermediate observers) it is not very meaningful to separate this into the two
and/or to interpret this only in terms of one of them.137
6. The cosmic microwave background radiation (CMBR), which originated just a
few hundred thousand years after the Big Bang ($\approx$ 370,000 years), has $z \gtrsim 1000$. This
was the time when atoms were formed, and the CMBR photons were decoupled
and emitted. This happened at a temperature of $T_{dec} \approx 3000$ K. Comparing with
the fact that the temperature of the CMBR today is $T_{cmbr} \approx 3$ K and using $T \propto a^{-1}$
(this is essentially a reformulation of our above result for the redshift, since - up
to conversion factors $\hbar$ and $k$ - frequency = energy = temperature), one then finds
the above-quoted estimate for $z$.138
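The arithmetic behind this estimate is short enough to spell out. With $T \propto a^{-1}$ one has $1 + z = a(t_0)/a(t_1) = T_{dec}/T_{cmbr}$; the sketch below simply uses the rounded temperatures quoted above, and the function name is hypothetical.

```python
def redshift_from_temperatures(T_emit, T_obs):
    """z from 1 + z = a(t0)/a(t1) = T_emit / T_obs, using T proportional to 1/a."""
    return T_emit / T_obs - 1.0

if __name__ == "__main__":
    # Rounded values quoted in the text: about 3000 K at decoupling, about 3 K today.
    print("z =", redshift_from_temperatures(3000.0, 3.0))
```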
For a long time, reliable data for cosmological redshifts as well as for distance measure-
ments were only available for small values of z, and thus it was common to consider the
case where $t_0 - t_1$ and $r_1$ are small, i.e. small on cosmological scales. This allows one to
find a redshift-distance relation which can e.g. be written as a power-series in z. Such a
formula is not quite good enough for modern purposes, however, and I will come back
to this below.
Assuming the validity of such an expansion, this allows us in particular to expand a(t)
in a Taylor series,
$$a(t) = a(t_0) + (t - t_0)\,\dot{a}(t_0) + \tfrac{1}{2}(t - t_0)^2\,\ddot{a}(t_0) + \ldots \tag{33.80}$$
Let us introduce the Hubble parameter H(t) (which already made a brief appearance in
(33.23) of section 33.3) and the deceleration parameter q(t) by
$$H(t) = \frac{\dot{a}(t)}{a(t)} , \qquad q(t) = -\frac{a(t)\,\ddot{a}(t)}{\dot{a}(t)^2} , \tag{33.81}$$
and denote their present day values by a subscript zero, i.e. $H_0 = H(t_0)$ and $q_0 = q(t_0)$.
$H(t)$ measures the expansion velocity as a function of time while $q(t)$ measures whether
the expansion velocity is increasing or decreasing. We will also denote $a_0 = a(t_0)$ and
$a(t_1) = a_1$. In terms of these parameters, the Taylor expansion can be written as
$$a(t) = a_0\left(1 + H_0 (t - t_0) - \tfrac{1}{2} q_0 H_0^2 (t - t_0)^2 + \ldots\right) . \tag{33.82}$$
Higher order terms in this expansion are known as jerk (3rd derivative) and snap (4th
derivative).139
We can use the expansion (33.82) to express $z$ as a function of $(t_0 - t_1)$, and we can
in principle use (33.64) to express $r_1$ as a function of $(t_0 - t_1)$. Combining the two
results and eliminating $(t_0 - t_1)$, one therefore obtains the sought-for relation between the
redshift $z$ and the (coordinate-) distance $r_1$. The result is given in (33.89). The
derivation is primarily an exercise in inverting series expansions and not per se particularly
enlightening.
From (33.82) one finds that the redshift parameter $z$, as a power series in time, is
$$\frac{1}{1 + z} = \frac{a_1}{a_0} = 1 + (t_1 - t_0) H_0 - \tfrac{1}{2} q_0 H_0^2 (t_1 - t_0)^2 + \ldots \tag{33.83}$$
or
$$z = (t_0 - t_1) H_0 + (1 + \tfrac{1}{2} q_0) H_0^2 (t_0 - t_1)^2 + \ldots \tag{33.84}$$
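One way to gain some confidence in the expansion (33.84) is to compare it with an exactly known case. The sketch below assumes a power-law scale factor $a(t) = t^p$ with $p = 2/3$, for which $H = p/t$ and $q = (1-p)/p$ follow directly from the definitions (33.81); the look-back time used is an arbitrary small number, and the function names are hypothetical.

```python
def z_exact(t0, t1, p=2.0 / 3.0):
    """Exact redshift z = a(t0)/a(t1) - 1 for the assumed a(t) = t**p."""
    return (t0 / t1) ** p - 1.0

def z_series(t0, t1, H0, q0):
    """Second-order expansion (33.84) in the look-back time t0 - t1."""
    x = H0 * (t0 - t1)
    return x + (1.0 + 0.5 * q0) * x ** 2

if __name__ == "__main__":
    p, t0, t1 = 2.0 / 3.0, 1.0, 0.97
    H0 = p / t0          # H = a'/a = p/t for a = t**p
    q0 = (1.0 - p) / p   # q = -a a''/a'^2 = (1 - p)/p for a = t**p
    print("exact :", z_exact(t0, t1, p))
    print("series:", z_series(t0, t1, H0, q0))
    print("linear:", H0 * (t0 - t1))
```

The second-order series should be visibly closer to the exact value than the linear term alone, with a remaining difference of third order in $H_0 (t_0 - t_1)$.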
This clearly indicates to first order a linear dependence of the redshift on the distance of
the galaxy and identifies H0 , the present day value of the Hubble parameter, as playing
the role of the Hubble constant introduced in (32.6).
Remarks:
1. Note that the linear relation (33.22) between recessional velocity and distance of
comoving objects is exact while (33.89) shows that the relation between redshift
and distance is only approximately linear for small z.
2. Nowadays, cosmologists routinely deal with objects with redshifts $z > 1$. For
such objects, the relation (33.89), a power-series expansion in $z$, is evidently not
appropriate. In section 37.1 we will derive a non-perturbative formula for $H = H(z)$
(the value of the Hubble parameter at the time an object emitted the light
that we now observe with redshift $z$), namely (37.8)
$$H(z) = H_0 (1 + z)\left[1 + (\Omega_M)_0\, z + (\Omega_\Lambda)_0\left(\frac{1}{(1 + z)^2} - 1\right)\right]^{1/2} . \tag{33.90}$$
Here $\Omega_M$ and $\Omega_\Lambda$ are the so-called density parameters associated to matter and a
cosmological constant, the subscript 0 denoting their value today (so that $H(z)$ is
expressed in terms of quantities that are in principle directly or indirectly observable).
3. Returning to the case of small $z$, even in that case (33.89) is not yet a very useful
way of expressing Hubble's law. First of all, the distance $a_0 r_1$
that appears in this expression is not the proper distance (unless $k = 0$), but is at
least equal to it in our approximation. Note that $a_0 r_1$ is the present distance to
the galaxy, not the distance at the time the light was emitted.
4. Even proper distance is not directly measurable or observable and thus, to compare
this formula with experiment, one needs to relate r1 to the measures of distance
used by astronomers.
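The non-perturbative formula (33.90) of remark 2 is easy to evaluate. The Python sketch below implements it as written; the numerical values $H_0 = 70$ km/s/Mpc, $(\Omega_M)_0 = 0.3$ and $(\Omega_\Lambda)_0 = 0.7$ are illustrative assumptions, not values taken from these notes, and the function name is hypothetical.

```python
import math

def hubble_parameter(z, H0, omega_m0, omega_lambda0):
    """H(z) as in (33.90), in the same units as H0."""
    bracket = 1.0 + omega_m0 * z + omega_lambda0 * (1.0 / (1.0 + z) ** 2 - 1.0)
    return H0 * (1.0 + z) * math.sqrt(bracket)

if __name__ == "__main__":
    H0 = 70.0  # assumed illustrative value in km/s/Mpc
    for z in (0.0, 0.5, 1.0, 3.0):
        print(f"z = {z:.1f}: H = {hubble_parameter(z, H0, 0.3, 0.7):.2f}")
```

Two simple consistency checks: $H(0) = H_0$ for any choice of density parameters, and for $(\Omega_M)_0 = 1$, $(\Omega_\Lambda)_0 = 0$ the formula reduces to $H(z) = H_0 (1 + z)^{3/2}$.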
One practical way of doing this is based on the so-called luminosity distance $d_L$. If for
some reason one knows the absolute luminosity of a distant star (for instance because it
shows a certain characteristic behaviour known from other stars nearby whose distances
can be measured by direct means - such objects are known as standard candles), then
one can compare this absolute luminosity $L$ with the apparent luminosity $A$. Then one
can define the luminosity distance $d_L$ by (cf. (32.2))
$$d_L^2 = \frac{L}{4\pi A} . \tag{33.91}$$
We thus need to relate $d_L$ to the coordinate distance $r_1$. The key relation is
$$\frac{A}{L} = \frac{1}{4\pi a_0^2 r_1^2}\,\frac{1}{1 + z}\,\frac{a_1}{a_0} = \frac{1}{4\pi a_0^2 r_1^2 (1 + z)^2} . \tag{33.92}$$
Here the first factor arises from dividing by the area of the sphere at distance $a_0 r_1$ and
would be the only term in a flat geometry (see the discussion of Olbers' paradox). In
a Robertson-Walker geometry, however, the photon flux will be diluted. The second
factor is due to the fact that each individual photon is being redshifted. And the third
factor (identical to the second) is due to the fact that as a consequence of the expansion
of the universe, photons emitted a time $\delta t$ apart will be measured a time $(1 + z)\delta t$ apart.
Hence the relation between $r_1$ and $d_L$ is
$$d_L = (1 + z)\,a_0 r_1 . \tag{33.93}$$
Intuitively, the fact that for z positive dL is larger than the actual (proper) distance of
the galaxy can be understood by noting that the redshift makes an object look darker
(further away) than it actually is.
This can be inserted into (33.89) to give an expression for the redshift in terms of $d_L$,
Hubble's law
$$d_L = H_0^{-1}\left[z + \tfrac{1}{2}(1 - q_0) z^2 + \ldots\right] . \tag{33.94}$$
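As a sketch of how (33.91) and (33.94) might be used in practice, the following Python fragment computes a luminosity distance from $L$ and $A$, and compares the small-$z$ expansion with a closed-form expression. The closed form used as a cross-check, $d_L = 2 H_0^{-1} (1+z)\,(1 - (1+z)^{-1/2})$, holds for the spatially flat model with $a(t) \propto t^{2/3}$ (for which $q_0 = 1/2$); it is brought in here as an assumption for comparison and is not derived in this section. Units are such that $c = 1$, and the function names are hypothetical.

```python
import math

def luminosity_distance_from_flux(L, A):
    """d_L from absolute luminosity L and apparent luminosity A, cf. (33.91)."""
    return math.sqrt(L / (4.0 * math.pi * A))

def d_L_series(z, H0, q0):
    """Small-z expansion (33.94): d_L = (z + (1 - q0) z^2 / 2) / H0."""
    return (z + 0.5 * (1.0 - q0) * z ** 2) / H0

def d_L_matter_dominated(z, H0):
    """Closed form for k = 0 and a(t) ~ t**(2/3); used only as a cross-check."""
    return 2.0 * (1.0 + z) * (1.0 - 1.0 / math.sqrt(1.0 + z)) / H0

if __name__ == "__main__":
    H0, q0 = 1.0, 0.5  # units with H0 = 1; q0 = 1/2 matches the cross-check model
    for z in (0.01, 0.05, 0.5):
        print(z, d_L_series(z, H0, q0), d_L_matter_dominated(z, H0))
```

For small $z$ the two columns should agree closely, while at $z = 0.5$ the truncated series already deviates noticeably, which illustrates why the expansion is not adequate for objects at large redshift.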
and very briefly qualitatively discussing some of the properties of these equations. This
is a continuation and variation of the theme begun in section 5.3 (general formalism for
scalar fields in a gravitational field), section 6.8 (scalar field in Rindler coordinates),
and section 25.7 (scalar field in the Schwarzschild space-time).
In the original coordinates $(t, \vec{x})$, the action explicitly takes the form
$$S[\phi] = \tfrac{1}{2}\int dt\, d^3x\; a(t)^3\left(\dot{\phi}^2 - (\vec{\nabla}\phi)^2/a(t)^2 - m^2\phi^2\right) , \tag{33.97}$$
However, in order to make the setting as close as possible to that of a scalar field
in Minkowski space (which is useful e.g. if one is intent on quantising the scalar field
afterwards), it is useful to employ the conformal time coordinate $\eta$, already introduced
in (33.41) and defined by
We see that the field $\phi$ has a non-canonical kinetic term $\sim A(\eta)^2 (\phi')^2$, leading to
Euler-Lagrange equations of motion containing friction terms,
$$\frac{d}{d\eta}\left(A^2\phi'\right) = A^2\left(\phi'' + 2(A'/A)\,\phi'\right) , \tag{33.101}$$
which are awkward (in the classical theory, but even more so for quantisation). Happily,
these non-canonical terms can be eliminated by the field redefinition
Remarks:
1. Observe that $S[\chi]$ is the standard action for a Klein-Gordon field in Minkowski
space, its only mildly exotic feature being the time-dependent mass term with an
effective mass
$$m_{\text{eff}}^2(\eta) = m^2 A(\eta)^2 - A''(\eta)/A(\eta) . \tag{33.104}$$
Thus the interaction of the scalar field with the gravitational background is
entirely encoded in this time-dependent mass term. Note that this effective mass
term is even present when the original field is massless, $m^2 = 0$.
2. This purely geometric contribution to the mass term can be interpreted as an
induced non-minimal coupling to the scalar curvature $R$ of the space-time. Indeed,
we have
$$A''/A = (1/a)\Big(a\frac{d}{dt}\Big)\Big(a\frac{d}{dt}\Big)a = \frac{d}{dt}(a\dot{a}) = a\ddot{a} + \dot{a}^2 . \tag{33.105}$$
Comparison with the result (34.7) for the Ricci scalar of the Robertson-Walker
metric for k = 0 shows that
3. This non-minimal coupling to the scalar curvature, and the factor 1/6, are (or
should be) reminiscent of the conformal coupling $\sim R\phi^2$ of a scalar field discussed
in section 21.3. If instead of with the action (33.96) we start off with the
non-minimally coupled action (21.102)
$$S_\xi[\phi] = -\tfrac{1}{2}\int\sqrt{g}\,d^4x\left(g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi + \xi R\phi^2\right) , \tag{33.108}$$
is simply the action of a free massless scalar field in Minkowski space. This is
as it should be: the action with $\xi = 1/6$ (and $m^2 = 0$) is conformally invariant,
and the Robertson-Walker metric is conformally flat (this is manifest in (33.99)).
Thus the action with this value of $\xi$ must reduce to the free action in Minkowski
space, the rescaling (33.102) of the scalar field reflecting the non-trivial conformal
weight of a scalar field in $D = 4$.
When one introduces the non-minimal $\xi$-coupling in the action together with a
non-zero explicit mass term, then everything goes through as above, the only
difference being that the effective mass is now $\xi$-dependent,
$$m_{\text{eff}}^2(\eta) = A(\eta)^2\left(m^2 + (\xi - 1/6) R(\eta)\right) . \tag{33.112}$$
4. The equations of motion are
Spatial flatness $k = 0$ brings with it the simplifying feature that we can expand the
spatial dependence of the fields in standard Fourier modes. Upon spatial Fourier
expansion,
$$\chi(\eta, \vec{x}) \sim \int d^3k\; \chi_{\vec{k}}(\eta)\, e^{i\vec{k}\cdot\vec{x}} , \tag{33.114}$$
$$\chi_{\vec{k}}''(\eta) + \omega_k^2(\eta)\,\chi_{\vec{k}}(\eta) = 0 \tag{33.115}$$
$$\omega_k^2(\eta) = m_{\text{eff}}^2(\eta) + k^2 . \tag{33.116}$$
5. The crucial feature of this action and the mode equations is their explicit
time-dependence, which means that the energy of the field is not conserved. This in turn will
lead to the important phenomena of particle or mode production in a cosmological
background.
6. For example, one can consider the (evidently highly idealised) situation where
the cosmic scale factor is asymptotically constant in the remote past and in the
remote future. During these early and late periods the metric is essentially the
Minkowski metric (possibly up to a rescaling of the coordinates), and one thus
has a preferred notion of particles during those eras, uniquely determined by
the asymptotic Poincaré symmetry. However, these definitions of particles need
not (and will almost invariably not) agree when there is an intermediate time-
dependent phase. For instance, the early time vacuum Heisenberg state would not
be interpreted or seen as a vacuum by the late time observer, and this disagreement
about the particle content is then interpreted as a particle production due to the
time-dependent gravitational field. See the references in footnote 83 (section 26.6)
for a detailed discussion of these and other related fascinating issues.
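To get a feeling for the time-dependent frequencies in the mode equation (33.115), the following Python sketch evaluates $m_{\text{eff}}^2 = m^2 A^2 - A''/A$ and $\omega_k^2 = m_{\text{eff}}^2 + k^2$ with a finite-difference second derivative. The two conformal factors used as examples, $A(\eta) = \eta$ and the de Sitter-like $A(\eta) = -1/(H\eta)$ with $\eta < 0$, are standard illustrative assumptions rather than cases worked out in this section, and the function names are hypothetical.

```python
def effective_mass_squared(A, eta, m=0.0, h=1e-4):
    """m_eff^2 = m^2 A^2 - A''/A, with A'' from a central finite difference."""
    A_pp = (A(eta + h) - 2.0 * A(eta) + A(eta - h)) / h ** 2
    return m ** 2 * A(eta) ** 2 - A_pp / A(eta)

def mode_frequency_squared(A, eta, k, m=0.0):
    """omega_k^2 = m_eff^2 + k^2, cf. (33.116)."""
    return effective_mass_squared(A, eta, m) + k ** 2

if __name__ == "__main__":
    linear = lambda eta: eta                # assumed A(eta) = eta, so A'' = 0
    de_sitter = lambda eta: -1.0 / eta      # assumed A(eta) = -1/(H eta) with H = 1
    print("A = eta      :", effective_mass_squared(linear, 1.0))
    print("A = -1/eta   :", effective_mass_squared(de_sitter, -1.0))
    for k in (1.0, 10.0):
        print("omega_k^2, k =", k, ":", mode_frequency_squared(de_sitter, -1.0, k))
```

For the massless field with $A = -1/\eta$ one expects $m_{\text{eff}}^2 = -2/\eta^2$, so that long-wavelength modes with $k^2 < 2/\eta^2$ have negative $\omega_k^2$ and do not oscillate, which is one simple way to see the explicit time-dependence at work.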
As an aside, and as a conclusion to this section, let me make some comments on (and
issue a caveat regarding) the seductive picture of an inflating balloon as the model for
an expanding universe, as depicted e.g. in Figure 44 of section 33.2.
1. For many purposes, this picture certainly provides an instructive and illuminating
analogy:
• it illustrates how an expanding universe can look the same to all comoving
observers;
• in particular, it illustrates how it is possible for everything to move away from
any given (comoving) observer without that observer actually being singled
out as special;
• it shows that a spatially homogeneous expansion naturally gives rise to a
universal velocity-distance relation;
• it illustrates (or is at least meant to illustrate) that expansion of the universe
is something intrinsic and does not mean expansion in and into some space
into which the universe has somehow been embedded.
2. The above picture of the inflating balloon is often used to describe the expansion
of the universe as an expansion of space itself, and it is then, in view of the
success of this picture, tempting to ascribe the behaviour of light and test particles
in the universe (which are mathematically described by the geodesic equations)
to such an expansion of space.
This manner of speaking and this imagery may provide some further useful in-
tuitive understanding for some effects in the general relativistic description of
cosmology. However, unless one defines precisely what one means by this, the
notion of expanding space is not without its pitfalls and as with all analogies here
one runs the risk of pushing this analogy too far. In particular, the danger hides
in the above word ascribe, i.e. in the risk of confusing cause and effect, or cause
and effective description. Fundamentally, there is no (new?) force that (somehow)
acts on space to (somehow) make it expand, and that can therefore be invoked
to explain (in a Newtonian way) the behaviour of particles and light in such a
space-time.140
scale factor at the times t0 and t1 , indicating that the redshift is not a cumulative
effect due to the expansion of space while it was traversed by the lightray, but
that it can (equally intuitively and perhaps more correctly) be ascribed to the fact
that emitter and observer do not share the same inertial frame.
4. The issue also, and in particular, arises when it comes to frequently asked questions
such as "which objects participate in the cosmic expansion?" ("do you expand
with the universe? does a hydrogen atom? does our solar system?") or, to use
cosmologists' jargon, "which objects join the Hubble flow?", which have generated
a lot of confusion over the decades. Here again the expanding space image may
lead to a misleading intuition, in particular when space is then viewed as some
kind of viscous fluid which will invariably drag other objects along with it when
it expands.141
5. These kinds of questions have a long history in general relativity, dating back at
least to an article by Einstein and Straus in 1945 entitled The Influence of the
Expansion of Space on the Gravitational Fields Surrounding the Individual Stars.
The Einstein-Straus solution, known as the Einstein-Straus vacuole is a space-
time that is obtained by a cut-and-paste procedure from a suitable cosmological
solution, removing a ball of mass M and replacing it by a Schwarzschild solution
of the same mass. This is essentially an inside-out version of the Oppenheimer-
Snyder collapse solution (removing a ball from Schwarzschild and replacing it
by a contracting cosmological solution modelling the collapsing star) discussed
in section 28. The Einstein-Straus procedure can also be applied multiple times
around various centers and can then be used to model inhomogeneities in an
otherwise homogeneous universe (and in this context the model is then known to
cosmologists as the Swiss cheese model).
6. Since the work of Einstein and Straus, a lot of work has gone into finding exact
solutions of the Einstein equations that describe gravitational objects like stars or
black holes somehow embedded into cosmological backgrounds. On the basis of
such exact solutions one can then (try to) answer the question whether a given bound
object takes part in the cosmic expansion or not, and try to develop some
intuition for this issue that complements intuition coming from more Newtonian
considerations.
There is a common folklore statement or rule of thumb to the effect that grav-
itationally (or otherwise) bound systems do not expand with the universe, and
while this statement undoubtedly has a certain validity it requires a more precise
141
For a clear discussion of these issues for geodesic but non-comoving particles, highlighting the
necessity to define precisely and carefully what one means by joining the Hubble flow, see L. Barnes,
M. Francis, J. Berian James, G. Lewis, Joining the Hubble Flow: Implications for Expanding Space,
arXiv:astro-ph/0609271.
formulation to decide if or when such a statement is not only true but also has
some non-trivial content.
7. The most prominent class of solutions among these hybrid star-cosmology solu-
tions (apart from black holes in (anti-)de Sitter space, the Schwarzschild (anti-)de
Sitter metrics (29.7)) is the so-called McVittie solution, found already in 1933.
It consists of a crude superposition of a (in the simplest case k = 0) Robertson-
Walker metric,
$$ ds^2 = -dt^2 + a(t)^2\, d\vec{x}^2 , \tag{33.117} $$
While this metric is easy to write down, it leads to a somewhat peculiar energy-
momentum tensor, and therefore its physical interpretation and significance are
somewhat obscure. These issues, as well as aspects of the global structure of the
McVittie space-time, continue to be debated in the literature to this day.142
142
For a pedagogical review of these topics and a discussion of the literature see e.g. M. Carrera, D.
Giulini, On the influence of the global cosmological expansion on the local dynamics in the Solar System,
arXiv:gr-qc/0602098, Influence of global cosmological expansion on local dynamics and kinematics,
arXiv:0810.2712 [gr-qc]. For more on the on-going McVittie debate, see also N. Kaloper, M. Kleban,
D. Martin, McVittie's Legacy: Black Holes in an Expanding Universe and references therein and thereto.
34 Cosmology III:
Friedmann-Lemaître-Robertson-Walker Cosmology
So far, we have only used the kinematical framework provided by the Robertson-Walker
metrics and we never used the Einstein equations. The benefit of this is that it allows one
to deduce relations between observed quantities and assumptions about the universe
which are valid even if the Einstein equations are not entirely correct, perhaps because
of higher derivative or other quantum corrections in the early universe.
Now, on the other hand we will have to be more specific, specify the matter content and
solve the Einstein equations for a(t). We will see that a lot about the solutions of the
Einstein equations can already be deduced from a purely qualitative analysis of these
equations, without having to resort to explicit solutions (section 35). Exact solutions
will then be the subject of section 36.
Of course, the first thing we need to discuss solutions of the Einstein equations is
the Ricci tensor of the Robertson-Walker metric. Since we already know the curvature
tensor of the maximally symmetric spatial metric entering the Robertson-Walker metric
(and its contractions), this is not difficult.
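In this section all objects with a tilde, $\tilde{\ }$, will refer to 3-dimensional quantities
calculated with respect to the maximally symmetric metric $\tilde g_{ij}$.
2. One can then calculate the Christoffel symbols in terms of $a(t)$ and $\tilde\Gamma^i_{\;jk}$. The
non-vanishing components are (we had already established that $\Gamma^\mu_{\;00} = 0$)
$$ \Gamma^i_{\;jk} = \tilde\Gamma^i_{\;jk} , \qquad \Gamma^i_{\;j0} = \frac{\dot a}{a}\,\delta^i_{\;j} , \qquad \Gamma^0_{\;ij} = a\dot a\,\tilde g_{ij} . \tag{34.2} $$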
4. The partial contraction of the purely spatial components of the Riemann tensor
over the spatial indices is thus
$$ R^k_{\;ikj} = \tilde R_{ij} + 2\dot a^2\,\tilde g_{ij} = 2k\,\tilde g_{ij} + 2\dot a^2\,\tilde g_{ij} . \tag{34.5} $$
Remarks:
A formal proof of this is given at the end of section 34.2. It is phrased there as
a statement about the energy-momentum tensor in a Robertson-Walker metric,
but the result is a general statement about the structure of spatially maximally
symmetric space-time tensors. Thus in a sense the only non-trivial content of the
above calculation is in the precise form of the t-dependent coefficients of G00 and
Gij .
2. We already know that in a maximally symmetric space not only can we express
the Ricci tensor in terms of the Riemann tensor (namely as a contraction thereof)
but we can also write the Riemann tensor algebraically in terms of the Ricci tensor
(and even just in terms of the Ricci scalar), as is obvious from (34.4).
Even though the Robertson-Walker metrics are not space-time maximally sym-
metric, it is nevertheless true that even in this case the Riemann tensor can be
expressed algebraically in terms of the Ricci tensor. Indeed, it is easy to see that
the components of the Riemann tensor given in (34.3) can be written in terms of
the components of the Ricci tensor in (34.6) simply as
Ri0j0 = 13 R00 ij
(34.9)
Rkilj = 21 (kl Rij kj Ril ) .
On general grounds this follows from the fact, established in section 33.5, that the
Robertson-Walker metrics are conformally flat so that the Weyl tensor vanishes.
3. The significance of this statement lies in the fact that it shows that a vacuum so-
lution of the Einstein equations with spatial maximal symmetry is necessarily flat
Minkowski space. This is perhaps as it should be, and at least vaguely Machian,
but it is still good to have established this here once and for all since by just
solving the vacuum equations one may (and will) find a solution that at first sight
appears to be non-trivial, namely the Milne universe to be discussed in section
36.1, but which can then be shown to be just Minkowski space written in some
non-inertial coordinates. The above result (34.9) shows that this had to be true.
4. Occasionally, in particular for the canonical analysis (i.e. developing the Hamil-
tonian formalism), it is useful to know the Ricci scalar (i.e. the Einstein-Hilbert
Lagrangian) for the slightly more general metric
$$ ds^2 = -N^2(t')\,(dt')^2 + a^2(t')\,d\tilde s^2 , \tag{34.10} $$
where the function $N(t')$ is known as the lapse function (cf. (20.18)). Instead of
redoing the calculation of the scalar curvature in this case, one can simply use the
change of variable
$$ dt = N(t')\,dt' \quad\Rightarrow\quad \frac{d}{dt} = \frac{1}{N}\frac{d}{dt'} \tag{34.11} $$
to rewrite the final result (34.7) as (a prime on $a$ or $N$ denoting a derivative with
respect to $t'$)
$$ R = \frac{6}{a^2}\left(a\ddot a + \dot a^2 + k\right) = \frac{6}{a^2 N^3}\left(N(a a'' + (a')^2) - a a' N' + kN^3\right) . \tag{34.12} $$
We will come back to and make use of this result in section 34.6.
5. The results for the Christoffel symbols (34.2) and the Riemann tensor (34.3) are
true in any dimension, i.e. for a general n-dimensional maximally symmetric space,
and the first time that a dimension-dependence enters is in the factors of 2 and 3
in equations (34.5) and (34.6), which arise from taking traces. If one replaces the
spatial dimension $3 \to n$, equations (34.6) - (34.8) take the form
$$
\begin{aligned}
R_{00} &= -n\,\frac{\ddot a}{a} \\
R_{ij} &= \left(\frac{\ddot a}{a} + (n-1)\,\frac{k + \dot a^2}{a^2}\right) g_{ij} \\
R &= 2n\,\frac{\ddot a}{a} + n(n-1)\,\frac{k + \dot a^2}{a^2} \\
G_{00} &= \frac{n(n-1)}{2}\,\frac{k + \dot a^2}{a^2} \\
G_{ij} &= -(n-1)\left(\frac{\ddot a}{a} + \frac{(n-2)}{2}\,\frac{k + \dot a^2}{a^2}\right) g_{ij} .
\end{aligned}
\tag{34.13}
$$
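The n-dimensional expressions (34.13) can be cross-checked against each other with exact rational arithmetic. The following is only a sketch: the function names and the sample values of $a$, $\dot a$, $\ddot a$ are arbitrary choices made for this illustration, and the check assumes the conventions $g_{00} = -1$ and $G_{\mu\nu} = R_{\mu\nu} - \frac12 g_{\mu\nu} R$.

```python
from fractions import Fraction


def rw_curvature(n, a, da, dda, k):
    """Coefficients appearing in (34.13) for n spatial dimensions.

    Returns (R00, c_R, R, G00, c_G), where R_ij = c_R * g_ij and
    G_ij = c_G * g_ij, with g_ij the full spatial part of the metric.
    The arguments a, da, dda stand for a, adot, addot at some instant.
    """
    a, da, dda, k = (Fraction(v) for v in (a, da, dda, k))
    x = (k + da**2) / a**2          # recurring combination (k + adot^2)/a^2
    acc = dda / a                   # addot / a
    R00 = -n * acc
    c_R = acc + (n - 1) * x
    R = 2 * n * acc + n * (n - 1) * x
    G00 = Fraction(n * (n - 1), 2) * x
    c_G = -(n - 1) * (acc + Fraction(n - 2, 2) * x)
    return R00, c_R, R, G00, c_G


def check(n, a, da, dda, k):
    """True if the five expressions are mutually consistent."""
    R00, c_R, R, G00, c_G = rw_curvature(n, a, da, dda, k)
    trace_ok = R == -R00 + n * c_R      # R = g^00 R_00 + g^ij R_ij
    g00_ok = G00 == R00 + R / 2         # G_00 = R_00 - (1/2) g_00 R
    gij_ok = c_G == c_R - R / 2         # G_ij = R_ij - (1/2) g_ij R
    return trace_ok and g00_ok and gij_ok


if __name__ == "__main__":
    print(all(check(n, 2, 3, -1, k) for n in range(2, 7) for k in (-1, 0, 1)))
```

Setting n = 3 reproduces the factors of 2 and 3 mentioned above, which is a quick way to catch a misplaced (n - 1) or (n - 2).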
Next we need to specify the matter content. On physical grounds one might perhaps like
to argue that in the approximation underlying the cosmological principle galaxies (or
clusters) should be treated as non-interacting particles or a perfect fluid (first discussed
in section 6.2). As it turns out, we do not need to do this as either the symmetries
of the metric or comparison with the Einstein tensor determined above fix the energy-
momentum tensor to be that of a perfect fluid anyway.
At the end of this section I will give a formal argument for this using Killing vectors.
Informally we can already deduce this from the structure of the Einstein tensor obtained
above. Comparing (34.8) with the Einstein equation $G_{\mu\nu} = 8\pi G_N T_{\mu\nu}$, we deduce that
the Einstein equations can only have a solution with a Robertson-Walker metric if the
energy-momentum tensor is of the form
$$ T_{00} = \rho(t) , \qquad T_{0i} = 0 , \qquad T_{ij} = p(t)\,g_{ij} , \tag{34.14} $$
where $p(t)$ and $\rho(t)$ are some functions of time. A covariant way of writing this tensor
is as
$$ T_{\mu\nu} = (p + \rho)\,u_\mu u_\nu + p\,g_{\mu\nu} , \tag{34.15} $$
on its density, $p = p(\rho)$. The most useful toy-models of cosmological fluids arise from
considering a linear relationship between $p$ and $\rho$, of the type
$$ p = w\rho , \tag{34.16} $$
where w is known as the equation of state parameter. Occasionally also more exotic
equations of state are considered, but the above covers a wide variety of commonly
considered fluids and gases and other simple thermodynamic systems.
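Consider e.g. a system whose entropy $S$ is some function of the (internal) energy $E$ and
the (spatial) volume $V$,
$$ S = S(E, V) . \tag{34.17} $$
The first law of thermodynamics,
$$ T\,dS = dE + p\,dV , \tag{34.18} $$
implies
$$ \frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_V , \qquad p = T\left(\frac{\partial S}{\partial V}\right)_E , \tag{34.19} $$
and thus
$$ p = w\rho = wE/V \quad\Leftrightarrow\quad V\,\partial_V S = wE\,\partial_E S . \tag{34.20} $$
This condition is solved by an arbitrary function of the combination $V^w E$,
$$ p = w\rho \quad\Leftrightarrow\quad S = S(V^w E) . \tag{34.21} $$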
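The equivalence (34.21) is easy to test numerically. The sketch below picks an entropy of the form $S = f(V^w E)$ and recovers the pressure from (34.19) by central differences; the function names, the default choice f = log and the step size h are assumptions made purely for this illustration.

```python
import math


def entropy(E, V, w, f=math.log):
    """Hypothetical entropy S = f(V**w * E); f is any smooth function."""
    return f(V**w * E)


def pressure_from_entropy(E, V, w, f=math.log, h=1e-6):
    """p = T (dS/dV)_E = (dS/dV)_E / (dS/dE)_V, cf. (34.19).

    Both partial derivatives are estimated by central differences.
    """
    dS_dV = (entropy(E, V + h, w, f) - entropy(E, V - h, w, f)) / (2 * h)
    dS_dE = (entropy(E + h, V, w, f) - entropy(E - h, V, w, f)) / (2 * h)
    return dS_dV / dS_dE


if __name__ == "__main__":
    E, V, w = 2.0, 3.0, 1.0 / 3.0
    print(pressure_from_entropy(E, V, w), w * E / V)  # the two should agree
```

Replacing f by another smooth function, e.g. `lambda x: x**0.75`, should leave the resulting pressure unchanged, which is the content of (34.21).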
Here are the most common and useful special cases of the equation of state $p = w\rho$.
1. Dust
For non-interacting particles, there is no pressure, $p = 0$, i.e. $w = 0$, and the
energy-momentum tensor has the simple form
$$ T^{\mu\nu} = \rho\,u^\mu u^\nu \tag{34.22} $$
2. Radiation
This corresponds to $w = 1/3$ (in 1+3 dimensions). One way to see this is to note
that the trace of a perfect fluid energy-momentum tensor is
$$ T^\mu_{\;\mu} = -\rho + 3p . \tag{34.24} $$
For electro-magnetic radiation, for example, the energy-momentum tensor is that
of Maxwell theory and hence traceless (6.121). Therefore electromagnetic radiation
in an FLRW universe (in particular compatibility with the symmetries implies
neglecting all anisotropies) has the equation of state $p = \rho/3$.
Alternatively, this can be deduced from familiar statements about the thermodynamics
of electromagnetic radiation. E.g. $S \propto E/T$ and $\rho \propto T^4$ imply
$S \propto V^{1/4}E^{3/4} = (V^{1/3}E)^{3/4}$,
which (by (34.21)) also implies $w = 1/3$ (alternatively this follows from $s = S/V \propto \rho^{3/4}$,
and this generalises to $s \propto \rho^{d/(d+1)}$ for $d$ spatial dimensions, corresponding to
$w = 1/d$).
In general spatial dimension $d$, a traceless perfect fluid energy-momentum tensor
(describing what one might call a Weyl invariant or conformal fluid - cf. the
discussion in section 6.7) thus has the form
$$ T^\mu_{\;\mu} = 0 \quad\Leftrightarrow\quad p = \rho/d \quad\Leftrightarrow\quad T_{\mu\nu} = \frac{\rho}{d}\left(g_{\mu\nu} + (d + 1)\,u_\mu u_\nu\right) . \tag{34.27} $$
3. Cosmological Constant
A cosmological constant $\Lambda$, on the other hand, corresponds, as we will see, to a
matter contribution with $w = -1$, i.e. one has $p = -\rho$.
Remarks:
1. In section 21.1 we had introduced various energy conditions, the null energy condition
(NEC), the weak energy condition (WEC), the dominant energy condition
(DEC), and the strong energy condition (SEC), and had also analysed their implications
for a perfect fluid energy-momentum tensor. In particular, the conditions
that we found were
$$
\begin{aligned}
\text{NEC:}\quad & \rho + p \geq 0 \\
\text{WEC:}\quad & \rho \geq 0 , \quad \rho + p \geq 0 \\
\text{DEC:}\quad & \rho \geq |p| \\
\text{SEC:}\quad & \rho + p \geq 0 , \quad \rho + 3p \geq 0 .
\end{aligned}
\tag{34.30}
$$
With the equation of state $p = w\rho$ these energy conditions can now be written as
conditions on $w$. For physical (gravitating instead of anti-gravitating) matter one
usually requires at least the condition $\rho > 0$ (positive energy density). With this
condition, the NEC and the WEC are equivalent and require $w \geq -1$,
while the SEC requires (the 2nd condition is then stronger than the 1st) $w \geq -1/3$.
Some of the conclusions about the qualitative behaviour of the solutions to the
Einstein equations in section 35 rely on the strict validity of the SEC, i.e. on the
assumption that (at least in the era of interest) the matter content of the universe
is dominated by stuff with $w > -1/3$.
On the other hand, as we had seen, a cosmological constant has $w = -1$, and
thus either $\rho$ is negative or $p$ is negative. Therefore this violates either $\rho > 0$ (the
WEC and the DEC) or $\rho + 3p > 0$ (the SEC).
2. The equation of state parameter need not necessarily be a constant. Consider for
instance a scalar field $\phi$. Such a field will respect the symmetries of a Robertson-Walker
metric (and hence can potentially give rise to a solution of the Einstein
equations of the Robertson-Walker form) if it depends only on $t$ and not on the
spatial coordinates, $\phi = \phi(t)$. For such a scalar field, the energy-momentum tensor
(6.75),
$$ T_{\mu\nu} = \partial_\mu\phi\,\partial_\nu\phi - \tfrac12\,g_{\mu\nu}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - g_{\mu\nu}V(\phi) , \tag{34.34} $$
reduces to
$$ T_{00} = \tfrac12\dot\phi^2 + V(\phi) , \qquad T_{ij} = \left(\tfrac12\dot\phi^2 - V(\phi)\right) g_{ij} , \tag{34.35} $$
which is thus indeed of the general perfect fluid form (34.14). The energy and
pressure density are $\rho = \tfrac12\dot\phi^2 + V(\phi)$ and $p = \tfrac12\dot\phi^2 - V(\phi)$.
Note that here we are treating the scalar field as a source term for the Einstein
equations. This should be contrasted with (and should not be confused with)
what we did in section 5.3, where we developed the general formalism for scalar
fields in a fixed gravitational field, neglecting the backreaction of the matter fields
on the gravitational field / metric.
Note also that (34.37) mimics a cosmological constant, i.e. $w = -1$, during periods
(of the scalar field slowly rolling down a very flat potential) where the kinetic
energy term is negligible compared with the potential energy term. This can also
be seen directly from the action (5.13), a constant potential leading to a constant
contribution to the matter or gravity Lagrangian, a.k.a. a cosmological constant,
and plays an important role in models of inflation.
3. Occasionally, more exotic equations of state are also considered in cosmology, such
as that of a Chaplygin gas, with the exotic equation of state
$$ p = -\frac{A}{\rho} . \tag{34.38} $$
We will briefly return to the properties of the Chaplygin gas at the end of section
34.4 - cf. (34.88) and (34.89), but for the most part we will concentrate on the
linear equations of state $p = w\rho$ in the following.
T = k k (34.39)
T = k k = 0 , (34.40)
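The conditions on $w$ discussed in remark 1, and the time-dependent equation of state of a homogeneous scalar field from remark 2, can be tabulated with a few lines of code. This is only a sketch: the function names are made up for this illustration, the default $\rho = 1$ is an arbitrary positive reference value, and the scalar field density and pressure are read off by comparing (34.35) with (34.14).

```python
def energy_conditions(w, rho=1.0):
    """Evaluate the perfect-fluid energy conditions (34.30) for p = w * rho."""
    p = w * rho
    return {
        "NEC": rho + p >= 0,
        "WEC": rho >= 0 and rho + p >= 0,
        "DEC": rho >= abs(p),
        "SEC": rho + p >= 0 and rho + 3 * p >= 0,
    }


def scalar_field_w(phi_dot, V):
    """Equation of state parameter w = p / rho of a homogeneous scalar field.

    Uses rho = phi_dot**2 / 2 + V and p = phi_dot**2 / 2 - V.
    """
    rho = 0.5 * phi_dot**2 + V
    p = 0.5 * phi_dot**2 - V
    return p / rho


if __name__ == "__main__":
    for name, w in [("dust", 0.0), ("radiation", 1.0 / 3.0), ("Lambda", -1.0)]:
        print(name, energy_conditions(w))
    print(scalar_field_w(phi_dot=0.01, V=1.0))  # slow roll: close to -1
```

For positive $\rho$ the cosmological constant case fails only the SEC, in line with the remark above, and the scalar field interpolates between $w = -1$ (potential dominated) and $w = +1$ (kinetic dominated).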
Here is the formal argument that the energy-momentum tensor necessarily has the form
given in (34.14).143
It is of course a consequence of the Einstein equations that any symmetries of the Ricci
(or Einstein) tensor also have to be symmetries of the energy-momentum tensor. Now
we know that the metric $\tilde g_{ij}$ has six Killing vectors $K^{(a)}$ and that (in the comoving
coordinate system) these are also Killing vectors of the Robertson-Walker metric, $L_{K^{(a)}} g_{\mu\nu} = 0$.
Therefore also the Ricci and Einstein tensors have these symmetries, $L_{K^{(a)}} R_{\mu\nu} = 0$ and $L_{K^{(a)}} G_{\mu\nu} = 0$.
To prove this one can either (non-covariantly) choose, for each Killing vector, an adapted
coordinate system, or one generalises the argument given in section 12.4 for the Ricci
scalar, $L_K R = 0$, to the Ricci tensor.
The Einstein equations then imply that $T_{\mu\nu}$ should have these symmetries, $L_{K^{(a)}} T_{\mu\nu} = 0$.
Moreover, since the $L_{K^{(a)}}$ act like three-dimensional coordinate transformations, in order
to see what these conditions mean we can make a (3 + 1)-decomposition of the
energy-momentum tensor. From the three-dimensional point of view, $T_{00}$ transforms like a
scalar under coordinate transformations (and Lie derivatives), $T_{0i}$ like a vector, and $T_{ij}$
like a symmetric tensor. Thus we need to determine what are the three-dimensional
scalars, vectors and symmetric tensors that are invariant under the full six-parameter
group of the three-dimensional isometries.
For scalars $\phi$ we thus require (calling $K$ now any one of the Killing vectors of $\tilde g_{ij}$),
$$ L_K\phi = K^i\partial_i\phi = 0 . \tag{34.44} $$
Since $K^i(x)$ can take any value in a maximally symmetric space (homogeneity), this
implies that $\phi$ has to be constant (as a function on the three-dimensional space) and
therefore $T_{00}$ can only be a function of time,
$$ T_{00} = \rho(t) . \tag{34.45} $$
For vectors, it is almost obvious that no invariant vectors can exist because any vector
would single out a particular direction and therefore spoil isotropy. The formal argument
(as a warm up for the argument for tensors) is the following. We have
jV i + V j
LK V i = K j jKi . (34.46)
143
This elementary argument, which requires no higher knowledge about Lie groups and invariants,
is adapted from the discussion in section 13.4 of S. Weinberg, Gravitation and Cosmology.
We now choose the Killing vectors such that $K^i(x) = 0$ but $\tilde\nabla_i K_j \equiv K_{ij}$ is an arbitrary
anti-symmetric matrix. Then the first term disappears and we have
$$ L_K V^i = 0 \quad\Rightarrow\quad K_{ij}V^j = 0 . \tag{34.47} $$
$$ \delta^k_{\;i}V^j = \delta^j_{\;i}V^k , \tag{34.49} $$
$$ T_{0i} = 0 . \tag{34.50} $$
We now come to symmetric tensors. Once again we choose our Killing vectors to vanish
at a given point $x$ and such that $K_{ij}$ is an arbitrary anti-symmetric matrix. Then the
condition
$$ L_K T_{ij} = K^k\tilde\nabla_k T_{ij} + \tilde\nabla_i K^k\,T_{kj} + \tilde\nabla_j K^k\,T_{ik} = 0 \tag{34.51} $$
reduces to
$$ K_{mn}\left(\tilde g^{mk}\delta^n_{\;i}T_{kj} + \tilde g^{mk}\delta^n_{\;j}T_{ik}\right) = 0 . \tag{34.52} $$
If this is to hold for all anti-symmetric matrices $K_{mn}$, the anti-symmetric part of the
term in brackets must be zero or, in other words, it must be symmetric in the indices
$m$ and $n$, i.e.
$$ \tilde g^{mk}\delta^n_{\;i}T_{kj} + \tilde g^{mk}\delta^n_{\;j}T_{ik} = \tilde g^{nk}\delta^m_{\;i}T_{kj} + \tilde g^{nk}\delta^m_{\;j}T_{ik} . \tag{34.53} $$
Therefore
$$ T_{ij} = \frac{\tilde g_{ij}}{n}\,T^k_{\;k} . \tag{34.55} $$
Now we already know that the scalar $T^k_{\;k}$ has to be a constant. Thus we conclude that
the only invariant tensor is the metric itself, and therefore the $T_{ij}$-components of the
energy-momentum tensor can only be a function of $t$ times $\tilde g_{ij}$. Writing this function as
$p(t)a^2(t)$, we arrive at
$$ T_{ij} = p(t)\,g_{ij} . \tag{34.56} $$
We thus see that the energy-momentum tensor is determined by two functions, $\rho(t)$ and
$p(t)$, precisely as in (34.14).
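The step from (34.53) to (34.55) is a contraction. As a sketch (assuming the symmetry $T_{ik} = T_{ki}$, writing $T^m{}_j = \tilde g^{mk}T_{kj}$, and writing $\dim$ for the dimension of the maximally symmetric space, which is what the $n$ in the denominator of (34.55) stands for), one can contract the free index $n$ of (34.53) with $i$:

```latex
\begin{align*}
\text{left-hand side:}\quad
  & \tilde g^{mk}\,\delta^{i}{}_{i}\,T_{kj} + \tilde g^{mk}\,\delta^{i}{}_{j}\,T_{ik}
    = (\dim)\,T^{m}{}_{j} + T^{m}{}_{j} \\
\text{right-hand side:}\quad
  & \tilde g^{ik}\,\delta^{m}{}_{i}\,T_{kj} + \tilde g^{ik}\,\delta^{m}{}_{j}\,T_{ik}
    = T^{m}{}_{j} + \delta^{m}{}_{j}\,T^{k}{}_{k} \\
\Rightarrow\quad
  & (\dim)\,T^{m}{}_{j} = \delta^{m}{}_{j}\,T^{k}{}_{k}
    \quad\Leftrightarrow\quad
    T_{ij} = \frac{\tilde g_{ij}}{\dim}\,T^{k}{}_{k}
\end{align*}
```

The term $T^m{}_j$ appearing on both sides drops out, which is why only the trace part of $T_{ij}$ survives.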
34.3 Conservation Laws and Comoving Congruences
$$ J^\mu = n(t)\,u^\mu \tag{34.57} $$
in covariant form. Here $n(t)$ could be a number density like a galaxy number density.
It gives the number density per unit proper volume. The conservation law $\nabla_\mu J^\mu = 0$ is
equivalent to
$$ \nabla_\mu J^\mu = 0 \quad\Leftrightarrow\quad \partial_t\left(\sqrt{g}\,n(t)\right) = 0 . \tag{34.58} $$
Thus we see that $n(t)$ is not constant, but the number density per unit coordinate
volume is (as we had already anticipated in the picture of the balloon, Figure 44).
For a Robertson-Walker metric, the time-dependent part of $\sqrt{g}$ is $a(t)^3$, and thus the
conservation law says
$$ n(t)\,a(t)^3 = \text{const.} \tag{34.59} $$
Let us now turn to the conservation laws associated with the energy-momentum tensor,
$$ \nabla_\mu T^{\mu\nu} = 0 . \tag{34.60} $$
$$ \nabla_\mu T^{\mu i} = 0 , \tag{34.61} $$
turn out to be identically satisfied, by virtue of the fact that the $u^\mu$ are geodesic and
that the functions $\rho$ and $p$ are only functions of time. This could hardly be otherwise
because $\nabla_\mu T^{\mu i}$ would have to be an invariant vector, and we know that there are none
(nevertheless it is instructive to check this explicitly).
$$ \nabla_\mu T^{\mu 0} = \partial_\mu T^{\mu 0} + \Gamma^\mu_{\;\mu\nu}T^{\nu 0} + \Gamma^0_{\;\mu\nu}T^{\mu\nu} = 0 , \tag{34.62} $$
which for a perfect fluid with $T_{00} = \rho(t)$ and $T_{ij} = p(t)g_{ij}$ becomes
Inserting the explicit expressions (34.2) for the Christoffel symbols, one finds
$$ \dot\rho = -3(\rho + p)\,\frac{\dot a}{a} . \tag{34.64} $$
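For the reader who wants to see the individual terms, here is a sketch of how (34.62) turns into (34.64). It assumes the Christoffel symbols (34.2), i.e. $\Gamma^i{}_{j0} = (\dot a/a)\,\delta^i{}_j$ and $\Gamma^0{}_{ij} = a\dot a\,\tilde g_{ij}$ with all other components involving a $0$ index vanishing, together with the perfect fluid form (34.14) written with upper indices:

```latex
\begin{align*}
T^{00} &= \rho , \qquad T^{0i} = 0 , \qquad
T^{ij} = p\,g^{ij} = p\,a^{-2}\,\tilde g^{ij} \\
\partial_\mu T^{\mu 0} &= \dot\rho \\
\Gamma^{\mu}{}_{\mu\nu}\,T^{\nu 0} &= \Gamma^{i}{}_{i0}\,\rho
   = 3\,\frac{\dot a}{a}\,\rho \\
\Gamma^{0}{}_{\mu\nu}\,T^{\mu\nu} &= \Gamma^{0}{}_{ij}\,T^{ij}
   = a\dot a\,\tilde g_{ij}\;p\,a^{-2}\,\tilde g^{ij}
   = 3\,\frac{\dot a}{a}\,p \\
\Rightarrow\quad \dot\rho &= -3\,(\rho + p)\,\frac{\dot a}{a}
\end{align*}
```

The two factors of 3 both come from the trace $\delta^i{}_i = \tilde g_{ij}\tilde g^{ij} = 3$ over the spatial directions.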
Before discussing some special cases of this, it is instructive to rederive the above results
in a somewhat more general and covariant manner.144 Thus we consider a general
144
For a detailed exposition of cosmology in this covariant framework see e.g. G. Ellis, H. van Elst,
Cosmological Models (Cargèse Lectures 1998), arXiv:gr-qc/9812046, or the even more detailed monograph
by G. Ellis, R. Maartens, M. MacCallum, Relativistic Cosmology.
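velocity field $u^\mu(x)$ with $u^\mu u_\mu = -1$, and the perfect fluid energy-momentum tensor
(34.15),
$$ T^{\mu\nu} = (p + \rho)\,u^\mu u^\nu + p\,g^{\mu\nu} , \tag{34.65} $$
with for the time being $\rho$ and $p$ arbitrary functions of the space-time coordinates. Note
that $u^\mu\nabla_\mu$ is the covariant derivative along the integral curves of $u^\mu$, the object we
denoted $D_\tau$ or $D/D\tau$ in section 4.7. Acting on scalars we will simply denote it, as
usual, by an overdot, i.e.
$$ \dot\rho \equiv u^\mu\nabla_\mu\rho \tag{34.66} $$
etc. Let us now see what the condition $\nabla_\mu T^{\mu\nu} = 0$ (which has to hold if this
energy-momentum tensor is to give us a solution to the Einstein equations) tells us.
$$ T^{\mu\nu} = \rho\,u^\mu u^\nu \quad\Rightarrow\quad \nabla_\mu T^{\mu\nu} = (\dot\rho + \rho\,\nabla_\mu u^\mu)\,u^\nu + \rho\,u^\mu\nabla_\mu u^\nu . \tag{34.67} $$
Here $\nabla_\mu u^\mu \equiv \theta$ is (and measures) the expansion of the velocity field $u^\mu$ (and was
introduced previously, in the context of the Raychaudhuri equation, in (11.34)),
and the last term $u^\mu\nabla_\mu u^\nu \equiv a^\nu$ is its acceleration (4.94), so that we can also write
this equation as
$$ (\dot\rho + \rho\theta)\,u^\nu + \rho\,a^\nu = 0 . \tag{34.68} $$
$$ u^\nu u_\nu = -1 \quad\Rightarrow\quad u_\nu a^\nu = 0 , \tag{34.69} $$
$$ \nabla_\mu T^{\mu\nu} = 0 \quad\Leftrightarrow\quad \dot\rho + \theta\rho = 0 \quad\text{and}\quad a^\nu = u^\mu\nabla_\mu u^\nu = 0 . \tag{34.70} $$
Its time (energy flow) component is a continuity equation, while its space (momentum
flow) part tells us that the particles have to move on geodesics.
$$ u^\mu h_{\mu\nu} \equiv u^\mu(g_{\mu\nu} + u_\mu u_\nu) = u_\nu - u_\nu = 0 . \tag{34.71} $$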
again breaks up nicely into two orthogonal pieces. The part tangent to $u^\mu$ tells us
that
$$ \dot\rho + (\rho + p)\,\theta = 0 , \tag{34.73} $$
Returning thus to the cosmological setting, where we have (correctly, and uniquely as we
now know) chosen the matter to move along geodesics, we are left with the continuity
equation (34.73), which is now the same as (34.64) because for $u^\mu = (1, 0, 0, 0)$ in
comoving coordinates one has
$$ \theta = \nabla_\mu u^\mu = \frac{1}{\sqrt{g}}\,\partial_\mu(\sqrt{g}\,u^\mu) = a(t)^{-3}\,\partial_t(a(t)^3) = 3\dot a(t)/a(t) . \tag{34.75} $$
$$ \frac{d}{dt}\theta = -3H^2(q + 1) . \tag{34.77} $$
This equation is a special case of the Raychaudhuri equation (11.36) for timelike geodesic
congruences,
$$ \frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \tag{34.78} $$
Indeed, specialising (34.78) to the family of comoving observers in a Robertson-Walker
geometry and noting that
• the rotation is zero (either by explicit calculation or, more to the point, because
the geodesics are orthogonal to the hypersurfaces t = const., or on symmetry
grounds)
• the shear is zero (either by explicit calculation or on symmetry grounds)
$$ \frac{d}{dt}\theta = -\tfrac13\theta^2 - R_{00} = -3\dot a^2/a^2 + 3\ddot a/a \tag{34.79} $$
which is identical to (34.76). Thus we see how the parameters $H(t)$ and $q(t)$, originally
introduced to characterise the first terms in a Taylor expansion of the cosmic scale factor
also govern the local behaviour of freely falling observers like (clusters of) galaxies. In
particular, in an expanding universe the congruences of comoving observers diverge /
expand ($\theta > 0$) but the rate of expansion decreases ($\dot\theta < 0$) provided that $q > -1$.
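As a concrete check of (34.77) and (34.79), one can take a power-law scale factor, for which everything is known in closed form. The sketch below assumes $a(t) = t^\alpha$ (an arbitrary choice made for illustration, not a solution singled out at this point of the notes), the standard definitions $H = \dot a/a$ and $q = -a\ddot a/\dot a^2$, and $R_{00} = -3\ddot a/a$; exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def power_law_kinematics(alpha, t):
    """Exact H, q, theta, d(theta)/dt for the assumed a(t) = t**alpha."""
    alpha, t = Fraction(alpha), Fraction(t)
    H = alpha / t                      # adot / a
    q = -(alpha - 1) / alpha           # -a * addot / adot**2
    theta = 3 * H                      # expansion of the comoving congruence
    theta_dot = -3 * alpha / t**2      # time derivative of 3 * alpha / t
    return H, q, theta, theta_dot


def raychaudhuri_rhs(alpha, t):
    """Right-hand side of (34.79): -theta**2 / 3 - R_00, with R_00 = -3 addot / a."""
    alpha, t = Fraction(alpha), Fraction(t)
    theta = 3 * alpha / t
    R00 = -3 * alpha * (alpha - 1) / t**2
    return -theta**2 / 3 - R00


if __name__ == "__main__":
    H, q, theta, theta_dot = power_law_kinematics(Fraction(2, 3), 5)
    print(theta_dot == -3 * H**2 * (q + 1))              # (34.77)
    print(theta_dot == raychaudhuri_rhs(Fraction(2, 3), 5))  # (34.79)
```

For any positive exponent one finds $q + 1 = 1/\alpha > 0$, so the expansion rate of the congruence decreases, as stated above.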
$$ \dot\rho = -3(\rho + p)\,\frac{\dot a}{a} , \tag{34.80} $$
for some specific equations of state.
1. For instance, when the pressure of the cosmic matter is negligible, like in the
universe today, and we can treat the galaxies (without disrespect) as dust, then
one has
$$ w = 0 \quad\Rightarrow\quad \dot\rho = -3\rho\,\frac{\dot a}{a} , \tag{34.81} $$
and this equation can trivially be integrated to
$$ \rho(t)\,a(t)^3 = \text{const.} \tag{34.82} $$
Thus the (proper) density is proportional to the inverse (proper) spatial volume,
an unsurprising (and reassuring) result.
2. On the other hand, if the universe is dominated by, say, radiation, then one has
the equation of state $p = \rho/3$, and the conservation equation reduces to
$$ w = 1/3 \quad\Rightarrow\quad \dot\rho = -4\rho\,\frac{\dot a}{a} , \tag{34.83} $$
and therefore
$$ \rho(t)\,a(t)^4 = \text{const.} \tag{34.84} $$
The reason why the energy density of photons decreases faster with $a(t)$ than that
of dust is of course . . . the redshift.
3. More generally, for matter with equation of state parameter $w$ one finds
$\rho(t)\,a(t)^{3(1+w)} = \text{const.}$
5. Quite generally, we see from (34.80) that in an expanding universe (i.e. $\dot a > 0$),
the energy density of matter satisfying the null energy condition (NEC) $\rho + p \geq 0$
cannot increase (and will necessarily decrease for matter satisfying the strict NEC,
with $\rho + p > 0$),
6. As the final example, consider the peculiar Chaplygin gas with equation of state
$p = -A/\rho$ (34.38) with $A$ constant. In this case (34.80) reads
$$ p = -A/\rho \quad\Rightarrow\quad \dot\rho = -3\left(\rho - \frac{A}{\rho}\right)\frac{\dot a}{a} , \tag{34.88} $$
and a universe filled with a cosmological constant at late times (large $a(t)$),
$$ a(t)\ \text{large} \quad\Rightarrow\quad \rho(t) \to \sqrt{A} , \quad p(t) \to -\sqrt{A} . \tag{34.91} $$
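The scaling laws above can also be obtained by integrating the continuity equation (34.80) numerically. Since (34.80) only involves $\dot a/a$, it can be rewritten as $d\rho/d\ln a = -3(\rho + p(\rho))$, with no need to know $a(t)$. The following sketch integrates this with a classical fourth-order Runge-Kutta step; the function name, the normalisation $a = 1$ at the start and the step count are arbitrary choices made for this illustration.

```python
import math


def evolve_density(rho0, pressure, a_final, steps=2000):
    """Integrate d(rho)/d(ln a) = -3 (rho + p(rho)) from a = 1 to a = a_final.

    `pressure` is a function rho -> p(rho), i.e. the equation of state.
    Uses the classical 4th-order Runge-Kutta scheme in the variable ln a.
    """
    h = math.log(a_final) / steps

    def f(rho):
        return -3.0 * (rho + pressure(rho))

    rho = rho0
    for _ in range(steps):
        k1 = f(rho)
        k2 = f(rho + 0.5 * h * k1)
        k3 = f(rho + 0.5 * h * k2)
        k4 = f(rho + h * k3)
        rho += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return rho


if __name__ == "__main__":
    print(evolve_density(1.0, lambda rho: 0.0, 2.0))         # dust, a: 1 -> 2
    print(evolve_density(1.0, lambda rho: rho / 3.0, 2.0))   # radiation
    print(evolve_density(3.0, lambda rho: -1.0 / rho, 20.0)) # Chaplygin, A = 1
```

Doubling the scale factor should reduce the dust density by a factor of 8 and the radiation density by a factor of 16, while the Chaplygin gas density should approach $\sqrt{A}$ at large $a$, in line with (34.91).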
34.5 Einstein and Friedmann-Lemaître Equations
After these preliminaries, we are now prepared to tackle (hence first to determine) the
Einstein equations in this setting.
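Allowing for the presence of a cosmological constant $\Lambda$, we thus consider the equations
$$ G_{\mu\nu} + \Lambda g_{\mu\nu} = 8\pi G_N T_{\mu\nu} . \tag{34.92} $$
$$ R_{\mu\nu} = 8\pi G_N\left(T_{\mu\nu} - \tfrac12 g_{\mu\nu}T^\lambda_{\;\lambda}\right) + \Lambda g_{\mu\nu} . \tag{34.93} $$
Because of isotropy, there are only two independent equations, e.g. the (00)-component
and any one of the non-zero (ij)-components. Using (34.6), we find
$$
\begin{aligned}
-3\,\frac{\ddot a}{a} &= 4\pi G_N(\rho + 3p) - \Lambda \\
\frac{\ddot a}{a} + 2\,\frac{\dot a^2}{a^2} + 2\,\frac{k}{a^2} &= 4\pi G_N(\rho - p) + \Lambda .
\end{aligned}
\tag{34.94}
$$
Alternatively, instead of the spatial components of (34.93) one could have used the
(00)-component of (34.92) and the expression in (34.8) for $G_{00}$ to deduce
$$ G_{00} = 8\pi G_N T_{00} + \Lambda \quad\Rightarrow\quad \frac{\dot a^2}{a^2} + \frac{k}{a^2} = \frac{8\pi G_N}{3}\,\rho + \frac{\Lambda}{3} \tag{34.95} $$
(and this equation could have also been obtained by appropriate subtraction of the
previous 2 equations). Either way, we supplement these by the conservation equation
$$ \dot\rho = -3(\rho + p)\,\frac{\dot a}{a} . \tag{34.96} $$
and thus end up with the set of equations
$$
\begin{aligned}
\text{(F1)}\qquad & \frac{\dot a^2}{a^2} + \frac{k}{a^2} = \frac{8\pi G_N}{3}\,\rho + \frac{\Lambda}{3} \\
\text{(F2)}\qquad & -3\,\frac{\ddot a}{a} = 4\pi G_N(\rho + 3p) - \Lambda \\
\text{(F3)}\qquad & \dot\rho = -3(\rho + p)\,\frac{\dot a}{a}
\end{aligned}
$$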
Remarks:
2. In terms of the Hubble parameter $H(t)$ and the deceleration parameter $q(t)$, these
equations can also be written as
$$ \text{(F1')}\qquad H^2 = \frac{8\pi G_N}{3}\,\rho - \frac{k}{a^2} + \frac{\Lambda}{3} $$
4. (F2), on the other hand, can be interpreted as the Raychaudhuri equation (34.79).
Indeed, with $\theta = 3\dot a/a$
and
5. Introducing some fixed comoving volume $v$ and its associated proper volume
$V = a^3 v$, and noting that
$$ \frac{dV(t)}{dt} = 3a(t)^2\dot a(t)\,v = 3H(t)V , \tag{34.100} $$
the conservation equation (F3) can be written in a more suggestive and familiar
(mechanical or thermodynamical) form as
$$ \frac{dE}{dt} = -p\,\frac{dV}{dt} \tag{34.101} $$
where $E = \rho V$ is the total energy in the volume $V$ and $p$ is the pressure.
6. In writing the above equations, I have separated out the cosmological constant $\Lambda$
from the remaining matter contributions. Of course, using (34.29), it could
have just been treated as one other perfect fluid contribution (with $w = -1$).
Occasionally either one or the other way of writing these equations is (marginally)
more convenient.
7. Note that because of the Bianchi identities, the Einstein equations and the con-
servation equations should not be independent, and indeed they are not:
(a) It is easy to see that (F1) and (F3) imply the second order equation (F2) so
that, a pleasant simplification, in practice one only has to deal with the two
first order equations (F1) and (F3). Sometimes, however, (F2) is easier to
solve than (F1), because it is linear in $\ddot a(t)$, and then (F1) is just used to fix
one constant of integration.
(b) It is also easy to see that (F1) and (F2) imply (F3), i.e. that the gravity
equations of motion imply the matter equations of motion, a general and
fundamental feature of general relativity.
(c) Finally, formally (F2) and (F3) also imply (F1), with k (which only appears
in (F1)) arising as an integration constant.
8. One can use (34.13) to determine the analogue of the Friedmann equations in any
space-time dimension D = n + 1.
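$$ T^\mu_{\;\mu} = -\rho + np , \tag{34.103} $$
one finds
$$ \frac{\ddot a}{a} = -\frac{8\pi G_N}{n(n-1)}\left[(n-2)\rho + np\right] . \tag{34.104} $$
Finally the continuity equation takes the form
$$ \dot\rho = -n(\rho + p)\,\frac{\dot a}{a} . \tag{34.105} $$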
9. As an aside note that, as the Robertson-Walker metrics are, in particular, spheri-
cally symmetric, and written in the manifestly spherically-symmetric form (23.73),
we have the notion of the Misner-Sharp mass (23.75) for spherical symmetry at
our disposal. In terms of the area radius (33.35),
Using the Friedmann equation (F1), this can be written in a more informative
way as
$$ m(t, r) = \tfrac12\,a(t)^3 r^3\left(\frac{8\pi G_N}{3}\,\rho + \frac{\Lambda}{3}\right) = \frac{4\pi}{3}\,(a(t)\,r)^3\,G_N(\rho + \rho_\Lambda) , \tag{34.109} $$
with $\rho_\Lambda = \Lambda/8\pi G_N$ the energy density (34.29) associated with the cosmological
constant. Note that this result, which again, as in section 23.6, has the interpretation
as mass = coordinate volume (not proper volume) × density, does not
depend explicitly on either the pressure $p$ or the curvature $k$. In particular it is
independent of the equation of state relating $p$ and $\rho$.
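Remark 7(a) states that the first order equations (F1) and (F3) imply the second order equation (F2). As a sketch of why this is so (assuming $\dot a \neq 0$, and with (F1), (F2), (F3) denoting the equations (34.95), the first line of (34.94), and (34.96) respectively), multiply (F1) by $a^2$ and differentiate with respect to $t$:

```latex
\begin{align*}
\dot a^2 + k &= \frac{8\pi G_N}{3}\,\rho\,a^2 + \frac{\Lambda}{3}\,a^2 \\
2\,\dot a\,\ddot a &= \frac{8\pi G_N}{3}\left(\dot\rho\,a^2 + 2\rho\,a\,\dot a\right)
   + \frac{2\Lambda}{3}\,a\,\dot a \\
&= \frac{8\pi G_N}{3}\,a\,\dot a\,\bigl(-3(\rho + p) + 2\rho\bigr)
   + \frac{2\Lambda}{3}\,a\,\dot a
   \qquad \text{(using (F3): } \dot\rho\,a = -3(\rho + p)\,\dot a\text{)} \\
\Rightarrow\quad
-3\,\frac{\ddot a}{a} &= 4\pi G_N(\rho + 3p) - \Lambda
\end{align*}
```

The constant $k$ drops out upon differentiation, which is the counterpart of remark 7(c): going backwards from (F2) and (F3) to (F1), $k$ reappears as an integration constant.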
For many purposes it is useful to cast the above set of Friedmann-Lemaître equations
into a Lagrangian or Hamiltonian form. In particular (and this is the motivation for
doing this here), this system of equations is sufficiently simple to provide a concrete
illustration of some of the general features of the canonical Hamiltonian formulation of
general relativity discussed in a somewhat cursory way in section 20.
Rather than specialising the general results of that section to the case at hand, we will
adopt a more pedestrian approach here and derive these results from scratch. This will
make contact with and hopefully shed some light on a variety of different issues that
have arisen in various parts of these notes, e.g.
the characteristic constraints of general relativity (sections 18.7 and 20), in par-
ticular the Hamiltonian constraint
the role and significance of the lapse function (cf. (20.18)) introduced into the
Robertson-Walker metric (for what appeared to be no good reason at that point)
at the end of section 34.1.
Indeed, in the present context it is pretty obvious that, with the single gravitational
degree of freedom a(t), associated with the size of the spatial metric, it is impossible
to derive both the (one independent) spatial component of the Einstein equations (the
Friedmann equation F2, say) and the time-time component of the Einstein equations
(the Friedmann equation F1) from a Lagrangian depending just on a(t) (and the matter
variables).
and we can always choose the "$N(t) = 1 \Leftrightarrow t$ is comoving proper time" gauge at the end
of the calculations.
In section 34.1 we had already determined the Ricci scalar of this metric, namely (now
again, consistent with $t' \to t$, denoting time-derivatives by overdots rather than primes)
(34.12)
$$ R = \frac{6}{a^2N^3}\left(N(a\ddot a + \dot a^2) - a\dot a\dot N + kN^3\right) . \tag{34.112} $$
The Einstein-Hilbert Lagrangian density is therefore
$$ \sqrt{g}\,R = Na^3\sqrt{\tilde g}\,R = \sqrt{\tilde g}\,\frac{6a}{N^2}\left(N(a\ddot a + \dot a^2) - a\dot a\dot N + kN^3\right) . \tag{34.113} $$
The only dependence on the spatial coordinates is in the spatial volume element $\sqrt{\tilde g}$.
Therefore, integrating this Lagrangian density over the space-time, one obtains a (po-
tentially infinite) volume factor from the integration over the spatial coordinates which
we will simply drop. Thus in the infinite-volume case ($k = -1$ or $k = 0$ without periodic
toroidal identifications) this is not really a reduction in the strict technical sense.
However, this is not our main concern here. Our aim is simply to obtain a Lagrangian
formulation of the Friedmann equations and (as we will see) this can be accomplished
by just dropping the integration over the spatial coordinates.
Naively, to eliminate this $\ddot a$-term, we can integrate the first term of the Lagrangian by
parts,
$$ 6a^2\ddot a\,N^{-1} = \frac{d}{dt}\left(6a^2\dot a N^{-1}\right) + 6aN^{-2}\left(-2\dot a^2 N + a\dot a\dot N\right) . \tag{34.115} $$
We see that this has the effect of changing the sign of the 2nd term of (34.114) and
cancelling the 3rd term $\sim \dot N$, so that we have
$$ L_{EH} = \frac{6a}{N^2}\left(-N\dot a^2 + kN^3\right) + \frac{d}{dt}\left(6a^2\dot a N^{-1}\right) . \tag{34.116} $$
It turns out that the total derivative term is cancelled precisely by the Gibbons-Hawking-
York boundary term discussed in section 19.5. This should not come as a surprise: after
all, that was its purpose. To see this explicitly, note that with any of the definitions
or characterisations of the extrinsic curvature tensor given in section 17, one finds that
the extrinsic curvature of the constant time t hypersurfaces (with unit normal vector
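$(1/N)\partial_t$) in the ambient space-time is
$$ K_{ij} = \frac{1}{2N}\,\partial_t g_{ij} = \frac{\dot a}{aN}\,g_{ij} \tag{34.117} $$
so that the trace is
$$ K = g^{ij}K_{ij} = \frac{3\dot a}{aN} . \tag{34.118} $$
Thus the Gibbons-Hawking-York boundary term (19.61) reduces to (again dropping the
spatial volume element $\sqrt{\tilde g}$ and using $\epsilon = -1$ for spacelike hypersurfaces)
$$ 2\epsilon\sqrt{h}\,K \quad\to\quad L_{GHY} = -2a^3\,(3\dot a/aN) = -6a^2\dot a N^{-1} . \tag{34.119} $$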
The complete gravitational Lagrangian is now, in analogy with the complete standard
gravitational action (19.67),
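the Lagrangian
$$ L_g = L_{EH} + \frac{d}{dt}L_{GHY} = \frac{6a}{N^2}\left(-N\dot a^2 + kN^3\right) = 6N\left(-a(\dot a/N)^2 + ka\right) . \tag{34.121} $$
Inclusion of the cosmological constant $\Lambda$,
$$ \sqrt{g}\,R \quad\to\quad \sqrt{g}\,(R - 2\Lambda) , \tag{34.122} $$
leads to
$$ L_g = 6N\left(-a(\dot a/N)^2 + ka - \Lambda a^3/3\right) . \tag{34.123} $$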
Remarks:
1. Note that, as desired, the Lagrangian now only depends on the fields a(t) and
N (t) and (at most) their 1st derivatives.
2. Actually, we see that the Lagrangian depends only on $N(t)$, not its time-derivative
$\dot N(t)$. This will also turn out to be the case for the matter action (which typically
does not depend on any derivatives of the metric at all). Thus the role of N (t) is
just that of a Lagrange multiplier, and as we will see the constraint it imposes is
simply the Friedmann equation F1.
3. One can also obtain (34.121) directly from the Gauss-Codazzi (20.6) or ADM form
(20.42) of the action,
$$ S_{ADM}[g_{\mu\nu}] = \int \sqrt{g}\,d^4x\,\left({}^{(3)}\!R + K^{ij}K_{ij} - K^2\right) . \tag{34.124} $$
Indeed, the 3-dimensional scalar curvature is simply (this follows e.g. from (13.19))
$$ {}^{(3)}\!R = 6k/a^2 , \tag{34.125} $$
$$ K^{ij}K_{ij} - K^2 = 3(\dot a/aN)^2 - 9(\dot a/aN)^2 = -6(\dot a/aN)^2 \tag{34.126} $$
so that (again dropping the spatial volume element $\sqrt{\tilde g}$), one finds
$$ L_{ADM} = a^3N\left(-6(\dot a/aN)^2 + 6k/a^2\right) = L_g . \tag{34.127} $$
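The chain of manipulations leading from the Einstein-Hilbert Lagrangian (34.113) to $L_g$ in (34.121), namely that the two differ precisely by the total derivative of $6a^2\dot a N^{-1}$, can be verified exactly for sample functions. The sketch below uses the hypothetical test functions $a(t) = t^2$ and $N(t) = 1 + t$, chosen only because their derivatives are trivial, and drops the spatial volume element as in the text.

```python
from fractions import Fraction


def sample(t):
    """Hypothetical test functions a(t) = t**2, N(t) = 1 + t with derivatives."""
    t = Fraction(t)
    a, da, dda = t**2, 2 * t, Fraction(2)
    N, dN = 1 + t, Fraction(1)
    return a, da, dda, N, dN


def L_EH(t, k):
    """Einstein-Hilbert Lagrangian, cf. (34.113) without the volume factor."""
    a, da, dda, N, dN = sample(t)
    return 6 * a / N**2 * (N * (a * dda + da**2) - a * da * dN + k * N**3)


def L_g(t, k):
    """First-order gravitational Lagrangian (34.121)."""
    a, da, dda, N, dN = sample(t)
    return 6 * N * (-a * (da / N) ** 2 + k * a)


def total_derivative(t):
    """d/dt (6 a^2 adot / N), expanded with the product and quotient rules."""
    a, da, dda, N, dN = sample(t)
    return (12 * a * da**2 + 6 * a**2 * dda) / N - 6 * a**2 * da * dN / N**2


if __name__ == "__main__":
    for k in (-1, 0, 1):
        print(L_EH(Fraction(3, 2), k) == L_g(Fraction(3, 2), k)
              + total_derivative(Fraction(3, 2)))
```

Because the identity (34.116) holds pointwise in $t$, any other smooth choice of $a(t)$ and $N(t)$ with $N \neq 0$ would serve equally well.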
Compatibility with the symmetries of the gravitational field, in particular the spatial
homogeneity, requires that the scalar field is spatially constant and is thus only a function
of time $t$. With this assumption (and again dropping $\sqrt{\tilde g}$), the matter Lagrangian
reduces to
Therefore the total gravitational + matter Lagrangian and action are (reinstating the
gravitational coupling constant)
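\[ L_{tot} = \frac{1}{16\pi G_N} L_g + L_m = N\left( -\frac{3}{4\pi G_N}\frac{a\dot a^2}{2N^2} + \frac{a^3\dot\phi^2}{2N^2} + \frac{3ka}{8\pi G_N} - \frac{\Lambda a^3}{8\pi G_N} - a^3 V(\phi) \right) \tag{34.130} \]
and
\[ S_{tot}[a,\phi,N] = \int dt\, L_{tot} = \int dt\, N\left( -\frac{3}{4\pi G_N}\frac{a\dot a^2}{2N^2} + \frac{a^3\dot\phi^2}{2N^2} + \frac{3ka}{8\pi G_N} - \frac{\Lambda a^3}{8\pi G_N} - a^3 V(\phi) \right) \tag{34.131} \]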
Before analysing the equations of motion and the Hamiltonian arising from this La-
grangian and action, let us note the following points:
1. From this form of the action it is evident that one role of N (t) is to ensure time
reparametrisation invariance of the action. Indeed, dt and N (t) only appear in
the combinations N(t)dt or (1/N(t))(d/dt). Hence the action remains unchanged
under time reparametrisations $t \to \bar t(t)$ if one simultaneously transforms N(t) to
a new lapse function $\bar N(\bar t)$, say, according to
\[ N(t)\,dt = \bar N(\bar t)\,d\bar t \;\Leftrightarrow\; \bar N = N\,(dt/d\bar t) . \tag{34.132} \]
This is the remnant of the general covariance of the original action, and the mech-
anism here is the same as that which rendered the parent geodesic action (2.9)
reparametrisation invariant.
2. We see that this action does not involve the time-derivative of the lapse function
N (t). Hence N (t) acts as a Lagrange multiplier enforcing a constraint. This con-
straint can be regarded as the constraint associated to this reparametrisation in-
variance, and such constraints are thus a characteristic feature of any reparametri-
sation invariant or generally covariant system.
3. In this combined gravity plus matter action, we now see very clearly that the
gravitational kinetic term has the opposite sign of the matter kinetic term. This
is a particular (and particularly obvious) manifestation of the general fact (men-
tioned in connection with the DeWitt metric at the end of section 20.2) that the
gravitational kinetic term is not positive definite and that the negative direction
in field space is associated with overall spatial volume deformations.
4. Finally, this form of the action makes it particularly obvious that the cosmological
constant term can also be regarded as leading to (or arising from) a constant shift
of the potential for the matter fields,
\[ \frac{\Lambda a^3}{8\pi G_N} + a^3 V(\phi) = a^3\left( V(\phi) + \frac{\Lambda}{8\pi G_N} \right) . \tag{34.133} \]
We will therefore absorb the cosmological constant term into the scalar potential
in the following and not carry it around explicitly.
Now let us look at the Euler-Lagrange equations arising from this action.
The Euler-Lagrange Equation for N (t)
As mentioned before, N (t) acts as a Lagrange multiplier enforcing the constraint
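\[ \frac{\delta S_{tot}}{\delta N} = 0 \;\Leftrightarrow\; \frac{\partial L}{\partial N(t)} = 0 . \tag{34.134} \]
Explicitly this is the condition
\[ \frac{3}{4\pi G_N}\frac{a\dot a^2}{2N^2} - \frac{a^3\dot\phi^2}{2N^2} + \frac{3ka}{8\pi G_N} - a^3 V(\phi) = 0 . \tag{34.135} \]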
This really is a constraint rather than an equation of motion, because it
depends only on the fields and their 1st derivatives, not 2nd derivatives, and thus
constitutes a condition on initial data on some constant time initial hypersurface.
In the gauge N (t) = 1 (which we can now choose, after having determined the
equation of motion arising from varying N in the action), this constraint becomes
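\[ \frac{3}{8\pi G_N}(a\dot a^2 + ak) = \frac{a^3\dot\phi^2}{2} + a^3 V(\phi) , \tag{34.136} \]
or
\[ \frac{\dot a^2 + k}{a^2} = \frac{8\pi G_N}{3}\left( \tfrac12\dot\phi^2 + V(\phi) \right) . \tag{34.137} \]
Now recall that for such a scalar field the energy density and pressure are given
by (34.36)
\[ \rho(t) = \tfrac12\dot\phi^2 + V(\phi) , \qquad p(t) = \tfrac12\dot\phi^2 - V(\phi) . \tag{34.138} \]
Hence we can write the constraint as
\[ \frac{\dot a^2 + k}{a^2} = \frac{8\pi G_N}{3}\rho , \tag{34.139} \]
and this we now recognise as precisely the Friedmann equation F1.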
Having derived the constraint arising from the variation of the lapse function N , we can
now simplify our life by using the reparametrisation invariance to set N (t) = 1. Thus
we can work with the simpler action
\[ S_{tot}[a,\phi] = \int dt \left( -\frac{3}{4\pi G_N}\frac{a\dot a^2}{2} + \frac{a^3\dot\phi^2}{2} + \frac{3ka}{8\pi G_N} - a^3 V(\phi) \right) . \tag{34.140} \]
where p = p(t) is the pressure (34.138). Using the constraint (the Friedmann
equation F1) derived above, this becomes
\[ \frac{\ddot a}{a} = -\frac{4\pi G_N}{3}(\rho + 3p) , \tag{34.143} \]
which is precisely the Friedmann equation F2.
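\[ \rho + p = \dot\phi^2 \tag{34.145} \]
and
\[ \dot\rho = \dot\phi\,\ddot\phi + V'(\phi)\,\dot\phi , \tag{34.146} \]
\[ \ddot\phi + 3(\dot a/a)\dot\phi + V'(\phi) = 0 \;\Rightarrow\; \dot\rho + 3(\rho + p)(\dot a/a) = 0 , \tag{34.147} \]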
We have thus verified that we have indeed obtained a Lagrangian description of the
complete set of Friedmann equations.
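The statement that F1 is a constraint on initial data, which is then preserved by the evolution equations, can be checked numerically. The following is only a minimal sketch under assumptions that are not part of the text above: units with $8\pi G_N = 1$, the gauge N = 1, an illustrative quadratic potential $V(\phi) = m^2\phi^2/2$, and a hand-written Runge-Kutta integrator. All function names are hypothetical.

```python
import math


def potential(phi, m=1.0):
    """Illustrative potential V(phi) = m^2 phi^2 / 2 (an assumption, not from the notes)."""
    return 0.5 * m * m * phi * phi


def potential_prime(phi, m=1.0):
    """Derivative V'(phi) of the illustrative potential."""
    return m * m * phi


def rhs(state):
    """Time derivatives of (a, adot, phi, phidot) in units 8*pi*G_N = 1, N = 1."""
    a, adot, phi, phidot = state
    # (F2): addot/a = -(rho + 3p)/6 = -(phidot^2 - V)/3
    addot = -(a / 3.0) * (phidot ** 2 - potential(phi))
    # scalar field equation: phiddot + 3 (adot/a) phidot + V'(phi) = 0
    phiddot = -3.0 * (adot / a) * phidot - potential_prime(phi)
    return (adot, addot, phidot, phiddot)


def rk4_step(state, dt):
    """One classical 4th order Runge-Kutta step."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * dt))
    k3 = rhs(shifted(state, k2, 0.5 * dt))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt * (c1 + 2.0 * c2 + 2.0 * c3 + c4) / 6.0
        for s, c1, c2, c3, c4 in zip(state, k1, k2, k3, k4)
    )


def constraint_violation(state, k=0.0):
    """Relative violation of (F1): adot^2 + k = rho a^2 / 3."""
    a, adot, phi, phidot = state
    rho = 0.5 * phidot ** 2 + potential(phi)
    return (adot ** 2 + k - rho * a * a / 3.0) / adot ** 2


def evolve(phi0=3.0, k=0.0, t_end=2.0, dt=1.0e-3):
    """Impose (F1) on the initial data, then evolve with (F2) and the field equation."""
    a0, phidot0 = 1.0, 0.0
    rho0 = 0.5 * phidot0 ** 2 + potential(phi0)
    adot0 = math.sqrt(rho0 * a0 * a0 / 3.0 - k)  # expanding branch of the constraint
    state = (a0, adot0, phi0, phidot0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
    return state
```

Calling `constraint_violation(evolve())` should return a number compatible with zero up to integration error, illustrating that the constraint only needs to be imposed once, on the initial hypersurface.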
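Our starting point is the Lagrangian (34.130) (with $\Lambda$ absorbed into $V(\phi)$), i.e.
\[ L_{tot} = N\left( -\frac{3}{4\pi G_N}\frac{a\dot a^2}{2N^2} + \frac{a^3\dot\phi^2}{2N^2} + \frac{3ka}{8\pi G_N} - a^3 V(\phi) \right) . \tag{34.148} \]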
146
For detailed textbook discussions of this subject, see e.g. M. Henneaux, C. Teitelboim, Quantisation
of Gauge Systems, or H. Rothe, K. Rothe, Classical and Quantum Dynamics of Constrained Hamiltonian
Systems.
In order to streamline the following discussion, it is convenient to consider the fields
$a(t)$ and $\phi(t)$ as the two coordinates
\[ Q^A = (a, \phi) \tag{34.149} \]
i.e.
\[ P_a = -\frac{3a\dot a}{4\pi G_N N} , \qquad P_\phi = \frac{a^3\dot\phi}{N} . \tag{34.154} \]
These relations can, as usual, be used to eliminate the velocities in favour of the mo-
menta. The major novelty is thus the presence of N , whose conjugate momentum
vanishes,
\[ P_N = \frac{\partial L_{tot}}{\partial \dot N} = 0 . \tag{34.155} \]
This is our first constraint (and a primary constraint in the terminology of constrained
systems). While this relation does not allow us to eliminate $\dot N$ in terms of the momentum
$P_N$, this is not an issue here since the Lagrangian does not depend on $\dot N$ in the first
place.
We can now follow the standard procedure to construct the Hamiltonian, via (in the
case at hand, whether or not we include the $P_N \dot N$-term evidently makes no difference)
As far as its dependence on (Q, P ) is concerned, this presents no surprises: time evolu-
tion is given by the Hamilton equations
\[ \dot Q^A = +\frac{\partial H_{tot}}{\partial P_A} = \{Q^A, H_{tot}\} , \qquad \dot P_A = -\frac{\partial H_{tot}}{\partial Q^A} = \{P_A, H_{tot}\} , \tag{34.158} \]
and the 1st of these just reproduces the definition of the momenta (34.153), while the
2nd then reproduces the Euler-Lagrange equations for the fields QA = (a, ) discussed
in the previous section.
\[ \mathcal{H}(Q,P) = 0 \tag{34.160} \]
(and there are no further constraints in this class of examples). This Hamiltonian
constraint is precisely the Friedmann equation F1, i.e. the condition that was imposed
in the Lagrangian formulation by the Lagrange multiplier N,
\[ \frac{\partial L_{tot}}{\partial N} = 0 \;\Leftrightarrow\; \mathcal{H} = 0 . \tag{34.161} \]
A painless way to see this is to note that the N-dependence in the Lagrangian (34.151),
\[ L_{tot} = \frac{1}{2N} G_{AB}\dot Q^A \dot Q^B - N V(Q) , \tag{34.162} \]
is precisely such that differentiation with respect to N changes the relative sign between
the 2 terms and thus essentially implements the Legendre transformation from the
Lagrangian to the Hamiltonian (expressed as a function of the velocities),
\[ -\frac{\partial L_{tot}}{\partial N} = \frac{1}{2N^2} G_{AB}\dot Q^A\dot Q^B + V(Q) = \mathcal{H} . \tag{34.163} \]
This structure
\[ H = N\mathcal{H} \tag{34.164} \]
35 Cosmology IV: Qualitative Analysis
A lot can be deduced about the solutions of the Friedmann-Lemaître equations, i.e. the
evolution of the universe in the Friedmann-Lemaître-Robertson-Walker cosmologies,
without solving the equations directly and even without specifying a precise equation of
state, i.e. a relation between p and $\rho$. In the following we will, in turn, discuss the Big
Bang, the age of the universe, and its long term behaviour, from this qualitative point
of view. I will then introduce the notions of critical density and density parameters, and
discuss some global and causal aspects of these cosmological models (Penrose diagrams,
horizons, . . . ).
\[ 3\frac{\ddot a}{a} = -4\pi G_N(\rho + 3p) \;\Leftrightarrow\; q = \frac{4\pi G_N}{3H^2}(\rho + 3p) , \tag{35.1} \]
shows that, as long as the right-hand side is positive, one has $q > 0$, i.e. $\ddot a < 0$ so
that the universe is decelerating due to gravitational attraction. This is the case for
standard matter ($\rho > 0$) when it satisfies the strong energy condition (SEC) strictly (cf.
the discussion in section 34.2, in particular (34.32)),
It is also true for a negative cosmological constant (its negative energy density being
outweighed by 3 times its positive pressure). It need not be true, however, in the pres-
ence of a positive cosmological constant which provides an accelerating contribution to
the expansion of the universe. We will, for the time being, continue with the assumption
that $\Lambda$ is zero or, at least, non-positive, even though, as we will discuss later, recent
evidence (strongly) suggests the presence of a non-negligible positive cosmological con-
stant in our universe today (which is, however, totally irrelevant for the energy budget
of the early universe).
As $\rho a^4$ is constant for radiation (an appropriate description of earlier periods of the
universe), this shows that the energy density grows like $1/a^4$ as $a \to 0$, so this leads to
quite a singular situation.
Once again, as in our discussion of black holes, it is natural to wonder at this point
if the singularities predicted by General Relativity in the case of cosmological models
are generic or only artefacts of the highly symmetric situations we were considering.
And again there are singularity theorems applicable to these situations which state
that, under reasonable assumptions about the matter content, singularities will occur
independently of assumptions about symmetries.
With the normalisation a(0) = 0, it is fair to call $t_0$ the age of the universe. If $\ddot a$ had
been zero in the past for all $t \leq t_0$, then we would have
\[ \ddot a = 0 \;\Rightarrow\; a(t) = a_0 t/t_0 , \tag{35.3} \]
and
\[ \dot a(t) = a_0/t_0 = \dot a_0 . \tag{35.4} \]
This would imply
\[ t_0 = a_0/\dot a_0 = H_0^{-1} , \tag{35.5} \]
where $H_0^{-1}$ is the Hubble time. However, provided that $\ddot a < 0$ for $t \leq t_0$ (as discussed
above, this holds under suitable conditions on the matter content - which may or may
not be realised in our universe), the actual age of the universe must be smaller than
this,
\[ \ddot a < 0 \;\Rightarrow\; t_0 < H_0^{-1} . \tag{35.6} \]
Thus the Hubble time sets an upper bound on the age of the universe. See Figure 46
for an illustration of this.
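To get a feeling for the size of this bound, one can convert a value of the Hubble parameter into a Hubble time in years. The sketch below is purely illustrative: the conversion constants are the standard ones, while the value 70 km/s/Mpc used in the usage note and the function names are assumptions introduced here, not taken from the text.

```python
KM_PER_MPC = 3.0856775814913673e19   # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16         # 10^9 Julian years of 365.25 days


def hubble_time_gyr(h0_km_s_mpc):
    """Hubble time 1/H_0 in Gyr for H_0 given in km/s/Mpc."""
    hubble_time_seconds = KM_PER_MPC / h0_km_s_mpc
    return hubble_time_seconds / SECONDS_PER_GYR


def age_respects_bound(age_gyr, h0_km_s_mpc):
    """Check t_0 < 1/H_0, which is only required if addot < 0 throughout the past."""
    return age_gyr < hubble_time_gyr(h0_km_s_mpc)
```

For an assumed $H_0$ of 70 km/s/Mpc, `hubble_time_gyr(70.0)` gives roughly 14 Gyr. It should be kept in mind that the bound (35.6) relies on $\ddot a < 0$ for all $t \leq t_0$, so it need not hold in a universe with a positive cosmological constant.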
Let us now try to take a look into the future of the universe. Again we will see that
it is remarkably simple to extract relevant information from the Friedmann equations
without ever having to solve an equation.
We will assume that $\Lambda = 0$ and that we are dealing with matter with $\rho > 0$ and
$w > -1/3$ (the SEC). The Friedmann equation (F1) can be written as
\[ \dot a^2 = \frac{8\pi G_N}{3}\rho a^2 - k . \tag{35.7} \]
The left-hand side is manifestly non-negative. Let us see what this tells us about the
right-hand side. Focus on the first term $\sim \rho a^2$. This term is strictly positive and,
according to (34.85), behaves as
\[ \rho a^2 \sim a^{-(1+3w)} . \tag{35.8} \]
Thus for $w > -1/3$ the exponent is negative, so that if and when the cosmic scale factor
a(t) goes to infinity, one has
\[ \lim_{a\to\infty} \rho a^2 = 0 . \tag{35.9} \]
Now let us look at the second term on the right-hand side of (35.7), and analyse the 3
choices for k. For k = -1 or k = 0, the right hand side of (35.7) is strictly positive.
Therefore $\dot a$ is never zero and since $\dot a_0 > 0$, we must have
\[ \dot a(t) > 0 \quad \forall\, t . \tag{35.10} \]
Thus we can immediately conclude that open and flat universes must expand forever,
i.e. they are open in space and time.
By taking into account (35.9), we can even be somewhat more precise about the long
term behaviour. For k = 0, we learn that
\[ k = 0 : \quad \lim_{a\to\infty}\dot a^2 = 0 . \tag{35.11} \]
Thus the universe keeps expanding but more and more slowly as time goes on. By the
same reasoning we see that for k = -1 we have
\[ k = -1 : \quad \lim_{a\to\infty}\dot a^2 = 1 . \tag{35.12} \]
For k = +1, validity of (35.9) would lead us to conclude that $\dot a^2 \to -1$, but this is
obviously a contradiction. Therefore we learn that the k = +1 universes never reach
$a \to \infty$ and that there is therefore a maximal radius $a_{max}$. This maximal radius occurs
for $\dot a = 0$ and therefore
\[ k = +1 : \quad a_{max}^2 = \frac{3}{8\pi G_N \rho} . \tag{35.13} \]
Note that intuitively this makes sense. For larger $\rho$ or larger $G_N$ the gravitational
attraction is stronger, and therefore the maximal radius of the universe will be smaller.
Since we have $\ddot a < 0$ also at $a_{max}$, again there is no turning point and the universe
recontracts back to zero size leading to a Big Crunch. Therefore, spatially closed uni-
verses (k = +1) with physical matter are also closed in time. All of these findings are
summarised in Figure 46.
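The qualitative argument can be made concrete for a single matter species. The following sketch assumes that the first term in (35.7) is written as $C a^{-(1+3w)}$ with a positive constant C (an assumption of this illustration, absorbing $8\pi G_N/3$ and the normalisation of $\rho$), and locates the maximal radius of a k = +1 universe by bisection. The function names and the bracketing interval are hypothetical choices.

```python
def adot_squared(a, c, w, k):
    """Right-hand side of (35.7) with (8 pi G_N / 3) rho a^2 = c * a**-(1 + 3w)."""
    return c * a ** (-(1.0 + 3.0 * w)) - k


def expands_forever(k):
    """For Lambda = 0 and w > -1/3, only k = -1 and k = 0 universes expand forever."""
    return k <= 0


def maximal_radius(c, w, a_lo=1.0e-6, a_hi=1.0e6, rel_tol=1.0e-12):
    """Bisection for the root of adot^2 = 0 in a k = +1 universe (requires w > -1/3)."""
    if w <= -1.0 / 3.0:
        raise ValueError("the argument requires w > -1/3")
    if adot_squared(a_lo, c, w, 1) <= 0.0 or adot_squared(a_hi, c, w, 1) >= 0.0:
        raise ValueError("root is not bracketed by [a_lo, a_hi]")
    lo, hi = a_lo, a_hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        # adot^2 decreases monotonically in a for w > -1/3
        if adot_squared(mid, c, w, 1) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this parametrisation the root can also be written in closed form, $a_{max} = C^{1/(1+3w)}$, which provides a simple check of the numerical result (e.g. $a_{max} = C$ for dust).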
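[Figure 46: plot of the cosmic scale factor a(t) against t for k = -1, k = 0 and k = +1, with t = 0, $t_0$ and the Hubble time $H_0^{-1}$ marked on the time axis.]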
35.4 Density Parameters and the Critical Density
The primary purpose of this section is to introduce some convenient and commonly used
notation and terminology in cosmology associated with the Friedmann equation (F1).
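We will now include the cosmological constant $\Lambda$ in our analysis. For starters, however,
let us again consider the case $\Lambda = 0$ (or include $\Lambda$ as one contribution to $\rho$). (F1) can
be written as
\[ 1 = \frac{8\pi G_N}{3H^2}\rho - \frac{k}{a^2H^2} . \tag{35.14} \]
If one defines the critical density $\rho_{cr}$ by
\[ \rho_{cr} = \frac{3H^2}{8\pi G_N} , \tag{35.15} \]
and the density parameter $\Omega$ by
\[ \Omega = \frac{\rho}{\rho_{cr}} = \frac{8\pi G_N}{3H^2}\rho , \tag{35.16} \]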
This can be generalised to several species of (not mutually interacting) matter,
characterised by equation of state parameters $w_b$, subject to the condition $w_b > 0$ or
$w_b > -1/3$, with density parameters
\[ \Omega_b = \frac{\rho_b}{\rho_{cr}} . \tag{35.18} \]
The total matter contribution $\Omega_M$ is then
\[ \Omega_M = \sum_b \Omega_b . \tag{35.19} \]
Along the same lines we can also include the cosmological constant $\Lambda$. Inspection of the
Friedmann equations reveals that the presence of a cosmological constant is equivalent
to adding matter ($\rho_\Lambda$, $p_\Lambda$) with
\[ \rho_\Lambda = -p_\Lambda = \frac{\Lambda}{8\pi G_N} \;\Leftrightarrow\; w_\Lambda = -1 , \tag{35.20} \]
in agreement with what we had already deduced in (34.29). Note that this identification
is consistent with the conservation law (F3), since $\Lambda$ is constant.
Then the Friedmann equation (F1) with a cosmological constant can be written as
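\[ (\mathrm{F1}') \quad \Omega_M + \Omega_\Lambda = 1 + \frac{k}{a^2H^2} , \tag{35.21} \]
where
\[ \Omega_\Lambda = \frac{\rho_\Lambda}{\rho_{cr}} = \frac{\Lambda}{3H^2} . \tag{35.22} \]
The 2nd order equation (F2) can also be written in terms of the density parameters,
\[ q = \tfrac12 \sum_b (1 + 3w_b)\Omega_b - \Omega_\Lambda . \tag{35.23} \]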
Finally, one can also formally attribute an energy density $\rho_k$ and pressure $p_k$ to the
curvature contribution $\sim k$ in the Friedmann equations. (F1), which does not depend
on p, determines $\rho_k$, and then (F2), which does not depend on k, shows that $w_k = -1/3$
(so that $\rho_k + 3p_k = 0$). Thus the curvature contribution can be described as
\[ k \;\to\; \rho_k = -3p_k = -\frac{3k}{8\pi G_N a^2} \;\Leftrightarrow\; w_k = -1/3 , \tag{35.24} \]
with associated density parameter $\Omega_k$. (F3) is identically satisfied in this case (or, if
you prefer, requires that k is constant).
The Friedmann equation (F1) can now succinctly (if somewhat obscurely) be written
as the condition that the sum of all density parameters be equal to 1,
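\[ (\mathrm{F1}'') \quad \Omega_M + \Omega_\Lambda + \Omega_k = 1 . \tag{35.25} \]
Denoting the values of the parameters today, at time $t = t_0$, by a subscript 0, the two
key equations relating the present-day values of these parameters are
\[ (\Omega_M)_0 + (\Omega_\Lambda)_0 + (\Omega_k)_0 = 1 , \qquad \tfrac12(1 + 3w_0)(\Omega_M)_0 - (\Omega_\Lambda)_0 = q_0 \tag{35.26} \]
From $H_0$ and $q_0$ and $(\Omega_M)_0$ one can then in principle determine $(\Omega_\Lambda)_0$ and $(\Omega_k)_0$.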
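As a small illustration of how the two relations (35.26) are used, the sketch below solves them for $(\Omega_\Lambda)_0$ and $(\Omega_k)_0$ and reads off the sign of k from $\Omega_k = -k/(a^2H^2)$. The function names are hypothetical, and the numbers in the usage note are round illustrative values, not measurements quoted in these notes.

```python
def present_day_parameters(q0, omega_m0, w0=0.0):
    """Solve (35.26) for (Omega_Lambda)_0 and (Omega_k)_0."""
    omega_lambda0 = 0.5 * (1.0 + 3.0 * w0) * omega_m0 - q0
    omega_k0 = 1.0 - omega_m0 - omega_lambda0
    return omega_lambda0, omega_k0


def curvature_sign(omega_k0, tol=1.0e-9):
    """Sign of k from Omega_k = -k / (a^2 H^2); values within tol count as flat."""
    if abs(omega_k0) <= tol:
        return 0
    return -1 if omega_k0 > 0.0 else +1
```

For instance, with dust ($w_0 = 0$), an assumed $(\Omega_M)_0 = 0.3$ and an assumed $q_0 = -0.55$, `present_day_parameters(-0.55, 0.3)` returns values close to (0.7, 0), i.e. a spatially flat universe dominated by the cosmological constant.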
In order to make the dependence of the Friedmann equation (F1) or (35.25) on the
equation of state parameters $w_b$ and on a(t) more manifest, it is useful to use the
conservation law (34.85,35.8) to write
\[ \frac{8\pi G_N}{3}\rho_b(t)\, a(t)^2 = C_b\, a(t)^{-(1+3w_b)} \tag{35.27} \]
for some constant Cb . Then the Friedmann equation takes the more explicit (in the
sense that all the dependence on the cosmic scale factor a(t) is explicit) form
\[ \dot a^2 = \sum_b C_b\, a^{-(1+3w_b)} - k + \frac{\Lambda}{3}a^2 . \tag{35.28} \]
In addition to the vacuum energy (and pressure) provided by $\Lambda$, there are typically two
other kinds of matter which are relevant in our approximation, namely matter in the
form of dust (w = 0) and radiation (w = 1/3). Denoting the corresponding constants
by $C_m$ and $C_r$ respectively, the Friedmann equation that we will be dealing with takes
the form
\[ (\mathrm{F1}''') \quad \dot a^2 = \frac{C_m}{a} + \frac{C_r}{a^2} - k + \frac{\Lambda}{3}a^2 , \tag{35.29} \]
illustrating the qualitatively different contributions to the time-evolution.
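The different scaling of the four terms in (35.29) with a can be explored with a few lines of code. This is a minimal sketch with hypothetical function names and arbitrary illustrative constants. It simply evaluates the individual contributions and reports which one is largest in magnitude at a given value of a.

```python
def friedmann_terms(a, c_m, c_r, k, lam):
    """Individual contributions to adot^2 in (35.29)."""
    return {
        "radiation": c_r / a ** 2,
        "matter": c_m / a,
        "curvature": -k,
        "lambda": lam * a * a / 3.0,
    }


def adot_squared(a, c_m, c_r, k, lam):
    """Right-hand side of (35.29)."""
    return sum(friedmann_terms(a, c_m, c_r, k, lam).values())


def dominant_term(a, c_m, c_r, k, lam):
    """Name of the term with the largest magnitude at scale factor a."""
    terms = friedmann_terms(a, c_m, c_r, k, lam)
    return max(terms, key=lambda name: abs(terms[name]))


def matter_radiation_equality(c_m, c_r):
    """Scale factor at which C_m / a = C_r / a^2."""
    return c_r / c_m
```

With, say, `c_m = 1.0`, `c_r = 1e-4`, `k = 0` and `lam = 0.1`, the dominant term changes from radiation to matter to the cosmological constant as a increases, and the first of these transitions occurs at `matter_radiation_equality(1.0, 1e-4)`.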
One can then characterise the different eras in the evolution of the universe by which of
the above terms dominates, i.e. gives the leading contribution to the equation of motion
for a. This already gives some insight into the physics of the situation. We will call a
universe
1. No matter how small Cr is, provided that it is non-zero, for sufficiently small
values of a that term will dominate and one is in the radiation dominated era. In
that case, one finds the characteristic behaviour
\[ \dot a^2 = \frac{C_r}{a^2} \;\Rightarrow\; a(t) = (4C_r)^{1/4}\, t^{1/2} . \tag{35.30} \]
It is more informative to trade the constant $C_r$ for the condition $a(t_0) = a_0$, which
leads to
\[ a(t) = a_0 (t/t_0)^{1/2} . \tag{35.31} \]
3. For general equation of state parameter $w \neq -1$, one similarly has
\[ a(t) = a_0 (t/t_0)^h , \qquad h = \frac{2}{3(1+w)} . \tag{35.34} \]
This describes a decelerating universe ($h(h-1) < 0 \Leftrightarrow 0 < h < 1$) for $w > -1/3$
and an accelerating universe ($h > 1$) for $-1 < w < -1/3$. This illustrates that
one has $\ddot a < 0$ for $\rho + 3p > 0$ and $\ddot a > 0$ for $\rho + 3p < 0$.
4. For the special case $w = -1/3$, one has h = 1 and thus the linear evolution
$a(t) \sim t$. Since, as noted in (35.24), one can formally attribute an equation of state
parameter $w_k = -1/3$ to the curvature contribution to the Friedmann equation,
this solution arises not only for an exotic matter component with $w = -1/3$ and
k = 0, but also for an empty universe with k = -1. We will look at the latter
(the Milne universe) in more detail in section 36.1 (evidently an empty universe
with k = +1, governed by $\dot a^2 = -1$, is not possible).
In spite of sharing the same Friedmann equation (F1) and the same solution a(t),
these two universes with $w = -1/3$ are decidedly not identical for the obvious
reason that one is empty and the other one is not, and thus they solve the Einstein
equations with very different energy-momentum tensors (alternatively, e.g.
their Misner-Sharp masses (34.109) are different). More dramatically, the scalar
curvature R(t) for a(t) = t is
\[ R(t) = \frac{6}{a^2}\left(a\ddot a + \dot a^2 + k\right) = \frac{6(1+k)}{t^2} . \tag{35.35} \]
Thus there is a singularity at t = 0 for k = 0 while R(t) = 0 for the Milne metric
with k = -1 (which turns out to be just Minkowski space in disguise).
Note, as an aside, that one cannot conclude from the fact alone that one has k = -1
and the other one has k = 0 that they are different, since it is possible that a given
universe can be foliated in different ways by spatial hypersurfaces with different
curvatures. An example of this is provided by the de Sitter universe, the solution
to the Friedmann equations with a positive cosmological constant (and no other
matter) - see section 36.5.
5. For sufficiently large a, the cosmological constant $\Lambda$, if not identically zero, will
always dominate, no matter how small the cosmological constant may be, as all
the other energy-content of the universe gets more and more diluted. In particular,
for k = 0, the Friedmann equation for a positive cosmological constant reduces to
\[ \dot a^2 = (\Lambda/3)\,a^2 \;\Rightarrow\; a(t) = a(t_0)\, e^{\pm\sqrt{\Lambda/3}\,(t - t_0)} , \tag{35.36} \]
This gives the k = 0 metric of de Sitter space,
6. Only for $\Lambda = 0$ does k dominate for large a and one obtains, as we saw before, a
constant expansion velocity (for k = 0, -1).
7. We will find and discuss various other exact solutions in section 36.
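The power-law solutions (35.34) discussed in this list can be summarised in a few lines of code. The sketch below (hypothetical function names) computes the exponent h from w, the deceleration parameter $q = -a\ddot a/\dot a^2 = (1-h)/h$ that follows from $a \sim t^h$, and the age of such a universe in units of the Hubble time, $t_0 H_0 = h$. It assumes k = 0, a single matter component with $w > -1$ and the normalisation a(0) = 0.

```python
def power_law_exponent(w):
    """Exponent h = 2 / (3 (1 + w)) of a(t) = a0 (t/t0)^h, for w > -1."""
    if w <= -1.0:
        raise ValueError("the power-law solution requires w > -1")
    return 2.0 / (3.0 * (1.0 + w))


def deceleration_parameter(w):
    """q = -a addot / adot^2 = (1 - h) / h for a ~ t^h."""
    h = power_law_exponent(w)
    return (1.0 - h) / h


def age_in_hubble_times(w):
    """t0 * H0 = h, since H = adot / a = h / t for a ~ t^h."""
    return power_law_exponent(w)
```

For dust (w = 0) this gives h = 2/3, q = 1/2 and an age of 2/3 of the Hubble time, while for radiation (w = 1/3) one finds h = 1/2 and q = 1. In both cases the age is smaller than the Hubble time, in agreement with (35.6).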
with (35.34)
\[ k = 0 , \quad p = w\rho \;\Rightarrow\; a(t) = a_0 (t/t_0)^h , \quad h = \frac{2}{3(1+w)} \tag{35.40} \]
(for $w \neq -1$). The case of a cosmological constant ($w = -1$), i.e. (anti-)de Sitter space,
needs to be treated separately. Penrose diagrams for some other (k = +1) solutions will
be given in section 36.
One crucial feature that will emerge from this analysis is the possible presence of cos-
mological horizons which delimit the regions of space-time that can be in causal contact
at a given time or that are visible at a given time or in principle in the infinite future.
This will be discussed further in section 35.7 below. This will also allow us to then,
in sections 35.8 and 35.9, better understand the significance (or lack thereof) of the
Hubble sphere or Hubble radius RH (t) (33.33) introduced in section 33.3.
so that
\[ h(h-1) \begin{cases} < 0 \text{ (decelerating)} & \text{for } 0 < h < 1 \Leftrightarrow w > -1/3 \\ > 0 \text{ (accelerating)} & \text{for } h > 1 \Leftrightarrow w < -1/3 \end{cases} \tag{35.42} \]
Now the 1st step will be the introduction of the conformal time coordinate $\eta$ (cf. section
33.5) defined by
\[ d\eta(t) = dt/a(t) \tag{35.43} \]
This makes it manifest that the metric is conformally flat, and thus its causal structure
is particularly easy to analyse. All we need to pay attention to is the range of $\eta$.
\[ a(t) = a_0 t_0^{-h} t^h \;\Rightarrow\; d\eta = a_0^{-1} t_0^h t^{-h}\, dt \tag{35.45} \]
and therefore
\[ \eta(t) = a_0^{-1} t_0^h\, t^{1-h}/(1-h) \tag{35.46} \]
for $h \neq 1$ and
\[ \eta(t) = a_0^{-1} t_0^h \log|t| \tag{35.47} \]
This shows that the causal structure and Penrose diagram of the space-time for h =
1, w = -1/3 are identical to that of Minkowski space in Figure 23, with the only crucial
difference that there is now a singularity at $i^-$, i.e. at $t \to 0$, $\eta \to -\infty$ (recall that the
scalar curvature is $R(t) \sim t^{-2}$ (35.35)).
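Since everything hinges on the range of $\eta$, it may help to see (35.46) and (35.47) in executable form. The following sketch (hypothetical function names, with $a_0 = t_0 = 1$ as default values chosen only for convenience) evaluates $\eta(t)$ and returns the limiting values of $\eta$ as t runs from 0 to infinity.

```python
import math


def scale_factor(t, h, a0=1.0, t0=1.0):
    """Power-law scale factor a(t) = a0 (t/t0)^h."""
    return a0 * (t / t0) ** h


def conformal_time(t, h, a0=1.0, t0=1.0):
    """Conformal time from (35.46) for h != 1 and (35.47) for h = 1."""
    prefactor = t0 ** h / a0
    if h == 1.0:
        return prefactor * math.log(abs(t))
    return prefactor * t ** (1.0 - h) / (1.0 - h)


def conformal_time_range(h):
    """Limits of eta for t -> 0 and t -> infinity (for h > 0)."""
    if h < 1.0:
        return (0.0, math.inf)        # eta starts at a finite value
    if h == 1.0:
        return (-math.inf, math.inf)  # eta covers the whole real line
    return (-math.inf, 0.0)           # eta is bounded from above
```

A finite lower limit (0 < h < 1) and a finite upper limit (h > 1) are precisely the two cases in which only half of the Minkowski range of $\eta$ is covered.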
[Figure: Penrose diagram for the k = 0, h = 1 solution, with r = 0, $i^+$, $I^+$, $i^0$ and $i^-$ (singularity) labelled.]
For $h \neq 1$, on the other hand, as indicated in Figure 48, the Penrose diagram is given by
the upper (lower) half of the Minkowski Penrose diagram respectively, with the addition
of the initial spacelike singularity. This has drastic implications for the global causal
structure (and the existence of cosmological horizons) to be explored below.
Figure 48: Penrose Diagram for the k = 0, $w \neq -1/3, -1$ solutions. On the left, the
diagram for decelerating solutions with 0 < h < 1, $w > -1/3$, on the right that for
accelerating solutions with h > 1. Also schematically indicated (as dashed lines) are
some surfaces of constant time t or $\eta$.
We now turn to one particularly relevant implication of the preceding analysis, namely
the existence of (observer-dependent) horizons in cosmological space-times. Thus we
will mainly consider the spatially flat case k = 0 in the following, but in principle
everything can easily and immediately be extended to $k \neq 0$, e.g. by working in the
polar coordinates (33.3), with r . In the following, the term comoving distance
will refer to the coordinate distance as measured by the comoving radial coordinate r.
By spatial isotropy, we can without loss of generality choose the lightrays to be purely
radial. Then they are governed by the equation
\[ dt^2 = a(t)^2 dr^2 \;\Rightarrow\; \frac{dr_\gamma(t)}{dt} = \pm\frac{1}{a(t)} . \tag{35.49} \]
Here, in order to avoid a proliferation of objects with the same anonymous name r, we
call the comoving photon path $r_\gamma(t)$. Since a(t) is non-negative, lightrays propagate
either in the direction of increasing comoving radial coordinate distance r, or in the
direction of decreasing r. In particular, lightrays coming towards us (at r = 0) and
reaching r = 0 at $t = t_0$ are governed by the equation
\[ \frac{dr_\gamma(t)}{dt} = -\frac{1}{a(t)} \;\Rightarrow\; r_\gamma(t) = \int_t^{t_0}\frac{dt'}{a(t')} . \tag{35.50} \]
We will occasionally also denote this solution by
when we want to make the dependence of the solution on t0 more explicit. We see that
this solution can be directly expressed in terms of conformal time as
One can also, if one wishes, express light propagation in terms of the (instantaneous)
proper spacelike distance
\[ R_\gamma(t) = a(t)\, r_\gamma(t) , \tag{35.53} \]
i.e. the spatial proper distance to the photon as measured by the maximally symmetric
spatial geometry on the slice of constant time t but, as we will see in sections 35.8 and
35.9, this is not without its pitfalls.
As an example, for matter with the equation of state parameter $w \neq -1$ and cosmic
scale factor $a(t) = a_0 (t/t_0)^h$ (35.34), the equation for lightrays is
\[ dr_\gamma(t)/dt = -t_0^h t^{-h}/a_0 , \qquad h = \frac{2}{3(1+w)} . \tag{35.54} \]
Either from this, or from the result for $\eta(t)$ from (35.46), one finds that for $h \neq 1$
\[ r_\gamma(t) = \frac{t_0}{a_0(1-h)}\left( 1 - (t/t_0)^{1-h} \right) \tag{35.55} \]
and
\[ R_\gamma(t) = \frac{1}{1-h}\left( t_0^{1-h} t^h - t \right) \tag{35.56} \]
(while for h = 1 one finds a logarithmic behaviour). We now look at 2 limiting cases of
these lightrays.
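The closed-form expressions (35.55) and (35.56) can be cross-checked against a direct numerical evaluation of the integral in (35.50). The sketch below uses a simple midpoint rule and hypothetical function names. The parameter values in the usage note are arbitrary illustrative choices.

```python
def scale_factor(t, h, a0=1.0, t0=1.0):
    """Power-law scale factor a(t) = a0 (t/t0)^h."""
    return a0 * (t / t0) ** h


def lightray_numeric(t, h, a0=1.0, t0=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of dt'/a(t') from t to t0, cf. (35.50)."""
    dt = (t0 - t) / steps
    total = 0.0
    for i in range(steps):
        total += dt / scale_factor(t + (i + 0.5) * dt, h, a0, t0)
    return total


def lightray_closed_form(t, h, a0=1.0, t0=1.0):
    """Comoving photon path (35.55), valid for h != 1."""
    return t0 / (a0 * (1.0 - h)) * (1.0 - (t / t0) ** (1.0 - h))


def lightray_proper_distance(t, h, t0=1.0):
    """Proper distance (35.56) to the photon, valid for h != 1."""
    return (t0 ** (1.0 - h) * t ** h - t) / (1.0 - h)
```

For example, for dust (h = 2/3) and t = 0.1 $t_0$, `lightray_numeric(0.1, 2/3)` and `lightray_closed_form(0.1, 2/3)` should agree to high accuracy, and multiplying the latter by the scale factor reproduces `lightray_proper_distance(0.1, 2/3)`.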
(with t = 0 the time of the initial singularity). If this integral converges, this
means that there is a region of the universe, namely that at comoving distance
$r > r_{ph}(t_0)$, with which we could not have had any causal contact until today. In
that case $r_{ph}(t_0)$ defines the cosmological particle horizon (or just particle horizon).
Expressed in terms of conformal time, whether or not there is a particle horizon is
equivalent to the question if $\eta(t)$ is finite or diverges as $t \to 0$. From the analysis
of the previous section for the single-component models with equation of state
$p = w\rho$, $w \neq -1$, we can thus conclude that
Remarks:
separated events have non-intersecting past lightcones, i.e. have never been
in causal contact in the past. We will come back to one problematic aspect
and consequence of this in section 37.3 in our brief discussion of the (aptly)
so-called horizon problem in cosmology.
In interpreting this diagram, it is also good to keep in mind the highly dis-
torted nature of constant time hypersurfaces at late times, which all end
up at $i^0$. Thus the particle horizon really reaches infinity $I^+$ at $t \to \infty$ or
$\eta \to +\infty$.
Figure 49: Cosmological Particle Horizon: indicated are the past lightcones of a comoving
observer at r = 0 (dashed lines) as well as that observer's particle horizon (thick
solid line).
This diagram also shows that the particle horizon can be regarded as the
boundary of the region that can be influenced by the observer.
(d) One can of course also define these horizons for the models with $k \neq 0$. The
case k = +1 is different in the sense that space is compact. E.g. the range of
$\psi$ in
\[ ds^2 = a(\eta)^2\left( -d\eta^2 + d\psi^2 + \sin^2\psi\, d\Omega^2 \right) \tag{35.59} \]
is finite, so that even if there is a particle horizon initially, it may be possible
to see all of space (and later on all around the space at the back of one's
head) at a later time, namely when the $\eta$-life-time of the universe is $\geq \pi$
(respectively $\geq 2\pi$). An example of this is provided by the matter dominated
k = +1 universe, whose conformal diagram is given in Figure 51 of section
36.3.
(e) The particle horizon is a horizon in the sense that it is null and acts as a one-
way membrane: any object that is inside the particle horizon or has crossed
the particle horizon towards us will remain inside the particle horizon at all
future times. This is also evident from the diagram.
2. Cosmological Event Horizon
From the above discussion it is also clear that the cosmological space-time is
causally inequivalent to Minkowski space in the far future if $\eta$ is bounded from
above as $t \to \infty$. If it is bounded, then the past lightcone of an observer will
never be able to cover all of space-time even as $t \to \infty$, i.e. there will be regions of
space-time from which that observer can never ever receive any information, and
in this case the past lightcone is known as a cosmological event horizon.
The radial coordinate size of the past lightcone from $t_0 = \infty$ at some time t is
given by the future partner of (35.57), namely
and there is an event horizon iff this integral is finite, i.e. iff $\eta(\infty)$ is finite.
Going back to our k = 0 single-component matter example, it is now precisely the
accelerating cosmologies with $w < -1/3$ that exhibit an event horizon. This is
illustrated in Figure 50.
Figure 50: Cosmological Event Horizon: the region outside the event horizon (above
the line EH) is invisible to a comoving observer at r = 0.
Remarks:
(a) Since current observations suggest that our universe will be dominated by a
positive cosmological constant in the future, i.e. that it will be asymptotically
de Sitter in the future, while it was dominated by standard types of matter
in the past, the standard hot big bang / $\Lambda$-CDM model of our universe has
both a particle horizon and an event horizon.
(b) This cosmological event horizon (or the issue whether or not one exists) is not
particularly relevant for observational cosmology today (or in the foreseeable
future), but it is of great theoretical interest since, at least in the (asymp-
totically) de Sitter case, in some respects this (again observer-dependent)
horizon appears to have more in common with a true black hole horizon
(like a temperature and an entropy) than one would have had the right to
expect.148
(c) The event horizon is a horizon in the sense that it is null and acts as a one-
way membrane: any object that is outside the event horizon or has crossed
the event horizon away from us will remain outside the event horizon at all
future times. Given that the event horizon is the past lightcone at a point at
future infinity, these statements are almost a tautology, and are also evident
from the diagram.
Recall that the Hubble radius RH (t) was introduced in section 33.3 as the (instanta-
neous) proper distance at which the recessional velocity (33.22)
\[ V_{rec}(t) \equiv \frac{d}{dt}R_p(t) = H(t)R_p(t) \tag{35.61} \]
of a comoving object with $R_p(t) = a(t)\, r_1$ equals the speed of light,
I also mentioned in section 33.3 that the Hubble sphere is occasionally also referred to
as the Hubble horizon. Let us see if, or to which extent, it has such a role and how
it is related to the cosmological particle and event horizons discussed in the previous
section.
The most direct connection between the Hubble radius and the propagation of light
arises from describing a photon path via its proper distance $R_\gamma(t)$ (35.53). This proper
distance of an (out- or ingoing) lightray changes with time according to
\[ \frac{d}{dt}R_\gamma(t) = \dot a(t)\, r_\gamma(t) \pm 1 = \frac{R_\gamma(t)}{R_H(t)} \pm 1 . \tag{35.63} \]
Thus expressed (or plotted) in terms of proper distance, an ingoing lightray exhibits
(or appears to exhibit) quite a different behaviour from the strictly monotonic $\dot r_\gamma < 0$
148
See e.g. M. Spradlin, A. Strominger, A. Volovich, Les Houches Lectures on de Sitter Space,
arXiv:hep-th/0110007 and references therein.
evolution of $r_\gamma(t)$. In particular, for such ingoing lightrays the proper distance reaches
a maximum at a time $t = t_m$ when
\[ r_\gamma(t_m) = 1/\dot a(t_m) = r_H(t_m) \;\Leftrightarrow\; R_\gamma(t_m) = R_H(t_m) , \tag{35.64} \]
i.e. when the lightray crosses the Hubble sphere, and then decreases towards r = 0.
Here
\[ r_H(t) = R_H(t)/a(t) = 1/\dot a(t) \tag{35.65} \]
is the time-dependent comoving radial coordinate of the Hubble radius. This behaviour
can intuitively (but with caution) be attributed to the fact that at the time when the
lightray crosses the Hubble radius RH (t) it has recessional velocity (i.e. away from us)
equal to c and is thus momentarily at rest with respect to us.
Before discussing how (not) to interpret this result, let us determine how the Hubble
radius evolves with time. The evolution equation for RH (t) is
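\[ \frac{d}{dt}R_H(t) = \frac{d}{dt}\left(a(t)/\dot a(t)\right) = 1 + q(t) , \tag{35.66} \]
while that for $r_H(t)$ is
\[ \frac{d}{dt}r_H(t) = \frac{d}{dt}\left(1/\dot a(t)\right) = q(t)/a(t) . \tag{35.67} \]
Using (37.1)
\[ q(t) = \tfrac12(1 + 3w)\Omega_M - \Omega_\Lambda \tag{35.68} \]
and
\[ k = 0 \;\Rightarrow\; \Omega_M + \Omega_\Lambda = 1 , \tag{35.69} \]
these equations can be written as
\[ \frac{d}{dt}R_H(t) = \tfrac32(1 + w)\Omega_M , \qquad a(t)\frac{d}{dt}r_H(t) = \tfrac32(1 + w)\Omega_M - 1 . \tag{35.70} \]
In particular, if one has a single species of matter (or just a cosmological constant), i.e.
$\Omega_M = 1$ (or $\Omega_M = 0$) one has
\[ q = \tfrac12(1 + 3w) , \tag{35.71} \]
which is constant, and
\[ \frac{d}{dt}R_H(t) = \tfrac32(1 + w) , \qquad a(t)\frac{d}{dt}r_H(t) = \tfrac12(1 + 3w) . \tag{35.72} \]
This implies that the Hubble radius increases monotonically in time for
\[ w > -1 \;\Leftrightarrow\; \frac{d}{dt}R_H(t) > 0 , \qquad w > -\tfrac13 \;\Leftrightarrow\; \frac{d}{dt}r_H(t) > 0 . \tag{35.73} \]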
Evidently for $w > -1$ and $a(t) \sim t^h$ the solution for $R_H(t)$ reproduces the definition
$R_H = a/\dot a$,
\[ w > -1 \;\Rightarrow\; R_H(t) = \tfrac32(1 + w)\,t = t/h , \tag{35.74} \]
while for $w = -1$,
\[ w = -1 \;\Rightarrow\; R_H(t) = R_H(t_0) = 1/H_0 . \tag{35.75} \]
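The crossing condition (35.64) can be verified explicitly for the decelerating power-law solutions with 0 < h < 1. Setting the time derivative of (35.56) to zero gives $t_m = t_0 h^{1/(1-h)}$ (a short calculation added here for illustration), and at that time the proper distance of the ingoing lightray should equal the Hubble radius $t_m/h$ of (35.74). The sketch below, with hypothetical function names, checks this.

```python
def hubble_radius(t, h):
    """Hubble radius R_H(t) = a / adot = t / h for a ~ t^h, cf. (35.74)."""
    return t / h


def photon_proper_distance(t, h, t0=1.0):
    """Proper distance (35.56) of the lightray that reaches r = 0 at t0 (h != 1)."""
    return (t0 ** (1.0 - h) * t ** h - t) / (1.0 - h)


def turnaround_time(h, t0=1.0):
    """Time t_m at which the proper distance (35.56) is maximal, for 0 < h < 1."""
    if not 0.0 < h < 1.0:
        raise ValueError("this sketch assumes 0 < h < 1")
    return t0 * h ** (1.0 / (1.0 - h))
```

For dust (h = 2/3) one finds $t_m = (8/27)\,t_0$, and both `photon_proper_distance` and `hubble_radius` evaluate to $(4/9)\,t_0$ at that time, so that the lightray indeed attains its maximal proper distance exactly when it crosses the Hubble sphere.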
With these results at our disposal, we can now make the following elementary observa-
tions:
(It is also easy to sharpen this upper bound somewhat for $w > -1/3$, but we will
not need this).
2. Thus we see that the objects whose light we receive today at r = 0 cannot be
further away from us today than at a proper distance RH (t0 ) so that the Hubble
radius RH (t0 ) provides some kind of (rough) upper bound on the distance of such
objects. This may suggest (to some) that therefore the Hubble radius provides a
limit to what we can see (or can have seen) of the universe at time t0 .149 However,
this appears to me to be at best an extremely misleading way of phrasing things.
In fact, even though $R_\gamma(t)$ reaches a maximum at $t = t_m$, this is not the case
for the comoving radial coordinate $r_\gamma(t)$ along the photon path. The monotonic
behaviour $\dot r_\gamma < 0$ for ingoing lightrays implies that
so that information about objects at larger distances $r > r_H(t_m)$ than the Hubble
radius can easily reach us. This distance is bounded only by the particle horizon
$r_{ph}(t)$ which is what one gets when one tracks $r_\gamma(t)$ back to t = 0. Therefore it is
precisely the particle horizon which tells us about which points / comoving objects
in the constant time spatial surface we can already have obtained information, and
not the Hubble radius.
Therefore this surface H is null iff $q = \pm 1$. In these two cases,
\[ q = -1 \;\Leftrightarrow\; w = -1 , \qquad q = +1 \;\Leftrightarrow\; w = +\tfrac13 , \tag{35.79} \]
(i.e. a pure positive cosmological constant or radiation) the Hubble radius agrees
with the event horizon or the particle horizon respectively,
Thus for the physically relevant intermediate range $-1 < w < 1/3$ the Hubble
surface is timelike, causally there is nothing strange or interesting going on there,
and nothing prevents one from crossing it multiple times in both directions. In
particular it cannot and will not act like a one-way membrane.
Hinchliffe's rule:
Whenever the title of a paper (section) is a question with a yes/no answer, the answer is no.
In the previous section we collected some elementary properties of the Hubble radius
(Hubble sphere). In particular, we saw that the Hubble sphere is only null in 2 special
cases, in which it coincides with the particle or event horizon respectively.
Nevertheless, in spite of these elementary and well-known facts, strangely there is some
debate in the current literature about the significance of the Hubble horizon (or lack
thereof), and even the bizarre idea that the "visible universe" (defined by $R_H(t)$) is
somehow like the inside of a black hole.
As far as I can tell one of the contributing factors to this is the fact that in cosmology,
distances to distant objects are commonly expressed in terms of their (somewhat ficti-
tious) instantaneous proper spacelike distance R(t0 ) = a(t0 )r from us today, and not
(for instance) in terms of their (approximately constant) comoving coordinate distance,
or some measure of distance at the time the objects emitted the light that we receive
today. In principle, this is perfectly fine, of course, and with due care myths about the
Hubble radius can also be exorcised from this point of view.150
In practice, however, this use of R(t) leads to a somewhat allegorical (and therefore
potentially misleading) way of talking about perfectly mundane things:
For example, as we have seen above in our discussion of lightrays, one can say
(referring to R (t)) that lightrays reach a maximum distance and then turn around
to come towards us, but viewed in terms of comoving coordinates the lightrays
just continue in the direction of decreasing r. Thus care needs to be taken to
separate the physical motion of objects through space from artefacts arising from
describing them in terms of their instantaneous proper spacelike distance.
Let us consider the spatially flat decelerating case $k = 0$ and $w > -1/3$, so that
one has $a(t) \propto t^h$ for some $0 < h < 1$. Then
In this setting one frequently encounters the following kind of argument to ex-
plain from this point of view why light can reach us from outside the Hubble
sphere: for 0 < h < 1 the Hubble sphere expands faster than the universe; thus
light emitted from a receding galaxy initially outside the Hubble sphere can even-
tually be overtaken by the Hubble radius and can then become visible to us. True,
and very figurative, in terms of some radius overtaking some other radius, but this
totally obscures the fact that the Hubble sphere has nothing to do with this and
that the only thing that matters for what is visible to us today is if the object is
inside the particle horizon or not, i.e. has comoving coordinate r < rph (t0 ) or not.
The above argument also seems to suggest that somehow things change when one
has an accelerating universe with h > 1, and this is again true, but the thing that
changes is that (as we have seen) for h > 1 there is simply no particle horizon and
therefore no obstruction to seeing objects at any time. This simple fact is again
obscured by the above argument.
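To make the distinction between the Hubble radius and the particle horizon concrete, here is a small Python sketch for the power-law case a(t) = t^h (the normalisation a = t^h, with no further prefactor, and the function names are assumptions made for illustration only). It compares the comoving Hubble radius 1/(da/dt) with the comoving particle horizon, i.e. the integral of dt/a from 0 to t.

```python
import math


def comoving_hubble_radius(t, h):
    """r_H = 1 / (da/dt) for a(t) = t**h."""
    return t ** (1.0 - h) / h


def comoving_particle_horizon(t, h):
    """r_ph = integral_0^t dt'/a(t') for a(t) = t**h; divergent for h >= 1."""
    if h >= 1.0:
        return math.inf
    return t ** (1.0 - h) / (1.0 - h)


if __name__ == "__main__":
    for label, h in [("matter", 2.0 / 3.0), ("radiation", 0.5), ("accelerating", 1.5)]:
        ratio = comoving_particle_horizon(1.0, h) / comoving_hubble_radius(1.0, h)
        print(label, ratio)
```

For 0 < h < 1 the ratio is h/(1 - h), independent of t: it equals 2 for matter, so the particle horizon lies well outside the Hubble sphere, and 1 for radiation, where the two coincide. For h > 1 the integral diverges and there is no particle horizon at all.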
Nevertheless, as I mentioned already in section 33.3, some people (and not just laymen
who understand neither cosmology nor black holes) appear to be obsessed with the idea
that our visible universe, defined (counterfactually, as we saw above) as the interior of
150
See e.g. P. van Oirschot, J. Kwan, G. Lewis, Through the Looking Glass: Why the Cosmic Horizon
is not a horizon, arXiv:1001.4795 [astro-ph.CO], G. Lewis, P. van Oirschot, How does the Hubble
Sphere limit our view of the Universe?, arXiv:1203.0032 for illuminating discussions and enjoyable
dissections of these issues from this perspective.
the Hubble sphere, is somehow like the inside of a Schwarzschild black hole, the Hubble
sphere playing the role of its horizon.
$$ R_H = 2 G_N \frac{4\pi \rho R_H^3}{3} - \frac{k}{a^2} R_H^3 . \tag{35.84} $$
Therefore the statement that k = 0 is equivalent to the statement that the Hubble
radius $R_H$ is equal to the (would-be) Schwarzschild radius $R_s = 2 G_N M$ associated
with the total mass
$$ M(R_H) = \frac{4\pi \rho R_H^3}{3} \tag{35.85} $$
contained in the Hubble sphere,
There may be something profound in this, I don't know, but just saying "hey, it's
the Schwarzschild radius, hence I have a black hole" is not!
In particular, we will see below that for k = 0 (but only for k = 0), the
Hubble horizon is an apparent horizon, but one with properties that are
quite different to those of the apparent horizon of a black hole.152
Last but not least, the interior of a black hole exhibits a future spacelike
singularity while our universe appears to have emerged from an initial (past)
spacelike singularity (and, no, saying "I meant white hole, not black hole"
will not help, see the discussion below).
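The k = 0 coincidence between the Hubble radius and the would-be Schwarzschild radius of the mass inside the Hubble sphere can be checked numerically. The sketch below assumes the spatially flat Friedmann equation in the form rho = 3H^2/(8 pi G_N), units with G_N = 1 by default, and illustrative function names.

```python
import math


def hubble_sphere_mass(H, G=1.0):
    """Mass M = 4 pi rho R_H^3 / 3 inside the Hubble sphere for k = 0."""
    rho = 3.0 * H**2 / (8.0 * math.pi * G)  # flat Friedmann equation (assumed form)
    R_H = 1.0 / H
    return 4.0 * math.pi * rho * R_H**3 / 3.0


def schwarzschild_radius(M, G=1.0):
    """Would-be Schwarzschild radius R_s = 2 G M."""
    return 2.0 * G * M


if __name__ == "__main__":
    H = 0.7
    print(schwarzschild_radius(hubble_sphere_mass(H)), 1.0 / H)
```

Both printed numbers agree, which is just the k = 0 Friedmann equation in disguise and carries no information about horizons or black holes.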
with norm
g a a = 3 (q + H 2 R02 )2 (R0 H 2 )2 . (35.93)
Clearly this diverges as $R_0 \to R_H = 1/H$, and therefore this is exactly like the
space-time seen by accelerating observers in a Rindler space-time who detect
a fictitious Rindler horizon.
Thus RH is not like an event horizon and there is nothing like a black hole in sight. In
fact, the best and most informative way of saying what RH (t) is and what its significance
is (or is not) is that (for k = 0) RH (t) is a spherically symmetric marginally trapped
tube (MTT) foliated by spherically symmetric marginally trapped surfaces (MTSs).
In other words, with respect to a spherically symmetric foliation of space-time, for
k = 0 (but only for k = 0) the surface r = rH (t) is an apparent (3-)horizon (cf. the
discussion in section 31.9). Moreover, as the hypersurface r = rH (t) is timelike in the
range $-1 < w < 1/3$, the "horizon"-terminology is really not very appropriate and, as
mentioned in section 31.9, in current terminology such a hypersurface is referred to as
a timelike membrane.
Concretely, and in elementary terms, this means the following. Consider (for any k) the
equation
$$ a(t)\, \frac{dr}{\sqrt{1 - k r^2}} = \pm\, dt \tag{35.94} $$
for out- respectively ingoing lightrays, and write this in terms of the area radius $R(t) = a(t) r$,
$$ dR = \sqrt{1 - kR^2/a^2}\, \left( \pm 1 + \frac{RH}{\sqrt{1 - kR^2/a^2}} \right) dt . \tag{35.95} $$
This can of course also be deduced directly from the PG-like metric (35.87) and its
$k \neq 0$ counterpart (33.39).
that one solution, $R_+$, propagates to larger values of R, while the other, $R_-$, propagates
to smaller values of R. Evidently something special happens at $R = R_{ah}$ and for larger
values one has
$$ R(t) > R_{ah}(t) \;\Rightarrow\; \begin{cases} H(t) > 0 : & dR_+/dt > 0 \text{ and } dR_-(t)/dt > 0 \\ H(t) < 0 : & dR_+/dt < 0 \text{ and } dR_-(t)/dt < 0 \end{cases} \tag{35.98} $$
Thus, for a contracting universe this means that the spheres of constant R and t are
trapped for R > Rah (t), like the 2-spheres inside the event horizon of the Schwarzschild
black hole, with both ingoing and (would-be) outgoing radial lightrays moving towards
smaller radii. However, far from indicating the presence of a black hole in this case,
these trapped spheres and the apparent horizon Rah (t) indicate (together with an energy
condition) a future cosmological singularity.
In an expanding universe, on the other hand, for R > Rah both in- and outgoing
lightrays move to larger values of R. The spheres with R > Rah are thus the opposite of
trapped surfaces, i.e. anti-trapped or trapped towards the past (and reflect the existence
of a big bang singularity in the past).
The fact that both in- and outgoing expansions are positive is also a characteristic
feature of the (unphysical) white hole region of a black hole in the region before the
past event horizon. However, in that case it is the region around r = 0, and with
r < 2m, that contains the anti-trapped surfaces while sufficiently large spheres show
normal behaviour. Here, on the contrary, it is sufficiently large spheres around the
comoving observer that are anti-trapped while in a sufficiently small region around any
comoving observer there are no (anti-)trapped surfaces.
Noting that for k = 0 the apparent horizon is equal to the Hubble radius,
all of this is simply a restatement of the discussion about the behaviour of lightrays in
the previous section, and nothing seems to be gained by coining a new name for this
well-established concept of an apparent horizon. Moreover, for $k \neq 0$ the Hubble sphere
is not even an apparent horizon or some other marginally trapped tube, so it seems best
not to associate the word Hubble with the word horizon at all.
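The (anti-)trapping behaviour described above can be read off numerically from (35.95), which in terms of dR/dt says dR/dt = H R +/- sqrt(1 - k R^2/a^2). The following Python sketch uses hypothetical function names; the apparent horizon radius 1/sqrt(H^2 + k/a^2) is obtained here simply by setting one of the two velocities to zero and should be read as a derived assumption of this sketch.

```python
import math


def radial_light_speeds(R, H, k=0, a=1.0):
    """Return (dR_plus/dt, dR_minus/dt) = H*R +/- sqrt(1 - k*R**2/a**2)."""
    root = math.sqrt(1.0 - k * R**2 / a**2)
    return H * R + root, H * R - root


def apparent_horizon_radius(H, k=0, a=1.0):
    """Radius at which one of the two radial velocities vanishes."""
    return 1.0 / math.sqrt(H**2 + k / a**2)


if __name__ == "__main__":
    # Expanding flat universe: outside R_ah = 1/H both rays move outwards.
    print(apparent_horizon_radius(0.5), radial_light_speeds(3.0, 0.5))
    # Contracting flat universe: outside R_ah both rays move inwards.
    print(radial_light_speeds(3.0, -0.5))
```

For k = 0 the apparent horizon radius reduces to the Hubble radius 1/H, while for k = +1 or k = -1 it does not, in line with the remark that the Hubble sphere is then not an apparent horizon.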
36 Cosmology V: Some Exact Solutions
Let us start our excursion into exact solutions by looking at the totally unphysical case
of a completely empty universe, i.e. $\rho = p = \Lambda = 0$, and only k possibly not zero. As
trivial as this may be, it has its pedagogical value, which is why I am including this
case here.
$$ \dot a^2 + k = 0 \;\Rightarrow\; k \le 0 . \tag{36.1} $$
It should not come as a surprise that the metric we have found is just that of Minkowski
space (the constant $a_0$ can be absorbed into a rescaling of the spatial coordinates).
The case $k = -1$ is a bit more interesting. In that case the equation to solve is (I will
call the time-coordinate $\tau$ instead of t for reasons that will perhaps become apparent
below)
$$ \dot a^2 = +1 \;\Rightarrow\; a(\tau) = \pm(\tau - \tau_0) , \tag{36.3} $$
and the resulting space-time metric is (choosing without loss of generality $\tau_0 = 0$)
$$ ds^2 = -d\tau^2 + \tau^2 d\tilde\Omega_3^2 \tag{36.4} $$
This appears to describe a non-trivial universe (known as the Milne universe) with
either (for $a(\tau) = +\tau$) a big bang at $\tau = 0$ and with the universe subsequently expanding
153
For explicit solutions of the Friedmann equations for $\Lambda \neq 0$ and non-trivial matter content (generalising
the solution (37.21)) see e.g. B. Aldrovandi, R. Cuzinatto, L. Medeiros, Analytic solutions for
the $\Lambda$-FRW Model, arXiv:gr-qc/0508073.
linearly with $\tau > 0$, or (for $a(\tau) = -\tau$) with a linearly contracting universe for $\tau < 0$,
ending in a big crunch at $\tau = 0$, and all this in spite of the fact that the universe is
empty.
This is deceptive, however (not that it is empty, but that it is non-trivial). As always, we
need to be careful to disentangle coordinate artefacts from genuine geometrical state-
ments. Indeed, this space-time is again nothing other than (a part of) Minkowski
space-time. We had already anticipated this in section 34.1, in connection with (34.9)
which expresses the Riemann tensor of a Robertson-Walker metric in terms of its Ricci
tensor (and which therefore implies that a vacuum solution will necessarily be flat).
Nevertheless, it will be instructive to see this explicitly. To that end we start with the
Minkowski metric
$$ ds^2 = -dt^2 + d\vec x^2 \tag{36.5} $$
and introduce coordinates that are adapted to the family of space-like hyperboloids
For t > 0 these hyperboloids fill (foliate) the interior of the future lightcone at the origin
(and the interior of the past lightcone for t < 0), and if you draw these hyperboloids you
can perceive what appears to be a non-trivial dynamical evolution of these surfaces. To
show that this fake dynamics is precisely the dynamics exhibited by the Milne metric,
introduce new coordinates ($\tau$, $\psi$, angles) via
Then the Minkowski metric becomes precisely the Milne metric (36.4),
Remarks:
1. These coordinates only cover a part of Minkowski space-time, namely the interior
of the future (and past) lightcone of the origin. They are adapted to the comoving
(and thus geodesic) observers of the Milne metric, i.e. to families of observers with
constant values $\psi = \psi_0$, $\vec n = \vec n_0$. Thus in terms of Minkowski coordinates, the
worldlines of these observers are described by
or
$$ \vec x(t) = (\vec n_0 \tanh\psi_0)\, t \equiv \vec v_0 t \tag{36.11} $$
with
$$ (v_0)^2 = \tanh^2\psi_0 < 1 . \tag{36.12} $$
Thus these are geodesic observers emanating radially with different initial directions
and velocities from the origin, the constant-time hyperboloids of the Milne
metric being the surfaces which these observers reach at their proper time $\tau$.
2. The coordinate singularity at $\tau = 0$ is simply due to the fact that the worldlines of
these observers all intersect at the origin (and thus do not provide good coordinates
there).
3. These coordinates are, as may have occurred to you by now, the future and past
relatives of the Rindler coordinates discussed (way back) in sections 1.3 and 2.8
(and then again in the context of the near-horizon geometry of the Schwarzschild
metric in section 25.5), in particular of the spherical Rindler space discussed at
the end of section 2.8:
Introducing Rindler-like coordinates ($\eta$, $\rho$, angles) adapted to the timelike hyperboloids
$$ \vec x^2 - t^2 = \rho^2 \tag{36.13} $$
via
$$ t = \rho \sinh\eta , \qquad \vec x = \vec n\, \rho \cosh\eta , \tag{36.14} $$
with $\vec n$ a spatial unit vector, $\vec n . \vec n = 1$, the Minkowski metric takes the form (2.161)
This is the spherical Rindler metric. The metric in brackets in the 2nd line will
reappear below, in the guise of the (2+1)-dimensional de Sitter metric.
slices of constant r are of the form $S^2 \times \mathbb{R}$ (and we will recognise these below,
in more fancy terminology, as (2+1)-dimensional versions of the Einstein
static universe);
in Milne coordinates, slices of constant $\tau$ are hyperboloids $H^3$;
in the Rindler-like coordinates (36.15), slices of constant $\rho$ are de Sitter
spaces $dS_3$.
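That the Milne metric is just Minkowski space in disguise can also be checked by brute force. The sketch below restricts to the (t, r) plane and pulls back -dt^2 + dr^2 under the change of variables t = tau cosh(psi), r = tau sinh(psi). The names tau and psi, this explicit form of the transformation, and the function name are assumptions chosen for illustration (the form is consistent with the worldlines r = t tanh(psi_0) described above).

```python
import math


def pulled_back_metric(tau, psi):
    """Components (g_tautau, g_taupsi, g_psipsi) of -dt^2 + dr^2 in (tau, psi)."""
    # Jacobian of t = tau*cosh(psi), r = tau*sinh(psi)
    dt_dtau, dt_dpsi = math.cosh(psi), tau * math.sinh(psi)
    dr_dtau, dr_dpsi = math.sinh(psi), tau * math.cosh(psi)
    g_tautau = -dt_dtau**2 + dr_dtau**2
    g_taupsi = -dt_dtau * dt_dpsi + dr_dtau * dr_dpsi
    g_psipsi = -dt_dpsi**2 + dr_dpsi**2
    return g_tautau, g_taupsi, g_psipsi


if __name__ == "__main__":
    print(pulled_back_metric(2.0, 0.7))
```

Up to rounding, the result is (-1, 0, tau^2), i.e. the radial part -d tau^2 + tau^2 d psi^2 of the Milne metric with a(tau) = tau.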
This particular solution is mainly of historical interest. Before the discovery and un-
derstanding of the Hubble expansion of the universe, it was natural to assume that
the universe can be described by a static solution compatible with the cosmological
principle (homogeneity and isotropy). This led Einstein to introduce the (in-)famous
cosmological constant. Let us see how this comes about.
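$$ (F3) \;\Rightarrow\; \dot\rho = 0 , \tag{36.19} $$
which now implies that also $\dot p = 0$. (36.18) then simply fixes the constant $\Lambda$ in terms of
the constants $\rho$ and p. For a standard matter content of the universe, say $\rho = \rho_m + \rho_r$,
this requires that $\Lambda > 0$. Finally, the first Friedmann equation (F1) now becomes an
algebraic equation for $a(t) = a_0$, namely
$$ (F1) \;\Rightarrow\; \frac{k}{a^2} = \frac{8\pi G_N}{3}\rho + \frac{\Lambda}{3} = 4\pi G_N (\rho + p) \;\Rightarrow\; k = +1 \text{ and } a_0 = (4\pi G_N(\rho + p))^{-1/2} . \tag{36.20} $$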
Remarks:
1. The topology of the solution is $\mathbb{R} \times S^3$, the radius $a_0$ of the $S^3$ being smaller for
larger energy and pressure density (bigger gravitational attraction) and vice-versa.
3. The precise balance between matter and cosmological constant required by this
solution also implies that it is unstable to perturbations of either $\Lambda$ or $\rho$, p: a
slight increase in $\Lambda$ relative to $\rho + p$, no matter how small, will make the universe
expand, and a slight decrease will make it collapse. This alone is enough to make
this particular solution unphysical and shows that, even with inclusion of a cosmo-
logical constant, an expanding or collapsing universe is practically inevitable, thus
undermining the original motivation for introducing the cosmological constant in
the first place.
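The algebra behind the Einstein static universe is easily checked numerically. The sketch below assumes that the static condition from the second order Friedmann equation takes the standard form Lambda = 4 pi G_N (rho + 3p), which is the form consistent with (36.20); units with G_N = 1 and the function name are illustrative choices.

```python
import math


def einstein_static_universe(rho, p, G=1.0):
    """Return (Lambda, a0) for the static k = +1 solution."""
    Lambda = 4.0 * math.pi * G * (rho + 3.0 * p)   # static condition (assumed form)
    a0 = (4.0 * math.pi * G * (rho + p)) ** -0.5   # radius from (36.20)
    return Lambda, a0


if __name__ == "__main__":
    rho, p = 1.0, 0.0
    Lambda, a0 = einstein_static_universe(rho, p)
    # Both sides of the first Friedmann equation with k = +1
    print(1.0 / a0**2, 8.0 * math.pi * rho / 3.0 + Lambda / 3.0)
```

The two printed numbers coincide, and increasing rho + p visibly decreases a0, as stated in the first remark.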
$$ \dot a^2 = \frac{C_m}{a} - k . \tag{36.23} $$
We will now look in turn at the cases $k = 0, +1, -1$.
1. For k = 0, this is the equation we already discussed above, leading to the solution
(35.33),
$$ a(t) = a_0 (t/t_0)^{2/3} \tag{36.24} $$
We recall that in this case we will have a recollapsing universe, with $a_{max} = C_m$
attained for $\dot a = 0$,
$$ \dot a = 0 \;\Leftrightarrow\; a = a_{max} = C_m . \tag{36.27} $$
The equation (36.26) can be solved in closed form for t as a function of a, and the
solution to
$$ \frac{dt}{da} = \left( \frac{a}{a_{max} - a} \right)^{1/2} \tag{36.28} $$
is
$$ t(a) = \frac{a_{max}}{2} \arccos(1 - 2a/a_{max}) - \sqrt{a a_{max} - a^2} , \tag{36.29} $$
as can easily be verified. The universe starts at t = 0 with a(0) = 0, reaches its
maximum $a = a_{max}$ at
Denoting a derivative with respect to $\eta$ by a prime, and noting that $\dot a = a'/a$, one
then finds that for $k \neq 0$ the Friedmann equation (36.23) can be written as
$$ \dot a^2 + k = \frac{C_m}{a} \;\Leftrightarrow\; (a')^2 + k a^2 = C_m a \;\Leftrightarrow\; ((a - C_m/2k)')^2 + k (a - C_m/2k)^2 = k C_m^2/4 . \tag{36.32} $$
Thus for k = +1, the solution to the Friedmann equation can be written as
Choosing $\eta_0 = \pi$ (so that $a(\eta = 0) = 0$), and integrating the relation $dt/d\eta = a(\eta)$
to find $t(\eta)$, one then finds the solution
$$ a(\eta) = \frac{a_{max}}{2}(1 - \cos\eta) , \qquad t(\eta) = \frac{a_{max}}{2}(\eta - \sin\eta) , \tag{36.34} $$
which makes it transparent that the curve is indeed a cycloid, roughly as indicated
in Figure 46.
The maximal radius is reached at
(with amax = Cm ), as before, and the total lifetime of the universe is 2tmax .
3. Analogously, for $k = -1$ the Friedmann equation in parametrised form (36.32)
can be solved in terms of hyperbolic (rather than trigonometric) functions,
$$ a(\eta) = \frac{C_m}{2}(\cosh\eta - 1) , \qquad t(\eta) = \frac{C_m}{2}(\sinh\eta - \eta) . \tag{36.36} $$
Remarks:
1. We see that for small times (for which matter dominates over curvature) the
solutions for $k \neq 0$ reduce to $t \sim \eta^3$, $a \sim \eta^2$ and therefore $a \sim t^{2/3}$ which is indeed
the exact solution for k = 0.
2. Analogously, for late times in the $k = -1$ model one finds that $a(\eta) \sim t(\eta)$,
reproducing the expected late-time behaviour $\dot a \to 1$ of section 35.3.
3. In section 28 we will use the exact solutions of this matter dominated phase to
describe the interior geometry of collapsing stars.
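The parametrised matter dominated solutions (36.34) and (36.36) can be verified against the Friedmann equation (36.23) with a few lines of Python. The sketch uses da/dt = (da/d eta)/a, which follows from dt/d eta = a; the function names are hypothetical.

```python
import math


def dust_solution(eta, C_m=1.0, k=1):
    """Return (a, t, da/d_eta) for the parametrised k = +1 or k = -1 solution."""
    if k == 1:
        a = 0.5 * C_m * (1.0 - math.cos(eta))
        t = 0.5 * C_m * (eta - math.sin(eta))
        da_deta = 0.5 * C_m * math.sin(eta)
    elif k == -1:
        a = 0.5 * C_m * (math.cosh(eta) - 1.0)
        t = 0.5 * C_m * (math.sinh(eta) - eta)
        da_deta = 0.5 * C_m * math.sinh(eta)
    else:
        raise ValueError("only k = +1 and k = -1 are parametrised this way")
    return a, t, da_deta


def friedmann_residual(eta, C_m=1.0, k=1):
    """(da/dt)^2 - (C_m/a - k), which should vanish on a solution."""
    a, _, da_deta = dust_solution(eta, C_m, k)
    a_dot = da_deta / a  # since dt/d_eta = a
    return a_dot**2 - (C_m / a - k)


if __name__ == "__main__":
    print(friedmann_residual(1.3, 2.0, 1), friedmann_residual(1.3, 2.0, -1))
    print(dust_solution(math.pi, 2.0, 1)[:2])  # maximal radius and the time it is reached
```

For k = +1 the maximal radius a = C_m is reached at eta = pi, i.e. at t = pi C_m / 2, so that the total lifetime is pi C_m.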
Suppressing the transverse 2-sphere and eliminating the conformal factor $a(\eta)^2$, we are
thus led to consider the metric
$$ d\tilde s^2 = -d\eta^2 + d\psi^2 . \tag{36.38} $$
Noting that in the case at hand the range of both $\psi$ and $\eta$ is finite,
1. Since this solution is spatially compact, there is no analogue here of spatial infinity
$i^0$. As a consequence, this diagram a priori looks very different from Penrose diagrams
for asymptotically flat space-times, in particular also from those for spatially
flat cosmologies in section 35.6.
2. All timelike and null geodesics begin at the initial (spacelike) singularity at $\eta = 0$
and end at the final singularity at $\eta = 2\pi$.
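[Figure 51: Penrose diagram of the k = +1 matter dominated universe, with labels $\eta = 0$ ($i^-$, $\mathcal{I}^-$), $\eta = 2\pi$ ($i^+$, $\mathcal{I}^+$), $\psi = 0$ and $\psi = \pi$.]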
3. A lightray sent out at the Big Bang will reach the antipodal point of the sphere
exactly at the time $\eta_{max} = \pi$ the universe reaches its maximal radius, and will
have circled around the universe exactly once precisely at the time $2\eta_{max}$ of the
final big crunch.
We now consider the situation when radiation is dominant (as is expected during some
time in the very early universe). In this case we need to solve
$$ a^2 \dot a^2 = C_r - k a^2 . \tag{36.40} $$
(and there is also evidently a corresponding collapsing solution). For k = +1, on the
other hand, there will be a maximal radius amax at
1. Because a(t) appears only quadratically, it is convenient to make the change of
variables $b = a^2$. Then one obtains
$$ \frac{\dot b^2}{4} + k b = C_r . \tag{36.43} $$
For $k = \pm 1$, one necessarily has $b(t) = b_0 + b_1 t + b_2 t^2$. Fixing b(0) = 0, one easily
finds the solution
$$ a(t) = [2 C_r^{1/2} t - k t^2]^{1/2} . \tag{36.44} $$
As expected this reduces to $a(t) \sim t^{1/2}$ for small times where the curvature term
is irrelevant.
For $k = -1$, on the other hand, the universe expands forever, the late-time
behaviour being given by
again as expected.
All this is of course in agreement with the results of the qualitative discussion
given earlier.
$$ (a')^2 + k a^2 = C_r . \tag{36.48} $$
Using $dt/d\eta = a(\eta)$ one can also find $t(\eta)$. E.g. for k = +1 one has
=0 a( = 0) = 0 , t( = 0) = 0 . (36.52)
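The closed-form radiation dominated solution (36.44) can be checked directly against (36.40). In the sketch below the derivative da/dt = (C_r^{1/2} - k t)/a is obtained by differentiating a^2 = 2 C_r^{1/2} t - k t^2; the function names are illustrative.

```python
import math


def radiation_scale_factor(t, C_r=1.0, k=0):
    """a(t) = [2 sqrt(C_r) t - k t^2]^(1/2), cf. (36.44)."""
    return math.sqrt(2.0 * math.sqrt(C_r) * t - k * t**2)


def radiation_residual(t, C_r=1.0, k=0):
    """a^2 (da/dt)^2 - (C_r - k a^2), which should vanish on a solution."""
    a = radiation_scale_factor(t, C_r, k)
    a_dot = (math.sqrt(C_r) - k * t) / a
    return a**2 * a_dot**2 - (C_r - k * a**2)


if __name__ == "__main__":
    for k in (+1, 0, -1):
        print(k, radiation_residual(0.5, 4.0, k))
    # For k = +1 the maximum sqrt(C_r) is reached at t = sqrt(C_r)
    print(radiation_scale_factor(2.0, 4.0, 1))
```

For k = +1 the expression under the square root is non-negative only for 0 <= t <= 2 sqrt(C_r), which is the lifetime of this recollapsing universe.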
Again it is instructive to display the k = +1 solution in a conformal diagram (Figure
52) and to compare it with that of the matter dominated solution (Figure 51). The
main difference is that here the range of $\eta$ is equal to the range of $\psi$,
As a consequence, the conformal diagram is a square, and any lightray sent out at
the Big Bang can only travel half-way around the universe during the lifetime of the
universe.
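[Figure 52: Penrose diagram of the k = +1 radiation dominated universe, with labels $\eta = 0$ ($i^-$, $\mathcal{I}^-$), $\eta = \pi$ ($i^+$, $\mathcal{I}^+$), $\psi = 0$ and $\psi = \pi$.]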
This case is of considerable interest for at least two reasons. On the one hand, as we
know, $\Lambda$ is the dominant driving force for a(t) very large, and may therefore, if current
observations are to be believed (see section 37.1), dominate the late-time behaviour of
our universe.
On the other hand, the currently most popular cosmological models trying to also
address and solve the so-called horizon problem and flatness problem (cf. the discussion
in section 37.3) of the standard FLRW model of cosmology (as well as a number of
other issues) use a mechanism called inflation based on an era of exponential expansion
during some time in the very early universe. This is typically generated by something
that acts effectively like a cosmological constant.
Thus the equations to solve are, setting $\rho = 0$ and p = 0 but retaining $\Lambda$ and k,
$$ \dot a^2 = -k + \frac{\Lambda}{3} a^2 . \tag{36.54} $$
We see immediately that $\Lambda$ has to be positive for k = +1 or k = 0, whereas for $k = -1$
both positive and negative $\Lambda$ are possible,
$$ \dot a^2 = -k + \frac{\Lambda}{3} a^2 \;\Rightarrow\; \begin{cases} \Lambda > 0 : & k = 0, \pm 1 \text{ possible} \\ \Lambda < 0 : & k = -1 \end{cases} \tag{36.55} $$
This is one instance where the solution to the second order equation (F2),
$$ \ddot a = \frac{\Lambda}{3} a , \tag{36.56} $$
is more immediate, namely trigonometric functions for $\Lambda < 0$ (only possible for $k = -1$)
and hyperbolic functions for $\Lambda > 0$. The first order equation then fixes the constants of
integration according to the value of k.
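$$ |\Lambda|/3 = \ell^{-2} \tag{36.57} $$
$$ \dot a^2 = -k \pm a^2/\ell^2 , \qquad \ddot a = \pm a/\ell^2 , \tag{36.58} $$
$$ a_\pm(t) \propto e^{\pm t/\ell} \tag{36.59} $$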
3. $k = -1$: The solution is
$$ ds^2 = -dt^2 + \ell^2 \sinh^2(t/\ell)\, d\tilde\Omega_3^2 . \tag{36.62} $$
Remarks:
1. It turns out that all 3 metrics actually represent the same space-time metric, just
written in different coordinates. This space-time is known as the de Sitter space-
time. Thus these 3 metrics exhibit different slicings of the de Sitter space-time
(or, henceforth, de Sitter space for short, or just dS space), with the t = const.
slices being $\mathbb{R}^3$, $S^3$ and $H^3$ respectively.
2. One way to establish this would be to directly exhibit the coordinate transforma-
tions that map one metric to the other, but this is messy and does not provide any
additional insight. We will proceed in a different way in section 38 below. Indeed,
it turns out that the de Sitter solution is the unique maximally symmetric space-
time with positive curvature (cf. the discussion in sections 13 and 38), and this
perspective will provide us with a more efficient and insightful way of constructing
different coordinate systems and exploring the relations among them.
4. The coordinates appearing in the k = +1 metric turn out to cover de Sitter space
globally. Thus the global picture of de Sitter space is that of a 3-sphere that
It is clear from this description that e.g. the 2 metrics for k = 0 only cover the
contracting (-) or expanding (+) period of the de Sitter universe.
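As a sanity check on the three slicings, the following Python sketch verifies that the standard de Sitter scale factors satisfy the first order equation (da/dt)^2 = -k + a^2/l^2, where l is the length scale defined by Lambda/3 = 1/l^2. The k = 0 branch is taken to be the expanding exponential normalised to a(0) = 1 and the k = +1 form l cosh(t/l) is the textbook one; both choices, like the function names, are assumptions of this sketch.

```python
import math


def de_sitter_scale_factor(t, k, ell=1.0):
    """Return (a, da/dt) for the k = 0, +1, -1 slicings of de Sitter space."""
    x = t / ell
    if k == 0:
        return math.exp(x), math.exp(x) / ell      # expanding branch, a(0) = 1
    if k == 1:
        return ell * math.cosh(x), math.sinh(x)
    if k == -1:
        return ell * math.sinh(x), math.cosh(x)
    raise ValueError("k must be 0, +1 or -1")


def de_sitter_residual(t, k, ell=1.0):
    """(da/dt)^2 - (-k + a^2/ell^2), which should vanish on a solution."""
    a, a_dot = de_sitter_scale_factor(t, k, ell)
    return a_dot**2 - (-k + a**2 / ell**2)


if __name__ == "__main__":
    for k in (0, 1, -1):
        print(k, de_sitter_residual(0.7, k, ell=2.0))
```

All three residuals vanish up to rounding, even though the three scale factors look very different as functions of t.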
In this case, the only possibility (within the Robertson-Walker ansatz for the metric) is
$k = -1$, and the solution is
$$ ds^2 = -dt^2 + \ell^2 \sin^2(t/\ell)\, d\tilde\Omega_3^2 . \tag{36.63} $$
Remarks:
1. This solution is nowadays known as the anti-de Sitter space-time (or anti-de Sitter
space or AdS space for short).
3. AdS space turns out to be the unique maximally symmetric space-time with neg-
ative curvature.
with 0 < and < < . In these coordinates the metric is time-
independent and therefore (unlike dS) AdS has a global timelike Killing vector,
namely .
5. A detailed discussion of many different coordinate systems for anti-de Sitter space
is given in section 38.3 below.
37 Cosmology VI: The Universe Today - Insights and Puzzles
Let us recall the key equations governing the evolution of the universe,
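$$ \Omega_M + \Omega_\Lambda + \Omega_k = 1 , \qquad \tfrac{1}{2}(1 + 3w)\,\Omega_M - \Omega_\Lambda = q \tag{37.1} $$
$$ (\Omega_M)_0 + (\Omega_\Lambda)_0 + (\Omega_k)_0 = 1 , \qquad \tfrac{1}{2}(1 + 3w_0)(\Omega_M)_0 - (\Omega_\Lambda)_0 = q_0 \tag{37.2} $$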
In the universe today, the radiation contribution to the matter content is negligible and
the only non-negligible matter content appears to be that of w = 0 pressureless matter,
and thus
$$ q_0 = \tfrac{1}{2}(\Omega_M)_0 - (\Omega_\Lambda)_0 . \tag{37.3} $$
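$$ \Omega_M = \frac{\Omega_M}{(\Omega_M)_0} (\Omega_M)_0 = \frac{\rho_M}{(\rho_M)_0} \frac{(\rho_{cr})_0}{\rho_{cr}} (\Omega_M)_0 = \frac{a_0^3}{a^3} \frac{H_0^2}{H^2} (\Omega_M)_0 . \tag{37.4} $$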
Likewise, for the cosmological constant and curvature contributions one has
it is also possible to convert other evolution equations in t (or integrals over t) into
evolution equations in z (or integrals over z).
For a long time it was believed that $\Lambda = 0$ (or at least negligibly small) today. In that
case one would have
$$ q_0 = \tfrac{1}{2}(\Omega_M)_0 , \tag{37.10} $$
and therefore the curvature parameter k would be directly related to the value $q_0$ of the
deceleration parameter today:
Moreover, observations indicated a value of $\rho_0$ much smaller than the critical density
$(\rho_{cr})_0$, thus suggesting a decelerating open $k = -1$ universe. While perhaps not the
most hospitable place in the long run, at least this scenario had the virtue of simplicity.
However, exciting recent developments and observations in cosmology and astrophysics
have provided strong evidence for a very different and extremely intriguing and puzzling
picture of the universe today. I will just summarise the results here:
1. Estimates for the current matter contribution $\Omega_M = \sum_b \Omega_b$ are
$$ (\Omega_M)_0 \approx 0.3 . \tag{37.12} $$
2. Ordinary (visible, baryonic) matter only accounts for a small fraction of this,
namely
$$ (\Omega_{M,visible})_0 \approx 0.04 . \tag{37.13} $$
Most of the matter density of the universe must therefore be due to some form of
(as yet ill-understood, non-relativistic, weakly interacting) Dark Matter or Cold
Dark Matter (CDM).
4. Approximate numerical values for the parameters and H0 characterising the
universe today are
5. The fact that there is something like a cosmological constant is perhaps not par-
ticularly puzzling as such (in the absence of a good reason why it should not have
been there in the first place), but nevertheless there are a number of puzzling
issues related to the value of the cosmological constant - see section 37.4 for a
brief discussion.
The exact solution for a spatially flat universe dominated by non-interacting (cold dark)
matter was already given in (35.33),
and that for a spatially flat universe dominated by a cosmological constant was given
in (35.36),
$$ a(t) = a_0 e^{\sqrt{\Lambda/3}\,(t - t_0)} . \tag{37.18} $$
An exact solution of the Friedmann equations can also be written down when both types
of energy/matter are present simultaneously (as in the universe today). In this case the
Friedmann equation is just
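$$ \Omega_M + \Omega_\Lambda = 1 , \tag{37.19} $$
so that the Friedmann equation of the $\Lambda$-CDM Model (cold dark matter with a cosmological
constant) can be written as
$$ H^2 = H_0^2 \left( (\Omega_M)_0 \frac{a_0^3}{a^3} + (\Omega_\Lambda)_0 \right) . \tag{37.20} $$
An exact solution of this equation is
$$ a(t) = a_0 \left( \frac{(\Omega_M)_0}{(\Omega_\Lambda)_0} \right)^{1/3} \left( \sinh\left( (3/2) \sqrt{(\Omega_\Lambda)_0}\, H_0 t \right) \right)^{2/3} \equiv a_0 \left( \frac{(\Omega_M)_0}{(\Omega_\Lambda)_0} \right)^{1/3} (\sinh(t/t_\Lambda))^{2/3} . \tag{37.21} $$
This evidently reproduces the above power-law (exponential) behaviour at early (late)
times. The transition between the decelerating matter dominated and accelerating $\Lambda$-dominated
phases occurs at the time $t = t_*$ at which
$$ \ddot a(t_*) = 0 . \tag{37.22} $$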
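Calculating
$$ \frac{d^2}{dt^2} \sinh^{2/3}(t/t_\Lambda) = (2/3)\, t_\Lambda^{-2} \left( -(1/3) \sinh^{-4/3}(t/t_\Lambda) \cosh^2(t/t_\Lambda) + \sinh^{2/3}(t/t_\Lambda) \right) \tag{37.23} $$
one finds that at $t = t_*$
$$ \sinh^2(t_*/t_\Lambda) = 1/2 , \tag{37.24} $$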
With the values for the density parameters given above, $(\Omega_M)_0 = 0.3$, $(\Omega_\Lambda)_0 = 0.7$, this
is roughly
$$ a(t_*) \approx 0.6\, a_0 . \tag{37.26} $$
Thus the transition occurred at a time when the universe was roughly speaking half
as big as today, and at a corresponding value
$$ z = \frac{a_0}{a(t_*)} - 1 \approx 0.66 \tag{37.27} $$
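These numbers are easy to reproduce. The Python sketch below implements the exact solution (37.21) together with the transition condition sinh^2(t_*/t_Lambda) = 1/2, taking t_Lambda = 2/(3 sqrt((Omega_Lambda)_0) H_0) as read off from (37.21). Units with H_0 = 1 and a_0 = 1, and all function names, are illustrative assumptions.

```python
import math


def lambda_timescale(omega_l=0.7, H0=1.0):
    """t_Lambda = 2 / (3 sqrt(Omega_Lambda) H0), cf. (37.21)."""
    return 2.0 / (3.0 * math.sqrt(omega_l) * H0)


def lcdm_scale_factor(t, omega_m=0.3, omega_l=0.7, H0=1.0, a0=1.0):
    """Exact flat Lambda-CDM solution (37.21)."""
    x = t / lambda_timescale(omega_l, H0)
    return a0 * (omega_m / omega_l) ** (1.0 / 3.0) * math.sinh(x) ** (2.0 / 3.0)


def age_of_universe(omega_m=0.3, omega_l=0.7, H0=1.0):
    """Time t0 at which a(t0) = a0, i.e. sinh^2(t0/t_Lambda) = Omega_Lambda/Omega_M."""
    return lambda_timescale(omega_l, H0) * math.asinh(math.sqrt(omega_l / omega_m))


def transition(omega_m=0.3, omega_l=0.7, H0=1.0):
    """Return (t_star, a(t_star)/a0, z) at the onset of acceleration."""
    t_star = lambda_timescale(omega_l, H0) * math.asinh(math.sqrt(0.5))
    a_ratio = (omega_m / (2.0 * omega_l)) ** (1.0 / 3.0)
    return t_star, a_ratio, 1.0 / a_ratio - 1.0


if __name__ == "__main__":
    print(age_of_universe())  # age in units of 1/H0
    print(transition())
```

With the default parameters the scale factor ratio comes out close to 0.6 and the redshift close to the rounded value quoted in (37.27); the age of the universe is slightly less than 1/H_0.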
The currently favoured $\Lambda$-CDM scenario with $(\Omega_M)_0 = 0.3$, $(\Omega_\Lambda)_0 = 0.7$ and k = 0 has
been tested and confirmed in various independent ways. Nevertheless, it is a mystery
and raises all kinds of questions and puzzles, in particular because the emergence of
this particular universe appears to be somewhat unnatural (although it may perhaps be
difficult to quantify this sentiment) and to require an incredible amount of fine-tuning.
I will not say anything about the dark matter component, since an explanation presum-
ably needs to be found in the realm of particle physics (every model of physics beyond
the standard model worth its salt has its own dark matter candidates), and since the
story is in any case sufficiently strange and interesting even without worrying about,
or having to put up with, things like WIMPS (weakly interacting massive particles), or
MACHOS (massive compact halo objects), or binos, winos and other neutralinos.
Here, in a nutshell, are some of these problems or puzzles (and for more in-depth and far-
reaching discussions of these issues please consult the extensive literature on cosmology).
The first two (discussed in this section) are independent of the recent discovery of dark
energy, and appear to require some fine-tuning or other mechanism to intervene in
the very early universe. The third, on the other hand, has to do with dark energy /
the cosmological constant today, and with the relatively recent period of accelerated
expansion of the universe and is discussed in section 37.4.
I should stress that, while I generally make an attempt in these notes to present just the
(well-established) facts, this is not entirely possible in this and the subsequent section,
since there is no general consensus on how to resolve these issues (in particular those
arising in connection with dark energy). As a consequence, these sections contain not
just facts but certainly and unavoidably also some opinions (whereas there was no point
in sharing with you my opinion about the Riemann curvature tensor, say - I don't even
have one).
103 0 , (37.32)
so that at that time the deviation from flatness (as measured by ) was smaller
by a factor of $10^3$ than it is today.
However, this may begin to look like a more serious fine-tuning issue when one
compares with the very early universe and specifies a particular (earliest) time
in the past. Concretely, let us (for the sake of argument) assume that
With the age of the universe $t_0$ taken to be (only orders of magnitude will be
relevant for this argument) $t_0 \sim 10^{17}$ s, one finds that
In particular, any minuscule deviation from this tiny value at $t = t_p$, i.e. a minuscule
difference between the actual and the critical density of the universe at that time,
would have led to a universe completely incompatible with observations. The
universe would have most likely either recollapsed after a very short time ($\Omega \to \infty$),
or expanded so quickly as to prevent the formation of any structure in the universe
($\Omega \to 0$). In fact, $\Omega = 0$ and $\Omega = \infty$ are the only attractors (attractive fixed points)
of the theory in the terminology of dynamical systems, while $\Omega = 1$ is a repeller
(unstable fixed point).
What is one supposed to make of this? One possibility is to declare that the
above argument is bogus since $\Omega \to 1$ in the past is implied by the Friedmann
equations anyway, and extrapolating all the way back to the Planck time (or
the GUT era or . . . ) requires some enormous leap of faith. But if the flatness
problem is perceived as a real problem, then no explanation for this can be found
in the standard model of cosmology we have been discussing here so far (FLRW
with standard matter content at early times). Occasionally, therefore, weakly
anthropic arguments of the kind "if $\Omega$ had not been fine-tuned to this value by
some (unknown) mechanism/deity, we couldn't be here in the first place to ask
the question why $\Omega$ takes the value it does today" have been advocated.
While this cannot be ruled out (this is the whole problem with this entire anthropic
reasoning business), there is actually a mechanism which naturally leads to the
required tiny value of $\Omega - 1$ at early times, namely inflation.155 As this mechanism
155
For introductions to inflation, see e.g. A. Linde, Particle Physics and Inflationary Cosmol-
ogy, arXiv:hep-th/0503203; D. Baumann, TASI Lectures on Inflation, arXiv:0907.5424 [hep-th];
S. Tsujikawa, Introductory review of cosmic inflation, arXiv:hep-ph/0304257; A. Liddle, An in-
troduction to cosmological inflation, arXiv:astro-ph/9901124; A. Linde, Inflationary Cosmology,
arXiv:0705.0164v2 [hep-th]; R. Brandenberger, Inflationary Cosmology: Progress and Problems,
arXiv:hep-ph/9910410.
is also invoked to solve other problems of the standard model of cosmology (such
as the horizon problem, see below) which are (a) possibly more serious and (b) not
obviously of anthropic significance, one could perhaps consider the anthropophile
value of $\Omega_0$ as an unintentional (but serendipitous) side-effect of inflation.
What inflation does is to postulate, for a number of reasons, a brief but highly
significant period of exponential expansion in the very early universe, as could be
triggered by the presence of a cosmological constant (35.36),
the particle horizon at time $t = t_{ls}$. Converting this to proper distance at time
$t = t_{ls}$, one obtains the quantity
$$ R_{ph}(t_{ls}) = a(t_{ls}) \int_0^{t_{ls}} \frac{dt}{a(t)} , \tag{37.37} $$
Its significance in the present context is that the past light cones of events that are
further than $2R_{ph}(t_{ls})$ apart on the surface $t = t_{ls}$ do not intersect, so that they
are so far apart that they were never before in causal contact. In particular this
means that no causal interaction can be responsible for the temperature being the
same at the two events.
In the (matter dominated) meantime (i.e. between $t_{ls}$ and $t_0$, today) the size of
such a causal patch of size $R_{ph}(t_{ls})$ on the last scattering surface has expanded
to proper size
$$ \frac{a(t_0)}{a(t_{ls})} R_{ph}(t_{ls}) \sim (t_0/t_{ls})^{2/3}\, t_{ls} = t_0^{2/3}\, t_{ls}^{1/3} \, . \tag{37.39} $$
On the other hand, the distance over which the CMBR photons could have travelled
since $t = t_{ls}$ is
$$ a(t_0) \int_{t_{ls}}^{t_0} \frac{dt}{a(t)} = 3 t_0^{2/3} (t_0^{1/3} - t_{ls}^{1/3}) \sim t_0 \, , \tag{37.40} $$
where we have dropped the second term since $t_0 \sim 10^{10}$ years while $t_{ls} \sim 10^5$ years
(once again all numbers here and below are just meant to be order of magnitude
estimates). Thus the region of the last scattering surface from which we receive
the CMBR photons today is much larger than a causal patch, their ratio being
(one can of course calculate this either at $t = t_{ls}$ or at $t = t_0$, here we have chosen
the latter option)
$$ t_0/(t_0^{2/3} t_{ls}^{1/3}) = (t_0/t_{ls})^{1/3} \sim 10^{5/3} \, . \tag{37.41} $$
What this means is that the sky splits into roughly $4\pi \cdot 10^{10/3} \gtrsim 10^4$ disconnected
patches, that were never in communication before sending light to us. In view
of this the observed isotropy of the CMBR is not only astounding but utterly
implausible.
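The order-of-magnitude estimates (37.39)-(37.41) can be reproduced numerically. The following is a minimal sketch, assuming the round values $t_0 \sim 10^{10}$ years and $t_{ls} \sim 10^5$ years quoted above and a purely matter-dominated expansion $a(t) \propto t^{2/3}$; the function names are illustrative and not part of these notes.

```python
import math

def horizon_ratio(t0, tls):
    """Compare the distance travelled by CMBR photons since t_ls with the
    present-day size of a causal patch, assuming a(t) ~ t^(2/3)."""
    patch_today = t0 ** (2 / 3) * tls ** (1 / 3)  # (37.39), order-one factors dropped
    photon_distance = 3 * t0 ** (2 / 3) * (t0 ** (1 / 3) - tls ** (1 / 3))  # (37.40)
    # First entry keeps the factor 3 and the t_ls term, second entry is (37.41)
    return photon_distance / patch_today, t0 / patch_today

def number_of_patches(t0, tls):
    """Rough number of causally disconnected patches on the sky: 4*pi*ratio^2."""
    ratio = (t0 / tls) ** (1 / 3)
    return 4 * math.pi * ratio ** 2

t0, tls = 1e10, 1e5  # years, order-of-magnitude values from the text
exact_ratio, rough_ratio = horizon_ratio(t0, tls)
patches = number_of_patches(t0, tls)
print(exact_ratio, rough_ratio, patches)
```

Keeping the factor of 3 and the $t_{ls}$ term only changes the ratio by a factor of order one, which is why they can be dropped in an order-of-magnitude argument.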
Again, inflation solves this in an extremely natural way. Inflation operates at a
time $t_i \ll t_{ls}$ (perhaps some time between $10^{-36}$ s and $10^{-32}$ s after the big bang)
and it can easily inflate a tiny causally connected patch at that time t = ti to such
a size that at time t = tls it is (more than) large enough to explain the isotropy
of the CMBR.
The majority view among cosmologists appears to be that for these and other reasons
inflation should be considered to be part of the standard model of cosmology. However,
science is not decided by opinion polls and it is good scientific practice to keep an open
mind. In particular, one should keep in mind the possibility that for instance the two
(flatness and horizon) problems (or perhaps better: puzzles) may indicate problems
with the approximation of the early universe by a FLRW cosmology and are thus not
something that should (or needs to, or perhaps even can) be solved by (inflationary)
modifications within the class of FLRW models.156 The most significant success of
inflation, however, is that it not only explains the high degree of isotropy of the
CMBR but that it also provides a mechanism for, and a precise quantitative account of,
density perturbations and the small inhomogeneities exhibited by the CMBR (namely
as arising from quantum vacuum fluctuations). Simple models of inflation appear to be
able to account for the latest (2013) precision measurements and data from the Planck
satellite157 , but the debate over this and other aspects of inflation continues.158
Ultimately, in order for inflation to be considered a natural solution to the above prob-
lems (and others I have not mentioned - see the literature cited above), one needs to
be able to show that inflation arises fairly naturally (in some precise sense) and does
not itself require a comparably huge amount of fine-tuning in order to resolve these
issues. This is a complicated and intensely debated issue, and one I don't feel sufficiently
competent and knowledgeable about to have an informed opinion, let alone utter an
opinion in public. It will perhaps only be settled once one has a better understanding
of the physics in the very early, pre-inflationary, era that is supposed to be responsible
for setting the initial conditions for inflation.
This is a complex, confusing and multi-faceted problem that has been around for a long
time, and I will not remotely be able to do justice to it here.159 It has been sharpened
156
For example, there are claims that in certain inhomogeneous Lemaître-Tolman cosmological models
the horizon problem can be avoided without invoking inflation - see e.g. chapter 18.17 of J. Plebanski,
A. Krasinski, An Introduction to General Relativity and Cosmology.
157
A. Linde, Inflationary Cosmology after Planck 2013, arXiv:1402.0526 [hep-th].
158
A. Ijjas, P. Steinhardt, A. Loeb, Inflationary schism after Planck2013, arXiv:1402.6980
[astro-ph.CO].
159
The classic reference for this is the authoritative and influential article by S. Weinberg, The cosmo-
logical constant problem, Rev. Mod. Phys. 61 (1989) 1-23.
and brought to the forefront again by the discovery of dark energy.160
We have remarked before that the cosmological constant looks like a vacuum energy con-
tribution to the energy-momentum tensor. It is perhaps better to turn this around and
to say that vacuum energy is one potential contribution to the cosmological constant,
in the sense that the associated energy density can be written as
Here
$$ \rho_0 = \frac{\Lambda_0}{8\pi G_N} \tag{37.43} $$
is associated with a bare cosmological constant $\Lambda_0$, some parameter in the action,
while $\rho_{vac}$ is a quantum contribution arising from the energy in the ground state or
vacuum, i.e. it is the vacuum expectation value of the Hamiltonian density operator
$T_{00}$,
$$ \rho_{vac} = \langle vac|T_{00}|vac\rangle \, . \tag{37.44} $$
and it is usually assumed that this extends to quantum field theory in a gravitational
background (as described e.g. in the references in footnote 83 in section 26.6) in the
form
$$ \langle vac|T_{\alpha\beta}|vac\rangle = -\rho_{vac}\, g_{\alpha\beta} \, . \tag{37.46} $$
2. Even in Minkowski space an expression like (37.45) requires some sort of regularisation
procedure, and the procedures that are common or privileged in the case of
Poincaré-invariant field theories may either not be available or may not be in any
160
For recent reviews see e.g. S. Carroll, The Cosmological Constant, Living Rev. Relativity 4 (2001)
1, http://www.livingreviews.org/lrr-2001-1; J. Polchinski, The Cosmological Constant and the
String Landscape, arXiv:hep-th/0603249; T. Padmanabhan, Cosmological Constant - the Weight
of the Vacuum, arXiv:hep-th/0212290; R. Bousso, TASI Lectures on the Cosmological Constant,
arXiv:0708.4231 [hep-th], J. Martin, Everything You Always Wanted To Know About The Cosmo-
logical Constant Problem (But Were Afraid To Ask), arXiv:1205.3365 [astro-ph.CO].
way privileged when one considers a general curved background. Thus there are
ambiguities in the calculation of $\langle T_{\alpha\beta} \rangle$ which may affect the validity of (37.46).
One (relatively harmless) ambiguity of this kind would be the addition of a term
proportional to the Einstein tensor $G_{\alpha\beta}$ to the right-hand side of (37.46),
$$ \langle vac|T_{\alpha\beta}|vac\rangle = -\rho_{vac}\, g_{\alpha\beta} + \frac{\lambda}{8\pi G_N} G_{\alpha\beta} \tag{37.47} $$
for some constant $\lambda$. This would be compatible with $\nabla^\alpha T_{\alpha\beta} = 0$ and with the
Minkowskian limit, and would amount to a renormalisation of Newton's constant
$G_N$.
In spite of all this it is usually assumed that something like (37.46) is at least approxi-
mately true for sufficiently weak fields, and we will proceed with this assumption.
Depending on the physics or physical process one is looking at, natural estimates for
the energy scale of a cosmological constant produced in this way might be in the MeV
or GeV range, or even at the (ultimate Planck cut-off) scale of
$$ E_P = \sqrt{\frac{\hbar c^5}{8\pi G_N}} \sim 10^{18}\ \mathrm{GeV} \tag{37.48} $$
(with the inclusion of the factor of $8\pi$ in the definition this is known as the reduced
Planck energy). Any of these are many many orders of magnitude above what was and
is compatible with observation, or even with the very existence of our universe.
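As a quick numerical check of the estimate (37.48), the reduced Planck energy can be evaluated from the fundamental constants. This is only a sketch: the numerical values of $\hbar$, $c$, $G_N$ and the electron volt are CODATA values in SI units inserted here for illustration, they do not appear in these notes.

```python
import math

# CODATA values in SI units (an input assumed for this illustration)
HBAR = 1.054571817e-34           # J s
C = 2.99792458e8                 # m / s
G_N = 6.67430e-11                # m^3 / (kg s^2)
JOULE_PER_GEV = 1.602176634e-10  # J per GeV

def reduced_planck_energy_gev():
    """E_P = sqrt(hbar c^5 / (8 pi G_N)) as in (37.48), converted to GeV."""
    energy_joule = math.sqrt(HBAR * C ** 5 / (8 * math.pi * G_N))
    return energy_joule / JOULE_PER_GEV

E_P = reduced_planck_energy_gev()
print(f"reduced Planck energy ~ {E_P:.2e} GeV")
```

The result is a few times $10^{18}$ GeV, consistent with the order of magnitude quoted in (37.48).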
One cannot simply solve this problem by using the bare value $\Lambda_0$ of the cosmological
constant to cancel the vacuum contribution by hand, because
• this would require an enormous (and unexplained) fine-tuning of this bare value;
• contributions to the vacuum energy are expected to arise at various instances during
the evolution of the universe while this cancellation could at best be achieved
at one point in time during the evolution of the universe (if this cancellation is chosen
to take place too early, it will be incompatible with observations today while
if it happens too late it will be incompatible with the well-established thermal
history of the universe).
It therefore seemed natural to seek some mechanism that would simply make the net
cosmological constant identically zero, and a lot of effort went into finding or inventing
some mechanism responsible for this. Nowadays, with the strong evidence for dark
energy, this cannot be the solution and the original question/problem has morphed into
at least 3 distinct questions, namely
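2. Why is $\rho_\Lambda$ not zero, and why does it have the value $\rho_\Lambda \sim (10^{-3}\ \mathrm{eV})^4$?
$$ T_{\alpha\beta} \to T_{\alpha\beta} + c\, g_{\alpha\beta} \tag{37.49} $$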
161
See S. Hollands, R. Wald, Quantum Field Theory Is Not Merely Quantum Mechanics Applied to
Low Energy Effective Degrees of Freedom, arXiv:gr-qc/0405082 for some reflections on this issue.
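for constant $c$. To that end one postulates the so-called trace-free Einstein equations
$$ R_{\alpha\beta} - \tfrac14 g_{\alpha\beta} R = 8\pi G_N (T_{\alpha\beta} - \tfrac14 g_{\alpha\beta} T) \, , \tag{37.50} $$
together with the conservation law
$$ \nabla^\alpha T_{\alpha\beta} = 0 \tag{37.51} $$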
(which is not implied by the modified Einstein equations (37.50)).162 Note that
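both sides of (37.50) are manifestly traceless. In particular, therefore, any contribution
to the energy-momentum tensor $\propto g_{\alpha\beta}$ (like a cosmological constant) does
not contribute to (37.50),
$$ T_{\alpha\beta} \propto g_{\alpha\beta} \quad \Rightarrow \quad T_{\alpha\beta} - \tfrac14 g_{\alpha\beta} T = 0 \, , \tag{37.52} $$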
and therefore does not couple directly to gravity (said differently, the source term
is invariant under the shift (37.49), as required).
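The invariance of the source term of (37.50) under the shift (37.49) is a purely algebraic statement and can be illustrated with explicit components. The sketch below assumes, for simplicity only, the Minkowski metric with signature (-,+,+,+) and an arbitrary sample tensor with made-up entries; the helper names are hypothetical.

```python
# Minkowski metric, signature (-,+,+,+); chosen purely for illustration
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def trace(T, g=ETA):
    """Trace g^{ab} T_{ab}; the diagonal metric ETA is its own inverse."""
    return sum(g[a][a] * T[a][a] for a in range(4))

def trace_free_part(T, g=ETA):
    """T_{ab} - (1/4) g_{ab} T, the source term of (37.50)."""
    tr = trace(T, g)
    return [[T[a][b] - 0.25 * g[a][b] * tr for b in range(4)] for a in range(4)]

def shift(T, c, g=ETA):
    """The shift (37.49): T_{ab} -> T_{ab} + c g_{ab}."""
    return [[T[a][b] + c * g[a][b] for b in range(4)] for a in range(4)]

# An arbitrary symmetric sample tensor (hypothetical numbers)
T_sample = [[2.0, 0.3, 0.0, 0.1],
            [0.3, 1.0, 0.2, 0.0],
            [0.0, 0.2, 0.5, 0.4],
            [0.1, 0.0, 0.4, 0.7]]

source = trace_free_part(T_sample)
shifted_source = trace_free_part(shift(T_sample, c=5.0))
# A pure cosmological-constant-like term, proportional to the metric
pure_lambda = trace_free_part(shift([[0.0] * 4 for _ in range(4)], c=-3.0))
```

Here `source` and `shifted_source` agree component by component, and `pure_lambda` vanishes identically, which is the content of (37.52).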
While this looks like a major departure from the usual (and well-tested) Einstein
equations, and this might make one believe that the equations (37.50) can easily
be ruled out experimentally, what actually happens is more subtle and somewhat
surprising. Namely, writing (37.50) as
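$$ R_{\alpha\beta} - \tfrac12 g_{\alpha\beta} R = 8\pi G_N (T_{\alpha\beta} - \tfrac14 g_{\alpha\beta} T) - \tfrac14 g_{\alpha\beta} R \tag{37.53} $$
and taking the divergence of both sides, the contracted Bianchi identity and the
conservation law (37.51) imply that $R + 8\pi G_N T$ is constant,
$$ R + 8\pi G_N T = 4\Lambda \tag{37.54} $$
for some integration constant $\Lambda$. Plugging this result back into (37.53), one finds
$$ R_{\alpha\beta} - \tfrac12 g_{\alpha\beta} R = 8\pi G_N T_{\alpha\beta} - \tfrac14 g_{\alpha\beta} [8\pi G_N T + R] = 8\pi G_N T_{\alpha\beta} - \Lambda g_{\alpha\beta} \, . \tag{37.55} $$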
This is nothing other than the usual Einstein equations with a cosmological constant,
$$ R_{\alpha\beta} - \tfrac12 g_{\alpha\beta} R + \Lambda g_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} \, , \tag{37.56} $$
the crucial difference being that here $\Lambda$ is not determined by the matter content
and its vacuum energy but arises solely as an integration constant. While
this does not explain the observed tiny value of the cosmological constant, it
162
See e.g. G. Ellis, H. van Elst, J. Murugan, J.-P. Uzan, On the Trace-Free Einstein Equations as
a Viable Alternative to General Relativity, arXiv:1008.1196v2 [gr-qc] and references therein for a
more detailed discussion and the history of this and other variants of unimodular gravity, originally even
considered by Einstein himself in 1919.
separates this issue from that of the vacuum fluctuations. Note also that this sce-
nario does not rule out a gravitational coupling to general quantum corrections to
states of some physical system (Lamb shift, Casimir energies, . . . ) which are not
Lorentz invariant, as required e.g. by precision tests of the equivalence principle
(see the discussion regarding "do vacuum fluctuations gravitate?" in the reviews
by Polchinski and Martin in footnote 160).
2. Why is $\rho_\Lambda$ not zero, and why does it have the value $\rho_\Lambda \sim (10^{-3}\ \mathrm{eV})^4$?
More correctly, the question should perhaps be phrased as "What is the origin
of dark energy (the source of what appears to be a late time acceleration of the
universe), and why does the corresponding energy density have the approximate
value $(10^{-3}\ \mathrm{eV})^4$?"
Numerous models have been proposed that do give rise to a late-time acceler-
ation of the universe for one reason or another (one of the buzzwords here is
quintessence). Most of them, however, assume (explicitly or implicitly) that some-
how the first problem has been solved and that there is no (bare or combined)
cosmological constant source term in the Einstein equations. Again anthropic rea-
soning (shudder!) can be invoked to argue that the above is a plausible value for
the cosmological constant. It would however of course be very desirable to find
alternative explanations.
It is not clear at all what kind of mechanism would produce an effect of the desired
size. One curious observation, however, which may provide us with a clue, is that
the energy scale $E_\Lambda$ associated with the cosmological constant,
is of the order of the geometric mean of the (current-day) Hubble scale and the
Planck scale (I do not know who (if anybody) should be credited with this obser-
vation which has probably been made independently multiple times). In energy
units this is the curious relation
Hubble scale), which has led to some holographic ruminations.163
A more conservative realisation of the scenario described by (37.58) is also possible.164
For concreteness, consider the scalar field in a cosmological background
discussed in section 33.9, with a UV momentum cutoff at a momentum $k = k_{UV}$.
Namely if, for one reason or another, one can argue that the leading quartic divergence
in the vacuum energy
$$ \rho_{vac} \sim \frac12 \int \frac{d^3k}{(2\pi)^3} \sqrt{k^2 + m_{eff}^2} = \frac{(k_{UV})^4}{16\pi^2} + \ldots \tag{37.61} $$
(which would also be present in, and hence destroy, Minkowski space) should be
subtracted, then the subleading term is quite generically of the form
(in the model in section 33.9 this $H_0^2$-term would arise from the scalar curvature
term in the effective mass). With a cut-off at the Planck scale, this gives a vacuum
energy contribution of the order of the observed dark energy density,
In designing such scenarios, care should be taken that the vacuum contribution
thus determined really has an equation of state parameter equal (or very close
to) $w = -1$, so that $\rho_{vac}$ calculated today is really (approximately) constant
(and does not behave like $\rho_{vac}(t) \sim H(t)^2$, say, which would be in conflict with
many cosmological observations). Exploring scenarios of this kind is clearly a
constructive alternative to anthropic incantations.
time we find ourselves at (how precisely depends among other things on what one
means by "comparable in size"). Now unlikely things happen all the time, so this
may really just be a coincidence, but maybe it is not, and maybe, therefore, the
"if it walks like a duck and talks like a duck . . ." argument fails and dark energy
is not a cosmological constant.
However, some time after having first written this, I became aware of the following
quotation (also invoking ducks) which, in the interest of a balanced presentation,
I will not withhold:
Why obfuscate? If a poet sees something that walks like a duck and
swims like a duck and quacks like a duck, we will forgive him for entertaining
more fanciful possibilities. It could be a unicorn in a duck suit -
who's to say! But we know that more likely, it's a duck.165
Fair enough . . .
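With $\rho_M$ and $\rho_\Lambda$ both of the order of the critical density because $\Omega_M + \Omega_\Lambda = 1$,
one has
$$ \frac{\Lambda}{8\pi G_N} = (\Omega_\Lambda \rho_c)_0 = \Omega_\Lambda \frac{3H_0^2}{8\pi G_N} \sim H_0^2 E_P^2 \tag{37.65} $$
which is the same statement as the empirical observation (37.58) discussed above.
One can also eliminate any reference to Planck units and the Planck scale and
write this as
$$ \Lambda \sim (H_0)^2 \, , \tag{37.66} $$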
so another way of stating the coincidence problem is as the curious fact that the
two a priori completely unrelated time-scales, one set by $(H_0)^{-1}$ (which turns out
to be remarkably close to the current best estimates for the age of the universe)
and the other set by the energy density or curvature radius of dark energy, are
approximately equal. Why should the cosmological constant today be related to
the age of the universe today???
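The coincidence expressed by (37.65) can be made concrete with numbers. The following sketch works in natural units, where $1/(8\pi G_N) = E_P^2$ with $E_P$ the reduced Planck energy (37.48). The inputs $H_0 = 70$ km/s/Mpc and $\Omega_\Lambda = 0.7$ are assumed round values that are not taken from these notes, and the function name is illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16  # hbar in eV s
MPC_IN_M = 3.0857e22         # one megaparsec in metres
E_P_EV = 2.435e27            # reduced Planck energy (37.48) in eV

def dark_energy_scale_ev(h0_km_s_mpc=70.0, omega_lambda=0.7):
    """Return (rho_Lambda^(1/4), sqrt(H0 * E_P), hbar * H0), all in eV,
    using rho_Lambda = 3 Omega_Lambda H0^2 E_P^2 as in (37.65)."""
    h0_per_s = h0_km_s_mpc * 1e3 / MPC_IN_M
    hubble_energy_ev = HBAR_EV_S * h0_per_s
    rho_lambda = 3 * omega_lambda * hubble_energy_ev ** 2 * E_P_EV ** 2  # eV^4
    geometric_mean = math.sqrt(hubble_energy_ev * E_P_EV)
    return rho_lambda ** 0.25, geometric_mean, hubble_energy_ev

scale, geometric_mean, hubble_energy = dark_energy_scale_ev()
print(scale, geometric_mean, hubble_energy)
```

Both the energy scale of the dark energy density and the geometric mean of the Hubble and Planck scales come out at a few times $10^{-3}$ eV; they differ only by the order-one factor $(3\Omega_\Lambda)^{1/4}$.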
This is not to say that there is not a strong case to be made for what appears to be an
extreme fine-tuning of other (standard model) parameters to support everything from
primordial nucleosynthesis to chemistry and life as we know it.166 However, among
165
R. Bousso, TASI Lectures on the Cosmological Constant, arXiv:0708.4231 [hep-th].
166
See e.g. L. Barnes, The Fine-Tuning of the Universe for Intelligent Life, arXiv:1112.4647
[physics.hist-ph] for a recent assessment of the situation.
other things precisely because of the proviso as we know it, and also because of the
(as far as I know) limited understanding of chemistry based on other than standard
model gauge theories, this begs the question and the significance of these findings is far
from clear and difficult to assess.
G: Varia
Until now, our treatment of the basic structures and properties of General Relativity
has been reasonably systematic and standard. This part contains a biased and varied
selection of other fun topics.
38 de Sitter and anti-de Sitter Space
(Anti-) de Sitter spaces are the simplest solutions of the Einstein equations with $\Lambda \neq 0$.
As such they are the curved counterparts of the $\Lambda = 0$ Minkowski space-time. In
particular, they are the unique constant curvature (or maximally symmetric) space-times,
just as Minkowski space is the unique flat (or Poincaré-symmetric) space-time.
Thus they are the Lorentzian-signature counterparts of spheres and hyperboloids, and
made a first appearance as such in section 13. Since they are thus in some sense the
simplest non-trivial space-times, it is worthwhile to study them in some detail.
The de Sitter and anti-de Sitter spaces subsequently reappeared in the context of cos-
mology in section 36.5, but it may not be immediately apparent that what we called
(anti-) de Sitter there is indeed identical to what we called (anti-) de Sitter in section
13. In order to bridge this gap, we will in particular redo the analysis of section 13.4
in the special case of interest. We will realise the (A)dS spaces via embeddings into a
higher-dimensional vector space, and we will use this embedding to express the resulting
induced metric in various coordinate systems. In this way we will, in particular, also
recover the metrics encountered in the cosmological context above. This will complete
the proof that the solution of the Friedmann equations in the cosmological constant
dominated phase is unique (for a given cosmological constant $\Lambda$) and uniquely given by
the maximally symmetric (A)dS space.
We will now discuss the embeddings of (A)dS space, beginning with the more familiar
cases of Euclidean signature spheres and hyperboloids, already discussed at some length
in section 13.
We will denote the coordinates of the 5-dimensional embedding space by $z^A$, the range of
the indices (e.g. 1 to 5 or 0 to 4) being chosen to be whatever is convenient or suggestive
in the case at hand.
$$ S^4: \quad (z^1)^2 + \ldots + (z^5)^2 = +1 \, . \tag{38.1} $$
If we wanted to discuss the sphere of radius $R$, we would replace the $+1$ on the
right-hand side of the above equation by $R^2$ (and likewise for the radius $R$ or
curvature radius $\ell$ below).
Its isometry group is the (4 + 1)-dimensional Poincaré group, which has dimension
15. The equation defining $H^4$ is left invariant by its SO(4, 1) Lorentz-subgroup
which has dimension 10, and thus the metric induced on $H^4$ by the Minkowski
metric on the embedding space will have isometry-group SO(4, 1) and is maximally
symmetric. This metric has Euclidean signature because the $(-1)$ on the right-hand
side of (38.2) allows one to completely eliminate the time-like direction $z^0$, and
the corresponding line element is denoted by $d\tilde{\Omega}_4^2$.
3. If we change the sign on the right-hand side of (38.2), the equation will still
be invariant under SO(4, 1), but now the signature of the induced metric will
be Lorentzian instead of Euclidean and we obtain a realisation of a maximally
symmetric space-time, namely de Sitter space,
$$ dS_4: \quad -(z^0)^2 + (z^1)^2 + \ldots + (z^4)^2 = +1 \, . \tag{38.4} $$
$$ AdS_4: \quad -(z^0)^2 + (z^1)^2 + (z^2)^2 + (z^3)^2 - (z^4)^2 = -1 \, . \tag{38.5} $$
Since this equation is SO(3, 2)-invariant, this space-time will have isometry group
SO(3, 2), induced from the signature (2,3) metric on the embedding-space $\mathbb{R}^{2,3}$.
The dimension of SO(3, 2) is also 10, just like that of SO(4, 1) or SO(5), and
(38.5) defines the maximally symmetric Anti-de Sitter space (actually, for AdS we
will take the universal covering space of the space described by (38.5) - we will
come back to this below).
As already indicated in section 13, the statements about the isometries of maximally
symmetric space(-time)s can be compactly summarised by writing them as homogeneous
spaces of the isometry groups.
This generalises the statement that the 2-sphere can be written as the homogeneous
space (or coset space) SO(3)/SO(2), which itself comes about as follows:
1. It is clear that the SO(3) rotations act transitively on the 2-sphere, i.e. that any
point can be mapped to any other point (this is the property of homogeneity
discussed in section 13).
2. Moreover, given any point $p$ on the 2-sphere, there is an SO(2) subgroup of SO(3),
$SO(2)_p \subset SO(3)$, consisting of rotations around the axis passing through that
point, that leaves the point $p$ invariant but acts on the vectors at that point by
2-dimensional rotations (isotropy).
3. Since this SO(2)p -transformation must also be a symmetry of the metric at that
point, this shows in particular that the metric has a Euclidean signature at each
point.
4. Putting all this together, given any point $p$, we can establish a 1:1 correspondence
between points on $S^2$ and elements of SO(3) modulo elements of $SO(2)_p$, and we
write this as
$$ S^2 \cong SO(3)/SO(2) \, , \tag{38.6} $$
the set on the right-hand side considered as the set of equivalence classes $[g]$ with
$g \in SO(3)$ and $[gh] = [g]$ for $h \in SO(2)$, say (this defines right-cosets, SO(2)
acting on the right on an SO(3)-element).
These statements generalise straightforwardly to the 4-sphere, and also to the other
maximally symmetric space-times discussed above, and we summarise these facts in the
following table, adding also the notation we will occasionally use for the correspond-
ing line-element (when we do not just write it anonymously as ds2 ), and giving the
embedding (with $\vec{z}^{\,2} = (z^1)^2 + (z^2)^2 + (z^3)^2$):

  M ≅ G/H       G          H          line element                    embedding
  M = S^4       SO(5)      SO(4)      $d\Omega_4^2$                   $+\vec{z}^{\,2} + (z^4)^2 + (z^5)^2 = +1$
  M = H^4       SO(4, 1)   SO(4)      $d\tilde{\Omega}_4^2$           $-(z^0)^2 + \vec{z}^{\,2} + (z^4)^2 = -1$
  M = dS_4      SO(4, 1)   SO(3, 1)   $d\Omega_{1,3}^2$               $-(z^0)^2 + \vec{z}^{\,2} + (z^4)^2 = +1$
  M = AdS_4     SO(3, 2)   SO(3, 1)   $d\tilde{\Omega}_{1,3}^2$       $-(z^0)^2 + \vec{z}^{\,2} - (z^4)^2 = -1$
                                                                                              (38.7)
The G-isometries are generated by the 5(5-1)/2 = 10 rotational Killing vectors (cf.
(9.25))
$$ J_{AB} = \eta_{AC} z^C \partial_B - \eta_{BC} z^C \partial_A = -J_{BA} \tag{38.8} $$
of the metric $\eta_{AB}$ of the embedding space, and they satisfy the Lie bracket algebra
This generalises in an obvious way to higher dimensions, so that e.g. SO(4, 2) is the
isometry group of AdS5 . As we saw before (section 9.3), this group also happens to
be the conformal group of 4-dimensional Minkowski space, and this is one of the fun-
damental ingredients in the so-called AdS/CFT correspondence relating gravitational
theories in 5-dimensional (asymptotically) anti-de Sitter space-times to conformal field
theories in 3 + 1 dimensions.167
$$ -(z^0)^2 + (z^1)^2 + \ldots + (z^4)^2 = +1 \, . \tag{38.10} $$
This describes a time-like hyperboloid with topology $\mathbb{R} \times S^3$, the $S^3$ arising from the
slicing at fixed $z^0$,
Here is an overview of the coordinate systems and topics discussed in this section:
1. Global Coordinates
5. Planar Coordinates
6. Static Coordinates
9. Painlevé-Gullstrand-like Coordinates
167
See e.g. J. Polchinski, Introduction to Gauge/Gravity Duality, arXiv:1010.6134 and H. Nastase,
Introduction to AdS-CFT, arXiv:0712.0689 [hep-th] for accessible introductions to this by now vast
subject.
38.2.1 Global Coordinates
$$ (z^1)^2 + \ldots + (z^4)^2 = +1 + (z^0)^2 \, , \tag{38.12} $$
(analogous identities will be used repeatedly in the following). Then one finds the metric
$$ ds^2 = \Big( -(dz^0)^2 + \sum_a (dz^a)^2 \Big)\Big|_{(38.10)} = -d\tau^2 + \cosh^2\tau\, d\Omega_3^2 \, . \tag{38.15} $$
Remarks:
1. This is the $k = +1$, $\Lambda > 0$ solution (36.61) of the Friedmann equations, thus
confirming that the solution found there is maximally symmetric.
2. The manifest symmetries in this coordinate system are the symmetries of $S^3$, i.e.
the subgroup $SO(4) \subset SO(4, 1)$ of the total isometry group (thus 6 out of 10
isometries are manifest).
3. It can be read off from (38.10) that these coordinates cover the hyperboloid glob-
ally (modulo the usual, and utterly harmless, issues with spherical coordinates at
the poles of a sphere).
(with /2 /2).
From the global coordinates introduced above we can pass to conformal time in order
to then construct the Penrose diagram for de Sitter space. Thus we write
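and introduce conformal time (usually called $\eta$, but I will call it $T$ here) by the relation
$$ dT = \frac{d\tau}{\cosh\tau} \, . \tag{38.18} $$
Surprisingly this (hyperbolic) equation has the simple (trigonometric) solution
$$ \cosh\tau = \frac{1}{\cos T} \tag{38.19} $$
with
$$ \tau \in (-\infty, +\infty) \quad \Leftrightarrow \quad T \in (-\pi/2, +\pi/2) \, . \tag{38.20} $$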
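That (38.19) solves (38.18) can be checked numerically. The sketch below uses the explicit closed form $T(\tau) = 2\arctan(\tanh(\tau/2))$ (the Gudermannian function), which is a standard identity supplied here for illustration and not a formula appearing in these notes; the function names are hypothetical.

```python
import math

def conformal_time(tau):
    """T(tau) = 2 arctan(tanh(tau / 2)), normalised so that T(0) = 0."""
    return 2.0 * math.atan(math.tanh(tau / 2.0))

def check_conformal_time(tau, step=1e-6):
    """Return (cosh(tau) * cos(T), cosh(tau) * dT/dtau); both should equal 1
    if cosh(tau) = 1 / cos(T) and dT = dtau / cosh(tau) hold."""
    T = conformal_time(tau)
    # Central finite difference for dT/dtau
    dT_dtau = (conformal_time(tau + step) - conformal_time(tau - step)) / (2 * step)
    return math.cosh(tau) * math.cos(T), dT_dtau * math.cosh(tau)

results = [check_conformal_time(tau) for tau in (-3.0, -0.5, 0.0, 1.0, 4.0)]
for algebraic, differential in results:
    print(algebraic, differential)
```

The same function also illustrates the range (38.20): as $\tau \to \pm\infty$ the conformal time $T$ approaches $\pm\pi/2$.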
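Writing the metric on the 3-sphere as usual as $d\psi^2 + \sin^2\psi\, d\Omega_2^2$, we see that de Sitter
space is conformal to
$$ d\tilde{s}^2 = -dT^2 + d\psi^2 + \sin^2\psi\, d\Omega_2^2 \, , \tag{38.22} $$
with $\psi \in [0, \pi]$ and $T \in (-\pi/2, +\pi/2)$. Suppressing the transverse 2-sphere and adding
the points with $T = \pm\pi/2$ (future and past infinity), we end up with the simple Penrose
diagram of de Sitter space in Figure 54.
[Figure 54: Penrose diagram of de Sitter space, with future infinity ($i^+$, $\mathcal{I}^+$) at $T = +\pi/2$, past infinity ($i^-$, $\mathcal{I}^-$) at $T = -\pi/2$, and the poles at $\psi = 0$ and $\psi = \pi$.]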
Now consider a comoving observer at the north pole $\psi = 0$. Such an observer will have
both an event horizon (the boundary of that part of the universe that this observer
can in principle obtain information about or be influenced by), and a particle horizon,
which here we interpret as forming the boundary of the region which this observer can
in principle have influence on. These horizons and regions of influence are indicated
in the diagrams in Figure 55.
Figure 55: Event and Particle Horizons and Regions of Influence for a comoving
observer at the north pole $\psi = 0$ in de Sitter space.
The intersection of these 2 regions is called the (northern) causal diamond and is the
only region of de Sitter space that is fully accessible to an observer at the north pole in
the sense that this is the region this observer can send signals to and receive signals from.
This northern causal diamond is completely causally disconnected from the (southern)
causal diamond of a comoving observer at the south pole $\psi = \pi$ (Figure 56).
[Figure 56: Penrose diagram of de Sitter space showing the northern and southern causal diamonds.]
For $z^4 > 1$, the right-hand side is negative and slices of constant $z^4$ are therefore
hyperboloids $H^3$. In that case it is natural to introduce
The $n$ thus parametrise $H^3$, and one finds that the de Sitter metric can be written as
$$ ds^2 = -d\tau^2 + \sinh^2\tau\, d\tilde{\Omega}_3^2 \, , \tag{38.25} $$
Remarks:
2. The manifest symmetries in this coordinate system are the symmetries of $H^3$, i.e.
the subgroup $SO(3, 1) \subset SO(4, 1)$ of the total isometry group (thus again 6 out
of 10 isometries are manifest).
3. Note that these coordinates only cover the region $z^4 > 1$, which is only a part of
de Sitter space. We will discuss coordinates in the range $|z^4| < 1$ below.
Curiously, de Sitter space can be foliated by de Sitter spaces of one dimension less. To
see this note that when $|z^4| < 1$ the right-hand side of (38.23) is positive. Thus the
slices of constant $z^4$ are then indeed $dS_3$ spaces, and adapted coordinates are
$$ ds^2 = \frac{dr^2}{1 - r^2} + r^2 d\Omega_{1,2}^2 = d\theta^2 + \sin^2\theta\, d\Omega_{1,2}^2 \, , \tag{38.28} $$
with $d\Omega_{1,2}^2$ the line-element on the unit curvature radius (2 + 1)-dimensional de Sitter
space $dS_3$.
38.2.5 Planar Coordinates
$$ z^4 - z^0 = e^{-t} \, , \quad z^4 + z^0 = e^{t} - \vec{x}^2 e^{-t} \, , \quad z^k = e^{-t} x^k \, . \tag{38.29} $$
It is evident that this also solves (38.10). Then one finds the metric
Remarks:
2. By sending $t \to -t$, one sees that the metric can be written in either of the two
ways
$$ ds^2 = -dt^2 + e^{\pm 2t} d\vec{x}^2 \, . \tag{38.31} $$
3. The manifest symmetries in this coordinate system are the Euclidean group, i.e.
the translational and rotational symmetries of $\mathbb{R}^3$, as well as the time-translation
plus scaling symmetry
$$ t \to t + \lambda \, , \quad \vec{x} \to e^{\lambda} \vec{x} \, . \tag{38.32} $$
$$ z^4 - z^0 = e^{-t} \geq 0 \, . \tag{38.33} $$
and introducing a new time coordinate $\eta = \exp t$, the metric takes the form
Figure 57: Planar Coordinates for de Sitter space cover half of de Sitter space. Indicated
(schematically) are some lines of constant $t$. The event horizon of the comoving
observer at the north pole is at $t = +\infty$.
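The planar parametrisation can be tested numerically against the defining equation (38.10). The sketch below assumes the sign convention $z^4 - z^0 = e^{-t}$, $z^4 + z^0 = e^{t} - \vec{x}^2 e^{-t}$, $z^k = e^{-t} x^k$ (the signs are an assumption, since they depend on the orientation chosen for $t$), and compares the embedding-space interval between two nearby points with $-dt^2 + e^{-2t} d\vec{x}^2$. All function names and sample values are illustrative.

```python
import math

def planar_embedding(t, x):
    """Embedding coordinates [z0, z1, z2, z3, z4] for planar coordinates (t, x),
    assuming z4 - z0 = exp(-t), z4 + z0 = exp(t) - |x|^2 exp(-t), zk = exp(-t) xk."""
    x_squared = sum(xk * xk for xk in x)
    minus = math.exp(-t)                           # z4 - z0
    plus = math.exp(t) - x_squared * math.exp(-t)  # z4 + z0
    z0 = 0.5 * (plus - minus)
    z4 = 0.5 * (plus + minus)
    return [z0] + [math.exp(-t) * xk for xk in x] + [z4]

def hyperboloid(z):
    """Left-hand side of (38.10): -(z0)^2 + (z1)^2 + ... + (z4)^2."""
    return -z[0] ** 2 + sum(zi ** 2 for zi in z[1:])

def induced_interval(t, x, dt, dx):
    """Embedding-space interval between two nearby points on the hyperboloid."""
    za = planar_embedding(t, x)
    zb = planar_embedding(t + dt, [xk + dxk for xk, dxk in zip(x, dx)])
    dz = [b - a for a, b in zip(za, zb)]
    return -dz[0] ** 2 + sum(d ** 2 for d in dz[1:])

t, x = 0.3, [0.2, -0.1, 0.4]            # hypothetical sample point
dt, dx = 1e-4, [2e-4, -1e-4, 3e-4]      # small coordinate displacements
constraint = hyperboloid(planar_embedding(t, x))
numeric = induced_interval(t, x, dt, dx)
expected = -dt ** 2 + math.exp(-2 * t) * sum(d * d for d in dx)
print(constraint, numeric, expected)
```

With this sign convention $z^4 - z^0$ is positive and tends to zero as $t \to +\infty$, in agreement with the location of the horizon stated in the caption of Figure 57.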
So far, the metric in all the coordinate systems was explicitly time-dependent (with a
different time-coordinate in each case). It is possible to locally introduce a coordinate
system that is time-independent. Namely, let us write (38.10) as
$$ (z^1)^2 + (z^2)^2 + (z^3)^2 \equiv \sum_k (z^k)^2 = 1 + (z^0)^2 - (z^4)^2 \tag{38.36} $$
and introduce a spatial radial coordinate $r$ via $\sum_k (z^k)^2 = r^2$. Then one has
$$ \sum_k (z^k)^2 = r^2 \quad \Rightarrow \quad (z^4)^2 - (z^0)^2 = 1 - r^2 \, . \tag{38.37} $$
Remarks:
2. It is adapted to a static and geodesic observer at $r = 0$ (corresponding to an
observer at the north or south pole of the sphere in global coordinates, say) and
covers that observer's causal diamond (Figure 56). Indeed, the equations for a
radial geodesic read
$$ (1 - r^2)\dot{t} = E \, , \quad \dot{r}^2 - r^2 = E^2 - 1 \, . \tag{38.40} $$
$$ r(\tau) = r_0 e^{\tau} \, , \quad t(\tau) = \tau - \tfrac12 \ln(1 - r(\tau)^2) \, . \tag{38.41} $$
The existence of this alternative form of the metric beyond the horizon of the static
metric is not surprising: we did (but then quickly dismissed as not particularly
insightful) something analogous in the Schwarzschild case, introducing the time
coordinate T = r and the radial coordinate R = t in the region 0 < r < 2m (26.1),
and we will also briefly consider something analogous in the anti-de Sitter case
below - see (38.139).
As the de Sitter metric in static coordinates has the standard form of a static spherically
symmetric metric, one can follow the general recipe outlined in sections 30.6 and 30.8 to
construct the counterpart of Eddington-Finkelstein and Kruskal-Szekeres coordinates.
The latter are occasionally used in discussions of the thermodynamics associated with
the de Sitter cosmological horizon, as they bring out most clearly the analogies with the
Schwarzschild event horizon.
We start with (38.39), with the inclusion of the curvature radius $\ell$, corresponding to a
cosmological constant $\Lambda = 3/\ell^2$, and introduce the corresponding tortoise coordinate $r^*$
in the standard way via
$$ \begin{aligned}
ds^2 &= -(1 - r^2/\ell^2)dt^2 + (1 - r^2/\ell^2)^{-1} dr^2 + r^2 d\Omega_2^2 \\
     &= (1 - r^2/\ell^2)[-dt^2 + (1 - r^2/\ell^2)^{-2} dr^2] + r^2 d\Omega_2^2 \\
     &= (1 - r^2/\ell^2)[-dt^2 + (dr^*)^2] + r^2 d\Omega_2^2 \, .
\end{aligned} \tag{38.45} $$
In this case the relation
$$ dr^* = (1 - r^2/\ell^2)^{-1} dr \quad \Rightarrow \quad r^* = \tfrac12 \ell \log \frac{\ell + r}{\ell - r} \tag{38.46} $$
can be explicitly inverted to give $r$ as a function of $r^*$,
$$ r = \ell \tanh (r^*/\ell) \, . \tag{38.47} $$
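The relations (38.46) and (38.47) between $r$ and the tortoise coordinate $r^*$ are easy to verify numerically. The following is a small sketch with a hypothetical value of the curvature radius $\ell$; the function names are illustrative.

```python
import math

def tortoise(r, ell=1.0):
    """r* = (ell / 2) log((ell + r) / (ell - r)) for |r| < ell, cf. (38.46)."""
    return 0.5 * ell * math.log((ell + r) / (ell - r))

def radius_from_tortoise(r_star, ell=1.0):
    """Inverse relation r = ell tanh(r* / ell), cf. (38.47)."""
    return ell * math.tanh(r_star / ell)

def tortoise_slope(r, ell=1.0, step=1e-6):
    """Numerical dr*/dr (central difference), to be compared with (1 - r^2/ell^2)^(-1)."""
    return (tortoise(r + step, ell) - tortoise(r - step, ell)) / (2 * step)

ell = 2.0                      # hypothetical curvature radius
samples = [0.0, 0.5, 1.0, 1.9]
round_trip = [radius_from_tortoise(tortoise(r, ell), ell) for r in samples]
slopes = [(tortoise_slope(r, ell), 1.0 / (1.0 - r ** 2 / ell ** 2)) for r in samples]
print(round_trip, slopes)
```

The tortoise coordinate diverges as $r \to \ell$, so the cosmological horizon at $r = \ell$ is pushed to $r^* \to \infty$, just as for the Schwarzschild horizon.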
Introducing in the usual way also the retarded and advanced coordinates $u = t - r^*$, $v = t + r^*$,
one can now write the metric in Eddington-Finkelstein coordinates $(u, r, \theta, \phi)$ or
$(v, r, \theta, \phi)$, leading to
(and likewise for the retarded coordinates). Kruskal-Szekeres coordinates can now be
introduced by starting with
U = e u/ , V = e v/ (38.52)
38.2.8 Interlude on (A)dS Schwarzschild
The above form (38.39) of the de Sitter metric in static coordinates should be familiar
from (what appeared to be) quite a different context in section 23, in particular the
discussion of Birkhoff's theorem in section 23.6. In that context we had noted that the
characteristic form (23.68),
of the metric is implied not just by the vacuum Einstein equations for spherical sym-
metry but, more generally, by the Einstein equations in spherical symmetry whenever
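$T^t_{\ t} = T^r_{\ r}$. This condition is, in particular, satisfied, when the matter content is that of
a cosmological constant,
$$ T_{\alpha\beta} = -\frac{\Lambda}{8\pi G_N} g_{\alpha\beta} \quad \Rightarrow \quad T^t_{\ t} = T^r_{\ r} = -\frac{\Lambda}{8\pi G_N} \, . \tag{38.55} $$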
Then the Einstein equations (23.64) reduce to the simple equation
leading to
2m(r) 2m0
f (r) = 1 =1 r 2 /2 . (38.58)
r r
Remarks:
2. Likewise, we learn that for $\Lambda < 0$ there will be a static coordinate system in
which the anti-de Sitter metric takes the form (38.39) with $-r^2 \to +r^2$. We will
recover and reconfirm this below, see (38.71), when studying the AdS metrics more
systematically.
38.2.9 Painlevé-Gullstrand-like Coordinates
In section 26.2 we had seen how to construct, from a static spherically symmetric metric
of the standard (and, as it subsequently turned out, ubiquitous) form (38.54) e.g. a
metric with flat constant-time slices by a coordinate transformation $T(t, r) = t + \psi(r)$.
In the case at hand (38.39), with $f(r) = 1 - r^2$, the condition (26.19) leads to
which is solved by
$$ \psi(r) = \tfrac12 \ln(1 - r^2) \, . \tag{38.61} $$
Performing the coordinate transformation from the static metric (38.39) with this choice
of $\psi(r)$ leads to the metric
Remarks:
3. This metric can also be obtained from the metric (38.31) in planar coordinates
(we give these a subscript p now)
4. In an analogous way, one can construct de Sitter analogues of Eddington-Finkelstein
coordinates etc.
Clearly, there are many more possibilities, but this shall suffice. It should be clear from
the above examples how to construct other coordinate systems for dS adapted to one's
needs.169
Coordinates for anti-de Sitter space can be described in precise analogy with the de
Sitter case. Our starting point is the defining equation (38.5),
$$ -(z^0)^2 + (z^1)^2 + (z^2)^2 + (z^3)^2 - (z^4)^2 = -1 . $$ (38.66)
This has the topology $S^1 \times \mathbb{R}^3$, as can be seen by writing the equation as
$$ (z^0)^2 + (z^4)^2 = 1 + (z^1)^2 + (z^2)^2 + (z^3)^2 . $$ (38.67)
For $z^k$ fixed, this describes a circle in the $(z^0, z^4)$-plane. As the metric is negative-definite
on that plane, this shows that the surface defined by (38.66) has closed timelike
curves through every point. To avoid such a paradoxical and pathological situation, we
will pass to the covering space, which amounts to replacing $S^1 \to \mathbb{R}$. It is actually
this resulting space, without closed timelike curves, that is usually referred to as anti-de
Sitter space, and we will follow that convention.
Here is an overview of the coordinate systems and topics discussed in this section:
7. Poincaré Coordinates
38.3.1 Global (and Static) Coordinates
Remarks:
1. These coordinates again make the periodic nature of the time-direction of AdS
manifest. The embedding hyperboloid would be covered by choosing $\tau$ to be
an angular variable with period $2\pi$. In the universal covering space, however,
without closed timelike curves, $\tau$ and $\tau + 2\pi$ are not identified, and the range of
$\tau$ is $-\infty < \tau < +\infty$.
2. Alternatively, one can write $r = \sinh\rho$, with $0 \le r < \infty$, and now one recognises
the metric
$$ ds^2 = -(1 + r^2)\,d\tau^2 + (1 + r^2)^{-1}dr^2 + r^2 d\Omega_2^2 $$ (38.71)
as the negative curvature counterpart of the static de Sitter metric (38.39), already
anticipated in connection with the general solution (38.58) of the spherically symmetric
Einstein equations with a cosmological constant. Notice in particular that
this is of the general $f$, $f^{-1}$ form
3. Note that, in contrast to the dS case, AdS has a global timelike Killing vector,
namely $\partial_\tau$. It follows from the parametrisation of the embedding coordinates that
$\tau$-translations are the same thing as a rotation in the (negative definite) $(z^0, z^4)$-plane,
$$ \partial_\tau = (\partial_\tau z^A)\,\partial_A = z^4\partial_0 - z^0\partial_4 , $$ (38.73)
and thus $\partial_\tau$ is identified with the Killing vector $J_{04}$ (38.8) of the embedding space.
38.3.2 Conformal Coordinates, Conformal Boundary and Penrose Diagrams
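which (as in the case of the conformal time coordinate of de Sitter space, cf. section
38.2.2), has the trigonometric solution
$$ \cosh\rho = \frac{1}{\cos\theta} . $$ (38.76)
The difference is that here the range of $\rho$ is mapped to
Using
$$ \cosh\rho = \frac{1}{\cos\theta} \quad\Rightarrow\quad \sinh\rho = \tan\theta , \quad \tanh\rho = \sin\theta , $$ (38.78)
the metric takes the form
$$ ds^2 = \frac{1}{\cos^2\theta}\left(-d\tau^2 + d\theta^2 + \sin^2\theta\, d\Omega_2^2\right) = \frac{1}{\cos^2\theta}\left(-d\tau^2 + d\Omega_3^2\right) . $$ (38.79)
Thus anti-de Sitter space is conformal to
$$ d\tilde{s}^2 = -d\tau^2 + d\Omega_3^2 \quad\text{with}\quad 0 \le \theta < \pi/2 , $$ (38.80)
i.e. AdS is conformal to one half of the Einstein static universe (which has the standard
range $0 \le \theta \le \pi$). Surfaces of constant $\tau$ are thus half-spheres (discs) with boundary at
$\theta = \pi/2$, and one can visualise AdS as a solid cylinder infinitely extended in the time
direction. The points with $\theta = \pi/2$ correspond to $\rho = \infty$, and $\theta = \pi/2$ is the conformal
boundary $\mathcal{I}$ of AdS. This conformal boundary is timelike,
$$ d\tilde{s}^2|_{\theta=\pi/2} = -d\tau^2 + d\Omega_2^2 $$ (38.81)
with topology
$$ \mathcal{I} \cong S^2 \times \mathbb{R} , $$ (38.82)
and unites future and past null infinity $\mathcal{I}^\pm$ as well as spatial infinity $i^0$, symbolically
$$ \mathcal{I} = \mathcal{I}^+ \cup \mathcal{I}^- \cup i^0 . $$ (38.83)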
Likewise, for the $(d + 2)$-dimensional anti-de Sitter space, say, one has
$$ \mathrm{AdS}_{d+2}: \quad \mathcal{I} \cong \mathbb{R} \times S^d , $$ (38.84)
and this conformal boundary can be regarded as the (spatial) conformal compactification
of $(d + 1)$-dimensional Minkowski space,
$$ \mathbb{R}^{1,d} \to \mathbb{R} \times S^d . $$ (38.85)
Suppressing, as usual, the transverse 2- (or d-) sphere, the metric of this conformal
completion of anti-de Sitter space is
$$ d\tilde{s}^2 = -d\tau^2 + d\theta^2 , \qquad \tau \in (-\infty, +\infty) , \quad \theta \in [0, \pi/2] . $$ (38.86)
Since the range of $\tau$ is infinite while that of $\theta$ is finite, there is no way to compress this
into a finite range of coordinates for both while preserving the condition that lightrays
are diagonal. In other words, any further conformal transformation of the metric with
line element $d\tilde{s}^2$ that maps the $\tau$-interval to a finite range will squeeze the $\theta$-interval to
a point, which is not particularly helpful for visualisation purposes. Thus the best one
can do is think of AdS as an infinite strip (or, as mentioned above, as an infinite solid
cylinder), as displayed in Figure 58.
This diagram may not appear to be particularly informative at first sight. However, it
displays and highlights several characteristic and peculiar features of AdS, in particular
that AdS has a timelike boundary, i.e. a boundary with Lorentzian signature (a lower-dimensional
space-time in its own right, equipped with a conformal class of metrics). In
particular, starting with 5-dimensional AdS, the boundary $\mathcal{I}$ is a 4-dimensional space-time
(which can be viewed as Minkowski space, with the space compactified to a sphere).
Moreover, as is evident from the diagram (and we will confirm by a quick calculation
below), lightrays can reach the boundary $\mathcal{I}$ (infinity) in finite coordinate time. Indeed,
radial lightrays are governed by the pair of equations
the first being the radial null condition and the second the conserved energy associated
to $\tau$-translation invariance. These can be combined into
$$ \dot\rho^2 = E^2/\cosh^2\rho \quad\Rightarrow\quad \frac{d}{d\lambda}\sinh\rho = \pm E . $$ (38.88)
For outgoing lightrays (the plus-sign), one thus has
$$ \sinh\rho(\lambda) = E(\lambda - \lambda_0) , $$ (38.89)
and thus $\rho \to \infty$ is reached for infinite values of the affine parameter, $\lambda \to \infty$. For the
coordinate time, on the other hand, one finds
Figure 58: Penrose diagram of anti-de Sitter space. $\theta = 0$ represents the center (interior)
of AdS, $\theta = \pi/2$ the timelike boundary $\mathcal{I}$. The diagram is infinitely extended to the
future and past, with timelike future/past infinity $i^\pm$ residing there. Also indicated are
a lightray reflected at $\mathcal{I}$ and some timelike geodesics.
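The statement that radial lightrays reach the boundary in finite coordinate time, but only at infinite affine parameter, can be checked numerically. The following is a minimal sketch, assuming that the radial null condition in global coordinates reads $d\tau/d\rho = 1/\cosh\rho$ (this is what (38.88) and the conserved energy imply if the metric is $-\cosh^2\rho\, d\tau^2 + d\rho^2 + \ldots$); the function names and the midpoint quadrature are illustrative choices, not part of the notes.

```python
import math

def coordinate_time(rho_max, steps=100_000):
    """Coordinate time tau needed by a radial null ray to go from rho = 0 to rho_max.

    Integrates d(tau)/d(rho) = 1/cosh(rho) with the midpoint rule.
    """
    h = rho_max / steps
    return sum(h / math.cosh((i + 0.5) * h) for i in range(steps))

def coordinate_time_exact(rho_max):
    """Closed form of the same integral: tau = 2 * atan(tanh(rho / 2))."""
    return 2.0 * math.atan(math.tanh(rho_max / 2.0))

def affine_parameter(rho_max, energy=1.0):
    """Affine parameter elapsed along the ray, from sinh(rho) = E * (lambda - lambda_0)."""
    return math.sinh(rho_max) / energy

for rho_max in (1.0, 5.0, 20.0):
    print(rho_max, coordinate_time(rho_max), affine_parameter(rho_max))
# The coordinate time saturates near pi/2 while the affine parameter keeps growing.
```

In this sketch the coordinate time to the boundary tends to $\pi/2$, the same value as the range of the conformal coordinate, which is what the diagonal lightrays of the Penrose diagram suggest.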
This also means that for any spacelike hypersurface $\Sigma$ (such as the horizontal line at
the bottom of the diagram) there are points to the future of $\Sigma$ which are such that
there are past-directed causal (null) geodesics from that point that do not intersect
that surface (because they run into the boundary $\mathcal{I}$). This makes it plausible that
specifying initial data for some fields (scalar fields, say) on some spacelike hypersurface
alone is not enough to determine the future evolution of the field. Anti-de Sitter space
is thus an example of a space-time which has no Cauchy surfaces which would lead to
a well-defined Cauchy initial value problem (and one also says that such a space-time
is not globally hyperbolic).
In embryonic form, this problem already arises for (null) geodesics, and in the diagram I
have continued the lightray beyond $\mathcal{I}$ by adopting a particular prescription for evolving
the lightray after it hits $\mathcal{I}$, namely reflecting boundary conditions at $\mathcal{I}$. It turns
out that also for fields a well-defined evolution requires specifying not only initial data
on some hypersurface but also boundary conditions on $\mathcal{I}$. In analysing this issue, the
conformal relation between anti-de Sitter space and the Einstein static universe (which
has Cauchy surfaces and a well-defined initial value problem) turns out to be valuable.170
Many of these things, in particular the existence of a timelike boundary on which fields
live (namely the boundary values of the bulk fields), combined with the fact mentioned
before that the isometry group of 5-dimensional AdS, SO(4, 2), coincides with
the conformal group of the 4-dimensional (boundary) Minkowski space, are crucial basic
ingredients in the celebrated AdS/CFT correspondence relating a gravitational (quantum)
theory in the anti-de Sitter bulk space-time to a conformal non-gravitational
quantum field theory on the (conformal) boundary (see the references in footnote 167
for an introduction).
Requiring that the term in brackets equals the flat metric $d\rho^2 + \rho^2 d\Omega^2$ in polar coordinates,
one finds the condition
with $0 \le \rho < 2$. Thus
$$ f(r) = 1 + r^2 = \frac{(1 + \rho^2/4)^2}{(1 - \rho^2/4)^2} , $$ (38.94)
and
$$ f(r)^{-1}(dr/d\rho)^2 = r^2/\rho^2 = (1 - \rho^2/4)^{-2} , $$ (38.95)
170
S. Avis, C. Isham, D. Storey, Quantum field theory in anti-de Sitter space-time, Phys. Rev. D18
(1978) 3565-3576.
so that in its full glory the anti-de Sitter metric in isotropic coordinates takes the form
$$ \begin{aligned} ds^2 &= -\frac{(1 + \rho^2/4)^2}{(1 - \rho^2/4)^2}\,dt^2 + (1 - \rho^2/4)^{-2}(d\rho^2 + \rho^2 d\Omega^2) \\ &= -\frac{(1 + \vec{x}^2/4)^2}{(1 - \vec{x}^2/4)^2}\,dt^2 + (1 - \vec{x}^2/4)^{-2}\,d\vec{x}^2 . \end{aligned} $$ (38.96)
Comparison with the standard metric on the hyperboloid in isotropic form (13.37),
namely
$$ ds^2 = (1 + k\vec{x}^2/4)^{-2}\,d\vec{x}^2 $$ (38.97)
for $k = -1$, shows that in these coordinates the slices of constant $t$ are not just conformally
flat (by construction) but actually hyperbolic maximally symmetric.
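The identities (38.94) and (38.95) can be checked numerically. The sketch below assumes the relation $r = \rho/(1 - \rho^2/4)$ between the areal radius and the isotropic radius; this relation is read off from (38.95) rather than quoted from the notes, and the helper names are hypothetical.

```python
import math

def r_of_rho(rho):
    """Areal radius r as a function of the isotropic radius rho, for 0 <= rho < 2.

    Assumed relation, inferred from r^2 / rho^2 = (1 - rho^2/4)^(-2).
    """
    return rho / (1.0 - rho ** 2 / 4.0)

def isotropic_residuals(rho, h=1e-6):
    """Return the residuals of (38.94) and (38.95) at the given rho."""
    r = r_of_rho(rho)
    f = 1.0 + r ** 2
    # (38.94): f(r) written in terms of rho
    f_iso = (1.0 + rho ** 2 / 4.0) ** 2 / (1.0 - rho ** 2 / 4.0) ** 2
    # (38.95): f^{-1} (dr/drho)^2 = (1 - rho^2/4)^(-2), derivative by central differences
    dr_drho = (r_of_rho(rho + h) - r_of_rho(rho - h)) / (2.0 * h)
    conformal_factor = (1.0 - rho ** 2 / 4.0) ** -2
    return f - f_iso, dr_drho ** 2 / f - conformal_factor

for rho in (0.1, 1.0, 1.5):
    print(rho, isotropic_residuals(rho))
```

Both residuals should vanish up to the accuracy of the finite-difference derivative.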
$$ -(z^0)^2 + (z^1)^2 + (z^2)^2 + (z^3)^2 \equiv z_\mu z^\mu = (z^4)^2 - 1 . $$ (38.98)
We see that the slices of constant $z^4$ are hyperboloids $H^3$ for $|z^4| < 1$ and de Sitter
spaces $dS_3$ for $z^4 > 1$. In the former case, coordinates adapted to this slicing are
When $z^4 > 1$, the slices of constant $z^4$ are de Sitter spaces $dS_3$, and we obtain an
analogue of the de Sitter Slicing coordinates (38.28) of de Sitter space. Corresponding
adapted coordinates are
or, with $r = \sinh\rho$,
$$ ds^2 = \frac{dr^2}{1 + r^2} + r^2 d\Omega^2_{1,2} . $$ (38.103)
We had seen in section 13 that the metrics on the three types of maximally symmetric
Riemannian spaces $\mathbb{R}^3$, $S^3$ and $H^3$ could be written collectively as
$$ \mathbb{R}^3, S^3, H^3: \quad ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega_2^2 . $$ (38.104)
Analogously, we now see from (38.28), (38.103) and the form of the Minkowski metric
(36.15) in Rindler-like coordinates, that the metrics on $\mathbb{R}^{1,3}$, $dS_4$ and $AdS_4$ can
collectively be written as
$$ \mathbb{R}^{1,3}, dS_4, AdS_4: \quad ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega^2_{1,2} , $$ (38.105)
Anti-de Sitter space also has a coordinate system in which constant radial slices are
themselves again anti-de Sitter spaces (of one dimension less, of course). This is obtained
by a simple variant of the previous construction. Namely, instead of introducing $n^\mu$ that
parametrise a $dS_3$, split the defining equation as
$$ -(z^0)^2 + (z^1)^2 + (z^2)^2 - (z^4)^2 = -1 - (z^3)^2 $$ (38.106)
and note that for fixed $z^3$ the left-hand side defines an $AdS_3$. Noting also that the
right-hand side is $\le -1$, write
$$ d\tilde{n}_\mu\, d\tilde{n}^\mu = d\tilde\Omega^2_{1,2} . $$ (38.109)
$$ ds^2 = d\rho^2 + \cosh^2\rho\; d\tilde\Omega^2_{1,2} . $$ (38.110)
38.3.7 Poincaré Coordinates
A somewhat unobvious but particularly interesting and useful way of parametrising the
solution to (38.66) is to write (a certain amount of hindsight helps)
$$ z^\mu = r x^\mu \;\; (\mu = 0, 1, 2) , \qquad z^4 - z^3 = r , \qquad z^4 + z^3 = r^{-1} + r\, x_\mu x^\mu $$ (38.111)
Even though this parametrisation looks obscure, in these coordinates the metric takes the
particularly simple and easy-to-use form
$$ \begin{aligned} ds^2 &= \frac{dr^2}{r^2} + r^2\, dx_\mu dx^\mu \\ &= z^{-2}\,(dx_\mu dx^\mu + dz^2) \qquad (r = z^{-1}) \\ &= d\rho^2 + e^{2\rho}\, dx_\mu dx^\mu \qquad (r = e^\rho) . \end{aligned} $$ (38.112)
I have (somewhat redundantly) listed explicitly these 3 closely related parametrisations
since all choices $r$, $z$ and $\rho$ are commonly found in the literature (with what I have here
called $z$ also frequently called $r$).
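That the parametrisation (38.111) indeed solves the defining equation (38.66) can be verified directly. The following sketch does this numerically, assuming unit AdS radius and the signature $(-,+,+)$ for the contraction $x_\mu x^\mu$; the function names are illustrative.

```python
import random

def embed_poincare(r, x0, x1, x2):
    """Embedding coordinates (z0, z1, z2, z3, z4) built from Poincare coordinates, cf. (38.111)."""
    xx = -x0 ** 2 + x1 ** 2 + x2 ** 2      # x_mu x^mu with signature (-,+,+)
    diff = r                               # z4 - z3
    summ = 1.0 / r + r * xx                # z4 + z3
    z4 = 0.5 * (summ + diff)
    z3 = 0.5 * (summ - diff)
    return (r * x0, r * x1, r * x2, z3, z4)

def hyperboloid(z):
    """Left-hand side of the defining equation (38.66); should equal -1 on AdS."""
    z0, z1, z2, z3, z4 = z
    return -z0 ** 2 + z1 ** 2 + z2 ** 2 + z3 ** 2 - z4 ** 2

random.seed(0)
for _ in range(5):
    point = (random.uniform(0.1, 10.0),) + tuple(random.uniform(-3.0, 3.0) for _ in range(3))
    print(point, hyperboloid(embed_poincare(*point)))
```

The cancellation works because $(z^3)^2 - (z^4)^2 = -(z^4 - z^3)(z^4 + z^3) = -1 - r^2 x_\mu x^\mu$.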
Remarks:
1. These are the AdS counterpart of the planar coordinates for dS space, and the
space-time counterpart of the upper-half-plane model of hyperbolic geometry
discussed in section 10.3, see in particular (10.73). In these coordinates, a (2+1)-dimensional
Poincaré symmetry is manifest, as well as a scaling symmetry
$(x^\mu, z) \to (\lambda x^\mu, \lambda z)$. In this coordinate system it is also completely manifest that the metric
is conformally flat, i.e. differs from the Minkowski metric only by an overall
positive factor.
2. These coordinates do not cover all of AdS, as can for instance be seen by noting
that radial null-geodesics can reach $r = 0$ (or $z = \infty$, say), at finite values of the
affine parameter: null condition and conserved energy give ($x^1$ and $x^2$ are kept
fixed)
$$ \dot{r}^2 = r^4\, \dot{t}^2 \quad\text{and}\quad r^2\, \dot{t} = E \quad\Rightarrow\quad \dot{r}^2 = E^2 . $$ (38.113)
The solution for decreasing $r$ is therefore
$$ r(\lambda) = -E(\lambda - \lambda_0) , $$ (38.114)
which reaches $r = 0$ for $\lambda = \lambda_0$. Thus lightrays exit from the Poincaré patch, i.e.
the region of the AdS space-time covered by these Poincaré coordinates, at finite
values of the affine parameter. This boundary of the Poincaré patch at $r = 0$ or
$z = \infty$ is also occasionally known as the Poincaré horizon.
4. The conformal boundary $\mathcal{I}$ resides at $r \to \infty$ or $z \to 0$. Up to an infinite conformal
factor $r^2$, the metric induced on $\mathcal{I}$ (rather, the part of $\mathcal{I}$ covered by these
coordinates) is just the Minkowski metric with line-element $dx_\mu dx^\mu$. Thus,
compared with the conformal boundary in global coordinates, the Poincaré patch
just misses the point at infinity that compactifies the spatial directions to $S^2$
(cf. the discussion in section 38.3.2).
5. Writing
$$ dx_\mu dx^\mu = -dt^2 + d\vec{x}^2 , $$ (38.117)
the relation between this Poincaré time coordinate $x^0 = t$ and the global time
coordinate $\tau$ in (38.70) is given by the (somewhat unobvious) relation
$$ \tan\tau = \frac{2t}{1 + z^2 + \vec{x}^2 - t^2} . $$ (38.118)
(a more symmetric choice would of course have been possible). Relabelling the
remaining coordinate $x^1 \to x$, the metric evidently takes the form
7. We had already seen in section 36.5, in equations (36.60), (36.61) and (36.62), and
then again in section 38.2 that the de Sitter metric could be written in such a way
that the constant time slices are maximally symmetric spatial slices with either
$k = 0$ (Planar Coordinates (38.31)), or $k = +1$ (Global Coordinates, (38.15)), or
$k = -1$ (Hyperbolic Coordinates (38.25)).
Analogously, the anti-de Sitter metric can be written in such a way that the
metrics on radial slices are maximally symmetric space-times with any sign of the
curvature, $k = 0$ Minkowski space-time in Poincaré coordinates (38.112), $k = +1$
de Sitter slices in the coordinates (38.102), or $k = -1$ anti-de Sitter slices in the
coordinates (38.110),
where
$$ f_k(\rho) = \begin{cases} \exp\rho & \text{for } k = 0 \\ \sinh\rho & \text{for } k = +1 \\ \cosh\rho & \text{for } k = -1 \end{cases} \qquad\text{and}\qquad d\Omega^2_{(k)} = \begin{cases} dx_\mu dx^\mu & \text{for } k = 0 \\ d\Omega^2_{1,2} & \text{for } k = +1 \\ d\tilde\Omega^2_{1,2} & \text{for } k = -1 \end{cases} $$ (38.122)
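The relation (38.118) between Poincaré time and global time from Remark 5 can be cross-checked against the embedding coordinates. The sketch below assumes that global time is the angle in the $(z^0, z^4)$-plane with $\tan\tau = z^0/z^4$ (consistent with (38.73), but the explicit global parametrisation is not reproduced in this section, so this is an assumption), together with $z^0 = r t$, $r = 1/z$ and (38.111).

```python
import math

def global_time_from_embedding(t, x1, x2, z):
    """Global time tau as the angle in the (z0, z4)-plane (assumed: tan(tau) = z0 / z4)."""
    r = 1.0 / z
    xx = -t ** 2 + x1 ** 2 + x2 ** 2           # x_mu x^mu
    z0 = r * t
    z4 = 0.5 * (r + 1.0 / r + r * xx)          # half the sum of (z4 - z3) and (z4 + z3)
    return math.atan2(z0, z4)

def tan_tau_formula(t, x1, x2, z):
    """Right-hand side of (38.118)."""
    return 2.0 * t / (1.0 + z ** 2 + x1 ** 2 + x2 ** 2 - t ** 2)

for point in [(0.3, 0.5, -0.2, 1.5), (2.0, 0.1, 0.4, 0.7)]:
    print(math.tan(global_time_from_embedding(*point)), tan_tau_formula(*point))
```

Under these assumptions the two columns agree, including at points where the denominator of (38.118) is negative.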
38.3.8 Plane Wave AdS Coordinates
Starting with the null-form (38.120) of the AdS metric in Poincaré coordinates and
performing the coordinate transformation
$$ (u, v, x, z) = \left(\tan U ,\; V + \tfrac{1}{2}(Z^2 + X^2)\tan U ,\; X/\cos U ,\; Z/\cos U\right) , $$ (38.123)
This is the AdS metric in plane wave coordinates or the AdS plane wave metric (the
reason for this nomenclature will be explained below). This metric can also be obtained
directly from the embedding coordinates by solving (38.66) via the parametrisation
1. First of all, note that the plane wave AdS metric (38.124) differs from the null
Poincaré metric (38.120) only by the 2nd term $-(X^2 + Z^2)\,dU^2$. In spite of this,
the global properties of this metric are very different from those of the metric
in Poincaré coordinates. In particular, unlike the Poincaré coordinates, which
only cover the Poincaré patch of the anti-de Sitter space-time, these plane wave
coordinates provide a geodesically complete / global covering of the anti-de Sitter
space-time. This can be seen
2. The fact that this global metric is so similar to the Poincaré metric is in marked
contrast to the relation between the Poincaré metric and the AdS metric in the
usual global coordinates (38.71) which appears to bear no resemblance whatsoever
to the Poincaré metric. It is also intriguing that the relation between Poincaré
time $t$ and the plane wave AdS time $U$ given in (38.123) is so much simpler than the
relation between Poincaré time and the usual global time coordinate $\tau$ (38.118),
$$ \tan U = u \quad\text{versus}\quad \tan\tau = \frac{2t}{1 + z^2 + \vec{x}^2 - t^2} . $$ (38.126)
3. This issue can still be sharpened somewhat by introducing a parameter $\omega$ into the
coordinate transformation (38.123) through
$$ (u, v, x, z) = \left(\omega^{-1}\tan\omega U ,\; V + \tfrac{1}{2}\omega\,(Z^2 + X^2)\tan\omega U ,\; X/\cos\omega U ,\; Z/\cos\omega U\right) , $$ (38.127)
leading to the 1-parameter family of metrics
4. This raises the question if, despite their dissimilarity, a 1-parameter family of
metrics can be found that interpolates between the AdS metric in Poincaré coordinates
and the usual global coordinates. This is indeed possible (and not too
hard once one knows that one should look for it). This metric can be found in the
first reference in the preceding footnote 171.
$$ d\tilde{s}^2 = -2\,dU\,dV - (X^2 + Z^2)\,dU^2 + dX^2 + dZ^2 . $$ (38.129)
Such metrics will be discussed in some detail in section 42. In this context it is
well known that indeed the coefficient matrix of $du^2$ acts as a harmonic oscillator
potential. Moreover, the existence of the coordinate transformation (38.123) from
or to Poincaré coordinates can be understood as an uplift to AdS of the coordinate
transformation that exhibits the fact that isotropic plane waves (i.e. with the same
frequencies in all directions) are conformally flat (42.43).
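The coordinate transformation (38.123) can be checked by direct substitution. The following is a sketch under the assumption that the null form (38.120) of the Poincaré metric reads $ds^2 = z^{-2}(-2\,du\,dv + dx^2 + dz^2)$; that equation is not reproduced in this section, so the sign convention for the $du\,dv$ term is an assumption (it is the one for which the cross terms cancel with the $+$ sign in (38.123)).

```latex
\begin{align*}
du &= \frac{dU}{\cos^2 U} , \qquad
dx = \frac{dX + X \tan U \, dU}{\cos U} , \qquad
dz = \frac{dZ + Z \tan U \, dU}{\cos U} , \\
dv &= dV + (X\,dX + Z\,dZ)\tan U + \frac{X^2 + Z^2}{2\cos^2 U}\, dU , \\
-2\,du\,dv + dx^2 + dz^2
  &= \frac{1}{\cos^2 U}\Bigl[-2\,dU\,dV + dX^2 + dZ^2
     + (X^2 + Z^2)\bigl(\tan^2 U - \cos^{-2} U\bigr)\,dU^2\Bigr] \\
  &= \frac{1}{\cos^2 U}\Bigl[-2\,dU\,dV - (X^2 + Z^2)\,dU^2 + dX^2 + dZ^2\Bigr] , \\
z^{-2} = \frac{\cos^2 U}{Z^2} \quad\Rightarrow\quad
ds^2 &= \frac{1}{Z^2}\Bigl[-2\,dU\,dV - (X^2 + Z^2)\,dU^2 + dX^2 + dZ^2\Bigr] .
\end{align*}
```

The cross terms proportional to $(X\,dX + Z\,dZ)\,dU$ cancel between $-2\,du\,dv$ and $dx^2 + dz^2$, and the only remnant of the transformation is the harmonic oscillator term $-(X^2 + Z^2)\,dU^2$.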
38.3.9 Codimension-2 Hyperbolic Slicing Coordinates
$$ -(z^0)^2 + (z^1)^2 + (z^2)^2 = -1 + (z^4)^2 - (z^3)^2 . $$ (38.131)
Even with the condition $(z^4)^2 - (z^3)^2 < 1$ it turns out that there are still two different
cases to consider, namely either $(z^4)^2 - (z^3)^2 < 0$ or $0 < (z^4)^2 - (z^3)^2 < 1$.
$(z^4)^2 - (z^3)^2 < 0$
In this case we solve (38.131) in terms of a radial coordinate $r$ by
Remarks:
$0 < (z^4)^2 - (z^3)^2 < 1$
This corresponds to $0 < r^2 < 1$ in the above parametrisation, and the form
(38.134) of the metric already suggests that $r$ is really a time coordinate in this
case (and what appeared as $t$ above is a spatial coordinate). Indeed, with the
parametrisation
$$ ds^2 = -(1 - T^2)^{-1}dT^2 + (1 - T^2)\,dR^2 + T^2 d\tilde\Omega_2^2 . $$ (38.139)
38.3.10 Painlevé-Gullstrand-like Coordinates?
for de Sitter space by starting with the static spherically symmetric form (38.39)
of the metric (to which we could thus apply the general $t \to t + \psi(r)$ procedure outlined
in section 26.2). We had also seen there that the PG coordinates could be thought of
as interpolating between static and planar coordinates (38.64),
Is there a counterpart of these relations for anti-de Sitter? At first sight, the answer to
this question seems to be a clear "no". Indeed, the counterpart of static coordinates for
de Sitter space are the static spherically symmetric and global coordinates
of anti-de Sitter space, in which the metric takes the standard form (38.54), with
$f(r) = 1 + r^2$. However, attempting to shift $t \to T(t, r) = t + \psi(r)$ in order to find a metric
with flat constant $T$ spatial slices, $g_{rr} = 1$, requires solving the condition (26.19), which
in the present case reads
$$ 1 - C(r)^2 \overset{!}{=} f(r) = 1 + r^2 . $$ (38.144)
This is evidently not possible, so in this strict sense there are no PG-like coordinates
for anti-de Sitter space.
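To make the obstruction explicit, here is a short worked version of the argument. It is a sketch that assumes $C(r) \equiv f(r)\,\psi'(r)$, which is how the condition (26.19) arises when one substitutes $t = T - \psi(r)$ into the standard form of the metric; the definition of $C$ is not reproduced in this section.

```latex
\begin{align*}
dt &= dT - \psi'(r)\,dr , \qquad C(r) \equiv f(r)\,\psi'(r) , \\
ds^2 &= -f\,dT^2 + 2C\,dT\,dr + f^{-1}\bigl(1 - C^2\bigr)\,dr^2 + r^2 d\Omega^2 , \\
g_{rr} = 1 \;&\Leftrightarrow\; C(r)^2 = 1 - f(r) , \\
\text{dS:}\quad f = 1 - r^2 \;&\Rightarrow\; C^2 = r^2 \qquad (\text{real solutions } C = \pm r) , \\
\text{AdS:}\quad f = 1 + r^2 \;&\Rightarrow\; C^2 = -r^2 \qquad (\text{no real solution for } r \neq 0) .
\end{align*}
```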
However, there is something analogous that one can do. Comparing the de Sitter planar
coordinates (38.142) with the anti-de Sitter metric in Poincaré coordinates (38.112),
and introducing, in analogy with (38.142), polar Milne coordinates (section 36.1),
$$ ds^2 = d\rho^2 + e^{2\rho}\,(-d\tau^2 + \tau^2 d\tilde\Omega_2^2) $$ (38.146)
one sees that, roughly speaking, (38.142) and (38.146) differ from each other by an
exchange of a radial with a time coordinate. And indeed, taking this hint seriously,
one can construct analogues of PG coordinates that are adapted to a suitable family of
spacelike geodesics, and which restrict to the flat Minkowski metric on radial slices of
constant R, with R being proper distance along this family of spacelike geodesics.
While this is a useful exercise, we can also turn the procedure around, i.e.
start with the metric (38.146) in Poincaré / Milne coordinates and perform the
coordinate transformation $T = \tau\exp\rho$ to obtain a PG-like metric (with roles of
time and radius exchanged);
find a new radial coordinate $R(\rho, T)$ through $\rho = R + \psi(T)$ such that the metric
is again diagonal, say.
The resulting metric should then be the analogue of the static spherically symmetric de
Sitter metric, and turns out to be the metric (38.139) (which is indeed the analogue of
the continuation (38.43) of the static de Sitter metric (38.141) beyond the horizon).
$$ \begin{aligned} \tau = e^{-\rho}\,T \quad\Rightarrow\quad ds^2 &= (1 - T^2)\,d\rho^2 + 2T\,dT\,d\rho + (-dT^2 + T^2 d\tilde\Omega_2^2) \\ &= d\rho^2 - (dT - T\,d\rho)^2 + T^2 d\tilde\Omega_2^2 . \end{aligned} $$ (38.147)
In particular, in this PG-like metric the metric on slices of constant $\rho$ is exactly the
Minkowski metric (in Milne coordinates).
$$ \rho = R + \psi(T) . $$ (38.148)
Choosing $\psi(T)$ to satisfy
(note the analogy with (38.61)) one finds precisely the anti-de Sitter metric in the form
(38.139),
$$ ds^2 = -(1 - T^2)^{-1}dT^2 + (1 - T^2)\,dR^2 + T^2 d\tilde\Omega_2^2 . $$ (38.150)
Thus the PG-like AdS metric (38.147) interpolates between the Poincaré (planar) metric
(38.146) and the metric (38.150).
To see the relation between this PG-like metric and spatial geodesics, we observe the
following:
$$ -(1 - T^2)^{-1}(T')^2 + (1 - T^2)(R')^2 = +1 $$ (38.151)
$$ P = (1 - T^2)\,R' \quad\Rightarrow\quad P^2 - (T')^2 = 1 - T^2 , $$ (38.152)
3. The coordinate $\rho$ of the PG-like metric (38.147) is precisely the proper distance
along the spacelike radial geodesics with $P = 1$ and $T' = +T$,
Clearly, there are many more possibilities, but this shall suffice. It should be clear from
the above examples how to construct other coordinate systems for AdS adapted to one's
needs.172
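The diagonalisation step can be spelled out as follows. This is a sketch of one way to carry it out, starting from the PG-like form $ds^2 = (1 - T^2)\,d\rho^2 + 2T\,dT\,d\rho - dT^2 + T^2 d\tilde\Omega_2^2$ of (38.147); the condition on $\psi$ is derived here rather than quoted, and an additive integration constant in $\psi$ is dropped.

```latex
\begin{align*}
\rho = R + \psi(T) \;&\Rightarrow\; d\rho = dR + \psi'(T)\,dT , \\
ds^2 &= (1 - T^2)\,dR^2 + 2\bigl[(1 - T^2)\psi' + T\bigr]\,dR\,dT
        + \bigl[(1 - T^2)\psi'^2 + 2T\psi' - 1\bigr]\,dT^2 + T^2 d\tilde\Omega_2^2 , \\
\text{no cross term} \;&\Leftrightarrow\; \psi'(T) = -\frac{T}{1 - T^2}
   \;\Rightarrow\; \psi(T) = \tfrac{1}{2}\ln(1 - T^2) \qquad (|T| < 1) , \\
g_{TT} &= \frac{T^2}{1 - T^2} - \frac{2T^2}{1 - T^2} - 1 = -\frac{1}{1 - T^2} .
\end{align*}
```

With this choice the metric takes the diagonal form $-(1 - T^2)^{-1}dT^2 + (1 - T^2)\,dR^2 + T^2 d\tilde\Omega_2^2$.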
We have seen in the previous sections that the metrics of maximally symmetric space-
times can frequently be written in a way which exhibits their slicing by lower-dimensional
maximally symmetric spaces or space-times.
Typically, in the codimension-1 case these metrics have the general form
$$ ds_K^2 = \epsilon\, d\sigma^2 + f(\sigma)^2\, ds_k^2 , $$ (38.154)
where
172
And for more erudite and advanced investigations of the AdS geometry see the
1998 lectures by G. Gibbons, Anti-de-Sitter spacetime and its uses, now available as
arXiv:1110.1206 [hep-th], and I. Bengtsson, Anti-de Sitter Space, 1998 Lecture Notes, available as
www.fysik.su.se/~ingemar/Kurs.pdf.
$ds_K^2$ is the line-element of a maximally symmetric space(-time) with constant
curvature $K$
For $\epsilon = +1$, thus $\sigma = r$ a radial coordinate, and $f(r) = r$, one obtains what is known as
the metric on the cone over the space(-time) with line-element $ds_k^2$, in general given by
$$ g_{ij}(x)\,dx^i dx^j \quad\Rightarrow\quad \text{Cone Metric:} \quad ds^2 = dr^2 + r^2 g_{ij}\,dx^i dx^j . $$ (38.155)
A familiar example is the Euclidean metric on $\mathbb{R}^{n+1}$, which can be written in polar
coordinates as the cone metric over $S^n$,
so in this case the cone has actually been flattened out to $\mathbb{R}^{n+1}$. However, if one were
to replace $S^n$ by a less symmetric space, there would be a (conical) singularity at the
tip $r = 0$ of the cone.
of cosmology.
I will refer to the general class of metrics in (38.154) as (generalised) spacelike or timelike
cone metrics. They can also be considered as special cases of so-called warped product
metrics, which are metrics of the form
$$ M = B \times_f F $$ (38.159)
of spaces or space-times, with $h_{ab}$ a metric on the base $B$, $g_{ij}$ a metric on the fibre $F$, the
$\times_f$ indicating that $M$ does not carry the direct product metric but that the metric on
the fibre $F$ is twisted or warped by the function $f(y)$ on the base $B$. It is a reasonably
elementary exercise to work out the Riemann curvature tensor on $M$ in terms of the
curvature tensors of the metrics $h_{ab}$ and $g_{ij}$, and the function $f$ and its derivatives, but
we will not consider the issue in this generality.
Rather, returning to the issue of writing maximally symmetric metrics in the form
(38.154), we now want to address the question what determines in general what choice
of $\epsilon$, $k$, $f(\sigma)$ is required or possible to realise a maximally symmetric space(-time) with
a given $K$, say (or any variation of this question).
A quick way to answer this question is to make use of the fact that maximally symmetric
spaces are characterised by the property of having constant curvature, in the sense of
(13.10)
$$ R_{\mu\nu\lambda\rho} = k\,(g_{\mu\lambda}g_{\nu\rho} - g_{\mu\rho}g_{\nu\lambda}) . $$ (38.160)
Using this result for the Riemann curvature tensor of $ds_k^2$, it is straightforward to calculate
the curvature tensor of the generalised cone metric $ds_K^2$. This is just a minor
generalisation of the calculation of the Riemann tensor of the Robertson-Walker metric
(the $\epsilon = -1$ version of (38.154)) performed in section 34.1. Requiring that the cone
metric with line-element $ds_K^2$ be maximally symmetric, i.e. that its curvature tensor also
has this form, one finds the constraint
$$ \text{(K1)}: \quad k = \epsilon\,(f')^2 + K f^2 $$ (38.162)
A special (and especially boring) case that we will take care of (and dismiss) first is
$f' = 0$, i.e. $f$ constant, so that (38.154) describes a direct product metric. Then one
finds $K = k = 0$, and this just corresponds to the possibilities
We now concentrate on $f' \neq 0$. Then (K1) implies (K2) (by differentiation) and conversely
(K2) implies (K1), with $k$ arising as an integration constant (in fact, (K1) is the
energy/Hamiltonian corresponding to the equation of motion (K2)).
$$ f'' + K f = 0 \quad\text{and}\quad k = (f')^2 + K f^2 . $$ (38.165)
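The relation between the two conditions can be made explicit in one line. This is a sketch that assumes (K2) is the second-order equation $f'' + \epsilon K f = 0$, which is the form in which it appears for $\epsilon = \pm 1$ in (38.165) and (38.175); (K1) is taken to be $k = \epsilon (f')^2 + K f^2$.

```latex
\begin{align*}
\frac{d}{d\sigma}\Bigl(\epsilon\,(f')^2 + K f^2\Bigr) &= 2 f'\bigl(\epsilon f'' + K f\bigr) , \\
k = \text{const} ,\; f' \neq 0 \;&\Rightarrow\; \epsilon f'' + K f = 0
   \;\Leftrightarrow\; f'' + \epsilon K f = 0 \qquad (\epsilon^2 = 1) , \\
\epsilon = +1: \;\; f'' + K f &= 0 , \qquad\qquad \epsilon = -1: \;\; f'' - K f = 0 .
\end{align*}
```

Conversely, if $f$ solves the second-order equation, the right-hand side of the first line vanishes, so $\epsilon (f')^2 + K f^2$ is a constant of motion, which is then identified with $k$.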
1. $K = 0$:
The solution is
$$ f = ar + b \quad\Rightarrow\quad k = a^2 \ge 0 $$ (38.166)
for constants $a$ and $b$, and dismissing the case of constant $f$ we have already
dealt with, we can without loss of generality choose $f(r) = r$, so that we are
dealing with a standard cone metric. Then $k = +1$ and we find either the
usual polar coordinate decomposition (38.156) of the Euclidean metric, or its
Lorentzian signature counterpart
and in this case there are essentially 3 distinct choices of $a$ and $b$, leading to
the 3 different possible values of $k$.
(a) $k = 0$: This arises for $a = 0$ or $b = 0$, thus $f(\sigma) = \exp\pm\sigma$, leading to
(b) $k = +1$: This arises for $a = 1/2$, $b = -1/2$ (say), thus $f(\sigma) = \sinh\sigma$,
leading to
$$ \begin{aligned} d\tilde\Omega^2_{n+1} &= d\sigma^2 + \sinh^2\sigma\; d\Omega_n^2 \\ d\tilde\Omega^2_{1,n} &= d\sigma^2 + \sinh^2\sigma\; d\Omega^2_{1,n-1} \end{aligned} $$ (38.173)
which is the standard form (13.33) of the metric on $H^{n+1}$ and the AdS
metric in de Sitter slicing coordinates (38.102) respectively.
(c) $k = -1$: This arises for $a = b = 1/2$. Thus $f(\sigma) = \cosh\sigma$, and this gives
rise to
$$ \begin{aligned} d\tilde\Omega^2_{n+1} &= d\sigma^2 + \cosh^2\sigma\; d\tilde\Omega_n^2 \\ d\tilde\Omega^2_{1,n} &= d\sigma^2 + \cosh^2\sigma\; d\tilde\Omega^2_{1,n-1} \end{aligned} $$ (38.174)
The former is the hyperbolic analogue of the nested coordinates for the
sphere, the latter the AdS metric in AdS slicing coordinates (38.110).
$\epsilon = -1$: Timelike Cones
In this case, $\sigma$ is a time coordinate (which we will call $t$), and in order for $ds_K^2$ to
have Lorentzian signature (and not two time directions), the metric to be warped
(with line element $ds_k^2$) necessarily has Euclidean signature, so this reduces the
number of possibilities somewhat compared to the case $\epsilon = +1$.
The equations governing this case are
$$ f'' - K f = 0 \quad\text{and}\quad k = -(f')^2 + K f^2 . $$ (38.175)
1. $K = 0$:
The solution is
$$ f = at + b \quad\Rightarrow\quad k = -a^2 \le 0 $$ (38.176)
and discarding the case of constant $f$ we are left with $f(t) = t$ and $k = -1$.
The corresponding metric
$$ d\vec{x}^2_{1,n} = -dt^2 + t^2\, d\tilde\Omega_n^2 $$ (38.177)
(b) $k = +1$: This arises for $a = b = 1/2$, say, so the solution is $f(t) = \cosh t$,
leading to the de Sitter metric in global coordinates (38.15),
$$ d\Omega^2_{1,n} = -dt^2 + \cosh^2 t\; d\Omega_n^2 $$
(c) $k = -1$: This arises for $a = -b = 1/2$, say, so the solution is $f(t) = \sinh t$,
leading to the de Sitter metric in hyperbolic coordinates (38.25),
$$ d\Omega^2_{1,n} = -dt^2 + \sinh^2 t\; d\tilde\Omega_n^2 $$ (38.181)
We thus see that we have been able to reproduce many of the metrics found in sections
38.2 and 38.3 from this more general perspective, perhaps shedding some light on the zoo
of coordinate systems found there. In particular, we have seen how the conditions (K1)
(38.162) and (K2) (38.163) correlate the choice of curvatures $k$ and $K$, the signature $\epsilon$
of the cone direction, and the choice of warping function $f$.
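The correlation between $\epsilon$, $K$, $k$ and the warping function can be tabulated and checked numerically. The sketch below evaluates the right-hand side of (K1), read as $k = \epsilon (f')^2 + K f^2$, for the warping functions named in the text; the table layout and function names are illustrative, and the last entry ($f = \sinh t$ with $k = -1$) is inferred from the form of (38.181).

```python
import math

# Each entry: (epsilon, K, label, f, f', value of k expected from the text).
CASES = [
    (+1,  0, "f = r (cone)",    lambda s: s, lambda s: 1.0, +1),
    (+1, -1, "f = exp(sigma)",  math.exp,    math.exp,       0),
    (+1, -1, "f = sinh(sigma)", math.sinh,   math.cosh,     +1),
    (+1, -1, "f = cosh(sigma)", math.cosh,   math.sinh,     -1),
    (-1,  0, "f = t (Milne)",   lambda s: s, lambda s: 1.0, -1),
    (-1, +1, "f = cosh(t)",     math.cosh,   math.sinh,     +1),
    (-1, +1, "f = sinh(t)",     math.sinh,   math.cosh,     -1),
]

def k_from_K1(eps, K, f, df, s):
    """Right-hand side of (K1): k = eps * f'(s)^2 + K * f(s)^2."""
    return eps * df(s) ** 2 + K * f(s) ** 2

def max_deviation(samples=(0.3, 1.1, 2.0)):
    """Largest |k_from_K1 - k_expected| over all cases and sample points."""
    return max(
        abs(k_from_K1(eps, K, f, df, s) - k)
        for eps, K, _, f, df, k in CASES
        for s in samples
    )

for eps, K, label, f, df, k in CASES:
    print(f"eps={eps:+d} K={K:+d} {label:16s} k={k_from_K1(eps, K, f, df, 1.0):+.6f}")
```

The value of $k$ comes out independent of the sample point, as it must for a constant of motion.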
39 Vaidya Metrics I: Bondi Gauge and Radiation Fields
To set the stage, we introduce the Vaidya metrics and list some of their basic properties.
$$ \eta_{\alpha\beta}\,dx^\alpha dx^\beta = -dv^2 + 2\,dv\,dr + r^2 d\Omega^2 , $$ (39.4)
and
$$ \eta^{\alpha\beta}\,\partial_\alpha v\,\partial_\beta v = \eta^{vv} = 0 . $$ (39.5)
This can also be made more explicit in terms of the Kerr-Schild (or Eddington)
time coordinate $\bar{t}$ defined by $v = \bar{t} + r$, in terms of which the metric
takes the form
$$ ds^2 = -d\bar{t}^2 + dr^2 + r^2 d\Omega^2 + \frac{2m}{r}\, d(\bar{t} + r)^2 . $$ (39.6)
For the Schwarzschild metric, $\bar{t}$ is simply related to the Schwarzschild coordinate
$t$ and the tortoise coordinate $r^*$ by $v = \bar{t} + r = t + r^*$.
(c) The curvature tensor simplifies accordingly, and the only non-vanishing component
of the Einstein tensor $G_{\alpha\beta}$ is $G_{vv}$, with
$$ G_{vv} = \frac{2m'(v)}{r^2} . $$ (39.7)
Equivalently, the only non-vanishing component of $G^\alpha_{\;\beta}$ is
$$ G^r_{\;v} = \frac{2m'(v)}{r^2} . $$ (39.8)
Thus (39.2) solves the Einstein equations for an energy-momentum tensor of
the form
$$ T_{\alpha\beta} = \frac{m'(v)}{4\pi G_N r^2}\,\partial_\alpha v\,\partial_\beta v . $$ (39.9)
(d) By construction, this energy-momentum tensor is covariantly conserved,
$\nabla_\alpha T^{\alpha\beta} = 0$. Explicitly, the only non-trivial component of this equation is
$$ \nabla_\alpha T^\alpha_{\;v} = 0 \quad\Leftrightarrow\quad r^{-2}\partial_r(r^2 T_{vv}) = \partial_r T_{vv} + \frac{2}{r}\,T_{vv} = 0 , $$ (39.10)
which is evidently satisfied for any energy-momentum tensor $\propto r^{-2}$.
(e) If $T_{\alpha\beta}$ is to satisfy some reasonable energy-condition (cf. section 21.1) like
the null energy condition (NEC), which requires positivity of $T_{vv}$, one needs
a non-decreasing mass, $m'(v) \ge 0$. The ingoing Vaidya metric thus describes
the metric of a star or black hole with infalling null dust or incoherent radiation.
(f) The Kretschmann scalar (7.51) is
$$ K \equiv R_{\alpha\beta\gamma\delta}R^{\alpha\beta\gamma\delta} = 48\, m(v)^2/r^6 , $$ (39.11)
(a) It reduces to the Schwarzschild metric in outgoing Eddington-Finkelstein
coordinates for constant mass function $m(u) = m_0$.
(b) The only non-vanishing component of the Einstein tensor is $G_{uu}$, with
$$ G_{uu} = -\frac{2m'(u)}{r^2} . $$ (39.13)
Thus (39.12) solves the Einstein equations for an energy-momentum tensor of the form
$$ T_{\alpha\beta} = -\frac{m'(u)}{4\pi G_N r^2}\,\partial_\alpha u\,\partial_\beta u . $$ (39.14)
Again this is conserved due to the $r^{-2}$-behaviour,
$$ \partial_r T_{uu} + \frac{2}{r}\,T_{uu} = 0 . $$ (39.15)
(c) If $T_{\alpha\beta}$ is to satisfy some reasonable energy-condition (like positivity of $T_{uu}$),
one needs a non-increasing mass, $m'(u) \le 0$. The outgoing Vaidya metric
thus describes the metric of a radiating star (or, possibly, of an evaporating
black hole).
A generalisation of the Vaidya metric can be obtained by allowing the mass function to
be an arbitrary function of the coordinates $(v, r)$ or $(u, r)$:
(b) the Vaidya-Reissner-Nordström metrics with
$$ f(v, r) = 1 - \frac{2m(v)}{r} + \frac{q^2}{r^2} , $$ (39.19)
which solve the Einstein equations for an energy-momentum tensor that is
the sum of the Vaidya energy-momentum tensor (ingoing null dust) and
the electrostatic Maxwell energy-momentum tensor for a point charge with
constant charge $q$;
(c) the Vaidya-Bonnor metrics with
$$ f(v, r) = 1 - \frac{2m(v)}{r} + \frac{q(v)^2}{r^2} , $$ (39.20)
which further generalise this to allow for an injection of charge in addition
to mass into the star or black hole.173
The interpretation of the mass function $m(v)$ (or $m(u)$) of the Vaidya metric is brought
out most clearly by noting that, as in (23.72),
$$ g^{rr} = 1 - \frac{2m(v)}{r} \quad\text{or}\quad g^{rr} = 1 - \frac{2m(u)}{r} , $$ (39.21)
so that $m(v)$ (or $m(u)$) is the invariantly defined Misner-Sharp mass (23.75) for spherical
symmetry, measuring the amount of mass enclosed by the 2-sphere of constant $v$ (or $u$)
and $r$,
$$ M_{MS}(v, r) = m(v) , \qquad M_{MS}(u, r) = m(u) . $$ (39.22)
$$ \bar{t} = v - r \quad\text{or}\quad \bar{t} = u + r $$ (39.23)
$$ v = \bar{t} + r = t + r^* \quad\text{or}\quad u = \bar{t} - r = t - r^* $$ (39.24)
173 For a study of gravitational collapse respectively black hole evaporation in this setting see e.g. K.
Lake, T. Zannias, Structure of singularities in the spherical gravitational collapse of a null fluid, Phys.
Rev. D43 (1991) 1798-1802, and M. Parikh, F. Wilczek, Global Structure of Evaporating Black Holes,
arXiv:gr-qc/9807031.
174 For some other applications and appearances of the generalised Vaidya metric, see e.g. V. Husain,
Exact solutions for null fluid collapse, arXiv:gr-qc/9511011, and A. Wang, Y. Wu, Generalized Vaidya
Solutions, arXiv:gr-qc/9803038.
for the Schwarzschild metric introduced in (26.106) and (26.117)), the mass function
$m = m(\bar{t} \pm r)$ acquires the interpretation of the amount of mass enclosed by a 2-sphere
of constant $\bar{t}$ and $r$. Considering a fixed time-slice $\bar{t} = t_0$ and taking $r \to \infty$ requires
taking $v \to \infty$ respectively $u \to -\infty$. In the limit $r \to \infty$, the Misner-Sharp mass
reduces to the ADM mass or energy (cf. sections 22.4 and 23.8) so that
(thus for physically meaningful space-times the mass function $m$ should be bounded as
$v \to \infty$ or $u \to -\infty$).
That this ADM mass limit of the mass function is indeed a conserved quantity (independent
of the chosen time $t_0$) can be understood from the observation that at any given
time a spacelike hypersurface will intersect all the constant $v$ null worldlines along which
null matter flows into the star or black hole. This is identical to the total mass of the
black hole as $v \to \infty$. Likewise for the outgoing Vaidya metric a spatial slice extending
to infinity will intercept all the outgoing null lines of constant $u$ along which null matter
escapes from the star, but the total energy (given by the initial mass $m(u = -\infty)$)
will be conserved. At any finite $r$, thus finite $u$ or $v$, the mass function can then be
interpreted as the enclosed mass in a sphere of radius $r$ at the time $t_0$.
If, instead of going to spatial infinity one goes to null infinity, instead of the ADM mass
one has the so-called Bondi-Sachs mass $M_{BS}(u)$ at one's disposal (with $u$ thought of as
a coordinate at future null infinity $\mathcal{I}^+$ labelling the outgoing null geodesics of constant
$u$). In particular, now keeping $u$ fixed and taking $r \to \infty$ one finds that $m(u)$ agrees
with the Bondi-Sachs mass at future null infinity,
It keeps track of the mass decrease through the amount of radiation that escapes to
infinity as the mass $m(u)$ decreases from its initial value $m(u = -\infty)$. This can be
seen by writing the expression (39.14) for the energy-momentum tensor in terms of the
(outgoing null) energy density $\rho_{out}$ as
$$ \rho_{out} = -\frac{m'(u)}{4\pi G_N r^2} , $$ (39.27)
so that the total flux
$$ F = 4\pi r^2 \rho_{out} $$ (39.28)
39.3 Einstein Equations in the Bondi Gauge (Radiative Coordinates)
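All of the above are (of course) special cases of the general spherically symmetric metric,
which can be conveniently parametrised in terms of 2 arbitrary functions $f(w,r)$ and
$h(w,r)$ as
$$ds^2 = -e^{2h(w,r)} f(w,r)\, dw^2 + 2\epsilon\, e^{h(w,r)}\, dw\, dr + r^2 d\Omega^2 , \qquad \epsilon = \pm 1 , \qquad (39.30)$$
with
$$f(w,r) = 1 - \frac{2m(w,r)}{r} . \qquad (39.31)$$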
This is the general spherically symmetric metric written in radiative coordinates, or in
the so-called Bondi gauge, the retarded / advanced Eddington-Finkelstein-like counterpart
of the Schwarzschild-Birkhoff ansatz (23.61)
$$ds^2 = -e^{2h_s(t,r)} f_s(t,r)\, dt^2 + f_s(t,r)^{-1}\, dr^2 + r^2 d\Omega^2 \qquad (39.32)$$
with
$$f_s(t,r) = 1 - \frac{2m_s(t,r)}{r} \qquad (39.33)$$
(the subscript $s$ on the functions indicating that these are a priori not the same functions
as those appearing in the ansatz (39.30)).
Remarks:
1. The Bondi gauge is adapted to radial null geodesics in the sense that $w$ = const.
are radial null rays. For $f(w,r) > 0 \Leftrightarrow r > 2m(w,r)$, $r$ decreases along future-directed
null rays with $w \equiv v$ constant for $\epsilon = +1$ (ingoing coordinates), and $r$
increases along future-directed null rays with $w \equiv u$ constant for $\epsilon = -1$ (outgoing
coordinates).
2. Within the Bondi gauge, there is still the freedom of reparametrising w: any
change of variables of the form
w w(w)
: = e h(w) dw
dw (39.34)
to affine transformations. If furthermore the metric is asymptotically flat, the
normalisation can be fixed by requiring e.g. that the norm of the Killing vector
$\|\partial_t\|^2 \to -1$ asymptotically, and this then only leaves the (unavoidable because
of time-translation invariance) ambiguity $t \to t + t_0$ for some constant $t_0$.
4. Starting from the metric in the Bondi gauge (39.30), one can go to the Schwarzschild
gauge (39.32) by introducing $w = w(t,r)$ through
$$e^{h(w,r)}\, dw = e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr \qquad (39.36)$$
where $h(w,r)$ is the required integrating factor. Then (39.30) takes the form
(39.32) with $f_s(t,r) = f(w,r)$, or more explicitly
$$f_s(t,r) = 1 - \frac{2m_s(t,r)}{r} = f(w(t,r),r) \quad\Leftrightarrow\quad m_s(t,r) = m(w(t,r),r) . \qquad (39.37)$$
We will look at this in somewhat more detail in section 39.5.
If $h(w,r) = 0$ (or depends only on $w$) and $f$ depends only on $r$, one can choose
$h_s(t,r) = 0$ and (39.36) reduces to the standard relation
$$dw = dt + \epsilon f(r)^{-1}\, dr = dt + \epsilon\, dr^* \qquad (39.38)$$
between advanced or retarded and tortoise coordinates of a static black hole metric.
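As a quick consistency check of this change of gauge, one can substitute (39.36) directly into the Bondi-gauge line element. The sketch below assumes that (39.30) and (39.32) have the forms written in its first and last lines (with $\epsilon^2 = 1$ and $f_s = f$); it is meant as an illustration, not as a replacement for the discussion in section 39.5.

```latex
\begin{align*}
ds^2 &= -f\,(e^{h}dw)^2 + 2\epsilon\,(e^{h}dw)\,dr + r^2 d\Omega^2 ,
\qquad e^{h}dw = e^{h_s}dt + \epsilon f^{-1}dr \\
-f\,(e^{h}dw)^2 &= -e^{2h_s} f\,dt^2 - 2\epsilon\,e^{h_s}\,dt\,dr - f^{-1}dr^2 \\
2\epsilon\,(e^{h}dw)\,dr &= +2\epsilon\,e^{h_s}\,dt\,dr + 2f^{-1}dr^2 \\
\Rightarrow\quad ds^2 &= -e^{2h_s} f\,dt^2 + f^{-1}dr^2 + r^2 d\Omega^2
\end{align*}
```

The cross terms $dt\,dr$ cancel identically, which is why the Schwarzschild gauge is diagonal while the Bondi gauge is not.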
$$m(z) = \frac{r(z)}{2}\left(1 - g^{ab}(z)\,\partial_a r(z)\,\partial_b r(z)\right) . \qquad (39.40)$$
The observation above that in the transformation from the Bondi to the Schwarz-
schild gauge one has ms (t, r) = m(w, r) (39.37) is a particular manifestation of the
fact that m(z) is a scalar under coordinate transformations preserving the form
(39.39) of the metric.
As we saw in section 23.6, solving the vacuum Einstein equations in the standard spherically
symmetric gauge (39.32), one recovers Birkhoff's theorem and the Schwarzschild
metric in the standard Schwarzschild coordinates. However, at least with the benefit of
hindsight, it is clear that a better gauge may be one which is adapted to radial lightrays
rather than to static observers. Such an ansatz is provided by the general spherically
symmetric metric written in the Bondi gauge (39.30).
For both signs, the independent $(w,r)$-components of the Einstein tensor take the simple
form (the counterpart of (23.64))
$$G^w{}_w = -\frac{2\,\partial_r m(w,r)}{r^2} , \qquad G^r{}_w = +\frac{2\,\partial_w m(w,r)}{r^2} , \qquad G_{rr} = +\frac{2\,\partial_r h(w,r)}{r} . \qquad (39.41)$$
From these one can then also deduce the missing components, such as
One sees that in the chosen (Bondi gauge) parametrisation, the components in (39.41)
are the simplest complete set of independent components and building blocks of the
Einstein tensor, and therefore a particularly convenient starting point for analysing and
solving the Einstein equations. The angular components $G^\theta{}_\theta = G^\phi{}_\phi$ (that they are equal
and that $G^\theta{}_\phi = 0$ is implied by spherical symmetry) are more involved, but are often
not needed in practice, as they can be substituted by the Bianchi identities.
where the new coordinate $\bar w$ labelling radial ingoing or outgoing null geodesics is defined by the
simple change of variables
$$d\bar w = e^{h(w)}\, dw , \qquad (39.44)$$
2. Characterisation of Vaidya Metrics
Starting from the general form (39.30) of the metric and its Einstein tensor (39.41),
one can now deduce that the most general metric describing a purely ingoing
($\epsilon = +1$) or outgoing ($\epsilon = -1$) matter content, with $T_{ww}$ the only non-vanishing
component of the energy-momentum tensor, is the ingoing or outgoing Vaidya metric. Indeed,
from this assumption one deduces
$$G_{rr} = 0 \;\Rightarrow\; \partial_r h(w,r) = 0 , \qquad G^w{}_w = 0 \;\Rightarrow\; \partial_r m(w,r) = 0 \qquad (39.45)$$
so that the metric can be put into the standard Vaidya form
$$ds^2 = -\left(1 - \frac{2m(w)}{r}\right) dw^2 + 2\epsilon\, dw\, dr + r^2 d\Omega^2 \qquad (39.46)$$
by the same redefinition (39.44) of the null coordinate as in the vacuum case.
While we have seen above that Vaidya metrics are characterised by the fact that they
have an energy-momentum tensor $T_{\alpha\beta} \sim \partial_\alpha w\, \partial_\beta w$, it is useful to rephrase this somewhat,
and we will do this in an elementary fashion in this section. This can also be formulated
in a geometrically somewhat more satisfactory (because less coordinate-dependent)
form, in terms of geodesic congruences, and we will study these (for these and other
reasons) in some detail later on.
T = F F + 14 g F F (39.47)
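$$T_{\alpha\beta} = (\rho + p)\, u_\alpha u_\beta + p\, g_{\alpha\beta} , \qquad u^\alpha u_\alpha = -1 , \qquad (39.48)$$
one also has what is known as a pure radiation field or null dust, characterised by an
energy-momentum tensor of the form
$$T_{\alpha\beta} = \rho\, k_\alpha k_\beta , \qquad k^\alpha k_\alpha = 0 . \qquad (39.49)$$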
Remarks:
1. More generally, such an energy-momentum tensor could contain a sum over different
species of massless particles moving at the speed of light,
$$T_{\alpha\beta} = \sum_a \rho_{(a)}\, k^{(a)}_\alpha k^{(a)}_\beta , \qquad k^\alpha_{(a)} k^{(a)}_\alpha = 0 . \qquad (39.50)$$
In particular, such an energy-momentum tensor is traceless,
$$T^\alpha{}_\alpha = \sum_a \rho_{(a)}\, k^\alpha_{(a)} k^{(a)}_\alpha = 0 . \qquad (39.51)$$
2. Such an energy-momentum tensor can arise e.g. from null Maxwell fields or from
massless scalar fields (in some geometric optics / eikonal approximation). For
example a spherical outgoing scalar wave of the form $\phi(u,r) = \phi(u)/r$ with $u = t - r$
in Minkowski space gives rise to such an energy-momentum tensor when
terms like $\phi'(u)/r$ ($u$-derivatives) dominate over terms like $\phi(u)/r^2$ ($r$-derivatives),
leading to
$$T_{\alpha\beta} \approx \frac{\phi'(u)^2}{r^2}\, \partial_\alpha u\, \partial_\beta u . \qquad (39.52)$$
3. The absolute normalisation of the null energy density $\rho$ is not determined by this
form of the energy-momentum tensor since it scales under a space-time dependent
boost of $k^\alpha$,
$$k^\alpha \to e^{\lambda(x)}\, k^\alpha \quad\Rightarrow\quad \rho(x) \to e^{-2\lambda(x)}\, \rho(x) . \qquad (39.53)$$
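We now consider again a general spherically symmetric space-time and denote the tangent
vectors to an ingoing (respectively outgoing) congruence of (not necessarily affinely
parametrised) radial future oriented null geodesics by $n$ (respectively $\ell$), with
$n^2 = \ell^2 = 0$ and $n \cdot \ell = -1$.
In the Bondi gauge (39.30), a natural (albeit asymmetric) choice for $\ell$ and $n$ is
$$\epsilon = +1 : \quad n = -\partial_r , \quad \ell = e^{-h}\partial_v + \tfrac12 f\, \partial_r$$
$$\epsilon = -1 : \quad \ell = +\partial_r , \quad n = e^{-h}\partial_u - \tfrac12 f\, \partial_r . \qquad (39.56)$$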
Examples:
$$T_{\alpha\beta} = \rho_{in}\, n_\alpha n_\beta . \qquad (39.57)$$
The general solution of the Einstein equations is given by the ingoing Vaidya
metric (39.2). For the ingoing Vaidya metric one can choose
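with covariant components
$$n_\alpha = -\partial_\alpha v , \qquad \ell_v = -\tfrac12 f , \qquad \ell_r = 1 . \qquad (39.59)$$
In these adapted coordinates (adapted to ingoing null geodesics) one has (suppressing
the two angular dimensions)
$$n_\alpha n_\beta = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} . \qquad (39.60)$$
Therefore the energy-momentum tensor (39.9) of the ingoing Vaidya metric indeed
has the purely ingoing form
$$T_{\alpha\beta} = \frac{m'(v)}{4\pi G_N r^2}\, n_\alpha n_\beta . \qquad (39.61)$$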
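2. We can also easily deduce this specific form of the energy-momentum tensor from
the geometry. Writing a general ansatz for the energy momentum tensor of the
ingoing Vaidya metric as
$$T^{\alpha\beta} = \rho(v,r)\, n^\alpha n^\beta , \qquad (39.62)$$
the covariant divergence of the energy-momentum tensor is, using the general
formulae
$$n^\alpha \nabla_\alpha n^\beta = \kappa_n n^\beta , \qquad \nabla_\alpha n^\alpha = \theta_n + \kappa_n \qquad (39.63)$$
from section 11.5 for the inaffinity and expansion of a null congruence,
$$\nabla_\alpha T^{\alpha\beta} = (n^\alpha \partial_\alpha \rho + \rho \nabla_\alpha n^\alpha)\, n^\beta + \rho\, n^\alpha \nabla_\alpha n^\beta = \left(n^\alpha \partial_\alpha \rho + \rho(\theta_n + 2\kappa_n)\right) n^\beta . \qquad (39.64)$$
In the case at hand, with $n = -\partial_r$, we have $\kappa_n = 0$ ($r$ is an affine parameter along
ingoing radial null geodesics for the ingoing Vaidya metric), and in section 31.8
we determined the ingoing expansion (contraction) to be $\theta_n = -2/r$ (exactly as
in Minkowski space).
Conservation of the energy-momentum tensor thus requires
$$\nabla_\alpha T^{\alpha\beta} = 0 \;\Rightarrow\; \partial_r \rho(v,r) + 2\rho(v,r)/r = 0 \;\Rightarrow\; \rho(v,r) = \frac{\rho(v)}{r^2} , \qquad (39.65)$$
which is precisely the general form of the Vaidya energy-momentum tensor.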
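The integration step in (39.65) can be spelled out in one line. The sketch below only uses the radial equation from (39.65); the identification of the integration function with $m'(v)$ in the last line is made by comparison with (39.61) and is not an independent result.

```latex
\begin{align*}
\partial_r \rho + \frac{2}{r}\,\rho = \frac{1}{r^2}\,\partial_r\!\left(r^2 \rho\right) = 0
\quad&\Rightarrow\quad r^2 \rho(v,r) = \rho(v) \\
\text{comparison with (39.61):}\quad & \rho(v) = \frac{m'(v)}{4\pi G_N}
\end{align*}
```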
$$T_{\alpha\beta} = \rho_{out}\, \ell_\alpha \ell_\beta . \qquad (39.66)$$
The general solution of the Einstein equations is given by the outgoing Vaidya
metric (39.12). For the outgoing Vaidya metric one can choose
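with covariant components
$$\ell_\alpha = -\partial_\alpha u , \qquad n_u = -\tfrac12 f , \qquad n_r = -1 , \qquad (39.68)$$
and the energy-momentum tensor (39.14) of the outgoing Vaidya metric indeed
has the purely outgoing form
$$T_{\alpha\beta} = -\frac{m'(u)}{4\pi G_N r^2}\, \ell_\alpha \ell_\beta . \qquad (39.70)$$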
This makes it even more manifest that the ingoing Vaidya metric describes purely
ingoing null matter (and the outgoing Vaidya metric purely outgoing null matter).
Likewise, the generalised ingoing Vaidya metrics (39.16) cannot describe purely outgoing
matter (and vice-versa), as can be seen from the Einstein tensor (39.17): outgoing
matter should have an energy-momentum tensor proportional to $\ell_\alpha \ell_\beta$ which, in ingoing
coordinates, has the form
$$\ell_\alpha \ell_\beta = \begin{pmatrix} \tfrac14 f^2 & -\tfrac12 f \\ -\tfrac12 f & 1 \end{pmatrix} \qquad (39.71)$$
(the expressions for $n$ and $\ell$ from (39.58) are still valid in this case, since the conditions
$n^2 = \ell^2 = 0$ and $n \cdot \ell = -1$ are purely algebraic and do not depend on whether or not the
mass function depends on $r$). In particular, therefore, for outgoing pure radiation fields
in ingoing coordinates one necessarily has $G_{rr} \neq 0$. The generalised ingoing Vaidya
metric, on the other hand, has $G_{rr} = 0$ (and (39.41) shows that $G_{rr} \neq 0$ requires a
non-trivial, i.e. $r$-dependent, $h(w,r)$).
We have just seen that what characterises the Vaidya metric is a matter content consisting
of purely in- or outgoing radiation, and that in the Bondi gauge this corresponds
to an energy-momentum tensor of the form
$$T_{\alpha\beta} = \rho(w,r)\, k_\alpha k_\beta , \qquad \rho(w,r) = \epsilon\, \frac{m'(w)}{4\pi G_N r^2} \qquad (39.72)$$
where $k_\alpha = -\partial_\alpha w$. It is also of interest to understand what characterises these metrics
in the usual Schwarzschild-Birkhoff gauge (39.32) and how to write the Vaidya metrics
in this gauge. In principle, the answer to this question is provided by the coordinate
transformation (39.36),
$$dw(t,r) = e^{-h(w(t,r),r)} \left( e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr \right) \qquad (39.73)$$
between the Bondi and Schwarzschild gauges. If the metric in the Bondi gauge is of the
Vaidya form, one has $h(w,r) = 0$ (as well as $m(w,r) = m(w)$), and (39.73) reduces to
$$dw(t,r) = e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr \qquad (39.74)$$
where
$$f_s(t,r) = 1 - \frac{2m_s(t,r)}{r} = 1 - \frac{2m(w(t,r))}{r} . \qquad (39.75)$$
Here we have used the fact, noted in section 39.3, that the mass function transforms
as a scalar under this coordinate transformation (39.37), and we will for notational
simplicity set $m_s = m$ in the following.
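Thus in practice one needs to solve the equations
$$\frac{\partial w(t,r)}{\partial r} = \epsilon \left(1 - \frac{2m(w(t,r))}{r}\right)^{-1} \qquad (39.76)$$
and
$$\frac{\partial w(t,r)}{\partial t} = e^{h_s(t,r)} , \qquad (39.77)$$
with $h_s(t,r)$ related to $f_s(t,r)$ by the integrability condition
$$\partial_r\, e^{h_s(t,r)} = \epsilon\, \partial_t \left( f_s(t,r)^{-1} \right) .$$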
This is an arduous task in general, even for simple Vaidya metrics. A quick way to determine
the general form of the Vaidya metric in the Schwarzschild gauge, without having
to explicitly solve these equations in order to determine the coordinate transformation
$w = w(t,r)$, is to write
$$m(t,r) = m(w(t,r))$$
so that
$$\partial_t m(t,r) = m'(w)\, e^{h_s} , \qquad \partial_r m(t,r) = \epsilon\, m'(w)\, f_s^{-1} . \qquad (39.80)$$
In particular, this implies that
$$\frac{\partial_t m(t,r)}{\partial_r m(t,r)} = \epsilon f_s\, e^{h_s} , \qquad (39.81)$$
which gives the desired relation between $h_s(t,r)$ and $f_s(t,r)$ or $m(t,r)$. In particular,
the general Vaidya metric can now be written as
$$ds^2 = -\left(\frac{\partial_t m}{\partial_r m}\right)^2 f_s^{-1}\, dt^2 + f_s^{-1}\, dr^2 + r^2 d\Omega^2 , \qquad f_s = 1 - \frac{2m(t,r)}{r} .$$
This is the general form of the Vaidya metric in the Schwarzschild gauge.
While we have performed this coordinate transformation at the level of the metric, we
can of course also do this at the level of the field equations and the energy-momentum
tensor. This is instructive in its own right, as it will tell us what is the form of the
energy-momentum tensor in the Schwarzschild gauge which will simply give rise to a
Vaidya metric in (Schwarzschild-) disguise upon solving the Einstein equations.
On the one hand, in the gauge (39.32) the $(t,r)$-components of the Einstein tensor have
the simple form (23.64)
$$\partial_r m(t,r) = 4\pi G_N r^2\, (-T^t{}_t)$$
$$\partial_t m(t,r) = 4\pi G_N r^2\, (+T^r{}_t) \qquad (39.83)$$
$$\partial_r h_s(t,r) = 4\pi G_N\, r f_s(t,r)^{-1}\, (-T^t{}_t + T^r{}_r) .$$
Inserting this into the Einstein equations (39.83) one (re)discovers the relations (39.80).
which reflects the fact that the original Vaidya energy-momentum tensor was traceless.
Equivalently, using terminology that is adapted to the coordinates $(t,r)$ of the Schwarzschild
gauge, we can rephrase this as the statement that the energy density and radial
pressure are equal,
$$T^r{}_r + T^t{}_t = 0 \quad\Leftrightarrow\quad \rho = P_r . \qquad (39.88)$$
Moreover, one has the relation
$$T^t{}_t = \epsilon\, e^{h_s} f_s\, T^t{}_r , \qquad (39.89)$$
which expresses the lightlike nature of the energy-momentum content. Indeed, in the
Schwarzschild gauge an ingoing (for $\epsilon = +1$) respectively outgoing (for $\epsilon = -1$) radial
null vector $k^\alpha$ is characterised by
$$k^\alpha k_\alpha = 0 \quad\Rightarrow\quad k^r = -\epsilon f_s\, e^{h_s}\, k^t , \qquad (39.90)$$
$$T^t{}_t = \epsilon\, e^{h_s} f_s\, T^t{}_r \quad\Leftrightarrow\quad T^t{}_\alpha k^\alpha = 0 . \qquad (39.91)$$
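These relations can be checked component by component. The sketch below assumes the null-dust form $T_{\alpha\beta} = \rho\, k_\alpha k_\beta$ with $k_\alpha = -\partial_\alpha w$, the components of $dw$ from (39.74), and the Schwarzschild-gauge metric with $g_{tt} = -e^{2h_s} f_s$ and $g_{rr} = f_s^{-1}$; the index placement is the one adopted here and may differ from the original typesetting.

```latex
\begin{align*}
k_t &= -e^{h_s} , & k_r &= -\epsilon f_s^{-1} , \\
k^t &= g^{tt} k_t = e^{-h_s} f_s^{-1} , & k^r &= g^{rr} k_r = -\epsilon , \\
T^t{}_t &= \rho\, k^t k_t = -\rho f_s^{-1} , & T^r{}_r &= \rho\, k^r k_r = +\rho f_s^{-1} , \\
T^t{}_r &= \rho\, k^t k_r = -\epsilon \rho\, e^{-h_s} f_s^{-2} , & T^r{}_t &= \rho\, k^r k_t = \epsilon \rho\, e^{h_s} .
\end{align*}
```

From these one reads off $T^t{}_t + T^r{}_r = 0$, $T^t{}_t = \epsilon\, e^{h_s} f_s\, T^t{}_r$ and $k^r = -\epsilon f_s e^{h_s} k^t$. Inserting $T^r{}_t$ and $T^t{}_t$ with $\rho = \epsilon\, m'(w)/4\pi G_N r^2$ into (39.83) gives back (39.80).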
Together
the tracelessness condition (39.88)
and the fact that the energy-momentum tensor is spherically symmetric and purely
longitudinal (i.e. that its transverse angular components are zero)
40 Vaidya Metrics II: Radial Null and Timelike Geodesics
In order to improve our understanding of the physics of the Vaidya metrics, and to
explore their causal structure, in this section we now look at radial null and timelike
geodesics and their properties.
$$\ddot v + \frac{m(v)}{r^2}\, \dot v^2 = 0$$
$$\ddot r + \frac{m'(v)}{r}\, \dot v^2 = 0 \qquad (40.2)$$
(the null condition (40.1) has been used to put the r-equation into this simple form) or
by appropriate first integrals of these equations arising from conserved charges. These
are available only for special choices of m(v). There are two cases (I am aware of),
namely
Continuing for now with the general (but not generalised) Vaidya metric, ingoing radial
null geodesics satisfy $dv = 0$ or $\dot v = 0$, and thus from (40.2)
$$v(\tau) = v_0 , \qquad r(\tau) = r_0 + \dot r_0\, \tau . \qquad (40.3)$$
The tangent vector $\dot x^\alpha = dx^\alpha/d\tau$ is future-oriented, i.e. such that its scalar product
with $\partial_t = \partial_v$ is negative, for $\dot r_0 < 0$,
$$g_{\alpha\beta}\, \dot x^\alpha (\partial_t)^\beta = g_{\alpha v}\, \dot x^\alpha = \dot r = \dot r_0 \overset{!}{<} 0 , \qquad (40.4)$$
so that indeed the radius decreases along future-oriented ingoing null geodesics. Moreover,
since $r$ is affinely related to $\tau$, $r$ is an affine parameter along these ingoing null
geodesics. In particular, this means that the ingoing null vector field $n = -\partial_r$ introduced
in (39.58) is affinely parametrised, i.e. satisfies
$$n^\alpha \nabla_\alpha n^\beta = 0 . \qquad (40.5)$$
One can also see this directly from the calculation of the Christoffel symbols,
$$n^\alpha \nabla_\alpha n^\beta = \Gamma^\beta_{rr} = 0 \qquad (40.6)$$
(since the only non-trivial metric component $g_{r\alpha}$ is the constant $g_{rv} = 1$).
( ) = 12 ( ) = 0
r = 12 f v (40.7)
= ( v )
and noting (e.g. from the $v$-equation in (40.2)) that the only non-trivial Christoffel
symbol $\Gamma^v_{\alpha\beta}$ is $\Gamma^v_{vv} = m(v)/r^2$, one deduces the inaffinity of $\ell$,
$$\kappa_\ell = \frac{m(v)}{r^2} . \qquad (40.8)$$
Affinely parametrised outgoing radial null geodesics, on the other hand, satisfy (40.2)
and
$$-f(v,r)\, dv + 2\, dr = 0 \quad\Leftrightarrow\quad -f(v,r)\, \dot v + 2\dot r = 0 \quad\Leftrightarrow\quad 2\, \frac{dr}{dv} = f(v,r) . \qquad (40.9)$$
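Remarkably, with the help of the null condition (40.9) the non-linear second order
geodesic equations can be integrated to first-order differential equations.175 As the
derivation is not given in that reference, we provide it here. First of all, we write
$$m'(v)\, \dot v = \dot m \qquad (40.10)$$
and use (40.9) to eliminate the remaining $\dot v$ from the $r$-equation in (40.2). Then one
finds
$$0 = \ddot r + \frac{2\dot m}{r - 2m}\, \dot r = \ddot r - \dot r\, \frac{d}{d\tau} \log(r - 2m) + \dot r^2\, \frac{1}{r - 2m}$$
$$\Rightarrow\quad \frac{d}{d\tau} \log\left(\dot r/(r - 2m)\right) = -\dot r/(r - 2m) \qquad (40.11)$$
$$\Rightarrow\quad \dot r/(r - 2m) = 1/(\tau - \tau_0)$$
or
$$\frac{dr}{d\tau} = \frac{r(\tau) - 2m(v(\tau))}{\tau - \tau_0} . \qquad (40.12)$$
From (40.9) one then also deduces
$$\frac{dv}{d\tau} = \frac{2r(\tau)}{\tau - \tau_0} , \qquad (40.13)$$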
175
R. Waugh, K. Lake, Backscattering radiation in the Vaidya metric near zero mass, Phys. Lett. B116
(1986) 154-156.
so that these are future-directed curves (increasing $v$) for $\tau > \tau_0$. Equations (40.12) and
(40.13) imply the outgoing null condition (40.9) and govern the behaviour of outgoing
lightrays in the Vaidya metric. Without loss of generality we can set $\tau_0 = 0$.
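The first-order system (40.12), (40.13) is easy to integrate numerically for a given mass function. The following is a minimal sketch, not part of the original notes: it uses a hand-written fourth-order Runge-Kutta step with $\tau_0 = 0$, and the function name `integrate_outgoing_ray`, the step count and the sample values `m0`, `E` are illustrative assumptions. As a check it compares the constant-mass case with the closed-form solution $r(\tau) = 2m_0 + E\tau$ (cf. (40.18)) and $v(\tau) = v(1) + 4m_0 \log\tau + 2E(\tau - 1)$, the latter obtained here by integrating (40.13) directly.

```python
import math

def integrate_outgoing_ray(m, r1, v1, tau1, tau2, steps=4000):
    """Integrate dr/dtau = (r - 2 m(v))/tau, dv/dtau = 2 r/tau (tau0 = 0) with RK4.

    m is any callable mass function m(v); (r1, v1) are the values at tau1 > 0.
    Returns (r, v) at tau2.
    """
    def rhs(tau, r, v):
        return (r - 2.0 * m(v)) / tau, 2.0 * r / tau

    h = (tau2 - tau1) / steps
    r, v = r1, v1
    for i in range(steps):
        tau = tau1 + i * h
        k1r, k1v = rhs(tau, r, v)
        k2r, k2v = rhs(tau + h / 2, r + h / 2 * k1r, v + h / 2 * k1v)
        k3r, k3v = rhs(tau + h / 2, r + h / 2 * k2r, v + h / 2 * k2v)
        k4r, k4v = rhs(tau + h, r + h * k3r, v + h * k3v)
        r += h / 6 * (k1r + 2 * k2r + 2 * k3r + k4r)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return r, v

# Constant-mass (Schwarzschild) check; m0 and E are arbitrary sample values.
m0, E = 1.0, 0.5
tau_start, tau_end = 1.0, 3.0
r_num, v_num = integrate_outgoing_ray(lambda v: m0, 2 * m0 + E * tau_start, 0.0,
                                      tau_start, tau_end)
r_exact = 2 * m0 + E * tau_end
v_exact = 4 * m0 * math.log(tau_end) + 2 * E * (tau_end - tau_start)
print(abs(r_num - r_exact), abs(v_num - v_exact))
```

Any other callable can be passed as `m`, for instance a linear mass function, in which case no closed form is assumed and the output is purely numerical.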
These 1st order equations allow us to rewrite the null geodesic equations (40.2) for
outgoing null geodesics in a way that will turn out to be useful later on: the geodesic
equation for $r$ in (40.2) together with (40.13) leads to
$$\ddot r + \frac{4m'(v)}{\tau^2}\, r = 0 ; \qquad (40.14)$$
likewise, the geodesic equation for $v$ in (40.2) together with (40.13) leads to
$$\ddot v + \frac{4m(v)}{\tau^2} = 0 . \qquad (40.15)$$
These equations become particularly tractable (namely linear) in the case of a linear
mass function $m(v) = \mu v$ that we will study in detail later.
As a warm-up exercise, a first application of the above results, and for comparison (and,
later on, matching) purposes, we first rederive the equations for outgoing lightrays and
for the horizon generators in the constant mass Schwarzschild case m(v) = m0 in ingoing
Eddington-Finkelstein coordinates before addressing the same problem in the dynamical
Vaidya context.
For the Schwarzschild metric one has $m'(v) = 0$ and (40.2) implies that $\ddot r = 0$ so that
$$\dot r = E \qquad (40.16)$$
$$r(\tau) = 2m_0 + E(\tau - \tau_0) , \qquad (40.18)$$
For $\tau \to \tau_0$, these lightrays emerge from the past horizon $r = 2m_0$ and $v \to -\infty$.
This illustrates the past geodesic incompleteness of the ingoing Eddington-Finkelstein
coordinates.
A special case is the geodesic (or S 2 -family of geodesics) for E = 0, for which
with $\tau$ determined up to affine transformations $\tau \to a\tau + b$ and $v_1 = v(\tau = 1)$. These
null geodesics lie on and generate the future event horizon of the Schwarzschild black
hole.
Remarks:
V = e v/4m0 = (40.21)
2. Note also, for comparison with the Vaidya metric, that (40.9) implies that the 2nd
derivative $d^2r/dv^2$ is
$$\frac{d^2 r}{dv^2} = \frac{m_0}{2r^2}\, f(r) . \qquad (40.22)$$
For $r > 2m_0$ one has $f(r) > 0$. Thus $r(v)$ is convex and an initially outgoing
lightray, $dr/dv > 0$, will remain outgoing at all times. For $r < 2m_0$, on the other
hand, one has $f(r) < 0$ and therefore $dr/dv < 0$ and $d^2r/dv^2 < 0$. Thus $r(v)$ is
concave, moving towards smaller values of $r$, and will ultimately reach $r = 0$ at a
finite value of $v$.
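For completeness, (40.22) follows from (40.9) in two lines; the sketch below uses only $f(r) = 1 - 2m_0/r$ and the chain rule.

```latex
\begin{align*}
\frac{dr}{dv} &= \tfrac12 f(r) , \qquad f'(r) = \frac{2m_0}{r^2} \\
\frac{d^2 r}{dv^2} &= \tfrac12 f'(r)\, \frac{dr}{dv} = \tfrac14 f'(r)\, f(r) = \frac{m_0}{2r^2}\, f(r)
\end{align*}
```

In particular the sign of $d^2r/dv^2$ is that of $f(r)$, which is all that is used in the convexity argument.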
One can go through the same exercises for the outgoing Vaidya metric (39.12)
$$ds^2 = -f(u,r)\, du^2 - 2\, du\, dr + r^2 d\Omega^2 , \qquad f(u,r) = 1 - \frac{2m(u)}{r} . \qquad (40.23)$$
Here we now assume that the mass function $m(u)$ is non-negative ($m(u) \geq 0$) and
non-increasing ($m'(u) \leq 0$).
Outgoing null geodesics are given by $u$ = const., and $r$ is an affine parameter along
these outgoing null geodesics.
(a sign flip only in the 2nd equation, and we have set an integration constant $\tau_0$ to
zero). We have $\dot r < 0$ and $\dot u > 0$ for $r > 2m$ and $\tau < 0$, the restriction on the range of
$\tau$ already suggesting a potential future geodesic incompleteness of this coordinate system.
We will come back to this issue below.
With the help of (40.26), the equations (40.25) can be put into the form
$$\ddot r - \frac{4m'(u)}{\tau^2}\, r = 0 , \qquad \ddot u - \frac{4m(u)}{\tau^2} = 0 , \qquad (40.27)$$
the counterparts of (40.14) and (40.15), and again these will become linear decoupled
harmonic oscillator equations in the case of a linear (and now decreasing) mass function
$m(u)$.
In order to improve our understanding of the Vaidya geometry, we will now relate the
data given by the geometry and matter content (metric and energy-momentum tensor)
to those measured by an observer. We will concentrate on the outgoing (radiating)
Vaidya metric, but of course the ingoing case can be discussed in complete analogy.176
For $f(u,r) = 1 - 2m(u)/r$ with $m(u)$ bounded one has an asymptotically flat metric
in the (crude) sense that $f \to 1$ for $r \to \infty$. In that case, it follows from the Vaidya
176
This discussion is adapted (and expanded) from R. Lindquist, R. Schwartz, C. Misner, Vaidya's
Radiating Schwarzschild Metric, Phys. Rev. 137 (1965) 5B, B1364-B1368.
line element that the proper time for an observer at rest at infinity is simply $d\tau_\infty = du$.
Thus (40.31) can be interpreted as the relation between the observer's proper time and
the proper time at infinity,
$$d\tau_\infty = (E_u + \dot r)^{-1}\, d\tau_o . \qquad (40.33)$$
As usual, this formula will be related to that for gravitational redshift to be discussed
below.
We will also use them in section 40.4 to take a more detailed look at, and interpret,
the equations for timelike geodesics, and in section 40.6 to locate and detect potential
surfaces of infinite redshift and discuss the issues that arise in relation to them.
1. The null wave vector $k^\alpha$ of outgoing lightrays, in particular of the outgoing radiation
due to the energy-momentum tensor, will be proportional to the affinely
parametrised outgoing null vector $\ell = \partial_r$ (39.67). The frequency $\omega_o$ of this
lightray as determined by the timelike observer will essentially be given by the
projection of the wave vector $k^\alpha$ onto the observer's rest-frame, namely
$$\omega_o = -\dot x^\alpha k_\alpha \propto -\dot x^\alpha \ell_\alpha = \dot u = (E_u + \dot r)^{-1} . \qquad (40.34)$$
We can also equivalently phrase this in terms of the observer emitting outgoing
lightrays. Since outgoing lightrays travel along lines of constant $u$, signals with
initial separation $\Delta u$ are received at infinity with the same separation $\Delta u$. The
different perceived frequencies are due to the differences in proper time, and this
explains the equivalence between (40.33) and (40.34).
2. It follows from (39.66) and (39.70) that the (null) energy density of the Vaidya
metric is
$$\rho_{out} = -\frac{m'(u)}{4\pi G_N r^2} , \qquad (40.35)$$
so that (as noted before, in section 39.2) the total flux through the sphere of radius
$r$ is independent of $r$ and given by
$$F = 4\pi r^2 \rho_{out} = -m'(u)/G_N$$
and likewise
$$F = m'(v)/G_N \qquad (40.37)$$
for the ingoing Vaidya metric. Here the convention has been chosen that the flux
$F \propto \epsilon\, m'(w)$ is positive for both ingoing and outgoing radiation.
However, an observer will not necessarily detect this static energy density and
flux. The same reasoning as above for the redshift leads to the conclusion that
the energy-density of the outgoing radiation provided by the background energy-momentum
tensor measured by this observer in his rest-frame is
$$\rho_o = T_{\alpha\beta}\, \dot x^\alpha \dot x^\beta = \rho_{out}\, (\dot x^\alpha \ell_\alpha)^2 = \rho_{out}\, (\dot u)^2 \qquad (40.38)$$
or
$$\rho_o = -\frac{m'(u)}{4\pi G_N r^2}\, (\dot u)^2 . \qquad (40.39)$$
Thus $\rho_o$ can be written in terms of the observer's radial velocity $\dot r$ as
$$\rho_o = -\frac{m'(u)}{4\pi G_N r^2}\, (E_u + \dot r)^{-2} . \qquad (40.40)$$
For an observer at rest at infinity one thus finds the total flux or luminosity
$$F_\infty = \lim_{r \to \infty}\, \lim_{\dot r \to 0}\, 4\pi r^2 \rho_o = -m'(u)/G_N = F \qquad (40.41)$$
by
$$F_o = F\, (\dot u)^2 = F\, (E_u + \dot r)^{-2} . \qquad (40.43)$$
As for the Hubble distance - redshift relation (section 33.8), the double redshift
factor is due to (a) the redshift of the energy (e.g. of each individual photon) as
it moves outwards and (b) the dilation of the time-interval over which the energy
is emitted (e.g. of the rate at which photons are emitted).
thus justifying the notation / abbreviation introduced in (40.30). Equation (40.32),
$$(\dot r)^2 + f(u,r) = (E_u)^2 , \qquad (40.47)$$
then has the usual interpretation of a one-dimensional effective potential equation for
the radial dynamics. The main difference from the static case is that here $E_u$ is not
conserved.
For a solution of the Euler-Lagrange equations, one can determine the time-dependence
of $E_u$ from
$$\dot E_u = -\frac{\partial L}{\partial u} = -\frac{m'(u)}{r}\, (\dot u)^2 . \qquad (40.48)$$
Using (40.42), this can also be written in the intuitively attractive form
$$\dot E_u = \frac{G_N F_o}{r} , \qquad (40.49)$$
stating that the change in energy of the particle is due to the energy flux (luminosity)
of the background energy-momentum tensor.
The radial equation of motion turns out to take the remarkably simple form
$$\ddot r = -\frac{G_N F_o}{r} - \frac{m(u)}{r^2} . \qquad (40.50)$$
It is best obtained not by differentiation of (40.47) (and division by $\dot r$ assuming that
one is not dealing with circular paths), since it requires a bit of rearrangement to put
the resulting equation into the form (40.50), but rather as the Euler-Lagrange equation
for $r$, rewritten using (40.31). Indeed, the Euler-Lagrange equation reads
$$\frac{d}{d\tau_o} \frac{\partial L}{\partial \dot r} = \frac{\partial L}{\partial r} \quad\Leftrightarrow\quad \ddot u = \frac{m(u)}{r^2}\, (\dot u)^2 \qquad (40.51)$$
while
$$-\frac{\ddot u}{(\dot u)^2} = \frac{d}{d\tau_o} \frac{1}{\dot u} = (\ddot r + \dot E_u) . \qquad (40.52)$$
Using (40.49) and (40.52), (40.51) can be written in the form (40.50).
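As a numerical cross-check of the signs in (40.50) and (40.51), one can integrate these two second-order equations for a radial timelike observer and monitor the normalisation $f\dot u^2 + 2\dot u\dot r = 1$ implied by the outgoing Vaidya line element (40.23). The sketch below is illustrative only: the mass function $m(u) = 1 + 0.5\,e^{-0.2u}$, the initial data and the step size are arbitrary assumptions, $G_N F_o = -m'(u)\dot u^2$ is used to eliminate the flux, and $E_u = f\dot u + \dot r$ is taken as the definition of the energy.

```python
import math

def mass(u):
    # Hypothetical non-increasing, positive mass function (illustrative choice)
    return 1.0 + 0.5 * math.exp(-0.2 * u)

def mass_prime(u):
    return -0.1 * math.exp(-0.2 * u)

def rhs(state):
    u, r, udot, rdot = state
    uddot = mass(u) / r**2 * udot**2                        # (40.51)
    rddot = mass_prime(u) / r * udot**2 - mass(u) / r**2    # (40.50), G_N F_o = -m'(u) udot^2
    return (udot, rdot, uddot, rddot)

def rk4_step(state, h):
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def norm_and_energy(state):
    u, r, udot, rdot = state
    f = 1.0 - 2.0 * mass(u) / r
    return f * udot**2 + 2.0 * udot * rdot, f * udot + rdot

# Observer initially at rest at r = 10, with udot fixed by f udot^2 + 2 udot rdot = 1
r0, rdot0 = 10.0, 0.0
f0 = 1.0 - 2.0 * mass(0.0) / r0
udot0 = (-rdot0 + math.sqrt(rdot0**2 + f0)) / f0
state = (0.0, r0, udot0, rdot0)

norm_start, energy_start = norm_and_energy(state)
for _ in range(5000):          # proper time 0 -> 5 in steps of 1e-3
    state = rk4_step(state, 1e-3)
norm_end, energy_end = norm_and_energy(state)
print(norm_start, norm_end, energy_start, energy_end)
```

The normalisation stays at 1 to integration accuracy, and $E_u$ grows along the worldline, in line with (40.49) for a decreasing mass function.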
Remarks:
1. This shows that the energy-flux gives a new non-Newtonian long-range contribu-
tion to the gravitational force, induced by the varying Newtonian term m(u)/r 2 .
3. If one considers non-radial motion, with conserved angular momentum $L$, then the
additional terms in the radial equation of motion are just the standard angular
momentum barrier term $L^2/r^3$ and the standard general relativistic correction
term $m(u)L^2/r^4$.
40.5 Future Incompleteness of Outgoing Eddington-Finkelstein Coordinates
We will now look at some issues related to the potential future incompleteness of the outgoing
coordinate system at $u = +\infty$ for the general outgoing Vaidya metric. We could
have analogously discussed the potential past-incompleteness of the ingoing coordinate
system at $v = -\infty$, but past horizons are generally considered to be less physically
relevant than future horizons, which can form in the process of gravitational collapse,
and thus we focus on the outgoing case.
To set the stage for the subsequent discussions, and to remind ourselves of the basic
properties of outgoing coordinates, we will briefly recall the solutions for ingoing null
geodesics in these coordinates for the two special cases m(u) = 0 (Minkowski space)
and m(u) = m0 constant (Schwarzschild in outgoing Eddington-Finkelstein coordinates
that cover the Schwarzschild patch as well as the past white hole region).
1. For $m(u) = 0$ one has Minkowski space-time and the coordinate $u$ is related to the
usual Minkowski coordinates $(t,r)$ by $u = t - r$. The 1st order geodesic equations
(40.26) are simply
$$\dot r = r/\tau , \qquad \dot u = -2r/\tau , \qquad (40.53)$$
and therefore
$$r(\tau) = -E\tau , \qquad u(\tau) = 2E\tau + c .$$
The integration constant $c$ can be identified with (and used to construct) the
ingoing lightcone coordinate $v = t + r$ from the outgoing lightcone coordinate
$u = t - r$. Indeed, prior to the extension to $\tau > 0$ (i.e. for geodesics that are
ingoing) one has
$$c = u - 2E\tau = u + 2r = t + r = v , \qquad (40.56)$$
so that ingoing null geodesics are lines of constant $v$.
2. For $m(u) = m_0 > 0$ a positive constant one has the Schwarzschild geometry, and
the coordinate $u$ is related to the usual Schwarzschild time coordinate $t$ and the
tortoise coordinate $r^*$,
$$r^* = r + 2m \log|r - 2m| , \qquad (40.57)$$
by $u = t - r^*$. The 1st order geodesic equations (40.26) are simply
$$\dot r = (r - 2m_0)/\tau , \qquad \dot u = -2r/\tau ,$$
which integrate to
$$r(\tau) = 2m_0 - E\tau , \qquad u(\tau) = -4m_0 \log|\tau| + 2E\tau + c .$$
In this case, the situation is quite different. As $\tau \to 0$, one has $r \to 2m_0$ and
$u \to +\infty$. This is an infinite redshift surface and the future event horizon (points
on the past horizon correspond to $r = 2m$ and $u$ finite). Thus the space-time and
the coordinates need to be extended beyond $u = +\infty$.
In the present case this is easily done by noting that the integration constant $c$
is (up to other constants) equal to the ingoing Eddington-Finkelstein coordinate
$v = t + r^*$. Indeed,
Thus ingoing radial lightrays are lines of constant $v$ and the ingoing Eddington-Finkelstein
coordinates $(v,r)$ cover the Schwarzschild patch as well as its future
extension beyond the future horizon (now located at $r = 2m$ with $v$ finite, $v = -\infty$
corresponding to the past horizon at which the newly constructed ingoing
Eddington-Finkelstein coordinates $(v,r)$ are incomplete).
In section 40.3, we determined the gravitational redshift in the outgoing Vaidya geometry.
We now try to determine and locate possible surfaces of infinite redshift. To
that end, note that it follows from (40.33) that infinite time dilation or infinite gravitational
redshift occurs when $E_u + \dot r \to 0$. It then follows from (40.32) that a necessary
condition for this to occur is that $f(u,r) \to 0$ or $r \to 2m(u)$ (from above),
Since $f > 0$ for $r > 2m(u)$, and $\dot u > 0$ for a future oriented path it then follows from
(40.30) that
$$E_u + \dot r = f \dot u + 2\dot r \to 0 \quad\Rightarrow\quad \dot r < 0 . \qquad (40.62)$$
It is easy to see that this cannot occur at finite $u$:
$$f(u,r) \to 0 \quad\Rightarrow\quad 2\dot u \dot r \to +1 , \qquad (40.63)$$
which rules out a negative $\dot r$. Indeed, $r = 2m(u)$ with $u$ finite is exactly like
the past event horizon for the Schwarzschild metric which can only be crossed
along future-directed paths in the direction of increasing $r$ (and in section 31.9 we
already identified this surface concretely as what is known as the past apparent or
trapping horizon of the outgoing Vaidya metric).
$$E_u + \dot r \to 0 \quad\Rightarrow\quad \dot u = (E_u + \dot r)^{-1} \to +\infty . \qquad (40.64)$$
Thus there is an infinite redshift surface (of a freely falling relative to a static observer)
at $r = 2m(u = \infty)$.
We will now show that, as in the Schwarzschild case, this infinite redshift surface is
at finite affine distance, i.e. that this surface can be reached in finite proper time (for
timelike geodesic observers) or affine parameter (for ingoing lightrays). This means
that the outgoing Vaidya coordinates are (like their Schwarzschild counterparts, the
outgoing Eddington-Finkelstein coordinates) future-incomplete and that the space-time
needs to be extended beyond this infinite redshift surface (an issue we will briefly turn
to afterwards).
To that end we use the 2nd of the equations (40.26), which we integrate to
$$\dot u = -\frac{2r}{\tau} \quad\Rightarrow\quad \frac{d\tau}{\tau} = -\frac{du}{2r(u)} \quad\Rightarrow\quad \log|\tau| = -\int du/2r(u) , \qquad (40.65)$$
or
$$|\tau| = e^{-\int du/2r(u)} . \qquad (40.66)$$
As $r(u) \to 2m(u)$, the leading term in this integral is
$$4 \log|\tau| = -\int du/m(u) + \ldots \qquad (40.67)$$
It follows from (40.67) that $r$ reaches $2m(u)$ at the finite time $\tau = 0$ (selected by the
choice of integration constant) iff the integral $\int du/m(u)$ diverges for large $u$, i.e. iff $m(u)$
grows slower than linearly at large $u$. Since we are only considering non-increasing $m(u)$
anyway, this shows that for any non-increasing function $m(u)$ that is not identically zero
for some $u \geq u_0$ (then the previous reasoning leading to the conclusion that $r \to 2m(u)$
requires $u \to \infty$ does not apply) the surface of infinite redshift $r = 2m(\infty)$ is at finite
affine distance.
1. Consider first the case when $m(u)$ is bounded away from zero, i.e. when one has
$$m(u) \geq m_0 > 0 .$$
In this case one can reduce the argument to that for the Schwarzschild metric with
constant mass $m_0$. Indeed, in this case the integral (40.67) is
$$4 \log|\tau| = -\int \frac{du}{m(u)} \geq -\int \frac{du}{m_0} = -u/m_0 \qquad (40.69)$$
or
$$u \geq -4m_0 \log|\tau| . \qquad (40.70)$$
Thus one reproduces the Schwarzschild result (with inequality because the mass
was at least as big as $m_0$), with the conclusion
$$u \to \infty \quad\text{for}\quad \tau \to 0 . \qquad (40.71)$$
2. This does not yet show what happens in the case where $m(u) \to 0$ asymptotically
for $u \to \infty$. A priori it is conceivable that whether one finds the Minkowski space
behaviour ($u \to \infty$ only as $\tau \to \infty$, no need for completion) or the Schwarzschild
behaviour ($u \to \infty$ for finite $\tau$, completion required) depends on the rate at which
$m(u) \to 0$.
If one assumes that for large $u$ the mass function $m(u)$ behaves like $m(u) \sim u^{-a}$
for some $a > 0$, say, or goes to zero exponentially, then for large $u$ the integral in
(40.67) gives
$$m(u) \sim u^{-a} \;\Rightarrow\; \log|\tau| \sim -u^{a+1}$$
$$m(u) \sim e^{-au} \;\Rightarrow\; \log|\tau| \sim -e^{au} , \qquad (40.72)$$
so this still implies $u \to \infty$ for $\tau \to 0$ for any $a > 0$ (actually for any $a > -1$ in
the power-law case), in agreement with the general argument.
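The asymptotics quoted in (40.72) follow from elementary integrals. As a sketch, with the constants of proportionality in $m(u)$ set to one and an arbitrary lower limit $u_1$ (both assumptions made here purely for illustration), the leading term (40.67) gives:

```latex
\begin{align*}
m(u) = u^{-a} : \quad 4\log|\tau| &\simeq -\int_{u_1}^{u} du'\, u'^{\,a}
  = -\frac{u^{a+1} - u_1^{a+1}}{a+1} \;\to\; -\infty \quad (a > -1) \\
m(u) = e^{-au} : \quad 4\log|\tau| &\simeq -\int_{u_1}^{u} du'\, e^{a u'}
  = -\frac{e^{au} - e^{au_1}}{a} \;\to\; -\infty \quad (a > 0)
\end{align*}
```

In both cases $\log|\tau| \to -\infty$, i.e. $\tau \to 0$, as $u \to \infty$, so the surface is reached at finite affine parameter.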
Thus we find that quite generally $r = 2m(u = \infty)$ behaves exactly like the future
horizon of a static Schwarzschild black hole, which also sits at $t = +\infty$, where $t$ is the
Schwarzschild time, equivalently (up to some constant factor) the proper time of a non-geodesic
static observer, or at $u = +\infty$, where $u$ is the retarded Eddington-Finkelstein
coordinate.
As we have seen above, the outgoing Vaidya coordinates are future-incomplete and one
needs to future-extend the space-time beyond $u = +\infty$. This issue was first pointed out
by Lindquist et al. (footnote 176) who stated, however, that they were unable to find
an extension. Indeed, this issue turns out to be far from trivial and is not resolved in
general.
As regards this issue of future incompleteness and future extension, the two special
cases recalled in section 40.5 above should be prototypical in the sense that for the
non-increasing non-negative mass function m(u) of the outgoing Vaidya metric one only
has the following 4 options:
2. $m(u)$ decreases to a positive constant value $m_0 > 0$ at some finite value $u = u_0$
(and then remains constant);
1. In the first case, for $u \geq u_0$ the metric is just the Minkowski space in outgoing
lightcone coordinates and the extension of the space-time should not be an issue.
However, since this means that $r = 2m(u = \infty) \to 0$, the singularity of the
metric at $r = 0$ is potentially dangerous. And indeed it is shown by Waugh and
Lake (footnote 175) that backscattered light emitted towards smaller values of
$r$ is infinitely blueshifted for $r \to 0$. This implies that the backreaction of the
backscattering cannot be ignored, and that classically the process of a black hole
or star radiating away all its mass to leave behind Minkowski space is unstable to
this backreaction.
2. In the second case, the metric is the Schwarzschild metric for $u \geq u_0$ and the
future extension of the metric is well known (and can e.g. be described by ingoing
Eddington-Finkelstein coordinates or by Kruskal-Szekeres coordinates).
3. In the third case, the infinite redshift surface recedes to $r \to 0$, and the space-time
geometry is potentially singular there, either already prior to taking into account
backreaction or, as above, once backscattering is taken into account.
4. This leaves the fourth case, with a surface of infinite redshift at the finite value
$r = 2m(u = \infty)$ of the radius that can be reached in finite proper time or affine
parameter and behaves very much like a future horizon, as the potentially most
interesting case to look at.
A future extension of the outgoing Vaidya metric in this case was first proposed
by W. Israel in 1967 (footnote 177). It is based on a suitable generalisation of the
remarkable Israel coordinates for the Schwarzschild metric discussed in section 26.10.
However, one of the problems, already realised and discussed by W. Israel, is that,
in order to extend the metric beyond $u = \infty$, one also has to extend the mass
function $m(u)$ to that region, and it is not obvious (and in fact not true) that
there is a unique way of doing this. This issue was further discussed by Fayos et
al., who suggested a slight modification of the procedure proposed by Israel (footnote 178).
One could try to side-step this non-uniqueness issue by attempting to solve the
Einstein equations directly in Kruskal-like double-null coordinates, but this generally
leads to equations that cannot be solved analytically (footnote 179).
177 W. Israel, Gravitational collapse of a radiating star, Physics Letters A24 (1967) 184.
178 F. Fayos, M. Martin-Prats, J. Senovilla, On the extension of Vaidya and
Vaidya-Reissner-Nordström spacetimes, Class. Quant. Grav. 12 (1995) 2565-2576.
179 See e.g. B. Waugh, K. Lake, Double-null coordinates for the Vaidya metric, Phys. Rev. D34 (1986)
2978-2984, and F. Girotto, A. Saa, Semi-analytical approach for the Vaidya metric in double-null
coordinates, arXiv:gr-qc/0406067.
41 Vaidya Metrics III: Linear Mass $m(v) = \mu v$ (a case study)
In the following, in order to illustrate some of the properties of Vaidya metrics discussed
in the previous sections, we will focus mainly on the ingoing Vaidya metric (39.2), and
in particular on the case where the mass function is a linear function of $v$, $m(v) = \mu v$.
This tractable example already displays a rich and intricate structure (with a
subtle dependence on the value of the mass parameter $\mu$), and gives a good idea of the
complexity of the properties of the general Vaidya metric. Thus the mass function is
\[ m(v) = \mu v . \tag{41.1} \]
In order to avoid unphysical negative masses, we will only consider this space-time for
$v \geq 0$ and glue it to empty Minkowski space with metric $ds^2 = -dv^2 + 2 dv\,dr + r^2 d\Omega^2$
at $v = 0$. Thus we are considering the Vaidya metric
\[ ds^2 = -f(v,r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2 \tag{41.2} \]
with
\[ f(v,r) = 1 - \frac{2m(v)}{r} , \qquad m(v) = \begin{cases} 0 & v \leq 0 \\ \mu v & v \geq 0 \end{cases} \tag{41.3} \]
Evidently this is a rather unphysical metric for $v \to \infty$ (the mass tending to infinity),
and we will rectify this later on by glueing on a Schwarzschild metric at some time
$v_0 > 0$ (with constant mass $m_0 = \mu v_0$).
In order to determine the event horizon, we will now first determine the outgoing
lightrays in the linear mass Vaidya metric. To that end, we need to solve the 2nd order null
geodesic equations (40.2) or the 1st order equations (40.12) and (40.13). Remarkably,
both sets of equations simplify tremendously in the linear mass case $m(v) = \mu v$ (and we
will see later on that this can be attributed to an additional symmetry of the problem
arising from a homothety of the metric):
1. For a linear mass function, and written in terms of the new (non-affine) parameter
$t$ with $\tau = \exp t$, the 1st order equations (40.12) and (40.13) are simply a coupled
system of linear homogeneous differential equations with constant coefficients,
namely
\[ \left.\begin{aligned} dr/dt &= r - 2\mu v \\ dv/dt &= 2r \end{aligned}\right\} \quad\Leftrightarrow\quad \frac{d}{dt}\begin{pmatrix} r \\ v \end{pmatrix} = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix}\begin{pmatrix} r \\ v \end{pmatrix} \tag{41.4} \]
This system of equations can be solved in standard ways, essentially by diagonalising
and/or exponentiating the $(2 \times 2)$-matrix appearing in the above equation.
2. Alternatively, one observes that for a linear mass function the non-linear coupled
2nd order null geodesic equations (40.2) reduce to two decoupled (and identical)
linear harmonic oscillator equations for $r(\tau)$ and $v(\tau)$ respectively. Indeed, note
that for a linear mass function $m(v) = \mu v$ (40.14) and (40.15) reduce to
\[ \begin{aligned} \ddot r(\tau) + \frac{4\mu}{\tau^2}\, r(\tau) &= 0 \\ \ddot v(\tau) + \frac{4\mu}{\tau^2}\, v(\tau) &= 0 . \end{aligned} \tag{41.5} \]
This is simply the equation of a time-dependent harmonic oscillator with the
special (scale-invariant) potential $\sim \tau^{-2}$ and with the same frequency $4\mu$ for both
$r$ and $v$. It is straightforward to solve this equation for any $\mu$.
Proceeding either way, one quickly learns that curiously the case $\mu = 1/16$ is special and,
in fact, slightly more complicated (because of a resonance behaviour). In this section,
we will determine the outgoing null geodesics for all $\mu$ using the equations (41.5). An
alternative derivation based on the matrix equation (41.4) is given in the appendix
(section 41.6).
To proceed, we first observe that a priori the solutions to the 2nd order equations (41.5)
will involve 4 integration constants, but that (40.12) and (40.13) provide 2 relations
among them. In practice, it is then most convenient to determine the general solution
for $r(\tau)$, and to then determine $v(\tau)$ algebraically from $r(\tau)$ and $\dot r(\tau)$ using (40.12),
\[ \tau \dot r(\tau) = r(\tau) - 2\mu v(\tau) \quad\Rightarrow\quad v(\tau) = \frac{1}{2\mu}\left(r(\tau) - \tau \dot r(\tau)\right) . \tag{41.6} \]
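In order to solve (41.5), we write (reparametrise) the frequency $4\mu$ in terms of a
parameter $\lambda$ as
\[ 4\mu = \lambda(1 - \lambda) \quad\Leftrightarrow\quad \lambda_\pm = \tfrac{1}{2}\left(1 \pm \sqrt{1 - 16\mu}\right) . \tag{41.7} \]
For $\mu \neq 1/16$ the two roots are distinct, so that $\tau^{\lambda_\pm}$ are two linearly independent
solutions of (41.5), and thus
\[ \mu \neq \tfrac{1}{16} : \quad \begin{aligned} r(\tau) &= c_+ \tau^{\lambda_+} + c_- \tau^{\lambda_-} \\ v(\tau) &= 2(c_+/\lambda_+)\,\tau^{\lambda_+} + 2(c_-/\lambda_-)\,\tau^{\lambda_-} . \end{aligned} \tag{41.9} \]
Depending on the sign of
\[ \delta = 1 - 16\mu \tag{41.10} \]
there are two different subcases: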
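(a) $\mu < 1/16$, $\delta > 0$: the two roots are real and positive,
(b) $\mu > 1/16$, $\delta < 0$: the two roots are complex conjugates of each other,
\[ \lambda_\pm = \tfrac{1}{2} \pm i\omega , \qquad \omega^2 = (4\mu - 1/4) > 0 . \tag{41.12} \]
In order for the solution to be real, the integration constants $c_\pm$ should then
also be complex conjugates of each other,
\[ c_\pm = c_1 \pm i c_2 . \tag{41.13} \]
Noting that
\[ \tau^{\lambda_\pm} = \tau^{1/2} e^{\pm i \omega \log \tau} \tag{41.14} \]
the general solution can then, if desired, be recast into a manifestly real (but
not necessarily more enlightening) form, now expressed in terms of real linear
combinations of $\tau^{1/2}\cos(\omega \log\tau)$ and $\tau^{1/2}\sin(\omega\log\tau)$.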
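\[ \mu = \tfrac{1}{16} : \quad \begin{aligned} r(\tau) &= c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau \\ v(\tau) &= 4r(\tau) - 8d\,\tau^{1/2} = (4c - 8d)\,\tau^{1/2} + 4d\,\tau^{1/2}\log\tau . \end{aligned} \tag{41.15} \]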
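The closed-form solutions can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it reads the affine parameter as $\tau$, the exponents as $\lambda_\pm = \frac{1}{2}(1 \pm \sqrt{1-16\mu})$, and the 1st order system as $\tau\,dr/d\tau = r - 2\mu v$, $\tau\,dv/d\tau = 2r$. The function names are hypothetical and not part of the notes. It evaluates the residuals of the 1st order system by central finite differences.

```python
import cmath
import math


def roots(mu):
    """Roots lambda_+ and lambda_- of lambda*(1 - lambda) = 4*mu (complex for mu > 1/16)."""
    s = cmath.sqrt(1 - 16 * mu)
    return (1 + s) / 2, (1 - s) / 2


def solution(mu, c_plus, c_minus, tau):
    """r(tau), v(tau) for mu != 1/16, in the form of (41.9) as read here."""
    lp, lm = roots(mu)
    r = c_plus * tau**lp + c_minus * tau**lm
    v = 2 * c_plus / lp * tau**lp + 2 * c_minus / lm * tau**lm
    return r, v


def solution_critical(c, d, tau):
    """r(tau), v(tau) for mu = 1/16, in the form of (41.15) as read here."""
    s = math.sqrt(tau)
    r = c * s + d * s * math.log(tau)
    v = 4 * r - 8 * d * s
    return r, v


def residuals(sol, mu, tau, h=1e-6):
    """Residuals of tau*dr/dtau = r - 2*mu*v and tau*dv/dtau = 2*r.

    `sol` maps tau to (r, v); derivatives use central differences of step h.
    """
    r, v = sol(tau)
    r_hi, v_hi = sol(tau + h)
    r_lo, v_lo = sol(tau - h)
    dr = (r_hi - r_lo) / (2 * h)
    dv = (v_hi - v_lo) / (2 * h)
    return abs(tau * dr - (r - 2 * mu * v)), abs(tau * dv - 2 * r)
```

For $\mu > 1/16$ the constants $c_\pm$ should be passed as complex conjugates of each other, in which case the returned $r$ and $v$ are real up to rounding.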
It is clear from the above solutions (41.9) and (41.15) that the qualitative behaviour of
outgoing lightrays (and thus the lightcones) depends crucially on whether $\delta > 0$, $\delta = 0$
or $\delta < 0$. We will make extensive use of these solutions below in order to determine the
horizon structure in the eternal Vaidya geometry and the Vaidya-glued-to-Schwarzschild
geometry, and I will just add some general qualitative comments here:
Remarks:
1. One noteworthy feature of the solutions (41.9) and (41.15) is that a simultaneous
scaling of $(c_+, c_-)$ in (41.9) or $(c, d)$ in (41.15) is simply equivalent to a scaling of
the coordinates $(v, r)$. This fact reflects a scaling symmetry of the metric that we
will discuss below.
2. It turns out that, among all the above solutions, a special role will be played by
those null geodesics that are invariant under this scaling, i.e. which are such that
$r$ and $v$ are linearly related so that the geodesics are straight lines ($d^2 r/dv^2 = 0$)
in an $(r, v)$-diagram.
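From (40.9), which in the present (linear mass) case reads
\[ 2\frac{dr}{dv} = f(v,r) \quad\Rightarrow\quad \frac{dr}{dv} + \frac{\mu v}{r} = \frac{1}{2} , \tag{41.16} \]
one finds
\[ \frac{d^2 r}{dv^2} = -\frac{\mu}{r}\left(1 - \frac{v}{2r}\left(1 - \frac{2\mu v}{r}\right)\right) . \tag{41.17} \]
Thus $r''(v) = 0$ along
\[ \frac{v}{2r}\left(1 - \frac{2\mu v}{r}\right) = 1 \quad\Leftrightarrow\quad r^2 - vr/2 + \mu v^2 = 0 \quad\Leftrightarrow\quad (r - v/4)^2 = (1 - 16\mu)(v/4)^2 \tag{41.18} \]
with solution
\[ r = \frac{v}{4}\left(1 \pm \sqrt{\delta}\right) = \frac{\lambda_\pm}{2}\, v \quad\Leftrightarrow\quad v = \frac{2}{\lambda_\pm}\, r . \tag{41.19} \]
Again we see that the value $\mu = 1/16$, $\delta = 0$ plays a special role, as these lines
exist only for $\delta \geq 0$.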
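3. For $\delta > 0$ these are precisely the null geodesics with either $c_- = 0$ or $c_+ = 0$,
\[ \delta > 0 : \quad c_\mp = 0 \quad\Rightarrow\quad r(\tau) = \frac{\lambda_\pm}{2}\, v(\tau) , \tag{41.20} \]
while for $\delta = 0$ these are the lines with $d = 0$,
\[ \delta = 0 : \quad d = 0 \quad\Rightarrow\quad r(\tau) = v(\tau)/4 . \tag{41.21} \]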
= e 2 (41.22)
(so that scaling corresponds to shifting ) and defining C = 2d, the solution
takes the form
This is the form of the solution given by Poisson (footnote 180). It is more convenient,
however, also for purposes of matching the Vaidya and Schwarzschild geodesics, to use the
affinely parametrised solution given in (41.15).
180 E. Poisson, A Relativist's Toolkit: The Mathematics of Black-Hole Mechanics, section 5.7.2.
41.2 Some Comments on Homotheties, Geodesics and Wronskians
We have seen in a number of different ways that Vaidya metrics with a linear mass
function have some special properties which hint at an underlying additional symmetry
in this case. In this section I collect some (not indispensable) comments related to this
symmetry, to homotheties, Wronskians and the like.
To discover and describe this additional symmetry, note that from the null geodesic
equations (40.2) one finds for a general mass function $m(v)$
\[ \frac{d}{d\tau}\left(r\dot v - v\dot r\right) = \left(v\, m'(v) - m(v)\right)\frac{\dot v^2}{r} . \tag{41.24} \]
Thus
\[ D := r\dot v - v\dot r \tag{41.25} \]
is a constant of motion for outgoing null geodesics iff $m(v) = v\,\partial_v m(v)$, i.e. iff $m(v) = \mu v$
is a linear function of $v$,
\[ \frac{d}{d\tau} D = 0 \quad\Leftrightarrow\quad m(v) = \mu v . \tag{41.26} \]
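This constant of motion can be understood and interpreted as the conserved charge
associated to the dilatation symmetry (homothety) of the Vaidya metric with a linear
mass function,
\[ (v, r) \to (\alpha v, \alpha r) \quad\Rightarrow\quad ds^2 \to \alpha^2 ds^2 . \tag{41.27} \]
This symmetry is generated by the vector field $C$, which satisfies
\[ C = v\partial_v + r\partial_r , \qquad \nabla_\mu C_\nu + \nabla_\nu C_\mu = 2 g_{\mu\nu} \tag{41.28} \]
and therefore (see the discussion in section 9.2, in particular equation (9.8)) leads to
the conserved charge
\[ Q_C = g_{\mu\nu} C^\mu \dot x^\nu \tag{41.29} \]
along null geodesics. For radial motion this is $Q_C = -f(v,r)\, v \dot v + v \dot r + r \dot v$.
Using the null condition (40.9) for radial null geodesics, $f(v,r)\dot v = 2\dot r$,
one then finds that this is precisely (41.25),
\[ f(v,r)\dot v = 2\dot r \quad\Rightarrow\quad Q_C = D . \tag{41.32} \]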
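The conservation of $D$ can also be checked directly on the explicit solutions. The sketch below is a hedged illustration, not part of the original derivation: it assumes the solution (41.9) in the form $r = c_+\tau^{\lambda_+} + c_-\tau^{\lambda_-}$, $v = 2(c_+/\lambda_+)\tau^{\lambda_+} + 2(c_-/\lambda_-)\tau^{\lambda_-}$ with $\lambda_\pm = \frac{1}{2}(1 \pm \sqrt{1 - 16\mu})$, and the function names are hypothetical. It evaluates $D = r\dot v - v\dot r$ at any value of $\tau$.

```python
import cmath


def null_geodesic(mu, c_plus, c_minus, tau):
    """Return (r, v, dr/dtau, dv/dtau) for the assumed form of the solution (41.9)."""
    lp = (1 + cmath.sqrt(1 - 16 * mu)) / 2
    lm = 1 - lp  # the two roots add up to 1
    r = c_plus * tau**lp + c_minus * tau**lm
    v = 2 * c_plus / lp * tau**lp + 2 * c_minus / lm * tau**lm
    r_dot = c_plus * lp * tau**(lp - 1) + c_minus * lm * tau**(lm - 1)
    v_dot = 2 * c_plus * tau**(lp - 1) + 2 * c_minus * tau**(lm - 1)
    return r, v, r_dot, v_dot


def dilatation_charge(mu, c_plus, c_minus, tau):
    """D = r*dv/dtau - v*dr/dtau, expected to be independent of tau."""
    r, v, r_dot, v_dot = null_geodesic(mu, c_plus, c_minus, tau)
    return r * v_dot - v * r_dot
```

Evaluating `dilatation_charge` at several values of `tau` for fixed `mu`, `c_plus`, `c_minus` should return the same number up to rounding, and the result vanishes when either constant is zero, in line with the scale-invariant geodesics being those with $D = 0$.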
Remarks:
1. The scaling symmetry (41.27) implies that if $(r(\tau), v(\tau))$ is a solution to the
geodesic equations, then so is $(\alpha r(\tau), \alpha v(\tau))$, something that we had already
observed post facto based on the explicit solutions (41.9) and (41.15) of the null
geodesic equations. We see e.g. from (41.25) that this scaling is essentially equivalent
to changing the integration constant $D$ by $D \to \alpha^2 D$, and therefore the
freedom to choose the value of $D$ reflects this scale invariance.
4. Inserting (40.12) and (40.13) into (41.25), one finds an algebraic relation between
$r(\tau)$ and $v(\tau)$, namely
\[ \tau D = \tau\left(r\dot v - v\dot r\right) = 2r^2 - vr + 2m(v)v = 2r^2 - vr + 2\mu v^2 , \tag{41.35} \]
5. Because of their analytic tractability, these linear mass Vaidya metrics (also known
as Vaidya metrics describing self-similar gravitational collapse due to the existence
of the homothety) have been much studied in the literature. Other aspects
and consequences of the homothety, in particular in relation to the characterisation
and properties of the singularity, have been explored by Lake and Zannias (footnote 181).
181 K. Lake, T. Zannias, Naked singularities in self-similar gravitational collapse, Phys. Rev. D41 (1990)
3866-3868.
41.3 Event vs Apparent Horizons for $m(v) = \mu v$: Overview
With all this detailed knowledge of null geodesics in the linear mass Vaidya space-time
at our disposal, it is now straightforward to determine the causal properties of this
space-time (and subsequently of the space-time obtained by glueing Schwarzschild to
Vaidya). As in our discussion above, the sign of $\delta = 1 - 16\mu$ and the value $\mu = 1/16$
will turn out to play a special role.
The first and simplest exercise is to determine the apparent horizons and their geometry.
The apparent horizon (31.122) is the hypersurface
\[ f(v,r) = 0 \quad\Leftrightarrow\quad r = 2\mu v \quad (v \geq 0) . \tag{41.38} \]
Thus in the linear mass case this is a straight line in an $(r, v)$-diagram. The induced
metric (31.150) on the apparent horizon is
\[ ds^2|_{f(v,r)=0} = 4\mu\, dv^2 + 4\mu^2 v^2 d\Omega^2 \tag{41.39} \]
or, in terms of $r$,
\[ ds^2|_{f(v,r)=0} = \mu^{-1} dr^2 + r^2 d\Omega^2 . \tag{41.40} \]
Therefore the apparent horizon is spacelike for $\mu > 0$. Radial outgoing null geodesics
reaching $r = 2\mu v$ will attain their maximal radius there and then turn around to smaller
values of $r$.
The intrinsic geometry of the apparent horizon is thus manifestly flat for $\mu = 1$, and
while the factor $\mu^{-1}$ may look harmless it actually leads to a non-trivial curvature tensor
for $\mu \neq 1$, with a curvature singularity at $r = 0$. Explicitly, one finds (see section 7.7,
in particular equations (7.74) and (7.76)) that e.g. the non-trivial components of the
Riemann tensor and the Ricci scalar are
R = 1 , R = 2(1 )r 2 . (41.41)
Further information is obtained from looking at the 2nd derivative $d^2 r/dv^2$, i.e. the 1st
derivative of $f(v, r)$. From (41.17) one has
\[ \frac{d^2 r}{dv^2} = -\frac{\mu}{r^3}\left[(r - v/4)^2 + (\mu - 1/16)\,v^2\right] , \tag{41.42} \]
and from (41.19) we know that $r''(v) = 0$ along the scale-invariant null geodesics (41.20)
and (41.21) given by
\[ r = r_\pm \equiv \frac{\lambda_\pm}{2}\, v . \tag{41.43} \]
There are now three different cases to consider, depending on the sign of $\delta$.
1. $\mu > 1/16$ or $\delta < 0$
There are no real roots for $\mu > 1/16$ and from (41.42) one sees that $r''(v) < 0$
everywhere. It turns out that in this case no lightray will be
able to escape to infinity, i.e. there is no future null infinity $I^+$ at all, and every
lightray ends up in the spacelike singularity at $r = 0$ in the future.
This does not follow just from the fact that $d^2 r/dv^2 < 0$. One has to show that all
outgoing lightrays eventually reach and subsequently cross the apparent horizon
at which $dr/dv = 0$. This can be established by using the explicit real form of
the solution, with its characteristic $\tau^{1/2}$-modulated trigonometric dependence on
$\log\tau$.
Thus in this case there is no event horizon. In fact, the entire space-time for
$v \geq 0$ looks somewhat like the interior of region II of the Kruskal extension of the
Schwarzschild space-time. This is clearly an artefact of having a mass that tends
sufficiently rapidly to infinity. This is illustrated in the Penrose diagram in Figure 59
(footnote 182).
Figure 59: Penrose diagram of the linear mass $m(v) = \mu v$ Vaidya space-time for $v \geq 0$
and $\mu > 1/16$.
Note that it follows from (41.36) and (41.33) that for $\delta < 0$ outgoing null geodesics
satisfy
\[ (r - v/4)^2 + (\mu - 1/16)\,v^2 = 4\omega^2 |c/\lambda|^2\, \tau \geq 0 , \tag{41.44} \]
where $|c|^2 = c_+ c_- = c_+ \bar c_+$ and $|\lambda|^2 = \lambda_+\lambda_- = \lambda_+\bar\lambda_+$ denote the squares of the
absolute value. In particular if $\tau_i > 0$ denotes the (initial) time at which $v(\tau_i) = 0$
(we are discarding $\tau_i = 0$ because for $\tau \to 0$ the functions $\tau^{1/2}\cos(\omega\log\tau)$ and
$\tau^{1/2}\sin(\omega\log\tau)$ are badly behaved), one has
\[ r(\tau_i)^2 = 4\omega^2 |c/\lambda|^2\, \tau_i > 0 . \tag{41.45} \]
This means that outgoing radial null geodesics start off at $v = 0$ at some positive
value of $r$. As one of these outgoing (families of) null geodesics will turn out
to become the event horizon for the metric obtained by combining Vaidya for
$0 \leq v \leq v_0$ with Schwarzschild for $v \geq v_0$, this will lead to the conclusion that in
the case $\delta < 0$ the singularity at $r = 0$ is hidden behind an event horizon (no null
geodesic emerging from $r = 0$ can escape to infinity).
182 For this and the subsequent Penrose diagrams, see e.g. W. Hiscock, L. Williams, D. Eardley, Creation
of particles by shell-focusing singularities, Phys. Rev. D26 (1982) 751-760, and B. Waugh, K. Lake,
Double-null coordinates for the Vaidya metric, Phys. Rev. D34 (1986) 2978-2984.
2. $\mu = 1/16$ or $\delta = 0$
For $\mu = 1/16$ one has a double root at $r_+ = r_- = v/4$. One has $d^2 r/dv^2 \leq 0$, with
equality only for the line (null hypersurface) $r = v/4$,
\[ \frac{d^2 r}{dv^2} = 0 \quad\Leftrightarrow\quad r = v/4 . \tag{41.46} \]
This hypersurface is generated by the outgoing null geodesics with $D = 0$ (or
$d = 0$ in (41.15)).
It turns out that the behaviour of the other outgoing light rays depends strongly
on whether they are in the region $r > v/4$ or in the region $r < v/4$. If they are
in one of those regions initially, they always remain there, and for $r > v/4$ the
lightrays escape to infinity while for $r < v/4$ they reach a maximal radius at the
apparent horizon $r = v/8$ and then return to smaller values of $r$. These results
follow from the general solution (41.15) which we will analyse in a bit more detail
below.
Therefore the event horizon of the $\mu = 1/16$ linear mass Vaidya black hole is the
scale-invariant null hypersurface $r = v/4$, outside the apparent horizon at $r = v/8$.
See Figure 60.
Figure 60: Penrose diagram of the linear mass $m(v) = \mu v$ Vaidya space-time for $v \geq 0$
and $\mu = 1/16$.
3. For $0 < \mu < 1/16$ there are two roots $r_\pm$ and a correspondingly richer phase
diagram for the behaviour of outgoing lightrays in the $(r, v)$-plane. The two
hypersurfaces $r = r_\pm(v)$ along which $r''(v) = 0$ are generated by the $D = 0$ null
geodesics (41.43). They divide the $(r \geq 0, v \geq 0)$ quadrant into 3 wedges:
(a) For $r > r_+ = (\lambda_+/2)v$ one has $r''(v) < 0$, and $c_+ > 0$, $c_- < 0$ (thus $D > 0$).
(b) For $r = r_+$ one has $r''(v) = 0$, and $c_+ > 0$, $c_- = 0$ (thus $D = 0$).
(c) For $(\lambda_-/2)v = r_- < r < r_+$ one has $r''(v) > 0$, and $c_+ > 0$, $c_- > 0$ (thus
$D < 0$).
(d) For $r = r_-$ one has $r''(v) = 0$, and $c_+ = 0$, $c_- > 0$ (thus $D = 0$).
(e) For $r < r_-$ one has $r''(v) < 0$, and $c_+ < 0$, $c_- > 0$ (thus $D > 0$).
Since
\[ 2\mu < \lambda_-/2 \quad\Leftrightarrow\quad \sqrt{1 - 16\mu} < 1 - 8\mu \tag{41.47} \]
(which is always satisfied in the given range of $\mu$ as can be seen by squaring the
two positive sides of this inequality), the apparent horizon at $r = 2\mu v$ lies in
the lowest wedge. This has the following implications for the behaviour of null
geodesics (that can also be checked from the explicit solution (41.9)):
- Null geodesics that start off in the lowest wedge cross the apparent horizon
at $r = 2\mu v$ and then return to smaller values of $r$ (and ultimately to $r = 0$ at
finite $v$), regardless of whether they were initially above the apparent horizon
(truly outgoing at that time) or below it.
- Null geodesics in the central region, including the two lines $r = (\lambda_\pm/2)v$, start
off at $r = 0$ for $v = 0$ and reach $r = \infty$ for $v = \infty$, with $r''(v) \geq 0$.
- Null geodesics in the region $r > (\lambda_+/2)v$ have $r(v = 0) > 0$ and escape to
infinity with $r''(v) < 0$.
It follows that the lower line $r = r_- = (\lambda_-/2)v$ is the event horizon of the $\delta > 0$
linear mass Vaidya black hole. It again lies outside the apparent horizon. See
Figure 61.
Figure 61: Penrose diagram of the linear mass $m(v) = \mu v$ Vaidya space-time for $v \geq 0$
and $\mu < 1/16$.
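The case distinction of this overview can be summarised in a few lines of Python. This is a minimal sketch under the assumptions used here (the slopes of the scale-invariant null geodesics are $\lambda_\pm/2 = (1 \pm \sqrt{\delta})/4$ with $\delta = 1 - 16\mu$, and the apparent horizon has slope $2\mu$). The function name and the dictionary keys are hypothetical.

```python
import math


def horizon_slopes(mu):
    """Slopes r/v of the distinguished straight lines for m(v) = mu*v with mu > 0.

    Returns a dict with delta = 1 - 16*mu, the apparent horizon slope 2*mu,
    the slopes of the scale-invariant null geodesics (empty for delta < 0),
    and the slope of the event horizon of the eternal linear mass geometry
    (None for delta < 0, where the text finds no event horizon).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    delta = 1 - 16 * mu
    result = {"delta": delta, "apparent_horizon": 2 * mu}
    if delta < 0:
        result["scale_invariant"] = ()
        result["event_horizon"] = None
    else:
        root = math.sqrt(delta)
        lower = (1 - root) / 4  # lambda_-/2
        upper = (1 + root) / 4  # lambda_+/2
        result["scale_invariant"] = (lower, upper)
        result["event_horizon"] = lower  # equals 1/4 when delta = 0
    return result
```

For instance, `horizon_slopes(1 / 16)` gives an event horizon slope of 0.25 and an apparent horizon slope of 0.125, matching $r = v/4$ and $r = v/8$.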
41.4 Null Geodesics, Horizons and Singularities for $\mu = 1/16$
In this section we take a slightly more detailed look at the special case $\mu = 1/16$. The
solution for outgoing radial null geodesics is given in (41.15), which we now compactly
write as
\[ r(\tau) = c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau = v(\tau)/4 + 2d\,\tau^{1/2} . \tag{41.48} \]
The behaviour of these geodesics depends crucially on the sign of the constant $d$.
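2. For $d \neq 0$ there are two distinct types of geodesics, depending on the sign of $d$,
namely those with $r > v/4$ at all times ($d > 0$) and those with $r < v/4$ at all
times ($d < 0$):
(a) For $d > 0$ one has $r > v/4$. The requirement $v(\tau) \geq 0$ (implying $r(\tau) \geq 0$ in
this case) leads to the condition that $\tau \geq \tau_{min}$,
\[ \tau_{min} = e^{(2 - c/d)} : \quad v(\tau_{min}) = 0 , \quad r(\tau_{min}) = 2d\,\tau_{min}^{1/2} . \tag{41.50} \]
(b) For $d < 0$ one has $r < v/4$. Such a geodesic reaches the apparent horizon, where
$\dot r = 0$, at
\[ \dot r(\tau_a) = 0 \quad\Leftrightarrow\quad \tau_a = e^{(c/|d| - 2)} < \tau_{max} \quad\Rightarrow\quad r(\tau_a) = v(\tau_a)/8 . \tag{41.54} \]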
Monotonicity of the mass function (and the explicit solution) imply that these
will then not turn around again (and reach $r = 0$ at $\tau = \tau_{max}$).
Thus the null geodesic $r = v/4$ is the last null geodesic to (barely) escape to infinity,
and the event horizon is the null hypersurface
\[ r = v/4 . \tag{41.55} \]
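The special parameter values appearing above for $\mu = 1/16$ can be evaluated with the following Python sketch. It assumes the solution in the form $r(\tau) = c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau$, $v(\tau) = 4r(\tau) - 8d\,\tau^{1/2}$, with $\tau$ the affine parameter; the function names are hypothetical.

```python
import math


def r_v(c, d, tau):
    """(r, v) at affine parameter tau > 0 for mu = 1/16, assumed form of (41.48)."""
    s = math.sqrt(tau)
    r = c * s + d * s * math.log(tau)
    v = 4 * r - 8 * d * s
    return r, v


def tau_start(c, d):
    """For d > 0: parameter value at which v = 0, i.e. exp(2 - c/d)."""
    if d <= 0:
        raise ValueError("tau_start is meant for d > 0")
    return math.exp(2 - c / d)


def tau_turn(c, d):
    """For d < 0: parameter value at which dr/dtau = 0, i.e. exp(c/|d| - 2)."""
    if d >= 0:
        raise ValueError("tau_turn is meant for d < 0")
    return math.exp(c / abs(d) - 2)
```

With `c = 1` and `d = -0.5`, `tau_turn` returns 1, where `r_v(1, -0.5, 1)` gives $r = 1$ and $v = 8$, so the turning point indeed lies on the apparent horizon $r = v/8$.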
Remarks:
1. The event horizon at $r = v/4$ manifestly lies outside the apparent horizon $r = v/8$.
In particular, while the outgoing expansion (31.132)
\[ \theta = \frac{r - 2m(v)}{r^2} = \frac{r - v/8}{r^2} \tag{41.56} \]
is zero on the apparent horizon (by our informal definition of the apparent horizon
in section 31.8), it is strictly positive on the event horizon,
\[ \theta|_{v=4r} = \frac{1}{2r} > 0 . \tag{41.57} \]
2. The event horizon satisfies r(v = 0) = 0, so that it emerges from r = 0 at the time
v = 0. At this time v = 0 the singularity at r = 0 is massless, m(v = 0) = 0,
and one might be tempted to conclude, e.g. from the Kretschmann scalar (39.11),
which now has the form,
\[ R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} = 48\mu^2 v^2/r^6 , \tag{41.58} \]
This becomes more manifest when one introduces, instead of r, the scale-invariant
coordinate x = v/r, say. Then one evidently has
\[ R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} = 48\mu^2 x^6 v^{-4} \tag{41.60} \]
4. The fact that there are lightrays (albeit only a single $S^2$-family of lightrays) that
can escape from the singularity to infinity means that the singularity is not com-
pletely hidden behind the event horizon, and is barely / marginally naked. Such
massless singularities are considered to be relatively harmless, and therefore they
are not considered to be genuine counterexamples to the spirit of the (or an ap-
propriately formulated) cosmic censorship conjecture.
We now consider a slightly more realistic and interesting scenario in which there is an
ingoing shell of radiation (null dust) during a finite interval of time $v$ leaving behind a
Schwarzschild black hole. In equations this means that we consider the Vaidya metric
with
\[ f(v,r) = 1 - \frac{2m(v)}{r} , \qquad m(v) = \begin{cases} 0 & v \leq 0 \\ v/16 & 0 \leq v \leq v_0 \\ m_0 = v_0/16 & v \geq v_0 \end{cases} \tag{41.62} \]
Note that the mass function is continuous, but that its derivative $m'(v)$ has a jump,
\[ m'(v) = \tfrac{1}{16}\left[\Theta(v) - \Theta(v - v_0)\right] , \tag{41.63} \]
To return to the current setting, for $v \geq v_0$ the metric with $f(v, r)$ given by (41.62) is
the Schwarzschild metric, with apparent horizon = event horizon at
\[ v \geq v_0 : \quad r_0 = 2m_0 = v_0/8 , \tag{41.64} \]
where we have chosen the freedom to scale $\tau$ so that $v(\tau = 1) = v_0$. Thus this describes
the location of the horizon for $\tau \geq 1$.
For $0 \leq v \leq v_0$, the apparent horizon is at $r = v/8$. This matches onto the apparent =
event horizon of the Schwarzschild black hole for $v \geq v_0$.
In order to determine the event horizon of the total geometry, we now need to determine
the lightray for $0 \leq v \leq v_0$ that matches onto the above event horizon of the
Schwarzschild geometry. Since $r_0 = v_0/8 < v_0/4$, it is clear that this is to be sought among the
geodesics given in (41.48) with $d < 0$,
Since
\[ r(\tau = 1) = c , \qquad v(\tau = 1) = 4c + 8|d| , \tag{41.67} \]
the matching conditions determine the integration constants,
\[ r(\tau = 1) = r_0 , \quad v(\tau = 1) = v_0 = 8r_0 \quad\Rightarrow\quad c = r_0 , \quad d = -r_0/2 . \tag{41.68} \]
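With the constants fixed by the matching, the event horizon in the Vaidya region can be traced explicitly. The sketch below assumes the form $r(\tau) = c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau$, $v(\tau) = 4r(\tau) - 8d\,\tau^{1/2}$ of the $\mu = 1/16$ solution and inserts $c = r_0$, $d = -r_0/2$; the function name is hypothetical and the parameter range $0 < \tau \leq 1$ corresponds to $0 < v \leq v_0$.

```python
import math


def event_horizon_point(r0, tau):
    """(r, v) on the event horizon in the Vaidya region, 0 < tau <= 1.

    Uses c = r0 and d = -r0/2 in the assumed mu = 1/16 solution, so that
    r(1) = r0 and v(1) = 8*r0 match the Schwarzschild horizon.
    """
    if not 0 < tau <= 1:
        raise ValueError("the Vaidya region corresponds to 0 < tau <= 1")
    s = math.sqrt(tau)
    r = r0 * s * (1 - 0.5 * math.log(tau))
    v = 4 * r + 4 * r0 * s  # v = 4 r - 8 d tau^(1/2) with d = -r0/2
    return r, v
```

Sampling `event_horizon_point(r0, tau)` for a few values of `tau` below 1 shows the horizon radius lying between $v/8$ (the apparent horizon) and $v/4$, with the gap to the apparent horizon closing as `tau` approaches 1.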
Remarks:
2. The explicit expression for the event horizon confirms that in the Vaidya region
it lies in the region between r = v/4 and the apparent horizon at r = v/8, thus
outside the apparent horizon. Explicitly, for the difference between the event
horizon and the apparent horizon (given at each time by r = v/8) one has
3. This last equation also gives us information about the rate of expansion of the
event horizon, given by (40.12) by
One might perhaps naively have expected the expansion rate of the horizon to
increase during a period of infalling matter, but this equation shows that quite
the opposite is true: during the period that matter falls in, the event horizon
of course grows, but its expansion rate decreases, the expansion stopping when
matter stops falling in at $\tau = 1$. This is a general and somewhat counterintuitive
feature of event horizons, reflecting the non-locality of the event horizon. We
discussed this in general terms with the help of the Raychaudhuri equation applied
to the generating null congruence of an event horizon in section 31.7, and we also
observed this in the, in other respects quite different, example of event horizons
in the Oppenheimer-Snyder geometry of a collapsing star in section 31.12.
(a) all geodesics that satisfy r(v0 ) > r0 are outside the Schwarzschild horizon,
and thus describe outgoing null geodesics in the Schwarzschild geometry that
escape to infinity; and
(b) all geodesics that satisfy r(v0 ) < v0 /4 emerged from r = 0 at time v = 0.
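To conclude this discussion, I just mention that the situation for $\mu \neq 1/16$, i.e. for the
Vaidya metric with
\[ f(v,r) = 1 - \frac{2m(v)}{r} , \qquad m(v) = \begin{cases} 0 & v \leq 0 \\ \mu v & 0 \leq v \leq v_0 \\ m_0 = \mu v_0 & v \geq v_0 \end{cases} \tag{41.73} \]
is the following: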
1. For $\mu > 1/16$ the entire singularity at $r = 0$ (the $v$-axis) is hidden behind the
event horizon and geodesics that reach infinity emerged from some finite value of
$r$ at $v = 0$ (and for $v < 0$ there was just Minkowski space).
2. For $\mu < 1/16$ there are many lightlike geodesics that emerge from $r = 0$ and that
reach infinity. Thus this is like the case $\mu = 1/16$, because the geodesics that
escape from $r = 0$ to $r \to \infty$ emerge from $r = 0$ at $v = 0$ (so that again this is a
massless singularity). It would have been worse if they had emerged from $r = 0$
at some time $v > 0$, where the singularity is massive, but this does not happen.
It is straightforward to establish all this analytically by using the explicit solution (41.9)
for the outgoing lightrays. It is also an instructive (and fairly elementary) exercise to
generalise the above discussion to the case where one initially has a constant mass
Schwarzschild black hole instead of Minkowski space, i.e. one has a mass function
\[ m(v) = \begin{cases} m_0 & v \leq v_0 \\ m_0 + \mu(v - v_0) & v_0 \leq v \leq v_1 \\ m_1 = m_0 + \mu(v_1 - v_0) & v \geq v_1 \end{cases} \tag{41.74} \]
Here is an alternative derivation of the solutions (41.9) and (41.15) for outgoing null
geodesics of the linear mass $m(v) = \mu v$ ingoing Vaidya metric for any $\mu$ by determining
the solution of the coupled set of linear homogeneous differential equations (41.4) with
constant coefficients,
\[ \frac{d}{dt}\begin{pmatrix} r \\ v \end{pmatrix} = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix}\begin{pmatrix} r \\ v \end{pmatrix} . \tag{41.75} \]
We write this as
\[ d\vec x/dt = L\vec x \tag{41.76} \]
with $\vec x = (r, v)^t$ ($(.)^t$ denoting the transpose column vector) and
\[ L = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} . \tag{41.77} \]
One standard (but in general cumbersome) way to solve this equation is to exponentiate
the matrix $L$,
\[ \vec x(t) = e^{tL}\vec x(0) . \tag{41.78} \]
Another possibility is to diagonalise $L$ either explicitly or implicitly. We follow the latter
approach by making the ansatz
\[ \vec x(t) = \sum_J c_J \vec a_J e^{\lambda_J t} \equiv \sum_J c_J \vec x_J(t) , \tag{41.79} \]
where for a $(d \times d)$-matrix $L$ the index $J = 1, 2, \ldots, d$ labels the $d$ linearly independent
solutions, the frequencies $\lambda_J$ and their eigendirections $\vec a_J$ are to be determined, and the
$c_J$ are integration constants. Plugging this ansatz into (41.76) one finds
\[ (\lambda_J - L)\,\vec a_J \equiv M(\lambda_J)\,\vec a_J = 0 . \tag{41.80} \]
Thus the $\lambda_J$ are the eigenvalues of $L$ and the $\vec a_J$ are the corresponding eigenvectors.
The $\lambda_J$ are the roots of the degree $d$ polynomial equation
\[ \det M(\lambda) = \det(\lambda - L) = 0 . \tag{41.81} \]
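1. If all the eigenvalues / roots are distinct, then (41.79) already gives the general
solution.
2. If some eigenvalues are degenerate but there are nevertheless $d$ linearly independent
eigenvectors, then (41.79) still gives the general solution.
3. If eigenvalues and eigenvectors are degenerate, then (41.79) does not provide $d$
linearly independent solutions. A resonance phenomenon (familiar from forced
oscillations) arises, resulting in the appearance of a term proportional to $t \exp \lambda_J t$.
Indeed, in such a case the general solution associated with the degenerate root $\lambda_J$
(let us assume that it is twice degenerate) is of the form
\[ \vec x_J = c_J \vec a_J e^{\lambda_J t} + d_J \left(\vec a_J\, t\, e^{\lambda_J t} + \vec b_J e^{\lambda_J t}\right) \tag{41.82} \]
where the vector $\vec b_J$ is a solution of
\[ M(\lambda_J)\,\vec b_J = -\vec a_J \tag{41.83} \]
and evidently only defined modulo (i.e. up to addition of a multiple of) $\vec a_J$. Using
(41.83), it is easy to see that indeed the 2nd part of (41.82) is a solution of (41.76),
\[ \left(\frac{d}{dt} - L\right)\left(\vec a_J\, t\, e^{\lambda_J t} + \vec b_J e^{\lambda_J t}\right) = 0 . \tag{41.84} \]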
The eigenvectors $\vec a_J$ can either be readily constructed by hand (for small $d$) or, in
general, from the minors (cofactors) of the matrix $M$. For a $(2 \times 2)$-matrix one can e.g.
choose
\[ \vec a_J = \begin{pmatrix} M_{22}(\lambda_J) \\ -M_{21}(\lambda_J) \end{pmatrix} . \tag{41.85} \]
If $M_{22} = M_{21} = 0$, one can alternatively construct $\vec a_J$ from the components of the
1st row. This only fails if $M(\lambda_J) = 0$ identically, in which case the construction of
two linearly independent eigenvectors is trivial anyway: one can e.g. choose $(1, 0)^t$
and $(0, 1)^t$.
Moreover, for $(2 \times 2)$-matrices, case (2) in the above list can only arise if $M(\lambda_J) = 0$
identically. Otherwise there will be a resonance iff one has a degenerate root.
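Specifically and concretely, in the case at hand, with $L$ the $(2 \times 2)$-matrix given in
(41.77), one has
\[ L = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} \quad\Rightarrow\quad M(\lambda) = \begin{pmatrix} \lambda - 1 & 2\mu \\ -2 & \lambda \end{pmatrix} , \tag{41.86} \]
1. The eigenvalues are the roots of $\det M(\lambda) = \lambda^2 - \lambda + 4\mu = 0$,
and we see (rediscover from the present point of view) that there are 3 distinct
cases, depending on the sign of $\delta = 1 - 16\mu$ (41.10):
(a) $\mu < 1/16$ or $\delta > 0$: two distinct real roots $\lambda_\pm = \tfrac{1}{2}(1 \pm \sqrt{\delta})$.
(b) $\mu > 1/16$ or $\delta < 0$: two complex conjugate roots,
\[ \lambda_+ = \tfrac{1}{2}(1 + i\omega) = \bar\lambda_- , \qquad \omega^2 = 16\mu - 1 > 0 . \tag{41.89} \]
(c) $\mu = 1/16$ or $\delta = 0$:
In this case one has one degenerate root $\lambda = 1/2$, the matrix
\[ M(\lambda = 1/2) = \begin{pmatrix} -1/2 & 1/8 \\ -2 & 1/2 \end{pmatrix} \tag{41.90} \]
is not identically zero, and therefore in this case the solution will involve an
additional term
\[ t\, e^{t/2} = \tau^{1/2}\log\tau \tag{41.91} \]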
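2. The eigenvectors $\vec a_J = \vec a_\pm$ can be chosen to have the simple form (41.85),
\[ \vec a_\pm = \begin{pmatrix} M_{22}(\lambda_\pm) \\ -M_{21}(\lambda_\pm) \end{pmatrix} = \begin{pmatrix} \lambda_\pm \\ 2 \end{pmatrix} . \tag{41.92} \]
3. For $\mu = 1/16$, $\lambda = 1/2$, and the secondary null vector can be chosen to be
\[ \vec b = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\Rightarrow\quad M\vec b = \begin{pmatrix} -1/2 & 1/8 \\ -2 & 1/2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = -\begin{pmatrix} 1/2 \\ 2 \end{pmatrix} = -\vec a . \tag{41.93} \]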
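Putting everything together, we can now find the general solution in the 3 cases. We
will immediately write things again in terms of proper time $\tau$, related to the parameter
$t$ by $\tau = \exp t$.
For $\mu \neq 1/16$ one has
\[ \begin{aligned} r(\tau) &= \tilde c_+ \lambda_+ \tau^{\lambda_+} + \tilde c_- \lambda_- \tau^{\lambda_-} \\ v(\tau) &= 2\tilde c_+ \tau^{\lambda_+} + 2\tilde c_- \tau^{\lambda_-} \end{aligned} \tag{41.94} \]
with $\tilde c_\pm$ real constants for $\delta > 0$ and complex conjugates of each other for $\delta < 0$.
This agrees with the solution (41.9) with the identification $c_\pm = \lambda_\pm \tilde c_\pm$.
For $\mu = 1/16$ one has, from (41.82), (41.92) and (41.93),
\[ \begin{aligned} r(\tau) &= (\tilde c/2 + \tilde d)\,\tau^{1/2} + (\tilde d/2)\,\tau^{1/2}\log\tau \\ v(\tau) &= 2\tilde c\,\tau^{1/2} + 2\tilde d\,\tau^{1/2}\log\tau . \end{aligned} \tag{41.95} \]
This agrees with the solution (41.15) with the identification $\tilde c = 2c - 4d$, $\tilde d = 2d$.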
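The two routes mentioned in this appendix (exponentiating $L$ versus using the closed-form solutions) can be compared numerically. The following Python sketch is an illustration only: it uses a plain Taylor series for the matrix exponential (a choice made here for self-containedness, not a method taken from the text), assumes $t = \log\tau$ and $L = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix}$, and all function names are hypothetical.

```python
import cmath
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(m, terms=40):
    """exp(m) for a 2x2 matrix via its Taylor series (fine for modest norms)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def evolve(mu, r1, v1, tau):
    """Propagate (r, v) from tau = 1 to tau using x(t) = exp(t L) x(0), t = log(tau)."""
    t = math.log(tau)
    e = mat_exp([[t, -2 * mu * t], [2 * t, 0.0]])
    return e[0][0] * r1 + e[0][1] * v1, e[1][0] * r1 + e[1][1] * v1


def closed_form(mu, c_plus, c_minus, tau):
    """(r, v) for mu != 1/16 in the assumed form of (41.9); real parts returned."""
    lp = (1 + cmath.sqrt(1 - 16 * mu)) / 2
    lm = 1 - lp
    r = c_plus * tau**lp + c_minus * tau**lm
    v = 2 * c_plus / lp * tau**lp + 2 * c_minus / lm * tau**lm
    return r.real, v.real


def closed_form_critical(c, d, tau):
    """(r, v) for mu = 1/16 in the assumed form of (41.15)."""
    s = math.sqrt(tau)
    r = c * s + d * s * math.log(tau)
    return r, 4 * r - 8 * d * s
```

Starting from the closed-form values at $\tau = 1$ and propagating with `evolve` should reproduce the closed-form values at any other $\tau > 0$ up to rounding, including in the resonant case $\mu = 1/16$.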
42 Exact Wave-like Solutions of the Einstein Equations
Such wave-metrics have been studied in the context of four-dimensional general relativity
for a long time even though they are not (and were never meant to be) phenomeno-
logically realistic models of gravitational plane waves. The reason for this is that in
the far-field gravitational waves are so weak that the linearised Einstein equations and
their solutions are adequate to describe the physics, whereas the near-field strong grav-
itational effects responsible for the production of gravitational waves, for which the
linearised equations are indeed insufficient, correspond to much more complicated solu-
tions of the Einstein equations (describing e.g. two very massive stars orbiting around
their common center of mass).
However, pp-waves have been useful and of interest as a theoretical play-ground since
they are in some sense the simplest essentially Lorentzian metrics with no non-trivial
Riemannian counterparts. As such they also provide a wealth of counterexamples to
conjectures that one might like to make about Lorentzian geometry by naive extrapo-
lation from the Riemannian case. They have also enjoyed some popularity in the string
theory literature as potentially exact and exactly solvable string theory backgrounds.
However, they seem to have made it into very few textbook accounts of general relativ-
ity, and the purpose of this section is to at least partially fill this gap by providing a
brief introduction to this topic.
We have seen in section 22 that a metric describing the propagation of a plane wave in
the $x^3$-direction (22.86) can be written as
We will now simply define a plane wave metric in general relativity to be a metric of
the above form, dropping the assumption that $h_{ij}$ be small,
We will say that this is a plane wave metric in Rosen coordinates. This is not the
coordinate system in which plane waves are usually discussed, among other reasons
because typically in Rosen coordinates the metric exhibits spurious coordinate singu-
larities. This led to the mistaken belief in the past that there are no non-singular plane
wave solutions of the non-linear Einstein equations. We will establish the relation to
the more common and much more useful Brinkmann coordinates below.
Plane wave metrics are characterised by a single matrix-valued function of $U$, but two
metrics with quite different $g_{ij}$ may well be isometric. For example,
\[ d\bar s^2 = 2\,dU\,dV + U^2 d\vec y^{\,2} \tag{42.4} \]
is isometric to the flat Minkowski metric whose natural presentation in Rosen coordinates
is simply the Minkowski metric in lightcone coordinates,
\[ d\bar s^2 = 2\,dU\,dV + d\vec y^{\,2} . \tag{42.5} \]
This is not too difficult to see, and we will establish this as a consequence of a more
general result later on (but if you want to try this now, try scaling $\vec y$ by $U$ and do
something to $V$ ...).
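For readers who want to check a candidate transformation numerically, here is a small Python sketch. The transformation used, $x^i = U y^i$ and $W = V - \frac{1}{2} U \vec y^{\,2}$ (with $W$ the second lightcone coordinate of the flat metric $2\,dU\,dW + d\vec x^{\,2}$), is an assumption made for this illustration and is not quoted from the text; the function name is hypothetical. The code pulls the flat metric back along this map via the Jacobian and returns the resulting metric components in the coordinates $(U, V, y^1, \ldots, y^n)$.

```python
def pulled_back_metric(U, y):
    """Pull back 2 dU dW + dx.dx under W = V - U*|y|^2/2, x_i = U*y_i.

    Returns the metric components as a nested list in the coordinate order
    (U, V, y_1, ..., y_n). The result does not depend on V.
    """
    n = len(y)
    dim = n + 2
    y_sq = sum(yi * yi for yi in y)
    # Jacobian: rows are the flat coordinates (U, W, x_1..x_n),
    # columns are the new coordinates (U, V, y_1..y_n).
    jac = [[0.0] * dim for _ in range(dim)]
    jac[0][0] = 1.0
    jac[1][0] = -0.5 * y_sq
    jac[1][1] = 1.0
    for i, yi in enumerate(y):
        jac[1][2 + i] = -U * yi
        jac[2 + i][0] = yi
        jac[2 + i][2 + i] = U
    # Flat metric 2 dU dW + dx.dx in the coordinates (U, W, x_1..x_n).
    eta = [[0.0] * dim for _ in range(dim)]
    eta[0][1] = eta[1][0] = 1.0
    for i in range(n):
        eta[2 + i][2 + i] = 1.0
    return [[sum(jac[a][m] * eta[a][b] * jac[b][k]
                 for a in range(dim) for b in range(dim))
             for k in range(dim)] for m in range(dim)]
```

If the assumed transformation is right, the only non-vanishing components returned are the off-diagonal $UV$ entries (equal to 1) and the diagonal transverse entries (equal to $U^2$), which is the form (42.4).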
That (42.4) is indeed flat should in any case not be too surprising. It is the null
counterpart of the spacelike fact that $ds^2 = dr^2 + r^2 d\Omega^2$, with $d\Omega^2$ the unit line
element on the sphere, is just the flat Euclidean metric in polar coordinates, and the
timelike statement that
\[ ds^2 = -dt^2 + t^2 d\tilde\Omega^2 , \tag{42.6} \]
with $d\tilde\Omega^2$ the unit line element on the hyperboloid, is just (a wedge of) the flat Minkowski
metric. In cosmology this is known as the Milne Universe discussed in section 36.1, a
rather trivial solution of the Friedmann equations with $k = -1$, $a(t) = t$ and $\rho = p = 0$.
It is somewhat less obvious, but still true, that for example the two metrics
In the remainder of this section we will study gravitational plane waves in a more
systematic way. One of the characteristic features of the above plane wave metrics is
the existence of a nowhere vanishing covariantly constant null vector field, namely $\partial_V$.
We thus begin by deriving the general metric (line element) for a space-time admitting
such a covariantly constant null vector field. We will from now on consider general
$(d + 2)$-dimensional space-times, where $d$ is the number of transverse dimensions.
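Thus, let $Z$ be a parallel (i.e. covariantly constant) null vector of the $(d+2)$-dimensional
Lorentzian metric $g_{\mu\nu}$, $\nabla_\mu Z^\nu = 0$. This condition is equivalent to the pair of conditions
\[ \nabla_\mu Z_\nu + \nabla_\nu Z_\mu = 0 \tag{42.8} \]
\[ \nabla_\mu Z_\nu - \nabla_\nu Z_\mu = 0 . \tag{42.9} \]
The first of these says that $Z$ is a Killing vector field, and the second that $Z$ is also
a gradient vector field. If $Z$ is nowhere zero, without loss of generality we can assume
that
\[ Z = \partial_v \tag{42.10} \]
for some coordinate $v$ since this simply means that we are using a parameter along
the integral curves of $Z$ as our coordinate $v$. In terms of components this means that
$Z^\mu = \delta^\mu_v$, or
\[ Z_\mu = g_{\mu v} . \tag{42.11} \]
The fact that $Z$ is null is then equivalent to
\[ Z_v = g_{vv} = 0 . \tag{42.12} \]
The Killing equation now implies that all the components of the metric are $v$-independent,
\[ \partial_v g_{\mu\nu} = 0 . \tag{42.13} \]
The second condition (42.9) is identical to
\[ \nabla_\mu Z_\nu - \nabla_\nu Z_\mu = 0 \quad\Leftrightarrow\quad \partial_\mu Z_\nu - \partial_\nu Z_\mu = 0 , \tag{42.14} \]
which implies that locally we can find a function $u = u(x^\mu)$ such that
\[ Z_\mu = g_{v\mu} = \partial_\mu u . \tag{42.15} \]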
There are no further constraints, and thus the general form of a metric admitting a
parallel null vector is, changing from the $x^\mu$-coordinates to $\{u, v, x^a\}$, $a = 1, \ldots, d$,
$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu$
$= 2\,du\,dv + g_{uu}(u, x^c)\,du^2 + 2 g_{au}(u, x^c)\,dx^a du + g_{ab}(u, x^c)\,dx^a dx^b$
$\equiv 2\,du\,dv + K(u, x^c)\,du^2 + 2 A_a(u, x^c)\,dx^a du + g_{ab}(u, x^c)\,dx^a dx^b$ . (42.16)
Note that if we had considered a metric with a covariantly constant timelike or spacelike
vector, then we would have obtained the above metric with an additional term of the
form $\pm dv^2$. In that case, the cross-term $2\,du\,dv$ could have been eliminated by shifting
$v \to v' = v \pm u$, and the metric would have factorised into $\pm dv'^2$ plus a $v'$-independent
metric. Such a factorisation does in general not occur for a covariantly constant null
vector, which makes metrics with such a vector potentially more interesting than their
timelike or spacelike counterparts.
There are still residual coordinate transformations which leave the above form of the
metric invariant. For example, both $K$ and $A_a$ can be eliminated in favour of $g_{ab}$. We
will not pursue this here, as we are primarily interested in a special class of metrics
which are characterised by the fact that $g_{ab} = \delta_{ab}$,
$ds^2 = 2\,du\,dv + K(u, x^b)\,du^2 + 2 A_a(u, x^b)\,dx^a du + d\vec{x}^{\,2}$ . (42.17)
Such metrics are called plane-fronted waves with parallel rays, or pp-waves for short.
"Plane-fronted" refers to the fact that the wave fronts $u = $ const. are planar (flat),
and "parallel rays" refers to the existence of a parallel null vector. Once again, there
are residual coordinate transformations which leave this form of the metric invariant.
Among them are shifts of $v$, $v \to v + \Lambda(u, x^a)$, under which the coefficients $K$ and $A_a$
transform as
$K \to K + \frac{1}{2}\partial_u \Lambda$
$A_a \to A_a + \partial_a \Lambda$ . (42.18)
Plane waves are a very special kind of pp-waves. By definition, a plane wave metric is
a pp-wave with $A_a = 0$ and $K(u, x^a)$ quadratic in the $x^a$ (zeroth and first order terms
in $x^a$ can be eliminated by a coordinate transformation),
$d\bar{s}^2 = 2\,du\,dv + A_{ab}(u)\,x^a x^b\,du^2 + d\vec{x}^{\,2}$ . (42.19)
We will say that this is the metric of a plane wave in Brinkmann coordinates. The
relation between the expressions for a plane wave in Brinkmann coordinates and Rosen
coordinates will be explained in section 42.5. From now on barred quantities will refer
to plane wave metrics.
In Brinkmann coordinates a plane wave metric is characterised by a single symmetric
matrix-valued function $A_{ab}(u)$. Generically there is very little redundancy in the
description of plane waves in Brinkmann coordinates, i.e. there are very few residual
coordinate transformations that leave the form of the metric invariant, and the metric
is specified almost uniquely by $A_{ab}(u)$. In particular, as we will see below, a plane wave
metric is flat if and only if $A_{ab}(u) = 0$ identically. Contrast this with the non-uniqueness
of the flat metric in Rosen coordinates. This uniqueness of the Brinkmann coordinates
is one of the features that makes them convenient to work with in concrete applications.
i.e. the solutions $x^\mu(\tau)$ to the geodesic equations
$\ddot{x}^\mu(\tau) + \Gamma^\mu_{\ \nu\lambda}(x(\tau))\,\dot{x}^\nu(\tau)\,\dot{x}^\lambda(\tau) = 0$ , (42.21)
Rather than determining the geodesic equations by first calculating all the non-zero
Christoffel symbols, we make use of the fact that the geodesic equations can be obtained
more efficiently, and in a way that allows us to directly make use of the symmetries of
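the problem, as the Euler-Lagrange equations of the Lagrangian
$L = \frac{1}{2} g_{\mu\nu}\,\dot{x}^\mu \dot{x}^\nu$
$= \dot{u}\dot{v} + \frac{1}{2} A_{ab}(u)\,x^a x^b\,\dot{u}^2 + \frac{1}{2}\dot{\vec{x}}^{\,2}$ . (42.22)
Since the Lagrangian does not depend on $v$, the momentum $p_v = \partial L/\partial\dot{v} = \dot{u}$ is
conserved, and one can choose the affine parameter $\tau$ such that
$u = p_v \tau$ . (42.25)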
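Then the geodesic equations for the transverse coordinates are the Euler-Lagrange equations
$\ddot{x}^a(\tau) = A_{ab}(p_v\tau)\,x^b(\tau)\,p_v^2$ . (42.26)
These are the equations of motion of a non-relativistic harmonic oscillator,
$\ddot{x}^a(\tau) = -\omega^2_{ab}(\tau)\,x^b(\tau)$ , (42.27)
with the (possibly time-dependent) frequency matrix $\omega^2_{ab}(\tau) = -p_v^2 A_{ab}(p_v\tau)$.
The constraint
$p_v \dot{v}(\tau) + \frac{1}{2} A_{ab}(p_v\tau)\,x^a(\tau) x^b(\tau)\,p_v^2 + \frac{1}{2}\dot{x}^a(\tau)\dot{x}^a(\tau) = 0$ (42.29)
for null geodesics (the non-null case can be dealt with in the same way) implies, and
thus provides a first integral for, the $v$-equation of motion. Multiplying the oscillator
equation by $x^a$ and inserting this into the constraint, one finds that this can be further
integrated to
$p_v v(\tau) = -\frac{1}{2} x^a(\tau)\,\dot{x}^a(\tau) + p_v v_0$ . (42.30)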
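The first integral (42.30) can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a single transverse direction ($d = 1$), a made-up profile `profile(u)`, and a hand-written fourth-order Runge-Kutta integrator. It integrates the oscillator equation (42.26) together with the null constraint solved for $\dot{v}$, and returns the amount by which $p_v v + \frac{1}{2} x \dot{x}$ drifts from its initial value.

```python
import math

def profile(u):
    # Hypothetical wave profile A(u) for one transverse direction (d = 1);
    # any smooth function of u would do for this check.
    return -(1.0 + 0.5 * math.cos(u))

def rhs(tau, state, p_v):
    x, xdot, v = state
    a = profile(p_v * tau)
    xddot = a * x * p_v ** 2                                      # oscillator equation (42.26)
    vdot = -(0.5 * a * x * x * p_v ** 2 + 0.5 * xdot ** 2) / p_v  # null constraint solved for dv/dtau
    return (xdot, xddot, vdot)

def rk4_step(tau, state, h, p_v):
    # One classical fourth-order Runge-Kutta step of size h.
    k1 = rhs(tau, state, p_v)
    k2 = rhs(tau + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)), p_v)
    k3 = rhs(tau + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)), p_v)
    k4 = rhs(tau + h, tuple(s + h * k for s, k in zip(state, k3)), p_v)
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def first_integral_residual(x0, xdot0, v_start, p_v=1.0, tau_end=5.0, steps=5000):
    # Returns p_v v + x xdot / 2 at tau_end minus its value at tau = 0.
    h = tau_end / steps
    state = (x0, xdot0, v_start)
    const = p_v * v_start + 0.5 * x0 * xdot0   # this is p_v * v_0 in (42.30)
    for n in range(steps):
        state = rk4_step(n * h, state, h, p_v)
    x, xdot, v = state
    return p_v * v + 0.5 * x * xdot - const

print(abs(first_integral_residual(x0=1.0, xdot0=0.3, v_start=0.0)))
```

The printed residual should be at the level of the integrator's truncation error, independently of the chosen profile and initial data.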
Note that a particular solution of the null geodesic equation is the purely longitudinal
null geodesic
$x^\mu(\tau) = (u = p_v\tau,\; v = v_0,\; x^a = 0)$ . (42.31)
Along this null geodesic, all the Christoffel symbols of the metric (in Brinkmann coordinates)
are zero. Hence Brinkmann coordinates can be regarded as a special case of
Fermi coordinates (briefly mentioned at the beginning of section 2.10).
$H_{lc} = -p_u$ , (42.32)
$H_{lc} = \frac{1}{p_v} H_{ho}$ . (42.35)
In summary, we note that in the lightcone gauge the equation of motion for a relativistic
particle becomes that of a non-relativistic harmonic oscillator. This harmonic oscillator
equation appears in various different contexts when discussing plane waves, and will
therefore also reappear several times later on in this section.
It is easy to see that there is essentially only one non-vanishing component of the
Riemann curvature tensor of a plane wave metric, namely
$\bar{R}_{uaub} = -A_{ab}$ . (42.36)
In particular, therefore, because of the null (or chiral) structure of the metric, there is
only one non-trivial component of the Ricci tensor,
$\bar{R}_{uu} = -\delta^{ab} A_{ab} \equiv -\mathrm{Tr}\,A$ , (42.37)
and the only non-zero component of the Einstein tensor (7.92) is
$\bar{G}_{uu} = \bar{R}_{uu}$ . (42.39)
Thus, as claimed above, the metric is flat iff $A_{ab} = 0$. Moreover, we see that in
Brinkmann coordinates the vacuum Einstein equations reduce to a simple algebraic
condition on $A_{ab}$ (regardless of its $u$-dependence), namely that it be traceless.
A simple example of a vacuum plane wave metric in four dimensions is
for arbitrary functions $A(u)$ and $B(u)$. This reflects the two polarisation states or
degrees of freedom of a four-dimensional graviton. Evidently, this generalises to arbitrary
dimensions: the number of degrees of freedom of the traceless matrix $A_{ab}(u)$ corresponds
precisely to that of a transverse traceless symmetric tensor (a.k.a. a graviton).
For d = 1, every plane wave is conformally flat, as is most readily seen in Rosen
coordinates.
When the Ricci tensor is non-zero ($A_{ab}$ has non-vanishing trace), then plane waves solve
the Einstein equations with null matter or null fluxes, i.e. with an energy-momentum
tensor $T_{\mu\nu}$ whose only non-vanishing component is $T_{uu}$,
$T_{\mu\nu} = \rho(u)\,\delta_{\mu u}\delta_{\nu u}$ . (42.44)
Examples are e.g. null Maxwell fields $A_\mu(u)$ with field strength
$F_{u\mu} = -F_{\mu u} = \partial_u A_\mu$ . (42.45)
Physical matter (with positive energy density) corresponds to $\bar{R}_{uu} > 0$ or $\mathrm{Tr}\,A < 0$.
It is pretty obvious by inspection that not just the scalar curvature but all the scalar
curvature invariants of a plane wave, i.e. scalars built from the curvature tensor and its
covariant derivatives, vanish since there is simply no way to soak up the u-indices.
Usually, an unambiguous way to ascertain that what appears to be a singularity of
a metric is a true curvature singularity rather than just a singularity in the choice
of coordinates is to exhibit a curvature invariant that is singular at that point. For
example, for the Schwarzschild metric one has the Kretschmann scalar (26.145) $K =
R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} \sim m^2/r^6$, which shows that the singularity at $r = 0$ is a true singularity.
Now for plane waves all curvature invariants are zero. Does this mean that plane waves
are non-singular? Or, if not, how does one detect the presence of a curvature singularity?
One way to do this is to study the tidal forces acting on extended objects or families
of freely falling particles. Indeed, in a certain sense the main effect of curvature (or
gravity) is that initially parallel trajectories of freely falling non-interacting particles
(dust, pebbles,. . . ) do not remain parallel, i.e. that gravity has the tendency to focus
(or defocus) matter. This statement finds its mathematically precise formulation in the
geodesic deviation equation (7.36),
$(D_\tau)^2 \delta x^\mu = R^\mu_{\ \nu\lambda\rho}\,\dot{x}^\nu \dot{x}^\lambda\,\delta x^\rho$ . (42.46)
Here $\delta x^\mu$ is the separation vector between nearby geodesics. We can apply this equation
to some family of geodesics of plane waves discussed in section 42.3. We will choose $\delta x^\mu$
to connect points on nearby geodesics with the same value of $\tau = u$. Thus $\delta u = 0$, and
the geodesic deviation equation for the transverse separations $\delta x^a$ reduces to
$\frac{d^2}{du^2}\,\delta x^a = -\bar{R}^a_{\ ubu}\,\delta x^b = A_{ab}\,\delta x^b$ . (42.47)
This is (once again!) the harmonic oscillator equation, and generalises the corresponding
equation (22.95) of the linearised theory to the present case.
We could have also obtained this directly by varying the harmonic oscillator (geodesic)
equation for $x^a$, using $\delta u = 0$. We see that for negative eigenvalues of $A_{ab}$ (physical
matter) this tidal force is attractive, leading to a focussing of the geodesics. For vacuum
plane waves, on the other hand, the tidal force is attractive in some directions and
repulsive in the other (reflecting the quadrupole nature of gravitational waves).
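To make the focussing and defocussing concrete, here is a small sketch (an illustration under stated assumptions, not taken from the notes). It assumes the constant vacuum profile $A_{ab} = \mathrm{diag}(1, -1)$ and integrates the deviation equation (42.47) for each eigenvalue separately, starting from unit separation and zero relative velocity. The function name `integrate_deviation` and the step count are arbitrary choices.

```python
import math

def integrate_deviation(eigenvalue, u_end, steps=2000):
    """Integrate d^2(dx)/du^2 = eigenvalue * dx with dx(0) = 1, dx'(0) = 0 (RK4)."""
    h = u_end / steps
    dx, dxdot = 1.0, 0.0

    def rhs(y, ydot):
        # First-order form of the deviation equation for one eigen-direction.
        return (ydot, eigenvalue * y)

    for _ in range(steps):
        k1 = rhs(dx, dxdot)
        k2 = rhs(dx + h / 2 * k1[0], dxdot + h / 2 * k1[1])
        k3 = rhs(dx + h / 2 * k2[0], dxdot + h / 2 * k2[1])
        k4 = rhs(dx + h * k3[0], dxdot + h * k3[1])
        dx += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dxdot += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return dx

# Eigenvalue -1: attractive direction, separation behaves like cos(u).
focussed = integrate_deviation(-1.0, math.pi / 2)
# Eigenvalue +1: repulsive direction, separation behaves like cosh(u).
defocussed = integrate_deviation(+1.0, math.pi / 2)
print(focussed, defocussed)
```

In the attractive direction the separation reaches zero at $u = \pi/2$ (the geodesics cross), while in the repulsive direction it grows like $\cosh u$.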
What is of interest to us here is the fact that the above equation shows that $A_{ab}$ itself
contains direct physical information. In particular, these tidal forces become infinite
where $A_{ab}(u)$ diverges. This is a true physical effect and hence the plane wave space-time
is genuinely singular at such points.
Since, on the other hand, the plane wave metric is clearly smooth for non-singular
$A_{ab}(u)$, we can thus summarise this discussion by the statement that a plane wave is
singular if and only if $A_{ab}(u)$ is singular somewhere.
42.5 From Rosen to Brinkmann coordinates (and back)
I still owe you an explanation of what the heuristic considerations of section 42.1 have
to do with the rest of this section. To that end I will now describe the relation between
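the plane wave metric in Brinkmann coordinates,
$d\bar{s}^2 = 2\,du\,dv + A_{ab}(u)\,x^a x^b\,du^2 + d\vec{x}^{\,2}$ , (42.48)
and in Rosen coordinates,
$d\bar{s}^2 = 2\,dU\,dV + \bar{g}_{ij}(U)\,dy^i dy^j$ . (42.49)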
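It is clear that, in order to transform the non-flat transverse metric in Rosen coordinates
to the flat transverse metric in Brinkmann coordinates, one should change variables as
$x^a = \bar{E}^a_{\ i}\,y^i$ , (42.50)
where $\bar{E}^a_{\ i}$ is a vielbein for $\bar{g}_{ij}$, i.e.
$\bar{g}_{ij} = \bar{E}^a_{\ i}\bar{E}^b_{\ j}\,\delta_{ab}$ . (42.51)
Denoting the inverse vielbein by $\bar{E}^i_{\ a}$, one has
$\bar{g}_{ij}\,dy^i dy^j = (dx^a - \dot{\bar{E}}^a_{\ i}\bar{E}^i_{\ c}\,x^c\,dU)(dx^b - \dot{\bar{E}}^b_{\ j}\bar{E}^j_{\ d}\,x^d\,dU)\,\delta_{ab}$ . (42.52)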
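This generates the flat transverse metric as well as a $dU^2$-term quadratic in the $x^a$, as
desired, but there are also unwanted $dU\,dx^a$ cross-terms. Provided that $\bar{E}$ satisfies the
symmetry condition
$\dot{\bar{E}}_{ai}\bar{E}^i_{\ b} = \dot{\bar{E}}_{bi}\bar{E}^i_{\ a}$ , (42.53)
these cross-terms can be cancelled by a shift of $V$,
$V \to V - \frac{1}{2}\dot{\bar{E}}_{ai}\bar{E}^i_{\ b}\,x^a x^b$ . (42.54)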
Apart from eliminating the $dU\,dx^a$-terms, this shift will also have the effect of generating
other $dU^2$-terms. Thanks to the symmetry condition, the term quadratic in first
derivatives of $\bar{E}$ cancels that arising from $\bar{g}_{ij}\,dy^i dy^j$, and only a second-derivative part
remains. The upshot of this is that after the change of variables
$U = u$
$V = v + \frac{1}{2}\dot{\bar{E}}_{ai}\bar{E}^i_{\ b}\,x^a x^b$
$y^i = \bar{E}^i_{\ a}\,x^a$ , (42.55)
the metric takes the Brinkmann form with
$A_{ab} = \ddot{\bar{E}}_{ai}\bar{E}^i_{\ b}$ . (42.56)
This can also be written as the harmonic oscillator equation
$\ddot{\bar{E}}_{ai} = A_{ab}\bar{E}_{bi}$ (42.57)
we had already encountered in the context of the geodesic (and geodesic deviation)
equation.
Note that from this point of view the Rosen coordinates are labelled by d out of 2d
linearly independent solutions of the oscillator equation, and the symmetry condition
can now be read as the constraint that the Wronskian among these solutions be zero.
Thus, given the metric in Brinkmann coordinates, one can construct the metric in Rosen
coordinates by solving the oscillator equation, choosing a maximally commuting set of
solutions to construct $\bar{E}_{ai}$, and then determining $\bar{g}_{ij}$ algebraically from the $\bar{E}_{ai}$.
In practice, once one knows that Rosen and Brinkmann coordinates are indeed just
two distinct ways of describing the same class of metrics, one does not need to perform
explicitly the coordinate transformation mapping one to the other. All one is interested
in is the above relation between $\bar{g}_{ij}(U)$ and $A_{ab}(u)$, which essentially says that $A_{ab}$ is
the curvature of $\bar{g}_{ij}$,
$A_{ab} = -\bar{E}^i_{\ a}\bar{E}^j_{\ b}\,\bar{R}_{UiUj}$ . (42.58)
The equations simplify somewhat when the metric $\bar{g}_{ij}(u)$ is diagonal,
$\bar{g}_{ij}(u) = \bar{e}_i(u)^2\,\delta_{ij}$ . (42.59)
In that case one can choose $\bar{E}^a_{\ i} = \bar{e}_i\,\delta^a_i$. The symmetry condition is automatically
satisfied because a diagonal matrix is symmetric, and one finds that $A_{ab}$ is also diagonal,
$A_{ab} = (\ddot{\bar{e}}_a/\bar{e}_a)\,\delta_{ab}$ . (42.60)
Conversely, therefore, given a diagonal plane wave in Brinkmann coordinates, to obtain
the metric in Rosen coordinates one needs to solve the harmonic oscillator equations
$\ddot{\bar{e}}_i(u) = A_{ii}(u)\,\bar{e}_i(u)$ . (42.61)
Thus the Rosen metric determined by $\bar{g}_{ij}(U)$ is flat iff $\bar{e}_i(u) = a_i U + b_i$ for some constants
$a_i$, $b_i$. In particular, we recover the fact that the metric (42.4),
$d\bar{s}^2 = 2\,dU\,dV + U^2 d\vec{y}^{\,2}$ (42.62)
is flat. We see that the non-uniqueness of the metric in Rosen coordinates is due to
the integration constants arising when trying to integrate a curvature tensor to a
corresponding metric.
As another example, consider the four-dimensional vacuum plane wave (42.40). Evidently,
one way of writing this metric in Rosen coordinates is
$d\bar{s}^2 = 2\,dU\,dV + \sinh^2 U\,dX^2 + \sin^2 U\,dY^2$ , (42.63)
and more generally any plane wave with constant $A_{ab}$ can be chosen to be of this
trigonometric form in Rosen coordinates.
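The diagonal relation (42.60), $A_{aa} = \ddot{\bar{e}}_a/\bar{e}_a$, is easy to test numerically. The sketch below is illustrative only: the helper name `brinkmann_entry`, the central-difference step `h` and the sample point are assumptions. It approximates the second derivative by a central difference and evaluates the Brinkmann profile for the Rosen functions $\sinh U$, $\sin U$ and a linear function $a U + b$.

```python
import math

def brinkmann_entry(e, U, h=1e-3):
    """Approximate A_aa = e''(U) / e(U) for a diagonal Rosen metric g_aa = e(U)^2."""
    second_derivative = (e(U + h) - 2.0 * e(U) + e(U - h)) / (h * h)
    return second_derivative / e(U)

U0 = 0.7  # arbitrary sample point away from the zeros of sinh and sin

a_xx = brinkmann_entry(math.sinh, U0)                   # expected close to +1
a_yy = brinkmann_entry(math.sin, U0)                    # expected close to -1
a_flat = brinkmann_entry(lambda U: 2.0 * U + 3.0, U0)   # linear e(U): flat, close to 0

print(a_xx, a_yy, a_xx + a_yy, a_flat)
```

Up to discretisation error, the profile for (42.63) comes out as $\mathrm{diag}(1, -1)$, which is traceless as required for a vacuum solution, while a linear $\bar{e}(U)$ gives a vanishing profile, in line with the flatness criterion stated after (42.61).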
42.6 More on Rosen Coordinates
Collecting the results of the previous sections, we can now gain a better understanding
of the geometric significance (and shortcomings) of Rosen coordinates for plane waves.
The plane wave metric in Rosen coordinates,
$d\bar{s}^2 = 2\,dU\,dV + \bar{g}_{ij}(U)\,dy^i dy^j$ , (42.64)
defines a preferred family (congruence) of null geodesics, namely the integral curves of
the null vector field $\partial_U$, i.e. the curves
$(U(\tau), V(\tau), y^k(\tau)) = (\tau, V, y^k)$ (42.65)
with affine parameter $\tau = U$ and parametrised by the constant values of the coordinates
$(V, y^k)$. In particular, the "origin" $V = y^k = 0$ of this congruence is the longitudinal
null geodesic (42.31) with $v_0 = 0$ in Brinkmann coordinates.
In the region of validity of this coordinate system, there is a unique null geodesic of
this congruence passing through any point, and one can therefore label (coordinatise)
these points by specifying the geodesic $(V, y^k)$ and the affine parameter $U$ along that
geodesic, i.e. by Rosen coordinates.
We can now also understand the reasons for the failure of Rosen coordinates: they cease
to be well-defined (and give rise to spurious coordinate singularities) e.g. when geodesics
in the family (congruence) of null geodesics intersect: in that case there is no longer
a unique value of the coordinates $(U, V, y^k)$ that one can associate to that intersection
point.
To illustrate this point, consider simply $\mathbb{R}^2$ with its standard metric $ds^2 = dx^2 + dy^2$.
An example of a "good" congruence of geodesics is the straight lines parallel to the
$x$-axis. The corresponding "Rosen" coordinates ("Rosen" in quotes because we are not
talking about null geodesics) are simply the globally well-defined Cartesian coordinates,
$x$ playing the role of the affine parameter $U$ and $y$ that of the transverse coordinates $y^k$
labelling the geodesics. An example of a "bad" family of geodesics is the straight lines
through the origin. The corresponding "Rosen" coordinates are essentially just polar
coordinates. Away from the origin there is again a unique geodesic passing through any
point but, as is well known, this coordinate system breaks down at the origin.
With this in mind, we can now reconsider the "bad" Rosen coordinates
$d\bar{s}^2 = 2\,dU\,dV + U^2 d\vec{y}^{\,2}$ (42.66)
for flat space. As we have seen above, in Brinkmann coordinates the metric is manifestly
flat,
$d\bar{s}^2 = 2\,du\,dv + d\vec{x}^{\,2}$ . (42.67)
Using the coordinate transformation (42.55) from Rosen to Brinkmann coordinates, we
see that the geodesic lines $y^k = c^k$, $V = c$ of the congruence defined by the metric (42.66)
correspond to the lines $x^k = c^k u$ in Brinkmann (Minkowski) coordinates, but these are
precisely the straight lines through the origin. This explains the coordinate singularity
at $U = 0$ and further strengthens the analogy with polar coordinates mentioned at the
end of section 42.1.
More generally, we see from (42.55) that the relation between the Brinkmann coordinates
$x^a$ and the Rosen coordinates $y^k$,
$x^a = \bar{E}^a_{\ k}(U)\,y^k$ , (42.68)
and hence the expression for the geodesic lines $y^k = c^k$, becomes degenerate when $\bar{E}^a_{\ k}$
becomes degenerate, i.e. precisely when $\bar{g}_{ij}$ becomes degenerate. Brinkmann coordinates,
on the other hand, provide a global coordinate chart for plane wave metrics.
satisfies
= Tr A + (Tr M )2 Tr(M 2 ) Tr A = R
E/E uu , (42.70)
where use has been made of the expression $\bar{R}_{uu} = -\mathrm{Tr}\,A$ (42.37) for the Ricci tensor,
and where $M_{ab}$ is the symmetric matrix (42.53)
$M_{ab} = \dot{\bar{E}}_{ai}\bar{E}^i_{\ b}$ . (42.71)
We now study the isometries of a generic plane wave metric. In Brinkmann coordinates,
because of the explicit dependence of the metric on $u$ and the transverse coordinates,
only one isometry is manifest, namely that generated by the parallel null vector $Z = \partial_v$.
In Rosen coordinates, the metric depends neither on $V$ nor on the transverse coordinates
$y^k$, and one sees that in addition to $Z = \partial_V$ there are at least $d$ more Killing vectors,
namely the $\partial_{y^k}$. Together these form an Abelian translation algebra acting transitively
on the null hypersurfaces of constant $U$.
[184] This is adapted from G. Gibbons, Quantized Fields Propagating in Plane-Wave Spacetimes,
Commun. Math. Phys. 45 (1975) 191-202.
However, this is not the whole story. Indeed, one particularly interesting and peculiar
feature of plane wave space-times is the fact that they generically possess a solvable
(rather than semi-simple) isometry algebra, namely a Heisenberg algebra, only part of
which we have already seen above.
All Killing vectors $V$ can be found in a systematic way by solving the Killing equations
$L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0$ . (42.72)
I will not do this here but simply present the results of this analysis in Brinkmann
coordinates. The upshot is that a generic $(2 + d)$-dimensional plane wave metric has a
$(2d + 1)$-dimensional isometry algebra generated by the Killing vector $Z = \partial_v$ and the
$2d$ Killing vectors
$X(f_{(K)}) \equiv X_{(K)} = f_{(K)a}\,\partial_a - \dot{f}_{(K)a}\,x^a\,\partial_v$ . (42.73)
Here the $f_{(K)a}$, $K = 1, \ldots, 2d$ are the $2d$ linearly independent solutions of the harmonic
oscillator equation (again!)
$\ddot{f}_a(u) = A_{ab}(u)\,f_b(u)$ . (42.74)
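One quick consistency check of (42.73) uses the conserved charge $Q_X = X_\mu \dot{x}^\mu$ along a geodesic. For $X(f)$ in Brinkmann coordinates one has $X_a = f_a$, $X_u = -\dot{f}_a x^a$ and $X_v = 0$, so for $p_v = 1$ (i.e. $\tau = u$) the charge is the Wronskian-like combination $f_a \dot{x}^a - \dot{f}_a x^a$. The sketch below is only an illustration: it assumes $d = 1$, $p_v = 1$, and a made-up $u$-dependent profile, and it integrates the geodesic and the oscillator equation (42.74) side by side to check that this quantity does not drift.

```python
import math

def profile(u):
    # Hypothetical u-dependent profile A(u) for a single transverse direction.
    return -(1.0 + 0.5 * math.sin(u))

def rhs(u, state):
    # state = (x, dx/du, f, df/du); both x and f obey y'' = A(u) y for p_v = 1.
    x, xdot, f, fdot = state
    a = profile(u)
    return (xdot, a * x, fdot, a * f)

def rk4_step(u, state, h):
    k1 = rhs(u, state)
    k2 = rhs(u + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k1)))
    k3 = rhs(u + h / 2, tuple(s + h / 2 * k for s, k in zip(state, k2)))
    k4 = rhs(u + h, tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def killing_charge(state):
    # Q = X_mu xdot^mu = f * xdot - fdot * x for X(f), with p_v = 1.
    x, xdot, f, fdot = state
    return f * xdot - fdot * x

def charge_drift(initial, u_end=6.0, steps=6000):
    h = u_end / steps
    state = initial
    for n in range(steps):
        state = rk4_step(n * h, state, h)
    return killing_charge(state) - killing_charge(initial)

print(abs(charge_drift((1.0, 0.0, 0.2, 1.0))))
```

The drift should vanish up to integration error for any profile, since both $x^a$ and $f_a$ solve the same linear oscillator equation.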
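These Killing vectors satisfy the algebra
Therefore the corresponding Killing vectors $Q_{(a)}$ and $P_{(a)}$ satisfy, together with $Z$, the
Heisenberg algebra
$[Q_{(a)}, Z] = [P_{(a)}, Z] = 0$
$[Q_{(a)}, Q_{(b)}] = [P_{(a)}, P_{(b)}] = 0$
$[Q_{(a)}, P_{(b)}] = \delta_{ab} Z$ . (42.82)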
Generically, a plane wave metric has just this Heisenberg algebra of isometries. It
acts transitively on the null hyperplanes $u = $ const., with a simply transitive Abelian
subalgebra. However, for special choices of $A_{ab}(u)$, there may of course be more Killing
vectors. These could arise from internal symmetries of $A_{ab}$, giving more Killing vectors
in the transverse directions. For example, the conformally flat plane waves (42.43)
have an additional $SO(d)$ symmetry (and conversely $SO(d)$-invariance implies conformal
flatness).
Of more interest to us is the fact that for particular $A_{ab}(u)$ there may be Killing vectors
with a $\partial_u$-component. The existence of such a Killing vector renders the plane wave
homogeneous (away from the fixed points of this extra Killing vector). The obvious
examples are plane waves with a $u$-independent profile $A_{ab}$,
$d\bar{s}^2 = 2\,du\,dv + A_{ab}\,x^a x^b\,du^2 + d\vec{x}^{\,2}$ , (42.83)
which have the extra Killing vector $X = \partial_u$. Since $A_{ab}$ is $u$-independent, it can be
diagonalised by a $u$-independent orthogonal transformation acting on the $x^a$. Moreover,
the overall scale of $A_{ab}$ can be changed, $A_{ab} \to \lambda^2 A_{ab}$, by the coordinate transformation
(boost)
$(u, v, x^a) \to (\lambda u, \lambda^{-1} v, x^a)$ . (42.84)
Thus these metrics are classified by the eigenvalues of $A_{ab}$ up to an overall scale and
permutations of the eigenvalues.
The Riemann tensor of a plane wave is covariantly constant iff the profile is constant,
$\bar{\nabla}_\mu \bar{R}_{\lambda\sigma\nu\rho} = 0 \;\Leftrightarrow\; \partial_u A_{ab} = 0$ . (42.85)
Thus a plane wave with constant wave profile $A_{ab}$ is what is known as a locally symmetric
space.
The existence of the additional Killing vector $X = \partial_u$ extends the Heisenberg algebra
to the harmonic oscillator algebra, with $X$ playing the role of the number operator or
harmonic oscillator Hamiltonian. Indeed, $X$ and $Z = \partial_v$ obviously commute, and the
commutator of $X$ with one of the Killing vectors $X(f)$ is
$[X, X(f)] = X(\dot{f})$ . (42.86)
Note that this is consistent, i.e. the right-hand side is again a Killing vector, because
when $A_{ab}$ is constant and $f$ satisfies the harmonic oscillator equation then so does its
$u$-derivative $\dot{f}$. In terms of the basis (42.81), we have
Another way of understanding the relation between $X = \partial_u$ and the harmonic oscillator
Hamiltonian is to look at the conserved charge associated with $X$ for particles moving
along geodesics. As we have seen in section 9.1, given any Killing vector $X$, the quantity
$Q_X = X_\mu \dot{x}^\mu$ (42.88)
is constant along any geodesic. For $X = \partial_u$ this is
$Q_X = p_u = g_{u\mu}\dot{x}^\mu$ , (42.89)
which we had already identified (up to a constant for non-null geodesics) as minus the
harmonic oscillator Hamiltonian in section 42.3. This is indeed a conserved charge iff
the Hamiltonian is time-independent i.e. iff $A_{ab}$ is constant.
We thus see that the dynamics of particles in a symmetric plane wave background is
intimately related to the geometry of the background itself.
Another class of examples of plane waves with an interesting additional Killing vector
are plane waves with the non-trivial profile
$A_{ab}(u) = u^{-2} B_{ab}$ (42.90)
for some constant matrix $B_{ab} = A_{ab}(1)$. Without loss of generality one can then assume
that $B_{ab}$ and $A_{ab}$ are diagonal, with eigenvalues the oscillator frequency squares $-\omega_a^2$,
$A_{ab} = -\omega_a^2\,\delta_{ab}\,u^{-2}$ . (42.91)
The corresponding plane wave metric
$d\bar{s}^2 = 2\,du\,dv + B_{ab}\,x^a x^b\,\frac{du^2}{u^2} + d\vec{x}^{\,2}$ (42.92)
is invariant under the boost/scaling (42.84), corresponding to the extra Killing vector
$X = u\,\partial_u - v\,\partial_v$ . (42.93)
Note that in this case the Killing vector $Z = \partial_v$ is no longer a central element of the
isometry algebra, since it has a non-trivial commutator with $X$,
$[X, Z] = Z$ . (42.94)
Moreover, one finds that the commutator of $X$ with a Heisenberg algebra Killing vector
$X(f)$, $f_a$ a solution to the harmonic oscillator equation, is the Heisenberg algebra Killing
vector
$[X, X(f)] = X(u\dot{f})$ . (42.95)
This concludes our brief discussion of plane wave metrics even though much more can
and perhaps should be said about plane wave and pp-wave metrics, in particular in the
context of the so-called Penrose Limit construction. For more on this see my lecture
notes [185] (from which I also took the material in this chapter).
[185] M. Blau, Lecture Notes on Plane Waves and Penrose Limits, available from
http://www.blau.itp.unibe.ch/Lecturenotes.html
43 Kaluza-Klein Theory
[Note: I have not taught, and hence not updated / corrected / improved, the material
and the presentation of the material in this section in a long time and am not particularly
happy with its current appearance.]
Looking at the Einstein equations and the variational principle, we see that gravity is
nicely geometrised while the matter part has to be added by hand and is completely
non-geometric. This may be perfectly acceptable for phenomenological Lagrangians
(like that for a perfect fluid in cosmology), but it would clearly be desirable to have a
unified description of all the fundamental forces of nature.
Today, the fundamental forces of nature are described by two very different concepts.
On the one hand, we have - as we have seen - gravity, in which forces are replaced by
geometry, and on the other hand there are the gauge theories of the electroweak and
strong interactions (the standard model) or their (grand unified, . . . ) generalisations.
Thus, if one wants to unify these forces with gravity, there are two possibilities:
1. One can try to realise gravity as a gauge theory (and thus geometry as a conse-
quence of the gauge principle).
2. Or one can try to realise gauge theories as gravity (and hence make them purely
geometric).
The first is certainly an attractive idea and has attracted a lot of attention. It is also
quite natural since, in a broad sense, gravity is already a gauge theory in the sense
that it has a local invariance (under general coordinate transformations or, actively,
diffeomorphisms). Also, the behaviour of Christoffel symbols under general coordinate
transformations is analogous to the transformation behaviour of non-Abelian gauge
fields under gauge transformations, and the whole formalism of covariant derivatives
and curvatures is reminiscent of that of non-Abelian gauge theories.
At first sight, equating the Christoffel symbols with gauge fields (potentials) may ap-
pear to be a bit puzzling because we originally introduced the metric as the potential
of the gravitational field and the Christoffel symbol as the corresponding field strength
(representing the gravitational force). However, as we know, the concept of force is
itself a gauge (coordinate) dependent concept in General Relativity, and therefore these
field strengths behave more like gauge potentials themselves, with their curvature, the
Riemann curvature tensor, encoding the gauge covariant information about the gravi-
tational field. This fact, which reflects deep properties of gravity not shared by other
forces, is just one of many which suggest that an honest gauge theory interpretation of
gravity may be hard to come by. Let us nevertheless proceed in this direction for a little
while anyway.
Clearly, the gauge group should now not be some internal symmetry group like U (1)
or SU (3), but rather a space-time symmetry group itself. Among the gauge groups that
have been suggested in this context, one finds
1. the translation group (this is natural because, as we have seen, the generators of
coordinate transformations are infinitesimal translations)
2. the Lorentz group (this is natural if one wants to view the Christoffel symbols as
the analogues of the gauge fields of gravity)
However, what - by and large - these investigations have shown is that the more one
tries to make a gauge theory look like Einstein gravity the less it looks like a standard
gauge theory and vice versa.
The main source of difference between gauge theory and gravity is the fact that in the
case of Yang-Mills theory the internal indices bear no relation to the space-time indices
whereas in gravity these are the same: contrast $F^a_{\mu\nu}$ with $(F_{\mu\nu})^\lambda_{\ \sigma} = R^\lambda_{\ \sigma\mu\nu}$.
In particular, in gravity one can contract the internal with the space-time indices to
obtain a scalar Lagrangian, R, linear in the curvature tensor. This is fortunate because,
from the point of view of the metric, this is already a two-derivative object.
For Yang-Mills theory, on the other hand, this is not possible, and in order to construct
a Lagrangian which is a singlet under the gauge group one needs to contract the space-time
and internal indices separately, i.e. one has a Lagrangian quadratic in the field
strengths. This gives the usual two-derivative action for the gauge potentials.
In spite of these and other differences and difficulties, this approach has not been com-
pletely abandoned and the gauge theory point of view is still very fruitful and useful
provided that one appreciates the crucial features that set gravity apart from standard
gauge theories.
The second possibility alluded to above, to realise gauge theories as gravity, is much
more radical, but how on earth is one supposed to achieve this? The crucial idea
has been known since 1919/20 (T. Kaluza), with important contributions by O. Klein
(1926). So what is this idea?
43.2 The Kaluza-Klein Miracle: History and Overview
In the early parts of the last century, the only other fundamental force that was known,
in addition to gravity, was electro-magnetism. In 1919, Kaluza submitted a paper (to
Einstein) in which he made a number of remarkable observations.
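First of all, he stressed the similarity between Christoffel symbols and the Maxwell field
strength tensor,
$\Gamma_{\mu\nu\lambda} = \frac{1}{2}(\partial_\nu g_{\mu\lambda} - \partial_\mu g_{\nu\lambda} + \partial_\lambda g_{\mu\nu})$
$F_{\nu\mu} = \partial_\nu A_\mu - \partial_\mu A_\nu$ . (43.1)
He then noted that $F_{\nu\mu}$ looks like a truncated Christoffel symbol and proposed, in order
to make this more manifest, to introduce a fifth dimension with a metric such that
$\hat{\Gamma}_{\mu\nu 5} \sim F_{\nu\mu}$. This is indeed possible. If one makes the identification
$A_\mu = \hat{g}_{\mu 5}$ , (43.2)
and the assumption that $\hat{g}_{\mu 5}$ (as well as $\hat{g}_{\mu\nu}$) is independent of the fifth coordinate $x^5$,
then one finds, using the standard formula for the Christoffel symbols, now extended to
five dimensions, that
$\hat{\Gamma}_{\mu\nu 5} = \frac{1}{2}(\partial_5 \hat{g}_{\mu\nu} + \partial_\nu \hat{g}_{\mu 5} - \partial_\mu \hat{g}_{\nu 5})$
$= \frac{1}{2}(\partial_\nu A_\mu - \partial_\mu A_\nu) = \frac{1}{2} F_{\nu\mu}$ . (43.3)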
If that were all, this would not be particularly exciting, but much more than this is
true. Kaluza went on to show that when one postulates a five-dimensional metric of the
form (hatted quantities will from now on refer to five-dimensional quantities)
$d\hat{s}^2 = g_{\mu\nu}\,dx^\mu dx^\nu + (dx^5 + A_\mu dx^\mu)^2$ , (43.4)
one finds that the five-dimensional curvature scalar is
$\hat{R} = R - \frac{1}{4} F_{\mu\nu}F^{\mu\nu}$ . (43.5)
This fact is affectionately known as the Kaluza-Klein Miracle! Moreover, the five-
dimensional geodesic equation turns into the four-dimensional Lorentz force equation
for a charged particle, and in this sense gravity and Maxwell theory have really been
unified in five-dimensional gravity.
However, although this is very nice, rather amazing in fact, and is clearly trying to tell
us something deep, there are numerous problems with this and it is not really clear
what has been achieved:
2. If it is to be treated as real, why should one make the assumption that the fields
are independent of $x^5$? If, on the other hand, one does not make this assumption,
one will not get Einstein-Maxwell theory.
In spite of all this and other questions, related to non-Abelian gauge symmetries or the
quantum behaviour of these theories, Kaluza's idea has remained popular ever since or,
rather, has periodically created psychological epidemics of frantic activity, interrupted
by dormant phases. Today, Kaluza's idea, with its many reincarnations and variations,
is an indispensable and fundamental ingredient in the modern theories of theoretical high
energy physics (supergravity and string theories) and many of the questions/problems
mentioned above have been addressed, understood and overcome.
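Let us now look at this more precisely. We consider a five-dimensional space-time with
coordinates $\hat{x}^M = (x^\mu, x^5)$ and a metric of the form (43.4). For later convenience, we
will introduce a parameter $\lambda$ into the metric (even though we will set $\lambda = 1$ for the time
being) and write it as
$d\hat{s}^2 = g_{\mu\nu}\,dx^\mu dx^\nu + (dx^5 + \lambda A_\mu dx^\mu)^2$ . (43.6)
More explicitly, for $\lambda = 1$ the components of the metric are
$\hat{g}_{\mu\nu} = g_{\mu\nu} + A_\mu A_\nu$
$\hat{g}_{\mu 5} = A_\mu$
$\hat{g}_{55} = 1$ , (43.7)
and those of the inverse metric are
$\hat{g}^{\mu\nu} = g^{\mu\nu}$
$\hat{g}^{\mu 5} = -A^\mu$
$\hat{g}^{55} = 1 + A_\mu A^\mu$ . (43.8)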
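That (43.8) really is the inverse of (43.7) can be confirmed by a direct matrix multiplication. The following sketch does this numerically; it is an illustration only, and the choice of a flat four-dimensional metric $g_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$, the sample components of $A_\mu$, and all function names are assumptions made for the example. The last row and column of each matrix play the role of the index 5.

```python
def kaluza_klein_metric(g, A):
    """Lower-index components (43.7): g_hat_{mu nu}, g_hat_{mu 5}, g_hat_{55}."""
    n = len(A)
    g_hat = [[0.0] * (n + 1) for _ in range(n + 1)]
    for m in range(n):
        for k in range(n):
            g_hat[m][k] = g[m][k] + A[m] * A[k]
        g_hat[m][n] = g_hat[n][m] = A[m]
    g_hat[n][n] = 1.0
    return g_hat

def kaluza_klein_inverse(g_inv, A):
    """Upper-index components (43.8), with A^mu = g^{mu nu} A_nu."""
    n = len(A)
    A_up = [sum(g_inv[m][k] * A[k] for k in range(n)) for m in range(n)]
    inv = [[0.0] * (n + 1) for _ in range(n + 1)]
    for m in range(n):
        for k in range(n):
            inv[m][k] = g_inv[m][k]
        inv[m][n] = inv[n][m] = -A_up[m]
    inv[n][n] = 1.0 + sum(A[m] * A_up[m] for m in range(n))
    return inv

def max_deviation_from_identity(a, b):
    """Largest entry of |a b - 1| for two square matrices given as nested lists."""
    size = len(a)
    worst = 0.0
    for i in range(size):
        for j in range(size):
            entry = sum(a[i][k] * b[k][j] for k in range(size))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst

# Assumed sample data: flat g_{mu nu} (which equals its own inverse) and arbitrary A_mu.
minkowski = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
A_lower = [0.3, -1.2, 0.5, 2.0]

g_hat = kaluza_klein_metric(minkowski, A_lower)
g_hat_inv = kaluza_klein_inverse(minkowski, A_lower)
print(max_deviation_from_identity(g_hat, g_hat_inv))
```

The same check goes through for any invertible $g_{\mu\nu}$, provided its inverse is supplied as `g_inv`.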
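We will (for now) assume that nothing depends on $x^5$ (in the old Kaluza-Klein literature
this assumption is known as the cylindricity condition).
Introducing the notation
$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$
$B_{\mu\nu} = \partial_\mu A_\nu + \partial_\nu A_\mu$ , (43.9)
the Christoffel symbols are readily found to be
$\hat{\Gamma}^\mu_{\ \nu\lambda} = \Gamma^\mu_{\ \nu\lambda} - \frac{1}{2}(F^\mu_{\ \nu}A_\lambda + F^\mu_{\ \lambda}A_\nu)$
$\hat{\Gamma}^5_{\ \nu\lambda} = \frac{1}{2}B_{\nu\lambda} - \frac{1}{2}A^\mu(F_{\nu\mu}A_\lambda + F_{\lambda\mu}A_\nu) - A_\mu\Gamma^\mu_{\ \nu\lambda}$
$\hat{\Gamma}^\mu_{\ 5\lambda} = -\frac{1}{2}F^\mu_{\ \lambda}$
$\hat{\Gamma}^5_{\ 5\mu} = -\frac{1}{2}F_{\mu\nu}A^\nu$
$\hat{\Gamma}^\mu_{\ 55} = \hat{\Gamma}^5_{\ 55} = 0$ . (43.10)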
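This does not look particularly encouraging, in particular because of the presence of
the $B_{\mu\nu}$ term, but Kaluza was not discouraged and proceeded to calculate the Riemann
tensor. I will spare you all the components of the Riemann tensor, but the Ricci tensor
we need:
$\hat{R}_{\mu\nu} = R_{\mu\nu} + \frac{1}{2}F_\mu^{\ \rho}F_{\rho\nu} + \frac{1}{4}F^{\lambda\rho}F_{\lambda\rho}A_\mu A_\nu + \frac{1}{2}(A_\nu\nabla_\rho F_\mu^{\ \rho} + A_\mu\nabla_\rho F_\nu^{\ \rho})$
$\hat{R}_{5\mu} = +\frac{1}{2}\nabla_\nu F_\mu^{\ \nu} + \frac{1}{4}A_\mu F_{\nu\lambda}F^{\nu\lambda}$
$\hat{R}_{55} = \frac{1}{4}F_{\mu\nu}F^{\mu\nu}$ . (43.11)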
This looks a bit more attractive and covariant but still not very promising. [However,
if you work in an orthonormal basis, as introduced for the Kaluza-Klein metric as an
example in section 3.8, the result looks much nicer. In such a basis only the first two
terms in $\hat{R}_{\mu\nu}$ and the first term in $\hat{R}_{\mu 5}$ are present and $\hat{R}_{55}$ is unchanged, so that all
the non-covariant looking terms disappear.] Now the miracle happens. Calculating the
curvature scalar, all the annoying terms drop out and one finds
$\hat{R} = R - \frac{1}{4}F_{\mu\nu}F^{\mu\nu}$ , (43.12)
i.e. the Lagrangian of Einstein-Maxwell theory. For $\lambda \neq 1$, the second term would have
been multiplied by $\lambda^2$. We now consider the five-dimensional pure gravity Einstein-Hilbert
action
$\hat{S} = \frac{1}{8\pi\hat{G}}\int \sqrt{\hat{g}}\,d^5x\,\hat{R}$ . (43.13)
In order for the integral over $x^5$ to converge we assume that the $x^5$-direction is a circle
with radius $L$ and we obtain
$\hat{S} = \frac{2\pi L}{8\pi\hat{G}}\int \sqrt{g}\,d^4x\,(R - \frac{1}{4}\lambda^2 F_{\mu\nu}F^{\mu\nu})$ . (43.14)
Therefore, if we make the identifications
$G_N = \hat{G}/2\pi L$
$\lambda^2 = 8\pi G_N$ , (43.15)
we obtain
$\hat{S} = \frac{1}{8\pi G_N}\int \sqrt{g}\,d^4x\,R - \frac{1}{4}\int \sqrt{g}\,d^4x\,F_{\mu\nu}F^{\mu\nu}$ , (43.16)
i.e. precisely the four-dimensional Einstein-Maxwell Lagrangian! This amazing fact, that
coupled gravity-gauge theory systems can arise from higher-dimensional pure gravity,
is certainly trying to tell us something.
43.3 Origin of Gauge Invariance
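To understand why this works, recall the form of the Kaluza-Klein line element,
\[
d\hat s^2_{KK} = g_{\mu\nu}(x^\lambda)\, dx^\mu dx^\nu + \left(dx^5 + A_\mu(x^\lambda)\, dx^\mu\right)^2 , \tag{43.17}
\]
and contrast this with the most general form of the line element in five dimensions,
namely
\[
d\hat s^2 = \hat g_{MN}(x^L)\, dx^M dx^N . \tag{43.18}
\]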
Clearly, the form of the general five-dimensional line element (43.18) is invariant under
arbitrary five-dimensional general coordinate transformations $x^M \to \xi^M(x^N)$. This
is not true, however, for the Kaluza-Klein ansatz (43.17), as a general $x^5$-dependent
coordinate transformation would destroy the $x^5$-independence of $\hat g_{\mu\nu} = g_{\mu\nu}$ and $\hat g_{\mu 5} = A_\mu$
and would also not leave $\hat g_{55} = 1$ invariant.
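The form of the Kaluza-Klein line element is, however, invariant under the following
two classes of coordinate transformations, namely four-dimensional coordinate transformations,
\[
x^5 \to x^5 , \qquad x^\mu \to \xi^\mu(x^\nu) , \tag{43.19}
\]
and $x^\mu$-dependent shifts of the fifth coordinate,
\[
x^5 \to \xi^5(x^\mu, x^5) = x^5 + f(x^\mu) , \qquad x^\mu \to \xi^\mu(x^\nu) = x^\mu . \tag{43.20}
\]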
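Under a transformation of the second kind, $A_\mu = \hat g_{\mu 5}$ changes as
\[
\begin{aligned}
A'_\mu = \hat g'_{\mu 5} &= \frac{\partial x^M}{\partial \xi^\mu} \frac{\partial x^N}{\partial \xi^5}\, \hat g_{MN} \\
 &= \frac{\partial x^M}{\partial \xi^\mu}\, \hat g_{M5} \\
 &= \hat g_{\mu 5} - \frac{\partial f}{\partial x^\mu}\, \hat g_{55} \\
 &= A_\mu - \partial_\mu f .
\end{aligned}
\tag{43.21}
\]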
In other words, the Kaluza-Klein line element is invariant under the shift $x^5 \to
x^5 + f(x^\mu)$ accompanied by $A_\mu \to A_\mu - \partial_\mu f$ (and this can of course also be read
off directly from the metric).
This is precisely a gauge transformation of the vector potential $A_\mu$ and we see that in
the present context gauge transformations arise as remnants of five-dimensional general
covariance!
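The invariance can also be checked numerically at a single point. The sketch below assumes $\lambda = 1$ and uses arbitrary (random) values for the differentials $dx^\mu$, $dx^5$, the potential $A_\mu$ and the gradient $\partial_\mu f$; the function name `kk_interval` is purely illustrative.

```python
import random

def kk_interval(dx, dx5, A):
    """Fifth-dimensional part of the line element: (dx^5 + A_mu dx^mu)^2."""
    return (dx5 + sum(a * d for a, d in zip(A, dx))) ** 2

random.seed(0)
dx = [random.uniform(-1, 1) for _ in range(4)]      # differentials dx^mu
dx5 = random.uniform(-1, 1)                         # differential dx^5
A = [random.uniform(-1, 1) for _ in range(4)]       # A_mu at the chosen point
grad_f = [random.uniform(-1, 1) for _ in range(4)]  # d_mu f at the same point

# Shift x^5 -> x^5 + f(x^mu): dx^5 picks up (d_mu f) dx^mu,
# while the potential is shifted as A_mu -> A_mu - d_mu f.
dx5_new = dx5 + sum(g * d for g, d in zip(grad_f, dx))
A_new = [a - g for a, g in zip(A, grad_f)]

before = kk_interval(dx, dx5, A)
after = kk_interval(dx, dx5_new, A_new)
print(abs(before - after))  # vanishes up to rounding
```

The four-dimensional part $g_{\mu\nu}dx^\mu dx^\nu$ is untouched by the shift, so it is omitted here.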
Now it is clear that we are guaranteed to get Einstein-Maxwell theory in four dimensions:
First of all, upon integration over $x^5$, the shift in $x^5$ is irrelevant and starting with the
five-dimensional Einstein-Hilbert action we are bound to end up with an action in four
dimensions, depending on $g_{\mu\nu}$ and $A_\mu$, which is invariant under both four-dimensional
general coordinate transformations and gauge transformations of $A_\mu$.
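From this point of view, the gauge transformation of the vector potential arises from
the Lie derivative of $\hat g_{\mu 5}$ along the vector field $f(x^\mu)\partial_5$:
\[
Y = f(x^\mu)\partial_5 \;\Rightarrow\; Y^\mu = 0 , \quad Y^5 = f
\;\Rightarrow\; Y_\mu = A_\mu f , \quad Y_5 = f . \tag{43.22}
\]
\[
\begin{aligned}
(L_Y \hat g)_{\mu 5} &= \hat\nabla_\mu Y_5 + \hat\nabla_5 Y_\mu \\
 &= \partial_\mu Y_5 - 2\hat\Gamma^M_{\mu 5} Y_M \\
 &= \partial_\mu f + F^\nu{}_\mu Y_\nu + F_{\mu\nu}A^\nu Y_5 \\
 &= \partial_\mu f \\
\Leftrightarrow \quad \delta A_\mu &= -\partial_\mu f .
\end{aligned}
\tag{43.23}
\]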
This point of view becomes particularly useful when one wants to obtain non-Abelian
gauge symmetries in this way (via a Kaluza-Klein reduction): One starts with a higher-
dimensional internal space with isometry group G and makes an analogous ansatz for the
metric. Then among the remnants of the higher-dimensional general coordinate
transformations there are, in particular, $x^\mu$-dependent isometries of the internal metric.
These act like non-Abelian gauge transformations on the off-block-diagonal
components of the metric and, upon integration over the internal space, one is guaranteed to
get, perhaps among other things, the four-dimensional Einstein-Hilbert and Yang-Mills
actions.
There is something else that works very beautifully in this context, namely the
description of the motion of charged particles in four dimensions moving under the combined
influence of a gravitational and an electro-magnetic field. As we will see, these two
effects are also unified from a five-dimensional Kaluza-Klein point of view.
The five-dimensional geodesic equation reads
\[
\ddot x^M + \hat\Gamma^M_{NL}\, \dot x^N \dot x^L = 0 . \tag{43.24}
\]
Either because the metric (and hence the Lagrangian $\mathcal{L}$) does not depend on $x^5$, or because
we know that $V = \partial_5$ is a Killing vector of the metric, we know that we have a conserved
quantity
\[
\frac{\partial \mathcal{L}}{\partial \dot x^5} \sim V_M \dot x^M = \dot x^5 + A_\mu \dot x^\mu , \tag{43.25}
\]
along the geodesic world lines. We will see in a moment what this quantity corresponds
to. The remaining $x^\mu$-component of the geodesic equation is
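\[
\begin{aligned}
\ddot x^\mu + \hat\Gamma^\mu_{NL}\, \dot x^N \dot x^L
 &= \ddot x^\mu + \hat\Gamma^\mu_{\nu\lambda}\, \dot x^\nu \dot x^\lambda
  + 2\hat\Gamma^\mu_{\nu 5}\, \dot x^\nu \dot x^5 + \hat\Gamma^\mu_{55}\, \dot x^5 \dot x^5 \\
 &= \ddot x^\mu + \Gamma^\mu_{\nu\lambda}\, \dot x^\nu \dot x^\lambda
  - F^\mu{}_\nu A_\lambda\, \dot x^\nu \dot x^\lambda - F^\mu{}_\nu\, \dot x^\nu \dot x^5 \\
 &= \ddot x^\mu + \Gamma^\mu_{\nu\lambda}\, \dot x^\nu \dot x^\lambda
  - F^\mu{}_\nu\, \dot x^\nu \left(A_\lambda \dot x^\lambda + \dot x^5\right) .
\end{aligned}
\tag{43.26}
\]
Therefore this component of the geodesic equation is equivalent to
\[
\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\, \dot x^\nu \dot x^\lambda = \left(A_\lambda \dot x^\lambda + \dot x^5\right) F^\mu{}_\nu\, \dot x^\nu . \tag{43.27}
\]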
This is precisely the Lorentz law if one identifies the constant of motion with the ratio
of the charge and the mass of the particle,
\[
\dot x^5 + A_\mu \dot x^\mu = \frac{e}{m} . \tag{43.28}
\]
Hence electro-magnetic and gravitational forces are indeed unified. The fact that
charged particles take a different trajectory from neutral ones is not a violation of
the equivalence principle but only reflects the fact that they started out with a different
velocity in the $x^5$-direction!
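The conservation law (43.25) can be illustrated numerically. The sketch below is a toy setup and not part of the original derivation: it assumes a flat four-dimensional metric with signature $(-,+,+,+)$, $\lambda = 1$, motion restricted to the coordinates $(t, x, x^5)$, and the potential $A_t = -E x$ describing a constant electric field. For this choice the geodesic equations following from the Christoffel symbols (43.10) reduce to the three equations coded in `rhs`; the integrator and all names are illustrative.

```python
def rhs(state, E):
    """Geodesic equations for the Kaluza-Klein metric with A_t = -E*x, flat 4D part."""
    t, x, x5, vt, vx, v5 = state
    q = v5 - E * x * vt                       # conserved quantity (43.25)
    at = -q * E * vx                          # t-component of (43.27)
    ax = -q * E * vt                          # x-component of (43.27)
    a5 = E * vx * vt - q * E ** 2 * x * vx    # x^5-component of the geodesic equation
    return (vt, vx, v5, at, ax, a5)

def rk4_step(state, h, E):
    """One classical Runge-Kutta step of size h in the affine parameter."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state, E)
    k2 = rhs(shifted(state, k1, h / 2), E)
    k3 = rhs(shifted(state, k2, h / 2), E)
    k4 = rhs(shifted(state, k3, h), E)
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def charge_to_mass(state, E):
    """Evaluate xdot^5 + A_mu xdot^mu along the trajectory."""
    t, x, x5, vt, vx, v5 = state
    return v5 - E * x * vt

E = 0.5
state = (0.0, 0.0, 0.0, 1.0, 0.0, 0.3)   # (t, x, x5, tdot, xdot, x5dot)
q_start = charge_to_mass(state, E)
for _ in range(2000):
    state = rk4_step(state, 1e-3, E)
q_end = charge_to_mass(state, E)
print(q_start, q_end)
```

Starting instead with vanishing velocity in the $x^5$-direction at $x = 0$ gives a vanishing conserved quantity, and the particle is then not accelerated by the field at all, in line with the interpretation of (43.28).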
43.5 First Problems: The Equations of Motion
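The equations of motion following from the four-dimensional Einstein-Maxwell action (43.16)
are the coupled Einstein-Maxwell equations
\[
R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R = 8\pi G_N T_{\mu\nu} , \qquad
\nabla_\mu F^{\mu\nu} = 0 . \tag{43.29}
\]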
However, let us now take a look at the equations of motion following from the
five-dimensional Einstein-Hilbert action. These are, as we are looking at the vacuum
equations, just the Ricci-flatness equations $\hat R_{MN} = 0$. Looking back at (43.11) we see
that these are clearly not equivalent to the Einstein-Maxwell equations. In particular,
$\hat R_{55} = 0$ imposes the constraint
\[
\hat R_{55} = 0 \;\Rightarrow\; F_{\mu\nu}F^{\mu\nu} = 0 , \tag{43.30}
\]
and only then do the remaining equations $\hat R_{\mu\nu} = 0$, $\hat R_{\mu 5} = 0$ become equivalent to the
Einstein-Maxwell equations (43.29).
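To get a feeling for how restrictive the constraint (43.30) is, recall that in flat space (and units with $c = 1$) $F_{\mu\nu}F^{\mu\nu} = 2(\vec B^2 - \vec E^2)$. The following sketch builds $F_{\mu\nu}$ from given electric and magnetic fields and evaluates the invariant; the sign convention $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$ and the function names are assumptions made for the illustration (the invariant does not depend on the sign convention).

```python
ETA = (-1.0, 1.0, 1.0, 1.0)   # diagonal of the Minkowski metric

def field_strength(E, B):
    """Antisymmetric F_{mu nu} with F_{0i} = -E_i and F_{ij} = eps_{ijk} B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = -E[i]
        F[i + 1][0] = E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F

def f_squared(E, B):
    """Contract F_{mu nu} F^{mu nu} using the diagonal Minkowski metric."""
    F = field_strength(E, B)
    return sum(ETA[m] * ETA[n] * F[m][n] ** 2 for m in range(4) for n in range(4))

pure_electric = f_squared(E=(1.0, 0.0, 0.0), B=(0.0, 0.0, 0.0))
plane_wave_like = f_squared(E=(0.0, 1.0, 0.0), B=(0.0, 0.0, 1.0))
print(pure_electric, plane_wave_like)
```

A purely electric (Coulomb-like) configuration violates the constraint, whereas a configuration with $|\vec E| = |\vec B|$, as for a plane wave, satisfies it. The constraint thus excludes many physically relevant solutions of Einstein-Maxwell theory.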
What happened? Well, for one, taking variations and making a particular ansatz for
the field configurations in the variational principle are two operations that in general do
not commute. In particular, the Kaluza-Klein ansatz is special because it imposes the
condition $\hat g_{55} = 1$. Thus in four dimensions there is no equation of motion corresponding
to $\hat g_{55}$ whereas $\hat R_{55} = 0$, the additional constraint, is just that, the equation arising
from varying $\hat g_{55}$. Thus Einstein-Maxwell theory is not a consistent truncation of
five-dimensional General Relativity.
Now we really have to ask ourselves what we have actually achieved. We would like
to claim that the five-dimensional Einstein-Hilbert action unifies the four-dimensional
Einstein-Hilbert and Maxwell actions, but on the other hand we want to reject the
five-dimensional Einstein equations? Then we are not ascribing any dynamics to the
fifth dimension and are treating the Kaluza-Klein miracle as a mere kinematical, or
mathematical, or bookkeeping device for the four-dimensional fields. This is clearly
rather artificial and unsatisfactory.
There are some other unsatisfactory features as well in the theory we have developed so
far. For instance we demanded that there be no dependence on $x^5$, which again makes
the five-dimensional point of view look rather artificial. If one wants to take the fifth
dimension seriously, one has to allow for an $x^5$-dependence of all the fields (and then
explain later, perhaps, why we have not yet discovered the fifth dimension in everyday
or high energy experiments).
43.6 Masses and Charges from Scalar Fields in 5 Dimensions
With these issues in mind, we will now revisit the Kaluza-Klein ansatz, regarding the
fifth dimension as real and exploring the consequences of this. Instead of considering
directly the effect of a full (i.e. not restricted by any special ansatz for the metric) five-
dimensional metric on four-dimensional physics, we will start with the simpler case of
a free massless scalar field in five dimensions.
Let us assume that we have a five-dimensional space-time of the form $M_5 = M_4 \times S^1$
where we will at first assume that $M_4$ is Minkowski space and the metric is simply
\[
d\hat s^2 = \eta_{\mu\nu}\, dx^\mu dx^\nu + (dx^5)^2 ,
\]
with $x^5$ a coordinate on a circle with radius $L$. Now consider a massless scalar field $\hat\phi$
on $M_5$, satisfying the five-dimensional massless Klein-Gordon equation
\[
\hat\Box \hat\phi(x^\mu, x^5) = \hat\eta^{MN} \partial_M \partial_N \hat\phi(x^\mu, x^5) = 0 . \tag{43.32}
\]
As $x^5$ is periodic with period $2\pi L$, we can make a Fourier expansion of $\hat\phi$ to make the
$x^5$-dependence more explicit,
\[
\hat\phi(x^\mu, x^5) = \sum_n \phi_n(x^\mu)\, e^{i n x^5 / L} . \tag{43.33}
\]
Plugging this expansion into the five-dimensional Klein-Gordon equation, we find that
this turns into an infinite number of decoupled equations, one for each Fourier mode
$\phi_n$ of $\hat\phi$, namely
\[
(\Box - m_n^2)\, \phi_n = 0 . \tag{43.34}
\]
Here of course $\Box$ now refers to the four-dimensional d'Alembertian, and the mass term
\[
m_n^2 = \frac{n^2}{L^2} \tag{43.35}
\]
arises from the $x^5$-derivative $\partial_5^2$ in $\hat\Box$.
Thus we see that, from a four-dimensional perspective, a massless scalar field in five
dimensions gives rise to one massless scalar field in four dimensions (the harmonic or
constant mode on the internal space) and an infinite number of massive fields. The
masses of these fields, known as the Kaluza-Klein modes, have the behaviour $m_n \sim n/L$.
In general, this behaviour, an infinite tower of massive fields with mass $\sim 1/$length scale,
is characteristic of massive fields arising from dimensional reduction from some
higher-dimensional space.
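The origin of the mass term can be made concrete with a small numerical check: acting with $\partial_5^2$ on a single Fourier mode $e^{inx^5/L}$ multiplies it by $-n^2/L^2$. The sketch below verifies this with a central finite difference and lists the first few masses of the tower; the values of $L$, $n$ and $x^5$ are arbitrary and the function names are illustrative.

```python
import cmath

def kk_mass_squared(n, L):
    """Mass squared (43.35) of the n-th Kaluza-Klein mode, in units with hbar = c = 1."""
    return (n / L) ** 2

def second_derivative_x5(n, L, x5, h=1e-4):
    """Central finite-difference approximation of d^2/d(x5)^2 acting on exp(i n x5 / L)."""
    def mode(y):
        return cmath.exp(1j * n * y / L)
    return (mode(x5 + h) - 2 * mode(x5) + mode(x5 - h)) / h ** 2

L = 2.0
n = 3
x5 = 0.7
numerical = second_derivative_x5(n, L, x5)
expected = -kk_mass_squared(n, L) * cmath.exp(1j * n * x5 / L)
tower = [abs(k) / L for k in range(-3, 4)]   # masses |n|/L for n = -3, ..., 3
print(abs(numerical - expected), tower)
```

The modes with $\pm n$ are degenerate in mass, and the single massless mode is the one with $n = 0$.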
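Next, instead of looking at a scalar field on Minkowski space times a circle with the
product metric, let us consider the Kaluza-Klein metric,
\[
d\hat s^2 = g_{\mu\nu}\, dx^\mu dx^\nu + \left(dx^5 + \lambda A_\mu dx^\mu\right)^2 ,
\]
and the corresponding Klein-Gordon equation
\[
\hat\Box \hat\phi(x^\mu, x^5) = \hat g^{MN} \hat\nabla_M \partial_N \hat\phi(x^\mu, x^5) = 0 . \tag{43.37}
\]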
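Rather than spelling this out in terms of Christoffel symbols, it is more convenient to
use (4.56) and recall that $\sqrt{\hat g} = \sqrt{g} = 1$ to write this as
\[
\begin{aligned}
\hat\Box &= \partial_M \left(\hat g^{MN} \partial_N\right) \\
 &= \partial_\mu \hat g^{\mu\nu} \partial_\nu + \partial_5 \hat g^{5\mu} \partial_\mu + \partial_\mu \hat g^{\mu 5} \partial_5 + \partial_5 \hat g^{55} \partial_5 \\
 &= \Box - \lambda\, \partial_5 \left(A^\mu \partial_\mu\right) - \lambda\, \partial_\mu \left(A^\mu \partial_5\right) + \left(1 + \lambda^2 A_\mu A^\mu\right) \partial_5 \partial_5 \\
 &= \left(\partial^\mu - \lambda A^\mu \partial_5\right)\left(\partial_\mu - \lambda A_\mu \partial_5\right) + (\partial_5)^2 .
\end{aligned}
\tag{43.38}
\]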
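The inverse metric components used in (43.38), namely $\hat g^{\mu\nu} = g^{\mu\nu}$, $\hat g^{\mu 5} = -\lambda A^\mu$ and $\hat g^{55} = 1 + \lambda^2 A_\mu A^\mu$, can be confirmed by multiplying the two matrices. The sketch below does this for a flat four-dimensional metric and arbitrary sample values of $A_\mu$ and $\lambda$; all names and numbers are illustrative.

```python
def kk_metric(g, A, lam):
    """Covariant Kaluza-Klein metric; the last row and column belong to x^5."""
    n = len(A)
    rows = [[g[m][v] + lam ** 2 * A[m] * A[v] for v in range(n)] + [lam * A[m]]
            for m in range(n)]
    rows.append([lam * A[v] for v in range(n)] + [1.0])
    return rows

def kk_inverse(g_inv, A, lam):
    """Claimed inverse: g^{mu nu}, -lam A^mu and 1 + lam^2 A_mu A^mu."""
    n = len(A)
    A_up = [sum(g_inv[m][v] * A[v] for v in range(n)) for m in range(n)]
    A_sq = sum(A_up[m] * A[m] for m in range(n))
    rows = [[g_inv[m][v] for v in range(n)] + [-lam * A_up[m]] for m in range(n)]
    rows.append([-lam * A_up[v] for v in range(n)] + [1.0 + lam ** 2 * A_sq])
    return rows

def matmul(X, Y):
    """Plain matrix product of two square lists of lists."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

# Minkowski metric (its own inverse) and arbitrary sample values.
eta = [[(-1.0 if m == 0 else 1.0) if m == v else 0.0 for v in range(4)] for m in range(4)]
A = [0.3, -0.2, 0.5, 0.1]
lam = 0.7

product = matmul(kk_metric(eta, A, lam), kk_inverse(eta, A, lam))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(5) for j in range(5))
print(max_error)
```

The product is the identity up to rounding, for any choice of $A_\mu$ and $\lambda$.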
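Acting with this operator on the Fourier decomposition of $\hat\phi$, we evidently again get an
infinite number of decoupled equations, one for each Fourier mode $\phi_n$ of $\hat\phi$, namely
\[
\left[ \left(\partial^\mu - i\frac{\lambda n}{L} A^\mu\right)\left(\partial_\mu - i\frac{\lambda n}{L} A_\mu\right) - m_n^2 \right] \phi_n = 0 . \tag{43.39}
\]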
This shows that the non-constant ($n \neq 0$) modes are not only massive but also charged
under the gauge field $A_\mu$. Comparing the operator
\[
\partial_\mu - i\frac{\lambda n}{L} A_\mu \tag{43.40}
\]
with the standard form of the minimal coupling,
\[
\frac{\hbar}{i}\partial_\mu - e A_\mu , \tag{43.41}
\]
we learn that the electric charge $e_n$ of the $n$th mode is given by
\[
\frac{e_n}{\hbar} = \frac{n\lambda}{L} . \tag{43.42}
\]
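In particular, these charges are all integer multiples of a basic charge, $e_n = ne$, with
\[
e = \frac{\hbar\lambda}{L} = \frac{\sqrt{8\pi G_N}\, \hbar}{L} . \tag{43.43}
\]
Thus we get a formula for $L$, the radius of the fifth dimension,
\[
L^2 = \frac{8\pi G_N \hbar^2}{e^2} = \frac{8\pi G_N \hbar}{e^2/\hbar} . \tag{43.44}
\]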
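Restoring the velocity of light in this formula, and identifying the present $U(1)$ gauge
symmetry with the standard gauge symmetry, we recognise here the fine structure
constant
\[
\alpha = e^2/4\pi\hbar c \approx 1/137 , \tag{43.45}
\]
and the Planck length $\ell_P = \sqrt{G_N \hbar / c^3} \approx 10^{-33}$ cm.
Thus
\[
L^2 = \frac{2\ell_P^2}{\alpha} \approx 274\, \ell_P^2 . \tag{43.47}
\]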
This is very small indeed, and it is therefore no surprise that this fifth dimension, if it
is the origin of the $U(1)$ gauge invariance of the world we live in, has not yet been seen.
Another way of saying this is that the fact that $L$ is so tiny implies that the masses $m_n$
are huge, not far from the Planck mass
\[
m_P = \sqrt{\frac{\hbar c}{G_N}} \approx 10^{-5}\ \mathrm{g} \approx 10^{19}\ \mathrm{GeV} . \tag{43.48}
\]
These would never have been spotted in present-day accelerators. Thus the massive
modes are completely irrelevant for low-energy physics, the non-constant modes can be
dropped, and this provides a justification for neglecting the $x^5$-dependence. However,
this also means that the charged particles we know (electrons, protons, . . . ) cannot
possibly be identified with these Kaluza-Klein modes.
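The orders of magnitude quoted above are easy to reproduce. The sketch below evaluates $L = \sqrt{2/\alpha}\,\ell_P$ from (43.47) and the mass $m_1 c^2 = \hbar c / L$ of the lightest charged mode; the numerical constants are CODATA 2018 values inserted by hand, and the variable names are illustrative.

```python
import math

ALPHA = 7.2973525693e-3          # fine structure constant
PLANCK_LENGTH_M = 1.616255e-35   # Planck length in metres
HBAR_C_GEV_M = 1.973269804e-16   # hbar * c in GeV * metre

# Radius of the fifth dimension from (43.47): L^2 = 2 l_P^2 / alpha.
radius_in_planck_lengths = math.sqrt(2.0 / ALPHA)
radius_m = radius_in_planck_lengths * PLANCK_LENGTH_M

def kk_mass_gev(n):
    """Rest energy m_n c^2 = |n| hbar c / L of the n-th mode, in GeV."""
    return abs(n) * HBAR_C_GEV_M / radius_m

print(radius_in_planck_lengths, radius_m, kk_mass_gev(1))
```

The radius comes out at roughly 17 Planck lengths, and the lightest charged mode has a mass between $10^{17}$ and $10^{18}$ GeV, one to two orders of magnitude below the Planck mass but far beyond the reach of any accelerator.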
The way modern Kaluza-Klein theories address this problem is by identifying the light
charged particles we observe with the massless Kaluza-Klein modes. One then requires
the standard spontaneous symmetry breaking mechanism to equip them with the small
masses required by observation. This still leaves the question of how these particles
should pick up a charge (as the zero modes are not only massless but also not charged).
This is solved by going to higher dimensions, with non-Abelian gauge groups, for which
massless particles are no longer necessarily singlets of the gauge group (they could e.g.
live in the adjoint).
We have seen above that a massless scalar field in five dimensions gives rise to a massless
scalar field plus an infinite tower of massive scalar fields in four dimensions. What
happens for other fields (after all, we are ultimately interested in what happens to the
five-dimensional metric)?
Retaining, for the same reasons as before, only the massless, i.e. $x^5$-independent, modes
we therefore obtain a theory involving one scalar field and one Abelian vector field
from pure Maxwell theory in five dimensions. The Lagrangian for these fields would be
(dropping all $x^5$-derivatives, and writing $\phi$ for the scalar $A_5$)
\[
F_{MN}F^{MN} = F_{\mu\nu}F^{\mu\nu} + 2F_{\mu 5}F^{\mu 5}
\to F_{\mu\nu}F^{\mu\nu} + 2(\partial_\mu\phi)(\partial^\mu\phi) . \tag{43.49}
\]
Likewise, we can now consider what happens to the five-dimensional metric $\hat g_{MN}(x^L)$.
From a four-dimensional perspective, this splits into three different kinds of fields,
namely a symmetric tensor $\hat g_{\mu\nu}$, a covector $A_\mu = \hat g_{\mu 5}$ and a scalar $\phi = \hat g_{55}$. As
before, these will each give rise to a massless field in four dimensions (which we interpret
as the metric, a vector potential and a scalar field) as well as an infinite number of
massive fields.
We see that, in addition to the massless fields we considered before, in the old
Kaluza-Klein ansatz, we obtain one more massless field, namely the scalar field $\phi$. Thus, even
if we may be justified in dropping all the massive modes, we should keep this massless
field in the ansatz for the metric and the action. With this in mind we now return to
the Kaluza-Klein ansatz.
Let us once again consider pure gravity in five dimensions, i.e. the Einstein-Hilbert
action
\[
\hat S = \frac{1}{8\pi \hat G} \int \sqrt{\hat g}\, d^5x\, \hat R . \tag{43.50}
\]
Let us now parametrise the full five-dimensional metric as
\[
d\hat s^2 = \phi^{-1/3} \left[ g_{\mu\nu}\, dx^\mu dx^\nu + \phi \left(dx^5 + \lambda A_\mu dx^\mu\right)^2 \right] , \tag{43.51}
\]
where all the fields depend on all the coordinates $x^\mu, x^5$. Any five-dimensional metric
can be written in this way and we can simply think of this as a change of variables
\[
\hat g_{MN} \to (g_{\mu\nu}, A_\mu, \phi) . \tag{43.52}
\]
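In matrix form, this metric reads
\[
(\hat g_{MN}) = \phi^{-1/3}
\begin{pmatrix}
g_{\mu\nu} + \lambda^2 \phi A_\mu A_\nu & \lambda \phi A_\nu \\
\lambda \phi A_\mu & \phi
\end{pmatrix} . \tag{43.53}
\]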
The only thing that may require some explanation is the strange overall power of $\phi$. To
see why this is a good choice, assume that the overall power is $\phi^a$ for some $a$. Then for
$\sqrt{\hat g}$ one finds
\[
\sqrt{\hat g} = \phi^{5a/2} \phi^{1/2} \sqrt{g} = \phi^{(5a+1)/2} \sqrt{g} . \tag{43.54}
\]
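On the other hand, for the Ricci tensor one has, schematically,
\[
\hat R_{\mu\nu} = R_{\mu\nu} + \ldots , \tag{43.55}
\]
and therefore
\[
\begin{aligned}
\hat R &= \hat g^{\mu\nu} \hat R_{\mu\nu} + \ldots \\
 &= \phi^{-a} g^{\mu\nu} R_{\mu\nu} + \ldots \\
 &= \phi^{-a} R + \ldots .
\end{aligned}
\tag{43.56}
\]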
Thus, if one wants the five-dimensional Einstein-Hilbert action to reduce to the standard
four-dimensional Einstein-Hilbert action (plus other things), without any non-minimal
coupling of the scalar field to the metric, one needs to choose $a = -1/3$, which is the
choice made in (43.51, 43.53).
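The step from (43.54) and (43.56) to the value of $a$ is a one-line calculation. Combining the two powers of $\phi$, and keeping only the term proportional to the four-dimensional scalar curvature:

```latex
\sqrt{\hat g}\, \hat R
 = \phi^{(5a+1)/2}\, \phi^{-a}\, \sqrt{g}\, R + \ldots
 = \phi^{(3a+1)/2}\, \sqrt{g}\, R + \ldots
\qquad\Longrightarrow\qquad
3a + 1 = 0 , \quad a = -\tfrac{1}{3} .
```

Only for this value of $a$ does the prefactor of $\sqrt{g}\, R$ become independent of $\phi$, so that the four-dimensional metric appears with the standard Einstein-Hilbert term.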
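Making a Fourier-mode expansion of all the fields, plugging this into the Einstein-Hilbert
action
\[
\hat S = \frac{1}{8\pi \hat G} \int \sqrt{\hat g}\, d^5x\, \hat R , \tag{43.58}
\]
integrating over $x^5$ and retaining only the constant modes $g_{(0)\mu\nu}$, $A_{(0)\mu}$ and $\phi_{(0)}$, one
obtains the action
\[
S = \int \sqrt{g}\, d^4x \left[ \frac{1}{8\pi G_N} R(g_{(0)})
 - \frac{1}{4} \phi_{(0)} F_{(0)\mu\nu} F_{(0)}^{\mu\nu}
 - \frac{1}{48\pi G_N}\, g_{(0)}^{\mu\nu}\, \phi_{(0)}^{-2}\, \partial_\mu \phi_{(0)} \partial_\nu \phi_{(0)} \right] . \tag{43.59}
\]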
Here we have once again made the identifications (43.15). This action may not look as
nice as before, but it is what it is. It is at least generally covariant and gauge invariant,
as expected. We also see very clearly that it is inconsistent with the equations of
motion for $\phi_{(0)}$,
\[
\Box \log \phi_{(0)} = \frac{3}{4}\, 8\pi G_N\, \phi_{(0)} F_{(0)\mu\nu} F_{(0)}^{\mu\nu} , \tag{43.60}
\]
to set $\phi_{(0)} = 1$ as this would imply $F_{(0)\mu\nu} F_{(0)}^{\mu\nu} = 0$, in agreement with our earlier
observations regarding $\hat R_{55} = 0$.
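The scalar equation of motion (43.60) follows from the $\phi_{(0)}$-dependent terms of (43.59) in a few lines. A sketch, using the auxiliary variable $\sigma = \log \phi_{(0)}$ (introduced here only for convenience) and dropping the subscripts $(0)$:

```latex
\begin{aligned}
\mathcal{L}_\sigma &= -\tfrac{1}{4}\, e^{\sigma} F_{\mu\nu}F^{\mu\nu}
  - \frac{1}{48\pi G_N}\, g^{\mu\nu} \partial_\mu\sigma\, \partial_\nu\sigma , \\
\frac{\delta S}{\delta\sigma} = 0 \quad &\Longrightarrow \quad
  \frac{1}{24\pi G_N}\, \Box\sigma - \tfrac{1}{4}\, e^{\sigma} F_{\mu\nu}F^{\mu\nu} = 0 , \\
 &\Longrightarrow \quad
  \Box \log\phi = 6\pi G_N\, \phi\, F_{\mu\nu}F^{\mu\nu}
  = \tfrac{3}{4}\, 8\pi G_N\, \phi\, F_{\mu\nu}F^{\mu\nu} .
\end{aligned}
```

Setting $\phi = 1$ makes the left-hand side vanish identically, so the right-hand side forces $F_{\mu\nu}F^{\mu\nu} = 0$. This is the same constraint as (43.30).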
Even though in certain respects the Abelian theory we have discussed above is
atypical, it is rather straightforward to generalise the previous considerations from Maxwell
theory to Yang-Mills theory for an arbitrary non-Abelian gauge group. Of course, to
achieve that, one needs to consider higher-dimensional internal spaces, i.e. gravity in
$4 + d$ dimensions, with a space-time of the form $M_4 \times M_d$. The crucial observation is
that gauge symmetries in four dimensions arise from isometries (Killing vectors) of the
metric on $M_d$.
Let the coordinates on $M_d$ be $x^a$, denote by $g_{ab}$ the metric on $M_d$, and let $K_i^a$, $i =
1, \ldots, n$ denote the $n$ linearly independent Killing vectors of the metric $g_{ab}$. These
generate the Lie algebra of the isometry group $G$ via the Lie bracket
\[
[K_i, K_j]^a \equiv K_i^b \partial_b K_j^a - K_j^b \partial_b K_i^a = f^k{}_{ij} K_k^a . \tag{43.61}
\]
$M_d$ could for example be the group manifold of the Lie group $G$ itself, or a homogeneous
space $G/H$ for some subgroup $H \subset G$.
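A familiar example is the two-sphere $S^2 = SO(3)/SO(2)$, whose Killing vectors are the rotation generators. The sketch below, written in Cartesian coordinates of the embedding space $\mathbb{R}^3$ and with the convention $K_i = \epsilon_{ijk} x^j \partial_k$ (a choice made here for illustration), evaluates the Lie bracket by central differences and checks that it closes on the third Killing vector. With this sign convention one finds $[K_1, K_2] = -K_3$.

```python
def killing(i, x):
    """Rotation Killing vector K_i = eps_{ijk} x^j d_k (components in R^3), i in {0, 1, 2}."""
    j, k = (i + 1) % 3, (i + 2) % 3
    v = [0.0, 0.0, 0.0]
    v[k] = x[j]
    v[j] = -x[k]
    return v

def directional_derivative(field, x, direction, h=1e-6):
    """Central-difference derivative of a vector field along a given direction."""
    xp = [xi + h * di for xi, di in zip(x, direction)]
    xm = [xi - h * di for xi, di in zip(x, direction)]
    return [(p - m) / (2 * h) for p, m in zip(field(xp), field(xm))]

def lie_bracket(X, Y, x):
    """[X, Y]^a = X^b d_b Y^a - Y^b d_b X^a, evaluated at the point x."""
    dY = directional_derivative(Y, x, X(x))
    dX = directional_derivative(X, x, Y(x))
    return [a - b for a, b in zip(dY, dX)]

K1 = lambda x: killing(0, x)
K2 = lambda x: killing(1, x)
K3 = lambda x: killing(2, x)

point = [0.3, -1.2, 0.8]
bracket = lie_bracket(K1, K2, point)
minus_K3 = [-c for c in K3(point)]
print(bracket, minus_K3)
```

The overall sign of the structure constants depends on the conventions chosen for the $K_i$; the point is only that the brackets of Killing vectors close on Killing vectors.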
Note the appearance of fields with the correct index structure to act as non-Abelian
gauge fields for the gauge group $G$, namely the $A^i_\mu$. Again these should be thought of
as fluctuations of the metric around its ground state, $M_4 \times M_d$ with its product metric
$(g_{\mu\nu}, g_{ab})$.
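Now consider an infinitesimal coordinate transformation generated by the vector field
\[
V^a(x^\mu, x^b) = f^i(x^\mu) K_i^a(x^b) ,
\]
i.e.
\[
\delta x^a = f^i(x^\mu) K_i^a(x^b) . \tag{43.64}
\]
Under this transformation the off-block-diagonal components of the metric change by
\[
\delta \hat g_{\mu a} = L_V \hat g_{\mu a} , \tag{43.65}
\]
which amounts to $\delta A^i_\mu = D_\mu f^i$, with $D_\mu$ the gauge covariant derivative,
i.e. precisely an infinitesimal non-Abelian gauge transformation. The easiest way to see
this is to use the form of the Lie derivative not in its covariant form,
\[
L_V \hat g_{\mu a} = \hat\nabla_\mu V_a + \hat\nabla_a V_\mu , \tag{43.67}
\]
but in terms of partial derivatives,
\[
L_V \hat g_{\mu a} = V^c \partial_c \hat g_{\mu a} + \partial_\mu V^c\, \hat g_{ca} + \partial_a V^c\, \hat g_{\mu c} . \tag{43.68}
\]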
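Inserting the definitions of $\hat g_{\mu a}$ and $V^a$, using the fact that the $K_i^a$ are Killing vectors
of the metric $g_{ab}$ and the relation (43.61), one finds
\[
L_V \hat g_{\mu a} = g_{ab} K_i^b D_\mu f^i . \tag{43.69}
\]
Upon dimensional reduction, the gauge fields $A^i_\mu$ then enter the four-dimensional action
through a Yang-Mills term of the form
\[
\mathcal{L}_{YM} \sim F^i_{\mu\nu} F^{j\,\mu\nu} K_i^a K_j^b g_{ab} . \tag{43.70}
\]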
The problem with this scenario (already prior to worrying about the inclusion of scalar
fields, of which there will be plenty in this case, one for each component of gab ) is
that the four-dimensional space-time cannot be chosen to be flat. Rather, it must
have a huge cosmological constant. This arises because the dimensional reduction of
the (4 + d)-dimensional Einstein-Hilbert Lagrangian R will also include a contribution
from the scalar curvature Rd of the metric on Md . For a compact internal space with
non-Abelian isometries this scalar curvature is non-zero and will therefore lead to an
effective cosmological constant in the four-dimensional action. This cosmological con-
stant could be cancelled by hand by introducing an appropriate cosmological constant
of the opposite sign into the (d + 4)-dimensional Einstein-Hilbert action, but this looks
rather contrived and artificial.
Nevertheless, this and other problems have not stopped people from looking for realistic
Kaluza-Klein theories giving rise to the standard model gauge group in four dimensions.
Of course, in order to get the standard model action or something resembling it, fermions
need to be added to the (d + 4)-dimensional action.
An interesting observation in this regard is that the lowest possible dimension for a
homogeneous space with isometry group $G = SU(3) \times SU(2) \times U(1)$ is seven, so that the
dimension of space-time is eleven. This arises because the maximal compact subgroup $H$
of $G$, giving rise to the smallest dimensional homogeneous space $G/H$ of $G$, is $SU(2) \times
U(1) \times U(1)$. As the dimension of $G$ is $8 + 3 + 1 = 12$ and that of $H$ is $3 + 1 + 1 = 5$, the
dimension of $G/H$ is $12 - 5 = 7$. This is intriguing because eleven is also the highest
dimension in which supergravity exists (in higher dimensions, supersymmetry would
require the existence of spin $> 2$ particles). That, plus the hope that supergravity
would have a better quantum behaviour than ordinary gravity, led to an enormous
amount of activity on Kaluza-Klein supergravity in the early 80s.
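The dimension count in this argument uses nothing more than $\dim SU(n) = n^2 - 1$ and $\dim U(1) = 1$. As a quick sanity check (the variable names are illustrative):

```python
def dim_su(n):
    """Dimension of the Lie group SU(n)."""
    return n * n - 1

DIM_U1 = 1

dim_G = dim_su(3) + dim_su(2) + DIM_U1   # G = SU(3) x SU(2) x U(1)
dim_H = dim_su(2) + DIM_U1 + DIM_U1      # H = SU(2) x U(1) x U(1)
dim_internal = dim_G - dim_H             # dimension of the homogeneous space G/H
dim_spacetime = 4 + dim_internal

print(dim_G, dim_H, dim_internal, dim_spacetime)  # 12 5 7 11
```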
Unfortunately, it turned out that not only was supergravity sick at the quantum level
as well but also that it is impossible to get a chiral fermion spectrum in four dimensions
from pure gravity plus spinors in (4+d) dimensions. One way around the latter problem
is to include explicit Yang-Mills fields already in $(d + 4)$ dimensions, but that appeared
to defeat the purpose of the whole Kaluza-Klein idea.
Today, the picture has changed and supergravity is regarded as a low-energy approxi-
mation to string theory which is believed to give a consistent description of quantum
gravity. These string theories typically live in ten dimensions, and thus one needs
to compactify the theory on a small internal six-dimensional space, much as in the
Kaluza-Klein idea. Even though non-Abelian gauge fields now typically do not arise
from Kaluza-Klein reduction but rather from explicit gauge fields in ten dimensions (or
objects called D-branes), in all other respects Kaluza's old idea is alive, doing very well,
and an indispensable part of the toolkit of modern theoretical high energy physics.
THE END