Study Guide APM4806

APM4806/001/4/2024

Tutorial letter 001/4/2024

RIEMANNIAN GEOMETRY AND TENSOR CALCULUS
APM4806

Year module

Department of Mathematical Sciences

STUDY GUIDE

University of South Africa
Define tomorrow.
APM3713/1

Contents

1 Curves in $E^3$
1.1 Introduction
1.2 Arclength
1.3 Tangent line
1.4 Normal Plane
1.5 Serret-Frenet equations
1.6 Examples
1.7 Exercises

2 Surfaces in $E^3$
2.1 Introduction
2.2 Preliminaries
2.3 The First Fundamental Form
2.4 Second Fundamental Form
2.5 Principal curvatures
2.6 Fundamental Equations of Surface Theory
2.7 Concluding Remarks
2.8 Examples
2.9 Exercises

3 Tensor analysis
3.1 Introduction
3.2 Contravariant and covariant tensors
3.3 The metric tensor
3.4 Covariant derivatives
3.5 Covariant derivative of the metric tensor
3.6 Geodesic equation
3.7 Curvature
3.8 Conclusion

4 Special Relativity
4.1 Introduction
4.2 Newtonian relativity
4.3 Maxwell's equations
4.4 Einstein's postulates
4.5 The Lorentz transformation equations
4.6 Conclusion
4.7 Exercises
4.8 The four-vector
4.9 Relativistic momentum
4.10 Relativistic force
4.11 Kinetic energy
4.12 Transformation properties
4.13 Momentum transformation
4.14 Force transformation
4.15 Conclusion
4.16 Exercises

Preface

The geometry of mathematical structures can be studied in several ways depending on what
one means by “geometry”. For example, Euclidean geometry is studied via axioms concerning
points and lines. Another approach is to study things such as curves, surfaces and more abstract
structures by means of differential calculus. We call this differential geometry. This module
deals with the differential geometry of curves and surfaces in 3-dimensional Euclidean space;
the last part introduces the theory of special relativity.
There are two aspects to the subject – local and global differential geometry. The former
deals with small regions of the structure under consideration and the latter with the structure
as a whole. We will only be concerned with local aspects. Global differential geometry is closely
linked with topology which is beyond the scope of an undergraduate course.
Differential geometry rightfully belongs in mathematics and the question is: why is it part of
an applied mathematics course? There are two reasons. In the first place, the subject generates
some beautiful and quite remarkable theorems and my personal feeling is that differential
geometry should be an essential part of the education of a student studying any aspect of
the mathematical sciences. Secondly, differential geometry has proved to be probably the most
widely used aspect of mathematics in our understanding of the physical world. Einstein’s general
theory of relativity, which must be regarded as one of the greatest intellectual achievements of all
time, is based on differential geometry. More recently, elementary particle physics underwent a
radical change via so-called gauge theory and, again, the mathematical foundation was differential
geometry. In fact, the interaction between physics and differential geometry is so strong that a
great deal of modern differential geometry has been developed by theoretical physicists.
There is little doubt that the theories of relativity capture the imagination. This fascination
extends even to those students who are without much knowledge of physics. Relativity impacts
on various aspects of human intellect, one example being how we view the cosmos. The most
interesting aspect is that the mathematics behind the theory is elegant in both expression and
formulation.
This module is meant for students who want to gain insight into, and confidence in handling,
the basic mathematics of relativity. The module is developed in such a way that it would be
possible for students to reach three major topics of current research interest, namely, general
relativity, cosmology and black holes. Students are urged to attempt all the exercises which
accompany the various sections. Experience has shown that this is the only real way to achieve
a full understanding of the material.

Prof. W.M. Lesame 2010


Chapter 1

Curves in $E^3$

1.1 Introduction

The theory of curves in $E^3$ (3-dimensional Euclidean space) is very easy but some mathematically
interesting features will nonetheless emerge. We will use vector methods throughout and I will
assume that you know the basics of vector algebra including the scalar and vector products of
vectors.
We know that, in a plane, a circle, for example, can be given by two representations. We can
either write the equation of a circle as

$$x^2 + y^2 = a^2$$

or, in parametric form, as

$$x = a\cos u, \quad y = a\sin u, \quad 0 \le u \le 2\pi.$$

A similar situation prevails when we go from a plane to $E^3$. In $E^3$, however, a single equation
connecting $x$, $y$ and $z$ generally represents a surface so that a curve is specified by two equations
viz. $F(x, y, z) = 0$ and $G(x, y, z) = 0$, the implication being that the curve represented by these
two equations is the curve of intersection of the surfaces $F = 0$, $G = 0$. In parametric form a
curve in $E^3$ is given by

$$x = x(u), \quad y = y(u), \quad z = z(u) \tag{1.1}$$

or, in vector notation, by

$$\mathbf{r} = \mathbf{r}(u) = (x(u), y(u), z(u)). \tag{1.2}$$

Generally speaking, it is more convenient to work with the parametric representation but
there is one disadvantage here in that a parametric representation sometimes specifies too much.
By this I mean that a parametric representation can give a perfectly good description of a curve
but it can also ascribe some odd properties to the curve which are peculiar to the particular
choice of parameter. In other words, these properties would not necessarily be evident if a
different parameter were chosen. We get around this difficulty by concentrating our studies on
those properties of a curve which are invariant under a change of parameter. We might regard
these properties as being the true properties of the curve.

The non-parametric description of a curve (i.e. the description in terms of $F = 0$, $G = 0$),
whilst being more cumbersome than the parametric representation, has the added disadvantage
that it often specifies too little. Consider, for example, the surfaces $xz = y^2$, $xy = z$. If we
eliminate $z$ between these equations we obtain $y = x^2$ and if we eliminate $y$ we obtain $z = x^3$. If
we choose a parameter $u$ such that $x = u$, then the parametric form of the curve is

$$x = u, \quad y = u^2, \quad z = u^3, \qquad -\infty < u < \infty \tag{1.3}$$

which is commonly called a twisted cubic. However, it is clear from $xz = y^2$, $xy = z$ that these
equations are satisfied for all values of $x$ when $y = z = 0$. In other words, the surfaces intersect
along the $x$-axis as well as along the cubic so in this sense the equations of the surfaces do not
define a unique curve.
We shall restrict our attention to curves which are such that $\dot{\mathbf{r}} = d\mathbf{r}/du \neq 0$ on the domain of $u$;
such curves are called regular. A point where $\dot{\mathbf{r}} = 0$ is called a singular point and such points
are excluded from our discussion.
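
The remark that the two surfaces meet along the $x$-axis as well as along the twisted cubic can be checked numerically. The following is only a small sketch; the helper name `on_both_surfaces` and the sample values are our own choices and are not part of the guide.

```python
def on_both_surfaces(x, y, z, tol=1e-12):
    """Return True if (x, y, z) satisfies xz = y^2 and xy = z up to tol."""
    return abs(x * z - y * y) <= tol and abs(x * y - z) <= tol

samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Points of the twisted cubic (1.3): x = u, y = u^2, z = u^3.
cubic_ok = all(on_both_surfaces(u, u**2, u**3) for u in samples)

# Points of the x-axis: y = z = 0 with x arbitrary.
axis_ok = all(on_both_surfaces(x, 0.0, 0.0) for x in samples)

# A point chosen off both curves, for contrast.
off_curve = on_both_surfaces(1.0, 2.0, 3.0)
```

Both `cubic_ok` and `axis_ok` come out true while `off_curve` is false, which is the sense in which the pair of implicit equations specifies too little.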

1.2 Arclength

The arclength $s$ of a curve between a given point (with parameter value $u = u_0$) and an arbitrary
point is defined by

$$s = \int_{u_0}^{u} |\dot{\mathbf{r}}(u)|\, du \tag{1.4}$$

which, in a cartesian coordinate system, becomes

$$s = \int_{u_0}^{u} \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\, du.$$

It is trivial to verify that s is independent of the choice of parameter. Very often s itself is used
as the parameter in a parametric representation of a curve and this is generally called the natural
parametrization of the curve. We often speak of the positive sense or positive orientation of a
curve and by this we simply mean that s increases with increasing parameter value u. In other
words, the orientation provides a direction to the curve.
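
The independence of $s$ from the choice of parameter can be illustrated numerically. The sketch below is an illustration under our own assumptions: it approximates the integral (1.4) by the length of an inscribed polyline (not by the integral itself), uses a circular helix with $a = 2$, $b = 1$ as the test curve, and reparametrizes it by the regular substitution $u = e^v - 1$. The function names are hypothetical.

```python
import math

def helix(u, a=2.0, b=1.0):
    """Circular helix r(u) = (a cos u, a sin u, b u)."""
    return (a * math.cos(u), a * math.sin(u), b * u)

def polyline_length(curve, p0, p1, n=2000):
    """Approximate the arclength (1.4) by the length of an inscribed polyline."""
    total, prev = 0.0, curve(p0)
    for i in range(1, n + 1):
        cur = curve(p0 + (p1 - p0) * i / n)
        total += math.dist(prev, cur)
        prev = cur
    return total

# One turn of the helix, first with parameter u in [0, 2*pi] ...
s_u = polyline_length(helix, 0.0, 2 * math.pi)

# ... then with the new parameter v, where u = exp(v) - 1 (du/dv > 0).
s_v = polyline_length(lambda v: helix(math.expm1(v)), 0.0, math.log1p(2 * math.pi))

# Closed form for one turn: |dr/du| = sqrt(a^2 + b^2) is constant.
exact = 2 * math.pi * math.sqrt(2.0**2 + 1.0**2)
```

Both approximations agree with the closed-form value to within the polyline error, even though the two parametrizations traverse the curve at very different rates.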

1.3 Tangent line

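Suppose that a curve $\Gamma$ is given by $\mathbf{r} = \mathbf{r}(u)$. The tangent vector to $\Gamma$ is given by

$$\dot{\mathbf{r}} = \frac{d\mathbf{r}}{du}$$

and the unit tangent vector $\hat{\mathbf{t}}$ is simply

$$\hat{\mathbf{t}} = \frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}.$$

The orientation of $\dot{\mathbf{r}}$ is the same as that of $\Gamma$ i.e. $\dot{\mathbf{r}}$ points in the direction of increasing $u$. If we
choose $s$ as parameter rather than $u$ and denote differentiation with respect to $s$ by a prime then

$$\hat{\mathbf{t}} = \frac{d\mathbf{r}}{ds} = \mathbf{r}' \tag{1.5}$$

because by (1.4)

$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{du}\frac{du}{ds} = \frac{\dot{\mathbf{r}}(u)}{|\dot{\mathbf{r}}(u)|}.$$

A straight line parallel to $\hat{\mathbf{t}}$ and passing through the point where $\hat{\mathbf{t}}$ is attached to $\Gamma$ is called the
tangent line to the curve at that point.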
Now let P and Q be neighboring points on a curve Γ : r = r(s) and suppose that the
parameter values of P and Q are 0 and s respectively. Consider the plane which contains the
tangent line at P and the point Q. In the limit as Q → P, this plane is called the osculating plane
of Γ at P. If Γ happens to be a straight line then clearly the osculating plane is not uniquely
defined – it can be any plane containing the line. It is a simple matter to obtain the equation of
the osculating plane.
Let $\mathbf{R}$ be the position vector of an arbitrary point in the plane containing $\mathbf{r}'(0)$ and $Q$. Clearly the
vectors $\mathbf{r}'(0)$ and $\overrightarrow{PQ} = \mathbf{r}(s) - \mathbf{r}(0)$ lie in this plane so that $\mathbf{r}'(0) \times (\mathbf{r}(s) - \mathbf{r}(0))$ is perpendicular
to this plane. The vector $\mathbf{R} - \mathbf{r}(0)$ is an arbitrary vector in this plane so that

$$(\mathbf{R} - \mathbf{r}(0)) \cdot [\mathbf{r}'(0) \times (\mathbf{r}(s) - \mathbf{r}(0))] = 0$$

is the equation of the plane. Now, by Taylor's theorem we have

$$\mathbf{r}(s) = \mathbf{r}(0) + s\,\mathbf{r}'(0) + \frac{1}{2}s^2\,\mathbf{r}''(0) + o(s^2).$$

Neglecting the $o(s^2)$ terms as we take the limit $Q \to P$, we obtain the equation of the osculating
plane at $P$ viz.

$$(\mathbf{R} - \mathbf{r}(0)) \cdot [\mathbf{r}'(0) \times \mathbf{r}''(0)] = 0. \tag{1.6}$$

This equation is valid provided that $\mathbf{r}'(0)$ and $\mathbf{r}''(0)$ are not co-directional. Now since $\hat{\mathbf{t}} \cdot \hat{\mathbf{t}} =
\mathbf{r}' \cdot \mathbf{r}' = 1$, it follows that $\mathbf{r}' \cdot \mathbf{r}'' = 0$ so that $\mathbf{r}''(0)$ and $\mathbf{r}'(0)$ are not co-directional unless
$\mathbf{r}''(0) = 0$. A point where $\mathbf{r}'' = 0$ is called a point of inflection. Unless the curve is a straight line
we still obtain a definite osculating plane at a point of inflection by simply performing further
differentiations until we obtain the first non-zero derivative. For example: if $\mathbf{r}''(0) = 0$ then we
differentiate $\mathbf{r}' \cdot \mathbf{r}'' = 0$ to obtain $\mathbf{r}'(0) \cdot \mathbf{r}'''(0) = 0$ (because $\mathbf{r}''(0) = 0$) so that either $\mathbf{r}'(0)$ and
$\mathbf{r}'''(0)$ are not co-directional or $\mathbf{r}'''(0) = 0$. If $\mathbf{r}'''(0) = 0$ we continue differentiating until we
obtain the first non-zero derivative, say $\mathbf{r}^{(k)}$. We then have

$$\mathbf{r}(s) = \mathbf{r}(0) + s\,\mathbf{r}'(0) + \frac{s^k}{k!}\,\mathbf{r}^{(k)}(0) + o(s^k)$$

and the equation of the osculating plane becomes

$$(\mathbf{R} - \mathbf{r}(0)) \cdot [\mathbf{r}'(0) \times \mathbf{r}^{(k)}(0)] = 0.$$


 


1.4 Normal Plane

We now define the normal plane to Γ at P to be the plane passing through P which is orthogonal
to the tangent line at P. The principal normal at P is the intersection of the normal plane and the
osculating plane at P. A unit vector along the principal normal is denoted by n̂ and its direction
may be chosen arbitrarily provided that n̂ varies continuously along the curve i.e. at no stage do
we suddenly reverse the direction of n̂.
The rate at which the unit tangent vector changes direction with respect to arclength as $P$
moves along $\Gamma$ is called the curvature $\kappa$ of the curve. Hence, by definition,

$$|\kappa| = \left|\frac{d\hat{\mathbf{t}}}{ds}\right|$$

but the sign of $\kappa$ is not determined.
Now $d\hat{\mathbf{t}}/ds = \mathbf{r}''$ so that $d\hat{\mathbf{t}}/ds$ lies in the osculating plane and is also normal to $\hat{\mathbf{t}}$ so that
$d\hat{\mathbf{t}}/ds = \pm\kappa\hat{\mathbf{n}}$. To remove the sign ambiguity we will make the choice

$$\mathbf{r}'' = \frac{d\hat{\mathbf{t}}}{ds} = \kappa\hat{\mathbf{n}}. \tag{1.7}$$

We can now prove the following elementary result:
Theorem: A necessary and sufficient condition that a curve be a straight line is that $\kappa = 0$ at all points
on the curve.

Proof. A straight line has the equation $\mathbf{r} = \mathbf{a}s + \mathbf{b}$ where $\mathbf{a}$ and $\mathbf{b}$ are constant vectors. Hence
$\hat{\mathbf{t}} = \mathbf{a}$, $d\hat{\mathbf{t}}/ds = 0$ so that $\kappa = 0$ is necessary. Conversely, if $\kappa = 0$ for all points on $\Gamma$ then $\mathbf{r}'' = 0$
i.e. $\mathbf{r} = \mathbf{a}s + \mathbf{b}$ so that the condition is sufficient.


The binormal line at P is the line normal to the osculating plane at P. The direction of the
unit vector b̂ along the binormal line is chosen such that the set {t̂, n̂, b̂} form a right handed
system i.e.
b̂ = t̂ × n̂. (1.8)

We see thus that at every point of a curve there is defined an orthonormal basis. This is usually
called a moving reference frame.
As P moves along Γ, the osculating plane turns about the tangent and the arc-rate (rate of
change with respect to arc length) at which this occurs is called the torsion of the curve and is
denoted by τ .
Alternatively we can think of torsion as the tendency of Γ to climb out of the osculating
plane. It is easy to show that

$$\hat{\mathbf{b}}' = -\tau\hat{\mathbf{n}} \tag{1.9}$$

as follows.

Since b̂ · b̂ = 1 it follows that b̂· b̂′ = 0 and hence b̂′ lies in the osculating plane. Also b̂ · t̂ = 0
which implies that b̂′ · t̂ + b̂ · t̂′ = 0. But b̂ · t̂′ = b̂ · (κn̂) = 0 so that b̂′ is orthogonal to t̂. Since
b̂′ lies in the osculating plane it must be parallel to n̂ and, by definition |b̂′ | = |τ |. Thus (1.9)
follows, the negative sign in (1.9) being introduced as a convention in which torsion is regarded
as positive if, as s increases, the rotation of the osculating plane is in the direction of a right
handed screw travelling in the direction of t̂.
[Figure: the moving triad t̂, n̂, b̂ attached to a point of the curve.]

It is easy to see that a right handed screw around t̂ causes the end of b̂ to move in the opposite
direction to which n̂ is pointing. The plane containing t̂ and b̂ is called the rectifying plane. The
plane containing n̂ and t̂ is the osculating plane, and the plane containing n̂ and b̂ the normal
plane. We can now prove a further elementary result:
Theorem: A necessary and sufficient condition that a curve be a plane curve is that τ = 0 at all
points.

Proof. If the curve lies in a plane then this plane is simply the osculating plane. Therefore the
osculating plane is fixed so that τ = 0 by definition. Conversely, if τ = 0, then b̂ is a constant
vector and b̂ · t̂ = b̂ · r′ = 0 implies that (b̂ · r)′ = 0 or b̂ · r = constant which implies that the
curve is plane so that the condition is sufficient.

Example 1.4.1

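Prove that $\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''') = \kappa^2\tau$.

Solution. By (1.5), (1.7) and (1.8)

$$\mathbf{r}' \times \mathbf{r}'' = \hat{\mathbf{t}} \times \kappa\hat{\mathbf{n}} = \kappa\hat{\mathbf{b}}.$$

Differentiating (and noting that $\mathbf{r}'' \times \mathbf{r}'' = 0$) gives

$$\mathbf{r}' \times \mathbf{r}''' = \kappa\hat{\mathbf{b}}' + \kappa'\hat{\mathbf{b}} = \kappa'\hat{\mathbf{b}} - \kappa\tau\hat{\mathbf{n}}$$

so that

$$\mathbf{r}'' \cdot (\mathbf{r}' \times \mathbf{r}''') = \kappa\hat{\mathbf{n}} \cdot (\kappa'\hat{\mathbf{b}} - \kappa\tau\hat{\mathbf{n}}) = -\kappa^2\tau$$

or, using the vector identity $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = -\mathbf{b} \cdot (\mathbf{a} \times \mathbf{c})$,

$$\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''') = \kappa^2\tau.$$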
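
As a hedged numerical illustration of this identity, the sketch below evaluates the triple product for a circular helix written in its natural parametrization, $\mathbf{r}(s) = (a\cos(s/c), a\sin(s/c), bs/c)$ with $c = \sqrt{a^2 + b^2}$, and compares it with $\kappa^2\tau$ using the values $\kappa = a/c^2$, $\tau = b/c^2$ obtained in Example 1.6.1. The choice $a = 2$, $b = 1$ and the helper names are ours.

```python
import math

def cross(p, q):
    """Vector product of two 3-vectors given as tuples."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(p, q):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(p, q))

a, b = 2.0, 1.0
c = math.hypot(a, b)                # s = c * t is arclength along the helix
kappa, tau = a / c**2, b / c**2     # curvature and torsion from Example 1.6.1

def triple_product(s):
    """r' . (r'' x r''') for the helix, derivatives taken with respect to s."""
    th = s / c
    r1 = (-a / c * math.sin(th), a / c * math.cos(th), b / c)
    r2 = (-a / c**2 * math.cos(th), -a / c**2 * math.sin(th), 0.0)
    r3 = (a / c**3 * math.sin(th), -a / c**3 * math.cos(th), 0.0)
    return dot(r1, cross(r2, r3))

lhs = triple_product(0.7)
rhs = kappa**2 * tau
```

For these values both sides equal $a^2 b / c^6$, independently of the point $s$ chosen.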



Example 1.4.2

Show that $\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''') = 0$ is necessary and sufficient for a curve to be plane.

Solution. If the left hand side is zero then either $\kappa = 0$ or $\tau = 0$. We prove that $\tau = 0$ always
as follows. Suppose that $\tau \neq 0$ at some point. Then, by continuity, $\tau \neq 0$ in a neighborhood of
this point. Hence $\kappa = 0$ in this neighborhood and the arc is a straight line. Thus $\tau = 0$ on this
arc contrary to hypothesis so that $\tau = 0$ at all points and the curve is plane.

Conversely, if the curve is plane, τ = 0 and the left hand side is zero. ♣

1.5 Serret-Frenet equations

Thus far we have discussed the arc-rate of change of t̂ and b̂. Let us now determine the arc-rate
of change of n̂. We have that n̂ = b̂ × t̂ so that

n̂′ = b̂′ × t̂ + b̂ × t̂′ = −τ n̂ × t̂ + b̂ × κn̂ = τ b̂ − κt̂.

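The equations

$$\hat{\mathbf{t}}' = \kappa\hat{\mathbf{n}} \tag{1.10}$$

$$\hat{\mathbf{n}}' = \tau\hat{\mathbf{b}} - \kappa\hat{\mathbf{t}} \tag{1.11}$$

$$\hat{\mathbf{b}}' = -\tau\hat{\mathbf{n}} \tag{1.12}$$

are called the Serret-Frenet equations of a curve and are the fundamental equations of the theory
of curves in $E^3$.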
The Serret-Frenet equations can be written in a very symmetrical form by defining the so-
called Darboux vector ω by
ω = τ t̂ + κb̂. (1.13)
The Serret-Frenet equations can then be written as

t̂′ = ω × t̂, n̂′ = ω × n̂, b̂′ = ω × b̂. (1.14)
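
The Darboux form (1.14) can be spot-checked numerically; this is an illustration only and not a substitute for the verification asked for in the exercises. The sketch assumes a circular helix in natural parametrization with $a = 2$, $b = 1$, takes $\kappa = a/c^2$ and $\tau = b/c^2$ from Example 1.6.1, writes its moving triad in closed form, and compares central-difference derivatives of the triad with $\omega \times \hat{\mathbf{t}}$, $\omega \times \hat{\mathbf{n}}$, $\omega \times \hat{\mathbf{b}}$. All function names are hypothetical.

```python
import math

def cross(p, q):
    """Vector product of two 3-vectors given as tuples."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

a, b = 2.0, 1.0
c = math.hypot(a, b)
kappa, tau = a / c**2, b / c**2

def frame(s):
    """Triad (t, n, b) of the helix r(s) = (a cos(s/c), a sin(s/c), b s/c)."""
    th = s / c
    t = (-a / c * math.sin(th), a / c * math.cos(th), b / c)
    n = (-math.cos(th), -math.sin(th), 0.0)
    return t, n, cross(t, n)

def max_residual(s, h=1e-5):
    """Largest component of v' - omega x v over v in (t, n, b) at arclength s."""
    t, n, bn = frame(s)
    omega = tuple(tau * ti + kappa * bi for ti, bi in zip(t, bn))  # (1.13)
    ahead, behind = frame(s + h), frame(s - h)
    worst = 0.0
    for v, vp, vm in zip((t, n, bn), ahead, behind):
        deriv = tuple((p - m) / (2 * h) for p, m in zip(vp, vm))
        rhs = cross(omega, v)
        worst = max(worst, max(abs(d - r) for d, r in zip(deriv, rhs)))
    return worst

residual = max_residual(0.3)
```

The residual is of the size of the finite-difference error, consistent with (1.14) holding along the helix.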

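To illustrate the use of equations (1.10)-(1.12), let us investigate the behaviour of a curve
$\mathbf{r} = \mathbf{r}(s)$ in the neighbourhood of a point. We choose an origin $O$ at the point corresponding to
parameter value $s = 0$ and we choose axes $Ox$, $Oy$, $Oz$ along $\hat{\mathbf{t}}$, $\hat{\mathbf{n}}$, $\hat{\mathbf{b}}$ respectively. By Taylor's
theorem

$$\mathbf{r}(s) = \mathbf{r}(0) + s\,\mathbf{r}'(0) + \frac{s^2}{2}\,\mathbf{r}''(0) + \frac{s^3}{6}\,\mathbf{r}'''(0) + o(s^3).$$

We can now use (1.10)-(1.12) to determine the derivatives of $\mathbf{r}$. We have

$$\mathbf{r}' = \hat{\mathbf{t}}$$

$$\mathbf{r}'' = \hat{\mathbf{t}}' = \kappa\hat{\mathbf{n}}$$

$$\mathbf{r}''' = \kappa'\hat{\mathbf{n}} + \kappa\hat{\mathbf{n}}' = \kappa'\hat{\mathbf{n}} + \kappa\tau\hat{\mathbf{b}} - \kappa^2\hat{\mathbf{t}}$$

so that Taylor's expansion can be written

$$\mathbf{r}(s) - \mathbf{r}(0) = s\,\hat{\mathbf{t}} + \frac{s^2}{2}\,\kappa\hat{\mathbf{n}} + \frac{s^3}{6}\left(\kappa'\hat{\mathbf{n}} + \kappa\tau\hat{\mathbf{b}} - \kappa^2\hat{\mathbf{t}}\right) + o(s^3),$$

where $\kappa$, $\kappa'$, $\tau$ and the triad are evaluated at $s = 0$. In view of our choice of axes we have

$$x = s - \frac{\kappa^2}{6}s^3$$

$$y = \frac{\kappa}{2}s^2 + \frac{\kappa'}{6}s^3$$

$$z = \frac{\kappa\tau}{6}s^3$$

and if we retain only the leading terms then

$$x = s, \quad y = \frac{\kappa}{2}s^2, \quad z = \frac{\kappa\tau}{6}s^3. \tag{1.15}$$
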
We see thus that the projection of a small element of the curve onto
(i) the tangent, is of first order,
(ii) the normal, is second order and
(iii) the binormal, is third order.

What is more interesting is to obtain the projections of a small element of a curve onto the
osculating, normal and rectifying planes. We do this by eliminating $s$ from equations (1.15) in
pairs. Eliminating $s$ from the $x$ and $y$ equations yields

$$y = \frac{1}{2}\kappa x^2 \tag{1.16}$$

which is the projection in the osculating plane. By the same token the projection in the normal
plane is

$$z^2 = \frac{2\tau^2}{9\kappa}\,y^3 \tag{1.17}$$

and in the rectifying plane

$$z = \frac{1}{6}\kappa\tau x^3. \tag{1.18}$$

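
The leading-order projections (1.16)-(1.18) can be illustrated on a concrete curve. The sketch below is an assumption-laden example rather than part of the guide: it takes the circular helix $\mathbf{r}(s) = (a\cos(s/c), a\sin(s/c), bs/c)$, $c = \sqrt{a^2+b^2}$, with $a = 2$, $b = 1$, uses $\kappa = a/c^2$, $\tau = b/c^2$ from Example 1.6.1, expresses a nearby point in the axes $\hat{\mathbf{t}}$, $\hat{\mathbf{n}}$, $\hat{\mathbf{b}}$ at $s = 0$, and forms the ratio of each side of (1.16)-(1.18).

```python
import math

a, b = 2.0, 1.0
c = math.hypot(a, b)
kappa, tau = a / c**2, b / c**2

def dot(p, q):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(p, q))

def local_coords(s):
    """Components of r(s) - r(0) along t, n, b taken at s = 0 on the helix."""
    th = s / c
    d = (a * math.cos(th) - a, a * math.sin(th), b * th)
    t = (0.0, a / c, b / c)       # unit tangent at s = 0
    n = (-1.0, 0.0, 0.0)          # principal normal at s = 0
    bn = (0.0, -b / c, a / c)     # binormal t x n at s = 0
    return dot(d, t), dot(d, n), dot(d, bn)

x, y, z = local_coords(0.05)      # a point close to the origin of the triad

ratio_osculating = y / (0.5 * kappa * x**2)               # (1.16)
ratio_normal = z**2 / (2 * tau**2 * y**3 / (9 * kappa))   # (1.17)
ratio_rectifying = z / (kappa * tau * x**3 / 6)           # (1.18)
```

Each ratio differs from 1 only by terms of order $s^2$, which is what "retaining only the leading terms" means here.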
These results are quite remarkable since we started with a perfectly arbitrary curve and yet the
projections of a small portion are always a parabola, a curve $z^2 \sim y^3$ (which to my knowledge
doesn't have a name) and a cubic. In some cases we do not have to restrict our attention to
small portions of the curve but these cases are exceptional. Consider, for example, the previously
mentioned twisted cubic (1.3)

$$xz = y^2, \quad xy = z. \tag{1.19}$$

Removing $z$ yields $y = x^2$, removing $x$ yields $z^2 = y^3$ and removing $y$ yields $z = x^3$ and these
equations are valid globally. We can determine $\kappa$ and $\tau$ at the origin from these equations very
easily by comparing them with equations (1.16)-(1.18). We have

$$\frac{1}{2}\kappa = 1, \quad \frac{2\tau^2}{9\kappa} = 1, \quad \frac{1}{6}\kappa\tau = 1$$

from which it follows that, for the cubic (1.19), $\kappa = 2$, $\tau = 3$ at the origin.
We conclude our discussion of curves by showing that the curvature and torsion characterize
a curve completely. In other words, if $\kappa$ and $\tau$ are given by

$$\kappa = f(s), \quad \tau = g(s) \tag{1.20}$$

then these equations (which are called the intrinsic equations of a curve) determine the curve
uniquely up to a "Euclidean motion". By a "Euclidean motion" we mean a translation and
a rotation in $E^3$. All this means is that (1.20) determines the shape of the curve but not its
position in $E^3$. Curves with the same shape are said to be congruent and if $\Gamma_1 : \mathbf{r}_1 = \mathbf{r}_1(s)$ and
$\Gamma_2 : \mathbf{r}_2 = \mathbf{r}_2(s)$ are two curves and $P_1$ on $\Gamma_1$ and $P_2$ on $\Gamma_2$ have the same parameter value $s$ then
$P_1$ and $P_2$ are said to be corresponding points. We can now prove the fundamental result:
Theorem: If $\Gamma_1$ and $\Gamma_2$ are two curves and if the curvature $\kappa$ and torsion $\tau$ of $\Gamma_1$ have the same values
at corresponding points of $\Gamma_2$ then $\Gamma_1$ and $\Gamma_2$ are congruent.

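Proof. Consider triads $(\hat{\mathbf{t}}_1, \hat{\mathbf{n}}_1, \hat{\mathbf{b}}_1)$, $(\hat{\mathbf{t}}_2, \hat{\mathbf{n}}_2, \hat{\mathbf{b}}_2)$ at corresponding points on $\Gamma_1$ and $\Gamma_2$ respectively.
Let $P$ and $Q$ be corresponding points on $\Gamma_1$ and $\Gamma_2$ respectively. Translate and rotate $\Gamma_2$ so that
$Q$ coincides with $P$ and the two triads also coincide at $P$. The fact that the triads coincide at
a single point does not necessarily mean that they coincide everywhere; we have to prove this
using the fact that $\kappa$ and $\tau$ are the same for both curves. At the point $P$ we have

$$\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2 + \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 + \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2 = 3$$

since the triads coincide at this point. The Serret-Frenet equations are valid for both $\Gamma_1$ and $\Gamma_2$
with the same $\kappa$ and $\tau$ values. Hence we have

$$\begin{aligned}
\frac{d}{ds}\left(\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2\right) &= \hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2' + \hat{\mathbf{t}}_1' \cdot \hat{\mathbf{t}}_2 \\
&= \hat{\mathbf{t}}_1 \cdot \kappa\hat{\mathbf{n}}_2 + \kappa\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{t}}_2 = \kappa\left(\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{n}}_2 + \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{t}}_2\right), \\
\frac{d}{ds}\left(\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2\right) &= \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2' + \hat{\mathbf{n}}_1' \cdot \hat{\mathbf{n}}_2 \\
&= \hat{\mathbf{n}}_1 \cdot (-\kappa\hat{\mathbf{t}}_2 + \tau\hat{\mathbf{b}}_2) + (-\kappa\hat{\mathbf{t}}_1 + \tau\hat{\mathbf{b}}_1) \cdot \hat{\mathbf{n}}_2 \\
&= -\kappa\left(\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{t}}_2 + \hat{\mathbf{t}}_1 \cdot \hat{\mathbf{n}}_2\right) + \tau\left(\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{b}}_2 + \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{n}}_2\right), \\
\frac{d}{ds}\left(\hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2\right) &= \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2' + \hat{\mathbf{b}}_1' \cdot \hat{\mathbf{b}}_2 \\
&= \hat{\mathbf{b}}_1 \cdot (-\tau\hat{\mathbf{n}}_2) + (-\tau\hat{\mathbf{n}}_1) \cdot \hat{\mathbf{b}}_2 \\
&= -\tau\left(\hat{\mathbf{b}}_1 \cdot \hat{\mathbf{n}}_2 + \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{b}}_2\right).
\end{aligned}$$

Adding the above gives

$$\frac{d}{ds}\left(\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2 + \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 + \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2\right) = 0.$$

Hence, by integrating,

$$\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2 + \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 + \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2 = \text{constant}.$$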

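But the above sum is 3 at $P$, so it must be 3 for all $s$. Since

$$-1 \le \hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2 \le 1, \quad -1 \le \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 \le 1, \quad -1 \le \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2 \le 1$$

we must have

$$\hat{\mathbf{t}}_1 \cdot \hat{\mathbf{t}}_2 = 1, \quad \hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 = 1, \quad \hat{\mathbf{b}}_1 \cdot \hat{\mathbf{b}}_2 = 1$$

for all $s$. Hence the triads coincide for all $s$, and in particular

$$\hat{\mathbf{t}}_1 - \hat{\mathbf{t}}_2 = \frac{d}{ds}(\mathbf{r}_1 - \mathbf{r}_2) = 0$$

which implies that $\mathbf{r}_1 - \mathbf{r}_2$ is constant. Since $\mathbf{r}_1 - \mathbf{r}_2 = 0$ at $P$, it follows that $\mathbf{r}_1 = \mathbf{r}_2$ at all
corresponding points which completes the proof.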


The interesting point about this proof is that one cannot prove directly that t̂1 = t̂2 . In other
words, we cannot prove that if t̂1 = t̂2 at P then t̂1 = t̂2 everywhere. We have to prove that the
triads coincide and hence that t̂1 and t̂2 coincide.

1.6 Examples

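The most important topics here are the Frenet-Serret equations and the following formulae for
the curvature and torsion of a curve:

(i) $\kappa^2 = \mathbf{r}'' \cdot \mathbf{r}''$

(ii) $\tau = \dfrac{\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''')}{\mathbf{r}'' \cdot \mathbf{r}''}$

(iii) $|\kappa| = \dfrac{|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)|}{|\dot{\mathbf{r}}|^3}$

(iv) $\tau = \dfrac{(\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)) \cdot \dddot{\mathbf{r}}(t)}{|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)|^2} = \dfrac{\dot{\mathbf{r}}(t) \cdot (\ddot{\mathbf{r}}(t) \times \dddot{\mathbf{r}}(t))}{|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)|^2}$

Remember that a dot indicates a derivative with respect to an arbitrary curve parameter,
while a prime indicates a derivative with respect to arc-length s.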
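
Formulae (iii) and (iv) translate directly into a few lines of code. The following is a minimal sketch, assuming the first three derivative vectors are supplied by hand as tuples; the function names `curvature` and `torsion` are our own. The circular helix of Example 1.6.1 with $a = 2$, $b = 1$ is used as a test case.

```python
import math

def cross(p, q):
    """Vector product of two 3-vectors given as tuples."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dot(p, q):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(p, q))

def curvature(r1, r2):
    """Formula (iii): |kappa| = |r. x r..| / |r.|^3 for an arbitrary parameter."""
    w = cross(r1, r2)
    return math.sqrt(dot(w, w)) / math.sqrt(dot(r1, r1)) ** 3

def torsion(r1, r2, r3):
    """Formula (iv): tau = (r. x r..) . r... / |r. x r..|^2."""
    w = cross(r1, r2)
    return dot(w, r3) / dot(w, w)

# Circular helix r(t) = (a cos t, a sin t, b t); derivatives written out by hand.
a, b, t = 2.0, 1.0, 0.9
r1 = (-a * math.sin(t), a * math.cos(t), b)
r2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
r3 = (a * math.sin(t), -a * math.cos(t), 0.0)

kappa = curvature(r1, r2)
tau = torsion(r1, r2, r3)
```

For the helix the results do not depend on `t` and agree with the closed forms $a/(a^2+b^2)$ and $b/(a^2+b^2)$ derived in Example 1.6.1.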

Example 1.6.1

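Find the curvature and torsion along the circular helix

$$\mathbf{r}(t) = (a\cos t, a\sin t, bt), \qquad a > 0, \; b > 0.$$

Solution. To use the above formulae, we must calculate $\dot{\mathbf{r}}$, $\ddot{\mathbf{r}}$ and $\dddot{\mathbf{r}}$. We get

$$\dot{\mathbf{r}}(t) = (-a\sin t, a\cos t, b)$$

$$\ddot{\mathbf{r}}(t) = (-a\cos t, -a\sin t, 0)$$

$$\dddot{\mathbf{r}}(t) = (a\sin t, -a\cos t, 0).$$

Hence

$$|\dot{\mathbf{r}}| = \sqrt{a^2\sin^2 t + a^2\cos^2 t + b^2} = \sqrt{a^2 + b^2}$$

and

$$\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -a\sin t & a\cos t & b \\ -a\cos t & -a\sin t & 0 \end{vmatrix} = (ab\sin t, -ab\cos t, a^2).$$

Hence

$$|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)| = \sqrt{a^2b^2 + a^4} = a\sqrt{a^2 + b^2}.$$

So for the curvature we get

$$|\kappa| = \frac{|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)|}{|\dot{\mathbf{r}}|^3} = \frac{a(a^2 + b^2)^{1/2}}{(a^2 + b^2)^{3/2}} = \frac{a}{a^2 + b^2}.$$

For the torsion we need $(\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)) \cdot \dddot{\mathbf{r}}(t)$. We get

$$(\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)) \cdot \dddot{\mathbf{r}}(t) = a^2b\sin^2 t + a^2b\cos^2 t + 0 = a^2b.$$

Hence

$$\tau = \frac{(\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)) \cdot \dddot{\mathbf{r}}(t)}{|\dot{\mathbf{r}}(t) \times \ddot{\mathbf{r}}(t)|^2} = \frac{a^2b}{a^2(a^2 + b^2)} = \frac{b}{a^2 + b^2}.$$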

Example 1.6.2

Find the curvature and torsion of the curves

(a) $\mathbf{r} = (u, u^2, u^3)$,

(b) $\mathbf{r} = \left(u, \dfrac{1 + u}{u}, \dfrac{1 - u^2}{u}\right)$,

(c) $\mathbf{r} = (3u - u^3, 3u^2, 3u + u^3)$.

Solution.

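(a) We get

$$\dot{\mathbf{r}}(u) = (1, 2u, 3u^2)$$

$$\ddot{\mathbf{r}}(u) = (0, 2, 6u)$$

$$\dddot{\mathbf{r}}(u) = (0, 0, 6).$$

Hence

$$|\dot{\mathbf{r}}| = \sqrt{1 + 4u^2 + 9u^4}$$

and

$$\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2u & 3u^2 \\ 0 & 2 & 6u \end{vmatrix} = (6u^2, -6u, 2).$$

Hence

$$|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)| = 2\sqrt{9u^4 + 9u^2 + 1}.$$

So for the curvature we get

$$|\kappa| = \frac{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|}{|\dot{\mathbf{r}}|^3} = \frac{2(9u^4 + 9u^2 + 1)^{1/2}}{(1 + 4u^2 + 9u^4)^{3/2}}.$$

For the torsion we need $(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)$. We get

$$(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u) = 12.$$

Hence

$$\tau = \frac{(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)}{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|^2} = \frac{12}{36u^4 + 36u^2 + 4} = \frac{3}{9u^4 + 9u^2 + 1}.$$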

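(b) We get

$$\mathbf{r}(u) = (u, u^{-1} + 1, u^{-1} - u)$$

$$\dot{\mathbf{r}}(u) = (1, -u^{-2}, -u^{-2} - 1)$$

$$\ddot{\mathbf{r}}(u) = (0, 2u^{-3}, 2u^{-3})$$

$$\dddot{\mathbf{r}}(u) = (0, -6u^{-4}, -6u^{-4}).$$

Hence

$$|\dot{\mathbf{r}}| = \sqrt{2 + 2u^{-2} + 2u^{-4}}$$

and

$$\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -u^{-2} & -u^{-2} - 1 \\ 0 & 2u^{-3} & 2u^{-3} \end{vmatrix} = (2u^{-3}, -2u^{-3}, 2u^{-3}).$$

Hence

$$|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)| = 2\sqrt{3}\,|u|^{-3}.$$

So for the curvature we get

$$|\kappa| = \frac{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|}{|\dot{\mathbf{r}}(u)|^3} = \frac{2\sqrt{3}\,|u|^{-3}}{2^{3/2}(1 + u^{-2} + u^{-4})^{3/2}} = \sqrt{\frac{3}{2}}\left(\frac{1}{u^2 + 1 + u^{-2}}\right)^{3/2}.$$

For the torsion we need $(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)$. We get

$$(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u) = 0.$$

Hence

$$\tau = \frac{(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)}{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|^2} = 0.$$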

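(c) We get

$$\dot{\mathbf{r}}(u) = (3 - 3u^2, 6u, 3 + 3u^2)$$

$$\ddot{\mathbf{r}}(u) = (-6u, 6, 6u)$$

$$\dddot{\mathbf{r}}(u) = (-6, 0, 6).$$

Hence

$$|\dot{\mathbf{r}}(u)| = \left([3 - 3u^2]^2 + 36u^2 + [3 + 3u^2]^2\right)^{1/2} = 3\sqrt{2}\,(1 + u^2)$$

and

$$\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 3 - 3u^2 & 6u & 3 + 3u^2 \\ -6u & 6 & 6u \end{vmatrix} = 18(u^2 - 1, -2u, u^2 + 1).$$

Hence

$$\begin{aligned}
|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)| &= 18\left([u^2 - 1]^2 + 4u^2 + [u^2 + 1]^2\right)^{1/2} \\
&= 18\left([u^2 + 1]^2 + [u^2 + 1]^2\right)^{1/2} \\
&= 18\sqrt{2}\,(u^2 + 1).
\end{aligned}$$

So for the curvature we get

$$|\kappa| = \frac{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|}{|\dot{\mathbf{r}}(u)|^3} = \frac{18\sqrt{2}\,(u^2 + 1)}{54\sqrt{2}\,(u^2 + 1)^3} = \frac{1}{3(u^2 + 1)^2}.$$

For the torsion we need $(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)$. We get

$$(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u) = 18(u^2 - 1, -2u, u^2 + 1) \cdot (-6, 0, 6) = 216.$$

Hence

$$\tau = \frac{(\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)) \cdot \dddot{\mathbf{r}}(u)}{|\dot{\mathbf{r}}(u) \times \ddot{\mathbf{r}}(u)|^2} = \frac{216}{2 \cdot 18^2 (u^2 + 1)^2} = \frac{1}{3(u^2 + 1)^2}.$$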
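
A closed form such as the curvature found in part (c) can be cross-checked against the definition $|\kappa| = |d\hat{\mathbf{t}}/ds|$ rather than against formula (iii). The sketch below is one possible check under our own assumptions: it differentiates the unit tangent with respect to $u$ by a central difference (step `h` chosen by us) and divides by $|\dot{\mathbf{r}}|$, since $d/ds = |\dot{\mathbf{r}}|^{-1}\, d/du$. The function names are hypothetical.

```python
import math

def r_dot(u):
    """First derivative of r = (3u - u^3, 3u^2, 3u + u^3) with respect to u."""
    return (3 - 3 * u * u, 6 * u, 3 + 3 * u * u)

def unit_tangent(u):
    """Return the unit tangent and the speed |dr/du| at parameter value u."""
    v = r_dot(u)
    speed = math.sqrt(sum(x * x for x in v))
    return tuple(x / speed for x in v), speed

def curvature_from_definition(u, h=1e-5):
    """|kappa| = |dt/ds| = |dt/du| / |dr/du|, with dt/du by central difference."""
    t_plus, _ = unit_tangent(u + h)
    t_minus, _ = unit_tangent(u - h)
    _, speed = unit_tangent(u)
    dt_du = [(p - m) / (2 * h) for p, m in zip(t_plus, t_minus)]
    return math.sqrt(sum(x * x for x in dt_du)) / speed

u = 0.8
numeric = curvature_from_definition(u)
closed_form = 1 / (3 * (1 + u * u) ** 2)   # result of part (c)
```

The two values agree up to the finite-difference error, which gives an independent confirmation of the algebra in part (c).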

Example 1.6.3

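Show that

$$(\mathbf{t}' \times \mathbf{t}'') \cdot \mathbf{t}''' = \kappa^5 \frac{d}{ds}\left(\frac{\tau}{\kappa}\right).$$

Solution. We can use the Frenet-Serret equations to find the derivatives of $\mathbf{t}$ in terms of the
vectors $\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$ as follows. First

$$\mathbf{t}' = \kappa\mathbf{n}.$$

Therefore

$$\begin{aligned}
\mathbf{t}'' &= \kappa'\mathbf{n} + \kappa\mathbf{n}' \\
&= \kappa'\mathbf{n} + \kappa(\tau\mathbf{b} - \kappa\mathbf{t}) \\
&= -\kappa^2\mathbf{t} + \kappa'\mathbf{n} + \kappa\tau\mathbf{b}.
\end{aligned}$$

Differentiating $\mathbf{t}''$ again gives (after using the Frenet-Serret equations to substitute for the
derivatives $\mathbf{t}'$, $\mathbf{n}'$ and $\mathbf{b}'$)

$$\mathbf{t}''' = -3\kappa\kappa'\mathbf{t} + (-\kappa^3 + \kappa'' - \kappa\tau^2)\mathbf{n} + (2\kappa'\tau + \kappa\tau')\mathbf{b}.$$

Therefore

$$\begin{aligned}
\mathbf{t}' \times \mathbf{t}'' &= \kappa\mathbf{n} \times (-\kappa^2\mathbf{t} + \kappa'\mathbf{n} + \kappa\tau\mathbf{b}) \\
&= \kappa^3\mathbf{b} + \kappa^2\tau\mathbf{t}.
\end{aligned}$$

Hence

$$\begin{aligned}
(\mathbf{t}' \times \mathbf{t}'') \cdot \mathbf{t}''' &= -3\kappa^3\kappa'\tau + 2\kappa^3\kappa'\tau + \kappa^3\kappa\tau' \\
&= \kappa^3(\kappa\tau' - \kappa'\tau) \\
&= \kappa^5\,\frac{\kappa\tau' - \kappa'\tau}{\kappa^2} \\
&= \kappa^5 \frac{d}{ds}\left(\frac{\tau}{\kappa}\right).
\end{aligned}$$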

Example 1.6.4

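Show that if $\tau \neq 0$ and $\kappa$ are the torsion and curvature of a curve lying on the surface of a sphere
then

$$\frac{\tau}{\kappa} = \frac{d}{ds}\left(\frac{\kappa'}{\kappa^2\tau}\right).$$

Solution. The result does not depend on where the center of the sphere is located, so for
simplicity let's suppose that the center of the sphere is at the origin. Then the vector equation
of the sphere will be

$$\mathbf{R} \cdot \mathbf{R} = a^2$$

where $a$ is the radius of the sphere and $\mathbf{R}$ is the position vector of any point on the sphere.
Suppose that the curve $\mathbf{r}(s)$ lies on the sphere. We will be using the Frenet-Serret equations, so
it will be convenient to express $\mathbf{r}(s)$ as a linear sum of $\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$. (We can do this since these
vectors are mutually perpendicular and can therefore be used as basis vectors in place of $\mathbf{i}$, $\mathbf{j}$ and
$\mathbf{k}$.) So we put

$$\mathbf{r}(s) = u\mathbf{t} + v\mathbf{n} + w\mathbf{b}.$$

We can express $u$, $v$ and $w$ in terms of $\kappa$ and $\tau$ as follows. First

$$\mathbf{r} \cdot \mathbf{t} = u, \quad \mathbf{r} \cdot \mathbf{n} = v, \quad \mathbf{r} \cdot \mathbf{b} = w.$$

Because $\mathbf{r}(s)$ lies on the sphere of radius $a$ we get

$$|\mathbf{r}(s)|^2 = \mathbf{r}(s) \cdot \mathbf{r}(s) = a^2$$

and therefore differentiating gives

$$\mathbf{r}' \cdot \mathbf{r} = 0$$

or, because $\mathbf{t} = \mathbf{r}'$,

$$u = \mathbf{t} \cdot \mathbf{r} = 0.$$

Differentiating this equation with respect to $s$ and using the first of the Frenet-Serret equations
gives

$$0 = \mathbf{t}' \cdot \mathbf{r} + \mathbf{t} \cdot \mathbf{t} = \kappa\mathbf{n} \cdot \mathbf{r} + 1. \tag{1}$$

Hence

$$v = \mathbf{n} \cdot \mathbf{r} = -\frac{1}{\kappa}.$$

Differentiating equation (1) and using the second Frenet-Serret equation gives (the extra term
$\kappa\mathbf{n} \cdot \mathbf{r}' = \kappa\mathbf{n} \cdot \mathbf{t}$ vanishes)

$$0 = (\kappa'\mathbf{n} + \kappa\mathbf{n}') \cdot \mathbf{r} = (\kappa'\mathbf{n} - \kappa^2\mathbf{t} + \kappa\tau\mathbf{b}) \cdot \mathbf{r}.$$

Hence, using $\mathbf{n} \cdot \mathbf{r} = -1/\kappa$ and $\mathbf{t} \cdot \mathbf{r} = 0$,

$$-\frac{\kappa'}{\kappa} + \kappa\tau\,\mathbf{b} \cdot \mathbf{r} = 0$$

and therefore

$$w = \mathbf{b} \cdot \mathbf{r} = \frac{\kappa'}{\kappa^2\tau}.$$

So

$$\mathbf{r}(s) = -\frac{1}{\kappa}\mathbf{n} + \frac{\kappa'}{\kappa^2\tau}\mathbf{b}.$$

Hence

$$a^2 = \mathbf{r} \cdot \mathbf{r} = \frac{1}{\kappa^2} + \left(\frac{\kappa'}{\kappa^2\tau}\right)^2.$$

Then differentiating both sides of this expression with respect to $s$ gives

$$-2\kappa^{-3}\kappa' + 2\left(\frac{\kappa'}{\kappa^2\tau}\right)\frac{d}{ds}\left(\frac{\kappa'}{\kappa^2\tau}\right) = 0$$

and hence, dividing by $2\kappa'/(\kappa^2\tau)$ (assuming $\kappa' \neq 0$),

$$\frac{\tau}{\kappa} = \frac{d}{ds}\left(\frac{\kappa'}{\kappa^2\tau}\right).$$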

1.7 Exercises

1.7.1 Verify equations (1.14).

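1.7.2 Show that the curvature and torsion of a curve $\mathbf{r} = \mathbf{r}(s)$ are given by

$$\kappa^2 = \mathbf{r}'' \cdot \mathbf{r}'' \quad \text{and} \quad \tau = \frac{\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''')}{\mathbf{r}'' \cdot \mathbf{r}''}.$$

1.7.3 Show that

$$\mathbf{r}' \cdot (\mathbf{r}'' \times \mathbf{r}''') = (u')^6\,[\dot{\mathbf{r}} \cdot (\ddot{\mathbf{r}} \times \dddot{\mathbf{r}})]$$

where

$$u' = \frac{du}{ds} = \frac{1}{|\dot{\mathbf{r}}|}.$$

1.7.4 Show that the curvature and torsion of a curve $\mathbf{r} = \mathbf{r}(u)$ are given by

$$\kappa^2 = \frac{|\dot{\mathbf{r}} \times \ddot{\mathbf{r}}|^2}{|\dot{\mathbf{r}}|^6} \quad \text{and} \quad \tau = \frac{\dot{\mathbf{r}} \cdot (\ddot{\mathbf{r}} \times \dddot{\mathbf{r}})}{|\dot{\mathbf{r}} \times \ddot{\mathbf{r}}|^2}.$$

[This problem actually involves a little more manipulation than it would appear. The
difficulty is in obtaining $\kappa^2$; once you have this, $\tau$ follows directly from Exercises 1.7.2 and
1.7.3. Proceed as follows:

$$\mathbf{r}' = \frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}$$

so that

$$\mathbf{r}'' = \frac{d}{ds}\left(\frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}\right) = \frac{d}{du}\left(\frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}\right)\frac{du}{ds} = \frac{1}{|\dot{\mathbf{r}}|}\frac{d}{du}\left(\frac{\dot{\mathbf{r}}}{|\dot{\mathbf{r}}|}\right).$$

To determine $\frac{d}{du}(|\dot{\mathbf{r}}|)$, use the relation

$$\dot{\mathbf{r}} \cdot \dot{\mathbf{r}} = |\dot{\mathbf{r}}|^2.$$

This is not all: one also has to make use of the familiar vector identity for $(\mathbf{A} \times \mathbf{B}) \times \mathbf{C}$ in
the form

$$(\dot{\mathbf{r}} \times \ddot{\mathbf{r}}) \times \dot{\mathbf{r}} = (\dot{\mathbf{r}} \cdot \dot{\mathbf{r}})\,\ddot{\mathbf{r}} - (\dot{\mathbf{r}} \cdot \ddot{\mathbf{r}})\,\dot{\mathbf{r}}$$

and the fact that $\dot{\mathbf{r}}$ is perpendicular to $\dot{\mathbf{r}} \times \ddot{\mathbf{r}}$.]

1.7.5 Find the curvature and torsion of the curves

(a) $\mathbf{r} = (u, u^2, u^3)$

(b) $\mathbf{r} = \left(u, \dfrac{1 + u}{u}, \dfrac{1 - u^2}{u}\right)$

(c) $\mathbf{r} = (3u - u^3, 3u^2, 3u + u^3)$.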

1.7.6 Determine the function f (u) such that the curve r = (a cos u, a sin u, f (u)) be a plane
curve.

1.7.7 Find the equation of the osculating plane at an arbitrary point on the curve $\mathbf{r} = (u, u^2, u^3)$.

1.7.8 Show that the curve of intersection of the surfaces

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \quad x = a\cosh\left(\frac{z}{a}\right)$$

can be written in parametric form as

$$x = a\cosh u, \quad y = b\sinh u, \quad z = au$$

and hence show that the arclength from $(a, 0, 0)$ to $(x, y, z)$ is

$$s = \frac{y\sqrt{a^2 + b^2}}{b}.$$

1.7.9 When r = r(t) is the path of a moving point as a function of time, show that the
acceleration vector lies in the osculating plane.

1.7.10 Find the unit tangent to the curve of intersection of the surfaces

F (x, y, z) = 0, G(x, y, z) = 0.


Chapter 2

Surfaces in $E^3$

2.1 Introduction

This chapter concerns the theory of 2-dimensional surfaces in $E^3$ and here there are many
possibilities for indicating the road to more general and abstract geometry. We again use vector
methods in developing virtually all of the theory but wherever it is possible to re-write equations
in tensor notation, we do so. Our aim is to illustrate the conciseness of tensor notation and, at the
same time, to provide some exercise in manipulating tensors at a very elementary and concrete
level. You will notice that a “tensor” is never given a formal definition – it isn’t necessary to
do so because all the tensorial quantities which we work with are always expressed in terms of
vectors so that the abstract definition of a tensor is unnecessary.
The geometry of surfaces can be divided into two parts – intrinsic geometry and extrinsic
geometry. By intrinsic we mean those aspects of surfaces which are independent of the way
in which the surface is embedded in the surrounding space E3 . For example, if two points
on a surface are connected by a curve lying in the surface then the distance along the curve
between the two points is an intrinsic quantity. In general this distance will be different from
the Euclidean distance between the points which is just the length of the straight line joining
the points. Extrinsic properties, on the other hand, can be regarded as those properties of the
surface which can be described by viewing the surface from the vantage point of E3 .

2.2 Preliminaries

In E3 we know that the surface of a sphere of radius a can be described by the equation

\[ x^2 + y^2 + z^2 - a^2 = 0 \]

or, equivalently, by the parametric equations

x = a sin θ cos ϕ
y = a sin θ sin ϕ
z = a cos θ


where the parameters θ and ϕ are the colatitude and longitude respectively.
In fact, the two representations above indicate the definition of a spherical surface Σ in E3 .
In the first instance, a surface can be represented by the implicit equation

F (x, y, z) = 0 (2.1)

or, by the so-called explicit equation


z = f (x, y). (2.2)

Both of these descriptions are particularly suitable for the global study of surfaces but we are
mainly interested in local differential geometry and in these circumstances it is generally more
convenient to work with the so-called parametric equations of Σ namely,

\[ x = x(u^1, u^2), \quad y = y(u^1, u^2), \quad z = z(u^1, u^2) \tag{2.3} \]

or, if $\mathbf{r} = (x, y, z)$ is the position vector of an arbitrary point on Σ then (2.3) can be written more
concisely as
\[ \mathbf{r} = \mathbf{r}(u^1, u^2) \tag{2.4} \]

or
\[ \mathbf{r} = \mathbf{r}(u^\alpha), \quad \alpha = 1, 2. \tag{2.5} \]

We see thus that a surface, which is a 2-dimensional object, requires two parameters for its
description in contrast to a curve, which is a 1-dimensional object, and requires only one
parameter.
Strictly speaking, a parametric representation of points on the surface in $E^3$ is a mapping
$\mathbf{r} = \mathbf{r}(u^1, u^2)$ of an open set U in the $u^1u^2$-plane onto Σ. Observe that the image of the coordinate
line $u^1 = c$ in U will be a curve $\mathbf{r} = \mathbf{r}(c, u^2)$ called the $u^2$-parameter curve. Similarly the image
of the coordinate line $u^2 = d$ in U is the curve $\mathbf{r} = \mathbf{r}(u^1, d)$ called the $u^1$-parameter curve. Thus
the parametric representation covers Σ with two families of curves, the images of the coordinate
lines $u^1$ = constant and $u^2$ = constant.

Change of parameters

Consider now the surface defined by

\[ x = u^1 + u^2, \quad y = u^1 - u^2, \quad z = 4u^1u^2, \quad -\infty < u^1, u^2 < \infty. \]

If we eliminate $u^1$ and $u^2$ we obtain the explicit form

\[ z = x^2 - y^2 . \]

However, the parametric equations

\[ x = \bar{u}^1, \quad y = \bar{u}^2, \quad z = (\bar{u}^1)^2 - (\bar{u}^2)^2 \]

represent the same surface $z = x^2 - y^2$, which shows that the parametric equations of a surface
are by no means unique.
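
The non-uniqueness can be checked numerically. The following is a minimal sketch, not part of the original guide; the function names `r_u`, `r_ubar`, `to_ubar` and `same_point` are illustrative choices. It evaluates both parametrizations of the saddle and confirms that they give the same point of $E^3$ once the parameters are linked by $\bar{u}^1 = u^1 + u^2$, $\bar{u}^2 = u^1 - u^2$.

```python
# Two parametrizations of the saddle z = x^2 - y^2 (all names are illustrative).

def r_u(u1, u2):
    """First parametrization: x = u1 + u2, y = u1 - u2, z = 4*u1*u2."""
    return (u1 + u2, u1 - u2, 4.0 * u1 * u2)


def r_ubar(ub1, ub2):
    """Second parametrization: x = ub1, y = ub2, z = ub1^2 - ub2^2."""
    return (ub1, ub2, ub1 ** 2 - ub2 ** 2)


def to_ubar(u1, u2):
    """Parameter transformation ubar = ubar(u) linking the two descriptions."""
    return (u1 + u2, u1 - u2)


def same_point(u1, u2, tol=1e-12):
    """Check that both parametrizations give the same point of E^3."""
    p = r_u(u1, u2)
    q = r_ubar(*to_ubar(u1, u2))
    return all(abs(a - b) <= tol for a, b in zip(p, q))
```

Calling `same_point(0.3, -1.7)` compares the two position vectors component by component; the tolerance only absorbs floating-point rounding.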
When a surface can be given two parametric representations we regard it as part of the
definition of the surface that the two sets of parameters are related by a so-called parameter
transformation, i.e.
\[ \bar{u}^\mu = \bar{u}^\mu(u^\alpha), \quad \alpha, \mu = 1, 2. \tag{2.6} \]

Note that, in terms of our notation, this equation stands for the pair of equations

\[ \bar{u}^1 = \bar{u}^1(u^1, u^2) \quad \text{and} \quad \bar{u}^2 = \bar{u}^2(u^1, u^2). \]

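The transformation (2.6) is said to be proper if the Jacobian of the matrix
\[
\left(\frac{\partial \bar{u}^\mu}{\partial u^\alpha}\right) =
\begin{pmatrix}
\dfrac{\partial \bar{u}^1}{\partial u^1} & \dfrac{\partial \bar{u}^2}{\partial u^1} \\[2ex]
\dfrac{\partial \bar{u}^1}{\partial u^2} & \dfrac{\partial \bar{u}^2}{\partial u^2}
\end{pmatrix}
\tag{2.7}
\]
is non-zero, i.e. if
\[
\bar{J} = \det\left(\frac{\partial \bar{u}^\mu}{\partial u^\alpha}\right) \neq 0. \tag{2.8}
\]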
If D is the domain of $u^\alpha$ and $\bar{D}$ the domain of $\bar{u}^\mu$ then, provided (2.8) is valid, equations (2.6)
are invertible in $D \cap \bar{D}$ so that we can write

\[ u^\alpha = u^\alpha(\bar{u}^\mu). \tag{2.9} \]

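Also associated to the inverse transformation (2.9) is a matrix
\[
\left(\frac{\partial u^\alpha}{\partial \bar{u}^\mu}\right) =
\begin{pmatrix}
\dfrac{\partial u^1}{\partial \bar{u}^1} & \dfrac{\partial u^2}{\partial \bar{u}^1} \\[2ex]
\dfrac{\partial u^1}{\partial \bar{u}^2} & \dfrac{\partial u^2}{\partial \bar{u}^2}
\end{pmatrix}
\tag{2.10}
\]
whose Jacobian is non-zero, i.e.
\[
J = \det\left(\frac{\partial u^\alpha}{\partial \bar{u}^\mu}\right) \neq 0. \tag{2.11}
\]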
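The two matrices (2.7) and (2.10) are inverses of each other so now from matrix theory it follows
that
\[
\sum_{\mu=1}^{2} \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,\frac{\partial u^\beta}{\partial \bar{u}^\mu} = \delta^\beta_\alpha \tag{2.12}
\]
and
\[
\sum_{\alpha=1}^{2} \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,\frac{\partial u^\alpha}{\partial \bar{u}^\nu} = \delta^\mu_\nu \tag{2.13}
\]
where the right hand sides of these equations are called Kronecker deltas. They represent the
entries in the unit matrix and are defined by
\[
\delta^\beta_\alpha =
\begin{cases}
1 & \text{when } \beta = \alpha, \\
0 & \text{when } \beta \neq \alpha.
\end{cases}
\tag{2.14}
\]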

Summation convention

In most expressions involving summations so far, the summation index occurred precisely twice in
the expression. From now on whenever an index occurs twice in an expression we shall understand
that there must be a summation over this index from 1 to 2 and we will not write the $\sum$. Indices
can, of course, occur only once in an expression but these will be free indices and will occur on
both sides of the equation (unless one side of the equation is zero). Free indices can be regarded
as labelling indices in the sense that an equation involving free indices means that the equation is
valid for all values of the free indices. Summation indices are often called dummy indices because
a summation index can always be changed to any other index, provided that the new index does
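not occur elsewhere in the expression. For example, equation (2.12) may be written using the
summation convention as
\[
\frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,\frac{\partial u^\beta}{\partial \bar{u}^\mu} = \delta^\beta_\alpha \tag{2.15}
\]
where α and β are free indices and the summation is over μ, and equation (2.13) becomes
\[
\frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,\frac{\partial u^\alpha}{\partial \bar{u}^\nu} = \delta^\mu_\nu \tag{2.16}
\]
with μ and ν as free indices. We shall henceforth apply the summation convention.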
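
As an illustration of (2.15) and (2.16), the sketch below writes the implied sums out explicitly for the linear transformation $\bar{u}^1 = u^1 + u^2$, $\bar{u}^2 = u^1 - u^2$ used earlier. This is only a hedged example: the names `DUBAR`, `DU`, `delta_215` and `delta_216` are hypothetical, and the code uses 0-based indices where the text uses 1 and 2.

```python
# DUBAR[mu][alpha] = d(ubar^mu)/d(u^alpha) for ubar^1 = u^1 + u^2, ubar^2 = u^1 - u^2
DUBAR = [[1.0, 1.0],
         [1.0, -1.0]]

# DU[alpha][mu] = d(u^alpha)/d(ubar^mu) for the inverse transformation
# u^1 = (ubar^1 + ubar^2)/2, u^2 = (ubar^1 - ubar^2)/2
DU = [[0.5, 0.5],
      [0.5, -0.5]]


def delta_215(alpha, beta):
    """Left-hand side of (2.15): the repeated index mu is summed over."""
    return sum(DUBAR[mu][alpha] * DU[beta][mu] for mu in range(2))


def delta_216(mu, nu):
    """Left-hand side of (2.16): the repeated index alpha is summed over."""
    return sum(DUBAR[mu][alpha] * DU[alpha][nu] for alpha in range(2))
```

In both functions the free indices are the arguments and the dummy index is the loop variable, which mirrors the roles the two kinds of index play in the summation convention.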

Tangent plane and normal vector

An arbitrary curve lying on the surface Σ can be represented by the parametric equations

uα = uα (t) (2.17)

where t is some parameter – we might call this the surface-description of the curve. The space-
description of the same curve can be obtained by substituting (2.17) into (2.5) to obtain

r = r(uα (t)) = r∗ (t) (2.18)

where r∗ is the position vector of an arbitrary point on the curve. The tangent vector to this
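curve is given by
\[
\frac{d\mathbf{r}}{dt} = \frac{\partial \mathbf{r}}{\partial u^1}\frac{du^1}{dt} + \frac{\partial \mathbf{r}}{\partial u^2}\frac{du^2}{dt} = \frac{d\mathbf{r}^*}{dt}. \tag{2.19}
\]
If we put
\[
\mathbf{r}_{,\alpha} = \frac{\partial \mathbf{r}}{\partial u^\alpha} \quad \text{i.e.} \quad \mathbf{r}_{,1} = \frac{\partial \mathbf{r}}{\partial u^1}, \quad \mathbf{r}_{,2} = \frac{\partial \mathbf{r}}{\partial u^2} \tag{2.20}
\]
then we can write
\[
\frac{d\mathbf{r}}{dt} = \mathbf{r}_{,\alpha}\frac{du^\alpha}{dt}. \tag{2.21}
\]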
(Note the usage of the summation convention). Now since (2.21) is the tangent vector to an
arbitrary curve in the surface we can regard this vector as being an arbitrary tangent vector to
the surface and, in view of (2.21), we see that this vector can always be written as the linear
combination of r,1 and r,2 . In other words, the vectors r,1 and r,2 span a tangent plane to the
surface at each point. In fact this statement is only true if r,1 and r,2 are not collinear and we
shall always assume this to be the case. In many cases r,1 · r,2 = 0 and when this is so, we
say that the curvilinear coordinates are orthogonal. As a simple exercise one can verify that the
curvilinear coordinates θ, ϕ on a sphere are orthogonal.
It is useful to note that the tangent plane through a point r0 on Σ is given by

T = r0 + hr,1 + kr,2 , −∞ < h, k < ∞

At the point of contact between the surface and a tangent plane we construct a normal vector
N defined by
N = r,1 × r,2 (2.22)
and we say that N is normal to the surface at the point of contact. An ordinary point of the
surface is one for which $\mathbf{r}_{,1} \times \mathbf{r}_{,2} \neq 0$ and a singular point is one where $\mathbf{r}_{,1} \times \mathbf{r}_{,2} = 0$. We regard
surfaces all of whose points are ordinary points as regular surfaces. Henceforth we shall
deal with regular surfaces. The equation of the normal line passing through a point $\mathbf{r}_0$ on the
surface can be written as
\[ \mathbf{y} = \mathbf{r}_0 + k\mathbf{N}, \quad -\infty < k < \infty. \]
It is easy to verify that
\[ \mathbf{N} \cdot (\mathbf{T} - \mathbf{r}_0) = 0 \]
is the equation of the tangent plane. If $\mathbf{r}(u^1, u^2)$ is an arbitrary point on the surface then the
perpendicular distance d from $\mathbf{r}$ to the tangent plane is

\[ d = \hat{\mathbf{N}} \cdot (\mathbf{r} - \mathbf{r}_0), \tag{2.23} \]

where $\hat{\mathbf{N}} = \mathbf{N}/|\mathbf{N}|$ is the unit normal at $\mathbf{r}_0$.

Example 2.2.1

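Consider the parametric representation from the φθ-plane to Σ given by:

\[ \mathbf{r} = a(\sin\varphi\cos\theta)\mathbf{e}_1 + b(\sin\varphi\sin\theta)\mathbf{e}_2 + c(\cos\varphi)\mathbf{e}_3 \]

where $a, b, c > 0$, $-\infty < \theta < \infty$, $0 < \varphi < \pi$. Now setting

\[ x = a\sin\varphi\cos\theta, \quad y = b\sin\varphi\sin\theta, \quad z = c\cos\varphi \]

leads to
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \]
which is the equation of an ellipsoid. Furthermore from
\[
\begin{aligned}
\mathbf{r}_{,\varphi} &= \frac{\partial \mathbf{r}}{\partial \varphi} = a(\cos\varphi\cos\theta)\mathbf{e}_1 + b(\cos\varphi\sin\theta)\mathbf{e}_2 - c(\sin\varphi)\mathbf{e}_3, \\
\mathbf{r}_{,\theta} &= \frac{\partial \mathbf{r}}{\partial \theta} = -a(\sin\varphi\sin\theta)\mathbf{e}_1 + b(\sin\varphi\cos\theta)\mathbf{e}_2
\end{aligned}
\]
we obtain the normal vector
\[
\mathbf{N} = \mathbf{r}_{,\varphi} \times \mathbf{r}_{,\theta}
= bc(\sin^2\varphi\cos\theta)\mathbf{e}_1 + ac(\sin^2\varphi\sin\theta)\mathbf{e}_2 + ab(\sin\varphi\cos\varphi)\mathbf{e}_3
\]
and the magnitude of the normal vector
\[
|\mathbf{r}_{,\varphi} \times \mathbf{r}_{,\theta}| = |\sin\varphi|\sqrt{(a^2\sin^2\theta + b^2\cos^2\theta)c^2\sin^2\varphi + a^2b^2\cos^2\varphi}.
\]
Now if we assume that $0 < a < b < c$ then the following inequality is true
\[
|\mathbf{r}_{,\varphi} \times \mathbf{r}_{,\theta}| \geq |\sin\varphi|\sqrt{(a^2\sin^2\theta + a^2\cos^2\theta)a^2\sin^2\varphi + a^4\cos^2\varphi} = a^2|\sin\varphi| .
\]
We see now that $\mathbf{r}_{,\varphi} \times \mathbf{r}_{,\theta} = 0$ only when $\sin\varphi = 0$, i.e. at $\varphi = 0, \pi$, and these values correspond to the two
Cartesian singular points $(0, 0, c)$ and $(0, 0, -c)$ respectively. Hence the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ defined
for $-\infty < \theta < \infty$, $0 < \varphi < \pi$ is a regular surface punctured at $(0, 0, -c)$ and $(0, 0, c)$.
Now if we choose a point on the ellipsoid with parameter values, say $\varphi = \pi/2$, $\theta = 0$ then
firstly note that this point can be written as

\[ \mathbf{r}_0 = \mathbf{r}(\varphi = \pi/2, \theta = 0) = a\mathbf{e}_1 \]

and secondly at $\mathbf{r}_0$

\[ |\mathbf{r}_{,\varphi} \times \mathbf{r}_{,\theta}| = bc \neq 0 \]
hence the point is an ordinary point. The tangent plane at $\mathbf{r}_0$ is given by

\[ \mathbf{T}_0 = \mathbf{r}_0 + h\,\mathbf{r}_{,\varphi}(\varphi = \pi/2, \theta = 0) + k\,\mathbf{r}_{,\theta}(\varphi = \pi/2, \theta = 0), \quad -\infty < h, k < \infty \]

and this becomes

\[ \mathbf{T}_0 = a\mathbf{e}_1 + kb\,\mathbf{e}_2 - hc\,\mathbf{e}_3 . \]
The normal vector at $\mathbf{r}_0$ is
\[ \mathbf{N}_0 = bc\,\mathbf{e}_1 \]
and the normal line
\[ \mathbf{y} = \mathbf{r}_0 + k\mathbf{N}_0, \quad -\infty < k < \infty \]
becomes
\[ \mathbf{y} = (a + kbc)\mathbf{e}_1 . \]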


Singularities

Before continuing it is perhaps worthwhile making a few remarks about singular points or
singularities. There are two types of singular points – essential singularities and removable
singularities. An essential singularity is due to an odd feature in the geometry of the surface
where either or both of r,1 , r,2 cannot be defined. The vertex of a cone is such a singularity. A
removable singularity, on the other hand, is geometrically perfectly normal and well behaved but
r,1 × r,2 = 0 because of the particular choice of the parameters u1 , u2 . The simplest example of
such a singularity is the origin of the xy-plane when described by the parameters r, θ (i.e. polar
coordinates). In this description we have

r = (r cos θ, r sin θ, 0)

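so that
\[
\mathbf{r}_{,1} = \frac{\partial \mathbf{r}}{\partial r} = (\cos\theta, \sin\theta, 0), \quad \mathbf{r}_{,2} = \frac{\partial \mathbf{r}}{\partial \theta} = (-r\sin\theta, r\cos\theta, 0)
\]
and hence
\[ \mathbf{r}_{,1} \times \mathbf{r}_{,2} = r\,\hat{\mathbf{k}} \]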

which is zero at r = 0. As the name suggests, this type of singularity can be “removed” by a
change of parameter.
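
A small numerical sketch of this removable singularity follows. It is an illustration only; the helper names `cross`, `polar_normal` and `cartesian_normal` are assumptions and do not come from the guide. With polar parameters the vector $\mathbf{r}_{,1} \times \mathbf{r}_{,2}$ vanishes at $r = 0$, while with Cartesian parameters $x, y$ the same plane has $\mathbf{r}_{,1} \times \mathbf{r}_{,2} = \hat{\mathbf{k}}$ everywhere.

```python
import math


def cross(a, b):
    """Cross product of two vectors of E^3 given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def polar_normal(r, theta):
    """r_,1 x r_,2 for the plane described by (r cos(theta), r sin(theta), 0)."""
    r1 = (math.cos(theta), math.sin(theta), 0.0)
    r2 = (-r * math.sin(theta), r * math.cos(theta), 0.0)
    return cross(r1, r2)


def cartesian_normal(x, y):
    """r_,1 x r_,2 for the same plane described by (x, y, 0)."""
    return cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Evaluating `polar_normal(0.0, theta)` gives the zero vector for every `theta`, whereas `cartesian_normal(0.0, 0.0)` is the unit vector along the z-axis: the singularity belongs to the parametrization, not to the plane.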

Orientation of a surface

By defining the normal vector N in such a way that r,1 , r,2 , N form a right handed system we
say that we have established an orientation of the surface at the point where N is defined. If
N varies continuously over the whole surface we say that the surface is orientable. The most
familiar example of a non-orientable surface is the Möbius strip. This is the surface you get by
taking a strip of paper and gluing the short ends together after rotating one end through 180
degrees. If we choose N at a given point on this strip and move it all the way around the strip,
when we get back to the starting position, N will point in the opposite direction.
Clearly the orientation of a surface depends very much on the choice of parameter. For
example, if our orientation is such that $\mathbf{r}_{,1}$, $\mathbf{r}_{,2}$, N form a right handed system and we simply
re-label the parameters such that $u^2$ becomes $u^1$ and $u^1$ becomes $u^2$ then the orientation will be
reversed.

2.3 The First Fundamental Form

We begin by obtaining an expression for distance between two points on a surface. Notice that
when we speak about distance in this context we mean the intrinsic distance and not the normal
Euclidean distance as viewed from E3 .

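Let Σ be a surface $\mathbf{r} = \mathbf{r}(u^1, u^2)$ and let P and Q be neighboring points on Σ with parameter
values $u^\alpha$ and $u^\alpha + du^\alpha$ respectively. The distance between P and Q is given by
\[
\begin{aligned}
ds^2 &= d\mathbf{r} \cdot d\mathbf{r} \\
&= (\mathbf{r}_{,1}du^1 + \mathbf{r}_{,2}du^2) \cdot (\mathbf{r}_{,1}du^1 + \mathbf{r}_{,2}du^2) \\
&= (\mathbf{r}_{,1} \cdot \mathbf{r}_{,1})(du^1)^2 + 2(\mathbf{r}_{,1} \cdot \mathbf{r}_{,2})\,du^1du^2 + (\mathbf{r}_{,2} \cdot \mathbf{r}_{,2})(du^2)^2 .
\end{aligned}
\tag{2.24}
\]
It is customary to put
\[ E(u^1, u^2) = \mathbf{r}_{,1} \cdot \mathbf{r}_{,1} \tag{2.25} \]
\[ F(u^1, u^2) = \mathbf{r}_{,1} \cdot \mathbf{r}_{,2} \tag{2.26} \]
\[ G(u^1, u^2) = \mathbf{r}_{,2} \cdot \mathbf{r}_{,2} \tag{2.27} \]
so that
\[ ds^2 = E(du^1)^2 + 2F\,du^1du^2 + G(du^2)^2 \tag{2.28} \]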
and this is called the First Fundamental Form of the surface. We must bear in mind that ds, as
given by (2.28), is not a total differential i.e. there is no function f (u1, u2 ) such that ds = df
identically. The expression (2.28) is often termed the metric on the surface and in some books
the expression is denoted by I.
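Now let us define a symmetric 2 × 2 matrix g, whose entries are $g_{\alpha\beta}$, by

\[ g_{\alpha\beta} = \mathbf{r}_{,\alpha} \cdot \mathbf{r}_{,\beta} . \tag{2.29} \]

We adopt the convention that the first index refers to the row number and the second to the
column number so that we can write
\[
g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}
= \begin{pmatrix} \mathbf{r}_{,1} \cdot \mathbf{r}_{,1} & \mathbf{r}_{,1} \cdot \mathbf{r}_{,2} \\ \mathbf{r}_{,2} \cdot \mathbf{r}_{,1} & \mathbf{r}_{,2} \cdot \mathbf{r}_{,2} \end{pmatrix}
= \begin{pmatrix} E & F \\ F & G \end{pmatrix}. \tag{2.30}
\]
Notice that the fact that g is symmetric is expressed by $g_{\alpha\beta} = g_{\beta\alpha}$ which follows because
$\mathbf{r}_{,\alpha} \cdot \mathbf{r}_{,\beta} = \mathbf{r}_{,\beta} \cdot \mathbf{r}_{,\alpha}$. By (2.30) we have therefore that

\[ g_{11} = E, \quad g_{12} = g_{21} = F, \quad g_{22} = G \]

and the metric can be written as

\[ ds^2 = g_{\alpha\beta}\,du^\alpha du^\beta . \tag{2.31} \]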
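
The definitions (2.25) to (2.27) translate directly into a computation. The sketch below is an illustration under stated assumptions: it approximates $\mathbf{r}_{,1}$ and $\mathbf{r}_{,2}$ by central differences (a numerical shortcut, not the analytic differentiation used in the text), and the names `partials`, `first_fundamental_form` and `sphere` are hypothetical. It is applied to the sphere of Section 2.2 with $u^1 = \theta$, $u^2 = \varphi$, for which $E = a^2$, $F = 0$, $G = a^2\sin^2\theta$.

```python
import math


def dot(a, b):
    """Euclidean scalar product in E^3."""
    return sum(x * y for x, y in zip(a, b))


def partials(r, u1, u2, h=1e-6):
    """Central-difference approximations to r_,1 and r_,2 at (u1, u2)."""
    r1 = [(p - m) / (2 * h) for p, m in zip(r(u1 + h, u2), r(u1 - h, u2))]
    r2 = [(p - m) / (2 * h) for p, m in zip(r(u1, u2 + h), r(u1, u2 - h))]
    return r1, r2


def first_fundamental_form(r, u1, u2):
    """Return (E, F, G) = (r_,1.r_,1, r_,1.r_,2, r_,2.r_,2)."""
    r1, r2 = partials(r, u1, u2)
    return dot(r1, r1), dot(r1, r2), dot(r2, r2)


def sphere(theta, phi, a=2.0):
    """Sphere of radius a (default 2) with colatitude theta and longitude phi."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))
```

For example, `first_fundamental_form(sphere, 1.0, 0.5)` returns values close to $(4, 0, 4\sin^2 1)$; the vanishing of F reflects the orthogonality of the coordinates θ, φ on the sphere.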

Arc length and area measurements on surfaces

If $u^\alpha = u^\alpha(t)$ is some curve Γ lying on Σ and we wish to determine the distance along Γ between
two points with parameter values $t_1$ and $t_2$ then this is clearly given by
\[
s = \int_{t_1}^{t_2} \left[ E\left(\frac{du^1}{dt}\right)^2 + 2F\,\frac{du^1}{dt}\frac{du^2}{dt} + G\left(\frac{du^2}{dt}\right)^2 \right]^{1/2} dt
\]
and this distance obviously depends on the path of integration Γ.


It is easy to define an element of area on Σ. We consider two pairs of neighboring parameter
curves as shown.

[Figure: the region ABCD bounded by two neighboring pairs of parameter curves, with edge vectors $\mathbf{r}_{,1}du^1$ and $\mathbf{r}_{,2}du^2$.]

The figure with the vertices $A(u^1, u^2)$, $B(u^1, u^2 + du^2)$, $C(u^1 + du^1, u^2 + du^2)$, $D(u^1 + du^1, u^2)$
is approximately a parallelogram and its area dσ is given by

\[ d\sigma = |\mathbf{r}_{,1}du^1 \times \mathbf{r}_{,2}du^2| = |\mathbf{r}_{,1} \times \mathbf{r}_{,2}|\,du^1du^2 \tag{2.32} \]

It is customary to put
\[ H = |\mathbf{r}_{,1} \times \mathbf{r}_{,2}| \tag{2.33} \]

so that
\[ d\sigma = H\,du^1du^2 . \tag{2.34} \]

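Recall that the vector

\[ \mathbf{N} = \mathbf{r}_{,1} \times \mathbf{r}_{,2} \]

is normal to the tangent plane and, as usual, it is convenient to work with the unit normal vector $\hat{\mathbf{N}}$
defined by
\[
\hat{\mathbf{N}} = \frac{\mathbf{r}_{,1} \times \mathbf{r}_{,2}}{|\mathbf{r}_{,1} \times \mathbf{r}_{,2}|} = \frac{\mathbf{r}_{,1} \times \mathbf{r}_{,2}}{H}.
\]
The arc length of a curve Γ: $u^\alpha = u^\alpha(t)$ lying on Σ is then
\[
S = \int_{t_1}^{t_2} \left( g_{11}\left(\frac{du^1}{dt}\right)^2 + 2g_{12}\,\frac{du^1}{dt}\frac{du^2}{dt} + g_{22}\left(\frac{du^2}{dt}\right)^2 \right)^{1/2} dt
\]
along the path of integration Γ.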


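We also write the area of a patch on a surface in terms of $g_{\alpha\beta}$ as follows. First recall the
vector identity
\[ |\mathbf{A} \times \mathbf{B}|^2 = \mathbf{A}^2\mathbf{B}^2 - (\mathbf{A} \cdot \mathbf{B})^2 \]
so that
\[ H^2 = |\mathbf{r}_{,1} \times \mathbf{r}_{,2}|^2 = \mathbf{r}_{,1}^2\,\mathbf{r}_{,2}^2 - (\mathbf{r}_{,1} \cdot \mathbf{r}_{,2})^2 = EG - F^2 \]
or
\[ H^2 = g_{11}g_{22} - (g_{12})^2 . \]
Now the determinant of g is given by
\[
\det g = \begin{vmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{vmatrix} = g_{11}g_{22} - (g_{12})^2 = H^2
\]
so that
\[ H = \sqrt{\det g} \]
and hence
\[ d\sigma = \sqrt{\det g}\,du^1du^2 \]
or
\[ \text{Area} = \iint_\Sigma \sqrt{\det g}\,du^1du^2 . \]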

Invariance of the first fundamental form

It is clear that the quantities E, F and G are invariants with respect to changes in coordinates.
However, these quantities are not invariants with respect to parameter transformation. On the
other hand the first fundamental form ought to be invariant with respect to both parameter and
coordinate change. We prove invariance of the first fundamental form with respect to parameter
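transformation as follows: Given

\[ \bar{u}^1 = \bar{u}^1(u^1, u^2), \quad \bar{u}^2 = \bar{u}^2(u^1, u^2), \]

we write $(d\bar{s})^2$ as
\[
\begin{aligned}
(d\bar{s})^2 &= \bar{E}(d\bar{u}^1)^2 + 2\bar{F}\,d\bar{u}^1d\bar{u}^2 + \bar{G}(d\bar{u}^2)^2 \\
&= \left( \frac{\partial \mathbf{r}}{\partial \bar{u}^1}d\bar{u}^1 + \frac{\partial \mathbf{r}}{\partial \bar{u}^2}d\bar{u}^2 \right)^2 \\
&= \left[ \left( \frac{\partial \mathbf{r}}{\partial u^1}\frac{\partial u^1}{\partial \bar{u}^1} + \frac{\partial \mathbf{r}}{\partial u^2}\frac{\partial u^2}{\partial \bar{u}^1} \right)d\bar{u}^1
+ \left( \frac{\partial \mathbf{r}}{\partial u^1}\frac{\partial u^1}{\partial \bar{u}^2} + \frac{\partial \mathbf{r}}{\partial u^2}\frac{\partial u^2}{\partial \bar{u}^2} \right)d\bar{u}^2 \right]^2 \\
&= \left[ \frac{\partial \mathbf{r}}{\partial u^1}\left( \frac{\partial u^1}{\partial \bar{u}^1}d\bar{u}^1 + \frac{\partial u^1}{\partial \bar{u}^2}d\bar{u}^2 \right)
+ \frac{\partial \mathbf{r}}{\partial u^2}\left( \frac{\partial u^2}{\partial \bar{u}^1}d\bar{u}^1 + \frac{\partial u^2}{\partial \bar{u}^2}d\bar{u}^2 \right) \right]^2 \\
&= \left( \mathbf{r}_{,1}du^1 + \mathbf{r}_{,2}du^2 \right)^2 \\
&= E(du^1)^2 + 2F\,du^1du^2 + G(du^2)^2 = ds^2
\end{aligned}
\]
(the square of a vector denoting its scalar product with itself) so that the metric is invariant.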

Example 2.3.1

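Consider the torus

\[ \mathbf{r} = (b + a\sin\varphi)(\cos\theta)\mathbf{e}_1 + (b + a\sin\varphi)(\sin\theta)\mathbf{e}_2 + (a\cos\varphi)\mathbf{e}_3, \quad b > a > 0. \]

Here it is easy to show that

\[ E = \mathbf{r}_{,\theta} \cdot \mathbf{r}_{,\theta} = (b + a\sin\varphi)^2, \quad F = \mathbf{r}_{,\theta} \cdot \mathbf{r}_{,\varphi} = 0, \quad G = \mathbf{r}_{,\varphi} \cdot \mathbf{r}_{,\varphi} = a^2 \]

and the surface area of the torus is
\[
s = \iint \sqrt{EG - F^2}\,d\theta\,d\varphi = \int_0^{2\pi}\left( \int_0^{2\pi} a(b + a\sin\varphi)\,d\varphi \right)d\theta = 4\pi^2ab .
\]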
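
The value $4\pi^2ab$ can be cross-checked by numerical quadrature. The sketch below is illustrative only: the function name `torus_area` and the midpoint rule are choices made here, it takes the third component of $\mathbf{r}$ to be $a\cos\varphi$, and it assumes $b > a > 0$ so that $\sqrt{EG - F^2} = a(b + a\sin\varphi)$.

```python
import math


def torus_area(a, b, n=100):
    """Midpoint-rule approximation of the double integral of sqrt(EG - F^2)
    over 0 <= theta, phi <= 2*pi for the torus with E = (b + a sin(phi))^2,
    F = 0, G = a^2. Assumes b > a > 0."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):            # theta direction
        for j in range(n):        # phi direction
            phi = (j + 0.5) * h
            E = (b + a * math.sin(phi)) ** 2
            F = 0.0
            G = a ** 2
            total += math.sqrt(E * G - F ** 2) * h * h
    return total
```

With `a = 1` and `b = 3` the result agrees with $4\pi^2 \cdot 3 \approx 118.4$ to within rounding, since the midpoint rule integrates the periodic term $\sin\varphi$ over a full period essentially exactly.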

Transformation properties of metric coefficients

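Let us now proceed to establish transformation properties of the quantities $g_{\alpha\beta}$ under a parameter
transformation $\bar{u}^\mu = \bar{u}^\mu(u^\alpha)$ and hence deduce the transformation properties of E, F, G and H.
Since $ds^2 = (d\bar{s})^2$ we have that

\[ g_{\alpha\beta}\,du^\alpha du^\beta = \bar{g}_{\mu\nu}\,d\bar{u}^\mu d\bar{u}^\nu . \]

Now
\[ d\bar{u}^\mu = \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,du^\alpha \tag{2.35} \]
so that
\[
g_{\alpha\beta}\,du^\alpha du^\beta = \bar{g}_{\mu\nu}\left( \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\frac{\partial \bar{u}^\nu}{\partial u^\beta} \right)du^\alpha du^\beta
\]
from which it follows that the metric coefficients transform as follows
\[
g_{\alpha\beta} = \bar{g}_{\mu\nu}\,\frac{\partial \bar{u}^\mu}{\partial u^\alpha}\frac{\partial \bar{u}^\nu}{\partial u^\beta}. \tag{2.36}
\]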
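Multiplying both sides of (2.36) by $\dfrac{\partial u^\alpha}{\partial \bar{u}^\tau}\dfrac{\partial u^\beta}{\partial \bar{u}^\sigma}$ and using (2.16) we obtain
\[
g_{\alpha\beta}\,\frac{\partial u^\alpha}{\partial \bar{u}^\tau}\frac{\partial u^\beta}{\partial \bar{u}^\sigma}
= \bar{g}_{\mu\nu}\left( \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\frac{\partial \bar{u}^\nu}{\partial u^\beta} \right)\frac{\partial u^\alpha}{\partial \bar{u}^\tau}\frac{\partial u^\beta}{\partial \bar{u}^\sigma}
= \bar{g}_{\mu\nu}\,\delta^\mu_\tau\,\delta^\nu_\sigma
= \bar{g}_{\tau\sigma} .
\]
In this way we have
\[
\bar{g}_{\mu\nu} = g_{\alpha\beta}\,\frac{\partial u^\alpha}{\partial \bar{u}^\mu}\frac{\partial u^\beta}{\partial \bar{u}^\nu}. \tag{2.37}
\]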

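From (2.37) we now have
\[
\begin{aligned}
\bar{E} = \bar{g}_{11} &= E\left(\frac{\partial u^1}{\partial \bar{u}^1}\right)^2 + 2F\,\frac{\partial u^1}{\partial \bar{u}^1}\frac{\partial u^2}{\partial \bar{u}^1} + G\left(\frac{\partial u^2}{\partial \bar{u}^1}\right)^2 \\
\bar{F} = \bar{g}_{12} &= E\,\frac{\partial u^1}{\partial \bar{u}^1}\frac{\partial u^1}{\partial \bar{u}^2} + F\left( \frac{\partial u^1}{\partial \bar{u}^1}\frac{\partial u^2}{\partial \bar{u}^2} + \frac{\partial u^2}{\partial \bar{u}^1}\frac{\partial u^1}{\partial \bar{u}^2} \right) + G\,\frac{\partial u^2}{\partial \bar{u}^1}\frac{\partial u^2}{\partial \bar{u}^2} \\
\bar{G} = \bar{g}_{22} &= E\left(\frac{\partial u^1}{\partial \bar{u}^2}\right)^2 + 2F\,\frac{\partial u^1}{\partial \bar{u}^2}\frac{\partial u^2}{\partial \bar{u}^2} + G\left(\frac{\partial u^2}{\partial \bar{u}^2}\right)^2 .
\end{aligned}
\]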

Finally, we notice that there are several ways of determining H̄. One way has already been
done and the result is contained in equation (2.32). Another way (guaranteed to lead to despair
and suicide) is to use H̄ 2 = Ē Ḡ − F̄ 2 and multiply out the above expressions. Perhaps the
simplest way, however, is to recognize that the right hand side of (2.37) is the product of three
matrices. The matrices represented by $\dfrac{\partial u^\alpha}{\partial \bar{u}^\mu}$ and $\dfrac{\partial u^\beta}{\partial \bar{u}^\nu}$ are, of course, the same and if we denote,
as in (2.11), the determinant of these matrices by J then from (2.37) we have

\[ \det \bar{g} = J^2(\det g) \]

where we have used the fact that the determinant of a product of matrices is equal to the
product of determinants and we have used the symbols $(\det \bar{g})$, $(\det g)$ for the determinants of
the matrices represented by $\bar{g}_{\mu\nu}$ and $g_{\alpha\beta}$ respectively. Taking $J > 0$, we thus have
\[ \sqrt{\det \bar{g}} = J\sqrt{\det g} \tag{2.38} \]

or
\[ \bar{H} = JH . \]
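From these results it is easy to show that the area on a surface is parameter-invariant. We
have
\[ d\bar{\sigma} = \bar{H}\,d\bar{u}^1d\bar{u}^2 \]
and on using
\[ d\bar{u}^1d\bar{u}^2 = \bar{J}\,du^1du^2 \]
together with $J\bar{J} = 1$ (the matrices (2.7) and (2.10) are inverses of each other) we get
\[
d\bar{\sigma} = \bar{H}\bar{J}\,du^1du^2 = J\bar{J}H\,du^1du^2 = H\,du^1du^2 = d\sigma .
\]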
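
The transformation law (2.37) and the determinant relation can be tested on the saddle $z = x^2 - y^2$ from Section 2.2. The sketch below is a hedged illustration: the function names are hypothetical, the metric entries are computed by hand from the two parametrizations given there, and indices are 0-based. Note that for this particular transformation $J = -\tfrac{1}{2}$ is negative, so only the squared relation $\det\bar{g} = J^2\det g$ is checked; the square-root form would need $|J|$.

```python
def metric_u(u1, u2):
    """g for r = (u1 + u2, u1 - u2, 4 u1 u2), using r_,1 = (1, 1, 4 u2)
    and r_,2 = (1, -1, 4 u1)."""
    E = 2.0 + 16.0 * u2 ** 2
    F = 16.0 * u1 * u2
    G = 2.0 + 16.0 * u1 ** 2
    return [[E, F], [F, G]]


def metric_ubar(x, y):
    """gbar for r = (x, y, x^2 - y^2), using r_,1 = (1, 0, 2x), r_,2 = (0, 1, -2y)."""
    return [[1.0 + 4.0 * x ** 2, -4.0 * x * y],
            [-4.0 * x * y, 1.0 + 4.0 * y ** 2]]


# DU[alpha][mu] = d(u^alpha)/d(ubar^mu) for u^1 = (x + y)/2, u^2 = (x - y)/2
DU = [[0.5, 0.5],
      [0.5, -0.5]]


def transform_to_ubar(g):
    """Equation (2.37): gbar_mn = g_ab (du^a/dubar^m)(du^b/dubar^n)."""
    return [[sum(g[a][b] * DU[a][m] * DU[b][n]
                 for a in range(2) for b in range(2))
             for n in range(2)]
            for m in range(2)]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]
```

At $u^1 = 1$, $u^2 = 0$ (so $x = y = 1$), `transform_to_ubar(metric_u(1.0, 0.0))` reproduces `metric_ubar(1.0, 1.0)`, i.e. $\bar{E} = 5$, $\bar{F} = -4$, $\bar{G} = 5$.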

Tangent vectors

We consider now any vector X which lies in the tangent plane. Since the plane is spanned by r,1
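and $\mathbf{r}_{,2}$, X must be a linear combination of these vectors i.e.

\[ \mathbf{X} = X^\alpha\,\mathbf{r}_{,\alpha} \tag{2.39} \]

where the $X^\alpha$ are the components of X relative to the basis vectors $\mathbf{r}_{,\alpha}$.
Notice, however, that X can also be regarded as a vector in space i.e. a vector in $E^3$ and its
length |X| is given, as usual, by
\[ |\mathbf{X}|^2 = \mathbf{X} \cdot \mathbf{X} \]

where the scalar product is an operation in $E^3$. When we view the vector as an element of the
tangent plane we must have
\[
\begin{aligned}
|\mathbf{X}|^2 = \mathbf{X} \cdot \mathbf{X} &= (X^\alpha\,\mathbf{r}_{,\alpha}) \cdot (X^\beta\,\mathbf{r}_{,\beta}) \\
&= (\mathbf{r}_{,\alpha} \cdot \mathbf{r}_{,\beta})X^\alpha X^\beta \\
&= g_{\alpha\beta}X^\alpha X^\beta .
\end{aligned}
\tag{2.40}
\]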

We see thus that the matrix g not only gives us a measure of distance in the surface but also
a measure of length of vectors in the tangent planes. We might say that the right hand side of
(2.40) defines a generalized scalar product which operates in the tangent planes. In fact, if X and
Y are two vectors in a tangent plane then

\[ \mathbf{X} \cdot \mathbf{Y} = g_{\alpha\beta}X^\alpha Y^\beta \tag{2.41} \]

and the orthogonality condition in $E^3$ viz. X · Y = 0 becomes

\[ g_{\alpha\beta}X^\alpha Y^\beta = 0. \tag{2.42} \]
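
The agreement between the scalar product in $E^3$ and the generalized scalar product (2.41) can be illustrated on a sphere. This is a sketch only; the helper names `sphere_basis`, `embed`, `dot3` and `surface_dot` are assumptions, and the basis vectors are the analytic derivatives of the sphere parametrization of Section 2.2 with $u^1 = \theta$, $u^2 = \varphi$.

```python
import math


def dot3(a, b):
    """Scalar product in E^3."""
    return sum(x * y for x, y in zip(a, b))


def sphere_basis(theta, phi, a=1.0):
    """Basis vectors r_,1 = dr/dtheta and r_,2 = dr/dphi on a sphere of radius a."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    r1 = (a * ct * cp, a * ct * sp, -a * st)
    r2 = (-a * st * sp, a * st * cp, 0.0)
    return r1, r2


def embed(X, basis):
    """The space vector X = X^alpha r_,alpha of equation (2.39)."""
    return tuple(X[0] * e1 + X[1] * e2 for e1, e2 in zip(basis[0], basis[1]))


def surface_dot(X, Y, basis):
    """g_ab X^a Y^b with g_ab = r_,a . r_,b, as in (2.41)."""
    g = [[dot3(basis[i], basis[j]) for j in range(2)] for i in range(2)]
    return sum(g[i][j] * X[i] * Y[j] for i in range(2) for j in range(2))
```

For any components `X` and `Y`, `dot3(embed(X, basis), embed(Y, basis))` and `surface_dot(X, Y, basis)` agree up to rounding, and the coordinate directions `(1, 0)` and `(0, 1)` satisfy the orthogonality condition (2.42) on the sphere.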

Of particular importance in this context is the tangent vector to a curve $u^\alpha = u^\alpha(s)$ which
lies on the surface $\mathbf{r} = \mathbf{r}(u^\alpha)$. Notice that we have used the arclength s as the curve parameter
here so that the tangent vector to the curve, when viewed in $E^3$, obeys the usual condition

\[ \hat{\mathbf{t}} \cdot \hat{\mathbf{t}} = \mathbf{r}' \cdot \mathbf{r}' = 1. \]

But $\hat{\mathbf{t}}$ is also tangent to the surface i.e. an element of a tangent plane so that the above condition
can be written
\[ g_{\alpha\beta}\,\frac{du^\alpha}{ds}\frac{du^\beta}{ds} = 1 \tag{2.43} \]
because
\[ \mathbf{r}' = \frac{\partial \mathbf{r}}{\partial u^\alpha}\frac{du^\alpha}{ds} = \mathbf{r}_{,\alpha}\frac{du^\alpha}{ds}. \tag{2.44} \]
In other words, the $\dfrac{du^\alpha}{ds}$ can be regarded as the components of a unit vector in the tangent plane.

Transformation property of tangent vectors

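Let us now return to equation (2.39). We know that a parameter transformation $u^\alpha \to \bar{u}^\mu$
induces the following transformation of $\mathbf{r}_{,\alpha}$:

\[ \bar{\mathbf{r}}_{,\mu} = \frac{\partial u^\alpha}{\partial \bar{u}^\mu}\,\mathbf{r}_{,\alpha} \]
and this, together with (2.39), enables us to obtain the transformation law for the $X^\alpha$. Note
that the vector X itself is parameter invariant (i.e. $\mathbf{X} = \bar{\mathbf{X}}$) in the sense that X is independent
of the basis vectors. In other words, a change in the basis vectors $\mathbf{r}_{,\alpha}$ induces a change in the
components $X^\alpha$ but leaves X unaltered. We thus have
\[
X^\alpha\,\mathbf{r}_{,\alpha} = \bar{X}^\mu\,\bar{\mathbf{r}}_{,\mu} = \bar{X}^\mu\left( \frac{\partial u^\alpha}{\partial \bar{u}^\mu} \right)\mathbf{r}_{,\alpha}
\]
and from this it is easy to deduce that
\[ \bar{X}^\mu = \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,X^\alpha . \tag{2.45} \]
Notice the similarity between this equation and (2.35) viz.

\[ d\bar{u}^\mu = \frac{\partial \bar{u}^\mu}{\partial u^\alpha}\,du^\alpha . \]
The implication of this similarity is that the $du^\alpha$ can also be regarded as the components of some
vector relative to the basis $\mathbf{r}_{,\alpha}$. In fact, the $du^\alpha$ can be regarded as the components of $d\mathbf{r}$ because

\[ d\mathbf{r} = \frac{\partial \mathbf{r}}{\partial u^\alpha}\,du^\alpha = \mathbf{r}_{,\alpha}\,du^\alpha . \tag{2.46} \]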

2.4 Second Fundamental Form

Thus far our discussion has been concerned largely with intrinsic quantities on a surface. Later
we will be a bit more precise about the meaning of “intrinsic” but for the moment let us say
that such a quantity is one which does not change when we bend the surface without stretching,
compressing or tearing it. Lengths and areas, for example, are intrinsic quantities. It appears
therefore that the coefficients E, F , and G of the first fundamental form (which are used to
define lengths and areas) are not sufficient to determine a surface uniquely and we need some
quantity which will enable us to distinguish between surfaces which are “bent” versions of each
other. In other words we need a quantity which describes the shape of the surface as viewed from
E3 . This quantity (which is called the second fundamental form and generally denoted by II)
together with the first fundamental form will enable us to characterize the surface completely.
Quantities which depend on the way in which a surface is embedded in E3 are called extrinsic
so that the second fundamental form is an extrinsic quantity. In order to get a good geometric
feeling for the second fundamental form we shall present several equivalent views of it.

• First we consider an arbitrary curve Γ lying on the surface r = r(u1, u2 ) and we suppose
that the parameter of the curve is the arclength s. We recall that the tangent vector to Γ is
t̂ = r′ and r′′ = t̂′ = κn̂ where κ is the curvature of Γ and n̂ the principal normal. The vector
r′′ = κn̂ is sometimes called the curvature vector of Γ. As a matter of notation we should point
out that the “prime” here, and in what follows, refers to differentiation with respect to s and not
to a change in parameter. Note also that the vector N̂ which we defined as the normal vector to
the surface is an extrinsic quantity. Now at any point of the surface through which Γ passes, the
curvature vector can be written as a linear combination of N̂, r,1 and r,2 at that point i.e.

r′′ = κn N̂ + λr,1 + µr,2 (2.47)

where κn , λ and µ are numbers. By taking the dot product of both sides of equation (2.47) with
N̂ we get
κn = r′′ · N̂ (2.48)
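and we call $\kappa_n$ the normal curvature of Γ. It is the signed length of the projection of $\mathbf{r}''$ onto $\hat{\mathbf{N}}$.
Now since Γ lies on the surface we have that
\[ \mathbf{r}' = \mathbf{r}_{,1}\frac{du^1}{ds} + \mathbf{r}_{,2}\frac{du^2}{ds} \]
and hence
\[
\mathbf{r}'' = \mathbf{r}_{,1}\frac{d^2u^1}{ds^2} + \mathbf{r}_{,2}\frac{d^2u^2}{ds^2}
+ \mathbf{r}_{,11}\left(\frac{du^1}{ds}\right)^2 + 2\mathbf{r}_{,12}\,\frac{du^1}{ds}\frac{du^2}{ds} + \mathbf{r}_{,22}\left(\frac{du^2}{ds}\right)^2
\]
where
\[
\mathbf{r}_{,11} = \frac{\partial^2\mathbf{r}}{\partial(u^1)^2}, \quad \mathbf{r}_{,12} = \frac{\partial^2\mathbf{r}}{\partial u^1\partial u^2} = \mathbf{r}_{,21}, \quad \mathbf{r}_{,22} = \frac{\partial^2\mathbf{r}}{\partial(u^2)^2}.
\]
If we substitute for $\mathbf{r}''$ in (2.48) and bear in mind that

\[ \hat{\mathbf{N}} \cdot \mathbf{r}_{,1} = \hat{\mathbf{N}} \cdot \mathbf{r}_{,2} = 0, \]

we obtain
\[
\kappa_n = (\hat{\mathbf{N}} \cdot \mathbf{r}_{,11})\left(\frac{du^1}{ds}\right)^2 + 2(\hat{\mathbf{N}} \cdot \mathbf{r}_{,12})\,\frac{du^1}{ds}\frac{du^2}{ds} + (\hat{\mathbf{N}} \cdot \mathbf{r}_{,22})\left(\frac{du^2}{ds}\right)^2 .
\]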
We now define quantities L, M, and N by

L = N̂ · r,11
M = N̂ · r,12 = N̂ · r,21 (2.49)
N = N̂ · r,22

and, after factoring out $1/(ds)^2$, we write $\kappa_n$ as
\[
\kappa_n = \frac{L(du^1)^2 + 2M\,du^1du^2 + N(du^2)^2}{ds^2}
\]
or
\[
\kappa_n = \frac{L(du^1)^2 + 2M\,du^1du^2 + N(du^2)^2}{E(du^1)^2 + 2F\,du^1du^2 + G(du^2)^2}. \tag{2.50}
\]
The quadratic form
\[ \mathrm{II} = L(du^1)^2 + 2M\,du^1du^2 + N(du^2)^2 \tag{2.51} \]

is called the Second Fundamental Form of the surface and the functions L, M, and N of u1 , and
u2 are called the coefficients of the second fundamental form.
It follows from (2.50) that the normal curvature κn is a property of the surface and a direction
at a point in the surface because all the coefficients in (2.50) are functions of u1 and u2 and the
differentials du1, du2 define a direction. We deduce that all curves having the same direction at
a point have the same normal curvature at that point. If Γ has principal normal n̂ and the angle
between n̂ and the surface normal N̂ is ϕ then by (2.48)

κn = κ cos ϕ (2.52)

and this is called Meusnier’s Theorem.
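
To make (2.49) and (2.50) concrete, the following sketch evaluates the coefficients of both fundamental forms for a sphere of radius a and then the normal curvature in a given direction. It is an illustration under stated assumptions: the helper names are hypothetical, the derivatives of the sphere parametrization of Section 2.2 (with $u^1 = \theta$, $u^2 = \varphi$) are written out analytically, and the normal is taken as $\mathbf{r}_{,1} \times \mathbf{r}_{,2}$, which points outward for $0 < \theta < \pi$.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def sphere_forms(theta, phi, a=1.0):
    """Return ((E, F, G), (L, M, N)) for the sphere, from (2.25)-(2.27) and (2.49)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    r1 = (a * ct * cp, a * ct * sp, -a * st)        # dr/dtheta
    r2 = (-a * st * sp, a * st * cp, 0.0)           # dr/dphi
    r11 = (-a * st * cp, -a * st * sp, -a * ct)     # second derivatives
    r12 = (-a * ct * sp, a * ct * cp, 0.0)
    r22 = (-a * st * cp, -a * st * sp, 0.0)
    n_hat = unit(cross(r1, r2))                     # unit normal
    first = (dot(r1, r1), dot(r1, r2), dot(r2, r2))
    second = (dot(n_hat, r11), dot(n_hat, r12), dot(n_hat, r22))
    return first, second


def normal_curvature(first, second, du1, du2):
    """Equation (2.50): the ratio II / I in the direction (du1, du2)."""
    E, F, G = first
    L, M, N = second
    num = L * du1 ** 2 + 2 * M * du1 * du2 + N * du2 ** 2
    den = E * du1 ** 2 + 2 * F * du1 * du2 + G * du2 ** 2
    return num / den
```

With this orientation the normal curvature comes out as $-1/a$ in every direction: the sign is negative because the curvature vector of any curve on the sphere points inward, opposite to the outward normal.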


• We can also express κn in a slightly different way which gives a different but equivalent
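set of expressions for L, M and N. Since $\hat{\mathbf{t}}$ is tangent to Γ we must have $\hat{\mathbf{t}} \cdot \hat{\mathbf{N}} = 0$ which on
differentiating with respect to s yields
\[
\kappa_n = \mathbf{r}'' \cdot \hat{\mathbf{N}} = \frac{d\hat{\mathbf{t}}}{ds} \cdot \hat{\mathbf{N}} = -\hat{\mathbf{t}} \cdot \frac{d\hat{\mathbf{N}}}{ds} = -\frac{d\mathbf{r}}{ds} \cdot \frac{d\hat{\mathbf{N}}}{ds}
\]
i.e.
\[ \kappa_n = -\frac{d\mathbf{r} \cdot d\hat{\mathbf{N}}}{ds^2}. \tag{2.53} \]
We now use
\[ d\hat{\mathbf{N}} = \hat{\mathbf{N}}_{,1}du^1 + \hat{\mathbf{N}}_{,2}du^2 \]
and equation (2.46) for $d\mathbf{r}$ to obtain
\[
-\kappa_n = \frac{(\mathbf{r}_{,1} \cdot \hat{\mathbf{N}}_{,1})(du^1)^2 + (\mathbf{r}_{,1} \cdot \hat{\mathbf{N}}_{,2} + \mathbf{r}_{,2} \cdot \hat{\mathbf{N}}_{,1})\,du^1du^2 + (\mathbf{r}_{,2} \cdot \hat{\mathbf{N}}_{,2})(du^2)^2}{ds^2}
\]
and it follows from this that
\[
\begin{aligned}
L &= -\mathbf{r}_{,1} \cdot \hat{\mathbf{N}}_{,1} \\
M &= -\tfrac{1}{2}\left( \mathbf{r}_{,1} \cdot \hat{\mathbf{N}}_{,2} + \mathbf{r}_{,2} \cdot \hat{\mathbf{N}}_{,1} \right) \\
N &= -\mathbf{r}_{,2} \cdot \hat{\mathbf{N}}_{,2} .
\end{aligned}
\tag{2.54}
\]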
The equivalence of (2.49) and (2.54) can be established very easily by differentiating

r,1 · N̂ = 0, r,2 · N̂ = 0.

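Let us now introduce a 2 × 2 symmetric matrix b whose entries $b_{\alpha\beta}$ are defined by

\[ b_{\alpha\beta} = \hat{\mathbf{N}} \cdot \mathbf{r}_{,\alpha\beta} = -\mathbf{r}_{,\alpha} \cdot \hat{\mathbf{N}}_{,\beta} \tag{2.55} \]

so that
\[
b = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
= \begin{pmatrix} \hat{\mathbf{N}} \cdot \mathbf{r}_{,11} & \hat{\mathbf{N}} \cdot \mathbf{r}_{,12} \\ \hat{\mathbf{N}} \cdot \mathbf{r}_{,21} & \hat{\mathbf{N}} \cdot \mathbf{r}_{,22} \end{pmatrix}
= \begin{pmatrix} L & M \\ M & N \end{pmatrix} \tag{2.56}
\]
and the second fundamental form (2.51) can be written as

\[ \mathrm{II} = b_{\alpha\beta}\,du^\alpha du^\beta . \tag{2.57} \]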

Osculating paraboloid

Let us now obtain an expression for the deviation of a surface from a tangent plane near to the
point of contact between the surface and plane. We recall (2.23) which gives us the perpendicular
distance between an arbitrary point on a surface and a tangent plane at r0 . For our purposes
here we suppose that the point of contact r0 has parameter values uα and the arbitrary, nearby
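point has parameters $u^\alpha + du^\alpha$. We then have
\[
\begin{aligned}
d &= \hat{\mathbf{N}} \cdot [\mathbf{r}(u^1 + du^1, u^2 + du^2) - \mathbf{r}(u^1, u^2)] \\
&= \hat{\mathbf{N}} \cdot \left[ (du^1\,\mathbf{r}_{,1} + du^2\,\mathbf{r}_{,2}) + \tfrac{1}{2}\left\{ (du^1)^2\,\mathbf{r}_{,11} + 2\,du^1du^2\,\mathbf{r}_{,12} + (du^2)^2\,\mathbf{r}_{,22} \right\} \right]
\end{aligned}
\]
where we have ignored $(du^\alpha)^3$ and higher order terms in the Taylor expansion. Using $\hat{\mathbf{N}} \cdot \mathbf{r}_{,\alpha} = 0$,
we have
\[ d = \tfrac{1}{2}\,b_{\alpha\beta}\,du^\alpha du^\beta . \tag{2.58} \]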
The function d is called the osculating paraboloid at the point of contact. In a small region
around the point of contact, therefore, d is positive if the point uα + duα is on that side of the
tangent plane into which N̂ is pointing and negative if on the opposite side.
Let us now obtain a classification of the points of a surface. Since ds is always positive
(except when both du1 and du2 are zero) we see from (2.50) that the sign of κn is the same as
the sign of II. Now the sign of II, or rather the change in sign of II is governed by the value of
the discriminant LN − M 2 of II.

Elliptic case: If LN − M 2 > 0 then II does not change sign at the particular point in question
and the sign of κn is the same as the sign of II for all directions at that point. We call this an
elliptic point. In the neighbourhood of an elliptic point the surface lies entirely on one side of
the tangent plane at the point. All points of a sphere and an ellipsoid are elliptic.

Hyperbolic case: If $LN - M^2 < 0$ at a point P then there are two distinct lines in the tangent
plane of P which divide the tangent plane into four sectors in which II is alternately
positive and negative. The point P is called a hyperbolic point, and on the two lines, called
asymptotic lines, II = 0. In the neighborhood of a hyperbolic point P the surface lies on both
sides of the tangent plane of P.

[Figure 2.1]

Parabolic case: If $LN - M^2 = 0$ at a point P, with not all coefficients L, M and N vanishing,
then II is zero for one particular direction $du^1_0, du^2_0$ only and has the same sign for all other
directions. In this case P is called a parabolic point. For example, all points on a cylinder are
parabolic. As with an elliptic point, the surface near a parabolic point also lies on one side of the
tangent plane but the order of contact between the plane and the surface at a parabolic point is
higher than one. By this we mean that the plane does not touch the surface in one point only
but along a line or curve.

Planar case: If L = M = N = 0 at a point P then the osculating paraboloid degenerates to a
plane. The shape of the surface near the point of contact is not determined by the process
used here, in the sense that nothing can be said with its aid concerning the sign of d.

As an exercise prove that a surface having only planar points is a plane.

A torus (i.e. a doughnut with a hole in the middle) is a particularly interesting surface because
it has three types of points. On the outer rim all the points are elliptic. On the inner rim all the
points are hyperbolic and on the circles around the top and bottom of the torus, all points are
parabolic.
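
The statement about the torus can be checked with a short computation. The sketch below is illustrative rather than definitive: it takes the torus of Example 2.3.1 with third component $a\cos\varphi$, parameters $u^1 = \theta$, $u^2 = \varphi$ and $b > a > 0$; the function names and the tolerance `tol` are choices made here. It evaluates L, M, N from (2.49) and classifies the point by the sign of $LN - M^2$.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torus_second_form(theta, phi, a=1.0, b=3.0):
    """Coefficients (L, M, N) of II for the torus, with u^1 = theta, u^2 = phi."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    R = b + a * sp
    r1 = (-R * st, R * ct, 0.0)                  # dr/dtheta
    r2 = (a * cp * ct, a * cp * st, -a * sp)     # dr/dphi
    r11 = (-R * ct, -R * st, 0.0)
    r12 = (-a * cp * st, a * cp * ct, 0.0)
    r22 = (-a * sp * ct, -a * sp * st, -a * cp)
    n = cross(r1, r2)
    norm = math.sqrt(dot(n, n))
    n_hat = tuple(c / norm for c in n)
    return dot(n_hat, r11), dot(n_hat, r12), dot(n_hat, r22)


def classify(L, M, N, tol=1e-9):
    """Classify a point by the discriminant LN - M^2 of II."""
    if max(abs(L), abs(M), abs(N)) < tol:
        return "planar"
    disc = L * N - M * M
    if disc > tol:
        return "elliptic"
    if disc < -tol:
        return "hyperbolic"
    return "parabolic"
```

With this parametrization the outer rim is $\varphi = \pi/2$, the inner rim is $\varphi = 3\pi/2$, and the top and bottom circles are $\varphi = 0$ and $\varphi = \pi$; `classify(*torus_second_form(theta, phi))` returns the three types described above at those values.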

2.5 Principal curvatures

Since the normal curvature of a surface depends on the direction $du^\alpha$ it is natural to expect that
there will be certain directions in which $\kappa_n$ assumes extreme values. Let us now investigate
this problem and for convenience let us simply denote the normal curvature by κ. Previously
we denoted the curvature of a curve in $E^3$ by κ and if there is any danger of confusion in what
follows, we shall revert to $\kappa_n$ for normal curvature of a surface. We re-write (2.50) as

\[ \kappa = \frac{b_{\alpha\beta}\,du^\alpha du^\beta}{g_{\mu\nu}\,du^\mu du^\nu} \tag{2.59} \]

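where we have used (2.31) and (2.57). This equation can be written as

\[ (b_{\alpha\beta} - \kappa g_{\alpha\beta})\,du^\alpha du^\beta = 0. \tag{2.60} \]

To avoid the notation becoming cumbersome let us put $\tau^\alpha = du^\alpha$ so

\[ (b_{\alpha\beta} - \kappa g_{\alpha\beta})\,\tau^\alpha\tau^\beta = 0. \tag{2.61} \]

To find the directions $\tau^\alpha$ which extremalize κ we differentiate (2.61) with respect to $\tau^\alpha$ to obtain

\[ 2(b_{\alpha\beta} - \kappa g_{\alpha\beta})\,\tau^\beta - \frac{\partial\kappa}{\partial\tau^\alpha}\,g_{\mu\nu}\tau^\mu\tau^\nu = 0. \]
[Note the care exercised in the summation convention.] If we bear in mind that $g_{\mu\nu}\tau^\mu\tau^\nu \neq 0$
(because I is positive definite and $(\tau^1, \tau^2) \neq (0, 0)$) then the usual equation for maxima and
minima viz.
\[ \frac{\partial\kappa}{\partial\tau^\alpha} = 0 \]
reduces to
\[ (b_{\alpha\beta} - \kappa g_{\alpha\beta})\,\tau^\beta = 0 \tag{2.62} \]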

which is nothing more than an eigenvalue problem. Before we determine the eigenvectors $\tau^\beta$
directly from (2.62), let us first determine the eigenvalues κ from the characteristic equation

\[ \det(b_{\alpha\beta} - \kappa g_{\alpha\beta}) = 0 \tag{2.63} \]

which is a quadratic equation in κ. Written out in full, (2.63) is

\[ \kappa^2 - \frac{EN + GL - 2FM}{EG - F^2}\,\kappa + \frac{LN - M^2}{EG - F^2} = 0. \tag{2.64} \]
The roots $\kappa_{(1)}$ and $\kappa_{(2)}$ of this equation are called the principal curvatures. They are the maximum
and minimum values of the normal curvature.

Mean curvature and Gaussian curvature

The actual values of $\kappa_{(1)}$ and $\kappa_{(2)}$ are of little importance; what is of importance, however, is
the sum and the product of $\kappa_{(1)}$ and $\kappa_{(2)}$. We define the mean curvature μ by

\[ \mu = \frac{1}{2}(\kappa_{(1)} + \kappa_{(2)}) = \frac{EN + GL - 2FM}{2(EG - F^2)} \tag{2.65} \]

and the Gaussian curvature K by

\[ K = \kappa_{(1)}\kappa_{(2)} = \frac{LN - M^2}{EG - F^2} = \frac{\det b_{\alpha\beta}}{\det g_{\alpha\beta}}. \tag{2.66} \]

The expressions on the right of equations (2.65) and (2.66) are obtained by using the well known
formulae for the sum and product of the roots of a quadratic equation.
We shall show later that although K is defined in terms of extrinsic quantities bαβ , it is in
fact an intrinsic quantity, i.e. a function of gαβ and its derivatives. Gauss himself regarded this
as a most remarkable result and called it the “Theorema Egregium”.
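
Equations (2.64) to (2.66) can be turned into a few lines of code. The sketch below is a hedged illustration: the function name `curvatures` is hypothetical, and the test values are the torus coefficients $E = (b + a\sin\varphi)^2$, $F = 0$, $G = a^2$, $L = (b + a\sin\varphi)\sin\varphi$, $M = 0$, $N = a$, which correspond to one particular choice of normal orientation (reversing the normal changes the sign of L, M, N and of both principal curvatures).

```python
import math


def curvatures(E, F, G, L, M, N):
    """Return (kappa_1, kappa_2, mean, gauss) from the coefficients of I and II.

    The principal curvatures are the roots of (2.64), written here as
    kappa^2 - 2*mean*kappa + gauss = 0 using (2.65) and (2.66)."""
    det_g = E * G - F * F
    mean = (E * N + G * L - 2.0 * F * M) / (2.0 * det_g)
    gauss = (L * N - M * M) / det_g
    # The discriminant is non-negative in exact arithmetic; clamp rounding errors.
    root = math.sqrt(max(mean * mean - gauss, 0.0))
    return mean - root, mean + root, mean, gauss
```

On the outer rim of the torus with $a = 1$, $b = 3$ (so $E = 16$, $G = 1$, $L = 4$, $N = 1$) this gives principal curvatures $1/4$ and $1$, mean curvature $5/8$ and Gaussian curvature $1/4$.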

Principal directions

We now return to the eigenvectors of (2.62) i.e. to the so-called principal directions – obviously
these are the directions in which the normal curvature take on the values of the principal
β β
curvatures. If we denote these directions by τ(1) and τ(2) corresponding to κ(1) and κ(2)
respectively, then by (2.62) we have

β
(bαβ − κ(1) gαβ )τ(1) =0 (2.67)

β
(bαβ − κ(2) gαβ )τ(2) = 0. (2.68)
α α
If we multiply (2.67) by τ(2) , equation (2.68) by τ(1) and subtract we obtain

α β
(κ(2) − κ(1) )gαβ τ(1) τ(2) = 0

α β
and if κ(1) 6= κ(2) then gαβ τ(1) τ(2) = 0 and by (2.42) the principal directions are orthogonal.
We now determine the principal directions – we do this by eliminating κ from equations
(2.62). For the two values of α in (2.62) we have two equations viz.

b1β τ β = κg1β τ β

and
b2β τ β = κg2β τ β .

We multiply the first equation by g2α τ α , the second by g1α τ α and subtract to obtain

(g1α b2β − g2α b1β )τ α τ β = 0. (2.69)

[Note that
g1β g2α τ α τ β = g1α g2β τ α τ β

because the α and β are both dummy indices so that they can be interchanged on either side of
the equation.] If we now sum over α and β in (2.69) we obtain

\[ (EM - FL)(\tau^1)^2 + (EN - GL)\,\tau^1\tau^2 + (FN - GM)(\tau^2)^2 = 0 \qquad (2.70) \]

which we can regard as a quadratic in

\[ \frac{\tau^1}{\tau^2} = \frac{du^1}{du^2}. \]

In other words, (2.70) determines two directions which, of course, correspond to $\tau^\alpha_{(1)}$ and
$\tau^\alpha_{(2)}$. Notice however that when the coefficients of the first and second fundamental forms are
proportional, i.e.

\[ \frac{L}{E} = \frac{M}{F} = \frac{N}{G} \qquad (2.71) \]
then all the coefficients in (2.70) are zero and the principal directions are indeterminate i.e. the
normal curvature is the same in all directions. A point where this occurs i.e. where (2.71) is
valid, is called an umbilic or navel point. Also, since at an umbilic point the normal curvature
is the same in all directions, it follows that the values of the principal curvatures are equal, i.e.
κ(1) = κ(2) .
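
The quadratic (2.70) and the umbilic test (2.71) can be tried out numerically. The sketch below is only an illustration: the function name `principal_directions`, the tolerance `tol` and the sample coefficients are assumptions, not part of the text.

```python
import math

def principal_directions(E, F, G, L, M, N, tol=1e-12):
    """Solve (2.70) for (tau^1, tau^2); return None at an umbilic point."""
    a = E * M - F * L      # coefficient of (tau^1)^2
    b = E * N - G * L      # coefficient of tau^1 tau^2
    c = F * N - G * M      # coefficient of (tau^2)^2
    if max(abs(a), abs(b), abs(c)) < tol:
        return None        # (2.71) holds: every direction is principal
    if abs(a) < tol:
        # With a = 0 the equation factorizes as tau^2 (b tau^1 + c tau^2) = 0.
        return [(1.0, 0.0), (-c, b)]
    disc = math.sqrt(max(b * b - 4 * a * c, 0.0))
    # Roots x = tau^1 / tau^2 of a x^2 + b x + c = 0, returned as (x, 1).
    return [((-b + disc) / (2 * a), 1.0), ((-b - disc) / (2 * a), 1.0)]

# Hypothetical coefficients: L/E = M/F fails, so the point is not umbilic.
dirs = principal_directions(1, 0, 1, 0, 1, 0)
# Hypothetical umbilic point: L = E, M = F, N = G.
umbilic = principal_directions(1, 0, 1, 1, 0, 1)
```

In the first call the two directions come out as (1, 1) and (−1, 1), which are orthogonal with respect to the metric E = G = 1, F = 0, as the argument above predicts.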

Lines of curvature

A curve on a surface whose tangent at each point is along a principal direction is called a line of
curvature. There is a simple relationship between dN̂ and dr along a line of curvature, which we
now derive. By equation (2.62), along a line of curvature we have
 
\[ (b_{\alpha\beta} - \kappa_{(p)}\,g_{\alpha\beta})\,du^\beta = 0, \]

where κ(p) denotes one of the principal curvatures. But, by equations (2.55) and (2.29),

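\[ b_{\alpha\beta} = -\mathbf{r}_{,\alpha}\cdot\hat{\mathbf{N}}_{,\beta}, \qquad g_{\alpha\beta} = \mathbf{r}_{,\alpha}\cdot\mathbf{r}_{,\beta}. \]

Therefore
\[ (-\mathbf{r}_{,\alpha}\cdot\hat{\mathbf{N}}_{,\beta} - \kappa_{(p)}\,\mathbf{r}_{,\alpha}\cdot\mathbf{r}_{,\beta})\,du^\beta = 0, \]
which gives
\[ \mathbf{r}_{,\alpha}\cdot(\hat{\mathbf{N}}_{,\beta}\,du^\beta + \kappa_{(p)}\,\mathbf{r}_{,\beta}\,du^\beta) = 0 \]
or, since $d\hat{\mathbf{N}} = \hat{\mathbf{N}}_{,\beta}\,du^\beta$ and $d\mathbf{r} = \mathbf{r}_{,\beta}\,du^\beta$,

\[ \mathbf{r}_{,\alpha}\cdot(d\hat{\mathbf{N}} + \kappa_{(p)}\,d\mathbf{r}) = 0, \qquad \alpha = 1, 2, \]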

where dr now refers to the tangent to the line of curvature, and dN̂ is the change in N̂ along the
line of curvature. We also have, since N̂ · dN̂ = 0 and N̂ · dr = 0,

N̂ · (dN̂ + κ(p) dr) = 0.

Since the vectors r,1 , r,2 and N̂ are linearly independent it follows that

dN̂ + κ(p) dr = 0 (2.72)


along the line of curvature. Equation (2.72) is called Rodrigues’s formula and it characterizes
the lines of curvature.
It can be shown (the proof is not easy) that provided there are no umbilic points on the surface,
the lines of curvature cover the surface completely and can be used as an orthogonal curvilinear
coordinate system. When we choose our parametric curves in this way the fundamental forms
assume a particularly simple form. Since the curves are orthogonal we have

r,1 · r,2 = g12 = F = 0. (2.73)

We now recall that (2.70) determines the principal directions, so in this equation we can first
choose $du^1 = 0$ with $du^2$ arbitrary, and then $du^2 = 0$ with $du^1$ arbitrary. This gives us the equations

F N − GM = 0, EM − F L = 0

and, in view of (2.73), we deduce that

M = b12 = 0. (2.74)

In this special coordinate system, therefore, we have that the normal curvature is
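\[ \kappa = \frac{b_{\alpha\beta}\,du^\alpha du^\beta}{g_{\mu\nu}\,du^\mu du^\nu}
          = \frac{L\,(du^1)^2 + N\,(du^2)^2}{E\,(du^1)^2 + G\,(du^2)^2}
          = L\left(\frac{du^1}{ds}\right)^2 + N\left(\frac{du^2}{ds}\right)^2 \qquad (2.75) \]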

and the principal curvatures are obtained by choosing

du2 = 0, then du1 = 0

in succession to get
\[ \kappa_{(1)} = \frac{L}{E}, \qquad \kappa_{(2)} = \frac{N}{G}. \qquad (2.76) \]

Angle measurements on surfaces

We can now obtain a very convenient formula for normal curvature but to derive it we need to
define the notion of angle in a tangent plane. If X and Y are vectors in a tangent plane and θ
the angle between X and Y then, in E3 ,

X · Y = |X||Y| cos θ.

If we re-write this in the 2-dimensional language of the tangent plane we have

\[ \cos\theta = \frac{g_{\alpha\beta}X^\alpha Y^\beta}{(g_{\mu\nu}X^\mu X^\nu)^{1/2}\,(g_{\rho\sigma}Y^\rho Y^\sigma)^{1/2}}. \]
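
The angle formula is easy to evaluate once the metric coefficients are known. The sketch below is an illustration only; the function name `angle` and the sample metric (with F ≠ 0, so that the parametric curves are not orthogonal) are assumptions.

```python
import math

def angle(g, X, Y):
    """Angle between tangent vectors with components X^alpha and Y^alpha."""
    def dot(A, B):
        # g_{alpha beta} A^alpha B^beta, summed over both indices
        return sum(g[a][b] * A[a] * B[b] for a in range(2) for b in range(2))
    c = dot(X, Y) / math.sqrt(dot(X, X) * dot(Y, Y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error

# Hypothetical metric: E = G = 1, F = 0.5.
g = [[1.0, 0.5], [0.5, 1.0]]
theta = angle(g, (1.0, 0.0), (0.0, 1.0))   # here cos(theta) = F / sqrt(EG)
```

For the two coordinate directions the formula reduces to cos θ = F/√(EG), so in this sample the parametric curves meet at 60 degrees.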


2.6 Fundamental Equations of Surface Theory

In the previous subsection we treated the coefficients of the first and second fundamental forms
as independent quantities. This subsection will be devoted to finding relations between the gαβ
and bαβ and trying to interpret these results geometrically. Frequently we shall have to deal with
derivatives of quantities like gαβ and we shall adopt the notation
\[ \frac{\partial g_{\alpha\beta}}{\partial u^\gamma} = g_{\alpha\beta,\gamma}. \]
Let us begin by considering r,αβ . In general this is some vector in E3 and since the vectors
r,1 , r,2 and N̂ form a basis for E3 at each point of the surface, we can always write r,αβ as a
linear combination of these vectors i.e.

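\[ \mathbf{r}_{,\alpha\beta} = \Gamma^\gamma_{\alpha\beta}\,\mathbf{r}_{,\gamma} + \Omega_{\alpha\beta}\,\hat{\mathbf{N}}. \qquad (2.77) \]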

Here, for the first time we have an object with 3 indices. We can think of the Γγαβ as being
a 3-dimensional 2 × 2 × 2 matrix consisting of 8 blocks – each block corresponding to one of the
8 quantities
\[ \Gamma^1_{11},\ \Gamma^1_{12},\ \Gamma^1_{21},\ \Gamma^1_{22},\ \Gamma^2_{11},\ \Gamma^2_{12},\ \Gamma^2_{21},\ \Gamma^2_{22}. \]
If we denote this matrix by Γ then we have

[Figure: the array Γ drawn as a 2 × 2 × 2 block of cubes, one cube for each of the eight quantities listed above.]

Obviously the entry Γ221 is hidden from view. If we take the scalar product of (2.77) with N̂ we
obtain
Ωαβ = N̂ · r,αβ
and hence, by (2.55), we see that Ωαβ = bαβ i.e. the Ωαβ are the coefficients of the second
fundamental form. We now take the scalar product of (2.77) with r,δ to obtain

r,δ · r,αβ = Γγαβ r,γ · r,δ = gγδ Γγαβ . (2.78)

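It is convenient at this stage to introduce the matrix inverse to $g$, i.e. $g^{-1}$. We denote the entries
of $g^{-1}$ by $g^{\alpha\beta}$ so that
\[ g^{\alpha\beta}\,g_{\beta\gamma} = \delta^\alpha_\gamma. \]
We also adopt the convention that when we have symbols of the form $g_{\alpha\beta}X^\beta$ or $g^{\alpha\beta}Y_\beta$, we simply
write these as
\[ g_{\alpha\beta}X^\beta = X_\alpha \quad\text{and}\quad g^{\alpha\beta}Y_\beta = Y^\alpha. \]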

In other words when gαβ or g αβ are multiplied by some object and there is summation over one
of the indices of the gαβ or g αβ then this process results in a lowering or raising of the summation
index on the object as shown. The essential point is that we do not change the kernel letter
representing the object. This only applies to gαβ and g αβ – if, for example, we have something like
bαβ X β , we would write this as
bαβ X β = Zα
to indicate that multiplication by bαβ changes the object X β into something totally different viz.
Zα .
Following this scheme we see that (2.78) can be written as

r,δ · r,αβ = Γ δαβ (2.79)

where
Γ δαβ = gγδ Γγ αβ or Γγ αβ = g γδ Γ δαβ .
We recall that gδα is defined by
gδα = r,δ · r,α
and if we differentiate this with respect to uβ we obtain

gδα,β = r,δ · r,αβ + r,α · r,δβ

or by (2.79),
gδα,β = Γ δαβ + Γ αδβ . (2.80)
We can obtain two similar equations by cyclically permuting the indices α, β and δ, i.e.

α, β, δ → δ, α, β → β, δ, α.

We obtain
gβδ,α = Γ βδα + Γ δβα (2.81)
gαβ,δ = Γ αβδ + Γ βαδ . (2.82)
Now we notice from (2.77) that Γγ αβ is symmetric in its lower indices i.e.

Γγ αβ = Γγβα

so that
Γ γαβ = Γ γβα .
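If we add (2.80) and (2.81) and subtract (2.82) we thus obtain
\[ \Gamma_{\delta\alpha\beta} = \frac{1}{2}\left(g_{\delta\alpha,\beta} + g_{\beta\delta,\alpha} - g_{\alpha\beta,\delta}\right) \qquad (2.83) \]
or
\[ \Gamma^\gamma_{\alpha\beta} = g^{\gamma\delta}\,\Gamma_{\delta\alpha\beta} = \frac{1}{2}\,g^{\gamma\delta}\left(g_{\delta\alpha,\beta} + g_{\beta\delta,\alpha} - g_{\alpha\beta,\delta}\right). \qquad (2.84) \]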

Equations (2.83) or (2.84) are called the Christoffel relations and the symbols Γ δαβ and Γγαβ are
called Christoffel symbols of the first and second kind respectively. Equation (2.77) can thus be
written
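\[ \mathbf{r}_{,\alpha\beta} = \frac{1}{2}\,g^{\gamma\delta}\left(g_{\delta\alpha,\beta} + g_{\beta\delta,\alpha} - g_{\alpha\beta,\delta}\right)\mathbf{r}_{,\gamma} + b_{\alpha\beta}\,\hat{\mathbf{N}} \qquad (2.85) \]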
and this is called Gauss’s equation.
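
The Christoffel relations (2.83) and (2.84) are purely mechanical, so they lend themselves to a short program. The sketch below is an illustration under stated assumptions: the function name `christoffel`, the list layout of `g` and `dg`, and the choice of the unit sphere with $u^1 = \theta$ (colatitude) and $u^2 = \phi$, for which $g = \mathrm{diag}(1, \sin^2\theta)$, are not taken from the text.

```python
import math

def christoffel(g, dg):
    """g[a][b] = g_ab and dg[a][b][c] = g_ab,c  ->  (first kind, second kind)."""
    R = range(2)
    # First kind, (2.83): Gamma_{d a b} = (g_da,b + g_bd,a - g_ab,d) / 2
    G1 = [[[0.5 * (dg[d][a][b] + dg[b][d][a] - dg[a][b][d]) for b in R]
           for a in R] for d in R]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    # Second kind, (2.84): Gamma^c_{a b} = g^{cd} Gamma_{d a b}
    G2 = [[[sum(ginv[c][d] * G1[d][a][b] for d in R) for b in R]
           for a in R] for c in R]
    return G1, G2

# Assumed example: unit sphere, g = diag(1, sin^2 theta), evaluated at theta = 1.
th = 1.0
g = [[1.0, 0.0], [0.0, math.sin(th) ** 2]]
dg = [[[0.0, 0.0], [0.0, 0.0]],
      [[0.0, 0.0], [2 * math.sin(th) * math.cos(th), 0.0]]]  # only g_22,1 is non-zero
G1, G2 = christoffel(g, dg)
```

List indices run from 0, so `G2[0][1][1]` stands for $\Gamma^1_{22}$ and `G2[1][0][1]` for $\Gamma^2_{12}$. The symmetry in the last two indices noted above shows up as `G2[c][a][b] == G2[c][b][a]`.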
Let us now obtain an expression for the derivative of the surface normal vector N̂. Since
N̂ · N̂ = 1, it follows that N̂ · N̂,α = 0 so that N̂,α lies in the tangent plane and we can write

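\[ \hat{\mathbf{N}}_{,\alpha} = \theta^\beta{}_\alpha\,\mathbf{r}_{,\beta}. \qquad (2.86) \]

We take the scalar product of this equation with $\mathbf{r}_{,\gamma}$ to obtain

\[ \mathbf{r}_{,\gamma}\cdot\hat{\mathbf{N}}_{,\alpha} = \theta^\beta{}_\alpha\,g_{\beta\gamma} = \theta_{\gamma\alpha}. \qquad (2.87) \]

Now since $\mathbf{r}_{,\gamma}\cdot\hat{\mathbf{N}} = 0$, we have by differentiating that

\[ \mathbf{r}_{,\gamma}\cdot\hat{\mathbf{N}}_{,\alpha} = -\hat{\mathbf{N}}\cdot\mathbf{r}_{,\gamma\alpha} = -b_{\gamma\alpha}. \]

Comparing with (2.87) gives $\theta_{\gamma\alpha} = -b_{\gamma\alpha}$, i.e. $\theta^\beta{}_\alpha = -g^{\beta\gamma}b_{\gamma\alpha}$, so that (2.86) becomes

\[ \hat{\mathbf{N}}_{,\alpha} = -g^{\beta\gamma}\,b_{\gamma\alpha}\,\mathbf{r}_{,\beta} \qquad (2.88) \]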

and these are called Weingarten’s equations.


Thus far the development of this subsection has been largely a matter of formalism and we
have not said what the purpose of the formalism is. Let us now try to see exactly what is going
on and where we are going. The Gauss-Weingarten equations are differential equations for the
vectors r,α and N̂. The coefficients in these equations are the coefficients of the fundamental
forms of the surface r = r(uα ). The question now arises: if we are given arbitrary functions

gαβ (uα ) and bαβ (uα ),

is it possible to integrate the Gauss-Weingarten equations to obtain a function r = r(uα ) which
will represent some surface and, if so, what will the fundamental forms of this surface be? The first
part of the question can be answered very easily – we merely have to determine the integrability
conditions for the Gauss-Weingarten equations. These conditions are differential relationships
between the gαβ and bαβ and are obtained in the following way.
The integrability conditions of (2.77) are

\[ \mathbf{r}_{,\alpha\beta\epsilon} - \mathbf{r}_{,\alpha\epsilon\beta} = 0. \qquad (2.89) \]

This is just a generalization of the very familiar situation where we have, for example,
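\[ \frac{\partial f}{\partial x} = g(x, y) \quad\text{and}\quad \frac{\partial f}{\partial y} = h(x, y). \]

The integrability condition for this system is

\[ \frac{\partial^2 f}{\partial x\,\partial y} - \frac{\partial^2 f}{\partial y\,\partial x} = 0. \]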
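Now, differentiating (2.77) with respect to $u^\epsilon$ yields

\[ \mathbf{r}_{,\alpha\beta\epsilon} = \Gamma^\gamma_{\alpha\beta,\epsilon}\,\mathbf{r}_{,\gamma} + \Gamma^\gamma_{\alpha\beta}\,\mathbf{r}_{,\gamma\epsilon} + b_{\alpha\beta,\epsilon}\,\hat{\mathbf{N}} + b_{\alpha\beta}\,\hat{\mathbf{N}}_{,\epsilon} \]

and if we substitute for $\mathbf{r}_{,\gamma\epsilon}$ and $\hat{\mathbf{N}}_{,\epsilon}$ from (2.77) and (2.88) (exercising due care with the
summation convention!) we obtain

\[ \begin{aligned}
\mathbf{r}_{,\alpha\beta\epsilon} &= \Gamma^\gamma_{\alpha\beta,\epsilon}\,\mathbf{r}_{,\gamma} + \Gamma^\gamma_{\alpha\beta}\left(\Gamma^\mu_{\gamma\epsilon}\,\mathbf{r}_{,\mu} + b_{\gamma\epsilon}\,\hat{\mathbf{N}}\right) + b_{\alpha\beta,\epsilon}\,\hat{\mathbf{N}} - b_{\alpha\beta}\left(g^{\mu\nu}\,b_{\nu\epsilon}\,\mathbf{r}_{,\mu}\right) \\
&= \mathbf{r}_{,\mu}\left(\Gamma^\mu_{\alpha\beta,\epsilon} + \Gamma^\gamma_{\alpha\beta}\Gamma^\mu_{\gamma\epsilon} - g^{\mu\nu}\,b_{\alpha\beta}\,b_{\nu\epsilon}\right) + \hat{\mathbf{N}}\left(b_{\alpha\beta,\epsilon} + \Gamma^\gamma_{\alpha\beta}\,b_{\gamma\epsilon}\right).
\end{aligned} \]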

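Now condition (2.89) simply means that we must interchange the indices β and ε in the above
equation and subtract. Since the vectors $\mathbf{r}_{,\mu}$ and $\hat{\mathbf{N}}$ form a basis we can obtain two separate
equations from (2.89): one from the coefficients of $\mathbf{r}_{,\mu}$ and the other from the coefficients of $\hat{\mathbf{N}}$.
We thus obtain

\[ \Gamma^\mu_{\alpha\beta,\epsilon} - \Gamma^\mu_{\alpha\epsilon,\beta} + \Gamma^\gamma_{\alpha\beta}\Gamma^\mu_{\gamma\epsilon} - \Gamma^\gamma_{\alpha\epsilon}\Gamma^\mu_{\gamma\beta} = g^{\mu\nu}\left(b_{\alpha\beta}\,b_{\nu\epsilon} - b_{\alpha\epsilon}\,b_{\nu\beta}\right) \qquad (2.90) \]

and
\[ b_{\alpha\beta,\epsilon} - b_{\alpha\epsilon,\beta} = \Gamma^\gamma_{\alpha\epsilon}\,b_{\gamma\beta} - \Gamma^\gamma_{\alpha\beta}\,b_{\gamma\epsilon}. \qquad (2.91) \]
These are called Gauss’s equation and Codazzi’s equation respectively. It is customary to
represent the left hand side of (2.90) by the 4-index symbol $R^\mu{}_{\alpha\beta\epsilon}$, i.e.

\[ R^\mu{}_{\alpha\beta\epsilon} = \Gamma^\mu_{\alpha\beta,\epsilon} - \Gamma^\mu_{\alpha\epsilon,\beta} + \Gamma^\gamma_{\alpha\beta}\Gamma^\mu_{\gamma\epsilon} - \Gamma^\gamma_{\alpha\epsilon}\Gamma^\mu_{\gamma\beta} \qquad (2.92) \]

and, as usual, we write

\[ g_{\mu\nu}\,R^\mu{}_{\alpha\beta\epsilon} = R_{\nu\alpha\beta\epsilon} \]
so that (2.90) can be written as

\[ R_{\nu\alpha\beta\epsilon} = b_{\alpha\beta}\,b_{\nu\epsilon} - b_{\alpha\epsilon}\,b_{\nu\beta}. \qquad (2.93) \]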

At this stage we should also investigate the integrability conditions of Weingarten’s equation
but these turn out to be equivalent to (2.91) so we can say that the Gauss-Codazzi equations
are the only integrability conditions for the Gauss-Weingarten equations. We see thus that if
we are given a surface r = r(uα ) then the coefficients of the fundamental forms of this surface
satisfy the differential conditions (2.90) and (2.91). We can now answer the second part of the
question posed earlier viz. given arbitrary functions gαβ and bαβ , can we integrate the Gauss-
Weingarten equations to obtain r = r(uα ) and, if so, what are the fundamental forms of this
surface? Clearly the functions gαβ and bαβ must satisfy the Gauss-Codazzi equations because
they are the integrability conditions for the Gauss-Weingarten equations and it turns out, as we
might expect, that the coefficients of the fundamental forms are just gαβ and bαβ respectively. We
will not actually prove this last statement because the full, rigorous proof is rather complicated
and belongs more to the theory of differential equations than to geometry. We can summarize
all of this in the so-called Fundamental Theorem of Surfaces or, more commonly,


Bonnet’s Theorem
If $g_{\alpha\beta}\,du^\alpha du^\beta$ and $b_{\alpha\beta}\,du^\alpha du^\beta$ are given quadratic differential forms such that

(i) $g_{\alpha\beta}\,du^\alpha du^\beta > 0$ for all $du^\alpha \neq 0$ and

(ii) the coefficients $g_{\alpha\beta}$ and $b_{\alpha\beta}$ satisfy the Gauss-Codazzi equations,

then there exists a surface, obtained by integrating the Gauss-Weingarten equations, whose
first and second fundamental forms are $g_{\alpha\beta}\,du^\alpha du^\beta$ and $b_{\alpha\beta}\,du^\alpha du^\beta$ respectively. This surface
is uniquely determined up to a Euclidean motion.
Let us now be a little more explicit about what we mean by an intrinsic property of a surface.
To be absolutely precise about this definition requires a lot of preliminary discussion on the notion
of an isometric mapping of one surface onto another so let us rather give a quick un-rigorous
version. Suppose that r = r(uα ) and r = r̃(v α ) are two surfaces and suppose that ϕ is a mapping
of the points uα to the points v α . This mapping will induce a mapping of the vectors r,α to r̃,α
and hence a mapping of gαβ to ϕ(gαβ ) = g̃αβ . The mapping ϕ is called isometric if

gαβ (uγ )duαduβ = g̃αβ (v γ )dv α dv β

i.e. if ϕ preserves the notion of length. A quantity which is preserved under all isometric mappings
is called an intrinsic property of the surface. Now, we can simplify matters enormously if we
choose the parameters v α to coincide with uα and this can always be done, locally at least. It
then follows directly that the isometric mapping is characterized by

gαβ = g̃αβ

so that not only length but also area and angle in the tangent plane are intrinsic properties. But,
because of this special choice of the parameters we see that

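\[ g_{\alpha\beta,\gamma} = \tilde{g}_{\alpha\beta,\gamma}, \qquad g_{\alpha\beta,\gamma\epsilon} = \tilde{g}_{\alpha\beta,\gamma\epsilon} \]

etc. so that the Christoffel symbols $\Gamma_{\mu\alpha\beta}$ and $\Gamma^\mu_{\alpha\beta}$, which are functions of $g_{\alpha\beta}$, $g^{\alpha\beta}$ and their first
derivatives, are also preserved under ϕ. By the same token $R^\mu{}_{\alpha\beta\epsilon}$ and $R_{\nu\alpha\beta\epsilon}$, which depend on
$\Gamma^\mu_{\alpha\beta}$ and $\Gamma^\mu_{\alpha\beta,\epsilon}$, are also intrinsic properties. In fact, any object which is a function of the $g_{\alpha\beta}$ and
its derivatives is intrinsic.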
By contrast, the one object which is certainly not preserved under an isometric mapping
is the normal vector N̂ since there is no guarantee that r,α × r,β will be preserved under ϕ. It
follows that any quantity containing N̂ in its definition is not intrinsic and we call such quantities
extrinsic. Obvious examples are bαβ and r,αβ .
We now have a very interesting situation if we look at equation (2.93) – the left hand side
is intrinsic while the right hand side appears to be very extrinsic. It follows that when the
coefficients of the second fundamental form are combined in the form

\[ b_{\alpha\beta}\,b_{\nu\epsilon} - b_{\alpha\epsilon}\,b_{\nu\beta} \]

then this quantity is intrinsic. This is the famous Theorema Egregium of Gauss because if we
choose ν = 1, α = β = 2, ε = 1 we obtain from (2.93)

\[ R_{1221} = b_{22}\,b_{11} - b_{21}\,b_{12} = LN - M^2. \qquad (2.94) \]

We see from (2.66), therefore, that the Gaussian curvature K defined by

\[ K = \frac{LN - M^2}{EG - F^2} = \frac{R_{1221}}{EG - F^2} \qquad (2.95) \]
is an intrinsic quantity.
Now the object Rναβǫ can be thought of as the entries in a 4-dimensional 2 × 2 × 2 × 2 matrix
so that there are $2^4 = 16$ entries and one of these entries viz. $R_{1221}$ has given us the Theorema
Egregium so we might be tempted to look at the other 15 entries in the hope of finding some
equally remarkable results. Unfortunately equation (2.94) is the only information that can be
extracted from (2.93) because it is easy to check that

R1221 = R2112 = −R2121 = −R1212

and the other 12 entries are all identically zero.
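
The claim about the 16 entries can be checked by brute force from the right hand side of (2.93). The sketch below is only an illustration; the function name `riemann_from_b` and the numerical values of L, M and N are assumptions.

```python
from itertools import product

def riemann_from_b(b):
    """R[(nu, alpha, beta, eps)] = b_ab * b_ne - b_ae * b_nb, as in (2.93)."""
    R = {}
    for n, a, be, e in product((1, 2), repeat=4):
        # Indices run over 1, 2 as in the text; the nested list b is 0-based.
        R[(n, a, be, e)] = (b[a - 1][be - 1] * b[n - 1][e - 1]
                            - b[a - 1][e - 1] * b[n - 1][be - 1])
    return R

L, M, N = 3, 5, 7          # hypothetical values, so LN - M^2 = -4
R = riemann_from_b([[L, M], [M, N]])
nonzero = sorted(k for k, v in R.items() if v != 0)
```

Only the four entries with ν ≠ α and β ≠ ε survive, and they all equal ±(LN − M²), in agreement with the symmetry relations stated above.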

2.7 Concluding Remarks

The next natural step in the development of the theory of surfaces would be an investigation of
those curves on a surface which minimize the distance between two points. Such curves are called
geodesics. The geodesics on the surface of a sphere, for example, are great circles, i.e. circles
whose centers are at the center of the sphere. The theory of geodesics is not particularly difficult
and will be dealt with in the next chapter.
It would be a crime, however, to write about the differential geometry of surfaces without
mentioning the Gauss-Bonnet theorem.
Suppose that on a surface we draw a geodesic triangle ABC. By this we mean that the points
A, B and C are joined by geodesics. In general, the sum of the angles A + B + C ≠ π and we
call A + B + C − π the excess of the triangle. Now if the sides of the geodesic triangle enclose a
region Ω of the surface, then we define the total curvature of this region to be

\[ \int_\Omega K\,d\sigma \]

where K = K(uα ) is the Gaussian curvature. In its simplest form, the Gauss-Bonnet theorem
states that

\[ A + B + C - \pi = \int_\Omega K\,d\sigma \]

i.e. the excess of the triangle is equal to the total curvature.
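
A quick sanity check of this statement, on an example chosen purely for illustration (it is an assumption, not taken from the text): on the unit sphere take the triangle with one vertex at the north pole and two vertices on the equator a quarter turn apart. Its sides are arcs of great circles, all three angles are right angles, it covers one eighth of the sphere, and K = 1 everywhere.

```python
import math

# Assumed example: one octant of the unit sphere, bounded by three great-circle arcs.
angles = [math.pi / 2, math.pi / 2, math.pi / 2]
excess = sum(angles) - math.pi           # A + B + C - pi

K = 1.0                                  # Gaussian curvature of the unit sphere
area = 4 * math.pi / 8                   # one eighth of the total area 4*pi
total_curvature = K * area               # integral of the constant K over the region

print(excess, total_curvature)
```

Both numbers come out as π/2, so the excess equals the total curvature for this triangle.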
The proof of the theorem is not all that difficult and is given in all standard textbooks on
differential geometry. Let us look at one interesting implication of the theorem. Suppose there
existed intelligent 2-dimensional beings who were constrained to live on the surface of a sphere.
If these characters started doing some mathematics they would develop Euclidean geometry, the
notion of intrinsic and extrinsic geometry of surfaces, the theorema egregium and the Gauss-
Bonnet theorem. They could then determine whether they were living on a plane or on a curved
surface by the following experiment. They could construct a geodesic triangle by joining up the
vertices with pieces of elastic (remember that a geodesic minimizes length), they could measure
the angles and determine the excess. By the Gauss-Bonnet theorem they would deduce that the
total curvature of their world was not zero and consequently that they were living on a curved
surface.
This sounds a bit hypothetical but, in fact, it was precisely this sort of experiment which gave
a lot of credence to Einstein’s general theory of relativity. The basic postulate of this theory is
that matter curves the geometry of the universe. This means that geodesics will not, in general,
be straight lines. When it was observed that a light ray from a distant star was deflected from
a straight line by the sun, the idea of the sun actually bending the space around it became very
acceptable.
We can summarize the first two chapters as follows. The curvature of a curve is a measure
of how much the curve differs from a straight line and curvature is an extrinsic quantity, i.e. it
cannot be measured by simply taking measurements along the curve. By the same token, the
normal curvature of a surface is a measure of how much the surface differs from a plane and
normal curvature is an extrinsic property. Of particular importance are the extreme values of
the normal curvature – these are the principal curvatures. The remarkable thing is that the
product of the principal curvatures (i.e. the Gaussian curvature) is an intrinsic property – this
is the theorema egregium. This means that the Gaussian curvature can be measured by taking
measurements entirely on the surface and the Gauss-Bonnet theorem tells us how to make these
measurements.

2.8 Examples

Most of the following examples use the surface

\[ \mathbf{r}(u, v) = \left(u - \frac{u^3}{3} + uv^2,\ v - \frac{v^3}{3} + vu^2,\ u^2 - v^2\right), \qquad (u, v) \in \mathbb{R}^2 \]

to illustrate the theory of surfaces. This surface is called Enneper’s surface, and is interesting
because it is an example of a minimal surface. A minimal surface has mean curvature zero
everywhere, and is called minimal because it has the property that it is the surface of minimal
area having a given closed curve as boundary.


Example 2.8.1

Consider the surface with parameterization


\[ \mathbf{r}(u, v) = \left(u - \frac{u^3}{3} + uv^2,\ v - \frac{v^3}{3} + vu^2,\ u^2 - v^2\right), \qquad (u, v) \in \mathbb{R}^2. \]
Find the first and second fundamental forms, the principal curvatures, the lines of curvature,
and the asymptotic curves of this surface. Are there any umbilic points on this surface?

Solution. Let u be the parameter u1 and v be the parameter u2 . Then

r1 = ru = (1 − u2 + v 2 , 2uv, 2u)
r2 = rv = (2uv, 1 − v 2 + u2, −2v)
r11 = ruu = (−2u, 2v, 2)
r12 = ruv = (2v, 2u, 0)
r22 = rvv = (2u, −2v, −2)

Hence

E = ru · ru = (1 − u2 + v 2 )2 + 4u2v 2 + 4u2 = (1 + u2 + v 2 )2
F = ru · rv = 0
G = rv · rv = 4u2 v 2 + (1 − v 2 + u2 )2 + 4v 2 = (1 + u2 + v 2 )2

So
ds2 = (1 + u2 + v 2 )2 du2 + (1 + u2 + v 2 )2 dv 2 .
We require the unit normal N̂. We have

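\[ \mathbf{r}_1\times\mathbf{r}_2 = \mathbf{r}_u\times\mathbf{r}_v
 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 - u^2 + v^2 & 2uv & 2u \\ 2uv & 1 - v^2 + u^2 & -2v \end{vmatrix}
 = (1 + u^2 + v^2)\,(-2u,\ 2v,\ 1 - u^2 - v^2). \]

Hence

\[ |\mathbf{r}_u\times\mathbf{r}_v| = (1 + u^2 + v^2)\sqrt{4u^2 + 4v^2 + (1 - u^2 - v^2)^2}
 = (1 + u^2 + v^2)\sqrt{(1 + u^2 + v^2)^2}
 = (1 + u^2 + v^2)^2. \]

Therefore

\[ \hat{\mathbf{N}} = \frac{\mathbf{r}_u\times\mathbf{r}_v}{|\mathbf{r}_u\times\mathbf{r}_v|}
 = \frac{1}{1 + u^2 + v^2}\,(-2u,\ 2v,\ 1 - u^2 - v^2). \]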


So we get

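\[ L = \hat{\mathbf{N}}\cdot\mathbf{r}_{11} = \hat{\mathbf{N}}\cdot\mathbf{r}_{uu} = \frac{2(1 + u^2 + v^2)}{1 + u^2 + v^2} = 2, \]
\[ M = \hat{\mathbf{N}}\cdot\mathbf{r}_{12} = \hat{\mathbf{N}}\cdot\mathbf{r}_{uv} = \frac{2(-2uv + 2uv)}{1 + u^2 + v^2} = 0, \]
\[ N = \hat{\mathbf{N}}\cdot\mathbf{r}_{22} = \hat{\mathbf{N}}\cdot\mathbf{r}_{vv} = \frac{2(-1 - u^2 - v^2)}{1 + u^2 + v^2} = -2. \]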
(So, by the way, LN − M 2 = −4 < 0 so all points on this surface are hyperbolic.) The second
fundamental form is
II = 2 du2 − 2 dv 2.
We can now determine the principal curvatures using the characteristic equation (2.64).
Substituting the values we have calculated for E, F , G, L, M and N into equation (2.64)
gives
\[ \kappa^2 - \frac{4}{(1 + u^2 + v^2)^4} = 0. \]
Factorizing the left side of this equation gives
\[ \left(\kappa + \frac{2}{(1 + u^2 + v^2)^2}\right)\left(\kappa - \frac{2}{(1 + u^2 + v^2)^2}\right) = 0. \]
The principal curvatures are therefore
\[ \kappa_1 = \frac{2}{(1 + u^2 + v^2)^2}, \qquad \kappa_2 = -\frac{2}{(1 + u^2 + v^2)^2}. \]
We can use equation (2.70) to determine the principal directions $(\tau^1, \tau^2)$ at each point (u, v) on
the surface. Here $EM - FL = 0$, $FN - GM = 0$ and $EN - GL = -4(1 + u^2 + v^2)^2$, so substituting the
values for E, F , G, L, M and N into (2.70) gives
\[ -4(1 + u^2 + v^2)^2\,\tau^1\tau^2 = 0 \]
with solutions $\tau^1 = 0$, $\tau^2$ = anything or $\tau^1$ = anything, $\tau^2 = 0$. Hence the principal directions are
(0, 1) and (1, 0) at each point. Therefore the parametric curves u = constant and v = constant
are the lines of curvature on this surface. The differential equation II = 0 determines the
asymptotic curves. Hence
du2 − dv 2 = 0
determines the asymptotic curves. This equation can be written

(du + dv)(du − dv) = 0.


49
Chapter 2. Surfaces in E3

Thus d(u + v) = 0 or d(u − v) = 0, which gives u + v = constant, u − v = constant for the
asymptotic curves. The two families of asymptotic curves will therefore be

α1 (t) = r(t, c1 − t)

and
α2 (t) = r(t, t − c2 ).

Since umbilic points occur where κ1 = κ2 and we have κ1 ≠ κ2 everywhere on the surface, it
follows that there are no umbilic points on this surface.
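
The algebra in this example can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper names `dot`, `cross` and `enneper_forms` and the sample point are assumptions, while the derivative vectors are the ones listed in the solution.

```python
from fractions import Fraction as Fr

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def enneper_forms(u, v):
    """Return (E, F, G, L, M, N, unit normal) at (u, v)."""
    ru = (1 - u * u + v * v, 2 * u * v, 2 * u)
    rv = (2 * u * v, 1 - v * v + u * u, -2 * v)
    ruu, ruv, rvv = (-2 * u, 2 * v, 2), (2 * v, 2 * u, 0), (2 * u, -2 * v, -2)
    w = 1 + u * u + v * v
    n = cross(ru, rv)
    unit = tuple(c / (w * w) for c in n)   # uses |ru x rv| = w^2 from the solution
    return (dot(ru, ru), dot(ru, rv), dot(rv, rv),
            dot(unit, ruu), dot(unit, ruv), dot(unit, rvv), unit)

u, v = Fr(1, 2), Fr(-3, 4)                 # hypothetical sample point
E, F, G, L, M, N, unit = enneper_forms(u, v)
w = 1 + u * u + v * v
```

At the sample point the program reproduces E = G = (1 + u² + v²)², F = 0, L = 2, M = 0 and N = −2 exactly, and the normal has length 1, which confirms the value used for |ru × rv|.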

Example 2.8.2

For the surface given in example 2.8.1, verify that Rodrigues’s formula holds along the line of
curvature u = 1.

Solution. Since
ds2 = (1 + u2 + v 2 )2 du2 + (1 + u2 + v 2 )2 dv 2

and
II = 2 du2 − 2 dv 2

we get from equation (2.59) that the normal curvature along u = 1 is

\[ \kappa_n = \frac{-2\,dv^2}{(2 + v^2)^2\,dv^2} = \frac{-2}{(2 + v^2)^2}. \]

(Note that this agrees with the principal curvature which we called κ2 in example 2.8.1) In order
to verify Rodrigues’s formula (2.72) we must still calculate dr and dN̂ along u = 1. Then du = 0
and we get

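\[ d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u}\,du + \frac{\partial\mathbf{r}}{\partial v}\,dv
 = (2uv,\ 1 - v^2 + u^2,\ -2v)\,dv
 = (2v,\ 2 - v^2,\ -2v)\,dv \quad\text{along } u = 1 \]

and

\[ d\hat{\mathbf{N}} = \frac{\partial\hat{\mathbf{N}}}{\partial u}\,du + \frac{\partial\hat{\mathbf{N}}}{\partial v}\,dv
 = \frac{1}{(1 + u^2 + v^2)^2}\,(4uv,\ 2 + 2u^2 - 2v^2,\ -4v)\,dv
 = \frac{1}{(2 + v^2)^2}\,(4v,\ 4 - 2v^2,\ -4v)\,dv \quad\text{along } u = 1. \]

Hence along u = 1

\[ \begin{aligned}
\kappa_2\,d\mathbf{r} + d\hat{\mathbf{N}}
 &= \frac{-2}{(2 + v^2)^2}\,(2v,\ 2 - v^2,\ -2v)\,dv + \frac{1}{(2 + v^2)^2}\,(4v,\ 4 - 2v^2,\ -4v)\,dv \\
 &= \frac{1}{(2 + v^2)^2}\left[(-4v,\ -4 + 2v^2,\ 4v) + (4v,\ 4 - 2v^2,\ -4v)\right]dv = 0
\end{aligned} \]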

as required.
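
The same check can be repeated at many points of the curve u = 1 with exact arithmetic. This is a sketch for illustration only: the function name `rodrigues_residual` and the sample values of v are assumptions. The derivative of N̂ is computed here from the quotient rule applied to N̂ = n/w, with n = (−2u, 2v, 1 − u² − v²) and w = 1 + u² + v², rather than copied from the solution.

```python
from fractions import Fraction as Fr

def rodrigues_residual(v):
    """Components of kappa_2 * r_v + N_v on the curve u = 1 (per unit dv)."""
    u = Fr(1)
    w = 1 + u * u + v * v
    r_v = (2 * u * v, 1 - v * v + u * u, -2 * v)
    n = (-2 * u, 2 * v, 1 - u * u - v * v)      # N = n / w
    n_v = (0, 2, -2 * v)                        # partial derivative of n with respect to v
    w_v = 2 * v                                 # partial derivative of w with respect to v
    N_v = tuple((dn * w - c * w_v) / (w * w) for c, dn in zip(n, n_v))
    kappa2 = Fr(-2) / (w * w)
    return tuple(kappa2 * a + b for a, b in zip(r_v, N_v))

residuals = [rodrigues_residual(Fr(k, 3)) for k in range(-6, 7)]
```

Every residual is the zero vector, so Rodrigues’s formula holds at each sampled point of the line of curvature.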

Example 2.8.3

For the surface given in example 2.8.1, calculate the third fundamental form III and verify that
for this surface
III − 2µ II + K I = 0.

Solution. The third fundamental form is defined in Exercise 2.9.8 at the end of Chapter 2 to be

III = dN̂ · dN̂.

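From example 2.8.1 we have

\[ \hat{\mathbf{N}} = \frac{1}{1 + u^2 + v^2}\,(-2u,\ 2v,\ 1 - u^2 - v^2). \]
Therefore
\[ d\hat{\mathbf{N}} = \frac{\partial\hat{\mathbf{N}}}{\partial u}\,du + \frac{\partial\hat{\mathbf{N}}}{\partial v}\,dv
 = \frac{1}{(1 + u^2 + v^2)^2}\left[(-2 + 2u^2 - 2v^2,\ -4uv,\ -4u)\,du + (4uv,\ 2 + 2u^2 - 2v^2,\ -4v)\,dv\right] \]

and so we get
\[ III = \frac{1}{(1 + u^2 + v^2)^4}\,(A\,du^2 + B\,dv^2) \]
where

\[ A = (-2 + 2u^2 - 2v^2)^2 + 16u^2v^2 + 16u^2 = 4(1 + u^2 + v^2)^2, \]
\[ B = (2 + 2u^2 - 2v^2)^2 + 16u^2v^2 + 16v^2 = 4(1 + u^2 + v^2)^2. \]

(There is no du dv term because the two vectors multiplying du and dv are orthogonal.)
Hence
\[ III = \frac{4}{(1 + u^2 + v^2)^2}\,(du^2 + dv^2) \]
and from example 2.8.1

\[ I = (1 + u^2 + v^2)^2\,(du^2 + dv^2), \qquad II = 2(du^2 - dv^2). \]

Also
\[ \mu = \frac{\kappa_1 + \kappa_2}{2} = 0, \qquad K = \kappa_1\kappa_2 = -\frac{4}{(1 + u^2 + v^2)^4}. \]

Hence

\[ III - 2\mu\,II + K\,I = \frac{4}{(1 + u^2 + v^2)^2}\,(du^2 + dv^2) - \frac{4}{(1 + u^2 + v^2)^4}\,(1 + u^2 + v^2)^2\,(du^2 + dv^2) = 0 \]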

as required.

Example 2.8.4

Use Euler’s Theorem to calculate the normal curvature in the direction (1, 1) at each point of
the surface in example 2.8.1. Verify your answer by direct calculation from equation (2.50).

Solution. In the solution to example 2.8.1 we showed that the parametric curves were lines of
curvature, and since we also showed that there were no umbilic points on this surface, it follows
that (see page 59) the lines of curvature are orthogonal. Hence the direction (du, dv) = (1, 1),
or du = dv, bisects the angle between the parametric lines, and therefore makes an angle of
45 degrees with both parametric curves. By Euler’s Theorem the normal curvature κn in this
direction is therefore

\[ \kappa_n = \kappa_{(1)}\cos^2(45^\circ) + \kappa_{(2)}\sin^2(45^\circ)
 = \frac{1}{2}\cdot\frac{2}{(1 + u^2 + v^2)^2} + \frac{1}{2}\left(-\frac{2}{(1 + u^2 + v^2)^2}\right) = 0 \]

where we have used the values for the principal curvatures calculated in example 2.8.1. We also
have from example 2.8.1 that L = 2, M = 0 and N = −2. Since du = dv it follows immediately
from equation (2.50) that κn = 0 for the direction du = dv.

Example 2.8.5

Calculate the Christoffel symbols of both kinds for the surface in example 2.8.1

Solution.
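From example 2.8.1 we have

\[ g = [g_{\alpha\beta}] = \begin{bmatrix} (1 + u^2 + v^2)^2 & 0 \\ 0 & (1 + u^2 + v^2)^2 \end{bmatrix}. \]

Therefore for the inverse we get

\[ g^{-1} = [g^{\alpha\beta}] = \begin{bmatrix} \dfrac{1}{(1 + u^2 + v^2)^2} & 0 \\ 0 & \dfrac{1}{(1 + u^2 + v^2)^2} \end{bmatrix}. \]

We can use either (2.79) or (2.83) to calculate the symbols of the first kind $\Gamma_{\delta\alpha\beta}$. I will use (2.83) and you can check the
answers with equation (2.79).

\[ \begin{aligned}
\Gamma_{111} &= \tfrac{1}{2}(g_{11,1} + g_{11,1} - g_{11,1}) = 2u(1 + u^2 + v^2), \\
\Gamma_{211} &= \tfrac{1}{2}(g_{21,1} + g_{12,1} - g_{11,2}) = -2v(1 + u^2 + v^2), \\
\Gamma_{121} &= \tfrac{1}{2}(g_{12,1} + g_{11,2} - g_{21,1}) = 2v(1 + u^2 + v^2), \\
\Gamma_{221} &= \tfrac{1}{2}(g_{22,1} + g_{12,2} - g_{21,2}) = 2u(1 + u^2 + v^2), \\
\Gamma_{112} &= \tfrac{1}{2}(g_{11,2} + g_{21,1} - g_{12,1}) = 2v(1 + u^2 + v^2), \\
\Gamma_{212} &= \tfrac{1}{2}(g_{21,2} + g_{22,1} - g_{12,2}) = 2u(1 + u^2 + v^2), \\
\Gamma_{122} &= \tfrac{1}{2}(g_{12,2} + g_{21,2} - g_{22,1}) = -2u(1 + u^2 + v^2), \\
\Gamma_{222} &= \tfrac{1}{2}(g_{22,2} + g_{22,2} - g_{22,2}) = 2v(1 + u^2 + v^2).
\end{aligned} \]

For the Christoffel symbols of the second kind we get from equation (2.84)

\[ \Gamma^\gamma_{\alpha\beta} = g^{\gamma 1}\,\Gamma_{1\alpha\beta} + g^{\gamma 2}\,\Gamma_{2\alpha\beta}. \]

Therefore

\[ \begin{aligned}
\Gamma^1_{11} &= \frac{2u}{1 + u^2 + v^2}, & \Gamma^1_{21} &= \frac{2v}{1 + u^2 + v^2}, & \Gamma^1_{12} &= \frac{2v}{1 + u^2 + v^2}, & \Gamma^1_{22} &= -\frac{2u}{1 + u^2 + v^2}, \\
\Gamma^2_{11} &= -\frac{2v}{1 + u^2 + v^2}, & \Gamma^2_{21} &= \frac{2u}{1 + u^2 + v^2}, & \Gamma^2_{12} &= \frac{2u}{1 + u^2 + v^2}, & \Gamma^2_{22} &= \frac{2v}{1 + u^2 + v^2}.
\end{aligned} \]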

Example 2.8.6

Use the results from the above Examples to verify that Gauss’s equation (2.85) is valid for the
surface in example 2.8.1

Solution. Gauss’s equation is


r,αβ = Γγαβ r,γ + bαβ N̂.

We have now calculated everything appearing in this equation so it is simply a matter of plugging
these quantities into the equation and checking that the left hand side equals the right hand side.
I will do it for α = 1, β = 1 and leave the rest to you (That is, if you think you still need some
practice!). For α = 1, β = 1 the equation is

r,11 = Γ111 r,1 + Γ211 r,2 + b11 N̂.

Now from example 2.8.1

r,1 = (1 − u2 + v 2 , 2uv, 2u),


r,2 = (2uv, 1 − v 2 + u2 , −2v),
r,11 = (−2u, 2v, 2)

and

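\[ b_{11} = L = 2, \qquad \hat{\mathbf{N}} = \frac{1}{1 + u^2 + v^2}\,(-2u,\ 2v,\ 1 - u^2 - v^2). \]

Hence

\[ \begin{aligned}
\Gamma^1_{11}\,\mathbf{r}_{,1} &= \frac{2u}{1 + u^2 + v^2}\,(1 - u^2 + v^2,\ 2uv,\ 2u), \\
\Gamma^2_{11}\,\mathbf{r}_{,2} &= \frac{-2v}{1 + u^2 + v^2}\,(2uv,\ 1 - v^2 + u^2,\ -2v), \\
b_{11}\,\hat{\mathbf{N}} &= \frac{2}{1 + u^2 + v^2}\,(-2u,\ 2v,\ 1 - u^2 - v^2).
\end{aligned} \]

The sum of the above vectors gives

\[ (-2u,\ 2v,\ 2) = \mathbf{r}_{,11} \]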

as required.
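
If you would rather let a machine do the remaining cases, the sketch below checks Gauss’s equation for all four index pairs (α, β) at a sample point, using exact rational arithmetic. The function name `gauss_rhs`, the nested-list layout and the sample point are assumptions made for this illustration; the Christoffel symbols are the values found in example 2.8.5.

```python
from fractions import Fraction as Fr

def gauss_rhs(u, v):
    """Right hand side Gamma^g_ab r_,g + b_ab N of (2.85) for every pair (a, b)."""
    w = 1 + u * u + v * v
    r1 = (1 - u * u + v * v, 2 * u * v, 2 * u)
    r2 = (2 * u * v, 1 - v * v + u * u, -2 * v)
    n = tuple(c / w for c in (-2 * u, 2 * v, 1 - u * u - v * v))
    # Second-kind symbols: G[c][a][b] stands for Gamma^(c+1)_(a+1)(b+1).
    G = [[[2 * u / w, 2 * v / w], [2 * v / w, -2 * u / w]],
         [[-2 * v / w, 2 * u / w], [2 * u / w, 2 * v / w]]]
    b = [[2, 0], [0, -2]]                       # L, M, N from example 2.8.1
    return {(a, c): tuple(G[0][a][c] * x + G[1][a][c] * y + b[a][c] * z
                          for x, y, z in zip(r1, r2, n))
            for a in range(2) for c in range(2)}

u, v = Fr(2, 3), Fr(-1, 5)                      # hypothetical sample point
second_derivs = {(0, 0): (-2 * u, 2 * v, 2), (0, 1): (2 * v, 2 * u, 0),
                 (1, 0): (2 * v, 2 * u, 0), (1, 1): (2 * u, -2 * v, -2)}
rhs = gauss_rhs(u, v)
```

The dictionary `rhs` agrees entry by entry with the second derivatives of r from example 2.8.1, which is Gauss’s equation at that point. A check at one point is of course not a proof, but it catches sign slips quickly.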

Example 2.8.7

Verify that Weingarten’s equations (2.88) hold for the surface in example 2.8.1

Solution. Weingarten’s equations are

N̂,α = −g βγ bγα r,β .

I will verify it for α = 1 and you can verify it for α = 2.


For α = 1 we get

N̂,1 = −g 1γ bγ1 r,1 − g 2γ bγ1 r,2


= −g 11 b11 r,1 − g 12 b21 r,1 − g 21 b11 r,2 − g 22 b21 r,2 .

Since g 12 = g 21 = b21 = 0 the above equation reduces to

N̂,1 = −g 11 b11 r,1 .

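From example 2.8.3 we get

\[ \hat{\mathbf{N}}_{,1} = \frac{1}{(1 + u^2 + v^2)^2}\,(-2 + 2u^2 - 2v^2,\ -4uv,\ -4u)
 = \frac{-2}{(1 + u^2 + v^2)^2}\,(1 - u^2 + v^2,\ 2uv,\ 2u). \]

Since
\[ b_{11} = 2, \qquad g^{11} = \frac{1}{(1 + u^2 + v^2)^2} \]
and
\[ \mathbf{r}_{,1} = (1 - u^2 + v^2,\ 2uv,\ 2u) \]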

the result follows.



Example 2.8.8

Derive Rodrigues’s Formula using Weingarten’s equations.

Solution. Multiply both sides of Weingarten’s equations by τ α to get (remember we are using
the summation convention!)

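\[ \hat{\mathbf{N}}_{,\alpha}\,\tau^\alpha = -g^{\beta\gamma}\,b_{\gamma\alpha}\,\mathbf{r}_{,\beta}\,\tau^\alpha = -g^{\beta\gamma}\,b_{\gamma\alpha}\,\tau^\alpha\,\mathbf{r}_{,\beta}. \qquad (1) \]

But for a principal direction $\tau^\alpha = du^\alpha$ and for the principal curvature $\kappa_{(p)}$ in this direction we
get from equation (2.62)
\[ b_{\alpha\beta}\,\tau^\beta = \kappa_{(p)}\,g_{\alpha\beta}\,\tau^\beta \]

or by changing the free index and the dummy indices

\[ b_{\gamma\alpha}\,\tau^\alpha = \kappa_{(p)}\,g_{\gamma\alpha}\,\tau^\alpha. \]

Substituting into equation (1) above gives

\[ \hat{\mathbf{N}}_{,\alpha}\,\tau^\alpha = -g^{\beta\gamma}\,\kappa_{(p)}\,g_{\gamma\alpha}\,\tau^\alpha\,\mathbf{r}_{,\beta}
 = -\kappa_{(p)}\,\delta^\beta_\alpha\,\tau^\alpha\,\mathbf{r}_{,\beta}
 = -\kappa_{(p)}\,\tau^\alpha\,\mathbf{r}_{,\alpha}. \]

Hence, replacing the comma notation with the partial derivatives and $\tau^\alpha$ by $du^\alpha$ gives

\[ \frac{\partial\hat{\mathbf{N}}}{\partial u^\alpha}\,du^\alpha = -\kappa_{(p)}\,\frac{\partial\mathbf{r}}{\partial u^\alpha}\,du^\alpha \]

or
\[ \frac{\partial\hat{\mathbf{N}}}{\partial u^\alpha}\,du^\alpha + \kappa_{(p)}\,\frac{\partial\mathbf{r}}{\partial u^\alpha}\,du^\alpha = 0. \]
But
\[ d\hat{\mathbf{N}} = \frac{\partial\hat{\mathbf{N}}}{\partial u^\alpha}\,du^\alpha \quad\text{and}\quad d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u^\alpha}\,du^\alpha \]
so we get
\[ d\hat{\mathbf{N}} + \kappa_{(p)}\,d\mathbf{r} = 0 \]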

when dr is a principal direction.



Example 2.8.9

Use Weingarten’s equations to calculate the third fundamental form III = dN̂ · dN̂ and hence
show that
III − 2µII + KI = 0.

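Solution. We have

\[ d\hat{\mathbf{N}}\cdot d\hat{\mathbf{N}} = \left(\frac{\partial\hat{\mathbf{N}}}{\partial u^\alpha}\,du^\alpha\right)\cdot\left(\frac{\partial\hat{\mathbf{N}}}{\partial u^\beta}\,du^\beta\right)
 = \hat{\mathbf{N}}_{,\alpha}\cdot\hat{\mathbf{N}}_{,\beta}\,du^\alpha du^\beta. \]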

By Weingarten’s equations we get

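\[ \hat{\mathbf{N}}_{,\alpha} = -g^{\eta\xi}\,b_{\xi\alpha}\,\mathbf{r}_{,\eta}, \qquad \hat{\mathbf{N}}_{,\beta} = -g^{\sigma\gamma}\,b_{\gamma\beta}\,\mathbf{r}_{,\sigma}. \]

Hence

\[ \begin{aligned}
\hat{\mathbf{N}}_{,\alpha}\cdot\hat{\mathbf{N}}_{,\beta} &= g^{\eta\xi}\,g^{\sigma\gamma}\,b_{\xi\alpha}\,b_{\gamma\beta}\,\mathbf{r}_{,\eta}\cdot\mathbf{r}_{,\sigma} \\
 &= g^{\eta\xi}\,g^{\sigma\gamma}\,b_{\xi\alpha}\,b_{\gamma\beta}\,g_{\eta\sigma} \\
 &= g^{\eta\xi}\,\delta^\gamma_\eta\,b_{\xi\alpha}\,b_{\gamma\beta} \\
 &= g^{\eta\xi}\,b_{\xi\alpha}\,b_{\eta\beta}.
\end{aligned} \]

Hence
\[ III = g^{\eta\xi}\,b_{\xi\alpha}\,b_{\eta\beta}\,du^\alpha du^\beta \]

and if we let $a_{\alpha\beta}$ be the coefficients in III then $III = a_{\alpha\beta}\,du^\alpha du^\beta$ and

\[ a_{\alpha\beta} = g^{\eta\xi}\,b_{\xi\alpha}\,b_{\eta\beta}. \]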

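Next we express everything in terms of the functions E, F , G, L, M and N. We have

\[ b = [b_{\alpha\beta}] = \begin{bmatrix} L & M \\ M & N \end{bmatrix}, \qquad
   g = [g_{\alpha\beta}] = \begin{bmatrix} E & F \\ F & G \end{bmatrix}. \]

Hence
\[ g^{-1} = [g^{\alpha\beta}] = \frac{1}{EG - F^2}\begin{bmatrix} G & -F \\ -F & E \end{bmatrix}. \]

Hence

\[ \begin{aligned}
a_{11} &= g^{11}b_{11}b_{11} + g^{12}b_{21}b_{11} + g^{21}b_{11}b_{21} + g^{22}b_{21}b_{21}
        = \frac{1}{EG - F^2}\left(GL^2 - 2FML + EM^2\right), \\
a_{21} = a_{12} &= g^{11}b_{11}b_{12} + g^{12}b_{21}b_{12} + g^{21}b_{11}b_{22} + g^{22}b_{21}b_{22}
        = \frac{1}{EG - F^2}\left(GLM - FM^2 - FLN + EMN\right), \\
a_{22} &= g^{11}b_{12}b_{12} + g^{21}b_{12}b_{22} + g^{12}b_{22}b_{12} + g^{22}b_{22}b_{22}
        = \frac{1}{EG - F^2}\left(GM^2 - 2FMN + EN^2\right).
\end{aligned} \]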
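Let
\[ 2\mu\,II - K\,I = c_{\alpha\beta}\,du^\alpha du^\beta. \]
We must show that $a_{\alpha\beta} = c_{\alpha\beta}$. By equations (2.65) and (2.66) we have
\[ 2\mu = \frac{EN + GL - 2FM}{EG - F^2}, \qquad K = \frac{LN - M^2}{EG - F^2}. \]
Hence
\[ c_{\alpha\beta} = \frac{EN + GL - 2FM}{EG - F^2}\,b_{\alpha\beta} - \frac{LN - M^2}{EG - F^2}\,g_{\alpha\beta}. \]
Therefore
\[ \begin{aligned}
c_{11} &= \frac{GL^2 - 2FML + EM^2}{EG - F^2} = a_{11}, \\
c_{12} = c_{21} &= \frac{ENM + MGL - FM^2 - FLN}{EG - F^2} = a_{12} = a_{21}, \\
c_{22} &= \frac{EN^2 - 2FMN + GM^2}{EG - F^2} = a_{22}.
\end{aligned} \]
Therefore
III − 2µ II + K I = 0.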
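
The identity just proved holds for any admissible coefficients, so it can be spot-checked with exact arithmetic. The sketch below is only an illustration; the function names `third_form_coeffs` and `identity_residual` and the sample coefficients are assumptions.

```python
from fractions import Fraction as Fr

def third_form_coeffs(E, F, G, L, M, N):
    """a_ab = g^{eta xi} b_{xi a} b_{eta b}, the coefficients of III."""
    det = E * G - F * F
    ginv = [[G / det, -F / det], [-F / det, E / det]]
    b = [[L, M], [M, N]]
    return [[sum(ginv[h][x] * b[x][a] * b[h][c] for h in range(2) for x in range(2))
             for c in range(2)] for a in range(2)]

def identity_residual(E, F, G, L, M, N):
    """Coefficients of III - 2*mu*II + K*I; each of them should vanish."""
    det = E * G - F * F
    two_mu = (E * N + G * L - 2 * F * M) / det      # from (2.65)
    K = (L * N - M * M) / det                       # from (2.66)
    g, b = [[E, F], [F, G]], [[L, M], [M, N]]
    a = third_form_coeffs(E, F, G, L, M, N)
    return [[a[i][j] - two_mu * b[i][j] + K * g[i][j] for j in range(2)]
            for i in range(2)]

# Hypothetical coefficients with EG - F^2 > 0.
res = identity_residual(Fr(3), Fr(1), Fr(2), Fr(5), Fr(-1), Fr(4))
```

All four entries of `res` are exactly zero. With the values of example 2.8.1 at u = v = 0 (E = G = 1, F = 0, L = 2, M = 0, N = −2) the same function gives zero as well.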

Example 2.8.10

Show that the torsions of the asymptotic lines through a point P on a surface are ±(−K)1/2 ,
where K is the Gaussian curvature of the surface at P .
(Note that asymptotic lines are curves on the surface which satisfy the equation

L (du1)2 + 2M du1 du2 + N (du2 )2 = 0,

so at each point on the curve the tangent to the curve points in an asymptotic direction at that
point.)


Solution. On an asymptotic line the second fundamental form II = 0. Therefore by the result
proved in example 2.8.9 we see that on an asymptotic line the Gaussian curvature K is given by
\[ K = -\frac{III}{I}. \]
We also get from equation (2.50) that the normal curvature κn = 0 along an asymptotic line.
Hence by equation (2.48) we have r′′ · N̂ = 0, so r′′ is perpendicular to N̂, and since (by a Frenet-
Serret equation) r′′ is parallel to the unit normal vector n to the asymptotic line, it follows that
n is perpendicular to N̂. So both n and t, the tangent vector to the asymptotic line, lie in the
tangent plane to the surface. Hence the unit binormal vector b to the asymptotic line is parallel
to N̂, so N̂ = ±b. Hence the Frenet-Serret equation
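\[ \frac{d\mathbf{b}}{ds} = -\tau\,\mathbf{n} \]
becomes
\[ \frac{d\hat{\mathbf{N}}}{ds} = \mp\tau\,\mathbf{n} \]
along an asymptotic line. Taking the dot product of each side of this equation with itself gives

\[ \frac{d\hat{\mathbf{N}}\cdot d\hat{\mathbf{N}}}{(ds)^2} = \tau^2. \]
But $d\hat{\mathbf{N}}\cdot d\hat{\mathbf{N}} = III$ and $(ds)^2 = I$, so
\[ \tau^2 = \frac{III}{I} = -K \]
and we get
\[ \tau = \pm(-K)^{1/2} \]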
on an asymptotic line, as required.

2.9 Exercises

2.9.1. Express the following parametric surfaces in the form F (x, y, z) = 0.

(a) Ellipsoid: $x = a \sin u^1 \cos u^2$, $y = b \sin u^1 \sin u^2$, $z = c \cos u^1$.
(b) Hyperboloid of 2 sheets: $x = a \sinh u^1 \cos u^2$, $y = b \sinh u^1 \sin u^2$, $z = c \cosh u^1$.
(c) Cone: $x = a \sinh u^1 \sinh u^2$, $y = b \sinh u^1 \cosh u^2$, $z = c \sinh u^1$.
(d) Elliptic Paraboloid: $x = a u^1 \cos u^2$, $y = b u^1 \sin u^2$, $z = (u^1)^2$.
(e) Hyperbolic Paraboloid: $x = a u^1 \cosh u^2$, $y = b u^1 \sinh u^2$, $z = (u^1)^2$.

2.9.2 Show that the surface in question 1(e) can be given in the form

$$x = a(u^1 + u^2), \qquad y = b(u^1 - u^2), \qquad z = 4u^1 u^2.$$


2.9.3 Given a surface z = f (x, y), find the first fundamental form and N̂.
[Let the parameters u1 , u2 be x, y].
Repeat the problem for the surface F (x, y, z) = 0.

2.9.4 Find the element of area for the surface z = f (x, y).

2.9.5 What is the second fundamental form for z = f (x, y)?

2.9.6 Prove the equivalence of equations (2.49) and (2.54).

2.9.7 Show that (2.64) follows from (2.63).

2.9.8 We define the so-called third fundamental form III by III = dN̂ · dN̂. Show that

$$\mathrm{III} - 2\mu\, \mathrm{II} + K\, \mathrm{I} = 0,$$

where µ is the mean curvature and K the Gaussian curvature. [Because III is defined in
terms of vectors, it is parameter-invariant. This means that we can make a special choice
of parameters $u^1$, $u^2$ in order to prove the identity. We choose the parameter curves to
be the lines of curvature, so that $u^1$, $u^2$ run along the principal directions at each point
of the surface, which means that we can use Rodrigues's formula.]

2.9.9 Compute the Christoffel symbols of both kinds for polar coordinates in the plane.


Chapter 3

Tensor analysis

3.1 Introduction

To work effectively in Newtonian theory, one really needs the formalism of vectors. This formalism
helps solve certain problems readily and reveals structure and thereby insight into the problem.
In exactly the same way, in relativity theory, one needs the formalism of tensors. The approach
we adopt concentrates on the techniques of tensors without taking into account the deeper
geometrical significance behind the theory. That is, we shall be more concerned with what
we do with tensors rather than what tensors really are.

Manifolds and coordinates

We shall start by working with tensors defined in N dimensions. A tensor is an object which is
defined on a geometric entity called a manifold. In simple terms a manifold is something which
'locally' looks like an N-dimensional Euclidean space $\mathbb{R}^N$. For example, compare a 2-sphere
with the Euclidean plane. They are clearly different. But a small bit of the 2-sphere looks very much
like a bit of the Euclidean plane. The fact that the 2-sphere is compact whereas the Euclidean
plane goes on to infinity is a global property. The property that manifolds are locally Euclidean
means that there is a homeomorphism (i.e., a continuous one-to-one mapping with a continuous
inverse) which carries a small region of the manifold onto a small region of $\mathbb{R}^N$. If we choose a
Cartesian coordinate system in $\mathbb{R}^N$, it follows that each element of the small region of the manifold
can be associated with N real numbers $x^i$ $(i = 1, 2, \cdots, N)$, which are the coordinates of the
corresponding point in $\mathbb{R}^N$. We call $x^i$ the local coordinates of the relevant element of the
manifold. The local coordinates are indeed local, in the sense that the homeomorphism which carries
one small region of the manifold to $\mathbb{R}^N$ does not necessarily carry the whole manifold into
$\mathbb{R}^N$. In general each small region has its own homeomorphism. It follows that where two small
regions overlap, the elements in the intersection will have two sets of local coordinates, and these
two sets of numbers can always be expressed in terms of each other by $\mathbb{R}^N \to \mathbb{R}^N$
coordinate transformation functions:
$$\bar{x}^i = \bar{x}^i(x^j). \tag{3.1}$$
The elements of the manifold shall always be regarded as points. From the implicit function
theorem we can obtain the inverse transformation equations corresponding to (3.1) provided
that the Jacobian of the transformation is non-zero, that is,

$$\det\left(\frac{\partial \bar{x}^i}{\partial x^j}\right) \neq 0. \tag{3.2}$$

This condition shall be assumed true for all coordinate transformations considered here.
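
As a concrete illustration of condition (3.2), the following minimal Python sketch evaluates the Jacobian determinant for the change from plane polar coordinates (r, θ) to Cartesian coordinates (x, y). The choice of this particular transformation and the function names are assumptions made purely for illustration; they are not part of the notation of this guide.

```python
import math


def polar_to_cartesian(r, theta):
    """Hypothetical example transformation: (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian_det(r, theta):
    """Determinant of d(x, y)/d(r, theta), using the analytic partial derivatives."""
    dx_dr, dx_dtheta = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dtheta = math.sin(theta), r * math.cos(theta)
    return dx_dr * dy_dtheta - dx_dtheta * dy_dr


if __name__ == "__main__":
    for r in (0.0, 0.5, 2.0):
        print(r, jacobian_det(r, 0.7))
```

The determinant works out to r, so under these assumptions the transformation satisfies (3.2) everywhere except at the origin r = 0, where polar coordinates cease to be a valid coordinate system.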
We shall really only consider differentiable manifolds. These are spaces that are continuous
and differentiable. Roughly speaking, this means that it is possible to define a scalar field φ at
each point of the manifold and be sure that it can be differentiated everywhere. For example
the surface of a sphere is differentiable everywhere. That of a cone is differentiable everywhere
except at the apex. Nearly all manifolds used in physics are differentiable almost everywhere.

Curves and surfaces

Given a manifold, we shall be concerned with points on it and subsets of points which define
curves and surfaces of different dimensions. These curves and surfaces are frequently defined
parametrically. Thus since a curve has one degree of freedom it depends on one parameter and
so we define a curve by the parametric equations

$$C : x^i = x^i(u), \qquad i = 1, 2, \cdots, N,$$

where u is the parameter and $x^1(u), x^2(u), \cdots, x^N(u)$ are functions of u. Similarly surfaces of m
dimensions shall be defined in terms of m parameters. In this manner all the properties of curves
and surfaces dealt with previously are valid.

3.2 Contravariant and covariant tensors

There are two distinct approaches to the teaching of tensors: the abstract or index-free
(coordinate-free) approach and the conventional approach based on indices. The main advantage
of the index-free approach is that it offers deeper geometrical insight. However, it has two
disadvantages. First, it requires much more mathematical background, which in turn takes
time to develop. Secondly, for all its elegance, when one wants to do a real calculation with
tensors then recourse has to be made to indices. We shall adopt the more conventional index
approach, because it proves faster and more practical for this module.

Covariant property of tensors. The essential point about tensors is that when we make a
statement we wish it to hold not just in one coordinate system but in all coordinate systems.
Consequently we need to find out how quantities behave when we go from one coordinate system
to another. The approach we are going to adopt is to define a tensor in terms of transformation
properties that keep it covariant under a coordinate transformation (3.1).

[Figure: the displacement from the point P, with coordinates $x^i$, to the point P′, with coordinates $x^i + dx^i$.]

Figure 3.1: The infinitesimal vector attached to P.

Scalar quantities. A scalar quantity, or in short, a scalar, is a quantity which can be measured
with a “scale”. It is a number and does not depend on the choice of coordinates, that is, V is a
scalar quantity if
V̄ = V. (3.3)

The scalar V may be defined at all points in a region in which case it defines an invariant field:

V̄ (x̄i ) = V (xi ). (3.4)

Also of importance are so-called scalar densities or invariant densities. A scalar density L of
weight p is an object which transforms according to the law

$$\bar{L} = J^p L, \tag{3.5}$$

where J is the Jacobian. As the name suggests, a scalar density is a scalar field per unit volume.

Contravariant vectors. Consider an infinitesimal displacement in space from a point P with
coordinates $x^i$ to another point P′ with coordinates $x^i + dx^i$. The two points define an
infinitesimal displacement vector whose components are $dx^i$. The vector is not to be regarded as
free, but as being attached to the point P (see Fig. 3.1). In another coordinate system the
infinitesimal displacement vector will be from a point $\bar{x}^i$ to a point $\bar{x}^i + d\bar{x}^i$.

Exercise: Show that the components of the displacement vector are related by

$$d\bar{x}^i = \frac{\partial \bar{x}^i}{\partial x^j}\, dx^j, \tag{3.6}$$
where here and throughout we adopt the summation convention.

Now taking the transformation of the components of the displacement vector (3.6) as a
prototype, we shall say that any set of N quantities $A^i$ $(i = 1, 2, \cdots, N)$ which transform
according to (3.6), namely,
$$\bar{A}^i = \frac{\partial \bar{x}^i}{\partial x^j}\, A^j, \tag{3.7}$$
form the components of a contravariant vector.

Transformation rule for contravariant vectors. The form of the transformation equation
(3.7) for components of the contravariant vector should be studied carefully. It will be observed
that the dummy summation index j occurs once as a superscript and once as a subscript (i.e.,
in the denominator of the partial derivative). The free index i occurs as a superscript and is
associated with the “barred” symbol on both sides of the equation.

The infinitesimal vector is a special case of (3.7) where the components $A^i$ are given by $dx^i$. A
contravariant vector may be defined at one point only. However, if it is defined at every point of
a region, so that the $A^i$ are functions of $x^k$, then a contravariant vector field is said to exist in
the region.
An example of a vector with finite components is provided by the tangent vector
$$A^i = \frac{dx^i}{du}, \tag{3.8}$$
to the curve $C : x^i = x^i(u)$ (see Fig. 3.2).

[Figure: a curve with the tangent vectors $[dx^i/du]_P$ and $[dx^i/du]_R$ drawn at two points P and R.]

Figure 3.2: The tangent vector at two points of a curve $C : x^i = x^i(u)$

Covariant vectors. Consider a scalar field V. Its transformation property is given by (3.4).
Now the N derivatives $\partial V/\partial x^i$ in the x-frame are related to the N derivatives
$\partial \bar{V}/\partial \bar{x}^i$ in the $\bar{x}$-frame by
$$\frac{\partial \bar{V}}{\partial \bar{x}^i} = \frac{\partial x^j}{\partial \bar{x}^i}\, \frac{\partial V}{\partial x^j}. \tag{3.9}$$
It is well known that $\partial V/\partial x^j$ are components of a vector called the gradient of V and
equation (3.9) is the transformation of the gradient vector. Using this as a prototype we now call any
set of quantities $B_i$ that transforms as (3.9), namely,
$$\bar{B}_i = \frac{\partial x^j}{\partial \bar{x}^i}\, B_j, \tag{3.10}$$
a covariant vector. Covariant vectors will be distinguished from contravariant vectors by writing
their components with subscripts instead of superscripts.

Transformation rule for Covariant vectors. From the form of the transformation
equation (3.10) we observe that the dummy summation index j occurs once as a superscript
and once as a subscript. The free index i occurs as a subscript and is associated with the
“barred” symbol on both sides of the equation.
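
The two transformation rules (3.7) and (3.10) can be tried out numerically. The sketch below assumes, purely for illustration, that the unbarred frame is Cartesian (x, y) and the barred frame is plane polar (r, θ); the function names and the sample components are hypothetical. It transforms a contravariant vector with ∂x̄/∂x and a covariant vector with ∂x/∂x̄, and compares the contraction of the two before and after the change of frame.

```python
import math


def jacobians(r, theta):
    """Partial-derivative matrices at the point (r, theta).

    fwd[i][j] = d xbar^i / d x^j   with xbar = (r, theta), x = (x, y)
    inv[j][i] = d x^j / d xbar^i
    """
    c, s = math.cos(theta), math.sin(theta)
    fwd = [[c, s], [-s / r, c / r]]
    inv = [[c, -r * s], [s, r * c]]
    return fwd, inv


def transform_contravariant(A, fwd):
    """Rule (3.7): Abar^i = (d xbar^i / d x^j) A^j."""
    return [sum(fwd[i][j] * A[j] for j in range(2)) for i in range(2)]


def transform_covariant(B, inv):
    """Rule (3.10): Bbar_i = (d x^j / d xbar^i) B_j."""
    return [sum(inv[j][i] * B[j] for j in range(2)) for i in range(2)]


def contract(A, B):
    """The sum A^i B_i."""
    return sum(a * b for a, b in zip(A, B))


if __name__ == "__main__":
    fwd, inv = jacobians(2.0, 0.6)
    A, B = [1.0, -3.0], [0.5, 2.0]  # arbitrary sample components
    A_bar = transform_contravariant(A, fwd)
    B_bar = transform_covariant(B, inv)
    print(contract(A, B), contract(A_bar, B_bar))
```

The two printed numbers agree up to rounding, because the two Jacobian matrices are inverse to each other; this is why the superscript and subscript rules are paired as they are.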


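Tensors. A generalization of the above transformation laws for vectors can be carried out as
follows. Suppose that $A^i$ and $B^j$ are two contravariant vectors. The $N^2$ quantities $A^i B^j$ are
taken as components of a contravariant second rank tensor. Its transformation properties are
$$\bar{A}^i \bar{B}^j = \frac{\partial \bar{x}^i}{\partial x^k}\, \frac{\partial \bar{x}^j}{\partial x^l}\, A^k B^l. \tag{3.11}$$
Any set of $N^2$ quantities $C^{ij}$ transforming according to (3.11), i.e.,
$$\bar{C}^{ij} = \frac{\partial \bar{x}^i}{\partial x^k}\, \frac{\partial \bar{x}^j}{\partial x^l}\, C^{kl}, \tag{3.12}$$
is a contravariant tensor of second rank. In a similar manner we can show that the $N^2$ quantities
$A_{ij}$ define a covariant tensor of second rank if they transform as

$$\bar{A}_{ij} = \frac{\partial x^k}{\partial \bar{x}^i}\, \frac{\partial x^l}{\partial \bar{x}^j}\, A_{kl}, \tag{3.13}$$
and the $N^2$ quantities $B^i{}_j$ define a mixed tensor of second rank if they transform as

$$\bar{B}^i{}_j = \frac{\partial \bar{x}^i}{\partial x^k}\, \frac{\partial x^l}{\partial \bar{x}^j}\, B^k{}_l. \tag{3.14}$$
A further generalization of tensors to higher ranks should now be an obvious step. The
transformation property of a generalized tensor $T^{i \cdots j}{}_{k \cdots l}$ is given by
$$\bar{T}^{i \cdots j}{}_{k \cdots l} = \frac{\partial \bar{x}^i}{\partial x^p} \cdots \frac{\partial \bar{x}^j}{\partial x^q}\, \frac{\partial x^r}{\partial \bar{x}^k} \cdots \frac{\partial x^s}{\partial \bar{x}^l}\, T^{p \cdots q}{}_{r \cdots s}. \tag{3.15}$$
If a mixed tensor has contravariant rank p and covariant rank q then it is said to be of type or
valence (p, q). For example, the transformation property of a type (1,2) tensor $A^i{}_{kl}$ is
$$\bar{A}^i{}_{kl} = \frac{\partial \bar{x}^i}{\partial x^p}\, \frac{\partial x^r}{\partial \bar{x}^k}\, \frac{\partial x^s}{\partial \bar{x}^l}\, A^p{}_{rs}. \tag{3.16}$$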
The following properties of tensors are easy to deduce:

Equality. Two tensors are equal if their components are equal.

Addition. New tensors may be formed from known tensors by addition or subtraction.
However, only tensors of the same type may be added to yield new tensors, for example,
$C^i{}_{jk} = A^i{}_{jk} + B^i{}_{jk}$. However $A^i{}_j$ and $A^i{}_{jk}$ cannot be added.

Product. The product of a tensor and a scalar is a tensor and any two arbitrary tensors may
be multiplied to form a new tensor. However note that the multiplication process is not
commutative. For example, the type (2,3) tensor defined by

$$C^{ik}{}_{jlm} = A^i{}_j\, B^k{}_{lm},$$

is not the same as the type (2,3) tensor defined by

$$D^{ki}{}_{lmj} = B^k{}_{lm}\, A^i{}_j.$$

This can be seen by comparing, say, $C^{12}{}_{112}$ with $D^{12}{}_{112}$, which gives

$$C^{12}{}_{112} = A^1{}_1\, B^2{}_{12}, \qquad D^{12}{}_{112} = B^1{}_{11}\, A^2{}_2.$$

A special case of a product of vectors is the scalar or dot product of two vectors defined by
$A \cdot B = A_i B^i$. Note that the magnitude is the invariant A defined by

$$A^2 = A_i A^i. \tag{3.17}$$

Contraction. Any tensor with a repeated upper and lower index can be contracted to obtain a
tensor of lower rank. For example $A^i{}_{ij}$ implies a summation over the index i and thus defines
a covariant vector $B_j$:
$$B_j = A^i{}_{ij}.$$

Exercise: Show that $B_j$ defined by $A^i{}_{ij}$ transforms as a covariant vector.

Quotient theorem. If the product of a given set of elements and an arbitrary tensor results in a
tensor then the given set of elements are components of a tensor. It will be sufficient to prove
the theorem for a particular case, since the argument will easily be seen to be of general
application.
Exercise: Consider the product
$$A^i{}_{jk}\, B^k{}_s = C^i{}_{js}, \tag{3.18}$$

where $B^k{}_s$ and $C^i{}_{js}$ are known to be tensors, with $B^k{}_s$ arbitrary. Show that $A^i{}_{jk}$
transforms as components of a type (1,2) tensor.

Symmetry properties. A tensor is said to be symmetric with respect to two of its superscripts
(or subscripts) if the components of the tensor remain the same after the two superscripts (or
subscripts) are interchanged. For example, $A_{ijk}$ is symmetric with respect to the indices j, k
provided
$$A_{ijk} = A_{ikj}. \tag{3.19}$$

If we define
$$A_{i[jk]} = \frac{1}{2}\,(A_{ijk} - A_{ikj}),$$
then the symmetry property (3.19) may be written as

$$A_{i[jk]} = 0. \tag{3.20}$$

On the other hand if interchanging two superscripts (or subscripts) leads to a change in the sign of
the components then the tensor is said to be skew-symmetric or anti-symmetric. For example $B_{ijk}$
is skew-symmetric in the indices j, k provided

$$B_{ijk} = -B_{ikj}. \tag{3.21}$$

Here also if we define
$$B_{i(jk)} = \frac{1}{2}\,(B_{ijk} + B_{ikj}),$$
then the skew-symmetry property (3.21) may be written as

$$B_{i(jk)} = 0. \tag{3.22}$$

Example 3.2.1

Show that any arbitrary tensor $S_{ij}$ can always be decomposed into a symmetric tensor and a
skew-symmetric tensor.
Solution: It is easy to construct the identity
$$S_{ij} = \frac{1}{2}\,(S_{ij} + S_{ji}) + \frac{1}{2}\,(S_{ij} - S_{ji}) = S_{(ij)} + S_{[ij]},$$

where we put
$$S_{(ij)} = \frac{1}{2}\,(S_{ij} + S_{ji}), \qquad S_{[ij]} = \frac{1}{2}\,(S_{ij} - S_{ji}).$$
Note also that
$$S_{(ij)} = S_{(ji)}, \qquad S_{[ij]} = -S_{[ji]}.$$
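
The decomposition in Example 3.2.1 can be mirrored directly in code. The following Python sketch treats the components $S_{ij}$ as a square nested list; the function name and the sample matrix are illustrative assumptions only.

```python
def decompose(S):
    """Split S_ij into its symmetric part S_(ij) and skew-symmetric part S_[ij]."""
    n = len(S)
    sym = [[0.5 * (S[i][j] + S[j][i]) for j in range(n)] for i in range(n)]
    skew = [[0.5 * (S[i][j] - S[j][i]) for j in range(n)] for i in range(n)]
    return sym, skew


if __name__ == "__main__":
    S = [[1.0, 2.0], [4.0, 3.0]]  # arbitrary sample components
    sym, skew = decompose(S)
    print(sym)   # symmetric part
    print(skew)  # skew-symmetric part
```

Adding the two returned arrays entry by entry recovers the original components, and swapping the indices leaves the first array unchanged while reversing the sign of the second.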

3.3 The metric tensor

A differentiable manifold on which functions $g_{ij}(x^k)$ are defined to determine a measure of
distance between adjacent points on the manifold is called a differentiable Riemannian manifold.
The functions $g_{ij}(x^k)$ act as components of the metric tensor field:

$$ds^2 = g_{ij}\, dx^i dx^j. \tag{3.23}$$

Basically equation (3.23) is a relationship between all pairs of adjacent points and is called a
metrical connection, with $ds^2$ termed the Riemann metric.
By imposing conditions on the functions $g_{ij}$ it is sometimes possible to obtain Cartesian
coordinates $y^i$ so that
$$ds^2 = (dy^1)^2 + (dy^2)^2 + \cdots + (dy^N)^2, \tag{3.24}$$

and so reduce (3.23) to a Euclidean metric. However (3.23) for arbitrary functions $g_{ij}$ describes
the metric in a Riemannian space. The $N^2$ coefficients $g_{ij}$ in (3.23) are specified in some
coordinate frame at every point of the Riemannian space. It will be assumed, without loss
of generality, that the $g_{ij}$ are symmetric. The distance ds between any two adjacent points is an
invariant and the $g_{ij}$ must accordingly transform so that this shall be so. Since $dx^i dx^j$ is an
arbitrary symmetric tensor and $ds^2$ is an invariant (i.e., a tensor of zero rank), it follows from
(3.23), using the quotient theorem, that $g_{ij}$ is a tensor. We refer to $g_{ij}$ as the covariant metric
tensor. The contravariant tensor which is conjugate to $g_{ij}$, namely $g^{ij}$, is called the
contravariant metric tensor, i.e.,
$$g_{ij}\, g^{jk} = \delta_i^k, \tag{3.25}$$
where
$$g^{ij} = \frac{\text{co-factor of } g_{ij}}{\text{determinant of } g_{ij}}, \tag{3.26}$$
and this exists only if the determinant $g = |g_{ij}| \neq 0$. In (3.25) the quantity $\delta_i^k$
is the Kronecker delta symbol and it is either 0 or 1 according to the following definition
$$\delta_i^k = \begin{cases} 1 & \text{if } k = i, \\ 0 & \text{if } k \neq i. \end{cases} \tag{3.27}$$

Exercise: Confirm that δik transforms as a tensor.


Raising and lowering indices. In the case where the Riemannian space is specialized to a
Euclidean space, rectangular Cartesian coordinates can be defined and, in such a frame, $g_{ij} = \delta_{ij}$.
In the Cartesian frame, covariant and contravariant vectors are indistinguishable. For example,
Euclidean vectors may be written either as $V^i$ or as $V_i$. In Riemannian spaces one is able to switch
from covariant components to contravariant components and vice versa using the metric as follows.
The covariant components $A_i$ are related to the contravariant components $A^j$ by

$$A_i = g_{ij}\, A^j. \tag{3.28}$$

This process of converting contravariant components of a vector into covariant components is
termed lowering the index. Similarly one may raise the index with the aid of $g^{ij}$ by

$$B^i = g^{ij}\, B_j. \tag{3.29}$$

In general any index of a tensor can be lowered or raised. For example, if $A^{ij}{}_k$ is a (2,1) type
tensor, we define a (1,2) type tensor by

$$A^i{}_{jk} = g_{jr}\, A^{ir}{}_k. \tag{3.30}$$

Suppose that an index of the covariant metric tensor $g_{ij}$ is raised. The result is

$$g^k{}_j = g^{ki}\, g_{ij} = \delta^k_j.$$

If both subscripts of $g_{ij}$ are raised, the result is

$$g^{ri}\, g^{sj}\, g_{ij} = g^{ri}\, \delta_i^s = g^{rs}.$$

The notation is entirely consistent, therefore $g_{ij}$, $g^{ij}$, $\delta^i_j$ are taken to be the covariant,
contravariant and mixed components respectively of a single metric tensor.
Exercise: Show that by raising the first index in $A_i A^i$ the magnitude (3.17) can be written
as
$$A^2 = g_{ij}\, A^i A^j.$$


Example 3.3.1

Express the two dimensional metric

$$ds^2 = dx^2 + dy^2$$

in polar coordinates
$$x = r\cos\theta, \qquad y = r\sin\theta$$
and determine the covariant and contravariant polar metric tensors.

Solution. First, the corresponding differentials are

$$dx = \cos\theta\, dr - r\sin\theta\, d\theta, \qquad dy = \sin\theta\, dr + r\cos\theta\, d\theta,$$

and conversely,
$$dr = \cos\theta\, dx + \sin\theta\, dy, \qquad d\theta = -\frac{1}{r}\sin\theta\, dx + \frac{1}{r}\cos\theta\, dy.$$
It follows easily that
$$ds^2 = dr^2 + r^2\, d\theta^2.$$
In polar coordinates the covariant metric tensor is then
$$g_{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix},$$

and the inverse metric is obtained by

$$g^{ij} = \frac{\text{co-factor of } g_{ij}}{\text{determinant of } g_{ij}} = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}.$$
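
The result of Example 3.3.1 can also be reached by transforming the Cartesian metric $\delta_{kl}$ as a covariant tensor, in the manner of (3.13). The Python sketch below does this numerically at a single point and then inverts the 2×2 result by co-factors, as in (3.26). The function names are hypothetical, and the code is a check under these assumptions rather than a general-purpose routine.

```python
import math


def polar_metric(r, theta):
    """g_ij = (d x^k / d xbar^i)(d x^k / d xbar^j), with x = (x, y), xbar = (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    # J[k][i] = d x^k / d xbar^i
    J = [[c, -r * s], [s, r * c]]
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse_2x2(g):
    """Inverse of a 2x2 matrix by co-factors divided by the determinant, cf. (3.26)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


if __name__ == "__main__":
    g = polar_metric(2.0, 0.9)
    print(g)               # close to [[1, 0], [0, r^2]]
    print(inverse_2x2(g))  # close to [[1, 0], [0, 1/r^2]]
```

The co-factor inverse fails when the determinant vanishes, which for this metric happens only at r = 0.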

3.4 Covariant derivatives.

We showed in equation (3.9) that if V is an invariant field then its gradient $\partial V/\partial x^i$ is a
covariant vector. However, note that if a covariant vector is differentiated, the result is not a
tensor. For, let $A_i$ be a covariant vector field, so that

$$\bar{A}_i = \frac{\partial x^k}{\partial \bar{x}^i}\, A_k. \tag{3.31}$$
Differentiating both sides with respect to $\bar{x}^j$, we obtain

$$\bar{A}_{i,j} \equiv \frac{\partial \bar{A}_i}{\partial \bar{x}^j} = \frac{\partial x^k}{\partial \bar{x}^i}\, \frac{\partial x^l}{\partial \bar{x}^j}\, \frac{\partial A_k}{\partial x^l} + \frac{\partial^2 x^k}{\partial \bar{x}^i\, \partial \bar{x}^j}\, A_k. \tag{3.32}$$
The presence of the second term on the right-hand side of this equation reveals that $A_{i,j}$ does
not transform as a tensor. The source of the difficulty is that to define $A_{i,j}$, it is necessary to
compare the values assumed by the vector field $A_i$ at two neighbouring, but distinct, points: P with
coordinates $x^i$ and P′ with coordinates $x^i + dx^i$. We denote the components of the covariant
vector at P by $A_i = A_i(x^j)$ and at P′ by $A_i + dA_i = A_i(x^j + dx^j)$. It follows from Taylor's
expansion that, to first order, for P and P′ close enough
$$A_i(x^k + dx^k) = A_i(x^k) + \frac{\partial A_i}{\partial x^j}(x^k)\, dx^j.$$

Exercise. Show that the difference between the two vectors, namely,
$$A_i(x^k + dx^k) - A_i(x^k) = dA_i = \frac{\partial A_i}{\partial x^j}\, dx^j, \tag{3.33}$$
is not a vector. (Hint: check the transformation properties of $dA_i$.)

Parallel displacement

If the procedure for ordinary differentiation could be replaced by another, involving
the comparison of two vectors defined at the same point, the modified equation would be expected
to be a tensor equation featuring a new form of derivative which is a tensor. One way of achieving
this is based on the concept of parallel displacement. Suppose that $A_i$ is displaced from the point
P, at which it is defined, to the neighbouring point P′, without change in magnitude or direction,
so that it may be thought of as being the same vector now defined at the neighbouring point. We
denote the components of the parallel vector at P′ by $A_i + \delta A_i$, and comparing this with the
components $A_i + dA_i$ of the vector field at P′ (see Fig. 3.3) naturally leads to a new vector that
we write as
$$dA_i - \delta A_i = A_{i;j}\, dx^j, \tag{3.34}$$

where $A_{i;j}$ is the appropriate replacement for $A_{i,j}$. Since $dx^j$ is an arbitrary vector and the
left-hand side is known to be a vector, it follows by the quotient theorem that $A_{i;j}$ will be a
covariant tensor of second rank.

[Figure: the vector $A_i$ at P, and at P′ the parallel-displaced vector $A_i + \delta A_i$, the field vector $A_i + dA_i$, and their difference $dA_i - \delta A_i$.]

Figure 3.3: A vector diagram at point P′

Affinity

The problem of defining a tensor derivative has now been re-expressed as the problem of defining
infinitesimal parallel displacement of a vector. We are at liberty to define parallel displacement
of $A_i$ from P to P′ in any way we find convenient; however, our definition must conform with
the definition of parallelism in Euclidean space, as this is a special case of Riemannian space.
Consider a transformation from Riemannian coordinates $x^j$ to Euclidean Cartesian coordinates
$y^j$. Let $B_i(y^j)$ be the components at P of the Euclidean vector field and let $A_i(x^j)$ be the
components of the corresponding Riemannian vector field at the same point; then

$$A_i = \frac{\partial y^j}{\partial x^i}\, B_j, \qquad B_i = \frac{\partial x^j}{\partial y^i}\, A_j. \tag{3.35}$$

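If the vector $A_i$ is parallel displaced to the point P′, the corresponding Euclidean vector at P′
stays the same as that at P, i.e., $\delta B_i = 0$. Hence, from (3.35), we obtain
$$\delta A_i = \delta\!\left(\frac{\partial y^j}{\partial x^i}\, B_j\right) = \delta\!\left(\frac{\partial y^j}{\partial x^i}\right) B_j = \frac{\partial^2 y^j}{\partial x^i\, \partial x^k}\, dx^k\, B_j. \tag{3.36}$$

Now substituting for $B_j$ from the second equation in (3.35) gives

$$\delta A_i = \Gamma^l{}_{ik}\, A_l\, dx^k, \tag{3.37}$$

where
$$\Gamma^l{}_{ik} = \frac{\partial^2 y^j}{\partial x^i\, \partial x^k}\, \frac{\partial x^l}{\partial y^j}. \tag{3.38}$$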
Thus the $\delta A_i$ are bilinear in the $A_l$ and $dx^k$. We shall adopt equation (3.37) as a definition
of $\delta A_i$ for general coordinate transformations in Riemannian spaces. The set of $N^3$ quantities
$\Gamma^l{}_{ik}$ is called an affinity and specifies an affine connection between the points of the Riemann
space. A space which is affinely connected possesses sufficient structure to permit the operations of
tensor analysis to be carried out within it. Also note that the components of the affinity (3.38) are
symmetric in the two lower indices, i.e.,

$$\Gamma^l{}_{ik} = \Gamma^l{}_{ki}. \tag{3.39}$$

If we now use (3.33) and (3.37) we may write (3.34) as

$$dA_i - \delta A_i = \frac{\partial A_i}{\partial x^j}\, dx^j - \Gamma^k{}_{ij}\, A_k\, dx^j = \left(\frac{\partial A_i}{\partial x^j} - \Gamma^k{}_{ij}\, A_k\right) dx^j. \tag{3.40}$$

The covariant derivative of a covariant vector. From (3.34) and (3.40) we define the
covariant tensor
$$A_{i;j} = \frac{\partial A_i}{\partial x^j} - \Gamma^k{}_{ij}\, A_k, \tag{3.41}$$
as the covariant derivative of $A_i$.

Note that if the components of the affinity all vanish over some region of Riemann space then
the covariant and partial derivatives are identical over this region. However, we will show below
that $\Gamma^i{}_{jk}$ is not a tensor and as such $\Gamma^i{}_{jk} = 0$ is not a tensor equation, and is only
valid in the particular reference frame being employed. In any other frame the components of the
affinity will, in general, be non-zero and the distinction between the two derivatives will be
maintained. In tensor equations which are to be valid in every frame, therefore, only covariant
derivatives may appear, even if it is possible to find a frame relative to which the affinity vanishes.
The question that remains is how to extend the process of covariant differentiation to
tensors of all ranks and types. Consider first an invariant field V. When V undergoes a parallel
displacement from point P to P′, its value remains unaltered, i.e., $\delta V = 0$ in all frames. Hence

$$dV - \delta V = \frac{\partial V}{\partial x^i}\, dx^i, \tag{3.42}$$
is the counterpart for an invariant of equation (3.34).

Covariant derivative of an invariant field. It follows from (3.34) and (3.42) that

$$V_{,i} = V_{;i}, \tag{3.43}$$

that is, the covariant derivative of an invariant is identical with its partial derivative or its
gradient.

Now let $B^i$ be a contravariant vector field and $A_i$ an arbitrary covariant vector. Then $A_i B^i$
is an invariant and, when parallel displaced from P to P′, its value remains unchanged. Thus

$$0 = \delta(A_i B^i) = \delta(A_i)\, B^i + A_i\, \delta(B^i), \tag{3.44}$$

and hence by equation (3.37)

$$A_k\, \delta B^k = -\Gamma^k{}_{ij}\, A_k\, dx^j\, B^i. \tag{3.45}$$

Since $A_k$ is arbitrary we may equate its coefficients to obtain

$$\delta B^k = -\Gamma^k{}_{ij}\, dx^j\, B^i, \tag{3.46}$$

which defines parallel displacement of $B^i$. The covariant derivative then follows as before from

$$dB^k - \delta B^k = \left(\frac{\partial B^k}{\partial x^j} + \Gamma^k{}_{ij}\, B^i\right) dx^j, \tag{3.47}$$

and since $dx^j$ is arbitrary and $dB^k - \delta B^k$ is a known vector, it then follows that:

The covariant derivative of a contravariant vector is given by

$$B^k{}_{;j} = \frac{\partial B^k}{\partial x^j} + \Gamma^k{}_{ij}\, B^i. \tag{3.48}$$

Similarly, if $A^i{}_j$ is a tensor field, we consider the parallel displacement of the invariant
$A^i{}_j B_i C^j$, where $B_i$ and $C^j$ are arbitrary vectors. Then from $\delta(A^i{}_j B_i C^j) = 0$ and
equations (3.37) and (3.46) we obtain
$$\delta A^i{}_j = \Gamma^l{}_{jk}\, dx^k\, A^i{}_l - \Gamma^i{}_{lk}\, dx^k\, A^l{}_j, \tag{3.49}$$

from which it follows that

$$A^i{}_{j;k} = \frac{\partial A^i{}_j}{\partial x^k} - \Gamma^l{}_{jk}\, A^i{}_l + \Gamma^i{}_{lk}\, A^l{}_j, \tag{3.50}$$
is the covariant derivative of the type (1,1) tensor $A^i{}_j$.

Rule for covariant differentiation. The following rule for finding the covariant derivative
follows from examination of equation (3.50): the appropriate partial derivative is first
written down and this is followed by affinity terms; the latter consist of an inner product of
the affinity and the tensor with respect to each of its indices in turn, prefixing a positive sign
when the index is contravariant and a negative sign when it is covariant.

In the remaining part of this section we shall demonstrate that the ordinary rules for the
differentiation of sums and products also apply to covariant differentiation.

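Addition. The right-hand side of (3.50) is linear in $A^i{}_j$. It then follows immediately that if

$$C^i{}_j = A^i{}_j + B^i{}_j,$$

then
$$C^i{}_{j;k} = A^i{}_{j;k} + B^i{}_{j;k}.$$

Product. Suppose $C^i = A^i{}_j B^j$; then

$$\begin{aligned}
C^i{}_{;k} &= \frac{\partial C^i}{\partial x^k} + \Gamma^i{}_{rk}\, C^r \\
&= \frac{\partial (A^i{}_j B^j)}{\partial x^k} + \Gamma^i{}_{rk}\, A^r{}_j B^j \\
&= \left(\frac{\partial A^i{}_j}{\partial x^k} + \Gamma^i{}_{rk}\, A^r{}_j - \Gamma^r{}_{jk}\, A^i{}_r\right) B^j
   + \left(\frac{\partial B^j}{\partial x^k} + \Gamma^j{}_{rk}\, B^r\right) A^i{}_j \\
&= A^i{}_{j;k}\, B^j + B^j{}_{;k}\, A^i{}_j,
\end{aligned} \tag{3.51}$$

which is the ordinary rule for the differentiation of a product.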

Covariant divergence. If the covariant derivative of a tensor field is found and then contracted
with respect to the index of differentiation and any superscript, the result is called a
divergence of the tensor. The divergence of $A^i$ is obtained from the contraction $A^i{}_{;i}$. From
the tensor $C^{ij}{}_k$ two divergences can be formed, namely, $C^{ij}{}_{k;i}$ and $C^{ij}{}_{k;j}$.


Transformation of affinity

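The manner in which each of the quantities occurring in (3.41) transforms is known, with the
exception of the affinity $\Gamma^i{}_{jk}$. Relative to the $\bar{x}$-frame, equation (3.41) is written as

$$\bar{A}_{i;j} = \frac{\partial \bar{A}_i}{\partial \bar{x}^j} - \bar{\Gamma}^k{}_{ij}\, \bar{A}_k. \tag{3.52}$$
Since $A_i$, $A_{i;j}$ are tensors, they transform as

$$\bar{A}_i = \frac{\partial x^r}{\partial \bar{x}^i}\, A_r, \tag{3.53}$$
$$\bar{A}_{i;j} = \frac{\partial x^s}{\partial \bar{x}^i}\, \frac{\partial x^t}{\partial \bar{x}^j}\, A_{s;t}. \tag{3.54}$$
Substituting (3.53), (3.54) into (3.52) gives

$$\frac{\partial x^s}{\partial \bar{x}^i}\, \frac{\partial x^t}{\partial \bar{x}^j}\, A_{s;t} = \frac{\partial x^r}{\partial \bar{x}^i}\, \frac{\partial x^u}{\partial \bar{x}^j}\, \frac{\partial A_r}{\partial x^u} + \frac{\partial^2 x^r}{\partial \bar{x}^i\, \partial \bar{x}^j}\, A_r - \bar{\Gamma}^k{}_{ij}\, \frac{\partial x^r}{\partial \bar{x}^k}\, A_r. \tag{3.55}$$
If we now use (3.41) in (3.55) to substitute for $A_{s;t}$ we obtain

$$-\frac{\partial x^s}{\partial \bar{x}^i}\, \frac{\partial x^t}{\partial \bar{x}^j}\, \Gamma^r{}_{st}\, A_r = \frac{\partial^2 x^r}{\partial \bar{x}^i\, \partial \bar{x}^j}\, A_r - \bar{\Gamma}^k{}_{ij}\, \frac{\partial x^r}{\partial \bar{x}^k}\, A_r. \tag{3.56}$$
Recall that $A_r$ is an arbitrary tensor, so we can equate coefficients of $A_r$ in (3.56) to
obtain
$$\bar{\Gamma}^k{}_{ij}\, \frac{\partial x^r}{\partial \bar{x}^k} = \frac{\partial x^s}{\partial \bar{x}^i}\, \frac{\partial x^t}{\partial \bar{x}^j}\, \Gamma^r{}_{st} + \frac{\partial^2 x^r}{\partial \bar{x}^i\, \partial \bar{x}^j}. \tag{3.57}$$
To eliminate $\partial x^r/\partial \bar{x}^k$ from the left-hand side of equation (3.57), we multiply by
$\partial \bar{x}^l/\partial x^r$ on both sides and use the result that
$$\frac{\partial x^r}{\partial \bar{x}^k}\, \frac{\partial \bar{x}^l}{\partial x^r} = \frac{\partial \bar{x}^l}{\partial \bar{x}^k} = \delta^l_k. \tag{3.58}$$
In this manner equation (3.57) then becomes:

The transformation law for $\Gamma^i{}_{jk}$.

$$\bar{\Gamma}^l{}_{ij} = \frac{\partial \bar{x}^l}{\partial x^r}\, \frac{\partial x^s}{\partial \bar{x}^i}\, \frac{\partial x^t}{\partial \bar{x}^j}\, \Gamma^r{}_{st} + \frac{\partial \bar{x}^l}{\partial x^r}\, \frac{\partial^2 x^r}{\partial \bar{x}^i\, \partial \bar{x}^j}. \tag{3.59}$$
The affinity is not a tensor, since its components do not transform like a tensor due to
the presence of the second term on the right-hand side of the transformation equation (3.59).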

The transformation law of the affinity is linear in the components of an affinity but is not
homogeneous like a tensor transformation. This has the consequence that, if all the components
of an affinity are zero relative to one frame, they are not necessarily zero relative to another
frame.
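
As a worked check of (3.59), the sketch below assumes, for illustration only, that the unbarred frame is Cartesian, $(x^1, x^2) = (x, y)$, so that its affinity vanishes, and that the barred frame is plane polar, $(\bar{x}^1, \bar{x}^2) = (r, \theta)$ with $x = r\cos\theta$, $y = r\sin\theta$. The summation index is written m to avoid a clash with the polar coordinate r. Only the inhomogeneous second term of (3.59) survives.

```latex
% Assumption: \Gamma^m{}_{st} = 0 in the Cartesian frame, so (3.59) reduces to
\bar{\Gamma}^l{}_{ij} = \frac{\partial \bar{x}^l}{\partial x^m}\,
                        \frac{\partial^2 x^m}{\partial \bar{x}^i\, \partial \bar{x}^j}.

% Component l = r, i = j = \theta:
\bar{\Gamma}^r{}_{\theta\theta}
  = \frac{\partial r}{\partial x}\, \frac{\partial^2 x}{\partial \theta^2}
  + \frac{\partial r}{\partial y}\, \frac{\partial^2 y}{\partial \theta^2}
  = \cos\theta\,(-r\cos\theta) + \sin\theta\,(-r\sin\theta) = -r.

% Component l = \theta, i = r, j = \theta:
\bar{\Gamma}^\theta{}_{r\theta}
  = \frac{\partial \theta}{\partial x}\, \frac{\partial^2 x}{\partial r\, \partial \theta}
  + \frac{\partial \theta}{\partial y}\, \frac{\partial^2 y}{\partial r\, \partial \theta}
  = \left(-\frac{\sin\theta}{r}\right)(-\sin\theta) + \frac{\cos\theta}{r}\,\cos\theta = \frac{1}{r}.
```

Under these assumptions the affinity is identically zero in one frame yet has non-zero components in the other, which is the behaviour a tensor could never show.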

Parallel fields

Let us now return to the question of parallelism in Riemannian space. Consider two points P
and P′ with coordinates $x^i$ and $x^i + dx^i$ respectively and let $A_i$ and $A_i + dA_i$ be the vectors
of a covariant vector field associated with these points. Earlier we introduced at P′ the vector
$A_i + \delta A_i$ as the vector obtained by parallel displacing $A_i$ from P to P′ so that
the magnitude and direction stay unaltered. If the vector field is such that $A_i + dA_i$ and
$A_i + \delta A_i$ are the same for all points in the Riemannian space then the vector field is said to be
a parallel vector field. From (3.40) it follows that a parallel vector field satisfies the differential
equation

$$\frac{\partial A_i}{\partial x^j} = \Gamma^k{}_{ij}\, A_k, \tag{3.60}$$
or, simply,

$$A_{i;j} = 0. \tag{3.61}$$

Similarly, a parallel contravariant vector field $B^i$ is defined by

$$B^i{}_{;j} = 0. \tag{3.62}$$

3.5 Covariant derivative of the metric tensor

Consider the mixed metric tensor $\delta^k_i$ defined in (3.25). The components of $\delta^i_j$ are either
zero or one in every coordinate system, and hence are constants. So now taking the covariant
derivative of $\delta^i_j$ using (3.50), we obtain
$$\delta^i_{j;k} = \frac{\partial \delta^i_j}{\partial x^k} - \Gamma^l{}_{jk}\, \delta^i_l + \Gamma^i{}_{lk}\, \delta^l_j = -\Gamma^i{}_{jk} + \Gamma^i{}_{jk} = 0. \tag{3.63}$$
By lowering and raising the indices i and j in the tensor equation $\delta^i_{j;k} = 0$ we obtain
respectively,

$$g_{ij;k} = 0, \qquad g^{ij}{}_{;k} = 0. \tag{3.64}$$

If we write out (3.64) in detail, using $g_{ij;k} = 0$, we establish the following lemma:

The Ricci lemma:

$$g_{ij;k} = g_{ij,k} - \Gamma^h{}_{ik}\, g_{hj} - \Gamma^h{}_{jk}\, g_{ih} = 0. \tag{3.65}$$
This result implies that the metric tensor behaves like a constant with respect to covariant
differentiation.

It is not difficult to show that by cyclically permuting the indices i, j and k we can write two
equations similar to (3.65):

$$g_{ki;j} = g_{ki,j} - \Gamma^h{}_{kj}\, g_{hi} - \Gamma^h{}_{ij}\, g_{kh} = 0, \tag{3.66}$$

$$g_{jk;i} = g_{jk,i} - \Gamma^h{}_{ji}\, g_{hk} - \Gamma^h{}_{ki}\, g_{jh} = 0. \tag{3.67}$$

Exercise. Show that, from equations (3.65), (3.66) and (3.67), the components of the affine
connection may be expressed in terms of the metric and its derivatives by
$$\Gamma^h{}_{ij} = \frac{1}{2}\, g^{hk}\,(g_{ki,j} + g_{jk,i} - g_{ij,k}), \tag{3.68}$$
and
$$\Gamma_{kij} \equiv g_{hk}\, \Gamma^h{}_{ij} = \frac{1}{2}\,(g_{ki,j} + g_{jk,i} - g_{ij,k}). \tag{3.69}$$
Written in this form the components of the affinity are called Christoffel symbols; with Γkij
referred to as Christoffel symbols of the first kind and Γh ij as Christoffel symbols of the second
kind.
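
Formula (3.68) lends itself to a direct numerical check. The Python sketch below assumes the plane polar metric $g_{ij} = \mathrm{diag}(1, r^2)$ of Example 3.3.1 and approximates the derivatives $g_{ij,k}$ by central finite differences; the function names, the index convention (0 for r, 1 for θ) and the step size are illustrative assumptions, not part of this guide.

```python
def metric(x):
    """Assumed polar metric g_ij at the point x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_metric(x):
    """Inverse polar metric g^ij at x = (r, theta); requires r != 0."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]


def d_metric(x, k, h=1e-6):
    """Central-difference estimate of g_{ij,k}, returned as a 2x2 list indexed [i][j]."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]


def christoffel_second_kind(x):
    """gamma[h][i][j] = (1/2) g^{hk} (g_{ki,j} + g_{jk,i} - g_{ij,k}), as in (3.68)."""
    n = 2
    ginv = inverse_metric(x)
    dg = [d_metric(x, k) for k in range(n)]  # dg[k][i][j] approximates g_{ij,k}
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for h_ in range(n):
        for i in range(n):
            for j in range(n):
                gamma[h_][i][j] = 0.5 * sum(
                    ginv[h_][k] * (dg[j][k][i] + dg[i][j][k] - dg[k][i][j])
                    for k in range(n)
                )
    return gamma


if __name__ == "__main__":
    gamma = christoffel_second_kind([2.0, 0.3])
    print(gamma[0][1][1])  # Gamma^r_{theta theta}, close to -r
    print(gamma[1][0][1])  # Gamma^theta_{r theta}, close to 1/r
```

Because the symbols are built from the metric alone, the same loop applies to any other metric for which `metric` and `inverse_metric` are supplied; the finite-difference step limits the accuracy to a few significant figures fewer than machine precision.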

Example 3.5.1

Find the expression for the contracted affine connection $\Gamma^h{}_{ih}$.

Solution. Note that although $\Gamma^h{}_{ij}$ is not a tensor, we can nevertheless talk about a
contraction of this object from (3.68) simply by putting j = h and summing over h. We thus have
$$\Gamma^h{}_{ih} = \frac{1}{2}\, g^{hk}\,(g_{ki,h} + g_{hk,i} - g_{ih,k}),$$
which can be simplified in the following way. We consider $g^{hk} g_{ki} = \delta^h_i$ and differentiate with
respect to $x^h$ to obtain
$$g^{hk}{}_{,h}\, g_{ki} = -g^{hk}\, g_{ki,h}.$$
By the same token $g^{hk} g_{ih} = \delta^k_i$ gives

$$-g^{hk}\, g_{ih,k} = g^{hk}{}_{,k}\, g_{ih} = g^{kh}{}_{,h}\, g_{ik}.$$

The first and third terms in the bracket therefore cancel, and we thus have
$$\Gamma^h{}_{ih} = \frac{1}{2}\, g^{hk}\, g_{hk,i}. \tag{3.70}$$
We now adopt the notation
$$|g| = \det g_{ij}, \tag{3.71}$$
and we note that $|g|$ is a scalar density (see exercises) of weight 2, that is,

$$|\bar{g}| = J^2\, |g|,$$

and from this it follows that
$$\sqrt{|\bar{g}|} = J \sqrt{|g|}.$$
Now the differential of the determinant (3.71) can be found by differentiating each row separately
and summing the results. We deduce that

$$d|g| = |g|\, g^{hk}\, dg_{hk}, \tag{3.72}$$

so that
$$d(\log \sqrt{|g|}) = \frac{1}{\sqrt{|g|}}\, \frac{1}{2\sqrt{|g|}}\, d|g| = \frac{1}{2}\, g^{hk}\, dg_{hk}, \tag{3.73}$$
and if we substitute this in (3.70) we obtain
$$\Gamma^h{}_{ih} = \frac{\partial}{\partial x^i}\,(\log \sqrt{|g|}) = (\log \sqrt{|g|})_{,i}. \tag{3.74}$$
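
Equation (3.74) can be checked quickly in plane polar coordinates. The sketch below assumes the polar metric $g_{ij} = \mathrm{diag}(1, r^2)$ of Example 3.3.1 with $r > 0$, together with the only non-zero Christoffel symbols that (3.68) yields for it, namely $\Gamma^r{}_{\theta\theta} = -r$ and $\Gamma^\theta{}_{r\theta} = \Gamma^\theta{}_{\theta r} = 1/r$. It is offered as an illustration and is not part of the original example.

```latex
% Assumed metric: g_{ij} = \mathrm{diag}(1, r^2), so
\sqrt{|g|} = r, \qquad \log\sqrt{|g|} = \log r.

% i = r:
\Gamma^h{}_{rh} = \Gamma^r{}_{rr} + \Gamma^\theta{}_{r\theta}
                = 0 + \frac{1}{r}
                = \frac{\partial}{\partial r}\,(\log r).

% i = \theta:
\Gamma^h{}_{\theta h} = \Gamma^r{}_{\theta r} + \Gamma^\theta{}_{\theta\theta}
                      = 0 + 0
                      = \frac{\partial}{\partial \theta}\,(\log r).
```

Both components agree with the right-hand side of (3.74), as expected.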

Example 3.5.2

Prove that the covariant divergence of a vector density is equal to its ordinary divergence.

Proof. Here we use the definition that the product of a scalar density of weight p and a type
(r, s) tensor defines a tensor density of weight p.
Let $A^i$ be the vector and define the vector density $|A^i|$ by
$$|A^i| = \sqrt{|g|}\, A^i.$$

We now have to prove that

$$|A^i{}_{;i}| = |A^i{}_{,i}|. \tag{3.75}$$
Firstly we note that, by (3.72),
$$|g|_{;i} = |g|\, g^{hk}\, g_{hk;i} = 0,$$
by Ricci's lemma. Secondly from (3.70) and (3.73) we may write
$$(\sqrt{|g|})_{,i} = \frac{1}{2}\, \sqrt{|g|}\, g^{hk}\, g_{hk,i} = \sqrt{|g|}\, \Gamma^j{}_{ij}. \tag{3.76}$$

We thus have
$$\begin{aligned}
|A^i{}_{,i}| &= (\sqrt{|g|}\, A^i)_{,i} \\
&= \sqrt{|g|}\, A^i{}_{,i} + (\sqrt{|g|})_{,i}\, A^i \\
&= \sqrt{|g|}\, A^i{}_{,i} + \sqrt{|g|}\, \Gamma^j{}_{ij}\, A^i \\
&= \sqrt{|g|}\, [A^i{}_{,i} + \Gamma^j{}_{ij}\, A^i] \\
&= \sqrt{|g|}\, A^i{}_{;i} \\
&= |A^i{}_{;i}|,
\end{aligned}$$

and this proves that the covariant divergence of a vector density is equal to the ordinary divergence
of the vector density.
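The result can be checked numerically in a simple case. The sketch below assumes plane polar coordinates $(r, \varphi)$, for which $\sqrt{|g|} = r$ and $\Gamma^{j}{}_{ij} = (1/r, 0)$, together with a hypothetical test field $A^{i} = (r^{2}, \varphi)$ chosen only for illustration. Partial derivatives are taken by central differences.

```python
def sqrt_g(x):
    """sqrt|g| in plane polar coordinates x = (r, φ)."""
    return x[0]

def vector(x):
    """Hypothetical contravariant test field A^i = (r^2, φ)."""
    r, phi = x
    return [r * r, phi]

def contracted_gamma(x):
    """Γ^j_{ij} = (log sqrt|g|)_{,i} = (1/r, 0) in polar coordinates."""
    return [1.0 / x[0], 0.0]

def partial(f, x, i, eps=1e-6):
    """Central-difference partial derivative of the scalar function f along x^i."""
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    return (f(xp) - f(xm)) / (2 * eps)

def ordinary_divergence_of_density(x):
    """(sqrt|g| A^i)_{,i}: the ordinary divergence of the vector density."""
    return sum(partial(lambda y, i=i: sqrt_g(y) * vector(y)[i], x, i) for i in range(2))

def covariant_divergence_times_sqrt_g(x):
    """sqrt|g| (A^i_{,i} + Γ^j_{ij} A^i): the covariant divergence of the density."""
    div = sum(partial(lambda y, i=i: vector(y)[i], x, i) for i in range(2))
    div += sum(g * a for g, a in zip(contracted_gamma(x), vector(x)))
    return sqrt_g(x) * div
```

For this test field both expressions reduce to $3r^{2} + r$, so the two functions should agree up to the finite-difference error.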

Riemannian coordinates. Before leaving this section let us briefly mention a special coordinate
system which can often be very useful in tensor manipulations. We note that since $\Gamma^{i}{}_{jk}$ is not
a tensor, we can always find a coordinate system in which $\Gamma^{i}{}_{jk} = 0$ at a point; such coordinate
systems are called Riemannian coordinates. It is, in fact, quite a simple matter to construct such a
coordinate system, but we won't go through the details. The point of the Riemannian
space where $\Gamma^{i}{}_{jk} = 0$ is called the pole of the Riemannian coordinate system, and we must
emphasize that in general $\Gamma^{i}{}_{jk}$ vanishes only at the pole: at a neighbouring point $\Gamma^{i}{}_{jk} \neq 0$,
which means that the derivatives of $\Gamma^{i}{}_{jk}$ do not vanish at the pole of the Riemannian coordinate
system. The essential point about Riemannian coordinates is that covariant derivatives reduce
to partial derivatives at the pole. It is very often easy to prove a certain equation true in
Riemannian coordinates, and if the equation is a tensor equation then it will hold true in any
coordinate system. This is the virtue of Riemannian coordinates.

3.6 Geodesic equation

Consider now a finite curve that joins point $P_1$ to point $P_2$. We denote this curve in parametric
form by $C : x^{i} = x^{i}(t)$, where t is a local parameter. The proper length s of the curve follows
from (3.23) by writing
\[ ds^{2} = g_{ij}\, \frac{dx^{i}}{dt}\, \frac{dx^{j}}{dt}\, dt^{2}, \]
and integration yields
\[ s = \int_{t_1}^{t_2} \sqrt{g_{ij}\, \dot{x}^{i} \dot{x}^{j}}\; dt, \tag{3.77} \]
where $\dot{x}^{i} = dx^{i}/dt$. We know that in Euclidean space the shortest path between two points
is a straight line. Let us look at the corresponding problem in Riemannian space with $ds^{2} > 0$.
That is, we seek those curves which minimize the distance functional (3.77). We know from
variational principles (see module APM312-3/PHY310-D) that it is necessary that the sought
curves satisfy the Euler-Lagrange equations
\[ \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}^{i}}\right) - \frac{\partial L}{\partial x^{i}} = 0, \tag{3.78} \]
where
\[ L(x^{k}, \dot{x}^{k}) = \sqrt{g_{ij}\, \dot{x}^{i} \dot{x}^{j}}. \tag{3.79} \]
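Now one can first show that
\[ \frac{\partial L}{\partial \dot{x}^{i}} = \frac{1}{L}\, g_{ji}\, \dot{x}^{j}, \qquad
   \frac{\partial L}{\partial x^{i}} = \frac{1}{2L}\, \frac{\partial g_{jk}}{\partial x^{i}}\, \dot{x}^{j} \dot{x}^{k}, \tag{3.80} \]
\[ \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}^{i}}\right)
   = -\frac{1}{L^{2}}\, \frac{dL}{dt}\, g_{ji}\, \dot{x}^{j} + \frac{1}{L}\, g_{ji}\, \ddot{x}^{j}
     + \frac{1}{L}\, \frac{\partial g_{ji}}{\partial x^{k}}\, \dot{x}^{j} \dot{x}^{k}, \tag{3.81} \]
where we recall that $g_{ij} = g_{ij}(x^{k})$. On substituting these results in (3.78) we obtain
\[ \frac{1}{L}\left[ g_{ji}\, \ddot{x}^{j}
   + \left( \frac{\partial g_{ji}}{\partial x^{k}} - \frac{1}{2}\, \frac{\partial g_{jk}}{\partial x^{i}} \right) \dot{x}^{j} \dot{x}^{k}
   - \frac{1}{L}\, \frac{dL}{dt}\, g_{ji}\, \dot{x}^{j} \right] = 0. \tag{3.82} \]
We now introduce the notation
\[ g_{ij,k} = \frac{\partial g_{ij}}{\partial x^{k}}, \]
and note that by suitably interchanging dummy variables j, k we can write
\[ g_{ji,k}\, \dot{x}^{j} \dot{x}^{k} = g_{ki,j}\, \dot{x}^{j} \dot{x}^{k} = g_{ki,j}\, \dot{x}^{k} \dot{x}^{j}, \]
so that $g_{ji,k}\, \dot{x}^{j} \dot{x}^{k} = \tfrac{1}{2}(g_{ki,j} + g_{ji,k})\, \dot{x}^{j} \dot{x}^{k}$.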

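It follows that equation (3.82) can be written as
\[ \frac{1}{L}\left[ g_{ji}\, \ddot{x}^{j} + \frac{1}{2}\,(g_{ki,j} + g_{ji,k} - g_{kj,i})\, \dot{x}^{j} \dot{x}^{k}
   - \frac{1}{L}\, \frac{dL}{dt}\, g_{ji}\, \dot{x}^{j} \right] = 0. \tag{3.83} \]
If we multiply by $L g^{hi}$ we obtain the Euler-Lagrange equation in terms of the Christoffel symbols
(3.68):
\[ \ddot{x}^{h} + \Gamma^{h}{}_{jk}\, \dot{x}^{j} \dot{x}^{k} - \frac{1}{L}\, \frac{dL}{dt}\, \dot{x}^{h} = 0. \tag{3.84} \]
Finally recall that in differential form the integral (3.77) can be written as
\[ ds = L\, dt, \]
so that $L = ds/dt$ and the last term on the left hand side of (3.84) becomes
\[ \frac{1}{L}\, \frac{dL}{dt}\, \dot{x}^{h} = \frac{dt}{ds}\, \frac{d^{2}s}{dt^{2}}\, \dot{x}^{h}, \]
and if we choose $s = t$ then this term vanishes and (3.84) becomes
\[ \frac{d^{2}x^{h}}{ds^{2}} + \Gamma^{h}{}_{ij}\, \frac{dx^{i}}{ds}\, \frac{dx^{j}}{ds} = 0. \tag{3.85} \]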
Now the choice of parameter $s = t$ is equivalent to setting $L = 1$ and from (3.79) this introduces
the constraint
\[ g_{ij}\, \frac{dx^{i}}{ds}\, \frac{dx^{j}}{ds} = 1. \tag{3.86} \]
A curve which satisfies equation (3.85) and is constrained by the condition (3.86) is called a
geodesic curve. It follows that for the integral (3.77) to be a minimum, the path of integration
must be along a geodesic curve. Note incidentally that this is only a necessary condition. It
does not always follow that the integration along a geodesic minimizes the integral.¹
Finally we point out here that the geodesic equation (3.85) is a tensorial equation (see
exercises). What is fairly interesting here is that neither $d^{2}x^{h}/ds^{2}$ nor $\Gamma^{h}{}_{ij}$ are tensors; however,
the non-tensorial parts of the two terms cancel.
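To see (3.85) and (3.86) at work, the sketch below integrates the geodesic equation on the unit sphere (the metric of exercise 3.8 with a = 1) using a classical fourth-order Runge-Kutta step. The sphere's Christoffel symbols are quoted as known results, and the integrator and its names are illustrative choices, not part of the text.

```python
import math

def geodesic_rhs(state):
    """Right-hand side of (3.85) on the unit sphere.

    state = (θ, φ, dθ/ds, dφ/ds). The non-zero symbols used are
    Γ^θ_{φφ} = -sinθ cosθ and Γ^φ_{θφ} = Γ^φ_{φθ} = cotθ.
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi * dphi
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h in the arclength s."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def speed_squared(state):
    """Left-hand side of the constraint (3.86): g_ij (dx^i/ds)(dx^j/ds)."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

def integrate(state, length, steps):
    """Follow the geodesic for the given arclength."""
    h = length / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state
```

Starting from a unit tangent vector, the constraint (3.86) should stay equal to 1 along the curve, and after an arclength of $2\pi$ the curve should close up, since the geodesics of the sphere are great circles.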
Exercise: Show that in Euclidean spaces the geodesic equations (3.85) and (3.86) can be
integrated to yield $x^{i}(s) = \alpha^{i} s + \beta^{i}$, where $\alpha^{i}$ and $\beta^{i}$ are constants. That is, the shortest
path between two points on a Euclidean manifold is a straight line.

¹ Further conditions that analyse the extremal nature of geodesic curves are called the Weierstrass conditions
and are studied in the module APM312-4, which covers the calculus of variations.

3.7 Curvature

The metric tensor tells us everything about the geometry of a space. At the same time, however, the form of
$g_{ij}$ can be so distorted by our choice of coordinate system that it is often difficult to tell whether
the metric tensor describes a curved or a flat space. What we would like is some characteristic of
the geometry which can tell us at a glance whether the space is curved or flat. Such an object is
provided by the curvature tensor and, as we might expect, this tensor is a function of the metric
tensor and its derivatives.
There are a number of ways of arriving at the definition of the curvature tensor but they all
boil down to the same thing, namely, that curvature is a statement about the non-integrability of
parallel transfer. Recall that a vector field $B^{i}$ is a parallel vector field if equation (3.62), i.e.,
\[ B^{i}{}_{,h} = -\Gamma^{i}{}_{jh}\, B^{j}, \tag{3.87} \]
is satisfied. The integrability conditions for equation (3.87) are given by
\[ B^{i}{}_{,hk} = B^{i}{}_{,kh}. \tag{3.88} \]

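If we differentiate (3.87) with respect to $x^{k}$ we obtain
\[ B^{i}{}_{,hk} = -\Gamma^{i}{}_{jh,k}\, B^{j} - \Gamma^{i}{}_{jh}\, B^{j}{}_{,k}, \tag{3.89} \]
and if we substitute for $B^{j}{}_{,k}$ from (3.87) we get
\[ B^{i}{}_{,hk} = -\Gamma^{i}{}_{jh,k}\, B^{j} + \Gamma^{i}{}_{jh}\, \Gamma^{j}{}_{mk}\, B^{m}. \tag{3.90} \]
The expression for $B^{i}{}_{,kh}$ is precisely the same except that h and k are interchanged, i.e.,
\[ B^{i}{}_{,kh} = -\Gamma^{i}{}_{jk,h}\, B^{j} + \Gamma^{i}{}_{jk}\, \Gamma^{j}{}_{mh}\, B^{m}. \tag{3.91} \]
If we use (3.90) and (3.91) in the integrability conditions (3.88) we obtain
\[ B^{i}{}_{,hk} - B^{i}{}_{,kh} = -\bigl(\Gamma^{i}{}_{jh,k} - \Gamma^{i}{}_{jk,h}
   + \Gamma^{i}{}_{mk}\, \Gamma^{m}{}_{jh} - \Gamma^{i}{}_{mh}\, \Gamma^{m}{}_{jk}\bigr) B^{j}, \tag{3.92} \]
where we have interchanged the dummy indices m and j in the ΓΓ terms. We can write this
equation as
\[ B^{i}{}_{,hk} - B^{i}{}_{,kh} = -R^{i}{}_{jhk}\, B^{j}, \tag{3.93} \]
by setting
\[ R^{i}{}_{jhk} = \Gamma^{i}{}_{jh,k} - \Gamma^{i}{}_{jk,h} + \Gamma^{i}{}_{mk}\, \Gamma^{m}{}_{jh} - \Gamma^{i}{}_{mh}\, \Gamma^{m}{}_{jk}. \tag{3.94} \]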

We thus see that the integrability condition (3.88) is equivalent to
\[ R^{i}{}_{jhk}\, B^{j} = 0. \tag{3.95} \]
That is, for $B^{i}$ to be a parallel vector field, (3.95) must be satisfied. At this stage there is no
guarantee that the object $R^{i}{}_{jhk}$ is a tensor. However, it is easy to show that for an arbitrary
vector field $A^{i}$, calculating $A^{i}{}_{;hk}$ and $A^{i}{}_{;kh}$ also gives
\[ A^{i}{}_{;hk} - A^{i}{}_{;kh} = R^{i}{}_{jhk}\, A^{j}, \tag{3.96} \]
and this proves, using the quotient theorem, that the $R^{i}{}_{jhk}$ are components of a tensor. We call $R^{i}{}_{jhk}$
the curvature tensor; in some books it is called the Riemann tensor or the Riemann-Christoffel
tensor. We see from (3.94) that $R^{i}{}_{jhk}$ depends on the derivatives of the Christoffel symbols and
thus it is a function of $g_{ij}$, $g_{ij,h}$ and $g_{ij,hk}$.
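The dependence of the curvature tensor on the Christoffel symbols and their first derivatives can be made concrete numerically. The sketch below assumes the sign convention in which (3.96) holds with $A^{i}{}_{;hk}$ meaning differentiation first with respect to $x^{h}$ and then $x^{k}$, namely $R^{i}{}_{jhk} = \Gamma^{i}{}_{jh,k} - \Gamma^{i}{}_{jk,h} + \Gamma^{i}{}_{mk}\Gamma^{m}{}_{jh} - \Gamma^{i}{}_{mh}\Gamma^{m}{}_{jk}$. The Christoffel symbols of the unit sphere and of the plane in polar coordinates are quoted as known results, derivatives are taken by central differences, and all names are illustrative.

```python
import math

def gamma_sphere(x):
    """Γ^i_{jk} on the unit sphere, x = (θ, φ); g[i][j][k] with 0 = θ, 1 = φ."""
    theta = x[0]
    g = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    g[0][1][1] = -math.sin(theta) * math.cos(theta)
    g[1][0][1] = g[1][1][0] = math.cos(theta) / math.sin(theta)
    return g

def gamma_polar(x):
    """Γ^i_{jk} for the flat plane in polar coordinates, x = (r, φ)."""
    r = x[0]
    g = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    g[0][1][1] = -r
    g[1][0][1] = g[1][1][0] = 1.0 / r
    return g

def riemann(gamma, x, eps=1e-5):
    """R[i][j][h][k] = Γ^i_{jh,k} - Γ^i_{jk,h} + Γ^i_{mk} Γ^m_{jh} - Γ^i_{mh} Γ^m_{jk}."""
    n = len(x)

    def d_gamma(k):
        # Central difference of every Γ^i_{jh} with respect to x^k.
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        gp, gm = gamma(xp), gamma(xm)
        return [[[(gp[i][j][h] - gm[i][j][h]) / (2 * eps) for h in range(n)]
                 for j in range(n)] for i in range(n)]

    dG = [d_gamma(k) for k in range(n)]  # dG[k][i][j][h] = Γ^i_{jh,k}
    G = gamma(x)
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for h in range(n):
                for k in range(n):
                    val = dG[k][i][j][h] - dG[h][i][j][k]
                    for m in range(n):
                        val += G[i][m][k] * G[m][j][h] - G[i][m][h] * G[m][j][k]
                    R[i][j][h][k] = val
    return R
```

In this convention the sphere gives $R^{\theta}{}_{\varphi\theta\varphi} = -\sin^{2}\theta$, while every component vanishes for the plane in polar coordinates even though its Christoffel symbols do not: the non-vanishing symbols there are an artefact of the coordinates, not of the geometry.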

Exercise: Show that if $D^{i}{}_{j}$ is an arbitrary mixed tensor then (see exercises)
\[ D^{i}{}_{j;hk} - D^{i}{}_{j;kh} = R^{i}{}_{mhk}\, D^{m}{}_{j} - R^{m}{}_{jhk}\, D^{i}{}_{m}, \tag{3.97} \]
and this equation has an obvious generalization for arbitrary tensors of higher rank.

Symmetry properties of $R_{ijhk}$

We show below that the curvature tensor satisfies the following symmetry conditions:

(a) It is skew-symmetric in the last two indices
\[ R^{i}{}_{jhk} = -R^{i}{}_{jkh}. \tag{3.98} \]

(b) If we lower the superscript i, that is,
\[ R_{ijhk} = g_{im}\, R^{m}{}_{jhk}, \]
then we can show that $R_{ijhk}$ is skew-symmetric in the first pair of indices as well. To prove
this consider equation (3.97) with $D^{i}{}_{j}$ replaced by $g_{ij}$; this yields
\[ g_{ij;hk} - g_{ij;kh} = -R^{m}{}_{ihk}\, g_{mj} - R^{m}{}_{jhk}\, g_{im}, \tag{3.99} \]
as the counterpart of (3.97). But $g_{ij;h} = 0$ from (3.64) so that
\[ R_{jihk} = R^{m}{}_{ihk}\, g_{mj} = -R^{m}{}_{jhk}\, g_{im} = -R_{ijhk}. \tag{3.100} \]

(c) The curvature tensor also obeys the "cyclic identity":
\[ R^{i}{}_{jhk} + R^{i}{}_{kjh} + R^{i}{}_{hkj} = 0, \tag{3.101} \]
often referred to as the Jacobi identity.

(d) Finally, $R_{ijhk}$ is symmetric under an interchange of the first and second pair of indices
\[ R_{ijhk} = R_{hkij}. \tag{3.102} \]
To prove (3.102) one has to use the Christoffel relation (3.67) first to show that
\[
\begin{aligned}
R_{ijhk} ={}& \tfrac{1}{2}\, g_{im}\, \frac{\partial}{\partial x^{k}} \bigl[ g^{mr}\,(g_{rj,h} + g_{hr,j} - g_{jh,r}) \bigr] \\
 &- \tfrac{1}{2}\, g_{im}\, \frac{\partial}{\partial x^{h}} \bigl[ g^{mr}\,(g_{rj,k} + g_{kr,j} - g_{jk,r}) \bigr] \\
 &+ g_{im}\,\bigl(\Gamma^{n}{}_{jh}\, \Gamma^{m}{}_{nk} - \Gamma^{n}{}_{jk}\, \Gamma^{m}{}_{nh}\bigr),
\end{aligned} \tag{3.103}
\]
and then show that this simplifies to
\[
\begin{aligned}
R_{ijhk} ={}& \tfrac{1}{2}\,(g_{hi,jk} + g_{jk,ih} - g_{jh,ik} - g_{ki,jh}) \\
 &+ g_{nm}\,\bigl(\Gamma^{m}{}_{jk}\, \Gamma^{n}{}_{ih} - \Gamma^{m}{}_{jh}\, \Gamma^{n}{}_{ik}\bigr).
\end{aligned} \tag{3.104}
\]
The identity (3.102) now follows directly from (3.104). In fact, equation (3.104) separates
the second derivatives of $g_{ij}$ from the first derivatives in the expression for the curvature
tensor and this is useful in the theory of gravitation.

Independent components of $R_{ijhk}$

With the aid of the symmetry properties (3.98), (3.100), (3.101) and (3.102) we can evaluate
the number of independent components of the curvature tensor. We recall that an N × N
skew-symmetric matrix has $\tfrac{1}{2}N(N-1)$ independent entries and an N × N symmetric matrix has
$\tfrac{1}{2}N(N+1)$ independent entries. If we look at $R_{ijhk}$ then, since it is skew in both pairs of indices,
we have that there are $\tfrac{1}{2}N(N-1)$ independent components for each of the pairs i, j and h, k. Since
the tensor is symmetric under an interchange of the pairs of indices, the independent components are
reduced to
\[ \frac{1}{2} \cdot \frac{1}{2}N(N-1) \left[ \frac{1}{2}N(N-1) + 1 \right]. \]
As yet we have not used the cyclic identity (3.101). It is not difficult to see that unless i, j, h, k are
all different, this identity is included in one of the other three. It follows that the number of new
identities provided by (3.101) is simply equal to the number of combinations of N objects taken 4 at a
time, that is,
\[ {}^{N}C_{4} = \frac{N!}{4!\,(N-4)!} = \frac{1}{24}\, N(N-1)(N-2)(N-3), \]
and if we subtract this from the expression above we obtain
\[ \frac{1}{12}\, N^{2}(N^{2}-1), \tag{3.105} \]
as the total number of independent components of $R_{ijhk}$. For example, when N = 4 as in the case of
relativity, there are 20 independent components of $R_{ijhk}$.
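The counting argument is easy to check by machine. The short sketch below follows the steps of the argument (skew pairs, symmetric matrix on pairs, subtraction of the cyclic identities) and compares the result with the closed form (3.105); the function names are illustrative.

```python
from math import comb

def riemann_independent_components(n):
    """Count independent components by following the argument in the text."""
    pairs = n * (n - 1) // 2               # independent skew index pairs (i, j)
    symmetric = pairs * (pairs + 1) // 2   # symmetric "matrix" on pairs of pairs
    return symmetric - comb(n, 4)          # remove the new cyclic identities

def closed_form(n):
    """Closed form (3.105): N^2 (N^2 - 1) / 12."""
    return n * n * (n * n - 1) // 12
```

For N = 1, 2, 3, 4 both functions give 0, 1, 6 and 20 components respectively.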

Curvature is intrinsic. One can easily show that a Riemannian space is Euclidean if and only if
the curvature tensor vanishes everywhere (see exercises). Thus the vanishing or non-vanishing
of the curvature tensor determines whether space is flat or curved. Now the curvature of space
is an intrinsic quantity and should not be confused with the curvature of a curve, which is an
extrinsic quantity. If we look at the expression for the number of independent components of the
curvature tensor and put N = 1 (a curve is a 1-dimensional space) in (3.105) we see there are
zero components of the curvature tensor, i.e., there is no such thing as the intrinsic curvature
of a curve: the curvature must be measured by going outside of the curve. In many ways the
curvature tensor is a more natural way of looking at curvature than viewing it from outside the
space.

Differential identities of $R_{ijhk}$

In addition to the algebraic identities, the curvature tensor also satisfies differential identities
called the Bianchi identities and these are given by
\[ R^{i}{}_{jhk;m} + R^{i}{}_{jmh;k} + R^{i}{}_{jkm;h} = 0, \tag{3.106} \]
or, equivalently,
\[ R_{ijhk;m} + R_{ijmh;k} + R_{ijkm;h} = 0. \tag{3.107} \]
Note that the above equations are indeed equivalent because lowering the index i is brought
about by multiplying through by the metric tensor and, since the covariant derivative of the
metric tensor is zero, we are entitled to multiply "under the semi-colon". There are many ways
of proving (3.106); perhaps the easiest is to work at the pole of a Riemannian coordinate system.
Recall that in this system the $\Gamma^{i}{}_{jk}$ are zero but the $\Gamma^{i}{}_{kj,h}$ are not zero, and that covariant
derivatives reduce to ordinary derivatives at the pole.

Contraction of $R_{ijhk}$

At first sight it would appear that there are 6 contractions of the curvature tensor, namely
$g^{ij}R_{ijhk}$, $g^{ih}R_{ijhk}$, $g^{ik}R_{ijhk}$, $g^{jh}R_{ijhk}$, $g^{jk}R_{ijhk}$ and $g^{hk}R_{ijhk}$. It turns out, using the symmetry
properties of $g^{ij}$ and $R_{ijhk}$, that all these reduce to only one contraction, namely,
\[ g^{ih} R_{ijhk} = R^{h}{}_{jhk} \equiv R_{jk}, \tag{3.108} \]
and this defines a second rank tensor $R_{jk}$ called the Ricci tensor. We now show that there is
indeed only one contraction. Firstly $g^{ij}R_{ijhk}$ and $g^{hk}R_{ijhk}$ are both identically zero because of
the skew-symmetry of $R_{ijhk}$ in both pairs of indices. Secondly
\[ g^{ik} R_{ijhk} = -g^{ik} R_{ijkh} = -R_{jh}, \]
so that this contraction is the negative of the Ricci tensor. By the same token $g^{jh}R_{ijhk}$ is also
the negative of the Ricci tensor and $g^{jk}R_{ijhk}$ is equal to the Ricci tensor. It is easy to show that
the Ricci tensor is symmetric: from (3.108)
\[ R_{kj} = g^{ih} R_{ikhj} = g^{ih} R_{hjik} = R_{jk}, \tag{3.109} \]
where we used the property that $R_{ijhk}$ is symmetric under the interchange of pairs of indices.
We can contract the Ricci tensor to obtain a scalar
\[ R = g^{jk} R_{kj}, \tag{3.110} \]
called the curvature scalar or curvature invariant.


Let us now return to the Bianchi identities (3.107) and multiply through by $g^{im}$ to obtain
\[ g^{im} R_{ijhk;m} + g^{im} R_{ijmh;k} - g^{im} R_{ijmk;h} = 0. \tag{3.111} \]
This can be written as
\[ g^{im} R_{ijhk;m} + R_{jh;k} - R_{jk;h} = 0. \tag{3.112} \]
If we contract (3.112) by multiplying by $g^{jk}$ (using the fact that $R_{ijhk} = R_{jikh}$) we obtain
\[ g^{im} R_{ih;m} + g^{jk} R_{jh;k} - R_{;h} = 0, \tag{3.113} \]
or, since the first two terms are equal,
\[ g^{jk} R_{jh;k} - \tfrac{1}{2}\, R_{;h} = 0, \tag{3.114} \]
which we can write as
\[ \bigl(R^{k}{}_{h} - \tfrac{1}{2}\, \delta^{k}_{h}\, R\bigr)_{;k} = 0. \tag{3.115} \]
These are called the contracted Bianchi identities. The tensor in the bracket
\[ G^{k}{}_{h} = R^{k}{}_{h} - \tfrac{1}{2}\, \delta^{k}_{h}\, R \tag{3.116} \]
is called the Einstein tensor. Notice that equation (3.115) says that the covariant divergence of
the Einstein tensor is zero, i.e.,
\[ G^{k}{}_{h;k} = 0. \tag{3.117} \]
This does not mean that the covariant derivatives $G^{k}{}_{h;j}$ are zero: there must be a contraction
over the differentiating index and one of the indices of $G_{hk}$ (it does not matter which one since
$G_{hk} = G_{kh}$). Also notice that the Einstein tensor, being a function of the Ricci tensor $R_{ij}$ and
curvature scalar R, which are contractions of the curvature tensor $R^{i}{}_{jhk}$, is a function of $g_{ij}$, $g_{ij,h}$
and $g_{ij,hk}$. This tensor plays a fundamental role in the general theory of relativity.

The Weyl tensor

Consider a count of components for the case where N = 4. Here $R_{ijhk}$ has 20 independent
components and $R_{jk}$ has 10. This means that we should be able to decompose $R_{ijhk}$ into
terms consisting of $R_{jk}$ and some other tensor which has 10 independent components. This
decomposition is given by
\[
\begin{aligned}
R_{ijhk} ={}& \tfrac{1}{2}\,(g_{ih} R_{jk} - g_{ik} R_{jh} - g_{jh} R_{ik} + g_{jk} R_{ih}) \\
 &- \tfrac{1}{6}\, R\,(g_{ih} g_{jk} - g_{ik} g_{jh}) + C_{ijhk}.
\end{aligned} \tag{3.118}
\]
This equation can be regarded as a definition of the tensor $C_{ijhk}$, which is called the Weyl tensor
or the conformal tensor. It can be shown (see exercises) that the Weyl tensor has precisely the
same symmetries as the curvature tensor and in addition the equivalent of the Ricci tensor for
$C_{ijhk}$ is zero, i.e.,
\[ g^{ih} C_{ijhk} = C^{h}{}_{jhk} = 0. \tag{3.119} \]
This represents 10 conditions on $C_{ijhk}$ so that it has 10 independent components. The Weyl
tensor also plays an important role in relativity.

3.8 Conclusion

The mathematical machinery of dealing with tensors is quite formidable. There are many
important equations in this chapter, but few need to be memorized. It is far more important to
understand their derivation. The main aim of the chapter is to give students an idea of what
the mathematics means.

Exercises

3.1 If $A_{i}$ is a covariant vector field, show that $A_{i,j} - A_{j,i}$ is a skew-symmetric second rank tensor
field.

3.2 Suppose that $\phi = B^{i} A_{i}$ is an invariant for an arbitrary contravariant vector $B^{i}$. Show that $A_{i}$
transforms like a covariant vector.

3.3 Show that $A_{i,j} - A_{j,i} = A_{i;j} - A_{j;i}$ where $A_{i}$ is a covariant vector.

3.4 Suppose that $\phi = B^{i} B^{j} A_{ij}$ is an invariant for an arbitrary contravariant vector $B^{i}$. Show
that $A_{(ij)}$ (that is, the symmetric part of $A_{ij}$) is a second rank covariant tensor.

3.5 If $A_{ij}$ is a skew-symmetric covariant tensor, verify that
\[ B_{ijk} = A_{ij,k} + A_{ki,j} + A_{jk,i} \]
transforms as a tensor.

3.6 If $A_{ij}$ is symmetric, prove that $A_{ij;k}$ is symmetric in the indices i and j.

3.7 The object $\gamma^{i}{}_{jk}$ is an affine connection which is not symmetric in j and k ($\gamma^{i}{}_{jk}$ and $\Gamma^{i}{}_{jk}$
have the same transformation properties). Show that $\gamma^{i}{}_{[jk]}$ is a (1,2) tensor.

3.8 Show that the metric on a sphere of radius a is given by
\[ ds^{2} = a^{2}\,(d\theta^{2} + \sin^{2}\theta\, d\phi^{2}). \]

3.9 Show that the auto-parallels (i.e., curves whose tangent vectors are parallel) of a Riemannian
space are also the geodesics of the space.

3.10 Show that
\[ \frac{d^{2}x^{i}}{ds^{2}} + \Gamma^{i}{}_{jk}\, \frac{dx^{j}}{ds}\, \frac{dx^{k}}{ds} \]
is a tensor expression, that is, it transforms as the components of a type (1,0) tensor.

3.11 Prove that a Riemannian space is Euclidean if and only if the curvature tensor vanishes
everywhere.

3.12 Write down the metric of a sphere of radius a and hence calculate the curvature tensor,
the Ricci tensor and the curvature scalar for the sphere.

3.13 An N-dimensional Riemannian space has the following metric
\[ ds^{2} = e^{\lambda}\, dx^{i}\, dx^{i}, \]
where $\lambda = \lambda(x^{k})$. Show that the only non-zero Christoffel symbols of the second kind are
\[ \Gamma^{i}{}_{jj} = \bigl(\delta^{i}_{j} - \tfrac{1}{2}\bigr)\lambda_{,i}, \qquad \Gamma^{i}{}_{ij} = \tfrac{1}{2}\lambda_{,j} \qquad \text{(no summation)}. \]
Deduce that the scalar curvature of this space is given by
\[ R = (N-1)\, e^{-\lambda}\, \bigl[\lambda_{,ii} + \tfrac{1}{4}(N-2)\, \lambda_{,i}\lambda_{,i}\bigr] \qquad \text{(summed over } i\text{)}, \]
where $\lambda_{,i} = \partial\lambda/\partial x^{i}$ and $\lambda_{,ii} = \partial^{2}\lambda/\partial x^{i}\partial x^{i}$.

3.14 Two N-dimensional Riemannian spaces M and M̄ have the metric tensors $g_{ij}$ and $\bar{g}_{ij}$, k
being a constant. What is the relationship between the curvature tensors, Ricci tensors,
curvature scalars and Einstein tensors of the two spaces?

3.15 Show that the Weyl tensor has the same symmetries as the curvature tensor and show
further that all contractions of the Weyl tensor are zero.


Chapter 4

Special Relativity

4.1 Introduction

In the first part of this chapter we introduce aspects of relativity that deal with the motion of
bodies without reference to the forces which produce those motions. Such a study is referred
to as a kinematic study. We focus on a relativistic treatment of kinematic properties such as
position, velocity and acceleration. Here we revisit the classical approach of Newtonian mechanics
and highlight the crucial role of an inertial reference frame. The central theme throughout this
chapter is a comparative analysis of measurements (of kinematic quantities) by observers in
relative uniform motion. In classical mechanics the equations that connect observations in inertial
reference frames are the Galilean transformation equations.
We show that the Galilean transformation equations do not extend to cover electromagnetic
theory, and indeed the rest of physics. A new kinematic approach was introduced by Einstein
in 1905. One of the postulates of Einstein’s theory is that the laws of physics are valid in all
inertial frames. This theory deals only with inertial frames (i.e., it excludes gravitation) and is called the special theory of
relativity. The equations that relate measurements by inertial observers in special relativity are
the Lorentz transformation equations and form the main theme of the last part of this chapter.
The aim of the second part of this chapter is to obtain a generalization of non-relativistic
Newtonian dynamics to relativistic dynamics; the latter are valid under circumstances in which
relative speeds are of the order of the speed of light c. It may be best to re-iterate the notion
that a law of physics is a statement of the relationship between different physical entities, each
of which can be experimentally measured, at least in principle:
• In this context the principle of relativity implies that it is possible to state a physical law
in such a manner that the statement takes the same form relative to every inertial system.
• The principle of the invariance of the speed of light implies that the different inertial systems
are related through the Lorentz transformation equations.
In order that an equation be compatible with the principle of relativity, it must be possible
to express such an equation in the same form relative to every inertial reference frame. Thus the
form of the equation must stay the same under the Lorentz transformations. Such equations are
said to be covariant under the Lorentz transformations. It follows that any equation that is a
statement of a law of physics must be covariant under Lorentz transformations, or the equations

would violate the principle of relativity.


We focus entirely on the dynamics of a single particle. This we do for the simple reason
that Newtonian dynamics of a single particle is well understood and one is able to develop a
relativistic treatment from a purely mathematical approach. Admittedly this approach loses out
on most of the interesting physics. For example the notion of energy-momentum conservation
does not feature here. We also miss out on all the physics associated with elastic collisions and
inelastic collisions. However from the mathematical treatment we expect to be able to handle
mathematical manipulations of the dynamic variables introduced. In this regard the application
of the Lorentz transformations is paramount and various exercises are used throughout this
chapter to emphasize this point.

4.2 Newtonian relativity

Very briefly the situation in physics just prior to 1905 was the following. Newton’s first law (the
law of inertia) states that if no forces act on a body the body will either stay at rest or move
with uniform motion in a straight line (i.e. it will not undergo any acceleration).
It should be perfectly clear that an observer who finds Newton’s first law to be correct is a
very special observer. For example, a man on a trampoline or even a man fixed to the surface of
the earth would observe accelerations in a body on which no forces were acting. However, Newton
himself believed that there were observers or frames of reference in which his first law would be
correct and such observers are called inertial observers. A frame of reference is a conventional
standard of rest relative to which measurements can be made and experiments described. Thus
an inertial frame of reference is a frame in which the law of inertia – Newton’s first law – holds.
The objects whose motion we study may be accelerating with respect to such frames but the
frames themselves are unaccelerated. The geometry of the spatial relations in inertial reference
frames is Euclidean. This immediately excludes gravity since the latter destroys Euclidicity (as
shown by Einstein in his general relativity theory, the details of which are beyond the scope
of this module). Hence inertial frames are free-falling frames. Most of us have seen televised
pictures of space capsules in which astronauts are weightless. Such capsules are, to a good
approximation, inertial reference frames.

Galilean transformations

It is trivial to work out the transformation law connecting two inertial frames. Consider an
inertial frame S, represented by cartesian coordinates (x, y, z), and another inertial frame S̄,
represented by (x̄, ȳ, z̄), which moves at a constant velocity v relative to S. Observers in the two
inertial frames use meter sticks, which have been compared and calibrated against one another,
and clocks which have been synchronized and calibrated against one another. The classical
procedure, (which we will re-visit later on) assumes that length measurements and time intervals
are absolute, i.e. they are the same for all inertial observers.
Figure 4.1: Inertial reference frames S and S̄ as seen by S.

For convenience, we choose the three sets of axes to be parallel and allow their relative motion
to be along the common x, x̄ axis that is, v = (v, 0, 0) (see fig. 4.1). Hence, if we assume that
the origins of S and S̄ coincide at time t = 0, t̄ = 0 then a given event will have the coordinates
(x, y, z, t) and (x̄, ȳ, z̄, t̄) in S and S̄ respectively, where

\[ \bar{x} = x - vt, \qquad \bar{y} = y, \qquad \bar{z} = z, \qquad \bar{t} = t. \tag{4.1} \]

These are the Galilean coordinate transformation equations. It follows directly from these
transformations that
\[ \dot{\bar{x}} = \dot{x} - v, \qquad \ddot{\bar{x}} = \ddot{x}, \tag{4.2} \]
where we use the notation $\dot{x} = dx/dt$. In other words, the spatial coordinates ($x$ and $\bar{x}$) and
the velocities ($\dot{x}$ and $\dot{\bar{x}}$) depend on the reference frame in which they are measured whereas
acceleration does not. We say that position and velocity are relative quantities whilst time and
acceleration are absolute.
One can show that quantities such as mass and force are unaffected by the Galilean
transformation; thus they too are absolute, so that inertial observers will agree on the validity
of Newton’s second law and hence will agree on all dynamical processes. We can summarize
these statements in the so-called Newtonian principle of relativity viz. all inertial observers are
equivalent in so far as dynamical experiments are concerned. In other words, on the basis of
purely dynamical experiments it is not possible to determine a preferred frame of reference (e.g.
a frame at absolute rest) in which the equations of motion assume a particular simple form. For
example imagine a man in an idealized train i.e., there are no bumps or curves in the track, the
compartment is soundproof and the blinds are drawn so that the man cannot see outside. If
the train travels with constant velocity then the man will be unable to determine whether he is
moving or not.

Example 4.2.1

Two electrons are ejected in opposite directions from radioactive atoms in a sample of radioactive
material at rest in the laboratory. Each electron has a speed of 0.67c as measured by a
laboratory observer. What is the speed of one electron as measured from the other, according to
the Galilean velocity addition theorem?

Solution. Here we may regard one electron as the S frame, the laboratory as the S̄ frame, and
the other electron as the object whose speed is sought. In the S̄ frame, the object's speed is
0.67c, moving, say, in the positive x̄ direction, and the speed of the S frame is 0.67c moving in
the negative x̄ direction, so that S̄ moves with velocity +0.67c relative to S. Thus $\bar{u}_x = +0.67c$
and $v = +0.67c$, so that the other electron's speed
with respect to the S frame is $u_x = \bar{u}_x + v = 1.34c$.
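The arithmetic of the example can be expressed as a minimal sketch of (4.1) and (4.2). Velocities are measured in units of c, and the function names are illustrative only.

```python
def galilean_boost(x, t, v):
    """Coordinates (x̄, t̄) of the event (x, t) in a frame moving at velocity v along x, from (4.1)."""
    return x - v * t, t

def galilean_velocity(u, v):
    """Velocity of an object in a frame moving at velocity v, given its velocity u in the original frame, from (4.2)."""
    return u - v

# In the laboratory one electron moves at +0.67c and the other at -0.67c.
# The rest frame of the second electron moves at v = -0.67c relative to the laboratory.
relative_speed = galilean_velocity(0.67, -0.67)
```

The result is 1.34 in units of c, a speed greater than that of light, which is exactly what the Galilean addition theorem allows and special relativity will forbid.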

4.3 Maxwell’s equations

This situation did not end with dynamics. Attempts to determine a frame at absolute rest on the
basis of electromagnetic experiments also failed. Electromagnetic theory was developed in the
nineteenth century and the fundamental equations of this theory are called Maxwell’s equations:

\[ \epsilon_0\, \operatorname{div} \mathbf{E} = \rho, \tag{4.3} \]
\[ \operatorname{div} \mathbf{B} = 0, \tag{4.4} \]
\[ \operatorname{curl} \mathbf{B} = \mu_0\,(\mathbf{j} + \epsilon_0\, \partial\mathbf{E}/\partial t), \tag{4.5} \]
\[ \operatorname{curl} \mathbf{E} = -\partial\mathbf{B}/\partial t, \tag{4.6} \]

where ρ is the charge density, j is the current density, and $\epsilon_0$, $\mu_0$ are constants. These equations
tell us how electric charges and currents produce electric (E) and
magnetic (B) fields. Electric fields are produced by electric charges, both stationary and moving
(4.3), and also by varying magnetic fields (4.6). Magnetic fields are somewhat different. There are
no magnetic charges in nature (4.4) but a moving electric charge produces a magnetic field as well
as an electric field (4.5). In addition, a varying electric field produces a magnetic field (4.5). The
process by which a varying electric field creates a varying magnetic field which creates a varying
electric field and so on, is the mechanism by which electromagnetic radiation is propagated. All
this information is contained in Maxwell's equations. One of the consequences of Maxwell's
equations is that all forms of electromagnetic radiation (such as light, X-rays, radio waves, etc.)
are propagated with the same velocity, which is usually denoted by c ($c \approx 3 \times 10^{8}$ m/s).
In the nineteenth century it was believed that all forms of radiation required a medium for
their transmission, and since light travelled through vacuum without any difficulty, a "substance"
called ether was introduced as the required medium. The ether was assumed to fill all space and
to be the medium with respect to which light propagated at the speed c. It followed then that
an observer moving through the ether with velocity v would measure a velocity c̄ for a beam of light,
where c̄ = c − v if the beam of light is propagated in the same direction as v, or c̄ = c + v if the
beam of light is propagated in the opposite direction to v.
Experiments were devised to detect this motion through the ether. The most notable was
performed by Michelson and Morley in 1887. No
evidence of the ether stream was detected. The idea was then put forward that the velocity
of light must be dependent upon the velocity of the source relative to the observer. However,
by observing light emitted from a double star it was found that the light rays emitted by the
approaching and the receding components of the star had the same velocity, so this hypothesis had
to be abandoned. Clearly this contradicts the Galilean transformation equations. Galilean
relativity does apply to Newton's laws of mechanics but not to Maxwell's laws of electrodynamics.

4.4 Einstein’s postulates

Both Newton's laws and Maxwell's laws prove to be experimentally correct. What is needed
is a new theory that encompasses both Newtonian and Maxwellian laws. Consider again the man in the
idealized train. It is not surprising that he cannot detect his uniform motion, since in fact he
cannot perform a purely dynamical experiment: whatever apparatus he decides to use, he must
use eyesight to observe the experiment, so that light propagation enters the picture. Einstein
proposed two postulates on which the special theory of relativity is based:

• The Principle of the Constancy of the Speed of Light: The speed of light in vacuum
has the same value c in all inertial systems.

• The Principle of Relativity: The laws of physics are the same in all inertial systems.
No preferred inertial system exists.

Einstein's relativity principle goes beyond the Newtonian relativity principle, which dealt
only with the laws of mechanics, to include all the laws of physics. The principle of the constancy
of the speed of light flatly contradicts the Galilean velocity transformation (4.2) and requires a
new set of transformation equations. The rest of the chapter is devoted to these crucial
transformation equations.

4.5 The Lorentz transformation equations

Consider two inertial reference frames S and S̄ having a relative velocity v. Each frame has its
own meter sticks and synchronized clocks. The observers note two lightning bolts, each hitting
and leaving permanent marks in the frames. Assume that each inertial observer was located
exactly at the midpoint of the marks left on his reference frame. In fig 4.2 the marks are left at
A, B on the S-frame and at Ā, B̄ on the S̄-frame, the observers being at O and Ō. Suppose that
O finds that the lightning bolts struck simultaneously. Then fig 4.2 shows that the lightning
bolts are not simultaneous to Ō. We could equally well have supposed that the lightning bolts struck so
that Ō found them to be simultaneous, in which case O would have concluded that they were not
simultaneous. Hence the concept of simultaneity, and hence time, is a relative concept, not an
absolute one.

Figure 4.2: The point of view of the S frame, the S̄ frame moving to the right. A light wave leaves
A, Ā and B, B̄ in (a). Successive drawings correspond to the assumption that the event AĀ and the
event BB̄ are simultaneous in the S frame. In (b) one wavefront reaches Ō. In (c) both wavefronts
reach O. In (d) the other wavefront reaches Ō.

Some other conclusions arise from the relativity of simultaneity. To measure the length of an
object means to locate its endpoints simultaneously. Thus length measurement is also relative.

Perhaps the easiest way of seeing that both length and time are relative concepts is by
converting “lengths” into “time” or vice versa using the constancy of the velocity of light. A
typical example of such a procedure is the notion of a “light year”. This procedure also illustrates
the fact that there is little virtue in separating the role of distance and time in the theory of
relativity.

It is obvious from what has been said thus far that the precise location of an event requires
three spatial coordinates (x, y, z) and the time t at which the event occurs. Hence, in future, we
shall simply label an event by 4 coordinates (x, y, z, t).

We have already mentioned that the Galilean transformation of classical mechanics is quite
evidently incorrect in so far as the postulates of special relativity are concerned and the problem
at hand is to determine the transformation law from one inertial system to another which is in
accordance with these postulates.

Derivation

Consider two inertial frames S and S̄ having a relative velocity as discussed previously (see
fig. 4.1) and we assume as before that the origins O, Ō of S, S̄ coincide at t = t̄ = 0 and at that
instant we suppose that a light signal is emitted from the coinciding points O, Ō. An observer
situated at O will observe a spherical wave-front emanating from O and the equation of the
wave-front will be
$$x^2 + y^2 + z^2 = c^2 t^2 \tag{4.7}$$

where, as usual, c is the velocity of light. In view of the postulate concerning the constancy of
the speed of light, an observer at Ō will also observe a spherical wave-front emanating from Ō
satisfying the equation
$$\bar{x}^2 + \bar{y}^2 + \bar{z}^2 = c^2 \bar{t}^2. \tag{4.8}$$

The fact that both observers see spherical wave-fronts is quite plausible if one bears in mind the
relativity of simultaneity i.e., different points which are reached by the wave-front in S are not
necessarily reached simultaneously in S̄. We now assume that the transformation that we are
seeking is linear. This assumption can be justified by the requirement that uniform rectilinear
motion in S also be uniform and rectilinear in S̄. Now in view of the special construction of S
and S̄ we have, additionally, that
ȳ = y, z̄ = z. (4.9)

So now the linear transformation can be written as

x̄ = Ax + Bt, (4.10)
t̄ = Cx + Dt (4.11)

where A, B, C, D are some functions depending on v—the velocity of S̄ relative to S.


Consider the origin Ō as viewed from O. In this case x̄ = 0 and x = vt so that (4.10) becomes

0 = A(vt) + Bt, B = −Av. (4.12)

Now consider the origin O as viewed from Ō. Here we have x = 0 and x̄ = −v t̄ so that (4.10)
and (4.11) become

−v t̄ = Bt,
t̄ = Dt

from which it follows that


B = −Dv (4.13)

and from (4.12) we get


A = D. (4.14)

Substituting (4.12), (4.13), (4.14) in (4.10),(4.11) yields

x̄ = A(x − vt), (4.15)


t̄ = Cx + At. (4.16)

To determine the constants A and C we return to the spherical wave-fronts (4.7),(4.8) emanating
from O and Ō respectively. If we now substitute (4.9), (4.15) and (4.16) into (4.8) we get

$$A^2 (x - vt)^2 + y^2 + z^2 = c^2 (Cx + At)^2. \tag{4.17}$$

Rearranging the terms gives us

$$(A^2 - c^2 C^2)x^2 + y^2 + z^2 - 2(vA^2 + c^2 AC)xt = A^2 (c^2 - v^2)t^2. \tag{4.18}$$

In order for (4.18) to agree with (4.7), which represents the same thing, we must have

$$A^2 - c^2 C^2 = 1, \tag{4.19}$$
$$vA^2 + c^2 AC = 0, \tag{4.20}$$
$$A^2 (c^2 - v^2) = c^2. \tag{4.21}$$

From (4.21) it follows that A = 1/√(1 − v²/c²), and using this in (4.20) gives C = −v/(c²√(1 − v²/c²)),
with (4.19) identically satisfied. Now by substituting these values into (4.15), (4.16) and incorporating
(4.9), we obtain the sought-after equations,

$$\bar{x} = \frac{x - vt}{\sqrt{1 - v^2/c^2}}, \tag{4.22}$$
$$\bar{y} = y, \tag{4.23}$$
$$\bar{z} = z, \tag{4.24}$$
$$\bar{t} = \frac{t - (v/c^2)x}{\sqrt{1 - v^2/c^2}}. \tag{4.25}$$

We call this the special or 2-dimensional Lorentz transformation. Note that equations (4.10),
(4.11) can be written in matrix form as

$$\begin{bmatrix} \bar{x} \\ \bar{t} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ t \end{bmatrix}$$

and that the 2-dimensional Lorentz transformation may be regarded as a rotation of the
2-dimensional spacetime axes x and t.
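The transformation (4.22)–(4.25) can be checked numerically. The following is a minimal Python sketch added for illustration; the function names `gamma`, `lorentz` and `lorentz_inverse`, and the sample event, are assumptions and do not come from the guide. It transforms an event to the S̄ frame and then recovers the original coordinates by changing v to −v.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v, c=C):
    """Factor 1/sqrt(1 - v^2/c^2) appearing in (4.22) and (4.25); needs |v| < c."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def lorentz(x, y, z, t, v, c=C):
    """Equations (4.22)-(4.25): coordinates of the event in the barred frame."""
    g = gamma(v, c)
    return g * (x - v * t), y, z, g * (t - v * x / c**2)


def lorentz_inverse(x_bar, y_bar, z_bar, t_bar, v, c=C):
    """Inverse transformation: the same equations with v replaced by -v."""
    return lorentz(x_bar, y_bar, z_bar, t_bar, -v, c)


# Illustrative event (metres and seconds) and relative velocity v = 0.6c.
event = (1000.0, 2.0, 3.0, 1.0e-5)
v = 0.6 * C
barred = lorentz(*event, v)
print("event in S-bar:", barred)
print("back in S:     ", lorentz_inverse(*barred, v))
```

The round trip returns the starting coordinates up to floating-point rounding, which is the first of the two tests the equations are put through in the Exercise that follows.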
In general a linear transformation that keeps the form of the equation of the wave-front
coordinate-invariant satisfies the condition

$$c^2 (t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2 \tag{4.26}$$

$$= c^2 (\bar{t}_2 - \bar{t}_1)^2 - (\bar{x}_2 - \bar{x}_1)^2 - (\bar{y}_2 - \bar{y}_1)^2 - (\bar{z}_2 - \bar{z}_1)^2, \tag{4.27}$$

where the two events (x₁, y₁, z₁, t₁) and (x₂, y₂, z₂, t₂) relative to one inertial system are related
to the coordinates of the same two events, (x̄₁, ȳ₁, z̄₁, t̄₁) and (x̄₂, ȳ₂, z̄₂, t̄₂), in another inertial
reference system. Such a transformation is called a 4-dimensional Lorentz transformation. One
may write (4.27) as

$$ds^2 = c^2 (t_2 - t_1)^2 - (x_2 - x_1)^2 - (y_2 - y_1)^2 - (z_2 - z_1)^2 \tag{4.28}$$

$$= c^2 (\bar{t}_2 - \bar{t}_1)^2 - (\bar{x}_2 - \bar{x}_1)^2 - (\bar{y}_2 - \bar{y}_1)^2 - (\bar{z}_2 - \bar{z}_1)^2, \tag{4.29}$$

where ds is the interval between the two events. The interval ds between neighboring events on a
spacetime curve representing a photon history is zero; such spacetime curves are called null curves.
In this manner we are then able to write a metric as follows; first use the notation

$$x^0 = ct, \quad x^1 = x, \quad x^2 = y, \quad x^3 = z$$

then the interval (metric), in Cartesian coordinates, takes the form

$$ds^2 = \eta_{\mu\nu}\, dx^\mu dx^\nu, \tag{4.30}$$

on using the summation convention, where


 
$$\eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{4.31}$$
The metric of the above form is called a Lorentz metric.
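As an illustration of (4.30) and (4.31), the short Python sketch below evaluates the interval for a displacement (c dt, dx, dy, dz) and checks that it is unchanged by a boost along x. The names `ETA`, `interval_squared` and `boost_x` are hypothetical, and the boost is written with β = v/c so that all four components carry units of length.

```python
# Lorentz metric (4.31) with index order (x0, x1, x2, x3) = (ct, x, y, z).
ETA = [
    [1, 0, 0, 0],
    [0, -1, 0, 0],
    [0, 0, -1, 0],
    [0, 0, 0, -1],
]


def interval_squared(dx):
    """Return eta_{mu nu} dx^mu dx^nu, summing over both indices as in (4.30)."""
    return sum(ETA[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))


def boost_x(dx, beta):
    """Apply (4.22)-(4.25) to a displacement, with beta = v/c and x0 = ct."""
    g = 1.0 / (1.0 - beta**2) ** 0.5
    ct, x, y, z = dx
    return (g * (ct - beta * x), g * (x - beta * ct), y, z)


displacement = (5, 3, 1, 2)
print(interval_squared(displacement))
print(interval_squared(boost_x(displacement, 0.8)))
print(interval_squared((1, 1, 0, 0)))  # a light ray along x: null displacement
```

The two printed intervals for `displacement` agree up to rounding, and the last line vanishes, as expected for neighboring events on a photon history.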
Before looking at the meaning of these equations, we first put them through two essential
tests. First, if we were to exchange our frames of reference, the only change required by relativity
is the physical one of a change in relative velocity from v to −v. That is, from the S̄ frame the
S frame moves to the left whereas from S the S̄ frame moves to the right.

Exercise:

(a) Show that solving equations (4.22)–(4.25) for (x, y, z, t) in terms of the barred coordinates
yields

$$x = \frac{\bar{x} + v\bar{t}}{\sqrt{1 - v^2/c^2}}, \tag{4.32}$$
$$y = \bar{y}, \tag{4.33}$$
$$z = \bar{z}, \tag{4.34}$$
$$t = \frac{\bar{t} + (v/c^2)\bar{x}}{\sqrt{1 - v^2/c^2}}, \tag{4.35}$$

which are identical in form to (4.22)–(4.25) except that, as required, v changes to −v. Another
requirement is that for speeds v small compared to c the Lorentz equations should reduce
to the correct Galilean transformation equations.

(b) Show also that this is the case, that is, when v/c ≪ 1, equations (4.22)–(4.25) become
equations (4.1).
Let us now examine some of the consequences of the Lorentz transformation. The sort of
problem we are going to be concerned with is the following. Suppose S and S̄ are two inertial
frames with a relative velocity as described earlier, and suppose that they are each supplied with a
measuring rod and a clock, where we assume the clocks are identical. We would like to know what
will be the outcome of observations made by both S and S̄ on the other’s measuring rod and
clock. As one might expect, the results are all contained in the Lorentz transformation equations
(4.22)–(4.25) and (4.32)–(4.35), but it is of utmost importance that the correct equation be
chosen and the reason for the choice be understood completely. To do this we must say exactly
what we mean by, for example, the measurement by S̄ of the length of the measuring rod which
is at rest relative to S.

Simultaneity

Consider two inertial frames S and S̄ having a relative velocity as in fig 4.1. Two events are said to
be simultaneous in S if they both occur at time t = t₀. However, for these events the Lorentz
equation (4.25) shows that

$$\bar{t} = \frac{t_0 - (v/c^2)x}{\sqrt{1 - v^2/c^2}}, \tag{4.36}$$

so the times measured in S̄ depend on the position x in S. Events that occur at different positions x
are therefore not simultaneous in S̄.

Length contraction

Let us commence with this precise example. Suppose that the measuring rod at rest relative to
S has length L in S’s reference frame, i.e. if the coordinates of the endpoints of the rod are x₁ and
x₂ then x₂ − x₁ = L. We notice that L is independent of S’s time t. Suppose that S̄ wishes to
measure the length of this rod. We define the length of the rod according to S̄ to be L̄ ≡ x̄₂ − x̄₁
where both x̄₁ and x̄₂ are to be recorded by S̄ at the same instant of time t̄. Now the Lorentz
transformation does everything. We simply choose the equation which involves x, x̄ and t̄ but is
independent of t. This is equation (4.32), viz.

$$x_1 = \frac{\bar{x}_1 + v\bar{t}}{\sqrt{1 - v^2/c^2}}, \qquad x_2 = \frac{\bar{x}_2 + v\bar{t}}{\sqrt{1 - v^2/c^2}}$$

and, on subtracting, this gives

$$L = \frac{\bar{L}}{\sqrt{1 - v^2/c^2}}, \quad \text{or} \quad \bar{L} = L\sqrt{1 - v^2/c^2}.$$

Now v can never exceed c (see exercises) so that √(1 − v²/c²) ≤ 1 and hence

L̄ ≤ L.

Obviously the situation is symmetric with respect to S and S̄ i.e., consider the rod at rest in S̄
(i.e., L̄ = x̄2 − x̄1 is independent of t̄). Then S must record the coordinates of the endpoints
of the rod at the same instant of time t. We seek the equation which connects x and x̄ and is
independent of t̄ viz (4.22). We obtain as above
$$\bar{L} = \frac{L}{\sqrt{1 - v^2/c^2}}, \quad \text{or} \quad L = \bar{L}\sqrt{1 - v^2/c^2}.$$

So that in this case


L ≤ L̄.
We have thus established that the length of a rod as measured by an observer moving relative
to it is always shorter than the length as measured by an observer who is at rest relative to the
rod. This contraction of the rod is called Lorentz contraction. The length as measured by an
observer who is at rest relative to it is called the proper length. By virtue of equations (4.23), (4.24)
and (4.33), (4.34) it is obvious that the dimensions of the rod perpendicular to its length are
unaltered so that its volume undergoes the same Lorentz contraction as its length (see exercises).
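The measurement procedure described above can be mimicked numerically. In this hedged Python sketch (units with c = 1, so `beta` stands for v/c; the function names are illustrative and not from the guide), equation (4.32) is solved for x̄ and both endpoints of a rod at rest in S are recorded at the same S̄ time t̄.

```python
import math


def barred_position(x, t_bar, beta):
    """Solve (4.32) for x-bar (c = 1): where S-bar, at its own time t_bar,
    finds a point that is at rest at coordinate x in S."""
    return x * math.sqrt(1.0 - beta**2) - beta * t_bar


def measured_length(x1, x2, t_bar, beta):
    """Length according to S-bar: both endpoints recorded at the same t_bar."""
    return barred_position(x2, t_bar, beta) - barred_position(x1, t_bar, beta)


# A rod of proper length 1 at rest in S, observed from a frame with v = 0.6c.
for t_bar in (0.0, 3.7, 10.0):
    print(t_bar, measured_length(0.0, 1.0, t_bar, 0.6))
```

The result does not depend on which common instant t̄ is chosen, and it equals L√(1 − v²/c²), in line with the Lorentz contraction derived above.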

Time dilation

It is reasonable to expect that time will also suffer a contraction which is generally referred to as
time dilation but the situation here is slightly more complicated. Suppose that two events take
place at the same spatial point x̄ in S̄ at times t̄1 and t̄2 . The time difference △t̄ = t̄2 − t̄1 is
clearly independent of x̄. Suppose that S assigns times t1 and t2 to the respective events. Clearly
these events cannot occur at the same spatial point in S but rather at the points x and x + v△t.
Equation (4.25) relates t and t̄ independently of x̄ so that

$$\bar{t}_1 = \frac{t_1 - (v/c^2)x}{\sqrt{1 - v^2/c^2}}, \tag{4.37}$$
$$\bar{t}_2 = \frac{t_2 - v(x + v\triangle t)/c^2}{\sqrt{1 - v^2/c^2}}. \tag{4.38}$$

It then follows that

$$\bar{t}_2 - \bar{t}_1 = \triangle\bar{t} = \frac{\triangle t - v^2 \triangle t/c^2}{\sqrt{1 - v^2/c^2}} = \triangle t\sqrt{1 - v^2/c^2} \tag{4.39}$$

and hence

$$\triangle t \geq \triangle\bar{t}. \tag{4.40}$$

This means that if △t is, say 1 hour then △t̄ will be less than 1 hour. In other words, S will
maintain that S̄’s clock is running slower than his own.
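A small numerical sketch may help here. It assumes units with c = 1 and uses the inverse equation (4.35) to assign S times to two events that occur at the same point x̄ of S̄; the function names `time_in_S` and `dilated_interval` are hypothetical.

```python
import math


def time_in_S(t_bar, x_bar, beta):
    """Equation (4.35) with c = 1: the S time of an event at (x_bar, t_bar) in S-bar."""
    return (t_bar + beta * x_bar) / math.sqrt(1.0 - beta**2)


def dilated_interval(dt_bar, beta, x_bar=0.0):
    """S time between two events at the same S-bar point x_bar, dt_bar apart in S-bar."""
    return time_in_S(dt_bar, x_bar, beta) - time_in_S(0.0, x_bar, beta)


# 0.8 hours elapse on the S-bar clock; S-bar moves at v = 0.6c relative to S.
print(dilated_interval(0.8, 0.6))
print(dilated_interval(0.8, 0.6, x_bar=5.0))  # independent of where the clock sits
```

With these numbers S records about 1 hour while the S̄ clock advances 0.8 hours, consistent with (4.39) and (4.40).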

Exercise: Show that time dilation is symmetric, i.e., consider two events which occur at times
t₁ and t₂ at the same spatial point x in S. If S̄ accords the times t̄₁ and t̄₂ respectively to the
events, show that △t̄ ≥ △t.

Example 4.5.1

Two events in a particle’s life occur, relative to an inertial observer S, at time t₁ with ct₁ = 2 m
and at the point x₁ = 1 m, y₁ = z₁ = 0, and at time t₂ with ct₂ = 5 m and at the point
x₂ = 3 m, y₂ = z₂ = 0.

(a) Find the average speed relative to S with which the particle moved between the two events.

(b) An inertial observer S̄ moves with velocity v = (4c/5)x̂ relative to S. The origins of the two
systems coincided at time t = t̄ = 0. Find the coordinates of the two events relative to S̄.

(c) Find the distance between the two events relative to S̄ and also the time between the two
events relative to S̄.

(d) Find the average speed relative to S̄ with which the particle moved between the two events.

Solution

(a) In S we have that x₂ − x₁ = 2 m and t₂ − t₁ = (ct₂ − ct₁)/c = 1 × 10⁻⁸ sec. Therefore the
average speed of the particle is v = 2 × 10⁸ m/sec.

(b) Using the Lorentz transformation (4.22) and (4.25), the position relative to S̄ of the particle
at the first event is x̄₁ = −1 m and the time is t̄₁ = 2/3 × 10⁻⁸ sec. At the second event
x̄₂ = −5/3 m and t̄₂ = 13/9 × 10⁻⁸ sec.

(c) The distance relative to S̄ is given by x̄₂ − x̄₁ = −2/3 m and the time between the two events
is t̄₂ − t̄₁ = 7/9 × 10⁻⁸ sec.

(d) The average speed of the particle relative to S̄ is given by

$$\bar{v} = (\bar{x}_2 - \bar{x}_1)/(\bar{t}_2 - \bar{t}_1) = -6/7 \times 10^8 \ \text{m/sec}.$$
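The arithmetic of Example 4.5.1 can be reproduced exactly with rational numbers. The sketch below is an illustrative check, not part of the original solution; it works with (ct, x) in metres, uses β = v/c = 4/5, and the helper name `to_barred` is an assumption.

```python
from fractions import Fraction as F

beta = F(4, 5)   # v/c for the observer S-bar
gamma = F(5, 3)  # 1/sqrt(1 - beta^2), exact because 1 - 16/25 = 9/25
assert gamma**2 * (1 - beta**2) == 1


def to_barred(ct, x):
    """(4.25) and (4.22) written for ct and x in metres: returns (ct_bar, x_bar)."""
    return gamma * (ct - beta * x), gamma * (x - beta * ct)


events = [(F(2), F(1)), (F(5), F(3))]  # (ct, x) of the two events in S
(ct1_bar, x1_bar), (ct2_bar, x2_bar) = [to_barred(ct, x) for ct, x in events]

# Average speeds in units of c.
speed_S = (events[1][1] - events[0][1]) / (events[1][0] - events[0][0])
speed_S_bar = (x2_bar - x1_bar) / (ct2_bar - ct1_bar)

print("event 1 in S-bar (ct, x):", ct1_bar, x1_bar)
print("event 2 in S-bar (ct, x):", ct2_bar, x2_bar)
print("speed in S / c:    ", speed_S)
print("speed in S-bar / c:", speed_S_bar)
```

Multiplying the speeds by c = 3 × 10⁸ m/sec, the value used in the example, gives 2 × 10⁸ m/sec in S and −6/7 × 10⁸ m/sec in S̄.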

The proper time separating the two events is the time measured by a clock which is at rest
relative to the events. All observers moving relative to this given clock would find that the time
separating the events is longer than the proper time.

Twin paradox

The symmetry of the time dilation effect between two inertial observers gives rise to the famous
twin paradox. Let the twins be A and B and suppose that B leaves the earth on a spacecraft
at high velocity. Twin A maintains that B’s time is slower than his own i.e. his heart-beat,
for example, slows down (since heart-beats can be regarded as a type of clock) and so does his
whole physical metabolism. In other words according to A, B ages at a slower rate than himself
so that when B finally returns to the earth A claims that B is much younger than himself. But
because of the symmetry of time dilation B is entitled to argue in precisely the same way as A
so that on his return to earth, B would claim that A has aged less than himself giving rise to a
paradox. The solution is quite simply that either A or B is not an inertial observer and hence
cannot apply the equations of special relativity. In this particular case A is an inertial observer
(if we neglect the acceleration of the earth around the sun) and hence A is entitled to apply the
Lorentz transformation equations to B. On the other hand, B must at some stage of his journey
undergo a fairly substantial acceleration to attain a high velocity relative to A, which means that
he is not an inertial observer. So, in fact, B will be younger than A when he returns to the earth.

Velocity transformation

To conclude our discussion on the Lorentz transformations, we derive an expression for the
addition of velocities in special relativity. Consider the usual set-up of two inertial frames S and
S̄ with relative velocity v along the x − x̄ axis. Let u = (ux , uy , uz ) and ū = (ūx , ūy , ūz ) denote
the velocity of a particle in S and S̄ respectively. Two distinct velocity transformation equations
are obtained as follows.
Taking differentials throughout equations (4.32) — (4.35) gives
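$$dx = \frac{d\bar{x} + v\,d\bar{t}}{\sqrt{1 - v^2/c^2}}, \tag{4.41}$$
$$dy = d\bar{y}, \tag{4.42}$$
$$dz = d\bar{z}, \tag{4.43}$$

and

$$dt = \frac{d\bar{t} + v\,d\bar{x}/c^2}{\sqrt{1 - v^2/c^2}}. \tag{4.44}$$

On dividing equations (4.41), (4.42) and (4.43) by (4.44), we obtain

$$\frac{dx}{dt} = \frac{d\bar{x} + v\,d\bar{t}}{d\bar{t} + v\,d\bar{x}/c^2} = \frac{\dfrac{d\bar{x}}{d\bar{t}} + v}{1 + \dfrac{v}{c^2}\dfrac{d\bar{x}}{d\bar{t}}}, \tag{4.45}$$

$$\frac{dy}{dt} = \frac{d\bar{y}\sqrt{1 - v^2/c^2}}{d\bar{t} + v\,d\bar{x}/c^2} = \frac{\dfrac{d\bar{y}}{d\bar{t}}\sqrt{1 - v^2/c^2}}{1 + \dfrac{v}{c^2}\dfrac{d\bar{x}}{d\bar{t}}}, \tag{4.46}$$

$$\frac{dz}{dt} = \frac{d\bar{z}\sqrt{1 - v^2/c^2}}{d\bar{t} + v\,d\bar{x}/c^2} = \frac{\dfrac{d\bar{z}}{d\bar{t}}\sqrt{1 - v^2/c^2}}{1 + \dfrac{v}{c^2}\dfrac{d\bar{x}}{d\bar{t}}}. \tag{4.47}$$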

But ux = dx/dt, uy = dy/dt, uz = dz/dt and ūx = dx̄/dt̄, ūy = dȳ/dt̄, ūz = dz̄/dt̄, so that

$$u_x = \frac{\bar{u}_x + v}{1 + \bar{u}_x v/c^2}, \tag{4.48}$$
$$u_y = \frac{\bar{u}_y\sqrt{1 - v^2/c^2}}{1 + \bar{u}_x (v/c^2)}, \tag{4.49}$$
$$u_z = \frac{\bar{u}_z\sqrt{1 - v^2/c^2}}{1 + \bar{u}_x (v/c^2)}. \tag{4.50}$$

We can write the corresponding inverse transformation by merely changing v to −v and
interchanging the barred and unbarred quantities:

$$\bar{u}_x = \frac{u_x - v}{1 - u_x v/c^2}, \tag{4.51}$$
$$\bar{u}_y = \frac{u_y\sqrt{1 - v^2/c^2}}{1 - u_x (v/c^2)}, \tag{4.52}$$
$$\bar{u}_z = \frac{u_z\sqrt{1 - v^2/c^2}}{1 - u_x (v/c^2)}. \tag{4.53}$$

Now consider the special case wherein the velocity of the particle is along the x − x̄ direction,
that is, u = (u, 0, 0). The velocity transformation equation and its inverse may be written from
(4.48), (4.51) as

$$u = \frac{\bar{u} + v}{1 + \bar{u}v/c^2}, \tag{4.54}$$
$$\bar{u} = \frac{u - v}{1 - uv/c^2}, \tag{4.55}$$

and are called the relativistic or Einstein velocity addition equations. One can show from (4.54),
(4.55) that the addition of two velocities, each smaller than c, cannot exceed the velocity of light.

Example 4.5.2

In example 4.2.1 we found that when two electrons leave a radioactive sample in opposite direc-
tions, each having a speed 0.67c with respect to the sample, the speed of one electron relative to
the other is 1.34c according to classical physics. What is the relativistic result?

Solution: We may regard one electron as the S frame, the sample as the S̄ frame, and the other
electron as the object whose speed in the S frame we seek. Then ū = 0.67c, v = 0.67c and

using (4.54) it follows that u = 0.92c. The speed of one electron relative to the other is less than
c.
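The velocity addition equations (4.54) and (4.55) are easy to evaluate numerically. The following sketch (speeds in units of c by default; the function names are illustrative assumptions) reproduces the number quoted in Example 4.5.2.

```python
def add_velocities(u_bar, v, c=1.0):
    """Equation (4.54): speed u in S of a particle moving at u_bar in S-bar,
    where S-bar moves at v relative to S (all along the common x axis)."""
    return (u_bar + v) / (1.0 + u_bar * v / c**2)


def subtract_velocities(u, v, c=1.0):
    """Equation (4.55): the inverse relation, obtained by changing v to -v."""
    return (u - v) / (1.0 - u * v / c**2)


u = add_velocities(0.67, 0.67)  # the two electrons of Example 4.5.2
print(round(u, 2))                 # speed of one electron relative to the other, in units of c
print(subtract_velocities(u, 0.67))  # recovers approximately 0.67
print(add_velocities(1.0, 0.5))      # adding any velocity to c gives c again
```

The classical sum 1.34c is replaced by a value below c, and light keeps the speed c in both frames.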

Example 4.5.3

(a) A pulse of light travels with velocity c along the ȳ direction relative to an observer S̄. If S̄ travels
with velocity vx̂ relative to another observer S, determine the velocity of the pulse of light relative to S.

(b) Show explicitly that the speed of the light pulse relative to S is c.

Solution

(a) Let us assume that the light pulse starts from the origin of the S̄ frame at time t̄ = 0. Then,
at later time t̄, the light pulse is at the point with coordinates x̄ = z̄ = 0, ȳ = ct̄. The
coordinates of x, y, z of that point and the time t at which the light pulse is there, relative
to S, are given by (4.32)–(4.35)

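$$t = \frac{\bar{t}}{\sqrt{1 - v^2/c^2}}, \quad x = \frac{v\bar{t}}{\sqrt{1 - v^2/c^2}}, \quad y = \bar{y} = c\bar{t}, \quad z = 0.$$

The velocity of the light pulse relative to S is given by the components

$$u_x = \frac{dx}{dt} = v, \qquad u_y = \frac{dy}{dt} = c\sqrt{1 - v^2/c^2} = \sqrt{c^2 - v^2}, \qquad u_z = \frac{dz}{dt} = 0.$$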

(b) The speed u of the light pulse relative to S is then given by

$$u^2 = u_x^2 + u_y^2 + u_z^2 = v^2 + (c^2 - v^2) = c^2,$$

thus u = c.
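The same result follows directly from the velocity transformation (4.48)–(4.50). The sketch below, in units with c = 1 and with the hypothetical function name `transform_velocity`, applies those equations to a light pulse moving along ȳ and checks that its speed in S is still c.

```python
import math


def transform_velocity(u_bar, beta):
    """Equations (4.48)-(4.50) with c = 1: velocity components in S of a
    particle whose velocity in S-bar is u_bar = (ux, uy, uz); beta = v/c."""
    ux_bar, uy_bar, uz_bar = u_bar
    denom = 1.0 + ux_bar * beta
    root = math.sqrt(1.0 - beta**2)
    return ((ux_bar + beta) / denom, uy_bar * root / denom, uz_bar * root / denom)


ux, uy, uz = transform_velocity((0.0, 1.0, 0.0), 0.6)  # light pulse along y-bar
print(ux, uy, uz)
print(math.sqrt(ux**2 + uy**2 + uz**2))  # speed of the pulse in S, in units of c
```

For v = 0.6c the components are close to (0.6, 0.8, 0), matching u_x = v and u_y = √(c² − v²) from part (a).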

4.6 Conclusion

In any physical theory a clear and precise development of kinematic properties is crucial. We
have seen how the Galilean approach faltered when extended to the electromagnetic theory. The
point to be noted is that the Newtonian approach is still valid though when using it one must
keep its limitations in mind. The kinematic properties of the Lorentzian approach form the
cornerstone upon which the dynamics of Einstein’s special relativity theory is based.

4.7 Exercises

4.7.1 Momentum is conserved in a collision of two objects as measured by an observer on a
uniformly moving train. Show that momentum is also conserved for a ground observer.

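4.7.2 Show that the electromagnetic wave equation

$$\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} = 0$$

does not retain its form (i.e., is not invariant) under the Galilean transformation equations.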

4.7.3 A pulse of light, emitted at the origin at the time t = 0 relative to S, lies on the surface
of the sphere,
$$x^2 + y^2 + z^2 = c^2 t^2$$

at time t relative to S. On what surface relative to S̄ would the light pulse lie if the Galilean
transformation were applicable?

4.7.4 A train 1km long travels at 60km/hr past a station. An observer S̄ on the train uses
a coordinate system with the origin at the rear of the train and with the positive x̄ axis
pointing toward the front of the train. An observer S on the station platform uses a
coordinate system with the origin at one end of the platform and with the positive x axis
pointing in the direction of the moving train. The two observers set their clocks at t = 0
at that instant at which the two origins coincide.
(a) Find t and t̄ for that instant when the front of the train passes the origin of S.
(b) Find the coordinates x and x̄ of the rear of the train for that instant when the front of
the train passes the origin of S.
(c) Find the coordinates x and x̄ of the front end of the train at t = t̄ = 0.
(d) Find the coordinates x and x̄ of the front of the train and also those of the back at
time t = 1min.

4.7.5 An inertial observer measures the length of a meter stick, moving along the direction of
its length, to be 57cm. How fast is the meter stick moving?

4.7.6 The ends of a meter stick lie along the x̄ axis of the inertial frame S̄ at points x̄₁ = 2.00 m
and x̄₂ = 3.00 m. Another inertial observer S measures the length of the meter stick at the
time t₀ given by ct₀ = 3.00 m. The relative velocity between the two frames is v = 4c/5.
(a) Find the positions relative to S of the endpoints of the S̄ stick at the time of the
measurement.
(b) Find ct̄₁ and ct̄₂, where t̄₁ and t̄₂ are the times relative to S̄ of the two events in the
measurement by S.

(c) At time t̄0 given by ct̄0 = 3.00m, S̄ measures the distance between the two points in S
that are described in (a). Find the positions relative to S̄ of those points at the time t̄0 .

(d) Find ct1 and ct2 , where t1 and t2 are the times relative to S of the two events in the
measurement by S̄ described in (c).

4.7.7 Observer S measures the length of an S̄ meter stick whose length lies along the direction
of the relative motion of S and S̄. The result is 92.8cm.

(a) Observer S̄ measures the length of the meter stick of S. What result does he obtain?
(b) How much time elapses, according to S̄, while the hands of a clock of S move 1hr
ahead?

4.7.8 Prove the invariance of the electromagnetic wave equation in relativity by showing that
the corresponding differential operator is an invariant. That is, show that

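$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} = \frac{\partial^2}{\partial x'^2} + \frac{\partial^2}{\partial y'^2} + \frac{\partial^2}{\partial z'^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t'^2}$$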

when the space-time variables are related by the Lorentz transformations.

4.7.9 Two observers in the S frame, A and B, are separated by a distance of 60m. Let S′ move
at the speed (3/5)c relative to S, the origins of the two systems, O′ and O, being coincident at
t′ = t = 3 × 10⁻⁷ sec (i.e., 90/c). The S′ frame has two observers, one at A′ and one at B′,
such that, according to clocks in the S frame, A′ is opposite A at the same time that B′ is
opposite B.
(a) What is the reading on the clock of B′ when B′ is opposite B? Do this twice: first, use
the direct Lorentz transformation to find t′; second, use the inverse Lorentz transformation
but again solve for t′. Do the answers agree? (Note: x and x′ are related as improper and
proper lengths).
(b) The S′ system continues until A′ is opposite B. What is the reading on the clock of B
when B is opposite A′?
(c) What is the reading on the clock of A′ when A′ is opposite B? Do this also in two ways:
first, use the Lorentz transformations; second, use the concept of proper and improper time
intervals. (You may find it convenient to express time in units of 1/c.)

4.7.10 An airplane travels at a speed of 1500km/hr.

(a) Calculate the percentage change in its length due to Lorentz contraction relative to the
earth.

(b) How far does the plane travel before the pilot’s travel watch is slowed down by 1sec
relative to the earth?

4.7.11 An airplane 40m in length in its rest system is moving at a uniform velocity with respect
to earth at a speed of 630m/sec.
(a) By what fraction of its rest length will it appear to be shortened to an observer on
earth?
(b) How long will it take by earth clocks for the airplane’s clock to fall behind by 1
microsecond? (Assume that only special relativity applies.)

4.7.12 An S̄ clock moves past the observer S, who measures that the S̄ clock loses 1sec every
hour.
(a) Calculate the speed of S̄ relative to S.
(b) The clocks of S are situated 1m apart along the direction of motion of S̄. How many
clocks of S does the S̄ clock pass when it clocks off 1sec?

4.7.13 When an observer S measures a meter stick of S̄ that lies along the direction of motion
of S̄, he notes a contraction of 1mm. Find the speed of S̄ relative to S.

4.7.14 A cube with sides of length L moves with S̄.


(a) Find the volume of the cube as measured by S.
(b) How fast would the cube have to move in order that its volume be halved according to
S?

4.7.15 A meter stick, at rest in S̄, is inclined at an angle of θ = π/4 relative to S with respect
to the direction of motion of S̄ relative to S. The frame S̄ moves with a speed of 0.9c past
the frame S. What is the length of the meter stick as measured by the observer S, and
what is the angle between the meter stick and the x axis as observed by S?

4.7.16 A meter stick, at rest in S̄, is inclined at an angle of θ relative to S with respect to the
direction of motion of S̄ relative to S. The frame S̄ moves with a speed of 0.9c past the
frame S. Calculate the length l(θ) of the meter stick as measured by the observer S, and
draw a graph that shows the value of l(θ) for each value of θ.

4.7.17 The radius of our galaxy is 3 × 10²⁰ m, or about 3 × 10⁴ light-years.


(a) Can a person, in principle, travel from the center to the edge of our galaxy in a normal
lifetime? Explain, using either time-dilation or length contraction arguments.
(b) What constant velocity would he need to make the trip in 30 years (proper time)?

4.7.18 Suppose that event A causes event B in frame S, the effect being propagated with a
speed greater than c. Show, using the velocity addition equations, that there exists an
inertial frame S̄, which moves relative to S with a velocity less than c, in which the order
of these events is reversed. Hence, if concepts of cause and effect are to be preserved, it
is impossible to send signals with a speed greater than that of light.

4.7.19 (a) Show that, with u² = u_x² + u_y² and ū² = ū_x² + ū_y², we can write

$$c^2 - u^2 = \frac{c^2 (c^2 - \bar{u}^2)(c^2 - v^2)}{(c^2 + \bar{u}_x v)^2}.$$

(b) From this result show that if ū < c and v < c, then u must be less than c. That is, the
relativistic addition of two velocities, each less than c, is itself less than c.
(c) Show also that if ū = c or v = c, then u must be equal to c. That is, the relativistic
addition of any velocity to the velocity of light merely gives again the velocity of light.

4.7.20 Derive the special relativistic acceleration transformation

$$\bar{a}_x = \frac{a_x (1 - v^2/c^2)^{3/2}}{(1 - u_x v/c^2)^3}$$

where a_x = du_x/dt and ā_x = dū_x/dt̄.
[Hint: recall du_x/dt̄ = (du_x/dt)(dt/dt̄).]

Answers to exercises

1.3 (x̄ + v t̄)² + ȳ² + z̄² = c² t̄².

1.4 (a) t = t̄ = −d/v = −1min.


(b) x̄ = 0, x = −1km.
(c) x = x̄ = 1km.
(d) x = 2km, x̄ = 1km; x = 1km, x̄ = 0.
1.5 √(1 − v²/c²) = 0.57, 2.46 × 10⁸ m/sec.

1.6 (a) 3.6m, 4.2m.


(b) 0.2m, -0.6m.
(c) -0.24m, 0.12m.
(d) 4.68m, 5.16m.

1.7 (a) 92.8cm.


(b) 1hr, 4min, 4sec.

1.9 (a) 45/c = 1.5 × 10⁻⁷ sec.
(b) 190/c = 6.3 × 10⁻⁷ sec.
(c) 170/c = 5.7 × 10⁻⁷ sec.

1.10 (a) −2.5 × 10⁻¹⁰ %.


(b) 130 yr.

1.11 (a) △L/L₀ = 2.2 × 10⁻¹².
(b) 4.54 × 10⁵ sec = 5.26 days.

1.12 (a) 7.1 × 10⁶ m/sec.
(b) 7.1 × 10⁶ clocks.

1.13 1.3 × 10⁷ m/sec.

1.14 (a) vol_S = vol_S̄ × √(1 − v²/c²).
(b) 2.60 × 10⁸ m/sec.

1.15 l² = (1m/√2)² [1 + (1 − 0.99²)], l = 0.714m; 82.0°.

1.16 [x/√(1 − v²/c²)]² + y² = 1m², y = x tan θ,
l = √(x² + y²) = sec θ/√(5.3 + tan²θ).

4.8 The four-vector

The first section is on relativistic dynamics and here we introduce the concept of the four-velocity,
the four-momentum and the four-force. From these we define the relativistic three-momentum
and the relativistic three-force. The notion of work done on a particle by an external force leads
naturally to the definition of kinetic energy and the total energy. All this is developed within an
inertial observer reference frame S wherein the particle is moving with velocity u.
We first recall that an event in the relativistic sense involves a description of the spatial
point (x, y, z) and the time t at which the event occurs. It was also emphasized in the preceding
sections that, unlike the Newtonian approach, the time t is not invariant relative to a change
of inertial reference frames (see subsection (4.5)). Consider an event displacement from event
O to event E in the inertial reference frame S. In principle, using the Lorentz transformation
equations, inertial observer S can calculate the components of the event displacement, from Ō
to Ē, relative to any inertial observer S̄. Thus once an event displacement is known in one
inertial frame it is determinable in all inertial frames by virtue of the Lorentz transformations.
Therefore, since one inertial observer can tell how another inertial observer describes the same
event displacement, we say that event displacements exist independent of the observer.
We now define a four-vector as an entity that is represented by four components and its
description is independent of inertial observer by virtue of its transformations properties. Note
that the four components of a four-vector will generally be different from inertial observer to
inertial observer but once they have been determined by any one inertial observer the components
of the four-vector for the other inertial observers can be obtained from the Lorentz transformation
or its consequences, e.g. the velocity transformation equations and the acceleration transformation
equations (see exercises).

Our main focus is on the so-called four-velocity vector. Consider a particle moving relative to
an inertial frame S with velocity u. We denote the coordinates of the particle (called the world
line) measured in S by x^µ = (ct, x, y, z)¹.

• First note that for convenience we use ct with spatial units instead of t.

• Secondly the time t in the S frame is an improper time, the proper time τ being measured
in the particle frame P. Recall that the two time parameters are related by the time dilation
formula dτ = √(1 − u²/c²) dt.

• Thirdly the world-line of the particle x^µ in the S frame is a function of the proper time τ,
that is x^µ = x^µ(τ), and this is the relativistic generalization of the particle position as a function
of time.

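We can now define the four-velocity as the rate of change of the four-vector x^µ with respect
to the proper time τ:

$$\begin{aligned}
\dot{x}^\mu \equiv \frac{dx^\mu}{d\tau} &= \frac{d}{d\tau}(ct, x, y, z) \\
&= \left(c\frac{dt}{d\tau}, \frac{dx}{d\tau}, \frac{dy}{d\tau}, \frac{dz}{d\tau}\right) \\
&= \frac{dt}{d\tau}\left(c, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right) \\
&= \frac{1}{\sqrt{1 - u^2/c^2}}\,(c, \mathbf{u}(t)) \\
&= (\gamma c, \gamma\mathbf{u}(t)),
\end{aligned} \tag{4.56}$$

where we set

$$\gamma = \frac{1}{\sqrt{1 - u^2/c^2}}.$$

Note that the norm (squared magnitude) of the four-velocity is the constant c² irrespective of u:

$$\begin{aligned}
\eta_{\mu\nu}\dot{x}^\mu \dot{x}^\nu &= \gamma^2 (c^2 - u^2) \\
&= \frac{1}{1 - u^2/c^2}(c^2 - u^2), \qquad u^2 = \mathbf{u}\cdot\mathbf{u} \\
&= c^2
\end{aligned} \tag{4.57}$$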

where η_µν is the Lorentz metric. Equation (4.57) emphasizes the independence of the four-velocity
ẋ^µ from any observer. The two velocities have different interpretations. On the one hand u(t) is
the rate of change of the particle’s position in space with respect to the time of the observer S.
Other observers will obtain different results, and since observers may not agree on simultaneity,
the measurement of u(t) is observer dependent. On the other hand the four-velocity ẋ^µ corresponds
to the space-time separation between two neighboring events x^µ and, like event displacements,
this separation is observer independent.

¹ It is convenient to introduce, without much rigor, the tensorial notation henceforth. Thus x^µ (or x_µ)
denotes the four-vector components where the index µ takes the value µ = 1, 2, 3, 4 for each component.
A useful feature of this notation is the summation convention: x^µ x_µ denotes the norm or magnitude of the
four-vector where x^µ x_µ = x^1 x_1 + x^2 x_2 + x^3 x_3 + x^4 x_4. In general A^µ B_µ = A^1 B_1 + A^2 B_2 + A^3 B_3 + A^4 B_4.
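Equations (4.56) and (4.57) can be illustrated with a few lines of Python. This is a sketch under stated assumptions: units with c = 1 by default, components ordered as (ct, x, y, z), and the function names `four_velocity` and `minkowski_norm2` are chosen here for illustration.

```python
import math


def four_velocity(u, c=1.0):
    """Equation (4.56): (gamma*c, gamma*u) for a three-velocity u = (ux, uy, uz)."""
    u2 = sum(ui * ui for ui in u)
    g = 1.0 / math.sqrt(1.0 - u2 / c**2)
    return (g * c,) + tuple(g * ui for ui in u)


def minkowski_norm2(a):
    """eta_{mu nu} a^mu a^nu for the Lorentz metric (4.31)."""
    return a[0] ** 2 - a[1] ** 2 - a[2] ** 2 - a[3] ** 2


for u in [(0.0, 0.0, 0.0), (0.3, 0.4, 0.5), (0.99, 0.0, 0.0)]:
    print(u, minkowski_norm2(four_velocity(u)))
```

Whatever three-velocity is chosen (with u < c), the printed norm stays at c² = 1 up to rounding, which is the content of (4.57).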

4.9 Relativistic momentum

As pointed out in the introduction, there are various ways one may introduce the concept of
relativistic momentum. One way is based on momentum conservation. We opt for an approach
based on a systematic generalization of Newtonian quantities to their relativistic counterpart.
An obvious requirement is that in the non-relativistic limit, u/c << 1, the relativistic definition
reduces to its Newtonian counterpart. An example of this procedure is that of the four-velocity
as a relativistic generalization of the spatial velocity treated in the above subsection.
Firstly in the simplest case, namely, that of a particle that experiences no external force,
the characteristic of such a particle is that, relative to any inertial system S, its velocity u(t)
is constant in time t. Now at any instant of time, the velocity vector u determines in a unique
manner a four-velocity vector ẋ^µ, which as we have stated is an entity that exists independently
of the reference system S. Secondly, in the general accelerated motion (i.e., time dependent
u(t)), the four-velocity ẋ^µ varies from event to event along the world line, and hence it can be
represented as a four-vector function of one parameter that specifies the change along the world
line. The appropriate parameter that specifies events along the world line is the proper time τ.
In this manner the motion of the particle is described independently of the inertial system and
we write ẋ^µ = ẋ^µ(τ).
We now implement a direct relativistic generalization of the Newtonian momentum vector as
the product of mass m₀ and velocity u(t) by replacing the velocity vector u(t) by the four-velocity
to obtain the four-vector:

$$p^\mu = m_0 \dot{x}^\mu(\tau) = (\gamma m_0 c, \gamma m_0 \mathbf{u}(t)), \tag{4.58}$$

called the energy-momentum vector. We use m0 to denote the rest mass (also called proper mass)
of the particle as measured in the rest frame P of the particle. The rest mass m0 is an invariant
(see footnote 2), i.e., all observers agree on its value at any instant in the particle's history.
From equation (4.58) it follows that the energy-momentum vector is independent of the inertial
reference frame.
Footnote 2: There is, however, no guarantee that m0 is a constant: the rest mass may be altered
by the particle's environment and by the forces acting on it. Electromagnetic forces, for example,
do not change the rest mass; such forces are called pure forces. Forces that do alter the rest mass
are called heat-like or impure forces. An example of an impure force is one derivable from a scalar
potential Φ according to the equation $F_\mu = \partial\Phi/\partial x^\mu$.


The latter is confirmed by noting that the norm of the energy-momentum vector is invariant;

\[ p^\mu p_\mu = m_0^2\, \dot{x}^\mu \dot{x}_\mu = m_0^2 c^2. \tag{4.59} \]
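The invariance expressed by (4.59) can be checked numerically. The following is a minimal Python sketch rather than part of the original guide: the function names `lorentz_factor`, `four_momentum` and `minkowski_norm` are illustrative, the metric signature (+, -, -, -) is an assumption chosen so that the norm comes out as $m_0^2 c^2$, and the rest mass is only an example value.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(speed, c=C):
    """Return gamma = 1/sqrt(1 - u^2/c^2) for a speed below c."""
    return 1.0 / math.sqrt(1.0 - (speed / c) ** 2)


def four_momentum(m0, u, c=C):
    """Energy-momentum vector (4.58): (gamma*m0*c, gamma*m0*u) for a three-velocity u."""
    speed = math.sqrt(sum(comp * comp for comp in u))
    gamma = lorentz_factor(speed, c)
    return (gamma * m0 * c,) + tuple(gamma * m0 * comp for comp in u)


def minkowski_norm(p):
    """Norm p^mu p_mu with the assumed signature (+, -, -, -)."""
    return p[0] ** 2 - sum(comp * comp for comp in p[1:])


m0 = 9.109e-31  # illustrative rest mass in kg (assumption)
for beta in (0.1, 0.5, 0.9):
    p = four_momentum(m0, (beta * C, 0.0, 0.0))
    # The ratio should stay close to 1 whatever the speed.
    print(beta, minkowski_norm(p) / (m0 * C) ** 2)
```

Whatever speed is chosen, the printed ratio should remain close to 1, which is the numerical counterpart of the statement that all observers agree on the norm.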

The spatial components of the energy-momentum vector define a three-vector called the
relativistic three-momentum,

\[ \mathbf{p} = \gamma m_0 \mathbf{u}(t) = m\,\mathbf{u}(t), \tag{4.60} \]

where

\[ m \equiv \gamma m_0 \tag{4.61} \]

is called the relativistic inertial mass. Notice that the relativistic mass m increases with increasing
velocity u; when u = 0 the two mass quantities are the same m = m0 .
The time-like component of the energy-momentum vector is

\[ p^0 = \gamma m_0 c = mc. \tag{4.62} \]

We will show in a later equation (4.74) that this component is related to the total relativistic
energy of the particle.

4.10 Relativistic force

It is instructive to pause and look back at this stage. First, the concept of a spatial
point (x, y, z) is generalized to an event (ct, x, y, z). The proper time τ then provides a unique
scalar variable (unique in the sense that there is only one inertial frame stationary relative to
the particle) that parametrizes the events along a world line. It is thus sensible that the rate of
change of events with respect to the proper time defines a unique four-velocity vector. Both the
event displacement and the four-velocity vector are independent of the observer, in the sense that
once they are known to one observer, the Lorentz transformations determine them for all other
observers. The four-momentum, or more appropriately the energy-momentum vector, follows
as the product of mass and four-velocity.
Now, in a similar manner as in Newtonian dynamics, the rate of change of the energy-
momentum vector with respect to the proper time is determined by the total external forces
acting on the particle;

\[ F^\mu = \frac{dp^\mu}{d\tau} = \frac{d}{d\tau}\left(m_0 \dot{x}^\mu\right) = \frac{d}{d\tau}(mc,\ \mathbf{p}). \tag{4.63} \]


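This generalizes Newton's second law and defines the covariant equation of motion of a particle
in terms of the Minkowski four-force $F^\mu$. In terms of the improper time t in an inertial
frame S, the four-force in (4.63) simplifies as follows:

\[
\begin{aligned}
F^\mu &= \frac{d}{d\tau}(mc,\ \mathbf{p}) \\
      &= \gamma \frac{d}{dt}(mc,\ \mathbf{p}) \\
      &= \gamma\left(c\,\frac{dm}{dt},\ \frac{d\mathbf{p}}{dt}\right) \\
      &= \gamma\left(c\,\frac{dm}{dt},\ \mathbf{f}\right),
\end{aligned}
\tag{4.64}
\]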
where we define the three-force in terms of the relativistic three-momentum (4.60) by

\[ \mathbf{f} \equiv \frac{d\mathbf{p}}{dt} = \frac{d(m\mathbf{u})}{dt}, \tag{4.65} \]
u being the velocity of the particle in S. A crucial point to note here is that the physical content
of equation (4.65) is determined from the environment influencing the particle. For example,
later we will look at the transformation properties of the three-force f and put particular focus
on the dynamics of the particle in an electromagnetic field; here the force is the Lorentz force
f = q(E + u × B). In this manner the physical content of (4.65) would be fully determined and
we will show that the force transformation laws lead naturally to the transformation laws of the
electromagnetic fields E and B.
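As a concrete instance of how the environment fixes the right-hand side of (4.65), the Lorentz force quoted above can be evaluated directly. This is a minimal Python sketch under stated assumptions: the helper names `cross`, `dot` and `lorentz_force` are hypothetical, vectors are plain 3-tuples, and the numerical values are arbitrary illustrations in consistent units.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def lorentz_force(q, E, B, u):
    """Three-force f = q (E + u x B) on a charge q moving with velocity u."""
    u_cross_B = cross(u, B)
    return tuple(q * (e + w) for e, w in zip(E, u_cross_B))


# Illustrative case (assumed values): no electric field, B along z, motion along x.
u = (3.0, 0.0, 0.0)
f = lorentz_force(q=1.0, E=(0.0, 0.0, 0.0), B=(0.0, 0.0, 2.0), u=u)
print(f)          # force is perpendicular to both u and B
print(dot(f, u))  # the magnetic part of the force does no work on the charge
```

In the purely magnetic case the scalar product f.u vanishes, so by (4.66) such a force does not change the kinetic energy of the particle.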

4.11 Kinetic energy

Having considered the sequence of events along the world line of the particle, the four-velocity
vector and the external force acting on the particle, the next dynamic variable to consider is the
kinetic energy of the particle. Suppose that in the inertial frame S the spatial force f acting on
a particle gives rise to a spatial displacement dl in the direction of the force; then the work done
by the force is $\mathbf{f}\cdot d\mathbf{l}$. If in addition it is assumed that all the work goes
into increasing the speed of the particle, then this work equals the increase dT in the kinetic energy:

\[ dT = \mathbf{f}\cdot d\mathbf{l} = \left(\mathbf{f}\cdot\frac{d\mathbf{l}}{dt}\right)dt = \mathbf{f}\cdot\mathbf{u}\,dt. \tag{4.66} \]

Now f and u are in the same direction so we may write (4.66) in terms of the components f and
u in a suitably chosen coordinate frame, thus

\[ dT = f\,u\,dt. \tag{4.67} \]

If we put

\[ u = c\sin\theta, \qquad \theta = \theta(t), \tag{4.68} \]



then

\[ \gamma = \frac{1}{\sqrt{1 - u^2/c^2}} = \sec\theta. \tag{4.69} \]

Hence from (4.60) and (4.65) we get

\[ p = m_0 c\tan\theta; \qquad f = m_0 c\,\sec^2\theta\,\frac{d\theta}{dt}. \tag{4.70} \]

Substituting for f and u in (4.67) gives

\[ dT = m_0 c^2\,\frac{\sin\theta}{\cos^2\theta}\,d\theta. \tag{4.71} \]

Now if we integrate (4.71), assuming the particle starts from rest with T = 0 at θ = 0, we
obtain

\[ T = m_0 c^2(\sec\theta - 1) = m_0 c^2(\gamma - 1) = mc^2 - m_0 c^2. \tag{4.72} \]
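The integration leading from (4.71) to (4.72) can be checked by doing the integral numerically. The sketch below is an illustration only: the function names are hypothetical, the midpoint rule is simply one convenient quadrature choice, and units with m0 = c = 1 are assumed in the example call.

```python
import math


def kinetic_energy_closed_form(m0, u, c):
    """T = m0 c^2 (gamma - 1), equation (4.72)."""
    gamma = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return m0 * c ** 2 * (gamma - 1.0)


def kinetic_energy_by_integration(m0, u, c, steps=100_000):
    """Integrate dT = m0 c^2 sin(theta)/cos(theta)^2 dtheta from 0 to asin(u/c)."""
    theta_max = math.asin(u / c)  # from the substitution u = c sin(theta), (4.68)
    h = theta_max / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += math.sin(theta) / math.cos(theta) ** 2
    return m0 * c ** 2 * total * h


# Example in units with m0 = c = 1 (an assumption made for readability).
print(kinetic_energy_closed_form(1.0, 0.8, 1.0))
print(kinetic_energy_by_integration(1.0, 0.8, 1.0))
```

The two printed values should agree to many significant figures, since the integrand has the elementary antiderivative sec θ.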

Newtonian limit of T:
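Now does (4.72) reduce to the classical $T = \tfrac{1}{2}mu^2$ in the classical limit u/c << 1? Let us
check this. Write (4.72) as

\[
\begin{aligned}
T &= m_0 c^2(\gamma - 1) \\
  &= m_0 c^2\left[\left(1 - \frac{u^2}{c^2}\right)^{-1/2} - 1\right] \\
  &= m_0 c^2\left[\left(1 + \frac{1}{2}\frac{u^2}{c^2} + \frac{3}{8}\frac{u^4}{c^4} + \dots\right) - 1\right] \\
  &\approx \frac{1}{2}m_0 u^2,
\end{aligned}
\tag{4.73}
\]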
where we have used the binomial theorem expansion in u/c.
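To see how quickly the Newtonian expression becomes inaccurate, the ratio of the relativistic to the Newtonian kinetic energy can be tabulated. This is a minimal sketch with hypothetical function names, in units with m0 = c = 1; the comparison value 1 + (3/4)β² is what the first two terms of the series in (4.73) give.

```python
import math


def kinetic_relativistic(m0, u, c):
    """T = m0 c^2 (gamma - 1), equation (4.72)."""
    gamma = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return m0 * c ** 2 * (gamma - 1.0)


def kinetic_newtonian(m0, u):
    """Classical kinetic energy (1/2) m0 u^2."""
    return 0.5 * m0 * u ** 2


def ratio(beta):
    """Relativistic over Newtonian kinetic energy at u = beta*c (m0 = c = 1)."""
    return kinetic_relativistic(1.0, beta, 1.0) / kinetic_newtonian(1.0, beta)


for beta in (0.01, 0.1, 0.5, 0.9):
    # Columns: beta, exact ratio, two-term series estimate 1 + (3/4) beta^2.
    print(beta, ratio(beta), 1.0 + 0.75 * beta ** 2)
```

For small β the two columns should be nearly identical, while at large β the exact ratio grows well beyond the series estimate.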

The term $m_0 c^2$ in (4.72) depends only on the rest mass and is called the rest energy. From
(4.72) we define the total energy E as the sum of the kinetic energy and the rest energy:

\[ E \equiv T + m_0 c^2 = mc^2 = p^0 c. \tag{4.74} \]

So from (4.62) we can write the energy-momentum vector (4.58) as

\[ p^\mu = \left(\frac{E}{c},\ \mathbf{p}\right). \tag{4.75} \]

Also from (4.60) and (4.74) we relate the velocity to the relativistic momentum and the energy
by the relation

\[ \mathbf{u} = \frac{\mathbf{p}c^2}{E}. \tag{4.76} \]
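Relations (4.74) to (4.76) are convenient in energy units, where momenta are quoted as pc and masses as rest energies. The sketch below assumes the relation E² = (pc)² + (m0c²)², which follows from combining the norm (4.59) with the form (4.75); the function names and the numerical values (rest energy 3, pc = 4, in arbitrary energy units) are illustrative assumptions.

```python
import math


def total_energy(pc, rest_energy):
    """E from E^2 = (pc)^2 + (m0 c^2)^2; both arguments in the same energy unit."""
    return math.hypot(pc, rest_energy)


def speed_over_c(pc, rest_energy):
    """u/c = pc/E, the magnitude form of equation (4.76)."""
    return pc / total_energy(pc, rest_energy)


def kinetic_energy(pc, rest_energy):
    """T = E - m0 c^2, from equation (4.74)."""
    return total_energy(pc, rest_energy) - rest_energy


# Hypothetical particle: rest energy 3 and pc = 4 in the same (arbitrary) energy unit.
print(total_energy(4.0, 3.0), kinetic_energy(4.0, 3.0), speed_over_c(4.0, 3.0))
```

Working with pc and rest energies avoids carrying explicit factors of c, which is why particle data are usually quoted in MeV and MeV/c.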

4.12 Transformation properties

In the above formulation of the relativistic dynamics of a single particle only two reference
frames were introduced. The first is the particle frame P in which the particle is stationary; the
rest mass m0 and the rest energy E0 are measured in this frame. The second is the observer frame S
in which the particle moves with velocity u. Note that the particle is in general allowed to
accelerate, so the velocity u need not be uniform. Consequently the particle frame is not an
inertial frame, and it is for this reason that very little is said about the Lorentz
transformations of quantities between P and S. The only property used to relate the two frames
is time dilation, which (as argued in the twin paradox) applies also to non-inertial frames.
We proceed to derive the transformation properties of relativistic dynamic variables from one
inertial reference frame S to another, S̄, moving with uniform relative velocity v along the common
x, x̄ axes. Let u denote the velocity of a particle in S and ū its velocity in S̄. Because the
calculations are lengthy, most of the derivations are left to the reader as exercises. In effect
these are nothing more than further consequences of the Lorentz transformations.

4.13 Momentum transformation

It is instructive to obtain the transformation of γ. First, from problem 4.7.19 in the previous
chapter we have

\[ c^2 - u^2 = \frac{c^2(c^2 - v^2)}{(c^2 + \bar{u}_x v)^2}\,(c^2 - \bar{u}^2). \]

Now divide throughout by $c^2$, invert and take the square root to obtain the transformation
equation of γ:

\[
\begin{aligned}
\gamma &= \frac{1}{\sqrt{1 - u^2/c^2}} \\
       &= \frac{1 + \bar{u}_x v/c^2}{\sqrt{1 - \bar{u}^2/c^2}\,\sqrt{1 - v^2/c^2}} \\
       &= \bar{\gamma}\,\frac{1 + \bar{u}_x v/c^2}{\sqrt{1 - v^2/c^2}}.
\end{aligned}
\tag{4.77}
\]

The inverse relation to (4.77) follows as

\[ \bar{\gamma} = \gamma\,\frac{1 - u_x v/c^2}{\sqrt{1 - v^2/c^2}}. \tag{4.78} \]
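Equation (4.77) can be verified numerically by transforming a velocity from S̄ to S and computing γ directly. The sketch below assumes the standard relativistic velocity transformation for relative motion along the x-axis, taken here to be what equations (4.48) to (4.53) of the guide state (they are not reproduced in this section); all function names are hypothetical and units with c = 1 are used by default.

```python
import math


def speed(u):
    """Magnitude of a three-velocity given as a tuple."""
    return math.sqrt(sum(comp * comp for comp in u))


def gamma_of(u, c=1.0):
    """gamma = 1/sqrt(1 - u^2/c^2) for a three-velocity u."""
    return 1.0 / math.sqrt(1.0 - (speed(u) / c) ** 2)


def velocity_in_S(u_bar, v, c=1.0):
    """Assumed standard velocity transformation from S-bar to S (boost along x)."""
    ux_b, uy_b, uz_b = u_bar
    denom = 1.0 + ux_b * v / c ** 2
    root = math.sqrt(1.0 - (v / c) ** 2)
    return ((ux_b + v) / denom, uy_b * root / denom, uz_b * root / denom)


def gamma_from_barred(u_bar, v, c=1.0):
    """Right-hand side of equation (4.77)."""
    root = math.sqrt(1.0 - (v / c) ** 2)
    return gamma_of(u_bar, c) * (1.0 + u_bar[0] * v / c ** 2) / root


u_bar, v = (0.3, 0.4, 0.1), 0.5  # illustrative values with c = 1
print(gamma_of(velocity_in_S(u_bar, v)))  # gamma computed directly in S
print(gamma_from_barred(u_bar, v))        # gamma from (4.77); should agree
```

Agreement of the two printed numbers for arbitrary sub-luminal inputs is a useful sanity check before attempting the algebraic derivation.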
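By definition, in the S frame we write the energy-momentum vector components as

\[ p_x = \gamma m_0 u_x, \quad p_y = \gamma m_0 u_y, \quad p_z = \gamma m_0 u_z, \quad E = \gamma m_0 c^2. \tag{4.79} \]

Also, by definition, in the S̄ frame the corresponding quantities are

\[ \bar{p}_x = \bar{\gamma} m_0 \bar{u}_x, \quad \bar{p}_y = \bar{\gamma} m_0 \bar{u}_y, \quad \bar{p}_z = \bar{\gamma} m_0 \bar{u}_z, \quad \bar{E} = \bar{\gamma} m_0 c^2. \tag{4.80} \]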


Exercise: Use the γ-transformation equations (4.77) and (4.78), the transformation equations for
the velocity components (4.48)–(4.50) and (4.51)–(4.53), and keep the rest mass m0 invariant to
show that the energy-momentum transformation equations are

\[ p_x = \frac{1}{\sqrt{1 - v^2/c^2}}\left(\bar{p}_x + \frac{\bar{E}v}{c^2}\right), \tag{4.81} \]
\[ p_y = \bar{p}_y, \qquad p_z = \bar{p}_z, \tag{4.82} \]
\[ E = \frac{1}{\sqrt{1 - v^2/c^2}}\left(\bar{E} + v\bar{p}_x\right) \tag{4.83} \]

and the inverse relations, obtained by replacing v with −v and the barred with the unbarred
quantities, are

\[ \bar{p}_x = \frac{1}{\sqrt{1 - v^2/c^2}}\left(p_x - \frac{Ev}{c^2}\right), \tag{4.84} \]
\[ \bar{p}_y = p_y, \qquad \bar{p}_z = p_z, \tag{4.85} \]
\[ \bar{E} = \frac{1}{\sqrt{1 - v^2/c^2}}\left(E - vp_x\right). \tag{4.86} \]

Note that the quantities $p_x$, $p_y$, $p_z$ and $E/c^2$ transform exactly as the spacetime
coordinates x, y, z and t in the Lorentz transformation.
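The transformation pair (4.81)–(4.83) and (4.84)–(4.86) translates directly into code, and a round trip through both is a quick consistency check. This is a sketch under stated assumptions: the function names `boost_factor`, `to_unbarred` and `to_barred` are hypothetical, units with c = 1 are the default, and the numbers describe an invented particle with rest energy 3 and momentum 4 along x in S̄.

```python
import math


def boost_factor(v, c=1.0):
    """1/sqrt(1 - v^2/c^2) for the relative velocity v of the two frames."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def to_unbarred(p_bar, E_bar, v, c=1.0):
    """Equations (4.81)-(4.83): (p, E) in S from (p_bar, E_bar) in S-bar."""
    g = boost_factor(v, c)
    px = g * (p_bar[0] + E_bar * v / c ** 2)
    E = g * (E_bar + v * p_bar[0])
    return (px, p_bar[1], p_bar[2]), E


def to_barred(p, E, v, c=1.0):
    """Equations (4.84)-(4.86): the inverse relations (v replaced by -v)."""
    g = boost_factor(v, c)
    px_bar = g * (p[0] - E * v / c ** 2)
    E_bar = g * (E - v * p[0])
    return (px_bar, p[1], p[2]), E_bar


# Invented example with c = 1: rest energy 3, momentum 4 along x, hence E = 5 in S-bar.
p_bar, E_bar = (4.0, 0.0, 0.0), 5.0
p, E = to_unbarred(p_bar, E_bar, v=0.6)
back_p, back_E = to_barred(p, E, v=0.6)
print(p, E)
print(back_p, back_E)  # should reproduce the starting values up to rounding
```

Because the two functions differ only in the sign of v, applying one after the other with the same v returns the original components, mirroring the remark that the inverse relations follow by replacing v with −v.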

Example 5.14.1

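Show that the quantity $p^2 - E^2/c^2$ is an invariant, where $p^2$ denotes the squared norm of
the relativistic three-momentum vector (4.60).
Solution: The energy-momentum vector transformation equations (4.84)–(4.86) give

\[
\begin{aligned}
\bar{p}^2 - \bar{E}^2/c^2 &= \bar{p}_x^2 + \bar{p}_y^2 + \bar{p}_z^2 - \bar{E}^2/c^2 \\
  &= \gamma_v^2\,(p_x - vE/c^2)^2 + p_y^2 + p_z^2 - \gamma_v^2\,(E - vp_x)^2/c^2 \\
  &= p_x^2 + p_y^2 + p_z^2 - E^2/c^2 \\
  &= p^2 - E^2/c^2,
\end{aligned}
\]

where $\gamma_v \equiv 1/\sqrt{1 - v^2/c^2}$ is the factor belonging to the relative velocity v of
the two frames (not the particle's γ). Hence the quantity $p^2 - E^2/c^2$ is an invariant. In fact
it can be easily shown that its numerical value is $-m_0^2 c^2$; confirm this.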
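The cancellation in the solution above is easier to follow once the two terms that carry the boost factor are expanded. The following worked step is an addition for clarity; it writes $\gamma_v = 1/\sqrt{1 - v^2/c^2}$ for the factor appearing in (4.84)–(4.86), the subscript being notation introduced here to avoid confusion with the particle's γ.

```latex
\begin{aligned}
\gamma_v^2\left(p_x - \frac{vE}{c^2}\right)^2 - \frac{\gamma_v^2}{c^2}\left(E - v p_x\right)^2
  &= \gamma_v^2\left[p_x^2 - \frac{2vEp_x}{c^2} + \frac{v^2E^2}{c^4}
     - \frac{E^2}{c^2} + \frac{2vEp_x}{c^2} - \frac{v^2 p_x^2}{c^2}\right] \\
  &= \gamma_v^2\left(1 - \frac{v^2}{c^2}\right)\left(p_x^2 - \frac{E^2}{c^2}\right) \\
  &= p_x^2 - \frac{E^2}{c^2}.
\end{aligned}
```

The cross terms cancel and the remaining factor $\gamma_v^2(1 - v^2/c^2)$ equals 1, which is why the y and z components, untouched by the transformation, can simply be carried along.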

4.14 Force transformation

Now in the S frame the components of the three-force vector (4.65) are

\[ f_x = \frac{d}{dt}(\gamma m_0 u_x), \quad f_y = \frac{d}{dt}(\gamma m_0 u_y), \quad f_z = \frac{d}{dt}(\gamma m_0 u_z) \tag{4.87} \]

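and the corresponding components in the S̄ frame are

\[ \bar{f}_x = \frac{d}{d\bar{t}}(\bar{\gamma} m_0 \bar{u}_x), \quad \bar{f}_y = \frac{d}{d\bar{t}}(\bar{\gamma} m_0 \bar{u}_y), \quad \bar{f}_z = \frac{d}{d\bar{t}}(\bar{\gamma} m_0 \bar{u}_z). \tag{4.88} \]

These are related by the relativistic force transformation equations (also called the Lorentz force
transformations) given below (see footnote 3).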

Exercise: Derive the following equations

\[ f_x = \bar{f}_x + \frac{\bar{u}_y v}{c^2 + \bar{u}_x v}\,\bar{f}_y + \frac{\bar{u}_z v}{c^2 + \bar{u}_x v}\,\bar{f}_z, \tag{4.89} \]
\[ f_y = \frac{\sqrt{1 - v^2/c^2}}{1 + \bar{u}_x v/c^2}\,\bar{f}_y, \tag{4.90} \]
\[ f_z = \frac{\sqrt{1 - v^2/c^2}}{1 + \bar{u}_x v/c^2}\,\bar{f}_z \tag{4.91} \]

and the inverse equations are

\[ \bar{f}_x = f_x - \frac{u_y v}{c^2 - u_x v}\,f_y - \frac{u_z v}{c^2 - u_x v}\,f_z, \tag{4.92} \]
\[ \bar{f}_y = \frac{\sqrt{1 - v^2/c^2}}{1 - u_x v/c^2}\,f_y, \tag{4.93} \]
\[ \bar{f}_z = \frac{\sqrt{1 - v^2/c^2}}{1 - u_x v/c^2}\,f_z. \tag{4.94} \]

A special case of the Lorentz force transformation which is both simple and useful is that of a
particle that is instantaneously at rest in the S̄ frame, that is, ū = 0. Here S̄ is the proper frame
and the Lorentz force transformations (4.89)–(4.91) become

\[ f_x = \bar{f}_x, \qquad f_y = \bar{f}_y\sqrt{1 - v^2/c^2}, \qquad f_z = \bar{f}_z\sqrt{1 - v^2/c^2}. \tag{4.95} \]

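Notice that the force in the particle's instantaneous rest frame is at least as large as the
corresponding force in any other frame: the component along the relative motion is unchanged,
while the transverse components are reduced by the factor $\sqrt{1 - v^2/c^2}$.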
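The force transformation (4.89)–(4.91) and its rest-frame special case (4.95) can be explored numerically. The sketch below is illustrative only: the function name `force_in_S` is hypothetical, vectors are 3-tuples, units with c = 1 are the default, and the input force is an arbitrary example.

```python
import math


def force_in_S(f_bar, u_bar, v, c=1.0):
    """Equations (4.89)-(4.91): three-force in S from the force f_bar and the
    particle velocity u_bar measured in S-bar (relative velocity v along x)."""
    fx_b, fy_b, fz_b = f_bar
    ux_b, uy_b, uz_b = u_bar
    denom = c ** 2 + ux_b * v
    root = math.sqrt(1.0 - (v / c) ** 2)
    fx = fx_b + (uy_b * v / denom) * fy_b + (uz_b * v / denom) * fz_b
    scale = root / (1.0 + ux_b * v / c ** 2)
    return (fx, scale * fy_b, scale * fz_b)


# Special case (4.95): particle instantaneously at rest in S-bar, so u_bar = 0.
f_rest = (1.0, 2.0, 3.0)  # arbitrary illustrative force in the rest frame
print(force_in_S(f_rest, (0.0, 0.0, 0.0), v=0.6))

# General case: the x-component now picks up contributions from f_y and f_z.
print(force_in_S(f_rest, (0.1, 0.2, 0.3), v=0.6))
```

With ū = 0 the longitudinal component is returned unchanged and the transverse components are scaled by $\sqrt{1 - v^2/c^2}$, in line with (4.95).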
Footnote 3: Hint: Start from

\[ \bar{f}_x = \frac{d\bar{p}_x}{d\bar{t}}, \]

substitute for $\bar{p}_x$ from (4.84), replace $\frac{d}{d\bar{t}}$ by $\frac{dt}{d\bar{t}}\frac{d}{dt}$
and show from the Lorentz equation (4.25) that

\[ \frac{dt}{d\bar{t}} = \frac{\sqrt{1 - v^2/c^2}}{1 - vu_x/c^2}. \]

Then (4.89) follows easily. Equations (4.90) and (4.91) are similarly obtained.


4.15 Conclusion

It is now clear that Newtonian dynamical variables can be cast in a relativistic form in order to
accommodate high-speed physics. From a physicist's point of view the interest lies in a detailed
formulation of special relativity that encompasses the various gravity-free physical situations.
The mathematician, on the other hand, is more interested in transformation properties: which
variables are invariant, and which formulae are covariant? The substance of this module forms the
basis for answering these questions in the study of special and general relativity.

4.16 Exercises

4.16.1 Calculate the norm of the four-force, i.e., $F^\mu F_\mu$. Is the four-force an invariant quantity?

4.16.2 Calculate the following dot products:

(a) $p^\mu \dot{x}_\mu$,
(b) $F^\mu p_\mu$,
(c) $F^\mu \dot{x}_\mu = 0$.

4.16.3 Use equation (4.68) to show that the total energy E and the relativistic momentum of
the particle are related by $E^2 = c^2 p^2 + m_0^2 c^4$.

4.16.4 Use the previous problem, together with $E = T + m_0 c^2$, to show that the rest mass
$m_0$ of a particle is given by

\[ m_0 = \frac{p^2 c^2 - T^2}{2Tc^2}. \]

Calculate the mass of the particle if its momentum is 130 MeV/c and its kinetic energy is
50 MeV.

4.16.5 An electron has a momentum of 78 MeV/c. Find the following: (a) the total energy, (b)
the kinetic energy and (c) the speed of the electron.
Ans: (a) 78 MeV, (b) 77.5 MeV, (c) $(1 - 2\times 10^{-5})c$.

4.16.6 Calculate the velocities and momenta of protons of kinetic energies

(a) 100 MeV
(b) 10 GeV.
What is the total energy $E = T + m_0 c^2$ in each case?
Ans: (a) u = 0.428c; p = 445 MeV/c; E = 1038 MeV
(b) u = 0.996c; p = 10897 MeV/c; E = 10938 MeV

4.16.7 The Lorentz transformation equations are directly involved in determining the trans-
formation properties of an event xµ from an inertial frame S to another inertial frame S̄
moving with velocity v along a common x, x̄ axis. Obtain the transformation equations for
the four-velocity vector ẋµ .

4.16.8 An electron is moving along the x-axis of the inertial reference frame S with velocity
u = 0.8c.
(a) Determine the momentum and the total energy of the electron in S.
(b) Use the momentum and energy transformations to determine the momentum and the
total energy of the electron in the inertial reference frame S̄ that is moving with uniform
velocity v = 0.6c along the x-axis of S.
Ans: (a) 0.6813MeV/c; 0.8517 MeV. (b) 0.2129MeV/c; 0.5537 MeV.

4.16.8 The force transformation equations (4.89)–(4.91) and (4.92)–(4.94) focus on the relativistic
three-force defined by equation (4.65). Obtain the transformation equations for the
Minkowski four-force defined in equation (4.64).

4.16.9 Suppose that an electromagnetic field is purely magnetic (i.e., B ≠ 0 but E = 0) in an
inertial frame S. Describe this field in an inertial frame S̄.

4.16.10 Show that $E^2 - c^2 B^2$ is an invariant quantity under a Lorentz transformation. Then
argue that, if E = cB in one inertial frame, E′ = cB′ in any other inertial frame, and that
if E > cB in one inertial frame, then E′ > cB′ in any other inertial frame.

4.16.11 Show that E.B is an invariant quantity under a Lorentz transformation. Then argue
that if the electric and magnetic fields are perpendicular to one another in one inertial
frame, they are perpendicular in all frames.

4.16.12 Show that for a given electromagnetic field, we can find an inertial frame in which
either E = 0 (if E < cB) or B = 0 (if E > cB) at a given point if, and only if, E.B = 0
at that point. That is, if (and only if) E and B are perpendicular to one another, we can find
a frame in which we have either no electric field or else no magnetic field. Use the results
of problems 2.14 and 2.15 above.

4.16.13 A particle of charge q moves with uniform velocity u in an inertial frame S. Consider
a frame S ′ , moving with uniform velocity u relative to S, in which the charge is at rest and
the force on the charge is F′ = qE′ . Show that the force on the particle in frame S is the
Lorentz force, F = q(E + u × B), by using the transformation for the components of force
and the transformation of the components of E and B.

4.16.14 Use the force transformation to show that, if the source charge (the source of the field)
moves with speed v relative to a test charge (the charge acted on by the field), either toward
or away from it along the line connecting the two charges, the force on the test charge is
$(1 - v^2/c^2)$ times the ordinary Coulomb force. (Hint: Transform from a frame in which the
source charge is at rest and remember the space contraction effect.)


