Appendix A
Vector Calculus
(i) The dot product (also called the scalar product or inner product) of two vectors A and B is defined by

A · B ≡ AB cos θ , (A.1)
where A ≡ |A|, B ≡ |B| are the magnitudes (or norms) of A, B, and θ is the angle
between the two vectors. (ii) The cross product (also called the vector product or
exterior product) of A and B is defined by
A × B ≡ AB sin θ n̂ , (A.2)

where n̂ is a unit vector perpendicular to the plane spanned by A and B, with direction given by the right-hand rule.¹ In terms of components with respect to an orthonormal basis êᵢ, for which

êᵢ · êⱼ = δ_ij , (A.3)

it follows that

A · B = Σ_{i,j} δ_ij Aᵢ Bⱼ = Σ_i Aᵢ Bᵢ (A.4)

and

(A × B)ᵢ = Σ_{j,k} ε_ijk Aⱼ Bₖ , (A.5)

where

δ_ij ≡ { 1 if i = j
       { 0 if i ≠ j (A.6)

is the Kronecker delta and

ε_ijk ≡ { +1 if ijk is an even permutation of 123
        { −1 if ijk is an odd permutation of 123 (A.7)
        {  0 otherwise

is the Levi-Civita symbol.² We note that the above component expressions for dot product and cross product are valid with respect to any orthonormal basis, and not just for Cartesian coordinates.
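As a quick numerical spot-check of these component expressions, here is a short NumPy sketch (the two vectors are arbitrary sample values):

    import numpy as np

    # Levi-Civita symbol eps_ijk as a 3x3x3 array
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

    A = np.array([1.0, 2.0, 3.0])
    B = np.array([-2.0, 0.5, 1.0])

    # (A.5): (A x B)_i = sum_{j,k} eps_ijk A_j B_k
    cross = np.einsum('ijk,j,k->i', eps, A, B)
    assert np.allclose(cross, np.cross(A, B))

    # (A.2): |A x B| = AB sin(theta), with theta obtained from (A.1)
    theta = np.arccos(A @ B / (np.linalg.norm(A) * np.linalg.norm(B)))
    assert np.isclose(np.linalg.norm(cross),
                      np.linalg.norm(A) * np.linalg.norm(B) * np.sin(theta))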
Geometrically, the dot product of two vectors is the projection of one vector onto
the direction of the other vector, times the magnitude of the other vector. Thus, by
¹ Point the fingers of your right hand in the direction of A, and then curl them toward your palm in the direction of B; your thumb then points in the direction of n̂.
² An even (odd) permutation of 123 is one obtained by an even (odd) number of interchanges of the numbers. For example, 213 is an odd permutation of 123, while 231 is an even permutation.
taking the dot product of A with the orthonormal basis vectors êᵢ, we obtain the components of A with respect to this basis, i.e., Aᵢ = A · êᵢ. This is shown in Fig. A.1.

Fig. A.1 (figure: the component A₁ = A · ê₁ of A, obtained by projecting A onto the basis vector ê₁)
Exercise A.1 Prove that the geometric and component expressions for both the
dot product, (A.1) and (A.4), and cross product, (A.2) and (A.5), are equivalent
to one another, choosing a convenient coordinate system to do the calculation.
The Kronecker delta and the Levi-Civita symbol satisfy the useful identity

Σ_i ε_ijk ε_ilm = δ_jl δ_km − δ_jm δ_kl . (A.8)
Using this identity and the component forms of the dot product and cross product,
one can prove the following three results:
Scalar triple product:

A · (B × C) = B · (C × A) = C · (A × B) , (A.9)

Vector triple product ("BAC-CAB" rule):

A × (B × C) = B (A · C) − C (A · B) , (A.10)

Jacobi identity:

A × (B × C) + B × (C × A) + C × (A × B) = 0 . (A.11)
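Both the ε–δ identity (A.8) and the Jacobi identity (A.11) are easy to spot-check numerically; the following NumPy sketch (with randomly chosen vectors) does both:

    import numpy as np

    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
    delta = np.eye(3)

    # (A.8): sum_i eps_ijk eps_ilm = delta_jl delta_km - delta_jm delta_kl
    lhs = np.einsum('ijk,ilm->jklm', eps, eps)
    rhs = (np.einsum('jl,km->jklm', delta, delta)
           - np.einsum('jm,kl->jklm', delta, delta))
    assert np.allclose(lhs, rhs)

    # (A.11): Jacobi identity for three random vectors
    rng = np.random.default_rng(0)
    A, B, C = rng.standard_normal((3, 3))
    jacobi = (np.cross(A, np.cross(B, C)) + np.cross(B, np.cross(C, A))
              + np.cross(C, np.cross(A, B)))
    assert np.allclose(jacobi, 0)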
Before proceeding further, we should comment on the index notation that we'll be using throughout this book. Under a coordinate transformation x^i → x′^i(x^j), the components of contravariant and covariant vectors transform as

A′^i = Σ_j (∂x′^i/∂x^j) A^j , A′_i = Σ_j (∂x^j/∂x′^i) A_j . (A.12)
But since most of the calculations that we will perform involve quantities in ordinary 3-dimensional Euclidean space, with components defined with respect to an orthonormal basis, the distinction between the two types of components disappears, and the two sets of components A^i and A_i can be mapped to one another using the Kronecker delta:

A_i = Σ_j δ_ij A^j ⇔ A₁ = A¹ , A₂ = A² , A₃ = A³ . (A.14)
Exercise A.3 Show that under a coordinate transformation x^i → x′^i(x^j), the components A^i of a contravariant vector transform like the coordinate differentials dx^i, while the components A_i of a covariant vector transform like the partial derivative operators ∂/∂x^i.
with the placement of the superscript or subscript indices matching on both sides of
the equation.3 This is valid even for spaces that are not Euclidean and for coordinates
that are not Cartesian, e.g., the angular coordinates describing the configuration of a
planar double pendulum (See e.g., Problem 1.4).
All other types of indices that we might need to use, e.g., to label different functions,
basis vectors, or particles in a system, etc., will be placed as either superscripts or
subscripts in whichever way is most notationally convenient for the discussion at
hand. There is no “transformation law” associated with changes in these types of
indices, so there is no standard convention for their placement.
To do calculus with vectors, we need fields—both scalar fields, which assign a real number to each position in space, and vector fields, which assign a three-dimensional vector to each position. An example of a scalar field is the gravitational potential Φ(r)
vector to each position. An example of a scalar field is the gravitational potential (r)
for a stationary mass distribution, written as a function of the spatial location r. An
example of a vector field is the velocity v(r, t) of a fluid at a fixed time t, which is a
function of position r within the fluid.
Given a scalar field U (r) and vector field A(r), we can define the following
derivatives:
(i) Gradient:

(∇U) · t̂ ≡ lim_{Δs→0} [U(r₂) − U(r₁)] / Δs , (A.19)

where r₁ and r₂ are the endpoints (i.e., the 'boundary') of the vector displacement Δs ≡ Δs t̂. Thus, (∇U) · t̂ measures the change in U in the direction of t̂. This is the directional derivative of the scalar field U. The direction of ∇U is perpendicular to the contour lines U(r) = const, since the right-hand side is zero for points that
³ If we had swapped the notation and denoted the components of contravariant vectors with subscripts and the components of covariant vectors with superscripts, then to match indices would require denoting the collection of coordinates with subscripts, like x_i. In retrospect, this might have been a less confusing notation for coordinates (e.g., no chance of confusing the second coordinate x₂ with x-squared, etc.). But we will stick with the coordinate index notation that we have adopted above, since it is the standard notation in the literature.
Fig. A.2 Top panel: Function U (x, y) displayed as a 2-dimensional surface. Bottom panel: Contour
plot (lines of constant U , lighter lines corresponding to larger values) with gradient vector field ∇U
superimposed. Note that the direction of ∇U is perpendicular to the U (x, y) = const lines and is
largest in magnitude where the change in U is greatest
lie along a contour. Hence the gradient ∇U points in the direction of steepest ascent
of the function U . This is illustrated graphically in Fig. A.2 for a function of two
variables U (x, y).
(ii) Curl:

(∇ × A) · n̂ ≡ lim_{Δa→0} (1/Δa) ∮_C A · ds , (A.20)

where C is the boundary of the area element Δa ≡ n̂ Δa, and ds is the infinitesimal displacement vector tangent to C. The curl measures the circulation of A(r) around an infinitesimal closed curve. An example of a vector field with a non-zero curl is shown in panel (a) of Fig. A.3.
(iii) Divergence:

∇ · A ≡ lim_{ΔV→0} (1/ΔV) ∮_S A · n̂ da , (A.21)

where S is the boundary of the volume ΔV. The divergence measures the flux of A(r) through the surface bounding an infinitesimal volume element. An example of a vector field with a non-zero divergence is shown in panel (b) of Fig. A.3.
The beauty of the above definitions is that they are geometric and do not refer
to a particular coordinate system. In Appendix A.5, we will write down expressions
for the gradient, curl, and divergence in arbitrary orthogonal curvilinear coordinates
(u, v, w), which can be derived from the above definitions. In Cartesian coordinates
(x, y, z), the expressions for the three different derivatives turn out to be particularly
simple:
Fig. A.3 Panel (a) Example of a vector field, A(r) = −y x̂ + x ŷ, with a non-zero curl, ∇ ×A = 2ẑ.
Panel (b) Example of a vector field A(r) = x x̂ + y ŷ + z ẑ, with a non-zero divergence, ∇ · A = 3.
In both cases, just the z = 0 values of the vector fields are shown in these figures
∇U = (∂U/∂x) x̂ + (∂U/∂y) ŷ + (∂U/∂z) ẑ ,

∇ × A = (∂A_z/∂y − ∂A_y/∂z) x̂ + (∂A_x/∂z − ∂A_z/∂x) ŷ + (∂A_y/∂x − ∂A_x/∂y) ẑ , (A.22)

∇ · A = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z .

In index notation, these derivatives can be written compactly as

(∇U)_i = ∂_i U , (∇ × A)_i = Σ_{j,k} ε_ijk ∂_j A_k , ∇ · A = Σ_i ∂_i A_i , (A.23)

where ∂_i is shorthand for the partial derivative ∂/∂x^i, where x^i ≡ (x, y, z).
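These Cartesian expressions can be reproduced symbolically, e.g., with SymPy's vector module; the scalar and vector fields below are arbitrary sample choices:

    from sympy import sin, exp
    from sympy.vector import CoordSys3D, gradient, curl, divergence

    N = CoordSys3D('N')
    U = N.x**2 * N.y + sin(N.z)              # sample scalar field
    A = -N.y*N.i + N.x*N.j + exp(N.z)*N.k    # sample vector field

    print(gradient(U))     # 2*x*y i + x**2 j + cos(z) k
    print(curl(A))         # 2 k
    print(divergence(A))   # exp(z)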
We conclude this subsection by noting that the curl and divergence of a vector
field A(r), although important derivative operations, do not completely capture how
a vector field changes as you move from point to point. A simple counting argument
shows that, in three dimensions, we need 3 × 3 = 9 components to completely
specify how a vector field changes from point to point (three components of A times
the three directions in which to take the derivative). The curl and divergence supply
3 + 1 = 4 of those components. So we are missing 5 components, which turn out
to have the geometrical interpretation of shear (See, e.g., Romano and Price 2012).
The shear can be calculated in terms of the directional derivative of a vector field,
which we shall discuss in Appendix A.4. Figure A.4 shows an example of a vector
field that has zero curl and zero divergence, but is clearly not a constant. This is an
example of a pure-shear field (See Exercise A.11).
Fig. A.4 (figure: a pure-shear vector field, with zero curl and zero divergence; see Exercise A.11)
The familiar product rule

d(fg)/dx = (df/dx) g + f (dg/dx) (A.24)

for ordinary functions of one variable, f(x) and g(x), extends to the gradient, curl, and divergence operations, although the resulting expressions are more complicated. Since there are four different ways of combining a pair of scalar and/or vector fields (i.e., fg, A · B, f A, A × B) and two different ways of taking derivatives of vector fields (either curl or divergence), there are six different product rules:
∇( f g) = (∇ f )g + f (∇g) , (A.25a)
∇(A · B) = A × (∇ × B) + B × (∇ × A) + (A · ∇)B + (B · ∇)A , (A.25b)
∇ × ( f A) = (∇ f ) × A + f ∇ × A , (A.25c)
∇ × (A × B) = (B · ∇)A − (A · ∇)B + A(∇ · B) − B(∇ · A) , (A.25d)
∇ · ( f A) = (∇ f ) · A + f ∇ · A , (A.25e)
∇ · (A × B) = (∇ × A) · B − A · (∇ × B) . (A.25f)
We will discuss some of these product rules in more detail in Appendix A.4.
Exercise A.4 Prove the above product rules. (Hint: Do the calculations in Carte-
sian coordinates where the expressions for gradient, curl, and divergence are the
simplest.)
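As an illustration of how such a proof goes through in Cartesian components, here is a SymPy spot-check of the product rule (A.25f) for two sample (arbitrarily chosen) fields:

    from sympy import simplify
    from sympy.vector import CoordSys3D, curl, divergence

    N = CoordSys3D('N')
    A = N.x*N.y*N.i + N.z**2*N.j + N.y*N.k
    B = N.z*N.i - N.x*N.j + N.x*N.y*N.z*N.k

    lhs = divergence(A.cross(B))                 # div(A x B)
    rhs = curl(A).dot(B) - A.dot(curl(B))        # (curl A).B - A.(curl B)
    assert simplify(lhs - rhs) == 0              # (A.25f) checks out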
It is also possible to take second (and higher-order) derivatives of scalar and vector
fields. Since ∇U and ∇ × A are vector fields, we can take either their divergence or
curl. Since ∇ · A is a scalar field, we can take only its gradient. Thus, there are five
such second derivatives:
∇ · ∇U ≡ ∇ 2 U , (A.26a)
∇ × ∇U = 0 , (A.26b)
∇(∇ · A) = a vector field , (A.26c)
∇ · (∇ × A) = 0 , (A.26d)
∇ × (∇ × A) ≡ ∇(∇ · A) − ∇ 2 A . (A.26e)
Note that the curl of a gradient, ∇ ×∇U , and the divergence of a curl, ∇ ·(∇ ×A), are
both identically zero. The divergence of a gradient defines the Laplacian of a scalar
field, ∇ 2 U , and the curl of a curl defines the Laplacian of a vector field, ∇ 2 A (second
term on the right-hand side of (A.26e)). In Cartesian coordinates x i ≡ (x, y, z), the
scalar and vector Laplacians are given by
∇²U = ∂²U/∂x² + ∂²U/∂y² + ∂²U/∂z² ,
                                                         (A.27)
∇²A_i = ∂²A_i/∂x² + ∂²A_i/∂y² + ∂²A_i/∂z² , i = 1, 2, 3 .
The gradient of a divergence is a non-zero vector field in general, but it has no special
name, as it does not appear as frequently as the Laplacian operator.
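The two identities (A.26b) and (A.26d) can likewise be confirmed symbolically for sample fields; a minimal SymPy sketch:

    from sympy import sin, cos, simplify
    from sympy.vector import CoordSys3D, gradient, curl, divergence, Vector

    N = CoordSys3D('N')
    U = N.x**3 * sin(N.y) * N.z                     # sample scalar field
    A = N.y*N.z*N.i + cos(N.x)*N.j + N.x*N.y**2*N.k # sample vector field

    assert curl(gradient(U)) == Vector.zero         # (A.26b): curl grad U = 0
    assert simplify(divergence(curl(A))) == 0       # (A.26d): div curl A = 0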
You might have noticed that the right-hand sides of (A.25b) and (A.25d) for ∇(A · B)
and ∇ × (A × B) involve quantities of the form (B · ∇)A, which are not gradients,
curls, or divergences of a vector or scalar field. Geometrically, (B · ∇)A represents
the directional derivative of the vector field A in the direction of B, which generalizes
the definition of the directional derivative of a scalar field. To calculate (B · ∇)A,
we need to evaluate the directional derivatives of the components Ai with respect to
a basis êi , as well as the directional derivatives of the basis vectors themselves. But
before doing that calculation, it is worthwhile to remind ourselves about directional
derivatives of scalar fields, and also how to calculate coordinate basis vectors in
arbitrary curvilinear coordinates (u, v, w).
df/dλ ≡ Σ_i (dx^i/dλ) (∂f/∂x^i) = Σ_i v^i ∂f/∂x^i ≡ v(f) . (A.30)
∂/∂u = (∂x/∂u) ∂/∂x + (∂y/∂u) ∂/∂y + (∂z/∂u) ∂/∂z , etc. , (A.32)
⁴ A particular coordinate basis vector ∂_i points along the x^i coordinate line, with all other coordinates (i.e., x^j with j ≠ i) held constant.
∂_u = (∂x/∂u) ∂_x + (∂y/∂u) ∂_y + (∂z/∂u) ∂_z , etc. , (A.33)

with partial derivative operators replaced everywhere by coordinate basis vectors. But since the coordinate basis vectors in Cartesian coordinates are orthogonal and have unit norm, with ∂_x = x̂, etc., it follows that
∂_u = (∂x/∂u) x̂ + (∂y/∂u) ŷ + (∂z/∂u) ẑ ,
∂_v = (∂x/∂v) x̂ + (∂y/∂v) ŷ + (∂z/∂v) ẑ , (A.34)
∂_w = (∂x/∂w) x̂ + (∂y/∂w) ŷ + (∂z/∂w) ẑ .
The norms of these coordinate basis vectors are then given by

N_u ≡ |∂_u| = √[(∂x/∂u)² + (∂y/∂u)² + (∂z/∂u)²] ,
N_v ≡ |∂_v| = √[(∂x/∂v)² + (∂y/∂v)² + (∂z/∂v)²] , (A.35)
N_w ≡ |∂_w| = √[(∂x/∂w)² + (∂y/∂w)² + (∂z/∂w)²] ,

in terms of which we can define the unit vectors û ≡ N_u⁻¹ ∂_u, v̂ ≡ N_v⁻¹ ∂_v, and ŵ ≡ N_w⁻¹ ∂_w. In general, these unit vectors will not be orthogonal, although they will be for several common coordinate systems, including spherical coordinates (r, θ, φ) and cylindrical coordinates (ρ, φ, z) (See Appendix A.5 for details). We will use the above results in the next section when calculating the directional derivative of a vector field in non-Cartesian coordinates.
Let’s return now to the problem of calculating (B·∇)A, which started this discussion
of directional derivatives. In Cartesian coordinates, it is natural to define (B · ∇)A in
terms of its components via
(B · ∇)A ≡ Σ_{i,j} (B_i ∂_i A_j) ê_j , (A.37)
since the orthonormal basis vectors êi = {x̂, ŷ, ẑ} are constant vector fields. In
non-Cartesian coordinates, where the coordinate basis vectors change from point to
point, we would need to make the appropriate coordinate transformations for both
the vector components and the partial derivative operators. Although straightforward,
this is usually a rather long and tedious process.
A simpler method for calculating (B · ∇)A in non-Cartesian coordinates x^i is to expand both A and B in terms of the orthonormal basis vectors ê_i,

(B · ∇)A = Σ_{i,j} (B_i ê_i · ∇)(A_j ê_j) = Σ_{i,j} B_i (∇_{ê_i} A_j) ê_j + Σ_{i,j} B_i A_j (∇_{ê_i} ê_j) , (A.38)
where ∇_{ê_i} denotes the directional derivative along ê_i, and where we have applied the derivatives only to the expansion coefficients Λ_jk that express the orthonormal basis vectors in terms of the Cartesian ones (ê_j = Σ_k Λ_jk x̂_k), since the Cartesian basis vectors are constants. For example, for the spherical coordinate basis vectors ê_i = {r̂, θ̂, φ̂}, we have

Λ = [ sin θ cos φ    sin θ sin φ    cos θ  ]
    [ cos θ cos φ    cos θ sin φ   −sin θ  ]
    [ −sin φ         cos φ          0      ]
                                             (A.40)
Λ⁻¹ = [ cos φ sin θ    cos φ cos θ   −sin φ ]
      [ sin φ sin θ    sin φ cos θ    cos φ ]
      [ cos θ         −sin θ          0     ]

for the matrix of expansion coefficients Λ_jk and its inverse (Λ⁻¹)_kl (See Example A.2 for details). If we then re-express the Cartesian basis vectors in terms of the original non-Cartesian basis vectors using the inverse transformation matrix (Λ⁻¹)_kl, we obtain
∇_{ê_i} ê_j = Σ_{k,l} (∇_{ê_i} Λ_jk)(Λ⁻¹)_kl ê_l ≡ Σ_l C_ijl ê_l , (A.41)

where

C_ijl ≡ Σ_k (∇_{ê_i} Λ_jk)(Λ⁻¹)_kl . (A.42)
Substituting (A.41) into (A.38) then gives

(B · ∇)A = Σ_{i,j} B_i (∇_{ê_i} A_j) ê_j + Σ_{i,j,l} B_i A_j C_ijl ê_l . (A.43)
Finally, if the non-Cartesian coordinates x^i are orthogonal, as is the case for spherical coordinates (r, θ, φ) and cylindrical coordinates (ρ, φ, z), then ∇_{ê_i} = N_i⁻¹ ∂/∂x^i, where N_i is a normalization factor relating the (in general, unnormalized) coordinate basis vectors ∂_i to the orthonormal basis vectors ê_i. For example, in spherical coordinates r̂ = ∂_r, θ̂ = r⁻¹ ∂_θ, and φ̂ = (r sin θ)⁻¹ ∂_φ.
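A quick SymPy check (a sketch, not part of the derivation) confirms that the two matrices displayed in (A.40) really are inverses of one another:

    from sympy import Matrix, sin, cos, symbols, simplify, eye

    theta, phi = symbols('theta phi')
    Lam = Matrix([[sin(theta)*cos(phi), sin(theta)*sin(phi),  cos(theta)],
                  [cos(theta)*cos(phi), cos(theta)*sin(phi), -sin(theta)],
                  [-sin(phi),           cos(phi),             0]])
    Lam_inv = Matrix([[cos(phi)*sin(theta), cos(phi)*cos(theta), -sin(phi)],
                      [sin(phi)*sin(theta), sin(phi)*cos(theta),  cos(phi)],
                      [cos(theta),         -sin(theta),           0]])

    assert (Lam * Lam_inv).applyfunc(simplify) == eye(3)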
Although this might seem like a complicated procedure when discussed abstractly,
in practice it is relatively easy to carry out, as the following example shows.
∂/∂r = (∂x/∂r) ∂/∂x + (∂y/∂r) ∂/∂y + (∂z/∂r) ∂/∂z , etc. , (A.45)
it follows that
using the one-to-one correspondence between vectors and directional derivative op-
erators discussed in Appendix A.4.1. The inverse transformation is given by
ρ̂ = ∂_ρ = cos φ x̂ + sin φ ŷ ,
φ̂ = ρ⁻¹ ∂_φ = −sin φ x̂ + cos φ ŷ , (A.51)
ẑ = ∂_z = ẑ .
In this section we derive expressions for the gradient, curl, and divergence in general
orthogonal curvilinear coordinates (u, v, w). Our starting point will be the definitions
of gradient, curl, and divergence given in (A.19), (A.20), and (A.21). Examples of
orthogonal curvilinear coordinates include Cartesian coordinates (x, y, z), spherical
coordinates (r, θ, φ), and cylindrical coordinates (ρ, φ, z). These are the three main
coordinate systems that we will be using most in this text.
Recall that in Cartesian coordinates (x, y, z), the line element or infinitesimal
squared distance between two nearby points is given by
ds² = dx² + dy² + dz² (Cartesian) . (A.54)
Using the transformation equations (A.44) and (A.50), it is fairly easy to show that
in spherical coordinates (r, θ, φ) and in cylindrical coordinates (ρ, φ, z):
ds² = dr² + r² dθ² + r² sin²θ dφ² (spherical) ,
                                                (A.55)
ds² = dρ² + ρ² dφ² + dz² (cylindrical) .
For general orthogonal curvilinear coordinates (u, v, w), the line element takes the form

ds² = f² du² + g² dv² + h² dw² , (A.56)

where f, g, and h are functions of (u, v, w) in general. The fact that there are no cross terms, like du dv, is a consequence of the coordinates being orthogonal. Note that f = 1, g = r, and h = r sin θ for spherical coordinates, and f = 1, g = ρ, and h = 1 for cylindrical coordinates. These results are summarized in Table A.1.
For completely arbitrary curvilinear coordinates x^i ≡ (x¹, x², x³), the line element, (A.56), has the more general form

ds² = Σ_{i,j} g_ij dx^i dx^j , (A.57)
Table A.1 Coordinates (u, v, w) and functions f, g, h for different orthogonal curvilinear coordinate systems

Coordinates    u    v    w    f    g    h
Cartesian      x    y    z    1    1    1
Spherical      r    θ    φ    1    r    r sin θ
Cylindrical    ρ    φ    z    1    ρ    1
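The entries of Table A.1 can be recovered directly from the coordinate transformation using (A.35), since each scale factor is the norm of a coordinate basis vector. A short SymPy sketch for the spherical case (the helper function name is ours):

    from sympy import symbols, sin, cos, simplify, Matrix

    r, th, ph = symbols('r theta phi', positive=True)
    X = Matrix([r*sin(th)*cos(ph), r*sin(th)*sin(ph), r*cos(th)])  # (x, y, z)

    def norm2(u):
        v = X.diff(u)              # coordinate basis vector, e.g. d(x,y,z)/dr
        return simplify(v.dot(v))  # squared norm

    print(norm2(r), norm2(th), norm2(ph))
    # -> 1, r**2, r**2*sin(theta)**2, i.e. f = 1, g = r, h = r sin(theta)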
In terms of the scale factors, the infinitesimal displacement vector is

ds = f du û + g dv v̂ + h dw ŵ . (A.58)

This is illustrated graphically in panel (a) of Fig. A.5. It is also easy to see from this figure that the infinitesimal volume element dV is given by
dV = f gh du dv dw . (A.59)

Similarly, the infinitesimal area elements for the three coordinate surfaces are

n̂ da = ±û gh dv dw , ±v̂ h f dw du , ±ŵ f g du dv , (A.60)

with the ± sign depending on whether the unit normals to the area elements point in the direction of increasing (or decreasing) coordinate value. One such area element is illustrated graphically in panel (b) of Fig. A.5.
Fig. A.5 Panel (a) Infinitesimal displacement vector ds and volume element dV = f gh du dv dw in
general orthogonal curvilinear coordinates. Panel (b) The infinitesimal area element corresponding
to the bottom (w = const) surface of the volume element shown in panel (a)
For completely arbitrary curvilinear coordinates, the corresponding volume element is

dV = √(det g) dx¹ dx² dx³ , (A.61)

where det g is the determinant of the matrix g of metric components g_ij (See Appendix D.4.3.2), and

n̂ da = ±n̂₁ √(g₂₂ g₃₃ − (g₂₃)²) dx² dx³ ,
       ±n̂₂ √(g₃₃ g₁₁ − (g₃₁)²) dx³ dx¹ , (A.62)
       ±n̂₃ √(g₁₁ g₂₂ − (g₁₂)²) dx¹ dx² ,

where

n̂₁ = (∂₂ × ∂₃) / |∂₂ × ∂₃| , etc. (A.63)
A.5.1 Gradient

From the definition (A.19), for an arbitrary infinitesimal displacement ds we have

(∇U) · ds = dU . (A.64)

From (A.58), the left-hand side of the above equation can be written as

(∇U) · ds = (∇U)_u f du + (∇U)_v g dv + (∇U)_w h dw , (A.65)

while the total differential on the right-hand side is

dU = (∂U/∂u) du + (∂U/∂v) dv + (∂U/∂w) dw . (A.66)
By equating these last two equations, we can read off the components of the gradient, from which we obtain

∇U = (1/f)(∂U/∂u) û + (1/g)(∂U/∂v) v̂ + (1/h)(∂U/∂w) ŵ . (A.67)
Exercise Consider the potential energy function

U(x, y, z) = (1/2) k (x² + y²) + mgz . (A.68)

Calculate the force F = −∇U in (a) spherical coordinates and (b) cylindrical coordinates.
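One way to carry out part (b) of this exercise is to apply (A.67) term by term, using the cylindrical scale factors from Table A.1. A minimal SymPy sketch (variable names are our own):

    from sympy import symbols, Rational, diff

    rho, phi, z, k, m, g = symbols('rho phi z k m g', positive=True)
    U = Rational(1, 2)*k*rho**2 + m*g*z   # U in cylindrical coordinates

    F_rho = -diff(U, rho) / 1             # -(1/f) dU/du with f = 1
    F_phi = -diff(U, phi) / rho           # -(1/g) dU/dv with g = rho
    F_z   = -diff(U, z) / 1               # -(1/h) dU/dw with h = 1
    print(F_rho, F_phi, F_z)              # -> -k*rho, 0, -g*m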
A.5.2 Curl

The components of the curl follow from the definition (A.20), written in the form

(∇ × A) · n̂ da = ∮_C A · ds , (A.70)

where C is the infinitesimal closed curve bounding the area element n̂ da, with orientation given by the right-hand rule relative to n̂. To calculate the components of ∇ × A, we take (in turn) the three different infinitesimal area elements given in (A.60). Starting with n̂ da = û gh dv dw, the left-hand side of (A.70) becomes

(∇ × A) · n̂ da = (∇ × A)_u gh dv dw . (A.71)

The integrand of the right-hand side of (A.70) is

A · ds = A_v g dv + A_w h dw . (A.72)

Integrating this around the corresponding boundary curve C shown in Fig. A.6, we find that the right-hand side of (A.70) becomes
∮_C A · ds = (A_v g)|_w dv + (A_w h)|_{v+dv} dw − (A_v g)|_{w+dw} dv − (A_w h)|_v dw
           = [ ∂(A_w h)/∂v − ∂(A_v g)/∂w ] dv dw . (A.73)
Thus,

(∇ × A)_u = (1/gh) [ ∂(A_w h)/∂v − ∂(A_v g)/∂w ] . (A.74)

Repeating the above calculation for the other two components yields

(∇ × A)_v = (1/h f) [ ∂(A_u f)/∂w − ∂(A_w h)/∂u ] , (A.75)

and

(∇ × A)_w = (1/f g) [ ∂(A_v g)/∂u − ∂(A_u f)/∂v ] . (A.76)
A.5.3 Divergence

The divergence follows from the definition (A.21), written in the form

(∇ · A) dV = ∮_S A · n̂ da , (A.77)

where S is the infinitesimal closed surface bounding the volume element dV, with outward pointing normal n̂. From (A.59), the left-hand side of the above equation can be written as

(∇ · A) dV = (∇ · A) f gh du dv dw . (A.78)

From (A.60), the integrand of the right-hand side of (A.77) contains the terms

A · n̂ da = ±A_u gh dv dw , ±A_v h f dw du , ±A_w f g du dv . (A.79)
∮_S A · n̂ da = (A_u gh)|_{u+du} dv dw − (A_u gh)|_u dv dw
             + (A_v h f)|_{v+dv} dw du − (A_v h f)|_v dw du
             + (A_w f g)|_{w+dw} du dv − (A_w f g)|_w du dv (A.80)
             = [ ∂(A_u gh)/∂u + ∂(A_v h f)/∂v + ∂(A_w f g)/∂w ] du dv dw .
Thus,

∇ · A = (1/f gh) [ ∂(A_u gh)/∂u + ∂(A_v h f)/∂v + ∂(A_w f g)/∂w ] . (A.81)
A.5.4 Laplacian
Since the Laplacian of a scalar field is defined as the divergence of the gradient, it
immediately follows from (A.67) and (A.81) that
∇²U = (1/f gh) [ ∂/∂u ((gh/f) ∂U/∂u) + ∂/∂v ((h f/g) ∂U/∂v) + ∂/∂w ((f g/h) ∂U/∂w) ] . (A.82)
For spherical coordinates (r, θ, φ), these general results reduce to

∇U = (∂U/∂r) r̂ + (1/r)(∂U/∂θ) θ̂ + (1/(r sin θ))(∂U/∂φ) φ̂ ,

∇ × A = (1/(r sin θ)) [ ∂(A_φ sin θ)/∂θ − ∂A_θ/∂φ ] r̂
      + [ (1/(r sin θ)) ∂A_r/∂φ − (1/r) ∂(A_φ r)/∂r ] θ̂
      + (1/r) [ ∂(A_θ r)/∂r − ∂A_r/∂θ ] φ̂ ,
                                                      (A.83)
∇ · A = (1/r²) ∂(r² A_r)/∂r + (1/(r sin θ)) ∂(sin θ A_θ)/∂θ + (1/(r sin θ)) ∂A_φ/∂φ ,

∇²U = (1/r²) ∂/∂r (r² ∂U/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂U/∂θ) + (1/(r² sin²θ)) ∂²U/∂φ² .
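As a consistency check of (A.83), one can apply the spherical Laplacian to a field whose Cartesian Laplacian is known. The sample field below equals x² − y², which is harmonic, so the spherical expression should also give zero:

    from sympy import symbols, sin, cos, simplify, diff

    r, th, ph = symbols('r theta phi', positive=True)
    U = r**2 * sin(th)**2 * cos(2*ph)     # equals x**2 - y**2 in Cartesian

    lap = (diff(r**2*diff(U, r), r)/r**2
           + diff(sin(th)*diff(U, th), th)/(r**2*sin(th))
           + diff(U, ph, 2)/(r**2*sin(th)**2))
    print(simplify(lap))                  # -> 0, as expected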
Similarly, for cylindrical coordinates (ρ, φ, z):

∇U = (∂U/∂ρ) ρ̂ + (1/ρ)(∂U/∂φ) φ̂ + (∂U/∂z) ẑ ,

∇ × A = [ (1/ρ) ∂A_z/∂φ − ∂A_φ/∂z ] ρ̂ + [ ∂A_ρ/∂z − ∂A_z/∂ρ ] φ̂
      + (1/ρ) [ ∂(A_φ ρ)/∂ρ − ∂A_ρ/∂φ ] ẑ ,
                                               (A.84)
∇ · A = (1/ρ) ∂(ρ A_ρ)/∂ρ + (1/ρ) ∂A_φ/∂φ + ∂A_z/∂z ,

∇²U = (1/ρ) ∂/∂ρ (ρ ∂U/∂ρ) + (1/ρ²) ∂²U/∂φ² + ∂²U/∂z² .
Using the definitions of gradient, curl, and divergence given above, one can prove the following fundamental theorems of integral vector calculus:

Fundamental theorem for gradients:

∫_C (∇U) · ds = U(℘₂) − U(℘₁) , (A.85)

where C is a curve connecting the endpoints ℘₁ and ℘₂.

Stokes' theorem:

∫_S (∇ × A) · n̂ da = ∮_C A · ds , (A.86)

where C is the closed curve bounding the surface S, with orientation given by the right-hand rule relative to n̂.

Divergence theorem:

∫_V (∇ · A) dV = ∮_S A · n̂ da , (A.87)

where S is the closed surface bounding the volume V, with outward pointing normal n̂.
For infinitesimal volume elements, area elements, and path lengths, the proofs
of these theorems follow trivially from the definitions given in (A.19), (A.20), and
(A.21). For finite size volumes, areas, and path lengths, one simply adds together
the contribution from infinitesimal elements. The neighboring surfaces, edges, and
endpoints of these infinitesimal elements have oppositely-directed normals, tangent
vectors, etc., and hence yield terms that cancel out when forming the sum. For detailed
proofs, we recommend Schey (1996), Boas (2006), or Griffiths (1999).
Here we state (without proof) some additional theorems for vector fields. These make
use of the identities ∇ × ∇U = 0 and ∇ · (∇ × A) = 0, which we derived earlier
(See Example A.1), and the integral theorems of vector calculus from the previous
subsection.
F = −∇U + ∇ × W . (A.88)

The potentials U and W are not unique: the transformations

U → U + C , where C = const ,
W → W + ∇Λ , where Λ is any scalar field , (A.89)

leave F unchanged.
Both of the above theorems require that: (i) F be differentiable, and (ii) the region of
interest be simply-connected (i.e., that there are not any holes; see the discussion in
Appendix B.2). We will assume that both of these conditions are always satisfied.
Theorem A.5 is particularly relevant in the context of conservative forces, which
we encounter often in the main text. Recall that F is conservative if and only if
the work done by F in moving a particle from ra to rb is independent of the path
connecting the two points. But path-independence is equivalent to the condition that ∮_C F · ds = 0 for any closed curve C. Thus, from Theorem A.5, we can conclude that a conservative force is curl-free, i.e., ∇ × F = 0, and that it can always be written as the gradient of a scalar field.
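This statement is easy to confirm symbolically for any sample potential: a force of the form F = −∇U is automatically curl-free. A minimal SymPy sketch (the potential U is an arbitrary choice):

    from sympy import exp
    from sympy.vector import CoordSys3D, gradient, curl, Vector

    N = CoordSys3D('N')
    U = N.x**2 * N.y - N.z * exp(N.x)   # sample potential
    F = -gradient(U)                    # conservative force F = -grad U
    assert curl(F) == Vector.zero       # curl F = 0, as expected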
A = x x̂ + y ŷ , B = −y x̂ + x ŷ , C = y x̂ + x ŷ . (A.92)
The 3-dimensional Dirac delta function⁵ δ(r − r₀) is zero everywhere except at r = r₀, where it is infinitely 'spiked', in such a way that

∫_V dV δ(r − r₀) = 1 (A.94)

for any volume V containing the spike. The Dirac delta function is not an ordinary mathematical function. It is what mathematicians call a generalized function or distribution. An example of a Dirac delta function is the mass density of an idealized point particle, μ(r) = m δ(r − r₀).
In one dimension, the Dirac delta function δ(x − x0 ) can be represented as the
limit of a sequence of functions f n (x) all of which have unit area, but which get
narrower and higher as n → ∞. Some simple example sequences are:
(i) A sequence of top-hat functions centered at x₀ with width 2/n:

f_n(x) = { n/2 if x₀ − 1/n < x < x₀ + 1/n
         { 0 otherwise (A.95)
⁵ We are adopting here the standard "physicist's" definition of a 3-dimensional Dirac delta function (See e.g., Griffiths 1999), where we integrate it against the 3-dimensional volume element dV. But note that we could also define a 3-dimensional Dirac delta function δ̃(r − r₀) with respect to the coordinate volume element d³x via ∫_V d³x δ̃(r − r₀) = 1. (Recall that d³x = du dv dw while dV = f gh du dv dw for orthogonal curvilinear coordinates (u, v, w).) The difference between these two definitions of the Dirac delta function shows up in their transformation properties under a coordinate transformation, see Footnote 7.
(ii) A sequence of Gaussians centered at x₀ with standard deviation 1/n:

f_n(x) = (n/√(2π)) e^{−n²(x−x₀)²/2} (A.96)
∫_a^b dx f(x) δ(x − x′) = { f(x′) if a < x′ < b
                          { 0 otherwise (A.98)
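The sifting property (A.98) can be seen numerically by integrating a test function against the Gaussian sequence (A.96) for increasing n; a short NumPy/SciPy sketch (test function and x₀ are arbitrary choices):

    import numpy as np
    from scipy.integrate import quad

    x0 = 0.7
    f = lambda x: np.sin(3*x) + x**2     # arbitrary smooth test function

    for n in (1, 10, 100):
        fn = lambda x: n/np.sqrt(2*np.pi) * np.exp(-n**2*(x - x0)**2/2)
        val, _ = quad(lambda x: f(x)*fn(x), -10, 10, points=[x0])
        print(n, val)
    # the printed values converge to f(0.7) = sin(2.1) + 0.49 ~ 1.353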
Exercise A.12 Prove the following properties of the 1-dimensional Dirac delta function, which follow from the defining property (A.98):

δ(x − a) = d/dx [u(x − a)] ,
δ′(−x) = −δ′(x) ,
δ(−x) = δ(x) , (A.99)
δ(ax) = (1/|a|) δ(x) ,
δ[f(x)] = Σ_i δ(x − x_i)/|f′(x_i)| ,

where u(x − a) is the unit step function, and f(x) is such that f(x_i) = 0 and f′(x_i) ≠ 0. The last two properties indicate that the one-dimensional Dirac delta function δ(x) transforms like a density under a change of variables—i.e., δ(x) dx = δ(y) dy.
∫_V dV f(r) δ(r − r′) = { f(r′) if r′ ∈ V
                        { 0 otherwise (A.101)
for any test function f(r). Note that this definition implies that in Cartesian, spherical, and cylindrical coordinates

δ(r − r′) = δ(x − x′) δ(y − y′) δ(z − z′)
          = (1/(r² sin θ)) δ(r − r′) δ(θ − θ′) δ(φ − φ′) (A.102)
          = (1/ρ) δ(ρ − ρ′) δ(φ − φ′) δ(z − z′) ,

in order that

∫ dV δ(r − r′) = ∫ dx dy dz δ(x − x′) δ(y − y′) δ(z − z′)
              = ∫ dr dθ dφ δ(r − r′) δ(θ − θ′) δ(φ − φ′) (A.103)
              = ∫ dρ dφ dz δ(ρ − ρ′) δ(φ − φ′) δ(z − z′) = 1 .

More generally,

δ(r − r′) = (1/f gh) δ(u − u′) δ(v − v′) δ(w − w′) (A.104)

as a consequence of dV = f gh du dv dw.⁷
There is also an integral representation of the 1-dimensional Dirac delta function, which can be heuristically "derived" by taking a limit of sinc functions:

δ(x) = lim_{L→∞} (L/π) sinc(Lx) = lim_{L→∞} (1/2π) ∫_{−L}^{L} dk e^{ikx} = (1/2π) ∫_{−∞}^{∞} dk e^{ikx} , (A.105)
⁷ If we used the alternative definition of the 3-dimensional Dirac delta function δ̃(r − r₀) discussed in Footnote 5, then δ̃(r − r′) = δ(u − u′) δ(v − v′) δ(w − w′), without the factor of f gh.
Similarly, in 3-dimensions,

δ(r − r′) = (1/(2π)³) ∫_{all space} dV_k e^{±ik·(r−r′)} , (A.107)
Example A.3 Recall that in Newtonian gravity the gravitational potential Φ(r, t) satisfies Poisson's equation

∇²Φ(r, t) = 4πG μ(r, t) , (A.108)

where G is Newton's constant and μ(r, t) is the mass density of the source distribution. Note that the left-hand side of the above equation is just the Laplacian of Φ. We now show that for a stationary point source μ(r, t) = m δ(r − r₀), the potential is given by the well-known formula

Φ(r) = −Gm / |r − r₀| . (A.109)
To verify this, first note that

∇ · (r̂/r²) = 0 for r ≠ 0 , (A.110)

which follows from the expression for the divergence in spherical coordinates (See Exercise A.9). To determine its behavior at r = 0, we consider the volume integral of ∇ · (r̂/r²) over a spherical volume of radius R centered at the origin. Using the divergence theorem (A.87), we obtain
∫_V ∇ · (r̂/r²) dV = ∮_S (r̂/r²) · n̂ da = ∫_{φ=0}^{2π} ∫_{θ=0}^{π} (1/R²) R² sin θ dθ dφ = 4π , (A.111)
independent of the radius R. Thus, by comparison with the definition of the Dirac
delta function, (A.94), we can conclude that
∇ · (r̂/r²) = 4π δ(r) . (A.112)

But since

∇(1/r) = −r̂/r² , (A.113)
it follows that

∇²(1/r) = −4π δ(r) . (A.114)

Shifting the source point from the origin to r′, these results generalize to

∇ · [(r − r′)/|r − r′|³] = 4π δ(r − r′) , ∇²(1/|r − r′|) = −4π δ(r − r′) . (A.115)
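The statement that 1/|r − r′| is harmonic away from the source point (the content of (A.115) for r ≠ r′) can be verified symbolically; a minimal SymPy sketch:

    from sympy import symbols, sqrt, simplify, diff

    x, y, z, x0, y0, z0 = symbols('x y z x0 y0 z0', real=True)
    U = 1 / sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
    lap = diff(U, x, 2) + diff(U, y, 2) + diff(U, z, 2)
    assert simplify(lap) == 0   # the 4*pi*delta piece lives entirely at r = r'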
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 6 is devoted to vector algebra and vector calculus, especially
suited for undergraduates.
Griffiths (1999): Chapter 1 provides an excellent review of vector algebra and vector
calculus, at the same level as this appendix. Our discussion of orthogonal curvilin-
ear coordinates in Appendix A.5 is a summary of Appendix A in Griffiths, which
has more detailed derivations and discussion.
Schey (1996): An excellent introduction to vector calculus emphasizing the geometric
nature of the divergence, gradient, and curl operations.
Appendix B
Differential Forms
Although we will not need to develop the full machinery of tensor calculus for the
applications to classical mechanics covered in this book, the concept of a differential
form and the associated operations of exterior derivative and wedge product will
come in handy from time to time. For example, they are particularly useful for determining whether certain differential equations or constraints on a mechanical system
(See e.g., Sect. 2.2.3) are integrable or not. They are also helpful in understanding
the geometric structure underlying Poisson brackets (Sect. 3.5). More generally, dif-
ferential forms are actually the quantities that you integrate on a manifold, with the
integral theorems of vector calculus (Appendix A.6) being special cases of a more
general (differential-form version) of Stokes’ theorem.
In broad terms, the exterior derivative is a generalization of the total derivative (or
gradient) of a function, and the curl of a vector field in three dimensions. The wedge
product is a generalization of the cross-product of two vectors. And differential forms
are quantities constructed from a sum of wedge products of coordinate differentials
dx i . Readers interested to learn more about differential forms and related topics
should see e.g., Flanders (1963) and Schutz (1980).
B.1 Definitions
Since we have not developed a general framework for working with tensors, our
presentation of differential forms will be somewhat heuristic, starting with familiar examples for 0-forms and 1-forms, and then adding mathematical operations as needed (e.g., wedge product and exterior derivative) to construct higher-order differential
forms. To keep things sufficiently general, we will consider an n-dimensional mani-
fold M with coordinates x i ≡ (x 1 , x 2 , . . . , x n ). From time to time we will consider
ordinary 3-dimensional space to make connection with more familiar mathematical
objects and operations.
A 0-form is simply a function of the coordinates,

α ≡ α(x¹, x², . . . , xⁿ) . (B.1)

A 1-form is a linear combination of the coordinate differentials,

β ≡ Σ_i β_i dx^i , (B.2)

with components β_i ≡ β_i(x¹, x², . . . , xⁿ) that transform as

β′_i = Σ_j (∂x^j/∂x′^i) β_j , i = 1, 2, . . . , n , (B.3)

under a coordinate transformation x^i → x′^i(x^j). We impose this requirement on the components in order that

β ≡ Σ_i β_i dx^i = Σ_i β′_i dx′^i (B.4)

be independent of the choice of coordinates. The simplest example of a 1-form is the exterior derivative of a 0-form,

dα ≡ Σ_i (∂_i α) dx^i , (B.5)

where ∂_i α ≡ ∂α/∂x^i. Note that the exterior derivative of a 0-form is just the usual total differential (or gradient) of a function.
Exercise B.1 Verify (B.4) using (B.3) and the transformation property of the
coordinate differentials dx i .
To construct a 2-form from two 1-forms, we introduce the wedge product of two
forms. We require this product to be anti-symmetric,
dx^i ∧ dx^j = −dx^j ∧ dx^i , (B.6)

and to be linear,

α ∧ (fβ + gγ) = f (α ∧ β) + g (α ∧ γ) , (B.7)
where f and g are any two functions. Given this definition, it immediately follows
that the wedge product of two 1-forms α and β can be written as
α ∧ β = Σ_{i,j} α_i β_j dx^i ∧ dx^j = Σ_{i<j} (α_i β_j − α_j β_i) dx^i ∧ dx^j , (B.8)
where we used the anti-symmetry of dx i ∧ dx j to get the last equality. Note that in
three dimensions
α_i β_j − α_j β_i = Σ_k ε_ijk (α × β)_k , (B.9)
where on the right-hand side we are treating αi and βi as the components of two
vectors α and β. Thus, the wedge product of two 1-forms generalizes the cross
product of two vectors in three dimensions. The most general 2-form on M will have
the form
γ ≡ Σ_{i<j} γ_ij dx^i ∧ dx^j , (B.10)
where the components γ_ij ≡ γ_ij(x¹, . . . , xⁿ) can be taken to be anti-symmetric, γ_ij = −γ_ji. Similarly, the exterior derivative of a 1-form α is the 2-form

dα = Σ_{i<j} (∂_i α_j − ∂_j α_i) dx^i ∧ dx^j . (B.11)

Note that in three dimensions

∂_i α_j − ∂_j α_i = Σ_k ε_ijk (∇ × α)_k , (B.12)

where on the right-hand side we are treating α_i as the components of a vector field α. So the exterior derivative of a 1-form generalizes the curl of a vector field in three dimensions.
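The correspondence between the components of dα and the curl can be spot-checked in components; the sample 1-form below is an arbitrary choice:

    from sympy import symbols, sin, Matrix

    x1, x2, x3 = symbols('x1 x2 x3')
    alpha = [x2*x3, sin(x1), x1**2*x2]   # sample 1-form components
    X = [x1, x2, x3]

    # 2-form components: (d alpha)_ij = d_i alpha_j - d_j alpha_i
    dalpha = Matrix(3, 3, lambda i, j: alpha[j].diff(X[i]) - alpha[i].diff(X[j]))

    # curl components read off via (B.12)
    curl = [dalpha[1, 2], dalpha[2, 0], dalpha[0, 1]]
    print(curl)   # -> [x1**2, x2 - 2*x1*x2, cos(x1) - x3]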
We can continue in this fashion to construct 3-forms, 4-forms, etc., by requiring that
the wedge product be associative,
α ∧ (β ∧ γ ) = (α ∧ β) ∧ γ = α ∧ β ∧ γ . (B.13)
The most general p-form is then

α = Σ_{i₁<i₂<···<i_p} α_{i₁i₂···i_p} dx^{i₁} ∧ dx^{i₂} ∧ · · · ∧ dx^{i_p} , (B.14)

where the components α_{i₁i₂···i_p} ≡ α_{i₁i₂···i_p}(x¹, x², . . . , xⁿ) are totally anti-symmetric under interchange of the indices i₁, i₂, . . . , i_p. Similarly, the exterior derivative of a p-form α is the (p + 1)-form

dα = Σ_{i₁<i₂<···<i_{p+1}} ( ∂_{i₁} α_{i₂i₃···i_{p+1}} − ∂_{i₂} α_{i₁i₃···i_{p+1}} − · · · − ∂_{i_{p+1}} α_{i₂i₃···i_p i₁} ) dx^{i₁} ∧ dx^{i₂} ∧ · · · ∧ dx^{i_{p+1}} . (B.15)
Introducing square brackets to denote anti-symmetrization over the enclosed indices,

[ij] ≡ (1/2!)(ij − ji) ,
[ijk] ≡ (1/3!)(ijk − ikj + jki − jik + kij − kji) , (B.16)
etc. ,
we can write down the general expressions for the components of the wedge product
and exterior derivative in compact form:
(α ∧ β)_{i₁···i_p j₁···j_q} = [(p + q)!/(p! q!)] α_{[i₁···i_p} β_{j₁···j_q]} (B.17)
and

(dα)_{i₁i₂···i_{p+1}} = (p + 1) ∂_{[i₁} α_{i₂···i_{p+1}]} . (B.18)

Using these expressions, one can show that

α ∧ β = (−1)^{pq} β ∧ α (B.19)

for a p-form α and a q-form β.
Exercise B.3 Let α be a p-form and β be a q-form. Show that the exterior derivative of the wedge product α ∧ β satisfies

d(α ∧ β) = dα ∧ β + (−1)^p α ∧ dβ , (B.20)

as a consequence of the ordinary product rule for partial derivatives and the anti-symmetry of the differential forms.
A form β is said to be closed if dβ = 0, and exact if β = dα for some form α. Every exact form is closed, since

d(dβ) = 0 , (B.21)

but a closed form need not be globally exact if the space contains closed curves that cannot be continuously shrunk to a point. For example, a circle encircling the origin of the punctured plane, and any of the coordinate lines on the torus shown in Fig. B.1, are not contractible to a point. (See Schutz 1980 or Flanders 1963 for more details.)

Exercise Consider the 1-form

α ≡ (x dy − y dx)/(x² + y²) (B.22)

defined on the punctured plane. Show that α is closed, but globally is not exact. Find a function f(x, y) for which α = d f locally. (Hint: Plane polar coordinates (r, φ) might be useful for this.)
You may recall from a math methods class trying to determine if a 1st-order differential equation of the form

A(x, y) dx + B(x, y) dy = 0 (B.23)

is exact. If

∂_y A = ∂_x B , (B.24)

then (at least locally) there exists a function ϕ ≡ ϕ(x, y) for which

dϕ = A dx + B dy , (B.25)

so that the solutions of the differential equation are the level curves ϕ(x, y) = const. But requiring that the differential equation be exact is actually too strong a requirement for integrability. More generally, (B.23) is integrable if and only if there exists a function μ ≡ μ(x, y), called an integrating factor, for which

μ (A dx + B dy) (B.26)

is exact, so that¹

∂_y (μA) = ∂_x (μB) . (B.27)
It turns out that in two dimensions one can always find such an integrating factor.
Thus, all 1st-order differential equations of the form given in (B.23) are integrable
(Exercise B.6). But explicitly finding an integrating factor in practice is not an easy
task in general.
Now in three and higher dimensions not all 1st-order differential equations are
integrable, so testing for integrability is a necessary and important task. Writing the
differential equation in n dimensions as
α ≡ Σ_i α_i dx^i = 0 , (B.28)

integrability is equivalent to the existence of an integrating factor μ and a function ϕ for which

dϕ = μ α . (B.29)

Taking the exterior derivative of both sides and using d(dϕ) = 0, we find

0 = dμ ∧ α + μ dα ⇔ dα = −μ⁻¹ dμ ∧ α . (B.31)
¹ Recall from thermodynamics that heat flow is described by an inexact differential d̄Q (notationally,
the bar on the ‘d’ is to indicate that it is not the total differential of a function Q). But d¯Q becomes
exact when multiplied by an integrating factor, i.e., dS = d¯Q/T , where T is the temperature and
S is the entropy.
Taking the wedge product of both sides with α, and noting that α ∧ α = 0, we obtain the necessary condition for integrability:

dα ∧ α = 0 . (B.32)

More generally, we can consider a system of M 1st-order differential equations,

α^A ≡ Σ_i α_i^A dx^i = 0 , A = 1, 2, . . . , M , (B.33)

where M < n and α_i^A ≡ α_i^A(x¹, x², . . . , xⁿ). We would like to know if this system
is integrable in the sense of defining an (n − M)-dimensional hypersurface in the
original n-dimensional space of coordinates. The necessary and sufficient condition
for this to be true is the existence of an invertible transformation from the α A to a set
of exact 1-forms (i.e., total differentials):
dϕ^A = Σ_B μ^{AB} α^B ⇔ α^A = Σ_B (μ⁻¹)^{AB} dϕ^B , (B.34)
Frobenius' theorem states that the necessary and sufficient condition for such a transformation to exist is

dα^A ∧ α¹ ∧ α² ∧ · · · ∧ α^M = 0 , A = 1, 2, . . . , M . (B.35)
Exercise B.6 Using Frobenius' theorem, prove that any 1st-order differential equation in two dimensions,

A(x, y) dx + B(x, y) dy = 0 , (B.36)

is integrable.
Exercise B.7 (Adapted from Flanders 1963.) Consider the 1st-order differential
equation
α ≡ yz dx + x z dy + dz = 0 , (B.37)
in three dimensions.
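For this particular α, the Frobenius condition (B.32) can be evaluated directly in components; the following SymPy sketch (ours, not part of the exercise) shows that dα ∧ α = 0, so the equation is integrable:

    from sympy import symbols, simplify, diff

    x, y, z = symbols('x y z')
    a = [y*z, x*z, 1]                 # components (alpha_x, alpha_y, alpha_z)
    X = [x, y, z]

    # 2-form components of d(alpha): c[i][j] = d_i alpha_j - d_j alpha_i
    c = [[diff(a[j], X[i]) - diff(a[i], X[j]) for j in range(3)]
         for i in range(3)]

    # coefficient of dx^dy^dz in d(alpha) ^ alpha
    coeff = c[0][1]*a[2] - c[0][2]*a[1] + c[1][2]*a[0]
    print(simplify(coeff))   # -> 0; indeed alpha/z = d(xy + ln z) for z > 0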
Although you may not have thought about it this way, the things that you integrate on a manifold are really just differential forms. Indeed, the integrand f(x) dx of the familiar integral

∫_{x₁}^{x₂} f(x) dx (B.38)

from calculus is, in the language of this appendix, a 1-form. And the transformation property of f(x) under a change of variables x → y(x),

f(x) → f̄(y) ≡ [ f(x)/(dy/dx) ]|_{x=x(y)} , (B.39)
The extension to 3-form, 4-form, · · · , n-form fields follows by noting that the above integrands are special cases of the general result that a p-form γ maps a set of p vectors with components {A^i, B^j, . . . , C^k} to the real number

Σ_{i<j<···<k} γ_{ij···k} ( A^i B^j · · · C^k − A^j B^i · · · C^k − · · · − A^k B^j · · · C^i ) . (B.44)
By taking

A^i = (∂x^i/∂u) du , B^j = (∂x^j/∂v) dv , · · · , C^k = (∂x^k/∂w) dw , (B.45)
where (u, v, . . . , w) are the coordinates for a p-dimensional hypersurface, we are able
to generalize (B.41) and (B.42) to arbitrary p-forms, with the appropriate Jacobians
entering these expressions. Figure B.2 shows the tangent vectors and infinitesimal
coordinate area element for a 2-dimensional surface spanned by the coordinates
(u, v).
Note that all of these integrals are oriented in the sense that swapping the order
of the coordinates, e.g., (u, v) → (v, u), in the parametrization of the 2-dimensional
surface S, changes the sign of the Jacobian and hence the sign of the integral. In
addition, if one decides to change coordinates to do a particular integral, the Jacobian
of the transformation enters automatically via the wedge product of the coordinate
differentials. For example, in two dimensions, if one transforms from (x, y) to (u, v)
it follows that
² The vertical lines in (B.43) mean you should take the determinant of the 2 × 2 matrix of partial
derivatives. See Appendix D.4.3.2 for more details, if needed.
Fig. B.2 (figure: the tangent vectors (∂x^i/∂u) du and (∂x^j/∂v) dv spanning the infinitesimal coordinate area element of a 2-dimensional surface)
dx ∧ dy = ( (∂x/∂u) du + (∂x/∂v) dv ) ∧ ( (∂y/∂u) du + (∂y/∂v) dv )
        = ( (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u) ) du ∧ dv = [ ∂(x, y)/∂(u, v) ] du ∧ dv , (B.46)
where we used the chain rule and the anti-symmetry of the wedge product to get the
first and second equalities above. In n dimensions, for a coordinate transformation
from x i ≡ (x 1 , x 2 , . . . , x n ) to x i = (x 1 , x 2 , . . . , x n ), we have
∂x1 ∂ x n i1
dx 1 ∧ · · · ∧ dx n = ··· ··· i n
dx ∧ · · · ∧ dx in
∂x i 1 ∂ x
i 1 i
n
∂(x 1 , . . . , x n )
= dx 1 ∧ · · · ∧ dx n ,
∂(x , . . . , x )
1 n
where we made use of the n-dimensional Levi-Civita symbol and used (D.81) to
get the last equality. Note that this is precisely the inverse transformation of the
components of an n-form
∂(x 1 , . . . , x n )
ω1···n = ω1 ···n , (B.48)
∂(x 1 , . . . , x n )
dⁿx ≡ dx¹ ∧ dx² ∧ · · · ∧ dxⁿ , (B.50)
Finally, to end this section, we note that the integral theorems of vector calculus
(Appendix A.6) are actually special cases of an all-inclusive Stokes’ theorem, written
in terms of differential forms,
∫_U dα = ∫_{∂U} α , (B.52)
Exercise B.8 (a) Show explicitly by taking partial derivatives that a coordinate transformation from Cartesian coordinates (x, y) to plane polar coordinates (r, φ) leads to

dx ∧ dy = r dr ∧ dφ . (B.53)

(b) Similarly, show that a coordinate transformation from Cartesian coordinates (x, y, z) to spherical coordinates (r, θ, φ) leads to

dx ∧ dy ∧ dz = r² sin θ dr ∧ dθ ∧ dφ . (B.54)
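Part (a) amounts to computing the 2 × 2 Jacobian determinant in (B.46); a one-line SymPy check:

    from sympy import symbols, sin, cos, simplify, Matrix

    r, phi = symbols('r phi', positive=True)
    x = r*cos(phi);  y = r*sin(phi)
    J = Matrix([[x.diff(r), x.diff(phi)],
                [y.diff(r), y.diff(phi)]]).det()   # d(x,y)/d(r,phi)
    print(simplify(J))   # -> r, so dx ^ dy = r dr ^ dphi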
Exercise B.9 Show that in three dimensions (B.52) reduces to the fundamental theorem for gradients (A.85), Stokes' theorem (A.86), and the divergence theorem (A.87), by making the following identifications:

(a) For p = 1, identify the 0-form α with the function U, and the exterior derivative dα with the gradient ∇U. Also, identify

(dx^i/ds) ds (B.55)

with the line element ds, where dx^i/ds is the tangent vector to the curve C parameterized by the arc length s.

(b) For p = 2, identify the 1-form α with the vector field A_i ≡ α_i, and use (B.12) to identify dα with ∇ × A. Also, identify

Σ_{j<k} ε_ijk [ ∂(x^j, x^k)/∂(u, v) ] du dv (B.56)

with the area element n̂_i da, where (u, v) are coordinates on the surface S.

(c) For p = 3, identify the 2-form α with a vector field A, so that dα corresponds to (∇ · A) dV. Also, identify

Σ_{i<j<k} ε_ijk [ ∂(x^i, x^j, x^k)/∂(u, v, w) ] du dv dw (B.57)

with the volume element dV, where (u, v, w) are coordinates on the volume V.
Suggested References
Full references are given in the bibliography at the end of the book.
Flanders (1963): A classic text about differential forms, appropriate for graduate
students or advanced undergraduates comfortable with abstract mathematics.
Schutz (1980): An introduction to differential geometry, including tensor calculus
and differential forms, with an emphasis on geometrical methods. Appropriate for
graduate students or advanced undergraduates comfortable with abstract mathe-
matics.
Appendix C
Calculus of Variations
The calculus of variations is an extension of the standard procedure for finding the
extrema (i.e., maxima and minima) of a function f (x) of a single real variable x.
But instead of extremizing a function f (x), we extremize a functional I [y], which
is a “function of a function” y = f (x). Classic problems that can be solved using
the calculus of variations are: (i) finding the curve connecting two points in the
plane that has the shortest distance (a geodesic problem), (ii) finding the shape of
a closed curve of fixed length that encloses the maximum area (an isoperimetric
problem), (iii) finding the shape of a wire joining two points such that a bead will
slide along the wire under the influence of gravity in the shortest amount of time (the
famous brachistochrone problem of Johann Bernoulli). The calculus of variations
also provides an alternative way of obtaining the equations of motion for a particle,
or a system of particles, in classical mechanics. In this appendix, we derive the Euler
equations, discuss ways of solving these equations in certain simplified scenarios,
and extend the formalism to deal with integral constraints. For a more thorough
introduction to the calculus of variations, see, e.g., Boas (2006), Gelfand and Fomin
(1963), and Lanczos (1949). Specific applications to classical mechanics will be
given in Chap. 3.
C.1 Functionals
In its simplest form, a functional I = I [y] is a mapping from some specified set of
functions {y = f (x)} to the set of real numbers R. For the types of problems that we
will be most interested in, the functions y = f (x) are defined on some finite interval
x ∈ [x1 , x2 ]; they are single-valued and have continuous first derivatives; and they
have fixed endpoints ℘1 ≡ (x1 , y1 ), ℘2 ≡ (x2 , y2 ). (Curves that cannot be described
by a single-valued function can be put in parametric form, x = x(t), y = y(t), which
we will discuss in detail in Appendix C.6.) A simple concrete example of a functional
is the arc length of the curve traced out by a function y = f (x) that connects ℘1
and ℘2 :
Fig. C.1 (figure: a curve y = f(x) connecting the fixed endpoints ℘₁ and ℘₂)
I[y] ≡ ∫_{℘₁}^{℘₂} ds = ∫_{℘₁}^{℘₂} √(dx² + dy²) = ∫_{x₁}^{x₂} √(1 + y′²) dx , (C.1)
The general form of the functionals that we will consider is

I[y] ≡ ∫_{x₁}^{x₂} F(y, y′, x) dx , (C.2)
where x is the independent variable and the set of functions {y = f(x)} is as before, but the integrand F is now an arbitrary function of the three variables (y, y′, x). (For the arc-length functional defined previously, F(y, y′, x) = √(1 + y′²), which is independent of x and y, but that does not have to be the case in general.) Note that to do the integral over x, we need to express both y and y′ in terms of f(x), but for the variational calculations that follow, we simply treat F as an ordinary function of three independent variables. The fact that y and y′ are related to one another only shows up later on, when we need to relate the variation δy′ to δy, cf. (C.6).
Toward the end of this appendix, in Appendix C.7, we will extend our definition
of a functional to n-degrees of freedom:
I[y₁, y₂, . . . , y_n] ≡ ∫_{x₁}^{x₂} F(y₁, y₂, . . . , y_n; y′₁, y′₂, . . . , y′_n; x) dx , (C.3)
Given I [y], we now want to find its extrema—i.e., those functions y = f (x) for
which I [y] has a local maximum or minimum. Similar to ordinary calculus, a nec-
essary (but not sufficient) condition for y to be an extremum is that the 1st-order
change in I [y] vanish for arbitrary variations to y = f (x) that preserve the boundary
conditions. We define such a variation to y = f(x) by

δy ≡ f̄(x) − f(x) , (C.4)

where f̄(x) is a function that differs infinitesimally from f(x) at each value of x in
the domain [x1 , x2 ] (See Fig. C.2). In terms of δy, the variation of the functional is
then given by δ I [y] ≡ I [y +δy]− I [y], where we ignore all terms that are 2nd-order
or higher in δy, δy . The condition δ I [y] = 0 determines the stationary values of
the functional. These include maxima and minima, but also points of inflection or
saddle points. To check if a stationary value is an extremum, we need to calculate the
change in I [y] to 2nd-order in δy. If the 2nd-order contribution δ 2 I [y] is positive,
then we have a minimum; if it is negative, a maximum; and if it is zero, a saddle
point. However, in the calculations that follow, we will stop at 1st order, as it will
usually be obvious from the context of the problem whether our stationary solution
is a maximum or minimum, without having to explicitly carry out the 2nd-order
variation.
Given definition (C.2) of the functional I[y], it follows that

δI[y] = I[y + δy] − I[y] = ∫_{x₁}^{x₂} [ (∂F/∂y) δy + (∂F/∂y′) δy′ ] dx , (C.5)

where we ignored all 2nd-order terms to get the last line. As mentioned earlier, the variations δy and δy′ are not independent of one another, but are related by

δy′ ≡ δ(dy/dx) = d(δy)/dx . (C.6)
Fig. C.2 (figure: a curve y = f(x) and a varied curve y + δy = f̄(x), which agree at the fixed endpoints ℘₁ and ℘₂)
Making this substitution and then integrating the term involving δy′ by parts, we find

δI[y] = [ (∂F/∂y′) δy ]_{x₁}^{x₂} + ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] δy dx . (C.7)
Since the variations vanish at the fixed endpoints,

δy|_{x₁} = 0 = δy|_{x₂} , (C.8)

the first term on the right-hand side of (C.7) is zero. Then, since the variation δy is otherwise arbitrary, it follows that

δI[y] = 0 ⇔ ∂F/∂y − d/dx (∂F/∂y′) = 0 . (C.9)
The equation on the right-hand side is called the Euler equation. (In the context of
classical mechanics, where F is the Lagrangian of the system, the above equation is
called the Euler-Lagrange equation. See Chap. 3 for details.)
Example C.1 Using the Euler equation, show that the curve that minimizes the
distance between two fixed points in a plane is a straight line.
Proof From (C.1) we have F(y, y′, x) = √(1 + y′²), which is independent of both x and y. Thus, the Euler equation (C.9) simplifies to

d/dx (∂F/∂y′) = 0 ⇔ ∂F/∂y′ = const . (C.10)

Evaluating the partial derivative, we find

∂F/∂y′ = y′/√(1 + y′²) = const , (C.11)

which implies y′ = const—i.e., the curve is a straight line. □
ds² = R² dφ² + dz² , (C.12)
More generally, arc length is computed from a line element of the form

ds² = Σ_{i,j=1}^{n} g_ij dx^i dx^j , (C.14)

where g_ij ≡ g_ij(x¹, x², . . . , xⁿ).
Exercise C.2 Show that a geodesic on the surface of a sphere is an arc of a great
circle—i.e., the intersection of the surface of the sphere with a plane passing
through the center of the sphere, A cos φ + B sin φ + cot θ = 0, where A and
B are constants, determined by the end points of the curve. (Hint: Take θ as the
independent variable for this calculation.)
dF/dx = 0 or ∂F/∂y′ = 0 . (C.16)

Thus, unlike varying an ordinary function f(x) > 0, for which the stationary values of f(x) and g(x) ≡ f²(x) are identical, the stationary values of the functionals defined by F(y, y′, x) and F²(y, y′, x) differ in general.
We can give a more formal derivation of Euler's equation and the associated variational process by writing the variation δy of the function y = f(x) as

δy = ε η(x) , (C.17)

where ε is an infinitesimal parameter and η(x) is an arbitrary function that vanishes at the endpoints x₁ and x₂. Note that since I[y + εη] is an ordinary function of the real variable ε, we can Taylor expand I[y + εη] or take its derivatives with respect to ε in the usual way. For our applications, we will be particularly interested in the first derivative of I[y + εη] with respect to ε evaluated at ε = 0:
dI[y + εη]/dε |_{ε=0} ≡ lim_{ε→0} ( I[y + εη] − I[y] )/ε . (C.19)

This derivative defines the functional derivative δI[y]/δy(x) via¹

dI[y + εη]/dε |_{ε=0} ≡ ∫ dx ( δI[y]/δy(x) ) η(x) , (C.20)

where the integration is over the domain x ∈ [x₁, x₂] of the functions y = f(x) on which the functional I[y] is defined. Note that the above definition is the functional analogue of the definition of the directional derivative of a function ϕ(x¹, x², . . . , xⁿ) in the direction of η:

dϕ(x + εη)/dε |_{ε=0} ≡ lim_{ε→0} ( ϕ(x + εη) − ϕ(x) )/ε = Σ_{i=1}^{n} (∂ϕ/∂x^i) η^i , (C.21)
where integration over the continuous variable x in (C.20) replaces the summation
over the discrete index i in (C.21); see also Appendix A.4.1 and (A.30).
If the functional I[y] has the form given in (C.2), i.e.,

I[y] ≡ ∫_{x₁}^{x₂} dx F(y, y′, x) , (C.22)
where the functions y = f (x) are fixed at x1 and x2 , then the variational procedure
described above leads to the same results that we found in the previous section,
namely
dI[y + εη]/dε |_{ε=0} = [ (∂F/∂y′) η ]_{x₁}^{x₂} + ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] η dx . (C.23)
But since the function η(x) vanishes at the endpoints, the above expression simplifies to

dI[y + εη]/dε |_{ε=0} = ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] η dx , (C.24)

for which
¹ A word of caution. The functional derivative δI[y]/δy(x) is a density in x, being defined inside an integral, (C.20). As such, the dimensions of dx δI[y]/δy(x) are the same as the dimensions of I divided by the dimensions of y. For example, if I[y] is the arc length functional, then δI[y]/δy(x) has dimension of 1/length.
δI[y]/δy = ∂F/∂y − d/dx (∂F/∂y′) . (C.25)

Thus, in terms of a functional derivative, the Euler equation (C.9) can be written as δI[y]/δy(x) = 0.
Exercise C.4 Let I [y] be a functional that depends only on the value of y at a
particular value of x, e.g.,
I [y] ≡ y(x0 ) . (C.26)
Show that for this case the functional derivative is the Dirac delta function:
δ I [y]
= δ(x − x0 ) . (C.27)
δy(x)
When F does not explicitly depend upon x, it is convenient to work with the Euler equation in an alternative form. If we simply take the total derivative of F with respect to x we have

dF/dx = (∂F/∂y) y′ + (∂F/∂y′) y″ + ∂F/∂x . (C.28)

But since

d/dx [ y′ (∂F/∂y′) ] = y″ (∂F/∂y′) + y′ d/dx (∂F/∂y′) , (C.29)

we can combine these two equations, using the Euler equation (C.9) to eliminate ∂F/∂y, to obtain

δI[y] = 0 ⇔ 0 = ∂F/∂x + d/dx [ y′ (∂F/∂y′) − F ] . (C.31)
The Euler equation (C.9) or its alternate form (C.31) is a 2nd-order ordinary differential equation with respect to the independent variable x. This equation may simplify depending on the form of F(y, y′, x):

(1) If F does not depend explicitly on y, then the Euler equation (C.9) reduces to

∂F/∂y′ = const . (C.32)

(2) If F does not depend explicitly on x, then the alternate form (C.31) reduces to

y′ (∂F/∂y′) − F = const . (C.33)
It turns out that simplification (2) is equivalent to making a change of the independent variable in the integrand of the functional from x to y, using

dx = x′ dy ⇔ y′ = 1/x′ , (C.34)

where x′ ≡ dx/dy, so that

I[y] = ∫_{x₁}^{x₂} F(y, y′) dx = ∫_{y₁}^{y₂} F(y, 1/x′) x′ dy ≡ ∫_{y₁}^{y₂} F̃(x, x′, y) dy ≡ Ĩ[x] . (C.35)
But since

F̃(x, x′, y) ≡ x′ F(y, 1/x′) (C.36)

is independent of x, the Euler equation for Ĩ[x] simplifies to ∂F̃/∂x′ = const. But note that

∂F̃/∂x′ = ∂/∂x′ [ x′ F(y, 1/x′) ] = F + x′ (∂F/∂y′) ∂(1/x′)/∂x′ = F − (1/x′)(∂F/∂y′) = F − y′ (∂F/∂y′) , (C.37)

which means that

∂F̃/∂x′ = const ⇔ y′ (∂F/∂y′) − F = const , (C.38)
Example C.2 A soap film is suspended between two circular loops of wire, as shown in Fig. C.3. Ignoring the effects of gravity, the soap film takes the shape of a surface of revolution, which has minimum surface area. Thus, in terms of the function y = f(x), the functional that we need to minimize is the surface area of revolution

I[y] = 2π ∫_{℘₁}^{℘₂} y ds = 2π ∫_{x₁}^{x₂} y √(1 + y′²) dx , (C.39)

which has

F(y, y′, x) = 2π y √(1 + y′²) . (C.40)

But since F does not depend explicitly on x, we can use simplification (2) to write

y′ (∂F/∂y′) − F = 2π y y′²/√(1 + y′²) − 2π y √(1 + y′²) = −2π y/√(1 + y′²) = const . (C.41)
Setting this last expression equal to −2πA and solving for y′, we find

y′ ≡ dy/dx = (1/A) √(y² − A²) , (C.42)

where A is a constant.
This is a separable equation, which can be integrated using the hyperbolic trig substitution y = A cosh u, recalling that cosh²u − sinh²u = 1 and d cosh u = sinh u du. Thus,

x = A ∫ dy/√(y² − A²) + B = A ∫ (A sinh u du)/(A sinh u) + B = A u + B = A cosh⁻¹(y/A) + B , (C.43)

or, equivalently,

y = A cosh( (x − B)/A ) . (C.44)
Such a curve is called a catenary. As usual, the integration constants A and B can
be determined by the boundary conditions for y = f (x), which are related to the
radii of the two circular loops of wire. Unfortunately, solving for A and B involves
solving a transcendental equation. See Chap. 17 of Arfken (1970) for a discussion of
special cases of this problem.
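In practice, A and B must be found numerically from the boundary data. A small SciPy sketch under assumed endpoint values (the numbers below are hypothetical, chosen only for illustration):

    import numpy as np
    from scipy.optimize import fsolve

    x1, y1 = 0.0, 1.0    # first loop:  y(x1) = y1  (assumed)
    x2, y2 = 1.0, 1.2    # second loop: y(x2) = y2  (assumed)

    def equations(p):
        A, B = p
        return (A*np.cosh((x1 - B)/A) - y1,
                A*np.cosh((x2 - B)/A) - y2)

    A, B = fsolve(equations, (0.5, 0.5))   # initial guess
    print(A, B)   # one catenary through the two endpoints (if a solution exists)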
Exercise C.5 Find the shape of a wire joining two points such that a bead will slide along the wire under the influence of gravity (without friction) in the shortest amount of time. (See Fig. C.4.) Assume that the bead is released from rest at y = 0. Such a curve is called a brachistochrone, which in Greek means "shortest time."

Hint: You should extremize the functional

I[y] = ∫_{℘₁}^{℘₂} ds/v = ∫_{x₁}^{x₂} dx √(1 + y′²)/√(2gy) , (C.45)

where conservation of energy,

(1/2) m v² − m g y = 0 ⇒ v = √(2gy) , (C.46)

was used to yield an expression for the speed v in terms of y. By using simplification (2) or changing the independent variable of the functional from x to y, you should find

y′ ≡ dy/dx = √( (1 − Ay)/(Ay) ) , (C.47)

where A is a constant.
Fig. C.4 Geometrical set-up for the brachistochrone problem, Exercise C.5. The goal is to find
the shape of the wire connecting points ℘1 and ℘2 such that a bead slides along the wire under the
influence of gravity (and in the absence of friction) in the shortest amount of time. Note that we
have chosen the y-axis to increase in the downward direction
Fig. C.5 A cycloid is the path traced out by a point on the rim of a wheel as it rolls without slipping
across a flat surface. It is also the shape of the wire that solves the brachistochrone problem, Exer-
cise C.5. To be consistent with the geometry of Exercise C.5 shown in Fig. C.4, we are considering
the wheel as rolling to the right in contact with the top horizontal surface
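That the cycloid of Fig. C.5 solves (C.47) can be confirmed symbolically. Writing the curve parametrically as x = R(t − sin t), y = R(1 − cos t) and taking A = 1/(2R) (a standard identification, not stated in the exercise itself):

    from sympy import symbols, sin, cos, simplify

    t, R = symbols('t R', positive=True)
    x = R*(t - sin(t));  y = R*(1 - cos(t))   # cycloid, y increasing downward
    A = 1/(2*R)

    yprime = y.diff(t) / x.diff(t)            # dy/dx along the curve
    assert simplify(yprime**2 - (1 - A*y)/(A*y)) == 0
    # and yprime = cot(t/2) > 0 for 0 < t < pi, matching the positive root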
(a) Write down the line element ds² on the surface of revolution in terms of the coordinates (ρ, φ) by simply substituting the embedding equations into the 3-dimensional line element dx² + dy² + dz².
(b) By varying the arc length functional, obtain the geodesic equation for a curve ρ = ρ(φ) on the surface of revolution, and show that it can be solved via quadratures,

φ − φ₀ = c₁ ∫_{ρ₀}^{ρ} (dρ/ρ) √( (1 + [f′(ρ)]²)/(ρ² − c₁²) ) , (C.50)

where c₁ is a constant.

(c) Evaluate the above integral for the case of a surface of a cone with half-angle α, which is defined by f(ρ) = ρ cot α. (See panel (b) of Fig. C.6.) You should find

ρ = c₁ / cos(φ sin α + c₂) , (C.51)

where c₂ is a second integration constant.
Fig. C.6 Examples of surfaces of revolution, obtained by rotating a curve z = f (ρ) around the
z-axis. Panel (a) A paraboloid defined by f (ρ) ≡ ρ 2 . Panel (b) A cone with half-angle α, defined
by f (ρ) = ρ cot α
Although we have been considering functionals of the form given by (C.2), where the
curve is explicitly described by the function y = f (x), there may be cases where it
is more convenient (or even necessary) to describe the curve in parametric form, e.g.,
x = x(t), y = y(t). Such an example is the variational problem to find the shape of
a closed curve of fixed length that encloses the greatest area. (We will revisit this in
more detail in Appendix C.8.) Here we derive the necessary and sufficient conditions
for a functional to depend only on the curve in the x y-plane and not on the choice of
parametric representation of the curve. The relevant theorem is:
Theorem C.1 The necessary and sufficient conditions for the functional

I[x, y] = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ, t) dt (C.53)

to depend only on the curve in the xy-plane and not on the choice of parametric representation of the curve is that G not depend explicitly on t and be a positive-homogeneous function of degree one in ẋ and ẏ—i.e.,

G(x, y, λẋ, λẏ) = λ G(x, y, ẋ, ẏ) , for all λ > 0 . (C.54)

Proof² Suppose first that the functional has the standard form (C.2),

I[y] = ∫_{x₁}^{x₂} F(y, y′, x) dx , (C.55)

which by its definition depends only on the curve traced out by the function y = f(x). We then introduce a parameter t so that x = x(t), y = y(t). Then

dx = ẋ dt , dy = ẏ dt , x′ = dx/dy = ẋ/ẏ , y′ = dy/dx = ẏ/ẋ , (C.56)

and

∫_{x₁}^{x₂} F(y, y′, x) dx = ∫_{t₁}^{t₂} F(y, ẏ/ẋ, x) ẋ dt . (C.57)
Thus,
² This derivation closely follows that given in Sect. 10 of Gelfand and Fomin (1963).
I[y] = ∫_{x₁}^{x₂} F(y, y′, x) dx = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt ≡ I[x, y] , (C.58)

where

G(x, y, ẋ, ẏ) ≡ F(y, ẏ/ẋ, x) ẋ . (C.59)

Note that G has the properties that it does not depend explicitly on t, and that

G(x, y, λẋ, λẏ) = λ G(x, y, ẋ, ẏ) (C.60)

for all λ > 0. Thus, G is a positive-homogeneous function of degree one in ẋ and ẏ.
Conversely, suppose that we have a functional of the form

I[x, y] = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt , (C.61)

where G does not depend explicitly on t and is positive-homogeneous of degree one in ẋ and ẏ. Then under a change of parametrization t → τ(t) (with dτ/dt > 0),

dt = (dt/dτ) dτ , ẋ = (dx/dτ)(dτ/dt) , ẏ = (dy/dτ)(dτ/dt) , (C.62)

and

G(x, y, ẋ, ẏ) = G( x, y, (dx/dτ)(dτ/dt), (dy/dτ)(dτ/dt) ) = G(x, y, dx/dτ, dy/dτ) (dτ/dt) , (C.63)

where the last equality follows from positive-homogeneity. Thus,

∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt = ∫_{τ₁}^{τ₂} G(x, y, dx/dτ, dy/dτ) dτ , (C.64)

so the value of the functional is independent of the choice of parametrization. □
Example C.3 Here we show explicitly that the Euler equations obtained from the
parametrized functional
466 Appendix C: Calculus of Variations
I[x, y] ≡ ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt = ∫_{t₁}^{t₂} F(y, ẏ/ẋ, x) ẋ dt (C.65)

are equivalent to the Euler equation for the original functional I[y].
Proof The Euler equations obtained from (C.65) by varying both x and y are
∂G/∂x − d/dt (∂G/∂ẋ) = 0 , (C.67a)

∂G/∂y − d/dt (∂G/∂ẏ) = 0 . (C.67b)
Since G is positive-homogeneous of degree one in ẋ and ẏ, Euler's theorem for homogeneous functions tells us that

ẋ (∂G/∂ẋ) + ẏ (∂G/∂ẏ) − G = const (C.68)

is trivially satisfied, the left-hand side being identically zero. Differentiating this identity with respect to t and expanding dG/dt with the chain rule then yields

ẋ [ ∂G/∂x − d/dt (∂G/∂ẋ) ] + ẏ [ ∂G/∂y − d/dt (∂G/∂ẏ) ] = 0 , (C.69)

where we have cancelled out the terms involving ẍ and ÿ. This last equation shows that the two equations (C.67a) and (C.67b) are not independent, but follow one from the other. So, without loss of generality, let's consider (C.67b). Then by writing G in terms of F and performing the derivatives, we find
0 = ∂G/∂y − d/dt (∂G/∂ẏ)
  = ∂/∂y [ F(y, ẏ/ẋ, x) ẋ ] − ẋ d/dx { ∂/∂ẏ [ F(y, ẏ/ẋ, x) ẋ ] }
  = ẋ (∂F/∂y) − ẋ d/dx [ (∂F/∂y′)(1/ẋ) ẋ ]
  = ẋ [ ∂F/∂y − d/dx (∂F/∂y′) ] , (C.70)

which, since ẋ ≠ 0, reproduces the Euler equation (C.9) for the original functional I[y]. □
Exercise C.7 Consider the parametrized form of the arc length functional in
two dimensions,
℘2 t2
I [x1 , x2 ] ≡ ds = G(x1 , x2 , ẋ1 , ẋ2 ) dt , (C.71)
℘1 t1
where
ds
G(x1 , x2 , ẋ1 , ẋ2 ) = = gi j ẋi ẋ j . (C.72)
dt i, j
Note that by writing the arc length in this form (See (C.14)), we are allowing
for the possibility that the 2-dimensional space be curved (e.g., the surface of a
sphere) and that the coordinates need not be Cartesian, so gi j ≡ gi j (x1 , x2 ) in
general.
(a) Show that the Euler equations for this functional are

(d/dt)( Σⱼ g_{ij} ẋⱼ ) − (1/2) Σ_{j,k} (∂g_{jk}/∂xᵢ) ẋⱼ ẋₖ = (1/G)(dG/dt) Σⱼ g_{ij} ẋⱼ . (C.73)

(b) Show that the same equations result, for an affine parameter t (i.e., one for which G = const along the curve), from the simpler functional

J[x₁, x₂] ≡ ∫_{t₁}^{t₂} K(x₁, x₂, ẋ₁, ẋ₂) dt ,

where

K(x₁, x₂, ẋ₁, ẋ₂) ≡ (1/2) Σ_{i,j} g_{ij} ẋᵢ ẋⱼ = (1/2) G²(x₁, x₂, ẋ₁, ẋ₂) . (C.77)
The equivalence of these two approaches follows from the fact that G and K differ by an overall multiplicative constant when t is an affine parameter. If t is not an affine parameter, then the two functionals I and J lead to different equations of motion, consistent with the results of Exercise C.3. (Note that these results hold, in general, in n dimensions.)
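The geodesic equations above can also be explored numerically. The snippet below is a minimal sketch, not part of the original text; it assumes numpy and scipy are available, uses the unit 2-sphere with metric g = diag(1, sin²θ) as the concrete example, and all names are illustrative.

```python
# Integrate the geodesic equations that follow from J = ∫ K dt on the unit
# 2-sphere, where K = (1/2)(θ̇² + sin²θ φ̇²) and t is an affine parameter.
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(t, u):
    th, ph, thd, phd = u
    # Euler equations for g = diag(1, sin²θ):
    #   θ̈ = sinθ cosθ φ̇² ,   φ̈ = −2 cotθ θ̇ φ̇
    return [thd, phd,
            np.sin(th) * np.cos(th) * phd**2,
            -2.0 * thd * phd * np.cos(th) / np.sin(th)]

sol = solve_ivp(geodesic_rhs, (0.0, 10.0), [1.0, 0.0, 0.3, 0.8],
                rtol=1e-10, atol=1e-12, max_step=0.01)
th, ph, thd, phd = sol.y

# A geodesic on the sphere is a great circle, so L = r × ṙ (with r the unit
# position vector) should be constant along the solution.
r = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
rdot = (thd * np.array([np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)])
        + phd * np.array([-np.sin(th) * np.sin(ph), np.sin(th) * np.cos(ph), 0 * ph]))
L = np.cross(r.T, rdot.T)
print(L.max(axis=0) - L.min(axis=0))   # all three spreads ≈ 0
```

The near-zero spread of the components of L confirms that the integrated curve stays on a single great circle, as the variational analysis predicts.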
C.7 Generalizations
The standard calculus of variations problem (C.2) discussed in the preceding sections
can be extended in several ways. Here we describe three such extensions.
One extension is to functionals that depend on higher derivatives of y = f(x), e.g.,

I[y] ≡ ∫_{x₁}^{x₂} F(y, y′, y″, x) dx , (C.78)

where the set of functions {y = f(x)} is now restricted so that both y and y′ are fixed at the endpoints ℘₁ and ℘₂. Then proceeding in a manner similar to that in Appendix C.2, we find

δI[y] = 0 ⇔ ∂F/∂y − (d/dx)(∂F/∂y′) + (d²/dx²)(∂F/∂y″) = 0 . (C.79)
Note that the Euler equation for this case may contain 3rd or even 4th-order deriva-
tives of y = f (x). Although most problems in classical mechanics involve 2nd-order
differential equations, 3rd or higher-order differential equations have applications in
certain areas of chaos theory (See, e.g., Goldstein et al. 2002).
The standard variational problem (C.2) with fixed endpoints can be generalized by
allowing the variations δy to be non-zero at either one or both endpoints ℘1 , ℘2 . The
derivation given in Appendix C.2 then leads to
δI[y] = 0 ⇔ ∂F/∂y − (d/dx)(∂F/∂y′) = 0 , (∂F/∂y′)|_{x₁} = 0 , (∂F/∂y′)|_{x₂} = 0 . (C.80)

The conditions

(∂F/∂y′)|_{x₁} = 0 , (∂F/∂y′)|_{x₂} = 0 , (C.81)
are sometimes called natural boundary conditions for the curve. In the context
of classical mechanics, the natural boundary conditions for a particle moving in
response to a velocity-independent conservative force correspond to zero velocity
at the endpoints. The only solution to the equations of motion that satisfies these
boundary conditions at both endpoints is the trivial solution, where the particle just
sits at one location forever. Imposing the natural boundary condition at just one
endpoint and fixed boundary conditions at the other allows for non-trivial solutions,
in general.
Exercise C.8 Redo the brachistochrone problem (Exercise C.5), but this time allowing the second endpoint at x₂ to be free—i.e., δy|_{x₂} ≠ 0. You should find that the solution is again a cycloid, but one which intersects the line x = x₂ at a right angle.
The derivation of the Euler equation given in Appendix C.2 can easily be extended to functionals of the form

I[y₁, . . . , yₙ] ≡ ∫_{x₁}^{x₂} F(y₁, . . . , yₙ; y₁′, . . . , yₙ′; x) dx , (C.82)

for which requiring δI = 0 under independent variations of the yᵢ yields

∂F/∂yᵢ − (d/dx)(∂F/∂yᵢ′) = 0 , i = 1, 2, . . . , n . (C.83)

These equations form a system of n 2nd-order ordinary differential equations for the functions yᵢ = fᵢ(x) with respect to the independent variable x.
The simplifications discussed in Appendix C.5 carry over to the general case of n degrees of freedom. If F does not depend on one of the functions yᵢ, then

∂F/∂yᵢ′ = const . (C.84)

If F does not depend explicitly on x, then

h ≡ Σ_{i=1}^{n} yᵢ′ (∂F/∂yᵢ′) − F = const . (C.85)

The proof of the latter result follows from

dh/dx = Σ_{i=1}^{n} [ yᵢ″ (∂F/∂yᵢ′) + yᵢ′ (d/dx)(∂F/∂yᵢ′) ] − Σ_{i=1}^{n} [ (∂F/∂yᵢ) yᵢ′ + (∂F/∂yᵢ′) yᵢ″ ] = 0 , (C.86)

where we assumed that F does not depend explicitly on x (i.e., ∂F/∂x = 0) and used the Euler equations (C.83) to get the last equality.
There is also an additional simplification if F does not depend on the derivative of one of the variables, say yₖ′: the Euler equation for yₖ then reduces to the algebraic condition ∂F/∂yₖ = 0.
So far, we’ve been considering variational problems of the form δ I [y] = 0, where
the functions y = f (x) have been subject only to boundary conditions, e.g., fixed at
the endpoints x = x1 and x2 . But there might also exist situations where the functions
y = f (x) are subject to an integral constraint
x2
J [y] ≡ G(y, y , x) dx = J0 , (C.90)
x1
where J0 is a constant. Such problems are called isoperimetric problems, since the
classic example of such a problem is to find the shape of a closed curve of fixed length
(perimeter) that encloses the maximum area. Due to the constraint, the variations
of y in δ I = 0 are not free, but are subject to the condition that δ J [y] = 0. We
can incorporate this condition into the variational problem by using the method of
Lagrange multipliers (See Sect. 2.4). This amounts to adding to δ I = 0 a multiple
of δ J = 0,
δ I [y] + λδ J [y] = 0 , (C.91)
where λ is an undetermined constant (the Lagrange multiplier for this problem). Note
that (C.91) can be recast as finding the stationary values of the functional
Ī[y, λ] ≡ I[y] + λ (J[y] − J₀) = ∫_{x₁}^{x₂} (F + λG) dx − λJ₀ (C.92)
with respect to unconstrained variations of both y and λ. (The variation with respect
to λ recovers the integral constraint J[y] − J₀ = 0.) Performing the variations gives rise to two equations, which can be solved for the two unknowns y = f(x) and λ (if desired).
Example C.4 Here we will find the shape of a closed curve of fixed length ℓ that encloses the largest area. Since the curve is closed, we will not be able to represent it globally in the form y = f(x), so we describe it parametrically by x = x(t), y = y(t) for t₁ ≤ t ≤ t₂, with boundary conditions

x(t₁) = x(t₂) = 0 , y(t₁) = y(t₂) = 0 , (C.93)

with the derivatives so chosen as to avoid a kink at the origin. (See Fig. C.7.)
The functional that we want to extremize is the area under the curve

I[x, y] = ∮ y dx = ∫_{t₁}^{t₂} y ẋ dt . (C.94)

The curve is traversed clockwise so that the area obtained is enclosed by the curve. The constraint is that the curve have fixed length ℓ:

J[x, y] ≡ ∫_{t₁}^{t₂} √(ẋ² + ẏ²) dt = ℓ . (C.95)

Using the method of Lagrange multipliers discussed above, we extremize the combined functional

Ī[x, y, λ] = ∫_{t₁}^{t₂} (F + λG) dt − λℓ , (C.96)

where

F + λG = y ẋ + λ √(ẋ² + ẏ²) . (C.97)
The Euler equations for x and y can each be integrated once, yielding

y + λ ẋ/√(ẋ² + ẏ²) = A , λ ẏ/√(ẋ² + ẏ²) − x = B , (C.98, C.99)

where A and B are constants. These equations can be simplified if we switch the parametric representation from t to arc length s, noting that

dt/ds = 1/√(ẋ² + ẏ²) . (C.100)
Then (C.98) and (C.99) become

y + λ (dx/ds) = A , λ (dy/ds) − x = B . (C.101)
We now apply the boundary conditions of (C.93), but in terms of arc length s,

x|_{s=0,ℓ} = 0 , y|_{s=0,ℓ} = 0 , (dx/ds)|_{s=0,ℓ} = −1 , (dy/ds)|_{s=0,ℓ} = 0 , (C.102)

which lead to B = 0 and A = −λ. The first equation in (C.101) can then be solved for y in terms of dx/ds,

y = −λ (1 + dx/ds) . (C.103)

Substituting this into the second equation of (C.101) (differentiated once with respect to s) gives

d²x/ds² = −λ⁻² x , (C.104)
with solution

x(s) = −|λ| sin(s/|λ|) , (C.105)
y(s) = −λ [1 − cos(s/|λ|)] . (C.106)

Note that these last two equations are the parametric representation of a circle

x² + (y + λ)² = λ² , (C.107)

with radius R = |λ| and center (0, −λ). In order that the circle lie above the x-axis, as suggested by Fig. C.7, we need λ = −R. Finally, since 2πR = ℓ is the length of the curve, the Lagrange multiplier is λ = −ℓ/2π. Thus,

x = −(ℓ/2π) sin(2πs/ℓ) , y = (ℓ/2π) [1 − cos(2πs/ℓ)] . (C.108)

The area enclosed by the circle is A = πR² = ℓ²/4π, which is the largest area enclosed by a closed curve of fixed length ℓ.
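As a quick sanity check of (C.108)—not part of the original text—one can verify numerically that the parametrized curve indeed has length ℓ and encloses area ℓ²/4π. The sketch below assumes numpy is available; the value of ℓ is arbitrary.

```python
import numpy as np

ell = 2.0                                              # any fixed perimeter ℓ
s = np.linspace(0.0, ell, 100001)
x = -(ell / (2 * np.pi)) * np.sin(2 * np.pi * s / ell)       # (C.108)
y = (ell / (2 * np.pi)) * (1 - np.cos(2 * np.pi * s / ell))

perimeter = np.sum(np.hypot(np.diff(x), np.diff(y)))         # ≈ ℓ
# Enclosed area from the line integral (1/2)∮(x dy − y dx)
area = 0.5 * abs(np.trapz(x * np.gradient(y, s) - y * np.gradient(x, s), s))
print(perimeter)                    # ≈ 2.0
print(area, ell**2 / (4 * np.pi))   # both ≈ 0.3183
```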
Exercise C.10 Find the shape of the curve of fixed length ℓ and free endpoint ℘₂ on the x-axis that encloses the largest area between it and the x-axis. (See Fig. C.8.) You should find:

x = (ℓ/π) [1 − cos(πs/ℓ)] , y = −(ℓ/π) sin(πs/ℓ) , (C.109)

which is the parametric representation of a semi-circle of radius R = ℓ/π, length ℓ, and center (ℓ/π, 0). Note that the area enclosed by this curve and the x-axis is ℓ²/2π, which is twice as large as that calculated in Example C.4 for the closed-curve variational problem.
Exercise C.11 Find the shape of a flexible hanging cable of fixed length ℓ, which is supported at endpoints ℘₁ and ℘₂ in a uniform gravitational field g pointing downward. (See Fig. C.9.)
Hint: The shape minimizes the gravitational potential energy of the cable,

I[y] = ∫_{℘₁}^{℘₂} dm g y = ∫_{℘₁}^{℘₂} μ ds g y = μg ∫_{x₁}^{x₂} y √(1 + y′²) dx , (C.110)

where μ is the mass per unit length of the cable, subject to the constraint of fixed length ℓ. You should find that the solution has the form of a catenary, y ∼ cosh x.
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 9 is devoted solely to the calculus of variations; an excellent
introduction to the topic well-suited for undergraduates with many examples and
problems. Solutions to several of the problems presented in this appendix can be
found in Boas (2006).
Gelfand and Fomin (1963): A more rigorous mathematical treatment of the calculus of variations. Our discussion of variational problems in parametric form follows closely the presentation given in Sect. 10 of this book.
Lanczos (1949): In our opinion, the best book on variational methods in the context
of classical mechanics. It contains excellent descriptions/explanations of the cal-
culus of variations, constrained systems, the method of Lagrange multipliers, etc.
Suitable for either advanced undergraduates or graduate students.
Appendix D
Linear Algebra
An abstract vector space consists of two types of objects (vectors and scalars) and
two types of operations (vector addition and scalar multiplication), which interact
with one another and are subject to certain properties (enumerated below). We will
denote vectors by boldface symbols, A, B, C, · · · , and scalars (which we will take
to be complex numbers) by italicized symbols, a, b, c, · · · . Vector addition will be
denoted by a + sign between two vectors, e.g., A + B, and scalar multiplication
by juxtaposition of a scalar and a vector, e.g., aA. The properties obeyed by these
operations are as follows.
Vector addition satisfies the following properties:
1. Closure:
A + B = C (i.e., the sum of two vectors is another vector in the space) (D.1)
2. Commutativity:
A+B=B+A (D.2)
3. Associativity:
(A + B) + C = A + (B + C) (D.3)
4. Zero vector:
∃0 such that A + 0 = A ∀A (D.4)
5. Inverse vector:
∀A ∃ − A such that A + (−A) = 0 (D.5)
Scalar multiplication satisfies the following properties:
1. Closure:
aA = B (i.e., a scalar multiple of a vector is another vector in the space) (D.6)
2. Multiplication by unity:
1A = A (D.7)
3. Distributivity over scalar addition:
(a + b)A = aA + bA (D.8)
4. Distributivity over vector addition:
a(A + B) = aA + aB (D.9)
5. Associativity:
a(bA) = (ab)A (D.10)
Using the various properties given above, it is easy to see that 0A = 0 and (−1)A =
−A, since
(A + 0A) = (1A + 0A) = (1 + 0)A = 1A = A , (D.11)
and
(A + (−1)A) = (1A + (−1)A) = (1 + (−1))A = 0A = 0 . (D.12)
The above definitions and properties allow us to extend the familiar properties of
ordinary 3-dimensional vectors and real numbers to other sets of objects. Although
most properties of 3-dimensional vectors carry over to these higher-dimensional
abstract vector spaces, some do not, such as the cross (vector) product of two vectors,
e.g., A × B, which is defined in 3-dimensions by (A.2). (But see the wedge product
of differential forms described in Appendix B.)
A key concept when working with vectors is that of a basis. But in order to define
what we mean by a basis, we must first introduce some terminology:
• A linear combination of the vectors A, B, · · · is any expression of the form

aA + bB + · · · (D.13)

• A vector C is linearly independent of the set of vectors {A, B, · · · } if and only if it cannot be written as a linear combination of these vectors—i.e., if and only if there do not exist scalars a, b, · · · such that

C = aA + bB + · · · (D.14)

• A set of vectors {A, B, · · · } is a linearly independent set if and only if each vector in the set is linearly independent of all the other vectors in the set.
• A set of vectors {A, B, · · · } spans the vector space if and only if any vector C in the vector space can be written as a linear combination of the vectors in the set—i.e.,

∃ a, b, · · · such that C = aA + bB + · · · (D.15)
In terms of the above definitions, a basis for a vector space is defined to be any
set of vectors which is (i) linearly independent and (ii) spans the vector space. The
number of basis vectors is defined as the dimension of the vector space. Thus, an
n-dimensional vector space has a basis consisting of n vectors,
{e1 , e2 , . . . , en } . (D.16)
This is, of course, consistent with what we know about the space of ordinary 3-
dimensional vectors. The set of unit vectors {x̂, ŷ, ẑ} is a basis for that space. (More
about what unit means for a more general vector space in just a bit.)
Given a basis, any vector A can be decomposed as

A = A₁e₁ + A₂e₂ + · · · + Aₙeₙ ≡ Σᵢ Aᵢeᵢ . (D.17)
The scalars A1 , A2 , . . . , An are called the components of A with respect to the basis
{e1 , e2 , . . . , en }. The decomposition of A into its components is unique for a given
basis as shown in Exercise D.1 below. But for a different set of basis vectors, e.g.,
{e1 , e2 , . . . , en }, the components of A will be different.
Exercise D.1 Prove that the decomposition (D.17) of A into its components A₁, A₂, . . . , Aₙ is unique. Hint: Use proof by contradiction—i.e., assume that there exist other components A₁′, A₂′, . . . , Aₙ′, for which

A = Σᵢ Aᵢ′ eᵢ . (D.18)

Then show that this leads to a contradiction regarding the linear independence of the basis vectors unless Aᵢ′ = Aᵢ for all i.
Vector addition and scalar multiplication are what you might expect in terms of the components of the vectors. That is, if we denote the correspondence between vectors and components by¹

A ↔ A ≡ [A₁, A₂, . . . , Aₙ]ᵀ , (D.19)

then

A + B ↔ A + B = [A₁ + B₁, A₂ + B₂, . . . , Aₙ + Bₙ]ᵀ (D.20)

and

aA ↔ aA = [aA₁, aA₂, . . . , aAₙ]ᵀ . (D.21)

¹ Our notation is such that A denotes the abstract vector, Aᵢ its ith component with respect to a basis, and A the collection of components A₁, A₂, . . . , Aₙ represented as an n × 1 column matrix. The superscript T denotes transpose, which converts a row matrix into a column matrix, and vice versa.
Exercise D.2 Show that the zero vector 0 and inverse vector −A can be written in terms of components as

0 ↔ 0 = [0, 0, . . . , 0]ᵀ (D.22)

and

−A ↔ −A = [−A₁, −A₂, . . . , −Aₙ]ᵀ . (D.23)
To generalize the notions of length and angle² to an arbitrary vector space, we introduce an inner product A · B, which assigns a scalar to each pair of vectors A and B, subject to the following properties:

A · B = (B · A)* , (D.24a)
A · A ≥ 0 , with A · A = 0 if and only if A = 0 , (D.24b)
A · (bB + cC) = b(A · B) + c(A · C) . (D.24c)
2 As anybody who has taken freshman physics knows, length and angle are key concepts for ordinary
(3-dimensional) vectors, which are sometimes defined as “arrows” having magnitude and direction!
Mathematicians call a vector space with the additional structure of an inner product
an inner product space.
• The norm (or magnitude) of a vector A is defined by |A| ≡ √(A · A).
• A vector A is said to have unit norm (or to be normalized) if and only if |A| = 1.
• Two vectors A and B are said to be orthogonal if and only if the inner product of
A and B vanishes—i.e., A · B = 0.
• A set of vectors {A1 , A2 , · · · } is said to be orthonormal if and only if Ai ·A j = δi j
for all i, j = 1, 2, . . . , n, where δi j is the Kronecker delta symbol (which equals
one if i = j, and equals zero otherwise).
• An orthonormal basis is an orthonormal set of basis vectors, which we will typi-
cally denote with hats, ê1 , ê2 , . . . , ên . These vectors are thus linearly independent,
span the vector space, and satisfy
êi · ê j = δi j . (D.26)
Given an arbitrary basis

{e₁, e₂, . . . , eₙ} , (D.27)

which we will assume is not orthonormal, there is a systematic procedure (Gram–Schmidt orthonormalization) for constructing an orthonormal basis from it. (If the basis vectors were already orthonormal, then there would be nothing that you need to do!) Take e₁ and simply divide by its norm. The result is a unit vector that points in the same direction as e₁,

f̂₁ ≡ e₁/|e₁| . (D.28)
Next, take e₂ and subtract off its component along f̂₁,

f₂ ≡ e₂ − (f̂₁ · e₂) f̂₁ . (D.29)

By construction, f₂ is orthogonal to f̂₁:

f̂₁ · f₂ = f̂₁ · e₂ − (f̂₁ · e₂)(f̂₁ · f̂₁) = 0 . (D.30)

Then normalize f₂,

f̂₂ ≡ f₂/|f₂| . (D.31)
Thus, both f̂₁ and f̂₂ have unit norm and they are orthogonal to one another. For f̂₃ we proceed in a similar fashion:

f₃ ≡ e₃ − (f̂₁ · e₃) f̂₁ − (f̂₂ · e₃) f̂₂ , (D.32)

and

f̂₃ ≡ f₃/|f₃| . (D.33)

Continuing this way through the remaining basis vectors, we obtain an orthonormal basis {f̂₁, f̂₂, . . . , f̂ₙ}.
Exercise D.4 (a) Apply this procedure to the basis vectors

e₁ ≡ x̂ , e₂ ≡ x̂ + ŷ , e₃ ≡ x̂ + ŷ + ẑ , (D.34)

where x̂, ŷ, ẑ are the standard orthonormal basis vectors in ordinary 3-dimensional space. (b) Repeat the procedure, but this time with the basis vectors enumerated in the reverse order,

e₁ ≡ x̂ + ŷ + ẑ , e₂ ≡ x̂ + ŷ , e₃ ≡ x̂ . (D.35)
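The orthonormalization procedure is straightforward to implement. The following is a minimal sketch, not part of the original text, assuming numpy is available and real-valued vectors (a complex inner product would require a conjugate in the dot product); it is applied to the two orderings (D.34) and (D.35).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors`, in order."""
    ortho = []
    for e in vectors:
        f = e.astype(float).copy()
        for fhat in ortho:
            f -= np.dot(fhat, f) * fhat      # subtract components along earlier f̂'s
        ortho.append(f / np.linalg.norm(f))  # normalize, as in (D.28), (D.31), (D.33)
    return np.array(ortho)

basis_a = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])   # (D.34)
basis_b = basis_a[::-1]                                  # (D.35): reversed order
for B in (basis_a, basis_b):
    F = gram_schmidt(B)
    print(F.round(6))
    print(np.allclose(F @ F.T, np.eye(3)))   # True: f̂_i · f̂_j = δ_ij
```

Note that the two orderings produce different orthonormal bases, illustrating that the result of the procedure depends on the order in which the basis vectors are processed.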
To illustrate that the inner product defined above generalizes the dot product of ordinary 3-dimensional vectors, it is simplest to show that we can recover the form of the dot product given in (A.4), which we rewrite here as

A · B = A₁B₁ + A₂B₂ + A₃B₃ = Σᵢ AᵢBᵢ . (D.36)
So let {ê₁, ê₂, . . . , êₙ} be an orthonormal basis, with respect to which A has the components

Aᵢ = êᵢ · A . (D.38)

Indeed,

êᵢ · A = êᵢ · ( Σⱼ Aⱼêⱼ ) = Σⱼ Aⱼ (êᵢ · êⱼ) = Σⱼ Aⱼ δᵢⱼ = Aᵢ , (D.39)

where we used the linearity property (D.24c) of the inner product and the orthonormality (D.26) of the basis vectors to obtain the second and third equalities.³ Using these results, it is then fairly straightforward to show that
A · B = A₁*B₁ + A₂*B₂ + · · · + Aₙ*Bₙ = Σᵢ Aᵢ*Bᵢ , (D.40)

and, as a consequence,
³ Note that Aᵢ ≠ A · êᵢ in general, since the components of a vector can be complex. Using Exercise D.3, it follows that A · êᵢ = Aᵢ*.
|A|² = Σᵢ |Aᵢ|² . (D.41)
Note that (D.40) does indeed generalize the dot product (D.36) to arbitrary dimen-
sions. Setting n = 3 and taking our vector components to be real-valued, we see that
(D.40) reduces to (D.36).
Exercise D.5 Prove the component form (D.40) of the inner product.
Recall that for ordinary vectors in 3-dimensions, the dot product A · B can also be
written as
A · B = AB cos θ , (D.42)
√ √
where A ≡ |A| ≡ A · A and B ≡ |B| ≡ B · B are the magnitudes of the two
vectors, and θ is the angle between them; see (A.1). If we rewrite the above equation
as
cos θ = (A · B)/(|A||B|) , (D.43)
then it can be thought of as the definition of the angle between the two vectors in terms
of their dot products and their magnitudes. This suggests a way of generalizing the
concept of “angle between two vectors” to an arbitrary n-dimensional vector space.
Namely, simply interpret the expressions on the right-hand side of (D.43) in terms
of the inner product defined by (D.24a), (D.24b), (D.24c). Unfortunately this won’t
work since A · B is a complex number in general, so the angle θ would not be real.
But it turns out that there is a simple solution, which amounts to taking the absolute
value of the right-hand side,
cos θ = |A · B|/(|A||B|) = √( |A · B|² / [(A · A)(B · B)] ) . (D.44)
So this is now a real quantity, but the fact that this equation actually gives us something that we can interpret as an angle is thanks to the Schwarz inequality

|A · B|² ≤ (A · A)(B · B) . (D.45)
Exercise D.6 Prove the Schwarz inequality. (Hint: Consider the vector

C ≡ A − [(B · A)/(B · B)] B , (D.46)

and then use |C|² ≡ C · C ≥ 0. (See Fig. D.1.) Note that the Schwarz inequality becomes an equality when A and B are proportional (i.e., parallel or anti-parallel) to one another.)
Given our abstract n-dimensional vector space, we would now like to define a cer-
tain class of operations (called linear transformations), which map vectors to other
vectors in such a way that they preserve the linear property of vector addition and
scalar multiplication of vectors. Rotations of ordinary 3-dimensional vectors, which
play an important role in all branches of physics, are just one example of linear
transformations. A linear transformation T maps each vector A to another vector TA in such a way that

T(aA + bB) = a(TA) + b(TB) . (D.47)
This method of introducing additional structure on a space in such a way that it inter-
acts “naturally” with other structures in the space (in this case scalar multiplication
and vector addition) is common practice in mathematics.
Note that multiplying every vector in the space by the same scalar c (i.e., A → cA) is an example of a linear transformation since

c(aA + bB) = c(aA) + c(bB) = (ca)A + (cb)B = (ac)A + (bc)B = a(cA) + b(cB) , (D.48)

where we used the distributive property of scalar multiplication with respect to vector addition (D.9); the associative property of scalar multiplication of vectors (D.10); the commutative property for multiplication of two scalars (complex numbers); and the associative property of scalar multiplication of vectors (again) to get the successive equalities above. But adding a constant vector C to every vector in the space is not an example of a linear transformation, as you are asked to show in the following exercise.
Exercise D.7 Prove that adding a constant vector C to every vector in the space,
i.e., A → A + C, is not an example of a linear transformation.
Linear transformations S and T can be combined to form new linear transformations via addition, scalar multiplication, and composition (multiplication):

(S + T)A ≡ SA + TA , (D.49)
(aT)A ≡ a(TA) , (D.50)
(ST)A ≡ S(TA) . (D.51)

With the first two operations, (D.49) and (D.50), the space of linear transformations has the structure of an n²-dimensional vector space over the complex numbers (Exercise D.8). Note that multiplication of linear transformations is not commutative, however, since

ST ≠ TS , (D.52)

in general. (Think of rotations in 3-dimensions; see, e.g., Fig. 6.5.) We will return to the multiplicative structure of linear transformations later in this section, after we develop the connection between linear transformations and matrices.
Exercise D.8 Show that with the above definitions of addition of linear trans-
formations and multiplication of a linear transformation by a scalar, the set of
linear transformations has the structure of a vector space over the complex num-
bers. Note that you will need to verify that these operations satisfy the properties
given in (D.1)–(D.5) and (D.6)–(D.10).
The real beauty of the linearity property (D.47) is that once you know what a linear transformation T does to a set of basis vectors {e₁, e₂, . . . , eₙ}, you can easily determine what it does to any vector A. To see that this is the case, let's begin by writing

Te₁ = T₁₁e₁ + T₂₁e₂ + · · · + Tₙ₁eₙ = Σᵢ Tᵢ₁eᵢ , (D.53)

which follows from the fact that any vector (in this case Te₁) can be written as a linear combination of the basis vectors. Similarly,

Te₂ = T₁₂e₁ + T₂₂e₂ + · · · + Tₙ₂eₙ = Σᵢ Tᵢ₂eᵢ ,
⋮ (D.54)
Teₙ = T₁ₙe₁ + T₂ₙe₂ + · · · + Tₙₙeₙ = Σᵢ Tᵢₙeᵢ .

Thus, we see that the action of T on the n basis vectors is completely captured by the n² numbers Tᵢⱼ, where

Teⱼ = Σᵢ Tᵢⱼeᵢ , j = 1, 2, . . . , n . (D.55)
These numbers are naturally arranged as an n × n matrix,

T ↔ T ≡ [Tᵢⱼ] , (D.56)

where the double-headed arrow ↔ reminds us that T are the components of T with respect to a particular basis. (With respect to a different basis, the matrix components will change in a manner that we will investigate shortly.) If the basis is orthonormal, then we can write

Tᵢⱼ = êᵢ · (Têⱼ) . (D.57)

The action of T on an arbitrary vector A then follows from linearity,

TA = T( Σⱼ Aⱼeⱼ ) = Σⱼ Aⱼ (Teⱼ) = Σᵢ ( Σⱼ Tᵢⱼ Aⱼ ) eᵢ , (D.58)

or, in terms of components,

A′ = TA ⇔ Aᵢ′ = Σⱼ Tᵢⱼ Aⱼ . (D.59)

Note that the last expression is just the component form of ordinary matrix multiplication of two matrices.
Before discussing properties of matrices in general, let's first determine how the components of a vector A and the components (i.e., matrix elements) of a linear transformation T transform under a change of basis. The classic example of a change of basis is given by a rotation of the coordinate basis vectors x̂, ŷ, ẑ to a new set of basis vectors x̂′, ŷ′, ẑ′.
So let's denote the two sets of basis vectors by

{e₁, e₂, . . . , eₙ} and {e₁′, e₂′, . . . , eₙ′} ,

where we use primed indices to distinguish between the two bases. We will assume, for now, that these are arbitrary bases—i.e., we do not require that they be orthonormal. Since any vector can be expanded in terms of either set of basis vectors, we can write

e₁ = S_{1′1} e₁′ + S_{2′1} e₂′ + · · · + S_{n′1} eₙ′ = Σ_{j′} S_{j′1} e_{j′} ,
e₂ = S_{1′2} e₁′ + S_{2′2} e₂′ + · · · + S_{n′2} eₙ′ = Σ_{j′} S_{j′2} e_{j′} ,
⋮ (D.63)
eₙ = S_{1′n} e₁′ + S_{2′n} e₂′ + · · · + S_{n′n} eₙ′ = Σ_{j′} S_{j′n} e_{j′} ,

or, more compactly,

eᵢ = Σ_{j′} S_{j′i} e_{j′} , i = 1, 2, . . . , n . (D.64)

Expanding A = Σᵢ Aᵢeᵢ = Σ_{j′} A_{j′}e_{j′} in both bases, it follows that the components transform as

A_{j′} = Σᵢ S_{j′i} Aᵢ , j′ = 1′, 2′, . . . , n′ , (D.67)

with the inverse relation

Aᵢ = Σ_{j′} (S⁻¹)_{ij′} A_{j′} . (D.68)
Now take a linear transformation T, which maps A to B ≡ TA. Using (D.67), (D.59) and (D.68), it follows that

B_{i′} = Σₖ S_{i′k} Bₖ = Σ_{k,l} S_{i′k} Tₖₗ Aₗ = Σ_{k,l} Σ_{j′} S_{i′k} Tₖₗ (S⁻¹)_{lj′} A_{j′}
       = Σ_{j′} [ Σ_{k,l} S_{i′k} Tₖₗ (S⁻¹)_{lj′} ] A_{j′} ≡ Σ_{j′} T_{i′j′} A_{j′} , (D.70)

where

T_{i′j′} ≡ Σ_{k,l} S_{i′k} Tₖₗ (S⁻¹)_{lj′} . (D.71)

Noting that the products and summations on the right-hand side are exactly those for a product of matrices, we have

T′ = STS⁻¹ , (D.72)

which is an example of a similarity transformation of the matrix T.
As illustrated by the calculations in the last two subsections, matrices play a key
role in linear algebra. In this section, we summarize some important definitions and
operations involving matrices, which we will refer to repeatedly in the main text.
Most of the discussion will be restricted to n × n (i.e., square) matrices, although the
transpose and conjugate operations (complex conjugate and Hermitian conjugate)
can be defined for arbitrary n × m (i.e., rectangular) matrices.
The transpose, complex conjugate, and Hermitian conjugate of a matrix T are defined by

(Tᵀ)ᵢⱼ ≡ Tⱼᵢ , (T*)ᵢⱼ ≡ (Tᵢⱼ)* , (T†)ᵢⱼ ≡ (Tⱼᵢ)* . (D.73)

A matrix T is said to be symmetric if

T = Tᵀ ↔ Tᵢⱼ = Tⱼᵢ , (D.74)

and Hermitian if

T = T† ↔ Tᵢⱼ = Tⱼᵢ* . (D.75)

Anti-symmetric and anti-Hermitian matrices are defined with minus signs in the last two equations.
Exercise D.10 Show that the transpose of a product of matrices equals the
product of transposes in the opposite order:
(ST)T = TT ST , (D.76)
(ST)† = T† S† . (D.77)
Note that these relations hold in general for the product of an m × n matrix and
an n × p matrix.
Exercise D.11 Show that the inner product (D.40) of two vectors A and B can
be written in terms of row and column matrices as
A · B = A† B . (D.78)
D.4.3.2 Determinants

The determinant of an n × n matrix T can be defined in terms of the n-dimensional Levi-Civita symbol,

det T ≡ Σ_{i₁, i₂, …, iₙ} ε_{i₁i₂⋯iₙ} T_{1i₁} T_{2i₂} · · · T_{niₙ} . (D.81)

See (A.7) for the 3-dimensional version of the Levi-Civita symbol, which enters the expression for the vector product of two 3-dimensional vectors.

Exercise D.12 (a) Work out the explicit expression for the determinant of a 3 × 3 matrix using the definition given in (D.81).
(b) Do the same using the earlier definition (D.80), and confirm that the two expressions you obtain agree with one another.
(c) Using the above definition (D.81), show that the determinant of an n × n matrix T is unchanged if you add a multiple of one row (or column) of T to another row (or column) before taking its determinant.
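A brute-force numerical cross-check of the Levi-Civita definition—not part of the original text—is sketched below, assuming numpy is available. The parity loop simply computes the value of ε for each permutation.

```python
import itertools
import numpy as np

def levi_civita_det(T):
    """det T = Σ over permutations p of sgn(p) · T[0,p(0)] T[1,p(1)] ⋯ , as in (D.81)."""
    n = T.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = 1                        # parity of perm = ε_{i1...in}
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        total += sign * np.prod([T[row, perm[row]] for row in range(n)])
    return total

T = np.array([[2.0, -1.0, 0.0], [1.0, 3.0, 4.0], [0.5, 0.0, 1.0]])
print(levi_civita_det(T), np.linalg.det(T))   # the two values agree
```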
The unit (or identity) matrix 1 has components given by the Kronecker delta δᵢⱼ:

1 = [ 1 0 · · · 0 ; 0 1 · · · 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 · · · 1 ] . (D.83)
A matrix T is said to be invertible if and only if there exists another matrix T⁻¹, called the inverse matrix of T, such that

TT⁻¹ = T⁻¹T = 1 , (D.84)

or, equivalently,

Σₖ Tᵢₖ (T⁻¹)ₖⱼ = Σₖ (T⁻¹)ᵢₖ Tₖⱼ = δᵢⱼ . (D.85)
It turns out that a matrix is invertible if and only if its determinant is non-zero. An explicit expression for the inverse matrix is

T⁻¹ = (1/det T) Cᵀ , (D.86)

where C is the matrix of cofactors. The inverse of a product of two invertible matrices S and T is the product of the inverse matrices S⁻¹ and T⁻¹ in the opposite order,

(ST)⁻¹ = T⁻¹S⁻¹ . (D.87)
Exercise D.13 Calculate the inverse matrices for the general 2 × 2 and 3 × 3 matrices

[ a b ; c d ] , [ a b c ; d e f ; g h i ] . (D.88)
A real matrix T is said to be orthogonal if and only if its transpose equals its inverse:

Tᵀ = T⁻¹ ↔ Σₖ Tᵢₖ Tⱼₖ = δᵢⱼ , Σₖ Tₖᵢ Tₖⱼ = δᵢⱼ . (D.89)

A complex matrix T is said to be unitary if and only if its Hermitian conjugate equals its inverse:

T† = T⁻¹ ↔ Σₖ Tᵢₖ Tⱼₖ* = δᵢⱼ , Σₖ Tₖᵢ* Tₖⱼ = δᵢⱼ . (D.90)
Since the determinant of a product of matrices equals the product of the determinants, det(ST) = det S det T, it follows from (D.84) that

det(T⁻¹) = 1/det T = (det T)⁻¹ . (D.93)

Thus, the determinant is invariant under a similarity transformation (D.72):

det(STS⁻¹) = det S det T det(S⁻¹) = det S det T (det S)⁻¹ = det T . (D.94)
D.4.3.6 Trace

The trace of a matrix is the sum of its diagonal elements,

Tr(T) ≡ Σᵢ Tᵢᵢ . (D.95)

Since the trace of a product of matrices is invariant under cyclic permutations of the matrices, Tr(ST) = Tr(TS), it follows that

Tr(STS⁻¹) = Tr(S⁻¹ST) = Tr(T) . (D.97)

Thus, the trace of a matrix, like the determinant, is also invariant under a similarity transformation, (D.72).
The last topic that we will discuss in our review of linear algebra involves eigen-
vectors and eigenvalues of a linear transformation T. Eigenvectors of T are special
vectors, which are effectively unchanged by the action of T. By “effectively un-
changed” we mean that the eigenvector need only be mapped to itself up to an
overall proportionality factor, which is called the eigenvalue of the eigenvector. If
we denote an eigenvector of T by v and its eigenvalue by λ, then
Tv = λv . (D.98)
Note that the magnitude of the eigenvector v is not fixed by the above equation, as v′ ≡ av is also an eigenvector of T with the same eigenvalue λ.
Tv = λv , (D.99)

where v and T are the matrix representations of v and T with respect to the basis. This last equation is equivalent to

(T − λ1)v = 0 , (D.100)

where the right-hand side is the zero-vector 0 ≡ [0, 0, . . . , 0]ᵀ. Since this is a homogeneous equation, v = 0 is a (trivial) solution, and it is the only solution if (T − λ1) is invertible. Hence, a non-zero solution to this equation requires that the matrix (T − λ1) not be invertible or, equivalently, that

det(T − λ1) = 0 . (D.101)

This is called the characteristic equation for T. Written out, it is an nth-order polynomial equation in λ,

cₙλⁿ + cₙ₋₁λⁿ⁻¹ + · · · + c₁λ + c₀ = 0 , (D.102)

where the coefficients cᵢ are algebraic expressions involving the matrix elements Tᵢⱼ. By the fundamental theorem of algebra, this equation admits n complex roots λᵢ, which might be zero or repeated multiple times, in terms of which the polynomial factorizes as

(λ₁ − λ)(λ₂ − λ) · · · (λₙ − λ) = 0 . (D.103)
Example D.2 As a simple example, consider the matrix T = [ 0 1 ; 1 0 ] (cf. (D.124)). The characteristic equation det(T − λ1) = λ² − 1 = 0 has roots

λ₊ = 1 , λ₋ = −1 . (D.106)

For λ₊ = 1, the eigenvector equation (T − λ₊1)v = 0 gives the two component equations −v₁ + v₂ = 0 and v₁ − v₂ = 0, which (as expected) are linearly dependent on one another. The solution to these equations is

v₁ = v₂ . (D.109)

Normalizing, we can take v₊ = (1/√2)[1, 1]ᵀ; an identical calculation for λ₋ = −1 gives v₋ = (1/√2)[1, −1]ᵀ.
Note that these two eigenvectors have unit norm and are orthogonal to one another,

v₊† v₋ = (1/2) [1 1] [1, −1]ᵀ = (1/2)(1 − 1) = 0 . (D.112)
Thus, the corresponding vectors v+ , v− form an orthonormal basis for the (real-
valued) 2-dimensional vector space. But as we shall explain in the next subsection,
it is not always the case that the eigenvectors of an arbitrary matrix form a basis for
the vector space.
Exercise D.17 Find the eigenvectors and eigenvalues of the 2-dimensional rotation matrix

R(φ) = [ cos φ  sin φ ; −sin φ  cos φ ] . (D.113)
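The two calculations above are easy to reproduce numerically. The following is a minimal sketch—not part of the original text—assuming numpy is available; note that numpy does not guarantee any particular ordering of the returned eigenvalues.

```python
import numpy as np

T = np.array([[0.0, 1.0], [1.0, 0.0]])
vals, vecs = np.linalg.eig(T)
print(vals)            # eigenvalues ±1, as in (D.106)
print(vecs)            # columns proportional to (1,1)/√2 and (1,−1)/√2

phi = 0.7
R = np.array([[np.cos(phi), np.sin(phi)], [-np.sin(phi), np.cos(phi)]])
vals, vecs = np.linalg.eig(R)
print(np.sort_complex(vals))   # e^{∓iφ}: the eigenvalues are complex, so the
                               # rotation matrix has no real eigenvectors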
Suppose the eigenvectors of T form a basis {e₁, e₂, . . . , eₙ} for the vector space, so that

Teᵢ = λᵢeᵢ , i = 1, 2, . . . , n . (D.115)

Then the matrix representing T with respect to this basis is diagonal,

T′ = diag(λ₁, λ₂, . . . , λₙ) , (D.116)

with the eigenvalues λᵢ as its diagonal elements. The similarity transformation S relating T′ to the matrix T in the original basis, T′ = STS⁻¹, has

S⁻¹ = [ e₁ e₂ · · · eₙ ] = [ v₁ v₂ · · · vₙ ] , (D.118)

or, equivalently,

(S⁻¹)ᵢⱼ = (eⱼ)ᵢ . (D.119)

In other words, the columns of S⁻¹ are just the eigenvectors of T in the original basis.
Proof

(STS⁻¹)ᵢⱼ = Σ_{k,l} Sᵢₖ Tₖₗ (S⁻¹)ₗⱼ = Σ_{k,l} Sᵢₖ Tₖₗ (eⱼ)ₗ
          = Σₖ Sᵢₖ λⱼ (eⱼ)ₖ = λⱼ Σₖ Sᵢₖ (S⁻¹)ₖⱼ (D.120)
          = λⱼ δᵢⱼ = Tᵢⱼ′ ,

which is indeed the diagonal matrix (D.116). □
If, in addition to spanning the vector space, the eigenvectors of T are orthonormal, then the matrix S also has a simple form,

S = [ v₁† ; v₂† ; ⋮ ; vₙ† ] = [ v₁ v₂ · · · vₙ ]† , (D.121)

i.e., the rows of S are the Hermitian conjugates of the eigenvectors, so that

S† = S⁻¹ , (D.122)

and S is a unitary matrix.
Exercise D.18 Using the results of Example D.2, show explicitly that

S⁻¹ = [ v₊ v₋ ] = (1/√2) [ 1 1 ; 1 −1 ] (D.123)

diagonalizes

T = [ 0 1 ; 1 0 ] . (D.124)
Not all matrices can be diagonalized, however. Consider, for example,

T = [ 0 1 ; 0 0 ] . (D.125)

Its two eigenvalues λ₁, λ₂ are both equal to 0, and the corresponding eigenvectors v₁, v₂ are both proportional to [1, 0]ᵀ. Hence the eigenvectors span only a 1-dimensional subspace of the 2-dimensional vector space, and the similarity transformation S needed to map T to the diagonal matrix

[ λ₁ 0 ; 0 λ₂ ] = [ 0 0 ; 0 0 ] (D.126)

does not exist. Thus, the matrix given by (D.125) cannot be diagonalized.
Fortunately, there is a certain class of matrices that are guaranteed to be diagonal-
izable. These are Hermitian matrices, for which Ti j = T ji∗ . (For a real-valued vector
space, these matrices are symmetric, i.e., Ti j = T ji .) Not only do the eigenvectors
of a Hermitian matrix span the space, but the eigenvalues are real, and the eigen-
vectors corresponding to distinct eigenvalues are orthogonal to one another. These
results are especially relevant for quantum mechanics, where the observables of the
theory are represented by Hermitian transformations. (For proofs of these statements
regarding Hermitian matrices, and for an excellent introduction to quantum theory,
see Griffiths 2005.)
Exercise D.19 Diagonalize the Hermitian matrix

T = [ 1 i ; −i 1 ] (D.127)

by finding its eigenvalues and eigenvectors, etc. Verify that the similarity transformation that diagonalizes T is unitary.
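A minimal numerical check of this exercise—not part of the original text—is sketched below, assuming numpy is available. It uses numpy's Hermitian-specialized eigensolver, which returns real, ascending eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

T = np.array([[1.0, 1.0j], [-1.0j, 1.0]])   # the Hermitian matrix (D.127)
vals, vecs = np.linalg.eigh(T)              # eigh is specialized to Hermitian matrices
print(vals)                                 # real eigenvalues: [0. 2.]

Sinv = vecs                  # columns of S⁻¹ are the eigenvectors, as in (D.118)
S = Sinv.conj().T            # for orthonormal eigenvectors, S = (S⁻¹)†, as in (D.122)
print(np.allclose(S @ T @ Sinv, np.diag(vals)))   # True: STS⁻¹ is diagonal
print(np.allclose(S @ S.conj().T, np.eye(2)))     # True: S is unitary
```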
We end this section by showing that for any matrix T (diagonalizable or not), the determinant and trace of T can be written very simply in terms of its eigenvalues:

det T = Πᵢ λᵢ , Tr(T) = Σᵢ λᵢ . (D.128)
For a diagonalizable matrix, the above two results follow immediately from (D.116) for T′ and the fact that the determinant and trace of a matrix are invariant under a similarity transformation, (D.94) and (D.97). For a non-diagonalizable matrix, we proceed by first equating the expansion of the characteristic equation (D.102) in terms of powers of λ and its factorization (D.103) in terms of its eigenvalues:

cₙλⁿ + cₙ₋₁λⁿ⁻¹ + · · · + c₁λ + c₀ = (λ₁ − λ)(λ₂ − λ) · · · (λₙ − λ) . (D.129)

From this equality we can see that the constant term c₀ is given by

c₀ = Πᵢ λᵢ , (D.130)

while the coefficients of the two highest powers of λ are

cₙ = (−1)ⁿ , cₙ₋₁ = (−1)ⁿ⁻¹ Σᵢ λᵢ . (D.131)

On the other hand, setting λ = 0 in (D.102) gives

c₀ = det T , (D.133)

while expanding det(T − λ1) directly in terms of the matrix elements of T − λ1,

det(T − λ1) = Σ_{i₁, …, iₙ} ε_{i₁⋯iₙ} (T − λ1)_{1i₁} · · · (T − λ1)_{niₙ} , (D.134)

gives

cₙ = (−1)ⁿ , cₙ₋₁ = (−1)ⁿ⁻¹ Tr(T) . (D.135)

(To see this, note that the terms proportional to λⁿ and λⁿ⁻¹ in (D.134) must come from the product

(T₁₁ − λ)(T₂₂ − λ) · · · (Tₙₙ − λ) , (D.136)

which leads to (D.135).) Then by comparing (D.130) and (D.131) with (D.133) and (D.135), we get (D.128).
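The result (D.128) is easy to confirm numerically, even for a non-diagonalizable matrix such as (D.125). The snippet below is a minimal sketch, not part of the original text, assuming numpy is available.

```python
import numpy as np

for T in (np.array([[0.0, 1.0], [0.0, 0.0]]),          # non-diagonalizable (D.125)
          np.random.default_rng(0).normal(size=(4, 4))):
    lam = np.linalg.eigvals(T)
    print(np.allclose(np.prod(lam), np.linalg.det(T)),   # det T = Π λ_i
          np.allclose(np.sum(lam), np.trace(T)))         # Tr T = Σ λ_i
```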
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 3 is devoted to linear algebra. The treatment is especially suited
for undergraduates, with many examples and problems.
Dennery and Krzywicki (1967): A mathematical methods book suited for advanced undergraduates and graduate students. Chapter 2 discusses finite-dimensional vector spaces; Chap. 3 extends the formalism to (infinite-dimensional) function spaces.
Griffiths (2005): Appendix A provides a review of linear algebra, especially relevant
for calculations that arise in quantum mechanics. Our presentation follows that of
Griffiths.
Halmos (1958): A classic text on vector spaces and linear algebra, written primarily
for undergraduate students majoring in mathematics. As such, the mathematical
rigor is higher than that in most mathematical methods books for scientists and
engineers.
Appendix E
Special Functions
Special functions play an important role in the physical sciences. They often arise as
power series solutions of ordinary differential equations, which in turn come from a
separation-of-variables decomposition of common partial differential equations (e.g.,
Laplace’s equation, Helmholtz’s equation, the wave equation, the diffusion equation,
· · · ). Special functions behave like vectors in an infinite-dimensional vector space,
sharing many of the properties of vectors described in Appendix D. Each set of special
functions is orthogonal with respect to an inner product defined as an appropriate
integral of a product of two such functions. Special functions also form a set of basis
functions in terms of which one can expand the general solution of the original partial
differential equation.
In this appendix, we review the key properties of several special functions, with
particular emphasis on those functions that appear often in classical mechanics
applications. We will assume that the reader is already familiar with the general
(Frobenius) method of power series solutions, which we describe very briefly in Ap-
pendix E.1. As such we will omit detailed derivations of the recursion relations for
the coefficients of the various power series solutions. For those details, you should
consult, e.g., Chap. 12 in Boas (2006). The definitive source for anything related to
special functions is Abramowitz and Stegun (1972).
Consider a 2nd-order linear homogeneous ordinary differential equation of the form

y″(x) + p(x) y′(x) + q(x) y(x) = 0 , (E.1)

where p(x) and q(x) are arbitrary functions of x. We are interested in power series solutions of the form

y(x) = Σ_{n=0}^{∞} aₙxⁿ or y(x) = x^σ Σ_{n=0}^{∞} aₙxⁿ (E.2)

for some value of σ. We need to consider the second (more general) power series expansion (i.e., with σ ≠ 0), called a Frobenius series, if x = 0 is a regular singular point of the differential equation—that is, if p(x) or q(x) is singular (i.e., infinite) at x = 0, but x p(x) and x² q(x) are finite at x = 0. If x = 0 is a regular point of the differential equation, then one can simply set σ = 0 and use the first expansion.
The basic procedure for finding a power series solution to (E.1) is to differentiate the power series expansion for y(x) term by term, and then substitute the expansion into the differential equation for y(x). Since the resulting sum must vanish for all values of x, the coefficients of xⁿ must all equal zero, leading to a recursion relation, which relates aₙ to some subset of the previous aᵣ (r < n), and a quadratic equation for σ, called the indicial equation. The following theorem, called Fuchs's theorem, tells us how to obtain the general solution of the differential equation from two Frobenius series solutions.

Theorem E.1 Fuchs's theorem: The general solution of the differential equation (E.1) with a regular singular point at x = 0 consists of either:
(i) a sum of two Frobenius series S₁(x) and S₂(x), or
(ii) the sum of one Frobenius series S₁(x), and a second solution of the form S₁(x) ln x + S₂(x), where S₂(x) is another Frobenius series.
Case (ii) occurs only if the roots of the indicial equation for σ are equal to one another or differ by an integer.

If x = 0 is a regular point of the differential equation, then the general solution is simply the sum of two ordinary series solutions.
Trigonometric and hyperbolic functions (e.g., cos θ , sin θ , cosh χ , sinh χ , etc.) can
be defined geometrically in terms of circles and hyperbolae. For example, cos θ is
the projection onto the x-axis of a point P on the unit circle making an angle θ with
respect to the x-axis. Here, instead, we define these functions in terms of power series
solutions to differential equations.
Sine and cosine arise as the two independent solutions of the differential equation

y″ + k²y = 0 . (E.3)

Since x = 0 is a regular point of this equation, substituting the first expansion in (E.2) leads to the recursion relation

aₙ₊₂ = −k²/[(n + 1)(n + 2)] aₙ . (E.4)

Thus, there are two independent solutions, one starting with a₀ and the other starting with a₁. If a₀ = A and a₁ = B, then the general solution to this equation is a linear superposition of sine and cosine functions,

y(x) = A cos kx + (B/k) sin kx , (E.5)
where

cos kx ≡ 1 − (kx)²/2! + (kx)⁴/4! − · · · ,
sin kx ≡ kx − (kx)³/3! + (kx)⁵/5! − · · · . (E.6)

These functions can be written in terms of complex exponentials using Euler's identity

e^{iθ} = cos θ + i sin θ , (E.7)

which can be inverted to yield explicit expressions for the cosine and sine functions:

cos θ = (e^{iθ} + e^{−iθ})/2 , sin θ = (e^{iθ} − e^{−iθ})/(2i) . (E.8)
The trig functions are periodic with period 2π, and form an orthogonal set of functions on the interval [−π, π]:

∫_{−π}^{π} dx sin(nx) sin(mx) = π δₙₘ ,
∫_{−π}^{π} dx cos(nx) cos(mx) = π δₙₘ , (E.9)
∫_{−π}^{π} dx sin(nx) cos(mx) = 0 .
Fig. E.1 The functions sin θ and cos θ plotted over the interval −π to π
This is a key property of trig functions used in Fourier expansions of periodic functions. Plots of sin θ and cos θ are given in Fig. E.1. Finally, from sine and cosine we can define other trig functions:

tan x ≡ sin x/cos x ≡ 1/cot x , sec x ≡ 1/cos x , csc x ≡ 1/sin x . (E.10)
Exercise E.2 Verify the orthogonality property of the sine and cosine functions,
(E.9).
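A numerical companion to this exercise—not part of the original text—is sketched below, assuming scipy is available, using quadrature to evaluate the overlap integrals in (E.9).

```python
import numpy as np
from scipy.integrate import quad

def overlap(f, g):
    val, _ = quad(lambda x: f(x) * g(x), -np.pi, np.pi)
    return val

for n in (1, 2, 3):
    for m in (1, 2, 3):
        ss = overlap(lambda x: np.sin(n * x), lambda x: np.sin(m * x))
        cc = overlap(lambda x: np.cos(n * x), lambda x: np.cos(m * x))
        sc = overlap(lambda x: np.sin(n * x), lambda x: np.cos(m * x))
        # expected: π δ_nm, π δ_nm, and 0, respectively
        print(n, m, round(ss, 8), round(cc, 8), round(sc, 8))
```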
The hyperbolic functions sinh and cosh similarly arise as solutions of the differential equation

y″ − k²y = 0 , (E.11)

which differs from (E.3) only in the sign of the k² term. In this case the recursion relation is

aₙ₊₂ = k²/[(n + 1)(n + 2)] aₙ , (E.12)

and the general solution to this equation is a linear combination of sinh and cosh functions:

y(x) = A cosh kx + B sinh kx , (E.13)
where

cosh kx ≡ 1 + (kx)²/2! + (kx)⁴/4! + · · · ,
sinh kx ≡ kx + (kx)³/3! + (kx)⁵/5! + · · · . (E.14)

These functions can be written in terms of real exponentials,

cosh x = (eˣ + e⁻ˣ)/2 , sinh x = (eˣ − e⁻ˣ)/2 , (E.15)

and trig functions,

cosh x = cos(ix) , sinh x = −i sin(ix) . (E.16)
From sinh and cosh we can define other hyperbolic functions, analogous to (E.10):

tanh x ≡ sinh x/cosh x ≡ 1/coth x , sech x ≡ 1/cosh x , csch x ≡ 1/sinh x . (E.17)
Legendre's equation is the differential equation

(1 − x²) y″ − 2x y′ + l(l + 1) y = 0 , (E.18)

where l is a constant. This ordinary differential equation arises when one uses separation of variables for Laplace's equation ∇²Φ = 0 in spherical coordinates (here x ≡ cos θ).

Fig. E.2 Plots of sinh x, cosh x, and tanh x

One can show that y(x) admits a regular power series solution with recursion relation

aₙ₊₂ = [n(n + 1) − l(l + 1)]/[(n + 1)(n + 2)] aₙ , n = 0, 1, · · · . (E.19)
Using the ratio test, it follows that the power series solution converges for |x| < 1.
But one can also show (Exercise E.4, part (c)) that the power series solution diverges
at x = ±1 (corresponding to the North and South poles of the sphere) unless the
series terminates after some finite value of n.
Exercise E.4 (a) Verify the recursion relation (E.19). (b) Show that for l = 0, the power series solution obtained by taking a₀ = 0 and a₁ = 1 is

y(x) = x + (1/3)x³ + (1/5)x⁵ + · · · . (E.20)

(c) Using the integral test, show that this solution diverges at x = 1 or x = −1.
Fig. E.3 First few Legendre polynomials Pₗ(x) plotted as functions of x ∈ [−1, 1]

For l = 0, 1, 2, · · · , the power series solution terminates after a finite number of terms, yielding a polynomial of degree l. Suitably normalized (so that Pₗ(1) = 1), these polynomial solutions are called the Legendre polynomials Pₗ(x). The first few are
P₀(x) = 1 ,
P₁(x) = x ,
P₂(x) = (1/2)(3x² − 1) , (E.21)
P₃(x) = (1/2)(5x³ − 3x) .
Figures E.3 and E.4 give two different graphical representations of the first few Leg-
endre polynomials. Note that Pl (−x) = (−1)l Pl (x).
Exercise E.6 (a) Show that one also obtains a polynomial solution if l is a
negative integer (l = −1, −2, · · · ). (b) Verify that these solutions are the same
as those for non-negative l (e.g., l = −1 yields the same solution as l = 0,
and l = −2 yields the same solution as l = 1, etc.). Thus, there is no loss of
generality in restricting attention to l = 0, 1, · · · .
Fig. E.4 The magnitude |Pₗ(cos θ)| of the first few Legendre polynomials, plotted in the xz-plane as functions of the angle θ measured with respect to the positive z-axis

The Legendre polynomials can also be obtained from Rodrigues' formula,

Pₗ(x) = [1/(2ˡ l!)] (d/dx)ˡ (x² − 1)ˡ . (E.22)
E.3.2.2 Orthogonality

The Legendre polynomials for different values of l are orthogonal to one another,

∫_{−1}^{1} dx Pₗ(x) Pₗ′(x) = [2/(2l + 1)] δₗₗ′ . (E.23)
Appendix E: Special Functions 513
Exercise E.7 Prove (E.23). (Hint: The proof of orthogonality is simple if you write down Legendre's equation for both Pₗ(x) and Pₗ′(x); multiply these equations by Pₗ′(x) and Pₗ(x), respectively; and then subtract and integrate the result between −1 and 1. The derivation of the normalization constant is harder, but can be proved using mathematical induction and Rodrigues' formula for Pₗ(x).)
E.3.2.3 Completeness

The Legendre polynomials are complete in the sense that any square-integrable function f(x) defined on the interval x ∈ [−1, 1] can be expanded in terms of Legendre polynomials:

f(x) = Σ_{l=0}^{∞} Aₗ Pₗ(x) , where Aₗ = [(2l + 1)/2] ∫_{−1}^{1} dx f(x) Pₗ(x) . (E.24)

For example, for the step function

f(x) = { +1 for 0 < x ≤ 1 ; −1 for −1 ≤ x < 0 } , (E.25)

one finds

f(x) = (3/2) P₁(x) − (7/8) P₃(x) + (11/16) P₅(x) + · · · . (E.26)
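The coefficients in (E.26) can be recovered numerically from (E.24). The following is a minimal sketch—not part of the original text—assuming scipy is available.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def A(l):
    """Expansion coefficient (E.24) for the step function f(x) = sign(x)."""
    val, _ = quad(lambda x: np.sign(x) * eval_legendre(l, x),
                  -1.0, 1.0, points=[0.0])
    return (2 * l + 1) / 2.0 * val

print([round(A(l), 6) for l in range(6)])
# [0.0, 1.5, 0.0, -0.875, 0.0, 0.6875], i.e., 3/2, −7/8, 11/16 for l = 1, 3, 5
```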
The Legendre polynomials can also be obtained as the coefficients of a power series expansion in t of a so-called generating function,

1/√(1 − 2xt + t²) = Σ_{n=0}^{∞} Pₙ(x) tⁿ . (E.27)
With this result, one can rather easily express 1/r potentials using a series of Legendre polynomials,

1/|r − r′| = Σ_{l=0}^{∞} (r<ˡ / r>^{l+1}) Pₗ(cos γ) , (E.28)

where r< (r>) is the smaller (larger) of r and r′, and γ is the angle between r and r′.
Using the generating function, one can derive the following relations, called recurrence relations,¹ which relate Legendre polynomials Pₙ(x) and their derivatives Pₙ′(x) to neighboring Legendre polynomials. For example, Legendre's equation (E.18) itself can be obtained by differentiating (E.30f) with respect to x and then using (E.30c). In addition, the normalization Pₙ(1) = 1 also follows simply from the generating function.
Exercise E.9 Prove the above recurrence relations by differentiating the gener-
ating function with respect to t and x separately, and then combining the various
expressions.
1 Most authors use either “recursion relation” or “recurrence relation” exclusively, and apply it to
any relation between indexed objects of different order. Here, we have decided to use “recurrence
relation” when describing relationships between special functions of different order, while using
“recursion relation” when describing relationships between the coefficients of the power series.
The associated Legendre equation is

(1 − x²) y″ − 2x y′ + [ l(l + 1) − m²/(1 − x²) ] y = 0 . (E.32)

It differs from the ordinary Legendre equation, (E.18), by the extra term proportional to m². It turns out that power series solutions of this differential equation also diverge at the poles (x = ±1) unless l = 0, 1, · · · (as before) and m = −l, −l + 1, . . . , l. The finite solutions are called associated Legendre functions, Pₗᵐ(x), and are given by derivatives of the Legendre polynomials,

Pₗᵐ(x) = (−1)ᵐ (1 − x²)^{m/2} (dᵐ/dxᵐ) Pₗ(x) , for m ≥ 0 ,
Pₗ⁻ᵐ(x) = (−1)ᵐ [(l − m)!/(l + m)!] Pₗᵐ(x) , for m < 0 . (E.33)
Exercise E.10 Prove by direct substitution that the above expression for Plm (x)
satisfies the associated Legendre equation (E.32).
Setting x = cos θ, the first few associated Legendre functions are:
l = 0:
P₀⁰(cos θ) = 1 , (E.34)

l = 1:
P₁⁰(cos θ) = cos θ ,
P₁¹(cos θ) = − sin θ , (E.35)

l = 2:
P₂⁰(cos θ) = (1/2)(3 cos²θ − 1) ,
P₂¹(cos θ) = −3 sin θ cos θ ,
P₂²(cos θ) = 3 sin²θ , (E.36)
Fig. E.5 The magnitude |Pₗᵐ(cos θ)| of the first few associated Legendre functions plotted as functions of cos θ in the xz (or yz) plane. The angle θ is measured with respect to the positive z-axis. Similar to the plot in Fig. E.4, the sign (i.e., ±) of the associated Legendre functions Pₗᵐ(cos θ) is lost in this graphical representation. Note that the scale changes for larger values of m
l = 3:
P₃⁰(cos θ) = (1/2)(5 cos³θ − 3 cos θ) ,
P₃¹(cos θ) = −(3/2) sin θ (5 cos²θ − 1) , (E.37)
P₃²(cos θ) = 15 (cos θ − cos³θ) ,
P₃³(cos θ) = −15 sin θ (1 − cos²θ) .
Plots of the magnitude of the first few of these functions are given in Fig. E.5.
Using Rodrigues’ formula for Legendre polynomials (E.22), we can write down
an analogous Rodrigues’ formula for associated Legendre functions, valid for both
positive and negative values of m:
(−1)m 2 m/2 d
l+m
Plm (x) = (1 − x ) (x 2 − 1)l . (E.38)
2l l! dx l+m
E.3.4.2 Orthonormality

For each m, the associated Legendre functions are orthogonal to one another,

∫_{−1}^{1} dx Pₗᵐ(x) Pₗ′ᵐ(x) = [2/(2l + 1)] [(l + m)!/(l − m)!] δₗₗ′ . (E.39)
E.3.4.3 Completeness

For each m, the associated Legendre functions form a complete set (in the index l) for square-integrable functions on x ∈ [−1, 1]:

f(x) = Σ_{l=0}^{∞} Aₗ Pₗᵐ(x) , where Aₗ = [(2l + 1)/2] [(l − m)!/(l + m)!] ∫_{−1}^{1} dx f(x) Pₗᵐ(x) . (E.40)
The spherical harmonics Yₗₘ(θ, φ) are products of the associated Legendre functions Pₗᵐ(cos θ) and the harmonic functions e^{imφ}:

Yₗₘ(θ, φ) ≡ Nₗₘ Pₗᵐ(cos θ) e^{imφ} , Nₗₘ ≡ √( [(2l + 1)/4π] [(l − m)!/(l + m)!] ) . (E.41)

The normalization constants Nₗₘ are chosen so that

∫_{S²} dΩ Yₗₘ*(θ, φ) Yₗ′ₘ′(θ, φ) = δₗₗ′ δₘₘ′ , (E.42)

where

dΩ ≡ d(cos θ) dφ = sin θ dθ dφ . (E.43)
This is the orthonormality condition for spherical harmonics. Note that for m = 0, spherical harmonics reduce to Legendre polynomials, up to a normalization factor:

Yₗ₀ = √( (2l + 1)/4π ) Pₗ(cos θ) . (E.44)
In addition, one can show that

Yₗ,₋ₘ(θ, φ) = (−1)ᵐ Yₗₘ*(θ, φ) (E.45)

and

Yₗₘ(π − θ, φ + π) = (−1)ˡ Yₗₘ(θ, φ) . (E.46)

The first equation tells you how to get Yₗ,₋ₘ from Yₗₘ; the second equation relates the values of the spherical harmonic Yₗₘ at antipodal (i.e., opposite) points on the 2-sphere.
The first few spherical harmonics are:
l = 0:
Y₀₀(θ, φ) = 1/√(4π) , (E.47)

l = 1:
Y₁₁(θ, φ) = −√(3/8π) sin θ e^{iφ} ,
Y₁₀(θ, φ) = √(3/4π) cos θ , (E.48)
Y₁,₋₁(θ, φ) = √(3/8π) sin θ e^{−iφ} ,
l = 2:
Y₂₂(θ, φ) = (1/4)√(15/2π) sin²θ e^{2iφ} ,
Y₂₁(θ, φ) = −√(15/8π) sin θ cos θ e^{iφ} ,
Y₂₀(θ, φ) = √(5/4π) [(3/2) cos²θ − 1/2] , (E.49)
Y₂,₋₁(θ, φ) = √(15/8π) sin θ cos θ e^{−iφ} ,
Y₂,₋₂(θ, φ) = (1/4)√(15/2π) sin²θ e^{−2iφ} .
Since Yₗₘ(θ, φ) differs from Pₗᵐ(cos θ) by only a constant multiplicative factor and a phase e^{imφ}, the magnitude |Yₗₘ(θ, φ)| has the same shape as |Pₗᵐ(cos θ)| (See Fig. E.5).
E.4.1.1 Completeness

Spherical harmonics are complete in the sense that any square-integrable function f(θ, φ) on the unit 2-sphere can be expanded in terms of spherical harmonics:

f(θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} Aₗₘ Yₗₘ(θ, φ) , where Aₗₘ = ∫_{S²} dΩ f(θ, φ) Yₗₘ*(θ, φ) . (E.50)
Completeness of the spherical harmonics is equivalent to the closure relation

Σ_{l=0}^{∞} Σ_{m=−l}^{l} Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) = δ(n̂, n̂′) , (E.51)

where n̂, n̂′ are unit vectors in the directions (θ, φ), (θ′, φ′), and δ(n̂, n̂′) is the 2-dimensional Dirac delta function on the 2-sphere:

δ(n̂, n̂′) = δ(cos θ − cos θ′) δ(φ − φ′) = (1/sin θ) δ(θ − θ′) δ(φ − φ′) . (E.53)
As an application, the general solution of Laplace's equation ∇²Φ = 0 in spherical coordinates can be expanded as

Φ(r, θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} [ Aₗₘ rˡ + Bₗₘ r^{−(l+1)} ] Yₗₘ(θ, φ) , (E.54)

where the term in square brackets is the general solution to the radial part of Laplace's equation.
If one sums only over m in (E.51), one obtains the so-called addition theorem of spherical harmonics,

Σ_{m=−l}^{l} Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) = [(2l + 1)/4π] Pₗ(cos γ) , (E.55)

where

cos γ ≡ n̂ · n̂′ = cos θ cos θ′ + sin θ sin θ′ cos(φ − φ′) . (E.56)
Substituting (E.55) back into (E.51) then gives

δ(n̂, n̂′) = Σ_{l=0}^{∞} [(2l + 1)/4π] Pₗ(cos γ) ,

which is an expansion of the Dirac delta function on the 2-sphere in terms of the Legendre polynomials.
Exercise E.12 Using the addition theorem, show that the 1/r potential for a point source can be written as

1/|r − r′| = Σ_{l=0}^{∞} Σ_{m=−l}^{l} [4π/(2l + 1)] (r<ˡ / r>^{l+1}) Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) , (E.58)

where r< (r>) is the smaller (larger) of r and r′. This expression is fully factorized into a product of functions of the unprimed and primed coordinates.
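The addition theorem (E.55) is easy to spot-check numerically. The snippet below is a minimal sketch, not part of the original text, assuming scipy is available; beware that scipy's sph_harm takes the azimuthal angle as its third argument and the polar angle as its fourth.

```python
import numpy as np
from scipy.special import sph_harm, eval_legendre

theta, phi = 0.7, 1.1          # (θ, φ) of the first point
thetap, phip = 2.0, -0.4       # (θ', φ') of the second point
l = 3

# left-hand side of (E.55): sum over m of Y*_{lm}(θ',φ') Y_{lm}(θ,φ)
lhs = sum(np.conj(sph_harm(m, l, phip, thetap)) * sph_harm(m, l, phi, theta)
          for m in range(-l, l + 1))

# right-hand side of (E.55), using (E.56) for cos γ
cosgam = (np.cos(theta) * np.cos(thetap)
          + np.sin(theta) * np.sin(thetap) * np.cos(phi - phip))
rhs = (2 * l + 1) / (4 * np.pi) * eval_legendre(l, cosgam)
print(lhs, rhs)    # imaginary part of lhs ≈ 0, and Re(lhs) ≈ rhs
```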
We can imagine rotating our coordinates through the Euler angles α, β, γ using the zyz form of the rotation matrix R(α, β, γ) (See Sect. 6.2.3.1). This is equivalent to a rotation of the 2-sphere. In this case, the spherical harmonics transform according to

Yₗₘ(θ′, φ′) = Σ_{m′=−l}^{l} Dˡₘ,ₘ′(α, β, γ) Yₗₘ′(θ, φ) , (E.59)

where (θ′, φ′) are the coordinates of a point P = (θ, φ) after the rotation of the sphere. The fact that Yₗₘ(θ′, φ′) can be written as a linear combination of the Yₗₘ′(θ, φ) with the same l is a consequence of the spherical harmonics being eigenfunctions of the (rotationally-invariant) Laplacian on the unit 2-sphere with eigenvalues depending only on l,

∇²₍₂₎ Yₗₘ(θ, φ) = −l(l + 1) Yₗₘ(θ, φ) , (E.60)

where

∇²₍₂₎ f(θ, φ) ≡ (1/sin θ) (∂/∂θ)(sin θ ∂f/∂θ) + (1/sin²θ) ∂²f/∂φ² . (E.61)
The coefficients Dˡₘ,ₘ′(α, β, γ) in (E.59) are called Wigner rotation matrices. They arise in applications of group theory to quantum mechanics (Wigner 1931). In terms of the Euler angles, the components of the Wigner rotation matrices can be written as

Dˡₘ,ₘ′(α, β, γ) = e^{−imα} dˡₘ,ₘ′(β) e^{−im′γ} , (E.62)

where

dˡₘ,ₘ′(β) ≡ √[(l + m)!(l − m)!(l + m′)!(l − m′)!]
  × Σₛ [ (−1)^{m−m′+s} / ( (l + m′ − s)! s! (m − m′ + s)! (l − m − s)! ) ]
  × (cos β/2)^{2l+m′−m−2s} (sin β/2)^{m−m′+2s} , (E.63)

and where the sum over s is chosen such that the factorials inside the summation always remain non-negative. The Wigner matrices also satisfy
Σ_{m′=−l}^{l} Dˡₘ,ₘ′(α, β, γ) Dˡₘ″,ₘ′*(α, β, γ) = δₘₘ″ , (E.64)

as a consequence of

∫_{S²} dΩ Yₗₘ*(θ′, φ′) Yₗ′ₘ′(θ′, φ′) = δₗₗ′ δₘₘ′ . (E.65)
Separating variables in Laplace's equation in cylindrical coordinates (ρ, φ, z) leads to a radial equation of the form

R″(ρ) + (1/ρ) R′(ρ) + (±k² − ν²/ρ²) R(ρ) = 0 . (E.66)

The two equations correspond to different choices for the sign of the separation constant, ±k². These equations can be put into more standard form by making a change of variables x ≡ kρ, with y(x)|ₓ₌ₖᵨ ≡ R(ρ):

y″(x) + (1/x) y′(x) + (1 − ν²/x²) y(x) = 0 ,
y″(x) + (1/x) y′(x) − (1 + ν²/x²) y(x) = 0 . (E.67)

The first equation is called Bessel's equation of order ν; the second is called the modified Bessel's equation of order ν.
Exercise E.13 Show that if y(x) is a solution of Bessel’s equation, then ȳ(x) ≡
y(ix) is a solution of the modified Bessel’s equation.
Since x = 0 is a regular singular point of Bessel's equation, we look for a Frobenius series solution of the form

y(x) = x^σ Σ_{n=0}^{∞} aₙxⁿ . (E.68)

Substituting this expansion into Bessel's equation and equating coefficients multiplying like powers of x leads to a quadratic equation for σ (called the indicial equation) and a recursion relation relating aₙ₊₂ to aₙ (and σ) for n = 0, 1, · · · . Setting a₁ = 0 (which forces all of the higher-order odd coefficients to vanish) and choosing the normalization coefficient a₀ appropriately, we obtain the solution

Jν(x) = Σ_{n=0}^{∞} [(−1)ⁿ / (n! Γ(n + 1 + ν))] (x/2)^{2n+ν} . (E.69)
Jν(x) is called a Bessel function of the 1st kind of order ν. The function Γ(n + 1 + ν), which appears in the denominator of the expansion coefficients, is the gamma function, defined by

Γ(z) ≡ ∫₀^{∞} dx x^{z−1} e^{−x} , Re(z) > 0 . (E.70)

It has the properties

Γ(n + 1) = n! for n = 0, 1, · · · ,
Γ(z + 1) = z Γ(z) for Re(z) > 0 . (E.71)
Exercise E.14 (a) Prove Γ(z + 1) = zΓ(z) for Re(z) > 0. (Hint: Integrate Γ(z + 1) by parts taking u = x^z and dv = e^{−x} dx.) (b) Show by explicit calculation that Γ(1) = 1 and Γ(1/2) = √π.
The asymptotic behavior of Jν(x) is given by

x ≪ 1: Jν(x) → [1/Γ(ν + 1)] (x/2)^ν ,
x ≫ 1, ν: Jν(x) → √(2/πx) cos(x − νπ/2 − π/4) . (E.72)

Thus, J₀(0) = 1 and Jν(0) = 0 for all ν ≠ 0; while for large x, Jν(x) behaves like a damped sinusoid, and has infinitely many zeros xνₙ:

Jν(xνₙ) = 0 , n = 1, 2, · · · . (E.73)
Fig. E.6 First few Bessel functions of the 1st kind for integer ν
Plots of the first few Bessel functions of the 1st kind for integer values of ν are given
in Fig. E.6.
Exercise E.15 Using (E.72), show that the zeros of Jν(x) are given approximately by

xνₙ ≈ nπ + (ν − 1/2) π/2 . (E.74)
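A quick comparison of the asymptotic estimate (E.74) with tabulated zeros—not part of the original text—is sketched below, assuming scipy is available (jn_zeros handles integer orders).

```python
import numpy as np
from scipy.special import jn_zeros

nu = 1
exact = jn_zeros(nu, 5)                 # first five zeros of J₁
approx = np.array([n * np.pi + (nu - 0.5) * np.pi / 2 for n in range(1, 6)])
print(exact.round(4))    # [ 3.8317  7.0156 10.1735 13.3237 16.4706]
print(approx.round(4))   # [ 3.927   7.0686 10.2102 13.3518 16.4934]
```

As expected for an asymptotic formula, the estimate improves as n increases.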
It is also possible to write Jₙ(x) for integer n as an integral involving trig functions,

Jₙ(x) = (1/π) ∫₀^{π} cos(nθ − x sin θ) dθ . (E.75)
This result is useful for finding a Fourier series solution to Kepler’s equation as
discussed in Sect. 4.3.4.
If ν is not an integer, then J₋ν(x) is the second independent solution to Bessel's equation. But if ν = m is an integer, then

J₋ₘ(x) = (−1)ᵐ Jₘ(x) , (E.76)

so J₋ₘ(x) is not an independent solution for this case. A second solution, which is independent of Jν(x) for all values of ν (integer or not) is²

Nν(x) ≡ [Jν(x) cos(νπ) − J₋ν(x)] / sin(νπ) . (E.77)

Nν(x) is called a Neumann function (or a Bessel function of the 2nd kind). In some references, Nν(x) is denoted by Yν(x).
Note that for all ν, Nν (x) → −∞ as x → 0. In addition, just as we saw for Jν (x),
Nν (x) behaves for large x like a damped sinusoid, but is 90◦ out of phase with Jν (x).
Plots of the first few Bessel functions of the 2nd kind for integer values of ν are given
in Fig. E.7.
With Jν(x) and Nν(x) as the two independent solutions to Bessel's equation, it follows that the most general solution to the radial part of Laplace's equation in cylindrical coordinates (for the choice of positive separation constant +k²) is

R(ρ) = A Jν(kρ) + B Nν(kρ) , (E.78)

where A and B are constants.
² For ν = m an integer, one needs to use L'Hôpital's rule to show that the right-hand side of the expression defining Nₘ(x) is well-defined.
Fig. E.7 First few Bessel functions of the 2nd kind for integer ν
The following recurrence relations hold for either Jν(x), Nν(x), or any linear combination Cν(x) of these functions with constant coefficients:

Cν₋₁(x) + Cν₊₁(x) = (2ν/x) Cν(x) , (E.79)
Cν₋₁(x) − Cν₊₁(x) = 2 Cν′(x) . (E.80)
Bessel functions Jν(x) satisfy the following orthogonality and normalization conditions:

∫₀^{a} dρ ρ Jν(xνₙρ/a) Jν(xνₙ′ρ/a) = (a²/2) [Jν₊₁(xνₙ)]² δₙₙ′ , (E.81)

where xνₙ and xνₙ′ are the nth and n′th zeroes of Jν(x). Note that the orthogonality of Bessel functions is with respect to different arguments of a single function Jν(x), and not with respect to different functions Jν(x) and Jν′(x) of the same argument. (This latter case held for the Legendre polynomials Pₗ(x) and Pₗ′(x).) Thus, the orthogonality of Bessel functions is similar to the orthogonality of the sine functions sin(2nπx/a) on the interval [0, a] for different values of n.
If the interval [0, a] becomes infinite, [0, ∞), then the orthogonality and normalization conditions actually become simpler,

∫₀^{∞} dρ ρ Jν(kρ) Jν(k′ρ) = (1/k) δ(k − k′) , (E.82)

where k now takes on a continuous range of values. This is similar to the transition from Fourier series (basis functions e^{ikₙx} with kₙ = 2nπ/a) to Fourier transforms (basis functions e^{ikx} with k a real variable):

∫_{−a/2}^{a/2} dx e^{i2π(n−n′)x/a} = a δₙₙ′ → ∫_{−∞}^{∞} dx e^{i(k−k′)x} = 2π δ(k − k′) . (E.83)
Exercise E.16 Prove the orthogonality part of (E.81). (Hint: Let f(ρ) = Jν(xνₙρ/a) and g(ρ) = Jν(xνₙ′ρ/a) with n′ ≠ n. Then write down Bessel's equation for both f and g; multiply these equations by g and f, respectively; then subtract and integrate.)

Exercise E.17 Prove the normalization part of (E.81). (Hint: You will need to integrate by parts and then use Bessel's equation to substitute for x²Jν(x) in one of the integrals.)
Fig. E.8 First few modified Bessel functions of the 1st kind for integer ν
It differs from the ordinary Bessel’s equation only in the sign of one of the terms
multiplying y(x). Modified (or hyperbolic) Bessel functions (of the 1st and 2nd
kind) are solutions to the above equation. They are defined by
π ν+1 (1)
Iν (x) ≡ i−ν Jν (ix) , K ν (x) ≡ i Hν (ix) . (E.85)
2
Note the pure imaginary arguments on the right-hand side of the above definitions,
consistent with our earlier statement that if y(x) is a solution of Bessel’s equation then
y(ix) is a solution of the modified Bessel’s equation. Plots of the first few modified
Bessel functions of the first and second kind, Iν (x) and K ν (x), for integer values of
ν are given in Figs. E.8 and E.9.
The asymptotic behavior of the modified Bessel functions Iν(x) and Kν(x) is given by

Fig. E.9 First few modified Bessel functions of the 2nd kind for integer ν
x ≪ 1: Iν(x) → [1/Γ(ν + 1)] (x/2)^ν ,
        Kν(x) → −[ln(x/2) + 0.5772 · · · ] for ν = 0 ,
        Kν(x) → [Γ(ν)/2] (2/x)^ν for ν ≠ 0 , (E.86)
x ≫ 1, ν: Iν(x) → [1/√(2πx)] eˣ [1 + O(1/x)] ,
          Kν(x) → √(π/2x) e^{−x} [1 + O(1/x)] .
Thus, I₀(0) = 1 and Iν(0) = 0 for all ν ≠ 0, while Kν(x) → ∞ as x → 0 for all ν. For large x, Iν(x) → ∞ while Kν(x) → 0 for all ν.
Given Iν(x) and Kν(x), the most general solution to the radial part of Laplace's equation for the choice of negative separation constant −k² is

R(ρ) = A Iν(kρ) + B Kν(kρ) , (E.87)

where A and B are constants.
Spherical Bessel functions (of the 1st and 2nd kind) are defined in terms of ordinary Bessel functions via

jₙ(x) ≡ √(π/2x) J_{n+1/2}(x) , nₙ(x) ≡ √(π/2x) N_{n+1/2}(x) , (E.88)

where n = 0, 1, 2, · · · . Given the explicit form of J_{n+1/2}(x), one can show that

jₙ(x) = xⁿ [ −(1/x)(d/dx) ]ⁿ (sin x / x) ,
nₙ(x) = −xⁿ [ −(1/x)(d/dx) ]ⁿ (cos x / x) . (E.89)

In particular,

j₀(x) = sin x / x , n₀(x) = −cos x / x . (E.90)
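The closed forms (E.90) are easy to verify against library routines. The snippet below is a minimal sketch, not part of the original text, assuming scipy is available (scipy's spherical_yn is the Neumann-type function denoted nₙ here).

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

x = np.linspace(0.5, 10.0, 20)
print(np.allclose(spherical_jn(0, x), np.sin(x) / x))     # True: j₀ = sin x / x
print(np.allclose(spherical_yn(0, x), -np.cos(x) / x))    # True: n₀ = −cos x / x
```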
Plots of the first few spherical Bessel functions are given in Figs. E.10 and E.11.
Fig. E.10 First few spherical Bessel functions of the 1st kind
Fig. E.11 First few spherical Bessel functions of the 2nd kind
Exercise E.18 Verify (E.90) for j₀(x) directly from its definition in terms of the ordinary Bessel function J₁/₂(x).

Given the relationship between jₙ(x) and J_{n+1/2}(x), one can show that the spherical Bessel functions satisfy the differential equation

jₙ″(x) + (2/x) jₙ′(x) + [1 − n(n + 1)/x²] jₙ(x) = 0 . (E.91)
This equation arises when separating variables in the Helmholtz equation ∇²f + k²f = 0 in spherical coordinates, writing f = R(r)Θ(θ)Φ(φ). The φ equation is the standard harmonic oscillator equation with separation constant −m²; the θ equation is the associated Legendre's equation with separation constants l and m; and the radial equation is

R″(r) + (2/r) R′(r) + [k² − l(l + 1)/r²] R(r) = 0 . (E.93)

Making the change of variables x ≡ kr, with y(x)|ₓ₌ₖᵣ ≡ R(r), converts this to

y″(x) + (2/x) y′(x) + [1 − l(l + 1)/x²] y(x) = 0 ,

which is the differential equation (E.91) we found earlier, with solution y(x) = jₗ(x).
Elliptic integrals and elliptic functions arise in some simple applications, such as
finding the length of a conic section (e.g., an ellipse) and solving for the motion of
a simple pendulum when one goes beyond the small-angle approximation. In the
following two subsections, we briefly define elliptic integrals and elliptic functions
using the notation given in Chap. 12 of Boas 2006. Other references may use slightly
different notation.
Elliptic Integrals

Elliptic integrals of the 1st and 2nd kind are often written in two different forms: the Legendre forms,

F(\phi, k) \equiv \int_0^{\phi} \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}} , \quad 0 \le k \le 1 ,
\qquad E(\phi, k) \equiv \int_0^{\phi} \sqrt{1 - k^2 \sin^2\theta}\; d\theta , \quad 0 \le k \le 1 ,    (E.96)

and the Jacobi forms,

F(\phi, k) \equiv \int_0^{x} \frac{dt}{\sqrt{1 - k^2 t^2}\,\sqrt{1 - t^2}} , \quad 0 \le k \le 1 ,
\qquad E(\phi, k) \equiv \int_0^{x} \frac{\sqrt{1 - k^2 t^2}}{\sqrt{1 - t^2}}\; dt , \quad 0 \le k \le 1 ,    (E.97)

with x ≡ sin φ. The two arguments of these functions are called the amplitude φ and the modulus k. Note that the Jacobi and Legendre forms of elliptic integrals are related by the change of variables t = sin θ.
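Both forms, and the equivalence between them, can be checked by direct quadrature. The sketch below (not part of the original text) also flags a common pitfall: SciPy parametrizes its elliptic integrals by m = k², not by the modulus k itself.

```python
# Compare the Legendre form (E.96) and Jacobi form (E.97) of F(phi, k)
# by numerical integration, and against SciPy's ellipkinc(phi, m = k**2).
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipkinc

phi, k = 1.0, 0.6
x = np.sin(phi)

legendre, _ = quad(lambda t: 1.0 / np.sqrt(1 - k**2 * np.sin(t)**2), 0, phi)
jacobi, _ = quad(lambda t: 1.0 / (np.sqrt(1 - k**2 * t**2) * np.sqrt(1 - t**2)), 0, x)

print(legendre, jacobi, ellipkinc(phi, k**2))  # all three agree
```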
Complete elliptic integrals of the 1st and 2nd kind, K(k) and E(k), are defined by setting the amplitude φ = π/2 (or x = 1) in the above expressions:

K(k) \equiv F(\pi/2, k) , \qquad E(k) \equiv E(\pi/2, k) .    (E.98)
Exercise E.19 (a) Show that the arc length of an ellipse (x/a)^2 + (y/b)^2 = 1 from θ = φ_1 to θ = φ_2 can be written as
Exercise E.20 (a) Show that the period of a simple pendulum of mass m and length ℓ, released from rest at θ = θ_0, is given by

P(\theta_0) = 4 \sqrt{\frac{\ell}{g}}\; K(\sin(\theta_0/2)) .    (E.100)

Do not assume that the small-angle approximation is valid for this part of the problem. (Hint: Use conservation of total mechanical energy to find an equation for θ̇ in terms of θ and θ_0.) (b) Show that for θ_0 ≪ 1, the answer from part (a) reduces to

P(\theta_0) \approx 2\pi \sqrt{\frac{\ell}{g}} \left( 1 + \frac{1}{16}\theta_0^2 \right) ,    (E.101)

which in the limit of very small θ_0 is the small-angle approximation for the period of a simple pendulum, P ≈ 2π√(ℓ/g).
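To see the size of the finite-amplitude correction, here is a short sketch (not part of the original text; the values of g and ℓ are illustrative) comparing the exact period (E.100) with the expansion (E.101). Note again that SciPy's ellipk takes m = k² as its argument.

```python
# Exact pendulum period (E.100) vs. the small-angle expansion (E.101).
import numpy as np
from scipy.special import ellipk

g, ell = 9.81, 1.0  # illustrative values (SI units)

def period_exact(theta0):
    # P = 4 sqrt(ell/g) K(sin(theta0/2)); ellipk expects m = k**2
    return 4 * np.sqrt(ell / g) * ellipk(np.sin(theta0 / 2)**2)

def period_expansion(theta0):
    return 2 * np.pi * np.sqrt(ell / g) * (1 + theta0**2 / 16)

for theta0 in (0.1, 0.5, 1.0):
    print(theta0, period_exact(theta0), period_expansion(theta0))
```

For θ_0 = 0.1 the two agree to better than a part in 10⁵, while by θ_0 = 1 rad the O(θ_0⁴) terms neglected in (E.101) become noticeable.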
Elliptic Functions

The elliptic function sn y is defined as the inverse of the elliptic integral y = F(φ, k) for a fixed value of k:

y = \int_0^{x} \frac{dt}{\sqrt{1 - k^2 t^2}\,\sqrt{1 - t^2}} \equiv \mathrm{sn}^{-1} x \quad\Leftrightarrow\quad x = \mathrm{sn}\, y .    (E.102)

Since x = sin φ, we can also write sn y = sin φ in terms of the amplitude φ. Note that the above definition of sn y is very similar to the integral representation of the inverse sine function,

y = \int_0^{x} \frac{dt}{\sqrt{1 - t^2}} = \sin^{-1} x \quad\Leftrightarrow\quad x = \sin y .    (E.103)
Indeed, sn y reduces to sin y when k = 0, and for general modulus k it is a periodic function of y, with period

P = 4K(k) ,    (E.104)

similar to the sine function. Plots of x = sn y for k² = 0, 0.25, 0.5, and 0.75 are shown in Fig. E.12. These have periods P = 6.28, 6.74, 7.42, and 8.63 to three significant digits.
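The quoted periods can be reproduced with a one-line loop (a sketch, not from the text), keeping in mind that SciPy's ellipk takes m = k²:

```python
# Periods of sn y for the moduli plotted in Fig. E.12: P = 4 K(k).
from scipy.special import ellipk

for m in (0.0, 0.25, 0.5, 0.75):
    print(m, 4 * ellipk(m))  # 6.28, 6.74, 7.42, 8.63 to three digits
```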
Fig. E.12 Plots of the elliptic function sn y for k² = 0, 0.25, 0.5, and 0.75. Recall that for k² = 0, sn y = sin y
Given sn y, one can define other elliptic functions using relations similar to those between trig functions:

\mathrm{cn}\, y \equiv \sqrt{1 - \mathrm{sn}^2 y} , \qquad \mathrm{dn}\, y \equiv \sqrt{1 - k^2\, \mathrm{sn}^2 y} .    (E.105)

Using the above definitions, it is easy to show that cn y = cos φ. In addition, using the Legendre form of the elliptic integral F(φ, k), it follows that dn y = dφ/dy. The proof is simply

\frac{d\phi}{dy} = \frac{1}{dy/d\phi} = \sqrt{1 - k^2 \sin^2\phi} = \sqrt{1 - k^2\, \mathrm{sn}^2 y} = \mathrm{dn}\, y .    (E.106)
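The relations (E.105) and (E.106) can likewise be spot-checked numerically (a sketch, not part of the original text). SciPy's ellipj(u, m) returns the tuple (sn, cn, dn, φ), with m = k² and φ the amplitude:

```python
# Check cn = sqrt(1 - sn^2), dn = sqrt(1 - k^2 sn^2), and dn = dphi/dy.
import numpy as np
from scipy.special import ellipj

m, y, h = 0.5, 1.3, 1e-6  # m = k**2; y chosen so that cn > 0
sn, cn, dn, phi = ellipj(y, m)

print(np.isclose(cn, np.sqrt(1 - sn**2)))      # True
print(np.isclose(dn, np.sqrt(1 - m * sn**2)))  # True

# dn = dphi/dy, (E.106), via a centered finite difference of the amplitude
dphi = (ellipj(y + h, m)[3] - ellipj(y - h, m)[3]) / (2 * h)
print(np.isclose(dn, dphi))                    # True
```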
Suggested References
Full references are given in the bibliography at the end of the book.
Abramowitz and Stegun (1972): A must-have reference for all things related to special
functions.
Boas (2006): Chapters 11, 12, and 13 discuss special functions, series solutions of
differential equations, and partial differential equations, respectively, filling in
most of the details omitted in this appendix. An excellent introduction to these
topics, especially suited for undergraduates. There are many examples and prob-
lems to choose from.
Mathews and Walker (1970): Chapters 1, 7, and 8 discuss ordinary differential equa-
tions, special functions, and partial differential equations, respectively. The level
of this text is more appropriate for graduate students or mathematically-minded
undergraduates.
References
B.P. Abbott, R. Abbott, T.D. Abbott, M.R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R.X. Adhikari et al., Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116(6), 061102 (2016). https://doi.org/10.1103/PhysRevLett.116.061102
B.P. Abbott, R. Abbott, T.D. Abbott, M.R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams,
P. Addesso, R.X. Adhikari et al., The basic physics of the binary black hole merger GW150914.
Annalen der Physik 529, 1600209 (2017). https://doi.org/10.1002/andp.201600209
M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions (Dover Publications Inc, New
York, 1972). ISBN 0-486-61272-4
G. Arfken, Mathematical Methods for Physicists (Academic Press Inc, New York, 1970)
V.I. Arnold, Mathematical Methods of Classical Mechanics, vol. 60, Graduate Texts in Mathematics
(Springer, New York, 1978). ISBN 0-387-90314-3
M. Benacquista, An Introduction to the Evolution of Single and Binary Stars (Springer, New York,
Heidelberg, Dordrecht, London, 2013)
R.E. Berg, D.G. Stork, The Physics of Sound, 3rd edn. (Pearson Prentice Hall, Englewood Cliffs,
New Jersey, 2005). ISBN 978-0131457898
J. Bertrand, C.R. Acad. Sci. 77, 849–853 (1873)
M.L. Boas, Mathematical Methods in the Physical Sciences, 3rd edn. (John Wiley & Sons Inc,
United States of America, 2006). ISBN 0-471-19826-9
H. Bondi, Relativity and Common Sense: A New Approach to Einstein (Dover Publications Inc,
New York, 1962). ISBN 0-486-24021-5
P. Dennery, A. Krzywicki, Mathematics for Physicists (Dover Publications Inc, Mineola, New York, 1967)
S. Dutta, S. Ray, Bead on a rotating circular hoop: a simple yet feature-rich dynamical system. ArXiv e-prints (December 2011)
A. Einstein, Zur Elektrodynamik bewegter Körper [On the electrodynamics of moving bodies]. Annalen der Physik 322, 891–921 (1905). https://doi.org/10.1002/andp.19053221004
L.A. Fetter, J.D. Walecka, Theoretical Mechanics of Particles and Continua (McGraw-Hill Book
Company, United States of America, 1980)
R.P. Feynman, Surely You're Joking, Mr. Feynman! Adventures of a Curious Character (W.W. Norton & Company, New York, London, 1985). ISBN 0-393-31604-1
R.P. Feynman, R.B. Leighton, M. Sands, The Feynman Lectures on Physics, vol. II (Addison-Wesley Publishing Company, Reading, Massachusetts, 1964). ISBN 0-201-02117-X-P
H. Flanders, Differential Forms with Applications to the Physical Sciences (Dover Publications Inc,
New York, 1963). ISBN 0-486-66169-5
M.R. Flannery, The enigma of nonholonomic constraints. Am. J. Phys. 73, 265–272 (2005). https://doi.org/10.1119/1.1830501
I.M. Gelfand, S.V. Fomin, Calculus of Variations (Dover Publications Inc, Mineola, New York,
1963). ISBN 0-486-41448-5. (Translated and Edited by Richard A. Silverman)
H. Goldstein, C. Poole, J. Safko, Classical Mechanics, 3rd edn. (Addison Wesley, San Francisco,
CA, 2002). ISBN 0-201-65702-3
D.J. Griffiths, Introduction to Electrodynamics, 3rd edn. (Pearson Prentice Hall, United States of
America, 1999). ISBN 0-13-805326-X
D.J. Griffiths, Introduction to Quantum Mechanics, 2nd edn. (Pearson Prentice Hall, United States
of America, 2005). ISBN 0-13-111892-7
P.R. Halmos, Finite-Dimensional Vector Spaces, 2nd edn. (D. Van Nostrand Company Inc, Princeton, New Jersey, 1958)
J.B. Hartle, Gravity: An Introduction to Einstein's General Relativity (Benjamin Cummings, illustrated edition, January 2003). ISBN 0805386629
H. Hertz, The Principles of Mechanics Presented in a New Form (Dover Publications Inc, New York, 2004). ISBN 978-0486495576 (The original German edition, Die Prinzipien der Mechanik in neuem Zusammenhange dargestellt, was published in 1894)
R.W. Hilditch, An Introduction to Close Binary Stars (Cambridge University Press, Cambridge,
2001)
K.V. Kuchař, Theoretical mechanics. Unpublished lecture notes (1995)
J.B. Kuipers, Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace, and Virtual Reality (Princeton University Press, 1999)
C. Lanczos, The Variational Principles of Mechanics, 4th edn. (Dover Publications Inc, New York,
1949). ISBN 0-486-65067-7
L.D. Landau, E.M. Lifshitz, Classical Theory of Fields, Course of Theoretical Physics, 4th edn., vol. 2 (Pergamon Press, Oxford, 1975). ISBN 0-08-025072-6
L.D. Landau, E.M. Lifshitz, Mechanics, Course of Theoretical Physics, 3rd edn., vol. 1 (Elsevier Ltd, Oxford, 1976). ISBN 978-0-7506-2896-9
J.B. Marion, S.T. Thornton, Classical Dynamics of Particles and Systems, 4th edn. (Saunders College
Publishing, United States of America, 1995). ISBN 0-03-097302-3
J. Mathews, R.L. Walker, Mathematical Methods of Physics (Benjamin/Cummings, United States
of America, 1970). ISBN 0-8053-7002-1
N.D. Mermin, It’s About Time: Understanding Einstein’s Relativity (Princeton University Press,
Princeton, New Jersey, 2005). ISBN 0-691-12201-6
E. Noether, Invariante Variationsprobleme [Invariant variation problems]. Nachr. d. König. Gesellsch. d. Wiss. zu Göttingen 1918, 235–257 (1918)
E. Noether, Invariant variation problems. Trans. Theory Stat. Phys. 1, 186–207 (1971). https://doi.org/10.1080/00411457108231446
J.D. Romano, R.H. Price, Why no shear in “Div, grad, curl, and all that”? Am. J. Phys. 80(6),
519–524 (2012). https://doi.org/10.1119/1.3688678
T.D. Rossing, P.A. Wheeler, R.M. Taylor, The Science of Sound, 3rd edn. (Addison Wesley, San
Francisco, 2002). ISBN 978-0805385656
F.C. Santos, V. Soares, A.C. Tort, An English translation of Bertrand's theorem. ArXiv e-prints (April 2007)
H.M. Schey, div, grad, curl and all that: An informal text on vector calculus, 3rd edn. (W.W. Norton
& Co., New York, 1996)
B. Schutz, A First Course in General Relativity (Cambridge University Press, May 2009). ISBN
9780521887052
B. Schutz, Geometrical Methods of Mathematical Physics (Cambridge University Press, Cambridge,
1980)
E.F. Taylor, J.A. Wheeler, Spacetime Physics: Introduction to special relativity (W.H. Freeman and
Company, New York, 1992)
J. Terrell, Invisibility of the Lorentz contraction. Phys. Rev. 116, 1041–1045 (1959). https://doi.org/10.1103/PhysRev.116.1041
C.G. Torre, Introduction to Classical Field Theory. All Complete Monographs (2016). http://digitalcommons.usu.edu/lib_mono/3/
E.P. Wigner, Gruppentheorie und ihre Anwendung auf die Quantenmechanik der Atomspektren [Group Theory and its Application to the Quantum Mechanics of Atomic Spectra] (Vieweg Verlag, Braunschweig, Germany, 1931)
Index

W
Wave equation, 299–302
  one-dimensional, 302–306
  three-dimensional, 316–320
Wave number, 290
Wave vector, 317, 390
Wedge product, 107, 437–439
Wigner rotation, 378, 400
Wigner rotation matrices, 521

Y
Yaw, 204
Yukawa potential, 148

Z
Zero-rest-mass particle, see photons
zyz convention, 203