Appendix A
Vector Calculus
(i) The dot product (also called the scalar product or inner product) of two vectors A and B is defined by

A · B ≡ AB cos θ , (A.1)
where A ≡ |A|, B ≡ |B| are the magnitudes (or norms) of A, B, and θ is the angle
between the two vectors. (ii) The cross product (also called the vector product or
exterior product) of A and B is defined by
A × B ≡ AB sin θ n̂ , (A.2)

where n̂ is a unit vector perpendicular to the plane spanned by A and B, with direction given by the right-hand rule.¹ In terms of components with respect to an orthonormal basis êᵢ, for which

êᵢ · êⱼ = δ_ij , (A.3)

it follows that

A · B = Σ_{i,j} δ_ij Aᵢ Bⱼ = Σ_i Aᵢ Bᵢ (A.4)

and

(A × B)ᵢ = Σ_{j,k} ε_ijk Aⱼ Bₖ , (A.5)

where

δ_ij ≡ { 1 if i = j
       { 0 if i ≠ j (A.6)

is the Kronecker delta and

ε_ijk ≡ { +1 if ijk is an even permutation of 123
        { −1 if ijk is an odd permutation of 123 (A.7)
        {  0 otherwise

is the Levi-Civita symbol.² We note that the above component expressions for dot product and cross product are valid with respect to any orthonormal basis, and not just for Cartesian coordinates.
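As a quick numerical spot-check of these component expressions, here is a short NumPy sketch (the two vectors are arbitrary sample values):

    import numpy as np

    # Levi-Civita symbol eps_ijk as a 3x3x3 array
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

    A = np.array([1.0, 2.0, 3.0])
    B = np.array([-2.0, 0.5, 1.0])

    # (A.5): (A x B)_i = sum_{j,k} eps_ijk A_j B_k
    cross = np.einsum('ijk,j,k->i', eps, A, B)
    assert np.allclose(cross, np.cross(A, B))

    # (A.2): |A x B| = AB sin(theta), with theta obtained from (A.1)
    theta = np.arccos(A @ B / (np.linalg.norm(A) * np.linalg.norm(B)))
    assert np.isclose(np.linalg.norm(cross),
                      np.linalg.norm(A) * np.linalg.norm(B) * np.sin(theta))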
Geometrically, the dot product of two vectors is the projection of one vector onto
the direction of the other vector, times the magnitude of the other vector. Thus, by
¹ Point the fingers of your right hand in the direction of A, and then curl them toward your palm in the direction of B; your thumb then points in the direction of n̂.
² An even (odd) permutation of 123 is one obtained by an even (odd) number of interchanges of the numbers. For example, 213 is an odd permutation of 123, while 231 is an even permutation.
taking the dot product of A with the orthonormal basis vectors êᵢ, we obtain the components of A with respect to this basis, i.e., Aᵢ = A · êᵢ. This is shown in Fig. A.1.

Fig. A.1 (figure: the component A₁ = A · ê₁ of A, obtained by projecting A onto the basis vector ê₁)
Exercise A.1 Prove that the geometric and component expressions for both the
dot product, (A.1) and (A.4), and cross product, (A.2) and (A.5), are equivalent
to one another, choosing a convenient coordinate system to do the calculation.
The Kronecker delta and the Levi-Civita symbol satisfy the useful identity

Σ_i ε_ijk ε_ilm = δ_jl δ_km − δ_jm δ_kl . (A.8)
Using this identity and the component forms of the dot product and cross product,
one can prove the following three results:
Scalar triple product:

A · (B × C) = B · (C × A) = C · (A × B) , (A.9)

Vector triple product ("BAC-CAB" rule):

A × (B × C) = B (A · C) − C (A · B) , (A.10)

Jacobi identity:

A × (B × C) + B × (C × A) + C × (A × B) = 0 . (A.11)
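Both the ε–δ identity (A.8) and the Jacobi identity (A.11) are easy to spot-check numerically; the following NumPy sketch (with randomly chosen vectors) does both:

    import numpy as np

    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1
    delta = np.eye(3)

    # (A.8): sum_i eps_ijk eps_ilm = delta_jl delta_km - delta_jm delta_kl
    lhs = np.einsum('ijk,ilm->jklm', eps, eps)
    rhs = (np.einsum('jl,km->jklm', delta, delta)
           - np.einsum('jm,kl->jklm', delta, delta))
    assert np.allclose(lhs, rhs)

    # (A.11): Jacobi identity for three random vectors
    rng = np.random.default_rng(0)
    A, B, C = rng.standard_normal((3, 3))
    jacobi = (np.cross(A, np.cross(B, C)) + np.cross(B, np.cross(C, A))
              + np.cross(C, np.cross(A, B)))
    assert np.allclose(jacobi, 0)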
Before proceeding further, we should comment on the index notation that we'll be using throughout this book. Under a coordinate transformation x^i → x′^i(x^j), the components of contravariant and covariant vectors transform as

A′^i = Σ_j (∂x′^i/∂x^j) A^j , A′_i = Σ_j (∂x^j/∂x′^i) A_j . (A.12)
But since most of the calculations that we will perform involve quantities in ordinary 3-dimensional Euclidean space, with components defined with respect to an orthonormal basis, the distinction between the two types of components disappears, and the two sets of components A^i and A_i can be mapped to one another using the Kronecker delta:

A_i = Σ_j δ_ij A^j ⇔ A₁ = A¹ , A₂ = A² , A₃ = A³ . (A.14)
Exercise A.3 Show that under a coordinate transformation x^i → x′^i(x^j), the components A^i of a contravariant vector transform like the coordinate differentials dx^i, while the components A_i of a covariant vector transform like the partial derivative operators ∂/∂x^i.
with the placement of the superscript or subscript indices matching on both sides of
the equation.3 This is valid even for spaces that are not Euclidean and for coordinates
that are not Cartesian, e.g., the angular coordinates describing the configuration of a
planar double pendulum (See e.g., Problem 1.4).
All other types of indices that we might need to use, e.g., to label different functions,
basis vectors, or particles in a system, etc., will be placed as either superscripts or
subscripts in whichever way is most notationally convenient for the discussion at
hand. There is no “transformation law” associated with changes in these types of
indices, so there is no standard convention for their placement.
To do calculus with vectors, we need fields—both scalar fields, which assign a real number to each position in space, and vector fields, which assign a three-dimensional vector to each position. An example of a scalar field is the gravitational potential Φ(r)
vector to each position. An example of a scalar field is the gravitational potential (r)
for a stationary mass distribution, written as a function of the spatial location r. An
example of a vector field is the velocity v(r, t) of a fluid at a fixed time t, which is a
function of position r within the fluid.
Given a scalar field U (r) and vector field A(r), we can define the following
derivatives:
(i) Gradient:

(∇U) · t̂ ≡ lim_{Δs→0} [U(r₂) − U(r₁)] / Δs , (A.19)

where r₁ and r₂ are the endpoints (i.e., the 'boundary') of the vector displacement Δs ≡ Δs t̂. Thus, (∇U) · t̂ measures the change in U in the direction of t̂. This is the directional derivative of the scalar field U. The direction of ∇U is perpendicular to the contour lines U(r) = const, since the right-hand side is zero for points that
³ If we had swapped the notation and denoted the components of contravariant vectors with subscripts and the components of covariant vectors with superscripts, then to match indices would require denoting the collection of coordinates with subscripts, like x_i. In retrospect, this might have been a less confusing notation for coordinates (e.g., no chance of confusing the second coordinate x₂ with x-squared, etc.). But we will stick with the coordinate index notation that we have adopted above, since it is the standard notation in the literature.
Fig. A.2 Top panel: Function U (x, y) displayed as a 2-dimensional surface. Bottom panel: Contour
plot (lines of constant U , lighter lines corresponding to larger values) with gradient vector field ∇U
superimposed. Note that the direction of ∇U is perpendicular to the U (x, y) = const lines and is
largest in magnitude where the change in U is greatest
lie along a contour. Hence the gradient ∇U points in the direction of steepest ascent
of the function U . This is illustrated graphically in Fig. A.2 for a function of two
variables U (x, y).
(ii) Curl:

(∇ × A) · n̂ ≡ lim_{Δa→0} (1/Δa) ∮_C A · ds , (A.20)

where C is the boundary of the area element Δa ≡ n̂ Δa, and ds is the infinitesimal displacement vector tangent to C. The curl measures the circulation of A(r) around an infinitesimal closed curve. An example of a vector field with a non-zero curl is shown in panel (a) of Fig. A.3.
(iii) Divergence:

∇ · A ≡ lim_{ΔV→0} (1/ΔV) ∮_S A · n̂ da , (A.21)

where S is the boundary of the volume ΔV. The divergence measures the flux of A(r) through the surface bounding an infinitesimal volume element. An example of a vector field with a non-zero divergence is shown in panel (b) of Fig. A.3.
The beauty of the above definitions is that they are geometric and do not refer
to a particular coordinate system. In Appendix A.5, we will write down expressions
for the gradient, curl, and divergence in arbitrary orthogonal curvilinear coordinates
(u, v, w), which can be derived from the above definitions. In Cartesian coordinates
(x, y, z), the expressions for the three different derivatives turn out to be particularly
simple:
Fig. A.3 Panel (a) Example of a vector field, A(r) = −y x̂ + x ŷ, with a non-zero curl, ∇ ×A = 2ẑ.
Panel (b) Example of a vector field A(r) = x x̂ + y ŷ + z ẑ, with a non-zero divergence, ∇ · A = 3.
In both cases, just the z = 0 values of the vector fields are shown in these figures
∇U = (∂U/∂x) x̂ + (∂U/∂y) ŷ + (∂U/∂z) ẑ ,

∇ × A = (∂A_z/∂y − ∂A_y/∂z) x̂ + (∂A_x/∂z − ∂A_z/∂x) ŷ + (∂A_y/∂x − ∂A_x/∂y) ẑ , (A.22)

∇ · A = ∂A_x/∂x + ∂A_y/∂y + ∂A_z/∂z .

In index notation, these derivatives can be written compactly as

(∇U)_i = ∂_i U , (∇ × A)_i = Σ_{j,k} ε_ijk ∂_j A_k , ∇ · A = Σ_i ∂_i A_i , (A.23)

where ∂_i is shorthand for the partial derivative ∂/∂x^i, where x^i ≡ (x, y, z).
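These Cartesian expressions can be reproduced symbolically, e.g., with SymPy's vector module; the scalar and vector fields below are arbitrary sample choices:

    from sympy import sin, exp
    from sympy.vector import CoordSys3D, gradient, curl, divergence

    N = CoordSys3D('N')
    U = N.x**2 * N.y + sin(N.z)              # sample scalar field
    A = -N.y*N.i + N.x*N.j + exp(N.z)*N.k    # sample vector field

    print(gradient(U))     # 2*x*y i + x**2 j + cos(z) k
    print(curl(A))         # 2 k
    print(divergence(A))   # exp(z)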
We conclude this subsection by noting that the curl and divergence of a vector
field A(r), although important derivative operations, do not completely capture how
a vector field changes as you move from point to point. A simple counting argument
shows that, in three dimensions, we need 3 × 3 = 9 components to completely
specify how a vector field changes from point to point (three components of A times
the three directions in which to take the derivative). The curl and divergence supply
3 + 1 = 4 of those components. So we are missing 5 components, which turn out
to have the geometrical interpretation of shear (See, e.g., Romano and Price 2012).
The shear can be calculated in terms of the directional derivative of a vector field,
which we shall discuss in Appendix A.4. Figure A.4 shows an example of a vector
field that has zero curl and zero divergence, but is clearly not a constant. This is an
example of a pure-shear field (See Exercise A.11).
Fig. A.4 (figure: a pure-shear vector field, with zero curl and zero divergence; see Exercise A.11)
The familiar product rule

d(fg)/dx = (df/dx) g + f (dg/dx) (A.24)

for ordinary functions of one variable, f(x) and g(x), extends to the gradient, curl, and divergence operations, although the resulting expressions are more complicated. Since there are four different ways of combining a pair of scalar and/or vector fields (i.e., fg, A · B, f A, A × B) and two different ways of taking derivatives of vector fields (either curl or divergence), there are six different product rules:
∇( f g) = (∇ f )g + f (∇g) , (A.25a)
∇(A · B) = A × (∇ × B) + B × (∇ × A) + (A · ∇)B + (B · ∇)A , (A.25b)
∇ × ( f A) = (∇ f ) × A + f ∇ × A , (A.25c)
∇ × (A × B) = (B · ∇)A − (A · ∇)B + A(∇ · B) − B(∇ · A) , (A.25d)
∇ · ( f A) = (∇ f ) · A + f ∇ · A , (A.25e)
∇ · (A × B) = (∇ × A) · B − A · (∇ × B) . (A.25f)
We will discuss some of these product rules in more detail in Appendix A.4.
Exercise A.4 Prove the above product rules. (Hint: Do the calculations in Carte-
sian coordinates where the expressions for gradient, curl, and divergence are the
simplest.)
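As an illustration of how such a proof goes through in Cartesian components, here is a SymPy spot-check of the product rule (A.25f) for two sample (arbitrarily chosen) fields:

    from sympy import simplify
    from sympy.vector import CoordSys3D, curl, divergence

    N = CoordSys3D('N')
    A = N.x*N.y*N.i + N.z**2*N.j + N.y*N.k
    B = N.z*N.i - N.x*N.j + N.x*N.y*N.z*N.k

    lhs = divergence(A.cross(B))                 # div(A x B)
    rhs = curl(A).dot(B) - A.dot(curl(B))        # (curl A).B - A.(curl B)
    assert simplify(lhs - rhs) == 0              # (A.25f) checks out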
It is also possible to take second (and higher-order) derivatives of scalar and vector
fields. Since ∇U and ∇ × A are vector fields, we can take either their divergence or
curl. Since ∇ · A is a scalar field, we can take only its gradient. Thus, there are five
such second derivatives:
∇ · ∇U ≡ ∇ 2 U , (A.26a)
∇ × ∇U = 0 , (A.26b)
∇(∇ · A) = a vector field , (A.26c)
∇ · (∇ × A) = 0 , (A.26d)
∇ × (∇ × A) ≡ ∇(∇ · A) − ∇ 2 A . (A.26e)
Note that the curl of a gradient, ∇ ×∇U , and the divergence of a curl, ∇ ·(∇ ×A), are
both identically zero. The divergence of a gradient defines the Laplacian of a scalar
field, ∇ 2 U , and the curl of a curl defines the Laplacian of a vector field, ∇ 2 A (second
term on the right-hand side of (A.26e)). In Cartesian coordinates x i ≡ (x, y, z), the
scalar and vector Laplacians are given by
∇²U = ∂²U/∂x² + ∂²U/∂y² + ∂²U/∂z² ,
                                                         (A.27)
∇²A_i = ∂²A_i/∂x² + ∂²A_i/∂y² + ∂²A_i/∂z² , i = 1, 2, 3 .
The gradient of a divergence is a non-zero vector field in general, but it has no special
name, as it does not appear as frequently as the Laplacian operator.
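The two identities (A.26b) and (A.26d) can likewise be confirmed symbolically for sample fields; a minimal SymPy sketch:

    from sympy import sin, cos, simplify
    from sympy.vector import CoordSys3D, gradient, curl, divergence, Vector

    N = CoordSys3D('N')
    U = N.x**3 * sin(N.y) * N.z                     # sample scalar field
    A = N.y*N.z*N.i + cos(N.x)*N.j + N.x*N.y**2*N.k # sample vector field

    assert curl(gradient(U)) == Vector.zero         # (A.26b): curl grad U = 0
    assert simplify(divergence(curl(A))) == 0       # (A.26d): div curl A = 0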
You might have noticed that the right-hand sides of (A.25b) and (A.25d) for ∇(A · B)
and ∇ × (A × B) involve quantities of the form (B · ∇)A, which are not gradients,
curls, or divergences of a vector or scalar field. Geometrically, (B · ∇)A represents
the directional derivative of the vector field A in the direction of B, which generalizes
the definition of the directional derivative of a scalar field. To calculate (B · ∇)A,
we need to evaluate the directional derivatives of the components Ai with respect to
a basis êi , as well as the directional derivatives of the basis vectors themselves. But
before doing that calculation, it is worthwhile to remind ourselves about directional
derivatives of scalar fields, and also how to calculate coordinate basis vectors in
arbitrary curvilinear coordinates (u, v, w).
df/dλ ≡ Σ_i (dx^i/dλ) (∂f/∂x^i) = Σ_i v^i ∂f/∂x^i ≡ v(f) . (A.30)
∂/∂u = (∂x/∂u) ∂/∂x + (∂y/∂u) ∂/∂y + (∂z/∂u) ∂/∂z , etc. , (A.32)
⁴ A particular coordinate basis vector ∂_i points along the x^i coordinate line, with all other coordinates (i.e., x^j with j ≠ i) held constant.
∂_u = (∂x/∂u) ∂_x + (∂y/∂u) ∂_y + (∂z/∂u) ∂_z , etc. , (A.33)

with partial derivative operators replaced everywhere by coordinate basis vectors. But since the coordinate basis vectors in Cartesian coordinates are orthogonal and have unit norm, with ∂_x = x̂, etc., it follows that
∂_u = (∂x/∂u) x̂ + (∂y/∂u) ŷ + (∂z/∂u) ẑ ,
∂_v = (∂x/∂v) x̂ + (∂y/∂v) ŷ + (∂z/∂v) ẑ , (A.34)
∂_w = (∂x/∂w) x̂ + (∂y/∂w) ŷ + (∂z/∂w) ẑ .
The norms of these coordinate basis vectors are then given by

N_u ≡ |∂_u| = √[(∂x/∂u)² + (∂y/∂u)² + (∂z/∂u)²] ,
N_v ≡ |∂_v| = √[(∂x/∂v)² + (∂y/∂v)² + (∂z/∂v)²] , (A.35)
N_w ≡ |∂_w| = √[(∂x/∂w)² + (∂y/∂w)² + (∂z/∂w)²] ,

in terms of which we can define the unit vectors û ≡ N_u⁻¹ ∂_u, v̂ ≡ N_v⁻¹ ∂_v, and ŵ ≡ N_w⁻¹ ∂_w. In general, these unit vectors will not be orthogonal, although they will be for several common coordinate systems, including spherical coordinates (r, θ, φ) and cylindrical coordinates (ρ, φ, z) (See Appendix A.5 for details). We will use the above results in the next section when calculating the directional derivative of a vector field in non-Cartesian coordinates.
Let’s return now to the problem of calculating (B·∇)A, which started this discussion
of directional derivatives. In Cartesian coordinates, it is natural to define (B · ∇)A in
terms of its components via
(B · ∇)A ≡ Σ_{i,j} (B_i ∂_i A_j) ê_j , (A.37)
since the orthonormal basis vectors êi = {x̂, ŷ, ẑ} are constant vector fields. In
non-Cartesian coordinates, where the coordinate basis vectors change from point to
point, we would need to make the appropriate coordinate transformations for both
the vector components and the partial derivative operators. Although straightforward,
this is usually a rather long and tedious process.
A simpler method for calculating (B · ∇)A in non-Cartesian coordinates x^i is to expand both A and B in terms of the orthonormal basis vectors ê_i,

(B · ∇)A = Σ_{i,j} (B_i ê_i · ∇)(A_j ê_j) = Σ_{i,j} B_i (∇_{ê_i} A_j) ê_j + Σ_{i,j} B_i A_j (∇_{ê_i} ê_j) , (A.38)
where ∇_{ê_i} denotes the directional derivative along ê_i, and where we have applied the derivatives only to the expansion coefficients Λ_jk that express the orthonormal basis vectors in terms of the Cartesian ones (ê_j = Σ_k Λ_jk x̂_k), since the Cartesian basis vectors are constants. For example, for the spherical coordinate basis vectors ê_i = {r̂, θ̂, φ̂}, we have

Λ = [ sin θ cos φ    sin θ sin φ    cos θ  ]
    [ cos θ cos φ    cos θ sin φ   −sin θ  ]
    [ −sin φ         cos φ          0      ]
                                             (A.40)
Λ⁻¹ = [ cos φ sin θ    cos φ cos θ   −sin φ ]
      [ sin φ sin θ    sin φ cos θ    cos φ ]
      [ cos θ         −sin θ          0     ]

for the matrix of expansion coefficients Λ_jk and its inverse (Λ⁻¹)_kl (See Example A.2 for details). If we then re-express the Cartesian basis vectors in terms of the original non-Cartesian basis vectors using the inverse transformation matrix (Λ⁻¹)_kl, we obtain
∇_{ê_i} ê_j = Σ_{k,l} (∇_{ê_i} Λ_jk)(Λ⁻¹)_kl ê_l ≡ Σ_l C_ijl ê_l , (A.41)

where

C_ijl ≡ Σ_k (∇_{ê_i} Λ_jk)(Λ⁻¹)_kl . (A.42)
Substituting (A.41) into (A.38) then gives

(B · ∇)A = Σ_{i,j} B_i (∇_{ê_i} A_j) ê_j + Σ_{i,j,l} B_i A_j C_ijl ê_l . (A.43)
Finally, if the non-Cartesian coordinates x^i are orthogonal, as is the case for spherical coordinates (r, θ, φ) and cylindrical coordinates (ρ, φ, z), then ∇_{ê_i} = N_i⁻¹ ∂/∂x^i, where N_i is a normalization factor relating the (in general, unnormalized) coordinate basis vectors ∂_i to the orthonormal basis vectors ê_i. For example, in spherical coordinates r̂ = ∂_r, θ̂ = r⁻¹ ∂_θ, and φ̂ = (r sin θ)⁻¹ ∂_φ.
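A quick SymPy check (a sketch, not part of the derivation) confirms that the two matrices displayed in (A.40) really are inverses of one another:

    from sympy import Matrix, sin, cos, symbols, simplify, eye

    theta, phi = symbols('theta phi')
    Lam = Matrix([[sin(theta)*cos(phi), sin(theta)*sin(phi),  cos(theta)],
                  [cos(theta)*cos(phi), cos(theta)*sin(phi), -sin(theta)],
                  [-sin(phi),           cos(phi),             0]])
    Lam_inv = Matrix([[cos(phi)*sin(theta), cos(phi)*cos(theta), -sin(phi)],
                      [sin(phi)*sin(theta), sin(phi)*cos(theta),  cos(phi)],
                      [cos(theta),         -sin(theta),           0]])

    assert (Lam * Lam_inv).applyfunc(simplify) == eye(3)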
Although this might seem like a complicated procedure when discussed abstractly,
in practice it is relatively easy to carry out, as the following example shows.
∂/∂r = (∂x/∂r) ∂/∂x + (∂y/∂r) ∂/∂y + (∂z/∂r) ∂/∂z , etc. , (A.45)
it follows that
using the one-to-one correspondence between vectors and directional derivative op-
erators discussed in Appendix A.4.1. The inverse transformation is given by
ρ̂ = ∂_ρ = cos φ x̂ + sin φ ŷ ,
φ̂ = ρ⁻¹ ∂_φ = −sin φ x̂ + cos φ ŷ , (A.51)
ẑ = ∂_z = ẑ .
In this section we derive expressions for the gradient, curl, and divergence in general
orthogonal curvilinear coordinates (u, v, w). Our starting point will be the definitions
of gradient, curl, and divergence given in (A.19), (A.20), and (A.21). Examples of
orthogonal curvilinear coordinates include Cartesian coordinates (x, y, z), spherical
coordinates (r, θ, φ), and cylindrical coordinates (ρ, φ, z). These are the three main
coordinate systems that we will be using most in this text.
Recall that in Cartesian coordinates (x, y, z), the line element or infinitesimal
squared distance between two nearby points is given by
ds² = dx² + dy² + dz² (Cartesian) . (A.54)
Using the transformation equations (A.44) and (A.50), it is fairly easy to show that
in spherical coordinates (r, θ, φ) and in cylindrical coordinates (ρ, φ, z):
ds² = dr² + r² dθ² + r² sin²θ dφ² (spherical) ,
                                                (A.55)
ds² = dρ² + ρ² dφ² + dz² (cylindrical) .
For general orthogonal curvilinear coordinates (u, v, w), the line element takes the form

ds² = f² du² + g² dv² + h² dw² , (A.56)

where f, g, and h are functions of (u, v, w) in general. The fact that there are no cross terms, like du dv, is a consequence of the coordinates being orthogonal. Note that f = 1, g = r, and h = r sin θ for spherical coordinates, and f = 1, g = ρ, and h = 1 for cylindrical coordinates. These results are summarized in Table A.1.
For completely arbitrary curvilinear coordinates x^i ≡ (x¹, x², x³), the line element, (A.56), has the more general form

ds² = Σ_{i,j} g_ij dx^i dx^j , (A.57)
Table A.1 Coordinates (u, v, w) and functions f, g, h for different orthogonal curvilinear coordinate systems

Coordinates    u    v    w    f    g    h
Cartesian      x    y    z    1    1    1
Spherical      r    θ    φ    1    r    r sin θ
Cylindrical    ρ    φ    z    1    ρ    1
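The entries of Table A.1 can be recovered directly from the coordinate transformation using (A.35), since each scale factor is the norm of a coordinate basis vector. A short SymPy sketch for the spherical case (the helper function name is ours):

    from sympy import symbols, sin, cos, simplify, Matrix

    r, th, ph = symbols('r theta phi', positive=True)
    X = Matrix([r*sin(th)*cos(ph), r*sin(th)*sin(ph), r*cos(th)])  # (x, y, z)

    def norm2(u):
        v = X.diff(u)              # coordinate basis vector, e.g. d(x,y,z)/dr
        return simplify(v.dot(v))  # squared norm

    print(norm2(r), norm2(th), norm2(ph))
    # -> 1, r**2, r**2*sin(theta)**2, i.e. f = 1, g = r, h = r sin(theta)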
In terms of the scale factors, the infinitesimal displacement vector is

ds = f du û + g dv v̂ + h dw ŵ . (A.58)

This is illustrated graphically in panel (a) of Fig. A.5. It is also easy to see from this figure that the infinitesimal volume element dV is given by
dV = f gh du dv dw . (A.59)

Similarly, the infinitesimal area elements for the three coordinate surfaces are

n̂ da = ±û gh dv dw , ±v̂ h f dw du , ±ŵ f g du dv , (A.60)

with the ± sign depending on whether the unit normals to the area elements point in the direction of increasing (or decreasing) coordinate value. One such area element is illustrated graphically in panel (b) of Fig. A.5.
Fig. A.5 Panel (a) Infinitesimal displacement vector ds and volume element dV = f gh du dv dw in
general orthogonal curvilinear coordinates. Panel (b) The infinitesimal area element corresponding
to the bottom (w = const) surface of the volume element shown in panel (a)
For completely arbitrary curvilinear coordinates, the corresponding volume element is

dV = √(det g) dx¹ dx² dx³ , (A.61)

where det g is the determinant of the matrix g of metric components g_ij (See Appendix D.4.3.2), and

n̂ da = ±n̂₁ √(g₂₂ g₃₃ − (g₂₃)²) dx² dx³ ,
       ±n̂₂ √(g₃₃ g₁₁ − (g₃₁)²) dx³ dx¹ , (A.62)
       ±n̂₃ √(g₁₁ g₂₂ − (g₁₂)²) dx¹ dx² ,

where

n̂₁ = (∂₂ × ∂₃) / |∂₂ × ∂₃| , etc. (A.63)
A.5.1 Gradient

From the definition (A.19), for an arbitrary infinitesimal displacement ds we have

(∇U) · ds = dU . (A.64)

From (A.58), the left-hand side of the above equation can be written as

(∇U) · ds = (∇U)_u f du + (∇U)_v g dv + (∇U)_w h dw , (A.65)

while the total differential on the right-hand side is

dU = (∂U/∂u) du + (∂U/∂v) dv + (∂U/∂w) dw . (A.66)
By equating these last two equations, we can read off the components of the gradient, from which we obtain

∇U = (1/f)(∂U/∂u) û + (1/g)(∂U/∂v) v̂ + (1/h)(∂U/∂w) ŵ . (A.67)
Exercise Consider the potential energy function

U(x, y, z) = (1/2) k (x² + y²) + mgz . (A.68)

Calculate the force F = −∇U in (a) spherical coordinates and (b) cylindrical coordinates.
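One way to carry out part (b) of this exercise is to apply (A.67) term by term, using the cylindrical scale factors from Table A.1. A minimal SymPy sketch (variable names are our own):

    from sympy import symbols, Rational, diff

    rho, phi, z, k, m, g = symbols('rho phi z k m g', positive=True)
    U = Rational(1, 2)*k*rho**2 + m*g*z   # U in cylindrical coordinates

    F_rho = -diff(U, rho) / 1             # -(1/f) dU/du with f = 1
    F_phi = -diff(U, phi) / rho           # -(1/g) dU/dv with g = rho
    F_z   = -diff(U, z) / 1               # -(1/h) dU/dw with h = 1
    print(F_rho, F_phi, F_z)              # -> -k*rho, 0, -g*m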
A.5.2 Curl

The components of the curl follow from the definition (A.20), written in the form

(∇ × A) · n̂ da = ∮_C A · ds , (A.70)

where C is the infinitesimal closed curve bounding the area element n̂ da, with orientation given by the right-hand rule relative to n̂. To calculate the components of ∇ × A, we take (in turn) the three different infinitesimal area elements given in (A.60). Starting with n̂ da = û gh dv dw, the left-hand side of (A.70) becomes

(∇ × A) · n̂ da = (∇ × A)_u gh dv dw . (A.71)

The integrand of the right-hand side of (A.70) is

A · ds = A_v g dv + A_w h dw . (A.72)

Integrating this around the corresponding boundary curve C shown in Fig. A.6, we find that the right-hand side of (A.70) becomes
∮_C A · ds = (A_v g)|_w dv + (A_w h)|_{v+dv} dw − (A_v g)|_{w+dw} dv − (A_w h)|_v dw
           = [ ∂(A_w h)/∂v − ∂(A_v g)/∂w ] dv dw . (A.73)
Thus,

(∇ × A)_u = (1/gh) [ ∂(A_w h)/∂v − ∂(A_v g)/∂w ] . (A.74)

Repeating the above calculation for the other two components yields

(∇ × A)_v = (1/h f) [ ∂(A_u f)/∂w − ∂(A_w h)/∂u ] , (A.75)

and

(∇ × A)_w = (1/f g) [ ∂(A_v g)/∂u − ∂(A_u f)/∂v ] . (A.76)
A.5.3 Divergence

The divergence follows from the definition (A.21), written in the form

(∇ · A) dV = ∮_S A · n̂ da , (A.77)

where S is the infinitesimal closed surface bounding the volume element dV, with outward pointing normal n̂. From (A.59), the left-hand side of the above equation can be written as

(∇ · A) dV = (∇ · A) f gh du dv dw . (A.78)

From (A.60), the integrand of the right-hand side of (A.77) contains the terms

A · n̂ da = ±A_u gh dv dw , ±A_v h f dw du , ±A_w f g du dv . (A.79)
∮_S A · n̂ da = (A_u gh)|_{u+du} dv dw − (A_u gh)|_u dv dw
             + (A_v h f)|_{v+dv} dw du − (A_v h f)|_v dw du
             + (A_w f g)|_{w+dw} du dv − (A_w f g)|_w du dv (A.80)
             = [ ∂(A_u gh)/∂u + ∂(A_v h f)/∂v + ∂(A_w f g)/∂w ] du dv dw .
Thus,

∇ · A = (1/f gh) [ ∂(A_u gh)/∂u + ∂(A_v h f)/∂v + ∂(A_w f g)/∂w ] . (A.81)
A.5.4 Laplacian
Since the Laplacian of a scalar field is defined as the divergence of the gradient, it
immediately follows from (A.67) and (A.81) that
∇²U = (1/f gh) [ ∂/∂u ((gh/f) ∂U/∂u) + ∂/∂v ((h f/g) ∂U/∂v) + ∂/∂w ((f g/h) ∂U/∂w) ] . (A.82)
For spherical coordinates (r, θ, φ), these general results reduce to

∇U = (∂U/∂r) r̂ + (1/r)(∂U/∂θ) θ̂ + (1/(r sin θ))(∂U/∂φ) φ̂ ,

∇ × A = (1/(r sin θ)) [ ∂(A_φ sin θ)/∂θ − ∂A_θ/∂φ ] r̂
      + [ (1/(r sin θ)) ∂A_r/∂φ − (1/r) ∂(A_φ r)/∂r ] θ̂
      + (1/r) [ ∂(A_θ r)/∂r − ∂A_r/∂θ ] φ̂ ,
                                                      (A.83)
∇ · A = (1/r²) ∂(r² A_r)/∂r + (1/(r sin θ)) ∂(sin θ A_θ)/∂θ + (1/(r sin θ)) ∂A_φ/∂φ ,

∇²U = (1/r²) ∂/∂r (r² ∂U/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂U/∂θ) + (1/(r² sin²θ)) ∂²U/∂φ² .
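As a consistency check of (A.83), one can apply the spherical Laplacian to a field whose Cartesian Laplacian is known. The sample field below equals x² − y², which is harmonic, so the spherical expression should also give zero:

    from sympy import symbols, sin, cos, simplify, diff

    r, th, ph = symbols('r theta phi', positive=True)
    U = r**2 * sin(th)**2 * cos(2*ph)     # equals x**2 - y**2 in Cartesian

    lap = (diff(r**2*diff(U, r), r)/r**2
           + diff(sin(th)*diff(U, th), th)/(r**2*sin(th))
           + diff(U, ph, 2)/(r**2*sin(th)**2))
    print(simplify(lap))                  # -> 0, as expected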
Similarly, for cylindrical coordinates (ρ, φ, z):

∇U = (∂U/∂ρ) ρ̂ + (1/ρ)(∂U/∂φ) φ̂ + (∂U/∂z) ẑ ,

∇ × A = [ (1/ρ) ∂A_z/∂φ − ∂A_φ/∂z ] ρ̂ + [ ∂A_ρ/∂z − ∂A_z/∂ρ ] φ̂
      + (1/ρ) [ ∂(A_φ ρ)/∂ρ − ∂A_ρ/∂φ ] ẑ ,
                                               (A.84)
∇ · A = (1/ρ) ∂(ρ A_ρ)/∂ρ + (1/ρ) ∂A_φ/∂φ + ∂A_z/∂z ,

∇²U = (1/ρ) ∂/∂ρ (ρ ∂U/∂ρ) + (1/ρ²) ∂²U/∂φ² + ∂²U/∂z² .
Using the definitions of gradient, curl, and divergence given above, one can prove the following fundamental theorems of integral vector calculus:

Fundamental theorem for gradients:

∫_C (∇U) · ds = U(℘₂) − U(℘₁) , (A.85)

where C is a curve connecting the endpoints ℘₁ and ℘₂.

Stokes' theorem:

∫_S (∇ × A) · n̂ da = ∮_C A · ds , (A.86)

where C is the closed curve bounding the surface S, with orientation given by the right-hand rule relative to n̂.

Divergence theorem:

∫_V (∇ · A) dV = ∮_S A · n̂ da , (A.87)

where S is the closed surface bounding the volume V, with outward pointing normal n̂.
For infinitesimal volume elements, area elements, and path lengths, the proofs
of these theorems follow trivially from the definitions given in (A.19), (A.20), and
(A.21). For finite size volumes, areas, and path lengths, one simply adds together
the contribution from infinitesimal elements. The neighboring surfaces, edges, and
endpoints of these infinitesimal elements have oppositely-directed normals, tangent
vectors, etc., and hence yield terms that cancel out when forming the sum. For detailed
proofs, we recommend Schey (1996), Boas (2006), or Griffiths (1999).
Here we state (without proof) some additional theorems for vector fields. These make
use of the identities ∇ × ∇U = 0 and ∇ · (∇ × A) = 0, which we derived earlier
(See Example A.1), and the integral theorems of vector calculus from the previous
subsection.
F = −∇U + ∇ × W . (A.88)

The potentials U and W are not unique: the transformations

U → U + C , where C = const ,
W → W + ∇Λ , where Λ is any scalar field , (A.89)

leave F unchanged.
Both of the above theorems require that: (i) F be differentiable, and (ii) the region of
interest be simply-connected (i.e., that there are not any holes; see the discussion in
Appendix B.2). We will assume that both of these conditions are always satisfied.
Theorem A.5 is particularly relevant in the context of conservative forces, which
we encounter often in the main text. Recall that F is conservative if and only if
the work done by F in moving a particle from ra to rb is independent of the path
connecting the two points. But path-independence is equivalent to the condition that ∮_C F · ds = 0 for any closed curve C. Thus, from Theorem A.5, we can conclude that a conservative force is curl-free, i.e., ∇ × F = 0, and that it can always be written as the gradient of a scalar field.
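This statement is easy to confirm symbolically for any sample potential: a force of the form F = −∇U is automatically curl-free. A minimal SymPy sketch (the potential U is an arbitrary choice):

    from sympy import exp
    from sympy.vector import CoordSys3D, gradient, curl, Vector

    N = CoordSys3D('N')
    U = N.x**2 * N.y - N.z * exp(N.x)   # sample potential
    F = -gradient(U)                    # conservative force F = -grad U
    assert curl(F) == Vector.zero       # curl F = 0, as expected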
A = x x̂ + y ŷ , B = −y x̂ + x ŷ , C = y x̂ + x ŷ . (A.92)
The 3-dimensional Dirac delta function⁵ δ(r − r₀) is zero everywhere except at r = r₀, where it is infinitely 'spiked', in such a way that

∫_V dV δ(r − r₀) = 1 (A.94)

for any volume V containing the spike. The Dirac delta function is not an ordinary mathematical function. It is what mathematicians call a generalized function or distribution. An example of a Dirac delta function is the mass density of an idealized point particle, μ(r) = m δ(r − r₀).
In one dimension, the Dirac delta function δ(x − x0 ) can be represented as the
limit of a sequence of functions f n (x) all of which have unit area, but which get
narrower and higher as n → ∞. Some simple example sequences are:
(i) A sequence of top-hat functions centered at x₀ with width 2/n:

f_n(x) = { n/2 if x₀ − 1/n < x < x₀ + 1/n
         { 0 otherwise (A.95)
⁵ We are adopting here the standard "physicist's" definition of a 3-dimensional Dirac delta function (See e.g., Griffiths 1999), where we integrate it against the 3-dimensional volume element dV. But note that we could also define a 3-dimensional Dirac delta function δ̃(r − r₀) with respect to the coordinate volume element d³x via ∫_V d³x δ̃(r − r₀) = 1. (Recall that d³x = du dv dw while dV = f gh du dv dw for orthogonal curvilinear coordinates (u, v, w).) The difference between these two definitions of the Dirac delta function shows up in their transformation properties under a coordinate transformation, see Footnote 7.
(ii) A sequence of Gaussians centered at x₀ with standard deviation 1/n:

f_n(x) = (n/√(2π)) e^{−n²(x−x₀)²/2} (A.96)
∫_a^b dx f(x) δ(x − x′) = { f(x′) if a < x′ < b
                          { 0 otherwise (A.98)
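The sifting property (A.98) can be seen numerically by integrating a test function against the Gaussian sequence (A.96) for increasing n; a short NumPy/SciPy sketch (test function and x₀ are arbitrary choices):

    import numpy as np
    from scipy.integrate import quad

    x0 = 0.7
    f = lambda x: np.sin(3*x) + x**2     # arbitrary smooth test function

    for n in (1, 10, 100):
        fn = lambda x: n/np.sqrt(2*np.pi) * np.exp(-n**2*(x - x0)**2/2)
        val, _ = quad(lambda x: f(x)*fn(x), -10, 10, points=[x0])
        print(n, val)
    # the printed values converge to f(0.7) = sin(2.1) + 0.49 ~ 1.353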
Exercise A.12 Prove the following properties of the 1-dimensional Dirac delta function, which follow from the defining property (A.98):

δ(x − a) = d/dx [u(x − a)] ,
δ′(−x) = −δ′(x) ,
δ(−x) = δ(x) , (A.99)
δ(ax) = (1/|a|) δ(x) ,
δ[f(x)] = Σ_i δ(x − x_i)/|f′(x_i)| ,

where u(x − a) is the unit step function, and f(x) is such that f(x_i) = 0 and f′(x_i) ≠ 0. The last two properties indicate that the one-dimensional Dirac delta function δ(x) transforms like a density under a change of variables—i.e., δ(x) dx = δ(y) dy.
∫_V dV f(r) δ(r − r′) = { f(r′) if r′ ∈ V
                        { 0 otherwise (A.101)
for any test function f(r). Note that this definition implies that in Cartesian, spherical, and cylindrical coordinates

δ(r − r′) = δ(x − x′) δ(y − y′) δ(z − z′)
          = (1/(r² sin θ)) δ(r − r′) δ(θ − θ′) δ(φ − φ′) (A.102)
          = (1/ρ) δ(ρ − ρ′) δ(φ − φ′) δ(z − z′) ,

in order that

∫ dV δ(r − r′) = ∫ dx dy dz δ(x − x′) δ(y − y′) δ(z − z′)
              = ∫ dr dθ dφ δ(r − r′) δ(θ − θ′) δ(φ − φ′) (A.103)
              = ∫ dρ dφ dz δ(ρ − ρ′) δ(φ − φ′) δ(z − z′) = 1 .

More generally,

δ(r − r′) = (1/f gh) δ(u − u′) δ(v − v′) δ(w − w′) (A.104)

as a consequence of dV = f gh du dv dw.⁷
There is also an integral representation of the 1-dimensional Dirac delta function, which can be heuristically "derived" by taking a limit of sinc functions:

δ(x) = lim_{L→∞} (L/π) sinc(Lx) = lim_{L→∞} (1/2π) ∫_{−L}^{L} dk e^{ikx} = (1/2π) ∫_{−∞}^{∞} dk e^{ikx} , (A.105)
⁷ If we used the alternative definition of the 3-dimensional Dirac delta function δ̃(r − r₀) discussed in Footnote 5, then δ̃(r − r′) = δ(u − u′) δ(v − v′) δ(w − w′), without the factor of f gh.
Similarly, in 3-dimensions,

δ(r − r′) = (1/(2π)³) ∫_{all space} dV_k e^{±ik·(r−r′)} , (A.107)
Example A.3 Recall that in Newtonian gravity the gravitational potential Φ(r, t) satisfies Poisson's equation

∇²Φ(r, t) = 4πG μ(r, t) , (A.108)

where G is Newton's constant and μ(r, t) is the mass density of the source distribution. Note that the left-hand side of the above equation is just the Laplacian of Φ. We now show that for a stationary point source μ(r, t) = m δ(r − r₀), the potential is given by the well-known formula

Φ(r) = −Gm / |r − r₀| . (A.109)
To verify this, first note that

∇ · (r̂/r²) = 0 for r ≠ 0 , (A.110)

which follows from the expression for the divergence in spherical coordinates (See Exercise A.9). To determine its behavior at r = 0, we consider the volume integral of ∇ · (r̂/r²) over a spherical volume of radius R centered at the origin. Using the divergence theorem (A.87), we obtain
∫_V ∇ · (r̂/r²) dV = ∮_S (r̂/r²) · n̂ da = ∫_{φ=0}^{2π} ∫_{θ=0}^{π} (1/R²) R² sin θ dθ dφ = 4π , (A.111)
independent of the radius R. Thus, by comparison with the definition of the Dirac
delta function, (A.94), we can conclude that
∇ · (r̂/r²) = 4π δ(r) . (A.112)

But since

∇(1/r) = −r̂/r² , (A.113)
it follows that

∇²(1/r) = −4π δ(r) . (A.114)

Shifting the source point from the origin to r′, these results generalize to

∇ · [(r − r′)/|r − r′|³] = 4π δ(r − r′) , ∇²(1/|r − r′|) = −4π δ(r − r′) . (A.115)
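The statement that 1/|r − r′| is harmonic away from the source point (the content of (A.115) for r ≠ r′) can be verified symbolically; a minimal SymPy sketch:

    from sympy import symbols, sqrt, simplify, diff

    x, y, z, x0, y0, z0 = symbols('x y z x0 y0 z0', real=True)
    U = 1 / sqrt((x - x0)**2 + (y - y0)**2 + (z - z0)**2)
    lap = diff(U, x, 2) + diff(U, y, 2) + diff(U, z, 2)
    assert simplify(lap) == 0   # the 4*pi*delta piece lives entirely at r = r'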
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 6 is devoted to vector algebra and vector calculus, especially
suited for undergraduates.
Griffiths (1999): Chapter 1 provides an excellent review of vector algebra and vector
calculus, at the same level as this appendix. Our discussion of orthogonal curvilin-
ear coordinates in Appendix A.5 is a summary of Appendix A in Griffiths, which
has more detailed derivations and discussion.
Schey (1996): An excellent introduction to vector calculus emphasizing the geometric
nature of the divergence, gradient, and curl operations.
Appendix B
Differential Forms
Although we will not need to develop the full machinery of tensor calculus for the
applications to classical mechanics covered in this book, the concept of a differential
form and the associated operations of exterior derivative and wedge product will
come in handy from time to time. For example, they are particularly useful for determining whether certain differential equations or constraints on a mechanical system
(See e.g., Sect. 2.2.3) are integrable or not. They are also helpful in understanding
the geometric structure underlying Poisson brackets (Sect. 3.5). More generally, dif-
ferential forms are actually the quantities that you integrate on a manifold, with the
integral theorems of vector calculus (Appendix A.6) being special cases of a more
general (differential-form version) of Stokes’ theorem.
In broad terms, the exterior derivative is a generalization of the total derivative (or
gradient) of a function, and the curl of a vector field in three dimensions. The wedge
product is a generalization of the cross-product of two vectors. And differential forms
are quantities constructed from a sum of wedge products of coordinate differentials
dx i . Readers interested to learn more about differential forms and related topics
should see e.g., Flanders (1963) and Schutz (1980).
B.1 Definitions
Since we have not developed a general framework for working with tensors, our
presentation of differential forms will be somewhat heuristic, starting with familiar examples for 0-forms and 1-forms, and then adding mathematical operations as needed (e.g., wedge product and exterior derivative) to construct higher-order differential
forms. To keep things sufficiently general, we will consider an n-dimensional mani-
fold M with coordinates x i ≡ (x 1 , x 2 , . . . , x n ). From time to time we will consider
ordinary 3-dimensional space to make connection with more familiar mathematical
objects and operations.
A 0-form is simply a function of the coordinates,

α ≡ α(x¹, x², . . . , xⁿ) . (B.1)

A 1-form is a linear combination of the coordinate differentials,

β ≡ Σ_i β_i dx^i , (B.2)

with components β_i ≡ β_i(x¹, x², . . . , xⁿ) that transform as

β′_i = Σ_j (∂x^j/∂x′^i) β_j , i = 1, 2, . . . , n , (B.3)

under a coordinate transformation x^i → x′^i(x^j). We impose this requirement on the components in order that

β ≡ Σ_i β_i dx^i = Σ_i β′_i dx′^i (B.4)

be independent of the choice of coordinates. The simplest example of a 1-form is the exterior derivative of a 0-form,

dα ≡ Σ_i (∂_i α) dx^i , (B.5)

where ∂_i α ≡ ∂α/∂x^i. Note that the exterior derivative of a 0-form is just the usual total differential (or gradient) of a function.
Exercise B.1 Verify (B.4) using (B.3) and the transformation property of the
coordinate differentials dx i .
To construct a 2-form from two 1-forms, we introduce the wedge product of two
forms. We require this product to be anti-symmetric,
dx^i ∧ dx^j = −dx^j ∧ dx^i , (B.6)

and to be linear,

α ∧ (fβ + gγ) = f (α ∧ β) + g (α ∧ γ) , (B.7)
where f and g are any two functions. Given this definition, it immediately follows
that the wedge product of two 1-forms α and β can be written as
α ∧ β = Σ_{i,j} α_i β_j dx^i ∧ dx^j = Σ_{i<j} (α_i β_j − α_j β_i) dx^i ∧ dx^j , (B.8)
where we used the anti-symmetry of dx i ∧ dx j to get the last equality. Note that in
three dimensions
α_i β_j − α_j β_i = Σ_k ε_ijk (α × β)_k , (B.9)
where on the right-hand side we are treating αi and βi as the components of two
vectors α and β. Thus, the wedge product of two 1-forms generalizes the cross
product of two vectors in three dimensions. The most general 2-form on M will have
the form
γ ≡ Σ_{i<j} γ_ij dx^i ∧ dx^j , (B.10)
where the components γ_ij ≡ γ_ij(x¹, . . . , xⁿ) can be taken to be anti-symmetric, γ_ij = −γ_ji. Similarly, the exterior derivative of a 1-form α is the 2-form

dα = Σ_{i<j} (∂_i α_j − ∂_j α_i) dx^i ∧ dx^j . (B.11)

Note that in three dimensions

∂_i α_j − ∂_j α_i = Σ_k ε_ijk (∇ × α)_k , (B.12)

where on the right-hand side we are treating α_i as the components of a vector field α. So the exterior derivative of a 1-form generalizes the curl of a vector field in three dimensions.
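The correspondence between the components of dα and the curl can be spot-checked in components; the sample 1-form below is an arbitrary choice:

    from sympy import symbols, sin, Matrix

    x1, x2, x3 = symbols('x1 x2 x3')
    alpha = [x2*x3, sin(x1), x1**2*x2]   # sample 1-form components
    X = [x1, x2, x3]

    # 2-form components: (d alpha)_ij = d_i alpha_j - d_j alpha_i
    dalpha = Matrix(3, 3, lambda i, j: alpha[j].diff(X[i]) - alpha[i].diff(X[j]))

    # curl components read off via (B.12)
    curl = [dalpha[1, 2], dalpha[2, 0], dalpha[0, 1]]
    print(curl)   # -> [x1**2, x2 - 2*x1*x2, cos(x1) - x3]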
We can continue in this fashion to construct 3-forms, 4-forms, etc., by requiring that
the wedge product be associative,
α ∧ (β ∧ γ ) = (α ∧ β) ∧ γ = α ∧ β ∧ γ . (B.13)
The most general p-form is then

α = Σ_{i₁<i₂<···<i_p} α_{i₁i₂···i_p} dx^{i₁} ∧ dx^{i₂} ∧ · · · ∧ dx^{i_p} , (B.14)

where the components α_{i₁i₂···i_p} ≡ α_{i₁i₂···i_p}(x¹, x², . . . , xⁿ) are totally anti-symmetric under interchange of the indices i₁, i₂, . . . , i_p. Similarly, the exterior derivative of a p-form α is the (p + 1)-form

dα = Σ_{i₁<i₂<···<i_{p+1}} ( ∂_{i₁} α_{i₂i₃···i_{p+1}} − ∂_{i₂} α_{i₁i₃···i_{p+1}} − · · · − ∂_{i_{p+1}} α_{i₂i₃···i_p i₁} ) dx^{i₁} ∧ dx^{i₂} ∧ · · · ∧ dx^{i_{p+1}} . (B.15)
Introducing square brackets to denote anti-symmetrization over the enclosed indices,

[ij] ≡ (1/2!)(ij − ji) ,
[ijk] ≡ (1/3!)(ijk − ikj + jki − jik + kij − kji) , (B.16)
etc. ,
we can write down the general expressions for the components of the wedge product
and exterior derivative in compact form:
(α ∧ β)_{i₁···i_p j₁···j_q} = [(p + q)!/(p! q!)] α_{[i₁···i_p} β_{j₁···j_q]} (B.17)
and

(dα)_{i₁i₂···i_{p+1}} = (p + 1) ∂_{[i₁} α_{i₂···i_{p+1}]} . (B.18)

Using these expressions, one can show that

α ∧ β = (−1)^{pq} β ∧ α (B.19)

for a p-form α and a q-form β.
Exercise B.3 Let α be a p-form and β be a q-form. Show that the exterior derivative of the wedge product α ∧ β satisfies

d(α ∧ β) = dα ∧ β + (−1)^p α ∧ dβ , (B.20)

as a consequence of the ordinary product rule for partial derivatives and the anti-symmetry of the differential forms.
A form β is said to be closed if dβ = 0, and exact if β = dα for some form α. Every exact form is closed, since

d(dβ) = 0 , (B.21)

but a closed form need not be globally exact if the space contains closed curves that cannot be continuously shrunk to a point. For example, a circle encircling the origin of the punctured plane, and any of the coordinate lines on the torus shown in Fig. B.1, are not contractible to a point. (See Schutz 1980 or Flanders 1963 for more details.)

Exercise Consider the 1-form

α ≡ (x dy − y dx)/(x² + y²) (B.22)

defined on the punctured plane. Show that α is closed, but globally is not exact. Find a function f(x, y) for which α = d f locally. (Hint: Plane polar coordinates (r, φ) might be useful for this.)
You may recall from a math methods class trying to determine if a 1st-order differential equation of the form

A(x, y) dx + B(x, y) dy = 0 (B.23)

is exact. If

∂_y A = ∂_x B , (B.24)

then (at least locally) there exists a function ϕ ≡ ϕ(x, y) for which

dϕ = A dx + B dy , (B.25)

so that the solutions of the differential equation are the level curves ϕ(x, y) = const. But requiring that the differential equation be exact is actually too strong a requirement for integrability. More generally, (B.23) is integrable if and only if there exists a function μ ≡ μ(x, y), called an integrating factor, for which

μ (A dx + B dy) (B.26)

is exact, so that¹

∂_y (μA) = ∂_x (μB) . (B.27)
It turns out that in two dimensions one can always find such an integrating factor.
Thus, all 1st-order differential equations of the form given in (B.23) are integrable
(Exercise B.6). But explicitly finding an integrating factor in practice is not an easy
task in general.
Now in three and higher dimensions not all 1st-order differential equations are
integrable, so testing for integrability is a necessary and important task. Writing the
differential equation in n dimensions as
α ≡ Σ_i α_i dx^i = 0 , (B.28)

integrability is equivalent to the existence of an integrating factor μ and a function ϕ for which

dϕ = μ α . (B.29)

Taking the exterior derivative of both sides and using d(dϕ) = 0, we find

0 = dμ ∧ α + μ dα ⇔ dα = −μ⁻¹ dμ ∧ α . (B.31)
¹ Recall from thermodynamics that heat flow is described by an inexact differential d̄Q (notationally,
the bar on the ‘d’ is to indicate that it is not the total differential of a function Q). But d¯Q becomes
exact when multiplied by an integrating factor, i.e., dS = d¯Q/T , where T is the temperature and
S is the entropy.
Taking the wedge product of both sides with α, and noting that α ∧ α = 0, we obtain the necessary condition for integrability:

dα ∧ α = 0 . (B.32)

More generally, we can consider a system of M 1st-order differential equations,

α^A ≡ Σ_i α_i^A dx^i = 0 , A = 1, 2, . . . , M , (B.33)

where M < n and α_i^A ≡ α_i^A(x¹, x², . . . , xⁿ). We would like to know if this system
is integrable in the sense of defining an (n − M)-dimensional hypersurface in the
original n-dimensional space of coordinates. The necessary and sufficient condition
for this to be true is the existence of an invertible transformation from the α A to a set
of exact 1-forms (i.e., total differentials):
dϕ^A = Σ_B μ^{AB} α^B ⇔ α^A = Σ_B (μ⁻¹)^{AB} dϕ^B , (B.34)
Frobenius' theorem states that the necessary and sufficient condition for such a transformation to exist is

dα^A ∧ α¹ ∧ α² ∧ · · · ∧ α^M = 0 , A = 1, 2, . . . , M . (B.35)
Exercise B.6 Using Frobenius' theorem, prove that any 1st-order differential equation in two dimensions,

A(x, y) dx + B(x, y) dy = 0 , (B.36)

is integrable.
Exercise B.7 (Adapted from Flanders 1963.) Consider the 1st-order differential
equation
α ≡ yz dx + x z dy + dz = 0 , (B.37)
in three dimensions.
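For this particular α, the Frobenius condition (B.32) can be evaluated directly in components; the following SymPy sketch (ours, not part of the exercise) shows that dα ∧ α = 0, so the equation is integrable:

    from sympy import symbols, simplify, diff

    x, y, z = symbols('x y z')
    a = [y*z, x*z, 1]                 # components (alpha_x, alpha_y, alpha_z)
    X = [x, y, z]

    # 2-form components of d(alpha): c[i][j] = d_i alpha_j - d_j alpha_i
    c = [[diff(a[j], X[i]) - diff(a[i], X[j]) for j in range(3)]
         for i in range(3)]

    # coefficient of dx^dy^dz in d(alpha) ^ alpha
    coeff = c[0][1]*a[2] - c[0][2]*a[1] + c[1][2]*a[0]
    print(simplify(coeff))   # -> 0; indeed alpha/z = d(xy + ln z) for z > 0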
Although you may not have thought about it this way, the things that you integrate on a manifold are really just differential forms. Indeed, the integrand f(x) dx of the familiar integral

∫_{x₁}^{x₂} f(x) dx (B.38)

from calculus is, in the language of this appendix, a 1-form. And the transformation property of f(x) under a change of variables x → y(x),

f(x) → f̄(y) ≡ [ f(x)/(dy/dx) ]|_{x=x(y)} , (B.39)
The extension to 3-form, 4-form, · · · , n-form fields follows by noting that the above integrands are special cases of the general result that a p-form γ maps a set of p vectors with components {A^i, B^j, . . . , C^k} to the real number

Σ_{i<j<···<k} γ_{ij···k} ( A^i B^j · · · C^k − A^j B^i · · · C^k − · · · − A^k B^j · · · C^i ) . (B.44)
By taking

A^i = (∂x^i/∂u) du , B^j = (∂x^j/∂v) dv , · · · , C^k = (∂x^k/∂w) dw , (B.45)
where (u, v, . . . , w) are the coordinates for a p-dimensional hypersurface, we are able
to generalize (B.41) and (B.42) to arbitrary p-forms, with the appropriate Jacobians
entering these expressions. Figure B.2 shows the tangent vectors and infinitesimal
coordinate area element for a 2-dimensional surface spanned by the coordinates
(u, v).
Note that all of these integrals are oriented in the sense that swapping the order
of the coordinates, e.g., (u, v) → (v, u), in the parametrization of the 2-dimensional
surface S, changes the sign of the Jacobian and hence the sign of the integral. In
addition, if one decides to change coordinates to do a particular integral, the Jacobian
of the transformation enters automatically via the wedge product of the coordinate
differentials. For example, in two dimensions, if one transforms from (x, y) to (u, v)
it follows that
² The vertical lines in (B.43) mean you should take the determinant of the 2 × 2 matrix of partial
derivatives. See Appendix D.4.3.2 for more details, if needed.
Fig. B.2 (figure: the tangent vectors (∂x^i/∂u) du and (∂x^j/∂v) dv spanning the infinitesimal coordinate area element of a 2-dimensional surface)
dx ∧ dy = ( (∂x/∂u) du + (∂x/∂v) dv ) ∧ ( (∂y/∂u) du + (∂y/∂v) dv )
        = ( (∂x/∂u)(∂y/∂v) − (∂x/∂v)(∂y/∂u) ) du ∧ dv = [ ∂(x, y)/∂(u, v) ] du ∧ dv , (B.46)
where we used the chain rule and the anti-symmetry of the wedge product to get the
first and second equalities above. In n dimensions, for a coordinate transformation
from x i ≡ (x 1 , x 2 , . . . , x n ) to x i = (x 1 , x 2 , . . . , x n ), we have
∂x1 ∂ x n i1
dx 1 ∧ · · · ∧ dx n = ··· ··· i n
dx ∧ · · · ∧ dx in
∂x i 1 ∂ x
i 1 i
n
∂(x 1 , . . . , x n )
= dx 1 ∧ · · · ∧ dx n ,
∂(x , . . . , x )
1 n
where we made use of the n-dimensional Levi-Civita symbol and used (D.81) to
get the last equality. Note that this is precisely the inverse transformation of the
components of an n-form
∂(x 1 , . . . , x n )
ω1···n = ω1 ···n , (B.48)
∂(x 1 , . . . , x n )
dⁿx ≡ dx¹ ∧ dx² ∧ · · · ∧ dxⁿ , (B.50)
Finally, to end this section, we note that the integral theorems of vector calculus
(Appendix A.6) are actually special cases of an all-inclusive Stokes’ theorem, written
in terms of differential forms,
∫_U dα = ∫_{∂U} α , (B.52)
Exercise B.8 (a) Show explicitly by taking partial derivatives that a coordinate transformation from Cartesian coordinates (x, y) to plane polar coordinates (r, φ) leads to

dx ∧ dy = r dr ∧ dφ . (B.53)

(b) Similarly, show that a coordinate transformation from Cartesian coordinates (x, y, z) to spherical coordinates (r, θ, φ) leads to

dx ∧ dy ∧ dz = r² sin θ dr ∧ dθ ∧ dφ . (B.54)
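Part (a) amounts to computing the 2 × 2 Jacobian determinant in (B.46); a one-line SymPy check:

    from sympy import symbols, sin, cos, simplify, Matrix

    r, phi = symbols('r phi', positive=True)
    x = r*cos(phi);  y = r*sin(phi)
    J = Matrix([[x.diff(r), x.diff(phi)],
                [y.diff(r), y.diff(phi)]]).det()   # d(x,y)/d(r,phi)
    print(simplify(J))   # -> r, so dx ^ dy = r dr ^ dphi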
Exercise B.9 Show that in three dimensions (B.52) reduces to the fundamental theorem for gradients (A.85), Stokes' theorem (A.86), and the divergence theorem (A.87), by making the following identifications:

(a) For p = 1, identify the 0-form α with the function U, and the exterior derivative dα with the gradient ∇U. Also, identify

(dx^i/ds) ds (B.55)

with the line element ds, where dx^i/ds is the tangent vector to the curve C parameterized by the arc length s.

(b) For p = 2, identify the 1-form α with the vector field A_i ≡ α_i, and use (B.12) to identify dα with ∇ × A. Also, identify

Σ_{j<k} ε_ijk [ ∂(x^j, x^k)/∂(u, v) ] du dv (B.56)

with the area element n̂_i da, where (u, v) are coordinates on the surface S.

(c) For p = 3, identify the 2-form α with a vector field A, so that dα corresponds to (∇ · A) dV. Also, identify

Σ_{i<j<k} ε_ijk [ ∂(x^i, x^j, x^k)/∂(u, v, w) ] du dv dw (B.57)

with the volume element dV, where (u, v, w) are coordinates on the volume V.
Suggested References
Full references are given in the bibliography at the end of the book.
Flanders (1963): A classic text about differential forms, appropriate for graduate
students or advanced undergraduates comfortable with abstract mathematics.
Schutz (1980): An introduction to differential geometry, including tensor calculus
and differential forms, with an emphasis on geometrical methods. Appropriate for
graduate students or advanced undergraduates comfortable with abstract mathe-
matics.
Appendix C
Calculus of Variations
The calculus of variations is an extension of the standard procedure for finding the
extrema (i.e., maxima and minima) of a function f (x) of a single real variable x.
But instead of extremizing a function f (x), we extremize a functional I [y], which
is a “function of a function” y = f (x). Classic problems that can be solved using
the calculus of variations are: (i) finding the curve connecting two points in the
plane that has the shortest distance (a geodesic problem), (ii) finding the shape of
a closed curve of fixed length that encloses the maximum area (an isoperimetric
problem), (iii) finding the shape of a wire joining two points such that a bead will
slide along the wire under the influence of gravity in the shortest amount of time (the
famous brachistochrone problem of Johann Bernoulli). The calculus of variations
also provides an alternative way of obtaining the equations of motion for a particle,
or a system of particles, in classical mechanics. In this appendix, we derive the Euler
equations, discuss ways of solving these equations in certain simplified scenarios,
and extend the formalism to deal with integral constraints. For a more thorough
introduction to the calculus of variations, see, e.g., Boas (2006), Gelfand and Fomin
(1963), and Lanczos (1949). Specific applications to classical mechanics will be
given in Chap. 3.
C.1 Functionals
In its simplest form, a functional I = I [y] is a mapping from some specified set of
functions {y = f (x)} to the set of real numbers R. For the types of problems that we
will be most interested in, the functions y = f (x) are defined on some finite interval
x ∈ [x1 , x2 ]; they are single-valued and have continuous first derivatives; and they
have fixed endpoints ℘1 ≡ (x1 , y1 ), ℘2 ≡ (x2 , y2 ). (Curves that cannot be described
by a single-valued function can be put in parametric form, x = x(t), y = y(t), which
we will discuss in detail in Appendix C.6.) A simple concrete example of a functional
is the arc length of the curve traced out by a function y = f (x) that connects ℘1
and ℘2 :
Fig. C.1 (figure: a curve y = f(x) connecting the fixed endpoints ℘₁ and ℘₂)
I[y] ≡ ∫_{℘₁}^{℘₂} ds = ∫_{℘₁}^{℘₂} √(dx² + dy²) = ∫_{x₁}^{x₂} √(1 + y′²) dx , (C.1)
The general form of the functionals that we will consider is

I[y] ≡ ∫_{x₁}^{x₂} F(y, y′, x) dx , (C.2)
where x is the independent variable and the set of functions {y = f(x)} is as before, but the integrand F is now an arbitrary function of the three variables (y, y′, x). (For the arc-length functional defined previously, F(y, y′, x) = √(1 + y′²), which is independent of x and y, but that does not have to be the case in general.) Note that to do the integral over x, we need to express both y and y′ in terms of f(x), but for the variational calculations that follow, we simply treat F as an ordinary function of three independent variables. The fact that y and y′ are related to one another only shows up later on, when we need to relate the variation δy′ to δy, cf. (C.6).
Toward the end of this appendix, in Appendix C.7, we will extend our definition
of a functional to n-degrees of freedom:
I[y₁, y₂, . . . , y_n] ≡ ∫_{x₁}^{x₂} F(y₁, y₂, . . . , y_n; y′₁, y′₂, . . . , y′_n; x) dx , (C.3)
Given I [y], we now want to find its extrema—i.e., those functions y = f (x) for
which I [y] has a local maximum or minimum. Similar to ordinary calculus, a nec-
essary (but not sufficient) condition for y to be an extremum is that the 1st-order
change in I [y] vanish for arbitrary variations to y = f (x) that preserve the boundary
conditions. We define such a variation to y = f(x) by

δy ≡ f̄(x) − f(x) , (C.4)

where f̄(x) is a function that differs infinitesimally from f(x) at each value of x in
the domain [x1 , x2 ] (See Fig. C.2). In terms of δy, the variation of the functional is
then given by δ I [y] ≡ I [y +δy]− I [y], where we ignore all terms that are 2nd-order
or higher in δy, δy . The condition δ I [y] = 0 determines the stationary values of
the functional. These include maxima and minima, but also points of inflection or
saddle points. To check if a stationary value is an extremum, we need to calculate the
change in I [y] to 2nd-order in δy. If the 2nd-order contribution δ 2 I [y] is positive,
then we have a minimum; if it is negative, a maximum; and if it is zero, a saddle
point. However, in the calculations that follow, we will stop at 1st order, as it will
usually be obvious from the context of the problem whether our stationary solution
is a maximum or minimum, without having to explicitly carry out the 2nd-order
variation.
Given definition (C.2) of the functional I[y], it follows that

δI[y] = I[y + δy] − I[y] = ∫_{x₁}^{x₂} [ (∂F/∂y) δy + (∂F/∂y′) δy′ ] dx , (C.5)

where we ignored all 2nd-order terms to get the last line. As mentioned earlier, the variations δy and δy′ are not independent of one another, but are related by

δy′ ≡ δ(dy/dx) = d(δy)/dx . (C.6)
Fig. C.2 (figure: a curve y = f(x) and a varied curve y + δy = f̄(x), which agree at the fixed endpoints ℘₁ and ℘₂)
Making this substitution and then integrating the term involving δy′ by parts, we find

δI[y] = [ (∂F/∂y′) δy ]_{x₁}^{x₂} + ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] δy dx . (C.7)
Since the variations vanish at the fixed endpoints,

δy|_{x₁} = 0 = δy|_{x₂} , (C.8)

the first term on the right-hand side of (C.7) is zero. Then, since the variation δy is otherwise arbitrary, it follows that

δI[y] = 0 ⇔ ∂F/∂y − d/dx (∂F/∂y′) = 0 . (C.9)
The equation on the right-hand side is called the Euler equation. (In the context of
classical mechanics, where F is the Lagrangian of the system, the above equation is
called the Euler-Lagrange equation. See Chap. 3 for details.)
Example C.1 Using the Euler equation, show that the curve that minimizes the
distance between two fixed points in a plane is a straight line.
Proof From (C.1) we have F(y, y′, x) = √(1 + y′²), which is independent of both x and y. Thus, the Euler equation (C.9) simplifies to

d/dx (∂F/∂y′) = 0 ⇔ ∂F/∂y′ = const . (C.10)

Evaluating the partial derivative, we find

∂F/∂y′ = y′/√(1 + y′²) = const , (C.11)

which implies y′ = const—i.e., the curve is a straight line. □
ds² = R² dφ² + dz² , (C.12)
More generally, arc length is computed from a line element of the form

ds² = Σ_{i,j=1}^{n} g_ij dx^i dx^j , (C.14)

where g_ij ≡ g_ij(x¹, x², . . . , xⁿ).
Exercise C.2 Show that a geodesic on the surface of a sphere is an arc of a great
circle—i.e., the intersection of the surface of the sphere with a plane passing
through the center of the sphere, A cos φ + B sin φ + cot θ = 0, where A and
B are constants, determined by the end points of the curve. (Hint: Take θ as the
independent variable for this calculation.)
dF/dx = 0 or ∂F/∂y′ = 0 . (C.16)

Thus, unlike varying an ordinary function f(x) > 0, for which the stationary values of f(x) and g(x) ≡ f²(x) are identical, the stationary values of the functionals defined by F(y, y′, x) and F²(y, y′, x) differ in general.
We can give a more formal derivation of Euler's equation and the associated variational process by writing the variation δy of the function y = f(x) as

δy = ε η(x) , (C.17)

where ε is an infinitesimal parameter and η(x) is an arbitrary function that vanishes at the endpoints x₁ and x₂. Note that since I[y + εη] is an ordinary function of the real variable ε, we can Taylor expand I[y + εη] or take its derivatives with respect to ε in the usual way. For our applications, we will be particularly interested in the first derivative of I[y + εη] with respect to ε evaluated at ε = 0:
dI[y + εη]/dε |_{ε=0} ≡ lim_{ε→0} ( I[y + εη] − I[y] )/ε . (C.19)

This derivative defines the functional derivative δI[y]/δy(x) via¹

dI[y + εη]/dε |_{ε=0} ≡ ∫ dx ( δI[y]/δy(x) ) η(x) , (C.20)

where the integration is over the domain x ∈ [x₁, x₂] of the functions y = f(x) on which the functional I[y] is defined. Note that the above definition is the functional analogue of the definition of the directional derivative of a function ϕ(x¹, x², . . . , xⁿ) in the direction of η:

dϕ(x + εη)/dε |_{ε=0} ≡ lim_{ε→0} ( ϕ(x + εη) − ϕ(x) )/ε = Σ_{i=1}^{n} (∂ϕ/∂x^i) η^i , (C.21)
where integration over the continuous variable x in (C.20) replaces the summation
over the discrete index i in (C.21); see also Appendix A.4.1 and (A.30).
If the functional I[y] has the form given in (C.2), i.e.,

I[y] ≡ ∫_{x₁}^{x₂} dx F(y, y′, x) , (C.22)
where the functions y = f (x) are fixed at x1 and x2 , then the variational procedure
described above leads to the same results that we found in the previous section,
namely
dI[y + εη]/dε |_{ε=0} = [ (∂F/∂y′) η ]_{x₁}^{x₂} + ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] η dx . (C.23)
But since the function η(x) vanishes at the endpoints, the above expression simplifies to

dI[y + εη]/dε |_{ε=0} = ∫_{x₁}^{x₂} [ ∂F/∂y − d/dx (∂F/∂y′) ] η dx , (C.24)

for which
¹ A word of caution. The functional derivative δI[y]/δy(x) is a density in x, being defined inside an integral, (C.20). As such, the dimensions of dx δI[y]/δy(x) are the same as the dimensions of I divided by the dimensions of y. For example, if I[y] is the arc length functional, then δI[y]/δy(x) has dimension of 1/length.
δI[y]/δy = ∂F/∂y − d/dx (∂F/∂y′) . (C.25)

Thus, in terms of a functional derivative, the Euler equation (C.9) can be written as δI[y]/δy(x) = 0.
Exercise C.4 Let I [y] be a functional that depends only on the value of y at a
particular value of x, e.g.,
I [y] ≡ y(x0 ) . (C.26)
Show that for this case the functional derivative is the Dirac delta function:
δ I [y]
= δ(x − x0 ) . (C.27)
δy(x)
When F does not explicitly depend upon x, it is convenient to work with the Euler equation in an alternative form. If we simply take the total derivative of F with respect to x we have

dF/dx = (∂F/∂y) y′ + (∂F/∂y′) y″ + ∂F/∂x . (C.28)

But since

d/dx [ y′ (∂F/∂y′) ] = y″ (∂F/∂y′) + y′ d/dx (∂F/∂y′) , (C.29)

we can combine these two equations, using the Euler equation (C.9) to eliminate ∂F/∂y, to obtain

δI[y] = 0 ⇔ 0 = ∂F/∂x + d/dx [ y′ (∂F/∂y′) − F ] . (C.31)
The Euler equation (C.9) or its alternate form (C.31) is a 2nd-order ordinary differential equation with respect to the independent variable x. This equation may simplify depending on the form of F(y, y′, x):

(1) If F does not depend explicitly on y, then the Euler equation (C.9) reduces to

∂F/∂y′ = const . (C.32)

(2) If F does not depend explicitly on x, then the alternate form (C.31) reduces to

y′ (∂F/∂y′) − F = const . (C.33)
It turns out that simplification (2) is equivalent to making a change of the independent variable in the integrand of the functional from x to y, using

dx = x′ dy ⇔ y′ = 1/x′ , (C.34)

where x′ ≡ dx/dy, so that

I[y] = ∫_{x₁}^{x₂} F(y, y′) dx = ∫_{y₁}^{y₂} F(y, 1/x′) x′ dy ≡ ∫_{y₁}^{y₂} F̃(x, x′, y) dy ≡ Ĩ[x] . (C.35)
But since

F̃(x, x′, y) ≡ x′ F(y, 1/x′) (C.36)

is independent of x, the Euler equation for Ĩ[x] simplifies to ∂F̃/∂x′ = const. But note that

∂F̃/∂x′ = ∂/∂x′ [ x′ F(y, 1/x′) ] = F + x′ (∂F/∂y′) ∂(1/x′)/∂x′ = F − (1/x′)(∂F/∂y′) = F − y′ (∂F/∂y′) , (C.37)

which means that

∂F̃/∂x′ = const ⇔ y′ (∂F/∂y′) − F = const , (C.38)
Example C.2 A soap film is suspended between two circular loops of wire, as shown in Fig. C.3. Ignoring the effects of gravity, the soap film takes the shape of a surface of revolution, which has minimum surface area. Thus, in terms of the function y = f(x), the functional that we need to minimize is the surface area of revolution

I[y] = 2π ∫_{℘₁}^{℘₂} y ds = 2π ∫_{x₁}^{x₂} y √(1 + y′²) dx , (C.39)

which has

F(y, y′, x) = 2π y √(1 + y′²) . (C.40)

But since F does not depend explicitly on x, we can use simplification (2) to write

y′ (∂F/∂y′) − F = 2π y y′²/√(1 + y′²) − 2π y √(1 + y′²) = −2π y/√(1 + y′²) = const . (C.41)
Setting this last expression equal to −2πA and solving for y′, we find

y′ ≡ dy/dx = (1/A) √(y² − A²) , (C.42)

where A is a constant.
This is a separable equation, which can be integrated using the hyperbolic trig substitution y = A cosh u, recalling that cosh²u − sinh²u = 1 and d cosh u = sinh u du. Thus,

x = A ∫ dy/√(y² − A²) + B = A ∫ (A sinh u du)/(A sinh u) + B = A u + B = A cosh⁻¹(y/A) + B , (C.43)

or, equivalently,

y = A cosh( (x − B)/A ) . (C.44)
Such a curve is called a catenary. As usual, the integration constants A and B can
be determined by the boundary conditions for y = f (x), which are related to the
radii of the two circular loops of wire. Unfortunately, solving for A and B involves
solving a transcendental equation. See Chap. 17 of Arfken (1970) for a discussion of
special cases of this problem.
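In practice, A and B must be found numerically from the boundary data. A small SciPy sketch under assumed endpoint values (the numbers below are hypothetical, chosen only for illustration):

    import numpy as np
    from scipy.optimize import fsolve

    x1, y1 = 0.0, 1.0    # first loop:  y(x1) = y1  (assumed)
    x2, y2 = 1.0, 1.2    # second loop: y(x2) = y2  (assumed)

    def equations(p):
        A, B = p
        return (A*np.cosh((x1 - B)/A) - y1,
                A*np.cosh((x2 - B)/A) - y2)

    A, B = fsolve(equations, (0.5, 0.5))   # initial guess
    print(A, B)   # one catenary through the two endpoints (if a solution exists)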
Exercise C.5 Find the shape of a wire joining two points such that a bead will slide along the wire under the influence of gravity (without friction) in the shortest amount of time. (See Fig. C.4.) Assume that the bead is released from rest at y = 0. Such a curve is called a brachistochrone, which in Greek means "shortest time."

Hint: You should extremize the functional

I[y] = ∫_{℘₁}^{℘₂} ds/v = ∫_{x₁}^{x₂} dx √(1 + y′²)/√(2gy) , (C.45)

where conservation of energy,

(1/2) m v² − m g y = 0 ⇒ v = √(2gy) , (C.46)

was used to yield an expression for the speed v in terms of y. By using simplification (2) or changing the independent variable of the functional from x to y, you should find

y′ ≡ dy/dx = √( (1 − Ay)/(Ay) ) , (C.47)

where A is a constant.
Fig. C.4 Geometrical set-up for the brachistochrone problem, Exercise C.5. The goal is to find
the shape of the wire connecting points ℘1 and ℘2 such that a bead slides along the wire under the
influence of gravity (and in the absence of friction) in the shortest amount of time. Note that we
have chosen the y-axis to increase in the downward direction
Fig. C.5 A cycloid is the path traced out by a point on the rim of a wheel as it rolls without slipping
across a flat surface. It is also the shape of the wire that solves the brachistochrone problem, Exer-
cise C.5. To be consistent with the geometry of Exercise C.5 shown in Fig. C.4, we are considering
the wheel as rolling to the right in contact with the top horizontal surface
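That the cycloid of Fig. C.5 solves (C.47) can be confirmed symbolically. Writing the curve parametrically as x = R(t − sin t), y = R(1 − cos t) and taking A = 1/(2R) (a standard identification, not stated in the exercise itself):

    from sympy import symbols, sin, cos, simplify

    t, R = symbols('t R', positive=True)
    x = R*(t - sin(t));  y = R*(1 - cos(t))   # cycloid, y increasing downward
    A = 1/(2*R)

    yprime = y.diff(t) / x.diff(t)            # dy/dx along the curve
    assert simplify(yprime**2 - (1 - A*y)/(A*y)) == 0
    # and yprime = cot(t/2) > 0 for 0 < t < pi, matching the positive root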
(a) Write down the line element ds² on the surface of revolution in terms of the coordinates (ρ, φ) by simply substituting the embedding equations into the 3-dimensional line element dx² + dy² + dz².
(b) By varying the arc length functional, obtain the geodesic equation for a curve ρ = ρ(φ) on the surface of revolution, and show that it can be solved via quadratures,

φ − φ₀ = c₁ ∫_{ρ₀}^{ρ} (dρ/ρ) √( (1 + [f′(ρ)]²)/(ρ² − c₁²) ) , (C.50)

where c₁ is a constant.

(c) Evaluate the above integral for the case of a surface of a cone with half-angle α, which is defined by f(ρ) = ρ cot α. (See panel (b) of Fig. C.6.) You should find

ρ = c₁ / cos(φ sin α + c₂) , (C.51)

where c₂ is a second integration constant.
Fig. C.6 Examples of surfaces of revolution, obtained by rotating a curve z = f (ρ) around the
z-axis. Panel (a) A paraboloid defined by f (ρ) ≡ ρ 2 . Panel (b) A cone with half-angle α, defined
by f (ρ) = ρ cot α
Although we have been considering functionals of the form given by (C.2), where the
curve is explicitly described by the function y = f (x), there may be cases where it
is more convenient (or even necessary) to describe the curve in parametric form, e.g.,
x = x(t), y = y(t). Such an example is the variational problem to find the shape of
a closed curve of fixed length that encloses the greatest area. (We will revisit this in
more detail in Appendix C.8.) Here we derive the necessary and sufficient conditions
for a functional to depend only on the curve in the x y-plane and not on the choice of
parametric representation of the curve. The relevant theorem is:
Theorem C.1 The necessary and sufficient conditions for the functional

I[x, y] = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ, t) dt (C.53)

to depend only on the curve in the xy-plane and not on the choice of parametric representation of the curve is that G not depend explicitly on t and be a positive-homogeneous function of degree one in ẋ and ẏ—i.e.,

G(x, y, λẋ, λẏ) = λ G(x, y, ẋ, ẏ) , for all λ > 0 . (C.54)

Proof² Suppose first that the functional has the standard form (C.2),

I[y] = ∫_{x₁}^{x₂} F(y, y′, x) dx , (C.55)

which by its definition depends only on the curve traced out by the function y = f(x). We then introduce a parameter t so that x = x(t), y = y(t). Then

dx = ẋ dt , dy = ẏ dt , x′ = dx/dy = ẋ/ẏ , y′ = dy/dx = ẏ/ẋ , (C.56)

and

∫_{x₁}^{x₂} F(y, y′, x) dx = ∫_{t₁}^{t₂} F(y, ẏ/ẋ, x) ẋ dt . (C.57)
Thus,
² This derivation closely follows that given in Sect. 10 of Gelfand and Fomin (1963).
I[y] = ∫_{x₁}^{x₂} F(y, y′, x) dx = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt ≡ I[x, y] , (C.58)

where

G(x, y, ẋ, ẏ) ≡ F(y, ẏ/ẋ, x) ẋ . (C.59)

Note that G has the properties that it does not depend explicitly on t, and that

G(x, y, λẋ, λẏ) = λ G(x, y, ẋ, ẏ) (C.60)

for all λ > 0. Thus, G is a positive-homogeneous function of degree one in ẋ and ẏ.
Conversely, suppose that we have a functional of the form

I[x, y] = ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt , (C.61)

where G does not depend explicitly on t and is positive-homogeneous of degree one in ẋ and ẏ. Then under a change of parametrization t → τ(t) (with dτ/dt > 0),

dt = (dt/dτ) dτ , ẋ = (dx/dτ)(dτ/dt) , ẏ = (dy/dτ)(dτ/dt) , (C.62)

and

G(x, y, ẋ, ẏ) = G( x, y, (dx/dτ)(dτ/dt), (dy/dτ)(dτ/dt) ) = G(x, y, dx/dτ, dy/dτ) (dτ/dt) , (C.63)

where the last equality follows from positive-homogeneity. Thus,

∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt = ∫_{τ₁}^{τ₂} G(x, y, dx/dτ, dy/dτ) dτ , (C.64)

so the value of the functional is independent of the choice of parametrization. □
Example C.3 Here we show explicitly that the Euler equations obtained from the
parametrized functional
466 Appendix C: Calculus of Variations
I[x, y] ≡ ∫_{t₁}^{t₂} G(x, y, ẋ, ẏ) dt = ∫_{t₁}^{t₂} F(y, ẏ/ẋ, x) ẋ dt (C.65)

are equivalent to the Euler equation for the original functional I[y].
Proof The Euler equations obtained from (C.65) by varying both x and y are
∂G/∂x − d/dt (∂G/∂ẋ) = 0 , (C.67a)

∂G/∂y − d/dt (∂G/∂ẏ) = 0 . (C.67b)
Since G is positive-homogeneous of degree one in ẋ and ẏ, Euler's theorem for homogeneous functions tells us that

ẋ (∂G/∂ẋ) + ẏ (∂G/∂ẏ) − G = const (C.68)

is trivially satisfied, the left-hand side being identically zero. Differentiating this identity with respect to t and expanding dG/dt with the chain rule then yields

ẋ [ ∂G/∂x − d/dt (∂G/∂ẋ) ] + ẏ [ ∂G/∂y − d/dt (∂G/∂ẏ) ] = 0 , (C.69)

where we have cancelled out the terms involving ẍ and ÿ. This last equation shows that the two equations (C.67a) and (C.67b) are not independent, but follow one from the other. So, without loss of generality, let's consider (C.67b). Then by writing G in terms of F and performing the derivatives, we find
0 = ∂G/∂y − d/dt (∂G/∂ẏ)
  = ∂/∂y [ F(y, ẏ/ẋ, x) ẋ ] − ẋ d/dx { ∂/∂ẏ [ F(y, ẏ/ẋ, x) ẋ ] }
  = ẋ (∂F/∂y) − ẋ d/dx [ (∂F/∂y′)(1/ẋ) ẋ ]
  = ẋ [ ∂F/∂y − d/dx (∂F/∂y′) ] , (C.70)

which, since ẋ ≠ 0, reproduces the Euler equation (C.9) for the original functional I[y]. □
Exercise C.7 Consider the parametrized form of the arc length functional in
two dimensions,
℘2 t2
I [x1 , x2 ] ≡ ds = G(x1 , x2 , ẋ1 , ẋ2 ) dt , (C.71)
℘1 t1
where
ds
G(x1 , x2 , ẋ1 , ẋ2 ) = = gi j ẋi ẋ j . (C.72)
dt i, j
Note that by writing the arc length in this form (See (C.14)), we are allowing
for the possibility that the 2-dimensional space be curved (e.g., the surface of a
sphere) and that the coordinates need not be Cartesian, so gi j ≡ gi j (x1 , x2 ) in
general.
(a) Show that the Euler equations for this functional are

(d/dt)( Σⱼ g_{ij} ẋⱼ ) − (1/2) Σ_{j,k} (∂g_{jk}/∂xᵢ) ẋⱼ ẋₖ = (1/G)(dG/dt) Σⱼ g_{ij} ẋⱼ . (C.73)

(b) Show that the same equations result, for an affine parameter t (i.e., one for which G = const along the curve), from the simpler functional

J[x₁, x₂] ≡ ∫_{t₁}^{t₂} K(x₁, x₂, ẋ₁, ẋ₂) dt ,

where

K(x₁, x₂, ẋ₁, ẋ₂) ≡ (1/2) Σ_{i,j} g_{ij} ẋᵢ ẋⱼ = (1/2) G²(x₁, x₂, ẋ₁, ẋ₂) . (C.77)
The equivalence of these two approaches follows from the fact that G and K differ by an overall multiplicative constant when t is an affine parameter. If t is not an affine parameter, then the two functionals I and J lead to different equations of motion, consistent with the results of Exercise C.3. (Note that these results hold, in general, in n dimensions.)
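The geodesic equations above can also be explored numerically. The snippet below is a minimal sketch, not part of the original text; it assumes numpy and scipy are available, uses the unit 2-sphere with metric g = diag(1, sin²θ) as the concrete example, and all names are illustrative.

```python
# Integrate the geodesic equations that follow from J = ∫ K dt on the unit
# 2-sphere, where K = (1/2)(θ̇² + sin²θ φ̇²) and t is an affine parameter.
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(t, u):
    th, ph, thd, phd = u
    # Euler equations for g = diag(1, sin²θ):
    #   θ̈ = sinθ cosθ φ̇² ,   φ̈ = −2 cotθ θ̇ φ̇
    return [thd, phd,
            np.sin(th) * np.cos(th) * phd**2,
            -2.0 * thd * phd * np.cos(th) / np.sin(th)]

sol = solve_ivp(geodesic_rhs, (0.0, 10.0), [1.0, 0.0, 0.3, 0.8],
                rtol=1e-10, atol=1e-12, max_step=0.01)
th, ph, thd, phd = sol.y

# A geodesic on the sphere is a great circle, so L = r × ṙ (with r the unit
# position vector) should be constant along the solution.
r = np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])
rdot = (thd * np.array([np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)])
        + phd * np.array([-np.sin(th) * np.sin(ph), np.sin(th) * np.cos(ph), 0 * ph]))
L = np.cross(r.T, rdot.T)
print(L.max(axis=0) - L.min(axis=0))   # all three spreads ≈ 0
```

The near-zero spread of the components of L confirms that the integrated curve stays on a single great circle, as the variational analysis predicts.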
C.7 Generalizations
The standard calculus of variations problem (C.2) discussed in the preceding sections
can be extended in several ways. Here we describe three such extensions.
One extension is to functionals that depend on higher derivatives of y = f(x), e.g.,

I[y] ≡ ∫_{x₁}^{x₂} F(y, y′, y″, x) dx , (C.78)

where the set of functions {y = f(x)} is now restricted so that both y and y′ are fixed at the endpoints ℘₁ and ℘₂. Then proceeding in a manner similar to that in Appendix C.2, we find

δI[y] = 0 ⇔ ∂F/∂y − (d/dx)(∂F/∂y′) + (d²/dx²)(∂F/∂y″) = 0 . (C.79)
Note that the Euler equation for this case may contain 3rd or even 4th-order deriva-
tives of y = f (x). Although most problems in classical mechanics involve 2nd-order
differential equations, 3rd or higher-order differential equations have applications in
certain areas of chaos theory (See, e.g., Goldstein et al. 2002).
The standard variational problem (C.2) with fixed endpoints can be generalized by
allowing the variations δy to be non-zero at either one or both endpoints ℘1 , ℘2 . The
derivation given in Appendix C.2 then leads to
δI[y] = 0 ⇔ ∂F/∂y − (d/dx)(∂F/∂y′) = 0 , (∂F/∂y′)|_{x₁} = 0 , (∂F/∂y′)|_{x₂} = 0 . (C.80)

The conditions

(∂F/∂y′)|_{x₁} = 0 , (∂F/∂y′)|_{x₂} = 0 , (C.81)
are sometimes called natural boundary conditions for the curve. In the context
of classical mechanics, the natural boundary conditions for a particle moving in
response to a velocity-independent conservative force correspond to zero velocity
at the endpoints. The only solution to the equations of motion that satisfies these
boundary conditions at both endpoints is the trivial solution, where the particle just
sits at one location forever. Imposing the natural boundary condition at just one
endpoint and fixed boundary conditions at the other allows for non-trivial solutions,
in general.
Exercise C.8 Redo the brachistochrone problem (Exercise C.5), but this time allowing the second endpoint at x₂ to be free—i.e., δy|_{x₂} ≠ 0. You should find that the solution is again a cycloid, but one which intersects the line x = x₂ at a right angle.
The derivation of the Euler equation given in Appendix C.2 can easily be extended to functionals of the form

I[y₁, . . . , yₙ] ≡ ∫_{x₁}^{x₂} F(y₁, . . . , yₙ; y₁′, . . . , yₙ′; x) dx , (C.82)

for which requiring δI = 0 under independent variations of the yᵢ yields

∂F/∂yᵢ − (d/dx)(∂F/∂yᵢ′) = 0 , i = 1, 2, . . . , n . (C.83)

These equations form a system of n 2nd-order ordinary differential equations for the functions yᵢ = fᵢ(x) with respect to the independent variable x.
The simplifications discussed in Appendix C.5 carry over to the general case of n degrees of freedom. If F does not depend on one of the functions yᵢ, then

∂F/∂yᵢ′ = const . (C.84)

If F does not depend explicitly on x, then

h ≡ Σ_{i=1}^{n} yᵢ′ (∂F/∂yᵢ′) − F = const . (C.85)

The proof of the latter result follows from

dh/dx = Σ_{i=1}^{n} [ yᵢ″ (∂F/∂yᵢ′) + yᵢ′ (d/dx)(∂F/∂yᵢ′) ] − Σ_{i=1}^{n} [ (∂F/∂yᵢ) yᵢ′ + (∂F/∂yᵢ′) yᵢ″ ] = 0 , (C.86)

where we assumed that F does not depend explicitly on x (i.e., ∂F/∂x = 0) and used the Euler equations (C.83) to get the last equality.
There is also an additional simplification if F does not depend on the derivative of one of the variables, say yₖ′: the Euler equation for yₖ then reduces to the algebraic condition ∂F/∂yₖ = 0.
So far, we’ve been considering variational problems of the form δ I [y] = 0, where
the functions y = f (x) have been subject only to boundary conditions, e.g., fixed at
the endpoints x = x1 and x2 . But there might also exist situations where the functions
y = f (x) are subject to an integral constraint
x2
J [y] ≡ G(y, y , x) dx = J0 , (C.90)
x1
where J0 is a constant. Such problems are called isoperimetric problems, since the
classic example of such a problem is to find the shape of a closed curve of fixed length
(perimeter) that encloses the maximum area. Due to the constraint, the variations
of y in δ I = 0 are not free, but are subject to the condition that δ J [y] = 0. We
can incorporate this condition into the variational problem by using the method of
Lagrange multipliers (See Sect. 2.4). This amounts to adding to δ I = 0 a multiple
of δ J = 0,
δ I [y] + λδ J [y] = 0 , (C.91)
where λ is an undetermined constant (the Lagrange multiplier for this problem). Note
that (C.91) can be recast as finding the stationary values of the functional
Ī[y, λ] ≡ I[y] + λ (J[y] − J₀) = ∫_{x₁}^{x₂} (F + λG) dx − λJ₀ (C.92)
with respect to unconstrained variations of both y and λ. (The variation with respect
to λ recovers the integral constraint J[y] − J₀ = 0.) Performing the variations gives rise to two equations, which can be solved for the two unknowns y = f(x) and λ (if desired).
Example C.4 Here we will find the shape of a closed curve of fixed length ℓ that encloses the largest area. Since the curve is closed, we will not be able to represent it globally in the form y = f(x), so we describe it parametrically by x = x(t), y = y(t) for t₁ ≤ t ≤ t₂, with boundary conditions

x(t₁) = x(t₂) = 0 , y(t₁) = y(t₂) = 0 , (C.93)

with the derivatives so chosen as to avoid a kink at the origin. (See Fig. C.7.)
The functional that we want to extremize is the area under the curve

I[x, y] = ∮ y dx = ∫_{t₁}^{t₂} y ẋ dt . (C.94)

The curve is traversed clockwise so that the area obtained is enclosed by the curve. The constraint is that the curve have fixed length ℓ:

J[x, y] ≡ ∫_{t₁}^{t₂} √(ẋ² + ẏ²) dt = ℓ . (C.95)

Using the method of Lagrange multipliers discussed above, we extremize the combined functional

Ī[x, y, λ] = ∫_{t₁}^{t₂} (F + λG) dt − λℓ , (C.96)

where

F + λG = y ẋ + λ √(ẋ² + ẏ²) . (C.97)
The Euler equations for x and y can each be integrated once, yielding

y + λ ẋ/√(ẋ² + ẏ²) = A , λ ẏ/√(ẋ² + ẏ²) − x = B , (C.98, C.99)

where A and B are constants. These equations can be simplified if we switch the parametric representation from t to arc length s, noting that

dt/ds = 1/√(ẋ² + ẏ²) . (C.100)
Then (C.98) and (C.99) become

y + λ (dx/ds) = A , λ (dy/ds) − x = B . (C.101)
We now apply the boundary conditions of (C.93), but in terms of arc length s,

x|_{s=0,ℓ} = 0 , y|_{s=0,ℓ} = 0 , (dx/ds)|_{s=0,ℓ} = −1 , (dy/ds)|_{s=0,ℓ} = 0 , (C.102)

which lead to B = 0 and A = −λ. The first equation in (C.101) can then be solved for y in terms of dx/ds,

y = −λ (1 + dx/ds) . (C.103)

Substituting this into the second equation of (C.101) (differentiated once with respect to s) gives

d²x/ds² = −λ⁻² x , (C.104)
with solution

x(s) = −|λ| sin(s/|λ|) , (C.105)
y(s) = −λ [1 − cos(s/|λ|)] . (C.106)

Note that these last two equations are the parametric representation of a circle

x² + (y + λ)² = λ² , (C.107)

with radius R = |λ| and center (0, −λ). In order that the circle lie above the x-axis, as suggested by Fig. C.7, we need λ = −R. Finally, since 2πR = ℓ is the length of the curve, the Lagrange multiplier is λ = −ℓ/2π. Thus,

x = −(ℓ/2π) sin(2πs/ℓ) , y = (ℓ/2π) [1 − cos(2πs/ℓ)] . (C.108)

The area enclosed by the circle is A = πR² = ℓ²/4π, which is the largest area enclosed by a closed curve of fixed length ℓ.
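As a quick sanity check of (C.108)—not part of the original text—one can verify numerically that the parametrized curve indeed has length ℓ and encloses area ℓ²/4π. The sketch below assumes numpy is available; the value of ℓ is arbitrary.

```python
import numpy as np

ell = 2.0                                              # any fixed perimeter ℓ
s = np.linspace(0.0, ell, 100001)
x = -(ell / (2 * np.pi)) * np.sin(2 * np.pi * s / ell)       # (C.108)
y = (ell / (2 * np.pi)) * (1 - np.cos(2 * np.pi * s / ell))

perimeter = np.sum(np.hypot(np.diff(x), np.diff(y)))         # ≈ ℓ
# Enclosed area from the line integral (1/2)∮(x dy − y dx)
area = 0.5 * abs(np.trapz(x * np.gradient(y, s) - y * np.gradient(x, s), s))
print(perimeter)                    # ≈ 2.0
print(area, ell**2 / (4 * np.pi))   # both ≈ 0.3183
```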
Exercise C.10 Find the shape of the curve of fixed length ℓ and free endpoint ℘₂ on the x-axis that encloses the largest area between it and the x-axis. (See Fig. C.8.) You should find:

x = (ℓ/π) [1 − cos(πs/ℓ)] , y = −(ℓ/π) sin(πs/ℓ) , (C.109)

which is the parametric representation of a semi-circle of radius R = ℓ/π, length ℓ, and center (ℓ/π, 0). Note that the area enclosed by this curve and the x-axis is ℓ²/2π, which is twice as large as that calculated in Example C.4 for the closed-curve variational problem.
Exercise C.11 Find the shape of a flexible hanging cable of fixed length ℓ, which is supported at endpoints ℘₁ and ℘₂ in a uniform gravitational field g pointing downward. (See Fig. C.9.)
Hint: The shape minimizes the gravitational potential energy of the cable,

I[y] = ∫_{℘₁}^{℘₂} dm g y = ∫_{℘₁}^{℘₂} μ ds g y = μg ∫_{x₁}^{x₂} y √(1 + y′²) dx , (C.110)

where μ is the mass per unit length of the cable, subject to the constraint of fixed length ℓ. You should find that the solution has the form of a catenary, y ∼ cosh x.
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 9 is devoted solely to the calculus of variations; an excellent
introduction to the topic well-suited for undergraduates with many examples and
problems. Solutions to several of the problems presented in this appendix can be
found in Boas (2006).
Gelfand and Fomin (1963): A more rigorous mathematical treatment of the calculus of variations. Our discussion of variational problems in parametric form follows closely the presentation given in Sect. 10 of this book.
Lanczos (1949): In our opinion, the best book on variational methods in the context
of classical mechanics. It contains excellent descriptions/explanations of the cal-
culus of variations, constrained systems, the method of Lagrange multipliers, etc.
Suitable for either advanced undergraduates or graduate students.
Appendix D
Linear Algebra
An abstract vector space consists of two types of objects (vectors and scalars) and
two types of operations (vector addition and scalar multiplication), which interact
with one another and are subject to certain properties (enumerated below). We will
denote vectors by boldface symbols, A, B, C, · · · , and scalars (which we will take
to be complex numbers) by italicized symbols, a, b, c, · · · . Vector addition will be
denoted by a + sign between two vectors, e.g., A + B, and scalar multiplication
by juxtaposition of a scalar and a vector, e.g., aA. The properties obeyed by these
operations are as follows.
Vector addition satisfies the following properties:
1. Closure:
A + B = C (i.e., the sum of two vectors is another vector in the space) (D.1)
2. Commutativity:
A+B=B+A (D.2)
3. Associativity:
(A + B) + C = A + (B + C) (D.3)
4. Zero vector:
∃0 such that A + 0 = A ∀A (D.4)
5. Inverse vector:
∀A ∃ − A such that A + (−A) = 0 (D.5)
Scalar multiplication satisfies the following properties:
1. Closure:
aA = B (i.e., a scalar multiple of a vector is another vector in the space) (D.6)
2. Multiplication by unity:
1A = A (D.7)
3. Distributivity over scalar addition:
(a + b)A = aA + bA (D.8)
4. Distributivity over vector addition:
a(A + B) = aA + aB (D.9)
5. Associativity:
a(bA) = (ab)A (D.10)
Using the various properties given above, it is easy to see that 0A = 0 and (−1)A =
−A, since
(A + 0A) = (1A + 0A) = (1 + 0)A = 1A = A , (D.11)
and
(A + (−1)A) = (1A + (−1)A) = (1 + (−1))A = 0A = 0 . (D.12)
The above definitions and properties allow us to extend the familiar properties of
ordinary 3-dimensional vectors and real numbers to other sets of objects. Although
most properties of 3-dimensional vectors carry over to these higher-dimensional
abstract vector spaces, some do not, such as the cross (vector) product of two vectors,
e.g., A × B, which is defined in 3-dimensions by (A.2). (But see the wedge product
of differential forms described in Appendix B.)
A key concept when working with vectors is that of a basis. But in order to define
what we mean by a basis, we must first introduce some terminology:
• A linear combination of the vectors A, B, · · · is any expression of the form

aA + bB + · · · (D.13)

• A vector C is linearly independent of the set of vectors {A, B, · · · } if and only if it cannot be written as a linear combination of these vectors—i.e., if and only if there do not exist scalars a, b, · · · such that

C = aA + bB + · · · (D.14)

• A set of vectors {A, B, · · · } is a linearly independent set if and only if each vector in the set is linearly independent of all the other vectors in the set.
• A set of vectors {A, B, · · · } spans the vector space if and only if any vector C in the vector space can be written as a linear combination of the vectors in the set—i.e.,

∃ a, b, · · · such that C = aA + bB + · · · (D.15)
In terms of the above definitions, a basis for a vector space is defined to be any
set of vectors which is (i) linearly independent and (ii) spans the vector space. The
number of basis vectors is defined as the dimension of the vector space. Thus, an
n-dimensional vector space has a basis consisting of n vectors,
{e1 , e2 , . . . , en } . (D.16)
This is, of course, consistent with what we know about the space of ordinary 3-
dimensional vectors. The set of unit vectors {x̂, ŷ, ẑ} is a basis for that space. (More
about what unit means for a more general vector space in just a bit.)
Given a basis, any vector A can be decomposed as

A = A₁e₁ + A₂e₂ + · · · + Aₙeₙ ≡ Σᵢ Aᵢeᵢ . (D.17)
The scalars A1 , A2 , . . . , An are called the components of A with respect to the basis
{e1 , e2 , . . . , en }. The decomposition of A into its components is unique for a given
basis as shown in Exercise D.1 below. But for a different set of basis vectors, e.g.,
{e1 , e2 , . . . , en }, the components of A will be different.
Exercise D.1 Prove that the decomposition (D.17) of A into its components A₁, A₂, . . . , Aₙ is unique. Hint: Use proof by contradiction—i.e., assume that there exist other components A₁′, A₂′, . . . , Aₙ′, for which

A = Σᵢ Aᵢ′ eᵢ . (D.18)

Then show that this leads to a contradiction regarding the linear independence of the basis vectors unless Aᵢ′ = Aᵢ for all i.
Vector addition and scalar multiplication are what you might expect in terms of the components of the vectors. That is, if we denote the correspondence between vectors and components by¹

A ↔ A ≡ [A₁, A₂, . . . , Aₙ]ᵀ , (D.19)

then

A + B ↔ A + B = [A₁ + B₁, A₂ + B₂, . . . , Aₙ + Bₙ]ᵀ (D.20)

and

aA ↔ aA = [aA₁, aA₂, . . . , aAₙ]ᵀ . (D.21)

¹ Our notation is such that A denotes the abstract vector, Aᵢ its ith component with respect to a basis, and A the collection of components A₁, A₂, . . . , Aₙ represented as an n × 1 column matrix. The superscript T denotes transpose, which converts a row matrix into a column matrix, and vice versa.
Exercise D.2 Show that the zero vector 0 and inverse vector −A can be written in terms of components as

0 ↔ 0 = [0, 0, . . . , 0]ᵀ (D.22)

and

−A ↔ −A = [−A₁, −A₂, . . . , −Aₙ]ᵀ . (D.23)
To generalize the notions of length and angle² to an arbitrary vector space, we introduce an inner product A · B, which assigns a scalar to each pair of vectors A and B, subject to the following properties:

A · B = (B · A)* , (D.24a)
A · A ≥ 0 , with A · A = 0 if and only if A = 0 , (D.24b)
A · (bB + cC) = b(A · B) + c(A · C) . (D.24c)
2 As anybody who has taken freshman physics knows, length and angle are key concepts for ordinary
(3-dimensional) vectors, which are sometimes defined as “arrows” having magnitude and direction!
Mathematicians call a vector space with the additional structure of an inner product
an inner product space.
• The norm (or magnitude) of a vector A is defined by |A| ≡ √(A · A).
• A vector A is said to have unit norm (or to be normalized) if and only if |A| = 1.
• Two vectors A and B are said to be orthogonal if and only if the inner product of
A and B vanishes—i.e., A · B = 0.
• A set of vectors {A1 , A2 , · · · } is said to be orthonormal if and only if Ai ·A j = δi j
for all i, j = 1, 2, . . . , n, where δi j is the Kronecker delta symbol (which equals
one if i = j, and equals zero otherwise).
• An orthonormal basis is an orthonormal set of basis vectors, which we will typi-
cally denote with hats, ê1 , ê2 , . . . , ên . These vectors are thus linearly independent,
span the vector space, and satisfy
êi · ê j = δi j . (D.26)
Given an arbitrary basis

{e₁, e₂, . . . , eₙ} , (D.27)

which we will assume is not orthonormal, there is a systematic procedure (Gram–Schmidt orthonormalization) for constructing an orthonormal basis from it. (If the basis vectors were already orthonormal, then there would be nothing that you need to do!) Take e₁ and simply divide by its norm. The result is a unit vector that points in the same direction as e₁,

f̂₁ ≡ e₁/|e₁| . (D.28)
Next, take e₂ and subtract off its component along f̂₁,

f₂ ≡ e₂ − (f̂₁ · e₂) f̂₁ . (D.29)

By construction, f₂ is orthogonal to f̂₁:

f̂₁ · f₂ = f̂₁ · e₂ − (f̂₁ · e₂)(f̂₁ · f̂₁) = 0 . (D.30)

Then normalize f₂,

f̂₂ ≡ f₂/|f₂| . (D.31)
Thus, both f̂₁ and f̂₂ have unit norm and they are orthogonal to one another. For f̂₃ we proceed in a similar fashion:

f₃ ≡ e₃ − (f̂₁ · e₃) f̂₁ − (f̂₂ · e₃) f̂₂ , (D.32)

and

f̂₃ ≡ f₃/|f₃| . (D.33)

Continuing this way through the remaining basis vectors, we obtain an orthonormal basis {f̂₁, f̂₂, . . . , f̂ₙ}.
Exercise D.4 (a) Apply this procedure to the basis vectors

e₁ ≡ x̂ , e₂ ≡ x̂ + ŷ , e₃ ≡ x̂ + ŷ + ẑ , (D.34)

where x̂, ŷ, ẑ are the standard orthonormal basis vectors in ordinary 3-dimensional space. (b) Repeat the procedure, but this time with the basis vectors enumerated in the reverse order,

e₁ ≡ x̂ + ŷ + ẑ , e₂ ≡ x̂ + ŷ , e₃ ≡ x̂ . (D.35)
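The orthonormalization procedure is straightforward to implement. The following is a minimal sketch, not part of the original text, assuming numpy is available and real-valued vectors (a complex inner product would require a conjugate in the dot product); it is applied to the two orderings (D.34) and (D.35).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors`, in order."""
    ortho = []
    for e in vectors:
        f = e.astype(float).copy()
        for fhat in ortho:
            f -= np.dot(fhat, f) * fhat      # subtract components along earlier f̂'s
        ortho.append(f / np.linalg.norm(f))  # normalize, as in (D.28), (D.31), (D.33)
    return np.array(ortho)

basis_a = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])   # (D.34)
basis_b = basis_a[::-1]                                  # (D.35): reversed order
for B in (basis_a, basis_b):
    F = gram_schmidt(B)
    print(F.round(6))
    print(np.allclose(F @ F.T, np.eye(3)))   # True: f̂_i · f̂_j = δ_ij
```

Note that the two orderings produce different orthonormal bases, illustrating that the result of the procedure depends on the order in which the basis vectors are processed.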
To illustrate that the inner product defined above generalizes the dot product of ordinary 3-dimensional vectors, it is simplest to show that we can recover the form of the dot product given in (A.4), which we rewrite here as

A · B = A₁B₁ + A₂B₂ + A₃B₃ = Σᵢ AᵢBᵢ . (D.36)
So let {ê₁, ê₂, . . . , êₙ} be an orthonormal basis, with respect to which A has the components

Aᵢ = êᵢ · A . (D.38)

Indeed,

êᵢ · A = êᵢ · ( Σⱼ Aⱼêⱼ ) = Σⱼ Aⱼ (êᵢ · êⱼ) = Σⱼ Aⱼ δᵢⱼ = Aᵢ , (D.39)

where we used the linearity property (D.24c) of the inner product and the orthonormality (D.26) of the basis vectors to obtain the second and third equalities.³ Using these results, it is then fairly straightforward to show that
A · B = A₁*B₁ + A₂*B₂ + · · · + Aₙ*Bₙ = Σᵢ Aᵢ*Bᵢ , (D.40)

and, as a consequence,
³ Note that Aᵢ ≠ A · êᵢ in general, since the components of a vector can be complex. Using Exercise D.3, it follows that A · êᵢ = Aᵢ*.
|A|² = Σᵢ |Aᵢ|² . (D.41)
Note that (D.40) does indeed generalize the dot product (D.36) to arbitrary dimen-
sions. Setting n = 3 and taking our vector components to be real-valued, we see that
(D.40) reduces to (D.36).
Exercise D.5 Prove the component form (D.40) of the inner product.
Recall that for ordinary vectors in 3-dimensions, the dot product A · B can also be
written as
A · B = AB cos θ , (D.42)
√ √
where A ≡ |A| ≡ A · A and B ≡ |B| ≡ B · B are the magnitudes of the two
vectors, and θ is the angle between them; see (A.1). If we rewrite the above equation
as
cos θ = (A · B)/(|A||B|) , (D.43)
then it can be thought of as the definition of the angle between the two vectors in terms
of their dot products and their magnitudes. This suggests a way of generalizing the
concept of “angle between two vectors” to an arbitrary n-dimensional vector space.
Namely, simply interpret the expressions on the right-hand side of (D.43) in terms
of the inner product defined by (D.24a), (D.24b), (D.24c). Unfortunately this won’t
work since A · B is a complex number in general, so the angle θ would not be real.
But it turns out that there is a simple solution, which amounts to taking the absolute
value of the right-hand side,
cos θ = |A · B|/(|A||B|) = √( |A · B|² / [(A · A)(B · B)] ) . (D.44)
So this is now a real quantity, but the fact that this equation actually gives us something that we can interpret as an angle is thanks to the Schwarz inequality

|A · B|² ≤ (A · A)(B · B) . (D.45)
Exercise D.6 Prove the Schwarz inequality. (Hint: Consider the vector

C ≡ A − [(B · A)/(B · B)] B , (D.46)

and then use |C|² ≡ C · C ≥ 0. (See Fig. D.1.) Note that the Schwarz inequality becomes an equality when A and B are proportional (i.e., parallel or anti-parallel) to one another.)
Given our abstract n-dimensional vector space, we would now like to define a cer-
tain class of operations (called linear transformations), which map vectors to other
vectors in such a way that they preserve the linear property of vector addition and
scalar multiplication of vectors. Rotations of ordinary 3-dimensional vectors, which
play an important role in all branches of physics, are just one example of linear
transformations. A linear transformation T maps each vector A to another vector TA in such a way that

T(aA + bB) = a(TA) + b(TB) . (D.47)
This method of introducing additional structure on a space in such a way that it inter-
acts “naturally” with other structures in the space (in this case scalar multiplication
and vector addition) is common practice in mathematics.
Note that multiplying every vector in the space by the same scalar c (i.e., A → cA) is an example of a linear transformation since

c(aA + bB) = c(aA) + c(bB) = (ca)A + (cb)B = (ac)A + (bc)B = a(cA) + b(cB) , (D.48)

where we used the distributive property of scalar multiplication with respect to vector addition (D.9); the associative property of scalar multiplication of vectors (D.10); the commutative property for multiplication of two scalars (complex numbers); and the associative property of scalar multiplication of vectors (again) to get the successive equalities above. But adding a constant vector C to every vector in the space is not an example of a linear transformation, as you are asked to show in the following exercise.
Exercise D.7 Prove that adding a constant vector C to every vector in the space,
i.e., A → A + C, is not an example of a linear transformation.
Linear transformations S and T can be combined to form new linear transformations via addition, scalar multiplication, and composition (multiplication):

(S + T)A ≡ SA + TA , (D.49)
(aT)A ≡ a(TA) , (D.50)
(ST)A ≡ S(TA) . (D.51)

With the first two operations, (D.49) and (D.50), the space of linear transformations has the structure of an n²-dimensional vector space over the complex numbers (Exercise D.8). Note that multiplication of linear transformations is not commutative, however, since

ST ≠ TS , (D.52)

in general. (Think of rotations in 3-dimensions; see, e.g., Fig. 6.5.) We will return to the multiplicative structure of linear transformations later in this section, after we develop the connection between linear transformations and matrices.
Exercise D.8 Show that with the above definitions of addition of linear trans-
formations and multiplication of a linear transformation by a scalar, the set of
linear transformations has the structure of a vector space over the complex num-
bers. Note that you will need to verify that these operations satisfy the properties
given in (D.1)–(D.5) and (D.6)–(D.10).
The real beauty of the linearity property (D.47) is that once you know what a linear transformation T does to a set of basis vectors {e₁, e₂, . . . , eₙ}, you can easily determine what it does to any vector A. To see that this is the case, let's begin by writing

Te₁ = T₁₁e₁ + T₂₁e₂ + · · · + Tₙ₁eₙ = Σᵢ Tᵢ₁eᵢ , (D.53)

which follows from the fact that any vector (in this case Te₁) can be written as a linear combination of the basis vectors. Similarly,

Te₂ = T₁₂e₁ + T₂₂e₂ + · · · + Tₙ₂eₙ = Σᵢ Tᵢ₂eᵢ ,
⋮ (D.54)
Teₙ = T₁ₙe₁ + T₂ₙe₂ + · · · + Tₙₙeₙ = Σᵢ Tᵢₙeᵢ .

Thus, we see that the action of T on the n basis vectors is completely captured by the n² numbers Tᵢⱼ, where

Teⱼ = Σᵢ Tᵢⱼeᵢ , j = 1, 2, . . . , n . (D.55)
These numbers are naturally arranged as an n × n matrix,

T ↔ T ≡ [Tᵢⱼ] , (D.56)

where the double-headed arrow ↔ reminds us that T are the components of T with respect to a particular basis. (With respect to a different basis, the matrix components will change in a manner that we will investigate shortly.) If the basis is orthonormal, then we can write

Tᵢⱼ = êᵢ · (Têⱼ) . (D.57)

The action of T on an arbitrary vector A then follows from linearity,

TA = T( Σⱼ Aⱼeⱼ ) = Σⱼ Aⱼ (Teⱼ) = Σᵢ ( Σⱼ Tᵢⱼ Aⱼ ) eᵢ , (D.58)

or, in terms of components,

A′ = TA ⇔ Aᵢ′ = Σⱼ Tᵢⱼ Aⱼ . (D.59)

Note that the last expression is just the component form of ordinary matrix multiplication of two matrices.
Before discussing properties of matrices in general, let's first determine how the components of a vector A and the components (i.e., matrix elements) of a linear transformation T transform under a change of basis. The classic example of a change of basis is given by a rotation of the coordinate basis vectors x̂, ŷ, ẑ to a new set of basis vectors x̂′, ŷ′, ẑ′.
So let's denote the two sets of basis vectors by

{e₁, e₂, . . . , eₙ} and {e₁′, e₂′, . . . , eₙ′} ,

where we use primed indices to distinguish between the two bases. We will assume, for now, that these are arbitrary bases—i.e., we do not require that they be orthonormal. Since any vector can be expanded in terms of either set of basis vectors, we can write

e₁ = S_{1′1} e₁′ + S_{2′1} e₂′ + · · · + S_{n′1} eₙ′ = Σ_{j′} S_{j′1} e_{j′} ,
e₂ = S_{1′2} e₁′ + S_{2′2} e₂′ + · · · + S_{n′2} eₙ′ = Σ_{j′} S_{j′2} e_{j′} ,
⋮ (D.63)
eₙ = S_{1′n} e₁′ + S_{2′n} e₂′ + · · · + S_{n′n} eₙ′ = Σ_{j′} S_{j′n} e_{j′} ,

or, more compactly,

eᵢ = Σ_{j′} S_{j′i} e_{j′} , i = 1, 2, . . . , n . (D.64)

Expanding A = Σᵢ Aᵢeᵢ = Σ_{j′} A_{j′}e_{j′} in both bases, it follows that the components transform as

A_{j′} = Σᵢ S_{j′i} Aᵢ , j′ = 1′, 2′, . . . , n′ , (D.67)

with the inverse relation

Aᵢ = Σ_{j′} (S⁻¹)_{ij′} A_{j′} . (D.68)
Now take a linear transformation T, which maps A to B ≡ TA. Using (D.67), (D.59) and (D.68), it follows that

B_{i′} = Σₖ S_{i′k} Bₖ = Σ_{k,l} S_{i′k} Tₖₗ Aₗ = Σ_{k,l} Σ_{j′} S_{i′k} Tₖₗ (S⁻¹)_{lj′} A_{j′}
       = Σ_{j′} [ Σ_{k,l} S_{i′k} Tₖₗ (S⁻¹)_{lj′} ] A_{j′} ≡ Σ_{j′} T_{i′j′} A_{j′} , (D.70)

where

T_{i′j′} ≡ Σ_{k,l} S_{i′k} Tₖₗ (S⁻¹)_{lj′} . (D.71)

Noting that the products and summations on the right-hand side are exactly those for a product of matrices, we have

T′ = STS⁻¹ , (D.72)

which is an example of a similarity transformation of the matrix T.
As illustrated by the calculations in the last two subsections, matrices play a key
role in linear algebra. In this section, we summarize some important definitions and
operations involving matrices, which we will refer to repeatedly in the main text.
Most of the discussion will be restricted to n × n (i.e., square) matrices, although the
transpose and conjugate operations (complex conjugate and Hermitian conjugate)
can be defined for arbitrary n × m (i.e., rectangular) matrices.
The transpose, complex conjugate, and Hermitian conjugate of a matrix T are defined by

(Tᵀ)ᵢⱼ ≡ Tⱼᵢ , (T*)ᵢⱼ ≡ (Tᵢⱼ)* , (T†)ᵢⱼ ≡ (Tⱼᵢ)* . (D.73)

A matrix T is said to be symmetric if

T = Tᵀ ↔ Tᵢⱼ = Tⱼᵢ , (D.74)

and Hermitian if

T = T† ↔ Tᵢⱼ = Tⱼᵢ* . (D.75)

Anti-symmetric and anti-Hermitian matrices are defined with minus signs in the last two equations.
Exercise D.10 Show that the transpose of a product of matrices equals the
product of transposes in the opposite order:
(ST)T = TT ST , (D.76)
(ST)† = T† S† . (D.77)
Note that these relations hold in general for the product of an m × n matrix and
an n × p matrix.
Exercise D.11 Show that the inner product (D.40) of two vectors A and B can
be written in terms of row and column matrices as
A · B = A† B . (D.78)
D.4.3.2 Determinants

The determinant of an n × n matrix T can be defined in terms of the n-dimensional Levi-Civita symbol,

det T ≡ Σ_{i₁, i₂, …, iₙ} ε_{i₁i₂⋯iₙ} T_{1i₁} T_{2i₂} · · · T_{niₙ} . (D.81)

See (A.7) for the 3-dimensional version of the Levi-Civita symbol, which enters the expression for the vector product of two 3-dimensional vectors.

Exercise D.12 (a) Work out the explicit expression for the determinant of a 3 × 3 matrix using the definition given in (D.81).
(b) Do the same using the earlier definition (D.80), and confirm that the two expressions you obtain agree with one another.
(c) Using the above definition (D.81), show that the determinant of an n × n matrix T is unchanged if you add a multiple of one row (or column) of T to another row (or column) before taking its determinant.
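A brute-force numerical cross-check of the Levi-Civita definition—not part of the original text—is sketched below, assuming numpy is available. The parity loop simply computes the value of ε for each permutation.

```python
import itertools
import numpy as np

def levi_civita_det(T):
    """det T = Σ over permutations p of sgn(p) · T[0,p(0)] T[1,p(1)] ⋯ , as in (D.81)."""
    n = T.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        sign = 1                        # parity of perm = ε_{i1...in}
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        total += sign * np.prod([T[row, perm[row]] for row in range(n)])
    return total

T = np.array([[2.0, -1.0, 0.0], [1.0, 3.0, 4.0], [0.5, 0.0, 1.0]])
print(levi_civita_det(T), np.linalg.det(T))   # the two values agree
```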
The unit (or identity) matrix 1 has components given by the Kronecker delta δᵢⱼ:

1 = [ 1 0 · · · 0 ; 0 1 · · · 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 · · · 1 ] . (D.83)
A matrix T is said to be invertible if and only if there exists another matrix T⁻¹, called the inverse matrix of T, such that

TT⁻¹ = T⁻¹T = 1 , (D.84)

or, equivalently,

Σₖ Tᵢₖ (T⁻¹)ₖⱼ = Σₖ (T⁻¹)ᵢₖ Tₖⱼ = δᵢⱼ . (D.85)
It turns out that a matrix is invertible if and only if its determinant is non-zero. An explicit expression for the inverse matrix is

T⁻¹ = (1/det T) Cᵀ , (D.86)

where C is the matrix of cofactors. The inverse of a product of two invertible matrices S and T is the product of the inverse matrices S⁻¹ and T⁻¹ in the opposite order,

(ST)⁻¹ = T⁻¹S⁻¹ . (D.87)
Exercise D.13 Calculate the inverse matrices for the general 2 × 2 and 3 × 3 matrices

[ a b ; c d ] , [ a b c ; d e f ; g h i ] . (D.88)
A real matrix T is said to be orthogonal if and only if its transpose equals its inverse:

Tᵀ = T⁻¹ ↔ Σₖ Tᵢₖ Tⱼₖ = δᵢⱼ , Σₖ Tₖᵢ Tₖⱼ = δᵢⱼ . (D.89)

A complex matrix T is said to be unitary if and only if its Hermitian conjugate equals its inverse:

T† = T⁻¹ ↔ Σₖ Tᵢₖ Tⱼₖ* = δᵢⱼ , Σₖ Tₖᵢ* Tₖⱼ = δᵢⱼ . (D.90)
Since the determinant of a product of matrices equals the product of the determinants, det(ST) = det S det T, it follows from (D.84) that

det(T⁻¹) = 1/det T = (det T)⁻¹ . (D.93)

Thus, the determinant is invariant under a similarity transformation (D.72):

det(STS⁻¹) = det S det T det(S⁻¹) = det S det T (det S)⁻¹ = det T . (D.94)
D.4.3.6 Trace

The trace of a matrix is the sum of its diagonal elements,

Tr(T) ≡ Σᵢ Tᵢᵢ . (D.95)

Since the trace of a product of matrices is invariant under cyclic permutations of the matrices, Tr(ST) = Tr(TS), it follows that

Tr(STS⁻¹) = Tr(S⁻¹ST) = Tr(T) . (D.97)

Thus, the trace of a matrix, like the determinant, is also invariant under a similarity transformation, (D.72).
The last topic that we will discuss in our review of linear algebra involves eigen-
vectors and eigenvalues of a linear transformation T. Eigenvectors of T are special
vectors, which are effectively unchanged by the action of T. By “effectively un-
changed” we mean that the eigenvector need only be mapped to itself up to an
overall proportionality factor, which is called the eigenvalue of the eigenvector. If
we denote an eigenvector of T by v and its eigenvalue by λ, then
Tv = λv . (D.98)
Note that the magnitude of the eigenvector v is not fixed by the above equation, as v′ ≡ av is also an eigenvector of T with the same eigenvalue λ.
Tv = λv , (D.99)

where v and T are the matrix representations of v and T with respect to the basis. This last equation is equivalent to

(T − λ1)v = 0 , (D.100)

where the right-hand side is the zero-vector 0 ≡ [0, 0, . . . , 0]ᵀ. Since this is a homogeneous equation, v = 0 is a (trivial) solution, and it is the only solution if (T − λ1) is invertible. Hence, a non-zero solution to this equation requires that the matrix (T − λ1) not be invertible or, equivalently, that

det(T − λ1) = 0 . (D.101)

This is called the characteristic equation for T. Written out, it is an nth-order polynomial equation in λ,

cₙλⁿ + cₙ₋₁λⁿ⁻¹ + · · · + c₁λ + c₀ = 0 , (D.102)

where the coefficients cᵢ are algebraic expressions involving the matrix elements Tᵢⱼ. By the fundamental theorem of algebra, this equation admits n complex roots λᵢ, which might be zero or repeated multiple times, in terms of which the polynomial factorizes as

(λ₁ − λ)(λ₂ − λ) · · · (λₙ − λ) = 0 . (D.103)
Example D.2 As a simple example, consider the matrix T = [ 0 1 ; 1 0 ] (cf. (D.124)). The characteristic equation det(T − λ1) = λ² − 1 = 0 has roots

λ₊ = 1 , λ₋ = −1 . (D.106)

For λ₊ = 1, the eigenvector equation (T − λ₊1)v = 0 gives the two component equations −v₁ + v₂ = 0 and v₁ − v₂ = 0, which (as expected) are linearly dependent on one another. The solution to these equations is

v₁ = v₂ . (D.109)

Normalizing, we can take v₊ = (1/√2)[1, 1]ᵀ; an identical calculation for λ₋ = −1 gives v₋ = (1/√2)[1, −1]ᵀ.
Note that these two eigenvectors have unit norm and are orthogonal to one another,

v₊† v₋ = (1/2) [1 1] [1, −1]ᵀ = (1/2)(1 − 1) = 0 . (D.112)
Thus, the corresponding vectors v+ , v− form an orthonormal basis for the (real-
valued) 2-dimensional vector space. But as we shall explain in the next subsection,
it is not always the case that the eigenvectors of an arbitrary matrix form a basis for
the vector space.
Exercise D.17 Find the eigenvectors and eigenvalues of the 2-dimensional rotation matrix

R(φ) = [ cos φ  sin φ ; −sin φ  cos φ ] . (D.113)
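The two calculations above are easy to reproduce numerically. The following is a minimal sketch—not part of the original text—assuming numpy is available; note that numpy does not guarantee any particular ordering of the returned eigenvalues.

```python
import numpy as np

T = np.array([[0.0, 1.0], [1.0, 0.0]])
vals, vecs = np.linalg.eig(T)
print(vals)            # eigenvalues ±1, as in (D.106)
print(vecs)            # columns proportional to (1,1)/√2 and (1,−1)/√2

phi = 0.7
R = np.array([[np.cos(phi), np.sin(phi)], [-np.sin(phi), np.cos(phi)]])
vals, vecs = np.linalg.eig(R)
print(np.sort_complex(vals))   # e^{∓iφ}: the eigenvalues are complex, so the
                               # rotation matrix has no real eigenvectors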
Suppose the eigenvectors of T form a basis {e₁, e₂, . . . , eₙ} for the vector space, so that

Teᵢ = λᵢeᵢ , i = 1, 2, . . . , n . (D.115)

Then the matrix representing T with respect to this basis is diagonal,

T′ = diag(λ₁, λ₂, . . . , λₙ) , (D.116)

with the eigenvalues λᵢ as its diagonal elements. The similarity transformation S relating T′ to the matrix T in the original basis, T′ = STS⁻¹, has

S⁻¹ = [ e₁ e₂ · · · eₙ ] = [ v₁ v₂ · · · vₙ ] , (D.118)

or, equivalently,

(S⁻¹)ᵢⱼ = (eⱼ)ᵢ . (D.119)

In other words, the columns of S⁻¹ are just the eigenvectors of T in the original basis.
Proof

(STS⁻¹)ᵢⱼ = Σ_{k,l} Sᵢₖ Tₖₗ (S⁻¹)ₗⱼ = Σ_{k,l} Sᵢₖ Tₖₗ (eⱼ)ₗ
          = Σₖ Sᵢₖ λⱼ (eⱼ)ₖ = λⱼ Σₖ Sᵢₖ (S⁻¹)ₖⱼ (D.120)
          = λⱼ δᵢⱼ = Tᵢⱼ′ ,

which is indeed the diagonal matrix (D.116). □
If, in addition to spanning the vector space, the eigenvectors of T are orthonormal, then the matrix S also has a simple form,

S = [ v₁† ; v₂† ; ⋮ ; vₙ† ] = [ v₁ v₂ · · · vₙ ]† , (D.121)

i.e., the rows of S are the Hermitian conjugates of the eigenvectors, so that

S† = S⁻¹ , (D.122)

and S is a unitary matrix.
Exercise D.18 Using the results of Example D.2, show explicitly that

S⁻¹ = [ v₊ v₋ ] = (1/√2) [ 1 1 ; 1 −1 ] (D.123)

diagonalizes

T = [ 0 1 ; 1 0 ] . (D.124)
Not all matrices can be diagonalized, however. Consider, for example,

T = [ 0 1 ; 0 0 ] . (D.125)

Its two eigenvalues λ₁, λ₂ are both equal to 0, and the corresponding eigenvectors v₁, v₂ are both proportional to [1, 0]ᵀ. Hence the eigenvectors span only a 1-dimensional subspace of the 2-dimensional vector space, and the similarity transformation S needed to map T to the diagonal matrix

[ λ₁ 0 ; 0 λ₂ ] = [ 0 0 ; 0 0 ] (D.126)

does not exist. Thus, the matrix given by (D.125) cannot be diagonalized.
Fortunately, there is a certain class of matrices that are guaranteed to be diagonal-
izable. These are Hermitian matrices, for which Ti j = T ji∗ . (For a real-valued vector
space, these matrices are symmetric, i.e., Ti j = T ji .) Not only do the eigenvectors
of a Hermitian matrix span the space, but the eigenvalues are real, and the eigen-
vectors corresponding to distinct eigenvalues are orthogonal to one another. These
results are especially relevant for quantum mechanics, where the observables of the
theory are represented by Hermitian transformations. (For proofs of these statements
regarding Hermitian matrices, and for an excellent introduction to quantum theory,
see Griffiths 2005.)
Exercise D.19 Diagonalize the Hermitian matrix

T = [ 1 i ; −i 1 ] (D.127)

by finding its eigenvalues and eigenvectors, etc. Verify that the similarity transformation that diagonalizes T is unitary.
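A minimal numerical check of this exercise—not part of the original text—is sketched below, assuming numpy is available. It uses numpy's Hermitian-specialized eigensolver, which returns real, ascending eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

T = np.array([[1.0, 1.0j], [-1.0j, 1.0]])   # the Hermitian matrix (D.127)
vals, vecs = np.linalg.eigh(T)              # eigh is specialized to Hermitian matrices
print(vals)                                 # real eigenvalues: [0. 2.]

Sinv = vecs                  # columns of S⁻¹ are the eigenvectors, as in (D.118)
S = Sinv.conj().T            # for orthonormal eigenvectors, S = (S⁻¹)†, as in (D.122)
print(np.allclose(S @ T @ Sinv, np.diag(vals)))   # True: STS⁻¹ is diagonal
print(np.allclose(S @ S.conj().T, np.eye(2)))     # True: S is unitary
```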
We end this section by showing that for any matrix T (diagonalizable or not), the determinant and trace of T can be written very simply in terms of its eigenvalues:

det T = Πᵢ λᵢ , Tr(T) = Σᵢ λᵢ . (D.128)
For a diagonalizable matrix, the above two results follow immediately from (D.116) for T′ and the fact that the determinant and trace of a matrix are invariant under a similarity transformation, (D.94) and (D.97). For a non-diagonalizable matrix, we proceed by first equating the expansion of the characteristic equation (D.102) in terms of powers of λ and its factorization (D.103) in terms of its eigenvalues:

cₙλⁿ + cₙ₋₁λⁿ⁻¹ + · · · + c₁λ + c₀ = (λ₁ − λ)(λ₂ − λ) · · · (λₙ − λ) . (D.129)

From this equality we can see that the constant term c₀ is given by

c₀ = Πᵢ λᵢ , (D.130)

while the coefficients of the two highest powers of λ are

cₙ = (−1)ⁿ , cₙ₋₁ = (−1)ⁿ⁻¹ Σᵢ λᵢ . (D.131)

On the other hand, setting λ = 0 in (D.102) gives

c₀ = det T , (D.133)

while expanding det(T − λ1) directly in terms of the matrix elements of T − λ1,

det(T − λ1) = Σ_{i₁, …, iₙ} ε_{i₁⋯iₙ} (T − λ1)_{1i₁} · · · (T − λ1)_{niₙ} , (D.134)

gives

cₙ = (−1)ⁿ , cₙ₋₁ = (−1)ⁿ⁻¹ Tr(T) . (D.135)

(To see this, note that the terms proportional to λⁿ and λⁿ⁻¹ in (D.134) must come from the product

(T₁₁ − λ)(T₂₂ − λ) · · · (Tₙₙ − λ) , (D.136)

which leads to (D.135).) Then by comparing (D.130) and (D.131) with (D.133) and (D.135), we get (D.128).
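The result (D.128) is easy to confirm numerically, even for a non-diagonalizable matrix such as (D.125). The snippet below is a minimal sketch, not part of the original text, assuming numpy is available.

```python
import numpy as np

for T in (np.array([[0.0, 1.0], [0.0, 0.0]]),          # non-diagonalizable (D.125)
          np.random.default_rng(0).normal(size=(4, 4))):
    lam = np.linalg.eigvals(T)
    print(np.allclose(np.prod(lam), np.linalg.det(T)),   # det T = Π λ_i
          np.allclose(np.sum(lam), np.trace(T)))         # Tr T = Σ λ_i
```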
Suggested References
Full references are given in the bibliography at the end of the book.
Boas (2006): Chapter 3 is devoted to linear algebra. The treatment is especially suited
for undergraduates, with many examples and problems.
Dennery and Krzywicki (1967): A mathematical methods book suited for advanced undergraduates and graduate students. Chapter 2 discusses finite-dimensional vector spaces; Chap. 3 extends the formalism to (infinite-dimensional) function spaces.
Griffiths (2005): Appendix A provides a review of linear algebra, especially relevant
for calculations that arise in quantum mechanics. Our presentation follows that of
Griffiths.
Halmos (1958): A classic text on vector spaces and linear algebra, written primarily
for undergraduate students majoring in mathematics. As such, the mathematical
rigor is higher than that in most mathematical methods books for scientists and
engineers.
Appendix E
Special Functions
Special functions play an important role in the physical sciences. They often arise as
power series solutions of ordinary differential equations, which in turn come from a
separation-of-variables decomposition of common partial differential equations (e.g.,
Laplace’s equation, Helmholtz’s equation, the wave equation, the diffusion equation,
· · · ). Special functions behave like vectors in an infinite-dimensional vector space,
sharing many of the properties of vectors described in Appendix D. Each set of special
functions is orthogonal with respect to an inner product defined as an appropriate
integral of a product of two such functions. Special functions also form a set of basis
functions in terms of which one can expand the general solution of the original partial
differential equation.
In this appendix, we review the key properties of several special functions, with
particular emphasis on those functions that appear often in classical mechanics
applications. We will assume that the reader is already familiar with the general
(Frobenius) method of power series solutions, which we describe very briefly in Ap-
pendix E.1. As such we will omit detailed derivations of the recursion relations for
the coefficients of the various power series solutions. For those details, you should
consult, e.g., Chap. 12 in Boas (2006). The definitive source for anything related to
special functions is Abramowitz and Stegun (1972).
Consider a 2nd-order linear homogeneous ordinary differential equation of the form

y″(x) + p(x) y′(x) + q(x) y(x) = 0 , (E.1)

where p(x) and q(x) are arbitrary functions of x. We are interested in power series solutions of the form

y(x) = Σ_{n=0}^{∞} aₙxⁿ or y(x) = x^σ Σ_{n=0}^{∞} aₙxⁿ (E.2)

for some value of σ. We need to consider the second (more general) power series expansion (i.e., with σ ≠ 0), called a Frobenius series, if x = 0 is a regular singular point of the differential equation—that is, if p(x) or q(x) is singular (i.e., infinite) at x = 0, but x p(x) and x² q(x) are finite at x = 0. If x = 0 is a regular point of the differential equation, then one can simply set σ = 0 and use the first expansion.
The basic procedure for finding a power series solution to (E.1) is to differentiate the power series expansion for y(x) term by term, and then substitute the expansion into the differential equation for y(x). Since the resulting sum must vanish for all values of x, the coefficients of xⁿ must all equal zero, leading to a recursion relation, which relates aₙ to some subset of the previous aᵣ (r < n), and a quadratic equation for σ, called the indicial equation. The following theorem, called Fuchs's theorem, tells us how to obtain the general solution of the differential equation from two Frobenius series solutions.

Theorem E.1 Fuchs's theorem: The general solution of the differential equation (E.1) with a regular singular point at x = 0 consists of either:
(i) a sum of two Frobenius series S₁(x) and S₂(x), or
(ii) the sum of one Frobenius series S₁(x), and a second solution of the form S₁(x) ln x + S₂(x), where S₂(x) is another Frobenius series.
Case (ii) occurs only if the roots of the indicial equation for σ are equal to one another or differ by an integer.

If x = 0 is a regular point of the differential equation, then the general solution is simply the sum of two ordinary series solutions.
Trigonometric and hyperbolic functions (e.g., cos θ , sin θ , cosh χ , sinh χ , etc.) can
be defined geometrically in terms of circles and hyperbolae. For example, cos θ is
the projection onto the x-axis of a point P on the unit circle making an angle θ with
respect to the x-axis. Here, instead, we define these functions in terms of power series
solutions to differential equations.
Sine and cosine arise as the two independent solutions of the differential equation

y″ + k²y = 0 . (E.3)

Since x = 0 is a regular point of this equation, substituting the first expansion in (E.2) leads to the recursion relation

aₙ₊₂ = −k²/[(n + 1)(n + 2)] aₙ . (E.4)

Thus, there are two independent solutions, one starting with a₀ and the other starting with a₁. If a₀ = A and a₁ = B, then the general solution to this equation is a linear superposition of sine and cosine functions,

y(x) = A cos kx + (B/k) sin kx , (E.5)
where

cos kx ≡ 1 − (kx)²/2! + (kx)⁴/4! − · · · ,
sin kx ≡ kx − (kx)³/3! + (kx)⁵/5! − · · · . (E.6)

These functions can be written in terms of complex exponentials using Euler's identity

e^{iθ} = cos θ + i sin θ , (E.7)

which can be inverted to yield explicit expressions for the cosine and sine functions:

cos θ = (e^{iθ} + e^{−iθ})/2 , sin θ = (e^{iθ} − e^{−iθ})/(2i) . (E.8)
The trig functions are periodic with period 2π, and form an orthogonal set of functions on the interval [−π, π]:

∫_{−π}^{π} dx sin(nx) sin(mx) = π δₙₘ ,
∫_{−π}^{π} dx cos(nx) cos(mx) = π δₙₘ , (E.9)
∫_{−π}^{π} dx sin(nx) cos(mx) = 0 .
Fig. E.1 The functions sin θ and cos θ plotted over the interval −π to π
This is a key property of trig functions used in Fourier expansions of periodic functions. Plots of sin θ and cos θ are given in Fig. E.1. Finally, from sine and cosine we can define other trig functions:

tan x ≡ sin x/cos x ≡ 1/cot x , sec x ≡ 1/cos x , csc x ≡ 1/sin x . (E.10)
Exercise E.2 Verify the orthogonality property of the sine and cosine functions,
(E.9).
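A numerical companion to this exercise—not part of the original text—is sketched below, assuming scipy is available, using quadrature to evaluate the overlap integrals in (E.9).

```python
import numpy as np
from scipy.integrate import quad

def overlap(f, g):
    val, _ = quad(lambda x: f(x) * g(x), -np.pi, np.pi)
    return val

for n in (1, 2, 3):
    for m in (1, 2, 3):
        ss = overlap(lambda x: np.sin(n * x), lambda x: np.sin(m * x))
        cc = overlap(lambda x: np.cos(n * x), lambda x: np.cos(m * x))
        sc = overlap(lambda x: np.sin(n * x), lambda x: np.cos(m * x))
        # expected: π δ_nm, π δ_nm, and 0, respectively
        print(n, m, round(ss, 8), round(cc, 8), round(sc, 8))
```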
The hyperbolic functions sinh and cosh similarly arise as solutions of the differential equation

y″ − k²y = 0 , (E.11)

which differs from (E.3) only in the sign of the k² term. In this case the recursion relation is

aₙ₊₂ = k²/[(n + 1)(n + 2)] aₙ , (E.12)

and the general solution to this equation is a linear combination of sinh and cosh functions:

y(x) = A cosh kx + B sinh kx , (E.13)
where

cosh kx ≡ 1 + (kx)²/2! + (kx)⁴/4! + · · · ,
sinh kx ≡ kx + (kx)³/3! + (kx)⁵/5! + · · · . (E.14)

These functions can be written in terms of real exponentials,

cosh x = (eˣ + e⁻ˣ)/2 , sinh x = (eˣ − e⁻ˣ)/2 , (E.15)

and trig functions,

cosh x = cos(ix) , sinh x = −i sin(ix) . (E.16)
From sinh and cosh we can define other hyperbolic functions, analogous to (E.10):

tanh x ≡ sinh x/cosh x ≡ 1/coth x , sech x ≡ 1/cosh x , csch x ≡ 1/sinh x . (E.17)
Legendre's equation is the differential equation

(1 − x²) y″ − 2x y′ + l(l + 1) y = 0 , (E.18)

where l is a constant. This ordinary differential equation arises when one uses separation of variables for Laplace's equation ∇²Φ = 0 in spherical coordinates (here x ≡ cos θ).

Fig. E.2 Plots of sinh x, cosh x, and tanh x

One can show that y(x) admits a regular power series solution with recursion relation

aₙ₊₂ = [n(n + 1) − l(l + 1)]/[(n + 1)(n + 2)] aₙ , n = 0, 1, · · · . (E.19)
Using the ratio test, it follows that the power series solution converges for |x| < 1.
But one can also show (Exercise E.4, part (c)) that the power series solution diverges
at x = ±1 (corresponding to the North and South poles of the sphere) unless the
series terminates after some finite value of n.
Exercise E.4 (a) Verify the recursion relation (E.19). (b) Show that for l = 0, the power series solution obtained by taking a₀ = 0 and a₁ = 1 is

y(x) = x + (1/3)x³ + (1/5)x⁵ + · · · . (E.20)

(c) Using the integral test, show that this solution diverges at x = 1 or x = −1.
Fig. E.3 First few Legendre polynomials Pₗ(x) plotted as functions of x ∈ [−1, 1]

For l = 0, 1, 2, · · · , the power series solution terminates after a finite number of terms, yielding a polynomial of degree l. Suitably normalized (so that Pₗ(1) = 1), these polynomial solutions are called the Legendre polynomials Pₗ(x). The first few are
P₀(x) = 1 ,
P₁(x) = x ,
P₂(x) = (1/2)(3x² − 1) , (E.21)
P₃(x) = (1/2)(5x³ − 3x) .
Figures E.3 and E.4 give two different graphical representations of the first few Leg-
endre polynomials. Note that Pl (−x) = (−1)l Pl (x).
Exercise E.6 (a) Show that one also obtains a polynomial solution if l is a
negative integer (l = −1, −2, · · · ). (b) Verify that these solutions are the same
as those for non-negative l (e.g., l = −1 yields the same solution as l = 0,
and l = −2 yields the same solution as l = 1, etc.). Thus, there is no loss of
generality in restricting attention to l = 0, 1, · · · .
Fig. E.4 The magnitude |Pₗ(cos θ)| of the first few Legendre polynomials, plotted in the xz-plane as functions of the angle θ measured with respect to the positive z-axis

The Legendre polynomials can also be obtained from Rodrigues' formula,

Pₗ(x) = [1/(2ˡ l!)] (d/dx)ˡ (x² − 1)ˡ . (E.22)
E.3.2.2 Orthogonality

The Legendre polynomials for different values of l are orthogonal to one another,

∫_{−1}^{1} dx Pₗ(x) Pₗ′(x) = [2/(2l + 1)] δₗₗ′ . (E.23)
Appendix E: Special Functions 513
Exercise E.7 Prove (E.23). (Hint: The proof of orthogonality is simple if you write down Legendre's equation for both Pₗ(x) and Pₗ′(x); multiply these equations by Pₗ′(x) and Pₗ(x), respectively; and then subtract and integrate the result between −1 and 1. The derivation of the normalization constant is harder, but can be proved using mathematical induction and Rodrigues' formula for Pₗ(x).)
E.3.2.3 Completeness

The Legendre polynomials are complete in the sense that any square-integrable function f(x) defined on the interval x ∈ [−1, 1] can be expanded in terms of Legendre polynomials:

f(x) = Σ_{l=0}^{∞} Aₗ Pₗ(x) , where Aₗ = [(2l + 1)/2] ∫_{−1}^{1} dx f(x) Pₗ(x) . (E.24)

For example, for the step function

f(x) = { +1 for 0 < x ≤ 1 ; −1 for −1 ≤ x < 0 } , (E.25)

one finds

f(x) = (3/2) P₁(x) − (7/8) P₃(x) + (11/16) P₅(x) + · · · . (E.26)
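The coefficients in (E.26) can be recovered numerically from (E.24). The following is a minimal sketch—not part of the original text—assuming scipy is available.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def A(l):
    """Expansion coefficient (E.24) for the step function f(x) = sign(x)."""
    val, _ = quad(lambda x: np.sign(x) * eval_legendre(l, x),
                  -1.0, 1.0, points=[0.0])
    return (2 * l + 1) / 2.0 * val

print([round(A(l), 6) for l in range(6)])
# [0.0, 1.5, 0.0, -0.875, 0.0, 0.6875], i.e., 3/2, −7/8, 11/16 for l = 1, 3, 5
```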
The Legendre polynomials can also be obtained as the coefficients of a power series expansion in t of a so-called generating function,

1/√(1 − 2xt + t²) = Σ_{n=0}^{∞} Pₙ(x) tⁿ . (E.27)
With this result, one can rather easily express 1/r potentials using a series of Legendre polynomials,

1/|r − r′| = Σ_{l=0}^{∞} (r<ˡ / r>^{l+1}) Pₗ(cos γ) , (E.28)

where r< (r>) is the smaller (larger) of r and r′, and γ is the angle between r and r′.
Using the generating function, one can derive the following relations, called recurrence relations,¹ which relate Legendre polynomials Pₙ(x) and their derivatives Pₙ′(x) to neighboring Legendre polynomials. For example, Legendre's equation (E.18) itself can be obtained by differentiating (E.30f) with respect to x and then using (E.30c). In addition, the normalization Pₙ(1) = 1 also follows simply from the generating function.
Exercise E.9 Prove the above recurrence relations by differentiating the gener-
ating function with respect to t and x separately, and then combining the various
expressions.
1 Most authors use either “recursion relation” or “recurrence relation” exclusively, and apply it to
any relation between indexed objects of different order. Here, we have decided to use “recurrence
relation” when describing relationships between special functions of different order, while using
“recursion relation” when describing relationships between the coefficients of the power series.
The associated Legendre equation is

(1 − x²) y″ − 2x y′ + [ l(l + 1) − m²/(1 − x²) ] y = 0 . (E.32)

It differs from the ordinary Legendre equation, (E.18), by the extra term proportional to m². It turns out that power series solutions of this differential equation also diverge at the poles (x = ±1) unless l = 0, 1, · · · (as before) and m = −l, −l + 1, . . . , l. The finite solutions are called associated Legendre functions, Pₗᵐ(x), and are given by derivatives of the Legendre polynomials,

Pₗᵐ(x) = (−1)ᵐ (1 − x²)^{m/2} (dᵐ/dxᵐ) Pₗ(x) , for m ≥ 0 ,
Pₗ⁻ᵐ(x) = (−1)ᵐ [(l − m)!/(l + m)!] Pₗᵐ(x) , for m < 0 . (E.33)
Exercise E.10 Prove by direct substitution that the above expression for Plm (x)
satisfies the associated Legendre equation (E.32).
Setting x = cos θ, the first few associated Legendre functions are:
l = 0:
P₀⁰(cos θ) = 1 , (E.34)

l = 1:
P₁⁰(cos θ) = cos θ ,
P₁¹(cos θ) = − sin θ , (E.35)

l = 2:
P₂⁰(cos θ) = (1/2)(3 cos²θ − 1) ,
P₂¹(cos θ) = −3 sin θ cos θ ,
P₂²(cos θ) = 3 sin²θ , (E.36)
Fig. E.5 The magnitude |Pₗᵐ(cos θ)| of the first few associated Legendre functions plotted as functions of cos θ in the xz (or yz) plane. The angle θ is measured with respect to the positive z-axis. Similar to the plot in Fig. E.4, the sign (i.e., ±) of the associated Legendre functions Pₗᵐ(cos θ) is lost in this graphical representation. Note that the scale changes for larger values of m
l = 3:
P₃⁰(cos θ) = (1/2)(5 cos³θ − 3 cos θ) ,
P₃¹(cos θ) = −(3/2) sin θ (5 cos²θ − 1) , (E.37)
P₃²(cos θ) = 15 (cos θ − cos³θ) ,
P₃³(cos θ) = −15 sin θ (1 − cos²θ) .
Plots of the magnitude of the first few of these functions are given in Fig. E.5.
Using Rodrigues’ formula for Legendre polynomials (E.22), we can write down
an analogous Rodrigues’ formula for associated Legendre functions, valid for both
positive and negative values of m:
(−1)m 2 m/2 d
l+m
Plm (x) = (1 − x ) (x 2 − 1)l . (E.38)
2l l! dx l+m
E.3.4.2 Orthonormality

For each m, the associated Legendre functions are orthogonal to one another,

∫_{−1}^{1} dx Pₗᵐ(x) Pₗ′ᵐ(x) = [2/(2l + 1)] [(l + m)!/(l − m)!] δₗₗ′ . (E.39)
E.3.4.3 Completeness

For each m, the associated Legendre functions form a complete set (in the index l) for square-integrable functions on x ∈ [−1, 1]:

f(x) = Σ_{l=0}^{∞} Aₗ Pₗᵐ(x) , where Aₗ = [(2l + 1)/2] [(l − m)!/(l + m)!] ∫_{−1}^{1} dx f(x) Pₗᵐ(x) . (E.40)
The spherical harmonics Yₗₘ(θ, φ) are products of the associated Legendre functions Pₗᵐ(cos θ) and the harmonic functions e^{imφ}:

Yₗₘ(θ, φ) ≡ Nₗₘ Pₗᵐ(cos θ) e^{imφ} , Nₗₘ ≡ √( [(2l + 1)/4π] [(l − m)!/(l + m)!] ) . (E.41)

The normalization constants Nₗₘ are chosen so that

∫_{S²} dΩ Yₗₘ*(θ, φ) Yₗ′ₘ′(θ, φ) = δₗₗ′ δₘₘ′ , (E.42)

where

dΩ ≡ d(cos θ) dφ = sin θ dθ dφ . (E.43)
This is the orthonormality condition for spherical harmonics. Note that for m = 0, spherical harmonics reduce to Legendre polynomials, up to a normalization factor:

Yₗ₀ = √( (2l + 1)/4π ) Pₗ(cos θ) . (E.44)
In addition, one can show that

Yₗ,₋ₘ(θ, φ) = (−1)ᵐ Yₗₘ*(θ, φ) (E.45)

and

Yₗₘ(π − θ, φ + π) = (−1)ˡ Yₗₘ(θ, φ) . (E.46)

The first equation tells you how to get Yₗ,₋ₘ from Yₗₘ; the second equation relates the values of the spherical harmonic Yₗₘ at antipodal (i.e., opposite) points on the 2-sphere.
The first few spherical harmonics are:
l = 0:
Y₀₀(θ, φ) = 1/√(4π) , (E.47)

l = 1:
Y₁₁(θ, φ) = −√(3/8π) sin θ e^{iφ} ,
Y₁₀(θ, φ) = √(3/4π) cos θ , (E.48)
Y₁,₋₁(θ, φ) = √(3/8π) sin θ e^{−iφ} ,
l = 2:
Y₂₂(θ, φ) = (1/4)√(15/2π) sin²θ e^{2iφ} ,
Y₂₁(θ, φ) = −√(15/8π) sin θ cos θ e^{iφ} ,
Y₂₀(θ, φ) = √(5/4π) [(3/2) cos²θ − 1/2] , (E.49)
Y₂,₋₁(θ, φ) = √(15/8π) sin θ cos θ e^{−iφ} ,
Y₂,₋₂(θ, φ) = (1/4)√(15/2π) sin²θ e^{−2iφ} .
Since Yₗₘ(θ, φ) differs from Pₗᵐ(cos θ) by only a constant multiplicative factor and a phase e^{imφ}, the magnitude |Yₗₘ(θ, φ)| has the same shape as |Pₗᵐ(cos θ)| (See Fig. E.5).
E.4.1.1 Completeness

Spherical harmonics are complete in the sense that any square-integrable function f(θ, φ) on the unit 2-sphere can be expanded in terms of spherical harmonics:

f(θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} Aₗₘ Yₗₘ(θ, φ) , where Aₗₘ = ∫_{S²} dΩ f(θ, φ) Yₗₘ*(θ, φ) . (E.50)
Completeness of the spherical harmonics is equivalent to the closure relation

Σ_{l=0}^{∞} Σ_{m=−l}^{l} Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) = δ(n̂, n̂′) , (E.51)

where n̂, n̂′ are unit vectors in the directions (θ, φ), (θ′, φ′), and δ(n̂, n̂′) is the 2-dimensional Dirac delta function on the 2-sphere:

δ(n̂, n̂′) = δ(cos θ − cos θ′) δ(φ − φ′) = (1/sin θ) δ(θ − θ′) δ(φ − φ′) . (E.53)
As an application, the general solution of Laplace's equation ∇²Φ = 0 in spherical coordinates can be expanded as

Φ(r, θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} [ Aₗₘ rˡ + Bₗₘ r^{−(l+1)} ] Yₗₘ(θ, φ) , (E.54)

where the term in square brackets is the general solution to the radial part of Laplace's equation.
If one sums only over m in (E.51), one obtains the so-called addition theorem of spherical harmonics,

Σ_{m=−l}^{l} Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) = [(2l + 1)/4π] Pₗ(cos γ) , (E.55)

where

cos γ ≡ n̂ · n̂′ = cos θ cos θ′ + sin θ sin θ′ cos(φ − φ′) . (E.56)
Substituting (E.55) back into (E.51) then gives

δ(n̂, n̂′) = Σ_{l=0}^{∞} [(2l + 1)/4π] Pₗ(cos γ) ,

which is an expansion of the Dirac delta function on the 2-sphere in terms of the Legendre polynomials.
Exercise E.12 Using the addition theorem, show that the 1/r potential for a point source can be written as

1/|r − r′| = Σ_{l=0}^{∞} Σ_{m=−l}^{l} [4π/(2l + 1)] (r<ˡ / r>^{l+1}) Yₗₘ*(θ′, φ′) Yₗₘ(θ, φ) , (E.58)

where r< (r>) is the smaller (larger) of r and r′. This expression is fully factorized into a product of functions of the unprimed and primed coordinates.
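The addition theorem (E.55) is easy to spot-check numerically. The snippet below is a minimal sketch, not part of the original text, assuming scipy is available; beware that scipy's sph_harm takes the azimuthal angle as its third argument and the polar angle as its fourth.

```python
import numpy as np
from scipy.special import sph_harm, eval_legendre

theta, phi = 0.7, 1.1          # (θ, φ) of the first point
thetap, phip = 2.0, -0.4       # (θ', φ') of the second point
l = 3

# left-hand side of (E.55): sum over m of Y*_{lm}(θ',φ') Y_{lm}(θ,φ)
lhs = sum(np.conj(sph_harm(m, l, phip, thetap)) * sph_harm(m, l, phi, theta)
          for m in range(-l, l + 1))

# right-hand side of (E.55), using (E.56) for cos γ
cosgam = (np.cos(theta) * np.cos(thetap)
          + np.sin(theta) * np.sin(thetap) * np.cos(phi - phip))
rhs = (2 * l + 1) / (4 * np.pi) * eval_legendre(l, cosgam)
print(lhs, rhs)    # imaginary part of lhs ≈ 0, and Re(lhs) ≈ rhs
```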
We can imagine rotating our coordinates through the Euler angles α, β, γ using the zyz form of the rotation matrix R(α, β, γ) (See Sect. 6.2.3.1). This is equivalent to a rotation of the 2-sphere. In this case, the spherical harmonics transform according to

Yₗₘ(θ′, φ′) = Σ_{m′=−l}^{l} Dˡₘ,ₘ′(α, β, γ) Yₗₘ′(θ, φ) , (E.59)

where (θ′, φ′) are the coordinates of a point P = (θ, φ) after the rotation of the sphere. The fact that Yₗₘ(θ′, φ′) can be written as a linear combination of the Yₗₘ′(θ, φ) with the same l is a consequence of the spherical harmonics being eigenfunctions of the (rotationally-invariant) Laplacian on the unit 2-sphere with eigenvalues depending only on l,

∇²₍₂₎ Yₗₘ(θ, φ) = −l(l + 1) Yₗₘ(θ, φ) , (E.60)

where

∇²₍₂₎ f(θ, φ) ≡ (1/sin θ) (∂/∂θ)(sin θ ∂f/∂θ) + (1/sin²θ) ∂²f/∂φ² . (E.61)
The coefficients Dˡₘ,ₘ′(α, β, γ) in (E.59) are called Wigner rotation matrices. They arise in applications of group theory to quantum mechanics (Wigner 1931). In terms of the Euler angles, the components of the Wigner rotation matrices can be written as

Dˡₘ,ₘ′(α, β, γ) = e^{−imα} dˡₘ,ₘ′(β) e^{−im′γ} , (E.62)

where

dˡₘ,ₘ′(β) ≡ √[(l + m)!(l − m)!(l + m′)!(l − m′)!]
  × Σₛ [ (−1)^{m−m′+s} / ( (l + m′ − s)! s! (m − m′ + s)! (l − m − s)! ) ]
  × (cos β/2)^{2l+m′−m−2s} (sin β/2)^{m−m′+2s} , (E.63)

and where the sum over s is chosen such that the factorials inside the summation always remain non-negative. The Wigner matrices also satisfy
Σ_{m′=−l}^{l} Dˡₘ,ₘ′(α, β, γ) Dˡₘ″,ₘ′*(α, β, γ) = δₘₘ″ , (E.64)

as a consequence of

∫_{S²} dΩ Yₗₘ*(θ′, φ′) Yₗ′ₘ′(θ′, φ′) = δₗₗ′ δₘₘ′ . (E.65)
Separating variables in Laplace's equation in cylindrical coordinates (ρ, φ, z) leads to a radial equation of the form

R″(ρ) + (1/ρ) R′(ρ) + (±k² − ν²/ρ²) R(ρ) = 0 . (E.66)

The two equations correspond to different choices for the sign of the separation constant, ±k². These equations can be put into more standard form by making a change of variables x ≡ kρ, with y(x)|ₓ₌ₖᵨ ≡ R(ρ):

y″(x) + (1/x) y′(x) + (1 − ν²/x²) y(x) = 0 ,
y″(x) + (1/x) y′(x) − (1 + ν²/x²) y(x) = 0 . (E.67)

The first equation is called Bessel's equation of order ν; the second is called the modified Bessel's equation of order ν.
Exercise E.13 Show that if y(x) is a solution of Bessel’s equation, then ȳ(x) ≡
y(ix) is a solution of the modified Bessel’s equation.
Since x = 0 is a regular singular point of Bessel's equation, we look for a Frobenius series solution of the form

y(x) = x^σ Σ_{n=0}^{∞} aₙxⁿ . (E.68)

Substituting this expansion into Bessel's equation and equating coefficients multiplying like powers of x leads to a quadratic equation for σ (called the indicial equation) and a recursion relation relating aₙ₊₂ to aₙ (and σ) for n = 0, 1, · · · . Setting a₁ = 0 (which forces all of the higher-order odd coefficients to vanish) and choosing the normalization coefficient a₀ appropriately, we obtain the solution

Jν(x) = Σ_{n=0}^{∞} [(−1)ⁿ / (n! Γ(n + 1 + ν))] (x/2)^{2n+ν} . (E.69)
Jν(x) is called a Bessel function of the 1st kind of order ν. The function Γ(n + 1 + ν), which appears in the denominator of the expansion coefficients, is the gamma function, defined by

Γ(z) ≡ ∫₀^{∞} dx x^{z−1} e^{−x} , Re(z) > 0 . (E.70)

It has the properties

Γ(n + 1) = n! for n = 0, 1, · · · ,
Γ(z + 1) = z Γ(z) for Re(z) > 0 . (E.71)
Exercise E.14 (a) Prove Γ(z + 1) = zΓ(z) for Re(z) > 0. (Hint: Integrate Γ(z + 1) by parts taking u = x^z and dv = e^{−x} dx.) (b) Show by explicit calculation that Γ(1) = 1 and Γ(1/2) = √π.
The asymptotic behavior of Jν(x) is given by

x ≪ 1: Jν(x) → [1/Γ(ν + 1)] (x/2)^ν ,
x ≫ 1, ν: Jν(x) → √(2/πx) cos(x − νπ/2 − π/4) . (E.72)

Thus, J₀(0) = 1 and Jν(0) = 0 for all ν ≠ 0; while for large x, Jν(x) behaves like a damped sinusoid, and has infinitely many zeros xνₙ:

Jν(xνₙ) = 0 , n = 1, 2, · · · . (E.73)
Fig. E.6 First few Bessel functions of the 1st kind for integer ν
Plots of the first few Bessel functions of the 1st kind for integer values of ν are given
in Fig. E.6.
Exercise E.15 Using (E.72), show that the zeros of Jν(x) are given approximately by

xνₙ ≈ nπ + (ν − 1/2) π/2 . (E.74)
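A quick comparison of the asymptotic estimate (E.74) with tabulated zeros—not part of the original text—is sketched below, assuming scipy is available (jn_zeros handles integer orders).

```python
import numpy as np
from scipy.special import jn_zeros

nu = 1
exact = jn_zeros(nu, 5)                 # first five zeros of J₁
approx = np.array([n * np.pi + (nu - 0.5) * np.pi / 2 for n in range(1, 6)])
print(exact.round(4))    # [ 3.8317  7.0156 10.1735 13.3237 16.4706]
print(approx.round(4))   # [ 3.927   7.0686 10.2102 13.3518 16.4934]
```

As expected for an asymptotic formula, the estimate improves as n increases.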
It is also possible to write Jₙ(x) for integer n as an integral involving trig functions,

Jₙ(x) = (1/π) ∫₀^{π} cos(nθ − x sin θ) dθ . (E.75)
This result is useful for finding a Fourier series solution to Kepler’s equation as
discussed in Sect. 4.3.4.
If ν is not an integer, then J₋ν(x) is the second independent solution to Bessel's equation. But if ν = m is an integer, then

J₋ₘ(x) = (−1)ᵐ Jₘ(x) , (E.76)

so J₋ₘ(x) is not an independent solution for this case. A second solution, which is independent of Jν(x) for all values of ν (integer or not) is²

Nν(x) ≡ [Jν(x) cos(νπ) − J₋ν(x)] / sin(νπ) . (E.77)

Nν(x) is called a Neumann function (or a Bessel function of the 2nd kind). In some references, Nν(x) is denoted by Yν(x).
Note that for all ν, Nν (x) → −∞ as x → 0. In addition, just as we saw for Jν (x),
Nν (x) behaves for large x like a damped sinusoid, but is 90◦ out of phase with Jν (x).
Plots of the first few Bessel functions of the 2nd kind for integer values of ν are given
in Fig. E.7.
With Jν(x) and Nν(x) as the two independent solutions to Bessel's equation, it follows that the most general solution to the radial part of Laplace's equation in cylindrical coordinates (for the choice of positive separation constant +k²) is

R(ρ) = A Jν(kρ) + B Nν(kρ) , (E.78)

where A and B are constants.
² For ν = m an integer, one needs to use L'Hôpital's rule to show that the right-hand side of the expression defining Nₘ(x) is well-defined.
Fig. E.7 First few Bessel functions of the 2nd kind for integer ν
The following recurrence relations hold for either Jν(x), Nν(x), or any linear combination Cν(x) of these functions with constant coefficients:

Cν₋₁(x) + Cν₊₁(x) = (2ν/x) Cν(x) , (E.79)
Cν₋₁(x) − Cν₊₁(x) = 2 Cν′(x) . (E.80)
Bessel functions Jν(x) satisfy the following orthogonality and normalization conditions:

∫₀^{a} dρ ρ Jν(xνₙρ/a) Jν(xνₙ′ρ/a) = (a²/2) [Jν₊₁(xνₙ)]² δₙₙ′ , (E.81)

where xνₙ and xνₙ′ are the nth and n′th zeroes of Jν(x). Note that the orthogonality of Bessel functions is with respect to different arguments of a single function Jν(x), and not with respect to different functions Jν(x) and Jν′(x) of the same argument. (This latter case held for the Legendre polynomials Pₗ(x) and Pₗ′(x).) Thus, the orthogonality of Bessel functions is similar to the orthogonality of the sine functions sin(2nπx/a) on the interval [0, a] for different values of n.
If the interval [0, a] becomes infinite, [0, ∞), then the orthogonality and normalization conditions actually become simpler,

∫₀^{∞} dρ ρ Jν(kρ) Jν(k′ρ) = (1/k) δ(k − k′) , (E.82)

where k now takes on a continuous range of values. This is similar to the transition from Fourier series (basis functions e^{ikₙx} with kₙ = 2nπ/a) to Fourier transforms (basis functions e^{ikx} with k a real variable):

∫_{−a/2}^{a/2} dx e^{i2π(n−n′)x/a} = a δₙₙ′ → ∫_{−∞}^{∞} dx e^{i(k−k′)x} = 2π δ(k − k′) . (E.83)
Exercise E.16 Prove the orthogonality part of (E.81). (Hint: Let f(ρ) = Jν(xνₙρ/a) and g(ρ) = Jν(xνₙ′ρ/a) with n′ ≠ n. Then write down Bessel's equation for both f and g; multiply these equations by g and f, respectively; then subtract and integrate.)

Exercise E.17 Prove the normalization part of (E.81). (Hint: You will need to integrate by parts and then use Bessel's equation to substitute for x²Jν(x) in one of the integrals.)
Fig. E.8 First few modified Bessel functions of the 1st kind for integer ν
It differs from the ordinary Bessel’s equation only in the sign of one of the terms
multiplying y(x). Modified (or hyperbolic) Bessel functions (of the 1st and 2nd
kind) are solutions to the above equation. They are defined by
π ν+1 (1)
Iν (x) ≡ i−ν Jν (ix) , K ν (x) ≡ i Hν (ix) . (E.85)
2
Note the pure imaginary arguments on the right-hand side of the above definitions,
consistent with our earlier statement that if y(x) is a solution of Bessel’s equation then
y(ix) is a solution of the modified Bessel’s equation. Plots of the first few modified
Bessel functions of the first and second kind, Iν (x) and K ν (x), for integer values of
ν are given in Figs. E.8 and E.9.
The asymptotic behavior of the modified Bessel functions Iν(x) and Kν(x) is given by

Fig. E.9 First few modified Bessel functions of the 2nd kind for integer ν
x ≪ 1: Iν(x) → [1/Γ(ν + 1)] (x/2)^ν ,
        Kν(x) → −[ln(x/2) + 0.5772 · · · ] for ν = 0 ,
        Kν(x) → [Γ(ν)/2] (2/x)^ν for ν ≠ 0 , (E.86)
x ≫ 1, ν: Iν(x) → [1/√(2πx)] eˣ [1 + O(1/x)] ,
          Kν(x) → √(π/2x) e^{−x} [1 + O(1/x)] .
Thus, I₀(0) = 1 and Iν(0) = 0 for all ν ≠ 0, while Kν(x) → ∞ as x → 0 for all ν. For large x, Iν(x) → ∞ while Kν(x) → 0 for all ν.
Given Iν(x) and Kν(x), the most general solution to the radial part of Laplace's equation for the choice of negative separation constant −k² is

R(ρ) = A Iν(kρ) + B Kν(kρ) , (E.87)

where A and B are constants.
Spherical Bessel functions (of the 1st and 2nd kind) are defined in terms of ordinary Bessel functions via

jₙ(x) ≡ √(π/2x) J_{n+1/2}(x) , nₙ(x) ≡ √(π/2x) N_{n+1/2}(x) , (E.88)

where n = 0, 1, 2, · · · . Given the explicit form of J_{n+1/2}(x), one can show that

jₙ(x) = xⁿ [ −(1/x)(d/dx) ]ⁿ (sin x / x) ,
nₙ(x) = −xⁿ [ −(1/x)(d/dx) ]ⁿ (cos x / x) . (E.89)

In particular,

j₀(x) = sin x / x , n₀(x) = −cos x / x . (E.90)
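The closed forms (E.90) are easy to verify against library routines. The snippet below is a minimal sketch, not part of the original text, assuming scipy is available (scipy's spherical_yn is the Neumann-type function denoted nₙ here).

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

x = np.linspace(0.5, 10.0, 20)
print(np.allclose(spherical_jn(0, x), np.sin(x) / x))     # True: j₀ = sin x / x
print(np.allclose(spherical_yn(0, x), -np.cos(x) / x))    # True: n₀ = −cos x / x
```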
Plots of the first few spherical Bessel functions are given in Figs. E.10 and E.11.
Fig. E.10 First few spherical Bessel functions of the 1st kind
Fig. E.11 First few spherical Bessel functions of the 2nd kind
Exercise E.18 Verify (E.90) for j₀(x) directly from its definition in terms of the ordinary Bessel function J₁/₂(x).

Given the relationship between jₙ(x) and J_{n+1/2}(x), one can show that the spherical Bessel functions satisfy the differential equation

jₙ″(x) + (2/x) jₙ′(x) + [1 − n(n + 1)/x²] jₙ(x) = 0 . (E.91)
This equation arises when separating variables in the Helmholtz equation ∇²f + k²f = 0 in spherical coordinates, writing f = R(r)Θ(θ)Φ(φ). The φ equation is the standard harmonic oscillator equation with separation constant −m²; the θ equation is the associated Legendre's equation with separation constants l and m; and the radial equation is

R″(r) + (2/r) R′(r) + [k² − l(l + 1)/r²] R(r) = 0 . (E.93)

Making the change of variables x ≡ kr, with y(x)|ₓ₌ₖᵣ ≡ R(r), converts this to

y″(x) + (2/x) y′(x) + [1 − l(l + 1)/x²] y(x) = 0 ,

which is the differential equation (E.91) we found earlier, with solution y(x) = jₗ(x).
Elliptic integrals and elliptic functions arise in some simple applications, such as
finding the length of a conic section (e.g., an ellipse) and solving for the motion of
a simple pendulum when one goes beyond the small-angle approximation. In the
following two subsections, we briefly define elliptic integrals and elliptic functions
using the notation given in Chap. 12 of Boas 2006. Other references may use slightly
different notation.
Elliptic Integrals

Elliptic integrals of the 1st and 2nd kind are often written in two different forms: the Legendre forms,

F(\phi, k) \equiv \int_0^{\phi} \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}} , \quad 0 \le k \le 1 ,
\qquad E(\phi, k) \equiv \int_0^{\phi} \sqrt{1 - k^2 \sin^2\theta}\; d\theta , \quad 0 \le k \le 1 ,    (E.96)

and the Jacobi forms,

F(\phi, k) \equiv \int_0^{x} \frac{dt}{\sqrt{1 - k^2 t^2}\,\sqrt{1 - t^2}} , \quad 0 \le k \le 1 ,
\qquad E(\phi, k) \equiv \int_0^{x} \frac{\sqrt{1 - k^2 t^2}}{\sqrt{1 - t^2}}\; dt , \quad 0 \le k \le 1 ,    (E.97)

with x ≡ sin φ. The two arguments of these functions are called the amplitude φ and the modulus k. Note that the Jacobi and Legendre forms of elliptic integrals are related by the change of variables t = sin θ.
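Both forms, and the equivalence between them, can be checked by direct quadrature. The sketch below (not part of the original text) also flags a common pitfall: SciPy parametrizes its elliptic integrals by m = k², not by the modulus k itself.

```python
# Compare the Legendre form (E.96) and Jacobi form (E.97) of F(phi, k)
# by numerical integration, and against SciPy's ellipkinc(phi, m = k**2).
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipkinc

phi, k = 1.0, 0.6
x = np.sin(phi)

legendre, _ = quad(lambda t: 1.0 / np.sqrt(1 - k**2 * np.sin(t)**2), 0, phi)
jacobi, _ = quad(lambda t: 1.0 / (np.sqrt(1 - k**2 * t**2) * np.sqrt(1 - t**2)), 0, x)

print(legendre, jacobi, ellipkinc(phi, k**2))  # all three agree
```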
Complete elliptic integrals of the 1st and 2nd kind, K(k) and E(k), are defined by setting the amplitude φ = π/2 (or x = 1) in the above expressions:

K(k) \equiv F(\pi/2, k) , \qquad E(k) \equiv E(\pi/2, k) .    (E.98)
Exercise E.19 (a) Show that the arc length of an ellipse (x/a)^2 + (y/b)^2 = 1 from θ = φ_1 to θ = φ_2 can be written as
Exercise E.20 (a) Show that the period of a simple pendulum of mass m and length ℓ, released from rest at θ = θ_0, is given by

P(\theta_0) = 4 \sqrt{\frac{\ell}{g}}\; K(\sin(\theta_0/2)) .    (E.100)

Do not assume that the small-angle approximation is valid for this part of the problem. (Hint: Use conservation of total mechanical energy to find an equation for θ̇ in terms of θ and θ_0.) (b) Show that for θ_0 ≪ 1, the answer from part (a) reduces to

P(\theta_0) \approx 2\pi \sqrt{\frac{\ell}{g}} \left( 1 + \frac{1}{16}\theta_0^2 \right) ,    (E.101)

which in the limit of very small θ_0 is the small-angle approximation for the period of a simple pendulum, P ≈ 2π√(ℓ/g).
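To see the size of the finite-amplitude correction, here is a short sketch (not part of the original text; the values of g and ℓ are illustrative) comparing the exact period (E.100) with the expansion (E.101). Note again that SciPy's ellipk takes m = k² as its argument.

```python
# Exact pendulum period (E.100) vs. the small-angle expansion (E.101).
import numpy as np
from scipy.special import ellipk

g, ell = 9.81, 1.0  # illustrative values (SI units)

def period_exact(theta0):
    # P = 4 sqrt(ell/g) K(sin(theta0/2)); ellipk expects m = k**2
    return 4 * np.sqrt(ell / g) * ellipk(np.sin(theta0 / 2)**2)

def period_expansion(theta0):
    return 2 * np.pi * np.sqrt(ell / g) * (1 + theta0**2 / 16)

for theta0 in (0.1, 0.5, 1.0):
    print(theta0, period_exact(theta0), period_expansion(theta0))
```

For θ_0 = 0.1 the two agree to better than a part in 10⁵, while by θ_0 = 1 rad the O(θ_0⁴) terms neglected in (E.101) become noticeable.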
Elliptic Functions

The elliptic function sn y is defined as the inverse of the elliptic integral y = F(φ, k) for a fixed value of k:

y = \int_0^{x} \frac{dt}{\sqrt{1 - k^2 t^2}\,\sqrt{1 - t^2}} \equiv \mathrm{sn}^{-1} x \quad\Leftrightarrow\quad x = \mathrm{sn}\, y .    (E.102)

Since x = sin φ, we can also write sn y = sin φ in terms of the amplitude φ. Note that the above definition of sn y is very similar to the integral representation of the inverse sine function,

y = \int_0^{x} \frac{dt}{\sqrt{1 - t^2}} = \sin^{-1} x \quad\Leftrightarrow\quad x = \sin y .    (E.103)
Indeed, sn y reduces to sin y when k = 0, and for general modulus k it is a periodic function of y, with period

P = 4K(k) ,    (E.104)

similar to the sine function. Plots of x = sn y for k² = 0, 0.25, 0.5, and 0.75 are shown in Fig. E.12. These have periods P = 6.28, 6.74, 7.42, and 8.63 to three significant digits.
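The quoted periods can be reproduced with a one-line loop (a sketch, not from the text), keeping in mind that SciPy's ellipk takes m = k²:

```python
# Periods of sn y for the moduli plotted in Fig. E.12: P = 4 K(k).
from scipy.special import ellipk

for m in (0.0, 0.25, 0.5, 0.75):
    print(m, 4 * ellipk(m))  # 6.28, 6.74, 7.42, 8.63 to three digits
```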
Fig. E.12 Plots of the elliptic function sn y for k² = 0, 0.25, 0.5, and 0.75. Recall that for k² = 0, sn y = sin y
Given sn y, one can define other elliptic functions using relations similar to those between trig functions:

\mathrm{cn}\, y \equiv \sqrt{1 - \mathrm{sn}^2 y} , \qquad \mathrm{dn}\, y \equiv \sqrt{1 - k^2\, \mathrm{sn}^2 y} .    (E.105)

Using the above definitions, it is easy to show that cn y = cos φ. In addition, using the Legendre form of the elliptic integral F(φ, k), it follows that dn y = dφ/dy. The proof is simply

\frac{d\phi}{dy} = \frac{1}{dy/d\phi} = \sqrt{1 - k^2 \sin^2\phi} = \sqrt{1 - k^2\, \mathrm{sn}^2 y} = \mathrm{dn}\, y .    (E.106)
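The relations (E.105) and (E.106) can likewise be spot-checked numerically (a sketch, not part of the original text). SciPy's ellipj(u, m) returns the tuple (sn, cn, dn, φ), with m = k² and φ the amplitude:

```python
# Check cn = sqrt(1 - sn^2), dn = sqrt(1 - k^2 sn^2), and dn = dphi/dy.
import numpy as np
from scipy.special import ellipj

m, y, h = 0.5, 1.3, 1e-6  # m = k**2; y chosen so that cn > 0
sn, cn, dn, phi = ellipj(y, m)

print(np.isclose(cn, np.sqrt(1 - sn**2)))      # True
print(np.isclose(dn, np.sqrt(1 - m * sn**2)))  # True

# dn = dphi/dy, (E.106), via a centered finite difference of the amplitude
dphi = (ellipj(y + h, m)[3] - ellipj(y - h, m)[3]) / (2 * h)
print(np.isclose(dn, dphi))                    # True
```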
Suggested References
Full references are given in the bibliography at the end of the book.
Abramowitz and Stegun (1972): A must-have reference for all things related to special
functions.
Boas (2006): Chapters 11, 12, and 13 discuss special functions, series solutions of
differential equations, and partial differential equations, respectively, filling in
most of the details omitted in this appendix. An excellent introduction to these
topics, especially suited for undergraduates. There are many examples and prob-
lems to choose from.
Mathews and Walker (1970): Chapters 1, 7, and 8 discuss ordinary differential equa-
tions, special functions, and partial differential equations, respectively. The level
of this text is more appropriate for graduate students or mathematically-minded
undergraduates.
References
B.P. Abbott, R. Abbott, T.D. Abbott, M.R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R.X. Adhikari et al., Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116(6), 061102 (2016). https://doi.org/10.1103/PhysRevLett.116.061102
B.P. Abbott, R. Abbott, T.D. Abbott, M.R. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams,
P. Addesso, R.X. Adhikari et al., The basic physics of the binary black hole merger GW150914.
Annalen der Physik 529, 1600209 (2017). https://doi.org/10.1002/andp.201600209
M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions (Dover Publications Inc, New
York, 1972). ISBN 0-486-61272-4
G. Arfken, Mathematical Methods for Physicists (Academic Press Inc, New York, 1970)
V.I. Arnold, Mathematical Methods of Classical Mechanics, vol. 60, Graduate Texts in Mathematics
(Springer, New York, 1978). ISBN 0-387-90314-3
M. Benacquista, An Introduction to the Evolution of Single and Binary Stars (Springer, New York,
Heidelberg, Dordrecht, London, 2013)
R.E. Berg, D.G. Stork, The Physics of Sound, 3rd edn. (Pearson Prentice Hall, Englewood Cliffs,
New Jersey, 2005). ISBN 978-0131457898
J. Bertrand, C.R. Acad. Sci. 77, 849–853 (1873)
M.L. Boas, Mathematical Methods in the Physical Sciences, 3rd edn. (John Wiley & Sons Inc,
United States of America, 2006). ISBN 0-471-19826-9
H. Bondi, Relativity and Common Sense: A New Approach to Einstein (Dover Publications Inc,
New York, 1962). ISBN 0-486-24021-5
P. Dennery, A. Krzywicki, Mathematics for Physicists (Dover Publications Inc, Mineola, New York, 1967)
S. Dutta, S. Ray, Bead on a rotating circular hoop: a simple yet feature-rich dynamical system. ArXiv e-prints (December 2011)
A. Einstein, Zur Elektrodynamik bewegter Körper [On the electrodynamics of moving bodies]. Annalen der Physik 322, 891–921 (1905). https://doi.org/10.1002/andp.19053221004
L.A. Fetter, J.D. Walecka, Theoretical Mechanics of Particles and Continua (McGraw-Hill Book
Company, United States of America, 1980)
R.P. Feynman, Surely You're Joking, Mr. Feynman! Adventures of a Curious Character (W.W. Norton & Company, New York, London, 1985). ISBN 0-393-31604-1
R.P. Feynman, R.B. Leighton, M. Sands, The Feynman Lectures on Physics, vol. II (Addison-Wesley Publishing Company, Reading, Massachusetts, 1964). ISBN 0-201-02117-X-P
H. Flanders, Differential Forms with Applications to the Physical Sciences (Dover Publications Inc,
New York, 1963). ISBN 0-486-66169-5
M.R. Flannery, The enigma of nonholonomic constraints. Am. J. Phys. 73, 265–272 (2005). https://doi.org/10.1119/1.1830501
I.M. Gelfand, S.V. Fomin, Calculus of Variations (Dover Publications Inc, Mineola, New York,
1963). ISBN 0-486-41448-5. (Translated and Edited by Richard A. Silverman)
H. Goldstein, C. Poole, J. Safko, Classical Mechanics, 3rd edn. (Addison Wesley, San Francisco,
CA, 2002). ISBN 0-201-65702-3
D.J. Griffiths, Introduction to Electrodynamics, 3rd edn. (Pearson Prentice Hall, United States of
America, 1999). ISBN 0-13-805326-X
D.J. Griffiths, Introduction to Quantum Mechanics, 2nd edn. (Pearson Prentice Hall, United States
of America, 2005). ISBN 0-13-111892-7
P.R. Halmos, Finite-Dimensional Vector Spaces, 2nd edn. (D. Van Nostrand Company Inc, Princeton, New Jersey, 1958)
J.B. Hartle, Gravity: An Introduction to Einstein's General Relativity (Benjamin Cummings, illustrated edition, January 2003). ISBN 0805386629
H. Hertz, The Principles of Mechanics Presented in a New Form (Dover Publications Inc, New York, 2004). ISBN 978-0486495576 (The original German edition, Die Prinzipien der Mechanik in neuem Zusammenhange dargestellt, was published in 1894)
R.W. Hilditch, An Introduction to Close Binary Stars (Cambridge University Press, Cambridge,
2001)
K.V. Kuchař, Theoretical mechanics. Unpublished lecture notes (1995)
J.B. Kuipers, Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace, and Virtual Reality (Princeton University Press, 1999)
C. Lanczos, The Variational Principles of Mechanics, 4th edn. (Dover Publications Inc, New York,
1949). ISBN 0-486-65067-7
L.D. Landau, E.M. Lifshitz, Classical Theory of Fields, Course of Theoretical Physics, 4th edn., vol. 2 (Pergamon Press, Oxford, 1975). ISBN 0-08-025072-6
L.D. Landau, E.M. Lifshitz, Mechanics, Course of Theoretical Physics, 3rd edn., vol. 1 (Elsevier Ltd, Oxford, 1976). ISBN 978-0-7506-2896-9
J.B. Marion, S.T. Thornton, Classical Dynamics of Particles and Systems, 4th edn. (Saunders College
Publishing, United States of America, 1995). ISBN 0-03-097302-3
J. Mathews, R.L. Walker, Mathematical Methods of Physics (Benjamin/Cummings, United States
of America, 1970). ISBN 0-8053-7002-1
N.D. Mermin, It’s About Time: Understanding Einstein’s Relativity (Princeton University Press,
Princeton, New Jersey, 2005). ISBN 0-691-12201-6
E. Noether, Invariante Variationsprobleme [Invariant variation problems]. Nachr. d. König. Gesellsch. d. Wiss. zu Göttingen 1918, 235–257 (1918)
E. Noether, Invariant variation problems. Trans. Theory Stat. Phys. 1, 186–207 (1971). https://doi.org/10.1080/00411457108231446
J.D. Romano, R.H. Price, Why no shear in “Div, grad, curl, and all that”? Am. J. Phys. 80(6),
519–524 (2012). https://doi.org/10.1119/1.3688678
T.D. Rossing, P.A. Wheeler, R.M. Taylor, The Science of Sound, 3rd edn. (Addison Wesley, San
Francisco, 2002). ISBN 978-0805385656
F.C. Santos, V. Soares, A.C. Tort, An English translation of Bertrand's theorem. ArXiv e-prints (April 2007)
H.M. Schey, div, grad, curl and all that: An informal text on vector calculus, 3rd edn. (W.W. Norton
& Co., New York, 1996)
B. Schutz, A First Course in General Relativity (Cambridge University Press, May 2009). ISBN
9780521887052
B. Schutz, Geometrical Methods of Mathematical Physics (Cambridge University Press, Cambridge,
1980)
E.F. Taylor, J.A. Wheeler, Spacetime Physics: Introduction to special relativity (W.H. Freeman and
Company, New York, 1992)
J. Terrell, Invisibility of the Lorentz contraction. Phys. Rev. 116, 1041–1045 (1959). https://doi.org/10.1103/PhysRev.116.1041
C.G. Torre, Introduction to Classical Field Theory. All Complete Monographs (2016). http://digitalcommons.usu.edu/lib_mono/3/
E.P. Wigner, Gruppentheorie und ihre Anwendung auf die Quantenmechanik der Atomspektren [Group Theory and its Application to the Quantum Mechanics of Atomic Spectra] (Vieweg Verlag, Braunschweig, Germany, 1931)
Index

W
Wave equation, 299–302
  one-dimensional, 302–306
  three-dimensional, 316–320
Wave number, 290
Wave vector, 317, 390
Wedge product, 107, 437–439
Wigner rotation, 378, 400
Wigner rotation matrices, 521

Y
Yaw, 204
Yukawa potential, 148

Z
Zero-rest-mass particle, see photons
zyz convention, 203