Classical Mechanics
Joel A. Shapiro
October 5, 2010
Contents

1 Particle Kinematics
  1.1 Introduction
  1.2 Single Particle Kinematics
    1.2.1 Motion in configuration space
    1.2.2 Conserved Quantities
  1.3 Systems of Particles
    1.3.1 External and internal forces
    1.3.2 Constraints
    1.3.3 Generalized Coordinates for Unconstrained Systems
    1.3.4 Kinetic energy in generalized coordinates
  1.4 Phase Space
    1.4.1 Dynamical Systems
    1.4.2 Phase Space Flows

A Appendices
  A.1 εijk and cross products
    A.1.1 Vector Operations: δij and εijk
  A.2 The gradient operator
  A.3 Gradient in Spherical Coordinates
Chapter 1
Particle Kinematics
1.1 Introduction
Classical mechanics, narrowly defined, is the investigation of the motion of
systems of particles in Euclidean three-dimensional space, under the influence
of specified force laws, with the motion’s evolution determined by Newton’s
second law, a second order differential equation. That is, given certain laws
determining physical forces, and some boundary conditions on the positions
of the particles at some particular times, the problem is to determine the po-
sitions of all the particles at all times. We will be discussing motions under
specific fundamental laws of great physical importance, such as Coulomb’s
law for the electrostatic force between charged particles. We will also dis-
cuss laws which are less fundamental, because the motion under them can be
solved explicitly, allowing them to serve as very useful models for approxima-
tions to more complicated physical situations, or as a testbed for examining
concepts in an explicitly evaluatable situation. Techniques suitable for broad
classes of force laws will also be developed.
The formalism of Newtonian classical mechanics, together with investi-
gations into the appropriate force laws, provided the basic framework for
physics from the time of Newton until the beginning of the last century. The
systems considered had a wide range of complexity. One might consider a
single particle on which the Earth’s gravity acts. But one could also con-
sider systems as the limit of an infinite number of very small particles, with
displacements smoothly varying in space, which gives rise to the continuum
limit. One example of this is the consideration of transverse waves on a
stretched string, in which every point on the string has an associated degree
of freedom, its transverse displacement.
The scope of classical mechanics was broadened in the 19th century, in
order to consider electromagnetism. Here the degrees of freedom were not
just the positions in space of charged particles, but also other quantities,
distributed throughout space, such as the electric field at each point.
This expansion in the type of degrees of freedom has continued, and now in
fundamental physics one considers many degrees of freedom which correspond
to no spatial motion, but one can still discuss the classical mechanics of such
systems.
As a fundamental framework for physics, classical mechanics gave way
on several fronts to more sophisticated concepts in the early 1900’s. Most
dramatically, quantum mechanics has changed our focus from specific solu-
tions for the dynamical degrees of freedom as a function of time to the wave
function, which determines the probabilities that a system have particular
values of these degrees of freedom. Special relativity not only produced a
variation of the Galilean invariance implicit in Newton’s laws, but also is, at
a fundamental level, at odds with the basic ingredient of classical mechanics
— that one particle can exert a force on another, depending only on their
simultaneous but different positions. Finally general relativity brought out
the narrowness of the assumption that the coordinates of a particle are in a
Euclidean space, indicating instead not only that on the largest scales these
coordinates describe a curved manifold rather than a flat space, but also that
this geometry is itself a dynamical field.
Indeed, most of 20th century physics goes beyond classical Newtonian
mechanics in one way or another. As many readers of this book expect
to become physicists working at the cutting edge of physics research, and
therefore will need to go beyond classical mechanics, we begin with a few
words of justification for investing effort in understanding classical mechanics.
First of all, classical mechanics is still very useful in itself, and not just
for engineers. Consider the problems (scientific — not political) that NASA
faces if it wants to land a rocket on a planet. This requires an accuracy
of predicting the position of both planet and rocket far beyond what one
gets assuming Kepler’s laws, which is the motion one predicts by treating
the planet as a point particle influenced only by the Newtonian gravitational
field of the Sun, also treated as a point particle. NASA must consider other
effects, and either demonstrate that they are ignorable or include them into
the calculations. These include
1.2 Single Particle Kinematics
can write down the force the spaceship feels at time t if it happens to be at
position \vec r,
\[
\vec F(\vec r, t) = -GmM_S \frac{\vec r - \vec R_S(t)}{|\vec r - \vec R_S(t)|^3}
- GmM_E \frac{\vec r - \vec R_E(t)}{|\vec r - \vec R_E(t)|^3}
- GmM_M \frac{\vec r - \vec R_M(t)}{|\vec r - \vec R_M(t)|^3}.
\]
For a charged particle in electric and magnetic fields, the force depends on
the velocity as well:
\[
\vec F(\vec r, \vec v, t) = q\vec E(\vec r, t) + q\,\vec v\times\vec B(\vec r, t). \tag{1.2}
\]
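Equation (1.2) is easy to evaluate numerically. The sketch below is my own illustration (not from the text), using NumPy's cross product with arbitrary field values chosen for the check:

```python
import numpy as np

def lorentz_force(q, E, v, B):
    """Force on a charge q in fields E and B, per Eq. (1.2)."""
    return q * E + q * np.cross(v, B)

# Illustrative values (not from the text): unit charge, crossed fields.
q = 1.0
E = np.array([0.0, 0.0, 1.0])
B = np.array([0.0, 1.0, 0.0])
v = np.array([1.0, 0.0, 0.0])

F = lorentz_force(q, E, v, B)   # v x B = (0, 0, 1), so F = (0, 0, 2)
```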
In his second law, Newton stated the effect of a force as producing a rate
of change of momentum, which we would write as
\[
\vec F = \frac{d\vec p}{dt},
\]
Energy
Consider a particle under the influence of an external force F~ . In general,
the momentum will not be conserved, although if any cartesian component
of the force vanishes along the motion, that component of the momentum
¹The relationship of momentum to velocity is changed in these extensions, however.
²Phase space is discussed further in section 1.4.
will be conserved. Also the kinetic energy, defined as T = \frac12 m\vec v^{\,2}, will not
in general be conserved, because
\[
\frac{dT}{dt} = m\dot{\vec v}\cdot\vec v = \vec F\cdot\vec v.
\]
As the particle moves from the point ~ri to the point ~rf the total change in
the kinetic energy is the work done by the force F~ ,
\[
\Delta T = \int_{\vec r_i}^{\vec r_f} \vec F\cdot d\vec r.
\]
If the force law F~ (~r, p~, t) applicable to the particle is independent of time
and velocity, then the work done will not depend on how quickly the particle
moved along the path from \vec r_i to \vec r_f. If in addition the work done is inde-
pendent of the path taken between these points, so it depends only on the
endpoints, then the force is called a conservative force and we associate
with it a potential energy
\[
U(\vec r) = U(\vec r_0) + \int_{\vec r}^{\vec r_0} \vec F(\vec r\,')\cdot d\vec r\,',
\]
Thus the requirement that the integral of F~ · d~r vanish around any closed
path is equivalent to the requirement that the curl of F~ vanish everywhere
in space.
By considering an infinitesimal path from \vec r to \vec r + \Delta\vec r, we see that
\[
U(\vec r + \Delta\vec r) - U(\vec r) = -\vec F\cdot\Delta\vec r,
\qquad\text{or}\qquad
\vec F(\vec r) = -\vec\nabla U(\vec r).
\]
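The relation \vec F = -\vec\nabla U can be checked numerically for any conservative force. The sketch below is my own illustration (not from the text): it uses the spring potential U = \frac12 k r^2, for which \vec F = -k\vec r, and a central-difference gradient:

```python
import numpy as np

k = 2.0

def U(r):
    """Spring potential U(r) = k r^2 / 2 (illustrative choice)."""
    return 0.5 * k * np.dot(r, r)

def neg_gradient(U, r, h=1e-6):
    """Central-difference approximation to -grad U at the point r."""
    g = np.zeros_like(r)
    for i in range(len(r)):
        dr = np.zeros_like(r)
        dr[i] = h
        g[i] = -(U(r + dr) - U(r - dr)) / (2 * h)
    return g

r = np.array([0.3, -1.2, 0.7])
F = neg_gradient(U, r)          # should agree with -k r
```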
Angular momentum
Another quantity which is often useful because it may be conserved is the an-
gular momentum. The definition requires a reference point in the Euclidean
space, say \vec r_0. Then a particle at position \vec r with momentum \vec p has an angu-
lar momentum about \vec r_0 given by \vec L = (\vec r - \vec r_0)\times\vec p. Very often we take the
reference point \vec r_0 to be the same as the point we have chosen as the origin
in converting the Euclidean space to a vector space, so \vec r_0 = 0, and
\[
\vec L = \vec r\times\vec p, \qquad
\frac{d\vec L}{dt} = \frac{d\vec r}{dt}\times\vec p + \vec r\times\frac{d\vec p}{dt}
= \frac1m\,\vec p\times\vec p + \vec r\times\vec F = 0 + \vec\tau = \vec\tau,
\]
where we have defined the torque about \vec r_0 as \vec\tau = (\vec r - \vec r_0)\times\vec F in general,
and \vec\tau = \vec r\times\vec F when our reference point \vec r_0 is at the origin.
1.3 Systems of Particles
We see that if the torque ~τ (t) vanishes (at all times) the angular momen-
tum is conserved. This can happen not only if the force is zero, but also if
the force always points to the reference point. This is the case in a central
force problem such as motion of a planet about the sun.
where mi is the mass of the i’th particle. Here we are assuming forces have
identifiable causes, which is the real meaning of Newton’s second law, and
that the causes are either individual particles or external forces. Thus we are
assuming there are no “three-body” forces which are not simply the sum of
“two-body” forces that one object exerts on another.
Define the center of mass and total mass
\[
\vec R = \frac{\sum_i m_i\vec r_i}{\sum_i m_i}, \qquad M = \sum_i m_i.
\]
Then if we define the total momentum
\[
\vec P = \sum_i \vec p_i = \sum_i m_i\vec v_i = \frac{d}{dt}\sum_i m_i\vec r_i = M\frac{d\vec R}{dt},
\]
we have
\[
\frac{d\vec P}{dt} = \dot{\vec P} = \sum_i \dot{\vec p}_i = \sum_i \vec F_i
= \sum_i \vec F_i^E + \sum_{ij}\vec F_{ji}.
\]
If Newton's Third Law holds, \vec F_{ji} = -\vec F_{ij}, so the double sum over internal
forces vanishes, and with \vec F^E = \sum_i \vec F_i^E the total external force,
\[
\dot{\vec P} = \vec F^E. \tag{1.3}
\]
Thus the internal forces cancel in pairs in their effect on the total momentum,
which changes only in response to the total external force. As an obvious
but very important consequence3 the total momentum of an isolated system
is conserved.
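One can watch this cancellation happen in a toy simulation. The sketch below is my own illustration (not from the text): two masses coupled by an internal spring force obeying \vec F_{21} = -\vec F_{12}, stepped with a crude Euler integrator. The individual momenta change, but the total stays constant to rounding error:

```python
import numpy as np

m1, m2, k, dt = 1.0, 3.0, 5.0, 1e-3   # illustrative parameter values
x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 0.5])
p1, p2 = np.array([0.2, -0.4]), np.array([-1.0, 0.3])

P0 = p1 + p2                      # initial total momentum
for _ in range(5000):
    F12 = -k * (x1 - x2)          # internal force on particle 1 from 2
    # Newton's Third Law: particle 2 feels exactly -F12
    p1 = p1 + F12 * dt
    p2 = p2 - F12 * dt
    x1 = x1 + (p1 / m1) * dt
    x2 = x2 + (p2 / m2) * dt
# p1 and p2 have each changed, but p1 + p2 has not
```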
The total angular momentum is also just a sum over the individual an-
gular momenta, so for a system of point particles,
\[
\vec L = \sum_i \vec L_i = \sum_i \vec r_i\times\vec p_i.
\]
Its rate of change with time is
\[
\frac{d\vec L}{dt} = \dot{\vec L}
= \sum_i \vec v_i\times\vec p_i + \sum_i \vec r_i\times\vec F_i
= 0 + \sum_i \vec r_i\times\vec F_i^E + \sum_{ij}\vec r_i\times\vec F_{ji}.
\]
³There are situations and ways of describing them in which the law of action and
reaction seems not to hold. For example, a current i_1 flowing through a wire segment d\vec s_1
contributes, according to the law of Biot and Savart, a magnetic field d\vec B = \mu_0\, i_1\, d\vec s_1\times
\vec r/4\pi|r|^3 at a point \vec r away from the current element. If a current i_2 flows through a
segment of wire d\vec s_2 at that point, it feels a force
\[
\vec F_{12} = \frac{\mu_0}{4\pi}\, i_1 i_2\, \frac{d\vec s_2\times(d\vec s_1\times\vec r)}{|r|^3}
\]
due to element 1. On the other hand \vec F_{21} is given by the same expression with d\vec s_1 and
d\vec s_2 interchanged and the sign of \vec r reversed, so
\[
\vec F_{12} + \vec F_{21} = \frac{\mu_0\, i_1 i_2}{4\pi|r|^3}
\left[d\vec s_1(d\vec s_2\cdot\vec r) - d\vec s_2(d\vec s_1\cdot\vec r)\right],
\]
which is not generally zero.
One should not despair for the validity of momentum conservation. The Law of Biot
and Savart only holds for time-independent current distributions. Unless the currents form
closed loops, there will be a charge buildup and Coulomb forces need to be considered. If
the loops are closed, the total momentum will involve integrals over the two closed loops,
for which \oint\oint\left(\vec F_{12} + \vec F_{21}\right) can be shown to vanish. More generally, even the sum of the
momenta of the current elements is not the whole story, because there is momentum in
the electromagnetic field, which will be changing in the time-dependent situation.
The total external torque is naturally defined as
\[
\vec\tau = \sum_i \vec r_i\times\vec F_i^E,
\]
so we might ask if the last term vanishes due to the Third Law, which permits
us to rewrite \vec F_{ji} = \frac12\left(\vec F_{ji} - \vec F_{ij}\right). Then the last term becomes
\[
\sum_{ij}\vec r_i\times\vec F_{ji}
= \frac12\sum_{ij}\vec r_i\times\vec F_{ji} - \frac12\sum_{ij}\vec r_i\times\vec F_{ij}
= \frac12\sum_{ij}\vec r_i\times\vec F_{ji} - \frac12\sum_{ij}\vec r_j\times\vec F_{ji}
= \frac12\sum_{ij}\left(\vec r_i - \vec r_j\right)\times\vec F_{ji}.
\]
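The rearrangement above is a purely algebraic identity, valid whenever \vec F_{ij} = -\vec F_{ji}. The following sketch (my own, not from the text) verifies it for random antisymmetric pairwise forces:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
r = rng.normal(size=(n, 3))           # random particle positions
F = rng.normal(size=(n, n, 3))        # F[j, i] = force on particle i from j
F = F - np.swapaxes(F, 0, 1)          # enforce F_ji = -F_ij

lhs = sum(np.cross(r[i], F[j, i]) for i in range(n) for j in range(n))
rhs = 0.5 * sum(np.cross(r[i] - r[j], F[j, i])
                for i in range(n) for j in range(n))
# lhs and rhs agree, as the antisymmetrization argument predicts
```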
This is not automatically zero, but vanishes if one assumes a stronger form
of the Third Law, namely that the action and reaction forces between two
particles act along the line of separation of the particles. If the force law
is independent of velocity and rotationally and translationally symmetric,
there is no other direction for it to point. For spinning particles and magnetic
forces the argument is not so simple — in fact electromagnetic forces between
moving charged particles are really only correctly viewed in a context in which
the system includes not only the particles but also the fields themselves.
For such a system, in general the total energy, momentum, and angular
momentum of the particles alone will not be conserved, because the fields can
carry all of these quantities. But properly defining the energy, momentum,
and angular momentum of the electromagnetic fields, and including them in
the totals, will result in quantities conserved as a result of symmetries of the
underlying physics. This is further discussed in section 8.3.
Making the assumption that the strong form of Newton’s Third Law
holds, we have shown that
\[
\vec\tau = \frac{d\vec L}{dt}. \tag{1.4}
\]
The conservation laws are very useful because they permit algebraic so-
lution for part of the velocity. Taking a single particle as an example, if
E = \frac12 mv^2 + U(\vec r) is conserved, the speed |v(t)| is determined at all times
(as a function of \vec r) by one arbitrary constant E. Similarly if \vec L is conserved,
\[
\vec L = \sum_i m_i\vec r\,'_i\times\dot{\vec r}\,'_i
+ \Bigl(\sum_i m_i\vec r\,'_i\Bigr)\times\dot{\vec R}
+ \vec R\times\sum_i m_i\dot{\vec r}\,'_i + M\vec R\times\dot{\vec R}
= \sum_i \vec r\,'_i\times\vec p\,'_i + \vec R\times\vec P.
\]
Here we have noted that \sum_i m_i\vec r\,'_i = 0, and also its derivative \sum_i m_i\vec v\,'_i = 0.
where \vec V = \dot{\vec R} is the velocity of the center of mass. The cross term vanishes
once again, because \sum_i m_i\vec v\,'_i = 0. Thus the kinetic energy of the system can
also be viewed as the sum of the kinetic energies of the constituents about
the center of mass, plus the kinetic energy the system would have if it were
collapsed to a particle at the center of mass.
If the forces on the system are due to potentials, the total energy will
be conserved, but this includes not only the potential due to the external
forces but also that due to interparticle forces, \sum_{ij} U_{ij}(\vec r_i, \vec r_j). In general this
contribution will not be zero or even constant with time, and the internal
potential energy will need to be considered. One exception to this is the case
of a rigid body.
1.3.2 Constraints
A rigid body is defined as a system of n particles for which all the inter-
particle distances are constrained to fixed constants, |~ri − ~rj | = cij , and the
interparticle potentials are functions only of these interparticle distances. As
these distances do not vary, neither does the internal potential energy. These
interparticle forces cannot do work, and the internal potential energy may
be ignored.
The rigid body is an example of a constrained system, in which the gen-
eral 3n degrees of freedom are restricted by some forces of constraint which
place conditions on the coordinates ~ri , perhaps in conjunction with their mo-
menta. In such descriptions we do not wish to consider or specify the forces
themselves, but only their (approximate) effect. The forces are assumed to
be whatever is necessary to have that effect. It is generally assumed, as in
the case with the rigid body, that the constraint forces do no work under dis-
placements allowed by the constraints. We will consider this point in more
detail later.
If the constraints can be phrased so that they are on the coordinates
and time only, as Φi (~r1 , ...~rn , t) = 0, i = 1, . . . , k, they are known as holo-
nomic constraints. These constraints determine hypersurfaces in configu-
ration space to which all motion of the system is confined. In general this
hypersurface forms a 3n − k dimensional manifold. We might describe the
configuration point on this manifold in terms of 3n − k generalized coordi-
nates, qj , j = 1, . . . , 3n − k, so that the 3n − k variables qj , together with the
k constraint conditions Φi ({~ri }) = 0, determine the ~ri = ~ri (q1 , . . . , q3n−k , t)
ball obeys the constraint |~r| ≥ R. Such problems are solved by considering
the constraint with an equality (|~r| = R), but restricting the region of va-
lidity of the solution by an inequality on the constraint force (N ≥ 0), and
then supplementing with the unconstrained problem once the bug leaves the
surface.
In quantum field theory, anholonomic constraints which are functions of
the positions and momenta are further subdivided into first and second class
constraints à la Dirac, with the first class constraints leading to local gauge
invariance, as in Quantum Electrodynamics or Yang-Mills theory. But this
is heading far afield.
we are talking about a virtual change at the same time, these are related by
the chain rule
\[
\delta x_k = \sum_j \frac{\partial x_k}{\partial q_j}\,\delta q_j, \qquad
\delta q_j = \sum_k \frac{\partial q_j}{\partial x_k}\,\delta x_k
\qquad (\text{for } \delta t = 0). \tag{1.6}
\]
For the actual motion through time, or any variation where \delta t is not assumed
to be zero, we need the more general form,
\[
\delta x_k = \sum_j \frac{\partial x_k}{\partial q_j}\,\delta q_j + \frac{\partial x_k}{\partial t}\,\delta t, \qquad
\delta q_j = \sum_k \frac{\partial q_j}{\partial x_k}\,\delta x_k + \frac{\partial q_j}{\partial t}\,\delta t. \tag{1.7}
\]
where
\[
Q_j := \sum_k F_k\frac{\partial x_k}{\partial q_j}
= -\frac{\partial U(\{x(\{q\})\})}{\partial q_j}, \tag{1.9}
\]
so the generalized forces Q_j are given by the potential just as for ordinary cartesian coordinates
and their forces. Now we examine the kinetic energy
\[
T = \frac12\sum_i m_i\dot{\vec r}_i^{\,2} = \frac12\sum_j m_j\dot x_j^2,
\]
where the 3n values mj are not really independent, as each particle has the
same mass in all three dimensions in ordinary Newtonian mechanics5 . Now
\[
\dot x_j = \lim_{\Delta t\to 0}\frac{\Delta x_j}{\Delta t}
= \lim_{\Delta t\to 0}\left[\sum_k \left.\frac{\partial x_j}{\partial q_k}\right|_{q,t}\frac{\Delta q_k}{\Delta t}
+ \left.\frac{\partial x_j}{\partial t}\right|_q\right],
\]
where |q,t means that t and the q’s other than qk are held fixed. The last
term is due to the possibility that the coordinates xi (q1 , ..., q3n , t) may vary
with time even for fixed values of qk . So the chain rule is giving us
\[
\dot x_j = \frac{dx_j}{dt}
= \sum_k \left.\frac{\partial x_j}{\partial q_k}\right|_{q,t}\dot q_k
+ \left.\frac{\partial x_j}{\partial t}\right|_q. \tag{1.10}
\]
Substituting this into the kinetic energy gives
\[
T = \frac12\sum_{j,k,\ell} m_j\frac{\partial x_j}{\partial q_k}\frac{\partial x_j}{\partial q_\ell}\,\dot q_k\dot q_\ell
+ \sum_{j,k} m_j\frac{\partial x_j}{\partial q_k}\left.\frac{\partial x_j}{\partial t}\right|_q\dot q_k
+ \frac12\sum_j m_j\left(\left.\frac{\partial x_j}{\partial t}\right|_q\right)^2. \tag{1.11}
\]
What is the interpretation of these terms? Only the first term arises if the
relation between x and q is time independent. The second and third terms
are the sources of the ~r˙ · (~ω × ~r) and (~ω × ~r)2 terms in the kinetic energy
when we consider rotating coordinate systems6 .
⁵But in an anisotropic crystal, the effective mass of a particle might in fact be different
in different directions.
⁶This will be fully developed in section 4.2.
As an example, consider rotating polar coordinates, related to inertial cartesian coordinates by
\[
x_1 = r\cos(\theta + \omega t), \qquad x_2 = r\sin(\theta + \omega t),
\]
with inverse relations
\[
r = \sqrt{x_1^2 + x_2^2}, \qquad \theta = \sin^{-1}(x_2/r) - \omega t.
\]
So \dot x_1 = \dot r\cos(\theta+\omega t) - \dot\theta r\sin(\theta+\omega t) - \omega r\sin(\theta+\omega t), where the last term
is from \partial x_j/\partial t, and \dot x_2 = \dot r\sin(\theta+\omega t) + \dot\theta r\cos(\theta+\omega t) + \omega r\cos(\theta+\omega t). In
the square, things get a bit simpler: \sum\dot x_i^2 = \dot r^2 + r^2(\omega + \dot\theta)^2.
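The simplification of \sum\dot x_i^2 can be confirmed at arbitrary numerical values. A quick sketch of mine (not from the text), with illustrative values for the coordinates and velocities:

```python
from math import cos, sin, isclose

# Arbitrary illustrative values (not from the text)
r, theta, rdot, thetadot, omega, t = 1.7, 0.4, -0.3, 0.8, 2.0, 0.6

a = theta + omega * t
# Velocities from the chain rule, including the omega terms from dx/dt at fixed q
x1dot = rdot * cos(a) - thetadot * r * sin(a) - omega * r * sin(a)
x2dot = rdot * sin(a) + thetadot * r * cos(a) + omega * r * cos(a)

lhs = x1dot**2 + x2dot**2
rhs = rdot**2 + r**2 * (omega + thetadot)**2   # the simplified form
```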
We see that the form of the kinetic energy in terms of the generalized co-
ordinates and their velocities is much more complicated than it is in cartesian
inertial coordinates, where it is coordinate independent, and a simple diago-
nal quadratic form in the velocities. In generalized coordinates, it is quadratic
but not homogeneous7 in the velocities, and with an arbitrary dependence on
the coordinates. In general, even if the coordinate transformation is time in-
dependent, the form of the kinetic energy is still coordinate dependent and,
while a purely quadratic form in the velocities, it is not necessarily diagonal.
In this time-independent situation, we have
\[
T = \frac12\sum_{k\ell} M_{k\ell}(\{q\})\,\dot q_k\dot q_\ell,
\qquad\text{with}\quad
M_{k\ell}(\{q\}) = \sum_j m_j\frac{\partial x_j}{\partial q_k}\frac{\partial x_j}{\partial q_\ell}, \tag{1.12}
\]
where Mk` is known as the mass matrix, and is always symmetric but not
necessarily diagonal or coordinate independent.
The mass matrix is independent of the ∂xj /∂t terms, and we can un-
derstand the results we just obtained for it in our two-dimensional example
⁷It involves quadratic and lower order terms in the velocities, not just quadratic ones.
above,
\[
M_{11} = m, \qquad M_{12} = M_{21} = 0, \qquad M_{22} = mr^2,
\]
by considering the case without rotation, ω = 0. We can also derive this
expression for the kinetic energy in nonrotating polar coordinates by ex-
pressing the velocity vector \vec v = \dot r\hat e_r + r\dot\theta\hat e_\theta in terms of unit vectors in the
radial and tangential directions respectively. The coefficients of these unit
vectors can be understood graphically with geometric arguments. This leads
more quickly to \vec v^{\,2} = \dot r^2 + r^2\dot\theta^2, T = \frac12 m\dot r^2 + \frac12 mr^2\dot\theta^2, and the mass matrix
follows. Similar geometric arguments are usually used to find the form of the
kinetic energy in spherical coordinates, but the formal approach of (1.12)
enables us to find the form even in situations where the geometry is difficult
to picture.
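Formula (1.12) also lends itself to direct computation. As an illustration (my own sketch, not from the text), the mass matrix for plain polar coordinates x_1 = q_1\cos q_2, x_2 = q_1\sin q_2 can be built from a finite-difference Jacobian and compared with diag(m, mr^2):

```python
import numpy as np

m = 2.0                                   # illustrative mass

def x_of_q(q):
    """Cartesian coordinates from polar generalized coordinates (r, theta)."""
    r, th = q
    return np.array([r * np.cos(th), r * np.sin(th)])

def mass_matrix(q, h=1e-6):
    """M_kl = sum_j m_j (dx_j/dq_k)(dx_j/dq_l), via central differences."""
    nq = len(q)
    J = np.zeros((2, nq))                 # Jacobian dx_j / dq_k
    for k in range(nq):
        dq = np.zeros(nq)
        dq[k] = h
        J[:, k] = (x_of_q(q + dq) - x_of_q(q - dq)) / (2 * h)
    return m * J.T @ J                    # all particles share the mass m here

q = np.array([1.5, 0.7])                  # r = 1.5, theta = 0.7
M = mass_matrix(q)                        # expect [[m, 0], [0, m r^2]]
```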
It is important to keep in mind that when we view T as a function of
coordinates and velocities, these are independent arguments evaluated at a
particular moment of time. Thus we can ask independently how T varies as
we change xi or as we change ẋi , each time holding the other variable fixed.
Thus the kinetic energy is not a function on the 3n-dimensional configuration
space, but on a larger, 6n-dimensional space8 with a point specifying both
the coordinates {qi } and the velocities {q̇i }.
1.4 Phase Space

For a single particle, the coordinates of phase
space are the three components of \vec r and the three components of \vec p. At any
instant of time, the system is represented by a point in this space, called the
phase point, and that point moves with time according to the physical laws
of the system. These laws are embodied in the force function, which we now
consider as a function of p~ rather than ~v , in addition to ~r and t. We may
write these equations as
\[
\frac{d\vec r}{dt} = \frac{\vec p}{m}, \qquad
\frac{d\vec p}{dt} = \vec F(\vec r, \vec p, t).
\]
Note that these are first order equations, which means that the motion of
the point representing the system in phase space is completely determined10
by where the phase point is. This is to be distinguished from the trajectory
in configuration space, where in order to know the trajectory you must have
not only an initial point (position) but also its initial time derivative.
¹¹This is not to be confused with the simpler logistic map, which is a recursion relation
with the same form but with solutions displaying a very different behavior.
¹²This will be discussed in sections (6.3) and (6.6).
the motion of the system’s point. For example, consider a damped harmonic
oscillator with \vec F = -kx - \alpha p, for which the velocity function is
\[
\left(\frac{dx}{dt}, \frac{dp}{dt}\right) = \left(\frac{p}{m},\; -kx - \alpha p\right),
\]
Figure 1.1: Velocity field for undamped and damped harmonic oscillators,
and one possible phase curve for each system through phase space.
shown in Figure 1.1. The velocity field is everywhere tangent to any possible
path, one of which is shown for each case. Note that qualitative features of
the motion can be seen from the velocity field without any solving of the
differential equations; it is clear that in the damped case the path of the
system must spiral in toward the origin.
The paths taken by possible physical motions through the phase space of
an autonomous system have an important property. Because the rate and
direction with which the phase point moves away from a given point of phase
space is completely determined by the velocity function at that point, if the
system ever returns to a point it must move away from that point exactly as
it did the last time. That is, if the system at time T returns to a point in
phase space that it occupied at time t = 0, then its subsequent motion must be
just as it was, so \vec\eta(T + t) = \vec\eta(t), and the motion is periodic with period
T . This almost implies that the phase curve the object takes through phase
space must be nonintersecting13 .
In the non-autonomous case, where the velocity field is time dependent,
it may be preferable to think in terms of extended phase space, a 6n + 1
¹³An exception can occur at an unstable equilibrium point, where the velocity function
vanishes. The motion can just end at such a point, and several possible phase curves can
terminate at that point.
dimensional space with coordinates (~η , t). The velocity field can be extended
to this space by giving each vector a last component of 1, as dt/dt = 1. Then
the motion of the system is relentlessly upwards in this direction, though
still complex in the others. For the undamped one-dimensional harmonic
oscillator, the path is a helix in the three dimensional extended phase space.
Most of this book is devoted to finding analytic methods for exploring the
motion of a system. In several cases we will be able to find exact analytic
solutions, but it should be noted that these exactly solvable problems, while
very important, cover only a small set of real problems. It is therefore impor-
tant to have methods other than searching for analytic solutions to deal with
dynamical systems. Phase space provides one method for finding qualitative
information about the solutions. Another approach is numerical. Newton’s
Law, and more generally the equation (1.13) for a dynamical system, is a set
of ordinary differential equations for the evolution of the system’s position
in phase space. Thus it is always subject to numerical solution given an
initial configuration, at least up until such point that some singularity in the
velocity function is reached. One primitive technique which will work for all
such systems is to choose a small time interval of length ∆t, and use d~η /dt at
the beginning of each interval to approximate ∆~η during this interval. This
gives a new approximate value for ~η at the end of this interval, which may
then be taken as the beginning of the next.14
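The primitive scheme just described is the forward Euler method. Here is a minimal sketch for the damped oscillator of Figure 1.1 (the parameter values are my own illustration):

```python
# Forward Euler evolution of eta = (x, p) for F = -k x - alpha p.
m, k, alpha = 1.0, 1.0, 0.5      # illustrative parameters
dt, steps = 1e-3, 20000          # evolve to t = 20

x, p = 1.0, 0.0                  # initial conditions
E0 = 0.5 * p**2 / m + 0.5 * k * x**2
for _ in range(steps):
    dx = (p / m) * dt            # d eta/dt at the start of the interval...
    dp = (-k * x - alpha * p) * dt
    x, p = x + dx, p + dp        # ...approximates Delta eta over the interval

E = 0.5 * p**2 / m + 0.5 * k * x**2
# the damping term has drained almost all of the initial energy
```

For higher accuracy one would replace the Euler step with a fourth-order Runge-Kutta step, as the footnote suggests, without changing the overall structure of the loop.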
¹⁴This is a very unsophisticated method. The errors made in each step for \Delta\vec r and \Delta\vec p
are typically O(\Delta t)^2. As any calculation of the evolution from time t_0 to t_f will involve
a number ([t_f - t_0]/\Delta t) of time steps which grows inversely to \Delta t, the cumulative error
can be expected to be O(\Delta t). In principle therefore we can approach exact results for a
finite time evolution by taking smaller and smaller time steps, but in practice there are
other considerations, such as computer time and roundoff errors, which argue strongly in
favor of using more sophisticated numerical techniques, with errors of higher order in \Delta t.
Increasingly sophisticated methods can be generated which give cumulative errors of order
O((\Delta t)^n), for any n. A very common technique is called fourth-order Runge-Kutta, which
gives an error O((\Delta t)^5). These methods can be found in any text on numerical methods.
• Numerical solutions must be done separately for each value of the pa-
rameters (k, m, α) and each value of the initial conditions (x0 and p0 ).
Nonetheless, numerical solutions are often the only way to handle a real prob-
lem, and there has been extensive development of techniques for efficiently
and accurately handling the problem, which is essentially one of solving a
system of first order ordinary differential equations.
with time t along the velocity field, sweeping out a path in phase space called
the phase curve. The phase point ~η (t) is also called the state of the system
at time t. Many qualitative features of the motion can be stated in terms of
the phase curve.
Fixed Points
There may be points ~ηk , known as fixed points, at which the velocity func-
tion vanishes, V~ (~ηk ) = 0. This is a point of equilibrium for the system, for if
the system is at a fixed point at one moment, ~η (t0 ) = ~ηk , it remains at that
point. At other points, the system does not stay put, but there may be sets
of states which flow into each other, such as the elliptical orbit for the un-
damped harmonic oscillator. These are called invariant sets of states. In
a first order dynamical system¹⁵, the fixed points divide the line into intervals
which are invariant sets.
Even though a first-order system is smaller than any Newtonian system, it
is worthwhile discussing briefly the phase flow there. We have been assuming
the velocity function is a smooth function — generically its zeros will be first
order, and near the fixed point η0 we will have V (η) ≈ c(η − η0 ). If the
constant c < 0, dη/dt will have the opposite sign from η − η0 , and the system
will flow towards the fixed point, which is therefore called stable. On the
other hand, if c > 0, the displacement η − η0 will grow with time, and the
fixed point is unstable. Of course there are other possibilities: if V (η) = cη 2 ,
the fixed point η = 0 is stable from the left and unstable from the right. But
this kind of situation is somewhat artificial, and such a system is structually
unstable. What that means is that if the velocity field is perturbed by a
small smooth variation V (η) → V (η) + w(η), for some bounded smooth
function w, the fixed point at η = 0 is likely to either disappear or split
into two fixed points, whereas the fixed points discussed earlier will simply
be shifted by order in position and will retain their stability or instability.
Thus the simple zero in the velocity function is structurally stable. Note
that structual stability is quite a different notion from stability of the fixed
point.
In this discussion of stability in first order dynamical systems, we see that
generically the stable fixed points occur where the velocity function decreases
through zero, while the unstable points are where it increases through zero.
¹⁵Note that this is not a one-dimensional Newtonian system, which is a two dimensional
\vec\eta = (x, p) dynamical system.
Thus generically the fixed points will alternate in stability, dividing the phase
line into open intervals which are each invariant sets of states, with the points
in a given interval flowing either to the left or to the right, but never leaving
the open interval. The state never reaches the stable fixed point because the
time t = \int d\eta/V(\eta) \approx (1/c)\int d\eta/(\eta - \eta_0) diverges. On the other hand, in
so
\[
\begin{pmatrix}\dot x\\ \dot y\end{pmatrix}
= \begin{pmatrix}u & -v\\ v & u\end{pmatrix}
\begin{pmatrix}x\\ y\end{pmatrix},
\qquad\text{or}\qquad
\begin{aligned}
x &= Ae^{ut}\cos(vt + \phi),\\
y &= Ae^{ut}\sin(vt + \phi).
\end{aligned}
\]
Thus we see that the motion spirals in towards the fixed point if u is negative,
and spirals away from the fixed point if u is positive. Stability in these
directions is determined by the sign of the real part of the eigenvalue.
In general, then, stability in each subspace around the fixed point ~η0
depends on the sign of the real part of the eigenvalue. If all the real parts
are negative, the system will flow from anywhere in some neighborhood of
~η0 towards the fixed point, so limt→∞ ~η (t) = ~η0 provided we start in that
neighborhood. Then ~η0 is an attractor and is a strongly stable fixed point.
On the other hand, if some of the eigenvalues have positive real parts, there
are unstable directions. Starting from a generic point in any neighborhood
of ~η0 , the motion will eventually flow out along an unstable direction, and
the fixed point is considered unstable, although there may be subspaces
along which the flow may be into ~η0 . An example is the line x = y in the
hyperbolic fixed point case shown in Figure 1.2.
Some examples of two dimensional flows in the neighborhood of a generic
fixed point are shown in Figure 1.2. Note that none of these describe the
fixed point of the undamped harmonic oscillator of Figure 1.1. We have
discussed generic situations as if the velocity field were chosen arbitrarily
from the set of all smooth vector functions, but in fact Newtonian mechanics
imposes constraints on the velocity fields in many situations, in particular if
there are conserved quantities.
ẋ = −x + y, ẋ = −3x − y, ẋ = 3x + y, ẋ = −x − 3y,
ẏ = −2x − y. ẏ = −x − 3y. ẏ = x + 3y. ẏ = −3x − y.
Figure 1.2: Four generic fixed points for a second order dynamical system.
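The character of each fixed point in Figure 1.2 can be read off from the eigenvalues of the linearized velocity field. The sketch below (my own check, not from the text) classifies the four systems with NumPy:

```python
import numpy as np

# Linearizations of the four systems of Figure 1.2, in the order shown.
systems = {
    "stable spiral": [[-1.0,  1.0], [-2.0, -1.0]],
    "stable node":   [[-3.0, -1.0], [-1.0, -3.0]],
    "unstable node": [[ 3.0,  1.0], [ 1.0,  3.0]],
    "hyperbolic":    [[-1.0, -3.0], [-3.0, -1.0]],
}
eig = {name: np.linalg.eigvals(np.array(A)) for name, A in systems.items()}

# Stability in each direction is set by the sign of the real part.
spiral = eig["stable spiral"]   # complex pair with negative real parts
saddle = eig["hyperbolic"]      # real eigenvalues of opposite sign
```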
i.e. constant, in the vicinity of a fixed point, it is not possible for all points
to flow into the fixed point, and thus it is not strongly stable.
\[
U(x) = \int_x^0 (-kx')\,dx' = \frac12 kx^2,
\]
¹⁶A fixed point is stable if it is in arbitrarily small neighborhoods, each with the
property that if the system is in that neighborhood at one time, it remains in it at all later
times.
As an example of a conservative system with both stable and unstable fixed
points, consider a particle in one dimension with a cubic potential
U(x) = ax^2 - bx^3, as shown in Fig. 1.3. There is a stable equilibrium at
x_s = 0 and an unstable one at x_u = 2a/3b. Each has an associated fixed
point in phase space, an elliptic fixed point \eta_s = (x_s, 0) and a hyperbolic
fixed point \eta_u = (x_u, 0). The velocity field in phase space and several
possible orbits are shown. Near the stable equilibrium, the trajectories are
approximately ellipses, as they were for the harmonic oscillator, but for
larger energies they begin to feel the asymmetry of the potential, and the
orbits become egg-shaped.

Figure 1.3: Motion in a cubic potential.
If the system has total energy precisely U (xu ), the contour line crosses
itself. This contour actually consists of three separate orbits. One starts at
t → −∞ at x = xu , completes one trip through the potential well, and returns
as t → +∞ to x = xu . The other two are orbits which go from x = xu to
x = ∞, one incoming and one outgoing. For E > U (xu ), all the orbits start
and end at x = +∞. Note that generically the orbits deform continuously
as the energy varies, but at E = U (xu ) this is not the case — the character
of the orbit changes as E passes through U (xu ). An orbit with this critical
value of the energy is called a separatrix, as it separates regions in phase
space where the orbits have different qualitative characteristics.
Quite generally hyperbolic fixed points are at the ends of separatrices. In our case the contour E = U(xu) consists of four invariant sets of states, one of which is the point ηu itself, and the other three are the orbits described above.
Exercises
1.1 (a) Find the potential energy function U (~r) for a particle in the gravita-
tional field of the Earth, for which the force law is F~ (~r) = −GME m~r/r3 .
(b) Find the escape velocity from the Earth, that is, the minimum velocity a
particle near the surface can have for which it is possible that the particle will
eventually coast to arbitrarily large distances without being acted upon by any
force other than gravity. The Earth has a mass of 6.0 × 1024 kg and a radius of
6.4 × 106 m. Newton’s gravitational constant is 6.67 × 10−11 N · m2 /kg2 .
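As a quick numerical check of part (b), one can evaluate the escape velocity implied by energy conservation, ½mv² = G ME m/RE, using the constants quoted above (this sketch is an illustration, not part of the exercise statement):

```python
import math

G = 6.67e-11    # Newton's gravitational constant, N m^2 / kg^2
M_E = 6.0e24    # mass of the Earth, kg
R_E = 6.4e6     # radius of the Earth, m

# Setting (1/2) m v^2 = G M_E m / R_E at the surface gives the escape velocity.
v_escape = math.sqrt(2 * G * M_E / R_E)
print(v_escape)   # roughly 1.1e4 m/s
```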
1.2 (a) Consider a rocket of mass M(t) which expels exhaust at velocity ~u relative to the rocket, while subject to an external force F~(t). Show that its equation of motion is
M d~v/dt = F~(t) + ~u dM/dt.
(b) Suppose the rocket is in a constant gravitational field F~ = −M gêz for the
period during which it is burning fuel, and that it is fired straight up with constant
exhaust velocity (~u = −uêz ), starting from rest. Find v(t) in terms of t and M (t).
(c) Find the maximum fraction of the initial mass of the rocket which can escape
the Earth’s gravitational field if u = 2000m/s.
1.3 For a particle in two dimensions, we might use polar coordinates (r, θ) and
use basis unit vectors êr and êθ in the radial and tangent directions respectively to
describe more general vectors. Because this pair of unit vectors differ from point
to point, the êr and êθ along the trajectory of a moving particle are themselves
changing with time.
(a) Show that
d êr/dt = θ̇ êθ,   d êθ/dt = −θ̇ êr.
(b) Thus show that the derivative of ~r = rêr is
~v = ṙêr + rθ̇êθ,
~a = d~v/dt = (r̈ − rθ̇²)êr + (rθ̈ + 2ṙθ̇)êθ.
(d) Thus Newton's Law says the radial and tangential components of the force are Fr = êr · F~ = m(r̈ − rθ̇²) and Fθ = êθ · F~ = m(rθ̈ + 2ṙθ̇). Show that the generalized forces are Qr = Fr and Qθ = rFθ.
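Part (a) can be spot-checked numerically; the sketch below compares a centered finite-difference derivative of êr along an arbitrarily chosen trajectory θ(t) with the claimed result θ̇ êθ (the trajectory and evaluation point are made-up choices for illustration):

```python
import math

def er(th):  return (math.cos(th), math.sin(th))    # radial unit vector
def eth(th): return (-math.sin(th), math.cos(th))   # tangential unit vector

theta    = lambda t: 0.3 * t * t    # an arbitrary smooth trajectory theta(t)
thetadot = lambda t: 0.6 * t

t0, h = 1.7, 1e-6
# centered finite-difference estimate of d(er)/dt along the trajectory
numeric = tuple((a - b) / (2 * h)
                for a, b in zip(er(theta(t0 + h)), er(theta(t0 - h))))
# the claimed analytic result, thetadot * e_theta
analytic = tuple(thetadot(t0) * c for c in eth(theta(t0)))
print(numeric, analytic)   # the two agree to high accuracy
```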
1.4 Analyze the errors in the integration of Newton’s Laws in the simple Euler’s
approach described in section 1.4.1, where we approximated the change for x and p
in each time interval ∆t between ti and ti+1 by ẋ(t) ≈ ẋ(ti ), ṗ(t) ≈ F (x(ti ), v(ti )).
Assuming F to be differentiable, show that the error which accumulates in a finite time interval T is of order (∆t)¹.
1.5 Write a simple program to integrate the equation of the harmonic oscillator
through one period of oscillation, using Euler’s method with a step size ∆t. Do
this for several ∆t, and see whether the error accumulated in one period meets the
expectations of problem 1.4.
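A minimal version of such a program in Python might look as follows (the parameters and the error measure are choices of this sketch, not prescribed by the exercise):

```python
import math

def euler_period(dt, k=1.0, m=1.0):
    """Integrate the harmonic oscillator xdot = p/m, pdot = -k x through
    one period with Euler's method, starting from x = 1, p = 0, and
    return the error |x - 1| accumulated over that period."""
    T = 2 * math.pi / math.sqrt(k / m)   # the exact period
    n = int(round(T / dt))
    x, p = 1.0, 0.0
    for _ in range(n):
        # simultaneous update: both right-hand sides use the old values
        x, p = x + dt * p / m, p - dt * k * x
    return abs(x - 1.0)

# The error accumulated in one period should scale like (dt)^1, as in
# problem 1.4: halving dt should roughly halve it.
for dt in (0.01, 0.005, 0.0025):
    print(dt, euler_period(dt))
```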
1.6 Describe the one dimensional phase space for the logistic equation ṗ = bp −
cp2 , with b > 0, c > 0. Give the fixed points, the invariant sets of states, and
describe the flow on each of the invariant sets.
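For a concrete picture, the logistic flow can also be integrated numerically; with the arbitrary choices b = 1, c = 0.5, a trajectory started just above the fixed point at p = 0 flows to the stable fixed point at p = b/c:

```python
def flow(p, b=1.0, c=0.5):
    # right-hand side of the logistic equation: pdot = b p - c p^2
    return b * p - c * p * p

p, dt = 0.01, 0.01
for _ in range(5000):        # Euler-integrate out to t = 50
    p += dt * flow(p)
print(p)   # has settled at the stable fixed point b/c = 2
```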
1.7 Consider a pendulum of mass m at the end of a massless rod of length L, the other end of which is fixed, with motion restricted to a vertical plane, and let θ be the angle made with the downward direction. The phase space is then the Cartesian product of an interval of length 2π in θ with the real line for pθ. This can be plotted on a strip, with the understanding that the left and right edges are identified. To avoid having important points on the boundary, it would be well to plot this with θ ∈ [−π/2, 3π/2]. Describe the flow in this phase space, indicating the fixed points and typical trajectories.
1.8 Consider again the pendulum of mass m on a massless rod of length L,
with motion restricted to a fixed vertical plane, with θ, the angle made with the
downward direction, the generalized coordinate. Using the fact that the energy E
is a constant,
(a) Find dθ/dt as a function of θ.
(b) Assuming the energy is such that the mass comes to rest at θ = ±θ0 , find an
integral expression for the period of the pendulum.
(c) Show that the answer is 4√(L/g) K(sin²(θ0/2)), where
K(m) := ∫₀^{π/2} dφ/√(1 − m sin²φ)
is the complete elliptic integral of the first kind.
(Note: the circumference of an ellipse is 4aK(e2 ), where a is the semi-major axis
and e the eccentricity.)
(d) Show that K(m) is given by the power series expansion
K(m) = (π/2) Σₙ₌₀^∞ [(2n − 1)!!/(2n)!!]² mⁿ,
and give an estimate for the ratio of the period for θ0 = 60◦ to that for small
angles.
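For part (d), the series can be summed numerically; the sketch below (a rough check, with the truncation order chosen arbitrarily) estimates the requested ratio:

```python
import math

def K_series(m, nmax=60):
    """Complete elliptic integral of the first kind from the power series
    K(m) = (pi/2) * sum_{n>=0} [(2n-1)!!/(2n)!!]^2 m^n."""
    total, coeff = 0.0, 1.0          # coeff holds [(2n-1)!!/(2n)!!]^2
    for n in range(nmax):
        total += coeff * m**n
        coeff *= ((2*n + 1) / (2*n + 2))**2
    return 0.5 * math.pi * total

# Period ratio T(theta0)/T(small) = K(sin^2(theta0/2)) / K(0) for theta0 = 60 deg
ratio = K_series(math.sin(math.radians(30.0))**2) / K_series(0.0)
print(ratio)   # about 1.073: a 60-degree swing is some 7% slower
```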
1.9 As mentioned in the footnote in section 1.3, a current i1 flowing through a
wire segment d~s1 at ~s1 exerts a force
F~12 = (µ0/4π) i1 i2 [d~s2 × (d~s1 × ~r)] / |~r|³
on a current i2 flowing through a wire segment d~s2 at ~s2 , where ~r = ~s2 − ~s1 .
(a) Show, as stated in that footnote, that the sum of this force and its Newtonian
reaction force is
F~12 + F~21 = (µ0 i1 i2 / 4π|~r|³) [d~s1 (d~s2 · ~r) − d~s2 (d~s1 · ~r)],
which is not generally zero.
(b) Show that if the currents each flow around closed loops, the total force ∮∮ (F~12 + F~21) vanishes.
[Note: Eq. (A.7) of appendix (A.1) may be useful, along with Stokes’ theorem.]
Chapter 2

Lagrange's and Hamilton's Equations
mẍi = Fi .
The left hand side of this equation is determined by the kinetic energy func-
tion as the time derivative of the momentum pi = ∂T /∂ ẋi , while the right
hand side is a derivative of the potential energy, −∂U/∂xi . As T is indepen-
dent of xi and U is independent of ẋi in these coordinates, we can write both
sides in terms of the Lagrangian L = T − U , which is then a function of
both the coordinates and their velocities. Thus we have established
d/dt (∂L/∂ẋi) − ∂L/∂xi = 0,
which, once we generalize it to arbitrary coordinates, will be known as La-
grange’s equation. Note that we are treating L as a function of the 2N
independent variables xi and ẋi , so that ∂L/∂ ẋi means vary one ẋi holding
all the other ẋj and all the xk fixed. Making this particular combination
of T (~r˙) with U (~r) to get the more complicated L(~r, ~r˙) seems an artificial
construction for the inertial cartesian coordinates, but it has the advantage
of preserving the form of Lagrange’s equations for any set of generalized
coordinates.
As we did in section 1.3.3, we assume we have a set of generalized coor-
dinates {qj } which parameterize all of coordinate space, so that each point
may be described by the {qj } or by the {xi }, i, j ∈ [1, N ], and thus each set
may be thought of as a function of the other, and time:
qj = qj(x1, ..., xN, t),   xi = xi(q1, ..., qN, t).   (2.1)
We may consider L as a function of the generalized coordinates qj and their velocities q̇j, and by the chain rule
∂L/∂ẋi = Σj (∂L/∂qj)(∂qj/∂ẋi) + Σj (∂L/∂q̇j)(∂q̇j/∂ẋi).   (2.2)
The first term vanishes because qk depends only on the coordinates xk and t, but not on the ẋk.
t, but not on the ẋk . From the inverse relation to (1.10),
q̇j = Σi (∂qj/∂xi) ẋi + ∂qj/∂t,   (2.3)
we have
∂q̇j/∂ẋi = ∂qj/∂xi.
Using this in (2.2),
∂L/∂ẋi = Σj (∂L/∂q̇j)(∂qj/∂xi).   (2.4)
d/dt (∂L/∂ẋi) = Σj [d/dt (∂L/∂q̇j)] (∂qj/∂xi) + Σj (∂L/∂q̇j) (Σk (∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t).   (2.6)
∂L/∂xi = Σj (∂L/∂qj)(∂qj/∂xi) + Σj (∂L/∂q̇j)(∂q̇j/∂xi),
where the last term does not necessarily vanish, as q̇j in general depends on
both the coordinates and velocities. In fact, from 2.3,
∂q̇j/∂xi = Σk (∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t,
so
∂L/∂xi = Σj (∂L/∂qj)(∂qj/∂xi) + Σj (∂L/∂q̇j) (Σk (∂²qj/∂xi∂xk) ẋk + ∂²qj/∂xi∂t).   (2.7)
Lagrange’s equation in cartesian coordinates says (2.6) and (2.7) are equal,
and in subtracting them the second terms cancel², so
0 = Σj [d/dt (∂L/∂q̇j) − ∂L/∂qj] (∂qj/∂xi).
The matrix ∂qj /∂xi is nonsingular, as it has ∂xi /∂qj as its inverse, so we
have derived Lagrange’s Equation in generalized coordinates:
d/dt (∂L/∂q̇j) − ∂L/∂qj = 0.
Thus we see that Lagrange’s equations are form invariant under changes of
the generalized coordinates used to describe the configuration of the system.
It is primarily for this reason that this particular and peculiar combination
of kinetic and potential energy is useful. Note that we implicitly assume the Lagrangian itself transforms like a scalar, in that its value at a given physical point of configuration space is independent of the choice of generalized
coordinates that describe the point. The change of coordinates itself (2.1) is
called a point transformation.
²This is why we chose the particular combination we did for the Lagrangian, rather than L = T − αU for some α ≠ 1. Had we done so, Lagrange's equation in cartesian coordinates would have been α d(∂L/∂ẋj)/dt − ∂L/∂xj = 0, and in the subtraction of (2.7) from α×(2.6), the terms proportional to ∂L/∂q̇i (without a time derivative) would not have cancelled.
1. In section 1.3.2 we discussed a mass on a light rigid rod, the other end
of which is fixed at the origin. Thus the mass is constrained to have
|~r| = L, and the allowed subspace of configuration space is the surface
of a sphere, independent of time. The rod exerts the constraint force
to avoid compression or expansion. The natural assumption to make is
that the force is in the radial direction, and therefore has no component
in the direction of allowed motions, the tangential directions. That is,
for all allowed displacements, δ~r, we have F~ C ·δ~r = 0, and the constraint
force does no work.
where the first equality would be true even if δ~ri did not satisfy the constraints, but the second requires δ~ri to be an allowed virtual displacement. Thus
Σi (F~iD − p~˙i) · δ~ri = 0,   (2.8)
which is known as D’Alembert’s Principle. This gives an equation which
determines the motion on the constrained subspace and does not involve the
unspecified forces of constraint F C . We drop the superscript D from now on.
Suppose we know generalized coordinates q1 , . . . , qN which parameterize
the constrained subspace, which means ~ri = ~ri (q1 , . . . , qN , t), for i = 1, . . . , n,
are known functions and the N q’s are independent. There are N = 3n −
k of these independent coordinates, where k is the number of holonomic
constraints. Then ∂~ri/∂qj is no longer an invertible, or even square, matrix,
but we still have
∆~ri = Σj (∂~ri/∂qj) ∆qj + (∂~ri/∂t) ∆t.
For the velocity of the particle, divide this by ∆t, giving
~vi = Σj (∂~ri/∂qj) q̇j + ∂~ri/∂t,   (2.9)
but for a virtual displacement ∆t = 0 we have
δ~ri = Σj (∂~ri/∂qj) δqj.
Differentiating (2.9), we note that
∂~vi/∂q̇j = ∂~ri/∂qj,   (2.10)
and also
∂~vi/∂qj = Σk (∂²~ri/∂qj∂qk) q̇k + ∂²~ri/∂qj∂t = d/dt (∂~ri/∂qj),   (2.11)
where the last equality comes from applying (2.5), with coordinates qj rather
than xj , to f = ∂~ri /∂qj . The first term in the equation (2.8) stating
D’Alembert’s principle is
Σi F~i · δ~ri = Σj Σi F~i · (∂~ri/∂qj) δqj = Σj Qj δqj.
The generalized force Qj has the same form as in the unconstrained case, as
given by (1.9), but there are only as many of them as there are unconstrained
degrees of freedom.
The second term of (2.8) involves
Σi p~˙i · δ~ri = Σij (dp~i/dt) · (∂~ri/∂qj) δqj
  = Σj [d/dt (Σi p~i · ∂~ri/∂qj)] δqj − Σij p~i · [d/dt (∂~ri/∂qj)] δqj
  = Σj [d/dt (Σi p~i · ∂~vi/∂q̇j)] δqj − Σij p~i · (∂~vi/∂qj) δqj
  = Σj [d/dt (Σi mi~vi · ∂~vi/∂q̇j) − Σi mi~vi · ∂~vi/∂qj] δqj
  = Σj [d/dt (∂T/∂q̇j) − ∂T/∂qj] δqj,
where we used (2.10) and (2.11) to get the third line. Plugging in the ex-
pressions we have found for the two terms in D’Alembert’s Principle,
Σj [d/dt (∂T/∂q̇j) − ∂T/∂qj − Qj] δqj = 0.
We assumed we had a holonomic system and the q’s were all independent,
so this equation holds for arbitrary virtual displacements δqj , and therefore
d/dt (∂T/∂q̇j) − ∂T/∂qj − Qj = 0.   (2.12)
d/dt (∂L/∂ẋ) − ∂L/∂x = 0 = (m1 + m2 + I/r²)ẍ − (m2 − m1)g.
Notice that we set up our system in terms of only one degree of freedom, the
height of the first mass. This one degree of freedom parameterizes the line
which is the allowed subspace of the unconstrained configuration space, a
three dimensional space which also has directions corresponding to the angle
of the pulley and the height of the second mass. The constraints restrict
these three variables because the string has a fixed length and does not slip
on the pulley. Note that this formalism has permitted us to solve the problem
without solving for the forces of constraint, which in this case are the tensions
in the cord on either side of the pulley.
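Reading the acceleration off this equation of motion gives a one-line numerical sketch (the masses, moment of inertia, and radius below are arbitrary illustrative values):

```python
def atwood_acceleration(m1, m2, I, r, g=9.8):
    """Acceleration of mass 1 (upward positive) implied by
    (m1 + m2 + I/r**2) * xddot = (m2 - m1) * g."""
    return (m2 - m1) * g / (m1 + m2 + I / r**2)

print(atwood_acceleration(1.0, 1.2, 0.02, 0.1))
# Equal masses balance exactly, and pulley inertia only slows the motion:
assert atwood_acceleration(1.0, 1.0, 0.02, 0.1) == 0.0
assert atwood_acceleration(1.0, 1.2, 0.02, 0.1) < atwood_acceleration(1.0, 1.2, 0.0, 0.1)
```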
d/dt (∂L/∂ṙ) = mr̈ = ∂L/∂r = mrω²,
which looks like a harmonic oscillator with a negative spring constant, so the
solution is a real exponential instead of oscillating,
The velocity-independent term in T acts just like a potential would, and can
in fact be considered the potential for the centrifugal force. But we see that
the total energy T is not conserved but blows up as t → ∞, T ∼ mB 2 ω 2 e2ωt .
This is because the force of constraint, while it does no virtual work, does do
real work.
and T = ½m`²(θ̇² + sin²θ φ̇²). With an arbitrary potential U(θ, φ), the Lagrangian becomes
L = ½m`²(θ̇² + sin²θ φ̇²) − U(θ, φ).
From the two independent variables θ, φ there are two Lagrange equations of
motion,
m`²θ̈ = −∂U/∂θ + ½m`² sin(2θ) φ̇²,   (2.14)
d/dt (m`² sin²θ φ̇) = −∂U/∂φ.   (2.15)
Notice that this is a dynamical system with two coordinates, similar to ordi-
nary mechanics in two dimensions, except that the mass matrix, while diag-
onal, is coordinate dependent, and the space on which motion occurs is not
an infinite flat plane, but a curved two dimensional surface, that of a sphere.
These two distinctions are connected—the coordinates enter the mass ma-
trix because it is impossible to describe a curved space with unconstrained
cartesian coordinates.
Often the potential U (θ, φ) will not actually depend on φ, in which case
Eq. 2.15 tells us m`2 sin2 θφ̇ is constant in time. We will discuss this further
in Section 2.4.1.
The action depends on the starting and ending points q(t1 ) and q(t2 ), but
beyond that, the value of the action depends on the path, unlike the work
done by a conservative force on a point moving in ordinary space. In fact,
it is exactly this dependence on the path which makes this concept useful
— Hamilton’s principle states that the actual motion of the particle from
q(t1 ) = qi to q(t2 ) = qf is along a path q(t) for which the action is stationary.
That means that for any small deviation of the path from the actual one,
keeping the initial and final configurations fixed, the variation of the action
vanishes to first order in the deviation.
To find out where a differentiable function of one variable has a stationary
point, we differentiate and solve the equation found by setting the derivative
to zero. If we have a differentiable function f of several variables xi , the
first-order variation of the function is ∆f = Σi (xi − x0i) ∂f/∂xi|x0, so unless ∂f/∂xi|x0 = 0 for all i, there is some variation of the {xi} which causes a first order variation of f, and then x0 is not a stationary point.
But our action is a functional, a function of functions, which represent
an infinite number of variables, even for a path in only one dimension. In-
tuitively, at each time q(t) is a separate variable, though varying q at only
one point makes q̇ hard to interpret. A rigorous mathematician might want
to describe the path q(t) on t ∈ [0, 1] in terms of Fourier series, for which
q(t) = q0 + q1 t + Σₙ₌₁^∞ an sin(nπt).
Then the functional S(f) given by
S = ∫ f(q(t), q̇(t), t) dt
The variation of S under a change δq(t) of the path is
δS = ∫_{ti}^{tf} [(∂f/∂q) δq + (∂f/∂q̇) δq̇] dt = ∫_{ti}^{tf} [∂f/∂q − d/dt (∂f/∂q̇)] δq dt + [(∂f/∂q̇) δq]_{ti}^{tf},
where we integrated the second term by parts. The boundary terms each have a factor of δq at the initial or final point, which vanish because Hamilton tells us to hold the qi and qf fixed, and therefore the functional is stationary if and only if
∂f/∂q − d/dt (∂f/∂q̇) = 0 for t ∈ (ti, tf).   (2.17)
We see that if f is the Lagrangian, we get exactly Lagrange’s equation. The
above derivation is essentially unaltered if we have many degrees of freedom
qi instead of just one.
z(t) = ∆z(t) + ½gT t − ½gt².
We make no assumptions about this path other than that it is differentiable
and meets the boundary conditions x = y = ∆z = 0 at t = 0 and at t = T .
The action is
S = ∫₀ᵀ { ½m [ẋ² + ẏ² + (d∆z/dt)² + g(T − 2t)(d∆z/dt) + ¼g²(T − 2t)²] − mg∆z − ½mg²t(T − t) } dt.
The terms involving ∆z can be combined by integrating by parts:
∫₀ᵀ ½mg(T − 2t) (d∆z/dt) dt = ½mg(T − 2t)∆z |₀ᵀ + ∫₀ᵀ mg∆z(t) dt.
The boundary term vanishes because ∆z = 0 at t = 0 and at t = T, and the remaining integral cancels the −mg∆z term in the action, leaving a path-independent integral plus ∫₀ᵀ ½m [ẋ² + ẏ² + (d∆z/dt)²] dt.
The first integral is independent of the path, so the minimum action requires
the second integral to be as small as possible. But it is an integral of a non-
negative quantity, so its minimum is zero, requiring ẋ = ẏ = d∆z/dt = 0.
As x = y = ∆z = 0 at t = 0, this tells us x = y = ∆z = 0 at all times, and
the path which minimizes the action is the one we expect from elementary
mechanics.
We see that length ` is playing the role of the action, and x is playing the role of t. Using ẏ to represent dy/dx, we have the integrand f(y, ẏ, x) = √(1 + ẏ²),
and ∂f /∂y = 0, so Eq. 2.17 gives
d/dx (∂f/∂ẏ) = d/dx [ẏ/√(1 + ẏ²)] = 0,   so ẏ = const,
and the path is a straight line.
Linear Momentum
As a very elementary example, consider a particle under a force given by a
potential which depends only on y and z, but not x. Then
L = ½m(ẋ² + ẏ² + ż²) − U(y, z)
is independent of x, so x is an ignorable coordinate, and
Px = ∂L/∂ẋ = mẋ
is conserved. This is no surprise, of course, because the force is F~ = −∇U and Fx = −∂U/∂x = 0.
More generally, the generalized momentum conjugate to qk is
Pk = ∂L/∂q̇k,
and Lagrange's equation tells us
dPk/dt = ∂L/∂qk = ∂T/∂qk − ∂U/∂qk.
Only the last term enters the definition of the generalized force, so if the
kinetic energy depends on the coordinates, as will often be the case, it is
not true that dPk /dt = Qk . In that sense we might say that the generalized
momentum and the generalized force have not been defined consistently.
Angular Momentum
As a second example of a system with an ignorable coordinate, consider an
axially symmetric system described with inertial polar coordinates (r, θ, z),
with z along the symmetry axis. Extending the form of the kinetic energy
we found in sec (1.3.4) to include the z coordinate, we have T = 12 mṙ2 +
1
2
mr2 θ̇2 + 21 mż 2 . The potential is independent of θ, because otherwise the
system would not be symmetric about the z-axis, so the Lagrangian
L = ½mṙ² + ½mr²θ̇² + ½mż² − U(r, z)
does not depend on θ, which is therefore an ignorable coordinate, and
Pθ := ∂L/∂θ̇ = mr²θ̇ = constant.
We see that the conserved momentum Pθ is in fact the z-component of the
angular momentum, and is conserved because the axially symmetric potential
can exert no torque in the z-direction:
τz = −(~r × ~∇U)z = −r(~∇U)θ = −∂U/∂θ = 0.
Finally, consider a particle in a spherically symmetric potential in spher-
ical coordinates. In section (3.1.2) we will show that the kinetic energy in
spherical coordinates is T = ½mṙ² + ½mr²θ̇² + ½mr² sin²θ φ̇², so the Lagrangian with a spherically symmetric potential is
L = ½mṙ² + ½mr²θ̇² + ½mr² sin²θ φ̇² − U(r).
Again, φ is an ignorable coordinate and the conjugate momentum Pφ is
conserved. Note, however, that even though the potential is independent of
θ as well, θ does appear undifferentiated in the Lagrangian, and it is not an
ignorable coordinate, nor is Pθ conserved⁶.
If qj is an ignorable coordinate, not appearing undifferentiated in the
Lagrangian, any possible motion qj (t) is related to a different trajectory
qj0 (t) = qj (t) + c, in the sense that they have the same action, and if one
is an extremal path, so will the other be. Thus there is a symmetry of the
system under qj → qj + c, a continuous symmetry in the sense that c can
take on any value. As we shall see in Section 8.3, such symmetries generally
lead to conserved quantities. The symmetries can be less transparent than
an ignorable coordinate, however, as in the case just considered, of angular
momentum for a spherically symmetric potential, in which the conservation
of Lz follows from an ignorable coordinate φ, but the conservation of Lx and
Ly follow from symmetry under rotation about the x and y axes respectively,
and these are less apparent in the form of the Lagrangian.
dL/dt = Σi (∂L/∂qi)(dqi/dt) + Σi (∂L/∂q̇i)(dq̇i/dt) + ∂L/∂t.
We expect energy conservation when the potential is time invariant and there is no time dependence in the constraints, i.e. when ∂L/∂t = 0, so we rewrite this in terms of
H(q, q̇, t) = Σi q̇i (∂L/∂q̇i) − L = Σi q̇i Pi − L.
If we write the Lagrangian as L = L2 + L1 + L0, where Ln is homogeneous of degree n in the velocities, Euler's theorem gives Σi q̇i ∂Ln/∂q̇i = nLn, and
H = L2 − L0.
For a system of particles described by their cartesian coordinates, L2 is
just the kinetic energy T , while L0 is the negative of the potential energy
L0 = −U , so H = T + U is the ordinary energy. There are, however, con-
strained systems, such as the bead on a spoke of Section 2.2.1, for which the
Hamiltonian is conserved but is not the ordinary energy.
where for the first term we used the definition of the generalized momentum
and in the second we have used the equations of motion Ṗk = ∂L/∂qk . Then
examining the change in the Hamiltonian H = Σk Pk q̇k − L along this actual motion,
dH = Σk (Pk dq̇k + q̇k dPk) − dL
   = Σk (q̇k dPk − Ṗk dqk) − (∂L/∂t) dt.
The first two constitute Hamilton’s equations of motion, which are first
order equations for the motion of the point representing the system in phase
space.
Let's work out a simple example, the one dimensional harmonic oscillator. Here the kinetic energy is T = ½mẋ², the potential energy is U = ½kx², so
⁷In field theory there arise situations in which the set of functions Pk(qi, q̇i) cannot be inverted to give functions q̇i = q̇i(qj, Pj). This gives rise to local gauge invariance, and will be discussed in Chapter 8, but until then we will assume that the phase space (q, p), or cotangent bundle, is equivalent to the tangent bundle, i.e. the space of (q, q̇).
the Lagrangian is L = ½mẋ² − ½kx², the momentum is p = ∂L/∂ẋ = mẋ, and the Hamiltonian is H = pẋ − L = p²/2m + ½kx². Hamilton's equations give ẋ = ∂H/∂p = p/m and ṗ = −∂H/∂x = −kx. These two equations verify the usual connection of the momentum and velocity and give Newton's second law.
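The first-order character of Hamilton's equations makes them easy to integrate numerically. The sketch below uses the semi-implicit (symplectic) variant of Euler's method rather than the simple scheme of section 1.4.1, since it tracks the energy far better; all the constants are arbitrary choices:

```python
m, k = 1.0, 1.0
x, p = 1.0, 0.0
dt = 1e-3

def H(x, p):
    # the Hamiltonian of the one dimensional harmonic oscillator
    return p * p / (2 * m) + 0.5 * k * x * x

H0 = H(x, p)
for _ in range(20000):       # integrate to t = 20, a few periods
    p -= dt * k * x          # pdot = -dH/dx = -k x
    x += dt * p / m          # xdot = +dH/dp = p/m  (uses the updated p)
print(H0, H(x, p))           # the energy stays very nearly constant
```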
The identification of H with the total energy is more general than our
particular example. If T is purely quadratic in velocities, we can write T = ½ Σij Mij q̇i q̇j in terms of a symmetric mass matrix Mij. If in addition U is independent of velocities,
L = ½ Σij Mij q̇i q̇j − U(q)
and
Pk = ∂L/∂q̇k = Σi Mki q̇i.
H = Pᵀ · q̇ − L
  = Pᵀ · M⁻¹ · P − (½ q̇ᵀ · M · q̇ − U(q))
  = Pᵀ · M⁻¹ · P − ½ Pᵀ · M⁻¹ · M · M⁻¹ · P + U(q)
  = ½ Pᵀ · M⁻¹ · P + U(q) = T + U,
so we see that the Hamiltonian is indeed the total energy under these cir-
cumstances.
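This chain of matrix identities is easy to spot-check numerically; here is a small sketch with an arbitrary 2×2 symmetric, positive-definite mass matrix (all the numbers are made up for illustration):

```python
# mass matrix M and generalized velocities qdot (arbitrary example values)
M = [[3.0, 1.0],
     [1.0, 2.0]]
qdot = [0.7, -0.4]

# P = M . qdot
P = [M[0][0]*qdot[0] + M[0][1]*qdot[1],
     M[1][0]*qdot[0] + M[1][1]*qdot[1]]

# inverse of the 2x2 mass matrix
det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
Minv = [[ M[1][1]/det, -M[0][1]/det],
        [-M[1][0]/det,  M[0][0]/det]]

T_qdot = 0.5 * (qdot[0]*P[0] + qdot[1]*P[1])    # (1/2) qdot . M . qdot
MinvP = [Minv[0][0]*P[0] + Minv[0][1]*P[1],
         Minv[1][0]*P[0] + Minv[1][1]*P[1]]
T_P = 0.5 * (P[0]*MinvP[0] + P[1]*MinvP[1])     # (1/2) P . Minv . P
assert abs(T_qdot - T_P) < 1e-12   # the two expressions for T agree
```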
⁸If M were not invertible, there would be a linear combination of velocities which does not affect the Lagrangian. The degree of freedom corresponding to this combination would have a Lagrange equation without time derivatives, so it would be a constraint equation rather than an equation of motion. But we are assuming that the q's are a set of independent generalized coordinates that have already been pruned of all constraints.
But we treated Pθ as fixed, which means that when we vary r on the right
hand side, we are not holding θ̇ fixed, as we should be. While we often
write partial derivatives without specifying explicitly what is being held fixed,
they are not defined without such a specification, which we are expected to
understand implicitly. However, there are several examples in Physics, such
as thermodynamics, where this implicit understanding can be unclear, and
the results may not be what was intended.
0 = d/dt (∂L/∂vi) − ∂L/∂ri = mr̈i − d/dt (∂U/∂vi) + ∂U/∂ri,   so   Fi = d/dt (∂U/∂vi) − ∂U/∂ri.
E~ + (1/c) ~v × B~ = (1/c) dC~/dt − ~∇φ − (1/c) Σj vj ~∇Cj.   (2.19)
⁹We have used Gaussian units here, but those who prefer S.I. units (rationalized MKS) can simply set c = 1.
dC~/dt = ∂C~/∂t + Σj vj ∂C~/∂xj.
The last term looks like the last term of (2.19), except that the indices on the
derivative operator and on C ~ have been reversed. This suggests that these
two terms combine to form a cross product. Indeed, noting (A.17) that
~v × (~∇ × C~) = Σj vj ~∇Cj − Σj vj ∂C~/∂xj,
Thus we see that the Lagrangian which describes the motion of a charged
particle in an electromagnetic field is given by a velocity-dependent potential
U(~r, ~v) = q (φ(~r, t) − (~v/c) · A~(~r, t)).
Note, however, that this Lagrangian describes only the motion of the charged
particle, and not the dynamics of the field itself.
We have here an example which points out that there is not a unique
Lagrangian which describes a given physical problem, and the ambiguity is
more than just the arbitrary constant we always knew was involved in the
potential energy. This ambiguity is quite general, not depending on the gauge
transformations of Maxwell fields. In general, if
L⁽²⁾(qj, q̇j, t) = L⁽¹⁾(qj, q̇j, t) + (d/dt) f(qj, t)   (2.21)
then L(1) and L(2) give the same equations of motion, and therefore the same
physics, for qj (t). While this can be easily checked by evaluating the Lagrange
equations, it is best understood in terms of the variation of the action. For
any path qj (t) between qjI at t = tI to qjF at t = tF , the two actions are
related by
S⁽²⁾ = ∫_{tI}^{tF} [L⁽¹⁾(qj, q̇j, t) + (d/dt) f(qj, t)] dt
    = S⁽¹⁾ + f(qjF, tF) − f(qjI, tI).
The variation of path that one makes to find the stationary action does not
change the endpoints qjF and qjI , so the difference S (2) − S (1) is a constant
independent of the trajectory, and a stationary trajectory for S (2) is clearly
stationary for S (1) as well.
The conjugate momenta are affected by the change in Lagrangian, however, because L⁽²⁾ = L⁽¹⁾ + Σj q̇j ∂f/∂qj + ∂f/∂t, so
pj⁽²⁾ = pj⁽¹⁾ + ∂f/∂qj.
Exercises
2.1 Sally describes the world using inertial coordinates ~ri(S), while Thomas uses coordinates ~ri(T) = ~ri(S) − ~ut appropriate to a frame moving with constant velocity ~u with respect to Sally's.
(a) Show that if the two potentials agree at the same physical configuration, U(T)({~ri(T)}) = U(S)({~ri(S)}), (2.22) then the equations of motion derived by Sally and Thomas describe the same physics. That is, if ri(S)(t) is a solution of Sally's equations, ri(T)(t) = ri(S)(t) − ~ut is a solution of Thomas'.
(b) show that if U (S) ({~ri }) is a function only of the displacements of one particle
from another, {~ri − ~rj }, then U (T ) is the same function of its arguments as U (S) ,
U (T ) ({~ri }) = U (S) ({~ri }). This is a different statement than Eq. 2.22, which states
that they agree at the same physical configuration. Show it will not generally be
true if U (S) is not restricted to depend only on the differences in positions.
(c) If it is true that U (S) (~r) = U (T ) (~r), show that Sally and Thomas derive the
same equations of motion, which we call “form invariance” of the equations.
(d) Show that nonetheless Sally and Thomas disagree on the energy of a particular
physical motion, and relate the difference to the total momentum. Which of these
quantities are conserved?
2.2 In order to show that the shortest path in two dimensional Euclidean space
is a straight line without making the assumption that ∆x does not change sign
along the path, we can consider using a parameter λ and describing the path by
two functions x(λ) and y(λ), say with λ ∈ [0, 1]. Then
` = ∫₀¹ dλ √(ẋ²(λ) + ẏ²(λ)),
where ẋ means dx/dλ. This is of the form of a variational integral with two
variables. Show that the variational equations do not determine the functions
x(λ) and y(λ), but do determine that the path is a straight line. Show that the
pair of functions (x(λ), y(λ)) gives the same action as another pair (x̃(λ), ỹ(λ)),
where x̃(λ) = x(t(λ)) and ỹ(λ) = y(t(λ)), where t(λ) is any monotone function
mapping [0, 1] onto itself. Explain why this equality of the lengths is obvious in terms of alternate parameterizations of the path.
2.4 Early steam engines had a feedback device, called a governor, to automatically control the speed. The engine rotated a vertical shaft with an angular velocity Ω proportional to its speed. On opposite sides of this shaft, two hinged rods of length L each held a metal weight, which was attached to another such rod hinged to a sliding collar, as shown. As the shaft rotates faster, the balls move outwards, the collar rises and uncovers a hole, releasing some steam. Assume all hinges are frictionless, the rods massless, and each ball has mass m1 and the collar has mass m2.
(a) Write the Lagrangian in terms of the generalized coordinate θ.
(b) Find the equilibrium angle θ as a function of the shaft angular velocity Ω. Tell whether the equilibrium is stable or not.
2.5 A transformer consists of two coils of conductor each of which has an induc-
tance, but which also have a coupling, or mutual inductance.
(a) Given two Lagrangians related by
L⁽¹⁾({qi}, {q̇j}, t) = L⁽²⁾({qi}, {q̇j}, t) + (d/dt) Φ(q1, ..., qn, t),
show by explicit calculations that the equations of motion determined by L⁽¹⁾ are the same as the equations of motion determined by L⁽²⁾.
(b) What is the relationship between the momenta pi⁽¹⁾ and pi⁽²⁾ determined by these two Lagrangians respectively?
2.9 Consider a mass m on the end of a massless rigid rod of length `, the other
end of which is free to rotate about a fixed point. This is a spherical pendulum.
Find the Lagrangian and the equations of motion.
2.10 (a) Find a differential equation for θ(φ) for the shortest path on the surface
of a sphere between two arbitrary points on that surface, by minimizing the length
of the path, assuming it to be monotone in φ.
(b) By geometrical argument (that it must be a great circle) argue that the path
should satisfy
cos(φ − φ0 ) = K cot θ,
and show that this is indeed the solution of the differential equation you derived.
2.11 Consider some intelligent bugs who live on a turntable which, according
to inertial observers, is spinning at angular velocity ω about its center. At any
one time, the inertial observer can describe the points on the turntable with polar
coordinates r, φ. If the bugs measure distances between two objects at rest with
respect to them, at infinitesimally close points, they will find
d`² = dr² + r² dφ²/(1 − ω²r²/c²),
because their metersticks shrink in the
tangential direction and it takes more of
them to cover the distance we think of
as rdφ, though their metersticks agree
with ours when measuring radial dis-
placements.
The bugs will declare a curve to be a geodesic, or the shortest path between two points, if ∫ d` is a minimum. Show that this requires that r(φ) satisfies
dr/dφ = ± [r/(1 − ω²r²/c²)] √(α²r² − 1),
where α is a constant.

Straight lines to us and to the bugs, between the same two points.
S = mc² ∆τ = mc ∫ dλ √( Σµν gµν(xρ) (dxµ/dλ)(dxν/dλ) ).
(a) Find the four Lagrange equations which follow from varying xρ (λ).
(b) Show that if we multiply these four equations by ẋρ and sum on ρ, we get an
identity rather than a differential equation helping to determine the functions
xµ(λ). Explain this as a consequence of the fact that any path has a length unchanged by a reparameterization of the path, λ → σ(λ), x′µ(λ) = xµ(σ(λ)).
(c) Using this freedom to choose λ to be τ , the proper time from the start of the
path to the point in question, show that the equations of motion are
d²xλ/dτ² + Σρσ Γλρσ (dxρ/dτ)(dxσ/dτ) = 0.
2.13 (a) Find the canonical momenta for a charged particle moving in an electromagnetic field and also under the influence of a non-electromagnetic force described by a potential $U(\vec r)$.
(b) If the electromagnetic field is a constant magnetic field $\vec B = B_0\hat e_z$, with no electric field and with $U(\vec r) = 0$, what conserved quantities are there?
Chapter 3

Two Body Central Forces
Consider two particles of masses m1 and m2 , with the only forces those of
their mutual interaction, which we assume is given by a potential which is a
function only of the distance between them, U (|~r1 − ~r2 |). In a mathematical
sense this is a very strong restriction, but it applies very nicely to many
physical situations. The classical case is the motion of a planet around the
Sun, ignoring the effects mentioned at the beginning of the book. But it
also applies to electrostatic forces and to many effective representations of
nonrelativistic interparticle forces.
We may choose the center of mass
$$\vec R = \frac{m_1\vec r_1 + m_2\vec r_2}{m_1+m_2}$$
as three of our generalized coordinates. For the other three, we first use the
cartesian components of the relative coordinate
~r := ~r2 − ~r1 ,
~ − m2 ~r,
~r1 = R ~ + m1 ~r,
~r2 = R where M = m1 + m2 .
M M
The kinetic energy is
$$\begin{aligned}
T &= \frac12 m_1\dot r_1^2 + \frac12 m_2\dot r_2^2\\
&= \frac12 m_1\left(\dot{\vec R} - \frac{m_2}{M}\dot{\vec r}\right)^2 + \frac12 m_2\left(\dot{\vec R} + \frac{m_1}{M}\dot{\vec r}\right)^2\\
&= \frac12(m_1+m_2)\dot{\vec R}^2 + \frac12\,\frac{m_1m_2}{M}\,\dot{\vec r}^2\\
&= \frac12 M\dot{\vec R}^2 + \frac12\mu\dot{\vec r}^2,
\end{aligned}$$
where
$$\mu := \frac{m_1m_2}{m_1+m_2}$$
is called the reduced mass. Thus the kinetic energy is transformed to the
form for two effective particles of mass M and µ, which is neither simpler
nor more complicated than it was in the original variables.
For the potential energy, however, the new variables are to be preferred, for $U(\vec r_1 - \vec r_2) = U(\vec r)$ is independent of $\vec R$, whose three components are therefore ignorable coordinates, and their conjugate momenta
$$P_{\text{cm}\,i} = \frac{\partial(T-U)}{\partial\dot R_i} = M\dot R_i$$
are conserved. This reduces half of the motion to triviality, leaving an effective one-body problem with $T = \frac12\mu\dot r^2$ and the given potential $U(\vec r)$.
We have not yet made use of the fact that U only depends on the mag-
nitude of ~r. In fact, the above reduction applies to any two-body system
without external forces, as long as Newton’s Third Law holds.
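The reduction above can be checked numerically: for arbitrary velocities of the two particles, the Cartesian kinetic energy must equal the center-of-mass plus relative form. A small sketch (masses and velocities are sample values, not from the text):

```python
import numpy as np

# Check that T = (1/2)m1 v1^2 + (1/2)m2 v2^2 equals
# (1/2)M Rdot^2 + (1/2)mu rdot^2 for arbitrary motions.
m1, m2 = 3.0, 5.0
M = m1 + m2
mu = m1 * m2 / M                                  # the reduced mass

rng = np.random.default_rng(0)
v1, v2 = rng.normal(size=3), rng.normal(size=3)   # arbitrary velocities

Rdot = (m1 * v1 + m2 * v2) / M                    # center-of-mass velocity
rdot = v2 - v1                                    # relative velocity

T_cartesian = 0.5 * m1 * v1 @ v1 + 0.5 * m2 * v2 @ v2
T_reduced = 0.5 * M * Rdot @ Rdot + 0.5 * mu * rdot @ rdot
assert np.isclose(T_cartesian, T_reduced)
```

The identity holds for any velocities, which is the statement that the change of variables is exact, not an approximation.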
which is the inverse function of the solution to the radial motion problem
r(t). We can also find the orbit because
$$\frac{d\phi}{dr} = \frac{\dot\phi}{dr/dt} = \frac{L}{\mu r^2}\,\frac{dt}{dr},$$
so
$$\phi = \phi_0 \pm \int_{r_0}^{r}\frac{L\,dr}{r^2\sqrt{2\mu\left(E-U_{\text{eff}}(r)\right)}}. \qquad(3.3)$$
The sign ambiguity from the square root is only because r may be increasing
or decreasing, but time, and usually φ/L, are always increasing.
Qualitative features of the motion are largely determined by the range
over which the argument of the square root is positive, as for other values of
r we would have imaginary velocities. Thus the motion is restricted to this
allowed region. Unless $L = 0$ or the potential $U(r)$ is very strongly attractive for small $r$, the centrifugal barrier will dominate there, so $U_{\text{eff}}\xrightarrow{\,r\to 0\,}+\infty$, and there must be a smallest radius $r_p > 0$ for which $E \geq U_{\text{eff}}$. Generically the
force will not vanish there, so E −Ueff ≈ c(r −rp ) for r ≈ rp , and the integrals
in (3.2) and (3.3) are convergent. Thus an incoming orbit reaches r = rp at a
finite time and finite angle, and the motion then continues with r increasing
and the ± signs reversed. The radius rp is called a turning point of the
motion. If there is also a maximum value of r for which the velocity is real,
it is also a turning point, and an outgoing orbit will reach this maximum and
then r will start to decrease, confining the orbit to the allowed values of r.
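For the attractive Kepler case $U = -K/r$, the turning points where $E = U_{\text{eff}}$ are the roots of a quadratic in $u = 1/r$, which makes them easy to exhibit numerically. A sketch with sample constants of the motion (all values ours):

```python
import numpy as np

# Turning points of U_eff = -K/r + L^2/(2 mu r^2):
# E = U_eff becomes (L^2/2mu) u^2 - K u - E = 0 with u = 1/r.
K, mu, L, E = 1.0, 1.0, 0.8, -0.3          # sample bound-orbit constants

u_min, u_max = np.sort(np.roots([L**2 / (2 * mu), -K, -E]).real)
r_a, r_p = 1 / u_min, 1 / u_max            # apogee and perigee radii

U_eff = lambda r: -K / r + L**2 / (2 * mu * r**2)
assert np.isclose(U_eff(r_p), E) and np.isclose(U_eff(r_a), E)
assert U_eff((r_p + r_a) / 2) < E          # interior of the allowed region
```

For a bound orbit ($E < 0$) both roots are positive and the motion oscillates between $r_p$ and $r_a$, as described above.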
If there are both minimum and maximum values, this interpretation of
Eq. (3.3) gives φ as a multiple valued function of r, with an “inverse” r(φ)
which is a periodic function of φ. But there is no particular reason for this
where we have made the variable substitution $u = 1/r$, which simplifies the form, and have introduced the abbreviations $\gamma = 2\mu E/L^2$, $\alpha = 2K\mu^2/L^2$.
As dφ/dr must be real the motion will clearly be confined to regions for
which the argument of the square root is nonnegative, and the motion in
r will reverse at the turning points where the argument vanishes. The ar-
gument is clearly negative as u → ∞, which is r = 0. We have assumed
L 6= 0, so the angular momentum barrier dominates over the Coulomb at-
traction, and always prevents the particle from reaching the origin. Thus
there is always at least one turning point, umax , corresponding to the min-
imum distance rp . Then the argument of the square root must factor into
[−(u − umax )(u − umin )], although if umin is negative it is not really the min-
imum $u$, which can never get past zero. The integral (3.4) can be done² with
²Of course it can also be done by looking in a good table of integrals. For example, see 2.261(c) of Gradshtein and Ryzhik[7].
$$\frac1r = A\cos\theta + B = \frac{1}{r_p}\left(1 - \frac{e}{1+e}\left(1-\cos\theta\right)\right) = \frac{1}{r_p}\,\frac{1+e\cos\theta}{1+e},$$
where $e = A/B$.
What is this orbit? Clearly $r_p$ just sets the scale of the whole orbit. From $r_p(1+e) = r + er\cos\theta = r + ex$, if we subtract $ex$ and square, we get $r_p^2 + 2r_pe(r_p-x) + e^2(r_p-x)^2 = r^2 = x^2+y^2$, which is clearly quadratic in $x$ and $y$. It is therefore a conic section,
All of these are possible motions. The bound orbits are ellipses, which describe planetary motion and also the motion of comets. But objects which have enough energy to escape from the sun, such as Voyager 2, are in hyperbolic orbits, or, in the dividing case where the total energy is exactly zero, parabolic orbits. Then as time goes to $\infty$, $\phi$ goes to a finite value: $\phi\to\pi$ for a parabola, or some constant less than $\pi$ for a hyperbola.
³Perigee is the correct word if the heavier of the two is the Earth, perihelion if it is the sun, periastron for some other star. Pericenter is also used, but not as generally as it ought to be.
Kepler tells us not only that the orbit is an ellipse, but also that the
sun is at one focus. To verify that, note the other focus of an ellipse is
symmetrically located, at (−2ea, 0), and work out the sum of the distances
of any point on the ellipse from the two foci. This will verify that d + r = 2a
is a constant, showing that the orbit is indeed an ellipse with the sun at one
focus.
How are $a$ and $e$ related to the total energy $E$ and the angular momentum $L$? At apogee and perigee, $dr/d\phi$ vanishes, and so does $\dot r$, so $E = U(r) + L^2/2\mu r^2 = -K/r + L^2/2\mu r^2$, which holds at $r = r_p = a(1-e)$ and at $r = r_a = a(1+e)$. Thus $Ea^2(1\pm e)^2 + Ka(1\pm e) - L^2/2\mu = 0$. These two equations are easily solved for $a$ and $e$ in terms of the constants of the motion $E$ and $L$:
$$a = -\frac{K}{2E}, \qquad e^2 = 1 + \frac{2EL^2}{\mu K^2}.$$
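These closed forms can be verified against the turning-point condition directly: with $a$ and $e$ computed from $E$ and $L$, both $r_p = a(1-e)$ and $r_a = a(1+e)$ must satisfy $E = -K/r + L^2/2\mu r^2$. A quick check (sample constants ours):

```python
import math

# a = -K/(2E) and e^2 = 1 + 2 E L^2/(mu K^2) should reproduce the
# turning points of E = -K/r + L^2/(2 mu r^2).
K, mu, L, E = 1.0, 1.0, 0.8, -0.3          # sample bound-orbit constants

a = -K / (2 * E)
e = math.sqrt(1 + 2 * E * L**2 / (mu * K**2))

for r in (a * (1 - e), a * (1 + e)):       # perigee and apogee
    assert abs(-K / r + L**2 / (2 * mu * r**2) - E) < 1e-12
```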
As expected for a bound orbit, we have found r as a periodic function
of φ, but it is surprising that the period is the natural period 2π. In other
words, as the planet makes its revolutions around the sun, its perihelion is
always in the same direction. That didn’t have to be the case — one could
imagine that each time around, the minimum distance occurred at a slightly
different (or very different) angle. Such an effect is called the precession
of the perihelion. We will discuss this for nearly circular orbits in other
potentials in section (3.2.2).
What about Kepler's Third Law? The area of a triangle with $\vec r$ as one edge and the displacement during a small time interval $\delta\vec r = \vec v\,\delta t$ as another is $A = \frac12|\vec r\times\vec v|\delta t = |\vec r\times\vec p\,|\delta t/2\mu$, so the area swept out per unit time is
$$\frac{dA}{dt} = \frac{L}{2\mu},$$
which is constant. An ellipse is a circle stretched uniformly along one axis, and its area is stretched by the same factor, so $A$ is $\pi$ times the semimajor axis times the semiminor axis. The endpoint of the semiminor axis is a distance $a$ from each focus, so it is $a\sqrt{1-e^2}$ from the center, and
$$A = \pi a^2\sqrt{1-e^2} = \pi a^2\sqrt{1-\left(1+\frac{2EL^2}{\mu K^2}\right)} = \pi a^2\,\frac{L}{K}\sqrt{\frac{-2E}{\mu}}.$$
Recall that for bound orbits E < 0, so A is real. The period is just the area
swept out in one revolution divided by the rate it is swept out, or
$$\begin{aligned}
T &= \pi a^2\,\frac{L}{K}\sqrt{\frac{-2E}{\mu}}\;\frac{2\mu}{L}\\
&= \frac{2\pi a^2}{K}\sqrt{-2\mu E} = \frac\pi2\,K\,(2\mu)^{1/2}(-E)^{-3/2} \qquad(3.5)\\
&= \frac{2\pi a^2}{K}\sqrt{\mu K/a} = 2\pi a^{3/2}K^{-1/2}\mu^{1/2}, \qquad(3.6)
\end{aligned}$$
independent of L. The fact that T and a depend only on E and not on
L is another fascinating manifestation of the very subtle symmetries of the
Kepler/Coulomb problem.
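The two closed forms (3.5) and (3.6) look quite different but must agree once $a = -K/2E$ is substituted, which is easy to confirm numerically (constants are sample values):

```python
import math

# Consistency of the two forms of the period, using a = -K/(2E).
K, mu, E = 1.0, 1.0, -0.3
a = -K / (2 * E)

T_35 = 0.5 * math.pi * K * math.sqrt(2 * mu) * (-E) ** -1.5   # Eq. (3.5)
T_36 = 2 * math.pi * a ** 1.5 * math.sqrt(mu / K)             # Eq. (3.6)
assert abs(T_35 - T_36) < 1e-12 * T_36
```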
$$F(a) = -\frac{L^2}{\mu a^3}.$$
We may also ask about trajectories which differ only slightly from this orbit,
for which |r − a| is small. Expanding Ueff (r) in a Taylor series about a,
$$U_{\text{eff}}(r) = U_{\text{eff}}(a) + \frac12(r-a)^2\,k,$$
where
$$k = \left.\frac{d^2U_{\text{eff}}}{dr^2}\right|_a = -\frac{dF}{dr} + \frac{3L^2}{\mu a^4} = -\left(\frac{dF}{dr} + \frac{3F}{a}\right).$$
The period of revolution $T_{\text{rev}}$ can be calculated for the circular orbit, as
$$L = \mu a^2\dot\phi = \mu a^2\,\frac{2\pi}{T_{\text{rev}}} = \sqrt{\mu a^3|F(a)|},$$
so
$$T_{\text{rev}} = 2\pi\sqrt{\frac{\mu a}{|F(a)|}}.$$
Thus the two periods $T_{\text{osc}}$ and $T_{\text{rev}}$ are not equal unless $n = -2$, as in the gravitational case. Let us define the apsidal angle $\psi$ as the angle between an apogee and the next perigee. It is therefore $\psi = \pi T_{\text{osc}}/T_{\text{rev}} = \pi/\sqrt{3+n}$. For the gravitational case $\psi = \pi$, so the apogee and perigee are on opposite sides of the orbit. For a two- or three-dimensional harmonic oscillator $F(r) = -kr$ we have $n = 1$, $\psi = \frac12\pi$, and now an orbit contains two apogees and two perigees, and is again an ellipse, but now with the center of force at the center of the ellipse rather than at one focus.
Note that if ψ/π is not rational, the orbit never closes, while if ψ/π = p/q,
the orbit will close after p revolutions, having reached q apogees and perigees.
The orbit will then be closed, but unless p = 1 it will be self-intersecting.
This exact closure is also only true in the small deviation approximation;
more generally, Bertrand’s Theorem states that only for the n = −2 and
n = 1 cases are the generic orbits closed.
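The apsidal-angle formula is simple enough to tabulate for the cases just discussed (the closure example $n = 6$ is our illustration, not the text's):

```python
import math

# Apsidal angle psi = pi / sqrt(3 + n) for a power-law force F proportional
# to -r^n, in the small-deviation (nearly circular) approximation.
psi = lambda n: math.pi / math.sqrt(3 + n)

assert abs(psi(-2) - math.pi) < 1e-12        # Kepler: apogee opposite perigee
assert abs(psi(1) - math.pi / 2) < 1e-12     # harmonic oscillator: two of each
assert abs(psi(6) / math.pi - 1 / 3) < 1e-12 # psi/pi = 1/3: closes with q = 3
```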
In the treatment of planetary motion, the precession of the perihelion is the angle through which the perihelion slowly moves, so it is $2\psi - 2\pi$ per orbit.
We have seen that it is zero for the pure inverse force law. There is actually
some precession of the planets, due mostly to perturbative effects of the other
planets, but also in part due to corrections to Newtonian mechanics found
from Einstein’s theory of general relativity. In the late nineteenth century
discrepancies in the precession of Mercury’s orbit remained unexplained, and
the resolution by Einstein was one of the important initial successes of general
relativity.
On the other hand, the time variation of the unit vector êr = ~r/r is
$$\langle T\rangle = \frac{n+1}{2}\langle U\rangle.$$
For Kepler, $n = -2$, so $\langle T\rangle = -\frac12\langle U\rangle = -\langle T+U\rangle = -E$ must hold for closed orbits or for large systems of particles which remain bound and uncollapsed.
It is not true, of course, for unbound systems which have E > 0.
The fact that the average value of the kinetic energy in a bound system
gives a measure of the potential energy is the basis of the measurements
of the missing mass, or dark matter, in galaxies and in clusters of galaxies.
This remains a useful tool despite the fact that a multiparticle gravitationally
bound system can generally throw off some particles by bringing others closer
together, so that, strictly speaking, G does not return to its original value or
remain bounded.
$$\frac12\sec^2\frac\theta2\,d\theta = -\frac{K}{\mu v_0^2 b^2}\,db,$$
$$\begin{aligned}
\frac{d\sigma}{d\theta} &= 2\pi b\,\frac{\mu v_0^2 b^2}{2K\cos^2(\theta/2)} = \frac{\pi\mu v_0^2 b^3}{K\cos^2(\theta/2)}\\
&= \frac{\pi\mu v_0^2}{K\cos^2(\theta/2)}\left(\frac{K}{\mu v_0^2}\,\frac{\cos\theta/2}{\sin\theta/2}\right)^3 = \pi\left(\frac{K}{\mu v_0^2}\right)^2\frac{\cos\theta/2}{\sin^3\theta/2}\\
&= \frac\pi2\left(\frac{K}{\mu v_0^2}\right)^2\frac{\sin\theta}{\sin^4\theta/2}.
\end{aligned}$$
(The last expression is useful because $\sin\theta\,d\theta$ is the "natural measure" for $\theta$, in the sense that integrating over volume in spherical coordinates is $d^3V = r^2\,dr\,\sin\theta\,d\theta\,d\phi$.)
How do we measure dσ/dθ? There is a beam of N particles shot at
random impact parameters onto a foil with n scattering centers per unit
area, and we confine the beam to an area A. Each particle will be significantly
scattered only by the scattering center to which it comes closest, if the foil
is thin enough. The number of incident particles per unit area is N/A, and
the number of scatterers being bombarded is nA, so the number which get
scattered through an angle ∈ [θ, θ + dθ] is
$$\frac NA \times nA \times \frac{d\sigma}{d\theta}\,d\theta = Nn\,\frac{d\sigma}{d\theta}\,d\theta.$$
We have used the cylindrical symmetry of this problem to ignore the $\phi$ dependence of the scattering. More generally, the scattering would not be uniform in $\phi$, so that the area of beam scattered into a given region of $(\theta,\phi)$ would be
$$d\sigma = \frac{d\sigma}{d\Omega}\sin\theta\,d\theta\,d\phi,$$
where dσ/dΩ is called the differential cross section. For Rutherford scat-
tering we have
$$\frac{d\sigma}{d\Omega} = \frac14\left(\frac{K}{\mu v_0^2}\right)^2\csc^4\frac\theta2.$$
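This result can be cross-checked against the general axisymmetric relation $d\sigma/d\Omega = (b/\sin\theta)\,|db/d\theta|$, using the Rutherford impact parameter $b(\theta) = (K/\mu v_0^2)\cot(\theta/2)$. A numerical sketch, where `kappa` is our shorthand for $K/\mu v_0^2$ (sample value ours):

```python
import math

# The general formula (b/sin theta)|db/dtheta| should reproduce
# (1/4) kappa^2 csc^4(theta/2) for b(theta) = kappa cot(theta/2).
kappa = 1.7                                   # shorthand for K/(mu v0^2)
b = lambda th: kappa / math.tan(th / 2)

for th in (0.5, 1.2, 2.5):
    h = 1e-6
    dbdth = (b(th + h) - b(th - h)) / (2 * h)
    lhs = b(th) / math.sin(th) * abs(dbdth)
    rhs = 0.25 * kappa**2 / math.sin(th / 2) ** 4
    assert abs(lhs - rhs) < 1e-6 * rhs
```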
This effect is called glory scattering, and can be seen around the shadow
of a plane on the clouds below.
Exercises
3.1 A space ship is in circular orbit at radius R and speed v1 , with the period
of revolution τ1 . The crew wishes to go to planet X, which is in a circular orbit
of radius 2R, and to revolve around the Sun staying near planet X. They propose
to do this by firing two blasts, one putting them in an orbit with perigee R and
apogee 2R, and the second, when near X, to change their velocity so they will have
the same speed as X.
• (a) By how much must the first blast change their velocity? Express your
answer in terms of v1 .
• (b) How long will it take until they reach the apogee? Express your answer in terms of τ₁.
• (c) By how much must the second blast change their speed? Will they need to slow down or speed up, relative to the Sun?
3.2 Consider a spherical droplet of water in the sunlight. A ray of light with
impact parameter b is refracted, so by Snell’s Law n sin β = sin α. It is then
internally reflected once and refracted again on the way out.
(a) Express the scattering angle θ in terms of α and β.
(b) Find the scattering cross section $d\sigma/d\Omega$ as a function of $\theta$, $\alpha$ and $\beta$ (which is implicitly a function of $\theta$ from (a) and Snell's Law).
(c) The smallest value of $\theta$ is called the rainbow scattering angle. Why? Find it numerically to first order in $\delta$ if the index of refraction is $n = 1.333 + \delta$.
(d) The visual spectrum runs from violet, where $n = 1.343$, to red, where $n = 1.331$. Find the angular radius of the rainbow's circle, and the angular width of the rainbow, and tell whether the red or blue is on the outside.
[Figure: One way light can scatter from a spherical raindrop.]
3.4 From the general expression for $\phi$ as an integral over $r$, applied to a three dimensional symmetrical harmonic oscillator $U(\vec r) = \frac12 kr^2$, integrate the equation, and show that the motion is an ellipse, with the center of force at the center of the ellipse. Consider the three complex quantities $Q_i = p_i - i\sqrt{km}\,r_i$, and show that each has a very simple equation of motion, as a consequence of which the nine quantities $Q_i^*Q_k$ are conserved. Identify as many as possible of these with previously known conserved quantities.
3.5 Show that if a particle under the influence of a central force has an orbit
which is a circle passing through the point of attraction, then the force is a power
law with |F | ∝ r−5 . Assuming the potential is defined so that U (∞) = 0, show
that for this particular orbit E = 0. In terms of the diameter and the angular
momentum, find the period, and by expressing ẋ, ẏ and the speed as a function of
the angle measured from the center of the circle, and its derivative, show that ẋ, ẏ
and the speed all go to infinity as the particle passes through the center of force.
3.6 For the Kepler problem we have the relative position tracing out an ellipse. What is the curve traced out by the momentum in momentum space? Show that it is a circle centered at $\vec L\times\vec A/L^2$, where $\vec L$ and $\vec A$ are the angular momentum and Runge-Lenz vectors respectively.
3.7 The Rutherford cross section implies all incident projectiles will be scattered
and emerge at some angle θ, but a real planet has a finite radius, and a projectile
that hits the surface is likely to be captured rather than scattered.
What is the capture cross section for an airless planet of radius R and mass M
for a projectile with a speed v0 ? How is the scattering differential cross section
modified from the Rutherford prediction?
where we may freely choose the path parameter $\lambda$ to be the proper time (after doing the variation), so that the square root is $c$, the speed of light.
The gravitational field of a static point mass $M$ is given by the Schwarzschild metric
$$g_{00} = 1 - \frac{2GM}{rc^2}, \quad g_{rr} = -\left(1-\frac{2GM}{rc^2}\right)^{-1}, \quad g_{\theta\theta} = -r^2, \quad g_{\phi\phi} = -r^2\sin^2\theta,$$
where all other components of gµν are zero. Treating the four xµ (λ) as the coordi-
nates, with λ playing the role of time, find the four conjugate momenta pµ , show
that p0 and pφ = L are constants, and use the freedom to choose
$$\lambda = \tau = \frac1c\int d\lambda\,\sqrt{\sum_{\mu\nu} g_{\mu\nu}(x^\rho)\,\frac{dx^\mu}{d\lambda}\,\frac{dx^\nu}{d\lambda}}$$
Chapter 4

Rigid Body Motion

In this chapter we develop the dynamics of a rigid body, one in which all
interparticle distances are fixed by internal forces of constraint. This is,
of course, an idealization which ignores elastic and plastic deformations to
which any real body is susceptible, but it is an excellent approximation for
many situations, and vastly simplifies the dynamics of the very large number
of constituent particles of which any macroscopic body is made. In fact, it
reduces the problem to one with six degrees of freedom. While the ensuing motion can still be quite complex, it is tractable. In the process we will be dealing with a configuration space which is a group, and is not a Euclidean space. Degrees of freedom which lie on a group manifold rather than Euclidean space arise often in applications in quantum mechanics and quantum field theory, in addition to the classical problems we will consider such as
gyroscopes and tops.
• translations of the body as a whole, $\vec r_\alpha \to \vec r_\alpha + \vec C$,
We will need to discuss how to represent the latter part of the configuration,
(including what a rotation is), and how to reexpress the kinetic and potential
energies in terms of this configuration space and its velocities.
The first part of the configuration, describing the translation, can be specified by giving the coordinates of the marked point fixed in the body, $\tilde R(t)$. Often, but not always, we will choose this marked point to be the center of mass $\vec R(t)$ of the body. In order to discuss other points which are part of the body, we will use an orthonormal coordinate system fixed in the body, known as the body coordinates, with the origin at the fixed point $\tilde R$. The constraints mean that the position of each particle of the body has fixed coordinates in terms of this coordinate system. Thus the dynamical configuration of the body is completely specified by giving the orientation of these coordinate axes in addition to $\tilde R$. This orientation needs to be described relative to a fixed inertial coordinate system, or inertial coordinates, with orthonormal basis $\hat e_i$.
Let the three orthogonal unit vectors defining the body coordinates be $\hat e'_i$, for $i = 1,2,3$. Then the position of any particle $\alpha$ in the body which has coordinates $b'_{\alpha i}$ in the body coordinate system is at the position $\vec r_\alpha = \tilde R + \sum_i b'_{\alpha i}\hat e'_i$. In order to know its components in the inertial frame $\vec r_\alpha = \sum_i r_{\alpha i}\hat e_i$ we need to know the coordinates of the three vectors $\hat e'_i$ in terms of the inertial coordinates,
$$\hat e'_i = \sum_j A_{ij}\hat e_j. \qquad(4.2)$$
The nine quantities $A_{ij}$, together with the three components of $\tilde R = \sum_i \tilde R_i\hat e_i$, specify the position of every particle,
$$r_{\alpha i} = \tilde R_i + \sum_j b'_{\alpha j}A_{ji},$$
or in matrix language, $AA^T = 1\mathrm{I}$. Such a matrix of real values, whose transpose is equal to its inverse, is called orthogonal, and is a transformation of basis vectors which preserves orthonormality of the basis vectors. Because they play such an important role in the study of rigid body motion, we need to explore the properties of orthogonal transformations in some detail.
Thus we may conclude from the fact that the $\hat e_j$ are linearly independent that $V_j = \sum_i V'_iA_{ij}$, or in matrix notation that $V = A^TV'$. Because $A$ is orthogonal, multiplying from the left by $A$ gives $V' = AV$, or
$$V'_i = \sum_j A_{ij}V_j. \qquad(4.3)$$
Thus A is to be viewed as a rule for giving the primed basis vectors in terms
of the unprimed ones (4.2), and also for giving the components of a vector in
the primed coordinate system in terms of the components in the unprimed
one (4.3). This picture of the role of A is called the passive interpretation.
One may also use matrices to represent a real physical transformation
of an object or quantity. In particular, Eq. 4.2 gives A the interpretation
of an operator that rotates each of the coordinate basis ê1 , ê2 , ê3 into the
corresponding new vector $\hat e'_1$, $\hat e'_2$, or $\hat e'_3$. For a real rotation of the physical system, all the vectors describing the objects are changed by the rotation into new vectors $\vec V \to \vec V^{(R)}$, physically different from the original vector, but having the same coordinates in the primed basis as $\vec V$ has in the unprimed basis. This is called the active interpretation of the transformation. Both
active and passive views of the transformation apply here, and this can easily
lead to confusion. The transformation A(t) is the physical transformation
which rotated the body from some standard orientation, in which the body
axes ê0i were parallel to the “lab frame” axes êi , to the configuration of the
body at time t. But it also gives the relation of the components of the same
position vectors (at time t) expressed in body fixed and lab frame coordinates.
If we first consider rotations in two dimensions, it is clear that they are
generally described by the counterclockwise angle θ through which the basis
is rotated,
some fixed axis and is a rotation through some angle about that axis. Let
us call that a rotation about an axis. On the other hand, we might mean
all transformations we can produce by a sequence of rotations about various
axes. Let us define rotation in this sense. Clearly if we consider the rotation
R which rotates the basis {ê} into the basis {ê0 }, and if we have another
rotation R0 which rotates {ê0 } into {ê00 }, then the transformation which first
does $R$ and then does $R'$, called the composition of them, $\breve R = R'\circ R$, is also a rotation in this latter sense. As $\hat e''_i = \sum_j R'_{ij}\hat e'_j = \sum_{jk} R'_{ij}R_{jk}\hat e_k$, we see that $\breve R_{ik} = \sum_j R'_{ij}R_{jk}$ and $\hat e''_i = \sum_k \breve R_{ik}\hat e_k$. Thus the composition $\breve R = R'R$ is given by matrix multiplication.
Figure 4.1: The results of applying the two rotations H and V to a book
depends on which is done first. Thus rotations do not commute. Here we
are looking down at a book which is originally lying face up on a table. V is
a rotation about the vertical z-axis, and H is a rotation about a fixed axis
pointing to the right, each through 90◦ .
4.1.2 Groups
This set of orthogonal matrices is a group, which means that the set O(N )
satisfies the following requirements, which we state for a general set G.
A set $G$ of elements $A, B, C, \ldots$ together with a group multiplication rule ($\cdot$) for combining two of them, is a group if
While the constraints (4.1) would permit A(t) to be any orthogonal ma-
trix, the nature of Newtonian mechanics of a rigid body requires it to vary
continuously in time. If the system starts with A = 1I, there must be a contin-
uous path in the space of orthogonal matrices to the configuration A(t) at any
later time. But the set of matrices $O(3)$ is not connected in this fashion: there is no path from $A = 1\mathrm{I}$ to $A = P$. To see that this is true, we look at the determinant of $A$. From $AA^T = 1\mathrm{I}$ we see that $\det(AA^T) = 1 = \det(A)\det(A^T) = (\det A)^2$,
so det A = ±1 for all orthogonal matrices A. But the determinant varies con-
tinuously as the matrix does, so no continuous variation of the matrix can
lead to a jump in its determinant. Thus the matrices which represent rota-
tions have unit determinant, det A = +1, and are called unimodular.
The set of all unimodular orthogonal matrices in N dimensions is called
SO(N ). It is a subset of O(N ), the set of all orthogonal matrices in N
dimensions. Clearly all rotations are in this subset. The subset is closed
under multiplication, and the identity and the inverses of elements in SO(N )
are also in SO(N ), for their determinants are clearly 1. Thus SO(N ) is a
subgroup of O(N ). It is actually the set of rotations, but we shall prove
this statement only for the case N = 3, which is the immediately relevant
one. Simultaneously we will show that every rotation in three dimensions is
a rotation about an axis. We have already proven it for N = 2. We now
show that every A ∈ SO(3) has one vector it leaves unchanged or invariant,
so that it is effectively a rotation in the plane perpendicular to this direction,
or in other words a rotation about the axis it leaves invariant. The fact that
every unimodular orthogonal matrix in three dimensions is a rotation about
an axis is known as Euler’s Theorem. To show that it is true, we note that
if A is orthogonal and has determinant 1,
n o
det (A − 1I)AT = det(1I − AT ) = det(1I − A)
= det(A − 1I) det(A) = det(−(1I − A)) = (−1)3 det(1I − A)
= − det(1I − A),
so $\det(1\mathrm{I}-A) = 0$ and $1\mathrm{I}-A$ is a singular matrix. Then there exists a vector $\vec\omega$ which is annihilated by it, $(1\mathrm{I}-A)\vec\omega = 0$, or $A\vec\omega = \vec\omega$, and $\vec\omega$ is invariant under $A$. Of course this determines only the direction of $\vec\omega$, and only up to sign. If we choose a new coordinate system in which the $\tilde z$-axis points along $\vec\omega$, we see that the elements $\tilde A_{i3} = (0,0,1)$, and orthogonality gives $\sum_j \tilde A_{3j}^2 = 1 = \tilde A_{33}^2$, so $\tilde A_{31} = \tilde A_{32} = 0$. Thus $\tilde A$ is of the form
$$\tilde A = \begin{pmatrix} B & 0\\ 0 & 1 \end{pmatrix},$$
where $B$ is an orthogonal unimodular $2\times2$ block, which is therefore a rotation about the $z$-axis through some angle $\omega$, which we may choose to be in the range $\omega\in(-\pi,\pi]$. It is natural to define the vector $\vec\omega$, whose direction only was determined above, to be $\vec\omega = \omega\hat e_{\tilde z}$. Thus we see that the set of orthogonal unimodular matrices is the set of rotations, and elements of this set may be specified by a vector² of length $\leq\pi$.
²More precisely, we choose $\vec\omega$ along one of the two opposite directions left invariant by $A$, so that the angle of rotation is non-negative and $\leq\pi$. This specifies a point in or on
the surface of a three dimensional ball of radius π, but in the case when the angle is exactly
π the two diametrically opposed points both describe the same rotation. Mathematicians
say that the space of SO(3) is three-dimensional real projective space P3 (R)[4].
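Euler's theorem is easy to exercise numerically: compose two rotations and extract the invariant axis as the eigenvector of the product with eigenvalue 1, i.e. the solution of $(1\mathrm{I}-A)\vec\omega = 0$. A sketch (the axes and angles are arbitrary sample values; Rodrigues' formula is used to build the rotation matrices):

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues' formula)."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

A = rot([0, 0, 1], 0.7) @ rot([1, 1, 0], 1.1)     # composition of two rotations
assert np.isclose(np.linalg.det(A), 1.0)          # unimodular, as argued above

vals, vecs = np.linalg.eig(A)
w = np.real(vecs[:, np.argmin(np.abs(vals - 1))]) # eigenvector for eigenvalue 1
assert np.allclose(A @ w, w)                      # invariant axis: A w = w
```

The product of two rotations about different axes is thus again a rotation about a single (generally different) axis.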
Thus we see that the rotation which determines the orientation of a rigid body can be described by the three degrees of freedom $\vec\omega$. Together with the translational coordinates $\tilde R$, this parameterizes the configuration space of the rigid body, which is six dimensional. It is important to recognize that this is not motion in a flat six dimensional configuration space, however. For example, the configurations with $\vec\omega = (0,0,\pi-\epsilon)$ and $\vec\omega = (0,0,-\pi+\epsilon)$ approach each other as $\epsilon\to 0$, so that motion need not even be continuous in $\vec\omega$. The composition of rotations is by multiplication of the matrices, not by addition of the $\vec\omega$'s. There are other ways of describing the configuration
by addition of the ω~ ’s. There are other ways of describing the configuration
space, two of which are known as Euler angles and Cayley-Klein parameters,
but none of these make describing the space very intuitive. For some purposes
we do not need all of the complications involved in describing finite rotations,
but only what is necessary to describe infinitesimal changes between the
configuration at time t and at time t + ∆t. We will discuss these applications
first. Later, when we do need to discuss the configuration in section 4.4.2,
we will define Euler angles.
We are not assuming at the moment that the particle is part of the rigid body, in which case the $b'_i(t)$ would be independent of time. In the inertial coordinates the particle has its position given by $\vec r(t) = \tilde R(t) + \vec b(t)$, but the coordinates of $\vec b(t)$ are different in the space and body coordinates. Thus
$$r_i(t) = \tilde R_i(t) + b_i(t) = \tilde R_i(t) + \sum_j \left(A^{-1}(t)\right)_{ij} b'_j(t).$$
The velocity is $\vec v = \sum_i \dot r_i\hat e_i$, because the $\hat e_i$ are inertial and therefore considered stationary, so
$$\vec v = \dot{\tilde R} + \sum_{ij}\left[\left(\frac{d}{dt}A^{-1}(t)\right)_{ij} b'_j(t) + \left(A^{-1}(t)\right)_{ij}\frac{db'_j(t)}{dt}\right]\hat e_i,$$
and not $\dot{\tilde R} + \sum_i (db'_i/dt)\,\hat e'_i$, because the $\hat e'_i$ are themselves changing with time.
We might define a “body time derivative”
$$\left(\frac{d\vec b}{dt}\right)_b := \left(\dot{\vec b}\right)_b := \sum_i \frac{db'_i}{dt}\,\hat e'_i,$$
but it is not the velocity of the particle $\alpha$, even with respect to $\tilde R(t)$, in the sense that physically a vector is basis independent, and its derivative requires a notion of which basis vectors are considered time independent (inertial) and which are not. Converting the inertial evaluation to the body frame requires the velocity to include the $dA^{-1}/dt$ term as well as the $(\dot{\vec b})_b$ term.
What is the meaning of this extra term
$$\mathcal V = \sum_{ij}\left(\frac{d}{dt}A^{-1}(t)\right)_{ij} b'_j(t)\,\hat e_i\;?$$
This expression has coordinates in the body frame with basis vectors from the inertial frame. It is better to describe it in terms of the body coordinates and body basis vectors by inserting $\hat e_i = \sum_k \left(A^{-1}(t)\right)_{ik}\hat e'_k(t) = \sum_k A_{ki}(t)\,\hat e'_k(t)$.
Then we have
$$\mathcal V = \sum_{kj}\hat e'_k \lim_{\Delta t\to 0}\frac{1}{\Delta t}\left[A(t)A^{-1}(t+\Delta t) - A(t)A^{-1}(t)\right]_{kj} b'_j(t).$$
The second term is easy enough to understand, as $A(t)A^{-1}(t) = 1\mathrm{I}$, so the full second term is just $\vec b$ expressed in the body frame. The interpretation of the first term is suggested by its matrix form: $A^{-1}(t+\Delta t)$ maps the body
basis at $t+\Delta t$ to the inertial frame, and $A(t)$ maps this to the body basis at $t$. So together this is the infinitesimal rotation $\hat e'_i(t+\Delta t)\to\hat e'_i(t)$. This transformation must be close to an identity, as $\Delta t\to 0$. Let us expand it:
$$B := A(t)A^{-1}(t+\Delta t) = 1\mathrm{I} - \Omega'\,\Delta t + \mathcal O(\Delta t)^2. \qquad(4.5)$$
Here $\Omega'$ is a matrix which has fixed (finite) elements as $\Delta t\to 0$, and is called the generator of the rotation. Note $B^{-1} = 1\mathrm{I} + \Omega'\Delta t$ to the order we are working, while the transpose $B^T = 1\mathrm{I} - \Omega'^T\Delta t$, so because we know $B$ is orthogonal we must have that $\Omega'$ is antisymmetric, $\Omega' = -\Omega'^T$, $\Omega'_{ij} = -\Omega'_{ji}$.
Subtracting $1\mathrm{I}$ from both sides of (4.5) and taking the limit shows that the matrix
$$\Omega'(t) = -A(t)\cdot\frac{d}{dt}A^{-1}(t) = \left(\frac{d}{dt}A(t)\right)\cdot A^{-1}(t),$$
where the latter equality follows from differentiating $A\cdot A^{-1} = 1\mathrm{I}$. The antisymmetric $3\times3$ real matrix $\Omega'$ is determined by the three off-diagonal elements above the diagonal, $\Omega'_{23} = \omega'_1$, $\Omega'_{13} = -\omega'_2$, $\Omega'_{12} = \omega'_3$, as the others are given by antisymmetry. Thus it is effectively a vector. It is very useful to express this relationship by defining the Levi-Civita symbol $\epsilon_{ijk}$, a totally antisymmetric rank 3 tensor specified by $\epsilon_{123} = 1$. Then the above expressions are given by $\Omega'_{ij} = \sum_k \epsilon_{ijk}\,\omega'_k$, and we also have
$$\frac12\sum_{ij}\epsilon_{kij}\Omega'_{ij} = \frac12\sum_{ij\ell}\epsilon_{kij}\epsilon_{ij\ell}\,\omega'_\ell = \omega'_k,$$
because, as explored in Appendix A.1,
$$\epsilon_{kij} = \epsilon_{ijk}, \qquad \sum_i \epsilon_{ijk}\epsilon_{ipq} = \delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp}, \qquad\text{so}\quad \sum_{ij}\epsilon_{ijk}\epsilon_{ij\ell} = 2\delta_{k\ell}.$$
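These $\epsilon_{ijk}$ identities are finite sums and can be verified by brute force, which is a useful exercise when first meeting the symbol:

```python
import numpy as np
from itertools import permutations

# Build epsilon_{ijk}: +1 for even permutations of (0,1,2), -1 for odd.
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    i, j, k = p
    # sign of the permutation, from the determinant of the permuted identity
    eps[i, j, k] = np.sign(np.linalg.det(np.eye(3)[list(p)]))

delta = np.eye(3)
# sum_i eps_{ijk} eps_{ipq} = delta_{jp} delta_{kq} - delta_{jq} delta_{kp}
for j in range(3):
    for k in range(3):
        for p_ in range(3):
            for q in range(3):
                lhs = sum(eps[i, j, k] * eps[i, p_, q] for i in range(3))
                assert np.isclose(lhs, delta[j, p_] * delta[k, q]
                                       - delta[j, q] * delta[k, p_])
# contracting once more: sum_{ij} eps_{ijk} eps_{ijl} = 2 delta_{kl}
for k in range(3):
    for l in range(3):
        s = sum(eps[i, j, k] * eps[i, j, l] for i in range(3) for j in range(3))
        assert np.isclose(s, 2 * delta[k, l])
```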
Thus $\mathcal V = \vec\omega\times\vec b$, where $\vec\omega = \sum_\ell \omega'_\ell\,\hat e'_\ell$. Note we have used Eq. A.4 for the cross-product. Thus we have shown that
$$\vec v = \dot{\tilde R} + \vec\omega\times\vec b + \left(\dot{\vec b}\right)_b, \qquad(4.6)$$
and the second term, coming from $\mathcal V$, represents the motion due to the rotating coordinate system.
When differentiating a true vector, which is independent of the origin of the coordinate system, rather than a position, the first term in (4.6) is absent, so in general for a vector $\vec C$,
$$\frac{d}{dt}\vec C = \left(\frac{d\vec C}{dt}\right)_b + \vec\omega\times\vec C. \qquad(4.7)$$
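Eq. (4.7) is easy to check numerically in the simplest situation: a vector with constant body components, so the body derivative vanishes and the lab derivative must be $\vec\omega\times\vec C$. A sketch with rotation about the $z$-axis (the sample $\omega$ and components are ours):

```python
import numpy as np

omega = np.array([0.0, 0.0, 1.3])                 # rotation about z
Cbody = np.array([0.4, -0.2, 0.7])                # fixed body components

def Clab(t):
    """Lab components at time t: body rotated by angle |omega| t about z."""
    th = np.linalg.norm(omega) * t
    R = np.array([[np.cos(th), -np.sin(th), 0],
                  [np.sin(th), np.cos(th), 0],
                  [0, 0, 1]])
    return R @ Cbody

t, h = 0.6, 1e-6
dCdt = (Clab(t + h) - Clab(t - h)) / (2 * h)      # numerical lab derivative
assert np.allclose(dCdt, np.cross(omega, Clab(t)), atol=1e-6)
```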
$$\frac{d}{dt}\hat e'_i(t) = \frac{d}{dt}\sum_j A_{ij}(t)\,\hat e_j = \sum_j (\Omega'A)_{ij}\,\hat e_j = \sum_k \Omega'_{ik}\,\hat e'_k,$$
as given in (4.7). This shows that even the peculiar object $(\dot{\vec b})_b$ obeys (4.7).
Applying this to the velocity itself (4.6), we find the acceleration
$$\begin{aligned}
\vec a = \frac{d}{dt}\vec v &= \ddot{\tilde R} + \frac{d\vec\omega}{dt}\times\vec b + \vec\omega\times\frac{d\vec b}{dt} + \frac{d}{dt}\left(\dot{\vec b}\right)_b\\
&= \ddot{\tilde R} + \dot{\vec\omega}\times\vec b + \vec\omega\times\left[\left(\frac{d\vec b}{dt}\right)_b + \vec\omega\times\vec b\right] + \left(\frac{d^2\vec b}{dt^2}\right)_b + \vec\omega\times\left(\frac{d\vec b}{dt}\right)_b\\
&= \ddot{\tilde R} + \left(\frac{d^2\vec b}{dt^2}\right)_b + 2\vec\omega\times\left(\frac{d\vec b}{dt}\right)_b + \dot{\vec\omega}\times\vec b + \vec\omega\times\left(\vec\omega\times\vec b\right).
\end{aligned}$$
³Actually $\vec\omega$ is a pseudovector, which behaves like a vector under rotations but changes sign compared to what a vector does under reflection in a mirror.
The additions to the real force are the pseudoforce for an accelerating reference frame $-m\ddot{\tilde R}$, the Coriolis force $-2m\vec\omega\times\vec v\,'$, an unnamed force involving the angular acceleration of the coordinate system $-m\dot{\vec\omega}\times\vec r$, and the centrifugal force $-m\vec\omega\times(\vec\omega\times\vec r)$ respectively.
$$\vec v_\alpha = \dot{\tilde R} + \vec\omega\times\vec b_\alpha$$
$$\vec p_\alpha = m_\alpha\tilde V + m_\alpha\vec\omega\times\vec b_\alpha$$
$$\vec P = M\tilde V + \vec\omega\times\sum_\alpha m_\alpha\vec b_\alpha = M\tilde V + M\vec\omega\times\vec B,$$
where $\vec B$ is the center of mass position relative to the marked point $\tilde R$.
As the expression for $\vec L$ already involves a cross product, we will find a triple product, and will use the reduction formula⁴
\[ \vec A\times\left(\vec B\times\vec C\right) = \vec B\left(\vec A\cdot\vec C\right) - \vec C\left(\vec A\cdot\vec B\right). \]
Thus
\[ \vec L = \sum_\alpha m_\alpha\,\vec b_\alpha\times\left(\vec\omega\times\vec b_\alpha\right) \qquad (4.9) \]
\[ \phantom{\vec L} = \vec\omega\sum_\alpha m_\alpha\,\vec b_\alpha^{\,2} - \sum_\alpha m_\alpha\,\vec b_\alpha\left(\vec b_\alpha\cdot\vec\omega\right). \qquad (4.10) \]
In components,
\[ L_i = \omega_i\sum_\alpha m_\alpha\,\vec b_\alpha^{\,2} - \sum_\alpha m_\alpha\,b_{\alpha i}\left(\vec b_\alpha\cdot\vec\omega\right) = \sum_j\sum_\alpha m_\alpha\left(\vec b_\alpha^{\,2}\,\delta_{ij} - b_{\alpha i}b_{\alpha j}\right)\omega_j \equiv \sum_j I_{ij}\,\omega_j, \]
where
\[ I_{ij} = \sum_\alpha m_\alpha\left(\vec b_\alpha^{\,2}\,\delta_{ij} - b_{\alpha i}b_{\alpha j}\right). \qquad (4.11) \]
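The definition (4.11) can be checked numerically against (4.9) for an arbitrary collection of point masses; the masses, positions, and angular velocity below are made up for illustration:

```python
import numpy as np

# Build I_ij = sum_a m_a (b_a^2 delta_ij - b_ai b_aj) and check that I . omega
# reproduces L = sum_a m_a b_a x (omega x b_a), as in Eqs. (4.9)-(4.11).
rng = np.random.default_rng(0)
m = rng.uniform(0.5, 2.0, size=5)          # particle masses (arbitrary)
b = rng.normal(size=(5, 3))                # positions relative to the marked point
omega = np.array([0.2, -0.4, 1.1])

I = sum(ma * (ba @ ba * np.eye(3) - np.outer(ba, ba)) for ma, ba in zip(m, b))
L_direct = sum(ma * np.cross(ba, np.cross(omega, ba)) for ma, ba in zip(m, b))
print(np.allclose(I @ omega, L_direct))    # True

# Kinetic energy about the origin: T = (1/2) omega . I . omega
v = np.cross(omega, b)                     # v_a = omega x b_a for each particle
T_direct = 0.5 * sum(ma * va @ va for ma, va in zip(m, v))
print(np.allclose(T_direct, 0.5 * omega @ I @ omega))  # True
```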
Kinetic energy
For a body rotating about the origin,
\[ T = \frac12\sum_\alpha m_\alpha\,\vec v_\alpha^{\,2} = \frac12\sum_\alpha m_\alpha\left(\vec\omega\times\vec b_\alpha\right)\cdot\left(\vec\omega\times\vec b_\alpha\right). \]
From the general 3-dimensional identity⁵
\[ \left(\vec A\times\vec B\right)\cdot\left(\vec C\times\vec D\right) = \left(\vec A\cdot\vec C\right)\left(\vec B\cdot\vec D\right) - \left(\vec A\cdot\vec D\right)\left(\vec B\cdot\vec C\right), \]
we have
\begin{eqnarray*}
T &=& \frac12\sum_\alpha m_\alpha\left[\vec\omega^{\,2}\,\vec b_\alpha^{\,2} - \left(\vec\omega\cdot\vec b_\alpha\right)^2\right] \\
&=& \frac12\sum_{ij}\omega_i\omega_j\sum_\alpha m_\alpha\left(\vec b_\alpha^{\,2}\,\delta_{ij} - b_{\alpha i}b_{\alpha j}\right) \\
&=& \frac12\sum_{ij}\omega_i\,I_{ij}\,\omega_j, \qquad (4.14)
\end{eqnarray*}
or
\[ T = \frac12\,\vec\omega\cdot{\rm I}\cdot\vec\omega. \]
Noting that $\sum_j I_{ij}\omega_j = L_i$, $T = \frac12\vec\omega\cdot\vec L$ for a rigid body rotating about the origin, with $\vec L$ measured from that origin.
and again the inertia tensor ${\rm I}^{(0)}$ is calculated about the arbitrary point $\tilde R$. We will see that it makes more sense to use the center of mass.
so $\frac12 M\vec V^2 = \frac12 M\tilde V^2 + M\tilde V\cdot\left(\vec\omega\times\vec B\right) + \frac12 M\left(\vec\omega\times\vec B\right)^2$. Comparing with 4.20, we see that
\[ T = \frac12 M\vec V^2 - \frac12 M\left(\vec\omega\times\vec B\right)^2 + \frac12\,\vec\omega\cdot{\rm I}^{(0)}\cdot\vec\omega. \]
The last two terms can be written in terms of the inertia tensor about the center of mass. From 4.16 with $\vec b = 0$, as $\vec B$ is the center of mass,
\[ I^{({\rm cm})}_{ij} = I^{(0)}_{ij} - M B^2\delta_{ij} + M B_iB_j. \]
Using the formula for $\left(\vec A\times\vec B\right)\cdot\left(\vec C\times\vec D\right)$ again,
\begin{eqnarray*}
T &=& \frac12 M\vec V^2 - \frac12 M\left[\vec\omega^{\,2}\vec B^2 - \left(\vec\omega\cdot\vec B\right)^2\right] + \frac12\,\vec\omega\cdot{\rm I}^{(0)}\cdot\vec\omega \\
&=& \frac12 M\vec V^2 + \frac12\,\vec\omega\cdot{\rm I}^{({\rm cm})}\cdot\vec\omega. \qquad (4.22)
\end{eqnarray*}
The angular momentum also decomposes:
\begin{eqnarray*}
\vec L &=& M\vec R\times\vec V - M\left(\vec R-\tilde R\right)\times\left(\vec\omega\times\vec B\right) + {\rm I}^{(0)}\cdot\vec\omega \\
&=& M\vec R\times\vec V - M\vec B\times\left(\vec\omega\times\vec B\right) + {\rm I}^{(0)}\cdot\vec\omega \\
&=& M\vec R\times\vec V - M\vec\omega\,B^2 + M\vec B\left(\vec\omega\cdot\vec B\right) + {\rm I}^{(0)}\cdot\vec\omega \\
&=& M\vec R\times\vec V + {\rm I}^{({\rm cm})}\cdot\vec\omega, \qquad (4.23)
\end{eqnarray*}
so we see that the angular momentum, measured about the center of mass, is just ${\rm I}^{({\rm cm})}\cdot\vec\omega$.
The parallel axis theorem is also of the form of a decomposition. The inertia tensor about a given point $\vec r$ given by (4.16) is
\[ I^{(r)}_{ij} = I^{({\rm cm})}_{ij} + M\left[\left(\vec r-\vec R\right)^2\delta_{ij} - \left(r_i-R_i\right)\left(r_j-R_j\right)\right]. \]
This is, once again, the sum of the quantity, here the inertia tensor, of the body about the center of mass, plus the value a particle of mass M at the center of mass $\vec R$ would have, evaluated about $\vec r$.
There is another theorem about moments of inertia, though much less
general — it only applies to a planar object — let’s say in the xy plane, so
that zα ≈ 0 for all the particles constituting the body. As
\[ I_{zz} = \sum_\alpha m_\alpha\left(x_\alpha^2 + y_\alpha^2\right), \qquad I_{xx} = \sum_\alpha m_\alpha\left(y_\alpha^2 + z_\alpha^2\right) = \sum_\alpha m_\alpha y_\alpha^2, \qquad I_{yy} = \sum_\alpha m_\alpha\left(x_\alpha^2 + z_\alpha^2\right) = \sum_\alpha m_\alpha x_\alpha^2, \]
we see that $I_{zz} = I_{xx} + I_{yy}$: the moment of inertia about an axis perpendicular to the body is the sum of the moments about two perpendicular axes within the body, through the same point. This is known as the perpendicular axis theorem. As an example of its usefulness we calculate the moments for a thin uniform ring lying on the circle $x^2 + y^2 = R^2$, $z = 0$, about the origin. As every particle of the ring has the same distance R from the z-axis, the moment of inertia $I_{zz}$ is simply $MR^2$. As $I_{xx} = I_{yy}$ by symmetry, and as the two must add up to $I_{zz}$, we have, by a simple indirect calculation, $I_{xx} = \frac12 MR^2$.
The parallel axis theorem (4.17) is also a useful calculational tool. Consider the moment of inertia of the ring about an axis parallel to its axis of symmetry but through a point on the ring. About the axis of symmetry, $I_{zz} = MR^2$, and $b_\perp = R$, so about a point on the ring, $I_{zz} = 2MR^2$. If instead we want the moment about a tangent to the ring in the x direction, $I_{xx} = I^{({\rm cm})}_{xx} + MR^2 = \frac12 MR^2 + MR^2 = 3MR^2/2$. Of course for $I_{yy}$, $b_\perp = 0$, so $I_{yy} = \frac12 MR^2$, and we may verify that $I_{zz} = I_{xx} + I_{yy}$ about this point as well.
For an object which has some thickness, with non-zero z components, the
perpendicular axis theorem becomes an inequality, Izz ≤ Ixx + Iyy .
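A discretized ring makes both theorems easy to verify numerically; M, R, and the number of sample points below are arbitrary choices:

```python
import numpy as np

# Thin uniform ring of mass M on x^2 + y^2 = R^2, z = 0: check I_zz = M R^2,
# I_xx = (1/2) M R^2 (perpendicular axis theorem), and the parallel axis
# shift of I_zz to a point on the ring, I_zz -> 2 M R^2.
M, R, N = 2.0, 1.5, 4096
phi = np.linspace(0, 2 * np.pi, N, endpoint=False)
pts = np.stack([R * np.cos(phi), R * np.sin(phi), np.zeros(N)], axis=1)
m = np.full(N, M / N)

def inertia(points, masses):
    return sum(ma * (p @ p * np.eye(3) - np.outer(p, p))
               for ma, p in zip(masses, points))

I = inertia(pts, m)
print(np.isclose(I[2, 2], M * R**2))         # I_zz = M R^2
print(np.isclose(I[0, 0], 0.5 * M * R**2))   # I_xx = M R^2 / 2
# Parallel axis: about the point (R, 0, 0) on the ring, b_perp = R for z
I_pt = inertia(pts - np.array([R, 0, 0]), m)
print(np.isclose(I_pt[2, 2], 2 * M * R**2))  # True
```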
Principal axes
If an object has an axial symmetry about z, we may use cylindrical polar
coordinates (ρ, θ, z). Then its density µ(ρ, θ, z) must be independent of θ,
and
\[ I_{ij} = \int dz\,\rho\,d\rho\,d\theta\;\mu(\rho,z)\left[\left(\rho^2+z^2\right)\delta_{ij} - r_ir_j\right], \]
so
\begin{eqnarray*}
I_{xz} &=& \int dz\,\rho\,d\rho\,d\theta\;\mu(\rho,z)\,(-z\rho\cos\theta) = 0, \\
I_{xy} &=& \int dz\,\rho\,d\rho\,d\theta\;\mu(\rho,z)\,(\rho^2\sin\theta\cos\theta) = 0, \\
I_{xx} &=& \int dz\,\rho\,d\rho\,d\theta\;\mu(\rho,z)\left(\rho^2+z^2-\rho^2\cos^2\theta\right), \\
I_{yy} &=& \int dz\,\rho\,d\rho\,d\theta\;\mu(\rho,z)\left(\rho^2+z^2-\rho^2\sin^2\theta\right) = I_{xx}.
\end{eqnarray*}
Thus the inertia tensor is diagonal and has two equal elements,
\[ {\rm I} = \begin{pmatrix} I_{xx} & 0 & 0 \\ 0 & I_{xx} & 0 \\ 0 & 0 & I_{zz} \end{pmatrix}. \]
In general, an object need not have an axis of symmetry, and even a
diagonal inertia tensor need not have two equal “eigenvalues”. Even if a
body has no symmetry, however, there is always a choice of axes, a coordinate
system, such that in this system the inertia tensor is diagonal. This is because
$I_{ij}$ is always a real symmetric tensor, and any such tensor can be brought to diagonal form by an orthogonal similarity transformation⁹:
\[ {\rm I} = O\,{\rm I}_D\,O^{-1}, \qquad {\rm I}_D = \begin{pmatrix} I_1 & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{pmatrix}. \qquad (4.25) \]
An orthogonal matrix O is either a rotation or a rotation times P , and the
P ’s can be commuted through ID without changing its form, so there is a
rotation R which brings the inertia tensor into diagonal form. The axes of
this new coordinate system are known as the principal axes.
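Numerically, finding the principal axes is a standard symmetric eigenproblem; the inertia tensor below is an arbitrary example, and `numpy.linalg.eigh` supplies the diagonalizing orthogonal matrix of Eq. (4.25):

```python
import numpy as np

# Diagonalize a real symmetric inertia tensor: I = O . I_D . O^{-1}.
# eigh returns the principal moments in ascending order and the columns
# of O as the principal axes.  (If det O = -1, O is a rotation times the
# parity P; flipping the sign of one column gives a proper rotation.)
I = np.array([[3.0, 0.4, 0.0],
              [0.4, 2.0, 0.5],
              [0.0, 0.5, 4.0]])
moments, O = np.linalg.eigh(I)
print(np.allclose(O @ np.diag(moments) @ O.T, I))  # I = O . I_D . O^{-1}

# Each column of O is a principal axis: I . v_i = I_i v_i
for Ii, vi in zip(moments, O.T):
    assert np.allclose(I @ vi, Ii * vi)
```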
Tire balancing
Consider a rigid body rotating on an axle, and therefore about a fixed axis.
What total force and torque will the axle exert? First, $\dot{\vec R} = \vec\omega\times\vec R$, so
\[ \ddot{\vec R} = \dot{\vec\omega}\times\vec R + \vec\omega\times\dot{\vec R} = \dot{\vec\omega}\times\vec R + \vec\omega\times\left(\vec\omega\times\vec R\right) = \dot{\vec\omega}\times\vec R + \vec\omega\left(\vec\omega\cdot\vec R\right) - \vec R\,\omega^2. \]
If the axis is fixed, $\vec\omega$ and $\dot{\vec\omega}$ are in the same direction, so the first term in the last expression is perpendicular to the other two. If we want the total force to be zero¹⁰, $\ddot{\vec R} = 0$, so
\[ \vec R\cdot\ddot{\vec R} = 0 = 0 + \left(\vec\omega\cdot\vec R\right)^2 - R^2\omega^2. \]
⁹This should be proven in any linear algebra course. For example, see [1], Theorem 6 in Section 6.3.
¹⁰Here we are ignoring any constant force compensating the force exerted by the road which is holding the car up!
Thus the angle between ω ~ and R ~ is 0 or π, and the center of mass must lie
on the axis of rotation. This is the condition of static balance if the axis of
rotation is horizontal in a gravitational field. Consider a car tire: to be stable
at rest at any angle, R ~ must lie on the axis or there will be a gravitational
torque about the axis, causing rotation in the absence of friction. If the tire
is not statically balanced, this force will rotate rapidly with the tire, leading
to vibrations of the car.
Even if the net force is 0, there might be a torque, $\vec\tau = \dot{\vec L} = d({\rm I}\cdot\vec\omega)/dt$. If ${\rm I}\cdot\vec\omega$ is not parallel to $\vec\omega$, it will rotate with the wheel, and so $\dot{\vec L}$ will rapidly oscillate. This is also not good for your axle. If, however, $\vec\omega$ is parallel to one of the principal axes, ${\rm I}\cdot\vec\omega$ is parallel to $\vec\omega$, so if $\vec\omega$ is constant, so is $\vec L$, and $\vec\tau = 0$. The process of placing small weights around the tire to cause one of the principal axes to be aligned with the axle is called dynamical balancing.
Every rigid body has its principal axes; the problem of finding them and the moments of inertia about them, given the inertia tensor I in some coordinate system, is a mathematical question of finding a rotation R and "eigenvalues" $I_1$, $I_2$, $I_3$ (not components of a vector) such that equation 4.25 holds, with R in place of O. The vector $\vec v_1 = R\begin{pmatrix}1\\0\\0\end{pmatrix}$ is then an eigenvector, for
\[ {\rm I}\cdot\vec v_1 = R\,{\rm I}_D\,R^{-1}R\begin{pmatrix}1\\0\\0\end{pmatrix} = R\,{\rm I}_D\begin{pmatrix}1\\0\\0\end{pmatrix} = I_1\,R\begin{pmatrix}1\\0\\0\end{pmatrix} = I_1\vec v_1. \]
Similarly I · ~v2 = I2~v2 and I · ~v3 = I3~v3 , where ~v2 and ~v3 are defined the same
way, starting with ê2 and ê3 instead of ê1 . Note that, in general, I acts simply
as a multiplier only for multiples of these three vectors individually, and not
for sums of them. On a more general vector I will change the direction as
well as the length of the vector it acts on.
Note that the $I_i$ are all $\geq 0$, for given any unit vector $\hat n$,
\[ \hat n\cdot{\rm I}\cdot\hat n = \sum_\alpha m_\alpha\left[\vec b_\alpha^{\,2} - \left(\vec b_\alpha\cdot\hat n\right)^2\right] \geq 0, \]
so all the eigenvalues must be $\geq 0$. An eigenvalue will be equal to zero only if all massive points of the body are in the $\pm\hat n$ directions, in which case the rigid body must be a thin line.
4.4 Dynamics
4.4.1 Euler’s Equations
So far, we have been working in an inertial coordinate system O. In complicated situations this is rather unnatural; it is more natural to use a coordinate system O' fixed in the rigid body. In such a coordinate system, the vector one gets by differentiating the coefficients of a vector $\vec b = \sum b'_i\hat e'_i$ differs from the inertial derivative $\dot{\vec b}$ as given in Eq. 4.7. Consider two important special cases: either we have a system rotating about a fixed point $\tilde R$, with $\vec\tau$, $\vec L$, and $I'_{ij}$ all evaluated about that fixed point, or we are working about the center of mass, with $\vec\tau$, $\vec L$, and $I'_{ij}$ all evaluated about the center of mass, even if it is in motion. In either case, we have $\vec L = {\rm I}'\cdot\vec\omega$, so for the time derivative of the angular momentum, we have
\begin{eqnarray*}
\vec\tau = \frac{d\vec L}{dt} &=& \left(\frac{d\vec L}{dt}\right)_{\!b} + \vec\omega\times\vec L \\
&=& \sum_{ij}\hat e'_i\,\frac{d\left(I'_{ij}\omega'_j\right)}{dt} + \vec\omega\times\left({\rm I}'\cdot\vec\omega\right).
\end{eqnarray*}
Now in the O' frame, all the masses are at fixed positions, so $I'_{ij}$ is constant, and the first term is simply ${\rm I}'\cdot(d\vec\omega/dt)_b$, which by (4.8) is simply ${\rm I}'\cdot\dot{\vec\omega}$. Thus we have (in the body coordinate system)
\[ \vec\tau = {\rm I}'\cdot\dot{\vec\omega} + \vec\omega\times\left({\rm I}'\cdot\vec\omega\right). \qquad (4.26) \]
The torque not only determines the rate of change of the angular momen-
tum, but also does work in the system. For a system rotating about a fixed
point, we see from the expression (4.14), $T = \frac12\vec\omega\cdot{\rm I}\cdot\vec\omega$, that
\[ \frac{dT}{dt} = \frac12\,\dot{\vec\omega}\cdot{\rm I}\cdot\vec\omega + \frac12\,\vec\omega\cdot\dot{\rm I}\cdot\vec\omega + \frac12\,\vec\omega\cdot{\rm I}\cdot\dot{\vec\omega}. \]
The first and last terms are equal because the inertia tensor is symmetric, $I_{ij} = I_{ji}$, and the middle term vanishes in the body-fixed coordinate system because all particle positions are fixed. Thus $dT/dt = \vec\omega\cdot{\rm I}\cdot\dot{\vec\omega} = \vec\omega\cdot\dot{\vec L} = \vec\omega\cdot\vec\tau$.
Thus the kinetic energy changes due to the work done by the external torque.
Therefore, of course, if there is no torque the kinetic energy is constant.
We will write out explicitly the components of Eq. 4.26, in principal-axis coordinates. In evaluating $\tau_1$, we need the first component of the second term,
\[ \left[\vec\omega\times\left({\rm I}'\cdot\vec\omega\right)\right]_1 = \omega_2\omega_3\left(I_3 - I_2\right). \]
Inserting this and the similar expressions for the other components into Eq. (4.26), we get Euler's equations
\[ \tau_1 = I_1\dot\omega_1 + \left(I_3-I_2\right)\omega_2\omega_3, \qquad \tau_2 = I_2\dot\omega_2 + \left(I_1-I_3\right)\omega_1\omega_3, \qquad \tau_3 = I_3\dot\omega_3 + \left(I_2-I_1\right)\omega_1\omega_2. \]
about the axis ω~ , but this axis is not fixed in the body. At any instant,
the points on this line are not moving, and we may think of the body rolling
without slipping on the lab cone, with ω
~ the momentary line of contact. Thus
the body cone rolls on the lab cone without slipping.
at the point ω ~ (t). The path that ω~ (t) sweeps out on the invariant plane is
called the herpolhode. At this particular moment, the point corresponding
to ω
~ in the body is not moving, so the inertia ellipsoid is rolling, not slipping,
on the invariant plane.
In general, if there is no special symmetry, the inertia ellipsoid will not
be axially symmetric, so that in order to roll on the fixed plane and keep its
center at a fixed point, it will need to bob up and down. But in the special
case with axial symmetry, the inertia ellipsoid will also have this symmetry,
so it can roll about a circle, with its symmetry axis at a fixed angle relative
to the invariant plane. In the body frame, ω3 is fixed and the polhode moves
on a circle of radius A = ω sin φb . In the lab frame, ω ~ rotates about L, ~ so
it sweeps out a circle of radius ω sin φL in the invariant plane. One circle is
rolling on the other, and the polhode rotates about its circle at the rate Ω in
the body frame, so the angular rate at which the herpolhode rotates about
$\vec L$, $\Omega_L$, is
\[ \Omega_L = \Omega\,\frac{\hbox{circumference of polhode circle}}{\hbox{circumference of herpolhode circle}} = \omega_3\,\frac{I_3-I_1}{I_1}\,\frac{\sin\phi_b}{\sin\phi_L}. \]
about z just as we found for the symmetric top. This will be the case if $I_3$ is either the largest or the smallest eigenvalue. If, however, it is the middle eigenvalue, the constant will be positive, and the equation is solved by exponentials, one damping out and one growing. Unless the initial conditions are perfectly fixed, the growing piece will have a nonzero coefficient and the perturbation will blow up. Thus a rotation about the intermediate principal axis is unstable, while motion about the axes with the largest and smallest moments are stable. For the case where two of the moments are equal, the motion will be stable about the third, and slightly unstable (the perturbation will grow linearly instead of exponentially with time) about the others.
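This instability of the intermediate axis can be illustrated by integrating the torque-free Euler equations (Eq. 4.26 with $\vec\tau = 0$) directly; the moments of inertia, time step, and thresholds below are arbitrary choices, and the integrator is a generic RK4 sketch rather than anything from the text:

```python
import numpy as np

# Free-body Euler equations I_i dw_i/dt = (I_j - I_k) w_j w_k, cyclic.
# Rotation started almost exactly about the largest-moment axis stays
# there; started almost about the intermediate axis, the small
# perturbation grows and the body tumbles.
I = np.array([1.0, 2.0, 3.0])     # assumed principal moments, I_1 < I_2 < I_3

def deriv(w):
    return np.array([(I[1] - I[2]) * w[1] * w[2],
                     (I[2] - I[0]) * w[2] * w[0],
                     (I[0] - I[1]) * w[0] * w[1]]) / I

def max_excursion(w, off_axes, dt=2e-3, steps=10000):
    # RK4 integration, tracking the largest |omega| component off the spin axis
    dev = 0.0
    for _ in range(steps):
        k1 = deriv(w); k2 = deriv(w + 0.5*dt*k1)
        k3 = deriv(w + 0.5*dt*k2); k4 = deriv(w + dt*k3)
        w = w + (dt/6.0)*(k1 + 2*k2 + 2*k3 + k4)
        dev = max(dev, np.max(np.abs(w[off_axes])))
    return dev

stable = max_excursion(np.array([1e-3, 1e-3, 1.0]), [0, 1])   # spin near axis 3
unstable = max_excursion(np.array([1e-3, 1.0, 1e-3]), [0, 2])  # spin near axis 2
print(stable < 0.05, unstable > 0.5)
```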
An interesting way of understanding this stability or instability of rotation close to a principal axis involves another ellipsoid we can define for the free rigid body, an ellipsoid of possible angular momentum values. Of course in the inertial coordinates $\vec L$ is constant, but in body-fixed language the coordinates vary with time, though the length of $\vec L$ is still constant. In addition, the conservation of kinetic energy
\[ 2T = \vec L\cdot{\rm I}^{-1}\cdot\vec L \]
(where ${\rm I}^{-1}$ is the inverse of the moment of inertia matrix) gives a quadratic equation for the three components of $\vec L$, just as we had for $\vec\omega$ and the ellipsoid of inertia. The path of $\vec L(t)$ on this ellipsoid is on the intersection of the ellipsoid with a sphere of radius $|\vec L|$, for the length is fixed.
If $\vec\omega$ is near the principal axis with the largest moment of inertia, $\vec L$ lies near the major axis of the ellipsoid. The sphere is nearly circumscribing the ellipsoid, so the intersection consists only of two small loops surrounding each end of the major axis. Similarly if $\vec\omega$ is near the smallest moment, the sphere is nearly inscribed in the ellipsoid, and again the possible values of $\vec L$ lie close to either end of the minor axis. Thus the subsequent motion is confined to one of these small loops. But if $\vec\omega$ starts near the intermediate principal axis, $\vec L$ does likewise, and the intersection consists of two loops which extend from near one end to near the other of the intermediate axis, and the possible continuous motion of $\vec L$ is not confined to a small region of the ellipsoid.
Because the rotation of the Earth flattens the poles, the Earth is approx-
imately an oblate ellipsoid, with I3 greater than I1 = I2 by about one part
in 300. As ω₃ is 2π per sidereal day, if $\vec\omega$ is not perfectly aligned with the
axis, it will precess about the symmetry axis once every 10 months. This
Chandler wobble is not of much significance, however, because the body
angle φb ≈ 10−6 .
We have chosen three specific directions about which to make the three ro-
tations, namely the original z-axis, the next y-axis, y1 , and then the new
z-axis, which is both z2 and z 0 . This choice is not universal, but is the one
generally used in quantum mechanics. Many of the standard classical me-
chanics texts13 take the second rotation to be about the x1 -axis instead of
y1 , but quantum mechanics texts14 avoid this because the action of Ry on a
spinor is real, while the action of Rx is not. While this does not concern us
here, we prefer to be compatible with quantum mechanics discussions.
This procedure is pictured in Figure 4.2. To see that any rotation can
be written in this form, and to determine the range of the angles, we first
discuss what fixes the $y_1$ axis. Notice that the rotation about the z-axis leaves z unaffected, so $z_1 = z$. Similarly, the last rotation leaves the $z_2$ axis unchanged, so it is also the $z'$ axis. The planes orthogonal to these
axes are also left invariant15 . These planes, the xy-plane and the x0 y 0 -plane
respectively, intersect in a line called the line of nodes16 . These planes
are also the x1 y1 and x2 y2 planes respectively, and as the second rotation
¹³See [2], [6], [9], [10], [11] and [17].
¹⁴For example [13] and [20].
¹⁵although the points in the planes are rotated by (4.4).
¹⁶The case where the xy and x'y' planes are identical, rather than intersecting in a line, is exceptional, corresponding to θ = 0 or θ = π. Then the two rotations about the z-axis add or subtract, and many choices for the Euler angles (φ, ψ) will give the same full rotation.
[Figure 4.2: The Euler angles: θ is the angle between the z and z' axes, φ and ψ are the rotations carrying y to $y_1$ and $y_1$ to y' respectively, and $y_1$ lies along the line of nodes where the xy and x'y' planes intersect.]
Ry1 (θ) must map the first into the second plane, we see that y1 , which is
unaffected by Ry1 , must be along the line of nodes. We choose between the
two possible orientations of y1 to keep the necessary θ angle in [0, π]. The
angles φ and ψ are then chosen ∈ [0, 2π) as necessary to map y → y1 and
y1 → y 0 respectively.
While the rotation about the z-axis leaves z unaffected, it rotates the x and y components by the matrix (4.4). Thus in three dimensions, a rotation about the z axis is represented by
\[ R_z(\phi) = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (4.29) \]
Similarly a rotation through an angle θ about the current y axis has a similar form,
\[ R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}. \qquad (4.30) \]
The reader needs to assure himself, by thinking of the rotations as active
transformations, that the action of the matrix Ry after having applied Rz
produces a rotation about the y1 -axis, not the original y-axis.
The full rotation $A = R_z(\psi)\cdot R_y(\theta)\cdot R_z(\phi)$ can then be found simply by matrix multiplication:
\begin{eqnarray*}
A(\phi,\theta,\psi) &=& \begin{pmatrix}\cos\psi & \sin\psi & 0\\ -\sin\psi & \cos\psi & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}\cos\theta & 0 & -\sin\theta\\ 0 & 1 & 0\\ \sin\theta & 0 & \cos\theta\end{pmatrix}\begin{pmatrix}\cos\phi & \sin\phi & 0\\ -\sin\phi & \cos\phi & 0\\ 0 & 0 & 1\end{pmatrix} \\
&=& \begin{pmatrix} -\sin\phi\sin\psi+\cos\theta\cos\phi\cos\psi & \cos\phi\sin\psi+\cos\theta\sin\phi\cos\psi & -\sin\theta\cos\psi \\ -\sin\phi\cos\psi-\cos\theta\cos\phi\sin\psi & \cos\phi\cos\psi-\cos\theta\sin\phi\sin\psi & \sin\theta\sin\psi \\ \sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \end{pmatrix}. \qquad (4.31)
\end{eqnarray*}
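The product (4.31) is easy to check numerically; this is only a sketch, with the angle values chosen arbitrarily:

```python
import numpy as np

# Verify that Rz(psi) Ry(theta) Rz(phi) matches the explicit matrix of (4.31).
def Rz(a):
    return np.array([[np.cos(a), np.sin(a), 0],
                     [-np.sin(a), np.cos(a), 0],
                     [0, 0, 1]])

def Ry(a):
    return np.array([[np.cos(a), 0, -np.sin(a)],
                     [0, 1, 0],
                     [np.sin(a), 0, np.cos(a)]])

phi, theta, psi = 0.3, 1.1, -0.7           # arbitrary Euler angles
A = Rz(psi) @ Ry(theta) @ Rz(phi)
explicit = np.array([
 [-np.sin(phi)*np.sin(psi) + np.cos(theta)*np.cos(phi)*np.cos(psi),
   np.cos(phi)*np.sin(psi) + np.cos(theta)*np.sin(phi)*np.cos(psi),
  -np.sin(theta)*np.cos(psi)],
 [-np.sin(phi)*np.cos(psi) - np.cos(theta)*np.cos(phi)*np.sin(psi),
   np.cos(phi)*np.cos(psi) - np.cos(theta)*np.sin(phi)*np.sin(psi),
   np.sin(theta)*np.sin(psi)],
 [ np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi), np.cos(theta)]])
print(np.allclose(A, explicit))  # True
```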
We need to reexpress the kinetic energy in terms of the Euler angles and their time derivatives. From the discussion of section 4.2, we have
\[ \Omega' = -A(t)\cdot\frac{d}{dt}A^{-1}(t). \]
The inverse matrix is simply the transpose, so finding Ω' can be done by straightforward differentiation and matrix multiplication¹⁷. The result is
\[ \Omega' = \begin{pmatrix} 0 & \dot\psi+\dot\phi\cos\theta & -\dot\theta\cos\psi-\dot\phi\sin\theta\sin\psi \\ -\dot\psi-\dot\phi\cos\theta & 0 & \dot\theta\sin\psi-\dot\phi\sin\theta\cos\psi \\ \dot\theta\cos\psi+\dot\phi\sin\theta\sin\psi & -\dot\theta\sin\psi+\dot\phi\sin\theta\cos\psi & 0 \end{pmatrix}. \qquad (4.32) \]
Note Ω' is antisymmetric as expected, so it can be recast into the axial vector $\vec\omega$:
\begin{eqnarray*}
\omega'_1 &=& \Omega'_{23} = \dot\theta\sin\psi - \dot\phi\sin\theta\cos\psi, \\
\omega'_2 &=& \Omega'_{31} = \dot\theta\cos\psi + \dot\phi\sin\theta\sin\psi, \qquad (4.33)\\
\omega'_3 &=& \Omega'_{12} = \dot\psi + \dot\phi\cos\theta.
\end{eqnarray*}
¹⁷Verifying the above expression for A and the following one for Ω' is a good application for a student having access to a good symbolic algebra computer program. Both Mathematica and Maple handle the problem nicely.
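A numerical finite-difference check is an alternative to the symbolic verification; the linear-in-t angle histories below are arbitrary assumptions made only for the test:

```python
import numpy as np

# Check Eq. (4.32)/(4.33): Omega' = -A d(A^-1)/dt, differenced numerically,
# reproduces the stated components of omega'.
def Rz(a):
    return np.array([[np.cos(a), np.sin(a), 0],
                     [-np.sin(a), np.cos(a), 0], [0, 0, 1]])

def Ry(a):
    return np.array([[np.cos(a), 0, -np.sin(a)],
                     [0, 1, 0], [np.sin(a), 0, np.cos(a)]])

def A(t):  # phi, theta, psi taken linear in t (arbitrary choices)
    return Rz(-0.7 + 0.9*t) @ Ry(1.1 - 0.2*t) @ Rz(0.3 + 0.4*t)

h = 1e-6   # central-difference step
Omega = -A(0.0) @ (np.linalg.inv(A(h)) - np.linalg.inv(A(-h))) / (2*h)

phi, theta, psi, dphi, dtheta, dpsi = 0.3, 1.1, -0.7, 0.4, -0.2, 0.9
omega = np.array([dtheta*np.sin(psi) - dphi*np.sin(theta)*np.cos(psi),
                  dtheta*np.cos(psi) + dphi*np.sin(theta)*np.sin(psi),
                  dpsi + dphi*np.cos(theta)])
print(np.allclose([Omega[1, 2], Omega[2, 0], Omega[0, 1]], omega, atol=1e-6))
```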
This expression for $\vec\omega$ gives the necessary velocities for the kinetic energy term (4.20 or 4.22) in the Lagrangian, which becomes
\[ L = \frac12 M\tilde V^2 + M\tilde V\cdot\left(\vec\omega\times\vec B\right) + \frac12\,\vec\omega\cdot{\rm I}^{(\tilde R)}\cdot\vec\omega - U(\tilde R,\theta,\psi,\phi), \qquad (4.34) \]
or
\[ L = \frac12 M\vec V^2 + \frac12\,\vec\omega\cdot{\rm I}^{({\rm cm})}\cdot\vec\omega - U(\vec R,\theta,\psi,\phi), \qquad (4.35) \]
with $\vec\omega = \sum_i\omega'_i\,\hat e'_i$ given by (4.33).
are constants of the motion. Let us use parameters a = pψ /I1 and b = pφ /I1 ,
which are more convenient, to parameterize the motion, instead of pφ , pψ , or
¹⁸As we did in discussing Euler's equations, we drop the primes on $\omega_i$ and on $I_{ij}$ even though we are evaluating these components in the body fixed coordinate system. The coordinate z, however, is still a lab coordinate, with $\hat e_z$ pointing upward.
even ω₃, which is also a constant of the motion and might seem physically a more natural choice. A third constant of the motion is the energy,
\[ E = T + U = \frac12 I_1\left(\dot\theta^2 + \dot\phi^2\sin^2\theta\right) + \frac12\omega_3^2 I_3 + Mg\ell\cos\theta. \]
Solving for $\dot\phi$ from $p_\phi = I_1 b = \dot\phi\sin^2\theta\,I_1 + I_1a\cos\theta$,
\[ \dot\phi = \frac{b - a\cos\theta}{\sin^2\theta}, \qquad (4.38) \]
\[ \dot\psi = \omega_3 - \dot\phi\cos\theta = \frac{I_1a}{I_3} - \frac{b - a\cos\theta}{\sin^2\theta}\cos\theta. \qquad (4.39) \]
Then E becomes
\[ E = \frac12 I_1\dot\theta^2 + U'(\theta) + \frac12 I_3\omega_3^2, \]
where
\[ U'(\theta) := \frac12 I_1\,\frac{\left(b - a\cos\theta\right)^2}{\sin^2\theta} + Mg\ell\cos\theta. \]
The term $\frac12 I_3\omega_3^2$ is an ignorable constant, so we consider $E' := E - \frac12 I_3\omega_3^2$ as the third constant of the motion, and we now have a one dimensional problem for θ(t), with a first integral of the motion. Once we solve for θ(t), we can plug back in to find $\dot\phi$ and $\dot\psi$.
Substitute $u = \cos\theta$, $\dot u = -\sin\theta\,\dot\theta$, so
\[ E' = \frac{I_1\dot u^2}{2\left(1-u^2\right)} + \frac12 I_1\,\frac{\left(b-au\right)^2}{1-u^2} + Mg\ell u, \]
or
\[ \dot u^2 = \left(1-u^2\right)\left(\alpha-\beta u\right) - \left(b-au\right)^2 =: f(u), \qquad (4.40) \]
with $\alpha = 2E'/I_1$, $\beta = 2Mg\ell/I_1$.
$f(u)$ is a cubic with a positive $u^3$ term, and is negative at $u = \pm1$, where the first term vanishes, and which are also the limits of the physical range of values of u.
To visualize what is happening, note that a point on the symmetry axis moves
on a sphere, with θ and φ representing the usual spherical coordinates, as
can be seen by examining what A−1 does to (0, 0, z 0 ). So as θ moves back
and forth between θmin and θmax , the top is wobbling closer and further
from the vertical, called nutation. At the same time, the symmetry axis
Figure 4.3: Possible loci for a point on the symmetry axis of the top. The
axis nutates between θ_min = 50° and θ_max = 60°.
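For concreteness, the turning points of the nutation can be found numerically from the cubic (4.40); the constants a, b, α, β below are arbitrary choices in that notation and do not correspond to the figure:

```python
import numpy as np

# f(u) = (1 - u^2)(alpha - beta u) - (b - a u)^2 has, for physical values,
# two roots u_X = cos(theta_max) and u_N = cos(theta_min) in [-1, 1], and
# one unphysical root u_U > 1.
a, b, alpha, beta = 1.0, 0.5, 1.0, 2.0     # assumed constants of the motion
# Expanded polynomial coefficients:
#   beta u^3 - (alpha + a^2) u^2 + (2ab - beta) u + (alpha - b^2)
coeffs = [beta, -(alpha + a**2), 2*a*b - beta, alpha - b**2]
u_X, u_N, u_U = np.sort(np.roots(coeffs).real)
theta_max, theta_min = np.degrees(np.arccos([u_X, u_N]))
print(u_U > 1 and -1 < u_X < u_N < 1)      # True
print(theta_min, theta_max)                # wobble limits in degrees
```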
Exercises
4.1 Prove the following properties of matrix algebra:
(a) Matrix multiplication is associative: A · (B · C) = (A · B) · C.
Consider now using a new basis $\vec e^{\,\prime}_i$ which are not orthonormal. Then we must choose which of the two above expressions to generalize. Let $\hat e_i = \sum_j A_{ji}\,\vec e^{\,\prime}_j$, and find the expressions for (a) $\vec e^{\,\prime}_j$ in terms of $\hat e_i$; (b) $V'_i$ in terms of $V_j$; and (c) $V_i$ in terms of $V'_j$. Then show (d) that if a linear transformation T which maps vectors $\vec V\to\vec W$ is given in the $\hat e_i$ basis by a matrix $B_{ij}$, in that $W_i = \sum_j B_{ij}V_j$, then the same transformation T in the $\vec e^{\,\prime}_i$ basis is given by $C = A\cdot B\cdot A^{-1}$. This transformation of matrices, $B\to C = A\cdot B\cdot A^{-1}$, for an arbitrary invertible matrix A, is called a similarity transformation.
4.3 Two matrices B and C are called similar if there exists an invertible matrix
A such that C = A · B · A−1 , and this transformation of B into C is called a
similarity transformation, as in the last problem. Show that, if B and C are similar,
(a) Tr B = Tr C; (b) det B = det C; (c) B and C have the same eigenvalues; (d) If
A is orthogonal and B is symmetric (or antisymmetric), then C is symmetric (or
antisymmetric).
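A quick numerical illustration of parts (a)-(c) of exercise 4.3, with arbitrary random matrices; this illustrates the claims but does not substitute for the proofs the exercise asks for:

```python
import numpy as np

# Similar matrices share trace, determinant, and eigenvalues.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = rng.normal(size=(4, 4))                # generic, hence invertible
C = A @ B @ np.linalg.inv(A)               # C = A . B . A^{-1}

print(np.isclose(np.trace(B), np.trace(C)))            # True
print(np.isclose(np.linalg.det(B), np.linalg.det(C)))  # True
eigB = np.sort_complex(np.linalg.eigvals(B))
eigC = np.sort_complex(np.linalg.eigvals(C))
print(np.allclose(eigB, eigC))                         # True
```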
4.4 From the fact that $A\cdot A^{-1} = 1$ for any invertible matrix, show that if A(t) is a differentiable matrix-valued function of time,
\[ \dot A\,A^{-1} = -A\,\frac{dA^{-1}}{dt}. \]
4.6 Consider a rigid body in the shape of a right circular cone of height h and a base which is a circle of radius R, made of matter with a uniform density ρ.
a) Find the position of the center of mass. Be sure to specify with respect to what.
b) Find the moment of inertia tensor in some suitable, well specified coordinate system about the center of mass.
c) Initially the cone is spinning about its symmetry axis, which is in the z direction, with angular velocity $\omega_0$, and with no external forces or torques acting on it. At time t = 0 it is hit with a momentary laser pulse which imparts an impulse P in the x direction at the apex of the cone, as shown. [Figure: the cone, with height h along the z axis, base radius R, and the impulse P applied at the apex in the x direction.] Describe the subsequent force-free motion, including, as a function of time, the angular velocity, angular momentum, and the position of the apex, in any inertial coordinate system you choose, provided you spell out the relation to the initial inertial coordinate system.
4.7 We defined the general rotation as A = Rz (ψ) · Ry (θ) · Rz (φ). Work out
the full expression for A(φ, θ, ψ), and verify the last expression in (4.31). [For
this and exercise 4.8, you might want to use a computer algebra program such as
Mathematica or Maple, if one is available.]
4.8 Find the expression for ω~ in terms of φ, θ, ψ, φ̇, θ̇, ψ̇. [This can be done simply
with computer algebra programs. If you want to do this by hand, you might find
it easier to use the product form $A = R_3R_2R_1$, and the rather simpler expressions for $R\dot R^T$. You will still need to bring the result (for $R_1\dot R_1^T$, for example) through the other rotations, which is somewhat messy.]
4.9 A diamond shaped object is shown in top, front, and side views. It is an octahedron, with 8 triangular flat faces. [Figure: top, front, and side views, with dimensions a and b and vertices labeled A, A', B, B', C.] It is made of solid aluminum of uniform density.
4.10 From the expression 4.40 for u = cos θ for the motion of the symmetric top,
we can derive a function for the time t(u) as an indefinite integral
\[ t(u) = \int^u f^{-1/2}(z)\,dz. \]
For values which are physically realizable, the function f has two (generically dis-
tinct) roots, uX ≤ uN in the interval u ∈ [−1, 1], and one root uU ∈ [1, ∞), which
does not correspond to a physical value of θ. The integrand is then generically an
analytic function of z with square root branch points at uN , uX , uU , and ∞, which
we can represent on a cut Riemann sheet with cuts on the real axis, [−∞, uX ] and
[uN , uU ], and f (u) > 0 for u ∈ (uX , uN ). Taking t = 0 at the time the top is at
the bottom of a wobble, θ = θmax , u = uX , we can find the time at which it first
reaches another u ∈ [uX , uN ] by integrating along the real axis. But we could also
use any other path in the upper half plane, as the integral of a complex function
is independent of deformations of the path through regions where the function is
analytic.
(a) Extend this definition to a function t(u) defined for Im u ≥ 0, with u not on a cut, and show that the image of this function is a rectangle in the complex t plane, and identify the pre-images of the sides. Call the width T/2 and the height τ/2.
(b) Extend this function to the lower half of the same Riemann sheet by allowing
contour integrals passing through [uX , uN ], and show that this extends the image
in t to the rectangle (0, T /2) × (−iτ /2, iτ /2).
(c) If the contour passes through the cut (−∞, uX ] onto the second Riemann sheet,
the integrand has the opposite sign from what it would have at the corresponding
point of the first sheet. Show that if the path takes this path onto the second sheet
4.4. DYNAMICS 121
and reaches the point u, the value t1 (u) thus obtained is t1 (u) = −t0 (u), where
t0 (u) is the value obtained in (a) or (b) for the same u on the first Riemann sheet.
(d) Show that passing to the second Riemann sheet by going through the cut
[uN , uU ] instead, produces a t2 (u) = t1 + T .
(e) Show that evaluating the integral along two contours, Γ1 and Γ2 , which differ
only by Γ1 circling the [uN , uU ] cut clockwise once more than Γ2 does, gives t1 =
t2 + iτ .
(f) Show that any value of t can be reached by some path, by circling the [uN , uU ]
as many times as necessary, and also by passing downwards through it and upwards
through the [−∞, uX ] cut as often as necessary (perhaps reversed).
(g) Argue that this means the function u(t) is an analytic function from the complex t plane into the complex u plane, analytic except at the points $t = nT + i\left(m+\frac12\right)\tau$, where u(t) has double poles. Note this function is doubly periodic, with u(t) = u(t + nT + imτ).
(h) Show that the function is then given by $u = \beta\,\wp(t - i\tau/2) + c$, where c is a constant, β is the constant from (4.40), and $\wp$ is the Weierstrass function
\[ \wp(z) = \frac{1}{z^2} + \sum_{\substack{m,n\in\mathbb{Z}\\ (m,n)\neq 0}}\left[\frac{1}{\left(z - nT - mi\tau\right)^2} - \frac{1}{\left(nT + mi\tau\right)^2}\right], \]
which satisfies
\[ \wp'^{\,2} = 4\wp^3 - g_2\wp - g_3, \]
where
\[ g_2 = 60\sum_{(m,n)\neq 0}\left(nT + mi\tau\right)^{-4}, \qquad g_3 = 140\sum_{(m,n)\neq 0}\left(nT + mi\tau\right)^{-6}. \]
[Note that the Weierstrass function is defined more generally, using parameters
ω1 = T /2, ω2 = iτ /2, with the ω’s permitted to be arbitrary complex numbers
with differing phases.]
4.11 As a rotation about the origin maps the unit sphere into itself, one way
to describe rotations is as a subset of maps f : S 2 → S 2 of the (surface of the)
unit sphere into itself. Those which correspond to rotations are clearly one-to-
one, continuous, and preserve the angle between any two paths which intersect
at a point. This is called a conformal map. In addition, rotations preserve the
distances between points. In this problem we show how to describe such mappings,
and therefore give a representation for the rotations in three dimensions.
(a) Let N be the north pole (0, 0, 1) of the unit sphere Σ = {(x, y, z), x2 +y 2 +z 2 =
1}. Define the map from the rest of the sphere s : Σ − {N } → R2 given by a
stereographic projection, which maps each point on the unit sphere, other than
the north pole, into the point (u, v) in the equatorial plane (x, y, 0) by giving the
intersection with this plane of the straight line which joins the point (x, y, z) ∈ Σ
to the north pole. Find (u, v) as a function of (x, y, z), and show that the lengths
of infinitesimal paths in the vicinity of a point are scaled by a factor 1/(1 − z)
independent of direction, and therefore that the map s preserves the angles between
intersecting curves (i.e. is conformal).
(b) Show that the map f ((u, v)) → (u0 , v 0 ) which results from first applying s−1 ,
then a rotation, and then s, is a conformal map from R2 into R2 , except for the
pre-image of the point which gets mapped into the north pole by the rotation.
By a general theorem of complex variables, any such map is analytic, so f : u+iv →
u0 + iv 0 is an analytic function except at the point ξ0 = u0 + iv0 which is mapped
to infinity, and ξ0 is a simple pole of f . Show that f (ξ) = (aξ + b)/(ξ − ξ0 ), for
some complex a and b. This is the set of complex Möbius transformations, which are usually rewritten as
\[ f(\xi) = \frac{\alpha\xi + \beta}{\gamma\xi + \delta}, \]
where α, β, γ, δ are complex constants. An overall complex scale change does not
affect f , so the scale of these four complex constants is generally fixed by imposing
a normalizing condition αδ − βγ = 1.
(c) Show that composition of Möbius transformations, $f'' = f'\circ f:\ \xi\stackrel{f}{\longrightarrow}\xi'\stackrel{f'}{\longrightarrow}\xi''$, is given by matrix multiplication,
\[ \begin{pmatrix}\alpha'' & \beta''\\ \gamma'' & \delta''\end{pmatrix} = \begin{pmatrix}\alpha' & \beta'\\ \gamma' & \delta'\end{pmatrix}\cdot\begin{pmatrix}\alpha & \beta\\ \gamma & \delta\end{pmatrix}. \]
(d) Not every mapping $s^{-1}\circ f\circ s$ is a rotation, for rotations need to preserve distances as well. We saw that an infinitesimal distance $d\ell$ on Σ is mapped by s to a distance $|d\xi| = d\ell/(1-z)$. Argue that the condition that $f:\xi\to\tilde\xi$ correspond to a rotation is that $d\tilde\ell \equiv (1-\tilde z)\,|df/d\xi|\,|d\xi| = d\ell$. Express this change of scale in terms of ξ and $\tilde\xi$ rather than z and $\tilde z$, and find the conditions on α, β, γ, δ that ensure this is true for all ξ. Together with the normalizing condition, show that this requires the matrix for f to be a unitary matrix with determinant 1, so that the set of rotations corresponds to the group SU(2). The matrix elements are called Cayley-Klein parameters, and their real and imaginary parts are called the Euler parameters.
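The stereographic map and the composition rule of part (c) are easy to check numerically; the matrices, point, and angle below are arbitrary, and s is taken in the convention ξ = (x + iy)/(1 − z) implied by the projection described above:

```python
import numpy as np

# Composing two Mobius transformations corresponds to multiplying their
# coefficient matrices (the coefficients here are arbitrary examples).
def mobius(M, xi):
    (a, b), (c, d) = M
    return (a*xi + b) / (c*xi + d)

M1 = np.array([[1 + 1j, 0.5], [-0.3j, 2.0]])
M2 = np.array([[0.2, 1j], [1.0, 1 - 0.5j]])
xi = 0.7 - 0.4j
print(np.isclose(mobius(M2 @ M1, xi), mobius(M2, mobius(M1, xi))))  # True

# A rotation by alpha about z acts on xi as multiplication by e^{i alpha}.
def s(p):  # stereographic projection from the north pole
    x, y, z = p
    return (x + 1j*y) / (1 - z)

p = np.array([0.3, -0.5, np.sqrt(1 - 0.34)])  # a point on the unit sphere
alpha = 0.8
Rz = np.array([[np.cos(alpha), -np.sin(alpha), 0],
               [np.sin(alpha), np.cos(alpha), 0], [0, 0, 1]])
print(np.isclose(s(Rz @ p), np.exp(1j*alpha) * s(p)))  # True
```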
Chapter 5
Small Oscillations
The kinetic energy $T = \frac12\sum M_{ij}\dot\eta_i\dot\eta_j$ is already second order in the small displacements from equilibrium.
2. Scale the x coordinates to reduce the mass matrix to the identity ma-
trix. The new coordinates will be called y.
Let us do this in more detail. We are starting with the coordinates η and the real symmetric matrices A and M, and we want to solve the equations $M\cdot\ddot\eta + A\cdot\eta = 0$. In our first step, we use the matrix $O_1$, which linear algebra guarantees exists, that makes $m = O_1\cdot M\cdot O_1^{-1}$ diagonal. Note $O_1$ is time-independent, so defining $x_i = \sum_j \left(O_1\right)_{ij}\eta_j$ also gives $\dot x_i = \sum_j \left(O_1\right)_{ij}\dot\eta_j$, and
\begin{eqnarray*}
T &=& \frac12\,\dot\eta^T\cdot M\cdot\dot\eta = \frac12\,\dot\eta^T\cdot O_1^{-1}\cdot m\cdot O_1\cdot\dot\eta = \frac12\,\dot\eta^T\cdot O_1^T\cdot m\cdot\left(O_1\cdot\dot\eta\right) \\
&=& \frac12\left(O_1\cdot\dot\eta\right)^T\cdot m\cdot\left(O_1\cdot\dot\eta\right) = \frac12\,\dot x^T\cdot m\cdot\dot x.
\end{eqnarray*}
Similarly the potential energy becomes $U = \frac12\,x^T\cdot O_1\cdot A\cdot O_1^{-1}\cdot x$. We know that the matrix m is diagonal, and the diagonal elements $m_{ii}$ are all strictly positive. To begin the second step, define the diagonal matrix $S_{ij} = \sqrt{m_{ii}}\,\delta_{ij}$ and new coordinates $y_i = S_{ii}x_i = \sum_j S_{ij}x_j$, or $y = S\cdot x$. Now $m = S^2 = S^T\cdot S$, and the potential energy, written in terms of y, involves the matrix
\[ B = S^{-1}\cdot O_1\cdot A\cdot O_1^{-1}\cdot S^{-1}. \]
Then
\[ T = \frac12\sum_j\dot\xi_j^2, \qquad U = \frac12\sum_j\omega_j^2\xi_j^2, \qquad \ddot\xi_j + \omega_j^2\xi_j = 0, \]
so that
\[ \xi_j = \mathrm{Re}\;a_je^{i\omega_jt}, \qquad q = q_0 + O_1^{-1}\cdot S^{-1}\cdot O_2^{-1}\cdot\xi. \]
Figure 5.1: Some simple molecules (O₂, CO₂, H₂O) in their equilibrium positions.
Example: CO2
Consider first the CO2 molecule. As it is a molecule, there must be a position
of stable equilibrium, and empirically we know it to be collinear and sym-
metric, which one might have guessed. We will first consider only collinear
motions of the molecule. If the oxygens have coordinates q1 and q2 , and the
carbon q3 , the potential depends on q1 − q3 and q2 − q3 in the same way, so
\[ U = \frac12 k\left(q_3 - q_1 - b\right)^2 + \frac12 k\left(q_2 - q_3 - b\right)^2, \qquad T = \frac12 m_O\dot q_1^2 + \frac12 m_O\dot q_2^2 + \frac12 m_C\dot q_3^2. \]
We gave our formal solution in terms of displacements from the equilibrium
position, but we now have a situation in which there is no single equilibrium
position, as the problem is translationally invariant, and while equilibrium
has constraints on the differences of q’s, there is no constraint on the center
of mass. We can treat this in two different ways:
1. Explicitly fix the center of mass, eliminating one of the degrees of free-
dom.
First we follow the first method. We can always work in a frame where
the center of mass is at rest, at the origin. Then mO(q1 + q2) + mC q3 = 0
is a constraint, which we must eliminate. We can do so by dropping q3
as an independent degree of freedom. In terms of the two
displacements from equilibrium, η1 = q1 + b and η2 = q2 − b, we have q3 = −(η1 + η2)mO/mC, and
T = ½ mO (η̇1² + η̇2²) + ½ mC η̇3² = ½ mO [ η̇1² + η̇2² + (mO/mC)(η̇1 + η̇2)² ]

  = ½ (mO²/mC) ( η̇1  η̇2 ) ( 1 + mC/mO      1      ) ( η̇1 )
                           (      1      1 + mC/mO ) ( η̇2 ) .
Now T is not diagonal, or more precisely M isn't. We must find the orthogonal
matrix O1 such that O1·M·O1⁻¹ is diagonal. We may assume it to be
a rotation, which can only be

O = ( cos θ   −sin θ )
    ( sin θ    cos θ ) .
5.1. SMALL OSCILLATIONS ABOUT STABLE EQUILIBRIUM 129
For the rotation angle θ = π/4, i.e. x1 = (η1 − η2)/√2 and x2 = (η1 + η2)/√2, this gives

T = ½ mO ẋ1² + ½ mO (1 + 2mO/mC) ẋ2² ,

while

U = ½ k(q3 − q1 − b)² + ½ k(q2 − q3 − b)²
  = ½ k [ (η1 + (mO/mC)(η1 + η2))² + (η2 + (mO/mC)(η1 + η2))² ]
  = ½ k [ η1² + η2² + (2mO²/mC²)(η1 + η2)² + (2mO/mC)(η1 + η2)² ]
  = ½ k [ x1² + x2² + (4mO/mC²)(mO + mC) x2² ]
  = ½ k x1² + ½ k ((mC + 2mO)/mC)² x2² .
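This diagonalization can be checked numerically; the masses and spring constant below are illustrative stand-ins, not values from the text:

```python
import numpy as np

mO, mC, k = 16.0, 12.0, 1.0          # made-up illustrative values
r = mO / mC

# M from T = (mO^2/2mC)(etadot1 etadot2) [[1+mC/mO, 1],[1, 1+mC/mO]] (...)^T
M = (mO**2 / mC) * np.array([[1 + mC/mO, 1.0],
                             [1.0, 1 + mC/mO]])
# A from U = (k/2)[(eta1 + r(eta1+eta2))^2 + (eta2 + r(eta1+eta2))^2]
A = k * np.array([[(1 + r)**2 + r**2, 2 * r * (1 + r)],
                  [2 * r * (1 + r), (1 + r)**2 + r**2]])

# The squared frequencies solve det(A - omega^2 M) = 0:
omega2 = np.sort(np.linalg.eigvals(np.linalg.inv(M) @ A).real)
```

The two values should come out as ω² = k/mO for the mode with the carbon at rest, and ω² = k(mC + 2mO)/(mO mC) for the other.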
Alternatively, in the second method, we keep all three displacements as
coordinates, with η3 the displacement of the carbon:

T = ½ mO (η̇1² + η̇2²) + ½ mC η̇3²
U = ½ k [ (η1 − η3)² + (η2 − η3)² ].
T is already diagonal, so O1 = 1I, x = η. In the second step S is the diagonal
matrix with S11 = S22 = √mO, S33 = √mC, and yi = √mO ηi for i = 1, 2,
and y3 = √mC η3. Then
U = ½ k [ ( y1/√mO − y3/√mC )² + ( y2/√mO − y3/√mC )² ]
  = ½ (k/(mO mC)) [ mC y1² + mC y2² + 2mO y3² − 2√(mO mC) (y1 + y2) y3 ].
Thus the matrix B is

B = (k/(mO mC)) (    mC          0       −√(mO mC) )
                (     0          mC      −√(mO mC) )
                ( −√(mO mC)  −√(mO mC)     2mO     ) ,
which is singular, as it annihilates the vector yᵀ = (√mO, √mO, √mC),
which corresponds to η T = (1, 1, 1), i.e. all the nuclei are moving by the same
amount, or the molecule is translating rigidly. Thus this vector corresponds
to a zero eigenvalue of U , and a harmonic oscillation of zero frequency. This is
free motion3 , ξ = ξ0 +vt. The other two modes can be found by diagonalizing
the matrix, and will be as we found by the other method.
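A quick numerical check of B, with made-up values of mO, mC and k, and the overall factor k/(mO mC) from the expression for U above:

```python
import numpy as np

mO, mC, k = 16.0, 12.0, 1.0          # made-up illustrative values
s = np.sqrt(mO * mC)
B = (k / (mO * mC)) * np.array([[mC, 0.0, -s],
                                [0.0, mC, -s],
                                [-s, -s, 2.0 * mO]])

# The rigid-translation vector y = (sqrt(mO), sqrt(mO), sqrt(mC))
# should be annihilated by B ...
y_trans = np.array([np.sqrt(mO), np.sqrt(mO), np.sqrt(mC)])
resid = B @ y_trans

# ... and the eigenvalues should be 0, k/mO, and k(mC + 2mO)/(mO mC),
# matching the frequencies found with the center of mass fixed.
omega2 = np.sort(np.linalg.eigvalsh(B))
```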
Transverse motion
What about the transverse motion? Consider the equilibrium position of
the molecule to lie in the x direction, and consider small deviations in the z
direction. The kinetic energy
T = ½ mO ż1² + ½ mO ż2² + ½ mC ż3²

is already diagonal, just as for the longitudinal modes in the second method.
Any potential energy must be due to a resistance to bending, so to second order,

U ∝ (ψ − θ)² ∼ (tan ψ − tan θ)² = [ (z2 − z3)/b + (z1 − z3)/b ]² = b⁻² (z1 + z2 − 2z3)².

[Figure: the bent molecule, with transverse displacements z1, z3, z2 and the two bonds of length b making angles θ and ψ with the axis.]
Note that the potential energy is proportional to the square of a single linear combination, z1 + z2 − 2z3.
³To see that linear motion is a limiting case of harmonic motion as ω → 0, we need to
choose the complex coefficient to be a function of ω, A(ω) = x0 − iv0/ω, with x0 and v0
real. Then x(t) = lim_{ω→0} Re A(ω)e^{iωt} = x0 + v0 lim_{ω→0} sin(ωt)/ω = x0 + v0 t.
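The limit in the footnote is easy to verify symbolically; a small SymPy sketch:

```python
import sympy as sp

t, w = sp.symbols('t omega', real=True)
x0, v0 = sp.symbols('x_0 v_0', real=True)

# A(omega) = x0 - i v0/omega, as in the footnote
A = x0 - sp.I * v0 / w
x = sp.expand_complex(sp.re(A * sp.exp(sp.I * w * t)))
# x = x0*cos(omega t) + (v0/omega) sin(omega t)

x_limit = sp.limit(x, w, 0)    # should reduce to x0 + v0 t
```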
This implies ψj (ω) = 0 except when the matrix Aij − ω 2 Mij is singular,
det (Aij − ω 2 Mij ) = 0, which gives a discrete set of angular frequencies
ω1 . . . ωN , and for each ωj an eigenvector ψj .
⁴See problem 5.3.
we find

Σj ( −ω² Mij − iω Rij + Aij ) ψj = f̃i .
Except for at most 2N values of ω the matrix multiplying ψj will have a non-zero
determinant and will be invertible, allowing us to find the response ψj
to the Fourier component of the driving force, f̃i. Those values of ω for which
the determinant vanishes, and the vector ψj which the matrix annihilates,
correspond to damped modes that we would see if the driving force were
removed.
∂U/∂yi = − (τ/a)( yi+1 − 2yi + yi−1 ),

so

U(y1, . . . , yi, . . . , yn)
  = ∫0^{yi} dyi (τ/a)( 2yi − yi+1 − yi−1 ) + F(y1, . . . , yi−1, yi+1, . . . , yn)
  = (τ/a) [ yi² − (yi+1 + yi−1) yi ] + F(y1, . . . , yi−1, yi+1, . . . , yn)
  = (τ/2a) [ (yi+1 − yi)² + (yi − yi−1)² ] + F′(y1, . . . , yi−1, yi+1, . . . , yn)
  = Σ_{i=0}^{n} (τ/2a)(yi+1 − yi)² + constant.
The F and F 0 are unspecified functions of all the yj ’s except yi . In the last
expression we satisfied the condition for all i, and we have used the convenient
definition y0 = yn+1 = 0. We can and will drop the arbitrary constant.
The kinetic energy is T = ½ m Σ_{i=1}^{n} ẏi².
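The integration above can be cross-checked symbolically; here is a SymPy sketch for a short chain, n = 5:

```python
import sympy as sp

n = 5
tau, a = sp.symbols('tau a', positive=True)
y = sp.symbols('y1:6')                          # y1 .. y5
yy = (sp.Integer(0),) + y + (sp.Integer(0),)    # fixed ends: y0 = y_{n+1} = 0

# U = sum of (tau/2a)(y_{i+1} - y_i)^2 over the n+1 links
U = sum(tau / (2 * a) * (yy[i + 1] - yy[i])**2 for i in range(n + 1))

# dU/dy_i should equal -(tau/a)(y_{i+1} - 2 y_i + y_{i-1}) for each mass
checks = [sp.simplify(sp.diff(U, yy[i])
                      + (tau / a) * (yy[i + 1] - 2 * yy[i] + yy[i - 1]))
          for i in range(1, n + 1)]
```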
Before we continue with the analysis of this problem, let us note that
another physical setup also leads to the same Lagrangian. Consider a one
dimensional lattice of identical atoms with a stable equilibrium in which they
are evenly spaced, with interactions between nearest neighbors. Let ηi be the
longitudinal displacement of the i'th atom from its equilibrium position. The
kinetic energy is simply T = ½ m Σ_{i=1}^{n} η̇i². As the interatomic distance differs
m ÿi = ∂L/∂yi = − ∂U/∂yi = (τ/a) [ (yi+1 − yi) − (yi − yi−1) ],

or

ρa ÿ(x) = (τ/a) ( [y(x + a) − y(x)] − [y(x) − y(x − a)] ).
We need to be careful about taking the limit

[ y(x + a) − y(x) ] / a → ∂y/∂x

because we are subtracting two such expressions evaluated at nearby points,
and because we will need to divide by a again to get an equation between
finite quantities. Thus we note that

[ y(x + a) − y(x) ] / a = ∂y/∂x |_{x+a/2} + O(a²),
so

ρ ÿ(x) = (τ/a) ( [y(x + a) − y(x)]/a − [y(x) − y(x − a)]/a )
       ≈ (τ/a) ( ∂y/∂x |_{x+a/2} − ∂y/∂x |_{x−a/2} ) → τ ∂²y/∂x² ,
and we wind up with the wave equation for transverse waves on a massive
string

∂²y/∂t² − c² ∂²y/∂x² = 0,    where c = √(τ/ρ).
Solving this wave equation is very simple. For the fixed boundary conditions
y(x) = 0 at x = 0 and x = ℓ, the solution is a Fourier expansion

y(x, t) = Σ_{p=1}^{∞} Re Bp e^{ickp t} sin kp x,
where kp ` = pπ. Each p represents one normal mode, and there are an
infinite number as we would expect because in the continuum limit there are
an infinite number of degrees of freedom.
We have certainly not shown that y(x) = B sin kx is a normal mode for
the problem with finite n, but it is worth checking it out. This corresponds
to a mode with yj = B sin kaj, on which we apply the matrix A
(A·y)i = Σj Aij yj = − (τ/a)( yi+1 − 2yi + yi−1 )
  = − (τ/a) B ( sin(kai + ka) − 2 sin(kai) + sin(kai − ka) )
  = − (τ/a) B ( sin(kai) cos(ka) + cos(kai) sin(ka) − 2 sin(kai)
                + sin(kai) cos(ka) − cos(kai) sin(ka) )
  = (τ/a) B ( 2 − 2 cos(ka) ) sin(kai)
  = (2τ/a)( 1 − cos(ka) ) yi .
So we see that it is a normal mode, although the frequency of oscillation

ω = √( (2τ/am)(1 − cos(ka)) ) = 2 √(τ/ρ) sin(ka/2) / a

differs from k√(τ/ρ) except in the limit a → 0 for fixed k.
The wave numbers k which index the normal modes are restricted by
the fixed ends to the discrete set k = pπ/ℓ = pπ/((n + 1)a), for p ∈ Z, i.e. p is
an integer. This is still too many (∞) for a system with a finite number of
degrees of freedom. The resolution of this paradox is that not all different k's
correspond to different modes. For example, if p′ = p + 2m(n + 1) for some
integer m, then k′ = k + 2πm/a, and sin(k′aj) = sin(kaj + 2πmj) = sin(kaj),
so k and k′ represent the same normal mode. Also, if p′ = 2(n + 1) − p, then
k′ = (2π/a) − k and sin(k′aj) = sin(2πj − kaj) = −sin(kaj), so k and k′ represent
the same normal mode, with opposite phase. Finally p = n + 1, k = π/a
gives yj = B sin(kaj) = 0 for all j and is not a normal mode. This leaves as
independent only p = 1, . . . , n, the right number of normal modes for a system
with n degrees of freedom.
The angular frequency of the p'th normal mode is

ωp = 2 √(τ/ma) sin( pπ / (2(n + 1)) ).
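These frequencies can be confirmed by diagonalizing the matrix A directly; n, τ, a and m below are illustrative values:

```python
import numpy as np

n, tau, a, m = 8, 1.0, 1.0, 1.0   # made-up illustrative values

# (A.y)_i = -(tau/a)(y_{i+1} - 2 y_i + y_{i-1}), with fixed ends,
# i.e. a tridiagonal matrix with 2 on the diagonal and -1 off it.
A = (tau / a) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

omega_numeric = np.sort(np.sqrt(np.linalg.eigvalsh(A) / m))

p = np.arange(1, n + 1)
omega_formula = 2 * np.sqrt(tau / (m * a)) * np.sin(p * np.pi / (2 * (n + 1)))
```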
5.4. FIELD THEORY 139
...structure of the crystal, are called optical modes.
[Fig. 5.3: Frequencies of oscillation of the loaded string.]
This Lagrangian, however, will not be of much use until we figure out what is
meant by varying it with respect to each dynamical degree of freedom or its
corresponding velocity. In the discrete case we have the canonical momenta
Pi = ∂L/∂ẏi, where the derivative requires holding all ẏj fixed, for j ≠ i, as
well as all yk fixed. This extracts one term from the sum ½ ρ Σi a ẏi², and this
gives the momentum density

P(x = ia) = lim_{a→0} (1/a) ∂/∂ẏi Σi a L(y(x), ẏ(x), x)|_{x=ai} .
lim_{a→0} (1/a) ∂/∂ẏi → δ/δẏ(x),

and similarly for (1/a) ∂/∂yi, which act on functionals of y(x) and ẏ(x) by

P(x) = δ/δẏ(x) ∫0^ℓ dx′ ½ ρ ẏ²(x′, t) = ∫0^ℓ dx′ ρ ẏ(x′, t) δ(x′ − x) = ρ ẏ(x, t).
δL/δy(x) = − ∫0^ℓ dx′ τ (∂y/∂x) δ′(x′ − x) = τ ∂²y/∂x² ,

so

ρ ÿ(x, t) − τ ∂²y/∂x² = 0.    (5.4)
We have derived the wave equation for small transverse deformations
of a stretched string by considering the continuum limit of a loaded string,
in the process demonstrating how to formulate Lagrangian mechanics for a
continuum system. Of course it is more usual, and simpler, to derive it
directly by considering Newton's law on an infinitesimal element of the string.
Let's include gravity for good measure. If the string point initially at x
has a transverse displacement y(x) and a longitudinal displacement η(x),
both considered small, the slope of the string dy/dx is also small. The segment
[x, x + ∆x] has a mass ρ∆x, where as before ρ is the mass per unit length,
and the forces on it are

in x direction: τ(x + ∆x) cos θ(x + ∆x) − τ(x) cos θ(x) = ρ∆x η̈
in y direction: τ(x + ∆x) sin θ(x + ∆x) − τ(x) sin θ(x) − ρg∆x = ρ∆x ÿ

[Figure: the element [x, x + ∆x] of the string, with tensions τ(x) and τ(x + ∆x) acting at angles θ(x) and θ(x + ∆x).]

As θ ≪ 1, we can replace cos θ by 1 and sin θ with tan θ = ∂y/∂x, and
then from the first equation we see that ∂τ/∂x is already small, so we can
write

τ ( ∂y/∂x |_{x+∆x} − ∂y/∂x |_{x} ) − ρg∆x = ρ∆x ÿ,
or

τ ∂²y/∂x² − ρg = ρ ÿ.
This agrees with Eq. 5.4 if we drop the gravity term, which we had not
included in our discussion of the loaded string.
tensor⁵.
Though P is not a scalar or diagonal in general, there is one constraint
on the stress tensor: it is symmetric. To see this, consider the prism shown,
and the torque in the y direction. [Figure: a prism with square cross-section of side λ and
height h, with axes x, y, z.] The forces across the two faces perpendicular
to z are of order λ, and are equal and opposite, so they provide a torque
−λ²hPxz in the y direction. Similarly the two faces perpendicular to x provide a torque +λ²hPzx in that
direction. The equal forces on the other two faces have a moment arm parallel
to y and therefore provide no torque in that direction. But the moment of
inertia about the y axis is of order λ²dV = λ⁴h. So if the angular acceleration
is to remain finite as λ → 0, we must have Pzx − Pxz = 0, and P must be a
symmetric matrix.

⁵To be clear: Σj Pij dSj is the force exerted by the back side of the surface element on
the front side, so if dS⃗ is an outward normal, the force on the volume is −∫S Σj Pij dSj,
and a pressure corresponds to P = +p δij. This agrees with Symon ([17]) but has a reversed
sign from Taylor's ([18]) Σ = −P.
We expect that the stress force which the material on one side of a boundary
exerts on the other is due to some distortion of the material. Near any value
of x, we may expand the displacement as

ηi(x + ∆x) = ηi(x) + Σj ∆xj ∂ηi/∂xj + · · ·

Moving the entire object as a whole, η⃗(x) = constant, or rotating it as a rigid
body about an axis ω⃗, with ∂ηi/∂xj = Σk εijk ωk, will not produce any stress, and
so we will not consider such displacements to be part of the strain tensor,
which we therefore define to be the symmetric part of the derivative matrix:

Sij = ½ ( ∂ηi/∂xj + ∂ηj/∂xi ).
In general, the properties of the material will determine how the stress tensor
is related to the strain tensor, though for small displacements we expect it
to depend linearly.
Even linear dependence could be quite complex, but if the material prop-
erties are rotationally symmetric, things are fairly simple. Of course in a crys-
tal we might not satisfy that condition, but if we do assume the functional
dependence of the stress on the strain is rotationally invariant, we may find
the most general possibilities by decomposing the tensors into pieces which
behave suitably under rotations. Here we are generalizing the idea that a
vector cannot be defined in terms of pure scalars, and a scalar can depend on
vectors only through a scalar product. A symmetric tensor consists of a piece,
its trace, which behaves like a scalar, and a traceless piece, called the deviatoric
part, which behaves differently, as an irreducible representation⁶.
⁶Representations of a symmetry group are defined as vector spaces which are invariant
under the action of the symmetry, and irreducible ones are those for which no proper
subspace is closed in that fashion. For more on this, see any book on group theory for
physicists. But for representations of the rotation group a course in quantum mechanics
may be better. The traceless part of the symmetric tensor transforms like a state with
angular momentum 2.
−ρgêz for gravity or some other intensive external force. The surface force is

Fi^surf = − ∫S Σj Pij(r⃗) dSj ,    or    F⃗^surf = − ∫S P(r⃗) · dS⃗.

In this vector form we imply that the first index of P is matched to that
of F⃗^surf, while the second index is paired with that of dS⃗ and summed
over. Gauss's law tells us that this is the integral over the volume V of the
divergence, but we should take care that this divergence dots the derivative
with the second index, that is

Fi^surf = − ∫V Σj ∂Pij(r⃗)/∂xj dV.
ρ(r⃗) ∂²η⃗(r⃗)/∂t² = E⃗(r⃗) − ∇⃗·P(r⃗)
                = E⃗(r⃗) + β ∇⃗·S(r⃗) + ((α − β)/3) ∇⃗ Tr S(r⃗),

where in the last term we note that the divergence contracted into the 1I
gives an ordinary gradient on the scalar function Tr S. As the strain tensor
is already given in terms of derivatives of η⃗, we have
[∇⃗·S(r⃗)]j = Σi ∂/∂xi ½ ( ∂ηi/∂xj + ∂ηj/∂xi ) = ½ ∂(∇⃗·η⃗)/∂xj + ½ ∇²ηj ,

or ∇⃗·S(r⃗) = ½ ∇⃗(∇⃗·η⃗) + ½ ∇²η⃗. Also Tr S = Σi ∂ηi/∂xi = ∇⃗·η⃗, so we find
the equations of motion

ρ(r⃗) ∂²η⃗(r⃗)/∂t² = E⃗(r⃗) + (α/3 + β/6) ∇⃗(∇⃗·η⃗) + (β/2) ∇²η⃗.    (5.6)

This equation is called the Navier equation. We can rewrite this in terms of
the shear modulus G and the bulk modulus B:

ρ(r⃗) ∂²η⃗(r⃗)/∂t² = E⃗(r⃗) + (B + G/3) ∇⃗(∇⃗·η⃗) + G ∇²η⃗.
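The identity ∇⃗·S = ½∇⃗(∇⃗·η⃗) + ½∇²η⃗ used above holds for any smooth displacement field; a SymPy check on a made-up η⃗:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
X = (x, y, z)
# a sample (made-up) displacement field eta(x, y, z)
eta = (x**2 * y, sp.sin(z) + y * z, x * y * z)

# strain tensor S_ij = (1/2)(d eta_i/dx_j + d eta_j/dx_i)
S = [[(sp.diff(eta[i], X[j]) + sp.diff(eta[j], X[i])) / 2
      for j in range(3)] for i in range(3)]

div_eta = sum(sp.diff(eta[i], X[i]) for i in range(3))
div_S = [sum(sp.diff(S[i][j], X[i]) for i in range(3)) for j in range(3)]
rhs = [sp.diff(div_eta, X[j]) / 2
       + sum(sp.diff(eta[j], xi, 2) for xi in X) / 2
       for j in range(3)]
checks = [sp.simplify(div_S[j] - rhs[j]) for j in range(3)]
```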
Fluids
In discussing the motion of pieces of a solid, we specified which piece of the
material was under consideration by its "original" or "reference" position r⃗,
from which it might be displaced by a small amount η⃗(r⃗). So r⃗ is actually a
label for a particular hunk of material. This is called the material description.
It is not very useful for a fluid, however, as any element of the fluid
may flow arbitrarily far from some initial position. It is more appropriate to
consider r⃗ as a particular point of space, and ρ(r⃗, t) or v⃗(r⃗, t) or T(r⃗, t) as
the density or velocity or temperature of whatever material happens to be
at point r⃗ at the time t. This is called the spatial description.
If we wish to examine how some physical property of the material is chang-
ing with time, however, the physical processes which cause change do so on a
particular hunk of material. For example, the concentration of a radioactive
substance in a hunk of fluid might change due to its decay rate or due to its
diffusion, understandable physical processes, while the concentration at the
point r⃗ may change just because new fluid is at the point in question. In
describing the physical processes, we will need to consider the rate of change
for a given hunk of fluid. Thus we need the stream derivative, which involves
the difference of the property (say c) at the new position r⃗′ = r⃗ + v⃗∆t at
time t + ∆t and that at the old r⃗, t. Thus
dc/dt (r⃗, t) = lim_{∆t→0} [ c(r⃗ + v⃗∆t, t + ∆t) − c(r⃗, t) ] / ∆t = v⃗·∇⃗c + ∂c/∂t.
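A minimal symbolic check: for a disturbance that is simply carried along by a uniform flow, c(x, t) = f(x − vt), the stream derivative vanishes:

```python
import sympy as sp

x, t, v = sp.symbols('x t v', real=True)
f = sp.Function('f')

# a profile advected rigidly at speed v
c = f(x - v * t)

# one-dimensional stream derivative: v dc/dx + dc/dt
stream = v * sp.diff(c, x) + sp.diff(c, t)
```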
In particular, Newton’s law refers to the acceleration of a hunk of material,
so it is the stream derivative of the velocity which will be changed by the
forces acting on the fluid:
ρ(r⃗)∆V dv⃗/dt = ρ(r⃗)∆V ( v⃗·∇⃗v⃗(r⃗, t) + ∂v⃗(r⃗, t)/∂t ) = F⃗^surf + F⃗^vol.
The forces on a fluid are different from those in a solid. The volume force
is of the same nature, the most common being F⃗^vol = −ρgêz dV, and the
pressure piece of the stress, Pp = +p1I, is also the same. Thus we can expect
a force of the form F⃗ = (−ρgêz − ∇⃗·(1I p)) dV = dV(−ρgêz − ∇⃗p). A static
fluid can not experience a shear force. So there will be no shear component
of the stress due to a deviatoric part of the strain. But there can be stress
due to the velocity of the fluid. Of course a uniformly moving fluid will
not be stressed, but if the velocity varies from point to point, stress could
be produced. Considering first derivatives, the nine components of ∂vi /∂xj
have a scalar piece ∇⃗·v⃗, an antisymmetric piece, and a traceless symmetric
piece, each transforming differently under rotations. Thus for an isotropic
fluid the stress may have a piece

Pij = −μ ( ∂vi/∂xj + ∂vj/∂xi ) − ν ∇⃗·v⃗ 1I
in addition to the scalar piece p1I. The coefficient μ is called the viscosity.
The piece proportional to ∇⃗·v⃗ may be hard to see relative to the pressure
term, and is not usually included⁷.
The scalar component of ∂vi/∂xj, ∇⃗·v⃗, is in fact just the fractional rate of
change of the volume. To see that, consider the surface S which bounds the
material in question. If a small piece of that surface is moving with velocity
v⃗, it is adding volume to the material at a rate v⃗·dS⃗, so

dV/dt = ∮S v⃗·dS⃗ = ∫V ∇⃗·v⃗ dV,
where the last equality is by Gauss' law. This can be rewritten in vector
form:

F⃗^surf = ∫V [ −∇⃗p + μ ∇²v⃗ + (μ + ν) ∇⃗(∇⃗·v⃗) ] dV.

Adding in F⃗^vol = −ρgêz dV and setting this equal to ρ dV dv⃗/dt, we find

dv⃗/dt = ∂v⃗(r⃗, t)/∂t + v⃗·∇⃗v⃗(r⃗, t)    (5.7)
       = −gêz − (1/ρ) ∇⃗p(r⃗, t) + (μ/ρ) ∇²v⃗(r⃗, t) + ((μ + ν)/ρ) ∇⃗(∇⃗·v⃗(r⃗, t)).
⁷Tietjens ([19]), following Stokes, assumes the trace of P is independent of the "velocity
of dilatation" ∇⃗·v⃗, which requires ν = −2μ/3. But Prandtl and Tietjens [12] drop the
∇⃗(∇⃗·v⃗) term in (5.7) entirely, equivalent to taking ν = −μ.
This is the Navier-Stokes equation for a viscous fluid. For an inviscid fluid,
one with a negligible viscosity, this reduces to the simpler Euler’s equation
∂v⃗(r⃗, t)/∂t + v⃗·∇⃗v⃗(r⃗, t) = −gêz − (1/ρ) ∇⃗p(r⃗, t).    (5.8)
If we assume the fluid is inviscid and incompressible, so ρ is constant,
and also make the further simplifying assumption that we are looking at a
steady-state flow, for which ~v and p at a fixed point do not change, the partial
derivatives ∂/∂t vanish, and ∇⃗·v⃗ = 0. Then Euler's equation becomes
Exercises
5.1 Three springs connect two masses to each other and to immobile walls, as
shown. Find the normal modes and frequencies of oscillation, assuming the system
remains along the line shown.
[Diagram: wall, spring k, mass m, spring 2k, mass m, spring k, wall; the springs have lengths a, 2a, a.]
5.2 Consider the motion, in a fixed vertical plane, of a double pendulum consisting
of two masses attached to each other and to a fixed point by inextensible
strings of length L. The upper mass has mass m1 and the lower mass m2.
This is all in a laboratory with the ordinary gravitational forces near the
surface of the Earth.
a) Set up the Lagrangian for the motion, assuming the strings stay taut.
b) Simplify the system under the approximation that the motion involves
only small deviations from equilibrium. Put the problem in matrix form
appropriate for the procedure discussed in class.
c) Find the frequencies of the normal modes of oscillation. [Hint: following
exactly the steps given in class will be complex, but the analogous procedure
reversing the order of U and T will work easily.]
[Figure: the double pendulum, with a string of length L from the fixed point to m1 and another of length L from m1 to m2.]
5.3 (a) Show that if three mutually gravitating point masses are at the vertices
of an equilateral triangle which is rotating about an axis normal to the plane of
the triangle and through the center of mass, at a suitable angular velocity ω, this
motion satisfies the equations of motion. Thus this configuration is an equilibrium
in the rotating coordinate system. Do not assume the masses are equal.
(b) Suppose that two stars of masses M1 and M2 are rotating in circular orbits
about their common center of mass. Consider a small mass m which is approx-
imately in the equilibrium position described above (which is known as the L5
point). The mass is small enough that you can ignore its effect on the two stars.
Analyze the motion, considering specifically the stability of the equilibrium point
as a function of the ratio of the masses of the stars.
5.4 In considering the limit of a loaded string we found that in the limit a →
0, n → ∞ with ` fixed, the modes with fixed integer p became a smooth excitation
y(x, t) with finite wavenumber k and frequency ω = ck.
Now consider the limit with q := n+1−p fixed as n → ∞. Calculate the expression
for yj in that limit. This will not have a smooth limit, but there is nonetheless a
sense in which it can be described by a finite wavelength. Explain what this is,
and give the expression for yj in terms of this wavelength.
5.5 Consider the Navier equation ignoring the volume force, and show that
a) a uniform elastic material can support longitudinal waves. At what speed do
they travel?
Hamilton’s Equations
pi = ∂L(q, q̇, t) / ∂q̇i ,
and how the canonical variables {qi , pj } describe phase space. One can use
phase space rather than {qi , q̇j } to describe the state of a system at any
moment. In this chapter we will explore the tools which stem from this
phase space approach to dynamics.
later subdivide these into coordinates and velocities. We will take the space
in which x takes values to be some general n-dimensional space we call M,
which might be ordinary Euclidean space but might be something else, like
the surface of a sphere1 . Given a function f of n independent variables xi ,
the differential is
df = Σ_{i=1}^{n} (∂f/∂xi) dxi .    (6.1)
with some statement about the ∆xi being small, followed by the dropping of
the “order (∆x)2 ” terms. Notice that df is a function not only of the point
x ∈ M, but also of the small displacements ∆xi . A very useful mathematical
language emerges if we formalize the definition of df , extending its definition
to arbitrary ∆xi , even when the ∆xi are not small. Of course, for large ∆xi
they can no longer be thought of as the difference of two positions in M
and df no longer has the meaning of the difference of two values of f . Our
formal df is now defined as a linear function of these ∆xi variables, which
we therefore consider to be a vector ~v lying in an n-dimensional vector space
Rn . Thus df : M × Rn → R is a real-valued function with two arguments,
one in M and one in a vector space. The dxi which appear in (6.1) can be
thought of as operators acting on this vector space argument to extract the
i'th component, and the action of df on the argument (x, v⃗) is df(x, v⃗) = Σi (∂f/∂xi) vi .
This differential is a special case of a 1-form, as is each of the operators
dxi . All n of these dxi form a basis of 1-forms, which are more generally
ω = Σi ωi(x) dxi ,
where the ωi (x) are functions on the manifold M. If there exists an ordinary
function f (x) such that ω = df , then ω is said to be an exact 1-form.
¹Mathematically, M is a manifold, but we will not carefully define that here. The
precise definition is available in Ref. [16].
6.1. LEGENDRE TRANSFORMS 155
dg = Σi dvi pi + Σi vi dpi − dL = Σi dvi pi + Σi vi dpi − Σi pi dvi
   = Σi vi dpi ,

giving the inverse relation to pk(vℓ). This particular form of changing variables
is called a Legendre transformation. In the case of interest here,
the function g is called H(qi, pj, t), the Hamiltonian,

H(qi, pj, t) = Σk pk q̇k(qi, pj, t) − L(qi, q̇j(qℓ, pm, t), t).    (6.2)
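A small SymPy sketch of the Legendre transformation for the simplest case, L = ½m q̇² − U(q) (this example is mine, not from the text):

```python
import sympy as sp

q, qdot, p, m = sp.symbols('q qdot p m', positive=True)
U = sp.Function('U')

L = m * qdot**2 / 2 - U(q)                  # simple 1-d Lagrangian
p_def = sp.diff(L, qdot)                    # p = dL/dqdot = m qdot
qdot_of_p = sp.solve(sp.Eq(p, p_def), qdot)[0]   # invert: qdot = p/m

# H = p qdot - L, expressed in terms of (q, p)
H = sp.expand(p * qdot_of_p - L.subs(qdot, qdot_of_p))
```

The result should be the familiar H = p²/(2m) + U(q).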
dE = d̄Q − pdV,
where d̄Q is not an exact differential, and the heat Q is not a well defined
system variable. Though Q is not a well defined state function, the differential
d̄Q is a well defined 1-form on the manifold of possible states of the system.
dE = T dS − p dV,

and therefore

T (∂p/∂T)|V − p = (∂E/∂V)|T .
relation q̇ = M⁻¹·(p − a). As H = L2 − L0, H = ½ (p − a)·M⁻¹·(p − a) − L0.
As a simple example, with a = 0 and a diagonal matrix M, consider spherical
coordinates, in which the kinetic energy is

T = (m/2)( ṙ² + r²θ̇² + r² sin²θ φ̇² ) = (1/2m)( pr² + pθ²/r² + pφ²/(r² sin²θ) ).

Note that the generalized momenta are not normalized components of the
ordinary momentum, as pθ ≠ p⃗·êθ; in fact, it doesn't even have the same
units.
The equations of motion in Hamiltonian form,

q̇k = ∂H/∂pk |_{q,t} ,    ṗk = − ∂H/∂qk |_{p,t} ,
and consider its variation under arbitrary variation of the path in phase
space, (qi (t), pi (t)). The q̇i (t) is still dqi /dt, but the momentum is varied free
of any connection to q̇i . Then
δI = ∫_{ti}^{tf} [ Σi δpi ( q̇i − ∂H/∂pi ) − Σi δqi ( ṗi + ∂H/∂qi ) ] dt + [ Σi pi δqi ]_{ti}^{tf} ,

where we have integrated the ∫ Σ pi dδqi/dt term by parts. Note that in order
to relate stationarity of the action to Hamilton's equations of motion, it is
necessary only to constrain the qi (t) at the initial and final times, without
imposing any limitations on the variation of pi (t), either at the endpoints, as
we did for qi (t), or in the interior (ti , tf ), where we had previously related pi
and q̇j . The relation between q̇i and pj emerges instead among the equations
of motion.
The q̇i seems a bit out of place in a variational principle over phase space,
and indeed we can rewrite the action integral as an integral of a 1-form over
a path in extended phase space,

I = ∫ ( Σi pi dqi − H(q, p, t) dt ).
We will see, in section 6.6, that the first term of the integrand leads to a very
important form on phase space, and that the whole integrand is an important
1-form on extended phase space.
Thus we have

ζ̇ = M·η̇ + ∂ζ/∂t = M·J·∇η H + ∂ζ/∂t = M·J·Mᵀ·∇ζ H + ∂ζ/∂t = J·∇ζ K.
6.3. CANONICAL TRANSFORMATIONS 161
M·J·Mᵀ = J.    (6.3)
We will require this condition even when ζ does depend on t, but then we
need to revisit the question of finding K.
The condition (6.3) on M is similar to, and a generalization of, the condition
for orthogonality of a matrix, O·Oᵀ = 1I, which is of the same form with
J replaced by 1I. Another example of this kind of relation in physics occurs
in special relativity, where a Lorentz transformation Lμν gives the relation
between two coordinates, x′μ = Σν Lμν xν, with xν a four dimensional vector.
The matrix g in relativity is known as the indefinite metric, and the condition
on L is known as pseudo-orthogonality. In our current discussion, however,
J is not a metric, as it is antisymmetric rather than symmetric, and the word
which describes M is symplectic.
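The condition (6.3) is easy to test numerically for one degree of freedom; the transformations below are made-up examples:

```python
import numpy as np

# One degree of freedom, eta = (q, p): Hamilton's equations give
# J = [[0, 1], [-1, 0]].
J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# A made-up canonical rescaling Q = 2q, P = p/2 ...
M_good = np.diag([2.0, 0.5])
# ... and a non-canonical one, Q = 2q, P = p.
M_bad = np.diag([2.0, 1.0])

good = M_good @ J @ M_good.T   # equals J: symplectic
bad = M_bad @ J @ M_bad.T      # does not equal J
```

Note that M_good has determinant 1 while M_bad does not; this is the same fact that makes canonical transformations preserve phase-space volume.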
Just as for orthogonal transformations, symplectic transformations can be
divided into those which can be generated by infinitesimal transformations
(which are connected to the identity) and those which can not. Consider a
transformation M which is almost the identity, Mij = δij + εGij, or M =
1I + εG, where ε is considered some infinitesimal parameter while G is a finite
matrix. As M is symplectic, (1I + εG)·J·(1I + εGᵀ) = J, which tells us that
to lowest order in ε, GJ + JGᵀ = 0. Comparing this to the condition for
the generator of an infinitesimal rotation, Ω = −Ωᵀ, we see that it is similar
except for the appearance of J on opposite sides, changing orthogonality to
symplecticity. The new variables under such a canonical transformation are
ζ = η + εG·η.
The condition (6.3) for a transformation η → ζ to be canonical does not
involve time — each canonical transformation is a fixed map of phase-space
onto itself, and could be used at any t. We might consider a set of such
maps, one for each time, giving a time dependent map g(t) : η → ζ. Each
such map could be used to transform the trajectory of the system at any
time. In particular, consider the set of maps g(t, t0 ) which maps each point
η at which a system can be at time t0 into the point to which it will evolve
at time t. That is, g(t, t0 ) : η(t0 ) 7→ η(t). If we consider t = t0 + ∆t for
infinitesimal ∆t, this is an infinitesimal transformation. As ζi = ηi + ∆tη̇i =
ηi + ∆t k Jik ∂H/∂ηk , we have Mij = ∂ζi /∂ηj = δij + ∆t k Jik ∂ 2 H/∂ηj ∂ηk ,
P P
(GJ + JGᵀ)ij = Σkℓ ( Jik (∂²H/∂ηℓ∂ηk) Jℓj + Jiℓ Jjk (∂²H/∂ηℓ∂ηk) )
             = Σkℓ ( Jik Jℓj + Jiℓ Jjk ) ∂²H/∂ηℓ∂ηk .
The factor in parentheses in the last line is (−Jik Jjℓ + Jiℓ Jjk), which is
antisymmetric under k ↔ ℓ, and as it is contracted into the second derivative,
which is symmetric under k ↔ ℓ, we see that (GJ + JGᵀ)ij = 0 and we
have an infinitesimal canonical transformation. Thus the infinitesimal flow
of phase space points by the velocity function is canonical. As compositions
of canonical transformations are also canonical2 , the map g(t, t0 ) which takes
η(t0 ) into η(t), the point it will evolve into after a finite time increment t − t0 ,
is also a canonical transformation.
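The identity GJ + JGᵀ = 0 for the generator G = J·(∂²H/∂η∂η) holds for any symmetric Hessian; a quick numerical check with a random symmetric matrix standing in for the Hessian:

```python
import numpy as np

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# any symmetric matrix can stand in for d^2H/deta_j deta_k
rng = np.random.default_rng(0)
H2 = rng.normal(size=(2, 2))
H2 = H2 + H2.T

G = J @ H2                      # generator of the infinitesimal evolution map
check = G @ J + J @ G.T         # should vanish identically
```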
Notice that the relationship ensuring Hamilton's equations exist,

M·J·Mᵀ·∇ζ H + ∂ζ/∂t = J·∇ζ K,

with the symplectic condition M·J·Mᵀ = J, implies ∇ζ(K − H) = −J·∂ζ/∂t,
so K differs from H whenever ζ depends explicitly on time. This discussion holds as long as M is symplectic,
even if it is not an infinitesimal transformation.
which follows immediately from the definition, using Leibniz's rule on the
partial derivatives. A very special relation is the Jacobi identity,
[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0. (6.10)
In the Jacobi identity, there are two other terms like this, one with the
substitution u → v → w → u and the other with u → w → v → u, giving
a sum of six terms. The only ones involving second derivatives of v are the
first term above and the one found from applying u → w → v → u to the
second, u,i Jij w,k Jkℓ v,j,ℓ. The indices are all dummy indices, summed over, so
their names can be changed, by i → k → j → ℓ → i, converting this second
term to u,k Jkℓ w,j Jji v,ℓ,i. Adding the original term u,k Jkℓ v,i,ℓ Jij w,j, and using
v,ℓ,i = v,i,ℓ, gives u,k Jkℓ w,j (Jji + Jij) v,ℓ,i = 0 because J is antisymmetric. Thus
the terms in the Jacobi identity involving second derivatives of v vanish, but
the same argument applies in pairs to the other terms, involving second
derivatives of u or of w, so they all vanish, and the Jacobi identity is proven.
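The Jacobi identity can also be verified directly for particular functions; a SymPy sketch with three arbitrary (made-up) phase-space functions:

```python
import sympy as sp

q, p = sp.symbols('q p', real=True)

def pb(f, g):
    # Poisson bracket [f, g] for a single degree of freedom
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

# three arbitrary (made-up) functions on phase space
u = q**2 * p
v = sp.sin(q) + p**3
w = q * p + sp.exp(p)

jacobi = pb(u, pb(v, w)) + pb(v, pb(w, u)) + pb(w, pb(u, v))
```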
This argument can be made more elegantly if we recognize that for each
function f on phase space, we may view [f, ·] as a differential operator on
³This convention of understood summation was invented by Einstein, who called it the
"greatest contribution of my life".
6.4. POISSON BRACKETS 165
where fj are an arbitrary set of functions on phase space. For the Poisson
bracket, the functions fj are linear combinations of the f,j, but fj ≠ f,j.
With this interpretation, [f, g] = Df g, and [h, [f, g]] = Dh Df g. Thus
[h, [f, g]] + [f, [g, h]] = [h, [f, g]] − [f, [h, g]] = Dh Df g − Df Dh g
= (Dh Df − Df Dh )g, (6.11)
and we see that this combination of Poisson brackets involves the commutator
of differential operators. But such a commutator is always a linear differential
operator itself,
Dh Df = Σij hi ∂/∂ηi ( fj ∂/∂ηj ) = Σij hi (∂fj/∂ηi) ∂/∂ηj + Σij hi fj ∂²/∂ηi∂ηj ,
Df Dh = Σij fj ∂/∂ηj ( hi ∂/∂ηi ) = Σij fj (∂hi/∂ηj) ∂/∂ηi + Σij hi fj ∂²/∂ηi∂ηj .
This is just another first order differential operator, so there are no second
derivatives of g left in (6.11). In fact, the identity tells us that this combination
is

Dh Df − Df Dh = D[h,f ] .    (6.12)
df/dt = −[H, f] + ∂f/∂t ,    (6.13)
where H is the Hamiltonian. The function [f, g] on phase space also evolves
that way, of course, so
d[f, g]/dt = −[H, [f, g]] + ∂[f, g]/∂t
  = [f, [g, H]] + [g, [H, f]] + [∂f/∂t, g] + [f, ∂g/∂t]
  = [f, −[H, g] + ∂g/∂t] + [g, [H, f] − ∂f/∂t]
  = [f, dg/dt] − [g, df/dt].
coordinates, and by
$$\prod_{i=1}^{2n} d\zeta_i = \left| \det \frac{\partial \zeta_i}{\partial \eta_j} \right| \prod_{i=1}^{2n} d\eta_i = |\det M| \prod_{i=1}^{2n} d\eta_i$$
in the new, where we have used the fact that the change of variables requires
a Jacobian in the volume element. But because $J = M \cdot J \cdot M^T$,
$\det J = \det M \det J \det M^T = (\det M)^2 \det J$, and $J$ is nonsingular, so $\det M = \pm 1$,
and the volume element is unchanged.
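As a small numeric illustration, the linear time evolution of a harmonic oscillator is a canonical transformation, so its matrix M satisfies the condition J = M·J·Mᵀ and has |det M| = 1. The sketch below (the oscillator with m = ω = 1 is an assumed example, not from the text) checks both:

```python
import numpy as np

# J for one degree of freedom, eta = (q, p)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# time evolution of the harmonic oscillator (m = omega = 1) is linear,
# eta(t) = M(t) eta(0), and is a canonical transformation
t = 0.7
M = np.array([[np.cos(t), np.sin(t)],
              [-np.sin(t), np.cos(t)]])

assert np.allclose(M @ J @ M.T, J)              # J = M J M^T
assert np.isclose(abs(np.linalg.det(M)), 1.0)   # so the volume element is unchanged
```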
In statistical mechanics, we generally do not know the actual state of a
system, but know something about the probability that the system is in a
particular region of phase space. As the transformation which maps possible
values of η(t1 ) to the values into which they will evolve at time t2 is a canon-
ical transformation, this means that the volume of a region in phase space
does not change with time, although the region itself changes. Thus the prob-
ability density, specifying the likelihood that the system is near a particular
point of phase space, is invariant as we move along with the system.
k = 2: A general two form is a sum over the three independent wedge prod-
ucts with independent functions B12 (x), B13 (x), B23 (x). Let us extend
the definition of Bij to make it an antisymmetric matrix, so
$$B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j = \sum_{i,j} B_{ij}\, dx_i \otimes dx_j.$$
⁴Some explanation of the mathematical symbols might be in order here. $S_k$ is the group
of permutations on $k$ objects, and $(-1)^P$ is the sign of the permutation $P$, which is plus
or minus one if the permutation can be built from an even or an odd number, respectively,
of transpositions of two of the elements. The tensor product $\otimes$ of two linear operators into
a field is a linear operator which acts on the product space, or in other words a bilinear
operator with two arguments. Here $dx_i \otimes dx_j$ is an operator on $\mathbb{R}^n \times \mathbb{R}^n$ which maps the
pair of vectors $(\vec{u}, \vec{v})$ to $u_i v_j$.
5
Forms are especially useful in discussing more general manifolds, such as occur in
general relativity. Then one must distinguish between covariant and contravariant vectors,
a complication we avoid here by treating only Euclidean space.
6.5. HIGHER DIFFERENTIAL FORMS 169
$$B = \sum_{ij} A_i C_j\, dx_i \wedge dx_j = \sum_{ij} (A_i C_j - A_j C_i)\, dx_i \otimes dx_j = \sum_{ij} B_{ij}\, dx_i \otimes dx_j,$$
so $B_{ij} = A_i C_j - A_j C_i$, and
$$B_k = \frac{1}{2} \sum_{ij} \epsilon_{kij} B_{ij} = \frac{1}{2} \sum_{ij} \epsilon_{kij} A_i C_j - \frac{1}{2} \sum_{ij} \epsilon_{kij} A_j C_i = \sum_{ij} \epsilon_{kij} A_i C_j,$$
so
$$\vec{B} = \vec{A} \times \vec{C},$$
and the wedge product of two 1-forms is the cross product of their vectors.
If $A$ is a 1-form and $B$ is a 2-form, the wedge product $C = A \wedge B = C(x)\, dx_1 \wedge dx_2 \wedge dx_3$ is given by
$$\begin{aligned}
C = A \wedge B &= \sum_i \sum_{j<k} A_i B_{jk}\, dx_i \wedge dx_j \wedge dx_k,
\qquad \text{with } B_{jk} = \sum_\ell \epsilon_{jk\ell} B_\ell,\quad
dx_i \wedge dx_j \wedge dx_k = \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3, \\
&= \sum_{i\ell} A_i B_\ell \sum_{j<k} \epsilon_{jk\ell}\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3 \\
&= \frac{1}{2} \sum_{i\ell} A_i B_\ell \sum_{jk} \epsilon_{jk\ell}\, \epsilon_{ijk}\, dx_1 \wedge dx_2 \wedge dx_3
\qquad \text{(the summand is symmetric under } j \leftrightarrow k\text{)} \\
&= \sum_{i\ell} A_i B_\ell\, \delta_{i\ell}\, dx_1 \wedge dx_2 \wedge dx_3
= \vec{A} \cdot \vec{B}\; dx_1 \wedge dx_2 \wedge dx_3,
\end{aligned}$$
so we see that the wedge product of a 1-form and a 2-form gives the dot
product of their vectors.
If A and B are both 2-forms, the wedge product C = A ∧ B must be a
4-form, but there cannot be an antisymmetric function of four dxi ’s in three
dimensions, so C = 0.
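The identification of the wedge product of two 1-forms with the cross product can be verified numerically. A sketch with numpy (the component values for A and C are arbitrary test data):

```python
import numpy as np

# Levi-Civita tensor
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0

A = np.array([1.0, 2.0, -0.5])
C = np.array([0.3, -1.0, 2.0])

Bij = np.outer(A, C) - np.outer(C, A)            # components of A wedge C as a tensor
Bvec = 0.5 * np.einsum('kij,ij->k', eps, Bij)    # B_k = (1/2) eps_kij B_ij
assert np.allclose(Bvec, np.cross(A, C))         # wedge of 1-forms = cross product
```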
Clearly some examples are called for, so let us look again at three dimen-
sional Euclidean space.
k = 0: For a 0-form $f$, $df = \sum_i f_{,i}\, dx_i$, as we defined earlier. In terms of
vectors, $df \sim \vec{\nabla} f$.
k = 1: For a 1-form $\omega = \sum_i \omega_i\, dx_i$,
$$d\omega = \sum_i d\omega_i \wedge dx_i = \sum_{ij} \omega_{i,j}\, dx_j \wedge dx_i = \sum_{ij} (\omega_{j,i} - \omega_{i,j})\, dx_i \otimes dx_j,$$
corresponding to a two form with $B_{ij} = \omega_{j,i} - \omega_{i,j}$. These $B_{ij}$ are exactly the things which must vanish if $\omega$ is to be
exact. In three dimensional Euclidean space, we have a vector $\vec{B}$ with
components $B_k = \frac{1}{2} \sum_{ij} \epsilon_{kij} (\omega_{j,i} - \omega_{i,j}) = \sum_{ij} \epsilon_{kij}\, \partial_i \omega_j = (\vec{\nabla} \times \vec{\omega})_k$, so here
the exterior derivative of a 1-form gives a curl, $\vec{B} = \vec{\nabla} \times \vec{\omega}$.
k = 2: On a two form $B = \sum_{i<j} B_{ij}\, dx_i \wedge dx_j$, the exterior derivative gives
$$dB = \sum_k \sum_{i<j} B_{ij,k}\, dx_k \wedge dx_i \wedge dx_j = C(x)\, dx_1 \wedge dx_2 \wedge dx_3,$$
so $C(x) = \vec{\nabla} \cdot \vec{B}$, and the exterior derivative on a 2-form gives the
divergence of the corresponding vector.
$$f \;\xrightarrow{\ d\ }\; \omega^{(1)} \sim \vec{A} \;\xrightarrow{\ d\ }\; \omega^{(2)} \sim \vec{B} \;\xrightarrow{\ d\ }\; \omega^{(3)} \;\xrightarrow{\ d\ }\; 0,$$
where the three maps correspond to $\vec{\nabla} f$, $\vec{\nabla} \times \vec{A}$, and $\vec{\nabla} \cdot \vec{B}$ respectively.
Now that we have $d$ operating on all $k$-forms, we can ask what happens
if we apply it twice. Looking first in three dimensions, on a 0-form we get
$d^2 f = dA$ for $\vec{A} \sim \vec{\nabla} f$, and $dA \sim \vec{\nabla} \times \vec{A}$, so $d^2 f \sim \vec{\nabla} \times \vec{\nabla} f$. But the curl of a
gradient is zero, so $d^2 = 0$ in this case. On a one form, $d^2 A = dB$, $\vec{B} \sim \vec{\nabla} \times \vec{A}$,
and $dB \sim \vec{\nabla} \cdot \vec{B} = \vec{\nabla} \cdot (\vec{\nabla} \times \vec{A})$. Now we have the divergence of a curl,
which is also zero. For higher forms in three dimensions we can only get zero
because the degree of the form would be greater than three. Thus we have
a strong hint that $d^2$ might vanish in general. To verify this, we apply $d^2$ to
$\omega^{(k)} = \sum \omega_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$. Then
$$d\omega = \sum_j \sum_{i_1 < i_2 < \cdots < i_k} (\partial_j \omega_{i_1 \ldots i_k})\, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$
$$d(d\omega) = \sum_{\ell j} \sum_{i_1 < i_2 < \cdots < i_k} (\partial_\ell \partial_j \omega_{i_1 \ldots i_k})\, dx_\ell \wedge dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} = 0,$$
because $\partial_\ell \partial_j$ is symmetric under $\ell \leftrightarrow j$ while $dx_\ell \wedge dx_j$ is antisymmetric.
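In three dimensions, the two low-degree cases of d² = 0 are the familiar identities curl(grad f) = 0 and div(curl ω) = 0, which can be checked symbolically. A sketch with sympy (the functions f and ω are arbitrary test choices):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# d^2 on a 0-form: curl of a gradient vanishes
f = sp.exp(x)*sp.sin(y)*z
grad = [sp.diff(f, v) for v in (x, y, z)]
curl_grad = [sp.diff(grad[2], y) - sp.diff(grad[1], z),
             sp.diff(grad[0], z) - sp.diff(grad[2], x),
             sp.diff(grad[1], x) - sp.diff(grad[0], y)]
assert all(sp.simplify(c) == 0 for c in curl_grad)

# d^2 on a 1-form: divergence of a curl vanishes
w = [x*y*z, sp.cos(x) + z**2, y*x]
curl = [sp.diff(w[2], y) - sp.diff(w[1], z),
        sp.diff(w[0], z) - sp.diff(w[2], x),
        sp.diff(w[1], x) - sp.diff(w[0], y)]
div_curl = sum(sp.diff(curl[i], v) for i, v in enumerate((x, y, z)))
assert sp.simplify(div_curl) == 0
```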
$$dy_k = \sum_j \frac{\partial y_k}{\partial x_j}\, dx_j.$$
plane with the origin removed, because it is not single-valued. It is a well defined function
on the plane with a half axis removed, which leaves a simply-connected region, a region
with no holes. In fact, this is the general condition for the exactness of a 1-form — a
closed 1-form on a simply connected manifold is exact.
7
Indeed, most mathematical texts will first define an abstract notion of a vector in
the tangent space as a directional derivative operator, specified by equivalence classes of
parameterized paths on M. Then 1-forms are defined as duals to these vectors. In the
first step any coordinatization of M is tied to the corresponding basis of the vector space
Rn . While this provides an elegant coordinate-independent way of defining the forms, the
abstract nature of this definition of vectors can be unsettling to a physicist.
8
More elegantly, giving the map x → y the name φ, so y = φ(x), we can state the
relation as f = f˜ ◦ φ.
$$\frac{\partial f}{\partial x_i} = \sum_j \frac{\partial \tilde{f}}{\partial y_j} \frac{\partial y_j}{\partial x_i}, \qquad
\frac{\partial \tilde{f}}{\partial y_j} = \sum_i \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial y_j},$$
so
$$d\tilde{f} = \sum_k \frac{\partial \tilde{f}}{\partial y_k}\, dy_k
= \sum_{ijk} \frac{\partial f}{\partial x_i} \frac{\partial x_i}{\partial y_k} \frac{\partial y_k}{\partial x_j}\, dx_j
= \sum_{ij} \frac{\partial f}{\partial x_i}\, \delta_{ij}\, dx_j
= \sum_i f_{,i}\, dx_i = df.$$
Integration of k-forms
Suppose we have a k-dimensional smooth “surface” S in M, parameterized
by coordinates (u1 , · · · , uk ). We define the integral of a k-form
$$\omega^{(k)} = \sum_{i_1 < \cdots < i_k} \omega_{i_1 \ldots i_k}\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$
over $S$ by
$$\int_S \omega^{(k)} = \int \sum_{i_1, i_2, \ldots, i_k} \omega_{i_1 \ldots i_k}(x(u)) \left( \prod_{\ell=1}^{k} \frac{\partial x_{i_\ell}}{\partial u_\ell} \right) du_1\, du_2 \cdots du_k.$$
so $\int_S \omega^{(2)}$ gives the flux of $\vec{B}$ through the surface.
Similarly for k = 3 in three dimensions,
$$\sum_{ijk} \epsilon_{ijk} \left( \frac{\partial \vec{x}}{\partial u} \right)_i \left( \frac{\partial \vec{x}}{\partial v} \right)_j \left( \frac{\partial \vec{x}}{\partial w} \right)_k du\, dv\, dw$$
is the volume of the parallelepiped which is the image of $[u, u + du] \times [v, v + dv] \times [w, w + dw]$. As $\omega_{ijk} = \omega_{123}\, \epsilon_{ijk}$, this is exactly what appears:
$$\int \omega^{(3)} = \int \sum_{ijk} \epsilon_{ijk}\, \omega_{123} \frac{\partial x_i}{\partial u} \frac{\partial x_j}{\partial v} \frac{\partial x_k}{\partial w}\, du\, dv\, dw = \int \omega_{123}(x)\, dV.$$
Notice that we have only defined the integration of k-forms over subman-
ifolds of dimension k, not over other-dimensional submanifolds. These are
the only integrals which have coordinate invariant meanings. Also note that
the integrals do not depend on how the surface is coordinatized.
We state9 a marvelous theorem, special cases of which you have seen often
before, known as Stokes’ Theorem. Let C be a k-dimensional submanifold
of M, with ∂C its boundary. Let ω be a (k − 1)-form. Then Stokes’ theorem
says
$$\int_C d\omega = \int_{\partial C} \omega. \qquad (6.14)$$
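A familiar special case of (6.14) is Green's theorem in the plane, which lends itself to a quick numeric check. The sketch below (an assumed example: ω = x dy on the unit disk, so dω = dx∧dy and both sides equal the area π) verifies it with a discretized line integral:

```python
import numpy as np

# omega = x dy on the unit disk C: the boundary integral over the unit circle
# should equal the integral of d(omega) = dx ^ dy over C, i.e. the area pi
t = np.linspace(0.0, 2.0*np.pi, 200001)
x, y = np.cos(t), np.sin(t)

# line integral  oint x dy  by the trapezoid rule along the boundary
boundary = np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(y))
assert abs(boundary - np.pi) < 1e-6
```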
a form in the 2n + 1 dimensional extended phase space which includes time as one of its coordinates.
$\sum_j (\partial Q_i / \partial q_j)\, dq_j$. The new velocities are given by
We must now show that the natural symplectic structure is indeed form
invariant under canonical transformation. Thus if $Q_i$, $P_i$ are a new set of
canonical coordinates, combined into $\zeta_j$, we expect the corresponding object
formed from them, $\omega_2' = -\sum_{ij} J_{ij}\, d\zeta_i \otimes d\zeta_j$, to reduce to the same 2-form, $\omega_2$.
$$\omega_2' = -\sum_{ij} J_{ij}\, d\zeta_i \otimes d\zeta_j
= -\sum_{ij} \sum_{k\ell} J_{ij} M_{ik}\, d\eta_k \otimes M_{j\ell}\, d\eta_\ell
= -\sum_{k\ell} \left( M^T \cdot J \cdot M \right)_{k\ell} d\eta_k \otimes d\eta_\ell.$$
The condition for $\omega_2' = \omega_2$ is thus $M^T \cdot J \cdot M = J$, which follows from the condition $J = M \cdot J \cdot M^T$ for a canonical transformation:
$$-J \cdot M^{-1} \cdot \left( M \cdot J \cdot M^T \right) \cdot J \cdot M = -J \cdot M^{-1} \cdot J \cdot J \cdot M = J \cdot M^{-1} \cdot M = J,$$
while evaluating the same expression directly,
$$-J \cdot \left( M^{-1} \cdot M \right) \cdot J \cdot M^T \cdot J \cdot M = -J \cdot J \cdot M^T \cdot J \cdot M = M^T \cdot J \cdot M,$$
which is what we wanted to prove. Thus we have shown that the 2-form ω2
is form-invariant under canonical transformations, and deserves its name.
One important property of the 2-form ω2 on phase space is that it is
non-degenerate. A 2-form has two slots to insert vectors — inserting one
leaves a 1-form. Non-degenerate means there is no non-zero vector ~v on phase
space such that ω2 (·, ~v ) = 0, that is, such that ω2 (~u, ~v ) = 0, for all ~u on phase
space. This follows simply from the fact that the matrix Jij is non-singular.
and every vector in phase space is also in extended phase space. On such a
vector, on which dt gives zero, the extra term gives only something in the dt
direction, so there are still no vectors in this subspace which are annihilated
by dω3 . Thus there is at most one direction in extended phase space which
is annihilated by dω3 . But any 2-form in an odd number of dimensions
must annihilate some vector, because in a given basis it corresponds to an
antisymmetric matrix Bij , and in an odd number of dimensions det B =
det B T = det(−B) = (−1)2n+1 det B = − det B, so det B = 0 and the matrix
is singular, annihilating some vector ξ. In fact, for dω3 this annihilated vector
ξ is the tangent to the path the system takes through extended phase space.
One way to see this is to simply work out what dω3 is and apply it to
the vector ξ, which is proportional to ~v = (q̇i , ṗi , 1). So we wish to show
dω3 (·, ~v ) = 0. Evaluating
$$\sum_i dp_i \wedge dq_i(\cdot, \vec{v}) = \sum_i dp_i\, dq_i(\vec{v}) - \sum_i dq_i\, dp_i(\vec{v}) = \sum_i \dot{q}_i\, dp_i - \sum_i \dot{p}_i\, dq_i$$
$$\begin{aligned}
dH \wedge dt(\cdot, \vec{v}) &= dH\, dt(\vec{v}) - dt\, dH(\vec{v}) \\
&= \left( \sum_i \frac{\partial H}{\partial q_i} dq_i + \sum_i \frac{\partial H}{\partial p_i} dp_i + \frac{\partial H}{\partial t} dt \right) \cdot 1
- dt \left( \sum_i \frac{\partial H}{\partial q_i} \dot{q}_i + \sum_i \frac{\partial H}{\partial p_i} \dot{p}_i + \frac{\partial H}{\partial t} \right) \\
&= \sum_i \frac{\partial H}{\partial q_i} dq_i + \sum_i \frac{\partial H}{\partial p_i} dp_i
- dt \sum_i \left( \frac{\partial H}{\partial q_i} \dot{q}_i + \frac{\partial H}{\partial p_i} \dot{p}_i \right)
\end{aligned}$$
$$\begin{aligned}
d\omega_3(\cdot, \vec{v}) &= \sum_i \left( \dot{q}_i - \frac{\partial H}{\partial p_i} \right) dp_i
- \sum_i \left( \dot{p}_i + \frac{\partial H}{\partial q_i} \right) dq_i
+ \sum_i \left( \frac{\partial H}{\partial q_i} \dot{q}_i + \frac{\partial H}{\partial p_i} \dot{p}_i \right) dt \\
&= 0,
\end{aligned}$$
where the vanishing is due to the Hamilton equations of motion.
There is a more abstract way of understanding why $d\omega_3(\cdot, \vec{v})$
vanishes, from the modified Hamilton's principle, which states
that if the path taken were infinitesimally varied from the physical
path, there would be no change in the action. But this change
is the integral of $\omega_3$ along a loop, forwards in time along the first
trajectory and backwards along the second.
trajectory and backwards along the second. From Stokes’ the-
orem this means the integral of dω3 over a surface connecting
these two paths vanishes. But this surface is a sum over infinitesimal parallelograms,
one side of which is $\vec{v}\, \Delta t$ and the other side of which¹² is $(\delta \vec{q}(t), \delta \vec{p}(t), 0)$.
As this latter vector is an arbitrary function of $t$, each parallelogram must
independently give 0, so that its contribution to the integral, $d\omega_3((\delta \vec{q}, \delta \vec{p}, 0), \vec{v})\, \Delta t$,
vanishes. In addition, $d\omega_3(\vec{v}, \vec{v}) = 0$, of course, so $d\omega_3(\cdot, \vec{v})$ vanishes on a complete
basis of vectors and is therefore zero.
these will not vanish in general, but the exterior derivative of this difference,
$d(\omega_1 - \omega_1') = \omega_2 - \omega_2' = 0$, so $\omega_1 - \omega_1'$ is a closed 1-form. Thus it is exact¹³,
and there must be a function $F$ on phase space such that $\omega_1 - \omega_1' = dF$.
We call $F$ the generating function of the canonical transformation¹⁴.
If the transformation (q, p) → (Q, P ) is such that the old q’s alone, without
information about the old p’s, do not impose any restrictions on the new Q’s,
then the dq and dQ are independent, and we can use q and Q to parameter-
ize phase space15 . Then knowledge of the function F (q, Q) determines the
transformation, as
$$\omega_1 - \omega_1' = \sum_i (p_i\, dq_i - P_i\, dQ_i) = dF
= \sum_i \left( \left. \frac{\partial F}{\partial q_i} \right|_Q dq_i + \left. \frac{\partial F}{\partial Q_i} \right|_q dQ_i \right)$$
$$\Longrightarrow \quad p_i = \left. \frac{\partial F}{\partial q_i} \right|_Q, \qquad -P_i = \left. \frac{\partial F}{\partial Q_i} \right|_q.$$
2-forms both annihilate the same vector would not be sufficient to identify
them, but in this case we also know that restricting $d\omega_3$ and $d\omega_3'$ to their
action on the $dt = 0$ subspace gives the same 2-form $\omega_2$. That is to say, if
$\vec{u}$ and $\vec{u}\,'$ are two vectors with time components zero, we know that $(d\omega_3 -
d\omega_3')(\vec{u}, \vec{u}\,') = 0$. Any vector can be expressed as a multiple of $\vec{v}$ and some
vector $\vec{u}$ with time component zero, and as both $d\omega_3$ and $d\omega_3'$ annihilate $\vec{v}$,
we see that $d\omega_3 - d\omega_3'$ vanishes on all pairs of vectors, and is therefore zero.
Thus $\omega_3 - \omega_3'$ is a closed 1-form, which must be at least locally exact, and
indeed $\omega_3 - \omega_3' = dF$, where $F$ is the generating function we found above¹⁶.
Thus $dF = \sum p\, dq - \sum P\, dQ + (K - H)\, dt$, or
$$K = H + \frac{\partial F}{\partial t}.$$
The function F (q, Q, t) is what Goldstein calls F1 . The existence of F
as a function on extended phase space holds even if the Q and q are not
independent, but in this case F will need to be expressed as a function of
other coordinates. Suppose the new $P$'s and the old $q$'s are independent, so
we can write $F(q, P, t)$. Then define $F_2 = \sum Q_i P_i + F$. Then
$$dF_2 = \sum Q_i\, dP_i + \sum P_i\, dQ_i + \sum p_i\, dq_i - \sum P_i\, dQ_i + (K - H)\, dt
= \sum Q_i\, dP_i + \sum p_i\, dq_i + (K - H)\, dt,$$
so
$$Q_i = \frac{\partial F_2}{\partial P_i}, \qquad p_i = \frac{\partial F_2}{\partial q_i}, \qquad K(Q, P, t) = H(q, p, t) + \frac{\partial F_2}{\partial t}.$$
The generating function can be a function of old momenta rather than the
old coordinates. Making one choice for the old coordinates and one for the
new, there are four kinds of generating functions as described by Goldstein.
Let us consider some examples. The function $F_1 = \sum_i q_i Q_i$ generates an
¹⁶From its definition in that context, we found that in phase space, $dF = \omega_1 - \omega_1'$,
which is the part of $\omega_3 - \omega_3'$ not in the time direction. Thus if $\omega_3 - \omega_3' = dF'$ for some
other function $F'$, we know $dF' - dF = (K' - K)\, dt$ for some new Hamiltonian function
$K'(Q, P, t)$, so this corresponds to an ambiguity in $K$.
interchange of $p$ and $q$,
$$Q_i = p_i, \qquad P_i = -q_i,$$
which leaves the Hamiltonian unchanged. We saw this clearly leaves the
form of Hamilton’s equations unchanged. An interesting generator of the
second type is $F_2 = \sum_i \lambda_i q_i P_i$, which gives $Q_i = \lambda_i q_i$, $P_i = \lambda_i^{-1} p_i$, a simple
change in scale of the coordinates with a corresponding inverse scale change
in momenta to allow [Qi , Pj ] = δij to remain unchanged. This also doesn’t
change H. For λ = 1, this is the identity transformation, for which F = 0,
of course.
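That both of these transformations preserve the fundamental Poisson bracket can be checked directly; a minimal symbolic sketch for one degree of freedom:

```python
import sympy as sp

q, p, lam = sp.symbols('q p lam', positive=True)

def pb(f, g):
    # Poisson bracket [f, g] for one degree of freedom
    return sp.diff(f, q)*sp.diff(g, p) - sp.diff(f, p)*sp.diff(g, q)

# interchange generated by F1 = qQ:  Q = p, P = -q
assert sp.simplify(pb(p, -q)) == 1
# scale change generated by F2 = lam*q*P:  Q = lam*q, P = p/lam
assert sp.simplify(pb(lam*q, p/lam)) == 1
```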
Placing point transformations in this language provides another example.
For a point transformation, Qi = fi (q1 , . . . , qn , t), which is what one gets with
a generating function
$$F_2 = \sum_i f_i(q_1, \ldots, q_n, t)\, P_i.$$
Note that
$$p_i = \frac{\partial F_2}{\partial q_i} = \sum_j \frac{\partial f_j}{\partial q_i} P_j$$
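For a single degree of freedom this relation gives P = p/(df/dq), and one can check symbolically that the resulting transformation is canonical. The function f below is a hypothetical invertible point transformation chosen only for illustration:

```python
import sympy as sp

q, p = sp.symbols('q p', positive=True)
f = sp.log(q)              # a hypothetical invertible point transformation Q = f(q)
Q = f
P = p / sp.diff(f, q)      # solving p = (df/dq) P for the new momentum

bracket = sp.diff(Q, q)*sp.diff(P, p) - sp.diff(Q, p)*sp.diff(P, q)
assert sp.simplify(bracket) == 1   # [Q, P] = 1, so the transformation is canonical
```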
$$H = \frac{p^2}{2m} + \frac{k}{2} q^2 = \frac{1}{2} \sqrt{k/m} \left( P^2 + Q^2 \right),$$
where $Q = (km)^{1/4} q$, $P = (km)^{-1/4} p$. In this form, thinking
of phase space as just some two-dimensional space, we seem to
be encouraged to consider a second canonical transformation
$Q, P \longrightarrow \theta, \mathcal{P}$, generated by $F_1(Q, \theta)$, to a new, polar, coordinate
system with $\theta = \tan^{-1}(Q/P)$ as the new coordinate, and
we might hope to have the radial coordinate related to the new
momentum, $\mathcal{P} = -\left. \partial F_1 / \partial \theta \right|_Q$. As $P = \left. \partial F_1 / \partial Q \right|_\theta$ is also $Q \cot \theta$,
we can take $F_1 = \frac{1}{2} Q^2 \cot \theta$, so
$$\mathcal{P} = -\frac{1}{2} Q^2 (-\csc^2 \theta) = \frac{1}{2} Q^2 (1 + P^2/Q^2) = \frac{1}{2} (Q^2 + P^2) = H/\omega.$$
Note as $F_1$ is not time dependent, $K = H$ and is independent of $\theta$, which is
therefore an ignorable coordinate, so its conjugate momentum $\mathcal{P}$ is conserved.
Of course $\mathcal{P}$ differs from the conserved Hamiltonian $H$ only by the factor
$\omega = \sqrt{k/m}$, so this is not unexpected. With $H$ now linear in the new
momentum $\mathcal{P}$, the conjugate coordinate $\theta$ grows linearly with time at the
fixed rate $\dot{\theta} = \partial H / \partial \mathcal{P} = \omega$.
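A numeric illustration of this conclusion: for an oscillator with m = k = 1 (an assumed example, so ω = 1 and Q = q, P = p), the new momentum is conserved and θ grows at the constant rate ω:

```python
import numpy as np

# harmonic oscillator with m = k = 1: q(t) = cos t, p(t) = -sin t
t = np.linspace(0.0, 4.0, 5)
Q, P = np.cos(t), -np.sin(t)

theta = np.unwrap(np.arctan2(Q, P))    # theta = arctan(Q/P), made continuous
newP = 0.5 * (Q**2 + P**2)             # the new momentum, = H/omega

assert np.allclose(newP, 0.5)                         # conserved
assert np.allclose(np.diff(theta) / np.diff(t), 1.0)  # theta-dot = omega = 1
```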
$$\frac{d g^\lambda(\eta)}{d\lambda} = \left[ g^\lambda(\eta), G \right].$$
This differential equation defines a phase flow on phase space. If G is not
a function of λ, this has the form of a differential equation solved by an
exponential,
$$g^\lambda(\eta) = e^{\lambda[\cdot, G]}\, \eta,$$
which means
$$g^\lambda(\eta) = \eta + \lambda [\eta, G] + \frac{1}{2} \lambda^2 [[\eta, G], G] + \cdots.$$
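For a generator quadratic in the coordinates this series can be summed numerically and compared with the known flow. The sketch below assumes G = ½(q² + p²), for which [η, G] is linear in η and the flow is a rotation in phase space (an illustrative choice, not from the text):

```python
import numpy as np

# For G = (q^2 + p^2)/2, eta-dot = [eta, G] = K eta with K = [[0,1],[-1,0]]
K = np.array([[0.0, 1.0], [-1.0, 0.0]])
lam = 0.3
eta0 = np.array([1.0, 0.5])

# sum the series  eta + lam [eta,G] + lam^2/2 [[eta,G],G] + ...
out = np.zeros(2)
term = eta0.copy()
for n in range(30):
    out += term
    term = lam * (K @ term) / (n + 1)

# exact flow: rotation by lam in the (q, p) plane
exact = np.array([[np.cos(lam), np.sin(lam)],
                  [-np.sin(lam), np.cos(lam)]]) @ eta0
assert np.allclose(out, exact)
```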
phase space is given by [η, H]. If the Hamiltonian is time independent, the
velocity field is fixed, and the solution is formally an exponential.
Let us review changes due to a generating function considered in the
passive and alternately in the active view. In the passive picture, we view η
and ζ = η+δη as alternative coordinatizations of the same physical point A in
P
phase space. For an infinitesimal generator F2 = i qi Pi + G, δη = J∇G =
[η, G]. A physical scalar defined by a function u(η) changes its functional
form to ũ, but not its value at a given physical point, so ũ(ζA ) = u(ηA ). For
the Hamiltonian, there is a change in value as well, for H̃ or K̃ may not be
the same as H, even at the corresponding point,
$$\tilde{K}(\zeta_A) = H(\eta_A) + \frac{\partial F_2}{\partial t} = H(\eta_A) + \left. \frac{\partial G}{\partial t} \right|_A.$$
$$\Delta u = \tilde{u}(\eta_B) - u(\eta_B) = \tilde{u}(\zeta_A) - u(\zeta_A) = u(\eta_A) - u(\zeta_A)
= -\delta\eta_i \frac{\partial u}{\partial \eta_i}
= -\sum_i [\eta_i, G] \frac{\partial u}{\partial \eta_i} = -[u, G]$$
[Figure: the passive view, in which $\eta$ and $\zeta$ are two coordinatizations of the same point $A$, compared with the active view, in which the transformation carries $A$ to a new point $B$ with $\eta_i(B) = \zeta_i(A)$.]
Thus the transformation just changes the one coordinate qI and leaves all
the other coordinates and all momenta unchanged. In other words, it is a
translation of qI . As the Hamiltonian is unchanged, it must be independent
of qI , and qI is an ignorable coordinate.
[Lij , Lk` ] = δjk Li` − δik Lj` − δj` Lik + δi` Ljk .
$$\zeta^\alpha(\eta) = e^{\alpha[\cdot, G]}\, \eta
= \left( 1 + \alpha [\cdot, G] + \frac{1}{2} \alpha^2 [[\cdot, G], G] + \cdots \right) \eta
= \eta + \alpha [\eta, G] + \frac{1}{2} \alpha^2 [[\eta, G], G] + \cdots.$$
In this fashion, any Lie algebra, and in particular the Lie algebra formed
by the Poisson brackets of generators of symmetry transformations, can be
exponentiated to form a continuous group of finite transformations, called a
Lie Group. In the case of angular momentum, the three components of $\vec{L}$
form a three-dimensional Lie algebra, and the exponentials of these form a
three-dimensional Lie group which is SO(3), the rotation group.
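The closure of this algebra, [L₁, L₂] = L₃ and cyclic, can be checked symbolically. A sketch with sympy, using the convention [f, g] = Σₐ(∂f/∂xₐ ∂g/∂pₐ − ∂f/∂pₐ ∂g/∂xₐ):

```python
import sympy as sp

x = sp.symbols('x1 x2 x3')
p = sp.symbols('p1 p2 p3')

def pb(f, g):
    # Poisson bracket in three degrees of freedom
    return sum(sp.diff(f, x[a])*sp.diff(g, p[a])
               - sp.diff(f, p[a])*sp.diff(g, x[a]) for a in range(3))

# components of L = r x p
L1 = x[1]*p[2] - x[2]*p[1]
L2 = x[2]*p[0] - x[0]*p[2]
L3 = x[0]*p[1] - x[1]*p[0]

assert sp.expand(pb(L1, L2) - L3) == 0
assert sp.expand(pb(L2, L3) - L1) == 0
assert sp.expand(pb(L3, L1) - L2) == 0
```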
$$p = \frac{\partial F_2}{\partial q} = \frac{\partial W}{\partial q}.$$
H = α(cos2 θ + sin2 θ) = α,
$$r_1^2 = (x + c)^2 + y^2, \qquad r_2^2 = (x - c)^2 + y^2, \qquad \xi = r_1 + r_2, \qquad \eta = r_1 - r_2.$$
From $r_1^2 - r_2^2 = 4cx = \xi\eta$ we find a fairly simple expression $\dot{x} = (\dot{\xi}\eta + \xi\dot{\eta})/4c$.
The expression for $y$ is more difficult, but can be found from observing that
$\frac{1}{2}(r_1^2 + r_2^2) = x^2 + y^2 + c^2 = (\xi^2 + \eta^2)/4$, so
$$y^2 = \frac{\xi^2 + \eta^2}{4} - \left( \frac{\xi\eta}{4c} \right)^2 - c^2 = \frac{(\xi^2 - 4c^2)(4c^2 - \eta^2)}{16 c^2},$$
or
$$y = \frac{1}{4c} \sqrt{\xi^2 - 4c^2} \sqrt{4c^2 - \eta^2}$$
and
$$\dot{y} = \frac{1}{4c} \left( \xi \dot{\xi} \sqrt{\frac{4c^2 - \eta^2}{\xi^2 - 4c^2}} - \eta \dot{\eta} \sqrt{\frac{\xi^2 - 4c^2}{4c^2 - \eta^2}} \right).$$
Squaring, adding in the $x$ contribution, and simplifying then shows that
$$T = \frac{m}{8} \left( \frac{\xi^2 - \eta^2}{4c^2 - \eta^2}\, \dot{\eta}^2 + \frac{\xi^2 - \eta^2}{\xi^2 - 4c^2}\, \dot{\xi}^2 \right).$$
Note that there are no crossed terms ∝ ξ˙η̇, a manifestation of the orthogo-
nality of the curvilinear coordinates ξ and η. The potential energy becomes
$$U = -K \left( \frac{1}{r_1} + \frac{1}{r_2} \right) = -K \left( \frac{2}{\xi + \eta} + \frac{2}{\xi - \eta} \right) = \frac{-4K\xi}{\xi^2 - \eta^2}.$$
In terms of the new coordinates ξ and η and their conjugate momenta, we
see that
$$H = \frac{2/m}{\xi^2 - \eta^2} \left( p_\xi^2 (\xi^2 - 4c^2) + p_\eta^2 (4c^2 - \eta^2) - 2mK\xi \right).$$
Then the Hamilton-Jacobi equation for Hamilton’s characteristic function is
$$\frac{2/m}{\xi^2 - \eta^2} \left( (\xi^2 - 4c^2) \left( \frac{\partial W}{\partial \xi} \right)^2 + (4c^2 - \eta^2) \left( \frac{\partial W}{\partial \eta} \right)^2 - 2mK\xi \right) = \alpha,$$
or
$$(\xi^2 - 4c^2) \left( \frac{\partial W}{\partial \xi} \right)^2 - 2mK\xi - \frac{1}{2} m\alpha \xi^2
+ (4c^2 - \eta^2) \left( \frac{\partial W}{\partial \eta} \right)^2 + \frac{1}{2} \alpha m \eta^2 = 0.$$
If W is to separate into a ξ dependent piece and an η dependent one, the
first line will depend only on ξ, and the second only on η, so they must each
be constant, with W (ξ, η) = Wξ (ξ) + Wη (η), and
$$(\xi^2 - 4c^2) \left( \frac{dW_\xi(\xi)}{d\xi} \right)^2 - 2mK\xi - \frac{1}{2} \alpha m \xi^2 = \beta,$$
$$(4c^2 - \eta^2) \left( \frac{dW_\eta(\eta)}{d\eta} \right)^2 + \frac{1}{2} \alpha m \eta^2 = -\beta.$$
These are now reduced to integrals for Wi , which can in fact be integrated
to give an explicit expression in terms of elliptic integrals.
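The algebra leading to the kinetic energy in elliptic coordinates is easy to get wrong, so a numeric spot-check is worthwhile. The sketch below picks a hypothetical point, velocity, mass, and focal parameter c, and verifies both the expression for y² and the formula for T against the Cartesian kinetic energy:

```python
import numpy as np

c = 1.3                     # half the focal separation (hypothetical value)
x, y = 0.7, 1.9             # a point off the axis
xd, yd = -0.4, 0.8          # its velocity
m = 2.0

r1 = np.hypot(x + c, y)
r2 = np.hypot(x - c, y)
xi, eta = r1 + r2, r1 - r2
assert np.isclose(y**2, (xi**2 - 4*c**2)*(4*c**2 - eta**2)/(16*c**2))

# chain rule: r1dot = ((x+c) xd + y yd)/r1, r2dot = ((x-c) xd + y yd)/r2
r1d = ((x + c)*xd + y*yd)/r1
r2d = ((x - c)*xd + y*yd)/r2
xid, etad = r1d + r2d, r1d - r2d

T_cart = 0.5*m*(xd**2 + yd**2)
T_ell = (m/8)*((xi**2 - eta**2)/(4*c**2 - eta**2)*etad**2
               + (xi**2 - eta**2)/(xi**2 - 4*c**2)*xid**2)
assert np.isclose(T_cart, T_ell)
```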
Exercises
6.1 In Exercise 2.7, we discussed the connection between two Lagrangians, L1 and
L2 , which differed by a total time derivative of a function on extended configuration
space,
$$L_1(\{q_i\}, \{\dot{q}_j\}, t) = L_2(\{q_i\}, \{\dot{q}_j\}, t) + \frac{d}{dt} \Phi(q_1, \ldots, q_n, t).$$
You found that these gave the same equations of motion, but differing momenta
$p_i^{(1)}$ and $p_i^{(2)}$. Find the relationship between the two Hamiltonians, $H_1$ and $H_2$,
and show that these lead to equivalent equations of motion.
6.2 A uniform static magnetic field can be described by a static vector potential
$\vec{A} = \frac{1}{2} \vec{B} \times \vec{r}$. A particle of mass $m$ and charge $q$ moves under the influence of this
field.
(a) Find the Hamiltonian, using inertial cartesian coordinates.
(b) Find the Hamiltonian, using coordinates of a rotating system with angular
velocity $\vec{\omega} = -q\vec{B}/2mc$.
6.3 Consider a symmetric top with one point on the symmetry axis fixed in
space, as we did at the end of chapter 4. Write the Hamiltonian for the top.
Noting the cyclic (ignorable) coordinates, explain how this becomes an effective
one-dimensional system.
6.4 (a) Show that a particle under a central force with an attractive potential
inversely proportional to the distance squared has a conserved quantity $D = \frac{1}{2} \vec{r} \cdot \vec{p} - Ht$.
(b) Show that the infinitesimal transformation generated by $G := \frac{1}{2} \vec{r} \cdot \vec{p}$ scales
$\vec{r}$ and $\vec{p}$ by opposite infinitesimal amounts, $\vec{Q} = (1 + \frac{\epsilon}{2}) \vec{r}$, $\vec{P} = (1 - \frac{\epsilon}{2}) \vec{p}$, or for a
finite transformation $\vec{Q} = \lambda \vec{r}$, $\vec{P} = \lambda^{-1} \vec{p}$. Show that if we describe the motion in
terms of a scaled time $T = \lambda^2 t$, the equations of motion are invariant under this
combined transformation $(\vec{r}, \vec{p}, t) \to (\vec{Q}, \vec{P}, T)$.
6.5 We saw that the Poisson bracket associates with every differentiable function
f on phase space a differential operator Df := [f, ·] which acts on functions g
on phase space by Df g = [f, g]. We also saw that every differential operator is
associated with a vector, which in a particular coordinate system has components
$f_i$, where
$$D_f = \sum_i f_i \frac{\partial}{\partial \eta_i}.$$
A 1-form acts on such a vector by
$$dx_j(D_f) = f_j.$$
Show that for the natural symplectic structure ω2 , acting on the differential oper-
ator coming from the Poisson bracket as its first argument,
ω2 (Df , ·) = df,
A = y dx + x dy + dz
B = y 2 dx + x2 dy + dz
C = xy(y − x) dx ∧ dy + y(y − 1) dx ∧ dz + x(x − 1) dy ∧ dz
D = 2(x − y) dx ∧ dy ∧ dz
E = 2(x − y) dx ∧ dy
Find as many relations as you can, expressible without coordinates, among these
forms. Consider using the exterior derivative and the wedge product.
$$H = \omega (x^2 + 1)\, p,$$
where $\omega$ is a constant.
(a) Find the equations of motion, and solve for x(t).
(b) Consider the transformation to new phase-space variables $P = \alpha p^{1/2}$, $Q = \beta x p^{1/2}$. Find the conditions necessary for this to be a canonical transformation, and find a generating function $F(x, Q)$ for this transformation.
6.9 For the central force problem with an attractive coulomb law,
$$H = \frac{p^2}{2m} - \frac{K}{r},$$
we saw that the Runge-Lenz vector
$$\vec{A} = \vec{p} \times \vec{L} - mK \frac{\vec{r}}{|r|}$$
is a conserved quantity, as is $\vec{L}$. Find the Poisson brackets of $A_i$ with $L_j$, which you
should be able to do without detailed calculation, and also of $A_i$ with $A_j$. [Hint:
it might be useful to first show that [pi , f (~r)] = −∂i f for any function of the
coordinates only. It will be useful to evaluate the two terms in A ~ independently,
and to use the Jacobi identity judiciously.]
6.10 a) Argue that $[H, L_i] = [H, A_i] = 0$. Show that for any differentiable
function $R$ on phase space and any differentiable function $f$ of one variable, if
$[H, R] = 0$ then $[f(H), R] = 0$.
b) Scale the $A_i$ to form new conserved quantities $M_i = A_i / \sqrt{-2mH}$. Given the
results of (a), find the simple algebra satisfied by the six generators $\vec{L}$, $\vec{M}$.
c) Define $L_{ij} = \epsilon_{ijk} L_k$, for $i, j, k = 1, 2, 3$, and $L_{i4} = -L_{4i} = M_i$. Show that in
this language, with $\mu, \nu, \rho, \sigma = 1, \ldots, 4$,
$$[L_{\mu\nu}, L_{\rho\sigma}] = -\delta_{\nu\rho} L_{\mu\sigma} + \delta_{\mu\rho} L_{\nu\sigma} + \delta_{\nu\sigma} L_{\mu\rho} - \delta_{\mu\sigma} L_{\nu\rho}.$$
What does this imply about the symmetry group of the Hydrogen atom?
6.11 Consider a particle of mass m and charge q in the field of a fixed electric
dipole with dipole moment21 p. In spherical coordinates, the potential energy is
given by
$$U(\vec{r}) = \frac{1}{4\pi\epsilon_0} \frac{qp}{r^2} \cos\theta.$$
a) Write the Hamiltonian. It is independent of t and φ. As a consequence, there
are two conserved quantities. What are they?
b) Find the partial differential equation in t, r, θ, and φ satisfied by Hamilton’s
principal function S, and the partial differential equation in r, θ, and φ satisfied
by Hamilton’s characteristic function W.
c) Assume W can be broken up into r-dependent, θ-dependent, and φ-dependent
pieces:
W (r, θ, φ, Pi ) = Wr (r, Pi ) + Wθ (θ, Pi ) + Wφ (φ, Pi ).
Find ordinary differential equations for Wr , Wθ and Wφ .
²¹Please note that $q$ and $p$ are the charge and dipole moment here, not coordinates or
momenta of the particle.
Chapter 7
Perturbation Theory
Fi for which [Fi , Fj ] = 0, and the Fi are independent, so the dFi are linearly
independent at each point η ∈ M. We will assume the first of these is the
Hamiltonian. As each of the Fi is a conserved quantity, the motion of the
system is confined to a submanifold of phase space determined by the initial
values of these invariants fi = Fi (q(0), p(0)):
$$M_{\vec{f}} = \{ \eta : F_i(\eta) = f_i \text{ for } i = 1, \ldots, n \},$$
where if the space defined by $F_i(\eta) = f_i$ is disconnected, $M_{\vec{f}}$ is only the connected
piece in which the system starts. The differential operators $D_{F_i} = [F_i, \cdot]$
piece in which the system starts. The differential operators DFi = [Fi , ·]
correspond to vectors tangent to the manifold Mf~, because acting on each
of the Fj functions DFi vanishes, as the F ’s are in involution. These
differential operators also commute with one another, because as we saw in
(6.12),
DFi DFj − DFj DFi = D[Fi ,Fj ] = 0.
They are also linearly independent, for if $\sum \alpha_i D_{F_i} = 0$, then
$\sum \alpha_i D_{F_i} \eta_j = 0 = [\sum \alpha_i F_i, \eta_j]$, which means that $\sum \alpha_i F_i$ is a constant on phase space,
and that would contradict the assumed independence of the Fi . Thus the
DFi are n commuting independent differential operators corresponding to
the generators Fi of an Abelian1 group of displacements on Mf~. A given
reference point $\eta_0 \in M$ is mapped by the canonical transformation generator
$\sum t_i F_i$ into some other point $g^{\vec{t}}(\eta_0) \in M_{\vec{f}}$. Poisson's Theorem shows the
volume covered diverges with $\vec{t}$, so if the manifold $M_{\vec{f}}$ is compact, there must
be many values of ~t for which g~t(η0 ) = η0 . These elements form a discrete
Abelian subgroup, and therefore a lattice in Rn . It has n independent lattice
vectors, and a unit cell which is in 1-1 correspondence with Mf~. Let these
basis vectors be ~e1 , . . . , ~en . These are the edges of the unit cell in Rn , the
interior of which is the set of linear combinations $\sum_i a_i \vec{e}_i$ where each of the $a_i \in [0, 1)$.
We therefore have a diffeomorphism between this unit cell and Mf~, which
induces coordinates on Mf~. Because these are periodic, we scale the ai to
new coordinates $\phi_i = 2\pi a_i$, so each point of $M_{\vec{f}}$ is labelled by $\vec{\phi}$, given by
the $\vec{t} = \sum_k \phi_k \vec{e}_k / 2\pi$ for which $g^{\vec{t}}(\eta_0) = \eta$. Notice each $\phi_i$ is a coordinate on a
$$\delta \vec{t} = \sum_k \delta\phi_k\, \vec{e}_k / 2\pi.$$
We see that the Poisson bracket is the inverse of the matrix $A_{ji}$ given by
the $j$'th coordinate of the $i$'th basis vector,
$$A_{ji} = \frac{1}{2\pi} (\vec{e}_i)_j, \qquad \delta \vec{t} = A \cdot \delta\vec{\phi}, \qquad [\phi_j, F_i] = \left( A^{-1} \right)_{ji}.$$
$$\frac{d\vec{\phi}}{dt} = \vec{\omega}(\vec{f}).$$
The angle variables $\vec{\phi}$ are not conjugate to the integrals of the motion $F_i$,
but rather to combinations of them,
$$I_i = \frac{1}{2\pi}\, \vec{e}_i(\vec{f}) \cdot \vec{F},$$
for then
$$[\phi_j, I_i] = \frac{1}{2\pi} \sum_k \left( \vec{e}_i(\vec{f}) \right)_k [\phi_j, F_k] = \sum_k A_{ki} \left( A^{-1} \right)_{jk} = \delta_{ij}.$$
These Ii are the action variables, which are functions of the original set Fj of
integrals of the motion, and therefore are themselves integrals of the motion.
In action-angle variables the motion is very simple, with $\vec{I}$ constant and
$\dot{\vec{\phi}} = \vec{\omega} = \text{constant}$. This is called conditionally periodic motion, and the
ωi are called the frequencies. If all the ratios of the ωi ’s are rational, the
motion will be truly periodic, with a period the least common multiple of
the individual periods 2π/ωi . More generally, there may be some relations
$$\sum_i k_i \omega_i = 0$$
for integer values ki . Each of these is called a relation among the fre-
quencies. If there are no such relations the frequencies are said to be inde-
pendent frequencies.
In the space of possible values of ω ~ , the subspace of values for which
the frequencies are independent is surely dense. In fact, most such points
have independent frequencies. We should be able to say then that most of
the invariant tori Mf~ have independent frequencies if the mapping ω ~ (f~) is
one-to-one. This condition is
$$\det \left( \frac{\partial \vec{\omega}}{\partial \vec{f}} \right) \neq 0, \qquad \text{or equivalently} \qquad \det \left( \frac{\partial \vec{\omega}}{\partial \vec{I}} \right) \neq 0.$$
When this condition holds the system is called a nondegenerate system.
As $\omega_i = \partial H / \partial I_i$, this condition can also be written as $\det \partial^2 H / \partial I_i \partial I_j \neq 0$.
Consider a function g on Mf~. We define two averages of this function.
One is the time average we get starting at a particular point $\vec{\phi}_0$ and averaging
over an infinitely long time,
$$\langle g \rangle_t(\vec{\phi}_0) = \lim_{T \to \infty} \frac{1}{T} \int_0^T g(\vec{\phi}_0 + \vec{\omega} t)\, dt.$$
We may also define the average over phase space, that is, over all values of
$\vec{\phi}$ describing the submanifold $M_{\vec{f}}$,
$$\langle g \rangle_{M_{\vec{f}}} = (2\pi)^{-n} \int_0^{2\pi} \cdots \int_0^{2\pi} g(\vec{\phi})\, d\phi_1 \ldots d\phi_n,$$
where we have used the simple measure dφ1 . . . dφn on the space Mf~. Then
an important theorem states that, if the frequencies are independent, and
g is a continuous function on Mf~, the time and space averages of g are
the same. Note any such function g can be expanded in a Fourier series,
$g(\vec{\phi}) = \sum_{\vec{k} \in \mathbb{Z}^n} g_{\vec{k}}\, e^{i\vec{k} \cdot \vec{\phi}}$, with $\langle g \rangle_{M_{\vec{f}}} = g_{\vec{0}}$, while
$$\langle g \rangle_t = \lim_{T \to \infty} \frac{1}{T} \int_0^T \sum_{\vec{k}} g_{\vec{k}}\, e^{i\vec{k} \cdot \vec{\phi}_0 + i\vec{k} \cdot \vec{\omega} t}\, dt
= g_{\vec{0}} + \sum_{\vec{k} \neq \vec{0}} g_{\vec{k}}\, e^{i\vec{k} \cdot \vec{\phi}_0} \lim_{T \to \infty} \frac{1}{T} \int_0^T e^{i\vec{k} \cdot \vec{\omega} t}\, dt = g_{\vec{0}},$$
because
$$\lim_{T \to \infty} \frac{1}{T} \int_0^T e^{i\vec{k} \cdot \vec{\omega} t}\, dt
= \lim_{T \to \infty} \frac{1}{T} \frac{e^{i\vec{k} \cdot \vec{\omega} T} - 1}{i\vec{k} \cdot \vec{\omega}} = 0,$$
as long as the denominator does not vanish. It is this requirement, that $\vec{k} \cdot \vec{\omega} \neq 0$
for all nonzero $\vec{k} \in \mathbb{Z}^n$, which requires the frequencies to be independent.
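The equality of time and phase-space averages can be illustrated numerically. The sketch below assumes two incommensurate frequencies (1 and √2) and checks that the time average of g = cos(k·φ) tends to its phase-space average g₀ = 0 over a long but finite window:

```python
import numpy as np

w = np.array([1.0, np.sqrt(2.0)])    # incommensurate: k . w != 0 for integer k != 0
k = np.array([1.0, -1.0])
T = 20000.0
t = np.linspace(0.0, T, 200001)

# g(phi_0 + w t) with g = cos(k . phi) and phi_0 = 0
g = np.cos(k @ np.outer(w, t))
time_avg = g.mean()
assert abs(time_avg) < 1e-2          # tends to the phase-space average g_0 = 0
```

With commensurate frequencies (say ω = (1, 1), k = (1, −1)) the same average would instead stay at cos(k·φ₀), illustrating why a relation among the frequencies spoils the theorem.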
As an important corollary of this theorem, when it holds the trajectory is
dense in Mf~, and uniformly distributed, in the sense that the time spent in
each specified volume of Mf~ is proportional to that volume, independent of
the position or shape of that volume. This leads to the notion of ergodicity,
that every state of a system left for a long time will have average values of
various properties the same as the average of all possible states with the same
conserved values.
If instead of independence we have relations among the frequencies, these
relations, each given by a ~k ∈ Zn , form a subgroup of Zn (an additive group of
translations by integers along each of the axes). Each such $\vec{k}$ gives a constant
of the motion, $\vec{k} \cdot \vec{\phi}$. Each independent relation among the frequencies therefore
restricts the dimensionality of the motion by an additional dimension,
so if the subgroup is generated by r such independent relations, the motion
is restricted to a manifold of reduced dimension n − r, and the motion on
this reduced torus T n−r is conditionally periodic with n − r independent
frequencies. The theorem and corollaries just discussed then apply to this
reduced invariant torus, but not to the whole n-dimensional torus with which
we started. In particular, $\langle g \rangle_t(\vec{\phi}_0)$ can depend on $\vec{\phi}_0$ as it varies from one
submanifold $T^{n-r}$ to another, but not along paths on the same submanifold.
While having relations among the frequencies for arbitrary values of the
integrals of the motion might seem a special case, unlikely to happen, there
are important examples where they do occur. We saw that for Keplerian mo-
tion, there were five invariant functions on the six-dimensional phase space of
the relative coordinate, because energy, angular momentum, and the Runge-
Lenz are all conserved, giving five independent conserved quantities. The
locus of points in the six dimensional space with these five functions taking
on assigned values is therefore one-dimensional, that is, a curve on the three
dimensional invariant torus. This is responsible for the strange fact that the
oscillations in r have the same period as the cycles in φ. Even for other
central force laws, for which there is no equivalent to the Runge-Lenz vector,
there are still four conserved quantities, so there must still be one relation,
which turns out to be that the periods of motion in θ and φ are the same2 .
If the system is nondegenerate, for typical I~ the ωi ’s will have no relations
and the invariant torus will be densely filled by the motion of the system.
Therefore the invariant tori are uniquely defined, although the choices of
action and angle variables is not. In the degenerate case the motion of
the system does not fill the n dimensional invariant torus, so it need not be
uniquely defined. This is what happens, for example, for the two dimensional
harmonic oscillator or for the Kepler problem.
This discussion has been somewhat abstract, so it might be well to give
some examples. We will consider
• the pendulum
The Pendulum
The simple pendulum is a mass connected by a fixed length massless rod to
a frictionless joint, which we take to be at the origin, hanging in a uniform
gravitational field. The generalized coordinates may be
taken to be the angle θ which the rod makes with the downward vertical, and
the azimuthal angle φ. If ℓ is the length of the rod, U = −mgℓ cos θ, and as
shown in section 2.2.1 or section 3.1.2, the kinetic energy is
T = ½ mℓ²(θ̇² + sin²θ φ̇²). So the lagrangian,

    L = ½ mℓ²(θ̇² + sin²θ φ̇²) + mgℓ cos θ,

is time independent and has an ignorable coordinate φ,
²The usual treatment for spherical symmetry is to choose L⃗ in the z direction, which
sets z and p_z to zero and reduces our problem to a four-dimensional phase space with two
integrals of the motion, H and L_z. But without making that choice, we do know that the
motion will be restricted to some plane, so a_x x + a_y y + a_z z = 0 for some fixed coefficients
a_x, a_y, a_z, and in spherical coordinates r(a_z cos θ + a_x sin θ cos φ + a_y sin θ sin φ) = 0. The
r dependence factors out, and thus φ can be solved for, in terms of θ, and must have the
same period.
7.1. INTEGRABLE SYSTEMS 203
    ṗ_θ = −∂H/∂θ = p_φ² cos θ/(mℓ² sin³θ) − mgℓ sin θ.
This is shown by the red path, which goes around the bottom, through the
hole in the donut, up the top, and back, but not quite to the same point
as it started. Ignoring φ, this is periodic motion in θ with a period T_θ, so
g^(T_θ,0)(η_0) is a point at the same latitude as η_0. This t ∈ [0, T_θ] part of the
trajectory is shown as the thick red curve. There is some t̄_2 which, together
with t̄_1 = T_θ, will cause g^t̄⃗ to map each point on the torus back to itself.
Thus ~e1 = (Tθ , t̄2 ) and ~e2 = (0, 2π) constitute the unit vectors of the
lattice of ~t values which leave the points unchanged. The trajectory generated
by H does not close after one or a few Tθ . It could be continued indefinitely,
and as in general there is no relation among the frequencies (t̄2 /2π is not
rational, in general), the trajectory will not close, but will fill the surface of
the torus. If we wait long enough, the system will sample every region of the
torus.
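The spherical pendulum's Hamilton equations are easy to integrate numerically. The sketch below is my own illustration, not from the text: units with m = ℓ = g = 1, an arbitrary value of the conserved p_φ, and a hand-rolled Runge-Kutta step. It checks that the energy, one of the integrals of the motion, is conserved along the trajectory.

```python
import math

def rhs(s, m=1.0, l=1.0, g=1.0, pphi=0.3):
    # s = (theta, phi, ptheta); pphi is conserved, so it enters as a parameter
    th, ph, pth = s
    dth = pth / (m * l * l)
    dph = pphi / (m * l * l * math.sin(th) ** 2)
    dpth = pphi**2 * math.cos(th) / (m * l * l * math.sin(th) ** 3) - m * g * l * math.sin(th)
    return (dth, dph, dpth)

def rk4_step(s, h):
    # classical fourth-order Runge-Kutta step
    k1 = rhs(s)
    k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + h * k for x, k in zip(s, k3)))
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def energy(s, m=1.0, l=1.0, g=1.0, pphi=0.3):
    th, ph, pth = s
    return (pth**2 / (2 * m * l * l)
            + pphi**2 / (2 * m * l * l * math.sin(th) ** 2)
            - m * g * l * math.cos(th))

s = (1.0, 0.0, 0.0)          # released at theta = 1 rad with ptheta = 0
e0 = energy(s)
for _ in range(20000):       # integrate to t = 20
    s = rk4_step(s, 1e-3)
e1 = energy(s)
print(abs(e1 - e0))          # energy drift: tiny
```

The centrifugal term p_φ²/sin²θ keeps θ bounded away from 0 and π, so the trajectory stays on the invariant torus while φ winds around it.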
    F_1 = H = p_r²/2m + p_φ²/(2mr²) + ½ kr²,

and conserved momentum p_φ conjugate to the ignorable coordinate φ.
As before, pφ simply changes φ, as
shown in blue. But now if we trace
the action of H,
    dr/dt = p_r(t)/m,    dφ/dt = p_φ/(mr²),    dp_r/dt = p_φ²/(mr³(t)) − kr(t),
we get the red curve which closes
on itself after one revolution in φ
and two trips through the donut
hole. Thus the orbit is a closed
curve, and there is a relation among the frequencies. Of course the system now
only samples the points on the closed curve, so a time average of any function
on the trajectory is not the same as the average over the invariant torus.
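This closure is easy to verify on a computer. The sketch below is my own illustration (m = k = 1 and the starting point are arbitrary choices): it integrates the three equations of motion above and checks that after φ advances by 2π, r and p_r have returned to their initial values, so the orbit is a closed curve.

```python
import math

M, K, PPHI = 1.0, 1.0, 0.8   # mass, spring constant, conserved p_phi

def rhs(s):
    # s = (r, phi, pr); the equations of motion of the isotropic oscillator
    r, ph, pr = s
    return (pr / M, PPHI / (M * r * r), PPHI**2 / (M * r**3) - K * r)

def rk4(s, h):
    k1 = rhs(s)
    k2 = rhs(tuple(x + 0.5*h*k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + 0.5*h*k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + h*k for x, k in zip(s, k3)))
    return tuple(x + h/6*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

s = (1.3, 0.0, 0.0)          # start at a turning point of r (pr = 0)
h = 1e-4
while s[1] < 2 * math.pi:    # one full revolution in phi
    s = rk4(s, h)
r, ph, pr = s
print(abs(r - 1.3), abs(pr))  # orbit closes: two trips through r per revolution
```

During the loop r oscillates through two full radial cycles, the "two trips through the donut hole" described in the text.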
    H = p_r²/2m + p_θ²/(2mr²) + p_φ²/(2mr² sin²θ) + ½ kr² + cr⁴.
has zero Poisson bracket with H and Lz , so we can take it to be the third
generator
    F_3 = L² = (r⃗ × p⃗)² = r²p⃗² − (r⃗·p⃗)²
        = r²(p_r² + p_θ²/r² + p_φ²/(r² sin²θ)) − r²p_r²
        = p_θ² + p_φ²/sin²θ.
The full phase space is six dimensional, and as pφ is constant we are left,
in general, with a five dimensional space with two nonlinear constraints.
On the three-dimensional hypersurface, pφ generates motion only in φ, the
Hamiltonian generates the dynamical trajectory with changes in r, pr , θ, pθ
and φ, and F3 generates motion in θ, pθ and φ, but not in r or pr .
Now while Lx is not in involution with the three Fi already chosen, it is
a constant of the (dynamical) motion, as [Lx , H] = 0. But under the flow
generated by F2 = Lz , which generates changes in ηj proportional to [ηj , Lz ],
we have
³To avoid confusion, note that here F_1 is not the first integral of the motion.
7.2. CANONICAL PERTURBATION THEORY 207
where the sum is over all n-tuples of integers k⃗ ∈ Zⁿ. The zeros of the new
angles are arbitrary for each I⃗, so we may choose F_{1 0⃗}(I⃗) = 0.
The unperturbed action variables, on which H_0 depends, are the old
momenta given by I_i^(0) = ∂F/∂φ_i^(0) = I_i + ε ∂F_1/∂φ_i^(0) + ..., so to first order

    H_0(I⃗^(0)) = H_0(I⃗) + ε Σ_j (∂H_0/∂I_j)(∂F_1/∂φ_j^(0)) + ...
              = H_0(I⃗) + ε Σ_j ω_j^(0) Σ_k⃗ i k_j F_{1k⃗}(I⃗) e^{i k⃗·φ⃗^(0)} + ...,    (7.5)

where we have noted that ∂H_0/∂I_j = ω_j^(0), the frequencies of the unperturbed
problem. Thus
    H̃(I⃗, φ⃗) = H(I⃗^(0), φ⃗^(0)) = H_0(I⃗^(0)) + ε Σ_k⃗ H_{1k⃗}(I⃗^(0)) e^{i k⃗·φ⃗^(0)}
             = H_0(I⃗) + Σ_k⃗ (ε Σ_j i k_j ω_j^(0) F_{1k⃗}(I⃗) + ε H_{1k⃗}(I⃗^(0))) e^{i k⃗·φ⃗^(0)}.
The I⃗ are the action variables of the full Hamiltonian, so H̃(I⃗, φ⃗) is in fact
independent of φ⃗. In the sum over Fourier modes on the right hand side,
the φ⃗^(0) dependence of the terms in parentheses due to the difference of I⃗^(0)
from I⃗ is higher order in ε, so the coefficients of e^{i k⃗·φ⃗^(0)} may be considered
constants in φ⃗^(0) and therefore must vanish for k⃗ ≠ 0⃗. Thus the generating
function is given in terms of the Hamiltonian perturbation

    F_{1k⃗} = i H_{1k⃗} / (k⃗ · ω⃗^(0)(I⃗)),    k⃗ ≠ 0⃗.    (7.6)
We see that there may well be a problem in finding new action variables
if there is a relation among the frequencies. If the unperturbed system is
not degenerate, "most" invariant tori will have no relation among the fre-
quencies. For these values, the extension of the procedure we have described
to a full power series expansion in ε may be able to generate new action-
angle variables, showing that the system is still integrable. That this is true
for sufficiently small perturbations and "sufficiently irrational" ω_j^(0) is the
conclusion of the famous KAM theorem⁴.
⁴See Arnold[2], pp 404-405, though he calls it Kolmogorov's Theorem, denying credit
to himself and Moser, or José and Saletan[8], p. 477.
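The small denominators in (7.6) can be made concrete with a few lines of code (my illustration; the two frequency pairs are arbitrary examples): for near-resonant frequencies some k⃗·ω⃗ is tiny already at small |k⃗|, so the corresponding coefficient F_{1k⃗} ∝ 1/(k⃗·ω⃗) is huge, while for a "sufficiently irrational" frequency ratio the denominators stay bounded away from zero at low order.

```python
import math

omega_res = (1.0, 1.0001)          # nearly 1:1 resonant pair
omega_irr = (1.0, math.sqrt(2))    # irrational frequency ratio

def min_denominator(omega, kmax):
    # smallest |k . omega| over nonzero integer vectors with |k_i| <= kmax
    best = float("inf")
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if (k1, k2) == (0, 0):
                continue
            best = min(best, abs(k1 * omega[0] + k2 * omega[1]))
    return best

d_res = min_denominator(omega_res, 10)
d_irr = min_denominator(omega_irr, 10)
print(d_res, d_irr)   # the resonant pair produces a far smaller denominator
```

For the near-resonant pair the vector k⃗ = (1, −1) already gives |k⃗·ω⃗| = 10⁻⁴, so the first-order generating function acquires a coefficient four orders of magnitude larger than the perturbation itself.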
Then ψ1 and ψ2 are equally good choices for the angle variables of the unper-
turbed system, as ψi ∈ [0, 2π] is a good coordinate system on the torus. The
corresponding action variables are I′_i = Σ_j (B⁻¹)_ji I_j, and the corresponding
new frequencies are

    ω′_i = ∂H/∂I′_i = Σ_j (∂H/∂I_j)(∂I_j/∂I′_i) = Σ_j B_ij ω_j^(0),

and so in particular ω′_1 = p ω_1^(0) + q ω_2^(0) = 0 on the chosen invariant torus.
This conclusion is also obvious from the equations of motion φ̇i = ωi .
In the unperturbed problem, on our initial invariant torus, ψ1 is a constant
of the motion, so in the perturbed system we might expect it to vary slowly
with respect to ψ2 . Then it is appropriate to use the adiabatic approximation
of section 7.3
    K(Q, P, t) = H_0 + εH_I + ∂S_0/∂t = εH_I,
    Q̇ = ε ∂H_I/∂P,    Ṗ = −ε ∂H_I/∂Q,

and these are slowly varying because ε is small. In symplectic form, with
ζ^T = (Q, P), we have, of course,
ζ̇_n on the left of (7.7) can be determined from only lower order terms ζ_j,
j < n on the right hand side. The initial value ζ(0) is arbitrary, so we can
take it to be ζ_0(0), and determine ζ_n(t) = ∫_0^t ζ̇_n(t′)dt′ accurate to order εⁿ.
Thus we can recursively find higher and higher order terms in ε. This is a
good expansion for ε small enough, for fixed t, but as we are making an error
in ζ̇, this will give an error of order εt compared to the previous stage, so the
total error at the mth step is O([εt]^m) for ζ(t). Thus for calculating the long
time behavior of the motion, this method is unlikely to work in the sense
that any finite order calculation cannot be expected to be good for t → ∞.
Even though H and H0 differ only slightly, and so acting on any given η they
will produce only slightly different rates of change, as time goes on there is
nothing to prevent these differences from building up. In a periodic motion,
for example, the perturbation is likely to make a change ∆τ of order ε in the
period τ of the motion, so at a time t ∼ τ²/2∆τ later, the systems will be at
opposite sides of their orbits, not close together at all.
Clearly a better approximation scheme is called for, one in which ζ(t) is
compared to ζ0 (t0 ) for a more appropriate time t0 . The canonical method
does this, because it compares the full Hamiltonian and the unperturbed one
at given values of φ, not at a given time. Another example of such a method
applies to adiabatic invariants.
    ∫_S dω = ∮_∂S ω,

[Fig. 1. The orbit of an autonomous system in phase space.]

true for any n-form ω and suitable region S of a manifold, we have
2πJ = ∫_A dp ∧ dq, where A is the area bounded by Γ.
In extended phase space {q, p, t}, if we start at time t=0 with any point
(q, p) on Γ, the trajectory swept out by the equations of motion, (q(t), p(t), t)
will lie on the surface of a cylinder with base A extended in the time direction.
Let Γt be the embedding of Γ into the time slice at t, which is the intersection
of the cylinder with that time slice. The surface of the cylinder can also be
viewed as the set of all the dynamical trajectories which start on Γ at t = 0.
[Figure: the cylinder of trajectories swept out in extended phase space (q, p, t).]
In other words, if T_φ(t) is the trajectory of the system which starts at Γ(φ)
at t = 0, the set of T_φ(t) for φ ∈ [0, 2π], t ∈ [0, T], sweeps out the same
surface as {Γ_t}, for all t ∈ [0, T]. Because this is an autonomous system,
the value
⁵Of course it is possible that after some time, which must be on a time scale of order T_V
rather than the much shorter cycle time τ, the trajectories might intersect, which would
require the system to reach a critical point in phase space. We assume that our final time
T is before the system reaches a critical point.
7.3. ADIABATIC INVARIANTS 213
the cylinder are the same. Again from Stokes’ theorem, they are
    J̃(0) = ∮_{Γ_0} p dq = ∫_{Σ_0} dp ∧ dq    and    J̃(T) = ∫_{Σ_T} dp ∧ dq,

where the first equality is due to Gauss' law, one form of the generalized
Stokes' theorem. Then we have

    J̃(T) = ∫_{Σ_T} dω_3 = ∫_{Σ_0} dω_3 = J̃(0).
What we have shown here for the area in phase space enclosed by an orbit
holds equally well for any area in phase space. If A is a region in phase space,
and if we define B as that region in phase space in which systems will lie at
time t = T if the system was in A at time t = 0, then ∫_A dp ∧ dq = ∫_B dp ∧ dq.

While we have shown that the integral ∮ p dq is conserved when evaluated
over an initial contour in phase space at time t = 0, and then compared
to its integral over the path at time t = T given by the time evolution of
the ensembles which started on the first path, neither of these integrals is
exactly an action.
The trajectory of a single such system as it moves through phase space is
shown in the figure. [Figure: the phase-space orbit of the oscillator.] The
area enclosed is

    2πJ = π p_max q_max = π mω q_max²,

where

    p²/2m + (mω²/2) q² = E = (mω²/2) q_max²,

so we can write an expression for the action as a function on extended phase
space,

    J = ½ mω q_max² = E/ω = p²/(2mω(t)) + (mω(t)/2) q².
With this definition, we can assign a value for the action to the system at
each time, which in the autonomous case agrees with the standard action.
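This adiabatic invariance of J = E/ω is easy to check numerically. The sketch below is mine (the chirp rate, step size, and hand-rolled RK4 are arbitrary choices): it integrates q̈ = −ω(t)²q with a slowly growing ω(t) = 1 + εt and watches E/ω stay nearly fixed while E itself grows substantially.

```python
import math

eps = 0.005   # slow variation: omega doubles over t = 200

def omega(t):
    return 1.0 + eps * t

def rhs(t, s):
    q, p = s                      # m = 1
    return (p, -omega(t)**2 * q)

def rk4(t, s, h):
    k1 = rhs(t, s)
    k2 = rhs(t + h/2, tuple(x + h/2*k for x, k in zip(s, k1)))
    k3 = rhs(t + h/2, tuple(x + h/2*k for x, k in zip(s, k2)))
    k4 = rhs(t + h, tuple(x + h*k for x, k in zip(s, k3)))
    return tuple(x + h/6*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def energy(t, s):
    q, p = s
    return p*p/2 + omega(t)**2 * q*q / 2

t, s, h = 0.0, (1.0, 0.0), 0.002
j0 = energy(t, s) / omega(t)
for _ in range(100000):           # integrate to t = 200
    s = rk4(t, s, h)
    t += h
jf = energy(t, s) / omega(t)
print(j0, jf, energy(t, s))       # J barely moves; E has roughly doubled
```

The energy is pumped up in proportion to ω, exactly as E = Jω with J held fixed requires.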
[Figure: the ensemble of systems in phase space for t > 0.] ... the
Hamiltonian is the same as it was before t = 0, and each system's path in
phase space con-
Each initial system which started at φ⃗_0 winds up on some new invariant torus
with g⃗(φ⃗_0).
If the variation of the hamiltonian is sufficiently slow and smoothly vary-
ing on phase space, and if the unperturbed motion is sufficiently ergodic that
each system samples the full invariant torus on a time scale short compared
to the variation time of the hamiltonian, then each initial system φ⃗_0 may
be expected to wind up with the same values of the perturbed actions, so
g⃗ is independent of φ⃗_0. That means that the torus B is, to some good ap-
proximation, one of the invariant tori M′_{g⃗}, that the cycles of B are cycles of
M′_{g⃗}, and therefore that J′_i = J̃_i = J_i, and each of the actions is an adiabatic
invariant.
    K(φ⃗, I⃗, λ⃗) = H(I⃗, λ⃗) + Σ_n (∂F_1/∂λ_n)(dλ_n/dt),
where the second term is the expansion of ∂F_1/∂t by the chain rule. The
equations of motion involve differentiating K with respect to one of the vari-
ables (φ_j, I_j) holding the others, and time, fixed. While these are not the
usual variables (q⃗, φ⃗) for F_1, they are coordinates of phase space, so F_1 can
be expressed in terms of (φ_j, I_j), and as shown in (7.2), it is periodic in the
φ_j. The equation of motion for I_j is
    φ̇_i = ω_i(λ⃗) + Σ_n λ̇_n ∂²F_1/∂λ_n∂I_i,
    İ_i = −Σ_n λ̇_n ∂²F_1/∂λ_n∂φ_i,
where all the partial derivatives are with respect to the variables φ⃗, I⃗, λ⃗. We
first note that if the parameters λ are slowly varying, the λ̇_n's in the equations
of motion make the deviations from the unperturbed system small, of first
order in ε/τ = λ̇/λ, where τ is a typical time for oscillation of the system.
But in fact the constancy of the action is better than that, because the
expression for İ_j is predominantly an oscillatory term with zero mean. This
is most easily analyzed when the unperturbed system is truly periodic, with
period τ. Then during one period t ∈ [0, τ], λ̇(t) ≈ λ̇(0) + tλ̈. Assuming
λ(t) varies smoothly on a time scale τ/ε, λ̈ ∼ λ O(ε²/τ²), so if we are willing
to drop terms of order ε², we may treat λ̇ as a constant. We can then also
evaluate F_1 on the orbit of the unperturbed system, as that differs from the
true orbit by order ε, and the resulting value is multiplied by λ̇, which is
already of order ε/τ, and the result is to be integrated over a period τ. Then
we may write the change of I_j over one period as
    ∆I_j ≈ −Σ_n ∫_0^τ λ̇_n (∂/∂φ_j)(∂F_1/∂λ_n) dt.
But F1 is a well defined single-valued function on the invariant manifold, and
so are its derivatives with respect to λn , so we may replace the time integral
by an integral over the orbit,
    ∆I_j ≈ −Σ_n (τ/L) λ̇_n ∮ (∂/∂φ_j)(∂F_1/∂λ_n) dφ_j = 0,
where L is the length of the orbit, and we have used the fact that for the
unperturbed system dφj /dt is constant.
Thus the action variables have oscillations of order ε, but these variations
do not grow with time. Over a time t, ∆I⃗ = O(ε) + t O(ε²/τ), and is therefore
conserved up to order ε even for times as large as τ/ε, corresponding to
many natural periods, and also corresponding to the time scale on which the
Hamiltonian is varying significantly.
This form of perturbation, corresponding to variation of constants on a
time scale slow compared to the natural frequencies of the unperturbed sys-
tem, is known as an adiabatic variation, and a quantity conserved to order
ε over times comparable to the variation time itself is called an adiabatic in-
variant. Classic examples include ideal gases in a slowly varying container,
a pendulum of slowly varying length, and the motion of a rapidly moving
charged particle in a strong but slowly varying magnetic field. It is inter-
esting to note that in Bohr-Sommerfeld quantization in the old quantum
mechanics, used before the Schrödinger equation clarified such issues, the
quantization of bound states was related to quantization of the action. For
example, in Bohr theory the electrons are in states with action nh, with n a
positive integer and h Planck’s constant. Because these values are preserved
under adiabatic perturbation, it is possible that an adiabatic perturbation
of a quantum mechanical system maintains the system in the initial quan-
tum mechanical state, and indeed this can be shown, with the full quantum
theory, to be the case in general. An important application is cooling by
adiabatic demagnetization. Here atoms with a magnetic moment are placed
in a strong magnetic field and reach equilibrium according to the Boltzmann
distribution for their polarizations. If the magnetic field is adiabatically re-
duced, the separation energies of the various polarization states are reduced
proportionally. As the distribution of polarization states remains the same
for the adiabatic change, it now fits a Boltzmann distribution for a tempera-
ture reduced proportionally to the field, so the atoms have been cooled.
changes in velocity and position over a small oscillation time. Then we might
expect the effects of the force to be little more than adding jitter to the unper-
turbed motion. Consider the case that the external force is a pure sinusoidal
oscillation,
H(~q, p~) = H0 (~q, p~) + U (~q) sin ωt,
and let us write the resulting motion as

    q(t) = q̄(t) + ξ⃗(t),    p(t) = p̄(t) + η⃗(t),

where we subtract out the average smoothly varying functions q̄ and p̄, leav-
ing the rapidly oscillating pieces ξ⃗ and η⃗, which have natural time scales of
2π/ω. Thus ξ̈, ωξ̇, ω²ξ, η̇ and ωη should all remain finite as ω gets large with
all the parameters of H_0 and U(q) fixed. Our naïve expectation is that the
q̄(t) and p̄(t) are what they would have been in the absence of the perturba-
tion, and ξ(t) and η(t) are purely due to the oscillating force.
This is not exactly right, however, because the force due to H0 depends
on the q and p at which it is evaluated, and it is being evaluated at the full
q(t) and p(t) rather than at q̄(t) and p̄(t). In averaging over an oscillation,
the first derivative terms in H0 will not contribute to a change, but the
second derivative terms will cause the average value of the force to differ
from its value at (q̄(t), p̄(t)). The lowest order effect (O(ω −2 )) is from the
oscillation of p(t), with η ∝ ω −1 ∂U/∂q, changing the average force by an
amount proportional to η 2 times ∂ 2 H0 /∂pk ∂p` . We shall see that a good
approximation is to take q̄ and p̄ to evolve with the effective “mean motion
Hamiltonian”
    K(q̄, p̄) = H_0(q̄, p̄) + (1/4ω²) Σ_{kℓ} (∂U/∂q̄_k)(∂U/∂q̄_ℓ)(∂²H_0/∂p̄_k∂p̄_ℓ).    (7.11)
Of course the full motion for q(t) and p(t) is given by the full Hamiltonian
equations:

    q̄̇_j + ξ̇_j = ∂H_0/∂p_j |_{q,p}
              = ∂H_0/∂p_j |_{q̄,p̄} + Σ_k ξ_k ∂²H_0/∂p_j∂q_k |_{q̄,p̄} + Σ_k η_k ∂²H_0/∂p_j∂p_k |_{q̄,p̄}
                + ½ Σ_{kℓ} η_k η_ℓ ∂³H_0/∂p_j∂p_k∂p_ℓ |_{q̄,p̄} + O(ω⁻³),

    p̄̇_j + η̇_j = −∂H_0/∂q_j |_{q,p} − ∂U/∂q_j |_{q,p} sin ωt
              = −∂H_0/∂q_j |_{q̄,p̄} − Σ_k ξ_k ∂²H_0/∂q_j∂q_k |_{q̄,p̄} − Σ_k η_k ∂²H_0/∂q_j∂p_k |_{q̄,p̄}
                − ½ Σ_{kℓ} η_k η_ℓ ∂³H_0/∂q_j∂p_k∂p_ℓ |_{q̄,p̄} − ∂U/∂q_j |_{q̄} sin ωt
                − Σ_k ξ_k ∂²U/∂q_j∂q_k |_{q̄} sin ωt + O(ω⁻³).    (7.13)
    ∆η_j = ∫_{t−τ/2}^{t+τ/2} η̇_j(t′) dt′
         = −(2π/ω²) cos ωt Σ_k (∂²U/∂q_j∂q_k)(∂H_0/∂p_k) − (2π/ω) Σ_k <η_k> ∂²H_0/∂q_j∂p_k |_{q̄,p̄}
           − (π/ω) Σ_{kℓ} (<η_k η_ℓ> − (1/2ω²)(∂U/∂q̄_k)(∂U/∂q̄_ℓ)) ∂³H_0/∂q_j∂p_k∂p_ℓ |_{q̄,p̄}
           − (2π/ω) Σ_k (<ξ_k sin ωt> − (1/2ω²) Σ_ℓ (∂U/∂q_ℓ)(∂²H_0/∂p_k∂p_ℓ)) ∂²U/∂q_j∂q_k |_{q̄} + O(ω⁻⁴).
We need

    <η_k η_ℓ> = (ω/2π) ∫_{t−τ/2}^{t+τ/2} (1/ω²)(∂U/∂q_k)(∂U/∂q_ℓ) cos²ωt′ dt′
              = (1/2ω²)(∂U/∂q_k)(∂U/∂q_ℓ),

    <ξ_k sin ωt> = (ω/2π) ∫_{t−τ/2}^{t+τ/2} (1/ω²) Σ_ℓ (∂U/∂q̄_ℓ)(∂²H_0/∂p_k∂p_ℓ) sin²ωt′ dt′
                 = (1/2ω²) Σ_ℓ (∂U/∂q̄_ℓ)(∂²H_0/∂p_k∂p_ℓ).
These, together with our requirement <ηk > = 0, show that all the terms
vanish except
    ∆η_j = −(2π/ω²) cos ωt Σ_k (∂²U/∂q_j∂q_k)(∂H_0/∂p_k).
7.4. RAPIDLY VARYING PERTURBATIONS 227
Thus the system evolves as if with the mean field hamiltonian, with a
small added oscillatory motion which does not grow (to order ω⁻² for q(t))
with time.
We have seen that there are excellent techniques for dealing with pertur-
bations which are either very slowly varying modifications of a system which
would be integrable were the parameters not varying, or with perturbations
which are rapidly varying (with zero mean) compared to the natural motion
of the unperturbed system.
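Equation (7.11) can be tested directly on a computer. The sketch below is my own (H_0 = p²/2m for a free particle and U(q) = cos q are arbitrary choices; m = 1): with ∂²H_0/∂p² = 1/m the mean-motion Hamiltonian reduces to K = p̄²/2m + U′(q̄)²/4mω², the familiar ponderomotive potential, and the averaged trajectory of the rapidly driven system should track the trajectory generated by K. The full system is started with p(0) equal to the leading fast oscillation η(0) = −sin(q̄)/ω, so that p̄(0) = 0 matches the mean-motion initial condition.

```python
import math

OMEGA = 50.0   # fast drive; m = 1, U(q) = cos(q), so U'(q) = -sin(q)

def rk4(rhs, t, s, h):
    k1 = rhs(t, s)
    k2 = rhs(t + h/2, tuple(x + h/2*k for x, k in zip(s, k1)))
    k3 = rhs(t + h/2, tuple(x + h/2*k for x, k in zip(s, k2)))
    k4 = rhs(t + h, tuple(x + h*k for x, k in zip(s, k3)))
    return tuple(x + h/6*(a + 2*b + 2*c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))

def full(t, s):
    # H = p^2/2 + cos(q) sin(omega t):  qdot = p, pdot = sin(q) sin(omega t)
    q, p = s
    return (p, math.sin(q) * math.sin(OMEGA * t))

def mean(t, s):
    # K = p^2/2 + sin^2(q)/(4 omega^2):  pdot = -sin(2q)/(4 omega^2)
    q, p = s
    return (p, -math.sin(2 * q) / (4 * OMEGA**2))

sf = (0.7, -math.sin(0.7) / OMEGA)   # full system: q(0) = 0.7, p(0) = eta(0)
sm = (0.7, 0.0)                      # mean-motion system
t, h = 0.0, 5e-4
for _ in range(120000):              # integrate both to t = 60
    sf = rk4(full, t, sf, h)
    sm = rk4(mean, t, sm, h)
    t += h
print(sf[0], sm[0])   # the averaged motion tracks the mean-motion Hamiltonian
```

Both copies slide noticeably toward the minimum of the ponderomotive potential, and they agree to within the small jitter ξ of order ω⁻².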
Exercises
7.1 Consider the harmonic oscillator H = p²/2m + ½mω²q² as a perturbation
on a free particle H_0 = p²/2m. Find Hamilton's principal function S(q, P) which
generates the transformation of the unperturbed hamiltonian to Q, P, the initial
position and momentum. From this, find the Hamiltonian K(Q, P, t) for the full
harmonic oscillator, and thus the equations of motion for Q and P. Solve these iter-
atively, assuming P(0) = 0, through fourth order in ω. Express q and p to this
order, and compare to the exact solution for a harmonic oscillator.
7.2 Consider the Kepler problem in two dimensions. That is, a particle of (re-
duced) mass µ moves in two dimensions under the influence of a potential
    U(x, y) = −K/√(x² + y²).
This is an integrable system, with two integrals of the motion which are in invo-
lution. In answering this problem you are expected to make use of the explicit
solutions we found for the Kepler problem.
a) What are the two integrals of the motion, F_1 and F_2, in more familiar terms
and in terms of explicit functions on phase space?
b) Show that F1 and F2 are in involution.
c) Pick an appropriate η_0 ∈ M_f⃗, and explain how the coordinates t⃗ are related
to the phase space coordinates η = g^t⃗(η_0). This discussion may be somewhat
qualitative, assuming we both know the explicit solutions of Chapter 3, but it
should be clearly stated.
d) Find the vectors ~ei which describe the unit cell, and give the relation between
the angle variables φi and the usual coordinates η. One of these should be explicit,
while the other may be described qualitatively.
e) Comment on whether there are relations among the frequencies and whether
this is a degenerate system.
7.3 Consider a mass m hanging at the end of a length of string which passes
through a tiny hole, forming a pendulum. The length of string below the hole, ℓ(t),
is slowly shortened by someone above the hole pulling on the string. How does
the amplitude (assumed small) of the oscillation of the pendulum depend on time?
(Assume there is no friction.)
7.5 Consider a particle of mass m and charge q in the field of a fixed electric dipole
with moment p~. Using spherical coordinates with the axis in the p~ direction, the
potential energy is given by
    U(r⃗) = (1/4πε₀)(qp/r²) cos θ.
There is no explicit t or φ dependence, so H and pφ = Lz are conserved.
a) Show that

    A = p_θ² + p_φ²/sin²θ + (qpm/2πε₀) cos θ
is also conserved.
b) Given these three conserved quantities, what else must you show to find if this
is an integrable system? Is it true? What, if any, conditions are there for the
motion to be confined to an invariant torus?
Chapter 8
Field Theory
230 CHAPTER 8. FIELD THEORY
well as time. Thus the generalized coordinates are the functions ηi (x, y, z, t),
and the Lagrangian density will depend on these, their gradients, their time
derivatives, as well as possibly on x, y, z, t. Thus

    L = L(η_i, ∇η_i, η̇_i; x, y, z, t),

and

    L = ∫ dx dy dz L,    I = ∫ dx dy dz dt L.
¹Note in particular that {η_i} is not the set of coordinates of phase space as it was in
the last chapter.
and

    δI = ∫ (Σ_i (∂L/∂η_i) δη_i + Σ_i Σ_{μ=0}^{3} (∂L/∂η_{i,μ}) δη_{i,μ}) d⁴x
       = ∫ Σ_i (∂L/∂η_i − Σ_μ ∂_μ(∂L/∂η_{i,μ})) δη_i d⁴x,

where we have thrown away the boundary terms which involve δη_i evaluated
on the boundary, which we assume to be zero. Inside the region of integration,
the δη_i are independent, so requiring δI = 0 for all functions δη_i(x^μ) implies

    ∂_μ (∂L/∂η_{i,μ}) − ∂L/∂η_i = 0.    (8.1)
We have written the equations of motion (which are now partial differ-
ential equations rather than coupled ordinary differential equations), in a
form which looks like we are dealing with a relativistic problem, because t
and spatial coordinates are entering in the same way. We have not made
any assumption of relativity, however, and our problem will not be relativis-
tically invariant unless the Lagrangian density is invariant under Lorentz
transformations (as well as translations).
Now consider how the Lagrangian changes from one point in space-time
to another, including the variation of the fields, assuming the fields obey the
equations of motion. Then the total derivative for a variation of xµ is
    dL/dx^μ = ∂L/∂x^μ |_η + Σ_i (∂L/∂η_i) η_{i,μ} + Σ_{i,ν} (∂L/∂η_{i,ν}) η_{i,ν,μ}.
As we did previously with d/dt, we are using “total” derivative notation
d/dxµ to represent the variation from a change in one xµ , including the
changes induced in the fields which are the arguments of L, though it is still
a partial derivative in the sense that the other three xν need to be held fixed
while varying xµ .
Plugging the equations of motion into the second term,

    dL/dx^μ = ∂L/∂x^μ + Σ_i ∂_ν(∂L/∂η_{i,ν}) η_{i,μ} + Σ_i (∂L/∂η_{i,ν}) η_{i,μ,ν}
            = ∂L/∂x^μ + ∂_ν (Σ_i (∂L/∂η_{i,ν}) η_{i,μ}).
8.1. LAGRANGIAN MECHANICS FOR FIELDS 233
Thus

    ∂_ν T_μ^ν = −∂L/∂x^μ,    (8.2)

where the stress-energy tensor T_μ^ν is defined by

    T_μ^ν(x) = Σ_i (∂L/∂η_{i,ν}) η_{i,μ} − L δ_μ^ν.    (8.3)
    ∂ρ/∂t + ∇·(ρv⃗) = 0,

which expresses the conservation of mass. That equation has the interpreta-
tion that the rate of change in the mass contained in some volume is equal
to the flux into the volume, because ρ~v is the flow of mass outward past a
unit surface area. In general, if we have a scalar field ρ(x⃗, t) which, together
with a vector field j⃗(x⃗, t), satisfies the equation
∂ρ
(~x, t) + ∇ · ~j(~x, t) = 0, (8.4)
∂t
we can interpret ρ as the density of, and j⃗ as the flow of, a material property
which is conserved. Given any volume V with a boundary surface S, the rate
at which this property is flowing out of the volume, ∮_S j⃗ · dS⃗ = ∫_V ∇·j⃗ dV,
is the rate at which the total amount of the substance in the volume is
decreasing, −∫_V (∂ρ/∂t) dV. If we define j⁰ = cρ, we can rewrite this equation
of continuity (8.4), as Σ_ν ∂_ν j^ν = 0, and we say that j^ν is a conserved current².
²More accurately, the set of four fields j^ν(x⃗, t) is a conserved current.
If we integrate over the whole volume of our field, we can define a total
"charge" Q(t) = ∫_V (j⁰(x⃗, t)/c) d³x, and its time derivative is

    dQ/dt = ∫_V (∂ρ/∂t)(x⃗, t) d³x = −∫_V ∇·j⃗(x⃗, t) d³x = −∮_S j⃗ · dS⃗.
We see that this is the integral of the divergence of a vector current ~j, which
by Gauss’ law becomes a surface integral of the flux of j out of the volume
of our system. We have been sloppy about our boundary conditions, but
in many cases it is reasonable to assume there is no flux out of the entire
volume, either because of boundary conditions, as in a stretched string, or
because we are working in an infinite space and expect any flux to vanish at
infinity. Then the surface integral vanishes, and we find that the charge is
conserved.
We have seen that when the lagrangian density has no explicit xµ depen-
dence, for each value of µ, Tµ ν represents such a conserved current. Thus
we should have four conserved currents (Jµ )ν := Tµ ν , each of which gives a
conserved “charge”
    Q_μ(t) = ∫_V T_μ^0(x⃗, t) d³x = constant.
continuum limit of the loaded string, we noted that the momentum corre-
sponding to each point particle (of vanishing mass) disappears in the limit,
but the appropriate thing to do is define a momentum density
    P(x) = δL/δẏ(x) = (δ/δẏ(x)) ∫ L(y(x′), ẏ(x′), x′, t) dx′ = ∂L/∂ẏ |_x,

having defined both the "variation at a point" δ/δẏ(x) and the lagrangian density
L. In considering the three dimensional continuum as a limit, say on a cubic
lattice, L = ∫ d³x L is the limit of Σ_{ijk} ∆x∆y∆z L_{ijk}, where L_{ijk} depends on
η⃗_{ijk} and a few of its neighbors, and also on η⃗̇_{ijk}. The conjugate momentum
to η⃗(i, j, k) is p⃗_{ijk} = ∂L/∂η⃗̇_{ijk} = ∆x∆y∆z ∂L_{ijk}/∂η⃗̇_{ijk}, which would vanish
in the continuum limit. So we define instead the momentum density

    π_ℓ(x, y, z) = (p⃗_{ijk})_ℓ/∆x∆y∆z = ∂L_{ijk}/∂(η⃗̇_{ijk})_ℓ = ∂L/∂η̇_ℓ(x, y, z).
The Hamiltonian

    H = ∫ H(r⃗) d³r,

where the Hamiltonian density is defined by H(r⃗) = π⃗(r⃗) · η⃗̇(r⃗) − L(r⃗).
This assumed the dynamical fields were the vector displacements ~η (~r, t), but
the same discussion applies to any set of dynamical fields η` (~r, t), even if η
refers to some property other than a displacement. Then

    H(r⃗) = Σ_ℓ π_ℓ(r⃗) η̇_ℓ(r⃗) − L(r⃗),

where

    π_ℓ(r⃗) = ∂L/∂η̇_ℓ(r⃗) = (1/c) ∂L/∂η_{ℓ,0}(r⃗).
    −dE/dt = c ∮_S η⃗_{,0} · P · dS⃗ = c ∫_V Σ_{ij} ∂_j (η_{i,0} P_{ij})
           = c ∫_V Σ_j ∂_j T_0^j = c ∫_V Σ_{ij} ∂_j (η_{i,0} ∂L/∂η_{i,j})
where in the last step we used the equations of motion. If it were not for
the last term, we would take this as expected, because we would expect, if
the Lagrangian is of the usual form, that the momentum density would be
∂L/∂η̇_i = ∂L/∂(cη_{i,0}). We will return to the interpretation of this last term
after we discuss what happens in its absence.
Cyclic coordinates
In discrete mechanics, when L was independent of a coordinate qi , even
though it depended on q̇i , we called the coordinate cyclic or ignorable, and
found a conserved momentum conjugate to it. In particular, if we use the
center-of-mass coordinates in an isolated system those will be ignorable co-
ordinates and the conserved momentum of the system will be their conjugate
variables. In field theory, however, the center of mass is not a suitable dy-
namical variable. The variables are not ~x but ηi (~x, t). For fields in general,
L(η, η̇, ∇η) depends on spatial derivatives of η as well, and we may ask whether
we need to require absence of dependence on ∇η for a coordinate to be cyclic.
Independence of both η and ∇η implies independence of an infinite number
of discrete coordinates, the values of η(r⃗) at every point r⃗, which is too
restrictive a condition for our discussion. We will call a coordinate field ηi
cyclic if L does not depend directly on ηi , although it may depend on its
derivatives η̇i and ∇ηi .
which constitutes continuity equations for the densities πi (~r, t) and currents
(~ji )` = ∂L/∂ηi,j . If we integrate this equation over all space, and define
Z
Πi (t) = πi (~r)d3 r,
If we assume the spatial boundary conditions are such that we may ignore this
boundary term, we see that the Πi (t) will be constants of the motion. These
are the total canonical momentum conjugate to η, and not, except when η
represents a displacement, the components of the total ordinary momentum
of the system.
If we considered our continuum with η_i representing the displacement, and
placed it in a gravitational field, we would have an additional potential energy
∫_V ρgη_3, and our equation for dπ_i/dt would have an extra term corresponding
to the volume force:

    ∆V dπ_i/dt = F_i^vol + F_i^surf = ∆V (−Σ_j ∂_j (∂L/∂η_{i,j}) + ∂L/∂η_i),

so

    F_i^vol = ∆V ∂L/∂η_i = −ρg ê_z ∆V,
as expected, and the total momentum is not conserved.
From equation (8.3) we found that if L is independent of ~x, the stress-
energy tensor gives conserved currents. Linear momentum conservation in
field dynamics is connected not to ignorable coordinates but to a lack of
dependence on the labels. This is best viewed as an invariance under a
transformation of all the fields, ηi (~x) → ηi (~x + ~a), for a constant vector
~a. This is a change in the integrand which can be undone by a change in
or

    ρ η⃗̈ = (α/3 + β/6) ∇(∇·η⃗) + (β/2) ∇²η⃗ + E⃗,

in agreement with (5.6).
where c is the speed of light in vacuum. This looks something like the
Pythagorean length, except that the time component is scaled and has the
wrong sign. The scaling is not a problem; we could just choose to define
x⁰ = ct and measure time with x⁰ in meters. Then we can treat the space-
time coordinates as a four-vector⁴ x^μ = (ct, x, y, z). The minus sign is more
significant, so that (ds)² is not a true length. We introduce the Minkowski
metric tensor

    η_{μν} = diag(−1, 1, 1, 1),
³The student who has not learned about Einstein's theory is referred to Smith ([15])
or French ([5]) for elementary introductions.
⁴Actually x^μ is a position in space-time and not truly a vector, a distinction discussed
in section (1.2.1) but not important here.
so we can write⁵

    (ds)² = Σ_{μν} η_{μν} dx^μ dx^ν.

Notice we have defined x^μ with superscripts rather than subscripts, and any
vector (or tensor) with such indices is said to be contravariant. From any
such vector V^μ we can also define a covariant vector

    V_μ = Σ_ν η_{μν} V^ν.
ν
With this four dimensional notation we see that time translation and
spatial translations are unified in xµ → xµ + cµ , and rotations are just special
cases of Lorentz transformations, with

    Λ^μ_ν = ⎛ 1  0⃗ᵀ ⎞
            ⎝ 0⃗  R  ⎠ ,

where R is a 3 × 3 rotation matrix acting on the spatial components.
As for rotations, we may ask how objects transform under Lorentz transformations. For rotations, we saw that in addition to scalars and vectors, we may have tensors with multiple indices. The same is true in relativity: a large class of covariant objects may be written in terms of multiple indices, and the transformation properties are simply multiplicative. First of all, how does a covariant vector transform? From V'^\mu = \Lambda^\mu{}_\nu V^\nu and the lowered forms V'_\rho = \eta_{\rho\mu} V'^\mu = \eta_{\rho\mu}\Lambda^\mu{}_\nu V^\nu = \eta_{\rho\mu}\Lambda^\mu{}_\nu\eta^{\nu\sigma} V_\sigma, we see that V'_\rho = \Lambda_\rho{}^\sigma V_\sigma, where we have used \eta's to lower and raise the indices on the Lorentz matrix, \Lambda_\rho{}^\sigma = \eta_{\rho\mu}\Lambda^\mu{}_\nu\eta^{\nu\sigma}. So we see that covariant indices transform with \Lambda_\rho{}^\sigma. Note that \Lambda_\rho{}^\sigma\Lambda^\rho{}_\tau = \eta_{\rho\mu}\Lambda^\mu{}_\nu\eta^{\nu\sigma}\Lambda^\rho{}_\tau = \eta_{\tau\nu}\eta^{\nu\sigma} = \delta^\sigma_\tau, so \Lambda_\rho{}^\sigma = (\Lambda^{-1})^\sigma{}_\rho. Note also that the order of indices matters, \Lambda^\mu{}_\nu \neq \Lambda_\nu{}^\mu.
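These transformation rules are easy to verify numerically. The sketch below (Python with NumPy, not from the text; the boost velocity \beta = 0.6 is an arbitrary choice) builds an explicit boost, checks the pseudo-orthogonality condition, and confirms that lowered components transform with \eta\Lambda\eta^{-1} = (\Lambda^{-1})^T:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# an explicit Lorentz boost along x with beta = 0.6 (arbitrary choice)
beta = 0.6
g = 1.0 / np.sqrt(1.0 - beta**2)
Lam = np.array([[g, -g * beta, 0, 0],
                [-g * beta, g, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

# pseudo-orthogonality: Lam^mu_nu eta_{mu rho} Lam^rho_tau = eta_{nu tau}
assert np.allclose(Lam.T @ eta @ Lam, eta)

V_up = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary contravariant components
V_dn = eta @ V_up                       # lowered components

Vp_up = Lam @ V_up                      # contravariant components transform with Lam
Vp_dn = eta @ Vp_up                     # lowered components of the transformed vector

# covariant indices transform with the index-shuffled matrix eta Lam eta^{-1},
# which equals (Lam^{-1})^T
assert np.allclose(Vp_dn, np.linalg.inv(Lam).T @ V_dn)
assert np.allclose(eta @ Lam @ np.linalg.inv(eta), np.linalg.inv(Lam).T)
```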
Now more generally we may define a multiply-indexed tensor T^{\mu_1\ldots\mu_j}{}_{\nu_1\ldots\nu_k}{}^{\mu_{j+1}\ldots\mu_\ell}, and it will transform with each index suitably transformed:
\[ T'^{\,\mu'_1\ldots\mu'_j}{}_{\nu'_1\ldots\nu'_k}{}^{\mu'_{j+1}\ldots\mu'_\ell} = \prod_{i=1}^{\ell}\Lambda^{\mu'_i}{}_{\mu_i}\;\prod_{n=1}^{k}\Lambda_{\nu'_n}{}^{\nu_n}\; T^{\mu_1\ldots\mu_j}{}_{\nu_1\ldots\nu_k}{}^{\mu_{j+1}\ldots\mu_\ell}. \tag{8.6} \]
invariant,
\[ P^\mu P^\nu \eta_{\mu\nu} = \vec p^{\;2} - E^2/c^2 = -m^2c^2. \]
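This invariance can be confirmed numerically; the sketch below (Python with NumPy, not part of the text; the momentum, mass, and boost velocity are arbitrary choices, in units where the four-momentum is written as P^\mu = (E/c, \vec p)) boosts a four-momentum and checks that P^\mu P^\nu\eta_{\mu\nu} is unchanged:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def boost_x(beta):
    """Lorentz boost along x with velocity beta = v/c."""
    g = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * beta
    return L

# four-momentum for a particle with m c = 2 and p_y = 3 (arbitrary numbers):
# E/c = sqrt((m c)^2 + p^2)
P = np.array([np.sqrt(4.0 + 9.0), 0.0, 3.0, 0.0])

inv = P @ eta @ P            # P^mu P^nu eta_{mu nu} = p^2 - E^2/c^2
P2 = boost_x(0.6) @ P        # the same momentum seen from a boosted frame
inv2 = P2 @ eta @ P2

assert np.isclose(inv, -4.0)     # equals -(m c)^2
assert np.isclose(inv2, inv)     # unchanged by the boost
```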
We are going to be interested in infinitesimal Lorentz transformations, with \Lambda^\mu{}_\nu = \delta^\mu_\nu + L^\mu{}_\nu. From the condition (8.5) for \Lambda to be a Lorentz transformation, we have
\[ \eta_{\mu\nu}\left(\delta^\mu_\rho + L^\mu{}_\rho\right)\left(\delta^\nu_\sigma + L^\nu{}_\sigma\right) = \eta_{\rho\sigma} + \eta_{\mu\sigma}L^\mu{}_\rho + \eta_{\rho\nu}L^\nu{}_\sigma + O(L^2) = \eta_{\rho\sigma}, \]
so
\[ \eta_{\mu\sigma}L^\mu{}_\rho + \eta_{\rho\nu}L^\nu{}_\sigma = L_{\sigma\rho} + L_{\rho\sigma} = 0, \]
so the condition is that L is antisymmetric when its indices are both lowered. Thus L is a 4 × 4 antisymmetric real matrix, and has 6 independent parameters, and the infinitesimal Lorentz transformations form a 6 dimensional Lie algebra.
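A quick numerical check of this antisymmetry condition (a Python/NumPy sketch, not from the text; the generator entries are random numbers): for L antisymmetric with lowered indices, \Lambda = 1 + \epsilon L violates pseudo-orthogonality only at second order in \epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# L with both indices lowered: an arbitrary antisymmetric matrix,
# which indeed carries 6 independent parameters
w = rng.standard_normal((4, 4))
L_low = w - w.T

# raise the first index, L^mu_nu = eta^{mu rho} L_{rho nu};
# for this metric eta^{-1} = eta
L = eta @ L_low

for eps in (1e-3, 1e-4):
    Lam = np.eye(4) + eps * L          # infinitesimal Lorentz transformation
    dev = np.max(np.abs(Lam.T @ eta @ Lam - eta))
    # the first-order terms cancel pairwise, so the violation is O(eps^2)
    assert dev < 16 * eps**2 * np.max(np.abs(L))**2
```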
Now we are ready to discuss symmetries more generally. A scalar field transforms under a change of coordinates as
\[ \eta'(x') = \eta(x), \]
but more generally the field may also change, in a way that may depend on other fields.
To say that
\[ x^\mu \to x'^\mu, \qquad \eta_i \to \eta'_i \]
is a symmetry means, at the least, that if \eta_i(x) is a specific solution of the equations of motion, the set of transformed fields \eta'_i(x') is also a solution. The equations of motion are determined by varying the action, so if the corresponding actions are equal for each pair of configurations (\eta(x), \eta'(x')), so are the equations of motion. Notice here that what we are saying is that the same Lagrangian function applied to the fields \eta'_i and integrated over x' \in R' should give the same action as S = \int_R L(\eta_i(x)\ldots)\,d^4x, where R' is the image of R under the transformation x \to x'.
In fact, if the two actions differ by a function only of the values of \eta_i on the boundary \partial R, they will give the same equations of motion. Even in quantum mechanics, where the transition amplitude is given by integrating e^{iS/\hbar} over all configurations, a change in the action which depends only on surface values is only a phase change in the amplitude. In classical mechanics we could also have an overall change multiplying the Lagrangian and the action by a constant c \neq 0, which would still have extrema for the same values of the fields, but we will not consider such changes because quantum mechanically they correspond to changing Planck's constant.
The Lagrangian density is a given function of the old fields, L(\eta_i, \partial_\mu\eta_i, x^\mu). If we substitute in the values of \eta(x) in terms of \eta'(x') we get a new density L', defined by
\[ L'(\eta'_i, \partial'_\mu\eta'_i, x'^\mu) = L(\eta_i, \partial_\mu\eta_i, x^\mu)\left\|\frac{\partial x^\nu}{\partial x'^\mu}\right\|. \]
Then
\begin{eqnarray*}
\delta L(\eta'_i(x'), \partial'_\mu\eta'_i(x'), x') &:=& L(\eta'_i(x'), \partial'_\mu\eta'_i(x'), x') - L'(\eta'_i(x'), \partial'_\mu\eta'_i(x'), x') \\
&=& L(\eta'_i(x'), \partial'_\mu\eta'_i(x'), x') - L(\eta_i(x), \partial_\mu\eta_i(x), x)\left\|\frac{\partial x^\nu}{\partial x'^\mu}\right\|. \qquad (8.9)
\end{eqnarray*}
Here we have used the first of Eq. (8.7) for S' and Eq. (8.8) for S.
Expanding to first order, the Jacobian is
\[ \left\|\frac{\partial x'^\mu}{\partial x^\nu}\right\|^{-1} = \left[\det\left(\delta^\mu_\nu + \partial_\nu\,\delta x^\mu\right)\right]^{-1} = \left(1 + \mathrm{Tr}\,\frac{\partial\,\delta x^\mu}{\partial x^\nu}\right)^{-1} = 1 - \partial_\mu\,\delta x^\mu, \tag{8.10} \]
while
Thus^{10}
\[ \delta L = L\,\partial_\mu\delta x^\mu + \frac{\partial L}{\partial\eta_i}\,\delta\eta_i + \frac{\partial L}{\partial(\partial_\mu\eta_i)}\,\delta(\partial_\mu\eta_i) + \delta x^\mu\,\frac{\delta L}{\delta x^\mu}, \tag{8.12} \]
and if this is a divergence, δL = ∂µ Λµ for some Λµ , we will have a symmetry.
There are subtleties in this expression^{11}. The last term involves a derivative of L with its first two arguments fixed, and as such is not the derivative with respect to x^\mu with the functions \eta_i fixed. For this reason we used a different symbol, because it is customary to use \partial_\mu to mean only that x^\nu is fixed for \nu \neq \mu, and not to indicate that the other arguments of L are held fixed. That form of derivative is the stream derivative,
\[ \frac{\partial L\big(\eta_i(x), \partial_\mu\eta_i(x), x\big)}{\partial x^\nu} = \frac{\delta L\big(\eta_i(x), \partial_\mu\eta_i(x), x\big)}{\delta x^\nu} + \frac{\partial L}{\partial\eta_i}\,(\partial_\nu\eta_i) + \frac{\partial L}{\partial(\partial_\mu\eta_i)}\,(\partial_\nu\partial_\mu\eta_i). \]

^{10} This is the equation to use on homework.
^{11} There is also a summation understood on the repeated i index as well as on the repeated \mu index.
Note also that \delta\eta_i(x) = \eta'_i(x') - \eta_i(x) is not simply the variation of the field at a point, \bar\delta\eta_i(x) = \eta'_i(x) - \eta_i(x), but includes in addition the change (\delta x^\mu)\partial_\mu\eta_i due to the displacement of the argument. Thus
\[ \delta\eta_i(x) = \bar\delta\eta_i(x) + (\delta x^\nu)\,\partial_\nu\eta_i. \tag{8.13} \]
The variation with respect to \partial'_\mu\eta'_i needs to be examined carefully, because the \delta variation affects the coordinates, and therefore in general \partial_\mu\,\delta\eta_i \neq \delta\,\partial_\mu\eta_i. By definition,
\begin{eqnarray*}
\delta\,\partial_\mu\eta_i &=& \left.\frac{\partial\eta'_i}{\partial x'^\mu}\right|_{x'} - \left.\frac{\partial\eta_i}{\partial x^\mu}\right|_x \\
&=& \frac{\partial x^\nu}{\partial x'^\mu}\,\frac{\partial}{\partial x^\nu}\Big[\eta_i + (\delta x^\rho)\partial_\rho\eta_i + \bar\delta\eta_i\Big]_x - \left.\frac{\partial\eta_i}{\partial x^\mu}\right|_x \\
&=& -\left(\partial_\mu\,\delta x^\nu\right)\partial_\nu\eta_i + \frac{\partial}{\partial x^\mu}\Big[(\delta x^\rho)\partial_\rho\eta_i + \bar\delta\eta_i\Big] \\
&=& (\delta x^\nu)\,\partial_\mu\partial_\nu\eta_i + \partial_\mu\bar\delta\eta_i, \qquad (8.14)
\end{eqnarray*}
where in the last line we used \partial_\mu\bar\delta\eta_i = \bar\delta\,\partial_\mu\eta_i, because the \bar\delta variation is defined at a given point and does commute with \partial_\mu.
Notice that the \delta x^\nu terms in (8.13) and (8.14) are precisely what is required in (8.11) to change the last term to a full stream derivative. Thus
\[ L(\eta'_i(x'), \partial'_\mu\eta'_i(x'), x') = L(\eta_i(x), \partial_\mu\eta_i(x), x) + \frac{\partial L}{\partial\eta_i}\,\bar\delta\eta_i + \frac{\partial L}{\partial(\partial_\mu\eta_i)}\,\partial_\mu\bar\delta\eta_i + \delta x^\mu\,\frac{\partial L}{\partial x^\mu}, \tag{8.15} \]
where now \partial L/\partial x^\mu means the stream derivative, including the variations of \eta_i(x) and its derivative due to the variation \delta x^\mu in their arguments.

Inserting this and (8.10) into the expression (8.9) for \delta L, we see that the change of action is given by the integral of
\begin{eqnarray*}
\delta L &=& (\partial_\mu\,\delta x^\mu)\,L + \delta x^\mu\,\frac{\partial L}{\partial x^\mu} + \frac{\partial L}{\partial\eta_i}\,\bar\delta\eta_i + \frac{\partial L}{\partial(\partial_\mu\eta_i)}\,\partial_\mu\bar\delta\eta_i \\
&=& \frac{\partial}{\partial x^\mu}\left(\delta x^\mu\,L + \frac{\partial L}{\partial(\partial_\mu\eta_i)}\,\bar\delta\eta_i\right) + \bar\delta\eta_i\left(\frac{\partial L}{\partial\eta_i} - \frac{\partial}{\partial x^\mu}\frac{\partial L}{\partial(\partial_\mu\eta_i)}\right). \qquad (8.16)
\end{eqnarray*}
8.3. NOETHER’S THEOREM 247
We will discuss the significance of this in a minute, but first, I want to present
an alternate derivation.
Observe that in the expression (8.7) for S', x' is a dummy variable and can be replaced by x, and the difference can be taken at the same x values, except that the ranges of integration differ. That is,
\[ S' = \int_{R'} L\left(\eta'(x), \partial_\mu\eta'(x), x\right) d^4x. \]
Thus
\[ \delta_2 S = \int_{\partial R} L\,\delta x^\mu\, dS_\mu = \int_R \partial_\mu\left(L\,\delta x^\mu\right) d^4x \tag{8.17} \]
in agreement with (8.16).
Note that δL is a divergence plus a piece which vanishes if the dynamical
fields obey the equation of motion, quite independent of whether or not the
infinitesimal variation we are considering is a symmetry. As we mentioned,
to be a symmetry, δL must be a divergence for all field configurations, not
just those satisfying the equations of motion, so that the variations over
configurations will give the correct equations of motion.
We have been assuming the variations \delta x and \delta\eta can be treated as infinitesimals. This is appropriate for a continuous symmetry, that is, a symmetry group^{12} described by one (or several) continuous parameters. For example, symmetry under displacements x^\mu \to x^\mu + c^\mu, where c^\mu is any arbitrary fixed 4-vector, or rotations through an arbitrary angle \theta about a fixed axis. Each element of such a group lies in a one-parameter subgroup, and can be obtained, in the limit, from an infinite number of applications of an infinitesimal transformation. If we call the parameter \epsilon, the infinitesimal variations in x^\mu and \eta_i are given by derivatives of x'(\epsilon, x) and \eta' with respect to the parameter \epsilon. Thus
\[ \delta x^\mu = \epsilon\left.\frac{dx'^\mu}{d\epsilon}\right|_{x^\nu}, \qquad \delta\eta_i = \epsilon\left.\frac{d\eta'_i(x')}{d\epsilon}\right|_{x^\nu}. \]
The divergence must also be first order in \epsilon, so \delta L = \epsilon\,\partial_\mu\Lambda^\mu if we have a symmetry.
We define the current for the transformation
Thus we have
\[ x^\mu \to x'^\mu = x^\mu + c^\mu, \tag{8.19} \]
and the last two (actually Lorentz transformations already include both) can be written x^\mu \to x'^\mu = \sum_\nu \Lambda^\mu{}_\nu x^\nu = \Lambda^\mu{}_\nu x^\nu (using the Einstein summation convention), where the matrix \Lambda is a real matrix satisfying the pseudo-orthogonality condition
\[ \Lambda^\mu{}_\nu\,\eta_{\mu\rho}\,\Lambda^\rho{}_\tau = \eta_{\nu\tau}, \]
Translation Invariance

First, let us consider the conserved quantities generated by translation invariance, for which \delta x^\mu = c^\mu. All fields we will deal with are invariant, or transform as scalars, under translations, so \delta\eta_\ell = 0. From (8.18) the conserved current is
\[ J_c^\mu = \frac{\partial L}{\partial(\partial_\mu\eta_\ell)}\,c^\nu\,\partial_\nu\eta_\ell - L\,c^\mu = c^\nu\, T_\nu{}^\mu, \]
so the four conserved currents are nothing but the energy-momentum tensor whose conservation we found in (8.3) directly from the equations of motion. The conserved charges from this current are
\[ P_\mu = \int_V T_\mu{}^0(\vec x, t)\, d^3x, \]
Lorentz Transformations

Now consider an infinitesimal Lorentz transformation, with
\[ x'^\mu = \Lambda^\mu{}_\nu x^\nu = \left(\delta^\mu_\nu + L^\mu{}_\nu\right)x^\nu, \qquad\text{or}\qquad \delta x^\mu = L^\mu{}_\nu x^\nu. \]
\[ J^\mu = L_{\rho\nu}\,M^{\mu\rho\nu} = -\frac{\partial L}{\partial(\partial_\mu\xi_\ell)}\,L^\rho{}_\sigma\,\Delta^\sigma{}_{\rho\ell} + \frac{\partial L}{\partial(\partial_\mu\xi_\ell)}\,(\partial_\tau\xi_\ell)\,L^\tau{}_\kappa\, x^\kappa - L\,L^\mu{}_\nu x^\nu. \]
As L_{\rho\nu} is antisymmetric under \rho \leftrightarrow \nu, there are six independent infinitesimal generators which can produce currents. Only the part antisymmetric under

^{13} Now that our fields may be developing space-time indices, we will change their name from \eta to \xi to avoid confusion with \eta_{\mu\nu}.
Of course the six currents M^{\mu\rho\nu} are conserved only if the action is invariant, which will be the case only if the lagrangian density transforms like a scalar under Lorentz transformations. This will be assured if all the vector indices of the fields are contracted correctly, one up and one down. Note that part of the current M^{\mu\rho\nu} is related to the energy-momentum tensor,
\[ M^{\mu\rho\nu} = \frac12\left(x^\nu T^{\rho\mu} - x^\rho T^{\nu\mu}\right) - \frac{\partial L}{\partial(\partial_\mu\xi_\ell)}\,\Delta^{\rho\nu}{}_\ell. \]
\[ T_\mu{}^\nu = \frac{\partial L}{\partial\phi_{,\nu}}\,\phi_{,\mu} - L\,\delta^\nu_\mu = -\phi^{,\nu}\phi_{,\mu} + \frac12\,\delta^\nu_\mu\left(-\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right). \]
The Hamiltonian is
\[ H = \int T_0{}^0\, d^3x = \frac12\int\left[\dot\phi^2 + (\vec\nabla\phi)^2 + m^2\phi^2\right] d^3x, \]
8.4. EXAMPLES OF RELATIVISTIC FIELDS 253
the three-momentum is
\[ (\vec P)_j = \int T_j{}^0\, d^3x = \int \dot\phi\,(\vec\nabla\phi)_j\, d^3x, \qquad\text{or}\qquad \vec P = \int \pi\,\vec\nabla\phi\; d^3x. \]
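The conservation of these charges can be illustrated numerically. The following sketch (Python with NumPy, not from the text; the grid size, mass, time step, and initial bump are all arbitrary choices, and the velocity-Verlet integrator is one assumption of method) evolves the one-dimensional Klein-Gordon field \ddot\phi = \nabla^2\phi - m^2\phi on a periodic lattice and checks that the lattice versions of H and \vec P are conserved:

```python
import numpy as np

# lattice parameters: all arbitrary choices
N, dx, dt, m = 128, 0.1, 0.02, 1.0
x = dx * np.arange(N)
phi = np.exp(-(x - 0.5 * N * dx)**2)     # initial Gaussian bump
pi = np.zeros(N)                         # pi = phi_dot, initially at rest

def laplacian(f):
    # periodic second difference
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

def energy(phi, pi):
    # lattice version of H = (1/2) integral [pi^2 + (grad phi)^2 + m^2 phi^2]
    grad = (np.roll(phi, -1) - phi) / dx
    return 0.5 * np.sum(pi**2 + grad**2 + m**2 * phi**2) * dx

def momentum(phi, pi):
    # lattice version of P = integral pi * grad phi
    grad = (np.roll(phi, -1) - np.roll(phi, 1)) / (2 * dx)
    return np.sum(pi * grad) * dx

E0 = energy(phi, pi)
for _ in range(1000):                    # velocity-Verlet (kick-drift-kick)
    pi = pi + 0.5 * dt * (laplacian(phi) - m**2 * phi)
    phi = phi + dt * pi
    pi = pi + 0.5 * dt * (laplacian(phi) - m**2 * phi)

assert abs(energy(phi, pi) - E0) / E0 < 1e-2   # H conserved up to O(dt^2)
assert abs(momentum(phi, pi)) < 1e-8           # P stays zero for symmetric data
```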
In particular for our 2-form F, the fact that dF = 0, and thus {}^*dF = 0, tells us the vector V^\sigma = -(1/6)\,\epsilon^{\mu\nu\rho\sigma}F_{\nu\rho,\mu} = 0. The \sigma = 0 component of this is
\[ 0 = 3V^0 = \frac12\,\epsilon^{ijk}F_{jk,i} = \frac12\,\epsilon^{ijk}\epsilon_{jk\ell}B_{\ell,i} = \delta_{i\ell}B_{\ell,i} = \vec\nabla\cdot\vec B, \]
giving us the constraint equation (8.21). For the spatial component,
\begin{eqnarray*}
0 = -3V^i &=& \frac12\sum_{\mu,\nu,\rho=0}^{3}\epsilon^{\mu\nu\rho i}F_{\nu\rho,\mu} = \frac12\sum_{j,k=1}^{3}\left(\epsilon^{jki}F_{jk,0} + 2\,\epsilon^{jki}F_{k0,j}\right) \\
&=& \frac12\sum_{j,k=1}^{3}\left(\epsilon^{jki}\epsilon_{jk\ell}\,\frac{1}{c}\dot B_\ell + 2\,\epsilon^{jki}\partial_j E_k\right) = \left(\frac{1}{c}\dot{\vec B} + \vec\nabla\times\vec E\right)_i,
\end{eqnarray*}
which gives us the constraint (8.22). So the two constraint equations among Maxwell's four are
\[ dF = 0. \tag{8.26} \]
What are the two dynamical equations? If we evaluate {}^*d\,{}^*F = F_{\mu\nu}{}^{,\nu}\,dx^\mu =: V_\mu\, dx^\mu, we see the zeroth component contains only F_{0j} = -E_j, with V_0 = \sum_j \partial F_{0j}/\partial x^j = -\vec\nabla\cdot\vec E, which Maxwell tells us is -\rho/\epsilon_0. The spatial component is V_i = F_{i0,0} + \sum_j F_{ij,j} = \dot E_i/c + \epsilon_{ijk}\partial_j B_k = \big(\dot{\vec E}/c + \vec\nabla\times\vec B\big)_i, which Maxwell tells us is (modulo c) \mu_0(\vec j)_i. This encourages us to define the 4-vector J^\mu = (\rho, \vec j) and its accompanying 1-form J = J_\mu dx^\mu, and to write the two dynamical equations as
\[ {}^*d\,{}^*F = -J \qquad\text{or}\qquad d\,{}^*F = {}^*J. \tag{8.27} \]
How should we write the lagrangian density for the electromagnetic fields? As the dynamics is determined by the action, the integral of L over four-dimensional space-time, we should expect L to be essentially a 4-form, which needs to be made out of the 2-form F. Our first idea might be to try F \wedge F, which is a 4-form, but unfortunately it is a closed 4-form, for d(F \wedge F) = (dF) \wedge F + F \wedge (dF), and dF = ddA = 0. Because we are working on a contractible space, F \wedge F is therefore exact, and an exact form is useless as a lagrangian density because \int_R d\omega = \oint_{\partial R}\omega, which depends only on the boundaries, both in space and time, but this is exactly where variations of the dynamical degrees of freedom are kept fixed in determining the variation of the action.
There is another 2-form available, however, {}^*F, so we might consider
\begin{eqnarray*}
L\,dt\,d^3x &=& -\frac12\, F\wedge{}^*F = -\frac12\cdot\frac12\, F_{\mu\nu}\,dx^\mu\wedge dx^\nu \wedge \frac14\,\epsilon^{\kappa\lambda}{}_{\rho\sigma}F_{\kappa\lambda}\,dx^\rho\wedge dx^\sigma \\
&=& -\frac{1}{16}\,\epsilon^{\kappa\lambda}{}_{\rho\sigma}\,\epsilon^{\mu\nu\rho\sigma}\,F_{\mu\nu}F_{\kappa\lambda}\; dx^0\wedge dx^1\wedge dx^2\wedge dx^3,
\end{eqnarray*}
\begin{eqnarray*}
L &=& -\frac{c}{16}\,\epsilon^{\kappa\lambda\rho\sigma}\epsilon_{\mu\nu\rho\sigma}\,F^{\mu\nu}F_{\kappa\lambda} = -\frac{c}{8}\left(F^{\mu\nu}F_{\mu\nu} - F^{\mu\nu}F_{\nu\mu}\right) \\
&=& -\frac{c}{4}\,F^{\mu\nu}F_{\mu\nu} = -\frac{c}{2}\left(-F_{0j}F_{0j} + \frac12\,\epsilon_{ijk}B_k\,\epsilon_{ij\ell}B_\ell\right) = \frac{c}{2}\left(E^2 - B^2\right)
\end{eqnarray*}
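The final contraction can be checked numerically. The sketch below (Python with NumPy, not part of the text; the field components are random numbers, and the overall factor of c is set aside) builds F_{\mu\nu} from \vec E and \vec B as above and verifies -\frac14 F^{\mu\nu}F_{\mu\nu} = \frac12(E^2 - B^2):

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.standard_normal(3)           # arbitrary electric field components
B = rng.standard_normal(3)           # arbitrary magnetic field components

# three-index Levi-Civita symbol
eps3 = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps3[i, j, k] = 1.0
    eps3[i, k, j] = -1.0

# field-strength tensor with F_{0j} = -E_j and F_{jk} = eps_{jkl} B_l
F = np.zeros((4, 4))
F[0, 1:] = -E
F[1:, 0] = E
F[1:, 1:] = np.einsum('jkl,l->jk', eps3, B)

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
F_up = eta @ F @ eta                 # raise both indices: F^{mu nu}

# -(1/4) F^{mu nu} F_{mu nu} = (E^2 - B^2)/2, the factor of c set aside
lhs = -0.25 * np.sum(F_up * F)
assert np.isclose(lhs, 0.5 * (E @ E - B @ B))
```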
Exercises

8.1 The Lagrangian density for the electromagnetic field in vacuum may be written
\[ L = \frac12\left(\vec E^{\,2} - \vec B^{\,2}\right), \]
where the dynamical degrees of freedom are not \vec E and \vec B, but rather \vec A and \phi, where
\[ \vec B = \vec\nabla\times\vec A, \qquad \vec E = -\vec\nabla\phi - \frac1c\,\dot{\vec A}. \]
a) Find the canonical momenta, and comment on what seems unusual about one of the answers.
b) Find the Lagrange equations for the system. Relate to known equations for the electromagnetic field.
tensor still has the same values under proper^{16} Lorentz transformations. Thus \eta_{\mu\nu} and \epsilon_{\mu\nu\rho\sigma} are both invariant and transform co- or contravariantly.
(c) Show that if T^{\rho_1\ldots\rho_j\mu}{}_{\sigma_1\ldots\sigma_k} transforms correctly, the tensor T^{\rho_1\ldots\rho_j}{}_{\mu\sigma_1\ldots\sigma_k} := \eta_{\mu\nu}T^{\rho_1\ldots\rho_j\nu}{}_{\sigma_1\ldots\sigma_k} transforms correctly as well.
(d) Show that if two indices, one upper and one lower, are contracted, that is, set equal and summed over, the resulting object transforms as if those indices were not there. That is, W^{\mu_1\ldots\mu_j}{}_{\rho_1\ldots\rho_k} := T^{\mu_1\ldots\mu_j\nu}{}_{\nu\rho_1\ldots\rho_k} transforms correctly.
^{16} Proper Lorentz transformations are those that can be generated continuously from the identity. That is, they exclude transformations that reverse the direction of time or convert a right-handed coordinate system to a left-handed one.
Appendix A
Appendices
The dot product of two vectors, \vec A \cdot \vec B, is bilinear and can therefore be written as
\begin{eqnarray}
\vec A\cdot\vec B &=& \Big(\sum_i A_i\,\hat e_i\Big)\cdot\Big(\sum_j B_j\,\hat e_j\Big) \tag{A.1} \\
&=& \sum_i\sum_j A_iB_j\;\hat e_i\cdot\hat e_j \tag{A.2} \\
&=& \sum_i\sum_j A_iB_j\,\delta_{ij}, \tag{A.3}
\end{eqnarray}
the other factors, and drops the \delta_{ij} and the summation over j. So we have \vec A\cdot\vec B = \sum_i A_iB_i, the standard expression for the dot product.^1

We now consider the cross product of two vectors, \vec A\times\vec B, which is also a bilinear expression, so we must have \vec A\times\vec B = \big(\sum_i A_i\hat e_i\big)\times\big(\sum_j B_j\hat e_j\big) = \sum_i\sum_j A_iB_j\,(\hat e_i\times\hat e_j). The cross product \hat e_i\times\hat e_j is a vector, which can therefore be written as \vec V = \sum_k V_k\hat e_k. But the vector result depends also on the two
It is easy to evaluate the 27 coefficients \epsilon_{kij}, because the cross product of two orthogonal unit vectors is a unit vector orthogonal to both of them. Thus \hat e_1\times\hat e_2 = \hat e_3, so \epsilon_{312} = 1 and \epsilon_{k12} = 0 if k = 1 or 2. Applying the same argument to \hat e_2\times\hat e_3 and \hat e_3\times\hat e_1, and using the antisymmetry of the cross product, \vec A\times\vec B = -\vec B\times\vec A, we see that
\[ \epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1, \qquad \epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1, \]
and \epsilon_{ijk} = 0 for all other values of the indices, i.e. \epsilon_{ijk} = 0 whenever any two of the indices are equal. Note that \epsilon changes sign not only when the last two indices are interchanged (a consequence of the antisymmetry of the cross product), but whenever any two of its indices are interchanged. Thus \epsilon_{ijk} is zero unless (1, 2, 3) \to (i, j, k) is a permutation, and is equal to the sign of the permutation if it exists.
Now that we have an expression for \hat e_i\times\hat e_j, we can evaluate
\[ \vec A\times\vec B = \sum_i\sum_j A_iB_j\,(\hat e_i\times\hat e_j) = \sum_i\sum_j\sum_k \epsilon_{kij}\,A_iB_j\,\hat e_k. \tag{A.4} \]
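Equation (A.4) is easy to check numerically. The sketch below (Python with NumPy, not part of the text; the vector components are arbitrary numbers) builds the 27 components of \epsilon_{kij} and compares the resulting cross product with NumPy's built-in one:

```python
import numpy as np

# build the 27 components of eps_{kij} from the three cyclic cases
eps = np.zeros((3, 3, 3))
for k, i, j in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[k, i, j] = 1.0
    eps[k, j, i] = -1.0

A = np.array([1.0, -2.0, 3.0])     # arbitrary vectors
B = np.array([0.5, 4.0, -1.0])

# (A x B)_k = sum_{ij} eps_{kij} A_i B_j, Eq. (A.4)
cross = np.einsum('kij,i,j->k', eps, A, B)

assert np.allclose(cross, np.cross(A, B))
assert np.allclose(cross, -np.einsum('kij,i,j->k', eps, B, A))   # A x B = -B x A
```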
different indices. There are only two ways that can happen, as given by the two terms, and we only need to verify the coefficients. If i = \ell and j = m, the two \epsilon's are equal and the square is 1, so the first term has the proper coefficient of 1. The second term differs by one transposition of two indices on one epsilon, so it must have the opposite sign.
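The contraction identity just discussed can be verified exhaustively over all index values. A short sketch (Python with NumPy, not from the text):

```python
import numpy as np

# Levi-Civita symbol
eps = np.zeros((3, 3, 3))
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c] = 1.0
    eps[a, c, b] = -1.0

delta = np.eye(3)

# sum_j eps_{kij} eps_{jlm} = delta_{kl} delta_{im} - delta_{km} delta_{il},
# checked for all 81 combinations of (k, i, l, m)
lhs = np.einsum('kij,jlm->kilm', eps, eps)
rhs = np.einsum('kl,im->kilm', delta, delta) - np.einsum('km,il->kilm', delta, delta)
assert np.allclose(lhs, rhs)
```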
We now turn to some applications. Let us first evaluate
\[ \vec A\cdot(\vec B\times\vec C) = \sum_i A_i\sum_{jk}\epsilon_{ijk}B_jC_k = \sum_{ijk}\epsilon_{ijk}\,A_iB_jC_k. \tag{A.6} \]
Note that \vec A\cdot(\vec B\times\vec C) is, up to sign, the volume of the parallelepiped formed by the vectors \vec A, \vec B, and \vec C. From the fact that \epsilon changes sign under transpositions of any two indices, we see that the same is true for transposing the vectors, so that
\[ \vec A\cdot(\vec B\times\vec C) = -\vec A\cdot(\vec C\times\vec B) = \vec B\cdot(\vec C\times\vec A) = -\vec B\cdot(\vec A\times\vec C) = \vec C\cdot(\vec A\times\vec B) = -\vec C\cdot(\vec B\times\vec A). \]
Now consider \vec V = \vec A\times(\vec B\times\vec C). Using our formulas,
\[ \vec V = \sum_{ijk}\epsilon_{kij}\,\hat e_k\,A_i\,(\vec B\times\vec C)_j = \sum_{ijk}\epsilon_{kij}\,\hat e_k\,A_i\sum_{lm}\epsilon_{jlm}B_lC_m. \]
Notice that the sum on j involves only the two epsilons, and we can use
\[ \sum_j \epsilon_{kij}\epsilon_{jlm} = \sum_j \epsilon_{jki}\epsilon_{jlm} = \delta_{kl}\delta_{im} - \delta_{km}\delta_{il}. \]
Thus
\begin{eqnarray*}
V_k &=& \sum_{ilm}\Big(\sum_j\epsilon_{kij}\epsilon_{jlm}\Big)A_iB_lC_m = \sum_{ilm}\left(\delta_{kl}\delta_{im} - \delta_{km}\delta_{il}\right)A_iB_lC_m \\
&=& \sum_i A_iB_kC_i - \sum_i A_iB_iC_k = \vec A\cdot\vec C\; B_k - \vec A\cdot\vec B\; C_k,
\end{eqnarray*}
so
\[ \vec A\times(\vec B\times\vec C) = \vec B\,\big(\vec A\cdot\vec C\big) - \vec C\,\big(\vec A\cdot\vec B\big). \tag{A.7} \]
This is sometimes known as the bac-cab formula.
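A quick numerical check of the bac-cab formula (a Python/NumPy sketch, not from the text; the vectors are random numbers):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.standard_normal((3, 3))    # three arbitrary vectors

lhs = np.cross(A, np.cross(B, C))
rhs = B * (A @ C) - C * (A @ B)          # "bac minus cab", Eq. (A.7)
assert np.allclose(lhs, rhs)
```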
From the second definition, we see that the determinant is the volume of the parallelepiped formed from the images under the linear map A of the three unit vectors \hat e_i, as
\[ (A\hat e_1)\cdot\big((A\hat e_2)\times(A\hat e_3)\big) = \det A. \]
In higher dimensions, the cross product is not a vector, but there is a generalization of \epsilon which remains very useful. In an n-dimensional space, \epsilon_{i_1 i_2 \ldots i_n} has n indices and is defined as the sign of the permutation (1, 2, \ldots, n) \to (i_1 i_2 \ldots i_n), if the indices are all unequal, and zero otherwise. The analog of (A.5) has (n-1)! terms from all the permutations of the unsummed indices on the second \epsilon. The determinant of an n \times n matrix is defined as
\[ \det A = \sum_{i_1,\ldots,i_n}\epsilon_{i_1 i_2 \ldots i_n}\prod_{p=1}^{n}A_{p,i_p}. \]
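Both characterizations of the determinant can be checked against NumPy's built-in routine (a sketch in Python, not part of the text; the matrix entries are random numbers, and the check is done for n = 3):

```python
import numpy as np

# Levi-Civita symbol for n = 3
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))          # an arbitrary linear map

# det A = sum eps_{i1 i2 i3} A_{1,i1} A_{2,i2} A_{3,i3}
det = np.einsum('ijk,i,j,k->', eps, A[0], A[1], A[2])
assert np.isclose(det, np.linalg.det(A))

# equivalently, the triple product of the images of the unit vectors,
# which are the columns of A
cols = A.T
assert np.isclose(np.dot(cols[0], np.cross(cols[1], cols[2])), np.linalg.det(A))
```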
\[ \vec\nabla = \sum_i \hat e_i\,\frac{\partial}{\partial x_i}. \tag{A.8} \]
While this looks like an ordinary vector, the coefficients are not numbers but operators, which do not commute with functions of the coordinates x_i. We can still write out the components straightforwardly, but we must be careful to keep the order of the operators and the fields correct.
The gradient of a scalar field \Phi(\vec r) is simply evaluated by distributing the gradient operator
\[ \vec\nabla\Phi = \Big(\sum_i \hat e_i\,\frac{\partial}{\partial x_i}\Big)\Phi(\vec r) = \sum_i \hat e_i\,\frac{\partial\Phi}{\partial x_i}. \tag{A.9} \]
A.2. THE GRADIENT OPERATOR 263
Because the individual components obey the Leibnitz rule
\[ \frac{\partial(AB)}{\partial x_i} = \frac{\partial A}{\partial x_i}B + A\frac{\partial B}{\partial x_i}, \]
so does the gradient, so if A and B are scalar fields,
\[ \vec\nabla(AB) = (\vec\nabla A)B + A\,\vec\nabla B. \tag{A.10} \]
The general application of the gradient operator \vec\nabla to a vector \vec A gives an object with coefficients with two indices, a tensor. Some parts of this tensor, however, can be simplified. The first (which is the trace of the tensor) is called the divergence of the vector, written and defined by
\[ \vec\nabla\cdot\vec A = \Big(\sum_i \hat e_i\,\frac{\partial}{\partial x_i}\Big)\cdot\Big(\sum_j \hat e_j A_j\Big) = \sum_{ij}\hat e_i\cdot\hat e_j\,\frac{\partial A_j}{\partial x_i} = \sum_{ij}\delta_{ij}\,\frac{\partial A_j}{\partial x_i} = \sum_i \frac{\partial A_i}{\partial x_i}. \tag{A.11} \]
In asking about Leibnitz' rule, we must remember to apply the divergence operator only to vectors. One possibility is to apply it to the vector \vec V = \Phi\vec A, with components V_i = \Phi A_i. Thus
\[ \vec\nabla\cdot(\Phi\vec A) = \sum_i\frac{\partial(\Phi A_i)}{\partial x_i} = \sum_i\frac{\partial\Phi}{\partial x_i}A_i + \Phi\sum_i\frac{\partial A_i}{\partial x_i} = (\vec\nabla\Phi)\cdot\vec A + \Phi\,\vec\nabla\cdot\vec A. \tag{A.12} \]
We could also apply the divergence to the cross product of two vectors,
\[ \vec\nabla\cdot(\vec A\times\vec B) = \sum_i\frac{\partial(\vec A\times\vec B)_i}{\partial x_i} = \sum_i\frac{\partial\big(\sum_{jk}\epsilon_{ijk}A_jB_k\big)}{\partial x_i} = \sum_{ijk}\epsilon_{ijk}\frac{\partial(A_jB_k)}{\partial x_i} = \sum_{ijk}\epsilon_{ijk}\frac{\partial A_j}{\partial x_i}B_k + \sum_{ijk}\epsilon_{ijk}A_j\frac{\partial B_k}{\partial x_i}. \tag{A.13} \]
This is expressible in terms of the curls of \vec A and \vec B.
The curl is like a cross product with the first vector replaced by the differential operator, so we may write the i'th component as
\[ (\vec\nabla\times\vec A)_i = \sum_{jk}\epsilon_{ijk}\,\frac{\partial}{\partial x_j}A_k. \tag{A.14} \]
where the sign which changed did so due to the transpositions in the indices on the \epsilon, which we have done in order to put things in the form of the definition of the curl. Thus
\[ \vec\nabla\cdot(\vec A\times\vec B) = (\vec\nabla\times\vec A)\cdot\vec B - \vec A\cdot(\vec\nabla\times\vec B). \tag{A.16} \]
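Identity (A.16) can be spot-checked numerically at a point. The sketch below (Python with NumPy, not from the text; the two vector fields and the evaluation point are arbitrary choices, and all derivatives are taken by central finite differences):

```python
import numpy as np

def Afield(p):
    x, y, z = p
    return np.array([y * z, x * x, np.sin(y)])    # an arbitrary vector field

def Bfield(p):
    x, y, z = p
    return np.array([np.cos(z), x * y, z])        # another arbitrary field

def jacobian(field, p, h=1e-6):
    """J[i, j] = d field_i / d x_j by central differences."""
    J = np.zeros((3, 3))
    for j in range(3):
        dp = np.zeros(3)
        dp[j] = h
        J[:, j] = (field(p + dp) - field(p - dp)) / (2 * h)
    return J

def div(field, p):
    return np.trace(jacobian(field, p))

def curl(field, p):
    J = jacobian(field, p)
    return np.array([J[2, 1] - J[1, 2], J[0, 2] - J[2, 0], J[1, 0] - J[0, 1]])

p = np.array([0.3, -1.2, 0.7])                    # an arbitrary point
lhs = div(lambda q: np.cross(Afield(q), Bfield(q)), p)
rhs = curl(Afield, p) @ Bfield(p) - Afield(p) @ curl(Bfield, p)
assert np.isclose(lhs, rhs, atol=1e-6)            # Eq. (A.16)
```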
A.3 Gradient in Spherical Coordinates

By the chain rule, if we have two sets of coordinates, say s_i and c_i, and we know the form of a function f(s_i) and the dependence of s_i on c_j, we can find
\[ \left.\frac{\partial f}{\partial c_i}\right|_{c} = \sum_j \left.\frac{\partial f}{\partial s_j}\right|_{s}\left.\frac{\partial s_j}{\partial c_i}\right|_{c}, \]
where |_s means hold the other s's fixed while varying s_j. In our case, the s_j are the spherical coordinates r, \theta, \phi, while the c_i are x, y, z. Thus
\begin{eqnarray*}
\vec\nabla f &=& \left(\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial x}\right|_{yz} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial x}\right|_{yz}\right)\hat e_x \\
&& +\; \left(\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial y}\right|_{xz} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial y}\right|_{xz}\right)\hat e_y \qquad (A.18) \\
&& +\; \left(\left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left.\frac{\partial r}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left.\frac{\partial\theta}{\partial z}\right|_{xy} + \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left.\frac{\partial\phi}{\partial z}\right|_{xy}\right)\hat e_z
\end{eqnarray*}
so
\[ \left.\frac{\partial\theta}{\partial x}\right|_{yz} = \frac{\cos\theta\cos\phi}{r}. \]
Similarly,
\[ \left.\frac{\partial\theta}{\partial y}\right|_{xz} = \frac{\cos\theta\sin\phi}{r}. \]
There is an extra term when differentiating w.r.t. z, from the numerator, so
\[ -\sin\theta\,\left.\frac{\partial\theta}{\partial z}\right|_{xy} = \frac{1}{r} - \frac{z^2}{r^3} = \frac{1-\cos^2\theta}{r} = r^{-1}\sin^2\theta, \]
so
\[ \left.\frac{\partial\theta}{\partial z}\right|_{xy} = -r^{-1}\sin\theta. \]
Now we are ready to plug this all into (A.18). Grouping together the terms involving each of the three partial derivatives, we find
\begin{eqnarray*}
\vec\nabla f &=& \left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\left(\frac{x}{r}\,\hat e_x + \frac{y}{r}\,\hat e_y + \frac{z}{r}\,\hat e_z\right) + \left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\left(\frac{\cos\theta\cos\phi}{r}\,\hat e_x + \frac{\cos\theta\sin\phi}{r}\,\hat e_y - \frac{\sin\theta}{r}\,\hat e_z\right) \\
&& +\; \left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\left(-\frac{1}{r}\frac{\sin\phi}{\sin\theta}\,\hat e_x + \frac{1}{r}\frac{\cos\phi}{\sin\theta}\,\hat e_y\right) \\
&=& \left.\frac{\partial f}{\partial r}\right|_{\theta\phi}\hat e_r + \frac{1}{r}\left.\frac{\partial f}{\partial\theta}\right|_{r\phi}\hat e_\theta + \frac{1}{r\sin\theta}\left.\frac{\partial f}{\partial\phi}\right|_{r\theta}\hat e_\phi.
\end{eqnarray*}
Thus we have derived the form for the gradient in spherical coordinates.
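The final formula can be checked end-to-end at a point. The sketch below (Python with NumPy, not part of the text; the test function f = xy + z^2 and the evaluation point are arbitrary choices) computes \partial f/\partial r, \partial f/\partial\theta, \partial f/\partial\phi by finite differences, assembles the spherical-coordinate gradient, and compares it with the exact Cartesian gradient:

```python
import numpy as np

def f_cart(x, y, z):
    return x * y + z**2                # an arbitrary test function

def f_sph(r, th, ph):
    # the same function expressed in spherical coordinates
    x = r * np.sin(th) * np.cos(ph)
    y = r * np.sin(th) * np.sin(ph)
    z = r * np.cos(th)
    return f_cart(x, y, z)

x, y, z = 1.0, 2.0, 3.0                # an arbitrary point
r = np.sqrt(x * x + y * y + z * z)
th, ph = np.arccos(z / r), np.arctan2(y, x)

h = 1e-6                               # central differences in r, theta, phi
df_dr = (f_sph(r + h, th, ph) - f_sph(r - h, th, ph)) / (2 * h)
df_dth = (f_sph(r, th + h, ph) - f_sph(r, th - h, ph)) / (2 * h)
df_dph = (f_sph(r, th, ph + h) - f_sph(r, th, ph - h)) / (2 * h)

# the spherical unit vectors at the point
e_r = np.array([x, y, z]) / r
e_th = np.array([np.cos(th) * np.cos(ph), np.cos(th) * np.sin(ph), -np.sin(th)])
e_ph = np.array([-np.sin(ph), np.cos(ph), 0.0])

grad_sph = df_dr * e_r + (df_dth / r) * e_th + (df_dph / (r * np.sin(th))) * e_ph
grad_cart = np.array([y, x, 2 * z])    # exact gradient of f = xy + z^2

assert np.allclose(grad_sph, grad_cart, atol=1e-4)
```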
Bibliography
[1] Howard Anton. Elementary Linear Algebra. John Wiley, New York,
1973. QA251.A57 ISBN 0-471-03247-6.
[2] V. I. Arnol’d. Math. Methods of Classical Mechanics. Springer-Verlag,
New York, 1984. QA805.A6813.
[3] R. Creighton Buck. Advanced Calculus. McGraw-Hill, 1956.
[4] Tohru Eguchi, Peter B. Gilkey, and Andrew J. Hanson. Gravitation,
gauge theories and differential geometry. Physics Reports, 66, No. 6:213–
393, 1980. Doubtless there are more appropriate references, but I learned
this here.
[5] A. P. French. Special Relativity. W. W. Norton, New York, 1968. SBN
393-09793-5.
[6] Herbert Goldstein. Classical Mechanics. Addison-Wesley, Reading, Mas-
sachusetts, second edition, 1980. QA805.G6.
[7] I. S. Gradshtein and I. M. Ryzhik. Table of integrals, series, and prod-
ucts. Academic Press, New York, 1965. QA55.R943.
[8] Jorge V. José and Eugene J. Saletan. Classical Mechanics, a Contemporary Approach. Cambridge University Press, 1998. QC805.J73 ISBN 0-521-63636-1.
[9] L. D. Landau and E. M. Lifshitz. Mechanics. Pergamon Press, Oxford, 2nd edition, 1969. QA805.L283/1976.
[10] Jerry B. Marion and Stephen T. Thornton. Classical Dynamics. Harcourt Brace Jovanovich, San Diego, 3rd edition, 1988. QA845.M38/1988.
[20] Eugene Wigner. Group Theory and Its Applications to Quantum Me-
chanics of Atomic Spectra. Academic Press, New York, 1959.
Index
O(N), 89
1-forms, 154
acoustic modes, 139
action, 45
action-angle, 192
active, 88
adiabatic invariant, 222
angular momentum, 8
antisymmetric, 95
apogee, 72
apsidal angle, 75
associative, 91
attractor, 27
autonomous, 22
bac-cab, 76, 98, 261
Bernoulli's equation, 150
body cone, 108
body coordinates, 86
Born-Oppenheimer, 126
bulk modulus, 145
composition, 89
conditionally periodic motion, 199
configuration space, 5, 44
conformal, 122
conservative force, 7
conserved, 5
conserved quantity, 6
continuum limit, 136
contravariant, 240
cotangent bundle, 20
covariant, 240
current, 248
current conservation, 249
D'Alembert's Principle, 40
deviatoric part, 144
diffeomorphism, 183
differential cross section, 81
differential k-form, 167
Dirac delta function, 140
dynamical balancing, 105
dynamical systems, 22
momentum, 5
natural symplectic structure, 176
non-degenerate, 177
nondegenerate system, 200
normal modes, 124
nutation, 117
oblate, 111
optical modes, 139
orbit, 5
orbital angular momentum, 186
order of the dynamical system, 22
orthogonal, 87
parallel axis theorem, 100
passive, 87
perigee, 72
period, 23
periodic, 23
perpendicular axis theorem, 103
phase curve, 21, 26
phase point, 21, 25
phase space, 6, 20
phase trajectory, 179
Poincaré's Lemma, 172
point transformation, 38, 160
Poisson bracket, 163
Poisson's theorem, 166
polhode, 109
potential energy, 7
precessing, 117
precession of the perihelion, 73
principal axes, 104
pseudovector, 96
rainbow scattering, 81
reduced mass, 66
relation among the frequencies, 200
rotation, 89
rotation about an axis, 89
scattering angle, 79
semi-major axis, 72
separatrix, 30
sign of the permutation, 168
similar, 118
similarity transformation, 118
spatial description, 148
stable, 26, 29
Stokes' Theorem, 175
strain tensor, 144
stream derivative, 37, 148
stress tensor, 143
stress-energy, 233
strongly stable, 27
structurally stable, 26
subgroup, 92
summation convention, 164
surface force, 142
symplectic, 161
symplectic structure, 22, 157
terminating motion, 27
torque, 8
total external force, 10
total mass, 9
total momentum, 9
trajectory, 5
transpose, 87, 118
turning point, 69, 70
unimodular, 91
unperturbed system, 206
unstable, 28
velocity function, 21
vibrations, 127
virtual displacement, 39
viscosity, 149
volume forces, 142