Lecture Notes For PC2132 (Classical Mechanics) : Disclaimer

This document provides lecture notes for a classical mechanics course. It includes notation that will be used, such as symbols for vectors, scalars, matrices and other physical quantities. It also covers kinematics topics like trajectories, velocity, acceleration and how to describe motion using position vectors and derivatives with respect to time and path length.

Uploaded by

Lex Francis
Copyright
© All Rights Reserved

Version: 11th Nov, 2017 11:13; svn-65

Lecture notes for PC2132
(Classical Mechanics)
For AY2017/18, Semester 1
Lecturer: Christian Kurtsiefer, Tutors: Adrian Nugraha Utama, Shi
Yicheng, Jiaan Qi, Tan Zong Sheng

Disclaimer
These notes are by no means a suitable replacement for a proper textbook on
classical mechanics, nor should they replace your own notes. They are just a best-effort
affair with the intention to be useful.
This is a document in progress: it hopefully does not contain too many mistakes,
but please contact us if you feel that you have spotted one.


Notation
We attempt to use consistent notation throughout this lecture. Below is a list of
what symbols typically refer to, unless stated otherwise.

a a vector; often the acceleration


e a unit vector, i.e., a vector of length 1 (e · e = 1)
dA a vector differential representing an oriented surface area element
ds a vector differential representing a line element
F a vector representing a force
H a scalar representing the Hamilton function of a system
I the tensor for the inertia of a rigid body
L a scalar representing the Lagrange function
L a vector representing an angular momentum
N a vector representing a torque
m a scalar representing the mass of a particle
m mass matrix in a system of coupled masses
M the total mass of a system
p a vector representing the momentum mv of a particle
P a vector representing the total momentum of a system
qk a generalized coordinate
Q the “quality factor” of a damped harmonic oscillator
r (or x) The position of a particle, represented by a vector
s path length of a trajectory
σ scattering cross section, has unit of an area
t time; typically a parameter that parameterizes the evolution of a system
T either a time like a period or a total time, or a kinetic energy
U potential energy
v another vector, often the velocity of a particle
v a scalar, often referring to the speed ||v||
W12 a scalar representing work for a state change 1 → 2
δθ a vector representing an infinitesimal rotation by an angle δθ
Ω solid angle, characterizes a set of directions, unit sr (steradian)
ω a vector representing an angular velocity
xi the i-th component of a vector x


1 Kinematics
1.1 Trajectories, velocity, acceleration
This part deals with a geometric description of the trajectories of a single point-
like object without going through how such a trajectory comes about.
In its simplest form, the motion of a particle over time can be described by a
time-dependent position vector

$$\mathbf{r}(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} \tag{1}$$

This is a representation of the position $\mathbf{r}$ in traditional Cartesian coordinates
$x, y, z$ or, to simplify notation later, $x_1, x_2, x_3$. In order to prepare ourselves
for other coordinate systems, we write this as

$$\mathbf{r}(t) = x_1(t)\,\mathbf{e}_1 + x_2(t)\,\mathbf{e}_2 + x_3(t)\,\mathbf{e}_3 = \sum_{i=1}^{3} x_i(t)\,\mathbf{e}_i , \tag{2}$$

where the $\mathbf{e}_i$ are the unit vectors of the standard Cartesian coordinate system.
These unit vectors are normalized, which can be expressed by the scalar product,
$\mathbf{e}_i \cdot \mathbf{e}_i = 1$, and two $\mathbf{e}_i$ with different indices are orthogonal, i.e., $\mathbf{e}_i \cdot \mathbf{e}_j = 0$ for
$i \neq j$. This can be summarized by the short notation

$$\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij} , \tag{3}$$

with the Kronecker delta $\delta_{ij}$ equal to 1 for $i = j$, and 0 otherwise. Such a set
of unit vectors is referred to as an orthonormal basis: each vector can be represented
as a linear combination of the basis vectors, and the coordinates of any vector $\mathbf{x}$
can be extracted via projection onto the corresponding unit vector,

$$x_i = \mathbf{x} \cdot \mathbf{e}_i , \tag{4}$$

where the dot ( · ) again denotes the scalar product between two vectors.
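As a small numerical illustration of (3) and (4), the following Python sketch (our own, not part of the lecture material) takes a hypothetical orthonormal basis, namely the Cartesian basis rotated by 30° about $\mathbf{e}_3$, checks $\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}$, extracts the components of a made-up vector by projection, and rebuilds the vector from them.

```python
import math


def dot(a, b):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical orthonormal basis: Cartesian basis rotated by 30 degrees about e3.
phi = math.radians(30)
basis = [
    (math.cos(phi), math.sin(phi), 0.0),
    (-math.sin(phi), math.cos(phi), 0.0),
    (0.0, 0.0, 1.0),
]

# Check e_i . e_j = delta_ij, eq. (3).
for i, e_i in enumerate(basis):
    for j, e_j in enumerate(basis):
        assert abs(dot(e_i, e_j) - (1.0 if i == j else 0.0)) < 1e-12

x = (1.0, 2.0, 3.0)
components = [dot(x, e) for e in basis]  # x_i = x . e_i, eq. (4)
# Rebuild x = sum_i x_i e_i from the projected components.
rebuilt = tuple(sum(c * e[k] for c, e in zip(components, basis)) for k in range(3))
print(components)
print(rebuilt)  # reproduces (1.0, 2.0, 3.0) up to rounding
```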
An interesting property of a trajectory of a single point that moves in time
according to $\mathbf{r}(t)$ is the rate of change of its position, or its velocity. This is
simply the derivative of $\mathbf{r}(t)$ with respect to time,

$$\mathbf{v}(t) = \lim_{\Delta t \to 0} \frac{\mathbf{r}(t_2) - \mathbf{r}(t_1)}{\Delta t} = \frac{d\mathbf{r}(t)}{dt} , \tag{5}$$

with the positions at two different times $t_1, t_2$ separated by $\Delta t = t_2 - t_1$,
and $\Delta\mathbf{r} = \mathbf{r}(t_2) - \mathbf{r}(t_1)$, according to the following figure:


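[Figure: position vectors $\mathbf{r}(t_1) = \mathbf{r}(t)$ and $\mathbf{r}(t_2) = \mathbf{r}(t) + \Delta\mathbf{r}(t)$ drawn from the origin O, with the difference vector $\Delta\mathbf{r}(t)$ between them.]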

The velocity as a derivative with respect to time is often written as $\mathbf{v} = \dot{\mathbf{r}}$, and
can be expressed as a linear combination of coordinate basis vectors with time-dependent
components $v_i(t)$, $i = 1, 2, 3$:

$$\mathbf{v} = \sum_{i=1}^{3} v_i(t)\,\mathbf{e}_i = \sum_{i=1}^{3} \dot{x}_i(t)\,\mathbf{e}_i . \tag{6}$$

As can be seen from the figure, the velocity vector $\mathbf{v}$ is tangential
to the trajectory at each point. Its modulus $v(t) = \|\mathbf{v}(t)\|$ is referred to as the
speed of the point, and is obtained in the usual way via the norm of the velocity
vector,

$$v = \|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + v_3^2} = \sqrt{\sum_{i=1}^{3} v_i^2} . \tag{7}$$
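To make (5) to (7) concrete, here is a minimal Python sketch that approximates the limit in (5) by a central finite difference with a small step `h` (the step size is an assumption of the sketch; the exact limit is not computed). The trajectory is a hypothetical helix $\mathbf{r}(t) = (\cos t, \sin t, t)$, whose exact speed is $\sqrt 2$ at every time.

```python
import math


def r(t):
    """Hypothetical trajectory: a helix r(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)


def velocity(traj, t, h=1e-6):
    """Central-difference estimate of v(t) = dr/dt, component by component (eqs. 5, 6)."""
    ahead, behind = traj(t + h), traj(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))


def speed(v):
    """Speed as the norm of the velocity vector, eq. (7)."""
    return math.sqrt(sum(v_i * v_i for v_i in v))


v = velocity(r, 0.7)
print(v)         # close to (-sin 0.7, cos 0.7, 1)
print(speed(v))  # close to sqrt(2) = 1.41421...
```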

Apart from the parameterization of a trajectory with respect to time $t$, it is
often useful to parameterize it according to its path length $s$. This makes it
possible to describe geometrical properties of the trajectory independently of the
speed with which a point may follow it. For example, we can express the velocity
of a point via

$$\mathbf{v} = \lim_{\Delta s \to 0} \frac{\Delta\mathbf{r}}{\Delta s} \, \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t} = \frac{d\mathbf{r}}{ds}\,\frac{ds}{dt} . \tag{8}$$

The derivative $ds/dt$ is by definition the speed of a point, i.e., the increase of path
length per unit of time. This means that we can write the velocity vector as

$$\mathbf{v} = v\,\frac{d\mathbf{r}}{ds} =: v\,\mathbf{e}_t , \tag{9}$$

where $\mathbf{e}_t = d\mathbf{r}/ds$ is a tangential vector to the trajectory. Unlike $\mathbf{v}$, it is
a unit vector, as can be seen by taking the norm of (9) and comparing it with (7):
$\|\mathbf{v}\| = v\,\|\mathbf{e}_t\|$ requires $\|\mathbf{e}_t\| = 1$. The transition to a
parameterization of the trajectory by its path length $s$ therefore allows a definition
of a tangential vector that only depends on the geometry of the trajectory.
We now move on to another vectorial quantity, the rate of change of velocity,
or the acceleration. This is simply the time derivative of the velocity,

$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \dot{\mathbf{v}} = \ddot{\mathbf{r}} . \tag{10}$$
Apart from just specifying a time dependence of the Cartesian coordinates similar
to (6), we can express it as a linear combination of vectors that are associated
with the trajectory. We start from (9), and apply the product rule of differentiation:

$$\mathbf{a} = \frac{d}{dt}(v\,\mathbf{e}_t) = \frac{dv}{dt}\,\mathbf{e}_t + v\,\frac{d\mathbf{e}_t}{dt} = \dot{v}\,\mathbf{e}_t + v\,\frac{d\mathbf{e}_t}{dt} . \tag{11}$$
The first term in this sum points in the tangential direction, and, as
its vectorial factor $\mathbf{e}_t$ is a unit vector, the quantity $\dot{v}$ describes the change
of speed over time along the path. The second term in this expression
contains a time derivative of the tangential unit vector $\mathbf{e}_t$. If we take the
normalization condition for $\mathbf{e}_t$,

$$\mathbf{e}_t \cdot \mathbf{e}_t = 1 , \tag{12}$$

and differentiate the whole equation with respect to time, we obtain

$$\frac{d}{dt}(\mathbf{e}_t \cdot \mathbf{e}_t) = \frac{d\mathbf{e}_t}{dt} \cdot \mathbf{e}_t + \mathbf{e}_t \cdot \frac{d\mathbf{e}_t}{dt} = 2\,\mathbf{e}_t \cdot \frac{d\mathbf{e}_t}{dt} = 0 . \tag{13}$$

The vanishing scalar product between $\mathbf{e}_t$ and $d\mathbf{e}_t/dt$ means that
the latter is a vector that is perpendicular to the tangential vector, and thus
points in a direction orthogonal to the trajectory at each point. We now split up
$d\mathbf{e}_t/dt$ into a modulus and a direction, and formulate it as far as possible with
elements that only depend on the geometry of the trajectory. For this, we write

$$\frac{d\mathbf{e}_t}{dt} = \frac{d\mathbf{e}_t}{ds}\,\frac{ds}{dt} = v\,\left\|\frac{d\mathbf{e}_t}{ds}\right\|\,\mathbf{e}_n , \tag{14}$$

where $\mathbf{e}_n$ is a new unit vector in the direction of the derivative of the tangent vector
$\mathbf{e}_t$ with respect to the path length. The last two factors in this expression do
not contain any explicit time dependence anymore, and are thus only a property
of the trajectory's geometry.
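To interpret the meaning of this product, we consider a circular trajectory
parameterized by $s$, with

[Figure: circle of radius R centered at the origin O in the (x1, x2) plane, with the position vector r(s) pointing to a point on the circle.]

$$\mathbf{r}(s) = R\left(\cos\frac{s}{R}\,\mathbf{e}_1 + \sin\frac{s}{R}\,\mathbf{e}_2\right) \tag{15}$$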


By taking the derivative with respect to $s$, one finds

$$\frac{d\mathbf{r}}{ds} = R\left(-\frac{1}{R}\sin\frac{s}{R}\,\mathbf{e}_1 + \frac{1}{R}\cos\frac{s}{R}\,\mathbf{e}_2\right) = -\sin\frac{s}{R}\,\mathbf{e}_1 + \cos\frac{s}{R}\,\mathbf{e}_2 , \tag{16}$$

making use of the chain rule and the fact that the unit vectors $\mathbf{e}_{1,2}$ do
not depend on $s$. The derivative $d\mathbf{r}/ds$ has norm 1 because $\sin^2(s/R) + \cos^2(s/R) = 1$,
and is thus the unit tangential vector $\mathbf{e}_t$ to the trajectory (15),
i.e., $d\mathbf{r}/ds = \mathbf{e}_t$. We now proceed to take the next derivative with respect to $s$:

$$\frac{d\mathbf{e}_t}{ds} = -\frac{1}{R}\cos\frac{s}{R}\,\mathbf{e}_1 - \frac{1}{R}\sin\frac{s}{R}\,\mathbf{e}_2 = -\frac{1}{R}\left(\cos\frac{s}{R}\,\mathbf{e}_1 + \sin\frac{s}{R}\,\mathbf{e}_2\right) \tag{17}$$

The term in parentheses is again a unit vector, which we denote $\mathbf{e}_n$, as it is
normal to $\mathbf{e}_t$. Thus, the derivative

$$\frac{d\mathbf{e}_t}{ds} = -\frac{1}{R}\,\mathbf{e}_n \tag{18}$$

for this circular trajectory has a modulus that is inversely proportional to the
radius $R$ of the circle. As any (reasonably smooth) trajectory can locally be
approximated by an arc of a circle, we can now identify the factor $\|d\mathbf{e}_t/ds\|$ in (14)
with $1/R$, where $R$ is a local radius of curvature. With this and (14) we can write
the expression (11) for the acceleration as

$$\mathbf{a} = \dot{v}\,\mathbf{e}_t + \frac{v^2}{R}\,\mathbf{e}_n , \tag{19}$$

where some care has to be taken with the orientation of the normal unit vector $\mathbf{e}_n$:
with the definition in (14) it points towards the center of curvature, whereas the $\mathbf{e}_n$
introduced in (17) for the circle points away from it. The quantity $1/R$ is also referred
to as the local curvature of the trajectory at any point. A straight trajectory with
$d\mathbf{e}_t/ds = 0$ has curvature 0.
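The decomposition (19) can be checked numerically. The Python sketch below uses a hypothetical circle of radius R = 2 traversed with a non-uniform angle θ(t) = t², so that both terms in (19) are nonzero; the derivatives are approximated by central finite differences (an assumption of the sketch). At t = 1 the exact values are v = 4, v̇ = 4 and v²/R = 8.

```python
import math

R = 2.0  # hypothetical circle radius


def r(t):
    """Circle of radius R traversed with the non-uniform angle theta(t) = t**2."""
    theta = t * t
    return (R * math.cos(theta), R * math.sin(theta))


def first_derivative(f, t, h=1e-4):
    """Central-difference approximation of df/dt for a vector-valued f."""
    return tuple((a - b) / (2 * h) for a, b in zip(f(t + h), f(t - h)))


def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of d^2f/dt^2 for a vector-valued f."""
    return tuple((a - 2 * b + c) / h**2 for a, b, c in zip(f(t + h), f(t), f(t - h)))


t = 1.0
v_vec = first_derivative(r, t)
a_vec = second_derivative(r, t)
v = math.hypot(*v_vec)                                   # speed
e_t = tuple(c / v for c in v_vec)                        # unit tangent, eq. (9)
a_tan = sum(a * e for a, e in zip(a_vec, e_t))           # tangential part: dv/dt
a_nrm_vec = tuple(a - a_tan * e for a, e in zip(a_vec, e_t))
a_nrm = math.hypot(*a_nrm_vec)                           # normal part: v**2 / R
print(v, a_tan, a_nrm, v**2 / a_nrm)                     # last value: radius of curvature
```

The normal part `a_nrm_vec` points towards the center of the circle (here the origin), i.e., along $\mathbf{e}_n$ as defined in (14).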

1.2 Vectors in polar coordinates


So far, we have chosen Cartesian coordinates to describe the trajectory of a
point, and some of its derived quantities. We now try to do this (for a two-dimensional
problem) in another coordinate system, the polar coordinates. The
polar coordinates $(r, \theta)$ are connected to their Cartesian counterparts $(x_1, x_2)$ via

$$x_1 = r\cos\theta , \quad x_2 = r\sin\theta . \tag{20}$$

[Figure: a point at distance r from the origin O and at angle θ to the x1 axis, with the local unit vectors e_r and e_θ attached to it.]

While a trajectory can simply be specified by describing the coordinate pair
$(r, \theta)$ as a function of a parameter like time $t$ or path length $s$, the description
of directed, vector-like quantities becomes more tricky. We need a basis to
describe vectors that is somehow connected to the coordinates. Such basis vectors can
be constructed from the direction of the position change from a given
point $\mathbf{r}$ to $\mathbf{r} + d\mathbf{r}$ when only one of the coordinates is changed, i.e., $r \to r + dr$ or $\theta \to \theta + d\theta$.
From (20) we find (in Cartesian coordinates)

$$\begin{aligned}
\mathbf{e}_r &= \text{const}_1 \cdot \frac{\partial}{\partial r}\left(x_1\,\mathbf{e}_1 + x_2\,\mathbf{e}_2\right) = \cos\theta\,\mathbf{e}_1 + \sin\theta\,\mathbf{e}_2 \\
\mathbf{e}_\theta &= \text{const}_2 \cdot \frac{\partial}{\partial \theta}\left(x_1\,\mathbf{e}_1 + x_2\,\mathbf{e}_2\right) = -\sin\theta\,\mathbf{e}_1 + \cos\theta\,\mathbf{e}_2 ,
\end{aligned} \tag{21}$$

where the constants are chosen to normalize the vectors. In the first equation,
this constant is 1; in the second equation it is $1/r$. The two vectors $\mathbf{e}_r, \mathbf{e}_\theta$ form an
orthonormal basis that can be used to express vectors anywhere in (here: two-dimensional) space.
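The construction behind (21) can be sketched numerically: differentiate the position (20) with respect to one coordinate at a time and normalize. In the hypothetical Python example below the partial derivatives are approximated by central differences (an assumption of the sketch); the length of the θ derivative comes out as r, matching the normalization constant 1/r quoted above.

```python
import math


def position(r, theta):
    """Cartesian position (x1, x2) from polar coordinates, eq. (20)."""
    return (r * math.cos(theta), r * math.sin(theta))


def polar_unit_vectors(r, theta, h=1e-6):
    """Normalized partial derivatives of the position with respect to r and theta."""
    d_r = [(a - b) / (2 * h) for a, b in zip(position(r + h, theta), position(r - h, theta))]
    d_th = [(a - b) / (2 * h) for a, b in zip(position(r, theta + h), position(r, theta - h))]
    len_r, len_th = math.hypot(*d_r), math.hypot(*d_th)
    e_r = tuple(c / len_r for c in d_r)      # const_1 = 1 / len_r, which is 1
    e_th = tuple(c / len_th for c in d_th)   # const_2 = 1 / len_th, which is 1/r
    return e_r, e_th, len_th


e_r, e_th, len_th = polar_unit_vectors(3.0, 0.4)
print(e_r, e_th)   # close to (cos 0.4, sin 0.4) and (-sin 0.4, cos 0.4)
print(len_th)      # close to r = 3
```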
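As an example, we now express the velocity and acceleration vectors of a moving
point in these coordinates. First, we note that any point on the trajectory $\mathbf{r}(t)$
can be expressed as

$$\mathbf{r} = r\,\mathbf{e}_r . \tag{22}$$

We then arrive at the velocity by taking the time derivative:

$$\mathbf{v} = \frac{d}{dt}(r\,\mathbf{e}_r) = \dot{r}\,\mathbf{e}_r + r\,\frac{d\mathbf{e}_r}{dt} . \tag{23}$$

We use the product rule to carry out the differentiation, because $\mathbf{e}_r$ depends
on the position of the point, and may thus not be constant over time. To find the change
of $\mathbf{e}_r$, we express it via the changes of the new coordinates:

$$\frac{d\mathbf{e}_r}{dt} = \frac{\partial\mathbf{e}_r}{\partial\theta}\,\frac{d\theta}{dt} + \underbrace{\frac{\partial\mathbf{e}_r}{\partial r}}_{=0}\,\frac{dr}{dt} = \mathbf{e}_\theta\,\frac{d\theta}{dt} = \dot\theta\,\mathbf{e}_\theta . \tag{24}$$

With this, we can write the velocity vector as a linear combination of the new
unit vectors $\mathbf{e}_r, \mathbf{e}_\theta$:

$$\mathbf{v} = \dot{r}\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta . \tag{25}$$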
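Similarly, we express the acceleration in the basis $(\mathbf{e}_r, \mathbf{e}_\theta)$:

$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \ddot{r}\,\mathbf{e}_r + \dot{r}\,\frac{d\mathbf{e}_r}{dt} + (\dot{r}\dot\theta + r\ddot\theta)\,\mathbf{e}_\theta + r\dot\theta\,\frac{d\mathbf{e}_\theta}{dt} . \tag{26}$$

We obtain the time derivative of $\mathbf{e}_\theta$ in a similar way as in (24):

$$\frac{d\mathbf{e}_\theta}{dt} = \frac{\partial\mathbf{e}_\theta}{\partial\theta}\,\frac{d\theta}{dt} + \underbrace{\frac{\partial\mathbf{e}_\theta}{\partial r}}_{=0}\,\frac{dr}{dt} = \dot\theta\,\frac{\partial\mathbf{e}_\theta}{\partial\theta} = -\dot\theta\,\mathbf{e}_r . \tag{27}$$

The last step follows from (21) by differentiating $\mathbf{e}_\theta$ with respect to $\theta$ and comparing the result with $\mathbf{e}_r$.
Inserting (24) and (27) into (26), we can clean up the expression for the acceleration, and arrive at

$$\mathbf{a} = (\ddot{r} - r\dot\theta^2)\,\mathbf{e}_r + (2\dot{r}\dot\theta + r\ddot\theta)\,\mathbf{e}_\theta . \tag{28}$$

This strategy for obtaining local unit vectors can be applied to other coordinate
systems. It is an important method to find a basis to express vectorial quantities
in whatever coordinate system is chosen.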
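A quick consistency check of (25) and (28): the Python sketch below takes a hypothetical trajectory $r(t) = 1 + t/2$, $\theta(t) = t^2$, computes the Cartesian velocity and acceleration by central finite differences (the step size is an assumption of the sketch), projects them onto $\mathbf{e}_r$ and $\mathbf{e}_\theta$ from (21), and compares the result with the polar formulas.

```python
import math


def r_of(t):
    return 1.0 + 0.5 * t   # hypothetical radial motion: r_dot = 0.5, r_ddot = 0


def theta_of(t):
    return t * t           # hypothetical angular motion: theta_dot = 2t, theta_ddot = 2


def cartesian(t):
    """Position (x1, x2) of the point at time t, via eq. (20)."""
    return (r_of(t) * math.cos(theta_of(t)), r_of(t) * math.sin(theta_of(t)))


def d1(f, t, h=1e-4):
    """Central-difference first derivative of a vector-valued f."""
    return tuple((a - b) / (2 * h) for a, b in zip(f(t + h), f(t - h)))


def d2(f, t, h=1e-4):
    """Central-difference second derivative of a vector-valued f."""
    return tuple((a - 2 * b + c) / h**2 for a, b, c in zip(f(t + h), f(t), f(t - h)))


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


t = 0.8
th = theta_of(t)
e_r = (math.cos(th), math.sin(th))       # eq. (21)
e_th = (-math.sin(th), math.cos(th))
v, a = d1(cartesian, t), d2(cartesian, t)

r, r_dot, r_ddot = r_of(t), 0.5, 0.0
th_dot, th_ddot = 2 * t, 2.0
predicted_v = (r_dot, r * th_dot)                                          # eq. (25)
predicted_a = (r_ddot - r * th_dot**2, 2 * r_dot * th_dot + r * th_ddot)   # eq. (28)
print((dot(v, e_r), dot(v, e_th)), predicted_v)
print((dot(a, e_r), dot(a, e_th)), predicted_a)
```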

1.3 Angular velocity


So far, we have encountered the vector quantities for the velocity $\mathbf{v}$, which describes
the rate of change of the position $\mathbf{r}$ of a point in time, and the acceleration
$\mathbf{a}$, which describes the rate of change of the velocity.
We now come back to the motion of a point on a circular trajectory defined
by (15). Here, polar coordinates $(r, \theta)$ as defined in (20) simplify the description
because $r$ is constant. If the trajectory is parameterized by time $t$, all information
about the motion is contained in the function $\theta(t)$. Similar to the linear motion,
we can introduce an angular velocity $\omega$ for the instantaneous rate of change of $\theta$:

$$\omega(t) = \frac{d}{dt}\theta(t) . \tag{29}$$

This is a simple quantity for a motion in two dimensions. As any motion in
three dimensions can be locally approximated by a circular arc, we look more
carefully at a circular motion in three dimensions. The approximating circular
motion at each point has a well-defined axis of rotation. This axis can be
described by a vector, up to an ambiguity in its pointing direction (an
axis is equally well described by the two vectors $\boldsymbol\omega$ and $-\boldsymbol\omega$) and in its length. In the
same way as the rate of change of the position is encoded in the length $v = \|\mathbf{v}\|$ of
the velocity vector $\mathbf{v}$, we now encode the rate of change of the angle, $\omega = d\theta/dt$,
of the approximating circular motion into the length of a vector $\boldsymbol\omega$, which points
along the axis of rotation. We can fix the pointing ambiguity by postulating that $\mathbf{v}$,
the normal part of $\mathbf{a} = \dot{\mathbf{v}}$ (which is perpendicular to $\mathbf{v}$), and $\boldsymbol\omega$ form a right-handed system. Note
that here the angle $\theta$ refers to a polar coordinate system with its origin at the
center of the approximating circle. This definition of $\boldsymbol\omega$ is independent of any
coordinate system, a property that seems sensible to postulate for meaningful
physical quantities.
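We now check the relation between $\mathbf{v}$, $\mathbf{r}$ and $\boldsymbol\omega$ if we choose a coordinate system
that has its origin somewhere on the axis of rotation, as shown in the figure.

[Figure: trajectory through a point P with the approximate circular path of radius R in P, the velocity v at P, and the position vector r from the origin O on the instantaneous rotation axis (direction ω, rotation angle δθ); α is the angle between r and the axis.]

The modulus of the velocity is given by $v = \omega R = \omega\|\mathbf{r}\|\sin\alpha$, and $\mathbf{v}$ is
perpendicular to both $\mathbf{r}$ and $\boldsymbol\omega$. This relationship is fulfilled by the vector product

$$\mathbf{v} = \boldsymbol\omega \times \mathbf{r} , \tag{30}$$

which incidentally also matches the choice for the direction of $\boldsymbol\omega$.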
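Relation (30) is easy to check numerically. In the hypothetical Python example below, $\boldsymbol\omega$ points along $\mathbf{e}_3$ with length 2 and the point sits at distance R = 1 from the axis; the sketch verifies that $\boldsymbol\omega \times \mathbf{r}$ is perpendicular to both $\mathbf{r}$ and $\boldsymbol\omega$ and has the modulus $\omega\|\mathbf{r}\|\sin\alpha = \omega R$.

```python
import math


def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


omega = (0.0, 0.0, 2.0)  # hypothetical angular velocity along e3
r = (1.0, 0.0, 1.5)      # point at distance R = 1 from the rotation axis

v = cross(omega, r)                                         # eq. (30)
alpha = math.acos(dot(omega, r) / (norm(omega) * norm(r)))  # angle between r and axis
print(v)                                                    # (0.0, 2.0, 0.0)
print(dot(v, r), dot(v, omega))                             # 0.0 0.0: v is perpendicular to both
print(norm(v), norm(omega) * norm(r) * math.sin(alpha))     # both approximately omega * R = 2
```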


In analogy to the velocity $\mathbf{v}$ describing the rate of change of a position $\mathbf{r}$ in space,
and the simple relation $\mathbf{v} = \dot{\mathbf{r}}$, this may suggest that there is a vector quantity
that has the meaning of an angle, and whose time derivative leads to the vector
quantity $\boldsymbol\omega$ above. This, however, will not work. To see why, we need to take a
closer look at coordinate transformations.

1.4 Coordinate transformations


It can be useful to describe a vector quantity like the velocity in different coordinate
systems. The usual Cartesian coordinate system with some origin is
one of them; the tangential coordinate system of a trajectory, defined by $\mathbf{e}_t$ from
(9), $\mathbf{e}_n$ from (14) and a third unit vector $\mathbf{e}_t \times \mathbf{e}_n$ orthogonal to the first
two, is another one. Yet other orthonormal reference frames can be generated from the local
unit vectors of polar, spherical or cylindrical coordinate systems. They all
have the property that any vector can be expressed as a linear combination of
the respective basis vectors:

$$\mathbf{a} = \sum_{i=1}^{3} a_i\,\mathbf{e}_i . \tag{31}$$

Any transition from a right-handed basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ to a different right-handed
basis $\{\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3\}$ can be expressed as a rotation in space. This can be shown
rigorously, but we skip this here and refer to a course in basic linear algebra. The
coordinate transformation should not affect the scalar product of two vectors,
and as a consequence, the norm of a single vector should be conserved, i.e.,

$$\mathbf{a}' \cdot \mathbf{b}' = \sum_{i=1}^{3} a'_i\,b'_i = \sum_{i=1}^{3} a_i\,b_i = \mathbf{a} \cdot \mathbf{b} , \tag{32}$$


with components $a_i$ and $a'_i$ in the respective bases. The transformation between
the coordinates is a linear relationship, and can be represented by

$$\begin{pmatrix} a'_1 \\ a'_2 \\ a'_3 \end{pmatrix} = M \cdot \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix} \cdot \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \quad\text{or}\quad a'_i = \sum_{j=1}^{3} m_{ij}\,a_j , \tag{33}$$

with the matrix $M$ made up of the components $m_{ij}$. The matrix representation $M$
of a linear transformation that preserves the scalar product between two vectors
obeys $|\det M| = 1$.
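We now look at specific examples of rotations and their representation in the
form (33). A rotation around the $\mathbf{e}_3$ axis by an angle $\varphi$ is represented by

$$R_3(\varphi) = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix} . \tag{34}$$

Matrix representations of rotations around the axes $\mathbf{e}_{1,2}$ can be obtained by the cyclic
coordinate change $1 \to 2 \to 3 \to 1$ (or $x \to y \to z \to x$):

$$R_1(\varphi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix} \quad\text{and}\quad R_2(\varphi) = \begin{pmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{pmatrix} . \tag{35}$$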

Rotations around different axes generally do not commute. The matrix representing
two sequential transformations is the matrix product of the representations
of the individual transformations. For example,

$$R_2(90^\circ) \cdot R_3(90^\circ) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} , \tag{36}$$

but

$$R_3(90^\circ) \cdot R_2(90^\circ) = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix} \neq R_2(90^\circ) \cdot R_3(90^\circ) . \tag{37}$$

If we were to represent a rotation by a vector, the sum of two such vectors would
again be a vector representing a rotation, and it should represent the concatenation
of the rotations represented by the individual vectors; this can be motivated by
looking at two rotations around the same axis. For different axes, however, the
representation of concatenated rotations depends on their sequence, but the sum
of two vectors does not. Hence, a rotation by a finite angle cannot be represented
in a meaningful way by a vector. This answers, in part, the problem we were
facing at the end of section 1.3: there is no underlying vector quantity which has
the angular velocity vector $\boldsymbol\omega$ as its rate of change in time.
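The matrix products (36) and (37) can be reproduced with a few lines of plain Python. The sketch below builds $R_2$ and $R_3$ from (34) and (35), multiplies them in both orders, and rounds away the floating-point residue of $\cos 90^\circ$; it also confirms $\det = +1$ for the product, consistent with $|\det M| = 1$.

```python
import math


def R2(phi):
    """Rotation about e2 by phi, eq. (35)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def R3(phi):
    """Rotation about e3 by phi, eq. (34)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def rounded(M):
    """Round entries to integers, hiding residue such as cos(90 degrees) ~ 6e-17."""
    return [[round(x) for x in row] for row in M]


q = math.radians(90)
first = rounded(matmul(R2(q), R3(q)))    # eq. (36)
second = rounded(matmul(R3(q), R2(q)))   # eq. (37)
print(first)                  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(second)                 # [[0, -1, 0], [0, 0, 1], [-1, 0, 0]]
print(first == second, det3(first))   # False 1
```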


We stay for a while with the properties of rotation transformations. They are
still meaningful independently of the choice of a coordinate system, although they
cannot be represented by vectors. Specifically, observers with different coordinate
systems can agree on a rotation axis, direction, and angle.
As rotations can be represented by matrices that transform vectors according to
(33), and such matrices can again be represented in different coordinate systems,
we can try to extract properties of rotation transformations that are independent
of the coordinate system. The determinant $\det M$ of a representing matrix $M$
is such a property. Others are the eigenvalues of a representing matrix. For the
specific example of a rotation around $\mathbf{e}_3$ in (34), we can evaluate the characteristic
equation to determine the eigenvalues $\lambda$:

$$|R_3(\varphi) - \lambda I| = \begin{vmatrix} c - \lambda & -s & 0 \\ s & c - \lambda & 0 \\ 0 & 0 & 1 - \lambda \end{vmatrix} = (1 - \lambda)\left[(c - \lambda)^2 + s^2\right] = 0 , \tag{38}$$

with $c = \cos\varphi$ and $s = \sin\varphi$. Using $c^2 + s^2 = 1$, this can be further simplified to

$$(1 - \lambda)(1 - 2c\lambda + \lambda^2) = 0 . \tag{39}$$

The first eigenvalue is $\lambda = 1$; one corresponding eigenvector is

$$\mathbf{a} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \mathbf{e}_3 . \tag{40}$$

This represents the fact that vectors along the rotation axis do not change. More
generally, an eigenvector to the eigenvalue 1 of a matrix representing an arbitrary
rotation provides a nice way to find the axis of rotation in any Cartesian coordinate
system. For the other two eigenvalues $\lambda$ that fulfill (39), we find the roots
of the second factor $(1 - 2\cos\varphi\,\lambda + \lambda^2)$:

$$\lambda = \cos\varphi \pm \sqrt{\cos^2\varphi - 1} = \cos\varphi \pm \sqrt{-1}\,\sqrt{1 - \cos^2\varphi} = \cos\varphi \pm i\sin\varphi = e^{\pm i\varphi} . \tag{41}$$

Therefore, we can extract the angle of rotation represented by a matrix by looking
at its complex eigenvalues. As a side remark, the corresponding eigenvectors
for a rotation represented by $R_3(\varphi)$ are $\mathbf{a} = \mathbf{e}_1 \mp i\,\mathbf{e}_2$ for $\lambda = e^{\pm i\varphi}$. These vectors are not
too meaningful in classical mechanics, but become useful, e.g., when considering
representations of electromagnetic fields of circular polarization; left- and right-handed
polarized fields propagating along the $\mathbf{e}_3$ axis are represented by these
eigenvectors, and are preserved under rotations around $\mathbf{e}_3$.
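Reading off axis and angle from a rotation matrix can be sketched as follows. Instead of a general eigenvector solver, the Python example below uses two shortcuts that are not used in the text above: the angle follows from the trace, since the trace equals the sum of the eigenvalues $1 + e^{i\varphi} + e^{-i\varphi} = 1 + 2\cos\varphi$, and the axis is taken from the antisymmetric part of the matrix, which equals $2\sin\varphi$ times the unit axis vector for $0 < \varphi < \pi$. Applied to the product (36), it gives the axis $(1,1,1)/\sqrt 3$ and the angle 120°, and it confirms that $1$ and $e^{\pm i\varphi}$ are roots of the characteristic equation.

```python
import cmath
import math


def det3(M):
    """Determinant of a 3x3 matrix (entries may be complex)."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def axis_angle(R):
    """Unit axis and angle of a proper rotation matrix R with 0 < angle < pi."""
    trace = R[0][0] + R[1][1] + R[2][2]
    phi = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))      # trace = 1 + 2 cos(phi)
    w = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])  # = 2 sin(phi) * axis
    length = math.sqrt(sum(c * c for c in w))
    return tuple(c / length for c in w), phi


R = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]   # R2(90 deg) . R3(90 deg) from eq. (36)
axis, phi = axis_angle(R)
print(axis, math.degrees(phi))          # axis (1, 1, 1)/sqrt(3), angle close to 120

# Each eigenvalue 1, e^{+i phi}, e^{-i phi} should make det(R - lambda I) vanish, cf. (38)-(41).
for lam in (1.0, cmath.exp(1j * phi), cmath.exp(-1j * phi)):
    shifted = [[R[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    print(lam, abs(det3(shifted)))      # second value close to 0
```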
The rotations we have considered so far all preserve the handedness of a coordinate
system; they fall into the category with $\det R = +1$ and are so-called proper
rotations. Scalar-product preserving coordinate transformations with $\det R = -1$
are referred to as improper rotations, and are concatenations of (an odd
number of) coordinate inversions or mirror images with proper rotations. For
example, a reflection in the direction of $\mathbf{e}_3$ is represented by the matrix

$$M_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} . \tag{42}$$
We have seen examples of vectors like velocity and acceleration. For a change
to a different orthogonal coordinate system, they need to be transformed according
to (33). Physical quantities like the temperature at a given location or the speed
of a point that do not change under coordinate changes form the class of scalars.
We can extend this concept and ask, e.g., how to transform the representations
of physical properties that need to be represented by matrices. Such quantities,
apart from the rotations we just saw, do exist; an example in continuum mechanics
would be the stress tensor, but this is beyond the scope of this module. The
moment of inertia of a solid is another one. If such a quantity (referred to as a
tensor of rank 2) is represented by a matrix $T$ made up of components $t_{ij}$, its
components transform under $M$ according to

$$t'_{ij} = \sum_{k,l=1}^{3} m_{ik}\,m_{jl}\,t_{kl} . \tag{43}$$

One can turn these transformation properties around, and classify a physical
property as scalar, vector or tensor according to its transformation properties:
scalars do not change under rotations, vectors are transformed according to (33),
tensors of rank 2 according to (43), and so on.
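In matrix form, the rule (43) amounts to $T' = M\,T\,M^{\mathsf T}$. The Python sketch below applies (43) literally to a made-up symmetric tensor, once with the proper rotation $R_3(30^\circ)$ and once with the reflection $M_3$ from (42), and checks two quantities that should not depend on the coordinate system: the trace and the determinant (the sum and product of the eigenvalues of $T$). The tensor entries are hypothetical.

```python
import math


def R3(phi):
    """Rotation about e3 by phi, eq. (34)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


M3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # reflection, eq. (42)


def transform_tensor(M, T):
    """Rank-2 tensor transformation, eq. (43): t'_ij = sum over k, n of m_ik m_jn t_kn."""
    return [[sum(M[i][k] * M[j][n] * T[k][n] for k in range(3) for n in range(3))
             for j in range(3)] for i in range(3)]


def trace(T):
    return T[0][0] + T[1][1] + T[2][2]


def det3(T):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (T[0][0] * (T[1][1] * T[2][2] - T[1][2] * T[2][1])
            - T[0][1] * (T[1][0] * T[2][2] - T[1][2] * T[2][0])
            + T[0][2] * (T[1][0] * T[2][1] - T[1][1] * T[2][0]))


T = [[2.0, 1.0, 0.5], [1.0, 3.0, 0.0], [0.5, 0.0, 5.0]]  # hypothetical symmetric tensor

for M in (R3(math.radians(30)), M3):
    T_new = transform_tensor(M, T)
    # Trace and determinant agree before and after the transformation.
    print(trace(T), trace(T_new), det3(T), det3(T_new))
```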

1.5 Infinitesimal rotations


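Examples (36) and (37) show that finite rotations generally do not commute.
However, the situation changes if we consider only small rotations. Starting from
(30), we find a change of a position $\mathbf{r}$ in a small time interval $\delta t$:

$$\mathbf{r} \to \mathbf{r} + \delta\mathbf{r}_1 \quad\text{with}\quad \delta\mathbf{r}_1 = \mathbf{v}\,\delta t = (\boldsymbol\omega\,\delta t) \times \mathbf{r} = \delta\boldsymbol\theta_1 \times \mathbf{r} , \tag{44}$$

where $\delta\boldsymbol\theta_1$ represents a small rotation by an angle $\delta\theta_1$ around an axis parallel to
$\delta\boldsymbol\theta_1$, with a handedness defined in the same way as for $\boldsymbol\omega$.
Two sequential infinitesimal rotations, $\delta\boldsymbol\theta_1$ followed by $\delta\boldsymbol\theta_2$, change the position
$\mathbf{r}$ of a point, similarly to (44), by the infinitesimal amount

$$\begin{aligned}
\delta\mathbf{r}_{12} &= \delta\mathbf{r}_1 + \delta\boldsymbol\theta_2 \times (\mathbf{r} + \delta\mathbf{r}_1) \\
&= \delta\boldsymbol\theta_1 \times \mathbf{r} + \delta\boldsymbol\theta_2 \times (\mathbf{r} + \delta\boldsymbol\theta_1 \times \mathbf{r}) \\
&= \delta\boldsymbol\theta_1 \times \mathbf{r} + \delta\boldsymbol\theta_2 \times \mathbf{r} + \delta\boldsymbol\theta_2 \times (\delta\boldsymbol\theta_1 \times \mathbf{r}) \\
&\approx \delta\boldsymbol\theta_1 \times \mathbf{r} + \delta\boldsymbol\theta_2 \times \mathbf{r}
\end{aligned} \tag{45}$$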


by neglecting the product term of two infinitesimal quantities in the last step.
This representation is the same if one carries out the rotations in the opposite order.
As a consequence, infinitesimal rotations represented by vectors $\delta\boldsymbol\theta$ do commute.
We can also see this with an infinitesimal version of our example in (36) and
(37). We first approximate the representation (34) by a truncated Taylor expansion
for small angles $\delta\theta$:

$$R_3(\delta\theta_3) = \begin{pmatrix} 1 - \delta\theta_3^2/2 + \ldots & -\delta\theta_3 + \delta\theta_3^3/3! - \ldots & 0 \\ \delta\theta_3 - \delta\theta_3^3/3! + \ldots & 1 - \delta\theta_3^2/2 + \ldots & 0 \\ 0 & 0 & 1 \end{pmatrix} \approx \begin{pmatrix} 1 & -\delta\theta_3 & 0 \\ \delta\theta_3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} , \tag{46}$$

again by neglecting terms of higher than linear order in $\delta\theta$. Similarly, we
approximate $R_2(\delta\theta_2)$, and evaluate two concatenated small rotation matrices. We
find

$$R_2(\delta\theta_2) \cdot R_3(\delta\theta_3) = \begin{pmatrix} 1 & -\delta\theta_3 & \delta\theta_2 \\ \delta\theta_3 & 1 & 0 \\ -\delta\theta_2 & \delta\theta_2\,\delta\theta_3 & 1 \end{pmatrix} \tag{47}$$

and

$$R_3(\delta\theta_3) \cdot R_2(\delta\theta_2) = \begin{pmatrix} 1 & -\delta\theta_3 & \delta\theta_2 \\ \delta\theta_3 & 1 & \delta\theta_2\,\delta\theta_3 \\ -\delta\theta_2 & 0 & 1 \end{pmatrix} . \tag{48}$$

The two expressions differ only by the terms $\delta\theta_2\,\delta\theta_3$, which are products
of two infinitesimal angles. Neglecting them with the same argument used for
truncating the Taylor expansion, the infinitesimal rotations commute.
So in summary, an infinitesimal rotation can be represented by a vector $\delta\boldsymbol\theta$. The
direction of this vector characterizes the rotation axis, and can be transformed in
a meaningful way according to (33). Its modulus represents the rotation angle,
and the sum of two vectors $\delta\boldsymbol\theta_1 + \delta\boldsymbol\theta_2$ correctly represents the concatenation of
the two individual infinitesimal rotations.
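The second-order nature of the difference between (47) and (48) can be seen numerically with the exact matrices (34) and (35). In the Python sketch below, both angles are set to the same hypothetical value ε; the largest entry of $R_2(\varepsilon)R_3(\varepsilon) - R_3(\varepsilon)R_2(\varepsilon)$ equals $\sin^2\varepsilon$ and thus shrinks like ε², so its ratio to ε² approaches 1, while each rotation matrix on its own differs from the identity at first order in ε.

```python
import math


def R2(phi):
    """Rotation about e2 by phi, eq. (35)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def R3(phi):
    """Rotation about e3 by phi, eq. (34)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def commutator_size(eps):
    """Largest entry of R2(eps) R3(eps) - R3(eps) R2(eps)."""
    A = matmul(R2(eps), R3(eps))
    B = matmul(R3(eps), R2(eps))
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


for eps in (1e-1, 1e-2, 1e-3):
    size = commutator_size(eps)
    print(eps, size, size / eps**2)   # ratio tends to 1: the difference is of order eps^2
```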

2 Newtonian Mechanics for single particles


So far, we only looked at a mathematical description of trajectories. One of
the main goals in classical mechanics is to describe the evolution of a physical
system in time. An extremely successful method for this was developed by Sir
Isaac Newton. The idea is not to specify the whole trajectory of a particle, but
to combine very few underlying universal principles with standard mathematical
methods. These principles were expressed as “laws”, and are sufficient (together
with some initial parameters) to lead to a complete description of the evolution
of a physical system.
These laws, together with the mathematical tools to solve differential equations,
have turned out to be tremendously successful in the description of mechanical
systems over extremely wide scales in space and time, ranging from motion
on the molecular level up to the motion of planets. It was only perhaps
120 years ago that the descriptive power of these few “laws” was recognized
as incomplete for describing the motion of mechanical objects, specifically in
areas where very small masses, extremely short time scales, high energies, or
velocities on the order of the speed of light are involved.
That said, Newton’s laws turned physics into a science with an enormous predictive
power based on very few and simple rules, and they describe extremely well the
phenomena that we encounter in our daily life.

2.1 Newton’s laws


Sir Isaac Newton formulated these laws in 1687 in his “Principia”. They were
formulated as follows:
I. A body remains at rest or in uniform motion unless acted upon by a force.

II. A body acted upon by a force moves in such a manner that the time rate of
change of momentum equals the force.

III. If two bodies exert forces on each other, these forces are equal in magnitude
and opposite in direction.
These laws do not mean that the physical world has to follow them strictly;
they are rather a tool to efficiently describe how many things evolve in time. In
a sufficiently well-specified context of initial conditions and participating forces,
they allow a very accurate description of the motion of objects. However, it is
necessary to observe how well Newton’s laws (or any other physical law) capture
a phenomenon in nature to see if corrections or additional principles need to be
added.
When these laws were formulated, the whole mathematical language of defini-
tions and formal logic was not as formally developed as it is now. Thus, it makes
sense to comment on these laws to make clearer what they actually state.
The first law is basically a definition of a “free particle”. The notion of a force
is used, but not defined in a very useful way. “Uniform motion” means that
the velocity of a particle does not change in time. For making a statement about
a velocity, we do require a specific reference frame, i.e., the choice of an origin
and a few other properties that we will see later. Formally, we would write the
first law as

$$\mathbf{F} = 0 \;\Rightarrow\; \mathbf{v} = \text{const.} \tag{49}$$
The second law explicitly relates force with momentum (originally called the quantity of
motion), and Newton provided a definition of the momentum of a
particle as

$$\mathbf{p} = m\mathbf{v} , \tag{50}$$

where $m$ is the mass of a particle, and $\mathbf{v}$ its velocity. Mathematically, the second
law can be written (for a constant mass $m$) as

$$\mathbf{F} = \frac{d}{dt}\mathbf{p} = \frac{d}{dt}(m\mathbf{v}) = m\mathbf{a} = m\ddot{\mathbf{x}} . \tag{51}$$

This is a definition of what is meant by force. However, this definition requires
that one already has an idea of what the mass $m$ of an object is. While it seems
intuitively clear, this is a reference to the inertial mass of an object. It is an
intrinsic quantity of an object that does not depend on its state of motion.
The third law is a statement on the motion of two objects that exert forces on
each other. More specifically, it makes reference to forces aligned along the
line connecting them,

$$\mathbf{F}_{12} \parallel (\mathbf{x}_1 - \mathbf{x}_2) , \tag{52}$$

where $\mathbf{x}_1$ and $\mathbf{x}_2$ are the positions of the two objects. Such forces are referred
to as central forces, but the choice of the name “central” will only become clear
at a stage when we consider objects of finite size. Examples of such forces are
the forces exerted by an elastic spring connecting the two bodies, the gravitational
force between two heavy masses, the force between two electrically charged
objects, van der Waals forces, and others.

[Figure: two masses m1 and m2 at positions x1 and x2, with the forces F12 and F21 acting along their connecting line.]

In such a case, the third law states that

$$\mathbf{F}_{12} = -\mathbf{F}_{21} . \tag{53}$$

Notably, the third law does not apply to forces that depend on the velocity of
particles. Examples of such forces are the friction of an object moving through
a medium, or even the (weak) velocity dependence of the gravitational force.
Together with the definition of force in the second law (51), one can rewrite the
third law (53) as

$$m_1\frac{d\mathbf{v}_1}{dt} = -m_2\frac{d\mathbf{v}_2}{dt} \quad\text{or}\quad m_1\mathbf{a}_1 = -m_2\mathbf{a}_2 \quad\text{or}\quad \frac{m_1}{m_2} = -\frac{a_2}{a_1} , \tag{54}$$

where the minus sign indicates the opposite orientation of the two accelerations
on the two masses. This relation can be used to compare the two inertial masses
by comparing the accelerations of the two bodies when they exert a force on
each other. Again, an appropriate reference frame is necessary to determine the
accelerations.

We often do not measure the ratio of inertial masses, but that of heavy masses
in the gravitational field of the earth with a balance. It does not follow logically
from Newton’s laws that the heavy mass and the inertial mass used in the
definition of a force (51) are the same. However, many tests comparing heavy and
inertial masses have been carried out, and they indicate that the two are
indeed the same property. Galileo is often credited with carrying out the first
of these tests by comparing the falling times of balls of the same material but
different masses from the tower of Pisa some time around 1600, but apparently
Simon Stevin had already carried out this experiment on the church
tower in Delft in 1586. Those early experiments had a limited accuracy, and
experiments carried out by Newton himself seem to have shown the equivalence
of heavy and inertial mass “only” to within $10^{-3}$. Much more accurate tests were
carried out by Eötvös¹ in 1890, and more recent experiments by Dicke² in 1964
could show the equivalence of the two quantities to within $10^{-12}$. This
suggests that they are indeed equivalent. The assertion that the two are equivalent
is referred to as the equivalence principle, and is one of the cornerstones of the general
theory of relativity.
Another important consequence of the third law can be seen when using the
definition of the force in the form of F = dp/dt in an isolated system of two
bodies 1,2. Then, it states
$$\frac{d\mathbf p_1}{dt} = -\frac{d\mathbf p_2}{dt} \quad\text{or}\quad \frac{d}{dt}(\mathbf p_1 + \mathbf p_2) = 0\,. \tag{55}$$
This means that the sum $\mathbf p_1 + \mathbf p_2$ of the two momenta is constant in time, i.e., the total momentum of this system is conserved. We will encounter a few more such conservation laws; they tend to simplify the description of the dynamics of a system.
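As a numerical illustration of (55), the following minimal Python sketch assumes two bodies on a line coupled by a spring, with illustrative masses, spring constant and initial conditions (none of these values come from the notes). It integrates both equations of motion and records $p_1 + p_2$ after every step. Because the two spring forces are equal and opposite at every step, the total momentum stays constant up to rounding errors.

```python
def total_momentum_history(m1=1.0, m2=3.0, k=10.0, rest_length=1.0,
                           x1=0.0, x2=1.5, v1=0.4, v2=-0.2,
                           dt=1e-3, steps=5000):
    """Return p1 + p2 after every step for two bodies coupled by a spring.

    All parameters are illustrative assumptions, not values from the notes.
    """
    history = []
    for _ in range(steps):
        stretch = (x2 - x1) - rest_length
        f_on_1 = k * stretch      # spring pulls body 1 towards body 2
        f_on_2 = -f_on_1          # third law: equal and opposite force
        # semi-implicit Euler step: update velocities, then positions
        v1 += f_on_1 / m1 * dt
        v2 += f_on_2 / m2 * dt
        x1 += v1 * dt
        x2 += v2 * dt
        history.append(m1 * v1 + m2 * v2)
    return history


p_total = total_momentum_history()
print(min(p_total), max(p_total))  # both close to 1.0*0.4 + 3.0*(-0.2) = -0.2
```

The individual momenta oscillate strongly, but their sum does not; replacing the spring by any other pair force that obeys the third law gives the same result.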

2.2 Inertial reference frames


The consequences of Newton’s laws only make sense if there is an appropriate coordinate system in which the derivatives of the momentum in (53) are evaluated. Such coordinate systems are referred to as inertial reference frames. A somewhat lame definition of such a reference frame is therefore one in which Newton’s laws hold. The first law (49) can then be used as a definition: an inertial reference frame is a reference frame in which a body under no influence of a force remains in uniform motion.
Once one has identified an inertial reference frame, there are several others: as Newton’s laws make statements about the change of velocities of bodies, the addition of a constant velocity does not affect the validity of the laws. Thus, a coordinate transformation between two reference systems $K$ and $K'$, with the position vector of a point being $\mathbf r$ in $K$ and $\mathbf r'$ in $K'$, according to
$$\mathbf r \to \mathbf r' = \mathbf r + \mathbf v_0 t \tag{56}$$
with a constant velocity $\mathbf v_0$ leads to $\ddot{\mathbf r} = \ddot{\mathbf r}'$. Hence, if Newton’s laws hold in $K$, they also hold in $K'$. Note that the times $t$ and $t'$ in the two reference systems are assumed to be the same. Such transformations are referred to as Galilean transformations; the equivalence of two inertial frames under such a transformation is referred to as Galilean invariance, or the principle of Newtonian or Galilean relativity.

¹ Loránd Eötvös, 1848-1919
² Robert H. Dicke, 1916-1997

2.3 Equation of motion


We can now use Newton’s second law (51) to determine the dynamics of a system, simply by considering
$$\mathbf F = m\ddot{\mathbf x} \tag{57}$$
as a differential equation for the position $\mathbf x$ of a body. This is a second-order ordinary differential equation (ODE), which we can solve if we know the force $\mathbf F$ acting on the body, and sufficient initial or boundary conditions for the position and/or velocity. However, this equation of motion does not always hold. Examples where it is not applicable are
• situations in non-inertial reference frames, like rotating reference frames. This becomes significant, e.g., in the description of the motion of air particles and clouds over long distances;
• velocities that are not small compared to the speed of light;
• situations in which time would not be homogeneous. This is perhaps a rather exotic restriction.

2.3.1 Free Fall


A simple equation of motion results for a constant force $\mathbf F = \mathbf F_0$. Then, the acceleration $\mathbf a = \mathbf F_0/m$ is constant as well, and we obtain the velocity by integration over time:
$$\mathbf v(t) = \int_0^t \mathbf a(t')\,dt' + \mathbf v_0 = \frac{1}{m}\mathbf F_0\,t + \mathbf v_0\,, \tag{58}$$
where $\mathbf v_0$ is the velocity of the particle at time $t = 0$. The position $\mathbf x(t)$ of the particle at any time $t$ follows from integration of $\mathbf v(t)$,
$$\mathbf x(t) = \int_0^t \mathbf v(t')\,dt' + \mathbf x_0 = \frac{1}{2m}\mathbf F_0\,t^2 + \mathbf v_0 t + \mathbf x_0\,, \tag{59}$$


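[Figure: trajectory $\mathbf x(t)$ of a mass $m$ in the $x$-$y$ plane, starting at $\mathbf x_0$ with initial velocity $\mathbf v_0$ under a constant force $\mathbf F_0$ along the $y$ axis.]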

with the additional integration constant $\mathbf x_0$ fixing the position at $t = 0$. A typical example of such a case is the acceleration of a body with mass $m$ under the influence of gravity on earth, ignoring possible friction forces. Such a motion is called free fall, even when the initial conditions include a velocity component pointing in an arbitrary direction.
This is a nice example where one can see that the vectorial equation of motion (and its solution) separates into individual components. In the example shown in the figure, the constant force $\mathbf F_0$ has no component in $x$ direction. Therefore, the motion in $x$ is uniform, i.e., the velocity component $v_x$ is constant, and the component $x(t)$ increases linearly in time. The $y$ component, in contrast, depends quadratically on $t$. The choice of a suitable coordinate system can therefore simplify the integration of the equation of motion, and separate out certain degrees of freedom.
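To check (58) and (59), the sketch below compares the closed-form position with a step-by-step integration of $m\ddot{\mathbf x} = \mathbf F_0$ in two dimensions. The mass, force and initial values are illustrative assumptions (a 2 kg body under gravity with $g = 9.81\,\mathrm{m/s^2}$ along $-y$). Note how the $x$ component stays linear in $t$, because $\mathbf F_0$ has no $x$ component.

```python
def position_closed_form(t, F0, m, v0, x0):
    """Eq. (59), component-wise: x(t) = F0 t^2 / (2m) + v0 t + x0."""
    return tuple(f * t**2 / (2 * m) + v * t + x for f, v, x in zip(F0, v0, x0))


def position_stepwise(t_end, F0, m, v0, x0, steps=10000):
    """Integrate m dv/dt = F0 step by step; for a constant force each step is exact."""
    dt = t_end / steps
    a = [f / m for f in F0]
    x, v = list(x0), list(v0)
    for _ in range(steps):
        for i in range(len(x)):
            x[i] += v[i] * dt + 0.5 * a[i] * dt**2
            v[i] += a[i] * dt
    return tuple(x)


m = 2.0                      # kg (assumed)
F0 = (0.0, -m * 9.81)        # gravity along -y, no x component
v0 = (3.0, 4.0)              # m/s (assumed)
x0 = (0.0, 10.0)             # m (assumed)
t = 1.5
exact = position_closed_form(t, F0, m, v0, x0)
numeric = position_stepwise(t, F0, m, v0, x0)
print(exact)                 # ≈ (4.5, 4.96375): x = v0x * t, y quadratic in t
print(numeric)
```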

2.3.2 Friction Forces


Another class of forces are friction forces. They appear when an object moves in a fluid or in contact with a surface, and they oppose the motion of the object. They depend on the velocity either linearly,
$$\mathbf F = -\alpha\mathbf v \tag{60}$$
with a positive constant $\alpha$, or more generally,
$$\mathbf F = -f(v)\,\frac{\mathbf v}{v} \tag{61}$$
with a function $f(v)$ depending on the speed $v$ of the object. This dependency can be rather complex, and depends on the nature of the friction and on the velocity range. If an object moves slowly enough through a fluid like air, $f(v) \propto v$ as in (60), but at higher speeds, a quadratic dependency is observed. An approximate heuristic formulation of such a friction force, used to describe the drag of objects like cars or bicycles moving through air, is
$$f(v) = \frac{1}{2}c_W\,\rho A v^2\,, \tag{62}$$
with the density of air $\rho$, the frontal (cross-sectional) area $A$ of the vehicle, and a dimensionless shape-dependent drag coefficient $c_W$ that is around 0.3 for most cars these days, and around 1 for someone on a bicycle in typical speed ranges. If $v$ comes close to the speed of sound, $f(v)$ undergoes a relatively complex change.
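A short sketch evaluating (61) and (62). The air density $\rho = 1.2\,\mathrm{kg/m^3}$ and the frontal areas (2.2 m² for a car, 0.5 m² for a cyclist) are assumed typical values, not taken from the notes; the drag coefficients follow the text. The sketch also prints the power $f(v)\,v$ needed to overcome drag, which grows like $v^3$.

```python
import math


def drag_magnitude(v, c_w, rho, area):
    """f(v) = c_W * rho * A * v^2 / 2, eq. (62)."""
    return 0.5 * c_w * rho * area * v**2


def drag_vector(velocity, f):
    """F = -f(v) v / |v|, eq. (61): opposite to the velocity, magnitude f(|v|)."""
    speed = math.hypot(*velocity)
    if speed == 0.0:
        return tuple(0.0 for _ in velocity)
    return tuple(-f(speed) * c / speed for c in velocity)


RHO_AIR = 1.2                # kg/m^3, assumed
v_car = 100 / 3.6            # 100 km/h in m/s
v_bike = 30 / 3.6            # 30 km/h in m/s
f_car = drag_magnitude(v_car, c_w=0.3, rho=RHO_AIR, area=2.2)     # area assumed
f_bike = drag_magnitude(v_bike, c_w=1.0, rho=RHO_AIR, area=0.5)   # area assumed
print(f"car:     {f_car:6.1f} N, power {f_car * v_car / 1000:.1f} kW")
print(f"bicycle: {f_bike:6.1f} N, power {f_bike * v_bike:.0f} W")

# the vector form (61) points against the direction of motion
F = drag_vector((20.0, 5.0), lambda s: drag_magnitude(s, 0.3, RHO_AIR, 2.2))
print(F)
```

Under these assumptions the car experiences roughly 306 N (about 8.5 kW at 100 km/h) and the cyclist about 21 N (about 170 W at 30 km/h).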

2.3.3 Harmonic oscillator


The motion of a harmonic oscillator is particularly important and appears in many areas of physics. We consider the simple case where a mass $m$ is connected to a spring with a Hooke constant $k$, and motion only takes place in $x$ direction:

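[Figure: mass $m$ attached to a spring with constant $k$, displaced to the position $x(t)$ from the equilibrium position $x = 0$; the spring exerts the force $F(x)$.]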

The spring exerts a restoring force $F_x$ on the mass,
$$F_x = -kx\,, \tag{63}$$
and with (51), i.e., $F_x = m\ddot x = -kx$, the equation of motion takes the form
$$\ddot x(t) + \frac{k}{m}\,x(t) = 0\,. \tag{64}$$
This is a linear ordinary differential equation, because $x$ and all its derivatives appear linearly. Such equations have special solutions
$$x(t) = e^{st} \tag{65}$$
with a (generally complex) constant $s$. Inserting (65) into (64) leads to
$$\left(s^2 + \frac{k}{m}\right)x(t) = 0\,. \tag{66}$$
As this equation has to hold for all times, and $x(t) = e^{st}$ never vanishes, the factor in parentheses needs to vanish:
$$s^2 + \frac{k}{m} = 0 \tag{67}$$


This is referred to as the characteristic equation of (64). Its two roots are
$$s_{1,2} = \pm\sqrt{-\frac{k}{m}} = \pm i\sqrt{\frac{k}{m}} = \pm i\omega \quad\text{with}\quad \omega = \sqrt{\frac{k}{m}}\,. \tag{68}$$
Because the roots $s_{1,2}$ are distinct, the general solution of (64) is a linear combination of the two solutions (65) for $s_1$ and $s_2$,
$$x(t) = a_1 e^{s_1 t} + a_2 e^{s_2 t} = a_1 e^{i\omega t} + a_2 e^{-i\omega t}\,. \tag{69}$$
The constants $a_{1,2}$ are now chosen to meet the initial conditions of the differential equation. To fully determine the motion in time, exactly two initial conditions are needed; let’s assume $x(t=0) = x_0$ and $v(t=0) = \dot x(t=0) = 0$. Inserting these conditions into (69) leads to
$$x(t=0) = a_1 + a_2 = x_0 \tag{70}$$
$$\dot x(t=0) = i\omega a_1 - i\omega a_2 = i\omega(a_1 - a_2) = 0 \tag{71}$$
The second condition leads to $a_1 = a_2$, and the first one then to $a_1 = x_0/2$, so we finally get the solution of (64) that meets the initial conditions,
$$x(t) = \frac{x_0}{2}\left(e^{i\omega t} + e^{-i\omega t}\right) = x_0\cos\omega t\,, \tag{72}$$
an oscillation with amplitude $x_0$ and period $2\pi/\omega$:
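[Figure: $x(t) = x_0\cos\omega t$ plotted against $t$, oscillating between $\pm x_0$ with period $2\pi/\omega$.]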
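The analytic solution (72) can be compared with a direct numerical integration of (64). The sketch below uses the classical fourth-order Runge-Kutta method (a standard integrator chosen here for illustration, not one discussed in the notes) with the assumed values $k = 4$, $m = 1$, $x_0 = 0.5$ in arbitrary units and the initial condition $\dot x(0) = 0$ from the text.

```python
import math


def rk4_oscillator(x0, v0, k, m, t_end, steps):
    """Integrate x'' = -(k/m) x, eq. (64), with the classical 4th-order Runge-Kutta method.

    Returns (x, v) at time t_end.
    """
    dt = t_end / steps
    w2 = k / m
    x, v = x0, v0
    for _ in range(steps):
        # derivatives of the state (x, v) are (v, -w2 * x)
        k1x, k1v = v, -w2 * x
        k2x, k2v = v + 0.5 * dt * k1v, -w2 * (x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, -w2 * (x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, -w2 * (x + dt * k3x)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x, v


k, m, x0 = 4.0, 1.0, 0.5              # assumed values, arbitrary units
omega = math.sqrt(k / m)              # eq. (68)
t = 3.0
x_num, v_num = rk4_oscillator(x0, 0.0, k, m, t, steps=3000)
x_exact = x0 * math.cos(omega * t)    # eq. (72)
print(x_num, x_exact)
```

After one full period $2\pi/\omega$ the numerical solution returns to $x_0$, as expected from (72).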

2.4 Angular motion


Newton’s laws introduce the concept of momentum, which is changed by a force and can be used to make statements about the linear motion of a body. If no force is acting on the body, the linear momentum $\mathbf p$ is conserved. Similarly, for angular motion one can formally define a vector termed angular momentum $\mathbf L$:
$$\mathbf L := \mathbf r\times\mathbf p\,, \tag{73}$$
where $\mathbf r$ is the position of a particle with respect to some origin $O$, and $\mathbf p = m\mathbf v$ its linear momentum. This definition does not imply any circular motion, but it is inspired by it, as shown in the figure:

[Figure: left, a circular path around a rotation axis through $O$, with position $\mathbf r$, momentum $\mathbf p$, angular velocity $\boldsymbol\omega$ and angular momentum $\mathbf L$ along the axis; right, a general trajectory with position $\mathbf r$ relative to $O$, force $\mathbf F$ and torque $\mathbf N$.]

In the case of a circular trajectory traversed with constant speed $v$, the newly defined vector $\mathbf L$ is constant and points in the direction of the earlier introduced angular velocity $\boldsymbol\omega$.
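The temporal derivative of the angular momentum defined in (73) is given by
$$\begin{aligned}
\dot{\mathbf L} &= \frac{d}{dt}(\mathbf r\times\mathbf p)\\
&= \dot{\mathbf r}\times\mathbf p + \mathbf r\times\dot{\mathbf p}\\
&= \underbrace{\mathbf v\times(m\mathbf v)}_{=0} + \mathbf r\times\dot{\mathbf p}\\
&= \mathbf r\times\mathbf F\,,
\end{aligned}\tag{74}$$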
where $\mathbf F$ is the force acting on the particle. Similar to the definition of $\mathbf L$ in (73), one can define a vector $\mathbf N$ called torque with respect to an origin $O$:
$$\mathbf N := \mathbf r\times\mathbf F \tag{76}$$
With this definition, the temporal derivative of $\mathbf L$ is simply given by
$$\dot{\mathbf L} = \mathbf N\,. \tag{77}$$
This looks similar to the second law for linear momentum, $\dot{\mathbf p} = \mathbf F$, but for rotations. Thus, for angular motion, angular momentum $\mathbf L$ and torque $\mathbf N$ play similar roles as momentum $\mathbf p$ and force $\mathbf F$ for linear motion.
If $\mathbf N = 0$, then $\dot{\mathbf L} = 0$, i.e., the angular momentum $\mathbf L$ remains constant or is conserved. An important example of this situation can be found in central force problems, like the Coulomb interaction or the gravitational attraction between two heavy bodies. If the origin $O$ for determining the angular momentum is chosen in one of the two bodies, or somewhere on the line connecting them, the force $\mathbf F$ exerted by each body on the other one is parallel to $\mathbf r$. Then,
$$\mathbf N = \mathbf r\times\mathbf F = \mathbf r\times(\text{const}\cdot\mathbf r) = 0\,. \tag{78}$$
Consequently, the angular momentum does not change in time: isolated systems governed by a central force (planets around a central star, electrons around a proton) conserve angular momentum. This does not mean, however, that the trajectories have to be circular!
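A numerical illustration of (78): the sketch integrates a non-circular orbit in an attractive $1/r^2$ central force and records $L_z = x p_y - y p_x$. Unit mass, force constant $\mu = 1$ and the initial conditions are assumed values. The distance from the origin varies considerably, while $L_z$ stays constant. The velocity Verlet scheme used here happens to conserve $\mathbf r\times\mathbf p$ exactly (up to rounding) for central forces, which keeps the check clean.

```python
import math


def central_force_orbit(r0, v0, mu=1.0, dt=1e-3, steps=20000):
    """Velocity-Verlet integration of a = -mu r / |r|^3 (unit mass) in the plane.

    Returns the z component of L = r x p and the distance |r| after every step.
    """
    def accel(x, y):
        r3 = (x * x + y * y) ** 1.5
        return -mu * x / r3, -mu * y / r3

    x, y = r0
    vx, vy = v0
    ax, ay = accel(x, y)
    Lz, radii = [], []
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        Lz.append(x * vy - y * vx)   # (r x p)_z with m = 1
        radii.append(math.hypot(x, y))
    return Lz, radii


Lz, radii = central_force_orbit((1.0, 0.0), (0.0, 0.8))
print(min(radii), max(radii))   # the orbit is clearly not circular
print(min(Lz), max(Lz))         # but L_z stays at its initial value 0.8
```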


3 Work and Energy


While Newton’s laws allow for a straightforward formulation of equations of motion via forces, they require knowledge of the vectorial quantity force. This can become a bit complicated, especially if forces need to be introduced to fulfill boundary conditions and constraints. Sometimes a quantity of interest can be obtained without fully solving the equations of motion. Moreover, some expressions become simpler with the concepts of work and energy. These concepts will also be required for deriving the equations of motion of a physical system in ways other than using Newton’s laws.

3.1 Kinetic energy


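We start with the definition of an integral quantity between two positions 1 and 2 of a particle along a path $\Gamma$:

[Figure: path $\Gamma$ from point 1 to point 2, with the local force $\mathbf F$ and a line element $d\mathbf s$ along the path.]

$$W_{12} := \int_{\Gamma_{1\to2}}\mathbf F\cdot d\mathbf s \tag{79}$$
This definition of the “work” $W_{12}$ for a transition between two points integrates the scalar products between the local force $\mathbf F$ and the line elements $d\mathbf s$ along the path $\Gamma$. It can be evaluated component-wise,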
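$$\int_\Gamma\mathbf F\cdot d\mathbf s = \int_\Gamma\sum_i F_i\,dx_i\,. \tag{80}$$
Often one can also evaluate the kernel $\mathbf F\cdot d\mathbf s$ of this integral with respect to a sensible parameterization of the path $\Gamma$, like the path length $s$ introduced earlier,
$$\mathbf F\cdot d\mathbf s = \mathbf F\cdot\mathbf e_t\,ds\,, \tag{81}$$
where $\mathbf e_t$ is the tangential vector to the path, and $ds$ a length element. With $\mathbf F = m\dot{\mathbf v}$, and parameterizing via time, $d\mathbf s = \mathbf v\,dt$, it becomes
$$\begin{aligned}
\mathbf F\cdot d\mathbf s &= (m\dot{\mathbf v})\cdot(\mathbf v\,dt)\\
&= \frac{1}{2}m\,(2\mathbf v\cdot\dot{\mathbf v}\,dt)\\
&= \frac{1}{2}m\,d(\mathbf v\cdot\mathbf v) = d\left(\frac{1}{2}m\mathbf v^2\right)
\end{aligned}\tag{82}$$
The third step in the development above can be seen as reversing the differentiation of the scalar product $\mathbf v\cdot\mathbf v =: \mathbf v^2$ with respect to time:
$$\frac{d}{dt}(\mathbf v\cdot\mathbf v) = \mathbf v\cdot\dot{\mathbf v} + \dot{\mathbf v}\cdot\mathbf v = 2\mathbf v\cdot\dot{\mathbf v} \tag{83}$$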
dt


The last step in (82) just combines the constants into a single differential $d(m\mathbf v^2/2)$. By defining a quantity
$$T := \frac{1}{2}m\mathbf v^2\,, \tag{84}$$
one can write the differential as $d(m\mathbf v^2/2) = dT$, and the work integral becomes
$$W_{12} = \int_{\Gamma_{1\to2}}\mathbf F\cdot d\mathbf s = \int_{\Gamma_{1\to2}}dT = T_2 - T_1\,. \tag{85}$$
This is a somewhat tricky step: by showing that $\mathbf F\cdot d\mathbf s$ can be written as a total differential $dT$ of the newly defined quantity $T$, the integration becomes trivial, and the result is the difference between the two values of $T$ at points 1 and 2.
The quantity $T$ only depends on the velocity $\mathbf v$ at a given point; it is therefore reasonable to call the quantity defined in (84) the kinetic energy $T$ of a particle. According to (85), the work $W_{12}$ along a path $\Gamma$ equals the change in the kinetic energy of the particle moving from point 1 to 2.
This statement holds independently of the form of the force, which can depend on the position and velocity of the particle, and explicitly on time: $\mathbf F = \mathbf F(\mathbf r, \mathbf v, t)$. Also note that definition (84) is reasonable in the sense that $T = 0$ for $\mathbf v = 0$, but it is not unique: we could have chosen to add a constant $T_0$ to $T$. Such an additive constant leaves the differential $dT$ unchanged, and cancels out in the difference (85). This is something to pay attention to whenever some quantity (here: $T$) is defined that conveniently describes a differential property (here: $dT = \mathbf F\cdot d\mathbf s$).
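To illustrate that (85) holds even for velocity-dependent forces, the sketch below takes a one-dimensional body under a constant force plus linear friction, $F = F_0 - \alpha v$, with illustrative values of $m$, $\alpha$, $F_0$ and $v_0$. It uses the exact solution of $m\dot v = F$ and evaluates the work integral $\int F\,ds = \int F v\,dt$ numerically with Simpson's rule; the result matches $T_2 - T_1$.

```python
import math

m, alpha, F0, v0 = 1.0, 0.5, -9.81, 5.0   # illustrative values, SI units, 1D
v_terminal = F0 / alpha                   # speed at which F = F0 - alpha v vanishes


def velocity(t):
    """Exact solution of m dv/dt = F0 - alpha v with v(0) = v0."""
    return v_terminal + (v0 - v_terminal) * math.exp(-alpha * t / m)


def force(t):
    return F0 - alpha * velocity(t)


def work(t_end, n=2000):
    """W = integral of F ds = integral of F v dt, Simpson's rule with n (even) panels."""
    h = t_end / n
    total = force(0.0) * velocity(0.0) + force(t_end) * velocity(t_end)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * force(i * h) * velocity(i * h)
    return total * h / 3


t_end = 3.0
W = work(t_end)
delta_T = 0.5 * m * (velocity(t_end)**2 - v0**2)   # T_2 - T_1, eq. (85)
print(W, delta_T)
```

Here the friction force does negative work while gravity does positive work; only their sum is captured by the change in kinetic energy.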

3.2 Conservative forces and potential energy


For forces that only depend on the position $\mathbf r$ of a particle, one can assign a “field” $\mathbf F(\mathbf r)$. The work integral (79),
$$W_{12} = \int_{\Gamma_{1\to2}}\mathbf F(\mathbf r)\cdot d\mathbf s\,, \tag{86}$$
can then be evaluated more directly with results from vector calculus. In some cases, the work does not even depend on the specific path $\Gamma$ a particle takes, but only on the end points 1 and 2. In such cases, $\mathbf F(\mathbf r)$ is called a conservative force.

[Figure: two different paths $\Gamma_A$ and $\Gamma_B$ connecting points 1 and 2.]

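A path integral from point 1 to point 2 along a path $\Gamma$ can be split up into an integral from point 1 to an intermediate point 3, and one from point 3 to point 2:
$$\int_{\Gamma_{1\to2}}\mathbf F\cdot d\mathbf s = \int_{\Gamma_{1\to3}}\mathbf F\cdot d\mathbf s + \int_{\Gamma_{3\to2}}\mathbf F\cdot d\mathbf s \tag{87}$$
[Figure: path from point 1 to point 2 via an intermediate point 3.]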

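If the path integral is independent of the path $\Gamma$, and therefore of any intermediate point 3 as in the path above, it has to take the form of a difference of a function evaluated at the two endpoints:
$$\int_{\Gamma_{1\to2}}\mathbf F\cdot d\mathbf s = U(\mathbf r_1) - U(\mathbf r_2) \tag{88}$$
This form suggests that the integral kernel can be written as a total differential, $\mathbf F\cdot d\mathbf s = -dU$. With the ansatz
$$\mathbf F = -\nabla U = -\sum_i\frac{\partial U}{\partial x_i}\,\mathbf e_i\,, \tag{89}$$
one can verify that the path integral indeed takes the form (88):
$$\begin{aligned}
\int_{\Gamma_{1\to2}}\mathbf F\cdot d\mathbf s &= -\int_{\Gamma_{1\to2}}(\nabla U)\cdot d\mathbf s
= -\int_{\Gamma_{1\to2}}\left(\sum_i\frac{\partial U}{\partial x_i}\mathbf e_i\right)\cdot\left(\sum_j\mathbf e_j\,dx_j\right)\\
&= -\int_{\Gamma_{1\to2}}\sum_i\frac{\partial U}{\partial x_i}\,dx_i = -\int_{\Gamma_{1\to2}}dU\\
&= U(\mathbf r_1) - U(\mathbf r_2)
\end{aligned}\tag{90}$$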
Similarly to the definition of the kinetic energy, the function $U(\mathbf r)$ is not uniquely defined. By adding a constant $U_0$ to a given function $U(\mathbf r)$, the resulting force and therefore also the work integral $W_{12}$ do not change, because $U$ only appears as a derivative. The scalar function $U$ has the same dimension as the previously defined kinetic energy; hence, it seems reasonable to refer to this quantity as potential energy or simply potential.
To see better when a force field can be written as the gradient of a potential $U(\mathbf r)$, we consider two paths $\Gamma_A$ and $\Gamma_B$ from point 1 to 2. By definition, the work integral $W_{12}$ for a conservative field will be the same for both paths:
$$\int_{\Gamma_{A,1\to2}}\mathbf F\cdot d\mathbf s = \int_{\Gamma_{B,1\to2}}\mathbf F\cdot d\mathbf s \tag{91}$$
Reversing one of the two paths changes the sign of the work integral:
$$W_{21} = \int_{\Gamma_{-B,2\to1}}\mathbf F\cdot d\mathbf s = -\int_{\Gamma_{B,1\to2}}\mathbf F\cdot d\mathbf s = -W_{12} \tag{92}$$


The concatenated path $1 \xrightarrow{\Gamma_A} 2 \xrightarrow{\Gamma_{-B}} 1$ is closed, and the path integral vanishes:
$$\oint_{\Gamma_A+\Gamma_{-B}}\mathbf F\cdot d\mathbf s = W_{12} + W_{21} = 0 \tag{93}$$

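The circle over the integral symbol is a convention indicating closed path integration. One of the results of vector calculus, referred to as Stokes’ theorem, relates the path integral of a vector field $\mathbf F$ along the boundary $\partial S$ of an orientable surface $S$ to a surface integral of the curl $\nabla\times\mathbf F$ of the field:
$$\oint_{\partial S}\mathbf F\cdot d\mathbf s = \int_S(\nabla\times\mathbf F)\cdot d\mathbf A \tag{94}$$
[Figure: surface $S$ with boundary $\partial S$, surface element $d\mathbf A$ and the field $\mathbf F$.]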

As closed path integrals of the type (93) vanish for conservative fields for all paths, the integral over $\nabla\times\mathbf F$ on the right side is also identically zero. This must hold for all surfaces, including arbitrarily small ones over which $\nabla\times\mathbf F$ is practically constant, which implies that
$$\nabla\times\mathbf F = 0 \tag{95}$$
everywhere. This is an important result: the curl of a conservative force field vanishes. The converse also holds in a simply connected region: if $\nabla\times\mathbf F = 0$ there, Stokes’ theorem (94) makes every closed path integral vanish, so the work is path-independent. Hence all force fields $\mathbf F(\mathbf r)$ with $\nabla\times\mathbf F = 0$ in a simply connected region are conservative.
One can show easily (by explicitly carrying out the differentiation) that force fields that can be written as the gradient of any potential $U(\mathbf r)$ are curl-free and thus conservative:
$$\nabla\times[\nabla U(\mathbf r)] \equiv 0 \tag{96}$$
This justifies the ansatz (89) to write a conservative field as the gradient of a potential. Further, one can show that every sufficiently smooth field $\mathbf F$ with $\nabla\times\mathbf F = 0$ in a simply connected region can be represented by a gradient of the form (89).
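The connection between vanishing curl and vanishing closed path integrals can be checked numerically. The sketch uses two illustrative fields that are assumptions, not examples from the notes: $\mathbf F = -\nabla U$ for $U(x, y) = x^2y + y^3$, and the rotating field $\mathbf F = (-y, x)$, whose curl is 2. It evaluates the closed path integral around the unit circle and the $z$ component of the curl by finite differences.

```python
import math


def conservative_field(x, y):
    """F = -grad U for the illustrative potential U(x, y) = x**2 * y + y**3."""
    return -2.0 * x * y, -(x**2 + 3.0 * y**2)


def swirl_field(x, y):
    """Illustrative non-conservative field F = (-y, x); its curl is 2 e_z."""
    return -y, x


def circulation(field, radius=1.0, n=4000):
    """Closed path integral of F . ds around a circle centred at the origin."""
    dphi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        fx, fy = field(radius * math.cos(phi), radius * math.sin(phi))
        # line element ds = radius * (-sin phi, cos phi) dphi
        total += (-fx * math.sin(phi) + fy * math.cos(phi)) * radius * dphi
    return total


def curl_z(field, x, y, h=1e-5):
    """z component of curl F = dFy/dx - dFx/dy, by central differences."""
    dFy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dFx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dFy_dx - dFx_dy


print(circulation(conservative_field), curl_z(conservative_field, 0.3, -0.7))  # ≈ 0, ≈ 0
print(circulation(swirl_field), curl_z(swirl_field, 0.3, -0.7))                # ≈ 2π, ≈ 2
```

For the rotating field, the circulation equals the curl times the enclosed area $\pi$, as expected from (94).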

3.3 Total energy


For conservative fields, we can write the work $W_{12}$ for moving from point 1 to 2 both as a difference between the kinetic energies $T$ at the two points due to (85), and as a difference of the potential energies at the two endpoints according to (88):
$$W_{12} = T(\mathbf r_2) - T(\mathbf r_1) = U(\mathbf r_1) - U(\mathbf r_2) \tag{97}$$
So the change in kinetic energy corresponds to the negative change in the potential energy. Thus, it is reasonable to define a total energy $E$:
$$E := T + U \tag{98}$$


From (97) it follows immediately that for a transition from point 1 to point 2, the total energy does not change in a conservative force field:
$$E(\mathbf r_1) = T(\mathbf r_1) + U(\mathbf r_1) = T(\mathbf r_2) + U(\mathbf r_2) = E(\mathbf r_2) \tag{99}$$
Using this equality, it is easy to see that the total energy is the same at every point along a path $\Gamma$, so that the temporal derivative $dE/dt$ vanishes, i.e., the total energy $E$ is conserved.
This derivation assumed that the force is not explicitly time-dependent. The statement about conservation of the total energy can be extended to time-dependent potentials. To see this, we first use (82) and turn the differentials into temporal derivatives:
$$dT = \mathbf F\cdot d\mathbf s \quad\Rightarrow\quad \frac{dT}{dt} = \mathbf F\cdot\frac{d\mathbf r}{dt} = \mathbf F\cdot\dot{\mathbf r} = \mathbf F\cdot\mathbf v \tag{100}$$
The total temporal derivative of the potential experienced by a particle moving along a path is composed of both the spatial dependency of $U$ and its explicit time dependency:
$$\frac{dU}{dt} = \sum_i\frac{\partial U}{\partial x_i}\frac{dx_i}{dt} + \frac{\partial U}{\partial t} = (\nabla U)\cdot\mathbf v + \frac{\partial U}{\partial t} = -\mathbf F\cdot\mathbf v + \frac{\partial U}{\partial t} \tag{101}$$
Adding the last two equations then yields the change of the total energy over time:
$$\frac{dE}{dt} = \frac{dT}{dt} + \frac{dU}{dt} = \frac{\partial U}{\partial t} \tag{102}$$
Again, if the potential $U$ does not explicitly depend on time, $dE/dt = 0$, and the total energy is conserved.
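A numerical check of (102) under assumed parameters: a harmonic oscillator whose spring constant is modulated in time, $U(x, t) = \frac{1}{2}k(t)x^2$ with $k(t) = k_0(1 + \varepsilon\sin\Omega t)$. The sketch integrates the motion with a fourth-order Runge-Kutta step and, alongside, accumulates $J = \int\partial U/\partial t\,dt$ along the trajectory. The change of the total energy matches $J$, even though $E$ itself is not conserved here.

```python
import math

m, k0, eps, Omega = 1.0, 1.0, 0.3, 1.7    # assumed parameters


def k(t):
    return k0 * (1 + eps * math.sin(Omega * t))


def dk_dt(t):
    return k0 * eps * Omega * math.cos(Omega * t)


def rhs(t, state):
    """Time derivative of (x, v, J), where J accumulates the integral of dU/dt at fixed x."""
    x, v, _ = state
    return [v, -k(t) * x / m, 0.5 * dk_dt(t) * x * x]


def rk4(state, t_end, steps):
    dt = t_end / steps
    t = 0.0
    for _ in range(steps):
        s1 = rhs(t, state)
        s2 = rhs(t + dt / 2, [y + dt / 2 * d for y, d in zip(state, s1)])
        s3 = rhs(t + dt / 2, [y + dt / 2 * d for y, d in zip(state, s2)])
        s4 = rhs(t + dt, [y + dt * d for y, d in zip(state, s3)])
        state = [y + dt / 6 * (a + 2 * b + 2 * c + d)
                 for y, a, b, c, d in zip(state, s1, s2, s3, s4)]
        t += dt
    return state


def energy(t, x, v):
    """E = T + U with the instantaneous potential U = k(t) x^2 / 2."""
    return 0.5 * m * v * v + 0.5 * k(t) * x * x


t_end = 10.0
x, v, J = rk4([1.0, 0.0, 0.0], t_end, steps=10000)
delta_E = energy(t_end, x, v) - energy(0.0, 1.0, 0.0)
print(delta_E, J)    # equal up to integration errors, cf. (102)
```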

3.3.1 Classification of problems


Knowledge of the potential and the total energy of a system allows one to make many qualitative statements about the motion of a particle, even without solving the full equation of motion. For this, we consider the potential $U(x)$ in a one-dimensional problem:

[Figure: a one-dimensional potential $U(x)$ with energy levels $E_0 < E_1 < E_2 < E_3 < E_4$; the classical turning points are marked on the $x$ axis, from left to right $x_7, x_3, x_1, x_2, x_4, x_5, x_6$.]

First, the system cannot have a total energy smaller than the minimum of $U(x)$, because the kinetic energy $T = mv^2/2$ cannot be negative. Thus, for a total energy $E_0$, there is no solution.
For $E = E_1$, there is a region between $x_1$ and $x_2$ where $T \geq 0$. The two positions $x_{1,2}$ are called classical turning points, because the kinetic energy vanishes there, while the force does not. In such a scenario, the physical system will oscillate between the classical turning points. The solution of the equation of motion is referred to as a bound solution, because the motion is restricted to the finite interval $[x_1, x_2]$.
For a total energy $E = E_2$, there are two such intervals, $[x_3, x_4]$ and $[x_5, x_6]$. The solution of the equation of motion will have two branches, but there will be no transition between them. Any position $x$ of the system outside these two intervals is not allowed, and is said to be forbidden by energy conservation.
For $E = E_3$ (and assuming $U < E_3$ for $x > x_6$), the system still has a classical turning point $x_7$, but is not bound anymore. For $t \to \pm\infty$, the system will evolve to $x \to +\infty$.
For the last case (assuming $U < E_4$ everywhere), there is no restriction on the position $x$ of the system: a particle will come from $x \to -\infty$ and evolve to $x \to +\infty$, or the other way around. The last two cases are often found in scattering problems, since the asymptotic position for $t \to \pm\infty$ is not finite, whereas the interesting part of $U(x)$ is often limited to a small region.
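The classification above can be automated for a given potential. The sketch finds the classical turning points where $U(x) = E$ from sign changes of $E - U$ on a grid (refined by bisection), and keeps the intervals between neighbouring turning points in which $E - U > 0$. The double-well potential $U(x) = x^4 - 2x^2$ and the search range $[-3, 3]$ are illustrative assumptions; for $E = -0.5$ it yields two separate allowed intervals (like the case $E_2$ above), for $E = 0.5$ a single one (like $E_1$).

```python
import math


def turning_points(U, E, x_min, x_max, n=10000, tol=1e-12):
    """Positions in [x_min, x_max] where U(x) = E (sign changes of E - U, refined by bisection)."""
    def g(x):
        return E - U(x)

    grid = [x_min + (x_max - x_min) * i / n for i in range(n + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            while b - a > tol:           # keep the sign change inside [a, b]
                mid = 0.5 * (a + b)
                if ga * g(mid) <= 0.0:
                    b = mid
                else:
                    a, ga = mid, g(mid)
            roots.append(0.5 * (a + b))
    return roots


def allowed_intervals(U, E, roots):
    """Intervals between neighbouring turning points in which E - U(x) > 0."""
    return [(a, b) for a, b in zip(roots, roots[1:]) if E - U(0.5 * (a + b)) > 0.0]


def double_well(x):
    return x**4 - 2 * x**2


for E in (-0.5, 0.5):
    roots = turning_points(double_well, E, -3.0, 3.0)
    print(E, allowed_intervals(double_well, E, roots))
    # analytic outer turning point of the double well: x^2 = 1 + sqrt(1 + E)
    print("expected outer turning point:", math.sqrt(1 + math.sqrt(1 + E)))
```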

3.3.2 Transit time


The expression for the total energy in a conservative system can be used to evaluate the time it takes a system to evolve between two points, without having to solve the equation of motion explicitly. For a one-dimensional problem with a given potential $U(x)$ and total energy $E$, one can invert the expression for the total energy,
$$E = \frac{1}{2}mv^2 + U(x)\,, \tag{103}$$
and obtain an expression for the velocity $v$ at a given position:
$$v = \pm\sqrt{\frac{2}{m}\bigl(E - U(x)\bigr)} = \frac{dx}{dt} \tag{104}$$
A single integration (here for $v > 0$) leads to an expression for the time difference between two positions $x_0$ and $x$:
$$t - t_0 = \int_{t_0}^{t}dt' = \int_{x_0}^{x}\frac{1}{v}\,dx' = \int_{x_0}^{x}\frac{dx'}{\sqrt{\frac{2}{m}\bigl(E - U(x')\bigr)}} \tag{105}$$
as long as the sign of the velocity does not change over the region $[x_0, x]$. Such an expression can be used to evaluate the oscillation period in an arbitrary potential leading to bound solutions.
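As an example of using (105), the sketch computes the period of a bound oscillation as twice the transit time between the turning points $a < b$. The integrand diverges like $1/\sqrt{x - a}$ at the turning points; the substitution $x = \frac{a+b}{2} - \frac{b-a}{2}\cos\theta$ (an assumption of this sketch, valid when $U'$ does not vanish at the turning points) removes this singularity. For the harmonic potential the result must reproduce $2\pi/\omega$; the quartic potential $U = x^4$ with $E = 1$ and $m = 1$ is an illustrative anharmonic case with a known closed-form period.

```python
import math


def period(U, E, a, b, m=1.0, n=2000):
    """Oscillation period 2 * integral_a^b dx / sqrt(2 (E - U(x)) / m), cf. (105).

    a < b are the classical turning points. The substitution
    x = (a + b)/2 - (b - a)/2 * cos(theta) removes the inverse-square-root
    singularities at the turning points, assuming U'(a) and U'(b) are nonzero.
    """
    center, half_width = 0.5 * (a + b), 0.5 * (b - a)
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta          # midpoints never hit the endpoints
        x = center - half_width * math.cos(theta)
        dx_dtheta = half_width * math.sin(theta)
        total += dx_dtheta / math.sqrt(2.0 * (E - U(x)) / m) * dtheta
    return 2.0 * total                       # from a to b and back again


# harmonic check: U = k x^2 / 2 with amplitude A, expected period 2 pi / omega
k, m, A = 4.0, 1.0, 0.5
T_harm = period(lambda x: 0.5 * k * x**2, 0.5 * k * A**2, -A, A, m)
print(T_harm, 2 * math.pi / math.sqrt(k / m))

# anharmonic example: U = x^4 with E = 1, turning points at -1 and 1
T_quartic = period(lambda x: x**4, 1.0, -1.0, 1.0)
print(T_quartic)
```

Unlike the harmonic case, the period in the quartic potential depends on the amplitude, which is easy to explore by varying $E$ and the turning points.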

3.3.3 Potential of a harmonic oscillator


A particularly important potential $U(x)$ in physics is that of a harmonic oscillator, which is characterized by a restoring force $F(x)$ proportional to the displacement from a reference position $x = x_0$. It can easily be seen that the potential
$$U(x) = \frac{1}{2}k(x - x_0)^2 \tag{106}$$
leads to the restoring force $F = -k(x - x_0)$ seen in section 2.3.3 for the one-dimensional case.

[Figure: parabolic potential $U(x)$ with its minimum at $x_0$.]

As the potential near a local minimum with non-vanishing second derivative $d^2U/dx^2$ can be approximated by a parabola (the second-order term of its Taylor expansion), many potentials with a local minimum lead to an approximately harmonic oscillation around this minimum position $x_0$.
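The harmonic approximation can be made concrete: the effective spring constant is $k_{\text{eff}} = d^2U/dx^2$ at the minimum. The sketch estimates it by a central finite difference for a Morse potential $U(x) = D\,(1 - e^{-a(x - x_{\min})})^2$, an illustrative choice with assumed parameters (its exact value is $2Da^2$), and compares the potential with the parabola (106) at a few displacements.

```python
import math


def second_derivative(U, x, h=1e-4):
    """Central finite-difference estimate of d^2U/dx^2 at x."""
    return (U(x + h) - 2.0 * U(x) + U(x - h)) / h**2


D, a, x_min, m = 2.0, 1.5, 0.7, 1.0      # illustrative Morse parameters


def morse(x):
    """Morse potential with its minimum U = 0 at x = x_min."""
    return D * (1.0 - math.exp(-a * (x - x_min)))**2


k_eff = second_derivative(morse, x_min)   # analytic value: 2 D a^2 = 9.0
omega = math.sqrt(k_eff / m)              # small-oscillation angular frequency


def harmonic_approx(x):
    """Parabola (106) with the effective spring constant k_eff."""
    return 0.5 * k_eff * (x - x_min)**2


for dx in (0.01, 0.1, 0.5):
    print(dx, morse(x_min + dx), harmonic_approx(x_min + dx))
```

The agreement is good for small displacements and degrades for larger ones, where the anharmonic terms of the Taylor expansion become important.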

4 System of many particles


So far, the mathematical formalism to describe the dynamics of a system was restricted to a single particle. In many cases, however, one is interested in the dynamics of a system of many interacting particles. Let these $N$ particles have individual masses $m_\alpha$ and individual instantaneous positions $\mathbf x_\alpha$, where $\alpha$ is a particle index.
Often, one is not interested in properties like the position or velocity of the individual particles, but in joint properties of the whole ensemble. It is therefore useful to define a few such quantities. Obviously, the total mass $M$ of the ensemble is given by
$$M := \sum_{\alpha=1}^{N}m_\alpha \tag{107}$$


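[Figure: particles with masses $m_1, m_2, m_3, m_\alpha$ at positions $\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_\alpha$ relative to the origin $O$; $\mathbf R$ points to the center-of-mass position, and $\mathbf x'_\alpha$ is the position of particle $\alpha$ relative to it.]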

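With this, one can define a position vector $\mathbf R$ pointing to the center of mass of the ensemble,
$$\mathbf R := \frac{1}{M}\sum_{\alpha=1}^{N}m_\alpha\mathbf x_\alpha\,, \tag{108}$$
which is an average position of all particles, weighted by their masses. One can express the individual particle positions through their displacements $\mathbf x'_\alpha$ relative to the center of mass:
$$\mathbf x_\alpha = \mathbf R + \mathbf x'_\alpha \tag{109}$$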
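A minimal sketch of (107) to (109) for an assumed three-particle ensemble: it computes the center of mass and the displacements $\mathbf x'_\alpha = \mathbf x_\alpha - \mathbf R$, and shows that their mass-weighted sum vanishes by construction, a property used repeatedly in the following sections.

```python
def center_of_mass(masses, positions):
    """R = (1/M) sum_alpha m_alpha x_alpha, eq. (108)."""
    M = sum(masses)                                   # eq. (107)
    dim = len(positions[0])
    return tuple(sum(m * x[i] for m, x in zip(masses, positions)) / M
                 for i in range(dim))


masses = [1.0, 2.0, 3.0]                              # assumed example ensemble
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
R = center_of_mass(masses, positions)
print(R)                                              # ≈ (0.333, 1.0, 0.5)

# displacements relative to the center of mass, eq. (109): x'_alpha = x_alpha - R
rel = [tuple(xi - Ri for xi, Ri in zip(x, R)) for x in positions]
# their mass-weighted sum vanishes by construction
weighted = tuple(sum(m * r[i] for m, r in zip(masses, rel)) for i in range(3))
print(weighted)                                       # zero up to rounding
```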

4.1 Center-of-mass motion


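To obtain the Newtonian equations of motion for the ensemble, one needs to know the force $\mathbf F_\alpha$ acting on each particle $\alpha$. It is convenient to split this force into a component $\mathbf f_\alpha$ caused by the interaction between the particles in the ensemble, and a component $\mathbf F^{\text{ext}}_\alpha$ due to external interactions:
$$\mathbf F_\alpha = \mathbf F^{\text{ext}}_\alpha + \mathbf f_\alpha \quad\text{with}\quad \mathbf f_\alpha = \sum_{\beta\neq\alpha}\mathbf f_{\alpha\beta} \tag{110}$$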

The latter sum considers interactions between particle $\alpha$ and all other particles $\beta$. According to Newton’s third law, these internal forces are antisymmetric, as, e.g., for gravitational attraction or the Coulomb interaction:
$$\mathbf f_{\alpha\beta} = -\mathbf f_{\beta\alpha} \tag{111}$$

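The equation of motion for the whole ensemble is then simply the set of equations of motion for the individual particles:
$$m_\alpha\ddot{\mathbf x}_\alpha = \mathbf F^{\text{ext}}_\alpha + \sum_{\beta\neq\alpha}\mathbf f_{\alpha\beta}\,, \qquad \alpha = 1\ldots N \tag{112}$$
Summing all equations in (112) leads on the left side to
$$\sum_\alpha m_\alpha\ddot{\mathbf x}_\alpha = \frac{d^2}{dt^2}\sum_\alpha m_\alpha\mathbf x_\alpha = M\ddot{\mathbf R}\,. \tag{113}$$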


On the right side, we can define a total external force
$$\mathbf F^{\text{ext}} := \sum_\alpha\mathbf F^{\text{ext}}_\alpha\,. \tag{114}$$
The sum over all internal forces $\mathbf f_{\alpha\beta}$ vanishes,
$$\sum_{\alpha\neq\beta}\mathbf f_{\alpha\beta} = 0\,, \tag{115}$$
because in the double summation over $\alpha$ and $\beta$, each term $\mathbf f_{\alpha\beta}$ is canceled by the term $\mathbf f_{\beta\alpha}$. Thus, the sum over all equations of motion (112) leads to an equation of motion for the center of mass $\mathbf R$ of the ensemble in which all internal forces between the particles drop out:
$$M\ddot{\mathbf R} = \mathbf F^{\text{ext}} \tag{116}$$
This equation of motion has the same form as the one for a single particle. Similarly to the definition of the momentum of an individual particle, one can define a total linear momentum
$$\mathbf P := \sum_\alpha m_\alpha\dot{\mathbf x}_\alpha = M\dot{\mathbf R}\,, \tag{117}$$
with a time derivative that is only determined by the external force:
$$\dot{\mathbf P} = M\ddot{\mathbf R} = \mathbf F^{\text{ext}} \tag{118}$$
This allows one to consider the motion of a whole ensemble (or more precisely, collective properties of the ensemble like $\mathbf R$ and $\mathbf P$) in the same way as that of a single point particle with mass $M$.

4.2 Total angular momentum


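The individual particles in the ensemble all have a well-defined angular momentum with respect to a coordinate origin $O$ as defined in (73),
$$\mathbf l_\alpha = \mathbf x_\alpha\times\mathbf p_\alpha = \mathbf x_\alpha\times m_\alpha\dot{\mathbf x}_\alpha\,. \tag{119}$$
The total angular momentum of the ensemble is then given by
$$\mathbf L := \sum_\alpha\mathbf l_\alpha\,. \tag{120}$$
This definition can be expressed in terms of the center-of-mass position $\mathbf R$ of the ensemble and the individual displacements $\mathbf x'_\alpha$ (109):
$$\begin{aligned}
\mathbf L &= \sum_\alpha(\mathbf R + \mathbf x'_\alpha)\times m_\alpha(\dot{\mathbf R} + \dot{\mathbf x}'_\alpha)\\
&= \sum_\alpha m_\alpha\left[\mathbf R\times\dot{\mathbf R} + \mathbf x'_\alpha\times\dot{\mathbf x}'_\alpha\right]
+ \mathbf R\times\left(\sum_\alpha m_\alpha\dot{\mathbf x}'_\alpha\right) + \left(\sum_\alpha m_\alpha\mathbf x'_\alpha\right)\times\dot{\mathbf R}
\end{aligned}\tag{121}$$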


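The sums in the parentheses are, up to the factor $M$, the center-of-mass position (and velocity) of the ensemble measured relative to the center of mass itself, and therefore vanish:
$$\sum_\alpha m_\alpha\mathbf x'_\alpha = \sum_\alpha m_\alpha(\mathbf x_\alpha - \mathbf R) = \sum_\alpha m_\alpha\mathbf x_\alpha - \sum_\alpha m_\alpha\mathbf R = M\mathbf R - M\mathbf R = 0\,, \tag{122}$$
and consequently also $\sum_\alpha m_\alpha\dot{\mathbf x}'_\alpha = 0$. So the last two terms in (121) vanish. With $\mathbf P = M\dot{\mathbf R}$, this leads to
$$\mathbf L = \mathbf R\times\mathbf P + \sum_\alpha\mathbf x'_\alpha\times\mathbf p'_\alpha \quad\text{with}\quad \mathbf p'_\alpha := m_\alpha\dot{\mathbf x}'_\alpha\,. \tag{123}$$
The total angular momentum is therefore the sum of a contribution from the center-of-mass motion, and the total angular momentum of the particles with respect to the center of mass of the ensemble. This will become important in the Huygens-Steiner theorem for moments of inertia.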
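The decomposition (123) can be verified numerically for an arbitrary ensemble. The sketch draws random masses, positions and velocities (purely illustrative, with a fixed seed) and compares the direct definition (119), (120) with the split into the center-of-mass term and the angular momentum about the center of mass. The small vector helpers are hypothetical names introduced only for this example.

```python
import random


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vsum(vectors):
    return tuple(sum(c) for c in zip(*vectors))


def scale(s, v):
    return tuple(s * c for c in v)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


random.seed(1)                       # random ensemble, purely illustrative
N = 6
masses = [random.uniform(0.5, 2.0) for _ in range(N)]
xs = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(N)]
vs = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(N)]

M = sum(masses)
R = scale(1 / M, vsum(scale(m, x) for m, x in zip(masses, xs)))
R_dot = scale(1 / M, vsum(scale(m, v) for m, v in zip(masses, vs)))
P = scale(M, R_dot)

# direct definition, eqs. (119) and (120)
L_direct = vsum(cross(x, scale(m, v)) for m, x, v in zip(masses, xs, vs))
# decomposition (123): center-of-mass term plus angular momentum about the center of mass
L_about_cm = vsum(cross(sub(x, R), scale(m, sub(v, R_dot)))
                  for m, x, v in zip(masses, xs, vs))
L_split = tuple(a + b for a, b in zip(cross(R, P), L_about_cm))
print(L_direct)
print(L_split)
```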
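To obtain a differential equation for $\mathbf L$ similar to (77), we first consider the temporal derivative of the individual angular momenta:
$$\dot{\mathbf l}_\alpha = \frac{d}{dt}(\mathbf x_\alpha\times\mathbf p_\alpha) = \underbrace{\dot{\mathbf x}_\alpha\times\mathbf p_\alpha}_{=0} + \mathbf x_\alpha\times\dot{\mathbf p}_\alpha \tag{124}$$
The first term vanishes because $\mathbf p_\alpha = m_\alpha\dot{\mathbf x}_\alpha$, and cross products of parallel vectors are zero. Using $\dot{\mathbf p}_\alpha = \mathbf F_\alpha$ and the decomposition (110), one obtains
$$\dot{\mathbf l}_\alpha = \mathbf x_\alpha\times\left(\mathbf F^{\text{ext}}_\alpha + \sum_{\beta\neq\alpha}\mathbf f_{\alpha\beta}\right), \tag{125}$$
and for the derivative of the total angular momentum
$$\dot{\mathbf L} = \sum_\alpha\dot{\mathbf l}_\alpha = \sum_\alpha\mathbf x_\alpha\times\mathbf F^{\text{ext}}_\alpha + \sum_{\alpha\neq\beta}\mathbf x_\alpha\times\mathbf f_{\alpha\beta}\,. \tag{126}$$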

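To see that the last term vanishes, we reorder the summation:
$$\sum_{\alpha\neq\beta}A_{\alpha\beta} = \sum_{\alpha<\beta}A_{\alpha\beta} + \sum_{\alpha>\beta}A_{\alpha\beta} = \sum_{\alpha<\beta}\left(A_{\alpha\beta} + A_{\beta\alpha}\right). \tag{127}$$
With $A_{\alpha\beta} = \mathbf x_\alpha\times\mathbf f_{\alpha\beta}$ and using the antisymmetry $\mathbf f_{\alpha\beta} = -\mathbf f_{\beta\alpha}$ of the inter-particle forces (111), one finds
$$\sum_{\alpha\neq\beta}\mathbf x_\alpha\times\mathbf f_{\alpha\beta} = \sum_{\alpha<\beta}(\mathbf x_\alpha - \mathbf x_\beta)\times\mathbf f_{\alpha\beta}\,. \tag{128}$$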

For central inter-particle forces, $\mathbf f_{\alpha\beta}$ is parallel to the distance vector $\mathbf x_\alpha - \mathbf x_\beta$ between the particles, so each cross product vanishes; therefore, the whole sum (128) vanishes. With the torque $\mathbf N^{\text{ext}}_\alpha := \mathbf x_\alpha\times\mathbf F^{\text{ext}}_\alpha$ on particle $\alpha$ caused by the external force $\mathbf F^{\text{ext}}_\alpha$, according to definition (76), the change of the total angular momentum with time is given by
$$\dot{\mathbf L} = \sum_\alpha\mathbf N^{\text{ext}}_\alpha =: \mathbf N^{\text{ext}}\,. \tag{129}$$
Analogous to the single-particle case in (77), the total angular momentum of the ensemble with respect to an origin $O$ remains constant if the total external torque $\mathbf N^{\text{ext}}$ with respect to $O$ vanishes.
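Both cancellations used above, the vanishing sum of internal forces (115) and the vanishing internal torque (128), can be checked directly. The sketch places five particles at random positions (illustrative, fixed seed) and uses an assumed attractive inverse-square pair force with unit strength, which is central and obeys (111).

```python
import math
import random


def pair_force(xa, xb, strength=1.0):
    """Illustrative central pair force on particle a due to particle b
    (attractive inverse-square law): f_ab = -strength (xa - xb) / |xa - xb|^3."""
    d = [p - q for p, q in zip(xa, xb)]
    r = math.sqrt(sum(c * c for c in d))
    return tuple(-strength * c / r**3 for c in d)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


random.seed(2)
xs = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]

total_force = [0.0, 0.0, 0.0]    # sum over alpha != beta of f_ab, eq. (115)
total_torque = [0.0, 0.0, 0.0]   # sum over alpha != beta of x_a x f_ab, eq. (128)
for a, xa in enumerate(xs):
    for b, xb in enumerate(xs):
        if a == b:
            continue
        f = pair_force(xa, xb)
        tau = cross(xa, f)
        for i in range(3):
            total_force[i] += f[i]
            total_torque[i] += tau[i]
print(total_force, total_torque)   # both vanish up to rounding errors
```

A pair force that is antisymmetric but not parallel to $\mathbf x_\alpha - \mathbf x_\beta$ would still give a vanishing total force, but in general a nonzero internal torque.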

4.3 Work and total kinetic energy


Similar to the single-particle case, we consider the sum of the work for a transition of the ensemble between two configurations $a$ and $b$:
$$W_{ab} = \sum_\alpha\int_a^b\mathbf F_\alpha\cdot d\mathbf x_\alpha \tag{130}$$
Here, the configurations $a$ and $b$ represent sets of positions $\{\mathbf x_1, \mathbf x_2, \ldots\}$ of all particles evolving in time. The integral is a path integral along the trajectory of particle $\alpha$ in this transition. In exactly the same way as for the single-particle case in section 3.1, one replaces the kernel in the integral via
$$\mathbf F_\alpha\cdot d\mathbf x_\alpha = dT_\alpha \quad\text{with}\quad T_\alpha := \frac{1}{2}m_\alpha\mathbf v_\alpha^2\,, \tag{131}$$
and can express the total work as
$$W_{ab} = \sum_\alpha\int_a^b dT_\alpha = T_b - T_a\,, \tag{132}$$

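with a total kinetic energy $T$ of a configuration $a$ or $b$ given by the sum of the individual kinetic energies,
$$T := \sum_\alpha T_\alpha = \sum_\alpha\frac{1}{2}m_\alpha\mathbf v_\alpha^2\,. \tag{133}$$
By splitting the velocity of the individual particles into the center-of-mass velocity $\dot{\mathbf R}$ and a relative velocity $\dot{\mathbf x}'_\alpha$ with respect to the center of mass,
$$\mathbf v_\alpha^2 = \dot{\mathbf x}_\alpha^2 = (\dot{\mathbf x}'_\alpha + \dot{\mathbf R})\cdot(\dot{\mathbf x}'_\alpha + \dot{\mathbf R}) = \mathbf v'^2_\alpha + 2\dot{\mathbf x}'_\alpha\cdot\dot{\mathbf R} + \dot{\mathbf R}\cdot\dot{\mathbf R}\,, \tag{134}$$
the total kinetic energy becomes
$$\begin{aligned}
T &= \sum_\alpha\frac{1}{2}m_\alpha\mathbf v'^2_\alpha + \sum_\alpha\frac{1}{2}m_\alpha\dot{\mathbf R}^2 + \dot{\mathbf R}\cdot\underbrace{\frac{d}{dt}\sum_\alpha m_\alpha\mathbf x'_\alpha}_{=0}\\
&= \sum_\alpha\frac{1}{2}m_\alpha\mathbf v'^2_\alpha + \frac{1}{2}M\dot{\mathbf R}^2\,.
\end{aligned}\tag{135}$$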


So the total kinetic energy is the sum of the kinetic energy of the particles’ motion relative to the center of mass, and the kinetic energy of the center-of-mass motion in the form of a single particle with the total mass $M$ of the ensemble.
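The split (135) can be confirmed numerically for a random ensemble (masses and velocities are illustrative, drawn with a fixed seed): the total kinetic energy equals the kinetic energy of the motion relative to the center of mass plus $\frac{1}{2}M\dot{\mathbf R}^2$, because the cross term drops out.

```python
import random

random.seed(3)                          # illustrative random ensemble
N = 5
masses = [random.uniform(0.5, 2.0) for _ in range(N)]
vels = [tuple(random.uniform(-2, 2) for _ in range(3)) for _ in range(N)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


M = sum(masses)
R_dot = tuple(sum(m * v[i] for m, v in zip(masses, vels)) / M for i in range(3))
rel_vels = [tuple(vi - Ri for vi, Ri in zip(v, R_dot)) for v in vels]   # x'_alpha dot

T_total = sum(0.5 * m * dot(v, v) for m, v in zip(masses, vels))        # eq. (133)
T_relative = sum(0.5 * m * dot(u, u) for m, u in zip(masses, rel_vels))
T_cm = 0.5 * M * dot(R_dot, R_dot)
print(T_total, T_relative + T_cm)                                        # eq. (135)
```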

4.4 Total potential energy


Analogous to the evaluation of the work integral in section 3.2 for a single particle in a conservative force field through a potential, several potentials can be used to describe conservative forces in a particle ensemble. First, the forces on particle $\alpha$ in (130) are split up into external and internal contributions according to (110):
$$W_{ab} = \sum_\alpha\int_a^b\mathbf F^{\text{ext}}_\alpha\cdot d\mathbf x_\alpha + \sum_{\alpha\neq\beta}\int_a^b\mathbf f_{\alpha\beta}\cdot d\mathbf x_\alpha \tag{136}$$

Conservative external and internal forces can be written as
$$\mathbf F^{\text{ext}}_\alpha = -\nabla_\alpha U_\alpha\,, \qquad \mathbf f_{\alpha\beta} = -\nabla_\alpha U_{\alpha\beta}\,, \tag{137}$$
where $\nabla_\alpha$ denotes differentiation with respect to the coordinates $\mathbf x_\alpha$ of particle $\alpha$. The two potentials are different functions: $U_\alpha$ represents external potentials, while $U_{\alpha\beta}$ describes the interaction between particles $\alpha$ and $\beta$ within the ensemble.
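The first term in (136) is completely analogous to the single-particle case:
$$\sum_\alpha\int_a^b\mathbf F^{\text{ext}}_\alpha\cdot d\mathbf x_\alpha = -\sum_\alpha\int_a^b(\nabla_\alpha U_\alpha)\cdot d\mathbf x_\alpha = -\left.\left(\sum_\alpha U_\alpha\right)\right|_a^b \tag{138}$$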

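For the second term in (136), the sum is rearranged to run over only half of the combinations $\alpha, \beta$, as in (128):
$$\sum_{\alpha\neq\beta}\mathbf f_{\alpha\beta}\cdot d\mathbf x_\alpha = \sum_{\alpha<\beta}\left(\mathbf f_{\alpha\beta}\cdot d\mathbf x_\alpha + \mathbf f_{\beta\alpha}\cdot d\mathbf x_\beta\right) = \sum_{\alpha<\beta}\mathbf f_{\alpha\beta}\cdot d(\mathbf x_\alpha - \mathbf x_\beta)\,. \tag{139}$$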

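Formally, the total differential $dU_{\alpha\beta}$ can be written as
$$dU_{\alpha\beta} = \sum_i\left(\frac{\partial U_{\alpha\beta}}{\partial x_{\alpha,i}}dx_{\alpha,i} + \frac{\partial U_{\alpha\beta}}{\partial x_{\beta,i}}dx_{\beta,i}\right) = (\nabla_\alpha U_{\alpha\beta})\cdot d\mathbf x_\alpha + (\nabla_\beta U_{\alpha\beta})\cdot d\mathbf x_\beta\,. \tag{140}$$
For inter-particle central forces, the potential $U_{\alpha\beta}$ only depends on the modulus of the distance, $|\mathbf x_\alpha - \mathbf x_\beta|$; therefore, $U_{\alpha\beta} = U_{\beta\alpha}$. Then,
$$\nabla_\beta U_{\alpha\beta} = \nabla_\beta U_{\beta\alpha} = -\mathbf f_{\beta\alpha} = \mathbf f_{\alpha\beta}\,, \tag{141}$$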


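and the total differential $dU_{\alpha\beta}$ in (140) becomes
$$dU_{\alpha\beta} = -\mathbf f_{\alpha\beta}\cdot d\mathbf x_\alpha + \mathbf f_{\alpha\beta}\cdot d\mathbf x_\beta = -\mathbf f_{\alpha\beta}\cdot d(\mathbf x_\alpha - \mathbf x_\beta) \tag{142}$$
With this, the second term in the total work (136) can be written as
$$\sum_{\alpha\neq\beta}\int_a^b\mathbf f_{\alpha\beta}\cdot d\mathbf x_\alpha = -\sum_{\alpha<\beta}\int_a^b dU_{\alpha\beta} = -\left.\left(\sum_{\alpha<\beta}U_{\alpha\beta}\right)\right|_a^b\,, \tag{143}$$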

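so the complete expression for the total work (136) is
$$W_{ab} = -\left.\left(\sum_\alpha U_\alpha + \sum_{\alpha<\beta}U_{\alpha\beta}\right)\right|_a^b = -\left.U\right|_a^b = U_a - U_b\,, \tag{144}$$
where $U$ in the last expression is the total potential energy
$$U := \underbrace{\sum_\alpha U_\alpha}_{\text{external}} + \underbrace{\sum_{\alpha<\beta}U_{\alpha\beta}}_{\text{internal}} \tag{145}$$
for a given system state defined by the positions $\{\mathbf x_1, \mathbf x_2, \ldots\}$ of all particles.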

4.5 Energy conservation


As before in the case of a single particle, the work in a configuration change can be written both as a difference in total kinetic energy and as a difference in total potential energy. Combining (132) and (144) gives
$$W_{ab} = T_b - T_a = U_a - U_b\,, \tag{146}$$
or after reordering
$$T_a + U_a = T_b + U_b\,, \tag{147}$$
which means that the total energy $E := T + U$ of the many-particle system is conserved in an evolution between states $a$ and $b$.
It should be noted that for rigid bodies, the second term in (136) vanishes, because the inter-particle distances $|\mathbf x_\alpha - \mathbf x_\beta|$ do not change in a system state change $a \to b$. Thus, the contribution of the internal interactions to (145) stays constant and need not be evaluated explicitly. The total kinetic and total potential energy of the system are then just the sums of the individual contributions of the participating particles.
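A sketch of energy conservation in a small many-particle system, under assumed conditions: three particles in a plane, coupled pairwise by central spring potentials $U_{\alpha\beta} = \frac{1}{2}K(|\mathbf x_\alpha - \mathbf x_\beta| - \ell)^2$, in a uniform external field with $U_\alpha = m_\alpha g y_\alpha$. All parameters and initial conditions are illustrative. The total potential energy is evaluated as in (145), with the internal sum over $\alpha < \beta$, and the motion is integrated with velocity Verlet; $E = T + U$ stays constant up to a small integration error, even though $T$ and $U$ individually change a lot as the system falls.

```python
import math

G = 9.81                       # uniform external field along -y (assumed)
K_SPRING, REST = 20.0, 1.0     # central pair potential K (d - REST)^2 / 2 (assumed)
masses = [1.0, 2.0, 1.5]
pos = [[0.0, 0.0], [1.2, 0.1], [0.4, 1.0]]
vel = [[0.5, 0.0], [-0.2, 0.3], [0.0, -0.4]]


def forces(pos):
    """F_alpha = F_ext + sum_beta f_alpha_beta, with f = -grad_alpha U_alpha_beta."""
    F = [[0.0, -m * G] for m in masses]          # F_ext = -grad (m g y)
    n = len(pos)
    for a in range(n):
        for b in range(a + 1, n):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy)
            s = -K_SPRING * (d - REST) / d       # f_ab = s * (x_a - x_b)
            F[a][0] += s * dx
            F[a][1] += s * dy
            F[b][0] -= s * dx                    # f_ba = -f_ab, eq. (111)
            F[b][1] -= s * dy
    return F


def total_energy(pos, vel):
    """E = T + U with U from (145): external part plus internal pairs alpha < beta."""
    T = sum(0.5 * m * (v[0]**2 + v[1]**2) for m, v in zip(masses, vel))
    U_ext = sum(m * G * p[1] for m, p in zip(masses, pos))
    U_int = sum(0.5 * K_SPRING * (math.dist(pos[a], pos[b]) - REST)**2
                for a in range(len(pos)) for b in range(a + 1, len(pos)))
    return T + U_ext + U_int


dt, steps = 1e-3, 2000
E0 = total_energy(pos, vel)
energies = []
F = forces(pos)
for _ in range(steps):                           # velocity Verlet
    for p, v, f, m in zip(pos, vel, F, masses):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m
            p[i] += dt * v[i]
    F = forces(pos)
    for v, f, m in zip(vel, F, masses):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / m
    energies.append(total_energy(pos, vel))
print(E0, max(abs(E - E0) for E in energies))
```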


5 Lagrangian mechanics - first approach


So far, the equations of motion have been derived from Newton’s laws, and required knowledge of the forces on all particles. This can sometimes become challenging, in particular in cases where forces need to be considered that are only there to meet constraints. In general, the equations of motion of a system following Newtonian mechanics can be written as
$$m_k\ddot x_k = F_k(\{x_l, \dot x_l\}, t)\,. \tag{148}$$
This is a system of differential equations for all $x_k$, where $k$ indexes both the coordinate components (like $x, y, z$) and the particle in a many-particle system: a system of two particles moving in three-dimensional space leads to 6 differential equations. The notation $\{x_l, \dot x_l\}$ indicates that, in principle, all forces $F_k$ can depend on the coordinates $x_l$ and velocities $\dot x_l$ of all particles. Additionally, they can explicitly depend on time $t$.
Similarly, the total kinetic energy of the system (133) can be written as

$$
T = T(\{\dot x_l\}) = \sum_k T_k = \sum_k \frac{1}{2} m_k \dot x_k^2\,. \tag{149}
$$

The Cartesian momentum component for coordinate index $k$ is then given by

$$
p_k = m_k \dot x_k = \frac{\partial T}{\partial \dot x_k}\,, \tag{150}
$$

because the derivatives of the terms $T_l$ in (149) with respect to $\dot x_k$ vanish for $l \neq k$.
The left side of (148) can be expressed by the total time derivative of (150),

$$
m_k \ddot x_k = \dot p_k = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot x_k}\right) = F_k(\{x_l, \dot x_l\}, t)\,. \tag{151}
$$

For conservative forces, one can write $F_k = -\partial U/\partial x_k$ with a total potential energy $U$ for the whole system, compatible with the definition in (145). Then, the equations of motion become

$$
\frac{d}{dt}\left(\frac{\partial T}{\partial \dot x_k}\right) = -\frac{\partial U}{\partial x_k} \quad \text{for all } k. \tag{152}
$$

With the definition of a so-called Lagrange function $L := T - U$, this can be written as

$$
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot x_k}\right) - \frac{\partial L}{\partial x_k} = 0 \quad \text{for all } k\,, \tag{153}
$$

because $U$ does not contribute to the derivative with respect to $\dot x_k$, and similarly, $T$ does not contribute to the derivative with respect to $x_k$. The set (153) is called the Lagrange equations of motion of a physical system; for conservative forces they are equivalent to the Newtonian equations of motion (148).
So far, there is no obvious advantage of this method for obtaining the equations of motion. However, it will simplify the treatment of many systems, because these equations of motion take the same form also in general, possibly non-Cartesian coordinates that better reflect the symmetry of a system.

5.1 Example: harmonic oscillator


We consider again the one-dimensional harmonic oscillator. The kinetic energy is given by $T = mv^2/2$, and the potential energy according to (106) by $U = kx^2/2$ for an equilibrium position $x_0 = 0$. Then, the Lagrange function becomes

$$
L = T - U = \frac{1}{2} m \dot x^2 - \frac{1}{2} k x^2\,. \tag{154}
$$

The corresponding Lagrange equation is then

$$
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot x}\right) - \frac{\partial L}{\partial x} = \frac{d}{dt}(m \dot x) + kx = m \ddot x + kx = 0\,, \tag{155}
$$

which is equivalent to (64).
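The Lagrange equation (155) can also be checked numerically without any symbolic algebra. The sketch below (our own construction; the helper names and the values of m, k and the amplitude are arbitrary) evaluates the left side of (153) with central finite differences, once along the solution $x(t) = A\cos(\omega t)$ with $\omega = \sqrt{k/m}$, and once along a path with the wrong frequency. Only the true motion should make the residual vanish, up to finite-difference errors.

```python
import math

m, k = 2.0, 8.0                 # mass and spring constant (arbitrary example values)
omega = math.sqrt(k / m)


def lagrangian(x, xdot):
    """L = T - U for the one-dimensional harmonic oscillator, eq. (154)."""
    return 0.5 * m * xdot**2 - 0.5 * k * x**2


def partial(f, args, i, h=1e-6):
    """Central-difference partial derivative of f with respect to argument i."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def residual(path, t, h=1e-4):
    """d/dt (dL/dxdot) - dL/dx of eq. (153), evaluated along path(t) at time t."""
    def state(s):
        return path(s), (path(s + h) - path(s - h)) / (2 * h)

    def dL_dxdot(s):
        return partial(lagrangian, state(s), 1)

    ddt = (dL_dxdot(t + h) - dL_dxdot(t - h)) / (2 * h)
    return ddt - partial(lagrangian, state(t), 0)


def true_path(t):
    return 1.5 * math.cos(omega * t)          # solution of (155)


def wrong_path(t):
    return 1.5 * math.cos(1.3 * omega * t)    # wrong frequency, not a solution


for t in (0.1, 0.7, 1.9):
    print(f"t = {t}: true path {residual(true_path, t):+.2e}, "
          f"wrong path {residual(wrong_path, t):+.3f}")
```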

5.2 Generalization to some non-conservative forces


With the Lagrange formalism, it is also possible to treat some non-conservative, velocity-dependent forces, as long as they can be written as

$$
F_k = F_k(\{x_l, \dot x_l\}, t) = -\frac{\partial V}{\partial x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k}\,, \tag{156}
$$

where $V = V(\{x_l, \dot x_l\}, t)$ is a velocity-dependent pseudo-potential. With a Lagrange function $L = T - V$, the left side of the Newtonian equation of motion (151) can be written as

$$
m_k \ddot x_k = \frac{d}{dt}\frac{\partial T}{\partial \dot x_k} = \frac{d}{dt}\frac{\partial L}{\partial \dot x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k}\,. \tag{157}
$$

Taking the right side of (151) from (156), and observing that $\partial L/\partial x_k = -\partial V/\partial x_k$ because $T$ does not depend on $x_k$, one finds

$$
\frac{d}{dt}\frac{\partial L}{\partial \dot x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k} = -\frac{\partial V}{\partial x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k} = \frac{\partial L}{\partial x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k}\,. \tag{158}
$$

Cancelling the common term $\frac{d}{dt}(\partial V/\partial \dot x_k)$ on both sides, one ends up with a Lagrange equation of motion of the form (153).
This extends the usefulness of the Lagrange formalism significantly beyond conservative forces. An important example of such a force is the one felt by a charged particle in the presence of both electric and magnetic fields.


5.2.1 Lagrange function for a charge under electromagnetic forces


A particle with a point charge $q$ is subject to forces due to both electric and magnetic fields:

$$
\mathbf F = q\,(\mathbf E + \mathbf v \times \mathbf B)\,. \tag{159}
$$

The first part is due to the electric field vector $\mathbf E$, the second part is the Lorentz force due to the presence of a magnetic field $\mathbf B$.
In order to show that such an interaction can be written in Lagrangian form, we make use of a result from electrodynamics: the electric and magnetic fields can be derived from two potentials,

$$
\mathbf E = -\nabla\Phi - \frac{\partial \mathbf A}{\partial t}\,, \quad \text{and} \quad \mathbf B = \nabla \times \mathbf A\,, \tag{160}
$$

where $\Phi = \Phi(\mathbf x, t)$ is the scalar electric potential, and $\mathbf A = \mathbf A(\mathbf x, t)$ the so-called vector potential. We will show that a "pseudopotential" energy

$$
V = V(\mathbf x, \dot{\mathbf x}, t) = q\,[\Phi(\mathbf x, t) - \dot{\mathbf x} \cdot \mathbf A(\mathbf x, t)] \tag{161}
$$

can reproduce the electromagnetic force (159) via (156). For this, we first rewrite (159) in terms of individual components $k$, and express the fields $\mathbf E$, $\mathbf B$ by the potentials $\Phi$, $\mathbf A$ according to (160):
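$$
\begin{aligned}
F_k &= q\Big(E_k + \sum_{l,m} \epsilon_{klm}\, \dot x_l B_m\Big)\\
&= q\left[-\frac{\partial\Phi}{\partial x_k} - \frac{\partial A_k}{\partial t} + \sum_{l,m} \epsilon_{klm}\, \dot x_l \Big(\sum_{p,q} \epsilon_{mpq} \frac{\partial A_q}{\partial x_p}\Big)\right]\\
&= q\left[-\frac{\partial\Phi}{\partial x_k} - \frac{\partial A_k}{\partial t} + \sum_{l,p,q} \Big(\sum_m \epsilon_{klm}\epsilon_{mpq}\Big) \dot x_l \frac{\partial A_q}{\partial x_p}\right]
\end{aligned} \tag{162}
$$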
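By using an identity for the sum over the Levi-Civita symbols,

$$
\sum_m \epsilon_{klm}\epsilon_{mpq} = \sum_m \epsilon_{mkl}\epsilon_{mpq} = \delta_{kp}\delta_{lq} - \delta_{kq}\delta_{lp}\,, \tag{163}
$$

the summations over $p$ and $q$ can be removed:

$$
F_k = q\left[-\frac{\partial\Phi}{\partial x_k} - \frac{\partial A_k}{\partial t} + \sum_l \dot x_l \left(\frac{\partial A_l}{\partial x_k} - \frac{\partial A_k}{\partial x_l}\right)\right]. \tag{164}
$$

In the next step, we derive the force via (156) from the pseudopotential $V$ in (161):

$$
\begin{aligned}
F_k &= -\frac{\partial V}{\partial x_k} + \frac{d}{dt}\frac{\partial V}{\partial \dot x_k}\\
&= q\left[-\frac{\partial\Phi}{\partial x_k} + \sum_l \dot x_l \frac{\partial A_l}{\partial x_k} - \frac{d}{dt} A_k\right]\\
&= q\left[-\frac{\partial\Phi}{\partial x_k} + \sum_l \dot x_l \frac{\partial A_l}{\partial x_k} - \frac{\partial A_k}{\partial t} - \sum_l \dot x_l \frac{\partial A_k}{\partial x_l}\right],
\end{aligned} \tag{165}
$$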


which is exactly the same as the force derived from the fields in (164). Therefore, the forces (159) can be expressed via the pseudopotential (161).
As a consequence, the Lagrange function or Lagrangian for a charged particle in time-dependent electromagnetic fields is given by

$$
L = T - V = \frac{1}{2} m \dot{\mathbf x}^2 - q\,[\Phi(\mathbf x, t) - \dot{\mathbf x} \cdot \mathbf A(\mathbf x, t)]\,, \tag{166}
$$

and its equations of motion follow from the Lagrange equations (153).
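As a numerical cross-check of the steps from (159) to (165), the following sketch first verifies the Levi-Civita identity (163) by brute force over all index values, and then compares $q(\mathbf E + \mathbf v \times \mathbf B)$ with the force obtained from the pseudopotential (161) via (165). The potentials $\Phi$ and $\mathbf A$ below are hypothetical smooth functions chosen only for this test, and all derivatives are central finite differences. Because both forces are built from the same numerical derivatives, they should agree up to rounding errors, which is the content of the algebraic identity used to arrive at (164).

```python
import itertools

CHARGE = 1.3        # particle charge q (arbitrary value)
H = 1e-5            # finite-difference step


def levi_civita(i, j, k):
    """Totally antisymmetric symbol epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    return 1 if i == j else 0


# Brute-force check of identity (163) for all index combinations.
identity_holds = all(
    sum(levi_civita(k, l, m) * levi_civita(m, p, q) for m in range(3))
    == delta(k, p) * delta(l, q) - delta(k, q) * delta(l, p)
    for k, l, p, q in itertools.product(range(3), repeat=4)
)


# Hypothetical smooth potentials Phi(x, t) and A(x, t), chosen only for this test.
def phi(x, t):
    return -(0.4 * x[0] - 0.2 * x[1] + 0.1 * x[2]) * (1 + 0.5 * t)


def vector_potential(x, t):
    return [-0.5 * x[1] * (1 + 0.2 * t), 0.5 * x[0] + 0.3 * x[2] ** 2, 0.1 * x[0] * x[1]]


def component(k):
    """The function (x, t) -> A_k(x, t)."""
    return lambda x, t: vector_potential(x, t)[k]


def d_dx(f, x, t, l):
    """Central difference of f(x, t) with respect to x_l."""
    up, down = list(x), list(x)
    up[l] += H
    down[l] -= H
    return (f(up, t) - f(down, t)) / (2 * H)


def d_dt(f, x, t):
    return (f(x, t + H) - f(x, t - H)) / (2 * H)


def lorentz_force(x, v, t):
    """q (E + v x B) with E and B obtained from the potentials via (160)."""
    E = [-d_dx(phi, x, t, k) - d_dt(component(k), x, t) for k in range(3)]
    B = [sum(levi_civita(m, p, q) * d_dx(component(q), x, t, p)
             for p in range(3) for q in range(3)) for m in range(3)]
    return [CHARGE * (E[k] + sum(levi_civita(k, l, m) * v[l] * B[m]
                                 for l in range(3) for m in range(3)))
            for k in range(3)]


def pseudopotential_force(x, v, t):
    """-dV/dx_k + d/dt dV/dxdot_k for V = q (Phi - xdot . A), following (165)."""
    force = []
    for k in range(3):
        gradient_part = -d_dx(phi, x, t, k) + sum(v[l] * d_dx(component(l), x, t, k) for l in range(3))
        # total time derivative of A_k along the trajectory
        dAk_dt = d_dt(component(k), x, t) + sum(v[l] * d_dx(component(k), x, t, l) for l in range(3))
        force.append(CHARGE * (gradient_part - dAk_dt))
    return force


x0, v0, t0 = [0.3, -1.2, 0.8], [1.0, 0.5, -2.0], 0.7
print("identity (163) holds:", identity_holds)
print("q(E + v x B):        ", lorentz_force(x0, v0, t0))
print("from pseudopotential:", pseudopotential_force(x0, v0, t0))
```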

6 Lagrangian mechanics from Hamilton's principle

In the previous section, the dynamics of a physical system was expressed not via equations of motion provided by Newton's laws, but via the definition of the Lagrange function $L = L(\{x_l, \dot x_l\}, t) = T - U$ and a set of differential equations, the so-called Lagrange equations (153).
Since the Lagrange equations are just a mathematical prescription for obtaining the equations of motion of a system, all the information on how a system evolves is encoded in the form of the Lagrange function. So far, we used Newton's laws, which have forces as their central concept, to derive the Lagrange equations. However, the forces do not appear in the final equations of motion. Thus, several attempts were made to derive the equations of motion from different principles that do not require the notion of forces.
A long-popular concept is the principle of minimizing a certain quantity. For example, to explain how light is reflected off a mirror, Hero³ postulated that light takes the shortest path between two points. About 1000 years later, this principle was extended by Ibn al-Haytham⁴ to describe the refraction of light as well, postulating that the path of the light is determined by the shortest time to travel between two points. This was stated in a modern form by Fermat⁵ and is known as Fermat's principle.
After a number of attempts by physicists over time to find such a principle governing mechanics, Hamilton⁶ announced in 1834/35 the dynamical principle, now usually referred to as Hamilton's principle, that does exactly this. It essentially states:
Of all possible paths along which a system may move from one point to another within a specified time interval, the actual path followed is the one that minimizes the time integral of the difference between kinetic and potential energies.
³ Hero of Alexandria, ≈10-70, in the year 60
⁴ Ibn al-Haytham from Cairo, 965-1040
⁵ Pierre de Fermat, Toulouse, France, 1601-1665
⁶ Sir William Rowan Hamilton, 1805-1865


A mathematical formulation of this is often written as

$$
\delta S = 0\,, \quad \text{with} \quad S := \int_{t_1}^{t_2} (T - U)\, dt = \int_{t_1}^{t_2} L\, dt\,, \tag{167}
$$

where the time integral $S$ is referred to as the action of a physical system evolving between two states at times $t_1$ and $t_2$. The symbol $\delta$ here refers, somewhat vaguely, to a variation of a quantity; it tries to capture what is meant by "the trajectory chosen by nature makes $S$ extremal". The quantity $\delta S$ is the change of $S$ under a small variation of the actual trajectory, and it should vanish (to first order) for the extremal path, in the same way that the change $df$ of a function $f(x)$ under a variation $x \to x + dx$ vanishes to first order at a minimum or maximum of $f(x)$. The mathematical discipline of variational calculus addresses exactly this problem.
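To make the statement $\delta S = 0$ more concrete, here is a small sketch (our own discretization, not part of the notes) for the harmonic oscillator Lagrangian (154): the action (167) is approximated by a midpoint-rule sum and evaluated for paths $x(t) + \alpha\,\eta(t)$ whose deviation $\eta$ vanishes at both end points, anticipating the construction in section 6.1. Around the true motion $x(t) = \cos(\omega t)$ the numerical slope $dS/d\alpha$ at $\alpha = 0$ should be close to zero, whereas around a straight line with the same end points it is not. The time interval is assumed shorter than half an oscillation period, so here the extremum is in fact a minimum.

```python
import math

m, k = 1.0, 1.0                  # oscillator parameters (arbitrary)
omega = math.sqrt(k / m)
t1, t2, n = 0.0, 2.0, 4000       # time interval (shorter than half a period) and grid size


def action(path):
    """Midpoint-rule approximation of S = int (T - U) dt for the oscillator (154)."""
    dt = (t2 - t1) / n
    s = 0.0
    for i in range(n):
        xa, xb = path(t1 + i * dt), path(t1 + (i + 1) * dt)
        xdot = (xb - xa) / dt              # velocity on the sub-interval
        xmid = 0.5 * (xa + xb)             # position at its midpoint
        s += (0.5 * m * xdot**2 - 0.5 * k * xmid**2) * dt
    return s


def eta(t):
    """Deviation that vanishes at both end points t1 and t2."""
    return math.sin(math.pi * (t - t1) / (t2 - t1))


def varied(path, alpha):
    return lambda t: path(t) + alpha * eta(t)


def slope(path, h=1e-4):
    """Numerical dS/dalpha at alpha = 0, cf. (172)."""
    return (action(varied(path, h)) - action(varied(path, -h))) / (2 * h)


def true_path(t):
    return math.cos(omega * t)


def straight_path(t):
    """Same end points as true_path, but not a solution of (155)."""
    return true_path(t1) + (true_path(t2) - true_path(t1)) * (t - t1) / (t2 - t1)


print("dS/dalpha, true path:    ", slope(true_path))
print("dS/dalpha, straight path:", slope(straight_path))
print("S(0.1) - S(0), true path:", action(varied(true_path, 0.1)) - action(true_path))
```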

6.1 Elements of variational calculus - Euler equation


To solve a variational problem of the kind proposed by Hamilton's principle, we first consider a function $y(x)$ and a recipe to obtain a number $J$ from this function,

$$
J = \int_{x_1}^{x_2} f\big(y(x), y'(x), x\big)\, dx\,, \tag{168}
$$

where $f$ is a function that depends on three parameters: the function $y(x)$ itself, its derivative $y'(x)$ with respect to the function parameter $x$, and $x$ itself. Such a mapping from a function $y(x)$ to a number $J$ is referred to as a functional. Examples for the integrand $f$ would be

$$
f = \sqrt{y\,(1 + y'^2)} \quad \text{or} \quad f = \sqrt{\frac{1 + y'^2}{x}}\,, \tag{169}
$$

where the first one does not explicitly depend on $x$, and the second one does not explicitly depend on $y$.
To find a condition on $y(x)$ that minimizes or maximizes the value $J$, we consider a small deviation of $y(x)$ from that optimum in the form of a variation

$$
y(\alpha, x) = y(x) + \alpha\,\eta(x)\,, \tag{170}
$$

where $\alpha$ is a parameter to gradually add the "deviation" $\eta(x)$ to the optimal solution $y(x)$. At the end points $x_{1,2}$, the deviation should vanish, $\eta(x_1) = \eta(x_2) = 0$.
Now, the functional $J$ becomes a function of the perturbation parameter $\alpha$,

$$
J(\alpha) = \int_{x_1}^{x_2} f\big(y(\alpha, x), y'(\alpha, x), x\big)\, dx\,. \tag{171}
$$

[Figure: the optimal function y(x) and two varied functions y(x) + α₁η(x) and y(x) + α₂η(x), all passing through the same end points at x1 and x2.]

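For a function $y(x)$ that makes $J$ extremal, one would require the condition

$$
\left.\frac{\partial J}{\partial \alpha}\right|_{\alpha = 0} \overset{!}{=} 0 \quad \text{for all } \eta(x)\,, \tag{172}
$$

because a small change from the optimal $y(x)$ does not change the value of $J$ to first order at the optimum. Condition (172) will now lead to a way to construct $y(x)$:

$$
\begin{aligned}
\frac{\partial J}{\partial \alpha} &= \int_{x_1}^{x_2} \left(\frac{\partial f}{\partial y}\frac{\partial y}{\partial \alpha} + \frac{\partial f}{\partial y'}\frac{\partial y'}{\partial \alpha}\right) dx
= \int_{x_1}^{x_2} \Big(\frac{\partial f}{\partial y}\underbrace{\frac{\partial y}{\partial \alpha}}_{=\eta(x)} + \frac{\partial f}{\partial y'}\underbrace{\frac{\partial^2 y}{\partial \alpha\, \partial x}}_{=d\eta(x)/dx}\Big) dx\\
&= \int_{x_1}^{x_2} \frac{\partial f}{\partial y}\,\eta(x)\, dx + \int_{x_1}^{x_2} \frac{\partial f}{\partial y'}\frac{d\eta(x)}{dx}\, dx
\end{aligned} \tag{173}
$$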
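The last part can be integrated by parts, using $u = \partial f/\partial y'$, $v = \eta(x)$, and $\int u v' = uv - \int u' v$:

$$
\begin{aligned}
\frac{\partial J}{\partial \alpha} &= \int_{x_1}^{x_2} \frac{\partial f}{\partial y}\,\eta(x)\, dx + \underbrace{\left.\frac{\partial f}{\partial y'}\,\eta(x)\right|_{x_1}^{x_2}}_{=0 \text{ because } \eta(x_1) = \eta(x_2) = 0} - \int_{x_1}^{x_2} \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\eta(x)\, dx\\
&= \int_{x_1}^{x_2} \left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right)\eta(x)\, dx
\end{aligned} \tag{174}
$$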

To make $J$ extremal, $\partial J/\partial \alpha$ needs to vanish for all deviations $\eta(x)$. Since $\eta(x)$ can be chosen freely (apart from its values at the end points), this is only possible if the expression in the parentheses vanishes everywhere between $x_1$ and $x_2$. As the expression has to be evaluated for $\alpha = 0$, it leads to a differential equation for the optimal $y(x)$:

$$
\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} = 0\,. \tag{175}
$$

This condition is called the Euler equation, and provides a differential equation for the function $y(x)$ that maximizes or minimizes $J$ (more generally, makes it stationary). This is a purely mathematical result. For $f = L(x, \dot x, t)$, i.e., with $y \to x$ and the independent variable $x \to t$, (175) has exactly the form of the Lagrange equation (153) derived from Newton's laws earlier. Therefore, in a mechanics context, equations of motion of this form are also called Euler-Lagrange equations.


6.1.1 Second form of the Euler equation


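The Euler equation (175) can be simplified if the function $f$ does not explicitly depend on $x$:

$$
f = f(y, y')\,. \tag{176}
$$

To arrive there, one starts with the total derivative

$$
\frac{df}{dx} = \underbrace{\frac{\partial f}{\partial x}}_{=0} + \frac{\partial f}{\partial y}\,y' + \frac{\partial f}{\partial y'}\,y''\,. \tag{177}
$$

The first term vanishes because $f$ does not explicitly depend on $x$. By evaluating

$$
\frac{d}{dx}\left(y'\frac{\partial f}{\partial y'}\right) = y''\frac{\partial f}{\partial y'} + y'\frac{d}{dx}\frac{\partial f}{\partial y'} \tag{178}
$$

and replacing the first term on the right side using the last term in (177), one finds

$$
\begin{aligned}
\frac{d}{dx}\left(y'\frac{\partial f}{\partial y'}\right) &= \frac{df}{dx} - \frac{\partial f}{\partial y}\,y' + y'\frac{d}{dx}\frac{\partial f}{\partial y'}\\
&= \frac{df}{dx} + y'\underbrace{\left(\frac{d}{dx}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y}\right)}_{=0}\,.
\end{aligned} \tag{179}
$$

The last term vanishes because of the Euler equation (175). The rest can be written as

$$
\frac{d}{dx}\left(f - y'\frac{\partial f}{\partial y'}\right) = 0 \quad \text{or} \quad f - y'\frac{\partial f}{\partial y'} = \text{const.} \tag{180}
$$

This is the so-called second form of the Euler equation for $y(x)$ to minimize or maximize the functional $J$ in (168).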

6.1.2 Example: shape of a heavy rope


As a simple example for a variational problem, we consider the shape of a heavy
chain or rope suspended between two poles:

[Figure: heavy rope hanging between two poles a distance 2a apart, with horizontal axis x and vertical axis y.]

The principle that defines the shape of the rope is the demand that the potential energy of the chain at rest should be minimal: any deviation from that configuration would set the system in motion, and this motion would eventually be converted into heat via friction. The problem is completely defined by specifying the length $l$ of the rope and the spacing $2a < l$ between the poles.
The total potential energy $U$ of a rope with line density (i.e., mass per length) $\rho$ in the gravitational acceleration $g$ is given by

$$
U = \int \rho\, g\, y\, ds\,, \tag{181}
$$

where the integration is carried out along the rope, with a line element $ds$. This integration along the rope can be expressed by an integration along $x$ with the line element $ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + y'^2}\, dx$:

$$
U = \rho g \int_{-a}^{a} y \sqrt{1 + y'^2}\, dx\,. \tag{182}
$$

This is a variational problem of the "second form" (180) with $f = y\sqrt{1 + y'^2}$.
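With the constant $c$ required by the Euler equation in the second form, we get

$$
\begin{aligned}
c &= f - y'\frac{\partial f}{\partial y'} = y\sqrt{1 + y'^2} - y'\,\frac{y y'}{\sqrt{1 + y'^2}}\\
&= \frac{y\,(1 + y'^2)}{\sqrt{1 + y'^2}} - \frac{y y'^2}{\sqrt{1 + y'^2}} = \frac{y}{\sqrt{1 + y'^2}}\,,
\end{aligned} \tag{183}
$$

which can be further transformed into

$$
c\, y' = c\,\frac{dy}{dx} = \sqrt{y^2 - c^2} \quad \text{or} \quad \frac{c\, dy}{\sqrt{y^2 - c^2}} = dx\,. \tag{184}
$$

Integration on both sides leads to

$$
\operatorname{arccosh}\frac{y}{c} = \frac{1}{c}\,(x + x_0)\,, \tag{185}
$$

or finally

$$
y(x) = c \cosh\frac{x + x_0}{c}\,, \tag{186}
$$

with the two integration constants $c$ and $x_0$. Since $\cosh z = (e^z + e^{-z})/2$ is an even function and the problem is symmetric about the midpoint between the poles, we place the origin there and choose $x_0 = 0$. To fix the remaining constant $c$ from the rope length, we need an expression for the length, and with $y' = \sinh(x/c)$ we find

$$
l = \int_{-a}^{a} \sqrt{1 + y'^2}\, dx = \int_{-a}^{a} \sqrt{1 + \sinh^2\frac{x}{c}}\, dx = \int_{-a}^{a} \cosh\frac{x}{c}\, dx = c \sinh\frac{x}{c}\Big|_{-a}^{a} = 2c \sinh\frac{a}{c}\,. \tag{187}
$$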


This leads to a transcendental equation

$$
\sinh z = \frac{l}{2a}\, z \quad \text{with} \quad z = \frac{a}{c}\,, \tag{188}
$$

which finally has to be solved numerically for $z$ and consequently $c$.
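Equation (188) has no closed-form solution. For $l > 2a$, the function $\sinh z / z$ grows monotonically from 1, so there is exactly one positive root, and a simple bisection (our choice; any root finder would do) is enough. The sketch below uses the example values $a = 1$, $l = 3$, then checks the length integral (187) with Simpson's rule and confirms that $f - y'\,\partial f/\partial y'$ from (183) equals $c$ along $y(x) = c\cosh(x/c)$.

```python
import math


def solve_z(ratio, tol=1e-14):
    """Positive root of sinh(z) = ratio * z, eq. (188), by bisection (needs ratio = l/(2a) > 1)."""
    if ratio <= 1:
        raise ValueError("the rope must be longer than the spacing: l > 2a")

    def g(z):
        return math.sinh(z) - ratio * z

    lo, hi = 1e-12, 1.0          # g(lo) < 0 for ratio > 1
    while g(hi) < 0:             # widen the bracket until the sign changes
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    total += 4 * sum(f(lo + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(lo + i * h) for i in range(2, n, 2))
    return total * h / 3


a, l = 1.0, 3.0                  # half the pole spacing and rope length (example values)
z = solve_z(l / (2 * a))
c = a / z


def y(x):
    return c * math.cosh(x / c)          # eq. (186) with x0 = 0


def yp(x):
    return math.sinh(x / c)


length = simpson(lambda x: math.sqrt(1 + yp(x) ** 2), -a, a)   # first integral in (187)
# f - y' df/dy' from (183); should equal c everywhere on the rope
first_integral = [y(x) * math.sqrt(1 + yp(x) ** 2) - yp(x) * y(x) * yp(x) / math.sqrt(1 + yp(x) ** 2)
                  for x in (-a, -0.3 * a, 0.0, 0.8 * a)]
print(f"z = {z:.10f}, c = {c:.10f}, rope length = {length:.10f}")
print("f - y' df/dy' along the rope:", first_integral)
```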

6.1.3 Euler equations with several dependent variables


The Lagrange equations (153) typically involve several dependent variables, i.e., all coordinates of all particles. This corresponds to $N$ individual functions $y_k(x)$ that all depend on the same $x$. The integrand $f$ in (168),

$$
f = f(\{y_k(x), y_k'(x)\}; x)\,, \tag{189}
$$

now depends on an ensemble $\{y_k(x), y_k'(x)\}$ of variables $y_k$ and their derivatives $y_k'$ with respect to $x$, in the same way as the forces in (148) depend on $\{x_l, \dot x_l\}$. For the variational argument, one defines a variation for each component $y_k$ in the same way as in (170),

$$
y_k(\alpha, x) = y_k(x) + \alpha\,\eta_k(x)\,. \tag{190}
$$

The extremal condition on $J$ in (168) becomes

$$
\frac{\partial J}{\partial \alpha} = \int_{x_1}^{x_2} \sum_k \left(\frac{\partial f}{\partial y_k} - \frac{d}{dx}\frac{\partial f}{\partial y_k'}\right)\eta_k(x)\, dx \overset{!}{=} 0\,. \tag{191}
$$

Since all deviation functions $\eta_k(x)$ are independent (assuming the $y_k$ are not linked by constraints), each of the parentheses in the sum above has to vanish, which results in a system of differential equations:

$$
\frac{\partial f}{\partial y_k} - \frac{d}{dx}\frac{\partial f}{\partial y_k'} = 0 \quad \text{for all } k = 1 \ldots N\,. \tag{192}
$$

This is exactly the form that allows deriving the Lagrange equations from the Hamilton principle in (167), where $f$ becomes the Lagrange function $L(\{x_k, \dot x_k\}, t)$, and the independent variable $x$ is replaced with time $t$.

6.2 Generalized coordinates


The Hamilton principle states that a physical system takes the trajectory that makes the variation of the action integral vanish:

$$
\delta S = \delta \int_{t_1}^{t_2} L\, dt = \delta \int_{t_1}^{t_2} (T - U)\, dt = 0\,. \tag{193}
$$

Earlier, the scalar Lagrange function $L$ was expressed in a set of Cartesian coordinates $\{x_k\}$, a set of corresponding velocities $\{\dot x_k\}$, and possibly the time $t$ explicitly. Then, the Euler-Lagrange equations (192) provide equations of motion for the whole physical system:

$$
\frac{\partial L}{\partial x_k} - \frac{d}{dt}\frac{\partial L}{\partial \dot x_k} = 0 \quad \text{for all } k = 1 \ldots N\,. \tag{194}
$$

These equations of motion are equivalent to the ones we derived from Newton's laws, so up to this point there would be no advantage in introducing a new principle to derive the equations of motion of a system.
In the variational calculus argument, however, there was no reference to Cartesian coordinates being needed to write down the Lagrange function. The Euler-Lagrange equations (194) were obtained by simply replacing the $y_k$ in (192) by coordinates $x_k$ (and similarly their derivatives with respect to the independent parameter), and by switching to the independent variable $t$. Likewise, the Hamilton principle nowhere states that Cartesian coordinates have to be used for describing the system. Therefore, it is fine to use other coordinates to describe the system, and the Hamilton principle provides a way to find the equations of motion in non-Cartesian coordinate systems. This is an important advantage of the Hamilton principle in comparison to the approach based on Newton's laws.

6.2.1 Simple example: plane pendulum


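To demonstrate this, we consider the relatively simple problem of a pendulum, made up of a mass $m$ on a massless string of length $l$ in the gravitational acceleration $g$:

[Figure: plane pendulum suspended at O, with string length l, deflection angle θ, mass m at height h, gravitational acceleration g, axes x and y, and the forces Fg, Fc and Ft acting on the mass.]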

To obtain the equation of motion from Newton's laws, we first recognize that the motion of the mass is constrained to a circular trajectory. An adequate coordinate to express this is the angle $\theta$, with a fixed distance $l$ to the origin. The forces acting on the particle are the force $\mathbf F_g = m\mathbf g = -mg\,\mathbf e_y$ induced by the gravitational acceleration $\mathbf g$, and a constraining force $\mathbf F_c$ exerted by the string on the mass to keep it on a circular trajectory; the latter is aligned with the string,

$$
\mathbf F_c = -F_c\,\mathbf e_r = -F_c\,(\mathbf e_x \sin\theta - \mathbf e_y \cos\theta)\,. \tag{195}
$$


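For a circular motion, $r = l = \text{const.}$, and therefore $\dot r = 0$ and $\ddot r = 0$. According to (28), the projection $a_r = \ddot{\mathbf x} \cdot \mathbf e_r$ of the total acceleration $\ddot{\mathbf x}$ onto the radial direction is given by $a_r = \ddot r - r\dot\theta^2 = -l\dot\theta^2$. With the constraining force pointing in the negative radial direction, $\mathbf F_c = -F_c\,\mathbf e_r$, one can determine its magnitude $F_c$:

$$
\begin{aligned}
(\mathbf F_g + \mathbf F_c) \cdot \mathbf e_r &= (m\ddot{\mathbf x}) \cdot \mathbf e_r\\
-mg\,(\mathbf e_y \cdot \mathbf e_r) - F_c &= m\, a_r\\
-mg\,(-\cos\theta) - F_c &= -m l \dot\theta^2\\
F_c &= m\,(g\cos\theta + l\dot\theta^2)\,.
\end{aligned} \tag{196}
$$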

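With knowledge of $F_c$, we can finally evaluate the total force and use Newton's law to make the connection to the acceleration $\ddot{\mathbf x}$, again using (28):

$$
\begin{aligned}
m\ddot{\mathbf x} &= \mathbf F_g + \mathbf F_c\\
m\,(-l\dot\theta^2\,\mathbf e_r + l\ddot\theta\,\mathbf e_\theta) &= -mg\,\mathbf e_y - m\,(g\cos\theta + l\dot\theta^2)\,\mathbf e_r
\end{aligned} \tag{197}
$$

The terms with $l\dot\theta^2$ cancel. Taking the scalar product of the last equation with $\mathbf e_\theta = \mathbf e_x\cos\theta + \mathbf e_y\sin\theta$, dividing by $m$, and using $\mathbf e_r \cdot \mathbf e_\theta = 0$, we get

$$
l\ddot\theta = -g\,(\mathbf e_y \cdot \mathbf e_\theta) = -g\sin\theta\,, \tag{198}
$$

or finally

$$
\ddot\theta + \frac{g}{l}\sin\theta = 0\,. \tag{199}
$$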
Steps (195) and (196) may seem unnecessarily complicated, because the constraining force was calculated explicitly, but this is the procedure according to Newton's laws, taking care of all forces acting on the mass. One could have cut some corners by looking only at the projection of all forces onto $\mathbf e_\theta$, noticing that the only force contributing there is $\mathbf F_g$, while the constraining force is orthogonal to $\mathbf e_\theta$ and would not need to be evaluated explicitly.
Now, we compare this strategy for obtaining the equation of motion with the one provided by the Lagrange formalism. We start by writing down the kinetic energy,

$$
T = \frac{1}{2} m v^2 = \frac{1}{2} m l^2 \dot\theta^2\,, \tag{200}
$$

and the potential energy of a mass in the gravitational acceleration,

$$
U = mgh - U_0 = mgl\,(1 - \cos\theta) - U_0 = -mgl\cos\theta\,, \tag{201}
$$

where we subtracted the constant offset $U_0 = mgl$, because it does not affect the dynamics of the system. The Lagrange function can then be written as

$$
L = T - U = \frac{1}{2} m l^2 \dot\theta^2 + mgl\cos\theta\,, \tag{202}
$$


which is now only a function of the dynamic variables $\theta$ and $\dot\theta$. Since Hamilton's principle makes no statement about which coordinates must be used, as long as one can define a meaningful potential and kinetic energy, one can simply use (194) for the single variable $\theta$ and get the Euler-Lagrange equation of motion:

$$
\frac{\partial L(\theta, \dot\theta)}{\partial \theta} - \frac{d}{dt}\frac{\partial L(\theta, \dot\theta)}{\partial \dot\theta} = 0\,. \tag{203}
$$

By carrying out the differentiations of $L$, one finds

$$
-mgl\sin\theta - \frac{d}{dt}\left(ml^2\dot\theta\right) = -mgl\sin\theta - ml^2\ddot\theta = 0\,. \tag{204}
$$

Dividing the last part by $-ml^2$, we reproduce the equation of motion (199):

$$
\ddot\theta + \frac{g}{l}\sin\theta = 0\,. \tag{205}
$$

In this derivation, there was no need to work out any forces. A suitable choice of coordinate, $\theta$ in this case, implicitly took care of the constraint that the mass has to move on a circle.
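Beyond small amplitudes, equation (199) (equivalently (205)) has no elementary solution and is typically integrated numerically. A minimal sketch with a classical fourth-order Runge-Kutta step (our choice of integrator; g, l, the initial angles and the step size are example values) checks that $(T + U)/(ml^2)$, with $T$ and $U$ from (200) and (201), stays constant, and compares the period with the small-angle value $T_0 = 2\pi\sqrt{l/g}$:

```python
import math

g, l = 9.81, 1.0                 # gravitational acceleration and string length (example values)


def rhs(state):
    """Eq. (199) as a first-order system: (theta, omega) -> (theta', omega')."""
    theta, omega = state
    return omega, -(g / l) * math.sin(theta)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, f):
        return state[0] + f * k[0], state[1] + f * k[1]
    k1 = rhs(state)
    k2 = rhs(shifted(k1, dt / 2))
    k3 = rhs(shifted(k2, dt / 2))
    k4 = rhs(shifted(k3, dt))
    return (state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def energy(state):
    """(T + U) / (m l^2) with T from (200) and U from (201)."""
    theta, omega = state
    return 0.5 * omega**2 - (g / l) * math.cos(theta)


def period(theta0, dt=1e-4):
    """Release from rest at theta0; the opposite turning point is reached after half a period."""
    state, t = (theta0, 0.0), 0.0
    while True:
        new = rk4_step(state, dt)
        t += dt
        if state[1] < 0 <= new[1]:                  # omega returns to zero from below
            frac = -state[1] / (new[1] - state[1])  # linear interpolation within the step
            return 2 * (t - dt + frac * dt)
        state = new


state, e0, max_dev = (1.0, 0.0), energy((1.0, 0.0)), 0.0
for _ in range(20000):
    state = rk4_step(state, 1e-4)
    max_dev = max(max_dev, abs(energy(state) - e0))

T0 = 2 * math.pi * math.sqrt(l / g)        # small-angle period
print(f"max energy deviation: {max_dev:.1e}")
print(f"T(0.05 rad) / T0 = {period(0.05) / T0:.5f}, T(1 rad) / T0 = {period(1.0) / T0:.5f}")
```

For a small release angle the small-angle period is reproduced closely, while for 1 rad the period is noticeably longer (by several percent), a direct consequence of the $\sin\theta$ nonlinearity in (199).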

6.2.2 Generalized coordinates and velocities


The treatment of the pendulum above is only one example of describing a system by generalized coordinates, which are usually referred to as a set $\{q_k\}$. Together with the set of their time derivatives $\{\dot q_k\}$, referred to as generalized velocities, they give a complete description of the state of a physical system at any point in time. Again, the system can be made up of a number of particles. Then, Hamilton's principle leads directly to a set of equations of motion for the whole system:

$$
\frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_k} = 0 \quad \text{for all } k = 1 \ldots N\,. \tag{206}
$$
A few remarks on generalized coordinates seem to be in order:

• Generalized coordinates do not need to have the dimension of a length, unlike Cartesian coordinates, which specify the distance of a mass point from a reference point along each axis.

• Likewise, generalized velocities do not need to have the dimension of a length per time, the dimension of the traditional velocity.

• A generalized coordinate can be a function of several coordinates of one particle, but it can also combine coordinates of different particles.

• The transformation between generalized and Cartesian coordinates may explicitly contain the time.

• Generalized coordinates may be used to take care of constraints of a system, like in the pendulum example. A careful analysis of the degrees of freedom of a system can eliminate the explicit need to consider constraints.
The only thing that is really required for generalized coordinates to be useful for setting up the equations of motion of a physical system is that all the degrees of freedom of the system are properly captured, and that it is possible to formulate a Lagrange function that depends only on the $\{q_k\}$, the $\{\dot q_k\}$, and possibly the time $t$.

6.2.3 Point transformations between traditional and generalized coordinates

One can make the transition between the traditional Cartesian coordinates $\{x_k\}$ and the generalized coordinates $\{q_k\}$ of a system slightly more explicit by introducing a specific transformation between the two sets, and then showing that the Euler-Lagrange equations in the generalized coordinates follow from the Euler-Lagrange equations in the original coordinates. This step is not necessary to see that (206) can describe the complete dynamics of a system, but it illustrates what transformations can be used to construct generalized coordinates.
A set of generalized coordinates $\{q_k\}$ has to capture all the degrees of freedom of the system. Then, there must exist a transformation between the old and new sets of coordinates, so that the old coordinates may be expressed as functions of the new coordinates and possibly the time $t$:

$$
x_k = x_k(\{q_l\}, t)\,. \tag{207}
$$

This notation means that $x_k$ is an explicit function of all the $q_l$ and the time $t$. Then, the time derivative of $x_k$ can be calculated:

$$
\dot x_k = \frac{\partial x_k}{\partial t} + \sum_l \frac{\partial x_k}{\partial q_l}\frac{dq_l}{dt} = \frac{\partial x_k}{\partial t} + \sum_l \frac{\partial x_k}{\partial q_l}\,\dot q_l\,. \tag{208}
$$

This makes $\dot x_k$ an explicit function of the set $\{q_l\}$ (via the derivatives of $x_k$ with respect to $q_l$), the set $\{\dot q_l\}$, and the time. The Lagrange function in the old coordinates, $L$, and the one in the new coordinates, $L'$, should have the same value, but the functions $L$ and $L'$ of course take different forms in the respective coordinate sets; this is why a different symbol, $L'$, is used:

$$
L(\{x_k\}, \{\dot x_k\}, t) = L\big(\{x_k(\{q_l\}, t)\}, \{\dot x_k(\{q_l\}, \{\dot q_l\}, t)\}, t\big) = L'(\{q_l\}, \{\dot q_l\}, t)\,. \tag{209}
$$

We now evaluate the derivatives needed for the Euler-Lagrange equations in the new coordinates:

$$
\frac{\partial L'}{\partial q_l} = \sum_k \left(\frac{\partial L}{\partial x_k}\frac{\partial x_k}{\partial q_l} + \frac{\partial L}{\partial \dot x_k}\frac{\partial \dot x_k}{\partial q_l}\right) \tag{210}
$$


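and

$$
\frac{\partial L'}{\partial \dot q_l} = \sum_k \Big(\frac{\partial L}{\partial x_k}\underbrace{\frac{\partial x_k}{\partial \dot q_l}}_{=0} + \frac{\partial L}{\partial \dot x_k}\underbrace{\frac{\partial \dot x_k}{\partial \dot q_l}}_{=\partial x_k/\partial q_l}\Big) = \sum_k \frac{\partial L}{\partial \dot x_k}\frac{\partial x_k}{\partial q_l}\,. \tag{211}
$$

The first term vanishes because in (207), $x_k$ does not explicitly depend on the velocity $\dot q_l$. The result for the second derivative can be seen from (208), because the only term in $\dot x_k$ that depends on $\dot q_l$ is the one with $\partial x_k/\partial q_l$. (Do not get confused by the same index $l$: in (208), it is a summation index and can be replaced with, say, $m$. Then, $\partial \dot q_m/\partial \dot q_l = \delta_{lm}$, and only the term $m = l$ of the sum survives.)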
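Then, the "new" Euler-Lagrange equation can simply be calculated:

$$
\begin{aligned}
\frac{\partial L'}{\partial q_l} - \frac{d}{dt}\frac{\partial L'}{\partial \dot q_l} &= \sum_k \left(\frac{\partial L}{\partial x_k}\frac{\partial x_k}{\partial q_l} + \frac{\partial L}{\partial \dot x_k}\frac{\partial \dot x_k}{\partial q_l}\right) - \frac{d}{dt}\sum_k \frac{\partial L}{\partial \dot x_k}\frac{\partial x_k}{\partial q_l}\\
&= \sum_k \left[\frac{\partial x_k}{\partial q_l}\underbrace{\left(\frac{\partial L}{\partial x_k} - \frac{d}{dt}\frac{\partial L}{\partial \dot x_k}\right)}_{=0} + \frac{\partial L}{\partial \dot x_k}\left(\frac{\partial \dot x_k}{\partial q_l} - \frac{d}{dt}\frac{\partial x_k}{\partial q_l}\right)\right]
\end{aligned} \tag{212}
$$

The first parenthesis vanishes because of the Euler-Lagrange equations in the original coordinates $\{x_k\}$. For the second term, one explicitly calculates

$$
\begin{aligned}
\frac{d}{dt}\frac{\partial x_k}{\partial q_l} &= \frac{\partial^2 x_k}{\partial q_l\, \partial t} + \sum_m \frac{\partial^2 x_k}{\partial q_l\, \partial q_m}\,\dot q_m\\
&= \frac{\partial}{\partial q_l}\left(\frac{\partial x_k}{\partial t} + \sum_m \frac{\partial x_k}{\partial q_m}\,\dot q_m\right) = \frac{\partial}{\partial q_l}\frac{dx_k}{dt} = \frac{\partial \dot x_k}{\partial q_l}\,.
\end{aligned} \tag{213}
$$

With this, the second term on the right side of (212) also vanishes, and the Euler-Lagrange equations in the new coordinates $\{q_l\}$ and the corresponding velocities are recovered.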

6.2.4 Generalized momenta and cyclic coordinates


Newton's laws make explicit use of the momentum associated with a Cartesian coordinate $x_k$, defined in (150) as $p_k = m_k \dot x_k$. To arrive at a similarly useful definition for generalized coordinates, we first try to extract the linear momentum $p_k$ from the Lagrange function. We note that the required velocity $\dot x_k$ appears in the expression (149) of the kinetic energy,

$$
T = \sum_k \frac{1}{2} m_k \dot x_k^2\,. \tag{214}
$$

The momentum $p_k$ can be extracted from $T$ by differentiating with respect to $\dot x_k$:

$$
\frac{\partial T}{\partial \dot x_k} = m_k \dot x_k = p_k\,. \tag{215}
$$


Since the Lagrange function is the difference between $T$ and a potential $U$ that should not depend on the velocities, one can obtain the momentum directly by differentiating the Lagrange function with respect to $\dot x_k$. This can be used to define a generalized momentum when the Lagrange function is expressed in generalized coordinates $\{q_l\}$ and generalized velocities $\{\dot q_l\}$:

$$
p_k := \frac{\partial L}{\partial \dot q_k}\,. \tag{216}
$$

The difference between the generalized momentum and the traditional momentum can be seen, e.g., with the Lagrange function (166) of a charged particle in an electromagnetic field:

$$
L = \sum_{i=1}^{3} \frac{1}{2} m \dot x_i^2 - q\,\Phi(\mathbf x) + q \sum_{i=1}^{3} \dot x_i A_i(\mathbf x)\,. \tag{217}
$$

The generalized momentum corresponding to the coordinate $x_i$ is

$$
p_i = \frac{\partial L}{\partial \dot x_i} = m\dot x_i + q A_i(\mathbf x)\,, \tag{218}
$$

which contains not only the kinetic momentum $m\dot x_i$ of the particle's motion, but also a contribution $qA_i$ from the vector potential.
The generalized momentum can also be used to rewrite the Euler-Lagrange equation:

$$
\frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_k} = \frac{\partial L}{\partial q_k} - \frac{d}{dt}p_k = 0 \quad \text{or} \quad \dot p_k = \frac{\partial L}{\partial q_k}\,. \tag{219}
$$

This has an important consequence if the Lagrange function does not depend on a particular generalized coordinate $q_k$. Such coordinates are called cyclic. If $q_k$ is a cyclic coordinate, $\partial L/\partial q_k = 0$, and

$$
\dot p_k = 0 \quad \text{or} \quad p_k = \text{const.} \tag{220}
$$

So if $L$ does not depend on a coordinate $q_k$, and is therefore symmetric with respect to a translation in $q_k$, the corresponding generalized momentum does not change in time, i.e., it is conserved.
We will expand on this important connection between symmetries and conserved quantities, which holds for any generalized coordinate and its canonically conjugate momentum, because it helps to exploit symmetries in problems.
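A standard illustration of a cyclic coordinate (our own example, not worked out in the notes): for a particle in a plane with a central potential $U(r)$, the Lagrangian in polar coordinates is $L = \frac{1}{2} m (\dot r^2 + r^2\dot\varphi^2) - U(r)$. The angle $\varphi$ is cyclic, so $p_\varphi = \partial L/\partial\dot\varphi = m r^2 \dot\varphi$ should be conserved by (220), whereas $p_r = m\dot r$ is not, since $\partial L/\partial r \neq 0$. The sketch integrates Newton's equations in Cartesian coordinates for a hypothetical potential $U(r) = -k/r$ and evaluates both generalized momenta along the trajectory.

```python
import math

m, k = 1.0, 1.0      # mass and strength of a hypothetical potential U(r) = -k / r


def force(x, y):
    """Cartesian components of -grad U for U(r) = -k / r (attractive, central)."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / r3, -k * y / r3


def generalized_momenta(x, y, vx, vy):
    """p_r = m rdot and p_phi = m r^2 phidot, expressed through Cartesian quantities."""
    r = math.hypot(x, y)
    return m * (x * vx + y * vy) / r, m * (x * vy - y * vx)


x, y, vx, vy = 1.0, 0.0, 0.0, 1.2    # start at pericenter of a bound, eccentric orbit
dt = 1e-3
fx, fy = force(x, y)
p_r, p_phi = [], []
for _ in range(10000):                # velocity Verlet steps for Newton's equations
    vx += 0.5 * dt * fx / m
    vy += 0.5 * dt * fy / m
    x += dt * vx
    y += dt * vy
    fx, fy = force(x, y)
    vx += 0.5 * dt * fx / m
    vy += 0.5 * dt * fy / m
    pr, pphi = generalized_momenta(x, y, vx, vy)
    p_r.append(pr)
    p_phi.append(pphi)

print(f"p_phi varies by {max(p_phi) - min(p_phi):.1e}")
print(f"p_r ranges from {min(p_r):.3f} to {max(p_r):.3f}")
```

For a central force, each velocity Verlet sub-step happens to preserve $x v_y - y v_x$ exactly (the velocity kicks are parallel to the position vector), so the spread of $p_\varphi$ is at the level of rounding errors; a generic integrator would show a small drift instead. The radial momentum, in contrast, changes sign along the orbit.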

6.3 Noether’s theorem: Symmetries and conservation laws


The conservation of the canonically conjugate momentum of a cyclic generalized coordinate deserves slightly more attention, because the idea can be extended to other symmetries. The mathematically rigorous version of this idea was formulated by Emmy Noether⁷, and the connection between symmetries and conserved quantities is known as Noether's theorem. In a simple form of this theorem, symmetries are expressed as an invariance of the Lagrange function under small symmetry transformations. Here, we consider three different symmetry examples: translation symmetry in space, rotation symmetry, and translation symmetry in time. This is, however, a very general principle that reaches far beyond classical mechanics, and it is heavily used, e.g., in elementary particle physics.

6.3.1 Translation symmetry and momentum conservation


If a Lagrangian does not explicitly depend on a particular coordinate $q_k$, i.e., $q_k$ is cyclic, $L$ is symmetric with respect to a translation in this coordinate; this led to the conservation (220) of the conjugate momentum.
We now use a slightly different way of expressing this symmetry, which will make the treatment of rotational symmetry easier later. First, we consider a transformation

$$
\mathbf q \to \mathbf q' = \mathbf q + \delta\mathbf q\,, \tag{221}
$$

where $\mathbf q$ is a vector of (generalized) coordinates, and $\delta\mathbf q$ a small displacement vector in a particular direction. We can decompose this displacement with the help of a set of unit vectors $\{\mathbf e_l\}$:

$$
\delta\mathbf q = \sum_l \delta q_l\, \mathbf e_l\,, \tag{222}
$$

with small time-independent displacements $\delta q_l$. Then, the change $\delta L$ of the Lagrange function under such a transformation is given by

$$
\delta L = \sum_l \left(\frac{\partial L}{\partial q_l}\,\delta q_l + \frac{\partial L}{\partial \dot q_l}\,\delta \dot q_l\right). \tag{223}
$$

The last term vanishes, as $\delta \dot q_l = d(\delta q_l)/dt$ and the small displacements $\delta q_l$ do not depend on time. A symmetry with respect to a small translation means that the Lagrange function does not change under this transformation, i.e., $\delta L = 0$. Thus, using (219),

$$
\delta L = \sum_l \frac{\partial L}{\partial q_l}\,\delta q_l = \sum_l \dot p_l\, \delta q_l = \dot{\mathbf p} \cdot \delta\mathbf q = 0\,. \tag{224}
$$

We can interpret this result in the following way: if the system (and therefore the Lagrangian) has a translational symmetry in the direction $\delta\mathbf q$, the projection of the generalized momentum $\mathbf p$ onto this direction does not change in time, i.e., it is conserved.
⁷ A. Emmy Noether, 1882-1935
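For Cartesian coordinates of two particles on a line, translating both particles by the same small displacement is the transformation (221) with $\delta\mathbf q \propto (1, 1)$, so (224) concerns the total momentum $p_1 + p_2$. A minimal sketch under our own assumptions (a hypothetical pair potential $U_{12} = (x_1 - x_2)^4/4$ that depends only on the separation, arbitrary masses and initial conditions, velocity Verlet integration) tracks $p_1 + p_2$, once for this translation-invariant Lagrangian and once with added external potentials $\frac{1}{2}K x_\alpha^2$ that break the symmetry:

```python
def simulate(external_k, steps=5000, dt=1e-3):
    """Two particles on a line; returns the total momentum m1 v1 + m2 v2 after each step.

    Internal interaction: hypothetical pair potential U12 = (x1 - x2)**4 / 4, which only
    depends on the separation. external_k > 0 adds external potentials external_k * x**2 / 2,
    which break the translation symmetry.
    """
    m1, m2 = 1.0, 3.0
    x1, x2, v1, v2 = 0.0, 1.5, 0.4, -0.1

    def forces(x1, x2):
        f12 = -(x1 - x2) ** 3                      # force on particle 1 from particle 2
        return f12 - external_k * x1, -f12 - external_k * x2

    f1, f2 = forces(x1, x2)
    totals = []
    for _ in range(steps):                         # velocity Verlet
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        x1 += dt * v1
        x2 += dt * v2
        f1, f2 = forces(x1, x2)
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        totals.append(m1 * v1 + m2 * v2)
    return totals


symmetric = simulate(external_k=0.0)
broken = simulate(external_k=1.0)
print(f"spread of p1 + p2, symmetric L: {max(symmetric) - min(symmetric):.1e}")
print(f"spread of p1 + p2, with external potential: {max(broken) - min(broken):.3f}")
```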


6.3.2 Rotational symmetry and angular momentum conservation


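We recall from section 1.5 that infinitesimal rotations can be described by a vector $\delta\boldsymbol\theta$, which according to (44) causes a displacement of a point at position $\mathbf r$ (referenced to a point on the rotation axis) by

$$
\delta\mathbf r = \delta\boldsymbol\theta \times \mathbf r\,, \quad \text{and also} \quad \delta\dot{\mathbf r} = \delta\boldsymbol\theta \times \dot{\mathbf r}\,, \tag{225}
$$

because the vector $\delta\boldsymbol\theta$ is assumed not to depend on time. Under this position transformation

$$
\mathbf r \to \mathbf r' = \mathbf r + \delta\mathbf r\,, \tag{226}
$$

the change of the Lagrange function is given by

$$
\begin{aligned}
\delta L &= \sum_{l=1}^{3} \Big(\underbrace{\frac{\partial L}{\partial x_l}}_{=\dot p_l}\,\delta x_l + \underbrace{\frac{\partial L}{\partial \dot x_l}}_{=p_l}\,\delta \dot x_l\Big)
= \sum_{l=1}^{3} \left(\dot p_l\, \delta x_l + p_l\, \delta \dot x_l\right)\\
&= \dot{\mathbf p} \cdot \delta\mathbf r + \mathbf p \cdot \delta\dot{\mathbf r} = \dot{\mathbf p} \cdot (\delta\boldsymbol\theta \times \mathbf r) + \mathbf p \cdot (\delta\boldsymbol\theta \times \dot{\mathbf r})\,.
\end{aligned} \tag{227}
$$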

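With the cyclic permutation symmetry of the triple product⁸, $\mathbf a \cdot (\mathbf b \times \mathbf c) = \mathbf b \cdot (\mathbf c \times \mathbf a)$,

$$
\begin{aligned}
\delta L &= \delta\boldsymbol\theta \cdot (\mathbf r \times \dot{\mathbf p}) + \delta\boldsymbol\theta \cdot (\dot{\mathbf r} \times \mathbf p)
= \delta\boldsymbol\theta \cdot (\mathbf r \times \dot{\mathbf p} + \dot{\mathbf r} \times \mathbf p)\\
&= \delta\boldsymbol\theta \cdot \frac{d}{dt}(\mathbf r \times \mathbf p) = \delta\boldsymbol\theta \cdot \dot{\mathbf L}\,,
\end{aligned} \tag{228}
$$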

where $\mathbf L$ is the angular momentum vector introduced in (73). If the physical system is symmetric with respect to a rotation defined by $\delta\boldsymbol\theta$, the Lagrangian does not change, i.e., $\delta L = 0$. Then,

$$
\delta\boldsymbol\theta \cdot \dot{\mathbf L} = 0\,, \tag{229}
$$

which means that the projection of the angular momentum vector $\mathbf L$ onto the axis of rotational symmetry, defined by the direction of $\delta\boldsymbol\theta$, is constant in time, i.e., it is conserved. An example is a spherical pendulum, where a mass $m$ suspended by a string of length $l$ in gravity can swing freely in any direction. We leave it to the reader to work out the Lagrange function for that case, but it should be clear that neither the potential nor the kinetic energy depends on the azimuthal angle (the angle of rotation about the vertical axis through the suspension point); thus, the component of the angular momentum along the vertical direction is conserved.
For a spherically symmetric problem, the Lagrangian is invariant under rotations about any axis through the center; thus, all components of the angular momentum are conserved.

⁸ The product $\mathbf a \cdot (\mathbf b \times \mathbf c)$ gives the (signed) volume of the parallelepiped spanned by the three vectors $\mathbf a$, $\mathbf b$, and $\mathbf c$.

6.3.3 Invariance against translation in time and energy conservation


We now consider a different symmetry, where the physical system is invariant by
a translation in time:
t → t′ = t + δt . (230)
Then, the change of the Lagrangian under small changes δt must vanish, or
∂L
= 0, (231)
∂t
i.e., the Lagrange function is not explicitly time dependent. Under this condition,
we can calculate the total differential of the Lagrange function with respect to
time (because the system can still evolve, causing L to change over time):
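$$\begin{aligned}
\frac{d}{dt}L &= \sum_k\left(\frac{\partial L}{\partial q_k}\,\dot q_k + \frac{\partial L}{\partial \dot q_k}\,\ddot q_k\right)\\
&= \sum_k\left[\left(\frac{d}{dt}\frac{\partial L}{\partial \dot q_k}\right)\dot q_k + \frac{\partial L}{\partial \dot q_k}\,\ddot q_k\right]\\
&= \sum_k\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_k}\,\dot q_k\right)
\end{aligned}\tag{232}$$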

From the first line to the second, the Lagrange equation (153) allows replacing $\partial L/\partial q_k$, and from the second to the third line, the product rule of differentiation was applied. The time derivative in the last line can be taken out of the sum, and the whole equation (232) can be written as
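$$\frac{d}{dt}\left(\sum_k\frac{\partial L}{\partial \dot q_k}\,\dot q_k - L\right) = 0 \qquad\text{or}\qquad \frac{d}{dt}H = 0 \tag{233}$$

The quantity in the parentheses above is defined as the Hamilton function

$$H := \sum_k\frac{\partial L}{\partial \dot q_k}\,\dot q_k - L \tag{234}$$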

of the problem; it is constant in time, or conserved, if the physical system, and therefore the Lagrangian, is symmetric with respect to a translation in time. In many cases, the Hamilton function is the same as the total energy. In those cases, the invariance of L under translations in time means that the total energy of the system is conserved, as we have seen in sections 3.3 and 4.5. Equation (233) thus expresses energy conservation for a physical system described in generalized coordinates, under the same assumption of a time-independent U as earlier. To identify H with the total energy E, however, an additional condition on how the generalized coordinates are connected with the Cartesian coordinates must be taken into account.
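The definition (234) can be tried out numerically. The sketch below assumes a plane pendulum with coordinate q = θ and illustrative values m = l = 1, g = 9.81. It evaluates $\partial L/\partial\dot q$ with a central finite difference, integrates the Euler-Lagrange equation with a Runge-Kutta step, and checks that H stays constant, as expected for a Lagrangian without explicit time dependence. The helper names are hypothetical.

```python
import math

M, ELL, G = 1.0, 1.0, 9.81  # assumed pendulum parameters

def lagrangian(q, qdot):
    """Plane pendulum, L = T - U with U = -m g l cos(q); no explicit t-dependence."""
    return 0.5 * M * ELL**2 * qdot**2 + M * G * ELL * math.cos(q)

def hamilton_function(q, qdot, h=1e-6):
    """H = (dL/dqdot) qdot - L as in (234); dL/dqdot by a central difference."""
    p = (lagrangian(q, qdot + h) - lagrangian(q, qdot - h)) / (2 * h)
    return p * qdot - lagrangian(q, qdot)

def accel(q):
    """Euler-Lagrange equation of the pendulum, qddot = -(g/l) sin(q)."""
    return -(G / ELL) * math.sin(q)

def rk4_step(q, v, dt):
    """Classical Runge-Kutta step for the pair (q, qdot)."""
    k1q, k1v = v, accel(q)
    k2q, k2v = v + dt / 2 * k1v, accel(q + dt / 2 * k1q)
    k3q, k3v = v + dt / 2 * k2v, accel(q + dt / 2 * k2q)
    k4q, k4v = v + dt * k3v, accel(q + dt * k3q)
    return (q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q),
            v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

q, v = 2.0, 0.0                 # released at rest from q = 2 rad (assumed)
H_start = hamilton_function(q, v)
for _ in range(10000):          # 10 s with dt = 1 ms
    q, v = rk4_step(q, v, 1e-3)
H_end = hamilton_function(q, v)
print(f"H(0) = {H_start:.9f}, H(10 s) = {H_end:.9f}")
```

Because this Lagrangian uses a Cartesian-like coordinate transformation without time dependence and a velocity-independent U, the computed H also coincides with T + U, anticipating the discussion in section 6.4.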

6.4 Hamilton function and total energy


To interpret the Hamilton function defined in (234), we make a short detour to examine the total kinetic energy T of the system. In Cartesian coordinates, the kinetic energy can be written as
$$T = \sum_k\frac12 m_k\dot x_k^2 . \tag{235}$$
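Via the point transformations (207) between $\{x_k\}$ and $\{q_l\}$, and using (208), the kinetic energy can be expressed via the generalized velocities $\dot q_l$:
$$\begin{aligned}
T &= \sum_k\frac12 m_k\left(\sum_l\frac{\partial x_k}{\partial q_l}\dot q_l + \frac{\partial x_k}{\partial t}\right)\left(\sum_{l'}\frac{\partial x_k}{\partial q_{l'}}\dot q_{l'} + \frac{\partial x_k}{\partial t}\right)\\
&= \sum_{l,l'}\left[\sum_k\frac12 m_k\frac{\partial x_k}{\partial q_l}\frac{\partial x_k}{\partial q_{l'}}\right]\dot q_l\dot q_{l'} + \sum_l\left[2\sum_k\frac12 m_k\frac{\partial x_k}{\partial q_l}\frac{\partial x_k}{\partial t}\right]\dot q_l + \left[\sum_k\frac12 m_k\left(\frac{\partial x_k}{\partial t}\right)^2\right]\\
&= \sum_{l,l'}a_{l,l'}\,\dot q_l\dot q_{l'} + \sum_l b_l\,\dot q_l + c ,
\end{aligned}\tag{236}$$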

where $a_{l,l'}$, $b_l$, and $c$ summarize the corresponding brackets in the line before. For time-independent point transformations from $\{x_k\}$ to generalized coordinates $\{q_l\}$,
$$\frac{\partial x_k}{\partial t} = 0 , \tag{237}$$
so $b_l = c = 0$ in the kinetic energy expression (236), which then simplifies to
$$T = \sum_{l,l'}a_{l,l'}\,\dot q_l\dot q_{l'} . \tag{238}$$

Then, the partial derivatives of T with respect to the velocities $\dot q_k$ are
$$\frac{\partial T}{\partial \dot q_k} = \sum_l a_{l,k}\,\dot q_l + \sum_{l'}a_{k,l'}\,\dot q_{l'} = \sum_l\left(a_{l,k} + a_{k,l}\right)\dot q_l . \tag{239}$$

In the first step, two terms appear because $\dot q_k$ occurs in both velocity factors of the double sum in (238). In the second step, renaming the summation index $l' \to l$ allows combining everything into one sum. With this, one can evaluate the following sum:
$$\begin{aligned}
\sum_k\frac{\partial T}{\partial \dot q_k}\,\dot q_k &= \sum_{k,l}\left(a_{l,k} + a_{k,l}\right)\dot q_k\dot q_l\\
&= 2\sum_{k,l}a_{l,k}\,\dot q_k\dot q_l = 2T .
\end{aligned}\tag{240}$$


The step from the first to the second line simply involved a summation index
exchange k ↔ l for one of the sums.
The relation (240) is a special case of a property of so-called homogeneous functions. A multi-variable function $f(\{q_k\})$ is called homogeneous of degree p if
$$f(\{\lambda q_k\}) = \lambda^p f(\{q_k\}) . \tag{241}$$
This means that if all arguments of f are multiplied by a constant λ, the value of f is the value of the original function multiplied by $\lambda^p$. For example, the function $f(x, y) = x^2 + y^2$ is homogeneous of degree p = 2 in x, y. For homogeneous functions, Euler⁹ showed that
$$\sum_k q_k\frac{\partial f}{\partial q_k} = p f , \tag{242}$$
which is referred to as Euler's theorem on homogeneous functions. The kinetic energy (238) is homogeneous of degree 2 in the velocities $\dot q_k$, so (242) with p = 2 reproduces (240).
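Euler's theorem (242) is easy to test numerically. The sketch below assumes an arbitrary coefficient matrix $a_{l,l'}$ for the quadratic form (238) (deliberately not symmetric) and a cubic test function, and compares $\sum_k q_k\,\partial f/\partial q_k$, with the derivatives taken by central differences, against $p f$. The helper names and numbers are hypothetical.

```python
def quadratic_form(v, a):
    """T = sum_{l,l'} a[l][l'] v_l v_l', homogeneous of degree 2, cf. (238)."""
    n = len(v)
    return sum(a[l][m] * v[l] * v[m] for l in range(n) for m in range(n))

def cubic(v):
    """f(x, y) = x^3 + x y^2, homogeneous of degree 3."""
    x, y = v
    return x**3 + x * y**2

def euler_sum(f, v, h=1e-5):
    """Left-hand side of (242): sum_k v_k df/dv_k via central differences."""
    total = 0.0
    for k in range(len(v)):
        up, down = list(v), list(v)
        up[k] += h
        down[k] -= h
        total += v[k] * (f(up) - f(down)) / (2 * h)
    return total

a = [[1.0, 0.3, 0.0],
     [0.1, 2.0, 0.5],
     [0.0, 0.2, 0.7]]          # assumed coefficients
qdot = [0.4, -1.2, 2.5]

lhs_T = euler_sum(lambda v: quadratic_form(v, a), qdot)
lhs_f = euler_sum(cubic, [1.5, -0.5])
print(lhs_T, 2 * quadratic_form(qdot, a))   # degree p = 2
print(lhs_f, 3 * cubic([1.5, -0.5]))        # degree p = 3
```

A non-symmetric matrix is used on purpose: as in (239) and (240), only the symmetric part $a_{l,k} + a_{k,l}$ enters, and the result is still $2T$.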


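After this small detour, we come back to the definition of the Hamilton function (234). If we assume that the potential U in L = T − U is independent of the velocities $\dot q_k$, then
$$\frac{\partial L}{\partial \dot q_k} = \frac{\partial T}{\partial \dot q_k} , \tag{243}$$
and we can write
$$\begin{aligned}
H &= \sum_k\frac{\partial L}{\partial \dot q_k}\,\dot q_k - L\\
&= \sum_k\frac{\partial T}{\partial \dot q_k}\,\dot q_k - L = 2T - L = 2T - (T - U) = T + U .
\end{aligned}\tag{244}$$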

So the Hamilton function becomes the total energy (i.e., the sum of the total kinetic and potential energy) under the assumptions we made when interpreting H. Explicitly, these assumptions were

(a) the transformations from Cartesian to generalized coordinates $\{q_k\}$ used as parameters for L and H are time-independent, and

(b) the potential energy U is independent of the velocities $\dot q_k$.

An example where H ≠ E can be found with rotating coordinate systems, since they make the coordinate transformation to an inertial reference frame time-dependent. Still, the total energy may be conserved.
⁹ Leonhard Euler, 1707-1783
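As a hedged illustration of H ≠ E, consider an assumed example (a standard textbook case, not one worked out in these notes): a bead of mass m sliding without friction on a vertical hoop of radius R that is driven to rotate about its vertical diameter with fixed angular velocity Ω. With θ measured from the bottom of the hoop, the Cartesian position $R(\sin\theta\cos\Omega t, \sin\theta\sin\Omega t, -\cos\theta)$ depends explicitly on time, so assumption (a) is violated. The Lagrangian $L = \frac12 mR^2(\dot\theta^2 + \Omega^2\sin^2\theta) + mgR\cos\theta$ has no explicit time dependence, so $H = \frac12 mR^2\dot\theta^2 - \frac12 mR^2\Omega^2\sin^2\theta - mgR\cos\theta$ should be conserved, while E = T + U is not, because the drive does work on the bead. All parameter values are illustrative.

```python
import math

M, R, G, OMEGA = 1.0, 0.5, 9.81, 6.0   # assumed bead/hoop parameters

def accel(th):
    """Euler-Lagrange equation: thddot = sin(th) (Omega^2 cos(th) - g/R)."""
    return math.sin(th) * (OMEGA**2 * math.cos(th) - G / R)

def hamilton_function(th, dth):
    """H = p dth - L for L = m R^2 (dth^2 + Omega^2 sin^2 th)/2 + m g R cos th."""
    return (0.5 * M * R**2 * dth**2 - 0.5 * M * R**2 * OMEGA**2 * math.sin(th)**2
            - M * G * R * math.cos(th))

def total_energy(th, dth):
    """E = T + U, including the kinetic energy of the imposed rotation."""
    return (0.5 * M * R**2 * (dth**2 + OMEGA**2 * math.sin(th)**2)
            - M * G * R * math.cos(th))

def rk4_step(th, v, dt):
    """Classical Runge-Kutta step for (theta, thetadot)."""
    k1x, k1v = v, accel(th)
    k2x, k2v = v + dt / 2 * k1v, accel(th + dt / 2 * k1x)
    k3x, k3v = v + dt / 2 * k2v, accel(th + dt / 2 * k2x)
    k4x, k4v = v + dt * k3v, accel(th + dt * k3x)
    return (th + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

th, v = 0.3, 0.0                   # assumed initial condition
H_vals, E_vals = [], []
for _ in range(5000):              # 5 s with dt = 1 ms
    H_vals.append(hamilton_function(th, v))
    E_vals.append(total_energy(th, v))
    th, v = rk4_step(th, v, 1e-3)
print(f"spread of H: {max(H_vals) - min(H_vals):.1e}")
print(f"spread of E: {max(E_vals) - min(E_vals):.3f}")
```

The two quantities differ exactly by $mR^2\Omega^2\sin^2\theta$, which changes as the bead moves along the hoop.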


6.5 Non-conservative forces in the Lagrange formalism


So far, the description of physical systems by the Euler-Lagrange formalism required that forces can be expressed as the gradient of a potential and do not depend on velocities, with notable exceptions like charged particles in electromagnetic fields in section 5.2. However, friction forces and other dissipative forces that cannot be expressed by a potential or a Lagrange function are an important part of many practical systems, so a mechanism to include these phenomena in the very flexible Lagrange formalism is desirable.
To do this, we recall the definition of the work integral (130) in section 4.3,
where the work on an ensemble of particles transitioning from state a to b was
defined as
$$W_{ab} = \sum_\alpha\int_a^b\mathbf F_\alpha\cdot d\mathbf x_\alpha . \tag{245}$$

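The position $\mathbf x_\alpha$ of particle α can be expressed by generalized coordinates,
$$\mathbf x_\alpha = \mathbf x_\alpha(\{q_k\}) , \tag{246}$$
as long as the transformation between real and generalized coordinates is not explicitly time dependent. The differential displacement $d\mathbf x_\alpha$ can then be written as
$$d\mathbf x_\alpha = \sum_k\frac{\partial\mathbf x_\alpha}{\partial q_k}\,dq_k , \tag{247}$$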

and the differential dW for the total work on the system by the forces Fα is
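$$\begin{aligned}
dW &= \sum_\alpha\mathbf F_\alpha\cdot d\mathbf x_\alpha\\
&= \sum_{\alpha,k}\mathbf F_\alpha\cdot\left(\frac{\partial\mathbf x_\alpha}{\partial q_k}\,dq_k\right)\\
&= \sum_k\phi_k\,dq_k ,
\end{aligned}\tag{248}$$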

with the definition of the so-called generalized force


$$\phi_k := \sum_\alpha\mathbf F_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k} . \tag{249}$$
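To make (249) concrete, the sketch below assumes a plane pendulum with bob position $\mathbf x(\theta) = l(\sin\theta, -\cos\theta)$, subject to gravity and an assumed linear drag force $\mathbf F = -\gamma\dot{\mathbf x}$ (the drag law and all parameter values are illustrative). It evaluates $\phi_\theta = \mathbf F\cdot\partial\mathbf x/\partial\theta$ with a finite-difference derivative and compares the results with the closed forms $-mgl\sin\theta$ (gravity) and $-\gamma l^2\dot\theta$ (drag).

```python
import math

M, ELL, G, GAMMA = 1.0, 2.0, 9.81, 0.4   # assumed parameters

def position(theta):
    """Cartesian position of the bob for the generalized coordinate theta."""
    return (ELL * math.sin(theta), -ELL * math.cos(theta))

def dx_dtheta(theta, h=1e-6):
    """Central-difference approximation of the vector dx/dtheta."""
    (xp, yp), (xm, ym) = position(theta + h), position(theta - h)
    return ((xp - xm) / (2 * h), (yp - ym) / (2 * h))

def generalized_force(force, theta):
    """phi_theta = F . dx/dtheta, eq. (249) for a single particle."""
    ex, ey = dx_dtheta(theta)
    return force[0] * ex + force[1] * ey

theta, dtheta = 0.7, -1.3                    # assumed state
gravity = (0.0, -M * G)
ex, ey = dx_dtheta(theta)
velocity = (ex * dtheta, ey * dtheta)        # xdot = (dx/dtheta) thetadot
drag = (-GAMMA * velocity[0], -GAMMA * velocity[1])

phi_grav = generalized_force(gravity, theta)
phi_drag = generalized_force(drag, theta)
print(phi_grav, -M * G * ELL * math.sin(theta))
print(phi_drag, -GAMMA * ELL**2 * dtheta)
```

The gravitational part could equally be obtained as $-\partial U/\partial\theta$; the drag part has no such potential, which is the case the following derivation is designed to handle.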

Expression (248) holds for any type of force, and we now try to reproduce the
Lagrange equations. For that, we start with the expression for the total kinetic
energy as expressed by Cartesian velocities,
$$T = \sum_\alpha\frac12 m_\alpha\left(\dot{\mathbf x}_\alpha\right)^2 , \tag{250}$$


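and calculate its derivatives with respect to the generalized coordinates $q_k$:
$$\frac{\partial T}{\partial q_k} = \sum_\alpha m_\alpha\,\dot{\mathbf x}_\alpha\cdot\frac{\partial\dot{\mathbf x}_\alpha}{\partial q_k} . \tag{251}$$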

From (247), one finds


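$$\dot{\mathbf x}_\alpha = \sum_k\frac{\partial\mathbf x_\alpha}{\partial q_k}\,\dot q_k \qquad\text{and thus}\qquad \frac{\partial\dot{\mathbf x}_\alpha}{\partial\dot q_k} = \frac{\partial\mathbf x_\alpha}{\partial q_k} . \tag{252}$$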

With this, the derivative of the kinetic energy with respect to the generalized
velocity q̇k is
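$$\frac{\partial T}{\partial\dot q_k} = \sum_\alpha m_\alpha\,\dot{\mathbf x}_\alpha\cdot\frac{\partial\dot{\mathbf x}_\alpha}{\partial\dot q_k} = \sum_\alpha m_\alpha\,\dot{\mathbf x}_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k} . \tag{253}$$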
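This allows assembling the kinetic-energy-dependent part of a Lagrange equation:
$$\begin{aligned}
\frac{\partial T}{\partial q_k} - \frac{d}{dt}\frac{\partial T}{\partial\dot q_k} &= \sum_\alpha m_\alpha\,\dot{\mathbf x}_\alpha\cdot\frac{\partial\dot{\mathbf x}_\alpha}{\partial q_k} - \frac{d}{dt}\sum_\alpha m_\alpha\,\dot{\mathbf x}_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k}\\
&= \sum_\alpha m_\alpha\left[\dot{\mathbf x}_\alpha\cdot\frac{\partial\dot{\mathbf x}_\alpha}{\partial q_k} - \ddot{\mathbf x}_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k} - \dot{\mathbf x}_\alpha\cdot\frac{d}{dt}\frac{\partial\mathbf x_\alpha}{\partial q_k}\right]\\
&= -\sum_\alpha m_\alpha\,\ddot{\mathbf x}_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k} .
\end{aligned}\tag{254}$$
In the last step, the first and third terms in the bracket cancel, because the time derivative and the derivative with respect to $q_k$ can be exchanged, $\frac{d}{dt}\frac{\partial\mathbf x_\alpha}{\partial q_k} = \frac{\partial\dot{\mathbf x}_\alpha}{\partial q_k}$.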

Using Newton’s equation of motion Fα = mα ẍα and the definition (249) for the
generalized force, one obtains
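$$\frac{\partial T}{\partial q_k} - \frac{d}{dt}\frac{\partial T}{\partial\dot q_k} = -\sum_\alpha\mathbf F_\alpha\cdot\frac{\partial\mathbf x_\alpha}{\partial q_k} = -\phi_k . \tag{255}$$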
The generalized force can always be split into a part that can be derived from a potential U and a residual part $\phi'_k$,
$$\phi_k = -\frac{\partial U}{\partial q_k} + \phi'_k , \tag{256}$$
with a velocity-independent potential $U = U(\{q_k\})$. Then, the equation of motion (255) can be written as
$$\frac{\partial(T-U)}{\partial q_k} - \frac{d}{dt}\frac{\partial(T-U)}{\partial\dot q_k} = \frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_k} = -\phi'_k , \tag{257}$$
where L = T − U is the usual Lagrange function containing the forces that can be derived from a potential. Expression (257) closely resembles the set of Lagrange equations (206), but this time, non-conservative forces like the friction forces seen in section 2.3.2 can be taken into account as well. If there are no dissipative forces, $\phi'_k = 0$, and the original Euler-Lagrange equations (206) are reproduced.


The strategy to handle problems with such forces would be as follows: First,
formulate a Lagrange function L with the kinetic energy and the potential from
interactions or forces that can be expressed via a potential U in convenient coor-
dinates. Then, evaluate the generalized forces φ′k for the non-conservative forces
from their description in real space via (249). Subsequently, solve the set of
(possibly coupled) differential equations (257).
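The three steps of this strategy can be sketched for an assumed example: a plane pendulum, $L = \frac12 ml^2\dot\theta^2 + mgl\cos\theta$, with an assumed linear drag force $\mathbf F = -\gamma\dot{\mathbf x}$ on the bob, whose generalized force from (249) is $\phi'_\theta = -\gamma l^2\dot\theta$. Equation (257) then gives $\ddot\theta = -(g/l)\sin\theta - (\gamma/m)\dot\theta$. The sketch integrates this with a Runge-Kutta step (an illustrative choice) and checks that the loss of T + U equals the work dissipated by the drag, $\int\gamma l^2\dot\theta^2\,dt$. Parameter values are assumptions.

```python
import math

M, ELL, G, GAMMA = 1.0, 2.0, 9.81, 0.4   # assumed parameters

def accel(th, dth):
    """(257) with L = m l^2 dth^2/2 + m g l cos th and phi' = -GAMMA l^2 dth."""
    return -(G / ELL) * math.sin(th) - (GAMMA / M) * dth

def energy(th, dth):
    """T + U of the pendulum."""
    return 0.5 * M * ELL**2 * dth**2 - M * G * ELL * math.cos(th)

def rk4_step(th, v, dt):
    """Classical Runge-Kutta step for (theta, thetadot) with velocity-dependent force."""
    k1x, k1v = v, accel(th, v)
    k2x, k2v = v + dt / 2 * k1v, accel(th + dt / 2 * k1x, v + dt / 2 * k1v)
    k3x, k3v = v + dt / 2 * k2v, accel(th + dt / 2 * k2x, v + dt / 2 * k2v)
    k4x, k4v = v + dt * k3v, accel(th + dt * k3x, v + dt * k3v)
    return (th + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

th, v, dt = 1.0, 0.0, 1e-3
E_start, dissipated = energy(th, v), 0.0
for _ in range(10000):                       # 10 s
    th_new, v_new = rk4_step(th, v, dt)
    # trapezoidal estimate of the dissipated work, integral of GAMMA l^2 thetadot^2 dt
    dissipated += 0.5 * dt * GAMMA * ELL**2 * (v**2 + v_new**2)
    th, v = th_new, v_new
E_end = energy(th, v)
print(f"energy lost: {E_start - E_end:.6f}, dissipated by drag: {dissipated:.6f}")
```

The energy balance follows from (257): multiplying by $\dot\theta$ gives $d(T+U)/dt = \phi'_\theta\dot\theta = -\gamma l^2\dot\theta^2$.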

6.6 Constraints in the Lagrange formalism


One of the advantages of the Lagrange formalism is that constraints on the motion of particles can be taken care of elegantly by an adequate choice of coordinates. In an adequate coordinate system, a constraint does not show up explicitly in the equations of motion, and the number of degrees of freedom, or generalized coordinates $q_k$, is reduced by the number of constraints.
Sometimes, however, it may be better to keep more generalized coordinates in the system of differential equations to be solved. There is a specific method within the Euler-Lagrange formalism that can handle such explicit constraints.
Constraints between a set of coordinates are typically formulated in the form

f ({qk }) = 0 . (258)

An example would be the motion of a point mass on the surface of a sphere in the
usual three-dimensional space with coordinates x, y and z. There, the constraint
to the surface of a sphere with radius r is

$$f(x, y, z) = x^2 + y^2 + z^2 - r^2 = 0 . \tag{259}$$

In general, constraints can depend not only on the coordinates $q_k$, but also on the velocities $\dot q_k$, or explicitly on the time t. The constraints are typically classified into the following categories:

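$$\begin{array}{l|cc}
 & \text{scleronomic (fixed in time)} & \text{rheonomic (time-dependent)}\\ \hline
\text{holonomic (independent of } \dot q_k) & f(\{q_k\}) = 0 & f(\{q_k\}, t) = 0\\
\text{non-holonomic (dependent on } \dot q_k) & f(\{q_k\}, \{\dot q_k\}) = 0 & f(\{q_k\}, \{\dot q_k\}, t) = 0
\end{array}$$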

The method of undetermined Lagrange multipliers can handle holonomic constraints, as well as non-holonomic constraints of certain special forms.


To see how these can be taken care of, we recall the variational problem in
section 6.1, and formulate it for two variables y and z that are connected via a
constraint
g(y, z; x) = 0 . (260)
As a reminder, in the Euler problem for the case at hand we were looking for
solutions y(x) and z(x) which make the functional
$$J = \int_{x_1}^{x_2} f(y, y', z, z'; x)\,dx \tag{261}$$

extremal (i.e., minimal or maximal). This was done by adding deviations ηi with
a control parameter α to the desired solutions,

y(α, x) = y(x) + αη1 (x) , z(α, x) = z(x) + αη2 (x) . (262)

The extremal condition for J according to (191) for the two functions y and z is
then explicitly
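$$\frac{\partial J}{\partial\alpha} = \int_{x_1}^{x_2}\left[\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right)\frac{\partial y}{\partial\alpha} + \left(\frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'}\right)\frac{\partial z}{\partial\alpha}\right]dx \overset{!}{=} 0 , \tag{263}$$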

where the derivatives of y and z with respect to α are the deviation functions
ηi (x) from the optimal path:

$$\eta_1(x) = \frac{\partial y}{\partial\alpha} \qquad\text{and}\qquad \eta_2(x) = \frac{\partial z}{\partial\alpha} . \tag{264}$$
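As $\partial J/\partial\alpha = 0$ had to hold for all possible independent deviations $\eta_i(x)$, both parentheses in (263) had to vanish identically, which led to the Euler equations for y and z. With the constraint (260), however, the deviations are no longer independent, but connected via
$$\frac{dg}{d\alpha} = \frac{\partial g}{\partial y}\frac{\partial y}{\partial\alpha} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial\alpha} = \frac{\partial g}{\partial y}\,\eta_1 + \frac{\partial g}{\partial z}\,\eta_2 = 0 . \tag{265}$$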
The deviation $\eta_2$ can now be expressed via the deviation $\eta_1$,
$$\eta_2 = -\eta_1\,\frac{\partial g/\partial y}{\partial g/\partial z} , \tag{266}$$

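so the condition for an extremal J turns into
$$\frac{\partial J}{\partial\alpha} = \int_{x_1}^{x_2}\left[\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right) - \left(\frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'}\right)\frac{\partial g/\partial y}{\partial g/\partial z}\right]\eta_1(x)\,dx \overset{!}{=} 0 . \tag{267}$$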


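For this expression to hold for all deviations $\eta_1(x)$, the square bracket needs to vanish identically, or
$$\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right)\frac{1}{\partial g/\partial y} = \left(\frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'}\right)\frac{1}{\partial g/\partial z} =: -\lambda . \tag{268}$$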

Since the left side of this expression contains only derivatives with respect to y and y′, and the right side only derivatives with respect to z and z′, the so-called undetermined Lagrange multiplier λ can at most be a function of x. Then,
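$$\begin{aligned}
\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} + \lambda(x)\frac{\partial g}{\partial y} &= 0 , \quad\text{and}\\
\frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'} + \lambda(x)\frac{\partial g}{\partial z} &= 0 .
\end{aligned}\tag{269}$$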
This is a set of two differential equations for the three functions y(x), z(x) and λ(x). Together with the constraint condition (260), there are three equations for the three unknown functions.
The method can be extended to more than two functions, and more than a
single constraint. Applying this general formalism to the Hamilton principle, the
Euler-Lagrange equations of motion with s constraints become
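$$\frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_k} + \sum_{i=1}^{s}\lambda_i(t)\frac{\partial f_i}{\partial q_k} = 0 , \tag{270}$$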

with s constraint equations fi = 0 for the dynamic variables qk (t), and the s
Lagrange undetermined multipliers λi (t).
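Comparing (270) with the equations of motion under arbitrary generalized forces in (257), one can interpret the terms containing the Lagrange multipliers directly as generalized forces that are not captured by the standard Lagrange formalism:
$$\frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_k} = -\sum_{i=1}^{s}\lambda_i(t)\frac{\partial f_i}{\partial q_k} = -\sum_{i=1}^{s}\phi_{i,k} \qquad\text{with}\qquad \phi_{i,k} := \lambda_i(t)\frac{\partial f_i}{\partial q_k} . \tag{271}$$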

Each $\phi_{i,k}$ represents the constraining force along the coordinate $q_k$ due to the constraint $f_i = 0$. This makes it possible to find the constraining forces within the Lagrange formalism, which otherwise avoids calculating them through an adequate choice of variables to describe the problem. Such constraining forces are usually not needed to understand the dynamics of the system, but may be important in an “engineering” context, where a system has to be designed to provide the constraining forces.
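As a hedged illustration of (270) and (271), assume a plane pendulum described with redundant Cartesian coordinates (x, y), y pointing up, $L = \frac12 m(\dot x^2 + \dot y^2) - mgy$, and the holonomic constraint $f = x^2 + y^2 - l^2 = 0$. Equations (270) give $m\ddot x = 2\lambda x$ and $m\ddot y = -mg + 2\lambda y$; differentiating the constraint twice, $x\ddot x + y\ddot y + \dot x^2 + \dot y^2 = 0$, fixes $\lambda = m(gy - v^2)/(2l^2)$. The sketch evaluates the constraining force $(\phi_x, \phi_y) = \lambda\,(\partial f/\partial x, \partial f/\partial y)$ and compares its magnitude with the familiar string tension $mg\cos\theta + mv^2/l$ (θ from the downward vertical). Names and values are assumptions.

```python
import math

M, ELL, G = 1.0, 1.5, 9.81   # assumed parameters

def multiplier(x, y, vx, vy):
    """lambda from differentiating f = x^2 + y^2 - l^2 twice along the motion."""
    return M * (G * y - (vx**2 + vy**2)) / (2 * ELL**2)

def constraint_force(x, y, vx, vy):
    """(phi_x, phi_y) = lambda * (df/dx, df/dy) = lambda * (2x, 2y), cf. (271)."""
    lam = multiplier(x, y, vx, vy)
    return (lam * 2 * x, lam * 2 * y)

theta, speed = 0.6, 2.0                      # angle from the downward vertical, |v|
x, y = ELL * math.sin(theta), -ELL * math.cos(theta)
vx, vy = speed * math.cos(theta), speed * math.sin(theta)   # tangent to the circle

fx, fy = constraint_force(x, y, vx, vy)
tension = math.hypot(fx, fy)
print(tension, M * G * math.cos(theta) + M * speed**2 / ELL)
# The force points from the bob toward the pivot at the origin:
print(fx * x + fy * y < 0)
```

Here the "unwanted" constraining force of the Lagrange formalism is exactly the string tension that an engineer would need to design for.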


7 Elements of Hamiltonian mechanics


The Hamilton function is useful far beyond its role as a conserved quantity in time-invariant physical systems. In a similar way as the Lagrange function is used to derive the complete dynamics of a system, there is a method for constructing a set of equations of motion based on the Hamilton function. This method offers some advantages regarding the complexity of the equations of motion, and may lead to a better handling of cyclic variables. The main advantage of the method, however, lies in its use in other areas of physics, namely quantum mechanics: there, the Hamilton function is at the center of describing the dynamics of a system. Similarly, Hamiltonian mechanics is heavily used in statistical physics, when ensembles of many particles, like in a fluid, are analyzed.
The starting point for this further approach to finding the equations of motion of a system is the transition from a set of dynamic variables $\{q_k, \dot q_k\}$ to a set made up of the generalized coordinates and their canonically conjugated momenta, $\{q_k, p_k\}$. This should be possible because the momenta $p_k$ and the velocities $\dot q_k$ carry similar information.
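We recall the definition (216) of the generalized momentum,
$$p_k = \frac{\partial L}{\partial\dot q_k} , \tag{272}$$
and also write the Lagrange equations of motion (153) with this momentum:
$$\frac{\partial L}{\partial q_k} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_k} = 0 \quad\to\quad \dot p_k = \frac{\partial L}{\partial q_k} . \tag{273}$$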
A fast way to arrive at the equations of motion is to write the Hamilton function as a function of the new dynamic variables (and of time, if necessary):
$$H = H(\{q_k\}, \{p_k\}, t) , \tag{274}$$
and to write down its total differential dH in all the variables:
$$dH = \sum_k\left(\frac{\partial H}{\partial q_k}\,dq_k + \frac{\partial H}{\partial p_k}\,dp_k\right) + \frac{\partial H}{\partial t}\,dt . \tag{275}$$
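In the next step, we express the total differential dH from its definition (234), and identify the partial derivatives in (275). For this, we first simplify the Hamilton function by using the definition of $p_k$,
$$H = \sum_k\frac{\partial L}{\partial\dot q_k}\,\dot q_k - L = \sum_k p_k\dot q_k - L , \tag{276}$$
and then calculate the total differential dH from this expression:
$$\begin{aligned}
dH &= d\Big(\sum_k p_k\dot q_k - L\Big) = \sum_k d(p_k\dot q_k) - dL\\
&= \sum_k\left(p_k\,d\dot q_k + \dot q_k\,dp_k\right) - \left[\sum_k\Big(\underbrace{\frac{\partial L}{\partial q_k}}_{=\dot p_k}dq_k + \underbrace{\frac{\partial L}{\partial\dot q_k}}_{=p_k}d\dot q_k\Big) + \frac{\partial L}{\partial t}\,dt\right]\\
&= \sum_k\left(p_k\,d\dot q_k + \dot q_k\,dp_k - \dot p_k\,dq_k - p_k\,d\dot q_k\right) - \frac{\partial L}{\partial t}\,dt\\
&= \sum_k\left(\dot q_k\,dp_k - \dot p_k\,dq_k\right) - \frac{\partial L}{\partial t}\,dt .
\end{aligned}\tag{277}$$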

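In this transformation, the Lagrange equation of motion was used when $\partial L/\partial q_k$ was replaced by $\dot p_k$. The result expresses the total differential dH in terms of the differentials $dq_k$, $dp_k$, and dt of the variables of H. By comparing the coefficients in (275) and (277), one finds
$$\dot q_k = \frac{\partial H}{\partial p_k} , \qquad -\dot p_k = \frac{\partial H}{\partial q_k} , \tag{278}$$
and
$$-\frac{\partial L}{\partial t} = \frac{\partial H}{\partial t} . \tag{279}$$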
Equations (278) are already the equations of motion for the dynamic variables $q_k$ and $p_k$, and are referred to as Hamilton's equations of motion.
If there are N degrees of freedom (i.e., N different indices k), then (278) forms a system of 2N ordinary differential equations of first order. They are equivalent to the other equations of motion of a system; the Lagrange equations of motion (153) form a system of N differential equations of second order. For both sets of equations of motion, 2N integration constants are needed to specify a solution.
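Before we use the Hamiltonian approach, we look at the total time derivative of the Hamilton function, which was shown to vanish if a system is invariant under translations in time (see section 6.3.3). It can be calculated directly:
$$\begin{aligned}
\dot H = \frac{dH}{dt} &= \sum_k\left(\frac{\partial H}{\partial q_k}\,\dot q_k + \frac{\partial H}{\partial p_k}\,\dot p_k\right) + \frac{\partial H}{\partial t}\\
&= \sum_k\left(-\dot p_k\dot q_k + \dot q_k\dot p_k\right) + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t} .
\end{aligned}\tag{280}$$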

The total derivative $\dot H$ equals the explicit time dependence $\partial H/\partial t$, so if H does not depend explicitly on time, H is a conserved quantity. As before, if U is independent of the velocities $\dot q_k$, and the transformation from $\{x_l\}$ to $\{q_k\}$ is independent of time, then H = E.
The strategy to obtain the Hamilton equations of motion is summarized below:
1. Express the kinetic energy T in terms of the velocities $\dot q_k$.

2. Find the potential energy U as a function of the coordinates $q_k$.

3. Find the canonically conjugated momenta $p_k = \partial L/\partial\dot q_k$.

4a. Express the kinetic energy T in terms of these momenta $p_k$ if this is easy, and obtain the Hamilton function H = T + U directly if the conditions for H = E are met (no time-dependent transformation to the coordinates $q_k$, no $\dot q_k$ in U).

4b. Otherwise, obtain H via the definition (234).

5. Proceed to the equations of motion (278).

7.1 Simple example: harmonic oscillator


As before, we apply the Hamiltonian formalism first to a simple physical system,
the one-dimensional harmonic oscillator. We have seen in section 5.1 that
$$T = \frac12 m\dot x^2 \qquad\text{and}\qquad U = \frac12 kx^2 . \tag{281}$$
We use the standard Cartesian coordinate x as the coordinate, and find the corresponding momentum
$$p = \frac{\partial L}{\partial\dot x} = \frac{\partial(T - U)}{\partial\dot x} = \frac{\partial T}{\partial\dot x} = m\dot x . \tag{282}$$
This allows expressing the kinetic energy via the momentum,

$$T = \frac{p^2}{2m} , \tag{283}$$
and since U is not velocity-dependent and we work with Cartesian coordinates,

$$H = T + U = \frac{p^2}{2m} + \frac{kx^2}{2} . \tag{284}$$
The resulting set of equations of motion obtained via (278) is then
$$\dot x = \frac{1}{m}\,p , \qquad\text{and}\qquad \dot p = -kx . \tag{285}$$
For this simple example, the two equations of motion are not easier to solve than before, as the two differential equations are coupled. The advantages of this formalism really show in more complex problems, where several coordinates are cyclic.
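Equations (285) can be integrated numerically as a first-order system. The sketch below assumes m = 1, k = 4 (so ω = 2) and uses a leapfrog (Störmer-Verlet) step, a symplectic scheme chosen here as an illustration because it respects the phase-space structure discussed next. It compares x(t) with the analytic solution $x_0\cos\omega t$ for $x_0 = 1$, $p_0 = 0$, and monitors H from (284).

```python
import math

M, K = 1.0, 4.0                 # assumed mass and spring constant
OMEGA = math.sqrt(K / M)

def hamiltonian(x, p):
    """H = p^2/(2m) + k x^2/2, eq. (284)."""
    return p**2 / (2 * M) + K * x**2 / 2

def leapfrog_step(x, p, dt):
    """One step of Hamilton's equations xdot = p/m, pdot = -k x (eq. 285)."""
    p_half = p - 0.5 * dt * K * x
    x_new = x + dt * p_half / M
    p_new = p_half - 0.5 * dt * K * x_new
    return x_new, p_new

x, p, dt = 1.0, 0.0, 1e-3
H_start = hamiltonian(x, p)
steps = 5000                    # integrate to t = 5
for _ in range(steps):
    x, p = leapfrog_step(x, p, dt)
t = steps * dt
print(f"x(t) = {x:.6f}, analytic x0 cos(wt) = {math.cos(OMEGA * t):.6f}")
print(f"relative change of H: {abs(hamiltonian(x, p) - H_start) / H_start:.1e}")
```

Each half-step only updates one of the two variables using the other, which mirrors the coupled structure of (285).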


7.2 Hamilton dynamics and phase space


The transition to the dynamic variables $\{q_k\}$ and $\{p_k\}$ allows visualizing the trajectory of a system in the so-called phase space spanned by these variables. To do so, we first keep in mind that for many systems, the Hamilton function H is constant in time. Therefore, in a diagram that represents the function $H(\{q_k\}, \{p_k\})$, the trajectory of the system moves along a line of constant H. For example, the Hamiltonian of the harmonic oscillator has the shape of a paraboloid in x and p,
$$H = T + U = \frac{p^2}{2m} + \frac{kx^2}{2} , \tag{286}$$
and the contour lines for a constant energy E (shown in black below) are ellipses:

[Figure: surface plot of the Hamilton function H(x, p) of the harmonic oscillator over the x-p plane, with contour lines of constant energy.]

With an appropriate scaling of the axes, the time evolution of the harmonic oscillator is just a rotation in phase space around the origin with a constant angular frequency $\omega = \sqrt{k/m}$, independent of the amplitude of the oscillation, and the trajectories in phase space close after a period $T = 2\pi/\omega$.
Similarly to the simple picture in section 3.3.1, where qualitative statements about a solution could be made from the potential energy alone, statements can be made from the contours of fixed total energy in phase space. First, they allow identifying the type of solutions one can expect: if the plane of constant energy cuts the Hamilton function near a local minimum in phase space, the trajectories in phase space are closed loops, pretty much as shown in the figure above. For higher energies, however, there may be solutions that are not bound. The plane pendulum (see section 6.2.1) with coordinate θ is such an example: there are oscillatory solutions as well as solutions where θ increases monotonically in time as the pendulum rotates:

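[Figure: phase-space trajectories of the plane pendulum versus θ, with bound (oscillating) and unbound (rotating) trajectories.]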

For a given total energy E, one can read off the maximal and minimal angles θ, or the maximal and minimal values of the corresponding generalized momentum $p_\theta$ (which happens to be the angular momentum).
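For the plane pendulum, these statements can be made quantitative. Assuming the pendulum is written as $H = p_\theta^2/(2ml^2) - mgl\cos\theta$ (with illustrative parameter values below), the boundary between bound and unbound motion lies at E = mgl. For E < mgl the maximal angle follows from $\cos\theta_{\max} = -E/(mgl)$, and the largest momentum, reached at θ = 0, is $p_{\max} = \sqrt{2ml^2(E + mgl)}$; for E > mgl the smallest momentum, reached at θ = π, is $\sqrt{2ml^2(E - mgl)}$. The helper below is a hypothetical sketch of that bookkeeping.

```python
import math

M, ELL, G = 1.0, 1.0, 9.81   # assumed parameters

def pendulum_limits(E):
    """Classify the motion for energy E, with H = p^2/(2 m l^2) - m g l cos(theta).

    Returns (kind, theta_max, largest |p|, smallest |p|); theta_max is None
    for rotating (unbound) solutions.
    """
    mgl = M * G * ELL
    if E < -mgl:
        raise ValueError("E below the minimum of H")
    p_max = math.sqrt(2 * M * ELL**2 * (E + mgl))        # at theta = 0
    if E < mgl:
        return "bound", math.acos(-E / mgl), p_max, 0.0
    p_min = math.sqrt(2 * M * ELL**2 * (E - mgl))        # at theta = pi
    return "unbound", None, p_max, p_min

for E in (-5.0, 0.0, 15.0):
    print(E, pendulum_limits(E))
```

For bound motion the smallest |p_θ| is zero at the turning points; for unbound motion p_θ never changes sign.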

7.3 Hamiltonian mechanics for an ensemble of systems


The concept of phase space becomes very useful if a large number of systems, like molecules in a gas, all obey the same dynamics. Then, the different systems can be represented at once by points moving in the same phase space, and one can ask how an ensemble of systems is distributed over time:
[Figure: an ensemble of systems shown as a cloud of points in the q-p phase space.]
The ensemble will move in a particular way, but the arrangement of points in phase space may change its shape or orientation over time. In such a situation, it can be useful to introduce a density ρ of systems in phase space, which describes how many systems dN can be found in an infinitesimal phase space volume dv:
$$dN = \rho\,dv , \tag{287}$$
where dv is a volume in the phase space of the s coordinate/momentum pairs:
$$dv = (dq_1\,dq_2\cdots dq_s)(dp_1\,dp_2\cdots dp_s) . \tag{288}$$
For a small volume element in phase space, one can balance how many systems enter and leave the volume in a time interval dt. For this purpose, we consider the small area dq dp in the phase space of one coordinate q and momentum p, and evaluate the flow of systems into this area:

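[Figure: rectangular area in phase space between q and q + dq and between p and p + dp; side 1 is the left edge (at q), side 2 the bottom edge (at p), side 3 the right edge (at q + dq), and side 4 the top edge (at p + dp).]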
The rate $dN_1/dt$ at which systems enter the area from the left (side 1) is
$$\frac{dN_1}{dt} = \rho\,\frac{dq}{dt}\,dp = \rho\,\dot q\,dp . \tag{289}$$
This can be understood in the following way: the first factor, ρ, measures how many points there are per unit phase space area, the second factor, $\dot q$, captures how fast these points move from left to right into the area, and the third factor, dp, is the width of the area in the p direction through which systems can enter. The expression $\rho\dot q$ is therefore a flow density per unit of p.
In a similar way, the rate of systems entering via the bottom boundary is
$$\frac{dN_2}{dt} = \rho\,\frac{dp}{dt}\,dq = \rho\,\dot p\,dq . \tag{290}$$
To evaluate the rate at which systems leave the area on the right side, one finds the flow by a Taylor expansion of the flow density $\rho\dot q$ in q up to first order:
$$\frac{dN_3}{dt} = \left(\rho\,\dot q + \frac{\partial}{\partial q}(\rho\,\dot q)\,dq + \ldots\right)dp . \tag{291}$$
In a similar way, the rate of systems leaving through the top border is
$$\frac{dN_4}{dt} = \left(\rho\,\dot p + \frac{\partial}{\partial p}(\rho\,\dot p)\,dp + \ldots\right)dq . \tag{292}$$


Summing up the net flow of systems into the area dq dp, we get
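$$\begin{aligned}
\frac{dN}{dt} &= \frac{dN_1}{dt} + \frac{dN_2}{dt} - \frac{dN_3}{dt} - \frac{dN_4}{dt}\\
&= -\left[\frac{\partial}{\partial q}(\rho\,\dot q) + \frac{\partial}{\partial p}(\rho\,\dot p)\right]dq\,dp\\
&= -\left[\frac{\partial\rho}{\partial q}\,\dot q + \rho\,\frac{\partial\dot q}{\partial q} + \frac{\partial\rho}{\partial p}\,\dot p + \rho\,\frac{\partial\dot p}{\partial p}\right]dq\,dp
\end{aligned}\tag{293}$$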

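Using the Hamilton equations of motion (278) for the second and the last term,
$$\frac{\partial\dot q}{\partial q} = \frac{\partial}{\partial q}\left(\frac{\partial H}{\partial p}\right) = \frac{\partial^2H}{\partial q\,\partial p} \qquad\text{and}\qquad \frac{\partial\dot p}{\partial p} = \frac{\partial}{\partial p}\left(-\frac{\partial H}{\partial q}\right) = -\frac{\partial^2H}{\partial q\,\partial p} , \tag{294}$$
one can see that these two terms cancel each other, and that
$$\frac{dN}{dt} = -\left[\frac{\partial\rho}{\partial q}\,\dot q + \frac{\partial\rho}{\partial p}\,\dot p\right]dq\,dp . \tag{295}$$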

Up to here, we have considered only the flux into the area dq dp (or the phase space volume) of a single coordinate/momentum pair (q, p) by balancing the flux through the sides of this area. For s degrees of freedom (or $q_k/p_k$ pairs), the phase space volume dv is not a square, but a hypercube in 2s dimensions with 4s surfaces, two for each of the 2s directions. We can balance the flux into dv in a similar way as for one dimension, but need to replace the width dp of side 1 in expression (289) by the “area” $ds_{1,k}$ of surface (1, k) of the hypercube:
$$dp \;\to\; ds_{1,k} = (dq_1\,dq_2\cdots dq_{k-1}\,dq_{k+1}\cdots dq_s)(dp_1\cdots dp_s) . \tag{296}$$

With this, the number of systems entering via surface (1, k) in the time interval dt is
$$dN_{1,k} = \rho\,\dot q_k\,ds_{1,k}\,dt . \tag{297}$$

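The difference between the numbers of systems passing the opposing hypersurfaces (1, k) and (3, k) becomes
$$\begin{aligned}
dN_{1,k} - dN_{3,k} &= -\frac{\partial}{\partial q_k}(\rho\,\dot q_k)\,dq_k\,ds_{1,k}\,dt\\
&= -\frac{\partial}{\partial q_k}(\rho\,\dot q_k)\,dv\,dt\\
&= -\left[\frac{\partial\rho}{\partial q_k}\,\dot q_k + \rho\,\frac{\partial\dot q_k}{\partial q_k}\right]dv\,dt
\end{aligned}\tag{298}$$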

A similar argument holds for the opposing hypersurfaces (2, k) and (4, k). When summing up the flux through all surfaces, the terms containing $\partial\dot q_k/\partial q_k$ and $\partial\dot p_k/\partial p_k$ cancel again because of (294), applied to each pair $(q_k, p_k)$. This leads to a total number
$$dN = \sum_k\left(dN_{1,k} + dN_{2,k} - dN_{3,k} - dN_{4,k}\right) = -\sum_k\left[\frac{\partial\rho}{\partial q_k}\,\dot q_k + \frac{\partial\rho}{\partial p_k}\,\dot p_k\right]dv\,dt \tag{299}$$
of systems entering the phase space volume dv in the time dt.


Since the number of systems moving in phase space cannot change, the influx dN into dv must cause a change of the phase space density ρ in time:
$$dN = \frac{\partial\rho}{\partial t}\,dv\,dt . \tag{300}$$
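Comparing this with (299) and omitting dv and dt leads to
$$\frac{\partial\rho}{\partial t} = -\sum_k\left[\frac{\partial\rho}{\partial q_k}\,\dot q_k + \frac{\partial\rho}{\partial p_k}\,\dot p_k\right] , \tag{301}$$
and after reordering
$$\frac{\partial\rho}{\partial t} + \sum_k\left[\frac{\partial\rho}{\partial q_k}\,\dot q_k + \frac{\partial\rho}{\partial p_k}\,\dot p_k\right] = \frac{d}{dt}\rho = 0 . \tag{302}$$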

This result is referred to as Liouville's theorem¹⁰, and states that the density ρ of points in phase space stays constant along the trajectories of the points under the time evolution according to Hamilton's equations. To understand what this result means, we consider an ensemble of systems subject to the motion of a harmonic oscillator, described by the Hamilton function H in (286). The ensemble is initially distributed over a large range of positions x with a small spread in momentum p, and thus little kinetic energy. A quarter of an oscillation period T = 2π/ω later, the distribution has evolved in phase space:

[Figure: the ensemble in the x-p plane initially (left) and after T/4 (right).]

¹⁰ published in 1838 by Joseph Liouville, 1809-1882


While the distribution changed its configuration in phase space, the density of points within it did not change: the small spread in p together with the large spread in x translated into a distribution with a large spread in p and a small spread in x. For the special case of a harmonic oscillator, even the shape of the distribution stays constant (in appropriately scaled coordinates), but for more complex systems, Liouville's theorem dρ/dt = 0 still holds, i.e., the phase space distribution is incompressible as long as the evolution in time can be described by a Hamilton function, i.e., is non-dissipative.
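Liouville's theorem can be probed numerically without tracking a density: it implies that the Hamiltonian flow map $(q_0, p_0) \to (q(t), p(t))$ preserves phase-space area, i.e., its Jacobian determinant is 1. The sketch below assumes the (nonlinear) plane pendulum $H = p^2/2 - \cos q$ in scaled units, integrates with a Runge-Kutta step, and estimates the Jacobian by finite differences. For this nonlinear system the Jacobian is typically not a pure rotation, so a small square is distorted, while its determinant stays at 1 up to numerical error. All names and values are illustrative.

```python
import math

def flow(q, p, t_end=3.0, dt=1e-3):
    """Evolve qdot = dH/dp = p, pdot = -dH/dq = -sin(q) with RK4 steps."""
    for _ in range(round(t_end / dt)):
        k1q, k1p = p, -math.sin(q)
        k2q, k2p = p + dt / 2 * k1p, -math.sin(q + dt / 2 * k1q)
        k3q, k3p = p + dt / 2 * k2p, -math.sin(q + dt / 2 * k2q)
        k4q, k4p = p + dt * k3p, -math.sin(q + dt * k3q)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        p += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q, p

def jacobian(q0, p0, h=1e-5):
    """Finite-difference Jacobian d(q, p)/d(q0, p0) of the flow map."""
    qa, pa = flow(q0 + h, p0)
    qb, pb = flow(q0 - h, p0)
    qc, pc = flow(q0, p0 + h)
    qd, pd = flow(q0, p0 - h)
    return [[(qa - qb) / (2 * h), (qc - qd) / (2 * h)],
            [(pa - pb) / (2 * h), (pc - pd) / (2 * h)]]

J = jacobian(1.2, 0.3)          # assumed starting point in phase space
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print("Jacobian:", J)
print(f"determinant = {det:.8f}")
```

A dissipative term such as $-\gamma p$ in the equation for $\dot p$ would make the determinant drop below 1, consistent with the restriction to non-dissipative evolution stated above.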

7.4 Evolution of functions in time and Poisson brackets


Often, one is interested in the time evolution of a quantity that is a function of all coordinates and/or momenta of a physical system; the center-of-mass position in section 4.1 or the total angular momentum in section 4.2 were such examples. In general, such a function can be represented as
$$f = f(\{q_k\}, \{p_k\}, t) . \tag{303}$$
Then, the time derivative of this function can be calculated in the usual way by differentiating with respect to all its arguments and using the Hamilton equations of motion (278):
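$$\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial t} + \sum_k\left(\frac{\partial f}{\partial q_k}\,\dot q_k + \frac{\partial f}{\partial p_k}\,\dot p_k\right)\\
&= \frac{\partial f}{\partial t} + \sum_k\left(\frac{\partial f}{\partial q_k}\frac{\partial H}{\partial p_k} - \frac{\partial f}{\partial p_k}\frac{\partial H}{\partial q_k}\right) .
\end{aligned}\tag{304}$$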
By defining the so-called Poisson bracket as a short notation,
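$$\{f, g\} := \sum_k\left(\frac{\partial f}{\partial q_k}\frac{\partial g}{\partial p_k} - \frac{\partial f}{\partial p_k}\frac{\partial g}{\partial q_k}\right) , \tag{305}$$
the temporal derivative can be written as
$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \{f, H\} . \tag{306}$$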
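This notation closely resembles the expression for the time evolution of an operator in quantum physics. Some other properties of these Poisson brackets, which are very closely related to corresponding expressions in quantum physics, can also be verified:
$$\{q_i, p_j\} = \sum_k\Big(\underbrace{\frac{\partial q_i}{\partial q_k}}_{=\delta_{ik}}\underbrace{\frac{\partial p_j}{\partial p_k}}_{=\delta_{jk}} - \underbrace{\frac{\partial q_i}{\partial p_k}}_{=0}\underbrace{\frac{\partial p_j}{\partial q_k}}_{=0}\Big) = \sum_k\delta_{ik}\delta_{jk} = \delta_{ij} , \tag{307}$$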


and similarly, for Poisson brackets between coordinates:


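$$\{q_i, q_j\} = \sum_k\Big(\frac{\partial q_i}{\partial q_k}\underbrace{\frac{\partial q_j}{\partial p_k}}_{=0} - \underbrace{\frac{\partial q_i}{\partial p_k}}_{=0}\frac{\partial q_j}{\partial q_k}\Big) = 0 , \tag{308}$$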

and with a similar argument, {pi , pj } = 0. In general, two functions f and g are
said to commute with each other if

{f, g} = 0 . (309)

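A useful expression can be obtained by evaluating the Poisson bracket with a momentum $p_i$:
$$\{p_i, f\} = \sum_k\Big(\underbrace{\frac{\partial p_i}{\partial q_k}}_{=0}\frac{\partial f}{\partial p_k} - \underbrace{\frac{\partial p_i}{\partial p_k}}_{=\delta_{ik}}\frac{\partial f}{\partial q_k}\Big) = -\sum_k\delta_{ik}\frac{\partial f}{\partial q_k} = -\frac{\partial f}{\partial q_i} . \tag{310}$$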

Again, a symmetric expression can be obtained for a Poisson bracket with $q_i$:
$$\{q_i, f\} = \frac{\partial f}{\partial p_i} . \tag{311}$$
Poisson brackets allow a compact formulation of the evolution in time and of the relations between different functions of the state of the system, characterized by a set of dynamic variables $\{q_k, p_k\}$. They are very closely related to the commutators in quantum physics; in fact, one of the ways to make the transition from classical mechanics to quantum physics simply postulates particular properties of the commutators between operators that differ from classical physics, and takes over the rest of the dynamics from the tool set developed in classical Hamiltonian mechanics.
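A finite-difference sketch of the Poisson bracket (305) for one degree of freedom lets one check (307), (310), and the time evolution (306) numerically. It assumes the harmonic-oscillator Hamiltonian (284) with illustrative values m = 2, k = 3, and the test function f = qp; the helper names are hypothetical. For f = qp, (306) predicts $df/dt = \{f, H\} = p^2/m - kq^2$.

```python
M, K = 2.0, 3.0   # assumed oscillator parameters

def hamiltonian(q, p):
    """Harmonic oscillator, H = p^2/(2m) + k q^2/2 as in (284)."""
    return p**2 / (2 * M) + K * q**2 / 2

def poisson(f, g, q, p, h=1e-5):
    """{f, g} = df/dq dg/dp - df/dp dg/dq for one degree of freedom, cf. (305)."""
    def d_dq(fun):
        return (fun(q + h, p) - fun(q - h, p)) / (2 * h)

    def d_dp(fun):
        return (fun(q, p + h) - fun(q, p - h)) / (2 * h)

    return d_dq(f) * d_dp(g) - d_dp(f) * d_dq(g)

def coordinate(q, p):
    return q

def momentum(q, p):
    return p

def product(q, p):
    """Test function f = q p (hypothetical example)."""
    return q * p

q0, p0 = 0.7, -1.1
print(poisson(coordinate, momentum, q0, p0))                         # compare with (307)
print(poisson(coordinate, hamiltonian, q0, p0), p0 / M)              # qdot = {q, H}
print(poisson(momentum, hamiltonian, q0, p0), -K * q0)               # pdot = {p, H}
print(poisson(momentum, product, q0, p0), -p0)                       # (310): -df/dq
print(poisson(product, hamiltonian, q0, p0), p0**2 / M - K * q0**2)  # (306) for f = q p
```

The second and third lines reproduce Hamilton's equations (278) in bracket form, $\dot q = \{q, H\}$ and $\dot p = \{p, H\}$.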


8 Central force motion and two-body problem


One of the historically and practically most important problems in classical mechanics is the dynamics of two objects that exert a central force on each other. As a reminder, central forces were defined as forces that two point masses exert on each other along their distance vector:

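[Figure: point masses m1 and m2 at positions r1 and r2 relative to the origin O, exerting the forces F12 and F21 on each other along their connecting line.]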

Such forces are found in the gravitational interaction between celestial bodies and in the Coulomb interaction between charged particles, and are often, to a good approximation, free of dissipation.

8.1 Simplification through conservation laws


Central forces only depend on the difference vector between the positions, and
thus can be derived from a potential that only depends on the distance vector:
F = F(r1 − r2 ) and U = U (r1 − r2 ) . (312)
Furthermore, since the form of the potential does not depend on the orientation
of the distance vector r = r1 − r2 between the two masses, the potential is only
a function of the distance r = |r|, so U = U (|r|) = U (r). The Lagrange function
of the system is given by
$$L = T - U = \frac12 m_1\dot{\mathbf r}_1^2 + \frac12 m_2\dot{\mathbf r}_2^2 - U(r) . \tag{313}$$
As we have seen in section 4.1, a system of particles with central forces between them and without external forces has a constant total momentum, and the motion of the center-of-mass position
$$\mathbf R = \frac{1}{m_1 + m_2}\left(m_1\mathbf r_1 + m_2\mathbf r_2\right) \tag{314}$$
is uniform, i.e., $\ddot{\mathbf R} = 0$. Therefore, we can choose an inertial system where $\dot{\mathbf R} = 0$. Next, we place the coordinate origin at the center-of-mass position, so $\mathbf R = 0$. We then express the positions $\mathbf r_1$, $\mathbf r_2$ via the distance vector $\mathbf r$:
$$\mathbf r_1 = \frac{m_2}{m_1 + m_2}\,\mathbf r \qquad\text{and}\qquad \mathbf r_2 = -\frac{m_1}{m_1 + m_2}\,\mathbf r . \tag{315}$$
With this, the Lagrange function of the problem becomes
$$L = \frac12\mu\dot{\mathbf r}^2 - U(|\mathbf r|) \qquad\text{with}\qquad \mu := \frac{m_1 m_2}{m_1 + m_2} . \tag{316}$$


The quantity µ is referred to as the reduced mass of the two-body problem. With
this, the original two-body problem has been reduced to a single body problem,
with only the distance vector r as a dynamic variable.
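A quick numerical check of (315) and (316), with assumed masses and an assumed relative velocity: in the center-of-mass frame, the kinetic energy of the two bodies should equal $\frac12\mu\dot{\mathbf r}^2$, where $\dot{\mathbf r} = \dot{\mathbf r}_1 - \dot{\mathbf r}_2$. The helper names are hypothetical.

```python
def reduced_mass(m1, m2):
    """mu = m1 m2 / (m1 + m2), eq. (316)."""
    return m1 * m2 / (m1 + m2)

def split_relative(r, m1, m2):
    """r1, r2 (origin at the center of mass) from the distance vector r, eq. (315)."""
    s = m1 + m2
    return [m2 / s * c for c in r], [-m1 / s * c for c in r]

m1, m2 = 5.97e24, 7.35e22           # assumed values (roughly Earth and Moon, in kg)
rdot = [120.0, -40.0, 15.0]         # assumed relative velocity in m/s
v1, v2 = split_relative(rdot, m1, m2)   # (315) differentiated in time
T_two_body = 0.5 * m1 * sum(c * c for c in v1) + 0.5 * m2 * sum(c * c for c in v2)
T_reduced = 0.5 * reduced_mass(m1, m2) * sum(c * c for c in rdot)
print(T_two_body / T_reduced)
```

Because (315) is linear in r, the same split applies to velocities, which is why it can be reused for $\dot{\mathbf r}_1$ and $\dot{\mathbf r}_2$.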
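The Lagrange function does not depend on the orientation of $\mathbf r$, only on its modulus $|\mathbf r|$ via $U(|\mathbf r|) = U(r)$. This means that the problem is spherically symmetric, so the Noether theorem in section 6.3.2 implies that the angular momentum
$$\mathbf L = \mathbf r\times\mathbf p \tag{317}$$
is constant in time. By definition, the vectors $\mathbf r$ and $\mathbf p$ are perpendicular to $\mathbf L$, so they stay in a fixed plane perpendicular to the constant $\mathbf L$. This reduces the problem to a two-dimensional problem in this plane. An adequate set of coordinates are the polar coordinates (r, θ) with
$$\mathbf r = r\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix} \qquad\text{and}\qquad \dot{\mathbf r} = \dot r\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix} + r\dot\theta\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix} \tag{318}$$
in the plane perpendicular to $\mathbf L$. With this, $\dot{\mathbf r}^2$ in (316) becomes
$$\dot{\mathbf r}^2 = \dot r^2 + r^2\dot\theta^2 , \tag{319}$$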

and the Lagrange function simplifies to
$$L = \frac12\mu\left(\dot r^2 + r^2\dot\theta^2\right) - U(r) = L(r, \dot r, \dot\theta) . \tag{320}$$
As θ is a cyclic coordinate, the corresponding Lagrange equation (206) for θ just states that the corresponding angular momentum
$$p_\theta = \frac{\partial L}{\partial\dot\theta} = \mu r^2\dot\theta = \text{const.} =: l \tag{321}$$
is a constant of motion, or a first integral. This constant angular momentum $l = |\mathbf L|$ has a simple geometric interpretation. The area dA that the radius vector $\mathbf r$ sweeps in a time interval dt can be calculated via

[Figure: thin triangle swept by the radius vector r when the angle changes by dθ; the arc opposite the origin has length r dθ.]

$$dA = \frac12 r\,(r\,d\theta) = \frac12 r^2\,d\theta , \tag{322}$$
so the change of this area in time is given by
$$\frac{dA}{dt} = \frac12 r^2\dot\theta = \frac{l}{2\mu} = \text{const.} \tag{323}$$
This is referred to as Kepler’s second law¹¹, which at that time was a heuristic description of the positions of the planet Mars recorded by T. Brahe¹². While first found for the gravitational interaction, this law holds for any form of the potential U(r) in a central force problem, as it is only a consequence of angular momentum conservation.

¹¹ Published in 1609 by Johannes Kepler, 1571-1630.
¹² Tycho Brahe, 1546-1601.
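To illustrate that the areal velocity (323) stays constant for a central potential other than U ∝ 1/r, the sketch below integrates the planar equation of motion with a classical fourth-order Runge-Kutta step. The screened potential U(r) = −k e^(−r/ρ)/r, all parameter values and the step size are hypothetical choices; NumPy is assumed.

```python
import numpy as np

mu, k, rho = 1.0, 1.0, 2.0          # hypothetical parameters

def dU_dr(r):
    """Derivative of the hypothetical screened potential U(r) = -k exp(-r/rho) / r."""
    return k * np.exp(-r / rho) * (1.0 / r**2 + 1.0 / (rho * r))

def rhs(state):
    """Equations of motion mu * r_ddot = -(dU/dr) r_vec / r in the orbital plane."""
    x, y, vx, vy = state
    r = np.hypot(x, y)
    a = -dU_dr(r) / (mu * r)
    return np.array([vx, vy, a * x, a * y])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 0.0, 0.0, 1.0])   # r = 1 with purely tangential velocity
dt, steps = 0.005, 4000
areal = np.empty(steps)
radius = np.empty(steps)
for i in range(steps):
    x, y, vx, vy = state
    areal[i] = 0.5 * (x * vy - y * vx)   # dA/dt = r^2 theta_dot / 2
    radius[i] = np.hypot(x, y)
    state = rk4_step(state, dt)

l = 2 * mu * areal[0]
print(radius.min(), radius.max())        # r oscillates, yet dA/dt stays fixed
print(np.max(np.abs(areal - l / (2 * mu))))
```

The radial distance changes noticeably along the orbit, while dA/dt only drifts at the level of the integration error.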
The remaining equation of motion is for the distance r between the two bodies,
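$$\frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial\dot r} = \mu r\dot\theta^2 - \frac{\partial U}{\partial r} - \frac{d}{dt}(\mu\dot r) = \frac{l^2}{\mu r^3} - \frac{\partial U}{\partial r} - \mu\ddot r = 0 \qquad (324)$$

or

$$\ddot r - \frac{l^2}{\mu^2 r^3} + \frac{1}{\mu}\frac{\partial U}{\partial r} = 0 . \qquad (325)$$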
This is an ordinary differential equation in r that does not have a simple ana-
lytical solution for a general potential U (r). However, a number of qualitative
observations can be made by looking at the total energy E of the system (which
is conserved, as L is independent of t):
$$E = T + U = \frac{1}{2}\mu\dot r^2 + \frac{l^2}{2\mu r^2} + U(r) . \qquad (326)$$

This expression can be solved for the radial velocity ṙ,

$$\dot r = \frac{dr}{dt} = \pm\sqrt{\frac{2}{\mu}\left(E - U(r) - \frac{l^2}{2\mu r^2}\right)} . \qquad (327)$$

Collecting all terms that depend on r on one side and integrating directly leads to

$$t = \int_{r_0}^{r}\frac{dr'}{\sqrt{\frac{2}{\mu}\left(E - U(r') - \frac{l^2}{2\mu r'^2}\right)}} = t(r, r_0) , \qquad (328)$$

where r0 is the distance at t = 0. If the integral can be carried out explicitly, the result can be inverted to obtain the solution r(t, r0).
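For a bound orbit, (328) evaluated between the two turning points gives half the radial period. The following is a minimal sketch, assuming NumPy and the Kepler potential with hypothetical parameters; the substitution r = rmin + (rmax − rmin) sin²s, which removes the inverse square-root singularities at the turning points, and the midpoint rule are our own choices. For U = −k/r the radial period can be compared with the orbital period from Kepler's third law derived in section 8.3.

```python
import numpy as np

mu, k, l, E = 1.0, 1.0, 0.8, -0.3        # hypothetical bound Kepler state (E < 0)

def U(r):
    return -k / r

def radial_speed(r):
    """|r_dot| from (327)."""
    return np.sqrt(2.0 / mu * (E - U(r) - l**2 / (2 * mu * r**2)))

# turning points: roots of E r^2 + k r - l^2/(2 mu) = 0 (Kepler potential only)
disc = np.sqrt(k**2 + 2 * E * l**2 / mu)
r_min = (-k + disc) / (2 * E)
r_max = (-k - disc) / (2 * E)

# (328) between the turning points, with r = r_min + (r_max - r_min) sin^2(s)
# to remove the 1/sqrt singularities at both ends; midpoint rule in s
N = 20_000
ds = (np.pi / 2) / N
s = (np.arange(N) + 0.5) * ds
delta = r_max - r_min
r = r_min + delta * np.sin(s)**2
dr_ds = 2 * delta * np.sin(s) * np.cos(s)
t_half = np.sum(dr_ds / radial_speed(r)) * ds      # time from r_min to r_max

tau_radial = 2 * t_half
a = 0.5 * (r_min + r_max)                           # semi-major axis
print(tau_radial, 2 * np.pi * np.sqrt(mu * a**3 / k))
```

For a general U(r), only the turning points (roots of the radicand) would have to be found numerically instead.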

8.2 Effective potential in central force motion


Before moving on to a solution of the radial motion, it is helpful to group the
expression (326) into a part dependent on the radial velocity ṙ, and the rest:
$$E = \frac{1}{2}\mu\dot r^2 + \underbrace{\frac{l^2}{2\mu r^2} + U(r)}_{=:V_\text{eff}(r)} . \qquad (329)$$

The so-called effective potential Veff(r) captures both the original potential U(r) and the kinetic energy l²/(2µr²) associated with the angular momentum (sometimes referred to as centrifugal energy), which diverges to positive values for r → 0. In the same way that comparing the potential with the total energy in section 3.3.1 allowed a classification of solutions, one can use the effective potential to make statements about the radial motion of the system. We consider the case of an attractive potential U(r) = −k/r with some l > 0:

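[Figure: effective potential Veff(r) as the sum of the centrifugal term l²/(2µr²) and U(r), with three energy levels: E1 at the minimum of Veff (circular orbit of radius r1), E2 < 0 with turning points r2min and r2max, and E3 > 0 with a single turning point r3min.]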

For E = E1 , the system is in a state where all the energy is taken up by the
effective potential energy, and nothing is left for any radial motion. Therefore,
ṙ = 0, so r is constant in time. This does not mean that there is no kinetic energy
– there is still the motion connected with the angular momentum l as part of the
effective potential Veff . The fixed r implies a circular trajectory (with radius r1 ),
i.e., the system is in a bound state.
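The circular orbit with energy E1 sits at the minimum of Veff. The following sketch (NumPy assumed, hypothetical parameters) locates that minimum on a fine grid for U = −k/r and compares it with r1 = l²/(µk), i.e., α from (342), and with the corresponding energy −µk²/(2l²), i.e., (348) with ε = 0.

```python
import numpy as np

mu, k, l = 1.0, 1.0, 1.2          # hypothetical parameters, U(r) = -k/r

def V_eff(r):
    """Effective potential (329): centrifugal term plus U(r)."""
    return l**2 / (2 * mu * r**2) - k / r

r = np.linspace(0.05, 20.0, 400_001)
i = np.argmin(V_eff(r))
r1, E1 = r[i], V_eff(r[i])

print(r1, l**2 / (mu * k))            # circular-orbit radius vs. alpha from (342)
print(E1, -mu * k**2 / (2 * l**2))    # lowest energy for this l, (348) with eps = 0
```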
For E = E2 , the system is also in a bound state, but there is enough energy
to allow for radial motion, oscillating between two extremal radii r2min and r2max
with an oscillation period that can be calculated via (328). The trajectory or
orbit r(t) could look like this:
[Figure: sketch of an open orbit r(t) oscillating between the radii rmin and rmax.]

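The radial oscillation and the angular motion due to the angular momentum l are not necessarily synchronized for a general potential U(r), i.e., the angle swept during one radial oscillation need not be a rational multiple of 2π; in the figure above, they are not. In this case the orbit does not close on itself and is referred to as open.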


For E = E3 , there is only one intersection of Veff (r) with the line of constant
energy at r3min , which means that the particle moves from infinity to the point of
closest proximity to the origin (corresponding to the center of mass of the system),
and then leaves again with r(t → +∞) → ∞. Because U (r → ∞) → 0, the
particle approaches the center from an asymptotically uniform motion, interacts
with the other particle and then escapes, approaching a uniform motion into a
different direction:

[Figure: scattering trajectory r(t) passing the origin O at the closest distance rmin.]

This situation is referred to as a scattering process, and is an important concept in atomic, molecular and particle physics.
We now come back to the problem of solving the radial motion in (325). To
assess the shape of orbits without considering the explicit time dependency, we
convert (325) into an equation for r(θ). For this, we introduce the substitution
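$$u := \frac{1}{r} \qquad (330)$$

and calculate its derivatives with respect to θ (using l = µr²θ̇):

$$\frac{du}{d\theta} = -\frac{1}{r^2}\frac{dr}{d\theta} = -\frac{1}{r^2}\frac{dr}{dt}\frac{dt}{d\theta} = -\frac{1}{r^2}\frac{\dot r}{\dot\theta} = -\dot r\,\frac{\mu}{l} , \qquad (331)$$

and

$$\frac{d^2u}{d\theta^2} = -\frac{\mu}{l}\frac{d\dot r}{d\theta} = -\frac{\mu}{l}\frac{d\dot r}{dt}\frac{dt}{d\theta} = -\frac{\mu}{l}\frac{\ddot r}{\dot\theta} = -\frac{\mu^2}{l^2}\,r^2\,\ddot r . \qquad (332)$$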
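This can be used to substitute r̈ in (325) by

$$\ddot r = -\frac{l^2}{\mu^2 r^2}\frac{d^2u}{d\theta^2} , \qquad (333)$$

so the differential equation (325) becomes

$$-\frac{l^2}{\mu r^2}\frac{d^2u}{d\theta^2} - \frac{l^2}{\mu r^3} = -\frac{\partial U}{\partial r} \quad\text{or}\quad \frac{d^2u}{d\theta^2} + u = \frac{\mu}{l^2}\,r^2\,\frac{\partial U}{\partial r} \qquad (334)$$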
This is a simpler differential equation that can be solved for the important class
of potentials where U ∝ 1/r.


8.3 Kepler problems


Two-body problems with a potential
$$U = -\frac{k}{r} \qquad (335)$$

are referred to as Kepler problems, because they are equivalent to the problem of planetary motion. For gravitational attraction,

$$k = G m_1 m_2 , \qquad (336)$$

with the gravitational constant G and the gravitational (heavy) masses m1 and m2. For the
Coulomb interaction,

$$k = -\frac{q_1 q_2}{4\pi\varepsilon_0} , \qquad (337)$$

for two charges q1, q2, with the vacuum permittivity ε0. The minus sign takes into account that charges of the same sign repel each other, while charges of opposite sign attract each other. The potential (335) leads to a radial derivative ∂U/∂r = k/r² ∝ r⁻², which simplifies the differential equation (334) to

$$\frac{d^2u}{d\theta^2} + u - c = 0 \quad\text{with}\quad c = \frac{\mu}{l^2}\,k = \text{const.} \qquad (338)$$
This is a simple linear differential equation with solutions of the form

$$u(\theta) = A e^{B\theta} + c , \qquad (339)$$

with the same c as in (338). Inserting this ansatz into (338) leads to B² = −1, or a general solution

$$u(\theta) = A' e^{i\theta} + A'' e^{-i\theta} + c , \quad\text{or}\quad u(\theta) = A'''\cos(\theta - \theta_0) + c , \qquad (340)$$
with free integration constants A′ and A′′, or A′′′ and θ0, the latter pair being suitable for a description of the real-valued solution. We can choose θ0 = 0, because this only orients our polar coordinate system in a particular way. The integration constant A′′′ then needs to be determined from the initial conditions. Rearranging the solution into a different form yields

$$\frac{u}{c} = \frac{1}{c}\,\frac{1}{r} = \frac{A'''}{c}\cos\theta + 1 . \qquad (341)$$
With the common definitions

$$\alpha := \frac{1}{c} = \frac{l^2}{\mu k} \quad\text{and}\quad \varepsilon := \frac{A'''}{c} , \qquad (342)$$

the solution takes the form

$$\frac{\alpha}{r} = 1 + \varepsilon\cos\theta . \qquad (343)$$

This is the parametric description of a family of curves referred to as conic sections (cone intersections), and the dimensionless constant ε in this expression is referred to as the eccentricity of the curve.
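The closed-form result (343) can be cross-checked by integrating (338) numerically. The sketch below (NumPy assumed; α, ε and the step size are hypothetical) starts at the pericenter θ = 0 with du/dθ = 0 and uses a fourth-order Runge-Kutta step.

```python
import numpy as np

alpha, eps = 1.0, 0.6        # hypothetical orbit parameters, cf. (342)
c = 1.0 / alpha              # c = mu k / l^2 from (338)

def rhs(y):
    """First-order form of u'' + u - c = 0, with y = (u, du/dtheta)."""
    u, du = y
    return np.array([du, c - u])

def rk4_step(y, h):
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * h * k1)
    k3 = rhs(y + 0.5 * h * k2)
    k4 = rhs(y + h * k3)
    return y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

theta = np.linspace(0.0, 2 * np.pi, 2001)
h = theta[1] - theta[0]
y = np.array([(1 + eps) / alpha, 0.0])    # pericenter at theta = 0: u maximal, du/dtheta = 0
u_num = [y[0]]
for _ in theta[1:]:
    y = rk4_step(y, h)
    u_num.append(y[0])

r_num = 1.0 / np.array(u_num)
r_conic = alpha / (1 + eps * np.cos(theta))     # (343)
print(np.max(np.abs(r_num - r_conic)))
```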


8.3.1 Classification of orbits


The cone intersections as solutions of the Kepler problem can be classified according to the value of the eccentricity ε; in the graph below, α was chosen such that all trajectories have the same distance between the point P of closest approach and the origin F:

[Figure: Kepler orbits r(θ) with a common pericenter P and focus F for ε = 0 (circle), ε = 0.5 and ε = 0.75 (ellipses, apocenter A), ε = 1 (parabola) and ε = 1.5 (hyperbola).]

• For ε = 0, the solution r(θ) = α is independent of θ, and the trajectory is a circle around the coordinate origin. This corresponds to the bound state with the lowest total energy for a given angular momentum l.

• For 0 < ε < 1, the trajectory r(θ) forms an ellipse, with the coordinate origin in one of its focal points F. The trajectory again represents a bound state. Point P on the trajectory is the one closest to the coordinate origin, and therefore corresponds to the shortest distance between the two bodies. It is referred to as pericenter, or, if reference is made to a planet orbiting the Sun, as perihelion, or as perigee if reference is made to a satellite orbiting the Earth. Similarly, the position A on the trajectory which is furthest away from the coordinate origin is referred to as apocenter, aphelion, or apogee, respectively. If one of the two masses is much heavier than the other one, its distance from the center of mass of the system (i.e., the coordinate origin) is very small, so it is located close to the focal point F of the elliptical orbit. This is Kepler’s first law of planetary motion, stating that planetary orbits are ellipses, with the Sun in one of their foci.


• For ε = 1, the solution r(θ) represents a parabola. This corresponds to the situation where E = 0. It is not a bound state anymore, because r(t → ±∞) → ∞.

• For ε > 1, the trajectory is one branch of a hyperbola and is also unbounded. Furthermore, the range of θ is limited, because r in expression (343) diverges for θ → θmax, with

$$\cos\theta_\text{max} = -\frac{1}{\varepsilon} . \qquad (344)$$

In this case, the trajectory starts out with θ = θmax and r → ∞, moves towards the coordinate center with decreasing θ, reaches the pericenter P at θ = 0, and leaves for r → ∞ with θ → −θmax (or the other way round). Such a problem is referred to as a scattering problem.
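The classification above can be condensed into a small helper. This is an illustrative sketch; the function name classify_orbit and the tolerance tol used to decide ε = 0 and ε = 1 in floating point are our own choices.

```python
import math

def classify_orbit(eps, tol=1e-12):
    """Classify alpha/r = 1 + eps cos(theta) by its eccentricity, cf. (343)-(344).

    Returns the curve type and theta_max (None for bounded orbits).
    """
    if eps < 0:
        raise ValueError("the eccentricity is non-negative")
    if eps < tol:
        return "circle", None
    if eps < 1 - tol:
        return "ellipse", None
    if eps <= 1 + tol:
        return "parabola", math.pi             # cos(theta_max) = -1
    return "hyperbola", math.acos(-1.0 / eps)  # (344)

for eps in (0.0, 0.5, 1.0, 1.5):
    print(eps, classify_orbit(eps))
```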

So far, we have discussed the solutions (343) to the Kepler problem only qualitatively. We still need to connect the orbit parameter ε with the physical properties of the two-body problem, since α is already fixed via (342). The connection is easily made by using expression (326) for the total energy and inserting the Kepler potential:

$$E = T + U = \frac{1}{2}\mu\dot r^2 + \frac{1}{2}\frac{l^2}{\mu r^2} - \frac{k}{r} . \qquad (345)$$
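By definition, the radial velocity ṙ vanishes at the pericenter r = rmin, so

$$E = \frac{1}{2}\frac{l^2}{\mu r_\text{min}^2} - \frac{k}{r_\text{min}} . \qquad (346)$$

On the other hand, we have an expression for rmin from the Kepler solution (343):

$$r_\text{min} = \frac{\alpha}{1 + \varepsilon} . \qquad (347)$$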
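Inserting this into the expression for the total energy (346) yields, using l²/µ = kα from (342),

$$E = \frac{1}{2}\frac{l^2}{\mu}\frac{(1+\varepsilon)^2}{\alpha^2} - k\,\frac{1+\varepsilon}{\alpha} = \frac{k}{\alpha}\left[\frac{1}{2}(1+\varepsilon)^2 - (1+\varepsilon)\right] = \frac{k}{2\alpha}\left[\varepsilon^2 - 1\right] . \qquad (348)$$

This expression can be inverted into

$$\varepsilon = \sqrt{\frac{2\alpha E}{k} + 1} = \sqrt{1 + E\,\frac{2l^2}{\mu k^2}} . \qquad (349)$$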
With this, the orbit parameters α, ε are fully determined by the angular momentum l and the total energy E of the two-body problem. Alternatively, an initial position and velocity could have been specified as initial conditions, but l and E can always easily be calculated from them.
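As a sketch of how l and E, and hence α and ε, follow from an initial relative position and velocity (NumPy assumed; the function name orbit_parameters and all numbers are hypothetical):

```python
import numpy as np

def orbit_parameters(r_vec, v_vec, mu, k):
    """(l, E, alpha, eps) for U = -k/r from a relative position and velocity.

    Uses alpha = l^2/(mu k) from (342) and eps from (349).
    """
    r_vec = np.asarray(r_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    l = np.linalg.norm(mu * np.cross(r_vec, v_vec))      # |L| = |r x p|
    E = 0.5 * mu * (v_vec @ v_vec) - k / np.linalg.norm(r_vec)
    alpha = l**2 / (mu * k)
    eps = np.sqrt(max(1.0 + 2.0 * E * l**2 / (mu * k**2), 0.0))  # guard against rounding
    return l, E, alpha, eps

mu, k = 1.0, 1.0                                      # hypothetical units
l, E, alpha, eps = orbit_parameters([1.0, 0.0, 0.0], [0.0, 1.2, 0.0], mu, k)
print(l, E, alpha, eps)
print(np.isclose(E, k * (eps**2 - 1) / (2 * alpha)))  # consistency with (348)
print(alpha / (1 + eps))                              # r_min from (347)
```

For this state the velocity is tangential and larger than the circular speed, so the starting point is the pericenter, consistent with rmin = α/(1 + ε) = 1.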
As a last step in this section, we apply Kepler’s second law (323) to evaluate the total orbital period τ of a bound state by equating the area covered by the radius vector per unit of time with the ratio of the ellipse area A and τ:

$$\frac{dA}{dt} = \frac{l}{2\mu} \stackrel{!}{=} \frac{A}{\tau} = \frac{\pi a b}{\tau} , \qquad (350)$$

where a and b are the two semi-axes of an elliptical or circular orbit.

[Figure: ellipse with semi-major axis a, semi-minor axis b, focus F, pericenter P at distance rmin from F, and apocenter A.]

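One can easily show that a and b are connected with the parameters ε and α of the cone intersection expression (343):

$$a = \frac{\alpha}{1 - \varepsilon^2} \quad\text{and}\quad b = \sqrt{\alpha a} . \qquad (351)$$

With this, one finds with l = √(αµk) from (342)

$$\tau = \pi a b\,\frac{2\mu}{l} = \pi a^{3/2}\sqrt{\alpha}\,\frac{2\mu}{\sqrt{\alpha\mu k}} = a^{3/2}\,\pi\sqrt{\frac{4\mu}{k}} . \qquad (352)$$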
This leads to

$$\frac{\tau^2}{a^3} = \frac{4\pi^2\mu}{k} = \text{const.} , \qquad (353)$$

which, to a good approximation, is Kepler’s third law, stating that the ratio of τ² and the cube of the semi-major axis a of a planetary orbit is the same constant for all planets orbiting the Sun. The latter can be seen by recalling the expressions for the reduced mass µ from (316) and k = Gm1m2 for a gravitational potential:

$$\frac{4\pi^2\mu}{k} = 4\pi^2\,\frac{m_1 m_2}{m_1 + m_2}\,\frac{1}{G m_1 m_2} = \frac{4\pi^2}{G(m_1 + m_2)} \approx \frac{4\pi^2}{G m_2} , \qquad (354)$$

since the mass m2 of the Sun is much larger than the mass m1 of any planet.
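A quick numerical illustration of (353) and (354) for the Earth. The approximate values for G, the solar and terrestrial masses and the semi-major axis are inputs we supply, not part of these notes.

```python
import math

G = 6.674e-11          # gravitational constant in m^3 kg^-1 s^-2 (approximate)
M_SUN = 1.989e30       # solar mass in kg (approximate)
M_EARTH = 5.972e24     # Earth mass in kg (approximate)
A_EARTH = 1.496e11     # semi-major axis of Earth's orbit in m (about 1 AU)

def kepler_period(a, m1, m2):
    """Orbital period from (353)-(354): tau = 2 pi sqrt(a^3 / (G (m1 + m2)))."""
    return 2 * math.pi * math.sqrt(a**3 / (G * (m1 + m2)))

tau = kepler_period(A_EARTH, M_EARTH, M_SUN)
tau_heavy_sun = kepler_period(A_EARTH, 0.0, M_SUN)   # approximation 4 pi^2 / (G m2)
print(tau / 86400, tau_heavy_sun / 86400)            # both close to one year in days
```

The two results differ only by a relative amount of order m1/(2m2), which illustrates why the approximation in (354) is so good for planets.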


8.3.2 Orientation of elliptical orbits: Laplace-Runge-Lenz vector


In the previous section, we have seen some aspects of stable orbits for Kepler problems with

$$U(r) = -\frac{k}{r} \quad\text{and}\quad \mathbf{F} = -\nabla U = -\frac{k}{r^3}\,\mathbf{r} . \qquad (355)$$
One of the consequences of the conservation of angular momentum L was that
the motion takes place in a plane only. Implicit in the solutions for the elliptical
orbits was that the semi-major axis is a constant in time, i.e., its orientation in
space is fixed. When determining the integration constants in (340), we chose
θ0 = 0, and aligned the coordinate system with the semi-major axis.
In practice, the orientation of the semi-major axis (and even the orbit equation itself) can be obtained from a specific vector. We define

$$\mathbf{A} := \mathbf{p}\times\mathbf{L} - \mu k\,\frac{\mathbf{r}}{r} , \qquad (356)$$

with the momentum p, angular momentum L = r × p and position vector r. This quantity is referred to as the Laplace-Runge-Lenz vector, or sometimes only as the Runge-Lenz vector. It lies in the plane of motion, because the first term of (356) is perpendicular to L, and the second term is in the plane of motion anyway. To show that this vector is a constant of motion, we calculate the time derivative of its first term:
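$$\frac{d}{dt}(\mathbf{p}\times\mathbf{L}) = \dot{\mathbf{p}}\times\mathbf{L} + \mathbf{p}\times\underbrace{\dot{\mathbf{L}}}_{=0} \qquad (357)$$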

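Using ṗ = F = −k r/r³ and L = r × p = µ r × ṙ, we can continue

$$\begin{aligned} \frac{d}{dt}(\mathbf{p}\times\mathbf{L}) &= -\frac{k\mu}{r^3}\left(\mathbf{r}\times(\mathbf{r}\times\dot{\mathbf{r}})\right) \\ &= -\frac{k\mu}{r^3}\left((\mathbf{r}\cdot\dot{\mathbf{r}})\,\mathbf{r} - (\mathbf{r}\cdot\mathbf{r})\,\dot{\mathbf{r}}\right) \\ &= -\frac{k\mu}{r^3}\left(r\dot r\,\mathbf{r} - r^2\,\dot{\mathbf{r}}\right) \\ &= k\mu\left(-\frac{1}{r^2}\,\dot r\,\mathbf{r} + \frac{1}{r}\,\dot{\mathbf{r}}\right) = k\mu\,\frac{d}{dt}\!\left(\frac{1}{r}\,\mathbf{r}\right) . \qquad (358) \end{aligned}$$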
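The step from the second to the third line can be seen by writing the position vector as r = r e_r, differentiating this product with respect to time to get ė_r, and using e_r ⊥ ė_r, which gives r · ṙ = r ṙ. Then, the expression can be rearranged into

$$\frac{d}{dt}\left(\mathbf{p}\times\mathbf{L} - k\mu\,\frac{1}{r}\,\mathbf{r}\right) = \frac{d}{dt}\mathbf{A} = 0 . \qquad (359)$$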
dt r dt
Thus, the Laplace-Runge-Lenz vector is conserved over time. Its orientation with respect to the elliptical motion can be seen by forming the scalar product with the position vector r:


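$$\mathbf{A}\cdot\mathbf{r} = \mathbf{r}\cdot(\mathbf{p}\times\mathbf{L}) - \mu k\,\frac{\mathbf{r}\cdot\mathbf{r}}{r} = \mathbf{L}\cdot\underbrace{(\mathbf{r}\times\mathbf{p})}_{=\mathbf{L}} - \mu k r = l^2 - \mu k r . \qquad (360)$$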

On the other hand, A · r = A r cos θ, where A = |A| and θ is the angle between the vectors A and r. Rearranging this expression into

$$\frac{l^2}{\mu k r} = 1 + \frac{A}{\mu k}\cos\theta \qquad (361)$$

shows that the angle θ here is the same as in the solutions for the cone intersections in (343). Thus, for θ = 0, the vector A points in the same direction as r, namely towards the pericenter, parallel to the semi-major axis of the ellipse:

[Figure: elliptical orbit with focus F, position r, momentum p, angular momentum L perpendicular to the orbital plane, and the Laplace-Runge-Lenz vector A pointing from F towards the pericenter.]

Direct comparison of (361) with the expression for the cone intersections (343), using α = l²/(µk) from (342), also reveals the connection between the length of the Laplace-Runge-Lenz vector and the eccentricity:

$$A = \mu k\varepsilon . \qquad (362)$$
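The conservation of A and relation (362) can be checked numerically. A minimal sketch, assuming NumPy, hypothetical units µ = k = 1, and a fourth-order Runge-Kutta integration of the relative motion:

```python
import numpy as np

mu, k = 1.0, 1.0                          # hypothetical parameters, U = -k/r

def accel(r):
    """r_ddot = F/mu with F = -k r / r^3 from (355)."""
    return -k * r / (mu * np.linalg.norm(r)**3)

def lrl(r, v):
    """Laplace-Runge-Lenz vector (356)."""
    p = mu * v
    L = np.cross(r, p)
    return np.cross(p, L) - mu * k * r / np.linalg.norm(r)

def rk4_step(r, v, dt):
    k1r, k1v = v, accel(r)
    k2r, k2v = v + 0.5 * dt * k1v, accel(r + 0.5 * dt * k1r)
    k3r, k3v = v + 0.5 * dt * k2v, accel(r + 0.5 * dt * k2r)
    k4r, k4v = v + dt * k3v, accel(r + dt * k3r)
    r_new = r + dt / 6.0 * (k1r + 2 * k2r + 2 * k3r + k4r)
    v_new = v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return r_new, v_new

r = np.array([1.0, 0.0, 0.0])             # start at the pericenter
v = np.array([0.0, 1.2, 0.0])
A_start = lrl(r, v)
for _ in range(5000):
    r, v = rk4_step(r, v, 0.002)
A_end = lrl(r, v)

l = mu * np.linalg.norm(np.cross(r, v))
E = 0.5 * mu * (v @ v) - k / np.linalg.norm(r)
eps = np.sqrt(1 + 2 * E * l**2 / (mu * k**2))       # (349)
print(A_start, A_end)                               # A stays fixed along the orbit
print(np.linalg.norm(A_start), mu * k * eps)        # (362)
```

Starting at the pericenter, A points along the initial position vector, as argued above.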


8.4 Scattering orbits


We now come back to the orbits in section 8.2 corresponding to a total energy E > 0, such that r(t) is not bounded, i.e., r(t → ±∞) → ∞. This is also the only type of solution for repulsive two-body interaction potentials:

[Figure: effective potential Veff(r) for a repulsive potential U(r), with the centrifugal term l²/(2µr²) and the single turning point rmin.]

Typically, the interaction potential U(r) vanishes for large r; there, the particle is in uniform motion with a constant velocity ṙ = v′0. The prime indicates that this is a relative velocity between the two bodies, because we are still working with the effective one-body problem. The vector r moves towards the pericenter P with a minimal distance from the center-of-mass position (or coordinate origin O), and then moves away again. For r → ∞, the motion becomes uniform again, with a new constant velocity v′1. As the interaction is conservative, both velocities have the same modulus, but different directions. Asymptotically, the effect of the two-body interaction is a deflection of the particle by an angle ϕ′. The geometry of that interaction is shown in the diagram below:
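[Figure: scattering geometry for a repulsive potential: incoming velocity v′0 at impact parameter b, pericenter P, outgoing velocity v′1, scattering angle ϕ′, angle α between r(θ) and v′0, and asymptotic polar angles θmax measured from the origin O.]

First, we note that the scattering angle ϕ′ between the asymptotic velocities v′0 and v′1 is determined by the maximal angle θmax in the polar coordinate system (oriented in the direction O − P in the figure):

$$\varphi' = \pi - 2\theta_\text{max} \qquad (363)$$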

For any two-body central force problem, the angular momentum is a conserved
quantity. We therefore try to evaluate it from the geometric parameters shown
in the above diagram, using

l = |L| = |r × p| = µ|r × v′0 | = µ r v0′ sin α = µ r v0′ sin(π − α) , (364)

where here, α is the angle between r and v′0 . Asymptotically, the expression

r sin(π − α) =: b (365)

measures the shortest distance b between the coordinate origin and the trajectory the particle would have taken without interaction. The distance b is referred to as the impact parameter of the scattering trajectory, and measures, loosely speaking, by how much the scattering center O would have been missed if there were no interaction. With this, one can fix the angular momentum to

l = µ v0′ b (366)

The diagram above shows a scattering problem for a repulsive potential. By con-
vention, the scattering angle ϕ′ in this case is counted positively. For a scattering
problem with an attractive potential the scattering angle is negative, compatible
with definition (363):

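[Figure: scattering trajectories for an attractive potential, showing incoming velocity v′0, outgoing velocity v′1, impact parameter b, pericenter P, origin O, asymptotic angles θmax and a negative scattering angle ϕ′ < 0.]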

Since the scattering angle ϕ′ is a simple function (363) of the maximum angle
θmax of the trajectory in a polar coordinate system with the θ = 0 direction defined
by the direction O − P , we try to evaluate this angle for a general potential U (r).
For this, we go back to the expression (327) for the radial velocity in a central force problem:

$$\dot r = \frac{dr}{dt} = \pm\sqrt{\frac{2}{\mu}\left(E - U(r) - \frac{l^2}{2\mu r^2}\right)} . \qquad (367)$$

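We are interested in an expression θ(r), so we use the above expression to construct a useful derivative:

$$\frac{d\theta}{dr} = \frac{d\theta}{dt}\frac{dt}{dr} = \frac{\dot\theta}{\dot r} = \frac{l}{\mu r^2}\,\frac{1}{\dot r} = \frac{l}{\mu r^2}\,\frac{1}{\pm\sqrt{\cdots}} . \qquad (368)$$

Sorting again the components that depend on r onto one side leads to

$$d\theta = \frac{l}{\mu r^2}\,\frac{dr}{\pm\sqrt{\cdots}} , \qquad (369)$$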

which we integrate from the pericenter (at θ = 0) to infinity. Since the distance r increases monotonically, we can choose the positive sign (assuming l > 0), and get an expression for θmax:

$$\theta_\text{max} = \int_{r_\text{min}}^{+\infty}\frac{l}{\mu r^2}\,\frac{dr}{\sqrt{\frac{2}{\mu}\left(E - U(r) - \frac{l^2}{2\mu r^2}\right)}} . \qquad (370)$$

To carry out this integration, we still need rmin, which can be obtained from (327), knowing that the radial velocity ṙ vanishes at rmin:

$$E - U(r_\text{min}) - \frac{l^2}{2\mu r_\text{min}^2} = 0 . \qquad (371)$$

This equation needs to be solved for the integration boundary rmin in (370), leading to the maximal polar angle θmax = θmax(E, l). The energy E here is the energy in the center-of-mass system, so we denote it as

$$E' = \frac{\mu v_0'^2}{2} . \qquad (372)$$

As the angular momentum in a scattering problem is conveniently given by the impact parameter b via (366), the scattering angle ϕ′ becomes a function of b and v0′ only, or of b and E′.
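As an illustration of (370) and (371), the following sketch (NumPy assumed; all parameter values hypothetical) evaluates θmax numerically for the Kepler potential and compares it with the closed form (344). The substitution u = 1/r followed by u = umax sin s, the bisection for umax = 1/rmin, and the midpoint rule are our own choices; for another potential only U(r) needs to change, provided the radicand g(u) changes sign only once.

```python
import numpy as np

mu, k, E, l = 1.0, 1.0, 0.5, 1.0          # hypothetical scattering state, E > 0

def U(r):
    return -k / r                          # Kepler potential (335), attractive for k > 0

def g(u):
    """Radicand of (370) written in u = 1/r: (2/mu)(E - U(1/u)) - (l u / mu)^2."""
    return 2.0 / mu * (E - U(1.0 / u)) - (l * u / mu) ** 2

# (371): u_max = 1/r_min is the root of g; bracket it, then bisect
lo, hi = 1e-12, 1.0
while g(hi) > 0:
    hi *= 2.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
u_max = 0.5 * (lo + hi)

# (370) becomes theta_max = int_0^{u_max} (l/mu) du / sqrt(g(u)); with u = u_max sin(s)
# the integrand has no endpoint singularity, so a midpoint rule suffices
N = 20_000
ds = (np.pi / 2) / N
s = (np.arange(N) + 0.5) * ds
theta_max = np.sum((l / mu) * u_max * np.cos(s) / np.sqrt(g(u_max * np.sin(s)))) * ds

eps = np.sqrt(1 + 2 * E * l**2 / (mu * k**2))    # (349)
print(theta_max, np.arccos(-1 / eps))            # numerical (370) vs. closed form (344)
```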

8.4.1 Scattering angle for Kepler problems


For general potentials U (r), it can be difficult or impossible to obtain the maximal
polar angle θmax from (370) in a closed form. For the important Kepler problems
introduced in section 8.3, it is possible, but it is even easier to use the result (344):

$$\cos\theta_\text{max} = -\frac{1}{\varepsilon} , \qquad (373)$$

with the eccentricity ε given by (349). In the parametric equation (343) for the Kepler orbits, we implicitly used k > 0, corresponding to attractive interactions, to obtain a radial distance r > 0. However, the sign of the constant α in (342) can be changed for k < 0 without affecting any of the derivations, resulting in r > 0 for repulsive interactions, e.g., between two electric charges of the same sign. Expression (349) for ε is not affected by this sign change, so θmax can be obtained via (373), resulting in an expression for the scattering angle ϕ′:
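$$-\frac{1}{\varepsilon} = \cos\theta_\text{max} = \cos\left(\frac{\pi}{2} - \frac{\varphi'}{2}\right) = \sin\frac{\varphi'}{2} . \qquad (374)$$

Using (349) for the eccentricity ε leads to

$$\begin{aligned} \sin\frac{\varphi'}{2} &= -\frac{1}{\varepsilon} = -\left[1 + E'\,\frac{2l^2}{\mu k^2}\right]^{-1/2} = -\left[1 + E'\,\frac{2\mu^2 v_0'^2 b^2}{\mu k^2}\right]^{-1/2} = -\left[1 + 4E'^2\,\frac{b^2}{k^2}\right]^{-1/2} \\ &= -\left[1 + \frac{b^2}{b_0^2}\right]^{-1/2} \quad\text{with}\quad b_0 := \frac{k}{2E'} . \qquad (375) \end{aligned}$$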

This can be further simplified into

$$\tan\frac{\varphi'}{2} = -\frac{b_0}{b} , \qquad (376)$$
correctly reflecting the sign convention. The scattering angle ϕ′ is therefore a sim-
ple function of the normalized impact parameter b/b0 , where b0 has the dimension
of a length:

[Figure: scattering angle ϕ′ as a function of b/b0, with ϕ′ between 0 and π for k < 0 (repulsive potential) and between −π and 0 for k > 0 (attractive potential).]
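A minimal sketch of (376), valid for b > 0 (NumPy assumed; the function name and parameter values are hypothetical). Using np.arctan keeps |ϕ′| < π, and the sign of k alone selects repulsive (ϕ′ > 0) or attractive (ϕ′ < 0) deflection:

```python
import numpy as np

def scattering_angle(b, E_cm, k):
    """phi' from tan(phi'/2) = -b0/b with b0 = k/(2E'), cf. (375)-(376); b > 0."""
    b0 = k / (2.0 * E_cm)
    return -2.0 * np.arctan(b0 / b)

b = np.array([0.25, 1.0, 4.0])                    # hypothetical impact parameters
phi_att = scattering_angle(b, E_cm=1.0, k=2.0)    # attractive, b0 = 1: phi' < 0
phi_rep = scattering_angle(b, E_cm=1.0, k=-2.0)   # repulsive, b0 = -1: phi' > 0
print(np.degrees(phi_att))
print(np.degrees(phi_rep))
```

For b = |b0| the deflection is exactly ±90°, and small impact parameters approach backscattering.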


8.5 Scattering cross section


Scattering problems are often described in terms of a scattering cross section.
The scattering cross section σ is the effective area an object exposes to projectiles
targeting the object from a particular direction. In the simplest form, σ is just
the projection of the object onto a screen S perpendicular to the velocity vector
v of the projectiles before the impact:

[Figure: projectiles with velocity v0 approaching an object; its cross section σ is the shadow of the object on a screen S perpendicular to v0.]
If the density of the projectiles per screen area is constant, the number of
projectiles hitting the object is proportional to its scattering cross section σ.
In the previous section, we found a relation between the impact parameter b and
the deflection angle ϕ′ for scattering trajectories of two particles that interact via
a potential U (r). To connect these trajectories with the concept of a scattering
cross section, we first quantify the direction in which projectiles are scattered. A
single direction in the 3-dimensional space can be described by two angles θ, φ in
a spherical coordinate system. The solid angle Ω captures a set of directions in
space, and is just the surface area on a unit sphere corresponding to this set of
directions. Since the surface area of a sphere of radius r is 4πr², the full solid angle corresponding to all possible directions is Ω = 4π, and a half space corresponds to Ω = 2π. The
solid angle is dimensionless, but occasionally, the unit sr (for steradian) is used
to indicate that reference is made to a solid angle.
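In spherical coordinates, a small set of directions is given by the solid angle element

$$d\Omega = \sin\theta\,d\phi\,d\theta . \qquad (377)$$

[Figure: solid angle element dΩ on the unit sphere, spanned by dθ and sin θ dφ, with the polar angle θ measured from the z axis.]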


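Coming back to scattering orbits in two-body interactions, we need to consider two different coordinate systems. One is the coordinate system in which we describe the two-body interaction. As we have seen in section 8.1, the two-body interaction is a problem in a plane W, characterized by polar coordinates θ, r in the figure below:

[Figure: scattering in the orbital plane W with origin O, impact parameter b, pericenter P, polar coordinates r and θ, angular momentum L perpendicular to W, deflection by ϕ′ into the solid angle element dΩ, and the area element dσ on the screen S perpendicular to the z axis along v′0.]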

Remember that this is an effective one-body problem, assuming a description in an inertial reference frame, with the coordinate origin O at the center of mass of the two-body problem. The angular momentum L is constant in time, and perpendicular to the plane W of orbital motion.
The other coordinate system describes the direction into which the particle is deflected. This is a spherical coordinate system centered at the origin O, with its z axis aligned parallel to the initial velocity v′0 of the projectile.
We consider the surface element dσ in the screen plane S that corresponds to a deflection into the direction element dΩ. The ratio

$$\frac{d\sigma}{d\Omega} \qquad (378)$$

is referred to as the differential cross section. To calculate this quantity, we first note that the scattering problem for central force interactions is rotationally symmetric around the z axis. Projectiles passing through a ring of radius b and thickness db with an area

dσ = 2πb db (379)

are all deflected by the same angle ϕ′, and end up in a ring-shaped set of directions as shown in the figure below, covering a solid angle of

dΩ = 2π sin ϕ′ dϕ′ . (380)


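[Figure: projectiles passing the screen S through a ring of radius b are deflected by ϕ′ into a ring-shaped solid angle element dΩ around O.]

With this, we find for the differential scattering cross section

$$\frac{d\sigma}{d\Omega} = \frac{2\pi b\,db}{2\pi\sin\varphi'\,d\varphi'} = \frac{b}{\sin\varphi'}\left|\frac{db}{d\varphi'}\right| . \qquad (381)$$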

The modulus of the derivative db/dϕ′ was taken because for scattering problems
with a repulsive potential, db/dϕ′ is negative; the differential cross section is only
meaningful for positive values.

8.5.1 Rutherford scattering


An important problem that helped to discover the structure of atoms was the elastic scattering of α particles (positively charged helium nuclei) by the nuclei of heavy atoms, which are also positively charged. To treat this in the framework of classical mechanics, we recall the expression (375) for the deflection angle ϕ′ for scattering orbits in Kepler problems from the previous section:

$$\sin\frac{\varphi'}{2} = -\left[1 + \frac{b^2}{b_0^2}\right]^{-1/2} \qquad (382)$$
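By squaring and inverting this expression, we get

$$\sin^{-2}\frac{\varphi'}{2} = 1 + \frac{b^2}{b_0^2} , \qquad (383)$$

which can be differentiated on both sides,

$$-2\sin^{-3}\frac{\varphi'}{2}\,\cos\frac{\varphi'}{2}\,\frac{1}{2}\,d\varphi' = \frac{2b}{b_0^2}\,db , \qquad (384)$$

or

$$\frac{db}{d\varphi'} = -\frac{b_0^2}{2b}\,\frac{\cos(\varphi'/2)}{\sin^3(\varphi'/2)} , \qquad (385)$$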

which is negative for all scattering angles 0 < ϕ′ < π. We use this result in expression (381) for the differential scattering cross section, and obtain

$$\frac{d\sigma}{d\Omega} = \frac{b_0^2}{2\sin\varphi'}\,\frac{\cos(\varphi'/2)}{\sin^3(\varphi'/2)} = \frac{b_0^2}{4\sin^4(\varphi'/2)} = \frac{k^2}{16E'^2\sin^4(\varphi'/2)} . \qquad (386)$$
This is a historically and practically important formula for Rutherford scattering. The fortunate fact that this result, obtained by classical mechanics, is the same as the one obtained from a quantum mechanical description led to a rapid development of the understanding of the structure of atoms by Rutherford¹³ and coworkers.
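The Rutherford formula can be checked against the general expression (381) by differentiating b(ϕ′) numerically. The sketch below, assuming NumPy and hypothetical values of k and E′, inverts (376) for the repulsive case 0 < ϕ′ < π and uses a central finite difference (our choice of method):

```python
import numpy as np

k, E_cm = 1.0, 2.0                   # hypothetical parameters
b0 = k / (2 * E_cm)                  # (375)

def b_of_phi(phi):
    """Impact parameter b(phi') for 0 < phi' < pi, from |tan(phi'/2)| = |b0| / b, cf. (376)."""
    return abs(b0) / np.tan(phi / 2)

phi = np.linspace(0.2, 3.0, 15)
h = 1e-6
db_dphi = (b_of_phi(phi + h) - b_of_phi(phi - h)) / (2 * h)     # central difference
cross_numeric = b_of_phi(phi) / np.sin(phi) * np.abs(db_dphi)   # (381)
cross_rutherford = k**2 / (16 * E_cm**2 * np.sin(phi / 2)**4)   # (386)
print(np.max(np.abs(cross_numeric / cross_rutherford - 1)))
```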

8.6 Scattering problems in the laboratory system


The differential scattering cross section dσ/dΩ is very useful because it is directly
related to typical scattering experiments. Usually, a scattering target is located somewhere in space and exposed to a homogeneous flux of projectile particles (e.g., helium nuclei or protons). A typical sample consists of many target particles, and is much larger than a characteristic impact parameter for an individual scattering process, like b0 in (375). A detector with a fixed size measures the flux of scattered particles at different deflection angles at a fixed distance from the target, so the solid angle subtended by the detector, as seen from the scattering target, is fixed. The differential scattering cross section is then simply proportional to the detected number of scattered particles in a particular direction.
To connect the differential scattering cross section seen in the lab with the one calculated for the effective one-body problem, we need to find an adequate transformation, first into the centre-of-mass (CM) system, then into the lab.

[Figure: left, the effective single particle with relative position r from the origin O; right, the lab system with positions r1 and r2 of the two particles, their relative position r and the center-of-mass position R.]

The center-of-mass position R of two masses m1, m2 at positions r1 and r2 (in the lab system) was defined in section 4.1 as

$$\mathbf{R} = \frac{1}{m_1 + m_2}\left(m_1\mathbf{r}_1 + m_2\mathbf{r}_2\right) = \frac{\mu}{m_2}\,\mathbf{r}_1 + \frac{\mu}{m_1}\,\mathbf{r}_2 , \qquad (387)$$
¹³ Published in Philosophical Magazine 21, 669-688 (1911) by Ernest Rutherford, 1871-1937.

with the reduced mass µ = m1m2/(m1 + m2). The coordinate origin in a CM system is chosen such that R = 0. Then, the positions r′1 and r′2 of both masses in the CM system are connected by

$$m_1\mathbf{r}'_1 + m_2\mathbf{r}'_2 = 0 , \quad\text{or}\quad \mathbf{r}'_1 = \frac{\mu}{m_1}\,\mathbf{r} , \quad \mathbf{r}'_2 = -\frac{\mu}{m_2}\,\mathbf{r} \qquad (388)$$
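with the difference vector

$$\mathbf{r} = \mathbf{r}'_1 - \mathbf{r}'_2 = \mathbf{r}_1 - \mathbf{r}_2 , \qquad (389)$$

which completely describes the state of the system in the effective single-particle description. With (387) and (389), the positions of particles 1 and 2 in the lab system can be expressed by R and r according to

$$\mathbf{r}_1 = \mathbf{R} + \mathbf{r}'_1 = \mathbf{R} + \frac{\mu}{m_1}\,\mathbf{r} \quad\text{and}\quad \mathbf{r}_2 = \mathbf{R} - \frac{\mu}{m_2}\,\mathbf{r} . \qquad (390)$$

Similarly, the velocities in the lab system are given by

$$\dot{\mathbf{r}}_1 = \dot{\mathbf{R}} + \frac{\mu}{m_1}\,\dot{\mathbf{r}} \quad\text{and}\quad \dot{\mathbf{r}}_2 = \dot{\mathbf{R}} - \frac{\mu}{m_2}\,\dot{\mathbf{r}} . \qquad (391)$$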

8.6.1 Transformation of energy between lab and CM system


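The total kinetic energy of the two particles in the lab system at any time can be expressed by the CM velocity Ṙ and the effective single-particle velocity ṙ,

$$\begin{aligned} T &= \frac{1}{2}m_1\dot{\mathbf{r}}_1^2 + \frac{1}{2}m_2\dot{\mathbf{r}}_2^2 = \frac{1}{2}m_1\left(\dot{\mathbf{R}} + \frac{\mu}{m_1}\dot{\mathbf{r}}\right)^2 + \frac{1}{2}m_2\left(\dot{\mathbf{R}} - \frac{\mu}{m_2}\dot{\mathbf{r}}\right)^2 \\ &= \frac{1}{2}(m_1 + m_2)\,\dot{\mathbf{R}}^2 + \frac{1}{2}\left(\frac{\mu^2}{m_1} + \frac{\mu^2}{m_2}\right)\dot{\mathbf{r}}^2 \\ &= \frac{1}{2}M\dot{\mathbf{R}}^2 + \frac{1}{2}\mu\dot{\mathbf{r}}^2 = \frac{1}{2}M\dot{\mathbf{R}}^2 + T' \qquad (392) \end{aligned}$$

with the total mass M = m1 + m2 and the kinetic energy T′ in the CM system. The mixed terms cancel because m1(µ/m1) = m2(µ/m2) = µ.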
Long before and long after the impact, all the energy is kinetic energy of the particles, so

$$E = T = \frac{1}{2}M\dot{\mathbf{R}}^2 + E' , \qquad (393)$$

where E′ is the energy in the CM system available for the scattering process.
In a typical scattering scenario, the target mass m2 is initially at rest in the lab (i.e., ṙ2 = 0), and the projectile mass m1 moves towards the target with a velocity v0. By differentiating (387), one finds for the CM velocity

$$\dot{\mathbf{R}} = \frac{\mu}{m_2}\,\mathbf{v}_0 . \qquad (394)$$

By using (391), one finds


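$$\mathbf{v}_0 = \dot{\mathbf{r}}_1 = \frac{\mu}{m_2}\,\mathbf{v}_0 + \frac{\mu}{m_1}\,\dot{\mathbf{r}} \quad\text{or}\quad \mathbf{v}_0 = \dot{\mathbf{r}} = \mathbf{v}'_0 , \qquad (395)$$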
i.e., the relative velocity v′0 before the interaction is the same as the initial velocity v0 of the projectile in the lab system. Acknowledging that the initial total energy is only given by the kinetic energy m1v0²/2 of the projectile, and using (394) in (393), one finds the simple relation

$$E' = \frac{m_2}{m_1 + m_2}\,E \qquad (396)$$

for the total available energy E′ in the CM system, and therefore also in the effective single-particle system. Since the impact parameter b is the same in the lab system, the CM system, and the effective single-particle system, we have enough information to calculate the deflection angle ϕ′ = ϕ′(E′, b) (in the effective single-particle system) from a known energy E in the lab system. Also note that the angle ϕ′ between r and v0 is the same in the effective single-particle system and the CM system.
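The following sketch (NumPy assumed; the projectile velocity is hypothetical, and the masses are the α-particle/gold values quoted in section 8.6.4) checks (391) and (393) to (396) for a target initially at rest:

```python
import numpy as np

m1, m2 = 4.0, 197.0                  # masses in amu, as in the alpha/gold example of 8.6.4
v0 = np.array([1.5, 0.0, 0.0])       # hypothetical projectile velocity; target at rest

M = m1 + m2
mu = m1 * m2 / M
R_dot = mu / m2 * v0                 # (394)
r_dot = v0                           # (395): relative velocity equals v0
v1 = R_dot + mu / m1 * r_dot         # (391)
v2 = R_dot - mu / m2 * r_dot

E_lab = 0.5 * m1 * (v0 @ v0)
E_cm = 0.5 * mu * (r_dot @ r_dot)
print(np.allclose(v1, v0), np.allclose(v2, 0.0))
print(np.isclose(E_lab, 0.5 * M * (R_dot @ R_dot) + E_cm))   # (393)
print(E_cm / E_lab, m2 / M)                                   # (396)
```

For a heavy target, E′/E = m2/(m1 + m2) is close to one, so almost all of the lab energy is available for the scattering process.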

8.6.2 Transformation of deflection angles between lab and CM system


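To transform the deflection angle ϕ′ back to the lab system, we consider the geometrical relationship between the velocities ṙ1, ṙ2 and ṙ as shown:

[Figure: left, asymptotic trajectories of m1 and m2 in the lab with lab deflection angle ϕ and CM deflection angle ϕ′; right, velocity diagram ṙ1 = Ṙ + ṙ′1 relating ϕ and ϕ′.]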

On the left side, the relationship between ϕ and the asymptotic trajectories
in the lab, and the deflection angle ϕ′ with respect to the difference vector r is
shown. On the right side, one can see the corresponding geometry of the velocity
relation ṙ1 = Ṙ+ ṙ′1 between lab and CM system. Splitting this up into Cartesian
components leads to

$$\dot r_1\cos\varphi = \dot r_1'\cos\varphi' + \dot R \quad\text{and}\quad \dot r_1\sin\varphi = \dot r_1'\sin\varphi' . \qquad (397)$$

Note that this expression relates asymptotic velocities, so ϕ and ϕ′ no longer change with time. This can be turned into a relation between ϕ and ϕ′,

$$\tan\varphi = \frac{\sin\varphi}{\cos\varphi} = \frac{\sin\varphi'}{\cos\varphi' + \gamma} , \quad\text{with}\quad \gamma := \frac{\dot R}{\dot r_1'} = \frac{m_1}{m_1 + m_2}\,\frac{v_0}{\dot r_1'} , \qquad (398)$$

which still depends on the final speed ṙ1′ of the projectile in the CM system. To eliminate it, we express the total energy E′ = T′ in the CM system before the impact by the kinetic energies after the impact, and include a possible energy loss H during the scattering process:

$$E' = \frac{1}{2}m_1\dot r_1'^2 + \frac{1}{2}m_2\dot r_2'^2 + H . \qquad (399)$$
This can, e.g., account for radiation losses due to the acceleration of charges in the scattering process. The kinetic energy of m2 can also be expressed by ṙ1′ using momentum conservation in the CM system, m1ṙ1′ = m2ṙ2′, leading to

$$E' = \frac{1}{2}m_1\left(\frac{m_1 + m_2}{m_2}\right)\dot r_1'^2 + H . \qquad (400)$$
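By expressing E′ by E via (396) and dividing by E, one finds

$$\frac{m_2}{m_1 + m_2} = \frac{1}{E}\,\frac{1}{2}\,m_1\left(\frac{m_1 + m_2}{m_2}\right)\dot r_1'^2 + \frac{H}{E} \qquad (401)$$

or, using E = m1v0²/2,

$$1 = \underbrace{\frac{\dot r_1'^2}{v_0^2}\left(\frac{m_1 + m_2}{m_2}\right)^2}_{=\frac{1}{\gamma^2}\frac{m_1^2}{m_2^2}} + \frac{H}{E}\,\frac{m_1 + m_2}{m_2} , \qquad (402)$$

which can finally be solved for γ:

$$\gamma^2 = \left(\frac{m_1}{m_2}\right)^2\frac{1}{1 - \frac{H}{E}\frac{m_1 + m_2}{m_2}} . \qquad (403)$$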

For elastic scattering with H = 0, one gets the simple relation

$$\gamma = \frac{m_1}{m_2} \qquad (404)$$

for use in the transformation (398) between deflection angles ϕ and ϕ′.

8.6.3 Transformation of the differential scattering cross sections


From the dependence of the deflection angle ϕ′ on the impact parameter b, the differential scattering cross section dσ/dΩ could be calculated via (381). Since the differential dσ = 2πb db in (379) is the same in the lab and CM system, we only need to replace the solid angle element dΩ by the one for the lab system:

$$d\Omega_\text{Lab} = 2\pi\sin\varphi\,d\varphi , \qquad (405)$$

which leads to the simple relation between the differential scattering cross sections in the lab and the CM system,

$$\left(\frac{d\sigma}{d\Omega}\right)_\text{Lab} = \left(\frac{d\sigma}{d\Omega}\right)_\text{CM}\frac{\sin\varphi'\,d\varphi'}{\sin\varphi\,d\varphi} , \qquad (406)$$

where the differential scattering cross sections are evaluated at corresponding


angles ϕ in the lab, and ϕ′ in the CM system. Evaluation of the correction factor
on the right side of (406) is straightforward but a bit tedious; using (398), i.e.,

sin ϕ′
tan ϕ = (407)
cos ϕ′ + γ

one finds with 1/ tan2 x = 1/ sin2 x − 1 first

sin ϕ′ q
= 1 + γ 2 + 2γ cos ϕ′ . (408)
sin ϕ
Expressing the differential of (398) on the left side by dϕ, and on the right side
by dϕ′ leads after some steps to

$$\frac{d\phi'}{d\phi} = \frac{1 + \gamma^2 + 2\gamma\cos\phi'}{1 + \gamma\cos\phi'} \,, \tag{409}$$
and finally to a correction factor

$$\frac{\sin\phi' \, d\phi'}{\sin\phi \, d\phi} = \frac{(1 + \gamma^2 + 2\gamma\cos\phi')^{3/2}}{1 + \gamma\cos\phi'} \,. \tag{410}$$
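To check (407) to (410) without redoing the algebra, one can compare the closed-form correction factor with a finite-difference derivative of the lab angle. A minimal sketch, assuming γ < 1 so that ϕ grows monotonically with ϕ′; all function names are hypothetical.

```python
import math

def lab_angle(phi_cm: float, gamma: float) -> float:
    """Lab deflection angle phi from tan(phi) = sin(phi') / (cos(phi') + gamma), Eq. (407)."""
    # atan2 picks the branch 0 <= phi <= pi, also when cos(phi') + gamma < 0
    return math.atan2(math.sin(phi_cm), math.cos(phi_cm) + gamma)

def correction_factor(phi_cm: float, gamma: float) -> float:
    """Closed form (410) of sin(phi') dphi' / (sin(phi) dphi)."""
    s = 1.0 + gamma**2 + 2.0 * gamma * math.cos(phi_cm)
    return s**1.5 / (1.0 + gamma * math.cos(phi_cm))

def correction_factor_numeric(phi_cm: float, gamma: float, h: float = 1e-6) -> float:
    """Same factor from a central finite difference of lab_angle, as an independent check."""
    dphi = lab_angle(phi_cm + h, gamma) - lab_angle(phi_cm - h, gamma)
    return math.sin(phi_cm) * (2 * h) / (math.sin(lab_angle(phi_cm, gamma)) * dphi)

for phi_cm in (0.3, 1.0, 2.5):
    print(phi_cm, lab_angle(phi_cm, 0.5),
          correction_factor(phi_cm, 0.5), correction_factor_numeric(phi_cm, 0.5))
```

Using `atan2` rather than an arctangent of (407) avoids a branch jump once ϕ exceeds 90°. For γ = 1, the same functions give ϕ = ϕ′/2 and a factor of 4 cos ϕ.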

8.6.4 Special cases


For the special case of elastic scattering (H = 0), and m1 = m2 , one finds

$$\gamma = 1 \quad\text{and}\quad E' = E/2 \,, \tag{411}$$

so
$$\tan\phi = \frac{\sin\phi'}{\cos\phi' + 1} = \tan\frac{\phi'}{2} \,, \quad\text{or}\quad \phi = \frac{\phi'}{2} \,. \tag{412}$$
The correction factor (410) becomes

$$\frac{\sin\phi' \, d\phi'}{\sin\phi \, d\phi} = 4\cos\phi \,, \tag{413}$$
sin ϕ dϕ


and
$$\left(\frac{d\sigma}{d\Omega}\right)_{\text{Lab}} = \left(\frac{d\sigma}{d\Omega}\right)_{\text{CM},\,\phi'=2\phi} \cdot 4\cos\phi \,. \tag{414}$$

This case covers, e.g., the scattering of electrons by electrons, or of protons by protons.


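The other important special case of elastic scattering (H = 0) is a target mass m2 much larger than the projectile mass m1. Then, γ ≈ 0 and
$$\phi' \approx \phi \,, \quad E' \approx E \,, \quad\text{and}\quad \left(\frac{d\sigma}{d\Omega}\right)_{\text{Lab}} = \left(\frac{d\sigma}{d\Omega}\right)_{\text{CM},\,\phi'=\phi} \,. \tag{415}$$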

This was the case for the classical Rutherford scattering experiment conducted by Geiger and Marsden¹⁴, where relatively light α particles (m1 = 4 amu) were scattered off a thin foil of gold atoms (m2 ≈ 197 amu). The rare scattering of α particles into the backward direction (ϕ > 90°) indicated that the positive charge of the nuclei was localized in a very small space, and not uniformly distributed over the whole size of the atom, as assumed by the then common "plum pudding" model, in which a large-sized positive charge counterbalances the electrons.

¹⁴ H. Geiger and E. Marsden, Proc. Roy. Soc. London A82, 495-500 (1909).


9 Harmonic oscillator
So far, we have encountered the undamped harmonic oscillator as an example for
obtaining an equation of motion via various strategies, resulting in

$$F = m\ddot{x} = -kx \quad\text{or}\quad \ddot{x} + \omega_0^2 x = 0 \tag{416}$$

with ω0² = k/m for the dynamical variable x, with oscillating solutions discussed in section 2.3.3. Since the harmonic oscillator is at the core of many dynamic phenomena in physics, it deserves a closer look, including dissipation and the response to time-varying external forces.

9.1 Damped harmonic oscillator


First, we extend the basic model by a damping term. This can be thought of as
a friction force present in the motion. In its simplest form (see section 2.3.2), the
friction force would be proportional to the velocity ẋ of the system. This results
in a modified equation of motion,

$$F = m\ddot{x} = -kx - \alpha\dot{x} \,, \tag{417}$$

or in the more commonly found form

$$\ddot{x} + 2\beta\dot{x} + \omega_0^2 x = 0 \,, \tag{418}$$

with β = α/(2m) > 0 for a damping action. This is still a linear ordinary differential equation (ODE), and can also be solved using an exponential ansatz

$$x(t) = A e^{rt} \,. \tag{419}$$

Inserting this into (418) and dividing by x(t) leads to an algebraic equation for r,

$$r^2 + 2\beta r + \omega_0^2 = 0 \,, \tag{420}$$

which can easily be solved for its roots:
$$r_{1,2} = -\beta \pm \sqrt{\beta^2 - \omega_0^2} \,. \tag{421}$$

Depending on the values of ω0 and β, the two roots r1,2 can assume complex
values. One distinguishes three cases for the solutions.
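The case distinction can be made mechanical by computing the roots (421) with complex arithmetic. A small sketch using Python's `cmath`; the helper names and the choice ω0 = 1 are illustrative assumptions.

```python
import cmath

def characteristic_roots(beta: float, omega0: float):
    """Roots r1, r2 of r^2 + 2 beta r + omega0^2 = 0, Eq. (421)."""
    root = cmath.sqrt(beta**2 - omega0**2)     # purely imaginary when beta < omega0
    return -beta + root, -beta - root

def damping_regime(beta: float, omega0: float, rel_tol: float = 1e-12) -> str:
    """Classify the solution type by comparing beta with omega0."""
    if abs(beta - omega0) <= rel_tol * omega0:
        return "critically damped"
    return "underdamped" if beta < omega0 else "overdamped"

for beta in (0.2, 1.0, 1.5):
    r1, r2 = characteristic_roots(beta, 1.0)
    print(f"beta = {beta}: {damping_regime(beta, 1.0)}, r1 = {r1:.4f}, r2 = {r2:.4f}")
```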

9.1.1 Underdamped case


If 0 < β < ω0 , the argument of the square root in (421) is negative, so r has an
imaginary component:
$$r_{1,2} = -\beta \pm i\sqrt{\omega_0^2 - \beta^2} = -\beta \pm i\omega_1 \quad\text{with}\quad \omega_1 := \sqrt{\omega_0^2 - \beta^2} \,. \tag{422}$$


The corresponding solutions (419) are

$$x(t) = A e^{-\beta t \pm i\omega_1 t} = A e^{-\beta t} e^{\pm i\omega_1 t} \,. \tag{423}$$

The first exponential provides the damping term that decays exponentially with
time, while the second exponential forms an oscillating part, with an oscillation
frequency ω1 lower than the frequency ω0 of the undamped system. As this is a
second order differential equation, there are two integration constants that allow
meeting the initial conditions of the problem; the most general solution to (418)
can be written in various ways,

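$$\begin{aligned} x(t) &= \left(A e^{i\omega_1 t} + A' e^{-i\omega_1 t}\right) e^{-\beta t} \\ &= \left(B\cos\omega_1 t + B'\sin\omega_1 t\right) e^{-\beta t} \\ &= C\cos(\omega_1 t - \delta)\, e^{-\beta t} \end{aligned} \tag{424}$$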

for integration constant pairs (A, A′), (B, B′), or (C, δ). The first form is convenient if complex parameters are to be considered, the second and third are often useful when a real-valued x(t) is expected. The solutions are illustrated below, where an oscillation of frequency ω1 < ω0 is multiplied by an exponentially decaying envelope e^{−βt}. For t → ∞, the exponential term takes over, and x(t) → 0.

[Figure: underdamped solution x(t), an oscillation A cos(ω1 t − δ) inside the decaying envelope e^{−βt}]
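As a sanity check of (424), the sketch below fixes B and B′ from assumed initial values x(0) = x0 and ẋ(0) = v0 (B = x0 and B′ = (v0 + βx0)/ω1, which follow from differentiating the second form) and compares the result with a direct Runge-Kutta integration of (418). Parameter values and function names are illustrative.

```python
import math

def underdamped(t, x0, v0, beta, omega0):
    """Second form of Eq. (424), with B and B' fixed by x(0) = x0 and xdot(0) = v0."""
    omega1 = math.sqrt(omega0**2 - beta**2)      # valid for 0 < beta < omega0
    b = x0                                       # x(0) = B
    b_prime = (v0 + beta * x0) / omega1          # xdot(0) = -beta B + omega1 B'
    return (b * math.cos(omega1 * t) + b_prime * math.sin(omega1 * t)) * math.exp(-beta * t)

def rk4(x0, v0, beta, omega0, t_end, steps=10000):
    """Integrate Eq. (418) numerically with the classic fourth-order Runge-Kutta scheme."""
    def deriv(x, v):
        return v, -2.0 * beta * v - omega0**2 * x
    dt = t_end / steps
    x, v = x0, v0
    for _ in range(steps):
        k1 = deriv(x, v)
        k2 = deriv(x + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = deriv(x + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = deriv(x + dt * k3[0], v + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x

print(underdamped(5.0, 1.0, 0.5, 0.3, 2.0), rk4(1.0, 0.5, 0.3, 2.0, 5.0))
```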

9.1.2 Critically damped case


For β = ω0 , the two roots of (420) become degenerate, r1,2 = −β. In this case,
one can verify that the ansatz

$$x(t) = (A + Bt)\, e^{-\beta t} \tag{425}$$

is a solution to (418), which can cover all initial conditions. Typical solutions for
x(t) matching various initial conditions are shown below:


[Figure: critically damped solutions x(t) versus t for A > 0, B = 0; for A = 0, B > 0; and for A < B < 0]
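One can verify numerically that (425) solves (418) for β = ω0 by evaluating the left side of (418) with finite differences. In this sketch, the constants are fixed by assumed initial values, A = x0 and B = v0 + βx0; the names and numbers are illustrative.

```python
import math

def critically_damped(t, x0, v0, beta):
    """Eq. (425), x(t) = (A + B t) e^{-beta t}, with A = x0 and B = v0 + beta x0."""
    a = x0                      # x(0) = A
    b = v0 + beta * x0          # xdot(0) = B - beta A
    return (a + b * t) * math.exp(-beta * t)

def residual(x, t, beta, omega0, h=1e-4):
    """Left side of Eq. (418) for a function x(t), using central finite differences."""
    xdd = (x(t + h) - 2 * x(t) + x(t - h)) / h**2
    xd = (x(t + h) - x(t - h)) / (2 * h)
    return xdd + 2 * beta * xd + omega0**2 * x(t)

beta = omega0 = 1.5                       # critical damping, beta = omega0
x = lambda t: critically_damped(t, x0=1.0, v0=-2.0, beta=beta)
print(max(abs(residual(x, t, beta, omega0)) for t in (0.5, 1.0, 2.0, 4.0)))
```

The printed residual is limited only by the finite-difference error, not by the ansatz itself.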

9.1.3 Overdamped case


For β > ω0, both roots r1,2 in (421) are real and negative, because the square root
$$\sqrt{\beta^2 - \omega_0^2} =: \omega_2 \tag{426}$$
is always smaller than β. The solution for x(t) is a superposition of two exponential decays, with two time constants corresponding to r1 and r2:

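$$\begin{aligned} x(t) &= \left(A e^{\omega_2 t} + A' e^{-\omega_2 t}\right) e^{-\beta t} \\ &= B e^{-(\beta+\omega_2)t} + B' e^{-(\beta-\omega_2)t} \,. \end{aligned} \tag{427}$$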

The figure below shows a few typical examples for x(t):

[Figure: overdamped solutions x(t) versus t for B > 0, B′ = 0; for B > B′ > 0 with β = 2ω0; and for B > 0, B′ < 0 with β = 1.5ω0]

9.1.4 Damped oscillator trajectories in phase space


It is instructive to visualize the various cases for solutions to the harmonic oscillator problem in phase space. As introduced in section 7.2, the phase space for the simple harmonic oscillator is a plane with coordinates x and p = mv = mẋ, i.e., the momentum is proportional to the velocity of the oscillator.
On the left side of the figure below, the case β = 0 reproduces the closed elliptical trajectories we have seen in section 7.2. Various initial conditions lead to trajectories with different amplitudes.


[Figure: phase-space trajectories in the (x, p) plane for β = 0 (left), 0 < β < ω0 (center), and β > ω0 (right)]

In the center, the underdamped case is illustrated for a single initial condition with a given amplitude and velocity. The trajectory is a logarithmic spiral, converging into the origin x = 0, p = 0 for t → ∞.
On the right side, a few trajectories for the overdamped case are shown, with a fixed value β = 1.3ω0. The two dashed lines correspond to solutions of type (427) with either B′ = 0 or B = 0 (fast decay only or slow decay only). Along these lines, the ratio ẋ/x between velocity and position is fixed and given by the respective decay rate, −β − ω2 or −β + ω2. The trajectories first follow a direction corresponding to the faster decay rate −β − ω2. After some time, this contribution has become much smaller than the one with the slower decay rate −β + ω2, which then dominates how the trajectory approaches the origin of the phase space.
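The behavior of the overdamped phase-space trajectories can be reproduced from (427) directly: the ratio ẋ/x starts at a value set by the initial condition and approaches the slow rate −β + ω2 once the fast term has died out. A sketch with illustrative amplitudes, using β = 1.3ω0 as in the figure and unit mass; the function name is hypothetical.

```python
import math

def overdamped(t, b_fast, b_slow, beta, omega0):
    """x(t) and xdot(t) from Eq. (427); b_fast = B multiplies e^{-(beta+omega2)t}, b_slow = B'."""
    omega2 = math.sqrt(beta**2 - omega0**2)      # requires beta > omega0
    r_fast, r_slow = -(beta + omega2), -(beta - omega2)
    x = b_fast * math.exp(r_fast * t) + b_slow * math.exp(r_slow * t)
    v = b_fast * r_fast * math.exp(r_fast * t) + b_slow * r_slow * math.exp(r_slow * t)
    return x, v

beta, omega0, m = 1.3, 1.0, 1.0                  # beta = 1.3 omega0 as in the figure
for t in (0.0, 1.0, 3.0, 10.0):
    x, v = overdamped(t, b_fast=2.0, b_slow=0.5, beta=beta, omega0=omega0)
    print(f"t = {t:5.1f}  x = {x:+.5f}  p = {m * v:+.5f}  xdot/x = {v / x:+.4f}")
print("slow rate -beta + omega2 =", -beta + math.sqrt(beta**2 - omega0**2))
```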

9.2 Harmonically driven harmonic oscillator


So far, we have only considered time-independent equations of motion, which
are adequate for closed systems. However, physical systems are often driven by
external forces, so it is important to know how the system responds. Staying with
the (damped) harmonic oscillator, we first consider an external force Fext (t) =
F0 cos(ωt) with a harmonic time dependence:

$$\ddot{x} + 2\beta\dot{x} + \omega_0^2 x = A\cos(\omega t) \quad\text{with}\quad A = \frac{F_0}{m} \,. \tag{428}$$
This is a so-called inhomogeneous linear differential equation, with the inhomogeneity on the right side of the equation. The solution x(t) of such a differential equation is the sum of a complementary solution xc(t) to the homogeneous differential equation, as presented earlier in (424), (425), or (427), and a particular solution xp(t) that takes care of the sinusoidal driving part. We first solve this problem with a real-valued ansatz, and later with a complex-valued one. The first approach guarantees a real-valued solution, but the latter is algebraically simpler, and easier to derive and remember.


9.2.1 Solution with a real-valued ansatz


The inhomogeneity oscillates with a fixed (angular) frequency ω, so we expect a
periodic solution x(t) with the same angular frequency. Similar to the ansatz in
section 2.3.3, we allow for a phase shift δ and try
$$x(t) = B\cos(\omega t - \delta) \,. \tag{429}$$
Direct differentiation and inserting into (428) leads to
$$-B\omega^2\cos(\omega t - \delta) - 2\beta B\omega\sin(\omega t - \delta) + \omega_0^2 B\cos(\omega t - \delta) = A\cos(\omega t) \,. \tag{430}$$
This can be sorted into terms with sin and cos and rearranged:
$$(\omega_0^2 - \omega^2)\cos(\omega t - \delta) - 2\beta\omega\sin(\omega t - \delta) = \frac{A}{B}\cos(\omega t) \,. \tag{431}$$
The oscillating terms with the phase shift δ on the left side can be expanded into
an oscillating and a static part:
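$$\begin{aligned} \cos(\omega t - \delta) &= \cos\omega t\cos\delta + \sin\omega t\sin\delta \\ \sin(\omega t - \delta) &= \sin\omega t\cos\delta - \cos\omega t\sin\delta \end{aligned} \tag{432}$$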
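With this, (431) can be arranged into terms proportional to sin ωt and cos ωt:
$$\left[(\omega_0^2 - \omega^2)\cos\delta + 2\beta\omega\sin\delta\right]\cos\omega t + \left[(\omega_0^2 - \omega^2)\sin\delta - 2\beta\omega\cos\delta\right]\sin\omega t = \frac{A}{B}\cos\omega t \,. \tag{433}$$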
Since this equation needs to hold at all times, the amplitudes of the sin and cos components on both sides must be the same, which leads to the two equations
$$(\omega_0^2 - \omega^2)\cos\delta + 2\beta\omega\sin\delta = \frac{A}{B} \,, \quad\text{and} \tag{434}$$
$$(\omega_0^2 - \omega^2)\sin\delta - 2\beta\omega\cos\delta = 0 \,. \tag{435}$$
Equation (435) leads to an expression for the phase shift,
$$\tan\delta = \frac{2\beta\omega}{\omega_0^2 - \omega^2} \,, \tag{436}$$
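which provides expressions for sin δ and cos δ:
$$\sin\delta = \frac{2\beta\omega}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\omega^2\beta^2}} \quad\text{and}\quad \cos\delta = \frac{\omega_0^2 - \omega^2}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\omega^2\beta^2}} \,. \tag{437}$$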
Those can be used to find a relationship between the amplitude A of the driving acceleration and the amplitude B of the resulting oscillation from (434):
$$\frac{B}{A} = \frac{1}{(\omega_0^2 - \omega^2)\cos\delta + 2\beta\omega\sin\delta} = \frac{1}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\omega^2\beta^2}} \tag{438}$$
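The results (436) to (438) can be checked by inserting them back into (428). A minimal sketch with illustrative parameter values and hypothetical function names; using `atan2` instead of a plain arctangent of (436) is a design choice that keeps δ between 0 and π also above resonance, consistent with the signs in (437).

```python
import math

def driven_response_real(omega, beta, omega0):
    """Phase shift delta, Eqs. (436)/(437), and amplitude ratio B/A, Eq. (438)."""
    # atan2 keeps 0 <= delta <= pi, also above resonance where omega0^2 - omega^2 < 0
    delta = math.atan2(2 * beta * omega, omega0**2 - omega**2)
    ratio = 1.0 / math.sqrt((omega0**2 - omega**2)**2 + 4 * omega**2 * beta**2)
    return delta, ratio

def residual(t, omega, beta, omega0, A=1.0):
    """Left minus right side of Eq. (428) for the ansatz x(t) = B cos(omega t - delta)."""
    delta, ratio = driven_response_real(omega, beta, omega0)
    B = ratio * A
    x = B * math.cos(omega * t - delta)
    xd = -B * omega * math.sin(omega * t - delta)
    xdd = -B * omega**2 * math.cos(omega * t - delta)
    return xdd + 2 * beta * xd + omega0**2 * x - A * math.cos(omega * t)

print(driven_response_real(2.5, 0.3, 2.0))                            # driving above resonance
print(max(abs(residual(t, 2.5, 0.3, 2.0)) for t in (0.0, 0.7, 1.9)))   # zero up to rounding
```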
Before we discuss the result, we show that by using complex amplitudes, the prob-
lem can be substantially simplified, without using relations for the trigonometric
functions.


9.2.2 Solution with a complex-valued driving term


We first re-write the inhomogeneous differential equation (428) into one with a
complex-valued inhomogeneity:
$$\ddot{x} + 2\beta\dot{x} + \omega_0^2 x = A e^{i\omega t} \,. \tag{439}$$
Inserting the complex-valued ansatz
$$x_p(t) = B e^{i\omega t} \tag{440}$$
into the differential equation (439) gives
$$-\omega^2 B e^{i\omega t} + 2i\beta\omega B e^{i\omega t} + \omega_0^2 B e^{i\omega t} = A e^{i\omega t} \,, \tag{441}$$
leading to the algebraic equation
$$-\omega^2 + 2i\beta\omega + \omega_0^2 = \frac{A}{B} \tag{442}$$
or
$$\frac{1}{(\omega_0^2 - \omega^2) + 2i\beta\omega} = \frac{B}{A} =: \chi(\omega) \,. \tag{443}$$
Here, we defined the complex susceptibility χ(ω) which specifies the ratio between
the amplitude B of the oscillator response, and the amplitude A of the driving
oscillation. The modulus of the complex susceptibility is easy to calculate,
$$|\chi(\omega)| = \frac{1}{|(\omega_0^2 - \omega^2) + 2i\beta\omega|} = \frac{1}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\beta^2\omega^2}} \tag{444}$$

and reproduces the result (438) for the amplitude ratio for a system driven by
a real-valued harmonic inhomogeneity. Similarly, the argument arg(χ) of the
complex susceptibility reflects the phase shift between complex amplitudes B
and A. For this, we remember that a complex number z can be written as
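$$z = |z|\,e^{i\arg(z)} \quad\text{with}\quad \tan(\arg(z)) = \frac{\mathrm{Im}[z]}{\mathrm{Re}[z]} \,. \tag{445}$$
With 1/(a + ib) = (a − ib)/(a² + b²), we find from (443)
$$\tan(\arg(\chi)) = \frac{-2\beta\omega}{\omega_0^2 - \omega^2} = \tan(-\delta) \,, \tag{446}$$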
which reproduces the phase shift δ of the real-valued expression (436). The minus sign in front of the phase shift δ reflects the fact that we can write
$$x_p(t) = \chi(\omega) A e^{i\omega t} = A|\chi(\omega)|e^{-i\delta}e^{i\omega t} = A|\chi(\omega)|e^{i(\omega t - \delta)} \,, \tag{447}$$
where a positive δ reflects a phase lag of the response with respect to the driving acceleration.
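The equivalence of the complex and the real-valued approach can be confirmed numerically: |χ| should reproduce (438), and −arg χ the phase shift (436). A small sketch with illustrative parameters; `cmath.phase` returns the argument in the interval (−π, π].

```python
import cmath
import math

def chi(omega, beta, omega0):
    """Complex susceptibility, Eq. (443)."""
    return 1.0 / complex(omega0**2 - omega**2, 2 * beta * omega)

omega, beta, omega0 = 2.5, 0.3, 2.0          # illustrative values, driving above resonance
c = chi(omega, beta, omega0)
delta_complex = -cmath.phase(c)              # Eq. (446): arg(chi) = -delta
delta_real = math.atan2(2 * beta * omega, omega0**2 - omega**2)                    # Eq. (436)
ratio_real = 1.0 / math.sqrt((omega0**2 - omega**2)**2 + 4 * beta**2 * omega**2)   # Eq. (438)
print(abs(c), ratio_real)
print(delta_complex, delta_real)
```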


9.2.3 Transients in harmonically driven oscillators


As mentioned before, the solution x(t) to the harmonically driven oscillator is a linear combination of the particular solution we just obtained, and the complementary solution to the homogeneous differential equation:
$$x(t) = x_c(t) + x_p(t) \,. \tag{448}$$

The complementary solution allows taking care of the initial conditions of the
system, because the particular solution leaves no freedom to do so. For a time t
long after the instant where the initial conditions were defined, the complemen-
tary solution xc (t) will have decayed for the damped harmonic oscillator, and the
particular solution xp (t) will dominate the response. The complementary solution
leads to a so-called transient, shown in the figure below for two examples:

[Figure: transients for ω > ω0 > β (left) and ω0 > ω > β (right), each showing x(t) together with xc(t) and xp(t)]
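The decay of the transient can be made visible numerically: integrate (428) from rest with a Runge-Kutta scheme and subtract the stationary solution xp(t); what remains is xc(t), whose envelope should shrink like e^{−βt}. A sketch assuming illustrative parameters with ω > ω0 > β; all names are hypothetical.

```python
import math

BETA, OMEGA0, A, OMEGA = 0.2, 2.0, 1.0, 3.0     # illustrative values with omega > omega0 > beta

def simulate_from_rest(t_end=40.0, steps=20000):
    """RK4 integration of Eq. (428) with x(0) = xdot(0) = 0; returns (t, x) samples."""
    def deriv(t, x, v):
        return v, A * math.cos(OMEGA * t) - 2 * BETA * v - OMEGA0**2 * x
    dt = t_end / steps
    t, x, v, samples = 0.0, 0.0, 0.0, []
    for i in range(steps):
        k1 = deriv(t, x, v)
        k2 = deriv(t + dt / 2, x + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, x + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = deriv(t + dt, x + dt * k3[0], v + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t = (i + 1) * dt
        samples.append((t, x))
    return samples

def x_p(t):
    """Stationary solution B cos(omega t - delta) from Eqs. (436) and (438)."""
    delta = math.atan2(2 * BETA * OMEGA, OMEGA0**2 - OMEGA**2)
    B = A / math.sqrt((OMEGA0**2 - OMEGA**2)**2 + 4 * BETA**2 * OMEGA**2)
    return B * math.cos(OMEGA * t - delta)

samples = simulate_from_rest()
for t, x in samples[1999::4000]:
    # x - x_p is the transient x_c(t), whose envelope decays like e^{-beta t}
    print(f"t = {t:4.1f}   |x - x_p| = {abs(x - x_p(t)):.2e}   e^(-beta t) = {math.exp(-BETA * t):.2e}")
```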

9.2.4 Stationary solution of the driven harmonic oscillator


After the transients of the solution x(t) have decayed, the system assumes the stationary solution xp(t), which is, apart from an amplitude, completely characterized by the complex susceptibility χ(ω) from (443). With
$$\chi(\omega = 0) = \frac{1}{\omega_0^2} =: \chi_0 \tag{449}$$
and the customary definition of the so-called quality factor
$$Q := \frac{\omega_0}{2\beta} \,, \tag{450}$$
which normalizes the damping factor β to the resonance frequency ω0 of the undamped oscillator, the behavior of the susceptibility at different frequencies can be discussed in a generic way. The modulus of the susceptibility, normalized to χ0, and the phase shift δ = −arg[χ(ω)] are shown in the figure below for different values of Q.


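[Figure: top, modulus |χ(ω)| versus ω for Q = 1, 2, 3, 5 and Q → ∞, with the levels χ0 and 5χ0 and the frequencies ωR (for Q = 1) and ω0 marked; bottom, phase −arg[χ(ω)] rising from 0 through π/2 at ω0 towards π, for Q = 1 and Q → ∞]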

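For weak enough damping (2β² < ω0², i.e., Q > 1/√2), the modulus |χ(ω)| rises from χ0 with increasing frequency to a single maximum at the resonance frequency ω = ωR. The maximum is found by differentiating (444),
$$\begin{aligned} \frac{\partial|\chi(\omega)|}{\partial\omega} &= \frac{-1/2}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\beta^2\omega^2}^{\,3}}\left[2(\omega_0^2 - \omega^2)(-2\omega) + 8\omega\beta^2\right] \\ &= \frac{-2\omega}{\sqrt{\cdots}^{\,3}}\left[\omega^2 - \omega_0^2 + 2\beta^2\right] \overset{!}{=} 0 \,, \end{aligned} \tag{451}$$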
revealing the maximum at
$$\omega = \sqrt{\omega_0^2 - 2\beta^2} =: \omega_R \,. \tag{452}$$
This amplitude resonance frequency ωR, the damped free oscillation frequency ω1, and the undamped oscillator frequency ω0 obey the relation
$$\omega_0 > \omega_1 > \omega_R \,. \tag{453}$$
At ω = ω0, the phase shift δ between driving force and oscillation amplitude has increased from 0 at ω = 0 to δ = π/2, i.e., on resonance the response of the oscillator lags a quarter period behind the driving force. The maximal value of the susceptibility modulus at the amplitude resonance ωR is
$$\begin{aligned} |\chi(\omega_R)| &= \frac{1}{\sqrt{(\omega_0^2 - \omega_R^2)^2 + 4\beta^2\omega_R^2}} = \frac{1}{\sqrt{4\beta^4 + 4\beta^2(\omega_0^2 - 2\beta^2)}} = \frac{1}{2\beta\sqrt{\omega_0^2 - \beta^2}} = \frac{1}{2\beta\omega_1} \\ &= \frac{\chi_0\,\omega_0^2}{2\beta\omega_0\sqrt{1 - \beta^2/\omega_0^2}} = \frac{\chi_0 Q}{\sqrt{1 - 1/(4Q^2)}} \,. \end{aligned} \tag{454}$$
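The location (452) and height (454) of the resonance peak can be checked by a brute-force search over a fine frequency grid. A minimal sketch, assuming ω0 = 1 and values of Q above 1/√2 so that a maximum at ωR > 0 exists; the grid and names are illustrative.

```python
import math

def chi_abs(omega, beta, omega0):
    """Modulus of the susceptibility, Eq. (444)."""
    return 1.0 / math.sqrt((omega0**2 - omega**2)**2 + 4 * beta**2 * omega**2)

OMEGA0 = 1.0
CHI0 = 1.0 / OMEGA0**2                          # Eq. (449)
GRID = [k * 1e-5 for k in range(1, 200001)]     # frequencies 0 < omega <= 2 omega0

def check_peak(Q):
    """Brute-force maximum of |chi| next to the predictions of Eqs. (452) and (454)."""
    beta = OMEGA0 / (2 * Q)                                        # Eq. (450)
    omega_peak = max(GRID, key=lambda w: chi_abs(w, beta, OMEGA0))
    omega_r = math.sqrt(OMEGA0**2 - 2 * beta**2)                   # Eq. (452), needs Q > 1/sqrt(2)
    peak_formula = CHI0 * Q / math.sqrt(1 - 1 / (4 * Q**2))        # Eq. (454)
    return omega_peak, omega_r, chi_abs(omega_peak, beta, OMEGA0), peak_formula

for Q in (1.0, 2.0, 5.0):
    print(Q, check_peak(Q))
```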

For a weakly damped oscillator, Q ≫ 1, so
$$\frac{|\chi(\omega_R)|}{\chi_0} \approx Q \,, \tag{455}$$
i.e., the resonance amplitude is increased by a factor of Q compared to the low-frequency response at ω = 0. For ω ≫ ω0,
$$|\chi(\omega)| \approx \frac{1}{\omega^2} \,, \tag{456}$$
and the phase lag approaches δ = π, i.e., the response x(t) of the driven system is opposed to the driving force. The 1/ω² dependence is, e.g., used to damp out high-frequency vibrations on optical tables by making the table/suspension system an oscillator with a very low resonance frequency.
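The resonance peak gets more pronounced for larger values of Q. To see this, we introduce a detuning ∆ := ω − ω0 from the resonance, and approximate the susceptibility for |∆| ≪ ω0:
$$\begin{aligned} |\chi(\omega)| &= \frac{1}{\sqrt{(\omega + \omega_0)^2(\omega - \omega_0)^2 + 4\beta^2\omega^2}} = \frac{1}{\sqrt{(2\omega_0 + \Delta)^2\Delta^2 + \omega_0^2(\omega_0 + \Delta)^2/Q^2}} \\ &\approx \frac{1}{\sqrt{4\omega_0^2\Delta^2 + \omega_0^4/Q^2}} = \frac{Q}{\omega_0^2\sqrt{4Q^2\Delta^2/\omega_0^2 + 1}} = \frac{\chi_0 Q}{\sqrt{4Q^2\Delta^2/\omega_0^2 + 1}} \,. \end{aligned} \tag{457}$$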

This is the amplitude version of a so-called Lorentz profile, which governs resonance phenomena in many areas of physics, including the spectral lines of atoms and molecules.
One can assign a width ∆ω to the resonance, defined as the frequency range where the average energy stored in the resonator exceeds half of the maximal energy on resonance. Since the stored energy is proportional to x² and therefore to |χ|², we find the condition
$$\left|\chi\!\left(\omega_0 \pm \frac{\Delta\omega}{2}\right)\right| = \frac{\chi_0 Q}{\sqrt{2}} \quad\Rightarrow\quad \Delta\omega = \frac{\omega_0}{Q} \,. \tag{458}$$
[Figure: Lorentz profile |χ|/χ0 versus the detuning ∆ = ω − ω0, with the full width ∆ω taken at the level Q/√2]
The maximum of the modulus of the susceptibility is located at
$$\omega_R = \sqrt{\omega_0^2 - 2\beta^2} = \omega_0\sqrt{1 - \frac{1}{2Q^2}} \approx \omega_0 - \frac{\omega_0}{4Q^2} = \omega_0 - \frac{\Delta\omega}{4Q} \,, \tag{459}$$
i.e., for large Q the separation of the peak from ω0 is much smaller than the width ∆ω of the resonance line. Examples of mechanical resonators with Q = 10⁴ … 10⁶ are the quartz crystals forming the time base of modern clocks.
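The width relation (458) can be checked by measuring, on a fine grid, the frequency band where |χ|² (proportional to the stored energy) exceeds half of its maximum. This sketch uses the exact susceptibility (444) rather than the approximation (457), so small deviations of order 1/Q² from ω0/Q are expected for moderate Q. Names and grid parameters are illustrative.

```python
import math

def chi_abs(omega, beta, omega0):
    """Modulus of the susceptibility, Eq. (444)."""
    return 1.0 / math.sqrt((omega0**2 - omega**2)**2 + 4 * beta**2 * omega**2)

def half_energy_width(Q, omega0=1.0, n=200000):
    """Width of the frequency band in which |chi|^2 exceeds half of its maximum."""
    beta = omega0 / (2 * Q)                                   # Eq. (450)
    grid = [2 * omega0 * k / n for k in range(1, n + 1)]      # 0 < omega <= 2 omega0
    energy = [chi_abs(w, beta, omega0)**2 for w in grid]      # stored energy is proportional to |chi|^2
    threshold = max(energy) / 2
    inside = [w for w, e in zip(grid, energy) if e >= threshold]
    return inside[-1] - inside[0]

for Q in (5.0, 20.0, 100.0):
    print(Q, half_energy_width(Q), 1.0 / Q)     # measured width vs. omega0/Q from Eq. (458)
```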

9.3 Arbitrarily driven harmonic oscillator


In the previous section, we considered the harmonic oscillator with a sinusoidal
or harmonic inhomogeneity, because of the simple temporal derivatives. The
particular solution for the differential equation (439) with an inhomogeneity Ae^{iωt} was
$$x_p(t) = \chi(\omega) A e^{i\omega t} \,, \tag{460}$$
with the complex susceptibility χ(ω) in (443). We now consider the more general equation of motion
$$\ddot{x} + 2\beta\dot{x} + \omega_0^2 x = a(t) \tag{461}$$
and start with an inhomogeneity that is a superposition of two harmonic terms,

$$a(t) = a_1 e^{i\Omega_1 t} + a_2 e^{i\Omega_2 t} \,. \tag{462}$$

Note that the frequencies Ω1,2 can be arbitrary. Due to the linearity of the
differential equation, the solution is the superposition of the solutions for the


individual harmonic driving terms,

$$x_p(t) = \chi(\Omega_1) a_1 e^{i\Omega_1 t} + \chi(\Omega_2) a_2 e^{i\Omega_2 t} \,. \tag{463}$$

This also holds for a more general superposition and its corresponding solution,

$$a(t) = \sum_{n=-\infty}^{\infty} a_n e^{i\Omega_n t} \quad\longrightarrow\quad x_p(t) = \sum_{n=-\infty}^{\infty} a_n \chi(\Omega_n) e^{i\Omega_n t} \,. \tag{464}$$

For Ωn = nΩ0, the sum in a(t) is referred to as a Fourier series¹⁵. One can show that every square-integrable periodic function a(t) on an interval [0, 2π/Ω0[ can be expressed in this way¹⁶, and the coefficients an are uniquely determined by
$$a_n = \frac{\Omega_0}{2\pi}\int_0^{2\pi/\Omega_0} a(t)\, e^{-in\Omega_0 t}\, dt \,. \tag{465}$$
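As an illustration of (464) and (465), the sketch below computes the Fourier coefficients of an assumed square-wave drive (+1 for the first half period, −1 for the second) with a midpoint-rule sum, and then builds the stationary response from the truncated series. For this square wave, a_n = −2i/(πn) for odd n and 0 for even n, which the numerical coefficients should reproduce. The drive, the truncation NMAX and the oscillator parameters are assumptions for illustration.

```python
import cmath
import math

OMEGA_BASE = 1.0                      # fundamental drive frequency Omega_0 (illustrative value)

def square_wave(t):
    """Illustrative periodic drive: +1 in the first half period, -1 in the second."""
    phase = (t * OMEGA_BASE / (2 * math.pi)) % 1.0
    return 1.0 if phase < 0.5 else -1.0

def fourier_coefficient(a, n, samples=2048):
    """a_n from Eq. (465), midpoint rule over one period [0, 2 pi/Omega_0)."""
    period = 2 * math.pi / OMEGA_BASE
    dt = period / samples
    total = sum(a((k + 0.5) * dt) * cmath.exp(-1j * n * OMEGA_BASE * (k + 0.5) * dt)
                for k in range(samples))
    return total * dt / period

def chi(omega, beta=0.1, omega0=3.0):
    """Complex susceptibility, Eq. (443); beta and omega0 are illustrative values."""
    return 1.0 / complex(omega0**2 - omega**2, 2 * beta * omega)

NMAX = 41
coeffs = {n: fourier_coefficient(square_wave, n) for n in range(-NMAX, NMAX + 1)}

def periodic_response(t):
    """Stationary response x_p(t) to the square-wave drive, Eq. (464) truncated at |n| <= NMAX."""
    return sum(a_n * chi(n * OMEGA_BASE) * cmath.exp(1j * n * OMEGA_BASE * t)
               for n, a_n in coeffs.items())

print(coeffs[1], -2j / math.pi)
print(periodic_response(0.0), periodic_response(2 * math.pi))
```

Because a(t) is real, a₋ₙ is the complex conjugate of aₙ, and together with χ(−ω) = χ(ω)* this makes the synthesized xp(t) real up to rounding.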

For non-periodic functions a(t), a similar transformation exists:
$$a(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde{a}(\omega)\, e^{i\omega t}\, d\omega =: \mathcal{F}^{-1}[\tilde{a}(\omega)] \,, \tag{466}$$
which is the definition of the inverse Fourier transformation F⁻¹ for the continuous frequency distribution ã(ω). This frequency distribution can be obtained via the direct Fourier transformation F,
$$\tilde{a}(\omega) = \mathcal{F}[a(t)] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} a(t)\, e^{-i\omega t}\, dt \,. \tag{467}$$

The particular solution for an arbitrary inhomogeneity a(t) is therefore given by

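$$x_p(t) = \mathcal{F}^{-1}[\tilde{x}(\omega)] = \mathcal{F}^{-1}[\chi(\omega)\tilde{a}(\omega)] = \mathcal{F}^{-1}\left[\chi(\omega)\,\mathcal{F}[a(t)]\right] \,. \tag{468}$$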

This is a rather simple procedure: First, the Fourier transformation of the inhomogeneity a(t) is calculated, resulting in a frequency distribution ã(ω). The result is multiplied by the complex susceptibility χ(ω), and the product is transformed back to obtain xp(t).
This approach also takes care of the initial conditions x(t → −∞) = 0 and
ẋ(t → −∞) = 0, so no complementary solution needs to be added.
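The recipe (468) can be tested on a Gaussian driving pulse a(t) = exp(−t²/2s²), whose transform with the symmetric convention (467) is ã(ω) = s exp(−s²ω²/2). The sketch below evaluates the back transformation as a Riemann sum over ω and compares the result with a direct Runge-Kutta integration of (461), started from rest well before the pulse. The pulse shape, parameters and function names are illustrative assumptions.

```python
import cmath
import math

BETA, OMEGA0, S = 0.3, 2.0, 1.0       # illustrative damping, resonance and pulse width

def a(t):
    """Assumed Gaussian driving pulse a(t) = exp(-t^2 / (2 S^2))."""
    return math.exp(-t * t / (2 * S * S))

def a_tilde(w):
    """Its Fourier transform with the symmetric convention of Eq. (467)."""
    return S * math.exp(-(S * w)**2 / 2)

def chi(w):
    return 1.0 / complex(OMEGA0**2 - w**2, 2 * BETA * w)

def x_fourier(t, w_max=12.0, dw=2e-3):
    """x_p(t) = F^-1[chi * a_tilde], Eq. (468), as a Riemann sum over omega."""
    n = round(w_max / dw)
    total = sum(chi(k * dw) * a_tilde(k * dw) * cmath.exp(1j * k * dw * t) for k in range(-n, n + 1))
    return (total * dw / math.sqrt(2 * math.pi)).real

def x_rk4(t_end, t_start=-10.0, dt=1e-3):
    """Direct RK4 integration of Eq. (461) from rest at t_start, where a(t) is negligible."""
    def deriv(t, x, v):
        return v, a(t) - 2 * BETA * v - OMEGA0**2 * x
    steps = round((t_end - t_start) / dt)
    dt = (t_end - t_start) / steps
    t, x, v = t_start, 0.0, 0.0
    for i in range(steps):
        k1 = deriv(t, x, v)
        k2 = deriv(t + dt / 2, x + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = deriv(t + dt / 2, x + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = deriv(t + dt, x + dt * k3[0], v + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t = t_start + (i + 1) * dt
    return x

for t in (-1.0, 0.0, 2.0, 5.0):
    print(t, x_fourier(t), x_rk4(t))
```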

¹⁵ After Joseph Fourier, 1768-1830.
¹⁶ The right side of this interval is open to avoid problems with δ-functions at one of the interval limits; Fourier transformations also work for these rather strange functions.


9.3.1 Examples of Fourier transformations


A commonly encountered function is a rectangular step function, with a height
h and a width b, defined by
$$a(t) = \begin{cases} h & \text{for } |t| < b/2, \\ 0 & \text{elsewhere.} \end{cases} \tag{469}$$

The Fourier transform of a(t) can directly be calculated:

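$$\begin{aligned} \mathcal{F}[a(t)] = \tilde{a}(\omega) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} a(t)\,e^{-i\omega t}\,dt = \frac{h}{\sqrt{2\pi}}\int_{-b/2}^{b/2} e^{-i\omega t}\,dt \\ &= \frac{h}{\sqrt{2\pi}}\,\frac{1}{-i\omega}\left(e^{-i\omega b/2} - e^{+i\omega b/2}\right) = \frac{h}{\sqrt{2\pi}}\,\frac{2\sin(\omega b/2)}{\omega} \\ &= \frac{hb}{\sqrt{2\pi}}\,\frac{\sin(\omega b/2)}{\omega b/2} = \frac{hb}{\sqrt{2\pi}}\,\mathrm{sinc}(\omega b/2) \,. \end{aligned} \tag{470}$$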
The characteristic width of the real-valued function ã(ω) in frequency space is
inversely proportional to the width in real space:

[Figure: rectangular pulse a(t) between t = −b/2 and b/2 (left), and its Fourier transform ã(ω) with zeros at ω = ±2π/b (right)]
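The result (470) can be confirmed by evaluating the Fourier integral numerically with the midpoint rule. The values h = 1 and b = 2 (so that the first zero lies at ω = 2π/b = π) and the function names are illustrative.

```python
import cmath
import math

def rect_transform_numeric(w, h=1.0, b=2.0, n=20000):
    """(1/sqrt(2 pi)) times the integral of h e^{-i w t} over |t| < b/2, midpoint rule."""
    dt = b / n
    total = sum(cmath.exp(-1j * w * (-b / 2 + (k + 0.5) * dt)) for k in range(n))
    return h * total * dt / math.sqrt(2 * math.pi)

def rect_transform_exact(w, h=1.0, b=2.0):
    """(h b / sqrt(2 pi)) sinc(w b/2), Eq. (470), with sinc(0) = 1."""
    x = w * b / 2
    sinc = 1.0 if x == 0 else math.sin(x) / x
    return h * b / math.sqrt(2 * math.pi) * sinc

for w in (0.0, 1.0, math.pi, 5.0):
    print(w, rect_transform_numeric(w), rect_transform_exact(w))
```

The numerical values come out with a vanishing imaginary part, as expected for a pulse that is symmetric in t.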

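Another example is the cosine function,
$$a(t) = \cos(\omega_0 t) = \frac{1}{2}\left(e^{i\omega_0 t} + e^{-i\omega_0 t}\right) \,, \tag{471}$$
with the rather simple Fourier transform
$$\tilde{a}(\omega) = \sqrt{\frac{\pi}{2}}\left(\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right) \,, \tag{472}$$
which specializes for ω0 = 0 into
$$a(t) = 1 \;\leftrightarrow\; \tilde{a}(\omega) = \sqrt{2\pi}\,\delta(\omega) \,, \tag{473}$$
with the Dirac delta function δ(ω).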


9.3.2 Expression of the oscillator response with Green’s function


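While the solution xp(t) for an arbitrary inhomogeneity a(t) is provided by (468), it is not very efficient in practice, because two integrations have to be carried out: one for the Fourier transformation, and one for the back transformation. The expression can be simplified by explicitly writing down both transformations, and then swapping the order of integration:
$$\begin{aligned} x_p(t) &= \mathcal{F}^{-1}\left[\chi(\omega)\,\mathcal{F}[a(t)]\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, e^{i\omega t}\chi(\omega)\int_{-\infty}^{\infty} dt'\, e^{-i\omega t'} a(t') \\ &= \int_{-\infty}^{\infty} dt'\, a(t')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, \chi(\omega)\, e^{i\omega(t - t')}\right] \\ &= \int_{-\infty}^{\infty} dt'\, a(t')\, g(t - t') \quad\text{with}\quad g(\tau) := \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, \chi(\omega)\, e^{i\omega\tau} \,. \end{aligned} \tag{474}$$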

−∞ −∞

For a given physical system like the damped harmonic oscillator, the function g(τ) needs to be evaluated only once, and the result can be used to obtain xp(t) from a(t) by a single integration. The combination of a(t) and g(t) in the form above is referred to as a convolution of the two functions a(t) and g(t); the function g(t) itself is referred to as Green's function¹⁷ of the physical system.
Green’s function has a simple physical interpretation. One rewrites the inho-
mogeneity a(t), and compares it with the response xp (t):
$$a(t) = \int_{-\infty}^{\infty} a(t')\,\delta(t - t')\,dt' \quad\rightarrow\quad x_p(t) = \int_{-\infty}^{\infty} a(t')\,g(t - t')\,dt' \tag{475}$$

Using again the linearity of the differential equation (461), the function g(t − t′) represents the response of the harmonic oscillator to a driving function δ(t − t′), i.e., a delta pulse at time t′. An arbitrary function a(t) can be composed as a
superposition of delta pulses according to the left side of (475). The response
xp (t) of the system is an appropriately weighted superposition of the responses
to these delta pulses. This idea behind Green’s function carries over to many
other areas in physics, especially in electromagnetism.
The task of determining g(t) for the harmonic oscillator can be solved in dif-
ferent ways:
(a) directly by solving the differential equation (461) for a delta-shaped driving
term, or
(b) by carrying out the Fourier transformation of the complex susceptibility
χ(ω) according to the definition of g in (474).
¹⁷ After George Green, 1793-1841.


9.3.3 Green’s function by direct integration


Green’s function is the solution of the inhomogeneous differential equation

$$\ddot{x} + 2\beta\dot{x} + \omega_0^2 x = \delta(t) \,, \tag{476}$$

with initial conditions x(−ε) = 0 and ẋ(−ε) = v(−ε) = 0 for a small ε > 0 before the delta-shaped inhomogeneity. During the short impact at t = 0, the left side of (476) can be approximated by ẍ(t), because the system does not have enough time to build up a significant speed or displacement. Therefore (476) becomes

$$\ddot{x} = \dot{v} \approx \delta(t) \,. \tag{477}$$

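This can be directly integrated over a small region around the delta function,
$$\int_{-\varepsilon}^{+\varepsilon} \dot{v}(t)\,dt = \int_{-\varepsilon}^{+\varepsilon} \delta(t)\,dt = 1 \quad\rightarrow\quad v(\varepsilon) - \underbrace{v(-\varepsilon)}_{=0} = 1 \quad\text{or}\quad v(\varepsilon) = 1 \,. \tag{478}$$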

The solution x(t) for t > ε can therefore be found by solving the homogeneous differential equation with initial conditions x(0) = 0 and ẋ(0) = 1. We have done this already in (424). Assuming the underdamped case β < ω0, we choose the form

$$x(t) = \left[B\cos(\omega_1 t) + B'\sin(\omega_1 t)\right] e^{-\beta t} \,. \tag{479}$$

The initial condition x(0) = 0 implies that B = 0. To meet the second initial
condition, we calculate the speed
$$\dot{x} = B'\left[\cos(\omega_1 t)\,\omega_1 e^{-\beta t} + \sin(\omega_1 t)(-\beta)e^{-\beta t}\right] \quad\text{or}\quad \dot{x}(0) = B'\omega_1 \tag{480}$$

and find B ′ = 1/ω1 , which completes the solution x(t). For t < 0, we demand
x(t) = 0. Therefore, Green’s function for the damped harmonic oscillator is
$$g(t) = \begin{cases} \dfrac{1}{\omega_1}\, e^{-\beta t}\sin(\omega_1 t) & \text{for } t > 0 \,, \\ 0 & \text{for } t \le 0 \,. \end{cases} \tag{481}$$

[Figure: Green's function g(t) of the damped harmonic oscillator]
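Green's function (481) can be put to work directly via the convolution (474). As a test case, this sketch assumes a drive that is switched on at t = 0 and stays constant, a(t) = 1 for t ≥ 0; the corresponding response starting from rest, (1 − e^{−βt}(cos ω1t + (β/ω1) sin ω1t))/ω0², follows from integrating g(t) once. Parameter values and names are illustrative.

```python
import math

BETA, OMEGA0 = 0.4, 2.0                        # illustrative underdamped parameters
OMEGA1 = math.sqrt(OMEGA0**2 - BETA**2)

def g(t):
    """Green's function of the damped oscillator, Eq. (481)."""
    return math.exp(-BETA * t) * math.sin(OMEGA1 * t) / OMEGA1 if t > 0 else 0.0

def convolve(a, t, t_min=0.0, n=20000):
    """x_p(t) = integral of a(t') g(t - t') dt', Eq. (474), by the midpoint rule on [t_min, t]."""
    dt = (t - t_min) / n
    return sum(a(t_min + (k + 0.5) * dt) * g(t - t_min - (k + 0.5) * dt) for k in range(n)) * dt

step = lambda t: 1.0 if t >= 0 else 0.0       # example drive: constant force switched on at t = 0

def step_response_exact(t):
    """Solution of xddot + 2 beta xdot + omega0^2 x = 1 for t > 0, starting from rest."""
    envelope = math.exp(-BETA * t) * (math.cos(OMEGA1 * t) + BETA / OMEGA1 * math.sin(OMEGA1 * t))
    return (1 - envelope) / OMEGA0**2

for t in (0.5, 2.0, 6.0):
    print(t, convolve(step, t), step_response_exact(t))
```

Since a(t′) vanishes for t′ < 0 and g(t − t′) vanishes for t′ > t, integrating over [0, t] covers the whole convolution in this example.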


9.3.4 Green’s function from Fourier transformation of χ(ω)


From the definition in (474), it follows that Green’s function is essentially the
(inverse) Fourier transform of the complex susceptibility:
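$$g(\tau) = \frac{1}{\sqrt{2\pi}}\,\mathcal{F}^{-1}[\chi(\omega)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} \chi(\omega)\,e^{i\omega\tau}\,d\omega =: \frac{1}{2\pi}\int_{-\infty}^{\infty} f(\omega)\,d\omega \tag{482}$$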

Such integrations can be carried out efficiently by making use of a result from complex analysis. Cauchy's residue theorem¹⁸ considers the integral of a function f(z) along a closed path C in the complex plane:

[Figure: closed path C in the complex plane (axes Re[z], Im[z]), with poles zk inside C and further poles outside C]

The theorem states that the integral of f(z) along C (when evaluated in the counterclockwise direction) is given by
$$\oint_C f(z)\,dz = 2\pi i \sum_{z_k} \mathrm{Res}(f, z_k) \,, \tag{483}$$

where zk are the poles of the function f(z) (i.e., locations where f(z) diverges) inside the path C, and Res(f, zk) are the so-called residues of f at the poles zk. Poles outside the path C do not contribute to the integral. The residue at zk is defined as the coefficient f₋₁ in the Laurent expansion

$$f(z) = \sum_{n=-\infty}^{\infty} f_n (z - z_k)^n \tag{484}$$

around the position zk of a pole. This Laurent series¹⁹ is an extension of the Taylor series (where n = 0 … ∞) to functions with a singularity/pole.
To use this theorem, we first extend the kernel of the integral (482) to a
complex-valued parameter z:

$$f(\omega) = \chi(\omega)\,e^{i\omega\tau} \quad\rightarrow\quad f(z) = \frac{e^{iz\tau}}{\omega_0^2 - z^2 + 2iz\beta} \tag{485}$$
¹⁸ After Augustin-Louis Cauchy, French mathematician, 1789-1857.
¹⁹ After P.A. Laurent, French mathematician, 1813-1854.


The denominator of f (z) vanishes at two complex locations


$$z_{a,b} = i\beta \pm \sqrt{\omega_0^2 - \beta^2} = i\beta \pm \omega_1 \,, \tag{486}$$

where f (z) will have poles:

$$f(z) = \frac{-e^{iz\tau}}{(z - z_a)(z - z_b)} \,. \tag{487}$$

The minus sign reflects that the z 2 term in the denominator of (485) is negative.
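The two residues of f are obtained by leaving out the respective divergent factor 1/(z − za) or 1/(z − zb) and evaluating the remaining expression at that pole:
$$\begin{aligned} \mathrm{Res}[f, z_a] &= \frac{-e^{iz_a\tau}}{z_a - z_b} = \frac{-e^{-\beta\tau}e^{i\omega_1\tau}}{2\omega_1} \\ \mathrm{Res}[f, z_b] &= \frac{-e^{iz_b\tau}}{z_b - z_a} = \frac{e^{-\beta\tau}e^{-i\omega_1\tau}}{2\omega_1} \end{aligned} \tag{488}$$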

The location of the poles of f is shown in the diagram below, together with the
integration path from z = −∞ to z = +∞ along the real axis for (482):

[Figure: poles zb = iβ − ω1 and za = iβ + ω1 above the real axis, and the integration path along the real axis from −∞ to +∞]

In the next step, the integration along the real axis needs to be translated into an integration along a closed path C. For this, we construct C from a part C1 along the real axis from z = −R to z = +R, and a semicircle C2 with radius R:

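[Figure: closed path C composed of the segment C1 on the real axis and a semicircle C2 of radius R, closed through the lower half plane for τ < 0 (left) and through the upper half plane for τ > 0 (right); the poles zb and za lie above the real axis]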


For R → ∞, the integral along C1 becomes the integral in (482) for g(τ), and we show that the integral I2 along the semicircle C2 vanishes. To do so, we parameterize C2 by an angle φ and the radius R:
$$z = Re^{i\varphi} \quad\rightarrow\quad I_2 = \int_{C_2} f(z)\,dz = \int_{C_2} f(Re^{i\varphi})\,\frac{dz}{d\varphi}\,d\varphi \,. \tag{489}$$
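To show that I2 vanishes, it is enough to consider the modulus of the integral:
$$\begin{aligned} |I_2| &= \left|\int_{C_2} \frac{-e^{iR(\cos\varphi + i\sin\varphi)\tau}}{(z - z_a)(z - z_b)}\, iRe^{i\varphi}\,d\varphi\right| \\ &\le \int_{C_2} \left|\frac{-e^{iR(\cos\varphi + i\sin\varphi)\tau}}{(z - z_a)(z - z_b)}\, iRe^{i\varphi}\right| d\varphi = \int_{C_2} \frac{R\,e^{-R\tau\sin\varphi}}{|(z - z_a)(z - z_b)|}\,d\varphi \end{aligned} \tag{490}$$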

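For z on the semicircle with a radius R large enough to include the poles,
$$|z - z_{a,b}| \ge R - \sqrt{\omega_1^2 + \beta^2} = R - \omega_0 \,, \tag{491}$$
and therefore
$$|I_2| \le \int_{C_2} \frac{R\,e^{-R\tau\sin\varphi}}{|(z - z_a)(z - z_b)|}\,d\varphi \le \frac{R}{(R - \omega_0)^2}\int_{C_2} e^{-R\tau\sin\varphi}\,d\varphi \,. \tag{492}$$
For τ < 0, we choose the bottom semicircle shown in the left part of the figure. There, sin φ ≤ 0, so e^{−Rτ sin φ} ≤ 1 and
$$|I_2| \le \frac{R}{(R - \omega_0)^2}\int_{C_2} e^{-R\tau\sin\varphi}\,d\varphi \le \frac{R}{(R - \omega_0)^2}\int_{C_2} d\varphi = \frac{R\pi}{(R - \omega_0)^2} \,. \tag{493}$$
The last upper bound for |I2| vanishes for R → ∞, so
$$\lim_{R\to\infty} |I_2| \le \lim_{R\to\infty} \frac{R\pi}{(R - \omega_0)^2} = 0 \quad\Rightarrow\quad \lim_{R\to\infty} I_2 = 0 \,. \tag{494}$$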

There are no poles enclosed by C in the lower half plane, so
$$g(\tau < 0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(\omega)\,d\omega = \frac{1}{2\pi}\lim_{R\to\infty}\int_{C_1} f(z)\,dz = \frac{1}{2\pi}\lim_{R\to\infty}\oint_C f(z)\,dz = 0 \,. \tag{495}$$

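For τ > 0, the same argument shows that the contribution from C2 in the upper half plane vanishes, since there sin φ ≥ 0. The full path C now encloses the two poles at za and zb, so
$$\begin{aligned} g(\tau > 0) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} f(\omega)\,d\omega = \frac{1}{2\pi}\lim_{R\to\infty}\int_{C_1} f(z)\,dz = \frac{1}{2\pi}\lim_{R\to\infty}\oint_C f(z)\,dz \\ &= \frac{1}{2\pi}\,2\pi i\left(\mathrm{Res}[f, z_a] + \mathrm{Res}[f, z_b]\right) = i\left(\frac{-e^{-\beta\tau}e^{i\omega_1\tau}}{2\omega_1} + \frac{e^{-\beta\tau}e^{-i\omega_1\tau}}{2\omega_1}\right) \\ &= \frac{1}{\omega_1}\,e^{-\beta\tau}\sin\omega_1\tau \,. \end{aligned} \tag{496}$$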
This reproduces the result we found by direct integration in (481).
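The residue calculation (486) to (496) can be reproduced numerically: locate the two poles, compute each residue of a simple pole as the numerator divided by the derivative of the denominator, and sum them. A minimal sketch with a hypothetical function name; since it uses complex square roots, the same code also covers the overdamped case β > ω0, where both poles lie on the positive imaginary axis and one expects g(τ) = e^{−βτ} sinh(ω2τ)/ω2.

```python
import cmath
import math

def green_from_residues(tau, beta, omega0):
    """g(tau) via Eq. (483): (1/2 pi) * 2 pi i * sum of the residues in the upper half plane."""
    if tau <= 0:
        return 0.0                       # contour closed below encloses no poles, Eq. (495)
    # poles of f(z) = e^{i z tau} / (omega0^2 - z^2 + 2 i z beta), Eq. (486)
    root = cmath.sqrt(omega0**2 - beta**2)
    poles = (1j * beta + root, 1j * beta - root)
    # simple poles: Res = numerator / D'(z), with D'(z) = -2 z + 2 i beta
    residues = [cmath.exp(1j * z * tau) / (-2 * z + 2j * beta) for z in poles]
    return (1j * sum(residues)).real

beta, omega0 = 0.4, 2.0
omega1 = math.sqrt(omega0**2 - beta**2)
for tau in (-1.0, 0.5, 3.0):
    closed_form = math.exp(-beta * tau) * math.sin(omega1 * tau) / omega1 if tau > 0 else 0.0
    print(tau, green_from_residues(tau, beta, omega0), closed_form)   # compare with Eq. (496)
```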


9.4 Small oscillations


The concept of the harmonic oscillator and its associated solutions in the presence
of damping and external excitation can be found in many physical situations, and
may apply even where a restoring force is only approximately linear in the re-
spective coordinate. We consider an arbitrary potential for a system with a single
degree of freedom, describing the conservative interaction with its environment:

[Figure: potential U(x) with a minimum at x1, a maximum at x2, and a flat region at x3]

The potential U (x) shall have a minimum at position x1 . For small deviations
u from x1 , the potential can be approximated by a Taylor expansion:

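$$U(x) = U(x_1) + \underbrace{\left.\frac{\partial U}{\partial x}\right|_{x=x_1}}_{=0} u + \frac{1}{2}\left.\frac{\partial^2 U}{\partial x^2}\right|_{x=x_1} u^2 + \ldots \,, \quad\text{with}\quad u = x - x_1 \tag{497}$$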

At a minimum of U(x), the second term vanishes by definition, and the potential resembles that of a harmonic oscillator, since the constant offset U(x1) does not change the dynamics of the system.
The minimum is characterized by a positive second derivative of the potential,

$$\left.\frac{\partial^2 U}{\partial x^2}\right|_{x=x_1} > 0 \,, \tag{498}$$

leading to the oscillatory solutions of the harmonic oscillator.


In the presence of a damping force, a system starting near x1 with a low enough velocity will approach x1 asymptotically in time. Therefore, such a position is referred to as a stable equilibrium. For a maximum in the potential energy, like at position x2 in the graph above, the second derivative is negative, which does not lead to an oscillatory motion. While a particle at rest at x2 will stay there for all times, any small deviation will lead to a trajectory that starts with an exponentially growing separation from x2.
As most physical systems experience small fluctuations of forces and some dissipation, systems tend to evolve into a state near a minimum where ∂²U/∂x² > 0, i.e., into a stable equilibrium. A position where the potential energy takes a maximum, or ∂²U/∂x² < 0, is referred to as an unstable equilibrium. To complete the description, a location like x3, where ∂U/∂x = 0 and ∂²U/∂x² = 0 over some extended interval, is referred to as an indifferent equilibrium. It should be pointed out that a minimum or maximum can also be present where the second derivative vanishes. In that case, the sign of the first non-vanishing term in the Taylor expansion of U(x) would determine whether the equilibrium is stable or unstable, but such situations are very rare in practice.
Therefore, the small deviations of a system from a stable equilibrium can often
be mapped to a harmonic oscillator, assuming the kinetic energy is of a quadratic
form in the velocity ẋ. As the restoring force F = −∂U/∂x near a minimum is
linear in the deviation u, this procedure is sometimes referred to as linearization.
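The classification by the sign of the second derivative, and the resulting oscillation frequency ω = √(U″(x1)/m) for small deviations, can be sketched with a finite-difference second derivative. The double-well potential U(x) = x⁴/4 − x²/2 (minima at x = ±1 with U″ = 2, maximum at x = 0 with U″ = −1) and the unit mass are assumed examples, not taken from the notes.

```python
import math

def U(x):
    """Assumed example potential: double well with minima at x = +-1 and a maximum at x = 0."""
    return x**4 / 4 - x**2 / 2

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def classify(f, x, tol=1e-6):
    """Classify an equilibrium of f by the sign of its second derivative, cf. Eq. (498)."""
    k = second_derivative(f, x)
    if k > tol:
        return "stable", k
    if k < -tol:
        return "unstable", k
    return "undecided by the second derivative", k

m = 1.0
for x_eq in (-1.0, 0.0, 1.0):
    kind, k_eff = classify(U, x_eq)
    omega = math.sqrt(k_eff / m) if kind == "stable" else float("nan")
    print(x_eq, kind, k_eff, omega)     # at the minima, omega should be close to sqrt(2)
```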

9.4.1 Example of the plane pendulum


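As a simple example, we revisit the plane pendulum from section 6.2.1:
[Figure: plane pendulum of mass m on a string of length l, suspended at O and deflected by the angle θ in the gravitational field g]
The potential and kinetic energy were given by
$$U = mgl\,(1 - \cos\theta) \,, \qquad T = \frac{1}{2} ml^2\dot{\theta}^2 \,, \tag{499}$$
leading to the nonlinear differential equation (199)
$$\ddot{\theta} + \frac{g}{l}\sin\theta = 0 \,. \tag{500}$$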
A Taylor expansion of the potential near the minimum at θ = 0 leads to
$$U(\theta) \approx \frac{1}{2} mgl\,\theta^2 \,, \tag{501}$$
and an approximate Lagrange function of
$$L = T - U = \frac{1}{2} ml^2\dot{\theta}^2 - \frac{1}{2} mgl\,\theta^2 \,, \tag{502}$$
which after some simplification leads to the equation of motion
$$\ddot{\theta} + \frac{g}{l}\,\theta = 0 \,. \tag{503}$$
This is the equation of motion for the undamped harmonic oscillator, and by comparison, e.g., with (418), one can immediately extract the oscillation frequency
$$\omega_0 = \sqrt{\frac{g}{l}} \,. \tag{504}$$
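How good the linearization (503) is can be judged by comparing the small-oscillation period 2π/ω0 with the exact period of the nonlinear equation (500). This sketch uses the known closed form T = 2π√(l/g)/M(1, cos(θ0/2)), where M is the arithmetic-geometric mean (equivalent to the complete elliptic integral of the first kind); g = 9.81 and l = 1 in SI units are illustrative values, and the function names are hypothetical.

```python
import math

def agm(a, b):
    """Arithmetic-geometric mean of a and b (converges quadratically)."""
    for _ in range(60):
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def pendulum_period(theta0, g=9.81, l=1.0):
    """Period of the nonlinear pendulum (500) with amplitude theta0, via K(k) = pi / (2 agm(1, sqrt(1 - k^2)))."""
    return 2 * math.pi * math.sqrt(l / g) / agm(1.0, math.cos(theta0 / 2))

T_small = 2 * math.pi * math.sqrt(1.0 / 9.81)    # small-oscillation period 2 pi / omega0, Eq. (504)
for theta0_deg in (1, 10, 30, 90):
    ratio = pendulum_period(math.radians(theta0_deg)) / T_small
    print(f"{theta0_deg:3d} deg: T / T_small = {ratio:.6f}")
```

For amplitudes of a few degrees the linearized period is accurate to better than 10⁻³; at 90° the true period is about 18% longer.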


10 Coupled oscillations
So far, we have encountered methods to generate equations of motion for systems of many particles, but have not really solved very complex systems. In this section, we will look into a typical system of many harmonic oscillators. Such a problem may arise from a system of particles near an equilibrium that are coupled together by some localized interactions, and can be approximated by harmonic oscillators as seen in section 9.4.

10.1 Two coupled oscillators


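We demonstrate the strategy to tackle coupled oscillator problems with the example of two masses coupled by springs:
[Figure: two masses m with displacements x1 and x2, attached to fixed walls by springs k1 and k2 and coupled to each other by a spring k12]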

The equations of motion can be obtained in various ways, and form a system
of coupled differential equations:
$$\begin{aligned} m\ddot{x}_1 + k_1 x_1 + k_{12}(x_1 - x_2) &= 0 \\ m\ddot{x}_2 + k_2 x_2 + k_{12}(x_2 - x_1) &= 0 \end{aligned} \tag{505}$$
The coupling means that these are not independent equations of motion for x1 and x2, and we need to find a solution for both variables that satisfies both equations at the same time. As these equations are still linear, we can try the previous trick, and look for complex solutions of the form
$$x_1(t) = b_1 e^{i\omega t} \,, \quad x_2(t) = b_2 e^{i\omega t} \,, \tag{506}$$
with the same oscillation frequency ω for both variables, but different complex amplitudes b1, b2. Inserting these into (505) leads to
$$\begin{aligned} \left[-m\omega^2 b_1 + (k_1 + k_{12})b_1 - k_{12}b_2\right] e^{i\omega t} &= 0 \\ \left[-m\omega^2 b_2 + (k_2 + k_{12})b_2 - k_{12}b_1\right] e^{i\omega t} &= 0 \,. \end{aligned} \tag{507}$$
As before, we can divide by the exponential function, and obtain an algebraic equation determining the oscillation frequency. This time, however, the two equations remain coupled via the amplitudes. The algebraic equation is linear in the amplitudes b1, b2, and can be written in matrix form,
$$\begin{pmatrix} -m\omega^2 + (k_1 + k_{12}) & -k_{12} \\ -k_{12} & -m\omega^2 + (k_2 + k_{12}) \end{pmatrix} \cdot \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = 0 \,, \tag{508}$$


or simply
$$\mathbf{M} \cdot \mathbf{b} = 0 \,, \tag{509}$$
where the symbol M denotes a matrix, and b a vector made up of the two amplitudes. Note that b does not represent a vector in the usual three-dimensional space; it just stores the coefficients. The condition for this matrix equation to have a nontrivial solution b ≠ 0 is
$$\det \mathbf{M} = 0 \,, \tag{510}$$
which leads to the characteristic equation of the linear equation system (508),
$$\left[-m\omega^2 + (k_1 + k_{12})\right]\left[-m\omega^2 + (k_2 + k_{12})\right] - k_{12}^2 = 0 \,. \tag{511}$$
To simplify the subsequent treatment, we now assume that k1 = k2 = k, so the characteristic equation becomes
$$(k + k_{12} - m\omega^2)^2 - k_{12}^2 = 0 \,, \tag{512}$$
which can easily be solved for the two roots for ω,
$$\omega_{1,2} = \sqrt{\frac{k + k_{12} \pm k_{12}}{m}} \,, \quad\text{or}\quad \omega_1 = \sqrt{\frac{k + 2k_{12}}{m}} \,, \quad \omega_2 = \sqrt{\frac{k}{m}} \,. \tag{513}$$
As with the simple harmonic oscillator, we can have solutions with both signs for ω, and the general solution of the system (505) is given by
$$\begin{aligned} x_1(t) &= b_{11}^+ e^{i\omega_1 t} + b_{11}^- e^{-i\omega_1 t} + b_{12}^+ e^{i\omega_2 t} + b_{12}^- e^{-i\omega_2 t} \\ x_2(t) &= b_{21}^+ e^{i\omega_1 t} + b_{21}^- e^{-i\omega_1 t} + b_{22}^+ e^{i\omega_2 t} + b_{22}^- e^{-i\omega_2 t} \,. \end{aligned} \tag{514}$$

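Here, the first index on b labels the mass and the second index labels the eigenfrequency of the system. However, these eight coefficients are not independent: they still need to fulfill (508), which imposes a relation between the amplitudes of the two masses. Inserting the solutions (514) and treating the positive and negative frequency components separately yields the following. Both components lead to the same relation, so the superscripts ± are dropped.
$$\begin{aligned} \text{for } \omega = \omega_1:&\quad -k_{12}\, b_{11} - k_{12}\, b_{21} = 0 \;\Rightarrow\; b_{11} = -b_{21} =: B_1 \,, \\ \text{for } \omega = \omega_2:&\quad +k_{12}\, b_{12} - k_{12}\, b_{22} = 0 \;\Rightarrow\; b_{12} = b_{22} =: B_2 \,. \end{aligned} \tag{515}$$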

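The general solution of the coupled problem is then given by
$$\begin{aligned} x_1(t) &= B_1^+ e^{i\omega_1 t} + B_1^- e^{-i\omega_1 t} + B_2^+ e^{i\omega_2 t} + B_2^- e^{-i\omega_2 t} \\ x_2(t) &= -B_1^+ e^{i\omega_1 t} - B_1^- e^{-i\omega_1 t} + B_2^+ e^{i\omega_2 t} + B_2^- e^{-i\omega_2 t} \,, \end{aligned} \tag{516}$$
with four constants B1+, B1−, B2+, B2− to be fixed by the initial conditions of the problem. This completely determines the motion: (505) is a system of two second-order differential equations, which needs exactly four initial values (two positions and two velocities).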
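The eigenfrequencies (513) and the amplitude relations (515) can be checked numerically. The following sketch uses hypothetical helper names and arbitrary illustrative values for m, k and k12. It evaluates the determinant (511) at the two roots and applies the matrix in (508), with k1 = k2 = k, to the amplitude vectors (1, −1) and (1, 1):

```python
import math


def det_M(omega, m, k1, k2, k12):
    """Determinant of the matrix M in (508) for a trial frequency omega, cf. (511)."""
    a = -m * omega**2 + (k1 + k12)
    d = -m * omega**2 + (k2 + k12)
    return a * d - k12**2


def eigenfrequencies(m, k, k12):
    """Roots (513) for the symmetric case k1 = k2 = k."""
    omega1 = math.sqrt((k + 2 * k12) / m)   # amplitudes b1 = -b2 (antisymmetric)
    omega2 = math.sqrt(k / m)               # amplitudes b1 = +b2 (symmetric)
    return omega1, omega2


def apply_M(omega, b1, b2, m, k, k12):
    """Left-hand side of (508) for k1 = k2 = k; vanishes for an eigenvector."""
    a = -m * omega**2 + (k + k12)
    return a * b1 - k12 * b2, -k12 * b1 + a * b2


# Illustrative values (assumption, not from the notes)
m, k, k12 = 1.0, 4.0, 0.5
w1, w2 = eigenfrequencies(m, k, k12)
print(det_M(w1, m, k, k, k12), det_M(w2, m, k, k, k12))
print(apply_M(w1, 1.0, -1.0, m, k, k12), apply_M(w2, 1.0, 1.0, m, k, k12))
```

Both determinants and both matrix products should vanish up to rounding. This confirms that b11 = −b21 belongs to ω1 and b12 = b22 belongs to ω2.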


10.1.1 Normal coordinates


The system has two oscillation frequencies ω1 and ω2, and both masses participate in oscillations at those two frequencies. One can ask whether there are coordinates that simplify the description of the problem. To see that such coordinates exist, we define
$$\eta_1 := x_1 - x_2 \,, \quad\text{and}\quad \eta_2 := x_1 + x_2 \,, \tag{517}$$

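such that the original coordinates can be obtained again via
$$x_{1,2} = \frac{1}{2}\left(\eta_2 \pm \eta_1\right) . \tag{518}$$
Inserting this into the coupled equations of motion (505) with k1 = k2 = k leads to
$$\begin{aligned} \frac{m}{2}(\ddot\eta_2 + \ddot\eta_1) + \frac{k}{2}(\eta_2 + \eta_1) + k_{12}\,\eta_1 &= 0 \\ \frac{m}{2}(\ddot\eta_2 - \ddot\eta_1) + \frac{k}{2}(\eta_2 - \eta_1) - k_{12}\,\eta_1 &= 0 \,. \end{aligned} \tag{519}$$
This system of equations can be transformed into another one by subtracting and adding the two equations,
$$\begin{aligned} m\ddot\eta_1 + (k + 2k_{12})\,\eta_1 &= 0 \\ m\ddot\eta_2 + k\,\eta_2 &= 0 \,. \end{aligned} \tag{520}$$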

This is now a decoupled system of equations for two independent harmonic oscillators. Their solutions can simply be written down from (69) as
$$\begin{aligned} \eta_1(t) &= C_1^+ e^{i\omega_1 t} + C_1^- e^{-i\omega_1 t} \,, \\ \eta_2(t) &= C_2^+ e^{i\omega_2 t} + C_2^- e^{-i\omega_2 t} \,. \end{aligned} \tag{521}$$

The coordinates η1 and η2 for this system are referred to as normal coordinates, a name that is justified in a later section. By an appropriate choice of initial conditions, one can ensure that only one of the coordinates has a non-trivial solution. For example, if we choose
$$x_1(0) = -x_2(0) \quad\text{and}\quad \dot x_1(0) = -\dot x_2(0) \,, \tag{522}$$
we know from (517) that
$$\eta_2(0) = 0 \quad\text{and}\quad \dot\eta_2(0) = 0 \,, \tag{523}$$
and we will find an oscillation with the single frequency ω1. For this solution, the two amplitudes x1 and x2 are always related via
$$x_1(t) = -x_2(t) \,, \tag{524}$$


i.e., the two masses move out of phase, in an antisymmetric oscillation; occasionally this mode of oscillation is also referred to as a breathing mode. Similarly, if the initial conditions are chosen such that
$$x_1(0) = x_2(0) \quad\text{and}\quad \dot x_1(0) = \dot x_2(0) \,, \tag{525}$$
an oscillation takes place only at the frequency ω2, with
$$x_1(t) = x_2(t) \tag{526}$$
at all times. This mode of oscillation is a symmetric mode, and is sometimes referred to as a common-mode oscillation.
Both oscillation modes with only one frequency can be understood as effective one-variable problems.

In the antisymmetric mode, the center of the middle spring stays at rest. Each mass therefore oscillates independently between its outer spring and one half of the middle spring. Since half of a spring is twice as stiff (spring constant 2k12), the effective spring constant for the single-mass motion is k + 2k12, in agreement with ω1 in (513).

In the symmetric mode, the inner spring is never stretched or compressed, and the only restoring force on the masses is provided by the outer springs. This leads to the same result as for the simple mass/spring system in section 2.3.3.
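The decoupling in (520) can be tested against a direct numerical integration of (505). In this sketch the normal coordinates (517) evolve independently at ω1 and ω2 and are then mapped back with (518). The sketch is our own illustration: the velocity-Verlet integrator, the parameter values and the restriction to masses released from rest are assumptions.

```python
import math


def simulate(x1, x2, v1, v2, m, k, k12, t_end, dt=1e-4):
    """Integrate (505) with k1 = k2 = k by velocity Verlet; returns x1, x2 at t_end."""
    def accel(y1, y2):
        return (-(k * y1 + k12 * (y1 - y2)) / m,
                -(k * y2 + k12 * (y2 - y1)) / m)

    a1, a2 = accel(x1, x2)
    for _ in range(int(round(t_end / dt))):
        v1 += 0.5 * dt * a1
        v2 += 0.5 * dt * a2
        x1 += dt * v1
        x2 += dt * v2
        a1, a2 = accel(x1, x2)
        v1 += 0.5 * dt * a1
        v2 += 0.5 * dt * a2
    return x1, x2


def via_normal_coordinates(x10, x20, m, k, k12, t):
    """Masses released from rest: evolve eta1, eta2 of (517) separately with the
    frequencies of (520), then map back with (518)."""
    omega1 = math.sqrt((k + 2 * k12) / m)
    omega2 = math.sqrt(k / m)
    eta1 = (x10 - x20) * math.cos(omega1 * t)
    eta2 = (x10 + x20) * math.cos(omega2 * t)
    return 0.5 * (eta2 + eta1), 0.5 * (eta2 - eta1)


# Illustrative parameters (assumption, not from the notes)
m, k, k12, t = 1.0, 1.0, 0.5, 3.0
for x10, x20 in [(1.0, -1.0), (1.0, 1.0), (1.0, 0.3)]:
    print(simulate(x10, x20, 0.0, 0.0, m, k, k12, t),
          via_normal_coordinates(x10, x20, m, k, k12, t))
```

With x1(0) = −x2(0), only η1 is excited and x1(t) = −x2(t) holds throughout (the antisymmetric mode). With x1(0) = x2(0), only η2 is excited (the symmetric mode).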

10.1.2 Beat of oscillations


Before we move on to a more general treatment of coupled systems, we consider the special case where the coupling constant k12 between the two masses is much smaller than the other spring constants k. Then, the two oscillation frequencies ω1,2 from (513) become very similar, and one can define
$$\omega_0 := \frac{\omega_1 + \omega_2}{2} \,, \quad\text{and}\quad \Delta := \frac{\omega_1 - \omega_2}{2} \,, \tag{527}$$
with ω0 ≫ ∆. If we further assume the initial conditions
$$x_1(0) = D \,, \quad x_2(0) = 0 \,, \quad \dot x_1(0) = \dot x_2(0) = 0 \,, \tag{528}$$
i.e., the system starts with only one mass displaced from its resting position, the coefficients in (516) become
$$B_1^+ = B_1^- = B_2^+ = B_2^- = \frac{D}{4} \,. \tag{529}$$
The oscillation of mass 1 over time can be written as
$$\begin{aligned} x_1(t) &= \frac{D}{4}\left[e^{i\omega_1 t} + e^{-i\omega_1 t} + e^{i\omega_2 t} + e^{-i\omega_2 t}\right] = \frac{D}{4}\left(2\cos(\omega_1 t) + 2\cos(\omega_2 t)\right) \\ &= D \cos\left(\frac{\omega_1 + \omega_2}{2}\,t\right) \cos\left(\frac{\omega_1 - \omega_2}{2}\,t\right) = D\cos(\omega_0 t)\cos(\Delta t) \,. \end{aligned} \tag{530}$$


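Similarly, one obtains for the oscillation of mass 2 the expression
$$x_2(t) = D \sin(\omega_0 t) \sin(\Delta t) \,. \tag{531}$$
Qualitatively, the oscillation of the system looks as shown below:

[Figure: x1(t) and x2(t) versus t; the envelope of x1 vanishes at t = π/(2∆) and 3π/(2∆), the envelope of x2 at t = π/∆ and 2π/∆]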

Initially, mass 1 oscillates with approximate frequency ω0, and mass 2 is at rest. Then, the oscillation amplitude of mass 1 decreases, while the amplitude of mass 2 increases. At time t = π/(2∆), only mass 2 oscillates, before the pattern reverses and mass 1 increases its amplitude again. This phenomenon is referred to as a beat of two oscillations, and can be found in many weakly coupled oscillators of the same frequency. The dashed lines in the figure above indicate the envelope of the oscillations.
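The sketch below checks two things. First, that the sum of the two normal-mode oscillations from (516), with the coefficients (529), equals the beat form (530), (531). Second, the time π/(2∆) after which all motion has moved to mass 2. The weak-coupling parameter values (k12 ≪ k) are illustrative assumptions.

```python
import math


def _frequencies(m, k, k12):
    """omega_1, omega_2 from (513) for k1 = k2 = k."""
    return math.sqrt((k + 2 * k12) / m), math.sqrt(k / m)


def positions_from_modes(t, D, m, k, k12):
    """x1, x2 from (516) with the coefficients (529), written with cosines."""
    w1, w2 = _frequencies(m, k, k12)
    x1 = 0.5 * D * (math.cos(w1 * t) + math.cos(w2 * t))
    x2 = 0.5 * D * (math.cos(w2 * t) - math.cos(w1 * t))
    return x1, x2


def positions_beat_form(t, D, m, k, k12):
    """Product form (530), (531) with omega_0 and Delta from (527)."""
    w1, w2 = _frequencies(m, k, k12)
    w0, delta = 0.5 * (w1 + w2), 0.5 * (w1 - w2)
    return (D * math.cos(w0 * t) * math.cos(delta * t),
            D * math.sin(w0 * t) * math.sin(delta * t))


# Illustrative weak coupling k12 << k (assumption, not from the notes)
m, k, k12, D = 1.0, 1.0, 0.02, 1.0
w1, w2 = _frequencies(m, k, k12)
t_transfer = math.pi / (w1 - w2)        # = pi / (2 Delta): motion sits in mass 2
print("transfer time:", t_transfer)
print(positions_beat_form(t_transfer, D, m, k, k12))
```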

10.2 Many coupled particles – small oscillations


In this section, we try to extend the idea of small oscillations to systems of many particles, e.g. molecules or the even more complex solids, where many atoms are held together by chemical bonds that are not completely rigid. However, the approach we take is not limited to these special situations, and also goes beyond mechanics.
We start with a system described by N generalized coordinates, forming the set {qk}. The interactions should be conservative, i.e., they can be described by a single potential U({qk}). We now assume that the system is near an equilibrium state, possibly due to the presence of some dissipating forces which we exclude in the treatment. The equilibrium position of all coordinates shall be given by the set {qk,0}. Similar to the simple one-dimensional problem in section 9.4, we can approximate the potential U by a truncated Taylor expansion. This time, however, we need to do the Taylor expansion for N coordinates:
$$U(\{q_k\}) = \underbrace{U(\{q_{k,0}\})}_{=:U_0} + \sum_k \underbrace{\left.\frac{\partial U}{\partial q_k}\right|_0}_{=-\phi_k} (q_k - q_{k,0}) + \frac{1}{2}\sum_{j,k} \left.\frac{\partial^2 U}{\partial q_j\,\partial q_k}\right|_0 (q_k - q_{k,0})(q_j - q_{j,0}) + \dots \tag{532}$$
The first term is simply the constant potential energy U0 in the equilibrium, and will not be relevant for the dynamics of the system. In the second term, the first derivative of the potential is the negative of the generalized force φk (see section 6.5). These forces are evaluated at the equilibrium position {qk,0} of the system, where they vanish by definition. We introduce the variables
$$u_k := q_k - q_{k,0} \tag{533}$$
for the displacement of coordinate k from the equilibrium position. With these, we can formulate the approximate potential energy using the first non-vanishing term that depends on the displacements:
$$U = U_0 + \frac{1}{2}\sum_{j,k} A_{jk}\, u_j u_k \,, \quad\text{with}\quad A_{jk} := \left.\frac{\partial^2 U}{\partial q_j\,\partial q_k}\right|_0 \,, \tag{534}$$
where the index 0 at the second derivative in Ajk indicates again that it has to be taken at the equilibrium position set {qk,0}. Since the order of differentiation does not matter (for a potential with continuous second derivatives), we have the symmetry
$$A_{jk} = A_{kj} \,. \tag{535}$$
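When U is only available as a function, the matrix A in (534) can also be obtained numerically. The sketch below assumes a central finite-difference scheme. Its example potential is hypothetical and not a system treated in the notes: two pendula whose angles are coupled by a torsion spring of stiffness κ.

```python
import math


def hessian(U, q0, h=1e-4):
    """Central finite-difference estimate of A_jk = d^2 U / dq_j dq_k at q0, cf. (534)."""
    n = len(q0)

    def shifted(j, sj, k, sk):
        q = list(q0)
        q[j] += sj * h
        q[k] += sk * h
        return U(q)

    return [[(shifted(j, 1, k, 1) - shifted(j, 1, k, -1)
              - shifted(j, -1, k, 1) + shifted(j, -1, k, -1)) / (4 * h * h)
             for k in range(n)] for j in range(n)]


# Hypothetical example (not from the notes): two pendula with mass m and
# length l whose angles are coupled by a torsion spring of stiffness kappa.
m, g, l, kappa = 1.0, 9.81, 1.0, 2.0


def U(q):
    return (-m * g * l * (math.cos(q[0]) + math.cos(q[1]))
            + 0.5 * kappa * (q[0] - q[1]) ** 2)


A = hessian(U, [0.0, 0.0])        # equilibrium at q1 = q2 = 0
print(A)   # compare with [[m g l + kappa, -kappa], [-kappa, m g l + kappa]]
```

The numerical matrix is symmetric up to rounding, as required by (535).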

We now try a similar expansion for the kinetic energy T of the system. Here, we recall from (238) in section 6.4 that for time-independent transformations from Cartesian to generalized coordinates, the total kinetic energy can be written as
$$T = \sum_{j,k} a_{jk}\, \dot q_j \dot q_k \quad\text{with}\quad a_{jk} = \frac{1}{2}\sum_\alpha m_\alpha \sum_i \frac{\partial x_{\alpha,i}}{\partial q_k} \frac{\partial x_{\alpha,i}}{\partial q_j} \,, \tag{536}$$
where the summations in the definition of ajk go over all particles α and Cartesian coordinates i. Since the ajk can still depend on the coordinates, we also perform a Taylor expansion,
$$a_{jk} = a_{jk}\big|_0 + \sum_l \left.\frac{\partial a_{jk}}{\partial q_l}\right|_0 (q_l - q_{l,0}) + \dots \tag{537}$$

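In this expansion, the first term does not vanish, so we stop right there and also neglect the linear term. Inserted into T, the linear term would produce terms of third order in the small quantities uk and u̇k, whereas we keep terms only up to second order, as in the potential (534). We define
$$m_{jk} := 2\, a_{jk}\big|_0 \,. \tag{538}$$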


These coefficients no longer depend on the deviations uk. Because the definition (536) of ajk is symmetric in j and k, we also have mjk = mkj. With q̇k = u̇k, the kinetic energy can then be written as
$$T = \frac{1}{2}\sum_{j,k} m_{jk}\, \dot u_j \dot u_k \,. \tag{539}$$
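Definitions (536) and (538) can likewise be evaluated numerically from the map that takes the generalized coordinates q to Cartesian coordinates, using mjk = Σα mα Σi (∂xα,i/∂qj)(∂xα,i/∂qk). The sketch below uses a double pendulum as an assumed example: masses m1, m2, lengths l1, l2, and angles θ1, θ2 measured from the vertical. Unlike the Cartesian case (557), its mass matrix is not diagonal. The helper names and numbers are our own.

```python
import math


def mass_matrix(positions, masses, q0, h=1e-6):
    """m_jk = 2 a_jk at q0, cf. (536) and (538), with central-difference derivatives.

    positions(q) returns all Cartesian coordinates as a flat list; masses holds
    one entry per Cartesian coordinate (each particle mass repeated per component).
    """
    n = len(q0)
    jac = []                                   # jac[j][c] = d x_c / d q_j
    for j in range(n):
        qp, qm = list(q0), list(q0)
        qp[j] += h
        qm[j] -= h
        jac.append([(a - b) / (2 * h) for a, b in zip(positions(qp), positions(qm))])
    return [[sum(mc * jac[j][c] * jac[k][c] for c, mc in enumerate(masses))
             for k in range(n)] for j in range(n)]


# Assumed example: double pendulum with illustrative parameters
m1, m2, l1, l2 = 1.0, 0.5, 1.0, 0.8


def positions(q):
    x1, y1 = l1 * math.sin(q[0]), -l1 * math.cos(q[0])
    x2, y2 = x1 + l2 * math.sin(q[1]), y1 - l2 * math.cos(q[1])
    return [x1, y1, x2, y2]


M = mass_matrix(positions, [m1, m1, m2, m2], [0.0, 0.0])   # at equilibrium
print(M)   # compare with [[(m1+m2) l1^2, m2 l1 l2], [m2 l1 l2, m2 l2^2]]
```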

The Lagrange function for small deviations from the equilibrium position becomes
$$L = T - U = \frac{1}{2}\sum_{i,j} \left[m_{ij}\, \dot u_i \dot u_j - A_{ij}\, u_i u_j\right] - U_0 \,, \tag{540}$$
with constant coefficients mij and Aij. This expression has bilinear terms in the new coordinates uk and bilinear terms in the velocities u̇k, a structure very similar to that of a harmonic oscillator. The resulting equations of motion are obtained via the Euler-Lagrange method, using the symmetries (535) and mjk = mkj:
$$\frac{\partial L}{\partial u_k} - \frac{d}{dt}\frac{\partial L}{\partial \dot u_k} = -\sum_j A_{jk}\, u_j - \frac{d}{dt}\sum_j m_{jk}\, \dot u_j = 0 \,, \tag{541}$$
or
$$\sum_j \left[m_{jk}\, \ddot u_j + A_{jk}\, u_j\right] = 0 \quad\text{for all } k \,. \tag{542}$$
This set of equations can be written as a matrix equation,
$$\mathbf{m} \cdot \ddot{\mathbf{u}}(t) + \mathbf{A} \cdot \mathbf{u}(t) = 0 \,, \tag{543}$$
where m is the mass matrix, A is a matrix that describes the elastic response of the system, and u is a vector made up of all the displacement coordinates uk in the system.

10.2.1 Solving the equations of motion


We can solve equation (543) in the same way as in section 2.3.3, e.g. with a harmonic ansatz
$$\mathbf{u}(t) = \mathbf{a}\cos(\omega t - \delta) \,, \quad\text{or componentwise:}\quad u_k(t) = a_k \cos(\omega t - \delta) \,, \tag{544}$$
leading to an algebraic set of equations
$$(\mathbf{A} - \omega^2 \mathbf{m}) \cdot \mathbf{a} = 0 \tag{545}$$
$$\text{or componentwise:}\quad \sum_j (A_{kj} - \omega^2 m_{kj})\, a_j = 0 \quad \forall k \,, \tag{546}$$
with constant amplitude coefficients ak forming a vector a. This is again a set of linear algebraic equations, which has nontrivial solutions a ≠ 0 only if
$$\det(\mathbf{A} - \omega^2 \mathbf{m}) \overset{!}{=} 0 \,. \tag{547}$$
This characteristic or secular equation for ω² has N roots (counted with multiplicity), where N is the number of coordinates in the system. For a stable equilibrium, where A is positive definite, all roots ω² are positive. The resulting frequencies ωr, r = 1 … N, are referred to as eigenfrequencies or characteristic frequencies of the problem.
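For two coordinates, the secular equation (547) is a quadratic equation in ω². The sketch below solves it for general symmetric 2 × 2 matrices A and m. As an assumed example it uses the linearized double pendulum with equal masses m and lengths l. For this case the mass matrix from (538) is [[2ml², ml²], [ml², ml²]], A = diag(2mgl, mgl), and the known result is ω² = (g/l)(2 ∓ √2).

```python
import math


def secular_roots_2x2(A, M):
    """Solve det(A - w M) = 0, cf. (547), for w = omega^2 with 2x2 symmetric A, M."""
    a = M[0][0] * M[1][1] - M[0][1] ** 2
    b = -(A[0][0] * M[1][1] + A[1][1] * M[0][0] - 2 * A[0][1] * M[0][1])
    c = A[0][0] * A[1][1] - A[0][1] ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted(((-b - disc) / (2 * a), (-b + disc) / (2 * a)))


# Assumed example: double pendulum, equal masses and lengths, linearized at rest
m, g, l = 1.0, 9.81, 1.0
M = [[2 * m * l**2, m * l**2], [m * l**2, m * l**2]]   # mass matrix (538)
A = [[2 * m * g * l, 0.0], [0.0, m * g * l]]            # elastic matrix (534)
w_sq = secular_roots_2x2(A, M)
print([math.sqrt(w) for w in w_sq])                      # eigenfrequencies omega_r
```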
For each eigenfrequency ωr, a vector ar of amplitude coefficients solves the equation set (545). These vectors characterize the modes of oscillation, i.e., the relative amplitude with which each coordinate oscillates at this particular frequency. That mode of oscillation is also referred to as the eigenmode for an oscillation at frequency ωr. With this, the general solution of the coupled oscillation can be written as
$$\mathbf{u}(t) = \sum_{r=1}^{N} \alpha_r\, \mathbf{a}_r \cos(\omega_r t - \delta_r) \,, \tag{548}$$
with a factor αr permitting normalization of ar, or componentwise
$$u_k(t) = \sum_{r=1}^{N} \alpha_r\, a_{k,r} \cos(\omega_r t - \delta_r) \,, \tag{549}$$
with real-valued amplitudes αr ar and phase shifts δr for the contributions of the various eigenmodes to the oscillation. In the componentwise expression, the element ak,r is the k-th component of eigenvector ar.
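It is interesting to investigate some properties of the vector ar that characterizes a particular eigenmode. Since it only has to fulfill the homogeneous equation (545), it is not completely determined, and any vector α ar would also fulfill that equation. One can therefore choose the normalization of the vector such that
$$\mathbf{a}_r \cdot (\mathbf{m} \cdot \mathbf{a}_r) = 1 \,. \tag{550}$$
In this expression, the term in parentheses is a matrix multiplied from the right by a vector, leading to another vector. This vector then gets multiplied with another vector via a scalar product, leading to the scalar result 1. Furthermore, vectors ar and as fulfill a generalized orthogonality relation²⁰:
$$\mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s = 0 \quad\text{for } r \neq s \,. \tag{551}$$
To see this, we take two solutions of (545) for different eigenfrequencies,
$$\begin{aligned} \omega_r^2\, \mathbf{m} \cdot \mathbf{a}_r &= \mathbf{A} \cdot \mathbf{a}_r \\ \omega_s^2\, \mathbf{m} \cdot \mathbf{a}_s &= \mathbf{A} \cdot \mathbf{a}_s \,, \end{aligned} \tag{552}$$
and multiply these equations from the left with as and ar, respectively:
$$\begin{aligned} \omega_r^2\, \mathbf{a}_s \cdot \mathbf{m} \cdot \mathbf{a}_r &= \mathbf{a}_s \cdot \mathbf{A} \cdot \mathbf{a}_r \\ \omega_s^2\, \mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s &= \mathbf{a}_r \cdot \mathbf{A} \cdot \mathbf{a}_s \,. \end{aligned} \tag{553}$$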
²⁰ Careful: this is not the usual orthogonality relation between two vectors, which would be ar · as = δrs. The two relations differ if the masses mα are not all the same.


Evaluating one of the matrices sandwiched between two vectors in components, and using the symmetry Ajk = Akj,
$$\mathbf{a}_s \cdot \mathbf{A} \cdot \mathbf{a}_r = \sum_{j,k} a_{j,s} A_{jk}\, a_{k,r} = \sum_{j,k} a_{j,s} A_{kj}\, a_{k,r} = \sum_{j,k} a_{k,r} A_{kj}\, a_{j,s} = \mathbf{a}_r \cdot \mathbf{A} \cdot \mathbf{a}_s \,, \tag{554}$$
we see that the two right sides of (553) are the same. Similarly, due to mkj = mjk, the sandwich products as · m · ar and ar · m · as on the left sides of (553) are the same, so one can subtract the two equations and obtain
$$(\omega_s^2 - \omega_r^2)\; \mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s = 0 \,. \tag{555}$$

Assuming that the eigenfrequencies are not degenerate, the difference of their squares in the parentheses does not vanish for r ≠ s, and therefore the sandwich product must vanish. This means that the two vectors obey the generalized orthogonality relation (551), which can be combined with the normalization (550) to
$$\mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s = \delta_{rs} \,, \tag{556}$$
with the Kronecker symbol δrs.
The orthogonality of eigenvectors is one of the results of linear algebra; in fact, the whole search for the oscillation modes can be mapped to an eigenvector problem. To see this, we first recognize that the matrix m can be inverted. For the simple case that the qk are Cartesian coordinates, one can see from the definitions (536) and (538) of the matrix elements mjk of m that there are no off-diagonal elements, and the diagonal entries in m are simply the masses corresponding to coordinate k:
$$m_{jk} = \delta_{jk}\, m_k \,. \tag{557}$$
Then, m can be inverted²¹, with matrix elements of the inverse matrix m⁻¹
$$(\mathbf{m}^{-1})_{jk} = \delta_{jk}\, \frac{1}{m_k} \,. \tag{558}$$
Therefore, we can multiply equation (545) from the left with m⁻¹, and obtain
$$(\mathbf{m}^{-1} \cdot \mathbf{A}) \cdot \mathbf{a} = \omega^2\, \mathbf{a} \,, \tag{559}$$

which is the familiar eigenvector/eigenvalue equation from linear algebra for the matrix m⁻¹ · A, an N × N matrix if there are N degrees of freedom. There are N eigenvalues ωr², and the corresponding eigenvectors ar determine the oscillation modes.

Since m⁻¹ · A is in general not symmetric, its eigenvectors for non-degenerate eigenvalues are orthogonal in the generalized sense (551) rather than in the ordinary sense; they become orthogonal in the ordinary sense when all masses are equal. If there are degenerate eigenvalues, the subspace spanned by the corresponding eigenvectors is orthogonal to all other eigenvectors. Within that subspace, a linear combination of eigenvectors can be found such that all eigenvectors are mutually orthogonal. As the orthogonal vectors ar characterize the oscillation modes with eigenfrequencies ωr, the modes characterized by the ar are also called normal modes.

²¹ In fact, m can always be inverted if there are no redundant coordinates.
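The difference between the generalized orthogonality (551) and ordinary orthogonality (see footnote 20) shows up as soon as the masses differ. The sketch below treats two coordinates with a diagonal mass matrix (557) and solves (559) directly. The masses, the spring constants and the function names are illustrative assumptions. The closed-form eigenvector used in the code requires A12 ≠ 0.

```python
import math


def modes_diag_mass(masses, A):
    """omega^2 and m-normalized eigenvectors of m^{-1} A, cf. (559), for two
    coordinates with diagonal mass matrix (557); assumes A[0][1] != 0."""
    m1, m2 = masses
    B = [[A[0][0] / m1, A[0][1] / m1],
         [A[1][0] / m2, A[1][1] / m2]]            # m^{-1} A, using (558)
    half_trace = 0.5 * (B[0][0] + B[1][1])
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = math.sqrt(half_trace ** 2 - det)
    modes = []
    for w_sq in (half_trace - root, half_trace + root):
        x, y = B[0][1], w_sq - B[0][0]            # solves the first row of (559)
        norm = math.sqrt(m1 * x * x + m2 * y * y)  # enforces (550)
        modes.append((w_sq, (x / norm, y / norm)))
    return modes


def m_dot(a, b, masses):
    """Generalized scalar product a . m . b for a diagonal mass matrix."""
    return sum(mk * ak * bk for mk, ak, bk in zip(masses, a, b))


# Illustrative values (assumption): unequal masses, k1 = k2 = k, coupling k12
masses = (1.0, 3.0)
k, k12 = 1.0, 0.5
A = [[k + k12, -k12], [-k12, k + k12]]
(w1_sq, a1), (w2_sq, a2) = modes_diag_mass(masses, A)
print("a1 . m . a2 =", m_dot(a1, a2, masses))
print("a1 . a2     =", sum(x * y for x, y in zip(a1, a2)))
```

The first printed value vanishes up to rounding, as required by (551). The ordinary scalar product does not vanish, because the two masses differ.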

10.2.2 Normal coordinates


In section 10.1.1 we introduced normal coordinates for the problem of two masses on an ad-hoc basis, and saw that the choice indeed decoupled the equations of motion. We can now do this for the general case, and introduce the connection between individual displacements uk and normal coordinates ηr:
$$u_k(t) = \sum_r a_{k,r}\, \eta_r(t) \,, \tag{560}$$
or in vectorial form
$$\mathbf{u} = \sum_r \mathbf{a}_r\, \eta_r(t) \,. \tag{561}$$
The transformation from the original coordinates uk to normal coordinates can be accomplished by using the fact that the eigenvectors ar are all orthogonal in the generalized sense (556). Multiplying relation (561) from the left with as · m directly yields the normal coordinate ηs:
$$\mathbf{a}_s \cdot \mathbf{m} \cdot \mathbf{u} = \mathbf{a}_s \cdot \mathbf{m} \cdot \sum_r \mathbf{a}_r\, \eta_r = \sum_r \underbrace{\mathbf{a}_s \cdot \mathbf{m} \cdot \mathbf{a}_r}_{=\delta_{rs}}\, \eta_r = \eta_s \,. \tag{562}$$
This helps, e.g., to express an initial condition given in u in terms of normal coordinates. Since the eigenvectors ar do not depend on time (the coefficient matrices m and A are constant), the velocities are
$$\dot u_k(t) = \sum_r a_{k,r}\, \dot\eta_r(t) \quad\text{or}\quad \dot{\mathbf{u}} = \sum_r \mathbf{a}_r\, \dot\eta_r(t) \,. \tag{563}$$

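The coupled Lagrange function in (540) can also be expressed in matrix form,
$$L = \frac{1}{2}\sum_{i,j} \left[m_{ij}\, \dot u_i \dot u_j - A_{ij}\, u_i u_j\right] = \frac{1}{2}\left[\dot{\mathbf{u}} \cdot (\mathbf{m} \cdot \dot{\mathbf{u}}) - \mathbf{u} \cdot \mathbf{A} \cdot \mathbf{u}\right] , \tag{564}$$
assuming without loss of generality that U0 = 0. Using expression (561) to make the transition to normal coordinates leads to
$$L = \frac{1}{2}\sum_{r,s} \dot\eta_r \dot\eta_s\, \underbrace{\mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s}_{=\delta_{rs}} - \frac{1}{2}\sum_{r,s} \eta_r \eta_s\, \mathbf{a}_r \cdot \mathbf{A} \cdot \mathbf{a}_s \,, \tag{565}$$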


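where the first sandwich product is just the orthogonality relation (556). For the second term, we use (545) for eigenvector as,
$$\mathbf{A} \cdot \mathbf{a}_s = \omega_s^2\, \mathbf{m} \cdot \mathbf{a}_s \,, \tag{566}$$
and continue with the evaluation of the Lagrange function:
$$\begin{aligned} L &= \frac{1}{2}\sum_r \dot\eta_r^2 - \frac{1}{2}\sum_{r,s} \eta_r \eta_s\, \mathbf{a}_r \cdot \omega_s^2\, \mathbf{m} \cdot \mathbf{a}_s = \frac{1}{2}\sum_r \dot\eta_r^2 - \frac{1}{2}\sum_{r,s} \eta_r \eta_s\, \omega_s^2\, \underbrace{\mathbf{a}_r \cdot \mathbf{m} \cdot \mathbf{a}_s}_{=\delta_{rs}} \\ &= \frac{1}{2}\sum_r \left[\dot\eta_r^2 - \omega_r^2\, \eta_r^2\right] . \end{aligned} \tag{567}$$
This is a sum of Lagrange functions for simple harmonic oscillators (with unit mass), which means that the motion in normal coordinates is completely decoupled. The corresponding equations of motion via Euler-Lagrange are
$$\frac{\partial L}{\partial \eta_r} - \frac{d}{dt}\frac{\partial L}{\partial \dot\eta_r} = -\omega_r^2\, \eta_r - \frac{d}{dt}\dot\eta_r = 0 \tag{568}$$
or
$$\ddot\eta_r + \omega_r^2\, \eta_r = 0 \quad\text{for } r = 1 \dots N \,, \tag{569}$$
which is a set of equations for N decoupled harmonic oscillators with the typical solutions
$$\eta_r(t) = \eta_r^+ e^{i\omega_r t} + \eta_r^- e^{-i\omega_r t} \,. \tag{570}$$
As before, the coefficients ηr+ and ηr− need to be chosen to meet the initial conditions.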
To summarize, the strategy of solving a problem of small oscillations around the equilibrium of a coupled system of masses is as follows:
• Determine the mass matrix m either directly if the coordinates are Cartesian coordinates, or via (536) and (538).
• Find the elastic coupling matrix A according to (534).
• Find the eigenfrequencies ωr and a set of corresponding eigenvectors ar, normalized according to (556), of the matrix m⁻¹ · A for the normal modes of the system.
• Find a combination of amplitudes αr and phase shifts δr for each eigenmode that satisfies the initial conditions via (548), and you are done!
In practice, this strategy works well for systems with not too many coordinates, because the eigenvector search can then be done either by hand or with very efficient numerical methods. The main difficulty is usually finding the elastic coupling matrix A when all masses interact with each other. Examples of such problems are the vibrations that can occur in molecules.
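The strategy above can be sketched in plain Python for a diagonal mass matrix (557). The sketch deviates from the notes in one respect. It does not diagonalize m⁻¹ · A directly. Instead it diagonalizes the symmetric matrix m^(−1/2) · A · m^(−1/2) with the Jacobi rotation method and then sets ar = m^(−1/2) vr, which automatically gives the normalization (556). It also assumes a stable equilibrium (all ωr > 0). The three-mass example chain and all names are hypothetical.

```python
import math


def jacobi_eigen(S, max_sweeps=50):
    """Eigenvalues and orthonormal eigenvectors (columns of V) of a symmetric
    matrix S, using cyclic Jacobi rotations."""
    n = len(S)
    S = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(S[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-26:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if S[p][q] == 0.0:
                    continue
                # rotation angle that makes the (p, q) element vanish
                theta = 0.5 * math.atan2(2 * S[p][q], S[q][q] - S[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # S <- S J
                    skp, skq = S[k][p], S[k][q]
                    S[k][p], S[k][q] = c * skp - s * skq, s * skp + c * skq
                for k in range(n):      # S <- J^T S
                    spk, sqk = S[p][k], S[q][k]
                    S[p][k], S[q][k] = c * spk - s * sqk, s * spk + c * sqk
                for k in range(n):      # V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [S[i][i] for i in range(n)], V


def normal_modes(masses, A):
    """omega_r (ascending) and vectors a_r with a_r . m . a_s = delta_rs, cf. (556)."""
    n = len(masses)
    w = [1.0 / math.sqrt(mk) for mk in masses]                   # m^{-1/2}
    S = [[w[i] * A[i][j] * w[j] for j in range(n)] for i in range(n)]
    evals, V = jacobi_eigen(S)
    order = sorted(range(n), key=lambda r: evals[r])
    omegas = [math.sqrt(evals[r]) for r in order]                # assumes evals > 0
    vecs = [[w[k] * V[k][r] for k in range(n)] for r in order]   # a_r = m^{-1/2} v_r
    return omegas, vecs


def evolve(masses, A, u0, v0, t):
    """u(t) from initial displacements u0 and velocities v0 via normal coordinates."""
    omegas, vecs = normal_modes(masses, A)
    u = [0.0] * len(masses)
    for om, a in zip(omegas, vecs):
        eta0 = sum(mk * ak * x for mk, ak, x in zip(masses, a, u0))     # (562)
        deta0 = sum(mk * ak * v for mk, ak, v in zip(masses, a, v0))
        eta = eta0 * math.cos(om * t) + deta0 / om * math.sin(om * t)   # solves (569)
        for k in range(len(u)):
            u[k] += a[k] * eta
    return u


# Hypothetical example: three masses between two walls, all springs k = 1
A3 = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
masses = [1.0, 2.0, 3.0]
print(normal_modes(masses, A3)[0])
print(evolve(masses, A3, [0.1, 0.0, -0.2], [0.0, 0.3, 0.0], 1.5))
```

For large systems one would normally call a library routine for the generalized symmetric eigenproblem instead of a hand-written Jacobi loop. The loop is kept here only so that the sketch is self-contained.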


10.3 Linear systems with next-neighbor couplings


We now come to a generalization of the problem of coupled masses presented in section 10.1 from 2 to N masses coupled to each other. The coupling should be restricted to close neighbors, as is typical, e.g., for lattice vibrations in a crystalline solid. However, the problem is much more general, because in various areas of physics one is restricted to such "local" interactions.
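As the simplest example of such a problem, we consider again a linear chain of masses that can move in one direction, coupled by springs:

[Figure: linear chain of N equal masses m between two fixed walls, neighbors connected by springs k; displacements u1, u2, …, uN−1, uN]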

The system looks similar for all masses, i.e., all masses are the same and see a similar environment of neighbors and springs. The equation of motion is obtained in one of the usual ways, e.g. by considering the total force Fj acting on mass j:

[Figure: mass j with displacement uj, connected by springs k to its neighbors with displacements uj−1 and uj+1]

Force Fj is determined by the position uj of mass j and those of its neighbors:
$$F_j = m\ddot u_j = -k(u_j - u_{j-1}) - k(u_j - u_{j+1}) = k(u_{j-1} + u_{j+1} - 2u_j) \,. \tag{571}$$
Index j runs from 1 to N; we define u0 := 0 and uN+1 := 0 to fulfill the boundary conditions at both ends of the chain in expression (571).
A very similar problem appears when we consider a transverse displacement of masses from an equilibrium position of a coupled string of masses. This approximates the situation of the string in a musical instrument. The string should have a static tension force τ in equilibrium; the geometry of the problem is shown below:

[Figure: masses m on a string under tension τ with transverse displacements qj−1, qj, qj+1; φ and φ′ are the angles of the two string segments adjacent to mass j]


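For small angles, and with d denoting the spacing between neighboring masses along the string, the transverse restoring force for mass j is now given by
$$\begin{aligned} F_j &= -\tau\sin\varphi - \tau\sin\varphi' \approx -\tau(\tan\varphi + \tan\varphi') \\ &= -\tau\left(\frac{q_j - q_{j-1}}{d} + \frac{q_j - q_{j+1}}{d}\right) = \frac{\tau}{d}\,(q_{j-1} + q_{j+1} - 2q_j) \,, \end{aligned} \tag{572}$$
leading to the same equation of motion as (571), with a different force coefficient:
$$m\ddot q_j = \frac{\tau}{d}\,(q_{j-1} + q_{j+1} - 2q_j) \,. \tag{573}$$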
d
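Using the usual harmonic ansatz uj = aj exp(iωt), we obtain the algebraic equation (545) for both the eigenfrequencies and the eigenvectors; in components:
$$-a_{j-1} + \lambda a_j - a_{j+1} = 0 \quad\text{for } j = 1 \dots N \,, \quad\text{with}\quad \lambda = 2 - \frac{m\omega^2}{k} \,, \tag{574}$$
or in matrix form with D = (A − ω² m)/k:
$$\mathbf{D} \cdot \mathbf{a} = 0 \quad\text{or}\quad \begin{pmatrix} \lambda & -1 & 0 & 0 & \dots \\ -1 & \lambda & -1 & 0 & \dots \\ 0 & -1 & \lambda & -1 & \dots \\ \vdots & & \ddots & \ddots & \ddots \end{pmatrix} \cdot \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix} = 0 \,. \tag{575}$$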

To satisfy this matrix equation with a nonzero vector a, the characteristic equation must hold:
$$\det \mathbf{D} = 0 \,, \tag{576}$$
fixing the frequency ω of a solution. We verify this for two special cases. For N = 1, the matrix D reduces to the single value λ, and (576) simply becomes
$$\lambda = 0 \,. \tag{577}$$
From this, with the definition of λ in (574), we find
$$\omega = \sqrt{\frac{2k}{m}} \,. \tag{578}$$
Compared to the result (68) for the harmonic oscillator, the factor 2 here should not come as a surprise, because we chose the boundary conditions such that the masses at the ends are coupled to a wall with another spring on each side:

[Figure: single mass m attached to fixed walls on both sides by springs k]


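For N = 2 we expect to reproduce the problem of section 10.1 with k12 = k. We have
$$\mathbf{D} = \begin{pmatrix} \lambda & -1 \\ -1 & \lambda \end{pmatrix} \,, \quad\text{therefore}\quad \det \mathbf{D} = \lambda^2 - 1 = 0 \,, \tag{579}$$
resulting in the solutions and corresponding eigenfrequencies seen in (513):
$$\lambda_{1,2} = \mp 1 \,, \quad\text{and}\quad \omega_{1,2} = \sqrt{\frac{2 - \lambda_{1,2}}{m/k}} = \sqrt{\frac{2k \pm k}{m}} \,. \tag{580}$$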
One could come up with a recursive expression for the characteristic equation
for any N , but it is much more convenient to solve the problem differently.
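For reference, a recursive expression follows from expanding det D along its last row: D_N = λ D_(N−1) − D_(N−2), with D_0 = 1 and D_1 = λ. This reproduces (579) for N = 2. The sketch below evaluates the recursion and locates all roots by a sign scan plus bisection. Restricting the scan to |λ| ≤ 2.1 is justified because det D = 0 means that λ is an eigenvalue of the matrix with ones next to the diagonal and zeros elsewhere, and by Gershgorin's theorem those eigenvalues lie in [−2, 2]. The function names and parameter values are our own.

```python
import math


def char_det(lam, N):
    """det D of the N x N matrix (575), from D_n = lam * D_{n-1} - D_{n-2}."""
    d_prev, d = 1.0, lam              # D_0 = 1, D_1 = lam
    for _ in range(N - 1):
        d_prev, d = d, lam * d - d_prev
    return d


def lambda_roots(N, samples=4001):
    """Roots of det D = 0 found by a sign scan on [-2.1, 2.1] plus bisection."""
    lo, hi = -2.1, 2.1
    step = (hi - lo) / samples
    roots = []
    for i in range(samples):
        a, b = lo + i * step, lo + (i + 1) * step
        fa, fb = char_det(a, N), char_det(b, N)
        if fa * fb < 0:               # sign change: one root inside [a, b]
            for _ in range(60):
                mid = 0.5 * (a + b)
                fm = char_det(mid, N)
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots


k, m, N = 1.0, 1.0, 5                 # illustrative values (assumption)
for lam in lambda_roots(N):
    print(lam, math.sqrt((2 - lam) * k / m))   # omega from lambda = 2 - m omega^2 / k
```

The numerically found roots can be compared with the closed-form result λ = 2 cos γr that follows from the wave approach.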

10.3.1 Solution with the wave approach


In the algebraic equation system (574), the structure for all coefficients looks similar, with couplings only to next neighbors. We consider the ansatz
$$a_j = a\, e^{i(j\gamma - \delta)} \tag{581}$$
for the components of the vector a in (575), which is an eigenvector of m⁻¹ · A. The coefficients a and δ should be real values, describing a common amplitude and phase for all components aj, and γ a phase shift between neighboring components. The components aj±1 for the next neighbors are given by
$$a_{j\pm1} = a\, e^{i([j\pm1]\gamma - \delta)} = e^{\pm i\gamma} a_j \,, \tag{582}$$
which helps to simplify (574):
$$-a_{j-1} - a_{j+1} + \lambda a_j = 0 \;\rightarrow\; \left(-e^{i\gamma} - e^{-i\gamma} + \lambda\right) a_j = 0 \quad \forall j = 1 \dots N \,. \tag{583}$$

Since the amplitude aj is non-zero by construction (581), the parenthesis has to vanish, which leads to a condition between γ and λ:
$$2\cos\gamma = \lambda = 2 - \frac{m}{k}\,\omega^2 \tag{584}$$
or
$$\omega_r^2 = \frac{k}{m}(2 - 2\cos\gamma_r) = \frac{2k}{m}(1 - \cos\gamma_r) = \frac{4k}{m}\sin^2\frac{\gamma_r}{2} \;\rightarrow\; \omega_r = \sqrt{\frac{4k}{m}}\,\sin\frac{\gamma_r}{2} \,. \tag{585}$$
There, the index r indicates that we need to find N eigenfrequencies ωr, r = 1 … N, and corresponding constants γr and δr. We now need to fix the values of γr and δr to meet the boundary conditions. As we are looking for real-valued displacements, we could equally well have chosen the real part of (581) as the ansatz, at the cost of more cumbersome expressions. We make this transition now:
$$a_{j,r} = a_r\, e^{i(j\gamma_r - \delta_r)} \;\rightarrow\; a_{j,r} = a_r \cos(j\gamma_r - \delta_r) \,. \tag{586}$$


The boundary condition at the left end (j = 0) requires that a0,r = 0, or
$$0 = a_{0,r} = a_r\cos(-\delta_r) \;\Rightarrow\; \delta_r = \frac{\pi}{2} \,, \quad\text{so}\quad a_{j,r} = a_r \sin(j\gamma_r) \,. \tag{587}$$
The boundary condition on the right end of the chain requires
$$a_{N+1,r} = 0 \;\Rightarrow\; \gamma_r (N+1) = s\pi \,, \quad s = 1, 2, 3, \dots \tag{588}$$
Since we need N different solutions for ωr, we simply choose s = r, so
$$\gamma_r = \frac{r\pi}{N+1} \,, \quad r = 1 \dots N \,. \tag{589}$$
This leads to eigenvector components
$$a_{j,r} = a_r \sin\left(j\,\frac{r\pi}{N+1}\right) \tag{590}$$
and with (585) to eigenfrequencies
$$\omega_r = \sqrt{\frac{4k}{m}}\,\sin\frac{r\pi}{2(N+1)} \,. \tag{591}$$

As in (560), we can express the individual deviations of the masses via normal coordinates ηr and the corresponding time dependence (570),
$$u_j(t) = \sum_r a_{j,r}\left[\eta_r^+ e^{i\omega_r t} + \eta_r^- e^{-i\omega_r t}\right] , \tag{592}$$
with adequate choices for the ηr± to meet initial conditions if required.
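The sketch below confirms that (590) and (591) really solve (574), including both boundary conditions, by evaluating the residual of (574) for every mode. The amplitude ar = √(2/((N+1)m)) is our own choice. It is one way to satisfy the normalization (556) for equal masses, since Σj sin²(jrπ/(N+1)) = (N+1)/2. The parameter values are illustrative.

```python
import math


def chain_mode(N, r, k=1.0, m=1.0):
    """Components a_{j,r} for j = 0..N+1 from (590), and omega_r from (591).

    The amplitude a_r = sqrt(2 / ((N + 1) m)) is an assumed normalization that
    gives a_r . m . a_r = 1 as in (556) for equal masses m.
    """
    amp = math.sqrt(2.0 / ((N + 1) * m))
    a = [amp * math.sin(j * r * math.pi / (N + 1)) for j in range(N + 2)]
    omega = math.sqrt(4 * k / m) * math.sin(r * math.pi / (2 * (N + 1)))
    return a, omega


def residual_574(a, omega, k=1.0, m=1.0):
    """Largest violation of -a_{j-1} + lambda a_j - a_{j+1} = 0 for j = 1..N."""
    lam = 2 - m * omega ** 2 / k
    return max(abs(-a[j - 1] + lam * a[j] - a[j + 1]) for j in range(1, len(a) - 1))


N, k, m = 6, 2.0, 0.5       # illustrative values (assumption)
for r in range(1, N + 1):
    a, omega = chain_mode(N, r, k, m)
    print(r, omega, residual_574(a, omega, k, m), a[0], a[N + 1])
```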

10.3.2 Resulting mode structure


The mode structure contained in expression (590) resembles that of standing waves with different wavelengths; we visualize the eigenvector components aj,r for the example with N = 3 in the graph below. The eigenmodes for r = 1 … 3 follow the pattern of standing waves, sampled at discrete points. The standing wave meets the boundary conditions at the auxiliary positions j = 0 and j = N + 1 = 4.
Modes for r = 4 and r = 8 do not lead to meaningful oscillations, as all eigenvector components vanish. The modes corresponding to r = 5, 6, 7 reproduce the mode structure corresponding to r = 3, 2, 1, with an additional minus sign for all components. Hence, they do not constitute any new mode, and the description with modes r = 1, 2, 3 is indeed complete.


[Figure: eigenvector components aj,r versus j = 1, 2, 3 for N = 3; panels for r = 1, 2, 3, 4 (left column) and r = 5, 6, 7, 8 (right column)]
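The statements about modes with r > N can be checked directly from (590). This small sketch prints the components for N = 3 and r = 1 … 8. It assumes the amplitude ar = 1.

```python
import math


def mode_vector(N, r):
    """Components a_{j,r}, j = 1..N, from (590) with amplitude a_r = 1 (assumption)."""
    return [math.sin(j * r * math.pi / (N + 1)) for j in range(1, N + 1)]


N = 3
for r in range(1, 2 * (N + 1) + 1):
    print(r, [round(x, 3) for x in mode_vector(N, r)])

# Expected pattern: r = N + 1 and r = 2 (N + 1) give (numerically) zero vectors,
# and r = 2 (N + 1) - s reproduces -a_s for s = 1 .. N.
```

The same pattern holds for any N, because sin(j(2(N+1) − s)π/(N+1)) = −sin(jsπ/(N+1)) for integer j.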

10.3.3 Dispersion relation


Since the amplitude distribution aj,r for the different modes r follows a standing wave pattern, we can use the wavelength of this wave to characterize the mode. For this, we assume that all masses are spaced by the same distance d from their next neighbors. Then, the total length of the chain from endpoint to endpoint is L = (N + 1)d. The wavelength of the fundamental mode (r = 1) is then Λ1 = 2L, and from (590) the wavelength of a higher order mode can be written as
$$\Lambda_r = \frac{2L}{r} \,. \tag{593}$$
One can then also identify mass j by its distance xj = jd from the left boundary (corresponding to j = 0), and express the displacement uj in that mode by
$$a_{j,r} = a_r \sin\left(j\,\frac{r\pi}{N+1}\right) = a_r \sin\left(\frac{(dj)\,r\pi}{L}\right) = a_r \sin\left(\frac{2L}{\Lambda_r}\,\frac{\pi}{L}\,x_j\right) = a_r \sin\left(\frac{2\pi}{\Lambda_r}\,x_j\right) = a_r \sin(x_j q_r) \,. \tag{594}$$


Here, qr is a wave number and indicates a spatial frequency:
$$q_r = \frac{2\pi}{\Lambda_r} = \frac{2\pi r}{2L} = \frac{r\pi}{(N+1)d} \,. \tag{595}$$
Its maximal value is
$$q_{r,\max} = q_N = \frac{\pi}{d}\,\frac{N}{N+1} \,, \tag{596}$$
which converges to qmax = π/d for large N. This is the typical situation, e.g., for the motion of the atoms in a crystal with lattice spacing d. There, because of the large number of atoms involved, the wave number q is a much better label for the mode than the integer mode index r. The upper limit qmax corresponds to a minimal wavelength of Λmin = 2π/qmax = 2d; a wavelength smaller than twice the lattice spacing d is not meaningful for a description of modes.
In the same way, the dependence of the eigenfrequencies ωr in (591) on the mode index r can be described by the wave number q:
$$\omega(q) = \sqrt{\frac{4k}{m}}\,\sin\frac{r\pi}{2(N+1)} = \omega_{\max}\sin\frac{qd}{2} \,, \quad\text{with}\quad \omega_{\max} = \sqrt{\frac{4k}{m}} \,. \tag{597}$$
A function that relates the oscillation frequency of a mode to its wave number is called the dispersion relation of a coupled system. The figure below shows the dispersion relation for a coupled linear chain of masses with N = 10:

[Figure: dispersion relation ωr versus q for N = 10; the modes r = 1, 2, …, N−1, N lie on a sine curve between q = 0 and q = π/d, approaching ωmax]

The maximal frequency ωmax corresponds to an oscillation mode in which adjacent lattice sites oscillate in opposite directions. Each spring is then stretched or compressed by twice the displacement of a single mass. The effective spring constant therefore doubles compared to the case of a single mass between two springs attached to fixed walls, result (578).
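The discrete dispersion relation (597) can be tabulated for the N = 10 chain shown in the figure. The sketch below assumes k = m = d = 1 for illustration. It also compares the slope near q = 0 with d√(k/m), which anticipates the continuum limit discussed next.

```python
import math


def dispersion_points(N, k=1.0, m=1.0, d=1.0):
    """Pairs (q_r, omega_r) for r = 1..N from (595) and (597)."""
    omega_max = math.sqrt(4 * k / m)
    points = []
    for r in range(1, N + 1):
        q = r * math.pi / ((N + 1) * d)
        points.append((q, omega_max * math.sin(q * d / 2)))
    return points


# Illustrative units k = m = d = 1 (assumption), N = 10 as in the figure
pts = dispersion_points(10)
for q, w in pts:
    print(f"q d = {q:.3f}   omega / omega_max = {w / 2.0:.3f}")
q1, w1 = pts[0]
print("slope near q = 0:", w1 / q1, "compared with d sqrt(k/m) = 1")
```

The frequencies increase monotonically with q. Even the highest mode r = N stays slightly below ωmax, because qN < π/d.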
The theory of coupled spring/mass systems is important in solid state physics;
the problem above is only the simplest example of a lattice vibration in crystalline
solids. It has to be extended to three dimensions, and typically, more complex
crystal structures with more than one atom in a crystal unit cell need to be
considered. But the basic treatment is the same as outlined in this section:
oscillation modes are indexed by a wave number (or wave vector, in a three-
dimensional case), and other parameters that indicate transverse or longitudinal
displacements. As the frequencies of oscillation can be quite high in a solid, such
oscillations need to be treated quantum mechanically, giving rise to “phonons” as
quasi-particles corresponding to a particular mode index q. However, the whole
dispersion relation of lattice vibrations is a purely classical mechanics problem.

10.4 Transition to a continuum


In the last section, we already replaced the particle index j by a position xj, and the discrete eigenmode index r by a wave number q. Suppose we increase the number N of masses while shrinking their distance d, but keep the overall physical properties of the string, such as its length L, total mass M and elastic properties, fixed. We then arrive at a continuous distribution of masses and springs. This is a suitable description of solids, because it is often neither possible nor interesting to keep track of the positions of all their constituent masses.
We first consider the basic variables of the problem. In the discrete case, these are the displacements uj, which can be expressed by time-dependent normal coordinates ηr via (560) and (594), absorbing the amplitudes ar into the ηr:
$$u_j(t) = \sum_r \eta_r(t)\, a_{j,r} = \sum_r \eta_r(t) \sin(x_j q_r) \,. \tag{598}$$
The ηr exhibit the simple harmonic oscillator dynamics (570) at frequency ωr.


Expression (598) already has a form that allows the transition
$$u_j(t) \rightarrow u(x, t) \tag{599}$$
to a function with a continuous position variable x. The time dependence of u for a given mode q is still a harmonic oscillation, but ω(q) changes for d → 0. For small d, we approximate the sine in the dispersion relation (597) for small angles:
$$\omega(q) = \sqrt{\frac{4k}{m}}\,\sin\frac{qd}{2} \approx \sqrt{\frac{4k}{m}}\,\frac{qd}{2} = q\,\sqrt{\frac{d}{m}}\,\sqrt{kd} \,. \tag{600}$$
The result is written such that the first root contains a ratio of distance over mass. If the global properties of the problem stay fixed, this ratio is the inverse of a linear mass density ρ. The second root contains a product of a spring constant k and distance d, which also remains constant in the limit d → 0.

To see this, consider a spring made out of a homogeneous elastic material that is cut into two equal halves. Applying the same force to one half leads to only half the compression or extension of the original spring. To obtain the same compression as for the original spring, one would need to apply twice the force. Hence the spring constant doubles when the length is halved, and the product of length d and spring constant k does not change. This product
$$K := kd \tag{601}$$
describes the stiffness of the material, and can be calculated from the elastic modulus or Young's modulus²² E of a material via K = E A0 for a given cross section A0. The dispersion relation (600) then becomes
$$\omega(q) = q\,\sqrt{\frac{K}{\rho}} \,, \tag{602}$$
which is linear in q and has no upper limit for q or ω.
Similarly, a continuum relation for the transverse displacement of a string under a tensional force τ can be calculated: the equations of motion (571) for longitudinal and (573) for transverse displacement are essentially the same. All results for the longitudinal case can be transferred to the transverse case by replacing k with τ/d, changing the continuous dispersion relation (600) to
$$\omega(q) = q\,\sqrt{\frac{d}{m}}\,\sqrt{\frac{\tau}{d}\,d} = q\,\sqrt{\frac{\tau}{\rho}} \,. \tag{603}$$
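The continuum limit can be illustrated numerically. We keep L, M and K = kd fixed while increasing N. The lowest discrete eigenfrequency from (591) should then approach (π/L)√(K/ρ) from (602), with ρ = M/L. The values of L, M and K below are assumptions for illustration.

```python
import math


def lowest_mode_frequency(N, L, M, K):
    """omega_1 from (591) for N masses, keeping length L, mass M and K = k d fixed."""
    d = L / (N + 1)      # spacing, from L = (N + 1) d
    m = M / N            # mass of each particle
    k = K / d            # spring constant of each segment, from (601)
    return math.sqrt(4 * k / m) * math.sin(math.pi / (2 * (N + 1)))


# Illustrative global properties (assumption)
L, M, K = 2.0, 0.5, 30.0
rho = M / L                                              # linear mass density
omega_continuum = (math.pi / L) * math.sqrt(K / rho)     # (602) with q_1 = pi / L
for N in (1, 10, 100, 1000):
    w = lowest_mode_frequency(N, L, M, K)
    print(N, w, abs(w - omega_continuum) / omega_continuum)
```

The relative deviation falls roughly like 1/(2N). It comes mainly from the difference between m/d and ρ = M/L for finite N, and less from the small-angle approximation in (600).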

10.4.1 Coupled equations of motion in a continuum


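Instead of transferring the results from a discrete problem to a continuum, one can also move from a discrete set of coupled equations of motion like (571) or (573) to a continuum description, and solve the problem directly. For this, we rewrite (571) to identify the continuum properties:
$$m\ddot u_j = k(u_{j+1} + u_{j-1} - 2u_j) \;\rightarrow\; \frac{m}{d}\,\ddot u_j = (kd)\left(\frac{u_{j+1} + u_{j-1} - 2u_j}{d^2}\right) . \tag{604}$$
The last parenthesis approximates the second derivative of u with respect to the position xj. For d → 0,
$$\frac{m}{d} \rightarrow \rho \,, \qquad kd \rightarrow K \,, \qquad \frac{u_{j+1} + u_{j-1} - 2u_j}{d^2} \rightarrow \frac{\partial^2 u(x,t)}{\partial x^2} \,, \tag{605}$$
so that the equation of motion (604) becomes
$$\rho\,\frac{\partial^2 u}{\partial t^2} = K\,\frac{\partial^2 u}{\partial x^2} \quad\text{or}\quad \frac{\partial^2 u(x,t)}{\partial t^2} - \frac{K}{\rho}\,\frac{\partial^2 u(x,t)}{\partial x^2} = 0 \,. \tag{606}$$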
[22] Named after Thomas Young, 1773-1828, but discovered around 1727 by L. Euler.


This is a partial differential equation that mixes derivatives with respect to the two continuous variables x and t of the function u(x, t). The specific form (606) is called the wave equation; its solutions are discussed in the following subsections.

10.4.2 Solving the wave equation by separation of variables


One way of solving the wave equation is to assume a harmonic time dependence multiplied by a space-dependent part:

$$u(x,t) = v(x)\,e^{i\omega t}\,. \tag{607}$$

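Inserting this into the wave equation leads to

$$\left[-\omega^2 v(x) - \frac{K}{\rho}\,\frac{\partial^2 v(x)}{\partial x^2}\right] e^{i\omega t} = 0\,, \tag{608}$$

where one can divide by the time-dependent oscillation factor and end up with

$$\frac{\partial^2 v(x)}{\partial x^2} + q^2 v(x) = 0\,, \qquad\text{with}\quad q^2 = \frac{\rho\,\omega^2}{K}\,. \tag{609}$$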
This is an ordinary differential equation (or a partial differential equation in the spatial variables only, if more than one space dimension is involved), and is referred to as the Helmholtz equation[23]. For one dimension x, it has the same structure as the equation of motion of a harmonic oscillator, which is consistent with the sinusoidal spatial structure of the eigenmodes found in (598). The q in expression (609) is exactly the wave number used before, and solving $q^2 = \rho\omega^2/K$ for ω directly gives the dispersion relation $\omega(q) = q\sqrt{K/\rho}$, in agreement with (602).
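The Helmholtz equation (609) can also be solved numerically as an eigenvalue problem. The sketch below assumes a string of length L with both ends held fixed, v(0) = v(L) = 0 (a boundary condition chosen here for illustration), replaces −∂²/∂x² by the usual second-difference matrix on a uniform grid, and diagonalizes −(K/ρ)∂²/∂x². Its lowest eigenvalues ω_n² should approach (K/ρ)(nπ/L)², i.e., ω = q√(K/ρ) at the allowed wave numbers q_n = nπ/L. All names and values are hypothetical.

```python
import numpy as np


def helmholtz_frequencies(K, rho, length, n_points, n_modes):
    """Lowest eigenfrequencies of -(K/rho) v'' = omega^2 v with v(0) = v(L) = 0."""
    h = length / (n_points + 1)            # grid spacing; only interior points are unknowns
    main = np.full(n_points, 2.0)
    off = np.full(n_points - 1, -1.0)
    # Second-difference matrix approximating -d^2/dx^2 (boundary values are zero)
    lap = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    omega_sq = np.linalg.eigvalsh((K / rho) * lap)    # eigenvalues in ascending order
    return np.sqrt(omega_sq[:n_modes])


def continuum_frequencies(K, rho, length, n_modes):
    """omega_n = q_n sqrt(K / rho) with q_n = n pi / L."""
    q = np.arange(1, n_modes + 1) * np.pi / length
    return q * np.sqrt(K / rho)


if __name__ == "__main__":
    K, rho, length = 1.0e3, 2.0, 1.0       # hypothetical material and string length
    numeric = helmholtz_frequencies(K, rho, length, n_points=400, n_modes=4)
    exact = continuum_frequencies(K, rho, length, n_modes=4)
    for n, (w_num, w_ex) in enumerate(zip(numeric, exact), start=1):
        print(f"mode {n}: numeric {w_num:.4f}, continuum {w_ex:.4f}")
```

The tridiagonal matrix has the same nearest-neighbour structure as the coupling in (604), so this discretization is effectively a finite chain of masses again; the agreement with the continuum frequencies improves as the grid is refined.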
Continuum solutions can also help to motivate solutions for problems with discrete variables: in section 10.3.1, the wave ansatz was presented without justification, but it is the solution of the continuum problem that motivates the choice in (581).

10.4.3 Propagating solutions to the wave equation


Apart from the solutions (598) derived for a finite number of coupled masses, or those obtained from the Helmholtz equation (609), the wave equation (606) has other interesting solutions. We try the ansatz

$$u(x,t) = w(x \pm vt)\,, \tag{610}$$

i.e., we replace the function of two variables by a function w of a single variable that is a combination of x and t. To check whether this function can solve the wave equation, we need the derivatives of u with respect to t and x:

$$\begin{aligned}
\frac{\partial^2 u}{\partial x^2} &= \frac{\partial}{\partial x}\left[w'(x \pm vt)\right] = w''(x \pm vt)\,,\\
\frac{\partial^2 u}{\partial t^2} &= \frac{\partial}{\partial t}\left[\pm v\,w'(x \pm vt)\right] = v^2\,w''(x \pm vt)\,,
\end{aligned} \tag{611}$$

where $w''$ indicates the second derivative of the function w with respect to its argument. Inserting those derivatives into the wave equation leads to

$$v^2\,w''(x \pm vt) - \frac{K}{\rho}\,w''(x \pm vt) = \left(v^2 - \frac{K}{\rho}\right) w''(x \pm vt) = 0\,. \tag{612}$$

[23] after Hermann von Helmholtz, 1821-1894

For $v^2 = K/\rho$, this equation is fulfilled for any twice differentiable function w. This is a remarkable result: any initial displacement profile u(x, t = 0) = w(x) is allowed, and it propagates without changing its shape. Depending on the initial conditions, this distribution propagates in the positive or negative direction (or as a combination of both) with a velocity

$$v = \sqrt{\frac{K}{\rho}}\,, \qquad\text{or}\qquad v = \sqrt{\frac{\tau}{\rho}} \quad\text{for transverse displacements}\,. \tag{613}$$

For longitudinal displacements, this velocity is the speed of sound in the solid, since sound is the phenomenon of local displacements that propagate through a material via elastic coupling.
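A quick way to see (610) to (613) at work is to integrate the wave equation (606) numerically and watch an initial pulse move. The sketch below uses a standard explicit leapfrog (central-difference) scheme, which is a choice made here and not a method from the text; the Gaussian pulse shape, grid, fixed ends, and material parameters are hypothetical. It starts a right-moving pulse w(x − vt) and checks that after a time t its maximum has moved by about vt with v = √(K/ρ).

```python
import numpy as np


def propagate_pulse(K=1.0e3, rho=2.0, length=30.0, n_cells=600,
                    courant=0.9, n_steps=200, x0=7.5, width=1.0):
    """Leapfrog integration of rho u_tt = K u_xx with fixed ends u = 0.

    Returns (grid x, displacement u at the final time, elapsed time t, speed v).
    """
    v = np.sqrt(K / rho)                  # propagation speed from (613)
    x = np.linspace(0.0, length, n_cells + 1)
    dx = x[1] - x[0]
    dt = courant * dx / v                 # Courant number C = v dt / dx <= 1 for stability

    def pulse(s):
        """Initial shape w of the displacement (hypothetical Gaussian)."""
        return np.exp(-((s - x0) / width) ** 2)

    u_prev = pulse(x + v * dt)            # right-moving solution w(x - v t) at t = -dt
    u = pulse(x)                          # t = 0
    c2 = courant ** 2
    for _ in range(n_steps):
        u_next = np.empty_like(u)
        # Central differences in time and space for the wave equation (606)
        u_next[1:-1] = (2.0 * u[1:-1] - u_prev[1:-1]
                        + c2 * (u[2:] - 2.0 * u[1:-1] + u[:-2]))
        u_next[0] = u_next[-1] = 0.0      # fixed ends
        u_prev, u = u, u_next
    return x, u, n_steps * dt, v


if __name__ == "__main__":
    x, u, t, v = propagate_pulse()
    print(f"v = {v:.3f}, expected peak at {7.5 + v * t:.3f}, found at {x[np.argmax(u)]:.3f}")
```

The pulse keeps its shape and height while moving, as expected from u(x, t) = w(x − vt); the parameters are chosen so that it does not reach the fixed ends during the run.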

10.4.4 Derivation of the wave equation via the Hamilton principle


While the transition from a discrete mass problem to a continuum in the previous subsections leads to the correct wave equation for an elastic one-dimensional continuum, the equation can also be derived via the Euler-Lagrange formalism. The details are beyond the scope of this course, so we only cover this approach very briefly.
The basis for the Euler-Lagrange formalism is the knowledge of the total kinetic and total potential energy. We can make the transition from the total kinetic energy of the discrete chain to a continuum in a similar way as before:

$$T = \frac{1}{2}\sum_j m\,\dot u_j^2 = \frac{1}{2}\,\frac{m}{d}\sum_j \dot u_j^2\,d\,. \tag{614}$$

The ratio m/d is again the linear mass density ρ. The sum over all squared velocities, multiplied by the distance d, can be replaced by an integral over the length L of the string, leading for d → 0 to the asymptotic expression for the total kinetic energy

$$T = \frac{1}{2}\int_0^L \rho\left(\frac{\partial u(x,t)}{\partial t}\right)^2 dx\,. \tag{615}$$


For the potential energy, we do a similar transition:

$$U = \frac{1}{2}\sum_j k\,(u_{j+1} - u_j)^2 = \frac{1}{2}\sum_j (kd)\left(\frac{u_{j+1} - u_j}{d}\right)^2 d\,. \tag{616}$$

For d → 0, the fraction in the sum becomes the partial derivative of the displacement with respect to x, and the sum (with the factor d) goes over into an integral:

$$U = \frac{1}{2}\int_0^L K\left(\frac{\partial u(x,t)}{\partial x}\right)^2 dx\,. \tag{617}$$
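Together with the kinetic energy T, this leads to the Lagrange function

$$L = T - U = \int_0^L \left[\frac{\rho}{2}\left(\frac{\partial u}{\partial t}\right)^2 - \frac{K}{2}\left(\frac{\partial u}{\partial x}\right)^2\right] dx = \int_0^L \mathcal{L}\left(u, \frac{\partial u}{\partial t}, \frac{\partial u}{\partial x}\right) dx \tag{618}$$

for the continuum, with a Lagrange density $\mathcal{L}$ that depends on the function u(x, t) and its first partial derivatives with respect to x and t.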
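Without further proof, the equivalent of the Euler-Lagrange equation (194) for problems with two continuous parameters x and t is

$$\frac{\partial\mathcal{L}}{\partial u} - \frac{\partial}{\partial t}\,\frac{\partial\mathcal{L}}{\partial\left(\frac{\partial u}{\partial t}\right)} - \frac{\partial}{\partial x}\,\frac{\partial\mathcal{L}}{\partial\left(\frac{\partial u}{\partial x}\right)} = 0\,. \tag{619}$$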

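The Lagrange density of the elastic continuum in (618),

$$\mathcal{L}\left(u, \frac{\partial u}{\partial t}, \frac{\partial u}{\partial x}\right) = \frac{\rho}{2}\left(\frac{\partial u}{\partial t}\right)^2 - \frac{K}{2}\left(\frac{\partial u}{\partial x}\right)^2, \tag{620}$$

has the partial derivatives

$$\frac{\partial\mathcal{L}}{\partial\left(\frac{\partial u}{\partial t}\right)} = \rho\,\frac{\partial u}{\partial t}\,, \qquad\text{and}\qquad \frac{\partial\mathcal{L}}{\partial\left(\frac{\partial u}{\partial x}\right)} = -K\,\frac{\partial u}{\partial x}\,. \tag{621}$$

Inserting those into the Euler-Lagrange equation (619), and observing that $\mathcal{L}$ does not explicitly depend on u (so $\partial\mathcal{L}/\partial u = 0$), leads to

$$-\frac{\partial}{\partial t}\left[\rho\,\frac{\partial u}{\partial t}\right] - \frac{\partial}{\partial x}\left[-K\,\frac{\partial u}{\partial x}\right] = -\rho\,\frac{\partial^2 u}{\partial t^2} + K\,\frac{\partial^2 u}{\partial x^2} = 0\,, \tag{622}$$

which reproduces the wave equation (606).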
It should be noted that mechanical problems of this kind are rarely treated with this formalism, since it is rather involved, and the partial differential equations for the displacement field u(x, t) can be obtained in a much simpler way. The method outlined above, however, is used in high energy physics, and when dealing with interactions that do not easily lead to field equations otherwise.
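The discrete energies (614) and (616) give a concrete way to see that the Lagrangian route reproduces the equations of motion of the chain: for each mass, the Euler-Lagrange equation reads $m\ddot u_j = -\partial U/\partial u_j$. The sketch below (illustrative; the chain length, spring constant, random displacements, and helper names are arbitrary choices) compares a finite-difference gradient of U from (616) with the right-hand side $k(u_{j+1} + u_{j-1} - 2u_j)$ of (604) for the interior masses of a chain with free ends.

```python
import numpy as np


def potential_energy(u, k):
    """Discrete elastic energy (616): U = (k/2) * sum_j (u_{j+1} - u_j)^2."""
    return 0.5 * k * np.sum(np.diff(u) ** 2)


def numerical_gradient(func, u, h=1e-6):
    """Central finite-difference gradient of a scalar function of the vector u."""
    grad = np.zeros_like(u)
    for j in range(u.size):
        step = np.zeros_like(u)
        step[j] = h
        grad[j] = (func(u + step) - func(u - step)) / (2.0 * h)
    return grad


def force_from_eom(u, k):
    """Right-hand side of (604) for interior masses: k (u_{j+1} + u_{j-1} - 2 u_j)."""
    return k * (u[2:] + u[:-2] - 2.0 * u[1:-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3.0                                   # hypothetical spring constant
    u = rng.normal(size=10)                   # arbitrary displacements u_j
    force_lagrange = -numerical_gradient(lambda w: potential_energy(w, k), u)
    print(np.max(np.abs(force_lagrange[1:-1] - force_from_eom(u, k))))
```

Since U is quadratic in the displacements, the central difference is exact up to rounding, so the printed difference is tiny. The two end masses are excluded because they have only one neighbour.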


11 Non-inertial reference frames


So far, we have described mechanical problems only in inertial reference frames, in which Newton's first law holds: a body not subject to a force remains in a state of uniform motion. However, there are important cases where inertial reference frames are inadequate or inconvenient. A notable example is the description of the motion of bodies on the surface of the Earth: because of the rotation of the Earth, Newton's laws (or, equivalently, the equations of motion derived via the Euler-Lagrange formalism) hold only approximately in a frame attached to the surface. In this section, we discuss a suitably modified version of Newton's second law in such frames.

11.1 Coordinate transformation


Consider two reference frames F and F′ that are moving with respect to each other, and may also change their relative orientation over time:

[Figure: reference frame F with axes x1, x2, x3 and origin O, and reference frame F′ with axes x′1, x′2, x′3 and origin O′]

The transformation at any instant can be separated into a translation between the two origins O and O′, and a single rotation that adjusts the orientation of the coordinate systems with respect to each other (see section 1.4). If x is a vector described by coordinates $x_i$ in F, and by coordinates $x'_i$ in F′, i.e.

$$\mathbf{x} = \sum_{i=1}^{3} x_i\,\mathbf{e}_i = \sum_{i=1}^{3} x'_i\,\mathbf{e}'_i\,, \tag{623}$$

then the two coordinate representations are connected via a rotation matrix R:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = R \cdot \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix}. \tag{624}$$
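As seen in section 1.4, a rotation around the x3 axis is, e.g., represented by

$$R_3(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{625}$$

We then made the transition to infinitesimal rotations, e.g. for the x3 axis

$$R_3(\phi) \rightarrow R_3(d\phi) = \mathbb{1} + \epsilon_3(d\phi) = \mathbb{1} + \begin{pmatrix} 0 & -d\phi & 0 \\ d\phi & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{626}$$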


with the unit matrix 𝟙. As shown in section 1.5, infinitesimal rotations around different axes can simply be added up, leading to the general infinitesimal correction matrix

$$\epsilon(d\phi_1, d\phi_2, d\phi_3) = \begin{pmatrix} 0 & -d\phi_3 & d\phi_2 \\ d\phi_3 & 0 & -d\phi_1 \\ -d\phi_2 & d\phi_1 & 0 \end{pmatrix}, \tag{627}$$

where the $d\phi_i$ describe infinitesimal rotations around the coordinate axes i. If all infinitesimal rotations $d\phi_i$ are combined into a vector $d\boldsymbol{\phi} = (d\phi_1, d\phi_2, d\phi_3)$, the action of the correction matrix ε can be expressed as a vector product:

$$\epsilon\cdot\mathbf{x} = d\boldsymbol{\phi}\times\mathbf{x} \tag{628}$$

Therefore, we can rewrite the right side of (624) as

$$R\cdot\mathbf{x} = (\mathbb{1} + \epsilon)\cdot\mathbf{x} = \mathbf{x} + \underbrace{d\boldsymbol{\phi}\times\mathbf{x}}_{=:\,d\mathbf{x}} \tag{629}$$

This means that for an infinitesimal rotation dφ, the coordinate vector needs to be corrected by the infinitesimal amount dx to make the transition between the two coordinate systems.
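Relations (626) to (629) are easy to check numerically. The sketch below, with hypothetical helper names `epsilon_matrix` and `rotation_x3`, builds the correction matrix (627) from a rotation vector dφ, confirms that ε·x equals dφ × x as in (628), and compares 𝟙 + ε with the exact rotation (625) about x3 for a small angle; the two should differ only at second order in the angle.

```python
import numpy as np


def epsilon_matrix(dphi):
    """Infinitesimal correction matrix (627) for rotation angles dphi = (dphi1, dphi2, dphi3)."""
    d1, d2, d3 = dphi
    return np.array([[0.0, -d3,  d2],
                     [ d3, 0.0, -d1],
                     [-d2,  d1, 0.0]])


def rotation_x3(phi):
    """Finite rotation about the x3 axis, matrix (625)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    dphi = np.array([1e-3, -2e-3, 5e-4])
    x = np.array([0.3, -1.2, 2.0])
    print(epsilon_matrix(dphi) @ x, np.cross(dphi, x))     # identical, (628)
    small = 1e-4
    deviation = rotation_x3(small) - (np.eye(3) + epsilon_matrix([0.0, 0.0, small]))
    print(np.max(np.abs(deviation)))                         # about small**2 / 2
```

Relation (628) is exact linear algebra, so it holds for any dφ; only the identification R ≈ 𝟙 + ε in (629) requires the angles to be small.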

11.1.1 Transformation of time derivatives of vectors


Dividing the differential dx in (629) by an infinitesimal time dt leads to an expression for the temporal derivative of the vector x:

$$\frac{d\mathbf{x}}{dt} = \frac{d\boldsymbol{\phi}}{dt}\times\mathbf{x} = \boldsymbol{\omega}\times\mathbf{x}\,, \tag{630}$$

where ω is the instantaneous angular velocity. This expression reproduces the relation (30) between the instantaneous velocity v = dr/dt and the position r introduced earlier, but holds for any vector x that is fixed in the rotating frame, and allows evaluating the temporal derivative of a vector caused by a rotation of the reference frame.
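We now consider the temporal derivative of a vector x expressed as a linear combination of the base vectors $\mathbf{e}'_i$ of F′,

$$\mathbf{x}(t) = \sum_i x'_i(t)\,\mathbf{e}'_i(t)\,. \tag{631}$$

Both the coefficients $x'_i$ and the basis vectors $\mathbf{e}'_i$ are time dependent, so

$$\left.\frac{d\mathbf{x}}{dt}\right|_F = \frac{d}{dt}\sum_i x'_i\,\mathbf{e}'_i = \sum_i\left(\dot x'_i\,\mathbf{e}'_i + x'_i\,\dot{\mathbf{e}}'_i\right) =: \left.\frac{d\mathbf{x}}{dt}\right|_{F'} + \sum_i x'_i\,\dot{\mathbf{e}}'_i \tag{632}$$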

The index F indicates that the derivative is taken in frame F, capturing the time derivatives of both the coordinates $x'_i$ and the base vectors $\mathbf{e}'_i$. The index F′ in the temporal derivative of x indicates that this is the derivative in system F′: only the temporal derivatives of the coefficients $x'_i$ are taken, multiplied by the base vectors $\mathbf{e}'_i$, which do not change in time in reference system F′. The second term contains the temporal derivatives of $\mathbf{e}'_i$ as seen from reference frame F. For these, we can use (630):

$$\dot{\mathbf{e}}'_i = \frac{d\mathbf{e}'_i}{dt} = \boldsymbol{\omega}\times\mathbf{e}'_i\,. \tag{633}$$
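With this, we can continue with the time derivative of x:

$$\begin{aligned}
\left.\frac{d\mathbf{x}}{dt}\right|_F &= \left.\frac{d\mathbf{x}}{dt}\right|_{F'} + \sum_i x'_i\,(\boldsymbol{\omega}\times\mathbf{e}'_i) \\
&= \left.\frac{d\mathbf{x}}{dt}\right|_{F'} + \boldsymbol{\omega}\times\sum_i x'_i\,\mathbf{e}'_i = \left.\frac{d\mathbf{x}}{dt}\right|_{F'} + \boldsymbol{\omega}\times\mathbf{x}\,.
\end{aligned} \tag{634}$$

The temporal derivative of a vector x as seen from reference frame F is therefore given by the temporal derivative of the vector as seen in reference frame F′,

$$\left.\frac{d\mathbf{x}}{dt}\right|_{F'} = \sum_i \dot x'_i\,\mathbf{e}'_i\,, \tag{635}$$

corrected by a term that takes into account the time evolution of the base vectors $\mathbf{e}'_i$ of frame F′ as seen from frame F.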

11.1.2 Transformation of velocities and accelerations


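We now consider the transformation of the characteristic vectors of a moving point P between two reference frames F and F′. The origin O′ of F′ is displaced by a vector R from the origin O of F:

[Figure: point P with position vector r from the origin O of F (basis e1, e2, e3) and position vector r′ from the origin O′ of F′ (basis e′1, e′2, e′3); the vector R points from O to O′]

The position vector r of point P with respect to origin O is therefore the sum of R and the position vector r′ of P with respect to origin O′:

$$\mathbf{r} = \mathbf{R} + \mathbf{r}'\,. \tag{636}$$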


The displacement vector R between the two reference frames may also change over time. Therefore, the velocity of P as seen from frame F is given by

$$\left.\frac{d\mathbf{r}}{dt}\right|_F = \frac{d\mathbf{R}}{dt} + \left.\frac{d\mathbf{r}'}{dt}\right|_F = \frac{d\mathbf{R}}{dt} + \left.\frac{d\mathbf{r}'}{dt}\right|_{F'} + \boldsymbol{\omega}\times\mathbf{r}'\,, \tag{637}$$

where the indices F and F′ indicate the system with respect to which a temporal derivative is taken. The above expression can be rewritten as

$$\mathbf{v} = \mathbf{V} + \mathbf{v}'_{F'} + \boldsymbol{\omega}\times\mathbf{r}'\,, \tag{638}$$

where $\mathbf{V} = \dot{\mathbf{R}}$ is the velocity of the origin O′ in F, $\mathbf{v}'_{F'}$ is the velocity of P as seen in system F′ with respect to the origin O′, and the angular velocity ω captures the instantaneous change of orientation of the two coordinate systems with respect to each other.
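To obtain the acceleration of point P as seen from reference system F, we take another temporal derivative of (638):

$$\begin{aligned}
\mathbf{a} &= \frac{d\mathbf{V}}{dt} + \left.\frac{d\mathbf{v}'_{F'}}{dt}\right|_F + \dot{\boldsymbol{\omega}}\times\mathbf{r}' + \boldsymbol{\omega}\times\left.\frac{d\mathbf{r}'}{dt}\right|_F \\
&= \ddot{\mathbf{R}} + \left(\left.\frac{d\mathbf{v}'_{F'}}{dt}\right|_{F'} + \boldsymbol{\omega}\times\mathbf{v}'_{F'}\right) + \dot{\boldsymbol{\omega}}\times\mathbf{r}' + \boldsymbol{\omega}\times\Big(\underbrace{\left.\frac{d\mathbf{r}'}{dt}\right|_{F'}}_{=\mathbf{v}'_{F'}} + \boldsymbol{\omega}\times\mathbf{r}'\Big) \\
&= \ddot{\mathbf{R}} + \mathbf{a}'_{F'} + 2\,\boldsymbol{\omega}\times\mathbf{v}'_{F'} + \dot{\boldsymbol{\omega}}\times\mathbf{r}' + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}')
\end{aligned} \tag{639}$$

In the first line, the product rule was applied to ω × r′. From the first to the second line, the temporal derivatives of $\mathbf{v}'_{F'}$ and r′ taken in F were converted into derivatives taken in F′ using (634), which accounts for the rotation of the two reference frames with respect to each other. The newly introduced term $\mathbf{a}'_{F'}$ is the acceleration of P as seen in reference frame F′:

$$\mathbf{a}'_{F'} = \sum_i \ddot r'_i\,\mathbf{e}'_i\,. \tag{640}$$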
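Expression (639) can be checked numerically for a simple case. The sketch below assumes that the origins of F and F′ coincide and that F′ rotates with constant angular velocity about the common x3 axis, so that $\ddot{\mathbf R} = 0$ and $\dot{\boldsymbol\omega} = 0$; the trajectory r′(t) of P in F′ and all names are hypothetical choices. It differentiates $\mathbf r(t) = R_3(\Omega t)\cdot\mathbf r'(t)$ twice by finite differences and compares the result with $\mathbf a'_{F'} + 2\boldsymbol\omega\times\mathbf v'_{F'} + \boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r')$, with all vectors expressed in components of F via (624).

```python
import numpy as np

OMEGA = 0.7                                   # hypothetical angular velocity about x3
OMEGA_VEC = np.array([0.0, 0.0, OMEGA])


def rotation_x3(phi):
    """Rotation matrix (625) about the x3 axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def r_prime(t):
    """Hypothetical trajectory of P: components in the rotating frame F'."""
    return np.array([1.0 + 0.5 * t, 0.3 * np.sin(2.0 * t), 0.2 * t**2])


def v_prime(t):
    """Time derivative of the components r'_i(t)."""
    return np.array([0.5, 0.6 * np.cos(2.0 * t), 0.4 * t])


def a_prime(t):
    """Second time derivative of the components r'_i(t)."""
    return np.array([0.0, -1.2 * np.sin(2.0 * t), 0.4])


def r_inertial(t):
    """Components of r in the inertial frame F, using (624) with R = R3(OMEGA t)."""
    return rotation_x3(OMEGA * t) @ r_prime(t)


def acceleration_numeric(t, h=1e-3):
    """Second derivative of r(t) in F by central finite differences."""
    return (r_inertial(t + h) - 2.0 * r_inertial(t) + r_inertial(t - h)) / h**2


def acceleration_formula(t):
    """Right-hand side of (639) with R'' = 0 and d(omega)/dt = 0, in F components."""
    rot = rotation_x3(OMEGA * t)
    r_p, v_p, a_p = rot @ r_prime(t), rot @ v_prime(t), rot @ a_prime(t)
    return (a_p + 2.0 * np.cross(OMEGA_VEC, v_p)
            + np.cross(OMEGA_VEC, np.cross(OMEGA_VEC, r_p)))


if __name__ == "__main__":
    for t in (0.0, 1.3, 2.9):
        print(t, acceleration_numeric(t), acceleration_formula(t))
```

The two results agree up to the finite-difference error; dropping the Coriolis or the centrifugal term from `acceleration_formula` makes the mismatch obvious.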

11.2 Dynamics in non-inertial reference frames


So far, we have only considered the kinematics of transformations between reference frames F and F′. We are not yet able to describe the dynamics of a system, as defined by Newton's second law.
To accomplish this, we now postulate that F is an inertial reference frame in which Newton's laws apply, while F′ can be a non-inertial reference frame. The dynamics of a mass point m can then be described in the inertial reference frame F via Newton's second law (51):

$$\mathbf{F} = m\mathbf{a}\,, \tag{641}$$


where the force F is the sum of all interactions with the environment or other masses, like gravitation, Coulomb interaction, etc. Since there is also a meaningful acceleration vector $\mathbf{a}'_{F'}$ in the non-inertial reference frame F′, we define an effective force $\mathbf{F}_\text{eff}$ that allows writing down the equivalent of Newton's second law in the non-inertial reference frame:

$$\mathbf{F}_\text{eff} := m\,\mathbf{a}'_{F'}\,. \tag{642}$$

With expression (639) for transforming the accelerations between F and F′, we find

$$\mathbf{F}_\text{eff} = \mathbf{F} - m\ddot{\mathbf{R}} - m\,\dot{\boldsymbol{\omega}}\times\mathbf{r}' - 2m\,\boldsymbol{\omega}\times\mathbf{v}'_{F'} - m\,\boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}') \tag{643}$$

This means that in the non-inertial reference frame, the effective force $\mathbf{F}_\text{eff}$ contains not only the "true" forces generated by the interaction of masses, charges, etc., which are also seen in the inertial reference frame F, but also a number of so-called inertial forces that are a consequence of F′ not being an inertial reference frame.
The first inertial term, $-m\ddot{\mathbf{R}}$, results from an acceleration of the reference frame with respect to the inertial frame; this is the apparent force one observes in an accelerating elevator, or in an accelerating or decelerating vehicle. The second term, $-m\dot{\boldsymbol{\omega}}\times\mathbf{r}'$, is due to an angular acceleration, and is perhaps less obviously present in everyday life. The third term, $-2m\,\boldsymbol{\omega}\times\mathbf{v}'_{F'}$, is an apparent force that is proportional to the velocity $\mathbf{v}'_{F'}$ in the non-inertial reference frame F′, and is referred to as the Coriolis force.
The last term in (643) is the centrifugal force, which a body in reference frame F′ experiences and which is proportional to the square of the angular velocity.

11.2.1 Centrifugal force


We first look at the geometry of the centrifugal force term in (643),

$$\mathbf{F}_\text{centrif.} = -m\,\boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}')\,. \tag{644}$$

[Figure: rotation axis along ω; point P at position r′, at angle α to ω and at distance R from the axis; the tangential vector ω × r′ and the vector −ω × (ω × r′) pointing away from the axis]

The orientation of the centrifugal force can be seen in the figure: for an origin O′ of F′ on the rotation axis, the vector ω × r′ is the velocity vector tangential to the trajectory of the point P. The second vector product with the angular velocity ω then makes the centrifugal force point radially away from the rotation axis.
The modulus of the centrifugal force is given by

$$|\mathbf{F}_\text{centrif.}| = m\,\omega^2\,|\mathbf{r}'|\sin\alpha = m\,\omega^2 R\,, \tag{645}$$

where R is the shortest distance of P from the rotation axis, replicating the well-known expression for the centrifugal force.


The centrifugal force is, e.g., responsible for the deviation of the shape of the Earth from an ideal sphere: because of the daily rotation, the distance from the center of the Earth is about 21.4 km larger at the equator than at the poles. Furthermore, the direction of the local acceleration, e.g. as indicated by a mass on a string, does not point exactly towards the center of mass of the Earth, but is tilted by the centrifugal term.
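To put a number on this, the sketch below evaluates the centrifugal acceleration ω²R from (645) on the Earth's surface. The inputs are standard reference values supplied here, not taken from the text: a rotation period of one sidereal day (about 86164 s) and an equatorial radius of about 6378 km. For other latitudes λ it assumes a spherical Earth, so the distance from the axis is R cos λ.

```python
import math

SIDEREAL_DAY_S = 86164.1      # rotation period of the Earth in seconds
EARTH_RADIUS_M = 6.378e6      # equatorial radius in metres (flattening ignored below)
G_STANDARD = 9.80665          # standard gravitational acceleration, m/s^2


def centrifugal_acceleration(latitude_deg):
    """omega^2 * R_perp from (645), with R_perp = R cos(latitude) for a spherical Earth."""
    omega = 2.0 * math.pi / SIDEREAL_DAY_S
    r_perp = EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    return omega**2 * r_perp


if __name__ == "__main__":
    for lat in (0.0, 45.0, 90.0):
        a = centrifugal_acceleration(lat)
        print(f"latitude {lat:4.1f} deg: {a:.4f} m/s^2 ({100 * a / G_STANDARD:.2f} % of g)")
```

At the equator this gives about 0.034 m/s², roughly 0.35 % of g: small, but large enough to flatten the Earth and to tilt a plumb line away from the direction towards the center of the Earth at intermediate latitudes.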

11.2.2 Coriolis force


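The Coriolis force term[24] in (643),

$$\mathbf{F}_\text{Coriolis} = -2m\,\boldsymbol{\omega}\times\mathbf{v}'_{F'}\,, \tag{646}$$

is only present if a body moves with respect to the moving reference system, i.e., for $\mathbf{v}'_{F'} \neq 0$. The geometry of the Coriolis force can be easily seen for a mass point that moves with a velocity v′ on a rotating platform:

[Figure: two views of a platform rotating with ω, with axes x′1 and x′2. Left: mass m with velocity v′ pointing radially towards the center; Coriolis force F_C = −2mω × v′ in the tangential direction. Right: v′ tangential; F_C pointing radially outward.]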
In the left figure, the mass point moves radially towards the center; in a coordinate system attached to the rotating platform, the velocity v′ is parallel to the x′2 direction. The resulting Coriolis force according to (646) points in the x′1 direction. This can be understood as an inertial effect: when moving radially inwards, the mass point has an angular momentum that is too large for co-rotation at the new, smaller radial distance. The mass point tends to retain its angular momentum, which appears in the rotating system as an accelerating force in the tangential direction. In the right figure, the velocity v′ is along the x′1 direction, resulting in a Coriolis force pointing radially away from the rotation axis in the rotating frame. This can be interpreted as an additional centrifugal term: with its additional tangential velocity, the mass point appears to have a larger angular velocity than the rotating reference frame F′.
The Coriolis force has consequences for moving bodies on the surface of the Earth, where observations are typically expressed in non-inertial coordinates (latitude and longitude) that are fixed to the rotating Earth. In a coordinate system (x′1, x′2) aligned with a plane tangential to the Earth's surface, the angular velocity vector ω in the northern hemisphere has a component $\boldsymbol{\omega}_\perp$ that points away from the surface of the Earth.
[24] described by Gaspard-Gustave de Coriolis, 1792-1843; published in J. de l'Ecole royale polytechnique 15, 144–154 (1835)


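[Figure: tangential plane (x′1, x′2) on the Earth's surface in the northern hemisphere; ω decomposed into a component ω⊥ perpendicular to the plane and a component parallel to it; a mass m moving with velocity v′ in the plane experiences the Coriolis force F_C, whose in-plane projection F_C∥ points to the right of v′]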

If a mass moves in the tangential plane with a velocity v′, the projection of the Coriolis force $\mathbf{F}_C$ onto the tangential plane gives rise to a force that pulls the mass to the right (with respect to its velocity, as seen when standing on the tangential plane). Consequently, air masses that move into a low pressure area in the northern hemisphere form a spiral that rotates counterclockwise around the low pressure area. Only the component $\boldsymbol{\omega}_\perp$ of the angular velocity that is perpendicular to the tangential plane (x′1, x′2) contributes to the in-plane projection $\mathbf{F}_{C\parallel}$ of the Coriolis force.
At the equator, $\boldsymbol{\omega}_\perp = 0$ because the angular velocity vector ω is parallel to the surface. For velocities v′ tangential to the surface, the Coriolis force therefore has no component $\mathbf{F}_{C\parallel}$ in the tangential plane. In the southern hemisphere, the vertical component of the angular velocity vector points into the Earth, so air moving into low pressure areas follows a clockwise spiraling motion.
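The direction of the in-plane Coriolis force can be checked with a short calculation. The sketch below assumes a local east-north-up frame (x′1 east, x′2 north, x′3 up) at latitude λ, in which the Earth's angular velocity reads ω = Ω(0, cos λ, sin λ); this frame choice and the helpers `coriolis_force` and `horizontal_side` are illustrative and not from the text. It evaluates (646) and reports whether the horizontal part of the force lies to the right or the left of the horizontal velocity.

```python
import math

import numpy as np

OMEGA_EARTH = 7.292e-5   # rad/s, approximate angular velocity of the Earth


def coriolis_force(mass, velocity_enu, latitude_deg):
    """Coriolis force (646), -2 m omega x v', in local east-north-up components."""
    lat = math.radians(latitude_deg)
    omega = OMEGA_EARTH * np.array([0.0, math.cos(lat), math.sin(lat)])
    return -2.0 * mass * np.cross(omega, velocity_enu)


def horizontal_side(force, velocity_enu):
    """'right', 'left' or 'none': side of the horizontal force relative to the horizontal velocity."""
    v_h = np.array([velocity_enu[0], velocity_enu[1], 0.0])
    f_h = np.array([force[0], force[1], 0.0])
    z = np.cross(v_h, f_h)[2]          # negative if the force points to the right of v
    if abs(z) < 1e-15:
        return "none"
    return "left" if z > 0 else "right"


if __name__ == "__main__":
    v_east = np.array([10.0, 0.0, 0.0])          # 10 m/s towards east (hypothetical)
    for lat in (45.0, 0.0, -45.0):
        f = coriolis_force(1.0, v_east, lat)
        print(lat, f, horizontal_side(f, v_east))
```

In this frame the horizontal force depends only on the vertical component Ω sin λ, i.e., on ω⊥: the deflection is to the right in the northern hemisphere, to the left in the southern hemisphere, and absent at the equator, where only a vertical component remains.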
The Coriolis force is also the basis of recent solid-state inertial sensors for rotation: in so-called vibrating structure gyroscopes or MEMS gyroscopes, an oscillating test mass experiences a Coriolis force perpendicular to its oscillating motion if there is a rotation component perpendicular to the motion of the test mass. A similar effect appears to be used by some two-winged insects, which sense forces on so-called halteres vibrating with the wing frequency to determine their angular velocity and presumably to stabilize their orientation in space.


12 Motion of rigid bodies


In the last section, we encountered a few effects related to the transformation between an inertial reference frame and a rotating reference frame. However, we did not ask about the dynamics of the rotational motion itself. In this last section, we address some of these aspects for so-called rigid bodies, i.e., ensembles of mass points in a particular shape with fixed positions relative to each other.

12.1 Orientation of a rigid body in space - Euler angles


Several properties of rigid bodies are easy to describe in a coordinate system that is fixed to the rigid body. Such a coordinate system may not be an inertial reference frame, so we need to describe the orientation of the body coordinate system F′ relative to a fixed inertial system F. In section 1.4 we saw that this transformation can be expressed by a proper rotation, so we just need to parameterize this rotation. A common way to do this is to compose the rotation of three rotations around fixed axes such that a vector transforms as

$$\mathbf{x}' = R_3(\psi)\cdot R_1(\theta)\cdot R_3(\phi)\cdot\mathbf{x}\,, \tag{647}$$

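with rotation matrices (see section 1.4)

$$R_3(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}. \tag{648}$$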

The angles ψ, θ, φ in (647) are referred to as Euler angles[25]. The first rotation is around the x3 axis, followed by a rotation around the x1 axis, followed again by a rotation around the x3 axis. Such a combination of rotations can produce any orientation of a rigid body in space. The figure below shows the orientation of the coordinate axes in the intermediate and final steps (for positive rotation angles φ, θ, and ψ):
[Figure: axes x1, x2, x3 of F and x′1, x′2, x′3 of F′, with the rotation angles φ, θ, and ψ of the intermediate and final steps]

[25] again after Leonhard Euler, 1707-1783


To work with the Euler angles in practice, the matrix multiplication (647) has to be carried out explicitly[26]:

$$R = R_3(\psi)\cdot R_1(\theta)\cdot R_3(\phi) = \begin{pmatrix}
\cos\psi\cos\phi - \sin\psi\cos\theta\sin\phi & -\cos\psi\sin\phi - \sin\psi\cos\theta\cos\phi & \sin\psi\sin\theta \\
\sin\psi\cos\phi + \cos\psi\cos\theta\sin\phi & -\sin\psi\sin\phi + \cos\psi\cos\theta\cos\phi & -\cos\psi\sin\theta \\
\sin\theta\sin\phi & \sin\theta\cos\phi & \cos\theta
\end{pmatrix}. \tag{649}$$
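Because sign slips are easy in (649), it is worth checking the explicit matrix against the product of the elementary rotations (648). The sketch below (hypothetical function names, arbitrary test angles) does this numerically and also verifies that the result is a proper rotation, i.e., orthogonal with determinant +1.

```python
import numpy as np


def rot3(a):
    """Rotation about the x3 axis, R3 from (648)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot1(a):
    """Rotation about the x1 axis, R1 from (648)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def euler_matrix_explicit(psi, theta, phi):
    """Explicit form of R = R3(psi) R1(theta) R3(phi), equation (649)."""
    cps, sps = np.cos(psi), np.sin(psi)
    cth, sth = np.cos(theta), np.sin(theta)
    cph, sph = np.cos(phi), np.sin(phi)
    return np.array([
        [cps * cph - sps * cth * sph, -cps * sph - sps * cth * cph,  sps * sth],
        [sps * cph + cps * cth * sph, -sps * sph + cps * cth * cph, -cps * sth],
        [sth * sph,                    sth * cph,                    cth],
    ])


if __name__ == "__main__":
    psi, theta, phi = 0.4, 1.1, -0.7               # arbitrary test angles
    product = rot3(psi) @ rot1(theta) @ rot3(phi)
    explicit = euler_matrix_explicit(psi, theta, phi)
    print(np.max(np.abs(product - explicit)))      # rounding-level difference
    print(np.allclose(explicit.T @ explicit, np.eye(3)), np.linalg.det(explicit))
```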

12.1.1 Euler angles and angular velocity


In some situations, one can use the Euler angles as dynamic variables that describe the orientation of a body in an inertial reference frame. Then, the time derivatives of the Euler angles become the generalized velocities. The changing angles can be represented by vectors $\dot{\boldsymbol\phi}$, $\dot{\boldsymbol\theta}$, and $\dot{\boldsymbol\psi}$ pointing in the direction of the respective rotation axis, as discussed in section 1.5:

[Figure: axes of F and F′ with the angular velocity vectors φ̇ along x3, θ̇ along the line of nodes, and ψ̇ along x′3]

Vector $\dot{\boldsymbol\phi}$ is aligned with the x3 axis of the inertial system F, vector $\dot{\boldsymbol\psi}$ points in the direction of the x′3 axis of the system F′ attached to the rigid body, and $\dot{\boldsymbol\theta}$ points in the direction of the line of nodes, where the two circles in the figure intersect. The instantaneous values of $\dot{\boldsymbol\phi}$, $\dot{\boldsymbol\theta}$, and $\dot{\boldsymbol\psi}$ add up to the angular velocity of the rigid body with respect to the inertial frame F. To make this more useful for describing the dynamics later, we express this vector in the system F′ attached to the body.
The simplest case is the angular velocity associated with a change of the Euler angle ψ, because it is already aligned with basis vector 3 of F′. Therefore, the components of $\dot{\boldsymbol\psi}$ in F′ are given by

$$\dot\psi'_1 = \dot\psi'_2 = 0\,, \qquad \dot\psi'_3 = \dot\psi\,, \tag{650}$$

where $\dot\psi$ is simply the rate of change of the Euler angle ψ. Vector $\dot{\boldsymbol\theta}$ is parallel to the line of nodes, and therefore perpendicular to the x′3 axis; its components in F′ are given by

$$\dot\theta'_1 = \dot\theta\cos\psi\,, \qquad \dot\theta'_2 = -\dot\theta\sin\psi\,, \qquad \dot\theta'_3 = 0\,. \tag{651}$$

Finally, the components of $\dot{\boldsymbol\phi}$ in F′ are given by

$$\dot\phi'_1 = \dot\phi\sin\theta\sin\psi\,, \qquad \dot\phi'_2 = \dot\phi\sin\theta\cos\psi\,, \qquad \dot\phi'_3 = \dot\phi\cos\theta\,. \tag{652}$$

[26] To worsen this misery, not all texts use the same convention for counting the Euler angles. Check carefully if you ever need them, or avoid them altogether when you can.

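The total angular velocity vector ω due to a change of the Euler angles in time is given by the sum of the individual rotation vectors. In the reference system F′ attached to the body, the components of ω are therefore given by

$$\boldsymbol{\omega} = \dot{\boldsymbol\phi} + \dot{\boldsymbol\theta} + \dot{\boldsymbol\psi} = \begin{pmatrix} \dot\phi\sin\theta\sin\psi + \dot\theta\cos\psi \\ \dot\phi\sin\theta\cos\psi - \dot\theta\sin\psi \\ \dot\phi\cos\theta + \dot\psi \end{pmatrix}. \tag{653}$$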

12.1.2 Limitations of Euler angles to describe body orientations


While every orientation of a rigid body with respect to an inertial system can be described by a set of Euler angles, their usefulness for practical problems is very limited. The problem is that not every rotation leads to a smooth evolution of the Euler angles. For example, if the body reference system F′ and the inertial reference system F are initially aligned, a rotation around axis x2 would require the angle φ to jump to π/2 and ψ to −π/2, while θ follows the rotation. This problem is referred to as gimbal lock, and was a problem in early inertial guidance systems, among them one that occurred during the lunar landing mission of Apollo 11.
Therefore, other ways of representing the orientation of a body are often chosen. One can use the rotation matrix R directly, but then has to store 9 entries. While this allows for fast vector transformations, it has the disadvantage that correcting accumulated numerical errors, which make the matrix deviate from an exact rotation, is computationally expensive.
A method that requires only four variables to store the orientation of a body, avoids the gimbal lock problem, and makes re-normalization simple (a quaternion only needs to be rescaled to unit length) uses quaternions. They are an extension of the concept of complex numbers with three square roots of −1, and allow for an efficient calculation of concatenated rotations. Quaternions are beyond the scope of this course, but they form the basis of most orientation descriptions in current applications in navigation, robotics and computer graphics.


12.2 Inertia of a rigid body


A description of the dynamics of a rotating body requires an expression for its kinetic energy. We consider the rigid body to be made up of N masses $m_\alpha$ with fixed positions relative to each other. The positions of the masses forming the body are described in a coordinate system or reference frame F′ fixed to the body, because there the positions do not depend on the orientation of the body in space. The position $\mathbf{r}_\alpha = \mathbf{R} + \mathbf{r}'_\alpha$ of each particle is described by the center-of-mass position R of the body with respect to some inertial reference frame F with origin O, and a distance vector $\mathbf{r}'_\alpha$ from the center-of-mass location:

[Figure: position vector r_α of mass α from the origin O, composed of the center-of-mass position R and the vector r′_α from the center of mass (COM)]

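By definition, the velocity of the individual masses making up the rigid body vanishes identically in the reference frame F′ moving with the body:

$$\mathbf{v}'_\alpha = \left.\frac{d\mathbf{r}'_\alpha}{dt}\right|_{F'} \equiv 0\,. \tag{654}$$

Using (634), the velocity of mass $m_\alpha$ in the inertial reference frame F is given by

$$\mathbf{v}_\alpha = \left.\frac{d\mathbf{r}_\alpha}{dt}\right|_F = \frac{d\mathbf{R}}{dt} + \left.\frac{d\mathbf{r}'_\alpha}{dt}\right|_F = \frac{d\mathbf{R}}{dt} + \underbrace{\left.\frac{d\mathbf{r}'_\alpha}{dt}\right|_{F'}}_{=0} + \boldsymbol{\omega}\times\mathbf{r}'_\alpha = \mathbf{V} + \boldsymbol{\omega}\times\mathbf{r}'_\alpha\,, \tag{655}$$

where V denotes the center-of-mass velocity of the body, and ω is the instantaneous rotation vector of the moving reference frame F′ with respect to F.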

12.2.1 Kinetic energy of a rigid body


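With this, the kinetic energy of the body can be evaluated:

$$\begin{aligned}
T &= \frac{1}{2}\sum_\alpha m_\alpha\,\mathbf{v}_\alpha^2 = \frac{1}{2}\sum_\alpha m_\alpha\left(\mathbf{V} + \boldsymbol{\omega}\times\mathbf{r}'_\alpha\right)^2 \\
&= \frac{1}{2}\sum_\alpha m_\alpha\,\mathbf{V}^2 + \sum_\alpha m_\alpha\,\mathbf{V}\cdot(\boldsymbol{\omega}\times\mathbf{r}'_\alpha) + \frac{1}{2}\sum_\alpha m_\alpha\,(\boldsymbol{\omega}\times\mathbf{r}'_\alpha)^2 \\
&= \frac{1}{2}\underbrace{\Big(\sum_\alpha m_\alpha\Big)}_{=M}\mathbf{V}^2 + \mathbf{V}\cdot\Big[\boldsymbol{\omega}\times\underbrace{\sum_\alpha m_\alpha\,\mathbf{r}'_\alpha}_{=M\mathbf{R}'}\Big] + \frac{1}{2}\sum_\alpha m_\alpha\,(\boldsymbol{\omega}\times\mathbf{r}'_\alpha)^2\,,
\end{aligned} \tag{656}$$

where M is the total mass of the body. The second sum over masses contains the center-of-mass position R′ in the body coordinate system F′. If the reference system F′ is centered at the center of mass of the body, then R′ = 0, and the total kinetic energy can be written as

$$T = \frac{1}{2}M\mathbf{V}^2 + \frac{1}{2}\sum_\alpha m_\alpha\,(\boldsymbol{\omega}\times\mathbf{r}'_\alpha)^2 =: T_\text{trans} + T_\text{rot}\,, \tag{657}$$

i.e., the total kinetic energy is the sum of a translational part that only depends on the center-of-mass velocity V and the total mass M of the body, and a rotational energy $T_\text{rot}$ that is independent of the center-of-mass motion.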
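The square of the vector product in the rotational energy can be converted to scalar products with the vector identity

$$(\mathbf{a}\times\mathbf{b})^2 = (\mathbf{a}\times\mathbf{b})\cdot(\mathbf{a}\times\mathbf{b}) = (\mathbf{a}\cdot\mathbf{a})(\mathbf{b}\cdot\mathbf{b}) - (\mathbf{a}\cdot\mathbf{b})^2\,, \tag{658}$$

leading to the rotational energy

$$T_\text{rot} = \frac{1}{2}\sum_\alpha m_\alpha\left[\omega^2\,{r'}^2_\alpha - (\boldsymbol{\omega}\cdot\mathbf{r}'_\alpha)^2\right]. \tag{659}$$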

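We now evaluate the scalar products explicitly in Cartesian components:

$$\begin{aligned}
T_\text{rot} &= \frac{1}{2}\sum_\alpha m_\alpha\left[\left(\sum_{i=1}^3\omega_i^2\right)\left(\sum_{k=1}^3 {x'}^2_{\alpha,k}\right) - \left(\sum_{i=1}^3\omega_i\,x'_{\alpha,i}\right)\left(\sum_{j=1}^3\omega_j\,x'_{\alpha,j}\right)\right] \\
&= \frac{1}{2}\sum_\alpha m_\alpha\sum_{i,j=1}^3\left[\omega_i\omega_j\,\delta_{ij}\left(\sum_{k=1}^3 {x'}^2_{\alpha,k}\right) - \omega_i\omega_j\,x'_{\alpha,i}\,x'_{\alpha,j}\right] \\
&= \frac{1}{2}\sum_{i,j=1}^3\omega_i\omega_j\underbrace{\left[\sum_\alpha m_\alpha\left(\delta_{ij}\sum_{k=1}^3 {x'}^2_{\alpha,k} - x'_{\alpha,i}\,x'_{\alpha,j}\right)\right]}_{=:\,I_{ij}} = \frac{1}{2}\sum_{i,j=1}^3\omega_i\omega_j\,I_{ij}\,.
\end{aligned} \tag{660}$$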

The rotational kinetic energy is therefore a quadratic form in the vector ω, with coefficients $I_{ij}$ that only depend on the mass distribution in the rigid body. The coefficients $I_{ij}$ can be written as a matrix, and the kinetic energy becomes a "sandwich product" of a matrix between two vectors,

$$T_\text{rot} = \frac{1}{2}\,\boldsymbol{\omega}\cdot I\cdot\boldsymbol{\omega}\,, \qquad\text{with}\quad I = \begin{pmatrix} I_{11} & I_{12} & I_{13} \\ I_{21} & I_{22} & I_{23} \\ I_{31} & I_{32} & I_{33} \end{pmatrix}. \tag{661}$$

The object I is referred to as the inertia tensor of the rigid body. A tensor associated with a physical property has slightly richer properties than a simple matrix in the sense that it has well-defined transformation properties under similarity transformations. Before looking into this, we review the properties of the matrix entries of I:

$$I = \begin{pmatrix}
\sum_\alpha m_\alpha\left({r'}^2_\alpha - {x'}^2_{\alpha,1}\right) & -\sum_\alpha m_\alpha\,x'_{\alpha,1}x'_{\alpha,2} & -\sum_\alpha m_\alpha\,x'_{\alpha,1}x'_{\alpha,3} \\
-\sum_\alpha m_\alpha\,x'_{\alpha,2}x'_{\alpha,1} & \sum_\alpha m_\alpha\left({r'}^2_\alpha - {x'}^2_{\alpha,2}\right) & -\sum_\alpha m_\alpha\,x'_{\alpha,2}x'_{\alpha,3} \\
-\sum_\alpha m_\alpha\,x'_{\alpha,3}x'_{\alpha,1} & -\sum_\alpha m_\alpha\,x'_{\alpha,3}x'_{\alpha,2} & \sum_\alpha m_\alpha\left({r'}^2_\alpha - {x'}^2_{\alpha,3}\right)
\end{pmatrix}. \tag{662}$$

The $x'_{\alpha,j}$ refer to the j-th component of the position of mass element α with respect to the center of mass of the body, and ${r'}^2_\alpha = {x'}^2_{\alpha,1} + {x'}^2_{\alpha,2} + {x'}^2_{\alpha,3}$ is the square of the distance of mass element α from the center of mass. The tensor entries are symmetric under index exchange:

$$I_{ij} = I_{ji}\,, \tag{663}$$

which reduces the independent tensor elements to the three diagonal terms $I_{11}$, $I_{22}$, and $I_{33}$ (referred to as moments of inertia), and three independent off-diagonal elements $I_{12} = I_{21}$, $I_{23} = I_{32}$, and $I_{13} = I_{31}$ (referred to as products of inertia).
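For a body made of a few point masses, (660) to (662) translate directly into code. The sketch below (hypothetical masses and positions; the helper names are illustrative) first shifts the positions to the center of mass, builds I from the sum in (660), and checks that it is symmetric as stated in (663) and that ½ ω·I·ω reproduces the rotational energy ½ Σ m_α(ω × r′_α)² of (657).

```python
import numpy as np


def inertia_tensor(masses, positions):
    """Inertia tensor (662) of point masses about their center of mass.

    masses: shape (N,), positions: shape (N, 3) in an arbitrary body-fixed frame.
    """
    masses = np.asarray(masses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    com = masses @ positions / masses.sum()
    rel = positions - com                      # x'_{alpha, i} relative to the COM
    r_sq = np.sum(rel**2, axis=1)              # r'^2_alpha
    # I_ij = sum_alpha m_alpha (delta_ij r'^2_alpha - x'_{alpha,i} x'_{alpha,j})
    return (np.eye(3) * np.sum(masses * r_sq)
            - np.einsum("a,ai,aj->ij", masses, rel, rel))


def rotational_energy_direct(masses, positions, omega):
    """T_rot = (1/2) sum_alpha m_alpha (omega x r'_alpha)^2, as in (657)."""
    masses = np.asarray(masses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    rel = positions - masses @ positions / masses.sum()
    return 0.5 * np.sum(masses * np.sum(np.cross(omega, rel)**2, axis=1))


if __name__ == "__main__":
    m = [1.0, 2.0, 0.5, 1.5]                   # hypothetical masses
    x = [[0.0, 0.0, 0.0], [1.0, 0.2, -0.3],
         [-0.4, 1.1, 0.5], [0.3, -0.8, 0.9]]   # hypothetical positions
    w = np.array([0.3, -1.2, 0.7])             # hypothetical angular velocity
    I = inertia_tensor(m, x)
    print(I)
    print(0.5 * w @ I @ w, rotational_energy_direct(m, x, w))
```

As a simple sanity check, two unit masses at x′1 = ±1 give I = diag(0, 2, 2): there is no moment of inertia for rotations about the line through both masses.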

12.2.2 Evaluation of the tensor elements


As shown in the figure below for the example of axis x′3, the moments of inertia (diagonal elements of I) in (662) contain the difference

$${r'}^2_\alpha - {x'}^2_{\alpha,3} = {r'}^2_{\alpha,\perp}\,, \tag{664}$$

where $r'_{\alpha,\perp}$ is the distance of mass element α from the rotation axis. This reproduces the form of the moment of inertia known from elementary physics courses.

[Figure: mass element m_α at position r′_α relative to the COM, with component x′_{α,3} along the x′3 axis and distance r_{α,⊥} from that axis]

To evaluate the inertia tensor elements for continuous solids, the sum over all masses in (662) is replaced by an integral over the volume V of the body, and the masses $m_\alpha$ by a (possibly position-dependent) mass density ρ(r′):

$$I_{ij} = \int_V d^3x'\,\rho(\mathbf{r}')\left[\delta_{ij}\,{r'}^2 - x'_i\,x'_j\right], \tag{665}$$

with volume element $d^3x'$ for the integration.
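As a worked example of (665), added here for illustration, consider a homogeneous rectangular block of mass M with edge lengths a, b, c along the axes x′1, x′2, x′3, constant density ρ = M/(abc), and the origin at its center of mass. Using $\int_{-b/2}^{b/2} {x'_2}^2\,dx'_2 = b^3/12$, the tensor elements follow as

$$\begin{aligned}
I_{11} &= \rho\int_{-a/2}^{a/2}\!dx'_1\int_{-b/2}^{b/2}\!dx'_2\int_{-c/2}^{c/2}\!dx'_3\,\left({x'_2}^2 + {x'_3}^2\right) = \rho\,a\left(\frac{b^3}{12}\,c + b\,\frac{c^3}{12}\right) = \frac{M}{12}\left(b^2 + c^2\right),\\
I_{22} &= \frac{M}{12}\left(a^2 + c^2\right), \qquad I_{33} = \frac{M}{12}\left(a^2 + b^2\right),\\
I_{12} &= -\rho\int_V x'_1\,x'_2\,d^3x' = 0\,, \qquad\text{and likewise}\quad I_{13} = I_{23} = 0\,,
\end{aligned}$$

since the off-diagonal integrands are odd in at least one coordinate. For this symmetric body, the inertia tensor is therefore already diagonal in coordinates aligned with its edges.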


12.2.3 Angular momentum of a rigid body


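From (123) in section 4.2, a rigid body has a total angular momentum

$$\mathbf{L} = \sum_\alpha \mathbf{r}_\alpha\times\mathbf{p}_\alpha = \mathbf{R}\times\mathbf{P} + \sum_\alpha \mathbf{r}'_\alpha\times\mathbf{p}'_\alpha\,, \tag{666}$$

with center-of-mass position R and total linear momentum P. In the reference frame F′ fixed to the body (with its origin at the center of mass), this simplifies to

$$\mathbf{L}' = \sum_\alpha \mathbf{r}'_\alpha\times\mathbf{p}'_\alpha\,. \tag{667}$$

The velocity $\mathbf{v}'_\alpha$ in $\mathbf{p}'_\alpha$ is given by (655) in the inertial reference frame F. It is taken with respect to the center of mass of the body, where V′ = 0, so

$$\mathbf{p}'_\alpha = m_\alpha\,\mathbf{v}'_\alpha = m_\alpha\,(\boldsymbol{\omega}\times\mathbf{r}'_\alpha)\,. \tag{668}$$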
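With this, the total angular momentum (667) becomes

$$\mathbf{L}' = \sum_\alpha m_\alpha\,\mathbf{r}'_\alpha\times(\boldsymbol{\omega}\times\mathbf{r}'_\alpha) = \sum_\alpha m_\alpha\left[{r'}^2_\alpha\,\boldsymbol{\omega} - \mathbf{r}'_\alpha\,(\mathbf{r}'_\alpha\cdot\boldsymbol{\omega})\right], \tag{669}$$

where in the last step, we made use of the vector product identity

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{a}) = (\mathbf{a}\cdot\mathbf{a})\,\mathbf{b} - \mathbf{a}\,(\mathbf{a}\cdot\mathbf{b})\,. \tag{670}$$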
Expression (669) has two terms that are linear in ω; to see this better, we look at the components of L′:

$$\begin{aligned}
L'_i &= \sum_\alpha m_\alpha\left[{r'}^2_\alpha\,\omega_i - x'_{\alpha,i}\sum_j x'_{\alpha,j}\,\omega_j\right] \\
&= \sum_\alpha m_\alpha\left[\sum_j \omega_j\,\delta_{ij}\sum_k {x'}^2_{\alpha,k} - \sum_j x'_{\alpha,i}\,x'_{\alpha,j}\,\omega_j\right] \\
&= \sum_j\underbrace{\left[\sum_\alpha m_\alpha\left(\delta_{ij}\sum_k {x'}^2_{\alpha,k} - x'_{\alpha,i}\,x'_{\alpha,j}\right)\right]}_{=I_{ij}}\omega_j = \sum_j I_{ij}\,\omega_j
\end{aligned} \tag{671}$$

with the same tensor components $I_{ij}$ as in (660). The last expression is the result of multiplying the tensor I with the vector ω from the right, so the total angular momentum can also be written as

$$\mathbf{L}' = I\cdot\boldsymbol{\omega}\,, \tag{672}$$

which is the rotational analog of the expression p = mv for the linear momentum. The inertia tensor I therefore takes the role of the inertial property of the object for rotations, but it is not a simple scalar like the mass m. The total rotational energy (661) can be expressed as a scalar product using the angular momentum:

$$T_\text{rot} = \frac{1}{2}\,\boldsymbol{\omega}\cdot(I\cdot\boldsymbol{\omega}) = \frac{1}{2}\,\boldsymbol{\omega}\cdot\mathbf{L}'\,. \tag{673}$$
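A numerical counterpart of (669) to (673): for hypothetical point masses (positions shifted to their center of mass) and an arbitrary ω, the sketch below computes L′ directly as Σ m_α r′_α × (ω × r′_α) and compares it with I·ω, where I is built from (662). It also prints the angle between L′ and ω, which is in general not zero, and checks (673). Function names are illustrative.

```python
import numpy as np


def inertia_tensor_com(masses, rel):
    """I_ij = sum m (delta_ij r'^2 - x'_i x'_j) for positions rel relative to the COM."""
    r_sq = np.sum(rel**2, axis=1)
    return np.eye(3) * np.sum(masses * r_sq) - np.einsum("a,ai,aj->ij", masses, rel, rel)


def angular_momentum_direct(masses, rel, omega):
    """L' = sum_alpha m_alpha r'_alpha x (omega x r'_alpha), equation (669)."""
    return np.sum(masses[:, None] * np.cross(rel, np.cross(omega, rel)), axis=0)


if __name__ == "__main__":
    masses = np.array([1.0, 2.0, 0.5])
    pos = np.array([[1.0, 0.0, 0.2], [-0.3, 0.7, -0.4], [0.5, -1.0, 1.2]])
    rel = pos - masses @ pos / masses.sum()      # shift to the center of mass
    omega = np.array([0.2, 0.5, -1.0])           # hypothetical angular velocity
    I = inertia_tensor_com(masses, rel)
    L_direct = angular_momentum_direct(masses, rel, omega)
    print(L_direct, I @ omega)                   # (672)
    cos_angle = L_direct @ omega / (np.linalg.norm(L_direct) * np.linalg.norm(omega))
    print("cos(angle between L' and omega):", cos_angle)
    print(0.5 * omega @ L_direct, 0.5 * omega @ I @ omega)   # (673)
```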


12.2.4 Transformation properties of the inertia tensor


In section 1.4, we briefly mentioned that vectors and tensors describing physical properties can be identified by their transformation properties under similarity transformations M. To see this, we take (672) and change the notation slightly, since the prime in L′ only indicated that the angular momentum is taken with respect to a coordinate system originating in the COM of the rigid body. We now consider two versions of (672) in two coordinate systems F and F′ that both originate in the COM of the rigid body:

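$$\mathbf{L} = \mathbf{I} \cdot \boldsymbol{\omega} \quad\text{and}\quad \mathbf{L}' = \mathbf{I}' \cdot \boldsymbol{\omega}' . \tag{674}$$

Both L and ω are vectors, so they obey the transformation rules

$$\mathbf{L}' = \mathbf{M} \cdot \mathbf{L} \quad\text{and}\quad \boldsymbol{\omega}' = \mathbf{M} \cdot \boldsymbol{\omega} \tag{675}$$

with the same matrix M. Using this in (674) leads to

$$\mathbf{M} \cdot \mathbf{I} \cdot \boldsymbol{\omega} = \mathbf{M} \cdot \mathbf{L} = \mathbf{L}' = \mathbf{I}' \cdot \boldsymbol{\omega}' = \mathbf{I}' \cdot \mathbf{M} \cdot \boldsymbol{\omega} . \tag{676}$$

This equality holds for all vectors ω, leading to the matrix identity

$$\mathbf{M} \cdot \mathbf{I} = \mathbf{I}' \cdot \mathbf{M} \tag{677}$$

or

$$\mathbf{I}' = \mathbf{M} \cdot \mathbf{I} \cdot \mathbf{M}^{-1} , \tag{678}$$

which is the matrix version of the componentwise tensor transformation rule (43). This can be seen using

$$(\mathbf{M}^{-1})_{ij} = (\mathbf{M}^T)_{ij} = (\mathbf{M})_{ji} \tag{679}$$

for the orthogonal similarity transformations M considered here. Then,

$$I'_{ij} = (\mathbf{I}')_{ij} = \left( \mathbf{M} \cdot \mathbf{I} \cdot \mathbf{M}^{-1} \right)_{ij} = \sum_{kl} (\mathbf{M})_{ik}\, (\mathbf{I})_{kl}\, (\mathbf{M}^{-1})_{lj} = \sum_{kl} m_{ik}\, I_{kl}\, m_{jl} , \tag{680}$$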

which is exactly the required transformation rule (43) for a tensor of rank 2. So
the inertia tensor I is a property of the rigid body, independent of the chosen
coordinate system in which it is described.
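The transformation rule (678) can be checked numerically: expressing the point-mass positions in a rotated basis via an orthogonal matrix M and recomputing the tensor from the new components should reproduce M · I · Mᵀ. This is a sketch with illustrative masses and an arbitrarily chosen rotation; building M with the Rodrigues formula is our own choice and not part of the notes.

```python
import math

def inertia_tensor(masses, positions):
    """I_ij = sum_alpha m_alpha (delta_ij |r_alpha|^2 - x_{alpha,i} x_{alpha,j})."""
    I = [[0.0]*3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x*x for x in r)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((r2 if i == j else 0.0) - r[i]*r[j])
    return I

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation(axis, angle):
    """Orthogonal rotation matrix around the unit vector `axis` (Rodrigues formula)."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    C = 1 - c
    return [[c + x*x*C, x*y*C - z*s, x*z*C + y*s],
            [y*x*C + z*s, c + y*y*C, y*z*C - x*s],
            [z*x*C - y*s, z*y*C + x*s, c + z*z*C]]

masses = [1.0, 2.0, 1.0]
positions = [[1.0, 0.0, 0.5], [-0.5, 0.5, -0.25], [0.0, -1.0, 0.0]]
M = rotation([1/math.sqrt(3)]*3, 0.7)          # illustrative change of basis F -> F'

I = inertia_tensor(masses, positions)
# components of the same positions in F': x' = M . x
positions_F2 = [[sum(M[i][k]*r[k] for k in range(3)) for i in range(3)] for r in positions]
I_direct = inertia_tensor(masses, positions_F2)
I_rule = mat_mul(mat_mul(M, I), transpose(M))  # (678), using M^-1 = M^T from (679)
print(I_direct)
print(I_rule)
```

Both printed matrices agree up to rounding, as expected for a rank-2 tensor.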


12.2.5 Principal rotation axes of a rigid body


Relation (672) states that the inertia tensor I linearly maps the vector ω onto another vector L; if ω and L are parallel, ω (and hence L) is an eigenvector of I. Since I is real and symmetric (Iij = Iji, see (663)), it has a set of three orthogonal eigenvectors i1, i2, i3. In a coordinate system whose base vectors ei are parallel to the eigenvectors ii, the inertia tensor takes a simple diagonal form:
$$\mathbf{I} = \begin{pmatrix} I_{11} & 0 & 0 \\ 0 & I_{22} & 0 \\ 0 & 0 & I_{33} \end{pmatrix} =: \begin{pmatrix} I_1 & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{pmatrix} . \tag{681}$$
The directions ei are called principal rotation axes, and for a rotation around them, the angular momentum L is parallel to ω. The evaluation of the inertia tensor is particularly simple in coordinates aligned with the principal rotation axes, as only the three diagonal elements need to be evaluated.
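Finding the principal axes amounts to diagonalizing the real symmetric matrix I. The sketch below applies the classical Jacobi rotation method (a standard technique for symmetric matrices, chosen here because the notes do not prescribe a numerical method) to a hypothetical inertia tensor; in practice a library routine such as numpy.linalg.eigh does the same job.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jacobi_eigen(A, tol=1e-13, max_sweeps=50):
    """Eigenvalues and unit eigenvectors of a real symmetric 3x3 matrix (cyclic Jacobi)."""
    a = [row[:] for row in A]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        if max(abs(a[i][j]) for i in range(3) for j in range(3) if i != j) < tol:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            # plane rotation angle chosen so that the rotated a[p][q] vanishes
            theta = 0.5*math.atan2(2*a[p][q], a[q][q] - a[p][p])
            c, s = math.cos(theta), math.sin(theta)
            J = [[float(i == j) for j in range(3)] for i in range(3)]
            J[p][p] = J[q][q] = c
            J[p][q], J[q][p] = s, -s
            a = mat_mul(mat_mul(transpose(J), a), J)
            V = mat_mul(V, J)          # columns of V accumulate the eigenvectors
    values = [a[i][i] for i in range(3)]
    vectors = [[V[k][i] for k in range(3)] for i in range(3)]
    return values, vectors

# hypothetical inertia tensor (arbitrary units), not aligned with its principal axes
I = [[2.0, 0.3, -0.1],
     [0.3, 3.0, 0.2],
     [-0.1, 0.2, 4.0]]
values, vectors = jacobi_eigen(I)
print(values)     # principal moments (unsorted)
print(vectors)    # corresponding principal axes, mutually orthogonal
```

The eigenvectors returned form the rows of the orthogonal matrix that rotates I into the diagonal form (681).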

12.2.6 Relation between body symmetries and the inertia tensor


Evaluating the inertia tensor is simple in the principal rotation axes, but it is not always obvious how these axes are oriented. Often the symmetry of the body indicates where they are; at the very least, symmetries can help to reduce the calculation effort. To see how this works, we first consider a body with a mirror symmetry. The example below has a mirror symmetry with respect to the plane M1 perpendicular to the x1 axis:

[Figure: body with mirror plane M1 perpendicular to the x1 axis; coordinate axes x1, x2, x3 with origin O.]

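Since the mirror transformation maps the body onto itself, the inertia tensor, as a body property, must also be invariant under this transformation, represented by a matrix M1:
$$\mathbf{I} = \mathbf{I}' = \mathbf{M}_1 \cdot \mathbf{I} \cdot \mathbf{M}_1^{-1} \quad\text{with}\quad \mathbf{M}_1 = \mathbf{M}_1^{-1} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} . \tag{682}$$
The symmetry requirement is equivalent to
$$\mathbf{I} \cdot \mathbf{M}_1 = \mathbf{M}_1 \cdot \mathbf{I} , \tag{683}$$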


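which we now explicitly check by calculating the matrix products:
$$\mathbf{I} \cdot \mathbf{M}_1 = \begin{pmatrix} -I_{11} & I_{12} & I_{13} \\ -I_{21} & I_{22} & I_{23} \\ -I_{31} & I_{32} & I_{33} \end{pmatrix} \quad\text{and}\quad \mathbf{M}_1 \cdot \mathbf{I} = \begin{pmatrix} -I_{11} & -I_{12} & -I_{13} \\ I_{21} & I_{22} & I_{23} \\ I_{31} & I_{32} & I_{33} \end{pmatrix} . \tag{684}$$
Since both matrices need to be equal, a direct comparison shows that I12 = I21 = I13 = I31 = 0. The inertia tensor of a body with mirror symmetry with respect to the plane perpendicular to the x1 axis therefore has the form
$$\mathbf{I} = \begin{pmatrix} \cdot & 0 & 0 \\ 0 & \cdot & \cdot \\ 0 & \cdot & \cdot \end{pmatrix} , \tag{685}$$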
where the dots indicate entries that need not vanish. This matrix is block diagonal, so a vector parallel to the x1 axis is an eigenvector of I, i.e., x1 is a principal rotation axis of the rigid body. Similarly, one can show that a mirror symmetry with respect to the plane perpendicular to the x3 axis implies the form
$$\mathbf{I} = \begin{pmatrix} \cdot & \cdot & 0 \\ \cdot & \cdot & 0 \\ 0 & 0 & \cdot \end{pmatrix} , \tag{686}$$
so the x3 axis is also a principal rotation axis. Combining (685) and (686), all off-diagonal elements vanish, so the x2 axis is a principal rotation axis as well, and the tensor I is diagonal in the shown coordinates. This method of identifying vanishing tensor entries due to symmetries requires very little effort, and simplifies the determination of the tensor entries considerably.
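The vanishing pattern in (685) and (686) is easy to confirm numerically. This sketch builds a hypothetical body from arbitrary unit point masses, adds mirror images to enforce the mirror symmetries, and inspects the off-diagonal tensor elements; the point set and the helper names are illustrative assumptions (the base points are chosen so that the center-of-mass ends up at the origin).

```python
def inertia_tensor(masses, positions):
    """I_ij = sum m (delta_ij |r|^2 - x_i x_j) about the origin."""
    I = [[0.0]*3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x*x for x in r)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((r2 if i == j else 0.0) - r[i]*r[j])
    return I

def with_mirror_images(points, axis):
    """Add the image of every point under x_axis -> -x_axis (mirror plane perpendicular to that axis)."""
    out = []
    for r in points:
        image = list(r)
        image[axis] = -image[axis]
        out += [list(r), image]
    return out

base = [[0.4, 1.0, 0.3], [1.2, -0.5, 0.8], [0.1, -0.5, -1.1]]   # arbitrary unit masses
body_x1 = with_mirror_images(base, 0)          # mirror symmetry with respect to plane M1
body_x1_x3 = with_mirror_images(body_x1, 2)    # additionally mirror symmetric in x3

I_x1 = inertia_tensor([1.0]*len(body_x1), body_x1)
I_x1_x3 = inertia_tensor([1.0]*len(body_x1_x3), body_x1_x3)
print(I_x1)       # I12 = I13 = 0, while I23 is generally nonzero: form (685)
print(I_x1_x3)    # all off-diagonal entries vanish: I is diagonal
```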
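The shown example has another symmetry: it is invariant under rotations around the x3 axis by 120°. We consider the implications of this symmetry by examining the invariance under a rotation by a general angle α around the x3 axis:
$$\mathbf{R}_3(\alpha) \cdot \mathbf{I} = \mathbf{I} \cdot \mathbf{R}_3(\alpha) \quad\text{with}\quad \mathbf{R}_3(\alpha) = \begin{pmatrix} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{pmatrix} , \tag{687}$$
where c = cos α and s = sin α. Carrying out the two multiplications leads to
$$\mathbf{R}_3(\alpha) \cdot \mathbf{I} = \begin{pmatrix} cI_{11} - sI_{21} & cI_{12} - sI_{22} & cI_{13} - sI_{23} \\ sI_{11} + cI_{21} & sI_{12} + cI_{22} & sI_{13} + cI_{23} \\ I_{31} & I_{32} & I_{33} \end{pmatrix} ,$$
$$\mathbf{I} \cdot \mathbf{R}_3(\alpha) = \begin{pmatrix} cI_{11} + sI_{12} & -sI_{11} + cI_{12} & I_{13} \\ cI_{21} + sI_{22} & -sI_{21} + cI_{22} & I_{23} \\ cI_{31} + sI_{32} & -sI_{31} + cI_{32} & I_{33} \end{pmatrix} , \tag{688}$$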
which we now explore more carefully. Comparing the (1, 1) entries of both matrices, we find
$$-sI_{21} = sI_{12} = sI_{21} \tag{689}$$


because of the symmetry Iij = Iji from (663). For rotation angles α ≠ 0, π, … where s ≠ 0, this requires that
$$I_{12} = I_{21} = 0 . \tag{690}$$
Comparing the (1, 2) entries in (688), which give −sI22 = −sI11, then also requires
$$I_{11} = I_{22} . \tag{691}$$

Next, we compare the (3, 1) entries in the products (688):
$$I_{31} = cI_{31} + sI_{32} \quad\rightarrow\quad \frac{I_{31}}{I_{32}} = \frac{s}{1-c} \tag{692}$$
for a non-vanishing I32. The same procedure for the (3, 2) entries gives
$$I_{32} = -sI_{31} + cI_{32} \quad\rightarrow\quad \frac{I_{31}}{I_{32}} = \frac{1-c}{-s} . \tag{693}$$
For s ≠ 0, the two ratios have opposite signs (equating them would require s² = −(1 − c)²), so both conditions can only be fulfilled if
$$I_{31} = I_{32} = I_{13} = I_{23} = 0 . \tag{694}$$
With these requirements, the inertia tensor of a body that is symmetric under a rotation around the x3 axis by an angle other than a multiple of 180° has the simple form
$$\mathbf{I} = \begin{pmatrix} I_{11} & 0 & 0 \\ 0 & I_{11} & 0 \\ 0 & 0 & I_{33} \end{pmatrix} , \tag{695}$$
i.e., the axis of rotational symmetry is a principal rotation axis, and any axis in the (x1, x2) plane is a principal rotation axis as well, with degenerate moments of inertia. A body with such properties is referred to as a symmetric top.
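A similar check works for the rotational symmetry: a hypothetical body made of three copies of an arbitrary point-mass cluster, rotated by 0°, 120° and 240° around x3, should give a tensor of the form (695). The cluster coordinates and masses are illustrative assumptions, chosen so that the center-of-mass lies at the origin.

```python
import math

def inertia_tensor(masses, positions):
    """I_ij = sum m (delta_ij |r|^2 - x_i x_j) about the origin."""
    I = [[0.0]*3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x*x for x in r)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((r2 if i == j else 0.0) - r[i]*r[j])
    return I

def rotate_x3(r, alpha):
    """Rotate a position vector by the angle alpha around the x3 axis, cf. R3(alpha) in (687)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [c*r[0] - s*r[1], s*r[0] + c*r[1], r[2]]

cluster_masses = [1.0, 2.5]
cluster = [[1.0, 0.2, 0.5], [0.3, -0.7, -0.2]]     # chosen so that sum m x3 = 0
masses, points = [], []
for k in range(3):                                  # threefold symmetry around x3
    for m, r in zip(cluster_masses, cluster):
        masses.append(m)
        points.append(rotate_x3(r, 2*math.pi*k/3))

I = inertia_tensor(masses, points)
for row in I:
    print(row)    # form (695): I11 = I22 and vanishing off-diagonal entries
```

Replacing the three copies by two copies rotated by 180° only removes I13 and I23; it does not enforce I11 = I22, in line with the restriction to angles other than multiples of 180°.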
The use of symmetries to determine the number of independent entries in tensors goes far beyond the inertia tensor: in many areas of physics, symmetries (e.g., of materials due to their crystalline structure) determine whether a physical property described by a tensor takes a particular form, or whether it can be present at all.

12.2.7 Inertia tensor for displaced rotation axes


In the derivation of the inertia tensor components, we assumed that the mass distribution is described in a coordinate system centered at the center-of-mass of the rigid body. However, a rotation often takes place around an axis that does not pass through the center-of-mass, so it is useful to calculate the inertia tensor elements with respect to a point Q that is displaced from the center-of-mass by a vector −a:


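[Figure: position rα of a mass element relative to Q and r′α relative to the COM; Q is displaced from the COM by −a.]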

To find the inertia tensor elements Jij with respect to the new center Q, we recall expression (660) for the tensor elements in the center-of-mass system,
$$I_{ij} = \sum_\alpha m_\alpha \Big[ \delta_{ij} \Big( \sum_{k=1}^3 x'^2_{\alpha,k} \Big) - x'_{\alpha,i}\, x'_{\alpha,j} \Big] , \tag{696}$$
α k=1

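and evaluate the expression for the displaced positions rα = r′α + a of the masses:
$$\begin{aligned} J_{ij} &= \sum_\alpha m_\alpha \Big[ \delta_{ij} \Big( \sum_{k=1}^3 (x'_{\alpha,k} + a_k)^2 \Big) - (x'_{\alpha,i} + a_i)(x'_{\alpha,j} + a_j) \Big] \\ &= \underbrace{\sum_\alpha m_\alpha \Big[ \delta_{ij} \Big( \sum_{k=1}^3 x'^2_{\alpha,k} \Big) - x'_{\alpha,i}\, x'_{\alpha,j} \Big]}_{=I_{ij}} \\ &\quad + \sum_\alpha m_\alpha \Big[ \delta_{ij} \Big( \sum_{k=1}^3 \big( a_k^2 + 2 x'_{\alpha,k}\, a_k \big) \Big) - \big( a_i x'_{\alpha,j} + a_j x'_{\alpha,i} + a_i a_j \big) \Big] \end{aligned} \tag{697}$$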

In the last line, all terms that contain a single x′α component appear in a sum over all α, weighted with the mass mα. Since the x′α,k are positions relative to the center-of-mass,
$$\sum_\alpha m_\alpha\, x'_{\alpha,k} = 0 \tag{698}$$

by definition. This simplifies the expression for the new tensor elements:
$$J_{ij} = I_{ij} + \sum_\alpha m_\alpha \Big[ \delta_{ij} \sum_{k=1}^3 a_k^2 - a_i a_j \Big] = I_{ij} + M \left[ \delta_{ij}\, a^2 - a_i a_j \right] , \tag{699}$$
where M is the total mass of the rigid body. Thus, the inertia tensor J with respect to a rotation center Q other than the center-of-mass is the sum of the inertia tensor I of the body with respect to its center-of-mass and the inertia tensor of a single point mass M displaced by the vector a from the rotation center. This is the so-called parallel-axis or Huygens-Steiner theorem²⁷.
²⁷ After Christiaan Huygens, 1629-1695, and Jakob Steiner, 1796-1863.
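The parallel-axis theorem (699) can be verified directly: compute J from the displaced positions rα = r′α + a and compare it with I + M[δij a² − ai aj]. Masses, positions and a are illustrative values (assumptions), with the center-of-mass at the origin as (698) requires.

```python
def inertia_tensor(masses, positions):
    """I_ij = sum m (delta_ij |r|^2 - x_i x_j) about the origin of the given positions."""
    I = [[0.0]*3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x*x for x in r)
        for i in range(3):
            for j in range(3):
                I[i][j] += m*((r2 if i == j else 0.0) - r[i]*r[j])
    return I

masses = [1.0, 2.0, 1.0]
positions_com = [[1.0, 0.0, 0.5], [-0.5, 0.5, -0.25], [0.0, -1.0, 0.0]]  # r'_alpha, sum m r' = 0
a = [0.2, -0.4, 1.1]                      # illustrative displacement vector

I = inertia_tensor(masses, positions_com)
# direct evaluation with r_alpha = r'_alpha + a, i.e., relative to Q
J_direct = inertia_tensor(masses, [[x + ak for x, ak in zip(r, a)] for r in positions_com])

M_total = sum(masses)
a2 = sum(ak*ak for ak in a)
J_steiner = [[I[i][j] + M_total*((a2 if i == j else 0.0) - a[i]*a[j]) for j in range(3)]
             for i in range(3)]          # (699)
print(J_direct)
print(J_steiner)
```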


12.3 Equation of motion of a rotating rigid body


To describe the dynamics of a rotating body, we first choose a coordinate system F′ fixed to the body that is aligned with the principal rotation axes. Then, the inertia tensor is diagonal, and the rotational energy (661) and angular momentum (672) are given by
$$T_\text{rot} = \frac{1}{2} \sum_{i=1}^3 I_i\, \omega_i^2 \quad\text{and}\quad L_i = I_i\, \omega_i . \tag{700}$$
The influence of external forces on the body is captured by a torque N acting on the body; we recall (129):
$$\frac{d\mathbf{L}}{dt} = \mathbf{N} . \tag{701}$$
This relation was derived in an inertial system, but it is most convenient to describe the dynamics of rotation in the coordinate system F′ attached to the body. Therefore, we use the transformation rule (634) for time derivatives of vectors between reference systems:
$$\mathbf{N} = \left.\frac{d\mathbf{L}}{dt}\right|_F = \left.\frac{d\mathbf{L}}{dt}\right|_{F'} + \boldsymbol{\omega} \times \mathbf{L} . \tag{702}$$
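We now evaluate this equation for one component in the reference system F′ attached to the body:
$$\begin{aligned} \left.\frac{dL_1}{dt}\right|_F = N_1 &= \left.\frac{dL_1}{dt}\right|_{F'} + \omega_2 L_3 - \omega_3 L_2 \\ &= \left.\frac{d(I_1 \omega_1)}{dt}\right|_{F'} + \omega_2 I_3 \omega_3 - \omega_3 I_2 \omega_2 \\ &= I_1 \dot\omega_1 + \omega_3 \omega_2 (I_3 - I_2) . \end{aligned} \tag{703}$$
The equations for the other components can be obtained by cyclic permutation of the indices:
$$N_2 = I_2 \dot\omega_2 + \omega_1 \omega_3 (I_1 - I_3) \quad\text{and}\quad N_3 = I_3 \dot\omega_3 + \omega_2 \omega_1 (I_2 - I_1) . \tag{704}$$
These equations can be summarized in a compact notation with the totally antisymmetric symbol εijk:
$$(I_i - I_j)\, \omega_i \omega_j - \sum_k \epsilon_{ijk} \left( I_k \dot\omega_k - N_k \right) = 0 . \tag{705}$$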

These equations are referred to as Euler's²⁸ equations of motion for the rigid body. They look relatively innocent, but what makes them hard to solve in practice is that the torque N must also be expressed in the reference system F′ attached to the rotating body, which may require knowledge of the instantaneous orientation of the body in an external inertial reference frame.
²⁸ Same Leonhard Euler as before.
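Euler's equations (703) and (704) are first-order differential equations for the body-frame components ωi, so they can be integrated numerically once N is known in F′. The sketch below uses a classical fourth-order Runge-Kutta step (our choice, not prescribed by the notes) for the torque-free case with illustrative moments of inertia, and monitors the rotational energy (700) and |L|², which should both stay constant.

```python
def euler_rhs(omega, I, N=(0.0, 0.0, 0.0)):
    """Time derivatives of the body-frame components omega_i from (703) and (704)."""
    I1, I2, I3 = I
    w1, w2, w3 = omega
    return [(N[0] - (I3 - I2)*w2*w3)/I1,
            (N[1] - (I1 - I3)*w3*w1)/I2,
            (N[2] - (I2 - I1)*w1*w2)/I3]

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt*ki for yi, ki in zip(y, k3)])
    return [yi + dt/6*(a + 2*b + 2*c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

I = (1.0, 2.0, 3.0)            # illustrative principal moments
omega = [0.1, 1.0, 0.2]        # illustrative initial body-frame angular velocity
dt, steps = 1e-3, 5000

def energy(w):
    return 0.5*sum(Ii*wi*wi for Ii, wi in zip(I, w))      # T_rot from (700)

def L_squared(w):
    return sum((Ii*wi)**2 for Ii, wi in zip(I, w))        # |L|^2 with L_i = I_i w_i

E0, L0 = energy(omega), L_squared(omega)
for _ in range(steps):
    omega = rk4_step(lambda w: euler_rhs(w, I), omega, dt)
print(energy(omega) - E0, L_squared(omega) - L0)          # both close to zero
```

A non-zero torque can be passed as N, but only if it is already expressed in F′, which is exactly the difficulty mentioned above.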


12.4 Force-free rotation of a symmetric top


The solution of Euler's equations of motion (705) becomes relatively simple if no torque acts on the body and it evolves freely. We start with the motion of a symmetric top with moments of inertia I1 = I2 ≠ I3. The equations of motion take the form
$$\begin{aligned} (I_1 - I_3)\, \omega_2 \omega_3 - I_1 \dot\omega_1 &= 0 \\ (I_3 - I_1)\, \omega_3 \omega_1 - I_1 \dot\omega_2 &= 0 \\ I_3 \dot\omega_3 &= 0 . \end{aligned} \tag{706}$$

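From the last equation, we find ω̇3 = 0, i.e., ω3 = const., so the first two equations can be rearranged:
$$\begin{aligned} \dot\omega_1 &= -\left( \frac{I_3 - I_1}{I_1}\, \omega_3 \right) \omega_2 = -\Omega\, \omega_2 \\ \dot\omega_2 &= \left( \frac{I_3 - I_1}{I_1}\, \omega_3 \right) \omega_1 = \Omega\, \omega_1 \end{aligned} \tag{707}$$
with a constant
$$\Omega := \frac{I_3 - I_1}{I_1}\, \omega_3 . \tag{708}$$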
This coupled set of equations can be solved easily by introducing the complex variable η := ω1 + iω2 and adding i times the second equation of (707) to the first one:
$$\dot\omega_1 + i\dot\omega_2 = i\Omega\, (\omega_1 + i\omega_2) , \quad\text{or}\quad \dot\eta - i\Omega\, \eta = 0 . \tag{709}$$
This simple linear equation of motion has the solution
$$\eta(t) = A e^{i\Omega t} = A \cos\Omega t + iA \sin\Omega t , \tag{710}$$
or, for the components of the rotation vector,
$$\omega_1(t) = A \cos\Omega t , \qquad \omega_2(t) = A \sin\Omega t , \qquad \omega_3(t) = \text{const.} \tag{711}$$

We could have introduced a phase shift in the solution (710) to meet any initial condition, but that would not substantially alter the solution for ω(t). The angular velocity vector ω (with modulus |ω| = √(ω3² + A²)) precesses around the coordinate axis x′3 in the rotating reference frame F′ with the precession angular frequency Ω from (708). Depending on the relative magnitude of I1 and I3, the precession vector Ω (indicating the precession sense of ω) is either parallel or antiparallel to e′3; for ω3 > 0, it is antiparallel for I1 > I3 and parallel for I1 < I3.
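To see that (711) really solves (706), one can integrate the torque-free Euler equations numerically for a hypothetical symmetric top (I1 = I2 = 1, I3 = 2 in arbitrary units) starting from ω(0) = (A, 0, ω3), and compare with A cos Ωt and A sin Ωt. The Runge-Kutta integrator and all parameter values are assumptions of this sketch.

```python
import math

def euler_rhs_free(w, I):
    """Torque-free Euler equations; for I1 = I2 they reduce to (706)."""
    I1, I2, I3 = I
    return [(I2 - I3)*w[1]*w[2]/I1,
            (I3 - I1)*w[2]*w[0]/I2,
            (I1 - I2)*w[0]*w[1]/I3]

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt*ki for yi, ki in zip(y, k3)])
    return [yi + dt/6*(a + 2*b + 2*c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

I1, I3 = 1.0, 2.0                 # hypothetical symmetric top, I1 = I2
I = (I1, I1, I3)
A, w3 = 0.3, 1.5
Omega = (I3 - I1)/I1*w3           # (708)

w = [A, 0.0, w3]                  # initial condition matching (711) at t = 0
dt, steps = 1e-3, 4000
max_err = 0.0
for n in range(1, steps + 1):
    w = rk4_step(lambda y: euler_rhs_free(y, I), w, dt)
    t = n*dt
    exact = [A*math.cos(Omega*t), A*math.sin(Omega*t), w3]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(w, exact)))
print(max_err)                    # small: the numerical solution follows (711)
```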


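[Figure: in the body frame F′, ω (component ω3 along e′3, component A perpendicular to it) precesses around e′3 with precession vector Ω; left panel I1 > I3, right panel I1 < I3.]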

This still describes the dynamics of ω in the reference frame of the body. To transform this back to the inertial reference frame of an observer, we note that the force-free condition (together with the assumption of no dissipation) requires that the rotational energy is constant:
$$T_\text{rot} = \frac{1}{2}\, \boldsymbol{\omega} \cdot \mathbf{L} = \text{const.} \tag{712}$$
Furthermore, the total angular momentum of the system is conserved in an inertial reference frame F, i.e., the vector L has both a constant modulus and a constant direction. Energy conservation in the form of (712) requires that the projection of ω onto L is constant; since |ω| is constant as well, so is the angle between ω and L. To understand the relative orientation of the vectors L, ω, and the principal axis of rotation e′3 (often referred to as the figure axis of the symmetric top because of the symmetry relation discussed earlier), we consider the vector product
$$\boldsymbol{\omega} \times \mathbf{e}'_3 = \omega_2\, \mathbf{e}'_1 - \omega_1\, \mathbf{e}'_2 \tag{713}$$
with the components ωi in the rotating reference frame. This vector is perpendicular to both e′3 and ω. The scalar product
$$\mathbf{L} \cdot \left( \boldsymbol{\omega} \times \mathbf{e}'_3 \right) = I_1 \omega_1 \omega_2 - I_2 \omega_2 \omega_1 = 0 \tag{714}$$
vanishes because I1 = I2 for the symmetric top. Therefore, L is perpendicular to ω × e′3, which implies that L, ω and e′3 all lie in the same plane:
[Figure: L, ω and the figure axis e′3 in a common plane for I1 > I3 (left) and I1 < I3 (right); the figure axis precesses around L.]

Depending on the ratio between I1 and I3 , the orientation of the three vectors
is as shown in the figure; here, Ω denotes the direction of ω precessing around the
figure axis e′ 3 . In both cases, the angle between the figure axis and the angular
momentum is constant, and the figure axis precesses around L fixed in space.
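The geometric statements above can be checked along the analytic solution (711): the triple product (714) should vanish at all times, and the angles between L and e′3 and between ω and L should stay fixed. The parameter values and sample times are illustrative assumptions.

```python
import math

I1, I3 = 1.0, 2.0              # illustrative symmetric top, I1 = I2
A, w3 = 0.3, 1.5
Omega = (I3 - I1)/I1*w3        # precession frequency (708)

def norm(u):
    return math.sqrt(sum(x*x for x in u))

def angle(u, v):
    cosine = sum(a*b for a, b in zip(u, v))/(norm(u)*norm(v))
    return math.acos(max(-1.0, min(1.0, cosine)))

e3 = [0.0, 0.0, 1.0]
rows = []
for t in (0.0, 0.7, 2.3, 5.1):
    w = [A*math.cos(Omega*t), A*math.sin(Omega*t), w3]       # (711)
    L = [I1*w[0], I1*w[1], I3*w[2]]                           # L_i = I_i w_i
    w_cross_e3 = [w[1], -w[0], 0.0]                           # (713)
    triple = sum(a*b for a, b in zip(L, w_cross_e3))          # (714)
    rows.append((triple, angle(L, e3), angle(w, L)))
for r in rows:
    print(r)    # triple product ~ 0, both angles identical for all t
```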


12.5 Rotation of a heavy symmetric top


As an example of a more complex rotary motion, we consider a heavy symmetric top spinning around its figure axis and touching the ground at a single point O; the distance between the contact point O and the center-of-mass C of the top is l:

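[Figure: heavy symmetric top touching the ground at O, with figure axis x′3, vertical axis x3, center-of-mass C at distance l from O, and gravitational acceleration g.]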

In section 12.2.6 we saw that a rotationally symmetric top with a body coor-
dinate system F ′ chosen to include the figure axis has a diagonal inertia tensor I
with two possibly different entries I1 and I3 .
The challenging part of this problem is the gravitational force, which leads to a non-vanishing torque N. This torque is well-defined in the inertial frame, but needs to be transformed into the body coordinate system F′ in order to use the equations of motion (705) for the dynamics of the rotation vector. Furthermore, the rotation vector ω would need to be integrated to obtain the orientation, a task that cannot be accomplished as easily as for the force-free top.
This problem can be tackled if the Euler angles φ, θ and ψ describing the orientation of the top with respect to the inertial reference frame are taken as generalized coordinates. The equations of motion are then obtained with the standard Euler-Lagrange mechanism (206). For this, an expression for the kinetic energy T is needed; for the symmetric top, it is given according to (661) and (695) by
$$T_\text{rot} = \frac{1}{2} \sum_i I_i\, \omega'^2_i = \frac{1}{2} \left[ I_1 \left( \omega'^2_1 + \omega'^2_2 \right) + I_3\, \omega'^2_3 \right] . \tag{715}$$

Note that in order to have only a rotational part of the kinetic energy, the inertia tensor elements need to be evaluated with respect to the common origin O of F and F′, and O is not the center-of-mass of the top. This is not a problem: O lies on the figure axis, so the body is still rotationally symmetric with respect to x′3, hence I is diagonal and has only two different entries I1, I3. (By the parallel-axis theorem (699) with a along x′3, I1 and I2 simply increase by Ml², while I3 is unchanged.) Using the expression (653) for the angular velocity ω in the body coordinate system, one obtains the total kinetic energy as a function of the generalized coordinates and velocities:
$$T = \frac{1}{2} I_1 \left( \dot\phi^2 \sin^2\theta + \dot\theta^2 \right) + \frac{1}{2} I_3 \left( \dot\phi \cos\theta + \dot\psi \right)^2 . \tag{716}$$


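The potential energy is simply given by the total mass and the center-of-mass position,
$$U = Mgl \cos\theta , \tag{717}$$
leading to the Lagrange function
$$\mathcal{L} = T - U = \frac{1}{2} I_1 \left( \dot\phi^2 \sin^2\theta + \dot\theta^2 \right) + \frac{1}{2} I_3 \left( \dot\phi \cos\theta + \dot\psi \right)^2 - Mgl \cos\theta . \tag{718}$$
One can immediately identify φ and ψ as cyclic coordinates, leading to two conserved generalized momenta
$$p_\phi = \frac{\partial \mathcal{L}}{\partial \dot\phi} = \left( I_1 \sin^2\theta + I_3 \cos^2\theta \right) \dot\phi + I_3 \cos\theta\, \dot\psi = \text{const.} , \tag{719}$$
$$p_\psi = \frac{\partial \mathcal{L}}{\partial \dot\psi} = I_3 \left( \dot\psi + \dot\phi \cos\theta \right) = \text{const.} \tag{720}$$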
These quantities are angular momenta; more specifically, they are the projections of the total angular momentum L onto the x3 axis and the x′3 axis, respectively. They can be used to obtain simple differential equations for the Euler angles φ and ψ:
$$\dot\phi = \frac{p_\phi - p_\psi \cos\theta}{I_1 \sin^2\theta} , \tag{721}$$
$$\dot\psi = \frac{p_\psi}{I_3} - \frac{(p_\phi - p_\psi \cos\theta) \cos\theta}{I_1 \sin^2\theta} . \tag{722}$$
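Equations (721) and (722) simply invert (719) and (720). A round trip with arbitrary, hypothetical values of I1, I3, θ, φ̇ and ψ̇ confirms the algebra:

```python
import math

def momenta(I1, I3, theta, phi_dot, psi_dot):
    """Conserved momenta p_phi (719) and p_psi (720)."""
    p_phi = (I1*math.sin(theta)**2 + I3*math.cos(theta)**2)*phi_dot + I3*math.cos(theta)*psi_dot
    p_psi = I3*(psi_dot + phi_dot*math.cos(theta))
    return p_phi, p_psi

def angle_rates(I1, I3, theta, p_phi, p_psi):
    """phi_dot from (721) and psi_dot from (722)."""
    beta = p_phi - p_psi*math.cos(theta)
    phi_dot = beta/(I1*math.sin(theta)**2)
    psi_dot = p_psi/I3 - beta*math.cos(theta)/(I1*math.sin(theta)**2)
    return phi_dot, psi_dot

I1, I3 = 0.8, 0.5                       # hypothetical moments of inertia about O
theta, phi_dot, psi_dot = 0.6, 0.4, 25.0
p_phi, p_psi = momenta(I1, I3, theta, phi_dot, psi_dot)
recovered = angle_rates(I1, I3, theta, p_phi, p_psi)
print(recovered)                        # should reproduce (0.4, 25.0) up to rounding
```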
To solve the remaining problem, i.e., the time evolution of θ, we follow an effective-potential approach similar to the central force problem in section 8.2. The system is conservative, so the total energy E is conserved:
$$E = \frac{1}{2} I_1 \left( \dot\phi^2 \sin^2\theta + \dot\theta^2 \right) + \underbrace{\frac{1}{2} I_3\, \omega'^2_3}_{= p_\psi^2 / 2I_3} + Mgl \cos\theta = \text{const.} \tag{723}$$
By subtracting the (constant) kinetic energy term due to the rotation around the figure axis from E, and using (721) to eliminate φ̇, one finds the relation
$$E - \frac{p_\psi^2}{2I_3} =: E' = \frac{1}{2} I_1 \dot\theta^2 + \frac{(p_\phi - p_\psi \cos\theta)^2}{2 I_1 \sin^2\theta} + Mgl \cos\theta =: \frac{1}{2} I_1 \dot\theta^2 + V_\text{eff}(\theta) , \tag{724}$$
introducing an effective potential Veff(θ) that depends only on the angular momentum constants pψ, pφ and the coordinate θ.
Formally, one could integrate (724) and, analogous to (328) in the central force problem, obtain a function
$$t(\theta) = \int_{\theta_0}^{\theta} \frac{d\theta'}{\sqrt{\frac{2}{I_1} \left( E' - V_\text{eff}(\theta') \right)}} , \tag{725}$$


invert the result to obtain θ(t), and use (721) and (722) to obtain the solutions for the other two coordinates φ(t), ψ(t). In practice, however, this can only reasonably be done numerically.
The type of motion can be characterized in a similar way as for the central
force problem by looking at Veff (θ):

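[Figure: effective potential Veff(θ) for 0 < θ < π with a minimum at θ0; the level E′ intersects Veff at the turning points θ1 and θ2.]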

For pφ ≠ ±pψ, the effective potential diverges for θ → 0 and θ → π because of the sin²θ term in its denominator in (724), which keeps θ strictly inside the interval between 0 and π. The effective potential has a minimum at θ0, which leads to a precession of the top at a constant angle θ0 if the excess energy E′ equals the minimal value Veff(θ0). Otherwise, θ oscillates between two extremal angles θ1 and θ2. This oscillatory motion is referred to as nutation.
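The characterization by Veff(θ) can be made concrete numerically: locate the minimum θ0 and, for a chosen excess energy E′ above it, the nutation turning points θ1 and θ2 where Veff(θ) = E′. All parameter values are hypothetical, and the sketch assumes Veff has a single minimum in (0, π), which it brackets with a coarse grid before refining by golden-section search; the turning points are found by bisection.

```python
import math

# Hypothetical parameters in arbitrary units
I1, Mgl = 1.0, 1.0
p_phi, p_psi = 1.5, 2.0        # p_phi != +-p_psi, so V_eff diverges at 0 and pi

def V_eff(theta):
    """Effective potential of the heavy symmetric top, from (724)."""
    return (p_phi - p_psi*math.cos(theta))**2/(2*I1*math.sin(theta)**2) + Mgl*math.cos(theta)

def golden_min(f, a, b, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [a, b]."""
    g = (math.sqrt(5) - 1)/2
    c, d = b - g*(b - a), a + g*(b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g*(b - a)
        else:
            a, c = c, d
            d = a + g*(b - a)
    return 0.5*(a + b)

def bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5*(lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5*(lo + hi)

eps, n = 1e-3, 2000
grid = [eps + (math.pi - 2*eps)*k/n for k in range(n + 1)]
k0 = min(range(n + 1), key=lambda k: V_eff(grid[k]))
theta0 = golden_min(V_eff, grid[max(k0 - 1, 0)], grid[min(k0 + 1, n)])

E_prime = V_eff(theta0) + 0.1          # hypothetical excess energy E'
theta1 = bisect(lambda th: V_eff(th) - E_prime, eps, theta0)
theta2 = bisect(lambda th: V_eff(th) - E_prime, theta0, math.pi - eps)
print(theta0, theta1, theta2)          # nutation between theta1 and theta2
```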

12.5.1 Precession of the heavy top without nutation


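The minimum of the effective potential that leads to a motion with a constant angle θ0 can be found by differentiating the effective potential. With the abbreviation
$$\beta := p_\phi - p_\psi \cos\theta , \tag{726}$$
which also appears in (721) and (722) for the other two Euler angles, the condition for a minimum of Veff is
$$\begin{aligned} 0 &\overset{!}{=} \left.\frac{\partial V_\text{eff}}{\partial\theta}\right|_{\theta=\theta_0} = \frac{\partial}{\partial\theta} \left[ \frac{\beta^2}{2 I_1 \sin^2\theta} + Mgl \cos\theta \right]_{\theta=\theta_0} \\ &= \frac{2\beta p_\psi \sin\theta_0 \sin^2\theta_0 - 2\beta^2 \sin\theta_0 \cos\theta_0}{2 I_1 \sin^4\theta_0} - Mgl \sin\theta_0 \\ &= \frac{\beta p_\psi \sin^2\theta_0 - \beta^2 \cos\theta_0 - Mgl I_1 \sin^4\theta_0}{I_1 \sin^3\theta_0} , \end{aligned} \tag{727}$$
or
$$\beta p_\psi \sin^2\theta_0 - \beta^2 \cos\theta_0 - Mgl I_1 \sin^4\theta_0 = 0 . \tag{728}$$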


This quadratic equation has two solutions for β:
$$\beta_\pm = \frac{p_\psi \sin^2\theta_0}{2\cos\theta_0} \left( 1 \pm \sqrt{1 - \frac{4 Mgl I_1 \cos\theta_0}{p_\psi^2}} \right) . \tag{729}$$
For the solutions to be real, the argument of the square root must be positive. Assuming that θ0 < π/2 (i.e., the center-of-mass lies above the support point O), this requires
$$4 Mgl I_1 \cos\theta_0 < p_\psi^2 = I_3^2\, \omega'^2_3 . \tag{730}$$

This condition imposes a minimum on the angular momentum projection onto the figure axis x′3, but allows for a range of possible angles θ0 as long as the top spins fast enough. If the angular momentum pψ is much larger than this minimum, the square root in (729) can be approximated,
$$\sqrt{1 - \frac{4 Mgl I_1 \cos\theta_0}{p_\psi^2}} \approx 1 - \frac{2 Mgl I_1 \cos\theta_0}{p_\psi^2} , \tag{731}$$
leading to two approximate values for β of
$$\beta_+ \approx \frac{p_\psi \sin^2\theta_0}{\cos\theta_0} \quad\text{and}\quad \beta_- \approx \frac{Mgl I_1 \sin^2\theta_0}{p_\psi} . \tag{732}$$

For the change of the orientation of the figure axis in real space, i.e., the precession speed, we use (721) and find two values corresponding to the two values for β in (732):
$$\dot\phi_+ = \frac{\beta_+}{I_1 \sin^2\theta_0} \approx \frac{p_\psi}{I_1 \cos\theta_0} = \frac{I_3\, \omega'_3}{I_1 \cos\theta_0} , \qquad \dot\phi_- = \frac{\beta_-}{I_1 \sin^2\theta_0} \approx \frac{Mgl}{p_\psi} = \frac{Mgl}{I_3\, \omega'_3} . \tag{733}$$
The slower precession rate φ̇− is what is typically observed for this problem; since it is inversely proportional to ω′3, the precession becomes faster and faster as the top slows down.
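To get a feeling for the quality of the fast-top approximation (731) to (733), the sketch below evaluates the exact roots β± from (729), checks that they satisfy (728), and compares the resulting precession rates with their approximations. The top parameters are hypothetical values in arbitrary units.

```python
import math

def precession_rates(I1, I3, Mgl, w3, theta0):
    """Exact beta_+- from (729) and phi_dot = beta/(I1 sin^2 theta0), cf. (721) and (733)."""
    p_psi = I3*w3
    x = 4*Mgl*I1*math.cos(theta0)/p_psi**2
    if not 0 < theta0 < math.pi/2 or x >= 1:
        raise ValueError("needs 0 < theta0 < pi/2 and condition (730)")
    prefactor = p_psi*math.sin(theta0)**2/(2*math.cos(theta0))
    root = math.sqrt(1 - x)
    beta_plus, beta_minus = prefactor*(1 + root), prefactor*(1 - root)
    s2 = math.sin(theta0)**2
    return beta_plus, beta_minus, beta_plus/(I1*s2), beta_minus/(I1*s2)

# hypothetical, fast-spinning top (arbitrary units)
I1, I3, Mgl, w3, theta0 = 1.0, 0.5, 1.0, 100.0, 0.5
b_plus, b_minus, phidot_plus, phidot_minus = precession_rates(I1, I3, Mgl, w3, theta0)
p_psi = I3*w3
approx_plus = p_psi/(I1*math.cos(theta0))      # (733), fast precession
approx_minus = Mgl/p_psi                       # (733), slow precession
print(phidot_plus, approx_plus)
print(phidot_minus, approx_minus)
```

For these values, 4MglI1 cos θ0/pψ² is about 1.4 × 10⁻³, and the approximations agree with the exact rates to better than 0.1 %.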


12.6 Stability of rotations of rigid bodies


Even the seemingly simple case of a force-free rotation of an object around one of its principal rotation axes is not as simple as it appears. We consider a rigid body with a coordinate system aligned to its principal rotation axes, assuming I3 > I2 > I1:

[Figure: rigid body with principal axes x′1, x′2, x′3, rotating with angular velocity ω1 around x′1.]

If the body rotates approximately, but not exactly, around the x′1 axis, the angular velocity vector can be written as
$$\boldsymbol{\omega} = \omega_1\, \mathbf{e}'_1 + \lambda\, \mathbf{e}'_2 + \mu\, \mathbf{e}'_3 , \quad\text{with}\quad \lambda, \mu \ll \omega_1 . \tag{734}$$
Euler's equations of motion for the force-free rotating body (705) determine the time dependence of the coefficients ω1, λ and μ:
$$(I_2 - I_3)\, \lambda\mu - I_1 \dot\omega_1 = 0 , \tag{735}$$
$$(I_3 - I_1)\, \mu\, \omega_1 - I_2 \dot\lambda = 0 , \tag{736}$$
$$(I_1 - I_2)\, \lambda\, \omega_1 - I_3 \dot\mu = 0 . \tag{737}$$

Since we start out with small perturbations of the rotation vector from the principal axis, we can initially approximate the product λμ in (735), which is of second order in the small quantities, by 0, so the angular velocity ω1 remains constant over time:
$$-I_1 \dot\omega_1 = 0 \quad\rightarrow\quad \omega_1 = \text{const.} \tag{738}$$
The other two equations lead to a coupled system of differential equations,
$$\dot\lambda = \frac{I_3 - I_1}{I_2}\, \omega_1\, \mu , \qquad \dot\mu = \frac{I_1 - I_2}{I_3}\, \omega_1\, \lambda . \tag{739}$$
We cannot use the same trick as for the force-free symmetric top in section 12.4, because the two coupling constants differ. Instead, we can chain the equations, effectively eliminating the variable μ:
$$\ddot\lambda = \frac{I_3 - I_1}{I_2}\, \omega_1\, \dot\mu = \frac{I_3 - I_1}{I_2}\, \frac{I_1 - I_2}{I_3}\, \omega_1^2\, \lambda \tag{740}$$


or
$$\ddot\lambda + \Omega_1^2\, \lambda = 0 \quad\text{with}\quad \Omega_1 := \omega_1 \sqrt{\frac{(I_1 - I_3)(I_1 - I_2)}{I_2 I_3}} . \tag{741}$$
This second-order linear differential equation for λ(t) has the familiar solution
$$\lambda(t) = A e^{i\Omega_1 t} + B e^{-i\Omega_1 t} . \tag{742}$$
For I1 < I3 and I1 < I2, as postulated initially, the angular frequency Ω1 is real, resulting in an oscillatory evolution of the rotation components along the axes x′2 and x′3 with a fixed amplitude. This means that small deviations of ω from the principal axis x′1 remain small.
The situation changes if the rotation takes place around the axis x′2; by cyclic permutation of the indices in (741), one obtains
$$\Omega_2 = \omega_2 \sqrt{\frac{(I_2 - I_1)(I_2 - I_3)}{I_3 I_1}} . \tag{743}$$
This time, the expression under the square root is negative, resulting in an imaginary Ω2 and therefore an exponentially growing contribution in the solution equivalent to (742). Thus, a small deviation of the angular velocity vector from the axis x′2 grows, and the rotation around x′2 is not stable. For a rotation around the axis x′3 with the largest moment of inertia, one obtains
$$\Omega_3 = \omega_3 \sqrt{\frac{(I_3 - I_2)(I_3 - I_1)}{I_1 I_2}} , \tag{744}$$
which is real-valued again, so small perturbations of the alignment of the rotation axis with x′3 remain small, similar to rotations around x′1. In summary, for a rigid body with three different principal moments of inertia, rotations around the axes with the largest and with the smallest moment of inertia are stable, whereas a rotation around the third axis is unstable.
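The stability argument above relies on linearization. A quick numerical sketch of the full nonlinear equations (735) to (737), with hypothetical moments I1 < I2 < I3, a unit spin around each principal axis in turn, and a small initial perturbation of 10⁻³ in the other two components, illustrates the same conclusion. The integrator, step size, duration and perturbation size are assumptions of this sketch.

```python
import math

def euler_rhs_free(w, I):
    """Torque-free Euler equations (735)-(737) for the body-frame components of omega."""
    I1, I2, I3 = I
    return [(I2 - I3)*w[1]*w[2]/I1,
            (I3 - I1)*w[2]*w[0]/I2,
            (I1 - I2)*w[0]*w[1]/I3]

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5*dt*ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt*ki for yi, ki in zip(y, k3)])
    return [yi + dt/6*(a + 2*b + 2*c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

I = (1.0, 2.0, 3.0)                 # hypothetical, I1 < I2 < I3
dt, steps, eps = 1e-2, 3000, 1e-3

def max_deviation(axis):
    """Largest size of the off-axis components of omega during the run."""
    w = [eps, eps, eps]
    w[axis] = 1.0
    others = [k for k in range(3) if k != axis]
    worst = 0.0
    for _ in range(steps):
        w = rk4_step(lambda y: euler_rhs_free(y, I), w, dt)
        worst = max(worst, math.hypot(w[others[0]], w[others[1]]))
    return worst

devs = [max_deviation(axis) for axis in range(3)]
print(devs)    # stays of order eps for x'1 and x'3, grows to order 1 for x'2
```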

12.6.1 Symmetric top


For a symmetric top with I1 = I2, a rotation around the axis x′1 can be treated similarly, but the equation of motion for μ in (739) becomes simpler:
$$\dot\mu = 0 \quad\rightarrow\quad \mu = \text{const.} \tag{745}$$
Consequently, the equation for λ leads to a non-oscillating solution for the perturbation component,
$$\dot\lambda = \frac{I_3 - I_1}{I_2}\, \omega_1\, \mu = \text{const.} \quad\rightarrow\quad \lambda(t) = At + B , \tag{746}$$
which grows without bound over time. Thus, the rotation of a symmetric top around an axis other than the figure axis is unstable, and only the rotation around the figure axis x′3 is stable.
