Mechanics Georgi
Getting Acquainted
On your sheet of scrap paper, please attempt to answer the following questions. Half-baked
ideas, gut feelings, and wild guesses are welcome! You may skip questions that don't inspire
you.
1. Why are clouds white?
2. What is likelier to damage your microwave, a steel ball or a steel fork?
3. Why can you jump higher by lowering yourself quickly before jumping than from a static
squat position?
4. How do .zip files work? Why does .jpg format reduce the file size of a raw bitmap file
more than .zip compression?
5. Prove that a circle has the highest possible ratio of area to perimeter.
6. Why is it that when you blow over a glass bottle with a steady stream of air, it produces a
pitch, i.e. a vibrating column of air inside the bottle?
Section is a chance to review lecture, but this usually deserves at most fifteen minutes
because you can review much more efficiently alone or with classmates. When I do review
in section, it is to summarize pithily ideas from lecture for your memory files and to give
them motivation and intuition. The real purpose of section is to solve problems. Usually
we will do a warm-up problem and one or two harder problems. I will ask a lot of difficult
questions on the way to a solution. The most obvious pitfall of easy questions is that they are
boring. Furthermore, there's no glory in getting the right answer, while the wrong answer is
embarrassing. This causes awkward silences, which are a TF's greatest fear. It doesn't cost
any pride to get a hard question wrong and it feels great to get it right. If your answer has a
sliver of truth we can run with it and everyone is happy. The same applies if it's wrong for
an interesting reason. It's not for your ego that I reward partial answers; it's simply the
way science works. Finally, when you use ideas from lecture in unexpected ways, combine
them, and generalize them, you get a deeper and more confident grasp of them.
At office hours (as well as before and after class) you can talk about anything. If something was presented in class from a point of view that doesn't work well for you, we can
approach it from a different angle. Or, you may have a few busy weeks and fall behind. If
this happens, don't be embarrassed to come in for a detailed review of weeks of material. My
opinions about section do not imply a disdain for review in general. Nothing is too simple
for office hours, including, for example, line-by-line review of lecture slides. When you have
the time, office hours are great for oddball tangents such as rigorous proofs and other areas
of physics.
Homework is mainly for your practice, but it is also a dialogue. It shows me how well
you understand things and lets me give comments accordingly. Sometimes I will just ask you to look
at the solutions, but if you make a very tempting error I try to explain why it doesn't quite
work. I also note particularly elegant and original solutions. Basically, check the margins of
your problem sets.
A Bit of Review
So far all we have seen is a bit of F = ma and solutions to the differential equations that
this implies.
4.1
As discussed in lecture, you can model air resistance with a frictional force proportional to
the square of speed. Hence
$$F = ma = -m\alpha v^2 \;\Rightarrow\; -\alpha v^2 = \frac{dv}{dt} \;\Rightarrow\; -\alpha = \frac{1}{v^2}\frac{dv}{dt}, \tag{1}$$
where we arbitrarily write the force constant as $m\alpha$ instead of just $\alpha$ to simplify our expressions. We have grouped all the $v$-dependent terms on the same side as the $dv/dt$ because
then we can integrate both sides using the change-of-variables formula. Integrating from
some initial time ti to some final time tf and changing variables from t to v(t)
$$-\alpha\int_{t_i}^{t_f} dt = \int_{t_i}^{t_f}\frac{1}{v^2}\frac{dv}{dt}\,dt = \int_{v(t_i)}^{v(t_f)}\frac{1}{v^2}\,dv = -\frac{1}{v}\bigg|_{v(t_i)}^{v(t_f)}, \tag{2}$$
which reduces to
$$-\alpha(t_f - t_i) = -\left(\frac{1}{v(t_f)} - \frac{1}{v(t_i)}\right) \;\Rightarrow\; v(t_f) = \frac{v(t_i)}{1 + \alpha\,v(t_i)(t_f - t_i)}. \tag{3}$$
4.2
As you probably saw in AP Physics, a useful model is a system whose degree of freedom is
restored to equilibrium with a force proportional to its displacement:
$$ma = m\frac{d^2x}{dt^2} = F = -Kx. \tag{4}$$
Note that for positive $K$ the minus sign is essential. There are systematic ways of deriving the
solution to such an equation, but let's use the more venerable method of guessing a solution
of the form
$$x(t) = a\cos(\omega t) + b\sin(\omega t). \tag{5}$$
Plugging this into the differential equation gives
$$-m\omega^2\left(a\cos(\omega t) + b\sin(\omega t)\right) = -K\left(a\cos(\omega t) + b\sin(\omega t)\right) \;\Rightarrow\; \omega = \sqrt{K/m}. \tag{6}$$
We see that the differential equation has nothing to say about $a$ and $b$. They are determined
by the initial conditions. For example, we might be given $x(0) \equiv x_0$ and $x'(0) \equiv v_0$. In
terms of $a$ and $b$ we find
$$x(0) = a \;\Rightarrow\; a = x_0, \qquad x'(0) = b\,\omega \;\Rightarrow\; b = v_0/\omega. \tag{7}$$
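As a quick sanity check on the guessed solution, here is a minimal Python sketch that compares the closed form with $a = x_0$, $b = v_0/\omega$ against a direct numerical integration of $m\ddot x = -Kx$. The function names and the choice of a velocity-Verlet integrator are illustrative assumptions, not part of the lecture.

```python
import math

def sho_position(t, x0, v0, K, m):
    """Closed form x(t) = a cos(wt) + b sin(wt) with a = x0, b = v0 / w."""
    w = math.sqrt(K / m)
    return x0 * math.cos(w * t) + (v0 / w) * math.sin(w * t)

def sho_numeric(t_end, x0, v0, K, m, steps=20000):
    """Integrate m x'' = -K x with velocity Verlet as an independent check."""
    dt = t_end / steps
    x, v = x0, v0
    a = -K / m * x
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -K / m * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x

if __name__ == "__main__":
    K, m, x0, v0 = 4.0, 1.0, 0.5, -1.0
    print(sho_position(3.0, x0, v0, K, m), sho_numeric(3.0, x0, v0, K, m))
```

The two printed numbers should agree to several decimal places for any initial conditions, which is just the statement that $a$ and $b$ are fixed entirely by $x_0$ and $v_0$.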
Answer: Surface tension represents the amount of energy it takes to stretch a surface,
that is, to increase its area. Thus we can define it as energy divided by (area), which has
dimensions
$$[\sigma] = \frac{[E]}{[A]} = \frac{M L^2 T^{-2}}{L^2} = M T^{-2}. \tag{8}$$
So now we need to combine $r$, $\rho$, and $\sigma$ to get dimensions $T^{-1}$. Only surface tension
involves time, so the answer must contain $\sigma^{1/2}$, which has dimensions
$$[\sigma^{1/2}] = M^{1/2} T^{-1}. \tag{9}$$
Since frequency does not involve any length dimensions, we must combine $r$ and $\rho$ so as to
cancel out length. The combination that does this is
$$[\rho r^2] = M. \tag{10}$$
Finally, we take the square root of $\rho r^2$ to get dimensions of $M^{1/2}$ that cancel the mass
dimensions in $\sigma^{1/2}$:
$$\left[\sqrt{\frac{\sigma}{\rho r^2}}\right] = T^{-1}. \tag{11}$$
We conclude $\omega \propto (1/r)\sqrt{\sigma/\rho}$.
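The bookkeeping above can be mechanized. The following Python sketch tracks dimensions as exponents of (mass, length, time) and confirms that the combination $r^{-1}\rho^{-1/2}\sigma^{1/2}$ has the dimensions of a frequency. The names `rho` and `sigma`, and the reading of the density as mass per unit area, are assumptions made for this illustration.

```python
from fractions import Fraction

# Dimensions as exponents of (mass M, length L, time T).
DIMS = {
    "r": (0, 1, 0),       # radius: L
    "rho": (1, -2, 0),    # mass per unit area (assumed): M L^-2
    "sigma": (1, 0, -2),  # surface tension = energy / area: M T^-2
}

def dimension_of(powers):
    """Dimensions of a product of the named quantities raised to the given powers."""
    total = [Fraction(0)] * 3
    for name, p in powers.items():
        for k in range(3):
            total[k] += Fraction(p) * DIMS[name][k]
    return tuple(total)

# The combination found in the text: (1/r) * sqrt(sigma / rho).
candidate = {"r": -1, "rho": Fraction(-1, 2), "sigma": Fraction(1, 2)}

if __name__ == "__main__":
    print(dimension_of(candidate))  # exponents of (M, L, T)
```

Any other choice of powers leaves a leftover mass or length exponent, which is why dimensional analysis pins down the answer up to a dimensionless prefactor.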
Question: Did some cheating occur here? Is there a variable we left out?
Answer: Perhaps the height h of the vibration makes a difference. In fact, it does.
This ruins everything, because the ratio r/h is dimensionless, and as such you can act on
it with any function, not just raise it to various powers. We would then have to include a
prefactor f (r/h), where f is an unknown function. This greatly reduces the predictive power
of dimensional analysis.
However, ignoring $h$ actually isn't so bad.
Question: Why?
Answer: We know from experience with the harmonic oscillator that for small oscillations, with restoring force approximately proportional to displacement, the frequency actually doesn't depend on the amplitude. So in the case of small vibrations, our original answer
is not so bad.
A Bit of Review
Last Thursday's lecture stressed two concepts. The first, conservation of momentum, is
familiar to you from high school. It will come up in today's problems. The second is not
simply vectors, which you know very well, but the subtler idea that it is often useful to manipulate vectors independent of their components, that is, as objects with inherent meaning.
The dot product and cross product are very useful tools for the so-called coordinate-free
representation of vectors. You can think of this as analogous to solving problems with all
the parameters as symbols and only plugging in values at the end, if at all. It saves time
and increases clarity to preserve generality as long as possible.
Problem: A point particle of mass m slides without friction, starting with infinitesimal
initial speed, at the top of a hemisphere of mass M and radius R that moves on a frictionless
plane. At what angle relative to the hemisphere does the particle fly off?
Solution: First, note that we can treat this as a 2-D problem: the particle moves along a [...]
the lab frame of reference and some things in the hemisphere's frame, so let
$\vec v_p{}' = (v_{p,x}', v_{p,y}')$
be the particle's velocity relative to the hemisphere. Let $\theta$ be the angle of the arc the particle
travels relative to vertical, as in the diagram below.
Answer: The normal component of gravity is $mg\cos\theta$, and the particle's motion is only
circular in the hemisphere's frame. Thus the condition is
$$mg\cos\theta = m\,v_p'^2/R \;\Rightarrow\; gR\cos\theta = v_{p,x}'^2 + v_{p,y}'^2. \tag{1}$$
Now we could set up an ugly differential equation and solve for $\theta$ as a function of time.
It is easier to use conservation laws.
Question: What is the consequence of momentum conservation?
Answer: The hemisphere's recoil has to balance the particle's x-momentum:
$$M v_h = m v_{p,x} \;\Rightarrow\; v_h = \frac{m}{M}\,v_{p,x}. \tag{2}$$
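As for energy conservation, the potential energy lost by falling a distance $(1-\cos\theta)R$
must equal the kinetic energy gained:
$$mgR(1-\cos\theta) = \frac{1}{2}\left(M v_h^2 + m v_{p,x}^2 + m v_{p,y}^2\right) = \frac{1}{2}\left(M\left(\frac{m}{M}\right)^2 v_{p,x}^2 + m v_{p,x}^2 + m v_{p,y}^2\right)$$
$$\Rightarrow\; 2gR(1-\cos\theta) = \left(1 + \frac{m}{M}\right)v_{p,x}^2 + v_{p,y}^2. \tag{3}$$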
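Now there is only one degree of freedom as long as the particle is constrained to move
along the hemisphere's surface, so you know $v_{p,x}$ and $v_{p,y}$ can't be independent. A little bit
of geometry in the hemisphere's reference frame tells you that
$$\frac{v_{p,y}'}{v_{p,x}'} = \tan\theta. \tag{4}$$
The hemisphere recoils with speed $v_h$ opposite to $v_{p,x}$, so the velocities in the two frames are related by
$$v_{p,x}' = v_{p,x} + v_h = \left(1 + \frac{m}{M}\right)v_{p,x}, \qquad v_{p,y}' = v_{p,y}, \tag{5}$$
which gives
$$v_{p,y} = \left(1 + \frac{m}{M}\right)\tan\theta\; v_{p,x}. \tag{6}$$
The energy conservation equation becomes
$$2gR(1-\cos\theta) = \left[\left(1+\frac{m}{M}\right) + \left(1+\frac{m}{M}\right)^2\tan^2\theta\right]v_{p,x}^2, \tag{7}$$
while the fly-off condition (1) becomes
$$gR\cos\theta = \left(1+\frac{m}{M}\right)^2\left(1+\tan^2\theta\right)v_{p,x}^2. \tag{8}$$
Define $\mu \equiv 1 + m/M$ for convenience and eliminate $v_{p,x}^2$
to get
$$2gR(1-\cos\theta) = \frac{(\mu + \mu^2\tan^2\theta)\,gR\cos\theta}{\mu^2(1+\tan^2\theta)} \tag{9}$$
$$\Rightarrow\; 2(1-\cos\theta)\,\mu\left(1+\tan^2\theta\right) = \left(1+\mu\tan^2\theta\right)\cos\theta. \tag{10}$$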
Question: Are there any cases that we can check this result against?
Answer: The one that first comes to mind is an infinitely heavy, i.e. immobile, hemisphere. Then $\mu = 1 + m/M = 1$ and we get $3\cos\theta - 2 = 0$, so $\theta = \cos^{-1}(2/3)$. Personally, I don't have
any reason to think this is intuitive, so let's try a different limit, perhaps slightly silly: an
infinitely massive particle or infinitely light hemisphere. Then $\mu \to \infty$ and we have
$$-\cos^3\theta + 3\cos\theta - 2 = 0 \;\Rightarrow\; 3\cos\theta - \cos^3\theta = 2, \tag{11}$$
which implies $\theta = 0$.
Question: Why is this intuitive?
Answer: If the hemisphere is infinitely light, it acquires an infinite recoil velocity infinitely quickly, hence it scoots out from under the particle when $\theta = 0$.
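To see how the fly-off angle interpolates between these two limits, here is a small Python sketch. It assumes the fly-off condition has been rewritten, using $\tan^2\theta = (1-\cos^2\theta)/\cos^2\theta$, as the cubic $(\mu-1)c^3 - 3\mu c + 2\mu = 0$ in $c = \cos\theta$ with $\mu = 1 + m/M$; that rewriting, the function name, and the bisection solver are choices made for this illustration.

```python
import math

def fly_off_cos(mass_ratio):
    """cos(theta) at which the particle leaves, for mass_ratio = m / M >= 0."""
    mu = 1.0 + mass_ratio

    def g(c):
        return (mu - 1.0) * c**3 - 3.0 * mu * c + 2.0 * mu

    lo, hi = 0.0, 1.0  # g(0) = 2 mu > 0 and g(1) = -1 < 0, so a root lies between
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for ratio in (0.0, 1.0, 10.0, 1e6):
        c = fly_off_cos(ratio)
        print(ratio, math.degrees(math.acos(c)))
```

For `mass_ratio = 0` the root is $2/3$, and as the ratio grows the angle shrinks toward zero, matching both limiting cases discussed above.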
Problem: What is the drag force on a sphere of mass $M$, radius $R$, and speed $v$ moving
through a gas of particles with mass $m$ and density (number per unit volume) $n$, assuming
that $m \ll M$?
Solution: We are going to try to do this as coordinate-free as possible.
Question: What should our strategy be? Hint: force equals time derivative of momentum.
Answer: Let's consider the momentum that the sphere loses in an individual collision
and multiply by the rate of collisions. Of course, this depends on what part of the sphere
collides with a gas particle, so we will need to integrate over the different parts of the sphere.
Since the sphere is much more massive than the gas particles, we may ignore its recoil. Then
the momentum it loses equals the momentum that it imparts to a gas particle. We can
Question: In as coordinate-free a way as possible, what is the velocity of the gas particle
after collision, in terms of the position vector r of the point at which it hits the surface of
the sphere?
Answer: Instead of a sphere, we could equally well imagine the particle bouncing off a
plane tangent to the sphere and perpendicular to r.
Question: What is a coordinate-free way of saying that the angle of incidence equals
the angle of reflection?
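Answer: Decompose the incoming velocity into components parallel and perpendicular to $\hat r$; the collision reverses the parallel component and leaves the perpendicular one alone:
$$\vec v = (\vec v\cdot\hat r)\,\hat r + \left(\vec v - (\vec v\cdot\hat r)\,\hat r\right) = \text{parallel} + \text{perpendicular}$$
$$\vec v_f = -(\vec v\cdot\hat r)\,\hat r + \left(\vec v - (\vec v\cdot\hat r)\,\hat r\right). \tag{12}$$
The momentum transferred, and its component along $\hat v$, are therefore
$$\Delta\vec p = -2m\,(\vec v\cdot\hat r)\,\hat r \;\Rightarrow\; \Delta\vec p\cdot\hat v = -2mv\,(\hat r\cdot\hat v)^2. \tag{13}$$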
(13)
Finally, we can delay the inevitable no longer and we institute a coordinate system. It is
not hard to see that $\hat v\cdot\hat r = \cos\theta$. In addition to $\theta$, we have an angle $\phi$ that rotates around
$\vec v$.
Question: We will find the average momentum transfer and then multiply by the total
rate of collisions. To get the average, do we simply integrate over the half sphere and divide
by the surface area of the sphere?
$$\frac{\int_0^{\pi/2}\int_0^{2\pi} R^2\sin\theta\,d\theta\,d\phi\;\,2mv\cos^2\theta}{2\pi R^2} \tag{14}$$
Answer: No! The incoming particles are not evenly distributed over the surface of the
sphere. Rather, they are evenly distributed over the circular cross section of the sphere.
Looking at the sphere head-on, this cross section can be divided into annuli of infinitesimal
width $dr$:
The area of an annulus at (cross-sectional) distance $r$ from the center of the circle is $2\pi r\,dr$,
so what we want is
$$\frac{\int_0^R (2\pi r)(2mv\cos^2\theta)\,dr}{\pi R^2}. \tag{15}$$
We must relate $r$ to $\theta$. From the diagram below, we see that $R\sin\theta = r$, and therefore
$\cos^2\theta = 1 - (r/R)^2$.
Thus we have the average momentum loss per collision:
$$2mv\,\frac{\int_0^R (2\pi r)\left(1 - (r/R)^2\right)dr}{\pi R^2} \tag{16}$$
$$= mv. \tag{17}$$
Question: Finally, what is the rate of collisions?
Answer: The sphere moves through a distance $v\,dt$ in time $dt$, hence it sweeps through
a volume $\pi R^2 v\,dt$. This volume has $n\pi R^2 v\,dt$ gas particles (with $n$ the number density), so the rate of collisions is $n\pi R^2 v$.
Therefore, the rate of momentum loss (force) is
$$F = n\pi R^2 v^2 m. \tag{18}$$
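The annulus average is easy to check numerically. The sketch below evaluates the cross-section average of $2mv\cos^2\theta$ with a midpoint rule and multiplies by the collision rate. The function names, the symbol `n` for the number density, and the midpoint rule itself are illustrative assumptions.

```python
import math

def average_momentum_transfer(m, v, R, slices=20000):
    """Midpoint-rule average of 2 m v cos^2(theta) over the circular cross section."""
    dr = R / slices
    total = 0.0
    for i in range(slices):
        r = (i + 0.5) * dr
        cos2 = 1.0 - (r / R) ** 2          # from R sin(theta) = r
        total += (2.0 * math.pi * r) * (2.0 * m * v * cos2) * dr
    return total / (math.pi * R * R)

def drag_force(n, m, v, R):
    """Collision rate n * pi * R^2 * v times the average momentum transfer."""
    return n * math.pi * R * R * v * average_momentum_transfer(m, v, R)

if __name__ == "__main__":
    print(average_momentum_transfer(1.0, 1.0, 2.0))  # close to m * v
    print(drag_force(3.0, 1.0, 2.0, 1.0))            # close to n * pi * R^2 * v^2 * m
```

Replacing the annulus weight $2\pi r\,dr$ with a uniform weight over the hemisphere's surface would reproduce the incorrect average from the earlier question instead.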
Problem: What are the relative probabilities for molecules in a gas at equilibrium to
have velocity $\vec v = (v_x, v_y, v_z)$? That is, what is the function $f(\vec v)$ such that the probability
to have velocity $v_{0,x} < v_x < v_{0,x} + dv_x$, $v_{0,y} < v_y < v_{0,y} + dv_y$, $v_{0,z} < v_z < v_{0,z} + dv_z$ is
$f(v_{0,x}, v_{0,y}, v_{0,z})\,dv_x\,dv_y\,dv_z$?
Solution: This seems hopeless. When something in physics seems hopeless, a direct attack
usually increases the hopelessness. We need something clever.
Question: What are some useful symmetries?
Answer: First, no direction is more awesome than any other, so $f$ can only depend on
the magnitude $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$. Second, motion in every direction is independent, which
means that the probability distribution of x-velocities is independent of the distribution of
y-velocities. When things are independent, probabilities simply multiply, so we must have
for some function $g$
$$f(v) = g(v_x)\,g(v_y)\,g(v_z). \tag{19}$$
Now this is actually a very significant constraint, because there are lots of ways to change
the components of $\vec v$ in such a way that the magnitude $v$ is unaffected. Then the left side
of this equation doesn't change, and so everything must also work out so that the right side
doesn't change! To make this perhaps a little starker, take the natural logarithm:
ln f (v) = ln g(vx ) + ln g(vy ) + ln g(vz ).
(20)
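Question: Now what? Hint: what's a mathematical way to talk about changes?
Answer: At this point it is at most a hunch, but let's try taking a derivative. Specifically,
take the partial derivative of both sides with respect to $v_x$, which just means a derivative
in which we pretend $v_y$ and $v_z$ are constants. If you remember the chain rule, you get
$$\frac{1}{f(v)}\frac{df}{dv}\frac{\partial v}{\partial v_x} = \frac{1}{g(v_x)}\frac{dg}{dv_x}. \tag{21}$$
Now $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$, so
$$\frac{\partial v}{\partial v_x} = \frac{2v_x}{2\sqrt{v_x^2 + v_y^2 + v_z^2}} = \frac{v_x}{v}, \tag{22}$$
whence
$$\frac{1}{f(v)}\frac{df}{dv}\frac{v_x}{v} = \frac{1}{g(v_x)}\frac{dg}{dv_x} \tag{23}$$
$$\Rightarrow\; \frac{1}{f(v)\,v}\frac{df}{dv} = \frac{1}{g(v_x)\,v_x}\frac{dg}{dv_x}. \tag{24}$$
The left side depends only on $v$ while the right side depends only on $v_x$, so both must equal a constant, which we write as $-2\alpha$. Then
$$\frac{df}{f} = -2\alpha\,v\,dv \;\Rightarrow\; \ln f = -\alpha v^2 + c \;\Rightarrow\; f \propto \exp(-\alpha v^2). \tag{25}$$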
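A short numerical illustration of why the Gaussian is special: the product of three one-dimensional Gaussians depends only on the speed, while the product of some other one-dimensional function generally does not. The value of `ALPHA` and the comparison function `exp(-|vx|)` are arbitrary choices made for this sketch.

```python
import math

ALPHA = 0.7  # hypothetical value of the constant in f ~ exp(-alpha v^2)

def gaussian_product(vx, vy, vz):
    """g(vx) g(vy) g(vz) with g(u) = exp(-ALPHA u^2)."""
    return math.prod(math.exp(-ALPHA * u * u) for u in (vx, vy, vz))

def laplace_product(vx, vy, vz):
    """Same construction with g(u) = exp(-|u|), which is NOT isotropic."""
    return math.prod(math.exp(-abs(u)) for u in (vx, vy, vz))

if __name__ == "__main__":
    # Two velocities with the same speed |v| = 3.
    a, b = (3.0, 0.0, 0.0), (1.0, 2.0, 2.0)
    print(gaussian_product(*a), gaussian_product(*b))  # equal
    print(laplace_product(*a), laplace_product(*b))    # different
```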
Suppose we have some multivariable function $f(x, y)$ and want to know how much it changes
when $x$ or $y$ changes. To do this, we simply differentiate with respect to $x$ (say) in the familiar
way and pretend that $y$ is a constant. The definition of such an operation is
$$\frac{\partial f}{\partial x}(x_0, y_0) \equiv \lim_{\Delta x\to 0}\frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x}. \tag{1}$$
We define the partial derivative with respect to $y$ analogously. For example
$$\frac{\partial}{\partial x}\left(x^2 + y^2\right) = 2x, \qquad \frac{\partial}{\partial x}(xy) = y, \qquad \frac{\partial}{\partial y}\left(\sin(xy)\right) = x\cos(xy). \tag{2}$$
1.2 Chain Rule
Now that we can take derivatives, let's invent a chain rule. First, a digression about fruit
sales. Suppose I sell $n_a$ apples and $n_p$ pears at prices $p_a$ and $p_p$. Further suppose that the
number of apples and pears I can sell is proportional to the number $m$ of sunny days in the
growing season, so that $n_{a(p)} = \alpha_{a(p)}\,m$. Then the rate of change of my total revenue $R$ with
respect to the number of sunny days is
$$\frac{\partial R}{\partial m} = \alpha_a p_a + \alpha_p p_p. \tag{3}$$
The way you reasoned this is as follows: each sunny day gives $\alpha_a$ apples. And each one of
those apples gives revenue $p_a$, for a total of $\alpha_a p_a$. Repeating the logic for pears gives the
above total. Now note that this is just another way of saying
$$\frac{\partial R}{\partial m} = \frac{\partial R}{\partial n_a}\frac{\partial n_a}{\partial m} + \frac{\partial R}{\partial n_p}\frac{\partial n_p}{\partial m}. \tag{4}$$
This is exactly the same as the single-variable chain rule, but now the independent variable m
has two paths, one via na and one via np , through which it can cause a change in R. This
logic generalizes: partial derivatives of composite functions are the sum of chain rule-like
expressions for each path. For example, suppose we have some function f (g1 (x, y), g2 (x, y)).
Then
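$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial g_1}\frac{\partial g_1}{\partial y} + \frac{\partial f}{\partial g_2}\frac{\partial g_2}{\partial y}. \tag{5}$$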
Note that we often get lazy and write the partial derivative symbol when technically something is a single derivative. Finally, you might think the apples and pears example was
unrigorous because it was a linear function. However, any well-behaved function looks linear
locally, so as far as derivatives were concerned our example did not suffer a loss of generality.
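The sum-over-paths rule can be checked numerically, both for the linear fruit example and for a nonlinear composite. In the Python sketch below all numbers, and the particular functions `g1`, `g2`, and `f`, are hypothetical choices for illustration; the derivative of the composite is estimated by a central difference and compared with the path sum.

```python
def derivative(func, x, h=1e-4):
    """Central-difference estimate of d func / dx."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Fruit example: hypothetical sales per sunny day and prices.
ALPHA_A, ALPHA_P = 12.0, 7.0
PRICE_A, PRICE_P = 0.5, 0.8

def revenue_of_sun(m):
    return PRICE_A * (ALPHA_A * m) + PRICE_P * (ALPHA_P * m)

fruit_path_sum = ALPHA_A * PRICE_A + ALPHA_P * PRICE_P

# Nonlinear example: f(g1, g2) = g1 * g2^2 with g1 = x y and g2 = x + y^2.
def composite(x, y):
    g1, g2 = x * y, x + y * y
    return g1 * g2 * g2

def path_sum_dy(x, y):
    g1, g2 = x * y, x + y * y
    df_dg1, df_dg2 = g2 * g2, 2.0 * g1 * g2   # partials of f
    dg1_dy, dg2_dy = x, 2.0 * y               # partials of g1, g2
    return df_dg1 * dg1_dy + df_dg2 * dg2_dy

if __name__ == "__main__":
    print(derivative(revenue_of_sun, 30.0), fruit_path_sum)
    print(derivative(lambda y: composite(1.5, y), 0.5), path_sum_dy(1.5, 0.5))
```

In both cases the two printed numbers agree, which is the content of the chain rule: one term per path by which the independent variable can influence the result.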
1.3
Quick definition: a vector field is something that assigns a vector to each point in space.
Suppose we wanted a measure of how much a vector field is flowing outward. In the
diagram below, the right face (colored yellow) of the infinitesimal box faces in the $\hat x$ direction,
so the amount of flow leaving this face is proportional to the x-component $A_x$ of the
field, evaluated at $(x + dx, y, z)$. The amount entering the left face is similarly $A_x(x, y, z)$.
Subtracting, we get a contribution
$$A_x(x + dx, y, z) - A_x(x, y, z) \propto \frac{\partial A_x}{\partial x}. \tag{6}$$
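Adding the contributions from the four other faces we can define the divergence
$$\nabla\cdot\vec A \equiv \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}. \tag{7}$$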
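The dot product notation is a slight abuse of notation but makes sense if we define the
nabla operator
$$\nabla \equiv \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right). \tag{8}$$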
Those of you who have seen the divergence theorem can now appreciate its intuitive
meaning. The flux across a surface is the amount flowing in minus the amount flowing out.
If this surface encloses a volume $V$, then we can subdivide $V$ into infinitesimal boxes, and
the flux is the total amount flowing in minus the total amount flowing out of these boxes.
But that is simply the integral of the divergence over $V$.
1.4
The curl is also motivated by a tangible quantity: if you were to walk around an infinitesimal
square, how much would the force field go with or against your motion? In other words,
what is the integral of the component of the force field that is tangent to your path?
Let's say your path were in the x-y plane (this is defined as the z-component of the
curl, since paths in this plane wrap around the z-axis). Then the component of the force
along the path $c_4$ is $F_x(x, y + \Delta y)$, while along $c_2$ it is $F_x(x, y)$. This gives a total of
$F_x(x, y) - F_x(x, y + \Delta y) \propto -\partial F_x/\partial y$. Adding in the other two sides, the total amount the
force field $\vec F(x, y)$ pushes you as you go around the path is
$$(\nabla\times\vec F)_z \equiv \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}, \tag{9}$$
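which defines the curl. As with the divergence, the notation can be taken literally: the curl
can be computed as
$$\nabla\times\vec F = \begin{vmatrix} \hat x & \hat y & \hat z \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_x & F_y & F_z \end{vmatrix}. \tag{10}$$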
Using this geometric definition of the curl, we can give a non-rigorous but compelling
proof of Stokes' Theorem, which says that the line integral of a vector field along the boundary of a surface (the surface is two-dimensional, possibly curved like the surface of a hemisphere, and the boundary is one-dimensional) equals the integral of the curl of the vector
field over the surface itself.
The proof follows from the picture above. If we add (integrate) the curl over all the
infinitesimal squares, we see that contributions from an edge shared by two squares cancel;
the clockwise flow is going left for one square and right for the other, or up for one and down
for the other. The only contributions that don't cancel come from edges that aren't shared, that is, the
boundary of the surface.
A corollary of Stokes' Theorem is that a field with zero curl does zero net pushing
around any closed path, not just infinitesimal ones. This explains why conservative forces
must have zero curl. If not, there would exist some path that would give you more and more
energy the more you walked around it.
Another corollary is that the work a conservative force does between two points doesn't
depend on the path. To prove this, suppose we have two paths, $\gamma_1$ and $\gamma_2$, from A to B.
Then reversing $\gamma_2$ and adding it to $\gamma_1$ gives a closed path that starts and ends at A. Thus
the work along $\gamma_1$ minus the work along $\gamma_2$ has to equal zero, which implies that the work
along the two paths is the same. This lets us define the potential energy at point X as the
(path-independent) amount of work it takes to get to X from some arbitrary starting point.
The arbitrary starting point only affects the potential by an additive constant.
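Path independence is easy to see numerically. The Python sketch below computes the work along two different paths between the same endpoints, once for a curl-free field and once for a field with nonzero curl. The two fields, the two paths, and the midpoint-rule integrator are all hypothetical choices made for illustration.

```python
def work(force, path, steps=20000):
    """Midpoint-rule line integral of force . dr along path(s), s in [0, 1]."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, steps + 1):
        x1, y1 = path(i / steps)
        fx, fy = force(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += fx * (x1 - x0) + fy * (y1 - y0)
        x0, y0 = x1, y1
    return total

def conservative(x, y):
    """F = -grad(x^2 y); its curl vanishes everywhere."""
    return (-2.0 * x * y, -x * x)

def swirling(x, y):
    """F = (-y, x); the z-component of its curl is 2."""
    return (-y, x)

def straight(s):
    """Straight line from A = (0, 0) to B = (1, 1)."""
    return (s, s)

def bent(s):
    """Same endpoints, along the curve y = x^3."""
    return (s, s ** 3)

if __name__ == "__main__":
    print(work(conservative, straight), work(conservative, bent))  # equal
    print(work(swirling, straight), work(swirling, bent))          # different
```

For the curl-free field both paths give minus the change in the potential $x^2 y$, while the swirling field gives different work along each path, so no potential energy can be defined for it.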
Resonance
Resonance is simple: as we have seen, many systems like to oscillate at a particular frequency.
If you push such a system back and forth at that frequency, it will keep absorbing energy without limit
unless some damping force keeps the amplitude in check. If you push it near the natural
frequency, you will generate a large amplitude.
In section we also talked about the bottle problem posed in section 1. However, it would
take too long to code the diagrams. It's easier to explain in person.
Conceptual Problem
Problem: A moving mass collides elastically with a pair of other masses at rest, something
like figure 5.44 in Morin. Assume that the three masses are equal. Show that in a two
dimensional collision, it is impossible to have the same non-zero angle between the velocities
of each pair of particles after the collision. Note that the statement of the problem implicitly
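assumes that none of the velocities in the final state vanish, because at zero velocity, the
angles are not defined.
Solution: The idea is to use energy and momentum conservation efficiently in vector notation. Again call the initial velocity of mass 1 $\vec v$. Call the final velocity of mass $j$ $\vec v_j$. Then
conservation of energy is (cancelling a factor of $m/2$)
$$\vec v\cdot\vec v = \vec v_1\cdot\vec v_1 + \vec v_2\cdot\vec v_2 + \vec v_3\cdot\vec v_3 \tag{11}$$
and conservation of momentum is (cancelling a factor of $m$)
$$\vec v = \vec v_1 + \vec v_2 + \vec v_3. \tag{12}$$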
Taking the dot product of both sides of (12) with itself gives
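$$\vec v\cdot\vec v = (\vec v_1 + \vec v_2 + \vec v_3)\cdot(\vec v_1 + \vec v_2 + \vec v_3) = \vec v_1\cdot\vec v_1 + \vec v_2\cdot\vec v_2 + \vec v_3\cdot\vec v_3 + 2\left(\vec v_1\cdot\vec v_2 + \vec v_1\cdot\vec v_3 + \vec v_2\cdot\vec v_3\right) \tag{13}$$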
Subtracting (11) from (13) gives
$$\vec v_1\cdot\vec v_2 + \vec v_1\cdot\vec v_3 + \vec v_2\cdot\vec v_3 = 0 \tag{14}$$
This is the mathematics we need for both parts. If all the angles are equal to $\theta$, (14) becomes
$$(v_1 v_2 + v_1 v_3 + v_2 v_3)\cos\theta = 0, \tag{15}$$
where $v_j$ is the magnitude of $\vec v_j$. In two dimensions, three vectors with the same non-zero angle between each pair must be $2\pi/3$ apart, so $\cos\theta = -1/2$ and (15) is
impossible if all the masses are moving in the final state. Thus two dimensions
doesn't work.
Q: Show that this is possible in three dimensions, and compute $\theta$.
A: This is now easy from (15). We can satisfy it only if $\cos\theta = 0$. This is possible in
three dimensions. The three velocities in the final state are perpendicular to one another.
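A concrete instance makes the three-dimensional answer tangible. In this Python sketch the final velocities are chosen (hypothetically) to have equal speeds along three perpendicular axes; momentum conservation then fixes the incoming velocity, and the code checks that energy is conserved and that the cross terms in (14) vanish.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(*vectors):
    return tuple(sum(components) for components in zip(*vectors))

# Hypothetical final state: equal speeds along three perpendicular axes.
v1, v2, v3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

# Momentum conservation (equal masses) fixes the incoming velocity.
v = add(v1, v2, v3)

energy_in = dot(v, v)
energy_out = dot(v1, v1) + dot(v2, v2) + dot(v3, v3)
cross_terms = dot(v1, v2) + dot(v1, v3) + dot(v2, v3)

if __name__ == "__main__":
    print(v, energy_in, energy_out, cross_terms)
```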
Review
Euler-Lagrange Equations
If you can write the kinetic energy T and the potential energy U in terms of some set of
coordinates $\{q_i\}$ and their time derivatives, then, defining the Lagrangian $L = T - U$, the
equations of motion are
$$\frac{d}{dt}\frac{\partial L}{\partial\dot q_i} - \frac{\partial L}{\partial q_i} = 0 \quad \forall\, i. \tag{1}$$
In Morin it is proved that these are equivalent to Newtonian mechanics. The proof is
unilluminating and there is no point recapitulating it here. This may seem like pulling a
rabbit out of a hat, and historically Lagrangian mechanics was just a convenient coincidence.
From the modern point of view, the Lagrangian is more fundamental than force. Physicists
like to start with a few symmetries and logical properties that the laws of physics ought to
exhibit and then deduce the laws from these properties. It turns out, as perhaps we may see
later in the semester, that the Lagrangian is an excellent way to encode symmetries.
By the way, you might be skeptical whether taking a partial derivative with respect to
$\dot q$ (while holding $q$ constant) is a legal operation. If you retrace Morin's proof, you will see
that what this really means is that you write the Lagrangian in terms of $q$ and $\dot q$,
pretend
that they have nothing to do with each other, and then take the derivative as a formal
(and perfectly well-defined) operation. Since this is the operation that the proof refers to,
the Euler-Lagrange equations you obtain this way are valid. Your skepticism is about the
notation, not the equations.
1.2
There are two kinds of conservation laws that follow from the Lagrangian method. The
easy one is that if the Lagrangian depends on $\dot q_i$ but not $q_i$ for some coordinate $q_i$, then the
Euler-Lagrange equations immediately tell us that
$$\frac{d}{dt}\frac{\partial L}{\partial\dot q_i} = 0. \tag{2}$$
where $\epsilon$ is some continuous parameter that represents how big the transformation is. Above,
we have Taylor-expanded to first order. For example, rotating 2-D coordinates by an
angle $\epsilon$ around the origin yields
$$\tilde x = x\cos\epsilon - y\sin\epsilon = x - \epsilon y + \ldots$$
$$\tilde y = y\cos\epsilon + x\sin\epsilon = y + \epsilon x + \ldots \tag{4}$$
Hence $\delta_x = -y$ and $\delta_y = x$.
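So far there is no physics, only the statement that a continuous transformation can be
expanded to first order. Physics occurs when the Lagrangian is unchanged by this transformation, which means that the laws of physics look the same in both the original and the
transformed coordinate systems. If two things are equal, they must also be equal to first
order in $\epsilon$, so we have
$$L(q_i, \dot q_i) = L(\tilde q_i, \dot{\tilde q}_i) = L(q_i + \epsilon\delta_i + \ldots,\; \dot q_i + \epsilon\dot\delta_i + \ldots) = L(q_i, \dot q_i) + \epsilon\sum_i\left(\delta_i\frac{\partial L}{\partial q_i} + \dot\delta_i\frac{\partial L}{\partial\dot q_i}\right) + \ldots \tag{5}$$
$$\Rightarrow\; 0 = \sum_i\left(\delta_i\frac{\partial L}{\partial q_i} + \dot\delta_i\frac{\partial L}{\partial\dot q_i}\right), \tag{6}$$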
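whence, using the Euler-Lagrange equations to replace $\partial L/\partial q_i$ by $\frac{d}{dt}\,\partial L/\partial\dot q_i$,
$$\sum_i\left(\delta_i\frac{d}{dt}\frac{\partial L}{\partial\dot q_i} + \dot\delta_i\frac{\partial L}{\partial\dot q_i}\right) = \frac{d}{dt}\left(\sum_i\delta_i\frac{\partial L}{\partial\dot q_i}\right) = 0. \tag{7}$$
Thus the quantity
$$\sum_i\delta_i\frac{\partial L}{\partial\dot q_i} \tag{8}$$
is conserved.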
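For the rotation example, with first-order shifts $-y$ in $x$ and $+x$ in $y$, the conserved combination for a unit-mass particle is $x\dot y - y\dot x$, the angular momentum. The Python sketch below integrates an orbit in a rotationally symmetric potential and compares this quantity at the start and end. The potential $U = r^2/2 + r^4/4$, the initial conditions, and the velocity-Verlet integrator are hypothetical choices for illustration.

```python
def simulate(steps=20000, dt=1e-3):
    """Orbit of a unit mass in U(r) = r^2/2 + r^4/4; returns the Noether
    charge x*vy - y*vx at the start and at the end of the run."""
    def accel(x, y):
        factor = -(1.0 + x * x + y * y)   # -(dU/dr) / r for this potential
        return factor * x, factor * y

    x, y, vx, vy = 1.0, 0.0, 0.3, 0.8
    ax, ay = accel(x, y)
    charge_start = x * vy - y * vx
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        ax_new, ay_new = accel(x, y)
        vx += 0.5 * (ax + ax_new) * dt
        vy += 0.5 * (ay + ay_new) * dt
        ax, ay = ax_new, ay_new
    return charge_start, x * vy - y * vx

if __name__ == "__main__":
    print(simulate())
```

Adding a term to the potential that depends on $x$ alone would break the rotational symmetry, and the two printed values would then drift apart.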
Waves on a String
Problem: Derive the equation of motion of a string of length $L$, tension $T$, and density $\rho$
with its endpoints held in place. You may assume that all motion occurs in a single plane.
Solution: The string is essentially a continuous object, while we have only formulated
Lagrangian mechanics in terms of finitely many degrees of freedom. To get around this
difficulty, we could concentrate all the mass of the string at $N$ evenly-spaced points and take
the $N \to \infty$ limit: The mass is $\rho L = mN$, where $m$ is the mass of each point, so $m = \rho L/N$.
The natural choice of coordinates is the heights $y_i$ of each mass, where $y_0 = y_N = 0$ are
not free.
The kinetic energy is easy:
$$KE = \frac{m}{2}\sum_j \dot y_j^2. \tag{10}$$
The potential energy is the tension times the amount of stretch, which is
$$PE = T\sum_i\left[\sqrt{\Delta x^2 + \Delta y^2} - \Delta x\right] = T\sum_i\left[\sqrt{\Delta x^2 + (y_{i+1} - y_i)^2} - \Delta x\right]. \tag{11}$$
We could leave the potential energy like this, but then the equations of motion would be
non-linear. In a future section we might discuss the effects of non-linearity. For now, we
note that for small displacements we can take a Taylor approximation to get
$$PE = \frac{T}{2\Delta x}\sum_i (y_{i+1} - y_i)^2. \tag{12}$$
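Putting it together,
$$L = \frac{m}{2}\sum_j \dot y_j^2 - \frac{T}{2\Delta x}\sum_i (y_{i+1} - y_i)^2. \tag{13}$$
The Euler-Lagrange equation for the coordinate $y_j$ is then
$$m\ddot y_j - \frac{T}{\Delta x}\left[(y_{j+1} - y_j) - (y_j - y_{j-1})\right] = 0. \tag{16}$$
Dividing by $\Delta x$ and using $m = \rho\,\Delta x$ gives
$$\rho\,\ddot y_k - T\,\frac{\dfrac{y_{k+1} - y_k}{\Delta x} - \dfrac{y_k - y_{k-1}}{\Delta x}}{\Delta x} = 0 \;\Rightarrow\; \rho\frac{\partial^2 y}{\partial t^2} - T\frac{\partial^2 y}{\partial x^2} = 0. \tag{17}$$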
where $\mathcal{T}$ is the time translation operator $\mathcal{T} : f(x, t) \to f(x, t + \Delta t)$ and $M$ is some matrix.
Taking the eigenvectors of $M$, we get solutions that map to constant multiples of themselves
when time-translated:
$$\mathcal{T}[y](x, t) = y(x, t + \Delta t) \propto y(x, t). \tag{19}$$
The only well-behaved functions with this property are exponentials, so we are guaranteed
a basis of solutions
$$y(x, t) = e^{i\omega t} f(x). \tag{20}$$
Q: Furthermore. . .
A: . . . the same logic applies to $x$, so we get solutions
$$y(x, t) = e^{i\omega t} e^{ikx}. \tag{21}$$
Substituting into the wave equation (17) relates the two constants:
$$\omega = k\sqrt{T/\rho}. \tag{22}$$
Q: What is one more constraint that fixes the different values $k_n$, and in turn, $\omega_n$?
A: We need $y(0, t) = y(L, t) = 0$, so $k_n = n\pi/L$. Thus we have determined the normal
modes of the string and their frequencies.
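One can check directly that the beaded string reproduces these modes. For the discrete equation of motion, the shape $y_j = \sin(k_n j\,\Delta x)$ oscillates with $\omega^2 = \frac{4T}{m\,\Delta x}\sin^2(k_n\Delta x/2)$, which tends to $k_n^2 T/\rho$ as $\Delta x \to 0$. The Python sketch below verifies both statements; the function names and parameter values are assumptions made for illustration.

```python
import math

def mode_check(n, N, length, tension, rho):
    """Return (omega, worst_residual) for mode n of N point masses on a string.
    The residual measures how well y_j = sin(k_n x_j) solves
    m y_j'' = (T/dx) [(y_{j+1} - y_j) - (y_j - y_{j-1})] with y'' = -omega^2 y."""
    dx = length / N
    m = rho * dx
    k = n * math.pi / length
    y = [math.sin(k * j * dx) for j in range(N + 1)]   # y_0 = y_N = 0
    omega_sq = 4.0 * tension / (m * dx) * math.sin(0.5 * k * dx) ** 2
    worst = 0.0
    for j in range(1, N):
        restoring = tension / dx * ((y[j + 1] - y[j]) - (y[j] - y[j - 1]))
        worst = max(worst, abs(restoring + m * omega_sq * y[j]))
    return math.sqrt(omega_sq), worst

def continuum_frequency(n, length, tension, rho):
    """omega_n = (n pi / L) sqrt(T / rho) from the wave equation."""
    return n * math.pi / length * math.sqrt(tension / rho)

if __name__ == "__main__":
    omega, residual = mode_check(2, 400, 1.0, 3.0, 0.5)
    print(omega, continuum_frequency(2, 1.0, 3.0, 0.5), residual)
```

Increasing `N` brings the discrete frequency closer to the continuum value, which is the numerical counterpart of the $N \to \infty$ limit taken in the derivation.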
Q: What quantity is conserved by virtue of horizontal translational invariance?
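A: Before we take the limit $N \to \infty$, this is a discrete symmetry and Noether's Theorem
technically doesn't apply. However, it is a continuous symmetry in reality so we expect
the result to be forgiving of a lack of rigor. In terms of the continuous function $y(x, t)$ a
translation is
$$y(x, t) \to y(x + \epsilon, t) = y(x, t) + \epsilon\frac{\partial y}{\partial x} + \ldots \tag{24}$$
Back in discrete language, this says that
$$y_i \to y_i + \frac{\epsilon}{\Delta x}(y_{i+1} - y_i), \tag{25}$$
from which we read off $\delta_i = (y_{i+1} - y_i)/\Delta x$. Thus we have the conserved quantity
$$\sum_i \delta_i\frac{\partial L}{\partial\dot y_i} = \sum_i\frac{y_{i+1} - y_i}{\Delta x}\,m\,\dot y_i = \rho\sum_i\frac{y_{i+1} - y_i}{\Delta x}\,\dot y_i\,\Delta x \;\to\; \rho\int_0^L\frac{\partial y}{\partial x}\frac{\partial y}{\partial t}\,dx. \tag{26}$$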
If it seems redundant to switch back and forth from continuous to discrete so much, it should.
A tool called the functional derivative allows one to carry out the whole analysis without
discretizing.
4 Conceptual Questions
4.1 Bathtubs
Question: When the water in a bathtub drains, it forms a whirlpool. How is this possible
without violating Noether's Theorem?
4.2 Throat Singing
Question: It is possible to simultaneously sing a low note and a very high overtone. Why
doesn't changing the low note (essentially, reducing tension in your vocal cords to reduce
$\omega$, which should affect all modes, not just the fundamental one) also change the high note?
Note: in section I will try to demonstrate this. If I fail, look it up on YouTube.
Review
Partial Derivatives and Their Notation
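Suppose we had a function $f(x)$ that depends on $x$ via $x^2$ and $x^3$. For example
$$f(x) = \sin(x^2 + x^3) = g(x^2, x^3), \quad\text{where } g(a, b) = \sin(a + b). \tag{1}$$
Differentiating directly gives
$$\frac{df}{dx} = \cos(x^2 + x^3)\,(2x + 3x^2). \tag{2}$$
The partial derivatives of $g$ with respect to its two arguments are
$$\frac{\partial g}{\partial\,\mathrm{arg1}} = \frac{\partial}{\partial\,\mathrm{arg1}}\sin(\mathrm{arg1} + \mathrm{arg2}) = \cos(\mathrm{arg1} + \mathrm{arg2})$$
$$\frac{\partial g}{\partial\,\mathrm{arg2}} = \frac{\partial}{\partial\,\mathrm{arg2}}\sin(\mathrm{arg1} + \mathrm{arg2}) = \cos(\mathrm{arg1} + \mathrm{arg2}). \tag{4}$$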
We can introduce notation that is precise, well-suited for computers, and perhaps somewhat
soulless as follows. Define
$$g^{(1,0)}(\mathrm{arg1}, \mathrm{arg2}) \equiv \frac{\partial g}{\partial\,\mathrm{arg1}}, \qquad g^{(0,1)}(\mathrm{arg1}, \mathrm{arg2}) \equiv \frac{\partial g}{\partial\,\mathrm{arg2}}. \tag{5}$$
Then, for example, $g^{(2,0)}$ is the second partial derivative with respect to its first argument.
The precise statement of the chain rule is then
$$\frac{dg}{dx} = g^{(1,0)}(x^2, x^3)\frac{d(x^2)}{dx} + g^{(0,1)}(x^2, x^3)\frac{d(x^3)}{dx} = \cos(x^2 + x^3)\,(2x + 3x^2), \tag{6}$$
as obtained by the direct method. The important thing to take away from this is that we
were in no sense pretending that $x^2$ and $x^3$ were independent of each other. All we said is
that change in $x$ affects both $x^2$ and $x^3$, which in turn affects $\sin(x^2 + x^3)$. Confusion comes
when we get sloppy and write
$$g^{(1,0)}(x^2, x^3) = \frac{\partial g}{\partial(x^2)}, \qquad g^{(0,1)}(x^2, x^3) = \frac{\partial g}{\partial(x^3)}. \tag{7}$$
On the LHS, it is clear that we differentiate $g$ with respect to its first argument and then
plug in the value $x^2$. On the RHS, it seems that $x^2$ and $x^3$ are two distinct independent
variables.
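The "soulless" notation maps directly onto code, where a function only knows about argument slots and not about what was plugged into them. The Python sketch below (with the hypothetical helper `slot_derivative`) differentiates `g` with respect to a slot by central differences, evaluates the result at $(x^2, x^3)$, and assembles the total derivative.

```python
import math

def g(a, b):
    return math.sin(a + b)

def slot_derivative(func, slot, args, h=1e-5):
    """Central-difference derivative of func with respect to argument number `slot`,
    evaluated at the point `args`."""
    up, down = list(args), list(args)
    up[slot] += h
    down[slot] -= h
    return (func(*up) - func(*down)) / (2 * h)

def total_derivative(x):
    """g^(1,0)(x^2, x^3) * d(x^2)/dx + g^(0,1)(x^2, x^3) * d(x^3)/dx."""
    args = (x ** 2, x ** 3)
    return (slot_derivative(g, 0, args) * 2 * x
            + slot_derivative(g, 1, args) * 3 * x ** 2)

def direct_derivative(x):
    return math.cos(x ** 2 + x ** 3) * (2 * x + 3 * x ** 2)

if __name__ == "__main__":
    print(total_derivative(0.7), direct_derivative(0.7))
```

Nothing in `slot_derivative` assumes the two arguments are independent of each other; it only perturbs one slot at a time, which is exactly what the superscript notation expresses.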
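Incidentally, this lets us make sense of many of the partial derivatives in the textbook.
For example, for Lagrangian $L = L(q, \dot q, t)$
$$\frac{\partial L}{\partial t} \neq \frac{dL}{dt}\;??? \tag{8}$$
Here's what these two quantities refer to. The partial derivative really means the partial
derivative of $L$ with respect to its third argument. That is
$$L^{(0,0,1)}(q, \dot q, t) \overset{\text{sloppiness}}{=} \frac{\partial L}{\partial t}. \tag{9}$$
The normal derivative, on the other hand, is the derivative of the number $L$, which depends
on $t$ via its first argument $q(t)$, its second argument $\dot q(t)$, and its third argument $t$:
$$\frac{dL}{dt} = L^{(1,0,0)}(q, \dot q, t)\frac{dq}{dt} + L^{(0,1,0)}(q, \dot q, t)\frac{d\dot q}{dt} + L^{(0,0,1)}(q, \dot q, t)\frac{dt}{dt} \overset{\text{sloppiness}}{=} \frac{\partial L}{\partial q}\frac{dq}{dt} + \frac{\partial L}{\partial\dot q}\frac{d\dot q}{dt} + \frac{\partial L}{\partial t}. \tag{10}$$
1.2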
Let's review how stationary action leads to the Euler-Lagrange equations using the good
notation introduced above. We define the action $S : C[t_1, t_2] \to \mathbb{R}$
$$S[q] \equiv \int_{t_1}^{t_2} L(q(t), \dot q(t), t)\,dt. \tag{11}$$
This is a very general kind of functional that comes up in all sorts of contexts. Now suppose
we want to find the analog of a critical point: minimum, maximum, or saddle point. Just
as a single-variable function has a critical point when its derivative vanishes, that is, when
you can change its independent variable by a small amount $\epsilon$ and get no change up to first
order in $\epsilon$, the condition for some function $q(t)$ to be a stationary value, given fixed boundary
conditions $q(t_{1(2)}) = q_{1(2)}$, is
$$\frac{d}{d\epsilon}S[q + \epsilon\eta]\bigg|_{\epsilon=0} = \frac{d}{d\epsilon}\int_{t_1}^{t_2} L\big(q(t) + \epsilon\eta(t),\; \dot q(t) + \epsilon\dot\eta(t),\; t\big)\,dt\,\bigg|_{\epsilon=0} = 0 \tag{12}$$
for every function $\eta(t)$ that vanishes at $t_1$ and $t_2$.
We need $\eta(t)$ to vanish at the boundaries because otherwise we would be comparing a nearby
function with different boundary conditions. This would be a valid question, but it's not the
question we are asking! We are only maximizing or minimizing or finding a saddle point
within the set of functions that have the same boundary conditions at $t_{1(2)}$. There is no a
priori reason for this except that a lot of useful problems are of this form. Of course, we know
a posteriori that finding the stationary action subject to the boundary conditions yields the
Euler-Lagrange equations.
Anyway, the math proceeds and we cavalierly take the derivative inside the integral sign.
Then we apply the chain rule, noting that $t$ does not depend on $\epsilon$. The condition for $q(t)$ to
be stationary is (the condition s.t. $\eta(t_1) = \eta(t_2) = 0$ is implicit throughout)
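$$\int_{t_1}^{t_2}\left[L^{(1,0,0)}(q(t), \dot q(t), t)\,\frac{d}{d\epsilon}\big(q(t) + \epsilon\eta(t)\big) + L^{(0,1,0)}(q(t), \dot q(t), t)\,\frac{d}{d\epsilon}\big(\dot q(t) + \epsilon\dot\eta(t)\big) + L^{(0,0,1)}(q(t), \dot q(t), t)\,\frac{dt}{d\epsilon}\right]dt = 0 \tag{14}$$
$$\Rightarrow\; \int_{t_1}^{t_2}\left[L^{(1,0,0)}(q(t), \dot q(t), t)\,\eta(t) + L^{(0,1,0)}(q(t), \dot q(t), t)\,\dot\eta(t)\right]dt = 0. \tag{15}$$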
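Now we integrate by parts ($\int u\,dv = uv - \int v\,du$, with $u = L^{(0,1,0)}(q(t), \dot q(t), t)$ and $dv = \dot\eta(t)\,dt$, hence $v = \eta(t)$):
$$\int_{t_1}^{t_2} L^{(0,1,0)}(q(t), \dot q(t), t)\,\dot\eta(t)\,dt = L^{(0,1,0)}(q(t), \dot q(t), t)\,\eta(t)\Big|_{t_1}^{t_2} - \int_{t_1}^{t_2}\frac{d}{dt}\left[L^{(0,1,0)}(q(t), \dot q(t), t)\right]\eta(t)\,dt \tag{17}$$
$$= -\int_{t_1}^{t_2}\frac{d}{dt}\left[L^{(0,1,0)}(q(t), \dot q(t), t)\right]\eta(t)\,dt, \tag{18}$$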
where we used the fact that $\eta(t_{1(2)}) = 0$. Substituting this, the stationarity (is that a word?)
condition becomes
$$\int_{t_1}^{t_2}\left[L^{(1,0,0)}(q(t), \dot q(t), t) - \frac{d}{dt}L^{(0,1,0)}(q(t), \dot q(t), t)\right]\eta(t)\,dt = 0. \tag{19}$$
Now this is true for all $\eta(t)$. The only thing that can be multiplied by any function such
that the product is always zero is the zero function (you should be able to convince yourself
of this hand-wavingly), so we conclude that the thing in brackets is zero:
$$L^{(1,0,0)}(q(t), \dot q(t), t) - \frac{d}{dt}L^{(0,1,0)}(q(t), \dot q(t), t) = 0 \tag{20}$$
$$\overset{\text{sloppiness}}{\Longrightarrow}\; \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial\dot q} = 0. \tag{21}$$
Thus we have obtained the Euler-Lagrange equation. There are two nifty consequences of
this derivation. First, we have shown that a lot of interesting optimization problems reduce
to the Euler-Lagrange equation. Second, we have shown that Newtonian mechanics, which
is equivalent to the Euler-Lagrange equations, is equivalent to the principle of stationary
action.
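As a rough numerical illustration of what "stationary" means here, the sketch below discretizes the action for a harmonic oscillator with $L = \dot q^2/2 - q^2/2$ (an example chosen for this sketch, not one from the notes). The true path $q(t) = \sin t$ on $0 \le t \le 1$ is perturbed by $\epsilon\sin(\pi t)$, which vanishes at both ends. If the path is stationary, the change in the action should scale like $\epsilon^2$, so doubling $\epsilon$ should roughly quadruple it. The function names and the midpoint discretization are assumptions of this sketch.

```python
import math

def action(path, dt):
    """Discretized action for L = q'^2/2 - q^2/2 (m = k = 1), midpoint rule."""
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        velocity = (b - a) / dt
        midpoint = 0.5 * (a + b)
        total += (0.5 * velocity**2 - 0.5 * midpoint**2) * dt
    return total

def perturbed_action(eps, n=2000, t_end=1.0):
    """Action of sin(t) + eps*sin(pi t / t_end); the perturbation vanishes at both ends."""
    dt = t_end / n
    times = [i * dt for i in range(n + 1)]
    path = [math.sin(t) + eps * math.sin(math.pi * t / t_end) for t in times]
    return action(path, dt)

s0 = perturbed_action(0.0)
d1 = perturbed_action(0.01) - s0   # change in action for eps = 0.01
d2 = perturbed_action(0.02) - s0   # change in action for eps = 0.02
print(d1, d2, d2 / d1)             # ratio close to 4 means no first-order change
```

A ratio near 4 rather than 2 is the numerical signature that the first-order variation vanishes on the solution of the Euler-Lagrange equation.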
1.3
Problem: A particle of mass $m$ slides along a hoop of radius $R$. The hoop is aligned
vertically and rotates at angular speed $\omega$ around a vertical axis running through its center.
What are the equilibria of the particle's position, are they stable, and what are the frequencies
of small oscillations about those equilibria if they are stable?
Solution: The easiest coordinate to use is the angle $\theta$ along the hoop's circumference that
the particle is displaced from the bottom. There are two perpendicular components of the
velocity. They are $R\dot\theta$ along the hoop and $R\omega\sin\theta$ in a plane parallel to the ground due to
the hoop's rotation. The height relative to the center of the hoop is $-R\cos\theta$, so
$$L = \frac{mR^2}{2}\left(\omega^2\sin^2\theta + \dot\theta^2\right) + mgR\cos\theta. \qquad (22)$$
The corresponding Euler-Lagrange equation is
$$mR^2\ddot\theta - mR^2\omega^2\sin\theta\cos\theta + mgR\sin\theta = 0. \qquad (23)$$
The points of equilibrium have constant $\theta$, so $\ddot\theta = 0$. We find that either $\sin\theta = 0$, in which
case $\theta = 0$ or $\theta = \pi$, or $\cos\theta = g/(R\omega^2)$. $\theta = \pi$ is obviously unstable. What about $\theta = 0$?
Expanding to first order in small $\theta$ we get
$$\ddot\theta - \omega^2\theta + (g/R)\,\theta = 0. \qquad (24)$$
By now we can read off that the frequency of small oscillations is $\Omega = \sqrt{g/R - \omega^2}$, which is
real (stable) if $g/(R\omega^2) > 1$. What's interesting about this is that this is the condition for
the other equilibrium not to exist at all, since $\cos\theta = g/(R\omega^2) > 1$ is impossible.
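Now let's look at the other equilibrium. We have already found a condition for its
existence; now we must check its stability. Expand to first order in the deviation $\delta$ of $\theta$ from
$\theta_0 = \cos^{-1}(g/(R\omega^2))$: $\theta = \theta_0 + \delta$. Then
$$\sin\theta = \sin(\theta_0 + \delta) = \sin\theta_0\cos\delta + \cos\theta_0\sin\delta \approx \sin\theta_0 + \delta\cos\theta_0 \qquad (25)$$
$$\cos\theta = \cos(\theta_0 + \delta) = \cos\theta_0\cos\delta - \sin\theta_0\sin\delta \approx \cos\theta_0 - \delta\sin\theta_0 \qquad (26)$$
$$\sin\theta\cos\theta \approx \sin\theta_0\cos\theta_0 + \delta\,(\cos^2\theta_0 - \sin^2\theta_0). \qquad (27)$$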
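Now the part of the LHS that doesn't involve $\delta$ must equal zero, since this is the definition
of an equilibrium (you could of course check this). The first order remainder is
$$\ddot\delta + \omega^2(\sin^2\theta_0 - \cos^2\theta_0)\,\delta + \frac{g}{R}\cos\theta_0\,\delta = 0 \qquad (28)$$
$$\ddot\delta + \omega^2(1 - 2\cos^2\theta_0)\,\delta + \frac{g}{R}\cos\theta_0\,\delta = 0 \qquad (29)$$
$$\ddot\delta + \left(\omega^2 - \frac{g^2}{R^2\omega^2}\right)\delta = 0. \qquad (30)$$
The frequency is then $\sqrt{\omega^2 - g^2/(R^2\omega^2)}$, which is real (stable) iff
$$\omega^2 - \frac{g^2}{R^2\omega^2} \ge 0 \;\Leftrightarrow\; \omega^2 \ge \frac{g^2}{R^2\omega^2} \;\Leftrightarrow\; \frac{g^2}{R^2\omega^4} \le 1 \;\Leftrightarrow\; \frac{g}{R\omega^2} \le 1. \qquad (31)$$
This is the same as the condition for the equilibrium to exist, so it is stable whenever it exists.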
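The equilibria and frequencies above can be cross-checked numerically. The sketch below is only an illustration: the function names and the sample values of $g$, $R$ and $\omega$ are assumptions. It lists each equilibrium with the square of its small-oscillation frequency (negative meaning unstable) and compares that against a finite-difference linearization of the equation of motion (23). The value quoted for $\theta = \pi$ is not derived in the text; it comes from the same linearization.

```python
import math

def theta_ddot(theta, g, R, omega):
    """Angular acceleration from the equation of motion (23)."""
    return omega**2 * math.sin(theta) * math.cos(theta) - (g / R) * math.sin(theta)

def equilibria(g, R, omega):
    """Return (theta, frequency_squared) pairs; a negative square means unstable."""
    found = [(0.0, g / R - omega**2),
             (math.pi, -(g / R + omega**2))]
    ratio = g / (R * omega**2)
    if ratio < 1.0:  # the tilted equilibrium exists only when cos(theta0) < 1
        found.append((math.acos(ratio), omega**2 - g**2 / (R**2 * omega**2)))
    return found

def numeric_frequency_squared(theta0, g, R, omega, h=1e-5):
    """Linearize theta_ddot = -Omega^2 * delta about theta0 by central difference."""
    slope = (theta_ddot(theta0 + h, g, R, omega) - theta_ddot(theta0 - h, g, R, omega)) / (2 * h)
    return -slope

g, R, omega = 9.8, 1.0, 4.0  # sample values, fast enough that g/(R omega^2) < 1
for theta0, freq_sq in equilibria(g, R, omega):
    print(theta0, freq_sq, numeric_frequency_squared(theta0, g, R, omega))
```

With these sample values the bottom of the hoop is unstable and the tilted equilibrium is stable, as the analysis predicts for $g/(R\omega^2) < 1$.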
Problem: Show that a two-dimensional bubble subject to pressure and surface tension
forms a circle.
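Solution: We need to find some energy functional for the system. Perhaps it is
$$E = \text{circumference} \times \text{surface tension } (\sigma) - \text{area} \times \text{pressure } (P). \qquad (32)$$
In polar coordinates the boundary is a curve $r(\theta)$. The area is
$$A = \int_0^{2\pi} \tfrac{1}{2}\, r(\theta)^2\, d\theta, \qquad (33)$$
and the length element is
$$dl = \sqrt{dr^2 + r^2 d\theta^2} = \sqrt{r'(\theta)^2 + r^2}\; d\theta \;\Rightarrow\; L = \int_0^{2\pi} \sqrt{r'(\theta)^2 + r^2}\; d\theta. \qquad (34)$$
Then
$$E = \int_0^{2\pi} \left[ \sigma\sqrt{r'(\theta)^2 + r^2} - \tfrac{1}{2} P\, r(\theta)^2 \right] d\theta. \qquad (35)$$
The Euler-Lagrange equation for $r(\theta)$, divided through by $-P$, is
$$\frac{\sigma}{P}\frac{d}{d\theta}\frac{r'(\theta)}{\sqrt{r'(\theta)^2 + r^2}} - \frac{\sigma}{P}\frac{r(\theta)}{\sqrt{r'(\theta)^2 + r^2}} + r(\theta) = 0. \qquad (36)$$
A constant $r(\theta) = R$, for which $r'(\theta) = 0$, solves this provided
$$-\frac{\sigma R}{P R} + R = 0 \;\Rightarrow\; R = \frac{\sigma}{P}. \qquad (37)$$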
We have shown that the shape is a circle and we have found how the radius depends on the
physical parameters.
Challenge Problem
Z
V [f ] =
0
2
f (t) dt dx
(38)
f 2 (t) dt = 1.
(39)
Review
Unlike previous weeks, the new material is not mathematically difficult but is conceptually
bizarre. Thus we will do something new and integrate conceptual questions into the review.
Anyway, relativity starts from two postulates:
1. The laws of physics are the same in all non-accelerating reference frames.
2. The speed of light (in a vacuum) is a law of physics.
Now this seems very nice, but something really ought to bother you about the second
postulate.
Q: What is it?
A: What's so special about light that its speed is a law of physics? After all, an electron's
speed is not a law of physics; electrons move at all sorts of speeds. If that's too abstract,
your speed is not a law of physics. Hence,
Q: Why is the speed of light a fundamental law of the universe? You should think about
it for a few minutes, after which the hint is: think hand-wavingly about the Newtonian
mechanics of massless objects.
A1: If you apply F = ma to a massless object, you will find that a massless object that
interacts with other stuff with any force at all experiences an infinite acceleration. This is
untenable and requires that we somehow change the laws of physics so that speeds don't
become infinite. We could do this in three ways: give every massless object its own speed
limit with its own ad hoc reason for having that speed, invent some fancy new speed limit
laws, or simply impose the same universal speed limit on all objects. The third is the most
aesthetically satisfying.
A2: Using only Maxwell's equations, one can derive a wave equation for light which is
identical to the equation of a string we found two weeks ago:
$$-\frac{\partial^2\psi}{\partial x^2} + \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} = 0, \qquad (1)$$
where $\psi$ is some component of the electric or magnetic field. The actual equation is the 3-D
generalization of this, but that's not important. We solved this and found standing waves,
but you can equally well get travelling waves. Try a solution $\psi(x, t) = f(x - vt)$ for some
function $f$. This represents a shape $f(x)$ travelling at velocity $v$. Plugging in, we get
$$-f''(x - vt) + \frac{v^2}{c^2} f''(x - vt) = 0. \qquad (2)$$
Hence $v = \pm c$. Since Maxwell's equations are laws of physics, the corollary that electromagnetic waves travel at the fixed speed $c$, which can be computed from the permeability and
permittivity of free space, must also be a law of physics.
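The travelling-wave argument can be checked with a short finite-difference sketch. The Gaussian pulse shape, the sample point and the step size are arbitrary choices made for this illustration. The residual of the wave equation vanishes (up to rounding) only when the pulse speed equals $\pm c$.

```python
import math

def pulse(u):
    """An arbitrary smooth shape f(u); a Gaussian is used here for illustration."""
    return math.exp(-u * u)

def wave_residual(v, c=1.0, x=0.3, t=0.2, h=1e-3):
    """Finite-difference value of -psi_xx + psi_tt / c^2 for psi(x, t) = f(x - v t)."""
    def psi(x_, t_):
        return pulse(x_ - v * t_)
    d2x = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    d2t = (psi(x, t + h) - 2.0 * psi(x, t) + psi(x, t - h)) / h**2
    return -d2x + d2t / c**2

for speed in (1.0, -1.0, 0.5):
    print(speed, wave_residual(speed))
```

The residual for $v = 0.5$ is of order one, while for $v = \pm 1$ it is zero to within finite-difference rounding.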
Q: How could one get around this? Hint: why isn't the speed of waves on the string from
two weeks ago a fundamental law of physics?
A: Just as the wave equation on a string implicitly involves the reference frame of the
string, perhaps Maxwell's equations also involve some implicit reference frame, that is, some
medium in which electromagnetic waves travel. Nineteenth-century physicists hypothesized
an invisible medium called the ether, which was disproved in the famous Michelson-Morley
experiment. Nowadays people make fun of the ether concept, but really it's a very reasonable
idea. Certainly it fares no worse than relativity when subjected to Occam's Razor.
So anyway, if you are forced to accept these postulates, you can start calculating some
odd stuff such as time dilation and length contraction. To prevent chaos, one must always be
clear about what reference frame one is using. In class, a reference frame was described as a
collection of synchronized clocks. In more human terms, it is a collection of observers who can
agree on time and distance measurements. Measurement is not the same as perception;
for example, suppose two people live several miles apart and watch lightning storms for fun.
Suppose they want to agree on the exact time of lightning strikes. Although they observe
(hear) the thunder arriving at different times due to the finite speed of sound, they can
compare the times at which they heard the thunder and come up with some triangulation
scheme to agree on when the lightning actually occurred. Similarly relativity deals with
measurements that every observer in a given reference frame could agree on after accounting
for their relative positions, etc.
In lecture you saw the example of two black blocks defining a reference frame with one
blue block moving relative to them. By a light clock argument, we saw that the amount of
time (that is, the number of light clock ticks) that the blue block measures is less than the
time that the black blocks measure. That is, time for the lone observer moves more slowly
by a factor $\gamma = 1/\sqrt{1 - v^2/c^2}$.
Q: First things first, if a light clock runs slow, how do you conclude that all physical
processes run slow?
A: Otherwise, the speed of light relative to other things would be different, which is
the same as saying that the speed of light is not the same in every reference frame. For
example, if the blue observer's atomic clock did not run slowly but his light clock did, he
would measure a different speed of light (using his atomic clock) than the black observers
would with their atomic clock.
Q: How would you challenge the "lone observer, slow clock" statement?
A: Add a second blue observer who synchronizes with the first!
So now you have two reference frames, blue and black. Each thinks that the other's clock
is running slowly! But then we seem to have a paradox: for some time interval, the blue
clock ticks less than the black clock, which ticks less than the blue clock...
Q: Help!
A: The two frames don't agree on time intervals. For example, suppose the two frames
decided to measure each other's time as follows: Starting from when the right blue block
passes the left black block and ending when the right and left blocks are at the same positions
(we assume the blocks are separated by the same distance in their reference frames), the blue
frame counts the number of black ticks and vice versa. The problem is that this relies on
the right blocks and left blocks lining up simultaneously, which can be true in one frame and
false in the other. Thus there is no paradox, although there is still weirdness.
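Another thing we introduced was the idea of spacetime coordinates and a spacetime
interval. If we have a point $\mathbf{x}$ and a time $t$, then we can combine them into a four-dimensional
object $(t, \mathbf{x})$. This is just convenient notation for a theory where space and time get mixed
up. Then the spacetime interval between two events $(t_1, \mathbf{x}_1)$ and $(t_2, \mathbf{x}_2)$ is just the difference
$(t_2, \mathbf{x}_2) - (t_1, \mathbf{x}_1)$. In Morin, the way to switch between different reference frames is derived.
Suppose reference frame $S'$ moves at velocity $v\hat{x}$ with respect to reference frame $S$. Then if
frame $S$ measures two events separated by the interval $(\Delta x, \Delta t)$, frame $S'$ measures
them separated by $(\Delta x', \Delta t')$, where in units $c = 1$
$$\Delta x' = \gamma(\Delta x - v\,\Delta t) \qquad (3)$$
$$\Delta t' = \gamma(\Delta t - v\,\Delta x) \qquad (4)$$
$$\Delta y' = \Delta y \qquad (5)$$
$$\Delta z' = \Delta z \qquad (6)$$
or, in matrix form,
$$\begin{pmatrix} \Delta t' \\ \Delta x' \\ \Delta y' \\ \Delta z' \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \Delta t \\ \Delta x \\ \Delta y \\ \Delta z \end{pmatrix}. \qquad (7)$$
The invariant interval $s$ is defined by
$$s^2 = \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2, \qquad (8)$$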
and you can check that it is invariant under Lorentz transformations. In words, $s$ measures
the time surplus relative to how long it takes light to travel a certain distance in space. Then
if $s^2$ is positive it means light could travel the spatial interval in less than the time interval.
If two events are separated by such an interval, this means that the earlier can affect the
later one.
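The invariance claim is easy to spot-check numerically. The following sketch, in units $c = 1$ and with one spatial dimension, boosts a sample separation and compares $\Delta t^2 - \Delta x^2$ before and after. The function names and sample numbers are choices made for this illustration.

```python
import math

def boost(dt, dx, v):
    """Lorentz-transform a separation (dt, dx) to a frame moving at velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)

def interval_squared(dt, dx):
    """Invariant interval s^2 = dt^2 - dx^2 for a separation along x."""
    return dt * dt - dx * dx

dt, dx = 3.0, 1.0
dt_prime, dx_prime = boost(dt, dx, 0.6)
print(interval_squared(dt, dx), interval_squared(dt_prime, dx_prime))
```

Both numbers agree even though the time and space separations individually change under the boost.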
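$$v(\phi + \epsilon) = \frac{v(\phi) + \epsilon}{1 + \epsilon\, v(\phi)} = v(\phi) + \epsilon\,\big(1 - v(\phi)^2\big) + O(\epsilon^2) \qquad (9)$$
$$v'(\phi) = \lim_{\epsilon \to 0} \frac{v(\phi + \epsilon) - v(\phi)}{\epsilon} = 1 - v^2 \;\Rightarrow\; v = \tanh\phi \qquad (10)$$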
I uploaded a document that derives Lorentz transformations from the invariant interval
onto the website, and it shows the mathematical significance of the rapidity.
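One practical consequence of $v = \tanh\phi$ is that rapidities simply add when collinear velocities are combined. The sketch below checks this numerically in units $c = 1$; the function names and sample velocities are choices made for this illustration.

```python
import math

def add_velocities(v1, v2):
    """Relativistic addition of collinear velocities (c = 1)."""
    return (v1 + v2) / (1.0 + v1 * v2)

def rapidity(v):
    """Rapidity phi defined by v = tanh(phi)."""
    return math.atanh(v)

v1, v2 = 0.5, 0.8
combined = add_velocities(v1, v2)
print(combined, rapidity(combined), rapidity(v1) + rapidity(v2))
```

The combined velocity stays below 1, while the rapidity of the combination equals the sum of the individual rapidities.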
Let's do some problems.
Problems
2.1
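Problem: Show that if the wave equation
$$\frac{\partial^2\psi}{\partial x^2} - \frac{\partial^2\psi}{\partial t^2} = 0 \qquad (11)$$
is true in one reference frame, then the same equation is true if we replace $(t, x) \to (t', x')$.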
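Solution: This calls for the chain rule, which we will apply to second derivatives for the
first time. We can say that $\psi$ depends on $(t, x)$ via the intermediate variables
$$x' = \gamma(x - vt) \qquad (12)$$
$$t' = \gamma(t - vx). \qquad (13)$$
The chain rule gives
$$\frac{\partial}{\partial x} = \frac{\partial x'}{\partial x}\frac{\partial}{\partial x'} + \frac{\partial t'}{\partial x}\frac{\partial}{\partial t'} \qquad (14)$$
$$\frac{\partial}{\partial t} = \frac{\partial x'}{\partial t}\frac{\partial}{\partial x'} + \frac{\partial t'}{\partial t}\frac{\partial}{\partial t'}, \qquad (15)$$
so that
$$\frac{\partial}{\partial x} = \gamma\frac{\partial}{\partial x'} - \gamma v\frac{\partial}{\partial t'} \qquad (16)$$
$$\frac{\partial}{\partial t} = -\gamma v\frac{\partial}{\partial x'} + \gamma\frac{\partial}{\partial t'}. \qquad (17)$$
Hence
$$\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial t^2} = \gamma^2\left[\left(\frac{\partial}{\partial x'} - v\frac{\partial}{\partial t'}\right)^2 - \left(-v\frac{\partial}{\partial x'} + \frac{\partial}{\partial t'}\right)^2\right] \qquad (18)$$
$$= \gamma^2\left[(1 - v^2)\left(\frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2}\right) + (2v - 2v)\frac{\partial^2}{\partial x'\,\partial t'}\right] \qquad (19)$$
$$= \gamma^2 (1 - v^2)\left(\frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2}\right) \qquad (20)$$
$$= \frac{\partial^2}{\partial (x')^2} - \frac{\partial^2}{\partial (t')^2}. \qquad (21)$$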
2.2
Problem: One of the postulates of relativity is that the laws of physics are the same in all
inertial reference frames. It would certainly contradict this if a force existed in one frame
and not in another. But wait... do you remember magnetism from AP physics?
Q: What's the problem?
A: Magnetic fields are produced by and act on moving charges. A charged particle that
is moving in one frame and experiences a magnetic force is motionless in some other frame.
The force is no more.
Q: What's a possible resolution?
A: As we will see, the forces remain the same, but in one frame they are called magnetic
and in another they are called electric. In other words, given relativity and electrostatics,
magnetism must exist. If that's not compelling enough for you, I don't know what is. Let's
now state the quantitative part of our problem.
Problem: Suppose we model an electric current as a very long line with density $\lambda$ of
positive charges moving at velocity $v\hat{z}$ and a density $-\lambda$ of negative charges moving at velocity
$-v\hat{z}$, for a net current of $I = 2\lambda v$. This produces a magnetic field
$$B(r) = \frac{\mu_0 I}{2\pi r}. \qquad (22)$$
This means that their densities are increased by the same factors
$$\lambda_\pm = \gamma_\pm\,\lambda/\gamma = \frac{\lambda/\gamma}{\sqrt{1 - v_\pm^2}}. \qquad (25)$$
Since the negative carriers move faster, $\gamma_- > \gamma_+$ and their density is enhanced more. Thus
the charged particle sees a negatively charged wire with net charge density $(\lambda_+ - \lambda_-)$. This
confirms that for $q > 0$ and $u > 0$ the particle is attracted to the wire.
2.3
Problem: A train has length $L$ in its rest frame and is moving with velocity $v_t\hat{x}$ with respect
to a station, where $v_t > 0$. A conductor walks from the left end to the right end at speed $v_c$ relative to the train.
His dog also starts at the left end and runs to the right end, back to the conductor, back to
the right end, etc. at speed $v_d > v_c$ relative to the train. The conductor has an extremely
accurate biological clock to measure the dog's age at both ends of the train. According to
the conductor's clock, how much does his dog age?
Solution: The first thing to realize is that the station frame is irrelevant. The second
thing to do is to translate this as a physics question.
Q: What is the question?
A: It is, how much time elapses in the dog's reference frame?
Q: How can we obtain this?
A: We could find the time in the train's reference frame and then use the fact that
the dog's clock runs slowly. It's essentially the twin paradox with the dog replacing the
space-traveling twin and the train replacing the Earth.
Q: In the train's reference frame, how much time elapses?
A: Considering the conductor's path, it is $L/v_c$, so the time elapsed in the dog's frame is
$$\frac{L}{\gamma_d v_c} = \frac{L\sqrt{1 - v_d^2}}{v_c}. \qquad (26)$$
Q: Does it matter that the conductor's frame is not the same as the train's frame?
A: No. The clock is moving with respect to the train, but its measurements are perfectly
valid spacetime events in the train frame. Of course, we could solve the problem in the
conductor's frame too, but it would be more complicated.
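As a quick sanity check of equation (26), here is a minimal sketch in units $c = 1$. The function name and the sample numbers are assumptions made for illustration.

```python
import math

def dog_aging(train_length, v_conductor, v_dog):
    """Proper time elapsed for the dog while the conductor walks the train (c = 1)."""
    train_frame_time = train_length / v_conductor   # time for the conductor's walk
    return train_frame_time * math.sqrt(1.0 - v_dog**2)  # dog's clock runs slow by 1/gamma_d

print(dog_aging(1.0, 0.1, 0.6))
```

Note that the station-frame speed $v_t$ never enters, in line with the remark that the station frame is irrelevant.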
Review
4-vectors
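A 4-vector $A = (A_0, A_1, A_2, A_3)$ is a set of four quantities that transform like $(t, x, y, z)$, that is, as
$$A_0' = \gamma(A_0 - vA_1) \qquad (1)$$
$$A_1' = \gamma(A_1 - vA_0), \quad A_2' = A_2, \quad A_3' = A_3 \qquad (2)$$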
when you switch between reference frames with relative velocity $v$. This is analogous to
regular vectors in two and three dimensions. For example, in two dimensions the position
vector $(x, y)$ transforms as
$$x' = \cos\theta\, x - \sin\theta\, y \qquad (3)$$
$$y' = \sin\theta\, x + \cos\theta\, y \qquad (4)$$
when you rotate your coordinates by an angle $\theta$. Then velocity and acceleration are vectors
because the same rule works if you replace $(x, y)$ by $(v_x, v_y)$ or $(a_x, a_y)$. In contrast, (Moose,
Squirrel) is not a vector, since when you rotate your coordinates you do not get (Moose
$\cos\theta$ + Squirrel $\sin\theta$, ...).
We can divide both sides of the above equations by some number $\lambda$ that does not depend
on the reference frame to get an identical transformation law for $A/\lambda$, which shows that $A/\lambda$
is a 4-vector. This seems a little useless; it tells us that $2A$ is also a 4-vector. However, let
$x(\tau) = (t, \mathbf{x})$ be the 4-vector describing a particle's spacetime coordinates in a laboratory
frame as a function of the particle's proper time $\tau$. Then $x(\tau + d\tau) - x(\tau)$ is also a 4-vector
for any $d\tau$. Now $d\tau$ is the same in every frame (it is defined that way), so we can choose
$\lambda = d\tau$ and take the limit $d\tau \to 0$ to get that
$$V \equiv \frac{dx}{d\tau} = \frac{dt}{d\tau}\frac{dx}{dt} = \gamma(1, \mathbf{v}) \qquad (5)$$
is a 4-vector. Multiplying by the frame-independent mass $m$ gives another 4-vector,
$$P = mV = (\gamma m, \gamma m \mathbf{v}), \qquad (6)$$
whose components are
conserved in simple collisions. Thus heuristically it seems like they ought to be the energy
and momentum, so that $P = (E, \mathbf{p})$. More on that later.
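Just as $t^2 - x^2$ is invariant as a consequence of how Lorentz transformations work, the
same must be true for any 4-vector:
$$A_0^2 - |\mathbf{A}|^2 = \text{constant}. \qquad (8)$$
For the energy-momentum 4-vector this reads
$$E^2 - |\mathbf{p}|^2 = m^2. \qquad (9)$$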
This is actually just a special case of the invariance of the 4-vector inner product. Defining
$$A \cdot B \equiv A_0 B_0 - \mathbf{A}\cdot\mathbf{B}, \qquad (10)$$
one can show that $A \cdot B = A' \cdot B'$ is the same in any inertial reference frame. Then for
energy-momentum we have
$$P^2 \equiv P \cdot P = m^2. \qquad (11)$$
For a single particle this is a useful algebraic relation between $E$, $\mathbf{p}$, and $m$, but we can use
it for the total momentum of a system, since the sum of 4-vectors is a 4-vector. Then we get
the mass of the system
$$m = \sqrt{P_{\text{tot}}^2}, \qquad (12)$$
which says that the mass of a system is the energy in its zero-momentum frame ($P_0 = \gamma m = m$
when $v = 0 \Rightarrow \gamma = 1$). Said another way, mass is defined as the energy something has
just for existing, and not for doing anything. The unsettling consequence of this is that the
mass of a system of particles is not simply the sum of the masses of its constituents. If this
doesn't bother you, you are the most sanguine physics student in history. Let's try to make
it bother you less.
Q: Basically the problem is that you think of mass as essentially "stuff." Masses should
add together in the same way that 4 ducks and 5 ducks make 9 ducks (or 4 atoms plus 5
atoms etc). How do you get around this?
A: You skirt the issue completely by stating that "stuff" does not exist! On a subatomic
level there is no such thing as a solid lump of stuff, just point particles and empty space
or fuzzy wavefunctions, depending on your point of view. Thus there is no sense in which
two electron lumps make a lump that's twice as big. So we have gotten over this issue,
with the price that now we realize that in some sense only abstract mathematical objects,
not "things," actually exist.
Now maybe it bothers you that something has energy just by existing, without doing
anything. For a composite particle it's not a big deal, since in the zero-momentum frame
of a proton there are a lot of internal motions of the quarks, hence the proton is doing
something. But for a fundamental particle like an electron, it seems weird that something
has energy in the absence of dynamics.
1.2
Lagrangian
As discussed a few weeks ago, the fundamental way to define energy and momentum is
as the quantities that are conserved by virtue of translation invariance in time and space,
respectively. This requires a Lagrangian and Noether's Theorem. Since the laws of motion
follow from stationary action and need to be the same in all inertial frames, we hope that the
action itself is frame-independent. For little reason other than we don't have many invariant
quantities to work with and that the Lagrangian ought not to depend explicitly on position,
we take
$$S = -m\int d\tau = -m\int \sqrt{dt^2 - dx^2} = -m\int \sqrt{1 - \dot x^2}\; dt. \qquad (13)$$
This is your first taste of writing down a Lagrangian that satisfies certain symmetries (in this
case, Lorentz invariance) with no other justification. This method isn't as much of a blind
guess as it seems for very deep reasons that you will learn about in quantum field theory.
If it strikes you as unsatisfying now, without the perspective of QFT, I don't blame you.
What we have done is to toss out all the physics we have learned except the idea that maybe
Lagrangians are a good way of describing things and the need for Lorentz invariance. Then
we tried to write down the simplest possible Lorentz-invariant Lagrangian and hoped that the
consequences would match experimental evidence. This is not a derivation of the relativistic
Lagrangian in the sense that you start with some laws of physics and work backwards in
logical steps until you cast it all in the form of a Lagrangian. In some sense all we did was
hope that the simple Lagrangian was actually correct. This works in physics unreasonably
often. Sometimes (always?) there is a deep principle in the background and it's not actually
voodoo.
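Anyway, the Lagrangian has the desired invariance properties under translations in space
and time, so the conserved quantities are
$$E = \sum_i \dot x_i \frac{\partial L}{\partial \dot x_i} - L = \sum_i \frac{m\dot x_i^2}{\sqrt{1 - \dot x^2}} + m\sqrt{1 - \dot x^2} \qquad (14)$$
$$= \gamma m v^2 + m/\gamma \qquad (15)$$
$$= (m/\gamma)(1 + v^2\gamma^2) \qquad (16)$$
$$= (m/\gamma)\left(1 + \frac{v^2}{1 - v^2}\right) \qquad (17)$$
$$= (m/\gamma)\,\frac{1 - v^2 + v^2}{1 - v^2} = \frac{m\gamma^2}{\gamma} \qquad (18)$$
$$= \gamma m \qquad (19)$$
and
$$p_i = \frac{\partial L}{\partial \dot x_i} = \gamma m \dot x_i. \qquad (20)$$
These are the energy and momentum that we found earlier. The relativistic Lagrangian has
now served its purpose.
2
Problems
2.1
Pair Formation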
Problem: Two photons of energy $E_1$ and $E_2$ collide at an angle $\theta$. What is the minimum
possible $E_1$ to produce two particles of mass $m$?
Solution: Instead of immediately crunching a lot of kinematics, let's make our task easier.
Q: What is the least possible total energy of the two emitted particles? Hint: their
total momentum is fixed by momentum conservation, so the question is: given $\mathbf{p}$, how do we
minimize $E$?
A: For fixed total momentum, energy is minimized by having the least possible internal
dynamics of the two-particle system, that is, the least possible energy in the zero-momentum
frame. This is achieved if the two particles have the same velocity. Then in this best-case
scenario, we essentially produce one particle of mass $2m$. Let $P$ be the 4-momentum of this
fictitious particle. Then
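$$\gamma_1 + \gamma_2 = P, \qquad (21)$$
where $\gamma_{1(2)}$ are the photon 4-momenta.
Q: What are these?
A: Since $E = |\mathbf{p}|$ for photons, we can choose coordinates where
$$\gamma_1 = (E_1, E_1, 0, 0) \qquad (22)$$
$$\gamma_2 = (E_2, E_2\cos\theta, E_2\sin\theta, 0). \qquad (23)$$
Q: Now what?
A: Square both sides of 4-momentum conservation and use $\gamma^2 = 0$ to get
$$(\gamma_1 + \gamma_2)^2 = P^2 \qquad (24)$$
$$2\gamma_1\cdot\gamma_2 = (2m)^2 \qquad (25)$$
$$\gamma_1\cdot\gamma_2 = 2m^2 \qquad (26)$$
$$E_1 E_2 (1 - \cos\theta) = 2m^2 \qquad (27)$$
$$E_1 = \frac{2m^2}{E_2(1 - \cos\theta)}. \qquad (28)$$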
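The threshold result can be sanity-checked by computing the invariant mass of the photon pair directly from the component form of the two photon 4-momenta. The sketch below uses units $c = 1$; the function names and sample numbers are choices made for this illustration.

```python
import math

def threshold_energy(e2, theta, m):
    """Minimum E1 for photons meeting at angle theta to make two particles of mass m."""
    return 2.0 * m**2 / (e2 * (1.0 - math.cos(theta)))

def invariant_mass(e1, e2, theta):
    """Invariant mass of photons (e1, e1, 0, 0) and (e2, e2 cos, e2 sin, 0)."""
    energy = e1 + e2
    px = e1 + e2 * math.cos(theta)
    py = e2 * math.sin(theta)
    return math.sqrt(energy**2 - px**2 - py**2)

m, e2, theta = 0.511, 1.0, 2.0   # sample values; any consistent energy units
e1 = threshold_energy(e2, theta, m)
print(e1, invariant_mass(e1, e2, theta), 2 * m)
```

At threshold the invariant mass of the pair equals exactly $2m$, which is the "one fictitious particle of mass $2m$" picture used in the solution.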
2.2
Photon Rocket
Problem: A rocket propels itself by emitting photons. When it starts from rest it has mass
$m_0$ and eventually reaches 4-momentum $(E, p)$ and then decelerates back to rest, again by
emitting photons. What is its final mass?
Solution: By conservation of momentum, the photons emitted during the acceleration
phase have total momentum $-p$, and hence have 4-momentum $(p, -p)$. Thus 4-momentum
conservation gives
$$(m_0, 0) = (p, -p) + (\gamma m_1, p) \qquad (29)$$
$$\gamma m_1 = m_0 - p, \qquad (30)$$
where $m_1$ is the mass after acceleration and $\gamma m_1 = E$ is the rocket's energy at that point. By the same logic, the deceleration photons have
momentum $p$ and carry off energy $p$, which is reflected in the rocket's loss of mass, so
$$m_2 = \gamma m_1 - p = m_0 - 2p. \qquad (31)$$
2.3
Morin 13.11
Problem: Three particles go off at equal speeds $v$ at angle $2\pi/3$ with respect to each other.
What is the angle $\theta$ between any two particles' velocities in the rest frame of the third?
Solution: The strategy for problems like this is to (1) write out relevant 4-vectors, in this
case the velocity 4-vectors, in terms of unknowns in the desired frame; (2) take inner products
of 4-vectors in this frame and relate them to the desired unknown quantities; (3) calculate
the inner products in whatever frame is easiest. In the rest frame of particle 3 (frame 3),
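$$V_1' = \gamma'(1, v', 0, 0) \qquad (32)$$
$$V_2' = \gamma'(1, v'\cos\theta, v'\sin\theta, 0) \qquad (33)$$
$$V_3' = (1, 0, 0, 0), \qquad (34)$$
where $v'$ and $\gamma'$ are the speed and associated $\gamma$ factor of particles 1 and 2 in frame 3. There
are only two independent unknowns, since $(\gamma')^2 = 1/(1 - (v')^2)$. Next we take some dot
products:
$$V_1'\cdot V_2' = (\gamma')^2\big(1 - (v')^2\cos\theta\big) \qquad (35)$$
$$V_1'\cdot V_3' = \gamma'. \qquad (36)$$
The second of these is really convenient since we have an expression for $\gamma'$ without any
algebra. Using also the relation
$$(v')^2 = \frac{1}{(\gamma')^2}\big((\gamma')^2 - 1\big), \qquad (37)$$
we get
$$V_1'\cdot V_2' = (V_1'\cdot V_3')^2\left[1 - \left(1 - \frac{1}{(V_1'\cdot V_3')^2}\right)\cos\theta\right]. \qquad (38)$$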
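Now we exploit invariance of the inner product to replace the primed inner products by
unprimed ones. Furthermore, symmetry lets us say
$$V_1'\cdot V_2' = V_1\cdot V_2 = V_2'\cdot V_3' = V_2\cdot V_3. \qquad (39)$$
Calling this common value $\alpha$, we must compute $\alpha = V_1\cdot V_2$ and solve
$$\alpha = \alpha^2\left[1 - \left(1 - \frac{1}{\alpha^2}\right)\cos\theta\right] \qquad (40)$$
$$\cos\theta = \frac{\alpha}{\alpha + 1}. \qquad (41)$$
So finally, what is the inner product? As in the tetrahedron problem from a problem set,
the sum of velocity 3-vectors is zero and the square of a velocity 4-vector is 1. Thus
$$(V_1 + V_2 + V_3)^2 = 3V_1^2 + 6V_1\cdot V_2 = (3\gamma, 0, 0, 0)^2 \qquad (42)$$
$$3 + 6\alpha = 9\gamma^2 \qquad (43)$$
$$\alpha = \frac{3\gamma^2 - 1}{2}. \qquad (44)$$
Hence
$$\cos\theta = \frac{\frac{3\gamma^2 - 1}{2}}{\frac{3\gamma^2 - 1}{2} + 1} = \frac{3\gamma^2 - 1}{3\gamma^2 + 1} \qquad (46)$$
$$= \frac{\frac{3}{1 - v^2} - 1}{\frac{3}{1 - v^2} + 1} = \frac{3 - 1 + v^2}{3 + 1 - v^2} = \frac{2 + v^2}{4 - v^2}. \qquad (47)$$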
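The final formula can be checked against a brute-force calculation that skips 4-vectors entirely: boost the velocities of particles 1 and 2 into the rest frame of particle 3 with the standard relativistic velocity-transformation formulas (a textbook result, not derived in these notes) and measure the angle between them. Units are $c = 1$, and the function names are choices made for this sketch.

```python
import math

def cos_angle_by_boost(v):
    """cos(theta) between particles 1 and 2 in the rest frame of particle 3 (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    transformed = []
    for sign in (+1.0, -1.0):
        # Particle 3 moves along +x; particles 1 and 2 are at +/-120 degrees from it.
        ux = v * math.cos(2.0 * math.pi / 3.0)
        uy = sign * v * math.sin(2.0 * math.pi / 3.0)
        denom = 1.0 - ux * v
        transformed.append(((ux - v) / denom, uy / (gamma * denom)))
    (ax, ay), (bx, by) = transformed
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def cos_angle_formula(v):
    """Closed-form result (2 + v^2) / (4 - v^2)."""
    return (2.0 + v * v) / (4.0 - v * v)

for v in (0.1, 0.5, 0.9):
    print(v, cos_angle_by_boost(v), cos_angle_formula(v))
```

For small $v$ both approach $\cos\theta = 1/2$, the nonrelativistic 60 degree angle of an equilateral triangle, and as $v \to 1$ the angle closes toward zero.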
Review
dp
dt
(1)
(2)
We learned a bit about relativistic rocket motion, which is essentially just conservation of
momentum. Relativistic strings are just a matter of connecting objects with a constant
tension. That doesn't imply that your homework this week is easy, just that a review of
concepts won't be as useful as going straight to the problems.
2
Problems
2.1
Three Quarks
Problem: Three massless quarks are in the same place and go off with equal energies $E$ at
equal angles from each other. They are connected by massless strings of tension $T$. What is
the period of the resulting motion?
Solution: The time it takes to stop, which for particles travelling at speed $c = 1$ is equal to
the distance it takes to turn around, is one quarter of a period. Thus the period is four times
the distance $d$ from the center of an equilateral triangle of side length $L$ to one of its vertices,
where $L$ is the length of each string when the quarks first turn around. By conservation of
energy, we need $3E = 3LT$, so $L = E/T$. Some geometry gives $d = L/\sqrt{3}$, so
$$P = \frac{4E}{T\sqrt{3}}. \qquad (3)$$
2.2
Laser-Driven Rocket
Problem: A rocket ship is driven, starting from rest, by a laser beam of power $w$ shot from
Earth. The ship varies the fraction $\epsilon$ of photons it absorbs by spreading photon-absorbing
sails so that it experiences a constant acceleration $g$ in its own frame. What is $\epsilon(t)$ according
to the Earth's frame?
Solution: First, we need a plan of what information we need to calculate in what order.
Let's work backwards. To get $\epsilon(t)$, we need to know the rate at which the ship consumes
energy. To get energy consumption, we need to know its energy as a function of time. From
$E = \gamma m$, we need to know its speed $v(t)$ and mass $m(t)$ as a function of time. Since its
acceleration is known, in one frame at least, it seems like $v(t)$ can't be too hard to obtain.
Perhaps $m(t)$ will be a bit harder, but maybe once we have $v(t)$ we will have a chance. So,
a loose plan is
1. Get $v(t)$ somehow.
2. Get $m(t)$ somehow.
3. Get $E(t)$ and $dE/dt$ straightforwardly.
4. Find the rate of laser energy available to ship.
5. Take the ratio.
Q: To get you warmed up, tell me what's wrong with this argument:
The rapidity $\phi$ is just the integral of acceleration in the ship's frame, so
$\phi(t) = \int_0^t g\, dt' = gt$, so $v(t) = \tanh(gt)$.
A: The problem is that the upper limit of integration that defines rapidity is the ship's
proper time $\tau$, not the Earth time $t$. And you can't just slap on a factor of $\gamma$ because the speed
is not constant.
Here's an efficient way to get $v(t)$. First, convince yourself that if you know the acceleration as a function of time, then you know the velocity as a function of time, independent
of the mass. Thus to get $v(t)$, we can pretend we have an object of constant mass $m$ that
feels acceleration $g$ in its own frame. Then the force is $F = mg$ in its frame, and since the
relative velocity of the frames is collinear with the force, this is also the Earth frame force.
Now apply $F = dp/dt$ to get
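$$F = mg = \frac{dp}{dt} \;\Rightarrow\; p = mgt = m\gamma(t)v(t) \;\Rightarrow\; \gamma(t)v(t) = gt. \qquad (4)$$
Hence
$$\frac{v(t)}{\sqrt{1 - v^2(t)}} = gt \;\Rightarrow\; v(t) = \frac{gt}{\sqrt{g^2t^2 + 1}}. \qquad (5)$$
For the energy, each absorbed photon adds equal amounts of energy and momentum to the ship, so $E - p = \gamma m (1 - v)$ keeps its initial value $m_0$. This gives
$$E(t) = \frac{m_0}{1 - v(t)} \qquad (8)$$
$$\frac{dE}{dt} = \frac{m_0\, dv/dt}{(1 - v(t))^2}. \qquad (9)$$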
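The difference between the correct $v(t)$ and the tempting $\tanh(gt)$ can be made concrete. The sketch below uses the standard constant-proper-acceleration relation $t = \sinh(g\tau)/g$ between Earth time and ship proper time (a textbook result assumed here, not derived in these notes) to show that $\tanh(g\tau)$ and $gt/\sqrt{g^2t^2 + 1}$ agree, while $\tanh(gt)$ does not. Units are $c = 1$; the function names are choices made for this illustration.

```python
import math

def velocity_from_earth_time(g, t):
    """v(t) = g t / sqrt(g^2 t^2 + 1), equation (5)."""
    return g * t / math.sqrt((g * t) ** 2 + 1.0)

def velocity_from_proper_time(g, tau):
    """v = tanh(rapidity), with rapidity g * tau in the ship's proper time."""
    return math.tanh(g * tau)

def earth_time(g, tau):
    """Earth-frame time elapsed after proper time tau (assumed standard relation)."""
    return math.sinh(g * tau) / g

g, tau = 1.0, 0.7
t = earth_time(g, tau)
print(velocity_from_proper_time(g, tau), velocity_from_earth_time(g, t), math.tanh(g * t))
```

The first two numbers coincide; the third, which uses Earth time in place of proper time, overestimates the speed.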
Finally, we divide this by the rate at which energy reaches the ship in the Earth frame.
This is not simply $w$.
Q: Why?
A: Because the laser beam travels at speed $c = 1$, while the ship travels at speed $v(t)$.
Thus energy reaches the ship at a reduced rate $w(1 - v(t))$, so we have
$$\epsilon(t) = \frac{dE/dt}{w(1 - v(t))} = \frac{m_0\, dv/dt}{w\,(1 - v(t))^3} = \frac{m_0 g}{w}\left(gt + \sqrt{g^2t^2 + 1}\right)^3. \qquad (10)$$
2.3
Unfortunately, it seems that we must have a double integral over time, since different times t1
and t2 interact. We no longer have an action expressed as the time integral of a Lagrangian.
Worse, by making the interaction Lorentz-invariant, we have lost the Lorentz invariance of
the action, since $dt_1$ and $dt_2$ are not invariant. To fix the first problem, we could say that we
only want an interaction that applies when $t_1 = t_2$, but if we need the condition $t_1 - t_2 = 0$
to hold in every frame, we also need $x_1 - x_2 = 0$, so that we need something like
Z
L(0, 0 (if x1 = x2 )
Sint =
(13)
0 (otherwise)
In addition to our queasiness with the discontinuity inherent in having an interaction that
suddenly turns on for particles that are in exactly the same place, we still don't have Lorentz
invariance since $dt$ is not invariant.
Z
Q: What could we do instead of
matrix?
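A: The quantity $dt\,dx\,dy\,dz$ is Lorentz invariant, so we can integrate over all of space and
time. To verify this, recall that
$$dt'\,dx'\,dy'\,dz' = dt\,dx\,dy\,dz\; \left|\det\begin{pmatrix} \frac{\partial t'}{\partial t} & \frac{\partial t'}{\partial x} & \frac{\partial t'}{\partial y} & \frac{\partial t'}{\partial z} \\ \frac{\partial x'}{\partial t} & \frac{\partial x'}{\partial x} & \frac{\partial x'}{\partial y} & \frac{\partial x'}{\partial z} \\ \frac{\partial y'}{\partial t} & \frac{\partial y'}{\partial x} & \frac{\partial y'}{\partial y} & \frac{\partial y'}{\partial z} \\ \frac{\partial z'}{\partial t} & \frac{\partial z'}{\partial x} & \frac{\partial z'}{\partial y} & \frac{\partial z'}{\partial z} \end{pmatrix}\right|. \qquad (14)$$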
Because the Lorentz transformation is linear, the matrix of derivatives (the Jacobian) is
simply the matrix of Lorentz transformation coefficients. For boosts along the x-axis, this
comes out to:
$$\det\begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \gamma^2(1 - v^2) = 1. \qquad (15)$$
Hence $dt'\,dx'\,dy'\,dz' = dt\,dx\,dy\,dz$. The interaction part of the action now looks like
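The unit determinant is quick to confirm numerically. The sketch below builds the boost matrix for a sample velocity and evaluates its determinant by cofactor expansion; the helper names are choices made for this illustration, and units are $c = 1$.

```python
import math

def boost_matrix(v):
    """4x4 Lorentz boost along x acting on (t, x, y, z), in units c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def determinant(matrix):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total

print(determinant(boost_matrix(0.8)))
```

The result is 1 to within rounding for any $|v| < 1$, which is the statement that the spacetime volume element is boost invariant.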
Z
(16)
(17)
t2 x2 y 2 z 2
is invariant. Then one simple action for a relativistic field is
2
Z
2 2 2 4
2
S = (x) +
2 2 2 d x.
t2
x
y
z
2.4
Relativistic Bucket
(18)
Review
Tensors
By explicit calculation, one finds that the angular momentum of an object is related to its
angular velocity by
$$\mathbf{L} = I\boldsymbol{\omega}, \qquad (1)$$
where $I$ is a symmetric 3×3 matrix that depends on the geometry of the object and on the
point with respect to which $\mathbf{L}$ is measured. In general, you calculate it by doing tedious integrals.
Okay, I lied a little. Technically, $I$ is a tensor which is represented as a matrix when we
choose a particular coordinate system. Just as the vector one meter in length that points in
front of you could be (3, 3, 1) or (4, 6, 7) etc, but always points in front of you, a tensor has
an abstract meaning independent of any coordinate system. Let's explore what makes it a
tensor. One definition of a vector is that it transforms in a certain way. In special relativity
we saw that 4-vectors transform by Lorentz matrices when we change coordinate systems.
Similarly, a 3-vector is an object that transforms via rotation matrices, which I assume most
of you have encountered before. That is, for each rotation there exists a matrix $R$ such that
$$\mathbf{x}' = R\mathbf{x}, \qquad x_i' = \sum_j R_{ij} x_j \qquad (2)$$
relates the vector's coordinates in the original and rotated frames. Now $\mathbf{L}$ and $\boldsymbol{\omega}$ are vectors,
and
$$\mathbf{L} = I\boldsymbol{\omega}, \qquad L_i = \sum_j I_{ij}\,\omega_j. \qquad (3)$$
(4)
(5)
(6)
(7)
We can actually do a little better using the fact that rotation matrices leave angles and
magnitudes invariant, and hence do not affect the inner product. This is analogous to the
4-vector inner product of special relativity, which normal human beings learn about after
(8)
(9)
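$$\mathbf{a}^T R^T R\,\mathbf{b} = \mathbf{a}^T\mathbf{b} \qquad (10)$$
$$R^T R = \mathrm{Id}_{3\times 3} \qquad (11)$$
$$R^{-1} = R^T \qquad (12)$$
Hence
$$I' = R^{-1} I R \qquad (13)$$
$$= R^T I R \qquad (14)$$
$$I'_{jk} = (R^T I R)_{jk} = \sum_{m,n} (R^T)_{jm}\, I_{mn}\, R_{nk} \qquad (15)$$
$$= \sum_{m,n} R_{mj} R_{nk}\, I_{mn}. \qquad (16)$$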
This looks like two separate matrix multiplications by the rotation matrix, one for the
first index and one for the second. This transformation defines a tensor. If you have ever
encountered the tensor product in your math classes, this definition is equivalent, for suppose
you have a tensor product $x = v \otimes w$, $x_{ij} \equiv v_i w_j$. Then to get its value under a change of
basis, you would transform the two vectors separately:
$$x_{ij}' = v_i' w_j' = \sum_{m,n} R_{im} R_{jn}\, v_m w_n = \sum_{m,n} R_{im} R_{jn}\, x_{mn}. \qquad (17)$$
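Equation (17) can be verified directly. The sketch below rotates two sample vectors about the z-axis, forms their outer product, and compares it with the result of applying one rotation matrix to each index of the original outer product. The helper names, the rotation angle and the sample vectors are all choices made for this illustration.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate_vector(R, v):
    """x'_i = sum_j R_ij x_j."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def outer(v, w):
    """Tensor product x_ij = v_i w_j."""
    return [[v[i] * w[j] for j in range(3)] for i in range(3)]

def rotate_tensor(R, x):
    """x'_ij = sum_{m,n} R_im R_jn x_mn: one rotation matrix per index."""
    return [[sum(R[i][m] * R[j][n] * x[m][n] for m in range(3) for n in range(3))
             for j in range(3)] for i in range(3)]

R = rotation_z(0.7)
v, w = [1.0, 2.0, 3.0], [-0.5, 0.25, 4.0]
lhs = outer(rotate_vector(R, v), rotate_vector(R, w))   # rotate the vectors, then multiply
rhs = rotate_tensor(R, outer(v, w))                     # multiply, then rotate both indices
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

The two routes agree to rounding error, which is what it means for the outer product to transform as a tensor.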
1.2
Mathematically, the existence of three orthogonal principal axes, where angular momentum
and angular velocity are parallel to one another, follows from the spectral theorem, which
says that any real symmetric matrix (or complex matrix equal to its own conjugate transpose,
or under certain conditions, self-adjoint infinite-dimensional linear operators) has a complete
set of eigenvectors. The unintuitive existence of non-principal axes follows from the fact that
not everything is an eigenvector! Now let's try to understand things more physically. What
follows is my personal interpretation, which I hope you will find helpful.
Since angular momentum is conserved while angular velocity is not, you can think of $\boldsymbol{\omega}$
as a temporary condition that holds only as long as rotation around a certain axis is
imposed. A system may want to rotate around one axis based on its angular momentum
but may have to rotate around another axis, for example if a rod is stuck through it. If
the rod is removed instantaneously, $\mathbf{L}$ does not change but $\boldsymbol{\omega}$ certainly may. Thus, at least
one of the angular velocities is not parallel to the angular momentum. Consider just about
the simplest example: a globe. Suppose the globe is a perfect sphere and has a rod going
through the (geographic, not magnetic) poles. By symmetry the rod is a principal axis. Now
let's ruin this by adding two really big identical mountains at antipodal points, for example
Lake Baikal and Tierra del Fuego.
Q: How do these two mountains want to move?
A: You know in your heart that centrifugal forces will try to push the mountains toward
the equator, which the globe could achieve by rotating. If the rod were removed, that is
exactly what the globe would try to do. Thus we see a tangible case where a torque has to
be applied to maintain the direction of angular velocity.
Now suppose these two mountains are in the $x$-$z$ plane, so that their coordinates are
$(x, z)$ and $(-x, -z)$. Then their contribution to the inertia tensor's off-diagonal element $I_{xz}$
is, up to the conventional overall minus sign, $mxz + m(-x)(-z) = 2mxz$. The off-diagonal element is non-zero, which represents that
these are not principal axes.
Q: How could we locate the mountains so that the off-diagonal inertia tensor elements
cancel?
A: If the mountains were related by reflection symmetry, that is, at the same latitude but
opposite longitude, the contribution would be $mxz + m(-x)z = 0$. In fact, if we had a lot of
mountains, but every mountain at $(x, y, z)$ had a clone at $(-x, y, z)$, the off-diagonal element
would vanish. Physically, our centrifugal force argument doesn't work because you can't
move them both closer to the equator simultaneously, so they remain at equal distances
from the equator. The off-diagonal elements of the inertia tensor cancel to zero due to
symmetry. We can turn this around and say "The off-diagonal elements of the inertia tensor
measure the amount of asymmetry of an object." We can further say that the principal axes
define rotational motion where all the centrifugal forces balance each other.
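The two mountain configurations can be compared directly. This is a small sketch assuming point-mass mountains and the same sign convention as the text (the contribution of a mass is counted as $mxz$); the mass and coordinates are arbitrary illustrative values.

```python
def product_of_inertia_xz(masses):
    """Sum of m*x*z over point masses given as (m, x, y, z), in the text's sign convention."""
    return sum(m * x * z for (m, x, y, z) in masses)

m, x, z = 2.0, 0.6, 0.8   # arbitrary mountain mass and position in the x-z plane

# Antipodal mountains: (x, z) and (-x, -z).
antipodal = [(m, x, 0.0, z), (m, -x, 0.0, -z)]
# Mirror-image mountains: same latitude, opposite longitude, (x, z) and (-x, z).
mirrored = [(m, x, 0.0, z), (m, -x, 0.0, z)]

i_xz_antipodal = product_of_inertia_xz(antipodal)   # contributions add
i_xz_mirrored = product_of_inertia_xz(mirrored)     # contributions cancel
```

The antipodal pair gives $2mxz$, while the mirrored pair cancels, matching the symmetry argument.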
1.3
The proof that angular velocities add is quite simple. Let's go over it.
Theorem: Suppose object A is rotating around some point $R$ with angular velocity $\omega_A$
and object B is rotating with angular velocity $\omega_B$ with respect to the same point $R$ on object
A. Then object B is (instantaneously) rotating around $R$ with angular velocity $\omega_A + \omega_B$.
Proof: When object B is at point $r$, it is moving with velocity $\omega_B \times (r - R)$ in the rest
frame of object A. The same point $r$ on object A has velocity $\omega_A \times (r - R)$. Since relative
velocities simply add, the net velocity is $(\omega_A + \omega_B) \times (r - R)$, which describes rotational
motion with angular velocity $\omega_A + \omega_B$.
Morin first discusses Chasles' theorem because we had to know that the most general
motion of a rigid body is a translation plus a rotation, which means that the only
instantaneous motion about some fixed point is a rotation. This let us say that the velocity
$(\omega_A + \omega_B) \times (r - R)$ must equal $\omega \times (r - R)$ for some $\omega$, which lets us conclude that
$\omega = \omega_A + \omega_B$.
One final note: this only applies to instantaneous angular velocity, which in general is
always changing. It does not mean that the motion over time is a simple rotation!
1.4
Often the principal axes can be guessed based on instinct. If you don't have instinct, or
if you used to but relativity made it feel unloved and now it's sulking, there is a helpful
theorem.
Q: What are the principal axes of an airplane propeller or a fan?
A: Because these have $N$-fold rotational symmetry, the main rotational axis and any two
perpendicular directions are principal axes. The fact that the blades are sloped and do not
lie in a plane does not affect this.
For an asymmetric object, you need to diagonalize the inertia tensor by finding its eigenvectors, which you will not have to do in this course.
Q: What are the principal axes of a flat-head screw? What about a Phillips head screw?
A: A flat-head screw only has 2-fold symmetry, so our theorem doesn't quite apply. A
Phillips head screw has 4-fold symmetry on top, so the theorem applies, but the shaft does
not quite have rotational symmetry. It has helical symmetry, but it is not symmetric under
rotations about the shaft. To see this, consider the point on the bottom of the screw where
the thread ends. There is only one such point, and it moves when you rotate the screw.
Q: How could you design a screw for extreme precision drilling?
A: You could have $N \geq 3$ separate threads of the screw winding up the shaft in parallel.
Then the shaft would have $N$-fold symmetry, so that the rotation induced by drilling would
be about a principal axis.
Q: What are the principal axes of a football?
A: A football has fourfold symmetry about its main axis, so the long axis and any two
perpendicular axes are principal axes.
Q: What about if you take into account the laces of the football?
A: The long axis should still be a principal axis by the centrifugal force argument:
the laces are already centered around the equator so they can't get pushed out any farther.
Similarly, an axis through the middle of the laces ought to be a principal axis. Finally,
the axis perpendicular to these two is principal. If you don't like the centrifugal point of
view, let the long axis be $z$ and let the axis through the center of the laces be $x$. Then the
football is symmetric under $x \to -x$ and $y \to -y$, so all the off-diagonal integrals cancel,
for example $xz + (-x)z$, $xy + x(-y)$, etc.
2 Problems
2.1 Problem 1: Plausibility of Tennis Racket Theorem
Problem: Note: this motivates a theorem that will be demonstrated more rigorously, and
with a demonstration, in lecture. Suppose an object subject to no external torque has
principal axes with moments $I_1 < I_2 < I_3$. Show that rotations around axes 1 and 3 are
stable, while rotations around axis 2 are (probably) unstable, in the sense that a rotation
that at some instant has $\omega$ nearly parallel to axis 2 will not stay near it.
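Solution: Use conservation of $E$ and $L^2$. In terms of the principal moments,
$$L = I\omega = (I_1 \omega_1, I_2 \omega_2, I_3 \omega_3)$$
$$2E = \omega \cdot L = I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2$$
so
$$L^2 = I_1^2 \omega_1^2 + I_2^2 \omega_2^2 + I_3^2 \omega_3^2$$
and
$$2E I_j - L^2 = \sum_{k \neq j} (I_j I_k - I_k^2)\, \omega_k^2 .$$
Taking $j = 1, 2, 3$ gives
$$2E I_3 - L^2 = (I_3 I_1 - I_1^2)\,\omega_1^2 + (I_3 I_2 - I_2^2)\,\omega_2^2$$
$$2E I_2 - L^2 = (I_2 I_1 - I_1^2)\,\omega_1^2 + (I_2 I_3 - I_3^2)\,\omega_3^2$$
$$2E I_1 - L^2 = (I_1 I_2 - I_2^2)\,\omega_2^2 + (I_1 I_3 - I_3^2)\,\omega_3^2 .$$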
Q: So what?
A: These equations describe ellipses in the $\omega_1$-$\omega_2$ and $\omega_3$-$\omega_2$ planes and a hyperbola
in the $\omega_1$-$\omega_3$ plane. Thus if $\omega_1$ and $\omega_2$ start out small (for a rotation nearly about axis 3),
they remain small. Similarly if $\omega_3$ and $\omega_2$ start small they remain small. However, we can't
say the same if $\omega_1$ and $\omega_3$ start small.
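The ellipse-versus-hyperbola argument can be illustrated numerically. The sketch below is not part of the argument above: it assumes the standard torque-free Euler equations, $I_1 \dot\omega_1 = (I_2 - I_3)\,\omega_2\omega_3$ and cyclic permutations, integrates them with a hand-written fourth-order Runge-Kutta step, and uses illustrative moments $(1, 2, 3)$ with a 1% initial tilt. It records the largest component of $\omega$ perpendicular to the axis the rotation started near.

```python
def euler_rhs(w, I):
    """Torque-free Euler equations in the principal-axis frame (assumed, not derived here)."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def max_transverse(w0, axis, I=(1.0, 2.0, 3.0), dt=0.005, steps=6000):
    """Integrate with RK4 and return the largest |omega_k| seen for k != axis."""
    w = tuple(w0)
    worst = 0.0
    for _step in range(steps):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs(tuple(w[i] + 0.5 * dt * k1[i] for i in range(3)), I)
        k3 = euler_rhs(tuple(w[i] + 0.5 * dt * k2[i] for i in range(3)), I)
        k4 = euler_rhs(tuple(w[i] + dt * k3[i] for i in range(3)), I)
        w = tuple(w[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                  for i in range(3))
        worst = max(worst, max(abs(w[i]) for i in range(3) if i != axis))
    return worst

# Start almost exactly along each principal axis in turn.
near_axis_1 = max_transverse((1.0, 0.01, 0.01), axis=0)
near_axis_2 = max_transverse((0.01, 1.0, 0.01), axis=1)
near_axis_3 = max_transverse((0.01, 0.01, 1.0), axis=2)
```

With these assumed values the transverse components stay at the percent level for axes 1 and 3 but grow to order one for axis 2, which is the behaviour the conservation laws allow.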
2.2
Problem: A sphere of radius $R$ and angular momentum $L$ absorbs cosmic microwave background
photons of momentum $p$ at a rate of $n$ per second. The photons come from all
directions with equal probability. For small times $t$, what is the expected angle $\theta$ by which the
angular momentum vector has rotated?
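Take the initial angular momentum to point along $z$. For small deflections,
$$\theta \approx \frac{\sqrt{L_x^2 + L_y^2}}{L} = \frac{|R|}{L}, \tag{29}$$
$$R = (R_x, R_y) = \sum_{i=1}^{N} (L_{i,x}, L_{i,y}), \tag{30}$$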
where $L_{i,x}$ is the random variable describing the angular momentum carried by the $i$th photon
and there are $N$ photons.
Q: What can we say about the random variable $R$?
A: It is the sum of a large number of identical independent random variables and thus
the central limit theorem tells us that it is essentially normally distributed. This is great
because we only need to know its mean and variance in order to know everything!
Q: What is the mean of $(R_x, R_y)$?
A: By isotropy, the means are zero.
To calculate the variance, let $\sigma^2$ be the variance of the single photon angular momentum
components $L_{i,x(y)}$. Then the variance of the total $R_x$ is $N\sigma^2$ (the variance of the sum is
the sum of the variances for sums of independent random variables). Then the probability
distribution for $R_x$ (and also $R_y$) is
$$f(x) = \frac{e^{-x^2/(2N\sigma^2)}}{\sqrt{2\pi N \sigma^2}} . \tag{31}$$
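Then we can calculate the expected $\theta$ by integrating over these distribution functions:
$$\langle\theta\rangle = \frac{1}{L} \left\langle \sqrt{R_x^2 + R_y^2} \right\rangle \tag{32}$$
$$= \frac{1}{L} \int\!\!\int \sqrt{x^2 + y^2}\; \frac{1}{\sqrt{2\pi N\sigma^2}}\, e^{-x^2/(2N\sigma^2)}\; \frac{1}{\sqrt{2\pi N\sigma^2}}\, e^{-y^2/(2N\sigma^2)}\, dx\, dy \tag{33}$$
$$= \frac{1}{L}\, \frac{1}{2\pi N\sigma^2} \int_0^{2\pi}\!\!\int_0^{\infty} r\, e^{-r^2/(2N\sigma^2)}\, r\, dr\, d\phi \tag{34}$$
$$= \frac{1}{N\sigma^2 L} \int_0^{\infty} r^2 e^{-r^2/(2N\sigma^2)}\, dr \tag{35}$$
$$= \frac{1}{N\sigma^2 L}\, \sqrt{\frac{\pi}{2}}\, (N\sigma^2)^{3/2} \tag{36}$$
$$= \sqrt{\frac{\pi}{2}}\, \frac{\sigma}{L}\, N^{1/2} = \sqrt{\frac{\pi}{2}}\, \frac{\sigma}{L}\, (nt)^{1/2} \tag{37}$$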
Note that the average deviation grows as the square root of time. This is typical of random
walks and other random/diffusive processes. Now we just need the variance of the $L_x$ imparted
by a single photon. Since the mean of this is zero, the variance is
$$\sigma^2 = \langle L_{i,x}^2 \rangle \tag{38}$$
Q: Can we exploit symmetry to reduce or eliminate the amount of trig we will have to
do?
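A: By symmetry,
$$\langle L_{i,x}^2 \rangle = \frac{1}{3} \langle L_{i,x}^2 + L_{i,y}^2 + L_{i,z}^2 \rangle = \frac{1}{3} \langle L_i^2 \rangle = \frac{1}{3} \langle (bp)^2 \rangle = \frac{p^2}{3} \langle b^2 \rangle, \tag{39}$$
where $b$ is the impact parameter of the incident photons. The impact parameter $b$ is
geometrically just the polar coordinate $r$ in the cross-sectional circle, so
$$\langle b^2 \rangle = \frac{\int_0^{2\pi}\!\int_0^{R} r^2\, r\, dr\, d\phi}{\pi R^2} = R^2/2. \tag{40}$$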
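Thus
$$\sigma^2 = \frac{R^2 p^2}{6} \tag{41}$$
and
$$\langle\theta\rangle = \sqrt{\frac{\pi}{2}}\, \frac{1}{L}\, \sqrt{\frac{R^2 p^2}{6}}\, (nt)^{1/2} = \frac{\sqrt{\pi}\, pR}{2L}\, (nt/3)^{1/2} . \tag{42}$$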
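The prediction $\langle |R| \rangle = \sqrt{\pi/2}\,\sigma\sqrt{N}$ with $\sigma^2 = R^2p^2/6$ can be tested with a small Monte Carlo simulation. This is only a sketch under stated assumptions: photons arrive from isotropic directions, strike a uniformly random point of the sphere's cross-section, and deposit angular momentum $r \times p$; the values $N = 30$, 4000 trials, unit radius and momentum, and the fixed seed are arbitrary, and all function names are made up for illustration.

```python
import math
import random

def random_unit_vector(rng):
    """Direction drawn uniformly from the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def photon_angular_momentum(rng, radius, p):
    """Angular momentum r x p of one photon hitting a random point of the cross-section."""
    n = random_unit_vector(rng)                       # photon direction of travel
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = cross(n, helper)
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = tuple(c / norm for c in e1)
    e2 = cross(n, e1)                                 # (e1, e2) span the plane normal to n
    b = radius * math.sqrt(rng.random())              # impact parameter, uniform over the disk
    psi = rng.uniform(0.0, 2.0 * math.pi)
    r = tuple(b * (math.cos(psi) * e1[i] + math.sin(psi) * e2[i]) for i in range(3))
    return cross(r, tuple(p * c for c in n))

def mean_transverse_kick(num_photons, trials, radius=1.0, p=1.0, seed=0):
    """Average of sqrt(Rx^2 + Ry^2) over many independent batches of photons."""
    rng = random.Random(seed)
    total = 0.0
    for _trial in range(trials):
        rx = ry = 0.0
        for _photon in range(num_photons):
            lx, ly, _lz = photon_angular_momentum(rng, radius, p)
            rx += lx
            ry += ly
        total += math.hypot(rx, ry)
    return total / trials

N = 30
simulated = mean_transverse_kick(N, trials=4000)
predicted = math.sqrt(math.pi / 2.0) * math.sqrt(1.0 / 6.0) * math.sqrt(N)
```

Dividing either number by $L$ gives the expected rotation angle. The two should agree to within the statistical scatter of the simulation, roughly a percent for these settings.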
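$$x^T M^T M x = x^T x$$
$$M^T M = I \tag{2}$$
where $I$ is the ($2 \times 2$) identity matrix. $M = I$ is a solution, the trivial rotation of doing
nothing at all. Thus we restrict ourselves for now to the seemingly modest goal of
finding small rotations, that is, we look for $M = I + \epsilon M_1$,
where $\epsilon$ is infinitesimal. Substituting this into Eq. (2),
we obtain
$$(I + \epsilon M_1^T)(I + \epsilon M_1) = I$$
$$I + \epsilon (M_1^T + M_1) + O(\epsilon^2) = I$$
$$M_1^T + M_1 = 0 + O(\epsilon)$$
$$M_1 \propto \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon) \tag{3}$$
Hence we obtain
$$M = I + \epsilon \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon^2) \tag{4}$$
As you know, the composition of two linear transformations is given by matrix multiplication, so for any $N$,
$$M^N = \left[ I + \epsilon \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + O(\epsilon^2) \right]^N = \left[ I + \epsilon \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \right]^N + N\, O(\epsilon^2) \tag{5}$$
Setting $\epsilon = a/N$ and letting $N$ grow large, the error term vanishes and the finite rotation is
$$M = \exp\left[ a \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \right] = \sum_{n=0}^{\infty} \frac{a^n}{n!} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^n \tag{6}$$
where matrix exponentiation is defined by the usual Taylor series for exponential functions, applied to a matrix.
The powers of the matrix are actually simple to handle
because of the nice property
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = -I \tag{7}$$
so that
$$M = I \left( 1 - \frac{a^2}{2!} + \frac{a^4}{4!} - \ldots \right) + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \left( \frac{a}{1!} - \frac{a^3}{3!} + \frac{a^5}{5!} - \ldots \right)$$
$$= I \cos(a) + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \sin(a) = \begin{pmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{pmatrix} \tag{8}$$
LORENTZ TRANSFORMATIONS
Lorentz transformations preserve the interval
$$-t^2 + x^2 + y^2 + z^2 = x^T \eta\, x \tag{9}$$
where $\eta = \mathrm{diag}(-1, 1, 1, 1)$. We again have very sensible reasons to want the transformation to be linear, so
for $x' = \Lambda x$, we require
$$(\Lambda x)^T \eta\, (\Lambda x) = x^T \eta\, x$$
$$x^T \Lambda^T \eta \Lambda x = x^T \eta\, x \quad \forall x$$
$$\eta \Lambda^T \eta \Lambda = I \tag{10}$$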
The question we need to ask is how many free parameters are there that determine $\Lambda$? It's a $4 \times 4$ matrix
with 16 entries, but these are related to each other by
Eq. (10). How many of these are we free to choose, i.e.
how many different types of Lorentz transformations are
there?
As before, the identity is clearly among the solutions
to Eq. (10). Thus we can look for small Lorentz transformations $\Lambda = I + \epsilon \Lambda_1$. The claim is that the number
of free parameters for small transformations is equal to
the number of free parameters for arbitrary transformations. Intuitively, this makes sense, because a Lorentz
transformation at a very large speed should equal the
product of many successive Lorentz transformations at
low speeds. However, we can also prove it using the preimage theorem (see e.g. Guillemin and Pollack, Differential Topology pp. 20-23). With the theorem, we would
use the fact that the set of all $\Lambda$ is the inverse image of a regular value
of a smooth map to show that the set of all possible $\Lambda$ is
a manifold. Since a manifold's dimension (which equals
the number of free parameters) is the same everywhere,
we can evaluate the dimension locally near $I$.
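In analogy to Eq. (3), we find
$$\eta (I + \epsilon \Lambda_1^T)\, \eta\, (I + \epsilon \Lambda_1) = I$$
$$I + \epsilon (\eta \Lambda_1^T \eta + \Lambda_1) + O(\epsilon^2) = I$$
$$\eta \Lambda_1^T \eta + \Lambda_1 = 0 + O(\epsilon) \tag{11}$$
If we actually write out the elements of $\Lambda_1$ explicitly as
$$\Lambda_1 = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix} \tag{12}$$
then Eq. (11) becomes
$$\begin{pmatrix} a & -e & -i & -m \\ -b & f & j & n \\ -c & g & k & o \\ -d & h & l & p \end{pmatrix} = - \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix} \tag{13}$$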
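The number of free parameters left by the linearized condition can be counted mechanically. The sketch below assumes $\eta = \mathrm{diag}(-1, 1, 1, 1)$ and the condition $\Lambda_1^T \eta + \eta \Lambda_1 = 0$, written equivalently as $\eta \Lambda_1^T \eta + \Lambda_1 = 0$. It treats that condition as a linear map on the 16 matrix entries and computes the dimension of its null space with exact rational arithmetic; the helper names are made up for illustration.

```python
from fractions import Fraction

ETA = [-1, 1, 1, 1]  # diagonal of eta (assumed signature)

def constraint(L1):
    """Return eta * L1^T * eta + L1 for a 4x4 matrix L1 given as nested lists."""
    return [[ETA[i] * L1[j][i] * ETA[j] + L1[i][j] for j in range(4)] for i in range(4)]

def rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
    return r

# Apply the constraint map to each of the 16 basis matrices and flatten the results.
images = []
for idx in range(16):
    basis = [[1 if 4 * i + j == idx else 0 for j in range(4)] for i in range(4)]
    c = constraint(basis)
    images.append([c[i][j] for i in range(4) for j in range(4)])

independent_constraints = rank(images)
free_parameters = 16 - independent_constraints
```

The count comes out to six, which matches three rotations plus three boosts.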
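$$\Lambda_{xy} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\theta & -\sin\theta \\ 0 & 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{14}$$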
The first row and first column of $\Lambda$ affect time, so transformations involving non-zero $b$, $c$, $d$ are something new.
Let's look at the transformation $e = b \neq 0$, all other parameters equal zero. Then the same reasoning as above
gives
$$\Lambda = \exp\left[ b \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \right] \tag{15}$$
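Now this matrix that we are exponentiating has the nice
property
$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{16}$$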
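so that
$$\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \left( 1 + \frac{b^2}{2!} + \frac{b^4}{4!} + \ldots \right) + \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \left( \frac{b}{1!} + \frac{b^3}{3!} + \frac{b^5}{5!} + \ldots \right) + \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \cosh b & \sinh b & 0 & 0 \\ \sinh b & \cosh b & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{17}$$
which is just the identity when acting on the $y$ and $z$
coordinates. Acting on $x$ and $t$, it gives
$$t' = t \cosh b + x \sinh b$$
$$x' = t \sinh b + x \cosh b \tag{18}$$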
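Both exponentials, the rotation and the boost, can be checked by summing the Taylor series numerically. This is a minimal sketch in plain Python: it assumes the rotation generator is taken with the sign convention $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, the metric is $\eta = \mathrm{diag}(-1, 1, 1, 1)$, and the parameter values $a = 0.3$, $b = 0.5$ are arbitrary. The helper names are made up for illustration.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(A, terms=40):
    """Sum the Taylor series I + A + A^2/2! + ... up to the A^(terms-1) term."""
    n = len(A)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]   # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(A, B):
    n = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))

a, b = 0.3, 0.5

# Rotation: exp(a * [[0, -1], [1, 0]]) should be the usual rotation matrix.
rotation = mat_exp([[0.0, -a], [a, 0.0]])
rotation_expected = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

# Boost: exponentiate the generator that mixes t and x.
boost = mat_exp([[0.0, b, 0.0, 0.0], [b, 0.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4])
ch, sh = math.cosh(b), math.sinh(b)
boost_expected = [[ch, sh, 0.0, 0.0], [sh, ch, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Defining property of a Lorentz transformation: Lambda^T eta Lambda = eta.
eta = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)]
       for i in range(4)]
boost_T = [list(col) for col in zip(*boost)]
invariant = matmul(matmul(boost_T, eta), boost)
```

The trigonometric functions appear for the rotation because its generator squares to $-I$, and the hyperbolic ones appear for the boost because its generator squares to a projector with a plus sign.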
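Maximize
$$\int_0^1 \left( \int_0^x f(t)\, dt \right)^2 dx \tag{1}$$
subject to
$$\int_0^1 f^2(t)\, dt = 1. \tag{2}$$
Solution
Define
$$F(x) \equiv \int_0^x f(t)\, dt, \tag{3}$$
so that the problem is to maximize
$$\int_0^1 F^2(x)\, dx \quad \text{s.t.} \quad \int_0^1 (F'(x))^2\, dx = 1. \tag{4}$$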
Jeffrey's Method
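$$\nabla f = \lambda \nabla g \tag{5}$$
For our purposes, the best way to recast this is to switch to vector notation $(x_1, x_2, \ldots) \equiv x$
and write that if a maximum occurs at $x$, then for a change $\delta x$ and small constant $\epsilon$,
$$f(x + \epsilon\, \delta x) - f(x) = \lambda \left( g(x + \epsilon\, \delta x) - g(x) \right) + O(\epsilon^2) \tag{6}$$
$$\left. \frac{d}{d\epsilon} f(x + \epsilon\, \delta x) \right|_{\epsilon = 0} = \lambda \left. \frac{d}{d\epsilon} g(x + \epsilon\, \delta x) \right|_{\epsilon = 0} \tag{7}$$
Generalizing this to the case of a functional, if $F(x)$ maximizes
$$V[F] \equiv \int_0^1 F^2(x)\, dx \tag{8}$$
subject to
$$G[F] \equiv \int_0^1 (F'(x))^2\, dx = 1, \tag{9}$$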
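we must have for some change $\delta F(x)$, small constant $\epsilon$, and some constant $\lambda$:
$$\left. \frac{d}{d\epsilon} V[F + \epsilon\, \delta F] \right|_{\epsilon = 0} = \lambda \left. \frac{d}{d\epsilon} G[F + \epsilon\, \delta F] \right|_{\epsilon = 0} \tag{10}$$
$$\left. \frac{d}{d\epsilon} \int_0^1 (F(x) + \epsilon\, \delta F(x))^2\, dx \right|_{\epsilon = 0} = \lambda \left. \frac{d}{d\epsilon} \int_0^1 (F'(x) + \epsilon\, \delta F'(x))^2\, dx \right|_{\epsilon = 0} \tag{11}$$
$$\int_0^1 2 F(x)\, \delta F(x)\, dx = \lambda \int_0^1 2 F'(x)\, \delta F'(x)\, dx \tag{12}$$
$$\int_0^1 F(x)\, \delta F(x)\, dx = -\lambda \int_0^1 F''(x)\, \delta F(x)\, dx \tag{13}$$
$$F(x) = -\lambda F''(x). \tag{14}$$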
The second-to-last line followed from integration by parts with boundary conditions $\delta F(0) =
\delta F(1) = 0$ and the logic of the last step was that the integrals are equal for any $\delta F(x)$ that
satisfies the boundary conditions, hence the things multiplying $\delta F(x)$ within the integral
must be equal for all $x$. We conclude from the differential equation that for some constants
$a$ and $b$
$$F(x) = a \sin(bx). \tag{15}$$
Aleksandar's Method
Consider the enlarged functional acting on a function $F(x)$ and a real number $\lambda$:
$$U[F, \lambda] \equiv \int_0^1 \left[ F^2(x) + \lambda (F'(x))^2 \right] dx \tag{16}$$
with the boundary conditions
$$F(0) = 0, \quad F(1) = \alpha. \tag{17}$$
Note that $\alpha$ is still unknown since we have not yet found the solution $F(x)$; however, in
principle we know that $F(1)$ is some constant. Think of it this way: we are finding a global
maximum over the space of all functions with $F(0) = 0$, which will certainly also be a maximum over
the subset of functions with $F(1) = \alpha$.
We will now show that maximizing $U[F, \lambda]$ is equivalent to maximizing $V[F]$ with the
constraint. Thus we will replace a constrained problem with an unconstrained one, which
we can solve using the Euler-Lagrange equations. Suppose for some fixed $\lambda$ we could find
some function $F_0(x)$ that maximizes $U[F, \lambda]$ over all functions with the boundary conditions
$F(0) = 0$, $F(1) = \alpha$. That is,
$$U[F, \lambda] \leq U[F_0, \lambda] \quad \forall F. \tag{18}$$
Note that $F_0$ depends on $\lambda$. Because of this dependence, suppose we choose some particular
$\lambda_0$ such that $F_0$ satisfies the constraint
$$\int_0^1 (F_0'(x))^2\, dx = 1. \tag{19}$$
Since this $F_0$ maximizes $U[F, \lambda_0]$ over all functions with the boundary conditions, it certainly
maximizes $U[F, \lambda_0]$ over all functions with the boundary conditions and with the integral
constraint. Thus for all functions $F(x)$ satisfying the integral constraint and boundary
conditions
$$\int_0^1 F^2(x)\, dx + \lambda_0 \int_0^1 (F'(x))^2\, dx \leq \int_0^1 F_0^2(x)\, dx + \lambda_0 \int_0^1 (F_0'(x))^2\, dx \tag{20}$$
$$\int_0^1 F^2(x)\, dx + \lambda_0 \leq \int_0^1 F_0^2(x)\, dx + \lambda_0 \tag{21}$$
$$\int_0^1 F^2(x)\, dx \leq \int_0^1 F_0^2(x)\, dx. \tag{22}$$
The second-to-last line followed from the definition of the constraint: $\int_0^1 (F')^2\, dx = 1$.
This last line explicitly says that $F_0$ maximizes $V[F]$ subject to the constraint, as desired.
Thus all we have to do is maximize the functional
$$U[F, \lambda_0] = \int_0^1 \left[ F^2(x) + \lambda_0 (F'(x))^2 \right] dx \tag{23}$$
with the boundary conditions $F(0) = 0$, $F(1) = \alpha$. But this just follows from the
Euler-Lagrange equations, which are
$$\lambda_0 F''(x) = F(x). \tag{24}$$
Thus we expect $F(x) = a \sin(bx)$.
(25)
(26)
we could simply divide $f$ by $z$. Then the new function would satisfy the integral constraint
without affecting the value of the functional, the numerator and denominator of which would
both get divided by $z$. From our earlier work we know that $f(x) \propto \cos(bx)$, so let's maximize
with respect to $b$. Thus we need
$$\frac{d}{db}\, \frac{\int_0^1 \left( \int_0^x \cos(bt)\, dt \right)^2 dx}{\int_0^1 \cos^2(bt)\, dt} = 0 \tag{27}$$
$$\frac{d}{db}\, \frac{2b - \sin(2b)}{b^2\, (2b + \sin(2b))} = 0 \tag{28}$$
You can evaluate this derivative (tedious without Mathematica) and plot it against $b$. The
graph strongly suggests that $b = \pi/2$ is a solution. Plugging $b = \pi/2$ into the derivative,
we confirm that $b = \pi/2$.
Finally, we obtain the prefactor by setting
$$\int_0^1 (a \cos(\pi x/2))^2\, dx = 1, \tag{29}$$
which gives $a = \sqrt{2}$. Hence
$$f(x) = \sqrt{2} \cos(\pi x/2). \tag{30}$$
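In place of Mathematica, the ratio being maximized can be scanned numerically. The sketch below assumes the closed form $(2b - \sin 2b) / (b^2 (2b + \sin 2b))$ for the ratio and searches a uniform grid on $0 < b \leq 10$; the grid spacing and range are arbitrary choices, and a grid search only locates the maximum to within one grid step.

```python
import math

def ratio(b):
    """Value of the functional for f(x) proportional to cos(b x), after normalization."""
    return (2.0 * b - math.sin(2.0 * b)) / (b * b * (2.0 * b + math.sin(2.0 * b)))

# Scan b from 0.01 to 10.00 in steps of 0.01 and keep the best grid point.
grid = [0.01 * k for k in range(1, 1001)]
best_b = max(grid, key=ratio)

value_at_half_pi = ratio(math.pi / 2.0)   # should equal 4 / pi^2

# A centered finite difference as a rough check that the derivative vanishes there.
h = 1e-5
slope_at_half_pi = (ratio(math.pi / 2.0 + h) - ratio(math.pi / 2.0 - h)) / (2.0 * h)
```

The best grid point sits next to $\pi/2 \approx 1.571$, where the ratio takes the value $4/\pi^2 \approx 0.405$.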
A Third Method
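$$V[f] = \frac{\int_0^1 F^2(x)\, dx}{\int_0^1 (F'(x))^2\, dx}. \tag{31}$$
Then we need
$$\left. \frac{d}{d\epsilon} \left[ \frac{\int_0^1 (F(x) + \epsilon\, \eta(x))^2\, dx}{\int_0^1 (F'(x) + \epsilon\, \eta'(x))^2\, dx} \right] \right|_{\epsilon = 0} = 0. \tag{32}$$
Carrying out the derivative and integrating by parts (with the boundary terms vanishing) gives
$$\int_0^1 F(x)\, \eta(x)\, dx \int_0^1 (F'(x))^2\, dx + \int_0^1 F''(x)\, \eta(x)\, dx \int_0^1 F^2(x)\, dx = 0.$$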
Now since $\eta(x)$ can be anything, choose it to be a Dirac $\delta$ function $\eta(x) = \delta(x - y)$. If you
are not familiar with this function, look it up on Wikipedia. It's not complicated! Then we
have
$$F(y) \int_0^1 (F'(x))^2\, dx + F''(y) \int_0^1 F^2(x)\, dx = 0 \tag{36}$$
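Now as far as $F(y)$ is concerned, the integrals over the dummy variable $x$ are just some
constants, so we have
$$F(y) + \omega^{-2} F''(y) = 0 \;\Rightarrow\; F(y) = \sin(\omega y), \tag{37}$$
where
$$\omega^{-2} = \frac{\int_0^1 F^2(x)\, dx}{\int_0^1 F'(x)^2\, dx} \tag{38}$$
$$= \frac{\int_0^1 \sin^2(\omega x)\, dx}{\omega^2 \int_0^1 \cos^2(\omega x)\, dx} \tag{39}$$
$$= \frac{2\omega - \sin(2\omega)}{\omega^2\, (2\omega + \sin(2\omega))}. \tag{40}$$
This is equivalent to the algebraic equation we solved above.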